Saturday, April 9, 2016

Scale up and scale out with .NET framework

 

Recently I came across a question about the ability of .NET to scale, comparing to other frameworks. Here’s my answer. I hope you’ll find it useful.

While the scalability of an application is mostly determined by the way in which the code is written, the framework / platform that is being used can significantly influence the amount of effort required to produce an application that scales gracefully to many cores (scale up) and many machines (scale out).

Before joining Microsoft, I was part of a team that built a distributed, mission critical Command and Control system using .NET technologies (almost exclusively). The applications that make up the system are deployed on powerful servers, they are all stateful, massively concurrent, with strict throughput requirements. We were able to build a quality, maintainable, production ready system in less than 3 years (which is a record time for such system).

While working for Microsoft in the past 6 years, I worked on hyper scale services deployed on thousands of machines, all of which have been written using .NET tech. In most cases, unless there’s a good reason, .NET is used for the front ends (ASP.NET MVC), mid-tier and backend services.

Here’s how .NET empowers applications that need to scale up and scale out.

Scale up:

Building a reliable, high performance application that scales gracefully to many core hardware is hard to do. One of the challenges associated with scalability is finding the most efficient way to control the concurrency of the application. Practically, we need to figure out a way to divide the work and distribute it among the threads such that we put the maximum amount of cores to work. Another challenge that many developers struggle with is synchronizing access to shared mutable state (to avoid data corruption, race conditions and the like) while minimizing contentions between the threads.

Concurrency Control

Take the below concurrency/throughput analysis for example, note how throughput peaks at concurrency level of 20 (threads) and degrades when concurrency level exceeds 25. 

​So how can a framework help you maximize throughput and improve the scalability characteristic of your application?

It can make concurrency control dead simple. It can include tools that allow you to visualize, debug and reason about your concurrent code. It can have first-class language support for writing asynchronous code​. It can include best in class synchronization and coordination primitives and collections.

In the past 13 years I’ve been following the incremental progress that the developer division made in this space - it has been a fantastic journey.

In the first .NET version the managed Thread Pool was introduced to provide a convenient way to run asynchronous work. The Thread Pool optimizes the creation and destruction of threads (according to JoeDuffy, it cost ~200,000 cycles to create a thread) through the use of heuristic ‘thread injection and retirement algorithm’ that determines the optimal number of threads by looking at the machine architecture, rate of incoming work and current CPU utilization.

In .NET 4.0, TPL (Task Parallel Library) was introduced. The Task Parallel Library includes many features that enable the application to scale better. It supports worker thread local pool (to reduce contentions on the global queue) with work stealing capabilities, and the support for concurrency levels tuning (setting the number of tasks that are allowed to run in parallel)

In .NET 4.5 -  the async-await keywords were introduced, making asynchronicity a first class language feature, using compiler magic to make code that looks synchronous run asynchronously. As a result – we get all the advantages of asynchronous programming with a fraction of the effort.

Consistency / Synchronization

Although more and more code is now tempted to run in parallel, protecting shared mutable data from concurrent access (without killing scalability) is still a huge challenge. Some applications can get away relatively easy by sharing only immutable objects, or using lock free synchronization and coordination primitives (e.g. ConcurrentDictionary) which eliminate the need for locks almost entirely. However, in order to achieve greater scalability there’s not escape from using fine-grained locks.

In an attempt to provide a solution for mutable in-memory data sharing that on one hand scales, and on the other hand easy to use and less error prone than fine grained locks – the team worked on a Software Transactional Memory support for .NET that would have ease the tension between lock granularity and concurrency. With STM, instead of using multiple locks of various kinds to synchronize access to shared objects, you simply wrap all the code that access those objects in a transaction and let the runtime execute it atomically and in isolation by doing the appropriate synchronization behind the scenes. Unfortunately, this project never materialized.

As far as I know, the .NET team was the only one that even made a serious effort to make fine grand concurrency simpler to use in a non functional language.

Speaking of functional languages, F# is a great choice for building massively concurrent applications. Since  in F# structures are immutable by default, sharing state and avoiding locks is much easier. F# also integrates seamlessly with the .NET ecosystem, which gives you access to all the third party .NET libraries and tools (including TPL).

Scale out:

Say that you are building a stateless website/service that needs to scale to support millions of users: You can deploy your ASP.NET application as WebSite or Cloud Service to Microsoft Azure public cloud (and soon Microsoft Azure Stack for on premises) and run it on thousands of machines. You get automatic load balancing inside the data-center, and you can use Traffic Manager to load balance requests cross data-centers. All with very little effort.

If you are building a statesful service (or combination of stateful and staeless), you can use the Azure Service Fabric, which will allow you deploy and manage hundreds or thousandof .NET applications on a cluster of machines. You can scale up or scale down your cluster easily, knowing that the applications scale according to available resources.

Note that you can use the above with none .NET applications. But most of the tooling and libraries are optimized for .NET.

Saturday, January 24, 2015

Could not obtain exclusive lock on database 'model'? Here’s how you find the smoking gun..

If your team owns a process that provision SQL Server databases in masses, you probably encountered the below exception in one of your test clusters, or even in production:

Microsoft.SqlServer.Management.Smo.FailedOperationException: Create failed for Database '***'.  ---> Microsoft.SqlServer.Management.Common.ExecutionFailureException: An exception occurred while executing a Transact-SQL statement or batch. ---> System.Data.SqlClient.SqlException: Could not obtain exclusive lock on database 'model'. Retry the operation later.
CREATE DATABASE failed. Some file names listed could not be created. Check related errors.

This happens because when your process attempted to create a database, some other process was using the ‘model’ database. It’s possible that someone started a session with ‘model’ via SQL Server Management Studio, or maybe a SCOM process was using it.

Anyways, the important thing is that you find the smoking gun to unlock your process and make sure that this doesn’t happen again...

If you can repro the problem, you can find the culprit simply by running: ‘EXEC sp_who2’, you will get a list of all the processes/users that use any database in your server, including the ‘model’ database.

Or, you can run the following query in order to get info on the processes that use the ‘model’ database:

DECLARE @Table TABLE
(    SPID INT,
    Status VARCHAR(MAX),
    LOGIN VARCHAR(MAX),
    HostName VARCHAR(MAX),
    BlkBy VARCHAR(MAX),
    DBName VARCHAR(MAX),
    Command VARCHAR(MAX),
    CPUTime INT,
    DiskIO INT,
    LastBatch VARCHAR(MAX),
    ProgramName VARCHAR(MAX),
    SPID_1 INT,
    REQUESTID INT) 
INSERT INTO @Table EXEC sp_who2 
SELECT * FROM  @Table WHERE DBName='model'

Alternatively, you can use:

SELECT nt_domain, nt_username, program_name, cmd, status 
FROM sys.sysprocesses 
WHERE DB_NAME(dbid)='model'

I prefer the last option. The results will look something like this:

image

However, if your process is running somewhere in the cloud, it might be helpfully to get the blame list programmatically. Here’s a sample code that detects ‘model lock’ type error and adds details on the culprit processes to the exception.

    class Program
    {
        static void Main(string[] args)
        {
            var builder = new SqlConnectionStringBuilder()
            {
                DataSource = "localhost",
                IntegratedSecurity = true
            };

            var sqlConnection = new SqlConnection(builder.ConnectionString);
            sqlConnection.Open();
            try
            {
                Server server = new Server(new ServerConnection(sqlConnection));

                string name = "test" + Guid.NewGuid().ToString("N");
                Console.WriteLine("Creating db " + name);
                Database database = new Database(server, name);
                try
                {
                    database.Create();
                    Console.WriteLine("No Failure");
                }
                catch (FailedOperationException e)
                {
                    HandleModelLockException(server.Databases, e);
                    throw;
                }
            }
            catch (Exception e)
            {
                Console.WriteLine(e.ToString());
            }
            finally 
            {
                sqlConnection.Close();
                Console.Read();
            }
        }

        private static void HandleModelLockException(DatabaseCollection databases, FailedOperationException e)
        {
            bool modelLockException = ModelDatabaseLockedAnalysis.IsModelLockException(e);

            if (!modelLockException) return;

            string message = ModelDatabaseLockedAnalysis.Analyze(databases);
            
            var s = FormatErrorMessage(message);
            throw new ModelDatabaseLockedException(s, e);
        }

        private static string FormatErrorMessage(string message)
        {
            var builder = new StringBuilder();
            builder.AppendLine();
            builder.AppendLine("Failed to create database since the system database 'model' is locked. " +
                               "Here's a list of processes that currently use the 'model database:");
            builder.AppendLine(message.Replace(" ", string.Empty));
            string s = builder.ToString();
            return s;
        }
    }

The code above creates a database, incase of an error it calls HandleModelLockException. The latter check if the exception (or one of its InnerExceptions) indicates ‘model locked’ type of error. In that case, it calls ModelDatabaseLockedAnalysis.Analyze that queries for the processes that use the ‘model’ database. Than, it simply add the list to the exception.

Here’s the implementation of ModelDatabaseLockedAnalysis:

    public class ModelDatabaseLockedAnalysis
    {
        private const string query =
            "SELECT nt_domain, nt_username, program_name, cmd, status FROM sys.sysprocesses WHERE DB_NAME(dbid)='model'";

        public static bool IsModelLockException(FailedOperationException e)
        {
            const string messageIndication = "lock on database 'model'";

            Exception exception = e;
            while (exception != null)
            {
                bool modelLockException = exception.Message.ToLower().Contains(messageIndication);
                if (modelLockException)
                {
                    return true;
                }
                exception = exception.InnerException;
            }
            return false;
        }

        public static string Analyze(DatabaseCollection databaseCollection)
        {
            var masterDb = databaseCollection["master"];
            DataSet dataSet = masterDb.ExecuteWithResults(query);
            IEnumerable<LockingProcessInfo> processInfos = Parse(dataSet);
            var builder = new StringBuilder();
            foreach (var processInfo in processInfos)
            {
                builder.AppendLine(processInfo.ToString());
            }
            return builder.ToString();
        }

        private static IEnumerable<LockingProcessInfo> Parse(DataSet dataSet)
        {
            DataTable dataTable = dataSet.Tables[0];
            var list = new List<LockingProcessInfo>();
            foreach (DataRow row in dataTable.Rows)
            {
                string domain = (string) row["nt_domain"];
                string username = (string) row["nt_username"];
                string programname = (string) row["program_name"];
                string cmd = (string) row["cmd"];
                string status = (string) row["status"];
                list.Add(new LockingProcessInfo(domain, username, programname, cmd, status));
            }
            return list;
        }
    }
    
    public class LockingProcessInfo
    {
        public LockingProcessInfo(string domain, string username, string programname, string cmd, string status)
        {
            this.Domain = domain;
            this.Username = username;
            this.Programname = programname;
            this.Cmd = cmd;
            this.Status = status;
        }

        public string Domain { get; private set; }

        public string Username { get; private set; }

        public string Programname { get; private set; }

        public string Cmd { get; private set; }

        public string Status { get; private set; }

        public override string ToString()
        {
            return string.Format("Domain: {0}, Username: {1}, Programname: {2}, Cmd: {3}, Status: {4}", Domain, Username, Programname, Cmd, Status);
        }
    }

    internal class ModelDatabaseLockedException : Exception
    {
        public ModelDatabaseLockedException(string message, Exception exception)
            : base(message, exception)
        {
        }
    }

In order to test the code, open SQL Server Management Studio, create a new query on the ‘master’ database, and run:

use model
go

That would lock the ‘model’ database until you close the query window.

Now, run the program and you will get something like:

image 

You can see clearly that the user ‘avezra’ is using the database using SQL Server Management Studio.