Blog | Oct 7, 2013

SQL Server Performance Troubleshooting – Waits and Queues

One of the frequently reported issues by the end users is that “Our Database Servers performance is very slow”.  Amongst numerous possibilities and ways to find the bottleneck, we will understand about Wait Stats and Queues to revert the above situation. Waits tell us where SQL Server is spending most of the time waiting and enables to concentrate on most significant performance counters and Queues.

What are Waits and Queues?

Waits - SQL Server is designed to track the wait information when a query request waits on a resource (IO, memory, network, CPU etc.). When a query encounters something that is not available, SQL Server sets wait type for that wait reason. This data is retained in SQL Dynamic Views can be used to understand why things are slower than expected.
Queues- Measure system resources and utilization and it is represented by performance monitor objects and counters. A look at the highest wait types can give us the perspective to focus on the most relevant performance counters to find the problem area.

SQL Process Life Cycle – It is important to understand process life cycle to interpret the types of ‘Waits’.

Let’s look at a scenario we can all relate to.  It so happens that one has to wait for a Price Tag at a cashier and then step aside until it’s available? Till then the cashier will processes the next available customer in queue. Similarly, SQL Server SQLOS uses a scheduler to manage execution queue and is mainly divided into 3 states:

Running: A query which is being executed on a CPU is called a running query until it has to wait for a resource, it then moves onto a Suspended state, a query which is waiting for the resources. The goal is to find why and what it is waiting for. Once the resources are available the process moves to a Runnable State, and wait for the CPU time.

A combination of following 3 DMV’s can give us some real good information:

Sys.dm_os_wait_stats –Primary DMV that aggregates wait stats information at Instance level for all sessions since last SQL Restart or since wait stats last cleared.
sys.dm_os_waiting_tasks and sys.dm_exec_requests – A query combining these two provide us with detailed information at session id level on the wait types, wait time, query that is being executed, objects that are participating etc.

Interpretting Wait Types – First of all Waits always occur and not all wait types are areas of concern such as the ones related to background tasks. We will discuss some of the most frequent and important ones here.

CXPACKET -- Indicates that multiple CPUs are working in parallel, dividing up the query in smaller pieces. Ordinarily a well-tuned OLTP application would not parallelize unless an index is missing or there is an incomplete WHERE clause.
SOS_SCHEDULER_YIELD – Occurs when a task voluntarily yields the scheduler for other tasks to execute. During this wait the task is waiting for its quantum to be renewed. More threads clearly states CPU is under pressure. A simple solution is to add more CPU, but a better approach is to look for CPU intensive queries and Long Running Queries and fine tune them.

PAGEIOLATCH_XX – Latency between the disks to memory is the major cause for these wait types. When a query requests a page/data that is not in buffer cache, it has to wait for the data to be transferred from disk to buffer. Function Sys.dm_io_virtual_file_stats provides a breakdown of SQL Server reads, writes, and IO_Stalls for a particular database or transaction log file.

PAGELATCH_X – Latches are short term lightweight synchronization objects to facilitate the access to In-Memory pages. Some of the memory pages can be in high demand and this mechanism ensures transaction integrity. TempDB and Indexes which are heavily used are the main cause of contention. ASYNC_IO_COMPLETION and IO_COMPLETION – Simply put, tasks are waiting for IO to finish and Disk subsystem is likely a bottleneck here. Placing SQL Log files on different disks, SAN storage lookup and looking at virtual file stats are some of the ways to unearth this issue.

WRITELOG – Usually occurs on a Heavy transactional database while data in the log cache is being flushed to the disk. Checkpoints and Transaction commits are common causes for this wait type. Moving around Database log file to an independent disk really helps here. Look for IO related performance counters to find IO bottlenecks.

LCK_M_XX – When any task is waiting to acquire lock on any resource this wait type occurs. A common reason for the wait is that the resource has already been locked by other tasks leading to time outs and degradation in application performance. Keeping transactions small, using isolation levels and data partitioning are some of possible resolves.

Preemptive and Non preemptive – Introduced in SQL 2008, these have a wide variety of wait types. Some of the Extended Stored Procedure and CLR events run with high priority and OS interrupts the running process to give resources to these. Non-Preemptive and a bit more co-operative and scheduling is managed by SQL Server. High Preemptive wait is a cause of concern and should be looked into.

It can be concluded that wait types in correlation with performance counters are a strong tool for debugging performance issues and indicating the possible problem area. There are myriad of wait types and can be discussed in greater detail individually.   Look for future blogs that can go in to more details.