Sunday, 15 February 2015

Understanding Graphical Execution Plans - Part 1: Explaining the Plan


Being able to understand and analyze execution plans is an important and beneficial skill for SQL Server database administrators and developers alike. An execution plan documents the estimated cost of a query, the indexes used, and the operations performed. All of this information is vitally important in attempting to speed up a slow performing query.
This article is part of a three part series on graphical execution plans. Part 1 explains what execution plans are and discusses the differences between estimated and actual plans. Part 2 shows how to create both estimated and actual execution plans. Lastly, Part 3 takes a look at a simple graphical execution plan, and discusses several of the most common operators used in a query.

What is an Execution Plan?

The query optimizer is a component of SQL Server that analyzes the information of a query and the statistics available about the underlying tables, indexes and indexed views. The statistics allow the query to compare the number of rows in a table to the unique values in an index to determine which indexes will reduce the cost of a query.
The optimizer uses the statistics to create several different plans. Each plan is a step-by-step document of the operations and the sequence of those operations that can be performed for one possible means of returning the results of a query. The different plans may try using different indexes, seeks versus scans (more on seeks and scans in Part 3), different types of joins (again, check out Part 3 for more on joins), and operations in different sequences in order to find the optimal plan.
Next, the optimizer assigns a relative cost for both I/O and processing for each operation. It then sums the cost of each operation to produce a total cost for each plan. After producing several different execution plans, it then selects the plan with the lowest total cost and then uses that plan to execute the query. It is important to keep in mind that these costs are relative values calculated by the optimizer and cannot be compared to any real values (such as CPU and I/O time) at the completion of execution.
Once an execution plan is selected by the optimizer, it is stored in cache. The next time the query is executed, the optimizer will look in cache to see if it contains a plan for the query. If a plan is found, it will use it to execute the query, saving the time of creating new estimated plans.

Non-graphical Execution Plans

Besides graphical executions plans, there are also text and XML execution plans. Text plans were the first type of execution plan that Microsoft made available and are difficult to read. With the addition of XML execution plans, Microsoft has stated that they will deprecate text execution plans in a future release of SQL Server. The primary value of XML execution plans is that they can be saved and later opened as a graphical plan. To save a plan in XML format, see Exporting Execution Plans or the TechNet article “Save an Execution Plan in XML Format”. For more information on text and XML execution plans, see Grant Fritchey’s book SQL Server Execution Plans.

Estimated vs. Actual

There are two types of execution plans: estimated and actual. The estimated plan is calculated by the query optimizer prior to execution; it represents what the optimizer believes will be the lowest cost plan. It can typically be returned to the user in a matter of seconds. The actual plan, on the other hand, is the actual steps that were executed to process the query. The  actual plan is returned to the user after the completion of the query. At times, some of the estimated and actual values may be different. The values are the quantitative data provided in an execution plan. See Figure 1 for an example of several of the plan values.

Why do estimated and actual plans values differ?

There are three reasons why plan values may differ.
  1. An estimated plan cannot be created
    Before a query is given to the optimizer to create estimated execution plans, a process known as the algebrizer validates the query. If an object in the query does not exist, the validation fails and no estimated plan is created. This can occur when a create statement is located in the same batch that uses the object that is created.
  2. Stale statistics
    SQL Server creates statistics on each index about the distribution of values in each column. The query optimizer then uses this information when estimating the number of rows returned from an operation, and ultimately estimating the cost of using a particular index. As the data in a table changes, these statistics can get out of date. If the query optimizer uses bad statistics, it will calculate incorrect costs for its plans. You will see this discrepancy in an operation in an actual plan by comparing the estimated rows to the actual rows (see Figure 1). The difference in rows highlighted in Figure 1 was created by performing a delete operation on the table.
  3. Parallelism  
    If the machine where SQL Server is installed has more than one CPU, the query optimizer may go through the process of finding its lowest cost plan twice; creating one plan using one processor and a second plan making use of multiple processors (parallelism). It will not decide which of these two plans to run until execution time. When a user requests to see the estimated plan, only one of the plans is shown. This plan may or may not be the one chosen by the query engine when executing the query.
Figure 1: Comparing Estimated Number of Rows to Actual Number of Rows

Which Plan Is Better?

An estimated plan can be returned almost immediately, whereas an actual plan cannot. The actual plan gives a more complete picture, but when supporting a production environment, waiting for a long-running statement to complete in order to get the actual plan is not always practical.
The most frequently used information that can only be found on the actual plan is the actual number of rows returned from each operation. The actual number of rows can be compared with the estimated number of rows. If there is a significant difference between the two, then the statistics are most likely stale. In this case updating the statistics may improve the query performance (see the TechNet article “Statistics” for more information on the subject).
As an alternative to getting the actual number of rows, you can check the last time the statistics on the indexes for a table were updated. This information combined with your knowledge of the activity that takes place on the database will give you good indication of the staleness of the statistics. The following query can be run on the sys.indexes table to return the last date that the statistics were updated:
USE YourDatabase
GO 
SELECT name AS index_name
 , STATS_DATE(OBJECT_ID, index_id)  AS StatsUpdated 
 FROM sys.indexes 
 WHERE OBJECT_ID = OBJECT_ID('YourSchema.YourTable')

Other Information Only Found On Actual Plans

Actual Rewinds and Rebinds are values that only apply to a handful of operators. These values count the number of times certain operators are initialized. A large number of initializations may cause a high I/O usage. For a more complete coverage of the topic, see Grant Fritchey’s book SQL Server Execution Plans
In addition to Actual Rewinds and Rebinds, Number of Executions was introduced in SQL Server 2008. This value is a count of the number of times an operator was executed. For some operators, the number of executions is related to the number of rows returned, and the estimated and actual number of executions will vary in relation to the difference between the number of estimated and actual rows. In some cases, SQL Server cannot estimate the number of executions and will list a value of 1 in the Estimated Number of Executions.  
The estimated plan includes estimated costs, which would logically lead to the conclusion that the actual plan should contain actual costs. However, this is not the case. As was mentioned earlier, cost is merely a calculated number that gives a relative indication of how expensive (in time) an operation will be. It cannot not be compared to any real values such as CPU and I/O time.

Conclusion

Graphical execution plans show the details of all the operations and their sequence that will be (or were for actual plans) used to execute a query. This makes them an excellent tool for analyzing and optimizing slow-running queries. Estimated and actual execution plans both have their advantages and drawbacks. Each is more appropriate under certain circumstances. Read Parts 2 and 3 of this series for more information on graphical execution plans.

References                 

 * The 2008 edition was used for this article, but a second edition was published in 2013.

No comments:

Post a Comment