Using set showplan

This section explains how to use and interpret the showplan command to better understand and utilize the SQL Server query optimizer. When you send a SQL statement to the Sybase SQL Server, the request first goes to a cost-based query optimizer whose job it is to find the most efficient data access path to fulfill the request. To do this, the optimizer examines such information as:
  • The structure of any indices defined on the table(s) involved
  • The distribution of data within the indices
  • The number of rows in the table(s) involved
  • The number of data pages used by the table(s) involved
  • The amount of data cache SQL Server is currently using
  • The access path that would require the least amount of I/O and, therefore, would be the fastest
Once an optimal access path is calculated, it is stored in a query plan within the procedure cache. The showplan allows you to view the plan the optimizer has chosen and to follow each step that the optimizer took when joining tables, reading from an index or using one of several other methods to determine cost efficiency. To invoke the showplan command, enter: 1> set showplan on 2> go This command causes SQL Server to display query plan information for every SQL statement executed within the scope of the SQL Server session. Since the determination of a query plan is performed independently from the actual data retrieval or modification, it is possible to examine and tune query plans without actually executing the SQL statement. This can be accomplished by instructing SQL Server not to execute any SQL statements via the following command: 1> set noexec on 2> go Note Issue noexec after showplan or the set showplan command will not execute. For more information about executing the showplan command, refer to the SQL Server Performance and Tuning Guide. Note The showplan command does not function within stored procedures or triggers. However, if you set it to on and then execute a stored procedure or a command that fires a trigger, you can see the procedure or trigger output. Use the following examples to analyze a query plan. In all cases, examples use the pubs database provided with each SQL Server release. Interpreting showplan Output The output of the showplan command consists of many different types of statements, depending on the details of the access path that is being used. The following sections describe some of the more common statements. STEP n This statement is added to the showplan output for every query, where n is an integer, beginning with 1. For some queries, SQL Server cannot effectively retrieve the results in a single step, and must break the query plan into several steps. For example, if a query includes a group by clause, the query needs to be broken into at least two steps: one to select the qualifying rows from the table and another to group them. The following query demonstrates a single-step query and its showplan output: 1> select au_lname, au_fname from authors 2> where city = “Oakland” 3> go STEP 1 The type of query is SELECT FROM TABLE authors Nested iteration Table Scan A multiple-step example is shown in the next section. The Type of Query Is SELECT (into a Worktable) This showplan statement indicates that SQL Server needs to insert some of the query results into an intermediate worktable and, later in the query processing, select the values from that table. This is most often seen with a query which involves a group by clause, as the results are first put into a worktable, and then the qualifying rows in the worktable are grouped based on the given column in the group by clause. The following query returns a list of cities and indicates the number of authors who live in each city. The query plan is composed of two steps: the first step selects the rows into a worktable, and the second step retrieves the grouped rows from the worktable. 1> select city, total_authors = count (*) 2> from authors group by city 3> go STEP 1 The type of query is SELECT (into a worktable) GROUP BY Vector Aggregate FROM TABLE authors Nested iteration Table Scan TO TABLE Worktable STEP 2 The type of query is SELECT FROM TABLE Worktable Nested iteration Table Scan The Type of Query Is query_type This statement describes the type of query for each step. For most user queries, the value for query_ type is select, insert, update, or delete. If showplan is turned on while other commands are issued, the query_ type reflects the command that was issued. The following two examples show output for different queries or commands: 1> create table Mytab (col1 int) 2> go STEP 1 The type of query is CREATE TABLE 1> insert publishers 2> values (“9904”, “NewPubs”, “Nome”, “AL”) 3> go STEP 1 The type of query is INSERT The update mode is direct Table Scan TO TABLE publishers The Update Mode Is Deferred There are two methods or, modes, that SQL Server can use to perform update operations such as insert, delete, update, and select into. These methods are called deferred update and direct update. When the deferred method is used, the changes are applied to all rows of the table by making log records in the transaction log to reflect the old and new value of the column(s) being modified (in the case of update operations), or the values that will be inserted or deleted (in the case of insert and delete). When all log records have been constructed, the changes are applied to the data pages. This method generates more log records than a direct update, but it has the advantage of allowing commands to execute which may cascade changes throughout a table. For example, consider a table that has a column col1 with a unique index on it and data values numbered consecutively from 1 to 100 in that column. Execute an update statement to increase the value in each row by one: 1> update Mytable set col1 = col1 + 1 2> go STEP 1 The type of query is UPDATE The update mode is deferred FROM TABLE Mytable Nested iteration Table scan TO TABLE Mytable Consider the consequences of starting at the first row in the table, and updating each row until the end of the table. This violates the unique index. First, updating the first row (which has an initial value of 1) to 2 would cause an error, since 2 already exists in the table. Second, by updating the second row or any row in the table except the last one does the same. Deferred updates avoid unique index violations. The log records are created to show the new values for each row, the existing rows are deleted and new values are inserted. In the following example, the table authors has no clustered index or unique index: 1> insert authors select * from authors 2> go STEP 1 The type of query is INSERT The update mode is deferred FROM TABLE authors Nested iteration Table Scan TO TABLE authors Because the table does not have a clustered index, new rows are added at the end of the table. The query processor distinguishes between existing rows now in the table (before the insert command) from the rows to be inserted, thus avoiding the continuous loop of selecting a row, inserting it at the end of the table, re-selecting the row just inserted and reinserting it. The deferred insertion method first creates the log records to show all currently existing values in the table. Then SQL Server rereads those log records to insert the rows into the table. The Update Mode Is Direct Whenever possible, SQL Server tries to directly apply updates to tables, since this is faster and creates fewer log records than the deferred method. Depending on the type of command, one or more criteria must be met in order for SQL Server to perform the update using the direct method. The criteria are as follows:
  • insert ­ Using the direct method, the table into which the rows are being inserted cannot be a table which is being read from in the same command. The second query example in the previous section demonstrates this, where the rows are being inserted into the same table in which they are being selected from. In addition, if rows are being inserted into the target table, and one or more of the target table’s columns appear in the where clause of the query, then the deferred method, rather than the direct method, will be used.
  • select into ­ When a table is being populated with data by means of a select into command, the direct method will always be used to insert the new rows.
  • delete ­ For the direct update method to be used for delete, the query optimizer must be able to determine that either zero or one row qualifies for the delete. The only way to verify this is to check that one unique index exists on the table, which is qualified in the where clause of the delete command, and the target table is not joined with any other table(s).
  • update ­ For the direct update method to be used for update commands, the same criteria apply as for delete: a unique index must exist so that the query optimizer can determine that no more than one row qualifies for the update, and the only table in the update command is the target table to update. Also, all updated columns must be fixed-length datatypes, not variable- length datatypes. Note that any column that allows null values is internally stored by SQL Server as a variable-length datatype column.
1> delete from authors 2> where au_id = “172-32-1176” 3> go STEP 1 The type of query is DELETE The update mode is direct FROM TABLE authors Nested iteration Using Clustered Index TO TABLE authors 1> update titles set type = ‘popular_comp’ 2> where title_id = “BU2075” 3> go STEP 1 The type of query is UPDATE The update mode is direct FROM TABLE titles Nested iteration Using Clustered Index TO TABLE titles 1> update titles set price = $5.99 2> where title_id = “BU2075” 3> go STEP 1 The type of query is UPDATE The update mode is deferred FROM TABLE titles Nested iteration Using Clustered Index TO TABLE titles Note that the only difference between the second and third example queries is the column of the table which is updated. In the second query, the direct update method is used, whereas in the third query, the deferred method is used. This difference occurs because of the datatype of the column being updated: the titles.type column is defined as “char(12) NOT NULL” where the titles.price column is defined as “money NULL”. Since the titles.price column is not a fixed-length datatype, the direct method cannot be used. GROUP BY This statement appears in the showplan output for any query that contains a group by clause. Queries that contain a group by clause are always two-step queries: the first step selects the qualifying rows into a table and groups them; the second step returns the rows from the table as seen in the following example: 1> select type, avg (advance), sum(ytd_sales) 2> from titles group by type 3> go STEP 1 The type of query is SELECT (into a worktable) GROUP BY Vector Aggregate FROM TABLE titles Nested iteration Table Scan TO TABLE Worktable STEP 2 The type of query is SELECT FROM TABLE Worktable Nested iteration Table Scan Scalar Aggregate Transact-SQL includes the aggregate functions avg, count, max, min, and sum. Whenever you use an aggregate function in a select statement that does not include a group by clause, the result is a single value, regardless of whether it operates on all table rows or on a subset of the rows defined in the where clause. When an aggregate function produces a single value, the function is called a scalar aggregate and showplan lists it that way as seen in the following example: 1> select avg(advance), sum(ytd_sales) from titles 2> where type = “business” 3> go STEP 1 The type of query is SELECT Scalar aggregate FROM TABLE titles Nested iteration Table scan STEP 2 The type of query is SELECT Table Scan showplan considers this a two-step query, which is similar to the group by output. Since the query contains a scalar aggregate which will return a single value, SQL Server keeps a “variable” internally to store the result of the aggregate function. It can be thought of as a temporary storage space to keep a running total of the aggregate function as the qualifying rows from the table are evaluated. After all rows are evaluated from the table in step 1, the final value of the variable is selected in step 2 to return the scalar aggregate result. Vector Aggregates When a group by clause is used in a query that also includes an aggregate function, the aggregate function produces a value for each group. These values are called vector aggregates. The vector aggregate statement from showplan indicates that the query includes a vector aggregate. The following example query includes a vector aggregate: 1> select title_id, avg (qty) from sales 2> group by title_id 3> go STEP 1 The type of query is SELECT (into a worktable) GROUP BY Vector Aggregate FROM TABLE sales Nested iteration Table Scan TO TABLE Worktable STEP 2 The type of query is SELECT FROM TABLE worktable Nested iteration Table Scan from table Statement This showplan output shows the table from which the query reads. In most queries, the from table is followed by the table’s name. In other cases, it may show that it is selecting from a worktable. The significant fact is that the from table output show the query optimizer’s order for joining tables. The order in which the tables are listed is the order in which the tables are joined. This order often differs from the order in which tables are listed in the query’s from or where clauses. The reason for this is that the query optimizer checks many join orders for the tables and picks the order that uses the fewest I/Os. 1> select authors.au_id, au_fname, au_lname 2> from authors, titleauthor, titles 3> where authors.au_id = titlesauthor.au_id 4> and titleauthor.title_id = titles.title_id 5> and titles.type = “psychology” 6> go STEP 1 The type of query is SELECT FROM TABLE TITLES Nested iteration Table Scan FROM TABLE TITLEAUTHOR Nested iteration Table Scan FROM TABLE authors Nested iteration Table Scan This query illustrates the join order that the query optimizer chose for the tables, which is not the order listed in either the from or where clauses. By examining the order of the from table statements, it can be seen that the qualifying rows from the titles table are first located with the search clause titles.type = “psychology”. Those rows are then joined with the titleauthor table using the join clause titleauthor.title_id = titles.title_id. Finally, the titleauthor table is joined with the authors table to retrieve the desired columns using the join clause authors.au_id = titleauthor.au_id. to table Statement When you issue a command that tries to modify one or more table rows, such as insert, delete, update, or select into, the to table statement shows the target table that is being modified. If the operation requires an intermediate step and inserts the rows into a worktable, the to table statement names the worktable instead of the user table. 1> insert sales 2> values (“8042”, “QA973”, “7/15/94”, 7, 3> “Net 30”, “PC1035”) 4> go STEP 1 The type of query is INSERT The update mode is direct TO TABLE sales 1> update publishers 2> set city = “Los Angeles” 3> where pub_id = “1389” 4> go STEP 1 The type of query is UPDATE The update mode is deferred FROM TABLE publishers Nested iteration Using Clustered Index TO TABLE publishers Note that the showplan for the second query indicates that the publishers table is used for both from table and to table. With update operations, the query optimizer must first read the table containing the row(s) to be updated, resulting in the from table statement, and then must modify the row(s), resulting in the to table statement. Worktable For some queries, such as those that require ordered or grouped output, the query optimizer creates its own temporary table called a worktable. The worktable holds all the intermediate results of the query where they are ordered and/or grouped, and then the final select is done. When all results are returned, the table is dropped automatically. The tempdb database holds all temporary tables so the System Administrator may need to increase the size of that database to accommodate very large worktables. For more information about worktables, refer to Chapter 8, “The tempdb Database.” Since the query optimizer creates these worktables for its own internal use, the worktable names are not listed in the tempdb..sysobjects table. Nested Iteration The nested iteration is the default technique used to join table and return rows from a table. It indicates that the query optimizer uses one or more sets of loops to read a table and fetch a row, qualify the row based on the search criteria given in the where clause, return the row to the front end, and then loop again for the next row. The following example shows the query optimizer doing nested iterations through each of the tables in the join: 1> select title_id, tile 2> from titles, publishers 3> where titles.pub_id = publishers.pub_id 4> and publishers.pub_id = ‘1389’ 5> go STEP 1 The type of query is SELECT FROM TABLE publishers Nested iteration Using clustered index FROM TABLE titles Nested iteration Table Scan Table Scan This showplan statement identifies the method used to fetch the physical result rows from the given table. When the table scan method is used, execution begins with the first row on the table. Then each row is fetched and compared with the conditions set in the where clause, then returned as valid data if the conditions are met. No matter how many rows qualify, every row in the table must be checked, and this causes problems if the table is large (the scan has a high I/O overhead). If a table has one or more indexes on it, the query optimizer may still choose a table scan instead of reading the index. The following query shows a typical table scan: 1> select au_lname, au_fname 2> from authors 3> go STEP 1 The type of query is SELECT FROM TABLE authors Nested iteration Table Scan