This section explains how to use and interpret the
showplan command to better understand and utilize the SQL Server query optimizer.
When you send a SQL statement to the Sybase SQL Server, the request first goes to a cost-based query optimizer whose job it is to find the most efficient data access path to fulfill the request. To do this, the optimizer examines such information as:
- The structure of any indices defined on the table(s) involved
- The distribution of data within the indices
- The number of rows in the table(s) involved
- The number of data pages used by the table(s) involved
- The amount of data cache SQL Server is currently using
- The access path that would require the least amount of I/O and, therefore, would be the fastest
Once an optimal access path is calculated, it is stored in a query plan within the procedure cache.
The
showplan allows you to view the plan the optimizer has chosen and to follow each step that the optimizer took when joining tables, reading from an index or using one of several other methods to determine cost efficiency. To invoke the showplan command, enter:
1> set showplan on
2> go
This command causes SQL Server to display query plan information for every SQL statement executed within the scope of the SQL Server session.
Since the determination of a query plan is performed independently from the actual data retrieval or modification, it is possible to examine and tune query plans without actually executing the SQL statement. This can be accomplished by instructing SQL Server not to execute any SQL statements via the following command:
1> set noexec on
2> go
Note
Issue
noexec after
showplan or the
set showplan command will not execute.
For more information about executing the
showplan command, refer to the SQL Server Performance and Tuning Guide.
Note
The
showplan command does not function within stored procedures or triggers. However, if you set it to on and then execute a stored procedure or a command that fires a trigger, you can see the procedure or trigger output.
Use the following examples to analyze a query plan. In all cases, examples use the
pubs database provided with each SQL Server release.
Interpreting
showplan Output
The output of the
showplan command consists of many different types of statements, depending on the details of the access path that is being used. The following sections describe some of the more common statements.
STEP
n
This statement is added to the
showplan output for every query, where
n is an integer, beginning with 1. For some queries, SQL Server cannot effectively retrieve the results in a single step, and must break the query plan into several steps. For example, if a query includes a
group by clause, the query needs to be broken into at least two steps: one to select the qualifying rows from the table and another to group them.
The following query demonstrates a single-step query and its
showplan output:
1> select au_lname, au_fname from authors
2> where city = “Oakland”
3> go
STEP 1
The type of query is SELECT
FROM TABLE
authors
Nested iteration
Table Scan
A multiple-step example is shown in the next section.
The Type of Query Is SELECT (into a Worktable)
This
showplan statement indicates that SQL Server needs to insert some of the query results into an intermediate worktable and, later in the query processing, select the values from that table. This is most often seen with a query which involves a
group by clause, as the results are first put into a worktable, and then the qualifying rows in the worktable are grouped based on the given column in the
group by clause.
The following query returns a list of cities and indicates the number of authors who live in each city. The query plan is composed of two steps: the first step selects the rows into a worktable, and the second step retrieves the grouped rows from the worktable.
1> select city, total_authors = count (*)
2> from authors group by city
3> go
STEP 1
The type of query is SELECT (into a worktable)
GROUP BY
Vector Aggregate
FROM TABLE
authors
Nested iteration
Table Scan
TO TABLE
Worktable
STEP 2
The type of query is SELECT
FROM TABLE
Worktable
Nested iteration
Table Scan
The Type of Query Is
query_type
This statement describes the type of query for each step. For most user queries, the value for
query_ type is
select,
insert,
update, or
delete. If
showplan is turned on while other commands are issued, the
query_ type reflects the command that was issued. The following two examples show output for different queries or commands:
1> create table Mytab (col1 int)
2> go
STEP 1
The type of query is CREATE TABLE
1> insert publishers
2> values (“9904”, “NewPubs”, “Nome”, “AL”)
3> go
STEP 1
The type of query is INSERT
The update mode is direct
Table Scan
TO TABLE
publishers
The Update Mode Is Deferred
There are two methods or, modes, that SQL Server can use to perform update operations such as
insert,
delete,
update, and
select into. These methods are called
deferred update and
direct update. When the deferred method is used, the changes are applied to all rows of the table by making log records in the transaction log to reflect the old and new value of the column(s) being modified (in the case of
update operations), or the values that will be inserted or deleted (in the case of
insert and
delete).
When all log records have been constructed, the changes are applied to the data pages. This method generates more log records than a direct update, but it has the advantage of allowing commands to execute which may cascade changes throughout a table. For example, consider a table that has a column
col1 with a unique index on it and data values numbered consecutively from 1 to 100 in that column. Execute an
update statement to increase the value in each row by one:
1> update Mytable set col1 = col1 + 1
2> go
STEP 1
The type of query is UPDATE
The update mode is deferred
FROM TABLE
Mytable
Nested iteration
Table scan
TO TABLE
Mytable
Consider the consequences of starting at the first row in the table, and updating each row until the end of the table. This violates the unique index. First, updating the first row (which has an initial value of 1) to 2 would cause an error, since 2 already exists in the table. Second, by updating the second row or any row in the table except the last one does the same.
Deferred updates avoid unique index violations. The log records are created to show the new values for each row, the existing rows are deleted and new values are inserted. In the following example, the table
authors has no clustered index or unique index:
1> insert authors select * from authors
2> go
STEP 1
The type of query is INSERT
The update mode is deferred
FROM TABLE
authors
Nested iteration
Table Scan
TO TABLE
authors
Because the table does not have a clustered index, new rows are added at the end of the table. The query processor distinguishes between existing rows now in the table (before the
insert command) from the rows to be inserted, thus avoiding the continuous loop of selecting a row, inserting it at the end of the table, re-selecting the row just inserted and reinserting it. The deferred insertion method first creates the log records to show all currently existing values in the table. Then SQL Server rereads those log records to insert the rows into the table.
The Update Mode Is Direct
Whenever possible, SQL Server tries to directly apply updates to tables, since this is faster and creates fewer log records than the deferred method. Depending on the type of command, one or more criteria must be met in order for SQL Server to perform the update using the direct method. The criteria are as follows:
- insert Using the direct method, the table into which the rows are being inserted cannot be a table which is being read from in the same command. The second query example in the previous section demonstrates this, where the rows are being inserted into the same table in which they are being selected from. In addition, if rows are being inserted into the target table, and one or more of the target table’s columns appear in the where clause of the query, then the deferred method, rather than the direct method, will be used.
- select into When a table is being populated with data by means of a select into command, the direct method will always be used to insert the new rows.
- delete For the direct update method to be used for delete, the query optimizer must be able to determine that either zero or one row qualifies for the delete. The only way to verify this is to check that one unique index exists on the table, which is qualified in the where clause of the delete command, and the target table is not joined with any other table(s).
- update For the direct update method to be used for update commands, the same criteria apply as for delete: a unique index must exist so that the query optimizer can determine that no more than one row qualifies for the update, and the only table in the update command is the target table to update. Also, all updated columns must be fixed-length datatypes, not variable- length datatypes. Note that any column that allows null values is internally stored by SQL Server as a variable-length datatype column.
1> delete from authors
2> where au_id = “172-32-1176”
3> go
STEP 1
The type of query is DELETE
The update mode is direct
FROM TABLE
authors
Nested iteration
Using Clustered Index
TO TABLE
authors
1> update titles set type = ‘popular_comp’
2> where title_id = “BU2075”
3> go
STEP 1
The type of query is UPDATE
The update mode is direct
FROM TABLE
titles
Nested iteration
Using Clustered Index
TO TABLE
titles
1> update titles set price = $5.99
2> where title_id = “BU2075”
3> go
STEP 1
The type of query is UPDATE
The update mode is deferred
FROM TABLE
titles
Nested iteration
Using Clustered Index
TO TABLE
titles
Note that the only difference between the second and third example queries is the column of the table which is updated. In the second query, the direct update method is used, whereas in
the third query, the deferred method is used. This difference occurs because of the datatype of the column being updated: the
titles.type column is defined as “char(12) NOT NULL” where the
titles.price column is defined as “money NULL”. Since the
titles.price column is not a fixed-length datatype, the direct method cannot be used.
GROUP BY
This statement appears in the
showplan output for any query that contains a
group by clause. Queries that contain a
group by clause are always two-step queries: the first step selects the qualifying rows into a table and groups them; the second step returns the rows from the table as seen in the following example:
1> select type, avg (advance),
sum(ytd_sales)
2> from titles group by type
3> go
STEP 1
The type of query is SELECT (into a worktable)
GROUP BY
Vector Aggregate
FROM TABLE
titles
Nested iteration
Table Scan
TO TABLE
Worktable
STEP 2
The type of query is SELECT
FROM TABLE
Worktable
Nested iteration
Table Scan
Scalar Aggregate
Transact-SQL includes the aggregate functions
avg,
count,
max,
min, and
sum. Whenever you use an aggregate function in a
select statement that does not include a
group by clause, the result is a single value, regardless of whether it operates on all table rows or on a subset of the rows defined in the
where clause. When an aggregate function produces a single value, the function is called a
scalar aggregate and
showplan lists it that way as seen in the
following example:
1> select avg(advance), sum(ytd_sales) from
titles
2> where type = “business”
3> go
STEP 1
The type of query is SELECT
Scalar aggregate
FROM TABLE
titles
Nested iteration
Table scan
STEP 2
The type of query is SELECT
Table Scan
showplan considers this a two-step query, which is similar to the
group by output. Since the query contains a scalar aggregate which will return a single value, SQL Server keeps a “variable” internally to store the result of the aggregate function. It can be thought of as a temporary storage space to keep a running total of the aggregate function as the qualifying rows from the table are evaluated. After all rows are evaluated from the table in step 1, the final value of the variable is selected in step 2 to return the scalar aggregate result.
Vector Aggregates
When a
group by clause is used in a query that also includes an aggregate function, the aggregate function produces a value for each group. These values are called
vector aggregates. The
vector aggregate statement from
showplan indicates that the query includes a vector aggregate. The following example query includes a vector aggregate:
1> select title_id, avg (qty) from sales
2> group by title_id
3> go
STEP 1
The type of query is SELECT (into a worktable)
GROUP BY
Vector Aggregate
FROM TABLE
sales
Nested iteration
Table Scan
TO TABLE
Worktable
STEP 2
The type of query is SELECT
FROM TABLE
worktable
Nested iteration
Table Scan
from table Statement
This
showplan output shows the table from which the query reads. In most queries, the
from table is followed by the table’s name. In other cases, it may show that it is selecting from a worktable. The significant fact is that the
from table output show the query optimizer’s order for joining tables. The order in which the tables are listed is the order in which the tables are joined. This order often differs from the order in which tables are listed in the query’s
from or
where clauses. The reason for this is that the query
optimizer checks many join orders for the tables and picks the order that uses the fewest I/Os.
1> select authors.au_id, au_fname, au_lname
2> from authors, titleauthor, titles
3> where authors.au_id = titlesauthor.au_id
4> and titleauthor.title_id =
titles.title_id
5> and titles.type = “psychology”
6> go
STEP 1
The type of query is SELECT
FROM TABLE
TITLES
Nested iteration
Table Scan
FROM TABLE
TITLEAUTHOR
Nested iteration
Table Scan
FROM TABLE
authors
Nested iteration
Table Scan
This query illustrates the join order that the query optimizer chose for the tables, which is not the order listed in either the
from or
where clauses. By examining the order of the
from table statements, it can be seen that the qualifying rows from the titles table are first located with the search clause
titles.type = “psychology”. Those rows are then joined with the
titleauthor table using the join clause
titleauthor.title_id = titles.title_id. Finally, the
titleauthor table is joined with the
authors table to retrieve the desired columns using the join clause
authors.au_id = titleauthor.au_id.
to table Statement
When you issue a command that tries to modify one or more table rows, such as
insert,
delete,
update,
or select into, the
to table statement shows the target table that is being modified. If the operation requires an intermediate step and inserts the rows into a worktable, the
to table statement names the worktable instead of the user table.
1> insert sales
2> values (“8042”, “QA973”, “7/15/94”, 7,
3> “Net 30”, “PC1035”)
4> go
STEP 1
The type of query is INSERT
The update mode is direct
TO TABLE
sales
1> update publishers
2> set city = “Los Angeles”
3> where pub_id = “1389”
4> go
STEP 1
The type of query is UPDATE
The update mode is deferred
FROM TABLE
publishers
Nested iteration
Using Clustered Index
TO TABLE
publishers
Note that the
showplan for the second query indicates that the
publishers table is used for both
from table and
to table. With
update operations, the query optimizer must first read the table containing the row(s) to be updated, resulting in the
from table statement, and then must modify the row(s), resulting in the
to table statement.
Worktable
For some queries, such as those that require ordered or grouped output, the query optimizer creates its own temporary table called a worktable. The worktable holds all the intermediate results of the query where they are ordered and/or grouped, and then the final select is done. When all results are returned, the table is dropped automatically. The
tempdb database holds all temporary tables so the System Administrator may need to increase the size of that database to accommodate very large worktables. For more information about worktables, refer to Chapter 8, “The tempdb Database.”
Since the query optimizer creates these worktables for its own internal use, the worktable names are not listed in the
tempdb..sysobjects table.
Nested Iteration
The nested iteration is the default technique used to join table and return rows from a table. It indicates that the query optimizer uses one or more sets of loops to read a table and fetch a row, qualify the row based on the search criteria given in the
where clause, return the row to the front end, and then loop again for the next row. The following example shows the query optimizer doing nested iterations through each of the tables in the join:
1> select title_id, tile
2> from titles, publishers
3> where titles.pub_id = publishers.pub_id
4> and publishers.pub_id = ‘1389’
5> go
STEP 1
The type of query is SELECT
FROM TABLE
publishers
Nested iteration
Using clustered index
FROM TABLE
titles
Nested iteration
Table Scan
Table Scan
This
showplan statement identifies the method used to fetch the physical result rows from the given table. When the
table scan method is used, execution begins with the first row on the table. Then each row is fetched and compared with the conditions set in the
where clause, then returned as valid data if the conditions are met. No matter how many rows qualify, every row in the table must be checked, and this causes problems if the table is large (the scan has a high I/O overhead). If a table has one or more indexes on it, the query optimizer may still choose a table scan
instead of reading the index. The following query shows a typical table scan:
1> select au_lname, au_fname
2> from authors
3> go
STEP 1
The type of query is SELECT
FROM TABLE
authors
Nested iteration
Table Scan