Applied Business Consultancy Limited
SQL Server 2000 Indexing Overview

INDEX DEFINITION AND PURPOSE


An index in a SQL Server 2000 (SQL Server) database is like an index in a book. The latter enables readers of a book to quickly and efficiently locate the information that is of interest to them through a list of certain keywords and corresponding page numbers. The former enables SQL Server to quickly and efficiently locate the rows of a table that are of interest to a query through a structure containing certain table column values and a series of pointers. Without indexes, a query must perform the expensive task of scanning all rows within a table (a table scan) in order to find the data it requires. Likewise, poorly configured or inappropriate indexes will have a negative affect on performance.

A SQL Server index is structured as a Balanced tree (B-tree). A B-tree consists of three categories of nodes: a root node, intermediate nodes and leaf nodes. The nodes of a SQL Server index are logically implemented as special units of storage known as index pages, which contain certain sorted values from one or more columns in a table (the index key) and pointers to other index pages. All search requests begin at the root node and move through the tree to the appropriate leaf nodes.

In general, indexes should be created on table columns that are frequently referenced in queries. For example, a table containing customer details may frequently be queried based on a particular customer ID. In this instance, an index on the customer ID column would enable the row in question to be found more quickly than performing a table scan, particularly if thousands of rows exist in the table. Index performance considerations are described later in this overview.

INDEX TYPES


There are two types of index in SQL Server: clustered and nonclustered. The primary difference between the two is that a clustered index defines the physical order of the underlying table data whereas a nonclustered index contains a pointer-based logical ordering of the data. Although both take the form of the B-tree described in the preceding section, the two are very different. The fundamental difference is that the leaf nodes in a clustered index contain the actual table rows whereas the leaf nodes in a nonclustered index only contain pointers to the actual table rows.

A clustered index physically sorts table data based on the values in the index key such that the leaf nodes (known as the data pages) contain all of the table's rows in an ordered sequence. Since a clustered index sorts the rows, only one can be defined on a table. Specifying a primary key constraint on a table automatically creates a clustered index on the columns contained within the primary key if one doesn't already exist (in which case a nonclustered index is created). A clustered index is most useful on primary key columns since unique values are enforced, creating smaller B-trees and therefore more efficient indexes. A clustered index can be based on non-unique columns although the index will enforce uniqueness internally and is not recommended in general practice. Figure 1 shows the structure of a clustered index.

If a clustered index exists on a table, the leaf nodes of any nonclustered indexes created on the same table point to the leaf nodes (the data pages) of the clustered index since they contain the actual table rows. A table can have up to 249 nonclustered indexes created on it. If a table does not have a clustered index, it is unsorted and known as a heap. On a heap, the leaf nodes of a nonclustered index contain pointers to the table rows (row IDs). On a table that has a clustered index, the leaf nodes of a nonclustered index contain clustered index keys rather than row IDs. Figure 1 shows the structure of a nonclustered index.

Example Clustered and Nonclustered Index Structures
Figure 1. Example Clustered (Left) and Nonclustered (Right) Index Structures.
The leaf nodes of the clustered index show that table contains last name and first name columns.


INDEX CHARACTERISTICS


A number of characteristics can be defined on a SQL Server index. An index can be defined as unique, meaning the index's key must maintain distinct values. To create a unique index whose key consists of the values in one table column, each value in that column must be distinct. To create a unique index whose key consists of the values in more than one table column (a composite index - see below), the values across all columns must maintain distinctiveness. For example, where a unique index is defined on the last name and national insurance number columns of an employee table, then the combination of the values in these two columns must be distinct. A unique index will also not allow both columns to contain NULL values in more than one row.

A composite index's key spans more than one table column. The primary reason for utilising a composite index is to reduce input/output (I/O) operations - a query on a combination of columns that are contained within an index will locate the data it requires in the index without accessing underlying table data. When a query can locate data solely from the information contained within an index, that index is said to be covering. Consider the following query:

SELECT id, lastName, natInsurNo
FROM employees
WHERE dob >= CAST('1 May 1980 00:00:00' AS smalldatetime)
ORDER BY dob DESC

If a clustered index exists on the id column and a nonclustered index exists on the lastName, natInsurNo and dob columns, the nonclustered index is considered covering. This is because the leaf nodes of the nonclustered index contain the clustered index keys (i.e. the id column values) that are present in the leaf node of the clustered index.

When a row is inserted into a table, SQL Server must locate some space for it. If the table is a heap and contains one or more nonclustered indexes, the row and index page (if all other pages are full) are placed in any available space within the heap. If a clustered index exists on the table, the row must be placed in the appropriate index pages of the clustered index such that the sort order is preserved. If a selected page is full, new space must be created by means of a page split. This creates a new adjacent page and transfers half of the contents of the original page to create more space. Page splits are resource intensive, particularly when the insertion of large rows causes many to occur. To reduce the frequency of page splits, the fill factor of the index pages can be specified upon creation of the index. The fill factor is a percentage that specifies how full index pages should be upon creation. A lower fill factor will cause the index to occupy more space but will reduce the frequency of page splits. In high capacity transaction systems, additional space can be allocated to the intermediate index pages by enabling a pad index option upon index creation.

When a clustered or nonclustered index is created, the index keys can be sorted either in ascending or descending order. If a query is executed that utilises the index (which is decided by the Query Optimiser) and no ORDER BY clause is specified, the result set will be ordered according to the sort order of the index. The implementation of index characteristics is detailed in the following section.

INDEX CREATION AND ADMINISTRATION


There are several ways to create an index in SQL Server including the Enterprise Manager client tool, Transact-SQL (TSQL) statements and the SQL Distributed Management Objects (SQL-DMO) interface. This document examines the first two methods. To create an index in Enterprise Manager, right click on a table, select All Tasks, then Manage Indexes. Click on the New button to display the Create New Index dialog box shown in figure 2.

Create New Index Dialog
Figure 2. Create New Index Dialog Box

From this dialogue box you can define the characteristics described in the preceding section and several others including the column order, whether to ignore duplicate values in a unique index, whether index statistics should not be recomputed and the file group that the index should be created on. Some of these extra characteristics are discussed later. Enterprise Manager also provides the Create Index Wizard, which allows the step-by-step assisted configuration of existing and new indexes. To access the wizard, click the Tools menu and then select Create Index Wizard from the Select Wizard dialog box.

An index can also be created through the execution of CREATE INDEX, CREATE TABLE and ALTER TABLE TSQL statements. Of these three, the CREATE INDEX syntax is most flexible and allows the specification of index information such as the index name, the table name and the index columns:

CREATE [UNIQUE] [CLUSTERED | NONCLUSTERED] INDEX <index_name>
ON [<table_name>] (<column_name> [ASC | DESC] [,...n])
[WITH [<index_property> [,...n]]
[ON <file_group>]

CREATE UNIQUE CLUSTERED INDEX ON employees (id ASC)

If any of the optional clauses are omitted, SQL Server creates an index with the default settings. Some of these default settings overridden by the WITH clause are:
  • FILLFACTOR = x specifies the fill factor
  • IGNORE_DUP_KEY ignores the insertion of duplicate key values
  • DROP_EXISTING replaces an existing index with the same name
  • SORT_IN_TEMPDB ensures sort operations are performed in TempDB
  • STATISTICS_NORECOMPUTE ensures index statistics are not recomputed
SQL Server automatically creates an index when a primary key constraint or unique constraint is created on a table. A nonclustered index will always be created unless a clustered index does not already exist on the table. The CREATE TABLE and ALTER TABLE statements include options that enable the default settings to be overridden, which is shown in the following syntax and examples:

CONSTRAINT <constraint_name>
[PRIMARY KEY | UNIQUE]
[CLUSTERED | NONCLUSTERED]
[WITH FILLFACTOR = <fill_factor>]
[ON <file_group>]

CREATE TABLE employees (id int CONSTRAINT pk_id PRIMARY KEY CLUSTERED) ON PRIMARY

ALTER TABLE employees ADD natInsurNo int CONSTRAINT un_natInsurNo UNIQUE NONCLUSTERED

Index administration involves deleting, rebuilding and renaming existing indexes. An index can be deleted through the Manage Indexes dialog box of Enterprise Manager shown in figure 3 or by executing the DROP INDEX TSQL statement. The TSQL syntax allows multiple indexes to be dropped as long as the index names are separated by commas and are qualified as two part names. Indexes can be rebuilt in one of three ways: each index can be dropped and recreated manually, each existing index can be overwritten by specifying the DROP_EXISTING option upon index creation or a DBCC DBREINDEX TSQL statement can be executed. The former two can be laborious, particularly if several indexes exist on the table. DBCC DBREINDEX rebuilds some or all of a table's indexes using a single statement and avoids some of the problems that may arise from manually dropping and recreating the indexes. An index can also be renamed by dropping it and recreating it with a different name although a simpler solution is to execute the generic sp_rename system stored procedure with the appropriate parameters.

Manage Indexes Dialog
Figure 3. Manage Indexes Dialog Box

INDEX USAGE AND PERFORMANCE


The number and characteristics of indexes created on a database's tables must be carefully considered in order to attain optimal performance. In general, indexes should be created to improve the performance of common queries and index characteristics configured according to requirements, such as uniqueness. By utilising a set of statistics kept on each index, the Query Optimizer is capable of accurately choosing the indexes that should be used in order to optimise the performance of a given query. Index statistics pertain to the distribution of values in columns that indexes use. SQL Server automatically creates and maintains these statistics unless specifically disabled (e.g. when a table's data is unlikely to change). Several specific types of query benefit from the use of indexes:
  • Exact match queries include WHERE clauses for returning a specific row. A unique index is most suitable for this type of query.
  • Wildcard queries that use the LIKE operator in the format "LIKE 'smi%'" benefit from indexes since the index keys start with specific characters.
  • Range queries return results within a specified range (e.g. BETWEEN 'ald%' AND 'smi%'). Clustered indexes are beneficial in these queries since the required rows will be physically adjacent in the index.
  • Table join queries that span one or tables will benefit from indexes on the joined columns since these columns are likely to be accessed frequently.
  • Sorted output queries can also benefit from an index. Omitting an ORDER BY clause and allowing an index to dictate the sort order will improve query performance.
  • Covered queries locate the data they require entirely from an index (a covering index) and reduce I/O operations by eliminating the need to access the underlying table data.
  • Covering indexes are most beneficial for queries that return many rows since the required data is located entirely in the index, therefore reducing I/O and improving query performance. Queries that return few rows would have a less significant improvement in I/O where this type of index is used.
Designing and maintaining appropriate indexes for a database can be a difficult task, especially if database queries change frequently or the database structure is modified frequently. Although such changes may require indexes to be modified, there are several generic guidelines that should be followed in order to design effective indexes. Firstly, since only one clustered index can be defined on a table, it is highly important that it is well designed. Clustered index keys should ideally be unique and of minimal width in order to minimise the overall size of the index and therefore yield better efficiency. Furthering this, any subsequent nonclustered indexes will contain clustered index key values in the leaf nodes so their overall size will also be minimised. Secondly, covering indexes should be used with caution, especially if the table contains many columns and rows. In this situation, the size of covering indexes can often grow extremely quickly and become resource intensive during general maintenance.

Thirdly, resource intensive page splits should be anticipated and acted upon by means of setting appropriate fillfactor and pad index values for indexes. Indexes created on volatile tables (i.e. frequent row insertions) should specify low fillfactors in order to minimise page splits at the expense of index size. In contrast, indexes created on non-volatile tables should specify high fillfactors since page splits are less likely. Fourthly, the number of indexes created on a table should be minimised, particularly if the table is volatile. Every time an INSERT, UPDATE or DELETE statement is executed that involves the table, all indexes that exist on that table must be updated to reflect the changes. If the database is read only, this guideline does not apply since statements that attempt to write to the database cannot be executed. Finally, the sorting feature of indexes should be used in queries in order to save resources that would otherwise be consumed by an ORDER BY clause in the query.

Index Tuning Wizard
Figure 4. Index Tuning Wizard Opening

Since indexing can be a complicated and time-consuming area of database design and maintenance, Enterprise Manager includes an Index Tuning Wizard to provide recommendations for choosing correct indexes across one or more tables. The opening page of the wizard is shown in figure 4. The wizard analyses a representative sample of database activity (a workload) to recommend an ideal set of indexes for a database. If database queries change frequently or the database structure is modified frequently, the wizard can be rerun to reflect these changes. The index tuning wizard accepts a number of workload representations: an SQL script (.sql), a SQL Profiler Trace (.trc or .log), a trace table or a script selected in Query Analyzer.

INDEXING REVIEWS


Indexing is a crucial consideration in optimising database performance. Our database consultants can perform thorough indexing reviews on databases of any size and complexity.

Contact our Consultancy Manager at consultancy@abl.co.uk for further information. Alternatively, use our online contact facility to request more information.

ABOUT THE AUTHOR


Will Tomlin is a Microsoft Certified Database Administrator (MCDBA)-qualified consultant with Applied Business Consultancy Limited. Will specialises in the design and development of high performance SQL Server solutions and has worked with prestigious clients across a variety of business sectors.

This overview is available as a PDF document at our Downloads page.
Site Map
© 1998 - 2007 Applied Business Consultancy Limited