![]() |
||||||||||||
|
![]() INDEX DEFINITION AND PURPOSEAn index in a SQL Server 2000 (SQL Server) database is like an index in a book. The latter enables readers of a book to quickly and efficiently locate the information that is of interest to them through a list of certain keywords and corresponding page numbers. The former enables SQL Server to quickly and efficiently locate the rows of a table that are of interest to a query through a structure containing certain table column values and a series of pointers. Without indexes, a query must perform the expensive task of scanning all rows within a table (a table scan) in order to find the data it requires. Likewise, poorly configured or inappropriate indexes will have a negative affect on performance. A SQL Server index is structured as a Balanced tree (B-tree). A B-tree consists of three categories of nodes: a root node, intermediate nodes and leaf nodes. The nodes of a SQL Server index are logically implemented as special units of storage known as index pages, which contain certain sorted values from one or more columns in a table (the index key) and pointers to other index pages. All search requests begin at the root node and move through the tree to the appropriate leaf nodes. In general, indexes should be created on table columns that are frequently referenced in queries. For example, a table containing customer details may frequently be queried based on a particular customer ID. In this instance, an index on the customer ID column would enable the row in question to be found more quickly than performing a table scan, particularly if thousands of rows exist in the table. Index performance considerations are described later in this overview. INDEX TYPESThere are two types of index in SQL Server: clustered and nonclustered. The primary difference between the two is that a clustered index defines the physical order of the underlying table data whereas a nonclustered index contains a pointer-based logical ordering of the data. Although both take the form of the B-tree described in the preceding section, the two are very different. The fundamental difference is that the leaf nodes in a clustered index contain the actual table rows whereas the leaf nodes in a nonclustered index only contain pointers to the actual table rows. A clustered index physically sorts table data based on the values in the index key such that the leaf nodes (known as the data pages) contain all of the table's rows in an ordered sequence. Since a clustered index sorts the rows, only one can be defined on a table. Specifying a primary key constraint on a table automatically creates a clustered index on the columns contained within the primary key if one doesn't already exist (in which case a nonclustered index is created). A clustered index is most useful on primary key columns since unique values are enforced, creating smaller B-trees and therefore more efficient indexes. A clustered index can be based on non-unique columns although the index will enforce uniqueness internally and is not recommended in general practice. Figure 1 shows the structure of a clustered index. If a clustered index exists on a table, the leaf nodes of any nonclustered indexes created on the same table point to the leaf nodes (the data pages) of the clustered index since they contain the actual table rows. A table can have up to 249 nonclustered indexes created on it. If a table does not have a clustered index, it is unsorted and known as a heap. On a heap, the leaf nodes of a nonclustered index contain pointers to the table rows (row IDs). On a table that has a clustered index, the leaf nodes of a nonclustered index contain clustered index keys rather than row IDs. Figure 1 shows the structure of a nonclustered index. ![]() Figure 1. Example Clustered (Left) and Nonclustered (Right) Index Structures. The leaf nodes of the clustered index show that table contains last name and first name columns. INDEX CHARACTERISTICSA number of characteristics can be defined on a SQL Server index. An index can be defined as unique, meaning the index's key must maintain distinct values. To create a unique index whose key consists of the values in one table column, each value in that column must be distinct. To create a unique index whose key consists of the values in more than one table column (a composite index - see below), the values across all columns must maintain distinctiveness. For example, where a unique index is defined on the last name and national insurance number columns of an employee table, then the combination of the values in these two columns must be distinct. A unique index will also not allow both columns to contain NULL values in more than one row. A composite index's key spans more than one table column. The primary reason for utilising a composite index is to reduce input/output (I/O) operations - a query on a combination of columns that are contained within an index will locate the data it requires in the index without accessing underlying table data. When a query can locate data solely from the information contained within an index, that index is said to be covering. Consider the following query: SELECT id, lastName, natInsurNo FROM employees WHERE dob >= CAST('1 May 1980 00:00:00' AS smalldatetime) ORDER BY dob DESC If a clustered index exists on the id column and a nonclustered index exists on the lastName, natInsurNo and dob columns, the nonclustered index is considered covering. This is because the leaf nodes of the nonclustered index contain the clustered index keys (i.e. the id column values) that are present in the leaf node of the clustered index. When a row is inserted into a table, SQL Server must locate some space for it. If the table is a heap and contains one or more nonclustered indexes, the row and index page (if all other pages are full) are placed in any available space within the heap. If a clustered index exists on the table, the row must be placed in the appropriate index pages of the clustered index such that the sort order is preserved. If a selected page is full, new space must be created by means of a page split. This creates a new adjacent page and transfers half of the contents of the original page to create more space. Page splits are resource intensive, particularly when the insertion of large rows causes many to occur. To reduce the frequency of page splits, the fill factor of the index pages can be specified upon creation of the index. The fill factor is a percentage that specifies how full index pages should be upon creation. A lower fill factor will cause the index to occupy more space but will reduce the frequency of page splits. In high capacity transaction systems, additional space can be allocated to the intermediate index pages by enabling a pad index option upon index creation. When a clustered or nonclustered index is created, the index keys can be sorted either in ascending or descending order. If a query is executed that utilises the index (which is decided by the Query Optimiser) and no ORDER BY clause is specified, the result set will be ordered according to the sort order of the index. The implementation of index characteristics is detailed in the following section. INDEX CREATION AND ADMINISTRATIONThere are several ways to create an index in SQL Server including the Enterprise Manager client tool, Transact-SQL (TSQL) statements and the SQL Distributed Management Objects (SQL-DMO) interface. This document examines the first two methods. To create an index in Enterprise Manager, right click on a table, select All Tasks, then Manage Indexes. Click on the New button to display the Create New Index dialog box shown in figure 2. ![]() Figure 2. Create New Index Dialog Box From this dialogue box you can define the characteristics described in the preceding section and several others including the column order, whether to ignore duplicate values in a unique index, whether index statistics should not be recomputed and the file group that the index should be created on. Some of these extra characteristics are discussed later. Enterprise Manager also provides the Create Index Wizard, which allows the step-by-step assisted configuration of existing and new indexes. To access the wizard, click the Tools menu and then select Create Index Wizard from the Select Wizard dialog box. An index can also be created through the execution of CREATE INDEX, CREATE TABLE and ALTER TABLE TSQL statements. Of these three, the CREATE INDEX syntax is most flexible and allows the specification of index information such as the index name, the table name and the index columns: CREATE [UNIQUE] [CLUSTERED | NONCLUSTERED] INDEX <index_name> ON [<table_name>] (<column_name> [ASC | DESC] [,...n]) [WITH [<index_property> [,...n]] [ON <file_group>] CREATE UNIQUE CLUSTERED INDEX ON employees (id ASC) If any of the optional clauses are omitted, SQL Server creates an index with the default settings. Some of these default settings overridden by the WITH clause are:
CONSTRAINT <constraint_name> [PRIMARY KEY | UNIQUE] [CLUSTERED | NONCLUSTERED] [WITH FILLFACTOR = <fill_factor>] [ON <file_group>] CREATE TABLE employees (id int CONSTRAINT pk_id PRIMARY KEY CLUSTERED) ON PRIMARY ALTER TABLE employees ADD natInsurNo int CONSTRAINT un_natInsurNo UNIQUE NONCLUSTERED Index administration involves deleting, rebuilding and renaming existing indexes. An index can be deleted through the Manage Indexes dialog box of Enterprise Manager shown in figure 3 or by executing the DROP INDEX TSQL statement. The TSQL syntax allows multiple indexes to be dropped as long as the index names are separated by commas and are qualified as two part names. Indexes can be rebuilt in one of three ways: each index can be dropped and recreated manually, each existing index can be overwritten by specifying the DROP_EXISTING option upon index creation or a DBCC DBREINDEX TSQL statement can be executed. The former two can be laborious, particularly if several indexes exist on the table. DBCC DBREINDEX rebuilds some or all of a table's indexes using a single statement and avoids some of the problems that may arise from manually dropping and recreating the indexes. An index can also be renamed by dropping it and recreating it with a different name although a simpler solution is to execute the generic sp_rename system stored procedure with the appropriate parameters. ![]() Figure 3. Manage Indexes Dialog Box INDEX USAGE AND PERFORMANCEThe number and characteristics of indexes created on a database's tables must be carefully considered in order to attain optimal performance. In general, indexes should be created to improve the performance of common queries and index characteristics configured according to requirements, such as uniqueness. By utilising a set of statistics kept on each index, the Query Optimizer is capable of accurately choosing the indexes that should be used in order to optimise the performance of a given query. Index statistics pertain to the distribution of values in columns that indexes use. SQL Server automatically creates and maintains these statistics unless specifically disabled (e.g. when a table's data is unlikely to change). Several specific types of query benefit from the use of indexes:
Thirdly, resource intensive page splits should be anticipated and acted upon by means of setting appropriate fillfactor and pad index values for indexes. Indexes created on volatile tables (i.e. frequent row insertions) should specify low fillfactors in order to minimise page splits at the expense of index size. In contrast, indexes created on non-volatile tables should specify high fillfactors since page splits are less likely. Fourthly, the number of indexes created on a table should be minimised, particularly if the table is volatile. Every time an INSERT, UPDATE or DELETE statement is executed that involves the table, all indexes that exist on that table must be updated to reflect the changes. If the database is read only, this guideline does not apply since statements that attempt to write to the database cannot be executed. Finally, the sorting feature of indexes should be used in queries in order to save resources that would otherwise be consumed by an ORDER BY clause in the query. ![]() Figure 4. Index Tuning Wizard Opening Since indexing can be a complicated and time-consuming area of database design and maintenance, Enterprise Manager includes an Index Tuning Wizard to provide recommendations for choosing correct indexes across one or more tables. The opening page of the wizard is shown in figure 4. The wizard analyses a representative sample of database activity (a workload) to recommend an ideal set of indexes for a database. If database queries change frequently or the database structure is modified frequently, the wizard can be rerun to reflect these changes. The index tuning wizard accepts a number of workload representations: an SQL script (.sql), a SQL Profiler Trace (.trc or .log), a trace table or a script selected in Query Analyzer. INDEXING REVIEWSIndexing is a crucial consideration in optimising database performance. Our database consultants can perform thorough indexing reviews on databases of any size and complexity. Contact our Consultancy Manager at consultancy@abl.co.uk for further information. Alternatively, use our online contact facility to request more information. ABOUT THE AUTHORWill Tomlin is a Microsoft Certified Database Administrator (MCDBA)-qualified consultant with Applied Business Consultancy Limited. Will specialises in the design and development of high performance SQL Server solutions and has worked with prestigious clients across a variety of business sectors. This overview is available as a PDF document at our Downloads page. Site Map
© 1998 - 2007 Applied Business Consultancy Limited |