Skip to main content

Clustered and Non-Clustered Indexes

What is an Index?

An index is a data structure used to quickly retrieve data from a database table. Similar to an index in a book, it allows you to find the desired information without reading every page.

Purpose of Indexes

  1. Improve Query Performance

    • Faster data retrieval
    • Optimization of WHERE, JOIN, and ORDER BY clauses
  2. Ensure Uniqueness

    • Implementation of PRIMARY KEY and UNIQUE constraints
  3. Efficiency of Foreign Keys

    • Faster referential integrity checks

Basic Structure of Indexes: What is B-Tree?

SQL Server indexes use a balanced tree structure called B-Tree (Balanced Tree).

B-Tree Hierarchy

Root Node
┌─────────────┐
│ [50, 100] │
└──────┬──────┘
┌───────────────┼───────────────┐
│ │ │
Intermediate Node Intermediate Node Intermediate Node
┌──────────┐ ┌──────────┐ ┌──────────┐
│[10, 30] │ │[60, 80] │ │[110, 130]│
└─────┬────┘ └─────┬────┘ └─────┬────┘
┌─────┼─────┐ ┌──────┼─────┐ ┌──────┼─────┐
│ │ │ │ │ │ │ │ │
Leaf Node Leaf Node Leaf Node Leaf Node Leaf Node Leaf Node
┌──────┐┌──────┐┌──────┐┌──────┐┌──────┐┌──────┐
│1-9 ││11-29 ││31-49 ││51-79 ││81-99 ││101+ │
└──────┘└──────┘└──────┘└──────┘└──────┘└──────┘

Features of B-Tree

  • Balanced Tree: All leaf nodes are at the same depth
  • Efficient Search: O(log n) complexity
  • Sorted: Key values are always sorted
  • Good for Range Searches: Efficient retrieval of continuous data

Clustered Index

Definition of Clustered Index

A clustered index physically reorders and stores the table data itself according to the index key.

Important Characteristics

  1. Only one per table

    • There can be only one physical order of data
    • PRIMARY KEY is clustered by default
  2. Leaf nodes contain data rows themselves

    • Index and data are integrated
    • No additional lookup required
  3. Determines physical order

    • Data pages themselves are part of the index

Structure of Clustered Index

Root Node (B-Tree Upper Level)


Intermediate Node (B-Tree Middle Level)


Leaf Node = Data Page (Actual Table Data)
┌───────────────────────────────────────────┐
│ CustomerID │ CustomerName │ City │
├───────────────────────────────────────────┤
│ 1 │ Alice Johnson │ New York │
│ 2 │ Bob Smith │ Chicago │
│ 3 │ Carol White │ Seattle │
└───────────────────────────────────────────┘
↑ Data itself is stored in leaf nodes

Creating a Clustered Index

-- Created as PRIMARY KEY during table creation (Clustered by default)
CREATE TABLE Customers
(
CustomerID INT PRIMARY KEY CLUSTERED, -- Clustered Index
CustomerName NVARCHAR(100),
Email NVARCHAR(100),
City NVARCHAR(50)
);

-- Add a clustered index to an existing table
CREATE CLUSTERED INDEX IX_Orders_OrderDate
ON Orders (OrderDate);

-- Make PRIMARY KEY non-clustered and another column clustered
CREATE TABLE Orders
(
OrderID INT PRIMARY KEY NONCLUSTERED, -- Non-Clustered
OrderDate DATETIME,
CustomerID INT,
TotalAmount DECIMAL(18,2)
);

CREATE CLUSTERED INDEX IX_Orders_OrderDate
ON Orders (OrderDate);

Benefits of Clustered Index

1. Fast Range Searches

-- Range search: Very fast with clustered index
SELECT *
FROM Orders
WHERE OrderDate BETWEEN '2024-01-01' AND '2024-01-31';
-- Fast reading with sequential I/O because data is physically contiguous

2. No Additional Lookup Required

-- No additional access needed when retrieving all columns
SELECT OrderID, OrderDate, CustomerID, TotalAmount
FROM Orders
WHERE OrderDate = '2024-01-15';
-- Completed in one access because leaf nodes contain all data

3. ORDER BY May Not Be Needed

-- Automatically sorted by clustered index key
SELECT *
FROM Orders
ORDER BY OrderDate;
-- Additional sort not needed because data is already physically ordered by OrderDate

Disadvantages of Clustered Index

1. Impact on Insertion Performance

-- Page split occurs when inserting with random values
INSERT INTO Orders (OrderID, OrderDate, CustomerID, TotalAmount)
VALUES (5000, '2023-06-15', 123, 500.00);
-- Since OrderDate is in 2023, it is inserted between existing 2024 data
-- -> Page split occurs, performance degrades

2. Physical Movement on Update

-- Updating clustered key involves physical movement
UPDATE Orders
SET OrderDate = '2025-01-01' -- Change date significantly
WHERE OrderID = 1234;
-- High cost because the row needs to move physically

Non-Clustered Index

Definition of Non-Clustered Index

A non-clustered index manages index keys and pointers to data separately. It is similar to an index in a book, where you find the page number from the index and then access the text.

Important Characteristics

  1. Maximum 999 per table

    • SQL Server 2008 and later: 999
    • SQL Server 2005 and earlier: 249
  2. Leaf nodes contain pointers

    • With Clustered: Stores clustered key value
    • Without Clustered (Heap): Stores RID (Row Identifier)
  3. Independent B-Tree Structure

    • Has a structure separate from data

Structure of Non-Clustered Index (With Clustered)

Non-Clustered Index
Root Node


Intermediate Node


Leaf Node (Index Key + Clustered Key)
┌────────────────────────────────┐
│ Email │ CustomerID│ ← Pointer to Clustered Key
├────────────────────────────────┤
│ alice@example.com │ 1 │
│ bob@example.com │ 2 │
│ carol@example.com │ 3 │
└────────────────────────────────┘

▼ Lookup via Clustered Index
┌───────────────────────────────────────────┐
│ CustomerID │ CustomerName │ City │ ← Clustered Index (Actual Data)
├───────────────────────────────────────────┤
│ 1 │ Alice Johnson │ New York │
│ 2 │ Bob Smith │ Chicago │
│ 3 │ Carol White │ Seattle │
└───────────────────────────────────────────┘

Creating a Non-Clustered Index

-- Single column non-clustered index
CREATE NONCLUSTERED INDEX IX_Customers_Email
ON Customers (Email);

-- Composite index (multiple columns)
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_OrderDate
ON Orders (CustomerID, OrderDate);

-- Covering index using INCLUDE clause
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_INC
ON Orders (CustomerID)
INCLUDE (OrderDate, TotalAmount);

-- Filtered index
CREATE NONCLUSTERED INDEX IX_Orders_ActiveOnly
ON Orders (OrderDate)
WHERE Status = 'Active';

-- Unique index
CREATE UNIQUE NONCLUSTERED INDEX UQ_Customers_Email
ON Customers (Email);

Benefits of Non-Clustered Index

1. Fast Search for Specific Columns

-- Fast search by Email
SELECT CustomerID, CustomerName
FROM Customers
WHERE Email = 'alice@example.com';
-- Fast search using IX_Customers_Email index

2. Support Diverse Queries with Multiple Indexes

-- Support different query patterns
CREATE INDEX IX_Orders_CustomerID ON Orders (CustomerID);
CREATE INDEX IX_Orders_OrderDate ON Orders (OrderDate);
CREATE INDEX IX_Orders_Status ON Orders (Status);

-- Each query uses the appropriate index
SELECT * FROM Orders WHERE CustomerID = 123; -- IX_Orders_CustomerID
SELECT * FROM Orders WHERE OrderDate = '2024-01-01'; -- IX_Orders_OrderDate
SELECT * FROM Orders WHERE Status = 'Active'; -- IX_Orders_Status

3. Best Performance with Covering Index

-- Create covering index with INCLUDE clause
CREATE INDEX IX_Orders_CustomerID_Covering
ON Orders (CustomerID)
INCLUDE (OrderDate, TotalAmount, Status);

-- This query completes with index only (No Key Lookup)
SELECT OrderDate, TotalAmount, Status
FROM Orders
WHERE CustomerID = 123;
-- No table access needed because index contains all necessary columns

Disadvantages of Non-Clustered Index

1. Key Lookup Overhead

-- Retrieving columns not in the index
SELECT CustomerID, CustomerName, Email, Phone, Address
FROM Customers
WHERE Email = 'alice@example.com';

-- Process executed:
-- 1. Search Email in IX_Customers_Email -> Get CustomerID (Clustered Key)
-- 2. Lookup CustomerID in Clustered Index -> Get all data
-- ⚠️ Requires 2 accesses (Key Lookup)

2. Storage Overhead

-- More indexes consume more storage
-- Each non-clustered index has an independent B-Tree structure

-- Example: 5 non-clustered indexes on Customers table
-- -> Requires storage for table data + 5 index trees

3. Update Cost

-- When updating data, all related indexes must also be updated
UPDATE Orders
SET TotalAmount = 1000.00
WHERE OrderID = 123;

-- Elements updated:
-- 1. Table data (Clustered Index)
-- 2. All non-clustered indexes containing TotalAmount
-- 3. Related statistics

Clustered vs Non-Clustered: Comparison Table

ItemClustered IndexNon-Clustered Index
Number per TableOnly 1Max 999
Leaf NodeData row itselfIndex key + Pointer
Physical ArrangementArranged in index orderSeparate structure
Range SearchVery fastFast (but Key Lookup may occur)
StorageIntegrated with table dataAdditional storage required
Insertion PerformancePotential for page splitsLess impact
Update PerformancePhysical movement on key updateUpdated independently
Recommended UsePRIMARY KEY, Range SearchWHERE clause, JOIN columns

Index Seek vs Index Scan

There are mainly two ways SQL Server accesses data using indexes.

Index Seek

Definition: A method to efficiently search for specific rows using the B-Tree structure.

How it works

Navigate B-Tree from top to bottom

Root Node: Check [50, 100]
↓ 25 < 50 so go left
Intermediate Node: Check [10, 30]
↓ 25 > 10 and 25 < 30 so go center
Leaf Node: Directly access page containing 25

Retrieve target data (access only a few pages)

Conditions for Index Seek

-- 1. Equality condition (=)
SELECT * FROM Customers WHERE CustomerID = 123;

-- 2. Range condition (small range)
SELECT * FROM Orders
WHERE OrderDate BETWEEN '2024-01-01' AND '2024-01-07';

-- 3. IN clause (few values)
SELECT * FROM Customers WHERE CustomerID IN (1, 2, 3);

-- 4. Using leading column of composite index
-- Index: IX_Orders_CustomerID_OrderDate (CustomerID, OrderDate)
SELECT * FROM Orders WHERE CustomerID = 123; -- ✓ Seek possible

-- 5. LIKE clause (prefix match)
SELECT * FROM Customers WHERE Email LIKE 'alice%';

Features of Index Seek

  • Fast: Access only necessary pages
  • Efficient: Minimal I/O
  • Scalable: Little speed degradation as data volume increases
  • 📊 Logical Reads: Usually a few to dozens of pages

Index Scan

Definition: A method to scan the leaf level of the index sequentially from the beginning.

How it works

Sequentially read all leaf nodes of the index

Leaf Node 1 -> Leaf Node 2 -> Leaf Node 3 -> ... -> Leaf Node N
↓ ↓ ↓ ↓
Scan all pages (Access all pages)

Conditions for Index Scan

-- 1. No WHERE clause
SELECT * FROM Orders;

-- 2. Condition not using index key
-- Index: IX_Orders_OrderDate (OrderDate)
SELECT * FROM Orders WHERE CustomerID = 123; -- Not using OrderDate

-- 3. Using functions
SELECT * FROM Orders WHERE YEAR(OrderDate) = 2024;

-- 4. Using NOT, !=, <>
SELECT * FROM Orders WHERE Status != 'Cancelled';

-- 5. LIKE clause (suffix match, infix match)
SELECT * FROM Customers WHERE Email LIKE '%@example.com';

-- 6. OR condition (different indexes)
SELECT * FROM Orders
WHERE CustomerID = 123 OR OrderDate = '2024-01-01';

-- 7. Not using leading column of composite index
-- Index: IX_Orders_CustomerID_OrderDate (CustomerID, OrderDate)
SELECT * FROM Orders WHERE OrderDate = '2024-01-01'; -- ✗ Scan occurs

Features of Index Scan

  • ⚠️ Slow: Access all index pages
  • ⚠️ Increased I/O: Large amount of page reads
  • ⚠️ Scale Issue: Slows down in proportion to data volume
  • 📊 Logical Reads: Thousands to tens of thousands of pages or more

Table Scan

Definition: A method to read the entire table without using an index.

-- Heap (No Clustered Index)
-- Occurs when index cannot be used or does not exist

-- Example: Search on column without index
SELECT * FROM Orders WHERE Notes LIKE '%urgent%';
-- No index on Notes -> Table Scan

Features of Table Scan

  • Very Slow: Reads entire table
  • Worst Performance: Most I/O
  • ⚠️ Acceptable for small tables: No problem for a few hundred rows

Seek vs Scan: Example

-- Table: Orders (1 million rows)
-- Index: IX_Orders_OrderDate (OrderDate)
-- Index: IX_Orders_CustomerID (CustomerID)

-- ✅ Index Seek (Fast)
SELECT * FROM Orders WHERE OrderDate = '2024-01-15';
-- Execution: Seek on IX_Orders_OrderDate
-- Logical Reads: 5 pages
-- Execution Time: 1ms

-- ⚠️ Index Scan (Slow)
SELECT * FROM Orders WHERE YEAR(OrderDate) = 2024;
-- Execution: Scan on IX_Orders_OrderDate (due to function use)
-- Logical Reads: 5,000 pages
-- Execution Time: 500ms

-- Improvement: Do not use function
SELECT * FROM Orders
WHERE OrderDate >= '2024-01-01' AND OrderDate < '2025-01-01';
-- Execution: Seek on IX_Orders_OrderDate
-- Logical Reads: 1,000 pages (only 2024 data)
-- Execution Time: 100ms

Criteria: When Seek and When Scan

ConditionOperationReason
Retrieved rows < 1% of totalIndex SeekEfficiently retrieve few rows
Retrieved rows > 20% of totalIndex/Table ScanBulk reading is more efficient than repeated Seeks
No WHERE clauseTable/Index ScanTo retrieve all rows
Equality condition on index keyIndex SeekB-Tree search is optimal
Function usageIndex ScanIndex key is transformed
Small table (<1000 rows)Table ScanFaster than Seek overhead

Execution Plan

What is an Execution Plan?

An execution plan shows the specific steps selected by SQL Server to execute a query. It is a "blueprint of the optimal execution method" generated by the query optimizer.

Importance of Execution Plan

  1. Identify performance bottlenecks
  2. Check effectiveness of indexes
  3. Determine direction of query optimization
  4. Diagnose unexpected behavior

Types of Execution Plans

1. Estimated Execution Plan

-- SQL Server Management Studio (SSMS)
-- Ctrl + L or "Display Estimated Execution Plan" from menu
  • Displayed without executing the query
  • Plan generated by optimizer
  • Prediction based on statistics

2. Actual Execution Plan

-- SQL Server Management Studio (SSMS)
-- Enable with Ctrl + M then execute query
-- Or "Include Actual Execution Plan" from menu
  • Obtained by actually executing the query
  • Includes actual row counts and execution time
  • Best for performance analysis

How to View Execution Plans

Method 1: SSMS Graphical Execution Plan

-- 1. Write query in SSMS
SELECT c.CustomerID, c.CustomerName, o.OrderDate, o.TotalAmount
FROM Customers c
INNER JOIN Orders o ON c.CustomerID = o.CustomerID
WHERE o.OrderDate >= '2024-01-01';

-- 2. Press Ctrl + M to enable "Include Actual Execution Plan"
-- 3. Execute query
-- 4. Open "Execution Plan" tab

How to read graphical plan:

SELECT

Nested Loops Join (80%)
↗ ↖
Index Seek (10%) Index Seek (10%)
IX_Customers_PK IX_Orders_OrderDate
↑ ↑
[Customers] [Orders]

Information displayed on each operator:
- Operator name (Index Seek, Nested Loops Join, etc.)
- Cost (%): Relative cost to the entire query
- Estimated rows vs Actual rows
- Estimated number of executions

Method 2: SET STATISTICS PROFILE

-- Text format execution plan
SET SHOWPLAN_TEXT ON;
GO

SELECT * FROM Orders WHERE OrderDate = '2024-01-15';
GO

SET SHOWPLAN_TEXT OFF;
GO

-- Output Example:
-- StmtText
-- |--Index Seek(OBJECT:([Orders].[IX_Orders_OrderDate]),
-- SEEK:([OrderDate]='2024-01-15'))

Method 3: SET STATISTICS IO/TIME

-- Display I/O statistics and timing information
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT * FROM Orders WHERE OrderDate = '2024-01-15';

-- Output:
-- Table 'Orders'. Scan count 1, logical reads 5, physical reads 0
-- SQL Server Execution Times:
-- CPU time = 0 ms, elapsed time = 1 ms.

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;

Method 4: XML Format Execution Plan

SET SHOWPLAN_XML ON;
GO

SELECT * FROM Orders WHERE OrderDate = '2024-01-15';
GO

SET SHOWPLAN_XML OFF;
GO

-- Get detailed information in XML (Useful for programmatic analysis)

Major Operators in Execution Plan

1. Data Access Operators

Clustered Index Scan

-- Scan entire clustered index
SELECT * FROM Orders;

-- Execution Plan:
-- Clustered Index Scan (Orders)
-- Cost: 100%
-- Logical Reads: 10,000 pages

Clustered Index Seek

-- Search specific row in clustered index
SELECT * FROM Orders WHERE OrderID = 123;

-- Execution Plan:
-- Clustered Index Seek (Orders.PK_Orders)
-- Seek Predicates: OrderID = 123
-- Cost: 0.0033%
-- Logical Reads: 3 pages

Index Scan (Non-Clustered)

-- Scan entire non-clustered index
SELECT OrderDate FROM Orders;

-- Execution Plan:
-- Index Scan (Orders.IX_Orders_OrderDate)
-- Cost: 50%
-- Logical Reads: 2,000 pages

Index Seek (Non-Clustered)

-- Search specific row in non-clustered index
SELECT OrderID FROM Orders WHERE OrderDate = '2024-01-15';

-- Execution Plan:
-- Index Seek (Orders.IX_Orders_OrderDate)
-- Seek Predicates: OrderDate = '2024-01-15'
-- Cost: 0.01%
-- Logical Reads: 5 pages

Key Lookup

-- Additional access to retrieve columns not in index
SELECT OrderID, OrderDate, CustomerName, TotalAmount
FROM Orders
WHERE OrderDate = '2024-01-15';

-- Execution Plan:
-- Nested Loops Join
-- ├─ Index Seek (IX_Orders_OrderDate) -- Search by OrderDate
-- └─ Key Lookup (PK_Orders) -- Retrieve remaining columns
-- ⚠️ Key Lookup is high cost (Clustered access for each row)

Table Scan

-- Scan entire heap table (no clustered index)
SELECT * FROM TempData; -- No clustered index

-- Execution Plan:
-- Table Scan (TempData)
-- Cost: 100%
-- Worst performance

2. Join Operators

Nested Loops Join

-- Optimal for joining small datasets
SELECT c.CustomerName, o.OrderDate
FROM Customers c
INNER JOIN Orders o ON c.CustomerID = o.CustomerID
WHERE c.CustomerID = 123;

-- Execution Plan:
-- Nested Loops Join
-- ├─ Index Seek (Customers.PK_Customers) -- Outer loop (Driving table)
-- └─ Index Seek (Orders.IX_Orders_CustomerID) -- Inner loop
-- Characteristics: Search inner for each row of outer (Loop)
-- Application: Small dataset, index exists

Hash Match Join

-- Optimal for joining large datasets
SELECT c.CustomerName, o.OrderDate
FROM Customers c
INNER JOIN Orders o ON c.CustomerID = o.CustomerID;

-- Execution Plan:
-- Hash Match (Inner Join)
-- ├─ Index Scan (Customers) -- Build input (Build hash table)
-- └─ Index Scan (Orders) -- Probe input (Search hash table)
-- Characteristics: Create hash table and match
-- Application: Large dataset, no index

Merge Join

-- Optimal for joining sorted data
SELECT c.CustomerName, o.OrderDate
FROM Customers c
INNER JOIN Orders o ON c.CustomerID = o.CustomerID
ORDER BY c.CustomerID;

-- Execution Plan:
-- Merge Join (Inner Join)
-- ├─ Index Scan (Customers.PK_Customers) -- Sorted
-- └─ Index Scan (Orders.IX_Orders_CustomerID) -- Sorted
-- Characteristics: Merge assuming both data are sorted
-- Application: Sorted data, 1-to-many join

3. Aggregation and Sort Operators

Sort

-- Sort data
SELECT * FROM Orders ORDER BY TotalAmount DESC;

-- Execution Plan:
-- Sort (ORDER BY: TotalAmount DESC)
-- └─ Clustered Index Scan (Orders)
-- ⚠️ Sort is high cost (may use tempdb)

Stream Aggregate

-- Aggregation of sorted data (Efficient)
SELECT CustomerID, COUNT(*), SUM(TotalAmount)
FROM Orders
GROUP BY CustomerID;

-- Execution Plan (If index is in CustomerID order):
-- Stream Aggregate (GROUP BY: CustomerID)
-- └─ Index Scan (IX_Orders_CustomerID) -- Already sorted
-- Characteristics: Sequentially aggregate sorted data

Hash Aggregate

-- Aggregation without sorting (For large data)
SELECT Status, COUNT(*)
FROM Orders
GROUP BY Status;

-- Execution Plan:
-- Hash Aggregate (GROUP BY: Status)
-- └─ Clustered Index Scan (Orders)
-- Characteristics: Aggregate with hash table

How to Read Execution Plans

1. Check Cost

Query Cost: 100%
├─ Nested Loops Join: 80% ⚠️ Highest cost
│ ├─ Index Scan: 15%
│ └─ Key Lookup: 65% ⚠️ Bottleneck
└─ Index Seek: 20%

Criteria:

  • 50% or more: Target for optimization
  • Key Lookup: Consider covering index
  • Scan: Check if it can be changed to Seek

2. Check Row Count Discrepancy

Operator: Index Seek
├─ Estimated Rows: 10
└─ Actual Rows: 100,000 ⚠️ Large discrepancy

Causes:

  • Outdated statistics
  • Skewed data distribution

Countermeasures:

-- Update statistics
UPDATE STATISTICS Orders WITH FULLSCAN;

3. Check Warning Marks

⚠️ Warning: Implicit conversion
⚠️ Warning: Missing statistics
⚠️ Warning: Spillover due to insufficient memory

Execution Plan Optimization Examples

Example 1: Removing Key Lookup

Problematic Query:

-- Query
SELECT OrderID, OrderDate, CustomerID, TotalAmount
FROM Orders
WHERE OrderDate = '2024-01-15';

-- Execution Plan:
-- Nested Loops Join (Cost: 100%)
-- ├─ Index Seek (IX_Orders_OrderDate): 10%
-- └─ Key Lookup (PK_Orders): 90% ⚠️ Bottleneck

Solution: Covering Index

-- Create index
CREATE NONCLUSTERED INDEX IX_Orders_OrderDate_Covering
ON Orders (OrderDate)
INCLUDE (CustomerID, TotalAmount);

-- Improved Execution Plan:
-- Index Seek (IX_Orders_OrderDate_Covering): 100%
-- Key Lookup disappeared!

Example 2: From Index Scan to Index Seek

Problematic Query:

-- Query
SELECT * FROM Orders
WHERE YEAR(OrderDate) = 2024;

-- Execution Plan:
-- Clustered Index Scan (Cost: 100%) ⚠️ Full scan
-- Logical Reads: 10,000 pages

Solution: Remove Function

-- Improved Query
SELECT * FROM Orders
WHERE OrderDate >= '2024-01-01'
AND OrderDate < '2025-01-01';

-- Improved Execution Plan:
-- Clustered Index Seek (Cost: 20%)
-- Logical Reads: 2,000 pages (80% reduction!)

Example 3: Adding Appropriate Index

Problematic Query:

-- Query
SELECT * FROM Orders
WHERE CustomerID = 123 AND Status = 'Active';

-- Execution Plan:
-- Clustered Index Scan (Cost: 100%) ⚠️ No index
-- Logical Reads: 10,000 pages

Solution: Composite Index

-- Create composite index
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_Status
ON Orders (CustomerID, Status);

-- Improved Execution Plan:
-- Index Seek (IX_Orders_CustomerID_Status): 100%
-- Logical Reads: 5 pages (99.95% reduction!)

Index Design Best Practices

1. Choosing Clustered Index

-- ✅ Recommended: Monotonically increasing value
CREATE TABLE Orders
(
OrderID BIGINT IDENTITY(1,1) PRIMARY KEY CLUSTERED, -- Sequentially increasing
OrderDate DATETIME,
CustomerID INT
);

-- ❌ Not Recommended: Random value (GUID)
CREATE TABLE Orders
(
OrderID UNIQUEIDENTIFIER DEFAULT NEWID() PRIMARY KEY CLUSTERED, -- Random
OrderDate DATETIME,
CustomerID INT
);
-- Problem: Frequent page splits, fragmentation progresses

Recommended Criteria:

  • ✅ Monotonically increasing (IDENTITY, SEQUENCE)
  • ✅ Columns frequently used in range searches
  • ✅ Frequently used in GROUP BY, ORDER BY
  • ❌ Frequently updated columns
  • ❌ Random values (NEWID())

2. Choosing Non-Clustered Index

-- ✅ Recommended: Columns frequently used in WHERE clause
CREATE INDEX IX_Orders_CustomerID ON Orders (CustomerID);

-- ✅ Recommended: JOIN columns
CREATE INDEX IX_OrderDetails_OrderID ON OrderDetails (OrderID);

-- ✅ Recommended: Composite index (Combination of search conditions)
CREATE INDEX IX_Orders_CustomerID_Status
ON Orders (CustomerID, Status);

-- ⚠️ Caution: Low selectivity columns (Few unique values)
CREATE INDEX IX_Orders_Status ON Orders (Status);
-- Status has only 2 values 'Active', 'Inactive' -> Limited effect

Recommended Criteria:

  • ✅ Frequently used in WHERE clause, JOIN clause
  • ✅ High selectivity (Many unique values)
  • ✅ Composite index should have most selective column first
  • ❌ Very low selectivity columns (Gender, Flags, etc.)
  • ❌ Do not create indiscriminately on frequently updated columns

3. Leveraging INCLUDE Clause

-- ✅ Avoid Key Lookup with covering index
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_Covering
ON Orders (CustomerID) -- Key column
INCLUDE (OrderDate, TotalAmount, Status); -- Included columns

-- This query completes with index only
SELECT CustomerID, OrderDate, TotalAmount, Status
FROM Orders
WHERE CustomerID = 123;
-- No Key Lookup -> Fast!

Benefits of INCLUDE:

  • Stored only at leaf level (Not included in upper levels of B-Tree)
  • Suppress increase in index size
  • Avoid Key Lookup

4. Filtered Index

-- ✅ Index only data with specific condition
CREATE NONCLUSTERED INDEX IX_Orders_Active
ON Orders (OrderDate)
WHERE Status = 'Active';

-- This query uses small index
SELECT * FROM Orders
WHERE Status = 'Active' AND OrderDate >= '2024-01-01';
-- Index size: Only Active data -> Small -> Fast