Clustered and Non-Clustered Indexes
What is an Index?
An index is a data structure used to quickly retrieve data from a database table. Similar to an index in a book, it allows you to find the desired information without reading every page.
Purpose of Indexes
-
Improve Query Performance
- Faster data retrieval
- Optimization of WHERE, JOIN, and ORDER BY clauses
-
Ensure Uniqueness
- Implementation of PRIMARY KEY and UNIQUE constraints
-
Efficiency of Foreign Keys
- Faster referential integrity checks
Basic Structure of Indexes: What is B-Tree?
SQL Server indexes use a balanced tree structure called B-Tree (Balanced Tree).
B-Tree Hierarchy
Root Node
┌─────────────┐
│ [50, 100] │
└──────┬──────┘
┌───────────────┼───────────────┐
│ │ │
Intermediate Node Intermediate Node Intermediate Node
┌──────────┐ ┌──────────┐ ┌──────────┐
│[10, 30] │ │[60, 80] │ │[110, 130]│
└─────┬────┘ └─────┬────┘ └─────┬────┘
┌─────┼─────┐ ┌──────┼─────┐ ┌──────┼─────┐
│ │ │ │ │ │ │ │ │
Leaf Node Leaf Node Leaf Node Leaf Node Leaf Node Leaf Node
┌──────┐┌──────┐┌──────┐┌──────┐┌──────┐┌──────┐
│1-9 ││11-29 ││31-49 ││51-79 ││81-99 ││101+ │
└──────┘└──────┘└──────┘└──────┘└──────┘└──────┘
Features of B-Tree
- Balanced Tree: All leaf nodes are at the same depth
- Efficient Search: O(log n) complexity
- Sorted: Key values are always sorted
- Good for Range Searches: Efficient retrieval of continuous data
Clustered Index
Definition of Clustered Index
A clustered index physically reorders and stores the table data itself according to the index key.
Important Characteristics
-
Only one per table
- There can be only one physical order of data
- PRIMARY KEY is clustered by default
-
Leaf nodes contain data rows themselves
- Index and data are integrated
- No additional lookup required
-
Determines physical order
- Data pages themselves are part of the index
Structure of Clustered Index
Root Node (B-Tree Upper Level)
│
▼
Intermediate Node (B-Tree Middle Level)
│
▼
Leaf Node = Data Page (Actual Table Data)
┌───────────────────────────────────────────┐
│ CustomerID │ CustomerName │ City │
├───────────────────────────────────────────┤
│ 1 │ Alice Johnson │ New York │
│ 2 │ Bob Smith │ Chicago │
│ 3 │ Carol White │ Seattle │
└───────────────────────────────────────────┘
↑ Data itself is stored in leaf nodes
Creating a Clustered Index
-- Created as PRIMARY KEY during table creation (Clustered by default)
CREATE TABLE Customers
(
CustomerID INT PRIMARY KEY CLUSTERED, -- Clustered Index
CustomerName NVARCHAR(100),
Email NVARCHAR(100),
City NVARCHAR(50)
);
-- Add a clustered index to an existing table
CREATE CLUSTERED INDEX IX_Orders_OrderDate
ON Orders (OrderDate);
-- Make PRIMARY KEY non-clustered and another column clustered
CREATE TABLE Orders
(
OrderID INT PRIMARY KEY NONCLUSTERED, -- Non-Clustered
OrderDate DATETIME,
CustomerID INT,
TotalAmount DECIMAL(18,2)
);
CREATE CLUSTERED INDEX IX_Orders_OrderDate
ON Orders (OrderDate);
Benefits of Clustered Index
1. Fast Range Searches
-- Range search: Very fast with clustered index
SELECT *
FROM Orders
WHERE OrderDate BETWEEN '2024-01-01' AND '2024-01-31';
-- Fast reading with sequential I/O because data is physically contiguous
2. No Additional Lookup Required
-- No additional access needed when retrieving all columns
SELECT OrderID, OrderDate, CustomerID, TotalAmount
FROM Orders
WHERE OrderDate = '2024-01-15';
-- Completed in one access because leaf nodes contain all data
3. ORDER BY May Not Be Needed
-- Automatically sorted by clustered index key
SELECT *
FROM Orders
ORDER BY OrderDate;
-- Additional sort not needed because data is already physically ordered by OrderDate
Disadvantages of Clustered Index
1. Impact on Insertion Performance
-- Page split occurs when inserting with random values
INSERT INTO Orders (OrderID, OrderDate, CustomerID, TotalAmount)
VALUES (5000, '2023-06-15', 123, 500.00);
-- Since OrderDate is in 2023, it is inserted between existing 2024 data
-- -> Page split occurs, performance degrades
2. Physical Movement on Update
-- Updating clustered key involves physical movement
UPDATE Orders
SET OrderDate = '2025-01-01' -- Change date significantly
WHERE OrderID = 1234;
-- High cost because the row needs to move physically
Non-Clustered Index
Definition of Non-Clustered Index
A non-clustered index manages index keys and pointers to data separately. It is similar to an index in a book, where you find the page number from the index and then access the text.
Important Characteristics
-
Maximum 999 per table
- SQL Server 2008 and later: 999
- SQL Server 2005 and earlier: 249
-
Leaf nodes contain pointers
- With Clustered: Stores clustered key value
- Without Clustered (Heap): Stores RID (Row Identifier)
-
Independent B-Tree Structure
- Has a structure separate from data
Structure of Non-Clustered Index (With Clustered)
Non-Clustered Index
Root Node
│
▼
Intermediate Node
│
▼
Leaf Node (Index Key + Clustered Key)
┌────────────────────────────────┐
│ Email │ CustomerID│ ← Pointer to Clustered Key
├────────────────────────────────┤
│ alice@example.com │ 1 │
│ bob@example.com │ 2 │
│ carol@example.com │ 3 │
└────────────────────────────────┘
│
▼ Lookup via Clustered Index
┌───────────────────────────────────────────┐
│ CustomerID │ CustomerName │ City │ ← Clustered Index (Actual Data)
├───────────────────────────────────────────┤
│ 1 │ Alice Johnson │ New York │
│ 2 │ Bob Smith │ Chicago │
│ 3 │ Carol White │ Seattle │
└───────────────────────────────────────────┘
Creating a Non-Clustered Index
-- Single column non-clustered index
CREATE NONCLUSTERED INDEX IX_Customers_Email
ON Customers (Email);
-- Composite index (multiple columns)
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_OrderDate
ON Orders (CustomerID, OrderDate);
-- Covering index using INCLUDE clause
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_INC
ON Orders (CustomerID)
INCLUDE (OrderDate, TotalAmount);
-- Filtered index
CREATE NONCLUSTERED INDEX IX_Orders_ActiveOnly
ON Orders (OrderDate)
WHERE Status = 'Active';
-- Unique index
CREATE UNIQUE NONCLUSTERED INDEX UQ_Customers_Email
ON Customers (Email);
Benefits of Non-Clustered Index
1. Fast Search for Specific Columns
-- Fast search by Email
SELECT CustomerID, CustomerName
FROM Customers
WHERE Email = 'alice@example.com';
-- Fast search using IX_Customers_Email index
2. Support Diverse Queries with Multiple Indexes
-- Support different query patterns
CREATE INDEX IX_Orders_CustomerID ON Orders (CustomerID);
CREATE INDEX IX_Orders_OrderDate ON Orders (OrderDate);
CREATE INDEX IX_Orders_Status ON Orders (Status);
-- Each query uses the appropriate index
SELECT * FROM Orders WHERE CustomerID = 123; -- IX_Orders_CustomerID
SELECT * FROM Orders WHERE OrderDate = '2024-01-01'; -- IX_Orders_OrderDate
SELECT * FROM Orders WHERE Status = 'Active'; -- IX_Orders_Status
3. Best Performance with Covering Index
-- Create covering index with INCLUDE clause
CREATE INDEX IX_Orders_CustomerID_Covering
ON Orders (CustomerID)
INCLUDE (OrderDate, TotalAmount, Status);
-- This query completes with index only (No Key Lookup)
SELECT OrderDate, TotalAmount, Status
FROM Orders
WHERE CustomerID = 123;
-- No table access needed because index contains all necessary columns
Disadvantages of Non-Clustered Index
1. Key Lookup Overhead
-- Retrieving columns not in the index
SELECT CustomerID, CustomerName, Email, Phone, Address
FROM Customers
WHERE Email = 'alice@example.com';
-- Process executed:
-- 1. Search Email in IX_Customers_Email -> Get CustomerID (Clustered Key)
-- 2. Lookup CustomerID in Clustered Index -> Get all data
-- ⚠️ Requires 2 accesses (Key Lookup)
2. Storage Overhead
-- More indexes consume more storage
-- Each non-clustered index has an independent B-Tree structure
-- Example: 5 non-clustered indexes on Customers table
-- -> Requires storage for table data + 5 index trees
3. Update Cost
-- When updating data, all related indexes must also be updated
UPDATE Orders
SET TotalAmount = 1000.00
WHERE OrderID = 123;
-- Elements updated:
-- 1. Table data (Clustered Index)
-- 2. All non-clustered indexes containing TotalAmount
-- 3. Related statistics
Clustered vs Non-Clustered: Comparison Table
| Item | Clustered Index | Non-Clustered Index |
|---|---|---|
| Number per Table | Only 1 | Max 999 |
| Leaf Node | Data row itself | Index key + Pointer |
| Physical Arrangement | Arranged in index order | Separate structure |
| Range Search | Very fast | Fast (but Key Lookup may occur) |
| Storage | Integrated with table data | Additional storage required |
| Insertion Performance | Potential for page splits | Less impact |
| Update Performance | Physical movement on key update | Updated independently |
| Recommended Use | PRIMARY KEY, Range Search | WHERE clause, JOIN columns |
Index Seek vs Index Scan
There are mainly two ways SQL Server accesses data using indexes.
Index Seek
Definition: A method to efficiently search for specific rows using the B-Tree structure.
How it works
Navigate B-Tree from top to bottom
Root Node: Check [50, 100]
↓ 25 < 50 so go left
Intermediate Node: Check [10, 30]
↓ 25 > 10 and 25 < 30 so go center
Leaf Node: Directly access page containing 25
↓
Retrieve target data (access only a few pages)
Conditions for Index Seek
-- 1. Equality condition (=)
SELECT * FROM Customers WHERE CustomerID = 123;
-- 2. Range condition (small range)
SELECT * FROM Orders
WHERE OrderDate BETWEEN '2024-01-01' AND '2024-01-07';
-- 3. IN clause (few values)
SELECT * FROM Customers WHERE CustomerID IN (1, 2, 3);
-- 4. Using leading column of composite index
-- Index: IX_Orders_CustomerID_OrderDate (CustomerID, OrderDate)
SELECT * FROM Orders WHERE CustomerID = 123; -- ✓ Seek possible
-- 5. LIKE clause (prefix match)
SELECT * FROM Customers WHERE Email LIKE 'alice%';
Features of Index Seek
- ✅ Fast: Access only necessary pages
- ✅ Efficient: Minimal I/O
- ✅ Scalable: Little speed degradation as data volume increases
- 📊 Logical Reads: Usually a few to dozens of pages
Index Scan
Definition: A method to scan the leaf level of the index sequentially from the beginning.
How it works
Sequentially read all leaf nodes of the index
Leaf Node 1 -> Leaf Node 2 -> Leaf Node 3 -> ... -> Leaf Node N
↓ ↓ ↓ ↓
Scan all pages (Access all pages)
Conditions for Index Scan
-- 1. No WHERE clause
SELECT * FROM Orders;
-- 2. Condition not using index key
-- Index: IX_Orders_OrderDate (OrderDate)
SELECT * FROM Orders WHERE CustomerID = 123; -- Not using OrderDate
-- 3. Using functions
SELECT * FROM Orders WHERE YEAR(OrderDate) = 2024;
-- 4. Using NOT, !=, <>
SELECT * FROM Orders WHERE Status != 'Cancelled';
-- 5. LIKE clause (suffix match, infix match)
SELECT * FROM Customers WHERE Email LIKE '%@example.com';
-- 6. OR condition (different indexes)
SELECT * FROM Orders
WHERE CustomerID = 123 OR OrderDate = '2024-01-01';
-- 7. Not using leading column of composite index
-- Index: IX_Orders_CustomerID_OrderDate (CustomerID, OrderDate)
SELECT * FROM Orders WHERE OrderDate = '2024-01-01'; -- ✗ Scan occurs
Features of Index Scan
- ⚠️ Slow: Access all index pages
- ⚠️ Increased I/O: Large amount of page reads
- ⚠️ Scale Issue: Slows down in proportion to data volume
- 📊 Logical Reads: Thousands to tens of thousands of pages or more
Table Scan
Definition: A method to read the entire table without using an index.
-- Heap (No Clustered Index)
-- Occurs when index cannot be used or does not exist
-- Example: Search on column without index
SELECT * FROM Orders WHERE Notes LIKE '%urgent%';
-- No index on Notes -> Table Scan
Features of Table Scan
- ❌ Very Slow: Reads entire table
- ❌ Worst Performance: Most I/O
- ⚠️ Acceptable for small tables: No problem for a few hundred rows
Seek vs Scan: Example
-- Table: Orders (1 million rows)
-- Index: IX_Orders_OrderDate (OrderDate)
-- Index: IX_Orders_CustomerID (CustomerID)
-- ✅ Index Seek (Fast)
SELECT * FROM Orders WHERE OrderDate = '2024-01-15';
-- Execution: Seek on IX_Orders_OrderDate
-- Logical Reads: 5 pages
-- Execution Time: 1ms
-- ⚠️ Index Scan (Slow)
SELECT * FROM Orders WHERE YEAR(OrderDate) = 2024;
-- Execution: Scan on IX_Orders_OrderDate (due to function use)
-- Logical Reads: 5,000 pages
-- Execution Time: 500ms
-- Improvement: Do not use function
SELECT * FROM Orders
WHERE OrderDate >= '2024-01-01' AND OrderDate < '2025-01-01';
-- Execution: Seek on IX_Orders_OrderDate
-- Logical Reads: 1,000 pages (only 2024 data)
-- Execution Time: 100ms
Criteria: When Seek and When Scan
| Condition | Operation | Reason |
|---|---|---|
| Retrieved rows < 1% of total | Index Seek | Efficiently retrieve few rows |
| Retrieved rows > 20% of total | Index/Table Scan | Bulk reading is more efficient than repeated Seeks |
| No WHERE clause | Table/Index Scan | To retrieve all rows |
| Equality condition on index key | Index Seek | B-Tree search is optimal |
| Function usage | Index Scan | Index key is transformed |
| Small table (<1000 rows) | Table Scan | Faster than Seek overhead |
Execution Plan
What is an Execution Plan?
An execution plan shows the specific steps selected by SQL Server to execute a query. It is a "blueprint of the optimal execution method" generated by the query optimizer.
Importance of Execution Plan
- Identify performance bottlenecks
- Check effectiveness of indexes
- Determine direction of query optimization
- Diagnose unexpected behavior
Types of Execution Plans
1. Estimated Execution Plan
-- SQL Server Management Studio (SSMS)
-- Ctrl + L or "Display Estimated Execution Plan" from menu
- Displayed without executing the query
- Plan generated by optimizer
- Prediction based on statistics
2. Actual Execution Plan
-- SQL Server Management Studio (SSMS)
-- Enable with Ctrl + M then execute query
-- Or "Include Actual Execution Plan" from menu
- Obtained by actually executing the query
- Includes actual row counts and execution time
- Best for performance analysis
How to View Execution Plans
Method 1: SSMS Graphical Execution Plan
-- 1. Write query in SSMS
SELECT c.CustomerID, c.CustomerName, o.OrderDate, o.TotalAmount
FROM Customers c
INNER JOIN Orders o ON c.CustomerID = o.CustomerID
WHERE o.OrderDate >= '2024-01-01';
-- 2. Press Ctrl + M to enable "Include Actual Execution Plan"
-- 3. Execute query
-- 4. Open "Execution Plan" tab
How to read graphical plan:
SELECT
↑
Nested Loops Join (80%)
↗ ↖
Index Seek (10%) Index Seek (10%)
IX_Customers_PK IX_Orders_OrderDate
↑ ↑
[Customers] [Orders]
Information displayed on each operator:
- Operator name (Index Seek, Nested Loops Join, etc.)
- Cost (%): Relative cost to the entire query
- Estimated rows vs Actual rows
- Estimated number of executions
Method 2: SET STATISTICS PROFILE
-- Text format execution plan
SET SHOWPLAN_TEXT ON;
GO
SELECT * FROM Orders WHERE OrderDate = '2024-01-15';
GO
SET SHOWPLAN_TEXT OFF;
GO
-- Output Example:
-- StmtText
-- |--Index Seek(OBJECT:([Orders].[IX_Orders_OrderDate]),
-- SEEK:([OrderDate]='2024-01-15'))
Method 3: SET STATISTICS IO/TIME
-- Display I/O statistics and timing information
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
SELECT * FROM Orders WHERE OrderDate = '2024-01-15';
-- Output:
-- Table 'Orders'. Scan count 1, logical reads 5, physical reads 0
-- SQL Server Execution Times:
-- CPU time = 0 ms, elapsed time = 1 ms.
SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
Method 4: XML Format Execution Plan
SET SHOWPLAN_XML ON;
GO
SELECT * FROM Orders WHERE OrderDate = '2024-01-15';
GO
SET SHOWPLAN_XML OFF;
GO
-- Get detailed information in XML (Useful for programmatic analysis)
Major Operators in Execution Plan
1. Data Access Operators
Clustered Index Scan
-- Scan entire clustered index
SELECT * FROM Orders;
-- Execution Plan:
-- Clustered Index Scan (Orders)
-- Cost: 100%
-- Logical Reads: 10,000 pages
Clustered Index Seek
-- Search specific row in clustered index
SELECT * FROM Orders WHERE OrderID = 123;
-- Execution Plan:
-- Clustered Index Seek (Orders.PK_Orders)
-- Seek Predicates: OrderID = 123
-- Cost: 0.0033%
-- Logical Reads: 3 pages
Index Scan (Non-Clustered)
-- Scan entire non-clustered index
SELECT OrderDate FROM Orders;
-- Execution Plan:
-- Index Scan (Orders.IX_Orders_OrderDate)
-- Cost: 50%
-- Logical Reads: 2,000 pages
Index Seek (Non-Clustered)
-- Search specific row in non-clustered index
SELECT OrderID FROM Orders WHERE OrderDate = '2024-01-15';
-- Execution Plan:
-- Index Seek (Orders.IX_Orders_OrderDate)
-- Seek Predicates: OrderDate = '2024-01-15'
-- Cost: 0.01%
-- Logical Reads: 5 pages
Key Lookup
-- Additional access to retrieve columns not in index
SELECT OrderID, OrderDate, CustomerName, TotalAmount
FROM Orders
WHERE OrderDate = '2024-01-15';
-- Execution Plan:
-- Nested Loops Join
-- ├─ Index Seek (IX_Orders_OrderDate) -- Search by OrderDate
-- └─ Key Lookup (PK_Orders) -- Retrieve remaining columns
-- ⚠️ Key Lookup is high cost (Clustered access for each row)
Table Scan
-- Scan entire heap table (no clustered index)
SELECT * FROM TempData; -- No clustered index
-- Execution Plan:
-- Table Scan (TempData)
-- Cost: 100%
-- Worst performance
2. Join Operators
Nested Loops Join
-- Optimal for joining small datasets
SELECT c.CustomerName, o.OrderDate
FROM Customers c
INNER JOIN Orders o ON c.CustomerID = o.CustomerID
WHERE c.CustomerID = 123;
-- Execution Plan:
-- Nested Loops Join
-- ├─ Index Seek (Customers.PK_Customers) -- Outer loop (Driving table)
-- └─ Index Seek (Orders.IX_Orders_CustomerID) -- Inner loop
-- Characteristics: Search inner for each row of outer (Loop)
-- Application: Small dataset, index exists
Hash Match Join
-- Optimal for joining large datasets
SELECT c.CustomerName, o.OrderDate
FROM Customers c
INNER JOIN Orders o ON c.CustomerID = o.CustomerID;
-- Execution Plan:
-- Hash Match (Inner Join)
-- ├─ Index Scan (Customers) -- Build input (Build hash table)
-- └─ Index Scan (Orders) -- Probe input (Search hash table)
-- Characteristics: Create hash table and match
-- Application: Large dataset, no index
Merge Join
-- Optimal for joining sorted data
SELECT c.CustomerName, o.OrderDate
FROM Customers c
INNER JOIN Orders o ON c.CustomerID = o.CustomerID
ORDER BY c.CustomerID;
-- Execution Plan:
-- Merge Join (Inner Join)
-- ├─ Index Scan (Customers.PK_Customers) -- Sorted
-- └─ Index Scan (Orders.IX_Orders_CustomerID) -- Sorted
-- Characteristics: Merge assuming both data are sorted
-- Application: Sorted data, 1-to-many join
3. Aggregation and Sort Operators
Sort
-- Sort data
SELECT * FROM Orders ORDER BY TotalAmount DESC;
-- Execution Plan:
-- Sort (ORDER BY: TotalAmount DESC)
-- └─ Clustered Index Scan (Orders)
-- ⚠️ Sort is high cost (may use tempdb)
Stream Aggregate
-- Aggregation of sorted data (Efficient)
SELECT CustomerID, COUNT(*), SUM(TotalAmount)
FROM Orders
GROUP BY CustomerID;
-- Execution Plan (If index is in CustomerID order):
-- Stream Aggregate (GROUP BY: CustomerID)
-- └─ Index Scan (IX_Orders_CustomerID) -- Already sorted
-- Characteristics: Sequentially aggregate sorted data
Hash Aggregate
-- Aggregation without sorting (For large data)
SELECT Status, COUNT(*)
FROM Orders
GROUP BY Status;
-- Execution Plan:
-- Hash Aggregate (GROUP BY: Status)
-- └─ Clustered Index Scan (Orders)
-- Characteristics: Aggregate with hash table
How to Read Execution Plans
1. Check Cost
Query Cost: 100%
├─ Nested Loops Join: 80% ⚠️ Highest cost
│ ├─ Index Scan: 15%
│ └─ Key Lookup: 65% ⚠️ Bottleneck
└─ Index Seek: 20%
Criteria:
- 50% or more: Target for optimization
- Key Lookup: Consider covering index
- Scan: Check if it can be changed to Seek
2. Check Row Count Discrepancy
Operator: Index Seek
├─ Estimated Rows: 10
└─ Actual Rows: 100,000 ⚠️ Large discrepancy
Causes:
- Outdated statistics
- Skewed data distribution
Countermeasures:
-- Update statistics
UPDATE STATISTICS Orders WITH FULLSCAN;
3. Check Warning Marks
⚠️ Warning: Implicit conversion
⚠️ Warning: Missing statistics
⚠️ Warning: Spillover due to insufficient memory
Execution Plan Optimization Examples
Example 1: Removing Key Lookup
Problematic Query:
-- Query
SELECT OrderID, OrderDate, CustomerID, TotalAmount
FROM Orders
WHERE OrderDate = '2024-01-15';
-- Execution Plan:
-- Nested Loops Join (Cost: 100%)
-- ├─ Index Seek (IX_Orders_OrderDate): 10%
-- └─ Key Lookup (PK_Orders): 90% ⚠️ Bottleneck
Solution: Covering Index
-- Create index
CREATE NONCLUSTERED INDEX IX_Orders_OrderDate_Covering
ON Orders (OrderDate)
INCLUDE (CustomerID, TotalAmount);
-- Improved Execution Plan:
-- Index Seek (IX_Orders_OrderDate_Covering): 100%
-- Key Lookup disappeared!
Example 2: From Index Scan to Index Seek
Problematic Query:
-- Query
SELECT * FROM Orders
WHERE YEAR(OrderDate) = 2024;
-- Execution Plan:
-- Clustered Index Scan (Cost: 100%) ⚠️ Full scan
-- Logical Reads: 10,000 pages
Solution: Remove Function
-- Improved Query
SELECT * FROM Orders
WHERE OrderDate >= '2024-01-01'
AND OrderDate < '2025-01-01';
-- Improved Execution Plan:
-- Clustered Index Seek (Cost: 20%)
-- Logical Reads: 2,000 pages (80% reduction!)
Example 3: Adding Appropriate Index
Problematic Query:
-- Query
SELECT * FROM Orders
WHERE CustomerID = 123 AND Status = 'Active';
-- Execution Plan:
-- Clustered Index Scan (Cost: 100%) ⚠️ No index
-- Logical Reads: 10,000 pages
Solution: Composite Index
-- Create composite index
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_Status
ON Orders (CustomerID, Status);
-- Improved Execution Plan:
-- Index Seek (IX_Orders_CustomerID_Status): 100%
-- Logical Reads: 5 pages (99.95% reduction!)
Index Design Best Practices
1. Choosing Clustered Index
-- ✅ Recommended: Monotonically increasing value
CREATE TABLE Orders
(
OrderID BIGINT IDENTITY(1,1) PRIMARY KEY CLUSTERED, -- Sequentially increasing
OrderDate DATETIME,
CustomerID INT
);
-- ❌ Not Recommended: Random value (GUID)
CREATE TABLE Orders
(
OrderID UNIQUEIDENTIFIER DEFAULT NEWID() PRIMARY KEY CLUSTERED, -- Random
OrderDate DATETIME,
CustomerID INT
);
-- Problem: Frequent page splits, fragmentation progresses
Recommended Criteria:
- ✅ Monotonically increasing (IDENTITY, SEQUENCE)
- ✅ Columns frequently used in range searches
- ✅ Frequently used in GROUP BY, ORDER BY
- ❌ Frequently updated columns
- ❌ Random values (NEWID())
2. Choosing Non-Clustered Index
-- ✅ Recommended: Columns frequently used in WHERE clause
CREATE INDEX IX_Orders_CustomerID ON Orders (CustomerID);
-- ✅ Recommended: JOIN columns
CREATE INDEX IX_OrderDetails_OrderID ON OrderDetails (OrderID);
-- ✅ Recommended: Composite index (Combination of search conditions)
CREATE INDEX IX_Orders_CustomerID_Status
ON Orders (CustomerID, Status);
-- ⚠️ Caution: Low selectivity columns (Few unique values)
CREATE INDEX IX_Orders_Status ON Orders (Status);
-- Status has only 2 values 'Active', 'Inactive' -> Limited effect
Recommended Criteria:
- ✅ Frequently used in WHERE clause, JOIN clause
- ✅ High selectivity (Many unique values)
- ✅ Composite index should have most selective column first
- ❌ Very low selectivity columns (Gender, Flags, etc.)
- ❌ Do not create indiscriminately on frequently updated columns
3. Leveraging INCLUDE Clause
-- ✅ Avoid Key Lookup with covering index
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_Covering
ON Orders (CustomerID) -- Key column
INCLUDE (OrderDate, TotalAmount, Status); -- Included columns
-- This query completes with index only
SELECT CustomerID, OrderDate, TotalAmount, Status
FROM Orders
WHERE CustomerID = 123;
-- No Key Lookup -> Fast!
Benefits of INCLUDE:
- Stored only at leaf level (Not included in upper levels of B-Tree)
- Suppress increase in index size
- Avoid Key Lookup
4. Filtered Index
-- ✅ Index only data with specific condition
CREATE NONCLUSTERED INDEX IX_Orders_Active
ON Orders (OrderDate)
WHERE Status = 'Active';
-- This query uses small index
SELECT * FROM Orders
WHERE Status = 'Active' AND OrderDate >= '2024-01-01';
-- Index size: Only Active data -> Small -> Fast