Clustered and Non-Clustered Indexes

What is an Index?

An index is a data structure used to quickly retrieve data from a database table. Similar to an index in a book, it allows you to find the desired information without reading every page.

Purpose of Indexes

Improve Query Performance
- Faster data retrieval
- Optimization of WHERE, JOIN, and ORDER BY clauses
Ensure Uniqueness
- Implementation of PRIMARY KEY and UNIQUE constraints
Efficiency of Foreign Keys
- Faster referential integrity checks

Basic Structure of Indexes: What is B-Tree?

SQL Server indexes use a balanced tree structure called B-Tree (Balanced Tree).

B-Tree Hierarchy

                    Root Node
                    ┌─────────────┐
                    │  [50, 100]  │
                    └──────┬──────┘
           ┌───────────────┼───────────────┐
           │               │               │
    Intermediate Node Intermediate Node Intermediate Node
    ┌──────────┐      ┌──────────┐    ┌──────────┐
    │[10, 30]  │      │[60, 80]  │    │[110, 130]│
    └─────┬────┘      └─────┬────┘    └─────┬────┘
    ┌─────┼─────┐    ┌──────┼─────┐  ┌──────┼─────┐
    │     │     │    │      │     │  │      │     │
Leaf Node Leaf Node Leaf Node Leaf Node Leaf Node Leaf Node
┌──────┐┌──────┐┌──────┐┌──────┐┌──────┐┌──────┐
│1-9   ││11-29 ││31-49 ││51-79 ││81-99 ││101+  │
└──────┘└──────┘└──────┘└──────┘└──────┘└──────┘

Features of B-Tree

Balanced Tree: All leaf nodes are at the same depth
Efficient Search: O(log n) complexity
Sorted: Key values are always sorted
Good for Range Searches: Efficient retrieval of continuous data

Clustered Index

Definition of Clustered Index

A clustered index physically reorders and stores the table data itself according to the index key.

Important Characteristics

Only one per table
- There can be only one physical order of data
- PRIMARY KEY is clustered by default
Leaf nodes contain data rows themselves
- Index and data are integrated
- No additional lookup required
Determines physical order
- Data pages themselves are part of the index

Structure of Clustered Index

Root Node (B-Tree Upper Level)
        │
        ▼
Intermediate Node (B-Tree Middle Level)
        │
        ▼
Leaf Node = Data Page (Actual Table Data)
┌───────────────────────────────────────────┐
│ CustomerID │ CustomerName    │ City       │
├───────────────────────────────────────────┤
│ 1          │ Alice Johnson   │ New York   │
│ 2          │ Bob Smith       │ Chicago    │
│ 3          │ Carol White     │ Seattle    │
└───────────────────────────────────────────┘
↑ Data itself is stored in leaf nodes

Creating a Clustered Index

-- Created as PRIMARY KEY during table creation (Clustered by default)
CREATE TABLE Customers
(
    CustomerID INT PRIMARY KEY CLUSTERED,  -- Clustered Index
    CustomerName NVARCHAR(100),
    Email NVARCHAR(100),
    City NVARCHAR(50)
);

-- Add a clustered index to an existing table
CREATE CLUSTERED INDEX IX_Orders_OrderDate 
ON Orders (OrderDate);

-- Make PRIMARY KEY non-clustered and another column clustered
CREATE TABLE Orders
(
    OrderID INT PRIMARY KEY NONCLUSTERED,  -- Non-Clustered
    OrderDate DATETIME,
    CustomerID INT,
    TotalAmount DECIMAL(18,2)
);

CREATE CLUSTERED INDEX IX_Orders_OrderDate 
ON Orders (OrderDate);

Benefits of Clustered Index

1. Fast Range Searches

-- Range search: Very fast with clustered index
SELECT * 
FROM Orders
WHERE OrderDate BETWEEN '2024-01-01' AND '2024-01-31';
-- Fast reading with sequential I/O because data is physically contiguous

2. No Additional Lookup Required

-- No additional access needed when retrieving all columns
SELECT OrderID, OrderDate, CustomerID, TotalAmount
FROM Orders
WHERE OrderDate = '2024-01-15';
-- Completed in one access because leaf nodes contain all data

3. ORDER BY May Not Be Needed

-- Automatically sorted by clustered index key
SELECT * 
FROM Orders
ORDER BY OrderDate;
-- Additional sort not needed because data is already physically ordered by OrderDate

Disadvantages of Clustered Index

1. Impact on Insertion Performance

-- Page split occurs when inserting with random values
INSERT INTO Orders (OrderID, OrderDate, CustomerID, TotalAmount)
VALUES (5000, '2023-06-15', 123, 500.00);
-- Since OrderDate is in 2023, it is inserted between existing 2024 data
-- -> Page split occurs, performance degrades

2. Physical Movement on Update

-- Updating clustered key involves physical movement
UPDATE Orders
SET OrderDate = '2025-01-01'  -- Change date significantly
WHERE OrderID = 1234;
-- High cost because the row needs to move physically

Non-Clustered Index

Definition of Non-Clustered Index

A non-clustered index manages index keys and pointers to data separately. It is similar to an index in a book, where you find the page number from the index and then access the text.

Important Characteristics

Maximum 999 per table
- SQL Server 2008 and later: 999
- SQL Server 2005 and earlier: 249
Leaf nodes contain pointers
- With Clustered: Stores clustered key value
- Without Clustered (Heap): Stores RID (Row Identifier)
Independent B-Tree Structure
- Has a structure separate from data

Structure of Non-Clustered Index (With Clustered)

Non-Clustered Index
Root Node
    │
    ▼
Intermediate Node
    │
    ▼
Leaf Node (Index Key + Clustered Key)
┌────────────────────────────────┐
│ Email              │ CustomerID│ ← Pointer to Clustered Key
├────────────────────────────────┤
│ alice@example.com  │ 1         │
│ bob@example.com    │ 2         │
│ carol@example.com  │ 3         │
└────────────────────────────────┘
         │
         ▼ Lookup via Clustered Index
┌───────────────────────────────────────────┐
│ CustomerID │ CustomerName    │ City       │ ← Clustered Index (Actual Data)
├───────────────────────────────────────────┤
│ 1          │ Alice Johnson   │ New York   │
│ 2          │ Bob Smith       │ Chicago    │
│ 3          │ Carol White     │ Seattle    │
└───────────────────────────────────────────┘

Creating a Non-Clustered Index

-- Single column non-clustered index
CREATE NONCLUSTERED INDEX IX_Customers_Email
ON Customers (Email);

-- Composite index (multiple columns)
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_OrderDate
ON Orders (CustomerID, OrderDate);

-- Covering index using INCLUDE clause
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_INC
ON Orders (CustomerID)
INCLUDE (OrderDate, TotalAmount);

-- Filtered index
CREATE NONCLUSTERED INDEX IX_Orders_ActiveOnly
ON Orders (OrderDate)
WHERE Status = 'Active';

-- Unique index
CREATE UNIQUE NONCLUSTERED INDEX UQ_Customers_Email
ON Customers (Email);

Benefits of Non-Clustered Index

1. Fast Search for Specific Columns

-- Fast search by Email
SELECT CustomerID, CustomerName
FROM Customers
WHERE Email = 'alice@example.com';
-- Fast search using IX_Customers_Email index

2. Support Diverse Queries with Multiple Indexes

-- Support different query patterns
CREATE INDEX IX_Orders_CustomerID ON Orders (CustomerID);
CREATE INDEX IX_Orders_OrderDate ON Orders (OrderDate);
CREATE INDEX IX_Orders_Status ON Orders (Status);

-- Each query uses the appropriate index
SELECT * FROM Orders WHERE CustomerID = 123;     -- IX_Orders_CustomerID
SELECT * FROM Orders WHERE OrderDate = '2024-01-01';  -- IX_Orders_OrderDate
SELECT * FROM Orders WHERE Status = 'Active';    -- IX_Orders_Status

3. Best Performance with Covering Index

-- Create covering index with INCLUDE clause
CREATE INDEX IX_Orders_CustomerID_Covering
ON Orders (CustomerID)
INCLUDE (OrderDate, TotalAmount, Status);

-- This query completes with index only (No Key Lookup)
SELECT OrderDate, TotalAmount, Status
FROM Orders
WHERE CustomerID = 123;
-- No table access needed because index contains all necessary columns

Disadvantages of Non-Clustered Index

1. Key Lookup Overhead

-- Retrieving columns not in the index
SELECT CustomerID, CustomerName, Email, Phone, Address
FROM Customers
WHERE Email = 'alice@example.com';

-- Process executed:
-- 1. Search Email in IX_Customers_Email -> Get CustomerID (Clustered Key)
-- 2. Lookup CustomerID in Clustered Index -> Get all data
-- ⚠️ Requires 2 accesses (Key Lookup)

2. Storage Overhead

-- More indexes consume more storage
-- Each non-clustered index has an independent B-Tree structure

-- Example: 5 non-clustered indexes on Customers table
-- -> Requires storage for table data + 5 index trees

3. Update Cost

-- When updating data, all related indexes must also be updated
UPDATE Orders
SET TotalAmount = 1000.00
WHERE OrderID = 123;

-- Elements updated:
-- 1. Table data (Clustered Index)
-- 2. All non-clustered indexes containing TotalAmount
-- 3. Related statistics

Clustered vs Non-Clustered: Comparison Table

Item	Clustered Index	Non-Clustered Index
Number per Table	Only 1	Max 999
Leaf Node	Data row itself	Index key + Pointer
Physical Arrangement	Arranged in index order	Separate structure
Range Search	Very fast	Fast (but Key Lookup may occur)
Storage	Integrated with table data	Additional storage required
Insertion Performance	Potential for page splits	Less impact
Update Performance	Physical movement on key update	Updated independently
Recommended Use	PRIMARY KEY, Range Search	WHERE clause, JOIN columns

Index Seek vs Index Scan

There are mainly two ways SQL Server accesses data using indexes.

Index Seek

Definition: A method to efficiently search for specific rows using the B-Tree structure.

How it works

Navigate B-Tree from top to bottom

Root Node: Check [50, 100]
   ↓ 25 < 50 so go left
Intermediate Node: Check [10, 30]
   ↓ 25 > 10 and 25 < 30 so go center
Leaf Node: Directly access page containing 25
   ↓
Retrieve target data (access only a few pages)

Conditions for Index Seek

-- 1. Equality condition (=)
SELECT * FROM Customers WHERE CustomerID = 123;

-- 2. Range condition (small range)
SELECT * FROM Orders 
WHERE OrderDate BETWEEN '2024-01-01' AND '2024-01-07';

-- 3. IN clause (few values)
SELECT * FROM Customers WHERE CustomerID IN (1, 2, 3);

-- 4. Using leading column of composite index
-- Index: IX_Orders_CustomerID_OrderDate (CustomerID, OrderDate)
SELECT * FROM Orders WHERE CustomerID = 123;  -- ✓ Seek possible

-- 5. LIKE clause (prefix match)
SELECT * FROM Customers WHERE Email LIKE 'alice%';

Features of Index Seek

✅ Fast: Access only necessary pages
✅ Efficient: Minimal I/O
✅ Scalable: Little speed degradation as data volume increases
📊 Logical Reads: Usually a few to dozens of pages

Index Scan

Definition: A method to scan the leaf level of the index sequentially from the beginning.

How it works

Sequentially read all leaf nodes of the index

Leaf Node 1 -> Leaf Node 2 -> Leaf Node 3 -> ... -> Leaf Node N
   ↓            ↓            ↓                ↓
Scan all pages (Access all pages)

Conditions for Index Scan

-- 1. No WHERE clause
SELECT * FROM Orders;

-- 2. Condition not using index key
-- Index: IX_Orders_OrderDate (OrderDate)
SELECT * FROM Orders WHERE CustomerID = 123;  -- Not using OrderDate

-- 3. Using functions
SELECT * FROM Orders WHERE YEAR(OrderDate) = 2024;

-- 4. Using NOT, !=, <>
SELECT * FROM Orders WHERE Status != 'Cancelled';

-- 5. LIKE clause (suffix match, infix match)
SELECT * FROM Customers WHERE Email LIKE '%@example.com';

-- 6. OR condition (different indexes)
SELECT * FROM Orders 
WHERE CustomerID = 123 OR OrderDate = '2024-01-01';

-- 7. Not using leading column of composite index
-- Index: IX_Orders_CustomerID_OrderDate (CustomerID, OrderDate)
SELECT * FROM Orders WHERE OrderDate = '2024-01-01';  -- ✗ Scan occurs

Features of Index Scan

⚠️ Slow: Access all index pages
⚠️ Increased I/O: Large amount of page reads
⚠️ Scale Issue: Slows down in proportion to data volume
📊 Logical Reads: Thousands to tens of thousands of pages or more

Table Scan

Definition: A method to read the entire table without using an index.

-- Heap (No Clustered Index)
-- Occurs when index cannot be used or does not exist

-- Example: Search on column without index
SELECT * FROM Orders WHERE Notes LIKE '%urgent%';
-- No index on Notes -> Table Scan

Features of Table Scan

❌ Very Slow: Reads entire table
❌ Worst Performance: Most I/O
⚠️ Acceptable for small tables: No problem for a few hundred rows

Seek vs Scan: Example

-- Table: Orders (1 million rows)
-- Index: IX_Orders_OrderDate (OrderDate)
-- Index: IX_Orders_CustomerID (CustomerID)

-- ✅ Index Seek (Fast)
SELECT * FROM Orders WHERE OrderDate = '2024-01-15';
-- Execution: Seek on IX_Orders_OrderDate
-- Logical Reads: 5 pages
-- Execution Time: 1ms

-- ⚠️ Index Scan (Slow)
SELECT * FROM Orders WHERE YEAR(OrderDate) = 2024;
-- Execution: Scan on IX_Orders_OrderDate (due to function use)
-- Logical Reads: 5,000 pages
-- Execution Time: 500ms

-- Improvement: Do not use function
SELECT * FROM Orders 
WHERE OrderDate >= '2024-01-01' AND OrderDate < '2025-01-01';
-- Execution: Seek on IX_Orders_OrderDate
-- Logical Reads: 1,000 pages (only 2024 data)
-- Execution Time: 100ms

Criteria: When Seek and When Scan

Condition	Operation	Reason
Retrieved rows < 1% of total	Index Seek	Efficiently retrieve few rows
Retrieved rows > 20% of total	Index/Table Scan	Bulk reading is more efficient than repeated Seeks
No WHERE clause	Table/Index Scan	To retrieve all rows
Equality condition on index key	Index Seek	B-Tree search is optimal
Function usage	Index Scan	Index key is transformed
Small table (<1000 rows)	Table Scan	Faster than Seek overhead

Execution Plan

What is an Execution Plan?

An execution plan shows the specific steps selected by SQL Server to execute a query. It is a "blueprint of the optimal execution method" generated by the query optimizer.

Importance of Execution Plan

Identify performance bottlenecks
Check effectiveness of indexes
Determine direction of query optimization
Diagnose unexpected behavior

Types of Execution Plans

1. Estimated Execution Plan

-- SQL Server Management Studio (SSMS)
-- Ctrl + L or "Display Estimated Execution Plan" from menu

Displayed without executing the query
Plan generated by optimizer
Prediction based on statistics

2. Actual Execution Plan

-- SQL Server Management Studio (SSMS)
-- Enable with Ctrl + M then execute query
-- Or "Include Actual Execution Plan" from menu

Obtained by actually executing the query
Includes actual row counts and execution time
Best for performance analysis

How to View Execution Plans

Method 1: SSMS Graphical Execution Plan

-- 1. Write query in SSMS
SELECT c.CustomerID, c.CustomerName, o.OrderDate, o.TotalAmount
FROM Customers c
INNER JOIN Orders o ON c.CustomerID = o.CustomerID
WHERE o.OrderDate >= '2024-01-01';

-- 2. Press Ctrl + M to enable "Include Actual Execution Plan"
-- 3. Execute query
-- 4. Open "Execution Plan" tab

How to read graphical plan:

                     SELECT
                       ↑
                  Nested Loops Join (80%)
                   ↗         ↖
    Index Seek (10%)      Index Seek (10%)
    IX_Customers_PK       IX_Orders_OrderDate
          ↑                     ↑
    [Customers]           [Orders]

Information displayed on each operator:
- Operator name (Index Seek, Nested Loops Join, etc.)
- Cost (%): Relative cost to the entire query
- Estimated rows vs Actual rows
- Estimated number of executions

Method 2: SET STATISTICS PROFILE

-- Text format execution plan
SET SHOWPLAN_TEXT ON;
GO

SELECT * FROM Orders WHERE OrderDate = '2024-01-15';
GO

SET SHOWPLAN_TEXT OFF;
GO

-- Output Example:
-- StmtText
-- |--Index Seek(OBJECT:([Orders].[IX_Orders_OrderDate]), 
--              SEEK:([OrderDate]='2024-01-15'))

Method 3: SET STATISTICS IO/TIME

-- Display I/O statistics and timing information
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT * FROM Orders WHERE OrderDate = '2024-01-15';

-- Output:
-- Table 'Orders'. Scan count 1, logical reads 5, physical reads 0
-- SQL Server Execution Times:
--    CPU time = 0 ms,  elapsed time = 1 ms.

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;

Method 4: XML Format Execution Plan

SET SHOWPLAN_XML ON;
GO

SELECT * FROM Orders WHERE OrderDate = '2024-01-15';
GO

SET SHOWPLAN_XML OFF;
GO

-- Get detailed information in XML (Useful for programmatic analysis)

Major Operators in Execution Plan

1. Data Access Operators

Clustered Index Scan

-- Scan entire clustered index
SELECT * FROM Orders;

-- Execution Plan:
-- Clustered Index Scan (Orders)
-- Cost: 100%
-- Logical Reads: 10,000 pages

Clustered Index Seek

-- Search specific row in clustered index
SELECT * FROM Orders WHERE OrderID = 123;

-- Execution Plan:
-- Clustered Index Seek (Orders.PK_Orders)
-- Seek Predicates: OrderID = 123
-- Cost: 0.0033%
-- Logical Reads: 3 pages

Index Scan (Non-Clustered)

-- Scan entire non-clustered index
SELECT OrderDate FROM Orders;

-- Execution Plan:
-- Index Scan (Orders.IX_Orders_OrderDate)
-- Cost: 50%
-- Logical Reads: 2,000 pages

Index Seek (Non-Clustered)

-- Search specific row in non-clustered index
SELECT OrderID FROM Orders WHERE OrderDate = '2024-01-15';

-- Execution Plan:
-- Index Seek (Orders.IX_Orders_OrderDate)
-- Seek Predicates: OrderDate = '2024-01-15'
-- Cost: 0.01%
-- Logical Reads: 5 pages

Key Lookup

-- Additional access to retrieve columns not in index
SELECT OrderID, OrderDate, CustomerName, TotalAmount
FROM Orders
WHERE OrderDate = '2024-01-15';

-- Execution Plan:
-- Nested Loops Join
--   ├─ Index Seek (IX_Orders_OrderDate)  -- Search by OrderDate
--   └─ Key Lookup (PK_Orders)            -- Retrieve remaining columns
-- ⚠️ Key Lookup is high cost (Clustered access for each row)

Table Scan

-- Scan entire heap table (no clustered index)
SELECT * FROM TempData;  -- No clustered index

-- Execution Plan:
-- Table Scan (TempData)
-- Cost: 100%
-- Worst performance

2. Join Operators

Nested Loops Join

-- Optimal for joining small datasets
SELECT c.CustomerName, o.OrderDate
FROM Customers c
INNER JOIN Orders o ON c.CustomerID = o.CustomerID
WHERE c.CustomerID = 123;

-- Execution Plan:
-- Nested Loops Join
--   ├─ Index Seek (Customers.PK_Customers)  -- Outer loop (Driving table)
--   └─ Index Seek (Orders.IX_Orders_CustomerID)  -- Inner loop
-- Characteristics: Search inner for each row of outer (Loop)
-- Application: Small dataset, index exists

Hash Match Join

-- Optimal for joining large datasets
SELECT c.CustomerName, o.OrderDate
FROM Customers c
INNER JOIN Orders o ON c.CustomerID = o.CustomerID;

-- Execution Plan:
-- Hash Match (Inner Join)
--   ├─ Index Scan (Customers)  -- Build input (Build hash table)
--   └─ Index Scan (Orders)     -- Probe input (Search hash table)
-- Characteristics: Create hash table and match
-- Application: Large dataset, no index

Merge Join

-- Optimal for joining sorted data
SELECT c.CustomerName, o.OrderDate
FROM Customers c
INNER JOIN Orders o ON c.CustomerID = o.CustomerID
ORDER BY c.CustomerID;

-- Execution Plan:
-- Merge Join (Inner Join)
--   ├─ Index Scan (Customers.PK_Customers) -- Sorted
--   └─ Index Scan (Orders.IX_Orders_CustomerID) -- Sorted
-- Characteristics: Merge assuming both data are sorted
-- Application: Sorted data, 1-to-many join

3. Aggregation and Sort Operators

Sort

-- Sort data
SELECT * FROM Orders ORDER BY TotalAmount DESC;

-- Execution Plan:
-- Sort (ORDER BY: TotalAmount DESC)
--   └─ Clustered Index Scan (Orders)
-- ⚠️ Sort is high cost (may use tempdb)

Stream Aggregate

-- Aggregation of sorted data (Efficient)
SELECT CustomerID, COUNT(*), SUM(TotalAmount)
FROM Orders
GROUP BY CustomerID;

-- Execution Plan (If index is in CustomerID order):
-- Stream Aggregate (GROUP BY: CustomerID)
--   └─ Index Scan (IX_Orders_CustomerID) -- Already sorted
-- Characteristics: Sequentially aggregate sorted data

Hash Aggregate

-- Aggregation without sorting (For large data)
SELECT Status, COUNT(*)
FROM Orders
GROUP BY Status;

-- Execution Plan:
-- Hash Aggregate (GROUP BY: Status)
--   └─ Clustered Index Scan (Orders)
-- Characteristics: Aggregate with hash table

How to Read Execution Plans

1. Check Cost

Query Cost: 100%
├─ Nested Loops Join: 80% ⚠️ Highest cost
│  ├─ Index Scan: 15%
│  └─ Key Lookup: 65% ⚠️ Bottleneck
└─ Index Seek: 20%

Criteria:

50% or more: Target for optimization
Key Lookup: Consider covering index
Scan: Check if it can be changed to Seek

2. Check Row Count Discrepancy

Operator: Index Seek
├─ Estimated Rows: 10
└─ Actual Rows: 100,000 ⚠️ Large discrepancy

Causes:

Outdated statistics
Skewed data distribution

Countermeasures:

-- Update statistics
UPDATE STATISTICS Orders WITH FULLSCAN;

3. Check Warning Marks

⚠️ Warning: Implicit conversion
⚠️ Warning: Missing statistics
⚠️ Warning: Spillover due to insufficient memory

Execution Plan Optimization Examples

Example 1: Removing Key Lookup

Problematic Query:

-- Query
SELECT OrderID, OrderDate, CustomerID, TotalAmount
FROM Orders
WHERE OrderDate = '2024-01-15';

-- Execution Plan:
-- Nested Loops Join (Cost: 100%)
--   ├─ Index Seek (IX_Orders_OrderDate): 10%
--   └─ Key Lookup (PK_Orders): 90% ⚠️ Bottleneck

Solution: Covering Index

-- Create index
CREATE NONCLUSTERED INDEX IX_Orders_OrderDate_Covering
ON Orders (OrderDate)
INCLUDE (CustomerID, TotalAmount);

-- Improved Execution Plan:
-- Index Seek (IX_Orders_OrderDate_Covering): 100%
-- Key Lookup disappeared!

Example 2: From Index Scan to Index Seek

Problematic Query:

-- Query
SELECT * FROM Orders
WHERE YEAR(OrderDate) = 2024;

-- Execution Plan:
-- Clustered Index Scan (Cost: 100%) ⚠️ Full scan
-- Logical Reads: 10,000 pages

Solution: Remove Function

-- Improved Query
SELECT * FROM Orders
WHERE OrderDate >= '2024-01-01' 
  AND OrderDate < '2025-01-01';

-- Improved Execution Plan:
-- Clustered Index Seek (Cost: 20%)
-- Logical Reads: 2,000 pages (80% reduction!)

Example 3: Adding Appropriate Index

Problematic Query:

-- Query
SELECT * FROM Orders
WHERE CustomerID = 123 AND Status = 'Active';

-- Execution Plan:
-- Clustered Index Scan (Cost: 100%) ⚠️ No index
-- Logical Reads: 10,000 pages

Solution: Composite Index

-- Create composite index
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_Status
ON Orders (CustomerID, Status);

-- Improved Execution Plan:
-- Index Seek (IX_Orders_CustomerID_Status): 100%
-- Logical Reads: 5 pages (99.95% reduction!)

Index Design Best Practices

1. Choosing Clustered Index

-- ✅ Recommended: Monotonically increasing value
CREATE TABLE Orders
(
    OrderID BIGINT IDENTITY(1,1) PRIMARY KEY CLUSTERED,  -- Sequentially increasing
    OrderDate DATETIME,
    CustomerID INT
);

-- ❌ Not Recommended: Random value (GUID)
CREATE TABLE Orders
(
    OrderID UNIQUEIDENTIFIER DEFAULT NEWID() PRIMARY KEY CLUSTERED,  -- Random
    OrderDate DATETIME,
    CustomerID INT
);
-- Problem: Frequent page splits, fragmentation progresses

Recommended Criteria:

✅ Monotonically increasing (IDENTITY, SEQUENCE)
✅ Columns frequently used in range searches
✅ Frequently used in GROUP BY, ORDER BY
❌ Frequently updated columns
❌ Random values (NEWID())

2. Choosing Non-Clustered Index

-- ✅ Recommended: Columns frequently used in WHERE clause
CREATE INDEX IX_Orders_CustomerID ON Orders (CustomerID);

-- ✅ Recommended: JOIN columns
CREATE INDEX IX_OrderDetails_OrderID ON OrderDetails (OrderID);

-- ✅ Recommended: Composite index (Combination of search conditions)
CREATE INDEX IX_Orders_CustomerID_Status 
ON Orders (CustomerID, Status);

-- ⚠️ Caution: Low selectivity columns (Few unique values)
CREATE INDEX IX_Orders_Status ON Orders (Status);
-- Status has only 2 values 'Active', 'Inactive' -> Limited effect

Recommended Criteria:

✅ Frequently used in WHERE clause, JOIN clause
✅ High selectivity (Many unique values)
✅ Composite index should have most selective column first
❌ Very low selectivity columns (Gender, Flags, etc.)
❌ Do not create indiscriminately on frequently updated columns

3. Leveraging INCLUDE Clause

-- ✅ Avoid Key Lookup with covering index
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID_Covering
ON Orders (CustomerID)  -- Key column
INCLUDE (OrderDate, TotalAmount, Status);  -- Included columns

-- This query completes with index only
SELECT CustomerID, OrderDate, TotalAmount, Status
FROM Orders
WHERE CustomerID = 123;
-- No Key Lookup -> Fast!

Benefits of INCLUDE:

Stored only at leaf level (Not included in upper levels of B-Tree)
Suppress increase in index size
Avoid Key Lookup

4. Filtered Index

-- ✅ Index only data with specific condition
CREATE NONCLUSTERED INDEX IX_Orders_Active
ON Orders (OrderDate)
WHERE Status = 'Active';

-- This query uses small index
SELECT * FROM Orders
WHERE Status = 'Active' AND OrderDate >= '2024-01-01';
-- Index size: Only Active data -> Small -> Fast

What is an Index?​

Purpose of Indexes​

Basic Structure of Indexes: What is B-Tree?​

B-Tree Hierarchy​

Features of B-Tree​

Clustered Index​

Definition of Clustered Index​

Important Characteristics​

Structure of Clustered Index​

Creating a Clustered Index​

Benefits of Clustered Index​

1. Fast Range Searches​

2. No Additional Lookup Required​

3. ORDER BY May Not Be Needed​

Disadvantages of Clustered Index​

1. Impact on Insertion Performance​

2. Physical Movement on Update​

Non-Clustered Index​

Definition of Non-Clustered Index​

Important Characteristics​

Structure of Non-Clustered Index (With Clustered)​

Creating a Non-Clustered Index​

Benefits of Non-Clustered Index​

1. Fast Search for Specific Columns​

2. Support Diverse Queries with Multiple Indexes​

3. Best Performance with Covering Index​

Disadvantages of Non-Clustered Index​

1. Key Lookup Overhead​

2. Storage Overhead​

3. Update Cost​

Clustered vs Non-Clustered: Comparison Table​

Index Seek vs Index Scan​

Index Seek​

How it works​

Conditions for Index Seek​

Features of Index Seek​

Index Scan​

How it works​

Conditions for Index Scan​

Features of Index Scan​

Table Scan​

Features of Table Scan​

Seek vs Scan: Example​

Criteria: When Seek and When Scan​

Execution Plan​

What is an Execution Plan?​

Importance of Execution Plan​

Types of Execution Plans​

1. Estimated Execution Plan​

2. Actual Execution Plan​

How to View Execution Plans​

Method 1: SSMS Graphical Execution Plan​

Method 2: SET STATISTICS PROFILE​

Method 3: SET STATISTICS IO/TIME​

Method 4: XML Format Execution Plan​

Major Operators in Execution Plan​

1. Data Access Operators​

2. Join Operators​

3. Aggregation and Sort Operators​

How to Read Execution Plans​

1. Check Cost​

2. Check Row Count Discrepancy​

3. Check Warning Marks​

Execution Plan Optimization Examples​

Example 1: Removing Key Lookup​

Example 2: From Index Scan to Index Seek​

Example 3: Adding Appropriate Index​

Index Design Best Practices​

1. Choosing Clustered Index​

2. Choosing Non-Clustered Index​

3. Leveraging INCLUDE Clause​

4. Filtered Index​

What is an Index?

Purpose of Indexes

Basic Structure of Indexes: What is B-Tree?

B-Tree Hierarchy

Features of B-Tree

Clustered Index

Definition of Clustered Index

Important Characteristics

Structure of Clustered Index

Creating a Clustered Index

Benefits of Clustered Index

1. Fast Range Searches

2. No Additional Lookup Required

3. ORDER BY May Not Be Needed

Disadvantages of Clustered Index

1. Impact on Insertion Performance

2. Physical Movement on Update

Non-Clustered Index

Definition of Non-Clustered Index

Important Characteristics

Structure of Non-Clustered Index (With Clustered)

Creating a Non-Clustered Index

Benefits of Non-Clustered Index

1. Fast Search for Specific Columns

2. Support Diverse Queries with Multiple Indexes

3. Best Performance with Covering Index

Disadvantages of Non-Clustered Index

1. Key Lookup Overhead

2. Storage Overhead

3. Update Cost

Clustered vs Non-Clustered: Comparison Table

Index Seek vs Index Scan

Index Seek

How it works

Conditions for Index Seek

Features of Index Seek

Index Scan

How it works

Conditions for Index Scan

Features of Index Scan

Table Scan

Features of Table Scan

Seek vs Scan: Example

Criteria: When Seek and When Scan

Execution Plan

What is an Execution Plan?

Importance of Execution Plan

Types of Execution Plans

1. Estimated Execution Plan

2. Actual Execution Plan

How to View Execution Plans

Method 1: SSMS Graphical Execution Plan

Method 2: SET STATISTICS PROFILE

Method 3: SET STATISTICS IO/TIME

Method 4: XML Format Execution Plan

Major Operators in Execution Plan

1. Data Access Operators

2. Join Operators

3. Aggregation and Sort Operators

How to Read Execution Plans

1. Check Cost

2. Check Row Count Discrepancy

3. Check Warning Marks

Execution Plan Optimization Examples

Example 1: Removing Key Lookup

Example 2: From Index Scan to Index Seek

Example 3: Adding Appropriate Index

Index Design Best Practices

1. Choosing Clustered Index

2. Choosing Non-Clustered Index

3. Leveraging INCLUDE Clause

4. Filtered Index