SQL Index Fragmentation and Maintenance
What is Index Fragmentation?
Index fragmentation is a significant factor affecting SQL Server performance. It refers to the phenomenon where the physical arrangement of the index no longer matches the logical order due to operations like INSERT, UPDATE, and DELETE over time.
Types of Fragmentation
SQL Server evaluates index health mainly from two perspectives:
-
Logical Fragmentation
- A state where the logical order based on index key values does not match the physical page order.
- Measured by
avg_fragmentation_in_percent. - High values mean increased random I/O.
-
Page Density
- The percentage of data occupying a page.
- Measured by
avg_page_space_used_in_percent. - Low values mean more pages are needed to store the same data, increasing I/O.
Causes of Fragmentation
1. Data Insertion
When new rows are added to a table, existing pages may be split (Page Split).
-- Example of page split
-- Insert a new row when existing page is full
INSERT INTO Orders (OrderID, CustomerID, OrderDate)
VALUES (12345, 'CUST001', GETDATE());
-- -> Page split occurs, data is distributed across two pages
2. Data Update
When variable-length data types (VARCHAR, NVARCHAR) are updated and the row size changes, it may no longer fit in the page.
3. Data Deletion
Even if rows are deleted, the page is not immediately released, leaving empty space.
Impact of Fragmentation on Performance
1. Increased I/O
- Fragmented Index: Requires multiple small I/O requests.
- Non-fragmented Index: Can be done with fewer large I/O requests.
2. Sequential I/O vs Random I/O
┌───────────────────────────────────┐
│ Sequential I/O (No Fragmentation) │
│ [Page1][Page2][Page3][Page4] │
│ ↓ Efficient continuous reading │
└───────────────────────────────────┘
┌───────────────────────────────────┐
│ Random I/O (With Fragmentation) │
│ [Page1]...[Page4]...[Page2][Page3]│
│ ↓ Inefficient random access │
└───────────────────────────────────┘
3. Reduced Memory Efficiency
Low page density requires more memory to cache the same data.
How to Measure Fragmentation
Using sys.dm_db_index_physical_stats
-- Check fragmentation and page density of all indexes in current database
SELECT
OBJECT_SCHEMA_NAME(ips.object_id) AS schema_name,
OBJECT_NAME(ips.object_id) AS object_name,
i.name AS index_name,
i.type_desc AS index_type,
ips.avg_fragmentation_in_percent,
ips.avg_page_space_used_in_percent,
ips.page_count,
ips.alloc_unit_type_desc
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'SAMPLED') AS ips
INNER JOIN sys.indexes AS i
ON ips.object_id = i.object_id
AND ips.index_id = i.index_id
ORDER BY page_count DESC;
Interpreting Results
| avg_fragmentation_in_percent | Recommended Action |
|---|---|
| < 5% | No maintenance needed |
| 5% - 30% | Consider Reorganize (REORGANIZE) |
| > 30% | Consider Rebuild (REBUILD) |
Index Maintenance Methods
1. Index Reorganize (REORGANIZE)
Features:
- Online operation (fewer locks)
- Low resource consumption
- Optimizes leaf level only
- Interruptible (progress is saved)
Syntax:
-- Reorganize single index
ALTER INDEX IX_Orders_CustomerID
ON Sales.Orders
REORGANIZE;
-- Reorganize all indexes on table
ALTER INDEX ALL
ON Sales.Orders
REORGANIZE;
Benefits:
- Minimal impact on running queries in production environment
- Low transaction log usage
- No additional disk space required
2. Index Rebuild (REBUILD)
Features:
- Drops and recreates the index
- Eliminates fragmentation at all levels
- Can be run offline or online
- Consumes more resources
Syntax:
-- Offline rebuild (Fast but locks occur)
ALTER INDEX PK_Orders_OrderID
ON Sales.Orders
REBUILD;
-- Online rebuild (Table accessible during execution)
ALTER INDEX PK_Orders_OrderID
ON Sales.Orders
REBUILD WITH (ONLINE = ON);
-- Resumable online rebuild (Can be paused/resumed)
ALTER INDEX PK_Orders_OrderID
ON Sales.Orders
REBUILD WITH (ONLINE = ON, RESUMABLE = ON, MAX_DURATION = 240);
Options:
-- Specify FILL FACTOR (To reduce page splits)
ALTER INDEX IX_Orders_OrderDate
ON Sales.Orders
REBUILD WITH (FILLFACTOR = 80);
-- Disable statistics update (If updating separately)
ALTER INDEX ALL
ON Sales.Orders
REBUILD WITH (STATISTICS_NORECOMPUTE = ON);
3. Selection Criteria: Reorganize vs Rebuild
| Item | Reorganize (REORGANIZE) | Rebuild (REBUILD) |
|---|---|---|
| Fragmentation Rate | 5% - 30% | > 30% |
| Resource Consumption | Low | High |
| Execution Time | Short to Medium | Medium to Long |
| Locks | Minimal | Exclusive lock when offline |
| Transaction Log | Low | High |
| Disk Space | None | Approx. 2x index size |
| Statistics Update | None | Updated with full scan |
Index Maintenance Strategy
Recommended Approach
-
Start with Measurement
- Regularly monitor index fragmentation
- Check actual impact on performance
-
Importance of Statistics Update
- Often, improvement from index rebuild is due to statistics update
- Statistics update can be executed at lower cost
-- Update statistics (Full scan)
UPDATE STATISTICS Sales.Orders
WITH FULLSCAN;
-- Update statistics for specific index only
UPDATE STATISTICS Sales.Orders IX_Orders_CustomerID
WITH FULLSCAN;
- Phased Approach
-- Step 1: Check fragmentation
-- Step 2: Update statistics and measure performance
-- Step 3: If improvement is insufficient, execute Reorganize
-- Step 4: If still insufficient, consider Rebuild
Considerations for Azure SQL Database / Azure SQL Managed Instance
Specific Constraints
-
Resource Governance
- Limits on CPU, memory, I/O
- Maintenance execution affects other workloads
-
Cost Optimization
- Often provisioned with minimal resources
- Little surplus resources for maintenance
-
Impact on Replication
- Increased latency on replicas when using read scale-out
- Same applies when using Geo-replication
Recommendations
-- Use online reorganize (Recommended)
ALTER INDEX IX_Orders_CustomerID
ON Sales.Orders
REORGANIZE;
-- Or resumable online rebuild
ALTER INDEX IX_Orders_CustomerID
ON Sales.Orders
REBUILD WITH (ONLINE = ON, RESUMABLE = ON);
Azure Specific Scenarios:
- Improving page density before file compression
- Dealing with database size limits
Concerns When Using GUID as Primary Key
Characteristics of GUID (Globally Unique Identifier)
GUID is a 16-byte binary value that can generate globally unique identifiers.
-- Example of using GUID as primary key
CREATE TABLE Customers
(
CustomerID UNIQUEIDENTIFIER DEFAULT NEWID() PRIMARY KEY,
CustomerName NVARCHAR(100),
Email NVARCHAR(100)
);
Main Issues with GUID
1. Fragmentation due to Randomness
Issue:
- GUIDs generated by
NEWID()are random - Using as clustered index key causes page splits on every insertion
- Index becomes extremely fragmented
Insertion pattern of random GUID:
┌────────────────────────────────────┐
│ [GUID-5] [GUID-2] [GUID-8] [GUID-1]│
│ ↓ Page splits occur everywhere │
└────────────────────────────────────┘
Insertion pattern of sequential ID:
┌────────────────────────────────────┐
│ [ID-1] [ID-2] [ID-3] [ID-4] │
│ ↓ Insert only at end (No split) │
└────────────────────────────────────┘
2. Storage Overhead
| Data Type | Size | Note |
|---|---|---|
| INT | 4 bytes | Max approx. 2.1 billion rows |
| BIGINT | 8 bytes | Max approx. 9.22 quintillion rows |
| UNIQUEIDENTIFIER (GUID) | 16 bytes | 4x INT, 2x BIGINT |
Impact:
- Primary key value is copied to each non-clustered index
- Storage usage increases significantly
- I/O and memory consumption also increase
3. Impact on Performance
-- Inefficient: Random GUID
CREATE TABLE Orders
(
OrderID UNIQUEIDENTIFIER DEFAULT NEWID() PRIMARY KEY CLUSTERED,
CustomerID INT,
OrderDate DATETIME
);
-- Issues:
-- ✗ Page split at random position on every insertion
-- ✗ Index fragmentation
-- ✗ Increased I/O
-- Improvement 1: Use IDENTITY
CREATE TABLE Orders
(
OrderID BIGINT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
OrderGUID UNIQUEIDENTIFIER DEFAULT NEWID() UNIQUE NONCLUSTERED,
CustomerID INT,
OrderDate DATETIME
);
-- Benefits:
-- ✓ Minimal page splits with sequential insertion
-- ✓ GUID kept as non-clustered index
-- ✓ Prevent fragmentation
-- Improvement 2: Use NEWSEQUENTIALID()
CREATE TABLE Orders
(
OrderID UNIQUEIDENTIFIER DEFAULT NEWSEQUENTIALID() PRIMARY KEY CLUSTERED,
CustomerID INT,
OrderDate DATETIME
);
-- Benefits:
-- ✓ GUIDs are generated sequentially
-- ✓ Page splits significantly reduced
-- Cautions:
-- ⚠ Not perfectly sequential, but partially sequential
-- ⚠ Order resets on server restart
Legitimate Reasons to Use GUID
GUID may be appropriate in some cases:
-
Distributed Systems
- Need to generate IDs independently on multiple servers
- Merge replication
-
Integration with External Systems
- External API requires GUID
- Global uniqueness is mandatory
-
Security
- Want to prevent ID guessing
Recommended Design Patterns
Pattern 1: Surrogate Key + GUID (Recommended)
CREATE TABLE Products
(
-- Sequential ID for internal use (Clustered Index)
ProductID BIGINT IDENTITY(1,1) PRIMARY KEY CLUSTERED,
-- GUID for external exposure (Non-clustered Unique Index)
ProductGUID UNIQUEIDENTIFIER DEFAULT NEWID() UNIQUE NONCLUSTERED,
ProductName NVARCHAR(100),
Price DECIMAL(18,2)
);
-- Use ProductID internally (Fast)
-- Expose ProductGUID to external APIs (Secure)
Pattern 2: Sequential GUID
CREATE TABLE Sessions
(
SessionID UNIQUEIDENTIFIER DEFAULT NEWSEQUENTIALID() PRIMARY KEY CLUSTERED,
UserID INT,
LoginTime DATETIME
);
-- Constraints of NEWSEQUENTIALID():
-- - Can only be used in DEFAULT constraint
-- - Cannot be called directly in INSERT statement
Pattern 3: Composite Key (User ID + Transaction ID)
-- Composite key to reduce latch contention
CREATE TABLE Transactions
(
UserID INT,
TransactionID BIGINT,
Amount DECIMAL(18,2),
TransactionDate DATETIME,
-- Random distribution by putting UserID first
PRIMARY KEY CLUSTERED (UserID, TransactionID)
);
-- Benefits:
-- ✓ Insertions distributed per user
-- ✓ Reduce latch contention
-- ✓ Page splits occur within each user range
Maintenance Best Practices
1. Regular Monitoring
-- Example of regular monitoring query
SELECT
OBJECT_NAME(ips.object_id) AS TableName,
i.name AS IndexName,
ips.avg_fragmentation_in_percent AS FragmentationPercent,
ips.avg_page_space_used_in_percent AS PageDensity,
ips.page_count AS PageCount,
CASE
WHEN ips.avg_fragmentation_in_percent > 30 AND ips.page_count > 1000
THEN 'REBUILD Recommended'
WHEN ips.avg_fragmentation_in_percent > 10 AND ips.page_count > 1000
THEN 'REORGANIZE Recommended'
ELSE 'No Maintenance Needed'
END AS Recommendation
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'SAMPLED') AS ips
INNER JOIN sys.indexes AS i
ON ips.object_id = i.object_id
AND ips.index_id = i.index_id
WHERE ips.page_count > 100 -- Exclude small indexes
ORDER BY ips.avg_fragmentation_in_percent DESC;
2. Setting Maintenance Window
-- Example of maintenance job (SQL Server Agent)
-- Run as nightly batch
-- Step 1: Update statistics
EXEC sp_updatestats;
-- Step 2: Reorganize only indexes with high fragmentation
DECLARE @TableName NVARCHAR(128);
DECLARE @IndexName NVARCHAR(128);
DECLARE @SQL NVARCHAR(MAX);
DECLARE index_cursor CURSOR FOR
SELECT
OBJECT_NAME(ips.object_id),
i.name
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'SAMPLED') AS ips
INNER JOIN sys.indexes AS i
ON ips.object_id = i.object_id AND ips.index_id = i.index_id
WHERE ips.avg_fragmentation_in_percent > 10
AND ips.page_count > 1000;
OPEN index_cursor;
FETCH NEXT FROM index_cursor INTO @TableName, @IndexName;
WHILE @@FETCH_STATUS = 0
BEGIN
SET @SQL = N'ALTER INDEX ' + QUOTENAME(@IndexName) +
N' ON ' + QUOTENAME(@TableName) + N' REORGANIZE;';
EXEC sp_executesql @SQL;
FETCH NEXT FROM index_cursor INTO @TableName, @IndexName;
END;
CLOSE index_cursor;
DEALLOCATE index_cursor;
3. Considering Resource Limits
-- For Azure SQL Database: Resumable maintenance
ALTER INDEX IX_LargeTable_Column
ON dbo.LargeTable
REBUILD WITH (
ONLINE = ON,
RESUMABLE = ON,
MAX_DURATION = 120 -- Timeout in 2 hours
);
-- Resume later
ALTER INDEX IX_LargeTable_Column
ON dbo.LargeTable
RESUME;
Summary
About Index Fragmentation
- ✅ Regular monitoring is important
- ✅ Impact of fragmentation varies by workload
- ✅ Often improved just by updating statistics
- ⚠️ Do not run maintenance indiscriminately (consumes resources)
About Using GUID
- ❌ Not Recommended: Using random GUID (NEWID()) as clustered index key
- ⚠️ Caution: Fragmentation progresses rapidly when using GUID
- ✅ Recommended: Sequential ID (IDENTITY) + Non-clustered GUID
- ✅ Alternative: Use NEWSEQUENTIALID() (with constraints)
Maintenance Strategy
| Scenario | Recommended Method |
|---|---|
| Mild fragmentation (5-10%) | Statistics update only |
| Moderate fragmentation (10-30%) | REORGANIZE |
| High fragmentation (>30%) | REBUILD (Online if possible) |
| Azure SQL | REORGANIZE or RESUMABLE REBUILD |
| Small index (<1000 pages) | No maintenance needed |
Design Recommendations
-
Selection of Primary Key
- Prioritize monotonically increasing values (IDENTITY, SEQUENCE)
- If GUID is needed, add as non-clustered index
-
Fill Factor Setting
- Usually leave at 100 (default)
- Set to 80-90 only if page splits are frequent
-
Monitoring and Improvement
- Measure performance using Query Store
- Check effect of maintenance with A/B testing