Thursday, March 31, 2016

Detecting access to partitioned indexes without providing the partitioning key

Partitioned tables provide effective means of data management and are usually used for archiving data with sliding window management. A lesser known use-case is partitioning large tables for higher performance. Partitioning in this sense splits large tables into smaller chunks which help with index depth reductions.

Index depth is a key performance issue because it affects every data access. The greater the index depth, the more pages must be read to traverse the index to find the required row.

There is a very important caveat for making effective use of partitions. If the query optimizer knows the partition(s) it has to hit in order to access certain data, it can eliminate the other partitions and avoid accessing them altogether. That is, the query optimizer is intelligent enough to access only those partitions which may have the required rows. This is only possible if the partitioning key is provided in the query predicates (the where clause).

Initial designs often take this key point into account and the queries are designed to include the partitioning key. However, this may not hold true over time. If a query is added later without the partitioning key, the SQL engine will have to check every partition to look for the requested row. Even if there is a partitioned index perfectly fitting the other predicates, missing the partitioning key results in checking every partition, whether there is data in it or not!

This point is very important. The query optimizer does not check whether a partition is empty and access the header page nevertheless. This may not seem to be a problem if the number of empty partitions is small. However, in many sliding window scenarios, switched-out partitions will be left empty if a separate maintenance is not planned ahead to take case of the older partitions. In this case, even singleton index accesses will keep taking up more and more I/O as well as time as more empty partitions are left over.

Another issue comes up when a table is partitioned for better performance by splitting large indexes into smaller partitions. If the partitions are relatively full and the statistics are updated frequently, better performance is to be expected. With a few hundred partitions, even access without a partitioning column predicate can be acceptable. But if you decide to add a significant number of empty partitions to cater for future growth, it is very likely to get a sudden performance impact due to access of empty partitions.

When you are victim to empty partition scans, you will see high CPU, high I/O and possibly latch contention in tempdb due to managing the empty results of empty partition accesses. It may be quite difficult to recover from such a situation. Getting rid of the empty partitions is always effective but may be very difficult to perform because partition merging is an operation which requires a schema lock on the underlying table; effectively an offline operation!

It is definitely better to be safe than sorry, so here is a way to detect if the empty partitions of your partitioned tables are accessed without providing the partitioning key:


 -- =============================================  
 -- Author      : Ersin Unal, (C)2016  
 -- Description : Retrieves partitioned indexes which are accessed without a partitioning column predicate 
 -- =============================================  
SET NOCOUNT ON  
 if OBJECT_ID('tempdb.dbo.#partition_stats') IS NOT NULL  
     DROP TABLE #partition_stats  
 -- Create temp table to store the results for each index  
 SELECT TOP 0 DB_NAME() database_name, OBJECT_SCHEMA_NAME(1) schema_name, OBJECT_NAME(1) table_name, i.name index_name, os.*   
 INTO #partition_stats  
 FROM sys.partitions p  
 JOIN sys.dm_db_index_operational_stats(DB_ID(), 1, 1, 0) os  
     ON os.object_id = p.object_id   
         AND os.index_id = p.index_id   
         AND os.partition_number = p.partition_number  
 JOIN sys.indexes i  
     ON i.object_id = p.object_id  
         AND i.index_id = p.index_id  
 DECLARE @dbid int, @objid int, @indid int  
 -- The cursor to traverse all indexes defined on any partition scheme  
 DECLARE c CURSOR FAST_FORWARD FOR  
     SELECT DB_ID() db_id, i.object_id objectid, i.index_id  
     FROM sys.indexes i  
     JOIN sys.partition_schemes ps ON ps.data_space_id = i.data_space_id  
     ORDER BY 1,2,3  
 OPEN c  
 FETCH NEXT FROM c INTO @dbid, @objid, @indid  
 WHILE @@FETCH_STATUS = 0  
 BEGIN  
     INSERT INTO #partition_stats  
         SELECT DB_NAME(@dbid) database_name, OBJECT_SCHEMA_NAME(@objid) schema_name, OBJECT_NAME(@objid) table_name, i.name index_name, os.*   
         FROM sys.partitions p  
         JOIN sys.dm_db_index_operational_stats(@dbid, @objid, @indid, 0) os  
             ON os.object_id = p.object_id   
                 AND os.index_id = p.index_id   
                 AND os.partition_number = p.partition_number  
         JOIN sys.indexes i  
             ON i.object_id = p.object_id  
                 AND i.index_id = p.index_id  
         WHERE  
             p.rows = 0 -- The partition is empty  
             AND os.singleton_lookup_count > 0 -- Yet, there is a singleton lookup which means the query optimizer did not prune the partition because it did not have the partitioning column!  
     FETCH NEXT FROM c INTO @dbid, @objid, @indid  
 END  
 CLOSE c  
 DEALLOCATE c  
 SELECT DISTINCT database_name, schema_name, table_name, index_name, index_id FROM #partition_stats ORDER BY 1,2,3,5