Check data distribution in partitioned table – Script

In an ideal world, all partitioned tables would maintain themselves through a sliding window implementation, archiving process, or other automated routine which performs a split/switch/merge of partitions as-needed. Occasionally however, you may come across a partitioned table which requires manual intervention.

This may happen when the table has outgrown the predefined partition ranges, or when the usage type/profile of the table has changed over time. Below is a procedure which I developed, to provide an overview of the partitions within your table.

The script finds the partitioning key column, and then returns the median, standard deviation, and distribution of data within partitions. It also suggest where action may be required!

From here you can determine your course of action; this might be a new partition, a revised function/scheme which are more suitable, or something automated.

Note that this particular stored procedure is designed to look for an even distribution across partition ranges. This is because I was targeting a rapidly growing OLTP table with high read/write concurrency, which is not being archived.

If your table has a different usage pattern, then feel free to go ahead and modify the procedure to meet your own needs.


/* ———- Start of Script ———— */

if object_id(‘dbo.CheckPartitioning’) is not null
drop procedure dbo.CheckPartitioning
go

create procedure dbo.CheckPartitioning

@Partitioned_Table nvarchar(255)

AS

/* declare variables */
declare @Partitioning_Key nvarchar(255),
@Partitioned_RowCount bigint,
@Partition_Range bigint,
@Sql nvarchar(max),
@Retval nvarchar(255),
@StdDev bigint,
@Median bigint

/* get partitioning key column. */
select @Partitioning_Key = sc.name from sys.tables st INNER JOIN sys.indexes si on si.object_id = st.object_id
INNER JOIN sys.index_columns ic on ic.index_id = si.index_id AND ic.object_id = st.object_id
INNER JOIN sys.columns sc on sc.object_id = ic.object_id AND sc.column_id = ic.column_id
where st.object_id = object_id(@Partitioned_Table)AND ic.partition_ordinal = 1

/* get max value from partitioned table */
set @Retval = N’@RetValOUT int OUTPUT’;
set @Sql = ‘select @RetValOUT = (select top 1 ‘ + @Partitioning_Key + ‘ from ‘ + @Partitioned_Table + ‘ with (nolock) order by 1 desc)’
exec sp_executesql @Sql, @Retval, @retvalOUT=@Partitioned_RowCount output;

/* get max range specified for the partition function */
set @Partition_Range = convert(bigint,(select top 1 pv.[value] from sys.partition_range_values pv inner join sys.partition_functions pf on pv.function_id = pf.function_id
inner join sys.partition_schemes ps on pf.function_id = ps.function_id
inner join sys.indexes si on si.data_space_id = ps.data_space_id
where si.object_id = object_id(@Partitioned_Table) order by pv.boundary_id desc))

/* determine if action is required */
select @Partitioned_Table as ‘table’, @Partition_Range as ‘last range in partition function’, @Partitioned_RowCount as ‘number of rows’, (@Partition_Range-@Partitioned_RowCount) as ‘rows until last range reached’,
case when @Partitioned_RowCount < @Partition_Range then ‘no action required’ else ‘rows are in the rightmost range, a new filegroup and boundary point will be needed.’ end as ‘result’

/* grab the spread of rows within each partitions */
if object_id(‘tempdb.dbo.##temp_partition_data’) is not null
drop table dbo.##temp_partition_data
select distinct p.partition_number, fg.name, p.rows
into dbo.##temp_partition_data
from sys.partitions p
INNER JOIN sys.allocation_units au on au.container_id = p.hobt_id
INNER JOIN sys.filegroups fg on fg.data_space_id = au.data_space_id
where p.object_id = OBJECT_ID(@Partitioned_Table)

/* report on whether variation is greater than 10% of the median */
SET @StdDev = (
select stdev(rows) [stddev]
from ( select rows,
row_number() over( partition by rows order by rows asc) as rowsAsc,
row_number() over( partition by rows order by rows desc) as rowsDesc
from dbo.##temp_partition_data where rows > 0
) x
where rowsAsc in (rowsDesc, rowsDesc – 1, rowsDesc + 1))

SET @Median = (
select avg(rows) [avgrows]
from ( select rows,
row_number() over( partition by rows order by rows asc) as rowsAsc,
row_number() over( partition by rows order by rows desc) as rowsDesc
from dbo.##temp_partition_data where rows > 0
) x
where rowsAsc in (rowsDesc, rowsDesc – 1, rowsDesc + 1))

select @Median as ‘median rowcount in partition’, @StdDev as ‘standard deviation’

if @StdDev > @Median / 10
begin
select ‘The deviation of rows within each partition is greater than 10 percent from the median. Suggest reviewing the partition function ranges.’ as [status]
select * from dbo.##temp_partition_data
end
else
begin
select * from dbo.##temp_partition_data
end

/* cleanup */
if object_id(‘tempdb.dbo.##temp_partition_data’) is not null
drop table dbo.##temp_partition_data

GO

/* ———- End of Script ———— */

Finally, a quick acknowledgment to Adam Machanic – whose post I referenced here, to efficiently return the median value in this query.

If you have any questions or feedback, don’t hesitate to send me an email, or reach out on twitter.

Posted on: August 4, 2016, by : admin

Leave a Reply