When SQL 2012 was released it came with a feature called ColumnStore indexes. What are they you ask? In a nutshell, the traditional b-tree structure that we know and love stores pages of data which contain the ‘rows’ of a table. An index page will contain one or more columns from each row. This is efficient for most OLTP scenarios where you are providing predicates to filter data during most queries, and where data access is typically more varied.
But in a data warehouse OLAP scenario, the data access requirements differ slightly. With aggregations often made over vast numbers of rows, but only a few carefully chosen columns, there can be an overhead associated with querying unneeded columns.
Enter ColumnStore indexes. The data is still stored in separate pages, although each page is filled with all the values from one or more columns! With that method, a query referencing only a select few columns can utilize the index which contains only those columns! Thereby reducing IO, as only the data that was needed was cached.
Additionally, having that we are now storing data of the same type in a given page, and that there is a high level of redundancy, this lends itself to increased efficiency when we apply compression.
As well as reducing IO as more data can fit into memory, ColumnStore indexes are also memory optimized and are processed in chunks of columns called batches which reduce CPU usage.
ColumnStore indexes were introduced In SQL 2012, were non clustered only, and could not be updated. This meant that you needed to have a workaround for updating your data. One method could be to disable or drop your index, load the data, and then rebuild. Or if you utilize partitioning you could possibly perform a SWITCH operation.
If you would like to go into more details about ColumnStore indexes you can follow the link here. http://technet.microsoft.com/en-gb/library/gg492088.aspx
But for now, we will explore that fact that in SQL 2014, Microsoft has allowed us to not only create clustered ColumnStore indexes, but also to update them!
Let’s run through a real quick example to see the benefits of a ColumnStore index, and finally we will prove that they can now be updated. As stated ColumnStore indexes have their real benefits in a DW situation, so for this example I will use the AdventureWorksDW2012 database, and we will be using the dbo.FactResellerSales table.
First of all, we will take a copy of the FactResellerSales table, and then create a clustered ColumnStore index on that copy. This will give us two tables which apart from the ColumnStore index, are identical to test against.
Now lets query both tables in a single batch, and then compare the results. We will turn on the graphical execution plan and also generate some time/IO metrics.
Now let’s look at the results…
As you can see, the use of a clustered ColumnStore index to satisfy this query had an overall cost of 25% relative to the batch. Whereas the query utilizing the standard clustered index has cost the remaining 75%! Additionally, if we tooltip over the ColumnStore index operator we can see that it has an estimated IO cost of 0.003125 compared to 2.19868 for the row-based index.
Lets go further and look at the statistics for Time and IO!
From this output you can see that the first query, with the normal clustered index, has performed 2982 logical reads. And the second query, which uses the ColumnStore index, has performed only 77 logical reads as well as 60 read-ahead reads. A significant difference!
A brief aside on reads… A ‘logical read’ occurs every time SQL server requests a data page from the buffer cache. If the page is not in memory, then this needs to be read from the disk and that constitutes a ‘physical read’. ‘read-ahead reads’ are an optimization method whereby the pages are brought into the buffer cache before they are referenced by the query. Put simply, SQL server was able to determine the pages that it needed in advance.
As stated previously, in SQL 2012 ColumnStore indexes could not be updated. And if you tried to update a ColumnStore index, you would get the following error.
Now in SQL 2014 ColumnStore indexes can be clustered, and they can also be updated, which means you have time to make another coffee, because you will see this screen instead!
So there we have it – a little example where you can see that the benefits of using a ColumnStore index are substantial. Please do consider however that this test was performed against the AdventureworksDW database and this particular fact table had only 60,000 rows. In a real life DW scenario it is likely that you will have hundreds of millions of rows, and beyond. And this is where you could truly see a significant gain by using ColumnStore indexes.