When SQL 2012 was released it came with a feature called ColumnStore indexes. What are they you ask? In a nutshell, the traditional b-tree structure that we know and love stores pages of data which contain the ‘rows’ of a table. An index page will contain one or more columns from each row. This is efficient for most OLTP scenarios where you are providing predicates to filter data during most queries, and where data access is typically more varied.
But in a data warehouse OLAP scenario, the data access requirements differ slightly. With aggregations often made over vast numbers of rows, but only a few carefully chosen columns, there can be an overhead associated with querying unneeded columns.
Enter ColumnStore indexes. The data is still stored in separate pages, although each page is filled with all the values from one or more columns! With that method, a query referencing only a select few columns can utilize the index which contains only those columns! Thereby reducing IO, as only the data that was needed was cached.
Additionally, having that we are now storing data of the same type in a given page, and that there is a high level of redundancy, this lends itself to increased efficiency when we apply compression.
As well as reducing IO as more data can fit into memory, ColumnStore indexes are also memory optimized and are processed in chunks of columns called batches which reduce CPU usage.
ColumnStore indexes were introduced In SQL 2012, were non clustered only, and could not be updated. This meant that you needed to have a workaround for updating your data. One method could be to disable or drop your index, load the data, and then rebuild. Or if you utilize partitioning you could possibly perform a SWITCH operation.
If you would like to go into more details about ColumnStore indexes you can follow the link here. http://technet.microsoft.com/en-gb/library/gg492088.aspx
But for now, we will explore that fact that in SQL 2014, Microsoft has allowed us to not only create clustered ColumnStore indexes, but also to update them!
Let’s run through a real quick example to see the benefits of a ColumnStore index, and finally we will prove that they can now be updated. As stated ColumnStore indexes have their real benefits in a DW situation, so for this example I will use the AdventureWorksDW2012 database, and we will be using the dbo.FactResellerSales table.
First of all, we will take a copy of the FactResellerSales table, and then create a clustered ColumnStore index on that copy. This will give us two tables which apart from the ColumnStore index, are identical to test against.
Now lets query both tables in a single batch, and then compare the results. We will turn on the graphical execution plan and also generate some time/IO metrics.
Now let’s look at the results…
As you can see, the use of a clustered ColumnStore index to satisfy this query had an overall cost of 25% relative to the batch. Whereas the query utilizing the standard clustered index has cost the remaining 75%! Additionally, if we tooltip over the ColumnStore index operator we can see that it has an estimated IO cost of 0.003125 compared to 2.19868 for the row-based index.
Lets go further and look at the statistics for Time and IO!
From this output you can see that the first query, with the normal clustered index, has performed 2982 logical reads. And the second query, which uses the ColumnStore index, has performed only 77 logical reads as well as 60 read-ahead reads. A significant difference!
A brief aside on reads… A ‘logical read’ occurs every time SQL server requests a data page from the buffer cache. If the page is not in memory, then this needs to be read from the disk and that constitutes a ‘physical read’. ‘read-ahead reads’ are an optimization method whereby the pages are brought into the buffer cache before they are referenced by the query. Put simply, SQL server was able to determine the pages that it needed in advance.
As stated previously, in SQL 2012 ColumnStore indexes could not be updated. And if you tried to update a ColumnStore index, you would get the following error.
Now in SQL 2014 ColumnStore indexes can be clustered, and they can also be updated, which means you have time to make another coffee, because you will see this screen instead!
So there we have it – a little example where you can see that the benefits of using a ColumnStore index are substantial. Please do consider however that this test was performed against the AdventureworksDW database and this particular fact table had only 60,000 rows. In a real life DW scenario it is likely that you will have hundreds of millions of rows, and beyond. And this is where you could truly see a significant gain by using ColumnStore indexes.
The release of SQL 2014 CTP1 has brought IO to the list of resources it can govern! (yes, previously it was memory and cpu only.) Lets get down to business and run through a quick demo so we can get a sneak peek of SQL server 2014 metering IO!
For the sake of simplicity in this demo, we will do 3 things:
1) Create two resource pools, each with their associated workloadgroup, and we will configure the max_iops value only. No throttling of memory or cpu will take place.
2) The users will each will have a login created, and under the context of those logins, we will generate some IO against the system volume (C:)
3) We will review some DMO’s and analyze the results using perfmon.
First create two new resource pools and set the max_iops_per_volume value. Note, there is no GUI for this in SQL 2014 CTP, so you will have to use TSQL. I have shown the code below. Just repeat this code twice for your resource pools and change out the max iops value.
I have randomly chosen to assign a low number so that we can quickly see the throttling take place. As you can see my pool name is ‘users’ and i have set max_iops to 100. My second resource pool is named ‘developers’ and i have given it a max_iops_per_volume value of 50.
Next, we need to create 2 workload groups which will be assigned to these resource pools. These can be done through GUI or scripts.
Then we create and assign a classifier function, and the logins. (I wont show this here as nothing has changed since the last version of SQL. Please email me if you would like a copy of the scripts that were used.)
Now we need to open two new query windows and log into each using the ‘users’ and ‘developers’ credentials. In each window we will run a DBCC CHECKDB on two copies of the Adventureworks2012 database in order to generate some IO.
Open Performance monitor to see how the IO has been throttled. The results look like this. (Note the performance counters that we are using)
The first thing that i notice is that throttling is clearly taking place. The second thing that i notice however is that the max_iops values that i have set are not being reached consistently. I suspect that a lot of CHECKDB is happening in memory after the initial process begins as AdventureWorks is quite small, a more IO specific workload and tuning of the max_iops could see a more consistent result set here.
Lets have a look at a couple of DMO’s. Microsoft has kindly added several new fields to the sys.dm_resource_governor_resource_pools. You can see here the reads_total versus the reads_throttled. (Note that this System view is not populated until you restart your instance, and that this is only a subset of the newly available columns.)
Lets also checkout sys.dm_resource_governor_resource_pool_volumes!
Here you can see the volumes that are in use by our resource pools, and also a taste of some of the other new columns added. Notice that the ‘developers’ login experienced much more throttling. (This DMV is populated once a user connection is made through Resource Governor.)
So there we have it, a first look at resource governors new IO capability. I have done a very basic example here and if you would like to test further you could scale out by increasing the workload, reducing concurrency, and playing with different values for min and max iops.
Just when we thought we were getting used to SQL server 2012, Microsoft unveils 2014 CTP1!
I still have a lot of reading to do on SQL 2012… So my initial feelings are denial! Then I realize that I cannot alter this fact, and I get over it. (Isn’t IT good for helping us accept change in life!) A little investigation shows that SQL server 2014 is destined to adopt, amongst other things, a new high volume, in-memory OLTP technology. Codename “Hekaton”…
Sounds cool right!
You can read more about Hekaton here: http://research.microsoft.com/en-us/news/features/hekaton-122012.aspx
And you can read more on SQL 2014 here: http://www.brentozar.com/archive/2013/06/almost-everything-you-need-to-know-about-the-next-version-of-sql-server/
To checkout Dave Ballantyne tearing Hekaton apart, go here:
Meanwhile… I am busy trying to get covered off with my SQL server 2012 skills! Let alone a version that is not even out yet… But a new version of SQL is too exciting to refuse!
To get started, download SQL server 2014 CTP1 here:
Take note that it can only be installed on a 64 bit platform and cannot coexist with any other SQL installation.
I have chosen to use Oracle’s VirtualBox to create a fresh VM using a trial version of windows 2012. But you can use whatever platform you like, as long as it meets the minimum requirements.
As always with a CTP version, the finished product may or may not contain the exact features that we are looking at… But let’s not wait to get started!
Finalize your install like any other by reviewing the setup log, and verifying that the data files and binaries exist where you wanted them. Make sure the services start and are running under the correct context. Then Open SSMS to set your preferred feature, instance and DB options, and then backup the sys DB’s!
Ready to go!