Archive Data with Table Partitioning
In my last post called Out In Space, I looked at ways of freeing up some space on your database server in a hurry; this led me to thinking about longer term solutions, like archiving your data.
Let’s run through a demo of how this can be achieved with table partitioning. One of the benefits of this method, as well as being really fast, is that the process of moving data off our transactional table is a metadata change, and will therefore not adversely affect concurrency.
There are lots of ways to implement this, and what you should do depends on your business requirements, so get creative and use a solution that works best for you!
In this example we are concerned with only the last 12 months worth of data, anything older than that will be moved to another table and from there it can be archived. I have chosen to create the partitioning key over an integer column storing the day and month only, this way it can be recycled regardless of the year. e.g. mmdd)
To get started, we create the following objects:
1) a database with 13 Filegroups.
2) at least one data file in each Filegroup.
3) a Partition Function and a Partition Scheme.
The script below can be used to create our Partition Function and Scheme. (Note the boundary points and FG’s specified)
Now that we have defined our storage, let’s go ahead and create two tables (one for our transactional data, and one for our archive data.)
Next, we want to simulate an application, so chuck some data in!
Now we have two partitioned tables, one of them populated with data, which is distributed across its partitions evenly using the partitioning key (ModTimeId). Next we will perform the SWITCH operation. But before we do that, let’s verify our data placement.
Execute a select statement on our newly created tables and you will see that we have no data in our archive table.
You can use the following query to analyze the spread of data in each partition (Note the date ranges within each partition!)
Let’s go ahead and create a procedure that will SWITCH data into the archive table.
Now, execute this procedure. The results are shown below…
What you can see is all data which was in the oldest partition has been (almost) instantaneously relocated to the archive table! (This is possible because the data itself is not moved; but the metadata about the location of the data is changed.)
Now that we have our archive table populated, we can take the final steps in our process. This could be deleting data from the archive table where date older than (<insert threshold here>), moving it to a slower (and cheaper) disk system, or moving to offline/near-line storage.
In terms of completing the solution, you can schedule this proc to run via an SQL agent job once per month, or at your desired frequency. As always, make sure you add some transaction and error handling to ensure your logic is bulletproof and you are moving the correct data.
So there we have a basic example of how you can use table partitioning to archive data. Please note that archiving itself is a broad topic which is beyond the scope of this post. It has many considerations including both business and legal compliance.Posted on: March 5, 2014, by : admin