Easy there, champ!  Let’s not go getting all crazy applying compression just because you think your environment is a good candidate for it.  Compression is extremely flexible, so we should take advantage of that by being selective about what we compress.  By flexible I mean that you can compress:

  • whole heaps,
  • whole clustered indexes,
  • whole indexed views,
  • whole nonclustered indexes, and
  • individual partitions or ranges of partitions for partitioned tables and nonclustered indexes (partition-aligned indexes, let’s not be silly) – see the example below.
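
Just to illustrate that partition-level flexibility, here is a minimal sketch – the table dbo.Sales, the index IX_Sales_CustomerID and the partition numbers are all hypothetical:

    -- Page-compress only partitions 2 through 4 of a partitioned table,
    -- leaving the remaining partitions at their current setting.
    ALTER TABLE dbo.Sales
    REBUILD PARTITION = ALL
    WITH (DATA_COMPRESSION = PAGE ON PARTITIONS (2 TO 4));

    -- Or target a single partition of a partition-aligned nonclustered index.
    ALTER INDEX IX_Sales_CustomerID ON dbo.Sales
    REBUILD PARTITION = 3
    WITH (DATA_COMPRESSION = ROW);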

When I first investigated this, I came across a SQL Server 2008 article on SQL Server compression authored by Sanjay Mishra (a principal program manager on the SQLCat team over at Microsoft).  Sanjay’s article – which is definitely worth a full read if you’re interested in compression – has fantastic guidance on identifying where, and what type of, compression is most appropriate in specific cases.  The article even contains a sample table with what I would refer to as a balanced scorecard for evaluating compression options.  I merely took the queries he was kind enough to provide in the article and did some work around them so that they build the balanced scorecard for me, instead of my having to do it manually in Excel (I’m lazy, what can I say).  Of note:

  • I did not perform this work with partitioning in mind,
  • Stats are only as representative as the time elapsed since SQL Server was last started or the database was last opened (see the quick check below),
  • This is at the database level, so it will iterate through all objects within the currently active database
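
Because those counters reset, a quick way to see how long they have been accumulating is to check the instance start time:

    -- How long have the operational stats been accumulating?
    SELECT sqlserver_start_time
    FROM sys.dm_os_sys_info;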

Not to Get All Management-y On You, but It’s Balanced Scorecard Time

Below is the query that will build the balanced scorecard, based on the ones from the aforementioned SQLCat article:
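
A minimal sketch of that kind of query – sys.dm_db_index_operational_stats supplies the activity percentages, and sp_estimate_data_compression_savings (run once per object for ROW and once for PAGE) supplies the estimated savings; the table name in the EXEC is a placeholder:

    -- Activity profile per index in the current database:
    -- what share of the work is updates vs. scans?
    SELECT
        OBJECT_SCHEMA_NAME(ios.object_id) AS Schema_Name,
        OBJECT_NAME(ios.object_id)        AS Table_Name,
        i.name                            AS Index_Name,
        ios.partition_number              AS Partition_Number,
        ios.leaf_update_count * 100.0 /
            NULLIF(ios.range_scan_count + ios.leaf_insert_count + ios.leaf_delete_count
                 + ios.leaf_update_count + ios.leaf_page_merge_count
                 + ios.singleton_lookup_count, 0) AS Percent_Update,
        ios.range_scan_count * 100.0 /
            NULLIF(ios.range_scan_count + ios.leaf_insert_count + ios.leaf_delete_count
                 + ios.leaf_update_count + ios.leaf_page_merge_count
                 + ios.singleton_lookup_count, 0) AS Percent_Scan
    FROM sys.dm_db_index_operational_stats(DB_ID(), NULL, NULL, NULL) AS ios
    JOIN sys.indexes AS i
        ON i.object_id = ios.object_id
       AND i.index_id  = ios.index_id
    WHERE OBJECTPROPERTY(ios.object_id, 'IsUserTable') = 1
    ORDER BY Percent_Scan DESC;

    -- Estimated savings for one object (repeat with 'ROW' to fill in both
    -- savings columns of the scorecard); dbo.YourBigTable is a placeholder.
    EXEC sp_estimate_data_compression_savings
        @schema_name      = 'dbo',
        @object_name      = 'YourBigTable',
        @index_id         = NULL,
        @partition_number = NULL,
        @data_compression = 'PAGE';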

These Aren’t the Droids You’re Looking For

Now that you have your balanced scorecard, what are you looking for?  This is how I look for good candidates:

  • Page compression – the most CPU-impactful, so we’re a little more careful
    • Low update percentage relative to total table activity (< 30%) – compressing a very active table is going to cause overhead on update/insert/delete activity, which in turn is going to cause other issues
    • High percentage of scans relative to total table activity (> 65%) – scans are going to give you the most bang for your buck by fitting more data on each page, decreasing the number of pages that need to be retrieved from storage and held in memory
    • The table is large (there’s a reason I sort the results descending by size) – a 25 KB table that compresses to 2 KB really doesn’t matter, because it will fit in memory compressed or uncompressed
    • Large difference between Row_Compression_Savings_Percentage and Page_Compression_Savings_Percentage – if you only get 25% additional savings from page compression, is it really worth the extra CPU overhead?  Again, this depends on the size of the table, because if the table is 100 GB, 25% is 25 GB and that’s a whole lot fewer pages
  • Row compression – the least CPU-impactful, so we can be a little more liberal
    • Reasonable update percentage relative to total table activity (< 65%) – we can compress tables that are a bit more active with row vs. page compression, as there will be less overhead on the insert/update/delete operations
    • Low to moderate percentage of scans relative to total table activity (< 40%) – with high seek activity you will likely only be touching a subset of pages, so you still benefit, just not as much
    • The table is reasonably large – same reason as page compression
    • Small difference between Row_Compression_Savings_Percentage and Page_Compression_Savings_Percentage – if the gain from page compression over row compression is small, why bother with the added overhead of page compression?  (A quick way to apply these rules of thumb to the scorecard is sketched after this list.)
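
One way to apply those rules of thumb is to dump the scorecard into a table and filter it – the temp table #Scorecard and its column names here are simply an assumed shape matching the columns discussed above:

    -- Page-compression candidates: low update %, high scan %, biggest objects first,
    -- along with the extra savings page compression buys over row compression.
    SELECT
        Table_Name,
        Size_MB,
        Percent_Update,
        Percent_Scan,
        Row_Compression_Savings_Percentage,
        Page_Compression_Savings_Percentage,
        Page_Compression_Savings_Percentage
          - Row_Compression_Savings_Percentage AS Additional_Page_Savings
    FROM #Scorecard
    WHERE Percent_Update < 30
      AND Percent_Scan   > 65
    ORDER BY Size_MB DESC, Additional_Page_Savings DESC;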

The tables that don’t fit for either page or row compression?  Those are indeed not the droids you are looking for.

Ready…Set…Compress!

There are ideal ways to go about compressing your data:

  • One compress operation at a time or many at the same time – I like one at a time, as it lets you keep much better track of the system impact you are having.  Additionally, it keeps your memory, CPU and disk requirements at a more predictable level
  • Compress small to large – as you complete the compression operations, you release space back to the data files so that they can hopefully absorb the space requirements for compressing the larger objects without having to grow
  • Set ONLINE to ON or OFF for the rebuild operation
    • OFF is faster and requires fewer resources to complete.  Offline rebuilds apply a table lock for the duration of the index operation.
    • ON is slower and requires more resources to complete.  Online rebuilds apply an Intent Shared (IS) lock on the source table for the duration of the index operation – except at the beginning (a Shared lock – S) and at the end (a Schema Modification lock – Sch-M).
    • OFF requires a maintenance window, because you are taking the index offline while you rebuild it
  • Sort in tempdb – especially with online rebuilds, I like this option as it lets you offload a lot of the IO activity – the intermediate sort runs – to tempdb instead of the database (and destination filegroup) that holds the index.  See the sketch after this list.
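
As a minimal sketch of how those options come together (the index and table names are hypothetical, and ONLINE = ON assumes Enterprise Edition):

    ALTER INDEX CX_BigScanTable ON dbo.BigScanTable
    REBUILD WITH
    (
        DATA_COMPRESSION = PAGE,
        ONLINE           = ON,  -- keep the table available during the rebuild
        SORT_IN_TEMPDB   = ON   -- push the intermediate sort runs to tempdb
    );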

That’s All, Folks!

I’m really not going to cover the actual T-SQL behind enabling compression.  It’s all covered in the BoL quite extensively and pretty darn well.  Indexes and tables have data compression enabled through:
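
In their simplest forms, the statements look something like this (the object names are placeholders):

    -- A table (heap or clustered index):
    ALTER TABLE dbo.YourTable
    REBUILD WITH (DATA_COMPRESSION = ROW);

    -- A specific nonclustered index:
    ALTER INDEX IX_YourTable_SomeColumn ON dbo.YourTable
    REBUILD WITH (DATA_COMPRESSION = PAGE);

    -- Or at creation time:
    CREATE TABLE dbo.NewTable
    (
        Id   int          NOT NULL,
        Name varchar(100) NOT NULL
    )
    WITH (DATA_COMPRESSION = PAGE);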

I would also like to thank Michael Swart for proofreading this blog series and providing insights on making it read more fluidly and cohesively – very much appreciated!

I hope this was a beneficial starting point for identifying whether or not data compression may help in your specific environment!