Compression Has Hoops, JUMP!

Great!  Compression might just be able to help you out.  There are some additional hoops we need to jump through first before you start into your databases guns blazing.  The hoops you need to get through are:

Enterprise Edition

Table compression is an enterprise only feature, boys and girls.  If you’re on standard, you can’t use it.  In fact, if you’re on SQL Server Standard and below, you can’t even use the compression estimation stored procedures to see if it warrants the upgrade to Enterprise or not (I know, great marketing Microsoft – buy it, then figure out if you need it!).  You’ll be greeted with this lovely error if you attempt to use the sp_estimate_data_compression_savings stored procedure:

In order to find out the edition of your SQL Server environment, you can use:

CPU Impact

Do you have the spare cycles on your CPU to handle the extra workload introduced by compression?

In order to determine if you have the spare room in the CPU department to handle the extra workload introduced by compression, head back to Performance Monitor to get some information.  One counter is all you’ll need for this: Processor: % Processor Time.  You’ll want to get a relatively large sample size (over a few days) in order to ensure that you are seeing CPU utilization from all of your processing/database access patterns.  Eyeball this to make sure average utilization is not exceeding 90% for all processors.  Ideally you’ll be around the 60-80% level, which will leave all compression options available to you.

Types of Compression – CPU Impact

Since we’re on the topic, we might as well go over the types of compression and the impact they have on CPU:

  • Row compression
    • Stores fixed data type columns in variable-length format
  • Page compression
    • The following operations are performed (and it’s even done in this order!):
      • Row compression – you can’t have one without the other!
      • Prefix compression – stores repeated prefix values for a column in a row in a special compression information (CI) structure that is immediately after the page header.  The repeated prefix values in the column are replaced with a reference to the corresponding prefix in the CI structure (even partial matches can be indicated)
      • Dictionary compression – searches the page for repeating values and stores them in another CI area.  Dictionary compression is not column specific and operates on the page.
  • Unicode compression
    • This is implemented automatically with row and page compression – can’t have this without the others (one of the two).
    • The implementations benefits depend on the locale and involve compressing Unicode values that don’t require localization with one byte instead of two bytes.

That’s a whole lot of information regarding table compression, but what does it all mean?  For all forms of compression, the data pages are compressed both on disk and in memory.  Whenever SQL Server needs access that page, it must rely on the CPU to decompress the page before it can be read and then additional overhead to perform the compression of any modified pages.  The amount of strain placed on the CPU to compress and decompress the data is directly related to how much work is done by the CPU to perform the compression.  Thus, page compression is the most CPU intensive, and also most effective, form of data compression.  It will generally require anywhere from 20-30% of additional CPU capacity (I emphasize generally).  Row compression is generally a 10% hit to your CPU (again, emphasis on generally).