Data compression is a very powerful tool. Like any tool, however, it is best applied where it is likely to solve an actual problem. In this post, I am going to walk you through the methods that I use to determine whether a SQL Server environment is experiencing symptoms that could be addressed through thoughtful implementation of compression. There are three main resource areas that I look at to make that call: disk space pressure, memory performance issues, and disk I/O performance issues.

Disk Space Pressures

This one’s pretty straightforward.  Do you have much, or any, space available on the presented drive(s) that your database files are stored on?  More importantly, if this is a shared storage environment, is it possible to easily add additional capacity to the LUN(s)?  The first question can be answered simply by looking at disk management (diskmgmt.msc).  The second one may involve taking your wonderful SAN administrator out to lunch to ask (and it certainly wouldn’t hurt to pay for said lunch and bat your eyes at him/her as you beg, plead and/or threaten to key their car for additional capacity).
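If you'd rather not leave Management Studio, you can answer the first question from T-SQL as well. Here is a minimal sketch using sys.dm_os_volume_stats (available from SQL Server 2008 R2 SP1 onward; requires VIEW SERVER STATE), which reports free space on every volume that hosts a database file for the instance:

```sql
-- Free space on each volume that hosts a database file for this instance.
SELECT DISTINCT
    vs.volume_mount_point,
    vs.total_bytes / 1048576                 AS total_mb,
    vs.available_bytes / 1048576             AS free_mb,
    CAST(vs.available_bytes * 100.0
         / vs.total_bytes AS DECIMAL(5, 2))  AS free_pct
FROM sys.master_files AS mf
CROSS APPLY sys.dm_os_volume_stats(mf.database_id, mf.file_id) AS vs
ORDER BY free_pct ASC;
```

It won't tell you whether the SAN has more capacity to give, but it does answer "how much runway is left?" without a desktop session on the server.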

It is generally considered a best practice to right-size your data files to account for growth and to monitor the free space within them so that you can grow them in a manual, controlled manner; the footprint on disk should therefore not be fluctuating very much.  You will more than likely want to look at the free space within your data files to find out how close you are to either an autogrowth event or outgrowing your available disk space, depending on how much room you left yourself on disk for growth.  I put together the following query to allow you to take a look at that information (it executes at the instance level and iterates all databases within the instance):
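A query along those lines might look like the sketch below, which reports size vs. space used for every data file via FILEPROPERTY(…, 'SpaceUsed'). Note that sp_MSforeachdb is undocumented and unsupported; a cursor over sys.databases is the supported way to iterate, but this keeps the sketch short:

```sql
-- Size vs. space used for every data file in every database on the instance.
DECLARE @file_space TABLE
(
    database_name sysname,
    logical_name  sysname,
    size_mb       int,
    used_mb       int,
    free_mb       int
);

INSERT INTO @file_space
EXEC sp_MSforeachdb N'
USE [?];
SELECT DB_NAME(),
       name,
       size / 128,                                         -- pages -> MB
       FILEPROPERTY(name, ''SpaceUsed'') / 128,
       (size - FILEPROPERTY(name, ''SpaceUsed'')) / 128
FROM sys.database_files
WHERE type_desc = ''ROWS'';';

SELECT * FROM @file_space ORDER BY free_mb ASC;
```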

Memory Performance Issues

Memory performance is best looked at from a holistic view and not from the silo of a singular memory counter.  If you are fortunate enough to have a SQL monitoring solution, you will likely already have historic information that you can reference.  In the event that you do not have a monitoring solution already in place, the quickest way to get this information is to use the Performance Monitor (perfmon.msc) and the PAL (Performance Analysis of Logs) tool.

I’m going to describe the minimum set of counters that I like to use. But before I get to that, let me say a few things about the PAL tool and thresholds. Most of the rule-of-thumb thresholds I use come from the PAL Tool’s SQL Server 2012 threshold file.  If you haven’t heard of this tool, it lives on the CodePlex website and is wonderful for diagnosing system issues (not just SQL Server ones).  You can, in fact, use the PAL Tool to export a canned perfmon template that you can use to gather most of this information and make life easy (https://pal.codeplex.com/).  I will caution you to let your Collector Set run somewhere… anywhere that isn’t production first, so you can see what the size and performance impacts are going to be (especially if you are running the full set of counters that the PAL Tool exports for you in the template).
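If you prefer scripting the Collector Set to clicking through perfmon, logman can build one from the command line. A sketch covering just the counters below (the collector name, 15-second sample interval, and output path are illustrative; on a named instance, replace the SQLServer: prefix with MSSQL$<InstanceName>:):

```
logman create counter SQLMemoryBaseline -si 00:00:15 -o "D:\PerfLogs\SQLMemoryBaseline" ^
    -c "\SQLServer:Buffer Manager\Page life expectancy" ^
       "\SQLServer:Buffer Manager\Lazy writes/sec" ^
       "\SQLServer:SQL Statistics\Batch Requests/sec" ^
       "\Memory\Pages/sec" ^
       "\Memory\Available MBytes" ^
       "\Paging File(_Total)\% Usage"
logman start SQLMemoryBaseline
```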

The memory counters that I like to look at are:

  • SQLServer:Buffer Manager – Page life expectancy
    • Unhealthy < 300s (5 minutes)
    • This is the expected time, in seconds, that a data page will stay in the buffer pool.  Higher is better for this statistic.
    • Note: if your SQL Server has multiple NUMA nodes, it is a good idea to also include SQLServer:Buffer Node – Page life expectancy.
  • SQLServer:SQL Statistics – Batch Requests/sec
    • There is no unhealthy threshold for this; it is merely a metric to give you an idea of how busy your SQL Server is at that time.
    • This statistic is included for contextual purposes, to help identify whether you have many simultaneous requests vs. a few heavy queries.  Why do you care?  Because it is far less effort, and far less impactful to the system, to tune queries and indexes than it is to compress your data.
  • Memory – Pages/sec
    • Unhealthy > 500
    • A high value doesn’t necessarily indicate an issue, but it is contextually valid for getting a view of the big picture.
  • SQLServer:Buffer Manager – Lazy writes/sec
    • Unhealthy > 20
    • This is the number of buffers per second that the lazy writer flushes from the buffer pool to disk.  This counter will tend to increase as page life expectancy decreases.
  • Paging File – % Usage
    • Unhealthy > 60-70%
    • This is more indicative of external memory pressure (assuming maximum server memory has been configured for the instance).
  • Memory – Available Mbytes
    • Unhealthy < 200 MB
    • Your SQL Server instances should have their maximum server memory setting configured so that the operating system and any other applications on the server have adequate memory available to perform their roles.  If you see this value decreasing, the OS/applications on the server may require additional memory.
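You can also spot-check the SQL Server-specific counters above from T-SQL via sys.dm_os_performance_counters. A quick sketch; keep in mind that Page life expectancy is a point-in-time value, while Lazy writes/sec and Batch Requests/sec are stored as cumulative totals since instance start, so the actual per-second rates require two samples and a delta:

```sql
-- Current values for a few of the memory-related counters discussed above.
SELECT RTRIM(object_name)  AS object_name,
       RTRIM(counter_name) AS counter_name,
       cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN (N'Page life expectancy',
                       N'Lazy writes/sec',
                       N'Batch Requests/sec');
```

On a server with multiple NUMA nodes, this will also return the per-node Buffer Node rows for Page life expectancy, which is exactly what you want per the note above.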

Some of the counters above are not explicitly relevant for identifying SQL Server memory pressure (and are also not in the PAL template), but they are useful for identifying whether memory issues are internal or external, which I like to do.  Why do I like to know whether it’s internal vs. external memory pressure?  Internal memory pressure is more directly addressed by compression; however, you can also address external memory pressure with compression by decreasing SQL Server’s memory footprint (but hey, we’re DBAs, and we don’t usually want to decrease our memory footprint!).  The most significant indicators of internal memory pressure are high Lazy writes/sec and low or decreasing Page life expectancy.  External memory pressure will be evident if you notice low Memory – Available Mbytes and high Paging File – % Usage.

Once you have an adequate sample size, you can evaluate it either by eyeballing it or by pointing the PAL Tool at the counter log path to do the analysis for you.  You’re looking for any signs of memory pressure (even external memory pressure, although there are other things you can do to mitigate that as well).  One last comment: this is just one piece of the puzzle.  Use it in conjunction with the other clues.

Disk I/O Performance Issues

Now, you would think that disk I/O performance would be the only thing to look at to identify whether you want to compress your data (after all, making the access methods pull fewer pages is the most important thing, right?!  Nope…).  To figure out if your storage subsystem needs a break, it’s time to dust off perfmon again and pull another two counters for investigation.  Here are the counters:

  • Physical disk: Avg. disk sec/Read
    • Less than stellar > 0.010 (10 ms)
    • Unhealthy > 0.020 (20 ms)
    • This counter tells you how long, in seconds, it takes to perform a read operation from disk.  Remember, this increases as demand on the storage subsystem increases.  Make sure you obtain samples over a long enough period to catch at least a few occasions when the reserves of the SAN’s disk caches have been depleted (if you are in a SAN environment).  We want figures that observe every piece of your storage access profile.
  • Physical disk: Avg. disk sec/Write
    • Less than stellar > 0.010 (10 ms)
    • Unhealthy > 0.020 (20 ms)
    • This counter tells you how long, in seconds, it takes to perform a write operation to disk.  I could copy and paste the same information as what was written for Avg. disk sec/Read, but I think we get the point.

A system showing unhealthy disk metrics could certainly benefit from compression (less so if you have poorly tuned queries and indexes that do a lot of scanning, but there is still some benefit).
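Beyond perfmon, sys.dm_io_virtual_file_stats exposes cumulative I/O stall figures per database file, which you can turn into average latencies. A sketch:

```sql
-- Average read/write latency (ms) per database file since instance start.
SELECT DB_NAME(vfs.database_id) AS database_name,
       mf.physical_name,
       vfs.num_of_reads,
       CASE WHEN vfs.num_of_reads = 0 THEN 0
            ELSE vfs.io_stall_read_ms / vfs.num_of_reads
       END AS avg_read_ms,
       vfs.num_of_writes,
       CASE WHEN vfs.num_of_writes = 0 THEN 0
            ELSE vfs.io_stall_write_ms / vfs.num_of_writes
       END AS avg_write_ms
FROM sys.dm_io_virtual_file_stats(NULL, NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id = vfs.file_id
ORDER BY avg_read_ms DESC;
```

Because these numbers are cumulative since the last restart, a single snapshot blends busy and quiet periods; sample twice and diff the values if you want latency over a specific window.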

In Part II, I am going to go into much greater detail about the limitations of compression, the types of data compression that SQL Server implements, and compression’s impact on CPU.