This is the true test of compensation geekdom. The real geeks just sat forward in their chairs in eager anticipation of a meaty statistics debate. Everyone else either felt their eyes begin to glaze over, or ran for the hills.

A recent exchange with fellow Minnesota blogger Lisa Rosendahl (if you aren't reading Lisa's blog, start today) reminded me that it's been a few years since I tackled this topic, so here it comes once again.

Pay surveys present us with a variety of *descriptive statistics* to choose from in our efforts to review or develop new compensation structures and practices. Since most of us are seeking information on the middle of the market, we are typically presented with a choice between the following two measures of central tendency:

**Mean** - the mathematical average, calculated by adding up all the pay rates in the data set and then dividing by the number of pay rates in the data set.

**Median** - the value of the pay rate that falls in the middle of all the rates in an ordered data set (that is to say, a data set which is ordered from lowest to highest rate).

(The other measure of central tendency, which rarely if ever shows up in compensation surveys, is the **mode**, or the pay rate which occurs most often in the data set.)
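To make the three definitions concrete, here is a minimal Python sketch using the standard library's `statistics` module. The pay rates are made up for illustration and the helper name `central_tendency` is hypothetical; nothing here comes from an actual survey.

```python
from statistics import mean, median, mode

# Hypothetical hourly pay rates reported for one benchmark job.
pay_rates = [18.50, 19.00, 19.00, 20.25, 21.00, 22.75, 41.00]

def central_tendency(rates):
    """Return the mean, median, and mode of a list of pay rates."""
    return {
        "mean": mean(rates),      # sum of the rates divided by their count
        "median": median(rates),  # middle value once the rates are sorted
        "mode": mode(rates),      # the rate that occurs most often
    }

summary = central_tendency(pay_rates)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

With this made-up data the single high rate pulls the mean above the median, while the mode sits at the one rate that appears twice.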

I have a strong preference for the median over the mean. And I found a great explanation of my preference in an old Psychology textbook (Psychology, the 8th Edition, by David G. Myers, Worth Publishers):

With income distribution, the mode, median, and mean often tell very different stories. This happens because the mean is biased by a few extreme scores. When Microsoft Chairman Bill Gates sits down in an intimate cafe, its average (mean) patron instantly becomes a billionaire.

In other words, the mean pay rate in a pay survey will be affected by any extreme pay rates in the data set, whereas the median will not. For this reason, I believe the median (where it is offered; some surveys only provide the mean) is a better and more reliable measure to use in pay program assessment and design.
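The cafe scenario is easy to reproduce. The sketch below uses invented net worth figures for five patrons and a round one billion for the newcomer; every number and the helper name `describe` are assumptions for illustration only.

```python
from statistics import mean, median

# Hypothetical net worth of five cafe patrons, in dollars.
patrons = [32_000, 45_000, 51_000, 58_000, 74_000]
# Hypothetical round figure standing in for a billionaire walking in.
newcomer = 1_000_000_000

def describe(values):
    """Return (mean, median) for a list of values."""
    return mean(values), median(values)

before_mean, before_median = describe(patrons)
after_mean, after_median = describe(patrons + [newcomer])

print(f"mean:   {before_mean:,.0f} -> {after_mean:,.0f}")
print(f"median: {before_median:,.0f} -> {after_median:,.0f}")
```

In this example the mean jumps from 52,000 to well over 166 million, while the median only moves from 51,000 to 54,500.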

Hit me with any dissenting opinions in the comments!

*Image: Creative Commons Photo "Free mixed numbers" by D. Sharon Pruitt*

I'm quite upset with you for enticing me to reveal my true compensation geekness, but this is a topic which, while not of world-changing proportions, has been central to many discussions I've had during my 20 years as a human resources consultant. So I can't resist offering a few comments.

First, I agree with you regarding a personal preference for the median. Not only does it provide a more representative description of central tendency in a set of survey data, minimizing the effect of outliers, but it also provides a more stable year-over-year measure. In most compensation surveys, particularly those with relatively few participants or matches to a particular benchmark position, mean data can vary from year to year in ways that don't jibe intuitively or match reality. This can be the result of "churn" -- or turnover -- among organizations participating in the survey. In extreme cases, particularly when dealing with finer "cuts" of data, even a change of a single incumbent at a single organization can have a pronounced effect on the norm. The median will almost always show greater year-to-year stability and more closely line up with intuitive, realistic market movement.

Another thing that I prefer about the median is that I find it more relevant in translating compensation market data to discussions of compensation market targets. In other words, an organization that targets its compensation policies at the median of a particular market can reasonably expect that its compensation offerings are equal to or more attractive than one half of the organizations with which it competes for talent. If targeting the mean, it is impossible to make a similar quantitative statement about the organization's offerings relative to the market.

Having said all that, it has been my experience that a greater number of organizations choose to target or refer to the mean rather than the median of a market, despite the disadvantages we've identified. Why? Quite simply because it is an easier concept for many organizations -- or more specifically their managers and employees -- to understand. I am continually surprised at the number of people with whom I casually use the term median only to have to stop and define the term, often with the support of visual aids. This is even more true when the conversation turns to the median's statistical first cousins -- quartiles and percentiles.

Posted by: Joe Brown | February 05, 2010 at 03:05 PM
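A quick sketch of the stability point raised in the comment above, assuming a small benchmark cut of six incumbents where one lower-paid incumbent is replaced by a much higher-paid one between survey years. The salaries and the helper `pct_change` are hypothetical.

```python
from statistics import mean, median

# Hypothetical annual salaries for one benchmark job in two survey years.
year1 = [52_000, 55_000, 57_000, 60_000, 62_000, 66_000]
# Same cut a year later: the 52,000 incumbent is gone, a 95,000 one is in.
year2 = [55_000, 57_000, 60_000, 62_000, 66_000, 95_000]

def pct_change(old, new):
    """Percentage change from old to new."""
    return (new - old) / old * 100

mean_change = pct_change(mean(year1), mean(year2))
median_change = pct_change(median(year1), median(year2))

print(f"mean moved   {mean_change:.2f}%")
print(f"median moved {median_change:.2f}%")
```

Here a single change of incumbent moves the mean by about 12% but the median by only about 4%, even though no one else's pay changed.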

Joe:

Wow. I hereby pass my comp geek crown to you!

I second all of your experiences and observations. And yes, I do continue to encounter colleagues who are thrown by the term "median", or who have to be continually reminded of what it is. Or then there is the one colleague of mine who - despite my many efforts to use and define the term for her over the years - insists on calling it the "medium". Gaaaaahh.

Posted by: Ann Bares | February 05, 2010 at 03:21 PM

Ann - tell your friend that the "medium" is the person in HR who knows the future market pay data ;-).

Seriously, the median has all the advantages that Joe so eloquently described. In addition, I prefer it over the mean because it serves as a starting point for whatever set of percentiles, quartiles, etc. that you deem necessary to help convey the skewness, flatness, or other descriptors of the distribution shape. (A comparison of the mean against the median can hint at skewness, but leaves a lot of other important information out.) This information is very useful when you want more of a granular idea of where a particular value sits in the distribution (the classic "top quartile", etc. descriptor).

And of course, therein lies another of the reasons why people sometimes prefer the mean over the median. Comparing two distributions is much simpler using the means; you are pulling all of the bits and pieces of the curve into one number (along with making huge assumptions) and "one number" has a lot of irresistible attraction to decision makers who want things presented in black and white. One curve's mean is either higher or lower than another curve's mean and that's that.

Mark

Posted by: Mark Bennett | February 05, 2010 at 04:10 PM
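The quartile idea in the comment above can be sketched with `statistics.quantiles`. The data set is invented, the helper `quartile_band` is hypothetical, and the choice of `method="inclusive"` is an assumption; real surveys may compute percentiles differently.

```python
from statistics import mean, quantiles

# Hypothetical annual pay rates (in thousands) matched to one survey job.
pay = [40, 42, 45, 47, 50, 53, 58, 66, 90]

# Three cut points: 25th percentile, median, 75th percentile.
q1, q2, q3 = quantiles(pay, n=4, method="inclusive")

def quartile_band(value, cuts):
    """Label where a value sits relative to the three quartile cut points."""
    low, mid, high = cuts
    if value < low:
        return "bottom quartile"
    if value < mid:
        return "second quartile"
    if value < high:
        return "third quartile"
    return "top quartile"

print(f"Q1={q1}, median={q2}, Q3={q3}, mean={mean(pay):.1f}")
print(quartile_band(62, (q1, q2, q3)))
```

For this data the mean sits above the median, which hints at a right skew, while the quartiles show how wide the middle half of the market is and where a given rate falls within it.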

I'm loving this. Look at all the stat jocks coming together here!

Mark:

I should do that, but I'm afraid if I try to go through it with her one more time, my head will explode.

Great points all well made. Our tolerance for considering multiple statistical "bits and pieces" does ultimately bring us a richer understanding of the data, which - hopefully - informs our advice and decisions. Then, our challenge is to bring that richer understanding back to a simple explanation for those short-attention-spanned decision makers!

Posted by: Ann Bares | February 05, 2010 at 04:33 PM

I am not as much of a stat geek as any of you but I am going to come in and say that while it depends on the data set, the mean seems rarely useful without some additional context.

My personal preference when examining compensation data is to look at it by percentile. Obviously median is an important piece of that but also the gap between the quartiles (25th & 75th percentiles) goes a long way into determining how competitive we need to go in order to stay safely above market.

Posted by: Lance Haun | February 05, 2010 at 05:19 PM

Great discussion with all bang on correct.

As Joe and Mark nailed the key principles, I'll just add that the MEDIAN, as the middle in the rank-ordered sequence, is not affected by how high the high is or how low the low is; the MEDIAN is still the middle no matter what the value of the extreme outliers: but the MEAN is directly affected by extreme outliers, thus producing the "head in the freezer with feet in the oven yielding an overall average temperature of just fine" scenario.

The MEDIAN is more stable over time and is a superior measure of normalcy, being the center of the whole shebang. Good statisticians intuitively want to see the middle 50% and also ask for standard error statistics to see how precise the MEAN is.

The best arguments for using MEAN are (a) it is typically the weighted average and regular folks are more familiar with the word "average" than with the statistical term "median" and (b) the mean average is required for reliability statistics, since one standard error is defined in terms of the symmetrical deviation from the mean that captures roughly 68% of the observations in a parametric bell-shaped distribution. But pay is never actually distributed in a parametric bell shape. No one earns zero or below the minimum wage and there are always a few exceptions paid way more than anyone else, so pay has a curve like a baseball cap with a long bill extending to the right.

As Ben. Disraeli said, "there are lies, damn lies, and statistics."

Posted by: E James (Jim) Brennan | February 06, 2010 at 12:11 AM
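As a rough check on the bell-curve point above: for a true normal distribution, about 68% of observations fall within one standard deviation of the mean, but a right-skewed pay sample need not behave that way. The sketch below compares the two; the pay data and the helper `share_within_one_sd` are hypothetical.

```python
from statistics import NormalDist, mean, stdev

# Share of a standard normal distribution within one SD of its mean.
normal_share = NormalDist().cdf(1) - NormalDist().cdf(-1)

# Hypothetical right-skewed pay sample (in thousands): one very high earner.
pay = [30, 32, 33, 35, 36, 38, 40, 45, 60, 150]

def share_within_one_sd(values):
    """Fraction of values within one sample standard deviation of the mean."""
    center, spread = mean(values), stdev(values)
    inside = [v for v in values if abs(v - center) <= spread]
    return len(inside) / len(values)

skewed_share = share_within_one_sd(pay)
print(f"normal curve: {normal_share:.1%}, skewed sample: {skewed_share:.1%}")
```

In this made-up sample the one extreme rate inflates the standard deviation so much that nine of the ten observations land inside the one-SD band, so the familiar 68% rule of thumb does not describe the data.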

You beat me to the punch on the variability issue, James. I do like the median better - but in conjunction with the standard deviation you mention. The key in my mind is the variability of the data. Median tells you the middle and the standard deviation (which of course you use the mean to calculate) tells you how much the data is spread out/varies from the average.

The best way (IMHO) to look at this is to eliminate the outliers. They exist - but they shouldn't be used when looking for information on the group as a whole. Outliers only explain - outliers - they don't explain the "norm" - at least until they begin to be the norm - and that would be good information to have - the change over time in the variability and the median. Or it could be that it's early on Saturday and I don't have enough coffee.

Posted by: Paul Hebert | February 06, 2010 at 08:45 AM
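The comment above does not say how outliers should be identified. One common convention, used here purely as an assumption, is to flag anything more than 1.5 times the interquartile range beyond the quartiles. The data and the helper `split_outliers` are hypothetical.

```python
from statistics import median, quantiles, stdev

# Hypothetical annual pay rates (in thousands) with one extreme value.
pay = [40, 42, 45, 47, 50, 53, 58, 66, 140]

def split_outliers(values, k=1.5):
    """Split values into (kept, flagged) using k * IQR fences.

    The 1.5 * IQR rule is an illustrative choice, not the commenter's method.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    kept = [v for v in values if low <= v <= high]
    flagged = [v for v in values if v < low or v > high]
    return kept, flagged

kept, flagged = split_outliers(pay)
print("flagged:", flagged)
print(f"median: {median(pay)} -> {median(kept)}")
print(f"stdev:  {stdev(pay):.1f} -> {stdev(kept):.1f}")
```

Dropping the flagged rate barely moves the median but shrinks the standard deviation considerably, which is one way to see how much of the apparent variability came from a single observation.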

Lance, Jim and Paul - thanks so much for chiming in here. What a great discussion with additional excellent points, particularly looking at the spread and considering the standard deviation (when it is available - in my experience, few pay surveys provide this and so you have to guess at the variability via the percentiles and any other clues....).

Posted by: Ann Bares | February 06, 2010 at 11:50 AM

The reasoning offered by all the respondents for utilizing mean or median has been excellent. We calculate both and then evaluate the standard deviation between the mean and median to validate the market salary data.

Posted by: Blair Johanson | February 08, 2010 at 08:02 AM

We use the median for our salary pay ranges, and when looking at hourly pay rates we primarily focus on the mode. The mean isn't very helpful when trying to make pay decisions for all of the reasons already listed by others here. I do find it necessary to include definitions when we're talking to line managers, but these measures of central tendency are all easy to explain in plain English, so with a short explanation people get it.

Posted by: Darcy Dees | February 08, 2010 at 09:27 AM

Blair:

Interesting approach for validating the market salary data - thanks for sharing here!

Darcy:

So you use mode for hourly rates - interesting. I can think of some reasons why that might make sense - would love to hear more about how you landed on that approach. Thanks for sharing here!

Posted by: Ann Bares | February 08, 2010 at 03:34 PM

If the question to be answered is what would a fully competent person command in base salary seeking a perfectly homogeneous, generic job in a perfectly competitive job market, then all of the fine points discussed would make great debating sense.

In the practical situation, often survey data I see doesn't give you a choice - more often than not, only the median and other quartiles are given without the mean. Even when the mean is given, it may not be noted whether it is weighted. So you take what you can get and move on.

Combining Ann's anecdote about Bill Gates and Darcy's point about the mode, it was precisely in a situation like this, with some programmer pay rates in Seattle, where the mode would have been the choice because Microsoft so dominated the market.

BTW: A hello to Jim Brennan from a St. Louisian, and to his point about oven, freezer, and body - would not the mean and median coincide in that case?

And remember, don't attempt to do math tricks on percentiles at home.

Posted by: Andy Klemm | February 09, 2010 at 03:55 PM

Andy:

It's true: Given what's offered by some of the resources that try to pawn themselves off as pay surveys, the whole debate on choice of statistics becomes moot and irrelevant. Thanks for the reality check ... and the humor! And for checking in!

Posted by: Ann Bares | February 09, 2010 at 04:56 PM

Great discussion. Very enlightening. One of our main Compensation surveys moved to a new format this year and I cannot gather data by our preferred incumbent weighted median. What statistical measure would be your preference?

Posted by: Christina | February 11, 2010 at 12:27 PM
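For readers unfamiliar with the term in the question above, the sketch below assumes that an incumbent-weighted median counts each organization's reported rate once per incumbent, while an organization-weighted (unweighted) median counts each organization once. The survey rows and the helper `weighted_median` are hypothetical, and the helper returns the lower value if the cumulative weight lands exactly on the halfway point.

```python
from statistics import median

# Hypothetical survey rows: (reported annual rate, number of incumbents).
rows = [(48_000, 2), (52_000, 3), (55_000, 40), (70_000, 1)]

def weighted_median(values, weights):
    """Return the value at which cumulative weight first reaches half the total."""
    total = sum(weights)
    running = 0
    for value, weight in sorted(zip(values, weights)):
        running += weight
        if running >= total / 2:
            return value

rates = [rate for rate, _ in rows]
counts = [count for _, count in rows]

org_weighted = median(rates)                       # each organization counts once
incumbent_weighted = weighted_median(rates, counts)  # each incumbent counts once

print(f"organization-weighted median: {org_weighted:,.0f}")
print(f"incumbent-weighted median:    {incumbent_weighted:,.0f}")
```

In this example one large employer holds most of the incumbents, so the incumbent-weighted median lands on that employer's rate while the organization-weighted median falls between the two middle organizations.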

Christina:

Median. ;-)

Posted by: Ann Bares | February 16, 2010 at 01:19 PM

Andy: hi back to all in the Midwest from the NorthWet.

With freezer-head and oven-feet, my center of mass might be a third temperature, even if the median and mode matched, but I'll heed your advice and eschew further testing. (What a great word that no one ever uses in standard speech!)

;-)

Posted by: E. James (Jim) Brennan | February 16, 2010 at 02:04 PM