Investing often involves closely examining numbers. Investors rely on data, and trends in that data, to provide insights into what's happening with their investments. Sometimes those numbers look odd even to analysts, and Simpson's Paradox is a prime example.

In this article, we're going to discuss the topic of Simpson's Paradox.  As part of that explanation, we'll provide a brief history of the term as well as a summary definition.  Then we'll finish this topic with some examples, illustrating how this paradox can apply to investment portfolios.

In the study of probability and statistics, Simpson's paradox is the seemingly contradictory result in which a trend, such as an improvement, appears in every subpopulation, yet disappears or reverses when those subpopulations are combined.

### History

In the paper "The Interpretation of Interaction in Contingency Tables," published in the Journal of the Royal Statistical Society back in 1951, Edward Hugh Simpson explained the phenomenon whereby an event that would increase the occurrence of a condition in a given population could, at the same time, decrease the occurrence of that same condition in every subpopulation.

The importance of this paper to all analysts is simple:  Don't make assumptions about data; take care when interpreting numbers.  Let's see how this works with some practical examples.

### Finding Average Values

Every analyst knows that when examining a population, it's possible to calculate average values for each segment of the population.  They also know it's incorrect to take an average of those average values to determine the average for the population.  This point can be demonstrated with a hypothetical example consisting of a three-stock portfolio:

#### Average Value Example

| Investment | Starting Value | Ending Value | Increase |
|---|---|---|---|
| Stock A | 100 | 110 | 10.0% |
| Stock B | 200 | 240 | 20.0% |
| Stock C | 300 | 390 | 30.0% |
| Totals | 600 | 740 | 23.3% |

By taking a simple average of the increases for the above three stocks, the analyst might incorrectly conclude the overall portfolio increase was 20%.  The totals row shows the correct value is actually 23.3%.  Anyone who has made this mistake in the past knows the rule is "you cannot take an average of an average."

A reliable way to reach the correct figure is to weight each stock's increase by its share of the total starting value and add the results together, as shown in this second example:

#### Weighted Value Example

| Investment | Starting Value | Ending Value | Increase | Weighted Value |
|---|---|---|---|---|
| Stock A | 100 | 110 | 10.0% | 1.7% |
| Stock B | 200 | 240 | 20.0% | 6.7% |
| Stock C | 300 | 390 | 30.0% | 15.0% |
| Totals | 600 | 740 | 23.3% | 23.3% |

The weighted value is found by taking each starting value and dividing it by the total of all starting values, then multiplying that number times the increase.  For Stock A, the calculation is:

Stock A Weighted Value = (100 / 600) x 10.0% = 1.7%
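The same arithmetic can be checked with a short script. This is a minimal sketch using the figures from the tables above; the variable names (`holdings`, `simple_average`, `weighted_average`) are illustrative choices, not part of any library.

```python
# Hypothetical three-stock portfolio from the tables above: (starting, ending) values.
holdings = {
    "Stock A": (100, 110),
    "Stock B": (200, 240),
    "Stock C": (300, 390),
}

total_start = sum(start for start, _ in holdings.values())
total_end = sum(end for _, end in holdings.values())

# Per-stock increase, expressed as a fraction of its starting value.
increases = {
    name: (end - start) / start for name, (start, end) in holdings.items()
}

# Naive "average of averages": ignores how much money sits in each stock.
simple_average = sum(increases.values()) / len(increases)

# Weighted average: each increase scaled by its share of the total starting value.
weighted_average = sum(
    (start / total_start) * increases[name]
    for name, (start, _) in holdings.items()
)

# Direct calculation from the totals row.
portfolio_increase = (total_end - total_start) / total_start

print(f"Simple average:   {simple_average:.1%}")
print(f"Weighted average: {weighted_average:.1%}")
print(f"From totals:      {portfolio_increase:.1%}")
```

The weighted average and the totals-row calculation agree, while the simple average understates the result because the largest holding (Stock C) also had the largest increase.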

Is it possible for every stock in a portfolio to increase its year-over-year return and the overall value of the portfolio to decrease?  The answer is yes, especially if someone is actively trading stocks.

### Investment Portfolios

A very simple stock portfolio example can demonstrate Simpson's paradox.  In this case, there is a hypothetical portfolio consisting of three stocks, and trading is limited to exchanges between these stocks.  The total number of shares held at the start and end of this timeline will be exactly the same (3,000).  Finally, the ending price of each stock will be exactly 20% higher than its starting price.  Unfortunately, the ending value of the portfolio is exactly 10% lower than the starting value, as demonstrated in the example below.

| | Stock A | Stock B | Stock C | Totals |
|---|---|---|---|---|
| Starting Stock Value | \$10.00 | \$20.00 | \$30.00 | |
| Shares Held | 1,000 | 1,000 | 1,000 | 3,000 |
| Starting Value | \$10,000 | \$20,000 | \$30,000 | \$60,000 |
| Stock Ending Value | \$12.00 | \$24.00 | \$36.00 | |
| Shares Held | 2,000 | 500 | 500 | 3,000 |
| Ending Value | \$24,000 | \$12,000 | \$18,000 | \$54,000 |

For the above to be true, there were obvious dips in the value of stocks when the majority of the trades occurred.  Still, this example makes the point, and this scenario can, and does, occur all the time.  During bear markets, many investors try to time the market and wind up selling low, only to re-enter the market after prices have risen.  This is the classic mistake of "buy high / sell low."
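The portfolio table can be verified with a few lines of code. The sketch below takes the per-share prices and share counts straight from the table and computes both each stock's price change and the change in total portfolio value; the dictionary and function names are hypothetical and chosen only for illustration.

```python
# Per-share prices and share counts, copied from the portfolio table above.
start_prices = {"A": 10.00, "B": 20.00, "C": 30.00}
end_prices = {"A": 12.00, "B": 24.00, "C": 36.00}
start_shares = {"A": 1_000, "B": 1_000, "C": 1_000}
end_shares = {"A": 2_000, "B": 500, "C": 500}


def portfolio_value(prices, shares):
    """Total value: price per share times shares held, summed over all stocks."""
    return sum(prices[stock] * shares[stock] for stock in prices)


# Change in each stock's price over the period.
price_changes = {
    stock: end_prices[stock] / start_prices[stock] - 1 for stock in start_prices
}

start_value = portfolio_value(start_prices, start_shares)
end_value = portfolio_value(end_prices, end_shares)
portfolio_change = end_value / start_value - 1

for stock, change in price_changes.items():
    print(f"Stock {stock} price change: {change:+.1%}")
print(f"Portfolio value change: {portfolio_change:+.1%}")
```

Every price change is positive while the portfolio change is negative, because the mix of shares shifted toward the lowest-priced stock between the two snapshots.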

A more commonly cited example of Simpson's paradox has to do with unemployment rates.  This example involves breaking a population into subgroups based on their level of education.  Over a five-year timeframe, each segment experiences an increase in unemployment, yet the unemployment rate of the entire population goes down.  This second example illustrates how this can happen.

| | No High School | High School | College Degree | Totals |
|---|---|---|---|---|
| Initial Counts | 10,000,000 | 10,000,000 | 10,000,000 | 30,000,000 |
| Unemployment Rate | 8.0% | 6.0% | 4.0% | 6.0% |
| Unemployment Count | 800,000 | 600,000 | 400,000 | 1,800,000 |
| Final Counts | 5,000,000 | 5,000,000 | 20,000,000 | 30,000,000 |
| Unemployment Rate | 8.8% | 6.6% | 4.4% | 5.5% |
| Unemployment Count | 440,000 | 330,000 | 880,000 | 1,650,000 |

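In this second example, the unemployment rate for each subgroup increased by 10% in relative terms (for instance, from 8.0% to 8.8%), yet the overall rate fell from 6.0% to 5.5%.  The reversal comes from the change in population mix: by the end of the period, two-thirds of the population sits in the college-degree group, which has the lowest unemployment rate.  These two examples illustrate not only Simpson's paradox but also the importance of examining data with care.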
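The unemployment figures can be reproduced the same way. The following is a minimal sketch, assuming the counts and rates from the table above; `overall_rate` is a hypothetical helper that computes the population-weighted unemployment rate.

```python
# Subgroup sizes and unemployment rates, copied from the table above.
groups = ["No High School", "High School", "College Degree"]
initial_counts = [10_000_000, 10_000_000, 10_000_000]
initial_rates = [0.080, 0.060, 0.040]
final_counts = [5_000_000, 5_000_000, 20_000_000]
final_rates = [0.088, 0.066, 0.044]


def overall_rate(counts, rates):
    """Population-weighted rate: total unemployed divided by total population."""
    unemployed = sum(count * rate for count, rate in zip(counts, rates))
    return unemployed / sum(counts)


initial_overall = overall_rate(initial_counts, initial_rates)
final_overall = overall_rate(final_counts, final_rates)

for name, before, after in zip(groups, initial_rates, final_rates):
    print(f"{name}: {before:.1%} -> {after:.1%}")
print(f"Overall: {initial_overall:.1%} -> {final_overall:.1%}")
```

Each subgroup's rate rises, but the overall rate falls because the weights (the subgroup sizes) changed between the two periods.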