Before vs After Controlling for Z
Ice cream sales correlate with drowning deaths. But once you control for summer heat, the correlation vanishes. Partial correlation is a way to check whether an association survives after a suspected confounder has been accounted for.
Partial correlation measures the relationship between X and Y after statistically removing the influence of a control variable Z.
The raw correlation between X and Y is 0.997 (very strong). After controlling for Z, the partial correlation is 0.594 (moderate), a 40% drop in absolute strength.
The significance test uses t = r(XY.Z) * sqrt(n - 3) / sqrt(1 - r(XY.Z)^2), with df = n - 3, where n is the sample size. With k control variables, df = n - 2 - k.
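A minimal sketch of the test statistic using only the standard library. The function name `partial_corr_t_test` and its `k` parameter (number of control variables) are illustrative, not from any library.

```python
import math

def partial_corr_t_test(r_partial: float, n: int, k: int = 1) -> tuple[float, int]:
    """Return (t, df) for testing H0: the partial correlation is zero.

    r_partial: sample partial correlation, e.g. r(XY.Z)
    n: sample size
    k: number of control variables (k = 1 gives df = n - 3)
    """
    df = n - 2 - k
    if df < 1:
        raise ValueError("need n > k + 2 observations")
    if not -1.0 < r_partial < 1.0:
        raise ValueError("partial correlation must lie strictly between -1 and 1")
    t = r_partial * math.sqrt(df) / math.sqrt(1.0 - r_partial ** 2)
    return t, df
```

For example, `partial_corr_t_test(0.5, n=28)` gives t ≈ 2.887 with df = 25 (n = 28 is a hypothetical sample size; the page does not state one). If SciPy is available, a two-sided p-value is `2 * scipy.stats.t.sf(abs(t), df)`.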
Z explains part of the X-Y relationship, but a remaining direct relationship may still exist.
With Z held constant, X accounts for 35.2% of the remaining variance in Y (this is the squared partial correlation).
To isolate the X-Y relationship, remove Z's linear influence from X and from Y separately (regress each on Z and keep the residuals), then correlate what is left.
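The first-order formula is r(XY.Z) = (rXY - rXZ x rYZ) / (sqrt(1 - rXZ^2) x sqrt(1 - rYZ^2)). For example, use rXY = 0.63, rXZ = 0.57, and rYZ = 0.88.
Numerator: 0.63 - (0.57 x 0.88) = 0.63 - 0.5016 = 0.1284
Denominator: sqrt(1 - 0.57^2) x sqrt(1 - 0.88^2) = 0.8216 x 0.4750 = 0.3903
r(XY.Z) = 0.1284 / 0.3903 = 0.329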
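The same arithmetic can be checked in a few lines of Python. This is a sketch; the name `partial_corr` is illustrative, not a library function.

```python
import math

def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation r(XY.Z) from the three pairwise correlations."""
    numerator = r_xy - r_xz * r_yz
    denominator = math.sqrt(1 - r_xz ** 2) * math.sqrt(1 - r_yz ** 2)
    return numerator / denominator

# Worked example: rXY = 0.63, rXZ = 0.57, rYZ = 0.88
print(round(partial_corr(0.63, 0.57, 0.88), 3))  # prints 0.329
```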
In the stork example, controlling for land area reduces the stork-birth-rate correlation from 0.63 to about 0.33. Much of the original relationship was explained by the control variable.
| Example | What happens after controlling for Z | Interpretation |
|---|---|---|
| Ice cream sales and drowning deaths | Control for temperature, and the link vanishes | Z is likely the common driver; X and Y have little direct relationship |
| Stork counts and birth rates | Control for land area, and the relationship weakens | Z explains part of the relationship, but some direct X-Y link remains |
| Study time and exam score | Control for IQ, and the relationship barely changes | Z explains little of the X-Y relationship |
| - | The relationship becomes stronger or changes direction | Z was masking (suppressing) the X-Y relationship |
| Type | Definition | Symbol | Use |
|---|---|---|---|
| Zero-order | Ordinary correlation with no control variable | r(XY) | Initial exploration of the relationship |
| Partial | Removes Z from both X and Y | r(XY.Z) | Remove a confounding variable |
| Semi-partial | Removes Z from only one variable (here X) | r(X.Z)Y | Unique contribution of X in regression |
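Use partial correlation when you want the X-Y relationship with Z's influence removed from both variables. Use semi-partial correlation when you are building a regression model and want to know how much unique variance X contributes: its square equals the increase in R^2 when X is added to a model that already contains Z.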
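To compare the two measures numerically, here is a sketch that reuses the worked example's correlations (rXY = 0.63, rXZ = 0.57, rYZ = 0.88). The helper names are hypothetical. It also checks the identity that the squared semi-partial correlation equals the gain in R^2 from adding X to a regression of Y on Z, using the standard two-predictor R^2 formula written in terms of pairwise correlations.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """r(XY.Z): Z removed from both X and Y."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def semipartial_corr(r_xy, r_xz, r_yz):
    """r(X.Z)Y: Z removed from X only, then correlated with Y."""
    return (r_xy - r_xz * r_yz) / math.sqrt(1 - r_xz ** 2)

def r_squared_two_predictors(r_xy, r_xz, r_yz):
    """R^2 from regressing Y on both X and Z, computed from pairwise correlations."""
    return (r_xy ** 2 + r_yz ** 2 - 2 * r_xy * r_yz * r_xz) / (1 - r_xz ** 2)

r_xy, r_xz, r_yz = 0.63, 0.57, 0.88
pr = partial_corr(r_xy, r_xz, r_yz)
sr = semipartial_corr(r_xy, r_xz, r_yz)
# Gain in R^2 when X joins a model that already contains Z (whose R^2 is rYZ^2)
delta_r2 = r_squared_two_predictors(r_xy, r_xz, r_yz) - r_yz ** 2
print(round(pr, 3), round(sr, 3), round(sr ** 2, 4), round(delta_r2, 4))
```

With these inputs the partial correlation is about 0.33 and the semi-partial about 0.16; the squared semi-partial and the R^2 gain agree at roughly 0.024. The semi-partial is never larger in magnitude than the partial because its denominator omits the sqrt(1 - rYZ^2) factor.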
Second-order partial correlation controls for two variables, say Z and W. It applies the first-order formula again to correlations that already have Z removed: r(XY.ZW) = (r(XY.Z) - r(XW.Z) x r(YW.Z)) / (sqrt(1 - r(XW.Z)^2) x sqrt(1 - r(YW.Z)^2)).
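A sketch of the second-order case that applies the first-order formula twice. W denotes the second control variable, and the function names are illustrative.

```python
import math

def partial_corr(r_ab, r_ac, r_bc):
    """First-order partial correlation r(AB.C)."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))

def partial_corr_2nd(r_xy, r_xz, r_yz, r_xw, r_yw, r_zw):
    """Second-order partial correlation r(XY.ZW)."""
    # Step 1: remove Z from each pair among X, Y, and W
    r_xy_z = partial_corr(r_xy, r_xz, r_yz)
    r_xw_z = partial_corr(r_xw, r_xz, r_zw)
    r_yw_z = partial_corr(r_yw, r_yz, r_zw)
    # Step 2: remove W from the Z-adjusted X-Y correlation
    return partial_corr(r_xy_z, r_xw_z, r_yw_z)
```

The result does not depend on which control is removed first, so swapping the roles of Z and W in the call is a useful sanity check.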
With more than two control variables, the recursive formula becomes tedious. Instead, regress X and Y on all the controls and correlate the residuals, or use statistical software such as R, Python, SPSS, or Stata.
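A sketch of the multiple-regression residual approach with NumPy, assuming NumPy is installed. The function name `partial_corr_controls` is hypothetical. It fits X and Y on an intercept plus every control with `np.linalg.lstsq`, then correlates the residuals.

```python
import numpy as np

def partial_corr_controls(x, y, controls):
    """Partial correlation of x and y controlling for every column of `controls`.

    x, y: 1-D sequences of length n; controls: shape (n,) or (n, k).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    Z = np.asarray(controls, dtype=float)
    if Z.ndim == 1:
        Z = Z[:, np.newaxis]  # treat a single control as one column
    # Design matrix: an intercept column plus all control variables
    A = np.column_stack([np.ones(len(x)), Z])
    # Least-squares fits of x and y on the controls; keep the residuals
    beta_x, *_ = np.linalg.lstsq(A, x, rcond=None)
    beta_y, *_ = np.linalg.lstsq(A, y, rcond=None)
    res_x = x - A @ beta_x
    res_y = y - A @ beta_y
    # The partial correlation is the ordinary correlation of the residuals
    return np.corrcoef(res_x, res_y)[0, 1]
```

An equivalent shortcut, when every variable except X and Y is a control, inverts the full correlation matrix P = R^-1 and takes -P[i, j] / sqrt(P[i, i] * P[j, j]).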
Partial correlation measures the relationship between two variables while statistically removing the effect of one or more control variables. It is used to test whether an observed correlation is genuine or driven by a confounding variable.
Partial correlation and linear regression are closely related: r(XY.Z) equals the correlation between the residuals of regressing X on Z and the residuals of regressing Y on Z.
A partial correlation can be stronger than the raw correlation, or even have the opposite sign. This is called a suppressor effect: when Z masks the X-Y relationship, removing Z's influence can reveal a stronger or sign-reversed correlation.
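A small numeric illustration of suppression using the first-order formula. The correlation values are hypothetical, chosen only so that each forms a valid correlation matrix and shows the effect; they are not from the text.

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation r(XY.Z)."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

# Z relates to X but not to Y: removing Z strips irrelevant variance from X,
# so the X-Y link gets stronger (0.30 / 0.8 = 0.375)
stronger = partial_corr(r_xy=0.30, r_xz=0.60, r_yz=0.00)

# Z relates to both, and r_xz * r_yz (0.36) exceeds r_xy (0.20),
# so the sign flips (-0.16 / 0.64 = -0.25)
reversed_sign = partial_corr(r_xy=0.20, r_xz=0.60, r_yz=0.60)

print(round(stronger, 3), round(reversed_sign, 3))
```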
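A non-significant partial correlation means that, after controlling for Z, your sample does not provide enough evidence of a direct relationship between X and Y. The original correlation may have been spurious (driven by Z), or the study may lack the statistical power to detect a remaining effect.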
Partial correlation also works with Spearman rank correlations: plugging Spearman coefficients into the same formula gives a rank-based partial correlation that is more robust to outliers and non-normality.
Partial correlation is one step in a broader workflow: first inspect the zero-order relationship, then test significance, then think carefully about causation.