Formula anatomy

Correlation Coefficient Formula: Every Symbol Explained

Most explanations show you the formula. This page shows you what it means, why it works, and how each number in the calculation changes the final r value.

Conceptual form

Best for understanding what r is measuring.
$$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$

Computational form

Best for hand calculation because it avoids writing deviations first.
$$r = \frac{n\sum x_i y_i - \sum x_i\sum y_i}{\sqrt{[n\sum x_i^2-(\sum x_i)^2][n\sum y_i^2-(\sum y_i)^2]}}$$

Standardized-score form

Best for seeing r as the average product of z-scores (the sum of the products divided by n - 1).
$$r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right)$$

Quick calculator

Calculate r from two lists

Result
r = 0.953

Very strong positive correlation from 5 paired values.

r squared: 0.907
Numerator: 28
Denominator: 29.394

[Scatter plot preview: r = 0.953]

Step-by-step solution

See every number behind r

Step 1: Compute the means
xbar = 5, ybar = 4.6

Step 2: Multiply paired deviations
sum of cross-products = 28

Step 3: Normalize by spread
sqrt(20 x 43.2) = sqrt(864) = 29.394

Step 4: Divide
r = 28 / 29.394 = 0.953

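| i | x | y | x - xbar | y - ybar | product | (x - xbar)^2 | (y - ybar)^2 |
|---|---|---|---|---|---|---|---|
| 1 | 2 | 1 | -3 | -3.6 | 10.8 | 9 | 12.96 |
| 2 | 4 | 3 | -1 | -1.6 | 1.6 | 1 | 2.56 |
| 3 | 5 | 3 | 0 | -1.6 | 0 | 0 | 2.56 |
| 4 | 6 | 7 | 1 | 2.4 | 2.4 | 1 | 5.76 |
| 5 | 8 | 9 | 3 | 4.4 | 13.2 | 9 | 19.36 |
| Sum | 25 | 23 | | | 28 | 20 | 43.2 |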
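
The four steps can be reproduced in a few lines of Python. This is a minimal sketch of the conceptual form using the five pairs from the table; the function name `pearson_r` is an illustrative choice, not something the calculator exposes.

```python
from math import sqrt

# Paired values from the table above.
x = [2, 4, 5, 6, 8]
y = [1, 3, 3, 7, 9]

def pearson_r(xs, ys):
    """Return (numerator, denominator, r) from the deviation form."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Numerator: sum of cross-products of paired deviations.
    numerator = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    # Denominator: square root of the product of the two sums of squares.
    ss_x = sum((xi - x_bar) ** 2 for xi in xs)
    ss_y = sum((yi - y_bar) ** 2 for yi in ys)
    denominator = sqrt(ss_x * ss_y)
    return numerator, denominator, numerator / denominator

numerator, denominator, r = pearson_r(x, y)
print(round(numerator, 3), round(denominator, 3), round(r, 3))  # 28.0 29.394 0.953
```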

Computational shortcut

The hand-calculation form gives the same answer after multiplying the conceptual numerator and denominator by n.

nΣxy - ΣxΣy = 140
nΣx² - (Σx)² = 100
nΣy² - (Σy)² = 216
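
As a quick check, the three quantities above can be computed from raw sums alone. The sketch below assumes the same five pairs used in the step-by-step solution; the variable names are illustrative.

```python
from math import sqrt

x = [2, 4, 5, 6, 8]
y = [1, 3, 3, 7, 9]
n = len(x)

# Raw sums only: no deviations are written out.
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)
sum_y2 = sum(yi ** 2 for yi in y)

top = n * sum_xy - sum_x * sum_y   # n*Sum(xy) - Sum(x)*Sum(y)
x_part = n * sum_x2 - sum_x ** 2   # n*Sum(x^2) - (Sum x)^2
y_part = n * sum_y2 - sum_y ** 2   # n*Sum(y^2) - (Sum y)^2
r = top / sqrt(x_part * y_part)

print(top, x_part, y_part, round(r, 3))  # 140 100 216 0.953
```

Note that 140 is the conceptual numerator 28 multiplied by n = 5, and sqrt(100 x 216) is the conceptual denominator 29.394 multiplied by 5, so the ratio is unchanged.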

Standardized-score view

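If you convert each value to a z-score using the sample standard deviation, r is the sum of the products of paired z-scores divided by n - 1, which is effectively their average. Points that are above average together, or below average together, push r upward; opposite-side pairs push it down.

r = Σ(zX x zY) / (n - 1) = 0.953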
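
A minimal sketch of the z-score view, assuming the same five pairs and Python's standard `statistics` module (whose `stdev` uses the n - 1 divisor). The helper `z_scores` is a name introduced here for illustration.

```python
from statistics import mean, stdev

x = [2, 4, 5, 6, 8]
y = [1, 3, 3, 7, 9]
n = len(x)

def z_scores(values):
    """Standardize values with the sample standard deviation (n - 1)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

zx, zy = z_scores(x), z_scores(y)
# Sum the paired z-score products, then divide by n - 1.
r = sum(a * b for a, b in zip(zx, zy)) / (n - 1)
print(round(r, 3))  # 0.953
```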

What every symbol in the formula actually means

The formula looks intimidating because it compresses four ideas into one line: center the data, measure co-movement, normalize by spread, then read the final ratio.

Deviation from the mean

$$(x_i - \bar{x}) \quad \text{and} \quad (y_i - \bar{y})$$

This asks how far each value sits from its own average. Positive means above average; negative means below average.

When X and Y deviations have the same sign, their product is positive. When they have opposite signs, the product is negative.

Sum of cross-products

$$\sum (x_i - \bar{x})(y_i - \bar{y})$$

This is the co-movement score. It grows positive when X and Y tend to be above or below average together.

Divide this by n - 1 and you get the sample covariance. Pearson r standardizes that covariance.
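
To make the covariance link concrete, here is a small sketch that divides the cross-product sum by n - 1 and then standardizes by the two sample standard deviations. It assumes the five pairs from the quick calculator example.

```python
from statistics import stdev

x = [2, 4, 5, 6, 8]
y = [1, 3, 3, 7, 9]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Sum of cross-products, then the sample covariance.
cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sample_cov = cross / (n - 1)  # 28 / 4

# Standardize the covariance by both sample standard deviations.
r = sample_cov / (stdev(x) * stdev(y))
print(round(sample_cov, 3), round(r, 3))  # 7.0 0.953
```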

Normalizing denominator

$$\sqrt{\sum (x_i - \bar{x})^2}\sqrt{\sum (y_i - \bar{y})^2}$$

The numerator depends on measurement scale. The denominator rescales it so dollars, cents, meters, and miles all land on the same -1 to +1 scale.

Geometrically, these two square-root terms are the lengths of the mean-centered X and Y vectors.
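
The unit-free property is easy to test. In the sketch below, the x values are treated as dollars and converted to cents, and the y values are rescaled and shifted; both conversions are hypothetical and chosen only for illustration. A positive scale factor leaves r unchanged, while a negative one would flip its sign.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from the deviation form."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    num = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    den = sqrt(sum((a - x_bar) ** 2 for a in xs) * sum((b - y_bar) ** 2 for b in ys))
    return num / den

x = [2, 4, 5, 6, 8]
y = [1, 3, 3, 7, 9]

# Hypothetical unit changes: scale x by 100, scale and shift y.
x_cents = [100 * v for v in x]
y_shifted = [0.5 * v + 10 for v in y]

r_original = pearson_r(x, y)
r_rescaled = pearson_r(x_cents, y_shifted)
print(round(r_original, 3), round(r_rescaled, 3))  # 0.953 0.953
```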

The final ratio r

$$r = \frac{\text{observed co-movement}}{\text{maximum possible co-movement}}$$

A ratio of 1 means the variables move in perfect lockstep; -1 means they move in perfect lockstep in opposite directions; 0 means their mean-centered movements do not align.

The Cauchy-Schwarz inequality guarantees the numerator cannot exceed the denominator in absolute value.
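
Written out, with a_i and b_i introduced here as shorthand for the two deviations, the inequality and its consequence are:

```latex
a_i = x_i - \bar{x}, \qquad b_i = y_i - \bar{y}

\left|\sum_{i=1}^{n} a_i b_i\right| \le \sqrt{\sum_{i=1}^{n} a_i^2}\,\sqrt{\sum_{i=1}^{n} b_i^2}
\quad\Longrightarrow\quad |r| \le 1
```

Equality holds only when one deviation vector is a scalar multiple of the other, which is the case where every point lies exactly on a straight line.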

The geometric secret: r is a cosine

After subtracting the means, the X values become one vector and the Y values become another vector. Pearson r is the cosine of the angle between those two vectors.

$$r = \cos\theta = \frac{\vec{a}\cdot\vec{b}}{|\vec{a}||\vec{b}|}$$

This is why Pearson r has no units. It is measuring directional alignment after the variables have been centered.
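
A short sketch of the cosine view, assuming the five pairs from the quick calculator example. The vectors `a` and `b` are the mean-centered X and Y values; the angle is recovered with `acos` purely for illustration.

```python
from math import sqrt, acos, degrees

x = [2, 4, 5, 6, 8]
y = [1, 3, 3, 7, 9]
n = len(x)

# Center each variable to get the two vectors.
a = [xi - sum(x) / n for xi in x]
b = [yi - sum(y) / n for yi in y]

dot = sum(ai * bi for ai, bi in zip(a, b))
norm_a = sqrt(sum(ai ** 2 for ai in a))
norm_b = sqrt(sum(bi ** 2 for bi in b))

cos_theta = dot / (norm_a * norm_b)  # identical to Pearson r
angle = degrees(acos(cos_theta))     # angle between the vectors
print(round(cos_theta, 3))  # 0.953
print(angle)                # a small angle, between 17 and 18 degrees
```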

0 degrees, r = 1: the vectors point in the same direction (perfect positive correlation).

90 degrees, r = 0: the vectors are perpendicular (no linear alignment).

180 degrees, r = -1: the vectors point in opposite directions (perfect negative correlation).

Step-by-step calculation with real numbers

Here is a temperature and ice-cream-sales example. The important part is not the subject; it is how each row contributes to the numerator and denominator.

Means: xbar = 19.067, ybar = 416
Numerator: 2750.6
Denominator: 2817.002
Result: r = 0.976

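| i | temperature | sales | x - xbar | y - ybar | product | (x - xbar)^2 | (y - ybar)^2 |
|---|---|---|---|---|---|---|---|
| 1 | 14.2 | 215 | -4.867 | -201 | 978.2 | 23.684 | 40401 |
| 2 | 16.4 | 325 | -2.667 | -91 | 242.667 | 7.111 | 8281 |
| 3 | 22.1 | 522 | 3.033 | 106 | 321.533 | 9.201 | 11236 |
| 4 | 19.4 | 412 | 0.333 | -4 | -1.333 | 0.111 | 16 |
| 5 | 25.1 | 614 | 6.033 | 198 | 1194.6 | 36.401 | 39204 |
| 6 | 17.2 | 408 | -1.867 | -8 | 14.933 | 3.484 | 64 |
| Sum | 114.4 | 2496 | | | 2750.6 | 79.993 | 99202 |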

Conclusion: temperature and ice-cream sales have a very strong positive linear correlation in this example.
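
The same numbers can be reproduced with a short script. This sketch uses the six temperature and sales pairs from the table and the conceptual form; the variable names are illustrative.

```python
from math import sqrt

temperature = [14.2, 16.4, 22.1, 19.4, 25.1, 17.2]
sales = [215, 325, 522, 412, 614, 408]
n = len(temperature)

x_bar = sum(temperature) / n
y_bar = sum(sales) / n

# Each row contributes one cross-product and two squared deviations.
numerator = sum((t - x_bar) * (s - y_bar) for t, s in zip(temperature, sales))
ss_x = sum((t - x_bar) ** 2 for t in temperature)
ss_y = sum((s - y_bar) ** 2 for s in sales)
denominator = sqrt(ss_x * ss_y)

r = numerator / denominator
print(round(r, 3))  # 0.976
```

Rows 1 and 5 dominate the numerator because they sit farthest from both means, which is visible in the product column of the table.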

Why all three formulas give the same result

The conceptual form, computational form, and standardized-score form are algebraically equivalent. The first explains the idea; the second is convenient for arithmetic; the third reveals the z-score interpretation.

Show the algebra from conceptual form to computational form

Expanding the numerator turns the deviation expression into raw sums:

$$\sum(x_i-\bar{x})(y_i-\bar{y}) = \sum x_i y_i - n\bar{x}\bar{y} = \sum x_i y_i - \frac{\sum x_i\sum y_i}{n}$$

Multiplying the numerator and denominator by n gives the computational formula. The denominator follows the same expansion for X and Y separately.
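
For completeness, the denominator step for X can be written as follows (the Y term is identical with y in place of x):

```latex
\sum(x_i-\bar{x})^2 = \sum x_i^2 - n\bar{x}^2 = \sum x_i^2 - \frac{(\sum x_i)^2}{n}

n\sqrt{\sum(x_i-\bar{x})^2}\sqrt{\sum(y_i-\bar{y})^2}
  = \sqrt{\left[n\sum x_i^2-(\sum x_i)^2\right]\left[n\sum y_i^2-(\sum y_i)^2\right]}
```

The factor n outside the square roots splits into one n for each sum of squares, which is why both bracketed terms in the computational form carry an n.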

Sample vs. population correlation coefficient

In practice, you almost always calculate sample correlation r and use it to learn about the population correlation ρ.

| Question | Sample r | Population rho |
|---|---|---|
| Symbol | r | rho |
| Mean notation | xbar, ybar | mu_x, mu_y |
| Standard deviation | s_x, s_y using n - 1 | sigma_x, sigma_y using N |
| Use case | Most research samples | Full population data |
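
One practical consequence is worth checking: for a fixed set of numbers, the n - 1 versus N choice does not change the computed coefficient, because the divisor cancels as long as it is used consistently in both the covariance and the standard deviations. The sketch below assumes the five pairs from the quick calculator example and uses `stdev` and `pstdev` from Python's `statistics` module.

```python
from statistics import stdev, pstdev

x = [2, 4, 5, 6, 8]
y = [1, 3, 3, 7, 9]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
cross = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Sample convention: divide by n - 1 everywhere.
r_sample = (cross / (n - 1)) / (stdev(x) * stdev(y))
# Population convention: divide by N everywhere.
r_population_style = (cross / n) / (pstdev(x) * pstdev(y))

print(round(r_sample, 3), round(r_population_style, 3))  # 0.953 0.953
```

The distinction in the table is therefore about what the number represents (an estimate versus the true value), not about a different arithmetic result.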

From r to r squared

Squaring r gives the coefficient of determination. If r = 0.80, then r squared = 0.64, meaning 64% of the variance in Y is explained by X in this simple linear relationship.
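
The "variance explained" reading can be verified by fitting a least-squares line and comparing 1 - (residual sum of squares / total sum of squares) with r squared. This is a sketch on the five pairs from the quick calculator example, not on the r = 0.80 illustration; all names are illustrative.

```python
x = [2, 4, 5, 6, 8]
y = [1, 3, 3, 7, 9]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)  # total sum of squares for Y

# Least-squares line y = intercept + slope * x.
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Variation in Y left over after the line is fitted.
residual_ss = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
explained = 1 - residual_ss / s_yy

r = s_xy / (s_xx * s_yy) ** 0.5
print(round(r ** 2, 3), round(explained, 3))  # 0.907 0.907
```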

[Chart: r = 0.80 → r² = 0.64; 64% explained by X, the remainder by other factors]

Five mistakes people make when calculating r

Mistake 1: Multiplying raw values instead of deviations

Pearson r is built from values after subtracting their means. Raw products mostly reflect scale, not association.
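
A small sketch of why raw products mislead: adding a constant to every y value (a hypothetical shift of 100 here) changes the raw product sum dramatically but leaves the deviation product sum untouched. The data are the five pairs from the quick calculator example.

```python
x = [2, 4, 5, 6, 8]
y = [1, 3, 3, 7, 9]
y_shifted = [v + 100 for v in y]  # same pattern, different baseline

def raw_product_sum(xs, ys):
    """Sum of x*y without centering."""
    return sum(a * b for a, b in zip(xs, ys))

def deviation_product_sum(xs, ys):
    """Sum of (x - xbar)*(y - ybar), the numerator of r."""
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))

print(raw_product_sum(x, y), raw_product_sum(x, y_shifted))  # 143 2643
print(round(deviation_product_sum(x, y), 3),
      round(deviation_product_sum(x, y_shifted), 3))         # 28.0 28.0
```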

Mistake 2: Using Pearson for a curved pattern

Pearson measures linear association. A strong U-shape can produce r near zero even when the relationship is obvious.
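
A minimal illustration, using made-up data where y is exactly x squared on a symmetric range. The relationship is perfectly deterministic, yet the positive and negative cross-products cancel.

```python
from math import sqrt

# Hypothetical U-shaped data: y depends entirely on x, but not linearly.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [v ** 2 for v in x]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
num = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
den = sqrt(sum((a - x_bar) ** 2 for a in x) * sum((b - y_bar) ** 2 for b in y))

r = num / den
print(round(r, 3))  # 0.0
```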

Mistake 3: Ignoring outliers

A single extreme point can pull r from weak to strong. Always inspect the scatter plot before trusting the number.
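
The effect is easy to demonstrate with invented numbers. In this sketch, five points with essentially no linear relationship are joined by one extreme point at (20, 20); the data are hypothetical and chosen only to make the contrast obvious.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson r from the deviation form."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    num = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    den = sqrt(sum((a - x_bar) ** 2 for a in xs) * sum((b - y_bar) ** 2 for b in ys))
    return num / den

# Five points whose cross-products cancel out.
x = [1, 2, 3, 4, 5]
y = [3, 1, 4, 1, 3]
r_without = pearson_r(x, y)  # essentially zero

# Add a single extreme point far from the rest.
r_with = pearson_r(x + [20], y + [20])  # well above 0.9

print(abs(r_without) < 1e-9, r_with > 0.9)  # True True
```

The outlier contributes almost all of the numerator and both sums of squares, so r ends up describing that one point rather than the other five.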

Mistake 4: Confusing sample and population notation

Lowercase r is a sample statistic. Greek rho is the population parameter you usually cannot observe directly.

Mistake 5: Treating r as causation

A high r does not prove X causes Y. Confounders, selection effects, and reverse direction can all create correlation.

FAQ

What is the correlation coefficient formula?

The Pearson correlation coefficient formula is r = sum((xi - xbar)(yi - ybar)) divided by the product of the square roots of sum((xi - xbar)^2) and sum((yi - ybar)^2). It measures linear relationship strength from -1 to +1.

What does each part of the formula mean?

The numerator measures how X and Y deviate from their means together. The denominator scales that co-movement by the spread of X and the spread of Y, forcing r onto a unitless -1 to +1 scale.

Is there a simpler formula for hand calculation?

Yes. The computational form uses n, sums of x, sums of y, sums of xy, and sums of squares. It gives the same answer but avoids writing every deviation first.

What is the difference between r and rho?

Lowercase r is the sample correlation coefficient calculated from observed data. Greek rho is the population correlation coefficient, the true value for the entire population.

Why does r always stay between -1 and +1?

Algebraically, this follows from the Cauchy-Schwarz inequality. Geometrically, Pearson r equals the cosine of the angle between two mean-centered vectors, and cosine values always fall between -1 and +1.