
What Is Correlation? A Complete Guide

Learn what correlation means, how statisticians measure it, when to use Pearson, Spearman, Kendall, or Point-Biserial coefficients, and how to avoid the classic correlation-versus-causation mistake. This page is designed as the central guide for the entire site, so you can move from concepts to the right calculator without losing context.

15 min read
Last updated March 26, 2026

The Short Answer

Correlation is a statistical measure that describes how two variables move in relation to each other. When one variable increases and the other tends to increase as well, they have a positive correlation. When one increases while the other decreases, they have a negative correlation. Correlation is measured by a coefficient ranging from -1 for perfect negative to +1 for perfect positive, with 0 indicating no linear relationship.

Formal Definition of Correlation

\text{Correlation}(X, Y) = \frac{\text{Cov}(X, Y)}{\sigma_X \, \sigma_Y}
\text{where } \text{Cov}(X, Y) = \frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})

In formal statistics, correlation is a standardized measure of how strongly two variables co-vary. Covariance tells you whether deviations from one mean tend to occur alongside deviations from another mean, but covariance still depends on the original units of measurement. Correlation removes those units by dividing covariance by the product of both standard deviations. That is why a correlation coefficient is dimensionless: it measures relationship strength on a common scale regardless of whether the original variables are in dollars, kilograms, minutes, or survey points.
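To make the standardization concrete, here is a minimal from-scratch sketch in Python (the sample data are invented for illustration). Dividing the covariance by both standard deviations cancels the units, which is why rescaling a variable leaves the coefficient unchanged:

```python
from math import sqrt

def correlation(xs, ys):
    """Pearson correlation: covariance standardized by both standard deviations."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Covariance: average product of paired deviations from the means
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    # Population standard deviations (same 1/n convention as the covariance)
    sx = sqrt(sum((x - mx) ** 2 for x in xs) / n)
    sy = sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (sx * sy)

# Rescaling a variable changes its covariance but not its correlation:
hours = [1, 2, 3, 4, 5]            # hypothetical study hours
score = [52, 60, 61, 70, 75]       # hypothetical exam scores
print(correlation(hours, score))                    # close to +1
print(correlation([h * 60 for h in hours], score))  # same value: units cancel
```

Converting hours to minutes multiplies the covariance by 60 but leaves r untouched, which is exactly the dimensionless property described above.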

Historically, Francis Galton pioneered the study of such patterns, most famously the relationship between parent and child height, while Karl Pearson later turned that intuition into a general mathematical coefficient. Pearson's formulation focuses on linear co-variation: if one variable rises above its mean when the other does, the covariance and correlation become positive. If one tends to rise while the other falls below its mean, the coefficient moves negative.

One subtle but important distinction is linear versus monotonic association. Pearson correlation measures how well points align around a straight line. Rank-based statistics such as Spearman's rho and Kendall's tau instead measure whether variables move consistently in the same order, even when the curve is not straight. Correlation is also symmetric: r(X, Y) = r(Y, X). Swapping X and Y does not change the coefficient, because correlation describes mutual association rather than directional prediction.

The Correlation Coefficient: Measuring Relationship Strength

The correlation coefficient, usually written as r for Pearson correlation, always falls between -1 and +1. A value of +1 means perfect positive correlation: every point lies exactly on an upward-sloping straight line. A value of -1 means perfect negative correlation: every point lies exactly on a downward-sloping line. A value of 0 means there is no linear relationship. That does not prove the variables are unrelated, only that a straight-line summary does not fit the pattern.

The squared coefficient, r^2, is often called the coefficient of determination in simple linear settings. It tells you how much of the variance in one variable is explained by the linear pattern with the other. For example, r = 0.70 implies r^2 = 0.49, so about 49 percent of the variance is explained by the relationship. That makes r^2 a better measure of predictive usefulness than raw r alone.
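A quick way to internalize the square relationship, using purely illustrative r values:

```python
# Illustrative r values only: strength comparisons should use r^2, not raw r
for r in (0.30, 0.50, 0.70, 0.90):
    print(f"r = {r:.2f} -> r^2 = {r * r:.2f} ({r * r:.0%} of variance shared)")
```

Notice how slowly explained variance grows at the low end: r = 0.30 shares only 9 percent of variance, while r = 0.90 shares 81 percent.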

The Correlation Scale
r = -1: Perfect negative correlation. Every point falls on a descending line.
r = 0: No linear relationship. The points form a scattered cloud.
r = +1: Perfect positive correlation. Every point falls on an ascending line.

Types of Correlation

Positive Correlation

A positive correlation means the variables move in the same direction. As one variable increases, the other tends to increase too. Positive correlations range from 0 up to +1. The closer the value is to +1, the tighter the upward trend.

Real examples are everywhere: study time and exam score often correlate around r ≈ 0.70, height and weight in adults are often near r ≈ 0.65, and advertising spend versus sales can reach r ≈ 0.80 in stable markets. In a scatter plot, these points usually run from the lower left toward the upper right.

Scatter plot: positive correlation

Negative Correlation

A negative correlation means the variables move in opposite directions. When one variable increases, the other tends to decrease. Negative correlations range from -1 up to 0. The closer the value is to -1, the more tightly the points follow a downward-sloping line.

Common examples include absences versus final grade at roughly r ≈ -0.65, temperature versus hot chocolate sales at r ≈ -0.75, and exercise level versus body-fat percentage around r ≈ -0.60. In a scatter plot, the pattern runs from upper left to lower right.

Scatter plot: negative correlation

Zero Correlation

Zero correlation means there is no linear relationship, so the coefficient sits near 0. A familiar example is shoe size versus IQ, where r ≈ 0 in adult samples. The scatter plot looks like a cloud without a clear direction.

The important caution is that r ≈ 0 does not mean the variables are completely unrelated. It only means a straight line does not capture the association. A U-shaped or inverted-U relationship can still be strong while Pearson correlation stays near zero.

Scatter plot: near-zero correlation

How to Interpret Correlation Strength

Absolute r range | Strength | Meaning | Typical context
0.90 to 1.00 | Very strong | One variable almost predicts the other | Precision measurement
0.70 to 0.89 | Strong | Clear linear trend with limited scatter | Psychological scales, economics
0.50 to 0.69 | Moderate | Visible trend but substantial noise | Social science
0.30 to 0.49 | Weak | Trend exists but is not obvious | Epidemiology
0.10 to 0.29 | Very weak | Needs larger samples to detect reliably | Subtle behavioral effects
0.00 to 0.09 | Negligible | Practically no linear relationship | -

Correlation strength is always contextual. In physics or engineering, an r = 0.90 relationship may still feel disappointingly noisy. In psychology or education, r = 0.40 can be remarkably strong because human behavior contains so many uncontrolled influences. That is why generic cutoffs should be treated as a starting point rather than a final answer.

Sample size also changes how you read correlation. With a sample of only 10 observations, you need roughly r = 0.63 to reach the conventional 5 percent significance level. With 100 observations, a much smaller r = 0.20 can become statistically significant. Significance, however, is not the same thing as importance.
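You can check those thresholds yourself by inverting the t-test relationship between r and sample size. This is a sketch: the two t values below are standard two-tailed table entries at alpha = 0.05, hardcoded rather than looked up from a distribution.

```python
from math import sqrt

def critical_r(t_crit, n):
    """Smallest |r| reaching significance, from t = r*sqrt(n-2)/sqrt(1-r^2)."""
    df = n - 2
    return t_crit / sqrt(t_crit ** 2 + df)

# Two-tailed t critical values at alpha = 0.05 (standard table entries)
print(critical_r(2.306, 10))    # about 0.632 for n = 10
print(critical_r(1.984, 100))   # about 0.197 for n = 100
```

The same formula explains why huge datasets make tiny correlations significant: as n grows, the critical r shrinks toward zero.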

Another common mistake is to treat correlation as a linear strength scale. It is not. An r = 0.60 relationship is not just twice as strong as r = 0.30. Compare r^2 instead: 0.36 versus 0.09. That means the stronger association explains four times as much variance.

r^2 = 0.70^2 = 0.49 \Rightarrow \text{X explains 49\% of Y's variance}

Real-World Examples of Correlation

Strong Positive Correlation Examples

Variable X | Variable Y | Estimated r | Notes
Study time (hours) | Exam score | ~0.70 | A classic education pattern: more practice usually tracks better results.
Parents' average height | Child height | ~0.65 | Galton's original heredity work used this kind of association.
City population | GDP total | ~0.85 | Larger cities usually generate larger total economic output.
Cigarette consumption | Lung cancer incidence | ~0.90 | A famous public-health association before causality was firmly established.

Strong Negative Correlation Examples

Variable X | Variable Y | Estimated r | Notes
Unemployment rate | Consumer confidence | ~-0.75 | When job insecurity rises, sentiment usually weakens.
Exercise frequency | Resting heart rate | ~-0.65 | Better conditioning is generally linked to lower resting pulse.
Product price | Sales volume | ~-0.70 | Demand often falls as prices rise.
Sleep duration | Anxiety score | ~-0.55 | Shorter sleep is often linked to worse mental-health scores.

Surprising Near-Zero Correlations

Some of the most memorable examples of correlation are the ones that fail to appear. Shoe size versus IQ in adults is usually near zero once age is not confounding the sample. Birth month versus lifetime income is also extremely small in most datasets, despite recurring folklore around lucky months and hidden advantages.

Other examples are culturally persistent precisely because they feel plausible. Moon phase versus crime rate usually shows a near-zero relationship once enough data is collected, even though folklore says full moons trigger unusual behavior. Zodiac sign versus personality traits also remains essentially uncorrelated in controlled studies. These examples are useful reminders that intuitive stories often outrun real data.

Correlation vs Causation

Correlation does not imply causation. Just because two variables are correlated does not mean one causes the other. This is the single most important warning in introductory statistics because the human mind naturally turns patterns into stories. Once we see two things move together, we immediately want to explain why. Statistics asks you to slow down and separate association from mechanism.

When two variables correlate, at least three broad explanations are possible. First, X may really cause Y. Smoking and lung cancer became the textbook example because the association was strong, persistent, time-ordered, biologically plausible, and supported by multiple lines of evidence. Second, the direction may run the other way: Y could influence X. Depression and social isolation illustrate this challenge, because each can plausibly worsen the other. Third, a third variable Z may drive both X and Y. Hot weather creates a classic confounding pattern: it raises ice cream sales and also raises swimming activity, which can increase drownings. Ice cream does not cause drowning, even if the two series correlate.

This is why spurious correlations are so instructive. Tyler Vigen popularized absurd but real examples such as Nicolas Cage film appearances versus pool drownings, per-capita cheese consumption versus deaths caused by bedsheet entanglement, and Maine divorce rates versus margarine consumption. Those correlations are statistically real in the recorded series, but they do not reveal a meaningful causal mechanism. They reveal that enough time-series data can generate weird alignment by chance, seasonal overlap, or shared background trends.

To establish causation, statisticians look for stronger evidence than association alone. Randomized controlled trials are the gold standard because randomization helps break the link between treatment and hidden confounders. Even outside experiments, analysts want clear time ordering, plausible mechanisms, replication, and explicit attempts to control alternative explanations. If X supposedly causes Y, X must happen before Y. If a mechanism is impossible or incoherent, the causal story weakens. If the effect disappears after adjusting for a third variable, confounding was probably doing the real work.

Correlation is still extremely useful. It is excellent for prediction, pattern detection, exploratory analysis, and generating hypotheses worth testing. It is not enough on its own for clinical decisions, policy claims, or strong intervention advice. The right practical habit is simple: use correlation to discover questions, not to pretend you have already answered them.

Why two variables correlate:
Case A: Direct causation (X -> Y), e.g. smoking -> lung cancer.
Case B: Reverse causation (Y -> X or X <-> Y), e.g. isolation <-> depression.
Case C: Common cause / confounding (Z -> X and Z -> Y), e.g. hot weather drives both ice cream sales and drownings.

Which Correlation Coefficient Should You Use?

Choosing the right coefficient matters because different statistics answer slightly different questions. Pearson focuses on linear association in continuous data. Spearman and Kendall work on ranked or ordered information and emphasize monotonic relationships. Point-Biserial handles the special case where one variable is binary and the other is continuous. If you are unsure, start with the decision guide below and then open the dedicated calculator for the method that matches your data best.

Which Correlation Method Should I Use?
Is one variable binary (0 or 1)?
Yes -> Point-Biserial
Are both variables continuous?
No -> Spearman or Kendall
Is the relationship roughly linear and the data approximately normal?
Yes -> Pearson
Is the sample very small or full of ties?
Yes -> Kendall, otherwise Spearman
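The decision guide above can be mirrored as a small helper function. Treat this as a rough heuristic sketch, not a strict statistical rule:

```python
def choose_method(binary: bool, continuous: bool,
                  linear_and_normal: bool, small_or_tied: bool) -> str:
    """Rough mirror of the decision guide: pick a correlation method."""
    if binary:
        return "Point-Biserial"          # one variable coded 0/1
    if not continuous or not linear_and_normal:
        # Rank-based methods; Kendall behaves better with ties or tiny n
        return "Kendall" if small_or_tied else "Spearman"
    return "Pearson"                     # continuous, linear, roughly normal

print(choose_method(binary=False, continuous=True,
                    linear_and_normal=True, small_or_tied=False))  # Pearson
```

Real data can satisfy several branches at once, so the scatter plot and the assumptions discussion below should always override a mechanical rule like this.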

Pearson r

Pearson r is the default choice when both variables are continuous and the relationship is roughly linear. It is the most widely reported coefficient in textbooks and software. Its main strength is interpretability and speed, but it is sensitive to outliers and can be misleading when the pattern is curved or the variables are ordinal.

Pearson Correlation Calculator
For continuous variables with approximately normal distributions and a linear pattern.
r, r squared, p-value, confidence interval, scatter plot, and worked steps
Try calculator

Spearman rho

Spearman rho is a non-parametric, rank-based coefficient. It is useful for ordinal scales, skewed numeric variables, and data where monotonic ordering matters more than exact distances. If your points form an increasing curve instead of a straight line, Spearman is often more informative than Pearson.

Spearman Correlation Calculator
For ordinal, skewed, rank-based, or monotonic data that does not fit Pearson well.
rho, p-value, tie handling, rank-based steps, and monotonic interpretation
Try calculator

Kendall tau

Kendall tau is also rank-based, but it works through concordant and discordant pairs rather than ranked distances. It tends to behave especially well in smaller samples and in data with many ties. Its probability-style interpretation is also more intuitive for many users: it can be understood in terms of pairwise agreement.

Kendall Tau Calculator
For smaller samples, tie-heavy rankings, and probability-style interpretation.
tau-b, concordant and discordant counts, p-value, and pairwise steps
Try calculator
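A minimal sketch of the pairwise logic, computing tau-a (the simple version without tie correction) on two hypothetical judges' rankings:

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """Tau-a: (concordant - discordant) / total pairs; no tie correction."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1   # the pair is ordered the same way in x and y
        elif s < 0:
            discordant += 1   # the pair is ordered oppositely
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

# Two judges ranking five entries (hypothetical data)
judge_a = [1, 2, 3, 4, 5]
judge_b = [1, 3, 2, 4, 5]
print(kendall_tau_a(judge_a, judge_b))  # 0.8: 9 of 10 pairs agree
```

This is the probability-style reading in action: tau is the share of agreeing pairs minus the share of disagreeing pairs. Production code should use tau-b (as in scipy.stats.kendalltau), which corrects for ties.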

Point-Biserial rpb

Point-Biserial correlation is the right choice when one variable is truly binary and coded as 0 or 1, while the other is continuous. A classic example is correct versus incorrect on a test item against total score. It is mathematically equivalent to Pearson correlation in that special case and closely linked to the independent-samples t-test.

Point-Biserial Correlation Calculator
For one binary variable and one continuous outcome, including t-test equivalence.
rpb, eta squared, Cohen's d, group means, and equivalent t-test output
Try calculator
Criterion | Pearson r | Spearman rho | Kendall tau | Point-Biserial
Data type | Continuous | Ordinal or continuous | Ordinal or continuous | Binary plus continuous
Normality assumption | Helpful | Not required | Not required | Not required
Small samples | Reasonable | Reasonable | Best | Reasonable
Outliers | Sensitive | More robust | Most robust | Depends on group structure
Calculator | Use Pearson | Use Spearman | Use Kendall | Use Point-Biserial

How to Calculate Correlation

By hand, the general logic is always the same. First, line up paired observations. Second, compute each variable's mean. Third, measure how far each observation falls above or below that mean. Fourth, combine those deviations in a standardized way to see whether the two variables tend to move together or in opposite directions. Pearson uses raw deviations, Spearman uses ranked values, and Kendall uses pairwise ordering.

In modern practice, however, almost nobody computes real research correlations entirely by hand. The useful skill is knowing which method to choose and how to interpret the result. Software should do the arithmetic so you can focus on assumptions, outliers, sample size, and meaning. If you need a fast route, the calculators on this site will handle the coefficient, p-value, confidence interval, and explanation automatically.

# Excel
=CORREL(A2:A20, B2:B20)

# R
cor(x, y, method = "pearson")   # or "spearman", "kendall"
cor.test(x, y)                  # with p-value

# Python (pandas)
df["X"].corr(df["Y"])           # Pearson default
df["X"].corr(df["Y"], method="spearman")

# Python (scipy)
from scipy import stats
stats.pearsonr(x, y)            # returns (r, p-value)
stats.spearmanr(x, y)
stats.kendalltau(x, y)

# SPSS
Analyze -> Correlate -> Bivariate

Correlation in Different Fields

Correlation is one of the rare statistical ideas that appears almost everywhere. In psychology, researchers use it to compare traits, symptoms, scales, and test scores. In medicine, it links biomarkers to severity, exposure to outcomes, and treatment indicators to recovery scores. In finance, it helps investors think about diversification by showing whether assets tend to rise and fall together.

Education uses correlation for both classroom questions and measurement theory. Study habits, attendance, and grades can all be analyzed with Pearson or Spearman, while item discrimination often relies on Point-Biserial correlation. In machine learning, correlation matrices are used early in feature selection to find duplicated or near-duplicated predictors. Climate science uses correlation to compare long-running time series such as temperature, rainfall, sea ice, and greenhouse-gas concentration.

The point is not that every field uses the same cutoff or method. The point is that correlation is a general language for asking whether two measurements move together strongly enough to be informative. Once you understand the concept, the domain-specific details become a matter of assumptions and context rather than brand-new theory.

Field | Typical application | Common method
Psychology | Scale construction, personality-trait association, reliability checks | Pearson or Spearman
Medicine | Biomarkers versus disease severity, treatment indicators versus outcomes | Pearson or Point-Biserial
Finance | Asset-return relationships and portfolio diversification | Pearson
Education | Item discrimination, study habits versus grades, attendance versus scores | Point-Biserial or Pearson
Machine Learning | Feature selection and multicollinearity screening | Pearson matrix
Climate Science | Temperature, precipitation, sea ice, and CO2 associations | Pearson or Spearman

Common Misconceptions About Correlation

Misconception 1
Myth: r = 0 means the variables are completely unrelated.
Correct view: A zero Pearson correlation only means there is no linear relationship. A curved pattern such as y = x squared can produce r near zero even when the variables are strongly linked.
Misconception 2
Myth: r = 0.6 is twice as strong as r = 0.3.
Correct view: Correlation strength is not linear. Compare r squared instead: 0.36 versus 0.09. That means r = 0.6 explains four times as much variance as r = 0.3.
Misconception 3
Myth: A strong correlation proves causation.
Correct view: Correlation alone cannot distinguish direct causation, reverse causation, or confounding. Causal claims need design and evidence beyond association.
Misconception 4
Myth: A statistically significant correlation must be important.
Correct view: P-values tell you whether the effect is unlikely under a null model, not whether the effect is large or practically useful. Tiny correlations become significant in very large samples.
Misconception 5
Myth: Negative correlation is worse than positive correlation.
Correct view: The sign only tells you direction. A strong negative relationship can be just as informative and useful as a strong positive one.
Misconception 6
Myth: Higher r always means better prediction.
Correct view: Prediction quality depends on r squared, model form, sample stability, and error structure, not just the raw coefficient.
Misconception 7
Myth: Pearson r works for every dataset.
Correct view: Pearson assumes a linear pattern and can be distorted by outliers, skew, or ordinal scales. In those cases Spearman or Kendall is often a better choice.
Misconception 8
Myth: The overall correlation always matches the subgroup correlations.
Correct view: Simpson's paradox shows that pooled data can reverse the apparent direction seen inside subgroups, so subgroup structure always deserves inspection.

Frequently Asked Questions

What is correlation in simple terms?

Correlation tells you whether two things tend to change together. If taller people tend to weigh more, height and weight are positively correlated. If people who sleep less tend to feel more tired, sleep and fatigue are negatively correlated. It is measured on a scale from -1 to +1, where plus or minus 1 is a perfect relationship and 0 means no linear relationship.

What is a correlation coefficient?

A correlation coefficient is a number between -1 and +1 that measures the strength and direction of the relationship between two variables. Pearson r measures linear association, while Spearman rho and Kendall tau measure monotonic association using ranks. The closer the value is to plus or minus 1, the stronger the relationship.

What is the difference between positive and negative correlation?

In a positive correlation, both variables move in the same direction, so when one increases the other tends to increase too. In a negative correlation, they move in opposite directions, so when one rises the other tends to fall. The sign of the correlation coefficient tells you which direction the relationship runs.

What does a correlation of 0.5 mean?

A correlation of 0.5 indicates a moderate positive relationship. It means r squared equals 0.25, so the first variable explains 25 percent of the variance in the second. Whether 0.5 is considered strong depends on the field. In psychology it may be impressive, while in physics it may be disappointing.

Does correlation mean causation?

No. Correlation means two variables tend to change together, but the pattern could come from direct causation, reverse causation, or a third variable that influences both. Establishing causation requires stronger evidence such as randomization, time ordering, mechanism, and control of confounding variables.

What is a strong correlation?

A common rule of thumb is that absolute values above 0.70 are strong, values from about 0.30 to 0.70 are moderate, and values below 0.30 are weak. Those cutoffs vary by discipline. A correlation of 0.50 may be strong in social science but too weak for engineering calibration.

What is the difference between correlation and covariance?

Covariance measures how two variables change together, but its value depends on the variables' original units, which makes it hard to compare across studies. Correlation is a standardized form of covariance. By dividing by both standard deviations, it becomes unit-free and always falls between -1 and +1.

Can correlation be greater than 1 or less than -1?

No. Pearson, Spearman, and Kendall correlation coefficients are mathematically bounded between -1 and +1. If you calculate a number outside that range, there is an error in the computation or the formula being used. A less common measure such as biserial correlation can behave differently, but standard correlations cannot.

How do you test if a correlation is statistically significant?

For Pearson-style tests, convert the correlation r to a t statistic using t = r·√(n − 2) / √(1 − r²), then read the two-tailed p-value with n − 2 degrees of freedom. If p is below 0.05, the correlation is usually called statistically significant.
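As a sketch, the conversion from r to t is a one-liner; the critical value quoted in the comment is a standard t-table entry, and the sample values here are invented for illustration:

```python
from math import sqrt

def t_from_r(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2)

# Example: r = 0.50 from n = 30 paired observations (hypothetical numbers)
t = t_from_r(0.50, 30)
print(round(t, 2))  # 3.06
# Two-tailed critical value for df = 28 at alpha = 0.05 is about 2.048,
# so this correlation would be called statistically significant.
```

For the exact p-value, scipy's stats.pearsonr returns the coefficient and p-value together, as shown in the code section earlier on this page.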

What is the difference between correlation and regression?

Correlation measures association symmetrically, so corr(X,Y) equals corr(Y,X). Regression goes further by modeling how one variable predicts another, producing a slope and intercept. Correlation is mainly descriptive, while regression is directional and used for estimation and prediction.