
Point-Biserial Correlation Calculator

Use this point-biserial correlation calculator to measure the relationship between a binary variable coded 0 or 1 and a continuous outcome. Enter paired values, upload a spreadsheet, or paste group data from Excel to instantly calculate rpb, η², Cohen's d, p-value, confidence intervals, and the mathematically equivalent independent-samples t-test result.

Validates binary variable automatically
Shows group mean comparison
Equivalent t-test result included
Free forever and no sign-up
Live update window: 300 ms
Manual row limit: 500
Output package: r, p, CI
Active method: Pearson r

Best for continuous variables with a linear relationship.

$$r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \cdot \sum_{i=1}^{n}(y_i - \bar{y})^2}}$$
Interactive Scatter Plot: Y Variable plotted against X Variable.
Correlation Meter: r = 0.9991 on a scale from -1.0 to +1.0
Strength badge: Very strong
Data health check
Sample size: Good
8 valid pairs give a stable first-pass estimate.
Distribution shape: Good
Neither variable shows strong skewness from a quick sample-skewness check.
Linearity check: Info
Pearson and Spearman are close, which supports a mostly linear trend.
Residual Plot: residuals plotted against X Variable.
r: 0.9991
r²: 0.9981
p-value: 2.13e-9
t: 56.1931
df: 6
95% CI: 0.995 to 1.000
99% CI: 0.991 to 1.000
Automatic interpretation
This dataset shows a very strong positive linear relationship. It is statistically significant at the 0.05 level.

Pearson r = 0.9991 based on 8 valid pairs, p = 2.13e-9.

Your result
r = 0.9991 (Very strong)
Statistically significant at p < 0.05
r² = 0.9981 so X explains 99.81% of Y variance.
Sample size n=8 is small, so treat the confidence interval with caution.
Step-by-step
How the calculator got this result
Step 1: Compute the means

Average the X values and the Y values before measuring joint movement.

x̄ = (1.0000 + 2.0000 + 3.0000 + 4.0000 + 5.0000 + 6.0000 + 7.0000 + 8.0000) / 8 = 36.0000 / 8 = 4.5000
ȳ = (52.0000 + 57.0000 + 62.0000 + 67.0000 + 72.0000 + 77.0000 + 83.0000 + 86.0000) / 8 = 556.0000 / 8 = 69.5000
Step 2: Measure paired deviations

Subtract the mean from every X and Y value to get centered deviations.

#1: dx = 1.0000 - 4.5000 = -3.5000, dy = 52.0000 - 69.5000 = -17.5000
#2: dx = 2.0000 - 4.5000 = -2.5000, dy = 57.0000 - 69.5000 = -12.5000
#3: dx = 3.0000 - 4.5000 = -1.5000, dy = 62.0000 - 69.5000 = -7.5000
#4: dx = 4.0000 - 4.5000 = -0.5000, dy = 67.0000 - 69.5000 = -2.5000
#5: dx = 5.0000 - 4.5000 = 0.5000, dy = 72.0000 - 69.5000 = 2.5000
#6: dx = 6.0000 - 4.5000 = 1.5000, dy = 77.0000 - 69.5000 = 7.5000
#7: dx = 7.0000 - 4.5000 = 2.5000, dy = 83.0000 - 69.5000 = 13.5000
#8: dx = 8.0000 - 4.5000 = 3.5000, dy = 86.0000 - 69.5000 = 16.5000
Step 3: Sum the covariance numerator

Multiply each pair of deviations and add them up.

#1: (-3.5000) × (-17.5000) = 61.2500
#2: (-2.5000) × (-12.5000) = 31.2500
#3: (-1.5000) × (-7.5000) = 11.2500
#4: (-0.5000) × (-2.5000) = 1.2500
#5: (0.5000) × (2.5000) = 1.2500
#6: (1.5000) × (7.5000) = 11.2500
#7: (2.5000) × (13.5000) = 33.7500
#8: (3.5000) × (16.5000) = 57.7500
Σ(xᵢ - x̄)(yᵢ - ȳ) = 209.0000
Step 4: Sum the squared deviations

Compute the denominator from the independent spread of X and Y.

#1: dx² = 12.2500, dy² = 306.2500
#2: dx² = 6.2500, dy² = 156.2500
#3: dx² = 2.2500, dy² = 56.2500
#4: dx² = 0.2500, dy² = 6.2500
#5: dx² = 0.2500, dy² = 6.2500
#6: dx² = 2.2500, dy² = 56.2500
#7: dx² = 6.2500, dy² = 182.2500
#8: dx² = 12.2500, dy² = 272.2500
Σ(xᵢ - x̄)² = 42.0000
Σ(yᵢ - ȳ)² = 1042.0000
Step 5: Divide numerator by denominator

The covariance term is normalized by both standard-deviation components.

r = 209.0000 / √(42.0000 × 1042.0000)
r = 0.9991
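
The five steps above can be reproduced in a few lines of Python. This is a minimal sketch using the eight pairs from the worked example. The function name `pearson_r` is chosen here for illustration and is not part of the calculator. The t conversion at the end uses the same formula this page gives for the equivalent t statistic.

```python
import math

# The eight pairs from the worked example above.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [52, 57, 62, 67, 72, 77, 83, 86]

def pearson_r(xs, ys):
    """Pearson r, following the same five steps as the worked example."""
    n = len(xs)
    mean_x = sum(xs) / n                      # Step 1: means
    mean_y = sum(ys) / n
    dx = [v - mean_x for v in xs]             # Step 2: centered deviations
    dy = [v - mean_y for v in ys]
    sxy = sum(a * b for a, b in zip(dx, dy))  # Step 3: covariance numerator
    sxx = sum(a * a for a in dx)              # Step 4: squared deviations
    syy = sum(b * b for b in dy)
    return sxy / math.sqrt(sxx * syy)         # Step 5: normalize

r = pearson_r(x, y)
t = r * math.sqrt(len(x) - 2) / math.sqrt(1 - r ** 2)
print(round(r, 4), round(r ** 2, 4), round(t, 2))  # 0.9991 0.9981 56.19
```

The printed values agree with the r, r², and t figures reported in the output cards.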

How to Use This Calculator

Step 1

Enter X as a binary variable coded 0 and 1, then enter Y as a continuous numeric variable.

Step 2

Use the variable setup guide to confirm that both groups are present and that the coding is valid.

Step 3

Read rpb, η², Cohen's d, the group means, confidence intervals, and the equivalent t-test result from the output cards.

Step 4

Expand the step-by-step section to inspect the group split, group means, population standard deviation, and final t conversion.

What Is Point-Biserial Correlation?

Point-biserial correlation measures the relationship between one truly binary variable and one continuous variable. The binary variable must be naturally dichotomous, coded as 0 and 1, while the other variable can be any continuous numeric outcome such as test score, income, recovery score, or productivity. A point-biserial correlation is ideal when the research question is really about how strongly group membership predicts or tracks a continuous result.

Mathematically, point-biserial correlation is not a different family of statistic from Pearson correlation. It is simply Pearson r applied to the special case where one variable is binary. That means if X is coded 0 and 1, Pearson r and $r_{pb}$ are the same number. The interpretation is then immediate: positive values mean the group coded 1 tends to have higher Y values, negative values mean the group coded 1 tends to have lower Y values, and larger absolute values indicate stronger group separation.
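
The claim that Pearson r and the point-biserial coefficient coincide can be checked numerically. The sketch below compares a plain Pearson computation with the mean-difference formula from the formula section of this page. The data and both function names are hypothetical, chosen only for illustration.

```python
import math

# Hypothetical data: x is a 0/1 group code, y is a continuous score.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [10.0, 12.0, 11.0, 13.0, 15.0, 17.0, 16.0, 18.0]

def pearson_r(xs, ys):
    """Ordinary Pearson correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

def point_biserial(xs, ys):
    """Mean-difference form, with the population SD (n in the denominator)."""
    g1 = [b for a, b in zip(xs, ys) if a == 1]
    g0 = [b for a, b in zip(xs, ys) if a == 0]
    n = len(ys)
    my = sum(ys) / n
    s_y = math.sqrt(sum((b - my) ** 2 for b in ys) / n)
    gap = sum(g1) / len(g1) - sum(g0) / len(g0)
    return gap / s_y * math.sqrt(len(g1) * len(g0) / n ** 2)

print(pearson_r(x, y), point_biserial(x, y))  # both are about 0.9129
```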

Typical examples include item analysis in education, where correct or incorrect on a question is compared to total score, often as an item discrimination index; medical studies, where treatment or control is compared to a biomarker or recovery index; marketing, where purchase or non-purchase is compared to income; and HR analytics, where certification status is compared to performance score. One important distinction is that point-biserial correlation is for naturally dichotomous variables. If the binary variable is artificially dichotomous because a continuous variable was cut into high versus low groups, the biserial correlation is conceptually more appropriate.

Point-Biserial Correlation Formula

$$r_{pb} = \frac{\bar{Y}_1 - \bar{Y}_0}{s_Y} \cdot \sqrt{\frac{n_1 \cdot n_0}{n^2}}$$

$$r_{pb} = \frac{\bar{Y}_1 - \bar{Y}_0}{s_Y} \cdot \sqrt{p \cdot q}, \quad p = n_1/n, \quad q = n_0/n$$

$$t = \frac{r_{pb}\sqrt{n-2}}{\sqrt{1-r_{pb}^2}}, \quad df = n-2$$

$$r_{pb} = \sqrt{\frac{t^2}{t^2 + df}}$$

$$\eta^2 = r_{pb}^2, \quad d = \frac{2r_{pb}}{\sqrt{1-r_{pb}^2}}$$

| Symbol | Meaning |
| --- | --- |
| $\bar{Y}_1$ | Mean of the continuous variable in the group coded 1 |
| $\bar{Y}_0$ | Mean of the continuous variable in the group coded 0 |
| $s_Y$ | Population standard deviation of the continuous variable, using n in the denominator |
| $n_1$ | Number of observations in the group coded 1 |
| $n_0$ | Number of observations in the group coded 0 |
| $n$ | Total sample size, equal to $n_1 + n_0$ |
| $p$ | Proportion of observations in the group coded 1 |
| $q$ | Proportion of observations in the group coded 0 |

The main formula compares the mean of the continuous outcome in the group coded 1 against the mean in the group coded 0, then standardizes that gap by the population standard deviation of the continuous variable. The weighting term involving $n_1$, $n_0$, $p$, and $q$ ensures that the coefficient reflects both the mean gap and the balance between the two groups.

This calculator uses the standard form with $s_Y$ computed using $n$ in the denominator, not $n-1$. That matches the conventional point-biserial formula. Once $r_{pb}$ is known, you can convert it directly to the equivalent t statistic, recover it from a reported t test, and treat $\eta^2 = r_{pb}^2$ as an immediately readable effect size.
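
If your software reports the sample standard deviation (with $n-1$ in the denominator) instead, the same coefficient can still be recovered by adjusting the weighting term. A short derivation, where $s_{n-1}$ is notation introduced here for the sample standard deviation and is not used elsewhere on this page:

```latex
s_Y = s_{n-1}\sqrt{\frac{n-1}{n}}
\quad\Longrightarrow\quad
r_{pb} = \frac{\bar{Y}_1 - \bar{Y}_0}{s_Y}\sqrt{\frac{n_1 n_0}{n^2}}
       = \frac{\bar{Y}_1 - \bar{Y}_0}{s_{n-1}}\sqrt{\frac{n_1 n_0}{n(n-1)}}
```

Both forms give the same number, so the choice of denominator matters only if the standard deviation and the weighting term are mixed between the two versions.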

How to Calculate Point-Biserial Correlation Step by Step

Step 1

Identify Group 1 (binary = 1) and Group 0 (binary = 0), then list the continuous values in each group.

Step 2

Compute Ȳ₁ and Ȳ₀ so you can see the raw mean difference before any standardization.

Step 3

Compute the overall population standard deviation of the continuous variable using all observations.

Step 4

Apply the rpb formula with the mean gap, the standard deviation, and the group proportions.

Step 5

Convert rpb into the equivalent t-statistic and read the identical p-value you would get from an independent samples t-test.
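
The five steps can be sketched as a single Python function that returns each intermediate quantity. The function name, the dictionary keys, and the certification data are all hypothetical. The p-value is left out because it needs a t-distribution CDF, which the standard library does not provide.

```python
import math

def point_biserial_steps(binary, values):
    """Return the intermediate quantities from the five steps (a sketch)."""
    # Step 1: split the continuous values by group code.
    g1 = [v for b, v in zip(binary, values) if b == 1]
    g0 = [v for b, v in zip(binary, values) if b == 0]
    if not g1 or not g0:
        raise ValueError("both groups (0 and 1) must be present")
    n1, n0, n = len(g1), len(g0), len(values)
    # Step 2: group means.
    mean1, mean0 = sum(g1) / n1, sum(g0) / n0
    # Step 3: population SD of all values (n in the denominator).
    grand = sum(values) / n
    s_y = math.sqrt(sum((v - grand) ** 2 for v in values) / n)
    # Step 4: apply the rpb formula.
    r_pb = (mean1 - mean0) / s_y * math.sqrt(n1 * n0 / n ** 2)
    # Step 5: equivalent t statistic with df = n - 2
    # (undefined when r_pb is exactly +1 or -1).
    t = r_pb * math.sqrt(n - 2) / math.sqrt(1 - r_pb ** 2)
    return {"mean1": mean1, "mean0": mean0, "s_y": s_y,
            "r_pb": r_pb, "t": t, "df": n - 2}

# Hypothetical data: certification status (1 = certified) and a productivity score.
certified = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
score = [78, 85, 82, 90, 75, 70, 72, 68, 80, 65]
result = point_biserial_steps(certified, score)
print(result)
```

For this made-up dataset the group means are 82 and 71, and the coefficient comes out at roughly 0.73 with 8 degrees of freedom.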

Relationship Between rpb and the Independent Samples t-Test

A point-biserial correlation and an independent samples t-test are two ways to report the same comparison. Both ask whether the two binary groups differ on a continuous outcome. The t-test frames the question as a mean difference, while $r_{pb}$ frames it as a standardized association between a binary grouping and a numeric variable. Because the two statistics are mathematically equivalent, they return the same p-value for the same data.

The choice between them depends on the story you want to tell. Report the t-test when your audience expects hypothesis testing about group means. Report $r_{pb}$ when you want a compact effect size inside a correlation framework, especially alongside other coefficients such as Pearson or Spearman. In practice, point-biserial correlation is often the cleaner option because it gives you both significance and an immediately interpretable effect size in one statistic.

rpb ↔ t-test Equivalence
As correlation: rpb = 0.72, p < 0.001
As t-test: t(18) = 4.41, p < 0.001
rpb = √(t² / (t² + df))
rpb = √(19.45 / (19.45 + 18)) = 0.72
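
The equivalence can be checked in both directions with a short sketch. The equivalence holds for the pooled-variance (Student's) t-test, so that is what `pooled_t` computes. Both function names and the two groups are hypothetical. `r_from_t` restores the sign of t, since the square-root formula alone gives only the magnitude.

```python
import math

def pooled_t(g1, g0):
    """Student's independent-samples t statistic (equal variances assumed)."""
    n1, n0 = len(g1), len(g0)
    m1, m0 = sum(g1) / n1, sum(g0) / n0
    ss1 = sum((v - m1) ** 2 for v in g1)
    ss0 = sum((v - m0) ** 2 for v in g0)
    df = n1 + n0 - 2
    pooled_var = (ss1 + ss0) / df
    t = (m1 - m0) / math.sqrt(pooled_var * (1 / n1 + 1 / n0))
    return t, df

def r_from_t(t, df):
    """rpb = sqrt(t^2 / (t^2 + df)), carrying over the sign of t."""
    return math.copysign(math.sqrt(t * t / (t * t + df)), t)

# Hypothetical groups: coded 1 (treated) and coded 0 (control).
treated = [78, 85, 82, 90, 75]
control = [70, 72, 68, 80, 65]
t, df = pooled_t(treated, control)
r_pb = r_from_t(t, df)
print(round(t, 3), df, round(r_pb, 3))

# The figures from the example above: t(18) = 4.41 maps to rpb of about 0.72.
print(round(r_from_t(4.41, 18), 2))
```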

How to Interpret Point-Biserial Correlation Results

Point-biserial correlation is best interpreted as an effect size for the difference between two groups. The sign tells you whether the group coded 1 tends to score higher or lower than the group coded 0. The magnitude tells you how substantial that difference is. Because $\eta^2 = r_{pb}^2$, the squared coefficient gives the proportion of variance in the continuous outcome explained by group membership.

$$d = \frac{2r_{pb}}{\sqrt{1-r_{pb}^2}}$$

A helpful cross-check is Cohen's d. With roughly equal group sizes, which this conversion assumes, $r_{pb} = 0.10$ is near $d = 0.20$ for a small effect, $r_{pb} = 0.24$ is near $d = 0.50$ for a medium effect, and $r_{pb} = 0.37$ is near $d = 0.80$ for a large effect.
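
The conversion is easy to check numerically. The sketch below applies the formula above and its algebraic inverse. Both function names are illustrative, and this form of the conversion is the one commonly stated for equal group sizes.

```python
import math

def d_from_r(r_pb):
    """Cohen's d from rpb: d = 2r / sqrt(1 - r^2), for equal group sizes."""
    return 2 * r_pb / math.sqrt(1 - r_pb ** 2)

def r_from_d(d):
    """Algebraic inverse of d_from_r: r = d / sqrt(d^2 + 4)."""
    return d / math.sqrt(d ** 2 + 4)

for r in (0.10, 0.24, 0.37):
    print(r, round(d_from_r(r), 3))
```

The three printed d values land close to 0.20, 0.50, and 0.80, which is where the small, medium, and large benchmarks quoted above come from.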

| Absolute rpb | Effect size | η² | Interpretation |
| --- | --- | --- | --- |
| 0.50 to 1.00 | Large Effect | η² ≥ 0.25 | Group means differ strongly |
| 0.30 to 0.49 | Medium Effect | η² 0.09 to 0.24 | Group means differ moderately |
| 0.10 to 0.29 | Small Effect | η² 0.01 to 0.08 | Group means differ slightly |
| 0.00 to 0.09 | Negligible | η² < 0.01 | Groups are nearly indistinguishable |
| Negative values | Same effect sizes | Same η² | Group coded 1 scores lower than group coded 0 |
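
The bands listed above can be turned into a small lookup. This is a sketch with a hypothetical function name. It treats each band's lower bound as the cut-off, which is an assumption made so that values falling between the printed ranges (0.295, for example) still get a label.

```python
def effect_label(r_pb):
    """Map the absolute value of rpb to the effect-size bands listed above."""
    size = abs(r_pb)  # negative values use the same bands
    if size >= 0.50:
        return "Large Effect"
    if size >= 0.30:
        return "Medium Effect"
    if size >= 0.10:
        return "Small Effect"
    return "Negligible"

for value in (0.72, -0.35, 0.12, 0.05):
    print(value, effect_label(value), "eta squared =", round(value ** 2, 4))
```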

Point-Biserial vs Pearson vs Biserial Correlation

Use point-biserial correlation when the binary variable is genuinely binary. If both variables are continuous, switch to Pearson correlation. If the binary grouping was created by cutting an underlying continuous variable at a threshold, the biserial correlation is the theoretical alternative, but it depends on stronger assumptions and is far less common in applied software.

| | Point-Biserial rpb | Pearson r | Biserial rb |
| --- | --- | --- | --- |
| X variable type | True binary 0 or 1 | Continuous | Artificially dichotomized |
| Y variable type | Continuous | Continuous | Continuous |
| Typical use case | Pass or fail vs score | Height vs weight | High income vs low income after a cutoff |
| Relation to Pearson | Pearson special case | Base statistic | Adjusted under normality assumptions |
| Range | -1 to +1 | -1 to +1 | Can exceed ±1 |
| Recommended use | Natural binary grouping | Two continuous variables | Only for true artificial dichotomies |
| Link | Current page | /pearson-correlation/ | Conceptual comparison only |

Real-World Use Cases

Educational testing and item discrimination

A classic use of point-biserial correlation is item analysis: correct versus incorrect on a specific question against the total test score. In this setting it is commonly reported as an item discrimination index. Higher values mean the item separates stronger performers from weaker performers.

Medical treatment and control studies

Clinical analysts often compare a treatment indicator coded 1 or 0 against a recovery score, biomarker level, or symptom index. Point-biserial correlation expresses that group difference as a standardized effect size.

HR, marketing, and operational decisions

Certification status versus productivity, purchase versus non-purchase against income, and yes or no outcomes versus continuous KPIs are all natural point-biserial use cases where a binary grouping is linked to a continuous measure.

Certification vs Productivity

A clean HR-style example with pass or fail group membership and a productivity score.

Treatment vs Recovery Score

A clinical group comparison between treated patients and control patients.

Item Analysis: Question 5

Classic educational testing example: got the item correct or incorrect versus total exam score.

Frequently Asked Questions

What is point-biserial correlation?

Point-biserial correlation (rpb) measures the strength and direction of the relationship between a naturally dichotomous binary variable coded 0 and 1 and a continuous variable. It is mathematically equivalent to Pearson r when one variable is binary, and it produces the same p-value as an independent samples t-test. Values range from -1 to +1.

When should I use point-biserial correlation?

Use point-biserial correlation when one variable is a true binary variable such as pass or fail, yes or no, treatment or control, or certified or not certified, and the other variable is continuous. Common applications include item analysis in educational testing, where it often acts as an item discrimination index, along with clinical group comparisons, marketing conversion analysis, and HR performance studies.

How is point-biserial correlation related to the t-test?

Point-biserial correlation and the independent samples t-test are mathematically equivalent and produce identical p-values for the same dataset. You can convert between them using rpb = √(t² / (t² + df)). Point-biserial is often preferred when you want to report a standardized effect size inside a correlation framework.

What is a good point-biserial correlation?

Using common effect-size rules of thumb, rpb around 0.10 is small, 0.24 is medium, and 0.37 or above is large. In educational item analysis, values above 0.30 are usually acceptable and values above 0.40 are often considered excellent discrimination.

What is the difference between point-biserial and biserial correlation?

Point-biserial correlation is used when the binary variable is naturally dichotomous, such as alive versus dead or passed versus failed. Biserial correlation is used when the binary variable is artificially dichotomous, meaning it was created by splitting an underlying continuous variable at a threshold. Biserial correlation relies on stronger normality assumptions and is less commonly used in practical software workflows.

How do you calculate point-biserial correlation by hand?

First separate the continuous values into the group coded 1 and the group coded 0. Then compute each group mean, compute the overall standard deviation of the continuous variable using n in the denominator, and apply rpb = ((Ȳ₁ - Ȳ₀) / sY) × √(n₁n₀ / n²). This calculator shows each of those steps explicitly.

Can point-biserial correlation be negative?

Yes. A negative point-biserial coefficient means the group coded as 1 has a lower mean on the continuous variable than the group coded as 0. The sign depends entirely on your coding choice, so swapping 0 and 1 flips the sign while keeping the effect size magnitude the same.
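
A quick way to see the sign flip is to recode the groups and recompute. The sketch below uses a plain Pearson computation, which equals rpb when one variable is coded 0 and 1. The data and names are hypothetical.

```python
import math

def pearson_r(xs, ys):
    """Ordinary Pearson correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)

group = [0, 0, 0, 1, 1, 1]
outcome = [3.0, 4.0, 5.0, 6.0, 8.0, 10.0]
recoded = [1 - g for g in group]  # swap the 0 and 1 labels

r_original = pearson_r(group, outcome)
r_recoded = pearson_r(recoded, outcome)
print(r_original, r_recoded)  # same magnitude, opposite sign
```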

What sample size is needed for point-biserial correlation?

A total sample of about 20, with at least around 10 observations per group when possible, is a practical minimum for a stable first estimate. Larger samples are better for significance testing, and very unequal group sizes reduce power and limit the maximum possible rpb value.