Live calculator, charts, and worked steps

Point-Biserial Correlation Calculator


Use this point-biserial correlation calculator to measure the relationship between a binary variable coded 0 or 1 and a continuous outcome. Enter paired values, upload a spreadsheet, or paste group data from Excel to instantly calculate rpb, η², Cohen's d, p-value, confidence intervals, and the mathematically equivalent independent-samples t-test result.

✓ Validates binary variable automatically
✓ Shows group mean comparison
✓ Equivalent t-test result included
✓ Free forever and no sign-up

Live update window: 300 ms

Manual row limit: 500

Output package: r, p, CI

Active method: Pearson r

Best for continuous variables with a linear relationship.

r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \cdot \sum_{i=1}^{n}(y_i - \bar{y})^2}}

Data input

Enter or import paired values

8 valid pairs


Drag and drop a CSV or Excel file

Column headers are detected automatically so you can choose which variables become X and Y.


Strength badge: Very strong

Data health check

Sample size: Good

8 valid pairs gives a stable first-pass estimate.

Distribution shape: Good

Neither variable shows strong skewness from a quick sample-skewness check.

Linearity check: Info

Pearson and Spearman are close, which supports a mostly linear trend.


Statistic	Value
r	0.9991
r²	0.9981
p-value	2.13e-9
t	56.1931
95% CI	0.995 to 1.000
99% CI	0.991 to 1.000

Automatic interpretation

This dataset shows a very strong positive linear relationship. It is statistically significant at the 0.05 level.

Pearson r = 0.9991 based on 8 valid pairs, p = 2.13e-9.

Your result

r = 0.9991 (Very strong)

✓ Statistically significant at p < 0.05

✓ r² = 0.9981 so X explains 99.81% of Y variance.

⚠ Sample size n=8 is small, so treat the confidence interval with caution.
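The t statistic and the confidence intervals reported above can be reproduced from r and n. The sketch below rests on an assumption: the page does not state how its intervals are computed, but a Fisher z-transform interval agrees with the reported bounds to three decimals. The function names `t_from_r` and `fisher_ci` are illustrative, not part of the calculator.

```python
import math
from statistics import NormalDist


def t_from_r(r: float, n: int) -> float:
    """t statistic for testing r = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def fisher_ci(r: float, n: int, level: float = 0.95) -> tuple[float, float]:
    """Confidence interval for r via the Fisher z-transform (assumed method)."""
    z = math.atanh(r)                      # transform r to the z scale
    se = 1 / math.sqrt(n - 3)              # standard error on the z scale
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# Unrounded r from the sums in the worked example: 209 / sqrt(42 * 1042)
r = 209 / math.sqrt(42 * 1042)
n = 8

t = t_from_r(r, n)
lo95, hi95 = fisher_ci(r, n, 0.95)
lo99, hi99 = fisher_ci(r, n, 0.99)

print(f"t = {t:.4f}")
print(f"95% CI: {lo95:.3f} to {hi95:.3f}")
print(f"99% CI: {lo99:.3f} to {hi99:.3f}")
```

Using the unrounded r matters here: with r this close to 1, rounding to 0.9991 before converting shifts t noticeably.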

Step-by-step

How the calculator got this result

Step 1: Compute the means

Average the X values and the Y values before measuring joint movement.

x̄ = (1.0000 + 2.0000 + 3.0000 + 4.0000 + 5.0000 + 6.0000 + 7.0000 + 8.0000) / 8 = 36.0000 / 8 = 4.5000

ȳ = (52.0000 + 57.0000 + 62.0000 + 67.0000 + 72.0000 + 77.0000 + 83.0000 + 86.0000) / 8 = 556.0000 / 8 = 69.5000

Step 2: Measure paired deviations

Subtract the mean from every X and Y value to get centered deviations.

#1: dx = 1.0000 - 4.5000 = -3.5000, dy = 52.0000 - 69.5000 = -17.5000

#2: dx = 2.0000 - 4.5000 = -2.5000, dy = 57.0000 - 69.5000 = -12.5000

#3: dx = 3.0000 - 4.5000 = -1.5000, dy = 62.0000 - 69.5000 = -7.5000

#4: dx = 4.0000 - 4.5000 = -0.5000, dy = 67.0000 - 69.5000 = -2.5000

#5: dx = 5.0000 - 4.5000 = 0.5000, dy = 72.0000 - 69.5000 = 2.5000

#6: dx = 6.0000 - 4.5000 = 1.5000, dy = 77.0000 - 69.5000 = 7.5000

#7: dx = 7.0000 - 4.5000 = 2.5000, dy = 83.0000 - 69.5000 = 13.5000

#8: dx = 8.0000 - 4.5000 = 3.5000, dy = 86.0000 - 69.5000 = 16.5000

Step 3: Sum the covariance numerator

Multiply each pair of deviations and add them up.

#1: (-3.5000) × (-17.5000) = 61.2500

#2: (-2.5000) × (-12.5000) = 31.2500

#3: (-1.5000) × (-7.5000) = 11.2500

#4: (-0.5000) × (-2.5000) = 1.2500

#5: (0.5000) × (2.5000) = 1.2500

#6: (1.5000) × (7.5000) = 11.2500

#7: (2.5000) × (13.5000) = 33.7500

#8: (3.5000) × (16.5000) = 57.7500

Σ(xᵢ - x̄)(yᵢ - ȳ) = 209.0000

Step 4: Sum the squared deviations

Compute the denominator from the independent spread of X and Y.

#1: dx² = 12.2500, dy² = 306.2500

#2: dx² = 6.2500, dy² = 156.2500

#3: dx² = 2.2500, dy² = 56.2500

#4: dx² = 0.2500, dy² = 6.2500

#5: dx² = 0.2500, dy² = 6.2500

#6: dx² = 2.2500, dy² = 56.2500

#7: dx² = 6.2500, dy² = 182.2500

#8: dx² = 12.2500, dy² = 272.2500

Σ(xᵢ - x̄)² = 42.0000

Σ(yᵢ - ȳ)² = 1042.0000

Step 5: Divide numerator by denominator

The covariance term is normalized by both standard-deviation components.

r = 209.0000 / √(42.0000 × 1042.0000)

r = 0.9991
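The five steps above can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the function name `pearson_r` is illustrative, and the data are the eight pairs from the worked example.

```python
import math

x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [52, 57, 62, 67, 72, 77, 83, 86]


def pearson_r(xs, ys):
    """Pearson r from centered deviations, mirroring Steps 1 to 5."""
    n = len(xs)
    mean_x = sum(xs) / n                                   # Step 1
    mean_y = sum(ys) / n
    dx = [a - mean_x for a in xs]                          # Step 2
    dy = [b - mean_y for b in ys]
    s_xy = sum(a * b for a, b in zip(dx, dy))              # Step 3
    s_xx = sum(a * a for a in dx)                          # Step 4
    s_yy = sum(b * b for b in dy)
    return s_xy / math.sqrt(s_xx * s_yy)                   # Step 5


r = pearson_r(x, y)
print(f"r = {r:.4f}, r^2 = {r * r:.4f}")
```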

How to Use This Calculator

Step 1

Enter X as a binary variable coded 0 and 1, then enter Y as a continuous numeric variable.

Step 2

Use the variable setup guide to confirm that both groups are present and that the coding is valid.

Step 3

Read rpb, η², Cohen's d, the group means, confidence intervals, and the equivalent t-test result from the output cards.

Step 4

Expand the step-by-step section to inspect the group split, group means, population standard deviation, and final t conversion.
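The coding check described in Step 2 might look like the sketch below. The page does not document how the calculator validates input, so `validate_binary` is a hypothetical helper that only illustrates the two conditions named there: every X value is 0 or 1, and both groups are present.

```python
def validate_binary(x):
    """Return (n0, n1) if x is a valid 0/1 grouping, else raise ValueError."""
    values = set(x)
    if not values <= {0, 1}:
        bad = sorted(v for v in values if v not in (0, 1))
        raise ValueError(f"X must be coded 0 or 1, found: {bad}")
    if values != {0, 1}:
        raise ValueError("Both groups (0 and 1) must be present")
    n1 = sum(1 for v in x if v == 1)
    return len(x) - n1, n1


n0, n1 = validate_binary([0, 1, 1, 0, 1])
print(f"group 0: {n0} rows, group 1: {n1} rows")
```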

What Is Point-Biserial Correlation?

Point-biserial correlation measures the relationship between one truly binary variable and one continuous variable. The binary variable must be naturally dichotomous, coded as 0 and 1, while the other variable can be any continuous numeric outcome such as test score, income, recovery score, or productivity. A point-biserial correlation is ideal when the research question is really about how strongly group membership predicts or tracks a continuous result.

Mathematically, point-biserial correlation is not a different family of statistic from Pearson correlation. It is simply Pearson r applied to the special case where one variable is binary. That means if X is coded 0 and 1, Pearson r and $r_{pb}$ are the same number. The interpretation is then immediate: positive values mean the group coded 1 tends to have higher Y values, negative values mean the group coded 1 tends to have lower Y values, and larger absolute values indicate stronger group separation.

Typical examples include item analysis in education, where correct or incorrect on a question is compared to total score, often as an item discrimination index; medical studies, where treatment or control is compared to a biomarker or recovery index; marketing, where purchase or non-purchase is compared to income; and HR analytics, where certification status is compared to performance score. One important distinction is that point-biserial correlation is for naturally dichotomous variables. If the binary variable is artificially dichotomous because a continuous variable was cut into high versus low groups, the biserial correlation is conceptually more appropriate.

Point-Biserial Correlation Formula

r_{pb} = \frac{\bar{Y}_1 - \bar{Y}_0}{s_Y} \cdot \sqrt{\frac{n_1 \cdot n_0}{n^2}}

r_{pb} = \frac{\bar{Y}_1 - \bar{Y}_0}{s_Y} \cdot \sqrt{p \cdot q}, \quad p = n_1/n, \quad q = n_0/n

t = \frac{r_{pb}\sqrt{n-2}}{\sqrt{1-r_{pb}^2}}, \quad df = n-2

r_{pb} = \sqrt{\frac{t^2}{t^2 + df}}

\eta^2 = r_{pb}^2, \quad d = \frac{2r_{pb}}{\sqrt{1-r_{pb}^2}}

Symbol	Meaning
$\bar{Y}_1$	Mean of the continuous variable in the group coded 1
$\bar{Y}_0$	Mean of the continuous variable in the group coded 0
$s_Y$	Population standard deviation of the continuous variable, using n in the denominator
$n_1$	Number of observations in the group coded 1
$n_0$	Number of observations in the group coded 0
$n$	Total sample size, equal to $n_1 + n_0$
$p$	Proportion of observations in the group coded 1
$q$	Proportion of observations in the group coded 0

The main formula compares the mean of the continuous outcome in the group coded 1 against the mean in the group coded 0, then standardizes that gap by the population standard deviation of the continuous variable. The weighting term involving $n_1$, $n_0$, $p$, and $q$ ensures that the coefficient reflects both the mean gap and the balance between the two groups.

This calculator uses the standard form with $s_Y$ computed using $n$ in the denominator, not $n-1$. That matches the conventional point-biserial formula. Once $r_{pb}$ is known, you can convert it directly to the equivalent t statistic, recover its magnitude from a reported t test (the sign follows the sign of t), and treat $\eta^2 = r_{pb}^2$ as an immediately readable effect size.
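To see that the group-mean formula and Pearson r agree, the sketch below computes both on a small made-up dataset. The data and the function names `point_biserial` and `pearson_r` are illustrative assumptions; `pstdev` from the standard library is the population standard deviation (n in the denominator), matching $s_Y$ as defined above.

```python
import math
from statistics import fmean, pstdev


def point_biserial(x, y):
    """r_pb from group means, population SD of y, and group proportions."""
    y1 = [v for g, v in zip(x, y) if g == 1]
    y0 = [v for g, v in zip(x, y) if g == 0]
    n1, n0, n = len(y1), len(y0), len(y)
    s_y = pstdev(y)                       # uses n, not n - 1
    return (fmean(y1) - fmean(y0)) / s_y * math.sqrt(n1 * n0 / n**2)


def pearson_r(xs, ys):
    """Plain Pearson r, for comparison."""
    mx, my = fmean(xs), fmean(ys)
    s_xy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    s_xx = sum((a - mx) ** 2 for a in xs)
    s_yy = sum((b - my) ** 2 for b in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical data: 3 observations coded 0, 5 coded 1
x = [0, 0, 0, 1, 1, 1, 1, 1]
y = [61, 58, 66, 72, 70, 75, 68, 79]

r_pb = point_biserial(x, y)
r = pearson_r(x, y)
print(f"r_pb = {r_pb:.6f}, Pearson r = {r:.6f}")
```

The two values should match up to floating-point rounding, which is the sense in which point-biserial is a special case of Pearson.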

How to Calculate Point-Biserial Correlation Step by Step

Step 1

Identify Group 1 (binary = 1) and Group 0 (binary = 0), then list the continuous values in each group.

Step 2

Compute Ȳ₁ and Ȳ₀ so you can see the raw mean difference before any standardization.

Step 3

Compute the overall population standard deviation of the continuous variable using all observations.

Step 4

Apply the rpb formula with the mean gap, the standard deviation, and the group proportions.

Step 5

Convert rpb into the equivalent t-statistic and read the identical p-value you would get from an independent samples t-test.

Relationship Between rpb and the Independent Samples t-Test

A point-biserial correlation and an independent samples t-test are two ways to report the same comparison. Both ask whether the two binary groups differ on a continuous outcome. The t-test frames the question as a mean difference, while $r_{pb}$ frames it as a standardized association between a binary grouping and a numeric variable. Because the two statistics are mathematically equivalent when the t-test uses the pooled (equal-variance) form, they return the same p-value for the same data. Welch's unequal-variance t-test can give a different p-value.

The choice between them depends on the story you want to tell. Report the t-test when your audience expects hypothesis testing about group means. Report $r_{pb}$ when you want a compact effect size inside a correlation framework, especially alongside other coefficients such as Pearson or Spearman. In practice, point-biserial correlation is often the cleaner option because it gives you both significance and an immediately interpretable effect size in one statistic.
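The equivalence can be checked numerically. The sketch below, on made-up data, computes the pooled-variance t statistic directly from the two groups and compares it with the t obtained from $r_{pb}$ through $t = r_{pb}\sqrt{n-2}/\sqrt{1-r_{pb}^2}$. The data and the function names `pooled_t` and `r_pb_from_groups` are assumptions for illustration.

```python
import math
from statistics import fmean, pstdev


def pooled_t(y0, y1):
    """Student's independent-samples t with pooled variance."""
    n0, n1 = len(y0), len(y1)
    m0, m1 = fmean(y0), fmean(y1)
    ss0 = sum((v - m0) ** 2 for v in y0)
    ss1 = sum((v - m1) ** 2 for v in y1)
    sp2 = (ss0 + ss1) / (n0 + n1 - 2)          # pooled variance
    return (m1 - m0) / math.sqrt(sp2 * (1 / n0 + 1 / n1))


def r_pb_from_groups(y0, y1):
    """Point-biserial r using the population SD of all observations."""
    n0, n1 = len(y0), len(y1)
    n = n0 + n1
    s_y = pstdev(y0 + y1)
    return (fmean(y1) - fmean(y0)) / s_y * math.sqrt(n1 * n0 / n**2)


# Hypothetical groups: y0 is coded 0, y1 is coded 1
y0 = [61, 58, 66]
y1 = [72, 70, 75, 68, 79]
n = len(y0) + len(y1)

r = r_pb_from_groups(y0, y1)
t_via_r = r * math.sqrt(n - 2) / math.sqrt(1 - r * r)
t_direct = pooled_t(y0, y1)
print(f"t from r_pb = {t_via_r:.6f}, pooled t = {t_direct:.6f}")
```

Since both routes give the same t with the same n - 2 degrees of freedom, the p-values are identical as well.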

rpb ↔ t-test Equivalence

As correlation: rpb = 0.72, p < 0.001

As t-test: t(18) = 4.41, p < 0.001

rpb = √(t² / (t² + df))

rpb = √(19.45 / (19.45 + 18)) = 0.72

How to Interpret Point-Biserial Correlation Results

Point-biserial correlation is best interpreted as an effect size for the difference between two groups. The sign tells you whether the group coded 1 tends to score higher or lower than the group coded 0. The magnitude tells you how substantial that difference is. Because $\eta^2 = r_{pb}^2$, the squared coefficient gives the proportion of variance in the continuous outcome explained by group membership.

d = \frac{2r_{pb}}{\sqrt{1-r_{pb}^2}}

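A helpful cross-check is Cohen's d. The conversion above assumes roughly equal group sizes. Under it, $r_{pb} = 0.10$ corresponds to about $d = 0.20$ (small), $r_{pb} = 0.24$ to about $d = 0.50$ (medium), and $r_{pb} = 0.37$ to about $d = 0.80$ (large). These d-equivalent cutoffs are lower than Cohen's benchmarks for correlations (0.10, 0.30, 0.50), which the bands below use.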
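The correspondence between $r_{pb}$ and d can be reproduced with the conversion formula and its algebraic inverse. This is a small sketch under the equal-group-size form of the formula; `d_from_r` and `r_from_d` are illustrative names.

```python
import math


def d_from_r(r: float) -> float:
    """Cohen's d from r_pb, equal-group-size form: d = 2r / sqrt(1 - r^2)."""
    return 2 * r / math.sqrt(1 - r * r)


def r_from_d(d: float) -> float:
    """Inverse of the conversion above: r = d / sqrt(d^2 + 4)."""
    return d / math.sqrt(d * d + 4)


for r in (0.10, 0.24, 0.37):
    print(f"r_pb = {r:.2f}  ->  d = {d_from_r(r):.3f}")

for d in (0.20, 0.50, 0.80):
    print(f"d = {d:.2f}  ->  r_pb = {r_from_d(d):.3f}")
```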

|rpb| range	Effect size	η²	Interpretation
0.50 to 1.00	Large Effect	η² ≥ 0.25	Group means differ strongly
0.30 to 0.49	Medium Effect	η² 0.09 to 0.24	Group means differ moderately
0.10 to 0.29	Small Effect	η² 0.01 to 0.08	Group means differ slightly
0.00 to 0.09	Negligible	η² < 0.01	Groups are nearly indistinguishable
Negative values	Same effect sizes	Same η²	Group coded 1 scores lower than group coded 0

Point-Biserial vs Pearson vs Biserial Correlation

Use point-biserial correlation when the binary variable is genuinely binary. If both variables are continuous, switch to Pearson correlation. If the binary grouping was created by cutting an underlying continuous variable at a threshold, the biserial correlation is the theoretical alternative, but it depends on stronger assumptions and is far less common in applied software.

	Point-Biserial rpb	Pearson r	Biserial rb
X variable type	True binary 0 or 1	Continuous	Artificially dichotomized
Y variable type	Continuous	Continuous	Continuous
Typical use case	Pass or fail vs score	Height vs weight	High income vs low income after a cutoff
Relation to Pearson	Pearson special case	Base statistic	Adjusted under normality assumptions
Range	-1 to +1	-1 to +1	Can exceed ±1
Recommended use	Natural binary grouping	Two continuous variables	Only for true artificial dichotomies

Real-World Use Cases

Educational testing and item discrimination

A classic use of point-biserial correlation is item analysis: correct versus incorrect on a specific question against the total test score. In this setting it is commonly reported as an item discrimination index. Higher values mean the item separates stronger performers from weaker performers.

Medical treatment and control studies

Clinical analysts often compare a treatment indicator coded 1 or 0 against a recovery score, biomarker level, or symptom index. Point-biserial correlation expresses that group difference as a standardized effect size.

HR, marketing, and operational decisions

Certification status versus productivity, purchase versus non-purchase against income, and yes or no outcomes versus continuous KPIs are all natural point-biserial use cases where a binary grouping is linked to a continuous measure.

Certification vs Productivity

A clean HR-style example with pass or fail group membership and a productivity score.

Treatment vs Recovery Score

A clinical group comparison between treated patients and control patients.

Item Analysis: Question 5

Classic educational testing example: got the item correct or incorrect versus total exam score.

Frequently Asked Questions

What is point-biserial correlation?

Point-biserial correlation (rpb) measures the strength and direction of the relationship between a naturally dichotomous binary variable coded 0 and 1 and a continuous variable. It is mathematically equivalent to Pearson r when one variable is binary, and it produces the same p-value as an independent samples t-test. Values range from -1 to +1.

When should I use point-biserial correlation?

Use point-biserial correlation when one variable is a true binary variable such as pass or fail, yes or no, treatment or control, or certified or not certified, and the other variable is continuous. Common applications include item analysis in educational testing, where it often acts as an item discrimination index, along with clinical group comparisons, marketing conversion analysis, and HR performance studies.

How is point-biserial correlation related to the t-test?

Point-biserial correlation and the pooled-variance independent samples t-test are mathematically equivalent and produce identical p-values for the same dataset. You can convert between them using rpb = √(t² / (t² + df)), which gives the magnitude; the sign of rpb follows the sign of t. Point-biserial is often preferred when you want to report a standardized effect size inside a correlation framework.

What is a good point-biserial correlation?

Rules of thumb differ by convention. Translating Cohen's d benchmarks into rpb gives roughly 0.10 for small, 0.24 for medium, and 0.37 or above for large, while Cohen's benchmarks for correlations are 0.10, 0.30, and 0.50. In educational item analysis, values above 0.30 are usually acceptable and values above 0.40 are often considered excellent discrimination.

What is the difference between point-biserial and biserial correlation?

Point-biserial correlation is used when the binary variable is naturally dichotomous, such as alive versus dead or passed versus failed. Biserial correlation is used when the binary variable is artificially dichotomous, meaning it was created by splitting an underlying continuous variable at a threshold. Biserial correlation relies on stronger normality assumptions and is less commonly used in practical software workflows.

How do you calculate point-biserial correlation by hand?

First separate the continuous values into the group coded 1 and the group coded 0. Then compute each group mean, compute the overall standard deviation of the continuous variable using n in the denominator, and apply rpb = ((Ȳ₁ - Ȳ₀) / sY) × √(n₁n₀ / n²). This calculator shows each of those steps explicitly.

Can point-biserial correlation be negative?

Yes. A negative point-biserial coefficient means the group coded as 1 has a lower mean on the continuous variable than the group coded as 0. The sign depends entirely on your coding choice, so swapping 0 and 1 flips the sign while keeping the effect size magnitude the same.

What sample size is needed for point-biserial correlation?

A total sample of about 20, with at least around 10 observations per group when possible, is a practical minimum for a stable first estimate. Larger samples are better for significance testing, and very unequal group sizes reduce power and limit the maximum possible rpb value.

Related Calculators

Pearson r

For two continuous variables

Spearman ρ

For ordinal, ranked, or monotonic data

Kendall τ

For small samples and tie-heavy ordered data

Correlation Matrix

For 3 or more variables