Many introductory statistics examples are mathematically clear but can feel disconnected from real-world decision-making. Sports, and baseball in particular, provide a data-rich alternative that grounds probability concepts in authentic, relatable contexts.
An excerpt from a 2025 interview of Jim Albert by Scott Berry, titled “An Inspiring Story of Teaching Statistics Through Sports Applications”, captures this idea well:
“Part of it [teaching statistics through sports applications] is the real application. Statistics can suffer from talking about urns, spinners, dice, and flipping a coin.”
This write-up is designed as a teaching resource that demonstrates how a classical probability model, the Binomial distribution, can illuminate a topical and substantive problem in baseball analytics: many all-time ranking lists often overrepresent players from the pre-integration era. It is motivated by the analysis in Challenging Nostalgia and Performance Metrics in Baseball and results from Comparing baseball players across eras via novel Full House Modeling.
When we look at all-time player rankings, older stars like Babe Ruth, Ty Cobb, and Walter Johnson often dominate the lists. But these players competed in a pre-integration MLB, when the talent pool excluded many qualified players.
Here, we use a simple Binomial model to show that traditional metrics, such as Wins Above Replacement (WAR), overrepresent pre-1950 players among the all-time greats. We also show that era-adjusted WAR does not suffer the same bias.
The Binomial distribution models the number of successes in a fixed number \(n\) of independent trials. Each trial results in a success or a failure, with \(p\) used to denote the probability of success where \(0 < p < 1\). Formally, if \(X \sim \text{Binomial}(n,p)\), then its probability mass function is \[ P(X = x) = {n \choose x} p^x (1-p)^{n-x}, \qquad x = 0,1,\ldots,n. \] Here, \({n \choose x}\) counts the number of ways \(x\) successes can occur in \(n\) trials. Importantly, we only care about the total count of successes and do not care about the order of those successes. This is why the “choose” function \({n \choose x}\) appears in the formula.
For the equation above to represent a valid probability mass function, we must show that \[ \sum_{\text{all} \;x \;\text{in sample space}} P(X = x) = 1, \] and that \(P(X = x) \geq 0\) for all \(x = 0,1,\ldots,n\). The second condition follows immediately since \({n \choose x} \ge 0\), \(p^x \ge 0\), and \((1-p)^{n-x} \ge 0\) for all valid \(p\) and \(x\).
Verification that the sum of probabilities equals one follows from the Binomial Theorem, which states that for two values \(a\) and \(b\) and nonnegative integer \(n\), \[ (a + b)^n = \sum_{x = 0}^n {n \choose x} a^xb^{n-x}. \] Now let \(a = p\) and \(b = 1-p\), choices that correspond to the probability mass function described above. These choices yield \(a + b = 1\). Thus, \[ 1 = 1^n = (p + (1-p))^n = \sum_{x = 0}^n {n \choose x} p^x(1-p)^{n-x}, \] and we see that the probability mass function for the Binomial distribution is indeed a probability mass function.
We now investigate WAR as a method to rank careers of baseball players. WAR is a one-number summary of a player’s total value measured in wins added above a replacement level player. It is a widely used metric for comparing players, often across eras, and is recognized as a prominent advance in the sabermetrics community.
There are three mainstream versions of WAR. Among these we select baseball reference WAR (bWAR), one of the most widely used versions and is easy to obtain. We now load in the following software packages and obtain bWAR from baseball reference.
library(tidyverse)
#install.packages("devtools")
#devtools::install_github(repo = "DEck13/fullhouse")
library(fullhouse)
# bref data
bwar_bat = readr::read_csv("https://www.baseball-reference.com/data/war_daily_bat.txt",
na = "NULL")
bwar_pit = readr::read_csv("https://www.baseball-reference.com/data/war_daily_pitch.txt",
na = "NULL")
The code below totals the career bWAR for all players. We also store the first year (rookie year) a player enters Major League Baseball (MLB) for all players. We also create an eligible year column which aligns each player to their first season between ages 20 and 29, so that comparisons are made at a similar life stage.
# get career WAR for all players
total_WAR = rbind(bwar_bat %>% select(player_ID, WAR),
bwar_pit %>% select(player_ID, WAR)) %>%
summarise(WAR = sum(WAR, na.rm = TRUE), .by = player_ID) %>%
arrange(desc(WAR))
# get player names
names = rbind(bwar_bat %>% select(player_ID, name_common),
bwar_pit %>% select(player_ID, name_common)) %>%
distinct()
# get rookie year and age at rookie year for all players
rookie_age = rbind(bwar_bat %>% select(player_ID, year_ID, age),
bwar_pit %>% select(player_ID, year_ID, age)) %>%
distinct() %>%
group_by(player_ID) %>%
slice_min(order_by = year_ID, n = 1) %>%
ungroup()
# join data sets together
total_WAR = total_WAR %>%
left_join(names) %>%
left_join(rookie_age) %>%
rename(name = name_common, rookie_year = year_ID) %>%
select(name, player_ID, WAR, age, rookie_year)
# get closest year that player is age 20-29
total_WAR = total_WAR %>%
mutate(eligible_year = rookie_year,
eligible_year = ifelse(age > 30, eligible_year - (age - 29), eligible_year),
eligible_year = ifelse(age < 20, eligible_year + (20 - age), eligible_year))
Below are the top 25 careers ranked by bWAR. Also included is a debut_1950 column which equals 1 if the eligible year season was at or before 1950. The year 1950 is chosen because it is the first census year after baseball was integrated and is roughly the midpoint of the recorded MLB history in baseball reference (1871-2025).
top25_bWAR = total_WAR %>%
arrange(desc(WAR)) %>%
select(name, WAR, eligible_year) %>%
mutate(debut_1950 = ifelse(eligible_year <= 1950, 1, 0)) %>%
slice(1:25) %>%
as.data.frame()
top25_bWAR
## name WAR eligible_year debut_1950
## 1 Babe Ruth 182.57 1915 1
## 2 Walter Johnson 167.82 1908 1
## 3 Cy Young 163.55 1890 1
## 4 Barry Bonds 162.78 1986 0
## 5 Willie Mays 156.19 1951 0
## 6 Ty Cobb 151.43 1907 1
## 7 Henry Aaron 143.22 1954 0
## 8 Roger Clemens 139.23 1984 0
## 9 Tris Speaker 134.88 1908 1
## 10 Honus Wagner 131.05 1897 1
## 11 Stan Musial 128.59 1941 1
## 12 Rogers Hornsby 127.30 1916 1
## 13 Eddie Collins 124.18 1907 1
## 14 Ted Williams 121.83 1939 1
## 15 Grover Alexander 119.74 1911 1
## 16 Alex Rodriguez 117.41 1996 0
## 17 Kid Nichols 116.29 1890 1
## 18 Lou Gehrig 113.67 1923 1
## 19 Rickey Henderson 111.18 1979 0
## 20 Mel Ott 110.95 1929 1
## 21 Mickey Mantle 110.26 1952 0
## 22 Tom Seaver 109.92 1967 0
## 23 Frank Robinson 107.24 1956 0
## 24 Nap Lajoie 106.92 1896 1
## 25 Mike Schmidt 106.90 1972 0
We see that 15 of the top 25 players ranked by career WAR began their career at or before 1950.
n_bWAR = sum(top25_bWAR$debut_1950)
n_bWAR
## [1] 15
To assess whether 15 is an unusually large number of pre-1950 players, we use the Binomial distribution. For this, we need to estimate the proportion of the historical talent pool that was active, or could have been active, at or before 1950.
The talent pool estimates are provided in the package. These estimates represent the number of males aged 20–29 in each decade, giving a reasonable approximation of everyone who could have been eligible to play Major League Baseball (without double counting across years).
From these data, we find that 26.6% of the historical talent pool was age-eligible in 1950 or earlier. For this analysis, we take 2010 as the final reference year, so that a 20-year-old in 2010 would be 35 in 2025, old enough to have plausibly completed a career and appeared among the top 25 all-time players by WAR.
data("talentpool")
# NL pop used (NL quicker to integrate than AL)
decades = talentpool %>%
select(year, NLpop) %>%
filter(year %in% c(1871, 1880 + 10 * 0:13)) %>%
mutate(cum_pops = cumsum(NLpop) / sum(NLpop))
# Prob at or before 1871-1950
# computed through 2010 (age 20-29 in 2010)
p = decades %>% filter(year == 1950) %>% pull(cum_pops)
p
## [1] 0.2657831
The plot below shows how the estimated size of the eligible talent pool evolved over time. We see a steady increase, reflecting the sport’s geographic expansion beyond the Northeast, the integration of MLB, globalization of the game, and overall population growth.
decades %>%
ggplot() +
aes(x = year, y = NLpop) +
geom_col() +
labs(title = "Talent Pool Size Over Time",
subtitle = "Eligible males aged 20-29 ") +
ylab("Population") +
xlab("Year") +
theme_minimal(base_size = 12) +
theme(plot.title = element_text(face = "bold"))
We can now connect the Binomial distribution to our baseball application. The function in R computes cumulative probabilities of the form \(P(X \leq x)\). In this case, we are interested in the probability of observing at least 15 players from 1950 or earlier among the top 25 careers by WAR.
Formally, we compute \[ P(X \geq 15), \qquad X \sim \text{Binomial}(n = 25, p = 0.266). \] where \(p=0.266\) represents the proportion of the historical talent pool that was age-eligible in 1950 or before. The code below calculates this Binomial probability:
# bWAR 15/25
## setting lower = FALSE => P[X > x]
## we want P[X >= x] so we need pbomin(X > x-1)
pbinom(n_bWAR-1, size = 25, prob = p, lower.tail = FALSE)
## [1] 0.0004435481
The resulting probability is extremely small.
If talent were uniformly distributed across history, we would expect only about 6 or 7 of the top 25 players to have debuted before 1950:
# expected value = n * p
expected = 25 * p
expected
## [1] 6.644579
Instead, we observe 15. The Binomial probability of observing 15 or more pre-1950 players purely by chance is effectively zero.
For completeness we will compute the expected value for \(X \sim \text{Binomial}(n,p)\). The definition of expectation is \[ E(X) = \sum_{\text{all} \;x \;\text{in sample space}} x P(X = x). \] When \(X \sim \text{Binomial}(n,p)\) we have, \[\begin{align*} E(X) &= \sum_{x = 0}^n {n \choose x} x p^x(1-p)^{n-x} \\ &= \sum_{x = 1}^n {n \choose x} x p^x(1-p)^{n-x}. \end{align*}\] The choose function is expressed as \({n \choose x} = \frac{n!}{x!(n-x)!}\) where \(x! = x \cdot (x-1) \cdot (x-2) \cdots 2\cdot 1\). Thus we can continue the derivation with \[\begin{align*} \sum_{x = 1}^n {n \choose x} x p^x(1-p)^{n-x} &= \sum_{x = 1}^n \frac{n!}{x!(n-x)!} x p^x(1-p)^{n-x} \\ &= \sum_{x = 1}^n \frac{n!}{(x-1)!(n-x)!} p^x(1-p)^{n-x}. \end{align*}\] The trick here is that we can define a new Binomial random variable ranging from 0 to \(n-1\) when we substitute \(y = x - 1\). Doing this requires exponents to agree with \(y\) and not \(x\). We now express the above with \(y = x - 1\) and \(x = y + 1\) and simplify, \[\begin{align*} \sum_{x = 1}^n \frac{n!}{(x-1)!(n-x)!} p^x(1-p)^{n-x} &= \sum_{y + 1 = 1}^{n-1} \frac{n!}{y!(n-(y+1))!} p^{y+1}(1-p)^{n-(y+1)} \\ &= \sum_{y = 0}^{n-1} \frac{n!}{y!(n-1-y)!} p^{y+1}(1-p)^{n-1-y} \\ &= \sum_{y = 0}^{n-1} \frac{n(n-1)!}{y!(n-1-y)!} pp^{y}(1-p)^{n-1-y} \\ &= np \sum_{y = 0}^{n-1} \frac{(n-1)!}{y!(n-1-y)!} p^{y}(1-p)^{n-1-y} \\ &= np \sum_{y = 0}^{n-1} {n-1 \choose y} p^{y}(1-p)^{n-1-y}. \end{align*}\] Notice that \({n-1 \choose y} p^{y}(1-p)^{n-1-y)}\) is a probability mass function for \(Y \sim \text{Binomial}(n-1, p)\), and \(\sum_{y = 0}^{n-1} {n-1 \choose y} p^{y}(1-p)^{n-1-y} = 1\) as a result. Putting it all together yields \[ E(X) = np. \] This expectation gives a natural benchmark: if player greatness were randomly distributed over history, we would expect about \(np=6.65\) players who began their career at 1950 or before among the top 25.
We see that bWAR’s inclusion of 15 players who began their careers at or before 1950 in its top-25 list is more than double the expected count of roughly 6–7. The Binomial probability of observing 15 or more such players purely by chance is effectively zero.
The overrepresentation of pre-integration players is not unique to Baseball-Reference’s WAR rankings. Fangraphs’ version of WAR (fWAR) shows a similar pattern, with 12 of the top 25 career leaders debuting at or before 1950. Joe Posnanski’s New York Times bestselling book The Baseball 100 also includes 12 players who began their careers at or before 1950 in his top 25 among the greatest non–Negro League players in MLB history. Even ESPN’s top 100 list ranks 11 such players in its top 25.
The Binomial probabilities of observing at least these many pre-1950 players purely by chance are also extremely small:
## fWAR and Posnanski
pbinom(12-1, size = 25, prob = p, lower.tail = FALSE)
## [1] 0.01760938
## ESPN
pbinom(11-1, size = 25, prob = p, lower.tail = FALSE)
## [1] 0.04517858
These results show that even a simple Binomial model can reveal systematic bias: traditional WAR-based rankings favor older-era players, despite WAR being widely described as a tool for comparing players across eras.
The Full House Model (FHM) yields an era-adjusted version of WAR that explicitly accounts for changes in the quality and size of the historical talent pool.
This adjustment, implemented in the R package as era-balanced WAR (ebWAR), re-scales player value so that a win above replacement carries the same meaning across eras. Whereas traditional WAR implicitly compares players in isolation from their competition, era-adjusted WAR compares them relative to the depth and strength of their contemporaries.
The code below combines career ebWAR totals for batters and pitchers, merges in each player’s rookie year, and identifies whether the player debuted before or after 1950.
data("batters_career_adjusted")
data("pitchers_career_adjusted")
# get career ebWAR for all batters and pitchers
total_ebWAR = rbind(batters_career_adjusted %>% select(name, playerID, ebWAR),
pitchers_career_adjusted %>% select(name, playerID, ebWAR)) %>%
arrange(desc(ebWAR))
# combine Babe Ruth's batting and pitching WAR
index.BR = which(total_ebWAR$name == "Babe Ruth")
total_ebWAR[index.BR[1], 3] = sum(total_ebWAR[index.BR, 3])
total_ebWAR = total_ebWAR %>%
arrange(desc(ebWAR))
# get rookie year and age at rookie year for all players
rookie_year = rbind(batters_adjusted %>% select(name, playerID, year),
pitchers_adjusted %>% select(name, playerID, year)) %>%
distinct() %>%
group_by(playerID) %>%
slice_min(order_by = year, n = 1) %>%
ungroup()
# join data sets together
total_ebWAR = total_ebWAR %>%
left_join(rookie_year) %>%
rename(rookie_year = year) %>%
select(name, playerID, ebWAR, rookie_year)
## Joining with `by = join_by(name, playerID)`
The table below lists the top 25 players by career era-balanced WAR. We again mark whether each player’s rookie season occurred in 1950 or earlier.
# top 25
top25_ebWAR = total_ebWAR %>%
select(-playerID) %>%
mutate(debut_1950 = ifelse(rookie_year <= 1950, 1, 0)) %>%
slice(1:25) %>%
as.data.frame()
top25_ebWAR
## name ebWAR rookie_year debut_1950
## 1 Barry Bonds 154.71 1986 0
## 2 Willie Mays 145.30 1951 0
## 3 Roger Clemens 144.38 1984 0
## 4 Babe Ruth 138.64 1915 1
## 5 Henry Aaron 135.67 1954 0
## 6 Alex Rodriguez 120.64 1996 0
## 7 Stan Musial 119.37 1941 1
## 8 Ty Cobb 115.00 1907 1
## 9 Greg Maddux 113.55 1986 0
## 10 Albert Pujols 111.95 2001 0
## 11 Mike Schmidt 110.09 1972 0
## 12 Randy Johnson 109.64 1988 0
## 13 Rickey Henderson 109.26 1979 0
## 14 Ted Williams 107.95 1939 1
## 15 Tom Seaver 103.85 1967 0
## 16 Tris Speaker 102.66 1909 1
## 17 Lefty Grove 101.19 1926 1
## 18 Joe Morgan 100.31 1963 0
## 19 Justin Verlander 100.05 2005 0
## 20 Mel Ott 99.87 1928 1
## 21 Frank Robinson 99.84 1956 0
## 22 Cal Ripken Jr 97.95 1981 0
## 23 Bert Blyleven 97.38 1970 0
## 24 Rogers Hornsby 97.16 1916 1
## 25 Lou Gehrig 95.56 1923 1
We find that only 9 of the top 25 players in the era-adjusted rankings began their careers at or before 1950.
n_ebWAR = sum(top25_ebWAR$debut_1950)
n_ebWAR
## [1] 9
Using the same Binomial framework as before, we can evaluate whether this level of representation is consistent with the size of the pre-1950 talent pool.
pbinom(n_ebWAR-1, size = 25, prob = p, lower.tail = FALSE)
## [1] 0.1974557
The resulting Binomial probability is 0.197. Unlike the earlier unadjusted WAR lists, where probabilities ranged from very small to effectively zero, this value lies more comfortably within the range we would expect under random variation if talent were fairly distributed across eras. In other words, era-adjusted WAR produces rankings that are statistically well-calibrated to the size of the historical talent pool. Era-adjusted JAWS lists are even more tightly calibrated, reinforcing this conclusion.
To visualize the improvement, the figure below compares how many of the top 25 players in several prominent all-time lists began their careers at or before 1950.
dat = data.frame(list = c("bWAR", "fWAR", "Posnanski", "ESPN", "ebWAR"),
count = c(15, 12, 12, 11, 9))
dat %>%
mutate(color = ifelse(list == "ebWAR", "#377eb8", "#e41a1c")) %>%
ggplot(aes(x = list, y = count, fill = color)) +
aes(x = list, y = count) +
geom_col() +
scale_fill_identity() +
labs(title = "Pre-1950 Players in Top-25 All-Time Lists",
subtitle = "Traditional lists overrepresent older eras; era-adjusted WAR corrects this bias",
x = "",
y = "Count") +
theme_minimal(base_size = 12) +
theme(plot.title = element_text(face = "bold"))
The results so far illustrate how a simple probability model can diagnose systemic bias. To make the lesson even more tangible, let’s apply the same reasoning to an unexpected source–ChatGPT (this sentence brought to you by ChatGPT).
This author interacts with ChatGPT about baseball all the time, so I went to ChatGPT’s temporary chat and supplied the following:
The answer was pretty good! ChatGPT listed only 9 players who began their careers in 1950 or earlier in its top 25 — the same count as ebWAR.
However, a closer look reveals a notable difference. ChatGPT’s top-10 list contains six older-era players, while ebWAR includes only three. The Binomial probability of observing at least six older-era players in a top-10 list is exceedingly small:
pbinom(6-1, size = 10, prob = p, lower.tail = FALSE)
## [1] 0.0266163
Meanwhile, the ebWAR list aligns closely with expectation:
# ebWAR count
pbinom(3-1, size = 10, prob = p, lower.tail = FALSE)
## [1] 0.5212305
# expectation
10*p
## [1] 2.657831
A very close inspection of ChatGPT’s top 25 “greatest baseball players of all time” list suggests that the AI may be conflating legacy and ability. To test this further, I asked ChatGPT to “also list the top 25 best baseball players of all time.” The results below again include six players who began their careers at or before 1950.
The results are summarized below.
dat = data.frame(list = c("ChatGPT Greatest Ever", "ChatGPT Best Ever", "ebWAR"),
count = c(6, 6, 3))
dat %>%
mutate(color = ifelse(list == "ebWAR", "#377eb8", "#e41a1c")) %>%
ggplot(aes(x = list, y = count, fill = color)) +
aes(x = list, y = count) +
geom_col() +
scale_fill_identity() +
labs(title = "Pre-1950 Players in Top-10 All-Time Lists",
subtitle = "ChatGPT lists overrepresent older eras; era-adjusted WAR corrects this bias",
x = "",
y = "Count") +
theme_minimal(base_size = 12) +
theme(plot.title = element_text(face = "bold"))
Even AI can fall for the nostalgia trap!
ChatGPT’s lists mirror what this author routinely encounters in baseball debates: fans instinctively elevate the legends who have endured in cultural memory, then fill the remaining slots with stars from more modern eras. If they were more consistent, and had better recall, names like Lefty Grove (click the link for a student made video on Lefty Grove through the lens of era-adjusted stats), Tris Speaker, and Eddie Collins would appear too, as might Kid Nichols, Jimmie Foxx, and Nap Lajoie. After all, when comparing to contemporaries, Speaker is much closer to Ty Cobb in career WAR than Mickey Mantle is to Willie Mays. Instead, those figures have faded into the sands of time, leaving a heavily nostalgic top ten wrapped in what appears to be a reasonably balanced top twenty-five–more a lucky accident than a principled outcome.
By accounting for changes in the competitive environment, era-adjusted metrics correct the imbalance observed in traditional WAR-based rankings and other lists of the greatest all-time players, human- or AI-derived. This demonstrates that era-adjusted WAR provides a fairer and more historically coherent measure of player greatness, a conclusion strongly supported by a simple Binomial model introduced here.
Those interested in exploring more advanced modeling approaches can find a concise introduction to Full House Modeling (FHM) and its applications to era-adjusted baseball statistics in the FHM primer, a vignette included with the Lahman R package. The primer builds statistical intuition for the probability integral transform and the distribution of order statistics, and shows how these concepts together form the foundation of Full House Modeling. Along with this write-up, it is intended as a teaching resource that connects introductory probability ideas to modern statistical modeling.
This example demonstrates how even introductory probability models, when paired with real data and domain context, can reveal deeper insights and bridge the gap between classical classroom statistics and contemporary research.
Reproducible code, datasets (including ebWAR/efWAR), and
teaching-ready examples are available in the fullhouse R
package repository:
https://github.com/DEck13/fullhouse/tree/main
Use these materials freely in your courses or workshops (with attribution). The repo includes vignette-style notebooks, figures, and helper functions that match this write-up.
Daniel J. Eck (2020). Challenging Nostalgia and Performance Metrics in Baseball. CHANCE. DOI: 10.1080/09332480.2020.1726114
Shen Yan, Adrian Burgos Jr., Christopher Kinson, and Daniel J. Eck, (2025). Comparing baseball players across eras via novel Full House Modeling. Annals of Applied Statistics, 19(2). DOI: 10.1214/24-AOAS1992