The Bayesian Framework
Estimated Plus-Minus (EPM) emerged as a leading modern all-in-one metric through its rigorous application of Bayesian statistical methodology. Developed by Taylor Snarr at Dunks and Threes, EPM addresses a fundamental challenge in player evaluation: quantifying uncertainty. Unlike metrics that produce single point estimates, EPM generates probability distributions for each player's impact.
The Bayesian framework provides a principled approach to combining different information sources. Prior beliefs about player ability, based on box scores, age, and physical measurements, are updated with observed plus-minus data to produce posterior distributions. This mirrors how experts actually reason about players: starting with expectations and adjusting as new evidence accumulates.
Bayesian Foundations
Bayesian statistics treats parameters as random variables whose probability distributions reflect our uncertainty about them. At its core is Bayes' theorem, $P(\theta \mid D) \propto P(D \mid \theta)\,P(\theta)$, where $\theta$ is a player's true impact and $D$ is the observed data. Our belief about that impact after seeing the data (the posterior) combines the likelihood of observing the player's statistics at each possible impact level with our prior beliefs about their ability.
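Bayes' theorem can be applied literally by evaluating prior times likelihood on a grid of candidate impact values and then normalizing. The sketch below assumes a normal prior and normal measurement noise; the prior of +2.0 (sd 1.5) and the observed signal of +6.0 (noise sd 3.0) are hypothetical numbers, and this is a teaching example rather than EPM's actual model.

```python
from statistics import NormalDist

# Candidate values of a player's true impact theta (points per 100 possessions)
grid = [i / 10 for i in range(-100, 101)]  # -10.0 to +10.0 in steps of 0.1

prior = NormalDist(2.0, 1.5)     # hypothetical box-score prior P(theta)
observed, noise_sd = 6.0, 3.0    # hypothetical on-court signal D and its noise

# Unnormalized posterior: likelihood P(D | theta) times prior P(theta) at each grid point
unnorm = [NormalDist(theta, noise_sd).pdf(observed) * prior.pdf(theta) for theta in grid]
total = sum(unnorm)
posterior = [u / total for u in unnorm]  # normalize so the probabilities sum to 1

post_mean = sum(t * p for t, p in zip(grid, posterior))
print(f"posterior mean ≈ {post_mean:+.2f}")
```

Because both pieces are normal here, the grid answer matches the closed-form conjugate result of +2.80: the posterior lands between the prior and the data, closer to whichever is more precise. The grid approach also works for non-normal priors or likelihoods where no closed form exists.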
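For EPM, prior distributions come from models that predict expected plus-minus from box score statistics, age curves, and positional information. The width of each prior reflects how uncertain that prediction is: a narrow prior pulls the final estimate strongly toward the box-score prediction, while a wide prior lets the observed plus-minus data dominate.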
Uncertainty Quantification
EPM's most distinctive feature is explicit uncertainty reporting. Each player receives not just a point estimate but a full probability distribution. The 90% credible interval is the range that, under the model, contains the player's true impact with 90% posterior probability. Narrow intervals indicate high confidence; wide intervals signal substantial uncertainty.
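For a posterior that is approximately normal, the 90% credible interval follows directly from the posterior mean and standard deviation. The sketch assumes normality; the mean of +3.0 and standard deviation of 0.9 are hypothetical. For a non-normal posterior represented by samples, the same equal-tailed interval would come from the 5th and 95th sample percentiles.

```python
from statistics import NormalDist

def credible_interval(post_mean, post_sd, level=0.90):
    """Equal-tailed credible interval, assuming a normal posterior."""
    tail = (1.0 - level) / 2.0  # probability left in each tail
    posterior = NormalDist(post_mean, post_sd)
    return posterior.inv_cdf(tail), posterior.inv_cdf(1.0 - tail)

# Hypothetical posterior: mean +3.0, sd 0.9
lo, hi = credible_interval(3.0, 0.9)
print(f"90% credible interval: [{lo:+.2f}, {hi:+.2f}]")
```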
Consider two players with identical point estimates of +3.0. One might have a 90% credible interval of [+1.5, +4.5], indicating high confidence. The other might show [-1.0, +7.0], suggesting the player might be anywhere from below average to elite. These players should be evaluated very differently despite identical ratings.
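The difference between the two players becomes concrete once their intervals are translated into probabilities. Assuming each 90% interval comes from a normal posterior (an assumption, since real posteriors need not be symmetric), the sketch recovers each posterior and asks how likely each player is to be above average (impact above 0) or to clear a hypothetical "elite" threshold of +5. "Player A" and "Player B" are placeholder names.

```python
from statistics import NormalDist

Z90 = NormalDist().inv_cdf(0.95)  # ≈ 1.645: half-width of a 90% interval in sd units
ELITE = 5.0                       # hypothetical threshold for "elite" impact

def posterior_from_interval(lo, hi):
    """Recover a normal posterior, assuming [lo, hi] is its equal-tailed 90% interval."""
    return NormalDist((lo + hi) / 2.0, (hi - lo) / (2.0 * Z90))

results = {}
for name, (lo, hi) in {"Player A": (1.5, 4.5), "Player B": (-1.0, 7.0)}.items():
    post = posterior_from_interval(lo, hi)
    p_above_avg = 1.0 - post.cdf(0.0)  # probability true impact is above average
    p_elite = 1.0 - post.cdf(ELITE)    # probability true impact exceeds +5
    results[name] = (p_above_avg, p_elite)
    print(f"{name}: P(> 0) = {p_above_avg:.3f}, P(> +5) = {p_elite:.3f}")
```

Under these assumptions, the narrow-interval player is above average with probability of roughly 0.999 but clears +5 only about 1% of the time, while the wide-interval player is below average about 11% of the time yet clears +5 about 21% of the time.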
Hierarchical Extensions
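Full EPM implementations use hierarchical models that share information across players and seasons. Players with similar profiles inform each other's estimates: a player with few minutes is pulled toward the typical value for comparable players, while a player with a large sample keeps an estimate close to their own data. The hierarchical structure also enables modeling of time-varying ability through age curves.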
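The information-sharing idea can be sketched with empirical-Bayes partial pooling, a common approximation to a full hierarchical model. It assumes each player's raw estimate is their true impact plus normal noise of known variance, and that true impacts within a peer group are normally distributed around a group mean. The group, estimates, and noise variances below are hypothetical, and this is not EPM's implementation.

```python
import statistics

def partial_pool(raw, noise_var):
    """Empirical-Bayes shrinkage of noisy player estimates toward their group mean.

    raw: observed per-player impact estimates within one peer group
    noise_var: sampling variance of each raw estimate (larger for low-minute players)
    """
    group_mean = statistics.fmean(raw)
    # Method-of-moments estimate of the true between-player variance (tau^2)
    tau2 = max(0.0, statistics.pvariance(raw) - statistics.fmean(noise_var))
    pooled = []
    for y, v in zip(raw, noise_var):
        # Weight on the player's own data: high when their estimate is precise
        w = tau2 / (tau2 + v) if tau2 + v > 0 else 0.0
        pooled.append(group_mean + w * (y - group_mean))
    return group_mean, tau2, pooled

# Hypothetical peer group of five players
raw = [4.0, -2.0, 1.0, 7.0, 0.0]
noise_var = [1.0, 1.0, 4.0, 16.0, 2.0]
group_mean, tau2, pooled = partial_pool(raw, noise_var)
print(group_mean, round(tau2, 2), [round(p, 2) for p in pooled])
```

The +7.0 player with the noisiest estimate (variance 16, as a small-minutes sample might have) is pulled furthest toward the group mean of +2.0, ending near +3.2, while the two low-noise players keep most of their own signal. A full hierarchical model would estimate the group parameters jointly with the players and could add season and age terms.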
Practical Applications
EPM and Bayesian approaches shine in contexts where uncertainty matters: draft evaluation, trade analysis, and roster construction. A team considering trading for a player wants to know not just expected value but the probability distribution of outcomes. Is there significant upside? What's the downside risk?
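Questions about upside and downside can be answered by simulation: draw plausible true-impact values from each player's posterior and examine the distribution of the outcome. This sketch compares an incoming and an outgoing player in a hypothetical trade, assuming independent normal posteriors with made-up parameters and ignoring minutes, role, and fit; the same sampling approach works with posterior draws from any model.

```python
import random
import statistics

rng = random.Random(42)  # seeded for reproducibility
N = 100_000              # number of simulated draws

# Hypothetical normal posteriors (mean, sd) in points per 100 possessions
incoming = (2.5, 2.4)  # trade target: higher estimate, wide uncertainty
outgoing = (1.5, 0.9)  # player sent away: lower estimate, well established

# Draw a plausible true impact for each player and record the difference
gain = [rng.gauss(*incoming) - rng.gauss(*outgoing) for _ in range(N)]

expected_gain = statistics.fmean(gain)
p_improves = sum(g > 0 for g in gain) / N
deciles = statistics.quantiles(gain, n=10)  # 9 cut points
downside, upside = deciles[0], deciles[-1]  # 10th and 90th percentiles
print(f"expected gain {expected_gain:+.2f}, P(gain > 0) = {p_improves:.2f}")
print(f"10th percentile {downside:+.2f}, 90th percentile {upside:+.2f}")
```

With these inputs the expected gain is about +1.0, yet the trade makes the team worse roughly a third of the time and the 10th percentile sits near -2.3, a risk profile that a single point estimate would hide.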
Implementation in R
# Understanding EPM (Estimated Plus-Minus)
library(tidyverse)
# Simplified, EPM-style box-score composite.
# Illustrative only: the weights are hand-picked for demonstration,
# are not EPM's actual coefficients, and no Bayesian updating happens here.
calculate_epm <- function(player_stats) {
  player_stats %>%
    mutate(
      # Offensive components
      scoring_value = (ts_pct - 0.55) * pts_100 * 0.5,  # efficiency vs. a 55% TS baseline
      playmaking_value = ast_100 * 0.15 - tov_100 * 0.2,
      spacing_value = fg3_rate * fg3_pct * 0.3,
      # Defensive components
      rim_protection = blk_100 * 0.15,
      perimeter_defense = stl_100 * 0.12,
      rebounding_value = drb_100 * 0.05,
      # Combined totals
      epm_offense = scoring_value + playmaking_value + spacing_value,
      epm_defense = rim_protection + perimeter_defense + rebounding_value,
      epm_total = epm_offense + epm_defense
    )
}
player_stats <- read_csv("player_advanced.csv")
epm_data <- calculate_epm(player_stats)
# Top 15 players by composite score, minimum 1,500 minutes
top_epm <- epm_data %>%
  filter(min >= 1500) %>%
  arrange(desc(epm_total)) %>%
  select(player_name, epm_total, epm_offense, epm_defense) %>%
  head(15)
print(top_epm)
Implementation in Python
# Understanding EPM (Estimated Plus-Minus)
import pandas as pd
def calculate_epm(player_stats: pd.DataFrame) -> pd.DataFrame:
    """Simplified, EPM-style box-score composite.

    Illustrative only: the weights are hand-picked for demonstration, are not
    EPM's actual coefficients, and no Bayesian updating happens here.
    """
    df = player_stats.copy()  # avoid mutating the caller's DataFrame
    # Offensive components
    df["scoring_value"] = (df["ts_pct"] - 0.55) * df["pts_100"] * 0.5  # efficiency vs. a 55% TS baseline
    df["playmaking_value"] = df["ast_100"] * 0.15 - df["tov_100"] * 0.2
    df["spacing_value"] = df["fg3_rate"] * df["fg3_pct"] * 0.3
    # Defensive components
    df["rim_protection"] = df["blk_100"] * 0.15
    df["perimeter_defense"] = df["stl_100"] * 0.12
    df["rebounding_value"] = df["drb_100"] * 0.05
    # Combined totals
    df["epm_offense"] = df["scoring_value"] + df["playmaking_value"] + df["spacing_value"]
    df["epm_defense"] = df["rim_protection"] + df["perimeter_defense"] + df["rebounding_value"]
    df["epm_total"] = df["epm_offense"] + df["epm_defense"]
    return df
player_stats = pd.read_csv("player_advanced.csv")
epm_data = calculate_epm(player_stats)
# Top 15 players by composite score, minimum 1,500 minutes
top_epm = epm_data[epm_data["min"] >= 1500].nlargest(15, "epm_total")[
    ["player_name", "epm_total", "epm_offense", "epm_defense"]
]
print(top_epm)
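The `calculate_epm` functions compute only a box-score composite; nothing is updated with on-court data. A minimal sketch of that missing Bayesian step treats `epm_total` as the mean of a normal prior and combines it with an adjusted plus-minus estimate whose noise shrinks as possessions grow. The constants `PRIOR_SD` and `NOISE_SD_PER_100`, the field names `adj_pm_100` and `possessions`, and the two players are hypothetical assumptions, not EPM's published values.

```python
import math

# Hypothetical constants (illustrative assumptions, not EPM's values)
PRIOR_SD = 2.0           # assumed uncertainty of the box-score composite used as a prior
NOISE_SD_PER_100 = 12.0  # assumed noise of an on-court estimate from 100 possessions

def update_with_on_court(epm_total, adj_pm_100, possessions):
    """Normal-normal update: box-score prior plus on-court evidence.

    Returns (posterior_mean, posterior_sd) in points per 100 possessions.
    """
    # Sampling noise shrinks with the square root of the sample size
    obs_sd = NOISE_SD_PER_100 / math.sqrt(possessions / 100.0)
    w_prior = 1.0 / PRIOR_SD**2  # precision of the prior
    w_obs = 1.0 / obs_sd**2      # precision of the on-court evidence
    mean = (w_prior * epm_total + w_obs * adj_pm_100) / (w_prior + w_obs)
    return mean, math.sqrt(1.0 / (w_prior + w_obs))

# Hypothetical players: same composite and on-court signal, different sample sizes
players = [
    {"player_name": "Player X", "epm_total": 1.0, "adj_pm_100": 8.0, "possessions": 300},
    {"player_name": "Player Y", "epm_total": 1.0, "adj_pm_100": 8.0, "possessions": 5000},
]
posteriors = {}
for p in players:
    mean, sd = update_with_on_court(p["epm_total"], p["adj_pm_100"], p["possessions"])
    posteriors[p["player_name"]] = (mean, sd)
    print(f'{p["player_name"]}: {mean:+.2f} ± {sd:.2f}')
```

With identical composites and on-court numbers, the 300-possession player stays near the prior (about +1.5), while the 5,000-possession player moves most of the way toward the on-court signal (about +5.1) with a narrower posterior. Applying this to `epm_data` would require an on-court estimate and possession count for each player, which the CSV above is not shown to contain.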