Chapter 18 Advanced ~60 min read

EPM and Bayesian Approaches

Estimated Plus-Minus and the application of Bayesian methods to basketball player evaluation.

The Bayesian Framework

Estimated Plus-Minus (EPM) emerged as a leading modern all-in-one metric through its rigorous application of Bayesian statistical methodology. Developed by Taylor Snarr at Dunks and Threes, EPM addresses a fundamental challenge in player evaluation: quantifying uncertainty. Unlike metrics that produce single point estimates, EPM generates probability distributions for each player's impact.

The Bayesian framework provides a principled approach to combining different information sources. Prior beliefs about player ability (based on box scores, age, and physical measurements) are updated with observed plus-minus data to produce posterior distributions. This mirrors how experts actually reason about players: starting with expectations and adjusting as new evidence accumulates.

Bayesian Foundations

Bayesian statistics treats parameters as random variables whose probability distributions reflect uncertainty. The core is Bayes' theorem: P(θ|D) ∝ P(D|θ) × P(θ), where θ is a player's true impact and D is the observed data. The posterior P(θ|D), our updated belief about that impact, combines the likelihood P(D|θ) of observing the data at each candidate impact level with the prior P(θ), our belief before seeing the data.

For EPM, prior distributions come from models predicting expected plus-minus based on box score statistics, age curves, and positional information. The width of these prior distributions reflects prediction uncertainty.
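
To make the prior-to-posterior update concrete, here is a minimal sketch that assumes both the prior and the observed plus-minus signal are normal with known standard deviations (the conjugate normal-normal case). The function name normal_posterior and every number in it are hypothetical illustrations, not values or code from EPM itself.

```python
def normal_posterior(prior_mean, prior_sd, obs_mean, obs_sd):
    """Combine a normal prior with a normal observation (conjugate update).

    The posterior mean is a precision-weighted average of the prior mean
    and the observed value; precision is 1 / variance.
    """
    prior_precision = 1.0 / prior_sd ** 2
    obs_precision = 1.0 / obs_sd ** 2
    post_var = 1.0 / (prior_precision + obs_precision)
    post_mean = post_var * (prior_precision * prior_mean + obs_precision * obs_mean)
    return post_mean, post_var ** 0.5


# Hypothetical inputs: a box-score prior of +1.0 (sd 2.0) and a noisy
# observed plus-minus signal of +5.0 (sd 4.0).
post_mean, post_sd = normal_posterior(
    prior_mean=1.0, prior_sd=2.0, obs_mean=5.0, obs_sd=4.0
)
print(f"posterior mean = {post_mean:.2f}, posterior sd = {post_sd:.2f}")
```

With these made-up inputs the posterior mean lands near +1.8, closer to the prior than to the observation, because the prior is the more precise of the two sources. The posterior standard deviation is also smaller than either input, which is the sense in which evidence narrows the distribution.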

Uncertainty Quantification

EPM's most distinctive feature is explicit uncertainty reporting. Each player receives not just a point estimate but a full probability distribution. The 90% credible interval is the range that contains the player's true impact with 90% posterior probability, given the model and the data. Narrow intervals indicate high confidence; wide intervals signal substantial uncertainty.

Consider two players with identical point estimates of +3.0. One might have a 90% credible interval of [+1.5, +4.5], indicating high confidence. The other might show [-1.0, +7.0], suggesting the player might be anywhere from below average to elite. These players should be evaluated very differently despite identical ratings.
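
The two intervals above can be turned into comparable numbers. The sketch below assumes each posterior is approximately normal and centered on the +3.0 point estimate, which is an assumption made for illustration; the helper summarize is hypothetical. It recovers the implied standard deviation from each 90% interval and the probability that the player is above average (impact greater than 0).

```python
from statistics import NormalDist

# A central 90% interval spans the 5th to the 95th percentile,
# i.e. roughly 1.645 standard deviations on each side of the mean.
Z90 = NormalDist().inv_cdf(0.95)


def summarize(point, lower, upper):
    """Return (implied sd, P(impact > 0)) for a normal posterior."""
    sd = (upper - lower) / (2 * Z90)
    posterior = NormalDist(mu=point, sigma=sd)
    return sd, 1 - posterior.cdf(0.0)


sd_a, p_a = summarize(3.0, 1.5, 4.5)    # narrow interval
sd_b, p_b = summarize(3.0, -1.0, 7.0)   # wide interval

print(f"Player A: sd = {sd_a:.2f}, P(above average) = {p_a:.3f}")
print(f"Player B: sd = {sd_b:.2f}, P(above average) = {p_b:.3f}")
```

Under the normality assumption, the first player is almost certainly above average, while the second carries roughly a one-in-ten chance of being below average despite the identical point estimate.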

Hierarchical Extensions

Full EPM implementations use hierarchical models that share information across players and seasons. Players with similar profiles inform each other's estimates. The hierarchical structure also enables modeling of time-varying ability through age curves.
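
The information sharing described above is often called partial pooling. As a rough sketch of the idea only, the snippet below shrinks each player's raw estimate toward the group mean, with less shrinkage for players who have more possessions. It treats the between-player variance and the per-possession noise as known constants and uses an unweighted group mean, whereas a full hierarchical model would estimate these jointly with the player effects. All names and numbers are hypothetical.

```python
from statistics import mean


def shrink_toward_group(raw, possessions, noise_var_per_poss, group_var):
    """Partial pooling: pull each raw estimate toward the group mean.

    weight = group_var / (group_var + sampling_var), where the sampling
    variance falls as the number of possessions grows.
    """
    group_mean = mean(raw)
    shrunk = []
    for estimate, n in zip(raw, possessions):
        sampling_var = noise_var_per_poss / n
        weight = group_var / (group_var + sampling_var)
        shrunk.append(group_mean + weight * (estimate - group_mean))
    return shrunk


# Hypothetical group of similar players: two with the same raw +6.0 but
# very different sample sizes, and one with a raw -2.0.
raw = [6.0, 6.0, -2.0]
possessions = [500, 5000, 5000]
shrunk = shrink_toward_group(raw, possessions, noise_var_per_poss=10000.0, group_var=4.0)

for r, n, s in zip(raw, possessions, shrunk):
    print(f"raw = {r:+.1f} on {n} possessions -> shrunk = {s:+.2f}")
```

The two players with identical raw values end up with different estimates: the low-possession player is pulled much closer to the group mean because his own data say less.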

Practical Applications

EPM and Bayesian approaches shine in contexts where uncertainty matters: draft evaluation, trade analysis, and roster construction. A team considering trading for a player wants to know not just expected value but the probability distribution of outcomes. Is there significant upside? What's the downside risk?
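
For the trade question, one simple way to frame upside and downside is to look at the distribution of the difference between the incoming and outgoing players. The sketch below assumes the two posteriors are normal and independent, which is a simplifying assumption; the function compare_players and the example numbers are hypothetical.

```python
from math import hypot
from statistics import NormalDist


def compare_players(mean_in, sd_in, mean_out, sd_out):
    """Distribution of (incoming - outgoing) impact.

    For independent normal posteriors the difference is normal, with the
    means subtracted and the variances added.
    """
    return NormalDist(mu=mean_in - mean_out, sigma=hypot(sd_in, sd_out))


# Hypothetical trade: an uncertain +2.5 player (sd 2.0) coming in,
# a well-established +1.5 player (sd 0.8) going out.
diff = compare_players(mean_in=2.5, sd_in=2.0, mean_out=1.5, sd_out=0.8)

p_upgrade = 1 - diff.cdf(0.0)   # chance the incoming player is better
p_big_loss = diff.cdf(-2.0)     # chance of losing two or more points of impact

print(f"P(upgrade) = {p_upgrade:.2f}, P(loss of 2+ points) = {p_big_loss:.2f}")
```

Even though the incoming player has the higher point estimate in this made-up case, the probability of an upgrade is only about two in three, and there is a non-trivial chance of a sizable downgrade. A point estimate alone would hide both facts.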

Implementation in R

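# Understanding EPM (Estimated Plus-Minus): a simplified box-score sketch
library(tidyverse)

# Simplified EPM-style calculation. Treat the weights as illustrative
# placeholders rather than fitted EPM coefficients; the output is a point
# estimate only, with no posterior distribution or credible interval.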
calculate_epm <- function(player_stats) {
  player_stats %>%
    mutate(
      # Offensive EPM components
      scoring_value = (ts_pct - 0.55) * pts_100 * 0.5,
      playmaking_value = ast_100 * 0.15 - tov_100 * 0.2,
      spacing_value = fg3_rate * fg3_pct * 0.3,

      # Defensive EPM components
      rim_protection = blk_100 * 0.15,
      perimeter_defense = stl_100 * 0.12,
      rebounding_value = drb_100 * 0.05,

      # Combined EPM
      epm_offense = scoring_value + playmaking_value + spacing_value,
      epm_defense = rim_protection + perimeter_defense + rebounding_value,
      epm_total = epm_offense + epm_defense
    )
}

player_stats <- read_csv("player_advanced.csv")
epm_data <- calculate_epm(player_stats)

# Top 15 players by total rating among those with at least 1,500 minutes
top_epm <- epm_data %>%
  filter(min >= 1500) %>%
  arrange(desc(epm_total)) %>%
  select(player_name, epm_total, epm_offense, epm_defense) %>%
  head(15)

print(top_epm)

Implementation in Python

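# Understanding EPM (Estimated Plus-Minus): a simplified box-score sketch
import pandas as pd

def calculate_epm(player_stats):
    """Simplified EPM-style calculation from per-100-possession box-score stats.

    Treat the weights as illustrative placeholders rather than fitted EPM
    coefficients; the result is a point estimate with no posterior.
    """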
    df = player_stats.copy()

    # Offensive EPM components
    df["scoring_value"] = (df["ts_pct"] - 0.55) * df["pts_100"] * 0.5
    df["playmaking_value"] = df["ast_100"] * 0.15 - df["tov_100"] * 0.2
    df["spacing_value"] = df["fg3_rate"] * df["fg3_pct"] * 0.3

    # Defensive EPM components
    df["rim_protection"] = df["blk_100"] * 0.15
    df["perimeter_defense"] = df["stl_100"] * 0.12
    df["rebounding_value"] = df["drb_100"] * 0.05

    # Combined EPM
    df["epm_offense"] = df["scoring_value"] + df["playmaking_value"] + df["spacing_value"]
    df["epm_defense"] = df["rim_protection"] + df["perimeter_defense"] + df["rebounding_value"]
    df["epm_total"] = df["epm_offense"] + df["epm_defense"]

    return df

player_stats = pd.read_csv("player_advanced.csv")
epm_data = calculate_epm(player_stats)

top_epm = epm_data[epm_data["min"] >= 1500].nlargest(15, "epm_total")[
    ["player_name", "epm_total", "epm_offense", "epm_defense"]
]
print(top_epm)
