The Projection Challenge
Player projections must account for past performance, aging effects, role changes, and injury history. A naive projection that assumes last season will repeat fails for two reasons: single-season performance is noisy, and skills change systematically with age. More sophisticated projections weight multiple years, apply age adjustments, and regress toward appropriate baselines.
Multi-Year Weighting
Recent seasons predict future performance better than distant ones, but relying solely on the most recent year ignores stable skill components. Most projection systems weight the last 3-4 seasons, with declining weight as seasons recede. This approach balances recency with sample size.
def weighted_projection(season_stats, weights=(0.5, 0.3, 0.15, 0.05)):
    """Project stats as a weighted average of recent seasons.

    season_stats: list of per-season stat dicts, most recent season first.
    """
    if not season_stats:
        return None
    projection = {}
    for stat in season_stats[0]:
        weighted_sum = 0.0
        weight_sum = 0.0
        # zip stops at the shorter input, so unused weights are ignored
        for season, weight in zip(season_stats, weights):
            if stat not in season:
                continue  # skip missing values instead of counting them as 0
            weighted_sum += season[stat] * weight
            weight_sum += weight
        # Renormalize by the weights actually used
        projection[stat] = weighted_sum / weight_sum if weight_sum > 0 else 0
    return projection
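To illustrate how recency can be balanced against sample size, the sketch below scales each season's recency weight by minutes played, so an injury-shortened season counts for less. The function name `minutes_weighted_rate`, the `minutes` field, and the sample numbers are assumptions made for illustration; they are not part of the function above.

```python
def minutes_weighted_rate(seasons, stat, recency_weights=(0.5, 0.3, 0.15, 0.05)):
    """Average a rate stat across seasons (most recent first).

    Each season's weight is recency weight * minutes played. This is an
    illustrative assumption, not a fitted weighting scheme.
    """
    numerator = 0.0
    denominator = 0.0
    for season, recency in zip(seasons, recency_weights):
        weight = recency * season["minutes"]
        numerator += weight * season[stat]
        denominator += weight
    return numerator / denominator if denominator > 0 else None


# Hypothetical player: points per 36 minutes, most recent season first
seasons = [
    {"minutes": 400, "pts_per36": 24.0},   # injury-shortened season
    {"minutes": 2400, "pts_per36": 18.0},
    {"minutes": 2000, "pts_per36": 17.0},
]
projected = minutes_weighted_rate(seasons, "pts_per36")
print(round(projected, 2))
```

With recency weights alone, the short 24.0 season would dominate the average; weighting by minutes pulls the projection back toward the two full seasons.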
Age Adjustments
Player skills evolve with age following predictable patterns. Most skills peak in the mid-to-late 20s, with gradual improvement before and decline after. Projections apply age-specific adjustments to account for expected development or deterioration.
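A minimal sketch of an age adjustment follows, assuming a single multiplier applied to every stat. The peak age of 27 and the yearly growth and decline rates are placeholder values chosen for illustration, not estimates from data, and the function names are hypothetical.

```python
def age_multiplier(age_next_season, peak_age=27,
                   growth_per_year=0.02, decline_per_year=0.015):
    """Return the expected one-year change as a multiplier.

    Placeholder curve: small improvement up to the peak age,
    small decline afterwards.
    """
    if age_next_season <= peak_age:
        return 1.0 + growth_per_year
    return 1.0 - decline_per_year


def apply_age_adjustment(projection, age_next_season):
    """Scale every projected stat by the age multiplier."""
    factor = age_multiplier(age_next_season)
    return {stat: value * factor for stat, value in projection.items()}


baseline = {"pts_per36": 20.0}
young = apply_age_adjustment(baseline, age_next_season=23)
veteran = apply_age_adjustment(baseline, age_next_season=31)
print(young, veteran)
```

In practice different skills are likely to age differently, and stats where lower is better (turnovers, for example) would need the adjustment inverted, so one multiplier for all stats is a simplification.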
Regression to the Mean
Extreme performances, both good and bad, tend to move back toward average in subsequent seasons. Projections therefore shrink each statistic toward a baseline, by an amount that depends on how reliable the statistic is. Highly reliable stats (free throw percentage) regress less than noisy ones (three-point percentage over small samples).
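One common way to express this is to shrink the observed value toward the baseline with a reliability of n / (n + k), where n is the sample size and k is a constant that controls how quickly the stat is trusted. The sketch below assumes that form; the league mean of 0.36 and k = 750 attempts are placeholder values for illustration only.

```python
def regress_to_mean(observed, baseline, sample_size, k):
    """Shrink an observed rate toward a baseline.

    reliability = n / (n + k); a larger k means more regression.
    """
    if sample_size < 0 or k <= 0:
        raise ValueError("sample_size must be >= 0 and k must be > 0")
    reliability = sample_size / (sample_size + k)
    return reliability * observed + (1 - reliability) * baseline


# Hypothetical shooter hitting 45% from three (all inputs are placeholders)
small_sample = regress_to_mean(0.45, baseline=0.36, sample_size=100, k=750)
large_sample = regress_to_mean(0.45, baseline=0.36, sample_size=1000, k=750)
print(round(small_sample, 3), round(large_sample, 3))
```

The same 45% is pulled much closer to the baseline when it rests on 100 attempts than when it rests on 1,000, which is the behavior described above.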
Implementation in R
# Win probability model
library(tidyverse)

# margin: point differential from the team's perspective
# time_remaining: seconds left in the game
# possession: "offense" if the team has the ball, anything else otherwise
calculate_win_probability <- function(margin, time_remaining, possession) {
  # Logistic model coefficients (simplified, illustrative values; not fitted)
  intercept <- 0
  margin_coef <- 0.15
  time_coef <- -0.001
  interaction_coef <- -0.0001

  # Linear predictor
  lp <- intercept +
    margin_coef * margin +
    time_coef * time_remaining +
    interaction_coef * margin * time_remaining

  # Adjust for possession
  if (possession == "offense") {
    lp <- lp + 0.03
  }

  # Convert to probability
  1 / (1 + exp(-lp))
}

# Example: Team leading by 5 with 2 minutes left
wp <- calculate_win_probability(margin = 5, time_remaining = 120,
                                possession = "offense")
print(paste0("Win Probability: ", round(wp * 100, 1), "%"))
# Win probability chart for a game
library(tidyverse)  # attaches ggplot2, purrr, dplyr, and readr

# game_pbp needs columns: home_margin, time_remaining, game_time_elapsed
plot_win_probability <- function(game_pbp) {
  game_pbp %>%
    mutate(
      wp = map2_dbl(home_margin, time_remaining,
                    ~calculate_win_probability(.x, .y, "neutral"))
    ) %>%
    ggplot(aes(x = game_time_elapsed, y = wp)) +
    # linewidth replaces size for lines in ggplot2 3.4.0 and later
    geom_line(color = "#1d428a", linewidth = 1) +
    geom_hline(yintercept = 0.5, linetype = "dashed", color = "gray50") +
    scale_y_continuous(labels = scales::percent, limits = c(0, 1)) +
    labs(
      title = "Win Probability Chart",
      x = "Game Time",
      y = "Home Team Win Probability"
    ) +
    theme_minimal()
}

game <- read_csv("game_pbp.csv")
plot_win_probability(game)