Modeling Game Outcomes
Game outcome prediction typically uses team strength estimates, home court advantage, and situational factors (rest, travel, injuries) to forecast results. The fundamental approach regresses historical outcomes on these inputs to learn how each factor predicts winning.
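As a sketch of that regression approach, the snippet below fits a logistic regression of game outcomes on a rating difference, a rest-day difference, and a home indicator. The data here is simulated and the effect sizes are illustrative assumptions, not figures from the text.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 2000
rating_diff = rng.normal(0, 5, n)   # team A rating minus team B rating
rest_diff = rng.integers(-2, 3, n)  # team A rest days minus team B rest days
home = rng.integers(0, 2, n)        # 1 if team A hosts the game
X = np.column_stack([rating_diff, rest_diff, home])

# Simulate outcomes so stronger, better-rested, home teams win more often
# (the 0.12 / 0.10 / 0.35 coefficients are made up for illustration)
logit = 0.12 * rating_diff + 0.10 * rest_diff + 0.35 * home
y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

model = LogisticRegression().fit(X, y)
coefs = dict(zip(["rating_diff", "rest_diff", "home"], model.coef_[0]))
```

With enough games, the fitted coefficients recover the direction of each factor's effect, which is exactly the "learn how each factor predicts winning" step described above.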
Team Strength Estimation
The foundation of game prediction is accurate team strength measurement. Elo ratings, Simple Rating System (SRS), and point-differential-based power rankings all attempt to quantify team quality. More sophisticated approaches use margin-adjusted results with opponent adjustments to estimate true team strength.
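A minimal Elo implementation illustrates the mechanics of one such rating system. The K-factor and the home-court bonus (expressed in Elo points) below are illustrative defaults, not values prescribed by the text.

```python
def elo_expected(rating_a, rating_b, hca=70.0):
    """Expected win probability for team A hosting team B (hca in Elo points)."""
    return 1 / (1 + 10 ** (-(rating_a + hca - rating_b) / 400))

def elo_update(rating_a, rating_b, a_won, k=20.0, hca=70.0):
    """Return both teams' updated ratings after a game hosted by team A.

    The update is zero-sum: whatever the winner gains, the loser gives up.
    """
    exp_a = elo_expected(rating_a, rating_b, hca)
    delta = k * ((1.0 if a_won else 0.0) - exp_a)
    return rating_a + delta, rating_b - delta
```

Note that two evenly rated teams do not split the expected probability 50/50: the home-court bonus shifts the expectation toward the host, so an upset road win moves the ratings more than a routine home win.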
def simple_game_prediction(home_rating, away_rating, hca=3.0):
    """Predict game outcome using team ratings and home court advantage."""
    predicted_margin = home_rating - away_rating + hca
    # Convert margin to win probability
    # Approximately: each point of margin = 3.5% win probability
    win_prob = 0.5 + (predicted_margin * 0.035)
    win_prob = max(0.05, min(0.95, win_prob))  # Bound probability
    return {
        'predicted_margin': round(predicted_margin, 1),
        'home_win_prob': round(win_prob, 3)
    }
Incorporating Injuries
Injuries significantly affect game predictions but are challenging to incorporate systematically. Player impact estimates allow converting injury news to team strength adjustments. When a player who provides +5 points per 100 possessions is out, the team's expected performance drops accordingly.
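One hedged way to make that conversion concrete is to scale the player's per-100-possession impact by his share of game minutes. The helper below assumes roughly 100 possessions per 48 minutes; both the function and that pace assumption are a sketch, not a definitive method.

```python
def injury_adjustment(impact_per_100, minutes_per_game, poss_per_48=100.0):
    """Approximate per-game points of margin lost when a player sits out.

    impact_per_100  : player's estimated net impact per 100 possessions
    minutes_per_game: minutes the player would normally play
    poss_per_48     : assumed team possessions in a 48-minute game
    """
    minutes_share = minutes_per_game / 48.0
    return impact_per_100 * (poss_per_48 / 100.0) * minutes_share
```

Under these assumptions, losing a +5-per-100 player who plays 36 minutes costs about 3.75 points of expected margin, which would then be subtracted from the injured team's rating before prediction.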
Prediction Accuracy
State-of-the-art game prediction models pick winners correctly roughly 68-70% of the time. This is only a modest edge over simple baselines (always picking the home team has historically won roughly 58-60% of games), and sustained accuracy against the spread much above the low 50s is rare, highlighting basketball's inherent unpredictability. Models that claim dramatically higher accuracy are likely overfitting or measuring on non-representative samples.
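When vetting a model's claimed accuracy, it helps to score its probability forecasts directly rather than only counting correct picks. The sketch below computes straight-up accuracy and the Brier score (mean squared error of the probabilities, lower is better); the function name and interface are my own.

```python
import numpy as np

def evaluate_predictions(probs, outcomes):
    """Score win-probability forecasts against 0/1 game outcomes."""
    probs = np.asarray(probs, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    # Straight-up accuracy: did the favored side (prob > 0.5) actually win?
    accuracy = float(np.mean((probs > 0.5) == (outcomes == 1)))
    # Brier score rewards calibration, not just picking the right side
    brier = float(np.mean((probs - outcomes) ** 2))
    return {"accuracy": accuracy, "brier": brier}
```

A model can post a decent accuracy while being badly calibrated (e.g. always outputting 0.95 for favorites), which the Brier score would expose; scoring both guards against the overfitting claims mentioned above.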
Implementation in R
# Player performance projection
library(tidyverse)

project_player_stats <- function(career_data, player_ages) {
  # Marcel-style projection
  career_data %>%
    arrange(player_id, desc(season)) %>%
    group_by(player_id) %>%
    slice_head(n = 3) %>%
    summarise(
      seasons = n(),
      # Weighted average (recent seasons weighted more)
      proj_pts = weighted.mean(pts, c(5, 3, 2)[1:n()]),
      proj_reb = weighted.mean(reb, c(5, 3, 2)[1:n()]),
      proj_ast = weighted.mean(ast, c(5, 3, 2)[1:n()]),
      .groups = "drop"
    ) %>%
    left_join(player_ages, by = "player_id") %>%
    mutate(
      # Age adjustment
      age_factor = case_when(
        age < 24 ~ 1.05,   # Improvement expected
        age <= 28 ~ 1.00,  # Prime
        age <= 32 ~ 0.97,  # Slight decline
        TRUE ~ 0.93        # Decline
      ),
      proj_pts = proj_pts * age_factor,
      proj_reb = proj_reb * age_factor,
      proj_ast = proj_ast * age_factor
    )
}

career <- read_csv("player_career_stats.csv")
ages <- read_csv("player_ages.csv")
projections <- project_player_stats(career, ages)
print(projections)