The Prediction Challenge
Predicting basketball outcomes involves uncertainty at every level. Individual possessions are highly variable, games aggregate that variability, and seasons compound it further. Yet patterns exist within the noise: better teams tend to win more, better shooters tend to make more shots. Predictive modeling extracts these patterns to forecast future outcomes.
Types of Predictions
Basketball prediction encompasses multiple domains: game outcomes (will Team A beat Team B?), player performance (how will Player X perform next season?), career trajectories (will this rookie become a star?), and in-game situations (what's the win probability given current score and time?). Each domain requires different methodologies and data sources.
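As a taste of the last category, a win-probability estimate can be sketched as a function of score margin and time remaining. The shape below is purely illustrative: the logistic form and the 1.5 coefficient are assumptions, not values fitted to play-by-play data.

```python
import math

def in_game_win_probability(score_margin, seconds_remaining):
    """Toy in-game win probability from the home team's perspective.

    Illustrative only: scales the current margin by the square root of
    time remaining (a lead matters more as the clock runs down) and
    squashes it through a logistic curve. Real models fit these shapes
    to historical play-by-play data.
    """
    if seconds_remaining <= 0:
        # Game over: the margin decides it outright
        return 1.0 if score_margin > 0 else (0.5 if score_margin == 0 else 0.0)
    z = score_margin / math.sqrt(seconds_remaining)
    return 1 / (1 + math.exp(-1.5 * z))  # 1.5 is an assumed steepness
```

A tied game mid-quarter returns 0.5, and the same lead is worth more with less time remaining.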
The Baseline: Point Spreads
Las Vegas point spreads provide the most efficient baseline for game prediction. Betting markets aggregate vast amounts of information, producing forecasts that are extremely difficult to beat consistently. Any predictive model should be evaluated against this baseline: if it can't match Vegas, it isn't capturing signal beyond what is already priced into the lines.
def evaluate_predictions_vs_vegas(predictions, vegas_lines, outcomes):
    """Compare a model's winner-picking accuracy to the Vegas line.

    All three inputs are expressed as home-team margins: positive means
    the home team is predicted (or observed) to win.
    """
    model_correct = sum(1 for p, o in zip(predictions, outcomes) if (p > 0) == (o > 0))
    vegas_correct = sum(1 for v, o in zip(vegas_lines, outcomes) if (v > 0) == (o > 0))
    model_accuracy = model_correct / len(outcomes)
    vegas_accuracy = vegas_correct / len(outcomes)
    return {
        'model_accuracy': round(model_accuracy, 3),
        'vegas_accuracy': round(vegas_accuracy, 3),
        'model_beats_vegas': model_accuracy > vegas_accuracy,
    }

# Example call with hypothetical margins:
# evaluate_predictions_vs_vegas([3.5, -2.0], [4.0, 1.0], [5, -7])
Model Building Approach
Effective prediction models balance complexity with interpretability. Simple models based on established metrics often outperform complex machine learning approaches that overfit to noise. The goal is capturing genuine signal without learning patterns that won't persist.
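One concrete way to guard against learning patterns that won't persist is to score probabilistic forecasts on held-out games with a proper scoring rule, rather than chasing raw accuracy. A minimal sketch using the Brier score (the function name is ours):

```python
def brier_score(probs, outcomes):
    """Mean squared error of probabilistic forecasts (lower is better).

    probs: predicted home-win probabilities in [0, 1]
    outcomes: 1 if the home team won, else 0
    """
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be the same length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)

# A forecaster that always says 0.5 scores exactly 0.25; beating that
# baseline is a minimal bar, well short of beating Vegas.
```

A complex model that looks sharper in-sample but scores worse than a simple one on held-out games is fitting noise.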
Implementation in R
# Game outcome prediction model
library(tidyverse)
library(caret)
build_game_predictor <- function(game_data) {
  # Prepare features; caret needs a factor target for classification
  features <- game_data %>%
    select(
      home_net_rtg, away_net_rtg,
      home_rest_days, away_rest_days,
      home_b2b, away_b2b,
      home_travel_miles, away_travel_miles,
      home_win  # target
    ) %>%
    mutate(home_win = factor(home_win, levels = c(0, 1), labels = c("loss", "win")))

  # Split data
  set.seed(42)
  train_idx <- createDataPartition(features$home_win, p = 0.8, list = FALSE)
  train <- features[train_idx, ]
  test <- features[-train_idx, ]

  # Train logistic regression
  model <- train(
    home_win ~ .,
    data = train,
    method = "glm",
    family = "binomial"
  )

  # Evaluate on the held-out games using predicted class labels
  predictions <- predict(model, test)
  accuracy <- mean(predictions == test$home_win)

  list(model = model, accuracy = accuracy)
}

games <- read_csv("game_features.csv")
predictor <- build_game_predictor(games)
print(paste0("Accuracy: ", round(predictor$accuracy * 100, 1), "%"))
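One caveat on the split above: `createDataPartition` samples games at random, so test games can predate training games and leak information from the future. For time-ordered data like a season schedule, a chronological split is safer. A sketch of the idea in Python (the helper name is ours, assuming games are already sorted by date):

```python
def chronological_split(games, train_frac=0.8):
    """Split date-ordered games so every test game is later than training.

    games: sequence already sorted by game date, oldest first
    """
    if not 0 < train_frac < 1:
        raise ValueError("train_frac must be in (0, 1)")
    cut = int(len(games) * train_frac)
    return games[:cut], games[cut:]

# With 10 date-sorted games, the first 8 train and the last 2 test
train, test = chronological_split(list(range(10)))
```

The same idea applies directly in R by slicing the date-sorted data frame at a cutoff row instead of sampling rows at random.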