Chapter 43 Advanced ~45 min read

Introduction to Basketball Prediction

Foundations of predictive modeling in basketball, from game outcomes to player performance.

The Prediction Challenge

Predicting basketball outcomes involves uncertainty at every level. Individual possessions are highly variable, games aggregate that variability, and seasons compound it further. Yet patterns exist within the noise: better teams tend to win more, and better shooters tend to make more shots. Predictive modeling extracts these patterns to forecast future outcomes.
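To make the idea of signal inside noise concrete, here is a minimal simulation sketch. It assumes a deliberately crude model that is not from this chapter: every possession is an independent 2-point attempt, and the scoring probabilities, possession count, and function names are all hypothetical choices for illustration.

```python
import random


def simulate_game(p_a, p_b, possessions, rng):
    """Return (points_a, points_b) for one game of independent 2-point possessions."""
    points_a = sum(2 for _ in range(possessions) if rng.random() < p_a)
    points_b = sum(2 for _ in range(possessions) if rng.random() < p_b)
    return points_a, points_b


def estimate_win_rate(p_a, p_b, possessions=100, n_games=2000, seed=0):
    """Estimate how often Team A beats Team B; a tie counts as half a win."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    wins = 0.0
    for _ in range(n_games):
        points_a, points_b = simulate_game(p_a, p_b, possessions, rng)
        if points_a > points_b:
            wins += 1.0
        elif points_a == points_b:
            wins += 0.5
    return wins / n_games


# Assumed scoring rates: Team A converts 55% of possessions, Team B 50%
better_team_rate = estimate_win_rate(0.55, 0.50)
even_rate = estimate_win_rate(0.50, 0.50)
print(f"Better team wins {better_team_rate:.1%} of simulated games")
print(f"Evenly matched teams: {even_rate:.1%}")
```

Under these assumptions the stronger team wins clearly more than half of its games but still loses a meaningful share, which is the gap between signal and noise that a predictive model has to work within.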

Types of Predictions

Basketball prediction encompasses multiple domains: game outcomes (will Team A beat Team B?), player performance (how will Player X perform next season?), career trajectories (will this rookie become a star?), and in-game situations (what's the win probability given current score and time?). Each domain requires different methodologies and data sources.

The Baseline: Point Spreads

Las Vegas point spreads are the most demanding baseline for game prediction. Betting markets aggregate vast amounts of information and produce forecasts that are extremely difficult to beat consistently. Any predictive model should be evaluated against this baseline: if it cannot at least match Vegas, it is probably not capturing any signal that is not already priced into the lines.

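def evaluate_predictions_vs_vegas(predictions, vegas_lines, outcomes):
    """Compare how often the model and Vegas pick the correct winner.

    All three inputs are assumed to be home-team margins (positive means the
    home team wins), so a quoted home spread of -5.5 is passed in as +5.5.
    """
    if len(outcomes) == 0 or not (len(predictions) == len(vegas_lines) == len(outcomes)):
        raise ValueError("inputs must be non-empty and of equal length")

    # A pick is correct when its sign matches the sign of the actual margin
    model_correct = sum(1 for p, o in zip(predictions, outcomes) if (p > 0) == (o > 0))
    vegas_correct = sum(1 for v, o in zip(vegas_lines, outcomes) if (v > 0) == (o > 0))

    model_accuracy = model_correct / len(outcomes)
    vegas_accuracy = vegas_correct / len(outcomes)

    return {
        'model_accuracy': round(model_accuracy, 3),
        'vegas_accuracy': round(vegas_accuracy, 3),
        'model_beats_vegas': model_accuracy > vegas_accuracy
    }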
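Winner-picking accuracy is a coarse measure, because a model and the market can agree on every winner while differing by several points on the margin. As a complementary sketch, the hypothetical helpers below compare mean absolute error on the margin. They assume the same sign convention as above (positive means the home team wins), and the sample numbers are made up purely for illustration.

```python
def mean_absolute_error(predicted_margins, actual_margins):
    """Average absolute miss, in points, between predicted and actual home margins."""
    if len(actual_margins) == 0 or len(predicted_margins) != len(actual_margins):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - a) for p, a in zip(predicted_margins, actual_margins))
    return total / len(actual_margins)


def compare_margin_error(model_margins, vegas_margins, actual_margins):
    """Report margin error for the model and for Vegas, plus their difference."""
    model_mae = mean_absolute_error(model_margins, actual_margins)
    vegas_mae = mean_absolute_error(vegas_margins, actual_margins)
    return {
        "model_mae": model_mae,
        "vegas_mae": vegas_mae,
        # Negative means the model missed by fewer points than Vegas
        "model_minus_vegas": model_mae - vegas_mae,
    }


# Made-up margins for four games (illustration only, not real data)
model_margins = [4.0, -2.5, 7.0, 1.0]
vegas_margins = [5.5, -3.0, 6.5, -1.5]
actual_margins = [8, -1, 3, -4]

result = compare_margin_error(model_margins, vegas_margins, actual_margins)
print(result)
```

A positive `model_minus_vegas` value means the model's average miss is larger than the market's on this sample. A handful of games says very little either way, so any real comparison needs a much larger set of games.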

Model Building Approach

Effective prediction models balance complexity with interpretability. Simple models based on established metrics often outperform complex machine learning approaches that overfit to noise. The goal is capturing genuine signal without learning patterns that won't persist.
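As one illustration of a simple, interpretable model, the sketch below turns a net rating difference into a home win probability with a logistic curve. The function name and both constants (`home_edge` and `scale`) are placeholder assumptions, not fitted values or figures from this chapter; in practice they would be estimated from historical games.

```python
import math


def home_win_probability(home_net_rtg, away_net_rtg, home_edge=3.0, scale=0.15):
    """Map a net rating gap to a home win probability with a logistic curve.

    home_edge (points) and scale (per point) are illustrative placeholders,
    not fitted parameters.
    """
    # Rough expected margin: rating gap plus an assumed home-court edge
    expected_margin = (home_net_rtg - away_net_rtg) + home_edge
    return 1.0 / (1.0 + math.exp(-scale * expected_margin))


# Hypothetical matchup: home team rated +5.0, away team rated +1.0
prob = home_win_probability(5.0, 1.0)
print(f"Home win probability: {prob:.3f}")
```

With only two parameters, a model like this is easy to inspect and has little room to memorize noise, which is the trade-off the paragraph above describes.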

Implementation in R

# Game outcome prediction model
library(tidyverse)
library(caret)

build_game_predictor <- function(game_data) {
  # Prepare features. caret needs a factor target for classification,
  # so home_win (assumed to be coded 0/1 or TRUE/FALSE) is recoded as loss/win.
  features <- game_data %>%
    select(
      home_net_rtg, away_net_rtg,
      home_rest_days, away_rest_days,
      home_b2b, away_b2b,
      home_travel_miles, away_travel_miles,
      home_win  # target
    ) %>%
    drop_na() %>%
    mutate(home_win = factor(if_else(home_win == 1, "win", "loss"),
                             levels = c("loss", "win")))

  # Split data (stratified on the target); [, 1] turns the index matrix into a vector
  set.seed(42)
  train_idx <- createDataPartition(features$home_win, p = 0.8, list = FALSE)[, 1]
  train_set <- features[train_idx, ]
  test_set <- features[-train_idx, ]

  # Train logistic regression
  model <- train(
    home_win ~ .,
    data = train_set,
    method = "glm",
    family = "binomial"
  )

  # Evaluate: predict a home win when P(win) exceeds 0.5
  predictions <- predict(model, test_set, type = "prob")
  predicted_win <- predictions[, "win"] > 0.5
  accuracy <- mean(predicted_win == (test_set$home_win == "win"))

  list(model = model, accuracy = accuracy)
}

games <- read_csv("game_features.csv")
predictor <- build_game_predictor(games)

print(paste0("Accuracy: ", round(predictor$accuracy * 100, 1), "%"))
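Accuracy at a 0.5 threshold discards most of the information in a predicted probability: a 51% call and a 95% call count the same. A minimal sketch of two common probability scores, the Brier score and log loss, follows. It is written in Python to match the chapter's earlier evaluation helper, and the function names and sample values are hypothetical.

```python
import math


def brier_score(probabilities, outcomes):
    """Mean squared gap between predicted home-win probability and the 0/1 result."""
    if len(outcomes) == 0 or len(probabilities) != len(outcomes):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - o) ** 2 for p, o in zip(probabilities, outcomes)) / len(outcomes)


def log_loss(probabilities, outcomes, eps=1e-15):
    """Average negative log-likelihood of the results under the predictions."""
    if len(outcomes) == 0 or len(probabilities) != len(outcomes):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for p, o in zip(probabilities, outcomes):
        p = min(max(p, eps), 1.0 - eps)  # clip so log(0) can never occur
        total += -(o * math.log(p) + (1 - o) * math.log(1.0 - p))
    return total / len(outcomes)


# Made-up predictions for three games (1 = home win, 0 = home loss)
probs = [0.8, 0.6, 0.3]
results = [1, 0, 0]
print("Brier score:", round(brier_score(probs, results), 4))
print("Log loss:", round(log_loss(probs, results), 4))
```

Lower is better for both scores. A model that always predicts 0.5 earns a Brier score of 0.25, which gives a convenient reference point when reading the output.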
