Chapter 23 (Intermediate)

Shot Quality and Expected Points Models

Building models to estimate expected point value based on shot characteristics and defender positioning.

The Shot Quality Revolution

Shot quality models are one of the most impactful applications of tracking data. Before tracking, we knew that some shots were better than others, but we couldn't systematically quantify how much better. Shot locations could already be charted; tracking data added the missing context, such as defender distance, shot clock, and dribbles before the shot. Together with location and shooter identity, that context lets us estimate an expected value for every shot.
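
Put as a formula, a shot's expected points is its make probability times its point value. The notation here is introduced for illustration: x_i is the context vector for shot i (distance, defender distance, shot clock, dribbles), v_i its point value, and σ the logistic function behind the logistic-regression model used later in this chapter. Free throws from shooting fouls are ignored, as they are in that model, and scikit-learn's default L2 penalty shrinks the fitted β slightly.

```latex
\begin{aligned}
\mathrm{xPts}_i &= P(\text{make}_i \mid x_i)\, v_i, \qquad v_i \in \{2, 3\},\\
P(\text{make}_i \mid x_i) &= \sigma\!\left(\beta_0 + \beta^\top x_i\right)
  = \frac{1}{1 + e^{-(\beta_0 + \beta^\top x_i)}}.
\end{aligned}
```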

Shot Context Variables

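Distance from the basket is the most fundamental predictor: make rates generally fall as shots move away from the rim. The three-point line introduces a discontinuity, because a shot just beyond the line is worth 50% more than one just inside it, so slightly longer shots can be more valuable. Defender proximity strongly affects shot success: wide-open shots convert at much higher rates than contested attempts. Shot type captures whether the attempt is a catch-and-shoot or comes off the dribble (for example, a pull-up); the model below uses the number of dribbles before the shot as a numeric stand-in.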
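
To make the three-point discontinuity and defender-proximity idea concrete, here is a small sketch. The line distances are the NBA's (23 ft 9 in at the arc, 22 ft in the corners); the corner flag is a simplification of real court geometry, the defender bins are similar to the 0-2 / 2-4 / 4-6 / 6+ ft closest-defender splits shown on NBA.com, and the make probabilities are hypothetical numbers chosen only for illustration.

```python
# NBA three-point line distances, measured from the center of the hoop
ARC_THREE_FT = 23.75    # 23 ft 9 in at the top of the arc
CORNER_THREE_FT = 22.0  # along the sidelines in the corners


def point_value(distance_ft: float, in_corner: bool) -> int:
    """Return 3 for shots past the relevant line distance, else 2.

    Simplification: the caller flags corner shots; real court geometry
    (where the straight corner segment meets the arc) is more involved.
    """
    line_ft = CORNER_THREE_FT if in_corner else ARC_THREE_FT
    return 3 if distance_ft > line_ft else 2


def defender_bucket(defender_ft: float) -> str:
    """Bin closest-defender distance (0-2, 2-4, 4-6, 6+ ft)."""
    if defender_ft < 2:
        return "very tight"
    if defender_ft < 4:
        return "tight"
    if defender_ft < 6:
        return "open"
    return "wide open"


# Hypothetical make probabilities, chosen only to illustrate the discontinuity
long_two_xpts = 0.40 * point_value(23.0, in_corner=False)    # 0.80
short_three_xpts = 0.37 * point_value(24.0, in_corner=False)  # about 1.11
print(long_two_xpts, short_three_xpts, defender_bucket(5.0))
```

Even with a lower make probability, the slightly longer shot carries more expected points once it crosses the line.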

Building an Expected Points Model

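from sklearn.linear_model import LogisticRegression
import numpy as np

def build_expected_points_model(shot_data):
    """Fit a make-probability model and attach expected points to every shot.

    Expects columns LOC_X, LOC_Y, DEFENDER_DISTANCE, SHOT_CLOCK, DRIBBLES,
    SHOT_MADE_FLAG (1 = make, 0 = miss) and SHOT_TYPE ("2PT ..." / "3PT ...").
    """
    shot_data = shot_data.copy()
    # Assumes NBA Stats shot-chart coordinates: tenths of a foot, hoop at the origin.
    # Divide by 10 to get feet (drop the division if your coordinates are already in feet).
    shot_data['SHOT_DISTANCE'] = np.sqrt(shot_data['LOC_X']**2 + shot_data['LOC_Y']**2) / 10

    features = ['SHOT_DISTANCE', 'DEFENDER_DISTANCE', 'SHOT_CLOCK', 'DRIBBLES']
    # LogisticRegression rejects NaN (e.g. shot clock off late in a period), so drop those rows
    shot_data = shot_data.dropna(subset=features + ['SHOT_MADE_FLAG']).copy()
    X = shot_data[features]
    y = shot_data['SHOT_MADE_FLAG']

    model = LogisticRegression(max_iter=1000)
    model.fit(X, y)

    # In-sample make probabilities (scored on the same shots used for fitting)
    shot_data['MAKE_PROBABILITY'] = model.predict_proba(X)[:, 1]
    # Point value from SHOT_TYPE: 3 for three-point attempts, otherwise 2
    is_three = shot_data['SHOT_TYPE'].astype(str).str.contains('3PT')
    shot_data['SHOT_VALUE'] = np.where(is_three, 3, 2)
    shot_data['EXPECTED_POINTS'] = shot_data['MAKE_PROBABILITY'] * shot_data['SHOT_VALUE']

    return model, shot_data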
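
build_expected_points_model scores the same shots it was fitted on, which can flatter the model. A minimal held-out check is sketched below on synthetic data: the feature ranges and "true" coefficients are hypothetical, not estimates from real shots. Using scikit-learn's Brier score and calibration_curve is our choice of diagnostic, not something the chapter prescribes; with real data, splitting by game or season is safer than a random split.

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000

# Synthetic shots with hypothetical ranges, in the chapter's feature order
X = np.column_stack([
    rng.uniform(0, 30, n),   # shot distance (ft)
    rng.uniform(0, 10, n),   # closest defender distance (ft)
    rng.uniform(0, 24, n),   # shot clock remaining (s)
    rng.integers(0, 10, n),  # dribbles before the shot
])
# Hypothetical "true" log-odds used only to simulate makes and misses
true_logit = 0.6 - 0.06 * X[:, 0] + 0.12 * X[:, 1] + 0.01 * X[:, 2] - 0.03 * X[:, 3]
y = rng.binomial(1, 1 / (1 + np.exp(-true_logit)))

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
p_test = model.predict_proba(X_test)[:, 1]

# Brier score: mean squared error of predicted probabilities (lower is better)
brier_model = brier_score_loss(y_test, p_test)
# Baseline: predict the training make rate for every held-out shot
brier_base = brier_score_loss(y_test, np.full_like(p_test, y_train.mean()))

# Calibration: actual make rate vs. mean prediction within probability bins
frac_made, mean_pred = calibration_curve(y_test, p_test, n_bins=5)
for pred, made in zip(mean_pred, frac_made):
    print(f"predicted {pred:.2f}  actual {made:.2f}")
print(f"Brier: model {brier_model:.4f} vs. baseline {brier_base:.4f}")
```

For real shots, swap X and y for the four feature columns and SHOT_MADE_FLAG; a well-calibrated model's predicted and actual rates should track each other bin by bin.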

Interpreting Expected Points

Expected points per shot (xPPS) provides a baseline for evaluating actual performance. A player averaging 1.2 points per shot on attempts whose expected value averages 1.1 is exceeding expectations by 0.1 points per shot, either through shooting skill or favorable variance.
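
Whether a 0.1 gap reflects skill or variance depends on sample size. A rough check, assuming independent shots, compares the mean of (actual minus expected) with its standard error. The function names, the six-shot sample, and the per-shot standard deviation of 1.3 points are hypothetical; estimate the standard deviation from your own residuals.

```python
import math


def excess_pps_z(points, expected):
    """Mean of (actual - expected) points per shot and an approximate z-score.

    Assumes shots are independent, so the standard error of the mean
    residual is its sample standard deviation divided by sqrt(n).
    """
    n = len(points)
    resid = [p - e for p, e in zip(points, expected)]
    mean = sum(resid) / n
    var = sum((r - mean) ** 2 for r in resid) / (n - 1)
    return mean, mean / math.sqrt(var / n)


def shots_needed(gap, per_shot_sd, z=2.0):
    """Shots required before a per-shot gap sits z standard errors from zero."""
    return (z * per_shot_sd / gap) ** 2


# Hypothetical sample: six shots with their model expected points
mean_gap, z = excess_pps_z([2, 0, 3, 0, 2, 2], [1.0, 1.0, 1.1, 1.1, 1.0, 1.0])
print(f"excess PPS {mean_gap:.2f}, z = {z:.2f}")
# (2 * 1.3 / 0.1) ** 2: several hundred shots for a 0.1 gap
print(shots_needed(0.1, 1.3))
```

The six-shot sample shows a large per-shot gap that is still well under two standard errors from zero, which is why small samples say little about shooting skill.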

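Summed over a player's shots, the difference between actual and expected points gives "points added," an estimate of the value created through shot-making (plus noise). Because the model above leaves shooter identity out of its features, expected points describe what a league-average shooter would score on the same attempts. Elite shooters like Stephen Curry consistently add points above expectation because they convert difficult shots at higher rates than historical baselines predict.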
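
A per-player points-added table can be sketched with a pandas groupby. The columns PLAYER_NAME, POINTS, and EXPECTED_POINTS are assumed here; POINTS can be derived from the chapter's columns as SHOT_MADE_FLAG × SHOT_VALUE, and the two-player sample is hypothetical.

```python
import pandas as pd


def points_added_table(shots: pd.DataFrame) -> pd.DataFrame:
    """Per-player PPS, xPPS, and total points added, highest first."""
    shots = shots.assign(POINTS_ADDED=shots["POINTS"] - shots["EXPECTED_POINTS"])
    table = shots.groupby("PLAYER_NAME").agg(
        SHOTS=("POINTS", "size"),
        PPS=("POINTS", "mean"),
        xPPS=("EXPECTED_POINTS", "mean"),
        POINTS_ADDED=("POINTS_ADDED", "sum"),
    )
    return table.sort_values("POINTS_ADDED", ascending=False)


# Hypothetical shots, one row per attempt
shots = pd.DataFrame({
    "PLAYER_NAME": ["A", "A", "A", "B", "B"],
    "POINTS": [3, 0, 2, 0, 2],
    "EXPECTED_POINTS": [1.1, 1.1, 1.0, 1.0, 1.0],
})
print(points_added_table(shots))
```

POINTS_ADDED equals SHOTS × (PPS − xPPS), so the 0.1-per-shot edge from the previous paragraph becomes 50 points over 500 shots.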

Applications Across Contexts

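Shooter evaluation separates shot creation skill (generating attempts with high xPPS) from shot conversion skill (scoring above xPPS on those attempts). Offensive system evaluation compares the shot quality that different schemes generate. Defensive evaluation uses the quality of shots allowed as a metric. Teams that consistently allow low-quality attempts may be more effective than their raw defensive rating suggests, since that rating also absorbs opponents' shot-making, which the defense only partly controls.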
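
The defensive view can be sketched the same way, grouping opponent shots by the defending team. The column names DEF_TEAM, POINTS, and EXPECTED_POINTS and the two-team sample are hypothetical, and the sketch ignores possession-level effects such as turnovers and offensive rebounds.

```python
import pandas as pd


def defensive_shot_quality(shots: pd.DataFrame) -> pd.DataFrame:
    """Shot quality each defense allows, sorted from lowest allowed xPPS."""
    table = shots.groupby("DEF_TEAM").agg(
        OPP_SHOTS=("POINTS", "size"),
        ALLOWED_xPPS=("EXPECTED_POINTS", "mean"),
        ALLOWED_PPS=("POINTS", "mean"),
    )
    # Positive values: opponents scored more than their shot quality predicts
    table["OPP_SHOTMAKING"] = table["ALLOWED_PPS"] - table["ALLOWED_xPPS"]
    return table.sort_values("ALLOWED_xPPS")


# Hypothetical opponent shots: X allows worse shots but opponents hit them
shots = pd.DataFrame({
    "DEF_TEAM": ["X", "X", "Y", "Y"],
    "POINTS": [3, 3, 0, 2],
    "EXPECTED_POINTS": [0.9, 0.9, 1.1, 1.1],
})
print(defensive_shot_quality(shots))
```

Here X allows more actual points per shot than Y but lower-quality attempts, the pattern the paragraph describes as a defense that its raw rating may understate.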

Implementation in R

# Analyze player touches and time of possession
library(tidyverse)

calculate_touch_metrics <- function(possession_data) {
  possession_data %>%
    group_by(player_id, player_name) %>%
    summarise(
      total_touches = n(),
      total_time_of_poss = sum(touch_duration, na.rm = TRUE),

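      # Touch categories (na.rm = TRUE so a missing zone label doesn't turn the count into NA)
      front_court_touches = sum(touch_zone == "front_court", na.rm = TRUE),
      elbow_touches = sum(touch_zone == "elbow", na.rm = TRUE),
      paint_touches = sum(touch_zone == "paint", na.rm = TRUE),
      post_touches = sum(touch_zone == "post", na.rm = TRUE),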

      # Time per touch
      avg_seconds_per_touch = mean(touch_duration, na.rm = TRUE),

      # Outcomes
      pts_per_touch = sum(points, na.rm = TRUE) / n(),
      ast_per_touch = sum(assist, na.rm = TRUE) / n(),
      tov_per_touch = sum(turnover, na.rm = TRUE) / n(),

      .groups = "drop"
    ) %>%
    mutate(
      # Heuristic score: each turnover costs 0.5 points
      touch_efficiency = pts_per_touch - 0.5 * tov_per_touch
    )
}

touches <- read_csv("player_touches.csv")
touch_metrics <- calculate_touch_metrics(touches)

# Most efficient touch players
efficient_touches <- touch_metrics %>%
  filter(total_touches >= 200) %>%
  arrange(desc(touch_efficiency)) %>%
  select(player_name, total_touches, avg_seconds_per_touch,
         pts_per_touch, touch_efficiency) %>%
  head(15)

print(efficient_touches)

# Passing analytics from tracking
library(tidyverse)

analyze_passing <- function(pass_data) {
  pass_data %>%
    group_by(passer_id, passer_name) %>%
    summarise(
      total_passes = n(),
      potential_assists = sum(potential_assist, na.rm = TRUE),
      actual_assists = sum(assist, na.rm = TRUE),
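      # NA instead of NaN/Inf for passers with no potential assists
      pass_to_assist_pct = if_else(potential_assists > 0,
                                   round(actual_assists / potential_assists * 100, 1),
                                   NA_real_),

      # Pass types (na.rm = TRUE guards against missing pass_type labels)
      swing_passes = sum(pass_type == "swing", na.rm = TRUE),
      entry_passes = sum(pass_type == "entry", na.rm = TRUE),
      drive_kicks = sum(pass_type == "drive_kick", na.rm = TRUE),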

      # Pass distance
      avg_pass_distance = mean(pass_distance, na.rm = TRUE),

      .groups = "drop"
    ) %>%
    mutate(
      drive_kick_rate = round(drive_kicks / total_passes * 100, 1)
    )
}

passes <- read_csv("pass_tracking.csv")
passing_analysis <- analyze_passing(passes)

# Top passers by potential assist conversion
top_passers <- passing_analysis %>%
  filter(potential_assists >= 100) %>%
  arrange(desc(pass_to_assist_pct)) %>%
  head(15)

print(top_passers)

Chapter Summary

You've completed Chapter 23: Shot Quality and Expected Points Models.
