The Shot Quality Revolution
Shot quality models represent one of the most impactful applications of tracking data. Before tracking, we knew that some shots were better than others, but we couldn't systematically quantify shot quality. Tracking data changed this by providing the contextual information needed to estimate expected value for every shot: location, defender distance, shot clock, and shooter identity.
Shot Context Variables
Distance from the basket is the most fundamental predictor. The three-point line introduces a discontinuity: a shot from just behind the line is worth three points rather than two, so a slightly longer attempt can carry a higher expected value even at a lower make probability. Defender proximity strongly affects shot success, with wide-open shots converting at much higher rates than tightly contested attempts. Shot type captures whether the attempt is catch-and-shoot, pull-up, or otherwise off the dribble.
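The effect of the three-point line is easiest to see as break-even arithmetic. The sketch below uses made-up make probabilities (illustrative values, not measured league rates) and two hypothetical helper functions to compare expected points on either side of the line.

```python
def expected_points(make_probability: float, shot_value: int) -> float:
    """Expected points for one attempt: P(make) times the shot's point value."""
    return make_probability * shot_value


def break_even_two_point_pct(three_point_pct: float) -> float:
    """Two-point make rate that yields the same expected points as a three."""
    return three_point_pct * 3 / 2


# Hypothetical make probabilities for two shots on either side of the line.
long_two = expected_points(0.40, 2)
short_three = expected_points(0.37, 3)

print(f"long two:    {long_two:.2f} expected points")
print(f"short three: {short_three:.2f} expected points")
print(f"two-point rate needed to match a 36% three: {break_even_two_point_pct(0.36):.0%}")
```

Under these assumed numbers the three-pointer is the better shot despite the lower make probability, which is the discontinuity a shot quality model has to capture.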
Building an Expected Points Model
import numpy as np
from sklearn.linear_model import LogisticRegression


def build_expected_points_model(shot_data):
    """Build a model to predict expected points from shot context."""
    features = ['SHOT_DISTANCE', 'DEFENDER_DISTANCE', 'SHOT_CLOCK', 'DRIBBLES']
    shot_data = shot_data.copy()
    # Straight-line distance, assuming LOC_X / LOC_Y are measured from the basket
    shot_data['SHOT_DISTANCE'] = np.sqrt(shot_data['LOC_X']**2 + shot_data['LOC_Y']**2)
    # LogisticRegression cannot handle missing values, so drop incomplete rows
    shot_data = shot_data.dropna(subset=features + ['SHOT_MADE_FLAG'])

    X = shot_data[features]
    y = shot_data['SHOT_MADE_FLAG']
    model = LogisticRegression(max_iter=1000)
    model.fit(X, y)

    # In-sample make probability for each shot (column 1 is P(made) for a 0/1 flag)
    shot_data['MAKE_PROBABILITY'] = model.predict_proba(X)[:, 1]
    # Point value: 3 if SHOT_TYPE mentions '3PT', otherwise 2
    shot_data['SHOT_VALUE'] = shot_data['SHOT_TYPE'].apply(lambda x: 3 if '3PT' in str(x) else 2)
    shot_data['EXPECTED_POINTS'] = shot_data['MAKE_PROBABILITY'] * shot_data['SHOT_VALUE']
    return model, shot_data
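Expected points are only as reliable as the make probabilities behind them, so a calibration check is a sensible step before using the output. The following is a minimal sketch and not part of the model code above: `calibration_table` is a hypothetical helper that buckets predicted probabilities and compares each bucket's average prediction with its observed make rate. The sample predictions and outcomes are invented for illustration.

```python
from collections import defaultdict


def calibration_table(probabilities, outcomes, n_bins=5):
    """Compare mean predicted make probability with observed make rate per bin.

    Returns a list of (bin_index, n_shots, mean_predicted, observed_rate).
    """
    bins = defaultdict(list)
    for p, made in zip(probabilities, outcomes):
        # Clamp so that p == 1.0 falls in the last bin instead of a new one.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, made))

    rows = []
    for idx in sorted(bins):
        shots = bins[idx]
        mean_pred = sum(p for p, _ in shots) / len(shots)
        observed = sum(made for _, made in shots) / len(shots)
        rows.append((idx, len(shots), mean_pred, observed))
    return rows


# Invented predictions and outcomes (1 = made, 0 = missed).
probs = [0.15, 0.35, 0.38, 0.42, 0.45, 0.61, 0.65, 0.90]
made = [0, 0, 1, 0, 1, 1, 0, 1]

for idx, n, mean_pred, observed in calibration_table(probs, made):
    print(f"bin {idx}: n={n} predicted={mean_pred:.2f} observed={observed:.2f}")
```

In a well-calibrated model the predicted and observed columns track each other closely. With real data the comparison is more informative on shots the model was not fitted on, since in-sample predictions tend to look better than they are.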
Interpreting Expected Points
Expected points per shot (xPPS) provides a baseline for evaluating actual performance. A player who averages 1.2 points per shot on attempts with an expected value of 1.1 is exceeding expectations by 0.1 points per shot, whether through shooting skill or favorable variance.
The difference between actual and expected points aggregates to "points added" through shooting skill. Elite shooters like Stephen Curry consistently add points above expectation because they convert difficult shots at higher rates than historical baselines predict.
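The aggregation described here can be sketched in a few lines. The record layout (`player`, `made`, `shot_value`, `make_probability`) and the sample shots are assumptions made for illustration, not a real data schema.

```python
from collections import defaultdict


def points_added_by_player(shots):
    """Sum (actual points - expected points) over each player's shots.

    `shots` is an iterable of dicts with hypothetical keys:
    player, made (0/1), shot_value (2 or 3), make_probability.
    """
    totals = defaultdict(float)
    for shot in shots:
        actual = shot["made"] * shot["shot_value"]
        expected = shot["make_probability"] * shot["shot_value"]
        totals[shot["player"]] += actual - expected
    return dict(totals)


# Invented sample: player A hits a difficult three, player B misses an easy two.
shots = [
    {"player": "A", "made": 1, "shot_value": 3, "make_probability": 0.30},
    {"player": "A", "made": 0, "shot_value": 3, "make_probability": 0.35},
    {"player": "B", "made": 1, "shot_value": 2, "make_probability": 0.60},
    {"player": "B", "made": 0, "shot_value": 2, "make_probability": 0.55},
]
added = points_added_by_player(shots)

for player, value in sorted(added.items()):
    print(f"{player}: {value:+.2f} points added")
```

Over a handful of shots the totals are dominated by variance. The signal attributed to shooting skill only emerges when the same sum is taken over a large sample.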
Applications Across Contexts
Shooter evaluation separates shot creation skill from shot conversion skill. Offensive system evaluation compares shot quality across schemes. Defensive evaluation uses allowed shot quality as a metric. Teams that allow only low-quality attempts may be more effective than their raw defensive rating suggests.
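The defensive use case can be illustrated the same way: group shots by the defending team and compare the expected points allowed per shot with the points actually conceded. This is a minimal sketch under assumed field names (`defending_team`, `made`, `shot_value`, `make_probability`), with invented sample data.

```python
from collections import defaultdict


def shot_quality_allowed(shots):
    """Return {team: (xpps_allowed, actual_pps_allowed)} for each defending team."""
    expected = defaultdict(float)
    actual = defaultdict(float)
    attempts = defaultdict(int)
    for shot in shots:
        team = shot["defending_team"]
        expected[team] += shot["make_probability"] * shot["shot_value"]
        actual[team] += shot["made"] * shot["shot_value"]
        attempts[team] += 1
    return {
        team: (expected[team] / attempts[team], actual[team] / attempts[team])
        for team in attempts
    }


# Invented shots faced by two hypothetical defenses.
shots_faced = [
    {"defending_team": "X", "made": 1, "shot_value": 2, "make_probability": 0.50},
    {"defending_team": "X", "made": 0, "shot_value": 3, "make_probability": 0.30},
    {"defending_team": "Y", "made": 1, "shot_value": 3, "make_probability": 0.40},
    {"defending_team": "Y", "made": 1, "shot_value": 2, "make_probability": 0.60},
]
allowed = shot_quality_allowed(shots_faced)

for team, (xpps, pps) in sorted(allowed.items()):
    print(f"{team}: xPPS allowed={xpps:.2f}, actual PPS allowed={pps:.2f}")
```

The first number in each pair reflects the quality of looks the defense concedes, while the gap between the two reflects opponent shot-making, which is at least partly outside the defense's control.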
Implementation in R
# Analyze player touches and time of possession
library(tidyverse)

calculate_touch_metrics <- function(possession_data) {
  possession_data %>%
    group_by(player_id, player_name) %>%
    summarise(
      total_touches = n(),
      total_time_of_poss = sum(touch_duration, na.rm = TRUE),
      # Touch categories
      front_court_touches = sum(touch_zone == "front_court"),
      elbow_touches = sum(touch_zone == "elbow"),
      paint_touches = sum(touch_zone == "paint"),
      post_touches = sum(touch_zone == "post"),
      # Time per touch
      avg_seconds_per_touch = mean(touch_duration, na.rm = TRUE),
      # Outcomes
      pts_per_touch = sum(points, na.rm = TRUE) / n(),
      ast_per_touch = sum(assist, na.rm = TRUE) / n(),
      tov_per_touch = sum(turnover, na.rm = TRUE) / n(),
      .groups = "drop"
    ) %>%
    mutate(
      touch_efficiency = pts_per_touch - 0.5 * tov_per_touch
    )
}

touches <- read_csv("player_touches.csv")
touch_metrics <- calculate_touch_metrics(touches)

# Most efficient touch players
efficient_touches <- touch_metrics %>%
  filter(total_touches >= 200) %>%
  arrange(desc(touch_efficiency)) %>%
  select(player_name, total_touches, avg_seconds_per_touch,
         pts_per_touch, touch_efficiency) %>%
  head(15)

print(efficient_touches)
# Passing analytics from tracking
library(tidyverse)

analyze_passing <- function(pass_data) {
  pass_data %>%
    group_by(passer_id, passer_name) %>%
    summarise(
      total_passes = n(),
      potential_assists = sum(potential_assist, na.rm = TRUE),
      actual_assists = sum(assist, na.rm = TRUE),
      pass_to_assist_pct = round(actual_assists / potential_assists * 100, 1),
      # Pass types
      swing_passes = sum(pass_type == "swing"),
      entry_passes = sum(pass_type == "entry"),
      drive_kicks = sum(pass_type == "drive_kick"),
      # Pass distance
      avg_pass_distance = mean(pass_distance, na.rm = TRUE),
      .groups = "drop"
    ) %>%
    mutate(
      drive_kick_rate = round(drive_kicks / total_passes * 100, 1)
    )
}

passes <- read_csv("pass_tracking.csv")
passing_analysis <- analyze_passing(passes)

# Top passers by potential assist conversion
top_passers <- passing_analysis %>%
  filter(potential_assists >= 100) %>%
  arrange(desc(pass_to_assist_pct)) %>%
  head(15)

print(top_passers)