Chapter 21 Intermediate ~45 min read

Introduction to Player Tracking Data

An overview of NBA player tracking technology and the revolutionary data it provides for basketball analysis.

The Tracking Data Revolution

The introduction of player tracking technology transformed basketball analytics from an exercise in statistical approximation into something approaching comprehensive measurement. When the NBA installed the SportVU optical tracking system (from STATS LLC) in every arena for the 2013-14 season, analysts gained access to data streams that previous generations could only imagine: precise player locations 25 times per second, ball position in three dimensions, and derived metrics describing movement, speed, and spatial relationships that no box score could capture. Second Spectrum became the league's official tracking provider beginning with the 2017-18 season.

How Player Tracking Works

Modern NBA tracking uses computer vision systems with multiple cameras positioned throughout each arena. These cameras capture video of the playing surface at high frame rates, and algorithms process the video to identify and locate each player and the ball in real time. The system produces coordinates for all ten players plus the ball at approximately 25 frames per second, or one sample roughly every 0.04 seconds.
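To get a feel for the scale this implies, the arithmetic can be sketched in a few lines of Python. The function names are illustrative, and the 48-minute figure is an assumption that counts only regulation game-clock time (no overtime, no dead-ball footage), so treat the result as a rough lower bound rather than an exact file size.

```python
FRAME_RATE_HZ = 25        # frames per second, as described above
TRACKED_OBJECTS = 11      # ten players plus the ball
REGULATION_MINUTES = 48   # assumption: regulation game, clock running


def seconds_per_frame(frame_rate_hz: float) -> float:
    """Time elapsed between two consecutive frames."""
    return 1.0 / frame_rate_hz


def records_per_game(frame_rate_hz: float, tracked_objects: int, minutes: float) -> int:
    """Number of (object, frame) coordinate records for the given playing time."""
    frames = frame_rate_hz * minutes * 60
    return int(frames * tracked_objects)


print(seconds_per_frame(FRAME_RATE_HZ))
print(records_per_game(FRAME_RATE_HZ, TRACKED_OBJECTS, REGULATION_MINUTES))
```

Under these assumptions a single game yields 72,000 frames and 792,000 coordinate records, which is why raw tracking data is usually processed in bulk rather than inspected by hand.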

The technical achievement is remarkable. The system must distinguish ten players wearing similar uniforms, track a small, fast-moving ball that's often occluded by bodies, handle varying lighting conditions, and produce accurate coordinates even during rapid movement.

Categories of Tracking Data

Positional data provides the foundation—x,y coordinates for each player at each moment. From positions, we derive distances, speeds, accelerations, and spatial relationships. Event data catalogs specific occurrences: shots, passes, dribbles, screens, contests, rebounds. Derived metrics synthesize multiple data streams into summary statistics like speed and distance totals.
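As one illustration of a spatial relationship derived from positions, the sketch below computes each offensive player's distance to the nearest defender in a single frame. The function name and the array layout (one row per player, columns x and y in feet) are assumptions made for this example, not a format defined by the tracking provider.

```python
import numpy as np


def nearest_defender_distance(offense_xy, defense_xy):
    """Return, for each offensive player, the distance to the closest defender.

    offense_xy, defense_xy: array-likes of shape (n_players, 2) holding
    x, y coordinates in feet for one frame (hypothetical layout).
    """
    offense_xy = np.asarray(offense_xy, dtype=float)
    defense_xy = np.asarray(defense_xy, dtype=float)

    # Pairwise differences: shape (n_offense, n_defense, 2)
    diffs = offense_xy[:, None, :] - defense_xy[None, :, :]

    # Euclidean distance for every offense/defense pair
    dists = np.linalg.norm(diffs, axis=2)

    # Closest defender for each offensive player
    return dists.min(axis=1)


# Two offensive players and two defenders, coordinates chosen for easy checking
offense = [[0.0, 0.0], [10.0, 0.0]]
defense = [[3.0, 4.0], [10.0, 5.0]]
print(nearest_defender_distance(offense, defense))
```

Both offensive players in this toy frame are 5 feet from their nearest defender. Applying the same function frame by frame gives a time series of defensive pressure.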

Accessing Tracking Data

The NBA makes select tracking data publicly available through stats.nba.com and associated APIs. The public data includes many derived metrics but not raw positional coordinates. Here is how to access tracking data programmatically:

import requests
import pandas as pd

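def get_tracking_data(season, metric_type):
    """
    Retrieve league-wide player tracking data from the NBA Stats API.

    season: season string such as "2023-24"
    metric_type: value for the PtMeasureType parameter, e.g. "SpeedDistance"
    Returns a pandas DataFrame with one row per player.
    """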
    base_url = "https://stats.nba.com/stats/leaguedashptstats"

    headers = {
        'User-Agent': 'Mozilla/5.0',
        'Accept': 'application/json',
        'Referer': 'https://stats.nba.com/'
    }

    params = {
        'PerMode': 'PerGame',
        'PtMeasureType': metric_type,
        'Season': season,
        'SeasonType': 'Regular Season',
        'LeagueID': '00',
        'PlayerOrTeam': 'Player'
    }

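    # The endpoint is undocumented and can hang or reject requests, so set a
    # timeout and raise on HTTP errors instead of parsing an error page as JSON
    response = requests.get(base_url, headers=headers, params=params, timeout=30)
    response.raise_for_status()
    data = response.json()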

    headers_list = data['resultSets'][0]['headers']
    rows = data['resultSets'][0]['rowSet']

    return pd.DataFrame(rows, columns=headers_list)

# Example: Get speed and distance data
speed_data = get_tracking_data("2023-24", "SpeedDistance")

Key Tracking Metrics

Speed and Distance metrics quantify movement—miles traveled per game, average speed, breakdowns by offensive versus defensive possessions. Touches and Time of Possession measure ball-handling. Passing statistics count assists and potential assists. Defensive tracking provides individual defensive statistics that box scores cannot capture.

Limitations of Public Tracking Data

Public tracking data comes pre-aggregated and summarized, limiting analytical flexibility. Researchers cannot access raw positional coordinates. Sample size issues affect tracking statistics just as they affect traditional statistics. Context remains partially uncontrolled even with tracking data.
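One common way to guard against small-sample noise is to require a minimum number of games and minutes before ranking players on a tracking metric. The following is a minimal sketch: the thresholds are arbitrary, the column names `GP`, `MIN`, and `AVG_SPEED` are assumed to match the API output, and the three rows are made-up values used only to show the mechanics.

```python
import pandas as pd


def filter_min_sample(df, min_games=20, min_minutes=15.0,
                      games_col="GP", minutes_col="MIN"):
    """Keep only players who meet both the games and minutes-per-game thresholds."""
    mask = (df[games_col] >= min_games) & (df[minutes_col] >= min_minutes)
    return df.loc[mask].reset_index(drop=True)


# Illustrative values only, not real player data
toy = pd.DataFrame({
    "PLAYER_NAME": ["Player A", "Player B", "Player C"],
    "GP": [70, 3, 45],              # games played
    "MIN": [32.1, 28.0, 9.5],       # minutes per game
    "AVG_SPEED": [4.1, 4.9, 4.4],   # hypothetical average speed in mph
})

qualified = filter_min_sample(toy)
print(qualified["PLAYER_NAME"].tolist())
```

Here Player B has the highest average speed but appears in only three games, and Player C plays too few minutes, so only Player A survives the filter. The right thresholds depend on the metric and the question being asked.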

Implementation in R

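# Working with NBA tracking data
library(tidyverse)
library(jsonlite)

# Load raw tracking data from JSON.
# Assumes the layout of the publicly released 2015-16 SportVU game files:
# each moment is [quarter, timestamp_ms, game_clock, shot_clock, NULL, positions],
# where positions is a list of [team_id, player_id, x, y, z] entries and the
# first entry is usually the ball. Adjust the indices if your file differs.
load_tracking_data <- function(filepath) {
  # simplifyVector = FALSE keeps the nested lists so events can be iterated
  data <- fromJSON(filepath, simplifyVector = FALSE)

  # Summarize each event by its first moment, skipping events with no moments
  movements <- data$events %>%
    keep(~ length(.x$moments) > 0) %>%
    map_dfr(~{
      first <- .x$moments[[1]]
      ball <- first[[6]][[1]]
      shot_clock <- first[[4]]
      tibble(
        event_id = as.character(.x$eventId),
        quarter = as.integer(first[[1]]),
        game_clock = as.numeric(first[[3]]),
        # The shot clock is null in the JSON when it is turned off
        shot_clock = if (is.null(shot_clock)) NA_real_ else as.numeric(shot_clock),
        ball_x = as.numeric(ball[[3]]),
        ball_y = as.numeric(ball[[4]]),
        ball_z = as.numeric(ball[[5]]),
        # Remaining entries are the players on the floor
        players = list(first[[6]][-1])
      )
    })

  movements
}

tracking <- load_tracking_data("game_tracking.json")
head(tracking)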

# Calculate player speed from tracking data
# (tidyverse is already loaded above)

calculate_speed <- function(tracking_data) {
  tracking_data %>%
    arrange(player_id, frame_id) %>%
    group_by(player_id) %>%
    mutate(
      # Distance between consecutive frames
      dx = x_loc - lag(x_loc),
      dy = y_loc - lag(y_loc),
      distance = sqrt(dx^2 + dy^2),

      # Time between frames (25 FPS = 0.04 seconds); assumes frame_id has no
      # gaps, so dropped frames or stoppages would inflate the computed speed
      dt = 0.04,

      # Speed in feet per second
      speed_fps = distance / dt,

      # Convert to mph
      speed_mph = speed_fps * 0.681818
    ) %>%
    ungroup()
}

player_tracking <- read_csv("player_tracking_frames.csv")
with_speed <- calculate_speed(player_tracking)

# Average speed by player
avg_speed <- with_speed %>%
  group_by(player_name) %>%
  summarise(
    avg_speed_mph = round(mean(speed_mph, na.rm = TRUE), 2),
    max_speed_mph = round(max(speed_mph, na.rm = TRUE), 2)
  ) %>%
  arrange(desc(avg_speed_mph))

print(avg_speed)
