Practice Exercises | NBA Analytics Textbook

Medium 40 min

Create a Player Comparison Visualization

Chapter 4: Data Visualization Fundamentals

Build a multi-panel visualization comparing two players across key statistics. Include a bar chart for counting stats, a radar chart for percentages, and proper styling.

["visualization" "multi-panel" "comparison"]

Create a Player Comparison Visualization - Starter Code

R

library(tidyverse)
library(ggplot2)

# Player comparison data
player1 <- list(name = "Player A", pts = 28.5, reb = 7.2, ast = 5.4, stl = 1.2, fg_pct = 0.52, ft_pct = 0.85)
player2 <- list(name = "Player B", pts = 24.1, reb = 10.8, ast = 4.1, stl = 0.8, fg_pct = 0.58, ft_pct = 0.72)

# TODO: Create comparison visualization
# 1. Bar chart comparing pts, reb, ast, stl
# 2. Add proper labels and title
# 3. Use appropriate colors for each player
# 4. Create side-by-side or faceted layout

Python

import matplotlib.pyplot as plt
import numpy as np

# Player comparison data
player1 = {"name": "Player A", "pts": 28.5, "reb": 7.2, "ast": 5.4, "stl": 1.2, "fg_pct": 0.52, "ft_pct": 0.85}
player2 = {"name": "Player B", "pts": 24.1, "reb": 10.8, "ast": 4.1, "stl": 0.8, "fg_pct": 0.58, "ft_pct": 0.72}

# TODO: Create comparison visualization
# 1. Bar chart comparing pts, reb, ast, stl
# 2. Add proper labels and title
# 3. Use appropriate colors for each player
# 4. Create side-by-side layout

Medium 35 min

Hypothesis Testing for Three-Point Shooting

Chapter 5: Statistical Foundations for Analytics

Test whether there is a statistically significant difference in three-point shooting percentage between guards and forwards. Perform appropriate statistical tests and interpret results.

["hypothesis testing" "t-test" "statistical inference"]

Hypothesis Testing for Three-Point Shooting - Starter Code

R

library(tidyverse)

# Sample data
set.seed(42)
guards <- data.frame(
  position = "Guard",
  fg3_pct = rnorm(50, mean = 0.38, sd = 0.05)
)
forwards <- data.frame(
  position = "Forward",
  fg3_pct = rnorm(50, mean = 0.35, sd = 0.06)
)
players <- rbind(guards, forwards)

# TODO:
# 1. Calculate summary statistics for each group
# 2. Perform two-sample t-test
# 3. Calculate effect size (Cohen's d)
# 4. Interpret results

Python

import numpy as np
from scipy import stats
import pandas as pd

# Sample data
np.random.seed(42)
guards = pd.DataFrame({
    "position": "Guard",
    "fg3_pct": np.random.normal(0.38, 0.05, 50)
})
forwards = pd.DataFrame({
    "position": "Forward",
    "fg3_pct": np.random.normal(0.35, 0.06, 50)
})
players = pd.concat([guards, forwards], ignore_index=True)  # reset the row index after stacking

# TODO:
# 1. Calculate summary statistics for each group
# 2. Perform two-sample t-test
# 3. Calculate effect size (Cohen's d)
# 4. Interpret results
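
Steps 2 and 3 can be sketched with only the Python standard library, as below. The helper names `cohens_d` and `welch_t` and the toy samples are illustrative assumptions, not part of the starter code. The sketch uses Welch's unequal-variance form of the two-sample t-test, which is one reasonable choice here rather than the only one.

```python
import math
import statistics


def cohens_d(a, b):
    """Cohen's d using the pooled standard deviation of two samples."""
    n1, n2 = len(a), len(b)
    var1, var2 = statistics.variance(a), statistics.variance(b)  # sample variances
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd


def welch_t(a, b):
    """Welch's t statistic and its approximate degrees of freedom."""
    n1, n2 = len(a), len(b)
    se1 = statistics.variance(a) / n1  # squared standard error, group 1
    se2 = statistics.variance(b) / n2  # squared standard error, group 2
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical toy samples, not real player data
guards_demo = [0.40, 0.38, 0.36, 0.42, 0.39]
forwards_demo = [0.35, 0.33, 0.37, 0.34, 0.36]

t_stat, dof = welch_t(guards_demo, forwards_demo)
effect = cohens_d(guards_demo, forwards_demo)
print(f"t = {t_stat:.3f}, df = {dof:.1f}, d = {effect:.3f}")
```

For the p-value, `stats.ttest_ind(a, b, equal_var=False)` from the SciPy import in the starter code runs the same Welch test.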

Easy 25 min

Analyze Relationships Between Player Statistics

Chapter 5: Statistical Foundations for Analytics

Explore the correlations between key player statistics. Create a correlation matrix, identify the strongest relationships, and visualize the results with a heatmap.

["correlation" "heatmap" "exploratory analysis"]

Analyze Relationships Between Player Statistics - Starter Code

R

library(tidyverse)

# Sample player data
set.seed(42)
n <- 100
player_stats <- tibble(
  pts = runif(n, 5, 30),
  ast = runif(n, 1, 10),
  reb = runif(n, 2, 12),
  min = runif(n, 15, 38),
  tov = runif(n, 0.5, 4)
)
# Add some realistic correlations
player_stats$ast <- player_stats$ast + player_stats$pts * 0.1
player_stats$tov <- player_stats$tov + player_stats$ast * 0.2

# TODO:
# 1. Calculate correlation matrix
# 2. Find the 3 strongest correlations
# 3. Create a heatmap visualization

Python

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Sample player data
np.random.seed(42)
n = 100
player_stats = pd.DataFrame({
    "pts": np.random.uniform(5, 30, n),
    "ast": np.random.uniform(1, 10, n),
    "reb": np.random.uniform(2, 12, n),
    "min": np.random.uniform(15, 38, n),
    "tov": np.random.uniform(0.5, 4, n)
})
player_stats["ast"] += player_stats["pts"] * 0.1
player_stats["tov"] += player_stats["ast"] * 0.2

# TODO:
# 1. Calculate correlation matrix
# 2. Find the 3 strongest correlations
# 3. Create a heatmap visualization
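
Step 2 (ranking the strongest relationships) can be sketched in plain Python as follows. The functions `pearson` and `strongest_pairs` and the small `demo` table are hypothetical names and data introduced for illustration. With the starter DataFrame, `player_stats.corr()` would produce the full matrix directly.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def strongest_pairs(columns, top_n=3):
    """Rank every pair of columns by absolute correlation, strongest first."""
    pairs = [
        (a, b, pearson(columns[a], columns[b]))
        for a, b in combinations(columns, 2)
    ]
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)[:top_n]


# Hypothetical toy data, not real player statistics
demo = {
    "pts": [10, 20, 30, 40],
    "ast": [2, 4, 6, 8],   # rises with pts
    "tov": [4, 3, 2, 1],   # falls as pts rises
    "reb": [5, 9, 4, 8],
}

for a, b, r in strongest_pairs(demo):
    print(f"{a} vs {b}: r = {r:+.2f}")
```

Ranking by absolute value matters because a strong negative correlation is as informative as a strong positive one.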

Medium 30 min

Parse Game Box Scores

Chapter 6: Box Score Statistics Deep Dive

Write a function to parse raw box score data and calculate Game Score (John Hollinger formula) for each player.

["box score" "Game Score" "parsing"]

Parse Game Box Scores - Starter Code

R

library(tidyverse)

# Game Score = PTS + 0.4*FGM - 0.7*FGA - 0.4*(FTA-FTM) + 0.7*OREB + 0.3*DREB + STL + 0.7*AST + 0.7*BLK - 0.4*PF - TOV

# TODO: Create calculate_game_score function

Python

import pandas as pd

# Game Score = PTS + 0.4*FGM - 0.7*FGA - 0.4*(FTA-FTM) + 0.7*OREB + 0.3*DREB + STL + 0.7*AST + 0.7*BLK - 0.4*PF - TOV

# TODO: Create calculate_game_score function
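
A minimal sketch of the function, following the formula in the comment above. The dictionary key names and the sample stat line are assumptions made for illustration. Raw box scores often arrive as text, so the sketch coerces each field to a number first.

```python
# Box score fields the Game Score formula needs (key names are assumptions)
FIELDS = ("pts", "fgm", "fga", "ftm", "fta", "oreb", "dreb",
          "stl", "ast", "blk", "pf", "tov")


def calculate_game_score(raw_row):
    """Parse one raw box score row and return its Game Score."""
    # Coerce every field to float so string input is handled
    b = {field: float(raw_row[field]) for field in FIELDS}
    return (
        b["pts"] + 0.4 * b["fgm"] - 0.7 * b["fga"]
        - 0.4 * (b["fta"] - b["ftm"])
        + 0.7 * b["oreb"] + 0.3 * b["dreb"]
        + b["stl"] + 0.7 * b["ast"] + 0.7 * b["blk"]
        - 0.4 * b["pf"] - b["tov"]
    )


# Hypothetical stat line, with values as strings to mimic raw input
demo_row = {"player": "Player A", "pts": "30", "fgm": "10", "fga": "20",
            "ftm": "8", "fta": "10", "oreb": "2", "dreb": "6", "stl": "2",
            "ast": "5", "blk": "1", "pf": "3", "tov": "4"}
print(round(calculate_game_score(demo_row), 1))  # 23.4
```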

Easy 20 min

Calculate All Shooting Efficiency Metrics

Chapter 7: Shooting Efficiency Metrics (TS%, eFG%)

Create a comprehensive function that calculates FG%, eFG%, and TS% for a player and explains when to use each.

["shooting efficiency" "TS%" "eFG%"]

Calculate All Shooting Efficiency Metrics - Starter Code

R

# TODO: Create function that returns FG%, eFG%, and TS%

Python

# TODO: Create function that returns FG%, eFG%, and TS%
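
One possible shape for this function is sketched below, using the standard definitions FG% = FGM / FGA, eFG% = (FGM + 0.5 * 3PM) / FGA, and TS% = PTS / (2 * (FGA + 0.44 * FTA)). The function name, argument order, and sample line are assumptions.

```python
def shooting_efficiency(pts, fgm, fg3m, fga, fta):
    """Return FG%, eFG%, and TS% as a dict; None where a metric is undefined."""
    ts_denominator = 2 * (fga + 0.44 * fta)
    return {
        # FG%: raw accuracy; treats twos and threes as equal
        "fg_pct": fgm / fga if fga else None,
        # eFG%: credits a made three as 1.5 made field goals
        "efg_pct": (fgm + 0.5 * fg3m) / fga if fga else None,
        # TS%: also accounts for points scored at the free throw line
        "ts_pct": pts / ts_denominator if ts_denominator else None,
    }


# Hypothetical line: 25 points on 9-of-18 shooting (3 threes) and 6 free throw attempts
demo_metrics = shooting_efficiency(pts=25, fgm=9, fg3m=3, fga=18, fta=6)
for name, value in demo_metrics.items():
    print(f"{name}: {value:.3f}")
```

As a rule of thumb, FG% suits comparisons within a single shot type, eFG% suits comparing shooters with different three-point volume, and TS% suits overall scoring efficiency because it includes free throws.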

Medium 35 min

Shot Zone Efficiency Analysis

Chapter 7: Shooting Efficiency Metrics (TS%, eFG%)

Analyze shooting efficiency by zone (rim, mid-range, three-point) and determine optimal shot distribution.

["shot zones" "efficiency" "optimization"]

Shot Zone Efficiency Analysis - Starter Code

R

# TODO: Calculate efficiency by zone and recommend optimal distribution

Python

# TODO: Calculate efficiency by zone and recommend optimal distribution
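
A simple starting point is points per shot (PPS) by zone, sketched below. The zone names, the shot counts, and the function `zone_efficiency` are hypothetical. Ranking zones by PPS is only a first-pass recommendation, since it ignores how defenses would react to a changed shot mix.

```python
# Point value of a made shot in each zone (zone names are assumptions)
ZONE_POINTS = {"rim": 2, "mid_range": 2, "three": 3}


def zone_efficiency(shots):
    """Map zone -> (fg_pct, points_per_shot) from zone -> (makes, attempts)."""
    result = {}
    for zone, (makes, attempts) in shots.items():
        if attempts == 0:
            continue  # no attempts means no efficiency to report
        fg_pct = makes / attempts
        result[zone] = (fg_pct, fg_pct * ZONE_POINTS[zone])
    return result


# Hypothetical shot totals, not real data
demo_shots = {"rim": (130, 200), "mid_range": (60, 150), "three": (90, 250)}
efficiency = zone_efficiency(demo_shots)

# Rank zones from most to least efficient by points per shot
ranked = sorted(efficiency, key=lambda z: efficiency[z][1], reverse=True)
for zone in ranked:
    fg_pct, pps = efficiency[zone]
    print(f"{zone}: FG% = {fg_pct:.3f}, PPS = {pps:.2f}")
```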

Hard 45 min

Build Assist Network Visualization

Chapter 9: Playmaking and Assist Metrics

Create a pass network diagram showing assist relationships between teammates.

["network analysis" "assists" "visualization"]

Build Assist Network Visualization - Starter Code

R

# TODO: Build assist network from play-by-play data

Python

# TODO: Build assist network from play-by-play data

Medium 30 min

Turnover Type Classification

Chapter 10: Turnover Analysis and Ball Security

Classify turnovers by type and analyze patterns to identify areas for improvement.

["turnovers" "classification" "pattern analysis"]

Turnover Type Classification - Starter Code

R

# TODO: Classify and analyze turnover types

Python

# TODO: Classify and analyze turnover types

Medium 30 min

Calculate Rebounding Percentages

Chapter 8: Rebounding and Possession Metrics

Implement ORB%, DRB%, and TRB% calculations with proper team context adjustments.

["rebounding" "TRB%" "rate stats"]

Calculate Rebounding Percentages - Starter Code

R

# TODO: Calculate rebounding percentages

Python

# TODO: Calculate rebounding percentages
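
The three percentages share one structure: the player's rebounds, scaled by playing time, divided by the rebounds available while on the floor. The sketch below uses the commonly published form 100 * (REB * (Tm MP / 5)) / (MP * (Tm REB + Opp REB)). The function name and the sample numbers are assumptions.

```python
def rebound_pct(player_reb, player_min, team_min, team_reb, opp_reb):
    """Estimated share of available rebounds a player grabbed while on the floor.

    ORB%: pass team offensive rebounds and opponent defensive rebounds.
    DRB%: pass team defensive rebounds and opponent offensive rebounds.
    TRB%: pass team total rebounds and opponent total rebounds.
    """
    return 100 * (player_reb * (team_min / 5)) / (player_min * (team_reb + opp_reb))


# Hypothetical single-game numbers: 36 minutes played, 240 team minutes
orb_pct = rebound_pct(player_reb=3, player_min=36, team_min=240, team_reb=10, opp_reb=30)
drb_pct = rebound_pct(player_reb=9, player_min=36, team_min=240, team_reb=35, opp_reb=10)
trb_pct = rebound_pct(player_reb=12, player_min=36, team_min=240, team_reb=45, opp_reb=40)
print(f"ORB% = {orb_pct:.1f}, DRB% = {drb_pct:.1f}, TRB% = {trb_pct:.1f}")
```

Dividing team minutes by 5 converts them to game minutes, which is the team context adjustment: a player is credited only for the rebounds available during the share of the game they played.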

Medium 25 min

Calculate Team Pace

Chapter 11: Pace and Tempo Analysis

Implement multiple pace calculation formulas and compare results.

["pace" "possessions" "tempo"]

Calculate Team Pace - Starter Code

R

# TODO: Calculate pace using multiple formulas

Python

# TODO: Calculate pace using multiple formulas
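
Two possession estimates are sketched below for comparison: a simple one (FGA + 0.44 * FTA - ORB + TOV) and a commonly cited detailed one that averages both teams and weights offensive rebounds by missed shots. Pace is then possessions scaled to a 48-minute game. The function names, dictionary keys, and box score totals are assumptions, and the coefficients should be checked against the chapter's own definitions.

```python
def simple_possessions(fga, fta, orb, tov):
    """Basic possession estimate from one team's box score totals."""
    return fga + 0.44 * fta - orb + tov


def detailed_possessions(team, opp):
    """Possession estimate averaged over both teams' box scores."""
    def one_side(a, b):
        orb_share = a["orb"] / (a["orb"] + b["drb"])
        return (a["fga"] + 0.4 * a["fta"]
                - 1.07 * orb_share * (a["fga"] - a["fgm"]) + a["tov"])
    return 0.5 * (one_side(team, opp) + one_side(opp, team))


def pace(team_poss, opp_poss, team_minutes=240, game_minutes=48):
    """Possessions per 48 minutes, averaged across the two teams."""
    return game_minutes * (team_poss + opp_poss) / (2 * (team_minutes / 5))


# Hypothetical single-game totals
team = {"fga": 88, "fgm": 40, "fta": 22, "orb": 10, "drb": 34, "tov": 14}
opp = {"fga": 90, "fgm": 41, "fta": 20, "orb": 11, "drb": 33, "tov": 13}

team_simple = simple_possessions(team["fga"], team["fta"], team["orb"], team["tov"])
opp_simple = simple_possessions(opp["fga"], opp["fta"], opp["orb"], opp["tov"])
shared = detailed_possessions(team, opp)

pace_simple = pace(team_simple, opp_simple)
pace_detailed = pace(shared, shared)
print(f"simple: {pace_simple:.2f}, detailed: {pace_detailed:.2f}")
```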

Medium 30 min

Pace-Adjusted Team Comparison

Chapter 11: Pace and Tempo Analysis

Compare two teams' performance after adjusting for pace differences.

["pace adjustment" "team comparison" "normalization"]

Pace-Adjusted Team Comparison - Starter Code

R

# TODO: Pace-adjust and compare teams

Python

# TODO: Pace-adjust and compare teams

Medium 30 min

Calculate Offensive and Defensive Ratings

Chapter 12: Per-Possession and Rate Statistics

Compute team and player offensive/defensive ratings per 100 possessions.

["ratings" "per-100" "efficiency"]

Calculate Offensive and Defensive Ratings - Starter Code

R

# TODO: Calculate ORtg and DRtg

Python

# TODO: Calculate ORtg and DRtg
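
At the team level, both ratings are points per 100 possessions, as in the sketch below. The function name and the sample totals are assumptions. Individual player ratings require a more involved allocation of possessions and are not covered here.

```python
def team_ratings(pts_for, pts_against, possessions):
    """Offensive, defensive, and net rating per 100 possessions."""
    ortg = 100 * pts_for / possessions
    drtg = 100 * pts_against / possessions
    return {"ortg": ortg, "drtg": drtg, "net": ortg - drtg}


# Hypothetical game: 112 points scored, 105 allowed, 98 possessions
demo_ratings = team_ratings(pts_for=112, pts_against=105, possessions=98)
print({name: round(value, 1) for name, value in demo_ratings.items()})
```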

Easy 20 min

Per-36 and Per-100 Conversion Tool

Chapter 12: Per-Possession and Rate Statistics

Build a tool that converts raw stats to per-36 minutes and per-100 possessions.

["rate stats" "conversion" "normalization"]

Per-36 and Per-100 Conversion Tool - Starter Code

R

# TODO: Create per-36 and per-100 conversion functions

Python

# TODO: Create per-36 and per-100 conversion functions
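
Both conversions are the same rescaling with a different denominator, as this sketch shows. The function names and the sample numbers are assumptions.

```python
def per_36(stat, minutes):
    """Scale a raw total to a 36-minute rate."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return stat / minutes * 36


def per_100(stat, possessions):
    """Scale a raw total to a per-100-possession rate."""
    if possessions <= 0:
        raise ValueError("possessions must be positive")
    return stat / possessions * 100


# Hypothetical player: 18 points in 27 minutes, 20 points in 80 possessions
print(per_36(18, 27))    # 24.0
print(per_100(20, 80))   # 25.0
```

Raising an error on zero minutes or possessions is one design choice; returning a missing value instead may suit batch processing of full rosters better.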

Hard 45 min

Calculate Player Efficiency Rating

Chapter 13: Player Efficiency Rating (PER)

Implement the PER formula and calculate league-adjusted PER for multiple players.

["PER" "advanced metrics" "normalization"]

Calculate Player Efficiency Rating - Starter Code

R

# TODO: Implement PER calculation

Python

# TODO: Implement PER calculation

Hard 50 min

Win Shares Calculation

Chapter 14: Win Shares and WS/48

Calculate offensive and defensive win shares for a player given their stats and team context.

["Win Shares" "value metrics" "team context"]

Win Shares Calculation - Starter Code

R

# TODO: Implement Win Shares calculation

Python

# TODO: Implement Win Shares calculation

Hard 45 min

BPM and VORP Calculator

Chapter 15: Box Plus-Minus (BPM) and VORP

Build a tool that calculates Box Plus-Minus and Value Over Replacement Player.

["BPM" "VORP" "replacement level"]

BPM and VORP Calculator - Starter Code

R

# TODO: Calculate BPM and VORP

Python

# TODO: Calculate BPM and VORP

Hard 60 min

Simulate Plus-Minus Regression

Chapter 16: Real Plus-Minus (RPM)

Use regularized regression to estimate player impact from lineup data.

["RAPM" "regression" "regularization"]

Simulate Plus-Minus Regression - Starter Code

R

# TODO: Build RAPM-style model

Python

# TODO: Build RAPM-style model

Medium 30 min

Parse Tracking Data JSON

Chapter 21: Introduction to Player Tracking Data

Load and parse NBA tracking data from JSON format into analysis-ready dataframes.

["tracking data" "JSON parsing" "data wrangling"]

Parse Tracking Data JSON - Starter Code

R

# TODO: Parse tracking JSON

Python

# TODO: Parse tracking JSON

Medium 35 min

Calculate Player Speed Distribution

Chapter 22: Speed, Distance, and Movement Analysis

Analyze player speed from tracking data and create speed distribution visualizations.

["speed" "tracking" "distribution"]

Calculate Player Speed Distribution - Starter Code

R

# TODO: Calculate and visualize speed

Python

# TODO: Calculate and visualize speed

Hard 50 min

Expected Points Model

Chapter 24: Finishing at the Rim

Build a shot quality model that predicts expected points based on shot location and defender distance.

["xPTS" "shot quality" "modeling"]

Expected Points Model - Starter Code

R

# TODO: Build expected points model

Python

# TODO: Build expected points model

Hard 45 min

Rim Protection Analysis

Chapter 30: Defensive Rating and Team Defense

Calculate rim protection metrics including DFG% differential and block rate.

["rim protection" "defense" "tracking"]

Rim Protection Analysis - Starter Code

R

# TODO: Analyze rim protection

Python

# TODO: Analyze rim protection

Medium 35 min

Perimeter Defense Evaluation

Chapter 33: Perimeter Defense Metrics

Evaluate perimeter defenders based on contest rate and opponent shooting.

["perimeter defense" "contests" "evaluation"]

Perimeter Defense Evaluation - Starter Code

R

# TODO: Evaluate perimeter defense

Python

# TODO: Evaluate perimeter defense

Medium 40 min

Four Factors Dashboard

Chapter 38: Four Factors Analysis

Create a comprehensive Four Factors analysis comparing multiple teams.

["Four Factors" "team analysis" "visualization"]

Four Factors Dashboard - Starter Code

R

# TODO: Build Four Factors comparison

Python

# TODO: Build Four Factors comparison
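
The four factors (shooting, turnovers, offensive rebounding, free throws) can be computed from team totals as sketched below. The definitions used are eFG% = (FGM + 0.5 * 3PM) / FGA, TOV% = TOV / (FGA + 0.44 * FTA + TOV), ORB% = ORB / (ORB + Opp DRB), and FT rate = FTM / FGA. The dictionary keys, team labels, and totals are hypothetical.

```python
def four_factors(t):
    """Compute the four factors from one team's box score totals."""
    return {
        "efg_pct": (t["fgm"] + 0.5 * t["fg3m"]) / t["fga"],
        "tov_pct": t["tov"] / (t["fga"] + 0.44 * t["fta"] + t["tov"]),
        "orb_pct": t["orb"] / (t["orb"] + t["opp_drb"]),
        "ft_rate": t["ftm"] / t["fga"],
    }


# Hypothetical team totals for a side-by-side comparison
teams = {
    "Team A": {"fgm": 40, "fg3m": 12, "fga": 88, "ftm": 18, "fta": 22,
               "tov": 14, "orb": 10, "opp_drb": 33},
    "Team B": {"fgm": 41, "fg3m": 9, "fga": 90, "ftm": 15, "fta": 20,
               "tov": 13, "orb": 11, "opp_drb": 34},
}

comparison = {name: four_factors(totals) for name, totals in teams.items()}
for name, factors in comparison.items():
    row = ", ".join(f"{k} = {v:.3f}" for k, v in factors.items())
    print(f"{name}: {row}")
```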

Medium 35 min

Lineup Net Rating Calculator

Chapter 39: Lineup Analysis and Optimization

Calculate net rating for lineup combinations with sample size warnings.

["lineups" "net rating" "sample size"]

Lineup Net Rating Calculator - Starter Code

R

# TODO: Calculate lineup ratings

Python

# TODO: Calculate lineup ratings
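
A minimal sketch of a net rating calculation that flags small samples. The function name, the 100-possession threshold, and the lineup totals are all assumptions; the right threshold depends on how much noise the analysis can tolerate.

```python
def lineup_net_rating(pts_for, pts_against, possessions, min_possessions=100):
    """Net rating per 100 possessions, with a low-sample flag."""
    net = 100 * (pts_for - pts_against) / possessions
    return {
        "net_rating": net,
        "possessions": possessions,
        # Flag lineups whose rating rests on too few possessions to trust
        "low_sample": possessions < min_possessions,
    }


# Hypothetical lineups: (points for, points against, possessions)
lineups = {
    "Starters": (230, 210, 200),
    "Bench unit": (30, 20, 25),
}

results = {name: lineup_net_rating(*totals) for name, totals in lineups.items()}
for name, r in results.items():
    warning = " (warning: small sample)" if r["low_sample"] else ""
    print(f"{name}: {r['net_rating']:+.1f}{warning}")
```

The bench unit's +40.0 illustrates why the warning matters: over 25 possessions, a handful of made shots swings the rating dramatically.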

Hard 50 min

Build Game Prediction Model

Chapter 43: Introduction to Basketball Prediction

Create a logistic regression model to predict game outcomes.

["prediction" "logistic regression" "classification"]

Build Game Prediction Model - Starter Code

R

# TODO: Build game predictor

Python

# TODO: Build game predictor

Hard 45 min

Win Probability Calculator

Chapter 45: Player Performance Projections

Build a real-time win probability model based on score margin and time remaining.

["win probability" "real-time" "modeling"]

Win Probability Calculator - Starter Code

R

# TODO: Calculate win probability

Python

# TODO: Calculate win probability

Medium 30 min

Pythagorean Expectation Analysis

Chapter 46: Aging Curves and Career Trajectories

Calculate expected wins using Pythagorean expectation and identify lucky/unlucky teams.

["pythagorean" "expected wins" "luck"]

Pythagorean Expectation Analysis - Starter Code

R

# TODO: Calculate expected wins

Python

# TODO: Calculate expected wins
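
Pythagorean expectation estimates win percentage as PF^x / (PF^x + PA^x). The sketch below uses the algebraically equivalent ratio form to keep the numbers small. The exponent of 14 is an assumption (published NBA values vary by source), and the function names and team numbers are hypothetical.

```python
def pythagorean_wins(pts_for, pts_against, games=82, exponent=14.0):
    """Expected wins from points scored and allowed."""
    # Equivalent to pf**x / (pf**x + pa**x), written to avoid huge powers
    win_pct = 1 / (1 + (pts_against / pts_for) ** exponent)
    return win_pct * games


def luck(actual_wins, pts_for, pts_against, games=82, exponent=14.0):
    """Actual minus expected wins; positive suggests a lucky team."""
    return actual_wins - pythagorean_wins(pts_for, pts_against, games, exponent)


# Hypothetical team: 115.0 points scored and 110.0 allowed per game, 57 actual wins
expected = pythagorean_wins(115.0, 110.0)
print(f"expected wins: {expected:.1f}, luck: {luck(57, 115.0, 110.0):+.1f}")
```

Per-game averages and season totals give the same result, since only the ratio of points scored to points allowed enters the formula.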

Medium 30 min

Fetch Player Stats from API

Chapter 2: NBA Data Sources and APIs

Write a function to fetch current season player statistics from the NBA Stats API. Handle potential errors gracefully and return the data as a clean dataframe.

["API" "error handling" "data retrieval"]

Fetch Player Stats from API - Starter Code

R

library(httr)
library(jsonlite)

# TODO: Create function to fetch player stats
# fetch_player_stats <- function(season = "2023-24") {
#   base_url <- "https://stats.nba.com/stats/leaguedashplayerstats"
#   # Add headers and parameters
#   # Make request
#   # Parse response
#   # Return dataframe
# }

# TODO: Call function and display first 10 rows

Python

import requests
import pandas as pd

# TODO: Create function to fetch player stats
# def fetch_player_stats(season="2023-24"):
#     base_url = "https://stats.nba.com/stats/leaguedashplayerstats"
#     # Add headers and parameters
#     # Make request
#     # Parse response
#     # Return dataframe

# TODO: Call function and display first 10 rows

Hard 50 min

Draft Pick Value Model

Chapter 48: Win Probability Models

Analyze historical draft data to estimate pick value and trade equivalencies.

["draft" "pick value" "trade analysis"]

Draft Pick Value Model - Starter Code

R

# TODO: Analyze draft pick value

Python

# TODO: Analyze draft pick value

Medium 35 min

Salary Cap Efficiency Analysis

Chapter 49: Draft Analytics and Prospect Evaluation

Calculate cost per win share and identify best value contracts.

["salary cap" "contract value" "efficiency"]

Salary Cap Efficiency Analysis - Starter Code

R

# TODO: Analyze salary efficiency

Python

# TODO: Analyze salary efficiency

Hard 45 min

Player Clustering with K-Means

Chapter 56: Machine Learning in Basketball

Use k-means clustering to identify player archetypes based on statistical profiles.

["clustering" "machine learning" "archetypes"]

Player Clustering with K-Means - Starter Code

R

# TODO: Cluster players

Python

# TODO: Cluster players

Medium 40 min

Build Simple Shot Chart

Chapter 57: Computer Vision and Video Analysis

Create a court visualization with shot locations using tracking coordinates.

["visualization" "shot chart" "spatial"]

Build Simple Shot Chart - Starter Code

R

# TODO: Create shot chart

Python

# TODO: Create shot chart

Hard 60 min

Build Streamlit Dashboard

Chapter 67: Building an Analytics Dashboard

Create a simple interactive dashboard for player comparison using Streamlit.

["dashboard" "Streamlit" "interactive"]

Build Streamlit Dashboard - Starter Code

R

# Streamlit is a Python library; complete this exercise in Python

Python

# TODO: Build Streamlit dashboard

Easy 20 min

Compare Scoring Across Eras

Chapter P: The Analytics Revolution in Basketball

Load historical NBA scoring data and compare average points per game between the 1990s and 2020s. Calculate the percentage change and visualize the trend over time.

["data loading" "basic statistics" "comparison"]

Compare Scoring Across Eras - Starter Code

R

# Load tidyverse
library(tidyverse)

# TODO: Load the historical data from "nba_historical.csv"
# historical <- ???

# TODO: Filter for 1990s (1990-1999) and 2020s (2020-2024)
# nineties <- ???
# twenties <- ???

# TODO: Calculate average PPG for each era
# avg_90s <- ???
# avg_20s <- ???

# TODO: Calculate percentage change
# pct_change <- ???

Python

import pandas as pd

# TODO: Load the historical data
# historical = ???

# TODO: Filter for 1990s and 2020s
# nineties = ???
# twenties = ???

# TODO: Calculate average PPG for each era
# avg_90s = ???
# avg_20s = ???

# TODO: Calculate percentage change
# pct_change = ???

Easy 15 min

Verify Analytics Environment

Chapter 1: Setting Up Your Analytics Environment

Create a function that checks if all required packages are installed and returns a summary of your analytics environment including package versions.

["environment setup" "functions" "package management"]

Verify Analytics Environment - Starter Code

R

# TODO: Create a function that checks for required packages
# check_environment <- function() {
#   required <- c("tidyverse", "httr", "jsonlite")
#   # Check each package and return status
# }

# TODO: Call the function and print results

Python

# TODO: Create a function that checks for required packages
# def check_environment():
#     required = ["pandas", "numpy", "requests", "matplotlib"]
#     # Check each package and return status

# TODO: Call the function and print results
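
One way to fill in the Python function is with the standard library's `importlib`, as sketched below. The status strings are assumptions. Note that a package's import name can differ from its installed distribution name, in which case the version lookup fails even though the import works; the sketch reports that case separately.

```python
import importlib.util
from importlib import metadata


def check_environment(required=("pandas", "numpy", "requests", "matplotlib")):
    """Return a dict mapping each package name to a version or status string."""
    summary = {}
    for name in required:
        # find_spec returns None when a top-level module cannot be imported
        if importlib.util.find_spec(name) is None:
            summary[name] = "not installed"
            continue
        try:
            summary[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Importable, but no distribution metadata under this name
            summary[name] = "installed (version unknown)"
    return summary


for package, status in check_environment().items():
    print(f"{package}: {status}")
```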

Medium 30 min

Clutch Performance Analysis

Chapter 41: Clutch Performance Analytics

Identify and analyze clutch performance (final 5 minutes of the game, score margin within 5 points).

["clutch" "pressure" "situational"]

Clutch Performance Analysis - Starter Code

R

# TODO: Analyze clutch performance

Python

# TODO: Analyze clutch performance
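
The clutch filter itself is a simple predicate over play-by-play rows, as sketched below. The field names and sample plays are hypothetical, and treating overtime periods as clutch-eligible is an assumption layered on the definition above.

```python
def is_clutch(period, seconds_left, margin, window=300, max_margin=5):
    """True for plays in the last `window` seconds of the 4th quarter or
    overtime with the score within `max_margin` points."""
    return period >= 4 and seconds_left <= window and abs(margin) <= max_margin


# Hypothetical play-by-play rows for one player
plays = [
    {"period": 4, "seconds_left": 280, "margin": -3, "pts": 2},  # clutch
    {"period": 4, "seconds_left": 400, "margin": 1, "pts": 3},   # too early
    {"period": 4, "seconds_left": 120, "margin": 9, "pts": 2},   # margin too large
    {"period": 5, "seconds_left": 200, "margin": 0, "pts": 1},   # overtime, clutch
]

clutch_plays = [
    p for p in plays
    if is_clutch(p["period"], p["seconds_left"], p["margin"])
]
clutch_points = sum(p["pts"] for p in clutch_plays)
print(f"{len(clutch_plays)} clutch plays, {clutch_points} clutch points")
```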

Medium 35 min

Clean and Transform Player Data

Chapter 3: Data Wrangling with tidyverse and pandas

Given a messy dataset with missing values, inconsistent formats, and duplicates, clean and transform it into analysis-ready format. Calculate derived metrics and handle edge cases.

["data cleaning" "transformation" "missing values"]

Clean and Transform Player Data - Starter Code

R

library(tidyverse)

# Sample messy data (in practice, load from file)
messy_data <- tibble(
  player = c("LeBron James", "lebron james", "Kevin Durant", NA, "Stephen Curry"),
  team = c("LAL", "lal", "PHX", "GSW", "GSW"),
  pts = c("25.5", "25.5", "29.1", "30.2", "invalid"),
  games = c(55, 55, 47, 65, 56),
  min = c(35.2, 35.2, 34.8, 32.1, 34.7)
)

# TODO: Clean the data
# 1. Remove duplicates
# 2. Handle missing values
# 3. Convert pts to numeric
# 4. Standardize team abbreviations
# 5. Calculate pts_per_min

Python

import pandas as pd
import numpy as np

# Sample messy data
messy_data = pd.DataFrame({
    "player": ["LeBron James", "lebron james", "Kevin Durant", None, "Stephen Curry"],
    "team": ["LAL", "lal", "PHX", "GSW", "GSW"],
    "pts": ["25.5", "25.5", "29.1", "30.2", "invalid"],
    "games": [55, 55, 47, 65, 56],
    "min": [35.2, 35.2, 34.8, 32.1, 34.7]
})

# TODO: Clean the data
# 1. Remove duplicates
# 2. Handle missing values
# 3. Convert pts to numeric
# 4. Standardize team abbreviations
# 5. Calculate pts_per_min

Easy 15 min

Calculate True Shooting Percentage

Write a function to calculate True Shooting Percentage (TS%) for any player given their points, field goal attempts, and free throw attempts. Test with sample data.

functions metrics efficiency

Calculate True Shooting Percentage - Starter Code

R

# Create a function called calculate_ts
# Arguments: points (PTS), fga (FGA), fta (FTA)
# Return: True Shooting Percentage

calculate_ts <- function(pts, fga, fta) {
  # Your code here
}

# Test cases (TS% = PTS / (2 * (FGA + 0.44 * FTA))):
# calculate_ts(25, 18, 6) should return ~0.606
# calculate_ts(30, 20, 10) should return ~0.615

Python

# Create a function called calculate_ts
# Arguments: pts, fga, fta
# Return: True Shooting Percentage

def calculate_ts(pts, fga, fta):
    """Calculate True Shooting Percentage"""
    # Your code here
    pass

# Test cases (TS% = PTS / (2 * (FGA + 0.44 * FTA))):
# calculate_ts(25, 18, 6) should return ~0.606
# calculate_ts(30, 20, 10) should return ~0.615

Easy 20 min

Filter High-Volume Scorers

Load player statistics and filter to only include players who average at least 20 points per game and play at least 30 minutes per game.

data-wrangling filtering pandas tidyverse

Filter High-Volume Scorers - Starter Code

R

library(hoopR)
library(tidyverse)

# Get 2023-24 player stats
# Filter for PPG >= 20 and MPG >= 30
# Display player name, team, PPG, and MPG

# Your code here

Python

from nba_api.stats.endpoints import leaguedashplayerstats
import pandas as pd

# Get 2023-24 player stats
# Filter for PPG >= 20 and MPG >= 30
# Display player name, team, PPG, and MPG

# Your code here

Medium 25 min

Team Assist Leaders

Find the assist leader for each NBA team. Display the player name, team abbreviation, and assists per game.

groupby aggregation data-wrangling

Team Assist Leaders - Starter Code

R

library(hoopR)
library(tidyverse)

# Get player stats
# Group by team
# Find max assists per team
# Display results

# Your code here

Python

from nba_api.stats.endpoints import leaguedashplayerstats
import pandas as pd

# Get player stats
# Group by team
# Find max assists per team
# Display results

# Your code here

Medium 25 min

Calculate Per-100 Possession Stats

Convert raw counting statistics to per-100-possession rates. Create a function that takes points, rebounds, assists, and possessions played, returning the per-100 rates.

functions pace-adjustment normalization

Calculate Per-100 Possession Stats - Starter Code

R

# Create per_100_stats function
# Input: pts, reb, ast, possessions
# Output: named vector with per-100 rates

per_100_stats <- function(pts, reb, ast, poss) {
  # Your code here
}

# Test: A player with 20 pts, 8 reb, 5 ast in 80 possessions
# Should return pts100 = 25, reb100 = 10, ast100 = 6.25

Python

# Create per_100_stats function
# Input: pts, reb, ast, possessions
# Output: dictionary with per-100 rates

def per_100_stats(pts, reb, ast, poss):
    """Convert to per-100 possession rates"""
    # Your code here
    pass

# Test: A player with 20 pts, 8 reb, 5 ast in 80 possessions
# Should return pts100 = 25, reb100 = 10, ast100 = 6.25

Medium 45 min

Create a Shot Chart

Using shot location data, create a basic shot chart visualization showing makes and misses on a basketball court diagram.

visualization shot-charts ggplot2 matplotlib

Create a Shot Chart - Starter Code

R

library(hoopR)
library(ggplot2)

# Get shot data for a player (e.g., Stephen Curry)
# Create court outline
# Plot shots colored by make/miss
# Add title and legend

# Your code here

Python

from nba_api.stats.endpoints import shotchartdetail
import matplotlib.pyplot as plt
import matplotlib.patches as patches

# Get shot data for a player
# Create court outline
# Plot shots colored by make/miss
# Add title and legend

# Your code here

Hard 60 min

Hexbin Shot Chart

Create a hexbin shot chart showing shooting efficiency by court zone. Use color to indicate efficiency (FG%) and size/opacity for volume.

visualization hexbin shot-quality

Hexbin Shot Chart - Starter Code

R

library(hoopR)
library(ggplot2)
library(hexbin)

# Get shot data
# Create hexagonal bins by court location
# Color by efficiency, size by volume
# Add court lines and labels

# Your code here

Python

from nba_api.stats.endpoints import shotchartdetail
import matplotlib.pyplot as plt
from matplotlib.collections import PatchCollection
import numpy as np

# Get shot data
# Create hexagonal bins by court location
# Color by efficiency, size by volume
# Add court lines and labels

# Your code here

Hard 45 min

Calculate Player Efficiency Rating

Implement the PER formula to calculate Player Efficiency Rating. Compare your results to published values.

per advanced-metrics formulas

Calculate Player Efficiency Rating - Starter Code

R

# Unadjusted PER (uPER) formula; PER then adjusts uPER for team pace and league average:
# uPER = (1/MP) * [3PM + (2/3)*AST + (2 - factor*(tm_ast/tm_fg))*FG
#       + FT*0.5*(1+(1-(tm_ast/tm_fg))+(2/3)*(tm_ast/tm_fg))
#       - VOP*TOV - VOP*DRB%*(FGA-FG) - VOP*0.44*(0.44+(0.56*DRB%))*(FTA-FT)
#       + VOP*(1-DRB%)*(TRB-ORB) + VOP*DRB%*ORB + VOP*STL + VOP*DRB%*BLK
#       - PF*(lg_FT/lg_PF - 0.44*(lg_FTA/lg_PF)*VOP)]

# Implement step by step
# Your code here

Python

# PER Formula - Implement step by step
# This is a complex calculation with many components

# Your code here

Hard 50 min

Calculate Box Plus-Minus

Implement a simplified BPM calculation. Use the box score components to estimate a player's point differential contribution.

bpm plus-minus regression

Calculate Box Plus-Minus - Starter Code

R

# BPM uses regression weights on box score stats
# Simplified formula (actual uses more complex coefficients):
# BPM ≈ 0.123*ORB% + 0.053*DRB% - 0.104*AST% + 0.076*STL%
#       + 0.131*BLK% - 0.036*TOV% + 0.003*USG% - 0.087*Position

# Implement and test on sample players
# Your code here

Python

# BPM uses regression weights on box score stats
# Implement the simplified calculation

# Your code here
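
The simplified formula listed in the R starter code above translates directly into a weighted sum, sketched here in Python. The coefficients are the exercise's illustrative values, not the published BPM weights. The dictionary keys, the numeric 1 to 5 position encoding, and the sample player are assumptions.

```python
# Weights copied from the simplified formula in the exercise
BPM_WEIGHTS = {
    "orb_pct": 0.123,
    "drb_pct": 0.053,
    "ast_pct": -0.104,
    "stl_pct": 0.076,
    "blk_pct": 0.131,
    "tov_pct": -0.036,
    "usg_pct": 0.003,
    "position": -0.087,  # assumed numeric encoding: 1 (PG) to 5 (C)
}


def simplified_bpm(player):
    """Weighted sum of rate stats using the exercise's simplified weights."""
    return sum(weight * player[stat] for stat, weight in BPM_WEIGHTS.items())


# Hypothetical player profile (percentages on a 0-100 scale)
demo_player = {"orb_pct": 5, "drb_pct": 15, "ast_pct": 20, "stl_pct": 2,
               "blk_pct": 1, "tov_pct": 12, "usg_pct": 25, "position": 3}
print(round(simplified_bpm(demo_player), 3))  # -1.005
```

Keeping the weights in a dictionary makes it easy to swap in different coefficients when testing on sample players.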

Medium 40 min

Analyze Tracking Data Profiles

Load player tracking data and create player profiles based on speed, touches, and time of possession. Identify different player types.

tracking clustering player-types

Analyze Tracking Data Profiles - Starter Code

R

library(hoopR)
library(tidyverse)

# Get tracking data
# Calculate key metrics: speed, touches, time of possession
# Identify clusters of player types
# Visualize the profiles

# Your code here

Python

from nba_api.stats.endpoints import leaguedashptstats
import pandas as pd
from sklearn.cluster import KMeans

# Get tracking data
# Calculate key metrics
# Identify clusters of player types
# Visualize the profiles

# Your code here

Hard 50 min

Defensive Rating Analysis

Calculate individual defensive ratings using tracking data. Compare rim protection, perimeter defense, and overall impact.

defense tracking rating

Defensive Rating Analysis - Starter Code

R

library(hoopR)
library(tidyverse)

# Get defensive tracking data
# Calculate DFG% at rim and perimeter
# Compare to opponent average FG%
# Rank players by defensive impact

# Your code here

Python

from nba_api.stats.endpoints import leaguedashptdefend
import pandas as pd

# Get defensive tracking data
# Calculate DFG% at rim and perimeter
# Compare to opponent average FG%
# Rank players by defensive impact

# Your code here

Medium 35 min

Contested Rebound Analysis

Analyze the relationship between contested rebounds and team defensive rebounding success. Which players provide the most value?

rebounding hustle correlation

Contested Rebound Analysis - Starter Code

R

library(hoopR)
library(tidyverse)

# Get hustle stats with rebound data
# Calculate contested vs uncontested rebounds
# Correlate with team defensive rebounding
# Identify high-value rebounders

# Your code here

Python

from nba_api.stats.endpoints import leaguehustlestatsplayer
import pandas as pd

# Get hustle stats with rebound data
# Calculate contested vs uncontested rebounds
# Correlate with team defensive rebounding
# Identify high-value rebounders

# Your code here

Medium 40 min

Playmaking Value Analysis

Compare assist-based playmaking metrics. Calculate potential assists, assist conversion rate, and points created for top playmakers.

playmaking assists creation

Playmaking Value Analysis - Starter Code

R

library(hoopR)
library(tidyverse)

# Get passing stats
# Calculate potential assists and conversion rate
# Estimate points created from assists
# Compare top playmakers

# Your code here

Python

from nba_api.stats.endpoints import leaguedashptstats
import pandas as pd

# Get passing stats
# Calculate potential assists and conversion rate
# Estimate points created from assists
# Compare top playmakers

# Your code here

Hard 60 min

Build Shot Quality Model

Create a logistic regression model predicting shot make probability based on distance, shot type, and defender distance.

machine-learning shot-quality logistic-regression

Build Shot Quality Model - Starter Code

R

library(hoopR)
library(tidyverse)

# Get shot data with features
# Build logistic regression model
# Predict make probability
# Calculate expected points (xPTS)
# Evaluate model accuracy

# Your code here

Python

from nba_api.stats.endpoints import shotchartdetail
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
import pandas as pd

# Get shot data with features
# Build logistic regression model
# Predict make probability
# Calculate expected points (xPTS)
# Evaluate model accuracy

# Your code here

Medium 45 min

Three-Point Revolution Analysis

Analyze the historical trend of three-point shooting. Calculate year-over-year changes in 3PA, 3P%, and the decline of mid-range.

historical trends visualization

Three-Point Revolution Analysis - Starter Code

R

library(hoopR)
library(tidyverse)
library(ggplot2)

# Get historical league averages
# Calculate 3PA and 3P% by year
# Track mid-range volume decline
# Visualize the revolution

# Your code here

Python

from basketball_reference_scraper import seasons
import pandas as pd
import matplotlib.pyplot as plt

# Get historical league averages
# Calculate 3PA and 3P% by year
# Track mid-range volume decline
# Visualize the revolution

# Your code here

Medium 40 min

Shot Selection Optimization

Given shot quality data, calculate each player's expected points from their shot distribution. Identify players with good/bad shot selection.

shot-selection expected-value optimization

Shot Selection Optimization - Starter Code

R

library(hoopR)
library(tidyverse)

# Get shot zone data for players
# Calculate expected points by zone
# Compare to league average
# Identify best/worst shot selection

# Your code here

Python

from nba_api.stats.endpoints import playerdashptshots
import pandas as pd

# Get shot zone data for players
# Calculate expected points by zone
# Compare to league average
# Identify best/worst shot selection

# Your code here

Hard 75 min

Build Player Projection Model

Create a simple projection model for next-season performance using previous seasons, age, and regression to the mean.

machine-learning projection regression

Build Player Projection Model - Starter Code

R

library(hoopR)
library(tidyverse)

# Get multi-year player data
# Create features (previous stats, age, etc.)
# Build regression model
# Project next season
# Validate on historical data

# Your code here

Python

import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Get multi-year player data
# Create features (previous stats, age, etc.)
# Build regression model
# Project next season
# Validate on historical data

# Your code here

Hard 60 min

Create Custom Metric

Design and implement your own composite metric measuring a specific aspect of player value. Validate its predictive power.

metrics custom validation

Create Custom Metric - Starter Code

R

library(hoopR)
library(tidyverse)

# Define what you want to measure
# Select component statistics
# Weight and combine components
# Test correlation with wins
# Document and explain the metric

# Your code here

Python

import pandas as pd
from scipy.stats import pearsonr

# Define what you want to measure
# Select component statistics
# Weight and combine components
# Test correlation with wins
# Document and explain the metric

# Your code here

Hard 60 min

Player Similarity Tool

Build a tool that finds the most similar players to a given player using statistical profiles and clustering.

similarity clustering machine-learning

Player Similarity Tool - Starter Code

R

library(hoopR)
library(tidyverse)

# Get player statistical profiles
# Normalize/scale features
# Calculate similarity (cosine or Euclidean)
# Find N most similar players
# Create visualization

# Your code here

Python

from sklearn.preprocessing import StandardScaler
from sklearn.metrics.pairwise import cosine_similarity
import pandas as pd

# Get player statistical profiles
# Normalize/scale features
# Calculate similarity (cosine or Euclidean)
# Find N most similar players
# Create visualization

# Your code here
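
A minimal sketch of the scale, compare, and rank steps, using only the standard library so it runs without any data download. The five players and their points/rebounds/assists profiles are hypothetical; with real data, `StandardScaler` and `cosine_similarity` from the starter imports would replace the hand-written helpers.

```python
import math
import statistics


def standardize(profiles):
    """Z-score each stat column across players. profiles: {player: [stats]}."""
    names = list(profiles)
    columns = list(zip(*(profiles[name] for name in names)))
    means = [statistics.fmean(col) for col in columns]
    sds = [statistics.pstdev(col) or 1.0 for col in columns]
    return {name: [(v - m) / s for v, m, s in zip(profiles[name], means, sds)]
            for name in names}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(profiles, target, n=3):
    """Return the n players most similar to target as (name, score) pairs."""
    scaled = standardize(profiles)
    scores = [(other, cosine(scaled[target], scaled[other]))
              for other in scaled if other != target]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:n]


# Invented per-game profiles: [points, rebounds, assists].
profiles = {
    "Guard A": [25.0, 4.0, 8.0],
    "Guard B": [22.0, 3.5, 7.5],
    "Center A": [14.0, 12.0, 2.0],
    "Center B": [12.0, 11.0, 1.5],
    "Wing A": [18.0, 6.0, 4.0],
}

top = most_similar(profiles, "Guard A", n=2)
for name, score in top:
    print(f"{name}: {score:.3f}")
```

Scaling before comparing matters: without it, the stat with the largest raw range (usually points) dominates the similarity score.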

Medium 45 min

Lineup Analysis Tool

Analyze the performance of different 5-player lineups. Calculate net rating and identify the best/worst combinations.

lineups net-rating combinations

Lineup Analysis Tool - Starter Code

R

library(hoopR)
library(tidyverse)

# Get lineup data for a team
# Calculate net rating for each lineup
# Filter for adequate minutes
# Identify best/worst lineups
# Analyze what makes them work

# Your code here

Python

from nba_api.stats.endpoints import teamdashlineups
import pandas as pd

# Get lineup data for a team
# Calculate net rating for each lineup
# Filter for adequate minutes
# Identify best/worst lineups
# Analyze what makes them work

# Your code here
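
The net-rating and minutes-filter steps might look like the sketch below. The lineup rows are invented, the field names are hypothetical rather than the columns returned by any API, and the code assumes both teams used the same number of possessions while the lineup was on the floor. The 100-minute cutoff is likewise an arbitrary choice.

```python
# Invented lineup totals for one team.
lineups = [
    {"players": ("A", "B", "C", "D", "E"), "minutes": 310,
     "pts_for": 720, "pts_against": 668, "poss": 640},
    {"players": ("A", "B", "C", "D", "F"), "minutes": 150,
     "pts_for": 330, "pts_against": 345, "poss": 300},
    {"players": ("A", "G", "H", "I", "J"), "minutes": 12,
     "pts_for": 40, "pts_against": 20, "poss": 25},
    {"players": ("B", "C", "D", "E", "F"), "minutes": 205,
     "pts_for": 450, "pts_against": 440, "poss": 420},
]


def net_rating(lineup):
    """Point differential per 100 possessions."""
    return 100 * (lineup["pts_for"] - lineup["pts_against"]) / lineup["poss"]


def rank_lineups(rows, min_minutes=100):
    """Drop small samples, then sort best to worst by net rating."""
    qualified = [row for row in rows if row["minutes"] >= min_minutes]
    return sorted(qualified, key=net_rating, reverse=True)


ranked = rank_lineups(lineups)
best, worst = ranked[0], ranked[-1]
for row in ranked:
    print("-".join(row["players"]), f"{net_rating(row):+.1f}")
```

Note how the 12-minute lineup would top the list at +80.0 if it were not filtered out, which is why the minutes threshold comes before the ranking.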

Easy 25 min

Pace-Adjusted Analysis

Convert a team's statistics to pace-adjusted values. Compare raw vs. pace-adjusted rankings.

pace adjustment normalization

Pace-Adjusted Analysis - Starter Code

R

library(hoopR)
library(tidyverse)

# Get team stats including pace
# Calculate per-100-possession values
# Compare raw vs adjusted rankings
# Identify biggest movers

# Your code here

Python

from nba_api.stats.endpoints import teamgamelogs
import pandas as pd

# Get team stats including pace
# Calculate per-100-possession values
# Compare raw vs adjusted rankings
# Identify biggest movers

# Your code here
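
As a rough illustration of why pace adjustment changes rankings, the sketch below converts points per game to points per 100 possessions for three invented teams and compares the two orderings. The team names and numbers are hypothetical.

```python
# Invented per-game team numbers.
teams = {
    "Fast Team": {"pts": 118.0, "poss": 104.0},
    "Mid Team": {"pts": 115.0, "poss": 100.0},
    "Slow Team": {"pts": 112.0, "poss": 96.0},
}


def per_100(stat, possessions):
    """Scale a per-game stat to a per-100-possession rate."""
    return 100 * stat / possessions


def rank_desc(values):
    """Map each key to its 1-based rank, highest value first."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {name: i + 1 for i, name in enumerate(ordered)}


raw = {name: row["pts"] for name, row in teams.items()}
adjusted = {name: per_100(row["pts"], row["poss"]) for name, row in teams.items()}

raw_rank = rank_desc(raw)
adj_rank = rank_desc(adjusted)

# Positive movement means the team climbs once pace is removed.
movement = {name: raw_rank[name] - adj_rank[name] for name in teams}
for name in teams:
    print(f"{name}: raw #{raw_rank[name]}, adjusted #{adj_rank[name]} ({movement[name]:+d})")
```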

Hard 60 min

Player Archetype Clustering

Use K-means clustering to identify player archetypes based on statistical profiles. Visualize and label the clusters.

clustering archetypes k-means

Player Archetype Clustering - Starter Code

R

library(hoopR)
library(tidyverse)
library(cluster)

# Get comprehensive player stats
# Select features for clustering
# Run K-means with various k
# Visualize clusters with PCA
# Label and interpret archetypes

# Your code here

Python

from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
import pandas as pd
import matplotlib.pyplot as plt

# Get comprehensive player stats
# Select features for clustering
# Run K-means with various k
# Visualize clusters with PCA
# Label and interpret archetypes

# Your code here
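
To make the "run K-means with various k" step concrete, here is a small hand-written version of Lloyd's algorithm. It is a teaching sketch, not a substitute for `sklearn.cluster.KMeans`: it uses a deterministic farthest-point initialization (an assumption made so the result is reproducible), and the nine pre-scaled player profiles are invented. The PCA visualization step is not covered here.

```python
import math


def init_centroids(points, k):
    """Farthest-point initialization: start at the first point, then keep
    adding the point farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, iterations=50):
    """Lloyd's algorithm. Returns (labels, centroids)."""
    centroids = init_centroids(points, k)
    labels = []
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


def inertia(points, labels, centroids):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


# Invented, already-scaled profiles: (usage, three-point rate, rebound rate).
points = [
    (0.90, 0.50, 0.20), (0.85, 0.45, 0.25), (0.95, 0.55, 0.15),  # scorers
    (0.30, 0.90, 0.10), (0.35, 0.95, 0.15), (0.25, 0.85, 0.20),  # shooters
    (0.40, 0.05, 0.90), (0.45, 0.10, 0.95), (0.50, 0.00, 0.85),  # bigs
]

labels, centroids = kmeans(points, 3)
inertia_by_k = {k: inertia(points, *kmeans(points, k)) for k in (1, 2, 3)}
print(labels)
print({k: round(v, 3) for k, v in inertia_by_k.items()})
```

Plotting inertia against k and looking for the "elbow" is one common way to choose the number of archetypes.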

Medium 40 min

Contract Value Analysis

Calculate the value per dollar for player contracts. Identify the best and worst values in the league.

salary value contracts

Contract Value Analysis - Starter Code

R

library(hoopR)
library(tidyverse)

# Get player salaries and stats
# Estimate wins contributed (using BPM or WS)
# Calculate value per dollar
# Identify best/worst contracts
# Consider age and years remaining

# Your code here

Python

import pandas as pd

# Get player salaries and stats
# Estimate wins contributed
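# (for example, using BPM or WS, as in the R version)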
# Calculate value per dollar
# Identify best/worst contracts
# Consider age and years remaining

# Your code here
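
Two simple ways to express value per dollar are wins per million dollars and surplus value against a market price per win. The sketch below shows both on three invented contracts. The $3.5M-per-win market rate is an assumed figure for illustration only, and the wins estimates are made up rather than derived from BPM or WS.

```python
# Invented contracts: salary in millions of dollars, estimated wins contributed.
contracts = {
    "Player A": {"salary_m": 40.0, "wins": 10.0},
    "Player B": {"salary_m": 5.0, "wins": 4.0},
    "Player C": {"salary_m": 25.0, "wins": 2.0},
}

MARKET_RATE_M = 3.5  # assumed market price of one win, in $M (illustrative)


def wins_per_million(row):
    """Wins contributed per $1M of salary."""
    return row["wins"] / row["salary_m"]


def surplus_value(row, market_rate=MARKET_RATE_M):
    """Market value of production minus salary, in $M."""
    return row["wins"] * market_rate - row["salary_m"]


by_surplus = sorted(contracts, key=lambda name: surplus_value(contracts[name]), reverse=True)
best_contract, worst_contract = by_surplus[0], by_surplus[-1]

for name in by_surplus:
    row = contracts[name]
    print(f"{name}: {wins_per_million(row):.2f} wins/$M, surplus {surplus_value(row):+.1f}M")
```

The two measures can disagree: a star on a large contract may have low wins per million yet still be hard to replace, which is where age and years remaining enter the analysis.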

Hard 60 min

Aging Curve Construction

Build aging curves showing how different skills change with age. Use the delta method on year-over-year changes.

aging projection regression

Aging Curve Construction - Starter Code

R

library(hoopR)
library(tidyverse)

# Get multi-year player data
# Calculate year-over-year changes
# Group by age
# Build aging curves
# Visualize by skill type

# Your code here

Python

import pandas as pd
import matplotlib.pyplot as plt

# Get multi-year player data
# Calculate year-over-year changes
# Group by age
# Build aging curves
# Visualize by skill type

# Your code here
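
The delta method pairs each player's consecutive seasons, averages the change at each age, and chains those average changes into a curve. A minimal sketch on invented data follows; it uses an unweighted mean of the deltas and makes no correction for survivor bias, both of which are simplifications.

```python
from collections import defaultdict


def age_deltas(seasons):
    """Average year-over-year change, keyed by the age the player is turning.

    seasons: iterable of (player, age, value) rows.
    """
    by_player = defaultdict(dict)
    for player, age, value in seasons:
        by_player[player][age] = value

    deltas = defaultdict(list)
    for ages in by_player.values():
        for age, value in ages.items():
            if age + 1 in ages:  # only consecutive seasons form a pair
                deltas[age + 1].append(ages[age + 1] - value)
    return {age: sum(d) / len(d) for age, d in sorted(deltas.items())}


def aging_curve(mean_deltas):
    """Cumulative sum of mean deltas, relative to the youngest paired age."""
    curve, total = {}, 0.0
    for age in sorted(mean_deltas):
        total += mean_deltas[age]
        curve[age] = total
    return curve


# Invented (player, age, skill value) rows.
seasons = [
    ("P1", 24, 10.0), ("P1", 25, 12.0), ("P1", 26, 13.0),
    ("P2", 25, 8.0), ("P2", 26, 9.0), ("P2", 27, 8.5),
    ("P3", 26, 15.0), ("P3", 27, 14.0),
]

mean_deltas = age_deltas(seasons)
curve = aging_curve(mean_deltas)
peak_age = max(curve, key=curve.get)
print(mean_deltas, curve, peak_age)
```

Running this separately for each skill (shooting, rebounding, and so on) gives the per-skill curves the exercise asks for.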

Hard 75 min

Draft Projection Model

Build a simple NBA draft projection model using college statistics. Predict NBA success based on college performance.

draft projection machine-learning

Draft Projection Model - Starter Code

R

library(hoopR)
library(tidyverse)

# Get historical draft and college data
# Define success metric (NBA WAR, etc.)
# Feature engineering
# Train prediction model
# Evaluate on recent drafts

# Your code here

Python

from sklearn.ensemble import GradientBoostingRegressor
import pandas as pd

# Get historical draft and college data
# Define success metric
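# (for example, NBA WAR, as in the R version)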
# Feature engineering
# Train prediction model
# Evaluate on recent drafts

# Your code here
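
The "evaluate on recent drafts" step implies a time-based split: fit on older draft classes, then score the newer ones. The sketch below shows that split with a one-feature least-squares line standing in for the gradient boosting model from the starter imports. The prospects, their college numbers, and the 2018 cutoff are all invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mean_abs_error(predicted, actual):
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Invented rows: (draft year, college rating, later NBA value).
prospects = [
    (2014, 4.0, 2.0), (2014, 8.0, 5.5), (2015, 6.0, 4.5), (2015, 10.0, 8.0),
    (2016, 5.0, 3.0), (2016, 9.0, 6.5), (2017, 7.0, 5.0), (2017, 11.0, 8.5),
    (2018, 6.0, 3.5), (2018, 10.0, 7.5), (2019, 5.0, 3.5), (2019, 12.0, 9.0),
]

# Time-based split: never train on drafts that come after the ones being scored.
older = [row for row in prospects if row[0] < 2018]
recent = [row for row in prospects if row[0] >= 2018]

slope, intercept = fit_line([r[1] for r in older], [r[2] for r in older])
predictions = [slope * r[1] + intercept for r in recent]
actual = [r[2] for r in recent]

model_mae = mean_abs_error(predictions, actual)

# Baseline: predict the training-set average for everyone.
train_mean = sum(r[2] for r in older) / len(older)
baseline_mae = mean_abs_error([train_mean] * len(recent), actual)
print(f"model MAE {model_mae:.3f} vs baseline MAE {baseline_mae:.3f}")
```

Comparing against a naive baseline is a quick check that the model has learned something beyond the average outcome.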

Medium 45 min

Era-Adjusted Comparison

Compare players across eras using era-adjusted statistics. Who was truly the best scorer relative to their era?

historical era-adjustment comparison

Era-Adjusted Comparison - Starter Code

R

library(hoopR)
library(tidyverse)

# Get historical player and league data
# Calculate Z-scores relative to era
# Compare across different eras
# Create era-adjusted rankings

# Your code here

Python

import pandas as pd

# Get historical player and league data
# Calculate Z-scores relative to era
# Compare across different eras
# Create era-adjusted rankings

# Your code here
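
A z-score expresses a player's scoring as the number of standard deviations above the league mean of his own era, which makes seasons from different scoring environments comparable. The sketch below uses two invented eras and two invented players to show how the raw and era-adjusted orderings can differ.

```python
import statistics

# Invented points-per-game values for qualifying players in each era.
eras = {
    "High-scoring era": [12.0, 15.0, 18.0, 20.0, 22.0, 25.0, 28.0],
    "Low-scoring era": [8.0, 11.0, 14.0, 16.0, 18.0, 21.0, 24.0],
}

# Invented players: (era, points per game).
players = {
    "Player X": ("High-scoring era", 30.1),
    "Player Y": ("Low-scoring era", 28.0),
}


def era_z(value, league_values):
    """Standard deviations above (or below) the era's league mean."""
    mean = statistics.fmean(league_values)
    sd = statistics.pstdev(league_values)
    return (value - mean) / sd


z_scores = {name: era_z(ppg, eras[era]) for name, (era, ppg) in players.items()}

raw_ranking = sorted(players, key=lambda name: players[name][1], reverse=True)
adjusted_ranking = sorted(z_scores, key=z_scores.get, reverse=True)

print("raw:", raw_ranking)
print("era-adjusted:", adjusted_ranking, {k: round(v, 2) for k, v in z_scores.items()})
```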

Hard 60 min

All-Time Ranking System

Create a composite ranking system for all-time greatest players combining peak performance, longevity, and championships.

historical ranking composite

All-Time Ranking System - Starter Code

R

library(hoopR)
library(tidyverse)

# Get career stats for all-time players
# Calculate peak (best 5-year WAR)
# Calculate career value (total WAR)
# Factor in championships
# Create weighted composite ranking

# Your code here

Python

import pandas as pd

# Get career stats for all-time players
# Calculate peak (best 5-year WAR)
# Calculate career value (total WAR)
# Factor in championships
# Create weighted composite ranking

# Your code here
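
One way to combine peak, longevity, and championships is to rescale each to a 0-1 range and take a weighted sum. The sketch below does that for three invented careers. The 0.5 / 0.3 / 0.2 weights are an assumption, and "peak" is interpreted here as the best five consecutive seasons, which is only one possible reading of "best 5-year WAR".

```python
def best_window(values, size=5):
    """Largest sum over any run of `size` consecutive seasons."""
    if len(values) <= size:
        return sum(values)
    return max(sum(values[i:i + size]) for i in range(len(values) - size + 1))


def min_max(values):
    """Rescale a {name: value} mapping to the 0-1 range."""
    low, high = min(values.values()), max(values.values())
    return {k: (v - low) / (high - low) if high > low else 0.0
            for k, v in values.items()}


# Invented careers: season-by-season WAR and championship counts.
careers = {
    "Long Career": {"war": [5.0] * 15, "rings": 1},
    "High Peak": {"war": [3.0, 9.0, 10.0, 10.0, 9.0, 8.0, 3.0], "rings": 2},
    "Role Player": {"war": [2.0] * 10, "rings": 3},
}

WEIGHTS = {"peak": 0.5, "career": 0.3, "rings": 0.2}  # assumed weights

peak = min_max({name: best_window(c["war"]) for name, c in careers.items()})
career = min_max({name: sum(c["war"]) for name, c in careers.items()})
rings = min_max({name: c["rings"] for name, c in careers.items()})

composite = {
    name: WEIGHTS["peak"] * peak[name]
    + WEIGHTS["career"] * career[name]
    + WEIGHTS["rings"] * rings[name]
    for name in careers
}
ranking = sorted(composite, key=composite.get, reverse=True)
print(ranking, {k: round(v, 3) for k, v in composite.items()})
```

Changing the weights changes the ranking, so part of the exercise is justifying whichever weights are chosen.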

Hard 60 min

Hall of Fame Probability Model

Build a logistic regression model predicting Hall of Fame induction based on career statistics and awards.

classification hall-of-fame logistic-regression

Hall of Fame Probability Model - Starter Code

R

library(hoopR)
library(tidyverse)

# Get career stats and HoF status
# Feature engineering (awards, totals, etc.)
# Build logistic regression model
# Predict probabilities for active players
# Calibrate and validate

# Your code here

Python

from sklearn.linear_model import LogisticRegression
import pandas as pd

# Get career stats and HoF status
# Feature engineering
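# (for example, awards and career totals, as in the R version)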
# Build logistic regression model
# Predict probabilities for active players
# Calibrate and validate

# Your code here
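
To show what the logistic regression step is doing under the hood, here is a small gradient-descent implementation on invented careers. It is a sketch for intuition only; in practice `sklearn.linear_model.LogisticRegression` from the starter imports would be used. The two features (All-Star selections and MVP awards), the labels, the learning rate, and the epoch count are all assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1 / (1 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1 + ez)


def predict_proba(row, weights, bias):
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))


def fit_logistic(X, y, lr=0.05, epochs=5000):
    """Batch gradient descent on the mean log-loss."""
    weights, bias = [0.0] * len(X[0]), 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(weights), 0.0
        for row, label in zip(X, y):
            error = predict_proba(row, weights, bias) - label
            grad_b += error
            for j, x in enumerate(row):
                grad_w[j] += error * x
        bias -= lr * grad_b / n
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    return weights, bias


# Invented careers: [All-Star selections, MVP awards] and Hall of Fame label.
X = [[0, 0], [1, 0], [2, 0], [3, 0], [5, 0],
     [6, 0], [8, 1], [10, 1], [12, 2], [4, 0]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

weights, bias = fit_logistic(X, y)
p_fringe = predict_proba([0, 0], weights, bias)
p_star = predict_proba([12, 2], weights, bias)
print(f"no accolades: {p_fringe:.3f}, perennial All-Star: {p_star:.3f}")
```

The calibration step in the exercise asks a further question this sketch does not answer: among players given, say, a 70% probability, do about 70% actually get inducted?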

Hard 75 min

Championship Probability Model

Build a model predicting championship probability for each team based on point differential and other factors.

championship simulation prediction

Championship Probability Model - Starter Code

R

library(hoopR)
library(tidyverse)

# Get team ratings and standings
# Build win probability model
# Simulate playoff bracket
# Calculate championship probabilities
# Compare to betting markets

# Your code here

Python

import pandas as pd
import numpy as np

# Get team ratings and standings
# Build win probability model
# Simulate playoff bracket
# Calculate championship probabilities
# Compare to betting markets

# Your code here
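
The simulation steps can be sketched as a Monte Carlo loop over a small bracket. Everything specific here is assumed for illustration: the four teams and their net ratings are invented, the logistic link with a scale of 0.15 is an arbitrary choice rather than a fitted model, and home-court advantage is ignored. The bracket list must have a power-of-two length, with adjacent entries playing each other.

```python
import math
import random


def game_win_prob(rating_a, rating_b, scale=0.15):
    """Single-game win probability for team A from the rating gap (assumed logistic link)."""
    return 1 / (1 + math.exp(-scale * (rating_a - rating_b)))


def series_winner(team_a, team_b, ratings, rng, wins_needed=4):
    """Simulate one best-of-seven series and return the winner."""
    p = game_win_prob(ratings[team_a], ratings[team_b])
    wins_a = wins_b = 0
    while wins_a < wins_needed and wins_b < wins_needed:
        if rng.random() < p:
            wins_a += 1
        else:
            wins_b += 1
    return team_a if wins_a == wins_needed else team_b


def simulate_titles(ratings, bracket, n_sims=5000, seed=42):
    """Return {team: share of simulations in which the team won the title}."""
    rng = random.Random(seed)
    titles = {team: 0 for team in ratings}
    for _ in range(n_sims):
        alive = list(bracket)
        while len(alive) > 1:
            alive = [series_winner(alive[i], alive[i + 1], ratings, rng)
                     for i in range(0, len(alive), 2)]
        titles[alive[0]] += 1
    return {team: count / n_sims for team, count in titles.items()}


# Invented net ratings; the bracket pairs seed 1 with 4 and seed 2 with 3.
ratings = {"Team A": 8.0, "Team B": 4.0, "Team C": 2.0, "Team D": -1.0}
bracket = ["Team A", "Team D", "Team B", "Team C"]

title_odds = simulate_titles(ratings, bracket)
favorite = max(title_odds, key=title_odds.get)
print(title_odds)
```

The resulting probabilities can then be set beside implied probabilities from betting markets, as the last step of the exercise suggests.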

Medium 30 min

True Shooting Leaderboard Analysis

Build a complete True Shooting % leaderboard with volume filters, league comparisons, and visualization.

True Shooting Leaderboard Analysis - Starter Code

R

library(tidyverse)
library(hoopR)

# TODO: Load player statistics for 2023-24
# TODO: Calculate True Shooting %
# TODO: Filter for qualified players (10+ FGA per game)
# TODO: Calculate league-relative TS (TS+)
# TODO: Identify the most efficient high-volume scorers
# TODO: Create a visualization

Python

from nba_api.stats.endpoints import LeagueDashPlayerStats
import pandas as pd
import matplotlib.pyplot as plt

# TODO: Load player statistics for 2023-24
# TODO: Calculate True Shooting %
# TODO: Filter for qualified players (10+ FGA per game)
# TODO: Calculate league-relative TS (TS+)
# TODO: Identify the most efficient high-volume scorers
# TODO: Create a visualization
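
As a reference for the calculation steps, True Shooting % is points divided by twice the "true shooting attempts", where a free-throw attempt counts as 0.44 of a shot. The sketch below applies that to three invented per-game stat lines, filters on the 10+ FGA threshold from the exercise, and computes TS+ as 100 times the ratio to league TS%. The league value of 0.580 is an assumed placeholder, not a looked-up figure.

```python
LEAGUE_TS = 0.580  # assumed league-average TS% (placeholder value)

# Invented per-game stat lines.
players = [
    {"name": "Player A", "pts": 30.0, "fga": 20.0, "fta": 8.0},
    {"name": "Player B", "pts": 12.0, "fga": 8.0, "fta": 2.0},
    {"name": "Player C", "pts": 22.0, "fga": 18.0, "fta": 4.0},
]


def true_shooting(pts, fga, fta):
    """TS% = PTS / (2 * (FGA + 0.44 * FTA))."""
    attempts = fga + 0.44 * fta
    return pts / (2 * attempts) if attempts else 0.0


def leaderboard(rows, min_fga=10.0, league_ts=LEAGUE_TS):
    """Qualified players sorted by TS%, with league-relative TS+ attached."""
    board = []
    for row in rows:
        if row["fga"] < min_fga:
            continue  # volume filter
        ts = true_shooting(row["pts"], row["fga"], row["fta"])
        board.append({"name": row["name"], "ts": ts, "ts_plus": 100 * ts / league_ts})
    return sorted(board, key=lambda r: r["ts"], reverse=True)


board = leaderboard(players)
for row in board:
    print(f"{row['name']}: TS% {row['ts']:.3f}, TS+ {row['ts_plus']:.0f}")
```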

Hard 30 min

Build a Simplified BPM Calculator

Create a Box Plus-Minus estimator using regression on box score statistics.

Build a Simplified BPM Calculator - Starter Code

R

library(tidyverse)

# TODO: Load historical player data with known BPM values
# TODO: Select relevant features (per-minute stats)
# TODO: Split into training and test sets
# TODO: Train a regression model
# TODO: Evaluate model accuracy
# TODO: Apply to current season players

Python

import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# TODO: Load historical player data with known BPM values
# TODO: Select relevant features (per-minute stats)
# TODO: Split into training and test sets
# TODO: Train a regression model
# TODO: Evaluate model accuracy
# TODO: Apply to current season players

Hard 30 min

Build a Shot Quality Model

Create an expected points model based on shot location and context.

Build a Shot Quality Model - Starter Code

R

library(tidyverse)
library(hoopR)

# TODO: Load shot chart data
# TODO: Engineer features (distance, angle, shot zone)
# TODO: Train a model to predict make probability
# TODO: Calculate expected points for each shot
# TODO: Evaluate players on shot selection vs shot making

Python

from nba_api.stats.endpoints import ShotChartDetail
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# TODO: Load shot chart data
# TODO: Engineer features (distance, angle, shot zone)
# TODO: Train a model to predict make probability
# TODO: Calculate expected points
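
The split between shot selection and shot making can be illustrated with a deliberately simple model: league make rates by distance bin, standing in for the random forest classifier in the starter imports. Expected points for a shot is the bin's make rate times the shot's value; a player's average expected points reflects selection, and actual minus expected reflects making. The coordinates (feet from the hoop), bin edges, and all shots below are invented.

```python
import math
from collections import defaultdict


def distance_bin(distance):
    """Assumed bin edges in feet; real zones would follow court geometry."""
    if distance < 8:
        return "rim"
    if distance < 22:
        return "mid"
    return "long"


def fit_bin_rates(training_shots):
    """Make rate per distance bin from (distance, made) pairs."""
    totals = defaultdict(lambda: [0, 0])  # bin -> [makes, attempts]
    for distance, made in training_shots:
        totals[distance_bin(distance)][0] += made
        totals[distance_bin(distance)][1] += 1
    return {name: makes / attempts for name, (makes, attempts) in totals.items()}


def evaluate(shots, rates):
    """Return (mean expected points, mean actual minus expected) for one player."""
    expected = actual = 0.0
    for x, y, value, made in shots:
        expected += rates[distance_bin(math.hypot(x, y))] * value
        actual += value * made
    n = len(shots)
    return expected / n, (actual - expected) / n


# Invented league sample: 60% at the rim, 40% mid-range, 35% from long range.
training = ([(3, 1)] * 6 + [(3, 0)] * 4 + [(15, 1)] * 4 + [(15, 0)] * 6
            + [(25, 1)] * 7 + [(25, 0)] * 13)
rates = fit_bin_rates(training)

# Invented player shots: (x, y, point value, made).
player_a = [(0, 2, 2, 1), (0, 25, 3, 1), (22, 0, 3, 0)]
player_b = [(10, 10, 2, 1), (-12, 5, 2, 0), (0, 15, 2, 0)]

selection_a, making_a = evaluate(player_a, rates)
selection_b, making_b = evaluate(player_b, rates)
print(f"A: selection {selection_a:.2f}, making {making_a:+.2f}")
print(f"B: selection {selection_b:.2f}, making {making_b:+.2f}")
```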

Medium 30 min

Player Similarity Finder

Build a system to find the most similar players based on statistical profiles using cosine similarity.

Player Similarity Finder - Starter Code

R

library(tidyverse)

# TODO: Load player statistics
# TODO: Normalize stats
# TODO: Calculate similarity
# TODO: Find most similar players

Python

import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.metrics.pairwise import cosine_similarity

# TODO: Load player statistics
# TODO: Normalize stats
# TODO: Calculate similarity matrix
# TODO: Create similarity finder function

Medium 30 min

Interactive Player Comparison Dashboard

Create an interactive radar chart visualization comparing multiple players across key metrics.

Interactive Player Comparison Dashboard - Starter Code

R

library(tidyverse)
library(plotly)

# TODO: Select players to compare
# TODO: Create radar chart
# TODO: Add interactivity

Python

import plotly.graph_objects as go
import pandas as pd

# TODO: Select players
# TODO: Create radar chart
# TODO: Add hover tooltips