Practice Exercises

Apply what you've learned with hands-on coding exercises. Each exercise includes starter code in both R and Python, along with solutions when you're ready.

70 exercises available

Medium 40 min
Create a Player Comparison Visualization

Chapter 4: Data Visualization Fundamentals

Build a multi-panel visualization comparing two players across key statistics. Include a bar chart for counting stats, a radar chart for percentages, and proper styling.

["visualization" "multi-panel" "comparison"]
Create a Player Comparison Visualization - Starter Code
R
library(tidyverse)
library(ggplot2)

# Player comparison data
player1 <- list(name = "Player A", pts = 28.5, reb = 7.2, ast = 5.4, stl = 1.2, fg_pct = 0.52, ft_pct = 0.85)
player2 <- list(name = "Player B", pts = 24.1, reb = 10.8, ast = 4.1, stl = 0.8, fg_pct = 0.58, ft_pct = 0.72)

# TODO: Create comparison visualization
# 1. Bar chart comparing pts, reb, ast, stl
# 2. Add proper labels and title
# 3. Use appropriate colors for each player
# 4. Create side-by-side or faceted layout
Python
import matplotlib.pyplot as plt
import numpy as np

# Player comparison data
player1 = {"name": "Player A", "pts": 28.5, "reb": 7.2, "ast": 5.4, "stl": 1.2, "fg_pct": 0.52, "ft_pct": 0.85}
player2 = {"name": "Player B", "pts": 24.1, "reb": 10.8, "ast": 4.1, "stl": 0.8, "fg_pct": 0.58, "ft_pct": 0.72}

# TODO: Create comparison visualization
# 1. Bar chart comparing pts, reb, ast, stl
# 2. Add proper labels and title
# 3. Use appropriate colors for each player
# 4. Create side-by-side layout
Medium 35 min
Hypothesis Testing for Three-Point Shooting

Chapter 5: Statistical Foundations for Analytics

Test whether there is a statistically significant difference in three-point shooting percentage between guards and forwards. Perform appropriate statistical tests and interpret results.

["hypothesis testing" "t-test" "statistical inference"]
Hypothesis Testing for Three-Point Shooting - Starter Code
R
library(tidyverse)

# Sample data
set.seed(42)
guards <- data.frame(
  position = "Guard",
  fg3_pct = rnorm(50, mean = 0.38, sd = 0.05)
)
forwards <- data.frame(
  position = "Forward",
  fg3_pct = rnorm(50, mean = 0.35, sd = 0.06)
)
players <- rbind(guards, forwards)

# TODO:
# 1. Calculate summary statistics for each group
# 2. Perform two-sample t-test
# 3. Calculate effect size (Cohen's d)
# 4. Interpret results
Python
import numpy as np
from scipy import stats
import pandas as pd

# Sample data
np.random.seed(42)
guards = pd.DataFrame({
    "position": "Guard",
    "fg3_pct": np.random.normal(0.38, 0.05, 50)
})
forwards = pd.DataFrame({
    "position": "Forward",
    "fg3_pct": np.random.normal(0.35, 0.06, 50)
})
players = pd.concat([guards, forwards], ignore_index=True)  # fresh 0..99 index instead of two overlapping 0..49 ranges

# TODO:
# 1. Calculate summary statistics for each group
# 2. Perform two-sample t-test
# 3. Calculate effect size (Cohen's d)
# 4. Interpret results
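One possible way to approach step 3 of the exercise above is sketched here, using only the Python standard library. It computes Cohen's d with the pooled standard deviation, which is one common definition of the effect size; the function name `cohens_d` is illustrative and not part of the starter code.

```python
import math
import statistics


def cohens_d(sample_a, sample_b):
    """Cohen's d for two independent samples, using the pooled standard deviation."""
    n_a, n_b = len(sample_a), len(sample_b)
    var_a = statistics.variance(sample_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(sample_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(sample_a) - statistics.mean(sample_b)) / pooled_sd
```

With the starter data, the two inputs would be the `fg3_pct` columns of `guards` and `forwards`, converted to lists with `.tolist()`. A positive result means the first group has the higher mean.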
Easy 25 min
Analyze Relationships Between Player Statistics

Chapter 5: Statistical Foundations for Analytics

Explore the correlations between key player statistics. Create a correlation matrix, identify the strongest relationships, and visualize the results with a heatmap.

["correlation" "heatmap" "exploratory analysis"]
Analyze Relationships Between Player Statistics - Starter Code
R
library(tidyverse)

# Sample player data
set.seed(42)
n <- 100
player_stats <- tibble(
  pts = runif(n, 5, 30),
  ast = runif(n, 1, 10),
  reb = runif(n, 2, 12),
  min = runif(n, 15, 38),
  tov = runif(n, 0.5, 4)
)
# Add some realistic correlations
player_stats$ast <- player_stats$ast + player_stats$pts * 0.1
player_stats$tov <- player_stats$tov + player_stats$ast * 0.2

# TODO:
# 1. Calculate correlation matrix
# 2. Find the 3 strongest correlations
# 3. Create a heatmap visualization
Python
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Sample player data
np.random.seed(42)
n = 100
player_stats = pd.DataFrame({
    "pts": np.random.uniform(5, 30, n),
    "ast": np.random.uniform(1, 10, n),
    "reb": np.random.uniform(2, 12, n),
    "min": np.random.uniform(15, 38, n),
    "tov": np.random.uniform(0.5, 4, n)
})
player_stats["ast"] += player_stats["pts"] * 0.1
player_stats["tov"] += player_stats["ast"] * 0.2

# TODO:
# 1. Calculate correlation matrix
# 2. Find the 3 strongest correlations
# 3. Create a heatmap visualization
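Step 2 of the exercise above (finding the strongest correlations) could be sketched as follows. This is a dependency-free illustration of the idea, not the intended solution: the helper names `pearson_r` and `strongest_pairs` are assumptions, and in practice the correlation matrix itself would more likely come from `player_stats.corr()`.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def strongest_pairs(columns, top_n=3):
    """columns maps stat name -> list of values; returns [(name_a, name_b, r), ...]."""
    pairs = [
        (a, b, pearson_r(columns[a], columns[b]))
        for a, b in combinations(columns, 2)
    ]
    # Rank by absolute value so strong negative relationships are not missed
    return sorted(pairs, key=lambda p: abs(p[2]), reverse=True)[:top_n]
```

Ranking by `abs(r)` is a deliberate choice: a correlation of -0.9 is a stronger relationship than one of +0.5.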
Medium 30 min
Parse Game Box Scores

Chapter 6: Box Score Statistics Deep Dive

Write a function that parses raw box score data and calculates Game Score (John Hollinger's formula) for each player.

["box score" "Game Score" "parsing"]
Parse Game Box Scores - Starter Code
R
library(tidyverse)

# Game Score = PTS + 0.4*FGM - 0.7*FGA - 0.4*(FTA-FTM) + 0.7*OREB + 0.3*DREB + STL + 0.7*AST + 0.7*BLK - 0.4*PF - TOV

# TODO: Create calculate_game_score function
Python
import pandas as pd

# Game Score = PTS + 0.4*FGM - 0.7*FGA - 0.4*(FTA-FTM) + 0.7*OREB + 0.3*DREB + STL + 0.7*AST + 0.7*BLK - 0.4*PF - TOV

# TODO: Create calculate_game_score function
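The formula in the starter comment translates almost directly into code. The sketch below covers a single stat line; the argument names are assumptions chosen to mirror the abbreviations in the formula, and parsing the raw box score into these values is left to the exercise.

```python
def calculate_game_score(pts, fgm, fga, ftm, fta, oreb, dreb, stl, ast, blk, pf, tov):
    """Game Score for one player-game, following the formula in the starter comment."""
    return (
        pts
        + 0.4 * fgm
        - 0.7 * fga
        - 0.4 * (fta - ftm)  # penalty for missed free throws
        + 0.7 * oreb
        + 0.3 * dreb
        + stl
        + 0.7 * ast
        + 0.7 * blk
        - 0.4 * pf
        - tov
    )
```

To score a whole box score, the same function could be applied row by row to a dataframe whose columns hold these twelve values.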
Easy 20 min
Calculate All Shooting Efficiency Metrics

Chapter 7: Shooting Efficiency Metrics (TS%, eFG%)

Create a comprehensive function that calculates FG%, eFG%, and TS% for a player and explains when to use each.

["shooting efficiency" "TS%" "eFG%"]
Calculate All Shooting Efficiency Metrics - Starter Code
R
# TODO: Create function that returns FG%, eFG%, and TS%
Python
# TODO: Create function that returns FG%, eFG%, and TS%
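As a rough sketch of what such a function might look like, the block below computes all three metrics from season or game totals using their standard definitions. The function name, argument names, and dictionary keys are illustrative assumptions.

```python
def shooting_efficiency(pts, fgm, fg3m, fga, fta):
    """Return FG%, eFG%, and TS% as fractions (0.5 means 50%)."""
    if fga <= 0:
        raise ValueError("fga must be positive")
    return {
        # FG%: every made shot counts the same, regardless of value
        "fg_pct": fgm / fga,
        # eFG%: a made three counts as 1.5 makes, reflecting the extra point
        "efg_pct": (fgm + 0.5 * fg3m) / fga,
        # TS%: points per scoring attempt, including free throws
        "ts_pct": pts / (2 * (fga + 0.44 * fta)),
    }
```

The comments hint at when each metric applies: FG% for raw accuracy, eFG% when comparing shooters with different three-point volume, and TS% when free throws should count as well.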
Medium 35 min
Shot Zone Efficiency Analysis

Chapter 7: Shooting Efficiency Metrics (TS%, eFG%)

Analyze shooting efficiency by zone (rim, mid-range, three-point) and determine optimal shot distribution.

["shot zones" "efficiency" "optimization"]
Shot Zone Efficiency Analysis - Starter Code
R
# TODO: Calculate efficiency by zone and recommend optimal distribution
Python
# TODO: Calculate efficiency by zone and recommend optimal distribution
Hard 45 min
Build Assist Network Visualization

Chapter 9: Playmaking and Assist Metrics

Create a pass network diagram showing assist relationships between teammates.

["network analysis" "assists" "visualization"]
Build Assist Network Visualization - Starter Code
R
# TODO: Build assist network from play-by-play data
Python
# TODO: Build assist network from play-by-play data
Medium 30 min
Turnover Type Classification

Chapter 10: Turnover Analysis and Ball Security

Classify turnovers by type and analyze patterns to identify areas for improvement.

["turnovers" "classification" "pattern analysis"]
Turnover Type Classification - Starter Code
R
# TODO: Classify and analyze turnover types
Python
# TODO: Classify and analyze turnover types
Medium 30 min
Calculate Rebounding Percentages

Chapter 8: Rebounding and Possession Metrics

Implement ORB%, DRB%, and TRB% calculations with proper team context adjustments.

["rebounding" "TRB%" "rate stats"]
Calculate Rebounding Percentages - Starter Code
R
# TODO: Calculate rebounding percentages
Python
# TODO: Calculate rebounding percentages
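The three percentages share one structure, so a single helper can cover them. The sketch below uses the widely published form of the estimate, 100 × (player rebounds × team minutes / 5) / (player minutes × available rebounds); the function and argument names are assumptions.

```python
def rebound_pct(player_reb, player_min, team_min, team_reb, opp_reb):
    """Estimated share of available rebounds a player grabbed while on the floor.

    ORB%: player ORB, team ORB, opponent DRB
    DRB%: player DRB, team DRB, opponent ORB
    TRB%: player TRB, team TRB, opponent TRB
    """
    if player_min <= 0:
        raise ValueError("player_min must be positive")
    available = team_reb + opp_reb
    # team_min / 5 is the length of the game in minutes per roster slot
    return 100 * (player_reb * (team_min / 5)) / (player_min * available)
```

The `team_min / 5` term is the team context adjustment: it scales the player's rebounds up to the share of the game they actually played.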
Medium 25 min
Calculate Team Pace

Chapter 11: Pace and Tempo Analysis

Implement multiple pace calculation formulas and compare results.

["pace" "possessions" "tempo"]
Calculate Team Pace - Starter Code
R
# TODO: Calculate pace using multiple formulas
Python
# TODO: Calculate pace using multiple formulas
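One of the formulas such a comparison might include is the simple box-score possession estimate (FGA + 0.44 × FTA − ORB + TOV) combined with the usual per-48-minute pace definition. The sketch below shows that pairing only; the function names are assumptions, and more detailed possession estimates exist that the exercise may also expect.

```python
def estimate_possessions(fga, fta, orb, tov):
    """Simple box-score possession estimate for one team."""
    return fga + 0.44 * fta - orb + tov


def pace(team_poss, opp_poss, team_minutes, game_minutes=48):
    """Possessions per game_minutes, averaging both teams' possession estimates."""
    # team_minutes is the sum of player minutes (240 for a regulation game)
    return game_minutes * (team_poss + opp_poss) / (2 * (team_minutes / 5))
```

Averaging the two teams' estimates smooths out estimation error, since both teams should have nearly the same number of possessions in a game.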
Medium 30 min
Pace-Adjusted Team Comparison

Chapter 11: Pace and Tempo Analysis

Compare two teams' performance after adjusting for pace differences.

["pace adjustment" "team comparison" "normalization"]
Pace-Adjusted Team Comparison - Starter Code
R
# TODO: Pace-adjust and compare teams
Python
# TODO: Pace-adjust and compare teams
Medium 30 min
Calculate Offensive and Defensive Ratings

Chapter 12: Per-Possession and Rate Statistics

Compute team and player offensive/defensive ratings per 100 possessions.

["ratings" "per-100" "efficiency"]
Calculate Offensive and Defensive Ratings - Starter Code
R
# TODO: Calculate ORtg and DRtg
Python
# TODO: Calculate ORtg and DRtg
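At the team level, the ratings reduce to points scored and allowed per 100 possessions. The following is a minimal sketch under the assumption that both teams had the same number of possessions; the function name is hypothetical, and player-level ratings require a considerably more involved calculation that is not attempted here.

```python
def team_ratings(points_for, points_against, possessions):
    """Return (offensive rating, defensive rating, net rating) per 100 possessions."""
    if possessions <= 0:
        raise ValueError("possessions must be positive")
    ortg = 100 * points_for / possessions
    drtg = 100 * points_against / possessions
    return ortg, drtg, ortg - drtg
```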
Easy 20 min
Per-36 and Per-100 Conversion Tool

Chapter 12: Per-Possession and Rate Statistics

Build a tool that converts raw stats to per-36 minutes and per-100 possessions.

["rate stats" "conversion" "normalization"]
Per-36 and Per-100 Conversion Tool - Starter Code
R
# TODO: Create per-36 and per-100 conversion functions
Python
# TODO: Create per-36 and per-100 conversion functions
Hard 45 min
Calculate Player Efficiency Rating

Chapter 13: Player Efficiency Rating (PER)

Implement the PER formula and calculate league-adjusted PER for multiple players.

["PER" "advanced metrics" "normalization"]
Calculate Player Efficiency Rating - Starter Code
R
# TODO: Implement PER calculation
Python
# TODO: Implement PER calculation
Hard 50 min
Win Shares Calculation

Chapter 14: Win Shares and WS/48

Calculate offensive and defensive win shares for a player given their stats and team context.

["Win Shares" "value metrics" "team context"]
Win Shares Calculation - Starter Code
R
# TODO: Implement Win Shares calculation
Python
# TODO: Implement Win Shares calculation
Hard 45 min
BPM and VORP Calculator

Chapter 15: Box Plus-Minus (BPM) and VORP

Build a tool that calculates Box Plus-Minus and Value Over Replacement Player.

["BPM" "VORP" "replacement level"]
BPM and VORP Calculator - Starter Code
R
# TODO: Calculate BPM and VORP
Python
# TODO: Calculate BPM and VORP
Hard 60 min
Simulate Plus-Minus Regression

Chapter 16: Real Plus-Minus (RPM)

Use regularized regression to estimate player impact from lineup data.

["RAPM" "regression" "regularization"]
Simulate Plus-Minus Regression - Starter Code
R
# TODO: Build RAPM-style model
Python
# TODO: Build RAPM-style model
Medium 30 min
Parse Tracking Data JSON

Chapter 21: Introduction to Player Tracking Data

Load and parse NBA tracking data from JSON format into analysis-ready dataframes.

["tracking data" "JSON parsing" "data wrangling"]
Parse Tracking Data JSON - Starter Code
R
# TODO: Parse tracking JSON
Python
# TODO: Parse tracking JSON
Medium 35 min
Calculate Player Speed Distribution

Chapter 22: Speed, Distance, and Movement Analysis

Analyze player speed from tracking data and create speed distribution visualizations.

["speed" "tracking" "distribution"]
Calculate Player Speed Distribution - Starter Code
R
# TODO: Calculate and visualize speed
Python
# TODO: Calculate and visualize speed
Hard 50 min
Expected Points Model

Chapter 24: Finishing at the Rim

Build a shot quality model that predicts expected points based on shot location and defender distance.

["xPTS" "shot quality" "modeling"]
Expected Points Model - Starter Code
R
# TODO: Build expected points model
Python
# TODO: Build expected points model
Hard 45 min
Rim Protection Analysis

Chapter 30: Defensive Rating and Team Defense

Calculate rim protection metrics including DFG% differential and block rate.

["rim protection" "defense" "tracking"]
Rim Protection Analysis - Starter Code
R
# TODO: Analyze rim protection
Python
# TODO: Analyze rim protection
Medium 35 min
Perimeter Defense Evaluation

Chapter 33: Perimeter Defense Metrics

Evaluate perimeter defenders based on contest rate and opponent shooting.

["perimeter defense" "contests" "evaluation"]
Perimeter Defense Evaluation - Starter Code
R
# TODO: Evaluate perimeter defense
Python
# TODO: Evaluate perimeter defense
Medium 40 min
Four Factors Dashboard

Chapter 38: Four Factors Analysis

Create a comprehensive Four Factors analysis comparing multiple teams.

["Four Factors" "team analysis" "visualization"]
Four Factors Dashboard - Starter Code
R
# TODO: Build Four Factors comparison
Python
# TODO: Build Four Factors comparison
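Before building the comparison, each team's four factors need to be computed from its totals. The sketch below uses the commonly published definitions (eFG%, turnover rate, offensive rebound rate, and made free throws per field goal attempt); the function name and dictionary keys are assumptions.

```python
def four_factors(fgm, fg3m, fga, ftm, fta, tov, orb, opp_drb):
    """Offensive four factors for one team, returned as fractions."""
    return {
        "efg_pct": (fgm + 0.5 * fg3m) / fga,        # shooting
        "tov_pct": tov / (fga + 0.44 * fta + tov),  # turnovers per play
        "orb_pct": orb / (orb + opp_drb),           # offensive rebounding
        "ft_rate": ftm / fga,                       # getting to the line and converting
    }
```

Calling the function again with the opponent's totals gives the defensive side of the same four factors.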
Medium 35 min
Lineup Net Rating Calculator

Chapter 39: Lineup Analysis and Optimization

Calculate net rating for lineup combinations with sample size warnings.

["lineups" "net rating" "sample size"]
Lineup Net Rating Calculator - Starter Code
R
# TODO: Calculate lineup ratings
Python
# TODO: Calculate lineup ratings
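A sample size warning can be as simple as a flag attached to each result. In the sketch below, the function name and the `min_possessions` default of 100 are illustrative assumptions rather than an established threshold.

```python
def lineup_net_rating(points_for, points_against, possessions, min_possessions=100):
    """Net rating per 100 possessions, with a flag marking small samples."""
    if possessions <= 0:
        raise ValueError("possessions must be positive")
    net = 100 * (points_for - points_against) / possessions
    return {
        "net_rating": net,
        "possessions": possessions,
        # Flag rather than drop, so the caller decides how to treat noisy lineups
        "small_sample": possessions < min_possessions,
    }
```

Returning the flag alongside the rating keeps extreme values from tiny samples visible but clearly labeled.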
Hard 50 min
Build Game Prediction Model

Chapter 43: Introduction to Basketball Prediction

Create a logistic regression model to predict game outcomes.

["prediction" "logistic regression" "classification"]
Build Game Prediction Model - Starter Code
R
# TODO: Build game predictor
Python
# TODO: Build game predictor
Hard 45 min
Win Probability Calculator

Chapter 45: Player Performance Projections

Build a real-time win probability model based on score margin and time remaining.

["win probability" "real-time" "modeling"]
Win Probability Calculator - Starter Code
R
# TODO: Calculate win probability
Python
# TODO: Calculate win probability
Medium 30 min
Pythagorean Expectation Analysis

Chapter 46: Aging Curves and Career Trajectories

Calculate expected wins using Pythagorean expectation and identify lucky/unlucky teams.

["pythagorean" "expected wins" "luck"]
Pythagorean Expectation Analysis - Starter Code
R
# TODO: Calculate expected wins
Python
# TODO: Calculate expected wins
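Pythagorean expectation estimates win percentage as PF^k / (PF^k + PA^k). The sketch below treats the exponent as a parameter: the default of 14 is an assumption (published basketball exponents vary), as are the function names.

```python
def pythagorean_wins(points_for, points_against, games=82, exponent=14.0):
    """Expected wins from points scored and allowed."""
    pf = points_for ** exponent
    pa = points_against ** exponent
    return games * pf / (pf + pa)


def luck(actual_wins, points_for, points_against, games=82, exponent=14.0):
    """Positive values suggest a team won more games than its point totals imply."""
    return actual_wins - pythagorean_wins(points_for, points_against, games, exponent)
```

Season totals and per-game averages give the same answer, because only the ratio of points scored to points allowed matters.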
Medium 30 min
Fetch Player Stats from API

Chapter 2: NBA Data Sources and APIs

Write a function to fetch current season player statistics from the NBA Stats API. Handle potential errors gracefully and return the data as a clean dataframe.

["API" "error handling" "data retrieval"]
Fetch Player Stats from API - Starter Code
R
library(httr)
library(jsonlite)

# TODO: Create function to fetch player stats
# fetch_player_stats <- function(season = "2023-24") {
#   base_url <- "https://stats.nba.com/stats/leaguedashplayerstats"
#   # Add headers and parameters
#   # Make request
#   # Parse response
#   # Return dataframe
# }

# TODO: Call function and display first 10 rows
Python
import requests
import pandas as pd

# TODO: Create function to fetch player stats
# def fetch_player_stats(season="2023-24"):
#     base_url = "https://stats.nba.com/stats/leaguedashplayerstats"
#     # Add headers and parameters
#     # Make request
#     # Parse response
#     # Return dataframe

# TODO: Call function and display first 10 rows
Hard 50 min
Draft Pick Value Model

Chapter 48: Win Probability Models

Analyze historical draft data to estimate pick value and trade equivalencies.

["draft" "pick value" "trade analysis"]
Draft Pick Value Model - Starter Code
R
# TODO: Analyze draft pick value
Python
# TODO: Analyze draft pick value
Medium 35 min
Salary Cap Efficiency Analysis

Chapter 49: Draft Analytics and Prospect Evaluation

Calculate cost per win share and identify best value contracts.

["salary cap" "contract value" "efficiency"]
Salary Cap Efficiency Analysis - Starter Code
R
# TODO: Analyze salary efficiency
Python
# TODO: Analyze salary efficiency
Hard 45 min
Player Clustering with K-Means

Chapter 56: Machine Learning in Basketball

Use k-means clustering to identify player archetypes based on statistical profiles.

["clustering" "machine learning" "archetypes"]
Player Clustering with K-Means - Starter Code
R
# TODO: Cluster players
Python
# TODO: Cluster players
Medium 40 min
Build Simple Shot Chart

Chapter 57: Computer Vision and Video Analysis

Create a court visualization with shot locations using tracking coordinates.

["visualization" "shot chart" "spatial"]
Build Simple Shot Chart - Starter Code
R
# TODO: Create shot chart
Python
# TODO: Create shot chart
Hard 60 min
Build Streamlit Dashboard

Chapter 67: Building an Analytics Dashboard

Create a simple interactive dashboard for player comparison using Streamlit.

["dashboard" "Streamlit" "interactive"]
Build Streamlit Dashboard - Starter Code
R
# Streamlit is a Python library; use the Python starter code for this exercise
Python
# TODO: Build Streamlit dashboard
Easy 20 min
Compare Scoring Across Eras

Chapter P: The Analytics Revolution in Basketball

Load historical NBA scoring data and compare average points per game between the 1990s and 2020s. Calculate the percentage change and visualize the trend over time.

["data loading" "basic statistics" "comparison"]
Compare Scoring Across Eras - Starter Code
R
# Load tidyverse
library(tidyverse)

# TODO: Load the historical data from "nba_historical.csv"
# historical <- ???

# TODO: Filter for 1990s (1990-1999) and 2020s (2020-2024)
# nineties <- ???
# twenties <- ???

# TODO: Calculate average PPG for each era
# avg_90s <- ???
# avg_20s <- ???

# TODO: Calculate percentage change
# pct_change <- ???
Python
import pandas as pd

# TODO: Load the historical data
# historical = ???

# TODO: Filter for 1990s and 2020s
# nineties = ???
# twenties = ???

# TODO: Calculate average PPG for each era
# avg_90s = ???
# avg_20s = ???

# TODO: Calculate percentage change
# pct_change = ???
Easy 15 min
Verify Analytics Environment

Chapter 1: Setting Up Your Analytics Environment

Create a function that checks if all required packages are installed and returns a summary of your analytics environment including package versions.

["environment setup" "functions" "package management"]
Verify Analytics Environment - Starter Code
R
# TODO: Create a function that checks for required packages
# check_environment <- function() {
#   required <- c("tidyverse", "httr", "jsonlite")
#   # Check each package and return status
# }

# TODO: Call the function and print results
Python
# TODO: Create a function that checks for required packages
# def check_environment():
#     required = ["pandas", "numpy", "requests", "matplotlib"]
#     # Check each package and return status

# TODO: Call the function and print results
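One way to fill in the Python stub is with the standard library's `importlib.metadata`, which reports installed distribution versions without importing the packages themselves. This is a sketch of that approach; turning the package list into a `required` parameter is an assumption beyond the starter code.

```python
from importlib import metadata


def check_environment(required=("pandas", "numpy", "requests", "matplotlib")):
    """Return {package: installed version, or None when the package is missing}."""
    status = {}
    for name in required:
        try:
            status[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            status[name] = None
    return status


if __name__ == "__main__":
    for package, version in check_environment().items():
        print(f"{package}: {version or 'NOT INSTALLED'}")
```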
Medium 30 min
Clutch Performance Analysis

Chapter 41: Clutch Performance Analytics

Identify and analyze clutch performance (final 5 minutes of the game, score margin within 5 points).

["clutch" "pressure" "situational"]
Clutch Performance Analysis - Starter Code
R
# TODO: Analyze clutch performance
Python
# TODO: Analyze clutch performance
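The clutch definition in the exercise can be expressed as a small predicate over play-by-play rows. The sketch below assumes that periods are numbered so that 4 is the fourth quarter and 5 or more is overtime, and that time is available as seconds remaining in the period; both conventions, and the function name, are assumptions about the data.

```python
def is_clutch(period, seconds_remaining, score_margin):
    """True in the final 5 minutes of the 4th quarter or overtime with the margin within 5."""
    late_game = period >= 4 and seconds_remaining <= 300
    close_game = abs(score_margin) <= 5
    return late_game and close_game
```

Filtering play-by-play events with this predicate gives the subset on which clutch shooting, turnovers, or plus-minus can then be computed.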
Medium 35 min
Clean and Transform Player Data

Chapter 3: Data Wrangling with tidyverse and pandas

Given a messy dataset with missing values, inconsistent formats, and duplicates, clean and transform it into analysis-ready format. Calculate derived metrics and handle edge cases.

["data cleaning" "transformation" "missing values"]
Clean and Transform Player Data - Starter Code
R
library(tidyverse)

# Sample messy data (in practice, load from file)
messy_data <- tibble(
  player = c("LeBron James", "lebron james", "Kevin Durant", NA, "Stephen Curry"),
  team = c("LAL", "lal", "PHX", "GSW", "GSW"),
  pts = c("25.5", "25.5", "29.1", "30.2", "invalid"),
  games = c(55, 55, 47, 65, 56),
  min = c(35.2, 35.2, 34.8, 32.1, 34.7)
)

# TODO: Clean the data
# 1. Remove duplicates
# 2. Handle missing values
# 3. Convert pts to numeric
# 4. Standardize team abbreviations
# 5. Calculate pts_per_min
Python
import pandas as pd
import numpy as np

# Sample messy data
messy_data = pd.DataFrame({
    "player": ["LeBron James", "lebron james", "Kevin Durant", None, "Stephen Curry"],
    "team": ["LAL", "lal", "PHX", "GSW", "GSW"],
    "pts": ["25.5", "25.5", "29.1", "30.2", "invalid"],
    "games": [55, 55, 47, 65, 56],
    "min": [35.2, 35.2, 34.8, 32.1, 34.7]
})

# TODO: Clean the data
# 1. Remove duplicates
# 2. Handle missing values
# 3. Convert pts to numeric
# 4. Standardize team abbreviations
# 5. Calculate pts_per_min
Easy 15 min
Calculate True Shooting Percentage

Write a function to calculate True Shooting Percentage (TS%) for any player given their points, field goal attempts, and free throw attempts. Test with sample data.

functions metrics efficiency
Calculate True Shooting Percentage - Starter Code
R
# Create a function called calculate_ts
# Arguments: pts (PTS), fga (FGA), fta (FTA)
# Return: True Shooting Percentage

calculate_ts <- function(pts, fga, fta) {
  # Your code here
}

# Test cases (with TS% = PTS / (2 * (FGA + 0.44 * FTA))):
# calculate_ts(25, 18, 6) should return ~0.606
# calculate_ts(30, 20, 10) should return ~0.615
Python
# Create a function called calculate_ts
# Arguments: pts, fga, fta
# Return: True Shooting Percentage

def calculate_ts(pts, fga, fta):
    """Calculate True Shooting Percentage"""
    # Your code here
    pass

# Test cases (with TS% = PTS / (2 * (FGA + 0.44 * FTA))):
# calculate_ts(25, 18, 6) should return ~0.606
# calculate_ts(30, 20, 10) should return ~0.615
Easy 20 min
Filter High-Volume Scorers

Load player statistics and filter to only include players who average at least 20 points per game and play at least 30 minutes per game.

data-wrangling filtering pandas tidyverse
Filter High-Volume Scorers - Starter Code
R
library(hoopR)
library(tidyverse)

# Get 2023-24 player stats
# Filter for PPG >= 20 and MPG >= 30
# Display player name, team, PPG, and MPG

# Your code here
Python
from nba_api.stats.endpoints import leaguedashplayerstats
import pandas as pd

# Get 2023-24 player stats
# Filter for PPG >= 20 and MPG >= 30
# Display player name, team, PPG, and MPG

# Your code here
Medium 25 min
Team Assist Leaders

Find the leading assister for each NBA team. Display the player name, team abbreviation, and assists per game.

groupby aggregation data-wrangling
Team Assist Leaders - Starter Code
R
library(hoopR)
library(tidyverse)

# Get player stats
# Group by team
# Find max assists per team
# Display results

# Your code here
Python
from nba_api.stats.endpoints import leaguedashplayerstats
import pandas as pd

# Get player stats
# Group by team
# Find max assists per team
# Display results

# Your code here
Medium 25 min
Calculate Per-100 Possession Stats

Convert raw counting statistics to per-100-possession rates. Create a function that takes points, rebounds, assists, and possessions played, returning the per-100 rates.

functions pace-adjustment normalization
Calculate Per-100 Possession Stats - Starter Code
R
# Create per_100_stats function
# Input: pts, reb, ast, poss (possessions played)
# Output: named vector with per-100 rates

per_100_stats <- function(pts, reb, ast, poss) {
  # Your code here
}

# Test: A player with 20 pts, 8 reb, 5 ast in 80 possessions
# Should return pts100 = 25, reb100 = 10, ast100 = 6.25
Python
# Create per_100_stats function
# Input: pts, reb, ast, poss (possessions played)
# Output: dictionary with per-100 rates

def per_100_stats(pts, reb, ast, poss):
    """Convert to per-100 possession rates"""
    # Your code here
    pass

# Test: A player with 20 pts, 8 reb, 5 ast in 80 possessions
# Should return pts100 = 25, reb100 = 10, ast100 = 6.25
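One way to complete the Python stub above is shown below. The dictionary keys follow the names used in the test comment; the check for non-positive possessions is an added assumption about how invalid input should be handled.

```python
def per_100_stats(pts, reb, ast, poss):
    """Convert raw totals to per-100-possession rates."""
    if poss <= 0:
        raise ValueError("poss must be positive")
    scale = 100 / poss
    return {
        "pts100": pts * scale,
        "reb100": reb * scale,
        "ast100": ast * scale,
    }
```

For the test case in the comment, `per_100_stats(20, 8, 5, 80)` scales every stat by 100 / 80 = 1.25.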
Medium 45 min
Create a Shot Chart

Using shot location data, create a basic shot chart visualization showing makes and misses on a basketball court diagram.

visualization shot-charts ggplot2 matplotlib
Create a Shot Chart - Starter Code
R
library(hoopR)
library(ggplot2)

# Get shot data for a player (e.g., Stephen Curry)
# Create court outline
# Plot shots colored by make/miss
# Add title and legend

# Your code here
Python
from nba_api.stats.endpoints import shotchartdetail
import matplotlib.pyplot as plt
import matplotlib.patches as patches

# Get shot data for a player
# Create court outline
# Plot shots colored by make/miss
# Add title and legend

# Your code here
Hard 60 min
Hexbin Shot Chart

Create a hexbin shot chart showing shooting efficiency by court zone. Use color to indicate efficiency (FG%) and size/opacity for volume.

visualization hexbin shot-quality
Hexbin Shot Chart - Starter Code
R
library(hoopR)
library(ggplot2)
library(hexbin)

# Get shot data
# Create hexagonal bins by court location
# Color by efficiency, size by volume
# Add court lines and labels

# Your code here
Python
from nba_api.stats.endpoints import shotchartdetail
import matplotlib.pyplot as plt
from matplotlib.collections import PatchCollection
import numpy as np

# Get shot data
# Create hexagonal bins by court location
# Color by efficiency, size by volume
# Add court lines and labels

# Your code here
Hard 45 min
Calculate Player Efficiency Rating

Implement the PER formula to calculate Player Efficiency Rating. Compare your results to published values.

per advanced-metrics formulas
Calculate Player Efficiency Rating - Starter Code
R
# PER Formula Components (simplified; this is the unadjusted form, before pace and league adjustments):
# PER = (1/MP) * [3PM + (2/3)*AST + (2-factor*tm_ast/tm_fg)*FG
#       + FT*0.5*(1+(1-(tm_ast/tm_fg))+(2/3)*(tm_ast/tm_fg))
#       - VOP*TOV - VOP*DRB%*(FGA-FG) - VOP*0.44*(0.44+(0.56*DRB%))*(FTA-FT)
#       + VOP*(1-DRB%)*(TRB-ORB) + VOP*DRB%*ORB + VOP*STL + VOP*DRB%*BLK
#       - PF*(lg_FT/lg_PF - 0.44*(lg_FTA/lg_PF)*VOP)]

# Implement step by step
# Your code here
Python
# PER Formula - Implement step by step
# This is a complex calculation with many components

# Your code here
Hard 50 min
Calculate Box Plus-Minus

Implement a simplified BPM calculation. Use the box score components to estimate a player's point differential contribution.

bpm plus-minus regression
Calculate Box Plus-Minus - Starter Code
R
# BPM uses regression weights on box score stats
# Simplified formula (actual uses more complex coefficients):
# BPM ≈ 0.123*ORB% + 0.053*DRB% - 0.104*AST% + 0.076*STL%
#       + 0.131*BLK% - 0.036*TOV% + 0.003*USG% - 0.087*Position

# Implement and test on sample players
# Your code here
Python
# BPM uses regression weights on box score stats
# Implement the simplified calculation

# Your code here
Medium 40 min
Analyze Tracking Data Profiles

Load player tracking data and create player profiles based on speed, touches, and time of possession. Identify different player types.

tracking clustering player-types
Analyze Tracking Data Profiles - Starter Code
R
library(hoopR)
library(tidyverse)

# Get tracking data
# Calculate key metrics: speed, touches, time of possession
# Identify clusters of player types
# Visualize the profiles

# Your code here
Python
from nba_api.stats.endpoints import leaguedashptstats
import pandas as pd
from sklearn.cluster import KMeans

# Get tracking data
# Calculate key metrics
# Identify clusters of player types
# Visualize the profiles

# Your code here
Hard 50 min
Defensive Rating Analysis

Calculate individual defensive ratings using tracking data. Compare rim protection, perimeter defense, and overall impact.

defense tracking rating
Defensive Rating Analysis - Starter Code
R
library(hoopR)
library(tidyverse)

# Get defensive tracking data
# Calculate DFG% at rim and perimeter
# Compare to opponent average FG%
# Rank players by defensive impact

# Your code here
Python
from nba_api.stats.endpoints import leaguedashptdefend
import pandas as pd

# Get defensive tracking data
# Calculate DFG% at rim and perimeter
# Compare to opponent average FG%
# Rank players by defensive impact

# Your code here
Medium 35 min
Contested Rebound Analysis

Analyze the relationship between contested rebounds and team defensive rebounding success. Which players provide the most value?

rebounding hustle correlation
Contested Rebound Analysis - Starter Code
R
library(hoopR)
library(tidyverse)

# Get hustle stats with rebound data
# Calculate contested vs uncontested rebounds
# Correlate with team defensive rebounding
# Identify high-value rebounders

# Your code here
Python
from nba_api.stats.endpoints import leaguehustlestatsplayer
import pandas as pd

# Get hustle stats with rebound data
# Calculate contested vs uncontested rebounds
# Correlate with team defensive rebounding
# Identify high-value rebounders

# Your code here
Medium 40 min
Playmaking Value Analysis

Compare assist-based playmaking metrics. Calculate potential assists, assist conversion rate, and points created for top playmakers.

playmaking assists creation
Playmaking Value Analysis - Starter Code
R
library(hoopR)
library(tidyverse)

# Get passing stats
# Calculate potential assists and conversion rate
# Estimate points created from assists
# Compare top playmakers

# Your code here
Python
from nba_api.stats.endpoints import leaguedashptstats
import pandas as pd

# Get passing stats
# Calculate potential assists and conversion rate
# Estimate points created from assists
# Compare top playmakers

# Your code here
Hard 60 min
Build Shot Quality Model

Create a logistic regression model predicting shot make probability based on distance, shot type, and defender distance.

machine-learning shot-quality logistic-regression
Build Shot Quality Model - Starter Code
R
library(hoopR)
library(tidyverse)

# Get shot data with features
# Build logistic regression model
# Predict make probability
# Calculate expected points (xPTS)
# Evaluate model accuracy

# Your code here
Python
from nba_api.stats.endpoints import shotchartdetail
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
import pandas as pd

# Get shot data with features
# Build logistic regression model
# Predict make probability
# Calculate expected points (xPTS)
# Evaluate model accuracy

# Your code here
Medium 45 min
Three-Point Revolution Analysis

Analyze the historical trend of three-point shooting. Calculate year-over-year changes in 3PA, 3P%, and the decline of mid-range.

historical trends visualization
Three-Point Revolution Analysis - Starter Code
R
library(hoopR)
library(tidyverse)
library(ggplot2)

# Get historical league averages
# Calculate 3PA and 3P% by year
# Track mid-range volume decline
# Visualize the revolution

# Your code here
Python
from basketball_reference_scraper import seasons
import pandas as pd
import matplotlib.pyplot as plt

# Get historical league averages
# Calculate 3PA and 3P% by year
# Track mid-range volume decline
# Visualize the revolution

# Your code here
Medium 40 min
Shot Selection Optimization

Given shot quality data, calculate each player's expected points from their shot distribution. Identify players with good/bad shot selection.

shot-selection expected-value optimization
Shot Selection Optimization - Starter Code
R
library(hoopR)
library(tidyverse)

# Get shot zone data for players
# Calculate expected points by zone
# Compare to league average
# Identify best/worst shot selection

# Your code here
Python
from nba_api.stats.endpoints import playerdashptshots
import pandas as pd

# Get shot zone data for players
# Calculate expected points by zone
# Compare to league average
# Identify best/worst shot selection

# Your code here
Hard 75 min
Build Player Projection Model

Create a simple projection model for next-season performance using previous seasons, age, and regression to mean.

machine-learning projection regression
Build Player Projection Model - Starter Code
R
library(hoopR)
library(tidyverse)

# Get multi-year player data
# Create features (previous stats, age, etc.)
# Build regression model
# Project next season
# Validate on historical data

# Your code here
Python
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Get multi-year player data
# Create features (previous stats, age, etc.)
# Build regression model
# Project next season
# Validate on historical data

# Your code here
Hard 60 min
Create Custom Metric

Design and implement your own composite metric measuring a specific aspect of player value. Validate its predictive power.

metrics custom validation
Create Custom Metric - Starter Code
R
library(hoopR)
library(tidyverse)

# Define what you want to measure
# Select component statistics
# Weight and combine components
# Test correlation with wins
# Document and explain the metric

# Your code here
Python
import pandas as pd
from scipy.stats import pearsonr

# Define what you want to measure
# Select component statistics
# Weight and combine components
# Test correlation with wins
# Document and explain the metric

# Your code here
Hard 60 min
Player Similarity Tool

Build a tool that finds the most similar players to a given player using statistical profiles and clustering.

similarity clustering machine-learning
Player Similarity Tool - Starter Code
R
library(hoopR)
library(tidyverse)

# Get player statistical profiles
# Normalize/scale features
# Calculate similarity (cosine or Euclidean)
# Find N most similar players
# Create visualization

# Your code here
Python
from sklearn.preprocessing import StandardScaler
from sklearn.metrics.pairwise import cosine_similarity
import pandas as pd

# Get player statistical profiles
# Normalize/scale features
# Calculate similarity (cosine or Euclidean)
# Find N most similar players
# Create visualization

# Your code here
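The scale-then-compare idea can be sketched without any dependencies. The version below is a rough illustration, assuming a small dictionary of made-up profiles; `standardize`, `cosine`, and `most_similar` are hypothetical helper names, and the `StandardScaler` and `cosine_similarity` imports from the starter code would do the same work on a real data frame.

```python
import math
from statistics import mean, pstdev


def standardize(profiles):
    """Z-score each feature column across all players."""
    names = list(profiles)
    columns = list(zip(*(profiles[n] for n in names)))
    stats = [(mean(col), pstdev(col) or 1.0) for col in columns]
    return {n: [(v - mu) / sd for v, (mu, sd) in zip(profiles[n], stats)]
            for n in names}


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar(profiles, target, n=3):
    """Return the n players with the highest cosine similarity to target."""
    scaled = standardize(profiles)
    scores = [(other, cosine(scaled[target], scaled[other]))
              for other in scaled if other != target]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:n]


# Made-up per-game profiles: [points, rebounds, assists].
profiles = {
    "Guard A": [25.0, 4.0, 8.0],
    "Guard B": [22.0, 3.5, 7.0],
    "Center A": [14.0, 12.0, 2.0],
    "Center B": [16.0, 11.0, 1.5],
}
print(most_similar(profiles, "Guard A", n=2))
```

Scaling first matters because cosine similarity on raw stats would be dominated by whichever column has the largest numbers, usually points.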
Medium 45 min
Lineup Analysis Tool

Analyze the performance of different 5-player lineups. Calculate net rating and identify the best/worst combinations.

lineups net-rating combinations
Lineup Analysis Tool - Starter Code
R
library(hoopR)
library(tidyverse)

# Get lineup data for a team
# Calculate net rating for each lineup
# Filter for adequate minutes
# Identify best/worst lineups
# Analyze what makes them work

# Your code here
Python
from nba_api.stats.endpoints import teamdashlineups
import pandas as pd

# Get lineup data for a team
# Calculate net rating for each lineup
# Filter for adequate minutes
# Identify best/worst lineups
# Analyze what makes them work

# Your code here
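Net rating is the point margin per 100 possessions, so the core calculation is short. The sketch below assumes each lineup is a dictionary with hypothetical keys (`minutes`, `pts_for`, `pts_against`, `possessions`) and uses made-up numbers; the real column names returned by the lineup endpoint will differ.

```python
def net_rating(points_for, points_against, possessions):
    """Point differential per 100 possessions."""
    return 100.0 * (points_for - points_against) / possessions


def rank_lineups(lineups, min_minutes=50):
    """Drop small-sample lineups, then sort best to worst by net rating."""
    qualified = [l for l in lineups
                 if l["minutes"] >= min_minutes and l["possessions"] > 0]
    return sorted(
        qualified,
        key=lambda l: net_rating(l["pts_for"], l["pts_against"], l["possessions"]),
        reverse=True,
    )


# Made-up lineup totals.
lineups = [
    {"name": "Lineup A", "minutes": 120, "pts_for": 260, "pts_against": 240, "possessions": 250},
    {"name": "Lineup B", "minutes": 15, "pts_for": 40, "pts_against": 20, "possessions": 30},
    {"name": "Lineup C", "minutes": 200, "pts_for": 400, "pts_against": 420, "possessions": 410},
]

for lineup in rank_lineups(lineups):
    rating = net_rating(lineup["pts_for"], lineup["pts_against"], lineup["possessions"])
    print(lineup["name"], round(rating, 1))
```

Lineup B has the largest raw margin per possession but only 15 minutes together, which is why the minutes filter comes before the ranking.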
Easy 25 min
Pace-Adjusted Analysis

Convert a team's statistics to pace-adjusted values. Compare raw vs. pace-adjusted rankings.

pace adjustment normalization
Pace-Adjusted Analysis - Starter Code
R
library(hoopR)
library(tidyverse)

# Get team stats including pace
# Calculate per-100-possession values
# Compare raw vs adjusted rankings
# Identify biggest movers

# Your code here
Python
from nba_api.stats.endpoints import teamgamelogs
import pandas as pd

# Get team stats including pace
# Calculate per-100-possession values
# Compare raw vs adjusted rankings
# Identify biggest movers

# Your code here
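The pace adjustment itself is a rescaling to 100 possessions. The sketch below uses a commonly cited box-score possession estimate (FGA - OREB + TOV + 0.44 * FTA); the function names and the two example teams are made up for illustration.

```python
def estimate_possessions(fga, oreb, tov, fta):
    """Common box-score estimate of possessions."""
    return fga - oreb + tov + 0.44 * fta


def per_100(stat_total, possessions):
    """Rescale a counting stat to a per-100-possessions rate."""
    return 100.0 * stat_total / possessions


def rank(values):
    """Map name -> rank, where rank 1 is the highest value."""
    ordered = sorted(values, key=values.get, reverse=True)
    return {name: i + 1 for i, name in enumerate(ordered)}


def compare_rankings(teams):
    """Return name -> (raw rank, pace-adjusted rank) for points scored."""
    raw = rank({name: t["pts"] for name, t in teams.items()})
    adjusted = rank({name: per_100(t["pts"], t["poss"]) for name, t in teams.items()})
    return {name: (raw[name], adjusted[name]) for name in teams}


# Made-up season totals: a fast team and a slow team.
teams = {
    "Team Fast": {"pts": 9500, "poss": 8500},
    "Team Slow": {"pts": 9300, "poss": 8000},
}
print(compare_rankings(teams))
```

The "biggest movers" step is then a sort on the absolute difference between the two ranks.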
Hard 60 min
Player Archetype Clustering

Use K-means clustering to identify player archetypes based on statistical profiles. Visualize and label the clusters.

clustering archetypes k-means
Player Archetype Clustering - Starter Code
R
library(hoopR)
library(tidyverse)
library(cluster)

# Get comprehensive player stats
# Select features for clustering
# Run K-means with various k
# Visualize clusters with PCA
# Label and interpret archetypes

# Your code here
Python
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
import pandas as pd
import matplotlib.pyplot as plt

# Get comprehensive player stats
# Select features for clustering
# Run K-means with various k
# Visualize clusters with PCA
# Label and interpret archetypes

# Your code here
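To see what K-means does under the hood before reaching for `sklearn.cluster.KMeans`, a bare-bones version of the assign-and-update loop fits in a few lines. This is a teaching sketch only: it uses a deterministic farthest-first initialization (a simplification chosen here so the result is reproducible, not what scikit-learn does by default), and the six points are made-up, already-scaled feature vectors.

```python
def dist2(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points, k):
    """Start from the first point, then repeatedly add the point
    farthest from all centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid for every point.
        new_labels = [min(range(k), key=lambda j: dist2(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return labels, centroids


# Made-up scaled features, e.g. (three-point rate, rebound rate).
points = [(0.1, 0.2), (0.0, 0.0), (0.3, 0.1),
          (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

Trying several values of k then amounts to calling `kmeans` in a loop and comparing the total within-cluster squared distance for each k.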
Medium 40 min
Contract Value Analysis

Calculate the value per dollar for player contracts. Identify the best and worst values in the league.

salary value contracts
Contract Value Analysis - Starter Code
R
library(hoopR)
library(tidyverse)

# Get player salaries and stats
# Estimate wins contributed (using BPM or WS)
# Calculate value per dollar
# Identify best/worst contracts
# Consider age and years remaining

# Your code here
Python
import pandas as pd

# Get player salaries and stats
# Estimate wins contributed (using BPM or WS)
# Calculate value per dollar
# Identify best/worst contracts
# Consider age and years remaining

# Your code here
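Two simple ways to express contract value are wins per million dollars and surplus value against an assumed market price per win. The sketch below is illustrative only: the function names, the `dollars_per_win` figure, and the two contracts are made up.

```python
def wins_per_million(wins_added, salary):
    """Wins contributed per $1M of salary."""
    return wins_added / (salary / 1_000_000)


def surplus_value(wins_added, salary, dollars_per_win):
    """Market value of the wins minus what the player is paid."""
    return wins_added * dollars_per_win - salary


# Made-up contracts; dollars_per_win is a placeholder assumption.
contracts = {
    "Player A": {"wins": 5.0, "salary": 10_000_000},
    "Player B": {"wins": 6.0, "salary": 40_000_000},
}
for name, c in contracts.items():
    print(name,
          round(wins_per_million(c["wins"], c["salary"]), 2),
          surplus_value(c["wins"], c["salary"], dollars_per_win=3_000_000))
```

Player B adds more wins but Player A is the better value, which is the distinction the exercise asks you to surface.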
Hard 60 min
Aging Curve Construction

Build aging curves showing how different skills change with age. Use the delta method on year-over-year changes.

aging projection regression
Aging Curve Construction - Starter Code
R
library(hoopR)
library(tidyverse)

# Get multi-year player data
# Calculate year-over-year changes
# Group by age
# Build aging curves
# Visualize by skill type

# Your code here
Python
import pandas as pd
import matplotlib.pyplot as plt

# Get multi-year player data
# Calculate year-over-year changes
# Group by age
# Build aging curves
# Visualize by skill type

# Your code here
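The delta method pairs each player's consecutive seasons, averages the change at each age, and chains those averages into a curve. The sketch below assumes rows of `(player, age, value)` and contiguous ages in the result; the function names and the sample rows are made up.

```python
from collections import defaultdict


def yearly_deltas(seasons):
    """Mean change from age to age + 1, using only players
    who have a season at both ages."""
    by_player = defaultdict(dict)
    for player, age, value in seasons:
        by_player[player][age] = value

    changes = defaultdict(list)
    for ages in by_player.values():
        for age, value in ages.items():
            if age + 1 in ages:
                changes[age].append(ages[age + 1] - value)
    return {age: sum(d) / len(d) for age, d in sorted(changes.items())}


def aging_curve(mean_deltas):
    """Chain mean deltas into a cumulative curve that starts at 0.
    Assumes the ages in mean_deltas are contiguous."""
    ages = sorted(mean_deltas)
    curve = {ages[0]: 0.0}
    total = 0.0
    for age in ages:
        total += mean_deltas[age]
        curve[age + 1] = total
    return curve


# Made-up rows: (player, age, stat value).
seasons = [
    ("A", 24, 10.0), ("A", 25, 12.0), ("A", 26, 13.0),
    ("B", 25, 8.0), ("B", 26, 8.0),
]
deltas = yearly_deltas(seasons)
print(deltas)
print(aging_curve(deltas))
```

Because only players who appear in both seasons contribute a delta, the curve can be affected by survivorship bias; weighting deltas by playing time is one common refinement.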
Hard 75 min
Draft Projection Model

Build a simple NBA draft projection model using college statistics. Predict NBA success based on college performance.

draft projection machine-learning
Draft Projection Model - Starter Code
R
library(hoopR)
library(tidyverse)

# Get historical draft and college data
# Define success metric (NBA WAR, etc.)
# Feature engineering
# Train prediction model
# Evaluate on recent drafts

# Your code here
Python
from sklearn.ensemble import GradientBoostingRegressor
import pandas as pd

# Get historical draft and college data
# Define success metric
# Feature engineering
# Train prediction model
# Evaluate on recent drafts

# Your code here
Medium 45 min
Era-Adjusted Comparison

Compare players across eras using era-adjusted statistics. Who was truly the best scorer relative to their era?

historical era-adjustment comparison
Era-Adjusted Comparison - Starter Code
R
library(hoopR)
library(tidyverse)

# Get historical player and league data
# Calculate Z-scores relative to era
# Compare across different eras
# Create era-adjusted rankings

# Your code here
Python
import pandas as pd

# Get historical player and league data
# Calculate Z-scores relative to era
# Compare across different eras
# Create era-adjusted rankings

# Your code here
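An era adjustment by z-score measures each player against the league distribution of his own season. A minimal sketch, assuming rows with hypothetical keys `player`, `season`, and `ppg` and entirely made-up numbers:

```python
from statistics import mean, pstdev


def era_z_scores(rows):
    """Return (player, season, z) tuples sorted by z, highest first.
    z is computed against the other rows from the same season."""
    by_season = {}
    for r in rows:
        by_season.setdefault(r["season"], []).append(r["ppg"])

    results = []
    for r in rows:
        values = by_season[r["season"]]
        sd = pstdev(values)
        z = (r["ppg"] - mean(values)) / sd if sd else 0.0
        results.append((r["player"], r["season"], z))
    return sorted(results, key=lambda t: t[2], reverse=True)


# Made-up data: a high-scoring era and a low-scoring era.
rows = [
    {"player": "Scorer A", "season": "high era", "ppg": 30.0},
    {"player": "Peer A1", "season": "high era", "ppg": 28.0},
    {"player": "Peer A2", "season": "high era", "ppg": 26.0},
    {"player": "Scorer B", "season": "low era", "ppg": 27.0},
    {"player": "Peer B1", "season": "low era", "ppg": 18.0},
    {"player": "Peer B2", "season": "low era", "ppg": 15.0},
]
for player, season, z in era_z_scores(rows):
    print(player, season, round(z, 2))
```

In this toy data Scorer B averages fewer points than Scorer A but stands further above his own era, so he ranks first after adjustment.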
Hard 60 min
All-Time Ranking System

Create a composite ranking system for all-time greatest players combining peak performance, longevity, and championships.

historical ranking composite
All-Time Ranking System - Starter Code
R
library(hoopR)
library(tidyverse)

# Get career stats for all-time players
# Calculate peak (best 5-year WAR)
# Calculate career value (total WAR)
# Factor in championships
# Create weighted composite ranking

# Your code here
Python
import pandas as pd

# Get career stats for all-time players
# Calculate peak (best 5-year WAR)
# Calculate career value (total WAR)
# Factor in championships
# Create weighted composite ranking

# Your code here
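The peak, career, and championship components can be combined once they are on a common scale. The sketch below finds the best consecutive five-season window, scales each component by the maximum across players, and applies weights. The function names, the 0.4/0.4/0.2 weights, and both careers are made up, and the scaling assumes non-negative values.

```python
def best_window(values, width=5):
    """Largest sum over any run of `width` consecutive seasons."""
    if len(values) <= width:
        return sum(values)
    return max(sum(values[i:i + width])
               for i in range(len(values) - width + 1))


def rank_players(careers, weights=(0.4, 0.4, 0.2)):
    """careers: name -> (season values, titles).
    Returns (name, score) pairs, best first."""
    raw = {name: (best_window(vals), sum(vals), titles)
           for name, (vals, titles) in careers.items()}
    # Scale each component to 0-1 by the best value among the players.
    maxima = [max(r[i] for r in raw.values()) or 1 for i in range(3)]
    scores = {name: sum(w * r[i] / maxima[i] for i, w in enumerate(weights))
              for name, r in raw.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up careers: season-by-season value (e.g. WAR) and title count.
careers = {
    "Peak Star": ([10, 12, 14, 13, 11], 2),
    "Long Career": ([6] * 15, 1),
}
for name, score in rank_players(careers):
    print(name, round(score, 3))
```

Changing the weights is the whole debate in miniature: with `weights=(0.1, 0.8, 0.1)` the long career comes out on top instead.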
Hard 60 min
Hall of Fame Probability Model

Build a logistic regression model predicting Hall of Fame induction based on career statistics and awards.

classification hall-of-fame logistic-regression
Hall of Fame Probability Model - Starter Code
R
library(hoopR)
library(tidyverse)

# Get career stats and HoF status
# Feature engineering (awards, totals, etc.)
# Build logistic regression model
# Predict probabilities for active players
# Calibrate and validate

# Your code here
Python
from sklearn.linear_model import LogisticRegression
import pandas as pd

# Get career stats and HoF status
# Feature engineering
# Build logistic regression model
# Predict probabilities for active players
# Calibrate and validate

# Your code here
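To make the model-building step concrete, here is logistic regression fitted by plain gradient descent on a single made-up feature. It is a dependency-free sketch of what `LogisticRegression` from the starter code does for you (without regularization); the function names, learning rate, epoch count, and toy data are all assumptions for illustration, and calibration and validation are not covered.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(x, w, b):
    """Predicted probability of induction for one feature vector."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Made-up training data: one feature (All-Star selections / 10)
# and a 0/1 Hall of Fame label.
X = [[0.0], [0.1], [0.2], [0.8], [1.0], [1.2]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(X, y)
print(round(predict_proba([0.5], w, b), 3))
```

With real data, features on very different scales (career points versus MVP awards) should be standardized first so that one learning rate works for all of them.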
Hard 75 min
Championship Probability Model

Build a model predicting championship probability for each team based on point differential and other factors.

championship simulation prediction
Championship Probability Model - Starter Code
R
library(hoopR)
library(tidyverse)

# Get team ratings and standings
# Build win probability model
# Simulate playoff bracket
# Calculate championship probabilities
# Compare to betting markets

# Your code here
Python
import pandas as pd
import numpy as np

# Get team ratings and standings
# Build win probability model
# Simulate playoff bracket
# Calculate championship probabilities
# Compare to betting markets

# Your code here
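The simulate-the-bracket step can be prototyped as a Monte Carlo loop over best-of-seven series. In the sketch below, ratings stand in for point differential, the logistic `scale` of 0.15 is a placeholder rather than a fitted value, home-court advantage is ignored, and the four teams are made up. The bracket list must have a power-of-two length, with adjacent entries meeting in the first round.

```python
import math
import random


def game_win_prob(rating_a, rating_b, scale=0.15):
    """Logistic win probability from a rating gap (scale is assumed)."""
    return 1.0 / (1.0 + math.exp(-scale * (rating_a - rating_b)))


def simulate_series(rating_a, rating_b, rng, wins_needed=4):
    """Return True if team A wins a first-to-`wins_needed` series."""
    p = game_win_prob(rating_a, rating_b)
    wins_a = wins_b = 0
    while wins_a < wins_needed and wins_b < wins_needed:
        if rng.random() < p:
            wins_a += 1
        else:
            wins_b += 1
    return wins_a == wins_needed


def simulate_bracket(ratings, bracket, rng):
    """Play rounds until one team remains; return the champion."""
    teams = list(bracket)
    while len(teams) > 1:
        teams = [a if simulate_series(ratings[a], ratings[b], rng) else b
                 for a, b in zip(teams[0::2], teams[1::2])]
    return teams[0]


def title_probabilities(ratings, bracket, n_sims=2000, seed=0):
    """Share of simulations won by each team."""
    rng = random.Random(seed)
    counts = {team: 0 for team in bracket}
    for _ in range(n_sims):
        counts[simulate_bracket(ratings, bracket, rng)] += 1
    return {team: c / n_sims for team, c in counts.items()}


# Made-up ratings (points per game better than average).
ratings = {"A": 8.0, "B": 2.0, "C": 1.0, "D": -3.0}
print(title_probabilities(ratings, ["A", "D", "B", "C"]))
```

Comparing these shares with implied probabilities from betting odds is a quick way to spot where the rating model disagrees with the market.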
Medium 30 min
True Shooting Leaderboard Analysis

Build a complete True Shooting % leaderboard with volume filters, league comparisons, and visualization.

True Shooting Leaderboard Analysis - Starter Code
R
library(tidyverse)
library(hoopR)

# TODO: Load player statistics for 2023-24
# TODO: Calculate True Shooting %
# TODO: Filter for qualified players (10+ FGA per game)
# TODO: Calculate league-relative TS (TS+)
# TODO: Identify the most efficient high-volume scorers
# TODO: Create a visualization
Python
from nba_api.stats.endpoints import LeagueDashPlayerStats
import pandas as pd
import matplotlib.pyplot as plt

# TODO: Load player statistics for 2023-24
# TODO: Calculate True Shooting %
# TODO: Filter for qualified players (10+ FGA per game)
# TODO: Calculate league-relative TS (TS+)
# TODO: Identify the most efficient high-volume scorers
# TODO: Create a visualization
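The two formulas at the heart of this exercise are TS% = PTS / (2 * (FGA + 0.44 * FTA)) and TS+ = 100 * player TS% / league TS%. The sketch below applies them to made-up season totals; the dictionary keys and helper names are hypothetical, and the league figure is computed from the sample itself only because no real league totals are loaded here.

```python
def true_shooting(pts, fga, fta):
    """True Shooting percentage as a fraction (0.60 means 60%)."""
    return pts / (2.0 * (fga + 0.44 * fta))


def ts_plus(player_ts, league_ts):
    """League-relative TS: 100 is league average."""
    return 100.0 * player_ts / league_ts


def leaderboard(players, min_fga_per_game=10.0):
    """Qualified players as (name, TS%, TS+), most efficient first."""
    league_ts = true_shooting(
        sum(p["pts"] for p in players.values()),
        sum(p["fga"] for p in players.values()),
        sum(p["fta"] for p in players.values()),
    )
    rows = []
    for name, p in players.items():
        if p["fga"] / p["games"] >= min_fga_per_game:
            ts = true_shooting(p["pts"], p["fga"], p["fta"])
            rows.append((name, ts, ts_plus(ts, league_ts)))
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Made-up season totals.
players = {
    "Player A": {"games": 70, "pts": 1800, "fga": 1300, "fta": 400},
    "Player B": {"games": 72, "pts": 1500, "fga": 1250, "fta": 300},
    "Player C": {"games": 60, "pts": 500, "fga": 350, "fta": 100},
}
for name, ts, plus in leaderboard(players):
    print(name, round(ts, 3), round(plus, 1))
```

Player C has the best raw TS% in the sample but falls below the volume filter, which is the reason the filter exists.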
Hard 30 min
Build a Simplified BPM Calculator

Create a Box Plus-Minus estimator using regression on box score statistics.

Build a Simplified BPM Calculator - Starter Code
R
library(tidyverse)

# TODO: Load historical player data with known BPM values
# TODO: Select relevant features (per-minute stats)
# TODO: Split into training and test sets
# TODO: Train a regression model
# TODO: Evaluate model accuracy
# TODO: Apply to current season players
Python
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

# TODO: Load historical player data with known BPM values
# TODO: Select relevant features (per-minute stats)
# TODO: Split into training and test sets
# TODO: Train a regression model
# TODO: Evaluate model accuracy
# TODO: Apply to current season players
Hard 30 min
Build a Shot Quality Model

Create an expected points model based on shot location and context.

Build a Shot Quality Model - Starter Code
R
library(tidyverse)
library(hoopR)

# TODO: Load shot chart data
# TODO: Engineer features (distance, angle, shot zone)
# TODO: Train a model to predict make probability
# TODO: Calculate expected points for each shot
# TODO: Evaluate players on shot selection vs shot making
Python
from nba_api.stats.endpoints import ShotChartDetail
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# TODO: Load shot chart data
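# TODO: Engineer features (distance, angle, shot zone)
# TODO: Train a model to predict make probability
# TODO: Calculate expected points for each shot
# TODO: Evaluate players on shot selection vs shot making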
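Before training a classifier, it can help to build the simplest possible baseline: the league make rate in each shot zone, multiplied by the shot's point value. The sketch below does that with made-up shots. The dictionary keys (`zone`, `value`, `made`) and function names are hypothetical, and this zone-average baseline stands in for the `RandomForestClassifier` in the starter code rather than reproducing it.

```python
from collections import defaultdict


def zone_make_rates(shots):
    """League make probability per zone, from raw shot records."""
    made = defaultdict(int)
    attempts = defaultdict(int)
    for s in shots:
        attempts[s["zone"]] += 1
        made[s["zone"]] += s["made"]
    return {zone: made[zone] / attempts[zone] for zone in attempts}


def expected_points(shot, rates):
    """Expected points = make probability * point value."""
    return rates[shot["zone"]] * shot["value"]


def shot_making(player_shots, rates):
    """Actual points minus expected points on the shots a player took."""
    actual = sum(s["made"] * s["value"] for s in player_shots)
    expected = sum(expected_points(s, rates) for s in player_shots)
    return actual - expected


# Made-up league sample: rim shots go in 75%, corner threes 50%.
league_shots = (
    [{"zone": "rim", "value": 2, "made": m} for m in (1, 1, 0, 1)]
    + [{"zone": "corner3", "value": 3, "made": m} for m in (1, 0, 0, 1)]
)
rates = zone_make_rates(league_shots)

player_shots = [{"zone": "corner3", "value": 3, "made": 1},
                {"zone": "corner3", "value": 3, "made": 1}]
print(shot_making(player_shots, rates))
```

Average expected points per attempt describes shot selection, while the actual-minus-expected gap describes shot making, so the two halves of the evaluation fall out of the same table.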
Medium 30 min
Player Similarity Finder

Build a system to find the most similar players based on statistical profiles using cosine similarity.

Player Similarity Finder - Starter Code
R
library(tidyverse)

# TODO: Load player statistics
# TODO: Normalize stats
# TODO: Calculate similarity
# TODO: Find most similar players
Python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.metrics.pairwise import cosine_similarity

# TODO: Load player statistics
# TODO: Normalize stats
# TODO: Calculate similarity matrix
# TODO: Create similarity finder function
Medium 30 min
Interactive Player Comparison Dashboard

Create an interactive radar chart visualization comparing multiple players across key metrics.

Interactive Player Comparison Dashboard - Starter Code
R
library(tidyverse)
library(plotly)

# TODO: Select players to compare
# TODO: Create radar chart
# TODO: Add interactivity
Python
import plotly.graph_objects as go
import pandas as pd

# TODO: Select players
# TODO: Create radar chart
# TODO: Add hover tooltips
Tips for Success
  • Start with Easy exercises if you're new to R or Python
  • Read the chapter content before attempting related exercises
  • Try to solve problems yourself before looking at solutions
  • Experiment with the code: change parameters and see what happens
  • Use the glossary if you encounter unfamiliar terms
  • Don't hesitate to search for documentation online