Apply what you've learned with hands-on coding exercises. Each exercise includes starter code in both R and Python, along with solutions when you're ready.
70 exercises available
Chapter 4: Data Visualization Fundamentals
Build a multi-panel visualization comparing two players across key statistics. Include a bar chart for counting stats, a radar chart for percentages, and proper styling.
library(tidyverse)
library(ggplot2)
# Player comparison data
player1 <- list(name = "Player A", pts = 28.5, reb = 7.2, ast = 5.4, stl = 1.2, fg_pct = 0.52, ft_pct = 0.85)
player2 <- list(name = "Player B", pts = 24.1, reb = 10.8, ast = 4.1, stl = 0.8, fg_pct = 0.58, ft_pct = 0.72)
# TODO: Create comparison visualization
# 1. Bar chart comparing pts, reb, ast, stl
# 2. Add proper labels and title
# 3. Use appropriate colors for each player
# 4. Create side-by-side or faceted layout
import matplotlib.pyplot as plt
import numpy as np
# Player comparison data
player1 = {"name": "Player A", "pts": 28.5, "reb": 7.2, "ast": 5.4, "stl": 1.2, "fg_pct": 0.52, "ft_pct": 0.85}
player2 = {"name": "Player B", "pts": 24.1, "reb": 10.8, "ast": 4.1, "stl": 0.8, "fg_pct": 0.58, "ft_pct": 0.72}
# TODO: Create comparison visualization
# 1. Bar chart comparing pts, reb, ast, stl
# 2. Add proper labels and title
# 3. Use appropriate colors for each player
# 4. Create side-by-side layout
Chapter 5: Statistical Foundations for Analytics
Test whether there is a statistically significant difference in three-point shooting percentage between guards and forwards. Perform appropriate statistical tests and interpret results.
library(tidyverse)
# Sample data
set.seed(42)
guards <- data.frame(
position = "Guard",
fg3_pct = rnorm(50, mean = 0.38, sd = 0.05)
)
forwards <- data.frame(
position = "Forward",
fg3_pct = rnorm(50, mean = 0.35, sd = 0.06)
)
players <- rbind(guards, forwards)
# TODO:
# 1. Calculate summary statistics for each group
# 2. Perform two-sample t-test
# 3. Calculate effect size (Cohen's d)
# 4. Interpret results
import numpy as np
from scipy import stats
import pandas as pd
# Sample data
np.random.seed(42)
guards = pd.DataFrame({
"position": "Guard",
"fg3_pct": np.random.normal(0.38, 0.05, 50)
})
forwards = pd.DataFrame({
"position": "Forward",
"fg3_pct": np.random.normal(0.35, 0.06, 50)
})
players = pd.concat([guards, forwards])
# TODO:
# 1. Calculate summary statistics for each group
# 2. Perform two-sample t-test
# 3. Calculate effect size (Cohen's d)
# 4. Interpret results
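As a hint for steps 2 and 3, the sketch below computes Welch's t statistic and Cohen's d using only the standard library. It assumes the pooled-standard-deviation form of Cohen's d; the names `cohens_d` and `welch_t` are illustrative and not part of the starter code, and the input values are toy numbers rather than real shooting data.

```python
import math
import statistics


def cohens_d(a, b):
    """Cohen's d using the pooled sample standard deviation (assumed form)."""
    n_a, n_b = len(a), len(b)
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(a) - statistics.mean(b)) / pooled_sd


def welch_t(a, b):
    """Welch's t statistic, which does not assume equal group variances."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


# Toy values chosen for easy arithmetic, not real data.
group_a = [2, 4, 6]
group_b = [1, 3, 5]
print(cohens_d(group_a, group_b))
print(welch_t(group_a, group_b))
```

For the exercise itself, `scipy.stats.ttest_ind(a, b, equal_var=False)` returns the same statistic together with a p-value, which you need for the interpretation step.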
Chapter 5: Statistical Foundations for Analytics
Explore the correlations between key player statistics. Create a correlation matrix, identify the strongest relationships, and visualize the results with a heatmap.
library(tidyverse)
# Sample player data
set.seed(42)
n <- 100
player_stats <- tibble(
pts = runif(n, 5, 30),
ast = runif(n, 1, 10),
reb = runif(n, 2, 12),
min = runif(n, 15, 38),
tov = runif(n, 0.5, 4)
)
# Add some realistic correlations
player_stats$ast <- player_stats$ast + player_stats$pts * 0.1
player_stats$tov <- player_stats$tov + player_stats$ast * 0.2
# TODO:
# 1. Calculate correlation matrix
# 2. Find the 3 strongest correlations
# 3. Create a heatmap visualization
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# Sample player data
np.random.seed(42)
n = 100
player_stats = pd.DataFrame({
"pts": np.random.uniform(5, 30, n),
"ast": np.random.uniform(1, 10, n),
"reb": np.random.uniform(2, 12, n),
"min": np.random.uniform(15, 38, n),
"tov": np.random.uniform(0.5, 4, n)
})
player_stats["ast"] += player_stats["pts"] * 0.1
player_stats["tov"] += player_stats["ast"] * 0.2
# TODO:
# 1. Calculate correlation matrix
# 2. Find the 3 strongest correlations
# 3. Create a heatmap visualization
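One way to approach step 2 is to score every pair of columns and sort by absolute correlation. The sketch below does this with a hand-written Pearson coefficient; `pearson` and `strongest_pairs` are illustrative names, and the three short columns are toy values, not the simulated data above.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def strongest_pairs(columns, top_n=3):
    """Rank column pairs by absolute correlation, strongest first."""
    scored = [
        (a, b, pearson(columns[a], columns[b]))
        for a, b in combinations(columns, 2)
    ]
    return sorted(scored, key=lambda item: abs(item[2]), reverse=True)[:top_n]


# Toy columns for illustration only.
toy = {"pts": [1, 2, 3, 4], "ast": [2, 4, 6, 8], "tov": [4, 3, 1, 2]}
for a, b, r in strongest_pairs(toy):
    print(a, b, round(r, 3))
```

Sorting on the absolute value matters because a strong negative relationship is as informative as a strong positive one. With pandas, `player_stats.corr()` produces the full matrix in one call.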
Chapter 6: Box Score Statistics Deep Dive
Write a function to parse raw box score data and calculate Game Score (John Hollinger's formula) for each player.
library(tidyverse)
# Game Score = PTS + 0.4*FGM - 0.7*FGA - 0.4*(FTA-FTM) + 0.7*OREB + 0.3*DREB + STL + 0.7*AST + 0.7*BLK - 0.4*PF - TOV
# TODO: Create calculate_game_score function
import pandas as pd
# Game Score = PTS + 0.4*FGM - 0.7*FGA - 0.4*(FTA-FTM) + 0.7*OREB + 0.3*DREB + STL + 0.7*AST + 0.7*BLK - 0.4*PF - TOV
# TODO: Create calculate_game_score function
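The formula in the comment translates almost term for term into code. The sketch below assumes each player's box score line arrives as a dictionary; the key names and the sample line are hypothetical, so adapt them to whatever your raw data uses.

```python
def calculate_game_score(row):
    """Game Score from one player's box score line (dict with assumed keys)."""
    return (
        row["pts"]
        + 0.4 * row["fgm"]
        - 0.7 * row["fga"]
        - 0.4 * (row["fta"] - row["ftm"])
        + 0.7 * row["oreb"]
        + 0.3 * row["dreb"]
        + row["stl"]
        + 0.7 * row["ast"]
        + 0.7 * row["blk"]
        - 0.4 * row["pf"]
        - row["tov"]
    )


# Hypothetical box score line for illustration.
line = {"pts": 30, "fgm": 10, "fga": 20, "ftm": 6, "fta": 8, "oreb": 2,
        "dreb": 6, "stl": 1, "ast": 5, "blk": 1, "pf": 3, "tov": 2}
print(round(calculate_game_score(line), 1))
```

To score a whole game, apply the function to each row, for example with `DataFrame.apply(calculate_game_score, axis=1)` in pandas.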
Chapter 7: Shooting Efficiency Metrics (TS%, eFG%)
Create a comprehensive function that calculates FG%, eFG%, and TS% for a player and explains when to use each.
# TODO: Create function that returns FG%, eFG%, and TS%
# TODO: Create function that returns FG%, eFG%, and TS%
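A minimal sketch of the three metrics side by side, using the standard definitions (eFG% adds half credit for each made three; TS% counts free throws with the usual 0.44 weight). The function name and the returned keys are illustrative choices.

```python
def shooting_efficiency(pts, fgm, fg3m, fga, fta):
    """Return FG%, eFG%, and TS% as fractions (None when undefined)."""
    # FG%: makes per attempt, treating twos and threes the same.
    fg_pct = fgm / fga if fga else None
    # eFG%: credits a made three as 1.5 field goals.
    efg_pct = (fgm + 0.5 * fg3m) / fga if fga else None
    # TS%: points per scoring attempt, including free throws.
    true_attempts = fga + 0.44 * fta
    ts_pct = pts / (2 * true_attempts) if true_attempts else None
    return {"fg_pct": fg_pct, "efg_pct": efg_pct, "ts_pct": ts_pct}


# Hypothetical game line: 9-of-18 shooting with 3 threes, 4-of-6 free throws.
print(shooting_efficiency(pts=25, fgm=9, fg3m=3, fga=18, fta=6))
```

Roughly, FG% suits comparisons among players with similar shot profiles, eFG% corrects for three-point volume, and TS% is the broadest because it also accounts for trips to the line.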
Chapter 7: Shooting Efficiency Metrics (TS%, eFG%)
Analyze shooting efficiency by zone (rim, mid-range, three-point) and determine optimal shot distribution.
# TODO: Calculate efficiency by zone and recommend optimal distribution
# TODO: Calculate efficiency by zone and recommend optimal distribution
Chapter 9: Playmaking and Assist Metrics
Create a pass network diagram showing assist relationships between teammates.
# TODO: Build assist network from play-by-play data
# TODO: Build assist network from play-by-play data
Chapter 10: Turnover Analysis and Ball Security
Classify turnovers by type and analyze patterns to identify areas for improvement.
# TODO: Classify and analyze turnover types
# TODO: Classify and analyze turnover types
Chapter 8: Rebounding and Possession Metrics
Implement ORB%, DRB%, and TRB% calculations with proper team context adjustments.
# TODO: Calculate rebounding percentages
# TODO: Calculate rebounding percentages
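All three rebounding percentages share one shape: the player's rebounds scaled to a full team's minutes, divided by the rebounds that were available. The sketch below assumes the widely published box score estimate; `rebound_pct` and its parameter names are illustrative.

```python
def rebound_pct(player_reb, player_min, team_min, team_reb, opp_reb):
    """Estimated percent of available rebounds grabbed while on the floor.

    ORB%: player_reb = ORB, team_reb = team ORB, opp_reb = opponent DRB
    DRB%: player_reb = DRB, team_reb = team DRB, opp_reb = opponent ORB
    TRB%: player_reb = TRB, team_reb = team TRB, opp_reb = opponent TRB
    """
    available = team_reb + opp_reb
    return 100 * (player_reb * (team_min / 5)) / (player_min * available)


# Hypothetical single-game numbers: 3 offensive boards in 36 minutes.
print(rebound_pct(player_reb=3, player_min=36, team_min=240, team_reb=10, opp_reb=30))
```

The `team_min / 5` term is the team context adjustment: it converts total team minutes into minutes per roster slot, so a player who sits for part of the game is not penalized for rebounds he had no chance to collect.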
Chapter 11: Pace and Tempo Analysis
Implement multiple pace calculation formulas and compare results.
# TODO: Calculate pace using multiple formulas
# TODO: Calculate pace using multiple formulas
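As a starting point, the sketch below uses the basic box score possession estimate and the usual pace definition (possessions per 48 minutes, averaged over both teams). The function names and the `ft_weight` parameter are illustrative; changing `ft_weight` is one simple way to produce a second formula to compare against.

```python
def estimate_possessions(fga, oreb, tov, fta, ft_weight=0.44):
    """Basic possession estimate from box score totals."""
    return fga - oreb + tov + ft_weight * fta


def pace(team_poss, opp_poss, team_minutes, game_minutes=48):
    """Possessions per regulation game, averaging both teams' estimates."""
    return game_minutes * (team_poss + opp_poss) / (2 * (team_minutes / 5))


# Hypothetical team totals for one regulation game (240 team minutes).
poss = estimate_possessions(fga=85, oreb=10, tov=14, fta=25)
print(round(poss, 1))
print(pace(team_poss=100, opp_poss=98, team_minutes=240))
```

Averaging the two teams' estimates smooths out the error in each one, since both teams should have nearly the same number of possessions in a game.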
Chapter 11: Pace and Tempo Analysis
Compare two teams' performance after adjusting for pace differences.
# TODO: Pace-adjust and compare teams
# TODO: Pace-adjust and compare teams
Chapter 12: Per-Possession and Rate Statistics
Compute team and player offensive/defensive ratings per 100 possessions.
# TODO: Calculate ORtg and DRtg
# TODO: Calculate ORtg and DRtg
Chapter 12: Per-Possession and Rate Statistics
Build a tool that converts raw stats to per-36 minutes and per-100 possessions.
# TODO: Create per-36 and per-100 conversion functions
# TODO: Create per-36 and per-100 conversion functions
Chapter 13: Player Efficiency Rating (PER)
Implement the PER formula and calculate league-adjusted PER for multiple players.
# TODO: Implement PER calculation
# TODO: Implement PER calculation
Chapter 14: Win Shares and WS/48
Calculate offensive and defensive win shares for a player given their stats and team context.
# TODO: Implement Win Shares calculation
# TODO: Implement Win Shares calculation
Chapter 15: Box Plus-Minus (BPM) and VORP
Build a tool that calculates Box Plus-Minus and Value Over Replacement Player.
# TODO: Calculate BPM and VORP
# TODO: Calculate BPM and VORP
Chapter 16: Real Plus-Minus (RPM)
Use regularized regression to estimate player impact from lineup data.
# TODO: Build RAPM-style model
# TODO: Build RAPM-style model
Chapter 21: Introduction to Player Tracking Data
Load and parse NBA tracking data from JSON format into analysis-ready dataframes.
# TODO: Parse tracking JSON
# TODO: Parse tracking JSON
Chapter 22: Speed, Distance, and Movement Analysis
Analyze player speed from tracking data and create speed distribution visualizations.
# TODO: Calculate and visualize speed
# TODO: Calculate and visualize speed
Chapter 24: Finishing at the Rim
Build a shot quality model that predicts expected points based on shot location and defender distance.
# TODO: Build expected points model
# TODO: Build expected points model
Chapter 30: Defensive Rating and Team Defense
Calculate rim protection metrics including DFG% differential and block rate.
# TODO: Analyze rim protection
# TODO: Analyze rim protection
Chapter 33: Perimeter Defense Metrics
Evaluate perimeter defenders based on contest rate and opponent shooting.
# TODO: Evaluate perimeter defense
# TODO: Evaluate perimeter defense
Chapter 38: Four Factors Analysis
Create a comprehensive Four Factors analysis comparing multiple teams.
# TODO: Build Four Factors comparison
# TODO: Build Four Factors comparison
Chapter 39: Lineup Analysis and Optimization
Calculate net rating for lineup combinations with sample size warnings.
# TODO: Calculate lineup ratings
# TODO: Calculate lineup ratings
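A sketch of how the rating and the warning can travel together: net rating is points scored minus points allowed per 100 possessions, and a flag marks lineups below a possession cutoff. The cutoff of 200 possessions is a hypothetical value picked for illustration, not a recommended threshold.

```python
MIN_POSSESSIONS = 200  # hypothetical cutoff; tune it for your data


def lineup_net_rating(pts_for, pts_against, possessions,
                      min_possessions=MIN_POSSESSIONS):
    """Per-100 ratings for one lineup, plus a small-sample flag."""
    if possessions <= 0:
        raise ValueError("possessions must be positive")
    ortg = 100 * pts_for / possessions
    drtg = 100 * pts_against / possessions
    return {
        "ortg": ortg,
        "drtg": drtg,
        "net_rating": ortg - drtg,
        "small_sample": possessions < min_possessions,
    }


# Hypothetical lineups: one with enough possessions, one without.
print(lineup_net_rating(230, 210, 200))
print(lineup_net_rating(30, 20, 25))
```

The second lineup shows why the flag matters: a +40 net rating over 25 possessions is mostly noise, and should not be ranked alongside lineups with hundreds of possessions.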
Chapter 43: Introduction to Basketball Prediction
Create a logistic regression model to predict game outcomes.
# TODO: Build game predictor
# TODO: Build game predictor
Chapter 45: Player Performance Projections
Build a real-time win probability model based on score margin and time remaining.
# TODO: Calculate win probability
# TODO: Calculate win probability
Chapter 46: Aging Curves and Career Trajectories
Calculate expected wins using Pythagorean expectation and identify lucky/unlucky teams.
# TODO: Calculate expected wins
# TODO: Calculate expected wins
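Pythagorean expectation estimates win percentage from points scored and allowed, and the gap between actual and expected wins is a rough measure of luck. The sketch below assumes an exponent of 13.91, a commonly cited value for basketball; treat it as a tunable parameter. Function names are illustrative.

```python
def pythagorean_win_pct(pts_for, pts_against, exponent=13.91):
    """Expected win fraction; the default exponent is an assumption."""
    scored = pts_for ** exponent
    allowed = pts_against ** exponent
    return scored / (scored + allowed)


def luck(actual_wins, pts_for, pts_against, games=82, exponent=13.91):
    """Actual wins minus Pythagorean expected wins (positive = lucky)."""
    expected = games * pythagorean_win_pct(pts_for, pts_against, exponent)
    return actual_wins - expected


# Hypothetical team that scores exactly as much as it allows but won 45 games.
print(pythagorean_win_pct(110, 110))
print(luck(45, 110, 110))
```

Either per-game averages or season totals work as inputs, since only the ratio of points scored to points allowed affects the result.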
Chapter 2: NBA Data Sources and APIs
Write a function to fetch current season player statistics from the NBA Stats API. Handle potential errors gracefully and return the data as a clean dataframe.
library(httr)
library(jsonlite)
# TODO: Create function to fetch player stats
# fetch_player_stats <- function(season = "2023-24") {
# base_url <- "https://stats.nba.com/stats/leaguedashplayerstats"
# # Add headers and parameters
# # Make request
# # Parse response
# # Return dataframe
# }
# TODO: Call function and display first 10 rows
import requests
import pandas as pd
# TODO: Create function to fetch player stats
# def fetch_player_stats(season="2023-24"):
#     base_url = "https://stats.nba.com/stats/leaguedashplayerstats"
#     # Add headers and parameters
#     # Make request
#     # Parse response
#     # Return dataframe
# TODO: Call function and display first 10 rows
Chapter 48: Win Probability Models
Analyze historical draft data to estimate pick value and trade equivalencies.
# TODO: Analyze draft pick value
# TODO: Analyze draft pick value
Chapter 49: Draft Analytics and Prospect Evaluation
Calculate cost per win share and identify best value contracts.
# TODO: Analyze salary efficiency
# TODO: Analyze salary efficiency
Chapter 56: Machine Learning in Basketball
Use k-means clustering to identify player archetypes based on statistical profiles.
# TODO: Cluster players
# TODO: Cluster players
Chapter 57: Computer Vision and Video Analysis
Create a court visualization with shot locations using tracking coordinates.
# TODO: Create shot chart
# TODO: Create shot chart
Chapter 67: Building an Analytics Dashboard
Create a simple interactive dashboard for player comparison using Streamlit.
# Use Python for Streamlit
# TODO: Build Streamlit dashboard
Chapter P: The Analytics Revolution in Basketball
Load historical NBA scoring data and compare average points per game between the 1990s and 2020s. Calculate the percentage change and visualize the trend over time.
# Load tidyverse
library(tidyverse)
# TODO: Load the historical data from "nba_historical.csv"
# historical <- ???
# TODO: Filter for 1990s (1990-1999) and 2020s (2020-2024)
# nineties <- ???
# twenties <- ???
# TODO: Calculate average PPG for each era
# avg_90s <- ???
# avg_20s <- ???
# TODO: Calculate percentage change
# pct_change <- ???
import pandas as pd
# TODO: Load the historical data
# historical = ???
# TODO: Filter for 1990s and 2020s
# nineties = ???
# twenties = ???
# TODO: Calculate average PPG for each era
# avg_90s = ???
# avg_20s = ???
# TODO: Calculate percentage change
# pct_change = ???
Chapter 1: Setting Up Your Analytics Environment
Create a function that checks if all required packages are installed and returns a summary of your analytics environment including package versions.
# TODO: Create a function that checks for required packages
# check_environment <- function() {
# required <- c("tidyverse", "httr", "jsonlite")
# # Check each package and return status
# }
# TODO: Call the function and print results
# TODO: Create a function that checks for required packages
# def check_environment():
#     required = ["pandas", "numpy", "requests", "matplotlib"]
#     # Check each package and return status
# TODO: Call the function and print results
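One possible shape for the Python version uses the standard library's `importlib.metadata` to look up installed versions without importing each package. The choice to return `None` for a missing package is an illustrative convention, not a requirement of the exercise.

```python
from importlib import metadata


def check_environment(required=("pandas", "numpy", "requests", "matplotlib")):
    """Map each required package to its installed version, or None if missing."""
    status = {}
    for name in required:
        try:
            status[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            status[name] = None
    return status


for package, version in check_environment().items():
    print(f"{package}: {version or 'NOT INSTALLED'}")
```

Note that `metadata.version` expects the distribution name used at install time, which can differ from the import name (for example `scikit-learn` versus `sklearn`).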
Chapter 41: Clutch Performance Analytics
Identify and analyze clutch performance (final 5 minutes, score margin within 5 points).
# TODO: Analyze clutch performance
# TODO: Analyze clutch performance
Chapter 3: Data Wrangling with tidyverse and pandas
Given a messy dataset with missing values, inconsistent formats, and duplicates, clean and transform it into analysis-ready format. Calculate derived metrics and handle edge cases.
library(tidyverse)
# Sample messy data (in practice, load from file)
messy_data <- tibble(
player = c("LeBron James", "lebron james", "Kevin Durant", NA, "Stephen Curry"),
team = c("LAL", "lal", "PHX", "GSW", "GSW"),
pts = c("25.5", "25.5", "29.1", "30.2", "invalid"),
games = c(55, 55, 47, 65, 56),
min = c(35.2, 35.2, 34.8, 32.1, 34.7)
)
# TODO: Clean the data
# 1. Remove duplicates
# 2. Handle missing values
# 3. Convert pts to numeric
# 4. Standardize team abbreviations
# 5. Calculate pts_per_min
import pandas as pd
import numpy as np
# Sample messy data
messy_data = pd.DataFrame({
"player": ["LeBron James", "lebron james", "Kevin Durant", None, "Stephen Curry"],
"team": ["LAL", "lal", "PHX", "GSW", "GSW"],
"pts": ["25.5", "25.5", "29.1", "30.2", "invalid"],
"games": [55, 55, 47, 65, 56],
"min": [35.2, 35.2, 34.8, 32.1, 34.7]
})
# TODO: Clean the data
# 1. Remove duplicates
# 2. Handle missing values
# 3. Convert pts to numeric
# 4. Standardize team abbreviations
# 5. Calculate pts_per_min
Write a function to calculate True Shooting Percentage (TS%) for any player given their points, field goal attempts, and free throw attempts. Test with sample data.
# Create a function called calculate_ts
# Arguments: points (PTS), fga (FGA), fta (FTA)
# Return: True Shooting Percentage
calculate_ts <- function(pts, fga, fta) {
# Your code here
}
# Test cases (TS% = PTS / (2 * (FGA + 0.44 * FTA))):
# calculate_ts(25, 18, 6) should return ~0.606
# calculate_ts(30, 20, 10) should return ~0.615
# Create a function called calculate_ts
# Arguments: pts, fga, fta
# Return: True Shooting Percentage
def calculate_ts(pts, fga, fta):
    """Calculate True Shooting Percentage"""
    # Your code here
    pass
# Test cases (TS% = PTS / (2 * (FGA + 0.44 * FTA))):
# calculate_ts(25, 18, 6) should return ~0.606
# calculate_ts(30, 20, 10) should return ~0.615
Load player statistics and filter to only include players who average at least 20 points per game and play at least 30 minutes per game.
library(hoopR)
library(tidyverse)
# Get 2023-24 player stats
# Filter for PPG >= 20 and MPG >= 30
# Display player name, team, PPG, and MPG
# Your code here
from nba_api.stats.endpoints import leaguedashplayerstats
import pandas as pd
# Get 2023-24 player stats
# Filter for PPG >= 20 and MPG >= 30
# Display player name, team, PPG, and MPG
# Your code here
Find the leading assister for each NBA team. Display the player name, team abbreviation, and assists per game.
library(hoopR)
library(tidyverse)
# Get player stats
# Group by team
# Find max assists per team
# Display results
# Your code here
from nba_api.stats.endpoints import leaguedashplayerstats
import pandas as pd
# Get player stats
# Group by team
# Find max assists per team
# Display results
# Your code here
Convert raw counting statistics to per-100-possession rates. Create a function that takes points, rebounds, assists, and possessions played, returning the per-100 rates.
# Create per_100_stats function
# Input: pts, reb, ast, possessions
# Output: named vector with per-100 rates
per_100_stats <- function(pts, reb, ast, poss) {
# Your code here
}
# Test: A player with 20 pts, 8 reb, 5 ast in 80 possessions
# Should return pts100 = 25, reb100 = 10, ast100 = 6.25
# Create per_100_stats function
# Input: pts, reb, ast, possessions
# Output: dictionary with per-100 rates
def per_100_stats(pts, reb, ast, poss):
    """Convert to per-100 possession rates"""
    # Your code here
    pass
# Test: A player with 20 pts, 8 reb, 5 ast in 80 possessions
# Should return pts100 = 25, reb100 = 10, ast100 = 6.25
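If you want to check your approach, here is one possible version. Each rate is the raw total multiplied by `100 / poss`; the guard against non-positive possessions is an added assumption about how bad input should be handled.

```python
def per_100_stats(pts, reb, ast, poss):
    """Convert raw totals to per-100-possession rates."""
    if poss <= 0:
        raise ValueError("poss must be positive")
    scale = 100 / poss
    return {
        "pts100": pts * scale,
        "reb100": reb * scale,
        "ast100": ast * scale,
    }


# The test case from the exercise: 20 pts, 8 reb, 5 ast in 80 possessions.
print(per_100_stats(20, 8, 5, 80))
```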
Using shot location data, create a basic shot chart visualization showing makes and misses on a basketball court diagram.
library(hoopR)
library(ggplot2)
# Get shot data for a player (e.g., Stephen Curry)
# Create court outline
# Plot shots colored by make/miss
# Add title and legend
# Your code here
from nba_api.stats.endpoints import shotchartdetail
import matplotlib.pyplot as plt
import matplotlib.patches as patches
# Get shot data for a player
# Create court outline
# Plot shots colored by make/miss
# Add title and legend
# Your code here
Create a hexbin shot chart showing shooting efficiency by court zone. Use color to indicate efficiency (FG%) and size/opacity for volume.
library(hoopR)
library(ggplot2)
library(hexbin)
# Get shot data
# Create hexagonal bins by court location
# Color by efficiency, size by volume
# Add court lines and labels
# Your code here
from nba_api.stats.endpoints import shotchartdetail
import matplotlib.pyplot as plt
from matplotlib.collections import PatchCollection
import numpy as np
# Get shot data
# Create hexagonal bins by court location
# Color by efficiency, size by volume
# Add court lines and labels
# Your code here
Implement the PER formula to calculate Player Efficiency Rating. Compare your results to published values.
# PER formula components (unadjusted PER, before pace and league normalization):
# uPER = (1/MP) * [3PM + (2/3)*AST + (2-factor*tm_ast/tm_fg)*FG
# + FT*0.5*(1+(1-(tm_ast/tm_fg))+(2/3)*(tm_ast/tm_fg))
# - VOP*TOV - VOP*DRB%*(FGA-FG) - VOP*0.44*(0.44+(0.56*DRB%))*(FTA-FT)
# + VOP*(1-DRB%)*(TRB-ORB) + VOP*DRB%*ORB + VOP*STL + VOP*DRB%*BLK
# - PF*(lg_FT/lg_PF - 0.44*(lg_FTA/lg_PF)*VOP)]
# Implement step by step
# Your code here
# PER Formula - Implement step by step
# This is a complex calculation with many components
# Your code here
Implement a simplified BPM calculation. Use the box score components to estimate a player's point differential contribution.
# BPM uses regression weights on box score stats
# Simplified formula (actual uses more complex coefficients):
# BPM ≈ 0.123*ORB% + 0.053*DRB% - 0.104*AST% + 0.076*STL%
# + 0.131*BLK% - 0.036*TOV% + 0.003*USG% - 0.087*Position
# Implement and test on sample players
# Your code here
# BPM uses regression weights on box score stats
# Implement the simplified calculation
# Your code here
Load player tracking data and create player profiles based on speed, touches, and time of possession. Identify different player types.
library(hoopR)
library(tidyverse)
# Get tracking data
# Calculate key metrics: speed, touches, time of possession
# Identify clusters of player types
# Visualize the profiles
# Your code here
from nba_api.stats.endpoints import leaguedashptstats
import pandas as pd
from sklearn.cluster import KMeans
# Get tracking data
# Calculate key metrics
# Identify clusters of player types
# Visualize the profiles
# Your code here
Calculate individual defensive ratings using tracking data. Compare rim protection, perimeter defense, and overall impact.
library(hoopR)
library(tidyverse)
# Get defensive tracking data
# Calculate DFG% at rim and perimeter
# Compare to opponent average FG%
# Rank players by defensive impact
# Your code here
from nba_api.stats.endpoints import leaguedashptdefend
import pandas as pd
# Get defensive tracking data
# Calculate DFG% at rim and perimeter
# Compare to opponent average FG%
# Rank players by defensive impact
# Your code here
Analyze the relationship between contested rebounds and team defensive rebounding success. Which players provide the most value?
library(hoopR)
library(tidyverse)
# Get hustle stats with rebound data
# Calculate contested vs uncontested rebounds
# Correlate with team defensive rebounding
# Identify high-value rebounders
# Your code here
from nba_api.stats.endpoints import leaguehustlestatsplayer
import pandas as pd
# Get hustle stats with rebound data
# Calculate contested vs uncontested rebounds
# Correlate with team defensive rebounding
# Identify high-value rebounders
# Your code here
Compare assist-based playmaking metrics. Calculate potential assists, assist conversion rate, and points created for top playmakers.
library(hoopR)
library(tidyverse)
# Get passing stats
# Calculate potential assists and conversion rate
# Estimate points created from assists
# Compare top playmakers
# Your code here
from nba_api.stats.endpoints import leaguedashptstats
import pandas as pd
# Get passing stats
# Calculate potential assists and conversion rate
# Estimate points created from assists
# Compare top playmakers
# Your code here
Create a logistic regression model predicting shot make probability based on distance, shot type, and defender distance.
library(hoopR)
library(tidyverse)
# Get shot data with features
# Build logistic regression model
# Predict make probability
# Calculate expected points (xPTS)
# Evaluate model accuracy
# Your code here
from nba_api.stats.endpoints import shotchartdetail
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
import pandas as pd
# Get shot data with features
# Build logistic regression model
# Predict make probability
# Calculate expected points (xPTS)
# Evaluate model accuracy
# Your code here
Analyze the historical trend of three-point shooting. Calculate year-over-year changes in 3PA, 3P%, and the decline of mid-range.
library(hoopR)
library(tidyverse)
library(ggplot2)
# Get historical league averages
# Calculate 3PA and 3P% by year
# Track mid-range volume decline
# Visualize the revolution
# Your code here
from basketball_reference_scraper import seasons
import pandas as pd
import matplotlib.pyplot as plt
# Get historical league averages
# Calculate 3PA and 3P% by year
# Track mid-range volume decline
# Visualize the revolution
# Your code here
Given shot quality data, calculate each player's expected points from their shot distribution. Identify players with good/bad shot selection.
library(hoopR)
library(tidyverse)
# Get shot zone data for players
# Calculate expected points by zone
# Compare to league average
# Identify best/worst shot selection
# Your code here
from nba_api.stats.endpoints import playerdashptshots
import pandas as pd
# Get shot zone data for players
# Calculate expected points by zone
# Compare to league average
# Identify best/worst shot selection
# Your code here
Create a simple projection model for next-season performance using previous seasons, age, and regression to mean.
library(hoopR)
library(tidyverse)
# Get multi-year player data
# Create features (previous stats, age, etc.)
# Build regression model
# Project next season
# Validate on historical data
# Your code here
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
# Get multi-year player data
# Create features (previous stats, age, etc.)
# Build regression model
# Project next season
# Validate on historical data
# Your code here
Design and implement your own composite metric measuring a specific aspect of player value. Validate its predictive power.
library(hoopR)
library(tidyverse)
# Define what you want to measure
# Select component statistics
# Weight and combine components
# Test correlation with wins
# Document and explain the metric
# Your code here
import pandas as pd
from scipy.stats import pearsonr
# Define what you want to measure
# Select component statistics
# Weight and combine components
# Test correlation with wins
# Document and explain the metric
# Your code here
Build a tool that finds the most similar players to a given player using statistical profiles and clustering.
library(hoopR)
library(tidyverse)
# Get player statistical profiles
# Normalize/scale features
# Calculate similarity (cosine or Euclidean)
# Find N most similar players
# Create visualization
# Your code here
from sklearn.preprocessing import StandardScaler
from sklearn.metrics.pairwise import cosine_similarity
import pandas as pd
# Get player statistical profiles
# Normalize/scale features
# Calculate similarity (cosine or Euclidean)
# Find N most similar players
# Create visualization
# Your code here
Analyze the performance of different 5-player lineups. Calculate net rating and identify the best/worst combinations.
library(hoopR)
library(tidyverse)
# Get lineup data for a team
# Calculate net rating for each lineup
# Filter for adequate minutes
# Identify best/worst lineups
# Analyze what makes them work
# Your code here
from nba_api.stats.endpoints import teamdashlineups
import pandas as pd
# Get lineup data for a team
# Calculate net rating for each lineup
# Filter for adequate minutes
# Identify best/worst lineups
# Analyze what makes them work
# Your code here
Convert a team's statistics to pace-adjusted values. Compare raw vs. pace-adjusted rankings.
library(hoopR)
library(tidyverse)
# Get team stats including pace
# Calculate per-100-possession values
# Compare raw vs adjusted rankings
# Identify biggest movers
# Your code here
from nba_api.stats.endpoints import teamgamelogs
import pandas as pd
# Get team stats including pace
# Calculate per-100-possession values
# Compare raw vs adjusted rankings
# Identify biggest movers
# Your code here
Use K-means clustering to identify player archetypes based on statistical profiles. Visualize and label the clusters.
library(hoopR)
library(tidyverse)
library(cluster)
# Get comprehensive player stats
# Select features for clustering
# Run K-means with various k
# Visualize clusters with PCA
# Label and interpret archetypes
# Your code here
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
import pandas as pd
import matplotlib.pyplot as plt
# Get comprehensive player stats
# Select features for clustering
# Run K-means with various k
# Visualize clusters with PCA
# Label and interpret archetypes
# Your code here
Calculate the value per dollar for player contracts. Identify the best and worst values in the league.
library(hoopR)
library(tidyverse)
# Get player salaries and stats
# Estimate wins contributed (using BPM or WS)
# Calculate value per dollar
# Identify best/worst contracts
# Consider age and years remaining
# Your code here
import pandas as pd
# Get player salaries and stats
# Estimate wins contributed
# Calculate value per dollar
# Identify best/worst contracts
# Consider age and years remaining
# Your code here
Build aging curves showing how different skills change with age. Use the delta method on year-over-year changes.
library(hoopR)
library(tidyverse)
# Get multi-year player data
# Calculate year-over-year changes
# Group by age
# Build aging curves
# Visualize by skill type
# Your code here
import pandas as pd
import matplotlib.pyplot as plt
# Get multi-year player data
# Calculate year-over-year changes
# Group by age
# Build aging curves
# Visualize by skill type
# Your code here
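The delta method can be sketched in three steps: pair each player's consecutive seasons, average the change at each age, then chain the averages into a cumulative curve. The version below is unweighted and uses a hypothetical `(player, age, value)` input layout with toy values; weighting the deltas by minutes played is a common refinement that is left out here.

```python
from collections import defaultdict


def aging_curve(seasons):
    """Cumulative average change by age, relative to the youngest paired age."""
    by_player = defaultdict(dict)
    for player, age, value in seasons:
        by_player[player][age] = value

    # Collect each player's change from one age to the next.
    deltas = defaultdict(list)
    for ages in by_player.values():
        for age, value in ages.items():
            if age + 1 in ages:
                deltas[age + 1].append(ages[age + 1] - value)
    if not deltas:
        return {}

    # Chain the mean deltas, stopping at the first age with no paired seasons.
    start = min(deltas) - 1
    curve, level = {start: 0.0}, 0.0
    for age in range(start + 1, max(deltas) + 1):
        if age not in deltas:
            break
        level += sum(deltas[age]) / len(deltas[age])
        curve[age] = level
    return curve


# Toy seasons for two hypothetical players.
toy = [("A", 22, 10), ("A", 23, 12), ("A", 24, 13), ("B", 23, 8), ("B", 24, 10)]
print(aging_curve(toy))
```

Because only players who appear in both seasons contribute to a delta, the curve compares each player with himself, which avoids mixing players of different quality at different ages.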
Build a simple NBA draft projection model using college statistics. Predict NBA success based on college performance.
library(hoopR)
library(tidyverse)
# Get historical draft and college data
# Define success metric (NBA WAR, etc.)
# Feature engineering
# Train prediction model
# Evaluate on recent drafts
# Your code here
from sklearn.ensemble import GradientBoostingRegressor
import pandas as pd
# Get historical draft and college data
# Define success metric
# Feature engineering
# Train prediction model
# Evaluate on recent drafts
# Your code here
Compare players across eras using era-adjusted statistics. Who was truly the best scorer relative to their era?
library(hoopR)
library(tidyverse)
# Get historical player and league data
# Calculate Z-scores relative to era
# Compare across different eras
# Create era-adjusted rankings
# Your code here
import pandas as pd
# Get historical player and league data
# Calculate Z-scores relative to era
# Compare across different eras
# Create era-adjusted rankings
# Your code here
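The core of the era adjustment is a z-score: how many standard deviations a player sits above the league mean of his own season. A minimal sketch, using the sample standard deviation and toy league values; `era_z_score` is an illustrative name.

```python
import statistics


def era_z_score(player_value, era_values):
    """Standard deviations above (or below) the mean of the player's own era."""
    mean = statistics.mean(era_values)
    sd = statistics.stdev(era_values)
    return (player_value - mean) / sd


# Toy league distribution for one season, not real data.
league_ppg = [10, 20, 30]
print(era_z_score(30, league_ppg))
```

Ranking players by this value instead of raw points per game puts a scorer from a low-scoring era and one from a high-scoring era on the same scale.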
Create a composite ranking system for all-time greatest players combining peak performance, longevity, and championships.
library(hoopR)
library(tidyverse)
# Get career stats for all-time players
# Calculate peak (best 5-year WAR)
# Calculate career value (total WAR)
# Factor in championships
# Create weighted composite ranking
# Your code here
import pandas as pd
# Get career stats for all-time players
# Calculate peak (best 5-year WAR)
# Calculate career value (total WAR)
# Factor in championships
# Create weighted composite ranking
# Your code here
Build a logistic regression model predicting Hall of Fame induction based on career statistics and awards.
library(hoopR)
library(tidyverse)
# Get career stats and HoF status
# Feature engineering (awards, totals, etc.)
# Build logistic regression model
# Predict probabilities for active players
# Calibrate and validate
# Your code here
from sklearn.linear_model import LogisticRegression
import pandas as pd
# Get career stats and HoF status
# Feature engineering
# Build logistic regression model
# Predict probabilities for active players
# Calibrate and validate
# Your code here
Build a model predicting championship probability for each team based on point differential and other factors.
library(hoopR)
library(tidyverse)
# Get team ratings and standings
# Build win probability model
# Simulate playoff bracket
# Calculate championship probabilities
# Compare to betting markets
# Your code here
import pandas as pd
import numpy as np
# Get team ratings and standings
# Build win probability model
# Simulate playoff bracket
# Calculate championship probabilities
# Compare to betting markets
# Your code here
Build a complete True Shooting % leaderboard with volume filters, league comparisons, and visualization.
library(tidyverse)
library(hoopR)
# TODO: Load player statistics for 2023-24
# TODO: Calculate True Shooting %
# TODO: Filter for qualified players (10+ FGA per game)
# TODO: Calculate league-relative TS (TS+)
# TODO: Identify the most efficient high-volume scorers
# TODO: Create a visualization
from nba_api.stats.endpoints import LeagueDashPlayerStats
import pandas as pd
import matplotlib.pyplot as plt
# TODO: Load player statistics for 2023-24
# TODO: Calculate True Shooting %
# TODO: Filter for qualified players (10+ FGA per game)
# TODO: Calculate league-relative TS (TS+)
# TODO: Identify the most efficient high-volume scorers
# TODO: Create a visualization
Create a Box Plus-Minus estimator using regression on box score statistics.
library(tidyverse)
# TODO: Load historical player data with known BPM values
# TODO: Select relevant features (per-minute stats)
# TODO: Split into training and test sets
# TODO: Train a regression model
# TODO: Evaluate model accuracy
# TODO: Apply to current season players
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
# TODO: Load historical player data with known BPM values
# TODO: Select relevant features (per-minute stats)
# TODO: Split into training and test sets
# TODO: Train a regression model
# TODO: Evaluate model accuracy
# TODO: Apply to current season players
Create an expected points model based on shot location and context.
library(tidyverse)
library(hoopR)
# TODO: Load shot chart data
# TODO: Engineer features (distance, angle, shot zone)
# TODO: Train a model to predict make probability
# TODO: Calculate expected points for each shot
# TODO: Evaluate players on shot selection vs shot making
from nba_api.stats.endpoints import ShotChartDetail
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
# TODO: Load shot chart data
# TODO: Engineer features
# TODO: Train model
# TODO: Calculate expected points
Build a system to find the most similar players based on statistical profiles using cosine similarity.
library(tidyverse)
# TODO: Load player statistics
# TODO: Normalize stats
# TODO: Calculate similarity
# TODO: Find most similar players
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.metrics.pairwise import cosine_similarity
# TODO: Load player statistics
# TODO: Normalize stats
# TODO: Calculate similarity matrix
# TODO: Create similarity finder function
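The four steps can be sketched without any third-party packages: standardize each statistic so no single column dominates, then rank the other players by the cosine of the angle between profile vectors. The player names and numbers below are hypothetical, and the helper names are illustrative.

```python
import math
import statistics


def standardize(profiles):
    """Z-score each statistic (column) across all players."""
    columns = list(zip(*profiles.values()))
    means = [statistics.mean(col) for col in columns]
    # Fall back to 1.0 so a constant column does not divide by zero.
    sds = [statistics.pstdev(col) or 1.0 for col in columns]
    return {
        name: [(v - m) / s for v, m, s in zip(stats, means, sds)]
        for name, stats in profiles.items()
    }


def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(target, profiles, top_n=3):
    """Players ranked by cosine similarity to the target, most similar first."""
    scaled = standardize(profiles)
    scores = [
        (name, cosine(scaled[target], vec))
        for name, vec in scaled.items()
        if name != target
    ]
    return sorted(scores, key=lambda item: item[1], reverse=True)[:top_n]


# Hypothetical per-game profiles: [pts, reb, ast].
profiles = {
    "Guard A": [25, 4, 8],
    "Guard B": [22, 3, 9],
    "Center C": [14, 12, 2],
    "Center D": [12, 11, 1],
}
print(most_similar("Guard A", profiles))
```

The same pipeline maps directly onto the imports in the starter code: `StandardScaler` replaces `standardize`, and `cosine_similarity` computes the full player-by-player matrix at once.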
Create an interactive radar chart visualization comparing multiple players across key metrics.
library(tidyverse)
library(plotly)
# TODO: Select players to compare
# TODO: Create radar chart
# TODO: Add interactivity
import plotly.graph_objects as go
import pandas as pd
# TODO: Select players
# TODO: Create radar chart
# TODO: Add hover tooltips