The Metric Landscape
The proliferation of all-in-one metrics presents both opportunity and challenge. PER, Win Shares, BPM, RPM, RAPTOR, and EPM all purport to measure player value, yet they employ different methodologies, data sources, and philosophical approaches. Understanding how these metrics relate, where they agree and where they diverge, enables more sophisticated player evaluation than relying on any single number.
Methodological Categories
All-in-one metrics fall into distinct categories. Box score metrics (PER, Win Shares, BPM) rely exclusively on traditional box score statistics. Plus-minus hybrid metrics (RPM, RAPTOR, EPM) combine box score information with lineup-based point differential data. Tracking-enhanced metrics add spatial and movement information from player tracking data; RAPTOR, for example, incorporates tracking features alongside its plus-minus component.
Correlation Analysis
The major metrics correlate strongly when identifying the best and worst players. The top 20 players by RAPTOR overlap substantially with the top 20 by EPM, BPM, or Win Shares, and this consensus at the extremes provides a degree of validation.
Divergence increases in the middle of the distribution, where measurement uncertainty is highest. These disagreements reflect both random error and systematic methodological differences.
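A minimal sketch of this check, assuming a data frame with player_name, raptor, and epm columns (the same hypothetical file and column names used in the implementations below), compares top-20 lists and overall rank agreement directly:

import pandas as pd

def top_k_overlap(df, metric_a, metric_b, k=20):
    """Fraction of the top-k players shared between two metrics."""
    top_a = set(df.nlargest(k, metric_a)["player_name"])
    top_b = set(df.nlargest(k, metric_b)["player_name"])
    return len(top_a & top_b) / k

players = pd.read_csv("all_advanced_metrics.csv")  # hypothetical file
print(top_k_overlap(players, "raptor", "epm"))  # share of the top 20 both metrics agree on
print(players["raptor"].corr(players["epm"], method="spearman"))  # rank agreement across all players

High overlap at the top alongside a lower full-sample rank correlation is exactly the pattern described above: consensus on the stars, disagreement in the middle.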
Systematic Biases
Each metric carries systematic biases. PER favors high-usage scorers and rebounders while undervaluing defense. Win Shares allocates defensive credit largely in proportion to minutes played, rewarding playing time as much as defensive skill. BPM's regression captures historical patterns that may not hold for novel player types. Plus-minus hybrids can conflate individual impact with lineup context.
Multi-Metric Analysis
The most robust approach consults multiple metrics. When several agree, confidence increases. When they diverge, the pattern often reveals important information—perhaps a player whose impact doesn't match their statistics.
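One way to operationalize this, sketched below using the standardized columns that compare_metrics builds in the implementation sections that follow, is to rank players by the spread of their z-scores across metrics; a large spread flags exactly the players whose value the metrics dispute:

import pandas as pd

def flag_divergent(df, z_cols, top_n=10):
    """Rank players by metric disagreement: the standard deviation of their z-scores."""
    out = df.copy()
    out["metric_spread"] = out[z_cols].std(axis=1)
    return out.nlargest(top_n, "metric_spread")[["player_name", "metric_spread"] + z_cols]

# Example usage with the z-score columns created by compare_metrics below:
# divergent = flag_divergent(comparison, ["per_z", "bpm_z", "rpm_z", "epm_z"])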
Practical Selection Guidelines
For historical comparisons, PER and Win Shares provide useful baselines. For current evaluation emphasizing on-court impact, RAPTOR and EPM offer the most sophistication. For uncertainty-aware decisions, EPM's Bayesian framework provides explicit confidence intervals. For projections, dedicated systems like DARKO outperform evaluation metrics.
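These guidelines reduce to a simple lookup, kept here as an illustrative Python mapping (the keys are informal use-case labels, not an established API):

# Illustrative defaults per use case, following the guidelines above
METRIC_BY_USE_CASE = {
    "historical_comparison": ["per", "ws_48"],  # long historical coverage
    "current_impact": ["raptor", "epm"],        # modern on-court impact estimates
    "uncertainty_aware": ["epm"],               # Bayesian, with confidence intervals
    "projection": ["darko"],                    # dedicated projection system
}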
The Limits of All-in-One Metrics
No all-in-one metric captures everything that matters. These metrics compress complex, multi-dimensional contributions into single numbers, inevitably losing information. Analysts should view them as useful summaries that complement rather than replace deeper analysis.
Implementation in R
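The sketch below, which assumes a CSV of player seasons with one column per metric (hypothetical file and column names), standardizes each metric as a z-score and averages the z-scores into an equal-weight composite.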
# Compare multiple advanced metrics
library(tidyverse)

compare_metrics <- function(player_stats) {
  player_stats %>%
    select(player_name, per, ws_48, bpm, vorp, rpm, raptor, epm) %>%
    mutate(
      # Standardize each metric (as.numeric() drops the matrix attributes scale() returns)
      per_z = as.numeric(scale(per)),
      ws_z = as.numeric(scale(ws_48)),
      bpm_z = as.numeric(scale(bpm)),
      vorp_z = as.numeric(scale(vorp)),
      rpm_z = as.numeric(scale(rpm)),
      raptor_z = as.numeric(scale(raptor)),
      epm_z = as.numeric(scale(epm)),
      # Composite score: equal-weight average of all seven z-scores
      composite = (per_z + ws_z + bpm_z + vorp_z + rpm_z + raptor_z + epm_z) / 7
    ) %>%
    arrange(desc(composite))
}

player_metrics <- read_csv("all_advanced_metrics.csv")
comparison <- compare_metrics(player_metrics)

# Top 20 by composite
top_composite <- comparison %>%
  filter(!is.na(composite)) %>%
  select(player_name, per, bpm, rpm, composite) %>%
  head(20)
print(top_composite)
# Metric correlation analysis
library(tidyverse)
library(corrplot)

analyze_metric_correlations <- function(player_stats) {
  metrics <- player_stats %>%
    select(per, ws_48, bpm, vorp, rpm, raptor, epm) %>%
    na.omit()
  cor_matrix <- cor(metrics)
  # Visualize correlations
  corrplot(cor_matrix, method = "color", type = "upper",
           tl.col = "black", tl.srt = 45,
           addCoef.col = "black", number.cex = 0.7)
  return(cor_matrix)
}

correlations <- analyze_metric_correlations(player_metrics)
print(round(correlations, 2))
Implementation in Python
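The Python version implements the same equal-weight z-score composite and correlation analysis; scipy's zscore with nan_policy="omit" tolerates players who are missing a metric.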
# Compare multiple advanced metrics
import pandas as pd
from scipy import stats

def compare_metrics(player_stats):
    """Compare and combine advanced metrics."""
    metrics = ["per", "ws_48", "bpm", "vorp", "rpm", "raptor", "epm"]
    df = player_stats.copy()
    # Standardize each metric (nan_policy="omit" skips missing values)
    for m in metrics:
        df[f"{m}_z"] = stats.zscore(df[m], nan_policy="omit")
    # Composite score: mean of the available z-scores
    z_cols = [f"{m}_z" for m in metrics]
    df["composite"] = df[z_cols].mean(axis=1)
    return df.sort_values("composite", ascending=False)

player_metrics = pd.read_csv("all_advanced_metrics.csv")
comparison = compare_metrics(player_metrics)

# Top 20 by composite
top_composite = comparison[["player_name", "per", "bpm", "rpm", "composite"]].head(20)
print(top_composite)
# Metric correlation analysis
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

def analyze_metric_correlations(player_stats):
    """Analyze correlations between advanced metrics."""
    metrics = ["per", "ws_48", "bpm", "vorp", "rpm", "raptor", "epm"]
    cor_matrix = player_stats[metrics].corr()
    # Visualize correlations
    plt.figure(figsize=(10, 8))
    sns.heatmap(cor_matrix, annot=True, cmap="coolwarm", center=0,
                fmt=".2f", square=True)
    plt.title("Correlation Between Advanced Metrics")
    plt.tight_layout()
    plt.show()
    return cor_matrix

correlations = analyze_metric_correlations(player_metrics)
print(correlations.round(2))