Why Plus-Minus Needs a Thousand Games: On-Off, RAPM, and Noise

Plus-minus feels like it should be the perfect stat. It’s the scoreboard, attributed to a player: how did the team do while you were out there? No subjective judgment, no box-score bookkeeping — just the margin. And yet raw plus-minus is the single noisiest number on the entire stat sheet, the one most likely to lie to you in a confident voice. Understanding why is a short course in everything that makes evaluating basketball hard.

Raw plus-minus: the noisiest number on the sheet

Raw plus-minus is a player’s team point differential while he’s on the court. Simple to define, treacherous to interpret — because basketball is five-on-five, and that margin is the joint product of ten players, only one of whom is the subject. Your plus-minus is shaped by the four teammates next to you and the five opponents across from you at least as much as by anything you personally do.

Play your minutes alongside three All-Stars against the other team’s benches, and your plus-minus will glow no matter how you perform. Play them as the lone competent player on a bad unit against opposing starters, and it’ll look dreadful even if you’re the only reason the margin isn’t worse. The number isn’t measuring you; it’s measuring the situations you happened to be in. Over a small sample — a game, a week, even a month — it’s mostly noise.
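To make the definition concrete, here is a minimal sketch of how raw plus-minus is tallied from a stint log. The stint format, player labels, and scores are all made up for illustration; real play-by-play data needs far more cleaning than this.

```python
from collections import defaultdict

# Hypothetical stint log: (our five players on court, points for, points against).
stints = [
    ({"A", "B", "C", "D", "E"}, 12, 8),
    ({"A", "B", "C", "D", "F"}, 10, 5),
    ({"F", "G", "H", "I", "J"}, 6, 11),
    ({"A", "G", "H", "I", "J"}, 7, 9),
]

def raw_plus_minus(stints):
    """Sum the team's margin over every stint each player appeared in."""
    totals = defaultdict(int)
    for lineup, pts_for, pts_against in stints:
        margin = pts_for - pts_against
        for player in lineup:
            totals[player] += margin  # every player on court gets the full margin
    return dict(totals)

pm = raw_plus_minus(stints)
print(pm["B"], pm["F"], pm["G"])
```

In this toy log, player F is +5 in one stint next to the starters and -5 in another next to the bench. Nothing in the tally says whether F played any differently in the two stints; only the company changed.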

A hypothetical that shows the trap

Picture a reserve — call him a deep-bench wing — who posts a sparkling plus-minus over a stretch of games. The naive read is that he’s quietly excellent and deserves more minutes. But look at when he plays. Suppose his minutes come almost entirely in the second and fourth quarters alongside the team’s two best players, against the opponent’s reserves, during stretches when the starters have already built a lead. Of course the margin is positive while he’s out there — the lineup around him is excellent and the competition is weak.

Those circumstances are illustrative, but the trap is entirely real and entirely common. The reserve’s glowing plus-minus is a fact about his context (great teammates, soft opponents, garbage-adjacent minutes), not about his individual contribution. Promote him to starter minutes against starters, strip away the team’s two best players, and that shine can vanish overnight. Raw plus-minus seemed to point at a star, when it was really describing the company he kept.

On/off splits: a step up, still confounded

The natural next move is the on/off split: compare how the team does with a player on the floor versus off it. This is genuinely better than raw plus-minus, because it controls for the team’s baseline — a player on a great team and a player on a poor one are each measured against their own context.

But on/off is confounded by exactly the thing raw plus-minus is: who you share minutes with. If a player almost always plays alongside a particular co-star, his on-court number is really the duo’s number, and his off-court number reflects whatever lineups happen to fill the bench minutes. Staggered rotations, injuries, and a coach’s habits all leak into the split. On/off tells you what happened around a player; it still can’t cleanly separate his fingerprint from his teammates’.
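A small sketch of the on/off calculation shows the confound directly. The stint tuples and possession counts below are hypothetical, and scaling margin to per-100-possessions is one common convention assumed here, not something the definition above requires.

```python
# Hypothetical stints: (our lineup, points for, points against, possessions).
stints = [
    ({"A", "B", "C", "D", "E"}, 12, 8, 10),
    ({"A", "B", "C", "D", "F"}, 10, 5, 10),
    ({"F", "G", "H", "I", "J"}, 6, 11, 10),
    ({"A", "G", "H", "I", "J"}, 7, 9, 10),
]

def net_rating(stints):
    """Team margin per 100 possessions over the given stints."""
    margin = sum(pf - pa for _, pf, pa, _ in stints)
    poss = sum(p for *_, p in stints)
    return 100.0 * margin / poss if poss else float("nan")

def on_off(stints, player):
    """Return (on-court rating, off-court rating, on minus off) for one player."""
    on = [s for s in stints if player in s[0]]
    off = [s for s in stints if player not in s[0]]
    on_rtg, off_rtg = net_rating(on), net_rating(off)
    return on_rtg, off_rtg, on_rtg - off_rtg

print(on_off(stints, "B"))
print(on_off(stints, "C"))
```

Players B and C never appear apart in this log, so their on/off splits are identical by construction. The split cannot say which of them, if either, is driving the result.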

APM and RAPM: untangling ten players with regression

To actually isolate one player, you need to account for the other nine in every stint simultaneously — and that’s a regression problem. Adjusted plus-minus (APM) does exactly this: it treats every lineup combination across thousands of stints as data and solves for each player’s individual contribution to margin, holding teammates and opponents constant. In principle it answers the dream question — how much does this specific player move the needle, independent of who’s around him?
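As a rough sketch of the regression setup, each stint can be encoded as one row with +1 for the five players on one side, -1 for the five on the other, and 0 for everyone else, with that stint’s margin as the target. The player labels, the 12-player pool, and the per-100-possession scaling are assumptions for illustration; implementations differ in details such as weighting and separate offense and defense terms.

```python
def design_row(side_a, side_b, players):
    """Encode one stint: +1 for side A's players, -1 for side B's, 0 otherwise."""
    return [1 if p in side_a else -1 if p in side_b else 0 for p in players]

players = [f"P{i}" for i in range(12)]       # hypothetical player pool
side_a = {"P0", "P1", "P2", "P3", "P4"}
side_b = {"P5", "P6", "P7", "P8", "P9"}

row = design_row(side_a, side_b, players)
# Target for this row: side A's margin scaled to 100 possessions
# (12 points scored, 8 allowed, over 10 possessions in this made-up stint).
margin_per_100 = 100.0 * (12 - 8) / 10

print(row, margin_per_100)
```

Stacking one such row per stint gives the matrix the regression solves against. Each coefficient is then a player’s estimated effect on margin with the other nine accounted for.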

In practice, raw APM is unstable, and the culprit is multicollinearity: players who almost always share the floor are statistically hard to tell apart, because the model rarely sees them separately. The regression, starved of independent variation, throws up wild, jumpy estimates — huge positive values for one member of an inseparable pair and huge negatives for the other. The signal is there, but it’s drowning in variance.

RAPM, short for regularized adjusted plus-minus, is the fix, and it’s a beautiful one. It adds ridge regularization, which is equivalent to a Bayesian prior centered on zero: a penalty that gently pulls every player’s estimate toward zero unless the data pushes back hard enough to justify moving it. In plain terms, the model starts by assuming everyone is league-average (zero on the RAPM scale) and demands real evidence before crediting anyone with being far better or worse. That penalty tames the multicollinearity, trading a little bias for a large reduction in variance, and turns the jumpy APM mess into estimates stable enough to actually use.
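The effect of the penalty can be seen in a toy version of the problem. The sketch below solves the closed-form ridge system (X'X + lam*I) beta = X'y in plain Python for three hypothetical players, two of whom always share the floor. The data, the penalty value, and the tiny solver are illustrative assumptions; production RAPM fits thousands of stints and typically tunes the penalty by cross-validation.

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ridge(x, y, lam):
    """Closed-form ridge estimate: (X'X + lam*I)^-1 X'y."""
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) + (lam if i == j else 0.0)
            for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(x, y)) for i in range(k)]
    return solve(xtx, xty)

# Columns are players P0, P1, P2. P0 and P1 appear in exactly the same stints.
x = [[1, 1, 0], [1, 1, 0], [0, 0, 1], [1, 1, -1]]
y = [6.0, 2.0, -1.0, 5.0]  # made-up stint margins

for lam in (0.0, 1.0):
    try:
        print(lam, [round(b, 3) for b in ridge(x, y, lam)])
    except ValueError as err:
        print(lam, "no unique solution:", err)
```

With the penalty at zero, the inseparable pair makes the system singular and there is no unique answer. With a positive penalty the system becomes solvable and the two players receive the same estimate, which is the honest reading of data that never saw them apart.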

RAPM’s entire trick is one prior: a regularization prior that pulls every estimate toward zero until the data earns the right to move it, which is exactly why it needs seasons, not games, to stabilize.

Why RAPM needs a thousand games

That regularization comes with a string attached, and it’s the headline of this whole piece: RAPM needs a lot of data to stabilize. Because the prior deliberately holds estimates near zero until the evidence accumulates, a single season often isn’t enough to separate genuinely great players from merely good ones — there simply aren’t enough independent lineup combinations in one year to overcome the multicollinearity for every player. The estimates are still half-shrunk toward the mean.
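A back-of-the-envelope sketch shows why sample size matters so much. In the simplest possible case, a one-player ridge model where the player is on court for all n observations, the ridge estimate equals the unregularized estimate multiplied by n / (n + lam). The penalty value below is hypothetical, and real RAPM with ten interacting players shrinks in a more complicated way, so treat this only as intuition.

```python
def retained_fraction(n_obs, lam):
    """Share of the unregularized estimate that survives ridge shrinkage
    in a one-player model with the player present in every observation."""
    return n_obs / (n_obs + lam)

lam = 2000.0  # hypothetical penalty, not taken from any published RAPM
for n in (500, 2000, 8000):
    print(n, round(retained_fraction(n, lam), 2))
```

When the number of observations equals the penalty, exactly half of the raw signal survives. It takes several times that much data before the estimate is mostly evidence rather than mostly prior.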

This is why serious RAPM is almost always multi-year, pooling several seasons so the model sees each player in enough varied contexts to pin him down. The practical rules fall right out of this:

First, never trust one season of RAPM as a precise player rating: the single-year version is noisy enough to mislead, and the gaps between similarly rated players are mostly statistical fog. Second, use the multi-year version when you can, and read even that with humility about its error bars. Third, and most important, combine it with box-score metrics. On/off-based numbers and box-score numbers fail in different, partly uncorrelated ways (the box score can’t see defense or spacing well, while plus-minus methods can’t see who actually did what), so blending them cancels some of each one’s noise. That blend is precisely the design philosophy behind the modern all-in-one metrics we cover in BPM and EPM: a box-score component for stability and credit attribution, fused with an on/off component for the things the box score misses.
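The claim that blending cancels noise follows from the variance of a weighted average. The sketch below computes the error spread of a blend of two estimates given each one’s standard deviation and the correlation between their errors. The numbers are invented, and this is a generic statistical illustration, not a description of how BPM or EPM are actually built.

```python
import math

def blend_sd(sd_a, sd_b, rho, w=0.5):
    """Standard deviation of w*A + (1-w)*B, given each error's sd and their correlation."""
    var = (w * sd_a) ** 2 + ((1 - w) * sd_b) ** 2 + 2 * w * (1 - w) * rho * sd_a * sd_b
    return math.sqrt(var)

# Two hypothetical metrics, each with an error sd of 2.0 points.
for rho in (0.0, 0.5, 1.0):
    print(rho, round(blend_sd(2.0, 2.0, rho), 3))
```

The blend only beats its ingredients when their errors are less than perfectly correlated. That is why pairing a plus-minus method with a box-score method helps more than averaging two plus-minus variants that share the same blind spots.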

The takeaway

Plus-minus is a ladder, and most arguments go wrong by standing on the bottom rung. Raw plus-minus is dominated by teammates and opponents — pure context, mostly noise over any short window. On/off climbs a step but stays tangled in who you play with. APM tries to cut the knot with regression and shakes itself apart on multicollinearity. RAPM steadies it with a prior toward zero — which is exactly why it demands multiple seasons before its estimates mean much. So treat any single-season plus-minus derivative as a hint, never a verdict; reach for multi-year data; and blend it with the box score, because the surest way to be fooled in basketball analytics is to trust one noisy number in isolation. If you want to see how margin gets converted into something more interpretable at the team level, that’s the work of offensive and defensive rating, and the in-game cousin of all this lives in win probability models.

Sources & Further Reading

Background reading: Chapter 11: Regularized Adjusted Plus-Minus (RAPM), a free textbook chapter at DataField.dev.
On/off splits, lineup, and play-by-play data: PBP Stats and NBA.com/stats.
Plus-minus, APM, and RAPM definitions: Basketball-Reference Glossary.
Foundational work on isolating individual value from team results: Dean Oliver, Basketball on Paper.

C. B. Zakarian

C. B. Zakarian is an independent analyst who writes about what he can measure: ball sports and the player-run economies inside Roblox. He builds every model, chart, and calculator here himself from public data, shows the working, and never invents a number. When the data can't answer a question, he says so. On NBAAnalytic, that means NBA ratings, shot charts, and stat explainers built from the league's public data.

