Every basketball analysis you've ever read started with someone getting the numbers from somewhere. That part rarely makes the article, which is a shame, because choosing the wrong data source is how you end up confidently wrong with last season's totals. I've pulled NBA data from most of the free options at one point or another — some of them at two in the morning while fighting a rate limit — so here is the field guide I wish I'd had: ranked, with the warts.

How I'm ranking these

"Best" depends on what you're doing, so I'm grading each source on four things: coverage (how much it actually has), reliability (is it current, does it break), ease (how fast you get from "I have a question" to "I have a dataframe"), and cost. I weight free-and-current heavily, because a beautiful dataset that stopped updating in 2023 is a museum piece, not a tool. The order below is roughly how often I personally reach for each one.

1. nba_api — the one you should learn first

The nba_api Python package is a wrapper around the same stats endpoints that power NBA.com. It is free, needs no API key, no signup, no credit card held hostage — and it exposes basically everything the league publishes: player and team stats, box scores, shot charts, hustle stats, tracking data, play-by-play. If a number lives on stats.nba.com, you can get it here, usually in one function call.

The catch, and it's the only real one, is that stats.nba.com rate-limits aggressively and has never documented the rules. Hammer it with rapid-fire requests and you'll get timeouts or empty responses, so you learn to add a polite delay between calls and cache anything you've already fetched. Live with that one quirk and nothing else on this list competes on raw power. If you're starting from zero, I wrote a whole walkthrough on getting it running: Getting started with nba_api.
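Here's a minimal sketch of that delay-and-cache habit in plain Python, with no dependencies. `PoliteFetcher` is a hypothetical helper of my own, and the one-second default gap is a guess, since stats.nba.com doesn't publish its limits; raise it if you still see timeouts. The clock and sleep functions are injectable only so the logic can be tested without actually waiting.

```python
import time
from typing import Any, Callable, Dict, Hashable, Optional


class PoliteFetcher:
    """Wrap a fetch function with a minimum gap between live calls plus an
    in-memory cache, so the same question never hits the server twice."""

    def __init__(
        self,
        fetch: Callable[[Hashable], Any],
        min_interval: float = 1.0,  # seconds between live calls; a guess, not a documented limit
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._fetch = fetch
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last_call: Optional[float] = None
        self._cache: Dict[Hashable, Any] = {}
        self.live_calls = 0  # how many requests actually went out

    def get(self, key: Hashable) -> Any:
        # A cached answer costs no request at all.
        if key in self._cache:
            return self._cache[key]
        # Sleep off whatever is left of the interval since the last live call finished.
        if self._last_call is not None:
            elapsed = self._clock() - self._last_call
            if elapsed < self._min_interval:
                self._sleep(self._min_interval - elapsed)
        result = self._fetch(key)
        self._last_call = self._clock()
        self.live_calls += 1
        self._cache[key] = result
        return result
```

To use it with nba_api you'd pass in a function that makes one endpoint call, something like `PoliteFetcher(lambda season: leagueleaders.LeagueLeaders(season=season).get_data_frames()[0])`, assuming that endpoint fits your question. The cache lives in memory and vanishes when the script exits; for anything you'll rerun, write results to disk instead.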

Here's the pitch in one image. I asked for the current assist leaders — a single endpoint call, no scraping, no spreadsheet download — and got this back:

Figure: bar chart of the 2025-26 assist leaders, pulled in one call. Nikola Jokić leads at 10.7 per game, with Cade Cunningham at 9.9 and Luka Dončić at 8.3. Source: NBA Stats API via nba_api · 2025-26 · retrieved June 2026.

That a center leads the league in assists is its own conversation, but the point here is the plumbing: Jokić's 10.7 dimes, Cade Cunningham's 9.9, and Luka Dončić's 8.3 all arrived from the same request, current as of this week, for the cost of zero dollars and about three lines of code.
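Roughly what that call looks like, as a sketch: it assumes nba_api's `LeagueLeaders` endpoint with `per_mode48="PerGame"` and `stat_category_abbreviation="AST"`, which is how I'd expect to ask for per-game assist leaders. It isn't the script in scripts/free_data_sources_demo.py, and parameter names can drift between nba_api versions, so check the endpoint's docs if it errors. It needs `pip install nba_api` and a network connection.

```python
def fetch_assist_leaders(season: str = "2025-26"):
    """Return a pandas DataFrame of per-game assist leaders for one season."""
    # Imported inside the function so this file loads even without nba_api installed.
    from nba_api.stats.endpoints import leagueleaders

    endpoint = leagueleaders.LeagueLeaders(
        season=season,                     # NBA.com season format, e.g. "2025-26"
        per_mode48="PerGame",              # per-game rates rather than season totals
        stat_category_abbreviation="AST",  # rank the leaderboard by assists
    )
    # The endpoint returns one result set; take it as a DataFrame.
    return endpoint.get_data_frames()[0]
```

Then `leaders = fetch_assist_leaders()` and something like `print(leaders[["PLAYER", "TEAM", "AST"]].head(3))`, assuming the result set uses those column names; `leaders.columns` will tell you for sure.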


2. Basketball-Reference — the historical record

If nba_api is the present, Basketball-Reference is the entire past. It is the canonical home of derived metrics the NBA doesn't publish itself — Win Shares, Box Plus/Minus, VORP, PER — going back decades, plus play-by-play, awards, transactions, and roster history that the official API simply doesn't carry. For any question with the word "all-time" in it, this is where I go.

The honest downsides are about access, not data. There's no official free API, so people scrape the HTML. The site rate-limits automated traffic and has terms of use covering it, which you should actually read before you point a bot at it. Be a courteous scraper: throttle hard, cache everything, and don't redistribute their data wholesale. Treated with respect, it's irreplaceable.
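A minimal sketch of what "courteous" can look like in code, under some loud assumptions: you've already downloaded the site's robots.txt yourself, `download` is a placeholder for whatever HTTP call you prefer (say, `requests.get(url).text`), and the user-agent string and three-second fallback delay are hypothetical values of mine, not anything Basketball-Reference publishes. It checks robots.txt with Python's standard `urllib.robotparser`, honors a `Crawl-delay` if one is declared, and caches each page to disk so you never request it twice. None of this replaces reading the terms of use.

```python
import hashlib
import time
import urllib.robotparser
from pathlib import Path
from typing import Callable, Iterable, Optional


def make_robot_checker(robots_lines: Iterable[str]) -> urllib.robotparser.RobotFileParser:
    """Build a robots.txt parser from text you've already downloaded."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(list(robots_lines))
    return parser


def cache_path(cache_dir: Path, url: str) -> Path:
    """Map a URL to a stable file name so each page is fetched at most once."""
    return cache_dir / (hashlib.sha256(url.encode("utf-8")).hexdigest() + ".html")


def polite_get(
    url: str,
    download: Callable[[str], str],  # placeholder for your HTTP client
    robots: urllib.robotparser.RobotFileParser,
    cache_dir: Path,
    user_agent: str = "my-research-bot",  # hypothetical; identify yourself honestly
    default_delay: float = 3.0,           # hypothetical fallback, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Return page HTML, honoring robots.txt, caching to disk, and sleeping before live requests."""
    if not robots.can_fetch(user_agent, url):
        return None  # disallowed: don't fetch it at all
    path = cache_path(cache_dir, url)
    if path.exists():
        return path.read_text(encoding="utf-8")  # cache hit: no request, no delay
    # Prefer the site's own Crawl-delay if it declares one; otherwise use our default.
    delay = robots.crawl_delay(user_agent) or default_delay
    sleep(delay)
    html = download(url)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(html, encoding="utf-8")
    return html
```

The disk cache is the part that matters most: once a page is saved, rerunning your analysis costs the site nothing.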

3. NBA.com/stats, pbpstats, and the rest

Below the top two, the field specializes. A quick rundown of what each is genuinely good for:

  • NBA.com/stats (the UI) — the same data as nba_api but in a browser. Perfect for eyeballing a leaderboard, filtering by season type, or sanity-checking a number you scripted. Not a pipeline; you can't easily automate the front end. Think of it as the read-only window onto the source nba_api drinks from.
  • pbpstats — my go-to when the question is about lineups and possessions rather than box-score totals. It does the unglamorous, fiddly work of parsing play-by-play into clean on/off and five-man-unit data, which is genuinely painful to roll yourself. Narrower than nba_api, but excellent at its specialty.
  • Kaggle datasets — wonderfully convenient: someone has already scraped, cleaned, and CSV'd huge swaths of NBA history, and you just download it. The trap is staleness. A Kaggle dataset is a snapshot frozen whenever its uploader last bothered, so it's great for a finished historical season and dangerous for anything current. Always check the last-updated date before you trust it.
  • hoopR / sportsdataverse — if you live in R instead of Python, the hoopR package (part of the sportsdataverse family) is the equivalent of nba_api: tidy access to NBA and men's college data, including play-by-play, built for the tidyverse. Same spirit, different language.
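The Kaggle staleness check is easy to automate. This is a sketch assuming the CSV has an ISO-formatted date column; `GAME_DATE` is a hypothetical name (every uploader picks their own), and the 30-day threshold is an arbitrary default of mine.

```python
import csv
from datetime import date, timedelta
from typing import Optional, TextIO


def latest_date(csv_file: TextIO, date_column: str = "GAME_DATE") -> Optional[date]:
    """Return the most recent ISO date (YYYY-MM-DD) found in one column of a CSV."""
    latest = None
    for row in csv.DictReader(csv_file):
        value = (row.get(date_column) or "").strip()
        if not value:
            continue  # skip blank cells rather than failing
        d = date.fromisoformat(value[:10])  # tolerate a trailing time, e.g. "2024-04-14T00:00:00"
        if latest is None or d > latest:
            latest = d
    return latest


def is_stale(latest: Optional[date], today: date, max_age_days: int = 30) -> bool:
    """Flag a dataset whose newest row is older than max_age_days, or that has no dates at all."""
    return latest is None or (today - latest) > timedelta(days=max_age_days)
```

Treat a `True` as a prompt, not a verdict: a file whose newest row closes out a finished season is fine for that season, and anything checked in the offseason will look stale for months. It only disqualifies the file for questions about the current season.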

The partly-paid tier: Cleaning the Glass & Dunks & Threes

Two sources earn a mention even though they're not fully free, because the analysis is worth it. Cleaning the Glass is the gold standard for context — it strips out garbage time and breaks the game into possessions and play types — but the good stuff sits behind a subscription, with a limited free tier. Dunks & Threes does similar high-quality work, notably on its EPM all-in-one metric, again partly gated. Neither is a bulk-download data source; they're for reading and learning, not for feeding your own scripts. I pay for one of them and consider it cheap.

The short version

If you only remember one row of this table, make it the first one. Here's the whole field at a glance:

Free (and partly-free) basketball data sources at a glance.

| Source | Best for | Cost | Watch out for |
|---|---|---|---|
| nba_api | Everything current; official stats, no key | Free | stats.nba.com rate limits |
| Basketball-Reference | History, Win Shares, BPM, VORP | Free | Scraping terms & rate limits |
| NBA.com/stats (UI) | Eyeballing & sanity checks | Free | Not automatable |
| pbpstats | Lineups & play-by-play possessions | Free | Narrow scope |
| Kaggle datasets | Pre-cleaned historical data in bulk | Free | Often stale |
| hoopR / sportsdataverse | nba_api-style access, but in R | Free | R-only ecosystem |
| Cleaning the Glass / Dunks & Threes | Context, play types, EPM | Partly paid | Best data is gated |

My actual workflow, if you want to copy it: nba_api for anything happening this season, Basketball-Reference for anything historical or for the derived metrics the league won't publish, pbpstats when I need lineup data, and a subscription read for context when I'm trying to understand why a number is what it is. Start with the first one. It's free, it has no gatekeeper, and the only thing it asks of you is patience between requests — which, after a few rate-limit timeouts, you will learn.

Sources & Further Reading

  • Live assist data and the demo chart: NBA.com/stats, pulled via the nba_api Python package (2025-26, retrieved June 2026). The script is in scripts/free_data_sources_demo.py.
  • Historical stats and derived metrics (Win Shares, BPM, VORP): Basketball-Reference.
  • Also referenced in prose: pbpstats (lineups & play-by-play), Kaggle datasets, hoopR / sportsdataverse (R), and the partly-paid Cleaning the Glass and Dunks & Threes.

NBAAnalytic

Independent basketball analyst writing data-first NBA coverage. Every stat here is pulled from public sources with the scripts published alongside it.