Every chart on this site starts the same dull, glorious way: a Python script asks stats.nba.com a question and writes the answer to disk. The package that does the asking is nba_api, and it is the single fastest way to go from "I have an opinion about the scoring race" to "I have the actual numbers." This is the tutorial I wish I'd had — install it, make one real call, understand what comes back, and learn the etiquette that keeps the API from quietly blocking you.

Install it (one line)

There is no account, no API key, no signup email that takes three business days. The NBA's stats site exposes a sprawling, undocumented JSON API, and nba_api is a community package that wraps the useful corners of it in tidy Python classes. You install it the way you install anything:

Shell
pip install nba_api

That pulls in pandas and requests as dependencies, which are the only other pieces you need for this whole tutorial. If you work in virtual environments — and you should — activate yours first so this doesn't land in your system Python. That's the entire setup. We can go ask the league a question now.

Your first call

Here's the question: who are this season's per-game scoring leaders? The endpoint that answers it is LeagueLeaders. You hand it a season, a season type, a stat category, and a "per mode," and it hands back a table. This snippet is the real thing — it's a faithful, paste-ready version of the script that generated the chart below:

Python
from nba_api.stats.endpoints import leagueleaders

# Ask stats.nba.com for this season's per-game scoring leaders.
leaders = leagueleaders.LeagueLeaders(
    season="2025-26",
    season_type_all_star="Regular Season",
    stat_category_abbreviation="PTS",
    per_mode48="PerGame",
    timeout=30,
)

# Endpoints can return several tables; the leaderboard is the first one.
df = leaders.get_data_frames()[0]

# Sort by points and look at the top of the race.
top = df.sort_values("PTS", ascending=False).head(10)
print(top[["RANK", "PLAYER", "TEAM", "GP", "PTS"]])

Run that and the top line that prints is Luka Dončić, now in a Lakers uniform, at 33.5 points per game across 64 games. Shai Gilgeous-Alexander is right behind at 31.1, and Anthony Edwards rounds out the podium at 28.8. Ten lines of Python, the live scoring race. The per_mode48 argument name is a piece of NBA-API archaeology: this endpoint's per-mode setting accepts a per-48-minutes option alongside totals and per-game, and the keyword is named after it. "PerGame" is what you want here, and the awkward keyword is a good early lesson: this API was built for the NBA's own website, not for you, and the parameter names show it.

Horizontal bar chart of the 2025-26 NBA per-game scoring leaders, led by Luka Dončić at 33.5 points, Shai Gilgeous-Alexander at 31.1, and Anthony Edwards at 28.8.
The output of the LeagueLeaders call above — this season's top ten scorers, per game. Source: NBA Stats API via nba_api · 2025-26 · retrieved June 2026.

What a DataFrame is

Notice that the call ends in .get_data_frames()[0], and that what we sorted was a thing called df. That object is a pandas DataFrame, and if you're new to data work in Python it's the one concept worth slowing down for. A DataFrame is a spreadsheet that lives in memory: labeled columns (PLAYER, TEAM, PTS, GP) and one row per record — here, one row per player. Everything you'd do in Excel, you do with a method call instead.
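To make the "spreadsheet in memory" idea concrete, here is a minimal sketch that builds a tiny DataFrame by hand instead of pulling it from the API. The three rows are the top three scorers quoted in this article; the variable name df is just the usual convention, and nothing here touches the network.

```python
import pandas as pd

# Hand-built stand-in for an API result: the top three scorers quoted in this article.
df = pd.DataFrame(
    {
        "PLAYER": ["Luka Dončić", "Shai Gilgeous-Alexander", "Anthony Edwards"],
        "TEAM": ["LAL", "OKC", "MIN"],
        "PTS": [33.5, 31.1, 28.8],
    }
)

print(df.shape)             # (number of rows, number of columns)
print(list(df.columns))     # the column labels
print(df.loc[0, "PLAYER"])  # a single cell: row 0, column PLAYER
```

Each dictionary key becomes a labeled column and each list position becomes a row, which is the same shape the endpoints hand back, only with far more columns.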

The reason this matters is that almost every nba_api endpoint returns a DataFrame, so the handful of moves you learn here transfer everywhere. A few you'll use constantly:

Python
# Pick columns:
df[["PLAYER", "PTS"]]

# Filter rows — only players averaging 25+:
df[df["PTS"] >= 25]

# Sort, then take the top of the list:
df.sort_values("PTS", ascending=False).head(10)

# Compute a new column from existing ones (total points scored):
df["TOTAL_PTS"] = df["PTS"] * df["GP"]

An endpoint can return more than one table, which is why get_data_frames() gives you a list and we index [0]. For LeagueLeaders the first table is the leaderboard and that's all there is; for richer endpoints, index 1, 2, and 3 might hold splits, summaries, or supporting rows. When in doubt, print len(leaders.get_data_frames()) and inspect each one.
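One way to do that inspection is a small helper that reports each table's position, row count, and column names. This is a sketch, not part of nba_api: summarize_tables is a hypothetical name, and the two frames below are hand-built placeholders standing in for whatever get_data_frames() returns.

```python
import pandas as pd

def summarize_tables(frames):
    """Return one (index, row_count, column_names) tuple per DataFrame."""
    return [(i, len(frame), list(frame.columns)) for i, frame in enumerate(frames)]

# Placeholder tables standing in for leaders.get_data_frames(); not real NBA data.
frames = [
    pd.DataFrame({"PLAYER": ["A", "B"], "PTS": [30.0, 28.0]}),
    pd.DataFrame({"SPLIT": ["Home"], "PTS": [29.0]}),
]

for index, rows, columns in summarize_tables(frames):
    print(index, rows, columns)
```

With a real endpoint you would pass leaders.get_data_frames() in place of frames and pick the index whose columns match what you are after.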

Picking the right endpoint

The hard part of nba_api isn't writing the call — it's knowing which of the dozens of endpoints answers your question. The mental model that helps me: ask whether you want a leaderboard, a full-league table, or one entity's detail.

  • One ranked stat, league-wide? LeagueLeaders — exactly what we used. Great for "top scorers / rebounders / assisters."
  • Every player, many stats at once? LeagueDashPlayerStats, with a measure type of Base, Advanced, Usage, or Scoring (in nba_api the keyword is measure_type_detailed_defense). This is the workhorse behind most of the efficiency pieces here.
  • Every team? LeagueDashTeamStats — same idea, team level. It's where the Four Factors numbers come from.
  • One game's box score, or one player's shot locations? BoxScoreTraditionalV2 (newer nba_api releases also ship a BoxScoreTraditionalV3) and ShotChartDetail; the latter is the backbone of the shot-chart tutorial.

Every one of these returns DataFrames, so the skills from the last section carry straight over; you're really only learning new parameter names. The package's GitHub README lists all of them, and if you'd rather not learn the NBA's API at all, my roundup of free basketball data sources covers the friendlier alternatives.
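As an illustration of "only learning new parameter names," here is a hedged sketch of a full-league pull. The helper names player_stats_kwargs and fetch_player_stats are hypothetical. The keyword names follow nba_api's LeagueDashPlayerStats as I understand it, where the measure type is passed as measure_type_detailed_defense; check them against the version you have installed. The allowed set is limited to the four measure types named above, which is a restriction of this sketch and not of the API.

```python
def player_stats_kwargs(season, measure_type="Base", per_mode="PerGame"):
    """Build keyword arguments for nba_api's LeagueDashPlayerStats."""
    allowed = {"Base", "Advanced", "Usage", "Scoring"}  # only the types this article names
    if measure_type not in allowed:
        raise ValueError(f"measure_type must be one of {sorted(allowed)}")
    return {
        "season": season,
        "season_type_all_star": "Regular Season",
        "per_mode_detailed": per_mode,
        "measure_type_detailed_defense": measure_type,
        "timeout": 30,
    }

def fetch_player_stats(season, measure_type="Base"):
    """Hit the network once and return the first table as a DataFrame."""
    # Imported lazily so the kwargs helper works even without nba_api installed.
    from nba_api.stats.endpoints import leaguedashplayerstats

    endpoint = leaguedashplayerstats.LeagueDashPlayerStats(
        **player_stats_kwargs(season, measure_type)
    )
    return endpoint.get_data_frames()[0]

# Building the arguments does not touch the network.
print(player_stats_kwargs("2025-26", "Advanced"))
```

Calling fetch_player_stats("2025-26", "Advanced") would make the actual request. Keeping the argument-building separate means a typo in a measure type fails before any request goes out.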

33.5: Luka Dončić's league-leading points per game this season, his first full one with the Lakers, and the top line returned by a single LeagueLeaders call.

The numbers it returns

To make the leaderboard concrete, here is the exact top ten the call above produces, copied straight from the DataFrame. This is also a tidy demonstration of what "one row per player, labeled columns" looks like once it's on a page:

2025-26 per-game scoring leaders. Source: NBA Stats API via nba_api (LeagueLeaders, PerGame), retrieved June 2026.

Rank  Player                   Team  PPG
1     Luka Dončić              LAL   33.5
2     Shai Gilgeous-Alexander  OKC   31.1
3     Anthony Edwards          MIN   28.8
4     Jaylen Brown             BOS   28.7
5     Tyrese Maxey             PHI   28.3
6     Kawhi Leonard            LAC   27.9
7     Donovan Mitchell         CLE   27.9
8     Nikola Jokić             DEN   27.7
9     Devin Booker             PHX   26.1
10    Jalen Brunson            NYK   26.0

Mitchell and Leonard are tied at 27.9 to the tenth of a point, which is the kind of thing you only notice when you have the table instead of a vibe. That's the entire pitch for learning this: opinions about the scoring race are cheap, and the data is now ten lines away.
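You can also let pandas find that tie for you. The sketch below rebuilds the published top ten by hand (values copied from the table above, so no network call) and flags every player who shares a rounded scoring average with someone else. The variable names top and tied are illustrative.

```python
import pandas as pd

# The top ten as published in this article's table.
top = pd.DataFrame(
    {
        "PLAYER": [
            "Luka Dončić", "Shai Gilgeous-Alexander", "Anthony Edwards",
            "Jaylen Brown", "Tyrese Maxey", "Kawhi Leonard",
            "Donovan Mitchell", "Nikola Jokić", "Devin Booker", "Jalen Brunson",
        ],
        "PTS": [33.5, 31.1, 28.8, 28.7, 28.3, 27.9, 27.9, 27.7, 26.1, 26.0],
    }
)

# Round to the tenth first, then mark ties. keep=False flags every member
# of a tie, not just the second and later occurrences.
tied = top[top["PTS"].round(1).duplicated(keep=False)]
print(tied)
```

On a live pull the API may carry more decimal places than the table shows, which is why the rounding step comes before the comparison.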

Rate-limit etiquette (read this part)

Here is where most first-time nba_api users get themselves in trouble. The stats endpoints are not a public, supported API — they're the NBA's own website plumbing, and they will time out, hang, or temporarily stop answering you if you're rude. Three habits keep you on the polite side, and they're baked into every script on this site:

1. Send real headers and a generous timeout. Plain requests to stats.nba.com often hang forever because the server expects browser-like headers before it answers. nba_api sends a default set for you, but defaults can go stale, so it pays to know how to pass a real User-Agent and the NBA's own request headers yourself. Always set a timeout so a stalled call fails fast instead of freezing your script:

Python
from nba_api.stats.endpoints import leagueleaders

STATS_HEADERS = {
    "Host": "stats.nba.com",
    "User-Agent": ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                   "AppleWebKit/537.36 (KHTML, like Gecko) "
                   "Chrome/124.0 Safari/537.36"),
    "Accept": "application/json, text/plain, */*",
    "Accept-Language": "en-US,en;q=0.9",
    "Referer": "https://www.nba.com/",
    "Origin": "https://www.nba.com",
    "x-nba-stats-origin": "stats",
    "x-nba-stats-token": "true",
    "Connection": "keep-alive",
}

# nba_api accepts a `headers=` argument on every endpoint;
# `timeout=30` means "give up after 30 seconds," not "wait 30 seconds."
leaders = leagueleaders.LeagueLeaders(
    season="2025-26",
    headers=STATS_HEADERS,
    timeout=30,
)

2. Don't hammer the server. If you're looping over players or games, put a short pause — half a second to a second — between calls, and wrap each call in a retry with exponential backoff so a single hiccup doesn't crash a long pull. A loop that fires a hundred requests as fast as Python can manage is exactly what gets you a temporary block. Sleep is cheap; getting rate-limited in the middle of a run is not.
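A minimal sketch of that pause-and-retry habit follows. It is not the helper used on this site: call_with_backoff and its parameters are hypothetical names, the delays are illustrative, and catching a bare Exception is a simplification you would narrow to timeout and connection errors in a real script. The demo uses a fake fetch function so it runs without the network.

```python
import time

def call_with_backoff(fetch, retries=4, base_delay=1.0, pause=0.6, sleep=time.sleep):
    """Call fetch(); on failure wait base_delay * 2**attempt, then try again.

    After a success, pause briefly so the next call in a loop is spaced out.
    """
    for attempt in range(retries):
        try:
            result = fetch()
        except Exception:
            if attempt == retries - 1:
                raise                         # out of attempts: surface the error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
        else:
            sleep(pause)                      # polite gap before the next request
            return result

# Demo with a fake fetch that fails twice, then succeeds (no network involved).
attempts = {"count": 0}

def flaky_fetch():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise TimeoutError("simulated stall")
    return "ok"

waits = []  # record the requested sleeps instead of actually sleeping
print(call_with_backoff(flaky_fetch, sleep=waits.append), waits)
```

In a real loop you would pass something like lambda: leagueleaders.LeagueLeaders(season="2025-26", timeout=30) as fetch and leave sleep at its default. Passing the sleep function in as an argument is only there so the behavior can be checked without waiting.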

3. Cache everything. This is the most important habit and the one people skip. The scoring leaders don't change between 2:00 and 2:05, so there is no reason to ask twice. Pull once, write the raw response to disk, and on every subsequent run read from the file instead of the network:

Python
import io
import time
from pathlib import Path

import pandas as pd
from nba_api.stats.endpoints import leagueleaders

CACHE = Path("data/raw")
CACHE.mkdir(parents=True, exist_ok=True)

def cached_df(key, producer, sleep_after=0.7):
    """Pull a DataFrame once, then serve it from disk forever after.
    Delete the cached file to force a fresh pull."""
    path = CACHE / f"{key}.json"
    if path.exists():
        return pd.read_json(io.StringIO(path.read_text(encoding="utf-8")),
                            orient="records")
    df = producer()                      # the only line that hits the network
    path.write_text(df.to_json(orient="records"), encoding="utf-8")
    time.sleep(sleep_after)              # be polite on the way out
    return df

leaders = cached_df(
    "leagueleaders_PTS_2025-26",
    lambda: leagueleaders.LeagueLeaders(
        season="2025-26", season_type_all_star="Regular Season",
        stat_category_abbreviation="PTS", per_mode48="PerGame", timeout=30,
    ).get_data_frames()[0],
)

Caching does three good things at once: it makes your work reproducible (the numbers don't shift under you between runs), it makes development fast (no waiting on the network every time you tweak a chart), and it makes you a good citizen (one request instead of a thousand). The full version of this pattern — headers, retries with backoff, on-disk caching, and a manifest that records where every number came from — is the helper that drives every script in this project. The specific script behind this article is scripts/nba_api_getting_started.py, and it does nothing more exotic than what you've read here.
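The manifest idea can be sketched in a few lines of standard-library Python. This is an assumption about what such a file might hold, not the project's actual helper: record_in_manifest, the manifest.json file name, and the entry fields are all hypothetical. The demo writes to a throwaway directory and hashes a placeholder payload.

```python
import hashlib
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path

def record_in_manifest(manifest_path, key, endpoint, params, payload_text):
    """Add or update one manifest entry describing a cached pull."""
    manifest_path = Path(manifest_path)
    manifest = {}
    if manifest_path.exists():
        manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    manifest[key] = {
        "endpoint": endpoint,
        "params": params,
        "retrieved_at": datetime.now(timezone.utc).isoformat(),
        # Hash of the cached text, so a later edit to the file is detectable.
        "sha256": hashlib.sha256(payload_text.encode("utf-8")).hexdigest(),
    }
    manifest_path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return manifest[key]

# Demo in a throwaway directory; the payload is a placeholder, not real data.
with tempfile.TemporaryDirectory() as tmp:
    entry = record_in_manifest(
        Path(tmp) / "manifest.json",
        key="leagueleaders_PTS_2025-26",
        endpoint="LeagueLeaders",
        params={"season": "2025-26", "stat_category_abbreviation": "PTS"},
        payload_text="[]",
    )
    print(sorted(entry))
```

Calling something like this right after the cache file is written gives every number a recorded endpoint, parameter set, and retrieval time.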

Where to go next

You now have the whole loop: install the package, call an endpoint, get a DataFrame, sort it, and do it without annoying the server. That's genuinely 80% of what I do to build the data pieces on this site. The other 20% is knowing which stat to ask for, which is a basketball question, not a Python one. Swap "PTS" for "REB" or "AST" in the call above and you've got the rebounding or assist leaders; reach for LeagueDashPlayerStats with measure_type_detailed_defense="Advanced" and you've got the raw material for an efficiency study. The API is undocumented and a little cranky, but it's free, it's complete, and it's ten lines away. Go ask it something.

Sources & Further Reading

  • Data source: NBA.com/stats — the (undocumented) stats endpoints this whole tutorial talks to.
  • The Python package: nba_api on GitHub, whose README is the closest thing to a full endpoint reference.
  • The runnable code for this article lives in scripts/nba_api_getting_started.py (LeagueLeaders, PTS, PerGame; 2025-26, retrieved June 2026).

NBAAnalytic

Independent basketball analyst writing data-first NBA coverage. Every stat here is pulled from public sources with the scripts published alongside it.