
DOL Form 5500 — ERISA pension & welfare plan filings

Verified May 16, 2026 · tested with live www.askebsa.dol.gov fetch (F_5500_2022, 30 MB) + apex-redirect gotcha confirmed

pensions · filings · panel-data · free · no-api-key · dol

Form 5500 (US DOL / EBSA) is the free annual ERISA filing for every US pension and welfare plan with ≥100 participants. Schedule H gives plan-level asset breakdowns — including mutual-fund holdings — making it the go-to free source for retirement, household-finance, DC/401(k), and pension-as-shareholder research. It is what the ZeroPaper pipeline uses for retirement and labor-finance work.

Per-year, per-schedule ZIPs (no auth):

https://www.askebsa.dol.gov/FOIA%20Files/{year}/Latest/F_{NAME}_{year}_Latest.zip
NAME ∈ {5500, SCH_A, SCH_C, SCH_D, SCH_G, SCH_H, SCH_I, SCH_R, SCH_MB, SCH_SB}

import io
import zipfile

import pandas as pd
import requests

def get_5500(year, name="5500"):
    """Download one (year, schedule) ZIP and load its first member as a DataFrame."""
    url = (f"https://www.askebsa.dol.gov/FOIA%20Files/{year}"
           f"/Latest/F_{name}_{year}_Latest.zip")
    resp = requests.get(url, stream=True, timeout=120)
    resp.raise_for_status()  # fail on HTTP errors instead of unzipping an error page
    z = zipfile.ZipFile(io.BytesIO(resp.content))
    return pd.read_csv(z.open(z.namelist()[0]), low_memory=False)

main22 = get_5500(2022)                    # ~243K plan filings
sch_h = get_5500(2022, "SCH_H")            # plan financials / asset breakdown
joined = main22.merge(sch_h, on="ACK_ID")  # filing-level key; see gotchas
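
A plain `merge` silently drops filings without a Schedule H and silently duplicates rows if a key repeats. The sketch below is one way to make both cases visible. It assumes each file has at most one row per ACK_ID, which is worth checking rather than taking for granted. The helper name `attach_schedule` and the toy frames are hypothetical, and the values are made up.

```python
import pandas as pd

def attach_schedule(main, sched, key="ACK_ID"):
    """Left-join a schedule onto the main file.

    validate="one_to_one" raises pandas.errors.MergeError if the key
    repeats on either side; indicator=True adds a `_merge` column that
    shows which filings actually had a schedule row.
    """
    return main.merge(sched, on=key, how="left",
                      validate="one_to_one", indicator=True)

# Toy frames standing in for the real files (values are invented).
main = pd.DataFrame({"ACK_ID": ["A1", "A2", "A3"],
                     "EIN": [111, 222, 333],
                     "PN": [1, 1, 2]})
sch_h = pd.DataFrame({"ACK_ID": ["A1", "A3"],
                      "TOT_ASSETS_EOY_AMT": [5.0e6, 1.2e7]})

joined = attach_schedule(main, sch_h)
print(joined["_merge"].value_counts())
```

Rows marked `left_only` are filings with no matching schedule row; whether to keep or drop them depends on the research question.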

Gotchas: these are the reason to read this page rather than the DOL site. All were verified live on the date above (the 2022 main file is a ~30 MB ZIP, and the apex redirect described below was reproduced).

  • Hit www.askebsa.dol.gov, not the apex. The apex askebsa.dol.gov issues a 301 whose Location contains a literal space (FOIA Files, not FOIA%20Files) — confirmed live. That malformed header hangs urllib. Always request the www. host directly.
  • Join schedules on ACK_ID, not EIN. ACK_ID is the DOL filing ID and the correct key to attach a schedule to one filing. (EIN, PN) identifies a plan across years — use that for a plan-year panel, not for joining schedules.
  • “Latest” is a moving target. DOL revises files as late filings arrive. Row counts change month to month — snapshot the cache and report the access date, or your results aren’t reproducible.
  • Slow source. ~50 KB/s on urllib; use requests streaming (~2.5 MB/s). Expect a multi-minute first download per (year, schedule). Cache to data/form_5500/.
  • Big in memory. Schedule H is ~52 MB CSV/year; multi-year panels reach hundreds of MB — use a column subset (usecols=).
  • Schedule structure stabilized in 2009. Pre-2009 column names differ; don’t assume a 2022 schema for a 2005 file.
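
Several of the gotchas above (www host, streaming, caching, recording the access date) can be handled in one helper. This is a minimal sketch under stated assumptions, not the pipeline's actual code: the function names `build_url` and `cached_zip`, the `.part` temporary file, and the JSON sidecar layout are all hypothetical choices. Only the URL pattern and the `data/form_5500/` cache directory come from this page.

```python
import datetime as dt
import json
from pathlib import Path

# www host with the space already percent-encoded, per the URL pattern above.
WWW_BASE = "https://www.askebsa.dol.gov/FOIA%20Files"
CACHE_DIR = Path("data/form_5500")

def build_url(year, name="5500"):
    """Return the per-year, per-schedule ZIP URL on the www host."""
    return f"{WWW_BASE}/{year}/Latest/F_{name}_{year}_Latest.zip"

def cached_zip(year, name="5500", cache_dir=CACHE_DIR):
    """Stream the ZIP to disk once and record when it was fetched."""
    import requests  # third-party; imported here so build_url works without it

    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / f"F_{name}_{year}_Latest.zip"
    if target.exists():
        return target  # reuse the snapshot; delete the file to refresh

    url = build_url(year, name)
    tmp = target.with_suffix(".part")
    with requests.get(url, stream=True, timeout=120) as resp:
        resp.raise_for_status()
        with open(tmp, "wb") as fh:
            for chunk in resp.iter_content(chunk_size=1 << 20):
                fh.write(chunk)
    tmp.replace(target)  # rename only after a complete download

    # Sidecar with the access date, since "Latest" changes over time.
    meta = {"url": url, "accessed": dt.date.today().isoformat()}
    target.with_suffix(".json").write_text(json.dumps(meta, indent=2))
    return target

print(build_url(2022, "SCH_H"))
```

Writing to a temporary file and renaming at the end means an interrupted download never leaves a truncated ZIP in the cache.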
| Column | Meaning |
| --- | --- |
| ACK_ID | DOL filing ID; the join key for schedules |
| EIN, PN | Sponsor EIN + plan number; (EIN, PN) = plan across years |
| TOT_ASSETS_EOY_AMT | Total plan assets, end of year (Sch H) |
| INT_REG_INVST_CO_EOY_AMT | Mutual-fund holdings (registered inv. cos.) |
| INT_COMMON_TR_EOY_AMT | Common collective trusts (DC substitute) |
| EMPLR_CONTRIB_*_AMT / PARTCP_CONTRIB_*_AMT | Employer / participant contributions |
| PARTCP_LOANS_*_AMT | Participant loans outstanding |
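
To keep memory down, the columns listed above can be loaded as a subset straight from the ZIP. The sketch below is illustrative only: `read_subset` and `SCH_H_COLS` are hypothetical names, and the case-insensitive match is a precaution in case header capitalization differs between years, which this page does not confirm. The demo builds a tiny in-memory ZIP with invented values so it runs without a download.

```python
import io
import zipfile

import pandas as pd

SCH_H_COLS = ["ACK_ID", "TOT_ASSETS_EOY_AMT",
              "INT_REG_INVST_CO_EOY_AMT", "INT_COMMON_TR_EOY_AMT"]

def read_subset(zip_source, columns):
    """Read only `columns` from the first member of a ZIP (path or file-like)."""
    wanted = {c.upper() for c in columns}
    with zipfile.ZipFile(zip_source) as z:
        with z.open(z.namelist()[0]) as fh:
            # A callable usecols skips absent columns instead of raising.
            return pd.read_csv(fh, usecols=lambda c: c.upper() in wanted,
                               low_memory=False)

# Toy ZIP standing in for a real Schedule H file (values are invented).
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr(
        "f_sch_h_toy.csv",
        "ACK_ID,TOT_ASSETS_EOY_AMT,INT_REG_INVST_CO_EOY_AMT,"
        "INT_COMMON_TR_EOY_AMT,OTHER_COL\n"
        "A1,100,60,20,x\n",
    )
buf.seek(0)

df = read_subset(buf, SCH_H_COLS)
print(df.columns.tolist())
```

Because missing columns are skipped silently, it is worth comparing `df.columns` against the requested list before relying on the result.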

Pair BOY/EOY columns to construct flows.

  • Plan-year panel: stack Schedule H across years, key on (EIN, PN, year).
  • Mutual-fund exposure share: INT_REG_INVST_CO_EOY_AMT / TOT_ASSETS_EOY_AMT.
  • Implied flows: EOY − BOY·(1+r_t) with a benchmarked plan return.
  • Sponsor link: match sponsor EIN to Compustat for firm characteristics.
  • Always state schedule, year, and access date (DOL revises “Latest”).
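
The panel, exposure-share, and implied-flow recipes above can be sketched as follows. Treat this as an illustration under assumptions rather than a reference implementation: `build_panel`, `add_share_and_flow`, and `returns_by_year` are hypothetical names, `TOT_ASSETS_BOY_AMT` is assumed by analogy with the EOY column in the table, the frames are assumed to already carry `EIN` and `PN` under those names, and the benchmark returns and dollar values are invented.

```python
import pandas as pd

KEYS = ["EIN", "PN", "year"]

def build_panel(frames_by_year):
    """Stack per-year Schedule H frames and tag each row with its year."""
    parts = [frames_by_year[yr].assign(year=yr) for yr in sorted(frames_by_year)]
    panel = pd.concat(parts, ignore_index=True)
    if panel.duplicated(KEYS).any():
        raise ValueError("duplicate (EIN, PN, year) rows; resolve them first")
    return panel

def add_share_and_flow(panel, returns_by_year):
    """Add mutual-fund share and implied flow EOY - BOY * (1 + r_t)."""
    out = panel.sort_values(KEYS).reset_index(drop=True)
    total = out["TOT_ASSETS_EOY_AMT"].replace(0, float("nan"))  # avoid x/0
    out["mf_share"] = out["INT_REG_INVST_CO_EOY_AMT"] / total
    r = out["year"].map(returns_by_year)  # benchmark return for each year
    out["implied_flow"] = (out["TOT_ASSETS_EOY_AMT"]
                           - out["TOT_ASSETS_BOY_AMT"] * (1 + r))
    return out

# Toy data: one plan observed in two years (values are invented).
frames = {
    2021: pd.DataFrame({"EIN": [111], "PN": [1],
                        "TOT_ASSETS_BOY_AMT": [100.0],
                        "TOT_ASSETS_EOY_AMT": [120.0],
                        "INT_REG_INVST_CO_EOY_AMT": [60.0]}),
    2022: pd.DataFrame({"EIN": [111], "PN": [1],
                        "TOT_ASSETS_BOY_AMT": [120.0],
                        "TOT_ASSETS_EOY_AMT": [150.0],
                        "INT_REG_INVST_CO_EOY_AMT": [90.0]}),
}
returns_by_year = {2021: 0.10, 2022: 0.05}  # hypothetical benchmark returns

result = add_share_and_flow(build_panel(frames), returns_by_year)
print(result[KEYS + ["mf_share", "implied_flow"]])
```

A single benchmark return per year is the crudest version of the flow calculation; a plan-specific return built from asset-class weights would slot into the same `r` series.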

U.S. Department of Labor, Employee Benefits Security Administration, Form 5500 [Schedule, plan year], public-use research files; https://www.dol.gov/agencies/ebsa/…/form-5500-datasets, accessed YYYY-MM-DD.
