
Open Source Asset Pricing (Chen-Zimmermann)

Verified May 16, 2026 · tested with openassetpricing.com + GitHub reachable (pkg/data per docs, not bulk-fetched)

asset-pricing · anomalies · equities · panel-data · free · academic

Open Source Asset Pricing (Chen & Zimmermann 2022, Critical Finance Review) is the free, reproducible cross-sectional anomaly dataset: 212 firm-level signals plus pre-computed long-short and decile portfolio returns, with the CRSP/Compustat merge already done. It is what the ZeroPaper pipeline uses to test whether a model’s mechanism maps to a known anomaly. This page is the distilled recipe.

Option 1 — openassetpricing package (preferred)

```python
# pip install openassetpricing
from openassetpricing import OpenAP

ap = OpenAP()

# Firm-level signals: the predictor argument MUST be a list (see gotchas)
sig = ap.dl_signal("pandas", ["BM", "Mom12m", "AssetGrowth"])
# → permno, yyyymm, BM, Mom12m, AssetGrowth

port = ap.dl_port("op", "pandas", ["BM"])  # original-paper portfolios
ap.list_port()                             # op, deciles_ew/vw, quintiles_*
docs = ap.dl_signal_doc("pandas")          # all 212 with paper refs
```

A ~1.6 GB zipped wide CSV of all predictors is at https://www.openassetpricing.com/data/. Download once, cache under data/, never re-pull.
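The "download once, cache, never re-pull" advice can be sketched as a small disk cache. This is a minimal sketch, not part of the openassetpricing package: `cached` and the file path are hypothetical names, and pickle is only one convenient storage format (Parquet would work equally well for DataFrames).

```python
import pickle
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def cached(path: Path, fetch: Callable[[], T]) -> T:
    """Return the object stored at `path`; call `fetch` only on a cache miss.

    `cached` is a hypothetical helper for illustration, not a library function.
    """
    if path.exists():
        with path.open("rb") as fh:
            return pickle.load(fh)
    result = fetch()
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("wb") as fh:
        pickle.dump(result, fh)
    return result


# Intended use (needs network access, so it is left commented out here):
# sig = cached(Path("data/signals_BM.pkl"),
#              lambda: ap.dl_signal("pandas", ["BM"]))
```

Putting the release tag in the cache file name is one way to keep results from different releases from being mixed silently.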

The gotchas below are the reason to read this page rather than the repo README. The site and GitHub repository were confirmed reachable on the date above; the package and data interface are described as documented and were not bulk-fetched here, because the full set is ~1.6 GB.

  • Predictor arguments must be LISTS, not strings. ["BM"], never "BM". Passing a string is the #1 failure and the error is not obvious.
  • The bulk download is ~1.6 GB. dl_all_signals / the direct CSV will blow memory and time if pulled naively. Request only the signals you need; cache aggressively.
  • The CRSP/Compustat merge is already done. Do not re-merge — signals are delivered at permno × yyyymm. Re-merging double-counts and misaligns.
  • yyyymm is an integer, not a date — convert before joining to returns.
  • Releases are versioned and periodic. State the release you used (latest tags have been updated as recently as late 2025); results drift across releases.
  • Pre-built portfolios beat hand-rolled. Use dl_port deciles for long-short spreads rather than re-sorting — it matches the paper’s methodology and avoids look-ahead in the sort.
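Two of the gotchas above, the list-only predictor argument and the integer yyyymm, lend themselves to small guards. The following is a minimal sketch with hypothetical helper names (`as_predictor_list`, `yyyymm_to_month_end`) and a made-up returns table. It assumes the returns are dated on the last trading day of the month, so both sides are snapped to calendar month-end before joining. It does not address signal-to-return timing (lagging), which remains a research-design choice.

```python
import pandas as pd


def as_predictor_list(predictors):
    """Wrap a bare string in a list, so "BM" becomes ["BM"]."""
    if isinstance(predictors, str):
        return [predictors]
    return list(predictors)


def yyyymm_to_month_end(yyyymm: pd.Series) -> pd.Series:
    """Convert integer yyyymm (e.g. 202012) to calendar month-end timestamps."""
    first_of_month = pd.to_datetime(yyyymm.astype(str), format="%Y%m")
    return first_of_month + pd.offsets.MonthEnd(0)


# Toy signal panel at permno × yyyymm (values are made up).
signals = pd.DataFrame(
    {"permno": [10001, 10001], "yyyymm": [202010, 202011], "BM": [0.8, 0.9]}
)
# Hypothetical returns table dated on the last trading day of each month.
returns = pd.DataFrame(
    {
        "permno": [10001, 10001],
        "date": pd.to_datetime(["2020-10-30", "2020-11-30"]),
        "ret": [0.01, -0.02],
    }
)

# Snap both sides to the same month-end key, then join one-to-one.
signals["month_end"] = yyyymm_to_month_end(signals["yyyymm"])
returns["month_end"] = returns["date"] + pd.offsets.MonthEnd(0)
merged = signals.merge(
    returns, on=["permno", "month_end"], how="inner", validate="one_to_one"
)
```

The `validate="one_to_one"` argument makes pandas raise if either side has duplicate permno-month keys, which is a cheap check against accidental double-counting.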
| Signal | Description | Category |
|---|---|---|
| BM | Book-to-market | Value |
| Mom12m | 12-month momentum (skip last month) | Momentum |
| AssetGrowth | Asset growth | Investment |
| GP | Gross profitability | Profitability |
| EP | Earnings-to-price | Value |
| Beta | CAPM beta | Risk |
| IdioVol | Idiosyncratic volatility | Risk |
| Accruals | Operating accruals | Quality |
| SUE | Standardized unexpected earnings | Earnings |
| ShareIss1Y | Net share issuance, 1yr | Issuance |

Call ap.dl_signal_doc('pandas') for the full list of 212 signals with paper references.

  • Long-short spread: decile 10 − decile 1 from dl_port("deciles_vw", …).
  • Alpha: regress the long-short series on FF5 (see Ken French).
  • Signal-zoo test: does your model’s mechanism map onto an existing anomaly, or is it genuinely new?
  • Always state the signals, release/version, sample period, and weighting.
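The long-short spread and alpha steps above can be sketched on synthetic data. Everything here is an assumption for illustration: the column names (`date`, `port`, `ret`), the decile labels `"01"` and `"10"`, and the factor column names are placeholders, so check them against the frame that dl_port actually returns and against the factor file you download. The regression is plain OLS via NumPy and reports point estimates only.

```python
import numpy as np
import pandas as pd


def long_short(deciles: pd.DataFrame, long: str = "10", short: str = "01") -> pd.Series:
    """Top-minus-bottom decile return per date (column names are assumed)."""
    wide = deciles.pivot(index="date", columns="port", values="ret")
    return (wide[long] - wide[short]).rename("ls")


def alpha_ols(y: pd.Series, factors: pd.DataFrame):
    """Regress y on a constant plus factors; return (alpha, betas)."""
    y = y.rename("y")
    aligned = pd.concat([y, factors], axis=1, join="inner").dropna()
    X = np.column_stack(
        [np.ones(len(aligned)), aligned[factors.columns].to_numpy()]
    )
    coef, *_ = np.linalg.lstsq(X, aligned["y"].to_numpy(), rcond=None)
    return float(coef[0]), pd.Series(coef[1:], index=factors.columns)


# Synthetic, noise-free example so the known alpha is recovered exactly.
rng = np.random.default_rng(0)
dates = pd.date_range("2020-01-01", periods=24, freq="MS")
factors = pd.DataFrame(
    rng.normal(size=(24, 5)),
    index=dates,
    columns=["MktRF", "SMB", "HML", "RMW", "CMA"],  # placeholder names
)
true_alpha = 0.5
true_betas = np.array([0.1, -0.2, 0.3, 0.0, 0.4])
spread = true_alpha + factors.to_numpy() @ true_betas

short_leg = rng.normal(size=24)
deciles = pd.DataFrame(
    {
        "date": dates.tolist() * 2,
        "port": ["01"] * 24 + ["10"] * 24,
        "ret": np.concatenate([short_leg, short_leg + spread]),
    }
)

ls = long_short(deciles)
alpha, betas = alpha_ols(ls, factors)
```

For inference (standard errors, t-statistics), a regression library with robust covariance options is the usual next step. Also make sure the portfolio returns and the factor returns are in the same units (both percent or both decimal) before regressing.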

Chen, A. Y., and T. Zimmermann (2022). “Open Source Cross-Sectional Asset Pricing.” Critical Finance Review. Data from https://www.openassetpricing.com, release [tag], accessed YYYY-MM-DD.
