
Numinor SAM Product Momentum Construct Data v1.0

English: Numinor SAM Product Momentum Construct Data 中文: Numinor SAM 产品动量构建数据


1. Product Identity

Product: SAM PM 构建数据 (Construct Data)
Version: v1.0
Methodology Reference: Numinor SAM Product Momentum Whitepaper v2.2 (May 2026)
Reference Implementation: github.com/Numinor-Systems/sam-pm-construct-reference (MIT License)

What this product is

A pre-engineered stock-level product-momentum signal for A-share listcos, derived from ChinaScope's SAM product taxonomy and daily price data. Each trading day, for each A-share listco with SAM coverage, Numinor publishes three signal values:

  • ne_composite_styled — the headline composite signal, suitable for direct use as a factor input or sortable rank
  • biz_mom_styled — the product-momentum component (revenue-mix-weighted aggregation of product-level momentum)
  • biz_resvol_styled — the product-residual-volatility component (revenue-mix-weighted aggregation of product-level residual volatility)

The buyer can use the composite directly, or combine the two components themselves with their preferred weights.

What this product is NOT

  • Not a buy/sell signal. This is a continuous-valued factor signal; the buyer's portfolio construction is theirs to design.
  • Not raw ChinaScope data. That is sold separately by ChinaScope. This product uses the raw SAM product mix and daily prices as input and reshapes them into a per-stock momentum signal.
  • Not a pre-computed strategy P&L. No long-short basket construction is performed — that's the buyer's choice.
  • Not residualized against Numinor's illustrative factors. The signal is factor-model agnostic; the buyer can apply their own factor neutralization on top.

Who it's for

Quant equity buyers operating in Chinese A-shares who want an orthogonal product-momentum factor with established validation (see WP v2.2). It is particularly useful as a complement to traditional price-momentum factors, because it captures momentum at the product level (where the company actually does business) rather than at the stock level (where prices are observed).


2. Schema

Primary schema (parquet and CSV — identical columns)

Column               | Type    | Description
trade_date           | date32  | The trading date this signal value applies to (Asia/Shanghai timezone).
ts_code              | string  | A-share stock ticker in format NNNNNN.XX (e.g., 000001.SZ, 600000.SH).
ne_composite_styled  | float64 | Composite product-momentum signal. Z-scored cross-sectionally per trade_date (i.e., mean ≈ 0 and std ≈ 1 across the cross-section on any given day).
biz_mom_styled       | float64 | Product-momentum component. Z-scored cross-sectionally.
biz_resvol_styled    | float64 | Product-residual-volatility component. Z-scored cross-sectionally.
source_basis         | string  | Always "sam_product_mix" in v1.0. Reserved for future variants (e.g., supply-chain-routed momentum).
source_rpt_date      | date32  | The SAM source data effective date used to compute the revenue-mix weights (PIT-correct).
source_publish_date  | date32  | When the governing filing became public. (schema v1.1)
eff_date             | date32  | When the filing became usable: source_publish_date + 30 days. Strict audit column: eff_date ≤ trade_date on every served row. (schema v1.1)

Example rows

trade_date  | ts_code    | ne_composite_styled | biz_mom_styled | biz_resvol_styled | source_basis     | source_rpt_date
2026-05-30  | 000001.SZ  | +0.31               | +0.42          | -0.18              | sam_product_mix  | 2025-12-31
2026-05-30  | 000002.SZ  | -0.62               | -0.81          | +0.55              | sam_product_mix  | 2025-12-31
2026-05-30  | 600519.SH  | +1.85               | +2.10          | +0.32              | sam_product_mix  | 2025-12-31
2026-05-30  | 300750.SZ  | -0.04               | +0.18          | -0.41              | sam_product_mix  | 2025-12-31

Type / value rules

  • trade_date: ISO 8601 calendar date (YYYY-MM-DD). Trading-day-aligned (Shanghai/Shenzhen Stock Exchange calendar). Non-trading days are NOT published.
  • ts_code: A-share listco tickers only. Format NNNNNN.SZ for Shenzhen, NNNNNN.SH for Shanghai.
  • ne_composite_styled, biz_mom_styled, biz_resvol_styled: float64, z-scored cross-sectionally per trade_date. Typical range [-3, +3] with extremes occasionally beyond ±5. Cross-section mean is ~0, std is ~1.0 (small deviations from exactly 1.0 due to NaN handling).
  • source_basis: String literal "sam_product_mix". New values may be added in future versions but existing values are stable.
  • source_rpt_date: The SAM source data effective date used for revenue-mix weighting. Always ≤ trade_date - 30 calendar days (see §4 PIT discipline).
  • source_publish_date: the filing's publication date (source_rpt_date ≤ source_publish_date).
  • eff_date: source_publish_date + 30 calendar days — when the filing became usable. eff_date ≤ trade_date on every row (the strict no-look-ahead invariant, verifiable directly from the data).
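The rules above can be checked mechanically on any delivered partition. The following is a minimal sketch, assuming the partition is already loaded into a pandas DataFrame with the schema columns; the helper name validate_partition, the tolerances, and the sample values are illustrative assumptions, not part of the product.

```python
import datetime as dt

import pandas as pd

TS_CODE_PATTERN = r"^\d{6}\.(SZ|SH)$"  # NNNNNN.SZ or NNNNNN.SH
SIGNAL_COLUMNS = ["ne_composite_styled", "biz_mom_styled", "biz_resvol_styled"]


def validate_partition(df, mean_tol=0.1, std_tol=0.2):
    """Return a list of rule violations; an empty list means every check passed."""
    problems = []
    if not df["ts_code"].str.match(TS_CODE_PATTERN).all():
        problems.append("ts_code not in NNNNNN.SZ / NNNNNN.SH format")
    if not (df["source_basis"] == "sam_product_mix").all():
        problems.append("unexpected source_basis value")
    if not (pd.to_datetime(df["eff_date"]) <= pd.to_datetime(df["trade_date"])).all():
        problems.append("eff_date later than trade_date")
    for col in SIGNAL_COLUMNS:
        # Each signal should have mean ~0 and std ~1 within every trade_date
        stats = df.groupby("trade_date")[col].agg(["mean", "std"])
        off_mean = stats["mean"].abs() > mean_tol
        off_std = (stats["std"] - 1.0).abs() > std_tol
        if (off_mean | off_std).any():
            problems.append(f"{col} is not approximately z-scored per trade_date")
    return problems


# Tiny synthetic partition (illustrative values, not real signal data)
z = pd.Series([-2.0, -1.0, 0.0, 1.0, 2.0])
z = (z - z.mean()) / z.std()
sample = pd.DataFrame({
    "trade_date": [dt.date(2026, 5, 29)] * 5,
    "ts_code": ["000001.SZ", "000002.SZ", "600000.SH", "600519.SH", "300750.SZ"],
    "ne_composite_styled": z,
    "biz_mom_styled": z,
    "biz_resvol_styled": z,
    "source_basis": "sam_product_mix",
    "eff_date": [dt.date(2026, 4, 30)] * 5,
})
print(validate_partition(sample))
```

The tolerances are deliberately loose because the std deviates slightly from 1.0 under NaN handling; tighten or relax them to suit your own ingestion checks.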

Sign convention

A higher ne_composite_styled predicts a higher 20-day forward return. Buyer convention should be:

  • Long the top quantile
  • Short the bottom quantile
  • Or use as a positive-direction factor weight in a multi-factor model

If a buyer's factor model expects "low value = good" (some Barra-style conventions), they can simply negate the column.

What's NOT in the schema (intentional)

  • No "raw" intermediate values (biz_mom_daily, biz_mom_neu, z_mom, etc.) — these can be reconstructed from the source data + methodology doc + reference code, and aren't needed for normal use.
  • No per-product breakdown (product-level momentum values aggregated to the stock) — the aggregation is the product, and exposing per-product would expose the SAM taxonomy details that are ChinaScope's IP, not Numinor's.
  • No factor exposures (size, value, momentum betas, etc.) — buyer's own factor model handles this.

3. File Layout & Delivery

File layout — Hive-partitioned daily

The product is published as Hive-partitioned daily parquet (one partition per trading day), mirroring SAM Amplifier 构建数据:

s3://numinor-construct-data/sam-pm/parquet/
└── year=YYYY/month=MM/day=DD/data.parquet     (one partition per trading day)

Each partition holds every covered stock's row for that trade_date (~3,000–4,600 rows). A CSV mirror (data.csv.zip, ZIP-compressed, identical columns) is written alongside each parquet partition. Parquet is snappy-compressed. The API layer (§7) mints signed-URL access over this store — /historical returns the full range, /delta/{YYYYMMDD} a single day's partition, /range a date span.
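To make the layout concrete, here is a small sketch that builds the object path for one trading day. It assumes zero-padded month and day values (inferred from the MM and DD placeholders above); the function name partition_path is hypothetical, and subscribers normally reach these objects through signed URLs rather than by direct S3 path.

```python
import datetime as dt

# Prefix taken from the layout shown above
BASE_PREFIX = "s3://numinor-construct-data/sam-pm/parquet"


def partition_path(trade_date: dt.date, csv_mirror: bool = False) -> str:
    """Build the Hive-style object path for one trading day's partition."""
    filename = "data.csv.zip" if csv_mirror else "data.parquet"
    return (
        f"{BASE_PREFIX}/year={trade_date:%Y}/month={trade_date:%m}/"
        f"day={trade_date:%d}/{filename}"
    )


print(partition_path(dt.date(2026, 4, 7)))
# s3://numinor-construct-data/sam-pm/parquet/year=2026/month=04/day=07/data.parquet
```

A helper like this is mostly useful for naming files in a local mirror so that it keeps the same partition structure as the published store.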

Current published coverage: 2016-01-05 → 2026-04-07 (2,489 daily partitions); advances each trading day via the daily refresh cron (scripts/cron/run_cron_a.py).

File sizes (approximate)

File                                                                      | Parquet   | CSV.zip
Historical dump (2016-2025, ~10M rows: ~4000 stocks × 2500 trading days)  | ~120 MB   | ~400 MB
Daily delta (one trading day, ~3000-5000 rows)                            | ~0.5-1 MB | ~1-3 MB

These files are an order of magnitude smaller than SAM Amplifier 构建数据 because SAM PM is one row per (stock, day) rather than many edges per (stock, day).

Update cadence & SLA

  • Daily refresh: new sam_pm_delta_YYYYMMDD.parquet published by 06:00 Asia/Shanghai time, for trading day YYYYMMDD (T+1).
  • No publication on Chinese A-share market holidays.
  • Historical dump: issued once per subscriber at onboarding; rebuilt only when methodology changes (rare; documented in changelog).
  • Methodology stability: Numinor commits not to change the methodology mid-version. Methodology updates trigger a version bump (e.g., v1.1) with 60-day advance notice.

Buyer-chosen rebalance cadence

The signal is published every trading day for every covered stock. The buyer is NOT restricted to any particular rebalance cadence:

  • Monthly rebalancer? Pull signal values from each month-end.
  • Weekly Wednesday rebalancer? Pull from each Wednesday.
  • 20-trading-day rebalancer (as in WP v2.2)? Pull every 20th trading day from your chosen anchor.
  • Daily rebalancer? Pull every day.

The Numinor pipeline delivers daily; the buyer's pipeline decides which dates to consume.
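The cadence choices above reduce to selecting a subset of published trade dates. A minimal sketch, assuming you hold the list of published trading days as a pandas DatetimeIndex; the function names are hypothetical, and the demo uses plain business days as a stand-in for the exchange calendar (it ignores market holidays).

```python
import pandas as pd


def month_end_dates(trade_dates: pd.DatetimeIndex) -> pd.DatetimeIndex:
    """Last published trading day in each calendar month."""
    s = pd.Series(trade_dates, index=trade_dates)
    return pd.DatetimeIndex(s.groupby([s.index.year, s.index.month]).max())


def wednesday_dates(trade_dates: pd.DatetimeIndex) -> pd.DatetimeIndex:
    """Published trading days that fall on a Wednesday (dayofweek == 2)."""
    return trade_dates[trade_dates.dayofweek == 2]


def every_nth_trading_day(trade_dates: pd.DatetimeIndex, n: int = 20, anchor: int = 0) -> pd.DatetimeIndex:
    """Every n-th published trading day, starting at position `anchor`."""
    return trade_dates[anchor::n]


# Stand-in calendar: business days only, holidays not removed
trading_days = pd.bdate_range("2026-01-05", "2026-03-31")
print(month_end_dates(trading_days))
print(every_nth_trading_day(trading_days, n=20))
```

In practice the calendar would come from the distinct trade_date values in the historical dump, which already excludes market holidays.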


4. PIT Discipline

The product is point-in-time correct: a signal value dated trade_date = D uses only ChinaScope source data that was already publicly available on day D.

Publish lag rule

For every source used in v1.0:

source_publish_date + 30 calendar days ≤ trade_date

This buffer models realistic vendor delivery (ChinaScope T+1) + institutional-buyer ingestion / recompute / deployment lag (typically 3-4 weeks combined) + a modest conservatism cushion. It matches the convention used in Numinor's SAM Amplifier construct, keeping the lag rule consistent across our catalog.

Measured availability floor (2026-06): across 5,675 filings spanning the FY2025 annual + Q1 reporting season, 99% of SAM records were delivered within 4 days of publish_date (99.9% within 30). The 30-day buffer is therefore ~4 days of measured availability + ~26 days of buyer-workflow allowance — conservative by construction.

This is mechanically enforced by the pipeline — the revenue-mix weights underlying the signal at trade_date = D are sourced from filings whose publish_date + 30 calendar days ≤ D.

Daily price data PIT

Daily price returns (used to compute the daily momentum component before aggregation) are PIT by construction: returns on day D are computed from the close of day D price relative to the close of day D-1. The signal value for trade_date = D uses returns through close of D.

What this means for the buyer

  • The buyer never receives a signal value that "looked into the future" relative to its trade_date.
  • Backtests using this data inherit the PIT discipline automatically.
  • For audit / transparency, the buyer can verify the strict no-look-ahead rule directly from the served data: eff_date ≤ trade_date on every row, where eff_date = source_publish_date + 30 (all three dates ship as of schema v1.1). The report-date proxy (source_rpt_date + 30 ≤ trade_date) also holds but is the weaker check.
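The audit described in the last bullet can be sketched as follows, assuming the schema v1.1 date columns are present in the loaded frame. The function name pit_violations and the two sample rows are illustrative; the second row is deliberately constructed to fail.

```python
import datetime as dt

import pandas as pd

PIT_BUFFER_DAYS = 30  # calendar days, per the publish lag rule


def pit_violations(df: pd.DataFrame) -> pd.DataFrame:
    """Return the rows that break either PIT invariant (expected: none)."""
    trade = pd.to_datetime(df["trade_date"])
    publish = pd.to_datetime(df["source_publish_date"])
    eff = pd.to_datetime(df["eff_date"])
    buffer = pd.Timedelta(days=PIT_BUFFER_DAYS)
    eff_mismatch = eff != publish + buffer  # eff_date should equal publish + 30d
    look_ahead = eff > trade                # strict no-look-ahead rule
    return df[eff_mismatch | look_ahead]


# Synthetic rows: the first is valid, the second looks ahead
sample = pd.DataFrame({
    "ts_code": ["000001.SZ", "000002.SZ"],
    "trade_date": [dt.date(2026, 5, 29)] * 2,
    "source_publish_date": [dt.date(2026, 3, 28), dt.date(2026, 5, 10)],
    "eff_date": [dt.date(2026, 4, 27), dt.date(2026, 6, 9)],
})
violations = pit_violations(sample)
print(violations["ts_code"].tolist())  # ['000002.SZ']
```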

Relationship to the WP

The published SAM PM Whitepaper v2.2 used 30-day lag throughout. The v1.0 construct data matches this exactly. A buyer running the WP's methodology on the construct data should reproduce structurally similar results (subject to differences in factor models for evaluation purposes — the WP uses Numinor's illustrative 22-factor base, which the buyer would not use directly).

Can the buyer change the lag?

  • Tighter lag (<30 days): not available through this product. Requires licensing raw ChinaScope SAM + daily price data and running your own pipeline.
  • Looser lag (>30 days, more conservative): easily applied buyer-side — simply consume the signal at trade_date + extra_days in your pipeline.
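One way to apply a looser lag buyer-side is an as-of join that only lets a decision date see signals dated at least extra_days earlier. This is a sketch under assumptions: extra_days is interpreted in calendar days, and the function and column names decision_date and usable_from are hypothetical.

```python
import pandas as pd


def signal_with_extra_lag(signal: pd.DataFrame, decisions: pd.DataFrame, extra_days: int) -> pd.DataFrame:
    """For each (ts_code, decision_date), attach the latest signal whose
    trade_date is at least `extra_days` calendar days before the decision."""
    sig = signal.copy()
    sig["trade_date"] = pd.to_datetime(sig["trade_date"])
    # A signal dated D is treated as usable only from D + extra_days
    sig["usable_from"] = sig["trade_date"] + pd.Timedelta(days=extra_days)
    dec = decisions.copy()
    dec["decision_date"] = pd.to_datetime(dec["decision_date"])
    return pd.merge_asof(
        dec.sort_values("decision_date"),
        sig.sort_values("usable_from"),
        left_on="decision_date",
        right_on="usable_from",
        by="ts_code",
        direction="backward",
    )


# Synthetic example: three signal dates for one stock, one decision date
signal = pd.DataFrame({
    "trade_date": ["2026-03-02", "2026-03-16", "2026-03-30"],
    "ts_code": ["000001.SZ"] * 3,
    "ne_composite_styled": [0.1, 0.5, 0.9],
})
decisions = pd.DataFrame({"decision_date": ["2026-03-31"], "ts_code": ["000001.SZ"]})
lagged = signal_with_extra_lag(signal, decisions, extra_days=15)
print(lagged[["decision_date", "trade_date", "ne_composite_styled"]])
```

With 15 extra days, the decision on 2026-03-31 picks up the 2026-03-16 signal rather than the 2026-03-30 one, because the later signal is not yet treated as usable.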

5. Signal Construction — Numinor's Engineering Value-Add

ChinaScope ships raw SAM data (per-company × per-product revenue mix tables) and daily stock prices. For stock-level quantitative analysis, the buyer needs these inputs assembled into a per-stock momentum signal with appropriate aggregation, PIT discipline, and z-scoring. That assembly is the engineering work Numinor performs.

Signal recipe (Construction R, the v1.0 canonical)

For each trade_date = D:

Step 1 — Daily product-level returns. Each SAM product node p is mapped to its constituent A-share listcos with revenue exposure. For each product on day D, compute a revenue-share-weighted return of its constituent stocks (with self-exclusion: each focal stock is excluded from its own products' aggregates when constructing the focal's signal).
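Step 1 can be sketched in NumPy as below. This is an illustration only: the exact weighting used in Construction R is defined in the whitepaper and reference code, and here the weight of stock j within product p is simply assumed to be its exposure normalised over the remaining constituents. The function name and the toy inputs are hypothetical.

```python
import numpy as np


def product_returns_excluding_focal(weights: np.ndarray, returns: np.ndarray) -> np.ndarray:
    """Product returns as seen by each focal stock.

    weights: (n_stocks, n_products) non-negative exposure of stock j to product p
    returns: (n_stocks,) daily stock returns
    result:  (n_stocks, n_products); entry [i, p] is product p's weighted return
             computed from every constituent except stock i (NaN if none remain)
    """
    weighted_sum = weights.T @ returns  # (n_products,) sum over all stocks
    weight_total = weights.sum(axis=0)  # (n_products,)
    # Remove the focal stock's own contribution from numerator and denominator
    num = weighted_sum[None, :] - weights * returns[:, None]
    den = weight_total[None, :] - weights
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.where(den > 0, num / den, np.nan)


# Toy example: 3 stocks, 2 products
weights = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
returns = np.array([0.02, -0.01, 0.04])
out = product_returns_excluding_focal(weights, returns)
print(out)
```

For stock 0 and product 0, the only other constituent is stock 1, so the self-excluded product return equals stock 1's return of -0.01.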

Step 2 — Daily product-level residual return. On each date, subtract the cross-sectional mean of product returns from each product's return to obtain that product's daily residual return. This isolates idiosyncratic product-level momentum from market-wide moves.

Step 3 — Rolling 20-trading-day momentum. For each product, compute the trailing-20-day sum of daily residual returns → biz_mom_daily[p, D]. Also compute the trailing-20-day standard deviation → biz_resvol_daily[p, D].
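Steps 2 and 3 can be sketched together in pandas. The sketch assumes an equal-weighted cross-sectional mean across products and a sample standard deviation (pandas default, ddof=1); neither detail is specified here, so treat both as assumptions. The function name and the random input are illustrative.

```python
import numpy as np
import pandas as pd

WINDOW = 20  # trading days, per Step 3


def product_momentum_features(product_returns: pd.DataFrame):
    """product_returns: rows = trading days, columns = product nodes."""
    # Step 2: subtract each day's cross-sectional mean across products
    residual = product_returns.sub(product_returns.mean(axis=1), axis=0)
    # Step 3: trailing 20-day sum and standard deviation of residual returns
    biz_mom_daily = residual.rolling(WINDOW).sum()
    biz_resvol_daily = residual.rolling(WINDOW).std()
    return biz_mom_daily, biz_resvol_daily


# Synthetic input: 30 business days, 4 hypothetical product nodes
rng = np.random.default_rng(0)
days = pd.bdate_range("2026-01-05", periods=30)
product_returns = pd.DataFrame(
    rng.normal(0.0, 0.01, size=(30, 4)), index=days, columns=["p1", "p2", "p3", "p4"]
)
biz_mom_daily, biz_resvol_daily = product_momentum_features(product_returns)
print(biz_mom_daily.tail(3))
```

The first 19 rows are NaN because the window is not yet full, which mirrors the warm-up period at the start of the dataset.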

Step 4 — Project back to focal stock. For each focal stock i, aggregate biz_mom_daily[p, D] across the focal's products, weighted by the focal's revenue share on each product:

biz_mom[i, D] = Σ_p revenue_share[i, p] × biz_mom_daily[p, D]

Same aggregation for biz_resvol[i, D].
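The projection in Step 4 is a revenue-share-weighted sum per stock. A minimal sketch for a single day follows; because of the self-exclusion in Step 1 the product feature may differ per focal stock, so the hypothetical helper accepts either a per-product vector or a per-(stock, product) matrix. Whether revenue shares are renormalised to sum to 1 is not stated here, and the sketch does not renormalise.

```python
import numpy as np


def project_to_stocks(revenue_share: np.ndarray, product_feature: np.ndarray) -> np.ndarray:
    """biz_mom[i] = sum_p revenue_share[i, p] * product_feature[(i,) p] for one day.

    revenue_share:   (n_stocks, n_products)
    product_feature: (n_products,) shared by all stocks, or
                     (n_stocks, n_products) when self-exclusion makes it focal-specific
    """
    feature = np.broadcast_to(product_feature, revenue_share.shape)
    return (revenue_share * feature).sum(axis=1)


# Toy example: stock 0 earns 70% / 30% from two products, stock 1 is pure product 1
revenue_share = np.array([[0.7, 0.3], [0.0, 1.0]])
biz_mom_daily = np.array([0.10, -0.20])
biz_mom = project_to_stocks(revenue_share, biz_mom_daily)
print(biz_mom)
```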

Step 5 — Cross-sectional z-score per date. Standardize each of biz_mom and biz_resvol to mean 0, std 1 across the A-share cross-section on day D. This gives biz_mom_styled and biz_resvol_styled.
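A per-date z-score can be sketched with a pandas groupby. The sketch assumes a sample standard deviation and no winsorisation or other "styling" steps; the published columns may involve additional treatment described in the methodology doc. The function name and the sample panel are illustrative.

```python
import numpy as np
import pandas as pd


def zscore_per_date(df: pd.DataFrame, column: str) -> pd.Series:
    """Standardise `column` to mean 0 / std 1 within each trade_date."""
    grouped = df.groupby("trade_date")[column]
    return (df[column] - grouped.transform("mean")) / grouped.transform("std")


# Synthetic panel: two dates, three stocks each
panel = pd.DataFrame({
    "trade_date": ["2026-05-28"] * 3 + ["2026-05-29"] * 3,
    "ts_code": ["000001.SZ", "600000.SH", "600519.SH"] * 2,
    "biz_mom": [0.01, 0.03, 0.05, -0.02, 0.00, 0.04],
})
panel["biz_mom_styled"] = zscore_per_date(panel, "biz_mom")
print(panel)
```

pandas skips NaN values when computing the group mean and std, which is one reason the realised std can drift slightly from exactly 1.0.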

Step 6 — Composite. Combine the two styled components:

ne_composite_styled = z_mom_weight × biz_mom_styled + z_resvol_weight × biz_resvol_styled

Weights are tuned in the methodology doc. Sign of ne_composite_styled is calibrated so higher = predicts higher 20-day forward return.
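Buyers who prefer their own weights can rebuild a composite from the two published components. The sketch below uses the component values from the example rows in §2; the weights 0.7 and 0.3 are placeholders, not Numinor's tuned values, and the final re-standardisation is an assumption, since this document does not state whether the published composite is re-z-scored after weighting.

```python
import pandas as pd


def custom_composite(df: pd.DataFrame, mom_weight: float, resvol_weight: float) -> pd.Series:
    """Weighted sum of the two styled components, re-standardised per trade_date."""
    raw = mom_weight * df["biz_mom_styled"] + resvol_weight * df["biz_resvol_styled"]
    grouped = raw.groupby(df["trade_date"])
    return (raw - grouped.transform("mean")) / grouped.transform("std")


# Component values copied from the example rows in the schema section
rows = pd.DataFrame({
    "trade_date": ["2026-05-30"] * 4,
    "ts_code": ["000001.SZ", "000002.SZ", "600519.SH", "300750.SZ"],
    "biz_mom_styled": [0.42, -0.81, 2.10, 0.18],
    "biz_resvol_styled": [-0.18, 0.55, 0.32, -0.41],
})
# Placeholder weights chosen for illustration only
rows["my_composite"] = custom_composite(rows, mom_weight=0.7, resvol_weight=0.3)
print(rows[["ts_code", "my_composite"]])
```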

Universe filtering

After computing raw signals:

  1. Drop stocks with insufficient SAM coverage (no revenue mix data at trade_date - 30 days)
  2. Drop stocks suspended for extended periods (cross-section unstable)
  3. Drop stocks not in A-share SH/SZ/KC/CYB listco universe
  4. Drop stocks IPO'd after trade_date or delisted before trade_date

Daily refresh

At each new trade_date:

  1. Update daily product-level returns (one new day of data)
  2. Roll the 20-day window forward by one day
  3. If any new SAM source data has just crossed publish_date + 30 days, update affected revenue-mix weights
  4. Recompute z-scores per date
  5. Emit a delta file containing the new day's row for every covered stock (~3000-5000 rows)

What the buyer pays for vs. does themselves

Step                                               | Done by Numinor | Buyer would need to do
Read raw ChinaScope SAM + daily prices             | Yes             | Schema knowledge, multi-table joins
Apply PIT discipline (revenue-mix as of date)      | Yes             | Track each filing's publish_date
Compute product-level returns with self-exclusion  | Yes             | Implement aggregation correctly
20-day rolling momentum & residual volatility      | Yes             | Maintain rolling-window state
Cross-sectional z-scoring per date                 | Yes             | Recompute z-scores daily
Composite weighting                                | Yes             | Choose weights, replicate methodology
Daily refresh pipeline                             | Yes             | Run own pipeline daily
Historical snapshots                               | Yes             | Build own historical store

Doing this end-to-end from raw ChinaScope data is approximately 2-3 weeks of focused data engineering work for an experienced team, plus ongoing maintenance.


6. Universe Rules

Inclusion

  • All A-share listcos with SAM coverage at trade_date, traded on Shanghai (SH), Shenzhen (SZ), STAR Board (KC), or ChiNext (CYB).
  • Stocks under brief suspension on trade_date are included in the signal if their underlying product-mix data is still valid (the signal doesn't require trading on trade_date to be computable).
  • A stock that was delisted before trade_date is excluded from its delisting date forward.
  • A stock that lists after trade_date is excluded on every date before its listing.

Exclusion (by design)

  • Beijing Stock Exchange (BJ): excluded.
  • Stocks with no SAM coverage (no product revenue mix in ChinaScope's SAM tables): excluded.
  • Stocks with <90 trading days of price history (insufficient to compute residual volatility): excluded.
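The inclusion and exclusion rules can be expressed as a boolean mask over a stock master table. This is a sketch under assumptions: the columns list_date, delist_date, has_sam_coverage and price_history_days are hypothetical buyer-side fields, and all values in the sample table (including the .BJ ticker) are made up for illustration.

```python
import datetime as dt

import pandas as pd

MIN_HISTORY_DAYS = 90  # trading days of price history required


def in_universe(stocks: pd.DataFrame, trade_date: dt.date) -> pd.Series:
    """Boolean mask: True where a stock passes the universe rules on trade_date."""
    day = pd.Timestamp(trade_date)
    listed = pd.to_datetime(stocks["list_date"]) <= day
    delist = pd.to_datetime(stocks["delist_date"])  # missing values become NaT
    still_listed = delist.isna() | (delist >= day)
    # SH and SZ suffixes cover the main boards plus STAR and ChiNext; BJ is excluded
    on_sh_sz = stocks["ts_code"].str[-3:].isin([".SH", ".SZ"])
    enough_history = stocks["price_history_days"] >= MIN_HISTORY_DAYS
    return listed & still_listed & on_sh_sz & stocks["has_sam_coverage"] & enough_history


# Illustrative master table (not real listing data)
stocks = pd.DataFrame({
    "ts_code": ["600519.SH", "830001.BJ", "301999.SZ", "000002.SZ"],
    "list_date": ["2010-01-04", "2021-11-15", "2026-05-06", "2010-01-04"],
    "delist_date": [None, None, None, None],
    "has_sam_coverage": [True, True, True, False],
    "price_history_days": [500, 500, 15, 500],
})
mask = in_universe(stocks, dt.date(2026, 5, 29))
print(stocks.loc[mask, "ts_code"].tolist())  # ['600519.SH']
```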

Coverage start

  • The dataset covers 2016-01-04 → present. SAM data prior to 2016 has insufficient depth.
  • Signal effective from ~2016-02-02. The momentum/residual-volatility features need a ~20-trading-day rolling window; with no price history before 2016-01-04 to seed it, the first ~19 trading days of 2016 (most of January) carry no computable signal (NaN). Every later date is fully warmed. (Partitions in this start-of-data window may be absent or NaN; treat as "no signal".)

Typical universe size

  • 2016: ~3,000 stocks per day
  • 2026: ~4,500 stocks per day
  • Coverage grows roughly with the A-share listing universe over time

7. API Specification

All API access is via signed URL minting. The buyer authenticates once with their API key; the API returns a time-limited S3 URL the buyer downloads from directly.

Base URL

https://api.numinor.io/v1/constructs/sam-pm

Authentication

Authorization: Bearer <numinor_api_key>
  • Default expiration: none. API keys do not expire automatically.
  • Rotation: client-controlled via subscriber dashboard. Subscribers may rotate at any cadence; we recommend 90 days as security best practice.
  • Revocation: immediate. Compromised keys can be invalidated instantly via the dashboard.
  • Multiple keys per subscriber: supported, useful for separating dev / staging / production access.

Endpoints

Identical surface to SAM Amplifier 构建数据. The construct path is /v1/constructs/sam-pm instead of /v1/constructs/sam-amplifier.

Endpoint           | Method | Returns
/manifest          | GET    | Schema, available date range, total row count
/historical        | GET    | Signed S3 URL for historical dump file
/delta/{YYYYMMDD}  | GET    | Signed S3 URL for a specific date's delta file
/range             | GET    | List of signed S3 URLs for a date range
/query             | POST   | Filtered query results inline as JSON (small results only)

/query body shape

{
  "trade_date": "2026-05-30",
  "ts_code": "000001.SZ"
}

ts_code is optional; omit it to return all stocks on that date.

Rate limits

Endpoint                         | Limit                                       | Per-response cap
POST /query                      | 100 req/min per API key, burst 20 in 10 sec | 10,000 rows per response. Exceed → HTTP 413, use /range instead
GET /historical, /delta, /range  | unlimited                                   | n/a (signed S3 URL)
GET /manifest                    | 1000 req/min                                | small JSON

Signed URL validity: 4 hours from issuance. Within the validity window, downloads from S3 are unlimited.
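Clients that call POST /query in a loop may want a client-side guard so they stay under the limits in the table above. The following is a minimal sliding-window sketch; it assumes "burst 20 in 10 sec" means at most 20 requests in any 10-second window, and the class name and structure are illustrative rather than anything the API requires.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Client-side guard: allow at most `limit` calls in any `window` seconds."""

    def __init__(self, limit: int, window: float, clock=time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.calls = deque()  # timestamps of recorded calls, oldest first

    def wait_time(self) -> float:
        """Seconds to wait before the next call is allowed (0.0 if allowed now)."""
        now = self.clock()
        # Forget calls that have left the window
        while self.calls and now - self.calls[0] >= self.window:
            self.calls.popleft()
        if len(self.calls) < self.limit:
            return 0.0
        return self.window - (now - self.calls[0])

    def record(self) -> None:
        """Note that a call was just made."""
        self.calls.append(self.clock())


# One limiter per published limit for POST /query
per_minute = SlidingWindowLimiter(limit=100, window=60.0)
burst = SlidingWindowLimiter(limit=20, window=10.0)


def throttle() -> None:
    """Sleep until both limits allow a call, then record it."""
    delay = max(per_minute.wait_time(), burst.wait_time())
    if delay > 0:
        time.sleep(delay)
    per_minute.record()
    burst.record()
```

Call throttle() immediately before each POST /query. Large pulls are better served by /range, which has no request limit.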

Coming in v1.1: MCP

We will publish an MCP server exposing the same API as native LLM tools (get_sam_pm_signal(date, ts_code)) once MCP infrastructure matures across major model providers.


8. Quickstart: From Subscription to First Value in 5 Minutes

1. Get your API key

Issued via email at onboarding. Store as NUMINOR_API_KEY:

export NUMINOR_API_KEY="nm_live_..."

2. Pull today's delta

Python (pandas):

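import os

import pandas as pd
import requests

key = os.environ["NUMINOR_API_KEY"]
date = "20260530"
r = requests.get(
    f"https://api.numinor.io/v1/constructs/sam-pm/delta/{date}",
    headers={"Authorization": f"Bearer {key}"},
    timeout=30,
)
r.raise_for_status()  # fail early on auth errors or a missing partition
resp = r.json()       # contains the time-limited signed URL under "url"
df = pd.read_parquet(resp["url"])
print(df.head())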
# Output:
# trade_date | ts_code   | ne_composite_styled | biz_mom_styled | biz_resvol_styled | ...

Python (polars):

import polars as pl
df = pl.read_parquet(resp["url"])

R:

library(arrow)
library(httr)
resp <- httr::GET("https://api.numinor.io/v1/constructs/sam-pm/delta/20260530",
                  httr::add_headers(Authorization=paste("Bearer", Sys.getenv("NUMINOR_API_KEY"))))
url <- jsonlite::fromJSON(httr::content(resp, "text"))$url
df <- arrow::read_parquet(url)

DuckDB:

INSTALL httpfs;
LOAD httpfs;
SELECT ts_code, ne_composite_styled FROM read_parquet('<signed_url>') ORDER BY ne_composite_styled DESC LIMIT 100;

3. Pull the historical dump (one-time)

resp = requests.get(
    "https://api.numinor.io/v1/constructs/sam-pm/historical",
    headers={"Authorization": f"Bearer {key}"}
).json()
df_history = pd.read_parquet(resp["url"])

4. Use the signal in your strategy

Simplest use — sort and trade:

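# Latest cross-section.
# date32 columns load into pandas as datetime.date objects, so comparing them
# to a string matches nothing; normalise to datetime64 before filtering.
trade_dates = pd.to_datetime(df_history["trade_date"])
today = df_history[trade_dates == pd.Timestamp("2026-05-30")]

# Long top quintile, short bottom quintile
n = int(len(today) * 0.20)
long_basket  = today.nlargest(n, "ne_composite_styled")["ts_code"].tolist()
short_basket = today.nsmallest(n, "ne_composite_styled")["ts_code"].tolist()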

Or use as a feature in your multi-factor model:

# Merge with your factor file
my_factors = pd.read_parquet("my_factor_panel.parquet")
combined = my_factors.merge(
    df_history[["trade_date", "ts_code", "ne_composite_styled"]],
    on=["trade_date", "ts_code"], how="left"
)
# combined now has your factors + ne_composite_styled as a new column
# Use in your usual factor-combination pipeline

5. Ask Gandalf for help

Stuck on integration? Open Gandalf (your onsite AI assistant), ask:

  • "How do I use ne_composite_styled in a vol-scaled portfolio?"
  • "What's the difference between biz_mom_styled and biz_resvol_styled?"
  • "How do I run the WP v2.2 multi-offset robustness test on this data?"

Gandalf has context on the data dictionary, the reference code, and methodology.


9. Onboarding Checklist

When a new subscriber comes online:

Step                                                            | Owner        | Time
1. Subscription contract signed                                 | Sales        |
2. API key generated, emailed to subscriber's technical lead    | Numinor ops  | < 1 hour
3. Subscriber tests /manifest endpoint to confirm access        | Subscriber   | 5 min
4. Subscriber downloads historical dump                         | Subscriber   | 1-2 min (~120 MB)
5. Subscriber validates schema against this data dictionary     | Subscriber   | 10 min
6. Subscriber runs reference code (sort + long/short example)   | Subscriber   | 10-20 min
7. Subscriber's first signal-driven backtest produced           | Subscriber   | end of day 1
8. Optional: live integration call with Numinor team            | Joint        | 1 hour

Total time from contract to first usable signal: < 1 business day.


10. Versioning & Changelog

Version           | Date       | Notes
v1.0              | 2026-05-28 | Initial release. Mirrors SAM PM WP v2.2 (May 2026) Construction R signal at offset=0, daily refresh.
v1.0 (published)  | 2026-06-05 | Historical dump published as Hive daily partitions (sam-pm/parquet/, 2016-01-05 → 2026-04-07, 2,489 partitions). Realm registry live (sam-pm/realm/: catalog.json, dials.json, applicability matrix). Pipeline numinor_sam_pm/construct.py (Construction R) + daily refresh cron scripts/cron/run_cron_a.py. Defaults reproduce WP v2.2 multi-offset orthogonal ICIR +0.3497 full / +0.3523 OOS (vs 22-factor base), 100% positive offsets.

Roadmap

  • v1.1 (planned Q3 2026): MCP server for LLM-native data access.
  • v1.2 (planned Q4 2026): Optional multi-offset variants for buyers wanting to replicate WP §6 multi-offset robustness internally.
  • v1.3 (planned Q4 2026): Construction S variant (source-residualized daily returns) for buyers who already factor-neutralize at the daily-returns layer.
  • v2.0 (planned 2027): SAM Supply Chain v5 incorporation, plus optional supply-chain-routed momentum (combining SAM PM with SAM Amplifier methodology).

Subscribers receive 60-day advance notice of any breaking changes.


11. FAQ

Q: How is this different from buying ChinaScope's raw SAM data directly?

A: ChinaScope sells the raw sam_product_calc and daily price tables. To compute the product-momentum signal yourself, you'd need to (a) implement the revenue-share aggregation with self-exclusion, (b) build the rolling 20-day momentum and residual volatility pipeline, (c) handle PIT discipline correctly, (d) maintain daily refreshes. Numinor does all this for you with a published, peer-reviewable methodology (WP v2.2) and a reference implementation. You pay for the engineering + ongoing maintenance, not the data.

Q: Why three columns instead of just ne_composite_styled?

A: Most buyers will use only ne_composite_styled (the headline composite). The two components are included for buyers who want to:

  • Combine the components with their own weights (rather than our default composite weighting)
  • Use only momentum or only residual-volatility separately
  • Validate that the composite is calculable from the components

If you only want one column, you can drop the other two on read.

Q: Is the WP v2.2 ICIR of +0.42 / +0.35 what I should expect?

A: Those numbers were computed on Numinor's illustrative 22-factor base for orthogonalization. Your numbers depend on YOUR factor model. As a directional benchmark: the signal has consistent positive cross-sectional information beyond standard size/value/momentum factors. Magnitude depends on what's already in your stack.

Q: Does the signal work better at certain forward horizons?

A: Per WP §6.4, the signal is most predictive at 60-day forward horizon (orth-ICIR +0.46/+0.48 raw, +0.27/+0.28 de-overlapped). At the canonical 20-day horizon, it's +0.42/+0.35 (raw). The signal is "slow alpha" by construction — product-spillover effects play out over weeks, not days. Buyers running daily-rebal strategies should account for this; weekly-or-monthly-rebal strategies are well-aligned.

Q: What if a stock has no signal value on a given date?

A: It's simply absent from the delta file for that date. The buyer's pipeline should treat missing ne_composite_styled as "no signal available" — fall back to whatever default behavior makes sense (no position, average factor value, etc.). The reference code demonstrates the fallback pattern.
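A possible fallback pattern is sketched below: left-join the signal onto your own universe, keep a flag for missing values, and fill with 0.0, which is the cross-sectional mean of a z-scored signal (the "average factor value" option). This is an illustration, not the reference code; the function name and sample values are hypothetical.

```python
import pandas as pd


def attach_signal(universe: pd.DataFrame, signal: pd.DataFrame, fill_value: float = 0.0) -> pd.DataFrame:
    """Left-join the signal onto the buyer's universe and fill gaps explicitly."""
    merged = universe.merge(
        signal[["trade_date", "ts_code", "ne_composite_styled"]],
        on=["trade_date", "ts_code"],
        how="left",
    )
    merged["has_signal"] = merged["ne_composite_styled"].notna()
    merged["signal_filled"] = merged["ne_composite_styled"].fillna(fill_value)
    return merged


# Synthetic example: the middle stock has no row in the delta
universe = pd.DataFrame({
    "trade_date": ["2026-05-29"] * 3,
    "ts_code": ["000001.SZ", "000004.SZ", "600519.SH"],
})
signal = pd.DataFrame({
    "trade_date": ["2026-05-29"] * 2,
    "ts_code": ["000001.SZ", "600519.SH"],
    "ne_composite_styled": [0.31, 1.85],
})
merged = attach_signal(universe, signal)
print(merged)
```

Keeping has_signal alongside the filled value lets a "no position" policy and an "average factor value" policy share the same frame.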

Q: Does the signal account for industry / sector effects?

A: The cross-sectional z-scoring per date provides one layer of normalization (the signal value tells you "how does this stock's product-momentum rank against the whole A-share cross-section today?"). For industry-relative or sector-neutral use, buyers typically apply their own industry/sector neutralization on top. The signal is delivered "raw cross-sectional" so the buyer can choose their preferred neutralization.
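The simplest form of industry neutralization is to demean the signal within each (trade_date, industry) group. The sketch assumes the buyer supplies their own industry column, which is not part of this product's schema; the function name and sample values are illustrative.

```python
import numpy as np
import pandas as pd


def industry_neutralize(df: pd.DataFrame, signal_col: str = "ne_composite_styled",
                        group_col: str = "industry") -> pd.Series:
    """Subtract the per-date, per-industry mean from the signal."""
    group_mean = df.groupby(["trade_date", group_col])[signal_col].transform("mean")
    return df[signal_col] - group_mean


# Synthetic example with a buyer-supplied industry label
frame = pd.DataFrame({
    "trade_date": ["2026-05-29"] * 4,
    "ts_code": ["000001.SZ", "600000.SH", "600519.SH", "000858.SZ"],
    "industry": ["A", "A", "B", "B"],
    "ne_composite_styled": [1.0, 0.0, -0.5, -1.5],
})
frame["signal_neutral"] = industry_neutralize(frame)
print(frame)
```

Regression-based neutralization against industry dummies and other exposures is the more general alternative; group demeaning is the special case with industry dummies only.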

Q: Why does the composite use biz_mom + biz_resvol instead of just biz_mom?

A: Per WP §3, including the residual-volatility component improves OOS ICIR by ~+0.05-0.10. Empirically, product-residual-volatility carries unique predictive content (likely reflecting product-level information dispersion and idiosyncratic risk pricing) that complements pure momentum. Both are included.

Q: Can I run my own backtest to verify the WP's claims before purchasing?

A: Yes. Request a 30-day evaluation license (contact sales). Evaluation includes full historical dump + 30 days of daily deltas, with the same API access. The buyer can replicate WP §4-§6 in their own infrastructure.


12. Contact & Support

Channel               | Use case
Gandalf (in-product)  | First-line technical questions, code examples, methodology clarifications
support@numinor.io    | Everything else: production issues, subscriptions, data quality, methodology

Appendix A: Schema Reference Card (Printable)

Numinor SAM Product Momentum 构建数据 v1.0
Format: parquet (.parquet) or CSV/ZIP (.csv.zip)

  trade_date              date32     YYYY-MM-DD, Shanghai trading days
  ts_code                 string     NNNNNN.SZ or NNNNNN.SH
  ne_composite_styled     float64    z-scored composite signal (higher = predicts higher fwd ret)
  biz_mom_styled          float64    z-scored momentum component
  biz_resvol_styled       float64    z-scored residual-vol component
  source_basis            string     "sam_product_mix" (v1.0)
  source_rpt_date         date32     YYYY-MM-DD, fiscal period of the governing filing
  source_publish_date     date32     YYYY-MM-DD, when that filing became public
  eff_date                date32     YYYY-MM-DD, publish + 30d; STRICT: eff_date ≤ trade_date

Universe:        A-shares (SH + SZ + KC + CYB); not BJ
Cadence:         daily, one row per stock per trading day
Coverage:        2016-01-04 → present
PIT discipline:  source_publish_date + 30 days ≤ trade_date (i.e., eff_date ≤ trade_date)
Forward horizon: signal calibrated to 20-day forward returns (per WP v2.2 §3)
Sign:            higher ne_composite_styled = predicts higher 20-day forward return
Layout:          Hive year=YYYY/month=MM/day=DD/data.parquet (+ data.csv.zip mirror)
Delivery:        s3://numinor-construct-data/sam-pm/parquet/  (Hive daily partitions)
                 via API signed-URL minting at https://api.numinor.io/v1/
Realm registry:  s3://numinor-construct-data/sam-pm/realm/ (catalog.json, dials.json,
                 ANALYSIS_DIAL_APPLICABILITY.md)
Build dials:     data tier (pit_buffer_days, sam_level, revenue_threshold, min_peer_count,
                 mom_window) + analysis tier — see realm/dials.json

Appendix B: How SAM PM 构建数据 Relates to SAM Amplifier 构建数据

Both products derive from ChinaScope's SAM data, but they capture different aspects of the network:

                         | SAM Amplifier 构建数据                                                   | SAM PM 构建数据
What it is               | Stock-to-stock relationship graph (peer / upstream / downstream edges)  | Per-stock momentum signal
Schema                   | Edge-list: (focal, counterparty, weight)                                | Stock-day: (date, ts_code, signal value)
Buyer usage              | Apply as aggregation operator on buyer's own factors                    | Use directly as a factor input or sort signal
Updates                  | Daily delta of changed edges                                            | Daily delta of new day's signals
Subscriber asks Gandalf  | "Which stocks are similar to 600519.SH today?"                          | "What's the momentum signal for 600519.SH today?"

Subscribers can purchase one or both. They are complementary, not substitutes — Amplifier captures who is connected to whom; PM captures which stocks have product-level momentum. Bundling discounts available.


End of SAM PM 构建数据 v1.0 specification. Methodology reference: Numinor SAM PM Whitepaper v2.2. Reference implementation: github.com/Numinor-Systems/sam-pm-construct-reference (MIT License). © 2026 Numinor Systems. All rights reserved on product definitions; reference code MIT-licensed.