NuminorBeta

Numinor Co-Movement Graph Construct Data — Data Dictionary v1.0

SKU comovement-graph-v1 · Methodology Numinor Co-Movement Graph Whitepaper v1.3 · Canonical repo Numinor-Systems/comovement-codebase (MIT) · June 2026


1. What this is

A point-in-time, pairwise relationship feed over China A-shares that forecasts which names genuinely co-move — and flags which observed correlations are spurious. It is built in two layers:

  • The substrate — the news co-movement graph. Every pair in the feed is one the market is co-mentioning in news over the trailing 90 days. This is the candidate universe: broad, timely, and by itself noisy.
  • The overlay — the structural grading. Each co-mentioned pair is annotated with four structural "lamps" (deep product peer, SAM supply chain, disclosed customer–supplier, affiliate). A pair with ≥1 lamp is confirmed; a pair with none — co-mentioned but structurally unexplained — is dark (the blacklist).

The product is a risk/correlation layer, not a return signal and not a portfolio. It ships as two tables: the edge feed (the graph) and a derived per-stock peer set (the hedging / comparables view). Every number in the whitepaper is reproduced from these tables by the reference codebase (verify_outputs.py → "✓ matches whitepaper").

This is not a structural-relationship map. A network-only feed would list every supply-chain or affiliate pair across ~6,000 names — most dormant, and silent on which observed correlations are spurious. By scoping to the news substrate, the feed keeps only the links the market is currently acting on, and gains the one field a static map cannot have: the blacklist.

2. Delivery and layout

Follows the Numinor Construct Data standard (one live store, all tiers; see NUMINOR_CONSTRUCT_DATA_SPEC.md):

$COMOVEMENT_DATA_DIR/comovement_edges/T=YYYY-MM-DD.parquet   (the edge feed — the graph)
$COMOVEMENT_DATA_DIR/comovement_peers/T=YYYY-MM-DD.parquet   (the per-stock peer set)
$COMOVEMENT_DATA_DIR/model_coefficients.json                 (the forecast model)

Cadence: a daily point-in-time snapshot (the Ops-Desk daily cron, scripts/cron/run_cron.py). The live feed publishes one partition per trading day to s3://numinor-construct-data/comovement/parquet/year=/month=/day=/data.parquet (edges) + comovement/peers/... (peer set); the trailing-90d co-mention window and trailing correlation roll each day, so today's partition is the graph as-of today. The research vintage above (comovement_edges/T=YYYY-MM-DD.parquet, one file per month-end) is what reproduces the whitepaper — the daily feed's month-ends are exactly those vintages.

The PIT contract is the trade_date column. A partition holds the graph as it stood that day; to read the graph usable at backtest time t, take the latest partition with trade_date <= t.
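The "latest partition with trade_date <= t" rule can be sketched against the file layout above. This is a minimal illustration, assuming the T=YYYY-MM-DD.parquet naming shown in this section and a local directory; the helper names partition_date and latest_partition are hypothetical and not part of the reference codebase.

```python
from datetime import date
from pathlib import Path
from typing import Optional


def partition_date(path: Path) -> date:
    """Parse the as-of date out of a 'T=YYYY-MM-DD.parquet' file name."""
    return date.fromisoformat(path.stem.removeprefix("T="))


def latest_partition(table_dir: Path, t: date) -> Optional[Path]:
    """Return the newest partition dated on or before t, or None if there is none."""
    candidates = [p for p in table_dir.glob("T=*.parquet") if partition_date(p) <= t]
    return max(candidates, key=partition_date, default=None)
```

Resolving the partition from the file name avoids opening any file dated after t, which keeps a backtest loop from touching future data by accident.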

3. Table A — comovement_edges (the edge feed)

One row per (trade_date, ts_a, ts_b) news-co-mention A-share pair (~54k pairs/vintage over ~5,450 names). Columns are ordered substrate → overlay → output → scope.

| Layer | Column | Type | Meaning |
| --- | --- | --- | --- |
| key | trade_date | date | as-of month-end (the PIT contract) |
| key | ts_a, ts_b | string | the A-share pair, canonical order ts_a < ts_b (NNNNNN.SH/.SZ/.BJ) |
| substrate | comention_strength | int32 | news co-mention weight over the trailing window (how heavily the two are linked in the news) |
| substrate | comention_days | int32 | distinct days the pair was co-mentioned in the window |
| substrate | trailing_corr | double | realized 90-day return co-movement (the normalized measure of §3.2 of the whitepaper) |
| overlay | lamp_product_peer | bool | the two firms share a product (any depth on the SAM tree); the shallow-peer base |
| overlay | peer_depth | int8 | deepest shared level on the SAM product tree (≥4 = same specific product) |
| overlay | lamp_deep_peer | bool | peer_depth >= 4; the standout forecasting lamp |
| overlay | lamp_sam_chain | bool | SAM supply-chain link (core inputs), product-derived and continuously available |
| overlay | lamp_disclosed | bool | financial-statement disclosed customer–supplier, active within 2 years, bids excluded |
| overlay | lamp_affiliate | bool | common ownership / cross-holding |
| output | n_lamps | int8 | how many of the four lamps are lit (the relatedness gate) |
| output | confirmed | bool | n_lamps >= 1: structurally explained co-movement |
| output | dark | bool | n_lamps == 0: co-mentioned, structurally unconfirmed (the discount flag) |
| output | tier | string | deep_peer / multi / single / dark (the grading ladder) |
| output | expected_fwd_corr | double | forward-correlation forecast = const + trailing·c + dark·c + Σ lamp·c (develop-fit coefficients) |
| output | corr_delta | double | expected_fwd_corr − trailing_corr: where the graph disagrees with the price screen |
| scope | cap_tier | string | CSI300 / CSI500 / CSI1000 / other; pair is in a tier iff both names are members (the hedging edge is large-cap; §8) |
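The output columns n_lamps, confirmed and dark follow mechanically from the four lamps, and a short sketch makes the gate explicit. The tier precedence used here (dark, then deep_peer, then multi for two or more lamps, else single) is an assumption: this dictionary names the four tiers but does not spell out how they are assigned. The Lamps class and the grade and lamp_deep_peer helpers are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lamps:
    deep_peer: bool   # lamp_deep_peer
    sam_chain: bool   # lamp_sam_chain
    disclosed: bool   # lamp_disclosed
    affiliate: bool   # lamp_affiliate


def lamp_deep_peer(peer_depth: int) -> bool:
    """Deep-peer lamp as defined in Table A: peer_depth >= 4."""
    return peer_depth >= 4


def grade(lamps: Lamps) -> dict:
    """Derive the output flags from the four lamps."""
    n_lamps = sum([lamps.deep_peer, lamps.sam_chain, lamps.disclosed, lamps.affiliate])
    if n_lamps == 0:
        tier = "dark"
    elif lamps.deep_peer:       # ASSUMPTION: deep_peer outranks multi
        tier = "deep_peer"
    elif n_lamps >= 2:          # ASSUMPTION: multi means two or more lamps
        tier = "multi"
    else:
        tier = "single"
    return {"n_lamps": n_lamps, "confirmed": n_lamps >= 1,
            "dark": n_lamps == 0, "tier": tier}
```

By construction confirmed and dark are complements, so every row of the edge feed carries exactly one of the two flags.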

4. Table B — comovement_peers (the per-stock peer set)

Derived from Table A: each name's top-10 structural peers, the raw material for a hedge basket or a comparables set. One row per (trade_date, ts_code, peer_rank).

| Column | Type | Meaning |
| --- | --- | --- |
| trade_date | date | as-of month-end |
| ts_code | string | the focal A-share name |
| peer_rank | int | 1 = closest peer (rank by relatedness, then co-mention strength) |
| peer_ts | string | the peer name |
| peer_score | int | relatedness score = 2·deep_peer + 2·disclosed + n_lamps |
| tier | string | the pair's grading tier |
| trailing_corr | double | the pair's trailing correlation |
| expected_fwd_corr | double | the pair's forward-correlation forecast |
| confirmed | bool | structurally confirmed (always true for ranked peers) |
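The derivation of Table B from Table A can be sketched as follows. This is an illustration under stated assumptions, not the reference implementation: it assumes each undirected edge contributes a peer row to both of its endpoints, that only confirmed pairs are ranked (consistent with the confirmed column above), and that ties beyond co-mention strength are left in input order. Edges are plain dicts keyed by the Table A column names; peer_score and build_peers are hypothetical helper names.

```python
from collections import defaultdict


def peer_score(edge: dict) -> int:
    """Relatedness score from Table B: 2*deep_peer + 2*disclosed + n_lamps."""
    return 2 * int(edge["lamp_deep_peer"]) + 2 * int(edge["lamp_disclosed"]) + edge["n_lamps"]


def build_peers(edges: list, top_n: int = 10) -> dict:
    """Turn undirected confirmed edges into a ranked peer list per focal stock."""
    by_stock = defaultdict(list)
    for e in edges:
        if not e["confirmed"]:           # dark pairs never appear as ranked peers
            continue
        # ASSUMPTION: the pair is listed under both of its endpoints
        for focal, peer in ((e["ts_a"], e["ts_b"]), (e["ts_b"], e["ts_a"])):
            by_stock[focal].append({
                "peer_ts": peer,
                "peer_score": peer_score(e),
                "comention_strength": e["comention_strength"],
            })
    ranked = {}
    for focal, rows in by_stock.items():
        # rank by relatedness first, then by co-mention strength
        rows.sort(key=lambda r: (-r["peer_score"], -r["comention_strength"]))
        ranked[focal] = [{"peer_rank": i + 1, **r} for i, r in enumerate(rows[:top_n])]
    return ranked
```

Because deep_peer and disclosed are also counted inside n_lamps, each of those two lamps effectively weighs three points against one point for the SAM-chain and affiliate lamps.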

5. Point-in-time rule

Every layer is constructed as-of trade_date with no look-ahead:

  • Substrate — co-mention is measured over the trailing 90 days ending trade_date; trailing_corr uses returns up to and including trade_date.
  • Lamps — product versions and ownership enter as-of trade_date; the disclosed lamp uses only financial-statement relationships with an availability date in (trade_date − 2yr, trade_date], rebuilt point-in-time (not accumulated), so a relationship that has gone quiet drops out. Tender/bid awards are excluded — they do not forecast co-movement out-of-sample (whitepaper §8).
  • expected_fwd_corr uses coefficients fitted on the develop window only (2017–mid-2022), so a holdout pair's forecast never sees holdout data.
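The half-open window for the disclosed lamp, (trade_date − 2yr, trade_date], can be illustrated with a small check. This sketch assumes "2yr" means two calendar years and that each relationship carries a list of availability dates; the function names are hypothetical and the actual construction lives in the reference codebase.

```python
from datetime import date


def years_before(d: date, years: int) -> date:
    """Same calendar day `years` earlier; Feb 29 maps to Feb 28."""
    try:
        return d.replace(year=d.year - years)
    except ValueError:
        return d.replace(year=d.year - years, day=28)


def disclosed_lamp_lit(availability_dates: list, trade_date: date) -> bool:
    """True if any disclosure became available in (trade_date - 2yr, trade_date]."""
    window_start = years_before(trade_date, 2)
    return any(window_start < d <= trade_date for d in availability_dates)
```

Evaluating the window afresh at every trade_date, rather than carrying forward an accumulated set, is what lets a relationship that has gone quiet drop out: the same disclosure lights the lamp on one date and no longer does two years later.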

No forward field is ever shipped: forward correlation exists only inside the replication harness, where it is computed at analysis time to validate the feed.

6. Construction notes a buyer should know

  • Universe character. The feed concentrates on well-covered names — small caps are not co-mentioned heavily enough to enter the graph with signal. This is why the hedging edge is large-cap (whitepaper §8): the architecture and the scope boundary come from the same fact.
  • confirmed/dark is the hero field. It exists only because the news substrate gives an observed co-movement to certify. Confirmed correlations retain ~92% of their level a quarter forward vs ~75% for price-screened pairs.
  • expected_fwd_corr is a structural forecast, not a blacklist penalty. For a dark pair it falls back to the no-structure baseline; the blacklist lives in the dark flag and the retention evidence, not in a negative forecast.
  • Risk, not alpha. The graph forecasts how names move together, not which way. A return-spillover signal on the same graph is flat-to-negative out-of-sample (whitepaper §8).
  • Not a global risk model. Adding the graph to a whole-universe minimum-variance optimizer does not reduce realized variance; the value is targeted (single-name hedging, pairs, concentrated-position risk).
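To make the expected_fwd_corr note concrete, the linear form given in Table A (const + trailing·c + dark·c + Σ lamp·c) can be written out as below. The layout of model_coefficients.json is not documented in this dictionary, so the coefficient keys here are hypothetical, and the numbers in PLACEHOLDER_COEF are made up purely so the sketch runs; they are not the shipped develop-fit values and their signs carry no meaning.

```python
# ASSUMPTION: hypothetical key names and made-up values, not the shipped model.
PLACEHOLDER_COEF = {
    "const": 0.10,
    "trailing_corr": 0.50,
    "dark": 0.0,
    "lamp_deep_peer": 0.08,
    "lamp_sam_chain": 0.02,
    "lamp_disclosed": 0.04,
    "lamp_affiliate": 0.03,
}


def expected_fwd_corr(edge: dict, coef: dict) -> float:
    """const + trailing*c + dark*c + sum over lamps of lamp*c."""
    value = coef["const"] + coef["trailing_corr"] * edge["trailing_corr"]
    value += coef["dark"] * float(edge["dark"])
    for name, c in coef.items():
        if name.startswith("lamp_"):     # one term per lamp present in the model
            value += c * float(edge.get(name, False))
    return value


def corr_delta(edge: dict, coef: dict) -> float:
    """Forecast minus trailing correlation, as in the corr_delta column."""
    return expected_fwd_corr(edge, coef) - edge["trailing_corr"]
```

With every lamp off, the lamp terms vanish and the forecast reduces to the constant, the trailing term and the dark term, which is the no-structure baseline described above.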

7. Quickstart

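import os
import pandas as pd
import pyarrow.dataset as pads

DATA = os.environ["COMOVEMENT_DATA_DIR"]   # root of the live store (see §2)

edges = pads.dataset(f"{DATA}/comovement_edges", format="parquet",
                     partitioning="hive").to_table().to_pandas()
# date columns can arrive as datetime.date objects; convert so they compare with a Timestamp
edges["trade_date"] = pd.to_datetime(edges["trade_date"])

# the graph usable at time t (latest vintage on/before t):
t = pd.Timestamp("2026-04-30")
asof = edges.loc[edges.trade_date <= t, "trade_date"].max()
g = edges[edges.trade_date == asof]

confirmed = g[g.confirmed]          # trust these correlations
blacklist = g[g.dark]               # discount these: co-moving for no structural reason
# read the peer set for the same vintage as the edges (asof, not t, in case t is not a vintage date)
peers = pd.read_parquet(f"{DATA}/comovement_peers/T={asof:%Y-%m-%d}.parquet")
hedge_basket = peers[peers.ts_code == "600519.SH"].peer_ts.tolist()   # top-10 graph hedge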

To reproduce the whitepaper, run scripts/verify_outputs.py (reference codebase, Numinor-Systems/comovement-codebase) → "✓ matches whitepaper".

8. Versioning

| Component | Version |
| --- | --- |
| Feed schema | 1.0 |
| Methodology | Whitepaper v1.3 |
| Forecast model | develop-fit, shipped as model_coefficients.json |
| Codebase / reference impl | comovement-codebase ≥ 1.0.0 (MIT) |

Schema changes bump the schema version and are announced ahead of effect.


Numinor Systems Limited · support@numinor.io