Numinor C2C Supply-Chain Construct Data — Data Dictionary v1.0
SKU c2c-construct-v1 · Methodology Numinor C2C Supply-Chain Whitepaper v3.0 ·
Canonical repo Numinor-Systems/c2c-codebase (MIT) · June 2026
1. What this is
A daily-refreshed, point-in-time edge table of observed company-to-company supply-chain relationships between Chinese A-share listed companies. Two observation channels, one table:
- disclosed — mandatory periodic-report disclosures: top-5 customers / top-5 suppliers and related-party transactions (reporting periods from 2015);
- bid — awarded procurement contracts (winning-bid announcements, from 2020).
Both channels are resolved to listed-company identity on both sides through ChinaScope's structured affiliate-ownership tables: subsidiaries and operating entities roll up to their listco parents with real ownership ratios. No language model is involved anywhere in the construction; the rollup consists of deterministic joins over structured identifier tables.
The product is the graph, not a signal. The customer-momentum signal validated in the whitepaper (union orthogonal ICIR +0.470 full / +0.394 OOS, t = 2.8) is one construction over this table; the same edges support supplier-side spillover, concentration and counterparty-risk measures, network centrality, and shock-propagation studies.
2. Delivery and layout
s3://numinor-construct-data/c2c/parquet/year=YYYY/month=MM/day=DD/data.parquet
data.csv.zip (same columns)
s3://numinor-construct-data/c2c/_heartbeat.json (freshness)
One live store, all tiers — Sandbox/Realm computes in-kernel on exactly the data the API delivers; freshness is identical everywhere. Refresh is daily by 16:00 Asia/Shanghai, seven days a week.
Partitions are delivery batches, not the PIT clock. A partition holds the edges
emitted that day (live: the day the records arrived from the source feed; historical
backfill: the day the edge became public). Always filter on the eff_date
column — never on partition dates. The stream is append-only: corrections arrive as
new rows with the same edge_id and a CDC operation flag; the latest row per
edge_id is the current state.
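The append-only semantics above can be sketched on a toy stream. The rows, ids, and tickers below are hypothetical placeholders; only the column names come from the schema. The sketch assumes that ingestion_date orders the versions of an edge and that D marks a deletion, as the Quickstart filter implies.

```python
import pandas as pd

# Hypothetical toy stream: one edge that is added then updated,
# and one edge that is added then deleted.
stream = pd.DataFrame({
    "edge_id": ["d:r1:999991.SH:999992.SZ", "d:r1:999991.SH:999992.SZ",
                "b:b7:999993.BJ:999991.SH", "b:b7:999993.BJ:999991.SH"],
    "operation": ["A", "U", "A", "D"],
    "relation_value_cny": [100.0, 120.0, 50.0, 50.0],
    "ingestion_date": pd.to_datetime(
        ["2026-01-05", "2026-02-10", "2026-01-07", "2026-03-01"]),
})

def current_state(rows: pd.DataFrame) -> pd.DataFrame:
    """Keep the latest row per edge_id, then drop edges whose latest flag is D."""
    latest = (rows.sort_values("ingestion_date", kind="stable")
                  .drop_duplicates("edge_id", keep="last"))
    return latest[latest["operation"] != "D"].reset_index(drop=True)

state = current_state(stream)
print(state[["edge_id", "operation", "relation_value_cny"]])
```

Only the updated disclosed edge survives: the bid edge's latest row is a deletion, so it drops out of the current state.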
3. Schema (25 columns, in order)
| # | Column | Type | Meaning |
|---|---|---|---|
| 1 | edge_id | string | Stable identity: d:<record>:<sup>:<cus> (disclosed) / b:<bid>:<sup>:<cus> (bid). Latest row per edge_id wins. |
| 2 | source | enum | disclosed / bid |
| 3 | relation_type | enum | trade (disclosed) / procurement_award (bid) |
| 4 | supplier_ts | string | Seller (focal) listco — NNNNNN.SH/.SZ/.BJ |
| 5 | customer_ts | string | Buyer listco |
| 6 | relation_value_cny | double | Ownership-adjusted economic value: raw value × both ownership ratios |
| 7 | raw_value_cny | double | Pre-adjustment source value (bid awards occasionally lack a parsed price) |
| 8 | balance_value_cny | double | Ownership-adjusted ending balance — disclosed only |
| 9 | supplier_own_ratio | double | Listco ownership of the seller party (1.0 = the listco itself) |
| 10 | customer_own_ratio | double | Same, buyer side |
| 11 | supplier_resolution | enum | direct / affiliate_rollup |
| 12 | customer_resolution | enum | direct / affiliate_rollup |
| 13 | raw_supplier_id | string | ChinaScope entity id of the party as disclosed |
| 14 | raw_customer_id | string | Same, buyer side |
| 15 | raw_supplier_name | string | Bid channel only (the disclosed source carries ids) |
| 16 | raw_customer_name | string | Same, buyer side |
| 17 | source_record_id | string | The underlying disclosure record / bid id — audit trail to the raw feed |
| 18 | source_rpt_date | date | Reporting period (disclosed) / award date (bid) |
| 19 | source_publish_date | date | When the edge became public (see §4) |
| 20 | source_basis | enum | filing_publish / rpt_proxy / award_announcement |
| 21 | eff_date | date | The PIT contract: source_publish_date + pit_buffer_days |
| 22 | pit_buffer_days | int32 | The buffer baked into column 21 (product default 30) |
| 23 | operation | enum | CDC flag as delivered (A/U/D) |
| 24 | ingestion_date | date | When the record arrived (live) / its publish-basis day (backfill) |
| 25 | data_vintage | string | Ownership-mapping snapshot (YYYYMMDD) used for the rollup |
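As a worked illustration of column 6, the sketch below recomputes raw value × both ownership ratios and flags rows where relation_value_cny disagrees. It assumes the identity holds row by row to floating-point tolerance and that a missing raw value produces a missing adjusted value; neither is stated in the dictionary. The helper name value_mismatches and the toy numbers are hypothetical.

```python
import numpy as np
import pandas as pd

def value_mismatches(edges: pd.DataFrame, rtol: float = 1e-9) -> pd.DataFrame:
    """Rows whose relation_value_cny differs from raw value x both ownership ratios."""
    expected = (edges["raw_value_cny"]
                * edges["supplier_own_ratio"]
                * edges["customer_own_ratio"])
    # equal_nan=True treats "both missing" as a match (assumption, see lead-in).
    ok = np.isclose(edges["relation_value_cny"], expected, rtol=rtol, equal_nan=True)
    return edges[~ok]

# Hypothetical rows: the second one is deliberately inconsistent.
toy = pd.DataFrame({
    "raw_value_cny": [1_000_000.0, 2_000_000.0, float("nan")],
    "supplier_own_ratio": [0.5, 1.0, 1.0],
    "customer_own_ratio": [0.8, 0.51, 1.0],
    "relation_value_cny": [400_000.0, 2_000_000.0, float("nan")],
})
bad = value_mismatches(toy)
print(bad)
```

In the first row a 1,000,000 CNY raw value with a 50%-owned seller entity and an 80%-owned buyer entity gives 400,000 CNY of ownership-adjusted value.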
4. Point-in-time rule
eff_date = source_publish_date + pit_buffer_days (default 30, dial 0–120)
- Disclosed: source_publish_date = the earliest filing publication date of either resolved listco for that reporting period (source_basis = 'filing_publish'). Where no filing date resolves, a conservative report-period + 30d proxy applies, marked 'rpt_proxy'.
- Bid: the award result/announcement date itself ('award_announcement').
- The 30-day default decomposes as ~4 days of measured availability necessity (p99 delivery latency of the underlying feed, measured across a full filing season) plus ~26 days of buyer-workflow allowance. Because source_publish_date ships in every row, you can apply any buffer instantly: recompute eff_date from the basis; nothing needs rebuilding.
- The whitepaper's research convention (report-date + 60d disclosed, +0d bid) is pinned in the frozen data package for bit-exact replication; the feed is the forward-looking product convention.
- Naming note: whitepaper Appendix A.6 sketched these fields as available_date (its schema is explicitly representative); the production columns follow the platform-wide Date 1–4 audit standard: source_rpt_date ≤ source_publish_date ≤ eff_date, enforced row-exact by the validation gate.
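Applying a different buffer on the buyer side might look like the following minimal sketch. The helper names rebuffer and date_order_ok and the toy rows are hypothetical; the 0 to 120 range check simply mirrors the documented dial, and the ordering check restates the Date 1–4 inequality rather than reproducing the actual validation gate.

```python
import pandas as pd

def rebuffer(edges: pd.DataFrame, buffer_days: int) -> pd.DataFrame:
    """Return a copy with eff_date recomputed as source_publish_date + buffer_days."""
    if not 0 <= buffer_days <= 120:
        raise ValueError("buffer_days outside the documented 0-120 dial")
    out = edges.copy()
    publish = pd.to_datetime(out["source_publish_date"])
    out["eff_date"] = publish + pd.Timedelta(days=buffer_days)
    out["pit_buffer_days"] = buffer_days
    return out

def date_order_ok(edges: pd.DataFrame) -> bool:
    """Check source_rpt_date <= source_publish_date <= eff_date on every row."""
    rpt = pd.to_datetime(edges["source_rpt_date"])
    pub = pd.to_datetime(edges["source_publish_date"])
    eff = pd.to_datetime(edges["eff_date"])
    return bool(((rpt <= pub) & (pub <= eff)).all())

# Hypothetical rows delivered with the default 30-day buffer.
toy = pd.DataFrame({
    "source_rpt_date": ["2025-12-31", "2026-03-02"],
    "source_publish_date": ["2026-04-28", "2026-03-02"],
    "eff_date": ["2026-05-28", "2026-04-01"],
    "pit_buffer_days": [30, 30],
})
tight = rebuffer(toy, 10)
print(tight[["source_publish_date", "eff_date", "pit_buffer_days"]])
```

Because the recomputation touches only eff_date and pit_buffer_days, the same delivered rows can be re-read under several buffers without refetching anything.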
5. Construction notes a buyer should know
- Clean vintage discipline: one row per source record per genuine listco parent; the ownership mapping is deduplicated to the latest vintage per (entity, parent), and multi-parent entities legitimately yield one edge per parent, ownership-weighted. The production construction was re-run through the whitepaper's evaluation harness before launch: union orthogonal ICIR +0.465 / +0.381 vs the published +0.470/+0.394 — the finding is construction-robust.
- Self-edges are excluded (both sides resolving to the same listco).
- Ownership ratios are carried as delivered; rare source anomalies with ratios slightly above 1.0 exist and are not clipped (provenance-faithful; quantified in the release audit).
- Negative source amounts (~0.15% of disclosed records, arising from filing reversals/corrections) are excluded: a negative relationship value has no meaning as a graph weight. Quantified in the release audit.
- Coverage character: disclosed edges cluster at filing seasons (Apr/Aug/Oct); the bid channel is a steadier daily trickle from 2020. ~4,860 unique sellers reach the union construction historically (~half the tradable A-share universe per month-end).
- Validation gate: every partition passes a schema/enum/identifier/PIT-identity/degeneracy gate before upload; the heartbeat carries the latest validation block.
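A buyer who wants to confirm these construction properties on a delivered slice could run a few counts like the sketch below. The helper name construction_report and the toy rows are hypothetical, and the sketch assumes the negative-amount exclusion is visible on raw_value_cny. Per the notes above, the first two counts are expected to be zero, while ratios slightly above 1.0 may legitimately appear.

```python
import pandas as pd

def construction_report(edges: pd.DataFrame) -> dict:
    """Count rows that touch the construction notes: self-edges, negatives, ratios > 1."""
    self_edges = edges["supplier_ts"] == edges["customer_ts"]
    negatives = edges["raw_value_cny"] < 0  # missing values compare as False
    high_ratio = (edges["supplier_own_ratio"] > 1.0) | (edges["customer_own_ratio"] > 1.0)
    return {
        "self_edges": int(self_edges.sum()),
        "negative_values": int(negatives.sum()),
        "ratios_above_one": int(high_ratio.sum()),
    }

# Hypothetical rows with placeholder tickers; the second row carries a ratio anomaly.
toy = pd.DataFrame({
    "supplier_ts": ["999991.SH", "999992.SZ", "999993.BJ"],
    "customer_ts": ["999992.SZ", "999993.BJ", "999991.SH"],
    "raw_value_cny": [10.0, 20.0, None],
    "supplier_own_ratio": [1.0, 1.02, 0.6],
    "customer_own_ratio": [1.0, 1.0, 0.9],
})
report = construction_report(toy)
print(report)
```

Whether to clip ratios above 1.0 before weighting is left to the buyer; the feed itself carries them as delivered.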
6. Quickstart
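import pandas as pd
import pyarrow.dataset as pads
from pyarrow import fs

# Open the hive-partitioned Parquet store (bucket path without the s3:// scheme).
ds = pads.dataset("numinor-construct-data/c2c/parquet", format="parquet",
                  partitioning="hive", filesystem=fs.S3FileSystem(region="ap-northeast-2"))
# date_as_object=False yields datetime64 columns, which compare cleanly with pd.Timestamp.
edges = ds.to_table().to_pandas(date_as_object=False)

# current state of the graph usable at time t (the only correct PIT read):
t = pd.Timestamp("2026-04-30")
live = (edges[edges["eff_date"] <= t]
        .sort_values("ingestion_date", kind="stable")  # ties keep their read order
        .drop_duplicates("edge_id", keep="last")       # whole latest row; groupby().last() skips nulls per column
        .query("operation != 'D'"))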
To rebuild the whitepaper's momentum signal on top, use the MIT reference implementation
(Numinor-Systems/c2c-codebase, notebooks 01–04) — build_disclosed,
build_bid_band_median, standardize_union, orth_eval.
Ask Gandalf for help
Stuck on integration? Open Gandalf (your onsite AI assistant) and ask:
- "How do I read the C2C edge table point-in-time as of a backtest date?"
- "How do I rebuild the whitepaper's union signal against my own factor book?"
- "What does source_basis = 'rpt_proxy' mean for my backtest?"
Gandalf has context on this dictionary, the reference code, and the methodology.
7. Versioning
| Component | Version |
|---|---|
| Edge-table schema | 1.0 |
| Methodology | Whitepaper v3.0 |
| Frozen research vintage | c2c-data-package @ 7cb44891 (immutable) |
| Codebase / reference impl | c2c-codebase ≥ 1.1.0 (MIT) |
Schema changes bump the schema version and are announced ahead of effect; the frozen whitepaper vintage is never modified.
Numinor Systems Limited · Gandalf (onsite) · support@numinor.io