Product-Level Intelligence: Granular Classification for Stock Selection

The Question

"Technology sector" includes cloud software firms with 80% gross margins and capital-light models alongside semiconductor fabs with 40% margins and multi-billion-dollar capex cycles. "Consumer discretionary" lumps luxury brands with e-commerce logistics companies.

Broad industry classifications obscure more than they reveal. When you build factors, run screens, or construct portfolios using these labels, you're grouping together companies with fundamentally different unit economics, competitive dynamics, and return drivers.

What if you classified companies not by what industry they're in, but by what products they actually make and sell?

The Approach

SAM (Segment and Market) Value Chain provides a 12-layer hierarchical product taxonomy covering the entire Chinese economy—from broad sectors down to granular product nodes like "lithium-ion battery cathode materials" or "cloud infrastructure services."

Using revenue segment data from financial disclosures, we map each company to multiple product nodes, weighted by actual revenue contribution. Unlike single-industry assignment, this creates a multi-label classification: 95% of CSI 300 stocks span 2+ products, and 67% span 5+ products.
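To make the revenue-weighted mapping concrete, here is a minimal sketch. It assumes segment disclosures arrive as (company, product node, revenue) rows; the company names, node labels, and figures are hypothetical and do not come from the SAM dataset.

```python
from collections import defaultdict

def product_exposures(segments):
    """Convert (company, product_node, revenue) rows into revenue-weighted exposures.

    Returns {company: {product_node: weight}}, where each company's weights sum to 1.
    """
    totals = defaultdict(float)
    by_company = defaultdict(lambda: defaultdict(float))
    for company, node, revenue in segments:
        if revenue <= 0:
            continue  # ignore empty or negative segment lines in this sketch
        by_company[company][node] += revenue
        totals[company] += revenue
    return {
        company: {node: rev / totals[company] for node, rev in nodes.items()}
        for company, nodes in by_company.items()
    }

# Hypothetical segment disclosures (illustrative numbers only).
segments = [
    ("BatteryCo", "lithium-ion battery cathode materials", 60.0),
    ("BatteryCo", "energy storage systems", 40.0),
    ("CloudCo", "cloud infrastructure services", 75.0),
    ("CloudCo", "enterprise software", 25.0),
]

exposures = product_exposures(segments)

# Share of companies mapped to two or more product nodes.
multi_label_share = sum(len(w) >= 2 for w in exposures.values()) / len(exposures)
```

Each company ends up with a weight vector over product nodes rather than a single label, which is what allows the multi-label statistics quoted above to be computed.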

We then implement the HIST (Hidden Information for Stock Trend) model, which decomposes stock returns into three information components:

Predefined concept information: Shared signals among stocks in the same product categories (what SAM provides)
Hidden concept information: Latent factors the model discovers (analogous to PCA or other unsupervised learning)
Individual stock information: Idiosyncratic returns not explained by concepts

The model uses a 2-layer GRU (gated recurrent unit) for time-series feature extraction, followed by attention mechanisms that aggregate information across product-related stocks. This lets the model ask: Given what's happening to other stocks making similar products, what should we expect for this stock?
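The aggregation step can be illustrated with a heavily simplified sketch. It is not the HIST implementation: the feature vectors below are hand-written stand-ins for GRU outputs, there are no learned weights, and the function names are hypothetical. It only shows the idea of building concept vectors from member stocks, attending from each stock back to the concepts, and passing the residual on as the part not explained by predefined concepts.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def concept_vectors(features, exposures):
    """Each concept vector is the exposure-weighted average of its member stocks' features."""
    dim = len(next(iter(features.values())))
    sums, weights = {}, {}
    for stock, nodes in exposures.items():
        for node, w in nodes.items():
            acc = sums.setdefault(node, [0.0] * dim)
            for d in range(dim):
                acc[d] += w * features[stock][d]
            weights[node] = weights.get(node, 0.0) + w
    return {node: [v / weights[node] for v in acc] for node, acc in sums.items()}

def shared_information(features, concepts):
    """Softmax attention from each stock to every concept, using cosine similarity as the score."""
    shared = {}
    for stock, x in features.items():
        scores = {node: cosine(x, e) for node, e in concepts.items()}
        z = sum(math.exp(s) for s in scores.values())
        attn = {node: math.exp(s) / z for node, s in scores.items()}
        shared[stock] = [
            sum(attn[n] * concepts[n][d] for n in concepts) for d in range(len(x))
        ]
    return shared

# Stand-ins for per-stock GRU outputs (hypothetical values).
features = {"A": [1.0, 0.0], "B": [0.8, 0.2], "C": [0.0, 1.0]}
# Revenue-weighted product exposures (hypothetical values).
exposures = {"A": {"p1": 1.0}, "B": {"p1": 0.5, "p2": 0.5}, "C": {"p2": 1.0}}

concepts = concept_vectors(features, exposures)
shared = shared_information(features, concepts)

# What the predefined concepts do not explain is handed to the next stage.
residual = {
    s: [x - y for x, y in zip(features[s], shared[s])] for s in features
}
```

In the full model the residual would feed the hidden-concept stage, and whatever remains after that would be treated as individual stock information.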

The Finding

Using SAM's product-level concepts as predefined inputs, the HIST model achieved 13.45% excess return over the CSI 300 benchmark—outperforming an industry-based control group by 3.94 percentage points.

The alpha came from granularity. Product-level classification captured business model similarities that coarse industry labels missed. A semiconductor equipment manufacturer and a semiconductor material supplier both serve the same value chain, face similar demand drivers, and correlate in returns—but they're often classified in different industries (industrials vs. materials). SAM's product taxonomy unites them, and the HIST model exploits that shared information.

The synthesized concept factor became more effective as the holding period lengthened, suggesting the model captures structural, slow-moving co-movements rather than transient noise. After industry and market-cap neutralization, stock-specific information (idiosyncratic returns) performed better over short horizons, while concept-based factors dominated over longer ones. This aligns with intuition: fundamentals diffuse slowly through product networks.
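The text does not say how the neutralization was carried out. One common approach, sketched here as an assumption, is to take the residual from a cross-sectional regression of the factor on industry dummies and log market cap. The sketch demeans both the factor and log market cap within each industry, then removes the remaining size slope, which yields the same residual as the joint regression. All input values are hypothetical.

```python
import math
from collections import defaultdict

def demean_by_group(values, groups):
    """Subtract each group's mean from its members."""
    sums, counts = defaultdict(float), defaultdict(int)
    for v, g in zip(values, groups):
        sums[g] += v
        counts[g] += 1
    return [v - sums[g] / counts[g] for v, g in zip(values, groups)]

def neutralize(factor, industries, market_caps):
    """Residual of the factor after removing industry means and a log-size slope."""
    f = demean_by_group(factor, industries)
    size = demean_by_group([math.log(c) for c in market_caps], industries)
    denom = sum(s * s for s in size)
    beta = sum(a * b for a, b in zip(f, size)) / denom if denom > 0 else 0.0
    return [a - beta * b for a, b in zip(f, size)]

# Hypothetical cross-section of six stocks in two industries.
factor = [0.5, 0.1, -0.2, 0.9, 0.3, -0.4]
industries = ["tech", "tech", "tech", "materials", "materials", "materials"]
market_caps = [100.0, 50.0, 20.0, 80.0, 40.0, 10.0]

neutral = neutralize(factor, industries, market_caps)
```

The resulting values average to zero within each industry and are uncorrelated with log market cap, so any remaining predictive power is not a disguised industry or size bet.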

Point-in-time logic ensured no forward-looking bias: the model only uses revenue data available at the time of prediction, avoiding the classic pitfall of alternative data strategies that accidentally leak future information into training sets.
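A minimal sketch of the point-in-time rule follows. The record layout and field names (`period_end`, `published`) are hypothetical, not a SAM schema. The key choice is to filter on the publication date rather than the fiscal period end, since a report covering December is usually not public until months later.

```python
from datetime import date

# Hypothetical segment reports; field names and figures are illustrative.
records = [
    {"company": "BatteryCo", "period_end": date(2021, 12, 31),
     "published": date(2022, 4, 15),
     "segments": {"cathode materials": 55.0, "storage systems": 45.0}},
    {"company": "BatteryCo", "period_end": date(2022, 12, 31),
     "published": date(2023, 4, 20),
     "segments": {"cathode materials": 60.0, "storage systems": 40.0}},
]

def segments_as_of(records, company, as_of):
    """Return the latest report for `company` that was already public on `as_of`."""
    visible = [
        r for r in records
        if r["company"] == company and r["published"] <= as_of
    ]
    if not visible:
        return None  # nothing was known yet, so no exposure can be assigned
    # Prefer the most recent fiscal period; break ties by latest publication.
    return max(visible, key=lambda r: (r["period_end"], r["published"]))

# On 1 March 2023 the FY2022 report has ended its period but is not yet public.
snapshot = segments_as_of(records, "BatteryCo", date(2023, 3, 1))
```

Here `snapshot` is the FY2021 report, even though the FY2022 period has already closed, because the later report had not been published on the prediction date.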

Try It Yourself

Product-level classification requires rethinking how you structure data pipelines and factor models—but the payoff is a more accurate representation of economic reality.

Practical applications:

Multi-label factor models: Replace single-industry dummy variables with weighted product exposures when running cross-sectional regressions
Pair selection: Find statistically similar stocks based on product overlap, not arbitrary sector assignments
Thematic investing: Build product-specific baskets (e.g., "EV battery supply chain") that capture the entire value chain, not just one industry slice
Risk management: Decompose portfolio exposure by product category to identify hidden concentration risks
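Two of these applications, pair selection and risk decomposition, can be sketched with the same revenue-weighted exposure vectors. The example below is illustrative only: company names, product nodes, and weights are hypothetical, and cosine similarity is one reasonable overlap measure among several.

```python
import math

def product_similarity(a, b):
    """Cosine similarity between two {product_node: weight} exposure dicts."""
    dot = sum(w * b.get(node, 0.0) for node, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0

def portfolio_product_exposure(holdings, exposures):
    """Aggregate portfolio weight by product node: sum of weight * revenue share."""
    total = {}
    for stock, weight in holdings.items():
        for node, share in exposures.get(stock, {}).items():
            total[node] = total.get(node, 0.0) + weight * share
    return total

# Hypothetical exposures; EquipCo and MaterialCo would sit in different industries.
exposures = {
    "EquipCo": {"semiconductor equipment": 0.7, "wafer fabrication materials": 0.3},
    "MaterialCo": {"wafer fabrication materials": 0.8, "specialty chemicals": 0.2},
    "RetailCo": {"e-commerce logistics": 1.0},
}

# Pair selection: rank candidates by product overlap rather than sector label.
sim_equip_material = product_similarity(exposures["EquipCo"], exposures["MaterialCo"])
sim_equip_retail = product_similarity(exposures["EquipCo"], exposures["RetailCo"])

# Risk management: look through holdings to product-level concentration.
holdings = {"EquipCo": 0.4, "MaterialCo": 0.4, "RetailCo": 0.2}
by_product = portfolio_product_exposure(holdings, exposures)
```

In this toy portfolio the largest product exposure is wafer fabrication materials, a concentration that a single-industry breakdown would split across two sectors.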

Interested in integrating SAM taxonomy into your research workflow? Book a call to discuss data structure, model architecture, and backtesting frameworks.

Related Use Cases

Beyond Industry Labels: Mining Alpha from Company Relationship Networks

Proportional Sector Exposure: Precision Risk Management with Revenue Data

Growth Classification and Dynamic Valuation: Screening Beyond Analyst Coverage