Signal from Noise: Precision Filtering for News-Based Alpha

The Question

A news article mentions 47 companies in a market roundup. Another article analyzes a single company's product launch in depth. Both get aggregated into sentiment scores. Which one should matter more for stock selection?

Raw news volume is not raw alpha. Most mentions are noise—generic references, boilerplate industry overviews, tangential co-occurrences. The challenge isn't accessing news data; it's systematically separating signal from distraction.

How do you filter news in a way that amplifies predictive power without over-fitting to historical quirks?

The Approach

We test four dimensions of news filtering using SmarTag's structured metadata:

1. Relevance Scoring
SmarTag assigns each company mention a relevance score (0-1) based on linguistic context—does the article focus on this company, or just mention it in passing? We filter out low-relevance mentions (scores < 0.3) to remove noise from market roundups and generic industry commentary.

2. Sentiment Significance
Not all sentiment is created equal. An article with 45% positive and 40% negative sentiment is ambiguous—likely balanced or hedged. We filter for high-conviction sentiment where |positive% - negative%| exceeds a threshold (e.g., 20 percentage points), retaining only articles with clear directional tone.

3. Product/Industry Specificity
SmarTag tags articles with mentioned products and industries. We filter for alignment: keep news where the mentioned product/industry matches the company's core business (from SAM taxonomy). This removes generic "tech sector" articles that mention Apple, Microsoft, and 30 other names, focusing instead on company-specific product news.

4. Single-Entity Focus
Articles mentioning only 1-3 companies are more informative than those mentioning 20+. We filter for focused coverage—news where the company is one of few subjects, not one of many.
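
The four filters above can be sketched as simple predicates over one record per company mention. This is a minimal illustration, not SmarTag's actual schema: the `Mention` fields, the function names, and the sample records are all hypothetical, and the thresholds (0.3 relevance, 20 percentage points, at most 3 companies) are the example values quoted in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    # Hypothetical record layout; real SmarTag field names may differ.
    article_id: str
    ticker: str
    relevance: float               # 0-1 relevance of this company in the article
    positive_pct: float            # positive sentiment share, in percent
    negative_pct: float            # negative sentiment share, in percent
    companies_in_article: int      # distinct companies mentioned in the article
    article_industries: frozenset  # product/industry tags on the article
    core_industry: str             # company's core business from the taxonomy


def passes_relevance(m, min_relevance=0.3):
    # Drop passing mentions (scores below the cutoff).
    return m.relevance >= min_relevance


def passes_conviction(m, min_gap=20.0):
    # Keep only clear directional tone: |positive% - negative%| above the gap.
    return abs(m.positive_pct - m.negative_pct) > min_gap


def passes_alignment(m):
    # Keep news whose tagged industries include the company's core business.
    return m.core_industry in m.article_industries


def passes_focus(m, max_companies=3):
    # Keep articles that discuss only a few companies.
    return 1 <= m.companies_in_article <= max_companies


def filter_mentions(mentions):
    checks = (passes_relevance, passes_conviction, passes_alignment, passes_focus)
    return [m for m in mentions if all(check(m) for check in checks)]


# Illustrative records only, not real data.
sample = [
    Mention("roundup", "AAPL", 0.10, 45.0, 40.0, 47,
            frozenset({"technology"}), "smartphones"),
    Mention("launch", "AAPL", 0.90, 70.0, 10.0, 1,
            frozenset({"smartphones"}), "smartphones"),
    Mention("hedged", "AAPL", 0.80, 45.0, 40.0, 2,
            frozenset({"smartphones"}), "smartphones"),
]
kept = filter_mentions(sample)
print([m.article_id for m in kept])
```

In this toy sample only the in-depth product launch survives: the roundup fails on relevance, alignment, and focus, and the hedged article fails the conviction test.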

The Finding

Filtering by relevance and sentiment significance improved factor performance by 40-60% as measured by information coefficient (IC) and IC information ratio (ICIR).
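
For readers who want to reproduce these metrics, here is a rough sketch under a common convention that the source does not spell out: IC as the cross-sectional rank (Spearman) correlation between factor scores and next-period returns, and ICIR as the mean of the per-period ICs divided by their standard deviation. The function names are illustrative.

```python
from statistics import mean, stdev


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for constant input
    return cov / (var_x * var_y) ** 0.5


def rank_ic(scores, forward_returns):
    """Spearman correlation between factor scores and next-period returns."""
    return pearson(average_ranks(scores), average_ranks(forward_returns))


def icir(ic_series):
    """Mean IC divided by the sample standard deviation of IC."""
    return mean(ic_series) / stdev(ic_series)


# Illustrative numbers only.
perfect = rank_ic([1, 2, 3, 4], [0.01, 0.02, 0.03, 0.04])
inverse = rank_ic([1, 2, 3, 4], [0.04, 0.03, 0.02, 0.01])
ratio = icir([0.02, 0.04, 0.03])
```

Computing IC once per rebalancing period and then summarizing the series with `icir` makes it possible to compare a filtered factor against the unfiltered one on both average strength and stability.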

Product-specific news outperformed generic industry news. Articles mentioning a single product aligned with the company's main business delivered significantly higher alpha than broad industry commentary mentioning multiple product categories. This makes intuitive sense: "Apple launches new iPhone" is more actionable than "Tech sector sees headwinds."

Negative news consistently outperformed positive news as a predictive signal. Markets react faster and more decisively to bad news (earnings warnings, regulatory issues, product failures) than to positive developments, which often get discounted as promotional hype. The short side of sentiment factors (shorting stocks with negative news coverage) contributed disproportionately to long-short returns.

Event-labeled news (articles tagged with specific events like "earnings announcement" or "regulatory approval") improved factor performance more than generic positive/negative event screening. Context matters more than polarity.

Source filtering (weighting news by publisher credibility) provided marginal value—surprising, but consistent with the hypothesis that markets react to attention (news volume) as much as authority (source prestige).

Try It Yourself

News filtering is the difference between a noisy sentiment factor (IC ~1-2%) and a robust alpha signal (IC ~3-4%). The challenge is balancing precision (removing noise) and recall (not discarding valid signals).

Practical implementation:

Start with relevance filtering: Discard mentions below 0.3 relevance. This single cut removes 60-70% of noise with minimal signal loss.
Layer specificity filters: Add product/industry alignment, entity focus, and sentiment conviction one at a time, and check after each step whether IC holds up or degrades.
Test on your universe: Filtering thresholds vary by market cap; small caps need tighter filters because their coverage quality is lower.
Backtest robustly: News filtering rules can overfit, so validate across multiple time periods and market regimes.
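
One way to stack filters incrementally is to apply them cumulatively and log, after each stage, how much data survives and how a quality metric moves. The sketch below is an assumption-laden illustration: the dictionary keys, the `stack_filters` helper, and the sample records are hypothetical, and the hit rate (share of mentions whose sentiment sign matches the sign of the forward return) is only a crude stand-in for a proper IC calculation.

```python
def hit_rate(mentions):
    """Share of mentions whose net sentiment sign matches the forward return sign.

    A crude stand-in for IC; returns None when nothing survives the filters.
    """
    if not mentions:
        return None
    hits = sum(
        1 for m in mentions
        if (m["pos"] - m["neg"]) * m["fwd_return"] > 0
    )
    return hits / len(mentions)


def stack_filters(mentions, filters, evaluate):
    """Apply filters cumulatively and record coverage and metric per stage."""
    kept = list(mentions)
    total = len(kept)
    report = []
    for name, predicate in filters:
        kept = [m for m in kept if predicate(m)]
        report.append({
            "stage": name,
            "kept": len(kept),
            "share": len(kept) / total if total else 0.0,
            "metric": evaluate(kept),
        })
    return report


# Illustrative records only, not real data.
mentions = [
    {"relevance": 0.1, "pos": 45, "neg": 40, "n_companies": 47, "fwd_return": -0.01},
    {"relevance": 0.9, "pos": 70, "neg": 10, "n_companies": 1, "fwd_return": 0.02},
    {"relevance": 0.8, "pos": 45, "neg": 40, "n_companies": 2, "fwd_return": -0.01},
    {"relevance": 0.6, "pos": 15, "neg": 60, "n_companies": 12, "fwd_return": -0.03},
]

filters = [
    ("relevance", lambda m: m["relevance"] >= 0.3),
    ("conviction", lambda m: abs(m["pos"] - m["neg"]) > 20),
    ("focus", lambda m: m["n_companies"] <= 3),
]

report = stack_filters(mentions, filters, hit_rate)
for row in report:
    print(row)
```

Tracking coverage next to the metric makes the precision/recall trade-off visible: a stage that barely moves the metric while cutting coverage sharply is a candidate for loosening or removal. Running the same report separately on different time periods is a simple check against overfitted thresholds.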

Want to build a filtered sentiment pipeline optimized for your strategy? Book a call to discuss data structure, filter implementation, and backtest frameworks.

Source: CICC (中金公司), "Alternative Data Strategies (2): How to Optimize News Text Factors" (《另类数据策略（2）：如何优化新闻文本因子》), 2023-09-12.
