RECAP · Reviewed June 16, 2026

The research behind the rankings: what we tested, kept, and threw out

In one line: Most stock models are black boxes that only ever show you their wins. This is the opposite — a plain-English tour of how the Bull Rankings score is actually built, the one rule that governs every change to it, and the signals we tested and deliberately rejected because they didn't hold up out of sample.

Most stock-ranking products are black boxes. You get a number, maybe a star rating, and no way to tell whether it reflects a real edge or an over-fit backtest that fell apart the day it went live. We built the Bull Rankings the other way around: every scoring choice is documented, auditable, and — this is the important part — held to a single rule that most models quietly skip.

This is a tour of that research. Not just what the model does, but why it does it, and what we tried that failed.

The one rule: out-of-sample or it doesn't ship

Here is the rule that governs every change to the score:

No signal enters the model unless it improves results on data it was not tuned on.

That sounds obvious. It is also where most quantitative strategies go wrong. It is trivially easy to find a rule that would have worked beautifully on past data: sweep enough thresholds and combinations and something always "wins" the history you fed it. The question that matters is whether it keeps working on data the rule never saw. So when we test a new signal, we tune it on one window (say, 2023–2024) and then judge it only on a held-out window (2025–2026) the tuning never touched. If it doesn't survive that, it doesn't ship, no matter how good the in-sample chart looks.
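To make the rule concrete, here is a minimal sketch of a tune-then-validate check. It is not the Bull Rankings code: the row fields (`year`, `signal`, `ret`), the function names, the "average return of everything" baseline, and the toy numbers are all assumptions made up for illustration.

```python
from statistics import mean


def strategy_return(rows, threshold):
    """Average return of the names whose signal clears the threshold."""
    picked = [r["ret"] for r in rows if r["signal"] >= threshold]
    return mean(picked) if picked else 0.0


def baseline_return(rows):
    """Average return of every name (a stand-in for 'without the new signal')."""
    return mean(r["ret"] for r in rows)


def tune_then_validate(rows, last_train_year, thresholds):
    """Tune on the training window, then judge only on the held-out window."""
    train = [r for r in rows if r["year"] <= last_train_year]
    holdout = [r for r in rows if r["year"] > last_train_year]
    if not train or not holdout:
        raise ValueError("need data on both sides of the split")

    # Tuning step: pick the threshold that looks best on the training window.
    best = max(thresholds, key=lambda t: strategy_return(train, t))

    in_sample_edge = strategy_return(train, best) - baseline_return(train)
    out_of_sample_edge = strategy_return(holdout, best) - baseline_return(holdout)
    return {
        "threshold": best,
        "in_sample_edge": in_sample_edge,
        "out_of_sample_edge": out_of_sample_edge,
        # The rule: only the held-out result decides.
        "ships": out_of_sample_edge > 0,
    }


# Toy data, invented for illustration only.
toy_rows = [
    {"year": 2023, "signal": 0.9, "ret": 0.20},
    {"year": 2023, "signal": 0.2, "ret": -0.05},
    {"year": 2024, "signal": 0.8, "ret": 0.15},
    {"year": 2024, "signal": 0.3, "ret": 0.00},
    {"year": 2025, "signal": 0.9, "ret": -0.10},
    {"year": 2025, "signal": 0.2, "ret": 0.08},
    {"year": 2026, "signal": 0.7, "ret": 0.02},
    {"year": 2026, "signal": 0.1, "ret": 0.06},
]

result = tune_then_validate(toy_rows, last_train_year=2024, thresholds=[0.0, 0.5])
print(result)
```

In this toy case the tuned threshold looks great on 2023–2024 and loses to the baseline on 2025–2026, so `ships` comes back `False`. The in-sample edge is never consulted in the decision.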

The consequence of taking that rule seriously is that we reject far more ideas than we keep. The rest of this article is mostly a list of good-sounding ideas that didn't make it, because that list is the honest evidence that the rule is real.

What the score actually is

The headline number is a 0–110 composite built from seven fundamental signals:

  • Free cash flow — is the business generating real, spendable cash?
  • Revenue growth — is the top line expanding?
  • Leverage (debt-to-equity) — how much financial risk is baked in?
  • Valuation — trailing P/E, or price-to-sales for companies that aren't profitable yet.
  • PEG ratio — valuation adjusted for the growth rate, so you're not overpaying for growth.
  • Free-cash-flow yield — cash generation relative to the price you pay.
  • Return on equity — how efficiently the business compounds the capital it keeps.
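For readers who want the arithmetic behind the ratio-style signals in that list, the sketch below uses the common textbook definitions. The article does not spell out the exact formulas or data fields the model uses, so the function names, the inputs, and the choice of growth rate for PEG are assumptions.

```python
from typing import Optional, Tuple


def debt_to_equity(total_debt: float, equity: float) -> Optional[float]:
    """Leverage: total debt divided by shareholders' equity."""
    return total_debt / equity if equity > 0 else None


def valuation(price: float, eps_ttm: float,
              market_cap: float, revenue_ttm: float) -> Tuple[str, Optional[float]]:
    """Trailing P/E when profitable, otherwise fall back to price-to-sales."""
    if eps_ttm > 0:
        return "pe", price / eps_ttm
    if revenue_ttm > 0:
        return "ps", market_cap / revenue_ttm
    return "ps", None


def peg_ratio(pe: Optional[float], eps_growth_pct: float) -> Optional[float]:
    """P/E divided by the earnings growth rate, with growth in percent.

    Which growth estimate to use is an assumption left to the caller.
    """
    if pe is None or eps_growth_pct <= 0:
        return None
    return pe / eps_growth_pct


def fcf_yield(free_cash_flow: float, market_cap: float) -> Optional[float]:
    """Free cash flow relative to the price paid for the whole company."""
    return free_cash_flow / market_cap if market_cap > 0 else None


def return_on_equity(net_income: float, equity: float) -> Optional[float]:
    """Net income divided by shareholders' equity."""
    return net_income / equity if equity > 0 else None


# Hypothetical company: $50 share price, $2.50 trailing EPS, $2,000M market cap.
kind, pe = valuation(price=50.0, eps_ttm=2.5, market_cap=2000.0, revenue_ttm=800.0)
print(kind, pe, peg_ratio(pe, eps_growth_pct=10.0), fcf_yield(100.0, 2000.0))
```

Returning `None` rather than a number when a denominator is zero or negative keeps a meaningless ratio (for example, P/E on negative earnings) from being graded as if it were cheap.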

Each signal is graded against sector-aware thresholds (a debt load that's normal for a utility would be a red flag for a software company), and the grades are blended into the composite. A small set of signed adjustments then nudges the result for things a single metric can't capture on its own: a durable-compounder profile (high cash yield, high ROE, and low leverage together), a growth-at-a-reasonable-price sweet spot, extreme leverage, cash burn outside the turnaround bucket, and a discounted-cash-flow cross-check.
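A rough sketch of that grade-then-blend shape follows. Every number in it is a placeholder: the threshold values, the three-step grading, the weights, and the way the result is scaled and clamped to the 0–110 range are assumptions, since the article describes the structure but not the parameters.

```python
# Hypothetical thresholds: (max D/E for a top grade, max D/E for a passing grade).
DEBT_TO_EQUITY_THRESHOLDS = {
    "utilities": (1.5, 2.5),
    "software": (0.5, 1.0),
}
DEFAULT_THRESHOLDS = (1.0, 2.0)


def grade_leverage(debt_to_equity: float, sector: str) -> float:
    """Grade leverage on a 0..1 scale against the sector's own thresholds."""
    good, acceptable = DEBT_TO_EQUITY_THRESHOLDS.get(sector, DEFAULT_THRESHOLDS)
    if debt_to_equity <= good:
        return 1.0
    if debt_to_equity <= acceptable:
        return 0.5
    return 0.0


def composite(grades: dict, weights: dict, adjustments=()) -> float:
    """Weighted blend of 0..1 grades, plus signed adjustments, clamped to 0..110."""
    total_weight = sum(weights[name] for name in grades)
    blended = sum(grades[name] * weights[name] for name in grades) / total_weight
    score = 100.0 * blended + sum(adjustments)
    return max(0.0, min(110.0, score))


# The same debt load earns different grades in different sectors.
print(grade_leverage(1.2, "utilities"))  # top grade under these placeholder thresholds
print(grade_leverage(1.2, "software"))   # failing grade under these placeholder thresholds

grades = {"leverage": 1.0, "revenue_growth": 0.5}
weights = {"leverage": 1.0, "revenue_growth": 1.0}
print(composite(grades, weights, adjustments=(5.0,)))
```

Keeping the adjustments as a separate signed sum, rather than folding them into the weights, makes it easy to see how much of a score came from the seven signals and how much from the nudges.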

The names that score well are then sorted into three buckets — Momentum Leaders, Value, and Turnarounds — because those styles lead at different points in a market cycle, and holding all three diversifies across regimes instead of betting everything on one being in favour.

What we kept (and the evidence for it)

A few choices earned their place by clearing the out-of-sample bar:

  • Bucket-specific weights. The seven signals aren't weighted equally for every name. A growth company is judged mostly on its revenue trajectory; a value name is judged mostly on cheapness. Those weights were tuned by search on the training window and adopted only because they also won the held-out validation window at a shallower drawdown — not because they looked good on the history they were fit to.
  • Momentum for the growth bucket. The "growth" sleeve ranks its names by 12-month price momentum paired with a strict quality gate, rather than by cheapness. This was a deliberate swap, and it won out-of-sample — particularly in trending markets where cheap-growth screens lag the actual leaders.
  • A daily data-quality gate. Before any edition ships, an automated validator checks it for completeness, sane prices, and coverage regressions, and aborts the run rather than publishing degraded data. Bad numbers are worse than stale numbers on a finance site.
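The data-quality gate in the last item above can be pictured as a validator that raises instead of publishing. This is only a sketch under assumptions: the field names, the price bounds, the 5% coverage tolerance, and the `EditionRejected` exception are hypothetical, not the site's actual checks.

```python
class EditionRejected(Exception):
    """Raised when an edition fails validation and must not be published."""


def validate_edition(rows, previous_count,
                     required_fields=("ticker", "price", "score"),
                     max_coverage_drop=0.05):
    """Check completeness, sane prices, and coverage; abort on any problem."""
    problems = []

    for row in rows:
        # Completeness: every required field must be present and non-null.
        missing = [f for f in required_fields if row.get(f) is None]
        if missing:
            problems.append(f"{row.get('ticker', '?')}: missing {missing}")
            continue
        # Sane prices: positive and below an arbitrary upper bound.
        if not 0 < row["price"] < 1_000_000:
            problems.append(f"{row['ticker']}: implausible price {row['price']}")

    # Coverage regression: far fewer names than the previous edition.
    if previous_count and len(rows) < previous_count * (1 - max_coverage_drop):
        problems.append(f"coverage fell from {previous_count} to {len(rows)}")

    if problems:
        raise EditionRejected("; ".join(problems))
    return True


good_edition = [
    {"ticker": "AAA", "price": 10.0, "score": 80},
    {"ticker": "BBB", "price": 20.0, "score": 60},
]
print(validate_edition(good_edition, previous_count=2))
```

Raising an exception, rather than returning a warning, matches the stated preference: a failed run leaves yesterday's edition in place instead of shipping degraded numbers.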

What we tested and threw out

This is the part you won't usually see published. Each of these was a reasonable hypothesis. Each failed the out-of-sample test, so none of them are in the model:

  • Price-momentum and trend overlays on the value model. Adding a 6–12 month momentum tilt, a 200-day-average filter, or a golden-cross bonus degraded returns out of sample in every case, with some variants costing double digits of annual return and deepening drawdowns. The reason is structural: this model's edge is mean-reversion. It buys cheap, out-of-favour names that are below trend precisely because they're beaten down. Trend filters systematically avoid exactly those names, so they do more than fail to help: they remove the source of return.
  • Gross profitability inside the composite. This one is instructive. Gross profitability (a well-known academic quality factor) is genuinely strong on its own — a sleeve that simply buys the highest-gross-profitability names beat both the market and our model on risk-adjusted return in testing. So we tried to wire it into the score. It backfired: at every weight we tried, adding it to the composite lowered return and risk-adjusted return and deepened drawdowns versus leaving it out. A factor can be a real edge standalone and still poison a composite that's already capturing what it captures. It stays a displayed data point, not a scoring input.
  • Quality-breadth signals like ROIC. Return on invested capital is a great descriptive metric, but as a ranking signal it was weak out of sample. It earns a place on the grade card for context, not in the score.
  • Market-timing crash defenses. Going to cash when the S&P fell below its 200-day average sounds prudent. Tested across a decade, it gutted returns and deepened the worst drawdown — because for a buy-cheap-into-weakness strategy, "the market is below trend" is exactly when the best entries appear. Timing out of weakness sells the bottom and misses the rebound.
  • A bank-friendly quality bonus. We tested a rule that let strong financials (high ROE at a cheap multiple) compete despite lacking the cash-flow signals the model leans on. It degraded returns by slipping banks into the top picks ahead of names that performed better.
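Two mechanics recur in the list above: the moving-average trend filter and the drawdown used to judge it. The sketch below shows both with toy price series and a short window so the numbers are easy to follow. The function names and data are hypothetical, and this is a simple moving average, which may differ from whatever variant was actually tested.

```python
def simple_moving_average(prices, window):
    """Mean of the last `window` prices, or None if there is too little history."""
    if len(prices) < window:
        return None
    return sum(prices[-window:]) / window


def above_trend(prices, window=200):
    """True when the latest price sits above its moving average."""
    average = simple_moving_average(prices, window)
    return average is not None and prices[-1] > average


def max_drawdown(equity_curve):
    """Worst peak-to-trough decline, as a negative fraction (0.0 if none)."""
    peak = equity_curve[0]
    worst = 0.0
    for value in equity_curve:
        peak = max(peak, value)
        worst = min(worst, value / peak - 1.0)
    return worst


# A beaten-down name: exactly the kind a buy-cheap model wants to own.
beaten_down = [100, 90, 80, 70, 60]
# A name already in an uptrend.
trending = [60, 70, 80, 90, 100]

# With a 5-day window standing in for 200 days, the filter keeps only the
# trending name and screens out the beaten-down one.
print(above_trend(beaten_down, window=5), above_trend(trending, window=5))
print(max_drawdown([100, 120, 90, 110]))
```

The first print shows the structural conflict in miniature: any filter that requires price above trend will reject a name that is cheap because it has fallen, whatever its fundamentals say.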

Why publish the failures

Because it's the only honest way to claim an edge. Anyone can show you a flattering backtest. The strategies that survive contact with live markets are the ones built by rejecting most of what looked good in the rear-view mirror. The list above is the cost of the rule — and the rule is the reason to trust the number.

We're equally explicit about the limits: the backtest universe is survivors-only (a bias we've quantified and haircut rather than hidden), fundamentals run about ten years deep, and the live forward track record is still young. You can read the full methodology and limitations on the about page, and watch the model's real, never-back-dated results accrue on the track record.


The Bull Rankings is automated fundamentals research for education, not personalised investment advice and not a recommendation to buy or sell any security. Always do your own research.

Not investment advice. The Bull Rankings publishes a quantitative ranking model and accompanying analysis for general informational purposes only. Nothing on this page is a recommendation to buy, sell, or hold any security; nothing is personalized to your circumstances, risk tolerance, or tax situation. Investing carries the risk of loss — invest at your own risk and consider consulting a licensed financial professional before acting on anything you read here. See terms and methodology for full disclosures.