The research behind the rankings: what we tested, kept, and threw out

Most stock-ranking products are black boxes. You get a number, maybe a star rating, and no way to tell whether it reflects a real edge or an over-fit backtest that fell apart the day it went live. We built the Bull Rankings the other way around: every scoring choice is documented, auditable, and — this is the important part — held to a single rule that most models quietly skip.

This is a tour of that research. Not just what the model does, but why it does it, and what we tried that failed.

The one rule: out-of-sample or it doesn't ship

Here is the rule that governs every change to the score:

No signal enters the model unless it improves results on data it was not tuned on.

That sounds obvious. It is also where most quantitative strategies go wrong. It is trivially easy to find a rule that would have worked beautifully on past data: sweep enough thresholds and combinations and something always "wins" the history you fed it. The question that matters is whether it keeps working on data the rule never saw. So when we test a new signal, we tune it on one window (say, 2023–2024) and then judge it only on a held-out window (2025–2026) the tuning never touched. If it doesn't survive that, it doesn't ship, no matter how good the in-sample chart looks.
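To make the tune-then-judge split concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the production pipeline: the observations are invented numbers, and the function names, the threshold grid, and the "beat the hold-everything baseline" pass rule are all hypothetical.

```python
from statistics import mean

# Hypothetical observations: (year, signal value, subsequent return).
# The numbers are made up purely to show the mechanics.
OBSERVATIONS = [
    (2023, 0.2, -0.01), (2023, 0.6, 0.04), (2023, 0.9, 0.07),
    (2024, 0.3, 0.00), (2024, 0.7, 0.05), (2024, 0.8, 0.06),
    (2025, 0.4, 0.03), (2025, 0.7, -0.02), (2025, 0.9, 0.01),
    (2026, 0.1, 0.02), (2026, 0.6, 0.00), (2026, 0.8, -0.01),
]


def avg_return_above(rows, threshold):
    """Mean return of rows whose signal is at or above the threshold."""
    picked = [ret for _, signal, ret in rows if signal >= threshold]
    return mean(picked) if picked else float("-inf")


def out_of_sample_check(rows, thresholds, tune_years, holdout_years):
    """Tune a threshold on one window, then judge it on the other."""
    tune = [r for r in rows if r[0] in tune_years]
    holdout = [r for r in rows if r[0] in holdout_years]

    # Tuning only ever sees the tuning window.
    best = max(thresholds, key=lambda t: avg_return_above(tune, t))

    # Assumed pass rule: beat simply holding everything in the held-out window.
    baseline = mean(ret for _, _, ret in holdout)
    held_out = avg_return_above(holdout, best)
    return {
        "threshold": best,
        "held_out": held_out,
        "baseline": baseline,
        "ships": held_out > baseline,
    }


result = out_of_sample_check(
    OBSERVATIONS, [0.3, 0.5, 0.7], {2023, 2024}, {2025, 2026}
)
print(result)
```

In this toy data the threshold that looks best on 2023–2024 underperforms the baseline on 2025–2026, so "ships" comes back False. That is the pattern the rule exists to catch.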

The consequence of taking that rule seriously is that we reject far more ideas than we keep. The rest of this article is mostly a list of good-sounding ideas that didn't make it, because that list is the honest evidence that the rule is real.

What the score actually is

The headline number is a single quality-growth score from 0 to 100 — the classic GARP idea, growth at a reasonable price — built from three pillars:

Quality — durable returns on capital, healthy margins, low leverage, and clean, cash-backed earnings. Is this genuinely a good business?
Growth — revenue and earnings expansion. Is it getting bigger?
Value — valuation versus sector peers (the PEG ratio, earnings and cash-flow multiples). Are you paying a fair price for the quality and growth?

Each pillar is built from the underlying fundamentals: free cash flow, revenue growth, leverage, P/E (or P/S for unprofitable names), PEG, FCF yield, and return on equity. These are graded against sector-aware thresholds, because a debt load that's normal for a utility would be a red flag for a software company. The pillars combine into one blended score. It's not a simple sum, so a name has to be decent on all three legs: a glaring weakness on any one pillar pulls the whole number down. In our testing, that intersection of quality, growth, and a fair price was the best-performing risk/reward setup, and it is now the entire model rather than a bonus layered on top of something else.
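One way to see why a non-additive blend punishes a weak leg is the sketch below. The exact blending formula is not given here, so the geometric mean is an assumption chosen only because it has the stated property; the function names and pillar values are hypothetical.

```python
def blended_score(quality, growth, value):
    """Geometric mean of three pillar scores, each on a 0-100 scale.

    Assumption: the real blend may differ. The geometric mean is used
    here only because one weak pillar drags the result down.
    """
    for pillar in (quality, growth, value):
        if not 0 <= pillar <= 100:
            raise ValueError("pillar scores must be between 0 and 100")
    return (quality * growth * value) ** (1 / 3)


def simple_average(quality, growth, value):
    """Plain arithmetic mean, shown for contrast."""
    return (quality + growth + value) / 3


balanced = (70, 70, 70)
lopsided = (95, 95, 20)  # same arithmetic average, one glaring weakness

for name, pillars in (("balanced", balanced), ("lopsided", lopsided)):
    print(name, round(simple_average(*pillars), 1), round(blended_score(*pillars), 1))
```

Both names average 70 under a simple mean, but the lopsided one lands in the mid-50s once the pillars are multiplied together, which is the behaviour the paragraph above describes.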

There is one score, shown the same way everywhere on the site — no separate growth/value/turnaround buckets. Balance-sheet businesses (banks, insurers, REITs) where free cash flow and leverage don't translate are graded on a sector-appropriate card instead and tagged Bank · REIT.
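The sector-aware grading described in this section can be sketched as a percentile rank within each sector. This is a simplified illustration, not the site's grading code: the tickers, the field names, the 0-100 percentile scale, and the choice to give a lone name a neutral 50 are all assumptions.

```python
from collections import defaultdict


def sector_percentiles(stocks, metric, lower_is_better=True):
    """Map ticker -> grade (0-100) for `metric`, ranked within its own sector.

    Ties are not handled specially in this sketch.
    """
    by_sector = defaultdict(list)
    for stock in stocks:
        by_sector[stock["sector"]].append(stock)

    grades = {}
    for peers in by_sector.values():
        # Worst value first, so a later position means a better grade.
        ordered = sorted(peers, key=lambda s: s[metric], reverse=lower_is_better)
        if len(ordered) == 1:
            grades[ordered[0]["ticker"]] = 50.0  # no peers: neutral grade
            continue
        for position, stock in enumerate(ordered):
            grades[stock["ticker"]] = 100 * position / (len(ordered) - 1)
    return grades


# Hypothetical universe with made-up P/E ratios.
universe = [
    {"ticker": "SOFT_A", "sector": "software", "pe": 40},
    {"ticker": "SOFT_B", "sector": "software", "pe": 25},
    {"ticker": "SOFT_C", "sector": "software", "pe": 60},
    {"ticker": "UTIL_A", "sector": "utilities", "pe": 15},
    {"ticker": "UTIL_B", "sector": "utilities", "pe": 22},
    {"ticker": "UTIL_C", "sector": "utilities", "pe": 18},
]

grades = sector_percentiles(universe, "pe")
print(grades)
```

Here SOFT_B, at a P/E of 25, earns the top grade among software names, while UTIL_B, at a lower P/E of 22, earns the bottom grade among utilities. An absolute threshold would have ranked them the other way around.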

What we kept (and the evidence for it)

A few choices earned their place by clearing the out-of-sample bar:

Growth at a reasonable price as the core. The intersection of quality, growth, and a fair valuation — not any one of them alone — is what survived testing across regimes. The score is built so a name must clear all three legs, because cheap-and-bad, dear-and-good, and great-but-not-growing all under-deliver on their own.
Sector-relative grading. Every valuation and quality grade is scored against the name's own sector, not the whole market, so a software P/E isn't judged against a utility's, and a bank isn't penalized for a balance sheet that's normal for banks. This beat absolute thresholds out of sample because it avoids systematically tilting the book toward whichever sectors happen to screen cheap.
A daily data-quality gate. Before any edition ships, an automated validator checks it for completeness, sane prices, and coverage regressions, and aborts the run rather than publishing degraded data. Bad numbers are worse than stale numbers on a finance site.
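The data-quality gate in the last item above follows a simple shape: collect every problem, then refuse to publish if there are any. The sketch below shows that shape only. The field names, the price bounds, the 5% coverage tolerance, and the exception name are hypothetical and not taken from the real validator.

```python
class EditionRejected(Exception):
    """Raised to abort publishing when an edition fails validation."""


REQUIRED_FIELDS = ("ticker", "price", "score")  # hypothetical schema


def validate_edition(rows, previous_count, max_coverage_drop=0.05):
    """Return rows unchanged if they pass every check, else raise."""
    problems = []

    for row in rows:
        label = row.get("ticker") or "?"
        # Completeness: every required field must be present and non-null.
        missing = [f for f in REQUIRED_FIELDS if row.get(f) is None]
        if missing:
            problems.append(f"{label}: missing {missing}")
        # Sane prices: positive and below an arbitrary assumed ceiling.
        elif not 0 < row["price"] < 1_000_000:
            problems.append(f"{label}: implausible price {row['price']}")

    # Coverage regression: far fewer names than the previous edition.
    if previous_count and len(rows) < previous_count * (1 - max_coverage_drop):
        problems.append(
            f"coverage fell from {previous_count} to {len(rows)} names"
        )

    if problems:
        # Abort rather than publish degraded data.
        raise EditionRejected("; ".join(problems))
    return rows


edition = [
    {"ticker": "AAA", "price": 10.0, "score": 71},
    {"ticker": "BBB", "price": 25.5, "score": 64},
]
published = validate_edition(edition, previous_count=2)
```

Raising an exception, instead of logging a warning and continuing, is what makes the gate a gate: a failed run leaves yesterday's edition in place.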

What we tested and threw out

This is the part you won't usually see published. Each of these was a reasonable hypothesis. Each failed the out-of-sample test, so none of them are in the model:

Price-momentum and trend overlays. Adding a 6–12 month momentum tilt, a 200-day-average filter, or a golden-cross bonus all degraded returns out of sample — some variants by double digits of annual return, with deeper drawdowns. The reason is structural: the value pillar leans the model toward names trading at a fair or cheap price, which are often below trend precisely because they're out of favour. Trend filters systematically avoid exactly those names, so they don't just fail to help, they remove part of the source of return.
Gross profitability inside the score. This one is instructive. Gross profitability (a well-known academic quality factor) is genuinely strong on its own — a sleeve that simply buys the highest-gross-profitability names beat both the market and our model on risk-adjusted return in testing. So we tried to wire it into the score. It backfired: at every weight we tried, adding it lowered return and risk-adjusted return and deepened drawdowns versus leaving it out. A factor can be a real edge standalone and still poison a score that's already capturing what it captures. It stays a displayed data point, not a scoring input.
Quality-breadth signals like ROIC. Return on invested capital is a great descriptive metric, but as a ranking signal it was weak out of sample. It earns a place on the grade card for context, not in the score.
Market-timing crash defenses. Going to cash when the S&P fell below its 200-day average sounds prudent. Tested across a decade, it gutted returns and deepened the worst drawdown — because for a buy-cheap-into-weakness strategy, "the market is below trend" is exactly when the best entries appear. Timing out of weakness sells the bottom and misses the rebound.
A bank-friendly quality bonus. We tested a rule that let strong financials (high ROE at a cheap multiple) compete despite lacking the cash-flow signals the model leans on. It degraded returns by slipping banks into the top picks ahead of names that did better.
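For readers unfamiliar with the trend filters and market-timing rules in the list above, here is a minimal sketch of how a moving-average filter works mechanically. It is not the tested implementation: the function names are hypothetical, the window is shortened to 3 days so the example fits on screen, and the price path is invented to show the mechanism, not to serve as evidence.

```python
def moving_average_filter(prices, window=200):
    """Return one flag per day: True = invested, False = in cash.

    Each day's flag uses only prices up to the prior day, so the rule
    does not peek at the price it is about to trade on.
    """
    flags = []
    for day in range(len(prices)):
        history = prices[max(0, day - window):day]
        if len(history) < window:
            flags.append(True)  # not enough history yet: stay invested
            continue
        average = sum(history) / window
        flags.append(prices[day - 1] > average)
    return flags


def growth_of_one(prices, flags):
    """Compound daily returns, earning nothing on days spent in cash."""
    value = 1.0
    for day in range(1, len(prices)):
        if flags[day]:
            value *= prices[day] / prices[day - 1]
    return value


# Invented price path: a dip followed by a rebound.
prices = [100, 102, 104, 90, 85, 95, 110]

flags = moving_average_filter(prices, window=3)
filtered = growth_of_one(prices, flags)
buy_and_hold = growth_of_one(prices, [True] * len(prices))
print(flags, round(filtered, 3), round(buy_and_hold, 3))
```

On this path the filter takes the initial drop, moves to cash near the low, and sits out the first leg of the recovery, so it ends below buy-and-hold. A single toy path proves nothing by itself, but it shows the failure mode the two items above describe.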

Why publish the failures

Because it's the only honest way to claim an edge. Anyone can show you a flattering backtest. The strategies that survive contact with live markets are the ones built by rejecting most of what looked good in the rear-view mirror. The list above is the cost of the rule — and the rule is the reason to trust the number.

We're equally explicit about the limits: the backtest universe is survivors-only (a bias we've quantified and haircut rather than hidden), fundamentals run about ten years deep, and the live forward track record is still young. You can read the full methodology and limitations on the about page, and watch the model's real, never-back-dated results accrue on the track record.

The Bull Rankings is automated fundamentals research for education, not personalised investment advice and not a recommendation to buy or sell any security. Always do your own research.

Not investment advice. The Bull Rankings publishes a quantitative ranking model and accompanying analysis for general informational purposes only. Nothing on this page is a recommendation to buy, sell, or hold any security; nothing is personalized to your circumstances, risk tolerance, or tax situation. Investing carries the risk of loss — invest at your own risk and consider consulting a licensed financial professional before acting on anything you read here. See terms and methodology for full disclosures.