PhaseFolio Signal Evaluations

Signal Registry

Every signal we have tested, with its verdict, shipped status, measured effect, and linked evidence. A not-predictive result is a feature of this surface: we publish what failed to clear the bar so you can see what the bar is.

2 verdicts · held-out cohort methodology · evidence linked per row

Signal	Context	Verdict	Scored in Engine	Effect / Key Finding	Evidence	Related
Biomarker quality: genomic-grade biomarker selection raises a program's probability of success.	NSCLC · Phase II/III	Validated	Scored in the production engine (2.6.0) for oncology solid tumors at Phase II/III.	+5.2pp held-out AUC over the structural baseline, stable across cohort sizes. The genomic_validated cohort's odds ratio was 5.59, versus 1.35 from the Schwaederle (2016) literature anchor; we ship the lower anchor and disclose the overshoot. Scored multipliers (Phase II/III, log-odds scale): genomic_validated 1.35x, protein_only 0.85x, unselected 1.00x.	JSON ↗	backtest-nsclc ↗ · pos-calibration ↗
Phase 1 objective response rate (ORR magnitude): a strong early-phase tumor response rate predicts later-stage success (a signal …	Oncology solid tumor · Phase II/III	Not Predictive	Never scored in the engine. Tested as a candidate; surfaced only as a non-scored informational flag.	A joint biomarker × ORR-bucket model beat the biomarker-only baseline by only +0.5pp held-out AUC (paired DeLong p = 0.48), and the comparison cannot be adequately powered: detecting a +3pp gain at this baseline needs ~830 drugs, but the cohort is 85. This followed the Phase 1 finding that the two signals combined fell below baseline (-0.3pp at 43-drug coverage), the double-counting that motivated the joint-table test. Remains a non-scored, surfaced flag in engine 2.6.0. Published as a transparent negative result.	JSON ↗	backtest-nsclc ↗ · pos-calibration ↗
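The Biomarker quality row lists its scored multipliers on the log-odds scale. A minimal sketch of how such a multiplier could shift a probability of success, assuming each value acts as a factor on the odds (equivalently, its natural log is added to the log-odds). The baseline probability, dictionary, and function name are hypothetical illustrations, not engine code.

```python
import math

# Multipliers from the Biomarker quality row (Phase II/III). Treating them as
# odds multipliers is an assumption; names here are hypothetical.
BIOMARKER_MULTIPLIERS = {
    "genomic_validated": 1.35,
    "protein_only": 0.85,
    "unselected": 1.00,
}


def adjust_probability(base_prob: float, multiplier: float) -> float:
    """Shift a probability by an odds multiplier via the log-odds scale."""
    if not 0.0 < base_prob < 1.0:
        raise ValueError("base_prob must be strictly between 0 and 1")
    log_odds = math.log(base_prob / (1.0 - base_prob))
    log_odds += math.log(multiplier)  # multiplying odds == adding log(multiplier)
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    base = 0.30  # hypothetical baseline probability of success
    for tier, m in BIOMARKER_MULTIPLIERS.items():
        print(f"{tier:>18}: {adjust_probability(base, m):.3f}")
```

At a hypothetical 30% baseline, this prints roughly 0.367 for genomic_validated, 0.267 for protein_only, and 0.300 for unselected. Working in log-odds keeps the adjusted value a valid probability for any baseline, which multiplying the probability directly would not.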

Machine-readable: /evaluations/verdicts.json · Per-verdict: /evaluations/verdicts/[id]

How Verdicts Are Assigned

One Method, Disclosed Before Testing

Each verdict follows a pre-registered protocol: define the signal, specify the cohort, run a 70/30 stratified held-out backtest, and compare the candidate model's held-out AUC against its baseline with a paired DeLong significance test. A signal earns “validated” only if the AUC lift is both practically meaningful and statistically distinguishable at our cohort size. Anything that does not clear that bar is reported as not-predictive or flag-only, and that verdict stands publicly.
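A minimal, standard-library-only sketch of the two statistical steps named above: a 70/30 split stratified on the outcome label, and a paired DeLong test comparing two models' AUCs on the same held-out cases. It assumes binary labels (1 = program succeeded, 0 = failed) and one score per case from each model. The function names are hypothetical, and this is an illustration of the technique, not the protocol's actual code.

```python
import math
import random
from collections import defaultdict


def stratified_split(labels, train_frac=0.7, seed=0):
    """Return (train_idx, test_idx), preserving the label mix in each part."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for i, y in enumerate(labels):
        by_label[y].append(i)
    train, test = [], []
    for idx in by_label.values():
        rng.shuffle(idx)
        cut = round(train_frac * len(idx))  # per-stratum 70/30 cut
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return sorted(train), sorted(test)


def _psi(x, y):
    # Mann-Whitney kernel: 1 if the positive outranks the negative, 0.5 on ties.
    return 1.0 if x > y else 0.5 if x == y else 0.0


def _components(pos, neg):
    # DeLong placement values: one per positive case and one per negative case.
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def _cov(a, b):
    # Sample covariance (n - 1 denominator).
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def paired_delong(labels, scores_a, scores_b):
    """Return (auc_a, auc_b, two-sided p-value) for two models on the same cases."""
    pos_idx = [i for i, y in enumerate(labels) if y == 1]
    neg_idx = [i for i, y in enumerate(labels) if y == 0]
    (a10, a01), (b10, b01) = (
        _components([s[i] for i in pos_idx], [s[i] for i in neg_idx])
        for s in (scores_a, scores_b)
    )
    auc_a, auc_b = sum(a10) / len(a10), sum(b10) / len(b10)
    m, n = len(pos_idx), len(neg_idx)
    var = (_cov(a10, a10) + _cov(b10, b10) - 2 * _cov(a10, b10)) / m + (
        _cov(a01, a01) + _cov(b01, b01) - 2 * _cov(a01, b01)
    ) / n
    if var <= 0:
        # Degenerate case (e.g. identical rankings): the z-statistic is undefined.
        return auc_a, auc_b, float("nan")
    z = (auc_a - auc_b) / math.sqrt(var)
    return auc_a, auc_b, math.erfc(abs(z) / math.sqrt(2))  # = 2 * (1 - Phi(|z|))
```

In this sketch, a result like the ORR row's p = 0.48 would fail the "statistically distinguishable" half of the bar regardless of how large the AUC lift looked. Stratifying the split keeps the success rate comparable between training and held-out sets, which matters when a cohort has only a few dozen programs.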

