PhaseFolio Signal Evaluations

Signal Registry

Every signal we have tested, with its verdict, shipped status, measured effect, and linked evidence. Not-predictive results are a deliberate part of this registry: we publish the signals that failed to clear the bar so you can see where the bar sits.

2 verdicts · held-out cohort methodology · evidence linked per row
Columns: Signal · Context · Verdict · Scored in Engine · Effect / Key Finding · Evidence · Related
Signal: Biomarker quality
Genomic-grade biomarker selection raises a program's probability of success.
Context: NSCLC, Phase II/III
Verdict: Validated
Scored in Engine: Yes, in the production engine (2.6.0) for oncology solid tumor at Phase II/III.
Effect / Key Finding: +5.2pp held-out AUC over the structural baseline, stable across cohort sizes. The genomic_validated cohort odds ratio was 5.59, versus 1.35 in the Schwaederle (2016) literature anchor; we ship the lower anchor and disclose the overshoot.
Scored: genomic_validated 1.35x / protein_only 0.85x / unselected 1.00x (log-odds, Phase II/III).
Signal: Phase 1 objective response rate (ORR magnitude)
A strong early-phase tumor response rate predicts later-stage success (a signal …)
Context: Oncology solid tumor, Phase II/III
Verdict: Not Predictive
Scored in Engine: No. Never scored in the engine; tested as a candidate and surfaced only as a non-scored informational flag.
Effect / Key Finding: A joint biomarker × ORR-bucket model beat the biomarker-only baseline by only +0.5pp held-out AUC (paired DeLong p = 0.48), and at this cohort size the comparison cannot be adequately powered: detecting a +3pp gain at this baseline would need about 830 drugs, and the cohort has 85. This test followed the Phase 1 finding that the two signals combined fell below baseline (-0.3pp at 43-drug coverage); that double-counting motivated the joint-table test.
Not scored. Remains a non-scored, surfaced flag in engine 2.6.0. Published as a transparent negative result.
Machine-readable: /evaluations/verdicts.json
Per-verdict: /evaluations/verdicts/[id]
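The Biomarker quality row lists its shipped weights as multipliers "(log-odds, Phase II/III)". A minimal sketch of how such a weight could move a probability of success, assuming each multiplier is an odds ratio applied by adding ln(multiplier) to a baseline logit. The names `BIOMARKER_MULTIPLIERS` and `apply_odds_multiplier` and the 30% baseline are hypothetical illustrations, not the engine's API.

```python
import math

# Hypothetical table mirroring the shipped Phase II/III weights for the
# biomarker-quality signal, read as odds multipliers (an assumption).
BIOMARKER_MULTIPLIERS = {
    "genomic_validated": 1.35,
    "protein_only": 0.85,
    "unselected": 1.00,
}


def logit(p: float) -> float:
    """Convert a probability in (0, 1) to log-odds."""
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    """Convert log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def apply_odds_multiplier(base_prob: float, multiplier: float) -> float:
    """Shift a baseline probability by an odds multiplier.

    Assumption: the multiplier scales the odds, which is the same as
    adding ln(multiplier) on the log-odds scale.
    """
    if not 0.0 < base_prob < 1.0:
        raise ValueError("base_prob must lie strictly between 0 and 1")
    if multiplier <= 0.0:
        raise ValueError("multiplier must be positive")
    return sigmoid(logit(base_prob) + math.log(multiplier))


if __name__ == "__main__":
    base = 0.30  # hypothetical baseline probability of success
    for tier, m in BIOMARKER_MULTIPLIERS.items():
        print(f"{tier:>17}: {apply_odds_multiplier(base, m):.3f}")
```

Under these assumptions a 30% baseline becomes about 36.7% for genomic_validated and 26.7% for protein_only, and unselected leaves it unchanged. Substituting the cohort's 5.59 odds ratio would push the same baseline to about 71%, which shows how much the choice of the lower literature anchor matters.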

One Method, Disclosed Before Testing

Each verdict follows a pre-registered protocol: define the signal, specify the cohort, run a 70/30 stratified held-out backtest, and report the candidate model's held-out AUC against its baseline with a paired DeLong significance test. A signal earns “validated” only if the AUC lift is practically meaningful and statistically distinguishable at our cohort size. Anything that does not clear that bar is reported as not-predictive or flag-only, and that verdict stands publicly.
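A minimal, standard-library sketch of the two statistical steps in this protocol: a 70/30 split stratified by outcome, and a paired DeLong test (structural-components estimator) comparing two models' AUCs on the same held-out cases. The function names, the seed, and the thresholds in `verdict` are hypothetical: the page does not state the exact cutoffs, so `min_lift=0.03` only borrows the +3pp detection target from the ORR row and `alpha=0.05` is a conventional assumption.

```python
import math
import random
from collections import defaultdict


def stratified_split(labels, test_frac=0.3, seed=0):
    """Hold out test_frac of each class; return (train_idx, test_idx)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def _psi(pos_score, neg_score):
    """1 if the positive case outranks the negative, 0.5 on a tie, else 0."""
    if pos_score > neg_score:
        return 1.0
    return 0.5 if pos_score == neg_score else 0.0


def _components(scores, labels):
    """AUC plus DeLong components: V10 per positive, V01 per negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    v10 = [sum(_psi(p, n) for n in neg) / len(neg) for p in pos]
    v01 = [sum(_psi(p, n) for p in pos) / len(pos) for n in neg]
    return sum(v10) / len(v10), v10, v01


def _cov(a, b):
    """Sample covariance with an (n - 1) denominator."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (n - 1)


def delong_paired(scores_a, scores_b, labels):
    """Paired DeLong test of AUC(a) vs AUC(b) on the same held-out cases.

    Needs at least two positives and two negatives. Returns
    (auc_a, auc_b, two-sided p-value).
    """
    auc_a, v10a, v01a = _components(scores_a, labels)
    auc_b, v10b, v01b = _components(scores_b, labels)
    m, n = len(v10a), len(v01a)
    var = ((_cov(v10a, v10a) + _cov(v10b, v10b) - 2 * _cov(v10a, v10b)) / m
           + (_cov(v01a, v01a) + _cov(v01b, v01b) - 2 * _cov(v01a, v01b)) / n)
    if var <= 0:
        # Degenerate case, e.g. identical score vectors: nothing to test.
        return auc_a, auc_b, 1.0
    z = (auc_a - auc_b) / math.sqrt(var)
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    return auc_a, auc_b, math.erfc(abs(z) / math.sqrt(2))


def verdict(auc_lift, p_value, min_lift=0.03, alpha=0.05):
    """Hypothetical decision rule; min_lift and alpha are placeholders.

    Simplification: the registry's flag-only outcome is folded into
    not-predictive here.
    """
    if auc_lift >= min_lift and p_value < alpha:
        return "validated"
    return "not-predictive"
```

Fed the ORR row's reported +0.5pp lift and p = 0.48, `verdict(0.005, 0.48)` returns "not-predictive". The sketch does not reproduce the ~830-drug power estimate, which depends on baseline details the page does not give.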
