Every signal we have tested, with its verdict, shipped status, measured effect, and linked evidence. A not-predictive result is a feature of this surface: we publish what failed to clear the bar so you can see what the bar is.
| Signal | Context | Verdict | Scored in Engine | Effect / Key Finding | Evidence | Related |
|---|---|---|---|---|---|---|
| Biomarker quality: Genomic-grade biomarker selection raises a program's probability of success. | NSCLC · Phase II/III | Validated | Scored: in the production engine (2.6.0) for oncology solid tumor at Phase II/III. | +5.2pp held-out AUC over the structural baseline, stable across cohort sizes. The genomic_validated cohort odds ratio was 5.59 vs the Schwaederle (2016) literature anchor of 1.35; we ship the lower anchor with the overshoot disclosed. Scored: genomic_validated 1.35x / protein_only 0.85x / unselected 1.00x (log-odds, Phase II/III). | | |
| Phase 1 objective response rate (ORR magnitude): A strong early-phase tumor response rate predicts later-stage success (a signal … | Oncology solid tumor · Phase II/III | Not Predictive | Not scored: never scored in the engine. Tested as a candidate; surfaced as a non-scored informational flag only. | A joint biomarker × ORR-bucket model beat the biomarker-only baseline by only +0.5pp held-out AUC (paired DeLong p=0.48), and the comparison is statistically unpowerable: detecting a +3pp gain at this baseline needs ~830 drugs, and the cohort is 85. This followed the Phase 1 finding that the two signals combined fell below baseline (-0.3pp at 43-drug coverage), the double-counting that motivated the joint-table test. Not scored. Remains a non-scored, surfaced flag in engine 2.6.0. Published as a transparent negative result. | | |
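
The Biomarker quality row quotes its scored multipliers in log-odds terms (genomic_validated 1.35x, protein_only 0.85x, unselected 1.00x). As a rough illustration of what an odds multiplier does to a probability, the sketch below applies those three values to a baseline. The function name `adjust_probability` and the 0.30 baseline are hypothetical, and how the production engine actually combines multipliers is not described here, so treat this as arithmetic under stated assumptions rather than the engine's implementation.

```python
import math

# Multipliers as quoted in the Biomarker quality row (engine 2.6.0, Phase II/III).
ODDS_MULTIPLIERS = {
    "genomic_validated": 1.35,
    "protein_only": 0.85,
    "unselected": 1.00,
}


def adjust_probability(baseline_p: float, biomarker_class: str) -> float:
    """Shift a baseline probability by an odds multiplier, working in log-odds.

    Hypothetical helper: assumes the multiplier acts on the odds p / (1 - p).
    """
    if not 0.0 < baseline_p < 1.0:
        raise ValueError("baseline_p must be strictly between 0 and 1")
    log_odds = math.log(baseline_p / (1.0 - baseline_p))
    log_odds += math.log(ODDS_MULTIPLIERS[biomarker_class])
    # Map the adjusted log-odds back to a probability with the logistic function.
    return 1.0 / (1.0 + math.exp(-log_odds))


if __name__ == "__main__":
    baseline = 0.30  # hypothetical baseline probability, not a figure from this page
    for name in ODDS_MULTIPLIERS:
        print(f"{name}: {adjust_probability(baseline, name):.3f}")
```

With the hypothetical 0.30 baseline, the 1.35x multiplier moves the probability to about 0.37 and the 0.85x multiplier to about 0.27. An odds multiplier is not a probability multiplier, so the shift shrinks as the baseline approaches 0 or 1.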
Each verdict follows a pre-registered protocol: define the signal, specify the cohort, run a 70/30 stratified held-out backtest, and report the pairwise AUC with a paired DeLong significance test. A signal earns “validated” only if the AUC lift is practically meaningful and statistically distinguishable at our cohort size. Anything that does not clear that bar is reported as not-predictive or flag-only, and that verdict stands publicly.
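
The protocol steps above can be sketched in plain Python: a stratified 70/30 split, AUC computed with the Mann-Whitney kernel, and a paired DeLong test comparing two models scored on the same held-out cases. All function names here are hypothetical and the production pipeline is not shown on this page, so this is a minimal sketch of the standard techniques, not the evaluation code itself.

```python
import math
import random


def stratified_split(labels, test_frac=0.30, seed=0):
    """Return (train_idx, test_idx), holding out test_frac of each class."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def _kernel(x, y):
    """Mann-Whitney kernel: 1 if x > y, 0.5 on ties, 0 otherwise."""
    return 1.0 if x > y else 0.5 if x == y else 0.0


def _components(scores, labels):
    """DeLong structural components: one value per positive and per negative case."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    v10 = [sum(_kernel(p, n) for n in neg) / len(neg) for p in pos]
    v01 = [sum(_kernel(p, n) for p in pos) / len(pos) for n in neg]
    return v10, v01


def auc(scores, labels):
    """AUC as the mean of the per-positive structural components."""
    v10, _ = _components(scores, labels)
    return sum(v10) / len(v10)


def _cov(a, b):
    """Sample covariance (needs at least two observations)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def paired_delong(scores_a, scores_b, labels):
    """Return (auc_a, auc_b, z, two_sided_p) for two models on the same cases."""
    v10_a, v01_a = _components(scores_a, labels)
    v10_b, v01_b = _components(scores_b, labels)
    m, n = len(v10_a), len(v01_a)
    # Variance of the AUC difference, built from the component covariances.
    var = (
        (_cov(v10_a, v10_a) + _cov(v10_b, v10_b) - 2 * _cov(v10_a, v10_b)) / m
        + (_cov(v01_a, v01_a) + _cov(v01_b, v01_b) - 2 * _cov(v01_a, v01_b)) / n
    )
    auc_a, auc_b = sum(v10_a) / m, sum(v10_b) / m
    if var <= 0:
        # Degenerate case (e.g. identical scores): treated as no evidence of a difference.
        return auc_a, auc_b, 0.0, 1.0
    z = (auc_a - auc_b) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return auc_a, auc_b, z, p
```

In use, one would call `stratified_split` on the outcome labels, fit both the baseline and the candidate model on the training indices, score the held-out indices with each, and pass the two score lists plus the held-out labels to `paired_delong`. The test is paired because both models are evaluated on the same cases, which is what lets the covariance terms offset shared noise.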