Every drug-specific signal we test gets a published verdict. We run held-out cohort ablations against programs whose real-world outcomes are now known, and we disclose the full picture — the signals that held, the signals that failed to clear the bar, and the evidence either way.
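A held-out cohort ablation scores the same held-out programs twice: once with the structural baseline, and once with the baseline plus one candidate signal. The two areas under the ROC curve are then compared. The sketch below is minimal and uses only the standard library. It assumes binary outcome labels and one score per program per model. The names `auc` and `auc_delta_pp` and the toy scores are hypothetical illustrations, not the engine's code or data.

```python
# Hypothetical sketch: held-out AUC gain from adding one signal (an ablation).
# Assumes labels are 1 = program succeeded, 0 = program failed, and that each
# model has already scored every held-out program.


def auc(scores, labels):
    """ROC AUC via the Mann-Whitney statistic: the share of (positive, negative)
    pairs in which the positive program scores higher, counting ties as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative program")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_delta_pp(baseline_scores, augmented_scores, labels):
    """AUC of (baseline + signal) minus AUC of baseline, in percentage points."""
    return 100.0 * (auc(augmented_scores, labels) - auc(baseline_scores, labels))


# Toy numbers invented for illustration; not the cohort behind any verdict.
labels = [1, 1, 1, 0, 0, 0]
baseline = [0.60, 0.40, 0.70, 0.50, 0.30, 0.65]
augmented = [0.70, 0.55, 0.80, 0.50, 0.30, 0.60]
print(f"{auc_delta_pp(baseline, augmented, labels):+.1f}pp")  # +22.2pp
```

A point delta on its own does not tell you whether the gain can be distinguished from noise. That question needs a paired test on the same programs.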
Biomarker quality (NSCLC, Phase II/III): +5.2pp held-out AUC over the structural baseline, stable across cohort sizes. In the genomic_validated cohort the odds ratio was 5.59, against 1.35 for the Schwaederle (2016) literature anchor. We ship the lower anchor and disclose the overshoot.
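An odds ratio like the genomic_validated figure comes from a 2x2 table of signal status against program outcome. The sketch below is minimal and makes two assumptions: a Wald confidence interval, and a "ship the lower of observed and anchor" rule. `odds_ratio`, `shipped_effect`, and the toy counts are hypothetical. Only 5.59 and 1.35 come from the text.

```python
import math

# Hypothetical sketch: odds ratio from a 2x2 table, compared against a
# literature anchor. Only 5.59 and 1.35 come from the text; counts are toy data.


def odds_ratio(a, b, c, d):
    """Odds ratio with a Wald 95% CI computed on the log scale.

    a: signal-positive programs that succeeded   b: signal-positive that failed
    c: signal-negative programs that succeeded   d: signal-negative that failed
    """
    if 0 in (a, b, c, d):
        raise ValueError("zero cell: apply a continuity correction first")
    ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log(OR)
    lo = math.exp(math.log(ratio) - 1.96 * se)
    hi = math.exp(math.log(ratio) + 1.96 * se)
    return ratio, (lo, hi)


def shipped_effect(observed, anchor):
    """Assumed rule: ship the lower of observed and anchor; report the overshoot."""
    return min(observed, anchor), observed / anchor


print(odds_ratio(20, 10, 10, 20))  # toy table; point estimate is 4.0
value, overshoot = shipped_effect(5.59, 1.35)
print(f"shipped OR {value}, observed/anchor = {overshoot:.2f}x")
```

Under these assumptions the shipped value is 1.35. The observed cohort figure is roughly four times the anchor, and that gap is what gets disclosed.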
Every signal we have tested appears below. A not-predictive or flag-only verdict is a feature, not a failure — it means we ran the test, documented the result, and did not ship a signal we could not defend.
| Signal | Context | Verdict | Effect | Evidence |
|---|---|---|---|---|
| Biomarker quality | NSCLC · Phase II/III | Validated | +5.2pp held-out AUC over the structural baseline, stable across cohort sizes. The genomic_validated cohort odds ratio wa… | |
| Phase 1 objective response rate (ORR magnitude) | Oncology solid tumor · Phase II/III | Not Predictive | A joint biomarker x ORR-bucket model beat the biomarker-only baseline by only +0.5pp held-out AUC (paired DeLong p=0.48)… | |
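The ORR row's p-value comes from a paired DeLong test. This test compares two correlated AUCs measured on the same programs, using the standard DeLong variance estimate built from per-program placement values. The sketch below is a minimal pure-Python version. It assumes binary outcome labels and one score per program per model. `paired_delong` and the toy data are hypothetical, not the pipeline that produced p=0.48.

```python
import math

# Hypothetical sketch of a paired DeLong test: do two AUCs measured on the
# SAME held-out programs differ? Assumes labels 1 = success, 0 = failure.


def _psi(x, y):
    # Mann-Whitney kernel: 1 if the positive outranks the negative, 0.5 on a tie.
    return 1.0 if x > y else (0.5 if x == y else 0.0)


def _components(scores, labels):
    """DeLong structural components: placement values per positive and per negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return v10, v01


def _cov(u, v):
    # Sample covariance (n - 1 denominator).
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def paired_delong(scores_a, scores_b, labels):
    """Two-sided paired DeLong test of AUC(a) - AUC(b). Returns (delta, z, p)."""
    m = sum(1 for y in labels if y == 1)  # number of positives
    n = sum(1 for y in labels if y == 0)  # number of negatives
    if m < 2 or n < 2:
        raise ValueError("need at least two positives and two negatives")
    v10a, v01a = _components(scores_a, labels)
    v10b, v01b = _components(scores_b, labels)
    delta = sum(v10a) / m - sum(v10b) / m
    # Var(AUC_a - AUC_b) from the covariance of the placement values.
    var = (_cov(v10a, v10a) + _cov(v10b, v10b) - 2 * _cov(v10a, v10b)) / m + (
        _cov(v01a, v01a) + _cov(v01b, v01b) - 2 * _cov(v01a, v01b)
    ) / n
    if var <= 0:
        raise ValueError("degenerate variance; the test is undefined for these scores")
    z = delta / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return delta, z, p


# Toy numbers invented for illustration; not the cohort behind any verdict.
labels = [1, 1, 1, 0, 0, 0]
baseline = [0.60, 0.40, 0.70, 0.50, 0.30, 0.65]
joint = [0.70, 0.55, 0.80, 0.50, 0.30, 0.60]
delta, z, p = paired_delong(joint, baseline, labels)
print(f"delta AUC {100 * delta:+.1f}pp, z = {z:.2f}, p = {p:.2f}")
# delta AUC +22.2pp, z = 1.41, p = 0.16
```

On six toy programs, even a +22.2pp gain gives p of about 0.16. The size of an AUC delta and its significance are separate questions.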
Machine-readable: `/evaluations/verdicts.json`
We tested two drug-specific signals against a held-out cohort of oncology programs whose outcomes are now known. Biomarker quality held and is scored in the engine; early-phase ORR magnitude — a signal several vendors market — did not clear our bar and is not scored.