PhaseFolio Validation Study

Back-Test Results: Antimicrobial Drug Cohort

36 historical antibacterial Phase 3 entrants (2004–2019) evaluated against PhaseFolio's rNPV engine. Pairwise AUC of 0.629 (up from 0.524 pre-Sprint-1) clears the conventional ≥0.60 PASS target. The +0.105 lift is entirely the single-asset-sponsor-fragility scored multiplier — the one Sprint-1 signal the cohort can validate, because it fires on three approvals as well as failures. Two other candidate signatures were deliberately demoted to non-scored risk flags after a pre-publication ablation; the full ablation is published below, not just the largest number.

2026-05-16 · 36 drugs (25 approved, 11 not approved) · 10,000 MC iterations per drug

Pairwise AUC

0.629

target ≥0.60 · 173/275 concordant pairs

PASS

Separation Gap

0.7pp

successes 90.8% vs failures 90.1% · target ≥10pp

GAP DISCLOSED

Risk Flag Sensitivity

90.9%

10/11 failures flagged · target ≥70%

PASS

Cohort Base Rate

69.4%

25 approved / 36 total · mean predicted PoS ~91% (well-calibrated as point estimate)

POINT EST OK

Key finding: Sprint-1 (2026-05-16) partially closed the discrimination gap. Pre-Sprint-1 the engine's Phase 3-entry PoS was well-calibrated as a point estimate (mean ~91% vs observed 69.4% cohort approval rate) but did not discriminate approved from failed (AUC 0.524). One scored multiplier — single-asset sponsor fragility — moves AUC to 0.629 (PASS ≥0.60). It is the only Sprint-1 signal that scores, because it is the only one the cohort can validate: it fires on three approvals (plazomicin, eravacycline cIAI, lefamulin) as well as failures, so a skeptic can verify it does not merely track outcomes. The engine's value for antimicrobials remains substantially downstream of Phase 3-entry PoS — rNPV math, Monte Carlo distribution, sub-indication context, IRA terminal-value modeling.

Data Foundation

How We Built the 36-Drug Cohort

Unlike the NSCLC cohort, which derives from a curated enrichment pipeline over 5,167 trials, the antimicrobial cohort is hand-built and verified by an LLM CMO-grade review — Claude Opus 4.7 acting in a chief-medical-officer reviewer role, not a human medical officer. Antibacterial Phase 3 programs over 2004–2019 are a small, well-bounded universe; the verification path emphasizes primary-source accuracy over scale.

Initial Cohort Draft

Pharma-domain recall of US-pathway antibacterial Ph3 programs 2005-2020 with known outcomes. Schema mirrored the RA + NSCLC backtests: drug, sponsor, modality, mechanism class, indication, decision date, remaining stages, peak revenue at decision, WACC, outcome.

Five-Agent LLM CMO Verification

Five parallel Claude Opus 4.7 research agents, one-per-task: approved cohort verification (2 batches), failed/CRL cohort verification, completeness audit, borderline adjudication. Each worked from a self-contained CMO-lens prompt (an LLM in a CMO-grade reviewer role — no human medical officer) against primary sources (ClinicalTrials.gov NCT records, FDA approval letters, SEC 8-K filings).

Material Corrections Caught

6 Phase 3 start dates corrected (telavancin / fidaxomicin / omadacycline / plazomicin / vabomere — initial draft off by 12-24 months); 1 sponsor-at-Ph3 misattribution (vabomere = The Medicines Company via Rempex, not Melinta); 1 drift error (ridinilazole owner is Summit Therapeutics, not Sebela).

Eight Drugs Added

Completeness audit caught 8 missing Ph3 programs: ceftobiprole (2024), sulbactam-durlobactam (Xacduro 2023), cefepime-enmetazobactam (Exblifep 2024), zoliflodacin (Nuzolvence 2025), cefepime-taniborbactam (CRL 2024 — CMC), rifamycin SV MMX (Aemcolo 2018), murepavadin (PRISM Ph3 halted 2019 — nephrotoxicity), tebipenem (Spero — CRL 2022).

Methodology Lock-Ins

Unit of analysis = drug × Ph3-indication-program (eravacycline = 2 rows, iclaprim = 2 rows). Outcome binary = first-decision (sulopenem CRL = miss). Excluded by design: live biotherapeutics (Rebyota, Vowst), Animal Rule biodefense mAbs, Ph2b approvals (bedaquiline), ex-US-only (delamanid), Ph3 pre-2005 (tigecycline), topical-only, 505(b)(2) bridging.

Cohort Lock + Run

36-drug locked cohort persisted as the antimicrobial backtest fixture. Re-run 2026-05-16 with the Sprint-1 M3 scored multiplier (M1/M2 demoted to non-scored flags); results surfaced in the intelligence dashboard. RA (0.625) / NSCLC (0.709) re-run as regression — number-identical.

Cohort Drugs

25 / 11

Approved / Not Approved

275

Ranking Pairs

Drugs Added via Audit

Hand-curated, LLM CMO-verified (Claude Opus 4.7), and the test set — not the training set. The 36-drug cohort is verified by a Claude Opus 4.7 LLM CMO-grade pass (no human medical officer) against ClinicalTrials.gov, FDA approval letters, and SEC 8-K filings; the 4,102-trial antimicrobial enrichment in the platform's peer population is a separate corpus. Enrichment did not move AUC (0.531→0.524, within noise) — the scored path reads basic trial-metadata fields already at 97–100% coverage, which established the discrimination gap was structural and needed model features, not more data. That is what Sprint-1 addressed.

Why Phase-3 entry, not Phase-2 — the anchor-selection finding. RA and NSCLC are anchored at Phase 2 entry; this cohort is anchored at Phase 3. The rule is identical for all three — anchor at the earliest decision point at which the cohort's failure population is observable in public registries, so the cohort is not survivorship-truncated on the failure side. For antibacterials it is not: a reproducible scan of the platform's antimicrobial peer-population corpus — 4,102 trials across 81 distinct drugs — found 68 Phase-2 entrants, of which 61 progressed to Phase 3/4 and only 7 were Phase-2-terminal; at most 4 of those sit outside the 36-program cohort and none are registry-flagged as terminated or failed. Effectively zero clean Phase-2 antibacterial failures exist in the registry — a Phase-2-anchored antibacterial backtest would have a near-empty, survivorship-fatal failure arm. Phase 3 is the earliest anchor at which the antibacterial universe is small, bounded, and registry+FDA-complete, which is exactly why the 36-program cohort can be primary-source-complete. Oncology and RA Phase-2 failures, by contrast, are densely registered, so Phase-2 anchoring is unbiased there. Full cross-cohort treatment: backtest methodology.

Methodology

How the Back-Test Works

Each drug is evaluated using only information available before its real-world Phase 3 start. The decision phase is Phase 3 entry, mirroring BIO/QLS 2021 anti-infective transition calibration.

Reconstruct Phase 3 Entry

Decision date = earliest registered Phase 3 trial start for the FDA-target indication, anchored to ClinicalTrials.gov 'Actual study start date'. No future data leaks into the model.

Apply BIO/QLS Base Rates

Phase 3 → NDA/BLA transition rate from BIO/QLS 2021 infectious_disease (62% small molecule, 66% mAb). NDA/BLA → approval ~92%. Combined Phase 3-entry cumulative PoS baseline ~57%.

Apply Modifiers via Logistic Path

Target validation (prior class approvals), competitive density, era, sponsor track record, endpoint tier, biomarker enrichment — applied through log-odds to keep PoS bounded in [0,1]. Multipliers gated by source-publication date so post-decision evidence cannot leak.

Antibacterial Multipliers + Sprint-1 M3

Positive-direction regulatory relaxations: indication_tier, QIDP (+5% Ph3 & NDA/BLA), LPAD (+10%). Sprint-1 added the single-asset-sponsor-fragility scored multiplier (0.80x odds-ratio) — the one negative-direction signature that is cohort-validatable (fires on 3 approvals too), moving AUC 0.524 → 0.629. The pre-existing LPAD phase_3-only no-op was fixed as a same-day @2026-05-16 amendment (re-scoped to phase_3 + nda_bla, net −0.002).

Risk-Flag Emission

Generic risk flags from cohort metadata at decision (HIGH_COMPETITION, LIMITED_TRIAL_DATA, FIRST_IN_CLASS_RISK, NOVEL_MODALITY, SAFETY_CLASS_SIGNAL, LATE_ENTRANT) plus the Sprint-1 non-scored HEPATOTOX_CLASS_PRIOR / SCR_ENDPOINT_FRAGILITY flags. 10 of 11 failed drugs carried ≥1 flag at decision (90.9%, up from 72.7%) — surfacing the hepatotox/SCR risk without inflating scored AUC with a prior the cohort cannot validate.

Score Against Actual Outcomes

Pairwise AUC over all approved×failed pairs. Best-threshold sweep identifies the cutoff and accuracy; Wilson 95% CI on accuracy. Outcome binary is first-decision FDA approval by 2026-12-31 on the originally-designed Ph3 program; ultimate-outcome reported as sensitivity.

Results

Predicted Cumulative PoS by Drug

Bars show the engine's predicted cumulative probability of success at Phase 3 entry, sorted within group. Top 12 of 25 approved + all 11 not-approved shown. Post-Sprint-1 the single-asset-sponsor failures separate downward; residual overlap in the ~0.88–0.93 band is the part of the gap one cohort-validatable multiplier does not close (the demoted M1/M2 signatures would tighten it but cannot be cohort-validated, so they are carried as non-scored flags).

Approved — top 12 of 25

imipenem_relebactamRecarbrio · Carbapenem+BLI

94.1%

cefiderocolFetroja · Siderophore cephalosporin

94.1%

gepotidacinBlujepa · Triazaacenaphthylene

92.8%

cefepime_enmetazobactamExblifep · Cephalosporin+BLI

92.2%

sulbactam_durlobactamXacduro · Sulbactam+BLI

92.2%

dalbavancinDalvance · Lipoglycopeptide

91.9%

oritavancinOrbactiv · Lipoglycopeptide

91.9%

zoliflodacinNuzolvence · Spiropyrimidinetrione

91.9%

bezlotoxumabZinplava · Anti-toxin B mAb

91.8%

ceftazidime_avibactamAvycaz · Cephalosporin+BLI

91.4%

meropenem_vaborbactamVabomere · Carbapenem+BLI

91.4%

ceftolozane_tazobactamZerbaxa · Cephalosporin+BLI

91.1%

Not approved — all 11

cadazolidFailed (Ph 3) · Quinox-oxazolidinone

92.2%

sulopenemFailed (CRL/Reg) · Penem

91.9%

cefepime_taniborbactamFailed (CRL/Reg) · Cephalosporin+boronic-BLI

91.4%

tebipenemFailed (CRL/Reg) · Penem (oral)

91.1%

ridinilazoleFailed (Ph 3) · Bis-benzimidazole

90.9%

murepavadinFailed (Ph 3) · LptD inhibitor (OMP-targeting peptide)

90.4%

iclaprim_motifFailed (CRL/Reg) · DHFR inhibitor

89.1%

eravacycline_cUTIFailed (Ph 3) · Fluorocycline

89.1%

surotomycinFailed (Ph 3) · Lipopeptide

88.9%

iclaprim_arpidaFailed (CRL/Reg) · DHFR inhibitor

88.7%

solithromycinFailed (CRL/Reg) · Fluoroketolide

87.8%

Mean PoS (approved): 90.8% · Mean PoS (not approved): 90.1% · Separation: +0.7pp · Pairwise AUC: 0.629

Cohort

36-Drug Antimicrobial Back-Test Cohort

Drug	Sponsor	Mechanism	Outcome
imipenem_relebactam	Merck & Co. (relebactam in-house)	Carbapenem+BLI	Approved
cefiderocol	Shionogi & Co., Ltd. (Japan)	Siderophore cephalosporin	Approved
gepotidacin	GSK plc	Triazaacenaphthylene	Approved
cefepime_enmetazobactam	Allecra Therapeutics	Cephalosporin+BLI	Approved
sulbactam_durlobactam	Entasis Therapeutics (Innoviva subsidiary)	Sulbactam+BLI	Approved
dalbavancin	Durata Therapeutics (acquired Vicuron rights 2009 post-Vicuron-CRL-2007; acquired by Actavis/Allergan Nov 2014, now AbbVie)	Lipoglycopeptide	Approved
oritavancin	The Medicines Company (acquired from Targanta Feb 2009 post-Targanta-CRL-2008; acquired by Melinta Aug 2017)	Lipoglycopeptide	Approved
zoliflodacin	Innoviva Specialty Therapeutics / Entasis (in partnership with GARDP)	Spiropyrimidinetrione	Approved
bezlotoxumab	Merck (via Medarex → BMS + MBL co-development)	Anti-toxin B mAb	Approved
ceftazidime_avibactam	AstraZeneca + Forest (avibactam from Novexel acq Forest 2010; sold to Pfizer 2016)	Cephalosporin+BLI	Approved
meropenem_vaborbactam	The Medicines Company (acquired Rempex Dec 2013; Vabomere transferred to Melinta Jan 2018 post-approval)	Carbapenem+BLI	Approved
ceftolozane_tazobactam	Cubist Pharmaceuticals (acquired Calixa Dec 2009; Cubist acquired by Merck Jan 2015 post-approval)	Cephalosporin+BLI	Approved
omadacycline	Paratek Pharmaceuticals (acq Gurnet Point Capital/Novo 2023)	Tetracycline (aminomethyl)	Approved
pretomanid	TB Alliance (first not-for-profit to register a drug with FDA)	Nitroimidazole	Approved
telavancin	Theravance (now Cumberland for US Vibativ post-2018)	Lipoglycopeptide	Approved
ceftaroline	Forest Laboratories (acquired Cerexa Jan 2007; now AbbVie via Actavis-Allergan)	5th-gen cephalosporin	Approved
fidaxomicin	Optimer Pharmaceuticals (acquired by Cubist 2013, now Merck)	Macrocyclic	Approved
ceftobiprole	Basilea Pharmaceutica (originally J&J/Cilag, 2008 CRL on data-integrity grounds)	Anti-MRSA cephalosporin	Approved
lefamulin	Nabriva Therapeutics (US rights divested 2022; Nabriva wound down 2023)	Pleuromutilin	Approved
doripenem	Shionogi (originator) / Johnson & Johnson Pharmaceutical R&D (US Ph3)	Carbapenem	Approved
tedizolid	Trius Therapeutics (originator; acquired by Cubist July 2013 mid-Ph3; now Merck)	Oxazolidinone	Approved
eravacycline_cIAI	Tetraphase Pharmaceuticals (acq La Jolla 2020 → Innoviva 2022)	Fluorocycline	Approved
rifamycin_sv_mmx	Cosmo Pharmaceuticals (US co-promote: RedHill → Aries Pharmaceuticals)	Rifamycin	Approved
delafloxacin	Melinta Therapeutics (formerly Rib-X Pharmaceuticals)	Fluoroquinolone	Approved
plazomicin	Achaogen (BANKRUPT 2019-04-15; assets to Cipla USA)	Aminoglycoside	Approved
cadazolid	Actelion (acq J&J Jun 2017; J&J discontinued April 2018)	Quinox-oxazolidinone	Failed (Ph 3)
sulopenem	Iterum Therapeutics	Penem	Failed (CRL/Reg)
cefepime_taniborbactam	Venatorx Pharmaceuticals (US partner Melinta)	Cephalosporin+boronic-BLI	Failed (CRL/Reg)
tebipenem	Spero Therapeutics	Penem (oral)	Failed (CRL/Reg)
ridinilazole	Summit Therapeutics Inc. (still active; pivoted to oncology with ivonescimab post-failure)	Bis-benzimidazole	Failed (Ph 3)
murepavadin	Polyphor Ltd.	LptD inhibitor (OMP-targeting peptide)	Failed (Ph 3)
iclaprim_motif	Motif Bio plc (defunct 2020-2021)	DHFR inhibitor	Failed (CRL/Reg)
eravacycline_cUTI	Tetraphase Pharmaceuticals	Fluorocycline	Failed (Ph 3)
surotomycin	Cubist Pharmaceuticals (acq Merck Jan 2015; development discontinued post-Trial 2)	Lipopeptide	Failed (Ph 3)
iclaprim_arpida	Arpida AG (acquired by Evolva 2010 post-failure)	DHFR inhibitor	Failed (CRL/Reg)
solithromycin	Cempra Pharmaceuticals (merged into Melinta Nov 2017; asset abandoned post-CRL)	Fluoroketolide	Failed (CRL/Reg)

Case Studies

Deep Dives

Scoring-Decision Ablation

Only One of Three Signatures Scores

Sprint-1 pre-publication ablation · 2026-05-16

Sprint-1 added three candidate antibacterial multipliers, each pre-registered on mechanism / endpoint-design / financial-structure evidence dated before each drug's decision date. A pre-publication ablation disaggregated them: baseline 0.524 → M3-only 0.631 → M1+M2-only 0.797 → all three 0.782. M1 (hepatotoxicity class) and M2 (SCR endpoint fragility) dominate the headline yet fire only on this cohort's failures with zero approved counterexamples — imported priors the cohort structurally cannot self-validate. M3 (single-asset sponsor fragility) fires on three approvals too, so it is cohort-validatable.

Decision: only M3 scores (final 0.629 after a same-day LPAD-gate fix). M1/M2 demoted to non-scored risk flags. The full ablation is published — not the largest number (0.782) — because a headline that is mostly an unvalidatable prior is not one a CMO advisor should be asked to trust.

0.629

Scored AUC (PASS)

0.782

All-3 (Not Shipped)

Sole Scored Signal

Honest Limitation

C. difficile Sub-Cohort Does Not Separate

5 C. difficile Phase 3 programs · 2 approved / 3 failed

Five C. difficile Phase 3 programs span the window: fidaxomicin (2011) and bezlotoxumab (2016) approved; surotomycin, cadazolid, ridinilazole failed. Post-Sprint-1 the scored engine does not separate them — predicted-PoS rank: cadazolid (failed) 92.2% › bezlotoxumab (approved) 91.8% › ridinilazole (failed) 90.9% › fidaxomicin (approved) 90.5% › surotomycin (failed) 88.9%. A failure outranks an approval; sub-cohort AUC is at chance, and we report that honestly. The M2 SCR-fragility multiplier, when scored, produced a spurious perfect sub-cohort AUC of 1.000 by firing on exactly the three failures — the overfitting signature the ablation was designed to catch.

M2 was demoted to a non-scored flag; all three failed CDI drugs carry SCR_ENDPOINT_FRAGILITY in the dossier. Honest scored non-separation plus a surfaced design-risk flag beats a fake-perfect number that collapses under five minutes of CMO scrutiny.

1.000

Spurious (M2 Scored)

At chance

Scored Sub-Cohort

Flag

SCR Risk Carried

Limitations

Discrimination is real but modest, and rank-only. Post-Sprint-1 pairwise AUC is 0.629 (was 0.524) with a separation gap of only 0.7pp — the engine ranks approved above failed better than chance but does not separate them with confident probabilities; point-estimate calibration (mean ~91% vs observed 69.4%) is intact. Only one of three failure signatures legitimately scores: M1/M2 are deliberately non-scored flags because they fire only on this cohort's failures with no approved counterexample, so the cohort cannot validate them. The residual gap is the deliberate cost of that honesty; closing it defensibly needs an external validation set, not a bigger in-cohort multiplier. Per-point detail, the full ablation, and the per-drug ledger are in the backtest methodology.

Engine version: PhaseFolio rNPV engine v1 · substrate methodology version: methodology@2026-05-21 · cohort verified via a 5-agent Claude Opus 4.7 LLM CMO-grade audit (no human medical officer) against primary sources (ClinicalTrials.gov, FDA approval letters, SEC 8-K filings).