PhaseFolio Validation Study

Back-Test Results: Antimicrobial Drug Cohort

36 historical antibacterial Phase 3 entrants (2004–2019) evaluated against PhaseFolio's rNPV engine. Pairwise AUC of 0.629 (up from 0.524 pre-Sprint-1) clears the conventional ≥0.60 PASS target. The +0.105 net lift comes from the single-asset-sponsor-fragility scored multiplier (+0.107 on its own, offset by a −0.002 same-day LPAD-gate fix). It is the one Sprint-1 signal the cohort can validate, because it fires on three approvals as well as on failures. Two other candidate signatures were deliberately demoted to non-scored risk flags after a pre-publication ablation; the full ablation is published below, not just its largest number.

2026-05-16 · 36 drugs (25 approved, 11 not approved) · 10,000 MC iterations per drug
Pairwise AUC
0.629
target ≥0.60 · 173/275 concordant pairs
PASS
Separation Gap
0.7pp
successes 90.8% vs failures 90.1% · target ≥10pp
GAP DISCLOSED
Risk Flag Sensitivity
90.9%
10/11 failures flagged · target ≥70%
PASS
Cohort Base Rate
69.4%
25 approved / 36 total · mean predicted PoS ~91%, about 21pp above the observed rate
CALIBRATION GAP DISCLOSED

Key finding: Sprint-1 (2026-05-16) partially closed the discrimination gap. Pre-Sprint-1, the engine's Phase 3-entry PoS point estimate ran high (mean ~91% vs the observed 69.4% cohort approval rate) and did not discriminate approved from failed programs (AUC 0.524). One scored multiplier, single-asset sponsor fragility, moves AUC to 0.629 (PASS ≥0.60). It is the only Sprint-1 signal that scores because it is the only one the cohort can validate: it fires on three approvals (plazomicin, eravacycline cIAI, lefamulin) as well as on failures, so a skeptic can verify that it does not merely track outcomes. For antimicrobials, the engine's value remains substantially downstream of Phase 3-entry PoS: rNPV math, the Monte Carlo distribution, sub-indication context, and IRA terminal-value modeling.

How We Built the 36-Drug Cohort

Unlike the NSCLC cohort, which derives from a curated enrichment pipeline over 5,167 trials, the antimicrobial cohort is hand-built and verified by an LLM CMO-grade review — Claude Opus 4.7 acting in a chief-medical-officer reviewer role, not a human medical officer. Antibacterial Phase 3 programs over 2004–2019 are a small, well-bounded universe; the verification path emphasizes primary-source accuracy over scale.

1
Initial Cohort Draft
Pharma-domain recall of US-pathway antibacterial Ph3 programs 2005-2020 with known outcomes. Schema mirrored the RA + NSCLC backtests: drug, sponsor, modality, mechanism class, indication, decision date, remaining stages, peak revenue at decision, WACC, outcome.
2
Five-Agent LLM CMO Verification
Five parallel Claude Opus 4.7 research agents, one-per-task: approved cohort verification (2 batches), failed/CRL cohort verification, completeness audit, borderline adjudication. Each worked from a self-contained CMO-lens prompt (an LLM in a CMO-grade reviewer role — no human medical officer) against primary sources (ClinicalTrials.gov NCT records, FDA approval letters, SEC 8-K filings).
3
Material Corrections Caught
6 Phase 3 start dates corrected (telavancin / fidaxomicin / omadacycline / plazomicin / Vabomere; the initial draft was off by 12–24 months); 1 sponsor-at-Ph3 misattribution (Vabomere = The Medicines Company via Rempex, not Melinta); 1 drift error (ridinilazole owner is Summit Therapeutics, not Sebela).
4
Eight Drugs Added
Completeness audit caught 8 missing Ph3 programs: ceftobiprole (2024), sulbactam-durlobactam (Xacduro 2023), cefepime-enmetazobactam (Exblifep 2024), zoliflodacin (Nuzolvence 2025), cefepime-taniborbactam (CRL 2024 — CMC), rifamycin SV MMX (Aemcolo 2018), murepavadin (PRISM Ph3 halted 2019 — nephrotoxicity), tebipenem (Spero — CRL 2022).
5
Methodology Lock-Ins
Unit of analysis = drug × Ph3-indication-program (eravacycline = 2 rows, iclaprim = 2 rows). Outcome binary = first-decision (sulopenem CRL = miss). Excluded by design: live biotherapeutics (Rebyota, Vowst), Animal Rule biodefense mAbs, Ph2b approvals (bedaquiline), ex-US-only (delamanid), Ph3 pre-2005 (tigecycline), topical-only, 505(b)(2) bridging.
6
Cohort Lock + Run
36-drug locked cohort persisted as the antimicrobial backtest fixture. Re-run 2026-05-16 with the Sprint-1 M3 scored multiplier (M1/M2 demoted to non-scored flags); results surfaced in the intelligence dashboard. The RA (0.625) and NSCLC (0.709) backtests were re-run as regression checks and reproduced number-identical AUCs.
36
Cohort Drugs
25 / 11
Approved / Not Approved
275
Ranking Pairs
8
Drugs Added via Audit

Hand-curated, LLM CMO-verified (Claude Opus 4.7), and used as the test set, not the training set. The 36-drug cohort is verified by a Claude Opus 4.7 LLM CMO-grade pass (no human medical officer) against ClinicalTrials.gov, FDA approval letters, and SEC 8-K filings; the 4,102-trial antimicrobial enrichment in the platform's peer population is a separate corpus. Enrichment did not move AUC (0.531→0.524, within noise): the scored path reads basic trial-metadata fields that were already at 97–100% coverage, which established that the discrimination gap was structural and needed model features, not more data. That is what Sprint-1 addressed.

Why Phase-3 entry, not Phase-2: the anchor-selection finding. RA and NSCLC are anchored at Phase 2 entry; this cohort is anchored at Phase 3. The rule is identical for all three: anchor at the earliest decision point at which the cohort's failure population is observable in public registries, so the cohort is not survivorship-truncated on the failure side. For antibacterials, that population is not observable at Phase 2. A reproducible scan of the platform's antimicrobial peer-population corpus (4,102 trials across 81 distinct drugs) found 68 Phase-2 entrants, of which 61 progressed to Phase 3/4 and only 7 were Phase-2-terminal; at most 4 of those sit outside the 36-program cohort, and none are registry-flagged as terminated or failed. Effectively zero clean Phase-2 antibacterial failures exist in the registry, so a Phase-2-anchored antibacterial backtest would have a near-empty, survivorship-fatal failure arm. Phase 3 is the earliest anchor at which the antibacterial universe is small, bounded, and registry+FDA-complete, which is exactly why the 36-program cohort can be primary-source-complete. Oncology and RA Phase-2 failures, by contrast, are densely registered, so Phase-2 anchoring is unbiased there. Full cross-cohort treatment: backtest methodology.

How the Back-Test Works

Each drug is evaluated using only information available before its real-world Phase 3 start. The decision phase is Phase 3 entry, mirroring BIO/QLS 2021 anti-infective transition calibration.

1
Reconstruct Phase 3 Entry
Decision date = earliest registered Phase 3 trial start for the FDA-target indication, anchored to ClinicalTrials.gov 'Actual study start date'. No future data leaks into the model.
2
Apply BIO/QLS Base Rates
Phase 3 → NDA/BLA transition rate from BIO/QLS 2021 infectious_disease (62% small molecule, 66% mAb). NDA/BLA → approval ~92%. Combined Phase 3-entry cumulative PoS baseline ~57%.
3
Apply Modifiers via Logistic Path
Target validation (prior class approvals), competitive density, era, sponsor track record, endpoint tier, biomarker enrichment: applied through log-odds so PoS stays bounded in (0, 1). Multipliers are gated by source-publication date so post-decision evidence cannot leak.
4
Antibacterial Multipliers + Sprint-1 M3
Positive-direction regulatory relaxations: indication_tier, QIDP (Qualified Infectious Disease Product; +5% at Ph3 and NDA/BLA) and LPAD (Limited Population Pathway for Antibacterial and Antifungal Drugs; +10%). Sprint-1 added the single-asset-sponsor-fragility scored multiplier (0.80× odds ratio), the one negative-direction signature that is cohort-validatable (it fires on 3 approvals too). Together with a same-day LPAD fix it moves AUC from 0.524 to 0.629. That fix, a 2026-05-16 amendment, re-scoped the pre-existing LPAD gate from phase_3 only, where it was a no-op, to phase_3 + nda_bla (net AUC −0.002).
5
Risk-Flag Emission
Generic risk flags from cohort metadata at decision (HIGH_COMPETITION, LIMITED_TRIAL_DATA, FIRST_IN_CLASS_RISK, NOVEL_MODALITY, SAFETY_CLASS_SIGNAL, LATE_ENTRANT) plus the Sprint-1 non-scored HEPATOTOX_CLASS_PRIOR / SCR_ENDPOINT_FRAGILITY flags. 10 of 11 failed drugs carried ≥1 flag at decision (90.9%, up from 72.7%) — surfacing the hepatotox/SCR risk without inflating scored AUC with a prior the cohort cannot validate.
6
Score Against Actual Outcomes
Pairwise AUC over all approved×failed pairs. A best-threshold sweep identifies the cutoff and its accuracy, with a Wilson 95% CI on that accuracy. The outcome binary is first-decision FDA approval by 2026-12-31 on the originally designed Ph3 program; the ultimate outcome is reported as a sensitivity analysis.
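Steps 2 to 4 amount to a baseline multiplied through two stage transitions, then shifted on the log-odds scale. The sketch below is a minimal illustration under stated assumptions, not the engine's code: it assumes every modifier is expressed as an odds ratio that adds log(OR) to the logit, and the function names are hypothetical. Only the 62%/66% Phase 3 → NDA/BLA rates, the ~92% NDA/BLA → approval rate and the 0.80× M3 odds ratio come from this page. The QIDP/LPAD percentage adjustments are left out because the page does not say how they are composed, and applying M3 to the cumulative PoS (rather than to one stage) is a simplifying assumption.

```python
import math

# Phase 3 -> NDA/BLA transition rates quoted above (BIO/QLS 2021, infectious disease).
PH3_TO_NDA = {"small_molecule": 0.62, "mab": 0.66}
NDA_TO_APPROVAL = 0.92  # NDA/BLA -> approval, ~92%

M3_ODDS_RATIO = 0.80  # Sprint-1 single-asset-sponsor-fragility multiplier


def baseline_pos(modality: str) -> float:
    """Cumulative Phase 3-entry PoS: P(Ph3 -> NDA/BLA) * P(NDA/BLA -> approval)."""
    return PH3_TO_NDA[modality] * NDA_TO_APPROVAL


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def apply_odds_ratios(p: float, odds_ratios: list[float]) -> float:
    """Add log(OR) for each modifier on the logit scale, then map back into (0, 1)."""
    return inv_logit(logit(p) + sum(math.log(r) for r in odds_ratios))


if __name__ == "__main__":
    print(f"small-molecule baseline: {baseline_pos('small_molecule'):.4f}")  # 0.5704
    print(f"mAb baseline: {baseline_pos('mab'):.4f}")                       # 0.6072
    # Illustrative only (hypothetical): a program sitting at 91% before M3 fires.
    print(f"91% after M3: {apply_odds_ratios(0.91, [M3_ODDS_RATIO]):.3f}")  # 0.890
```

The small-molecule product, 0.62 × 0.92 ≈ 0.570, is the ~57% baseline quoted in step 2. Working on the logit scale is what keeps stacked multipliers from pushing PoS past 1: even a large odds ratio applied to a 99.9% prior returns a value strictly below 1, while a 0.80× odds ratio moves a 91% program to roughly 89%.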

Predicted Cumulative PoS by Drug

Bars show the engine's predicted cumulative probability of success at Phase 3 entry, sorted within group. Top 12 of 25 approved + all 11 not-approved shown. Post-Sprint-1 the single-asset-sponsor failures separate downward; residual overlap in the ~0.88–0.93 band is the part of the gap one cohort-validatable multiplier does not close (the demoted M1/M2 signatures would tighten it but cannot be cohort-validated, so they are carried as non-scored flags).

Approved — top 12 of 25
imipenem_relebactam (Recarbrio) · Carbapenem+BLI · 94.1%
cefiderocol (Fetroja) · Siderophore cephalosporin · 94.1%
gepotidacin (Blujepa) · Triazaacenaphthylene · 92.8%
cefepime_enmetazobactam (Exblifep) · Cephalosporin+BLI · 92.2%
sulbactam_durlobactam (Xacduro) · Sulbactam+BLI · 92.2%
dalbavancin (Dalvance) · Lipoglycopeptide · 91.9%
oritavancin (Orbactiv) · Lipoglycopeptide · 91.9%
zoliflodacin (Nuzolvence) · Spiropyrimidinetrione · 91.9%
bezlotoxumab (Zinplava) · Anti-toxin B mAb · 91.8%
ceftazidime_avibactam (Avycaz) · Cephalosporin+BLI · 91.4%
meropenem_vaborbactam (Vabomere) · Carbapenem+BLI · 91.4%
ceftolozane_tazobactam (Zerbaxa) · Cephalosporin+BLI · 91.1%
Not approved — all 11
cadazolid · Failed (Ph 3) · Quinox-oxazolidinone · 92.2%
sulopenem · Failed (CRL/Reg) · Penem · 91.9%
cefepime_taniborbactam · Failed (CRL/Reg) · Cephalosporin+boronic-BLI · 91.4%
tebipenem · Failed (CRL/Reg) · Penem (oral) · 91.1%
ridinilazole · Failed (Ph 3) · Bis-benzimidazole · 90.9%
murepavadin · Failed (Ph 3) · LptD inhibitor (OMP-targeting peptide) · 90.4%
iclaprim_motif · Failed (CRL/Reg) · DHFR inhibitor · 89.1%
eravacycline_cUTI · Failed (Ph 3) · Fluorocycline · 89.1%
surotomycin · Failed (Ph 3) · Lipopeptide · 88.9%
iclaprim_arpida · Failed (CRL/Reg) · DHFR inhibitor · 88.7%
solithromycin · Failed (CRL/Reg) · Fluoroketolide · 87.8%
Mean PoS (approved): 90.8% · Mean PoS (not approved): 90.1% · Separation: +0.7pp · Pairwise AUC: 0.629
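The two summary statistics above measure different things. The separation gap is a difference of group means, while pairwise AUC is the share of approved×failed pairs in which the approved drug carries the higher predicted PoS (for the full cohort, 173 of 275 pairs, ≈0.629). The sketch below computes both, plus the best-threshold sweep and Wilson 95% interval described in the back-test's scoring step. It is illustrative only: the function names are hypothetical, counting tied pairs as half-concordant is an assumption (the page does not say how ties are scored), and because only 12 of the 25 approved values are listed in the chart, the full-cohort AUC cannot be reproduced from it.

```python
import math
from itertools import product

# Predicted PoS (%) for the 11 not-approved programs, as listed in the chart above.
NOT_APPROVED_POS = [92.2, 91.9, 91.4, 91.1, 90.9, 90.4, 89.1, 89.1, 88.9, 88.7, 87.8]


def pairwise_auc(approved, failed):
    """Share of approved x failed pairs ranked correctly; ties count 0.5 (assumption)."""
    pairs = list(product(approved, failed))
    score = sum(1.0 if a > f else 0.5 if a == f else 0.0 for a, f in pairs)
    return score / len(pairs)


def separation_gap_pp(approved, failed):
    """Mean predicted PoS of approved minus failed, in percentage points."""
    return sum(approved) / len(approved) - sum(failed) / len(failed)


def best_threshold(approved, failed):
    """Try each observed score as a cutoff (predict 'approved' if score >= cutoff).

    Returns (cutoff, number correct, n); the first cutoff reaching the maximum wins.
    """
    best_cutoff, best_correct = None, -1
    for cutoff in sorted(set(approved) | set(failed)):
        correct = sum(a >= cutoff for a in approved) + sum(f < cutoff for f in failed)
        if correct > best_correct:
            best_cutoff, best_correct = cutoff, correct
    return best_cutoff, best_correct, len(approved) + len(failed)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


if __name__ == "__main__":
    print(f"not-approved mean PoS: {sum(NOT_APPROVED_POS) / len(NOT_APPROVED_POS):.1f}%")
    print(f"full-cohort AUC from the reported counts: {173 / 275:.3f}")
    cutoff, correct, n = best_threshold([0.9, 0.8, 0.7], [0.75, 0.6])  # toy data
    print(cutoff, correct, n, wilson_interval(correct, n))
```

The not-approved values reproduce the 90.1% group mean shown above. On the toy data the sweep picks a cutoff of 0.7 with 4 of 5 correct, and the Wilson interval for 4/5 is roughly 0.38 to 0.96, which shows why accuracy on a small cohort needs an interval beside it.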

36-Drug Antimicrobial Back-Test Cohort

Drug · Sponsor · Mechanism · Outcome
imipenem_relebactam · Merck & Co. (relebactam in-house) · Carbapenem+BLI · Approved
cefiderocol · Shionogi & Co., Ltd. (Japan) · Siderophore cephalosporin · Approved
gepotidacin · GSK plc · Triazaacenaphthylene · Approved
cefepime_enmetazobactam · Allecra Therapeutics · Cephalosporin+BLI · Approved
sulbactam_durlobactam · Entasis Therapeutics (Innoviva subsidiary) · Sulbactam+BLI · Approved
dalbavancin · Durata Therapeutics (acquired Vicuron rights 2009 post-Vicuron-CRL-2007; acquired by Actavis/Allergan Nov 2014, now AbbVie) · Lipoglycopeptide · Approved
oritavancin · The Medicines Company (acquired from Targanta Feb 2009 post-Targanta-CRL-2008; acquired by Melinta Aug 2017) · Lipoglycopeptide · Approved
zoliflodacin · Innoviva Specialty Therapeutics / Entasis (in partnership with GARDP) · Spiropyrimidinetrione · Approved
bezlotoxumab · Merck (via Medarex → BMS + MBL co-development) · Anti-toxin B mAb · Approved
ceftazidime_avibactam · AstraZeneca + Forest (avibactam from Novexel acq Forest 2010; sold to Pfizer 2016) · Cephalosporin+BLI · Approved
meropenem_vaborbactam · The Medicines Company (acquired Rempex Dec 2013; Vabomere transferred to Melinta Jan 2018 post-approval) · Carbapenem+BLI · Approved
ceftolozane_tazobactam · Cubist Pharmaceuticals (acquired Calixa Dec 2009; Cubist acquired by Merck Jan 2015 post-approval) · Cephalosporin+BLI · Approved
omadacycline · Paratek Pharmaceuticals (acq Gurnet Point Capital/Novo 2023) · Tetracycline (aminomethyl) · Approved
pretomanid · TB Alliance (first not-for-profit to register a drug with FDA) · Nitroimidazole · Approved
telavancin · Theravance (now Cumberland for US Vibativ post-2018) · Lipoglycopeptide · Approved
ceftaroline · Forest Laboratories (acquired Cerexa Jan 2007; now AbbVie via Actavis-Allergan) · 5th-gen cephalosporin · Approved
fidaxomicin · Optimer Pharmaceuticals (acquired by Cubist 2013, now Merck) · Macrocyclic · Approved
ceftobiprole · Basilea Pharmaceutica (originally J&J/Cilag, 2008 CRL on data-integrity grounds) · Anti-MRSA cephalosporin · Approved
lefamulin · Nabriva Therapeutics (US rights divested 2022; Nabriva wound down 2023) · Pleuromutilin · Approved
doripenem · Shionogi (originator) / Johnson & Johnson Pharmaceutical R&D (US Ph3) · Carbapenem · Approved
tedizolid · Trius Therapeutics (originator; acquired by Cubist July 2013 mid-Ph3; now Merck) · Oxazolidinone · Approved
eravacycline_cIAI · Tetraphase Pharmaceuticals (acq La Jolla 2020 → Innoviva 2022) · Fluorocycline · Approved
rifamycin_sv_mmx · Cosmo Pharmaceuticals (US co-promote: RedHill → Aries Pharmaceuticals) · Rifamycin · Approved
delafloxacin · Melinta Therapeutics (formerly Rib-X Pharmaceuticals) · Fluoroquinolone · Approved
plazomicin · Achaogen (BANKRUPT 2019-04-15; assets to Cipla USA) · Aminoglycoside · Approved
cadazolid · Actelion (acq J&J Jun 2017; J&J discontinued April 2018) · Quinox-oxazolidinone · Failed (Ph 3)
sulopenem · Iterum Therapeutics · Penem · Failed (CRL/Reg)
cefepime_taniborbactam · Venatorx Pharmaceuticals (US partner Melinta) · Cephalosporin+boronic-BLI · Failed (CRL/Reg)
tebipenem · Spero Therapeutics · Penem (oral) · Failed (CRL/Reg)
ridinilazole · Summit Therapeutics Inc. (still active; pivoted to oncology with ivonescimab post-failure) · Bis-benzimidazole · Failed (Ph 3)
murepavadin · Polyphor Ltd. · LptD inhibitor (OMP-targeting peptide) · Failed (Ph 3)
iclaprim_motif · Motif Bio plc (defunct 2020-2021) · DHFR inhibitor · Failed (CRL/Reg)
eravacycline_cUTI · Tetraphase Pharmaceuticals · Fluorocycline · Failed (Ph 3)
surotomycin · Cubist Pharmaceuticals (acq Merck Jan 2015; development discontinued post-Trial 2) · Lipopeptide · Failed (Ph 3)
iclaprim_arpida · Arpida AG (acquired by Evolva 2010 post-failure) · DHFR inhibitor · Failed (CRL/Reg)
solithromycin · Cempra Pharmaceuticals (merged into Melinta Nov 2017; asset abandoned post-CRL) · Fluoroketolide · Failed (CRL/Reg)

Deep Dives

Scoring-Decision Ablation

Only One of Three Signatures Scores

Sprint-1 pre-publication ablation · 2026-05-16
Sprint-1 added three candidate antibacterial multipliers, each pre-registered on mechanism, endpoint-design or financial-structure evidence dated before each drug's decision date. A pre-publication ablation scored them as separate arms: baseline 0.524; M3 only 0.631; M1+M2 only 0.797; all three 0.782. M1 (hepatotoxicity class) and M2 (sustained-clinical-response, or SCR, endpoint fragility) dominate the headline yet fire only on this cohort's failures, with zero approved counterexamples: they are imported priors the cohort structurally cannot self-validate. M3 (single-asset sponsor fragility) fires on three approvals too, so it is cohort-validatable.
Decision: only M3 scores (final 0.629 after a same-day LPAD-gate fix). M1/M2 are demoted to non-scored risk flags. The full ablation is published rather than its largest numbers (0.797 for M1+M2 alone, 0.782 for all three), because a headline that is mostly an unvalidatable prior is not one a CMO advisor should be asked to trust.
0.629
Scored AUC (PASS)
0.782
All-3 (Not Shipped)
M3
Sole Scored Signal
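The scoring decision above reduces to one rule: a signature is self-validatable in this cohort only if it fires on at least one approved program as well as on at least one failure. A minimal sketch of that gate follows; the function names are hypothetical. The M2 firing pattern (exactly the three failed C. difficile programs) and M3's three approved hits come from this page, while `failed_program_x` is a hypothetical placeholder because the page does not name the failures M3 fires on.

```python
def is_cohort_validatable(fires_on: set[str], approved: set[str], failed: set[str]) -> bool:
    """True only if the signature has approved counterexamples as well as failure hits.

    A signature that fires exclusively on failures separates the cohort by
    construction, so the cohort cannot test it.
    """
    return bool(fires_on & approved) and bool(fires_on & failed)


def scoring_decision(fires_on: set[str], approved: set[str], failed: set[str]) -> str:
    """Route a candidate signature to scoring or to a non-scored flag."""
    if is_cohort_validatable(fires_on, approved, failed):
        return "scored multiplier"
    return "non-scored risk flag"


if __name__ == "__main__":
    # Subset of the cohort; "failed_program_x" is a hypothetical placeholder.
    approved = {"plazomicin", "eravacycline_cIAI", "lefamulin", "fidaxomicin", "bezlotoxumab"}
    failed = {"surotomycin", "cadazolid", "ridinilazole", "failed_program_x"}

    m2_fires = {"surotomycin", "cadazolid", "ridinilazole"}
    m3_fires = {"plazomicin", "eravacycline_cIAI", "lefamulin", "failed_program_x"}

    print("M2:", scoring_decision(m2_fires, approved, failed))  # non-scored risk flag
    print("M3:", scoring_decision(m3_fires, approved, failed))  # scored multiplier
```

The gate checks only whether a signature can be falsified by the cohort, not whether it is clinically right; a failure-only signature can still be carried as a flag, which is how M1/M2 are treated.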
Honest Limitation

C. difficile Sub-Cohort Does Not Separate

5 C. difficile Phase 3 programs · 2 approved / 3 failed
Five C. difficile Phase 3 programs span the window: fidaxomicin (2011) and bezlotoxumab (2016) were approved; surotomycin, cadazolid, and ridinilazole failed. Post-Sprint-1 the scored engine does not separate them. Predicted-PoS rank: cadazolid (failed) 92.2% > bezlotoxumab (approved) 91.8% > ridinilazole (failed) 90.9% > fidaxomicin (approved) 90.5% > surotomycin (failed) 88.9%. A failure outranks both approvals; only 3 of the 6 approved×failed pairs are concordant, so the sub-cohort AUC is 0.500, exactly chance, and we report that honestly. The M2 SCR-fragility multiplier, when scored, produced a spurious perfect sub-cohort AUC of 1.000 by firing on exactly the three failures: the overfitting signature the ablation was designed to catch.
M2 was demoted to a non-scored flag; all three failed CDI drugs carry SCR_ENDPOINT_FRAGILITY in the dossier. Honest scored non-separation plus a surfaced design-risk flag beats a fake-perfect number that collapses under five minutes of CMO scrutiny.
1.000
Spurious (M2 Scored)
At chance
Scored Sub-Cohort
Flag
SCR Risk Carried
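The sub-cohort figures can be checked directly from the five predicted PoS values listed above: 3 of the 6 approved×failed pairs are concordant, so the scored AUC is 0.500, and a large enough penalty applied only to the three failures reorders them below both approvals and yields the spurious 1.000. This is a minimal sketch assuming the M2-style flag acts as an odds-ratio penalty; the 0.80 value is a hypothetical stand-in borrowed from M3's size, not M2's actual multiplier, which this page does not state. Counting ties as 0.5 is also an assumption (there are no ties here).

```python
from itertools import product

# Predicted Phase 3-entry PoS (%) for the five C. difficile programs, from the text above.
APPROVED = {"bezlotoxumab": 91.8, "fidaxomicin": 90.5}
FAILED = {"cadazolid": 92.2, "ridinilazole": 90.9, "surotomycin": 88.9}

STAND_IN_ODDS_RATIO = 0.80  # hypothetical M2 size; the real value is not given


def pairwise_auc(approved, failed):
    """Share of approved x failed pairs where the approved drug ranks higher (ties 0.5)."""
    pairs = list(product(approved, failed))
    score = sum(1.0 if a > f else 0.5 if a == f else 0.0 for a, f in pairs)
    return score / len(pairs)


def apply_odds_ratio(pos_pct: float, odds_ratio: float) -> float:
    """Scale the odds of a percentage PoS and convert back to a percentage."""
    p = pos_pct / 100.0
    odds = p / (1.0 - p) * odds_ratio
    return 100.0 * odds / (1.0 + odds)


scored_auc = pairwise_auc(APPROVED.values(), FAILED.values())

# The M2-style flag fires on exactly the three failures, so only they are penalized.
penalized = {name: apply_odds_ratio(pos, STAND_IN_ODDS_RATIO) for name, pos in FAILED.items()}
spurious_auc = pairwise_auc(APPROVED.values(), penalized.values())

if __name__ == "__main__":
    print(f"scored sub-cohort AUC: {scored_auc:.3f}")        # 0.500
    print(f"with failure-only penalty: {spurious_auc:.3f}")  # 1.000
    print({name: round(pos, 2) for name, pos in penalized.items()})
```

The margin is the point: cadazolid only moves to about 90.4%, just below fidaxomicin's 90.5%, yet that is enough for perfect separation. With five programs, a modest failure-only multiplier can manufacture a 1.000 that says nothing about out-of-cohort performance.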

Discrimination is real but modest, and rank-only. Post-Sprint-1 pairwise AUC is 0.629 (was 0.524) with a separation gap of only 0.7pp: the engine ranks approved above failed programs better than chance but does not separate them with confident probabilities, and its point estimate runs high (mean ~91% vs observed 69.4%). Only one of three failure signatures legitimately scores: M1/M2 are deliberately non-scored flags because they fire only on this cohort's failures with no approved counterexample, so the cohort cannot validate them. The residual gap is the deliberate cost of that honesty; closing it defensibly needs an external validation set, not a bigger in-cohort multiplier. Per-point detail, the full ablation, and the per-drug ledger are in the backtest methodology.

Engine version: PhaseFolio rNPV engine v1 · substrate methodology version: methodology@2026-05-21 · cohort verified via a 5-agent Claude Opus 4.7 LLM CMO-grade audit (no human medical officer) against primary sources (ClinicalTrials.gov, FDA approval letters, SEC 8-K filings).