PhaseFolio Validation Study

Back-Test Results: RA Drug Cohort

Sixteen historical rheumatoid arthritis (RA) drugs were evaluated with PhaseFolio's rNPV engine, using indication-specific transition rates computed from 679 curated clinical trials. The phase-controlled AUC of 0.575 clears the 0.55 target, confirming genuine discriminative signal.

April 2026 · 16 drugs · 679 enriched trials · 10,000 MC iterations per drug

Phase-Controlled AUC

0.575

target ≥0.55

PASS

Computed Rate Lift

+0.150

vs BIO/QLS-only baseline of 0.425

LIFT

Risk Flag Sensitivity

87.5%

7/8 failures flagged · target ≥70%

PASS

Threshold Precision

100%

at ≥40% PoS · 0 false positives

PASS

Key finding: Indication-specific transition rates computed from 679 curated RA clinical trials improved the model's phase-controlled AUC from 0.425 (industry benchmarks only) to 0.575, crossing the 0.55 pass threshold. The primary driver was replacing the 91% BIO/QLS NDA/BLA success rate with a rate of ~42% computed from actual FDA approval data.

Data Foundation

How We Built the Dataset

Raw ClinicalTrials.gov data lacks the drug-level structure needed to compute transition rates. We built a 9-phase enrichment pipeline that turns 1,304 raw RA trials into 679 curated records carrying drug-level metadata (class, mechanism, regulatory linkage, outcomes).

Ingest Raw CT.gov Data

192,411 interventional studies ingested via ClinicalTrials.gov API. Linked condition mappings (420K rows) and intervention data (424K rows) stored in Supabase.

Filter for Rheumatoid Arthritis

1,304 unique RA trials identified by condition text matching, covering Phase 1 through Phase 4 and spanning the 1990s to the present.

Cross-Reference 4 Data Sources

Each trial is enriched by an AI agent that cross-references four sources: ClinicalTrials.gov (structured fields), FDA Drugs@FDA (regulatory data and approval dates), PubMed (published efficacy), and web search (press releases, analyst reports). A confidence score is computed per trial.

Drug-Class Knowledge Mapping

Pharmacology domain knowledge applied per drug class: drug_class, mechanism_of_action, molecular_target, modality, route, dosing. Batched by class — Anti-TNF first (~180 trials), then JAK (~120), IL-6, Anti-CD20, etc. 32 drug classes identified and consolidated.

Outcome & Efficacy Extraction

Published pivotal trial results mapped: ACR20/50/70 response rates, p-values, comparator results. Terminated trials mapped via CT.gov’s why_stopped field. Strict anti-hallucination rules: only include numbers with high confidence, always cite study name and timepoint.

Verification & Bias Checks

Random-sample spot checks, drug-class distribution sanity checks, and FDA date cross-referencing. Completion rates in the raw (1,304) and enriched (679) datasets agree within 0.5pp at every phase, so we found no evidence of survivorship bias.

679

Enriched Trials

71

Distinct Drugs

32

Drug Classes

45

Columns Per Trial

Outcome Summary Coverage: 100%

Drug Class / MoA / Target: 99.9%

FDA Regulatory Linkage: 73%

Quantitative Efficacy Data: 55%

Data integrity verified: We compared completion-to-termination ratios between raw CT.gov data (1,304 RA trials) and the enriched dataset (679 trials). Rates are virtually identical at every phase (within 0.5pp), confirming the enrichment process did not selectively retain successful trials. The 625 excluded trials lacked drug-level metadata (non-drug interventions, unmappable entries), not outcomes.

Methodology

How the Back-Test Works

Each drug is evaluated using only information available before its real-world decision point. No future data leaks into the model.

Curate 679 RA Trials

Deep Dive agent enriches raw CT.gov data with FDA, PubMed, and web sources. 71 distinct drugs, 45 structured columns.

Compute Drug-Level Transition Rates

Time-gated: only data before decision date. Drug-level counting (not trial-level). 3-tier fallback: drug-class (n≥5) → RA-overall → BIO/QLS benchmark.
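The time gate and 3-tier fallback described above can be sketched as follows. This is a minimal illustration, not PhaseFolio's implementation: the `DrugTransition` record, the `transition_rate` function, and the rule that the RA-overall tier applies whenever any time-gated record exists are all assumptions made for the example. Only the n≥5 class threshold and the tier order come from the text.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple


@dataclass(frozen=True)
class DrugTransition:
    """One record per drug (drug-level counting), hypothetical schema."""
    drug: str
    drug_class: str
    resolved_on: date  # date the stage outcome became known
    advanced: bool     # True if the drug moved on to the next stage


def transition_rate(
    records: List[DrugTransition],
    drug_class: str,
    decision_date: date,
    benchmark_rate: float,
    min_class_n: int = 5,  # class-tier threshold stated in the text
) -> Tuple[float, str]:
    """Return (rate, tier) using drug-class -> RA-overall -> benchmark."""
    # Time gate: keep only outcomes known strictly before the decision date.
    known = [r for r in records if r.resolved_on < decision_date]

    # Tier 1: same drug class, if enough drugs have resolved.
    in_class = [r for r in known if r.drug_class == drug_class]
    if len(in_class) >= min_class_n:
        return sum(r.advanced for r in in_class) / len(in_class), "drug_class"

    # Tier 2 (assumed rule): all RA drugs, if any have resolved.
    if known:
        return sum(r.advanced for r in known) / len(known), "ra_overall"

    # Tier 3: static industry benchmark.
    return benchmark_rate, "benchmark"
```

Returning the tier name alongside the rate makes it easy to audit which drugs in a cohort fell back to the benchmark.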

Reconstruct Decision Point

For each drug, identify what was known at its go/no-go moment. Costs, competitive landscape, target validation history.

Apply Multipliers

Target validation (0-2+ prior class approvals), competitive density, risk flags. All via logistic adjustment to keep PoS bounded.
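One way a logistic adjustment keeps PoS bounded is to treat each multiplier as a multiplier on the odds rather than on the probability itself. The sketch below assumes that interpretation; the source does not specify the exact form PhaseFolio uses, and `adjust_pos` is a hypothetical name.

```python
import math
from typing import Iterable


def adjust_pos(base_pos: float, multipliers: Iterable[float]) -> float:
    """Apply multipliers in log-odds space so the result stays in (0, 1).

    Assumption: each multiplier scales the odds p / (1 - p).
    """
    multipliers = list(multipliers)
    if not 0.0 < base_pos < 1.0:
        raise ValueError("base_pos must be strictly between 0 and 1")
    if any(m <= 0.0 for m in multipliers):
        raise ValueError("multipliers must be positive")

    # logit(p) + sum(log m) is equivalent to multiplying the odds by each m.
    log_odds = math.log(base_pos / (1.0 - base_pos))
    log_odds += sum(math.log(m) for m in multipliers)

    # Inverse logit maps any real number back into (0, 1).
    return 1.0 / (1.0 + math.exp(-log_odds))
```

Under this assumption, a 0.60x multiplier on a 50% base PoS gives odds of 0.6 and an adjusted PoS of 0.375, while stacking two 2.0x multipliers on a 90% base gives 36/37 (about 97.3%) instead of the out-of-range 360% that direct multiplication would produce.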

Run rNPV Engine + Monte Carlo

10,000 iterations per drug with Bernoulli stage gates. Same production engine used by PhaseFolio customers.
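A Bernoulli stage gate can be sketched as one random draw per remaining stage, with the drug succeeding only if every draw passes. The function below is a simplified illustration that estimates cumulative PoS only; it omits the cash-flow side of rNPV, and `simulate_success_rate` and its parameters are hypothetical names rather than the production engine's API.

```python
import random
from typing import Sequence


def simulate_success_rate(
    stage_probs: Sequence[float],
    iterations: int = 10_000,  # iteration count stated in the text
    seed: int = 0,
) -> float:
    """Estimate cumulative PoS as the fraction of runs passing every gate."""
    rng = random.Random(seed)  # seeded for reproducible runs
    successes = 0
    for _ in range(iterations):
        # all() short-circuits, so a run stops at the first failed gate,
        # mirroring a program that is terminated at that stage.
        if all(rng.random() < p for p in stage_probs):
            successes += 1
    return successes / iterations
```

With independent gates, the estimate converges to the product of the stage probabilities, so the simulation mainly earns its keep once per-run costs, timelines, or revenues are attached to each path.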

Score Against Actual Outcomes

Pairwise AUC, phase-controlled AUC, go/no-go threshold sweep, risk flag sensitivity.

Results

Predicted Cumulative PoS by Drug

The model's predicted cumulative probability of success for each drug, sorted within each outcome group. All values are computed using only information available before each drug's decision point (no hindsight).

Approved (8 drugs)

Drug	Brand	Mechanism	Predicted PoS
Adalimumab	Humira	Anti-TNF	54.1%
Etanercept	Enbrel	Anti-TNF	40.7%
Rituximab	Rituxan	Anti-CD20	30.3%
Sarilumab	Kevzara	IL-6	27.8%
Abatacept	Orencia	T-cell	27.3%
Tofacitinib	Xeljanz	JAK	21.6%
Baricitinib	Olumiant	JAK	21%
Upadacitinib	Rinvoq	JAK	11.2%

Failed (8 drugs)

Drug	Mechanism	Predicted PoS
Filgotinib	JAK	36%
Ocrelizumab	Anti-CD20	33%
Fostamatinib	SYK	30.9%
Peficitinib	JAK	30.7%
Tabalumab	Anti-BAFF	21.7%
Decernotinib	JAK	16.7%
Vobarilizumab	IL-6	14.4%
Atacicept	BAFF/APRIL	9.4%

Mean PoS (approved): 29.3% · Mean PoS (failed): 24.1% · Separation: +5.2pp
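The summary figures can be recomputed from the per-drug values listed above. The sketch below uses the standard pairwise (Mann-Whitney) definition of AUC and a simple precision-at-threshold count; the function names are illustrative, and the phase-controlled AUC is not reproduced because the per-drug phase at decision time is not listed here.

```python
from typing import Sequence

# Predicted cumulative PoS (%) as listed above, in the order shown.
approved = [54.1, 40.7, 30.3, 27.8, 27.3, 21.6, 21.0, 11.2]
failed = [36.0, 33.0, 30.9, 30.7, 21.7, 16.7, 14.4, 9.4]


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


def pairwise_auc(pos: Sequence[float], neg: Sequence[float]) -> float:
    """Fraction of (approved, failed) pairs ranked correctly; ties count 0.5."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def precision_at(threshold: float, pos: Sequence[float], neg: Sequence[float]) -> float:
    """Share of drugs at or above the threshold that were actually approved."""
    true_pos = sum(p >= threshold for p in pos)
    false_pos = sum(n >= threshold for n in neg)
    if true_pos + false_pos == 0:
        raise ValueError("no drug reaches the threshold")
    return true_pos / (true_pos + false_pos)


auc = pairwise_auc(approved, failed)
precision_40 = precision_at(40.0, approved, failed)
```

On these values, 35 of the 64 approved/failed pairs are ranked correctly, so `auc` is 35/64 ≈ 0.547, and `precision_40` is 1.0 because only adalimumab and etanercept reach 40% and both were approved.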

Cohort

16-Drug RA Back-Test Cohort

Drug	Brand	Mechanism	Outcome
Adalimumab	Humira	Anti-TNF	Approved
Etanercept	Enbrel	Anti-TNF	Approved
Tofacitinib	Xeljanz	JAK	Approved
Upadacitinib	Rinvoq	JAK	Approved
Baricitinib	Olumiant	JAK	Approved
Abatacept	Orencia	T-cell	Approved
Sarilumab	Kevzara	IL-6	Approved
Rituximab	Rituxan	Anti-CD20	Approved
Tabalumab	—	Anti-BAFF	Failed
Fostamatinib	—	SYK	Failed
Filgotinib	—	JAK	Failed
Peficitinib	—	JAK	Failed
Atacicept	—	BAFF/APRIL	Failed
Ocrelizumab	—	Anti-CD20	Failed
Decernotinib	—	JAK	Failed
Vobarilizumab	—	IL-6	Failed

Case Studies

Deep Dives

Strongest No-Go Signal

Atacicept

BAFF/APRIL inhibitor · Merck Serono · Decision: January 2008

PhaseFolio assigned the lowest cumulative PoS in the cohort (9.4%) with three risk flags: FIRST_IN_CLASS_RISK, NOVEL_MODALITY, and LIMITED_TRIAL_DATA. The target validation multiplier applied 0.60x for zero prior approvals. Monte Carlo showed 90.5% probability of negative outcome.

Actual outcome: Phase 2 terminated due to severe immunoglobulin reduction and fatal infections.

9.4%

Predicted PoS

$71M

rNPV

90.5%

P(Negative)

Methodology Breakthrough

The Computed Rate Breakthrough

NDA/BLA transition rate correction · 679 enriched trials

Static BIO/QLS benchmarks assign a 91% NDA/BLA success rate, but that figure answers a narrow question: "given an NDA was filed, did it succeed?" Our computed rate of ~42% answers the real investment question: "given a drug reached Phase 3, did it ultimately get FDA approval?" Replacing the first rate with the second drove phase-controlled AUC from 0.425 to 0.575.

This reframing of the NDA/BLA question is the primary source of discriminative signal in the model.

91%

BIO/QLS NDA

~42%

Computed NDA

+0.150

AUC Lift

Limitations

This validation uses 16 drugs, which is sufficient for proof of concept but not statistically powered for calibration. The phase-controlled AUC of 0.575 confirms the model has genuine discriminative signal beyond structural phase bias, but the pairwise AUC (0.547) and the separation gap (5.2pp) indicate room for improvement. Cross-indication validation (oncology) and larger cohorts are planned next steps. See the full research report for detailed methodology.