PhaseFolio Validation Study

Back-Test Results: RA Drug Cohort

Sixteen historical rheumatoid arthritis (RA) drugs were evaluated against PhaseFolio's rNPV engine using indication-specific transition rates computed from 679 curated clinical trials. A phase-controlled AUC of 0.575 indicates genuine discriminative signal.

April 2026 · 16 drugs · 679 enriched trials · 10,000 Monte Carlo iterations per drug
Phase-Controlled AUC: 0.575 (target ≥0.55) · PASS
Computed Rate Lift: +0.150 vs. BIO/QLS-only baseline of 0.425 · LIFT
Risk Flag Sensitivity: 87.5% (7/8 failures flagged; target ≥70%) · PASS
Threshold Precision: 100% at ≥40% PoS (0 false positives) · PASS

Key finding: Indication-specific transition rates computed from 679 curated RA clinical trials improved the model's phase-controlled AUC from 0.425 (industry benchmarks only) to 0.575, crossing the 0.55 pass threshold. The primary driver was replacing the BIO/QLS NDA/BLA success rate of 91% with a computed rate of ~42% derived from actual FDA approval data.

How We Built the Dataset

Raw ClinicalTrials.gov data lacks the drug-level structure needed to compute transition rates. We built a 9-stage enrichment pipeline that transforms 1,304 raw RA trials into 679 curated records with CMO-grade intelligence.

1. Ingest Raw CT.gov Data
192,411 interventional studies were ingested via the ClinicalTrials.gov API. Linked condition mappings (420K rows) and intervention data (424K rows) are stored in Supabase.
2. Filter for Rheumatoid Arthritis
1,304 unique RA trials were identified by condition text matching across Phase 1 through Phase 4, spanning the 1990s to the present.
3. Cross-Reference 4 Data Sources
An AI agent enriches each trial by cross-referencing ClinicalTrials.gov (structured fields), FDA Drugs@FDA (regulatory data and approval dates), PubMed (published efficacy), and web search (press releases, analyst reports). A confidence score is computed per trial.
4. Drug-Class Knowledge Mapping
Pharmacology domain knowledge is applied per drug class to populate drug_class, mechanism_of_action, molecular_target, modality, route, and dosing. Work was batched by class: Anti-TNF first (~180 trials), then JAK (~120), IL-6, Anti-CD20, and so on. 32 drug classes were identified and consolidated.
5. Outcome & Efficacy Extraction
Published pivotal trial results were mapped, including ACR20/50/70 response rates, p-values, and comparator results. Terminated trials were mapped via CT.gov’s why_stopped field. Strict anti-hallucination rules applied: include only numbers held with high confidence, and always cite the study name and timepoint.
6. Verification & Bias Checks
Random-sample spot checks, drug class distribution sanity checks, and FDA date cross-referencing. Completion rates in the raw (1,304) and enriched (679) datasets agree within 0.5pp at every phase, so there is no sign of survivorship bias.
679 enriched trials · 71 distinct drugs · 32 drug classes · 45 columns per trial
Outcome summary coverage: 100%
Drug class / MoA / target coverage: 99.9%
FDA regulatory linkage: 73%
Quantitative efficacy data: 55%

Data integrity verified: We compared completion-to-termination ratios between raw CT.gov data (1,304 RA trials) and the enriched dataset (679 trials). Rates are virtually identical at every phase (within 0.5pp), confirming the enrichment process did not selectively retain successful trials. The 625 excluded trials lacked drug-level metadata (non-drug interventions, unmappable entries), not outcomes.
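A minimal sketch of this parity check follows. It assumes each phase's "completion-to-termination ratio" is expressed as the completed share of trials that either completed or terminated (the gap is reported in percentage points, so a share seems implied). `PhaseCounts`, `parity_gaps_pp`, and `passes_parity` are hypothetical names; only the 0.5pp tolerance comes from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhaseCounts:
    """Hypothetical per-phase tally of trials that reached an end state."""
    completed: int
    terminated: int

    @property
    def completion_share(self) -> float:
        """Completed trials as a fraction of completed + terminated (NaN if none)."""
        total = self.completed + self.terminated
        return self.completed / total if total else float("nan")


def parity_gaps_pp(
    raw: dict[str, PhaseCounts], enriched: dict[str, PhaseCounts]
) -> dict[str, float]:
    """Absolute completion-share gap, in percentage points, for phases present in both datasets."""
    return {
        phase: abs(raw[phase].completion_share - enriched[phase].completion_share) * 100
        for phase in raw.keys() & enriched.keys()
    }


def passes_parity(
    raw: dict[str, PhaseCounts],
    enriched: dict[str, PhaseCounts],
    tolerance_pp: float = 0.5,
) -> bool:
    """True only if both datasets cover the same phases and every gap is within tolerance."""
    if raw.keys() != enriched.keys():
        return False
    gaps = parity_gaps_pp(raw, enriched)
    # A NaN gap (phase with no finished trials) compares False, so it fails the check.
    return bool(gaps) and all(gap <= tolerance_pp for gap in gaps.values())
```

Requiring identical phase coverage mirrors the "at every phase" wording: a phase that silently disappears from the enriched set should fail the check rather than be skipped.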

How the Back-Test Works

Each drug is evaluated using only information available before its real-world decision point. No future data leaks into the model.

1. Curate 679 RA Trials
Deep Dive agent enriches raw CT.gov data with FDA, PubMed, and web sources. 71 distinct drugs, 45 structured columns.
2. Compute Drug-Level Transition Rates
Time-gated: only data before decision date. Drug-level counting (not trial-level). 3-tier fallback: drug-class (n≥5) → RA-overall → BIO/QLS benchmark.
3. Reconstruct Decision Point
For each drug, identify what was known at its go/no-go moment. Costs, competitive landscape, target validation history.
4. Apply Multipliers
Target validation (0-2+ prior class approvals), competitive density, risk flags. All via logistic adjustment to keep PoS bounded.
5. Run rNPV Engine + Monte Carlo
10,000 iterations per drug with Bernoulli stage gates. Same production engine used by PhaseFolio customers.
6. Score Against Actual Outcomes
Pairwise AUC, phase-controlled AUC, go/no-go threshold sweep, risk flag sensitivity.
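The logistic adjustment in step 4 can be sketched as follows. This assumes each multiplier (for example a 0.60x target-validation multiplier) scales the odds of success, which is the same as adding ln(m) in log-odds space; the exact PhaseFolio formula is not given here, and `adjust_pos` is a hypothetical helper.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def adjust_pos(base_pos: float, multipliers: list[float]) -> float:
    """Scale the odds of success by each multiplier; result stays within (0, 1) up to float precision."""
    if not 0.0 < base_pos < 1.0:
        raise ValueError("base_pos must be strictly between 0 and 1")
    x = logit(base_pos)
    for m in multipliers:
        if m <= 0:
            raise ValueError("multipliers must be positive")
        x += math.log(m)  # multiplying the odds by m adds ln(m) to the log-odds
    return inv_logit(x)


print(adjust_pos(0.5, [0.6]))  # about 0.375: odds go from 1.0 to 0.6
```

Under this assumption a 0.60x multiplier turns a 50% PoS into 37.5% rather than the 30% a plain multiplication would give, and no combination of multipliers can push PoS to exactly 0 or 1. The order in which multipliers are applied does not matter.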

Predicted Cumulative PoS by Drug

Bars show the model's predicted cumulative probability of success (PoS) for each drug, sorted within each group. All values use only data available before each drug's decision date (no hindsight).

Approved (8 drugs)
Adalimumab (Humira) · Anti-TNF: 54.1%
Etanercept (Enbrel) · Anti-TNF: 40.7%
Rituximab (Rituxan) · Anti-CD20: 30.3%
Sarilumab (Kevzara) · IL-6: 27.8%
Abatacept (Orencia) · T-cell: 27.3%
Tofacitinib (Xeljanz) · JAK: 21.6%
Baricitinib (Olumiant) · JAK: 21%
Upadacitinib (Rinvoq) · JAK: 11.2%
Failed (8 drugs)
Filgotinib · JAK: 36%
Ocrelizumab · Anti-CD20: 33%
Fostamatinib · SYK: 30.9%
Peficitinib · JAK: 30.7%
Tabalumab · Anti-BAFF: 21.7%
Decernotinib · JAK: 16.7%
Vobarilizumab · IL-6: 14.4%
Atacicept · BAFF/APRIL: 9.4%
Mean PoS (approved): 29.3% · Mean PoS (failed): 24.1% · Separation: +5.2pp
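The summary line can be recomputed from the 16 listed values. This sketch treats pairwise AUC as the share of (approved, failed) pairs in which the approved drug has the higher predicted PoS, with ties counted as half (the Mann-Whitney form). That definition is an assumption, though it reproduces the 0.547 pairwise AUC quoted for this cohort (35 of 64 pairs ranked correctly). Threshold precision is likewise assumed to be approved drugs divided by all drugs at or above the cutoff.

```python
# Predicted cumulative PoS (%) as listed above.
approved = {
    "Adalimumab": 54.1, "Etanercept": 40.7, "Rituximab": 30.3, "Sarilumab": 27.8,
    "Abatacept": 27.3, "Tofacitinib": 21.6, "Baricitinib": 21.0, "Upadacitinib": 11.2,
}
failed = {
    "Filgotinib": 36.0, "Ocrelizumab": 33.0, "Fostamatinib": 30.9, "Peficitinib": 30.7,
    "Tabalumab": 21.7, "Decernotinib": 16.7, "Vobarilizumab": 14.4, "Atacicept": 9.4,
}


def pairwise_auc(pos: list[float], neg: list[float]) -> float:
    """Share of (approved, failed) pairs ranked correctly; ties count as half."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_at(threshold: float, pos: list[float], neg: list[float]) -> float:
    """Approved drugs as a share of all drugs at or above the PoS threshold."""
    tp = sum(p >= threshold for p in pos)
    fp = sum(n >= threshold for n in neg)
    return tp / (tp + fp) if tp + fp else float("nan")


approved_pos = list(approved.values())
failed_pos = list(failed.values())
mean_a = sum(approved_pos) / len(approved_pos)
mean_f = sum(failed_pos) / len(failed_pos)
print(f"mean approved {mean_a:.2f}%, mean failed {mean_f:.2f}%, separation {mean_a - mean_f:+.2f}pp")
print(f"pairwise AUC {pairwise_auc(approved_pos, failed_pos):.3f}")  # 35 of 64 pairs
print(f"precision at >=40% PoS {precision_at(40.0, approved_pos, failed_pos):.0%}")
```

Only Adalimumab and Etanercept clear 40%, and both were approved, which is where the 100% threshold precision with zero false positives comes from.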

16-Drug RA Back-Test Cohort

Drug · Brand · Mechanism · Outcome
Adalimumab · Humira · Anti-TNF · Approved
Etanercept · Enbrel · Anti-TNF · Approved
Tofacitinib · Xeljanz · JAK · Approved
Upadacitinib · Rinvoq · JAK · Approved
Baricitinib · Olumiant · JAK · Approved
Abatacept · Orencia · T-cell · Approved
Sarilumab · Kevzara · IL-6 · Approved
Rituximab · Rituxan · Anti-CD20 · Approved
Tabalumab · n/a · Anti-BAFF · Failed
Fostamatinib · n/a · SYK · Failed
Filgotinib · n/a · JAK · Failed
Peficitinib · n/a · JAK · Failed
Atacicept · n/a · BAFF/APRIL · Failed
Ocrelizumab · n/a · Anti-CD20 · Failed
Decernotinib · n/a · JAK · Failed
Vobarilizumab · n/a · IL-6 · Failed

Deep Dives

Strongest No-Go Signal

Atacicept

BAFF/APRIL inhibitor · Merck Serono · Decision: January 2008
PhaseFolio assigned the lowest cumulative PoS in the cohort (9.4%), with three risk flags: FIRST_IN_CLASS_RISK, NOVEL_MODALITY, and LIMITED_TRIAL_DATA. The target validation multiplier was 0.60x, reflecting zero prior approvals in the class. Monte Carlo simulation showed a 90.5% probability of a negative outcome.
Actual outcome: Phase 2 terminated due to severe immunoglobulin reduction and fatal infections.
Predicted PoS: 9.4% · rNPV: $71M · P(Negative): 90.5%
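A stripped-down version of the Monte Carlo step with Bernoulli stage gates might look like the sketch below. It is not the production engine: it assumes each stage's cost and the launch value are already discounted to present value, and it ignores timing and revenue uncertainty. `Stage` and `simulate_rnpv` are hypothetical names.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """Hypothetical development stage: gate probability plus cost already discounted to present value."""
    name: str
    p_success: float
    cost_pv: float


def simulate_rnpv(
    stages: list[Stage],
    launch_value_pv: float,
    iterations: int = 10_000,
    seed: int = 0,
) -> tuple[float, float]:
    """Monte Carlo over Bernoulli stage gates. Returns (mean NPV, share of runs with NPV < 0)."""
    rng = random.Random(seed)
    total = 0.0
    negatives = 0
    for _ in range(iterations):
        npv = 0.0
        for stage in stages:
            npv -= stage.cost_pv                 # pay for the stage before its outcome is known
            if rng.random() >= stage.p_success:  # Bernoulli gate fails: program stops here
                break
        else:
            npv += launch_value_pv               # every gate cleared: collect the launch value
        total += npv
        if npv < 0:
            negatives += 1
    return total / iterations, negatives / iterations
```

When every failed path loses money and every successful path is profitable, P(negative) is approximately 1 minus cumulative PoS. For Atacicept that is 1 − 0.094 = 90.6%; with 10,000 iterations the sampling standard error is about √(0.094 × 0.906 / 10,000) ≈ 0.29pp, so the reported 90.5% is within Monte Carlo noise. A positive $71M rNPV alongside a 90.5% chance of loss is consistent with value concentrated in the rare success path.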
Methodology Breakthrough

The Computed Rate Breakthrough

NDA/BLA transition rate correction · 679 enriched trials
Static BIO/QLS benchmarks assign a 91% NDA/BLA success rate, but that figure answers "given an NDA was filed, did it succeed?" Our computed rate of ~42% answers the actual investment question: "given a drug reached Phase 3, did it ultimately receive FDA approval?" This single correction moved phase-controlled AUC from 0.425 to 0.575.
This reframing of the NDA/BLA question is the primary source of discriminative signal in the model.
BIO/QLS NDA: 91% · Computed NDA: ~42% · AUC Lift: +0.150
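The computed rate can be sketched as a drug-level, time-gated count with the 3-tier fallback described in the back-test steps. Assumptions: records are RA-only Phase 3 outcomes, each dated when it became public; a drug counts as approved if any of its records says so; the RA-overall tier reuses the n≥5 threshold (the text states n≥5 only for the drug-class tier); and the BIO/QLS benchmark is passed in by the caller rather than hard-coded. `Phase3Outcome` and `phase3_to_approval_rate` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Phase3Outcome:
    """Hypothetical record: one Phase 3 outcome for one RA drug."""
    drug: str
    drug_class: str
    resolved_on: date  # when the outcome (approval or discontinuation) became known
    approved: bool


def phase3_to_approval_rate(
    records: list[Phase3Outcome],
    drug_class: str,
    decision_date: date,
    benchmark: float,
    min_n: int = 5,
) -> tuple[float, str]:
    """Return (rate, tier) using only outcomes known before decision_date."""
    # Drug-level counting: collapse all records for a drug into one entry.
    # Time gate: skip anything that became known on or after the decision date.
    per_drug: dict[str, tuple[str, bool]] = {}
    for r in records:
        if r.resolved_on >= decision_date:
            continue
        _, approved_so_far = per_drug.get(r.drug, (r.drug_class, False))
        per_drug[r.drug] = (r.drug_class, approved_so_far or r.approved)

    # Tier 1: the drug's own class, if it has at least min_n resolved drugs.
    in_class = [ok for cls, ok in per_drug.values() if cls == drug_class]
    if len(in_class) >= min_n:
        return sum(in_class) / len(in_class), "drug-class"

    # Tier 2: all RA drugs (assumption: same min_n threshold).
    all_ra = [ok for _, ok in per_drug.values()]
    if len(all_ra) >= min_n:
        return sum(all_ra) / len(all_ra), "RA-overall"

    # Tier 3: caller-supplied BIO/QLS benchmark.
    return benchmark, "benchmark"
```

Counting drugs rather than trials keeps a heavily trialled drug (many Phase 3 studies for one molecule) from dominating the rate, and the strict `<` on the decision date is what prevents future outcomes from leaking into the estimate.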

This validation uses 16 drugs: enough for a proof of concept, but not statistically powered for calibration. The phase-controlled AUC of 0.575 indicates the model has genuine discriminative signal beyond structural phase bias, but the pairwise AUC (0.547) and separation gap (5.2pp) leave room for improvement. Cross-indication validation (oncology) and larger cohorts are planned next steps. See the full research report for detailed methodology.
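Phase-controlled AUC is not defined on this page. One plausible reading, sketched below as an assumption, compares approved and failed drugs only when both were evaluated at the same decision phase and then pools the within-phase pairs, so a drug is never credited simply for being further along. `phase_controlled_auc` is a hypothetical helper, and the toy input is illustrative, not cohort data.

```python
import math
from collections import defaultdict
from itertools import product


def phase_controlled_auc(drugs: list[tuple[str, float, bool]]) -> float:
    """drugs: (decision_phase, predicted_pos, approved). Pairs are formed only within a phase."""
    by_phase: dict[str, tuple[list[float], list[float]]] = defaultdict(lambda: ([], []))
    for phase, pos, approved in drugs:
        approved_scores, failed_scores = by_phase[phase]
        (approved_scores if approved else failed_scores).append(pos)

    wins, pairs = 0.0, 0
    for approved_scores, failed_scores in by_phase.values():
        for a, f in product(approved_scores, failed_scores):
            wins += 1.0 if a > f else 0.5 if a == f else 0.0  # ties count as half
            pairs += 1
    return wins / pairs if pairs else math.nan


# Toy input (illustrative values only, not cohort data).
toy = [("Phase 2", 0.20, True), ("Phase 2", 0.10, False),
       ("Phase 3", 0.60, True), ("Phase 3", 0.50, False)]
print(phase_controlled_auc(toy))  # 1.0
```

Pooling all four toy drugs without the phase control would give 0.75, because the Phase 2 approved drug (0.20) ranks below the Phase 3 failure (0.50); the within-phase comparison removes that structural penalty.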