# Validation

Fixed public benchmark suite and generated reports

pharmacoml uses benchmark-gated development. The default hybrid workflow is evaluated on a fixed public suite, and the benchmark runner now writes a reusable Markdown/CSV/JSON report bundle for GitHub and release documentation.

## Run the benchmark suite

```bash
PYTHONPATH=. python benchmarks/run_public_benchmarks.py --check
```

By default this command prints the benchmark summary and writes report artifacts to `benchmarks/reports/fixed_public/`.

## Generated report artifacts

| File | Purpose |
| --- | --- |
| `public_benchmark_report.md` | Human-readable benchmark report for GitHub/docs |
| `public_benchmark_summary.csv` | Variant-level metrics summary |
| `public_benchmark_details.csv` | Per-case performance details |
| `public_benchmark_report.json` | Structured machine-readable report bundle |
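The JSON bundle is convenient for downstream tooling. The sketch below shows one way to consume it; the `suite`, `variants`, `name`, and `mean_f1` keys are illustrative assumptions, not the documented schema, so check the actual file before relying on them.

```python
import json

# Hypothetical miniature of the bundle layout; the real file may differ.
bundle_text = json.dumps({
    "suite": "fixed_public",
    "variants": [
        {"name": "baseline", "mean_f1": 0.78},
        {"name": "combined", "mean_f1": 0.91},
    ],
})

# In practice you would read benchmarks/reports/fixed_public/public_benchmark_report.json.
bundle = json.loads(bundle_text)

# Pick the variant with the highest mean F1 across the suite.
best = max(bundle["variants"], key=lambda v: v["mean_f1"])
print(best["name"])
```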

## Current fixed release suite

The current benchmark-backed default workflow has exact agreement on the real/public PK cases and several targeted synthetic checks, with the remaining gaps concentrated in the hardest collinearity-heavy synthetic scenarios.

| Dataset | Target / published covariates | Current agreement | Source / data |
| --- | --- | --- | --- |
| `pheno` | CL/WGT, VC/WGT, VC/ASPHYXIA | Exact | Pharmpy example model/data |
| `eleveld_union` | A1V2, AGE, HGT, M1F2, PMA, TECH, WGT | Exact | Wahlquist public propofol benchmark repo |
| `ggpmx_theophylline` | CL/AGE0, CL/SEX_1, CL/STUD_2, CL/WT0, V/WT0 | Exact | ggPMX theophylline example files |
| `high_shrinkage_user_input` | CL/WT | Exact | Generated in package |
| `age_pma_distinct` | CL/AGE, CL/PMA, CL/WT | Exact | Generated in package |
| `interaction_xor_screening` | CL/COPD, CL/SMK, CL/COPD__xor__SMK | Exact | Generated in package |
| `asiimwe_correlated_small_n` | CL/CRCL, CL/SEX, CL/WT, V/WT | Partial | Generated in package |
| `shapcov_collinear` | CL/AGE, CL/CRCL, CL/WT, V/WT | Partial | Generated in package |
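The Exact/Partial labels can be understood as a set comparison between the covariates the workflow selects and the published target set. The helper below is a hypothetical illustration of that reading, not the package's own scoring code:

```python
def agreement(selected, published):
    """Classify agreement between a selected and a published covariate set.

    Assumed semantics: "Exact" means the sets match, "Partial" means they
    overlap without matching, "None" means no overlap at all.
    """
    selected, published = set(selected), set(published)
    if selected == published:
        return "Exact"
    if selected & published:
        return "Partial"
    return "None"

print(agreement({"CL/WT"}, {"CL/WT"}))                    # Exact
print(agreement({"CL/WT", "CL/AGE"}, {"CL/WT", "V/WT"}))  # Partial
```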

## How to read the benchmark output

### Primary summary

Compares configuration variants such as baseline, RFE, shrinkage-aware, and combined workflows across the fixed suite.

### Per-case details

Shows precision, recall, F1, and FDR for each benchmark case; use these to see where the workflow helps and where it remains conservative.
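Treating covariate selection as a retrieval problem, the per-case metrics follow the standard definitions. A minimal sketch (the `case_metrics` helper is hypothetical, not the package API):

```python
def case_metrics(selected, published):
    """Standard precision/recall/F1/FDR over covariate sets."""
    selected, published = set(selected), set(published)
    tp = len(selected & published)   # correctly selected covariates
    fp = len(selected - published)   # spurious selections
    fn = len(published - selected)   # missed covariates
    precision = tp / (tp + fp) if selected else 0.0
    recall = tp / (tp + fn) if published else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    fdr = fp / (tp + fp) if selected else 0.0  # false discovery rate
    return {"precision": precision, "recall": recall, "f1": f1, "fdr": fdr}

# One true hit (CL/WT), one spurious pick, one miss:
print(case_metrics({"CL/WT", "CL/AGE"}, {"CL/WT", "CL/PMA"}))
# {'precision': 0.5, 'recall': 0.5, 'f1': 0.5, 'fdr': 0.5}
```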

The benchmark gate is used to choose defaults. New features should only become default behavior if they improve or preserve the pinned public baseline.
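In gate terms, "improve or preserve" means no tracked metric may fall below the pinned baseline. The sketch below illustrates that policy; the metric names and threshold values are made up for illustration and are not the real pinned baseline:

```python
# Hypothetical pinned baseline; higher is better for both metrics shown here.
PINNED_BASELINE = {"mean_f1": 0.90, "exact_cases": 6}

def passes_gate(candidate, baseline=PINNED_BASELINE):
    """A candidate default must match or beat the baseline on every metric."""
    return all(candidate.get(k, 0) >= v for k, v in baseline.items())

print(passes_gate({"mean_f1": 0.92, "exact_cases": 6}))  # True
print(passes_gate({"mean_f1": 0.88, "exact_cases": 7}))  # False: F1 regressed
```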

## Useful command variants

```bash
# write report bundle to the default location
PYTHONPATH=. python benchmarks/run_public_benchmarks.py
```