pharmacoml Tutorial

What the package expects

Input 1: EBEs (empirical Bayes estimates) or individual parameters

One row per subject with columns such as CL, V, KA, or other subject-level PK/PD quantities derived from a base model.

Input 2: Covariates

One row per subject with columns such as WT, AGE, SEX, CRCL, and ALB.

Example input files

ebes.csv

ID,CL,V,KA
1,5.21,48.3,1.10
2,6.03,55.1,0.92
3,4.88,44.7,1.25
4,5.67,51.0,1.05

covariates.csv

ID,WT,AGE,SEX,CRCL,ALB
1,72,55,1,88,4.1
2,81,49,0,95,3.8
3,64,63,1,70,4.0
4,77,46,0,91,4.2

analysis_dataset.csv

ID,CL,V,KA,WT,AGE,SEX,CRCL,ALB
1,5.21,48.3,1.10,72,55,1,88,4.1
2,6.03,55.1,0.92,81,49,0,95,3.8
3,4.88,44.7,1.25,64,63,1,70,4.0
4,5.67,51.0,1.05,77,46,0,91,4.2
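
Before fitting, it can help to confirm that both inputs really hold one row per subject and cover the same IDs. The sketch below uses only pandas and the toy rows above. The helper name `check_inputs` is hypothetical and is not part of pharmacoml.

```python
import io

import pandas as pd

EBES_CSV = """ID,CL,V,KA
1,5.21,48.3,1.10
2,6.03,55.1,0.92
3,4.88,44.7,1.25
4,5.67,51.0,1.05
"""

COV_CSV = """ID,WT,AGE,SEX,CRCL,ALB
1,72,55,1,88,4.1
2,81,49,0,95,3.8
3,64,63,1,70,4.0
4,77,46,0,91,4.2
"""


def check_inputs(ebes, covariates, id_col="ID"):
    """Hypothetical helper: return a list of problems found in the two tables."""
    problems = []
    for name, frame in (("ebes", ebes), ("covariates", covariates)):
        # Each table should hold exactly one row per subject.
        if frame[id_col].duplicated().any():
            problems.append(f"{name}: more than one row per subject")
        # Flag missing values so they can be handled deliberately.
        if frame.drop(columns=id_col).isna().any().any():
            problems.append(f"{name}: missing values present")
    # Subjects must appear in both tables to survive the merge.
    only_in_one = set(ebes[id_col]) ^ set(covariates[id_col])
    if only_in_one:
        problems.append(f"IDs not present in both tables: {sorted(only_in_one)}")
    return problems


ebes = pd.read_csv(io.StringIO(EBES_CSV))
covariates = pd.read_csv(io.StringIO(COV_CSV))
problems = check_inputs(ebes, covariates)
print(problems or "inputs look consistent")
```

An empty list means the two tables can be merged on ID without silently dropping or duplicating subjects.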

Basic run

import pandas as pd
from pharmacoml.covselect import HybridScreener

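# Load the two subject-level tables (one row per subject in each).
ebes = pd.read_csv("ebes.csv")
covariates = pd.read_csv("covariates.csv")

# Join on subject ID; validate="one_to_one" raises if an ID is duplicated.
df = ebes.merge(covariates, on="ID", validate="one_to_one")

# ID stays out of both lists so it is not treated as a parameter or covariate.
ebe_cols = ["CL", "V", "KA"]
cov_cols = ["WT", "AGE", "SEX", "CRCL", "ALB"]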

report = HybridScreener(include_scm=True).fit(
    ebes=df[ebe_cols],
    covariates=df[cov_cols],
)

print(report.confirmed_covariates())
print(report.candidate_covariates())
print(report.proxy_groups())
print(report.to_nonmem_candidates())

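If the EBEs and covariates are already merged into a single file, such as analysis_dataset.csv, read it directly:

import pandas as pd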
from pharmacoml.covselect import HybridScreener

df = pd.read_csv("analysis_dataset.csv")

report = HybridScreener(include_scm=True).fit(
    ebes=df[["CL", "V", "KA"]],
    covariates=df[["WT", "AGE", "SEX", "CRCL", "ALB"]],
)

print(report.confirmed_covariates())
print(report.candidate_covariates())

Optional: supply shrinkage

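If you know the ETA shrinkage from your base model, pass it in as a dictionary keyed by parameter name. High shrinkage pulls EBEs toward the population typical value, which can hide or distort covariate relationships, so supplying it makes the screening more credible for low-information parameters.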

parameter_shrinkage = {
    "CL": 0.12,
    "V": 0.28,
    "KA": 0.22,
}

report = HybridScreener(include_scm=True).fit(
    ebes=df[ebe_cols],
    covariates=df[cov_cols],
    parameter_shrinkage=parameter_shrinkage,
)
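
If your base-model output does not list shrinkage directly, one common SD-based definition is 1 - SD(individual ETAs) / sqrt(OMEGA variance). The sketch below applies that definition. The ETA values and the OMEGA variance are made up for illustration, and the function name `eta_shrinkage` is hypothetical; check which definition your estimation software reports before mixing sources.

```python
import math
import statistics


def eta_shrinkage(etas, omega_variance):
    """SD-based shrinkage: 1 - sample SD of individual ETAs / sqrt(OMEGA variance)."""
    return 1.0 - statistics.stdev(etas) / math.sqrt(omega_variance)


# Illustrative values only; they are not derived from the toy dataset.
eta_cl = [-0.20, 0.10, 0.25, -0.15]
omega_cl = 0.09  # assumed between-subject variance for CL from the base model

parameter_shrinkage = {"CL": round(eta_shrinkage(eta_cl, omega_cl), 3)}
print(parameter_shrinkage)
```

The resulting dictionary has the same shape as the `parameter_shrinkage` argument shown above.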

Example output

The snapshots below illustrate the shape of the output to expect after running the tutorial example. They are illustrative rather than exact: the BSA rows, for instance, refer to a covariate that is not among the columns of the toy files shown earlier on this page.

Summary table

  parameter covariate functional_form  combined_score       tier  support_count  scm_selected
0        CL        WT           power          0.9420       core              3          True
1        CL       AGE          linear          0.6110  candidate              2         False
2        CL       BSA          linear          0.4020      proxy              1         False
3         V        WT           power          0.8870       core              3          True
4         V       ALB          linear          0.1730   rejected              1         False

This table comes from report.summary() and is usually the first one users inspect.

Confirmed covariates

  parameter covariate functional_form confirmation_status
0        CL        WT           power                 scm
1         V        WT           power                 scm

Candidate covariates

  parameter covariate functional_form  combined_score       tier
0        CL        WT           power          0.9420       core
1        CL       AGE          linear          0.6110  candidate
2         V        WT           power          0.8870       core

Proxy groups

  parameter proxy_group_id representative proxy_member
0        CL            G1             WT          BSA
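
Proxy groups arise when covariates are strongly correlated with one another. A quick way to see which pairs might end up grouped is a pairwise Pearson correlation over the continuous covariate columns. The sketch below does this in plain Python on the toy covariates. The 0.9 cut-off is an arbitrary illustration, and how pharmacoml actually forms proxy groups is not described on this page.

```python
import itertools
import math

# Continuous covariates from the toy covariates.csv (SEX omitted as binary).
covariates = {
    "WT": [72, 81, 64, 77],
    "AGE": [55, 49, 63, 46],
    "CRCL": [88, 95, 70, 91],
    "ALB": [4.1, 3.8, 4.0, 4.2],
}


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


THRESHOLD = 0.9  # illustrative cut-off, not a pharmacoml default

flagged = []
for a, b in itertools.combinations(covariates, 2):
    r = pearson(covariates[a], covariates[b])
    if abs(r) >= THRESHOLD:
        flagged.append((a, b, round(r, 3)))

for a, b, r in flagged:
    print(f"{a} ~ {b}: r = {r}")
```

With only four subjects, almost any pair can look highly correlated, so treat this as a demonstration of the mechanics rather than a finding.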

NONMEM-style candidate block

# pharmacoml hybrid candidate covariates (nonmem)
# core = strongest evidence, candidate = carry forward to SCM/backward elimination
; [CORE] WT -> CL | form=power | score=0.942 scm=yes
; [CANDIDATE] AGE -> CL | form=linear | score=0.611
; [CORE] WT -> V | form=power | score=0.887 scm=yes
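
To carry a candidate into a model, each parameter, covariate, and functional form has to be written out as a covariate relationship. The sketch below emits one conventional parameterization: a power effect as (COV/ref)**THETA and a linear effect as 1 + THETA*(COV - ref). This reflects common practice and is not output that pharmacoml is known to generate. The function name, reference values, and THETA numbering are all hypothetical.

```python
def covariate_line(parameter, covariate, form, reference, theta_index):
    """Return a NONMEM-style expression for one covariate effect (illustrative)."""
    name = f"{parameter}{covariate}"  # e.g. CLWT
    if form == "power":
        return f"{name} = ({covariate}/{reference})**THETA({theta_index})"
    if form == "linear":
        return f"{name} = 1 + THETA({theta_index})*({covariate} - {reference})"
    raise ValueError(f"unsupported functional form: {form}")


# (parameter, covariate, form, assumed reference value)
candidates = [
    ("CL", "WT", "power", 70),
    ("CL", "AGE", "linear", 50),
    ("V", "WT", "power", 70),
]

# THETA numbering starts at 4 here, assuming THETA(1)-THETA(3) hold CL, V, and KA.
lines = [
    covariate_line(param, cov, form, ref, index)
    for index, (param, cov, form, ref) in enumerate(candidates, start=4)
]
print("\n".join(lines))
```

Each generated factor would then multiply the corresponding typical value in the model code.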

Optional advanced modes

Interaction screening

report = HybridScreener(
    include_scm=True,
    include_interactions=True,
    interaction_top_n=3,
).fit(df[ebe_cols], df[cov_cols])

Symbolic structure search

report = HybridScreener(
    include_symbolic=True,
    symbolic_backend="basis",
).fit(df[ebe_cols], df[cov_cols])

How to interpret the tiers

Tier         Meaning
`core`       Strongest ML-supported signals
`candidate`  Shortlist to carry into SCM/backward elimination
`confirmed`  Compact answer after SCM-style confirmation
`proxy`      Correlated alternatives to selected covariates
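
The tiers can also be pulled out of the summary table directly. The sketch below assumes that report.summary() returns a pandas DataFrame with the columns shown in the summary table above; the return type is not stated on this page, so the frame here is rebuilt by hand from that example output.

```python
import pandas as pd

# Hand-built copy of the example summary table (stand-in for report.summary()).
summary = pd.DataFrame(
    {
        "parameter": ["CL", "CL", "CL", "V", "V"],
        "covariate": ["WT", "AGE", "BSA", "WT", "ALB"],
        "functional_form": ["power", "linear", "linear", "power", "linear"],
        "combined_score": [0.942, 0.611, 0.402, 0.887, 0.173],
        "tier": ["core", "candidate", "proxy", "core", "rejected"],
    }
)

# Keep the rows worth carrying forward and rank them by score.
shortlist = (
    summary[summary["tier"].isin(["core", "candidate"])]
    .sort_values("combined_score", ascending=False)
    .reset_index(drop=True)
)
print(shortlist[["parameter", "covariate", "tier", "combined_score"]])
```

The same filter works for any other tier label that appears in the `tier` column.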

Which result should I look at first?

Start with report.confirmed_covariates(). It gives the most compact answer for everyday use and is the easiest view to hand to a pharmacometrician who wants the short list with the strongest confirmation behind it.

Then review report.candidate_covariates(). This is the broader shortlist and is often the most useful table to carry into SCM, backward elimination, or manual covariate review.

When do the other outputs matter?

Use report.core_covariates() when you want to see the strongest ML-supported signals before confirmation. Use report.proxy_groups() when correlated variables are plausible and you need to understand which covariates are acting as substitutes for one another. Use report.interaction_covariates() only when interaction screening has been enabled.

Important: pharmacoml is a covariate screening and preselection tool. It helps prioritize likely covariates before formal model confirmation; it is not a replacement for full NONMEM, nlmixr2, or final population-model validation.

Run pharmacoml on a new dataset