Tutorial

Run pharmacoml on a new dataset

This tutorial shows the most common workflow: start with subject-level parameters or empirical Bayes estimates (EBEs) plus covariates, run the hybrid screener, inspect the shortlist, and export candidate covariates for downstream pharmacometric (PMx) confirmation.

What the package expects

Input 1: EBEs or individual parameters

One row per subject with columns such as CL, V, KA, or other subject-level PK/PD quantities derived from a base model.

Input 2: Covariates

One row per subject with columns such as WT, AGE, SEX, CRCL, and ALB.

Example input files

ebes.csv

ID,CL,V,KA
1,5.21,48.3,1.10
2,6.03,55.1,0.92
3,4.88,44.7,1.25
4,5.67,51.0,1.05

covariates.csv

ID,WT,AGE,SEX,CRCL,ALB
1,72,55,1,88,4.1
2,81,49,0,95,3.8
3,64,63,1,70,4.0
4,77,46,0,91,4.2
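
Because both inputs are expected to hold exactly one row per subject, it can help to check that before merging. The following is a minimal sketch that uses only pandas and the toy data above; the helper name check_subject_level is hypothetical and is not part of pharmacoml.

```python
import io

import pandas as pd

EBES_CSV = """ID,CL,V,KA
1,5.21,48.3,1.10
2,6.03,55.1,0.92
3,4.88,44.7,1.25
4,5.67,51.0,1.05
"""

COV_CSV = """ID,WT,AGE,SEX,CRCL,ALB
1,72,55,1,88,4.1
2,81,49,0,95,3.8
3,64,63,1,70,4.0
4,77,46,0,91,4.2
"""


def check_subject_level(ebes, covariates, id_col="ID"):
    """Hypothetical helper: require one row per subject and matching IDs."""
    for name, frame in (("ebes", ebes), ("covariates", covariates)):
        dupes = frame.loc[frame[id_col].duplicated(), id_col].unique()
        if len(dupes) > 0:
            raise ValueError(f"{name} has repeated IDs: {list(dupes)}")
    # IDs that appear in one table but not the other
    unmatched = set(ebes[id_col]) ^ set(covariates[id_col])
    if unmatched:
        raise ValueError(f"IDs present in only one table: {sorted(unmatched)}")
    # validate= makes pandas raise if the join is not one-to-one
    return ebes.merge(covariates, on=id_col, validate="one_to_one")


ebes = pd.read_csv(io.StringIO(EBES_CSV))
covariates = pd.read_csv(io.StringIO(COV_CSV))
df = check_subject_level(ebes, covariates)
print(df.shape)
```

Failing early on duplicated or unmatched IDs avoids silently dropping subjects, since a plain merge on ID keeps only the rows present in both tables.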

Basic run

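import pandas as pd
from pharmacoml.covselect import HybridScreener

# Load the two subject-level tables (one row per ID in each).
ebes = pd.read_csv("ebes.csv")
covariates = pd.read_csv("covariates.csv")

# Join on ID so EBE and covariate rows are aligned subject by subject.
df = ebes.merge(covariates, on="ID")

ebe_cols = ["CL", "V", "KA"]
cov_cols = ["WT", "AGE", "SEX", "CRCL", "ALB"]

# Run the hybrid screener with SCM-style confirmation enabled.
report = HybridScreener(include_scm=True).fit(
    ebes=df[ebe_cols],
    covariates=df[cov_cols],
)

print(report.confirmed_covariates())   # compact list after SCM-style confirmation
print(report.candidate_covariates())   # broader shortlist
print(report.proxy_groups())           # correlated alternatives
print(report.to_nonmem_candidates())   # candidates for downstream confirmation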

Optional: supply shrinkage

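If you know the ETA shrinkage of each parameter from your base model, pass it in. EBEs with high shrinkage are pulled toward the population mean and carry less individual information, so supplying shrinkage lets the screener take that into account for low-information parameters.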

parameter_shrinkage = {
    "CL": 0.12,
    "V": 0.28,
    "KA": 0.22,
}

report = HybridScreener(include_scm=True).fit(
    ebes=df[ebe_cols],
    covariates=df[cov_cols],
    parameter_shrinkage=parameter_shrinkage,
)
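
If your estimation software does not report shrinkage directly, the SD-based definition is commonly computed as 1 - SD(individual ETAs) / omega, where omega is the estimated between-subject standard deviation. The sketch below uses made-up ETA values and a made-up omega, and the function name eta_shrinkage_sd is hypothetical. This page does not state whether pharmacoml expects SD-based or variance-based shrinkage, so treat the convention here as an assumption to verify.

```python
import statistics


def eta_shrinkage_sd(etas, omega_sd):
    """SD-based ETA shrinkage: 1 - SD(individual etas) / omega (omega as an SD)."""
    if omega_sd <= 0:
        raise ValueError("omega_sd must be positive")
    # Sample standard deviation (n - 1 denominator); an assumption of this sketch.
    return 1.0 - statistics.stdev(etas) / omega_sd


# Illustrative values only, not taken from any real model fit.
etas_cl = [0.10, -0.20, 0.25, -0.15]
omega_cl = 0.30  # square root of the estimated ETA variance for CL

parameter_shrinkage = {"CL": round(eta_shrinkage_sd(etas_cl, omega_cl), 2)}
print(parameter_shrinkage)
```

The resulting dictionary has the same shape as the parameter_shrinkage argument shown above: parameter name mapped to a fraction between 0 and 1.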

Example output

The snapshot below shows the shape of the output a user should expect after running the tutorial example. It includes a BSA row, which is not among the covariates in the toy files shown earlier on this page, so treat the values as illustrative.

Summary table

  parameter covariate functional_form  combined_score       tier  support_count  scm_selected
0        CL        WT           power          0.9420       core              3          True
1        CL       AGE          linear          0.6110  candidate              2         False
2        CL       BSA          linear          0.4020      proxy              1         False
3         V        WT           power          0.8870       core              3          True
4         V       ALB          linear          0.1730   rejected              1         False
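
The functional_form column distinguishes power from linear relationships. As a rough illustration of what a power form means, the sketch below fits CL = theta * (WT / 70)^beta to the four toy subjects by ordinary least squares on the log scale. This is not necessarily how pharmacoml chooses functional forms; the helper fit_power and the reference weight of 70 are assumptions made for the example, and four subjects are far too few for a real estimate.

```python
import math

# Toy values copied from ebes.csv and covariates.csv above.
wt = [72, 81, 64, 77]
cl = [5.21, 6.03, 4.88, 5.67]


def fit_power(cov, param, ref):
    """Least-squares fit of log(param) = log(theta) + beta * log(cov / ref)."""
    x = [math.log(c / ref) for c in cov]
    y = [math.log(p) for p in param]
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta = sxy / sxx                          # exponent of the power model
    theta = math.exp(y_bar - beta * x_bar)    # typical value at cov == ref
    return theta, beta


theta, beta = fit_power(wt, cl, ref=70.0)
print(f"CL = {theta:.2f} * (WT/70)^{beta:.2f}")
```

A linear form would instead regress the parameter on the centered covariate without the log transform.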

Optional advanced modes

Interaction screening

report = HybridScreener(
    include_scm=True,
    include_interactions=True,
    interaction_top_n=3,
).fit(df[ebe_cols], df[cov_cols])

Symbolic structure search

report = HybridScreener(
    include_symbolic=True,
    symbolic_backend="basis",
).fit(df[ebe_cols], df[cov_cols])

How to interpret the tiers

Tier       Meaning
core       Strongest ML-supported signals
candidate  Shortlist to carry into SCM/backward elimination
confirmed  Compact answer after SCM-style confirmation
proxy      Correlated alternatives to selected covariates
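
To show how the tiers can be used in practice, here is a small sketch that filters a summary table down to the core and candidate rows and groups them by parameter. This page does not show how the summary table is retrieved from the report, so the frame is rebuilt by hand from the snapshot above; only standard pandas calls are used.

```python
import pandas as pd

# Hand-built copy of the summary table snapshot shown earlier.
summary = pd.DataFrame(
    {
        "parameter": ["CL", "CL", "CL", "V", "V"],
        "covariate": ["WT", "AGE", "BSA", "WT", "ALB"],
        "functional_form": ["power", "linear", "linear", "power", "linear"],
        "combined_score": [0.9420, 0.6110, 0.4020, 0.8870, 0.1730],
        "tier": ["core", "candidate", "proxy", "core", "rejected"],
        "support_count": [3, 2, 1, 3, 1],
        "scm_selected": [True, False, False, True, False],
    }
)

# Keep the tiers worth carrying forward, strongest score first.
shortlist = summary[summary["tier"].isin(["core", "candidate"])].sort_values(
    "combined_score", ascending=False
)

# Covariates to test per parameter, in descending score order.
by_param = shortlist.groupby("parameter")["covariate"].apply(list).to_dict()
print(by_param)
```

Grouping by parameter gives a compact per-parameter list that is convenient to carry into SCM or manual covariate review.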

Which result should I look at first?

Start with report.confirmed_covariates() for the most compact daily-use answer. This is the easiest view to hand to a pharmacometrician who wants the short list with the strongest confirmation behind it.

Then review report.candidate_covariates(). This is the broader shortlist and is often the most useful table to carry into SCM, backward elimination, or manual covariate review.

When do the other outputs matter?

Use report.core_covariates() when you want to see the strongest ML-supported signals before confirmation. Use report.proxy_groups() when covariates are likely to be correlated and you need to understand which ones are acting as substitutes for a selected covariate. Use report.interaction_covariates() only when interaction screening has been enabled.