Built 2023-12-04 using NMdata 0.1.3.902.
Please make sure to see the latest version of this vignette, available online.
This vignette aims at enabling you to:
Using NMscanData to read and combine all output and input data based only on (the path to) the Nonmem list file (understanding how NMscanData prioritizes output and input data in case of redundancy)
Switching between combining output and input data by mimicking the Nonmem data filters (IGNORE/ACCEPT) and merging by a row identifier
Configuring NMdata to return the data class of your preference (e.g. a data.table or a tbl) instead of a data.frame, which is the default
Using automatically generated meta data to look up information on
input and output tables, how they were combined, and results of checks
Including input data rows that were not processed by Nonmem
Combining such data sets for multiple models
If available, using an rds file to represent the input data in order to preserve all data properties (e.g. factor levels) from data set preparation
After having checked the rare exceptions, feeling confident that
NMscanData should work on all your Nonmem models
This vignette focuses on how to use
NMdata to automate
what needs to be trivial: get one dataset out of a Nonmem run, combining
all output tables and including additional columns and rows from the
input data. After scanning the Nonmem list file and/or control stream
for file and column names, the data files are read and combined.
In brevity, the most important steps are:
Scan the list file (.lst) to identify input and output table files
Read and combine all output tables
Read the input data and combine it with the output data
An additional complication is the potential renaming of input data
column names in the Nonmem $INPUT section.
NMscanData by default (but optionally) follows the column
names as read by Nonmem.
This way of reading the output and input data is fully compatible with most of the other great R packages for reading data from Nonmem.
In most cases, the steps above are not too hard to do. But with the
large degree of flexibility Nonmem offers, the code will likely have to
be adjusted between models. The implementation in NMdata
works for the vast majority of models and aims at preventing and
checking for as many caveats as possible. It is fast, too.
Default argument values can be configured depending on your setup (data standards, directory structure and other preferences).
Like the rest of
NMdata, this functionality assumes as
little as possible about how you work. It assumes nothing about the
Nonmem model itself and as little as possible about the organization of
data and file paths/names. This makes it powerful for meta analyses, for
reading a model developed by someone else - or one written by ourselves
when we used to do things slightly differently. It will work out of the
box in the vast majority of cases.
We start by attaching
NMdata. Also, I use
data.table for a few post-processing steps. You can just as
well use base R or
dplyr if you prefer.
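First, we attach the packages:

```r
library(NMdata)
library(data.table)
```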
For the examples we will be using files that are available in the
NMdata package. To type a little less, we use a shortcut to those files.
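The shortcut is a small helper pointing to the example files shipped with the package (a sketch; it assumes, based on the paths used later in this vignette, that the example Nonmem files live under examples/nonmem):

```r
## helper to locate example files shipped with the NMdata package
file.NMdata <- function(...) {
    system.file(file.path("examples/nonmem", ...), package = "NMdata")
}
```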
Depending on your Nonmem setup, habits and preferences, you may name
your control streams and list files differently than this vignette.
Here, we use the
NMdata default, which is
.lst. You can easily configure this to
match your preferences. See the FAQ
for how. So for now, rest assured that this is easy to adjust, and run
NMscanData on a control stream or a list file:
res1 <- NMscanData(file.NMdata("xgxr018.lst"))
#> Input and output data were searched for candidate unique row identifiers. None
#> found. To skip this check, please use merge.by.row=TRUE or merge.by.row=FALSE.
#> 
#> Model:  xgxr018 
#> 
#> Used tables, contents shown as used/total:
#>                  file     rows columns     IDs
#>       xgxr018_res.txt  905/905     6/6 150/150
#>  xgxr018_res_vols.txt  905/905     3/7 150/150
#>    xgxr018_res_fo.txt  150/150     1/2 150/150
#>     xgxr4.rds (input) 905/1502   21/23 150/150
#>              (result)      905    31+2     150
#> 
#> Input and output data combined by translation of
#> Nonmem data filters.
#> 
#> Distribution of rows on event types in returned data:
#>  EVID CMT output result
#>     0   2    755    755
#>     1   1    150    150
#>   All All    905    905
NMscanData tells that it has read a model called
xgxr018 and how output and input data were combined. We
shall see how these properties can be modified in a bit. Then follows an
overview of how much data was used from each of the data files that were
read: the output tables (defined in the $TABLE section(s) in the
.lst file) and the input data file, with the number of rows, columns,
and ID values used out of the totals available.
In the resulting data, 755 out of the 905 rows are observations
(EVID==0); the remaining 150 rows are dose events (EVID==1).
Let’s take a quick look at key properties of the data that was
returned. It’s a
data.frame with the additional
NMdata class (for now, we just use it as a data.frame).
The data used for the example is a PK single ascending dose data set, great thanks to the xgxr package authors.
The obtained dataset contains both model predictions (i.e. from
output tables) and a character variable,
trtact (i.e. from
input data). Only the
.lst (output control stream) file path
was supplied by us.
head(res1,n=2)
#>   ID NOMTIME TIME EVID CMT AMT DV FLAG STUDY     KA       Q PRED RES WRES    V2
#> 1 31       0    0    1   1   3  0    0     1 0.1812 2307400    0   0    0 0.042
#> 2 32       0    0    1   1   3  0    0     1 0.1812 2307400    0   0    0 0.042
#>       V3 BLQ CYCLE DOSE PART PROFDAY PROFTIME WEIGHTB   EFF0        CL EVENTU
#> 1 0.1785   0     1    3    1       1        0  87.031 56.461 0.7245691     mg
#> 2 0.1785   0     1    3    1       1        0 100.620 45.096 0.7245691     mg
#>     NAME TIMEUNIT TRTACT   flag trtact   model nmout
#> 1 Dosing    Hours   3 mg Dosing   3 mg xgxr018  TRUE
#> 2 Dosing    Hours   3 mg Dosing   3 mg xgxr018  TRUE
You may have noticed that when reading the model, the result was
reported to contain 31+2 columns. The "+2" refers to the last two
columns, which are added by NMscanData. One of them,
model, contains the name of the model, which is by default derived
from the list file name. See the "Recover rows" section later for what
the other column, nmout, does.
Columns in output data can overlap, and data can be available in both
output and input data. The following main principles are followed by
NMscanData:
Output data is prioritized over input data.
Columns are by default named as read by Nonmem, i.e. following the $INPUT section in Nonmem.
Columns that Nonmem does not read (e.g. due to DROP or SKIP) are included by NMscanData.
Columns dropped (DROP or SKIP) in $INPUT are named as in the input data file.
If input data rows are recovered (the recover.rows argument), no information from output is merged onto these rows.
Once you have data from NMscanData, the NMinfo function
can be used to browse meta information on what data was combined and how
that was done.
Above, we were told that "Input and output data combined by
translation of Nonmem data filters." Because of the
very commonly used IGNORE and ACCEPT
statements in Nonmem
$DATA sections, the rows in output
tables are often a subset of the input data rows. If no other
information is available,
NMscanData reads and interprets
IGNORE statements and applies
them to the input data before combining with the output data.
A more robust approach is using a unique row identifier in both input
data and output data.
NMscanData can use this for merging
the data. This means that the Nonmem data filters
are not interpreted at all. Even though NMscanData will most often
work without one, it is recommended to always include a unique
row identifier in both input and output tables (in fact, we just need it
in one full-length output table).
The following model happens to have such a unique row identifier in
the column called
ROW. The default
behavior is to use the row identifier if it can find it. The name of the
column with the row identifier can be supplied using the
col.row argument (and the default can be changed using the
NMdataConf function). The default is to look for a column called ROW.
All features shown below will work whether you supply
col.row or not. We use
col.row because it is
more robust and because it allows us to easily trace a row in the
analysis back to the source data. We are now told that the data was
merged by ROW - that's better.
res1.tbl <- NMscanData(file.NMdata("xgxr003.lst"),as.fun=tibble::as_tibble)
#> Model:  xgxr003 
#> 
#> Used tables, contents shown as used/total:
#>                  file     rows columns     IDs
#>       xgxr003_res.txt  905/905     7/7 150/150
#>  xgxr003_res_vols.txt  905/905     3/7 150/150
#>    xgxr003_res_fo.txt  150/150     1/2 150/150
#>     xgxr1.csv (input) 905/1502   21/24 150/150
#>              (result)      905    32+2     150
#> 
#> Input and output data merged by: ROW
#> 
#> Distribution of rows on event types in returned data:
#>  EVID CMT output result
#>     0   2    755    755
#>     1   1    150    150
#>   All All    905    905
For res1.tbl, we also added the
as.fun argument. The "
as." refers to
as.data.table, as_tibble etc. - a function applied to the data before
it's returned by
NMscanData (or any other
NMdata function). So now we have a tibble:
class(res1.tbl)
#> [1] "NMdata"     "tbl_df"     "tbl"        "data.frame"
I happen to be a
data.table user, so I am more
comfortable working that way. Instead of using the as.fun argument
all the time, we will change the default behavior using the
NMdataConf function. Because NMdata is implemented in
data.table, we don't need to pass the
data.table::as.data.table function but can (better) use
the character string "data.table" (this is
the exception - for anything else, please pass a function):
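The configuration change is a single call (as the comment in a later code chunk in this vignette confirms, this is the value set here):

```r
NMdataConf(as.fun = "data.table")
```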
NMdataConf will set the default value for all
NMdata functions that use that argument. So when setting
as.fun this way, we will get the desired class returned
from all data-generating NMdata functions.
We don’t want the same information about the dimensions repeated, so
we use the
quiet argument this time.
res1.dt <- NMscanData(file.NMdata("xgxr003.lst"),quiet=TRUE)
As expected we got a
data.table this time:
class(res1.dt)
#> [1] "NMdata"     "data.table" "data.frame"
The NMdata object returned by NMscanData
comes with meta information about when and how the data was read, and how
the data was combined. The
NMinfo function browses this
information. It provides three sections
of meta data:
“details”: A list including the function call, what options were effective (if input was included, rows recovered, if data was merged by a row identifier or combined by filters etc).
“tables”: Overview of the tables that were read and combined by
NMscanData and properties of the different tables.
“columns”: Information on the columns that were treated by
NMscanData (see example below).
The following shows the "columns" information as an example. Remember,
we are still getting a data.table because we used
NMdataConf to change the configuration. We use the
data.table print function to only look at the first and last rows:
print(NMinfo(res1,info="columns"),nrows=20,topn=10)
#>     variable                 file     source level COLNUM
#>  1:       ID xgxr018_res_vols.txt     output   row      1
#>  2:  NOMTIME            xgxr4.rds      input   row      2
#>  3:     TIME            xgxr4.rds      input   row      3
#>  4:     EVID            xgxr4.rds      input   row      4
#>  5:      CMT            xgxr4.rds      input   row      5
#>  6:      AMT            xgxr4.rds      input   row      6
#>  7:       DV      xgxr018_res.txt     output   row      7
#>  8:     FLAG            xgxr4.rds      input   row      8
#>  9:    STUDY            xgxr4.rds      input   row      9
#> 10:       KA      xgxr018_res.txt     output   row     10
#>                               ---
#> 31:   trtact            xgxr4.rds      input   row     31
#> 32:    model                 <NA> NMscanData model     32
#> 33:    nmout                 <NA> NMscanData   row     33
#> 34:       DV xgxr018_res_vols.txt     output   row     NA
#> 35:     PRED xgxr018_res_vols.txt     output   row     NA
#> 36:      RES xgxr018_res_vols.txt     output   row     NA
#> 37:     WRES xgxr018_res_vols.txt     output   row     NA
#> 38:       ID            xgxr4.rds      input   row     NA
#> 39:       DV            xgxr4.rds      input   row     NA
#> 40:       ID   xgxr018_res_fo.txt     output    id     NA
The column names are sorted by the order in the resulting dataset,
the order given by the
COLNUM column. The variables at the
bottom that have
COLNUM==NA were redundant when combining
the data (the same columns were included from other sources). The file
names and their source (input/output) and a “level” are given. “level”
is the information level of the source. Input data and full-length
output tables are “row” level, a firstonly or lastonly table is
id-level. And then there is the
model column added by
NMscanData, which is obviously model-level.
nmout is the other column with
NMscanData as its source.
Let’s have a quick look at the data we got back. The following is
done with data.table. The comments in the code should make
it clear what happens if you are not familiar with
data.table. You can do this with
stats::aggregate, a combination of dplyr functions, or
whatever you prefer.
gmPRED is calculated for sample times only and
represents the geometric mean of the population prediction
(PRED) by dose and nominal time.
## trtact is a character. Make it a factor with levels ordered by
## numerical dose level. The := is a data.table assignment within
## res1.dt. In dplyr, you could use mutate.
res1.dt[,trtact:=reorder(trtact,DOSE)]
## Derive geometric mean pop predictions by treatment and nominal
## sample time. In dplyr, use group_by and summarize.
res1.dt[EVID==0,gmPRED:=exp(mean(log(PRED))),
        by=.(trtact,NOMTIME)]
Notice how little data is shown for the small doses. Remember, only
905 of the 1502 rows in the input data were used? Most of the rows
excluded from the analysis are excluded because the observation is below
the quantification limit (BLQ). The next section shows how to recover all
the input data rows with the recover.rows argument.
We may want to include the input data that was ignored by Nonmem. Use
recover.rows=TRUE to include all rows from input data.
res2 <- NMscanData(file.NMdata("xgxr014.lst"),recover.rows=TRUE)
#> Model:  xgxr014 
#> 
#> Used tables, contents shown as used/total:
#>                file      rows columns     IDs
#>     xgxr014_res.txt   905/905   12/12 150/150
#>   xgxr2.rds (input) 1502/1502   22/24 150/150
#>            (result)      1502    34+2     150
#> 
#> Input and output data merged by: ROW
#> 
#> Distribution of rows on event types in returned data:
#>  EVID CMT input-only output result
#>     0   1          2      0      2
#>     0   2        595    755   1350
#>     1   1          0    150    150
#>   All All        597    905   1502
In addition to the model column holding the model name,
NMscanData creates one other column by default.
nmout is a boolean column created by
NMscanData expressing whether each row was in the output
data (nmout==TRUE) or was recovered from the input
data (nmout==FALSE).
We recognize these numbers from the message from
NMscanData - the number of rows in output (905) and the number
of rows from input only (597). Since we changed the default value of
as.fun, res2 is a data.table:
res2[,.N,by=nmout]
#>    nmout   N
#> 1:  TRUE 905
#> 2: FALSE 597
We make use of the
nmout column to only calculate
gmPRED for observations (EVID==0) processed by Nonmem.
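Following the pattern used for res1.dt above, the calculation could look like this (a sketch):

```r
## geometric mean population predictions, restricted to observation
## rows that were actually processed by Nonmem (nmout==TRUE)
res2[EVID==0 & nmout==TRUE,
     gmPRED:=exp(mean(log(PRED))),
     by=.(trtact,NOMTIME)]
```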
Obviously, we were lucky that meaningful values were assigned to
DV for the BLQ and pre-dose samples in the input data, so in
this case we could easily plot all the data.
NMscanData by default adds a column called
model for convenience when working with multiple models.
You can specify both the column name (which is model by
default) and the model name (the contents of that column) as arguments in
NMscanData. With NMdataConf, you can also
configure the default column name and the function that generates the
model name. The default is to derive the model name from the list
file name (say,
xgxr001). In the
following we use this to compare population predictions from two
different models. We read them again just to show the use of the
modelname argument to name the models ourselves. Remember, we configured the
as.fun option, so we are working with
data.table, and we can easily stack with rbind (i.e.
rbind.data.table), filling in
NA's. We add a
couple of options to specify how input and output data are to be combined:
NMdataConf(as.fun="data.table", ## already set above, repeated for completeness
           col.row="ROW",       ## This is default, included for completeness
           merge.by.row=TRUE    ## Require input and output data to be combined by merge
           )
res1.m <- NMscanData(system.file("examples/nonmem/xgxr001.lst",
                                 package="NMdata"),
                     quiet=TRUE)
## using a custom modelname for this model
res2.m <- NMscanData(system.file("examples/nonmem/xgxr014.lst",
                                 package="NMdata"),
                     modelname="One compartment",
                     quiet=TRUE)
## notice fill is an option to rbind with data.table (like bind_rows in dplyr)
res.mult <- rbind(res1.m,res2.m,fill=T)
## Notice, the NMdata class disappeared
class(res.mult)
#> [1] "data.table" "data.frame"
res.mult[EVID==0&nmout==TRUE,
         gmPRED:=exp(mean(log(PRED))),
         by=.(model,trtact,NOMTIME)]
In this, we specifically wanted to rename one model for illustration
using the modelname argument. We can instead pass a function to
NMdataConf to change how NMscanData derives the model name from the list file
path. This one skips the characters and leading zeros, so we just get an
integer. We could use the
modelname argument in every
NMscanData call, but why not change the default instead?
namefun <- function(path) sub("^[[:alpha:]0]+","",fnExtension(basename(path),""))
NMdataConf(modelname=namefun)
res1.m <- NMscanData(system.file("examples/nonmem/xgxr001.lst",
                                 package="NMdata"),
                     quiet=TRUE)
res2.m <- NMscanData(system.file("examples/nonmem/xgxr014.lst",
                                 package="NMdata"),
                     quiet=TRUE)
## notice fill is an option to rbind with data.table (like bind_rows in dplyr)
res.mult <- rbind(res1.m,res2.m,fill=T)
res.mult[,.N,by=model]
#>    model   N
#> 1:     1 905
#> 2:    14 905
## resetting default
NMdataConf(modelname=NULL)
NMdataConf can be used to change a lot of the default
behaviour of the functions in
NMdata so it fits in with
your current setup and preferred work flow.
Return to the example above creating the dataset res1.
Notice in the list of tables in the message from
NMscanData that the input data was an rds file (xgxr4.rds).
This is why we could sort the plots correctly on the dose level without
reordering the factor levels first.
If this feature is enabled (which it is by default),
NMscanData will look for an rds file next to the input data
file (which is a delimited text file), with exactly the same name as the
text file except that the extension must be
.rds rather than, say,
.csv (for Nonmem and
NMscanData, the extension
of the delimited text file doesn't matter). If it finds the
rds file, this will be used instead. No checks are done of
whether the contents are in any way similar to the delimited text file,
which is ignored in this case.
There are three advantages of using an rds file:
All data properties (e.g. factor levels) from data set preparation are preserved.
Faster reading (though NMdata reads delimited files with data.table, which is extremely fast, so in many cases this difference can be small).
Smaller files with rds. This can be a big advantage if you are transferring files or reading over a network connection. NMdata is generally very fast (thanks to data.table), so file/network access (I/O) is likely to be the main bottleneck.
If you write Nonmem datasets with the function
NMdata::NMwriteData, you can get an rds file written
automatically, exactly where
NMscanData will look for it.
Preparing datasets using
NMdata is described in a separate vignette.
You probably want to use
NMdataConf to change the
default behavior if you don't want to use rds files.
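Assuming the option controlling the rds lookup is called use.rds (the argument name is an assumption here; check ?NMdataConf for your NMdata version), disabling it could look like:

```r
## assumption: use.rds is the NMdataConf option controlling rds lookup
NMdataConf(use.rds = FALSE)
```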
Each of the steps involved in reading and combining the data from a model run can be done separately.
The lst file was scanned for output tables, and they
were all read (including interpreting the possible
firstonly option). The input data has been used based on
the $DATA and $INPUT sections of the control
stream. The key steps in this process are available as independent functions:
NMreadTab: Read a Nonmem output table based on the
path to the output table file.
NMscanTables: Read all output data files defined in
a Nonmem run. Return a list of tables (as data.frames or data.tables).
NMtransInput: Read input data based on a Nonmem control stream or list
file. Data will be processed and named like the Nonmem model.
ACCEPT and IGNORE filters can be applied as
well. There are a few limitations to this functionality at this point.
More about them below.
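For instance, reading all output tables from one of the example models could look like this (a sketch using the file.NMdata shortcut from earlier in this vignette):

```r
## read all output tables defined in the model run, returned as a list
tabs <- NMscanTables(file.NMdata("xgxr003.lst"))
```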
The answer to this should be as close to “nothing” as possible -
that’s more or less the aim of the function. You just have to make sure
that the information that you need is present in input data and output
data. No need to output information that is unchanged from input, but
make sure to output what you need (like
ETA1 etc., which cannot
be found in the input). Some of these values can be found in other files
generated by Nonmem, but notice:
NMscanData uses only input
and output data.
It is recommended to always use a unique row identifier in both input and output data. This is the most robust way to merge back with input data. In firstonly tables, include the subject ID. Again, everything will most likely work even if you don't, but I personally don't like relying on "most likely" when I can just as well have robustness.
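A minimal sketch of adding such a row identifier during data set preparation (dat is a hypothetical data.frame standing in for your input data):

```r
## hypothetical input data during data set preparation
dat <- data.frame(ID = c(1, 1, 2), TIME = c(0, 1, 0))
## add a unique row identifier (here called ROW) before writing for Nonmem
dat$ROW <- seq_len(nrow(dat))
```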
Even if there are a few limitations to what models
NMscanData can handle, there is a good chance you will
never run into any of them, as they are mostly quite rare. If you do,
reach out to me, and we’ll figure it out.
If merging with input data, the input data must be available as it was
when the model was run. If you want to avoid this potential issue,
Nonmem can be run in a wrapper script that either copies the input data,
or runs NMscanData and saves the output in a compressed
file format (like rds).
Not all IGNORE statements are
supported at this point. The resulting number of rows after applying
filters is checked against row-level output table dimensions (if any
available). In other words, you have to be unlucky to run into trouble
without an error. But it is always recommended to use a unique row
identifier in both input and output tables in order to avoid relying on
interpretation of Nonmem code.
NULL options in
$DATA are not implemented. If using them,
please use the
col.row option to merge by a unique row identifier.
Nonmem supports a clocktime input format for a column called TIME in
input data. Based on a day counter and a character (“00:00”) clock
format, Nonmem (or rather,
NM-TRAN) can calculate the
individual time since first record. This behaviour is not mimicked by
NMscanData, and the only ways to get TIME in this case are to either
include it in an output
TABLE or to code the translation
yourself after calling
NMscanData. Of course, this is on
the todo list.
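Until then, the translation can be coded manually. A sketch (the column names DAY and CLOCK are assumptions for illustration, not part of the Nonmem convention):

```r
library(data.table)
## hypothetical input: a per-subject day counter and character clock times
dt <- data.table(ID    = c(1, 1, 1),
                 DAY   = c(1, 1, 2),
                 CLOCK = c("08:00", "12:30", "08:00"))
## split "HH:MM" into numeric hour and minute columns
dt[, c("H", "M") := tstrsplit(CLOCK, ":", type.convert = TRUE)]
## hours since start of day 1, then since the subject's first record
dt[, TIME := (DAY - 1) * 24 + H + M / 60]
dt[, TIME := TIME - TIME[1], by = ID]
## TIME becomes 0, 4.5, 24
```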
For now, only output tables returning either all rows or one row per
subject can be merged with input. Tables written with options like
FIRSTLASTONLY (two rows per subject) and
OBSONLY are disregarded with a warning (you can read them
with NMscanTables). LASTONLY is treated like
FIRSTONLY, i.e. as ID-level information if not available
in row-level tables.
In this vignette you should have learned to:
Use NMscanData to automatically read and combine all output and input data, based only on the path to the list (.lst) file
Switch between merging by a row identifier and translating the Nonmem data filters; merge.by.row is the argument of interest
Configure NMdata to return your favorite data class - e.g. NMdataConf(as.fun=tibble::as_tibble) for tibbles (tbl)
Use NMinfo on the result coming out of NMscanData to browse the automatically generated meta data
The model column will hold the model name, which you can use when combining (rbind) multiple model data sets; use the modelname option to change the model name or how the model name is derived from the list file path
Use an rds file to preserve all input data properties; NMscanData reads the rds file by default
You should have seen that
NMscanData has very few
limitations in what Nonmem models it can read. You should not have to
change anything in the way you work to make use of NMdata.