Automatically find Nonmem input and output tables and organize data

This is a very general solution for automatically identifying, reading, and merging all output and input data of a Nonmem model. The most important steps are:

Read and combine output tables.
If wanted, read input data and restore variables that were not output by the Nonmem model.
If wanted, also restore rows from the input data that were disregarded by Nonmem (e.g., observations or subjects that are not part of the analysis).
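
To make the three steps concrete, here is a minimal base-R sketch on small hypothetical tables. It is not how NMscanData is implemented. It assumes that every table carries a unique row identifier ROW and that the two output tables are full-length with their rows in the same order. The column names (FLAG, WEIGHTB, PRED, CL) are made up for illustration.

```r
# Minimal base-R sketch of the three steps on hypothetical data.
# Assumptions: every table has a unique row identifier ROW, and the output
# tables are full-length with rows in the same order.

# Hypothetical input data; the row with FLAG = 1 was not used by Nonmem
input <- data.frame(ROW = 1:5, ID = c(1, 1, 1, 2, 2),
                    TIME = c(0, 1, 2, 0, 1), FLAG = c(0, 0, 1, 0, 0),
                    WEIGHTB = c(70, 70, 70, 82, 82))

# Two hypothetical full-length output tables
tab1 <- data.frame(ROW = c(1, 2, 4, 5), ID = c(1, 1, 2, 2),
                   PRED = c(0, 2.1, 0, 1.8))
tab2 <- data.frame(ROW = c(1, 2, 4, 5), CL = c(3.2, 3.2, 4.1, 4.1))

# Step 1: combine output tables column-wise, skipping duplicated columns
out <- tab1
for (tab in list(tab2)) {
  out <- cbind(out, tab[, setdiff(names(tab), names(out)), drop = FALSE])
}

# Step 2: restore input columns not found in the output, matched by ROW
idx <- match(out$ROW, input$ROW)
res <- cbind(out, input[idx, setdiff(names(input), names(out)), drop = FALSE])
res$nmout <- TRUE

# Step 3: recover input rows absent from the output, flagged nmout = FALSE;
# output-only columns (PRED, CL) are NA for these rows
extra <- input[!input$ROW %in% res$ROW, , drop = FALSE]
extra$nmout <- FALSE
for (col in setdiff(names(res), names(extra))) extra[[col]] <- NA
res <- rbind(res, extra[, names(res)])
res <- res[order(res$ROW), ]
rownames(res) <- NULL
```

Matching on ROW in steps 2 and 3 is what makes the restore independent of how Nonmem filtered the data.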

NMscanData(
  file,
  col.row,
  use.input,
  merge.by.row,
  recover.rows,
  file.mod,
  dir.data,
  file.data,
  translate.input = TRUE,
  quiet,
  formats.read,
  args.fread,
  as.fun,
  col.id = "ID",
  modelname,
  col.model,
  col.nmout,
  col.nmrep,
  order.columns = TRUE,
  check.time,
  tz.lst,
  skip.absent = FALSE,
  tab.count,
  use.rds
)

Arguments

file: Path to a Nonmem control stream (.mod) or Nonmem output file (.lst).
col.row: The name of a column with a unique value for each row. Using such a column is recommended whenever possible. See merge.by.row and details as well. Default ("ROW") can be modified using NMdataConf.
use.input: Should the input data be added to the output data? Only columns whose names are not found in the output data are retrieved from the input data. Default is TRUE, which can be modified using NMdataConf. See merge.by.row too.
merge.by.row: If use.input=TRUE, this argument determines the method by which the input data is added to the output data. The default method (merge.by.row=FALSE) is to interpret the Nonmem code to imitate the data filtering (IGNORE and ACCEPT statements), but the recommended method is merge.by.row=TRUE, which means that data will be merged by a unique row identifier. The row identifier must be present in the input data and in at least one full-length output table. See argument col.row too.
recover.rows: Include rows from input data files that do not exist in output tables? These rows will be added to the $row dataset only, and the $run, $id, and $occ datasets are created before this is taken into account. A column called nmout will be TRUE when the row was found in output tables, and FALSE when not. Default is FALSE and can be configured using NMdataConf.
file.mod: The input control stream file path. Default is to look for "file" with the extension changed to .mod (PsN style). You can also supply the path to the file, or you can provide a function that translates the output file path to the input file path. The default behavior can be configured using NMdataConf. See dir.data too.
dir.data: The data directory can only be read from the control stream (.mod) and not from the output file (.lst). So if you only have the output control stream, use dir.data to specify the directory containing the data file. If dir.data is provided, the .mod file is not used at all.
file.data: Specification of the data file path. When this is used, the control streams are not used at all.
translate.input: Default is TRUE, meaning that input data column names are translated according to the $INPUT section in the Nonmem listing file.
quiet: By default, some information is printed along the way about what data is found. Consider setting this to TRUE for non-interactive use. Default can be configured using NMdataConf.
formats.read: Prioritized input data file formats to look for and use if found. Default is c("rds","csv") which means rds will be used if found, and csv if not. fst is possible too. Default can be modified using NMdataConf().
args.fread: List of arguments passed to fread when reading input data. Notice that, except for "input" and "file", you need to supply all arguments to fread if you use this argument. Default values can be configured using NMdataConf.
as.fun: The default is to return data as a data.frame. Pass a function (say tibble::as_tibble) in as.fun to convert to something else. If data.tables are wanted, use as.fun="data.table". The default can be configured using NMdataConf.
col.id: The name of the subject ID variable, default is "ID".
modelname: The model name to be stored if col.model is not NULL. If not supplied, the name will be taken from the control stream file name by omitting the directory/path and deleting the .lst extension (path/run001.lst becomes run001). This can be a character string or a function which is called on the value of file (file is another argument to NMscanData). The function must take one character argument and return another character string. As an example, see NMdataConf()$modelname. The default can be configured using NMdataConf.
col.model: A column of this name containing the model name will be included in the returned data. The default is to store this in a column called "model". See argument "modelname" as well. Set to NULL if not wanted. Default can be configured using NMdataConf.
col.nmout: A column of this name will be a logical indicating whether the row was in an output table or not. Default can be modified using NMdataConf.
col.nmrep: If tables are repeated, include a counter? This does not relate to the order of the $TABLE statements but to cases where a $TABLE statement is run repeatedly. E.g., in combination with the SUBPROBLEMS feature in Nonmem, it is useful to keep track of the table (repetition) number. If col.nmrep is TRUE, this will be carried forward and added as a column called NMREP. This is the default behavior when more than one $TABLE repetition is found in the data. Set it to a different string to request the column with a different name. The argument is passed to NMscanTables.
order.columns: If TRUE (default), NMorderColumns is used to reorder the columns before returning the data. NMorderColumns will be called with alpha=FALSE, so columns are not sorted alphabetically. However, standard Nonmem columns such as ID and TIME will come first. If col.row is used, it will be passed to NMorderColumns too.
check.time: If TRUE (default) and input data is used, NMscanData checks that the output control stream and output tables are newer than the input control stream and input data. This is an important assumption for the way information is merged by NMscanData. However, if data has been transferred from another system where Nonmem was run, these checks may not make sense, and you may not want to see these warnings. The default can be configured using NMdataConf. For the output control stream, the time stamp recorded by Nonmem is used if possible, and if the input data was created with NMwriteData, the recorded creation time is used if possible. Otherwise, and for all other files, the file modification times are used.
tz.lst: If supplied, the timezone to be used when reading the time stamp in the output control stream. Please supply something listed in OlsonNames(). Can be configured using NMdataConf() too.
skip.absent: Skip missing output table files with a warning? Default is FALSE in which case an error is thrown.
tab.count: Deprecated. Use col.tableno.
use.rds: Deprecated; use formats.read instead. If provided (though not recommended), this will override formats.read, and only the rds and csv formats can be used.
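
Both file.mod and modelname accept a function that takes one character argument (the value of file) and returns one character string. The sketch below shows two such functions in base R. The names mod.from.lst and modelname.from.file are hypothetical, and the PsN-style .lst to .mod mapping is an assumption about your file layout.

```r
# Hypothetical functions of the kind file.mod and modelname accept:
# each takes one character string (the value of `file`) and returns one.

# file.mod: PsN-style translation of the output file to the control stream
mod.from.lst <- function(file) {
  sub("\\.lst$", ".mod", file)
}

# modelname: drop the directory and the .lst extension
modelname.from.file <- function(file) {
  sub("\\.lst$", "", basename(file))
}

mod.from.lst("path/run001.lst")        # "path/run001.mod"
modelname.from.file("path/run001.lst") # "run001"
```

Either function could then be supplied directly, e.g. NMscanData(file, file.mod = mod.from.lst, modelname = modelname.from.file), or set as a default through NMdataConf.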

Value

A data set of class 'NMdata'.

Details

This function makes it very easy to collect the data from a Nonmem run.

A useful feature of this function is that it can automatically combine "input" data (the data read by Nonmem through $DATA or $INFILE, with columns named in $INPUT) with "output" data (tables written by Nonmem in $TABLE). Two methods are implemented for doing so. One (the default, but not recommended) relies on interpreting the filter statements (IGNORE and ACCEPT) in $DATA. This works in most cases and includes checks for consistency with the Nonmem results. However, the recommended method is to use a unique row identifier present in both the input data and at least one full-length output table (not a FIRSTONLY or LASTONLY table). Supply the name of this column using the col.row argument.
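
The difference between the two methods can be illustrated in base R. In this hypothetical example the control stream is assumed to contain IGNORE=(FLAG.NE.0) in $DATA; the data and column names are made up, and the code only sketches the idea rather than NMscanData's implementation.

```r
# Sketch of the two merge methods on hypothetical data. Assumption: the
# control stream contains IGNORE=(FLAG.NE.0) in $DATA, so records with
# FLAG != 0 were not used by Nonmem.
input <- data.frame(ROW = 1:6, ID = c(1, 1, 1, 2, 2, 2),
                    FLAG = c(0, 1, 0, 0, 0, 1),
                    DOSE = c(100, 100, 100, 300, 300, 300))

# Hypothetical full-length output table: one line per record used
output <- data.frame(ROW = c(1, 3, 4, 5), PRED = c(1.2, 0.8, 3.1, 2.5))

# Method 1 (imitating filters): apply the same filter to the input data,
# then pair rows by position. Correct only if the filter is reproduced exactly.
filtered <- input[input$FLAG == 0, ]
stopifnot(nrow(filtered) == nrow(output))  # a basic consistency check
by.filter <- cbind(output, filtered[, c("ID", "DOSE")])

# Method 2 (merge by row): pair rows through the unique identifier ROW;
# no knowledge of the filter is needed.
by.row <- cbind(output, input[match(output$ROW, input$ROW), c("ID", "DOSE")])
```

Both methods agree here, but the positional pairing in method 1 misaligns rows if the filter is reproduced incorrectly, whereas matching on ROW does not depend on the filter at all.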

Limitations. A number of Nonmem features are not supported. Most of these limitations can be overcome by using merge.by.row=TRUE. Incomplete list of known limitations:

character TIME: If Nonmem is used to translate DAY and a character TIME column, TIME has to be available in an output table. NMscanData does not translate TIME to numeric.
RECORDS: The RECORDS option, which limits the part of the input data being used, is not interpreted. Merging with merge.by.row=TRUE is unaffected by this.
NULL: The NULL option for specifying the missing-value string in input data is not respected. If delimited input data is read (as opposed to rds files), missing values are assumed to be represented by dots (.).
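
Because delimited input data is assumed to use dots for missing values, the reader must be told to treat "." as missing; otherwise the column is read as character. The sketch uses base R's read.csv on hypothetical data for illustration only (per args.fread, NMscanData itself reads delimited input data with fread).

```r
# Hypothetical delimited input data using "." for missing values
csv.text <- "ROW,ID,TIME,DV
1,1,0,.
2,1,1,3.4
3,1,2,2.9"

# Without na.strings = ".", DV would be read as a character column
dat <- read.csv(text = csv.text, na.strings = ".")
str(dat)
```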

Examples

if (FALSE) {
res1 <- NMscanData(system.file("examples/nonmem/xgxr001.lst", package="NMdata"))
}
