This is a very general solution for automatically identifying, reading, and merging all output and input data in a Nonmem model. The most important steps are:
Read and combine output tables.
If wanted, read input data and restore variables that were not output from the Nonmem model.
If wanted, also restore rows from input data that were disregarded in Nonmem (e.g. observations or subjects that are not part of the analysis).
NMscanData( file, col.row, use.input, merge.by.row, recover.rows, file.mod, dir.data, file.data, translate.input = TRUE, quiet, use.rds, args.fread, as.fun, col.id = "ID", modelname, col.model, col.nmout, rep.count, order.columns = TRUE, check.time, tz.lst, tab.count )
A Nonmem control stream or output file from Nonmem (.mod or .lst)
A column with a unique value for each row. Using such a column is recommended whenever possible. See merge.by.row and details as well. Default ("ROW") can be modified using NMdataConf.
Should the input data be added to the output data? Only columns whose names are not found in the output data will be retrieved from the input data. Default is TRUE, which can be modified using NMdataConf. See merge.by.row too.
If use.input=TRUE, this argument determines the method by which the input data is added to output data. The default method (merge.by.row=FALSE) is to interpret the Nonmem code to imitate the data filtering (IGNORE and ACCEPT statements), but the recommended method is merge.by.row=TRUE which means that data will be merged by a unique row identifier. The row identifier must be present in input and at least one full length output data table. See argument col.row too.
Include rows from input data files that do not exist in output tables? This will be added to the $row dataset only, and $run, $id, and $occ datasets are created before this is taken into account. A column called nmout will be TRUE when the row was found in output tables, and FALSE when not. Default is FALSE and can be configured using NMdataConf.
The input control stream. Default is to look for "file" with extension changed to .mod (PsN style). You can also supply the path to the file, or you can provide a function that translates the output file path to the input file path. The default behavior can be configured using NMdataConf. See dir.data too.
The data directory can only be read from the control stream (.mod) and not from the output file (.lst). So if you only have the output control stream, use dir.data to specify the directory in which to find the data file. If dir.data is provided, the .mod file is not used at all.
Specification of the data file path. When this is used, the control streams are not used at all.
Default is TRUE, meaning that input data column names are translated according to $INPUT section in Nonmem listing file.
The default (FALSE) is to give some information along the way about what data is found. Consider setting this to TRUE for non-interactive use. Default can be configured using NMdataConf.
If an rds file is found with the exact same name (except for .rds instead of say .csv) as the input data file mentioned in the Nonmem control stream, should this be used instead? The default is yes, and NMwriteData will create this by default too. Default can be configured using NMdataConf.
List of arguments passed to fread when reading input data. Notice that except for "input" and "file", you need to supply all arguments to fread if you use this argument. Default values can be configured using NMdataConf.
The default is to return data as a data.frame. Pass a function (say tibble::as_tibble) in as.fun to convert to something else. If data.tables are wanted, use as.fun="data.table". The default can be configured using NMdataConf.
The name of the subject ID variable, default is "ID".
The model name to be stored if col.model is not NULL. If not supplied, the name will be taken from the control stream file name by omitting the directory/path and deleting the .lst extension (path/run001.lst becomes run001). This can be a character string or a function which is called on the value of file (file is another argument to NMscanData). The function must take one character argument and return another character string. As an example, see NMdataConf()$modelname. The default can be configured using NMdataConf.
A column of this name containing the model name will be included in the returned data. The default is to store this in a column called "model". See argument "modelname" as well. Set to NULL if not wanted. Default can be configured using NMdataConf.
A column of this name will be added as a logical indicating whether the row was found in an output table or not. Default can be modified using NMdataConf.
Nonmem includes a counter of tables in the written data files. These are often not useful. Especially for NMscanData output it can be meaningless because multiple tables can be combined so this information is not unique across those source tables. However, if rep.count is TRUE (not default), this will be carried forward and added as a column called NMREP. The argument is passed to NMscanTables.
If TRUE (default), NMorderColumns is used to reorder the columns before returning the data. NMorderColumns will be called with alpha=FALSE, so columns are not sorted alphabetically. But standard Nonmem columns like ID, TIME, and others will be first. If col.row is used, this will be passed to NMorderColumns too.
If TRUE (default) and if input data is used, input control stream and input data are checked to be older than output control stream and output tables. These are important assumptions for the way information is merged by NMscanData. However, if data has been transferred from another system where Nonmem was run, these checks may not make sense, and you may not want to see these warnings. The default can be configured using NMdataConf. For the output control stream, the time stamp recorded by Nonmem is used if possible, and if the input data is created with NMwriteData, the recorded creation time is used if possible. If not, and for all other files, the file modification times are used.
If supplied, the timezone to be used when reading the time stamp in the output control stream. Please supply something listed in OlsonNames(). Can be configured using NMdataConf() too.
Deprecated. Use rep.count.
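The modelname argument described above accepts a function as well as a character string. As a minimal sketch, the helper name_from_path below is hypothetical (it is not part of NMdata) and uses base R only. It reproduces the documented default of dropping the directory and the file extension.

```r
# Hypothetical helper, not part of NMdata: derive a model name from a file path
# by dropping the directory and a trailing .lst or .mod extension.
name_from_path <- function(path) {
  sub("\\.(lst|mod)$", "", basename(path))
}

name_from_path("path/run001.lst")
#> [1] "run001"
```

Such a function could then be supplied as modelname=name_from_path in a call to NMscanData.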
A data set of class 'NMdata'.
This function makes it very easy to collect the data from a Nonmem run.
A useful feature of this function is that it can automatically combine "input" data (the data read by Nonmem in $DATA or $INFILE) with "output" data (tables written by Nonmem in $TABLE). There are two implemented methods for doing so. One (the default but not recommended) relies on interpretation of filter (IGNORE and ACCEPT) statements in $DATA. This will work in most cases, and checks for consistency with Nonmem results. However, the recommended method is using a unique row identifier in both input data and at least one output data file (not a FIRSTONLY or LASTONLY table). Supply the name of this column using the col.row argument.
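The idea behind merging by a unique row identifier can be illustrated with toy data in base R. This is only a sketch of the concept, not the implementation used by NMscanData. The data frames and the column names WT and PRED are made up for illustration, while ROW and nmout mirror the defaults mentioned above.

```r
# Toy input data: five rows, each with a unique row identifier.
input <- data.frame(
  ROW  = 1:5,
  ID   = c(1, 1, 1, 2, 2),
  TIME = c(0, 1, 2, 0, 1),
  WT   = c(70, 70, 70, 82, 82)
)

# Toy output table: row 3 was excluded from the analysis,
# and TIME and WT were not written to the table.
output <- data.frame(
  ROW  = c(1L, 2L, 4L, 5L),
  ID   = c(1, 1, 2, 2),
  PRED = c(0.0, 1.2, 0.0, 1.1)
)

# Only input columns that are absent from the output are retrieved.
cols.new <- setdiff(names(input), names(output))
merged <- merge(output, input[, c("ROW", cols.new)], by = "ROW", all.x = TRUE)

# Optionally recover the input rows that were not part of the analysis,
# flagging the origin of each row in a logical column.
merged$nmout <- TRUE
dropped <- input[!input$ROW %in% output$ROW, , drop = FALSE]
dropped$PRED <- NA_real_
dropped$nmout <- FALSE
recovered <- rbind(merged, dropped[, names(merged)])
recovered <- recovered[order(recovered$ROW), ]
```

Because the rows are matched on ROW, no interpretation of IGNORE or ACCEPT statements is needed to align input and output.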
Limitations. A number of Nonmem features are not supported. Most of this can be overcome by using merge.by.row=TRUE. Incomplete list of known limitations:
character TIME: If Nonmem is used to translate DAY and a character TIME column, TIME has to be available in an output table. NMscanData does not do the translation to numeric.
RECORDS: The RECORDS option to limit the part of the input data being used is not searched for. Using merge.by.row=TRUE is not affected by this.
NULL: The NULL argument to specify missing value string in input data is not respected. If delimited input data is read (as opposed to rds files), missing values are assumed to be represented by dots (.).
res1 <- NMscanData(system.file("examples/nonmem/xgxr001.lst", package="NMdata"))
#> Model:  xgxr001
#>
#> Used tables, contents shown as used/total:
#>               file     rows columns     IDs
#>    xgxr001_res.txt  905/905   16/16 150/150
#>  xgxr1.csv (input) 905/1502   22/24 150/150
#>           (result)      905    38+2     150
#>
#> Input and output data merged by: ROW
#>
#> Distribution of rows on event types in returned data:
#>  EVID CMT output result
#>     0   2    755    755
#>     1   1    150    150
#>   All All    905    905
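As a further sketch, the same bundled example can be read with the recommended row-identifier merge while also recovering the input rows that Nonmem disregarded. It assumes that NMdata is installed and that col.row and col.nmout have their default values ("ROW" and "nmout").

```r
library(NMdata)

file.lst <- system.file("examples/nonmem/xgxr001.lst", package = "NMdata")

# Merge input and output by the row identifier and keep input-only rows too.
res2 <- NMscanData(file.lst, merge.by.row = TRUE, recover.rows = TRUE, quiet = TRUE)

# Count rows found in output tables (TRUE) versus recovered from input only (FALSE).
table(res2$nmout)
```

Filtering on the nmout column afterwards separates the rows that were part of the analysis from those that were only recovered from the input data.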