<img alt="" src="https://secure.perk0mean.com/171547.png" style="display:none;">

First Steps in Laboratory Dataset in SAS

Laboratory Datasets in SAS

The laboratory dataset is one of the core safety datasets and, at first glance, it could appear intimidating, with multiple tests and visits per patient. This article will illustrate checks that are worth applying at the very beginning of programming work – these could be in addition to the standardized process of domain validation.

Early identification of laboratory issues will allow you to:

  • ensure data quality long before the database lock
  • increase work efficiency and the quality of the data provided by the programming team
  • prevent data loss due to unmapped or non-standard values
  • pool data easily and reply to regulatory agencies faster
  • communicate efficiently across departments/ companies that apply the same standards, for example, cooperation between Programming and Medical Writing

Understand the Domain

The laboratory domain (LB) captures laboratory data collected in the case report form (CRF) or received from a central provider or vendor. Very often there are only two values to be summarized per subject and parameter: baseline and maximum on treatment value. For example values outside the normal ranges and toxicities, as defined per National Cancer Institute Common Terminology Criteria for Adverse Events (NCI CTCAE), may also be summarized. There are also other toxicities grading systems including Rheumatology Common Toxicity Criteria (RCTC) and Division of Aids (DAIDS) to name a few that could be considered.  Sometimes laboratory test values are summarized per treatment cycle or epoch, more complicated outputs include:

  • Summarising abnormalities overlapping with other abnormalities or specific Adverse Events. One example is Hy’s Law analysis
  • CTC grades analysis, including shift tables 

Study Requirements

Before the statistical programming commences , there are few required actions to perform:

  • Check the analysis plan and table mocks to determine the programming outputs that are required
  • Discuss how to handle duplicates with the study team.
  • Review available tests, specimen types and categories for potential exclusions
  • Verify the sorting key (Clinical Data Interchange Standards Consortium (CDISC) recommendation is that you have one record per laboratory test, time point, visit and subject)
  • Exclusions may also be performed on the category level (for example: urinalysis), specimen level or test level.

See table below for an example of specimen exclusions for potassium:

Laboratory dataset tab 1

Mapping Checks

In addition to the study requirements discussed above, the following mapping checks are suggested:

  • Check if each test has a unique category. When reporting by laboratory category, if a test is assigned to multiple categories (as shown in the table above) this may result in the same test being reported in different tables. In this case, verify in which table each test belongs, for example, the haematology category may feed into chemistry or be a separate table
  • Check if specimens should be grouped for reporting, for example, the three specimens: arterial blood, blood and venous blood can all be mapped to blood
  • Examine if test codes and names map one-to-one at least within a category. Below is an example of mapping laboratory test names

Laboratory dataset tab 2Note: The example above is using CDISC Mapping.

Qualitative Values

Not all tests will have continuous numeric results, examples may be urinalysis tests like ketones, glucose, and protein. Qualitative values can be mapped as follows:

  • to other qualitative values: N, NEG, NEGATIV, NONE-DETECTED → NEGATIVE
  • to quantitative values: ++, +++, ++++, +1, 1+, POSITIVE, Positive → 1

                                                   >60 → 60.01

Simple mapping of collected terms to standardized terms can be done in the SDTM datasets, however this does not include imputations such as >6 → 60.01 which should be done in the analysis ADaM dataset instead as no imputations should be done in the SDTMs. Care should be taken to follow the CDISC Implementation Guides for how to populate the Original, Standard and Analysis results across the datasets. If there are no units associated with a result it does not necessarily mean that it is qualitative. Example tests with quantitative results and no unit are pH or Specific Gravity.

Conversion of Units

There is the possibility to choose from many unit standards, which one to follow is up to your clinical team. Some example standards are the Système International (SI) unit, the U.S. Conventional units or Client-specific standard. See table below for examples of the differences between units:

Laboratory dataset tab 3

It is strictly recommended to check that all units follow the agreed standard, if not, unit conversion should be applied. The unit 10**6/MM*3 for CDISC STDM should be mapped to CDISC Controlled Terminology equivalent. This is a CDISC SDTM requirement and all units should be mapped to the equivalent CDISC CT Submission Value where possible. Conversion of units is done by multiplying the original lab test value by the specified conversion factor. The clinical team will provide you with missing conversion factors; these would also be needed to convert lower and upper limits of the reference range where populated, unless new ranges are being applied for consistency across the dataset. Conversion may depend on the laboratory test – see tables below:

Laboratory dataset tab 4.2In addition, the checking of outliers in the converted observations is recommended as sometimes the initial unit is incorrectly assigned and the conversion was not in fact needed.

Conversion of units may be time-consuming, especially if each non-standard unit is handled separately. This would also significantly increase the size of the SAS program and make the code cumbersome to read. It is therefore worth considering writing a reusable program for unit conversion.
The initial step for the conversion process would be making sure that the laboratory dataset contains the standard unit for each test, i.e., the unit that will be used for the reporting; exceptions are laboratory tests for which units are not required. It is good practice to keep a list of reporting units in an external file (.txt, .csv, .xls), which in this form can be easily read in SAS and transformed to a SAS dataset; moreover, all potential updates will require only an update to the external file and rerun of a previously created code.

If a dataset of conversion factors is not provided to the programming team, it can be created by programmers and submitted for clinical review and approval. For conversions which are not dependent on the test (for example g/dL to g/L), the dataset with conversion factors should contain at a minimum: original unit, conversion factor and reporting unit. For conversions specific to the laboratory tests the dataset should additionally contain variable(s) allowing identification of the test, such as the Lab Test Code or the Lab Test Name. The final dataset should contain unique records only, in order to avoid duplication of laboratory records.

The below macro call can be used for merging the laboratory dataset with the dataset containing reporting units:

/* Macro for merging laboratory dataset (in_ds) with dataset containing reporting units (unit_ds) for each test.
Datasets are merged by common variables (byvars) which:
- are specific to the project,
- identify unique lab test.
Example merge key can be:
- Lab Test Name and Specimen Type, Lab Test Code, LOINC code.
‘all’ variable specifies the content of the output dataset (out_ds):
- only records with reporting unit found in unit_ds dataset (all=N) or
- all records, irrespective of the corresponding reporting unit found in unit_ds dataset or not */
%macro std_units(in_ds=lb, byvars=lbtestcd, unit_ds=units, out_ds=lb_unit, all=Y);

    proc sort data=&in_ds. out=&in_ds.s;
       by &byvar.

    proc sort data=&unit_ds. out=&unit_ds.s;
      by &byvar.

   data=&unit_ds. ;
     merge &in_ds.s(in=a) &unit_ds.s(in=b);
     by &byvars.;

     %if &all.=N %then
         if a and b;
         if a;

%mend std_units;

Once the reporting units are in the dataset, the next step is to use the conversion dataset to obtain the conversion factor for each pair of original and standard units, as in the code below:

/* lb - laboratory dataset

conv - conversion dataset
factor - conversion factor variable
org_unit - original unit variable
rep_unit - reporting unit variable
lbtestcd - lab test code variable, in this example it is identifying variable for lab test. Lbtestcd is used to assign factors for conversion dependent on lab test (a.lbtestcd=b.lbtestcd); for conversion not dependent on lab test lbtestcd is missing in conv dataset. */
proc sql;
create table lb_conv as select a.*, b.factor
  from lb as a left join conv as b 
    on upcase (a.org_unit)=upcase (b.org_unit)
    and upcase (a.rep_unit)=upcase (b.rep_unit)
    and (a.lbtestcd=b.lbtestcd or missing(b.lbtestcd))
  order by a.lbtestcd;

A simple macro for unit conversion is included below. This macro should be called within a data step and the conversion applied only to records where:


  • the original unit and the reporting unit are not equal
  • the result in the original unit and factor variables are not missing
  • the result is numeric (i.e. contains only digits and ‘.’)

The conversion of units is done by multiplying the result in the original unit by the specified conversion factor. As a result of calling the below macro, values are assigned to reporting unit, numeric and character result in the reporting unit and reference ranges for lower and upper limits. The macro parameter “&conv_ln” can be used to switch off the conversion of lower and upper limits; however the default and highly recommended option (presented below) is that the conversion is done. Additionally, the variable “convfl” is created to flag records that have been converted and to aid quality checks of the conversion.

In the example below a specific format for the results is not required, however, standard specific or study-specific numeric result precision may be expected.

/* conv_ln – macro parameter, specify if lower and upper limits of range should be converted (conv_ln=Y)
rep_unit – reporting unit
factor – conversion factor
convfl – flag for converted observations
standard CDISC variables:
lborres – result or finding in original units
lborresu – original units
lbsrtesu - standard units
lbstresn – numeric result/finding in standard units
lbstresc – character result/finding in standard format
lbornrlo – reference range lower limit in original unit
lbornrhi – reference range upper limit in original unit
lbstnrlo – reference range lower limit - standard units
lbstnrhi – reference range upper limit - standard units */ 
%macro conversion(conv_ln=Y);
   %let conv_ln=%upcase l(&conv_ln);
   if upcase(lborresu) ne upcase(rep_unit) and cmiss(factor,
 lborres)=and findc(lborres, '.', 'dkt')=then
       lbstresn=input(lborres, best.)*factor;
       lbstresc=strip(put(lbstresn, best.));

       %if &conv_lnest. = Y %then 
           lbstnrlo=input(lbornrlo, best.)*factor;
           lbstnrhi=input(lbornrhi, best.)*factor;

%mend conversion; 

After conversion of units it is good practice to check the dataset for outliers as they may indicate data issues in the laboratory dataset. An example of a data issue is incorrect recording of prefix ‘micro’ in the unit and using incorrect symbol ‘m’ instead of ‘u’. Unit micromole per litre should be written as ‘umol/L’ however it may be wrongly assigned as ‘mmol/L’, resulting in 1000 times higher result than the actual value. Such cases should be reported to Clinical Data Management.

Central and local laboratories

Central laboratories can be identified by a unique laboratory identifier or name/address which should be documented in a clinical data management plan, data transfer agreement, dataset specifications or similar document.

Common issues with data from local laboratories are:

  • missing or non‑evaluable result (for example, ‘less than 10’, ‘11-21’, ‘3 Plus’, ‘>=1000’)
  • upper and/or lower limit ranges not provided
  • non-standard/missing units or units concatenated with the result

Data obtained from a combination of central and local laboratories may also cause issues, for example:

  • not including results from local laboratories in the outputs
  • converting units only for local laboratories
  • applying different parameters derivations for results obtained from local and central laboratories

While central laboratories ensure a standard approach and provide values in standardised units, it is not always the case with local laboratories. The macro below may be used to separate units concatenated to the value in the result variable (i.e. unit is included in the result variable and the unit variable is missing). Macro should be called within the data step for tests where numeric result is expected. It is assumed that:

  • result value begins with number or ‘.’ in the case of missing result
  • unit value starts with a letter, the macro will not work properly for units like ‘10^9/L’ or ‘10^6/UL

/*condition - macro parameter, used to specify subset of tests for macro
sepfl – flag for separated observations
old_lborres - holds value of lborres before separation
lborres, lborresu – standard CDISC variables, see code above *

%macro separate (condition);
  if &condition. then
      if anyalpha(lborres)> 
        and (anydigit(substr(lborres_,,))=
 substr(lborres_,,1 )='.') 
         and missing(lborresu) then
           lborresu = substr(lborres_, anyalpha(lborres_));
           lborres = substr(lborres_, , anyalpha(lborres_)-);

   drop lborres_;
%mend separate;

Example calls may be:
%separete (lbtestcd in ('ALT' 'CA' 'BILI' 'K'))
%separate (lbcat ne 'URINALYSIS')


Assigning the baseline value

The Study Data Tabulation Model (SDTM) baseline flag should be used on team consent; otherwise it may be necessary to ask for appropriate baseline definition. Baseline definition can be a specific visit or the last non missing result prior to first dose. While developing baseline algorithm, consider usage/ imputation of measurement time or time-points, imputation of missing/ incomplete dates and inclusion of unplanned visit. Finally, is important to clarify if subjects with no baseline are expected to be summarised in post-baseline or shift tables.

How can you develop the processing of laboratory data within your organisation?

  • use a CDISC validator which will cover many checks described earlier
  • have a designated laboratory subject matter expert
  • develop a Best Programming Practices for Laboratory Domain document
  • create a database of mapping decisions from studies that use the same standard or keep one study as a reference for mapping
  • develop a list of standard QC checks for the laboratory domain

Develop standard macros to handle repeated steps, for example: read in and check the specification with a CDISC validator, perform the test/units mapping, read in/create codelists, convert units and derive baseline.

sas macros in clinical trial reporting


Quanticate's statistical programming team can support you with Laboratory dataset, CDISC Mapping and SDTM conversions and domains. Submit a Request for Information and a member of Quanticate's Business Development team will be in touch with you shortly.


Related Blog Posts:


Subscribe to the Blog