
The Logic of a Clinical Research Organization Programmer


As statistical programmers at a leading data-focused Clinical Research Organization (CRO), we are asked to take part in many programming activities every day, centred on dataset or display generation and quality control (QC). The opportunity to develop a process or system which can be used by others is rare. Building any system requires a lot of in-depth thought before programming begins. We take the requirements and build a robust system that addresses each scenario that may arise, including some which should not.

In this blog we explain how statistical programmers can work through a simple request to build a robust system using logic, SAS, UNIX and experience. It gives a flavour of how robust systems are built and of what else needs to be considered along the way.

Customer Requirements

The client provided a simple top-level request for the new system they required:

“To build a robust system which reacts to display creation; finds and runs the related QC programs automatically, and compiles the information in one central place. This needs to work for multiple studies with differing standards.”

A programmer must consider all scenarios and tease out all the nuances of the request in order to create the system (the batch program).


The Thinking Programmer

After receiving the client's request, a programmer should brainstorm all possible questions, issues and other considerations. This approach helps focus the programmer on the requirements and identifies areas for the client to consider. It is a fundamental part of a programmer's role before starting work on a request.

Some of the considerations for this request are:

  • What aspects are there to creating a robust system?
  • Should there be restrictions on who can run the system?
  • How will we monitor who runs the system?
  • How will this work across the different reports of data?
  • How are we going to gather the required information?
  • How will we link the information?
  • What will be the default settings?
  • Where will we define the default settings?
  • What is the new directory structure?
  • What sub folders are required?
  • What will be the QC naming convention?
  • Do we need to consider switching code on/off?
  • How can we maintain and update the QC information?
  • What are the assumptions?

Breaking down the task

After the brainstorming, we gather the questions and put them in order, i.e. in terms of how the batch program will work. The client should be kept informed of progress, and any issues discussed, to ensure the initial top-level requirements have been interpreted correctly. Without the client's involvement at this stage, there is a risk of spending time on an "improved" system that was never required in the first place.

We therefore break the work down into the following items, each of which is considered separately in this article:

  • Setting up the environment
  • Directory structure
  • QC programming naming convention

The main components of the process include:

  1. Setting up the QC reporting environment
  2. Checking the displays vs QC programs available
  3. Copying QC programs from MASTER directory
  4. Running the QC programs

Setting up the Environment

Before setting up the environment, we first need to decide how and where the required information will be defined, so that the environment can be set up with ease.

Defining all the relevant information in one key program (GLOBPROT.SAS) means that this program can be used to set up the environment and can also be called by every other program thereafter. We can therefore keep a suite of standard programs, each of which calls the GLOBPROT.SAS that holds the settings for that specific reporting delivery.

What information do we require to be included in this specific GLOBPROT.SAS for our client?

  • Date of the reporting effort: e.g. 26SEP2011
  • Type of deliverable: e.g. SRT, CSR etc.
  • Standard “Client” macro library path
  • Formats
  • Subsetting code
  • Macro variables, including:
    • outdir_lib: Display directory
    • m_lib: QC Master directory
    • q_lib: QC Reporting effort directory
  • Anything else relevant to our specific reporting needs

Directory Structure

When developing a directory structure, keep it simple: you want users to become familiar with the new structure immediately. There will be a directory called QC_MASTER and, underneath it, the subdirectories PROGRAMS, OUTPUT, LOG etc.

Note: The master copies of the QC programs will reside under QC_MASTER/programs.

For each reporting delivery there will be a directory called QC_snapdate (e.g. QC_26SEP2011), in which the same subdirectories will be created, mirroring QC_MASTER.
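As a rough illustration of that layout, the shell sketch below builds both areas under a scratch directory. PROJECT_ROOT and SNAPDATE are names we have made up for the example, and only the three subdirectories named in this blog are created.

```shell
#!/bin/sh
# Sketch only: build the QC_MASTER / QC_<snapdate> layout under a scratch root.
# PROJECT_ROOT and SNAPDATE are hypothetical names, not part of the batch program.
PROJECT_ROOT=$(mktemp -d)
SNAPDATE=26SEP2011

for area in QC_MASTER "QC_${SNAPDATE}"; do
    for sub in programs output log; do
        # -p creates missing parents and does not fail if the directory exists
        mkdir -p "${PROJECT_ROOT}/${area}/${sub}"
    done
done

# List the resulting tree, one directory per line
find "${PROJECT_ROOT}" -type d | sort
```

Because mkdir -p is safe to repeat, a sketch like this can be rerun for the same snap date without disturbing what is already there.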

The easiest way to explain this is diagrammatically:

Figure 1: Diagrammatic representation of the directory/sub-directory structure



QC Programming Naming Convention

Again, the method needs to be simple so that it can be applied to every display with ease.

For this particular client the display name followed a specific pattern so it was easy to apply. By taking the original display name to be QCed and stripping this back to create a linking variable, the QC program name can then be assigned.

Figure 2: Diagrammatic representation of how the QC programming naming convention was derived


The linking variable is a key variable within the program; without it, the batch program will not function properly. It enables us to define the status flag for each display within the batch program, which is discussed later.
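To make the idea concrete, here is a minimal shell sketch of one possible stripping rule. The rule itself is an assumption inferred from the example entries later in this blog (demo4_saf.htm, demo4, qc_demo4.sas): drop the extension, then drop everything from the first underscore. The function names link_var and qc_program are ours.

```shell
# link_var FILE: derive the linking variable from a display file name.
# Assumed rule (illustration only): drop the extension, then drop everything
# from the first underscore onwards, so demo4_saf.htm -> demo4.
link_var() {
    name=${1##*/}                 # strip any leading directory
    name=${name%.*}               # strip the extension (e.g. .htm)
    printf '%s\n' "${name%%_*}"   # strip the suffix after the first underscore
}

# qc_program FILE: QC program name built from the linking variable.
qc_program() {
    printf 'qc_%s.sas\n' "$(link_var "$1")"
}

link_var demo4_saf.htm
qc_program demo4_saf.htm
```

A different client naming pattern would need a different rule inside link_var, but the rest of the process only ever sees the linking variable.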

The Main Components of the Process

There are six main components of the process:

  1. Setting up the QC reporting environment
  2. Checking the displays vs QC programs available
  3. Copying QC programs from MASTER directory
  4. Running the QC programs
  5. Checking the QC logs
  6. Copying the information back to file

The batch program was developed with the above in mind. Each component can be driven independently using a macro variable, which gives as much flexibility with the code as possible. The macro variables are listed below:

  • P1_NEWDIRYN=Y: Setting up the QC reporting environment
  • P2_QCVSOUTYN=Y: Checking the displays vs QC programs available
    • 2a] Displays to be QCed
    • 2b] QC programs already in QC reporting effort
    • 2c] QC logs already in QC reporting effort
    • 2d] QC programs in the MASTER area
    • 2e] Incorporating the QC spreadsheet
    • 2f] Creating the match and defining the status
  • P3_COPYQCYN=Y: Copying QC programs from QC_MASTER/programs
  • P4_RUNQCYN=Y: Running the QC programs
  • P5_CHKLOGYN=Y: Checking the LOGs

Note that when P2-P5 are run, the QC spreadsheet is updated after each of these tasks has been performed. This ensures the QC spreadsheet always reflects the latest information gathered.

For the remainder of the blog we concentrate on P1 and P2, and discuss the remaining steps as a whole when identifying the status of each display.
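The on/off idea can be sketched in shell as follows. This is not the batch program itself: the switch names are borrowed from the list above, run_step is a hypothetical helper, and each step is only reported, not actually performed.

```shell
# Each switch defaults to Y; set one to N in the environment to skip that step.
: "${P1_NEWDIRYN:=Y}" "${P2_QCVSOUTYN:=Y}" "${P3_COPYQCYN:=Y}"
: "${P4_RUNQCYN:=Y}" "${P5_CHKLOGYN:=Y}"

STEPS_RUN=""

# run_step SWITCH_VALUE LABEL: report the step, and record it only when the
# switch is Y. A real implementation would call the step's code here.
run_step() {
    if [ "$1" = "Y" ]; then
        echo "running: $2"
        STEPS_RUN="${STEPS_RUN}${2};"
    else
        echo "skipped: $2"
    fi
}

run_step "$P1_NEWDIRYN"  "P1 set up QC reporting environment"
run_step "$P2_QCVSOUTYN" "P2 check displays vs QC programs"
run_step "$P3_COPYQCYN"  "P3 copy QC programs from QC_MASTER/programs"
run_step "$P4_RUNQCYN"   "P4 run QC programs"
run_step "$P5_CHKLOGYN"  "P5 check logs"
```

Keeping one switch per component means a single step can be rerun without touching the others.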

Setting up the QC Reporting Environment

This is driven by macro variable P1_NEWDIRYN.

At this point we introduce examples of the code from the batch program, specifically the UNIX commands. Each set of commands is wrapped in the SAS call %sysexec %str( ... ); so that the UNIX commands run as part of the SAS batch program.

The code below creates the directories (mkdir), then copies (cp -p) globprot.sas from the MASTER area to the reporting effort, retaining the file permissions.

/* Locations: &m_lib is QC_MASTER/programs; the new directories are */
/* QC_26SEP2011 and QC_26SEP2011/programs                            */
%sysexec %str(
  cd &m_lib;
  cd ../../;
  mkdir QC_&snapdt2;
  cd QC_&snapdt2;
  mkdir programs;
  cp -p &m_lib.globprot.sas &q_lib.globprot.sas;
);


The code below copies the qc_plan.csv template file from the MASTER area to the reporting effort ONLY if it doesn’t already exist in the reporting effort area. It also renames it to include the reporting effort snap date.

%if %sysfunc(fileexist("&q_lib.qc_plan_&snapdt2d..csv")) eq 0
%then %do;
  %sysexec %str(
    cp -p &m_lib.qc_plan.csv &q_lib.qc_plan_&snapdt2d..csv;
  );
%end;
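The same "copy only if absent" logic can be sketched in plain shell. The helper name copy_if_absent, the scratch paths and the CSV contents are all made up for the demonstration. The second call shows that an existing plan is left untouched.

```shell
# copy_if_absent SRC DEST: copy SRC to DEST with cp -p (keeps permissions and
# timestamps), but only when DEST does not exist yet.
copy_if_absent() {
    if [ -e "$2" ]; then
        echo "kept existing $2"
    else
        cp -p "$1" "$2" && echo "copied $1 -> $2"
    fi
}

# Demonstration in a scratch directory (paths and contents are illustrative)
WORK=$(mktemp -d)
mkdir -p "$WORK/QC_MASTER/programs" "$WORK/QC_26SEP2011/programs"
echo "DISPLAY,LINKVAR,STATUS" > "$WORK/QC_MASTER/programs/qc_plan.csv"
PLAN="$WORK/QC_26SEP2011/programs/qc_plan_26SEP2011.csv"

copy_if_absent "$WORK/QC_MASTER/programs/qc_plan.csv" "$PLAN"   # first run copies
echo "demo4_saf.htm,demo4,0" >> "$PLAN"                          # plan is updated
copy_if_absent "$WORK/QC_MASTER/programs/qc_plan.csv" "$PLAN"   # second run keeps it
```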

Identifying the Information for the Displays to be QC'd (2a)

The code below navigates to the display output directory (cd), then opens each display and extracts the date/time stamp of when the file was run (perl). We did not use the UNIX date/time stamp because the displays could have been copied into the display folder, in which case the file time would not accurately reflect when the display was created.

/* cd: display directory; perl: internal date/time stamp of each display */
%sysexec %str(
  cd &outdir_lib;
  perl -nle 'print "$ARGV: $&" if /(0[1-9]|[12][0-9]|3[01])
          (11|12)\ ([0-1][0-9]|2[0-3]):([0-5][0-9])/' *.* > &q_lib.dd.txt;
);

Note: The Perl code was provided by a colleague, which saved us writing additional SAS code.
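For readers without Perl to hand, a comparable extraction can be sketched with grep. The stamp format DDMONYY HH:MM is an assumption made for this example, as are the two display files and their contents. It is not the pattern used in the batch program.

```shell
WORK=$(mktemp -d)

# Two made-up display files carrying an internal run stamp
printf '<html><body>Demographics\nRun: 28SEP11 16:20</body></html>\n' > "$WORK/demo4_saf.htm"
printf '<html><body>Adverse events\nRun: 29SEP11 09:05</body></html>\n' > "$WORK/ae1_saf.htm"

# Assumed stamp format: DDMONYY HH:MM
STAMP='(0[1-9]|[12][0-9]|3[01])(JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)[0-9]{2} ([01][0-9]|2[0-3]):[0-5][0-9]'

# -o prints only the matched text, -H prefixes the file name,
# -E switches on extended regular expressions
(cd "$WORK" && grep -oHE "$STAMP" *.htm) > "$WORK/dd.txt"
cat "$WORK/dd.txt"
```

Each line of dd.txt then pairs a display name with the stamp found inside it, which is the shape of information the later matching step needs.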

Identifying the QC Information (2b-2d)

The code below navigates to the reporting effort directory, creates a list of the QC programs only (ls -1) and a long list of the QC logs, including dates/times, in this area (ls -el). It then navigates to the MASTER directory and creates a list of the QC programs available there.

/* 2b] QC programs and 2c] QC logs in the QC snap directory; */
/* 2d] QC programs in the QC MASTER directory                */
%sysexec %str(
  cd &q_lib;
  ls -1 qc_*.sas > &q_lib.qcprog.txt;
  ls -el qc_*.log > &q_lib.qclog.txt;
  cd &m_lib;
  ls -1 qc_*.sas > &q_lib.qcmaster.txt;
);

QC Plan Information (2e)

The data from the QC plan spreadsheet is imported into the batch program so that existing information is maintained. The following is a list of some of the variables within the spreadsheet, with example entries.


  • DISPLAY: demo4_saf.htm
  • DISPLAYINFO: /project/study/output/26sep11_srt/
  • LINKVAR: demo4
  • DISPDT: 28SEP11:16:20:00
  • QCPROG: qc_demo4.sas
  • QCLOG: qc_demo4.log
  • QCDT: 01OCT11:04:09:00
  • STATUS: 0
  • STATUSD: QCed
  • ORDER: 28
  • AUTO_COMMENTS: 1 MESSAGES - CHECK LOG (0 ERROR 1 WARNING 0 UNINIT 0 REPEATS)
  • MANUAL_COMMENTS: All values matched with the actual output
  • QCPASSED_YN: Y
  • QCAPPROVER: JO
  • QCAPP_DATE: 01OCT11
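An AUTO_COMMENTS entry of this shape could be produced by counting messages in a QC log. The sketch below is our own illustration, not the batch program's code. The grep patterns, the wording for a clean log and the sample log contents are assumptions.

```shell
# log_summary LOGFILE: count message types in a SAS log and print one line.
log_summary() {
    errors=$(grep -c '^ERROR' "$1")
    warnings=$(grep -c '^WARNING' "$1")
    uninit=$(grep -c 'is uninitialized' "$1")
    repeats=$(grep -c 'repeats of BY values' "$1")
    total=$((errors + warnings + uninit + repeats))
    if [ "$total" -eq 0 ]; then
        echo "0 MESSAGES"    # wording for a clean log is made up
    else
        echo "$total MESSAGES - CHECK LOG ($errors ERROR $warnings WARNING $uninit UNINIT $repeats REPEATS)"
    fi
}

# A made-up log with a single warning
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
NOTE: The data set WORK.DEMO has 120 observations and 8 variables.
WARNING: Multiple lengths were specified for the variable SEX by input data set(s).
NOTE: DATA statement used (Total process time):
EOF

log_summary "$LOG"
```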

Compiling the Information

Here is a summary of all the files created so far, which the batch program uses to identify the status of each display:

  • dd.txt: Displays to QC (*)
  • qcmaster.txt: QC programs in Master area
  • qcprog.txt: QC programs in reporting effort area
  • qclog.txt: QC log (*) in reporting effort area
  • qc_plan*.csv: Existing QC_PLAN (*) contents

(*) The date/time variables derived from these files are also used to identify the status, and to ensure that the QC took place AFTER the display was created.

Creating the Match and Defining the Status (2f)
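The "QC after display" check can be sketched in shell by turning each stamp into a sortable number. The sketch assumes stamps in the DDMONYY:HH:MM:SS form shown in the example entries and two-digit years within the same century. The function names are ours.

```shell
# sortable_dt STAMP: convert DDMONYY:HH:MM:SS (e.g. 28SEP11:16:20:00)
# into YYMMDDHHMMSS so that later stamps give larger numbers.
sortable_dt() {
    dd=$(printf '%s' "$1" | cut -c1-2)
    mon=$(printf '%s' "$1" | cut -c3-5)
    yy=$(printf '%s' "$1" | cut -c6-7)
    hms=$(printf '%s' "$1" | cut -c9-16 | tr -d ':')
    case "$mon" in
        JAN) mm=01 ;; FEB) mm=02 ;; MAR) mm=03 ;; APR) mm=04 ;;
        MAY) mm=05 ;; JUN) mm=06 ;; JUL) mm=07 ;; AUG) mm=08 ;;
        SEP) mm=09 ;; OCT) mm=10 ;; NOV) mm=11 ;; DEC) mm=12 ;;
        *) echo "unknown month: $mon" >&2; return 1 ;;
    esac
    printf '%s%s%s%s\n' "$yy" "$mm" "$dd" "$hms"
}

# qc_after_display DISPDT QCDT: succeed when the QC ran after the display.
qc_after_display() {
    [ "$(sortable_dt "$2")" -gt "$(sortable_dt "$1")" ]
}

if qc_after_display 28SEP11:16:20:00 01OCT11:04:09:00; then
    echo "QC is newer than the display"
else
    echo "display changed after QC: rerun the QC program"
fi
```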

The table below provides an example for each status setting:

[Table image not reproduced (Logic of a CRO Programmer4.png). Column groups: DISPLAYS TO QC | QC PROGRAMS AREAS]
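The matching step can be sketched in shell using the three outcomes the batch program distinguishes: a QC program already in place, one that needs copying over from QC_MASTER/programs, and one that still needs developing. The input files and their contents below are invented for the example, and no status codes are assigned because those are client specific.

```shell
WORK=$(mktemp -d)

# Illustrative inputs: linking variables for the displays, plus QC program
# listings in the form "ls -1 qc_*.sas" would produce.
printf 'demo4\nae1\nlab2\n'          > "$WORK/displays.txt"
printf 'qc_demo4.sas\n'              > "$WORK/qcprog.txt"
printf 'qc_demo4.sas\nqc_ae1.sas\n'  > "$WORK/qcmaster.txt"

# classify LINKVAR: report where the matching QC program was found.
# grep -x matches whole lines, -F treats the name as fixed text, -q is silent.
classify() {
    prog="qc_$1.sas"
    if grep -qxF "$prog" "$WORK/qcprog.txt"; then
        echo "$1: QC program in place"
    elif grep -qxF "$prog" "$WORK/qcmaster.txt"; then
        echo "$1: copy from QC_MASTER/programs"
    else
        echo "$1: QC program needs developing"
    fi
}

while read -r linkvar; do
    classify "$linkvar"
done < "$WORK/displays.txt"
```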

The Importance of Documentation

Although the initial request was simple, it developed into a robust system which could be used by all. To ensure the system is easy to use, it needs to be clearly documented. Any assumptions need to be explained and specific limitations of the system need to be identified. Examples of how to use the system will help users work with the batch program. This should be a living document which is updated alongside any system modifications.

Audit File

In addition to the standard requirements, we created an audit file which collates information every time the batch program is run. This lets us monitor the use of the program and the type of information which was required, using the following five fields: AUDIT DATE, AUDIT TIME, USER, TASK and ADDITIONAL INFORMATION.

Examples of the contents of TASK are as follows:

  • P1_NEWDIRYN=Y: Created directory QC_22NOV11 and subdir programs, output and log. Copied: QCSETUP.SAS, GLOBPROT.SAS and QC_PLAN.CSV from QC_MASTER/programs
  • P2_QCVSOUTYN=Y: Identify which displays have QC programs in place, which need copying over from QC_MASTER/programs and which need developing
  • P3_COPYQCYN=Y: Copied over QC programs from QC_MASTER/programs
  • P4_RUNQCYN=Y: Ran QC programs
  • P5_CHKLOGYN=Y: Checked logs

For P3 and P4, the additional information is the list of QC program names.
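An audit trail of this kind can be sketched in a few lines of shell. The pipe-delimited layout, the audit helper and the temporary file location are assumptions for the example. Only the five field names come from the system described here.

```shell
AUDIT_FILE=$(mktemp)    # stand-in for the real audit file location

# audit TASK [ADDITIONAL_INFO]: append one record with the fields
# AUDIT DATE | AUDIT TIME | USER | TASK | ADDITIONAL INFORMATION
audit() {
    printf '%s|%s|%s|%s\n' \
        "$(date '+%d%b%Y|%H:%M:%S')" "${USER:-$(id -un)}" "$1" "${2:-}" >> "$AUDIT_FILE"
}

audit "P3_COPYQCYN=Y" "qc_demo4.sas qc_ae1.sas"
audit "P5_CHKLOGYN=Y"
cat "$AUDIT_FILE"
```

Appending rather than rewriting means earlier runs are never lost, which is the property an audit file needs.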

Benefits of the System

The benefits of the new robust system to the client were:

  • One reference area for all the master versions of the QC programs. If updates were required these were only made in the master area
  • The client had initially requested a fully automated system, but with the flexibility to run individual QC programs manually, independently of the system. The next time the system was run, it picked up the findings from those individual runs, which allowed for maximum flexibility
  • Having a mainly automated system enabled QC programs for SRT displays to be generated in minutes rather than hours. The system examined the displays, worked out which QC programs were required, ran them, and collated the findings into one central document
  • Each reporting effort was self-contained, with Programs, Logs and Outputs related to that delivery
  • Consistency in the directory structure and QC program naming convention enabled programmers to move around the reporting efforts with ease
  • With careful planning upfront, the new system was robust and dynamic


A statistical programmer plays a key role in ensuring that a robust and dynamic system is put in place. Being a ‘thinking programmer’ ensures that the best system and process are developed to meet the needs of each team. Communication with, and involvement by, all stakeholders will ensure that the resulting system is easy to use, relevant, and includes all appropriate steps to ensure quality and consistency.
