As statistical programmers at a leading data-focused Clinical Research Organization (CRO), we are involved in many programming activities on a daily basis, centred around dataset or display generation and quality control (QC). The opportunity to develop a process or system which can be used by others is rare. Building any system requires a lot of in-depth thought before any programming begins. We take the requirements and build a robust system to address each scenario that may arise, including some which should never occur in practice.
In this blog we explain how statistical programmers can work through a simple request to build a robust system using logic, SAS, UNIX and experience. It provides a flavour of how robust systems are built and of the other considerations involved.
The client provided a simple top-level request describing the new system they required.
“To build a robust system which reacts to display creation; finds and runs the related QC programs automatically, and compiles the information in one central place. This needs to work for multiple studies with differing standards.”
A programmer must consider all scenarios and tease out every nuance of the request in order to create the system (the batch program).
After receiving the client’s request, a programmer should brainstorm all possible questions, issues and other considerations. This approach helps focus the programmer on the requirements and identifies areas for the client to consider. It is a fundamental part of a programmer’s role before starting on a request.
Some of the considerations for a request are:
After brainstorming, we gather all the questions and put them in order, i.e. in terms of how the batch program will work. The client should be kept informed of progress, and any issues should be discussed, to ensure the initial top-level requirements have been interpreted correctly. Without the client’s involvement at this stage, there is a risk of wasting time building an enhanced system that was never required in the first place.
Therefore we break down the items as follows, which will be considered separately throughout this article:
The main components of the process include:
To begin, we need to set up the environment. Before doing so, we must consider how and where the environment information will be defined, so that setting it up is straightforward.
Defining one key program (GLOBPROT.SAS) that holds all the relevant information means we can use it to set up the environment and also call it from every other program thereafter. We can therefore maintain a suite of standard programs, each of which calls the specific GLOBPROT.SAS containing the settings for that reporting delivery.
What information do we require to be included in this specific GLOBPROT.SAS for our client?
When developing a directory structure, keep it simple; you want users to become familiar with the new structure immediately. There will be a directory called QC_MASTER and, underneath it, the subdirectories PROGRAMS, OUTPUT, LOG etc.
Note: Underneath the QC_MASTER/programs the master copies of the programs will reside.
For each reporting delivery there will be a directory called QC_snapdate (e.g. QC_26SEP2011), containing the same subdirectories as QC_MASTER.
The easiest way to explain this is diagrammatically:
Figure 1: Diagrammatic representation of the directory/sub-directory structure
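The directory layout described above can be sketched with a few UNIX commands. This is only an illustration: the temporary root directory and the lowercase subdirectory names (programs, output, log) are our assumptions, and the real structure may contain further subdirectories.

```shell
#!/bin/sh
# Sketch only: the root location and lowercase subdirectory names are assumptions.
root=$(mktemp -d)        # hypothetical project root (a temporary directory for the demo)
snapdate=26SEP2011       # snap date taken from the example in the text

for area in QC_MASTER "QC_${snapdate}"; do
    for sub in programs output log; do
        # -p creates missing parents and does not fail if the directory already exists
        mkdir -p "${root}/${area}/${sub}"
    done
done

# List the subdirectories of the reporting delivery, which mirror QC_MASTER
ls -1 "${root}/QC_${snapdate}"
```

Because mkdir -p is safe to rerun, the same commands can be used both for a first-time set-up and for a later check that nothing is missing.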
Again, the naming convention for the QC programs needs to be simple so that it can be applied to all displays with ease.
For this particular client the display name followed a specific pattern so it was easy to apply. By taking the original display name to be QCed and stripping this back to create a linking variable, the QC program name can then be assigned.
Figure 2: Diagrammatic representation of how the QC programming naming convention was derived
The linking variable is a key variable within the program; without it the batch program will not function properly. It enables us to define the status flag for each display within the batch program, which will be discussed later.
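To make the idea of a linking variable concrete, here is a minimal sketch. The client's actual naming pattern is not reproduced here, so the display name, the rule of stripping the file extension, and the lower-casing step are all hypothetical; only the qc_ prefix and the .sas extension follow the QC program names used elsewhere in this blog.

```shell
#!/bin/sh
# Sketch only: display name and stripping rule are hypothetical.
display="T_DEMOG_01.lst"                                   # hypothetical display file name

link="${display%.*}"                                       # strip the file extension
link=$(printf '%s' "$link" | tr '[:upper:]' '[:lower:]')   # normalise case
qcprog="qc_${link}.sas"                                    # QC program name built from the link

echo "${display} -> ${link} -> ${qcprog}"
```

The same linking value can then be used as the merge key between the display list, the QC program list and the QC log list.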
There are six main components of the process:
The batch program was developed with the above in mind. Each component can be driven independently through its own macro variable, which keeps the code as flexible as possible. The macro variables are highlighted in bold below:
Note when P2-P5 are run, the QC spreadsheet is updated after each of these tasks has been performed. This ensures the QC spreadsheet always reflects the latest information gathered.
For the remainder of the blog we will concentrate on P1-P2 and as a whole discuss the rest when identifying the status of each display.
This is driven by macro variable P1_NEWDIRYN.
At this point in the blog we introduce examples of the code from the batch program, specifically the UNIX commands. These need to be surrounded by the SAS code in bold below to allow the UNIX commands to be run as part of the SAS batch program.
The code below creates the directories (mkdir) then copies (cp -p) the globprot.sas from the MASTER area to the reporting effort retaining the file permissions.
cp -p &m_lib.globprot.sas &q_lib.globprot.sas;
The code below copies the qc_plan.csv template file from the MASTER area to the reporting effort ONLY if it doesn’t already exist in the reporting effort area. It also renames it to include the reporting effort snap date.
%if %sysfunc(fileexist(&q_lib.qc_plan_&snapdt2d..csv)) eq 0
cp -p &m_lib.qc_plan.csv
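The "copy only if it does not already exist" logic can also be expressed purely in UNIX. The sketch below is illustrative rather than the batch program itself: the temporary directories stand in for the MASTER and reporting effort areas, and the CSV header and the edited row (including the PASS value) are invented for the demonstration.

```shell
#!/bin/sh
# Sketch only: paths and file contents are hypothetical stand-ins.
m_lib="$(mktemp -d)/"     # stands in for the MASTER area
q_lib="$(mktemp -d)/"     # stands in for the reporting effort area
snapdt=26SEP2011
printf 'display,qc_program,status\n' > "${m_lib}qc_plan.csv"   # hypothetical template

copy_plan_once() {
    target="${q_lib}qc_plan_${snapdt}.csv"
    # Copy and rename the template only when no plan exists yet
    if [ ! -e "$target" ]; then
        cp -p "${m_lib}qc_plan.csv" "$target"
    fi
}

copy_plan_once                                                   # first run: template copied
echo 't_demog_01,qc_t_demog_01.sas,PASS' >> "$target"            # simulate a user edit
copy_plan_once                                                   # second run: edit preserved
cat "$target"
```

The guard matters because the QC plan accumulates information over time; overwriting it on every run would discard entries made since the previous run.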
The code below navigates to the display output directory (cd), then opens each display and extracts the date/time stamp recorded when the display was run (perl). We did not use the UNIX file date/time stamp because displays could have been copied into the display folder, in which case the file stamp would not accurately reflect when the display was created.
cd &outdir_lib;    Display directory
Internal date/time stamp
perl -nle 'print "$ARGV: $&" if /(0[1-9]|[0-9]|3)
(11|12)\ ([0-1][0-9]|2[0-3]):([0-5][0-9])/' *.* > &q_lib.dd.txt;
Note: The Perl code was provided by a colleague, which saved us writing additional SAS code.
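For readers without Perl to hand, the same idea (read the stamp written inside each display rather than the file system stamp) can be sketched with grep. This is not the code used in the batch program: the .lst extension, the sample display contents and the DDMONYYYY HH:MM stamp format are assumptions made for the demonstration.

```shell
#!/bin/sh
# Sketch only: file names, contents and the stamp format are assumptions.
outdir=$(mktemp -d)       # stands in for the display output directory
printf 'Table 1 Demographics\nProgram run: 26SEP2011 14:05\n' > "${outdir}/t_demog_01.lst"
printf 'Table 2 Adverse Events\nProgram run: 27SEP2011 09:30\n' > "${outdir}/t_ae_01.lst"

cd "$outdir" || exit 1
for f in *.lst; do
    # Keep the first DDMONYYYY HH:MM stamp found inside the display
    stamp=$(grep -Eo '[0-3][0-9][A-Z]{3}20[0-9]{2} [0-2][0-9]:[0-5][0-9]' "$f" | head -n 1)
    echo "${f}: ${stamp}"
done > dd.txt

cat dd.txt
```

Each output line pairs a display file name with its internal stamp, which is the shape of information the batch program later reads back in.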
The code below navigates to the reporting effort directory, creates a list of the QC programs only (ls -1) and a long list of the QC logs, including date/times, in this area (ls -el). It then navigates to the MASTER directory and creates a list of the QC programs available there.
cd &q_lib;                                QC snap directory
ls -1 qc_*.sas > &q_lib.qcprog.txt;       2b] QC programs
ls -el qc_*.log > &q_lib.qclog.txt;       2c] QC log
cd &m_lib;                                QC MASTER directory
ls -1 qc_*.sas > &q_lib.qcmaster.txt;     2d] QC programs
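One way to use two such listings is to compare them, for example to see which QC programs exist in the MASTER area but not yet in the reporting effort. The sketch below assumes that use; the file contents are invented, and the batch program itself performs its comparisons in SAS.

```shell
#!/bin/sh
# Sketch only: listing contents are hypothetical.
work=$(mktemp -d)
printf 'qc_t_demog_01.sas\nqc_t_ae_01.sas\n' > "${work}/qcmaster.txt"   # MASTER listing
printf 'qc_t_demog_01.sas\n' > "${work}/qcprog.txt"                     # reporting effort listing

# comm requires sorted input
sort "${work}/qcmaster.txt" > "${work}/master.sorted"
sort "${work}/qcprog.txt" > "${work}/snap.sorted"

# -23 keeps lines found only in the first file: in MASTER, missing from the reporting effort
missing=$(comm -23 "${work}/master.sorted" "${work}/snap.sorted")
echo "$missing"
```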
The data from the QC plan spreadsheet is imported into the batch program to maintain existing information. The following is a list of some of the variables within the spreadsheet, with example entries.
Here is a summary of all the files created so far, which will be used in the batch program to identify the status of each display:
(*) The date/time derived variables are also used to identify the status and to ensure that the QC information was generated AFTER the display was created.
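The date/time check can be illustrated with a small function. The status labels and the YYYYMMDDHHMM timestamp format below are hypothetical, chosen only to show the comparison; they are not the status values used in the batch program.

```shell
#!/bin/sh
# Sketch only: status labels and the YYYYMMDDHHMM format are hypothetical.
qc_status() {
    # $1 = display date/time, $2 = QC log date/time (empty when no QC log exists)
    if [ -z "$2" ]; then
        echo "QC NOT RUN"
    elif [ "$2" -gt "$1" ]; then
        echo "QC UP TO DATE"       # QC was run after the display was created
    else
        echo "QC OUT OF DATE"      # display was re-created after the last QC run
    fi
}

qc_status 201109261405 201109261530
qc_status 201109261405 201109251000
qc_status 201109261405 ""
```

Writing the stamps as a single number with the largest unit first means a plain numeric comparison is enough to order them.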
The table below provides an example for each status setting:
DISPLAYS TO QC QC PROGRAMS AREAS
Although the initial request was simple, it developed into a robust system which could be used by all. To ensure the system is easy to use, it needs to be clearly documented. Any assumptions need to be explained and specific limitations of the system identified. Examples of how to use the system will help users work with the batch program. The documentation should be a living document which is updated alongside any system modifications.
In addition to the standard requirements, we have created an audit file which collates information every time the batch program is run. This way we can monitor the use of the program and the type of information which was required using the following 5 categories: AUDIT DATE, AUDIT TIME, USER, TASK and ADDITIONAL INFORMATION.
Examples of the contents of TASK are as follows:
The additional information for P3 and P4 is the QC program names.
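An audit trail of this kind can be kept by appending one record per task. The sketch below uses the five fields named above; the comma-separated layout, the date and time formats, and the example task values are assumptions for illustration.

```shell
#!/bin/sh
# Sketch only: file layout and example values are assumptions.
audit_file=$(mktemp)      # stands in for the audit file

audit() {
    # $1 = TASK, $2 = ADDITIONAL INFORMATION
    # Fields: AUDIT DATE, AUDIT TIME, USER, TASK, ADDITIONAL INFORMATION
    printf '%s,%s,%s,%s,%s\n' \
        "$(date +%Y-%m-%d)" "$(date +%H:%M:%S)" "${USER:-unknown}" "$1" "$2" \
        >> "$audit_file"
}

audit "P1" ""
audit "P3" "qc_t_demog_01.sas"
cat "$audit_file"
```

Appending rather than overwriting keeps the full history of runs, which is what makes it possible to monitor how the program is used over time.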
The benefits of the new robust system to the client were:
A statistical programmer plays a key role in ensuring that a robust and dynamic system is put in place. Being a ‘thinking programmer’ ensures that the optimal system and process are developed to meet the needs of each team. Communication with, and involvement of, all stakeholders will ensure that the resulting system is easy to use, relevant, and includes all appropriate steps to ensure quality and consistency.