
The Advantages of Parallel Processing Clinical Data in SAS/Connect


The growing volume of data to be processed, together with wider use of computationally intensive statistical techniques such as Bayesian analysis and multiple imputation (MI) in clinical trials, has led to a large increase in computer processing times, which presents challenges when analyzing and reporting clinical trial data. If this increase has not been fully accounted for, it can delay timelines. The quality control (QC) programs for such tasks may also have to be run in parallel with the production programs to limit the total time spent on production and QC, which can be difficult to coordinate. Time constraints may even mean that double programming cannot fully reproduce the production results (e.g. fewer samples are obtained in a Bayesian analysis, or fewer imputations are performed), which reduces the quality of the QC process.

Parallel processing

Typically, reporting and analysis programs are run in SAS by a user submitting code to be executed in a single server session. Because this workload is single-threaded, all code is executed sequentially.

SAS/Connect is a SAS client/server toolset that provides the ability to manage, access, and process data in a distributed and parallel SAS environment. A program can therefore be executed on multiple server sessions at once, so that code runs in parallel, which can vastly reduce the processing time of tasks that can be executed in this manner.

Multiple imputation

Multiple imputation is a statistical technique for handling missing data: the missing values are imputed multiple times, each of the resulting datasets is analyzed separately, and the results are then combined.

Using SAS/Connect, datasets can be imputed and analyzed across multiple sessions at once, with the results pooled after all of the sessions have finished executing. For example, MI can be used to impute a dataset 1,000 times across 10 sessions, with 100 datasets being imputed and analyzed in each session and all 10 sessions running simultaneously. This can vastly reduce the time taken to perform the analysis compared with processing 1,000 datasets sequentially.
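The split-and-pool pattern described above can be sketched outside SAS. The following Python example only illustrates the idea and is not SAS/Connect syntax: `run_session`, `pool_rubin` and the simulated estimates are hypothetical stand-ins, worker threads stand in for server sessions, and the pooling step assumes Rubin's rules are used to combine the per-imputation results. Each session is given its own seed, because sessions sharing a seed would draw the same random numbers.

```python
from concurrent.futures import ThreadPoolExecutor
import random
import statistics

N_IMPUTATIONS = 1000  # total imputed datasets, as in the example above
N_SESSIONS = 10       # number of parallel sessions


def run_session(session_id, n_per_session):
    """Stand-in for one server session imputing and analyzing its share.

    Returns one (estimate, variance) pair per imputed dataset. The values
    are simulated placeholders, not the output of a real imputation model.
    """
    rng = random.Random(session_id)  # a different seed for every session
    return [(rng.gauss(1.0, 0.1), 0.04) for _ in range(n_per_session)]


def pool_rubin(results):
    """Combine per-imputation estimates and variances with Rubin's rules."""
    m = len(results)
    estimates = [est for est, _ in results]
    pooled = statistics.fmean(estimates)                   # pooled estimate
    within = statistics.fmean(var for _, var in results)   # within-imputation variance
    between = statistics.variance(estimates)               # between-imputation variance
    total_var = within + (1 + 1 / m) * between
    return pooled, total_var


def run_all():
    """Run every session concurrently, then pool once all have finished."""
    per_session = N_IMPUTATIONS // N_SESSIONS
    with ThreadPoolExecutor(max_workers=N_SESSIONS) as pool:
        chunks = pool.map(run_session, range(N_SESSIONS), [per_session] * N_SESSIONS)
        results = [pair for chunk in chunks for pair in chunk]
    return results, pool_rubin(results)
```

Threads keep the sketch self-contained; for CPU-bound work in Python, `ProcessPoolExecutor` exposes the same executor interface.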

Dataset processing

Clinical trials that collect vast amounts of data on each subject can make dataset processing (e.g. sorting, merging and deriving endpoints) take longer, because the size of the SAS dataset to be processed increases.

When processing a dataset, it may be possible to perform the vast majority of the required operations on a subset of the data. Subsets of a dataset can therefore be processed on separate sessions in parallel and combined once all sessions have finished executing. For example, when producing an analysis dataset such as ADLB in a trial where the derived variables for a given subject do not depend on information collected on other subjects, each subset of subjects can be processed in a different server session: a dataset containing information from 100 subjects can be processed across 10 sessions, each handling the data from 10 subjects.
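As a rough illustration of the subject-level split, the Python sketch below assigns whole subjects to sessions, processes each chunk concurrently and then recombines the output. It is not SAS/Connect code: the record layout (`subject`, `visit`, `value`), the `derive` step (change from each subject's first value) and the function names are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def derive(records):
    """Hypothetical derivation: change from each subject's first record.

    It only looks at records of the same subject, which is the condition
    that makes a subject-level split safe.
    """
    baseline = {}
    out = []
    for rec in records:
        base = baseline.setdefault(rec["subject"], rec["value"])
        out.append({**rec, "chg": rec["value"] - base})
    return out


def split_by_subject(records, n_sessions):
    """Assign whole subjects to sessions so no subject spans two chunks."""
    subjects = sorted({rec["subject"] for rec in records})
    session_of = {subj: i % n_sessions for i, subj in enumerate(subjects)}
    chunks = [[] for _ in range(n_sessions)]
    for rec in records:
        chunks[session_of[rec["subject"]]].append(rec)
    return chunks


def process_in_parallel(records, n_sessions=10):
    """Derive each chunk concurrently, then combine and re-sort the result."""
    chunks = split_by_subject(records, n_sessions)
    with ThreadPoolExecutor(max_workers=n_sessions) as pool:
        parts = list(pool.map(derive, chunks))
    combined = [rec for part in parts for rec in part]
    return sorted(combined, key=lambda r: (r["subject"], r["visit"]))
```

Because the derivation never crosses subjects, the combined output matches what a single sequential pass over the full dataset would produce.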

Analyzing multiple endpoints

Clinical trials often require multiple endpoints to be analyzed using the same analysis, e.g. a mixed-effects model. Using SAS/Connect, the analysis can be performed on the different endpoints in parallel rather than sequentially within a given program.

Executing multiple programs at once

When reporting a clinical trial, multiple analysis and reporting programs have to be executed, which can be time consuming, for example when updates to one program affect the results produced by other analysis programs, or on receipt of new or unblinded data. SAS/Connect enables you to run programs in parallel and even to account for dependencies between them, so that a program is only executed once the programs it depends on have finished running.
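Dependency-aware scheduling of this kind can be sketched in a few lines of Python. This is an analogy rather than SAS/Connect code: the program names and the `DEPENDENCIES` map are hypothetical, `run_program` is a stand-in for submitting a program to a server session, and the standard-library `graphlib.TopologicalSorter` is used here to release a program only after everything it depends on has completed.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter

# Hypothetical programs, each mapped to the programs it depends on.
DEPENDENCIES = {
    "adsl": set(),
    "adlb": {"adsl"},
    "adae": {"adsl"},
    "t_lab_summary": {"adlb"},
    "t_ae_summary": {"adae"},
}


def run_program(name, log):
    """Stand-in for running one program in a session; records the start order."""
    log.append(name)
    return name


def run_with_dependencies(dependencies, max_sessions=3):
    """Run programs concurrently, starting each only when its inputs are done."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    log = []
    running = set()
    with ThreadPoolExecutor(max_workers=max_sessions) as pool:
        while sorter.is_active():
            # Submit every program whose dependencies have all completed.
            for name in sorter.get_ready():
                running.add(pool.submit(run_program, name, log))
            # Block until at least one running program finishes.
            done, running = wait(running, return_when=FIRST_COMPLETED)
            for future in done:
                sorter.done(future.result())  # may unlock dependent programs
    return log
```

Calling `run_with_dependencies(DEPENDENCIES)` starts `adsl` first, then `adlb` and `adae` together, and each summary program only after its analysis dataset is complete. The `max_sessions` argument caps how many programs run at once.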

Disadvantages

Unfortunately, not all tasks can benefit from being executed in parallel, because some processing steps require the results of a previous step before they can begin.

When a single user runs multiple server sessions, that user can take up more server resources than have been accounted for, reducing the resources available to other users of the same server.

Conclusion

SAS/Connect is a powerful tool that enables SAS code to be executed in parallel with little additional programming, which can vastly reduce processing times for computationally intensive tasks. However, the additional processing carried out by an individual user can put strain on the servers used by others. A compromise must therefore be made: using fewer sessions increases processing time but leaves enough processing power for other users on the system.


Quanticate's statistical programming team can support you with laboratory datasets, CDISC mapping, and SDTM conversions and domains. Our team of experts would be happy to provide support and guidance for your development programme. If you have a need for these types of services, please submit an RFI and a member of our Business Development team will be in touch with you shortly.
