
Exploring CDISC Analysis Data Model (ADaM) Datasets

ADaM Datasets

Clinical Data Interchange Standards Consortium (CDISC) defines and manages industry-level data standards that are widely used during the analysis, reporting and regulatory submission of clinical data. For instance, the Study Data Tabulation Model (SDTM) is the submission data standard into which raw study data are mapped and collated. ADaM is a companion standard for analysis data, and it is best practice to use SDTM data as the source for these datasets. Doing so allows any data processing to be documented easily with Define-XML, the CDISC standard for data definition files.

Being able to trace the flow from source values to derived ones is a clear intention of the ADaM standard, and this applies both to the structure of the datasets and to the required linkage to machine-readable metadata. It is also crucial that data are made analysis-ready, so that tables, listings and figures can be produced with currently available tools with little or no further data manipulation.

While SDTM domain classes are determined according to data type such as interventions, events or findings, their ADaM equivalents are classified by analysis approach. Of the main data structures, one is best suited to the needs of analysis of continuous data values while another supports categorical analyses. There also is a subject-level analysis dataset that needs to be created for every study where ADaM is used.

All ADaM datasets are named ADxxxx, where xxxx is sponsor-defined and often carries over the name of the source SDTM domain. For example, an ADaM domain called ADLB would use the LB SDTM domain as its data source. This one-to-one domain mapping is not mandatory though and the required number of ADaM domains depends on the needs of any study data analysis or data review. An ADaM domain may use more than one SDTM domain as its source and carry a unique name that reflects this.

For ADaM variables, the naming conventions should follow the standardized variable names defined in the ADaM Implementation Guide. Any variables copied directly from SDTM data into an ADaM domain must be carried over unchanged, with no change made either to their attributes (name, label, type, length, etc.) or to their contents. Sponsor-defined variable names can be given to any other analysis variable that is not defined within the ADaM or SDTM standards. Following these conventions provides clarity for the reviewer.

The ADaM subject-level analysis dataset is called ADSL and contains one record per subject, with variables that hold key information on subject disposition, demographics and baseline characteristics. Other ADSL variables contain planned or actual treatment group information as well as key dates and times of the subject's participation in the study. Not all variables within ADSL may be used directly for analysis; some are used in conjunction with other datasets for display or grouping purposes, or are included simply as variables of interest for review. Given that the intention of ADSL is to describe the subjects, the analysis populations and treatment groups to which they belong, and their prognostic factors, subject-level efficacy information should not be added here but placed in another domain. Variables from ADSL may be added to other ADaM domains where doing so aids output creation or data review.
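To make the structure concrete, below is a minimal sketch, using Python/pandas, of what an ADSL-style dataset with one record per subject might look like. The variable names follow common ADaM conventions, but the data values, the two-subject example and the use of pandas are purely illustrative.

```python
import pandas as pd

# Minimal illustrative ADSL-style dataset: one record per subject.
# Variable names follow ADaM conventions (STUDYID, USUBJID, TRT01P/TRT01A,
# SAFFL, TRTSDT); values are invented for illustration only.
adsl = pd.DataFrame({
    "STUDYID": ["XYZ-001", "XYZ-001"],
    "USUBJID": ["XYZ-001-0001", "XYZ-001-0002"],
    "AGE":     [54, 61],
    "SEX":     ["F", "M"],
    "TRT01P":  ["Drug A", "Placebo"],   # planned treatment for period 1
    "TRT01A":  ["Drug A", "Placebo"],   # actual treatment for period 1
    "SAFFL":   ["Y", "Y"],              # safety population flag
    "TRTSDT":  pd.to_datetime(["2023-01-10", "2023-01-12"]),  # treatment start date
})

# Variables such as TRT01A and SAFFL are typically merged onto other ADaM
# datasets to support treatment grouping and population subsetting in outputs.
print(adsl)
```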

Another main class of ADaM datasets is the Basic Data Structure (BDS), which contains one or more records per subject, per analysis parameter and per analysis timepoint. It is possible to add derived analysis parameters if required for an analysis, for example where a derivation uses results from a number of different parameters, or where a mean is calculated at subject level from all the values collected for a subject. Derived records also may be added to support Last Observation Carried Forward (LOCF) or Worst Observation Carried Forward (WOCF) analyses.
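As a sketch of how a derived record might be added, the Python/pandas snippet below creates an LOCF record for a missing visit and flags it via DTYPE so that observed and imputed rows remain distinguishable. PARAMCD, AVISIT, AVAL and DTYPE are standard BDS variable names; the subject, parameter and values are invented, and the exact imputation rules for a real study would come from the SAP.

```python
import pandas as pd

# Illustrative BDS-style records for one subject/parameter (invented values);
# the Week 8 assessment was not collected.
adlb = pd.DataFrame({
    "USUBJID": ["XYZ-001-0001"] * 2,
    "PARAMCD": ["ALT"] * 2,
    "AVISITN": [1, 2],
    "AVISIT":  ["Week 2", "Week 4"],
    "AVAL":    [32.0, 41.0],
    "DTYPE":   [None, None],
})

# Add a derived LOCF record for the missing Week 8 visit: copy the last
# observed value forward and flag the new row with DTYPE = "LOCF".
last = adlb.sort_values("AVISITN").groupby(["USUBJID", "PARAMCD"]).tail(1).copy()
last["AVISITN"] = 3
last["AVISIT"] = "Week 8"
last["DTYPE"] = "LOCF"
adlb = pd.concat([adlb, last], ignore_index=True)

print(adlb.sort_values(["USUBJID", "PARAMCD", "AVISITN"]))
```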

The BDS is especially useful for analyses of continuous values, such as presenting means, medians, standard deviations and so on. This is not its only use, but to comply with the BDS standard a domain must at the very least contain variables for the study and subject identifiers, the analysis parameter name and code, and the analysis values. If any of these are absent, the dataset does not fit the BDS description.
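A very simple structural check of this minimum content might look like the sketch below. The variable list (STUDYID, USUBJID, PARAM, PARAMCD plus AVAL or AVALC) reflects the usual BDS identifiers and analysis values, but the exact set required for a given study should be confirmed against the ADaM Implementation Guide.

```python
import pandas as pd

# Minimal sketch of a structural check for the core BDS variables; confirm the
# exact requirements for a given study against the ADaM Implementation Guide.
REQUIRED_BDS_VARS = {"STUDYID", "USUBJID", "PARAM", "PARAMCD"}

def missing_bds_core(df: pd.DataFrame) -> set:
    """Return the core BDS variables that are absent from a dataset."""
    missing = REQUIRED_BDS_VARS - set(df.columns)
    # At least one analysis value variable (numeric AVAL or character AVALC) is expected.
    if not {"AVAL", "AVALC"} & set(df.columns):
        missing.add("AVAL/AVALC")
    return missing

example = pd.DataFrame(columns=["STUDYID", "USUBJID", "PARAMCD", "AVAL"])
print(missing_bds_core(example))   # {'PARAM'} for this deliberately incomplete example
```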

A variant of the BDS is available for Time-to-Event (TTE) analyses, which are commonly used in therapeutic areas such as oncology. This additionally contains variables for the origin date used as the start time in any TTE analysis, and for censoring of subjects in whom the events of interest are not observed.
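A minimal sketch of such TTE records, using Python/pandas with invented values, is shown below. STARTDT, ADT, AVAL and CNSR are standard variables in the TTE structure, where CNSR = 0 marks an observed event and a non-zero value marks a censored subject; the "+1 day" convention shown is only one common choice and would be defined in the SAP.

```python
import pandas as pd

# Illustrative time-to-event records (invented values). STARTDT is the
# time-to-event origin date (e.g. randomisation), ADT the event/censoring date,
# AVAL the derived time in days and CNSR the censoring flag
# (0 = event observed, 1 = censored because the event was not observed).
adtte = pd.DataFrame({
    "USUBJID": ["XYZ-001-0001", "XYZ-001-0002"],
    "PARAMCD": ["OS", "OS"],
    "PARAM":   ["Overall Survival (days)", "Overall Survival (days)"],
    "STARTDT": pd.to_datetime(["2023-01-10", "2023-01-12"]),
    "ADT":     pd.to_datetime(["2023-06-30", "2023-09-01"]),
    "CNSR":    [0, 1],   # first subject had the event; second was censored at last contact
})

# Derive the analysis value as days from the origin date to the event or
# censoring date (+1 by a common convention; confirm against the SAP).
adtte["AVAL"] = (adtte["ADT"] - adtte["STARTDT"]).dt.days + 1

print(adtte)
```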

In February 2016, CDISC published the Occurrence Data Structure (OccDS) for use in categorical analyses where summaries of frequencies and percentages of occurrence are planned. This is an extension of the previously published ADAE structure, with extra variables for use with concomitant medication or medical history data. Data from other SDTM domains in the events or interventions classes may be mapped into OccDS where this fulfils the analysis needs. Some data, such as exposure data, may be mapped to either BDS or OccDS depending on the analysis, and may even be split into two ADaM domains in studies where both categorical and continuous analyses are required.
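As a sketch of the kind of categorical analysis an occurrence dataset supports, the snippet below counts the subjects reporting each preferred term and expresses them as a percentage of treated subjects. AEDECOD and TRTA are standard occurrence variable names, while the records and the denominators are invented for illustration.

```python
import pandas as pd

# Illustrative occurrence-style adverse event records (invented values).
adae = pd.DataFrame({
    "USUBJID": ["001", "001", "002", "003", "003"],
    "TRTA":    ["Drug A", "Drug A", "Drug A", "Placebo", "Placebo"],
    "AEDECOD": ["Headache", "Nausea", "Headache", "Headache", "Fatigue"],
})

# Denominators: number of treated subjects per arm (invented for illustration).
n_treated = {"Drug A": 2, "Placebo": 1}

# Count subjects (not events) with each preferred term, then convert to percentages.
counts = (adae.groupby(["TRTA", "AEDECOD"])["USUBJID"]
              .nunique()
              .rename("n")
              .reset_index())
counts["pct"] = counts.apply(lambda r: 100 * r["n"] / n_treated[r["TRTA"]], axis=1)

print(counts)
```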

Currently, ADaM supports the majority of analysis needs for clinical data. It may not be as prescriptive as SDTM, but it offers flexibility while still allowing a sponsor to put a consistent set of analysis data standards in place. ADaM datasets can also be submitted to a regulatory agency much like SDTM; they have in-built traceability and are compatible with Define-XML, so that machine-readable data definitions can be supplied along with any detailed computational details.

 

 

Case Study: ISS and ISE Submissions Using CDISC ADaM Datasets

Using the ADaM standards to facilitate Integrated Summary of Safety (ISS) and Integrated Summary of Efficacy (ISE) submissions ensures that the data presented to regulatory bodies are consistent, traceable and of high integrity. This case study outlines best practices and considers some challenges that may need to be overcome for the successful integration of ADaM datasets into ISS and ISE submissions.

In the integration project Quanticate performed, 24 early phase (Phase 1 and Phase 2) studies were reported, containing more than 800 subjects across all studies. Subjects were treated with different medications alongside the medication of interest at different dosing levels, and the studies ranged from one to six periods. The summaries were based on whether a subject received a single or multiple dose of the medication, with or without concomitant medications, or placebo, and on the treatment groups to which subjects were randomized. Subjects who never received the medication of interest were excluded from the summaries, but not from the SDTM/ADaM data.

The sections below explain the planning and handling of study integration in conformance with the CDISC ADaM guidelines.



Pre-Planning and Data Harmonization

Data harmonization is an important step in preparing for ADaM data integration. Standardizing data elements across studies involves aligning demographics, treatment regimens, endpoint measurements and other relevant data to ensure consistency. Applying consistent CDISC standards across all datasets is vital for maintaining regulatory compliance and ensuring the reliability of the integrated analysis.

Understanding the study designs of all the studies to be integrated was key to developing the specifications for the datasets, as was identifying at which stage the medication of interest was administered. Planning any such integration in advance is helpful: we need to understand the complexity of the submission and keep details to hand, such as the number of studies to be integrated, the exclusions to be applied in the datasets, and the endpoints of the submission. Parallel and crossover studies can be integrated, but care must be taken when grouping doses, and the period definitions must be clarified. A list of variables was created following the CDISC ADaM naming convention, prefixed with "IS"; example variables are ISPERIOD, ISSTDT, ISENDT, ISS01A and ISS02A.

The table below shows how this information was included in or excluded from the summaries without losing any of the detail.

STUDYID | USUBJID | TRT01A  | TRT02A | ISS01A  | ISS02A
111     | 111.001 | X       | A      | A       |
111     | 111.002 | A       | X      | A       |
111     | 111.003 | A       | B      | A       |
111     | 111.004 | A       | AB     | A       | AB
112     | 111.001 | PLACEBO |        | PLACEBO |

 

In the table above, Treatment A is the medication of interest, so the TRTxxA variables hold the treatment information matching the source studies and the ISxxA variables are used for integration. Study 111 is a crossover study with treatments A, B, X and AB. For the IS periods and treatments, only the periods in which Treatment A was received are referenced, ignoring Treatment X and Treatment B. In some studies, subjects underwent one to two weeks of pre-treatment with a different medication; in such cases all data from those weeks were also excluded from the summaries, and a subject who withdrew during that period was not counted.
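A minimal sketch of this re-mapping, assuming the subject-level layout shown in the table above and using Python/pandas, might look like the following. The rule applied (keep only periods in which the medication of interest or placebo was received and renumber them as IS periods) reflects the description above, while the helper function and the KEEP set are invented for illustration.

```python
import pandas as pd

# Source-study treatment assignments per period (invented values, as in the table above).
adsl = pd.DataFrame({
    "STUDYID": ["111", "111", "111", "111", "112"],
    "USUBJID": ["111.001", "111.002", "111.003", "111.004", "111.001"],
    "TRT01A":  ["X", "A", "A", "A", "PLACEBO"],
    "TRT02A":  ["A", "X", "B", "AB", None],
})

KEEP = {"A", "AB", "PLACEBO"}   # treatments that count towards the integrated periods

def derive_is_treatments(row, n_periods=2):
    """Renumber periods so that ISxxA holds only the treatments of interest, in order."""
    kept = [row[f"TRT{i:02d}A"] for i in range(1, n_periods + 1)
            if row[f"TRT{i:02d}A"] in KEEP]
    return pd.Series({f"ISS{i:02d}A": (kept[i - 1] if i <= len(kept) else None)
                      for i in range(1, n_periods + 1)})

adsl = adsl.join(adsl.apply(derive_is_treatments, axis=1))
print(adsl)
```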

 

Examples of Traceability Challenges

Traceability is an important part of any regulatory submission. Every step from the source data to the final analysis-ready datasets must be transparent and well documented. This ensures that any data transformations or derivations are easily understandable and justifiable to regulatory reviewers. Effective documentation, including comprehensive metadata and annotated CRFs, supports this traceability. The Analysis Data Reviewer's Guide (ADRG) is an excellent resource for maintaining such detailed documentation.

 

Dictionary Versions

During an integration it is often found that some studies were completed using different versions of the MedDRA and WHO Drug dictionaries. Following FDA requirements, all coded terms need to be up-versioned using the same coding dictionaries, whilst maintaining the original coding for traceability back to the original study terms. This is challenging when integrating 20+ studies, as all terms need to be checked and compared back to the individual studies.
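A simple sketch of such a comparison, assuming each study delivers its original coded term alongside the re-coded term in the current dictionary version, is shown below. AEDECOD is the standard preferred term variable, while AEDECOD_ORIG is a hypothetical variable name used here for the study-level coding retained for traceability; the data are invented.

```python
import pandas as pd

# Invented example: adverse event terms coded at study level (older MedDRA version)
# alongside the re-coded terms in the current dictionary version used for integration.
# AEDECOD_ORIG is a hypothetical variable retained for traceability.
terms = pd.DataFrame({
    "STUDYID":      ["101", "101", "102"],
    "AETERM":       ["HEADACHE", "FEELING SICK", "HEAD ACHE"],
    "AEDECOD_ORIG": ["Headache NOS", "Nausea", "Headache"],
    "AEDECOD":      ["Headache", "Nausea", "Headache"],
})

# List every term whose preferred term changed between dictionary versions so the
# change can be reviewed and documented (e.g. in the ADRG).
changed = terms.loc[terms["AEDECOD"] != terms["AEDECOD_ORIG"],
                    ["STUDYID", "AETERM", "AEDECOD_ORIG", "AEDECOD"]]
print(changed)
```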

 

Population Definitions

Populations created for the individual study reporting may not support the integrated summaries or analyses required for an ISS or ISE, and as such the safety population definition was re-defined in relation to the administration of the medication of interest. This may or may not give the same number of safety subjects as the individual studies. The original study population flags were retained in the combined ADaM datasets to maintain traceability back to the original studies, and any new population flags were created using an IS prefix. To support traceability, all derivations must be documented in the metadata and in the ADRG. Any records excluded from a source study were also documented in the ADRG with a proper explanation. As part of our quality checks, an additional document was created recording discrepancies between the source studies and the integrated data, to make sure no subject was excluded in error and all subjects were reported as expected.
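As a sketch under the assumptions above, an integrated safety flag (here called ISSAFFL, a hypothetical IS-prefixed name) could be derived from the date of first administration of the medication of interest, while the original study flag SAFFL is carried through unchanged for traceability. ISTRTSDT is likewise a hypothetical variable introduced for illustration.

```python
import pandas as pd

# Invented subject-level data: the original study safety flag (SAFFL) is retained,
# and ISTRTSDT is a hypothetical date of first dose of the medication of interest.
adsl = pd.DataFrame({
    "USUBJID":  ["101.001", "101.002", "102.001"],
    "SAFFL":    ["Y", "Y", "Y"],                       # original study-level flag
    "ISTRTSDT": pd.to_datetime(["2021-03-01", None, "2022-05-10"]),
})

# Integrated safety population: subjects who received at least one dose of the
# medication of interest. ISSAFFL is a hypothetical IS-prefixed flag name.
adsl["ISSAFFL"] = adsl["ISTRTSDT"].notna().map({True: "Y", False: "N"})

print(adsl)
```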

 

 

Statistical Analysis Plan: Changing the Definition of Baseline

Within the ISS or ISE, a Statistical Analysis Plan (SAP) will define the rules for data integration and specify the criteria for inclusion, data imputation methods and definitions of analysis populations. The baseline definition differed across the studies, so for the integration the baseline was redefined consistently as the last assessment prior to the administration of the medication of interest or placebo. Again, to retain traceability back to the original studies, the original baseline was kept. A new derived baseline was set up in line with the requirements, with the BASETYP variable in ADaM used to distinguish the original baseline from the source study from the updated baseline rows used for integration.
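A simplified sketch of this idea, assuming a BDS-style dataset with assessment dates and a hypothetical ISTRTSDT holding the date of first administration of the medication of interest, is given below. In practice records are often duplicated per baseline definition with BASETYP populated across all related rows; here only the baseline rows themselves are labelled, to keep the example short.

```python
import pandas as pd

# Invented BDS-style records for one subject/parameter, with the original study
# baseline flag (ABLFL) already set, plus a hypothetical ISTRTSDT holding the date
# of first dose of the medication of interest.
bds = pd.DataFrame({
    "USUBJID":  ["101.001"] * 3,
    "PARAMCD":  ["SYSBP"] * 3,
    "ADT":      pd.to_datetime(["2021-02-20", "2021-02-28", "2021-03-05"]),
    "AVAL":     [128.0, 131.0, 125.0],
    "ABLFL":    ["Y", "", ""],          # baseline as defined in the source study
    "ISTRTSDT": pd.to_datetime(["2021-03-01"] * 3),
})

# Label the original study baseline, then identify the integrated baseline as the
# last assessment prior to first administration of the medication of interest.
bds["BASETYP"] = ""
bds.loc[bds["ABLFL"] == "Y", "BASETYP"] = "STUDY BASELINE"

pre_dose = bds[bds["ADT"] < bds["ISTRTSDT"]]
is_base_idx = pre_dose.sort_values("ADT").groupby(["USUBJID", "PARAMCD"]).tail(1).index
bds.loc[is_base_idx, "BASETYP"] = "INTEGRATED BASELINE"

print(bds)
```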

 

Rigorous Quality Control

Quality assurance is vital throughout the dataset preparation and integration process. Utilizing validation tools such as Pinnacle 21 can help verify that datasets comply with regulatory and CDISC standards, highlighting areas that may require attention before submission. Quanticate's quality control process identified a few instances where the source study summary results and the integrated summary results differed; these were documented, with an explanation of the root cause of each discrepancy.

 

Use Technology and Collaboration to Enhance Efficiency

As a specialist Contract Research Organization (CRO), Quanticate can leverage advanced statistical programming tools to automate data integration and analysis for ISS and ISE submissions, including automated SDTM creation. Establishing integrated project teams across biostatistics, clinical data management and medical writing aligned project goals and ensured efficient workflows. Project management and communication tools such as Microsoft Project and Teams facilitate seamless collaboration and real-time updates across geographically dispersed teams, and continuous training on regulatory guidelines and technological advancements enhances our teams' competency and operational efficiency.

 

 

 

References:

  • Analysis Data Model, CDISC Analysis Data Model Team
  • Analysis Data Model Implementation Guide, CDISC Analysis Data Model Team
  • The ADaM Basic Data Structure for Time-to-Event Analyses, CDISC Analysis Data Model Team
  • ADaM Structure for Occurrence Data, CDISC Analysis Data Model Team

Author's note: This blog was originally published on 08/10/2012 and has since been updated.
