QCast Episode 28: Real World Data Analysis in Clinical Trials

Written by Marketing Quanticate | Jan 9, 2026 9:00:00 AM

In this QCast episode, co-hosts Jullia and Tom explore how real world data is analysed to complement clinical trials. They clarify the difference between real world data and real world evidence, explain where these datasets add value across feasibility, development, and longer-term follow-up, and discuss the practical steps required to turn routine healthcare records into analysis-ready evidence. The conversation also covers common data pitfalls, how to handle ambiguity in a defensible way, and what governance and regulatory expectations mean in day-to-day delivery.

🎧 Listen to the Episode:

Key Takeaways

What Real World Data Is and Why It Matters in Trials
Real world data is collected in routine care, and real world evidence is the insight produced when it is analysed for a defined question. Used alongside trials, it adds scale, longer follow-up, and everyday-care context that trials often cannot capture.

How Real World Data Analysis Works in Practice
Most of the work is in building defensible cohorts and definitions from fragmented records across sources such as electronic health records, claims, and registries. Clear rules for exposure, outcomes, and visit definitions, plus systematic data quality checks, are essential to produce repeatable results.

Limitations, Governance, and Best Practices
Because analyses are observational, teams must manage confounding, bias, and representativeness, and be careful not to over-claim findings. Strong governance, documentation, and sensitivity analyses help maintain trust and meet regulatory expectations.

Full Transcript

Jullia
Welcome to QCast, the show where biometric expertise meets data-driven dialogue. I’m Jullia.

Tom
I’m Tom, and in each episode, we dive into the methodologies, case studies, regulatory shifts, and industry trends shaping modern drug development.

Jullia
Whether you’re in biotech, pharma or life sciences, we’re here to bring you practical insights straight from a leading biometrics CRO. Let’s get started.

Tom
Today we’re going to be discussing real world data analysis in the context of clinical trials, but what does that actually mean? Real world data and real world evidence are used almost interchangeably, and that can confuse teams. Can you set out the difference, and why sponsors are putting more weight on it now, alongside the usual controlled trial data?

Jullia
So, real world data, or RWD, is information collected about patients and healthcare in routine settings, outside the strict visit schedules and intervention rules of a trial. Real world evidence, or RWE, is what you get when you analyse that RWD to answer a specific question. The reason it has become so central is scale and time. Trials are designed to be clean and focused, but they usually observe people for a limited window. Real world data can extend the picture over years, across broader populations, and under everyday patterns of care. That makes it useful for questions like who gets treated in practice, what happens after a trial ends, and whether outcomes in routine care match what we saw under controlled conditions.

Tom
Now, what are the main sources of real world data that teams rely on, and how different are they from an analysis point of view? And while we’re there, where does this work tend to sit alongside a trial, before it starts, during delivery, and after it finishes?

Jullia
So the big sources are electronic health records, often shortened to EHRs, insurance claims, disease registries, lab test results, and patient-generated data from wearables or surveys. Each comes with trade-offs. Claims data is created for billing, so it’s often structured and consistent, but it can be limited in clinical detail and may only cover the period someone is on a particular plan. EHR data can be rich and longitudinal, but it can also be messy because it’s captured across many providers and systems, with variation in how events are recorded. Registries can be excellent in rare diseases because they’re focused and long term, but they may take time to build enough volume and can miss information recorded outside the registry. In practice, teams often combine sources to balance strengths and weaknesses, but that raises linkage, harmonisation, and governance needs.

Alongside a trial, real world data is typically complementary. Before a trial, it can help establish target populations, understand care pathways, and support feasibility by showing where eligible patients are and how they’re treated. That can speed up recruitment planning and reduce avoidable protocol changes. During development, it can highlight an efficacy-effectiveness gap, where a drug looks strong in a trial but performs differently in routine care because of missed doses, interactions, or broader case mix. It can also support label expansion planning when real world outcomes suggest a population worth testing. After a trial, real world data can extend safety and outcome monitoring. A trial might last a year, but routine data can cover 10 to 15 years and a wider population, which is where rarer events and longer-term patterns can emerge.

Tom
Now let’s shift to the “analysis in the real world” problem. When teams actually start programming and building cohorts, what tends to be harder than expected? What are the recurring data issues that can trip up even experienced trial programmers?

Jullia
Well, one of the recurring challenges is that the patient record is fragmented. You’re often pulling diagnoses, procedures, prescriptions, and visits from multiple tables and then stitching them into a coherent timeline. That can surface obvious data issues, like implausible dates or invalid codes, and those are usually solvable with standard cleaning and checks. The harder problems sit in the grey zone, where an event could be a data artefact or a real clinical scenario. Prescriptions are a good example. Patients can have overlapping supplies, refill early, lose medication, or change dose, and you need a defensible algorithm for exposure episodes and gaps. Visit counting is another. For healthcare resource utilisation, many databases do not neatly group records into a single visit, so you have to agree a visit definition with statisticians and document it. Even coding systems vary, so you must confirm whether the database uses ICD-10, National Drug Code, or other schemes, and how formatting is handled.
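
To make this concrete, here is a minimal sketch of an exposure-episode rule of the kind Jullia describes, in Python with pandas. The column names, the toy data, and the 30-day allowable gap are illustrative assumptions rather than standards; a real project would pre-specify and document these choices with its statisticians.

    import pandas as pd

    # Hypothetical prescription fills; the column names, toy data, and
    # 30-day allowable gap are illustrative assumptions, not a standard.
    fills = pd.DataFrame({
        "patient_id": [1, 1, 1, 2],
        "fill_date": pd.to_datetime(
            ["2021-01-01", "2021-02-05", "2021-06-01", "2021-03-10"]),
        "days_supply": [30, 30, 30, 90],
    })

    ALLOWABLE_GAP_DAYS = 30  # pre-specified rule: gaps up to 30 days join episodes

    def build_episodes(df: pd.DataFrame) -> pd.DataFrame:
        """Collapse individual fills into continuous exposure episodes."""
        df = df.sort_values(["patient_id", "fill_date"]).copy()
        df["supply_end"] = df["fill_date"] + pd.to_timedelta(df["days_supply"], unit="D")
        episodes = []
        for pid, grp in df.groupby("patient_id"):
            start = end = None
            for row in grp.itertuples(index=False):
                if start is None:
                    start, end = row.fill_date, row.supply_end
                elif (row.fill_date - end).days <= ALLOWABLE_GAP_DAYS:
                    # An overlapping supply or early refill extends the episode.
                    end = max(end, row.supply_end)
                else:
                    episodes.append((pid, start, end))
                    start, end = row.fill_date, row.supply_end
            episodes.append((pid, start, end))
        return pd.DataFrame(
            episodes, columns=["patient_id", "episode_start", "episode_end"])

    print(build_episodes(fills))

One design choice worth noting: in this sketch an overlapping supply or early refill extends the current episode rather than opening a new one, which is one defensible convention among several, and exactly the kind of rule that should be written down before programming starts.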

Tom
That grey zone is where analysis decisions start affecting conclusions. How do teams handle those ambiguous situations without making the analysis look arbitrary? And when clinician input is needed, how do you structure that collaboration so it stays audit-friendly?

Jullia
So the key for that is pre-specification and consistency. You define the cohort, exposures, endpoints, and key algorithms up front, including how you’ll handle edge cases, then you apply them systematically. Where judgement is unavoidable, you make it explicit and keep it traceable. Clinician input helps interpret what the medical story could plausibly be, and it can guide whether a rule should be conservative or inclusive. For example, with non-billable codes in claims, you may treat them as invalid in general, but you might decide to retain them in a rare disease context if excluding them would erase genuine patients. The collaboration works best when you turn discussions into documented rules, with examples, and then test those rules on real extracts. That makes quality checks faster, and it reduces the risk of a different analyst making a different choice later.
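
One lightweight way to keep such judgement calls traceable is to encode each agreed rule as a small named function with its rationale attached, plus worked examples that double as tests. Everything in the sketch below, including the rule identifier and the rationale text, is a hypothetical illustration rather than a real coding standard.

    # A hypothetical, documented inclusion rule for diagnosis codes. The
    # rationale mirrors what was agreed with clinicians, so a later
    # analyst can see why the rule is conservative or inclusive.
    RULE_ID = "DX-007"  # illustrative identifier, not a real standard
    RATIONALE = (
        "Non-billable codes are treated as invalid by default, but are "
        "retained in rare-disease cohorts where excluding them would "
        "erase genuine patients."
    )

    def keep_diagnosis(billable: bool, rare_disease_cohort: bool) -> bool:
        """Apply rule DX-007: drop non-billable codes except in rare-disease cohorts."""
        if billable:
            return True
        return rare_disease_cohort  # the explicit, pre-specified exception

    # Worked examples double as lightweight tests of the documented rule.
    assert keep_diagnosis(billable=True, rare_disease_cohort=False)
    assert not keep_diagnosis(billable=False, rare_disease_cohort=False)
    assert keep_diagnosis(billable=False, rare_disease_cohort=True)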

Tom
Now let’s move on to limitations. What are the main risks with real world data, from bias and confounding through to privacy and regulatory expectations? And to close, what are the quick wins and common pitfalls you want teams to watch for when they start this work?

Jullia
So as you’d imagine, there are a few core risks. First, lack of standardisation and variable data quality can create missingness, misclassification, and mismatches across sources. Second, representativeness isn’t guaranteed. Claims data can under-represent people without stable coverage, and wearable data can skew towards groups more likely to use devices. Third, confounding is a constant challenge because routine care is not randomised, and outcomes can be driven by factors you cannot fully observe. The practical response is to design studies carefully, choose comparators thoughtfully, and run sensitivity analyses so you understand how robust the conclusions are to reasonable alternative assumptions.
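
As a concrete illustration of that last point, one simple form of sensitivity analysis re-runs the identical pipeline under pre-specified alternative assumptions and compares the estimates. In the sketch below, run_analysis is a hypothetical stand-in for a project’s real estimation code, and the toy cohort and gap thresholds are purely illustrative.

    # Minimal sensitivity-analysis loop: re-run the identical pipeline
    # while varying one pre-specified assumption at a time.
    def run_analysis(cohort: list, allowable_gap_days: int) -> float:
        # Hypothetical stand-in: a real version would rebuild exposure
        # episodes under this gap rule and re-estimate the outcome.
        exposed = [p for p in cohort if p["gap_days"] <= allowable_gap_days]
        return sum(p["outcome"] for p in exposed) / max(len(exposed), 1)

    # Toy cohort purely for illustration.
    cohort = [
        {"gap_days": 10, "outcome": 1},
        {"gap_days": 25, "outcome": 0},
        {"gap_days": 50, "outcome": 1},
    ]

    for gap in (15, 30, 45, 60):  # pre-specified alternatives, not post-hoc picks
        print(f"gap={gap:>2} days -> estimate={run_analysis(cohort, gap):.2f}")

If the estimate moves materially across plausible rules, that sensitivity belongs in the main report rather than a footnote.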

On privacy, even de-identified health data remains sensitive, and governance needs to address re-identification risk, access controls, and security. From a regulatory perspective, current guidance expects that real world evidence is built on credible data, clear provenance, and transparent methods. Regulators can be open to real world evidence when the study is well designed and limitations are handled honestly, but teams should avoid presenting observational results as if they were randomised proof.

Moving on to quick wins, start with the decision you are trying to support, because simply saying, “use real world data” isn’t a plan. Define the study question, the intended use, and what would change if the result goes one way or the other. Choose data sources that are fit for purpose and be clear about what each source can and cannot tell you. Lock down cohort definitions, exposure rules, and endpoint definitions early, and document them in plain language that both clinicians and analysts can follow. Build in data quality checks as you go, not at the end, and expect to iterate on linkage and harmonisation. Where possible, align to common data models, because it helps with repeatability across databases. The common pitfalls are treating real world data like trial data, ignoring confounding, and leaving key assumptions undocumented.
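
To illustrate what building in data quality checks as you go can look like, the sketch below flags implausible dates and malformed codes on a toy extract. The column names and the simplified ICD-10-style pattern are assumptions for this example; a real project would agree and document its own checks.

    import pandas as pd

    # Simplified, illustrative checks run on every extract rather than
    # once at the end. The column names and the ICD-10-style pattern
    # are assumptions for this example.
    ICD10_PATTERN = r"^[A-Z]\d{2}(?:\.\d{1,4})?$"

    def quality_report(df: pd.DataFrame) -> dict:
        today = pd.Timestamp.today()
        return {
            "rows": len(df),
            "future_dates": int((df["event_date"] > today).sum()),
            "events_before_birth": int((df["event_date"] < df["birth_date"]).sum()),
            "malformed_codes": int((~df["dx_code"].str.match(ICD10_PATTERN)).sum()),
        }

    records = pd.DataFrame({
        "birth_date": pd.to_datetime(["1960-05-01", "1990-07-15"]),
        "event_date": pd.to_datetime(["1959-12-31", "2091-01-01"]),  # both implausible
        "dx_code": ["E11.9", "not-a-code"],
    })
    print(quality_report(records))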

Jullia
With that, we’ve come to the end of today’s episode on real world data analysis in clinical trials. If you found this discussion useful, don’t forget to subscribe to QCast so you never miss an episode, and share it with a colleague. And if you’d like to learn more about how Quanticate supports data-driven solutions in clinical trials, head to our website or get in touch.

Tom
Thanks for tuning in, and we’ll see you in the next episode.

About QCast

QCast by Quanticate is the podcast for biotech, pharma, and life science leaders looking to deepen their understanding of biometrics and modern drug development. Join co-hosts Tom and Jullia as they explore methodologies, case studies, regulatory shifts, and industry trends shaping the future of clinical research. Where biometric expertise meets data-driven dialogue, QCast delivers practical insights and thought leadership to inform your next breakthrough.

Subscribe to QCast on Apple Podcasts or Spotify to never miss an episode.