
QCast Episode 36: ACR Response in Rheumatoid Arthritis Clinical Trials

By Marketing Quanticate
March 6, 2026


In this QCast episode, co-hosts Jullia and Tom explore ACR response criteria in rheumatoid arthritis clinical trials and why these endpoints demand more delivery discipline than their simple responder labels suggest. They clarify what an ACR response means in day-to-day trial work: a structured improvement definition built from joint counts, patient- and physician-reported assessments, disability measures, and acute phase reactants, combined into a single responder outcome at a specified visit. The conversation focuses on where teams are most exposed operationally, including assessment consistency, visit window control, and the knock-on effects of missing or late component data when you need to derive ACR20 and its higher thresholds.

They also discuss what changes when teams look beyond a single time point. Landmark ACR20 responder rates remain useful because they are easy to communicate and analyse, but rheumatoid arthritis response evolves over time, and two treatments can present similar responder proportions at a headline visit while behaving differently earlier on. For studies where timing and similarity both matter, including equivalence contexts, teams may use repeated assessments more deliberately and consider longitudinal summaries that reflect the full response profile. Along the way, Jullia and Tom highlight common misconceptions and failure modes, such as treating ACR as a single field, building derivations without reflecting the hierarchy across ACR20, ACR50, and ACR70, or leaving interpretation plans vague when curve-focused methods are introduced.

🎧 Listen to the Episode:

Key Takeaways

What ACR Response Criteria Are and Why They Matter
ACR response criteria are a standardised way to define clinically meaningful improvement in rheumatoid arthritis by comparing baseline to a later visit. ACR20 requires at least a 20 per cent improvement in both tender and swollen joint counts and at least a 20 per cent improvement in three of five additional measures, with ACR50 and ACR70 applying the same structure at higher thresholds. Although reported as a binary responder outcome, ACR behaves like a composite endpoint in delivery because classification depends on multiple assessments being present, consistent, and in-window. Treating ACR as endpoint-driving data early reduces missingness, avoids unclassifiable responses, and strengthens the interpretability of treatment comparisons.
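To make the composite structure concrete, here is a minimal sketch of an ACR responder derivation in Python. The component names, the dataclass layout, and the zero-baseline convention are illustrative assumptions for this post, not a study derivation specification; a real trial would derive these from CDISC-standard datasets under rules pre-specified in the statistical analysis plan.

```python
from dataclasses import dataclass

@dataclass
class AcrComponents:
    """One visit's worth of ACR component data (illustrative field names)."""
    tender_joints: float      # tender joint count
    swollen_joints: float     # swollen joint count
    patient_global: float     # patient global assessment of disease activity
    physician_global: float   # physician global assessment
    pain: float               # patient pain scale
    haq_di: float             # HAQ Disability Index
    acute_phase: float        # acute phase reactant (ESR or CRP)

def pct_improvement(baseline: float, visit: float) -> float:
    """Percent improvement from baseline (positive means better)."""
    if baseline == 0:
        return 0.0  # the zero-baseline convention varies; pre-specify it in the SAP
    return 100.0 * (baseline - visit) / baseline

def meets_acr(baseline: AcrComponents, visit: AcrComponents, threshold: float) -> bool:
    """ACRxx: >= threshold% improvement in BOTH joint counts, plus
    >= threshold% improvement in at least 3 of the 5 other measures."""
    joints_ok = (
        pct_improvement(baseline.tender_joints, visit.tender_joints) >= threshold
        and pct_improvement(baseline.swollen_joints, visit.swollen_joints) >= threshold
    )
    other_fields = ("patient_global", "physician_global", "pain", "haq_di", "acute_phase")
    others = [pct_improvement(getattr(baseline, f), getattr(visit, f)) for f in other_fields]
    return joints_ok and sum(x >= threshold for x in others) >= 3
```

Note how a single missing component makes the whole classification impossible, which is exactly the operational dependency discussed above: in practice the derivation also needs pre-specified handling for missing or out-of-window values before any responder flag can be set.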

How ACR Endpoint Risk Shows Up in Practice
Risk often appears first as operational drift rather than statistical complexity. Inconsistent tender and swollen joint counts, late or missing acute phase reactants, and small visit window deviations can prevent clean derivation and trigger downstream sensitivity work. Patient- and physician-reported assessments add variability if sites apply instruments inconsistently or capture them out of sequence. Teams also create avoidable confusion if derivations and reporting do not reflect the ACR hierarchy, where ACR70 implies ACR50 and ACR20. Prioritising queries that affect joint counts, disability measures, and lab timing helps protect endpoint completeness and reduces late-cycle rework.

Longitudinal Use, Interpretation, and Practical Best Practices
Landmark ACR20 responder rates remain useful because they are easy to communicate and analyse, but rheumatoid arthritis response evolves over time and the response profile can matter. When multiple assessments are collected, teams may use repeated measures, curve-focused approaches, or overall summaries such as weighted differences or GEE to reflect treatment behaviour across visits. The practical requirement is clarity: pre-specify how repeated time points will be used and how results will be interpreted in clinical terms. Operationally, good practice includes focused site training on joint counts, tight visit window control, reliable lab turnaround for ESR or C-reactive protein, and derivations that enforce the ACR threshold nesting to keep outputs coherent.

Full Transcript

Jullia
Welcome to QCast, the show where biometric expertise meets data-driven dialogue. I’m Jullia.

Tom
I’m Tom, and in each episode, we dive into the methodologies, case studies, regulatory shifts, and industry trends shaping modern drug development.

Jullia
Whether you’re in biotech, pharma or life sciences, we’re here to bring you practical insights straight from a leading biometrics CRO. Let’s get started.

Tom
Today we’re talking about ACR response criteria in rheumatoid arthritis trials. People hear ACR20 and think it’s a single score, but really it’s more like a package of checks. When you’re designing a study or cleaning data, what do you want teams to keep in their heads first?

Jullia
So I’d start with what it’s trying to do. The American College of Rheumatology response criteria, or ACR criteria, are a structured way to quantify improvement in rheumatoid arthritis symptoms between two time points, usually baseline and a post-baseline visit. The endpoint lands as responder or non-responder, so it’s dichotomous, but underneath that you’ve got multiple components that all need to move together.

And the important bit is you can’t “rescue” a response with one great value. If the tender and swollen joint counts don’t improve enough, it doesn’t meet the definition, even if pain scores look better.

Tom
So let’s pin down that definition in plain language. What is ACR20 actually asking you to show, and what are the moving parts you’re reliant on?

Jullia
So ACR20 means a patient shows at least a 20 per cent improvement in both tender and swollen joint counts, and at least a 20 per cent improvement in three of five other measures. Those other measures are the patient’s global assessment of disease activity, the physician’s global assessment, a patient pain scale, disability or function often captured via the Health Assessment Questionnaire Disability Index, and an acute phase reactant like ESR or C-reactive protein.

You tend to see the dependency when a single component is missing. If you don’t have the joint counts, or the lab value isn’t available in-window, you can’t classify the endpoint cleanly.

Tom
And ACR50 and ACR70 follow the same structure, just with higher thresholds, right? One thing people sometimes miss is how those categories relate to each other.

Jullia
Exactly. ACR50 is at least 50 per cent improvement, and ACR70 is at least 70 per cent, using the same pattern of joint counts plus three of the five measures. And there’s that nesting relationship: someone who meets ACR70 also meets ACR50 and ACR20, and ACR50 implies ACR20.

That’s useful when you’re setting up derivations and reporting, because you want the hierarchy to be explicit. Otherwise, you can end up with tables that look inconsistent when they shouldn’t.
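As an editorial aside for readers setting up derivations: the cleanest way to get the hierarchy right is to derive all three thresholds from the same percent-improvement values, so the nesting holds by construction rather than by reconciliation. The sketch below assumes the improvements have already been computed; the function name and signature are illustrative, not from any standard library.

```python
def derive_acr_levels(joint_impr: tuple, other_impr: list) -> dict:
    """Derive ACR20/50/70 responder flags from the same percent-improvement
    values, so the nesting (ACR70 implies ACR50 implies ACR20) holds by
    construction.

    joint_impr: (tender, swollen) percent improvements from baseline.
    other_impr: percent improvements for the five remaining measures.
    """
    def responder(threshold: float) -> bool:
        joints_ok = all(x >= threshold for x in joint_impr)
        return joints_ok and sum(x >= threshold for x in other_impr) >= 3

    flags = {f"ACR{t}": responder(t) for t in (20, 50, 70)}

    # Sanity check: the hierarchy must be monotone. With a shared derivation
    # this can never fire; it guards against later edits breaking the nesting.
    assert not (flags["ACR70"] and not flags["ACR50"])
    assert not (flags["ACR50"] and not flags["ACR20"])
    return flags
```

Deriving each threshold independently, from separately cleaned fields, is how tables end up showing an ACR50 responder who is somehow not an ACR20 responder.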

Tom
Most trials I see still use ACR20 at a specific visit as a key endpoint, often around mid-study. Now why do sponsors keep coming back to that single time point comparison?

Jullia
Well it’s a clean decision point. You compare the proportion of ACR20 responders in each treatment group at a pre-specified visit, and you can test a difference at that moment in time. It’s easy to communicate and it maps neatly to a binary endpoint analysis plan.

But rheumatoid arthritis response unfolds over time. Two arms can look similar at a landmark visit, and behave differently earlier on, and that can matter in interpretation. It’s also why teams pay attention to the full profile in equivalence settings, including biosimilar comparisons, where the overall pattern through treatment can be part of the similarity argument.

Tom
I think that’s the bit people underestimate. Once you start looking across multiple time points, what does “good analysis” look like without overcomplicating things?

Jullia
So you’ve got a few options, and they sit on a spectrum. Repeated measures modelling lets you use all those assessments rather than treating everything between baseline and the primary visit as background noise. In rheumatoid arthritis studies, there’s also interest in modelling the response pattern through time with non-linear approaches, because the curve often has a shape you can capture.

The simplest way to think about it is you’re comparing response curves across the study, not only asking who responded at week 24.

Tom
Now you just mentioned non-linear modelling. People hear that and immediately think “complex, fragile, hard to explain”. What tends to go wrong there?

Jullia
A couple of things. First, the model is only as good as the data density and quality across visits. If you’ve got patchy attendance, inconsistent assessment timing, or lots of missing joint counts, you can force the model into doing more guesswork than it should.

Second, interpretation can get slippery. Some curve comparison summaries don’t map neatly onto the equivalence margins teams are used to, like a difference in responder rates. If you haven’t planned how you’ll explain what the summary means, you can end up with a result that’s technically sound but hard to use in a decision meeting.

Tom
Could you give me a concrete example at study level? Not numbers, just what a team actually does differently when they’re treating longitudinal ACR data as a first-class citizen.

Jullia
Yeah so you build the visit schedule so ACR components line up, and you put real operational emphasis on hitting those windows. For example, if ACR is assessed at weeks four, eight, 12, and 24, you work with sites so the joint assessment happens consistently and at the right point in the visit.

Then on the data side, you tighten the turnaround for lab uploads because acute phase reactants are part of the criteria. If ESR is coming in late, you’re not just missing a lab, you’re potentially missing the ability to classify response. And in data management, you prioritise queries that affect tender and swollen joint counts, the HAQ Disability Index, and the time windows that anchor the endpoint.

Tom
Okay so we’ve got the single time point responder comparison, and we’ve got response curves. Now when teams do want an “overall summary” across time, what are the main analytical routes?

Jullia
So there are approaches that keep you close to the scale most teams recognise. A weighted mean method takes the difference in responder rates at each time point, uses inverse standard errors as weights, and averages those differences into one estimate with a confidence interval. Another is a generalised estimating equations approach, or GEE, for binary data, using an identity link to estimate an overall difference in proportions across relevant time points, again with a confidence interval.

You’ll also see curve-distance style summaries mentioned in the literature, but the practical challenge is agreeing what threshold means “equivalent” in clinical terms. That’s why many teams prefer summaries that still speak in differences in responder rates.
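For readers who want to see the weighted mean idea on the page, here is a minimal sketch. It weights each per-visit responder-rate difference by its inverse standard error, as described above, and builds a normal-approximation confidence interval that naively treats visits as independent. That independence assumption is the sketch's biggest simplification: repeated assessments on the same patients are correlated, which is precisely why a real analysis would reach for GEE or another method that models that correlation.

```python
import math

def weighted_mean_difference(diffs, ses, z=1.96):
    """Weighted mean of per-visit responder-rate differences.

    diffs: treatment-minus-control difference in responder proportions
           at each visit.
    ses:   standard error of each per-visit difference.
    Weights are inverse standard errors. The CI naively assumes the
    per-visit estimates are independent; this is a sketch, not a
    validated analysis implementation.
    """
    weights = [1.0 / se for se in ses]
    total = sum(weights)
    estimate = sum(w * d for w, d in zip(weights, diffs)) / total

    # Variance of the weighted mean under independence:
    # sum(w_i^2 * se_i^2) / total^2  (with w_i = 1/se_i this is n / total^2).
    variance = sum((w * se) ** 2 for w, se in zip(weights, ses)) / total**2
    half_width = z * math.sqrt(variance)
    return estimate, (estimate - half_width, estimate + half_width)
```

The appeal of this family of summaries is visible in the return value: the estimate is still a difference in responder proportions, so clinical teams can read it against the same margins they would use for a single-visit comparison.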

Tom
So to summarise in one sentence: teams want the simplicity of responder rates, but they also want to respect the full response profile over time when timing and similarity both matter. Does that sound right?

Jullia
Yeah that pretty much captures it. And it’s worth saying that a lot of programmes can still make a solid decision using one pre-determined time point, especially when the endpoint, estimand, and missing data handling are clearly specified.

The moment it matters is when the time profile tells a different story than the snapshot. If one arm takes longer to show benefit, even if it catches up later, that can change how clinicians and patients experience the treatment. So the longitudinal view often acts as a check on whether the pattern matches what teams expect.

Tom
Now what are the quick wins and the common pitfalls to consider when a programme uses ACR endpoints?

Jullia
Let’s cover quick wins first. Train sites hard on consistency of tender and swollen joint counts, because that’s the gatekeeper for every ACR threshold. Make sure patient-reported outcomes are captured cleanly and within the visit window, and build a reliable process for getting ESR or C-reactive protein into the database on time.

Then moving on to pitfalls, these include treating ACR20 as if it’s “just one field” and discovering late that you’ve got missingness in one of the supporting measures. Another is letting visit timing drift, then trying to run longitudinal summaries on assessments that don’t line up.

Tom
Now one last area I want to touch on is misconceptions. I’ve heard teams assume that because ACR is dichotomous, it’s automatically straightforward. But what’s the nuance they’re missing?

Jullia
Well the dichotomous output hides complexity. You’ve got multiple underlying measures, each with its own variability and operational risks. Joint counts are sensitive to training and consistency, labs can be missing or out of window, and patient-reported measures depend on clean completion.

So even though the endpoint ends up as responder or non-responder, it behaves like a composite in terms of data dependencies. If you don’t treat it that way, you’ll feel it later in query load and sensitivity analyses.

Tom
Now could you pull together a short recap for listeners to remember?

Jullia
Yeah so first, ACR endpoints give a consistent framework for demonstrating improvement, and the timing of response across visits can change the clinical story. Next, plan how you’ll use multiple time points, whether that’s repeated measures, curve modelling, or an overall summary like weighted mean or GEE, and make the interpretation strategy clear early. Finally, ACR is only as strong as its components, so protect joint counts, lab turnaround, and visit window control.

With these in mind, you’re far less likely to be surprised when you derive ACR20, ACR50, and ACR70 and start comparing treatment groups.

And it all circles back to being deliberate. Define the endpoint properly, protect the component data at site level, and decide whether you need that longitudinal view before the database is full of avoidable gaps.

With that, we’ve come to the end of today’s episode on ACR Response Criteria in Rheumatoid Arthritis Clinical Trials. If you found this discussion useful, don’t forget to subscribe to QCast so you never miss an episode and share it with a colleague. And if you’d like to learn more about how Quanticate supports data-driven solutions in clinical trials, head to our website or get in touch.

Tom
Thanks for tuning in, and we’ll see you in the next episode.

About QCast

QCast by Quanticate is the podcast for biotech, pharma, and life science leaders looking to deepen their understanding of biometrics and modern drug development. Join co-hosts Tom and Jullia as they explore methodologies, case studies, regulatory shifts, and industry trends shaping the future of clinical research. Where biometric expertise meets data-driven dialogue, QCast delivers practical insights and thought leadership to inform your next breakthrough.

Subscribe to QCast on Apple Podcasts or Spotify to never miss an episode.