[Video] Tipping Point Analysis in Multiple Imputation for Binary Missing Data

In this Statistical Knowledge Share Video our Senior Statistician, Niccolo, presents an example on simulated data of Tipping Point Analysis in Multiple Imputation for Missing Data.

Video Transcript

"Good afternoon, everyone, this is going to be rather informal, so there is no strange or odd formula or notation in here, hopefully should be relatively clear and it's going to be practical. The Subject today is about tipping points analysis, which in the end, is closely related to missing data, so most of the concepts I'm going to touch base on will be relatively familiar to all of you.

In fact, these two missing records are the root cause of everything for this presentation, and the biggest point here is that, since these missing data are in fact missing, the only thing we can do with respect to their nature is to make assumptions. It is really difficult to know why each single data point is in fact missing, and the assumptions we make can never be tested. So we need to investigate what happens if the one we make is not true, and its impact on the analysis downstream, in order to cross-check the interpretation we are going to make from these results.

So I believe we all know the literature about missing data types. Data can be missing completely at random, which is the simplest situation, where for whichever reason a data point is not recorded, and it goes unobserved at random, so there is no real explanation. The example which I've reported here is that the device which measures the endpoint simply fails with these measurements at random, because there is some kind of bug in the machine which stops it from working.

The data could be missing at random, which is a slightly more strict situation, where the fact that the data is missing is totally and completely explained by the data which we have observed. So for instance, the example which I've reported here is that the device which measures the endpoint has some faulty behavior when females are being tested.

So knowing that the subject being measured is a woman gives us all the information we need to say whether the measurement is going to be missing or not. And then there is the one which every statistician and investigator is really concerned about, which is missing not at random, which means that the fact that a record is not observed is due to the value of the record itself. In the example I've given here, the device is most likely to fail in providing the value if the value is above a given threshold. A very common example when it comes to social surveys is when you ask people to provide their income. It has been seen that people with the highest level of income are the ones who are most likely not to report it.

So in that case we have a very obvious scenario where the data are going to be missing not at random. Most of the statistical methods which we use work perfectly well with missing completely at random (yeah, the first acronym here is MCAR, not MNAR, obviously) and missing at random, but no method, in itself, is really fit for purpose when data are missing not at random.

So, what usually happens is that, within your study protocol, statistical analysis plan, or whichever documentation you're going to write, you will specify a primary analysis, which is usually a maximum likelihood based method. It can be an ANCOVA, it can be a mixed model; it can be whatever is fit for purpose. And these methods work fine when we have missing at random, or missing completely at random, data. And if there is a large amount of missing values, or if there is any valid scientific reason for doing it, we perform what we call a sensitivity analysis, using any reasonable imputation method. It can be a very simple one, such as last observation carried forward, which in fact is not much favoured by the authorities but is still quite commonly used, or we can perform multiple imputation, just to take, let's say, the data generating process into account when generating these new data points.

The idea of sensitivity analysis is that we are testing how sensitive, in fact, the main results are when we, let's say, fill in the gaps in the observed data. If the data are going to be missing at random, or completely at random, what we do expect is that the available data are a good representation of the missing ones.

So the results should not differ that much, and if they do differ, we will see what happens. However, what do we do if the data are missing not at random? This question is actually ill-posed, because we'll never really know if the data are truly missing not at random. We can make the assumption that they are not missing not at random, and see what would happen, what would be the impact on all our analyses and interpretation, if they actually were.

So just to show you what might happen, there is a very simple simulation here. We have 100 patients randomized to two treatment groups. The endpoint is a continuous one, analyzed using the generalized linear model, with gender and treatment as fixed effects.

So assuming it's an equivalence study, and there is a margin of -3/+3, the data were simulated in order to have the active compound provide a mean value which was higher by 3 points compared to the placebo. So in theory these two drugs are not equivalent, because obviously with a margin of -3/+3 and a mean which is already higher by 3, I would expect the confidence interval for the difference between treatments not to lie within these -3/+3 bounds.

Then I have generated some missing data by simply either removing randomly 20% of points, or removing 20% of points from the female patients and 0% from the male ones. Then, to generate missing not at random, I have given the data a 99% probability of being missing if they were above 13. In this case, this is going to have quite a relevant impact, because most of the data which are above 13 are going to belong to the active group, as we are going to see.

In fact, these are the results on the full dataset and on the incomplete datasets. As you can see, up until the missing at random data, results are more or less the same, which means that, roughly speaking, the observations left in the dataset are a good representation of the full dataset.
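
As a rough illustration of the three missingness mechanisms described above, here is a minimal Python sketch (the presentation itself uses SAS). Only the 100 patients, the 3-point treatment difference, the 20% removal rates and the 99% probability above 13 come from the talk. The placebo mean of 10, the standard deviation of 2, the alternating allocation and the function names `simulate_trial` and `apply_missingness` are assumptions made purely for illustration. Each value is removed with the stated probability, so the realised proportion of missing data is only approximately 20%.

```python
import random


def simulate_trial(n=100, effect=3.0, seed=1):
    """Simulate n patients with a treatment arm, a gender and a continuous endpoint."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        arm = "active" if i % 2 == 0 else "placebo"  # assumed 1:1 alternating allocation
        gender = rng.choice(["F", "M"])
        mean = 10.0 + (effect if arm == "active" else 0.0)  # assumed placebo mean of 10
        rows.append({"arm": arm, "gender": gender, "y": rng.gauss(mean, 2.0)})  # assumed SD of 2
    return rows


def apply_missingness(rows, mechanism, seed=2):
    """Return a copy of rows where y is set to None according to the mechanism."""
    rng = random.Random(seed)
    out = []
    for r in rows:
        if mechanism == "MCAR":
            p = 0.20                                  # same chance for every record
        elif mechanism == "MAR":
            p = 0.20 if r["gender"] == "F" else 0.0   # depends on observed gender only
        elif mechanism == "MNAR":
            p = 0.99 if r["y"] > 13 else 0.0          # depends on the value itself
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        out.append({**r, "y": None if rng.random() < p else r["y"]})
    return out


full = simulate_trial()
mnar = apply_missingness(full, "MNAR")
print(sum(r["y"] is None for r in mnar), "values removed under MNAR")
```

Under the MNAR rule, only values above 13 can be removed, so the observed mean in the higher arm is pulled downwards, which is the effect discussed in the results.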

But when we go to the missing not at random, we see that the results do change quite a lot, not only in quantitative terms but also in qualitative terms, because now, if the MNAR dataset was the one we were actually provided, we would declare the drugs to be equivalent. And that would only be because all the highest data points have been removed from the active group, and thus we have a lower estimate for the mean, that is 13, compared to the 14.4 which is the correct one.

So obviously this is quite a problem, because we will never know which of these three scenarios is going to apply. We will only know that we need to be aware that, if this happens, it is going to lead to a false claim of equivalence, which can be quite bad, again, if the drug is not equivalent.

So can we test? The short answer is no, because for us to be able to test for missing not at random, we would need to have some of the missing data available to make this judgment. Whereas in some social surveys you have the recall option, so you can call back people who didn't respond to see whether this happens, within clinical trials we usually don't have this privilege. What you can do, however, is check how severe departures from MAR, or MCAR, need to be in order for this departure to have an impact on your results. So, this is the reason for tipping point analysis, which really means: push this little coin down this slope, and see how fast the coin starts to roll as we move away from the starting point.

So, in a nutshell, what we do is a standard sensitivity analysis using multiple imputation, and we create 20 or 30, or however many we want, imputed datasets. Then we modify the values which have been imputed by shifting them. That means we assume the multiple imputation model doesn't provide the correct estimate for the missing data, but that the estimate for the missing data needs to be shifted downwards or upwards, depending on what we believe is a reasonable shift to apply.

Then we redo the analysis using the multiple imputation approach, which means that we combine estimates from all imputed datasets, using the standard error provided by each dataset as a way to account for multiple imputation uncertainty. We get results, and we ask: is the confidence interval, using this shift value, leading us to a change in the inferential results of the study, yes or no? If no, that is good. If yes, then we need to check how large the shift has been and how credible, how meaningful, how relevant the shift is going to be in practice. So is this a shift which could happen or not?
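
The combination step described above is commonly done with what are known as Rubin's rules. The following is a minimal Python sketch, assuming that this is the combination meant here. The function name `pool_rubin` is hypothetical, and the interval uses a normal quantile of 1.96 as a simplification, whereas dedicated software typically uses a t distribution with adjusted degrees of freedom.

```python
import math


def pool_rubin(estimates, std_errors, z=1.96):
    """Pool per-imputation estimates and standard errors with Rubin's rules.

    Returns (pooled estimate, pooled standard error, (lower, upper) interval).
    """
    m = len(estimates)
    if m < 2 or m != len(std_errors):
        raise ValueError("need at least two imputations with matching lengths")
    q_bar = sum(estimates) / m                                   # pooled point estimate
    within = sum(se ** 2 for se in std_errors) / m               # average within-imputation variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1) # between-imputation variance
    total = within + (1 + 1 / m) * between                       # total variance
    se = math.sqrt(total)
    return q_bar, se, (q_bar - z * se, q_bar + z * se)


# Hypothetical treatment differences and standard errors from three imputed datasets
estimate, se, ci = pool_rubin([1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
print(estimate, round(se, 4), ci)
```

The between-imputation term is what carries the extra uncertainty due to the data being imputed rather than observed: the more the imputed datasets disagree, the wider the pooled interval.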

The idea is that, as I said before, if the data are missing at random, then the ones which we observe fully explain the missing data, and so the average predictions which we're getting from the observed data should be more or less realistic, and should fit what we do not have because it is missing. However, if they are not, if these predictions do not in fact reflect what the missing data are, we do the tipping point analysis to see if this difference is really relevant or not.

So this is briefly the SAS code. We have the multiple imputation step, for which I've used predictive mean matching. What it does is just fit a model, and then, once you have the predictions, for every missing value it figures out which are the five observations which have the closest prediction to this missing value, and randomly selects one of the five observed values as the imputed one. Then we analyze the imputed datasets using, in this case, PROC MIXED with a BY statement, and we get results for every imputation, and then with PROC MIANALYZE we put everything together, using the standard error as our measure of weighting for the uncertainty. And then, on this simulated dataset, we do apply the shift, so if the data is missing we add this shift, which is to some extent stochastic, not just a hard-coded number, but by doing this we make sure that we are adding a consistent shift to all the imputed datasets. Then we analyze the shifted datasets, and we combine the results the same way as we did before.
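
The matching step described above (fit a model, find the five observations with the closest predictions, pick one of them at random) is often called predictive mean matching. Here is a minimal Python sketch of that idea, not the SAS code from the slides. It assumes a single numeric covariate and an ordinary least squares line as the prediction model, and the names `fit_line` and `pmm_impute` are hypothetical. A full implementation would also draw the model parameters afresh for each imputed dataset, which this sketch leaves out.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope


def pmm_impute(x, y, k=5, seed=0):
    """Replace each missing y (None) with the observed value of a random donor
    chosen among the k observed records with the closest predicted value."""
    rng = random.Random(seed)
    observed = [(a, b) for a, b in zip(x, y) if b is not None]
    intercept, slope = fit_line([a for a, _ in observed], [b for _, b in observed])

    def predict(a):
        return intercept + slope * a

    completed = []
    for a, b in zip(x, y):
        if b is None:
            # Rank observed records by distance between predictions, keep the k nearest
            donors = sorted(observed, key=lambda d: abs(predict(d[0]) - predict(a)))[:k]
            b = rng.choice(donors)[1]  # imputed value is a real observed value
        completed.append(b)
    return completed


x = [float(i) for i in range(1, 11)]
y = [2.0 * a for a in x]
y[2] = None  # pretend the third measurement is missing
print(pmm_impute(x, y))
```

Because the imputed value is always borrowed from an observed record, it can never fall outside the range of the observed data, which is one reason a shift has to be added separately when departures from missing at random are being explored.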

And these are the results. You see that GLM and multiple imputation are substantially similar, whereas if you look at the tipping point, what we see here is that if we were to apply a shift of 1.5, the results would change, because we would have a confidence interval which is no longer contained in the -3/+3 margin. So if we were to look at this, what we would say is that it really takes a very small departure from the assumption that the missing data and the observed data follow a very similar pattern to completely change our conclusions. And if this 1.5 shift is something which is reasonable, that is, we know the instrument and we know that in fact the numbers which are missing could be higher by 1.5 compared to the ones which are available, then this means that the main results are either questionable, or need to be supported by some further analysis which takes this into account.

One thing which could be done is to analyze shifts from 0.5 to 1.5 with some more granularity, just to see exactly which shift leads to this change, whether it is closer to 1.5 or closer to 0.5, for instance. But either way, the assumption on the missing data has an influence on the results.
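
The scan over shift values described above can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the analysis from the slides: it replaces the mixed model with a plain difference in means, pools with Rubin's rules and a normal quantile of 1.96, and the names `pooled_ci` and `find_tipping_point` as well as the toy data are hypothetical. Each dataset is a list of `(arm, value, was_imputed)` records, and the shift is added to the imputed values only.

```python
import math
from statistics import mean, variance


def pooled_ci(imputations, shift, z=1.96):
    """Shift imputed values, estimate active minus placebo per dataset, pool the results."""
    estimates, variances = [], []
    for data in imputations:
        shifted = [(arm, v + shift if imputed else v) for arm, v, imputed in data]
        active = [v for arm, v in shifted if arm == "active"]
        placebo = [v for arm, v in shifted if arm == "placebo"]
        estimates.append(mean(active) - mean(placebo))
        variances.append(variance(active) / len(active) + variance(placebo) / len(placebo))
    m = len(estimates)
    between = variance(estimates) if m > 1 else 0.0
    total = mean(variances) + (1 + 1 / m) * between
    centre, half_width = mean(estimates), z * math.sqrt(total)
    return centre - half_width, centre + half_width


def find_tipping_point(imputations, shifts, margin=3.0):
    """Return the first shift at which the equivalence conclusion differs from the
    conclusion at the first shift in the grid, or None if it never changes."""
    baseline = None
    for shift in shifts:
        lower, upper = pooled_ci(imputations, shift)
        equivalent = -margin < lower and upper < margin
        if baseline is None:
            baseline = equivalent
        elif equivalent != baseline:
            return shift
    return None


# Toy data: half of the active arm is imputed, the placebo arm is fully observed
active = [("active", 11.0 if i % 2 == 0 else 13.0, i >= 20) for i in range(40)]
placebo = [("placebo", 9.5 if i % 2 == 0 else 11.5, False) for i in range(40)]
imputations = [active + placebo, active + placebo]
print(find_tipping_point(imputations, [0.0, 0.5, 1.0, 1.5, 2.0]))
```

A finer grid, for example steps of 0.1, narrows down where the conclusion flips; whether a shift of that size is plausible remains a clinical judgment, not a statistical one.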

That's important because you never really know the impact of missing data on the results, substantially because you don't know what the missing data look like. Tipping point analyses do tell you, to some extent, what their impact is. It's not mandatory, so nowhere is it written that you have to use them or the study is going to be rejected, but in some cases the authorities really want you to use them, in particular if you have a lot of missing data, and if the reason for the missing data is not obviously a random one. So in this case, for this binary endpoint, if the data are missing because of a lack of efficacy, it could be that the reason why they're missing is that they're all non-responders, so really that is a case of missing not at random, and you need to check for it. And then, yeah, we have the regulatory reason for it. As you can see this is the end, thanks everyone".