Tuesday, September 24, 2013

Evidence in Evaluation -- beyond a [faith-based?] attachment to the dominant RCT-for-all approach?

As we try to build capacity, sustainability, ownership, and results, the question of evaluation comes around frequently. We are regularly faced with the "what's the evidence?" question -- and behind it lies the frequent, not-so-silent assumption that without an RCT (Randomized Controlled Trial), or some quasi-experimental version of the same concept, you really don't have anything to show.

RCTs are great when they're applied to questions, and in contexts, where they are appropriate and feasible. In my view, the "gold standard" image they carry has more to do with our dominant professional cultural belief system than with serious methodological discussion of the issues at hand. Talk long enough with people who actually conduct RCTs and you'll find that (1) they don't work all the time, (2) interpretation can be very sensitive to small tweaks in the model, and (3) on the burning, complex development questions of the day, they often require that parts of an intervention be parceled out for evaluation, or that the intervention be modified to fit the RCT design. In other words, it's like going for your physical and having the nurse say: "please bend your knees, as my measuring rod doesn't reach that high!" So, gold standard? Not always.

RCT implementers themselves are usually careful about their claims (except, of course, when responding to calls for proposals; we're only human). The problem is with the mass of non-specialists who may have skipped the subtleties and developed a nearly faith-based attachment to RCTs as the answer to every question. Disciples are always the most problematic...

Let me stop venting. I just wanted to point out some useful and fairly recent publications that shed an interesting and balanced light on the topic. Not surprisingly, the complexity of the evaluation question at hand comes up regularly.

Let's promote a new Gold Standard: the method most appropriate to build evidence should be determined by careful consideration of the question at hand and, equally important, of who wants to know. And yes, it's harder than having a one-response-fits-all solution.

So, let's keep at it, and stand straight, even when we're asked to bend our knees. Fit the evaluation method to the program, not the other way around*.

Enjoy the readings, and if you have other suggested references, please use the comment box to add them.
DfID now recommends Stern et al. (2012), Broadening the Range of Designs and Methods for Impact Evaluations, in its solicitations for proposals.
The American Evaluation Association (AEA) has a recent special issue about Mixed Methods and Credibility of Evidence in Evaluation. It's a rich publication (which I'm still going through) that deserves serious attention.
Finally, Michael Quinn Patton, former president of the AEA, offers a course based on his book Developmental Evaluation: Applying Complexity Concepts to Enhance Innovation and Use. It's not about impact evaluation per se, but it's certainly relevant to the questions we face at CEDARS about sustainability evaluation. The book, and Michael himself, pack a punch.

Thanks,


Eric

* Thanks to Florence Nyangara for inspiring this entry.

7 comments:

  1. Excellent post, Eric! While I do think that NGOs need to be doing more RCTs, and particularly cluster RCTs (CRCTs), they do have their limitations and I don't think they need to be done routinely. One of the most serious ethical questions to consider is: "How many fewer beneficiaries will we reach by doing this intervention as an RCT or CRCT vs. simply implementing it with a before-and-after survey?" In our recently published paper on our Care Group project in Mozambique [see http://www.ghspjournal.org/content/1/1/35.full], we had a reduction in malnutrition of about 38% at an annual cost per beneficiary of only US$2.78, and an estimated 6,848 children's lives were saved (detailed in the final evaluation). Had we used a more robust design for measuring the effect and impact of the program (e.g., adding control groups and randomization), the logistics of a very simple project would have become more difficult and costly, and we would have needed to reduce the number of beneficiaries.

    The question then becomes: how many of those lives saved are we willing to give up in order to prove the effectiveness of our approach to some folks? For *novel* approaches, and in areas where we do not know whether what we are doing is effective at all, the ability to know what happened and to persuade others to adopt a new way of doing things may trump the short-term goal of lives saved. (We will be doing a CRCT to show how reducing maternal depression increases behavior change, for example.)

    But it should never become *routine* to have that level of rigor in one's evaluation design, since I don't think we can justify the missed opportunities for saved and improved lives that come from putting that many resources into measurement.

    Where we can use less expensive methods and still reach a lot of beneficiaries (e.g., a stepped-wedge design: introducing the intervention over time to new areas, but measuring in all areas from baseline), we should do so. But as we move forward with sometimes using these more rigorous evaluation methods, I think it's worthwhile to count the cost.
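    To make that trade-off concrete, here is a rough back-of-envelope sketch in Python. Only the US$2.78 per-beneficiary figure comes from our paper; the program budget and the added cost of a (C)RCT-style evaluation are purely assumed placeholders, not actual project numbers.

    ```python
    # Rough illustration of the opportunity cost of a more rigorous design.
    # Only cost_per_beneficiary comes from the Mozambique paper; the budget
    # and the added evaluation cost are assumed placeholders.

    cost_per_beneficiary = 2.78   # USD per beneficiary per year (from the paper)
    program_budget = 1_000_000    # assumed annual program budget (USD)
    extra_eval_cost = 250_000     # assumed added cost of control arms, randomization, surveys

    reach_simple = program_budget / cost_per_beneficiary
    reach_rigorous = (program_budget - extra_eval_cost) / cost_per_beneficiary

    print(f"Reach with a simple before/after design:   {reach_simple:,.0f} beneficiaries")
    print(f"Reach if the budget also funds a (C)RCT:   {reach_rigorous:,.0f} beneficiaries")
    print(f"Beneficiaries forgone for the extra rigor: {reach_simple - reach_rigorous:,.0f}")
    ```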

    Tom Davis
    Senior Specialist for Social & Behavioral Change, TOPS Program
    Chief Program Officer, Food for the Hungry

  2. Comment from Jennifer Yourkavitch:

    You're certainly not alone in this. Many *modern* epidemiologists agree that the RCT is not a gold standard. My class discussed that just last week.
    RCTs are not particularly appropriate in our field. How could you randomize poor people into a group that will receive life-saving interventions and a group that won't? Ethics aside, the expense of an RCT renders the exercise counterproductive. And the question is wrong -- we're usually not testing a treatment; rather, we test delivery mechanisms. So we have a burgeoning operations research capacity in the community of practitioners, but some argue that even this is inappropriate. As far as I can tell, no one agrees on what "implementation science" is, but it seems that systematizing our approaches, and documenting those approaches -- the successes AND the failures -- would help to advance knowledge in our field.
    I think the original post alludes to program evaluation in particular, and AEA has great resources and a huge, active, and vocal community devoted to advancing that art and science.
    But I think there's something more, or different, that we can do to build evidence for approaches designed to enhance international health and human development, and I think that something can emerge from ongoing discussions about systematizing our approaches and documentation (quantitative and qualitative), employing innovative designs with matching and crossover techniques to build evidence, and, most importantly, asking good questions. Maybe that's implementation science.
    Jennifer

  3. No doubt this is an excellent article. I support Tom's argument that in our development programs we can use less expensive methods for evaluation and still be able to answer questions of effectiveness and impact, without making RCTs the "norm." Recently I was involved in designing a mid-term evaluation and was surprised that the donor insisted on using an RCT to prove the effectiveness of a donation of a meager amount and very limited reach. The same amount being spent on the evaluation could have been used to reach more beneficiaries, as Tom mentioned in his comment. I fully agree that there is a need for RCTs, but we should consider very carefully when to use them rather than imposing them as "the" evaluation methodology to demonstrate the effectiveness or impact of a program. The most important question is: whose interests are we serving in the evaluation? Does it make sense to the community from whom we are collecting the data?

    As someone who lives and breathes Monitoring and Evaluation, who has experimented with various simple, low-cost evaluation designs, and who is always open to new learning, I am very supportive of Eric's idea of promoting a new Gold Standard: the method most appropriate to build evidence should be determined by careful consideration of the question at hand and, equally important, of who wants to know. Are other M&E colleagues in development organisations willing to travel with me on this exciting new journey?

  4. Nice to see the debate flourish. I think the problem of opportunity cost is a real one. It also links to that of ownership: the global community might be excited to know that Care Groups can show a coefficient of X with Y confidence in improving health outcomes, but the district health officer and the local governor are probably happier to know that (a) someone has brought a social-capital-forming, human-capacity-building intervention to poor communities, and (b) as it was implemented -- along with a ton of other uncontrollable events in the environment -- the health of our children improved from A to B.
    There's no question in my mind that sometimes we absolutely need something as rigorous as an RCT (are we going to push zinc as a drug now? We'd better know whether it can actually do the trick). But there are times when it is simply (1) not possible -- Jennifer's point, (2) not worth it -- Tom's, or (3) not the question the owners are asking -- Subodh's point, if I summarize correctly.

  5. More from Epi class (you get the benefits without the exams): an RCT, by design, uses a controlled environment so that the effects of confounding can be minimized -- "controlled." The field is not a controlled environment -- there are too many variables that interfere with the relationship between the exposure (our approach) and the outcome(s). It's impossible to create experimental conditions. So while the point of an RCT is to control confounding, in the field you'll get it anyway, and in unknown ways -- the "unknown unknowns," to quote Donald Rumsfeld.
    Second, RCTs limit generalizability beyond the controlled conditions and the restricted study population. But generalizability is a goal of our work: we want our findings to hold in other parts of a country and in other countries.
    If we require an understanding of causality under these conditions (and I maintain that this question is incorrect and unhelpful in the aggregate), a cohort study design is more appropriate. Again, that requires resources well beyond a project implementation budget.
    So I think we're back to Eric's call for a new paradigm -- and I think there's an opportunity to use implementation science as a vehicle to take us there.
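    For anyone who wants to see the confounding point rather than take it on faith, here's a toy simulation (every number is invented, and the "program" deliberately has zero effect) showing how an unmeasured variable can bias a naive field comparison in a way that randomized assignment would have controlled:

    ```python
    # Toy simulation (all numbers invented): an unmeasured confounder biases a
    # naive exposed-vs-unexposed comparison, while random assignment does not.
    import numpy as np

    rng = np.random.default_rng(0)
    n = 100_000

    wealth = rng.normal(size=n)    # unmeasured confounder
    true_effect = 0.0              # the "program" does nothing in this toy example

    # Field setting: better-off households are more likely to take up the program,
    # and wealth also improves the outcome, so uptake and outcome are linked
    # even though the program itself has no effect.
    uptake = rng.random(n) < 1 / (1 + np.exp(-wealth))
    outcome = 0.5 * wealth + true_effect * uptake + rng.normal(size=n)
    naive_estimate = outcome[uptake].mean() - outcome[~uptake].mean()

    # RCT setting: random assignment is independent of wealth, breaking the link.
    assigned = rng.random(n) < 0.5
    outcome_rct = 0.5 * wealth + true_effect * assigned + rng.normal(size=n)
    rct_estimate = outcome_rct[assigned].mean() - outcome_rct[~assigned].mean()

    print(f"True effect:          {true_effect:.2f}")
    print(f"Naive field estimate: {naive_estimate:.2f}  (biased by the confounder)")
    print(f"RCT estimate:         {rct_estimate:.2f}  (close to the truth)")
    ```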

    Replies
    1. Also see Smith and Pell (BMJ, 2003) -- their systematic review of randomized controlled trials of parachute use.

  6. Additional related link from Easterly: http://aidwatchers.com/2009/07/development-experiments-ethical-feasible-useful/
