Quality control of evaluation reports
The usefulness of an evaluation depends on its credibility, which relies on the transparency of the process and on the quality of the evaluation. That quality has to be checked at four levels: the terms of reference, the evaluation process, the evaluation report, and the dissemination of and feedback on the evaluation. The present paper deals only with the quality of the evaluation report.
Origin of the quality grid
In 1999, the Directorate General Regio of the European Commission proposed, in its guidelines on evaluation (MEANS collection), a quality grid for assessing the quality of evaluation reports. Also in 1999, the Directorate General Agriculture of the EC decided to use the quality grid systematically for assessing the quality of evaluation reports. In 2001, it was decided that use of the grid would be mandatory for all European Commission services, as one of the standards to be applied by the evaluation function in each of the Directorates General. The External Relations Directorates have applied the quality grid described below to all evaluations launched from 2002 onwards.
Content of the quality grid
The quality grid is filled in twice: first to assess the draft final report, and a second time to assess the final report in its publication form. The quality grid is published on the web site alongside the evaluation report. The quality assessment ensures that the evaluation report complies with professional standards and meets the information needs of the intended users. The quality grid is presented in the terms of reference, so the external evaluators know from the start on which basis they will be assessed. The quality assessment of the final report has three main aims.
Nine quality criteria
The quality grid is based on nine criteria, which are independent of each other and are each examined in their own right:

1. Meeting needs: does the report adequately address the information needs of the commissioning body and fit the terms of reference?
2. Relevant scope: are the rationale of the intervention and its set of outputs, results and impacts examined fully, including both intended and unexpected policy interactions and consequences?
3. Defensible design: is the evaluation design appropriate and adequate to ensure that the full set of findings, along with the methodological limitations, is made accessible for answering the main evaluation questions?
4. Reliable data: are the primary and secondary data selected adequate? Are they sufficiently reliable for their intended use? When statistical data are missing, are qualitative data meaningful? When only weak data are available, have the evaluators explained their weaknesses and the limits of their use?
5. Sound data analysis: is the analysis of quantitative and qualitative data carried out appropriately and systematically, so that the evaluation questions and judgement criteria are informed in a valid way? Are cause-and-effect links between the intervention and its results explained? Are external factors correctly taken into consideration? Are comparisons made explicit?
6. Credible findings: do the findings follow logically from, and are they justified by, the data analysis? Are the findings based on carefully described assumptions and rationale?
7. Valid conclusions: are the conclusions linked to the findings? Are they based on the judgement criteria? Are they clear, clustered and prioritised?
8. Useful recommendations: are the recommendations linked to the conclusions? Are they fair, unbiased by personal or stakeholders' views, and sufficiently detailed to be operationally applicable? Are they clustered and prioritised?
9. Clear report: does the report describe the intervention being evaluated, including its context and purpose, together with the process and findings of the evaluation? Is the report easy to read, and does it have a short but comprehensive summary? Does the report contain graphs and tables?
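As a purely illustrative sketch, the grid lends itself to a simple data model: nine named criteria, each to be rated on the fixed five-level scale described later in this paper. The code below is a hypothetical illustration, not part of any Commission tooling; all class and function names are invented.

```python
from dataclasses import dataclass, field

# The five rating levels used for each criterion (labels from the quality grid;
# this encoding as a list is an assumption for illustration only).
RATING_LEVELS = ["unacceptable", "poor", "good", "very good", "excellent"]

# The nine quality criteria of the grid.
CRITERIA = [
    "Meeting needs",
    "Relevant scope",
    "Defensible design",
    "Reliable data",
    "Sound data analysis",
    "Credible findings",
    "Valid conclusions",
    "Useful recommendations",
    "Clear report",
]

@dataclass
class QualityGrid:
    """One filled-in quality grid: one rating per criterion (hypothetical model)."""
    ratings: dict = field(default_factory=dict)

    def rate(self, criterion: str, level: str) -> None:
        # Reject anything outside the nine criteria or the five-level scale.
        if criterion not in CRITERIA:
            raise ValueError(f"unknown criterion: {criterion}")
        if level not in RATING_LEVELS:
            raise ValueError(f"unknown rating level: {level}")
        self.ratings[criterion] = level

    def is_complete(self) -> bool:
        # A grid is complete once all nine criteria have been rated.
        return len(self.ratings) == len(CRITERIA)

grid = QualityGrid()
grid.rate("Meeting needs", "good")
print(grid.is_complete())  # False: the other eight criteria are still unrated
```

Modelling the criteria and rating levels as closed lists mirrors the grid's design: assessors choose among fixed labels rather than free-form scores.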
How to fill out the quality assessment grid?
Each of the nine criteria of the quality grid is rated on five levels: excellent, very good, good, poor and unacceptable. The quality grid for each evaluation report is filled in by two persons, who discuss the nine criteria so as to produce a single grid that is sent to the external evaluators. Because the quality grids of the various evaluation reports are filled in and rated by different persons within the evaluation team, it is important that the quality assessment remain consistent regardless of the assessor and of the time of assessment. To ensure that the quality grids of the reports are comparable, a referential had to be elaborated for the evaluation managers to use when assessing the quality of the evaluation reports. Moreover, the head of the evaluation unit checks each quality assessment to ensure better consistency. The evaluation unit of External Relations within the European Commission has set up a comprehensive referential for rating the nine criteria.

Criterion 1: Meeting needs

Criterion 2: Relevant scope
Good: The report deals with the whole intervention in its temporal, geographic and regulatory dimensions. The main intended and unintended effects have been identified.
Very good: In addition to the previous point, the evaluation took an interest in interferences with other EC policies, other donors' interventions and the partner government(s)' policies. Unintended effects have been addressed.
Poor: One of the three dimensions of the intervention and/or one major effect has been inadequately or insufficiently addressed.
Unacceptable: Several dimensions of the intervention and/or several major effects have been inadequately or insufficiently addressed.
Excellent: In addition to the remarks for a "very good" rating, the report has systematically examined the unintended effects in detail.

Criterion 3: Defensible design
Good: The evaluation method is clearly explained and has actually been applied throughout the process. The methodological choices were appropriate enough to meet the requirements of the terms of reference.
Very good: The limitations inherent in the evaluation method have been clearly specified, and the choices have been discussed and defended against other options.
Poor: Upon reading the evaluation report, the methodological choices seem to have been made without being either explained or defended.
Unacceptable: Either there is no evaluation method, or the methodological choices are not in line with the results sought.
Excellent: In addition to meeting the expectations for a "very good" rating, the evaluation team submits a critique of the method and of the methodological choices. The report points out the risks that might have been incurred if other methodological options had been adopted.

Criterion 4: Reliable data
This criterion does not apply to the intrinsic validity of existing data but to the way in which the evaluation team has collected and used the data.
Good: Both quantitative and qualitative data are identified through explicit sources. The evaluation team has tested and discussed the reliability of the data. The data collection tools have been clearly explained and adjusted to the data sought.
Very good: Data have been systematically cross-checked by relying upon sources or data collection tools that are independent of one another. Limitations pertaining to the reliability of the data or to the data collection tools are made explicit.
Poor: The quantitative and qualitative data provided are not very reliable for the questions asked. The data collection tools are questionable (for instance, insufficient samples or off-target case studies).
Unacceptable: Certain data are manifestly distorted. The data collection tools have not been applied correctly, or else they provide biased or useless information.
Excellent: All biases deriving from the information provided are analysed and rectified by means of recognised techniques.

Criterion 5: Sound data analysis
Good: The quantitative and/or qualitative data analysis is done rigorously, following the recognised and relevant steps for the type of data analysed. Cause-and-effect links between the intervention and its consequences are explained. Comparisons (for example: before/after, beneficiaries/non-beneficiaries, with/without) are made explicit as well. Data on external factors are analysed.
Very good: The analysis approaches are made explicit and their validity limitations are specified. Underlying cause-and-effect assumptions are explained. Validity limitations of the comparisons made are pointed out.
Poor: Either one of the three elements (analysis approach, cause-and-effect relations, and comparisons) is not well addressed, or two of these elements are dealt with inadequately.
Unacceptable: Two out of the three elements are addressed inadequately.
Excellent: Every analysis bias (across the three elements: analysis approach, cause-and-effect relations, and comparisons) has been systematically reviewed and presented, including its consequences in terms of limiting the validity of the analysis.
The influence of external data on the analysis of the relevant data is made explicit.

Criterion 6: Credible findings
Good: The findings derived from the analysis seem both reliable and balanced, especially in view of the context in which the intervention is being assessed. The interpretations and extrapolations made are acceptable.
Very good: The limitations applying to interpretations and extrapolations are explained and discussed.
Poor: The findings seem imbalanced. The context is not made explicit. Neither the extrapolations nor the generalisations made from the analysis are relevant.
Unacceptable: The credibility of the findings seems very poor. Some assertions in the text cannot be sustained.
Excellent: Imbalances between the internal and external validity of the findings are systematically analysed, and the consequences this has for the evaluation are made explicit.

Criterion 7: Valid conclusions
This criterion does not assess the conclusions' intrinsic substance but the way in which the conclusions have been reached.
Good: The conclusions derive from the findings. They are grounded on facts and analysis that are easily identifiable throughout the report, and they are linked to explicit judgement criteria. The limitations to the conclusions' validity are pointed out, as is the context in which the analysis was done.
Very good: The conclusions are clustered and prioritised. They are discussed in relation to the context in which the analysis was done. The limitations to their validity are made explicit and well grounded.
Poor: The conclusions stem from a hasty generalisation of some of the findings. The limitations to their validity are not pointed out.
Unacceptable: The conclusions are not backed up by relevant and thorough findings. They are partial because they reflect the evaluator's preconceived ideas rather than an analysis of the facts.
Excellent: The conclusions are reached in relation to the global nature of the intervention under evaluation. They take into account the intervention's connection with the context in which it takes place, considering in particular other programmes or connected public policies.

Criterion 8: Useful recommendations
This criterion does not judge the recommendations' intrinsic substance but the way in which they are articulated and whether they really derive from the conclusions.
Good: The recommendations follow logically from the conclusions. They are impartial.
Very good: In addition to the previous points, the recommendations are prioritised and clustered. They are presented in the form of options for possible actions.
Poor: The recommendations are not very clear, or they merely state the obvious without any added value, and their operability is questionable. The connection with the conclusions is not clear.
Unacceptable: The recommendations are disconnected from the conclusions. They are biased because they mostly reflect certain players' or beneficiaries' viewpoints, or the evaluation team's preconceived ideas.
Excellent: In addition to meeting the requirements for a "very good" rating, the recommendations are tested and their validity limitations are pointed out.

Criterion 9: Clear report
Good: The report is easy to read and its structure is logical.
Very good: The body of the report is short, concise and easy to read.
Poor: The report is hard to read and/or its structure is complex.
Unacceptable: There is no summary.

Criterion 10: Overall assessment
The general quality of the report results from the ratings assigned to each of the nine criteria.
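The text says only that the overall assessment "results from" the nine criterion ratings, without prescribing an aggregation rule. The sketch below is a hypothetical illustration of one possible convention, letting the weakest criterion cap the overall rating; this min-rating rule is an assumption of this example, not the Commission's documented procedure.

```python
# The five rating levels, ordered from worst to best (labels from the grid).
RATING_LEVELS = ["unacceptable", "poor", "good", "very good", "excellent"]

def overall_assessment(ratings: dict) -> str:
    """Return an overall rating as the weakest of the nine criterion ratings.

    The min-rating rule is an assumed convention for illustration only.
    """
    if len(ratings) != 9:
        raise ValueError("all nine criteria must be rated")
    # Map each label to its rank on the five-level scale, then take the minimum.
    rank = {level: i for i, level in enumerate(RATING_LEVELS)}
    return min(ratings.values(), key=lambda level: rank[level])

ratings = {c: "very good" for c in [
    "Meeting needs", "Relevant scope", "Defensible design", "Reliable data",
    "Sound data analysis", "Credible findings", "Valid conclusions",
    "Useful recommendations", "Clear report",
]}
ratings["Reliable data"] = "poor"
print(overall_assessment(ratings))  # -> "poor": one weak criterion caps the overall
```

A capping rule of this kind reflects the idea that the nine criteria are examined independently, so a serious weakness on one cannot be averaged away by strengths on the others; other conventions (e.g. assessor judgement, as the grid itself implies) are equally possible.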
Recommendations
Use of the term "criterion": a warning! The term "criterion" is used here in the sense of a quality criterion and should not be confused with the evaluation criteria (effectiveness, efficiency, etc.), nor with the judgement criteria (also called "reasoned assessment criteria").