Evaluation methodological approach


Judgement references

This section is structured as follows:

JUDGEMENT CRITERION

What does this mean?

A judgement criterion specifies an aspect of the evaluated intervention that will allow its merits or success to be assessed. Whilst "judgement criterion" is the appropriate term, an acceptable alternative is "reasoned assessment criterion". The criterion is used to answer an evaluation question: one or more judgement criteria are derived from each question.

What is the purpose?

  • To avoid subjectivity and to formulate judgements on accepted terms.
  • To improve the transparency of the evaluation by making the judgement explicit.
  • To structure the answers to the questions asked, since the judgement criteria will determine the indicators and, more generally, the nature of the data collected and the type of analysis.

How can a judgement criterion be clarified on the basis of a question?

All the evaluation questions relate to one or more judgement criteria, unless they are designed only to further knowledge or understanding about the intervention or its effects. The following is an example of a question:

Question
To what extent has EC aid improved the capacity of the primary educational system to enrol pupils from underprivileged groups without discrimination?

Like most evaluative questions, it has two parts:

  • What is being judged: "EC aid".
  • The way of judging: has it "improved the capacity of the primary educational system to enrol pupils from underprivileged groups without discrimination".

This question belongs to the effectiveness family of evaluation criteria.

The judgement criteria develop and specify the second part of the question, for example:

Judgement criteria
Capacity of the primary school system to enrol pupils from ethnic minority X satisfactorily. 

Capacity of the primary school system to enrol pupils from disadvantaged urban areas satisfactorily.

The judgement criteria derive from the question, for instance in the case of the first criterion:

  • It concerns the way of judging and not what is judged. This is why the beginning of the question concerning "EC aid" has been removed.
  • It specifies the type of success to be evaluated, that is, an improvement in the "capacity of the primary school system to enrol pupils from underprivileged groups without discrimination", and specifically "pupils from ethnic minority X".
  • It emphasises the judgement and not the causality analysis. That is why the terms "To what extent … has it improved" have been removed.

To be used in practice, each judgement criterion has to be accompanied by a target level and one or more indicator(s).

Recommendations

  • Always define the judgement criterion before selecting an existing indicator or designing a new indicator. This is essential in order to clarify the concepts. By focusing too soon on indicators, one is likely to get trapped into existing information, even if it is inadequate for answering the question asked.
  • Have the definition of the judgement criteria discussed by the reference group so that the diversity of points of view relevant to the intervention can be taken into account.
  • There may be disagreement on the judgement criteria, for instance the same effect may have a dimension that is judged positively by certain members of the reference group and another dimension judged negatively by others. In this case there are two options: (1) choose only one judgement criterion but be careful to avoid biasing the evaluation; or (2) choose several criteria, although this will increase and complicate the data collection and analysis work.
  • To optimise the collection and analysis of data, it is best to define a limited number of judgement criteria for each question. This recommendation also takes into account users' capacity to absorb information.
  • Where relevant, explain any gaps between the criteria used to formulate the judgement at the end of the evaluation process and those identified in the first phase (desk) of the evaluation.

Be careful not to confuse concepts

On this site, the word criterion is used for three different concepts:

  • Judgement criteria presented on this page
  • The evaluation criteria: relevance, effectiveness, efficiency, sustainability, impact, Community value added, and coherence/complementarity.
  • The quality assessment criteria of evaluation reports.

According to the EC, the value added of an evaluation is the formulation of value judgements on the basis of evidence and explicit judgement criteria. When dealing with organisations which are not familiar with evaluation, it may be wise not to use the word "judgement", which may induce resistance. An acceptable alternative is "assessment", or preferably "reasoned assessment".

TARGET

What does this mean?

The concept of a 'target' is widely used in the context of public management for setting a verifiable objective or a level of performance to be achieved. In an evaluation context it is used in a much wider sense, since the evaluated intervention may have to be judged against targets that were not set in advance but are identified specifically for the evaluation, such as a benchmark, a success threshold or a comparable good practice.

What is the purpose?

  • To avoid subjectivity and formulate a judgement on accepted and recognised terms.


 

How can they be determined?

- By reference to an objective defined in a verifiable way

The target may appear in one of the intervention objectives, provided that these objectives have been established in a verifiable way. In this particular case, the same indicator helps to define the objective, to make the judgement criterion operational and to determine the target.

  • Example: the number of qualified and experienced teachers per 1,000 children of primary-school age is at least 20.

- In relation to comparable good practices outside the intervention

In this case, the target is established at the outset of the evaluation. It is not related to an objective or a performance framework existing prior to the evaluation.

  • Example: the access to primary education with qualified and experienced teachers is at least as satisfactory as in the case of X (recognised good practice at regional level).

The procedure is as follows:

  • Identify a comparable practice recognised for its quality (a similar EC intervention in another country; an intervention by another donor; an intervention in another sector using the same instruments).
  • Obtain information on the practice for comparison (this is easier if it has already been evaluated).
  • Ensure that the contextual conditions are close enough that they allow for comparison.
  • Proceed to carry out the comparison (essentially qualitative).
  • Discuss and validate the comparison with the reference group.

- Compared to best practices identified within the intervention

The target can be found within the evaluated intervention itself during the synthesis phase, provided that specific practices can be considered as good as regards the judgement criteria under consideration. 

In this case, the good practices will serve as benchmarks to judge the others. Of course, it is advisable to check that the contextual conditions are close enough so as to allow for comparison.

  • Example: In areas where ethnic minority X concentrates, the number of qualified and experienced teachers per 1,000 children of primary-school age is close to the best performing areas in the country.

When should they be determined?

- Earlier or later in the evaluation process

If the target is derived from a verifiable objective or a performance framework, then it can be determined at the very first stage of the evaluation process. 

If the target is derived from an outside benchmark, then it should be identified during the early stages of the evaluation. However, the process may involve the gathering of secondary data with a view to specifying the benchmark, as well as a careful examination of comparability. This means that the target will not be completely defined in the first phase of the evaluation. 

If the target is to be derived from the best practices discovered within the intervention by the evaluation team, it will be determined in the synthesis phase.

- After choosing the judgement criterion

Determining the target is the third step in a three-step process:

  • Choice and finalisation of the evaluation question.
  • Choice of the judgement criterion (or criteria).
  • Joint determination of the targets, the indicators and the sources of information.

Evaluation targets and others

When the evaluation question pertains to an intended result or impact, the target level is usually derived from a verifiable objective or borrowed from a performance assessment framework. 

Performance monitoring may however be of little or no help in the instance of evaluation questions relating to cross-cutting issues, sustainability factors, unintended effects, evolving needs and problems, coherence, etc.

INDICATOR

What does this mean?

The evaluation team may use any kind of reliable data to assess whether an intervention has been successful or not in relation to a judgement criterion and a target. 

Data may be collected in a structured way by using indicators. Indicators specify precisely which data are to be collected. An indicator may be quantitative or qualitative. In the latter case the scoring technique may be used. 

Unstructured data are also collected during the evaluation, either incidentally, or because tools such as case studies are used. This kind of evidence may be sound enough to be a basis for conclusions, but it is not an indicator. 
 

What is the purpose?

  • To collect and process data in a form that can be used directly when answering questions.
  • To avoid collecting an excessive amount of irrelevant data and to focus the process on the questions asked.

Evaluation indicators

The main evaluation indicators are those related to judgement criteria, which specify the data needed to make a judgement based on those criteria. 

An indicator can be constructed specifically for an evaluation (ad hoc indicator) and measured during a survey, for example. It may also be drawn from monitoring databases, a performance assessment framework, or statistical sources. 

A qualitative indicator (or descriptor) takes the form of a statement that has to be verified during the data collection (e.g. parents' opinion is that their children have the possibility of attending a primary school class with a qualified and experienced teacher). 

A quantitative indicator is based on a counting process (e.g. number of qualified and experienced teachers). The basic indicator directly results from the counting process. It may be used for computing more elaborate indicators (ratios, rates) such as cost per pupil or number of qualified and experienced teachers per 1,000 children of primary-school age. 
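The step from a basic indicator to a more elaborate one can be sketched as a short computation (the figures below are invented for illustration):

```python
def teachers_per_1000_children(qualified_teachers: int, children: int) -> float:
    """Derived indicator: qualified and experienced teachers
    per 1,000 children of primary-school age."""
    return 1000 * qualified_teachers / children

# Basic indicators, obtained from a counting process (invented numbers):
area_teachers = 180
area_children = 12_000

ratio = teachers_per_1000_children(area_teachers, area_children)
print(round(ratio, 1))  # 15.0
```

The basic indicators (counts of teachers and children) are what data collection actually produces; the derived ratio is what the judgement criterion needs.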

Indicators may belong to different categories: inputs, outputs, results or impacts. 

Evaluation indicators and others

When an evaluation question pertains to an intended result or impact, it is worth checking whether this result or impact has been subject to performance monitoring. In such cases, the evaluation team uses the corresponding indicators and data, which is a considerable help, especially if baseline data have been recorded. 

Performance monitoring may, however, be of little or no help in the instance of evaluation questions relating to cross-cutting issues, sustainability factors, unintended effects, evolving needs or problems, coherence, etc. 
 

Quality of an indicator

An indicator measures or qualifies with precision the judgement criterion or variable under observation (construct validity). If necessary, several less precise indicators (proxies) may be used to enhance validity. 

It provides straightforward information that is easy to communicate and is understood in the same way by the information supplier and the user. 

It is precise, that is, associated with a definition containing no ambiguity. 

It is sensitive, that is, it generates data which vary significantly when a change appears in what is being observed. 

Performance indicators and targets are often expected to be SMART, i.e. Specific, Measurable, Attainable, Realistic and Timely. The quality of an evaluation indicator is assessed differently. 

Indicators and effects: a warning!

The indicator used to evaluate an effect is not in itself a measurement or evidence of that effect. The indicator only informs on changes, which may either result from the intervention (effect) or from other causes. 

The evaluation team always has to analyse or interpret the indicator in order to assess the effect.

Categories of indicators

- Indicators and the intervention cycle

Indicators are used throughout the intervention cycle: first to analyse the context; then for the choice and validation of the intervention strategy; afterwards for monitoring outputs and results; and finally for the evaluation.

Indicators and intervention design

Context indicators may be used to support the identification of the needs, problems and challenges which justify the intervention. 

As far as possible, objectives and targets are defined in a measurable way by using indicators.

Indicators, monitoring and performance assessment

Monitoring systems and performance assessment frameworks also use indicators which derive from the diagram of expected effects (also called results chain). 

Monitoring indicators primarily relate to inputs and outputs, while performance indicators primarily focus on intended results and impacts. The EC's Result Oriented Monitoring (ROM) relies less heavily on indicators: it delivers systematic assessments of external aid projects in the form of ratings with regard to intended results and impacts.

Indicators and evaluation

Evaluation indicators are used to help answering specific evaluation questions. Depending on the question, they may relate to the needs, problems and challenges which have justified the intervention, or to the achievement of intended outputs, results and impacts, or to anything else.

- Global and specific indicators

Global or contextual indicators apply to an entire territory, population or group, without any distinction between those who have been reached by the intervention and those who have not. They are mainly taken from statistical data. This site offers help to look for contextual indicators. 

Specific indicators concern only a group or territory that has actually been reached. With specific indicators, changes among those affected by the intervention can be monitored. Most of these indicators are produced through surveys and management databases.

- Indicators and intervention logic

Input indicators

Input indicators provide information on financial, human, material, organisational or regulatory resources mobilised during the implementation of the intervention. Most input indicators are quantified on a regular basis by the management and monitoring systems (provided that they are operational).

Output indicators

Output indicators provide information on the operators' activity, especially on the products and services that they deliver and for which they are responsible. To put it simply, one could say that outputs correspond to what is bought with public money.

Result indicators

Result indicators provide information on the immediate effects of the intervention for its direct addressees. An effect is immediate if the operator notices it easily while he/she is in contact with an addressee. Because they are easily recognised by the operators, direct result indicators can be quantified exhaustively by the monitoring system.

Impact indicators

Impact indicators provide information on the long-term direct and indirect consequences of the intervention. 

A first category concerns the consequences that appear or last in the medium or long term for the direct beneficiaries. 

A second category of impacts concerns people or actors that are not direct beneficiaries. 

Impact indicators cannot be produced in general from management information. They require statistical data or surveys specially conducted during the evaluation process.

Indicators derived from scoring

What does this mean?

Scoring (or rating) produces figures that synthesise a set of qualitative data and/or opinions. Scoring is guided by a scoring grid (or scorecard) with varying degrees of detail. 

From an evaluation point of view, the terms scoring and rating can both be used.

What is the point?

Scoring allows the production of structured and comparable data on judgement criteria that do not lend themselves to a measurement using quantitative indicators.

How to construct a scoring grid

  • Examine several possible dimensions for the criterion that has to be assessed (sub-criteria).
  • For each dimension or sub-criterion, write a short sentence defining the success of the intervention (descriptor of full success).
  • For each dimension or sub-criterion, write another sentence (descriptor) defining the failure of the intervention.
  • Write one or more sentences (descriptors) that represent intermediate levels of success.
  • Associate a score with each descriptor (e.g. from 0 to 10, from 0 to 3, from -3 to +3).
  • Weight the sub-criteria if necessary.
  • Test the grid on a few pilot examples.
  • Discuss the test with the reference group if relevant.

How to use the scoring grid

Scoring grids usually apply to projects or components of the intervention and allow for comparing these. 

The evaluation team puts together all the data it has on the project or intervention to be assessed. It then chooses the level (or descriptor) in the scoring grid that corresponds best (or the least badly) to this information. The score results from this choice. 
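The construction and use of a scoring grid can be sketched as follows. All sub-criteria, descriptors, weights and scores below are invented for illustration; a real grid would be built and validated with the reference group as described above:

```python
# A scoring grid: for each sub-criterion, descriptors from failure to
# full success, each associated with a score, plus an optional weight.
grid = {
    "access": {
        "weight": 2,
        "descriptors": {  # descriptor -> score (0 = failure, 3 = full success)
            "virtually no enrolment of minority X pupils": 0,
            "enrolment improving but well below national average": 1,
            "enrolment close to national average": 2,
            "enrolment at or above national average": 3,
        },
    },
    "teaching quality": {
        "weight": 1,
        "descriptors": {
            "few qualified teachers in minority X areas": 0,
            "qualified teachers present but overstretched": 2,
            "qualified, experienced teachers in adequate numbers": 3,
        },
    },
}

def weighted_score(choices: dict) -> float:
    """Given the descriptor chosen for each sub-criterion,
    return the weighted average score."""
    total = sum(grid[c]["weight"] * grid[c]["descriptors"][d]
                for c, d in choices.items())
    weights = sum(grid[c]["weight"] for c in choices)
    return total / weights

# The evaluation team matches its evidence to the closest descriptor:
choices = {
    "access": "enrolment close to national average",
    "teaching quality": "qualified teachers present but overstretched",
}
print(weighted_score(choices))  # (2*2 + 1*2) / 3 = 2.0
```

Because the score is determined by choosing a pre-agreed descriptor rather than by free judgement, two evaluators working from the same evidence should arrive at comparable scores.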

Recommendations

The more detailed the scoring grid, the less subjective the score will be and the more comparable the scores allocated by two different evaluators.

FROM QUESTIONS TO INDICATORS

Example of a question

  • To what extent has EC support improved the capacity of the educational system to enrol pupils from disadvantaged groups without discrimination?

From questions to judgement criteria

The judgement criterion (also called reasoned assessment criterion) specifies an aspect of the evaluated intervention that will allow its merits or worth to be assessed in order to answer the evaluation question. For instance:

Judgement criterion derived from the question

  • Capacity of the primary school system to enrol pupils from ethnic minority X with satisfactory quality.

The judgement criterion gives a clear indication of what is positive or negative, for example: "enhancing the expected effects" is preferable to "taking potential effects into account".
 

A more precise judgement criterion than the question

The question is drafted in a non-technical way with wording that is easily understood by all, even if it lacks precision. 

The judgement criterion focuses the question on the most essential points for the judgement. 

Yet the judgement criterion does not need to be totally precise. In the first example the term "satisfactory quality" can be specified elsewhere (at the indicator stage).

Not too many criteria

It is often possible to define many judgement criteria for the same question, but this would complicate the data collection and make the answer less clear. 

In the example below, the question is treated with three judgement criteria (multicriteria approach):

  • "capacity of the primary school system to enrol pupils from ethnic minority X with satisfactory quality"
  • "capacity of the primary school system to enrol pupils from the poorest urban areas with satisfactory quality"
  • "capacity of the primary school system to enrol girls".

A judgement criterion corresponding to the question

The judgement criterion should not betray the question. In the following example, two judgement criteria are considered for answering the same question:

  • "capacity of the primary school system to enrol pupils from ethnic minority X with satisfactory quality"
  • "primary school leavers from ethnic minority X pass their final year exam"

The first judgement criterion is faithful to the question, while the second is less so in so far as it concerns the success in primary education, whereas the question concerns only the access to it. The question may have been badly worded, in which case it may be amended if there is still time.

Also specify the scope of the question

Most questions have a scope (what is judged) and a judgement criterion (the way of judging). In addition to the judgement criterion, it is therefore often desirable to specify the scope of the question, for example: "European aid granted over the past X years", "design of programme X", "the principle of decentralisation adopted to implement action X".

Also specify the type of cause-and-effect analysis

Some questions imply a cause-and-effect analysis prior to the judgement. It may therefore also be useful to specify the type of analysis required by means of terms such as "has European aid led to", "has it contributed to", "is it likely to".

From judgement criteria to indicators

An indicator describes in detail the information required to answer the question according to the judgement criterion chosen, for example:

Indicator derived from the judgement criterion

  • Number of qualified and experienced teachers per 1000 children of primary-school age in areas where ethnic minority X concentrates

Not too many indicators

It is possible to define many indicators for the same judgement criterion. Relying upon several indicators allows for cross-checking and strengthens the evidence base on which the question is answered. However, an excessive number of indicators involves a heavy data collection workload without necessarily improving the soundness of the answer to the question. 

In the examples below three indicators are applied to a judgement criterion ("capacity of the primary school system to enrol pupils from ethnic minority X with satisfactory quality"):

  • "Number of qualified and experienced teachers per 1000 children of primary-school age in areas where ethnic minority X concentrates"
  • "Number of pupils per teacher in areas where ethnic minority X concentrates"
  • "Level of quality of the premises (scale 1 to 3) assigned to primary education in areas where ethnic minority X concentrates".

Indicator corresponding to the judgement criterion

The indicator should not betray the judgement criterion. Two indicators are considered below:

  • "Number of qualified and experienced teachers per 1000 children of primary-school age in areas where ethnic minority X concentrates."
  • "Primary education enrolment rate in areas where ethnic minority X concentrates"

The first indicator corresponds faithfully in so far as it describes an essential aspect of the judgement criterion. The second indicator is less faithful because it fails to reflect the concept of "satisfactory quality". Its construct validity is not good.

Unambiguous indicators

An indicator must be defined without any ambiguity and understood in the same way by all the members of the evaluation team. For instance, in the above examples it is necessary to specify what a "qualified and experienced teacher" is. This can be done with reference to an existing definition, or else a definition can be formulated as precisely as possible until there is no more ambiguity whatsoever.

Indicators independent from the observation field

The same indicator should be able to serve to collect data in several contexts, for example:

  • "Number of qualified and experienced teachers per 1000 children of primary-school age" - in areas where ethnic minority X concentrates.
  • "Number of qualified and experienced teachers per 1000 children of primary-school age" - in areas where ethnic minority X is absent.

In this case the same indicator is applied in both types of area and serves as a comparison, on the basis of which a judgement is formulated.

Quantitative and qualitative indicators

The following two examples present an alternative between a quantitative indicator and a qualitative indicator for treating the same judgement criterion:

  • "Number of qualified and experienced teachers per 1000 children of primary-school age" (quantitative)
  • "Surveyed parents confirm that their children have the possibility of attending a primary-school class and benefit from a qualified and experienced teacher" (qualitative).

An indicator is preferably associated with a target

The target indicates which comparison should be made in order to answer the question, for example: "In the areas where ethnic minority X concentrates, the indicator is at least as good as the national average". 

The target and the indicator are often specified interactively in successive steps. It is important not to digress from the judgement criterion during this process. 

When the evaluation question pertains to an intended result or impact, the target is usually derived from a verifiable objective or borrowed from a performance assessment framework.
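A comparison-based target of this kind amounts to a simple check once the indicator has been measured. The function name and figures below are invented for illustration:

```python
def target_met(indicator_in_area: float, national_average: float) -> bool:
    """Target: 'in areas where ethnic minority X concentrates,
    the indicator is at least as good as the national average'."""
    return indicator_in_area >= national_average

# Teachers per 1,000 primary-school-age children (invented figures):
print(target_met(15.0, 20.0))  # False: below the national average
print(target_met(21.5, 20.0))  # True: target reached
```

The judgement itself remains with the evaluation team; the target only makes explicit the comparison on which that judgement rests.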

Feasible indicators

The indicator makes it possible to focus and structure data collection, but it serves no purpose if the corresponding data do not exist. To ensure the feasibility of an indicator, it is necessary to indicate the source of the information to be used, for example:

  • management data from the national education system,
  • periodical national surveys on the education system,
  • questionnaire survey run in several specially selected areas as part of the evaluation.

If no source is available or feasible, the indicator should be changed. If no feasible indicator can be found, excluding the question should be envisaged. 

Example of a country evaluation

Question

Question
To what extent does the EC ensure coherence between its support and its other policies?
  • The question relates to a family of evaluation criteria: coherence/complementarity.

Judgement criterion

Judgement criterion
The expected effects of the intervention and the effects of other EC policies affecting the partner country are likely to reinforce each other.

The judgement criterion is derived from the question in the following way:

  • It shows the type of success that the question is supposed to evaluate, that is, "coherence between the aid and other EC policies".
  • It specifies several concepts, primarily that of coherence but also the term "other policies".
  • It concerns the way of judging and not what is judged. That is why the beginning of the question concerning "To what extent does the EC ensure" has been removed.

Indicator

Indicator
Positive/negative synergy between the expected effects of the intervention and the expected effects of other EC policies, as regards affected groups in the partner country.

The indicator is derived from the judgement criterion in the following way:

  • It corresponds faithfully to the judgement criterion.
  • It describes in detail the data to be collected in order to apply the judgement criterion chosen. However, terms such as "positive/negative synergy" still have to be defined further. The definition must be drafted as precisely as possible, until there is no more ambiguity whatsoever.
  • It is qualitative, although a quantitative indicator could have been defined, for example: "proportion of positive synergies among identified synergies".
  • It makes it possible to define a target, for example by comparing the respective importance of positive and negative synergies.
  • Its feasibility still has to be verified by ensuring that one or more sources of information and evaluation tools will be available, for example: opinion of a panel of independent experts, questionnaire administered at the end of a focus group composed of administrators.

Example of a sector evaluation

Question

Question
To what extent has EC support enhanced the capacity of the educational system to enrol pupils from disadvantaged groups without discrimination?

The question refers to a family of evaluation criteria: effectiveness.

Judgement criterion

Judgement criterion
Capacity of the primary school system to enrol pupils from ethnic minority X with satisfactory quality.

The judgement criterion is derived from the question in the following way:

  • It shows the type of success that the question is supposed to evaluate, that is, an improved "capacity of the educational system to enrol pupils from disadvantaged groups without discrimination".
  • It clarifies several concepts such as "educational system", "disadvantaged groups" and "discrimination". The term "satisfactory quality" has yet to be specified.
  • It concerns the way of judging and not what is judged. That is why the beginning of the question concerning "EC support" has been removed.
  • It focuses on the judgement and not on the causal analysis. That is why the term "has … enhanced" has been removed.

Indicator

Indicator
Number of qualified and experienced teachers per 1000 children of primary-school age in areas where the ethnic minority X concentrates.

The indicator derives from the judgement criterion in the following way:

  • It describes in detail the information required to judge according to the judgement criterion chosen. But this is not enough. In particular, it is necessary to specify what a "qualified and experienced teacher" is. This can be done by referring to an existing definition or else by formulating a definition as precisely as possible until there is no more ambiguity whatsoever.
  • The indicator corresponds faithfully to the judgement criterion ("capacity of the primary school system to enrol pupils with satisfactory quality"). It does not encompass all the dimensions of the judgement criterion but highlights what is considered as essential.
  • It is quantitative, but a qualitative indicator could also have been defined, for example: "surveyed parents confirm that their children have the possibility of attending a primary school class and benefit from a qualified and experienced teacher".
  • It will make it possible to define a target, for example by comparing "areas where ethnic minority X concentrates" with "the entire country".
  • Its feasibility still has to be verified by ensuring that one or more sources of information are available, for example: management data of the national education system, periodical national surveys on the educational system, questionnaire survey run as part of the evaluation.