Evaluation: Difference between revisions

From Wikipedia, the free encyclopedia
Spartakan (talk | contribs)
m See also: Links to Formative and Summative evaluation

Revision as of 14:04, 8 February 2010

Evaluation is the systematic determination of the merit, worth, and significance of something or someone using criteria against a set of standards. Evaluation is often used to characterize and appraise subjects of interest in a wide range of human enterprises, including the arts, criminal justice, foundations and non-profit organizations, government, health care, and other human services.


Definition

The definition of evaluation is often problematic and it can be argued that evaluation does not need a definition. Practical problems are not due to a lack of a definition but rather are a result of attempting to define evaluation.

Within the last three decades there have been tremendous theoretical and methodological developments within the field of evaluation (Hurteau, Houle, & Mongiat, 2009)[1]. Despite this progress, the field still faces many fundamental problems. As Davidson (2005) argues, ‘unlike medicine, evaluation is not a discipline that has been developed by practicing professionals over thousands of years, so we are not yet at the stage where we have huge encyclopaedias that will walk us through any evaluation step-by-step’, or that provide a clear definition of what evaluation entails (cited in Hurteau, Houle, & Mongiat, 2009, p.307). Following Davidson’s argument, one can argue that a key problem evaluators face is the lack of a clear definition of evaluation. Hurteau, Houle and Mongiat (2009, p.307) observe that the lack of a clear definition may “underline why program evaluation is periodically called into question as an original process, whose primary function is the production of legitimate and justified judgments which serve as the bases for relevant recommendations.” However, Potter (2006)[2] postulates that strict adherence to a set of methodological assumptions may make the field of evaluation more acceptable to a mainstream audience, but that this adherence will work towards preventing evaluators from developing new strategies for dealing with the myriad problems that programs face.

Datta (2006) states that “from an often huge body of relevant evaluations and reports, only about 10% of these reports”, or fewer, are used by the evaluands (clients) (cited in Hurteau, Houle, & Mongiat, 2009, p.308). Fournier and Smith (1993) comment that “when evaluation findings are challenged or utilization has failed, it was because stakeholders and clients found the inferences weak or the warrants unconvincing” (cited in Hurteau, Houle, & Mongiat, 2009, p.308). Some reasons for this situation may be the failure of the evaluator to establish a set of shared aims with the evaluand (client), the creation of overly ambitious aims, and the failure to compromise and to incorporate the cultural differences of individuals and programs within the evaluation aims and process (Reeve & Peerbhoy, 2007)[3].

None of these problems is due to the lack of a definition of evaluation; rather, they arise because evaluators attempt to impose predisposed notions and definitions of evaluation on clients. The central reason for the poor utilization of evaluations is arguably that evaluations are not tailored to suit the needs of the client, because evaluators work from a predefined idea (or definition) of what an evaluation is rather than from what the client's needs are. As House (1980) explains, “unless an evaluation provides an explanation for a particular audience, and enhances the understanding of that audience by the content and form of the arguments it presents, it is not an adequate evaluation for that audience, even though the facts on which it is based are verifiable by other procedures” (cited in Hurteau, Houle, & Mongiat, 2009, p.308).

If one were to page through many well-reviewed books on the topic of evaluation, one would be sure to come across definitions similar to the following. An evaluation is a systematic, rigorous, and meticulous application of scientific methods to assess the design, implementation, improvement or outcomes of a program (Rossi, Lipsey, & Freeman, 2004). It is a resource-intensive process, frequently requiring resources such as evaluator expertise, labour, time and a sizeable budget (Rossi, Lipsey, & Freeman, 2004)[4].

St Leger and Walsworth-Bell have defined evaluation as ‘the critical assessment, in as objective a manner as possible, of the degree to which a service or its component parts fulfils stated goals’ (cited in Reeve & Peerbhoy, 2007, p.122). The focus of this definition is on attaining objective knowledge, and on scientifically or quantitatively measuring predetermined and external concepts (Reeve & Peerbhoy, 2007).

Stufflebeam defines evaluation as ‘a study designed to assist some audience to assess an object’s merit and worth’ (cited in Reeve & Peerbhoy, 2007, p.122). In this definition the focus is on facts as well as on value-laden judgements of the program's outcomes and worth (Reeve & Peerbhoy, 2007).

Stake and Schwandt (2006) argue that “the main purpose of a program evaluation is to determine the quality of a program by formulating a judgment” (cited in Hurteau, Houle, & Mongiat, 2009, p.307). This definition is contested by Reeve and Peerbhoy (2007, p.122), who argue that “projects, evaluators and other stakeholders (including funders) will all have potentially different ideas about how best to evaluate a project since each may have a different definition of ‘merit’. The core of the problem is thus about defining what is of value.”

Reeve and Peerbhoy (2007, p.122) argue that evaluation “is a contested term”, as ‘evaluators’ use the term evaluation to describe an assessment or investigation of a program, whilst others simply understand evaluation as being synonymous with applied research. The above definitions outline some facets of evaluation whilst excluding other functions and goals of an evaluation. Not all evaluations serve the same purpose: some serve a monitoring function rather than focusing solely on measurable program outcomes or evaluation findings, and it would be a tremendous feat to define the numerous types of evaluations that can be conducted (Reeve & Peerbhoy, 2007). As Alkin and Ellett (1990, p.454) eloquently state, “evaluation is a large, but not unified theoretical area. No single theory specific to evaluation is available to describe, explain and predict all types of evaluation activities”.

Potter (2006, p.88) contends that “evaluation is an eclectic and diverse field”. He argues that this diversity is reflected in the body of literature around evaluation, as the literature “draws on a number of disciplines, which include management and organisational theory, policy analysis, education, sociology, social anthropology, and social change” (Potter, 2006, p.88). These arguments capture an important facet of evaluation, namely that evaluation is a theoretically informed approach, and consequently a definition of evaluation would have to be tailored to the theory, approach, needs, purpose and methodology of the evaluation itself. Therefore, the problem of defining evaluation is not that the field of evaluation does not permit a definition, but rather that a definition would depict evaluation as a static concept rather than the fluid approach it is.

Standards and meta-evaluation

Depending on the topic of interest, there are professional groups which look to the quality and rigor of the evaluation process.

The Joint Committee on Standards for Educational Evaluation [5] has developed standards for educational programmes, personnel, and student evaluation. The Joint Committee standards are broken into four sections: Utility, Feasibility, Propriety, and Accuracy. Various European institutions have also prepared their own standards, more or less related to those produced by the Joint Committee. They provide guidelines about basing value judgments on systematic inquiry, evaluator competence and integrity, respect for people, and regard for the general and public welfare.

The American Evaluation Association has created a set of Guiding Principles [6] for evaluators. The order of these principles does not imply priority among them; priority will vary by situation and evaluator role. The principles are Systematic Inquiry, Competence, Integrity/Honesty, Respect for People, and Responsibilities for General and Public Welfare.

Furthermore, international organizations such as the IMF and the World Bank have independent evaluation functions. The various funds, programmes, and agencies of the United Nations have a mix of independent, semi-independent and self-evaluation functions, which have organized themselves as a system-wide UN Evaluation Group (UNEG)[7] that works together to strengthen the function and to establish UN norms and standards for evaluation. There is also an evaluation group within the OECD-DAC, which endeavors to improve development evaluation standards.[8]

Approaches

Evaluation approaches are conceptually distinct ways of thinking about, designing and conducting evaluation efforts. Many of the evaluation approaches in use today make truly unique contributions to solving important problems, while others refine existing approaches in some way.

Classification of approaches

Two classifications of evaluation approaches by House [9] and Stufflebeam & Webster [10] can be combined into a manageable number of approaches in terms of their unique and important underlying principles.

House considers all major evaluation approaches to be based on a common ideology, liberal democracy. Important principles of this ideology include freedom of choice, the uniqueness of the individual, and empirical inquiry grounded in objectivity. He also contends they are all based on subjectivist ethics, in which ethical conduct is based on the subjective or intuitive experience of an individual or group. One form of subjectivist ethics is utilitarian, in which “the good” is determined by what maximizes some single, explicit interpretation of happiness for society as a whole. Another form of subjectivist ethics is intuitionist / pluralist, in which no single interpretation of “the good” is assumed and these interpretations need not be explicitly stated nor justified.

These ethical positions have corresponding epistemologies, that is, philosophies of obtaining knowledge. The objectivist epistemology is associated with the utilitarian ethic. In general, it is used to acquire knowledge capable of external verification (intersubjective agreement) through publicly inspectable methods and data. The subjectivist epistemology is associated with the intuitionist/pluralist ethic. It is used to acquire new knowledge based on existing personal knowledge and experiences that are (explicit) or are not (tacit) available for public inspection.

House further divides each epistemological approach by two main political perspectives. Approaches can take an elite perspective, focusing on the interests of managers and professionals. They also can take a mass perspective, focusing on consumers and participatory approaches.

Stufflebeam and Webster place approaches into one of three groups according to their orientation toward the role of values, an ethical consideration. The political orientation promotes a positive or negative view of an object regardless of what its value actually might be. They call this pseudo-evaluation. The questions orientation includes approaches that might or might not provide answers specifically related to the value of an object. They call this quasi-evaluation. The values orientation includes approaches primarily intended to determine the value of some object. They call this true evaluation.

When the above concepts are considered simultaneously, fifteen evaluation approaches can be identified in terms of epistemology, major perspective (from House), and orientation (from Stufflebeam & Webster). Two pseudo-evaluation approaches, politically controlled and public relations studies, are represented. They are based on an objectivist epistemology from an elite perspective. Six quasi-evaluation approaches use an objectivist epistemology. Five of them—experimental research, management information systems, testing programs, objectives-based studies, and content analysis—take an elite perspective. Accountability takes a mass perspective. Seven true evaluation approaches are included. Two approaches, decision-oriented and policy studies, are based on an objectivist epistemology from an elite perspective. Consumer-oriented studies are based on an objectivist epistemology from a mass perspective. Two approaches—accreditation/certification and connoisseur studies—are based on a subjectivist epistemology from an elite perspective. Finally, adversary and client-centered studies are based on a subjectivist epistemology from a mass perspective.
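The combined classification above can be written down as a small lookup structure. The following Python sketch is only an illustration of the paragraph, not part of House's or Stufflebeam & Webster's work: the `Approach` type, the field names, and the short labels "pseudo", "quasi" and "true" are assumptions introduced here, while the approach names and their placement follow the text.

```python
from collections import Counter
from typing import NamedTuple


class Approach(NamedTuple):
    """One evaluation approach and its position in the combined scheme."""
    name: str
    epistemology: str  # "objectivist" or "subjectivist" (House)
    perspective: str   # "elite" or "mass" (House)
    orientation: str   # "pseudo", "quasi" or "true" (Stufflebeam & Webster)


# The fifteen approaches, placed as described in the paragraph above.
APPROACHES = [
    Approach("Politically controlled", "objectivist", "elite", "pseudo"),
    Approach("Public relations", "objectivist", "elite", "pseudo"),
    Approach("Experimental research", "objectivist", "elite", "quasi"),
    Approach("Management information systems", "objectivist", "elite", "quasi"),
    Approach("Testing programs", "objectivist", "elite", "quasi"),
    Approach("Objectives-based", "objectivist", "elite", "quasi"),
    Approach("Content analysis", "objectivist", "elite", "quasi"),
    Approach("Accountability", "objectivist", "mass", "quasi"),
    Approach("Decision-oriented", "objectivist", "elite", "true"),
    Approach("Policy studies", "objectivist", "elite", "true"),
    Approach("Consumer-oriented", "objectivist", "mass", "true"),
    Approach("Accreditation / certification", "subjectivist", "elite", "true"),
    Approach("Connoisseur", "subjectivist", "elite", "true"),
    Approach("Adversary", "subjectivist", "mass", "true"),
    Approach("Client-centered", "subjectivist", "mass", "true"),
]


def group_by_cell(approaches):
    """Group approach names by (epistemology, perspective, orientation)."""
    cells = {}
    for a in approaches:
        key = (a.epistemology, a.perspective, a.orientation)
        cells.setdefault(key, []).append(a.name)
    return cells


# Number of approaches per orientation (pseudo, quasi, true).
orientation_counts = Counter(a.orientation for a in APPROACHES)

if __name__ == "__main__":
    for cell, names in group_by_cell(APPROACHES).items():
        print(", ".join(cell), "->", "; ".join(names))
```

Grouping by the three attributes reproduces the section headings used later in the article, for example "Subjectivist, mass, true evaluation" for the adversary and client-centered approaches.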

Summary of approaches

The following table summarizes each approach in terms of four attributes: organizer, purpose, strengths, and weaknesses. The organizer represents the main considerations or cues practitioners use to organize a study. The purpose represents the desired outcome for a study at a very general level. Strengths and weaknesses represent other attributes that should be considered when deciding whether to use the approach for a particular study. The narrative after the table highlights differences between approaches that are grouped together.

Summary of approaches for conducting evaluations

Approach | Organizer | Purpose | Key strengths | Key weaknesses
Politically controlled | Threats | Get, keep or increase influence, power or money. | Secure evidence advantageous to the client in a conflict. | Violates the principle of full & frank disclosure.
Public relations | Propaganda needs | Create positive public image. | Secure evidence most likely to bolster public support. | Violates the principles of balanced reporting, justified conclusions, & objectivity.
Experimental research | Causal relationships | Determine causal relationships between variables. | Strongest paradigm for determining causal relationships. | Requires controlled setting, limits range of evidence, focuses primarily on results.
Management information systems | Scientific efficiency | Continuously supply evidence needed to fund, direct, & control programs. | Gives managers detailed evidence about complex programs. | Human service variables are rarely amenable to the narrow, quantitative definitions needed.
Testing programs | Individual differences | Compare test scores of individuals & groups to selected norms. | Produces valid & reliable evidence in many performance areas. Very familiar to public. | Data usually only on testee performance, overemphasizes test-taking skills, can be poor sample of what is taught or expected.
Objectives-based | Objectives | Relate outcomes to objectives. | Common sense appeal, widely used, uses behavioral objectives & testing technologies. | Leads to terminal evidence often too narrow to provide basis for judging the value of a program.
Content analysis | Content of a communication | Describe & draw conclusions about a communication. | Allows for unobtrusive analysis of large volumes of unstructured, symbolic materials. | Sample may be unrepresentative yet overwhelming in volume. Analysis design often overly simplistic for question.
Accountability | Performance expectations | Provide constituents with an accurate accounting of results. | Popular with constituents. Aimed at improving quality of products and services. | Creates unrest between practitioners & consumers. Politics often forces premature studies.
Decision-oriented | Decisions | Provide a knowledge & value base for making & defending decisions. | Encourages use of evaluation to plan & implement needed programs. Helps justify decisions about plans & actions. | Necessary collaboration between evaluator & decision-maker provides opportunity to bias results.
Policy studies | Broad issues | Identify and assess potential costs & benefits of competing policies. | Provide general direction for broadly focused actions. | Often corrupted or subverted by politically motivated actions of participants.
Consumer-oriented | Generalized needs & values, effects | Judge the relative merits of alternative goods & services. | Independent appraisal to protect practitioners & consumers from shoddy products & services. High public credibility. | Might not help practitioners do a better job. Requires credible & competent evaluators.
Accreditation / certification | Standards & guidelines | Determine if institutions, programs, & personnel should be approved to perform specified functions. | Helps public make informed decisions about quality of organizations & qualifications of personnel. | Standards & guidelines typically emphasize intrinsic criteria to the exclusion of outcome measures.
Connoisseur | Critical guideposts | Critically describe, appraise, & illuminate an object. | Exploits highly developed expertise on subject of interest. Can inspire others to more insightful efforts. | Dependent on small number of experts, making evaluation susceptible to subjectivity, bias, and corruption.
Adversary | “Hot” issues | Present the pros & cons of an issue. | Ensures balanced presentation of represented perspectives. | Can discourage cooperation, heighten animosities.
Client-centered | Specific concerns & issues | Foster understanding of activities & how they are valued in a given setting & from a variety of perspectives. | Practitioners are helped to conduct their own evaluation. | Low external credibility, susceptible to bias in favor of participants.

Note. Adapted and condensed primarily from House (1978) and Stufflebeam & Webster (1980).

Pseudo-evaluation

Politically controlled and public relations studies are based on an objectivist epistemology from an elite perspective. Although both of these approaches seek to misrepresent value interpretations about some object, they go about it a bit differently. Information obtained through politically controlled studies is released or withheld to meet the special interests of the holder.

Public relations information is used to paint a positive image of an object regardless of the actual situation. Neither of these approaches is acceptable evaluation practice, although the seasoned reader can surely think of a few examples where they have been used.

Objectivist, elite, quasi-evaluation

As a group, these five approaches represent a highly respected collection of disciplined inquiry approaches. They are considered quasi-evaluation approaches because particular studies legitimately can focus only on questions of knowledge without addressing any questions of value. Such studies are, by definition, not evaluations. These approaches can produce characterizations without producing appraisals, although specific studies can produce both. Each of these approaches serves its intended purpose well. They are discussed roughly in order of the extent to which they approach the objectivist ideal.

Experimental research is the best approach for determining causal relationships between variables. The potential problem with using this as an evaluation approach is that its highly controlled and stylized methodology may not be sufficiently responsive to the dynamically changing needs of most human service programs.

Management information systems (MISs) can give detailed information about the dynamic operations of complex programs. However, this information is restricted to readily quantifiable data usually available at regular intervals.

Testing programs are familiar to just about anyone who has attended school, served in the military, or worked for a large company. These programs are good at comparing individuals or groups to selected norms in a number of subject areas or to a set of standards of performance. However, they only focus on testee performance and they might not adequately sample what is taught or expected.

Objectives-based approaches relate outcomes to prespecified objectives, allowing judgments to be made about their level of attainment. Unfortunately, the objectives are often not proven to be important or they focus on outcomes too narrow to provide the basis for determining the value of an object.

Content analysis is a quasi-evaluation approach because content analysis judgments need not be based on value statements. Instead, they can be based on knowledge. Such content analyses are not evaluations. On the other hand, when content analysis judgments are based on values, such studies are evaluations.

Objectivist, mass, quasi-evaluation

Accountability is popular with constituents because it is intended to provide an accurate accounting of results that can improve the quality of products and services. However, this approach quickly can turn practitioners and consumers into adversaries when implemented in a heavy-handed fashion.

Objectivist, elite, true evaluation

Decision-oriented studies are designed to provide a knowledge base for making and defending decisions. This approach usually requires close collaboration between an evaluator and a decision-maker, which makes it susceptible to corruption and bias.

Policy studies provide general guidance and direction on broad issues by identifying and assessing potential costs and benefits of competing policies. The drawback is that these studies can be corrupted or subverted by the politically motivated actions of the participants.

Objectivist, mass, true evaluation

Consumer-oriented studies are used to judge the relative merits of goods and services based on generalized needs and values, along with a comprehensive range of effects. However, this approach does not necessarily help practitioners improve their work, and it requires a very good and credible evaluator to do it well.

Subjectivist, elite, true evaluation

Accreditation / certification programs are based on self-study and peer review of organizations, programs, and personnel. They draw on the insights, experience, and expertise of qualified individuals who use established guidelines to determine if the applicant should be approved to perform specified functions. However, unless performance-based standards are used, attributes of applicants and the processes they perform often are overemphasized in relation to measures of outcomes or effects.

Connoisseur studies use the highly refined skills of individuals intimately familiar with the subject of the evaluation to critically characterize and appraise it. This approach can help others see programs in a new light, but it is difficult to find a qualified and unbiased connoisseur.

Subjectivist, mass, true evaluation

The adversary approach focuses on drawing out the pros and cons of controversial issues through quasi-legal proceedings. This helps ensure a balanced presentation of different perspectives on the issues, but it is also likely to discourage later cooperation and heighten animosities between contesting parties if “winners” and “losers” emerge.

Client-centered studies address specific concerns and issues of practitioners and other clients of the study in a particular setting. These studies help people understand the activities and values involved from a variety of perspectives. However, this responsive approach can lead to low external credibility and a favorable bias toward those who participated in the study.

Methods and techniques

Evaluation is methodologically diverse, using both qualitative and quantitative methods, including case studies, survey research, statistical analysis, and model building, among others. A more detailed list of methods, techniques and approaches for conducting evaluations would include the following:

See also

Notes and references

  1. Hurteau, M., Houle, S., & Mongiat, S. (2009). How Legitimate and Justified are Judgments in Program Evaluation? Evaluation, 15(3), 307-319.
  2. Potter, C. (2006). Psychology and the art of program evaluation. South African Journal of Psychology, 36(1), 82-102.
  3. Reeve, J., & Peerbhoy, D. (2007). Evaluating the evaluation: Understanding the utility and limitations of evaluation as a tool for organizational learning. Health Education Journal, 66(2), 120-131.
  4. Rossi, P.H., Lipsey, M.W., & Freeman, H.E. (2004). Evaluation: A systematic approach (7th edition). Thousand Oaks: Sage.
  5. Joint Committee on Standards for Educational Evaluation
  6. American Evaluation Association Guiding Principles for Evaluators
  7. UNEG
  8. DAC Network on Development Evaluation Home Page
  9. House, E. R. (1978). Assumptions underlying evaluation models. Educational Researcher, 7(3), 4-12.
  10. Stufflebeam, D. L., & Webster, W. J. (1980). An analysis of alternative approaches to evaluation. Educational Evaluation and Policy Analysis, 2(3), 5-19.