Program Evaluation

Welcome to the Program Evaluation Resource Center. This site provides a range of resources for individuals seeking accessible information about program evaluation concepts and applications. Information on this page is compiled by the Office of Assessment, Evaluation, and Research Services (OAERS) in consultation with members of the Department of Educational Research Methodology at The University of North Carolina at Greensboro.

Overview of Program Evaluation

Program evaluation is a systematic process of gathering evidence to inform judgments about whether a program, product, policy, or system is meeting its goals and how it can be improved to better do so. This definition reflects the evolution in the conceptualization of program evaluation that began with the early work of Michael Scriven (1967) and extends through the later works of Scriven (1991), Carol Weiss (1998), Melvin Mark, Gary Henry, and George Julnes (2000), Robert Stake (2000), Daniel Stufflebeam (2001), and Thomas Schwandt (2008).

Several key terms are commonly used in describing the evaluation process. These key terms are:

Evaluand: The evaluand is the program, product, policy, or system that is being evaluated.

Evaluator: The evaluator is an individual involved in conducting the program evaluation. Evaluators who are internal to the client’s organization or group are referred to as internal evaluators. Evaluators who are hired from outside of the client’s organization or group are referred to as external evaluators.

Stakeholders: The stakeholders associated with a program evaluation are the individuals who participate in, or are affected by, the program, product, policy, or system being evaluated.

Formative Evaluation: The primary purpose of a formative evaluation is to provide ongoing information for evaluand improvement.

Summative Evaluation: The primary purpose of a summative evaluation is to provide information to make programmatic decisions or to judge whether a program should be adopted, continued, or expanded.

Phases of a Program Evaluation

The process of program evaluation involves the following key phases:

Phase 1. Gather information about the evaluand and client needs.

Phase 2. Draft the evaluation proposal and a formal agreement with the client that specifies the terms of the evaluation. The evaluation proposal should: (a) describe the evaluand, its context, and key stakeholders (this can be facilitated with a logic model, which is a diagram that explains the theory behind the program being evaluated); (b) outline the evaluation approach; and (c) align the evaluation questions, methods, and procedures with the evaluation approach. Stufflebeam’s (1999) evaluation contract checklist serves as a useful reference when drafting the contract (http://www.wmich.edu/evalctr/archive_checklists/contracts.pdf).

Phase 3. Collect the data, or information needed to answer the evaluation questions.

Phase 4. Analyze the data and use the resulting information as evidence to justify answers to the evaluation questions and, possibly, future recommendations.

Phase 5. Share findings and recommendations with the client and any other stakeholder groups identified by the client.

The Logic Model

A logic model is a diagram that presents key features of the program being evaluated. According to Jody Fitzpatrick, James Sanders, and Blaine Worthen (2011), the logic model is helpful for program planning, evaluation, and research because, in addition to depicting program theory, it can help the evaluator understand the reasoning behind program goals, which can differ from the stated program goals. While logic models may be included in any type of program evaluation, they are often included in program-oriented approaches, such as objectives-oriented and theory-based evaluation (Fitzpatrick et al., 2011). The basic design of a logic model includes information about program inputs, activities, outputs, and outcomes. Program inputs are the resources needed to run the program, such as annual budgets, staffing, facilities, equipment, and materials (Fitzpatrick et al., 2011). Program activities are events such as weekly meetings, curricula, workshops, conferences, recruitment, clinical services, newsletters, staff training, and the other key components of the program (Fitzpatrick et al., 2011). Outputs are immediate program products such as the number of participants or clients served each week, the number of class meetings, the number of hours of direct service to each participant, and the number of newsletters distributed (Fitzpatrick et al., 2011). Lastly, outcomes are the immediate, intermediate, and long-term changes expected in participants after they finish the program (Fitzpatrick et al., 2011). Figure 1 displays a sample logic model from the United Way of America, a major organization that was instrumental in the development and use of logic models to improve program organization and evaluation. The W.K. Kellogg Foundation and the Annie E. Casey Foundation have also been key organizations providing training in the use of logic models (Fitzpatrick et al., 2011).

Figure 1. United Way Sample Logic Model
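
To make the four basic components concrete, the hypothetical sketch below represents a logic model as a simple Python mapping. The program and all of its entries are invented for illustration only and are not drawn from the United Way model in Figure 1.

    # Hypothetical logic model for an invented after-school tutoring program,
    # organized by the four basic components described above.
    logic_model = {
        "inputs": ["annual budget", "tutoring staff", "classroom space", "materials"],
        "activities": ["weekly tutoring sessions", "staff training", "parent newsletters"],
        "outputs": [
            "number of students served per week",
            "number of sessions held",
            "hours of direct tutoring per student",
        ],
        "outcomes": {
            "immediate": "improved homework completion",
            "intermediate": "higher course grades",
            "long-term": "increased on-time graduation",
        },
    }

    # A quick check an evaluator might run while drafting the model:
    # every component should be present and non-empty.
    for component in ("inputs", "activities", "outputs", "outcomes"):
        assert logic_model[component], f"missing or empty component: {component}"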

Comparing Program Evaluation to Research

There are many similarities between evaluation and research, such as their use of similar methods of data collection and analysis; however, Miri Levin-Rozalis (2003) proposes that the two are distinct fields. The purpose of research is to generate and amass knowledge by seeking conclusions, whereas the purpose of evaluation is to provide feedback to a client by judging and valuing (Levin-Rozalis, 2003; Fitzpatrick et al., 2011). Unlike in research, where the researcher typically sets the agenda for what is to be investigated, the agenda in evaluation is usually co-created by the client requesting the evaluation and the evaluator; the evaluator consolidates the client’s questions into the final evaluation questions (Fitzpatrick et al., 2011). One common criterion for research is that the results are generalizable, meaning that the findings can be applied to other settings or samples of a similar type (Weiss, 1998). In evaluation, strict generalizability is less important than in much research because evaluation is highly context-specific. In other words, evaluation results are situated in the context of the evaluand so that stakeholders can make judgments specifically about the object of the evaluation (Fitzpatrick et al., 2011). Focusing on relatively immediate impact, and thus on the intended use of results, is critical to evaluation because one important goal of evaluation is use (Fitzpatrick et al., 2011). This differs from research in that research may or may not be used right away; important contributions to theory are not always immediately recognized by the field or implemented in practice. Researchers often report measures of internal and external validity as critical criteria for assessing the adequacy of the research and its conclusions. Professional evaluators often accompany or supplement a discussion of validity with a discussion of accuracy, utility, feasibility, propriety, and accountability (Fitzpatrick et al., 2011). These categories correspond to the Program Evaluation Standards established by the Joint Committee on Standards for Educational Evaluation (JCSEE), which were first published in 1981 to provide standards for the evaluation of educational programs, projects, and products (Yarbrough, Shulha, Hopson, & Caruthers, 2011).

Program Evaluation Standards

The JCSEE first developed these standards in response to revisions of the American Psychological Association, American Educational Research Association, and National Council on Measurement in Education standards for testing, which proposed developing standards specifically for program evaluation (Sanders, 1999). There are 30 standards in total, each assessing a different aspect of evaluation processes and results, and they are grouped into four categories: accuracy is the extent to which the evaluation results are dependable, truthful, and reveal and convey technically adequate information; utility refers to whether the evaluation serves the information needs of the users; feasibility represents the effectiveness and efficiency of the evaluation work; and propriety emphasizes the legal and ethical practices of the evaluator(s) (Yarbrough et al., 2011). These standards are supported by the American Evaluation Association (AEA). Table 1 below outlines the definitions of the program evaluation standards established by the JCSEE (Yarbrough et al., 2011).

For more information about the JCSEE and its work in developing educational evaluation standards, visit http://www.jcsee.org/about. Beyond the evaluation standards, the professional preparation of evaluators also differs from that of researchers. An evaluator can be an expert in evaluation and evaluate projects across multiple fields without being a content expert in each of them, whereas researchers typically have greater depth of focus and expertise in their research topics (Fitzpatrick et al., 2011).

Theoretical Approaches to Program Evaluation

Vital to any rigorous evaluation is an understanding of the range of evaluation theories. Marvin Alkin and Christina Christie (2004) illustrated the progression and development of evaluation theories by situating leading evaluation theorists on an evaluation theory tree. The tree illustrates the underlying reasons and goals for evaluation and categorizes theorists into three overarching branches labeled use, methods, and valuing (see http://www.sagepub.com/upm-data/5074_Alkin_Chapter_2.pdf). Social accountability and social inquiry are conceptualized as the roots of the tree, that is, the reasons and goals that drove evaluation theorists in their efforts to develop program evaluation theory. Alkin and Christie (2004) define use as an emphasis on how the findings will be utilized in the evaluation activities, methods as an emphasis on constructing well-designed studies, and valuing as an emphasis on the role of the evaluator in making judgments. While not every evaluation theorist wholly agrees with Alkin and Christie’s evaluation theory tree or with where they are situated among its branches, the illustration is useful for gaining a preliminary understanding of how evaluation theory has progressed and developed, where evaluators have made major contributions, and what three key facets characterize good evaluation. The tree has since been updated by Alkin and Christie (2013), but either version is useful for starting to think about where the different evaluation theories originated.

Embedded in the beginning phases of program evaluation is the evaluator’s critical decision about which theoretical lens(es) to use to view and design the evaluation. In practice, specific evaluation theories are often referred to as evaluation approaches. According to Fitzpatrick et al. (2011), evaluation approaches can be prescriptive (i.e., providing rules and guiding frameworks that specify what a good or proper evaluation is and how it should be conducted) or descriptive (i.e., providing statements and generalizations that describe, predict, or explain evaluation activities). There are a number of approaches from which an evaluator can choose. These approaches differ in the underlying assumptions they make, what they value and emphasize, the methods used, the role of the evaluator, and their areas of application. The principal categories of evaluation approaches are discussed below.

Adversary-oriented Approach

The adversary-oriented approach centers the evaluation on planned opposition between stakeholder or participant points of view. This approach has been used in the context of highly controversial issues, but it can be resource-intensive because information must be collected from the perspective of both sides of an argument. Thomas Owens and Robert Wolf each developed evaluation approaches classified as adversary-oriented.

Consumer-Oriented Approach

A consumer-oriented approach develops evaluative information based on consumer needs and perceptions (Fitzpatrick et al., 2011). This approach tends to require the evaluator to maintain distance from program staff and is goal-free (removing the influence of stated goals/objectives from the evaluation process) in order to determine whether the program meets consumers’ needs. Michael Scriven is credited with furthering the development of this approach.

Expertise-Oriented Approach

An expertise-oriented approach depends primarily on the application of an evaluator’s professional expertise to judge the quality of the evaluand (Fitzpatrick et al., 2011). The critical aspect of this approach lies in the reputation of the critic or expert. Elliot Eisner was a leading figure in developing this approach.

Management-Oriented Approach

The management-oriented approach focuses on identifying and meeting the informational needs of managerial decision-makers (those most likely to use the results to make decisions and changes). This approach requires a close link between the evaluator(s) and the manager/key decision makers, but it may not address critical issues. Key management-oriented theorists include: Daniel Stufflebeam, Joseph Wholey, Michael Quinn Patton, and Marvin Alkin.

Objectives-Oriented Approach

The objectives-oriented approach focuses on specifying goals/objectives and determining the extent to which those goals/objectives have been attained (Fitzpatrick et al., 2011; Farmer, 1997). This approach assumes that goals exist and are worthwhile and attempts to link program activities with outcomes. However, impacts or outcomes unrelated to the stated goals may be overlooked. Some well-known objectives-oriented theorists include Ralph Tyler, Malcolm Provus, and Carol Weiss.

Participant-Oriented Approach

Participant-oriented approaches to evaluation require the involvement of multiple levels of stakeholders in determining the values, criteria, needs, and data collected for the evaluation. In this approach the evaluator acts as a facilitator, encouraging dialogue, participation, and deliberation among all stakeholders involved. Leading theorists in this approach include Robert Stake, David Fetterman, Rita O’Sullivan, Egon Guba and Yvonna Lincoln, Donna Mertens, Abraham Wandersman, J. Bradley Cousins and Lorna Earl, and Ernest House and Kenneth Howe.

Policy-Oriented Approach

A policy-oriented approach to evaluation recognizes the influence that evaluation findings can have on policy decisions and incorporates that influence into the evaluation activities. This approach highlights the fact that the evaluator has some political influence regardless of their intent. Researchers who have contributed to this approach include Lee Cronbach and Carol Weiss.

Theory-Based Approach

A theory-based approach to evaluation investigates whether a program’s challenges are chiefly a result of problems in the program theory. Even when a program is implemented with fidelity to its program theory, the underlying theory itself may be the root of program challenges. Stewart Donaldson has contributed heavily to the development and application of this approach to evaluation.

Utilization-Focused Approach

While Michael Quinn Patton has been identified as a leading theorist in the management-oriented approach, he is more widely known for having developed the utilization-focused evaluation approach, which focuses on intended use by intended users. A key aspect of this approach is that the evaluator is responsible for ensuring that the evaluation results are used.

Methodology and Analysis

Methodological decisions are important in evaluation and involve many different considerations. As part of the methods for an evaluation, the evaluator must determine whether to use qualitative, quantitative, or mixed methods types of information, or data, as all of these can be useful in an evaluation. Qualitative data are non-numerical, typically textual or observational information in narrative form (Weiss, 1998). Quantitative data are numerical information (Creswell, 1994; Weiss, 1998). Mixed methods information is more than simply using both qualitative and quantitative data within an evaluation; mixed methods data are combined, or associated, qualitative and quantitative data that together strengthen the evaluation of a construct or specific research topic (Fitzpatrick et al., 2011; Creswell, 2009). Ultimately, the most important rule of thumb when deciding what types of data, methods, or procedures to use is that evaluators should select whatever is “most appropriate for answering the evaluation question at hand given the context of the program and its stakeholders” (Fitzpatrick et al., 2011).

Methods of data collection must also be decided. Methods commonly used in evaluation include tests, surveys, direct measures of specific constructs, individual interviews, focus group interviews, observations, and artifacts (Fitzpatrick et al., 2011). When the evaluator is actively collecting data from program stakeholders, sampling methods should also be determined; two common types are purposive and random sampling (Weiss, 1998). In purposive sampling, the evaluator selects specific people for the sample and justifies that selection for specific reasons (Weiss, 1998). In random sampling, the laws of chance determine who is chosen for the sample; random sampling justifies the sample selection in itself because it minimizes the potential for bias (Weiss, 1998). Weiss (1998) provides specific examples of situations in which purposive and random sampling are each appropriate. Three other types of sampling highlighted by Weiss (1998) are opportunistic, convenience, and snowball sampling, all of which are possible methods in qualitative data procedures (Miles & Huberman, 1994, as cited in Weiss, 1998). Many more sampling methods exist beyond those mentioned here; none has been excluded from evaluation, and any may be used with proper justification in the appropriate context.

Any methods of qualitative, quantitative, and mixed methods data analysis may likewise be used in evaluation (Weiss, 1998, pp. 271-293). General considerations to keep in mind when determining the methods and analyses to use are budgetary, time, and data limitations (Fitzpatrick et al., 2011, pp. 117-120). Another crucial consideration when designing the methods for a program evaluation is the evaluation questions themselves. Since the evaluation questions are meant to guide and structure the evaluation, it is intuitive that they should also guide the methods and the data that are collected.
Table 2 presents a sample of one format for organizing your evaluation so that the evaluation questions clearly align with the data sources, methods, and analyses.
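
As a concrete illustration of the sampling distinction described above, the minimal Python sketch below draws a purposive and a simple random sample from a hypothetical stakeholder roster; the roster, sample sizes, and selection criterion are invented for illustration only.

    import random

    # Hypothetical roster of 100 program stakeholders; the completion flag is
    # invented purely for illustration.
    roster = [
        {"name": f"Participant {i}", "completed_program": i % 3 != 0}
        for i in range(1, 101)
    ]

    # Purposive sampling: the evaluator deliberately selects people who meet a
    # stated criterion (here, those who completed the program) and documents
    # the justification in the evaluation plan.
    purposive_sample = [p for p in roster if p["completed_program"]][:15]

    # Simple random sampling: the laws of chance determine who is selected,
    # which in itself helps minimize selection bias.
    random.seed(42)  # fixed seed so the draw can be reproduced in a report
    random_sample = random.sample(roster, k=15)

    print(len(purposive_sample), len(random_sample))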

Reliability and Validity in Program Evaluation

All data collected for the purposes of an evaluation should be assessed for quality. Both the reliability and the validity of evaluation findings are important to consider with regard to the quality of an evaluation. Reliability can be addressed through internal consistency coefficients for a measurement scale (e.g., Cronbach’s alpha), inter-rater reliability (e.g., the kappa coefficient) for observations, and triangulation of methods. In triangulation, multiple data points collected separately through multiple methods are analyzed concurrently to see whether the findings converge. Triangulation allows for a test of consistency by identifying convergent and divergent findings among the multiple methods. Any divergent findings are then used as “opportunities for deeper insight into the relationship between [the] inquiry approach and the phenomenon under study” (Patton, 2002). Convergence can be used as evidence of another quality criterion that is important in evaluation: internal validity. Combining evidence from different sources and methods through triangulation allows evaluators to gauge their confidence in the validity of their information. The more the findings converge, the more confidence there is that the construct of interest has been captured.
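
For readers who want to see how an internal consistency coefficient is computed, below is a minimal Python sketch of Cronbach’s alpha, assuming NumPy is available; the survey responses in the example are invented for illustration. (For inter-rater agreement, an analogous kappa statistic can be computed, for example with scikit-learn’s cohen_kappa_score.)

    import numpy as np

    def cronbach_alpha(scores):
        """Internal consistency for a respondents-by-items matrix of scale scores.

        alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
        """
        scores = np.asarray(scores, dtype=float)
        k = scores.shape[1]                          # number of items
        item_var = scores.var(axis=0, ddof=1).sum()  # summed item variances
        total_var = scores.sum(axis=1).var(ddof=1)   # variance of respondents' totals
        return (k / (k - 1)) * (1 - item_var / total_var)

    # Hypothetical example: 5 respondents answering a 4-item survey scale.
    responses = [
        [4, 5, 4, 4],
        [3, 3, 2, 3],
        [5, 5, 5, 4],
        [2, 2, 3, 2],
        [4, 4, 4, 5],
    ]
    print(round(cronbach_alpha(responses), 2))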

A metaevaluation, or the evaluation of an evaluation, is also conducted to improve the quality of evaluation findings by searching for biases or errors in the evaluation process. Metaevaluations are generally guided by the framework proposed by the Joint Committee on Standards for Educational Evaluation (2011). Evaluators may unwittingly underrepresent certain stakeholder groups in the evaluation or overlook cultural differences that change the meaning of evaluation findings; a metaevaluation can surface, and sometimes help avoid, these types of misrepresentations.

Evaluation and Its Context

Cultural Competence in Evaluation

According to Stanley Sue (1998), cultural competence in evaluation involves the awareness and appreciation of differences among cultural groups and the ability to communicate effectively across cultural groups in the design, implementation, analysis, and interpretation of an evaluation. Logistically, this means that the programs, standards, interventions, and measures used in an evaluation should be relevant and valid for every key cultural group (Thompson-Robinson, Hopson, & SenGupta, 2004). At its highest level, cultural competence can be embedded in every phase of the evaluation process (see the key phases described in the Overview of Program Evaluation section). While cultural competence can serve aspirations for social justice, it also improves data validity and accuracy through the use of conceptual and empirical strategies for understanding the differing values, belief systems, behaviors, and social regularities of the cultural groups involved with the program (Hughes, Seidman, & Williams, 1993).

Reporting Practices in Program Evaluation

Reporting for a program evaluation is typically discussed as part of the evaluation contract and negotiated with the client so that it is delivered throughout the entire evaluation process. Reporting may include presentations at client meetings, progress-report memoranda or emails about preliminary or mid-evaluation findings, and, of course, the final evaluation report. Ongoing and consistent communication throughout an evaluation has been reported as an important facet of evaluation use; thus, it is suggested that evaluators schedule interim reports throughout the evaluation (Fitzpatrick et al., 2011; Weiss, 1998). Interim reports can be aligned with evaluation milestones, such as the completion of data collection and analysis, and with program milestones related to budget cycles or scheduled stakeholder/client meetings (Fitzpatrick et al., 2011). The format of reporting depends on the intended audience(s) and on how many evaluation reports will be written to differentiate the results for those varied groups. Typical evaluation reports begin with an executive summary outlining the essential findings and recommendations of the evaluation and the evidence found to support those claims (Fitzpatrick et al., 2011; Weiss, 1998). Other sections typically included in evaluation reports are: an introduction to the purpose and audience of the evaluation; a description of the evaluand, evaluation objectives, and evaluation questions; a review of the evaluation procedures; a presentation of the findings and results (supported by the data); conclusions and recommendations; and appropriate appendices that may include supporting documents related to data collection, analysis, or interpretation. While these are typical in traditional written reports, the same information can be conveyed in many alternative forms, depending on the audience and their needs (e.g., oral presentations, visual illustrations, online/social media). The American Evaluation Association’s data visualization and reporting topical interest group is a great resource for current trends and alternative reporting options in evaluation (http://comm.eval.org/datavisualizationandreporting/DataVisualizationandReporting/Home/).

For a tool to help structure reports that contain the necessary information for both the client and the evaluator, see the “Evaluation Report Checklist” developed by Gary Miron in 2004 and made available by The Evaluation Center at Western Michigan University at http://www.wmich.edu/evalctr/checklists/evaluation-checklists/.

How to Learn More about Program Evaluation

  • See the American Evaluation Association (AEA) website (eval.org)
  • Connect with the AEA discussion board on LinkedIn
  • Look up articles from leading evaluation journals in Google Scholar or library databases (Advances in Program Evaluation, American Journal of Evaluation, Educational Evaluation and Policy Analysis, Evaluation, Journal of Multidisciplinary Evaluation, New Directions for Evaluation)

References for Program Evaluation Resources

Alkin, M. C. & Christie, C. A. (2004). An evaluation theory tree. In M. C. Alkin (Ed.), Evaluation roots: Tracing theorists’ views and influences (pp. 12-65). Thousand Oaks, CA: Sage.

Alkin, M. C. & Christie, C. A. (2013). An evaluation theory tree. In M. C. Alkin (Ed.), Evaluation roots: A wider perspective of theorists’ views and influences (pp. 11-57). Thousand Oaks, CA: Sage.

Centers for Disease Control and Prevention. (1999). Framework for program evaluation in public health. MMWR, 48(RR-11), 4-25.

Creswell, J. W. (1994). Research design: Qualitative and quantitative approaches. Thousand Oaks, CA: Sage.

Creswell, J. W. (2009). Research design: Qualitative, quantitative, and mixed methods approaches (3rd ed.). Thousand Oaks, CA: Sage.

Farmer, D. (1997). Program evaluation matrix. Retrieved from http://hostedweb.cfaes.ohiostate.edu/brick/Matrix.PDF.

Fitzpatrick, J. L., Sanders, J. R., & Worthen, B. R. (2011). Program evaluation: Alternative approaches and practical guidelines (4th ed.). Upper Saddle River, NJ: Pearson Education, Inc.

Hughes, D., Seidman, E., & Williams, N. (1993). Cultural phenomena and the research enterprise: Toward a culturally anchored methodology. American Journal of Community Psychology, 21, 17.

Joint Committee on Standards for Educational Evaluation (2011). Program Evaluation Standard Statements. Retrieved from Joint Committee on Standards for Educational Evaluation: http://www.jcsee.org/program-evaluation-standards/program-evaluation-standards-statements.

Levin-Rozalis, M. (2003). Evaluation and research, differences and similarities. The Canadian Journal of Program Evaluation, 18(2), 1-31.

Mark, M. M., Henry, G. T., & Julnes, G. (2000). Evaluation: An integrated framework for understanding, guiding, and improving policies and programs. San Francisco: Jossey-Bass.

Patton, M.Q. (2002). Qualitative research and evaluation methods. Thousand Oaks, CA: Sage Publications.

Sanders, J. (1999, April). The development of standards for evaluations of students. Paper presented at the annual meeting of the National Council on Measurement in Education, Montreal, Canada.

Schwandt, T. A. (2008). Educating for intelligent belief in evaluation. American Journal of Evaluation, 29(2), 139-150.

Scriven, M. (1967). The methodology of evaluation. In R. E. Stake (Ed.), Curriculum evaluation. (American Educational Research Association Monograph Series on Evaluation, No. 1, pp. 39-83). Chicago: Rand McNally.

Scriven, M. (1991). Evaluation Thesaurus (4th ed.). Newbury Park, CA: Sage.

Stake, R. E. (2000). Case studies. In N. K. Denzin & Y. S. Lincoln (Eds.), Handbook of qualitative research (2nd ed.). Thousand Oaks, CA: Sage.

Stufflebeam, D. (1999). Evaluation Contracts Checklist. In The Evaluation Center. Retrieved from http://www.wmich.edu/evalctr/archive_checklists/contracts.pdf

Stufflebeam, D. (2001). Evaluation models. New Directions for Evaluation, No. 89. San Francisco: Jossey-Bass.

Sue, S. (1998). In search of cultural competence in psychotherapy and counseling. American Psychologist, 53, 9.

Thompson-Robinson, M., Hopson, R., & SenGupta, S. (2004). Editors’ notes. New Directions for Evaluation, 102, 4.

Weiss, C. H. (1998). Evaluation (2nd ed.). Upper Saddle River, NJ: Prentice Hall.

Yarbrough, D. B., Shulha, L. M., Hopson, R. K., & Caruthers, F. A. (2011). The program evaluation standards: A guide for evaluators and evaluation users (3rd ed.). Thousand Oaks, CA: Sage.