"

4 Planning Useful Evaluations

Key Topics

  • The logic of evaluation
  • Defining criteria and standards
  • Importance and role of a plan
  • Components of an evaluation plan
  • Guiding questions

Introduction

This chapter reviews the steps involved in planning useful assessment and evaluation activities. This includes planning needs assessments, implementation/operation evaluations, outcomes assessments, and, to a lesser extent, developing a logic model. Although accreditation and program review processes and activities lie outside the other types of evaluation included in this book, they follow some of the same logic. I first outline the logic of assessment and evaluation and then describe in more detail the steps in planning (and conducting) an assessment or evaluation project. The chapter concludes with a brief discussion of politics and research ethics as applied to assessment and evaluation.

Logic of Program Assessment and Evaluation

Evaluation activities follow a basic logic and process. At its most fundamental level, the logic of program evaluation is quite simple. It involves the following:

  1. Determining what will be assessed or evaluated.
  2. Identifying the criteria on which program performance will be assessed.
  3. Identifying standards or levels of performance expected for each criterion.
  4. Identifying and collecting evidence that can be used to determine whether the program is successful in meeting the criteria and standards.
  5. Synthesizing the information collected to draw conclusions about program achievements and effectiveness in relation to the standards for success.

In practice, implementing this logic is a bit more complicated. Each of these steps is considered in more detail below. It should be noted that the steps identified above may not completely fit with the logic of needs assessment and logic model development.

Determining What Is Being Evaluated and Why

A decision must be made about what will be evaluated and for what purpose. This involves putting boundaries around the program and identifying specifically what will be included in the evaluation and what will be excluded. Additionally, the evaluator and sponsor/stakeholders need to be clear about what they want to learn, focus appropriately, and be sure that critical stakeholders and evaluators are on the same page about what is to be included and excluded from the evaluation. This is where the evaluator and sponsor determine whether the purpose is to assess need, program implementation, program outcomes, or impact, and whether the ultimate purpose is formative or summative. This also involves identifying the specific questions an evaluation will seek to answer. I will discuss these issues in more depth later.

Specifying Criteria

After clearly identifying the program to be evaluated and the purpose of the evaluation, evaluators must identify the criteria on which the program will be judged and on which data will be collected. For Rossi et al. (2004), criteria “represent domains in which the program can realistically hope to have accomplishments” (p. 70). The Community Tool Box’s very straightforward definition of criteria is that they are “aspects of the program that are most meaningful to monitor” (Work Group for Community Health and Development Tool Box, Chapter 36, Section 1, p. 15).

Another perspective is that criteria are the critical dimensions on which a program must do well to be effective or to achieve its goals. For example, what are the key characteristics, attributes, or dimensions of an effective academic advising or faculty development program without which it cannot achieve its goals? In short, criteria are the key aspects of programs on which you want to collect data to determine program effectiveness. They are the things that are critical to a program performing well. To use an advising office as an example, you would first identify the attributes of an effective advising program and then would collect data on these attributes.

One of the clearest examples of identifying and applying criteria in an evaluative way is seen in product evaluation. Consumer Reports is a classic example. It sets out clear dimensions of a product that it will assess and then judges each brand or model on those criteria. Identification of these dimensions for social programs is not always straightforward or easy, as I describe below. Program outcomes serve as the criteria for many evaluations in higher education.

NOTE: It would not be typical to identify criteria for a needs assessment or for constructing a logic model. Both are activities geared more toward developing the program than toward assessing its outcomes. A needs assessment does, however, require the evaluator to make decisions about the important aspects of the situation about which data will be collected.

Sources of Criteria

Where do the criteria by which a program will be judged come from?

As Owen (2007, p. 11) notes, criteria may be found in or shaped by:

  • Program goals, objectives and intended outcomes
  • What you and/or others want to learn from the evaluation process
  • Interests and/or preferences of stakeholders
  • The needs of clients for whom a program is intended
  • The policy that determines the guidelines for a program
  • Externally defined criteria (such as those set by the Council for the Advancement of Standards in Higher Education and accrediting bodies)
  • Best practices

The criteria applied to evaluation of any one program will change depending on the type of evaluation being carried out. For example, if one is doing an operation and implementation evaluation, the criteria would focus on design, aspects of delivery, and use. However, if one is evaluating the outcomes of the same program, the focus would be on the program’s identified outcomes.

Criteria are sometimes determined by an external body such as an accreditation body or professional association. The Higher Learning Commission (a regional accrediting body) has identified several overarching criteria on which colleges and universities are evaluated. The Council for the Advancement of Standards in Higher Education (CAS) identifies essential characteristics or criteria by which student affairs functional areas or units can be evaluated; however, these criteria are often better suited to global assessments of a unit than they are to assessment of program-specific outcomes.

Criteria for assessing program outcomes or effectiveness most often center on that program’s goals and outcomes. Goals and outcomes are embedded in a statement or logic model, written or unwritten, about what program administrators or stakeholders believe a program is intended to accomplish and which dimensions or aspects of the program are most critical to attaining those goals. Federally sponsored programs, such as many of the TRIO programs, have very clearly stated objectives and standards on which they will be judged. These outcomes are often specified in the grant proposal. Professional associations and stakeholder interests and perspectives about a program may also serve as a source of criteria on which a program can be assessed. It is important when using these sources to understand the perspectives these groups bring to the process.

Criteria can also come from the literature, benchmarks, or best practices. Literature reviews, benchmarks and best practices will be discussed in the needs assessment chapter.

Challenges Identifying Criteria

Criteria are often easily identified and very specific. For example, if weight loss is a goal of a recreation services program, then weight loss (amount of weight lost or number of people who lose weight) is an aspect of the program, or a criterion, by which the effectiveness of the program can be judged.

Determining criteria is more difficult in other cases. Program goals or objectives may not specifically name the areas in which accomplishments can be expected or the elements to monitor, at least not in a readily observable way. Even when they do, the goals and outcomes may be vague and too general, such as “to develop leaders.” When this is the case, in order to use program goals or outcomes as criteria, they must be defined more specifically to be useful. The criteria must be operationalized, a topic discussed in Chapter 5.

One of the biggest problems in conducting program evaluations in higher education is that, for many programs, written program descriptions, objectives, or goals do not exist—at least not at the level of detail that makes them useful for evaluation without some additional work. Programs typically do not have formal logic models unless they are grant driven. Even when program descriptions do exist, they may not identify the program’s intended outcomes or the critical components of the program that you would want to monitor. In cases where goals and outcomes are not in writing, the evaluator may have to deduce them through reading program information and interviewing key program participants.

Setting Performance Standards

Once criteria or key program dimensions are identified, evaluators and program administrators will ideally specify how well the program is expected to do on each dimension or criterion. That is, they will set a performance target for each criterion. A target to aim for and by which to judge performance is necessary; it gives them a basis for determining whether the program is successful or not. Suskie (2014) suggests setting rigorous but justifiable targets for success. As she notes, it is hard to share successes or identify areas in need of improvement if one doesn’t have a good sense of what success looks like. Standards are often built into outcome statements, as shown in Chapter 5.

To use a specific example, if a key characteristic of an effective advising program is that students should be able to see an advisor within a reasonable amount of time after requesting an appointment (an operational outcome and a criterion), then one needs to have some idea of the ideal time-to-appointment. The specified desired time becomes one standard by which performance could be judged.

For this example, let’s assume that the director of advising determined that two working days or less (48 hours) is the maximum desired wait time and thus is one of the standards by which to judge program success. An administrator/evaluator has several choices for operationalizing this standard. The standard could be “the average wait time to get an advising appointment is 48 hours or less.” It is also possible to state the performance standard in a different way: 75% of students requesting appointments get them within two working days. Either approach provides a standard against which to compare actual performance to assess how well your advising program is doing. If there is no previously established standard, ideally the evaluator will work with program administrators to create one. Without a predetermined desired wait time (the standard) or a target percentage of students obtaining appointments within it, you have no basis for determining whether the actual wait time is good or bad. Identifying the standard against which actual program performance on key criteria is compared is a core aspect of program evaluation.

All this said, it is fairly typical in higher education for there to be no predetermined standards by which to judge whether programs are serving their purpose and doing so as expected. One can make a credible argument that reducing some higher education goals to defined criteria and clear standards of performance is problematic (some of the more ephemeral general education goals, for example). Nonetheless, clear criteria and reasonable standards of performance are useful for decision-making purposes.

Sources of Standards

Standards of performance come from many sources. One typical source is current performance, which can serve as a baseline. Another is standards set by professional organizations; many national professional organizations have standards that can be used. The National Academic Advising Association (NACADA), or some other professional association, may provide recommended advisor-to-advisee ratios that can be used as a standard.

Another source is peer group performance or national norms. If a college’s current six-year graduation rate is 60% and the rate for its self-selected peer institutions is 70%, the institution may set 70% as its target six-year graduation rate to be achieved over a five-year period. Seventy percent then becomes the standard by which the institution will compare its actual performance to determine how successful it is in achieving its target. In this hypothetical case, the rate of 70% came from a group of peer institutions. Most colleges and universities have peer groups and many have two: a set of comparable institutions and a set of aspirational peers—ones they want to be like.

Regardless of the source, standards should themselves meet a couple of standards. First, they should not be drawn out of thin air; there should be some basis for the choice. Second, the standards set should be reasonable and attainable. It is unlikely that a college can go from a 60% six-year graduation rate to 70% in one year. A more gradual increase over a longer period is likely more feasible. On the other hand, setting a performance target that is reachable but too low is not good either. Suskie (2015) argues that institutions should set “justifiably rigorous targets” (p. 173). This could involve setting two targets: acceptable and aspirational (Suskie, 2015).
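To make the idea of a gradual, attainable target concrete, the sketch below is a minimal illustration with hypothetical figures, assuming evenly spaced interim targets, of how the move from a 60% to a 70% six-year graduation rate might be spread over five years rather than expected in one.

```python
# Illustrative sketch (hypothetical figures): setting gradual interim targets rather
# than expecting a jump from a 60% to a 70% six-year graduation rate in a single year.

def interim_targets(current, goal, years):
    """Spread the gap between current performance and the goal evenly across the years."""
    step = (goal - current) / years
    return [round(current + step * (year + 1), 1) for year in range(years)]

# Current rate 60%, aspirational peer-group rate 70%, reached over five years.
print(interim_targets(current=60.0, goal=70.0, years=5))
# -> [62.0, 64.0, 66.0, 68.0, 70.0]
```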

Caution!

Evaluation texts and practitioners are not consistent in the way they use the terms criteria and standards. Sometimes the terms are used interchangeably. In some cases, the terms follow the definitions used above. Other times, the term standard is used to describe the aspects of a program on which it should be evaluated. This text makes a distinction between the two terms as laid out in this section. Criteria are aspects on which a program is evaluated; standards are performance targets—typically numeric—for how well a program should do on the criteria specified. A record of refereed publications is a criterion on which faculty tenure is based (at a research university). A department may set the required number of such articles, which becomes the standard by which a record is deemed acceptable or not.

Collecting and Analyzing Information

All evaluations involve systematic collection of appropriate data to assess program performance on the identified criteria. These data will ideally allow evaluators to compare results with standards. If the goal is to reduce time to advising appointments, you would collect data on the number of students requesting appointments, when they request appointments, and when their appointment takes place. With these data you would calculate average wait time. You can also use these same data to calculate the percentage of students who obtain appointments in one, two, or three days.

Information collected is then synthesized and compared to the identified standards to form a determination about whether the program is meeting its goals. The nature of the comparison will vary depending on the purpose of the evaluation. Linda Suskie (2015) argues that in evaluation, numbers only have meaning when compared to other numbers: for example, actual performance compared to targets or actual performance to actual performance of others.

Using the advising example, you could compare the data about actual wait time to the desired time and ask yourself whether the average appointment wait time is above or below the set standard (two days) or whether the targeted percentage of students seeking appointments is getting them within the desired wait time. Or you could compare wait time at your institution to national norms.
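As a concrete illustration, here is a minimal sketch of that comparison using hypothetical wait-time data (the figures and variable names are invented for illustration). It computes both formulations of the standard from the advising example, the average wait and the percentage of students seen within two working days, and checks each against its target. Note that the same data can satisfy one formulation while failing the other, which is one reason the choice of how to state the standard matters.

```python
# Illustrative sketch (hypothetical data): computing both formulations of the
# advising wait-time standard and comparing actual performance to the targets.
from statistics import mean

# Hypothetical wait times, in working hours, between appointment request and appointment.
wait_times = [12, 20, 30, 36, 44, 50, 52, 60, 72, 96]

AVG_STANDARD_HOURS = 48   # standard 1: average wait of two working days (48 hours) or less
PCT_STANDARD = 75         # standard 2: 75% of students seen within two working days
TWO_WORKING_DAYS = 48

average_wait = mean(wait_times)
pct_within_two_days = 100 * sum(w <= TWO_WORKING_DAYS for w in wait_times) / len(wait_times)

print(f"Average wait: {average_wait:.1f} hours (standard: <= {AVG_STANDARD_HOURS})")
print(f"Within two working days: {pct_within_two_days:.0f}% (standard: >= {PCT_STANDARD}%)")
print("Meets average standard:", average_wait <= AVG_STANDARD_HOURS)
print("Meets percentage standard:", pct_within_two_days >= PCT_STANDARD)
```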

Much assessment and evaluation data collection employs fairly common social science research methods and as such is based on either qualitative or, more typically, fairly standard quantitative designs. These methods are discussed further in Chapters 15-18.

Drawing Conclusions and Sharing Results

The final step in the evaluation logic involves synthesizing the data collected and drawing conclusions. The kinds of conclusions one can draw depend on research design, which will be discussed in greater depth in various sections of the book. That aside, evaluation scholars are of two minds about the extent to which evaluators should draw conclusions about program effectiveness or worth. Some argue that this is the role of evaluators. Others argue that evaluators should summarize data and draw conclusions from the data but leave the ultimate “judgment” about program worth or effectiveness to those responsible for the program itself. This debate may be more relevant in cases in which external evaluators are employed. For all practical purposes in higher education, administrators of programs sponsor, design, and carry out evaluations of their programs and are in the best position to draw conclusions and make judgments about effectiveness. On the other hand, administrator-evaluators may be too close to the situation to be objective.

Rarely will formative or summative judgments be based on one piece of information or even one set of data collected at one point in time. Nor should they be. In the case of wait time for advising appointments, there are additional important questions to be answered to determine whether advising is effective; time-to-appointment is just one. For example, program stakeholders would also likely be concerned about the quality of the advising provided (i.e., the competence of advisors). If the wait time is long, they would want to know why. Or they might want to know whether wait time differs by advisor or for different groups of students. Perhaps wait time is outweighed by the quality or length of appointments. Rarely would a judgment of merit or worth be based solely on wait time, or on any single criterion for that matter.

The Evaluation Plan

Good evaluations are guided by a plan. The plan operationalizes the logic described above and functions like a blueprint or recipe to guide conducting an evaluation. It is equivalent to a dissertation or research proposal. According to the Work Group for Community Health and Development Tool Box, there are several very good reasons for developing a plan to guide an evaluation:

  • It guides the evaluator through each step of the process of evaluation.
  • It helps to decide what sort of information you and your stakeholders really need.
  • It can prevent wasting time gathering information that is not needed.
  • It helps to identify the best possible methods and strategies for getting the needed information.
  • Ideally it forces one to come up with a reasonable and realistic timeline for evaluation.
  • Most importantly, it will help you improve your initiative. (Chapter 36, Section 5, p. 2)

In other words, formalizing the intended evaluation steps in a plan keeps evaluators focused and provides a roadmap for conducting the evaluation activity. If one is doing a long-term evaluation project (as in assessment of student learning outcomes) or an evaluation for someone else, a written plan is particularly useful. Grant proposals will require them. The Director of Study Abroad and Global Engagement at my university and I served as evaluators for a multi-year, multi-activity, curricular internationalization project for a nearby community college. We constantly referred back to the original grant proposal and its written evaluation plan to remind ourselves what our evaluation sponsors wanted to know and what we had promised to provide. Accrediting agencies require written student learning outcomes assessment plans. Even if you do your own evaluation, you will ideally prepare some sort of plan. The plan may be more or less detailed depending on the audience and the extent of the evaluation task.

A typical plan mirrors the evaluation logic discussed above and includes the following components, which are described in more detail in the following sections:

  • Program description, evaluation purpose, and focus (including specifying criteria and standards)
  • The data to be collected
  • How the data will be analyzed
  • How results will be shared and with whom
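To show how these components fit together, here is a minimal sketch of a plan for the advising example used throughout the chapter, expressed as a simple data structure. The content is hypothetical and illustrative, not a template the chapter prescribes.

```python
# Illustrative sketch (hypothetical content): the components of an evaluation plan,
# using the advising example from earlier in the chapter.
evaluation_plan = {
    "program": "Undergraduate academic advising office",
    "purpose_and_focus": "Formative: assess operation/implementation of appointment scheduling",
    "guiding_questions": [
        "Are students able to see an advisor within two working days of requesting an appointment?",
    ],
    "criteria_and_standards": [
        {"criterion": "Wait time to appointment",
         "standard": "75% of students seen within two working days"},
    ],
    "data_to_collect": "Appointment request dates and appointment dates from the scheduling system",
    "analysis": "Average wait time; percentage of students seen within one, two, and three days",
    "reporting": "Summary shared with the director of advising and advising staff",
}

print(evaluation_plan["criteria_and_standards"][0]["standard"])
```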

Evaluation Purpose and Focus

Most evaluation experts today argue that evaluation findings should be useful to evaluation sponsors and program stakeholders. To be useful, an evaluation must answer questions program administrators and stakeholders care about and be “tailored,” in Rossi et al.’s (2004) language, to the situation at hand. Thus, the first step in creating an evaluation plan is to determine the purpose and focus of the evaluation. What does the program administrator or evaluation sponsor want to know, and about what aspects of the program? To maximize the usefulness of findings, stakeholders should be consulted at all stages of the evaluation process, beginning with identifying the purpose (Owen, 2007).

The specific purpose is typically tied to one of the six evaluation types introduced in the previous chapter and to be discussed in detail in the next section of the book. In fact, the type of evaluation can help narrow the specific focus of an evaluation. This is true for all but accreditation, where the external accrediting body sets the purpose and even the specific questions a program, college, or university must answer.

Evaluators typically consider the following when deciding evaluation focus and which type of evaluation is most appropriate:

  • Why is the evaluation being conducted? What is the overarching purpose of the evaluation? What do administrators or program sponsors want to know? Will the evaluation results lead to a summary decision about a program’s effectiveness or continued existence, or is the primary purpose to improve the program?
  • The program’s design and stage of development. Is the program in the planning and development stage, new, or mature? This is often related to specific things the administrator and stakeholders want to know.
  • What sponsors and stakeholders specifically want to know about the program.
  • How results will be used and by whom.
  • What types of data are available and what would need to be or could be collected. If comparative data are not available, you are limited in the types of evaluation from which to choose.
  • How much time and what financial and human resources are available for evaluation activities. It is tempting to want to answer as many questions as one can think of in a single evaluation. However, evaluation takes time and resources, both human and financial. It is preferable to focus on and collect data only on key questions of interest.

As noted earlier, a single, comprehensive evaluation of a mature program may seek to answer questions about implementation, outcomes, impact, and cost-effectiveness. In fact, Rossi et al. (2004) recommend conducting some operation and implementation evaluation concurrently with outcome or impact evaluations to appropriately understand the outcomes or impact. For example, if outcomes from a training program are lower or higher than expected, questions about the program’s operations and implementation help administrators understand why. Frequently, the types of evaluation conducted in higher education involve collecting survey data that may cover implementation questions about program design, delivery, and use, as well as questions to assess outcomes.

The Guiding (Research) Questions

Because evaluation is first and foremost an exercise in asking and answering questions about a program or policy, identifying the specific guiding questions that will focus an evaluation is a crucial part of elaborating the evaluation’s specific purpose. Guiding questions are the equivalent of, and serve the same purpose as, research questions in any social science research study. Identifying good, clear questions is one of the most crucial aspects of an evaluation, if not the most crucial. As you will see in subsequent chapters, the guiding questions often stem from the purpose of the evaluation and translate the criteria, or groups of criteria, on which the program is being evaluated into questions guiding an assessment or evaluation project. The questions tell you and others what you want to learn about the program and what data you need to collect, and they guide all aspects of the evaluation. Furthermore, they serve notice to sponsors about what the final report will address.

Each of the evaluation types introduced in Chapter 2, and discussed in more depth in subsequent chapters, lends itself to a particular set of questions. For example, needs assessment asks about the magnitude and nature of a social problem (it provides insight into a problem), whereas outcome assessment asks what the outcomes of a program are and whether the program causes change in the target population on the matter of interest. The guiding questions will reflect these differences. A lack of focused guiding questions can waste everyone’s time and may result in an evaluation with little useful information. This is especially true if the questions are not carefully chosen based on an accurate understanding of the program’s logic model and what the stakeholders care about, or if they cannot actually be answered.

Characteristics of Good Guiding Questions

It takes time—and often multiple iterations—to identify and formulate the questions one wants to answer about a program. Good guiding questions have the following characteristics (see, for example, Bresciani et al., 2004; Mertler, 2016):

  • Good questions are meaningful and useful. It is a waste of time to ask and answer a question that is of little importance. All the data in the world are of little use if they can’t be used to answer important questions of interest or relevance to the program. Just because you have data does not mean the data are relevant or useful for a particular evaluation.
  • Good questions are reasonable and realistic. They are manageable. The questions should not be so broad and expansive that you can’t possibly answer them with the time and resources at your disposal.
  • Good questions are answerable. It must be possible to gather data to answer the questions. It’s of little use to ask questions for which data can’t be collected. In other words, they are measurable.

You will also see this idea applied to outcomes, which should be meaningful, manageable, and measurable (the 3 M’s; Bresciani et al., 2004). It’s the same concept. Because good questions must meet these three criteria, questions often have to change when data are not available to answer the original question.

Tip!!

New evaluators often have a difficult time distinguishing between the guiding questions an evaluation seeks to answer and the interview or survey questions that evaluators use to collect data to answer them. One way to distinguish between guiding or evaluation questions and survey items or interview questions is the following: the subjects of questions that guide evaluations are typically nouns—is the program reaching its intended audience, for example? Or are students learning what faculty and administrators think they should learn? Survey or interview questions, in contrast, often use personal pronouns—what do you know about the program? Is the program offered at a time you can attend? What did you learn?

Sources of Guiding Questions

Where do guiding evaluation questions come from? Evaluation questions emerge from several sources. The first might be conversations with evaluation sponsors and stakeholders about why they are commissioning an evaluation and for what purpose. The purpose of the evaluation (to assess needs, to assess program implementation or operation, or to assess outcomes) will determine the general focus of the guiding questions, as you will see in subsequent chapters. From there, specific questions will be tailored based on 1) the program description, 2) program goals, 3) the program’s logic model, and 4) conversations with stakeholders, program leaders, and participants about what they most want to know. In particular, the program’s logic model (if there is not one, you may have to construct one) and documents such as program descriptions and goals, program history, and results from previous evaluations are good sources. Additionally, the criteria and standards by which success will be determined serve as good sources for evaluation questions.

Specific information one might collect from stakeholders, evaluation sponsors, and program documents to help formulate useful questions to guide evaluation activity includes the following. Sources for the data are in parentheses:

  • What do administrators who are conducting the evaluation care about?
  • What do stakeholders and program sponsors want to learn from the evaluation and how extensively do they want to be involved in the evaluation? (Stakeholders, the evaluation sponsor.)
  • What does the program hope to achieve? What are its goals and objectives? (Program description, logic model.)
  • How does the program work? What are its activities? (Program description and logic model, conversations with stakeholders). This is particularly important for implementation/operation assessment.
  • What are the main criteria on which the program performance will be judged? (Program description, stakeholder conversations, logic model).
  • What important program history and characteristics need to be considered? (Stakeholders).

Centering one’s evaluation and assessment activities around good questions that are sufficiently narrow, but not so narrow as to be meaningless, is a skill that develops with practice. Given the time and resources available, you may have to prioritize the questions an evaluation pursues because you may not be able to answer them all. I cannot emphasize enough how important it is to develop good, clear questions to guide evaluation work that meet the meaningful, reasonable, and answerable criteria. These questions may change over time, but all the data in a college or university will be of little use if administrator/evaluators do not know what they are looking for or if the data they have or can collect do not answer the questions that they care about. In subsequent chapters, I will provide examples of guiding questions appropriate for each type of evaluation.

In the process of conducting any evaluation, program administrators may well learn things that they did not set out to learn—unintended findings. That is perfectly normal. It is also important to reiterate that to be maximally useful, the plan should be a result of consultation with program stakeholders and an accurate understanding of the existing program.

“I’m just curious”

Fight the desire to pursue questions just because you are curious about the answers. Assessment and evaluation activities take time and effort: the time and effort of the participants who complete surveys or interviews and of the evaluators themselves. Curiosity is not, in and of itself, a sufficient reason to pose a question. Questions must be designed to get answers that will fulfill your ultimate purpose.

Methods for Data Collection and Analysis

An important step in the plan is to identify what data are necessary to answer the guiding questions. It is essential that you be able to gather data to answer each guiding question. You may have a great question, but if you can’t gather data that will allow you to answer it, the question is not helpful and you need to modify it. Typical methods of data collection and analysis for each type of evaluation will be described in subsequent chapters. What you will actually do should be described in the plan’s methods section. Keep in mind that time and resources should be considered when identifying data collection plans.

Anticipating Report Format and Use

Although reporting and using findings are not done in the planning stage, it is often useful to begin with the end in mind: not with the answers, but with the types of things you or the program sponsor want to know and what the report is supposed to look like. In that sense, reporting and using the results are an important part of the planning process. Especially if a committee is involved, it is very helpful to give the members an outline of what is expected in the final report (not the specific answers, but the type and form of the report). Committee members often want this guidance. Chapter 14 addresses how results should be communicated.

Politics and Ethics

Political and ethical considerations are important in any evaluation, and particularly so because most evaluations in colleges and universities are carried out by program administrators, coordinators, and faculty members on their own programs. Henning and Roberts (2024), Schuh et al. (2016), and Suskie (2018) all have lengthy discussions of the effects of power, politics, and ethics in assessment and evaluation.

Power and Politics

Because program evaluation is deeply entwined with the reality of organizations, one cannot escape the politics of evaluation in the way that neutral, objective researchers try to do. These political issues must be considered in the plan. In fact, Mertens (2012) argues that the chief difference between evaluation and other forms of scholarly research is precisely the inherently political nature of evaluation. Power and politics (in the sense of building coalitions to exert influence, not political party affiliation) insert themselves into evaluation in various ways. For example, perhaps administrators commissioning an evaluation do not like a particular program (or its director) regardless of how well it is performing. Or conversely, perhaps staff like a program regardless of its outcomes and are engaging in program evaluation primarily for public relations or symbolic reasons (i.e., they have no intention of using the results). Perhaps program administrators or sponsors do not want an evaluation to take place, or staff will not cooperate. There may be a power dynamic at play that makes participants uncomfortable answering questions honestly, if at all. On campuses without a centralized approach to student learning outcomes assessment, there may be competing views about how such assessment should be done. Individual assessment experts may have competing views about how to do assessment and what level of effort needs to be expended. Faculty members tend to view evaluation processes, such as program review, as inherently political and thus are hesitant to be too forthcoming about any weaknesses or areas in need of improvement.

These are just some of the ways in which power and politics affect evaluation. It is not possible to reduce or eliminate all the ways in which politics can affect evaluations. It is, however, incumbent upon evaluators to attempt to uncover and understand these potential dynamics as they conduct an evaluation.

Ethics

Evaluations must follow prescribed standards of ethics in research. Schuh and Upcraft (2001) highlight Kitchener’s four principles as a guide to assessment efforts: (1) Assessment activities should respect participants’ autonomy. For assessment this means participants should have a right to participate or not. (2) Assessment should “do no harm.” For research and assessment this means that participants should not be put at risk. In the current social and political environment, it is not so easy to know what participants will construe as harm, but Institutional Review Boards have criteria by which they judge “harm.” In assessment, however, there may be other dimensions of harm that ought to be considered. These might include the inability to safeguard data collected, collecting but not analyzing data, or misspeaking about data in meetings. (3) Assessment should benefit others. (4) Assessment should be just, by which Kitchener means doing what is promised. I would add that this involves being faithful to the data collected and being of benefit to the people you serve. These principles have relevance as one strives to conduct equity-minded assessments for equitable purposes.

Institutional Review Boards

While Kitchener’s ethical principles are general, there are four specific principles of ethics in social science research that should be considered and respected in evaluation studies, regardless of whether an institution’s Institutional Review Board (IRB) requires approval for such studies. IRB approval will be required, and should be sought, if the evaluators seek to present findings at conferences or publish the results, but it may not be required for typical program evaluations.

The first requirement is being transparent about the purpose of the evaluation and how the data will be used. Participants can use this information to make an informed decision about choosing to participate in a study or not. There is, however, a fine line between providing participants a general idea of the purpose and telling them so much that their answers are affected. The former is required; the latter is not necessary. Second, participation should be voluntary, and participants should know that. Third, participants should be able to withdraw from a study at any time without punishment. Fourth, participants should know how you will treat their identities. Will you know who is participating? If not, then you can say that the data will be anonymous. (Anonymity in social science research typically means that not even the researcher/evaluator knows the names of the individuals participating or can associate an individual with a response.) If you know who the participants are and could identify them, then you need to be clear whether you will use their names or whether their data and names will be treated confidentially—not known to anyone but you. (Some participants want the power of being named in a study.) The goal here is to be honest with participants so that they know when they participate how their identifying information and responses will be treated. It is okay, for example, to use an institution or program’s name or the names of participants if you have informed participants of this and they have agreed to participate under these conditions. Some institutions go so far as to solicit student signatures at orientation granting the right to use any assessment data collected.

Institutions differ on whether program evaluations require IRB approval. However, even if they do not require such approval, it is essential that evaluations follow the widely accepted standards for ethical behavior in research reviewed above, especially for the protection of those who participate in an evaluation (transparency about purpose, informed consent, confidentiality, causing no harm, and voluntary participation and withdrawal). Program administrator-evaluators have to live and work with their colleagues after an evaluation activity is complete, and their employees’ jobs could be at stake as a result of evaluation activities. It is essential to treat participants ethically. Maintaining confidentiality of responses is a particular challenge. It may also be challenging for an evaluator to overcome confirmation bias, the tendency to look for evidence that confirms one’s initial beliefs.

Ethics in Student Learning Outcomes Assessment

Student learning outcomes assessment comes with a particular set of ethical considerations related to, but beyond, the traditional standards identified by Kitchener, the Belmont Report, traditional IRB standards, and standards for conducting evaluations competently. Specifically, because student learning outcomes assessment is typically carried out by faculty and staff practitioners, they owe it to students to employ sound methods of data collection and to analyze and use the data to inform practice. It is not responsible to collect reams of data from poorly constructed instruments that have little chance of being analyzed and used.

The Ethics of Conducting Evaluations Competently

Another aspect of ethics involves responsibility for conducting the assessment and reporting results competently. The Joint Committee on Standards for Educational Evaluation (Joint Committee, n.d.) has developed a series of 30 specific standards by which to judge the quality of an evaluation (http://www.jcsee.org/program-evaluation-standards-statements). These standards fall into four categories, and within each category there are many individual indicators (Work Group for Community Health and Development, Chapter 36, Section 1):

  • Utility—the extent to which the evaluation considers stakeholders and engages in practices that enhance the use of the results.
  • Feasibility—the extent to which the evaluation activities are possible and can be carried out.
  • Propriety—the extent to which the evaluation meets ethical standards.
  • Accuracy—the extent to which the findings are correct.

Standards of ethical conduct are described in more depth in most evaluation texts. That I do not discuss them in depth in this book does not mean they are unimportant. In fact, it is important to be well informed when doing assessment and evaluation. The message of the Joint Committee criteria is that evaluation and evaluation activities are to be taken seriously and done well.

Summary

An evaluation plan is essential to ensuring efficient and useful evaluations. Central to the plan is a clear focus for the evaluation and, most importantly, a set of good guiding questions: questions that can be answered by data that are available or that can be collected within the time and resources available. Several other factors related to planning successful evaluation efforts, namely power, politics, and ethics, are also briefly discussed in this chapter.