17 Designs with Comparison Groups
Introduction
This chapter covers research designs featuring (and requiring) comparison groups of participants and non-participants. The ideal in evaluation research is to have at least two groups of people to study: those who participated in a program or intervention (or some version of the intervention) and a similar group who did not participate or who participated in a different version of the program. First, I discuss causal comparative research. Then, I discuss various types of experimental designs, which are often considered the gold standard in program evaluation. These designs allow one to attribute outcomes to the program or intervention with varying degrees of certainty.
Causal Comparative Research
Overview
Researchers and evaluators use causal comparative methods when they want to explore the reasons behind differences in outcomes for two or more groups when the groups naturally exist and program participation has already happened (researchers cannot form groups and give one the treatment and withhold it from the other). The goal of causal comparative research is to determine why the outcomes for one group (participants) differ from those of another (e.g., non-participants). The term “ex post facto” is applied to this type of research because a difference is observed after data have been collected and the evaluator tries to find out what explains that difference based on the data already collected. As Mertler (2019) notes, “The researcher is looking for a possible cause ‘after the fact’ since both the precursory conditions and the resulting differences have already occurred….” (p. 104).
NOTE: This type of analysis, often done by analysts in an institutional research office, may not be called causal comparative research.
Causal comparative research requires that the evaluator be able to define an independent or grouping variable(s) that is expected to influence an outcome variable. In evaluation studies, participation or non-participation in an intervention such as UNIV101 is a typical grouping variable. The evaluator examines differences in the outcome or dependent variable between groups, in this case those who took UNIV101 and a similar group of those who did not.
There are two situations that typically motivate this approach in higher education. In the first, someone looking at descriptive data notes a difference on an outcome measure of importance (e.g., retention rates) between two or more groups of individuals and tries to figure out what explains the difference. The study conditions and data already exist. In the second, you wonder whether students who participated in UNIV101 are retained at higher rates than first-year students who did not. To answer this question, you identify a group of individuals who participated in UNIV101 and a similar group of non-participants and seek to find out if there are differences in outcomes between the two groups. Because UNIV101 is not required, you can find a similar comparison group. This design allows you to say with more confidence whether the grouping variable (the program or intervention) led to the outcome.
Applying this to the UNIV101 example, some first-year students took UNIV101. The university has data about them in a university database: demographic characteristics as well as general outcome data such as grade point average, whether they are retained, and whether they graduate in four or six years. Even though there was no control group set up ahead of time consisting of first-year students who did not take UNIV101, the university also has the same grade point average, retention, and graduation data on first-year students who did not take the course. The database does not, however, have equivalent data on UNIV101 assignment measures for non-participants, so you can’t compare the two groups on quiz scores. To find out if there are differences in outcomes such as grade point average and retention rates between participants and non-participants, a causal comparative design is appropriate. It can help you answer the question of whether taking UNIV101 contributes to retention, taking into consideration other selected demographic variables.
Causal comparative research addresses the following types of questions:
- Does participation in UNIV101 (yes-no) explain differences in outcomes? Or to what extent does participation in UNIV101 explain the difference? (Assumes same data exist for both participants and non-participants.)
- To what extent does participation (yes-no) in UNIV101 explain outcome (e.g., retention) scores controlling for selected student characteristics? (You must have the same data for both participants and non-participants.)
- Does participation in UNIV101 (yes) predict retention, college GPA, and graduation when compared to peers who did not enroll in UNIV101?
A key to effective causal comparative studies is being able to construct comparable groups on which to compare outcomes. There are several methods of constructing comparable groups. Although causal comparative designs can be constructed after a program is implemented (post-hoc) and do not require creation of experimental and control groups before a program is implemented, they do require forethought because the same data must be available for participants in all comparison groups. For this reason, before discussing the types of causal comparative studies, I will spend a bit of time on data collection tools.
Data Collection Tools
Causal comparative research designs rely on data from similar individuals who have and have not participated in some sort of intervention at some point in the past; in the example, students who took UNIV101 and a similar group of students who did not. In colleges and universities, these data are often found in large databases from college or university student records or other databases that contain data on variables of interest (e.g., a human resources database of faculty and staff). The dataset was likely not created just to store data for the purpose of comparing groups on outcomes, but the data can be used to do so. Data on some variables for all students at a college or university would be in the student database regardless of whether they took UNIV101 or not. One of the limitations of such research is that data on outcome variables of interest must be available for both participants and non-participants. This is one reason that global outcomes such as retention, grade point average, and time to graduation are frequently used as outcomes in causal comparative studies in higher education: the data exist for all students. Similarly, lack of data availability on program-specific outcomes for non-participants is a reason that these studies don’t focus on more specific program outcomes.
Survey data may also be a source of data for causal comparative analyses as long as the survey asks about program participation and includes data from respondents who participated in a program and respondents who did not. An example of a survey for which a causal comparative design can be used is the National Survey of Student Engagement (NSSE) or its community college equivalent (CCSSE). Both surveys ask respondents to indicate whether and how often they participate in various practices known to improve student outcomes, called high-impact practices, which can be treated as interventions. These surveys also ask students to self-report gains on measures of student engagement and learning (outcomes). A researcher can compare self-reported gains (outcome measures) for those who report participating in one or more of the high-impact practices, such as tutoring (an intervention), with those who do not. NSSE and CCSSE both suffer from using self-reported outcome measures; they do not use tests or inventories to capture actual learning or attitudes. Despite these limitations, they are often the best data available.
Approaches to Constructing Groups
The key to conducting causal comparative studies is to create groups after the fact that are like one another except for the intervention (UNIV101). There are several approaches used to create the comparable groups of participants and non-participants. All these methods have in common the goal of creating relatively comparable groups to eliminate potential selection bias (e.g., students with higher ACT scores participate; students with lower ACT scores do not.)
Using Multivariate Analysis
Using statistical controls and multivariate regression is a common method for analyzing the effect of group differences between program participants and non-participants on an outcome variable. Introducing additional variables of interest to statistically control for their influence on the outcome allows evaluators to rule out competing explanations and to explain more fully what is contributing to the outcome. In the UNIV101 example, participation in UNIV101 (yes/no) becomes the independent (grouping) variable because course participation is expected to act on the dependent variable: grade point average, retention, or years to graduation. To make the groups somewhat more comparable, other demographic variables that might be related to or affect the outcome are added to the analysis as control variables.
Multiple regression (or logistic regression if the outcome variable is a categorical or nominal variable such as “retained: yes or no”) is the typical analytic approach. When a group that did not participate in UNIV101 is introduced as discussed above, it is appropriate to use multivariate statistics to determine the predictive power of participating in UNIV101 or not, controlling for other variables of interest (e.g., major, high school grade point average, race), on outcome variables such as retention and grade point average in college. Participation in the program (yes or no) becomes the main independent variable in the analysis. Alternatively, two separate regression analyses are performed, one with the participant data and one with the non-participant data, and the results of the two analyses are compared. It is important to reiterate that causal comparative designs necessitate that the same data be available for both participants and non-participants.
Although not considered as powerful as experimental or matching designs, multivariate analysis is certainly more powerful than simple descriptive or correlational research because it allows you to compare participants and non-participants who are similar on relevant characteristics and for whom you have the same outcome data, such as retention and time to degree. Use of multivariate methods, in which group comparability is handled during the analysis, is probably the most frequently used causal comparative approach in higher education evaluation. The statistical procedures in a sense allow you to construct roughly equivalent groups. This approach allows you to determine the extent to which program participation predicts or is related to the outcome when taking into consideration other variables. Statistically controlling for these potential factors allows you to rule them out (or in) as affecting the outcome, or at least to understand how they affect the outcome. As discussed earlier, you might assume there is some selection bias operating (for example, certain types of students are more likely to sign up for UNIV101 than others), and those biases might affect the outcome measures. Controlling for these other variables helps you understand what is contributing to the outcome.
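To make the approach concrete, the following is a minimal sketch in Python (pandas and statsmodels) of the kind of analysis described above. The data file and column names (univ101, retained, hs_gpa, act, pell_eligible) are hypothetical stand-ins for whatever the institution’s student database actually contains.

```python
# A minimal sketch of the multivariate approach; file and column names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

students = pd.read_csv("first_year_students.csv")  # hypothetical extract from the student database

# Logistic regression: does UNIV101 participation (coded 0/1) predict retention (coded 0/1)
# after controlling for selected background characteristics?
model = smf.logit("retained ~ univ101 + hs_gpa + act + pell_eligible", data=students).fit()
print(model.summary())

# The coefficient on univ101 (or its odds ratio) estimates the association between
# participation and retention, holding the control variables constant.
print(model.params["univ101"])
```

If the outcome were continuous (e.g., first-year grade point average), ordinary least squares regression would replace the logistic model, but the logic of including participation as the main independent variable alongside control variables is the same.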
Constructing Comparison Groups by Matching
A somewhat stronger design involves constructing matching groups. In this design, groups are created post hoc—after the intervention. An example illustrates the concept. Years ago, the University of Kansas sponsored a freshman summer institute (FSI), an extended on-campus summer orientation course, for students who needed a little bit of a head start. Evaluators wanted to know if the program made a difference in first semester grade point average and retention. Students were not randomly assigned to the FSI or to a control group, so evaluators could not employ an experimental design. However, because considerable data on all freshman students existed, including data on the outcomes of retention and grade point average, the evaluators could use a matching design to create a treatment group (participated in FSI) and a control group (did not participate) to see if FSI had an impact on first semester grade point average, for example. (As I recall, it did not; the likely reason is that the participants were perhaps not the intended participants and would likely have done well with or without the summer program.) There are a number of ways to construct matching groups. These matching techniques can also be employed in quasi-experimental designs as discussed below. In causal comparative methods, groups are created after the treatment of interest has occurred; in quasi-experimental designs discussed below, it is presumed that the evaluator creates the groups and then administers the treatment.
Group matching
The design for the FSI study used existing data from the student record system to construct two equivalent groups—one composed of students who participated in FSI and one composed of an equivalent group of students who did not. To do this, the evaluators identified some important student characteristics the literature suggests affect the outcomes as the basis for creating equivalent groups of students who participated in FSI and students who did not.
Suppose the evaluator has reason to believe that socioeconomic status (Pell eligibility), combined ACT or SAT score (at the time, ACT was an important input variable), and high school grade point average are critical variables that might potentially affect whether FSI has its intended effects. The evaluator could calculate descriptive statistics for the participants in FSI and then select a set of comparable students (on the measures identified) who did not participate. If, for example, the average ACT was 23 for students who took the course, the comparable group of students who did not take the course would be selected so that the average ACT of the comparison group is also 23. Likewise, if 55% of the students in the FSI sample self-identified as female, the constructed comparison group would also contain 55% individuals who identified as female.
In the FSI example, statistical software was used to identify the group of students who participated in FSI and a comparable group that had similar characteristics except that they did not participate. Constructing groups post hoc worked because a large database was available that contained the same information on the groups of interest. With a database of information, evaluators created comparable groups and compared the retention rates and grade point averages of participants and non-participants.
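As an illustration only, the following sketch shows one simple way to build a group-matched comparison sample with pandas. It draws non-participants so that the comparison group mirrors the FSI group’s gender composition and then checks whether the groups are comparable on ACT and high school GPA. The file and column names (fsi, gender, act, hs_gpa) are hypothetical, and a real analysis would refine the draw until the aggregate statistics line up.

```python
# A rough sketch of group (aggregate) matching with pandas; column names are hypothetical.
import pandas as pd

students = pd.read_csv("freshmen.csv")       # hypothetical student-record extract
treated = students[students["fsi"] == 1]     # FSI participants
pool = students[students["fsi"] == 0]        # all non-participants

# Draw non-participants so the comparison group mirrors the FSI group's gender composition.
parts = []
for gender, n_needed in treated["gender"].value_counts().items():
    stratum = pool[pool["gender"] == gender]
    parts.append(stratum.sample(n=n_needed, random_state=42))
comparison = pd.concat(parts)

# Check whether the two groups are comparable on the aggregate measures of interest.
print(treated[["act", "hs_gpa"]].mean())
print(comparison[["act", "hs_gpa"]].mean())
```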
Pair matching
Another, more restrictive, but more robust, way of creating comparison groups is called pair matching. This involves finding individuals who are the same on the variables of interest to make up the treatment and control groups. If one member of the treatment group (an FSI participant) self-identified as a female student with a GPA of 3.3, efforts would be made to find a female student with the same GPA for the control group, and so on. The result is treatment and comparison groups that are alike on at least some important characteristics. In the FSI example, the groups and the outcomes “were in the system” as events having happened in the past.
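A greedy version of pair matching can be sketched in a few lines of pandas. The column and file names are again hypothetical; each FSI participant is paired with the non-participant of the same gender whose high school GPA is closest, without replacement.

```python
# A rough sketch of greedy pair matching (illustrative only; column names are hypothetical).
import pandas as pd

students = pd.read_csv("freshmen.csv")          # hypothetical student-record extract
treated = students[students["fsi"] == 1]        # FSI participants
pool = students[students["fsi"] == 0].copy()    # available non-participants

pairs = []
for t_idx, t in treated.iterrows():
    # Candidates of the same gender as this participant.
    candidates = pool[pool["gender"] == t["gender"]]
    if candidates.empty:
        continue                                # no match available; drop this participant
    # Pick the candidate whose high school GPA is closest.
    best = (candidates["hs_gpa"] - t["hs_gpa"]).abs().idxmin()
    pairs.append((t_idx, best))
    pool = pool.drop(index=best)                # match without replacement

matched_controls = students.loc[[c for _, c in pairs]]
print(len(pairs), "matched pairs created")
```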
Propensity score matching
This method of matching is considered the gold standard for drawing causal inferences in the absence of randomly assigned treatment and control groups, a situation common when evaluating the effects of educational interventions at the K-12 level. Propensity score matching employs advanced statistical techniques to establish matched groups by estimating the probability that a person with certain characteristics will be in the treatment group. A definition and description of the method can be found at https://www.statisticshowto.com/propensity-score-matching/.
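The following is a minimal, illustrative sketch of the two basic steps: estimate each student’s propensity to be in the treatment group from observed characteristics, then match treated and untreated students on that score. The data file and column names are hypothetical; production analyses typically add calipers, balance checks, and more covariates, or use a dedicated matching package.

```python
# A rough, illustrative sketch of propensity score matching; column names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

students = pd.read_csv("freshmen.csv")  # hypothetical student-record extract

# 1. Estimate each student's propensity (probability) of being in the treatment group.
ps_model = smf.logit("fsi ~ act + hs_gpa + pell_eligible", data=students).fit()
students["pscore"] = ps_model.predict(students)

# 2. Greedy nearest-neighbor matching on the propensity score, without replacement.
treated = students[students["fsi"] == 1]
controls = students[students["fsi"] == 0].copy()
pairs = []
for idx, row in treated.iterrows():
    nearest = (controls["pscore"] - row["pscore"]).abs().idxmin()
    pairs.append((idx, nearest))
    controls = controls.drop(index=nearest)

# 3. Compare outcomes (here, retention coded 0/1) across the matched groups.
treated_rate = students.loc[[t for t, _ in pairs], "retained"].mean()
control_rate = students.loc[[c for _, c in pairs], "retained"].mean()
print(treated_rate - control_rate)
```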
Strengths and Limitations of Causal Comparative Methods
These designs provide a good alternative to experimental designs when requirements for an experiment are not met. They can be used to answer broad questions, such as whether participating in certain activities predicts retention, grade point average, or time to degree when compared to a similar group of non-participants. Causal comparative studies enable the evaluator to be more certain about the effects of programs on observed outcomes because there is a comparison group involved.
And, importantly for you, causal comparative studies are much more doable in higher education evaluation research than are experimental designs. They require the same outcome data on both groups. Because most colleges and universities have large student, faculty, and staff databases, causal comparative studies may be much more feasible to carry out than experimental designs. It would certainly be possible to do a pretty convincing post hoc study with several years of data to demonstrate whether UNIV101, taking into consideration important demographic characteristics, contributes significantly to the desired results (e.g., retention and time to degree) because a comparable group that did not take UNIV101 could likely be identified. Causal comparative methods allow you to establish with some degree of confidence that the program contributes to the outcome of interest (or not).
However, unless the evaluators were very thoughtful at the outset and collected the same course-specific outcome data (quiz scores and other assignment grades) from students who did not take UNIV101, which is unlikely, the comparison would have to be on some global outcomes, such as grade point average, retention, and time to degree, that exist for all students in a student database. In fact, lack of availability of common outcome data in student record systems is the reason that the outcomes for causal comparative studies are often global measures such as retention, time to degree, and grade point average. All student record systems allow you to compute these outcomes for all students. Another limitation of causal comparative studies is that, even if you can do so statistically, it is very hard to argue that participation in any one program causes outcomes such as grade point average and retention without taking into consideration many other variables that could potentially influence the outcomes.
Experimental Designs
With the exception of the one-shot posttest and pretest/posttest designs described below, the designs discussed in this section allow the evaluator to compare outcomes for program participants with those who did not participate, or with participants in different versions of the program, with the goal of making causal inferences about the effect of the program. Even though some of the designs discussed below are not often employed in higher education, it is important that you understand the different designs and what is involved in determining causation.
One Shot Posttest and Pretest-Posttest Designs
Research methods texts often refer to these designs as pre-experimental (e.g., Mertler, 2019). These are simple designs used frequently in outcome assessment in higher education, especially in student affairs, success, and co-curricular programs, so they deserve some discussion and scrutiny. When you have outcomes for participants only, these designs allow you to answer one or both of the following questions about participants in a program:
- How do participants score on a measure after participating in a program? (posttest only) OR
- Do participant scores change as a result of participating in the program as measured by pre and posttests?
Following the norm among research methods books, I am lumping one-shot posttest and pretest-posttest designs with experimental designs, but typically, as used in higher education assessment, one-shot posttest and pretest-posttest designs are employed as descriptive designs involving only program participants. In the posttest design, a single group is exposed to a treatment (e.g., a training program) and given a posttest about what the participants learned. Assuming that a unit on campus services in UNIV101 is an intervention, instructors would be using a one-shot posttest design when they give a test on campus services at the end of the unit. There is no premeasure, which is a limitation of one-shot designs. In other words, you do not know what knowledge of campus services students brought with them to UNIV101.
An extension, and slight improvement, of this design is to give the students in UNIV101 a pretest before the unit on campus services followed by a posttest. This way, you are able to look at the outcome taking pre-knowledge into consideration. The evaluator would then compare scores from the pretest with scores on the posttest. This design is stronger than the posttest only, but it still lacks a comparison group of students who did not take UNIV101. Moreover, the evaluator has no control over group membership or exposure to the treatment. A particular type of student may enroll in UNIV101, and that could influence the outcomes.
With the pretest-posttest design, one can use a variety of statistical tests, the simplest of which is the paired t-test comparing the pretest and posttest scores. You may also be interested in computing change scores. Another method determines the difference between pre- and posttest scores for subgroups (on- and off-campus residents, for example) while controlling for the pretest scores using ANCOVA. These designs do not do a very good job of considering other variables that might influence either pre- or posttest scores unless the program systematically collects and uses demographic data. The most one can conclude is that scores went up (or down) for a particular group of participants at a particular point in time. The reason for the limited power of the simple pretest-posttest design is that there may be some systematic characteristics of the participants other than the program itself that influence the outcome, namely selection bias.
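For concreteness, here is a brief sketch of the two analyses just mentioned: a paired t-test on pretest and posttest scores, and an ANCOVA comparing subgroups while controlling for pretest scores. The data file and column names (pretest, posttest, residence) are hypothetical; note that neither analysis involves a comparison group of non-participants.

```python
# A brief sketch of a paired t-test and an ANCOVA for participant-only pre/post data.
# The file and column names (pretest, posttest, residence) are hypothetical.
import pandas as pd
from scipy import stats
import statsmodels.formula.api as smf

scores = pd.read_csv("univ101_campus_services_quiz.csv")  # hypothetical participant data

# Paired t-test: did participants' scores change from pretest to posttest?
print(stats.ttest_rel(scores["posttest"], scores["pretest"]))

# ANCOVA: do posttest scores differ by residence (on/off campus) once pretest
# scores are controlled for? There is still no non-participant comparison group.
ancova = smf.ols("posttest ~ pretest + C(residence)", data=scores).fit()
print(ancova.summary())
```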
There are other limitations of one-shot posttest and pretest/posttest designs. The measures must be valid and reliable, and it is not easy to create good assessments that measure what they intend to measure and do so consistently. Moreover, the measures should capture learning or behavior outcomes, not satisfaction. In addition, participants may be high performers when they enter a program and thus might not learn much that is new. Short programs may be limited in their ability to produce significant learning. For longer programs, participants may drop out, may be subject to contamination (they learned information from other sources), or may naturally mature over the course of the program. Research methodologists generally consider one-shot posttest and pretest/posttest designs to be weak designs. I participated in an evaluation of a study abroad program that suffered from two of these problems. Students rated themselves as very self-aware on the pretest, and thus there was not much room for them to improve on the posttest. Also, the pre- and posttests asked students to rate themselves on outcomes, such as language development, that were overly ambitious to achieve in a short, ten-day to two-week study abroad program.
For these reasons, one of my main concerns about the use of pre- and posttests in higher education outcomes assessment, particularly in student affairs and success programs, is that they may communicate an unwarranted sense of precision and causation. That said, if the measures are good and collected over time, they may yield useful information for program improvement.
NOTE: To be clear, there is nothing wrong with the pre and posttest concept itself. In fact, pre and posttests are typically used to measure outcomes in simple and complex experimental designs discussed next. The key is understanding the limitations and what one can conclude from one shot posttest and pre and posttest measures when no control group is involved.
True Experimental Designs
The primary methods for establishing with the highest degree of certainty that the intervention caused the outcome are experimental designs. Experimental designs require that “the conditions being compared should be identical in all respects except for the intervention” (Rossi et al., p. 237). Experimental designs have three key characteristics: 1) random assignment of individuals from a population to 2) at least one treatment group and a control group, where group membership is the independent variable (the treatment group participates in a program and the control group does not, or participates in a different version of the program); and 3) a dependent or outcome variable that can be measured for all groups at least at the end of the intervention. Any systematic differences (e.g., grade point average differences) among participants are assumed to be randomly distributed across the two groups and thus will not affect the results. Without such carefully controlled experimental conditions, there is simply too much possibility that other variables intervene to conclude with confidence that program “x” caused outcome “y.” Other methods are less effective at ruling out all possible alternative factors that might contribute to a particular finding.
There are many types of experimental designs. This chapter focuses on the simplest. A simple experiment gives each group the same posttest. A second and even stronger design administers a pretest and posttest to both groups. In the latter, scores are compared and perhaps even a change score is computed and compared statistically between the treatment and control group.
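A minimal sketch of the corresponding analyses follows, assuming a hypothetical data file with group (treatment/control), pretest, and posttest columns: an independent-samples t-test on posttest scores, and the same test on change scores for the pretest-posttest version.

```python
# A minimal sketch of analyzing the two simple experiments described above;
# file and column names (group, pretest, posttest) are hypothetical.
import pandas as pd
from scipy import stats

data = pd.read_csv("experiment_results.csv")  # hypothetical randomized-experiment data
treat = data[data["group"] == "treatment"]
control = data[data["group"] == "control"]

# Posttest-only design: independent-samples t-test on the posttest.
print(stats.ttest_ind(treat["posttest"], control["posttest"], equal_var=False))

# Pretest-posttest design: compare change scores between the two groups.
print(stats.ttest_ind(
    treat["posttest"] - treat["pretest"],
    control["posttest"] - control["pretest"],
    equal_var=False,
))
```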
To use a specific example: if the question is whether taking UNIV101 results in a higher sense of belonging, two groups could be randomly created from incoming students, with one group assigned to UNIV101 and the other to a control group that does not take the course. Both could be given a sense of belonging scale at the beginning of the course/semester and at the end of the course/semester. Even if this were possible, there are a number of weaknesses to this approach, not the least of which is the fact that students are also participating in many other activities simultaneously that could enhance sense of belonging. In general, experimental designs are challenging to use to assess the effects of long-term programs. Other processes may contribute to the outcome: students, in particular, mature, which may account for outcome changes, and participants drop out.
There are many types of complex experimental designs used in social science research that I do not touch on here. Experimental designs typically, if not always, involve quantitative measures of the outcome of interest and employ statistical analysis to compare the results of the experimental group with those of the control group. Experimental designs typically employ pre- and posttests to gather data. As with the above example, you would have had to plan carefully to apply this design to UNIV101.
Questions to be answered by experimental methods:
- Are there differences in scores on an outcome measure between the treatment and control group? Statistically significant differences are interpreted as evidence that the intervention or treatment caused (or did not cause) the outcome.
- If the comparison groups consist of two versions of the program, experimental designs can answer the question of whether one produces better outcomes than the other.
If true experiments with random assignment to treatment and control groups are not possible, it may be possible to do what are called quasi-experimental studies.
Quasi-experimental Designs
Quasi-experimental designs involve a treatment but lack one of the key characteristics of true experiments: random assignment to a treatment and control group. Quasi-experiments involve creating relatively similar matching groups, one of which is subjected to a treatment while the other is not. The groups may be given pretests, the intervention is administered to one group, and then the groups are compared on some sort of posttest or outcome to see if there is a difference. One type of quasi-experimental design involves two intact groups (for example, two sections of the same course or two different residence halls). One section of a course is given some sort of treatment (e.g., team-based learning), and another section uses traditional lecture. Students are assessed using the same outcome measure (and perhaps the same premeasure), and outcomes are compared.
As applied to the UNIV101 example, some UNIV101 enrolled students (likely intact sections) would be considered a control group (given a placebo unit), and some (other intact sections) would be identified to get the “treatment” (a unit on campus services) and all are given the same pretest and posttest. Students are not randomly assigned to either group.
Creating the groups from intact groups and then administering the intervention is what differentiates quasi-experiments from the causal comparative studies described earlier. Mertler (2019) suggests that this type of design can take the added step of matching students from the two groups on important variables to make them somewhat equivalent. One approach involves selecting a sample of individuals from each section who are alike on some key characteristics. One section is then exposed to the treatment, the other is not (or is exposed to a different version), and a posttest is administered. Alternatively, individuals can be randomly selected within each intact group.
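As a rough illustration, the quasi-experimental comparison of two intact sections can be analyzed with an ANCOVA that adjusts for pretest scores, since students were not randomly assigned. The data file and column names (treated, pretest, posttest) are hypothetical.

```python
# A rough sketch of analyzing a quasi-experiment with two intact UNIV101 sections,
# one receiving the campus-services unit (treated = 1) and one not (treated = 0).
# File and column names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

sections = pd.read_csv("univ101_sections.csv")  # hypothetical section-level student data

# ANCOVA: compare posttest scores for treated and untreated sections while
# adjusting for pretest scores, since students were not randomly assigned.
model = smf.ols("posttest ~ treated + pretest", data=sections).fit()
print(model.summary())
```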
Quasi-experimental research is useful when random assignment is not possible, but it is possible to administer the intervention to a treatment group and compare performance to that of a similar control group that did not get treatment. You can see how this might be both possible and potentially difficult in higher education. The researcher has somewhat more control over the groups and the outcome measures than for a causal comparative study. The key to recognizing this design is that 1) participants are in somewhat similar intact groups to which members are not randomly assigned, 2) one group is exposed to a treatment and the other is not, and 3) a test is administered either post intervention only or before and after the intervention to both groups. Results are compared. This design is more feasible to use in higher education but still faces obstacles.
Challenges to Using Experiments in Higher Education
Although experiments, also sometimes called randomized controlled trials, are considered to be the gold standard in education research, especially in the preK-12 arena, there are many challenges to using them in higher education.
Choosing the best evaluation design always involves a series of tradeoffs. Gold standard though they may be, experimental designs come with some concerns. There is concern in the social science literature that much of the psychological research based on small-sample experimental designs does not hold up in replication studies (Barnett, 2016). It is also the case that experimental studies or randomized controlled trials can produce obvious conclusions. A number of years ago, I heard a professor from the University of Wisconsin report on findings from his multi-million-dollar, Institute of Education Sciences funded randomized controlled trial: the very common-sense finding that school context matters. He needed neither millions of dollars nor a randomized controlled trial to come to that conclusion! Evaluators should be cautious even about findings from experimental studies or randomized controlled trials, which are so popular in education, particularly when sample sizes are small.
True experimental designs are seldom used in higher education research and evaluation except by professors who routinely use experimental designs in their own research. The reasons for this are several:
- It is difficult to grant or withhold services from students, faculty, or staff if the services are believed to be beneficial and the student is paying tuition. Thus, it is difficult to randomly assign faculty, staff, or students to a treatment or control group, and it might be considered unethical, impossible, or politically untenable to do so. Just imagine randomly assigning incoming freshmen either to take UNIV101 or not! Or imagine that one section of UNIV101 gets a “treatment” and another does not. Parents might get angry, for example, if they knew that their son or daughter was not assigned to UNIV101 when their neighbor’s son or daughter was, on the premise that they paid the same tuition and should receive the same services, especially ones that are known to have positive effects.
- It is hard to control all necessary factors to set up an experiment when using real groups, such as courses. For example, I recently sat on the doctoral committee for a student attempting to test an intervention through a college communication studies course. Even though she had agreement from the department to do so, she could not control the conditions of who was teaching the course or the books used. In this case, some important course dimensions affecting her study changed the week before she wanted to start the study! Additionally, a course is a long-term proposition, meaning that results are subject to contamination from outside sources (students learning on their own, learning from other students, other courses, etc.) over the course of a semester.
- Unless program administrators have a strong background in research design, especially experimental studies, they are not likely to think about setting up an experiment in the program design phase. To use an experimental design, one has to randomly assign students, in the case described above, to either a treatment or a control group before the intervention starts. It is not possible to recreate true randomized groups after the program has begun. Causal comparative and quasi-experimental designs can be done after the groups have already been formed, but to do these types of studies you must have the same outcome data for both participants and non-participants. For quasi-experimental designs, you must be able to identify the groups, administer some sort of treatment to one, and collect identical data for both. For causal comparative designs, you need to have the same data for participants and non-participants for a treatment that occurred in the past and be able to construct a treatment and a control group through some sort of matching process.
One could make an argument that some version of experimental designs could be useful in some cases. Colleges and universities too rarely set up carefully designed pilot studies to study the effects of different program options. This would be a very good use of experimental, or more likely, quasi-experimental, studies involving intact groups. For example, if administrators want to know if “flipping” classrooms results in better learning, they could set up a controlled quasi-experiment in which a section of PSYCH104 students is assigned to the experimental group (flipped classroom) and another randomly selected section is assigned to the typical course format taught by the same professor using the same syllabus. This presumably would overcome some of the identified obstacles; PSYCH104 is not being withheld from anyone. Individual professors often use different pedagogical techniques, and students would not be denied standard methods of teaching the course. Creating experimental and control groups in student affairs and co-curricular areas is harder to do. In these cases, however, quasi-experimental designs using intact groups could be possible with sufficient forethought and planning.
Henning and Roberts (2024) argue that the most common type of experimental design employed in student affairs research involves a pre- and posttest design, and I agree. Unfortunately, pre- and posttests are most frequently used with participants only, limiting the conclusions that can be drawn. Unless designs involving pre- and posttests include control groups, and there is an incentive to do well on the outcome measure, they suffer from the limitations discussed earlier and should be used with caution when drawing firm conclusions about the impact of a program on participants.
A very good example of research using experimental design is the work done by the Prevention Innovations Research Center at the University of New Hampshire. This center is well known for its work on sexual assault prevention. A recent report from the center provides a very good example of the use of an experimental design to “test” several alternative approaches to sexual assault prevention (Prevention Innovations Research Center).
Colleges and universities could likely make more use of quasi-experimental designs in carefully selected places. In order to do so, program sponsors need to think about the evaluation design while they are designing the pilot program itself because it is too late to do so after the program is implemented. That is, they need to think like researchers while planning the program.
Summary
This chapter provided a brief overview of research designs involving participants and comparable non-participants, which allow the evaluator to establish with more certainty that program participation “caused” the outcome. The designs introduced in this chapter involve comparison groups. One such method falls into the category of causal comparative research. Other methods involve different types of experimental designs: one-shot posttest and pretest-posttest designs, true experimental designs, and quasi-experimental designs. Various obstacles to using experimental designs in colleges and universities were described.