[NOTE: this is the final post in a series of posts about a report recently issued based on a study done by Marzano Research Laboratory. Part I is here, Part II is here, Part III is here, and Part IV is here.]
PART V: SUMMARY AND RECOMMENDATIONS
[NOTE #2: I know, I know…I’m a couple of days late on this one. Sorry.]
Before I sum up and conclude, I should point out one other major flaw in this study. Marzano and his team use percentile ranks incorrectly. On page 18 of the report, they write: “Of particular interest is the column entitled ‘% Gain.’ Again, this column contains the percentile gain (or loss) in achievement associated with the treatment (i.e., use of Promethean technology).” Two problems here. First, percentiles are not the same as percentages (or “%,” as it is written in the report). Second, they go on to write: “This value [the percentile gain] was determined by consulting a normal curve table for the area for each reported effect size.” That conversion would be fine if the scores on the dependent variables were normally distributed, which they most definitely are not. For Marzano to go around saying that incorporating Promethean IWBs into instruction will improve student achievement by 17 percentiles is wrong on lots of levels.
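For what it’s worth, the “normal curve table” lookup the report describes is just the standard normal CDF applied to an effect size. Here is a minimal sketch of that conversion; the 0.44 effect size is my own illustrative number, chosen only because it reproduces the report’s 17-percentile figure under the very normality assumption the data don’t satisfy:

```python
from statistics import NormalDist

def percentile_gain(effect_size: float) -> float:
    """Percentile gain implied by an effect size, ASSUMING the dependent
    variable is normally distributed (the report's table-lookup method)."""
    return (NormalDist().cdf(effect_size) - 0.5) * 100

# An effect size of roughly 0.44 yields the report's "17 percentile" claim:
print(round(percentile_gain(0.44)))  # 17
```

The point is that this conversion is only as good as the normality assumption baked into it; with non-normal score distributions, the reported percentile gains are not meaningful.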
It should be clear by now that if I were reviewing the Marzano IWB study report as a manuscript submitted for publication in a peer-reviewed journal, I would reject it. I would not even mark it as “revise and resubmit.” The problems with the work are too critical and, in most cases, impossible to fix.
In summary, those problems are:
- Misuse and misapplication of meta-analysis.
- Incorrect usage of key terms.
- Serious problems with measurement validity and reliability.
- Major threats to internal validity.
Those last two points apply to each of the 85 classroom-based studies that serve as the basis for the meta-analysis. And that is the ultimate problem: the hallmark of a good meta-analysis is the use of strong criteria as decision points for including individual studies, because a meta-analysis built on weak studies simply inherits their weaknesses.
As a point of comparison, I’m linking to two reviews of research. Each is described as having used “best-evidence synthesis” which very closely resembles meta-analysis. The methods used in the studies reported in the articles below are also consistent with those used by the What Works Clearinghouse.
In the first article, you’ll notice on the seventh page of the document (p. 432 of the article) a list of criteria for inclusion. The authors of both articles also provide a list of studies that were considered for inclusion but ultimately excluded, along with the reasons for exclusion. This combined approach is critical; it gives the consumer of the research confidence that the data used in the meta-analysis come only from solid studies.
The impact of sample size for any given study included in a meta-analysis is another important point raised in the articles above. According to the authors of the second article, “[p]revious research (e.g., Rothstein et al., 2005; Slavin, 2008; Sterne, Gavaghan, & Egger, 2000; Taylor & Tweedie, 1998) has shown that studies with small sample sizes report larger effect sizes than studies with large samples.” As a result, in their meta-analysis, the authors weight the individual findings by sample size. In each of the separate sites/studies used by Marzano and his team in their meta-analysis, the sample sizes were tiny. Consider, for example, site #34, teacher #57, where there were 9 students in the control group and 5 in the treatment group. There is no way that study gets included in any decent meta-analysis.
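To make the weighting idea concrete, here is a minimal sketch of a sample-size-weighted mean effect size. The numbers are hypothetical (formal meta-analysis typically uses inverse-variance weights, but the intuition is the same): a tiny study reporting an outsized effect barely moves the weighted average once larger studies are in the mix.

```python
def weighted_mean_effect(effects, ns):
    """Sample-size-weighted mean effect size: a simple stand-in for the
    inverse-variance weighting used in formal meta-analysis."""
    total_n = sum(ns)
    return sum(d * n for d, n in zip(effects, ns)) / total_n

# Hypothetical numbers: a 14-student study (like site #34's 9 + 5 students)
# with a huge effect, next to two larger studies with modest effects.
effects = [1.2, 0.15, 0.10]
ns      = [14, 400, 350]
print(round(weighted_mean_effect(effects, ns), 3))  # 0.146
```

An unweighted average of those three effects would be about 0.48; weighting by sample size pulls the estimate down to roughly 0.15, which is why unweighted averaging over many tiny studies inflates the apparent effect.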
There is a bit of irony in my choice of articles to post as exemplars. The lead author of each of those studies is Dr. Robert Slavin, the developer/founder of Success for All. Slavin has been frequently critiqued for serving as the lead researcher/analyst/author on many evaluation studies of Success for All, the program that he created. In other words, he has been accused of producing biased research. I don’t know enough to say whether his research is biased or not; it is certainly legitimate, though, to raise the question of bias where he is involved in the research. What I do know is that each of the articles appears in one of the most well-respected, highly selective peer-reviewed journals. The math study appears in the Review of Educational Research, which publishes only top-notch reviews, syntheses, and meta-analyses in education. Thus, there is good reason to believe that those two articles are exemplars of how meta-analysis-type research should be done.
I wrote earlier that doing good, comprehensive program evaluation in education is difficult and resource-intensive. That said, I believe it would actually be reasonably easy to evaluate the impact of IWBs on student achievement. In this era of standards and accountability, in any given state, we have year-to-year state test scores (at least in math and reading/language arts) from grades 3 to 8. So, Marzano’s team could have focused on one or two grade levels in one or two subject areas in one state.
Let’s say they focused on 8th grade student achievement. All they needed to do was to find about 20 middle schools that were willing to participate. In those 20 schools, there would be one subject-area teacher teaching in a classroom with the IWB and one teacher teaching a comparable class (NOTE: comparable here refers to students who are demographically similar and who are no different with respect to student achievement at baseline) without the IWB. Surely there are at least 20 middle schools in any state where there are two 8th grade teachers teaching comparable classes.
A common way to get schools and teachers to participate in such a study would be to offer an incentive. For Promethean, the promise of a free IWB for the teacher/classroom in the control condition the year after the study would be a wonderful incentive. Given this sampling framework, Marzano’s research team could work with the schools, districts, or state departments of education to get student achievement data on the students in those 40 classrooms (20 treatment + 20 control). This could easily be done without violating any privacy laws. The students’ scores on the 7th grade state exams could serve as the pretest or the covariate. Their scores at the end of the 8th grade year would be the dependent variables. Across the 40 classrooms, we’d be talking about a total sample size of well over 800, with well over 400 students in each condition. Such a study would have lots of power. Analytic decisions would have to be made with respect to the unit of analysis. Marzano and his team could use the classroom as the unit of analysis and conduct a matched-pairs statistical test. Or, they could use the student as the unit of analysis and account for the nesting (i.e., the lack of independence among students within a classroom) by using multilevel modeling techniques. Either way, this design would be much more appropriate and powerful for estimating the effects of IWBs on student achievement.
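To give a rough sense of why this design has “lots of power,” here is a back-of-the-envelope power calculation using a normal approximation to the two-sided two-sample t-test. The effect size of 0.2 (a conventionally “small” effect) is my own illustrative choice, not a number from the report; the contrast between 400-per-group and site #34’s 9-vs-5 comparison is the point.

```python
from statistics import NormalDist

def power_two_sample(d, n1, n2, alpha=0.05):
    """Approximate power of a two-sided two-sample test for standardized
    effect size d (normal approximation; ignores t-distribution corrections,
    which matter little at these sample sizes)."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    ncp = d * (n1 * n2 / (n1 + n2)) ** 0.5  # noncentrality parameter
    return (1 - z.cdf(z_crit - ncp)) + z.cdf(-z_crit - ncp)

# Proposed design: ~400 students per condition detects even a small effect.
print(round(power_two_sample(0.2, 400, 400), 2))  # 0.81
# Site #34's 9-vs-5 comparison has almost no chance of detecting it.
print(round(power_two_sample(0.2, 9, 5), 2))      # 0.06
```

In other words, the proposed design would detect a small effect about 81% of the time, while a 9-vs-5 classroom comparison would detect it barely more often than chance. (For the student-level analysis, clustering within classrooms would reduce the effective sample size somewhat, which is exactly why the multilevel modeling mentioned above is needed.)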
In the last couple of days, I spoke about this series of posts with two professors whom I respect greatly. Interestingly, each was very surprised to hear my opinion that Marzano was affiliated with sloppy work. One said, “he’s always been so careful.” That may very well be. I don’t intend for this series to be an indictment of Marzano (or even of IWBs). My hope is that I’ve provided a sensible critique of research that is being widely disseminated.
I often lament that decisions in education are too often made in the absence of empirical evidence. I wish policymakers in education would consult research more often. However, if educational decision makers decide to invest in interactive whiteboards, I would strongly urge them to do so for reasons other than the evidence offered by Marzano Research Laboratory.