Gary Stager rightfully challenged edubloggers to follow/care/blog about the Reading First study. I’d ignored his words… until now. After reading an article by Sol Stern about the study (txs to eduwonkette for pointing me to the article), I had to comment…
You’re probably aware of the old adage, “If you’re losing the game, change the rules.” Well, I think the corollary for the social sciences is, “If you don’t like the results of a study, bash the methods.” Thus, given the unnecessarily political nature of reading instruction, and especially research on reading instruction, it was unsurprising to read a methodological critique of the recent IES-supported study of the Reading First program. It was even more predictable that the critique would come from a “scholar” at the Manhattan Institute.
What IS incredible and even ironic about Stern’s attack, in my opinion, is that he basically makes the argument against everything that the current federal administration has argued for over the last 8 or so years. And, having read a bit about Stern, I doubt he wants to do that.
A little background…
Reid Lyon and Grover “Russ” Whitehurst are two of the most influential figures in federal education policy and research circles. Lyon was the head of an influential branch of the National Institute of Child Health and Human Development (NICHD). In that capacity, he had his hands in policy circles as well as research arenas. He is often credited with being the architect of the Reading First program. Whitehurst is the Director of the Institute of Education Sciences (IES), the research arm of the U.S. Department of Education. Thus, he is the chief federal educational researcher; as such, he certainly has influence on educational policy as well. Lyon and Whitehurst are both trained as psychologists and worked in special education, particularly with children with learning disabilities. Their research and development work has probably helped advance reading outcomes for lots of children with learning disabilities.
However, they arrived in Washington and tried to push the idea that all children must learn to read the same way that children with learning disabilities learn to read. Furthermore, and more importantly for this blog post, they came to Washington with narrow conceptions of educational research. Like just about everyone I know in the field of educational research who is trained within the disciplinary traditions of psychology, they are paradigmatically disposed to experimental research designs. Thus, Lyon, Whitehurst and their ilk are chiefly responsible for the hegemony of experimental research that emanates from inside the beltway. As a result of their influence, virtually all educational research conducted and funded by the federal government must be done according to the dictates of one particular research paradigm; i.e. the “gold standard” experimental design.
So, it is incredibly ironic when Whitehurst’s agency (IES) funds/supports/directs a study of Reid Lyon’s baby, Reading First, and the initial result is that “on average across the 18 study sites, Reading First did not have statistically significant impacts on student reading comprehension test scores in grades 1-3” [that language comes directly from the IES website where the study report is located]. That irony and that result notwithstanding, the real irony may be Stern’s criticism of the study which amounts to a condemnation of experimental research in education.
Studies using experimental designs rarely employ large samples. Experimental designs, to function properly, must be “controlled” so as to avoid contamination (i.e., to protect internal validity). And carrying out a large-scale experimental study is terribly expensive and difficult. In fact, experimental research is not so concerned with sample size as it is with assignment and validity. Stern does not seem to recognize this and can’t seem to make up his mind about the sample in the Reading First study. Early in the article, he writes, “The study found that students in a small sample of Reading First schools…” So, he characterizes the final sample as “small.” Then, later, complaining about how the study methods were compromised, he writes, “Instead of 30 Reading First schools in six districts, the study would compare 128 Reading First schools in 13 states to a control group of schools that applied for Reading First but didn’t qualify for the federal grants.” So, if a sample involving 128 schools in 13 states is “small,” what would he have said if they stuck with the original sample of 30 schools? Would that have been a REALLY small sample?
Stern also writes, “…instead of using the ‘gold standard’ —random-assignment design—the study would instead compare the schools using a statistical technique known as a ‘regression discontinuity model,’ a less rigorous and comprehensive approach.” If RDM is less rigorous and comprehensive, why is it then one of the favored approaches on virtually all of the calls for proposals coming from IES these days? For example, consider page 9 of this current RFP, where, under the section called “Implementing Rigorous Designs,” in subsection A, it is written: “One approach to rigorously evaluating an intervention is to employ a regression discontinuity design (http://www.socialresearchmethods.net/kb/quasird.php).” Mr. Stern, would you like to tell Dr. Whitehurst that grant proposals put out by his agency are favoring less rigorous approaches to evaluation?
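For readers unfamiliar with the technique: a regression discontinuity design exploits the program’s own eligibility cutoff. Schools just on either side of the cutoff are assumed to be comparable, so the jump in outcomes right at the cutoff is the estimated program effect. Here is a minimal sketch in Python with entirely made-up numbers (the cutoff, effect size, and noise level are all hypothetical illustrations, not figures from the Reading First study):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: schools qualify for the program when a poverty-index
# "running variable" is at or above a cutoff (grants target needier schools).
n = 400
poverty = rng.uniform(0, 100, n)
cutoff = 50.0
treated = (poverty >= cutoff).astype(float)

# Reading outcomes decline with poverty; the program adds a jump at the cutoff.
true_effect = 5.0
reading = 80 - 0.3 * poverty + true_effect * treated + rng.normal(0, 2, n)

# Fit a separate linear trend on each side of the cutoff, then take the
# difference between the two fitted values *at* the cutoff: the RD estimate.
left = poverty < cutoff
coef_left = np.polyfit(poverty[left], reading[left], 1)
coef_right = np.polyfit(poverty[~left], reading[~left], 1)
rd_estimate = np.polyval(coef_right, cutoff) - np.polyval(coef_left, cutoff)
print(round(rd_estimate, 2))  # recovers something close to the true effect of 5
```

The point is that randomization is not the only route to a credible causal estimate: when assignment follows a known rule, the rule itself identifies the effect, which is exactly why IES lists this design among its “rigorous” options.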
Finally, and most importantly, Stern is concerned that “the study was compromised because ‘the control groups were often doing the same thing that the Reading First groups were doing.'” In methodological terms, we would refer to that problem as a threat to internal validity (eduwonkette specifically refers to this threat as SUTVA). In common sense terms, we would refer to that problem as, well…unavoidable. Do you know why? BECAUSE THESE ARE REAL SCHOOLS AND CLASSROOMS UNDER STUDY. THESE ARE NOT PRISTINE LABORATORIES WHERE EVERYTHING IS CONTROLLED BY THE RESEARCHER!!!
You see, researchers like Russ Whitehurst believe that the best research starts with randomly assigned treatment and control groups and accounts for every possible threat to internal validity. That may be fine in a small-scale study with laboratory-like conditions. However, in education, threats to internal validity abound. Teachers and students come and go. Teachers and students talk to each other about what they do. They are not independent observations. And, Mr. Stern, I know it’s hard to believe, but some teachers do very similar things to each other. One teacher may actually be teaching or using a particular curriculum very much like that of a teacher randomly assigned to a treatment group for a study. Thus, “treatments” in education may not be so distinguishable.
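To see why this kind of contamination matters, consider a toy simulation (all numbers hypothetical): if half of the “control” teachers happen to use a curriculum much like the treatment, a naive treatment-versus-control comparison recovers only about half of the true effect — the estimate is biased toward zero, i.e., toward “no statistically significant impact”:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical randomized study: 1000 classrooms, half assigned to treatment.
n = 1000
assigned_treatment = rng.integers(0, 2, n).astype(float)
true_effect = 5.0

# Contamination: some "control" teachers use a similar curriculum anyway.
contamination_rate = 0.5
gets_curriculum = assigned_treatment.copy()
controls = assigned_treatment == 0
gets_curriculum[controls] = rng.random(controls.sum()) < contamination_rate

# Outcomes depend on what teachers actually do, not on the assignment label.
outcome = 60 + true_effect * gets_curriculum + rng.normal(0, 5, n)

# Naive comparison by assignment label understates the true effect.
naive = outcome[assigned_treatment == 1].mean() - outcome[assigned_treatment == 0].mean()
print(round(naive, 2))  # roughly true_effect * (1 - contamination_rate)
```

So when Stern complains that “the control groups were often doing the same thing,” he is describing a condition that pushes the estimated effect *down* — which makes the null result less surprising, not more suspicious.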
Eduwonkette wrote that “[e]xperiments in social science are fundamentally different than experiments in medicine, and it turns out gold standard is often more silver or bronze than we would have hoped.” Exactly. And, I would add, experiments in SCHOOLS are fundamentally different from those in just about any other “industry.” Why? Because in experimental research, “noise” is a huge problem. In education, I would argue, “noise” is very often a very good thing.
So, Mr. Stern. Thanks for pointing out why IES has been naive and wrong all these years.