A Note on Greene's
"A Meta-Analysis of the Effectiveness of Bilingual Education"

by Stephen D. Krashen
University of Southern California

March 1998

Greene's "Meta-Analysis" is a short report that should have a profound impact on the field.(1) In 1996, Rossell and Baker published an analysis of the effectiveness of bilingual education and concluded that there was no evidence that bilingual programs were superior to English-only options for limited English proficient children. Greene has reworked the data Rossell and Baker analyzed using a more rigorous and precise approach called meta-analysis. In doing so, he repeats what Willig did in 1985, when she reanalyzed the results of Baker and de Kanter (1983).

Rossell and Baker used a "vote-counting" technique in their review of studies of bilingual education. If a study showed that students in bilingual classes did better than those in non-bilingual classes, bilingual education got one "vote"; if those in non-bilingual classes did better, non-bilingual education got one "vote." A problem with vote-counting is that a study counts as favoring one method even when that method is only slightly better: winning by a little and winning by a lot count the same. Meta-analysis takes this into account by assigning each study a number, the "effect size," that indicates how large the difference between the treatments was. Greene reviewed the same studies Rossell and Baker did, but calculated an effect size for each one. An effect size of zero means no difference between the groups; a positive effect size means students in bilingual education did better, and a negative effect size means students in non-bilingual groups did better.(2)
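The difference between the two ways of summarizing a study can be sketched in a few lines of Python. The three studies below are invented for illustration only, and the names `Study`, `vote`, and `effect_size` are hypothetical; the point is simply that a tiny and a large advantage earn the same vote but very different effect sizes.

```python
from dataclasses import dataclass


@dataclass
class Study:
    name: str
    mean_bilingual: float
    mean_comparison: float
    sd: float  # standard deviation used to scale the difference


def vote(study: Study) -> int:
    """Vote-counting: +1 if the bilingual group scored higher,
    -1 if the comparison group did, 0 for a tie."""
    diff = study.mean_bilingual - study.mean_comparison
    return (diff > 0) - (diff < 0)


def effect_size(study: Study) -> float:
    """Difference between the group means in standard-deviation units."""
    return (study.mean_bilingual - study.mean_comparison) / study.sd


# Hypothetical studies, invented for illustration only (not Greene's data).
studies = [
    Study("A", 50.5, 50.0, 10.0),  # bilingual group barely ahead
    Study("B", 58.0, 50.0, 10.0),  # bilingual group far ahead
    Study("C", 49.0, 50.0, 10.0),  # comparison group slightly ahead
]

votes = [vote(s) for s in studies]
sizes = [effect_size(s) for s in studies]
print("votes:", votes)  # A and B both count as a single +1
print("effect sizes:", [round(d, 2) for d in sizes])
print("mean effect size:", round(sum(sizes) / len(sizes), 2))
```

The vote tally (two to one) hides the fact that study B's advantage is sixteen times larger than study A's; the effect sizes keep that information.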

Greene's analysis differs from Rossell and Baker's in one other way: he included only studies in which the treatment lasted at least one year. He did, however, accept all of Rossell and Baker's other criteria for including a study in the analysis (use of a control group, control for initial differences between the groups or randomization, use of standardized tests in English, and use of appropriate statistical tests).(3) He found 11 eligible studies and computed the average effect size for English reading, math, and Spanish reading. The average effect sizes were positive, which means that, on average, bilingual education had a positive effect. This replicates Willig's results in her reanalysis of Baker and de Kanter's vote-counting review. Greene reported an average effect size for English reading of .21, which statisticians consider modest. It was, however, statistically significantly different from zero, which means it is unlikely to have occurred by chance. (An effect size of .21 means that the average student in the bilingual groups scored .21 standard deviations above the mean of the students in the non-bilingual groups. According to the Tomas Rivera Center, minority students score about one standard deviation below non-minority students; bilingual education, then, closes about 20% of the gap. For math, the effect size was .12, and for Spanish reading it was .74.)
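The arithmetic behind the parenthetical can be checked in Python. This sketch adds one assumption not made in Greene's report or in this note: that scores are roughly normally distributed with similar spread in both groups, which is what allows an effect size to be read as a percentile of the comparison group. The one-standard-deviation gap is the Tomas Rivera Center figure quoted above; the function names are illustrative.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def comparison_percentile(d: float) -> float:
    """Share of the comparison group scoring below the average treated student.
    ASSUMPTION: normally distributed scores with equal spread in both groups."""
    return normal_cdf(d)


def share_of_gap_closed(d: float, gap_in_sd: float = 1.0) -> float:
    """Fraction of a gap (in standard-deviation units) that an effect of size d closes."""
    return d / gap_in_sd


# Average effect sizes reported by Greene, as quoted in the text.
effects = {"English reading": 0.21, "math": 0.12, "Spanish reading": 0.74}

for outcome, d in effects.items():
    pct = 100 * comparison_percentile(d)
    print(f"{outcome}: d = {d:.2f}, about the {pct:.0f}th percentile of the comparison group")

# A one-SD minority/non-minority gap, per the Tomas Rivera Center figure.
closed = 100 * share_of_gap_closed(effects["English reading"])
print(f"English reading closes about {closed:.0f}% of a 1 SD gap")
```

Under the normality assumption, an effect size of .21 places the average student in the bilingual groups at roughly the 58th percentile of the comparison groups in English reading.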

Greene's analysis may have underestimated the effect of bilingual education, for two reasons. First, the average program duration in the 11 studies he analyzed was only two years. Cummins and others have argued that the full impact of education in the primary language is not felt until more time has gone by (see e.g. Thomas and Collier, 1997, for extensive discussion). Second, Greene did not attempt to account for the kind of bilingual education model used, and some kinds of programs are more effective than others (e.g. Legarreta, 1979). Of the three published studies he included, two (Bacon, Kidd, and Seaberg, 1982; Rossell, 1991) do not describe "bilingual education" in any detail, and one of these (Bacon et al., 1982) used paraprofessionals. In the third (Kaufman, 1968), "bilingual education" consisted of direct instruction in reading for seventh graders who could already read in English to some extent. Current theory predicts that the effects will be larger in well-designed programs, with subject matter teaching in the first language, literacy development in the first language, and quality comprehensible input in English.

According to Pyle (1998), supporters of Proposition 227, a proposal to end bilingual education in California, have criticized Greene's study because the studies it analyzed are "old" (the 11 studies date from 1968 to 1991). It must be pointed out, however, that Greene simply reanalyzed the studies considered in Rossell and Baker's review, a review that critics of bilingual education have applauded. In addition, I computed the correlation between each study's year of publication and its reported effect size: for the ten studies that reported an effect size for English reading, the correlation was close to zero (r = .04), meaning that earlier and later studies had similar effects.
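The year-versus-effect-size check is an ordinary Pearson correlation, sketched below. The name `pearson_r` is illustrative, and the ten (year, English reading effect size) pairs are not reproduced in this note, so they would have to be taken from Greene's report and passed in as two lists, e.g. `pearson_r(years, english_effects)`.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items each")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    # Sum of cross-products and sums of squares about the means.
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)
```

An r of .04 means that publication year accounts for well under 1% of the variation in effect size (r squared is about .002).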


1. Greene's report is also discussed in a "Policy Brief" issued by the Tomas Rivera Center, March, 1998.

2. For those who survived Statistics 1, here is how an effect size is computed. You take the mean of the experimental group, subtract the mean of the comparison group, and divide the result by the standard deviation of the comparison group (or by the pooled standard deviation of both groups). This effect size can be converted to a correlation coefficient by a simple formula. By far the clearest introductions I have found to the calculation of effect sizes and their use in meta-analyses are Wolf (1986) and Light and Pillemer (1984). (It should also be noted that Greene used a technique that allowed him to take the sample size of each study into consideration.)

3. Greene notes that I "proposed that Rossell and Baker include additional studies favorable to bilingual education even though they do not meet the criteria" (p. 3). This is not quite what I proposed (Krashen, 1996). I only pointed out that while Rossell and Baker excluded a number of studies for not randomizing or otherwise controlling for existing differences among groups, we have no reason to suspect that such differences exist, and that when a number of studies like this are done, the result is a kind of randomization. Their results should not be ignored. I agree with Greene, however, that meta-analyses should not include them (or should treat them as a separate group). In addition, I pointed out that for many of the studies Rossell and Baker accepted that favored non-bilingual groups, sample sizes were very small, the duration of treatment was very short, or the comparison itself was not valid (a comparison of different kinds of Canadian immersion).
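The effect-size recipe in note 2 can be sketched in Python. The function names are illustrative. The conversion r = d / sqrt(d^2 + (n1 + n2)^2 / (n1 * n2)) and the large-sample variance of d are standard textbook formulas; the inverse-variance (fixed-effect) weighting is one common way to let larger studies count for more, assumed here for illustration. It is not necessarily the technique Greene used.

```python
import math


def glass_delta(mean_exp, mean_ctrl, sd_ctrl):
    """Effect size scaled by the comparison (control) group's standard deviation."""
    return (mean_exp - mean_ctrl) / sd_ctrl


def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two groups."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def cohens_d(mean_exp, sd_exp, n_exp, mean_ctrl, sd_ctrl, n_ctrl):
    """Effect size scaled by the pooled standard deviation of both groups."""
    return (mean_exp - mean_ctrl) / pooled_sd(sd_exp, n_exp, sd_ctrl, n_ctrl)


def d_to_r(d, n1, n2):
    """Convert d to a correlation; reduces to d / sqrt(d^2 + 4) when n1 == n2."""
    a = (n1 + n2) ** 2 / (n1 * n2)
    return d / math.sqrt(d ** 2 + a)


def d_variance(d, n1, n2):
    """Large-sample approximation to the sampling variance of d."""
    return (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))


def weighted_mean_effect(studies):
    """Inverse-variance (fixed-effect) mean of (d, n1, n2) tuples.

    Returns the weighted mean and a z statistic for testing it against zero.
    ASSUMPTION: one common weighting scheme, not necessarily Greene's.
    """
    weights = [1 / d_variance(d, n1, n2) for d, n1, n2 in studies]
    total = sum(weights)
    mean = sum(w * d for w, (d, _, _) in zip(weights, studies)) / total
    se = math.sqrt(1 / total)  # standard error of the weighted mean
    return mean, mean / se
```

With equal group sizes the conversion reduces to r = d / sqrt(d^2 + 4), so an effect size of .21 corresponds to a correlation of about .10. In the weighted mean, a study with a large sample has a small sampling variance and therefore a large weight, which is how sample size enters the average.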

