A Note on Greene's
"A Meta-Analysis of the Effectiveness of Bilingual Education"
by Stephen D. Krashen
University of Southern California
Greene's "Meta-Analysis" is a short
report that should have a profound impact on the field.(1)
In 1996, Rossell and Baker published an analysis of the effectiveness of
bilingual education, and concluded that there was no evidence that bilingual
programs were superior to English-only options for limited English proficient
children. Greene has reworked the data Rossell and Baker analyzed, applying
a more rigorous and precise approach called meta-analysis, thus repeating
what Willig did in 1985, when she reanalyzed the results of Baker and de
Kanter (1983).
Rossell and Baker used a "vote-counting" technique in their
review of studies of bilingual education. If a study showed that students
in bilingual classes did better than those in non-bilingual classes, bilingual
education got one "vote," and if those in non-bilingual classes
did better, non-bilingual education got one "vote." A problem with vote-counting
is that a study can be counted as favoring one method even when that method
is only slightly better. Winning by a little and winning by a lot count the same.
Meta-analysis takes this into consideration by assigning to each study
a number that indicates the size of the effect, that is, how big the difference
was between the treatments. Greene reviewed the same studies Rossell and Baker
did, but calculated the "effect size" for each study. An effect
size of zero means no difference between the groups. A positive effect
size means that students in bilingual education did better; a negative effect
size means that students in non-bilingual groups did better.(2)
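The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration: the effect sizes below are invented for the example and are not Greene's data, and the function names are hypothetical. The sketch shows how a vote count can favor one side even though the average effect size favors the other.

```python
from statistics import mean

# Hypothetical per-study effect sizes (positive favors bilingual education).
# These numbers are invented for illustration; they are not Greene's data.
effect_sizes = [0.60, 0.45, 0.50, -0.02, -0.03, -0.01, -0.04]


def vote_count(effects):
    """Tally one vote per study, ignoring how large each difference is."""
    votes = {"bilingual": 0, "non_bilingual": 0, "tie": 0}
    for d in effects:
        if d > 0:
            votes["bilingual"] += 1
        elif d < 0:
            votes["non_bilingual"] += 1
        else:
            votes["tie"] += 1
    return votes


def mean_effect_size(effects):
    """Average the effect sizes, so the size of each difference counts."""
    return mean(effects)


votes = vote_count(effect_sizes)
average = mean_effect_size(effect_sizes)

# The vote count favors non-bilingual programs (4 votes to 3), while the
# average effect size is positive, favoring bilingual education.
print(votes)
print(round(average, 3))
```

In this made-up case, four studies show tiny differences in one direction and three show large differences in the other; only the effect-size average reflects that.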
Greene's analysis differs in another way from Rossell and Baker's: he
included only studies with a treatment of at least one year. He did, however,
accept all of Rossell and Baker's other criteria for whether a study was
included in the analysis (use of a control group, control for initial differences
in the groups or randomization, use of standardized tests in English, use
of appropriate statistical tests).(3) He
found 11 studies that were eligible for analysis and computed the average
effect size for English reading, math, and Spanish reading. Each average
effect size was positive, which means that, on average, bilingual education
had a positive effect. This replicates Willig's results in her reanalysis
of Baker and de Kanter's vote-counting review. Greene reported an average
effect size for English reading of .21, which statisticians consider
modest. It was, however, statistically significantly different from
zero, which means it is unlikely to have occurred by chance. (An effect size
of .21 means that the average student in the bilingual groups scored .21
standard deviations above the average student in the non-bilingual groups.
According to the Tomas Rivera Center, minority students score about one
standard deviation below non-minority students; bilingual education, then,
makes up about 20% of the gap. For math, the effect size was .12, while
for Spanish reading it was .74.)
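The arithmetic behind the "20% of the gap" figure, and one common way of reading an effect size as a percentile, can be sketched as follows. The percentile interpretation is an added assumption, not something Greene reports: it holds only if scores in both groups are roughly normally distributed with similar spread. The function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def share_of_gap(effect_size, gap_in_sd):
    """Fraction of a gap (in standard deviations) covered by an effect size."""
    return effect_size / gap_in_sd


def percentile_in_comparison_group(effect_size):
    """Percentile of the comparison group reached by the average treated
    student, ASSUMING normally distributed scores with equal spread."""
    return 100.0 * normal_cdf(effect_size)


# Figures from the text: effect size .21, gap of about one standard deviation.
share = share_of_gap(0.21, 1.0)
percentile = percentile_in_comparison_group(0.21)

print(f"share of gap: {share:.0%}")
print(f"approximate percentile: {percentile:.1f}")
```

Under those assumptions, an effect size of .21 places the average student in the bilingual groups at roughly the 58th percentile of the non-bilingual groups, rather than the 50th.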
Greene's analysis may have underestimated the effect of bilingual education.
First, the average program duration in the 11 studies he analyzed was
only two years. Cummins and others have argued that the full impact of
education in the primary language is not felt until more time has gone
by (see, e.g., Thomas and Collier, 1997, for extensive discussion). Second,
Greene did not attempt to account for the kind of bilingual education model
used; some kinds of programs are more effective than others (e.g., Legarreta,
1979). Of the three published studies he included, "bilingual education"
is not described in any detail in two (Bacon, Kidd, and Seaberg,
1982; Rossell, 1991); in one, paraprofessionals were used (Bacon et al.,
1982); and in another, "bilingual education" was direct instruction
in reading for seventh graders who could already read in English to some
extent (Kaufman, 1968). Current theory predicts that in well-designed programs,
with subject matter teaching in the first language, literacy development
in the first language, and quality comprehensible input in English, the
effects will be larger.
According to Pyle (1998), supporters of Proposition 227, a proposal
to end bilingual education in California, have criticized Greene's study
because the studies are "old" (the 11 studies date from 1968 to
1991). It must be pointed out, however, that Greene simply reanalyzed the
studies considered by Rossell and Baker, whose review anti-bilingual education
critics have applauded. In addition, I ran a correlation between year of
publication of the study and the effect size reported: the correlation
was close to zero (r = .04) for the ten studies that reported an effect
size for English reading, meaning that earlier studies and later studies
reported similar effects.
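A check of this kind can be sketched with a plain Pearson correlation. The years and effect sizes below are invented for illustration; they are not the ten studies analyzed here, and the function name is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical publication years and effect sizes, NOT the actual studies.
years = [1968, 1975, 1980, 1985, 1991]
effects = [0.3, 0.1, 0.2, 0.1, 0.3]

r = pearson_r(years, effects)
print(round(r, 2))
```

A value of r near zero, as in this made-up example, indicates no linear tendency for newer studies to report larger or smaller effects than older ones.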
Bacon, H., Kidd, G. and Seaberg, J. 1982. The effectiveness of bilingual
instruction with Cherokee Indian students. Journal of American Indian
Education, February: 34-43.
Baker, K. and de Kanter, A. 1983. Federal policy and the effectiveness
of bilingual education. In K. Baker and A. de Kanter (Eds.) Bilingual
Education. Lexington, MA: DC Heath. pp. 33-85.
Greene, J. 1998. A Meta-Analysis of the Effectiveness
of Bilingual Education. Claremont, CA: Tomas Rivera Policy Institute.
Kaufman, M. 1968. Will instruction in reading Spanish affect ability
in reading English? The Journal of Reading, 11: 521-527.
Krashen, S. 1996. Under Attack: The Case Against Bilingual Education.
Culver City: Language Education Associates.
Legarreta, D. 1979. The effects of program models on language acquisition
by Spanish-speaking children. TESOL Quarterly, 13 (4):521-534.
Light, R. and Pillemer, D. 1984. Summing Up: The Science of Reviewing
Research. Cambridge: Harvard University Press.
Pyle, A. 1998. Opinions vary on studies that back bilingual classes.
Los Angeles Times, March 2, 1998, B1, B3.
Rossell, C. 1990. The effectiveness of educational alternatives for
limited English-proficient children. In G. Imhoff (Ed.) Learning in
Two Languages. New Brunswick: Transaction. pp. 71-121.
Rossell, C. and Baker, K. 1996. The educational effectiveness of bilingual
education. Research in the Teaching of English, 30: 7-74.
Thomas, W. and Collier, V. 1997. School Effectiveness for Language
Minority Students. Washington, DC: National Clearinghouse for Bilingual
Education.
Willig, A. 1985. A meta-analysis of selected studies on the effectiveness
of bilingual education. Review of Educational Research, 55: 269-317.
Wolf, F. 1986. Meta-Analysis. Thousand Oaks, CA: Sage Publications.
1. Greene's report is also discussed in a "Policy
Brief" issued by the Tomas
Rivera Center, March, 1998.
2. For those who survived Statistics 1, here is how
effect size is computed. Take the mean of the experimental group,
subtract the mean of the comparison group, and divide the difference by
the standard deviation of the comparison group (or the pooled standard deviation
of both groups). This "effect size" can be converted to a correlation
coefficient by a simple formula. By far the clearest introductions I have
found to the calculation of effect sizes and their use in meta-analyses
are Wolf (1986) and Light and Pillemer (1984). (It should also be noted
that Greene used a technique that allowed him to take the sample size
of each study into consideration.)
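The computation described in note 2 can be sketched as follows. Several pieces are assumptions rather than anything stated here: the conversion to a correlation uses the common textbook formula r = d / sqrt(d^2 + 4), which presumes groups of equal size, and the sample-size weighting is a simple weighted average that is not necessarily the technique Greene used. All function names and input numbers are hypothetical.

```python
import math


def effect_size(mean_experimental, mean_comparison, sd):
    """Difference between group means, in standard deviation units."""
    return (mean_experimental - mean_comparison) / sd


def pooled_sd(sd_1, n_1, sd_2, n_2):
    """Pooled standard deviation of two groups."""
    numerator = (n_1 - 1) * sd_1 ** 2 + (n_2 - 1) * sd_2 ** 2
    return math.sqrt(numerator / (n_1 + n_2 - 2))


def effect_size_to_r(d):
    """Convert an effect size to a correlation (ASSUMES equal group sizes)."""
    return d / math.sqrt(d ** 2 + 4)


def weighted_mean_effect(effects, sample_sizes):
    """Average effect sizes, weighting each study by its sample size.
    A simple illustration; not necessarily Greene's weighting method."""
    total = sum(sample_sizes)
    return sum(d * n for d, n in zip(effects, sample_sizes)) / total


# Invented numbers for illustration only.
d = effect_size(52.0, 50.0, 10.0)          # 2 points on a test with SD 10
sd_both = pooled_sd(10.0, 30, 10.0, 30)    # equal SDs pool to the same SD
r = effect_size_to_r(d)
avg = weighted_mean_effect([0.1, 0.5], [300, 100])

print(round(d, 2), round(r, 3), round(avg, 2))
```

Weighting by sample size keeps a small study with a large effect from counting as much as a large study with a small one.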
3. Greene notes that I "proposed that Rossell
and Baker include additional studies favorable to bilingual education even
though they do not meet the criteria" (p. 3). This is not quite what
I proposed (Krashen, 1996). I only pointed out that while Rossell and Baker
excluded a number of studies for not randomizing or otherwise controlling
for existing differences among groups, we have no reason to suspect such
differences exist, and that when a number of studies like this are done,
one has a kind of randomization. Their results should not be ignored. I
agree, however, with Greene that meta-analyses should not include them
(or should deal with them as a separate group). In addition, I pointed
out that for many of the studies Rossell and Baker accepted that favored
non-bilingual groups, sample sizes were very small or the duration of treatment
was very short or the comparison itself was not valid (comparison of different
kinds of Canadian immersion).
COPYRIGHT NOTICE: Copyright © 1998 by
Stephen D. Krashen. All rights reserved.