Another Response to Keith Baker
by Stephen Krashen
University of Southern California
This paper continues a discussion of the merits of structured immersion,
begun by Baker (1998), who claimed that structured English immersion programs
have been successful. In my response to his paper (Krashen, 1999), I argued
that this claim was not supported by the research. Much of the research
Baker cited was unpublished and not available to readers, and in cases
in which data was available, his report of this data was inaccurate. Baker
(1999) responded to me and to Meier (1999). In this paper, I first respond
to points made in Baker (1999) on the issue of the claimed success of structured
immersion. Second, I respond to Mr. Baker's criticism of my work.
Part One: On the "Success" of Structured Immersion
In Baker (1998), structured immersion is defined as a program for limited
English proficient children in which:
- English "is used and taught at a level appropriate to the class
of English learners" (p. 199). In other words, there is an attempt
to make the input comprehensible (Krashen, 1982).
- Teachers "are oriented toward maximizing instruction in English
and use English for 70% to 90% of instructional time, averaged over the
first three years of instruction" (p. 199). Baker does not specify
how the first language is used.
I review here cases in which it was claimed that children in structured
immersion outperform comparison children in bilingual education. My view
is that none of these studies provides clear evidence that this is the case.
Uvalde
Baker (1999) continues to insist that children in structured immersion
in Uvalde, Texas, outperformed comparison children in bilingual education.
I pointed out that children in structured immersion only reached the 30th
percentile in grade 3, and then dropped to the 15th and 16th percentiles
in grades 5 and 6, a dismal performance. Baker (1999) claims that this
poor performance is irrelevant because "LE (late exit bilingual) children
did even worse" (p. 709). But a careful reading of the report (Becker
and Gersten, 1982; see also Gersten and Woodward, 1985) reveals no evidence
that the comparison group in this case had bilingual education. The decline
in scores of the structured immersion group remains serious counterevidence
to the claim that structured immersion has been successful.
Unnamed California District
This is another case in which it is claimed that structured immersion
children did better than those in bilingual education. The original report
is Gersten (1985). I pointed out that sample sizes were small in this study.
There were 28 children in the "immersion" group and 16 in bilingual
education. I also noted that no SES data was reported, that the bilingual
program was not described in any detail, and that the bilingual program
was highly unusual: The bilingual group included speakers of Korean, Vietnamese,
Samoan and Thai. I know of no district that provides bilingual education
in all of these languages. Baker (1999) addresses only the issue of sample
size, ignoring my other comments; he argues that statistical significance,
not sample size, is what counts.
But there is a problem with this study. Gersten only compared the number
of immersion and bilingual students who performed at or above grade level
after grade 2. He found that more immersion students did so on the CTBS
Reading (75%, or 21), compared to the bilingual students (19%, or 3). This
is statistically significant. But we do not know how close the bilingual
students came to the criterion, and how far above it the immersion children
were. If three fewer children in the immersion group and three more in
the bilingual group had made grade level, the chi-square test would not
have reached significance. The Language CTBS scores are even more fragile:
20 of 28 immersion students and 7 of 16 bilingual students made grade level.
If one more child in the bilingual education group had reached grade level,
the difference would not have
been statistically significant. Conclusions in favor of a program should
be made of sterner stuff.
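The fragility argument can be checked with a few lines of arithmetic. The sketch below assumes a Pearson chi-square test on a 2x2 table (df = 1, no continuity correction) at the .05 level; the report does not say which test Gersten used, so that choice, and the helper name `chi_square_2x2`, are illustrative assumptions. The counts are the ones given above.

```python
# Sketch: Pearson chi-square (df = 1, no continuity correction) on a 2x2 table.
# Assumption: this is the kind of test behind "statistically significant";
# Gersten (1985) may have used a different test or a correction.

CRITICAL_05_DF1 = 3.841  # chi-square critical value, alpha = .05, df = 1


def chi_square_2x2(a_hit: int, a_n: int, b_hit: int, b_n: int) -> float:
    """Chi-square for two groups, each split into at/above vs. below grade level."""
    observed = [
        [a_hit, a_n - a_hit],  # immersion: at/above, below
        [b_hit, b_n - b_hit],  # bilingual: at/above, below
    ]
    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]
    total = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat


cases = {
    "Reading, as reported (21/28 vs 3/16)": (21, 28, 3, 16),
    "Reading, three children shifted (18/28 vs 6/16)": (18, 28, 6, 16),
    "Language, one more bilingual child (20/28 vs 8/16)": (20, 28, 8, 16),
}

for label, counts in cases.items():
    stat = chi_square_2x2(*counts)
    verdict = "significant" if stat > CRITICAL_05_DF1 else "not significant"
    print(f"{label}: chi-square = {stat:.2f} ({verdict} at .05)")
```

Under these assumptions the reported reading counts give a statistic of about 12.99, while shifting three children gives about 2.95, below the critical value of 3.841; the language table with one more bilingual child at grade level gives about 2.02.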
Gersten makes it clear that he had to make do with the data he had.
He also points out that only higher achieving bilingual education students
were tested, while all immersion students were tested. Nevertheless, one
cannot use this study as strong evidence that immersion is effective.
El Paso
Baker claims that structured immersion was also a winner over bilingual
education in El Paso. I noted that the El Paso "immersion" program
contained 60 to 90 minutes of Spanish instruction daily, not thirty minutes,
as Baker claimed. Baker's response: "Whether Spanish was used 30,
60, or 90 minutes a day is not the point" (p. 709). He feels the point
is that there was less Spanish than the amount "advocates of bilingual
education maintain is absolutely essential" and that children in the
program did better than those in a "standard" bilingual education program.
I think there are several points about the El Paso "immersion" program worth noting:
- The first language was used where it did the most good, in "the
more demanding content areas." This approach is similar to that used
in the "gradual exit" program described in Krashen (1996), in
which the first language is used for all subjects (except ESL and art,
music and physical education) at first. At a later stage, the first language
is used for those subjects that are difficult to make comprehensible for
those limited in English (social studies and language arts), while English
is used in those subjects that are easier to contextualize (math, science).
Gradually, students do more and more of their work in English. The El Paso
report does not tell us how much of the first language was used in the "standard"
program; it may well have been more than the amount used in the "immersion"
program. We do not know, however, whether the first language was used as efficiently
in the standard program as it was in the immersion program.
- The "immersion" program also utilized Natural Approach for
ESL, sheltered subject matter teaching, and adopted a whole language philosophy.
In contrast, the standard bilingual program did not have sheltered subject
matter teaching and used a bottom-up approach to reading. My point is that
the comparison made was between two versions of bilingual education, not
between structured immersion and bilingual education, and that the programs
differed in important ways.
Cummins (1999) comments that Baker has argued
on both sides of this issue. While Baker now considers the El Paso program
to be a structured immersion program whose success shows "the harm
that bilingual education programs do to learning English" (Baker,
1998), in 1992 Baker referred to the same program as Spanish-English dual
immersion, and interpreted the results differently: "The El Paso study
supports the claims of bilingual education advocates that most bilingual
education programs do not use enough of the native language" (Baker,
1992, cited in Cummins, 1999).
Texas Education Agency
Baker claims victory for structured English immersion over bilingual
education on the basis of a report done by the Texas Education Agency
in the 1980s. Baker provides no citation for this report. In Krashen (1999)
I produced evidence that one report issued by the Texas Education Agency
at that time was actually a comparison between bilingual education and
ESL, and showed slight advantages for bilingual education. Both programs
contained a significant amount of content-based ESL teaching. Baker only
says that this report "appears not to be the one I mentioned," but that
his copy of the report "has disappeared" (p. 709). Yet he does not hesitate
to claim it supports structured English immersion over bilingual education.
McAllen
Baker (1999) says that in his original Kappan paper (Baker, 1998) he
forgot to include an additional study showing structured immersion to be
superior to LE (late-exit bilingual education), a "very effective
project in McAllen, Texas" (p. 707). Baker also claimed that Willig
(1985) had erroneously classified this program as late-exit bilingual education.
False. Willig classified the special program in McAllen as "alternate
immersion," not late-exit bilingual education (p. 284). Willig deals
with the McAllen data as a comparison of two versions of bilingual education,
alternate immersion versus concurrent translation. The program referred
to by Baker as structured immersion contained one period per day of Spanish
reading: for kindergarten children this amounted to half the instructional
time (the total evaluation covered kindergarten and grade one). The explicit
goal of the program was biliteracy. As Willig (1987) notes, the only report
issued on McAllen by the district is a few mimeographed sheets in which
it is stated that the goals of the program include acquisition of English
and the continued development of the native language. Willig also points
out that students in the so-called immersion program outperformed "bilingual
education" comparisons in Spanish as well as English, confirming that
an emphasis was placed on Spanish language development.
I said that many of Baker's claims were based on unpublished data.
Specifically, several victories claimed for structured immersion are based
on unpublished reports: Yap and Enoki, 1988; Webb, Clerc, and Gavito, 1987;
and Baker's own Seattle study. Rather than supplying the reader with details
about these studies in his response to me and Meier, Baker (1999) only
says that rejecting unpublished data is "so absurd that it warrants
no comment" and then accuses me of citing unpublished data, claiming
I did so in my notes 1 through 4 and note 10 of my response (Krashen, 1999).
All of the citations in my footnotes are in the public domain, none is
unpublished, and several are in refereed journals. Baker's citations are
not only unpublished, but they are not available to readers: Yap and Enoki
is a conference presentation, which means it is not available unless one
happens to know where Yap and Enoki work. Webb et al. is only described
as a "HID mimeo." Baker's Seattle study is not even cited in
A few other points:
Who Is a Dilettante?
In his response to Meier (1999), Baker says that "Meier's contention
that certain variables must be controlled ... is common among dilettante
methodologists. But it is wrong" (p. 708).
Baker cites Deming (1975), an article in a collection by Struening (Ed.),
as a source for this claim. I was curious to see whether Deming really said
that controlling variables is only for dilettantes, and tried to order
the book. I finally managed to locate a copy through Booksource, Ltd. I
read Deming's paper and it does not say what Baker says it says. In fact,
on page 56, Deming explicitly recommends "statistical controls."
I would recommend to Mr. Baker that he go back and check Deming's article,
but I am sure that he will say that his copy has disappeared, along with
the TEA report mentioned just above. I know for a fact that Keith Baker
no longer has a copy of Deming's paper, which explains why he didn't remember
what was in it. The copy I received from Booksource was a used copy. The
original owner wrote his name on the inside cover. It was Keith Baker.
In his citation, Baker neglected to include the name of the second editor
of this volume (M. Guttentag), and the title was incomplete.
Baker claims that in Willig's study (Willig, 1985), the overall effect
of bilingual education on English was negative, and that "Willig's
analysis shows that bilingual education harms children" (p. 707).
False: Willig reported both mean effect sizes and adjusted mean effect
sizes, the latter controlled for the method used to calculate effect size,
use of random assignment, and type of score used (e.g., raw scores or
percentiles). For 62 comparisons of bilingual education versus submersion on tests
of English reading and vocabulary, the unadjusted effect size was .05,
the adjusted effect size was .20. For 85 comparisons of "total language"
in English, the effect sizes were .01 (unadjusted) and .21 (adjusted).
Thus, students in bilingual education did at least as well as comparisons,
or a bit better. For additional discussion, see Willig (1987).
It is extremely interesting that another meta-analysis of studies comparing
bilingual education to second-language only approaches found very similar
results. Greene (1998) found an average effect
size of .21 for English reading favoring students in bilingual education,
based on eleven studies. Only four of these studies were in Willig's analysis.
Part Two: Response to Criticisms of Krashen and Biber (1988)
Instead of providing detail about the unpublished studies in his articles,
Baker (1999) devotes a considerable amount of space to attacking a monograph
published ten years ago reviewing the accomplishments of bilingual education
in California. This monograph has no bearing on the issue under discussion
because we did not cover structured English immersion at all.
1. Baker claims that Biber and I concluded that the students we studied
"typically score at grade level ... (p. 65)" (Baker, p. 709).
We did not say that on page 65 or anywhere else. We said that "students
in the bilingual programs studied here reach, or come close to national
norms by grade 6" (p. 65).
2. Baker complains that we present scores "in only 45 of 136 cells"
in our data matrix (p. 709). We reported all the data we had, all the data
available to us that was relevant to the issue. No data available to us
was withheld. Empty cells were simply an artifact of the manner of presentation
of our summary data.
3. Baker notes that of 45 reported scores, only 15 reached grade level.
Baker arrived at his figure by counting all the scores at the 50th percentile
and above in our reading and language arts CTBS summary table on page 65.
Excluding results from grades 1, 2, and 3, all eleven language arts scores
are at the 44th percentile or higher. Excluding one of the Baldwin Park
scores (the higher one), because it represents a sample that is a subset
of the other, the mean score is 48.9. For reading, grades 4 through 8,
ten of 18 scores are above the 42nd percentile. All low scores come from
the San Diego cohort, which eventually reached the 51st percentile, and
from Rockwood sixth graders, who showed vast improvement from their third
grade scores, and who scored well above district norms on the CAP. Eastman's
only score below the 40th percentile (grade 4) was higher than city and
local norms. This is very impressive. If Gersten (1985) is typical, structured
immersion students don't even come close to this accomplishment. Recall
that structured immersion students were at the 15th and 16th percentiles
in grades 5 and 6 in Uvalde.
4. Baker claims that "in no year did the number of classrooms scoring
'at or above the norm' exceed chance expectations" (p. 709). Baker
apparently assumes that LEP or former LEP students should be at national
norms. There are many reasons why these children, who in general come from
lower socioeconomic backgrounds, do not always reach national norms. The issue
is whether children in these programs do as well as or better than children
in other programs, a point Baker (1999) makes in discussing the Uvalde
results. In addition, as noted above, these children are quite close to
national norms, and exceed them in mathematics.
5. Baker claims that we used old norms for the CAP. But our comparison
groups took the same test at the same time as our experimental subjects
did. Sixth graders at the Eastman school, for example, scored 47 points
below the city norm in 1982 and 11 points below it in 1983, before the
revised bilingual plan was installed. Sixth graders who experienced the
new bilingual program did better, scoring five points below the norm in
1984, equal to it in 1985, and three points above it in 1986. CAP scores
were also used to evaluate bilingual education in Rockwood. Here again,
students are compared to others who took the same test at the same time.
All cohorts of Rockwood students we studied scored above district norms.
6. Baker claims that our results for one school, Eastman, could have
been due to the "26 major curricular and program changes" that
took place at that time. Could be, but Baker provides no description of
these changes nor does he provide any citation for them.
The flaws in Baker's arguments remain. Much of the data he cites supporting
the efficacy of structured immersion is unpublished and not available.
When the data cited is available, it does not say what he says it says.
Baker, Keith. 1998. Structured English immersion: Breakthrough in teaching
limited-English-proficient students. Phi Delta Kappan 80(3): 199-204.
Baker, Keith. 1999. How can we best serve LEP students? A reply to Nicholas
Meier and Stephen Krashen. Phi Delta Kappan 80(9): 707-10.
Becker, Wesley, and Gersten, Russell. 1982. A follow up of Follow Through:
The later effects of the direct instruction model on children in fifth
and sixth grades. American Educational Research Journal 19: 75-92.
Cummins, Jim. 1999. Research, ethics, and public
discourse: The debate on bilingual education. Presentation at the National
Conference of the American Association of Higher Education, March 22, 1999,
Deming, W. Edwards. 1975. The logic of evaluation. In Elmer Struening
and Marcia Guttentag (Eds.), Handbook of evaluation research, Vol.
1 (pp. 53-68). Beverly Hills, CA: Sage.
Gersten, Russell. 1985. Structured immersion for language minority students:
Results of a longitudinal evaluation. Educational Evaluation and Policy
Analysis 7: 187-96.
Gersten, Russell and Woodward, John. 1985. A case for structured immersion.
Educational Leadership 43: 75-79.
Greene, Jay. 1998. A meta-analysis of the effectiveness
of bilingual education. Claremont, CA: Tomas Rivera Policy Institute.
Krashen, Stephen. 1982. Principles and practice in second language
acquisition. New York: Prentice Hall.
Krashen, Stephen. 1996. Under attack: The case against bilingual
education. Culver City, CA: Language Education Associates.
Krashen, Stephen. 1999. What the research really says about structured
English immersion: A response to Keith Baker. Phi Delta Kappan 80(9):
Meier, Nicholas. 1999. A fabric of half-truths: A response to Keith
Baker on structured English immersion. Phi Delta Kappan 80(9): 704,
Willig, Ann. 1985. A meta-analysis of selected studies on the effectiveness
of bilingual education. Review of Educational Research 55: 269-317.
Willig, Ann. 1987. Examining bilingual education research through meta-analysis
and narrative review: A response to Baker. Review of Educational Research