Another Response to Keith Baker

by Stephen Krashen
University of Southern California

This paper continues a discussion of the merits of structured immersion, begun by Baker (1998), who claimed that structured English immersion programs have been successful. In my response to his paper (Krashen, 1999), I argued that this claim was not supported by the research. Much of the research Baker cited was unpublished and not available to readers, and in cases in which data was available, his report of this data was inaccurate. Baker (1999) responded to me and to Meier (1999). In this paper, I first respond to points made in Baker (1999) on the issue of the claimed success of structured immersion. Second, I respond to Mr. Baker's criticism of my work.

Part One: On the "Success" of Structured Immersion

In Baker (1998), structured immersion is defined as a program for limited English proficient children in which:

  1. English "is used and taught at a level appropriate to the class of English learners" (p. 199). In other words, there is an attempt to make the input comprehensible (Krashen, 1982).
  2. Teachers "are oriented toward maximizing instruction in English and use English for 70% to 90% of instructional time, averaged over the first three years of instruction" (p. 199). Baker does not specify how the first language is used.

I review here cases in which it was claimed that children in structured immersion outperform comparison children in bilingual education. My view is that none of these studies provides clear evidence that this is the case.

Baker (1999) continues to insist that children in structured immersion in Uvalde, Texas, outperformed comparison children in bilingual education. I pointed out that children in structured immersion only reached the 30th percentile in grade 3, and then dropped to the 15th and 16th percentiles in grades 5 and 6, a dismal performance. Baker (1999) claims that this poor performance is irrelevant because "LE (late exit bilingual) children did even worse" (p. 709). But a careful reading of the report (Becker and Gersten, 1982; see also Gersten and Woodward, 1985) reveals no evidence that the comparison group in this case had bilingual education. The decline in scores of the structured immersion group remains serious counterevidence to the claim that structured immersion has been successful.

Unnamed California District
This is another case in which it is claimed that structured immersion children did better than those in bilingual education. The original report is Gersten (1985). I pointed out that sample sizes were small in this study. There were 28 children in the "immersion" group and 16 in bilingual education. I also noted that no SES data was reported, that the bilingual program was not described in any detail, and that the bilingual program was highly unusual: The bilingual group included speakers of Korean, Vietnamese, Samoan and Thai. I know of no district that provides bilingual education in all of these languages. Baker (1999) only addresses the issue of sample size, ignoring my other comments, noting that it is statistical significance that counts, not sample size.

But there is a problem with this study. Gersten only compared the number of immersion and bilingual students who performed at or above grade level after grade 2. He found that more immersion students did so on the CTBS Reading (75%, or 21 of 28), compared to the bilingual students (19%, or 3 of 16). This is statistically significant. But we do not know how close the bilingual students came to the criterion, or how far above it the immersion children were. If three fewer children in the immersion group and three more children in the bilingual group had made grade level, the chi square would not have reached significance. The CTBS Language scores are even more fragile: 20 of 28 immersion students and 7 of 16 bilingual students made grade level. If one more child in the bilingual education group had reached grade level, the difference would not have been statistically significant. Conclusions in favor of a program should be made of sterner stuff.
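The fragility of the reading result can be checked with a short calculation. The following is a minimal sketch, not a reproduction of Gersten's analysis: it assumes a Pearson chi square for a 2x2 table without continuity correction and a two-tailed .05 criterion (critical value 3.841 with one degree of freedom). The function and variable names are illustrative only.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("every row and column must contain at least one case")
    return n * (a * d - b * c) ** 2 / denom


# Assumed criterion: two-tailed .05 critical value for 1 degree of freedom.
CRITICAL_05_DF1 = 3.841


def table(immersion_pass, bilingual_pass, n_immersion=28, n_bilingual=16):
    """Build (pass, fail) counts for each group from the number at grade level."""
    return (
        immersion_pass, n_immersion - immersion_pass,
        bilingual_pass, n_bilingual - bilingual_pass,
    )


# CTBS Reading as reported: 21 of 28 immersion, 3 of 16 bilingual at grade level.
reported = chi_square_2x2(*table(21, 3))

# Hypothetical shift described in the text: three fewer immersion students
# and three more bilingual students at grade level.
shifted = chi_square_2x2(*table(18, 6))

print(f"reported: {reported:.2f}, significant: {reported > CRITICAL_05_DF1}")
print(f"shifted:  {shifted:.2f}, significant: {shifted > CRITICAL_05_DF1}")
```

Under these assumptions the reported reading table is well above the criterion, and the shifted table falls below it. With samples this small, results near the threshold can also depend on whether a continuity correction or a one-tailed test is used, which is not specified here.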

Gersten makes it clear that he had to make do with the data he had. He also points out that only higher achieving bilingual education students were tested, while all immersion students were tested. Nevertheless, one cannot use this study as strong evidence that immersion is effective.

El Paso
Baker claims that structured immersion was also a winner over bilingual education in El Paso. I noted that the El Paso "immersion" program contained 60 to 90 minutes of Spanish instruction daily, not thirty minutes, as Baker claimed. Baker's response: "Whether Spanish was used 30, 60, or 90 minutes a day is not the point" (p. 709). He feels the point is that there was less Spanish than the amount "advocates of bilingual education maintain is absolutely essential" and that children in the program did better than those in a "standard" bilingual education program.

I think there are several things the El Paso "immersion" program did right:

  1. The first language was used where it did the most good, in "the more demanding content areas." This approach is similar to that used in the "gradual exit" program described in Krashen (1996), in which the first language is used for all subjects (except ESL and art, music and physical education) at first. At a later stage, the first language is used for those subjects that are difficult to make comprehensible for those limited in English (social studies and language arts), while English is used in those subjects that are easier to contextualize (math, science). Gradually, students do more and more of their work in English. The El Paso report does not tell us how much first language was used in the "standard" program; it may well have been more than the amount used in the "immersion" program. We do not know, however, if the first language was used as efficiently in the standard program as it was in the immersion program.
  2. The "immersion" program also utilized Natural Approach for ESL, sheltered subject matter teaching, and adopted a whole language philosophy. In contrast, the standard bilingual program did not have sheltered subject matter teaching and used a bottom-up approach to reading. My point is that the comparison made was between two versions of bilingual education, not between structured immersion and bilingual education, and that the programs differed in important ways.

Cummins (1999) comments that Baker has argued on both sides of this issue. While Baker now considers the El Paso program to be a structured immersion program whose success shows "the harm that bilingual education programs do to learning English" (Baker, 1998), in 1992 Baker referred to the same program as Spanish-English dual immersion, and interpreted the results differently: "The El Paso study supports the claims of bilingual education advocates that most bilingual education programs do not use enough of the native language" (Baker, 1992, cited in Cummins, 1999).

Texas Education Agency
Baker claims victory for structured English immersion over bilingual education on the basis of a report done by the Texas Education Agency in the 1980s. Baker provides no citation for this report. In Krashen (1999) I produced evidence that one report issued by the Texas Education Agency at that time was actually a comparison between bilingual education and ESL, and showed slight advantages for bilingual education. Both programs contained a significant amount of content-based ESL teaching. Baker only says that this report "appears not to be the one I mentioned," but that his copy of the report "has disappeared" (p. 709). Yet he does not hesitate to claim it supports structured English immersion over bilingual education.

Baker (1999) says that in his original Kappan paper (Baker, 1998) he forgot to include an additional study showing structured immersion to be superior to LE (late-exit bilingual education), a "very effective project in McAllen, Texas" (p. 707). Baker also claimed that Willig (1985) had erroneously classified this program as late-exit bilingual education. False. Willig classified the special program in McAllen as "alternate immersion," not late-exit bilingual education (p. 284). Willig deals with the McAllen data as a comparison of two versions of bilingual education, alternate immersion versus concurrent translation. The program referred to by Baker as structured immersion contained one period per day of Spanish reading: for kindergarten children this amounted to half the instructional time (the total evaluation covered kindergarten and grade one). The explicit goal of the program was biliteracy. As Willig (1987) notes, the only report issued on McAllen by the district is a few mimeographed sheets in which it is stated that the goals of the program include acquisition of English and the continued development of the native language. Willig also points out that students in the so-called immersion program outperformed "bilingual education" comparisons in Spanish as well as English, confirming that an emphasis was placed on Spanish language development.

Unpublished Data
I said that many of Baker's claims were based on unpublished data. Specifically, several victories claimed for structured immersion are based on unpublished reports: Yap and Enoki, 1988; Webb, Clerc, and Gavito, 1987; and Baker's own Seattle study. Rather than supplying the reader with details about these studies in his response to me and Meier, Baker (1999) only says that rejecting unpublished data is "so absurd that it warrants no comment" and then accuses me of citing unpublished data, claiming I did so in my notes 1 through 4 and note 10 of my response (Krashen, 1999). All of the citations in my footnotes are in the public domain, none is unpublished, and several are in refereed journals. Baker's citations are not only unpublished, but they are not available to readers: Yap and Enoki is a conference presentation, which means it is not available unless one happens to know where Yap and Enoki work. Webb et al. is only described as a "HID mimeo." Baker's Seattle study is not even cited in the references.

A few other points:

Who Is a Dilettante?
In his response to Meier (1999), Baker says that "Meier's contention that certain variables must be controlled ... is common among dilettante methodologists. But it is wrong" (p. 708).

Baker cites Deming (1975), an article in a collection edited by Struening, as the source for this claim. I was curious to see if Deming really said that controlling variables is only for dilettantes, and tried to order the book. I finally managed to locate a copy through Booksource, Ltd. I read Deming's paper and it does not say what Baker says it says. In fact, on page 56, Deming explicitly recommends "statistical controls."

I would recommend to Mr. Baker that he go back and check Deming's article, but I am sure that he will say that his copy has disappeared, along with the TEA report mentioned just above. I know for a fact that Keith Baker no longer has a copy of Deming's paper, which explains why he didn't remember what was in it. The copy I received from Booksource was a used copy. The original owner wrote his name on the inside cover. It was Keith Baker.

In his citation, Baker neglected to include the name of the second editor of this volume (M. Guttentag), and the title was incomplete.

Willig's Meta-Analysis
Baker claims that in Willig's study (Willig, 1985), the overall effect of bilingual education on English was negative, and that "Willig's analysis shows that bilingual education harms children" (p. 707). False. Willig reported both mean effect sizes and adjusted mean effect sizes; the latter controlled for the method used to calculate effect size, use of random assignment, and type of score used (e.g., raw scores, percentiles, etc.). For 62 comparisons of bilingual education versus submersion on tests of English reading and vocabulary, the unadjusted effect size was .05 and the adjusted effect size was .20. For 85 comparisons of "total language" in English, the effect sizes were .01 (unadjusted) and .21 (adjusted). Thus, students in bilingual education did at least as well as comparisons, or a bit better. For additional discussion, see Willig (1987).

It is extremely interesting that another meta-analysis of studies comparing bilingual education to second-language only approaches found very similar results. Greene (1998) found an average effect size of .21 for English reading favoring students in bilingual education, based on eleven studies. Only four of these studies were in Willig's analysis.

Part Two: Response to Criticisms of Krashen and Biber (1988)

Instead of providing detail about the unpublished studies in his articles, Baker (1999) devotes a considerable amount of space to attacking a monograph published ten years ago reviewing the accomplishments of bilingual education in California. This monograph has no bearing on the issue under discussion because we did not cover structured English immersion at all.

    1. Baker claims that Biber and I concluded that the students we studied "typically score at grade level ... (p. 65)" (Baker, p. 709). We did not say that on page 65 or anywhere else. We said that "students in the bilingual programs studied here reach, or come close to national norms by grade 6" (p. 65).

    2. Baker complains that we present scores "in only 45 of 136 cells" in our data matrix (p. 709). We reported all the data we had, all the data available to us that was relevant to the issue. No data available to us was withheld. Empty cells were simply an artifact of the manner of presentation of our summary data.

    3. Baker notes that of 45 reported scores, only 15 reached grade level. Baker arrived at his figure by counting all the scores at the 50th percentile and above in our reading and language arts CTBS summary table on page 65. Excluding results from grades 1, 2, and 3, all eleven language arts scores are at the 44th percentile or higher. Excluding one of the Baldwin Park scores (the higher one), because it represents a sample that is a subset of the other, the mean score is 48.9. For reading, grades 4 through 8, ten of 18 scores are above the 42nd percentile. All low scores come from the San Diego cohort, which eventually reached the 51st percentile, and from Rockwood sixth graders, who showed vast improvement from their third grade scores, and who scored well above district norms on the CAP. Eastman's only score below the 40th percentile (grade 4) was higher than city and local norms. This is very impressive. If Gersten (1985) is typical, structured immersion students don't even come close to this accomplishment. Recall that structured immersion students were at the 15th and 16th percentile in grades 5 and 6 in Uvalde.

    4. Baker claims that "in no year did the number of classrooms scoring 'at or above the norm' exceed chance expectations" (p. 709). Baker apparently assumes that LEP or former LEP students should be at national norms. There are many reasons why these children do not always reach national norms; in general, they come from lower socioeconomic backgrounds. The issue is whether children in these programs do as well as or better than children in other programs, a point Baker (1999) makes in discussing the Uvalde results. In addition, as noted above, these children are quite close to national norms, and exceed them in mathematics.

    5. Baker claims that we used old norms for the CAP. But our comparison groups took the same test at the same time as our experimental subjects did. Sixth graders at the Eastman school, for example, scored 47 points below the city norm in 1982 and 11 points below it in 1983, before the revised bilingual plan was installed. Sixth graders who experienced the new bilingual program did better, scoring five points below the norm in 1984, equal to it in 1985, and three points above it in 1986. CAP scores were also used to evaluate bilingual education in Rockwood. Here again, students are compared to others who took the same test at the same time. All cohorts of Rockwood students we studied scored above district norms.

    6. Baker claims that our results for one school, Eastman, could have been due to the "26 major curricular and program changes" that took place at that time. Could be, but Baker provides no description of these changes nor does he provide any citation for them.

The flaws in Baker's arguments remain. Much of the data he cites supporting the efficacy of structured immersion is unpublished and not available. When the data cited is available, it does not say what he says it says.

References

Baker, Keith. 1998. Structured English immersion: Breakthrough in teaching limited-English-proficient students. Phi Delta Kappan 80(3): 199-204.

Baker, Keith. 1999. How can we best serve LEP students? A reply to Nicholas Meier and Stephen Krashen. Phi Delta Kappan 80(9): 707-10.

Becker, Wesley, and Gersten, Russell. 1982. A follow up of Follow Through: The later effects of the direct instruction model on children in fifth and sixth grades. American Educational Research Journal 19: 75-92.

Cummins, Jim. 1999. Research, ethics, and public discourse: the debate on bilingual education. Presentation at the National Conference of the American Association of Higher Education, March 22, 1999, Washington, DC.

Deming, W. Edwards. 1975. The logic of evaluation. In Elmer Struening and Marcia Guttentag (Eds.), Handbook of evaluation research, Vol. 1 (pp. 53-68). Beverly Hills, CA: Sage.

Gersten, Russell. 1985. Structured immersion for language minority students: Results of a longitudinal evaluation. Educational Evaluation and Policy Analysis 7: 187-96.

Gersten, Russell and Woodward, John. 1985. A case for structured immersion. Educational Leadership 43: 75-79.

Greene, Jay. 1998. A meta-analysis of the effectiveness of bilingual education. Claremont, CA: Tomas Rivera Policy Institute.

Krashen, Stephen. 1982. Principles and practice in second language acquisition. New York: Prentice Hall.

Krashen, Stephen. 1996. Under attack: The case against bilingual education. Culver City, CA: Language Education Associates.

Krashen, Stephen. 1999. What the research really says about structured English immersion: A response to Keith Baker. Phi Delta Kappan 80(9): 705-6.

Meier, Nicholas. 1999. A fabric of half-truths: A response to Keith Baker on structured English immersion. Phi Delta Kappan 80(9): 704, 706.

Willig, Ann. 1985. A meta-analysis of selected studies on the effectiveness of bilingual education. Review of Educational Research 55: 269-317.

Willig, Ann. 1987. Examining bilingual education research through meta-analysis and narrative review: A response to Baker. Review of Educational Research 57(3): 363-76.