Stanford Encyclopedia of Philosophy

Notes to Innateness and Language

1. This paper assumes that linguistic competence involves syntactic knowledge of some kind. See, however, Devitt 2006 for an intriguing argument that no representation of syntactic knowledge is involved in being able to speak and understand a language.

2. Terminological Note: In this article, I will use ‘innate’ and ‘inborn’ as synonyms; such usage is typical in the literature on linguistic nativism. As many have pointed out, however, these terms cannot generally be treated as synonymous in the philosophical literature on innateness (see, e.g., Cowie, 1999, chs. 1-3; Elman et al., 1996; Griffiths, 2002; Wimsatt, 1999).

3. Assuming, of course, that speakers have knowledge about their language. Again, see Devitt 2006 for arguments that speakers do not have knowledge of grammar, and that whatever knowledge they do have about language plays no role in language production or comprehension. Devitt (2006, ch. 5) explores the implications of this radical view for linguistic nativism.

4. The question seems pressing even if one assumes, as most linguists in fact do, that incoming sentences are subject to a certain amount of analysis or preprocessing before being used as evidence for language learning. For instance, suppose (with Chomsky 1965) that the evidence used by language learners (the pld) were so rich as to include ‘a partial and tentative pairing of signals with structural descriptions’ (1965:32). The question still remains how learners could figure out (i) whether that ‘tentative pairing’ was correct and (ii) what the rules governing utterances with those sorts of structural descriptions are. (Not to mention how they are able to perform the relevant analysis or preprocessing of the noises hitting their eardrums.)

5. See, e.g., Chomsky, 1975:30-33; 1986:7-13.

6. An asterisk (*) before a sentence indicates that it is ungrammatical or ill-formed.

7. Tomasello stresses that the kind of distributional analysis at issue here is functionally based. The learner does not merely keep track of the frequencies with which a given word or phrase occurs in different syntactic contexts. Instead, she analyzes the communicative functions associated with particular elements and groups elements together on the basis of their playing similar communicative roles. More adult-like syntactic categories emerge only later, and very gradually, as a result of additional analysis. Chomsky (1957) argued that distributional analyses cannot give rise to knowledge of grammar, because there are not enough ‘minimally differing’ sentences in the input to enable children to figure out what the allowable contexts for a given element are. In stressing the role of (broadly) semantic information in children's analyses, Tomasello seeks to avoid these kinds of problems. (He also points out, however, that empirical studies of speech directed at children show large amounts of repetition, of the sort that ordinary distributional analysis could exploit: e.g., children hear around 5,000-7,000 utterances per day, and around 45% of mothers' utterances to young children begin with one of just 17 words, like What…, That…, and You…. Tomasello, 2003: 110-111ff.)

8. Pullum and Scholz admit that they are vulnerable to this objection, since their data are taken from corpora like the Wall Street Journal. Sampson, however, is not: his corpus enables him to estimate that even “linguistically deprived three year olds” (2002:78) hear a complex auxiliary about every other day.

9. In this connection, it is worth noting that according to Tomasello (2004), children tend to learn question forms in a very piecemeal way, as formulas tied to particular communicative situations, like Where's THING?, Can I ACTION?, Is it PROPERTY?, etc. A child may thus use the formula Is it F? quite correctly and productively, yet consistently produce errors like Are the clouds are moving? This suggests that children do not, contra Chomsky, hit unerringly upon the correct, general rule (Tomasello, 2003:158-9).

10. For critical discussion of the one theory that does supply a linking mechanism, that of Pinker (1984, 1987, 1989), see Tomasello (2004:184-5).

11. Cf. Pinker 1994:12: “[When the Head parameter is set] huge chunks of grammar are then available to the child, all at once, as if the child were merely flipping a switch.” For further discussion, see Cowie 1999, §§10.5, 10.6.

12. To be sure, Chomsky has often claimed that all speakers of a given language acquire the same grammar. However, he is quite explicit that this is an idealization made to simplify both one's account of learning and the task of grammar construction. And abstracting away from individual differences for some purposes clearly does not make those differences go away.

13. The learnability literature has explored the possibility of making various of these assumptions. Surveying this literature is beyond our scope (and probably beyond me). My sense is that once learners are endowed with different kinds of learning algorithms, particularly ones that regard confirmation probabilistically and sample data statistically, the Unlearning problem disappears (or perhaps disappears ‘in the limit’). For more information and references, see the entry on formal learning theory.

14. It's perhaps worth reiterating here that all theories of language acquisition suppose that learners are able to perform some kind of structural analysis of the data. E.g., as mentioned above (§2.2.1(d)), a learner of Spanish setting the head parameter to ‘Heads optional’ must be able to tell that she has, in fact, encountered a sentence that lacks a Head. The point here is that such analyses can be exploited as a means of bringing data to bear on whole classes of sentences.

15. Tomasello discusses this method under the name of ‘pre-emption’ in his (2004:178ff). See also Cowie 1997.

16. Thanks to Jim Woodward for this example. At this point, it is perhaps appropriate to mention the oft-misinterpreted results in formal learning theory that are held to show that languages are not learnable from positive instances (or ‘text’). Gold (1967) showed that under certain assumptions about the learning algorithm and the class of natural languages, that class is not learnable (in the sense of ‘identifiable in the limit’) from text. However, more recent results have shown that under different (and in many ways more plausible) assumptions about the learning algorithm, the class of natural languages, and the correct criterion for learnability, natural languages are learnable from text. See Jain et al., 1999; Feldman, 1972; and Shinohara, 1994.

17. Cf. Baker, 1979; Lasnik, 1989:89-90; Pinker, 1989.

18. See Skyrms, 2002 for more on this point.

19. This is true of 96% of right-handers and approximately 70% of left-handers (Saffran, 2000:410).

20. Ocular dominance columns are groups of connected neurons in the visual cortex, organized into columns lying perpendicular to the cortical surface. While most columns respond to input from both eyes, most also respond more strongly to input from one eye than to input from the other. In addition, columns have ‘preferred’ stimuli (e.g., bars of light at a particular orientation) to which they respond most strongly; these preferences change gradually across the cortex, giving rise to a ‘map’ of the visual field.

21. Note that the picture is now known to be much more complicated than this, with many factors influencing both the outcome of ocular deprivation and the possibility of recovery. See Katz et al., 2000 and Elbert et al., 2001 for surveys.

22. Of interest in this connection is the debate within the SLA literature about the degree of ‘access’ to UG enjoyed by second language learners. Some prominent researchers argue that UG is fully accessible to second language learners, and that it constrains SLA in much the same way as it constrains first language learning (see White, 2003; Epstein, Flynn and Martohardjono, 1996; Epstein, 1998). However, others (e.g., Kellerman and Yoshioka, 1999; Kellerman et al., 1999) argue that UG influences SLA only to the extent that its principles are implicit in the native language grammar, and still others (e.g., Clahsen and Muysken, 1986) argue that SLA is not constrained by UG at all. (For an overview, see Sharwood Smith, 143-171.) While further discussion of these issues is beyond the scope of this paper, it is worth pointing out that if either of the latter views turns out to be true, then it would appear that SLA, to the extent that it succeeds without relying on explicit training (itself a controversial issue), can succeed without innate knowledge of UG (or without full knowledge of UG). This raises the obvious question: if people can learn a second language without knowledge of UG, why not a first language too?

23. Elbert et al., 2001 and Kuhl, 2000 stress that adults can learn to speak a second language (L2) without an accent, especially when trained with suitable techniques (e.g., those minimizing the effects of memory and stressing feedback). Based on their studies of the superior efficacy of ‘behaviorally relevant’ training in restoring lost functions in lower animals, Elbert et al. attribute the generally non-native-like outcomes of adult L2 learning to the fact that adult learners just don't care very much about their accent. However, this of course raises the question of why children do care (or, perhaps, of why it doesn't matter whether they care or not).

24. Although training helps: see Bradlow et al., 1997.

25. This may in part be a consequence of prenatal learning; see DeCasper and Spence, 1986.

26. Thanks to Jim Woodward for this point.

27. Topicalization is a device whereby the hearer's attention is focused on the occupier of a certain role. In English, moving a constituent to the beginning of the sentence and/or stressing it is a way to topicalize that constituent, as in Chardonnay I love, but champagne I adore. In ASL, elements are topicalized by being moved to the beginning of the sentence and accompanied by a special facial expression that lasts as long as it takes to sign that element.

28. Although significantly lower than that of their unaffected relatives, the nonverbal IQ of the affected KE family members is still close to the population average; hence the diagnosis of SLI still applies.

29. I owe the point about the monolithic simplicity of the nativist's theory to Peter Godfrey-Smith.