Steven Pinker: Linguistics as a Window to Understanding the Brain


My name is Steve Pinker, and I’m Professor
of Psychology at Harvard University.  And today I’m going to speak to you about language.
 I’m actually not a linguist, but a cognitive scientist.  I’m not so much interested
in language as an object in its own right, but as a window to the human mind.
Language is one of the fundamental topics in the human sciences.  It’s the trait
that most conspicuously distinguishes humans from other species, it’s essential to human
cooperation; we accomplish amazing things by sharing our knowledge or coordinating our
actions by means of words.  It poses profound scientific mysteries such as, how did language
evolve in this particular species?  How does the brain compute language? But also, language
has many practical applications, not surprisingly, given how central it is to human life.
Language comes so naturally to us that we’re apt to forget what a strange and miraculous
gift it is.  But think about what you’re doing for the next hour.   You’re going
to be listening patiently as a guy makes noise as he exhales.  Now, why would you do something
like that?  It’s not that I can claim that the sounds I’m going to make are particularly
mellifluous, but rather I’ve coded information into the exact sequences of hisses and hums
and squeaks and pops that I’ll be making.  You have the ability to recover the information
from that stream of noises allowing us to share ideas.
Now, the ideas we are going to share are about this talent, language, but with a slightly
different sequence of hisses and squeaks, I could cause you to be thinking thoughts
about a vast array of topics, anything from the latest developments in your favorite reality
show to theories of the origin of the universe.  This is what I think of as the miracle of
language, its vast expressive power, and it’s a phenomenon that still fills me with wonder,
even after having studied language for 35 years.  And it is the prime phenomenon that
the science of language aims to explain.   Not surprisingly, language is central to human
life.  The Biblical story of the Tower of Babel reminds us that humans accomplish great
things because they can exchange information about their knowledge and intentions via the
medium of language.  Language, moreover, is not a peculiarity of one culture, but it
has been found in every society ever studied by anthropologists.
There’s some 6,000 languages spoken on Earth, all of them complex, and no one has ever discovered
a human society that lacks complex language.  For this and other reasons, Charles Darwin
wrote, “Man has an instinctive tendency to speak as we see in the babble of our young
children while no child has an instinctive tendency to bake, brew or write.”  Language is an intricate talent and it’s
not surprising that the science of language should be a complex discipline.
It includes the study of how language itself works including:  grammar, the assembly of
words, phrases and sentences; phonology, the study of sound; semantics, the study of meaning;
and pragmatics, the study of the use of language in conversation. 
Scientists interested in language also study how it is processed in real time, a
field called psycholinguistics; how it is acquired by children, the study of language
acquisition; and how it is computed in the brain, the discipline called neurolinguistics.

Now, before we begin, it’s important not
to confuse language with three other things that are closely related to language.  One
of them is written language.  Unlike spoken language, which is found in all human cultures
throughout history, writing was invented a very small number of times in human history,
about 5,000 years ago.   And alphabetic writing where each mark on
the page stands for a vowel or a consonant, appears to have been invented only once in
all of human history by the Canaanites about 3,700 years ago.  And as Darwin pointed out,
children have no instinctive tendency to write, but have to learn it through instruction
and schooling.
A second thing not to confuse language with
is proper grammar.  Linguists distinguish between descriptive grammar – the rules that
characterize how people do speak – and prescriptive grammar – rules that characterize how people
ought to speak if they are writing careful written prose.
A dirty secret from linguistics is that not only are these not the same kinds of rules,
but many of the prescriptive rules of language make no sense whatsoever.  Take one of the
most famous of these rules, the rule not to split infinitives.  
According to this rule, Captain Kirk made a grievous grammatical error when he said
that the mission of the Enterprise was “to boldly go where no man has gone before.”
 He should have said, according to these editors, “to go boldly where no man has
gone before,” which immediately clashes with the rhythm and structure of ordinary
English.  In fact, this prescriptive rule was based on a clumsy analogy with Latin where
you can’t split an infinitive because it’s a single word, as in facere, “to do.”  Julius
Caesar couldn’t have split an infinitive if he wanted to.  That rule was translated
literally over into English where it really should not apply.  
Another famous prescriptive rule is that one should never use a so-called double negative.
 Mick Jagger should not have sung, “I can’t get no satisfaction,” he really should have
sung, “I can’t get any satisfaction.”  Now, this is often promoted as a rule of
logical speaking, but “can’t” and “any” is just as much of a double negative as “can’t”
and “no.”  The only reason that “can’t get any satisfaction” is deemed correct
and “can’t get no satisfaction” is deemed ungrammatical is that the dialect of English
spoken in the south of England in the 17th century used “can’t” “any” rather
than “can’t” “no.”   If the capital of England had been in the
north of the country instead of the south of the country, then “can’t get no,”
would have been correct and “can’t get any,” would have been deemed incorrect.
 There’s nothing special about a language
that happens to be chosen as the standard for a given country.  In fact, if you compare
the rules of languages and so-called dialects, each one is complex in different ways.  Take
for example, African-American Vernacular English, also called Black English or Ebonics.  There
is a construction in African-American English where you can say, “He be workin,” which is
not an error or bastardization or a corruption of Standard English, but in fact conveys a
subtle distinction, one that’s different than simply, “He workin.”  “He be workin,”
means that he is employed; he has a job, “He workin,” means that he happens to be working
at the moment that you and I are speaking.  
Now, this is a tense difference that can be made in African-American English that is not
made in Standard English, one of many examples in which the dialects have their own set of
rules that is just as sophisticated and complex as the one in the standard language.  
Now, a third thing not to confuse language with is thought.  Many people report that
they think in language, but cognitive psychologists have shown that there are many kinds of thought
that don’t actually take place in the form of sentences.
(1.) Babies (and other mammals) communicate without speech
For example, we know from ingenious experiments
that non-linguistic creatures, such as babies before they’ve learned to speak, or other
kinds of animals, have sophisticated kinds of cognition, they register cause and effect
and objects and the intentions of other people, all without the benefit of speech.  
(2.) Types of thinking go on without language–visual thinking
We also know that even in creatures that do have language, namely adults, a lot of thinking
goes on in forms other than language, for example, visual imagery.  If you look at
the top two three-dimensional figures in this display, and I would ask you, do they have
the same shape or a different shape?  People don’t solve that problem by describing those
strings of cubes in words, but rather by taking an image of one and mentally rotating it into
the orientation of the other, a form of non-linguistic thinking.  
(3.) We use tacit knowledge to understand language and remember the gist
For that matter, even when you understand language, what you come away with is not in
itself the actual language that you hear.  Another important finding in cognitive psychology
is that long-term memory for verbal material records the gist or the meaning or the content
of the words rather than the exact form of the words.  
For example, I like to think that you retain some memory of what I have been saying for
the last 10 minutes.  But I suspect that if I were to ask you to reproduce any sentence
that I have uttered, you would be incapable of doing so.  What sticks in memory is far
more abstract than the actual sentences, something that we can call meaning or content or semantics.
In fact, even when it comes to understanding
a sentence, the actual words are the tip of a vast iceberg of a very rapid, unconscious,
non-linguistic processing that’s necessary even to make sense of the language itself.
 And I’ll illustrate this with a classic bit of poetry, the lines from the shampoo
bottle.  “Wet hair, lather, rinse, repeat.”  
Now, in understanding that very simple snatch of language, you have to know, for example,
that when you repeat, you don’t wet your hair a second time because it’s already wet,
and when you get to the end of it and you see “repeat,” you don’t keep repeating
over and over in an infinite loop; “repeat” here means “repeat just once.”  Now this
tacit knowledge of what the writers **** of language had in mind is necessary to understand
language, but it, itself, is not language. 
(4.) If language is thinking, then where did it come from?
Finally, if language were really thought, it would raise the question of where language
would come from if we were incapable of thinking without language.  After all, the English
language was not designed by some committee of Martians who came down to Earth and gave
it to us.  Rather, language is a grassroots phenomenon.  It’s the original wiki, which
aggregates the contributions of hundreds of thousands of people who invent jargon and
slang and new constructions.  Some of them get accumulated into the language as people
seek out new ways of expressing their thoughts, and that’s how we get a language in the
first place.
Now, this is not to deny that language can affect
thought, and linguistics has long been interested in what has sometimes been called the linguistic
relativity hypothesis or the Sapir-Whorf hypothesis, named after the two
linguists who first formulated it, namely that language can affect thought.  There’s
a lot of controversy over the status of the linguistic relativity hypothesis, but no one
believes that language is the same thing as thought and that all of our mental life consists
of reciting sentences.
Now that we have set aside what language is
not, let’s turn to what language is, beginning with the question of how language works.
In a nutshell, you can divide language into three topics.  
There are the words that are the basic components of sentences that are stored in a part of
long-term memory that we can call the mental lexicon or the mental dictionary.  There
are rules, the recipes or algorithms that we use to assemble bits of language into more
complex stretches of language, including syntax, the rules that allow us to assemble words
into phrases and sentences; morphology, the rules that allow us to assemble bits of words,
like prefixes and suffixes, into complex words; and phonology, the rules that allow us to combine
vowels and consonants into the smallest words.  And then all of this knowledge of language
has to connect to the world through interfaces that allow us to understand language coming
from others and to produce language that others can understand: the language interfaces.
Let’s start with words.
The basic principle of a word was identified by the Swiss linguist, Ferdinand de Saussure,
more than 100 years ago when he called attention to the arbitrariness of the sign.  Take for
example the word, “duck.”  The word, “duck” doesn’t look like a duck or walk
like a duck or quack like a duck, but I can use it to get you to think the thought of
a duck because all of us at some point in our lives have memorized that brute-force
association between that sound and that meaning, which means that it has to be stored in memory
in some format.  In a very simplified form, an entry in the mental lexicon might look
something like this.  There is a symbol for the word itself, there is some kind of specification
of its sound and there’s some kind of specification of its meaning.  
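As a rough illustration of the three-part entry just described (a symbol for the word, a specification of its sound, a specification of its meaning), here is a minimal Python sketch. The class name, the field names, the informal sound notation and the gloss are all assumptions made for illustration; they are not taken from the talk's slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LexicalEntry:
    """One hypothetical entry in a toy mental lexicon."""
    word: str     # the symbol for the word itself
    sound: str    # a rough specification of its sound (informal notation, assumed)
    meaning: str  # a rough specification of its meaning (a plain gloss, assumed)


# The pairing of sound and meaning is arbitrary, so it simply has to be stored.
mental_lexicon = {
    "duck": LexicalEntry(word="duck", sound="/dʌk/", meaning="a kind of waterbird"),
}


def look_up(word: str) -> Optional[LexicalEntry]:
    """Return the memorized entry, or None if the pairing was never stored."""
    return mental_lexicon.get(word)


print(look_up("duck"))
print(look_up("bluk"))  # a possible word, but nothing is stored for it
```

The point of the sketch is only that nothing in the sound predicts the meaning: the association is retrieved from memory, not computed.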
Now, one of the remarkable facts about the mental lexicon is how capacious it is.  Using
dictionary sampling techniques where you say, take the top left-hand word on every 20th
page of the dictionary, give it to people in a multiple choice test, correct for guessing,
and multiply by the size of the dictionary, you can estimate that a typical high school
graduate has a vocabulary of around 60,000 words, which works out to a rate of learning
of about one new word every two hours starting from the age of one.  When you think that
every one of these words is as arbitrary as a telephone number or a date in history, you’re
reminded about the remarkable capacity of human long-term memory to store the meanings
and sounds of words.
But of course, we don’t just blurt out individual
words, we combine them into phrases and sentences.  And that brings up the second major component
of language; namely, grammar.
Now the modern study of grammar is inseparable
from the contributions of one linguist, the famous scholar, Noam Chomsky, who set the
agenda for the field of linguistics for the last 60 years. 
To begin with, Chomsky noted that the main puzzle that we have to explain in understanding
language is creativity or as linguists often call it productivity, the ability to produce
and understand new sentences.   Except for a small number of clichéd formulas,
just about any sentence that you produce or understand is a brand new combination produced
for the first time perhaps in your life, perhaps even in the history of the species.  We have
to explain how people are capable of doing it.  It shows that when we know a language,
we haven’t just memorized a very long list of sentences, but rather have internalized
a grammar or algorithm or recipe for combining elements into brand new assemblies.  For
that reason, Chomsky has insisted that linguistics is really properly a branch of psychology
and is a window into the human mind.  A second insight is that languages have a
syntax which can’t be identified with their meaning.  Now, the only quotation that I
know of, of a linguist that has actually made it into Bartlett’s Familiar Quotations,
is the following sentence from Chomsky, from 1956, “Colorless, green ideas sleep furiously.”
 Well, what’s the point of that sentence?  The point is that it is very close to meaningless.
 On the other hand, any English speaker can instantly recognize that it conforms to the
patterns of English syntax.  Compare, for example, “furiously sleep ideas green colorless,”
which is also meaningless, but which we perceive as word salad.
A third insight is that syntax doesn’t consist of a string of word by word associations as
in stimulus response theories in psychology where producing a word is a response which
you then hear and it becomes a stimulus to producing the next word, and so on.  Again,
the sentence, “colorless green ideas sleep furiously,” can help make this point.  Because
if you look at the word by word transition probabilities in that sentence, for example,
colorless and then green; how often have you heard colorless and green in succession.  Probably
zero times.  Green and ideas, those two words never occur together, ideas and sleep, sleep
and furiously.  Every one of the transition probabilities is very close to zero, nonetheless,
the sentence as a whole can be perceived as a well-formed English sentence.  
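The word-by-word argument can be made concrete with a small sketch. The code below estimates transition probabilities from a tiny corpus and applies them to the sentence. The four corpus sentences are invented purely for illustration, and the function names are hypothetical; a serious estimate would need a very large sample of real English.

```python
from collections import Counter


def bigram_counts(sentences):
    """Count how often each word is immediately followed by each other word."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        counts.update(zip(words, words[1:]))
    return counts


def transition_probability(counts, first, second):
    """Estimate P(second | first); 0.0 if 'first' was never seen with a successor."""
    total = sum(n for (w1, _), n in counts.items() if w1 == first)
    return counts[(first, second)] / total if total else 0.0


# Toy corpus, invented for this sketch only.
toy_corpus = [
    "green leaves fall quietly",
    "new ideas spread quickly",
    "children sleep soundly",
    "the dog barked furiously",
]

counts = bigram_counts(toy_corpus)
sentence = "colorless green ideas sleep furiously".split()
for a, b in zip(sentence, sentence[1:]):
    print(a, "->", b, transition_probability(counts, a, b))
```

Every transition comes out at zero in this toy corpus, so a model that only chains word to word has no basis for preferring the sentence over word salad, even though speakers clearly do.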
Language in general has long distance dependencies.  The word in one position in a sentence can
dictate the choice of the word several positions downstream.  For example, if you begin a
sentence with “either,” somewhere down the line, there has to be an “or.”  If
you have an “if,” generally, you expect somewhere down the line there to be a “then.”
 There’s a story about a child who says to his father, “Daddy, why did you bring
that book that I don’t want to be read to out of, up for?”  Where you have a set
of nested or embedded long distance dependencies.  
Indeed, one of the applications of linguistics to the study of good prose style is that sentences
can be rendered difficult to understand if they have too many long distance dependencies
because that could put a strain on the short-term memory of the reader or listener while trying
to understand them.   Rather than a set of word by word associations,
sentences are assembled in a hierarchical structure that looks like an upside down tree.
 Let me give you an example of how that works in the case of English.  One of the basic
rules of English is that a sentence consists of a noun phrase, the subject, followed by
a verb phrase, the predicate.  A second rule in turn expands the verb phrase.
A verb phrase consists of a verb followed by a noun phrase, the object, followed by
a sentence, the complement, as in “I told him that it was sunny outside.”
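The two rules just stated can be sketched as a small recursive tree builder. This is an illustrative sketch under stated assumptions, not a linguist's formalism: the function names are hypothetical, the innermost predicate is treated as an unanalyzed chunk, and the word “that” is attached as a separate COMP node for simplicity even though the rule as stated does not mention it.

```python
def build_sentence(clauses, final_clause):
    """Build a tree using S -> NP VP and VP -> V NP S.

    clauses: list of (subject, verb, object) triples, outermost clause first.
    final_clause: (subject, predicate) for the innermost sentence.
    """
    if not clauses:
        subject, predicate = final_clause
        return ("S", ("NP", subject), ("VP", predicate))
    (subject, verb, obj), rest = clauses[0], clauses[1:]
    complement = build_sentence(rest, final_clause)  # the S inside the VP
    verb_phrase = ("VP", ("V", verb), ("NP", obj), ("COMP", "that"), complement)
    return ("S", ("NP", subject), verb_phrase)


def words(tree):
    """Read the words off the leaves of the tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    _label, *children = tree
    return [w for child in children for w in words(child)]


one_level = build_sentence([("I", "told", "him")], ("it", "was sunny outside"))
print(" ".join(words(one_level)))
# I told him that it was sunny outside

two_levels = build_sentence(
    [("She", "reminded", "me"), ("I", "told", "him")],
    ("it", "was sunny outside"),
)
print(" ".join(words(two_levels)))
# She reminded me that I told him that it was sunny outside
```

Because the verb phrase rule contains a sentence, and a sentence contains a verb phrase, the same two rules can be reapplied inside their own output to any depth.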
Now, why do linguists insist that language
must be composed out of phrase structure rules?
(1.) Rules allow for open-ended creativity
Well, for one thing, that helps explain
the main phenomenon that we want to explain, namely the open-ended creativity of language.
(2.) Rules allow for expression of unfamiliar meaning
It allows us to express unfamiliar meanings.
There’s a cliché in journalism, for example, that when a dog bites a man, that isn’t
news, but when a man bites a dog, that is news.  The beauty of grammar is that it allows
us to convey news by assembling familiar words in brand new combinations.
(3.) Rules allow for production of vast numbers of combinations
Also, because of the way phrase structure rules work, they produce a vast number of possible combinations.
Moreover, the number of different thoughts
that we can express through the combinatorial power of grammar is not just humongous, but
in a technical sense, it’s infinite.  Now of course, no one lives an infinite number
of years, and therefore can show off their ability to understand an infinite number of
sentences, but you can make the point in the same way that a mathematician can say that
someone who understands the rules of arithmetic knows that there are an infinite number of
numbers, namely if anyone ever claimed to have found the longest one, you can always
come up with one that’s even bigger by adding a one to it.  And you can do the same thing
with language.   Let me illustrate it in the following way.
 As a matter of fact, there has been a claim that there is a world’s longest sentence.
  Who would make such a claim?  Well, who else?
 The Guinness Book of World Records.  You can look it up.  There is an entry for the
World’s Longest Sentence.  It is 1,300 words long.  And it comes from a novel by
William Faulkner.  Now I won’t read all 1,300 words, but I’ll just tell you how
it begins.   “They both bore it as though in deliberate
flagellant exaltation…” and it runs on from there.
But I’m here to tell you that in fact, this is not the world’s longest sentence.  And
I’ve been tempted to obtain immortality in Guinness by submitting the following record
breaker.  “Faulkner wrote, they both bore it as though in deliberate flagellant exaltation.”
 But sadly, this would not be immortality after all but only the proverbial 15 minutes
of fame because based on what you now know, you could submit a record breaker for the
record breaker namely, “Guinness noted that Faulkner wrote” or “Pinker mentioned that
Guinness noted that Faulkner wrote”, or “who cares that Pinker mentioned that Guinness
noted that Faulkner wrote…”
Take, for example, the following wonderfully
ambiguous sentence that appeared in TV Guide.  “On tonight’s program, Conan will discuss
sex with Dr. Ruth.”
Now this has a perfectly innocent meaning
in which the verb “discuss” involves two things, namely the topic of discussion,
“sex,” and the person with whom it’s being discussed, in this case, Dr. Ruth.  But
it has a somewhat naughtier meaning if you rearrange the words into phrases according
to a different structure, in which case “sex with Dr. Ruth” is the topic of conversation,
and that’s what’s being discussed.
Now, phrase structure not only can account
for our ability to produce so many sentences, but it’s also necessary for us to understand
what they mean.  The geometry of branches in a phrase structure is essential to figuring
out who did what to whom.
Another important contribution of Chomsky
to the science of language is the focus on language acquisition by children.  Now, children
can’t memorize sentences because knowledge of language isn’t just one long list of
memorized sentences, but somehow they must distill out or abstract out the rules that
go into assembling sentences based on what they hear coming out of their parents’ mouths
when they are little.  And the talent of using rules to produce combinations is in
evidence from the moment that kids begin to speak.  
Children create sentences unheard from adults
At the two-word stage, which you typically
see in children who are 18 months or a bit older, kids are producing the smallest sentences
that deserve to be counted as sentences, namely two words long.  But already it’s clear
that they are putting them together using rules in their own mind.  To take an example,
a child might say, “more outside,” meaning, take them outside or let them stay outside.
 Now, adults don’t say, “more outside.”  So it’s not a phrase that the child simply
memorized by rote, but it shows that already children are using these rules to put together
new combinations.   Another example, a child having jam washed
from his fingers said to his mother ‘all gone sticky’. Again, not a phrase that you
could ever have copied from a parent, but one that shows the child producing new combinations.
  Past tense rule
An easy way of showing that children assimilate rules of grammar unconsciously from the moment
they begin to speak, is the use of the past tense rule. 
For example, children go through a long stage in which they make errors like, “We holded
the baby rabbits” or “He teared the paper and then he sticked it.”  Cases in which
they overgeneralize the regular rule for forming the past tense, add ‘ed,’ to irregular
verbs like “hold,” “stick” or “tear.”  And it’s easy to show… it’s easy to
get children to flaunt this ability to apply rules productively in a laboratory demonstration
called the Wug Test.  You bring a kid into a lab.  You show them a picture of a little
bird and you say, “This is a wug.”  And you show them another picture and you say,
“Well, now there are two of them.”  There are two and children will fill in the gap
by saying “wugs.”  Again, a form they could not have memorized because it’s invented
for the experiment, but it shows that they have productive mastery of the regular plural
rule in English.
And famously, Chomsky claimed that children
solved the problem of language acquisition by having the general design of language already
wired into them in the form of a universal grammar,
a spec sheet for what the rules of any language have to look like.
What is the evidence that children are born
with a universal grammar?  Well, surprisingly, Chomsky didn’t propose this by actually
studying kids in the lab or kids in the home, but through a more abstract argument called,
“The poverty of the input.”  Namely, if you look at what goes into the ears of
a child and look at the talent they end up with as adults, there is a big chasm between
them that can only be filled in by assuming that the child has a lot of knowledge of the
way that language works already built in.  
Here’s how the argument works.  One of the things that children have to learn when
they learn English is how to form a question.  Now, children will get evidence from parents’
speech as to how the question rule works, such as sentences like, “The man is here,”
and the corresponding question, “Is the man here?” 
 Now, logically speaking, a child getting that kind of input could posit two different
kinds of rules. There’s a simple word by word linear rule.  In this case, find
the first “is” in the sentence and move it to the front.  “The man is here,”
“Is the man here?” Now there’s a more complex rule that the child could posit called
a structure dependent rule, one that looks at the geometry of the phrase structure tree.
 In this case, the rule would be:  find the first “is” after the subject noun
phrase and move that to the front of the sentence.  A diagram of what that rule would look like
is as follows:  you look for the “is” that occurs after the subject noun phrase
and that’s what gets moved to the front of the sentence. 
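Now, what’s the difference between the simple word-by-word rule and the more complex
structure-dependent rule?  Well, you can see the difference when it comes to forming the question from
a slightly more complex sentence like, “The man who is tall is
in the room.”  The word-by-word rule moves the first “is” and produces “Is the man who tall is in the room?”,
whereas the structure-dependent rule moves the “is” after the subject noun phrase and produces
“Is the man who is tall in the room?”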
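The two candidate rules can be written out side by side. This is a minimal sketch with hypothetical function names; in particular, it assumes the subject noun phrase has already been identified by hand and is passed in separately, which is exactly the structural knowledge the word-by-word rule lacks.

```python
def linear_rule(words):
    """Word-by-word rule: move the first 'is' in the string to the front."""
    i = words.index("is")
    return ["is"] + words[:i] + words[i + 1:]


def structure_dependent_rule(subject_np, rest):
    """Structure-dependent rule: move the first 'is' AFTER the subject noun phrase.

    subject_np and rest are word lists; the split between them is supplied
    by hand here (an assumption of this sketch, standing in for a parse).
    """
    i = rest.index("is")
    return ["is"] + subject_np + rest[:i] + rest[i + 1:]


def as_question(words):
    """Join the words, capitalize the first letter, and add a question mark."""
    text = " ".join(words)
    return text[0].upper() + text[1:] + "?"


# Simple sentence: both rules agree.
print(as_question(linear_rule("the man is here".split())))
print(as_question(structure_dependent_rule(["the", "man"], ["is", "here"])))
# both print: Is the man here?

# Complex sentence: the rules come apart.
print(as_question(linear_rule("the man who is tall is in the room".split())))
# Is the man who tall is in the room?
print(as_question(structure_dependent_rule(
    "the man who is tall".split(), "is in the room".split())))
# Is the man who is tall in the room?
```

On simple input the two rules are indistinguishable, which is why input of that kind cannot tell a learner which one is correct.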
But how is the child supposed to learn that?  How did all of us end up with the correct
structure-dependent version of the rule rather than the far simpler word-by-word version of the
rule?
“Well,” Chomsky argues, “if you were
actually to look at the kind of language that all of us hear, it’s actually quite rare
to hear a sentence like, ‘Is the man who is tall in the room?’, the kind of input that
would logically inform you that the word-by-word rule is wrong and the structure dependent
rule is right.  Nonetheless, we all grow up into adults who unconsciously use the structure
dependent rule rather than the word-by-word rule.  Moreover, children don’t make errors
like, ‘Is the man who tall is in the room?’  As soon as they begin to form complex questions,
they use the structure-dependent rule.  And that,” Chomsky argues, “is evidence that
structure-dependent rules are part of the definition of universal grammar that children
are born with.”
Now, though Chomsky has been fantastically
influential in the science of language, that does not mean that all language scientists
agree with him.  And there have been a number of critiques of Chomsky over the years.  For
one thing, the critics point out, Chomsky hasn’t really shown principles of universal
grammar that are specific to language itself as opposed to general ways in which the human
mind works across multiple domains, language and vision and control of motion and memory
and so on.  We don’t really know that universal grammar is specific to language, according
to this critique.
Secondly, Chomsky and the linguists working
with him have not examined all 6,000 of the world’s languages and shown that the principles
of universal grammar apply to all 6,000.  They’ve posited it based on a small number of languages
and the logic of the poverty of the input, but haven’t actually come through with the
data that would be necessary to prove that universal grammar is really universal.  
Finally, the critics argue, Chomsky has not shown that more general purpose learning models,
such as neural network models, are incapable of learning language together with all the
other things that children learn, and therefore has not proven that there has to be specific
knowledge of how grammar works in order for the child to learn grammar.
Another component of language governs the
sound pattern of language, the ways that the vowels and consonants can be assembled into
the minimal units that go into words.  Phonology, as this branch of linguistics is called, consists
of formation rules that capture what is a possible word in a language according to the
way that it sounds.
To give you an example, the sequence “bluk” is not an English word,
but you get a sense that it could be an English word, that someone could coin a new term
of English that we pronounce “bluk.”  But when you
hear the sound ****, you instantly know that not only isn’t it an English word, but it
really couldn’t be an English word.  ****, by the way, comes from Yiddish and it means kind
of to sigh or to moan.  Oi.  That’s to ****.  
The reason that we recognize that it’s not English is because it has sounds like **** and
sequences like ****, which aren’t part of the formation rules of English phonology.
 But together with the rules that define the basic words of a language, there are also
phonological rules that make adjustments to the sounds, depending on what other words
the word appears with.  Very few of us realize, for example, in English, that the past tense
suffix “ed” is actually pronounced in three different ways.  When we say, “He
walked,” we pronounce the “ed” like a “t,” walked.  When we say “jogged,”
we pronounce it as a “d,” jogged.  And when we say “patted,” we stick in
a vowel, pat-ted, showing that the same suffix, “ed” can be readjusted in its pronunciation
according to the rules of English phonology.  
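The three pronunciations follow a pattern that can be sketched as a small rule keyed to the last sound of the verb stem. This is a simplified sketch of the standard textbook description, not something spelled out in the talk: the function name and the informal ASCII sound labels are assumptions, and the caller supplies the final sound directly, since English spelling does not reliably indicate it.

```python
# Voiceless consonants other than "t", written with informal ASCII labels
# ("th" here means the voiceless sound at the end of "froth").
VOICELESS = {"p", "k", "f", "s", "sh", "ch", "th"}


def past_tense_suffix(final_sound):
    """Return how the regular past tense suffix is pronounced after a stem.

    final_sound: informal label for the last sound of the stem (assumed input).
    """
    if final_sound in {"t", "d"}:
        return "id"  # a vowel is inserted, as in pat-ted
    if final_sound in VOICELESS:
        return "t"   # as in walked
    return "d"       # voiced consonants and vowels, as in jogged


for verb, last in [("walk", "k"), ("jog", "g"), ("pat", "t")]:
    print(verb, "->", past_tense_suffix(last))
```

The order of the checks matters: stems ending in “t” or “d” are handled first, because otherwise “pat” would wrongly fall under the voiceless case.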
Now, when someone acquires English as a foreign language or acquires a foreign language in
general, they carry over the rules of phonology of their first language and apply them to their
second language.  We have a word for it; we call it an “accent.”  When a language
user deliberately manipulates the rules of phonology, that is, when they don’t just
speak in order to convey content but pay attention to what phonological structures
are being used, we call it poetry and rhetoric.
So far, I’ve been talking about knowledge
of language, the rules that go into defining what are possible sequences of language.  But
those sequences have to get into the brain during speech comprehension and they have
to get out during speech production.  And that takes us to the topic of language interfaces.
And let’s start with production.
This diagram here is literally a human cadaver
that has been sawn in half.  An anatomist took a saw and [sound] allowing us to see
in cross section the human vocal tract.  And that can illustrate how we get our knowledge
of language out into the world as a sequence of sounds.  
Now, each of us has at the top of our windpipe or trachea, a complex structure called the
larynx or voice box; it’s behind your Adam’s apple.  And the air coming out of your lungs
has to go past two cartilaginous flaps that vibrate and produce a rich, buzzy sound
source, full of harmonics.  Before that vibrating sound gets out to the world, it has to pass
through a gauntlet of chambers of the vocal tract: the throat behind the tongue, the
cavity above the tongue, the cavity formed by the lips, and when you block off airflow
through the mouth, it can come out through the nose.  
Now, each one of those cavities has a shape that, thanks to the laws of physics, will
amplify some of the harmonics in that buzzy sound source and suppress others.  We can
change the shape of those cavities when we move our tongue around.  When we move our
tongue forward and backward, for example, as in “eh,” “aa,” “eh,” “aa,”
we change the shape of the cavity behind the tongue, change the frequencies that are amplified
or suppressed and the listener hears them as two different vowels.  
Likewise, when we raise or lower the tongue, we change the shape of the resonant cavity
above the tongue as in say, “eh,” “ah,” “eh,” “ah.”  Once again, the change
in the mixture of harmonics is perceived as a change in the nature of the vowel.  
When we stop the flow of air and then release it, as in “t,” “ca,” “ba,” then
we hear a consonant rather than a vowel, and likewise when we restrict the flow of air, as in
“f,” “ss,” producing a chaotic noisy sound.  Each one of those sounds that gets
sculpted by different articulators is perceived by the brain as a qualitatively different
vowel or consonant.   Now, an interesting peculiarity of the human
vocal tract is that it obviously co-opts structures that evolved for different purposes, for breathing
and for swallowing and so on.  And it’s an interesting fact first
noted by Darwin that the larynx over the course of evolution has descended in the throat so
that every particle of food going from the mouth through the esophagus to the stomach
has to pass over the opening into the larynx with some probability of being inhaled leading
to the danger of death by choking.  And in fact, until the invention of the Heimlich
Maneuver, several thousand people every year died of choking because of this maladaptive
design of the human vocal tract.  Why did we evolve a mouth and throat that
leaves us vulnerable to choking?  Well, a plausible hypothesis is that it’s a compromise
that was made in the course of evolution to allow us to speak.  By giving range to a
variety of possibilities for altering the resonant cavities, for moving the tongue back
and forth and up and down, we expanded the range of speech sounds we could make and improved
the efficiency of language, but suffered the compromise of an increased risk of choking,
showing that language presumably had some survival advantage that compensated for the
disadvantage in choking.   What about the flow of information in the
other direction, that is from the world into the brain, the process of speech comprehension?
  Speech comprehension turns out to be an extraordinarily
complex computational process, which we’re reminded of every time we interact with a
voicemail menu on a telephone or use dictation on our computers.  For example,
one writer, using a state-of-the-art speech-to-text system, dictated the following words into
his computer.  He dictated “book tour,” and it came out on the screen as “back to
work.”  Another example: he said, “I truly couldn’t see,” and it came out on
the screen as “a cruelly good MC.”  Even more disconcertingly, he started a letter
to his parents by saying, “Dear mom and dad,” and what came out on the screen was “The
man is dead.”    Now, dictation systems have gotten better
and better, but they still have a way to go before they can duplicate a human stenographer.
  What is it about the problem of speech understanding
that makes it so easy for a human, but so hard for a computer? Well, there are two
main contributors.  One of them is the fact that each phoneme, each vowel or consonant, actually
comes out very differently depending on what comes before and what comes after, a phenomenon
sometimes called co-articulation.   Let me give you an example.  The place called
Cape Cod has two “c” sounds, each of them symbolized by the letter “C,”
the hard “C.”  Nonetheless, when you pay attention to the way you pronounce them,
you notice that in fact, you pronounce them in very different parts of the mouth.  Try
it.  Cape Cod, Cape Cod… “c,” “c”.  In one case, the “c” is produced way
back in the mouth; the other it’s produced much farther forward.  We don’t notice
that we pronounce “c” in two different ways depending on whether it comes before an
“a” or an “ah,” but that difference makes a difference in the shape of the resonant
cavity in our mouth, which produces a very different wave form.  And unless a computer
is specifically programmed to take that variability into account, it will perceive those two different
“c’s” as the different sounds that, objectively speaking, they really are:  “c-eh,” “c-oa.”
 They really are different sounds, but our brain lumps them together.  
The other reason that speech recognition is such a difficult problem is because of the
absence of segmentation.  Now, we have an illusion when we listen to speech that it consists
of a sequence of sounds corresponding to words.  But if you actually were to look at the
wave form of a sentence on an oscilloscope, there would not be little silences between
the words the way there are little bits of white space in printed words on a page, but
rather a continuous ribbon in which the end of one word leads right to the beginning of
the next.   It’s something that we’re aware of
when we listen to speech in a foreign language
when we have no idea where one word ends and the other one begins.  In our own language,
we detect the word boundaries simply because in our mental lexicon, we have stretches of
sound that correspond to one word that tell us where it ends.  But you can’t get that
information from the wave form itself.   In fact, there’s a whole genre of wordplay
that takes advantage of the fact that word boundaries are not physically present in the
speech wave.  Novelty songs like “Mairzy doats and dozy doats and liddle lamzy divey,
a kiddley divey too, wooden shoe?”

Now, it turns out that this is actually a grammatical
sequence of words in English: “Mares eat oats and does eat oats and little lambs eat
ivy, a kid’ll eat ivy too, wouldn’t you?” When it is spoken or sung normally, the boundaries
between words are obliterated and so the same sequence of sounds can be perceived either
as nonsense or if you know what they’re meant to convey, as sentences.  
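The point that word boundaries come from the listener's lexicon rather than from the signal can be sketched in a few lines of Python. This is only a toy under loud assumptions: letters stand in for speech sounds, the five-word lexicon is invented for the example, and the function name `segmentations` is hypothetical. It is not a model of how the brain or any real speech recognizer does it.

```python
from functools import lru_cache


def segmentations(stream, lexicon):
    """Return every way to split `stream` into words found in `lexicon`."""
    words = frozenset(lexicon)

    @lru_cache(maxsize=None)
    def split_from(start):
        # Reaching the end of the stream means one complete parse.
        if start == len(stream):
            return [[]]
        results = []
        for end in range(start + 1, len(stream) + 1):
            piece = stream[start:end]
            if piece in words:
                for rest in split_from(end):
                    results.append([piece] + rest)
        return results

    return [" ".join(parts) for parts in split_from(0)]


# Toy lexicon (an assumption for illustration only).
toy_lexicon = ["mare", "mares", "eat", "seat", "oats"]
print(segmentations("mareseatoats", toy_lexicon))
print(segmentations("mareseatoats", []))
```

With this lexicon the same unbroken stream can be carved up two ways, "mare seat oats" and "mares eat oats", and with an empty lexicon, as for a listener hearing a foreign language, no segmentation is found at all.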
Another example familiar to most children, Fuzzy Wuzzy was a bear, Fuzzy Wuzzy had
no hair.  Fuzzy Wuzzy wasn’t very fuzzy, was he?  And the famous doggerel, I scream,
you scream, we all scream for ice cream.  We are generally unaware of how ambiguous
language is.  In context, we effortlessly and unconsciously derive the intended meaning
of a sentence, but a poor computer not equipped with all of our common sense and human abilities
and just going by the words and the rules is often flabbergasted by all the different
possibilities.  Take a sentence as simple as “Mary had a little lamb,” you might
think that that’s a perfectly simple unambiguous sentence.  But now imagine that it was continued
with “with mint sauce.”  You realize that “have” is actually a highly ambiguous
word. As a result, computer translations can often deliver comically incorrect results.
  According to legend, one of the first computer
systems that was designed to translate from English to Russian and back again did the
following: given the sentence, “The spirit is willing, but the flesh is weak,” it translated
it back as “The vodka is agreeable, but the meat is rotten.” 
So why do people understand language so much better than computers?  What is the knowledge
that we have that has been so hard to program into our machines?  Well, there’s a third
interface between language and the rest of the mind, and that is the subject matter of
the branch of linguistics called Pragmatics, namely, how people understand language in
context using their knowledge of the world and their expectations about how other speakers
communicate.   The most important principle of Pragmatics
is called “the cooperative principle,” namely, assume that your conversational partner
is working with you to try to get a meaning across truthfully and clearly.  And our knowledge
of Pragmatics, like our knowledge of syntax and phonology and so on, is deployed effortlessly,
but involves many intricate computations.  For example, if I were to say, “If you
could pass the guacamole, that would be awesome.”  You understand that as a polite request
meaning, give me the guacamole.  You don’t interpret it literally as a rumination about
a hypothetical affair, you just assume that the person wanted something and was using
that string of words to convey the request politely.  
Often comedies will use the absence of pragmatics in robots as a source of humor.  As in the
old “Get Smart” situation comedy, which had a robot named Hymie, and a recurring
joke in the series would be that Maxwell Smart would say to Hymie, “Hymie, can you give
me a hand?”  And then Hymie would go [sound], remove his hand and pass it over to Maxwell
Smart not understanding that “give me a hand,” in context means, help me rather
than literally transfer the hand over to me.  
Or take the following example of Pragmatics in action.  Consider the following dialogue,
Martha says, “I’m leaving you.”  John says, “Who is he?”  Now, understanding
language requires finding the antecedents of pronouns, in this case who the “he” refers
to, and any competent English speaker knows exactly who the “he” is, presumably John’s
romantic rival even though it was never stated explicitly in any part of the dialogue.  This
shows how we bring to bear on language understanding a vast store of knowledge about human behavior,
human interactions, human relationships.  And we often have to use that background knowledge
even to solve mechanical problems like, who does a pronoun like “he” refer to.  It’s
that knowledge that’s extraordinarily difficult, to say the least, to program into a computer.
  Language is a miracle of the natural world
because it allows us to exchange an unlimited number of ideas using a finite set of mental
tools.  Those mental tools comprise a large lexicon of memorized words and a powerful
mental grammar that can combine them.  Language thought of in this way should not be confused
with writing, with the prescriptive rules of proper grammar or style or with thought
itself.   Modern linguistics is guided by the questions,
though not always the answers, suggested by the linguist Noam Chomsky, namely:
how is the unlimited creativity of language possible?  What are the abstract mental structures
that relate words to one another? How do children acquire them?  
What is universal across languages?  And what does that say about the human mind?  
The study of language has many practical applications including computers that understand and speak,
the diagnosis and treatment of language disorders, the teaching of reading, writing, and foreign
languages, the interpreting of the language of law, politics and literature.
But for someone like me, language is eternally fascinating because it speaks to such fundamental
questions of the human condition.  [Language] is really at the center of a number of different
concerns, of thought, of social relationships, of human biology, of human evolution, that
all speak to what’s special about the human species. 
Language is the most distinctively human talent.  Language is a window into human nature,
and most significantly, the vast expressive power of language is one of the wonders of
the natural world.  Thank you.