Daily Notebook
This web site is protected by copyright law.
Reusing pictures or text requires permission from the author.
2026 April 4
Do LLMs show that linguistics is bunk?

This will be one of the longest and most academic Daily Notebook entries I've ever written, so hang on to your hats. Two researchers have synthesized the work of many others to answer a question with implications for the significance of everything I studied in graduate school and for many years afterward. And the result is good news.

The paper I'm talking about is Futrell and Mahowald, How Linguistics Learned to Stop Worrying and Love the Language Models. I've linked to an arXiv preprint, but it is soon going to appear in a major journal. Here I want to summarize it so that non-linguists will understand at least part of it, and linguists who have been away from this subfield will understand more.

The question is whether the branch of theoretical linguistics founded by Noam Chomsky, which studies sentence structure, is rendered obsolete or refuted by large language models (LLMs, chatbots). After all, we are getting high-quality natural language processing without having to feed the computers any kind of grammatical analysis, Chomskyan or not; we only give the computers large (huge) (titanically huge) samples of the language.

First, some cautions:
Now then. Let's go back to 1955 or so. Scientific linguistics at the time had a good grip on phonetics, phonology, and word formation, but sentence structure seemed elusive; most descriptions of languages said relatively little about it. You have a fixed vocabulary of words and a fixed set of word forms, but a seemingly unlimited supply of sentences; certainly, you normally produce and understand sentences you've never heard before. How to proceed? The one thing that was clear about sentence structure is that it is tree-like, like this:

[S [NP The cat] [VP is [NP my favorite pet]]]
Whenever elements of a sentence are moved around, substituted, or questioned, they follow the grouping shown by the tree. For example, based on this sentence, you can easily construct questions whose answer is "the cat" or "my favorite pet" but not "is my" or "cat is." (See Wells, "Immediate Constituents," 1947, and Chapter 4 of my own NLP textbook.) Enter Noam Chomsky, who proposed a way to model this precisely. The sentence structure of a language is described by a set of phrase-structure rules that say how elements go together, such as:
S → NP VP

A sentence is grammatical ("is generated by the rules") if everything in the tree is permitted by a phrase-structure rule. That's known as a context-free phrase-structure grammar and is mathematically well understood.

But Chomsky noticed something else. That type of grammar is almost but not quite adequate for human language. There are things it doesn't handle, such as the formation of English questions. A question like What did you see him climbing on yesterday? has what at the beginning and a missing noun phrase somewhere else in the sentence (in this case after on). Phrase-structure rules can't express how that works. Chomsky's answer was to add a second type of rule, transformations, which allow rearranging the output of the phrase-structure rules in specific ways. The result was called transformational-generative grammar.

And why was this interesting? Because if we can figure out what the grammar of human languages is made of, we'll know what the human brain processes when we speak and understand speech. We'll also know what the human brain learns when children learn to talk. It is clear that the human brain has some built-in language capabilities; this is a way to discover something about how they work.

One cautionary note. Many linguists at the time (1950s to 1980s) mistakenly thought the phrase-structure rules and transformations were operations performed by the speaker's and hearer's minds. That need not be the case. They are just descriptions of what is computed, so to speak, not how to compute it. Confusion about this point was widespread.

The immediate impact of transformational-generative grammar ("TG" to its friends) was that we discovered a lot more about the grammar of English and other languages. Trying to write precise phrase-structure rules and transformations, we described languages explicitly in ways that had never been done before. I think the crown jewel of this was Jackendoff's X-bar theory.
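To make the notion of "generated by the rules" concrete, here is a minimal sketch in Python of a context-free phrase-structure grammar. The particular rules, lexicon, and example tree are my own toy illustration, not taken from Chomsky or from the paper; the point is only that grammaticality amounts to checking every branching of the tree against a rule.

```python
# A toy context-free phrase-structure grammar, sketched in Python.
# Rules and lexicon are illustrative only.

RULES = {
    "S":  [("NP", "VP")],
    "NP": [("Det", "N")],
    "VP": [("V", "NP")],
}
LEXICON = {
    "Det": {"the", "my", "a"},
    "N":   {"cat", "pet", "dog"},
    "V":   {"is", "chased"},
}

def licensed(tree):
    """A tree is generated by the grammar if every local branching
    matches a phrase-structure rule and every leaf fits the lexicon."""
    label, children = tree
    if isinstance(children, str):                      # leaf: a word
        return children in LEXICON.get(label, set())
    if tuple(c[0] for c in children) not in RULES.get(label, []):
        return False
    return all(licensed(c) for c in children)

tree = ("S", [("NP", [("Det", "the"), ("N", "cat")]),
              ("VP", [("V", "is"), ("NP", [("Det", "my"), ("N", "pet")])])])
print(licensed(tree))  # True
```

Swapping the NP and VP under S, or replacing a word with one outside the lexicon, makes `licensed` return False: exactly the sense in which the grammar "generates" some trees and not others.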
Paul Postal wrote a whole book about part of the way English forms subordinate clauses (On Raising). Most of what we all know about our native language was not in the grammar books!

That's where I came in (mid-1970s). Some linguists, including me, continued to be interested mainly in making grammar explicit (notably GPSG, and Jackendoff's later "Simpler Syntax," both of which I followed closely). But by the early 80s, Chomsky and his close associates were moving toward something else: principles and parameters (P&P). The idea was that the phrase-structure rules and transformations don't need to be written out because they follow from more basic principles, together with language-specific parameters that tell you, for instance, the word order of one language versus another. The rules were simplified, in a system Chomsky called minimalism. P&P provided good tools for describing language typology and historical changes (see Ringe and Eska's book on historical linguistics). I, however, stuck with the more descriptive line of investigation and focused on computer algorithms for parsing (finding the tree structure of sentences). It was widely understood in the 1990s that parsing was the key to computer language understanding, and that the other key problem was relating tree structure to meaning (the problem that most interested me).

Now here we are, and LLMs process language extremely well without ever having been given a grammar to work from. Does that mean TG and all its relatives were bunk? Crucially, an LLM, as it is trained, learns the usage of each word separately from all the others. It need not have a grammar at all. Superficially, it looks like a huge dictionary, with information only about individual words, although we can't quite see what generalizations might be hidden in its neural network layers. You'd think there'd be no grammar at all: that learning a language has been shown to be nothing but learning individual words and how they are used.
If so, we theoretical linguists are out of business. Well... It is just now becoming possible to probe what goes on inside an LLM, and Futrell and Mahowald have pulled together what is currently known. Here are their key conclusions:
So there we stand. The story isn't over (fortunately), but I am glad to begin to see how a lot of things are coming out!
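As a footnote for the technically inclined: the parsing mentioned above, finding the tree structure of sentences, can be sketched very compactly. Here is a toy CKY chart recognizer over a small grammar in Chomsky normal form; the grammar, lexicon, and sentences are my own illustrations, not from any of the works cited.

```python
# A minimal CKY chart recognizer for a toy grammar in Chomsky normal form.
# Grammar, lexicon, and example sentences are illustrative only.
from collections import defaultdict

BINARY = {("NP", "VP"): "S", ("Det", "N"): "NP", ("V", "NP"): "VP"}
WORDS  = {"the": "Det", "a": "Det", "cat": "N", "dog": "N", "chased": "V"}

def cky(sentence):
    """Return True if the sentence can be analyzed as an S."""
    n = len(sentence)
    chart = defaultdict(set)            # chart[(i, k)] = labels spanning words i..k
    for i, w in enumerate(sentence):
        chart[(i, i + 1)].add(WORDS[w])
    for span in range(2, n + 1):        # build longer spans from shorter ones
        for i in range(n - span + 1):
            k = i + span
            for j in range(i + 1, k):   # try every split point
                for left in chart[(i, j)]:
                    for right in chart[(j, k)]:
                        parent = BINARY.get((left, right))
                        if parent:
                            chart[(i, k)].add(parent)
    return "S" in chart[(0, n)]

print(cky("the cat chased a dog".split()))  # True
```

Word orders the grammar does not license, such as "chased the cat" standing alone, come back False; with a probabilistic grammar the same chart yields the most likely tree rather than a yes/no answer.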
2026 April 2
A day in the big city

I learned about the Optimized AI Conference just a couple of days before it took place, but I managed to take in the last two hours of it on March 31. That involved driving to Marietta (north of Atlanta), where Melody lived and worked for a couple of years just before we got married, so I was revisiting old places that had undergone tremendous urban growth. The conference was a success. I was delighted to see quite a few people I knew from LinkedIn, and one former FormFree colleague who was a keynote speaker and is now very well known for her SQL and data science courses and videos.
From LinkedIn: That was in the Cumberland Galleria. Next I went for a walk in Cumberland Mall, where Melody and I used to walk around after going out to dinner, and was pleased to find it thriving. (As you know, the death of malls has bugged me; they were a feature of 1980s living that I very much enjoyed.) This one is doing plenty of business, although, like all malls, it no longer sells much but clothing; there is no bookstore, record store, or Radio Shack.
It has changed a little; Dick's Sporting Goods occupies what used to be Neiman-Marcus, where we used to look at luxury stereos and the like that we never expected to be able to afford. I'm glad to see the mall prospering, and I think the big problem in the 1980s was that about three times as many malls were built as the economy could support. Dinner in the Food Court, then a brief visit to Micro Center (formerly MEI Micro Center) in Marietta, which is the oldest Micro Center computer store presently operating, though the chain started earlier, in Ohio. This store dates from 1988, and we visited it occasionally when it was new, although we normally go to the Duluth store now. I am glad to see them catering to hobby electronics; with the demise of both Radio Shack and the local repair-parts jobbers that we used to rely on, it has become very hard to get even the most basic components, supplies, and tools locally. For me, "locally" now means 50 miles away, but the point is, by having these things in a local store, they keep people aware of what they can do.

Off to the moon!

Last night I watched the launch of a crewed spacecraft to the moon, the first since 1972. I saw the launches of Apollo 16 and 17 in person, from a swamp near Cape Canaveral, and had watched many previous launches on TV. This will be an orbit of the moon, like Apollo 8, not a landing there. This time I was using TV, but not broadcast TV. I connected to NASA's web page and used ChromeCast to send my computer's video to our seldom-used TV set.
This picture is from a NASA press release:

While sending human crews is not the most efficient way to explore space, I think we need to preserve and update the technology we already have, rather than let the knowledge of Apollo be lost to posterity. Godspeed, Artemis II.

From twisted pair to fiber

Seven and a half years ago I chronicled the end of POTS (Plain Old Telephone Service) to our house and its replacement by an Internet cable and VOIP. Soon the coaxial cable will be replaced by fiber optics, and AT&T will again be our carrier. And we will have no cable TV service at all.
This is a private web page,
not hosted or sponsored by the University of Georgia. Portrait at top of page by Sharon Covington. This web site has never collected personal information and is not affected by GDPR. Google Ads may use cookies to manage the rotation of ads, but those cookies are not made available to Covington Innovations. No personal information is collected or stored by Covington Innovations, and never has been. This web site is based and served entirely in the United States.
In compliance with U.S. FTC guidelines,
I am glad to point out that unless explicitly
indicated, I do not receive substantial payments, free merchandise, or other remuneration
for reviewing or mentioning products on this web site.
Any remuneration valued at more than about $10 will always be mentioned here,
and in any case my writing about products and dealers is always truthful.