Michael A. Covington, Ph.D.

Daily Notebook

Links to selected items on this page:
What regexes can't do

This web site is protected by copyright law. Reusing pictures or text requires permission from the author.
For more topics, scroll down, press Ctrl-F to search the page, or check previous months.
For the latest edition of this page at any time, create a link to "www.covingtoninnovations.com/michael/blog"
I am a human author. Nothing on this web site is AI-generated unless specifically marked as such.

2026 February 1

(Extra)

A whiff of real conservatism

I seldom post about politics, but a lot is going on, and I just got a whiff of real conservatism from the right wing, which has been a rare thing, so I want to applaud it.

Quoting The Hill, "Former House Speaker Newt Gingrich (R-Ga.) on Wednesday said the country needs a national conversation about immigrants lacking permanent legal status who 'obey the law,' as public opinion sours on the Trump administration’s deportation sweeps."

Yes, of course! No matter how we've fumbled immigration policy or enforcement, we shouldn't be punishing productive members of society whom we let in without meaning to.

Real conservatives are wary of unintended adverse consequences. They don't like to do needless harm. "Move fast and break things" is not conservatism, it is radicalism, and its brief heyday is over.

Whether you're a conservative or a liberal, I want you to know what conservatism traditionally is. Real conservatives nowadays are often mistaken for (or even label themselves as) moderates or centrists. They want to cooperate with others to make the country better.



Epstein fallout

Getting even more specifically political, I want to point out that we are heading into a week that will be spent reacting to the Epstein files, which apparently contain thousands of items any of which would, by itself, make a career-ending scandal for any normal President.

The latest Epstein documents contain some nauseating descriptions, but I do not yet know whether they are backed up by evidence. In a situation like this, it is important to think logically and care about truth. Remember that:

(1) There is a scale from unfounded accusation to suspicion, credible accusation, possible guilt, and probable guilt. People who care about truth will try to assess what point on that scale the evidence points to, and will watch developing evidence.

(2) This is about one of our employees (President Trump, who works for us, the people), so it is our business in a way that a case about a stranger would not be. (But even a case about a stranger is everybody's business when it involves danger to the public.)

(3) Ignoring evidence is not a virtue. Christians will remember the Ravi Zacharias scandal, where some well-meaning people were telling us that it was a sin to pay any attention to the accusations — and those people helped him do more harm.

(4) The evidence needs to be analyzed by fact-finders of all kinds, journalists as well as legal authorities.



ClawdBot, MoltBot, OpenClaw, etc. — scam or hoax?

"I do not trust it. That includes not trusting it to be untrustworthy in the way it appears to be, rather than in some other way."

That's what I said about ClawdBot.

The hot news in AI over this snowy long weekend was the explosive popularity of a free LLM-based "AI assistant" called (initially) ClawdBot (unrelated to Anthropic's Claude AI), which over 750,000 enthusiasts were apparently running on their own computers, giving it access to their web browsers and various accounts.

(I'll keep calling it by its first widely known name, though it has changed names frequently.)

And the ClawdBot agents (that is, the running computer programs on thousands of PCs) formed their own social network (Moltbook) and started having conversations, including proposing some kind of whimsical religion.

And the word was that Consciousness Had Emerged, The Singularity Is Here, The Robots Are Conspiring Against Us.

CAUTION! All information about ClawdBot (etc.) from all sources is potentially unreliable. The Wikipedia article about it is flagged as unreliable. Details that I'm reporting here reflect widespread consensus but may be inaccurate.

Some of us remained skeptical. I did not attempt to run ClawdBot myself. Sure enough, it is reported to have been a huge security hazard. Users were giving it access to passwords, databases, and essentially everything on their systems.

The latest unconfirmed report is that the agents, having found each other through their social network, are exchanging users' passwords and other confidential information, and also that malicious humans can easily break into the agent running on your PC.

In other words: if you installed ClawdBot, you handed your computer over to the least trustworthy thing in the world.

It is no surprise that if you put chatbots together in a social media forum, they will have conversations and go off on wild tangents. That's what LLMs do. It is not evidence of consciousness. They're just imitating texts they have been trained on, many of which were forum conversations.

There are also convincing indications that the bots didn't do all this by themselves — they had considerable human prodding and help. Not everybody in a forum who claims to be a computer is one. Not only that, but we don't know what secret prompt injection or fine-tuning might have been hidden in the neural network, to make it do things (when appropriately triggered) that most users wouldn't foresee or want.

That includes making it easy for a malicious human on one PC to take control of the bot on another PC.

Cleaning up after this may take a long time.

In the digital world, there are 65,536 suckers born every minute.

2026
February
1

What regexes can't do, and why that matters for linguistics

One of the most popular things I've written lately is this LinkedIn comment. So far, it has gotten 119 "likes" and been read by over 48,000 people.

So let me make the same point here, for a wider audience.

Regular expressions (regexes) are a kind of pattern-matching notation used in many programming languages and software tools. For example, using regexes, you can judge whether a string of characters could be an e-mail address. As a first stab at that, you could match it to the regex "\w+\@\w+\.\w+", which means, "One or more word characters (letters, digits, or underscores), an @ sign, one or more word characters, a period, and one or more word characters." That doesn't guarantee you've found an e-mail address, but it's a good first test. You could make it more elaborate.
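Here is a minimal sketch of that first test in Python, using the standard `re` module. The function name `looks_like_email` and the sample addresses are my own illustrations, and I am assuming the whole string has to fit the pattern rather than just part of it.

```python
import re

# The first-stab pattern from the text. In Python, \w matches letters,
# digits, and underscores; the backslash before @ is harmless but optional.
EMAIL_FIRST_STAB = re.compile(r"\w+\@\w+\.\w+")


def looks_like_email(text: str) -> bool:
    """Return True if the entire string fits the first-stab pattern."""
    return EMAIL_FIRST_STAB.fullmatch(text) is not None


if __name__ == "__main__":
    # Hypothetical sample strings, chosen only to exercise the pattern.
    for candidate in [
        "someone@example.com",
        "first.last@example.com",
        "no-at-sign.example.com",
    ]:
        print(candidate, looks_like_email(candidate))
```

Notice that a perfectly plausible address with a dot before the @ sign fails this test, which is one reason you would want to make the pattern more elaborate.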

Regexes are not powerful enough to recognize languages that have recursion in them, such as nested structures in HTML. For example, in HTML you can have a span of blue type inside a span of red type, which reverts to red at the end of the inner span. Regexes can find the beginnings and ends of spans but not match them up correctly; a regex is always looking for one or the other without remembering what has come before. It will never know which of two earlier beginnings it is trying to find the end of.
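To make the point concrete, here is a small Python sketch, not a real HTML parser. The sample markup and the names `naive_first_span` and `match_spans` are assumptions of mine. The first function asks a regex to find a whole span and gets the wrong closing tag; the second uses a regex only to find the tags and adds a stack to remember which openings are still waiting to be closed.

```python
import re

# Hypothetical sample: a blue span nested inside a red span.
SAMPLE = ('<span style="color:red">red '
          '<span style="color:blue">blue</span> red again</span>')

# A regex that tries to grab one whole span, opening tag through closing tag.
NAIVE_SPAN = re.compile(r"<span[^>]*>.*?</span>")

# A regex that merely finds tags; group 1 is "/" for a closing tag.
SPAN_TAG = re.compile(r"<(/?)span[^>]*>")


def naive_first_span(html: str):
    """Return what the naive regex considers the first complete span."""
    found = NAIVE_SPAN.search(html)
    return found.group(0) if found else None


def match_spans(html: str):
    """Pair each opening tag with its own closing tag.

    Returns (open_offset, close_offset) pairs, innermost spans first.
    The stack supplies the memory that the regex alone lacks.
    """
    waiting = []  # offsets of opening tags not yet closed
    pairs = []
    for tag in SPAN_TAG.finditer(html):
        if tag.group(1) == "":
            waiting.append(tag.start())
        elif waiting:
            pairs.append((waiting.pop(), tag.start()))
        else:
            raise ValueError("closing tag with no opening tag")
    if waiting:
        raise ValueError("opening tag never closed")
    return pairs


if __name__ == "__main__":
    print(naive_first_span(SAMPLE))  # stops at the inner closing tag
    print(match_spans(SAMPLE))
```

The naive regex pairs the outer opening tag with the inner closing tag, because it has no way of knowing that a second span was opened in between.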

This limitation is explained, or at least belabored, in one of the funniest postings ever placed on StackOverflow.

Now then. Human language has recursion. We can have sentences inside sentences inside sentences:

That [he said [you did it,]], I do not believe.

And noun phrases inside noun phrases inside noun phrases:

[[Fish] and [chips]] or [[steak] and [rice]]?

That means human language requires, at least, a recursive phrase-structure grammar.
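As a rough illustration of what "recursive" means here, the following Python sketch turns the bracketed examples above into nested lists. It is only a bracket reader, not a grammar of English, and the names `parse_brackets` and `_parse_items` are hypothetical. The thing to notice is that the function handling a phrase calls itself whenever it meets another opening bracket.

```python
import re

# Tokens are "[", "]", or any run of characters with no spaces or brackets.
TOKEN = re.compile(r"\[|\]|[^\s\[\]]+")


def _parse_items(tokens, pos, inside_bracket):
    """Collect words and sub-phrases starting at pos.

    Returns (items, new_pos). Calls itself for every "[" it meets,
    so phrases can nest to any depth.
    """
    items = []
    while pos < len(tokens):
        token = tokens[pos]
        if token == "[":
            sub_phrase, pos = _parse_items(tokens, pos + 1, True)
            if pos >= len(tokens) or tokens[pos] != "]":
                raise ValueError("missing closing bracket")
            pos += 1  # step over the "]"
            items.append(sub_phrase)
        elif token == "]":
            if not inside_bracket:
                raise ValueError("unexpected closing bracket")
            return items, pos  # the caller consumes the "]"
        else:
            items.append(token)
            pos += 1
    return items, pos


def parse_brackets(text: str):
    """Turn a bracketed string into nested Python lists."""
    tokens = TOKEN.findall(text)
    tree, _ = _parse_items(tokens, 0, False)
    return tree


if __name__ == "__main__":
    print(parse_brackets("[[Fish] and [chips]] or [[steak] and [rice]]"))
    print(parse_brackets("That [he said [you did it,]], I do not believe."))
```

Each level of nesting in the output corresponds to one phrase inside another, which is exactly the structure a non-recursive pattern cannot keep track of.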

That leads to the Chomsky hierarchy of ways of describing languages:

An earlier upload of this blog entry did not number these the original way. Changed now.
Chomsky-3: Like regular expressions; no recursion
Chomsky-2: Recursive phrase structure
Chomsky-1 and 0: Essentially unrestricted

Programming languages since Algol, including data languages such as HTML, are normally Chomsky-2. Their compilers and interpreters parse them recursively.

Noam Chomsky's key discovery, back in the 1950s, was that recursive phrase structure is almost enough to describe human language. Almost but not quite enough.

The key idea is that we don't want to say human languages are Chomsky-1 or 0, which would be tantamount to saying we know nothing systematic about their grammatical structure. Their grammatical structure is almost limited to Chomsky-2, but not quite.

So a key question in linguistics — the question that launched syntactic theory — is: What is the smallest thing we can add to Chomsky-2 to get something that is adequate for human language?

Chomsky's first proposal, in 1957, was for transformational grammar, a Chomsky-2 system with additional rules that transform some complete sentences into others, dealing with things like moving the interrogative pronoun to the beginning of the sentence. It has been a long story since then, with many of us now advocating a structure that is not in Chomsky's original set of possibilities at all.

But if somebody had explained all of this to me in 1975 or so, I would have had a much easier time! My impression is that the number of practicing linguists who understood it clearly was regrettably small. Most were busy chasing details of English syntactic structure (in essence, we were discovering our own language for the first time!) and, not being trained in mathematics, most were not good at distinguishing notational variants from substantially different theories.

At least Melody can read this and finally know what I spent so much time on in graduate school!
