originally posted by Jonathan Loesberg:
Are you saying that languages don't evolve to be more regular?
yes. languages are massively non-linear systems that have evolved to support the efficient communication of (a) the small set of things that everyone says and (b) the unbounded set of things that almost no-one says. one consequence of this is that, these days at least, no-one learns a language in its entirety and language learning never ends.
irregularities support (a) -- and in serving to emphasize communicative contrasts, they also make them easier for kids to learn -- while regularities help solve the problems posed by (b) -- all the unattested forms that you will encounter and need to use (what shape should the plurals of all of those nouns you've yet to learn take, and what should their past tenses be when you verb them?).
hopefully one thing we all learned from the pandemic is that our monkey brains are shit at thinking about non-linear systems. a set of 100 or so exponentially distributed words comprises half of the words any english speaker ever says, with the rest of the language comprising a massively long tail of further types.
to put how hard this shit is to intuit into context, 10 years or so ago some dudes from harvard (including the beloved pinker) published an article (in science no less) in which they estimated the vocabulary of english -- including proper nouns -- as comprising 1,022,000 types in 2000. which sounds clever until you realize that languages have other cool statistical properties (such as burstiness, which means briefly that word frequencies are not the same as word probabilities) and with this in mind, you drag your knuckles over to the us census bureau and discover in its records that in 2000 there were 6,248,415 surnames (alone) in use in the us...
You are right, of course, that sneaked/snuck is an example of the opposite. But is it an outlier or normal? Without any data, it is my sense that most newly coined verbs in the two languages I speak follow the most usual forms of conjugation. Thus, for instance, to google in French is googler and not googlir, or, god help us, googlre. But--truly--I'd be happy to be shown I'm wrong.
see above. irregular forms are outliers by definition if you look at types, whereas they are often the majority if you look at tokens. this means that the data you seek likely won't come in the form you expect.
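to make the type/token point concrete, here's a minimal python sketch. the verb list is a made-up toy sample (not corpus data) and the irregular set is hand-labelled purely for illustration, so the numbers only show the shape of the effect, not its real size:

```python
from collections import Counter

# toy, invented sample of past-tense verb tokens (an assumption, not real data):
# a few irregular forms said very often, several regular forms said once each
tokens = ("was was was was had had had said said "
          "walked googled tweeted emailed faxed").split()

# hand-labelled irregular forms in this toy sample (also an assumption)
irregular = {"was", "had", "said"}

counts = Counter(tokens)  # maps each type (distinct form) to its token count

type_total = len(counts)            # number of distinct forms
token_total = sum(counts.values())  # number of running words

irregular_types = sum(1 for form in counts if form in irregular)
irregular_tokens = sum(n for form, n in counts.items() if form in irregular)

type_share = irregular_types / type_total
token_share = irregular_tokens / token_total

print(f"irregular share of types:  {irregular_types}/{type_total} = {type_share:.2f}")
print(f"irregular share of tokens: {irregular_tokens}/{token_total} = {token_share:.2f}")
```

in this toy sample the irregulars are a minority of the types (3 of 8) but a majority of the tokens (9 of 14), which is why counting dictionary entries and counting what people actually say can give opposite answers.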
but here's a fun one. if we were to dive into another example of irregularization, dive is a word that has enjoyed a frequency boost in the past 100 years or so:
so if we were to drag our knuckles away from the census bureau and look at a speech corpus instead, what we would find is that americans overwhelmingly say 'dove' and write 'dived'.
fb.