The Mathematics of Linguistics

Hey, y’all! It’s certainly been a while. I’ve been busy the entire semester, but with the lull between the Fall & Spring terms, I figured I would try to write about something interesting I had been thinking about for some time.

Those who know me know that I am absolutely crazy about natural language applications: pieces of technology that use your speaking voice to control some aspect of a device. They have a wide range of uses, which I’ll leave to a different post with examples of such technology. This post is about a topic one step removed from that: the underlying ideas that make these applications possible, which come from computational linguistics, a unique & really cool subfield within the scope of AI.

Development of Linguistics as a Scientific Field

The idea of linguistics being a subject of mathematical research comes as a surprise to a lot of people I talk to who do not do much computer-science-type work; they have no idea that a number of mathematical models are used directly in the field of linguistics, including n-grams & ideas of entropy.


Basic word cloud of the Wikipedia page on linguistics, generated using Mathematica.

More indirectly, the foundational ideas behind both of these applications of probability theory come, perhaps unsurprisingly, from the logic of set theory developed by Cantor & reconciled by Russell, as well as from various mathematicians of the Golden Age of Logic, specifically Gödel & his Incompleteness Theorems. These ideas concerned the linguistic patterns of mathematical axioms, & they set the stage for analyzing everyday languages as communicative systems as they exist at a given point in time, regardless of their history, with ‘axioms’ replaced by ‘grammars’.

A Basic Idea of Set Theory

Set theory provided much-needed versatility for identifying patterns at higher levels of abstraction. The first complete theory was developed by Georg Cantor, who relied on the algebraic notation Boole had used when working with syllogisms. The theory begins by establishing an arithmetic for sets:

If A and B are sets, let AB denote the set consisting of all members common to A and B, and let A + B denote the set consisting of all members of A together with all members of B.

This is a generalization of Boole’s work, the only difference being that here A & B can be any sets, not just those arising from propositional logic. To avoid confusion, a modern notation was adopted under which AB becomes A ∩ B (read as ‘A intersection B’) & A + B becomes A ∪ B (read as ‘A union B’). The arithmetic developed by Boole still holds in this generalization, except for the axioms involving the object 1, which in Boole’s system stood for the entire universe of discourse; set theory has no need for such an object. Whatever their members, all sets obey this same arithmetic, & the property that defines a set can, but need not, exclude other objects from belonging to it.


Sets are intuitively represented as Venn diagrams. The left is the union of both sets, whereas the right is the intersection of the given sets.
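Python’s built-in sets happen to use the same arithmetic, so a short sketch can make it concrete. The three sets below are made-up examples; the point is only that ∩ (written `&` in Python) & ∪ (written `|`) obey some of the same laws Boole wrote down, such as commutativity, distributivity & x·x = x.

```python
# Three small example sets (made-up data, for illustration only).
A = {1, 2, 3, 4}
B = {3, 4, 5}
C = {4, 5, 6}

intersection = A & B  # A ∩ B, Boole's "AB"
union = A | B         # A ∪ B, Boole's "A + B"

# A few of Boole's algebraic laws carry over unchanged to sets:
assert A & B == B & A                    # ∩ is commutative
assert A | B == B | A                    # ∪ is commutative
assert A & (B | C) == (A & B) | (A & C)  # ∩ distributes over ∪
assert A & A == A                        # idempotence, Boole's x·x = x

print(sorted(intersection), sorted(union))  # [3, 4] [1, 2, 3, 4, 5]
```

Nothing in these laws depends on what the elements are, which is exactly the generality Cantor’s version adds over Boole’s.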

Small sets are typically listed explicitly, while larger or infinite sets use ellipses, e.g.:

  • {1, 2, 3}, a set consisting of the elements 1, 2, & 3
  • {1, 2, 3, …}, the infinite set of all natural numbers, also represented as ℕ

The introduction of sets allows us to examine patterns at varying levels of abstraction: the natural numbers are contained in the set of integers, which is contained in the set of rational numbers, which is contained in the set of real numbers, which is contained in the set of complex numbers. A set contained in another set is called a subset of it. In notation, this is

ℕ ⊂ ℤ ⊂ ℚ ⊂ ℝ ⊂ ℂ

Mathematicians have used this idea to answer questions regarding the nature of numbers; to answer a question like “what is a number?” we have to look at how members of certain sets of numbers can be described using the sets that precede them. That is, how numbers in a less abstract set can be combined in some fashion to represent elements of a more abstract set (for example, a rational number can be represented as a pair of integers). At the lowest level, the set of natural numbers can be described in terms of axioms; in fact, you can construct the natural numbers using solely the empty set (the set with no elements), denoted ∅.
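One standard way to carry out that construction is John von Neumann’s (my choice of illustration; the post does not commit to a particular one): 0 is ∅, & the successor of each number n is n ∪ {n}. A minimal Python sketch, using frozensets so that sets can contain other sets:

```python
def von_neumann(n):
    """Build the natural number n as a pure set, starting from the empty set.

    0 is the empty set, and the successor of k is k ∪ {k}.
    """
    k = frozenset()     # 0 = ∅
    for _ in range(n):
        k = k | {k}     # successor: k ∪ {k} (stays a frozenset)
    return k


three = von_neumann(3)  # {∅, {∅}, {∅, {∅}}}
print(len(three))       # 3
```

With this encoding the number n is a set with exactly n elements, & m < n corresponds to m ∈ n, so the ordering of the naturals comes for free from set membership.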

Inconsistency & Reconciliation with Axiomatic Set Theory

For all the usefulness of set theory, there was a major flaw in the framework. Before I discuss what the flaw was, it is worth discussing why having one is such a big deal.

Of all the things that can be wrong with a particular axiom system (& you would hope there are none), inconsistency is definitely the worst. It is possible to work with axioms that are hard to understand; we see this every day: the Peano axioms that characterize the natural numbers are not widely known, yet we use natural numbers constantly. It is also possible to work with axioms that are counterintuitive, which we, again, see every day (shoutout to statisticians). It is even possible to work with axioms that do not accurately describe the system you intend to represent, since those axioms could be useful elsewhere in mathematics. But inconsistent axioms? No way, José. From a contradiction, every statement can be derived, so an inconsistent system proves everything, including each statement and its negation, & can no longer tell true statements from false ones.

The particular inconsistency, found by Bertrand Russell, concerned properties. Recall that elements satisfying a particular property can be grouped together into a set. But what about properties of sets themselves, such as the property of being a member of itself? Most sets are not members of themselves (the set of all cats is not a cat), so consider the set of all such sets:

R = {x : x ∉ x}

Is R a member of itself? If R ∈ R, then R must satisfy the defining property, so R ∉ R. If R ∉ R, then R satisfies the defining property, so R ∈ R. Either answer leads to its own opposite, which is a contradiction in the very notion of a set.
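Because Russell’s argument boils down to a single yes-or-no question about the set R of all sets that are not members of themselves, it can be checked by brute force. The sketch below only models the logical shape of the argument (it does not build any sets): it tries both answers to “is R ∈ R?” & keeps the ones consistent with R’s defining property.

```python
def consistent_answers():
    """Try both answers to "Is R a member of itself?" for R = {x : x ∉ x}.

    R's defining property, applied to R itself, says R ∈ R exactly when
    R ∉ R, so an answer is consistent only if it equals its own negation.
    """
    answers = []
    for r_in_r in (True, False):
        if r_in_r == (not r_in_r):  # never true for a boolean
            answers.append(r_in_r)
    return answers


print(consistent_answers())  # []
```

The empty result is the paradox: whichever answer you pick, the definition of R forces the opposite.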

The solution? The development of axiomatic set theory, which replaced the unrestricted formation of sets with a carefully chosen list of axioms; in the best-known version, Zermelo–Fraenkel set theory, a new set defined by a property can only be carved out of a set that already exists. It is not as simple & succinct as Cantor’s original theory, so mathematicians abandoned the original only with reluctance, but it goes to show that even the most intuitive ideas need to be critically examined for flaws.

Gödel’s Incompleteness Theorem

A similar type of foundational flaw was found in the axiomatic approach to mathematics itself. The reason we use axioms as building blocks for particular mathematical structures is that they let us separate two ideas: being able to prove something & something being objectively true. If a proposition is provable, then a sequence of logical steps derives it from the axioms, so it is guaranteed to be true as long as the axioms are. Separating these ideas allowed mathematicians to avoid the major philosophical implications that “objective truth” entails.

Now, since choosing which axioms to use is crucial to formalized mathematics, the hope was that you would, at some point, find enough axioms to prove or disprove every statement the system can express. This is the idea of completeness of an axiomatic system. But, as Russell’s paradox showed, finding the right axioms can be a really difficult undertaking. How do we know if we have found them all? Can we even have a complete set of axioms?

In comes Gödel, who proved that for arithmetic the answer is no: there are questions that cannot be answered on the basis of the axioms. The big idea here is that consistency precludes completeness: if your axioms can be listed by a mechanical procedure, are consistent, & are strong enough to express basic arithmetic (addition & multiplication of natural numbers), then there exist statements in their language that they can neither prove nor disprove, & some of these statements are true.


Kurt Gödel (1906-1978)

The proof of Gödel’s Theorem is better left to the many books dedicated to it & to its uses & abuses in mathematical logic. One I would recommend, if you are curious, is Gödel’s Theorem: An Incomplete Guide to Its Use and Abuse by Torkel Franzén.

Grammatical Approach to Linguistics

From what we have seen, then, it is not so far-fetched to study everyday language with the same axiomatic approach we applied to sets. Another question that arises is why: why would we want to analyze language with this paradigm? Let us look at an example:

  1. People love going to the movie theater.
  2. Cats are interested in mayonnaise and like resting in catacombs.
  3. Lamp soft because mom to runs.

It should come as no surprise that the third sentence is not proper English, while the first & second are. Yet although both are grammatically correct, the second is nonsensical. How can that be? Why can a nonsensical sentence still count as a genuine English sentence? It is because of the way the words are stitched together: the structure of the sentence.

Sentence structures, much like sets, are abstract constructs that contain elements adhering to specific properties. The elements are things you can explicitly point to, in this case particular words. But because sentence structures sit a level of abstraction above words, you cannot point at them directly when coming up with rules. The only thing you can do is observe the repetitive behavior that sentences exhibit & then find a way to generalize it. These patterns constitute rules much like the axioms we’ve encountered before, & the “arithmetic” for combining them is the basis of a grammar for a language.

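Inspired by the advances in logic during the Golden Age, this new, mathematically based linguistics attempted to reduce all meaningful statements to a combination of propositional logic & information conveyed by the five human senses. Going a step further, Noam Chomsky attempted to do for language what could not be done for set theory: to find the complete set of axioms, in the form of rules, describing the syntactic structure of a language. Each rule below says that two adjacent constituents combine into a larger one; for example, the first reads “a definite noun phrase followed by a verb phrase forms a sentence.”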

  • DNP VP → S
  • V DNP → VP
  • P DNP → PP


  • DNP = definite noun phrase
  • VP = verb phrase
  • S = sentence
  • V = verb
  • P = preposition
  • PP = prepositional phrase

These are just a few of the pieces of the formalism developed by Chomsky. With a grammar like this (typically visualized as a parse tree), it is straightforward to analyze English sentences, since each word in the lexicon is assigned a grammatical category. This formal grammar, built on axiomatic principles, captures some of the structure of the English language.


Parse tree of a simple sentence. The sentence being “the rat ate cheese”, with the sentence broken down into its constituents.
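To see rules like these doing real work, here is a small bottom-up chart parser (a CYK-style algorithm, which is my choice of technique, not part of Chomsky’s formalism itself) that combines adjacent constituents wherever a rule applies. The first three rules come from the post; the rule DA N → DNP (DA = definite article) & the four-word lexicon are hypothetical additions, & “cheese” is tagged directly as a noun phrase, a simplification so the caption’s sentence can be parsed.

```python
from itertools import product

# Bottom-up combination rules in the post's style: (left, right) -> result.
RULES = {
    ("DNP", "VP"): "S",
    ("V", "DNP"): "VP",
    ("P", "DNP"): "PP",
    ("DA", "N"): "DNP",  # hypothetical extra rule: definite article + noun -> DNP
}

# Hypothetical toy lexicon; "cheese" is treated as a complete noun phrase.
LEXICON = {"the": "DA", "rat": "N", "ate": "V", "cheese": "DNP"}


def parse(words):
    """Return a parse tree (nested tuples) rooted at S, or None if there is none."""
    n = len(words)
    # chart[i][j] maps a category to one tree covering words[i:j]
    chart = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        category = LEXICON.get(word)
        if category is None:
            return None  # unknown word: outside this toy lexicon
        chart[i][i + 1][category] = (category, word)
    # Combine adjacent spans, shortest first, wherever a rule applies.
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):
                for (lc, lt), (rc, rt) in product(chart[i][k].items(), chart[k][j].items()):
                    parent = RULES.get((lc, rc))
                    if parent is not None and parent not in chart[i][j]:
                        chart[i][j][parent] = (parent, lt, rt)
    return chart[0][n].get("S")


print(parse("the rat ate cheese".split()))
# ('S', ('DNP', ('DA', 'the'), ('N', 'rat')), ('VP', ('V', 'ate'), ('DNP', 'cheese')))
```

This reproduces the structure in the caption: “the rat” forms a DNP, “ate cheese” a VP, & the two combine into S. A scrambled sentence such as “rat the ate” comes back as None because no rule combines its words; “Lamp soft because mom to runs” also returns None, though only because none of its words are in the toy lexicon.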

The Advent of Natural Language Processing

Parse trees are well developed in computer science & used extensively in natural language processing (NLP). Chomsky’s algebraic work on grammar has allowed computer scientists to break down human speech patterns in a way that machines can analyze & ultimately interpret, because NLP systems can take into account the hierarchical structure of language.


Amazon’s family of natural language devices. From left to right: the Tap, the Echo, the Dot. Each device uses the Alexa Voice Service, Amazon’s natural language processing service.

However, there are still major issues to be resolved within the field. One of these is the ambiguity that arises in spoken language, English in particular. Consider the following basic sentence:

  • I didn’t take her purse.

Depending on where the emphasis is placed, this sentence conveys different messages, & thus different pieces of information: “I didn’t take HER purse” suggests I took someone else’s, while “I didn’t take her PURSE” suggests I took something else of hers. Machines do not handle emphasis easily, partly because emphasis is not an aspect of language captured by Chomsky’s mathematical treatment of sentences; most processors do not know how to handle these kinds of ambiguities unless the ambiguity is somehow removed. Crudely, this is done by hard-coding the particular linguistic patterns you want your machine to interpret. However, there have been strides toward probabilistic methods of interpreting speech, largely thanks to the introduction of machine learning algorithms, on which most NLP techniques are now based.
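Here is what the crude, hard-coded approach looks like in miniature. The table of readings & the `interpret` function are hypothetical illustrations, not taken from any real speech system, & they assume the stressed word has already been detected (itself a hard problem).

```python
# Hypothetical hand-written table for one sentence: the reading each
# stressed word suggests. Nothing here is learned from data.
SENTENCE = "I didn't take her purse."
EMPHASIS_READINGS = {
    "I": "Someone else took her purse.",
    "didn't": "I deny taking her purse.",
    "take": "I did something else with her purse, such as borrow it.",
    "her": "I took someone else's purse.",
    "purse": "I took something of hers, but not her purse.",
}


def interpret(sentence, stressed_word):
    """Return the hard-coded reading, or None when no pattern matches."""
    if sentence != SENTENCE:
        return None  # any other sentence falls outside the table
    return EMPHASIS_READINGS.get(stressed_word)


print(interpret(SENTENCE, "her"))  # I took someone else's purse.
```

The brittleness is the point: change a single word of the sentence & the table has nothing to say, which is what motivates the probabilistic methods mentioned above.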

Machine Learning in a Nutshell

Machine learning is a discipline in its own right, which I’ve touched upon in other posts (but have yet to give a real in-depth treatment). It is characterized by examining large data sets, dubbed training sets, & making statistical inferences based on these data. In general, the more representative data you analyze, the more accurate your model tends to be. A popular application of machine learning is social media analysis.
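To make “statistical inferences based on training data” concrete, & to tie back to the n-grams & entropy mentioned at the start, here is a minimal bigram model: it counts which word follows which in a training set & turns those counts into probabilities. The three-sentence training set is made up, & a bigram model is just one classic, very simple statistical language model, not a description of how any particular product works.

```python
import math
from collections import Counter, defaultdict

# A tiny, made-up training set; real systems train on vastly more text.
training_set = [
    "the rat ate cheese",
    "the rat ate the cheese",
    "the cat ate the rat",
]

# Count how often each word follows each other word (bigram counts),
# with "<s>" marking the start of a sentence.
bigrams = defaultdict(Counter)
for sentence in training_set:
    words = ["<s>"] + sentence.split()
    for prev, nxt in zip(words, words[1:]):
        bigrams[prev][nxt] += 1


def next_word_probs(prev):
    """Maximum-likelihood estimate of P(next word | previous word)."""
    counts = bigrams[prev]
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def entropy(probs):
    """Shannon entropy in bits: how uncertain the next-word prediction is."""
    return -sum(p * math.log2(p) for p in probs.values())


print(next_word_probs("ate"), entropy(next_word_probs("ate")))
```

After “ate”, the model predicts “the” with probability 2/3 & “cheese” with 1/3, an entropy of about 0.92 bits; after the start of a sentence it always predicts “the”, so the entropy there is 0. Adding more text to the training set refines these estimates.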

There are entire texts dedicated to the subject, one of my favorites being this one, which combines definitions with visualizations.

Linguistics as Portrayed in Popular Culture

Perhaps the most recent portrayal of modern linguistics can be seen in the sci-fi blockbuster Arrival, which is about a linguistics professor attempting to converse with aliens. The movie is a nod to researchers at SETI, as well as to Hans Freudenthal, who developed Lincos: an attempt at creating a language based on the mathematics we would presumably have in common with an alien species. His work, while pivotal, had some serious pitfalls, which you can read about in the attached article.


Dr. Louise Banks attempting to communicate basic words to alien visitors in the latest sci-fi film Arrival.

During the film, the audience was able to see some of the computational methods modern linguists use to analyze speech, & Stephen Wolfram has written about this technology extensively (in fact, it is his technology they’re using in the film).

As a whole, the film helped to showcase linguistics as a highly technical & scientific field of study, as opposed to a degree that people get because they “speak like five languages or something” (actual quote by someone I know, who will remain nameless).

Not the Beginning, but Not the End Either

While linguistics has been around for centuries, only in the mid-20th century did it take a turn from the historical to the mathematical; the major advances made by Gödel, building on the work of Boole, Cantor & Russell (as well as Aristotle, technically), allowed the study of patterns to be applied to fields that were previously outside the scope of mathematics. The advent of computers raised new questions about language & artificial intelligence, which are now bustling fields of research. Yet for all our progress, we have yet to answer fundamental questions about what makes language the way it is. These & other questions suggest we have a ways to go, but it seems we have a good foundation to base our future findings on. Still, if anything is certain about mathematics, it is that intuition does not always reign supreme; language is constantly evolving, & it is good practice to examine foundations for flaws, or else we might end up at the same philosophical standstill that many mathematicians have fallen victim to before.

