Symmetries: Why Number Theory is Kind of Important

During my time as a physics student, I have often heard others in the department lament the fact that they have to take certain proof based upper division math courses for an elective credit. Among those, number theory is a popular course that physics students can take to fulfill this.

However, many don’t find it particularly enjoyable; a lot of physics students are concerned with the utility of a mathematical idea, which makes sense given the discipline, & are convinced number theory has no use in application. I think this shortsightedness is unfortunate, but easily correctable: just shed some light on a common application of ideas from number theory that are prominently used in physics! That application? Group theory, which will be the content of this post.

What is a Group?

Group theory is a fancy way of describing the analysis of symmetries. A symmetry, in the mathematical sense, refers to a system which remains invariant under certain transformations. These could be geometric objects, functions, sets, or any other type of system – physical or otherwise.

Physicists typically don’t need to be told how group theory is useful, they already know this to be the case. But a lot of undergrads are not able to properly study foundational aspects of groups because the degree plan doesn’t emphasize it, even though it underpins some of the most important physical laws – such as conservation principles – that we do, in fact, learn.

The easiest way to think of groups is to think of a set with an operation applied. For this set to indeed be a group, it must fulfill certain properties:

  • It must have a binary operation
  • It must be associative
  • It must contain an identity element
  • It must contain an inverse

Binary Operation

A binary operation, on a set A is a function which sends A × A (the set of all ordered pairs of A) back to A. In less formal terms, it takes in two of the elements from A & returns a single element which also exists in A. Another way to look at this, is that the operation in question has the property of closure.

A good example to think of, is addition of the set of integers: is addition a binary operation?


Some test cases for addition of integers

It would certainly seem to be the case! It is, & we intuitively know this to be true (which is why I am going to skip the proof).


Say we have a binary operation on the set A. This operation is said to associative if, for all a, b, cA

(a b) ∗ c  = a ∗ (bc)

If we use our previous example about addition on the set of integers, we can easily see that addition on ℤ is associative. But what about subtraction?


Using Mathematica, we can gauge whether or not two statements are equivalent using the triple ‘=’ symbol; explicitly, the left-hand side is equal to a-b-c, while the right-hand side is equal to a-b+c

So we can confidently say that subtraction on the set of integers is not associative. So while it is a binary operation, <ℤ, -> still fails to be a group.

Identity Element

Suppose again that ∗ is a binary operation on A. An element e of A is said to be an identity for ∗, for all aA if

a ∗ e = e ∗ a = a

Using our handy example of the set of integers, ℤ, we know that <ℤ, +> does have an identity element. Similarly, <ℤ, ×> also has an identity element:



Suppose, once more, that ∗ is a binary operation on A, with identity element e. Let aA. An element b of A is said to be an inverse of a wrt ∗ if

a ∗ b = b ∗ a = e

The set of integers does in fact have inverses under addition as well. In fact, we are very familiar with these inverses:


I will let you, the attentive reader, decide for yourself if you think that <ℤ, ×> also contains inverses. What about <ℤ, ->?


It might not seem like it, but subgroups are an important aspect of group theory. But why look at subgroups when you could just look at groups? After all, the term itself implies that it is not even a whole group…surely there must be more information contained in a regular ol’ group? That was a question I had asked myself, too; I just didn’t get the point.

Now though, I realize that you can learn a lot about the structure of a group by analyzing their subgroups. More generally, if you want to understand any class of mathematical structures, it helps to understand how objects in that class relate to one another. With groups, this begs the question: Can I build new groups from old ones? Subgroups help us answer this question.

Suppose G is a group. A subset H of G is called a subgroup of G if

  • H is non-empty
  • H is closed under the operation of G
  • H is closed under taking inverses


All this means is that there must be at least one element in H; it cannot be the empty set, ∅.


If H is a subgroup of G, we sometimes say that H inherits the operation of G.

Let’s look at this idea with some things we are familiar with. Particularly, let us use our handy dandy knowledge about the set of integers ℤ under addition. We know that a subset of ℤ could be S = {-1, 0, 1}. If we add any of the elements in S together, we will always end up with an element that is also in S.

That is pretty easy to see, so let’s look at a counter-example. Let us have a second subset of ℤ, T = {-3, -2, -1, 0, 1, 2, 3}. At first glance, it would seem like T is closed under addition on ℤ. However, if we add 2 and 3 together, that would result in 5. And we can see that 5 ∉ T.

Therefore, T does not inherit the operation of ℤ.

Inverses…Part Two

If H is a subset of G, then any element within H must have its respective inverse also be in H. We talked about what inverses were a bit earlier, so I am not going to re-type it.

You might be reading this and be thinking, “But wait! Shouldn’t a subgroup also contain the identity element as well? Silly Jesse…you buffoon…” Have no fear! For the existence of an identity element in H actually follows from the existence of inverses. I won’t prove it, but please trust me…

Special Groups and Their Properties

There are certain groups in which interesting patterns crop up. This makes them stand out amongst other groups, thus they demand special attention be paid to them. One such group is called a cyclic group.

Let G be a group with aG. The subgroup <a> is called the cyclic subgroup generated by a. If this subgroup contains all the elements of G, then G is also cyclic. You can inherently see the usefulness of subgroups in full effect here: the ability to understand more about a group can come from the existence of certain subgroups.

But what exactly does it mean for a group to be generated by an element a? In short, it means that all the elements of a group can be represented as a multiplier of a single element of that group.

More formally, ∃ m ∈ ℤ such that G = {a}, or in additive notation, G = {ma} .

Let’s take a look at this idea in the context of an example: Take the group <ℤ8, +> that is, {0, 1, 2, 3, 4, 5, 6, 7} under the operation of addition.

Screen Shot 2017-03-06 at 3.54.19 PM

Additive table of integers modulo (order) 8 & associated generators of the group. Elements {1, 3, 5, 7} can be multiplied by some m (mod 8) that result in the creation of every other element in the group.

Think about it: if we took m × 2, we would only end up with multiples of 2 – {0, 2, 4, 6} – which does not account for every element in ℤ8. Moreover, the same can be said about elements 4 and 6 – they do not generate every other element in the group.

Why are the generators the ones that they are? What is so special about the relationship between the generating elements of a group & the order of the group? Well, much to the dismay of physics students everywhere, the result is pretty interesting…

Relation to Number Theory

While there are multiple connections between algebraic structures – such as groups – to number theory, perhaps the most useful one comes from ideas related to the division of numbers. In particular, the notion of relative primality is especially useful for understanding more about behavior of cyclic groups.

Recall that m divides n (n | m) if there exists a k such that n = mk. Also recall, that an integer p is prime if p has exactly two divisors: p & 1.

Now, if both n & m are not zero, then ∃ d ∈ ℤ+ for which d is the greatest common divisor of n & m:  gcd(n, m) = d.

When d is equal to 1, we say that n & m are relatively prime. Meaning, there are no factors shared between n & m, which can divide the both of them, other than 1.

Going back to our previous example of cyclic groups, using <ℤ8, +>, what do we notice about the generators?

Screen Shot 2017-03-06 at 4.32.55 PM

The gcd between the generators of integers modulo 8 & 8 is 1.

More generally, the gcd between the generators of any group, & the order of the group will always be 1. If you did not know what the generators of a cyclic group were, you could find them using this concept.

There is more to be said about the relation between number theory & group theory – such as the use of the division algorithm to prove existence of particular elements in cyclic subgroups – but I feel like I’ve already made a compelling enough case for the utility of number theory.

Like always, thanks for reading!


The Mathematics of Linguistics

Hey, y’all! It’s certainly been awhile. I’ve been busy the entire semester but with the lull in between Fall & Spring terms, I figured I would try to write something interesting I had been thinking about for some time.

Those that know me, know that I am absolutely crazy about natural language applications — pieces of technology that utilize your speaking voice to control some aspect of the device. It has a wide range of utility, that I’ll leave to a different post which will discuss examples of these pieces of technology. This post is about a topic a bit abstracted from that: the underlying ideas that allow you to make use of these applications come from computational linguistics, a unique & really cool subfield within the scope of AI.

Development of Linguistics as a Scientific Field

The idea of linguistics being a subject of mathematical research comes as a surprise to a lot of people I talk to that do not do much computer science type work; they have no idea that there are a number of a mathematical models used directly in the field of linguistics, including n-grams & ideas of entropy.


Basic wordcloud of Wikipedia page on linguistics. Generated using Mathematica.

More indirectly, foundational ideas from both of these direct applications of probability theory come, perhaps unsurprisingly, from the logic developed by Cantor & reconciled by Russell in set theory, as well as from various mathematicians during the Golden Age of Logic, specifically Gödel & his Incompleteness Theorems; These ideas concerned linguistic patterns of mathematical axioms, & set the stage for analyses of everyday languages as communicative systems of which they exist at any given point in time, regardless of their history, with ‘axioms’ being replaced with ‘grammars’.

A Basic Idea of Set Theory

Set theory provided a needed versatility when identifying patterns at higher levels of abstraction, with the first whole theory being developed by Georg Cantor, who relied on Boole’s algebraic notations when working with syllogisms. The theory begins by establishing an arithmetic for sets:

If A and B are sets, let AB denote the set consisting of all members common to A and B, and let A + B denote the set consisting of all members of A together with all members of B.

This is just a formalization of Boole’s work, the only difference being that here, A & B refer to any set, not just those arising as a consequence of propositional logic. To avoid confusion, a contemporary notation was developed to where AB becomes A ∩ B (read as ‘A intersection B’), & A + B becomes A ∪ B (read as ‘A union B’). The arithmetic developed by Boole still holds in this generalization, except for the axioms involving the object 1, because there is no need for these kinds of objects in set theory. Sets of objects are subjected to the same type of properties – the same type of arithmetic – that can, but not necessarily, preclude other objects from being categorized in the same set.


Sets are intuitively represented as Venn diagrams. The left is the union of both sets, whereas the right is the intersection of the given sets.

Small sets of objects are typically listed explicitly, while larger or infinite sets use ellipses e.g.:

  • {1, 2, 3}, a set consisting of the elements 1, 2, & 3
  • {1, 2, 3, …}, the infinite set of all natural numbers, also represented as ℕ

The introduction of sets allows us to examine patterns in varying levels of abstraction: the natural numbers are within the set of integers, which are within the set of rational numbers, which are within the set of real numbers, which are within the set of complex numbers. We typically refer to sets within sets as subsets. In notation, this is

ℕ ⊂ ℤ ⊂ ℚ ⊂ ℝ ⊂ ℂ

Mathematicians have used this idea to answer questions regarding the nature of numbers; to answer a question like, “what is a number?” we have to look out how members within certain sets of numbers can be described by the sets that precede it. That is, how properties of numbers within a lower abstracted set can be combined in some fashion as to represent elements within higher abstracted set. At the lowest level – the set of natural numbers – can be described in terms of axioms (namely, you can construct the notion of the set of natural numbers using solely the empty set – the set with no elements – denoted as ∅).

Inconsistency & Reconciliation with Axiomatic Set Theory

For all the usefulness of set theory, there was a major flaw in the framework. Before I discuss what the flaw was, it is more apt to discuss why having it is such a major deal.

Of all things that can be wrong with a particular axiom system – & you would hope there is none – inconsistency is definitely the worst. It is possible to work with axioms that are hard to understand, we see this every day: the Peano axioms that make up the set of natural numbers are not widely known by most people, yet we use natural numbers constantly. It is also possible to work with axioms that are counterintuitive which we, again, see every day (shoutout to statisticians). It is even possible to work with axioms that do not accurately describe the system you are intending to represent, since these axioms could be useful elsewhere in mathematics. But inconsistent axioms? No way, José.

The particular inconsistency, found by Bertrand Russell, was one relating to properties. Recall that I said that elements satisfying particular properties can be grouped together into sets. Well what happens when you have properties that are not properties of the elements within sets but properties of sets themselves? Is the set a member of itself? Or, what happens, if certain sets do not fulfill this property, are they not members of themselves? That is to say:

If set R satisfies property P, but set W does not, can R ∈ R if W ∉ W? This is a contradiction in the definition of a set itself. Russell looked at this question in a more nuanced way, that I will not go into detail here, but arrived at the same impasse.

The solution? The development of axiomatic set theory, which added new axioms to the ones that already existed from before. It is not as simple & succinct as Cantors original theory, & so it was only with reluctance that it was abandoned, but it just goes to show that even the most intuitive ideas need to be critically examined for flaws.

Gödel’s Incompleteness Theorem

A similar type of foundational flaw was found in the axiomatic approach to mathematics itself. The reason we use axioms as building blocks for particular mathematical structures is because it makes it possible for us to separate the ideas of being able to prove something & something being objectively true. If you had a proposition that was provable, then you can use a sequence of arguments to deduce whether or not the proposition was true, as long as your axioms were assumed true. These ideas being separated allowed mathematicians to avoid dealing with the major philosophical implications that “objective truth” entails.

Now, since figuring out which axioms to use is crucial to formalized mathematics, this implies – within the notion of formalization – that you will, at some point, find all the axioms you need to be able to prove something. This is the idea of completeness of an axiomatic system. But, as noted with Russell, finding all the axioms needed can be a really difficult undertaking. How do we know if we have found them all? How can we have a complete set of axioms?

In comes Gödel, who asserts that an axiom system must be incomplete, because there are questions that cannot be answered on the basis of the axioms. The big idea here is that consistency precludes completeness: if your axioms are consistent – that is they build a basic arithmetic by which you can perform known operations like addition, multiplication, etc. – then there exists statements which are true but not provable.


Kurt Gödel (1906-1978)

The proof of Gödel’s Theorem is better left to the entire books that are dedicated to its uses & abuses as it pertains to mathematical logic. A book I would recommend, if you are curious, is Gödel’s Theorem: An Incomplete Guide to Its Use and Abuse by Torkel Franzén.

Grammatical Approach to Linguistics

From what we have seen previously, it is not so farfetched then to examine the study of everyday language with the same axiomatic basis that we have seen before with the notion of sets. I suppose another question that arises in this examination is why: why would we want to analyze language with this paradigm? Let us look at an example:

  1. People love going to the movie theater.
  2. Cats are interested in mayonnaise and like resting in catacombs.
  3. Lamp soft because mom to runs.

It should come as no surprise that the third sentence is not proper English, but the first & second sentences are. The first & second sentences are correct, yet one is nonsensical. How can that be the case? Why can a nonsensical sentence still be classified as a genuine English sentence? It is because of the way in which the words are stitched together – the structure of the sentence.

Sentence structure – much like sets – are abstract constructs that contain elements adhering to specific properties. These elements are things you can explicitly point to, in this case, particular words. Because sentences are a level of abstraction above words, they cannot be explicitly pointed at when coming up with rules. The only thing you can do is observe the repetitive behavior expressed by sentences & then come up with a way to generalize that behavior. These patterns constitute rules that are much like the axioms we’ve encountered before; the type of “arithmetic” that would be applied to this set of axioms is the basis of a grammar for a language.

Inspired by the advancements of logic in the Golden Age, this new mathematically based linguistics attempted to reduce all meaningful statements to a combination of propositional logic & information conveyed by the five human senses. A step further, Noam Chomsky attempted to do what could not be done with set theory: to design a process of finding all axioms that described the syntactic structure of language.

  • DNP VP → S
  • V DNP → VP
  • P DNP → PP


  • DNP = definite noun phrase
  • VP = verb phrase
  • S = sentence
  • V = verb
  • P = preposition
  • PP = prepositional phrase

These are just a few of pieces of the formalism developed by Chomsky. Using the above grammar – & typically a parse tree – it is very easy to apply to the English lexicon, as each word corresponds to one of the given types. This formal grammar, which is built on axiomatic principles, captures some of the structure of the English language.


Parse tree of a simple sentence. The sentence being “the rat ate cheese”, with the sentence broken down into its constituents.

The Advent of Natural Language Processing

It should be noted that the use of parse trees is highly developed in computer science, & used extensively in natural language processing (NLP). Chomsky’s algebraic work in grammar has allowed computer scientists to break down human speech patterns in a way that machines can analyze, understand & ultimately interpret; NLP considers the hierarchal structure of language.


Amazon’s family of natural language devices. From left to right: the Tap, the Echo, the Dot. Each device utilizes Alexa Voice Services, Amazon’s unique natural language processor.

However, there are still major issues to be resolved within the field. One of these is the ambiguity that arises out of spoken language — English in particular. Consider the following basic sentence:

  • I didn’t take her purse.

Depending on where the emphasis is placed, this sentence conveys different messages – & thus it conveys different pieces of information. Emphasis is not easily reconciled within machines, & this is because that was not an aspect of language captured within Chomsky’s mathematical treatment of sentences; most processors do not know how to handle these kinds of ambiguities unless the ambiguity somehow disappears. Crudely, this is done thru the use of hard-coding particular linguistic patterns you want your machine to interpret. However, there have been strides to develop probabilistic methods of interpreting speech. This is largely thanks to the introduction of machine learning algorithms, for which most NLP techniques are now based.

Machine Learning in a Nutshell

Machine learning is a discipline in its own right, which I’ve touched upon in other posts (but have yet to provide a real in-depth treatment of). It is characterized by the examination of large data sets – dubbed training sets – & making statistical inferences based on these data. The more data you analyze, the more accurate your model will be. A popular application of machine learning is social media analysis.

There are entire texts dedicated to the subject, one of my favorites being this one, which combines definitions with visualizations.

Linguistics as Portrayed in Popular Culture

Perhaps the most recent portrayal of modern linguistics can be seen in the recent sci-fi blockbuster Arrivalwhich is about a linguistics professor attempting to converse with aliens. The movie is a nod to researchers at SETI, as well as Freudenthal, who developed Lincos — an attempt at the creation of a language based on perceived commonalities in mathematics that we would have with an alien species. His work, while pivotal, had some serious pitfalls, which you can read about in the attached article.


Dr. Louise Banks attempting to communicate basic words to alien visitors in the latest sci-fi film Arrival.

During the film, the audience was able to see some of the computational methods that modern linguists use in analyzing speech, & this technology is written about extensively by Stephen Wolfram (in fact, it is his technology they’re using in the film).

As a whole, the film helped to showcase linguistics as a highly technical & scientific field of study, as opposed to a degree that people get because they “speak like five languages or something” (actual quote by someone I know, who will remain nameless).

Not the Beginning, but Not the End Either

While linguistics has been around for centuries, it is only during the mid-20th century that it took a turn from the historical to the mathematical; the major advancements made by Gödel, set forth by Cantor, Russell & Boole (as well as Aristotle, technically) allowed the study of patterns to be applied to other fields that were before, outside the scope of mathematics. The advent of computers ushered in new questions about language & artificial intelligence, & are now bustling fields of research. Yet for all our progress, we have yet to understand fundamental questions about what makes language the way it is. These – & other questions – suggest we have a ways to go, but it would seem as if we have a good foundation to base our future findings off of. Though, if anything is certain about mathematics, it is that intuition does not always reign supreme; language is constantly evolving, & it is good practice to examine foundations for flaws or else we might just end up at the same philosophical standstill that many mathematicians have fallen victim to before.

PostFix Notation and Pure Functions

Hey, y’all! Hope the summer is treating you well! It’s been awhile since my last post, and the reason for that — as some of you may know — is because I have started working full time at internship with Business Laboratory in Houston, TX. I will likely be talking about the projects I’ve been working on in later posts; I’m reluctant to at the moment because I’m trying not to count my chickens too early. I am going to wait until the projects are officially finished for that, but for now, I’d like to talk about a cool method of programming with the Wolfram Language that I am really fond of: the use of PostFix notation and pure functions.

I’m going to assume at least a basic familiarity with Wolfram language (likely some of y’all have stumbled upon Mathematica if you’re a science student, or have used Wolfram|Alpha at some point for a class). That said, pure functions are still kind of mysterious, so let’s talk about them.


What the heck is a pure function

Pure functions are a way to refer to functions when you don’t want to assign them a name; they are completely anonymous. What I mean by that is, these functions have arguments we refer to as slots, and these slots can take any argument passed through regardless of whether or not the argument in question is a value or a string, and regardless of what kind of pattern the passed argument follows. To illustrate what I mean, consider:

Screen Shot 2016-08-06 at 7.44.19 PM

As you can see, the functions “f” and “g” don’t care whether what you’re passing  thru it is an undefined variable, a string, or a value. It’ll evaluate it regardless.

If you’re seeing some similarities between Wolfram pure functions and, say, Python λ-functions, no need to get your glasses checked! They are essentially equivalent constructs: in practice, they both allow you to write quick single-use functions that you basically throw away as soon as you’re finished with them. It’s a staple of functional programming.

Screen Shot 2016-08-06 at 7.39.53 PM

A comparison of the traditional way of evaluating functions vs the evaluation of functions with pure functions; the results, as you can see, are identical. Another thing to note is that traditionally, you must set the pattern in the function you are defining, hence the h[x_] — this underscore denotes a specific pattern to be passed as an argument in order for the function to evaluate. Hence, pure functions are much more flexible.

Last note about pure functions: it’s important to remember that when you are writing in the slots, you need to close them with the ampersand (&). This tells Mathematica that this is the end of the pure function you are working with.


Ok, but what about PostFix

We are familiar with applying functions to variables. We are taught this in math courses: apply this function, which acts as this verb, to this variable, which can take on this many values. In math terms, when you evaluate a function you are relating a set of inputs to a set of outputs via some specific kind of mapping. For example, if I have:

Screen Shot 2016-08-06 at 8.01.17 PM

What I am saying is my set of inputs is x, and when I apply the function, f(x) = x + 2, to this set of inputs, I get the corresponding set of outputs, f(x).

In Mathematica, this is typically expressed as brackets, [ ] (as you might have noticed). PostFix is the fundamentally the same exact thing. It is a method of applying a function to some kind of argument. Except, with PostFix notation for brackets, expressed as //, you are now tacking the function on at the end of an evaluation. Like so:

Screen Shot 2016-08-06 at 8.07.15 PM

PostFix notation has its pros and cons, and like any good developer, it is up to you to decide when it’s best for you to use this particular application.


Putting them together

When these two methods are put together it can make for some rapid development on a project, particularly when you are importing data; you are usually importing something that has a raw, messy format (say coming from an Excel spreadsheet or some other database). PostFix notation allows you to perform operations on this data immediately after import, and pure functions allow you to recursively call the things you did in the previous line of code (just before PostFixing) without rewriting it. Doing this makes debugging a breeze, because you can easily break apart the code if it fails to evaluate in order to see where the failure is occurring. Here’s an example of what I mean:

Screen Shot 2016-08-06 at 8.18.08 PM

A comparison between nesting of functions method that is typical of most Mathematica users and the PostFix with pure functions method. You can see that I’ve included the use of two global variables, because there may be another point in development where you would like to use just the specific pieces of information from previous code, and not the whole thing; e.g. the function DataSet was included solely for visualization purposes, and the pertinent information can be retrieved by just calling the variable csv if it needed to be analyzed separately at another time.

In conclusion

PostFix notation, coupled with the use of pure functions, accelerates your development and will help you get whatever you’re working on — whether it be an app, or some kind of homework problem — to the final stages much faster. It is not intended (by any means) to be a nice, polished piece of code; it is primarily for testing purposes, and I would advise to dress it up a bit before pushing it out for the world to see. Just one person’s opinion of course, you can ultimately do whatever you want. Anyway, I hope this has been somewhat helpful for y’all, and like always: thanks for sticking around!

Poisson Processes as a Limit of Shrinking Bernoulli Processes (aka: pt.2)

So what exactly are Poisson processes useful for? Consider buses in a queue. Whenever a bus arrives a person enters the bus. The amount of time a person waits for the bus (that is, the amount of time before and up until an event occurs) is a random variable with some probability distribution. We can model bus arrivals through a sequence of these random variables,(which constitute their interarrival times). Specifically, we can approximate a Poisson process from this, provided our conditions fulfill various properties and definitions.

The previous post about Poisson processes was more or less describing how they behave, not really a definitive grasp of what they are based on the properties they have. By giving a set of definitions, one can construct a process that follows all definitions. This process is then described accordingly.

Definition: a Poisson counting process, N(t), is a counting process (that is, the aggregate number of arrivals by time t, where t > 0) that has the independent and stationary increment properties and fulfills the following properties:

Screen Shot 2015-05-04 at 7.14.53 PM

1) says that the probability of no arrivals is 1-λδ = 1-λ2j

2) says that the probability of exactly one arrival is λδ = λ2j

3) says that the probability of 2 or more arrivals is the higher order term upon expansion, which is essentially negligible.

Basically, what these properties are telling us are that there can be no more than one arrival at a time! The term 2j is really important, and we’ll see why in a little bit.

So how does this relate to a Bernoulli process? Well, Poisson processes are continuous time processes, and thus have an uncountably infinite set of random variables. This is particularly difficult to work with, so we approximate a continuous time process with a discrete time process. In this case, we can use a Bernoulli process, with some extra parameters, to approximate a Poisson process (as it turns out, the Bernoulli process is the correct approximation to use, and we’ll see why in a bit).

A Bernoulli process is an IID sequence of binary random variables i.e.; {X1, X2, X3, …, Xn} for which px(1) = p and px(0) = 1-p. So if a random variable has the value of 1, that is Xi = 1, there is a bus arrival at time i. If Xi = 0, there is no arrival at time i.

To make this process useful to approximate a Poisson process with, we “shrink” the time scale of the Bernoulli process s.t. there exists a j > 0, where Xi has either an arrival or no arrival at time i2-j.  This new time, i2-j, come from the fact that we are splitting the length of the interarrival period i by half, and then half, and so on, where each new slot has half the previous arrival probability.

To give a visual understanding of this, imagine that this is our original distribution timeline:

Screen Shot 2015-05-04 at 9.58.19 PM

For the first Bernoulli process (imagine there is a 1 as a superscript for each rv). Now, if we split the length of the interval in half,

Screen Shot 2015-05-04 at 10.02.38 PM

we get the 2nd Bernoulli process. This can go on and on, j times. This is done in such a way that the rate of arrivals per unit time is λ, thus Xi ~ Bernoulli(λ2-j).

So for each j, the “j-th” Bernoulli process has an associated counting process,

Screen Shot 2015-05-04 at 10.12.23 PM

Screen Shot 2015-05-04 at 10.16.12 PM

which is a binomial PMF.

Now we’re ready to tackle Poisson’s Theorem, which says:

Consider the sequence of shrinking Bernoulli processes with arrival probability λ2-j and time-slot size 2-j. Then for every fixed time t > 0 and fixed number of arrivals, n, the counting PMF approaches the Poisson PMF (of same λ) with increasing j.

That is to say,

Screen Shot 2015-05-04 at 10.23.29 PM


Screen Shot 2015-05-04 at 10.36.39 PM

From the proof, we can see that a shrinking Bernoulli process is the appropriate approximation for a Poisson process, because the limit of the binomial PMF converges to the Poisson PMF.

Poisson Processes (pt. 1)

For the last few weeks I’ve been working to wrap my head around something called Poisson processes. A Poisson process is a simple and widely used stochastic process for modeling the time at which arrivals enter a system. If y’all have taken an intro probability course, y’all might’ve hear of something called the Poisson distribution — which is simply the probability of getting exactly n successes in N amount of trials. Like the Poisson distribution, a Poisson process is essentially thought of a the continuous-time version of the Bernoulli process (not trying to imply the Poisson dist. is continuous, but it is the limit of the Bernoulli dist. taken to infinity!)

For a Poisson process, arrivals may occur at arbitrary positive times, and the probability of an arrival at any particular instant is zero. This means that there’s no very clean way of describing a Poisson process in terms of the probability of an arrival at any given instant. Instead, it’s much easier to define a Poisson process in terms of the sequence of interarrival times, that is, the time between each successive arrival.

This chapter has been taking me forever to get through, primarily because it’s soooo large; this book really breaks it down into manageable (and all equally important) subsections. The chapter starts by giving you a Poisson process, then describing more generally what an arrival process is, and from there, talking about the important properties of Poisson processes which make it the most elegantly simple renewal process (which are also a special type of arrival process).

There really isn’t much to know about what exactly a Poisson process is; they are characterized, perhaps predictably, by having exponentially distributed inter arrival times. Explicitly, we say that

  1.  a renewal process is an arrival process for which the sequence of inter arrival times is a sequence of positive identical and independently distributed (IID) random variables.
  2. A Poisson process is a renewal process in which the inter arrival times have an exponential cumulative distribution function; i.e.; for some real λ > 0, each X_i has a density specified as f_X(x) = λexp(-λx) for all x greater than or equal to 0.

The parameter λ is the rate of the process, and remains constant despite the size of the interval.

Probably the coolest thing about a Poisson process is that, by nature of it’s exponential distribution, it has a memoryless property. This means that, if you suppose X is a rv denoting the waiting time until some arrival, and the arrival occurs at time t, then the distribution of X is the same for all times x < t. That is to say, that the distribution of the remaining time until an arrival, is exactly the same as the distribution of the original waiting time. This is denoted as:

Screen Shot 2015-03-10 at 3.32.09 PM

We can use this memoryless property to extrapolate certain ideas about how the distribution of the first arrival in a Poisson process behaves, as well as how this first arrival (after time t) is independent of all arrivals up to and including time t.

Following this idea of a memoryless property that is attributed to exponential processes, we can show a lot more things as well like the idea of stationary and independent increment properties (though I’m not going to go into detail just because it’d make this post very lengthy).

There are other ways to define a Poisson process which might be more intuitive than this, but I like this way of describing what exactly it is and how we can use it to model simplistic stochastic processes. I might write up a post later on how to build up what a Poisson process is from its properties (as opposed to being given a Poisson process, and describing how it behaves from there) because I think a lot of the mathematical nuance is lost in this definition, but it’s way more practical.

Existence of Solutions of Elliptic PDEs Using Fourier Transforms and Fixed Point Theory

I’m going to start this blog off right: a repost from my previous blog of a talk I gave Dec. 2013.

I’d like to post it here as a more well thought out, well formulated (and hopefully much more clear) post that is complementary to the talk I gave.

Before I delve right into it, I’d like to provide some background.  The Fourier Transform of a function is very important to solving any sort of differential equation. Similar to Fourier Series, it expands a function to its sum of trigonometric functions (sometimes finite, sometimes infinite). But unlike Fourier Series, your function need not be periodic. In which case, we just say that the period, T, tends to infinity. Formally, we write the Fourier transform as:

Screen Shot 2015-03-04 at 6.52.33 PM

Really, I could write up an entire post about the Fourier Transform alone (and perhaps I will) but for now, this will do.

Now, we consider a point fixed if, given some operator, F(X) = X. The solution of this kind of equation is called a “fixed point”. A good example of commonly used fixed point is the equation

Screen Shot 2015-03-04 at 6.52.41 PM

Where u’ is u when the derivative operator is applied.

Essentially, all PDEs can be considered fixed points. So you can have something like,

Screen Shot 2015-03-04 at 6.52.47 PM


Screen Shot 2015-03-04 at 6.52.53 PM

Now, for most of my time while working on researching this I worked in L2 space. L2 space is a subset of the real space where the Fourier Transform of a function in this space satisfies the following:

Screen Shot 2015-03-04 at 6.52.59 PM

This is called the Parseval Identity, and it is very important with regards to functions in L2 space (there is also much more to be said about L2 space, but this is all that we need for the moment). From here, you can see that the Fourier Transform of a function still lies in L2. Moreover, this implies that the Fourier Transform maps L2  to L2.

Now, L2 space has a lot of interesting properties about it. Some of which are very helpful in our analysis of elliptic PDEs. One of those properties is the norm. On L2, the norm is denoted as:

Screen Shot 2015-03-04 at 6.53.13 PM

(You can see how this will be useful later on)

Now, for any operator that maps L2 to L2, it is considered a contraction iff there is a constant α < 1, and for any f1(x), f2(x) etc element of L2:

Screen Shot 2015-03-04 at 6.53.20 PM

A contraction means that after n iterations, the function will decrease by a factor of α. Thus, the function converges. Because of that, we can conclude that the function must exist. We can prove this computationally as well. Say we have an equation for U(x),

Screen Shot 2015-03-04 at 6.53.26 PM

Now we can take an operator, F, as it maps the space L2 to L2. Assumming M(x,y) satisfies

Screen Shot 2015-03-04 at 6.53.31 PM

we can prove that it is a contraction:

Screen Shot 2015-03-04 at 6.53.38 PM

This is what I was talking about in my previous post about wanting to go more in depth on contractions, especially how they relate to L2 space. Now, moving on: what happens when we move away from general ODEs? Away from “normal” PDEs? A big problem in solving PDEs stems from the fact that occasionally, our PDEs will be what we like to dub as “disturbed”; we introduce a disturb term, ϵ, to our equation. Unfortunately, psychiatric treatment for non-physical entities is not a very lucrative field of study so we must use a different approach in determining if a solution exists. Suppose our disturbed PDE is,

Screen Shot 2015-03-04 at 6.53.50 PM

where a and c are positive integers and d(x) is a real-valued function and ϵ is assumed to be very small. We can start by taking the Fourier Transform of the equation like usual,

Screen Shot 2015-03-04 at 6.53.58 PM


Screen Shot 2015-03-04 at 6.54.05 PM

Oh, hey! It’s a fixed point problem now! So, since we’re still on L2 (and always will be…) this becomes,

Screen Shot 2015-03-04 at 6.54.13 PM

Remember to exchange k‘s for x‘s at this point because we are now working in terms of the original equation. g(x) is used in place of the large constant involving d(k), since by fixed point theory we know that any operator applied (or “unapplied”) to d(x) will equal d(k). Thus the original function is,

Screen Shot 2015-03-04 at 6.54.19 PM

That crazy integral you see there is the definition of a convolution.

Now, we want to see if this is a contraction on L2, and we know by definition of the norm on L2 that,

Screen Shot 2015-03-04 at 6.54.26 PM

where the right half of the equation is equal to

Screen Shot 2015-03-04 at 6.54.34 PM

Now, to test for a contraction, we need not worry about the outer portion of the integral; we can simply work with the dy integral on the inside; we can manipulate it as follows,

Screen Shot 2015-03-04 at 6.54.43 PM

Basic properties of integration tells us that this integral can be broken up into two separate integrals. Because of that, we only need work with one of them, and we can choose them arbitrarily. Let’s work with the first one. By Cauchy-Schwartz we know that

Screen Shot 2015-03-04 at 6.54.52 PM

Thus, by definition of the norm on L2,

Screen Shot 2015-03-04 at 6.55.00 PM

Because this holds true with the first integral that we’ve broken up, in the same vein, it must hold true for the second,

Screen Shot 2015-03-04 at 6.55.07 PM

Because both of these hold true, it must hold true for the entire integral. That is, for the outer portion of the original integral we were working with beforehand,

Screen Shot 2015-03-04 at 6.55.15 PM

And finally, we can take epsilon out of the integral so it becomes,

Screen Shot 2015-03-04 at 6.55.30 PM

Our disturb term acts exactly like α does, which we know to be < 1. Which means that this is a contraction and thus there exists a solution for this disturbed elliptic PDE.