Erdös-Rényi Graphs

So for the past few weeks, I’ve been examining random graph dynamics as a part of an independent study project through my university’s Directed Reading Program (DRP). Now, I’ve done DRP before, but this semester is the first time that I’m feeling like I’m learning some really cool and abstract math; DRP gives undergraduate students the opportunity to experience what research is sort of like within mathematics. You work one on one with a graduate student in an area of your mathematical interest, that is typically a topic not covered in your math courses.

Anyway, so my mentor and I are doing random graph dynamics. You’re probably thinking, “alright, cool…but what is that?”.  Random graph dynamics is basically just stochastic processes applied to networks; a given graph, G, will be sampled out of a set of graphs with some probability distribution. G has probabilities assigned to each edge, and each vertex represents a new state, or generation, in the resulting growth process.

The field was pioneered by — you guessed it — Erdös and Rényi; the graphs they looked at took the form, G(n,p). Aka, random graphs that follow a Binomial distribution. Usually referred to as ER(p), where the n argument is implicit in understanding that [n] = {1,…,n} is the vertex set, and edge probability for any edge connecting vertices in [n].

In particular with ER graphs, we’re interested in the ability to cluster. A cluster is simply just a vertex that has more than one edge attached to it. Clusters can tell us a lot about the properties of these random graphs. For instance, what if we were interested, for some G(n,p), how p changes with n? (In fact, many statisticians are interested in this, as it relates to social network analysis. I won’t be going over much of that here, but perhaps I’ll write up a post about applications of ER graphs. For now, I’m going to focus solely on the foundations).

When we look at this issue, we’re essentially looking at a phase transition between these large connected components in graphs. As it turns out, when the average number of neighbors per vertex is smaller than 1, then all clusters are quite small. While when the largest connected component exceeds 1, then there’s a unique giant component (a particularly noticeable cluster) containing a positive proportion of vertices.

Screen Shot 2015-10-23 at 8.45.57 PM

Screen Shot 2015-10-23 at 8.46.07 PM

Two ER random graphs each with 100 vertices. The top graph has edge probability .025, and the bottom graph has edge probability .04. You can see regions in which clustering occurs: for top graph, there is clustering in the middle left part of the graph. in the bottom, there is clustering in the right center part of the graph.

Naturally, we can next ask the question: is there a maximum size for these clusters? is there a minimum? i.e.; are these clusters bound? And if so, how are they bound? Let’s explore these questions a little more.

Branching Processes

To answer this question for ER graphs, we need to look at something a bit simpler. So, consider a random model for population growth which we call a branching process. These structures are commonly used to study the number of descendants of each generation. In order to do this, it’s convenient to sequentially investigate the number of offspring of each member of the population. This leads to a random walk interpretation of branching processes.

So let X1, X2, X3,…,Xn be i.i.d. random variables with a given probability distribution. Now, we categorize individuals, expressed as vertices, as either unexplored or explored. We define S0, S1, S2, S3,…, Sn to be the amount of unexplored individuals at each generation, {1,…,n}, by the recursion,

  • S0 = 1
  • Si = Si-1 + Xi – 1 = X1 + … + Xi – (i-1)

So at ith generation, there are Si unexplored individuals.

Now let the random variable T be the smallest time, t, for which St = 0.

This time T is equal to the total progeny of the branching process. That is, T = min{t: St = 0}. To get a sense of what exactly this means, suppose we evolve a family tree as such,

Screen Shot 2015-10-24 at 6.07.32 PM

Of course, by definition, S0 is 1. So, to begin the exploration algorithm, let’s explore the root……ok great! We’ve now explored the one individual there! Now, our S1 is 3, because there are a total of three offspring from the root individual. They can be explored arbitrarily…..awesome, they’ve been explored! Now we have no new individuals to explore, because none in the previous generation had any offspring. So, S2 is 0. Thus, our total progeny for this tree is 4. And hence, T is 4.

Ok, back to ER random graphs

So how are these related? Well, as it turns out, you can model the connected components of an ER graph, ER(λ/n), where λ = np, much like binomial branching processes such as the one we’ve discussed above. We can bound the size of these clusters, |C(v)|, (cluster containing v vertices all reachable via occupied edges), from above by the total progeny of a branching process with a binomial distribution, and from below, on the cluster tails, by the probability that the total progeny of a branching process exceeds a particular value. So let’s go ahead and state these stochastic domination theorems:

Theorem 1: For each k ≥ 1,

Screen Shot 2015-11-05 at 5.53.09 PM

Where T is the total progeny of a binomial branching process with parameters n and p.

Theorem 2: For every k ∈ [n],

Screen Shot 2015-11-05 at 5.59.52 PM

Where T is the total progeny of a branching process with binomial distribution with parameters n – k and success probability p = λ/n

I am not going to go through the proof for these theorems (as this post is already quite lengthy and has taken me a long time already to write), but they can be proven via a coupling approach, if you’d like to give it a shot!

Ideally what you want to take away from these two theorems is that because of the fact that as k/n approaches 0 with high probability, the expected offspring of the branching processes takes the form npλ, and so we can expect the ER graph to exhibit different behavior in the subcritical (λ < 1) and supercritical (λ > 1) regimes, since this is what happens with branching processes. Thus, the analysis of |Cmax| is much different in the subcritical regime in comparison to |Cmax| in the supercritical regime. Therefore, we can expect the bounds on the largest connected components to be different as well.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s