
So, What Next: Reflecting or Something Like That

Hey y’all

Some of you are already aware, but I recently graduated from university, and now I'm living that post-grad slow summer life (for the most part, anyway). For a while, I was really unsure about what I'd be doing: I felt unprepared and didn't have my sights narrowed down enough for graduate school, and so I was left with the sort of "what now?" feeling that a lot of physics students — around 40%, actually — feel when graduating.

Post-Grad Blues (Minus the Unemployment)

Luckily, I haven't been left alone long enough to wonder exactly what I'll be doing in my immediate future: I was offered a full-time position with Wolfram Research as a Technical Writer (yeah, go ahead and say you saw it coming), which I accepted. As of a few weeks ago, I've been working! I also did a two-week stint at their annual Summer Camp, teaching high schoolers about programming in the Wolfram Language and pushing them not just to use computational tools but to think computationally about the problems they worked on. So far, I am enjoying things!

Some days, though, feel slower than others: with no working car, no friends in town, and a fully remote job, I get rather crushed by the weight of free time. I've never had much of it before, and I've been struggling not to get too wrapped up in my own thoughts, which just kind of swim around in my head when I'm left alone; I do much better in a busier environment, either because I'm used to it or because that's just how my brain works. When I see other people going about their day, being productive, even just walking past and chatting with friends, I feel more inclined to remain focused. All the external stimuli help to crowd out any thoughts that would otherwise pull my concentration away from the task at hand.

Because of this, I have been attempting to find some ways of occupying my time during this transition state between graduation and relocation. I can't really start up new hobbies or join organizations because I won't be here long enough to contribute or get used to things, so I am relegated to investing my time in things I already enjoy doing but that maybe got lost somewhere along the way while I was in school. One of those things is writing. However, you might've noticed I haven't updated this blog recently (my last post was somewhere around late March). Since writing is my job now, getting burned out is something I've become considerably more worried about in the long term. Of course, the things I write about at work are often different from what I write here, but my creativity is a bit sapped by the time I'm off the clock; I've wanted to update this blog but felt like there was a big mental roadblock around the kinds of things I wanted to discuss.

This blog will consequently take a small shift back to what I was originally using it for, investigating cool math topics from a proof-based perspective, rather than what I felt it was becoming: investigating cool topics (of any kind) programmatically. Shifting back to more formal math will do a couple of things: first, it will help keep my life much more balanced by establishing concrete boundaries around what I write about here, and second, it will help me stay interested in and on top of fundamental math concepts for when I eventually apply to graduate programs sometime down the road.

Graduate School: A Simple Two-State System? Yes or No?

That being said, one of the other things I've been spending my free time on recently is trying to break down my varying interests and figure out what I want to pursue in graduate school. Normally I wouldn't be thinking about all this so soon after graduating — or at least that's what I told myself — but while I was at the Wolfram Summer Camp up in Boston, I got in touch with an old acquaintance from the summer of 2015, when we were both students at the Wolfram Summer School. We had some nice discussions about my possible future; he was really trying to sell me on the MIT physics department, where he works in statistical physics, and it got me thinking about some things.

me_MIT

Hanging out with my friend outside MIT. Another hobby I've picked up in the last few months, mostly as a way to create physical reminders of my time in important places with people I consider important, is instant photography. This was taken with a Polaroid Sun 660 using Impossible Project film. When I was younger I used to collect Polaroid cameras and have since amassed a small collection, though I never used them. This is me trying to use them!

For a long time I shunned the possibility of physics graduate school, instead convincing myself that computer science or mathematics would be more useful for both my own evolving interests and the types of problems that will emerge as we continue to develop technology. I'm realizing now that this line of thinking — one that I subscribed to as I became increasingly disenchanted with physics academia towards the end of my five years in undergrad — was incredibly narrow, though thru no fault of my own: every institution focuses on different areas of research, and how much of that research involves other disciplines varies from project to project and, consequently, from university to university. What I was seeing as sequestered problems, restricted to their own domains, was really not the reality of things: as science progresses, fields become increasingly dependent on one another. Every project has a diverse array of researchers who all contribute their own life experiences, skills, and attitudes. It would make sense that, as problems become increasingly complex and as we move thru new and exciting areas of research, we would need different perspectives with which to tackle them. Having people with different backgrounds, including different disciplines, makes for a wider tool belt from which to choose the appropriate tools.

twitter.png

I was thinking about these things earlier this morning on Twitter (y’all can follow me if you want, though this isn’t meant to be a plug)

Filling a Gap: Getting Kids a Job

While I don't pretend I had absolutely no control over my ostensible ignorance about the interdisciplinary nature of physics research (after all, APS and other national organizations have conferences which showcase all kinds of research, much of it involving mechanical engineers or applied mathematicians, and even biologists and neuroscientists), I do think departments could do a better job of enlightening their undergraduate students about these things. Like I mentioned before, around 40% of undergraduates with physics degrees go directly into the workforce, mostly in the private sector. We work as data scientists, software engineers, teachers, etc., none of which preclude the opportunity to engage in research; it just takes a different form than what we're used to seeing. But professors don't usually know how to deal with students whose path doesn't involve a direct entry into a physics graduate program; understandably, professors give the best advice about what they know best.

Screen Shot 2017-07-10 at 2.59.26 PM

The private sector remains the largest employer of physics bachelor's degree holders. Source: AIP

Unfortunately, this results in carbon copies of said professors; I think physics departments could do better at engaging students who don't follow traditional career paths, but I don't know how to tackle this issue. Most of us are left to our own devices wrt networking, job fairs, and pretty much any opportunity we can find, with little outside help. We are locked out of the traditional network professors have of colleagues who are looking for students to fill openings in their groups. In contrast, many engineering departments have fully staffed and knowledgeable career specialists who work with students to place them somewhere in industry where they will thrive.

I don't really think there is much use in placing the blame on physics departments themselves, since every department is restricted by its own issues with budgets or university politics. Perhaps it doesn't make much sense to place the blame anywhere. All I know is that there are gaps that need to be filled: physics undergraduate programs produce some of the most versatile workers to enter the job market, yet those graduates frequently struggle to find work.

I feel pretty lucky that this wasn’t my situation.

Weren’t You Talking About Graduate School?

Tangent aside, I would like to eventually go back to graduate school. But like my tweets above illustrate, I am no longer adhering to the “discipline first” perspective I had for these last few years. Instead, I’m embracing the fluidity of disciplines by approaching things from a “project first” perspective. My goal is to do work I find meaningful, and the more options I keep open in being able to do that, the more likely I will eventually get to do what I want to do.

What I want to do, however, is a different story. One that I am continuing to narrow down. But I feel confident, at least, that I can; I’m no longer stifled in the same way that I was before. Funnily enough, removing my linear train of thought has brought me closer to the broad area of nonlinear dynamics, which I am currently looking into as a potential fit for me.

Where I currently am in my life — graduated, about to move across the country, starting a new job — makes me feel open to so many new things. It feels only right that I continue to remain open about prospects of returning to graduate school, shifting my lens to view the possibility just a bit differently.

Like always, thanks for reading.

Statistical Mood Analysis

One of my favorite pastimes, like that of many millennials, is following the evolution of memes. These are images which relay a central idea or joke that is largely contextual: their value as a communicative device exists only within particular social spheres. Outside of these spheres, they deliver no such information, because people outside the user base do not understand the origins or humor of the meme in question. You can think of memes, in some sense, as a type of social currency — a cash crop, if you will.

im_dropping_hints

Probably my favorite meme of all time. The variations of this meme that have been spun out of the creative minds of other millennials have been incredible. The ability to weave other memes, each with its own stylistic devices & structure, within this meme has been nothing short of linguistic genius. It really typifies how versatile a language can be when examined through a new media lens.

Anyway, there are many things to be said about memes, most of which I do not want to get bogged down in. We’ll just keep it simple & say that memes are vital to the ecosystem of the World Wide Web.

What is a Mood?

Upon the inception of this idea for mood analysis, together with my good friend Carlo, I ran into a tricky problem. How do I describe a mood to someone who is out of touch with social media? How do I make the idea of a mood accessible to a wider audience? For me, these questions constituted a mood in and of itself.

A mood is a piece of textual information, typically a small bit about someone's day-to-day living. This information explicitly describes a situation which we immediately connect to a particular feeling. This piece of text is accompanied by a meme; the meme adds a humorous element to the situation, & also works to make the situation relatable to others. To paraphrase my good friend Alex, a mood "refers to content or experiences that evoke a […] feeling, and in sharing, communicate to others something about the self of the poster/sharer. […] Moods are funny, sad, joyful, upsetting, uncomfortable, gross, specific, & confusing", but may not be limited to these things.

Use of Neural Nets

With the release of Mathematica version 11.1, about 30 different types of neural net layers have been added to the core of the language. With the high-level interface the notebook offers, it becomes really easy to build advanced neural networks. This is a consequence of the symbolic nature of the language: once specified, the language fills in the needed details. For those interested, Stephen Wolfram talks about how everything interfaces with the low-level library MXNet.

In particular, the NetTrain function trains the parameters in any net from examples, and it makes it easy to work with multi-layer neural networks. The architecture we used for this analysis was LeNet, a type of network designed to recognize visual patterns directly from pixel images without creating large chains of composite functions — it minimizes the amount of preprocessing involved. Its strength comes from the fact that the patterns can be extremely varied yet still be recognized. This makes for a very robust network.
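To give a sense of the basic NetTrain workflow, here is a tiny, self-contained sketch; everything in it (the layer sizes, the made-up 2D data, the class labels) is just an illustration, not the code we actually used for the memes:

  net = NetChain[{LinearLayer[16], Ramp, LinearLayer[2], SoftmaxLayer[]},
     "Input" -> 2, "Output" -> NetDecoder[{"Class", {"happy", "sad"}}]];
  data = {{0.9, 0.8} -> "happy", {0.8, 0.9} -> "happy",
          {0.1, 0.2} -> "sad", {0.2, 0.1} -> "sad"};
  trained = NetTrain[net, data];     (* learns the parameters from the example rules *)
  trained[{0.85, 0.9}]               (* should come back "happy" *)

The same pattern (build a net, hand NetTrain a list of input -> class rules, then call the trained net like an ordinary function) is exactly what we do below, just with images instead of toy vectors.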

Analysis A

Since it's pretty much impossible to build a model around specific moods such as "tfw u accidentally throw ur car keys into the trash, take out the trash, & then realize u threw out ur keys but the garbage truck has already collected ur trash" (the sheer scope of moods at that level of specificity is just too large), we can at least attribute most moods to four different categories:

  1. Annoyed
  2. Confused
  3. Happy
  4. Sad

These seem basic enough to encompass a large swath of moods with relative ease. We use the above enumeration as the numeric representation of the mood categories so that the code can interpret them. The dataset contains 10 memes per mood. Here is a piece of the dataset:

Screen Shot 2017-04-02 at 7.55.45 PM

Various memes associated with the mood in question.

Next, the appropriate LeNet code was written for this dataset:

Screen Shot 2017-04-02 at 8.03.45 PM

The LeNet code was written using the NetChain function, which allows us to create multiple layers of neural nets with ease. It outputs a NetChain object, which the model will then be trained on.
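For readers who can't make out the screenshot, a LeNet-style chain in the Wolfram Language looks roughly like the sketch below; the 28 × 28 grayscale input and the layer sizes are the standard textbook choices, not necessarily the exact values we used:

  moods = {1, 2, 3, 4};   (* 1 = Annoyed, 2 = Confused, 3 = Happy, 4 = Sad, per the list above *)
  lenet = NetChain[{
     ConvolutionLayer[20, 5], Ramp, PoolingLayer[2, 2],
     ConvolutionLayer[50, 5], Ramp, PoolingLayer[2, 2],
     FlattenLayer[], LinearLayer[500], Ramp,
     LinearLayer[4], SoftmaxLayer[]},
    "Input" -> NetEncoder[{"Image", {28, 28}, "Grayscale"}],
    "Output" -> NetDecoder[{"Class", moods}]]   (* an untrained NetChain object *)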

Then, we build the model

Screen Shot 2017-04-02 at 8.07.37 PM

As you might be able to guess from the training progress shown above, the net doesn't seem to be converging the way we would hope. Still, it's worthwhile to check the results:

Screen Shot 2017-04-02 at 8.12.51 PM

The results from the neural net analysis. The model tried to categorize all images as happy, & only managed to get two of these right.

Ok so, we definitely did not get the results we hoped for. That is actually way more interesting to me though, because it raises so many new questions: why did the machine try to sort every meme into the “happy” category? Is my sample size too small? Are there issues with the way machines render & interpret images that are contextually vague? Can non-human images be reconciled & contrasted with human images? And if so, can enough information/value be extracted to make meaningful inferences about the moods?

Use of Classify

In addition to neural networks, we also used the built-in function Classify, which is more automated than the family of 'Net' functions (NetModel, NetTrain, NetGraph, etc.) in the sense that it requires hardly any configuration — you can think of NetTrain as a Linux machine & Classify as a Mac: the Mac works straight out of the box with little user configuration, while the Linux machine is designed for complete customizability.

The Classify feature has been around since the introduction of image processing in the Wolfram Language, but its capabilities have been significantly expanded upon since then. So, might as well take advantage of it!

Analysis B

Using the same data set with the same dimensions, we write the code to be input into Classify:

Screen Shot 2017-04-02 at 8.22.30 PM

Then, we build the ClassifierFunction

Screen Shot 2017-04-02 at 8.24.07 PM

As you can tell in comparison to the analysis using NetTrain, this output has a lot less going on than before. Because there is more automation here, more is happening underneath the hood than we can see.
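Stripped of the screenshots, the whole Classify workflow fits in a couple of lines; trainingData and testImage here are placeholders for the labeled meme images described above:

  classifier = Classify[trainingData];    (* trainingData: a list of image -> mood rules *)
  classifier[testImage]                   (* the predicted mood category for one meme *)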

We can't really hypothesize how the results will turn out based on this analysis the way we sort of could before, so we will absolutely need to check the results to confirm an accurate model has been built:

Screen Shot 2017-04-02 at 8.28.36 PM

This model has categorized four of the sampled images correctly. This is better than the previous analysis using neural nets which only got two right.

While the results are not stellar here either, we can still make some important observations wrt the differences between each analysis: This model did a much better job at categorizing images with their associated moods, but why? Or rather, how?

There must be more going on with the low-level interfaces than we can make sense of. Perhaps it has to do with the way each function understands & quantifies images? I am really not sure. I might come back to this post at a later date & work to fill in some of my knowledge gaps as I learn more about machine learning & image processing.

Machine Failure: Why Can’t Machines #Realize Things in 2017?

Ultimately, the failure of machines to fully grasp the idea of moods is not something that really needed to be demonstrated in this way: moods are highly variable, volatile, contextual, ambiguous, & in some cases incomprehensible — even for humans.

But what allows us to connect feelings to images — what allows us to relate thoughts, concerns, & people together — comes down to social cues. It isn’t just a learned pattern of behavior that lets us absorb the value of moods, but rather it is the combination of these patterns together with the added information we inherently consume while being steeped in the environment we live in; humans do not exist in a vacuum — we are affected by the unconscious messages we receive starting from childhood. This isn’t anything new, but for some reason we ostensibly have trouble balancing these ideas when it comes to artificial intelligence.

I don’t think we can reasonably expect to understand moods on the analytical level alone, & because of that I find it hard to believe we will also develop any useful machine learning methods to extract the full breadth of value that moods contain.

But perhaps this is the biggest mood of all.

Symmetries: Why Number Theory is Kind of Important

During my time as a physics student, I often heard others in the department lament the fact that they have to take certain proof-based upper-division math courses for elective credit. Among those, number theory is a popular course that physics students can take to fulfill this requirement.

However, many don’t find it particularly enjoyable; a lot of physics students are concerned with the utility of a mathematical idea, which makes sense given the discipline, & are convinced number theory has no use in application. I think this shortsightedness is unfortunate, but easily correctable: just shed some light on a common application of ideas from number theory that are prominently used in physics! That application? Group theory, which will be the content of this post.

What is a Group?

Group theory is a fancy way of describing the analysis of symmetries. A symmetry, in the mathematical sense, refers to a system which remains invariant under certain transformations. These could be geometric objects, functions, sets, or any other type of system – physical or otherwise.

Physicists typically don't need to be told how group theory is useful; they already know this to be the case. But a lot of undergrads are not able to properly study the foundational aspects of groups because the degree plan doesn't emphasize them, even though they underpin some of the most important physical laws – such as conservation principles – that we do, in fact, learn.

The easiest way to think of a group is as a set together with an operation on it. For this pair to indeed be a group, it must fulfill certain properties:

  • The operation must be a binary operation on the set
  • The operation must be associative
  • The set must contain an identity element
  • Every element must have an inverse

Binary Operation

A binary operation ∗ on a set A is a function which sends A × A (the set of all ordered pairs of elements of A) back to A. In less formal terms, it takes in two elements from A & returns a single element which also exists in A. Another way to look at this is that the operation in question has the property of closure.

A good example to think of is addition on the set of integers: is addition a binary operation on ℤ?

screen-shot-2017-03-04-at-3-39-54-pm

Some test cases for addition of integers

It would certainly seem to be the case! It is, & we intuitively know this to be true (which is why I am going to skip the proof).

Associativity

Say we have a binary operation ∗ on the set A. This operation is said to be associative if, for all a, b, c ∈ A,

(a ∗ b) ∗ c = a ∗ (b ∗ c)

If we use our previous example about addition on the set of integers, we can easily see that addition on ℤ is associative. But what about subtraction?

screen-shot-2017-03-04-at-4-14-52-pm

Using Mathematica, we can gauge whether or not two statements are equivalent using the triple ‘=’ symbol; explicitly, the left-hand side is equal to a-b-c, while the right-hand side is equal to a-b+c

So we can confidently say that subtraction on the set of integers is not associative. While it is a binary operation, <ℤ, -> still fails to be a group.

Identity Element

Suppose again that ∗ is a binary operation on A. An element e of A is said to be an identity for ∗ if, for all a ∈ A,

a ∗ e = e ∗ a = a

Using our handy example of the set of integers, ℤ, we know that <ℤ, +> does have an identity element. Similarly, <ℤ, ×> also has an identity element:

screen-shot-2017-03-04-at-4-23-54-pm

Inverses

Suppose, once more, that ∗ is a binary operation on A, with identity element e. Let a ∈ A. An element b of A is said to be an inverse of a wrt ∗ if

a ∗ b = b ∗ a = e

The set of integers does in fact have inverses under addition as well. In fact, we are very familiar with these inverses:

screen-shot-2017-03-04-at-4-29-07-pm

I will let you, the attentive reader, decide for yourself if you think that <ℤ, ×> also contains inverses. What about <ℤ, ->?
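Mathematica also makes it easy to spot-check these axioms on a small sample before attempting any proof. A minimal sketch (the sample range is arbitrary, & of course passing these checks is not a proof):

  sample = Range[-10, 10];
  AllTrue[Tuples[sample, 2], IntegerQ[#[[1]] + #[[2]]] &]    (* closure under + *)
  AllTrue[Tuples[sample, 3],
   (#[[1]] + #[[2]]) + #[[3]] == #[[1]] + (#[[2]] + #[[3]]) &]    (* associativity *)
  AllTrue[sample, # + 0 == 0 + # == # &]    (* 0 acts as the identity *)
  AllTrue[sample, # + (-#) == 0 &]          (* -a is the inverse of a *)

All four return True for <ℤ, +>; swapping + for - makes the associativity check fail, matching what we saw above.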

Subgroups

It might not seem like it, but subgroups are an important aspect of group theory. But why look at subgroups when you could just look at groups? After all, the term itself implies that it is not even a whole group…surely there must be more information contained in a regular ol’ group? That was a question I had asked myself, too; I just didn’t get the point.

Now though, I realize that you can learn a lot about the structure of a group by analyzing its subgroups. More generally, if you want to understand any class of mathematical structures, it helps to understand how objects in that class relate to one another. With groups, this raises the question: can I build new groups from old ones? Subgroups help us answer this question.

Suppose G is a group. A subset H of G is called a subgroup of G if

  • H is non-empty
  • H is closed under the operation of G
  • H is closed under taking inverses

Non-empty

All this means is that there must be at least one element in H; it cannot be the empty set, ∅.

Closure

If H is a subgroup of G, we sometimes say that H inherits the operation of G.

Let's look at this idea with some things we are familiar with. Particularly, let us use our handy dandy knowledge about the set of integers ℤ under addition. One subset of ℤ is the set of even integers, S = {…, -4, -2, 0, 2, 4, …}. If we add any two elements of S together, we will always end up with an element that is also in S.

That is pretty easy to see, so let’s look at a counter-example. Let us have a second subset of ℤ, T = {-3, -2, -1, 0, 1, 2, 3}. At first glance, it would seem like T is closed under addition on ℤ. However, if we add 2 and 3 together, that would result in 5. And we can see that 5 ∉ T.

Therefore, T does not inherit the operation of ℤ.
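For finite candidate subsets, closure is easy to test exhaustively in Mathematica. A minimal sketch (note this only works for finite sets, so it can check T directly but not the full, infinite set of even integers):

  closedUnderAdditionQ[s_List] := SubsetQ[s, Union[Total /@ Tuples[s, 2]]]

  closedUnderAdditionQ[{-3, -2, -1, 0, 1, 2, 3}]    (* False: for example, 2 + 3 = 5 escapes T *)
  closedUnderAdditionQ[{0}]                         (* True: the trivial subgroup *)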

Inverses…Part Two

For a subset H of G to be a subgroup, any element within H must have its respective inverse also be in H. We talked about what inverses were a bit earlier, so I am not going to re-type it.

You might be reading this and be thinking, "But wait! Shouldn't a subgroup also contain the identity element as well? Silly Jesse…you buffoon…" Have no fear! The existence of an identity element in H actually follows from the other conditions: take any a in H; then its inverse a⁻¹ is also in H, & by closure a ∗ a⁻¹ = e is in H as well.

Special Groups and Their Properties

There are certain groups in which interesting patterns crop up. This makes them stand out amongst other groups, thus they demand special attention be paid to them. One such group is called a cyclic group.

Let G be a group with a ∈ G. The subgroup <a> is called the cyclic subgroup generated by a. If this subgroup contains all the elements of G, then G is itself cyclic. You can see the usefulness of subgroups in full effect here: the ability to understand more about a group can come from the existence of certain subgroups.

But what exactly does it mean for a group to be generated by an element a? In short, it means that every element of the group can be written as a power of a (or, in additive notation, as a multiple of a).

More formally, G = {aᵐ : m ∈ ℤ}, or in additive notation, G = {ma : m ∈ ℤ}.

Let's take a look at this idea in the context of an example: take the group <ℤ₈, +>, that is, {0, 1, 2, 3, 4, 5, 6, 7} under addition modulo 8.

Screen Shot 2017-03-06 at 3.54.19 PM

Addition table of the integers modulo 8 & the associated generators of the group. Each of the elements {1, 3, 5, 7} can be multiplied by some m (mod 8) to produce every other element in the group.

Think about it: if we took m × 2 (mod 8), we would only end up with multiples of 2 – {0, 2, 4, 6} – which does not account for every element in ℤ₈. The same can be said about the elements 4 and 6 – they do not generate every other element in the group.
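We can verify this directly in Mathematica. A minimal sketch (written so that n can be any modulus, not just 8):

  generators[n_] := Select[Range[0, n - 1],
    Length[Union[Mod[# Range[0, n - 1], n]]] == n &]

  generators[8]    (* {1, 3, 5, 7} *)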

Why are the generators the ones that they are? What is so special about the relationship between the generating elements of a group & the order of the group? Well, much to the dismay of physics students everywhere, the result is pretty interesting…

Relation to Number Theory

While there are multiple connections between algebraic structures – such as groups – to number theory, perhaps the most useful one comes from ideas related to the division of numbers. In particular, the notion of relative primality is especially useful for understanding more about behavior of cyclic groups.

Recall that m divides n (written m | n) if there exists an integer k such that n = mk. Also recall that an integer p is prime if p has exactly two positive divisors: p & 1.

Now, if n & m are not both zero, then ∃ d ∈ ℤ+ for which d is the greatest common divisor of n & m: gcd(n, m) = d.

When d is equal to 1, we say that n & m are relatively prime. Meaning, there is no factor other than 1 that divides both n & m.

Going back to our previous example of cyclic groups, using <ℤ₈, +>, what do we notice about the generators?

Screen Shot 2017-03-06 at 4.32.55 PM

The gcd of each generator of the integers modulo 8 with 8 is 1.

More generally, for the cyclic group ℤₙ, the generators are exactly the elements that are relatively prime to n (gcd equal to 1). If you did not know what the generators of a cyclic group were, you could find them using this concept.
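In Mathematica this check is a one-liner; a quick sketch for ℤ₈:

  Select[Range[0, 7], CoprimeQ[#, 8] &]    (* {1, 3, 5, 7}, matching the generators found above *)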

There is more to be said about the relation between number theory & group theory – such as the use of the division algorithm to prove existence of particular elements in cyclic subgroups – but I feel like I’ve already made a compelling enough case for the utility of number theory.

Like always, thanks for reading!

The Mathematics of Linguistics

Hey, y'all! It's certainly been a while. I've been busy the entire semester, but with the lull between the Fall & Spring terms, I figured I would try to write up something interesting I had been thinking about for some time.

Those that know me know that I am absolutely crazy about natural language applications — pieces of technology that utilize your speaking voice to control some aspect of a device. They have a wide range of utility, which I'll leave to a different post discussing examples of these pieces of technology. This post is about a topic a bit abstracted from that: the underlying ideas that allow you to make use of these applications come from computational linguistics, a unique & really cool subfield within the scope of AI.

Development of Linguistics as a Scientific Field

The idea of linguistics being a subject of mathematical research comes as a surprise to a lot of people I talk to who do not do much computer science type work; they have no idea that there are a number of mathematical models used directly in the field of linguistics, including n-grams & notions of entropy.

screen-shot-2017-01-04-at-3-47-24-am

Basic wordcloud of Wikipedia page on linguistics. Generated using Mathematica.

More indirectly, the foundational ideas behind both of these applications of probability theory come, perhaps unsurprisingly, from the logic developed by Cantor & reconciled by Russell in set theory, as well as from various mathematicians of the Golden Age of Logic, specifically Gödel & his Incompleteness Theorems. These ideas concerned the linguistic patterns of mathematical axioms, & set the stage for analyzing everyday languages as communicative systems as they exist at any given point in time, regardless of their history, with 'axioms' replaced by 'grammars'.

A Basic Idea of Set Theory

Set theory provided a needed versatility when identifying patterns at higher levels of abstraction, with the first whole theory being developed by Georg Cantor, who relied on Boole’s algebraic notations when working with syllogisms. The theory begins by establishing an arithmetic for sets:

If A and B are sets, let AB denote the set consisting of all members common to A and B, and let A + B denote the set consisting of all members of A together with all members of B.

This is just a formalization of Boole's work, the only difference being that here, A & B refer to any sets, not just those arising as a consequence of propositional logic. To avoid confusion, a contemporary notation was developed in which AB becomes A ∩ B (read as 'A intersection B'), & A + B becomes A ∪ B (read as 'A union B'). The arithmetic developed by Boole still holds in this generalization, except for the axioms involving the object 1, because there is no need for that kind of object in set theory. Sets of objects are subject to the same type of properties – the same type of arithmetic – which may, but need not, preclude other objects from being categorized in the same set.

screen-shot-2017-01-04-at-3-51-57-am

Sets are intuitively represented as Venn diagrams. The left is the union of both sets, whereas the right is the intersection of the given sets.

Small sets of objects are typically listed explicitly, while larger or infinite sets use ellipses e.g.:

  • {1, 2, 3}, a set consisting of the elements 1, 2, & 3
  • {1, 2, 3, …}, the infinite set of all natural numbers, also represented as ℕ

The introduction of sets allows us to examine patterns in varying levels of abstraction: the natural numbers are within the set of integers, which are within the set of rational numbers, which are within the set of real numbers, which are within the set of complex numbers. We typically refer to sets within sets as subsets. In notation, this is

ℕ ⊂ ℤ ⊂ ℚ ⊂ ℝ ⊂ ℂ

Mathematicians have used this idea to answer questions regarding the nature of numbers; to answer a question like "what is a number?" we have to look at how members of certain sets of numbers can be described by the sets that precede them. That is, how properties of numbers within a lower level of abstraction can be combined in some fashion so as to represent elements of a higher one. The lowest level – the set of natural numbers – can be described in terms of axioms (namely, you can construct the notion of the set of natural numbers using solely the empty set – the set with no elements – denoted as ∅).
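To make that parenthetical concrete, here is the usual construction (a sketch of the standard von Neumann encoding, in LaTeX notation):

  0 = \emptyset, \quad 1 = \{\emptyset\} = \{0\}, \quad 2 = \{\emptyset, \{\emptyset\}\} = \{0, 1\}, \quad \ldots, \quad n + 1 = n \cup \{n\}

Each natural number is literally the set of all the naturals that came before it, so the whole tower is built out of nothing but the empty set.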

Inconsistency & Reconciliation with Axiomatic Set Theory

For all the usefulness of set theory, there was a major flaw in the framework. Before I discuss what the flaw was, it is more apt to discuss why having one is such a big deal.

Of all the things that can be wrong with a particular axiom system – & you would hope there are none – inconsistency is definitely the worst. It is possible to work with axioms that are hard to understand; we see this every day: the Peano axioms that characterize the natural numbers are not widely known by most people, yet we use natural numbers constantly. It is also possible to work with axioms that are counterintuitive, which we, again, see every day (shoutout to statisticians). It is even possible to work with axioms that do not accurately describe the system you are intending to represent, since these axioms could be useful elsewhere in mathematics. But inconsistent axioms? No way, José.

The particular inconsistency, found by Bertrand Russell, was one relating to properties. Recall that I said that elements satisfying particular properties can be grouped together into sets. Well, what happens when you have properties that are not properties of the elements within sets but properties of sets themselves? Is a set a member of itself? And what happens if certain sets do not fulfill this property: are they not members of themselves? That is to say:

If set R satisfies property P but set W does not, can R ∈ R while W ∉ W? Russell sharpened this question by considering the set of all sets that are not members of themselves, a set which can neither be nor fail to be a member of itself, & arrived at an impasse: a contradiction in the very definition of a set.

The solution? The development of axiomatic set theory, which added new axioms to the ones that already existed. It is not as simple & succinct as Cantor's original theory, & so it was only with reluctance that the original was abandoned, but it just goes to show that even the most intuitive ideas need to be critically examined for flaws.

Gödel’s Incompleteness Theorem

A similar type of foundational flaw was found in the axiomatic approach to mathematics itself. The reason we use axioms as building blocks for particular mathematical structures is that they make it possible for us to separate the idea of being able to prove something from the idea of something being objectively true. If you have a proposition that is provable, then you can use a sequence of arguments to deduce whether or not the proposition is true, as long as your axioms are assumed true. Separating these ideas allowed mathematicians to avoid dealing with the major philosophical implications that "objective truth" entails.

Now, since figuring out which axioms to use is crucial to formalized mathematics, this implies – within the notion of formalization – that you will, at some point, find all the axioms you need to be able to prove something. This is the idea of completeness of an axiomatic system. But, as noted with Russell, finding all the axioms needed can be a really difficult undertaking. How do we know if we have found them all? How can we have a complete set of axioms?

In comes Gödel, who showed that such an axiom system must be incomplete, because there are questions that cannot be answered on the basis of the axioms. The big idea here is that, for any sufficiently rich system, consistency precludes completeness: if your axioms are consistent & strong enough to express basic arithmetic (operations like addition, multiplication, etc.), then there exist statements which are true but not provable.

kurt_godel_zpse372ee6a

Kurt Gödel (1906-1978)

The proof of Gödel’s Theorem is better left to the entire books that are dedicated to its uses & abuses as it pertains to mathematical logic. A book I would recommend, if you are curious, is Gödel’s Theorem: An Incomplete Guide to Its Use and Abuse by Torkel Franzén.

Grammatical Approach to Linguistics

From what we have seen so far, it is not so farfetched to examine the study of everyday language on the same kind of axiomatic basis we used for sets. I suppose another question that arises in this examination is why: why would we want to analyze language with this paradigm? Let us look at an example:

  1. People love going to the movie theater.
  2. Cats are interested in mayonnaise and like resting in catacombs.
  3. Lamp soft because mom to runs.

It should come as no surprise that the third sentence is not proper English, while the first & second sentences are. Yet of those two correct sentences, the second is nonsensical. How can that be the case? Why can a nonsensical sentence still be classified as a genuine English sentence? It is because of the way in which the words are stitched together – the structure of the sentence.

Sentence structures – much like sets – are abstract constructs that contain elements adhering to specific properties. These elements are things you can explicitly point to, in this case, particular words. Because sentences are a level of abstraction above words, they cannot be explicitly pointed at when coming up with rules. The only thing you can do is observe the repetitive behavior expressed by sentences & then come up with a way to generalize that behavior. These patterns constitute rules that are much like the axioms we've encountered before; the type of "arithmetic" that applies to this set of axioms is the basis of a grammar for a language.

Inspired by the advancements of logic in the Golden Age, this new mathematically based linguistics attempted to reduce all meaningful statements to a combination of propositional logic & information conveyed by the five human senses. Going a step further, Noam Chomsky attempted to do what could not be done with set theory: to design a process for finding all the axioms that describe the syntactic structure of language.

  • DNP VP → S
  • V DNP → VP
  • P DNP → PP

Where,

  • DNP = definite noun phrase
  • VP = verb phrase
  • S = sentence
  • V = verb
  • P = preposition
  • PP = prepositional phrase

These are just a few pieces of the formalism developed by Chomsky. Using the above grammar – & typically a parse tree – it is very easy to apply it to the English lexicon, as each word corresponds to one of the given types. This formal grammar, which is built on axiomatic principles, captures some of the structure of the English language.

simple-sentence

Parse tree of a simple sentence, "the rat ate cheese", broken down into its constituents.
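As a toy illustration of how such a grammar can be applied mechanically, here is a minimal Wolfram Language sketch that repeatedly rewrites a sequence of word categories using the rules above. The tiny lexicon & the extra bare-noun rule are my own assumptions for the example, not part of Chomsky's formalism:

  lexicon = {"the" -> "D", "rat" -> "N", "ate" -> "V", "cheese" -> "N"};
  rules = {
     {pre___, "D", "N", post___} :> {pre, "DNP", post},    (* determiner + noun -> DNP *)
     {pre___, "N", post___} :> {pre, "DNP", post},         (* a bare noun can also act as a DNP *)
     {pre___, "V", "DNP", post___} :> {pre, "VP", post},   (* V DNP -> VP *)
     {pre___, "DNP", "VP", post___} :> {pre, "S", post}};  (* DNP VP -> S *)

  parse[sentence_String] :=
    FixedPoint[Replace[#, rules] &, ToLowerCase /@ StringSplit[sentence] /. lexicon]

  parse["the rat ate cheese"]    (* reduces all the way to {"S"}: a grammatical sentence *)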

The Advent of Natural Language Processing

It should be noted that the use of parse trees is highly developed in computer science, & used extensively in natural language processing (NLP). Chomsky's algebraic work in grammar has allowed computer scientists to break down human speech patterns in a way that machines can analyze, understand & ultimately interpret; NLP considers the hierarchical structure of language.

amazon-alexa-family-press-hero

Amazon’s family of natural language devices. From left to right: the Tap, the Echo, the Dot. Each device utilizes Alexa Voice Services, Amazon’s unique natural language processor.

However, there are still major issues to be resolved within the field. One of these is the ambiguity that arises out of spoken language — English in particular. Consider the following basic sentence:

  • I didn’t take her purse.

Depending on where the emphasis is placed, this sentence conveys different messages – & thus it conveys different pieces of information. Emphasis is not easily reconciled within machines, because it was not an aspect of language captured by Chomsky's mathematical treatment of sentences; most processors do not know how to handle these kinds of ambiguities unless the ambiguity somehow disappears. Crudely, this is done thru hard-coding the particular linguistic patterns you want your machine to interpret. However, there have been strides in developing probabilistic methods of interpreting speech, largely thanks to the introduction of machine learning algorithms, on which most NLP techniques are now based.

Machine Learning in a Nutshell

Machine learning is a discipline in its own right, which I've touched upon in other posts (but have yet to provide a real in-depth treatment of). It is characterized by the examination of large data sets – dubbed training sets – & making statistical inferences based on these data. Generally, the more data you train on, the more accurate your model will be. A popular application of machine learning is social media analysis.

There are entire texts dedicated to the subject, one of my favorites being this one, which combines definitions with visualizations.

Linguistics as Portrayed in Popular Culture

Perhaps the most recent portrayal of modern linguistics can be seen in the sci-fi blockbuster Arrival, which is about a linguistics professor attempting to converse with aliens. The movie is a nod to researchers at SETI, as well as to Freudenthal, who developed Lincos — an attempt at creating a language based on the mathematical commonalities we might share with an alien species. His work, while pivotal, had some serious pitfalls, which you can read about in the attached article.

methode-times-prod-web-bin-47b59b1a-7054-11e6-acba-85f5c900fc1a

Dr. Louise Banks attempting to communicate basic words to alien visitors in the latest sci-fi film Arrival.

During the film, the audience was able to see some of the computational methods that modern linguists use in analyzing speech, & this technology is written about extensively by Stephen Wolfram (in fact, it is his technology they’re using in the film).

As a whole, the film helped to showcase linguistics as a highly technical & scientific field of study, as opposed to a degree that people get because they “speak like five languages or something” (actual quote by someone I know, who will remain nameless).

Not the Beginning, but Not the End Either

While linguistics has been around for centuries, it is only during the mid-20th century that it took a turn from the historical to the mathematical; the major advancements made by Gödel, building on the foundations laid by Cantor, Russell & Boole (as well as Aristotle, technically), allowed the study of patterns to be applied to fields that were previously outside the scope of mathematics. The advent of computers ushered in new questions about language & artificial intelligence, which are now bustling fields of research. Yet for all our progress, we have yet to answer fundamental questions about what makes language the way it is. These – & other questions – suggest we have a ways to go, but it would seem as if we have a good foundation to base our future findings on. Though, if anything is certain about mathematics, it is that intuition does not always reign supreme; language is constantly evolving, & it is good practice to examine foundations for flaws, or else we might just end up at the same philosophical standstill that many mathematicians have fallen victim to before.

PostFix Notation and Pure Functions

Hey, y'all! Hope the summer is treating you well! It's been a while since my last post, and the reason for that — as some of you may know — is that I have started working full time at an internship with Business Laboratory in Houston, TX. I will likely be talking about the projects I've been working on in later posts; I'm reluctant to at the moment because I'm trying not to count my chickens too early. I am going to wait until the projects are officially finished for that, but for now, I'd like to talk about a cool method of programming with the Wolfram Language that I am really fond of: the use of PostFix notation and pure functions.

I'm going to assume at least a basic familiarity with the Wolfram Language (likely some of y'all have stumbled upon Mathematica if you're a science student, or have used Wolfram|Alpha at some point for a class). That said, pure functions are still kind of mysterious, so let's talk about them.

 

What the heck is a pure function

Pure functions are a way to refer to functions when you don't want to assign them a name; they are completely anonymous. What I mean by that is that these functions have arguments we refer to as slots, and these slots accept anything you pass them, regardless of whether the argument in question is a value or a string, and regardless of what kind of pattern the passed argument follows. To illustrate what I mean, consider:

Screen Shot 2016-08-06 at 7.44.19 PM

As you can see, the functions "f" and "g" don't care whether what you're passing thru them is an undefined variable, a string, or a value. They'll evaluate it regardless.

If you’re seeing some similarities between Wolfram pure functions and, say, Python λ-functions, no need to get your glasses checked! They are essentially equivalent constructs: in practice, they both allow you to write quick single-use functions that you basically throw away as soon as you’re finished with them. It’s a staple of functional programming.

Screen Shot 2016-08-06 at 7.39.53 PM

A comparison of the traditional way of evaluating functions vs the evaluation of functions with pure functions; the results, as you can see, are identical. Another thing to note is that traditionally, you must set the pattern in the function you are defining, hence the h[x_] — this underscore denotes a specific pattern to be passed as an argument in order for the function to evaluate. Hence, pure functions are much more flexible.

Last note about pure functions: it's important to remember that after you are done writing in the slots, you need to close the function with an ampersand (&). This tells Mathematica that this is the end of the pure function you are working with.
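For readers without the screenshots, here are a few standalone examples in the same spirit; the names are just placeholders:

  (# + 2) &[5]                  (* one slot, applied directly to 5; gives 7 *)
  f = (#1^2 + #2) &;            (* two slots, referred to as #1 and #2 *)
  f[3, "abc"]                   (* evaluates to 9 + "abc"; no pattern checking gets in the way *)
  Map[(# * 10) &, {1, 2, 3}]    (* pure functions shine inside functional constructs: {10, 20, 30} *)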

 

Ok, but what about PostFix

We are familiar with applying functions to variables. We are taught this in math courses: apply this function, which acts as a verb, to this variable, which can take on many values. In math terms, when you evaluate a function you are relating a set of inputs to a set of outputs via some specific kind of mapping. For example, if I have:

Screen Shot 2016-08-06 at 8.01.17 PM

What I am saying is my set of inputs is x, and when I apply the function, f(x) = x + 2, to this set of inputs, I get the corresponding set of outputs, f(x).

In Mathematica, function application is typically expressed with brackets, [ ] (as you might have noticed). PostFix is fundamentally the same exact thing: a method of applying a function to some kind of argument. Except, with PostFix notation, expressed as //, you are now tacking the function on at the end of an evaluation. Like so:

Screen Shot 2016-08-06 at 8.07.15 PM

PostFix notation has its pros and cons, and like any good developer, it is up to you to decide when it’s best for you to use this particular application.
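In plain text, the notation looks something like this (toy values, just to show the shape of it):

  Range[10] // Total              (* same as Total[Range[10]]; gives 55 *)
  "hello world" // ToUpperCase    (* same as ToUpperCase["hello world"] *)
  16 // Sqrt // N                 (* postfix applications chain left to right: 4. *)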

 

Putting them together

When these two methods are put together, they can make for some rapid development on a project, particularly when you are importing data; you are usually importing something that has a raw, messy format (say, coming from an Excel spreadsheet or some other database). PostFix notation allows you to perform operations on this data immediately after import, and pure functions allow you to operate on the result of the previous line of code (just before PostFixing) without rewriting it. Doing this makes debugging a breeze, because you can easily break apart the code if it fails to evaluate in order to see where the failure is occurring. Here's an example of what I mean:

Screen Shot 2016-08-06 at 8.18.08 PM

A comparison between the nested-functions method that is typical of most Mathematica users and the PostFix-with-pure-functions method. You can see that I've included two global variables, because there may be another point in development where you would like to use just a specific piece of information from previous code, and not the whole thing; e.g. the function Dataset was included solely for visualization purposes, and the pertinent information can be retrieved by just calling the variable csv if it needs to be analyzed separately at another time.
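Here is a text sketch of the pattern; the file path, column choices & cleaning steps are hypothetical, just to illustrate the shape of the workflow:

  csv = Import["data/raw_export.csv"];        (* hypothetical path; gives a messy list of rows *)
  clean = csv //
     Rest //                                  (* drop the header row *)
     Select[FreeQ[#, ""] &] //                (* toss rows that still contain empty fields *)
     Map[{#[[1]], ToExpression[#[[3]]]} &];   (* keep only the columns we care about *)
  clean // Dataset                            (* wrap it up for easy viewing *)

If the chain fails partway, you can just delete the last // step (or evaluate the prefix up to any point) to see where things go wrong, which is exactly the debugging convenience described above.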

In conclusion

PostFix notation, coupled with the use of pure functions, accelerates your development and will help you get whatever you're working on — whether it be an app or some kind of homework problem — to the final stages much faster. It is not intended (by any means) to produce a nice, polished piece of code; it is primarily for testing purposes, and I would advise dressing it up a bit before pushing it out for the world to see. Just one person's opinion of course; you can ultimately do whatever you want. Anyway, I hope this has been somewhat helpful for y'all, and like always: thanks for sticking around!

Slivnyak’s Theorem and the Boolean Model

A few weeks ago, I gave a talk over some fundamentals of stochastic geometry as a part of the DRP at my university. Specifically I talked about various spatial point processes and a particular model called the Boolean model, which is regarded as the bread and butter of the entire field. Here, I will try to incorporate as much as I was able to during my chalk talk, as well as include some nice visualization tools to hopefully help make the material a bit easier to digest.

So what exactly is a point process? Formally, a spatial point process (p.p.), Φ, is a random, finite or countably infinite collection of points in the space ℝᵈ, without accumulation points (which just means that any bounded region contains only finitely many points of the process). We can think of Φ as a sum of Dirac measures (a measure being a generalization of physical concepts we are familiar with, like area, volume, and length; it's a way to size objects). A Dirac measure is a measure of total size 1 – it assigns size based solely on whether a set contains a fixed point, x, or not – and is one way of formalizing the Dirac delta distribution. That is,

Screen Shot 2016-05-19 at 10.49.49 PM

I like to just think of a p.p. as simply a collection of points in a set, since Φ is a random counting measure: Φ(A) = n, where n is the number of points of the process that fall in the set A.

Now, one point process I focused on in particular is the Poisson p.p. It's the simplest one there is, and we can construct it as follows: a specific Φ, with intensity measure Λ (which we can think of as how often/how fast points appear in the space; if you are familiar with 1D stochastic processes, consider Λ(dx) = λ dx, which gives a homogeneous Poisson p.p. with a constant rate of arrivals/events), where Λ is

Screen Shot 2016-05-19 at 10.52.12 PM

is characterized by its family of finite dimensional distributions

Screen Shot 2016-05-19 at 10.50.03 PM
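For readers who can't see the screenshots, the standard statement (in LaTeX notation; a sketch of the textbook formula) is: for bounded, mutually disjoint sets A_1, …, A_k,

  P\{\Phi(A_1) = n_1, \ldots, \Phi(A_k) = n_k\} = \prod_{i=1}^{k} e^{-\Lambda(A_i)} \frac{\Lambda(A_i)^{n_i}}{n_i!}

so each Φ(A_i) is a Poisson random variable with mean Λ(A_i).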

If these sets, {Aₖ}, are disjoint, then this is Poisson, thus implying another characterization: that of independence. We can see this from the RHS of the equation, where all the individual probabilities are simply multiplied together. I like to think about this in a physical sense: say we have an ensemble of particles, all prepared in the same manner, and we want to look at some statistical value of interest. If we assume the particles are non-interacting (and make a few other idealizations), then the behavior of one particle is independent of another. In my previous post, I created a simulation of a Poisson p.p. for a unit square:

Screen Shot 2016-05-19 at 11.19.53 PM

a Poisson p.p. with generated points in a unit square

 

Another method of characterizing the Poisson p.p. is by using some ideas taken from Palm theory, in particular, by making use of the reduced Palm distribution, P!x. What this does is condition on a point of the process being at a particular location (this conditional distribution is denoted by Px), then remove this point from Φ (this is what the exclamation point denotes), and then look at the resulting distribution. For a general p.p., which may have attractive or repulsive forces between the points, you can see how conditioning on a point being at a location would affect the overall behavior of points within some finite distance of that event, and how removing this point from the point process would then result in a different distribution. Because a Poisson p.p. has no interaction between its points, we have an explicit relationship between the reduced Palm distribution and the original distribution. In fact,

Screen Shot 2016-05-19 at 10.54.45 PM

This is Slivnyak's Theorem, and I just think it's really really cool! It's remarkable, because for a general p.p. we do not get a relationship like this between the two distributions; in fact, this property characterizes the Poisson p.p., which is quite special.
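Written out (again in LaTeX notation, as a sketch of the standard statement), the theorem says that for a Poisson p.p. the reduced Palm distribution coincides with the original distribution:

  P^{!x}(\cdot) = P(\cdot) \quad \text{for } \Lambda\text{-almost every } x

In words: condition on Φ having a point at x, delete that point, & what remains is distributed exactly like the original Poisson p.p.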

Let’s leave the Poisson p.p. for a moment and talk about a possible property of any p.p.: that of stationarity. Say we have a p.p., Φ. Recall our previous definition of Φ as being the sum of Dirac measures,

Screen Shot 2016-05-19 at 10.49.49 PM

Now, let’s add a vector:

Screen Shot 2016-05-19 at 10.58.43 PM

This p.p. is stationary iff:

Screen Shot 2016-05-19 at 10.59.19 PM

That is to say, the p.p. is invariant under translation. This idea will be helpful to us later on. For now, let's move on to another type of p.p.: suppose we now attach some piece of information (in ℝˡ) to each point (which lives in ℝᵈ) of the process Φ. We call this a marked p.p., which is a locally finite, random set of points with some random vector attached to each point, and it is denoted with a similar notation:

Screen Shot 2016-05-19 at 11.04.19 PM

Marked p.p. are important in their own right, but I'm just gonna be using them to talk about the basis of the Boolean model. In particular, the Boolean model is based on a Poisson p.p., whose points in ℝᵈ are called germs, and an independent sequence of i.i.d. compact sets called grains,

Screen Shot 2016-05-19 at 11.05.40 PM

You can see how this underlying p.p. mirrors our description of a marked p.p. However, for the latter we only considered a vector mark, mᵢ, in ℝˡ. To deal with more general mark spaces, we can think of the subsets Ξᵢ as being picked from a family of closed sets (where the mark m acts like a random radius),

Screen Shot 2016-05-19 at 11.07.12 PM

The associated Boolean model is the union of all grains shifted to the germs; in the simplest case, this is the set-theoretic union of disks centered at each point generated in the underlying Poisson p.p.,

Screen Shot 2016-05-19 at 11.08.43 PM

 

Screen Shot 2016-05-19 at 11.18.07 PM

A Boolean model is the union of many grains centered at germs. On the left, the individual grains. On the right, the union sets.

Now, the easiest Boolean model to work with is one that is homogeneous: we say that Ξ_BM is homogeneous if the underlying Poisson p.p. is stationary and Λ(dx) = λ dx. So stationarity is a characteristic of the homogeneous BM.
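To make this concrete, here is a minimal Wolfram Language sketch of a homogeneous Boolean model on the unit square; the intensity & the radius distribution are arbitrary choices, picked just to make a nice picture:

  lambda = 50;                                     (* intensity of the underlying Poisson p.p. *)
  n = RandomVariate[PoissonDistribution[lambda]];  (* number of germs falling in the unit square *)
  germs = RandomReal[{0, 1}, {n, 2}];              (* germ locations: uniform, given n *)
  radii = RandomVariate[UniformDistribution[{0.02, 0.1}], n];   (* i.i.d. marks: random radii *)
  grains = MapThread[Disk, {germs, radii}];        (* each grain shifted to its germ *)
  Graphics[{LightBlue, grains, Black, Point[germs]}, PlotRange -> {{0, 1}, {0, 1}}]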

The Boolean model (along with more general germ-grain models) is used in a variety of applications: from telecommunications, to galaxy clustering, and even DNA sequencing; it is a widely applicable model of real-world pattern formation and is surprisingly accurate given how simple it is.

This is just meant as a simple introduction to the topic. In later posts, I hope to talk about some applications of the Boolean model (and other related concepts) to statistical physics. Anyway, I hope y’all find this stuff just as interesting as I do! And, like always, thanks for reading!

HackDFW + checking in

Hey y’all, I know it’s been awhile since I last made a blog post. Things got kinda crazy over winter break and over the course of this semester; life’s kinda unexpected in that way, but I’m back and have a lot to share with y’all!

So this past weekend I attended my 2nd hackathon: HackDFW, up in Dallas! For those that are unfamiliar with what a hackathon is, I wrote another post about the previous one I attended which briefly discussed the purpose of these events. Now, this hackathon had a way different vibe than HackTX did. When my hacking partner, Devin, and I got there, they were blasting EDM. And this pretty much remained the case throughout the entire event (yes, even at 8am on Sunday morning). While at this hackathon, we had a lot of project ups and downs: hardware issues, sleep deprivation, inaccessibility of libraries…just a lot, really. We ended up not being able to submit a hack due to several of these issues (which I'll talk about more in depth later on in this post).

While in attendance, Devin and I, along with many other attendees, had the unfortunate experience of dealing with a spotty internet connection; there were small wifi hubs set up around various sections of the venue, and some were stronger than others. For an event where the participants rely heavily on things like web-based APIs to pull data and create apps/hacks, you can see how this would be frustrating.

It was through this collective frustration that Devin and I formed our idea: let’s get information about these various wireless networks scattered about the venue and create a bunch of data sets to look at how the spotty wifi hubs vary in signal strength over time. Then, let’s model these data sets using some stochastic geometry techniques. Specifically, we wanted to use a Boolean model (which I’ll be discussing more in depth in my next post, because I am giving a talk about it later this week!) to look at range of coverage for each individual network.

We didn’t have much of a thought about how this would be beneficial, but we figured that, at least in doing this, the organizers might have some thought about how to fix this issue for next year. If nothing else, it would be a nice visualization tool. This project was going to be a divide and conquer: Devin was going to work on hardware and acquiring data and I was going to code the simulations and use randomly generated data to test the code via Wolfram Language.

Anyway, to start this project we first approached one of the sponsors who was lending out chipKITs, which are basically supposed to be Arduino-like devices, capable of being programmed to do a variety of different tasks. The sponsor noted that the device itself was built for the Arduino community, so it could use the same sorts of libraries that Arduinos use. On top of the chipKIT uC32, we also acquired a wifi shield, to gather information about the wireless networks and have the device output a stream of data about them (stuff like wifi info and RSSI levels). Here's where we ran into some issues: the device required a miniUSB cable…which is not in common use anymore, as most people have switched to microUSB cables. So, taking this issue to the sponsor, they told us that they would run to Fry's Electronics to find some; we were playing a waiting game at this point.

dev1

a ChipKIT uC32 with mounted wifi shield, courtesy of Digilent

The sponsor returns, and informs us that they found three cables there, but they were $11 each, and so they didn’t buy any. Both Devin and I felt that this was not the appropriate behavior of a hackathon sponsor, and were pretty frustrated. Eventually, after a lot of coffee, consideration, and beating our heads against the wall, we decided to stick with this project.

Devin made a run to Best Buy to purchase the cable, and I continued to work on the simulation. He came back and finally! We were getting somewhere! But not so fast: we then ran into the issue of incompatible libraries. But what gives? We were told that Arduino libraries were fully supported on these chipKIT devices.

Because of this issue, we were not able to get the data needed into a readable and transferable format so that I could read it into the simulation and model it. I also ran into a couple of issues myself with writing the code for the Boolean model (I got the underlying Poisson point process down, but generating the second distribution of independent marks proved quite a challenge for me at the time; I’m going to tackle it again before my next blog post).

Screen Shot 2016-04-24 at 8.12.31 PM

My code for a Poisson point process from which the Boolean model is built

It’s quite unfortunate we weren’t able to submit, but it happens! Hackathons are unpredictable, and it’s this unpredictability that make them a worthwhile learning experience.

dev4

A very dark pic of me Sunday morning, sporting a cool new hat and some stylish sweatpants, in front of the stage at HackDFW (PC: Devin Luu)