How math generalizes everything
Generalizations are the key to understanding high-level mathematics. The more advanced the math, the more abstract the generalizations you’ll encounter.
The reason is that we often face complex objects and deconstruct them to see which of their properties are most descriptive. That way, we can focus on each property in isolation and also recognize it in other objects. The study of one property thus applies to a whole class of objects at once.
Equality
Mathematicians fear redefining things. They like their objects to be immutable. And this is very noticeable when generalizing equality. We never redefine it. We view it as a relation on a set whose elements we wish to equate differently.
What’s a relation? It can be viewed as a function R(x,y) (usually denoted as xRy) telling you whether a given pair (x,y) has a certain property (i.e. whether x and y are related). There’s a bit of a language barrier here, as saying x and y are in a relation does not necessarily mean that y and x are in a relation… we’ll figure it out.
Let’s say we have a relation ≡ that looks like equality. Then it should satisfy the following properties:
- for every x we must have x ≡ x,
- if we have x ≡ y, then we must have y ≡ x,
- for every x, y and z we must have x ≡ y and y ≡ z implying x ≡ z.
Those conditions are called reflexivity, symmetry and transitivity. A relation ≡ satisfying the above is called an equivalence relation.
Let’s look at an example of set size equality. Two sets have equal size when there is a bijection between them. We can check if this satisfies our conditions:
- The identity function is a bijection from any set to itself, so every set has the same size as itself.
- If we have a bijection from X to Y, then its inverse is a bijection from Y to X.
- If we have a bijection b from X to Y and a bijection c from Y to Z, then the composition c ∘ b is a bijection from X to Z.
In this context, we can equate two sets that have the same size.
We can represent an equivalence relation on a set X by gathering equivalent elements into the same set. We then view those sets as points and call them equivalence classes. They form a partition of X.
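To make this concrete, here is a small Python sketch (the function name is my own, purely illustrative) that partitions a finite set into equivalence classes, using congruence modulo 3 as the relation:

```python
def equivalence_classes(elements, related):
    """Partition `elements` into classes of the equivalence relation `related`.

    Assumes `related` really is reflexive, symmetric and transitive;
    then every element lands in exactly one class.
    """
    classes = []
    for x in elements:
        for cls in classes:
            # By symmetry and transitivity it suffices to compare
            # with a single representative of each class.
            if related(x, cls[0]):
                cls.append(x)
                break
        else:
            classes.append([x])  # x starts a new equivalence class
    return classes

# Example: congruence modulo 3 is an equivalence relation on the integers.
same_mod_3 = lambda x, y: x % 3 == y % 3
print(equivalence_classes(range(10), same_mod_3))
# [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```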
Ordering
Here we generalize less-or-equal as a relation on a certain set, by extracting the relevant properties:
- for every x we must have x ≤ x,
- for every x, y and z we must have x ≤ y and y ≤ z implying x ≤ z,
- if x ≤ y and y ≤ x then x = y.
Any ≤ satisfying the above is called a partial order and is intuitively treated as a version of less-or-equal. An example of such an order is divisibility over natural numbers. So m ≤ n when m | n, meaning m divides n:
- obviously for every x we have x | x,
- if x | y and y | z, then there are q, h such that y = q ⋅ x and z = h ⋅ y, so z = h ⋅ q ⋅ x, implying x | z,
- if x | y and y | x, you can check that the only possibility is x = y.
We often visualize partial orders with a Hasse diagram. Connected elements are related, and the higher of the two is the larger one (the right side of ≤). Only the “covering” connections are drawn, with the rest implied by transitivity, so the diagram still uniquely determines our partial order.
There is one important property missing though. It’s called totality and requires every two elements to be comparable. Meaning either x ≤ y or y ≤ x. In that case, the Hasse diagram is always a straight line.
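As a brute-force sketch (my own illustration, not from the text), we can check all four properties for divisibility on {1, …, 12} and extract the edges a Hasse diagram would draw:

```python
from itertools import product

n = 12
elems = range(1, n + 1)
leq = lambda a, b: b % a == 0  # a ≤ b  means  a divides b

# The partial-order axioms, checked exhaustively on this finite set.
reflexive = all(leq(x, x) for x in elems)
antisymmetric = all(not (leq(x, y) and leq(y, x)) or x == y
                    for x, y in product(elems, repeat=2))
transitive = all(not (leq(x, y) and leq(y, z)) or leq(x, z)
                 for x, y, z in product(elems, repeat=3))
total = all(leq(x, y) or leq(y, x) for x, y in product(elems, repeat=2))
print(reflexive, antisymmetric, transitive, total)  # True True True False

# Hasse diagram edges: x is covered by y when x < y with nothing in between.
covers = [(x, y) for x, y in product(elems, repeat=2)
          if x != y and leq(x, y)
          and not any(leq(x, z) and leq(z, y) and z not in (x, y) for z in elems)]
print(covers)  # e.g. (2, 4) and (2, 6) appear, but (2, 12) does not
```

Totality fails here because, for instance, neither of 2 and 3 divides the other.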
This theory is widely applied. One instance is the theory of incidence algebras in combinatorics, with its main result being the Möbius inversion formula.
Algebraic operations
We know how to add and subtract or multiply and divide numbers. How can we transfer this to some other abstract sets?
We recognize that those are binary (take two arguments) functions that satisfy some conditions. An important generalization is that of a group, which is a set G with some binary operation f(x, y), denoted by x ⋅ y, obeying the following conditions:
- (x ⋅ y) ⋅ z = x ⋅ (y ⋅ z),
- there exists an element e, such that x ⋅ e = e ⋅ x = x for all x in G,
- for every x in G, there exists y in G, such that x ⋅ y = y ⋅ x = e.
The first point is called associativity and implies that we can forget about the bracket placements in any given expression (this implication is not entirely trivial). We call the special element e a unit and can easily prove it’s unique: If there are two units e and u, then by using their definitions, we get
e = e ⋅ u = u
The third point gives groups their symmetry by including inverses. An element is a reflection of its inverse.
There are various examples of groups, like
- integers with addition (unit is 0),
- non-zero rational numbers with multiplication (unit is 1),
- bijections (on a fixed set) with composition (unit is the identity function).
A very important example is that of cyclic groups. A lot of cryptography relies on theorems from group theory, applied to cyclic (and elliptic curve) groups.
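Here is a minimal sketch (my own) of the cyclic group Zₙ, the integers {0, 1, …, n−1} under addition modulo n, with the three group conditions verified by brute force:

```python
from itertools import product

n = 6
G = range(n)
op = lambda x, y: (x + y) % n  # addition modulo n

# Associativity: (x ⋅ y) ⋅ z = x ⋅ (y ⋅ z)
assert all(op(op(x, y), z) == op(x, op(y, z))
           for x, y, z in product(G, repeat=3))

# Unit: e = 0 satisfies x ⋅ e = e ⋅ x = x
e = 0
assert all(op(x, e) == x == op(e, x) for x in G)

# Inverses: for each x, the element (n - x) % n undoes it
assert all(op(x, (n - x) % n) == e for x in G)

# "Cyclic" means a single element generates the whole group.
assert {(1 * k) % n for k in range(n)} == set(G)
```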
If we omit the inverse condition of a group, we get monoids. An example is to consider finite strings over some alphabet, e.g. {x,y,z}. The elements look like
xyz, xyy, zzyzyy, z, “” (empty string), …
We define multiplication as concatenation and see that it’s associative. The unit is the empty string. This is called the free monoid over {x,y,z}.
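In code terms (again just a sketch of mine), Python strings with + already behave as this free monoid:

```python
# The free monoid over {x, y, z}: finite strings with concatenation.
a, b, c = "xyz", "xyy", "zzyzyy"
e = ""  # the unit is the empty string

assert (a + b) + c == a + (b + c)  # associativity
assert a + e == a == e + a         # the empty string is the unit
# No inverses: nothing concatenated with "xyz" gives "" back,
# which is exactly why this is a monoid and not a group.
```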
Distances
Let’s consider distances in our residing 3d space. From Pythagoras’ theorem, we can calculate the length of a 3d-vector v = (v1, v2, v3) as
‖v‖ = √(v1² + v2² + v3²).
The distance between two points x and y is the length of the vector from one point to the other, so
d(x, y) = ‖y - x‖ = √((y1 - x1)² + (y2 - x2)² + (y3 - x3)²).
We can extend the definition of a distance to be more general. We say that the properties defining a distance are:
- d(x, y) = 0 if and only if x = y,
- d(x, y) = d(y, x),
- d(x, z) ≤ d(x, y) + d(y, z).
The first one says the only case where the distance between two points is 0 is when the points are the same. The second one says the distance from x to y is equal to the distance from y to x. The third one is called the triangle inequality and requires that traveling from x directly to z is never longer than going through some other point y.
Now, we can talk about what balls look like in this generalized distance. A ball around an element a with radius r > 0 is defined as the set of all points x with d(a, x) < r. Visualizing balls in some given distance can be challenging but gives you an insanely useful intuition about the space.
Again, we get an example when considering strings over some alphabet, say {x,y,z}. Except now the strings are infinite, e.g.
xyzzzyzzzzyzzxyzxxxzyyzyzzzzzzzzz…,
yyyzzzxxxyyyzzzxxxyyyzzzxxxyyyzzz…
We want to define a distance between two different strings s and r. The following metric formalizes the idea that the closeness of two strings is determined by the number of symbols they have in common at the beginning:
d(s, r) = 1/2ⁿ, where n is the length of the longest common prefix of s and r (and d(s, s) = 0).
Try to visualize which strings are close and which are far, or what balls around some string a look like. They should sort of look like trees: a ball is the set of all strings sharing a fixed prefix, and the lower the radius, the longer that prefix (the higher the trunk).
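Here is a sketch of this metric in Python (my own, with finite truncations standing in for infinite strings), together with balls of shrinking radius around a fixed string:

```python
def d(s, r):
    """Prefix metric: d(s, r) = 1 / 2**n, where n is the length of the
    longest common prefix; distance 0 means the strings agree everywhere."""
    n = 0
    while n < min(len(s), len(r)) and s[n] == r[n]:
        n += 1
    return 0.0 if s == r else 1 / 2 ** n

strings = ["xyzzz", "xyzzy", "xyyyy", "zzzzz"]
a = "xyzzz"
for radius in (1.0, 0.25, 0.0625):
    print(radius, [s for s in strings if d(a, s) < radius])
# Shrinking the radius forces a longer shared prefix with a:
# 1.0    ['xyzzz', 'xyzzy', 'xyyyy']
# 0.25   ['xyzzz', 'xyzzy']
# 0.0625 ['xyzzz']
```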
Another example is viewing functions as vectors. That is basically the whole point of the theory called functional analysis. For two bounded functions f and g, both mapping from some set X to the set of real numbers, we can define the distance between them as
d(f, g) = sup { |f(x) - g(x)| : x ∈ X }.
This says the distance between two functions is the maximum (supremum) distance between their values. Now a whole range of questions opens up. What does the convergence of a function sequence look like in this metric? Is the limit unique?
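A numeric sketch (my own; the supremum is approximated on a finite grid of sample points): the sup-distance between sin and its cubic Taylor polynomial on [0, 1]:

```python
import math

def sup_dist(f, g, xs):
    """Approximate sup-metric: the largest gap |f(x) - g(x)| over samples xs."""
    return max(abs(f(x) - g(x)) for x in xs)

xs = [i / 1000 for i in range(1001)]  # a grid on [0, 1]
f = math.sin
g = lambda x: x - x ** 3 / 6  # Taylor polynomial of sin around 0

print(sup_dist(f, g, xs))  # small: the two functions stay uniformly close on [0, 1]
# A sequence of functions converges in this metric exactly when it converges
# uniformly: the worst-case gap, not just each pointwise gap, must shrink.
```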
Don’t think this generalization stops here. We have a notion called a topology, which further generalizes distances (in a way). It only tells us which sets play the role of balls (the “open” sets) in a given space. And it turns out that this is enough to define limits, which means we can do calculus inside that space. Understanding topology is probably the most rewarding experience in math.
Scalar (dot) product
The scalar product between two real n-dimensional vectors x and y is defined as
⟨x, y⟩ = x1 y1 + x2 y2 + … + xn yn.
What makes this concept interesting is orthogonality. We know that two vectors are orthogonal if and only if their scalar product is 0. Also, a vector is of unit length if its scalar product with itself is 1.
If we have orthonormal vectors e1, …, en spanning our vector space, then every vector q satisfies
q = ⟨q, e1⟩ e1 + ⟨q, e2⟩ e2 + … + ⟨q, en⟩ en.
So, any given vector q can be expressed as a linear combination of those vectors. And the coefficients ⟨q, ei⟩ are easily computable.
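Here is that computation as a Python sketch (my own illustration): expanding a vector along an orthonormal basis of R³ and reconstructing it from the coefficients:

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

s = 1 / math.sqrt(2)
e = [(s, s, 0.0), (s, -s, 0.0), (0.0, 0.0, 1.0)]  # an orthonormal basis of R^3
q = (3.0, -1.0, 2.0)

coeffs = [dot(q, ei) for ei in e]  # each coefficient is just ⟨q, eᵢ⟩
reconstructed = [sum(c * ei[k] for c, ei in zip(coeffs, e)) for k in range(3)]
print(all(math.isclose(a, b) for a, b in zip(reconstructed, q)))  # True
```

No equation-solving is needed: orthonormality makes each coefficient a single scalar product.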
How do we generalize this? It turns out the following axioms are a good choice to characterize the scalar product (here on a real vector space):
- ⟨x, y⟩ = ⟨y, x⟩ (symmetry),
- ⟨ax + by, z⟩ = a ⟨x, z⟩ + b ⟨y, z⟩ (linearity in each argument),
- ⟨x, x⟩ ≥ 0, with ⟨x, x⟩ = 0 if and only if x = 0 (positive definiteness).
Wouldn’t it be just insane if we extended this concept to functions? For functions f and g on the interval [-π, π], we can define the scalar product as
⟨f, g⟩ = ∫ f(x) g(x) dx, with the integral taken over [-π, π].
As the integral is a generalization of the sum, we can see how this naturally generalizes our usual scalar product. This produces one of the most beautiful theories in math: Fourier’s theory. You may know the Fourier series of a given function f:
f(x) = a0/2 + Σ (an cos(nx) + bn sin(nx)), summing over n = 1, 2, 3, …
This is a direct result of the vectors
1 = cos(0x), cos(nx), sin(nx) for n = 1, 2, 3, …
being orthogonal (and, after normalization, orthonormal) with respect to the above scalar product. And they span some vector space, containing a large class of functions. So we can linearly expand those functions along our vectors, as we did in the non-generalized case, and get the formula for our coefficients:
an = (1/π) ∫ f(x) cos(nx) dx,  bn = (1/π) ∫ f(x) sin(nx) dx, both integrals over [-π, π].
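As a sketch (my own; the integrals are approximated by a crude Riemann sum), we can compute these coefficients for f(x) = x and check a partial Fourier sum against the function:

```python
import math

# Midpoint Riemann sum over [-pi, pi] as a stand-in for the integral.
N = 20_000
dx = 2 * math.pi / N
xs = [-math.pi + (k + 0.5) * dx for k in range(N)]
integrate = lambda h: sum(h(x) for x in xs) * dx

f = lambda x: x  # an odd function, so all cosine coefficients vanish

def b(n):
    """bn = (1/pi) * integral of f(x) sin(nx) over [-pi, pi]."""
    return integrate(lambda x: f(x) * math.sin(n * x)) / math.pi

bs = [b(n) for n in range(1, 51)]
print([round(v, 4) for v in bs[:4]])  # [2.0, -1.0, 0.6667, -0.5]

# Partial Fourier sum with 50 terms, evaluated at x = 1: close to f(1) = 1.
approx = sum(bn * math.sin(n * 1.0) for n, bn in enumerate(bs, start=1))
print(round(approx, 3))
```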
It’s also possible to find other sets of orthonormal vectors spanning the same space (such as Legendre polynomials).
Measure
We understand how to measure an interval or the volume of some 3d object. E.g. for a ≤ b we have
m([a, b]) = b - a
It’s incredibly useful to generalize this concept. The whole mathematical basis for probability and statistics relies on this. So, what should a measure be?
We define it as a function that maps subsets of a given set to non-negative real numbers (with infinity allowed). It must satisfy the following conditions:
- a measure of the empty set is 0,
- if sets A and B are disjoint (share no elements), then m(A ∪ B) = m(A) + m(B). This should also hold for countable unions of pairwise disjoint sets (then we have an infinite sum).
A set is countable if its size, in the bijection sense from earlier, is equal to that of the integers. It’s crucial to this theory that the set of real numbers is strictly larger than the set of integers.
The second point gives us a way to measure separated sets separately:
m([1, 2] ∪ [10, 11]) = m([1, 2]) + m([10, 11]).
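A tiny sketch (my own) of this length measure on finite unions of pairwise disjoint intervals, with additivity checked directly:

```python
def m(intervals):
    """Length measure of a union of pairwise disjoint intervals [a, b]."""
    return sum(b - a for a, b in intervals)

A = [(1, 2)]
B = [(10, 11)]
assert m([]) == 0               # the measure of the empty set is 0
assert m(A + B) == m(A) + m(B)  # additivity for disjoint sets
print(m([(1, 2), (10, 11)]))    # 2
```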
What’s a measure in probability theory? Probability! We define probability as a measure on the space of all outcomes. It takes a set of outcomes (an event) and tells us its weight. E.g. to get the probability of a die rolling a two, we let our probability measure weigh the set of all possible current conditions (outcomes) that lead to a world where our die lands on a two.
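As a toy sketch (my own model, not the full measure-theoretic machinery): a probability measure on the six outcomes of a fair die, evaluated on events, i.e. sets of outcomes:

```python
from fractions import Fraction

outcomes = {1, 2, 3, 4, 5, 6}
weight = {w: Fraction(1, 6) for w in outcomes}  # a fair die

def P(event):
    """Probability measure: additivity comes for free from the sum."""
    return sum(weight[w] for w in event)

assert P(set()) == 0 and P(outcomes) == 1  # empty event has mass 0, total mass is 1
print(P({2}))        # 1/6: the die rolls a two
print(P({2, 4, 6}))  # 1/2: the die rolls an even number
```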
In this way, the first step in mathematically defining probability is admitting that randomness doesn’t exist. We operate under an illusion of the existence of a set that perfectly describes all possible current conditions of our world. Any element (outcome) of that set is a specific condition, and from it, the entire future can be predicted.
Others may have different interpretations of what outcomes represent, but this intuition served me well in my numerous probability courses. And it’s philosophically intriguing, as our assumption takes reality to be deterministic, meaning our ignorance is the only thing making the world seem random.
Conclusion
After doing math for some time, your mind generalizes problems as soon as it interprets them. It’s a pattern burned into your subconscious. When you encounter a problem, your first thought is how to deconstruct it via generalization. In this way, you focus only on what’s important and abstract the irrelevant details away.