Categories

## On Gromov’s theorem on groups of polynomial growth

This article documents my presentation of Gromov’s theorem on groups of polynomial growth at the MIT combinatorics reading group. The presentation is based on Gromov’s 1981 paper, Groups of polynomial growth and expanding maps, Kleiner’s 2007 paper, A new proof of Gromov’s theorem on groups of polynomial growth, and Tao’s 2009 blog post, A finitary version of Gromov’s polynomial growth theorem.

#### Introduction

Throughout, we fix a finitely generated group $G$ and a finite symmetric generating set $S$ (that is $\forall x \in S. x^{-1}\in S$). For every group $x \in G$, the word length $\lVert x \rVert$ of $x$ is the shortest length $n$ of a word $s_1s_2 \dots s_n$ in $S$ that expresses $x$.

Gromov’s theorem connects a group property of $G$ with the growth of the cardinality of the ball $B(r) := \left\{x \in G : \lVert x \rVert \le r\right\}$ of radius $r$. The ball $B(r)$ can be seen as the set of vertices within distance $r$ from the identity element of $G$ in the Cayley graph of $(G, S)$.

Definition (nilpotent and virtually nilpotent). A group $H$ is nilpotent of class $n$ if there is a lower central series $$H = H_0 \rhd H_1 \rhd \dots \rhd H_n = \{e\},$$ where $H_{i+1} = [H_i, H]$. A group $G$ is virtually nilpotent if there is a finite index subgroup $H$ of $G$ that is nilpotent.

Example 1. When $G$ is abelian, $G$ is nilpotent and $\lvert B(r) \rvert = r^{\mathrm{rank}(G)}$.

Example 2. When $G$ is the discrete Heisenberg group, that is $$G = \left\{ \begin{pmatrix} 1 & a & b \\ & 1 & c \\ & & 1 \end{pmatrix} : a, b, c \in \mathbb{Z} \right\},$$ we have the lower central series $$G > \langle \begin{pmatrix} 1 & & 1 \\ & 1 & \\ & & 1\end{pmatrix} \rangle > \{e\},$$ and the growth of $\lvert B(r) \rvert$ is bounded by $r^4$.

Theorem (Gromov 1981). For every group $G$ generated by a finite symmetric set $S$, $\lvert B(r)\rvert$ is at most polynomial in $r$ if and only if $G$ is virtually nilpotent.

Gromov’s proof uses the deep Montgomery–Zippin–Yamabe structure theory of locally compact groups on Hilbert’s fifth problem. Colding and Minicozzi, solving a conjecture of Yau, showed that the space of harmonic functions with polynomial growth on an open manifold with non-negative Ricci curvature. To state (a weak version of) the discrete analog of their result, we need the notion of Lipschitz harmonic on the group $G$ with the finite symmetric generating set $S$.

Definition (Lipschitz and harmonic). A function $f\colon G\to\mathbb{R}$ is Lipschitz if $\sup_{g \in G, s \in S}\lvert f(gs) - f(g) \rvert$ is finite, and is harmonic if $f(g) = \frac{1}{\lvert S \rvert} \sum_{s \in S} f(gs)$ for all $g \in G$.

Theorem (Colding and Minicozzi 1997, Kleiner 2010). If $\vert B(r)\lvert$ is at most polynomial in $r$, then the linear space of Lipschitz harmonic functions on $G$ is finite dimensional.

Kleiner provided a new proof of Gromov’s theorem by establishing the Colding–Minicozzi theorem from scratch. Later Shalom and Tao pushed Kleiner’s methods to obtain the following quantitative version of Gromov’s theorem.

Theorem (Shalom and Tao 2010). There exists an absolute constant $c$ such that if $\lvert B(r) \rvert \le r^d$ for some $r > \exp(\exp(cd^c))$ then $G$ contains a finite index subgraph $H$ which is nilpotent of class $\le c^d$.

In the rest of the article, we will prove the Colding–Minicozzi theorem through the Poincaré inequality and the reverse Poincaré inequality.

#### Poincaré inequality

Lemma (Poincaré inequality). For every function $f \colon G \to \mathbb{R}$, if $f$ has mean $0$ on $B(r)$, the $l^2$-norm of $f$ on $B(r)$ is bounded by the fluctuation of $f$ on $B(2r)$: $$\sum_{x\in B(r)}f(x)^2 \le \frac{\lvert B(2r) \rvert}{\lvert B(r) \rvert}\cdot 2r^2 \sum_{x,y\in B(2r), x\sim y}(f(x) - f(y))^2.$$

Proof. One can check that the left hand side is equal to $$\frac{1}{2\lvert B(r)\rvert}\sum_{x,y\in B(r)}(f(x)-f(y))^2.$$

For each $z \in B(2r)$, we fix a shortest path $e = z_0, z_1, \dots, z_{\lVert z \rVert} = z$ from $e$ to $z$ in the Cayley graph of $(G, S)$. Given $x,y\in B(r)$, let $z = x^{-1}y \in B(2r)$ and get $$f(x) - f(y) = \sum_{i=1}^{\lVert z \rVert}f(xz_{i-1})-f(xz_i) \\ \implies (f(x)-f(y))^2 \le \lVert z \rVert \sum_{i=1}^{\lVert z \rVert}(f(xz_{i-1})-f(xz_i))^2.$$

When summing over the last inequality over all $x,y\in B(r)$, we can regroup the summands by $z$ and $i$ as follows: $$\sum_{z \in B(2r)}\lVert z\rVert\sum_{i=1}^{\lVert z \rVert} \left(\sum_{x\in B(r) : xz \in B(r)}(f(xz_{i-1})-f(xz_i))^2\right).$$

Fix $z$ and $i$ for a moment. One can show that both $xz_{i-1}$ and $xz_i$ are in $B(2r)$, and moreover the directed edges $(xz_{i-1}, xz_{i})$ are distinct when $x$ varies in $B(r)$. Thus $$\sum_{x\in B(r) : xz \in B(r)}(f(xz_{i-1})-f(xz_i))^2 \le \sum_{x,y\in B(2r), x\sim y}(f(x)-f(y))^2.$$

We obtain the Poincaré inequality by putting everything together and noticing that $\lVert z \rVert \le 2r$.

#### Reverse Poincaré inequality

Lemma (Reverse Poincaré inequality). For every harmonic function $f\colon G \to \mathbb{R}$, the fluctuation of $f$ on $B(r)$ is bounded by the $l^2$-norm of $f$ on $B(2r)$: $$\sum_{x,y\in B(r), x\sim y}(f(x) - f(y))^2 \le \lvert S \vert \cdot 4r^{-2}\sum_{x\in B(2r)}f(x)^2.$$

To facilitate the proof, we introduce the following notations. Given a function $f\colon G\to \mathbb{R}$ and $s\in S$, write $f_s(x) := f(xs)$ and $\partial_s f := f_s - f$. It is easy to see that

1. $\sum_{s\in S}\partial_{s^{-1}}\partial_s f = 0$ when $f$ is harmonic, and
2. $\sum_{x\in G}f(x)\partial_s g(x) = \sum_{x\in G}\partial_{s^{-1}}f(x) g(x)$ when $f$ or $g$ is finitely supported.

Proof. Fix the harmonic function $f\colon G\to \mathbb{R}$ and let $\phi\colon G \to [0,1]$ be defined by $$\phi(x) = \begin{cases} 1 & \text{if }\lVert x\rVert \le r,\\ 2 - \lVert x\rVert/r & \text{if }r < \lVert x\rVert < 2r, \\ 0 & \text{if }\lVert x\rVert \ge 2r.\end{cases}$$

For every $s \in S$, note that $\partial_s (f\phi^2) = (\partial_s f)\phi^2 + f_s(\partial_s \phi^2)$ and $\partial_s \phi^2 = (\partial_s \phi)\phi + \phi_s (\partial_s\phi) = (\partial_s \phi)(2\phi + \partial_s \phi)$. We obtain \begin{aligned}\partial_s f \partial_s (f\phi^2) & = (\partial_s f)^2 \phi^2 + (\partial_s f)f_s(\partial_s \phi)(2\phi + \partial_s\phi) \\ & \ge \tfrac{1}{2}(\partial_s f)^2\phi^2 - 2(f_s)^2(\partial_s \phi)^2 + (\partial_s f)f_s(\partial_s\phi)^2 \\ & = \tfrac{1}{2}(\partial_s f)^2\phi^2 - f_s(f_s + f)(\partial_s\phi)^2 \\ & \ge \tfrac{1}{2}(\partial_s f)^2\phi^2 - \tfrac{1}{2}(3f_s^2 + f^2)(\partial_s\phi)^2. \end{aligned}

When summing the above inequality over all $s\in S$ and $x \in G$, by noticing that $$\sum_{s\in S}\sum_{x\in G}\partial_s f(x) \partial_s (f(x)\phi(x)^2) = \sum_{s\in S}\sum_{x\in G}(\partial_{s^{-1}}\partial_s f(x)) f(x)\phi(x)^2 \\ = \sum_{x\in G}\left(\sum_{s\in S}\partial_{s^{-1}}\partial_s f(x)\right) f(x)\phi^2(x) = 0,$$ we get $$\sum_{x\in G}\sum_{s \in S}(\partial_s f(x))^2\phi(x)^2 \le \sum_{x\in G}\sum_{s \in S}(3f_s(xs)^2+f(x)^2)(\partial_s \phi(x))^2 .$$

The left hand side of the last inequality is at least $$\sum_{x,y\in B(r), x\sim y}(f(x)-f(y))^2,$$ whereas the right hand side is at most $$4\lvert S\rvert \tfrac{1}{r^2}\sum_{r \le \lVert x\rVert \le 2r}f(x)^2$$ because $(\partial_s\phi)^2 \le 1/r^2$ and $\partial_s\phi(x)^2 > 0$ only if both $x$ and $xs$ are in $\{x \in G : r\le \lVert x\rVert \le 2r\}$.

#### Colding–Minicozzi theorem

To simplify the presentation, we will assume the doubling constant $\lvert B(2r) \rvert / \lvert B(r) \rvert$ is uniformly bounded at all scales $r$, which, for example, is indeed the case when $\lvert B(r)\rvert = \Theta(r^d)$. In general, one needs the pigeonhole principle to select the correct radii for the argument below to work.

Proof assuming the doubling constant is uniformly bounded. Suppose for the sake of contradiction, the dimension of the linear space consisting of Lipschitz harmonic functions on $G$ is at least $n$, where the parameter $n$ will be determined later. Denote by $V$ the $n$-dimensional linear subspace.

Let $k$ be a natural number to be determined soon, and fix $r$ for a moment. Let $\mathcal{A}_r$ be a maximal collection of disjoint balls of radius $r/2$ with centers in $B(kr)$, and let $\mathcal{B}_r$ be the collection of balls with the same centers of the balls in $\mathcal{A}_r$, but of radius $r$. Let $V_r$ be the linear subspace of $V$ consisting of harmonic functions in $V$ that average to $0$ on each ball in $\mathcal{B}_r$. Note that the co-dimension of $V_r$ as a subspace of $V$ is at most $\lvert \mathcal{B}_r \rvert = \lvert \mathcal{A}_r \rvert$, which is at most $\lvert B(kr+r/2) \rvert / \lvert B(r/2) \rvert = O(1) =: C$.

For every harmonic function $f$ in $V_r$, using the fact that $B(kr) \subseteq \cup\mathcal{B}_r$, the fact that each point in $G$ is covered by $2B$ for at most $\lvert B(2r+r/2) \rvert / \lvert B(r/2) \rvert = O(1)$ many $B \in \mathcal{B}_r$, the Poincaré inequality and the reverse Poincaré inequality, we get \begin{aligned}\sum_{x \in B(kr)}f(x)^2 & \le \sum_{B \in \mathcal{B}(r)}\sum_{x\in B}f(x)^2 \\ & \lesssim r^2\sum_{B \in \mathcal{B}(r)}\sum_{x,y \in 2B, x\sim y}(f(x)-f(y))^2 \\ & \lesssim r^2\sum_{x, y \in B(kr+2r), x\sim y}(f(x)-f(y))^2 \\ & \lesssim \frac{1}{(k+2)^2}\sum_{x \in B(2(k+2)r)}f(x)^2.\end{aligned} Now take $k$ large enough (depending only on the group $G$) so that for all $f\in V_r$, $$3^d\sum_{x\in B(kr)}f(x)^2 \le \sum_{x\in B(3kr)}f(x)^2.$$

Consider the quadratic form $Q_r$ on $V$ defined by $Q_r(f) := \sum_{x\in B(r)}f(x)^2$. Since the kernels of $Q_r$‘s form a descending chain of vector spaces, there exists $r_0$ such that $Q_r$ is positive-definite for all $r \ge r_0$.

For every $r \ge r_0$, let $q(r)$ be the volume of the ellipsoid $E_r$ induced by $Q_r$. To be more precise, after fixing a basis $\{f_1, \dots, f_n\}$ of $V$, the ellipsoid is defined by $$E_r := \{(c_1, \dots, c_n) \in \mathbb{R}^n : Q_r(c_1f_1 + \dots c_nf_n) \le 1\}.$$ By scaling and translation, we may assume that $f_i$ is $1$-Lipschitz and $f_i(e)=0$ (whenever $f_i$ is non-constant). Since $\lvert B(r) \rvert$ is at most polynomial in $r$, there exists a natural number $d$ (depending only on the group $G$) such that $\sum_{x \in B(r)} f_i(x)^2 \le r^d$ for all $i \in [n]$. By Cauchy–Schwarz inequality, we have \begin{aligned}Q_r(c_1f_1 + \dots + c_nf_n) & = \sum_{x\in B(r)}(c_1f_1(x)+\dots+c_nf_n(x))^2 \\ & \le n\sum_{x\in B(r)}c_1^2f_1(x)^2 + \dots c_n^2f_n(x)^2 \\ & \le (c_1^2+\dots+c_n^2)n^2r^d.\end{aligned} Therefore $E_r$ contains the ball of radius $(nr^{d/2})^{-1}$, hence $q_r \ge v_n(nr^{d/2})^{-n}$, where $v_n$ is the volume of the $n$-dimensional Euclidean unit ball.

Although the volume $q(r)$ of the ellipsoid is not intrinsic to $Q_r$, the ratio between $q(r)$ and $q(r')$ does not depend on the choice of the basis.

After a linear transformation, we may assume the symmetric matrices associated to $Q_{kr}$ and $Q_{3kr}$ are of the form $$\begin{pmatrix}A_1 & B_1 \\ B_1^T & C_1 \end{pmatrix}, \begin{pmatrix}A_3 & \\ & C_3\end{pmatrix},$$ where $A_1, A_3$ act on $V_r$. Using the Schur complement, we obtain that the volume ratio $q_{3kr}/q_{kr}$ is $$\frac{\det Q_{kr}}{\det Q_{3kr}} = \frac{\det A_1 \det (C_1 - B_1^TA_1^{-1}B_1)}{\det A_3 \det C_3}.$$ As $B_1^TA_1^{-1}B_1$ is positive semi-definite, we obtain $\det(C_1 - B_1^TA_1^{-1}B_1) \le \det C_1$ and so $q_{3kr}/q_{kr}$ is at most $$\frac{\det A_1\det C_1}{\det A_3\det C_3}.$$ Recall that $3^dQ_{kr}(f) \le Q_{3kr}(f)$ for all $f \in V_r$ and clearly $Q_{kr}(f) \le Q_{3kr}(f)$ for all $f \in V$. In other words, $3^dA_1 \preceq A_3$ and $C_1 \preceq C_3$ and so $q_{kr}/q_{3kr} \ge (3^d)^{\dim V_r} \ge (3^d)^{n-C}$.

Choose $n$ so that $(3^d)^{n-C} \ge 2^{dn}$ hence $2^{dn}q(3kr) \le q(kr)$. Repeatedly apply the last inequality to obtain $q(kr) \ge 2^{mdn}q(3^mkr) \ge 2^{mdn}v_n(n(3^mkr)^{d/2})^{-n} = \Omega_{d,n}((2/\sqrt{3})^{mdn})$, which is absurd for $m$ sufficiently large.

Categories

## On the basis exchange property

One of the students in my class, Undergraduate Seminar on Discrete Mathematics, asks if the exchange property of a matroid can be strengthened to the following, of which I was completely unaware.

Strong basis exchange property. For every pair of bases $B_1, B_2$ and $x_1 \in B_1$, there exists $x_2 \in B_2$ such that both $B_1 - x_1 + x_2$ and $B_2 + x_1 - x_2$ are still bases.

As it turns out, Brualdi proved back in 1969 the strong exchange property (Theorem 2) in Comments on bases in dependence structures.

I record below the proof which showcases the benefit of viewing matroids both graphically and linear-algebraically.

As per Aaron Berger‘s suggestion, we use a graphic matroid to motivate the proof. Suppose $T_1$ and $T_2$ are two spanning trees of a connected graph $G$ and $e_1$ is an edge in $T_1$. Clearly, when the edge $e_1$ is also in $T_2$, we can simply exchange $e_1$ with itself. Assume from now that $e_1$ is not in $T_2$. Note that $T_2 + e_1$ contains a unique cycle $C$, each edge of which can be “traded” to $T_1$. To be more precise, for every $e_2 \in C$, $T_2 + e_1 - e_2$ is still a spanning tree. It is then easy to see that adding back some edge from $C - e_1$ can reconnect the two connected components resulted from removing $e_1$ from $T_1$.

As the unique cycle (or circuit) $C$ plays a central role in the above argument, one can show the same concept is valid in any matroid.

Lemma (Fundamental circuit). For every independent set $I$, if an element $e$ satisfies that $I + e$ is dependent, then $I + e$ contains a unique circuit $C$, called the fundamental circuit. Moreover, for every $f \in C$, $I + e - f$ is independent.

Proof of strong basis exchange property. Without loss of generality, we may assume that $e_1 \not\in B_2$. As $B_2$ is a maximal independent set and $B_2 + e_1$ is dependent, there is a unique circuit $C \subseteq B_2 + e_1$ such that $B_2 + e_1 - e_2$ is independent for every $e_2 \in C$.

Finally we resort to our intuition of representable matroids. Since $e_1 \in \mathrm{span}(C - e_1) \subseteq \mathrm{span}(B_1 + C - e_1)$, we have $\mathrm{span}(B_1 + C - e_1) = \mathrm{span}(B_1 + C)$, hence $B_1 + C - e_1$ contains a basis, say $B_1'$. By the ordinary exchange property for $B_1 - e_1$ and $B_1'$, there is $e_2 \in B_1' - (B_1 - e_1) \subseteq C - e_1$ such that $B_1 - e_1 + e_2$ is independent.

Categories

## Minimal Distance to Pi

Here is a problem from Week of Code 29 hosted by Hackerrank.

Problem. Given two integers $q_1$ and $q_2$ ($1\le q_1 \le q_2 \le 10^{15}$), find and print a common fraction $p/q$ such that $q_1 \le q \le q_2$ and $\left|p/q-\pi\right|$ is minimal. If there are several fractions having minimal distance to $\pi$, choose the one with the smallest denominator.

Note that checking all possible denominators does not work as iterating for $10^{15}$ times would exceed the time limit (2 seconds for C or 10 seconds for Ruby).

The problem setter suggested the following algorithm in the editorial of the problem:

1. Given $q$, it is easy to compute $p$ such that $r(q) := p/q$ is the closest rational to $\pi$ among all rationals with denominator $q$.
2. Find the semiconvergents of the continued fraction of $\pi$ with denominators $\le 10^{15}$.
3. Start from $q = q_1$, and at each step increase $q$ by the smallest denominator $d$ of a semiconvergent such that $r(q+d)$ is closer to $\pi$ than $r(q)$. Repeat until $q$ exceeds $q_2$.

Given $q$, let $d = d(q)$ be the smallest increment to the denominator $q$ such that $r(q+d)$ is closer to $\pi$ than $r(q)$. To justify the algorithm, one needs to prove the $d$ is the denominator of one of the semiconvergents. The problem setter admits that he does not have a formal proof.

Inspired by the problem setter’s approach, here is a complete solution to the problem. Note that $\pi$ should not be special in this problem, and can be replaced by any other irrational number $\theta$. Without loss of generality, we may assume that $\theta\in(0,1)$.

We first introduce the Farey intervals of $\theta$.

1. Start with the interval $(0/1, 1/1)$.
2. Suppose the last interval is $(a/b, c/d)$. Cut it by the mediant of $a/b$ and $c/d$ and choose one of the intervals $(a/b, (a+c)/(b+d)), ((a+c)/(b+d), c/d)$ that contains $\theta$ as the next interval.

We call the intervals appeared in the above process Farey intervals of $\theta$. For example, take $\theta = \pi - 3 = 0.1415926...$. The Farey intervals are:

$$\begin{gathered}(0/1, 1/1), (0/1, 1/2), (0/1, 1/3), (0/1, 1/4), (0/1, 1/5), \\ (0/1, 1/6), (0/1, 1/7), (1/8, 1/7), (2/15, 1/7),\cdots\end{gathered}$$

The Farey sequence of order $n$, denoted by $F_n$, is the sequence of completely reduced fractions between 0 and 1 which when in lowest terms have denominators less than or equal to $n$, arranged in order of increasing size. Fractions which are neighboring terms in any Farey sequence are known as a Farey pair. For example, Farey sequence of order 5 is
$$F_5 = (0/1, 1/5, 1/4, 1/3, 2/5, 1/2, 3/5, 2/3, 3/4, 4/5, 1/1).$$

Using the Stern–Brocot tree, one can prove that

Lemma 1. For every Farey interval $(a/b, c/d)$ of $\theta$, the pair $(a/b, c/d)$ is a Farey pair. Conversely, for every Farey pair $(a/b, c/d)$, if $\theta \in (a/b, c/d)$, then $(a/b, c/d)$ is a Farey interval.
We say $p/q$ is a good rational approximation of $\theta$ if every rational between $p/q$ and $\theta$ (exclusive) has a denominator greater than $q$.

By the definition of Farey sequence, incorporating with Lemma 1, we know that

Lemma 2. A rational is an endpoint of a Farey interval of $\theta$ if and only if it is a good rational approximation of $\theta$.

In fact, one can show that the endpoints of Farey intervals and semiconvergents of continued fraction are the same thing! Thereof, the problem setter’s claim follows immediately from:

Proposition. Given $q$, let $r(q) = p / q$ be the rational closest to $\theta$ with denominator $q$. If $d = d(q)$ is the minimal increment to $q$ such that $r(q + d) = (p + c) / (q + d)$ is closer to $\theta$ than $r(q)$, then $c/d$ is a good rational approximation.

Remark. The proposition states that the increments to $p/q$ always come from a good rational approximation. It is stronger than the problem setter’s statement, which only asserts that the increment to the $q$ comes from a good rational approximation.

Proof. In $(x, y)$-plane, plot the trapezoid defined by

$$\left| y/x - \theta \right| < \left|p/q - \theta\right|, q < x < q + d.$$

Also we interpret rational numbers $p/q, (p+c)/(q+d)$ as points $A = (q, p), B = (q+d, p+c)$. Let the line through $(q, p)$ parallel to $y=\theta x$ intersect the vertical line $x = q+d$ at $C = (q+d, p+\theta d)$. By the definition of $d$, we know that the trapezoid does not contain lattice points. In particular, there is no lattice point in the interior of the triangle $ABC$. In the coordinate system with origin at $A$, $B$ has coordinate $(d, c)$ and the line through $A, C$ is $y = \theta x$. Since triangle $ABC$ contains no lattice points, there is no $(b, a)$ with $b < d$ such that $a/b$ is between $\theta$ and $c/d$. In other words, $c/d$ is a good rational approximation.

Here is a fine print of the algorithm. Because floats may not have enough precision for the purpose of computation, we will instead use a convergent of the continuous fraction of $\pi$ instead. All the computations will then happen in $\mathbb{Q}$. Finally, we present the algorithm.

P = Rational(5706674932067741, 1816491048114374) - 3
# find endpoints of Farey intervals
a, b, c, d = 0, 1, 1, 1
farey = [[a,b],[c,d]]
while (f = b + d) <= max - min
farey << [e = a + c, f]
if P < Rational(e, f)
c, d = e, f
else
a, b = e, f
end
end
min, max = gets.split.map(&:to_i)
p_min = (P * q).round
# increase p_min/min by frations in farey
while min <= max
c, d = nil, nil
farey.each do |a, b|
break if min + b > max
if (Rational(p_min + a, min + b) - P).abs < (Rational(p_min, min) - P).abs
c, d = a, b
break
end
end
break if d == nil
p_min += c; min += d
end
puts "#{p_min + 3 * min}/#{min}"

Categories

## A Short Proof of the Nash-Williams' Partition Theorem

Notations.

1. $\mathbb{N}$ – the set of natural numbers;
2. $\binom{M}{k}$ – the family of all subsets of $M$ of size $k$;
3. $\binom{M}{<\omega}$ – the family of all finite subsets of $M$;
4. $\binom{M}{\omega}$ – the family of all infinite subsets of $M$;

The infinite Ramsey theorem, in its simplest form, states that for every partition $\binom{\mathbb{N}}{k} = \mathcal{F}_1 \sqcup \dots \sqcup \mathcal{F}_r$, there exists an infinite set $M\subset \mathbb{N}$ such that $\binom{M}{k}\subset \mathcal{F}_i$ for some $i\in [r]$. The Nash-Williams‘ partition theorem can be seen as a strengthening of the infinite Ramsey theorem, which considers a partition of a subset of $\binom{\mathbb{N}}{<\omega}$.

Notations.

1. $\mathcal{F}\restriction M$$\mathcal{F}\cap 2^M$, that is, the set $\{s\in\mathcal{F} : s\subset M\}$.
2. $s \sqsubset t$, where $s,t$ are subsets of $\mathbb{N}$$s$ is an initial segment of $t$, that is $s = \{n\in t : n \le \max s\}$.

Definition. Let set $\mathcal{F} \subset \binom{\mathbb{N}}{<\omega}$.

1. $\mathcal{F}$ is Ramsey if for every partition $\mathcal{F}=\mathcal{F}_1\sqcup \dots\sqcup\mathcal{F}_r$ and every $M\in\binom{\mathbb{N}}{\omega}$, there is $N\in\binom{M}{\omega}$ such that $\mathcal{F}_i\restriction N = \emptyset$ for all but at most one $i\in[r]$.
2. $\mathcal{F}$ is a Nash-Williams family if for all $s, t\in\mathcal{F}, s\sqsubset t \implies s = t$.

Theorem (Nash-Williams 1965). Every Nash-Williams family is Ramsey.

The proof presented here is based on the proof given by Prof. James Cummings in his Infinite Ramsey Theory class. The purpose of this rewrite is to have a proof that resembles the one of the infinite Ramsey theorem.

Notation. Let $s\in\binom{\mathbb{N}}{<\omega}$ and $M\in\binom{\mathbb{N}}{\omega}$. Denote $$[s, M] = \left\{t \in \binom{\mathbb{N}}{<\omega} : t \sqsubset s \text{ or } (s \sqsubset t \text{ and } t\setminus s \subset M)\right\}.$$

Definition. Fix $\mathcal{F}\subset \binom{\mathbb{N}}{<\omega}$ and $s\in \binom{\mathbb{N}}{<\omega}$.

1. $M$ accepts $s$ if $[s, M]\cap \mathcal{F}\neq \emptyset$ and $M$ rejects $s$ otherwise;
2. $M$ strongly accepts $s$ if every infinite subset of $M$ accepts $s$;
3. $M$ decides $s$ if $M$ either rejects $s$ or strongly accepts it.

We list some properties that encapsulates the combinatorial characteristics of the definitions above.

Properties.

1. If $M$ decides (or strongly accepts, or rejects) $s$ and $N\subset M$, then $N$ decides (respectively strongly accepts, rejects) $s$ as well.
2. For every $M\in\binom{\mathbb{N}}{\omega}$ and $s\in\binom{\mathbb{N}}{<\omega}$, there is $N_1\in\binom{M}{\omega}$ deciding $s$. Consequently, there is $N_2\in\binom{M}{\omega}$ deciding every subset of $s$.

Proof of Theorem. Enough to show that if $\mathcal{F} = \mathcal{F}_1\sqcup \mathcal{F}_2$, then for every $M\in\binom{\mathbb{N}}{\omega}$, there is infinite $N\in \binom{M}{\omega}$ such that $F_i \restriction N = \emptyset$ for some $i\in[2]$.

We are going to use $\mathcal{F}_1$ instead of $\mathcal{F}$ in the definitions of “accept”, “reject”, “strongly accept” and “decide”. Find $N\in \binom{M}{\omega}$ that decides $\emptyset$. If $N$ rejects $\emptyset$, by definition $\mathcal{F}_1\restriction N = [\emptyset, N]\cap \mathcal{F}_1 = \emptyset$. Otherwise $N$ strongly accepts $\emptyset$.

Inductively, we build a decreasing sequence of infinite sets $N \supset N_1 \supset N_2\supset \dots$, an increasing sequence of natural numbers $n_1, n_2, \dots$, and maintain that $n_i\in N_i, n_i < \min N_{i+1}$ and that $N_i$ strongly accepts every $s\subset \{n_j : j < i\}$. Initially, we take $N_1 = N$ as $N$ strongly accepts $\emptyset$.

Suppose $N_1 \supset \dots \supset N_i$ and $n_1 < \dots < n_{i-1}$ have been constructed. In the following lemma, when taking $M = N_i$ and $s = \{n_j : j < i\}$, it spits out $m$ and $N$, which are exactly what we need for $n_i$ and $N_{i+1}$ to finish the inductive step.

Lemma. Suppose $M\in\binom{\mathbb{N}}{\omega}$, $s\in\binom{\mathbb{N}}{<\omega}$ and $\max s < \min M$. If $M$ strongly accepts every subset of $s$, then there are $m \in M$ and $N \in \binom{M}{\omega}$ such that $n < \min N$ and $N$ strongly accepts every subset of $s\cup \{n\}$

Proof of lemma. We can build $M = M_0 \supset M_1\supset M_2 \supset \dots$ such that for every $i$, $m_i := \min M_i < \min M_{i+1}$ and $M_{i+1}$ decides every subset of $s\cup \{m_i\}$. It might happen that $M_{i+1}$ rejects a subset of $s\cup \{m_i\}$. However, we claim that this cannot happen for infinitely many times.

Otherwise, by the pigeonhole principle, there is $t\subset s$ such that $I = \{i : M_{i+1} \text{ rejects }t\cup\{m_{i}\}\}$ is infinite. Let $M' = \{m_i : i\in I\}$. Note that $[t, M'] \subset \cup_i [t\cup\{m_i\}, M_{i+1}]$, and so $[t,M']\cap \mathcal{F}_1\subset \cup_i \left([t\cup\{m_i\}, M_{i+1}]\cap\mathcal{F}_1\right) = \emptyset$. Hence $M'\subset M$ rejects $t\subset s$, which is a contradiction.

Now we pick one $i$ such that $M_{i+1}$ strongly accepts every subset of $s\cup\{m_i\}$, and it is easy to check that $m = m_i$ and $N = M_{i+1}$ suffice.

Finally, we take $N_\infty = \{n_1, n_2, \dots\}$. For any $s\in\binom{N_\infty}{<\omega}$, there is $i$ such that $s\subset \{n_1, \dots, n_{i-1}\}$. Note that $N_i$ strongly accepts $s$ and $N_\infty\subset N_i$. Therefore $N_\infty$ (strongly) accepts $s$, that is $[s, N_\infty]\cap \mathcal{F}_1 \neq \emptyset$, and say $t\in [s, N_\infty]\cap \mathcal{F}_1$. Because $t\in\mathcal{F}_1$ and $\mathcal{F} = \mathcal{F}_1 \sqcup \mathcal{F}_2$ is a Nash-Williams family, $s\notin \mathcal{F}_2$.

Categories

## Alternative to Beamer for Math Presentation

Although using blackboard and chalk is the best option for a math talk for various reasons, sometimes due to limit on the time, one has to make slides to save time on writing. The most common tools to create slides nowadays are LaTeX and Beamer.

When I was preparing for my talk at Vancouver for Connections in Discrete Mathematics in honor of the work of Ron Graham, as it is my first ever conference talk, I decided to ditch Beamer due to my lack of experience. Finally, I ended up using html+css+javascript to leverage my knowledge in web design.

The javascript framework I used is reveal.js. Though there are other options such as impress.js, reveal.js fits better for a math talk. One can easily create a text-based presentation with static images / charts. The framework also has incorporated with MathJax as an optional dependency, which can be added with a few lines of code. What I really like about reveal.js as well as impress.js is that they provide a smooth spatial transition between slides. However, one has to use other javascript library to draw and animate diagrams. For that, I chose raphael.js, a javascript library that uses SVG and VML for creating graphics so that users can easily, for example, create their own specific chart. The source code of the examples on the official website is really a good place to start.

To integrate reveal.js and raphael.js to achieve a step-by-step animation of a diagram, I hacked it by adding a dummy fragment element in my html document so that reveal.js can listen to the fragmentshown event and hence trigger raphael.js to animate the diagram. In cases where the diagrams are made of html elements, I used jQuery to control the animation. Here is my favorite animation in the slides generated by jQuery.

However, one has to make more effort to reverse the animation made by raphael.js or jQuery if one wants to go backwards in slides. I did not implement any reverse animation since I did not plan to go back in slides at all.

In case there is no internet access during the presentation, one has to have copies of all external javascript libraries (sometimes also fonts), which, in my case, are MathJax, raphael.js and jQuery. In order to use MathJax offline, one need to configure reveal.js.

Currently, my slides only work on Chrome correctly. There is another bug that I have not figured out yet. If I start afresh from the first slide, then my second diagram generated by Raphael is not rendered correctly. I got around it by refreshing the slide where the second diagram lives. This is still something annoying that I would like to resolve.

After all, I really like this alternative approach of making slides for math presentation because it enables me to implement whatever I imagine.

Categories

## A Short Proof for Hausdorff Moment Problem

Hausdorff moment problem asks for necessary and sufficient conditions that a given sequence $(m_n)$ with $m_0=1$ be the sequence of moments of a random variable $X$ supported on $[0,1]$, i.e., $\operatorname{E}X^n=m_n$ for all $n$.

In 1921, Hausdorff showed that $(m_n)$ is such a moment sequence if and only if the sequence is completely monotonic, i.e., its difference sequences satisfy the equation $(D^r m)_s \ge 0$ for all $r, s \ge 0$. Here $D$ is the difference operator on the space of real sequences $(a_n)$ given by $D a = (a_{n} - a_{n+1})$.

The proof under the fold follows the outline given in (E18.5 – E18.6) Probability with Martingales by David Williams.

Proof of Necessity. Suppose $(m_n)$ is the moment sequence of a random variable $X$ supported on $[0,1]$. By induction, one can show that $(D^r m)_s = \operatorname{E}(1-X)^rX^s$. Clearly, as $X$ is supported on $[0,1]$, the moment sequence is completely monotonic.

Proof of Sufficiency. Suppose $(m_n)$ is a completely monotonic sequence with $m_0 = 1$.

Define $F_n(x) := \sum_{i \le nx}{n\choose i}(D^{n-i}m)_i$. Clearly, $F_n$ is right-continuous and non-decreasing, and $F_n(0^-) = 0$. To prove $F_n(1) = 1$, one has to prove the identity $$\sum_{i}{n\choose i}(D^{n-i}m)_i = m_0.$$

A classical trick. Since the identity above is about vectors in the linear space (over the reals) spanned by $(m_n)$ and the linear space spanned by $(m_n)$ is isomorphic to the one spanned by $(\theta^n)$, the identity is equivalent to $$\sum_{i}{n\choose i}(D^{n-i}\theta)_i = \theta^0,$$ where $\theta_n = \theta^n$. Now, we take advantage of the ring structure of $\mathbb{R}[\theta]$. Notice that $(D^{r}\theta)_s = (1-\theta)^r\theta^s$. Using the binomial theorem, we obtain $$\sum_{i}{n\choose i}(D^{n-i}\theta)_i = \sum_{i}{n\choose i}(1-\theta)^{n-i}\theta^i = (1-\theta + \theta)^n = \theta^0.$$

Therefore $F_n$ is a bona fide distribution function. Define $m_{n, k} := \int_{[0,1]} x^kdF_n(x)$, i.e., $m_{n,k}$ is the $k$th moment of $F_n$. We now find an explicit formula for $m_{n,k}$.

Noticing that $F_n$ is constant, say $c_{n,i}$, on $[\frac{i}{n}, \frac{i+1}{n})$, for all $i=0, \dots, n-1$ and $c_{n,i}$ is a linear combination of $m_0, \dots, m_n$, we know that $m_{n,k} = \sum_{i=0}^n a_{n,k,i}m_i$.

Just like what we did in proving the identity, we use the special case $m_n = \theta^n$ to compute the coefficients $a_i = a_{n,k,i}$, where $0 \le \theta \le 1$. In this case, $$F_n(x) = \sum_{i \le nx}{n\choose i}(D^{n-i}\theta)_i = \sum_{i\le nx}{n\choose i}(1-\theta)^{n-i}\theta^i, m_{n,k} = \sum_{i=0}^n a_{i}\theta^i.$$

Now consider the situation in which a coin with probability $\theta$ is tossed at times $1,2,\dots$. The random variable $H_k$ is $1$ if the $k$th toss produces heads, $0$ otherwise. Define $A_n := (H_1 + \dots + H_n)/n$. It is immediate from the formula of $F_n$ that $F_n$ is the distribution function of $A_n$, and so $m_{n,k}$ is the $k$th moment of $A_n$. However, one can calculate the $k$th moment of $A_n$ explicitly. Let $f\colon [k] \to [n]$ be chosen uniformly at random, $Im_f$ be the cardinality of the image of $f$ and denote by $p_i = p_{n,k,i} := \operatorname{Pr}(Im_f = i)$. Using $f, Im_f$ and $p_i$, we obtain \begin{aligned}\operatorname{E}A_n^k & = \operatorname{E}\left(\frac{H_1 + \dots + H_n}{n}\right)^k = \operatorname{E}H_{f(1)}\dots H_{f(k)} \\ & = \operatorname{E}\operatorname{E}[H_{f(1)}\dots H_{f(k)}\mid Im_f] = \operatorname{E}\theta^{Im_f} = \sum_{i=0}^n p_{i}\theta^i.\end{aligned} Therefore, for all $\theta\in [0,1]$, we know that $\sum_{i=0}^n a_i\theta^i = \sum_{i=0}^n p_i\theta^i$, and so $a_i = p_i$ for all $i=0,\dots, n$.

As both $(a_i)$ and $(p_i)$ do not depend on $m_i$, $a_i = p_i$ holds in general. Since $p_k = p_{n, k, k} = \prod_{i=0}^{k-1}(1-i/n)\to 1$ as $n\to\infty$ and $p_i = 0$ for all $i > k$, we know that $\lim_n m_{n,k}= m_k$.

Using the Helly–Bray Theorem, since $(F_n)$ is tight, there exists a distribution function $F$ and a subsequence $(F_{k_n})$ such that $F_{k_n}$ converges weakly to $F$. The definition of weak convergence implies that $$\int_{[0,1]} x^k dF(x) = \lim_n \int_{[0,1]}x^k dF_{k_n}(x) = \lim_n m_{k_n,k} = m_k.$$ Therefore, the random variable $X$ with distribution function $F$ is supported on $[0,1]$ and its $k$th moment is $m_k$.

There are other two classical moment problems: the Hamburger moment problem and the Stieltjes moment problem.

Categories

Categories

## An Upper Bound on Stirling Number of the Second Kind

We shall show an upper bound on the Stirling number of the second kind, a byproduct of a homework exercise of Probabilistic Combinatorics offered by Prof. Tom Bohman.

Definition. A Stirling number of the second kind (or Stirling partition number) is the number of ways to partition a set of $n$ objects into $k$ non-empty subsets and is denoted by $S(n,k)$.

Proposition. For all $n, k$, we have $$S(n,k) \leq \frac{k^n}{k!}\left(1-(1-1/m)^k\right)^m.$$

Proof. Consider a random bipartite graph with partite sets $U:=[n], V:=[k]$. For each vertex $u\in U$, it (independently) connects to exactly one of the vertices in $V$ uniformly at random. Suppose $X$ is the set of non-isolated vertices in $V$. It is easy to see that $$\operatorname{Pr}\left(X=V\right) = \frac{\text{number of surjections from }U\text{ to }V}{k^n} = \frac{k!S(n,k)}{k^n}.$$

On the other hand, we claim that for any $\emptyset \neq A \subset [k]$ and $i \in [k]\setminus A$, $$\operatorname{Pr}\left(i\in X \mid A\subset X\right) \leq \operatorname{Pr}\left(i\in X\right).$$ Note that the claim is equivalent to $$\operatorname{Pr}\left(A\subset X \mid i\notin X\right) \geq \operatorname{Pr}\left(A\subset X\right).$$ Consider the same random bipartite graph with $V$ replaced by $V':=[k]\setminus \{i\}$ and let $X'$ be the set of non-isolated vertices in $V'$. The claim is justified since $$\operatorname{Pr}\left(A\subset X\mid i\notin X\right) = \operatorname{Pr}\left(A\subset X'\right) \geq \operatorname{Pr}\left(A\subset X\right).$$

Set $A:=[i-1]$ in above for $i = 2, \ldots, k$. Using the multiplication rule with telescoping the conditional probability, we obtain \begin{aligned}\operatorname{Pr}\left(X=V\right) =& \operatorname{Pr}\left(1\in X\right)\operatorname{Pr}\left(2\in X \mid [1]\subset X\right) \\ & \ldots \operatorname{Pr}\left(k\in X\mid [k-1]\subset X\right)\\ \leq & \operatorname{Pr}\left(1\in X\right)\operatorname{Pr}\left(2\in X\right)\ldots\operatorname{Pr}\left(k\in X\right) \\ = & \left(1-(1-1/m)^k\right)^m.\end{aligned}

Categories

## A Probabilistic Proof of Isoperimetric Inequality

This note is based on Nicolas Garcia Trillos’ talk, Some Problems and Techniques in Geometric Probability, at Carnegie Mellon University on January 29, 2015.

In particular, we demonstrate a probabilistic proof of the isoperimetric inequality. The proof can also be found in Integral Geometry and Geometric Probability by Luis A. Santaló.

Theorem. For a convex set with perimeter $L$ and area $A$, the isoperimetric inequality states that $4\pi A\leq L^2$, and that equality holds if and only if the convex set is a disk. (Assume that the boundary is a closed convex curve of class $C^1$.)

Proof. Let $P(s)$ parametrize the boundary by its arc length $s$. Given $s$ and $\theta$. Suppose the line through $P(s)$ whose angle to the tangent line equals $\theta$ intersects the boundary at another point $Q(s)$. Let $\sigma(s, \theta)$ be the length of the chord between the two intersections $P(s), Q(s)$. Consider the integral $$\int (\sigma_1\sin\theta_2 - \sigma_2\sin\theta_1)^2 \mathrm{d}s_1\mathrm{d}\theta_1\mathrm{d}s_2\mathrm{d}\theta_2,$$ where the integration extends over $0 \leq s_1, s_2 \leq L$ and $0 \leq \theta_1, \theta_2 \leq \pi$.

Expanding the square in the integrand, we obtain that the integral is equal to $$\pi L \int \sigma^2\mathrm{d}s\mathrm{d}\theta - 2\left(\int \sigma\sin\theta\mathrm{d}s\mathrm{d}\theta\right)^2.$$
On one hand, we have $$\int \sigma^2\mathrm{d}s\mathrm{d}\theta = \int_0^L\int_0^\pi \sigma^2\mathrm{d}\theta\mathrm{d}s = \int_0^L 2A\mathrm{d}s = 2LA.$$

On the other hand, let $p$ be the distance from the chord to the origin and $\phi$ the angle from the $x$-axis to the chord. Suppose the angle from the $x$-axis to the tangent line is $\tau$. We have $$p = \langle x, y\rangle\cdot\langle \sin\phi, -\cos\phi \rangle = x\sin\phi - y\cos\phi.$$ Differentiating the latter, we get $$\mathrm{d}p = \sin\phi\mathrm{d}x - \cos\phi\mathrm{d}y + (x\cos\phi + y\sin\phi)\mathrm{d}\phi.$$ Moreover, we know that $$\mathrm{d}x = \cos\tau\mathrm{d}s, \mathrm{d}y = \sin\tau\mathrm{d}s.$$ Therefore $$\mathrm{d}p = \sin\phi\cos\tau\mathrm{d}s - \cos\phi\sin\tau\mathrm{d}s + + (x\cos\phi + y\sin\phi)\mathrm{d}\phi,$$ and so $$\mathrm{d}p\mathrm{d}\phi = \sin(\phi - \tau)\mathrm{d}s\mathrm{d}\phi.$$ Since $\theta + \phi = \tau$ and $\mathrm{d}\theta + \mathrm{d}\phi = \tau'\mathrm{d}s$, we have $$\mathrm{d}p\mathrm{d}\phi = -\sin\theta\mathrm{d}s\mathrm{d}\theta,$$ and so $$\int\sigma\sin\theta\mathrm{d}s\mathrm{d}\theta = \int_0^{2\pi}\int_{-\infty}^\infty \sigma\mathrm{d}p\mathrm{d}\theta = 2\pi A.$$

Since the integral is non-negative, we have that $2\pi A(L^2 - 4\pi A)\geq 0$, and so $4\pi A \leq L^2$. The equality is achieved if and only if $\sigma / \sin\theta$ is a constant, in which case the boundary is a circle.

Remark. The proof is considered a probabilistic proof because the differential form $\mathrm{d}p\mathrm{d}\theta$ is the measure (invariant under rigid motion) of a random line.

Categories

## On Extension of Rationals

This note is based on my talk An Expedition to the World of $p$-adic Numbers at Carnegie Mellon University on January 15, 2014.

#### Construction from Cauchy Sequences

A standard approach to construct the real numbers from the rational numbers is a procedure called completion, which forces all Cauchy sequences in a metric space by adding new points to the metric space.

Definition. A norm, denoted by $|\cdot|$, on the field $F$ is a function from $F$ to the set of nonnegative numbers in an ordered field $R$ such that (1) $|x|=0$ if and only if $x=0$; (2) $|xy|=|x||y|$; (3) $|x+y|\leq|x|+|y|$.

Remark. The ordered field is usually chosen to be $\mathbb{R}$. However, to construct of $\mathbb{R}$, the ordered field is $\mathbb{Q}$.

A norm on the field $F$ naturally gives rise to a metric $d(x,y)=|x-y|$ on $F$. For example, the standard metric on the rationals is defined by the absolute value, namely, $d(x, y) = |x-y|$, where $x,y\in\mathbb{Q}$, and $|\cdot|$ is the standard norm, i.e., the absolute value, on $\mathbb{Q}$.

Given a metric $d$ on the field $F$, the completion procedure considers the set of all Cauchy sequences on $F$ and an equivalence relation
$$(a_n)\sim (b_n)\text{ if and only if }d(a_n, b_n)\to 0\text{ as }n\to\infty.$$

Definition. Two norms on $F$, $|\cdot|_1, |\cdot|_2$, are equivalent if and only if for every sequence $(a_n)$, it is a Cauchy sequence with respect to $d_1$ if and only if it is so with respect to $d_2$, where $d_1, d_2$ are the metrics determined by $|\cdot|_1, |\cdot|_2$ respectively.

Remark. It is reasonable to worry about the situation in which two norms $|\cdot|_1$ and $|\cdot|_2$ are equivalent, but they introduce different equivalent relationships on the set of Cauchy sequences. However, given two equivalent norms, $|a_n|_1$ converges to 0 if and only if it does so with respect to $d_2$. (Hint: prove by contradiction and consider the sequence $(1/a_n)$.)

Definition. The trivial norm on $F$ is a norm $|\cdot|$ such that $|0|=0$ and $|x|=1$ for $x\neq 0$.

Since we are interested in norms that generate different completions of the field, it would be great if we can classify all nontrivial norms modulo the norm equivalence.

#### Alternative Norms on Rationals

Definition. Let $p$ be any prime number. For any non-zero integer $a$, let $\mathrm{ord}_pa$ be the highest power of $p$ which divides $a$. For any rational $x=a/b$, define $\mathrm{ord}_px=\mathrm{ord}_pa-\mathrm{ord}_pb$. Further define a map $|\cdot|_p$ on $\mathbb{Q}$ as follows: $$|x|_p = \begin{cases}p^{-\mathrm{ord}_px} & \text{if }x\neq 0 \\ 0 & \text{if }x=0\end{cases}.$$

Proposition. $|\cdot|_p$ is a norm on $\mathbb{Q}$. We call it the $p$-adic norm on $\mathbb{Q}$.

Proof (sketch). We only check the triangle inequality. Notice that \begin{aligned}\mathrm{ord}_p\left(\frac{a}{b}-\frac{c}{d}\right) & = \mathrm{ord}_p\left(\frac{ad-bc}{bd}\right) \\ & = \mathrm{ord}_p(ad-bc)-\mathrm{ord}_p(bd) \\ & \geq \min(\mathrm{ord}_p(ad), \mathrm{ord}_p(bc)) - \mathrm{ord}_p(bd) \\ &= \min(\mathrm{ord}_p(ad/bd), \mathrm{ord}_p(bc/bd)).\end{aligned} Therefore, we obtain $|a/b-c/d|_p \leq \max(|a/b|_p, |c/d|_p) \leq |a/b|_p+|c/d|_p$.

Remark. Some counterintuitive fact about the $p$-adic norm are
The following theorem due to Ostrowski classifies all possible norms on the rationals up to norm equivalence. We denote the standard absolute value by $|\cdot|_\infty$.

Theorem (Ostrowski 1916). Every nontrivial norm $|\cdot|$ on $\mathbb{Q}$ is equivalent to $|\cdot|_p$ for some prime $p$ or for $p=\infty$.

Proof. We consider two cases (i) There is $n\in\{1,2,\ldots\}$ such that $|n|>1$; (ii) For all $n\in\{1,2,\ldots\}$, $|n|\leq 1$. As we shall see, in the 1st case, the norm is equivalent to $|\cdot|_\infty$, whereas, in the 2nd case, the norm is equivalent to $|\cdot|_p$ for some prime $p$.

Case (i). Let $n_0$ be the least such $n\in\{1,2,\ldots\}$ such that $|n|>1$. Let $\alpha > 0$ be such that $|n_0|=n_0^\alpha$.

For every positive integer $n$, if $n_0^s \leq n < n_0^{s+1}$, then we can write it in $n_0$-base: $$n = a_0 + a_1n_0 + \ldots + a_sn_0^s.$$

By the choice of $n_0$, we know that $|a_i|\leq 1$ for all $i$. Therefore, we obtain \begin{aligned}|n| & \leq |a_0| + |a_1||n_0| + \ldots + |a_s||n_0|^s \\ & \leq 1 + |n_0| + \ldots |n_0|^s \\ & \leq n_0^{s\alpha}\left(1+n_0^{-\alpha}+n_0^{-2\alpha}+\ldots\right) \\ & \leq Cn^\alpha,\end{aligned} where $C$ does not depend on $n$. Replace $n$ by $n^N$ and get $|n|^N = |n^N| \leq Cn^{N\alpha}$, and so $|n|\leq \sqrt[N]{C}n^\alpha$. As we can choose $N$ to be arbitrarily large, we obtain $|n| \leq n^\alpha$.

On the other hand, we have \begin{aligned}|n| & \geq |n_0^{s+1}| - |n_0^{s+1}-n| \\ & \geq n_0^{(s+1)\alpha} - (n_0^{s+1}-n_0^s)^\alpha\\ & = n_0^{(s+1)\alpha}\left[1-(1-1/n_0)^\alpha\right] \\ & > C'n^\alpha.\end{aligned} Using the same trick, we can actually take $C'=1$.

Therefore $|n| = n^\alpha$. It is easy to see it is equivalent to $|\cdot|_\infty$.

Case (ii). Since the norm is nontrivial, let $n_0$ be the least $n$ such that $|n|<1$.

Claim 1. $n_0=p$ is a prime.

Claim 2. $|q|=1$ if $q$ is a prime other than $p$.

Suppose $|q| < 1$. Find $M$ large enough so that both $|p^M|$ and $|q^M|$ are less than $1/2$. By Bézout’s lemma, there exists $a,b\in\mathbb{Z}$ such that $ap^M + bq^M = 1$. However, $$1 = |1| \leq |a||p^M| + |b||q^M| < 1/2 + 1/2 = 1,$$ a contradiction.

Therefore, we know $|n|=|p|^{ord_p n}$. It is easy to see it is equivalent to $|\cdot|_p$.

#### Non-Archimedean Norm

As one might have noticed, the $p$-adic norm satisfies an inequality stronger than the triangle inequality, namely $|a\pm b|_p\leq \max(|x|_p, |y|_p)$.

Definition. A norm is non-Archimedean provided $|x\pm y|\leq \max(|x|, |y|)$.

The world of non-Archimedean norm is good and weird. Here are two testimonies.

Proposition (no more scalene triangles). If $|x|\neq |y|$, then $|x\pm y| = \max(|x|, |y|)$.

Proof. Suppose $|x| < |y|$. On one hand, we have $|x\pm y| \leq |y|$. On the other hand, $|y| \leq \max(|x\pm y|, |x|)$. Since $|x| < |y|$, we must have $|y| \leq |x\pm y|$.

Proposition (all points are centers). $D(a, r^-) = D(b, r^-)$ for all $b\in D(a, r^-)$ and $D(a, r) = D(b,r)$ for all $b\in D(a,r)$, where $D(c, r^-) = \{x : |x-c| and $D(c,r)=\{x:|x-c|\leq r\}$.

#### Construction of $p$-adic Numbers

The $p$-adic numbers are the completion of $\mathbb{Q}$ via the $p$-adic norm.

Definition. The set of $p$-adic numbers is defined as $$\mathbb{Q}_p = \{\text{Cauchy sequences with respect to }|\cdot|_p\} / \sim_p,$$ where $(a_n)\sim_p(b_n)$ iff $|a_n-b_n|_p\to 0$ as $n\to\infty$.

We would like to extend $|\cdot|_p$ from $\mathbb{Q}$ to $\mathbb{Q}_p$. When extending $|\cdot|_\infty$ from $\mathbb{Q}$ to $\mathbb{R}$, we set $|[(a_n)]|_\infty$ to be $[(|a_n|)]$, an element in $\mathbb{R}$. However, the values that $|\cdot|_p$ can take, after the extension, are still in $\mathbb{Q}$.

Definition. The extension of $|\cdot|_p$ on $\mathbb{Q}_p$ is defined by $|[(a_n)]|_p = \lim_{n\to\infty}|a_n|_p$.

Remark. Suppose $(a_n)\sim_p (a_n')$. Then $\lim_{n\to\infty}|a_n-a_n'|_p=0$, and so $\lim_{n\to\infty}|a_n|_p=\lim_{n\to\infty}|a_n'|_p$. Moreover, one can prove that $\lim_{n\to\infty}|a_n|_p$ always exists provided that $(a_n)$ is a Cauchy sequence. (Hint: Suppose $\lim_{n\to\infty}|a_n|_p > 0$. There exists $\epsilon > 0$ such that $|a_n|_p > \epsilon$ infinitely often. Choose $N$ enough so that $|a_m - a_n|_p < \epsilon$ for all $m,n>N$. Use ‘no more scalene triangles!’ property to deduce a contradiction.)

#### Representation of $p$-adic Numbers

Even though each real number is an equivalence class of Cauchy sequences, each equivalence class has a canonical representative. For instance, the canonical representative of $\pi$ is $(3, 3.1, 3.14, 3.141, 3.1415, \ldots)$. The analog for $\mathbb{Q}_p$ is the following.

Theorem. Every equivalence class $a$ in $\mathbb{Q}_p$ for which $|a|_p\leq 1$ has exactly one representative Cauchy sequence of the form $(a_i)$ for which (1) $0\leq a_i < p^i$ for $i=1,2,3,\ldots$; (2) $a_i = a_{i+1} (\pmod p^i)$ for $i=1,2,3,\ldots$.

Proof of uniqueness. Prove by definition chasing.

Proof of existence. We shall repeatedly apply the following lemma.

Lemma. For every $b\in\mathbb{Q}$ for which $|b|_p \leq 1$ and $i\in\mathbb{N}$, there exists $a\in\{0, \ldots, p^i-1\}$ such that $|a-b|_p \leq p^{-i}$.

Proof of Lemma. Suppose $b=m/n$ is in the lowest form. As $|b|_p\leq 1$, we know that $(n, p^i)=1$. By Bézout’s lemma, $an+a'p^i=m$ for some integers $a,a'$. We may assume $a\in\{0,\ldots,p^i-1\}$. Note that $a-b=a'p^i/n$, and so $|a-b|_p \leq p^{-i}$.

Suppose $(c_i)$ is a representative of $a$. As $(c_i)$ is a Cauchy sequence, we can extract a subsequence $(b_i)$ such that $|b_i - b_j| \leq p^{-i}$ for all $i < j$ which is still a representative of $a$. Using the lemma above, for each $b_i$, we can find $0 \leq a_i < q^i$ such that $|a_i-b_i|_p \leq q^{-i}$. Therefore $(a_i)$ is a representative of $a$ as well. For all $i, we have $|a_i-a_j|_p \leq \max(|a_i - b_i|_p, |b_i-b_j|_p, |b_j-a_j|_p) \leq q^{-i}$, Therefore $q^i$ divides $a_i - a_j$.

For $|a|_p\leq 1$, we write $a=b_0 + b_1p + b_2p^2 + \ldots$, where $(b_{n-1}b_{n-2}\ldots b_0)_p = a_n$.

What if $|a|_p > 1$? As $|ap^m|=|a|/p^m$, $|ap^m|\leq 1$ for some natural number $m$. By the representation theorem, we can write $ap^m = b_0 + b_1p + b_2p^2 + \ldots$, and $a = b_0p^{-m} + b_1p^{-m+1} + b_2p^{-m+2} + \ldots$.

Using the representation of $p$-adic numbers, one can perform arithmetic operations such as addition, subtraction, multiplication and division.

Like $\mathbb{R}$, $\mathbb{Q}_p$ is not algebraically closed. Though $\mathbb{C}$, the algebraic closure of $\mathbb{R}$, has degree $2$ over $\mathbb{R}$, and it is complete with respect to the absolute value, it is not so for $\overline{\mathbb{Q}_p}$, the algebraic closure of $\mathbb{Q}_p$. In fact, $\overline{\mathbb{Q}_p}$ has infinite degree over $\mathbb{Q}_p$ and is, unfortunately, not complete with respect to proper extension of $|\cdot|_p$. The good news is that the completion of $\overline{\mathbb{Q}_p}$, denoted by $\Omega$ is algebraically closed.