Discrete Thoughts

On the Stirling numbers of the first kind

Zilin — Sat, 20 Nov 2021 02:13:40 +0000

I am teaching MAT 412/512 Introduction to Combinatorics this semester. Earlier in the semester, I introduced the Stirling numbers of the second kind, and I did not find a good opportunity to introduce the Stirling numbers of the first kind until I started introducing permutation groups and group actions to build up to the Pólya enumeration theorem.

To my surprise, the timing turned out to be perfect: the Stirling numbers of the first kind fit naturally right after the Pólya enumeration theorem because of the following combinatorial identity.

Theorem. For every natural number $n$ ,
$x(x+1)(x+2) \dots (x + n - 1) = \sum_{k=0}^n \left[ n \atop k \right]x^k.$

Recall that the (unsigned) Stirling number $\left[ n \atop k \right]$ of the first kind is the number of permutations of $\{1, 2, \dots, n\}$ with exactly $k$ cycles. Certainly one can establish the above combinatorial identity by induction via the recurrence:
$\left[ n \atop k \right] = (n-1)\left[ {n-1} \atop k \right] + \left[ {n-1} \atop {k-1} \right].$

But the inductive proof buries the combinatorial story behind the identity. While I was preparing for the lectures, I rediscovered a different proof using the Pólya enumeration theorem. Let me record it here.

Proof. We first assume that $x$ is an arbitrary positive integer. We count the number of colorings of $n$ identical balls using $x$ colors. Certainly, pre the Pólya enumeration theorem, the number of such colorings is exactly the number of multisets of size $n$ with elements in $\{1, 2, \dots, x\}$ , which is $\binom{x+n-1}{n}$ . Post the Pólya enumeration theorem, the number of such colorings is given by
$\frac{1}{n!}\sum_{g\in S_n}x^{c(g)},$
where $S_n$ is the symmetric group on $\{1, 2, \dots, n\}$ , and $c(g)$ denotes the number of cycles in the permutation $g$ . The above expression can be rewritten as
$\frac{1}{n!}\sum_{k=0}^n \left[ n \atop k \right]x^k.$

Equating the two counts and multiplying both sides by $n!$ establishes the combinatorial identity for all positive integers $x$ . If the polynomial on the left hand side were not identical to that on the right hand side, then, as both have degree at most $n$ , the identity could hold for at most $n$ values of $x$ , which is a contradiction.[qed]
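The identity is easy to test numerically. Below is a minimal sketch in Ruby, assuming nothing beyond the recurrence and the cycle-counting formula above; the method names (`stirling_first_row`, `colorings_by_polya`, and so on) are made up for illustration.

```ruby
# Row n of the unsigned Stirling numbers of the first kind, built from the
# recurrence [n, k] = (n - 1) * [n - 1, k] + [n - 1, k - 1] with [0, 0] = 1.
def stirling_first_row(n)
  row = [1]
  (1..n).each do |m|
    nxt = Array.new(m + 1, 0)
    (0..m).each do |k|
      nxt[k] += (m - 1) * row[k] if k < row.size
      nxt[k] += row[k - 1] if k >= 1
    end
    row = nxt
  end
  row
end

# Left hand side: x (x + 1) ... (x + n - 1).
def rising_factorial(x, n)
  (0...n).reduce(1) { |prod, i| prod * (x + i) }
end

# Right hand side: sum over k of [n, k] * x^k.
def stirling_polynomial(x, n)
  stirling_first_row(n).each_with_index.sum { |coeff, k| coeff * x**k }
end

# Number of cycles of a permutation given as an array mapping i to perm[i].
def cycle_count(perm)
  seen = Array.new(perm.size, false)
  cycles = 0
  perm.each_index do |i|
    next if seen[i]
    cycles += 1
    j = i
    until seen[j]
      seen[j] = true
      j = perm[j]
    end
  end
  cycles
end

# (1 / n!) * sum over g in S_n of x^{c(g)}, by brute force over S_n.
def colorings_by_polya(x, n)
  total = (0...n).to_a.permutation.sum { |g| x**cycle_count(g) }
  total / (1..n).reduce(1, :*)
end

puts stirling_first_row(4).inspect
puts rising_factorial(3, 4) == stirling_polynomial(3, 4)
puts colorings_by_polya(3, 4)
```

The brute-force sum over $S_n$ is only feasible for small $n$ , but it ties the recurrence back to the cycle counts used in the proof.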

On Gromov’s theorem on groups of polynomial growth

Zilin — Sun, 24 Nov 2019 05:16:30 +0000

This article documents my presentation of Gromov’s theorem on groups of polynomial growth at the MIT combinatorics reading group. The presentation is based on Gromov’s 1981 paper, Groups of polynomial growth and expanding maps, Kleiner’s 2007 paper, A new proof of Gromov’s theorem on groups of polynomial growth, and Tao’s 2009 blog post, A finitary version of Gromov’s polynomial growth theorem.

Introduction

Throughout, we fix a finitely generated group $G$ and a finite symmetric generating set $S$ (that is $\forall x \in S. x^{-1}\in S$ ). For every element $x \in G$ , the word length $\lVert x \rVert$ of $x$ is the shortest length $n$ of a word $s_1s_2 \dots s_n$ in $S$ that expresses $x$ .

Gromov’s theorem connects a group property of $G$ with the growth of the cardinality of the ball $B(r) := \left\{x \in G : \lVert x \rVert \le r\right\}$ of radius $r$ . The ball $B(r)$ can be seen as the set of vertices within distance $r$ from the identity element of $G$ in the Cayley graph of $(G, S)$ .

Definition (nilpotent and virtually nilpotent). A group $H$ is nilpotent of class $n$ if there is a lower central series $H = H_0 \rhd H_1 \rhd \dots \rhd H_n = \{e\},$ where $H_{i+1} = [H_i, H]$ . A group $G$ is virtually nilpotent if there is a finite index subgroup $H$ of $G$ that is nilpotent.

Example 1. When $G$ is abelian, $G$ is nilpotent and $\lvert B(r) \rvert = \Theta(r^{\mathrm{rank}(G)})$ .

Example 2. When $G$ is the discrete Heisenberg group, that is $G = \left\{ \begin{pmatrix} 1 & a & b \\ & 1 & c \\ & & 1 \end{pmatrix} : a, b, c \in \mathbb{Z} \right\},$ we have the lower central series $G > \langle \begin{pmatrix} 1 & & 1 \\ & 1 & \\ & & 1\end{pmatrix} \rangle > \{e\},$ and the growth of $\lvert B(r) \rvert$ is bounded by $r^4$ .
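To get a feel for the two examples, the ball sizes can be computed by breadth-first search in the Cayley graph. The sketch below is illustrative only: the helper `ball_sizes`, the encoding of a Heisenberg matrix as the triple $(a, b, c)$ , and the choice of generators (the two elementary matrices with $a = 1$ and with $c = 1$ , and their inverses) are assumptions made here, not taken from the presentation.

```ruby
require "set"

# Sizes |B(0)|, ..., |B(max_r)| of the word-metric balls, by breadth-first
# search. The block is the group law on the caller's encoding of elements.
def ball_sizes(identity, generators, max_r, &multiply)
  seen = Set[identity]
  frontier = [identity]
  sizes = [1]
  max_r.times do
    next_frontier = []
    frontier.each do |g|
      generators.each do |s|
        h = multiply.call(g, s)
        next_frontier << h if seen.add?(h)
      end
    end
    frontier = next_frontier
    sizes << seen.size
  end
  sizes
end

# Z^2 with the standard symmetric generating set.
Z2_GENS = [[1, 0], [-1, 0], [0, 1], [0, -1]]
Z2_SIZES = ball_sizes([0, 0], Z2_GENS, 6) { |(a, b), (c, d)| [a + c, b + d] }

# Heisenberg group: (a, b, c) encodes the upper unitriangular matrix with
# entries a, b in the first row and c in the second row.
HEIS_GENS = [[1, 0, 0], [-1, 0, 0], [0, 0, 1], [0, 0, -1]]
HEIS_SIZES = ball_sizes([0, 0, 0], HEIS_GENS, 6) do |(a, b, c), (a2, b2, c2)|
  [a + a2, b + b2 + a * c2, c + c2]
end

puts Z2_SIZES.inspect
puts HEIS_SIZES.inspect
```

For $\mathbb{Z}^2$ the sizes follow the quadratic $2r^2 + 2r + 1$ , while the Heisenberg counts grow visibly faster even though both groups have two generators up to inverses.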

Theorem (Gromov 1981). For every group $G$ generated by a finite symmetric set $S$ , $\lvert B(r)\rvert$ is at most polynomial in $r$ if and only if $G$ is virtually nilpotent.

Gromov’s proof uses the deep Montgomery–Zippin–Yamabe structure theory of locally compact groups on Hilbert’s fifth problem. Colding and Minicozzi, solving a conjecture of Yau, showed that the space of harmonic functions with polynomial growth of a fixed degree on an open manifold with non-negative Ricci curvature is finite dimensional. To state (a weak version of) the discrete analog of their result, we need the notion of Lipschitz harmonic functions on the group $G$ with the finite symmetric generating set $S$ .

Definition (Lipschitz and harmonic). A function $f\colon G\to\mathbb{R}$ is Lipschitz if $\sup_{g \in G, s \in S}\lvert f(gs) - f(g) \rvert$ is finite, and is harmonic if $f(g) = \frac{1}{\lvert S \rvert} \sum_{s \in S} f(gs)$ for all $g \in G$ .
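As a small illustration of the two conditions, take $G = \mathbb{Z}^2$ with the standard generators. The sketch below checks them on a finite window only, so it can refute harmonicity but merely suggests the Lipschitz property; the helper names and the three sample functions are assumptions chosen for this example.

```ruby
Z2_GENERATORS = [[1, 0], [-1, 0], [0, 1], [0, -1]]
WINDOW = (-10..10).to_a.product((-10..10).to_a)

# Largest |f(gs) - f(g)| over g in the window and s in S.
def max_increment(f, window)
  window.flat_map do |x, y|
    Z2_GENERATORS.map { |dx, dy| (f.call(x + dx, y + dy) - f.call(x, y)).abs }
  end.max
end

# Does f(g) equal the average of f over the neighbours of g, for g in the window?
def harmonic_on?(f, window)
  window.all? do |x, y|
    Z2_GENERATORS.sum { |dx, dy| f.call(x + dx, y + dy) } == Z2_GENERATORS.size * f.call(x, y)
  end
end

LINEAR   = ->(x, y) { 2 * x - 3 * y }   # harmonic and Lipschitz
SADDLE   = ->(x, y) { x * x - y * y }   # harmonic, increments grow with the window
PARABOLA = ->(x, y) { x * x + y * y }   # not harmonic

[LINEAR, SADDLE, PARABOLA].each do |f|
  puts "harmonic: #{harmonic_on?(f, WINDOW)}, max increment: #{max_increment(f, WINDOW)}"
end
```

The saddle $x^2 - y^2$ is harmonic but its increments are unbounded, so it is not counted in the theorem below; only functions like the linear one are.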

Theorem (Colding and Minicozzi 1997, Kleiner 2010). If $\lvert B(r)\rvert$ is at most polynomial in $r$ , then the linear space of Lipschitz harmonic functions on $G$ is finite dimensional.

Kleiner provided a new proof of Gromov’s theorem by establishing the Colding–Minicozzi theorem from scratch. Later Shalom and Tao pushed Kleiner’s methods to obtain the following quantitative version of Gromov’s theorem.

Theorem (Shalom and Tao 2010). There exists an absolute constant $c$ such that if $\lvert B(r) \rvert \le r^d$ for some $r > \exp(\exp(cd^c))$ then $G$ contains a finite index subgroup $H$ which is nilpotent of class $\le c^d$ .

In the rest of the article, we will prove the Colding–Minicozzi theorem through the Poincaré inequality and the reverse Poincaré inequality.

Poincaré inequality

Lemma (Poincaré inequality). For every function $f \colon G \to \mathbb{R}$ , if $f$ has mean $0$ on $B(r)$ , the $l^2$ -norm of $f$ on $B(r)$ is bounded by the fluctuation of $f$ on $B(2r)$ : $\sum_{x\in B(r)}f(x)^2 \le \frac{\lvert B(2r) \rvert}{\lvert B(r) \rvert}\cdot 2r^2 \sum_{x,y\in B(2r), x\sim y}(f(x) - f(y))^2.$

Proof. One can check that the left hand side is equal to $\frac{1}{2\lvert B(r)\rvert}\sum_{x,y\in B(r)}(f(x)-f(y))^2.$
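The identity can be checked by expanding the square. A short derivation, using only that $f$ has mean $0$ on $B(r)$ , that is $\sum_{x\in B(r)}f(x) = 0$ :

$$\begin{aligned}\sum_{x,y\in B(r)}(f(x)-f(y))^2 & = \sum_{x,y\in B(r)}\left(f(x)^2 - 2f(x)f(y) + f(y)^2\right) \\ & = 2\lvert B(r)\rvert\sum_{x\in B(r)}f(x)^2 - 2\Big(\sum_{x\in B(r)}f(x)\Big)^2 \\ & = 2\lvert B(r)\rvert\sum_{x\in B(r)}f(x)^2.\end{aligned}$$

Dividing by $2\lvert B(r)\rvert$ gives the left hand side of the Poincaré inequality.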

For each $z \in B(2r)$ , we fix a shortest path $e = z_0, z_1, \dots, z_{\lVert z \rVert} = z$ from $e$ to $z$ in the Cayley graph of $(G, S)$ . Given $x,y\in B(r)$ , let $z = x^{-1}y \in B(2r)$ and get $f(x) - f(y) = \sum_{i=1}^{\lVert z \rVert}f(xz_{i-1})-f(xz_i) \\ \implies (f(x)-f(y))^2 \le \lVert z \rVert \sum_{i=1}^{\lVert z \rVert}(f(xz_{i-1})-f(xz_i))^2.$

Summing the last inequality over all $x,y\in B(r)$ , we can regroup the summands on the right hand side by $z$ and $i$ as follows: $\sum_{z \in B(2r)}\lVert z\rVert\sum_{i=1}^{\lVert z \rVert} \left(\sum_{x\in B(r) : xz \in B(r)}(f(xz_{i-1})-f(xz_i))^2\right).$

Fix $z$ and $i$ for a moment. One can show that both $xz_{i-1}$ and $xz_i$ are in $B(2r)$ , and moreover the directed edges $(xz_{i-1}, xz_{i})$ are distinct when $x$ varies in $B(r)$ . Thus $\sum_{x\in B(r) : xz \in B(r)}(f(xz_{i-1})-f(xz_i))^2 \le \sum_{x,y\in B(2r), x\sim y}(f(x)-f(y))^2.$

We obtain the Poincaré inequality by putting everything together and noticing that $\lVert z \rVert \le 2r$ . [qed]
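For a concrete instance, the inequality can be evaluated in the simplest case $G = \mathbb{Z}$ , $S = \{+1, -1\}$ , where $B(r) = \{-r, \dots, r\}$ . The sketch below is an illustration under that assumption; `poincare_sides` and the sample functions are hypothetical names, and the sum over $x \sim y$ is taken over ordered pairs, as in the proof.

```ruby
# Both sides of the Poincaré inequality for G = Z with S = {+1, -1}.
# f is first recentred so that it has mean 0 on B(r).
def poincare_sides(f, r)
  ball = (-r..r).to_a
  big_ball = (-2 * r..2 * r).to_a
  mean = ball.sum { |x| Rational(f.call(x)) } / ball.size
  g = ->(x) { f.call(x) - mean }
  lhs = ball.sum { |x| g.call(x)**2 }
  # Sum of (g(x) - g(y))^2 over ordered pairs of neighbours inside B(2r).
  energy = big_ball.sum do |x|
    [x - 1, x + 1].select { |y| y.abs <= 2 * r }.sum { |y| (g.call(x) - g.call(y))**2 }
  end
  doubling = Rational(big_ball.size, ball.size)
  [lhs, doubling * 2 * r**2 * energy]
end

SAMPLE_FUNCTIONS = [
  ->(x) { x },
  ->(x) { x**3 - 2 * x },
  ->(x) { x.even? ? 1 : -1 },
  ->(x) { x.abs }
]

SAMPLE_FUNCTIONS.each_with_index do |f, i|
  lhs, rhs = poincare_sides(f, 5)
  puts "function #{i}: #{lhs.to_f} <= #{rhs.to_f}"
end
```

Exact rational arithmetic is used so that the comparison is not affected by rounding.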

Reverse Poincaré inequality

Lemma (Reverse Poincaré inequality). For every harmonic function $f\colon G \to \mathbb{R}$ , the fluctuation of $f$ on $B(r)$ is bounded by the $l^2$ -norm of $f$ on $B(2r)$ : $\sum_{x,y\in B(r), x\sim y}(f(x) - f(y))^2 \le \lvert S \rvert \cdot 4r^{-2}\sum_{x\in B(2r)}f(x)^2.$

To facilitate the proof, we introduce the following notations. Given a function $f\colon G\to \mathbb{R}$ and $s\in S$ , write $f_s(x) := f(xs)$ and $\partial_s f := f_s - f$ . It is easy to see that

$\sum_{s\in S}\partial_{s^{-1}}\partial_s f = 0$ when $f$ is harmonic, and
$\sum_{x\in G}f(x)\partial_s g(x) = \sum_{x\in G}\partial_{s^{-1}}f(x) g(x)$ when $f$ or $g$ is finitely supported.
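Both facts follow by unfolding the notation; a short derivation is recorded here for completeness. For the first, note that

$$\partial_{s^{-1}}\partial_s f(x) = \partial_s f(xs^{-1}) - \partial_s f(x) = 2f(x) - f(xs) - f(xs^{-1}),$$

so summing over $s \in S$ and using that $S$ is symmetric gives $2\lvert S\rvert f(x) - 2\sum_{s\in S}f(xs)$ , which vanishes when $f$ is harmonic. For the second, substituting $x \mapsto xs^{-1}$ in the first sum below gives

$$\sum_{x\in G}f(x)\partial_s g(x) = \sum_{x\in G}f(x)g(xs) - \sum_{x\in G}f(x)g(x) = \sum_{x\in G}f(xs^{-1})g(x) - \sum_{x\in G}f(x)g(x) = \sum_{x\in G}\partial_{s^{-1}}f(x)g(x),$$

where the finite support of $f$ or $g$ guarantees that all the sums are finite.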

Proof. Fix the harmonic function $f\colon G\to \mathbb{R}$ and let $\phi\colon G \to [0,1]$ be defined by $\phi(x) = \begin{cases} 1 & \text{if }\lVert x\rVert \le r,\\ 2 - \lVert x\rVert/r & \text{if }r < \lVert x\rVert < 2r, \\ 0 & \text{if }\lVert x\rVert \ge 2r.\end{cases}$

For every $s \in S$ , note that $\partial_s (f\phi^2) = (\partial_s f)\phi^2 + f_s(\partial_s \phi^2)$ and $\partial_s \phi^2 = (\partial_s \phi)\phi + \phi_s (\partial_s\phi) = (\partial_s \phi)(2\phi + \partial_s \phi)$ . We obtain $\begin{aligned}\partial_s f \partial_s (f\phi^2) & = (\partial_s f)^2 \phi^2 + (\partial_s f)f_s(\partial_s \phi)(2\phi + \partial_s\phi) \\ & \ge \tfrac{1}{2}(\partial_s f)^2\phi^2 - 2(f_s)^2(\partial_s \phi)^2 + (\partial_s f)f_s(\partial_s\phi)^2 \\ & = \tfrac{1}{2}(\partial_s f)^2\phi^2 - f_s(f_s + f)(\partial_s\phi)^2 \\ & \ge \tfrac{1}{2}(\partial_s f)^2\phi^2 - \tfrac{1}{2}(3f_s^2 + f^2)(\partial_s\phi)^2. \end{aligned}$

Summing the above inequality over all $s\in S$ and $x \in G$ , and noticing that $\sum_{s\in S}\sum_{x\in G}\partial_s f(x) \partial_s (f(x)\phi(x)^2) = \sum_{s\in S}\sum_{x\in G}(\partial_{s^{-1}}\partial_s f(x)) f(x)\phi(x)^2 \\ = \sum_{x\in G}\left(\sum_{s\in S}\partial_{s^{-1}}\partial_s f(x)\right) f(x)\phi(x)^2 = 0,$ we get $\sum_{x\in G}\sum_{s \in S}(\partial_s f(x))^2\phi(x)^2 \le \sum_{x\in G}\sum_{s \in S}(3f(xs)^2+f(x)^2)(\partial_s \phi(x))^2 .$

The left hand side of the last inequality is at least $\sum_{x,y\in B(r), x\sim y}(f(x)-f(y))^2,$ whereas the right hand side is at most $4\lvert S\rvert \tfrac{1}{r^2}\sum_{r \le \lVert x\rVert \le 2r}f(x)^2$ because $(\partial_s\phi)^2 \le 1/r^2$ and $\partial_s\phi(x)^2 > 0$ only if both $x$ and $xs$ are in $\{x \in G : r\le \lVert x\rVert \le 2r\}$ . [qed]

Colding–Minicozzi theorem

To simplify the presentation, we will assume the doubling constant $\lvert B(2r) \rvert / \lvert B(r) \rvert$ is uniformly bounded at all scales $r$ , which, for example, is indeed the case when $\lvert B(r)\rvert = \Theta(r^d)$ . In general, one needs the pigeonhole principle to select the correct radii for the argument below to work.

Proof assuming the doubling constant is uniformly bounded. Suppose for the sake of contradiction that the dimension of the linear space of Lipschitz harmonic functions on $G$ is at least $n$ , where the parameter $n$ will be determined later. Let $V$ be an $n$ -dimensional linear subspace of this space.

Let $k$ be a natural number to be determined soon, and fix $r$ for a moment. Let $\mathcal{A}_r$ be a maximal collection of disjoint balls of radius $r/2$ with centers in $B(kr)$ , and let $\mathcal{B}_r$ be the collection of balls with the same centers of the balls in $\mathcal{A}_r$ , but of radius $r$ . Let $V_r$ be the linear subspace of $V$ consisting of harmonic functions in $V$ that average to $0$ on each ball in $\mathcal{B}_r$ . Note that the co-dimension of $V_r$ as a subspace of $V$ is at most $\lvert \mathcal{B}_r \rvert = \lvert \mathcal{A}_r \rvert$ , which is at most $\lvert B(kr+r/2) \rvert / \lvert B(r/2) \rvert = O(1) =: C$ .

For every harmonic function $f$ in $V_r$ , using the fact that $B(kr) \subseteq \cup\mathcal{B}_r$ , the fact that each point in $G$ is covered by $2B$ for at most $\lvert B(2r+r/2) \rvert / \lvert B(r/2) \rvert = O(1)$ many $B \in \mathcal{B}_r$ , the Poincaré inequality and the reverse Poincaré inequality, we get $\begin{aligned}\sum_{x \in B(kr)}f(x)^2 & \le \sum_{B \in \mathcal{B}_r}\sum_{x\in B}f(x)^2 \\ & \lesssim r^2\sum_{B \in \mathcal{B}_r}\sum_{x,y \in 2B, x\sim y}(f(x)-f(y))^2 \\ & \lesssim r^2\sum_{x, y \in B(kr+2r), x\sim y}(f(x)-f(y))^2 \\ & \lesssim \frac{1}{(k+2)^2}\sum_{x \in B(2(k+2)r)}f(x)^2.\end{aligned}$ Now take $k$ large enough (depending only on the group $G$ ) so that for all $f\in V_r$ , $3^d\sum_{x\in B(kr)}f(x)^2 \le \sum_{x\in B(3kr)}f(x)^2.$

Consider the quadratic form $Q_r$ on $V$ defined by $Q_r(f) := \sum_{x\in B(r)}f(x)^2$ . Since the kernels of the forms $Q_r$ form a descending chain of subspaces of the finite dimensional space $V$ , there exists $r_0$ such that $Q_r$ is positive-definite for all $r \ge r_0$ .

For every $r \ge r_0$ , let $q(r)$ be the volume of the ellipsoid $E_r$ induced by $Q_r$ . To be more precise, after fixing a basis $\{f_1, \dots, f_n\}$ of $V$ , the ellipsoid is defined by $E_r := \{(c_1, \dots, c_n) \in \mathbb{R}^n : Q_r(c_1f_1 + \dots + c_nf_n) \le 1\}.$ By scaling and translation, we may assume that $f_i$ is $1$ -Lipschitz and $f_i(e)=0$ (whenever $f_i$ is non-constant). Since $\lvert B(r) \rvert$ is at most polynomial in $r$ , there exists a natural number $d$ (depending only on the group $G$ ) such that $\sum_{x \in B(r)} f_i(x)^2 \le r^d$ for all $i \in [n]$ . By the Cauchy–Schwarz inequality, we have $\begin{aligned}Q_r(c_1f_1 + \dots + c_nf_n) & = \sum_{x\in B(r)}(c_1f_1(x)+\dots+c_nf_n(x))^2 \\ & \le n\sum_{x\in B(r)}\left(c_1^2f_1(x)^2 + \dots + c_n^2f_n(x)^2\right) \\ & \le (c_1^2+\dots+c_n^2)n^2r^d.\end{aligned}$ Therefore $E_r$ contains the ball of radius $(nr^{d/2})^{-1}$ , hence $q(r) \ge v_n(nr^{d/2})^{-n}$ , where $v_n$ is the volume of the $n$ -dimensional Euclidean unit ball.

Although the volume $q(r)$ of the ellipsoid is not intrinsic to $Q_r$ , the ratio between $q(r)$ and $q(r')$ does not depend on the choice of the basis.

After a linear transformation, we may assume the symmetric matrices associated to $Q_{kr}$ and $Q_{3kr}$ are of the form $\begin{pmatrix}A_1 & B_1 \\ B_1^T & C_1 \end{pmatrix}, \begin{pmatrix}A_3 & \\ & C_3\end{pmatrix},$ where $A_1, A_3$ act on $V_r$ . Using the Schur complement, we obtain that the volume ratio $q_{3kr}/q_{kr}$ is $\frac{\det Q_{kr}}{\det Q_{3kr}} = \frac{\det A_1 \det (C_1 - B_1^TA_1^{-1}B_1)}{\det A_3 \det C_3}.$ As $B_1^TA_1^{-1}B_1$ is positive semi-definite, we obtain $\det(C_1 - B_1^TA_1^{-1}B_1) \le \det C_1$ and so $q_{3kr}/q_{kr}$ is at most $\frac{\det A_1\det C_1}{\det A_3\det C_3}.$ Recall that $3^dQ_{kr}(f) \le Q_{3kr}(f)$ for all $f \in V_r$ and clearly $Q_{kr}(f) \le Q_{3kr}(f)$ for all $f \in V$ . In other words, $3^dA_1 \preceq A_3$ and $C_1 \preceq C_3$ and so $q_{kr}/q_{3kr} \ge (3^d)^{\dim V_r} \ge (3^d)^{n-C}$ .

Choose $n$ so that $(3^d)^{n-C} \ge 2^{dn}$ hence $2^{dn}q(3kr) \le q(kr)$ . Repeatedly apply the last inequality to obtain $q(kr) \ge 2^{mdn}q(3^mkr) \ge 2^{mdn}v_n(n(3^mkr)^{d/2})^{-n} = \Omega_{d,n}((2/\sqrt{3})^{mdn})$ , which is absurd for $m$ sufficiently large. [qed]

On the basis exchange property

Zilin — Sun, 03 Nov 2019 15:57:03 +0000

One of the students in my class, Undergraduate Seminar on Discrete Mathematics, asked whether the exchange property of a matroid can be strengthened to the following statement, of which I was completely unaware.

Strong basis exchange property. For every pair of bases $B_1, B_2$ and $x_1 \in B_1$ , there exists $x_2 \in B_2$ such that both $B_1 - x_1 + x_2$ and $B_2 + x_1 - x_2$ are still bases.

As it turns out, Brualdi proved back in 1969 the strong exchange property (Theorem 2) in Comments on bases in dependence structures.

I record below the proof which showcases the benefit of viewing matroids both graphically and linear-algebraically.

As per Aaron Berger's suggestion, we use a graphic matroid to motivate the proof. Suppose $T_1$ and $T_2$ are two spanning trees of a connected graph $G$ and $e_1$ is an edge in $T_1$ . Clearly, when the edge $e_1$ is also in $T_2$ , we can simply exchange $e_1$ with itself. Assume from now on that $e_1$ is not in $T_2$ . Note that $T_2 + e_1$ contains a unique cycle $C$ , each edge of which can be “traded” to $T_1$ . To be more precise, for every $e_2 \in C$ , $T_2 + e_1 - e_2$ is still a spanning tree. It is then easy to see that adding back some edge from $C - e_1$ reconnects the two connected components resulting from removing $e_1$ from $T_1$ .

The unique cycle (or circuit) $C$ plays a central role in the above argument, and the same concept is available in any matroid.

Lemma (Fundamental circuit). For every independent set $I$ , if an element $e$ satisfies that $I + e$ is dependent, then $I + e$ contains a unique circuit $C$ , called the fundamental circuit. Moreover, for every $f \in C$ , $I + e - f$ is independent.

Proof of strong basis exchange property. Write $e_1 := x_1$ . Without loss of generality, we may assume that $e_1 \not\in B_2$ , as otherwise we can exchange $e_1$ with itself. As $B_2$ is a maximal independent set, $B_2 + e_1$ is dependent, so by the lemma there is a unique circuit $C \subseteq B_2 + e_1$ , which contains $e_1$ , and $B_2 + e_1 - e_2$ is independent for every $e_2 \in C$ .

Finally we resort to our intuition of representable matroids. Since $e_1 \in \mathrm{span}(C - e_1) \subseteq \mathrm{span}(B_1 + C - e_1)$ , we have $\mathrm{span}(B_1 + C - e_1) = \mathrm{span}(B_1 + C)$ , hence $B_1 + C - e_1$ contains a basis, say $B_1'$ . By the ordinary exchange property for $B_1 - e_1$ and $B_1'$ , there is $e_2 \in B_1' - (B_1 - e_1) \subseteq C - e_1$ such that $B_1 - e_1 + e_2$ is independent. Both $B_1 - e_1 + e_2$ and $B_2 + e_1 - e_2$ are independent sets of the same size as a basis, hence they are bases, and $x_2 := e_2 \in B_2$ is the desired element. [qed]
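The property can also be confirmed by brute force on a small graphic matroid. The sketch below uses the spanning trees of the complete graph $K_4$ as bases; the choice of $K_4$ and all the names (`spanning_tree?`, `exchange_partners`) are assumptions made for this illustration.

```ruby
# Edges of the complete graph K4 on the vertices 0..3.
EDGES = (0..3).to_a.combination(2).to_a

# Three distinct edges form a spanning tree of K4 iff they contain no cycle.
def spanning_tree?(edges)
  return false unless edges.size == 3
  parent = (0..3).to_a
  find = ->(v) { parent[v] == v ? v : find.call(parent[v]) }
  edges.all? do |u, v|
    ru, rv = find.call(u), find.call(v)
    next false if ru == rv          # this edge would close a cycle
    parent[ru] = rv
    true
  end
end

BASES = EDGES.combination(3).select { |t| spanning_tree?(t) }

# All x2 in b2 such that both b1 - x1 + x2 and b2 + x1 - x2 are bases.
def exchange_partners(b1, b2, x1)
  b2.select do |x2|
    spanning_tree?(((b1 - [x1]) + [x2]).uniq) &&
      spanning_tree?(((b2 - [x2]) + [x1]).uniq)
  end
end

STRONG_EXCHANGE_HOLDS = BASES.all? do |b1|
  BASES.all? { |b2| b1.all? { |x1| !exchange_partners(b1, b2, x1).empty? } }
end

puts BASES.size
puts STRONG_EXCHANGE_HOLDS
```

Of course this checks one matroid only; the proof above is what covers the general case.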

Minimal Distance to Pi

Zilin — Tue, 28 Feb 2017 16:29:49 +0000

Here is a problem from Week of Code 29 hosted by Hackerrank.

Problem. Given two integers $q_1$ and $q_2$ ( $1\le q_1 \le q_2 \le 10^{15}$ ), find and print a common fraction $p/q$ such that $q_1 \le q \le q_2$ and $\left|p/q-\pi\right|$ is minimal. If there are several fractions having minimal distance to $\pi$ , choose the one with the smallest denominator.

Note that checking all possible denominators does not work as iterating for $10^{15}$ times would exceed the time limit (2 seconds for C or 10 seconds for Ruby).

The problem setter suggested the following algorithm in the editorial of the problem:

Given $q$ , it is easy to compute $p$ such that $r(q) := p/q$ is the closest rational to $\pi$ among all rationals with denominator $q$ .
Find the semiconvergents of the continued fraction of $\pi$ with denominators $\le 10^{15}$ .
Start from $q = q_1$ , and at each step increase $q$ by the smallest denominator $d$ of a semiconvergent such that $r(q+d)$ is closer to $\pi$ than $r(q)$ . Repeat until $q$ exceeds $q_2$ .
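For small ranges, the first step together with an exhaustive search over the denominators gives a reference answer to test against. This is a brute-force sketch, not the editorial's algorithm; the names `closest_numerator` and `best_fraction_brute_force` are hypothetical, and $\pi$ is replaced by the rational convergent that the code at the end of this post also uses.

```ruby
# A convergent of the continued fraction of pi, used as an exact stand-in.
PI_APPROX = Rational(5706674932067741, 1816491048114374)

# Step 1: the numerator p for which p/q is closest to theta.
def closest_numerator(q, theta = PI_APPROX)
  (theta * q).round
end

# Exhaustive search over q1..q2; ties are broken by the smaller denominator.
def best_fraction_brute_force(q1, q2, theta = PI_APPROX)
  q = (q1..q2).min_by do |den|
    [(Rational(closest_numerator(den, theta), den) - theta).abs, den]
  end
  [closest_numerator(q, theta), q]
end

p_best, q_best = best_fraction_brute_force(1, 10)
puts "#{p_best}/#{q_best}"
```

The pair is returned unreduced on purpose, since the problem asks for a denominator inside the given range.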

Given $q$ , let $d = d(q)$ be the smallest increment to the denominator $q$ such that $r(q+d)$ is closer to $\pi$ than $r(q)$ . To justify the algorithm, one needs to prove that $d$ is the denominator of one of the semiconvergents. The problem setter admits that he does not have a formal proof.

Inspired by the problem setter’s approach, here is a complete solution to the problem. Note that $\pi$ should not be special in this problem, and can be replaced by any other irrational number $\theta$ . Without loss of generality, we may assume that $\theta\in(0,1)$ .

We first introduce the Farey intervals of $\theta$ .

Start with the interval $(0/1, 1/1)$ .
Suppose the last interval is $(a/b, c/d)$ . Cut it by the mediant of $a/b$ and $c/d$ and choose one of the intervals $(a/b, (a+c)/(b+d)), ((a+c)/(b+d), c/d)$ that contains $\theta$ as the next interval.

We call the intervals appearing in the above process the Farey intervals of $\theta$ . For example, take $\theta = \pi - 3 = 0.1415926...$ . The Farey intervals are:

$\begin{gathered}(0/1, 1/1), (0/1, 1/2), (0/1, 1/3), (0/1, 1/4), (0/1, 1/5), \\ (0/1, 1/6), (0/1, 1/7), (1/8, 1/7), (2/15, 1/7),\cdots\end{gathered}$
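The process is short enough to run by machine. The following sketch reproduces the list above; `farey_intervals` is a hypothetical helper, and $\theta$ is approximated by a rational convergent of $\pi$ minus 3, which is harmless for the first few intervals.

```ruby
THETA = Rational(5706674932067741, 1816491048114374) - 3

# The first `count` Farey intervals of theta, each given as [[a, b], [c, d]]
# and standing for the open interval (a/b, c/d).
def farey_intervals(theta, count)
  a, b, c, d = 0, 1, 1, 1
  intervals = [[[a, b], [c, d]]]
  (count - 1).times do
    e, f = a + c, b + d            # the mediant (a + c)/(b + d)
    if theta < Rational(e, f)
      c, d = e, f                  # theta lies in the left piece
    else
      a, b = e, f                  # theta lies in the right piece
    end
    intervals << [[a, b], [c, d]]
  end
  intervals
end

farey_intervals(THETA, 9).each do |(a, b), (c, d)|
  puts "(#{a}/#{b}, #{c}/#{d})"
end
```

Each interval $(a/b, c/d)$ produced this way satisfies $bc - ad = 1$ , which is the usual signature of neighbors in a Farey sequence.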

The Farey sequence of order $n$ , denoted by $F_n$ , is the sequence of fractions between 0 and 1 which, when in lowest terms, have denominators less than or equal to $n$ , arranged in order of increasing size. Two fractions which are neighboring terms in some Farey sequence are known as a Farey pair. For example, the Farey sequence of order 5 is
$F_5 = (0/1, 1/5, 1/4, 1/3, 2/5, 1/2, 3/5, 2/3, 3/4, 4/5, 1/1).$
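A direct, if naive, way to generate $F_n$ is to list all fractions with denominator at most $n$ and remove duplicates. The helper name `farey_sequence` is made up for this sketch, and no attempt is made at efficiency.

```ruby
# The Farey sequence of order n as a sorted array of Rationals.
def farey_sequence(n)
  (1..n).flat_map { |q| (0..q).map { |p| Rational(p, q) } }.uniq.sort
end

# Consecutive terms a/b < c/d of a Farey sequence satisfy bc - ad = 1.
def farey_neighbors?(x, y)
  x.denominator * y.numerator - x.numerator * y.denominator == 1
end

puts farey_sequence(5).join(", ")
```

Ruby's `Rational` reduces fractions automatically, so `uniq` is enough to keep each fraction once in lowest terms.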

Using the Stern–Brocot tree, one can prove that

Lemma 1. For every Farey interval $(a/b, c/d)$ of $\theta$ , the pair $(a/b, c/d)$ is a Farey pair. Conversely, for every Farey pair $(a/b, c/d)$ , if $\theta \in (a/b, c/d)$ , then $(a/b, c/d)$ is a Farey interval.

We say $p/q$ is a good rational approximation of $\theta$ if every rational between $p/q$ and $\theta$ (exclusive) has a denominator greater than $q$ .

By the definition of the Farey sequence, combined with Lemma 1, we know that

Lemma 2. A rational is an endpoint of a Farey interval of $\theta$ if and only if it is a good rational approximation of $\theta$ .

In fact, one can show that the endpoints of Farey intervals and the semiconvergents of the continued fraction are the same thing! Therefore, the problem setter’s claim follows immediately from:

Proposition. Given $q$ , let $r(q) = p / q$ be the rational closest to $\theta$ with denominator $q$ . If $d = d(q)$ is the minimal increment to $q$ such that $r(q + d) = (p + c) / (q + d)$ is closer to $\theta$ than $r(q)$ , then $c/d$ is a good rational approximation.

Remark. The proposition states that the increments to $p/q$ always come from a good rational approximation. It is stronger than the problem setter’s statement, which only asserts that the increment to the $q$ comes from a good rational approximation.

Proof. In $(x, y)$ -plane, plot the trapezoid defined by

$\left| y/x - \theta \right| < \left|p/q - \theta\right|, q < x < q + d.$

Geometric interpretation

Also we interpret rational numbers $p/q, (p+c)/(q+d)$ as points $A = (q, p), B = (q+d, p+c)$ . Let the line through $(q, p)$ parallel to $y=\theta x$ intersect the vertical line $x = q+d$ at $C = (q+d, p+\theta d)$ . By the definition of $d$ , we know that the trapezoid does not contain lattice points. In particular, there is no lattice point in the interior of the triangle $ABC$ . In the coordinate system with origin at $A$ , $B$ has coordinate $(d, c)$ and the line through $A, C$ is $y = \theta x$ . Since triangle $ABC$ contains no lattice points, there is no $(b, a)$ with $b < d$ such that $a/b$ is between $\theta$ and $c/d$ . In other words, $c/d$ is a good rational approximation. [qed]
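The proposition can be spot-checked for small denominators. The sketch below does this for $\theta = \pi - 3$ , approximated by a rational convergent (an assumption that does not matter at this scale); the helper names `nearest`, `increment` and `good_approximation?` are hypothetical.

```ruby
THETA = Rational(5706674932067741, 1816491048114374) - 3

# [numerator, distance] of the fraction with denominator q closest to THETA.
def nearest(q)
  num = (THETA * q).round
  [num, (Rational(num, q) - THETA).abs]
end

# The minimal increment d = d(q) together with the numerator increment c.
def increment(q)
  num, dist = nearest(q)
  d = (1..).find { |k| nearest(q + k)[1] < dist }
  [nearest(q + d)[0] - num, d]
end

# Does every fraction strictly between c/d and THETA have denominator > d?
def good_approximation?(c, d)
  lo, hi = [Rational(c, d), THETA].minmax
  (1..d).none? do |b|
    a = (lo * b).floor + 1          # smallest a with a/b > lo
    Rational(a, b) < hi
  end
end

[1, 7, 12].each do |q|
  c, d = increment(q)
  puts "q = #{q}: c/d = #{c}/#{d}, good: #{good_approximation?(c, d)}"
end
```

Starting from $q = 7$ , for instance, the next improvement is at $q = 57$ , so the increment is $7/50$ , which is one of the fractions $n/(7n+1)$ converging to $1/7$ from below.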

One caveat about the implementation: because floats may not have enough precision for the computation, we use a convergent of the continued fraction of $\pi$ in place of $\pi$ , so that all the computations happen in $\mathbb{Q}$ . Finally, we present the algorithm.

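# a convergent of the continued fraction of pi, shifted into (0, 1)
P = Rational(5706674932067741, 1816491048114374) - 3
# read the range of denominators first: the Farey loop below needs it
min, max = gets.split.map(&:to_i)
# find endpoints of Farey intervals with denominators at most max - min
a, b, c, d = 0, 1, 1, 1
farey = [[a, b], [c, d]]
while (f = b + d) <= max - min
  farey << [e = a + c, f]
  if P < Rational(e, f)
    c, d = e, f
  else
    a, b = e, f
  end
end
# numerator of the fraction closest to P with denominator min
p_min = (P * min).round
# increase p_min/min by fractions in farey
while min <= max
  c, d = nil, nil
  # farey is sorted by denominator, so the first hit has the smallest d
  farey.each do |a, b|
    break if min + b > max
    if (Rational(p_min + a, min + b) - P).abs < (Rational(p_min, min) - P).abs
      c, d = a, b
      break
    end
  end
  break if d.nil?
  p_min += c; min += d
end
# undo the shift by 3 before printing
puts "#{p_min + 3 * min}/#{min}"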

Shalom to Ning

Zilin — Tue, 28 Feb 2017 08:00:40 +0000

I had never expected that Feb 19, 2017 would be the last day we say farewell to each other.

Red Sea from Eilat

We used to talk about math puzzles, from blue-eyed islander puzzle to the hardest logic puzzle ever, every time on the bus from or to Shuk. We joked about the possibility that the apple cores we throw in the Carmel national park would one day become apple trees. We had plans to host a hot-pot party and introduce Mahjong to our Israeli friends…

I realized that all these can only be expressed in the past tense when I saw you forever asleep in Eilat.

We were so close, yet we are so far apart.

First week in Israel

Zilin — Tue, 30 Aug 2016 11:46:05 +0000

Here is a list of things I have experienced during the first week in Israel and some tips for those of you who plan to visit me in Haifa!

Hainan Airlines has a direct flight from Beijing to Tel Aviv, and it has a resting area for transit passengers in Beijing.

If this is your first time in Israel, make sure your arrival day is not Saturday or any holiday. Saturday (or a holiday) means almost no public transportation, almost no restaurants and fewer people on the street whom you can ask for help.

Most Israelis are quite friendly and speak decent English. Even if someone does not speak English at all, he or she is always able to grab someone close by to help.

You can exchange all major currencies (e.g. US dollars, Chinese yuan) for new Israeli shekels at the airport in Tel Aviv at Bank Hapoalim. As expected, the rate is not as good as what you can get outside the airport. A lot of places accept major credit cards, such as Visa and Mastercard. Some places ask for a purchase of 20 shekels or more if you use a credit card.

The vending machines for train tickets did not accept my Mastercard. The ticket from the airport to Haifa (Hof Hacarmel station) costs 35 shekels. The train is on platform 2 and it has a WiFi connection. Use Google Maps to tell which station you are arriving at if you, like me, do not understand enough Hebrew.

The train from Tel Aviv to Haifa runs along the coastline of the Mediterranean Sea. Highway no.2 lies between the rails and the coast, which resembles highway no.1 in California.

Bus no.11 goes from Hof Hacarmel to Technion. The bus fare is slightly less than 6 shekels. Moovit is a must for public transportation planning, and it can send you a notification when you are approaching your destination.

Uber and Lyft do not have services in Haifa (but Uber does in Tel Aviv). People use Gett (aka Get Taxi) to call cabs. Since tipping is not required for taxi, remember to turn off automatic tipping in the app. You can use my code GTMLIYK for 15 shekels off your 1st ride.

The most popular messaging app is WhatsApp. I was asked for my WhatsApp contact quite a few times.

Google Fi (global cellular service including data) works quite well in Israel. For the first week, I’ve heavily relied on my smart phone for navigation, transportation and information retrieving. It’s probably a good idea to carry a power bank. The outlet sockets in Israel are Type C and H.

Mediterranean sea from Hof Hacarmel.

Other random observations:

On Sunday, the train is full of soldiers. A few were probably carrying M4 carbines when I was on the train.

The rent for a property listed online (e.g. yod2.co.il or a Facebook group) usually does not include city tax and building management fee.

Some apartments in Haifa have a separate room for the toilet only.

You give all the checks for the rent to the landlord when you sign the contract. Technion is able to write a guarantee letter to the landlord saying that they will freeze your last payment as the security deposit.

You will not be able to change the PIN, set by the bank, of your debit card, at least for Bank Leumi, the other major bank in Israel besides Bank Hapoalim.

Hebrew calendar and Chinese calendar are both lunisolar, so they are similar in a lot of ways. Friday and Saturday are weekends in Israel.

High schoolers can choose how difficult the math they learn is. For example, calculus (including formulas like $e^{i\theta} = \cos \theta + i\sin\theta$ ) is offered in high schools.

However, most high school graduates need to go to the army for 2-3 years, and it is hard for them to recall what they have learnt after the army. Top students might have the option to delay their military service.

Pork and shellfish are not Kosher, so you will not be able to see them in most supermarkets. It is also not Kosher to eat meat with milk.

A Short Proof of the Nash-Williams' Partition Theorem

Zilin — Wed, 20 Jan 2016 03:39:41 +0000

Notations.

$\mathbb{N}$ – the set of natural numbers;
$\binom{M}{k}$ – the family of all subsets of $M$ of size $k$ ;
$\binom{M}{<\omega}$ – the family of all finite subsets of $M$ ;
$\binom{M}{\omega}$ – the family of all infinite subsets of $M$ ;

The infinite Ramsey theorem, in its simplest form, states that for every partition $\binom{\mathbb{N}}{k} = \mathcal{F}_1 \sqcup \dots \sqcup \mathcal{F}_r$ , there exists an infinite set $M\subset \mathbb{N}$ such that $\binom{M}{k}\subset \mathcal{F}_i$ for some $i\in [r]$ . The Nash-Williams' partition theorem can be seen as a strengthening of the infinite Ramsey theorem, which considers a partition of a subset of $\binom{\mathbb{N}}{<\omega}$ .

Notations.

$\mathcal{F}\restriction M$ – $\mathcal{F}\cap 2^M$ , that is, the set $\{s\in\mathcal{F} : s\subset M\}$ .
$s \sqsubset t$ , where $s,t$ are subsets of $\mathbb{N}$ – $s$ is an initial segment of $t$ , that is $s = \{n\in t : n \le \max s\}$ .

Definition. Let $\mathcal{F} \subset \binom{\mathbb{N}}{<\omega}$ .

$\mathcal{F}$ is Ramsey if for every partition $\mathcal{F}=\mathcal{F}_1\sqcup \dots\sqcup\mathcal{F}_r$ and every $M\in\binom{\mathbb{N}}{\omega}$ , there is $N\in\binom{M}{\omega}$ such that $\mathcal{F}_i\restriction N = \emptyset$ for all but at most one $i\in[r]$ .
$\mathcal{F}$ is a Nash-Williams family if for all $s, t\in\mathcal{F}, s\sqsubset t \implies s = t$ .
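For finite families, the Nash-Williams condition is easy to test mechanically. The sketch below works inside the finite universe $\{1, \dots, 8\}$ , which is an assumption made only to keep the families finite; the helper names and the three sample families are chosen for illustration.

```ruby
# s is an initial segment of t, that is s = {n in t : n <= max s}.
# The empty set is treated as an initial segment of every set.
def initial_segment?(s, t)
  return true if s.empty?
  s.sort == t.select { |n| n <= s.max }.sort
end

# No member of the family is a proper initial segment of another member.
def nash_williams?(family)
  family.all? do |s|
    family.all? { |t| !initial_segment?(s, t) || s.sort == t.sort }
  end
end

UNIVERSE = (1..8).to_a

# All 2-element subsets: the family behind the usual Ramsey theorem for pairs.
PAIRS = UNIVERSE.combination(2).to_a

# All sets s with |s| = min s, for example {1}, {2, 5} and {3, 4, 7}.
MIN_SIZED = (1..UNIVERSE.size).flat_map do |k|
  UNIVERSE.combination(k).select { |s| s.size == s.min }
end

# All sets of size 1 or 2: here {1} is an initial segment of {1, 2}.
UP_TO_TWO = UNIVERSE.combination(1).to_a + PAIRS

puts nash_williams?(PAIRS)
puts nash_williams?(MIN_SIZED)
puts nash_williams?(UP_TO_TWO)
```

The second family shows that a Nash-Williams family may contain sets of different sizes, which is what makes the theorem below stronger than the infinite Ramsey theorem.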

Theorem (Nash-Williams 1965). Every Nash-Williams family is Ramsey.

The proof presented here is based on the proof given by Prof. James Cummings in his Infinite Ramsey Theory class. The purpose of this rewrite is to have a proof that resembles the one of the infinite Ramsey theorem.

Notation. Let $s\in\binom{\mathbb{N}}{<\omega}$ and $M\in\binom{\mathbb{N}}{\omega}$ . Denote $$[s, M] = \left\{t \in \binom{\mathbb{N}}{<\omega} : t \sqsubset s \text{ or } (s \sqsubset t \text{ and } t\setminus s \subset M)\right\}.$$

Definition. Fix $\mathcal{F}\subset \binom{\mathbb{N}}{<\omega}$ and $s\in \binom{\mathbb{N}}{<\omega}$ .

$M$ accepts $s$ if $[s, M]\cap \mathcal{F}\neq \emptyset$ and $M$ rejects $s$ otherwise;
$M$ strongly accepts $s$ if every infinite subset of $M$ accepts $s$ ;
$M$ decides $s$ if $M$ either rejects $s$ or strongly accepts it.

We list some properties that encapsulate the combinatorial characteristics of the definitions above.

Properties.

If $M$ decides (or strongly accepts, or rejects) $s$ and $N\subset M$ , then $N$ decides (respectively strongly accepts, rejects) $s$ as well.
For every $M\in\binom{\mathbb{N}}{\omega}$ and $s\in\binom{\mathbb{N}}{<\omega}$ , there is $N_1\in\binom{M}{\omega}$ deciding $s$ . Consequently, there is $N_2\in\binom{M}{\omega}$ deciding every subset of $s$ .

Proof of Theorem. It is enough to show that if $\mathcal{F} = \mathcal{F}_1\sqcup \mathcal{F}_2$ , then for every $M\in\binom{\mathbb{N}}{\omega}$ , there is $N\in \binom{M}{\omega}$ such that $\mathcal{F}_i \restriction N = \emptyset$ for some $i\in[2]$ .

We are going to use $\mathcal{F}_1$ instead of $\mathcal{F}$ in the definitions of “accept”, “reject”, “strongly accept” and “decide”. Find $N\in \binom{M}{\omega}$ that decides $\emptyset$ . If $N$ rejects $\emptyset$ , by definition $\mathcal{F}_1\restriction N = [\emptyset, N]\cap \mathcal{F}_1 = \emptyset$ . Otherwise $N$ strongly accepts $\emptyset$ .

Inductively, we build a decreasing sequence of infinite sets $N \supset N_1 \supset N_2\supset \dots$ , an increasing sequence of natural numbers $n_1, n_2, \dots$ , and maintain that $n_i\in N_i, n_i < \min N_{i+1}$ and that $N_i$ strongly accepts every $s\subset \{n_j : j < i\}$ . Initially, we take $N_1 = N$ as $N$ strongly accepts $\emptyset$ .

A mental picture of the construction.

Suppose $N_1 \supset \dots \supset N_i$ and $n_1 < \dots < n_{i-1}$ have been constructed. In the following lemma, when taking $M = N_i$ and $s = \{n_j : j < i\}$ , it spits out $m$ and $N$ , which are exactly what we need for $n_i$ and $N_{i+1}$ to finish the inductive step.

Lemma. Suppose $M\in\binom{\mathbb{N}}{\omega}$ , $s\in\binom{\mathbb{N}}{<\omega}$ and $\max s < \min M$ . If $M$ strongly accepts every subset of $s$ , then there are $m \in M$ and $N \in \binom{M}{\omega}$ such that $m < \min N$ and $N$ strongly accepts every subset of $s\cup \{m\}$ .

Proof of lemma. We can build $M = M_0 \supset M_1\supset M_2 \supset \dots$ such that for every $i$ , $m_i := \min M_i < \min M_{i+1}$ and $M_{i+1}$ decides every subset of $s\cup \{m_i\}$ . It might happen that $M_{i+1}$ rejects a subset of $s\cup \{m_i\}$ . However, we claim that this cannot happen infinitely many times.

Otherwise, by the pigeonhole principle, there is $t\subset s$ such that $I = \{i : M_{i+1} \text{ rejects }t\cup\{m_{i}\}\}$ is infinite. Let $M' = \{m_i : i\in I\}$ . Note that $[t, M'] \subset \cup_{i\in I} [t\cup\{m_i\}, M_{i+1}]$ , and so $[t,M']\cap \mathcal{F}_1\subset \cup_{i\in I} \left([t\cup\{m_i\}, M_{i+1}]\cap\mathcal{F}_1\right) = \emptyset$ . Hence $M'\in\binom{M}{\omega}$ rejects $t\subset s$ , which contradicts the assumption that $M$ strongly accepts every subset of $s$ .

Now we pick one $i$ such that $M_{i+1}$ strongly accepts every subset of $s\cup\{m_i\}$ , and it is easy to check that $m = m_i$ and $N = M_{i+1}$ suffice. [qed]

Finally, we take $N_\infty = \{n_1, n_2, \dots\}$ . For any $s\in\binom{N_\infty}{<\omega}$ , there is $i$ such that $s\subset \{n_1, \dots, n_{i-1}\}$ . Note that $N_i$ strongly accepts $s$ and $N_\infty\subset N_i$ . Therefore $N_\infty$ (strongly) accepts $s$ , that is $[s, N_\infty]\cap \mathcal{F}_1 \neq \emptyset$ , and say $t\in [s, N_\infty]\cap \mathcal{F}_1$ . Because $t\in\mathcal{F}_1$ and $\mathcal{F} = \mathcal{F}_1 \sqcup \mathcal{F}_2$ is a Nash-Williams family, $s\notin \mathcal{F}_2$ . [qed]

Alternative to Beamer for Math Presentation

Zilin — Sun, 21 Jun 2015 21:05:48 +0000

Although blackboard and chalk are the best option for a math talk for various reasons, sometimes, when time is limited, one has to make slides to save time on writing. The most common tool for creating slides nowadays is LaTeX with Beamer.

When I was preparing my talk in Vancouver for Connections in Discrete Mathematics, a conference in honor of the work of Ron Graham, I decided to ditch Beamer due to my lack of experience with it, as this was my first ever conference talk. I ended up using html+css+javascript to leverage my knowledge of web design.

The javascript framework I used is reveal.js. Though there are other options such as impress.js, reveal.js is a better fit for a math talk. One can easily create a text-based presentation with static images and charts. The framework also integrates MathJax as an optional dependency, which can be added with a few lines of code. What I really like about reveal.js, as well as impress.js, is that they provide smooth spatial transitions between slides. However, one has to use another javascript library to draw and animate diagrams. For that, I chose raphael.js, a javascript library that uses SVG and VML for creating graphics, so that users can easily create, for example, their own custom charts. The source code of the examples on the official website is a really good place to start.

To integrate reveal.js and raphael.js and achieve a step-by-step animation of a diagram, I hacked it by adding a dummy fragment element to my html document. When reveal.js reaches that fragment, it fires the fragmentshown event, and a listener for that event triggers raphael.js to animate the diagram. In cases where the diagrams are made of html elements, I used jQuery to control the animation. Here is my favorite animation in the slides generated by jQuery.

How does mathematics progress?

However, one has to make more effort to reverse the animation made by raphael.js or jQuery if one wants to go backwards in slides. I did not implement any reverse animation since I did not plan to go back in slides at all.
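The fragment hack can be sketched roughly as follows. This is a minimal illustration rather than the code from my slides: the registry of callbacks, the fragment id `step-1` and the function name `makeFragmentHandlers` are all made up for the example, and the callbacks only record what would be animated. The same pattern would cover going backwards, by listening to the fragmenthidden event as well.

```javascript
// Sketch only: assumes a dummy fragment such as
//   <span class="fragment" id="step-1"></span>
// in the slide, and a hypothetical registry mapping fragment ids to callbacks.
function makeFragmentHandlers(animations) {
  function dispatch(kind) {
    return function (event) {
      // reveal.js passes the fragment's DOM element as event.fragment.
      const id = event.fragment ? event.fragment.id : undefined;
      const entry = animations[id];
      if (entry && typeof entry[kind] === 'function') {
        entry[kind](event.fragment);
        return true; // an animation was triggered
      }
      return false; // not one of our dummy fragments
    };
  }
  return { onShown: dispatch('show'), onHidden: dispatch('hide') };
}

// Placeholder callbacks; in real slides these would call raphael.js or jQuery,
// e.g. circle.animate({ r: 40 }, 500) on a Raphael element.
const played = [];
const fragmentHandlers = makeFragmentHandlers({
  'step-1': {
    show: function () { played.push('show step-1'); },
    hide: function () { played.push('hide step-1'); }
  }
});

// Only wire up the listeners when reveal.js is actually loaded on the page.
if (typeof Reveal !== 'undefined') {
  Reveal.addEventListener('fragmentshown', fragmentHandlers.onShown);
  Reveal.addEventListener('fragmenthidden', fragmentHandlers.onHidden);
}
```

Keeping the dispatch logic separate from the reveal.js wiring means each animation can be tried out by calling the handler directly with a fake event.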

In case there is no internet access during the presentation, one has to have local copies of all external javascript libraries (and sometimes fonts), which, in my case, are MathJax, raphael.js and jQuery. In order to use MathJax offline, one needs to configure reveal.js.
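For reference, an offline MathJax setup might look like the sketch below. The option names follow the math plugin shipped with reveal.js 3.x; the local paths are assumptions about where the copied libraries are placed, so adjust them to your own directory layout.

```javascript
// Sketch only: option names follow the reveal.js 3.x math plugin; the local
// paths are assumptions about where the copied libraries live.
const offlineRevealConfig = {
  math: {
    mathjax: 'lib/MathJax/MathJax.js', // local copy instead of the CDN URL
    config: 'TeX-AMS_HTML-full'
  },
  dependencies: [
    { src: 'plugin/math/math.js', async: true }
  ]
};

// Initialize only when reveal.js is actually loaded on the page.
if (typeof Reveal !== 'undefined') {
  Reveal.initialize(offlineRevealConfig);
}
```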

Currently, my slides only work correctly in Chrome. There is another bug that I have not figured out yet. If I start afresh from the first slide, then my second diagram generated by Raphael is not rendered correctly. I got around it by refreshing the slide where the second diagram lives. This is still an annoyance that I would like to resolve.

All in all, I really like this alternative approach to making slides for math presentations because it enables me to implement whatever I imagine.

Eleven Years

Zilin — Tue, 09 Jun 2015 21:03:40 +0000

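In mathematics, there are two easily confused concepts about stochastic processes: one is called a supermartingale, the other a submartingale. Roughly speaking, a supermartingale gets worse and worse as time evolves, whereas a submartingale does the opposite and gets better and better. Two years ago, when I was taking a graduate course, the professor, to help us remember, told us: "Life is a supermartingale." The implication was that life gets worse and worse.

Life does get worse and worse. For instance, nowadays I have to warm up every time before I dare step onto the court. For instance, I can no longer remember what exactly I wrote in the last Footprints of Growing Up (《成长的足迹》). For instance, we cannot be at the side of the people we respect and love the most as they leave us one by one. For instance...

The tide of life keeps changing us, scattering once-familiar faces into the vast sea of people, and then hands us a thing called WeChat so that we can look for one another in the palms of our hands. If you want to know how a Jinhua alum is living and where life has taken him or her, looking at the updates in his or her WeChat Moments is probably enough.

That being so, I think the words below should not be about the real me. Since the theme is continuing the footprints of growing up, let us suppose that our whole gang graduated from Jinhua, moved up to the same high school, and entered the same university, and that to this day all my classmates, teachers, and even classmates' parents still appear in my life. What would a day of mine look like?

To make reading this piece more fun, I will not name anyone. If some of the scenes strike a chord with you, please do not hesitate to see yourself in them.

The story begins.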

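It is early summer in Pittsburgh, USA, and the cicadas have not yet started singing. This is already my fourth summer since I began my PhD at Carnegie Mellon University.

I live in an apartment with four bedrooms and a living room, and my three roommates are the same three roommates I had back at Jinhua. Two of them have found jobs locally, and the other is pursuing a PhD, though at a different university. The whole apartment has agreed that Friday is breakfast day, to remind everyone to eat breakfast no matter how busy they are. So at the agreed time we sit around the dining table in the living room, eating while discussing whether to pick a day over the weekend to play the card game Eighty Points all night. Someone jokingly reminds us that back in the day this ancient sport was conducted in secret on the balcony, and the rest of us all agree to carry on the tradition.

After breakfast, I go downstairs and say hello to the building manager. Although everyone has a mobile phone and the internet now, for some reason there is still a coin-operated phone in the front office. Nobody has used it for a long time, but it is said that once you put in a coin you can hear the voice of the person you miss the most. Oh, and it costs one yuan each time. By the entrance of the lobby sits a stack of the local newspaper. A middle school classmate of mine works at that paper, and from time to time still seeks out old classmates to interview for stories on everyday life.

I walk to campus, and along the way my Walkman plays Jay Chou's "Tornado" (龙卷风). This was the song I listened to when I first encountered pop music in middle school. From then on, cassette after cassette and CD after CD accompanied my youth. I hum along: love comes too fast, just like a tornado...

Suddenly, the department chair appears in the hallway. He was transferred over from the history department. We greet each other. The chair rarely asks about anyone's studies, but he keeps a close eye on every student's love life. I do not know whether he heard the tune I was just humming, but just as at our last meeting, he interrogates me: "What is going on with you? Why have you still not found a girlfriend? Hurry up and make a list of the names of the girls you like! Take some initiative!" I force a smile, knowing in my heart that he cannot wait to see every one of us settle down and start a career. As I take my leave, I laugh and brush him off: "I'll make the list right away, right away!"

Walking on, I pass a classroom. Glancing inside, I see that the classmate of ours who entered university as a teenager has already begun his teaching career. How many of his students know that their teacher was once among the naughtiest students in his class? Now that he is a role model, does he see his younger self when he looks at the naughty students in his class? It occurs to me that he has not settled down either, and I secretly breathe a sigh of relief.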

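The morning passes in a flash. On my way to the campus cafeteria at noon, I run into a group of undergraduates kicking a tennis ball around as if it were a football. I think to myself that kicking bottle caps would be more environmentally friendly. A little farther on, another group of kids with iPhones and iPads are playing Clash of Clans. How lame. The hand-clapping game invented in our middle school class had endless variations and swept the whole grade. How could these touchscreen games match its balance and competitiveness? As I walk, I think proudly that all of these should be inscribed on the World Cultural Heritage list.

After lunch I return to my office, turn on my computer, and start scrolling through Weibo, where all of life's variety unfolds before my eyes. Unfortunately, today's top headline is once again not Wang Feng, but rather that two childhood sweethearts from our class got married. If the department chair finds out, he will certainly hold it up as a shining example.

I close Weibo. After I have worked in the office for a few hours, the graduate students and professors in the department start heading home one after another to enjoy the weekend. In English, TGIF stands for Thank God It’s Friday. The locals usually go to a bar to unwind, but I have instead arranged with a few of my old basketball buddies to thank God on the campus basketball court. Today we have a full-court game against a team of Chinese Americans. There are hardly any spectators, but we still take it very seriously. Several players on the other side are individually very skilled, but our side has, after all, played together for many years, and we know where our teammates are without looking. Drive, kick out, swing the ball to the top of the key, and the three-pointer goes in. This is a play we often run, and it has many variations. Double teams, steals, rebounds, blocks, fadeaway jumpers: everyone has a specialty. Sweat pours onto the court, and nobody remembers the score.

After the game we all have dinner together, and a few more classmates join us. The ones who work in finance or consulting always arrive late. Sometimes one of them takes an urgent call: "Hello? Ah, that three-million-dollar cross-border contract from last time..." The rest of us voice our envy and jealousy, but in our hearts we are sincerely happy for their success in their fields. Some classmates who work in high tech share new startup opportunities and stock market news, and some listen with rapt attention, as if listening to a teacher circle the key points before an exam. As dinner draws to a close, the dad with a baby at home says sheepishly that he cannot stay long and has to go home to feed the child. Well fed and well watered, we each head home.

I return to my apartment and lie on my back in bed, remembering the nights in the middle school dorm when a roommate recounted The Return of the Condor Heroes (《神雕侠侣》) until dawn, remembering eating oranges and listening to CDs in class and having them confiscated by the teacher one after another, remembering the smiles that still hang on those innocent faces, remembering...

I watch the electric fan on the ceiling shake its head round and round, my vision gradually blurs, and the summer evening breeze coming through the window begins to envelop me like a mosquito net...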

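Although all of this rests on one great impossibility, namely that those people are still by our side, these possibilities, or rather these dreams, seem to influence us after graduation from another parallel universe, becoming one of the fundamental tones of our generation. The people and events of Jinhua often come back as memories to remind me, warm me, and encourage me.

I am grateful for this.

Note: This piece was written for the 20th anniversary of the founding of Shanghai Jinhua Private Middle School (上海市民办进华中学) and will be included in the book Continuing the Footprints of Growing Up (《继续成长的足迹》).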

A Short Proof for Hausdorff Moment Problem

Zilin — Thu, 07 May 2015 05:03:05 +0000

The Hausdorff moment problem asks for necessary and sufficient conditions for a given sequence $(m_n)$ with $m_0=1$ to be the sequence of moments of a random variable $X$ supported on $[0,1]$ , i.e., $\operatorname{E}X^n=m_n$ for all $n$ .

In 1921, Hausdorff showed that $(m_n)$ is such a moment sequence if and only if the sequence is completely monotonic, i.e., its difference sequences satisfy the inequality $(D^r m)_s \ge 0$ for all $r, s \ge 0$ . Here $D$ is the difference operator on the space of real sequences $(a_n)$ given by $D a = (a_{n} - a_{n+1})$ .

The proof below the fold follows the outline given in Probability with Martingales by David Williams (E18.5 – E18.6).

Proof of Necessity. Suppose $(m_n)$ is the moment sequence of a random variable $X$ supported on $[0,1]$ . By induction, one can show that $(D^r m)_s = \operatorname{E}(1-X)^rX^s$ . Clearly, as $X$ is supported on $[0,1]$ , the moment sequence is completely monotonic.
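For completeness, here is a sketch of that induction, using only the definition of $D$ and the linearity of expectation. The base case is $(D^0 m)_s = m_s = \operatorname{E}X^s$ , and the inductive step reads

$\begin{aligned}(D^{r+1}m)_s & = (D^r m)_s - (D^r m)_{s+1} = \operatorname{E}(1-X)^rX^s - \operatorname{E}(1-X)^rX^{s+1} \\ & = \operatorname{E}(1-X)^rX^s(1-X) = \operatorname{E}(1-X)^{r+1}X^s.\end{aligned}$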

Proof of Sufficiency. Suppose $(m_n)$ is a completely monotonic sequence with $m_0 = 1$ .

Define $F_n(x) := \sum_{i \le nx}{n\choose i}(D^{n-i}m)_i$ . Clearly, $F_n$ is right-continuous and non-decreasing, and $F_n(0^-) = 0$ . To prove $F_n(1) = 1$ , one has to prove the identity $\sum_{i}{n\choose i}(D^{n-i}m)_i = m_0.$

A classical trick. Since the identity above is about vectors in the linear space (over the reals) spanned by $(m_n)$ and the linear space spanned by $(m_n)$ is isomorphic to the one spanned by $(\theta^n)$ , the identity is equivalent to $\sum_{i}{n\choose i}(D^{n-i}\theta)_i = \theta^0,$ where $\theta_n = \theta^n$ . Now, we take advantage of the ring structure of $\mathbb{R}[\theta]$ . Notice that $(D^{r}\theta)_s = (1-\theta)^r\theta^s$ . Using the binomial theorem, we obtain $\sum_{i}{n\choose i}(D^{n-i}\theta)_i = \sum_{i}{n\choose i}(1-\theta)^{n-i}\theta^i = (1-\theta + \theta)^n = \theta^0.$
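As a sanity check, the case $n = 2$ of the identity can also be verified by hand, expanding each difference via $(Da)_n = a_n - a_{n+1}$ :

$\begin{aligned}\sum_{i}{2\choose i}(D^{2-i}m)_i & = (D^2m)_0 + 2(D^1m)_1 + (D^0m)_2 \\ & = (m_0 - 2m_1 + m_2) + 2(m_1 - m_2) + m_2 = m_0.\end{aligned}$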

Therefore $F_n$ is a bona fide distribution function. Define $m_{n, k} := \int_{[0,1]} x^kdF_n(x)$ , i.e., $m_{n,k}$ is the $k$ th moment of $F_n$ . We now find an explicit formula for $m_{n,k}$ .

Noticing that $F_n$ is constant, say $c_{n,i}$ , on $[\frac{i}{n}, \frac{i+1}{n})$ , for all $i=0, \dots, n-1$ and $c_{n,i}$ is a linear combination of $m_0, \dots, m_n$ , we know that $m_{n,k} = \sum_{i=0}^n a_{n,k,i}m_i$ .

Just like what we did in proving the identity, we use the special case $m_n = \theta^n$ to compute the coefficients $a_i = a_{n,k,i}$ , where $0 \le \theta \le 1$ . In this case, $F_n(x) = \sum_{i \le nx}{n\choose i}(D^{n-i}\theta)_i = \sum_{i\le nx}{n\choose i}(1-\theta)^{n-i}\theta^i, m_{n,k} = \sum_{i=0}^n a_{i}\theta^i.$

Now consider the situation in which a coin that lands heads with probability $\theta$ is tossed at times $1,2,\dots$ . The random variable $H_j$ is $1$ if the $j$ th toss produces heads, $0$ otherwise. Define $A_n := (H_1 + \dots + H_n)/n$ . It is immediate from the formula of $F_n$ that $F_n$ is the distribution function of $A_n$ , and so $m_{n,k}$ is the $k$ th moment of $A_n$ . However, one can calculate the $k$ th moment of $A_n$ explicitly. Let $f\colon [k] \to [n]$ be chosen uniformly at random, independently of the tosses, let $Im_f$ be the cardinality of the image of $f$ and denote by $p_i = p_{n,k,i} := \operatorname{Pr}(Im_f = i)$ . Using $f, Im_f$ and $p_i$ , we obtain $\begin{aligned}\operatorname{E}A_n^k & = \operatorname{E}\left(\frac{H_1 + \dots + H_n}{n}\right)^k = \operatorname{E}H_{f(1)}\dots H_{f(k)} \\ & = \operatorname{E}\operatorname{E}[H_{f(1)}\dots H_{f(k)}\mid Im_f] = \operatorname{E}\theta^{Im_f} = \sum_{i=0}^n p_{i}\theta^i.\end{aligned}$ Therefore, for all $\theta\in [0,1]$ , we know that $\sum_{i=0}^n a_i\theta^i = \sum_{i=0}^n p_i\theta^i$ , and so $a_i = p_i$ for all $i=0,\dots, n$ .

Since neither $(a_i)$ nor $(p_i)$ depends on the sequence $(m_n)$ , $a_i = p_i$ holds in general. Since $p_k = p_{n, k, k} = \prod_{i=0}^{k-1}(1-i/n)\to 1$ as $n\to\infty$ , $p_i = 0$ for all $i > k$ , and the $p_i$ are nonnegative and sum to $1$ (so that $p_i \to 0$ for all $i < k$ ), we know that $\lim_n m_{n,k}= m_k$ .
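The relation $m_{n,k} = \sum_i p_{n,k,i}m_i$ is easy to test numerically. The sketch below does so for the moments $m_j = 1/(j+1)$ of the uniform distribution on $[0,1]$ , with $n = 10$ and $k = 3$ . The function names are made up for the example, and the closed form $p_{n,k,i} = S(k,i)\,n(n-1)\cdots(n-i+1)/n^k$ , where $S(k,i)$ is the Stirling number of the second kind, is a standard counting fact that is not used in the proof above. The iterated differences lose floating-point precision quickly, so this is only meant for small $n$ .

```javascript
// Binomial coefficient C(n, i) as a floating-point number.
function binomial(n, i) {
  let c = 1;
  for (let j = 1; j <= i; j++) c = (c * (n - i + j)) / j;
  return c;
}

// (D^r m)_s for the difference operator (D a)_n = a_n - a_{n+1}.
function iteratedDifference(m, r, s) {
  let a = m.slice();
  for (let step = 0; step < r; step++) {
    a = a.slice(0, -1).map((x, j) => x - a[j + 1]);
  }
  return a[s];
}

// k-th moment of F_n, which puts mass C(n,i) (D^{n-i} m)_i at the point i/n.
function momentOfFn(m, n, k) {
  let total = 0;
  for (let i = 0; i <= n; i++) {
    const mass = binomial(n, i) * iteratedDifference(m, n - i, i);
    total += mass * Math.pow(i / n, k);
  }
  return total;
}

// Stirling numbers of the second kind S(k, i) via the standard recurrence.
function stirling2(k, i) {
  if (k === 0 && i === 0) return 1;
  if (k === 0 || i === 0) return 0;
  return i * stirling2(k - 1, i) + stirling2(k - 1, i - 1);
}

// p_{n,k,i}: probability that a uniformly random f: [k] -> [n] has image size i.
function imageSizeProbability(n, k, i) {
  let falling = 1;
  for (let j = 0; j < i; j++) falling *= n - j;
  return (stirling2(k, i) * falling) / Math.pow(n, k);
}

// The same moment computed as sum_i p_{n,k,i} m_i.
function momentViaP(m, n, k) {
  let total = 0;
  for (let i = 0; i <= Math.min(n, k); i++) {
    total += imageSizeProbability(n, k, i) * m[i];
  }
  return total;
}

// Moments of the uniform distribution on [0,1]: m_j = 1/(j+1), j = 0..10.
const uniformMoments = Array.from({ length: 11 }, (_, j) => 1 / (j + 1));
const directMoment = momentOfFn(uniformMoments, 10, 3);
const momentFromP = momentViaP(uniformMoments, 10, 3);
console.log(directMoment, momentFromP);
```

Both printed values should be approximately $0.275$ , already close to the limit $m_3 = 0.25$ .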

Using the Helly–Bray Theorem, since $(F_n)$ is tight, there exist a distribution function $F$ and a subsequence $(F_{n_j})$ such that $F_{n_j}$ converges weakly to $F$ . The definition of weak convergence implies that $\int_{[0,1]} x^k dF(x) = \lim_j \int_{[0,1]}x^k dF_{n_j}(x) = \lim_j m_{n_j,k} = m_k.$ Therefore, the random variable $X$ with distribution function $F$ is supported on $[0,1]$ and its $k$ th moment is $m_k$ . [qed]

There are two other classical moment problems: the Hamburger moment problem and the Stieltjes moment problem.