Minimal Distance to Pi

Here is a problem from Week of Code 29 hosted by Hackerrank.

Problem. Given two integers q_1 and q_2 (1\le q_1 \le q_2 \le 10^{15}), find and print a common fraction p/q such that q_1 \le q \le q_2 and \left|p/q-\pi\right| is minimal. If there are several fractions having minimal distance to \pi, choose the one with the smallest denominator.

Note that checking all possible denominators does not work as iterating for 10^{15} times would exceed the time limit (2 seconds for C or 10 seconds for Ruby).

The problem setter suggested the following algorithm in the editorial of the problem:

  1. Given q, it is easy to compute p such that r(q) := p/q is the closest rational to \pi among all rationals with denominator q.
  2. Find the semiconvergents of the continued fraction of \pi with denominators \le 10^{15}.
  3. Start from q = q_1, and at each step increase q by the smallest denominator d of a semiconvergent such that r(q+d) is closer to \pi than r(q). Repeat until q exceeds q_2.

Given q, let d = d(q) be the smallest increment to the denominator q such that r(q+d) is closer to \pi than r(q). To justify the algorithm, one needs to prove the d is the denominator of one of the semiconvergents. The problem setter admits that he does not have a formal proof.

Inspired by the problem setter’s approach, here is a complete solution to the problem. Note that \pi should not be special in this problem, and can be replaced by any other irrational number \theta. Without loss of generality, we may assume that \theta\in(0,1).

We first introduce the Farey intervals of \theta.

  1. Start with the interval (0/1, 1/1).
  2. Suppose the last interval is (a/b, c/d). Cut it by the mediant of a/b and c/d and choose one of the intervals (a/b, (a+c)/(b+d)), ((a+c)/(b+d), c/d) that contains \theta as the next interval.

We call the intervals appeared in the above process Farey intervals of \theta. For example, take \theta = \pi - 3 = 0.1415926.... The Farey intervals are:

\begin{gathered}(0/1, 1/1), (0/1, 1/2), (0/1, 1/3), (0/1, 1/4), (0/1, 1/5), \\ (0/1, 1/6), (0/1, 1/7), (1/8, 1/7), (2/15, 1/7),\cdots\end{gathered}

The Farey sequence of order n, denoted by F_n, is the sequence of completely reduced fractions between 0 and 1 which when in lowest terms have denominators less than or equal to n, arranged in order of increasing size. Fractions which are neighboring terms in any Farey sequence are known as a Farey pair. For example, Farey sequence of order 5 is
F_5 = (0/1, 1/5, 1/4, 1/3, 2/5, 1/2, 3/5, 2/3, 3/4, 4/5, 1/1).

Using the Stern–Brocot tree, one can prove that

Lemma 1. For every Farey interval (a/b, c/d) of \theta, the pair (a/b, c/d) is a Farey pair. Conversely, for every Farey pair (a/b, c/d), if \theta \in (a/b, c/d), then (a/b, c/d) is a Farey interval.
We say p/q is a good rational approximation of \theta if every rational between p/q and \theta (exclusive) has a denominator greater than q.

By the definition of Farey sequence, incorporating with Lemma 1, we know that

Lemma 2. A rational is an endpoint of a Farey interval of \theta if and only if it is a good rational approximation of \theta.

In fact, one can show that the endpoints of Farey intervals and semiconvergents of continued fraction are the same thing! Thereof, the problem setter’s claim follows immediately from:

Proposition. Given q, let r(q) = p / q be the rational closest to \theta with denominator q. If d = d(q) is the minimal increment to q such that r(q + d) = (p + c) / (q + d) is closer to \theta than r(q), then c/d is a good rational approximation.

Remark. The proposition states that the increments to p/q always come from a good rational approximation. It is stronger than the problem setter’s statement, which only asserts that the increment to the q comes from a good rational approximation.

Proof. In (x, y)-plane, plot the trapezoid defined by

\left| y/x - \theta \right| < \left|p/q - \theta\right|, q < x < q + d.
Geometric interpretation

Also we interpret rational numbers p/q, (p+c)/(q+d) as points A = (q, p), B = (q+d, p+c). Let the line through (q, p) parallel to y=\theta x intersect the vertical line x = q+d at C = (q+d, p+\theta d). By the definition of d, we know that the trapezoid does not contain lattice points. In particular, there is no lattice point in the interior of the triangle ABC. In the coordinate system with origin at A, B has coordinate (d, c) and the line through A, C is y = \theta x. Since triangle ABC contains no lattice points, there is no (b, a) with b < d such that a/b is between \theta and c/d. In other words, c/d is a good rational approximation. [qed]

Here is a fine print of the algorithm. Because floats may not have enough precision for the purpose of computation, we will instead use a convergent of the continuous fraction of \pi instead. All the computations will then happen in \mathbb{Q}. Finally, we present the algorithm.

P = Rational(5706674932067741, 1816491048114374) - 3
# find endpoints of Farey intervals
a, b, c, d = 0, 1, 1, 1
farey = [[a,b],[c,d]]
while (f = b + d) <= max - min
  farey << [e = a + c, f]
  if P < Rational(e, f)
    c, d = e, f
    a, b = e, f
min, max =
p_min = (P * q).round
# increase p_min/min by frations in farey
while min <= max
  c, d = nil, nil
  farey.each do |a, b|
    break if min + b > max
    if (Rational(p_min + a, min + b) - P).abs < (Rational(p_min, min) - P).abs
      c, d = a, b
  break if d == nil
  p_min += c; min += d
puts "#{p_min + 3 * min}/#{min}"

A Short Proof of the Nash-Williams' Partition Theorem


  1. \mathbb{N} – the set of natural numbers;
  2. \binom{M}{k} – the family of all subsets of M of size k;
  3. \binom{M}{<\omega} – the family of all finite subsets of M;
  4. \binom{M}{\omega} – the family of all infinite subsets of M;

The infinite Ramsey theorem, in its simplest form, states that for every partition \binom{\mathbb{N}}{k} = \mathcal{F}_1 \sqcup \dots \sqcup \mathcal{F}_r, there exists an infinite set M\subset \mathbb{N} such that \binom{M}{k}\subset \mathcal{F}_i for some i\in [r]. The Nash-Williams‘ partition theorem can be seen as a strengthening of the infinite Ramsey theorem, which considers a partition of a subset of \binom{\mathbb{N}}{<\omega}.


  1. \mathcal{F}\restriction M\mathcal{F}\cap 2^M, that is, the set \{s\in\mathcal{F} : s\subset M\}.
  2. s \sqsubset t , where s,t are subsets of \mathbb{N}s is an initial segment of t, that is s = \{n\in t : n \le \max s\}.

Definition. Let set \mathcal{F} \subset \binom{\mathbb{N}}{<\omega}.

  1. \mathcal{F} is Ramsey if for every partition \mathcal{F}=\mathcal{F}_1\sqcup \dots\sqcup\mathcal{F}_r and every M\in\binom{\mathbb{N}}{\omega}, there is N\in\binom{M}{\omega} such that \mathcal{F}_i\restriction N = \emptyset for all but at most one i\in[r].
  2. \mathcal{F} is a Nash-Williams family if for all s, t\in\mathcal{F}, s\sqsubset t \implies s = t.

Theorem (Nash-Williams 1965). Every Nash-Williams family is Ramsey.

The proof presented here is based on the proof given by Prof. James Cummings in his Infinite Ramsey Theory class. The purpose of this rewrite is to have a proof that resembles the one of the infinite Ramsey theorem.

Notation. Let s\in\binom{\mathbb{N}}{<\omega} and M\in\binom{\mathbb{N}}{\omega}. Denote $$[s, M] = \left\{t \in \binom{\mathbb{N}}{<\omega} : t \sqsubset s \text{ or } (s \sqsubset t \text{ and } t\setminus s \subset M)\right\}.$$

Definition. Fix \mathcal{F}\subset \binom{\mathbb{N}}{<\omega} and s\in \binom{\mathbb{N}}{<\omega}.

  1. M accepts s if [s, M]\cap \mathcal{F}\neq \emptyset and M rejects s otherwise;
  2. M strongly accepts s if every infinite subset of M accepts s;
  3. M decides s if M either rejects s or strongly accepts it.

We list some properties that encapsulates the combinatorial characteristics of the definitions above.


  1. If M decides (or strongly accepts, or rejects) s and N\subset M, then N decides (respectively strongly accepts, rejects) s as well.
  2. For every M\in\binom{\mathbb{N}}{\omega} and s\in\binom{\mathbb{N}}{<\omega}, there is N_1\in\binom{M}{\omega} deciding s. Consequently, there is N_2\in\binom{M}{\omega} deciding every subset of s.

Proof of Theorem. Enough to show that if \mathcal{F} = \mathcal{F}_1\sqcup \mathcal{F}_2, then for every M\in\binom{\mathbb{N}}{\omega}, there is infinite N\in \binom{M}{\omega} such that F_i \restriction N = \emptyset for some i\in[2].

We are going to use \mathcal{F}_1 instead of \mathcal{F} in the definitions of “accept”, “reject”, “strongly accept” and “decide”. Find N\in \binom{M}{\omega} that decides \emptyset. If N rejects \emptyset, by definition \mathcal{F}_1\restriction N = [\emptyset, N]\cap \mathcal{F}_1 = \emptyset. Otherwise N strongly accepts \emptyset.

Inductively, we build a decreasing sequence of infinite sets N \supset N_1 \supset N_2\supset \dots , an increasing sequence of natural numbers n_1, n_2, \dots, and maintain that n_i\in N_i, n_i < \min N_{i+1} and that N_i strongly accepts every s\subset \{n_j : j < i\}. Initially, we take N_1 = N as N strongly accepts \emptyset.

A mental picture of the construction.

Suppose N_1 \supset \dots \supset N_i and n_1 < \dots < n_{i-1} have been constructed. In the following lemma, when taking M = N_i and s = \{n_j : j < i\}, it spits out m and N, which are exactly what we need for n_i and N_{i+1} to finish the inductive step.

Lemma. Suppose M\in\binom{\mathbb{N}}{\omega}, s\in\binom{\mathbb{N}}{<\omega} and \max s < \min M. If M strongly accepts every subset of s, then there are m \in M and N \in \binom{M}{\omega} such that n < \min N and N strongly accepts every subset of s\cup \{n\}

Proof of lemma. We can build M = M_0 \supset M_1\supset M_2 \supset \dots such that for every i, m_i := \min M_i < \min M_{i+1} and M_{i+1} decides every subset of s\cup \{m_i\}. It might happen that M_{i+1} rejects a subset of s\cup \{m_i\}. However, we claim that this cannot happen for infinitely many times.

Otherwise, by the pigeonhole principle, there is t\subset s such that I = \{i : M_{i+1} \text{ rejects }t\cup\{m_{i}\}\} is infinite. Let M' = \{m_i : i\in I\}. Note that [t, M'] \subset \cup_i [t\cup\{m_i\}, M_{i+1}], and so [t,M']\cap \mathcal{F}_1\subset \cup_i \left([t\cup\{m_i\}, M_{i+1}]\cap\mathcal{F}_1\right) = \emptyset. Hence M'\subset M rejects t\subset s, which is a contradiction.

Now we pick one i such that M_{i+1} strongly accepts every subset of s\cup\{m_i\}, and it is easy to check that m = m_i and N = M_{i+1} suffice. [qed]

Finally, we take N_\infty = \{n_1, n_2, \dots\}. For any s\in\binom{N_\infty}{<\omega}, there is i such that s\subset \{n_1, \dots, n_{i-1}\}. Note that N_i strongly accepts s and N_\infty\subset N_i. Therefore N_\infty (strongly) accepts s, that is [s, N_\infty]\cap \mathcal{F}_1 \neq \emptyset, and say t\in [s, N_\infty]\cap \mathcal{F}_1. Because t\in\mathcal{F}_1 and \mathcal{F} = \mathcal{F}_1 \sqcup \mathcal{F}_2 is a Nash-Williams family, s\notin \mathcal{F}_2. [qed]

Alternative to Beamer for Math Presentation

Although using blackboard and chalk is the best option for a math talk for various reasons, sometimes due to limit on the time, one has to make slides to save time on writing. The most common tools to create slides nowadays are LaTeX and Beamer.

When I was preparing for my talk at Vancouver for Connections in Discrete Mathematics in honor of the work of Ron Graham, as it is my first ever conference talk, I decided to ditch Beamer due to my lack of experience. Finally, I ended up using html+css+javascript to leverage my knowledge in web design.

The javascript framework I used is reveal.js. Though there are other options such as impress.js, reveal.js fits better for a math talk. One can easily create a text-based presentation with static images / charts. The framework also has incorporated with MathJax as an optional dependency, which can be added with a few lines of code. What I really like about reveal.js as well as impress.js is that they provide a smooth spatial transition between slides. However, one has to use other javascript library to draw and animate diagrams. For that, I chose raphael.js, a javascript library that uses SVG and VML for creating graphics so that users can easily, for example, create their own specific chart. The source code of the examples on the official website is really a good place to start.

To integrate reveal.js and raphael.js to achieve a step-by-step animation of a diagram, I hacked it by adding a dummy fragment element in my html document so that reveal.js can listen to the fragmentshown event and hence trigger raphael.js to animate the diagram. In cases where the diagrams are made of html elements, I used jQuery to control the animation. Here is my favorite animation in the slides generated by jQuery.

How does mathematics progress?

However, one has to make more effort to reverse the animation made by raphael.js or jQuery if one wants to go backwards in slides. I did not implement any reverse animation since I did not plan to go back in slides at all.

In case there is no internet access during the presentation, one has to have copies of all external javascript libraries (sometimes also fonts), which, in my case, are MathJax, raphael.js and jQuery. In order to use MathJax offline, one need to configure reveal.js.

Currently, my slides only work on Chrome correctly. There is another bug that I have not figured out yet. If I start afresh from the first slide, then my second diagram generated by Raphael is not rendered correctly. I got around it by refreshing the slide where the second diagram lives. This is still something annoying that I would like to resolve.

After all, I really like this alternative approach of making slides for math presentation because it enables me to implement whatever I imagine.

A Short Proof for Hausdorff Moment Problem

Hausdorff moment problem asks for necessary and sufficient conditions that a given sequence (m_n) with m_0=1 be the sequence of moments of a random variable X supported on [0,1], i.e., \operatorname{E}X^n=m_n for all n.

In 1921, Hausdorff showed that (m_n) is such a moment sequence if and only if the sequence is completely monotonic, i.e., its difference sequences satisfy the equation (D^r m)_s \ge 0 for all r, s \ge 0. Here D is the difference operator on the space of real sequences (a_n) given by D a = (a_{n} - a_{n+1}).

The proof under the fold follows the outline given in (E18.5 – E18.6) Probability with Martingales by David Williams.

Proof of Necessity. Suppose (m_n) is the moment sequence of a random variable X supported on [0,1]. By induction, one can show that (D^r m)_s = \operatorname{E}(1-X)^rX^s. Clearly, as X is supported on [0,1], the moment sequence is completely monotonic.

Proof of Sufficiency. Suppose (m_n) is a completely monotonic sequence with m_0 = 1.

Define F_n(x) := \sum_{i \le nx}{n\choose i}(D^{n-i}m)_i. Clearly, F_n is right-continuous and non-decreasing, and F_n(0^-) = 0. To prove F_n(1) = 1, one has to prove the identity \sum_{i}{n\choose i}(D^{n-i}m)_i = m_0.

A classical trick. Since the identity above is about vectors in the linear space (over the reals) spanned by (m_n) and the linear space spanned by (m_n) is isomorphic to the one spanned by (\theta^n), the identity is equivalent to \sum_{i}{n\choose i}(D^{n-i}\theta)_i = \theta^0, where \theta_n = \theta^n. Now, we take advantage of the ring structure of \mathbb{R}[\theta]. Notice that (D^{r}\theta)_s = (1-\theta)^r\theta^s. Using the binomial theorem, we obtain \sum_{i}{n\choose i}(D^{n-i}\theta)_i = \sum_{i}{n\choose i}(1-\theta)^{n-i}\theta^i = (1-\theta + \theta)^n = \theta^0.

Therefore F_n is a bona fide distribution function. Define m_{n, k} := \int_{[0,1]} x^kdF_n(x), i.e., m_{n,k} is the kth moment of F_n. We now find an explicit formula for m_{n,k}.

Noticing that F_n is constant, say c_{n,i}, on [\frac{i}{n}, \frac{i+1}{n}), for all i=0, \dots, n-1 and c_{n,i} is a linear combination of m_0, \dots, m_n, we know that m_{n,k} = \sum_{i=0}^n a_{n,k,i}m_i.

Just like what we did in proving the identity, we use the special case m_n = \theta^n to compute the coefficients a_i = a_{n,k,i}, where 0 \le \theta \le 1. In this case, F_n(x) = \sum_{i \le nx}{n\choose i}(D^{n-i}\theta)_i = \sum_{i\le nx}{n\choose i}(1-\theta)^{n-i}\theta^i, m_{n,k} = \sum_{i=0}^n a_{i}\theta^i.

Now consider the situation in which a coin with probability \theta is tossed at times 1,2,\dots. The random variable H_k is 1 if the kth toss produces heads, 0 otherwise. Define A_n := (H_1 + \dots + H_n)/n. It is immediate from the formula of F_n that F_n is the distribution function of A_n, and so m_{n,k} is the kth moment of A_n. However, one can calculate the kth moment of A_n explicitly. Let f\colon [k] \to [n] be chosen uniformly at random, Im_f be the cardinality of the image of f and denote by p_i = p_{n,k,i} := \operatorname{Pr}(Im_f = i). Using f, Im_f and p_i, we obtain \begin{aligned}\operatorname{E}A_n^k & = \operatorname{E}\left(\frac{H_1 + \dots + H_n}{n}\right)^k = \operatorname{E}H_{f(1)}\dots H_{f(k)} \\ & = \operatorname{E}\operatorname{E}[H_{f(1)}\dots H_{f(k)}\mid Im_f] = \operatorname{E}\theta^{Im_f} = \sum_{i=0}^n p_{i}\theta^i.\end{aligned} Therefore, for all \theta\in [0,1], we know that \sum_{i=0}^n a_i\theta^i = \sum_{i=0}^n p_i\theta^i, and so a_i = p_i for all i=0,\dots, n.

As both (a_i) and (p_i) do not depend on m_i, a_i = p_i holds in general. Since p_k = p_{n, k, k} = \prod_{i=0}^{k-1}(1-i/n)\to 1 as n\to\infty and p_i = 0 for all i > k, we know that \lim_n m_{n,k}= m_k.

Using the Helly–Bray Theorem, since (F_n) is tight, there exists a distribution function F and a subsequence (F_{k_n}) such that F_{k_n} converges weakly to F. The definition of weak convergence implies that \int_{[0,1]} x^k dF(x) = \lim_n \int_{[0,1]}x^k dF_{k_n}(x) = \lim_n m_{k_n,k} = m_k. Therefore, the random variable X with distribution function F is supported on [0,1] and its kth moment is m_k. [qed]

There are other two classical moment problems: the Hamburger moment problem and the Stieltjes moment problem.



欺诈猜数游戏在甲和乙之间进行,甲和乙都知道正整数 kn。游戏开始时,甲先选定两个整数 xN,其中 1\leq x\leq N。甲告诉乙 N 的值,但对 x 守口如瓶。乙试图通过提问来获得与 x 相关的信息:每次提问,乙任选一个由若干正整数组成的集合S(可以重复使用之前提问中使用过的集合),问甲“x是否属于S?”。乙可以提任意数量的问题。每次提问之后,甲立刻回答是或否,甲可以说谎话,但在任意连续 k+1 次回答中至少有一次回答是真话。

在乙问完所有想问的问题之后,乙必须指出一个至多包含 n 个正整数的集合 X,若 x 属于 X,则乙获胜;否则甲获胜。证明:对所有充分大的整数 k,存在整数 n\geq 1.99^k,使得乙无法保证获胜。

为了解决第二个问题,需要从甲的角度考虑问题,每一次回答问题后,使得乙无法缩小 x 范围。

为此我们考虑贪心策略:甲每次在乙提出的集合和其补集中选择元素较多的那个集合。很明显这个策略有一个弱点:如果乙每次提问都问“x 是不是 1 呢?”那甲如果每次只顾眼前利益,应当回答“不是。”但经过 k+1 轮之后,甲就能轻松排除 1,从而缩小 x 的范围。

于是甲的策略必须是一种折中的策略,既要顾及眼前的利益,又不能放弃长期的利益。我们设定 N=n+1

于是甲在觉得到底要选择乙提出的集合或是其补集前,对 1n+1 中每个数都进行打分。甲对 i 的打分是 a^{m_i},其中a = 1.999m_i 满足 i 在最近连续 m_i 次选择的集合中都不出现。根据每个元素的打分,甲在 SS^c 中选择会使得之后整个集合总分较低的那个集合。

为了让证明成立,设定 n = (2-a)a^{k+1} - 1。下面,我们通过归纳法证明每一次所有元素的总分不超过 a^{k+1}。根据打分规则,最开始总分是 n = (2-a)a^{k+1}-1 < a^{k+1},命题显然那成立。假设目前的总分是 y = \sum_{i=1}^{n+1} a^{m_i},乙给出集合 S,如果甲选 S,则总和变为 y_1 = \sum_{i\in S}1 + \sum_{i\notin S}a^{m_i+1},如果甲选 S^c,则总和变为 y_2 = \sum_{i\notin S}1 + \sum_{i\in S}a^{m_i+1}。由于甲会选择使得总和变得更小的那个策略,于是他回答后总和至多为 \frac{1}{2}(y_1 + y_2) = \frac{1}{2}(ay+n+1)。由于 y < a^{k+1},总和小于 \frac{1}{2}(a\cdot a^{k+1}+(2-a)a^{k+1}) = a^{k+1}。命题成立!

如果 i 连续 k+1 次没有被甲选中,则其得分已经为 a^{k+1}。这不可能,因为总分总是小于 a^{k+1}。也就是说在这样的策略下乙无法缩小 x 的可能范围。证毕!

An Upper Bound on Stirling Number of the Second Kind

We shall show an upper bound on the Stirling number of the second kind, a byproduct of a homework exercise of Probabilistic Combinatorics offered by Prof. Tom Bohman.

Definition. A Stirling number of the second kind (or Stirling partition number) is the number of ways to partition a set of n objects into k non-empty subsets and is denoted by S(n,k).

Proposition. For all n, k, we have S(n,k) \leq \frac{k^n}{k!}\left(1-(1-1/m)^k\right)^m.

Proof. Consider a random bipartite graph with partite sets U:=[n], V:=[k]. For each vertex u\in U, it (independently) connects to exactly one of the vertices in V uniformly at random. Suppose X is the set of non-isolated vertices in V. It is easy to see that \operatorname{Pr}\left(X=V\right) = \frac{\text{number of surjections from }U\text{ to }V}{k^n} = \frac{k!S(n,k)}{k^n}.

On the other hand, we claim that for any \emptyset \neq A \subset [k] and i \in [k]\setminus A, \operatorname{Pr}\left(i\in X \mid A\subset X\right) \leq \operatorname{Pr}\left(i\in X\right). Note that the claim is equivalent to \operatorname{Pr}\left(A\subset X \mid i\notin X\right) \geq \operatorname{Pr}\left(A\subset X\right). Consider the same random bipartite graph with V replaced by V':=[k]\setminus \{i\} and let X' be the set of non-isolated vertices in V'. The claim is justified since \operatorname{Pr}\left(A\subset X\mid i\notin X\right) = \operatorname{Pr}\left(A\subset X'\right) \geq \operatorname{Pr}\left(A\subset X\right).

Set A:=[i-1] in above for i = 2, \ldots, k. Using the multiplication rule with telescoping the conditional probability, we obtain \begin{aligned}\operatorname{Pr}\left(X=V\right) =& \operatorname{Pr}\left(1\in X\right)\operatorname{Pr}\left(2\in X \mid [1]\subset X\right) \\ & \ldots \operatorname{Pr}\left(k\in X\mid [k-1]\subset X\right)\\ \leq & \operatorname{Pr}\left(1\in X\right)\operatorname{Pr}\left(2\in X\right)\ldots\operatorname{Pr}\left(k\in X\right) \\ = & \left(1-(1-1/m)^k\right)^m.\end{aligned} [qed]

A Probabilistic Proof of Isoperimetric Inequality

This note is based on Nicolas Garcia Trillos’ talk, Some Problems and Techniques in Geometric Probability, at Carnegie Mellon University on January 29, 2015.

In particular, we demonstrate a probabilistic proof of the isoperimetric inequality. The proof can also be found in Integral Geometry and Geometric Probability by Luis A. Santaló.

Theorem. For a convex set with perimeter L and area A, the isoperimetric inequality states that 4\pi A\leq L^2, and that equality holds if and only if the convex set is a disk. (Assume that the boundary is a closed convex curve of class C^1.)

Proof. Let P(s) parametrize the boundary by its arc length s. Given s and \theta. Suppose the line through P(s) whose angle to the tangent line equals \theta intersects the boundary at another point Q(s). Let \sigma(s, \theta) be the length of the chord between the two intersections P(s), Q(s). Consider the integral \int (\sigma_1\sin\theta_2 - \sigma_2\sin\theta_1)^2 \mathrm{d}s_1\mathrm{d}\theta_1\mathrm{d}s_2\mathrm{d}\theta_2, where the integration extends over 0 \leq s_1, s_2 \leq L and 0 \leq \theta_1, \theta_2 \leq \pi.

Random variables.

Expanding the square in the integrand, we obtain that the integral is equal to \pi L \int \sigma^2\mathrm{d}s\mathrm{d}\theta - 2\left(\int \sigma\sin\theta\mathrm{d}s\mathrm{d}\theta\right)^2.
On one hand, we have \int \sigma^2\mathrm{d}s\mathrm{d}\theta = \int_0^L\int_0^\pi \sigma^2\mathrm{d}\theta\mathrm{d}s = \int_0^L 2A\mathrm{d}s = 2LA.

Change variables.

On the other hand, let p be the distance from the chord to the origin and \phi the angle from the x-axis to the chord. Suppose the angle from the x-axis to the tangent line is \tau. We have p = \langle x, y\rangle\cdot\langle \sin\phi, -\cos\phi \rangle = x\sin\phi - y\cos\phi. Differentiating the latter, we get \mathrm{d}p = \sin\phi\mathrm{d}x - \cos\phi\mathrm{d}y + (x\cos\phi + y\sin\phi)\mathrm{d}\phi. Moreover, we know that \mathrm{d}x = \cos\tau\mathrm{d}s, \mathrm{d}y = \sin\tau\mathrm{d}s. Therefore \mathrm{d}p = \sin\phi\cos\tau\mathrm{d}s - \cos\phi\sin\tau\mathrm{d}s + + (x\cos\phi + y\sin\phi)\mathrm{d}\phi, and so \mathrm{d}p\mathrm{d}\phi = \sin(\phi - \tau)\mathrm{d}s\mathrm{d}\phi. Since \theta + \phi = \tau and \mathrm{d}\theta + \mathrm{d}\phi = \tau'\mathrm{d}s, we have \mathrm{d}p\mathrm{d}\phi = -\sin\theta\mathrm{d}s\mathrm{d}\theta, and so \int\sigma\sin\theta\mathrm{d}s\mathrm{d}\theta = \int_0^{2\pi}\int_{-\infty}^\infty \sigma\mathrm{d}p\mathrm{d}\theta = 2\pi A.

Since the integral is non-negative, we have that 2\pi A(L^2 - 4\pi A)\geq 0, and so 4\pi A \leq L^2. The equality is achieved if and only if \sigma / \sin\theta is a constant, in which case the boundary is a circle. [qed]

Remark. The proof is considered a probabilistic proof because the differential form \mathrm{d}p\mathrm{d}\theta is the measure (invariant under rigid motion) of a random line.

On Extension of Rationals

This note is based on my talk An Expedition to the World of p-adic Numbers at Carnegie Mellon University on January 15, 2014.

Construction from Cauchy Sequences

A standard approach to construct the real numbers from the rational numbers is a procedure called completion, which forces all Cauchy sequences in a metric space by adding new points to the metric space.

Definition. A norm, denoted by |\cdot|, on the field F is a function from F to the set of nonnegative numbers in an ordered field R such that (1) |x|=0 if and only if x=0; (2) |xy|=|x||y|; (3) |x+y|\leq|x|+|y|.

Remark. The ordered field is usually chosen to be \mathbb{R}. However, to construct of \mathbb{R}, the ordered field is \mathbb{Q}.

A norm on the field F naturally gives rise to a metric d(x,y)=|x-y| on F. For example, the standard metric on the rationals is defined by the absolute value, namely, d(x, y) = |x-y|, where x,y\in\mathbb{Q}, and |\cdot| is the standard norm, i.e., the absolute value, on \mathbb{Q}.

Given a metric d on the field F, the completion procedure considers the set of all Cauchy sequences on F and an equivalence relation
(a_n)\sim (b_n)\text{ if and only if }d(a_n, b_n)\to 0\text{ as }n\to\infty.

Definition. Two norms on F, |\cdot|_1, |\cdot|_2, are equivalent if and only if for every sequence (a_n), it is a Cauchy sequence with respect to d_1 if and only if it is so with respect to d_2, where d_1, d_2 are the metrics determined by |\cdot|_1, |\cdot|_2 respectively.

Remark. It is reasonable to worry about the situation in which two norms |\cdot|_1 and |\cdot|_2 are equivalent, but they introduce different equivalent relationships on the set of Cauchy sequences. However, given two equivalent norms, |a_n|_1 converges to 0 if and only if it does so with respect to d_2. (Hint: prove by contradiction and consider the sequence (1/a_n).)

Definition. The trivial norm on F is a norm |\cdot| such that |0|=0 and |x|=1 for x\neq 0.

Since we are interested in norms that generate different completions of the field, it would be great if we can classify all nontrivial norms modulo the norm equivalence.

Alternative Norms on Rationals

Definition. Let p be any prime number. For any non-zero integer a, let \mathrm{ord}_pa be the highest power of p which divides a. For any rational x=a/b, define \mathrm{ord}_px=\mathrm{ord}_pa-\mathrm{ord}_pb. Further define a map |\cdot|_p on \mathbb{Q} as follows: |x|_p = \begin{cases}p^{-\mathrm{ord}_px} & \text{if }x\neq 0 \\ 0 & \text{if }x=0\end{cases}.

Proposition. |\cdot|_p is a norm on \mathbb{Q}. We call it the p-adic norm on \mathbb{Q}.

Proof (sketch). We only check the triangle inequality. Notice that \begin{aligned}\mathrm{ord}_p\left(\frac{a}{b}-\frac{c}{d}\right) & = \mathrm{ord}_p\left(\frac{ad-bc}{bd}\right) \\ & = \mathrm{ord}_p(ad-bc)-\mathrm{ord}_p(bd) \\ & \geq \min(\mathrm{ord}_p(ad), \mathrm{ord}_p(bc)) - \mathrm{ord}_p(bd) \\ &= \min(\mathrm{ord}_p(ad/bd), \mathrm{ord}_p(bc/bd)).\end{aligned} Therefore, we obtain |a/b-c/d|_p \leq \max(|a/b|_p, |c/d|_p) \leq |a/b|_p+|c/d|_p. [qed]

Remark. Some counterintuitive fact about the p-adic norm are
The following theorem due to Ostrowski classifies all possible norms on the rationals up to norm equivalence. We denote the standard absolute value by |\cdot|_\infty.

Theorem (Ostrowski 1916). Every nontrivial norm |\cdot| on \mathbb{Q} is equivalent to |\cdot|_p for some prime p or for p=\infty.

Proof. We consider two cases (i) There is n\in\{1,2,\ldots\} such that |n|>1; (ii) For all n\in\{1,2,\ldots\}, |n|\leq 1. As we shall see, in the 1st case, the norm is equivalent to |\cdot|_\infty, whereas, in the 2nd case, the norm is equivalent to |\cdot|_p for some prime p.

Case (i). Let n_0 be the least such n\in\{1,2,\ldots\} such that |n|>1. Let \alpha > 0 be such that |n_0|=n_0^\alpha.

For every positive integer n, if n_0^s \leq n < n_0^{s+1}, then we can write it in n_0-base: n = a_0 + a_1n_0 + \ldots + a_sn_0^s.

By the choice of n_0, we know that |a_i|\leq 1 for all i. Therefore, we obtain \begin{aligned}|n| & \leq |a_0| + |a_1||n_0| + \ldots + |a_s||n_0|^s \\ & \leq 1 + |n_0| + \ldots |n_0|^s \\ & \leq n_0^{s\alpha}\left(1+n_0^{-\alpha}+n_0^{-2\alpha}+\ldots\right) \\ & \leq Cn^\alpha,\end{aligned} where C does not depend on n. Replace n by n^N and get |n|^N = |n^N| \leq Cn^{N\alpha}, and so |n|\leq \sqrt[N]{C}n^\alpha. As we can choose N to be arbitrarily large, we obtain |n| \leq n^\alpha.

On the other hand, we have \begin{aligned}|n| & \geq |n_0^{s+1}| - |n_0^{s+1}-n| \\ & \geq n_0^{(s+1)\alpha} - (n_0^{s+1}-n_0^s)^\alpha\\ & = n_0^{(s+1)\alpha}\left[1-(1-1/n_0)^\alpha\right] \\ & > C'n^\alpha.\end{aligned} Using the same trick, we can actually take C'=1.

Therefore |n| = n^\alpha. It is easy to see it is equivalent to |\cdot|_\infty.

Case (ii). Since the norm is nontrivial, let n_0 be the least n such that |n|<1.

Claim 1. n_0=p is a prime.

Claim 2. |q|=1 if q is a prime other than p.

Suppose |q| < 1. Find M large enough so that both |p^M| and |q^M| are less than 1/2. By Bézout’s lemma, there exists a,b\in\mathbb{Z} such that ap^M + bq^M = 1. However, 1 = |1| \leq |a||p^M| + |b||q^M| < 1/2 + 1/2 = 1, a contradiction.

Therefore, we know |n|=|p|^{ord_p n}. It is easy to see it is equivalent to |\cdot|_p. [qed]

Non-Archimedean Norm

As one might have noticed, the p-adic norm satisfies an inequality stronger than the triangle inequality, namely |a\pm b|_p\leq \max(|x|_p, |y|_p).

Definition. A norm is non-Archimedean provided |x\pm y|\leq \max(|x|, |y|).

The world of non-Archimedean norm is good and weird. Here are two testimonies.

Proposition (no more scalene triangles). If |x|\neq |y|, then |x\pm y| = \max(|x|, |y|).

Proof. Suppose |x| < |y|. On one hand, we have |x\pm y| \leq |y|. On the other hand, |y| \leq \max(|x\pm y|, |x|). Since |x| < |y|, we must have |y| \leq |x\pm y|. [qed]

Proposition (all points are centers). D(a, r^-) = D(b, r^-) for all b\in D(a, r^-) and D(a, r) = D(b,r) for all b\in D(a,r), where D(c, r^-) = \{x : |x-c|<r\} and D(c,r)=\{x:|x-c|\leq r\}.

Construction of p-adic Numbers

The p-adic numbers are the completion of \mathbb{Q} via the p-adic norm.

Definition. The set of p-adic numbers is defined as \mathbb{Q}_p = \{\text{Cauchy sequences with respect to }|\cdot|_p\} / \sim_p, where (a_n)\sim_p(b_n) iff |a_n-b_n|_p\to 0 as n\to\infty.

We would like to extend |\cdot|_p from \mathbb{Q} to \mathbb{Q}_p. When extending |\cdot|_\infty from \mathbb{Q} to \mathbb{R}, we set |[(a_n)]|_\infty to be [(|a_n|)], an element in \mathbb{R}. However, the values that |\cdot|_p can take, after the extension, are still in \mathbb{Q}.

Definition. The extension of |\cdot|_p on \mathbb{Q}_p is defined by |[(a_n)]|_p = \lim_{n\to\infty}|a_n|_p.

Remark. Suppose (a_n)\sim_p (a_n'). Then \lim_{n\to\infty}|a_n-a_n'|_p=0, and so \lim_{n\to\infty}|a_n|_p=\lim_{n\to\infty}|a_n'|_p. Moreover, one can prove that \lim_{n\to\infty}|a_n|_p always exists provided that (a_n) is a Cauchy sequence. (Hint: Suppose \lim_{n\to\infty}|a_n|_p > 0. There exists \epsilon > 0 such that |a_n|_p > \epsilon infinitely often. Choose N enough so that |a_m - a_n|_p < \epsilon for all m,n>N. Use ‘no more scalene triangles!’ property to deduce a contradiction.)

Representation of p-adic Numbers

Even though each real number is an equivalence class of Cauchy sequences, each equivalence class has a canonical representative. For instance, the canonical representative of \pi is (3, 3.1, 3.14, 3.141, 3.1415, \ldots). The analog for \mathbb{Q}_p is the following.

Theorem. Every equivalence class a in \mathbb{Q}_p for which |a|_p\leq 1 has exactly one representative Cauchy sequence of the form (a_i) for which (1) 0\leq a_i < p^i for i=1,2,3,\ldots; (2) a_i = a_{i+1} (\pmod p^i) for i=1,2,3,\ldots.

Proof of uniqueness. Prove by definition chasing.

Proof of existence. We shall repeatedly apply the following lemma.

Lemma. For every b\in\mathbb{Q} for which |b|_p \leq 1 and i\in\mathbb{N}, there exists a\in\{0, \ldots, p^i-1\} such that |a-b|_p \leq p^{-i}.

Proof of Lemma. Suppose b=m/n is in the lowest form. As |b|_p\leq 1, we know that (n, p^i)=1. By Bézout’s lemma, an+a'p^i=m for some integers a,a'. We may assume a\in\{0,\ldots,p^i-1\}. Note that a-b=a'p^i/n, and so |a-b|_p \leq p^{-i}. [qed]

Suppose (c_i) is a representative of a. As (c_i) is a Cauchy sequence, we can extract a subsequence (b_i) such that |b_i - b_j| \leq p^{-i} for all i < j which is still a representative of a. Using the lemma above, for each b_i, we can find 0 \leq a_i < q^i such that |a_i-b_i|_p \leq q^{-i}. Therefore (a_i) is a representative of a as well. For all i<j, we have |a_i-a_j|_p \leq \max(|a_i - b_i|_p, |b_i-b_j|_p, |b_j-a_j|_p) \leq q^{-i}, Therefore q^i divides a_i - a_j. [qed]

For |a|_p\leq 1, we write a=b_0 + b_1p + b_2p^2 + \ldots, where (b_{n-1}b_{n-2}\ldots b_0)_p = a_n.

What if |a|_p > 1? As |ap^m|=|a|/p^m, |ap^m|\leq 1 for some natural number m. By the representation theorem, we can write ap^m = b_0 + b_1p + b_2p^2 + \ldots, and a = b_0p^{-m} + b_1p^{-m+1} + b_2p^{-m+2} + \ldots.

Using the representation of p-adic numbers, one can perform arithmetic operations such as addition, subtraction, multiplication and division.

Like \mathbb{R}, \mathbb{Q}_p is not algebraically closed. Though \mathbb{C}, the algebraic closure of \mathbb{R}, has degree 2 over \mathbb{R}, and it is complete with respect to the absolute value, it is not so for \overline{\mathbb{Q}_p}, the algebraic closure of \mathbb{Q}_p. In fact, \overline{\mathbb{Q}_p} has infinite degree over \mathbb{Q}_p and is, unfortunately, not complete with respect to proper extension of |\cdot|_p. The good news is that the completion of \overline{\mathbb{Q}_p}, denoted by \Omega is algebraically closed.

Poisson Summation Formula and Basel Problem

This note is based on Professor Noam Elkies’ talk at Carnegie Mellon University on December 2, 2014.

A classical mathematical analysis problem, also known as the Basel problem, first posed by Pietro Mengoli in 1644 and solved by Leonhard Euler in 1735, asks the precise sum in closed form of the infinite series \sum_{n=1}^\infty \frac{1}{n^2} = \lim_{n\to\infty}\left(\frac{1}{1^2}+\frac{1}{2^2}+\ldots+\frac{1}{n^2}\right).

Here we present an attack to the Basel problem using the Poisson summation formula.

Poisson Summation Formula

The Poisson summation formula states that \sum_{n\in\mathbb{Z}}f(n) = \sum_{k\in\mathbb{Z}}\hat{f}(k), where f: \mathbb{R}\to\mathbb{R} is a “nice” function and \hat{f}(\omega):=\int_\mathbb{R}f(x)e^{-2\pi i\omega x}dx is the Fourier transformation of f.

For the sake of completeness, we are going to prove the Poisson summation formula and specify how “nice” the function should be for the formula to work.

Proof. Let R:\mathbb{R}\to\mathbb{R} be defined by R(t) = \sum_{n\in\mathbb{Z}}f(n+t). Because R(t+m) = R(t) for all m\in\mathbb{Z}, and so R is a periodic function, we can expand R(t) in the Fourier series R(t) = \sum_{k\in\mathbb{Z}}a_ke^{2\pi i kt}, where \begin{aligned}a_k & = \int_0^1R(t)e^{-2\pi ikt}dt \\ & = \int_0^1\sum_{n\in\mathbb{Z}}f(n+t)e^{-2\pi ikt}dt \\ & = \sum_{n\in\mathbb{Z}}\int_0^1f(n+t)e^{-2\pi ikt}dt \\ & = \sum_{n\in\mathbb{Z}}\int_n^{n+1}f(t)e^{-2\pi ikt}dt \\ & = \int_\mathbb{R}f(t)e^{-2\pi ikt}dt.\end{aligned} which equals the value at -k of the Fourier transform \hat{f} of f. We conclude that \sum_{n\in\mathbb{Z}}f(n) = R(0)=\sum_{k\in\mathbb{Z}}a_k = \sum_{k\in\mathbb{Z}}\hat{f}(k). [qed]

Remark. In order to use Fourier series for R, we need R\in C^1 to ensure that its Fourier series expansion \sum_{n\in\mathbb{Z}}a_ke^{2\pi ikt} converges to R(t) pointwise (and uniformly in fact). This allows us to evaluate R at 0 and equate it with \sum_{k\in\mathbb{Z}} a_k. On the other hand, we switched the integration and the infinite summation \int_0^1\sum_{n\in\mathbb{Z}}f(n+t)e^{-2\pi ikt}dt = \sum_{n\in\mathbb{Z}}\int_0^1f(n+t)e^{-2\pi ikt}dt, in the computation. The Fubini–Tonelli theorem says that this is legit if \int_0^1\sum_{n\in\mathbb{Z}}\left|f(n+t)e^{-2\pi ikt}\right|dt = \int_0^1\sum_{n\in\mathbb{Z}}\left|f(n+t)\right|dt < \infty or \sum_{n\in\mathbb{Z}}\int_0^1\left|f(n+t)e^{-2\pi ikt}\right|dt = \sum_{n\in\mathbb{Z}}\int_n^{n+1}|f(x)|dx = \int_\mathbb{R}|f(x)|dx < \infty.

Application to Basel Problem

Take f(x) = \frac{1}{x^2+c^2}, where c > 0 is a constant. For such f, R(t) = \sum_{n\in\mathbb{Z}}f(n+t) defined above converges uniformly. As each f(n+t) is continuous, so is R. As \sum_{n\in\mathbb{Z}}f'(n+t) = \sum_{n\in\mathbb{Z}}\frac{-2(t+n)}{((t+n)+c^2)^2} converges uniformly, we have that R'(t) = \sum_{n\in\mathbb{Z}}f'(n+t), and moreover, as f'(n+t) is continuous, so is R'. Now R is continuously differentiable, and so we can use Fourier series for R. On the other hand, since f is absolutely integrable, we can switch integral with summation. Therefore we can apply the Poisson summation formula.

Its Fourier transform is, by Cauchy’s integral formula, \hat{f}(\omega) = \int_\mathbb{R}\frac{1}{x^2+c^2}e^{-2\pi i\omega x}dx = \frac{\pi}{c}e^{-2\pi c|\omega|}. By the Poisson summation formula, we obtain \sum_{n\in\mathbb{Z}}\frac{1}{n^2+c^2} = \sum_{k\in\mathbb{Z}}\frac{\pi}{c}e^{-2\pi c|k|} = \frac{\pi}{c}\frac{e^{2\pi c}+1}{e^{2\pi c}-1}, and so \sum_{n=1}^\infty \frac{1}{n^2+c^2} = \frac{1}{2}\left(\frac{\pi}{c}\frac{e^{2\pi c}+1}{e^{2\pi c}-1}-\frac{1}{c^2}\right).

Sending c to 0^+, the left hand side converges to \sum_{n=1}^\infty\frac{1}{n^2} and the right hand side is equal to

\begin{aligned} & \frac{1}{2}\left(\frac{\pi}{c}\frac{2+2\pi c+2\pi^2 c^2 + \ldots}{2\pi c + 2\pi^2 c^2 + 4\pi^3c^3/3 + \ldots}-\frac{1}{c^2}\right) \\ = & \frac{1}{2}\left(\frac{(2 + 2\pi c+2\pi^2c^2 + \ldots) - (2 + 2\pi c + 4\pi^2c^2/3 + \ldots)}{2c^2 + 2\pi c^3 + 4\pi^2c^4/3 + \ldots}\right) \\ = & \frac{1}{2} \frac{2\pi^2c^2 - 4\pi^2c^2/3 + \ldots}{2c^2 + \ldots} \end{aligned}

and converges to \frac{\pi^2}{6} as c\to 0^+.

A Note on Thue's Method

This note is based on my talk Introduction to Diophantine Approximation at Carnegie Mellon University on November 4, 2014. Due to limit of time, details of Thue’s method were not fully presented in the talk. The note is supposed to serve as a complement to that talk.

Diophantine Approximation

Diophantine approximation deals with the approximation of real numbers by rational numbers. In other words, given a real number \alpha, we are interested in finding a rational approximation p/q such that the error |\alpha - p/q| is “small”. The smaller the error, the better the approximation. Since the set of rationals is dense, there is no best rational approximation of an irrational number (apparently, the best rational approximation of a rational number is itself). However, if we compare the error with the complexity of the rational, i.e., the denominator of the rational, it is natural to expect that the more complicated the rational approximation, the smaller the error. In particular, we can make sense of this in the following way.

Theorem (Dirichlet). Given an irrational \alpha, we have |\alpha - p/q| < 1/q^2 for infinitely many rationals p/q.

Proof. Think of C = \mathbb{R}/\mathbb{Z} as a circle with circumference 1. Fix a natural number N and mark nq on C for n=1,\ldots,N. By the pigeonhole principle, there are 1\leq n_1 \leq n_2\leq N such that n_1\alpha and n_2\alpha is \leq 1/N apart from each other on C. Let q = n_2 - n_1 < N. Then there is p such that |q\alpha - p| \leq 1/N, and so |\alpha - p/q| \leq 1/(Nq) < 1/q^2. Taking N large enough yields better approximation p/q with the desired property.

By means of continued fraction, one can push the result a bit further by a constant factor.

Theorem (Hurwitz 1891). Given an irrational \alpha, we have |\alpha - p/q| < 1/\sqrt{5}q^2 for infinitely many rationals p/q. Moreover, the constant \sqrt{5} is the best possible for \alpha = (1+\sqrt{5})/2.

These show us how “well” we can approximate reals by rationals. On the other hand, we would also like to say that we cannot approximate “too well”, i.e., given a real number \alpha, we have |\alpha - p/q| > 1/q^n for all but finite rationals p/q, where n may depend on \alpha. Unfortunately, this is not so because of the existence of Liouville numbers.

Definition. A Liouville number is an irrational number \alpha with the property that, for every natural number n, there exists a rational p/q and such that |\alpha - p/q| < q^{-n}.

For example, \alpha = \sum_{k=1}^\infty 2^{-k!} is a Liouville number.

However, we indeed can say something along those lines for algebraic numbers.

Diophantine Approximation for Algebraic Numbers

Definition. An algebraic number is a number that is a root of a non-zero polynomial in one variable with rational coefficients (or equivalently integer coefficients). Given an algebraic number, there is a monic polynomial (with rational coefficients) of least degree that has the number as a root. This polynomial is called its minimal polynomial. If its minimal polynomial has degree d, then the algebraic number is said to be of degree d.

Theorem (Liouville 1844). Given an algebraic irrational number \alpha of degree d, there exists A>0 such that for all rationals p/q, |\alpha - p/q|>Aq^{-d}.

A immediate consequence of the theorem is that Liouville numbers are not algebraic.

Proof. Let f(x) be a polynomial with integer coefficients of degree d that has \alpha as a root. Note that f'(\alpha)\neq 0 otherwise f'(x) is a polynomial of degree <d that has \alpha as a root. As f'(x) is continuous, there is r>0 such that |f'(x)| < 1 / A for all |x-\alpha|<r. Without loss of generality, we may assume p/q is pretty close, i.e., within distance r, to \alpha by taking A < r. Note that also f(p/q)\neq 0 otherwise f(x) / (x-p/q) is a polynomial of degree <d that has \alpha as a root. Since q^df(p/q) is an integer, |f(p/q)| \geq q^{-d}. On the other hand, by mean value theorem, there is c between p/q and \alpha such that |f(\alpha)-f(p/q)| = |f'(c)||\alpha - p/q|. Therefore |\alpha - p/q| = |f(p/q)|/f'(c) > Aq^{-n} since |c-\alpha| < r.[qed]

Liouville’s theorem (on Diophantine approximation) can be phrased in an alternative way.

Corollary. Given an irrational algebraic number \alpha of degree d. If the equation |\alpha - p/q| < 1/q^n has infinitely many rational solutions p/q, then n\leq d.

The continuing working was established by Axel Thue (1909), Carl Ludwig Siegle (1921), Freeman Dyson (1947), and culminated at Klaus Roth’s result (1955):

Theorem (Roth 1955). Given an irrational algebraic number \alpha. If the equation |\alpha-p/q|<1/q^n has infinitely many rational solutions p/q, then n\leq 2.

The constant 2 cannot be improved due to Dirichlet’s theorem. Another way to interpret Roth’s theorem is as follows.

Corollary. Any irrational algebraic number \alpha has approximation exponent equal to 2, i.e., for any \epsilon>0, the inequality |\alpha-p/q|<1/q^{2+\epsilon} can only have finitely many solutions p/q.

This amounts to say that we cannot approximate irrational algebraic numbers “too well” in the sense that if we slightly restrict the error from q^{-2} to q^{-(2+\epsilon)}, the number of “good” approximation changes from infinite to finite.

Thue’s Method

Although Roth’s proof has merit in itself, part of his idea dates back to Thue’s work. Indeed, Thue used a method, now known as the polynomial method, in a special way to prove the following theorem.

Theorem (Thue 1909). Given an irrational algebraic number \alpha. If the equation |\alpha-p/q|<1/q^n has infinitely many rational solutions p/q, then n\leq d/2+1, where d is the degree of \alpha.

Proof. Suppose that the irrational algebraic number \alpha has two “good” rational approximations r_1 = p_1/q_1, r_2 = p_2/q_2, i.e., |\alpha - p_i/q_i| < q_i^{-n} for i = 1,2. Since there are infinitely many “good” rational approximations, we get to choose later how large we want q_i to be.

The rest of the proof can be broken into three parts. Let m\in\mathbb{N} and 0 < \epsilon \ll 1 be two constants to be determined. The constant \epsilon is fixed before we send it to 0.

We are going to use C_0, \ldots, C_6 to indicate positive constants that may depend on \epsilon, d and c, a constant determined by \alpha (defined later). We may always assume \alpha, d, c, C_0,\ldots, C_6 \ll m.

Part I. Find a non-zero polynomial f(x,y) = P(x) + Q(x)y with integer coefficients that satisfies the following properties:

  1. it vanishes at (\alpha, \alpha) to order (m, 1), i.e., \partial_x^k f(\alpha, \alpha) = 0 for all k<m;
  2. its degree in x is \leq D := \frac{1}{2}(1+\epsilon)dm.
  3. its size, i.e., the maximum of all its coefficients’ absolute values, denoted by |f|, is \leq C_1^m for some constant C_1.

Proof of Part I. The proof is a classical “parameter counting” argument. Consider a polynomial f(x,y)=P(x)+Q(x)y = \sum_{i<D}a_ix^i + \sum_{i<D}b_ix^iy with degree \leq D in x and think of its 2D coefficients as undetermined. We can interpret the first property as a homogeneous system of m linear equations with coefficients in \mathbb{Z}[\alpha] because \frac{1}{k!}\partial_x^k f(\alpha, \alpha) = 0 \text{ if and only if } \sum_{i < D}{i\choose k}\alpha^{i-k}a_i + \sum_{i < D}{i\choose k}\alpha^{i-k+1}b_i = 0.

Fix a (non-zero) polynomial c_dx^d + \ldots + c_0 with integer coefficients that has \alpha as a root and c the size of this polynomial. Notice that c_d\alpha^d = -(c_{d-1}\alpha^{d-1} + \ldots + c_0). For each of the linear equation above of the form L_0 + L_1\alpha + \ldots + L_t\alpha^t = 0, for some t \geq d where L_k are linear combination of a_i, b_i with integer coefficients, we can replace it by c_d(L_0 + L_1\alpha + \ldots + L_{t-1}\alpha^{t-1}) - L_t(c_0\alpha^{t-d} + \ldots + c_{d-1}\alpha^{t-1}) = 0, i.e., L'_0 + L_1'\alpha + \ldots + L_{t-1}\alpha^{t-1} = 0 with L'_k = \begin{cases} c_dL_k & 0\leq k < t-d \\ c_dL_k - c_{k-t+d}L_t & t-d \leq k < t\end{cases} and we can repeat this until its degree in \alpha becomes smaller than d. Let |L_k|_1 be the sum of the absolute values of all coefficients in L_k. Note that each replacement increases the \max |L_k|_1 to \max |L_k'|_1 \leq 2c \max|L_k|. Note that the initially \max |L_k| < 2^D. After at most D replacements, the equation satisfies \max |L_k| < 2^D(2c)^D = C_0^m for some constant C_0.

Finally, for each of the linear equation, after the replacements, of the form L_0 + L_1\alpha + \ldots + L_{d-1}\alpha^{d-1} = 0, it is equivalent to have L_0 = L_1 = \ldots = L_{d-1} because 1, \alpha, \ldots, \alpha^{d-1} are linearly independent over \mathbb{Q}. Therefore each linear equation with coefficient in \mathbb{Z}[\alpha] corresponding to \partial_x^k f(\alpha, \alpha) = 0 is equivalent to d linear equations whose |\cdot|_1 are all bounded by C_0^m. In total, we have dm linear equations with 2D undetermined.

One can view this homogeneous system of linear equations as a linear transformation L = (L_1, \ldots, L_{dm}): \mathbb{Z}^{2D} \to \mathbb{Z}^{dm}. Consider all integral points in [-M, M]^{2D} and their image under L. Because |L_k|_1 \leq C_0^m, their image is contained in [-C_0^mM, C_0^mM]^{dm}. By the pigeonhole principle, as long as (2M+1)^{2D} > (2C_0^mM+1)^{dm}, there are two integral points v', v'' in [-M,M]^{2D} such that Lv'=Lv'', hence there is an integral point v=v'-v''\in [-2M, 2M]^{2D} such that Lv=0 and v\neq 0, hence a non-zero solution to the linear system. To guarantee the above inequality involving M, it is enough to have 2M > C_0^{m/\epsilon}, and so 2M = C_1^m, for some constant C_1, works.

Part II. Find l < L := \epsilon m such that g(x,y) = \frac{1}{l!}\partial_x^l f(x,y) satisfies the following properties:

  1. it vanishes at (\alpha, \alpha) to order (m-L, 1);
  2. its degree in x is \leq D (=\frac{1}{2}(1+\epsilon)dm);
  3. its size is |g| \leq {D\choose l}C_1^m < C_2^m for some constant C_2;
  4. and g(r_1, r_2)\neq 0.

Proof of Part II. The first three properties follow immediately from the properties of f. Henceforth, we only have to show there exists l < L such that the last property holds. For the sake of contradiction, assume that \partial_x^l f(r_1, r_2) = 0 for all l < L. Define D(x) := \begin{vmatrix}P(x) & Q(x) \\ P'(x) & Q'(x)\end{vmatrix} = P(x)Q'(x) - P'(x)Q(x). Because f(x,y) vanishes at (r_1, r_2) to order (L,1), by Leibniz rule, D(x) vanishes at r_1 to order L-1. Note that D(x) is in \mathbb{Z}[x]. By Gauss’ lemma, (q_1x-p_1)^{L-1} divides D(x). Therefore either (1) |D(x)| \geq q_1^{L-1}; or (2) D(x) = 0 as a polynomial.

If it is the first case, because |D(x)| = |P(x)Q'(x) - P'(x)Q(x)| < 2D|f|^2, we have 2D|f|^2 > q_1^{L-1}. Because |f| < C_1^m, we get a contradiction as long as q_1 > C_3, for some constant C_3.

If it is the second case, P(x) and Q(x) are linearly dependent over \mathbb{Q}. Therefore either (1) P(x) = -r_2'Q(x) and Q(x) vanishes at r_1 to order < L or (2) Q(x) vanishes at r_1 to order L. If it is the first case, because f(x,y) = (y-r_2')Q(x) vanishes at (r_1, r_2) to order (L, 1) and Q(x) vanish at r_1 to order <L, we know that r_2' = r_2, hence by Gauss’ lemma, f(x,y) is a multiple of q_2x - p_2, and so |f| \geq q_2. If it is the second case, because Q vanishes at r_1 to order L, by Gauss’ lemma, Q(x) is a multiple of (q_1x - p_1)^L, and so |f| \geq |Q| \geq q_1^L. To get a contradiction in both cases, we need q_1 > C_4\text{ and }q_2 > C_1^m.
Therefore, it is enough to have q_1 > C_5\text{ and }q_2 > q_1^m for some constant C_5.

Part III. We claim that |g(r_1, r_2)| < C_6^m\left(q_1^{-(1-\epsilon)m}+q_2^{-1}\right).

Proof of Part III. We know that |g(r_1, r_2)-g(r_1, \alpha)| = |Q(r_1)||\alpha - r_2| \leq D|g|(|\alpha|+1)^D|\alpha-r_2|. Because g vanishes at (\alpha_1, \alpha_1) to order (m-L, 1), using Taylor expansion, we have
\begin{aligned}|g(r_1, \alpha) - g(\alpha, \alpha)| & \leq \sup_{|x| < |\alpha|+1}\frac{1}{(m-L)!}\partial_x^{m-L}g(x,\alpha) |\alpha_1-r_1|^{m-L} \\ & < D2^D|g|(|\alpha|+1)^D||\alpha_1-r_1|^{m-L}.\end{aligned}

As r_1, r_2 are two “good” approximations of \alpha, we obtain \begin{aligned}|g(r_1, r_2)| & \leq D|g|(|\alpha|+1)^D|\alpha-r_2| + D2^D|g|(|\alpha|+1)^D||\alpha_1-r_1|^{m-L} \\ & < C_6^m\left(q_2^{-n} + q_1^{(-m+L)n}\right),\end{aligned} for some constant C_6

From Part II, we know that g(r_1, r_2)\neq 0, and so |g(r_1, r_2)| \geq q_1^{-D}q_2^{-1}. From Part III, we know that |g(r_1, r_2)| < C_6^m(q_1^{(-m+L)n}+q_2^{-n}). Therefore, we obtain q_1^{-D}q_2^{-1} < C_6^m\left(q_1^{(-m+L)n}+q_2^{-n}\right). Set m to be the biggest integer less than \log{q_2} / \log{q_1}, and so q_1^m < q_2 \leq q_1^{m+1}. To ensure Part II to work, we only need to take q_1 > C_5. In addition, we require q_1^\epsilon > C_6 and so C_6^m < q_1^{\epsilon m} < q_2^\epsilon. The choice of m implies that
q_2^{-\frac{1}{2}(1+\epsilon)d - 1} = q_2^{-D/m}q_2^{-1} < q_2^\epsilon\left(q_2^{-\frac{m-L}{m+1}n}+q_2^{-n}\right) < 2q_2^{-(1-2\epsilon)n+\epsilon}.
In order to have this inequality hold for arbitrarily large q_2, we need \frac{1}{2}(1+\epsilon)d + 1 > (1-2\epsilon)n-\epsilon. Let \epsilon go to 0 and obtain n\leq \frac{1}{2}d + 1. [qed]


The proof largely relies on Larry Guth‘s notes on polynomial method.