## A Short Proof for Hausdorff Moment Problem

Hausdorff moment problem asks for necessary and sufficient conditions that a given sequence $(m_n)$ with $m_0=1$ be the sequence of moments of a random variable $X$ supported on $[0,1]$, i.e., $\operatorname{E}X^n=m_n$ for all $n$.

In 1921, Hausdorff showed that $(m_n)$ is such a moment sequence if and only if the sequence is completely monotonic, i.e., its difference sequences satisfy the equation $(D^r m)_s \ge 0$ for all $r, s \ge 0$. Here $D$ is the difference operator on the space of real sequences $(a_n)$ given by $D a = (a_{n} - a_{n+1})$.

The proof under the fold follows the outline given in (E18.5 – E18.6) Probability with Martingales by David Williams.

Proof of Necessity. Suppose $(m_n)$ is the moment sequence of a random variable $X$ supported on $[0,1]$. By induction, one can show that $(D^r m)_s = \operatorname{E}(1-X)^rX^s$. Clearly, as $X$ is supported on $[0,1]$, the moment sequence is completely monotonic.

Proof of Sufficiency. Suppose $(m_n)$ is a completely monotonic sequence with $m_0 = 1$.

Define $F_n(x) := \sum_{i \le nx}{n\choose i}(D^{n-i}m)_i$. Clearly, $F_n$ is right-continuous and non-decreasing, and $F_n(0^-) = 0$. To prove $F_n(1) = 1$, one has to prove the identity $$\sum_{i}{n\choose i}(D^{n-i}m)_i = m_0.$$

A classical trick. Since the identity above is about vectors in the linear space (over the reals) spanned by $(m_n)$ and the linear space spanned by $(m_n)$ is isomorphic to the one spanned by $(\theta^n)$, the identity is equivalent to $$\sum_{i}{n\choose i}(D^{n-i}\theta)_i = \theta^0,$$ where $\theta_n = \theta^n$. Now, we take advantage of the ring structure of $\mathbb{R}[\theta]$. Notice that $(D^{r}\theta)_s = (1-\theta)^r\theta^s$. Using the binomial theorem, we obtain $$\sum_{i}{n\choose i}(D^{n-i}\theta)_i = \sum_{i}{n\choose i}(1-\theta)^{n-i}\theta^i = (1-\theta + \theta)^n = \theta^0.$$

Therefore $F_n$ is a bona fide distribution function. Define $m_{n, k} := \int_{[0,1]} x^kdF_n(x)$, i.e., $m_{n,k}$ is the $k$th moment of $F_n$. We now find an explicit formula for $m_{n,k}$.

Noticing that $F_n$ is constant, say $c_{n,i}$, on $[\frac{i}{n}, \frac{i+1}{n})$, for all $i=0, \dots, n-1$ and $c_{n,i}$ is a linear combination of $m_0, \dots, m_n$, we know that $m_{n,k} = \sum_{i=0}^n a_{n,k,i}m_i$.

Just like what we did in proving the identity, we use the special case $m_n = \theta^n$ to compute the coefficients $a_i = a_{n,k,i}$, where $0 \le \theta \le 1$. In this case, $$F_n(x) = \sum_{i \le nx}{n\choose i}(D^{n-i}\theta)_i = \sum_{i\le nx}{n\choose i}(1-\theta)^{n-i}\theta^i, m_{n,k} = \sum_{i=0}^n a_{i}\theta^i.$$

Now consider the situation in which a coin with probability $\theta$ is tossed at times $1,2,\dots$. The random variable $H_k$ is $1$ if the $k$th toss produces heads, $0$ otherwise. Define $A_n := (H_1 + \dots + H_n)/n$. It is immediate from the formula of $F_n$ that $F_n$ is the distribution function of $A_n$, and so $m_{n,k}$ is the $k$th moment of $A_n$. However, one can calculate the $k$th moment of $A_n$ explicitly. Let $f\colon [k] \to [n]$ be chosen uniformly at random, $Im_f$ be the cardinality of the image of $f$ and denote by $p_i = p_{n,k,i} := \operatorname{Pr}(Im_f = i)$. Using $f, Im_f$ and $p_i$, we obtain \begin{aligned}\operatorname{E}A_n^k & = \operatorname{E}\left(\frac{H_1 + \dots + H_n}{n}\right)^k = \operatorname{E}H_{f(1)}\dots H_{f(k)} \\ & = \operatorname{E}\operatorname{E}[H_{f(1)}\dots H_{f(k)}\mid Im_f] = \operatorname{E}\theta^{Im_f} = \sum_{i=0}^n p_{i}\theta^i.\end{aligned} Therefore, for all $\theta\in [0,1]$, we know that $\sum_{i=0}^n a_i\theta^i = \sum_{i=0}^n p_i\theta^i$, and so $a_i = p_i$ for all $i=0,\dots, n$.

As both $(a_i)$ and $(p_i)$ do not depend on $m_i$, $a_i = p_i$ holds in general. Since $p_k = p_{n, k, k} = \prod_{i=0}^{k-1}(1-i/n)\to 1$ as $n\to\infty$ and $p_i = 0$ for all $i > k$, we know that $\lim_n m_{n,k}= m_k$.

Using the Helly–Bray Theorem, since $(F_n)$ is tight, there exists a distribution function $F$ and a subsequence $(F_{k_n})$ such that $F_{k_n}$ converges weakly to $F$. The definition of weak convergence implies that $$\int_{[0,1]} x^k dF(x) = \lim_n \int_{[0,1]}x^k dF_{k_n}(x) = \lim_n m_{k_n,k} = m_k.$$ Therefore, the random variable $X$ with distribution function $F$ is supported on $[0,1]$ and its $k$th moment is $m_k$. [qed]

There are other two classical moment problems: the Hamburger moment problem and the Stieltjes moment problem.