A Short Proof for Hausdorff Moment Problem

Hausdorff moment problem asks for necessary and sufficient conditions that a given sequence $(m_n)$ with $m_0=1$ be the sequence of moments of a random variable $X$ supported on $[0,1]$ , i.e., $\operatorname{E}X^n=m_n$ for all $n$ .

In 1921, Hausdorff showed that $(m_n)$ is such a moment sequence if and only if the sequence is completely monotonic, i.e., its difference sequences satisfy the equation $(D^r m)_s \ge 0$ for all $r, s \ge 0$ . Here $D$ is the difference operator on the space of real sequences $(a_n)$ given by $D a = (a_{n} - a_{n+1})$ .

The proof under the fold follows the outline given in (E18.5 – E18.6) Probability with Martingales by David Williams.

Proof of Necessity. Suppose $(m_n)$ is the moment sequence of a random variable $X$ supported on $[0,1]$ . By induction, one can show that $(D^r m)_s = \operatorname{E}(1-X)^rX^s$ . Clearly, as $X$ is supported on $[0,1]$ , the moment sequence is completely monotonic.

Proof of Sufficiency. Suppose $(m_n)$ is a completely monotonic sequence with $m_0 = 1$ .

Define $F_n(x) := \sum_{i \le nx}{n\choose i}(D^{n-i}m)_i$ . Clearly, $F_n$ is right-continuous and non-decreasing, and $F_n(0^-) = 0$ . To prove $F_n(1) = 1$ , one has to prove the identity $\sum_{i}{n\choose i}(D^{n-i}m)_i = m_0.$

A classical trick. Since the identity above is about vectors in the linear space (over the reals) spanned by $(m_n)$ and the linear space spanned by $(m_n)$ is isomorphic to the one spanned by $(\theta^n)$ , the identity is equivalent to $\sum_{i}{n\choose i}(D^{n-i}\theta)_i = \theta^0,$ where $\theta_n = \theta^n$ . Now, we take advantage of the ring structure of $\mathbb{R}[\theta]$ . Notice that $(D^{r}\theta)_s = (1-\theta)^r\theta^s$ . Using the binomial theorem, we obtain $\sum_{i}{n\choose i}(D^{n-i}\theta)_i = \sum_{i}{n\choose i}(1-\theta)^{n-i}\theta^i = (1-\theta + \theta)^n = \theta^0.$

Therefore $F_n$ is a bona fide distribution function. Define $m_{n, k} := \int_{[0,1]} x^kdF_n(x)$ , i.e., $m_{n,k}$ is the $k$ th moment of $F_n$ . We now find an explicit formula for $m_{n,k}$ .

Noticing that $F_n$ is constant, say $c_{n,i}$ , on $[\frac{i}{n}, \frac{i+1}{n})$ , for all $i=0, \dots, n-1$ and $c_{n,i}$ is a linear combination of $m_0, \dots, m_n$ , we know that $m_{n,k} = \sum_{i=0}^n a_{n,k,i}m_i$ .

Just like what we did in proving the identity, we use the special case $m_n = \theta^n$ to compute the coefficients $a_i = a_{n,k,i}$ , where $0 \le \theta \le 1$ . In this case, $F_n(x) = \sum_{i \le nx}{n\choose i}(D^{n-i}\theta)_i = \sum_{i\le nx}{n\choose i}(1-\theta)^{n-i}\theta^i, m_{n,k} = \sum_{i=0}^n a_{i}\theta^i.$

Now consider the situation in which a coin with probability $\theta$ is tossed at times $1,2,\dots$ . The random variable $H_k$ is $1$ if the $k$ th toss produces heads, $0$ otherwise. Define $A_n := (H_1 + \dots + H_n)/n$ . It is immediate from the formula of $F_n$ that $F_n$ is the distribution function of $A_n$ , and so $m_{n,k}$ is the $k$ th moment of $A_n$ . However, one can calculate the $k$ th moment of $A_n$ explicitly. Let $f\colon [k] \to [n]$ be chosen uniformly at random, $Im_f$ be the cardinality of the image of $f$ and denote by $p_i = p_{n,k,i} := \operatorname{Pr}(Im_f = i)$ . Using $f, Im_f$ and $p_i$ , we obtain $\begin{aligned}\operatorname{E}A_n^k & = \operatorname{E}\left(\frac{H_1 + \dots + H_n}{n}\right)^k = \operatorname{E}H_{f(1)}\dots H_{f(k)} \\ & = \operatorname{E}\operatorname{E}[H_{f(1)}\dots H_{f(k)}\mid Im_f] = \operatorname{E}\theta^{Im_f} = \sum_{i=0}^n p_{i}\theta^i.\end{aligned}$ Therefore, for all $\theta\in [0,1]$ , we know that $\sum_{i=0}^n a_i\theta^i = \sum_{i=0}^n p_i\theta^i$ , and so $a_i = p_i$ for all $i=0,\dots, n$ .

As both $(a_i)$ and $(p_i)$ do not depend on $m_i$ , $a_i = p_i$ holds in general. Since $p_k = p_{n, k, k} = \prod_{i=0}^{k-1}(1-i/n)\to 1$ as $n\to\infty$ and $p_i = 0$ for all $i > k$ , we know that $\lim_n m_{n,k}= m_k$ .

Using the Helly–Bray Theorem, since $(F_n)$ is tight, there exists a distribution function $F$ and a subsequence $(F_{k_n})$ such that $F_{k_n}$ converges weakly to $F$ . The definition of weak convergence implies that $\int_{[0,1]} x^k dF(x) = \lim_n \int_{[0,1]}x^k dF_{k_n}(x) = \lim_n m_{k_n,k} = m_k.$ Therefore, the random variable $X$ with distribution function $F$ is supported on $[0,1]$ and its $k$ th moment is $m_k$ . [qed]

There are other two classical moment problems: the Hamburger moment problem and the Stieltjes moment problem.

Leave a comment Cancel reply