The ultimate prisoner puzzle

People like prisoner puzzles. Here is the hardest one I have ever seen. I heard it from Prof. Boris Bukh on the social hour of Maths Department couple of weeks ago, whom heard it on a maths conference. Since recently I am quite into Cryptography, especially setting up communication protocols, I would like to log the puzzle in my blog.

The storyline is like The Shawshank Redemption. You are locked into one of the cells in the jail. Unfortunately, you do not know how many prisoners there are in the jail. You do not even know whether you are the only one! The evil-hearted yet mathematically inclined warden wants to play a game with you. The rules are as follows.

  1. On the first day, you can write up a communication protocol. The warden will make identical photocopies of your protocol and distribute them among other prisoners.
  2. During the following days, everyday, each prisoner will be given one black card and one white card, and they should turn in one of them to the warden according to your protocol. The rest of the cards will be discarded. The warden would line up all prisoners in a circle in his mind and imagine each of them passes his or her chosen card to the next person clockwise. After going through all this in his mind, he will redistribute the cards to the prisoners accordingly. Notice, the warden could line up the people by any order he likes. Moreover, the orders could probably differ from day to day.
  3. One day, you should call for a stop and immediately tell the number of prisoners in this jail. All prisoners including yourself will be released if you got the number correct.

Remember, the evil-hearted warden knows every detail of your protocol. So all odds are always against you.
The following is the solution I came up.

Denote the number of prisoners as p. First of all, we want to draft the protocol so that all prisoners know an upper bound of p, denoted as P. Before we say anything about the protocol, let us take a look at the significance of knowing such an upper bound.

If all prisoners know an upper bound P, then they can broadcast messages. To be precise, if one turns in a card of some color he chooses, say c. Everyone else should keep turning in white cards until he receives a black card, and then keep turning in black cards regardless of what cards are received. Then after P days, all prisoners will receive the the cards of color c. (Think about why this is true.) Now everyone knows the color c. Hence they succeeded in broadcasting a single bit. If the protocol says something about coding messages by colors, one can broadcast any message to others. Note that prisoners might not know who has sent the messages.

Though direct message is hard to realize in this situation, broadcasting would be good enough for us to figure out p.

Now, let us see how prisoners cooperatively figure out an upper bound P. We call it the first stage.

In the first stage, all prisoners will be labeled as either visible or hidden. At the beginning, only I am labeled as visible, all the rest are labeled as hidden. We break down the following days into several phases. At the beginning of the kth phase, all prisoners know the upper bound of the current visible prisoners P_k. For instance, P_1=1. Now, all hidden people broadcast the a single bit of message telling the others there are hidden people. The broadcasting is done similarly, i.e., whoever receives the black card once should keep sending black cards. The broadcasting can be done within P_k days because there are at most these many visible prisoners.

There are two possible turnouts. If no visible prisoner receives this message, then all prisoners are visible, and immediately all prisoners know P=P_k. Otherwise, on the next day, all visible prisoners turn in black cards. Whoever receives a black card changes his label to visible if it was hidden. The number of prisoners who change their labels is at most P_k. Thus we can use P_{k+1}=2P_k in the k+1th phase as now there are at most P_k+P_k visible prisoners. Since at least one prisoner changes his label in each phase, eventually, all prisoners will become visible. Hence this process will terminate at a certain phase and provides us an upper bound P.

After the first stage, comes the second stage.

In the second stage, we will partition prisoners by clans and try refining the clans as much as possible. During the second stage, each prisoner knows the total number of clans, the names of each clan and which clan he belongs to. At the beginning, I myself alone consists the first clan and the rest of prisoners consist the second clan. This initial set up is written into the protocol, so everyone knows it.

The following is how I refine the clans. I can broadcast a message that asks the members in a certain clans if any one has turned in a black card on a given day. Prisoners in this certain clan will broadcast a message of a single bit to indicate their answer. And then I broadcast a message that asks if any one in this clan has turned in a white card on that day. If both answers are ‘yes’, this means there are two people in this clan turned in different card on that given day. Then I will demand this clan to divide into two smaller clans according to the color of their cards turned in on that given day. I can also do the same thing to the cards they received on that given day as well.

So in the second phase, I can constantly refine the clans by asking questions clan by clan and day by day. Since the number of clans can not exceed P, this refining process will stop and the number of clans will stabilize. Moreover, once it stabilizes, prisoners in each clan will turn in the same cards and receive the same cards as well. Otherwise, some day I will interrogate the members in this clan and separate them into smaller clans.

Now we can assume the prisoners in each clans always turn in the same cards and receive the same cards. This is amazing!
Suppose c_i is the number of prisoners in the ith clan C_i for i=1,\ldots, n. Assume I am in C_1. Then c_1=1. Initially the values of all other c_is are unknown. Our goal is to find out each c_i, though it is enough to find \sum_{i=1}^nc_i.

Notation. [n] is the set of all naturals from 1 to n. If I\subset [n], c_I := \{c_i: i\in I\} and \mathrm{span}(c_I) be the set of all linear combination of \{c_i: i\in I\}.

For r = 1,\ldots, n-1, we claim that for all r-element subset I\subset [n] and i\in I, c_i can be written as a linear combination of \{c_j: j\in[n]-I\}. In abbreviation we shall write c_i\in \mathrm{span}(c_{[n]-I}). Note when we say ‘some variable can be written as a linear combination of other variables’, we mean all prisoners know this information. In other words, all prisoners keep track of these linear dependencies along the way. We will omit the details of how prisoners would share this information in the following proof and leave them to the readers.

Before we prove the claim, let us take a look at the consequence if what we claimed is true. When r=n-1, it means c_2, \ldots, c_n can be written as a linear combination of c_1=1. Thus we know c_i for all i\in[n]. Hence we know the total number of prisoners.

Now we prove the claim by induction on r. If r=1, we shall prove c_i\in \mathrm{span}(c_{[n]-\{i\}}) for all i\in[n]. This is relatively easy. Let the prisoners in the clan C_i send out black cards. Some of the other clans will receive those black cards. Thus c_i is the sum of the size of those clans which receive the black cards.

Suppose the claim is true for some r. Now we shall prove for every r+1-element subset I\subset [n] and i\in I, c_i\in \mathrm{span}(c_J), where J=[n]-I.

Fix some i_0\in I. By the induction hypothesis, we know for all i\in I with i\neq i_0, we have
c_i = \alpha_ic_{i_0}+\beta_i(c_J),
where \beta_i(c_J)\in \mathrm{span}(c_J).
This is also true for i=i_0. Because we may set \alpha_{i_0}=1 and \beta_{i_0}=0.

Let the index set I_+ be the set \{i\in I: \alpha_i > 0\} and I_-=I-I^+. As i_0\in I_+, I_+ is non-empty.

Now we demand all clans indexed by I^+ send out black cards. Suppose the clans indexed by I_+'\subset I_+ who send out black cards receive white cards and the clans indexed by I_-'\subset I_- and J'\subset J who send out white cards receive black cards. Hence we have the following equation.
\sum_{i\in I'_+}c_i = \sum_{i\in I'_-}c_i+\sum_{j\in J'}c_j

\sum_{i\in I'_+}\left(\alpha_ic_{i_0}+\beta_i(c_J)\right)=\sum_{i\in I'_-}\left(\alpha_ic_{i_0}+\beta_i(c_J)\right)+\sum_{j\in J'}c_j

The \alpha-coefficient on the left hand side is positive. Meanwhile it is nonpositive on the right hand side. Thus by the equation above we can solve for c_{i_0} in terms of a linear combination of c_J. Hence we can write c_i in terms of a linear combination of c_J for all i\in I.



上个月在新加坡的时候,正值国际生物奥林匹克竞赛在新加坡国立大学举行。于是不免想去看看 2012 年 IMO 竞赛的题目和各国比赛的结果。


这次团体的第一名是韩国队思密达,个人总分第一名是新加坡的一位选手 Jeck Lim。这货四次代表新加坡国家队出战,金银铜满分四种情况都集齐了。中国队总分位列第二,中国队在个人排名最高的是 Yiyang She,位列第四。



下面是 2012 年 IMO 第三题:

“欺诈猜数游戏”在两个玩家甲和乙之间进行, 游戏依赖于两个甲和乙都知道的正整数kn
游戏开始时甲先选定两个整数xN1\leq x\leq N。甲如实告诉乙N的值,但对x守口如瓶。乙现在试图通过如下方式的提问来获得关于x的信息:每次提问,乙任选一个由若干正整数组成的集合S(可以重复使用之前提问中使用过的集合),问甲“x是否属于S?”。乙可以提任意数量的问题。在乙每次提问之后,甲必须对乙的提问立刻回答“是”或“否”,甲可以说谎话,并且说谎的次数没有限制,唯一的限制是甲在任意连续k+1次回答中至少有一次回答是真话。


  1. n\geq 2^k,则乙可保证获胜;
  2. 对所有充分大的整数k,存在整数n\geq 1.99^k,使得乙无法保证获胜。
















如果补回之前忽略的取整细节,你立即会发现,只要N\geq 2^{k+1},那经过这样k+1轮之后,一定能够排除至少一个数字,从而缩小x的出现范围。







综上所述,只要N大于2^k,那我们就能不断地排除一些数。于是当n\geq 2^k,乙有必胜策略!



Excerpt from Kunen

People living in M cannot construct a G which is P-generic over M. They may believe on faith that there exists a being to whom their universe, M, is countable. Such a being will have a generic G and an f_G=\cup G. The people in M do not know what G and f_G are but they have names for them, \Gamma and \Phi. They may also read the preceding few paragraphs and thus figure out certain properties of G and f_G

I could only say that Paul Cohen’s idea on the Continuum Hypothesis is amazing. And I am quite curious about his religious belief.


Comments on equality

This semester, I am assigned as a grader for 21-355 Principle of Mathematical Analysis I since I did really bad in the iTA test last year and ended up in the category 3 which doesn’t allow me to hold any recitation for undergraduate students. Anyway, I need to find some fun in the given situation. That’s my principle for life.

The textbook for the course is Rudin’s book, one of the most classic introductory books in analysis. As I need to grade several problems from the exercises part, I read through the first chapter and kept in mind what results students can use in their proofs. But quickly, I was stuck at the point where students started to use things like, if a=b, then ac=bc or if a=b then a+c=b+c.

Then I questioned myself, how can I use the axioms given in Rudin’s book to justify those arguments? I know I was being nerdy at that time, but it turned out that those field axioms tell us nothing about the interplay between the equality and the two operators, namely addition and multiplication. Moreover, in the proof for some basic properties of field Rudin took those arguments as granted.

One might say “It is obvious, isn’t it?”. But my response is “No.”. In first order logic with equality, those arguments actually used the axioms of equality.

To exaggerate a little bit, how does maths scare a lot of people? I can imagine a scenario that someone is looking up a solution using this kind of ‘tricks’ and get depressed since he didn’t think of such an apparent way of solving the problem.

But in fact, that’s not this guy’s fault, it’s the deficiency hidden in maths education. In my opinion, there are no magic in mathematics, but axioms and rules that people use them unconsciously. So maths educators, especially for those in a primary level, should pay more attention to these obvious issues. Sometimes, one really need to emphasis the trivial things at the beginning level.


Mathematical Abbreviations

Here is a short list of mathematical abbreviations which troubled me the first time I graded students’ papers. You should be familiar with these words if you have used English as your first language in mathematics for years. But for students who has no experience of writing maths in English, this list might help.

Abbreviation Definition
ETS / ETP / STP enough to show / enough to prove / sufficient to prove
FSOC / FSOCWMA for sake of contradiction / for sake of contradiction, we may assume
NTS / WTP / WTS need to show / want to prove / want to show
TFAE the following are equivalent
W^5 which was what we want
WLOG / WALOG / WLOGWMA without loss of generality / without any loss of generality / without loss of generality, we may assume
WRT with respect to

How to say maths terms in English?

On Maths Presentation Workshop at Carnegie Mellon University, Professor Brandon was trying to to help us understand the undergrad in United States and improve our teaching skills. This was my first time knowing how to say maths terms in English. And I want to share with you some of the tips. Before I start, I would like to abbreviate some of the points that Prof. Brandon mentioned concerning teaching.

  • Lousy background. American students have a lousy background in mathematics, but it doesn’t mean they are not smart. You will see students writing \frac{1}{a}+\frac{1}{b}=\frac{1}{a+b} or \sqrt{a}+\sqrt{b}=\sqrt{a+b}, but it’s not their fault. Even their high school teachers would say ‘Well, it’s just a different way to do those calculations.’
  • Cheat on homework. A student who is copied by others is considered cheating as much as the student who copied him or her.
  • Handholding. Some students do not know how to learn things. They need someone to guide them, for instance, how to use textbooks, how to prepare exams, etc.
  • 2 tips on teaching skills. Keep eye contact. Reword things and write it down.

Finally, here is the list that gives you some examples of spoken Maths.

Maths Terms How to say it?
\frac{1}{x^n} 1 over x to the power of n
\sqrt[3]{x}, \sqrt[4]{x} cube root of x, fourth root of x
\sin^{n}x sine to the power of n of x
f\circ g f composed with g
\lim_{x\rightarrow a}f(g(x)) limit as x approaches a of f of g of x
x\rightarrow \pi^{-} x approaches pi from the left
\frac{d}{dx}f(x) the derivative of f of x (with respect to x) or d over dx of f of x
\frac{\partial x}{\partial y} del y over del x
\nabla, \Delta gradient, Laplacian
\log_a(xy) log (with respect to) base a of x times y
\ln x natural log of x or ln of x
f(x)g(x)]_a^b f of x times g of x evaluated from a to b
f(x,y), (a,b) f of x comma y, a comma b
f'(x), f^{\prime\prime}(x), f^{\prime\prime\prime}(x) f prime of x, f double prime of x, f triple prime of x
S_n, \cdots, f_x s sub n, dot dot dot or etc, f sub x

John 椭球

对于一个三角形T,一定可以找到一个椭圆E,满足,E\subseteq T\subseteq 2E。对于一个平行四边形P,一定可以找到一个椭圆E^*,满足,E^*\subseteq P\subseteq \sqrt{2}E^*。由于在仿射变换下三角形,平行四边形,椭圆,线段比例都保持,所以只需要对正三角形和正方形进行证明就可以了。

实际上,John 定理断言,每一个n维凸体K都有一个相应的椭球E满足,E\subseteq K\subseteq nE。对每一个中心对称凸体C,都有一个相应的椭球E^*满足,E^*\subseteq C\subseteq \sqrt{n}E^*

为了证明 John 定理,我们需要引入 John 椭球的概念,为此需要证明 John-Loewner 椭球定理:对于任意一个n维空间中的含有内点的紧子集,存在唯一的椭球包含K,使得椭球体积达到最小,此时,称该椭球为 John 椭球。

证明(概要):利用椭球与n阶正定对称阵的联系,考虑所有包含 K 的椭球的中心和其对应的正定对称阵构成空间C_K,证明\det函数在C_K上取到最大值,存在性得证。如果\detC_K中有两个极大值点,可以通过这两个极大值点构造C_K中的元素,使得\det在该元素上的取值更大(这里需要利用\ln \detC_K上的凹性),由此导出矛盾,唯一性得证。

作为应用,我们考虑所有的GL_n(\mathbb{R})的紧子群G,令K=\cup_{g\in G}{g(B^n)},其中B^n是单位球,此时K含有内点。于是K是在任意G中元素作用下稳定,如果EK的 John 椭球,那么E在任意G中作用下也稳定(这是因为G的紧致性保证了其任意元素g的行列式为1,于是g(E)明显包含K,且体积与E相等,由 John 椭球的唯一性知E的稳定性),由此可知存在v\in GL_n(\mathbb{R}),使得对任意g\in G,有,v^{-1}gv(B^n)=B^n,故v^{-1}Gv\subseteq SO_n(\mathbb{R}),也就是说在相差一个共轭的程度上,正交群是极大的紧子群。



Matrix67 致敬,我也来写点科普文章。

假期给初一、初二小孩讲几何课的时候,都提到了下面的命题:四边形 ABCD中, E,F依次是边 BC上的两个三等分点, G,H依次是边 AD上两个三等分点,求证 S_{ABCD}=3S_{EFHG}





很明显,这块田地没有被九等分。但会不会像之前的“三分田地”问题一样,有类似的结论呢?很自然的一个猜测是最中间的土地面积是总面积的九分之一。其数学表述如下:四边形 ABCD中, E,F依次是边 BC上的两个三等分点, G,H依次是边 AD上两个三等分点, I,J依次是边 AB上的两个三等分点, K,L依次是边 DC上两个三等分点,求证 S_{ABCD}=9S_{MNOP}




由之前的结论,红色四边形 IJLK的面积是整个面积的三分之一。假设我们能再证明 M,P N,O分别是 IK, JL的三等分点,那再次利用三分田地的结论,我们就能证明红黑相交的部分 MPNO的面积是红色四边形 IJLK的三分之一,也就是原四边形的九分之一了。

最后我们证明 M,P N,O确实分别是 IK, JL的三等分点,为此我们只证明 M IK的三等分点。




S_{GKE} = \frac{2}{3}S_{GDE}+\frac{1}{3}S_{GCE}=2\left(\frac{2}{3}S_{GAE}+\frac{1}{3}S_{GBE}\right)=2S_{GIE}.

于是 KM = 2IM,就证明了需要的结论。

通过类似的论证,读者还能将井田问题中的三等分替换为五等分、七等分甚至任意 2n+1等分。进行类似的面积划分后,最中间那块的面积便是 (2n+1)^2分之一了。




恽之玮学长是北京大学数学科学学院 00 级的学生,后去 Princeton 读研,2009 年至 2010 年期间在高等研究所(Institute of Advanced Study)工作,目前在 MIT 就职,主要工作方向为几何表示论(Geometric Representation Theory)。




恽之玮:现在还在做的大概有十个人或者更多。从之前和之后的比较来说这都是一个 local 的最大值,整体的不能说。






恽之玮:上大学之前,我买了一本拓扑的教材。当时看不懂,但我觉得他讲得很清楚。大一寒假里,我再翻一翻,越看越有意思,拓扑是吸引我的一个方面。还有一个就是 Galois 理论,当时老师也没有时间教,主要是自己看的。像 Galois 理论,它证明五次方程没有根式解,中学里就听过,但一直不知道是怎么证的。我一直对它挺感兴趣。我对群的概念比较感兴趣,这两个都涉及到群的概念。对我比较影响比较大的几门课,高峡老师当时给我们讲了“模形式”,这门课非常适合本科生,是一门很综合的学科。现在不用着急,今后肯定要学一学。它把拓扑,复变还有群论这些东西都综合在一起,学了之后你会发现,前面的东西你可能都没学懂,真正用的时候不会操作。





恽之玮:你们学抽象代数是吧。我当时学群的时候也觉得很抽象,证明的那些关于群的定理,都是很抽象也很一般的,那些论述对所有群都是成立的。其实学群不应该这么学,应该学个别的具体的群。而且不要光学群,要学习群在空间中的作用。群作用的概念现在是有的,但是你对于每一个群都要想一想它作用在什么上面。如果你凭空造出来一个群,不作用在任何东西上面,它是没有意义的。比如说学习 Galois 理论你就知道了,这个群作用在这个域上。如果群不作用在任何东西上,就没有存在的意义。


恽之玮:以前那个杨磊老师——现在不知道还开不开课,我们受他影响挺大的——他当时叫我们念 Klein 的《二十面体与五次方程》。那本书是二十世纪初翻译的,语言跟现在的有些不一样。但是它的内容都是很具体的、可以动手算的,我们当时做了个二十面体模型来看这里的对称群。它还可以被用来解某些五次方程;或者换句话说,五次方程不能用根式解,但是可以用椭圆函数解。相当于当时的数学基本上都能归纳到这本书里了。当时数学其实已经很丰富了,当时的比如模型式的理论,虽然不算很系统,但是模型式里的特殊函数、矩阵、线性变换群,还有 Riemann 曲面、常微分方程——你可以将里面的东西翻译成 Riemann 曲面的语言。这样书如果认真看的话可以学到很多东西。一般书讲的很抽象,上来看两页就不想看。但这本书不用当任务看,你可以当消遣看。能当消遣看的数学书不多。

恽之玮:我在 Princeton 的导师叫 MacPherson,是个拓扑学家,也做代数几何和表示论,但他拿手的是拓扑。当然他现在在考虑一些应用数学问题,也很有意思。我没有跟他学过这些东西,但是完全是纯数学里的想法,相当于开创了拓扑的一个新的方向。他告诉我,一定要注重例子。比如说找几个曲线、曲面作为例子来算;定理成立与否的条件,这个条件是否必要也,可以举个例子。这一方面帮助你理解一般性的定理:有些时候通过分析一个例子你可以就知道一般性的问题该怎么做了;另一方面在今后做研究的时候,很多研究中的问题都来源于例子,而不是抽象的理论。最终的研究成品可能是一个推广了的、发展了的理论,是一套抽象的东西,但其实研究者的出发点肯定不是这样的。有一个具体的问题或具体的例子,原来的工具不够用了,从而迫使他来发现、来改进别人的结果,于是新发展了一套工具,而不是为了发展这个而发展。一定要以例子为中心。也有个别不看例子的,像 Grothendieck,他是代数几何的一代宗师。这种人毕竟是少数,他的逻辑思维的能力太强了,很少能有人能达到这个程度。其他的数学家,虽然写的有些东西很抽象,但他的想法,如果你跟他交谈或者是了解他私下里怎么思考,其实都是例子。






恽之玮:对,后来我在 Princeton 读研究生的时候有一些同学,他们本科的时候多少做过一些科研的,而且有的还写出文章来了,但是我觉得那个意思也不是很大。老师是手把手地把问题给你,大概怎么做给你,然后当然你自己需要一些聪明才智。但是即使你做出来,最后对整个框架性的东西还是不太了解。这个问题为什么要这么提,它的来龙去脉怎么样,不可能在这么短的时间又做出来又了解它的背景,总会是有得有失。他们拿出来的成果表面上比我们厉害,因为本科生的时候就有论文发表。但是我觉得不用急,不用急着做原创性的东西,但不要说人家做过的我就不去算,如果你要练习就还是要上手。你做的目的不是为了发现一个新的东西,至少你不能有意地去发现,你可以研究例子的时候无意地去发现。有时候你盯着一个最简单的例子来问,就可以问很多很多问题。基本上任何一个简单的问题最后问到足够深都能问到那种 open problem 不一定是别人做不出来的,但至少是别人没有考虑过的。这个也是很重要的一件事:提出问题。


恽之玮:这个问题物理上有一些动机,研究弦理论的学者认为世界是高维度的,并且具有一些结构,例如 Calabi–Yau 流形。数学上,比如代数几何,它的动机是研究多项式方程组的解,当变量增多的时候,维数自然就高了。代数几何学家想问题的时候,经常画一些图,只用到了一维,二维,三维的直觉来画一些类比的图形。比如画一条代数曲线,代数曲线应该是一个 Riemann 面,是二维的,他可能就画一条线来类比。很多情况看是看不见的,最多就能看到三维流形。我们平常只能看到最简单的三维流形,是在欧氏空间中的,那还有很多很复杂的三维流形,那些就只有从事三维流形研究的人能够想象了,他会做一些手术,比如粘合。比如 Thurston,他主要研究三维流形,他就能在脑子里清楚的看到一些三维流形。也有时候,他们不是在想那个流形本身,而是划归为一些相对熟悉的东西,比如高维的球面,虽然不知道它的样子,但是还是能稍微想想的。还有一些流形,或者是代数几何中研究的代数簇,它局部上有的时候就像一个球一样,有的可能带有奇点,比如可能像一个锥。其实都是在用一些熟悉的图形在类比,只不过维数变高了。

王志宇:学长在 Princeton 学习和工作辛苦吗?能保持一直高效率的状态吗?

恽之玮:还可以,以前田刚老师跟我们说,他每天要工作 12 个小时。我数来数去我数不到,反正我自己工作不到 12 个小时。我觉得有时候不要太苛求自己了,只要不浪费时间就可以了。我们当时,第一年要通过一个博士生资格考试,那个就可能花一个学期准备准备,那些都是考基础的知识为主,加上两个自选科目,形式是口试问答。之后最困难的就是需要找一个问题做,这有时候比解决问题还要困难。各个导师不一样,有的导师会给你一个问题,有的导师甚至把答案都告诉你,有的导师不给你问题也不给你答案,这个看运气了。但是最终你是要自己找一个问题了。








恽之玮:一般就只能自己退出了。每个学校,比如说,今年招了 20 个研究生,但是它只招 5 个博士后,也就是说,如果每个学校都这么算的话,有四分之三的人就找不到工作了。如果是这样的话,有些人会选择往世界其他地方分,有回国的,有中途转业,但不是转得特别离谱的,比如转到应用方向的,或者是统计的,有的就在统计或者应用的方面做一些应用或者研究,做教授,有的就到工业界。

姜子麟:您当年是如何申请到 Princeton 的研究生的?


王志宇:学长当年有多少同龄人去了像 Princeton 这样的比较好的学校?

恽之玮:有一个袁新意是我同学,他比我早一年毕业,3 年就毕业了,他在 Columbia 大学做得很好,是张寿武的学生。我这一年毕业的有张伟,后来也师从张寿武,也在 Columbia 大学。还有一个叫朱歆文,去了 Berkley 大学。当年申请有两个 99 级的,比我们高一级,他们是三加二,三年本科,两年研究生,有一个和我去了 Princeton 大学,我们是同学,还有一个去了 MIT。Stanford 有一个念统计的同学。所以这些学校每年都会要北大的学生。

姜子麟:在 Princeton 求学的过程中,有没有发现国外学生一些特别的地方值得我们学习?



Hall's Theorem

In 1935, the mathematician Philip Hall discovered a criteria of a perfect matching on a bipartite graph, known as Hall’s theorem, aka marriage theorem.

Considering two sets of vertices, denoted as A=\{a_1, \cdots, a_m\} and B=\{b_1, \cdots, b_n\}. Edges are connected between a_i and b_j for some pairs of (i, j). Here, we admit multiple edges between two vertices. This graph, named as G, is called a bipartite graph. One may think the bipartite graph as a model of a marriage system. Taken set A and set B as boys and girls, an edge between a boy and a girl tells that they might be a couple. A natural question arises that whether there exists a perfect matching, that is, every boy can find his girl without conflicting with other boys. Put formally, a perfect matching is an injective map from A to B which induces a subgraph of G.

Hall’s theorem asserted that there is a perfect matching in above bipartite graph G if and on if for every subsets S of A, the cardinal number of S(number of the members in a set) does not exceed the number of the neighbors of S. Here, the neighbors of S are all vertices in B which are connected with one vertex in S.

The necessary condition apparently holds. On the other hand, the proof of the sufficient part is a little bit tricky. Challenge yourself with this problem.

Now we represent several applications of Hall’s theorem.

  • Let A=(a_{ij}) be a matrix of order n with nonnegative integers as entries, such that every row and column of A has sum l. Then A is the sum of l permutation matrices.
  • Suppose all vertices of a directed graph G have the same in-degrees and out-degrees of l, a fixed positive integer. Thus the graph can be separated as l groups of circles. Every vertex appears exactly once in each group, and each edge appears in only one circle.
  • (Birkhoff-von Neumann theorem) A doubly stochastic matrix, is a square matrix of nonnegative real numbers, each of whose rows and columns sums to 1. Assertion: every doubly stochastic matrix is mere a convex combination of permutation matrices.

The first two questions are equivalent with each other if one noticed the relation between a graph and its adjacent matrix. In addition, in the first problem, we construct a bipartite graph G consisting of 2n vertices, u_1, \cdots, u_n and v_1, \cdots, v_n and a_{ij} edges between each pair of vertices u_i and v_j. It can be easily verified that this graph meets the condition required in Hall’s theorem through contradictory argument. Thus we get a perfect matching in this graph, which conversely represent a permutation matrix. After Eliminating the perfect matching in this graph, it become to the situation of l-1 in-degrees and out-degrees. This allows us to use mathematical induction on the parameter l.

If the entries of the matrix are all rational in the third problem, the result is reduced to the first problem. Still, we shall cover the gap between rational numbers and reals. Denote all permutation matrix as M_1, \cdots, M_{n!}. For each matrix (a_{ij}) meets the requirement that the sum of each row or each column is 1, one can approach it by (a_{ij}^n) with rational entries which is also doubly stochastic. Thus we can decompose the rational matrix (a_{ij}^n) into a convex combination of permutation matrices with rational coefficient c_k^n, where k \in \{1, \cdots, n!\}. By finding convergent subsequence in each (c_k^n)_{n=1}^\infty, one can require that all these subsequences are convergent for the same indices, and they converge to a limit c_k. After all, the original matrix (a_{ij}) can be expressed as a convex combination of permutation matrices with coefficients c_k.