Categories
Mathematics

The ultimate prisoner puzzle

People like prisoner puzzles. Here is the hardest one I have ever seen. I heard it from Prof. Boris Bukh on the social hour of Maths Department couple of weeks ago, whom heard it on a maths conference. Since recently I am quite into Cryptography, especially setting up communication protocols, I would like to log the puzzle in my blog.

The storyline is like The Shawshank Redemption. You are locked into one of the cells in the jail. Unfortunately, you do not know how many prisoners there are in the jail. You do not even know whether you are the only one! The evil-hearted yet mathematically inclined warden wants to play a game with you. The rules are as follows.

  1. On the first day, you can write up a communication protocol. The warden will make identical photocopies of your protocol and distribute them among other prisoners.
  2. During the following days, everyday, each prisoner will be given one black card and one white card, and they should turn in one of them to the warden according to your protocol. The rest of the cards will be discarded. The warden would line up all prisoners in a circle in his mind and imagine each of them passes his or her chosen card to the next person clockwise. After going through all this in his mind, he will redistribute the cards to the prisoners accordingly. Notice, the warden could line up the people by any order he likes. Moreover, the orders could probably differ from day to day.
  3. One day, you should call for a stop and immediately tell the number of prisoners in this jail. All prisoners including yourself will be released if you got the number correct.

Remember, the evil-hearted warden knows every detail of your protocol. So all odds are always against you.
The following is the solution I came up.

Denote the number of prisoners as p. First of all, we want to draft the protocol so that all prisoners know an upper bound of p, denoted as P. Before we say anything about the protocol, let us take a look at the significance of knowing such an upper bound.

If all prisoners know an upper bound P, then they can broadcast messages. To be precise, if one turns in a card of some color he chooses, say c. Everyone else should keep turning in white cards until he receives a black card, and then keep turning in black cards regardless of what cards are received. Then after P days, all prisoners will receive the the cards of color c. (Think about why this is true.) Now everyone knows the color c. Hence they succeeded in broadcasting a single bit. If the protocol says something about coding messages by colors, one can broadcast any message to others. Note that prisoners might not know who has sent the messages.

Though direct message is hard to realize in this situation, broadcasting would be good enough for us to figure out p.

Now, let us see how prisoners cooperatively figure out an upper bound P. We call it the first stage.

In the first stage, all prisoners will be labeled as either visible or hidden. At the beginning, only I am labeled as visible, all the rest are labeled as hidden. We break down the following days into several phases. At the beginning of the kth phase, all prisoners know the upper bound of the current visible prisoners P_k. For instance, P_1=1. Now, all hidden people broadcast the a single bit of message telling the others there are hidden people. The broadcasting is done similarly, i.e., whoever receives the black card once should keep sending black cards. The broadcasting can be done within P_k days because there are at most these many visible prisoners.

There are two possible turnouts. If no visible prisoner receives this message, then all prisoners are visible, and immediately all prisoners know P=P_k. Otherwise, on the next day, all visible prisoners turn in black cards. Whoever receives a black card changes his label to visible if it was hidden. The number of prisoners who change their labels is at most P_k. Thus we can use P_{k+1}=2P_k in the k+1th phase as now there are at most P_k+P_k visible prisoners. Since at least one prisoner changes his label in each phase, eventually, all prisoners will become visible. Hence this process will terminate at a certain phase and provides us an upper bound P.

After the first stage, comes the second stage.

In the second stage, we will partition prisoners by clans and try refining the clans as much as possible. During the second stage, each prisoner knows the total number of clans, the names of each clan and which clan he belongs to. At the beginning, I myself alone consists the first clan and the rest of prisoners consist the second clan. This initial set up is written into the protocol, so everyone knows it.

The following is how I refine the clans. I can broadcast a message that asks the members in a certain clans if any one has turned in a black card on a given day. Prisoners in this certain clan will broadcast a message of a single bit to indicate their answer. And then I broadcast a message that asks if any one in this clan has turned in a white card on that day. If both answers are ‘yes’, this means there are two people in this clan turned in different card on that given day. Then I will demand this clan to divide into two smaller clans according to the color of their cards turned in on that given day. I can also do the same thing to the cards they received on that given day as well.

So in the second phase, I can constantly refine the clans by asking questions clan by clan and day by day. Since the number of clans can not exceed P, this refining process will stop and the number of clans will stabilize. Moreover, once it stabilizes, prisoners in each clan will turn in the same cards and receive the same cards as well. Otherwise, some day I will interrogate the members in this clan and separate them into smaller clans.

Now we can assume the prisoners in each clans always turn in the same cards and receive the same cards. This is amazing!
Suppose c_i is the number of prisoners in the ith clan C_i for i=1,\ldots, n. Assume I am in C_1. Then c_1=1. Initially the values of all other c_is are unknown. Our goal is to find out each c_i, though it is enough to find \sum_{i=1}^nc_i.

Notation. [n] is the set of all naturals from 1 to n. If I\subset [n], c_I := \{c_i: i\in I\} and \mathrm{span}(c_I) be the set of all linear combination of \{c_i: i\in I\}.

For r = 1,\ldots, n-1, we claim that for all r-element subset I\subset [n] and i\in I, c_i can be written as a linear combination of \{c_j: j\in[n]-I\}. In abbreviation we shall write c_i\in \mathrm{span}(c_{[n]-I}). Note when we say ‘some variable can be written as a linear combination of other variables’, we mean all prisoners know this information. In other words, all prisoners keep track of these linear dependencies along the way. We will omit the details of how prisoners would share this information in the following proof and leave them to the readers.

Before we prove the claim, let us take a look at the consequence if what we claimed is true. When r=n-1, it means c_2, \ldots, c_n can be written as a linear combination of c_1=1. Thus we know c_i for all i\in[n]. Hence we know the total number of prisoners.

Now we prove the claim by induction on r. If r=1, we shall prove c_i\in \mathrm{span}(c_{[n]-\{i\}}) for all i\in[n]. This is relatively easy. Let the prisoners in the clan C_i send out black cards. Some of the other clans will receive those black cards. Thus c_i is the sum of the size of those clans which receive the black cards.

Suppose the claim is true for some r. Now we shall prove for every r+1-element subset I\subset [n] and i\in I, c_i\in \mathrm{span}(c_J), where J=[n]-I.

Fix some i_0\in I. By the induction hypothesis, we know for all i\in I with i\neq i_0, we have
c_i = \alpha_ic_{i_0}+\beta_i(c_J),
where \beta_i(c_J)\in \mathrm{span}(c_J).
This is also true for i=i_0. Because we may set \alpha_{i_0}=1 and \beta_{i_0}=0.

Let the index set I_+ be the set \{i\in I: \alpha_i > 0\} and I_-=I-I^+. As i_0\in I_+, I_+ is non-empty.

Now we demand all clans indexed by I^+ send out black cards. Suppose the clans indexed by I_+'\subset I_+ who send out black cards receive white cards and the clans indexed by I_-'\subset I_- and J'\subset J who send out white cards receive black cards. Hence we have the following equation.
\sum_{i\in I'_+}c_i = \sum_{i\in I'_-}c_i+\sum_{j\in J'}c_j

Hence
\sum_{i\in I'_+}\left(\alpha_ic_{i_0}+\beta_i(c_J)\right)=\sum_{i\in I'_-}\left(\alpha_ic_{i_0}+\beta_i(c_J)\right)+\sum_{j\in J'}c_j

The \alpha-coefficient on the left hand side is positive. Meanwhile it is nonpositive on the right hand side. Thus by the equation above we can solve for c_{i_0} in terms of a linear combination of c_J. Hence we can write c_i in terms of a linear combination of c_J for all i\in I.

Categories
Mathematics

欺诈猜数游戏(上)

上个月在新加坡的时候,正值国际生物奥林匹克竞赛在新加坡国立大学举行。于是不免想去看看 2012 年 IMO 竞赛的题目和各国比赛的结果。

理论上,国际数学奥林匹克竞赛是从来不进行国家排名的。但是,私底下……你懂的。

这次团体的第一名是韩国队思密达,个人总分第一名是新加坡的一位选手 Jeck Lim。这货四次代表新加坡国家队出战,金银铜满分四种情况都集齐了。中国队总分位列第二,中国队在个人排名最高的是 Yiyang She,位列第四。

言归正传,说说今年的题目。今年难度系数也是最有意思的题目当属第三题——欺诈猜数游戏。

广告插播:《欺诈者游戏》今年的剧场版换女主角了,我再也不相信爱情了。

下面是 2012 年 IMO 第三题:

“欺诈猜数游戏”在两个玩家甲和乙之间进行, 游戏依赖于两个甲和乙都知道的正整数kn
游戏开始时甲先选定两个整数xN1\leq x\leq N。甲如实告诉乙N的值,但对x守口如瓶。乙现在试图通过如下方式的提问来获得关于x的信息:每次提问,乙任选一个由若干正整数组成的集合S(可以重复使用之前提问中使用过的集合),问甲“x是否属于S?”。乙可以提任意数量的问题。在乙每次提问之后,甲必须对乙的提问立刻回答“是”或“否”,甲可以说谎话,并且说谎的次数没有限制,唯一的限制是甲在任意连续k+1次回答中至少有一次回答是真话。

在乙问完所有想问的问题之后,乙必须指出一个至多包含n个正整数的集合X,若x属于X,则乙获胜;否则甲获胜。证明:

  1. n\geq 2^k,则乙可保证获胜;
  2. 对所有充分大的整数k,存在整数n\geq 1.99^k,使得乙无法保证获胜。

扯点题外话,虽然很多年没有做题了。但是此次再次拿起竞赛题思考的时候,发现心智已经成熟了不少,知道如何去细致地处理问题了。

回想高中的那段时间,无论是看竞赛书,还是出去上课,对于做不出的问题,大都是看了书上或是老师的解答之后,感叹解答之妙,但始终对解题动机这一说没有一个理解。

高中竞赛的训练使得我成为了一个熟练的工人,对于一般的问题,只需要将一般的工具试一遍就好了,而遇到非主流问题就一筹莫展了。所以如何提高对我来说一直是一个问题,现在看来,正是缺乏组合问题的训练,以及对待具体问题具体处理的能力。

但是,一旦脱离了那样的环境,反而能静下心去想问题,也能理解为什么说数学是人类心智的荣耀了。

那下面我要开始分析这个谜题了哦,如果想自己做题目的,就别继续看了哦。隆重推荐这道题目的另外一个原因是,做这个谜题的时候纯粹只需要在脑子里面想就可以了,还可以找个人一起玩这个游戏找找感觉。

第一步,将问题转化为等价的容易思考的形式,也就是说,需要回答这个问题:k+1个回答中至少有一次回答是真话究竟意味着什么?

每次乙提出一个问题“x是否属于S?”,如果甲回答“是”,那么我们说Sx可能属于的集合,反之,我们说S的补集是x可能属于的集合。为了简化,我们称这个集合为可能集。

那么,经过若干轮问答之后,我们得到一连串的可能集。连续k+1个回答中至少有一次回答是真话就意味着:在这一连串的可能集中,任意连续的k+1个可能集中必然有一个可能集真真切切地含了x这个元素。也就是说x包含于任意k+1个连续可能集的并集。

第二步,从甲和乙的角度分别考虑取胜的策略。针对第一个小问,我们从乙的角度出发考虑问题。

作为乙的智囊团,我们需要向甲提出一系列为难的问题,使得将x可能出现的范围不断缩小。不妨我们把注意力先集中在开始的k+1个问题,我们的目标是使得对应的k+1个可能集的并集最小。

不过多久,就能想到一种二分法的策略。

一开始,x的出现范围是1N中的所有数。那我们的第一个问题就是取出这个集合中一半数量的数问甲,x是不是在这个集合中呢?无论甲如何回答,在第一轮问答过后,得到的可能集,记为A_0,其大小最多就是\frac{N}{2}。此处为了分析方便,我们暂时忽略各种取整的细节。

那如何设计第二轮的问题,才能使得两轮问答过后得到的可能集的并尽量小呢?

啊哈!只需要将在A_0以外的元素分出来一半作为第二次提问的集合就好了,这样无论甲此次如何选择可能集,记为A_1,这两个集合并集的大小最多是\frac{N}{2}+\frac{N}{4}

这样,如此下去,每次我们都取出前面产生可能集以外元素的一半来质问甲就好了。经过k+1论后,可能集并集的大小最多就是\frac{N}{2}+\cdots+\frac{N}{2^{k+1}},于是通过之前的分析我们一定可以排除这个并集以外的元素了,也就是\frac{N}{2^{k+1}}个元素了。

如果补回之前忽略的取整细节,你立即会发现,只要N\geq 2^{k+1},那经过这样k+1轮之后,一定能够排除至少一个数字,从而缩小x的出现范围。

我们只要反复应用这个战术,每k+1轮过后,只要x的出现范围的大小大于等于2^{k+1},我们就能继续排除。

换句话说,我们总是能将x的出现范围的大小缩小到2^{k+1}以下。

但第一问的实质是让我们将这个范围的大小缩小到2^k,所以我们的方法需要改进,但毫无疑问我们已经开了一个好头。

观察到在这样的策略下,甲的第一次回答应该尽量选择一个较大的可能集,如果甲选择了只有一个元素的可能集,那可大事不妙了。

那能否迫使甲选择只有一个元素的可能集呢?答案是简单的。

只需要一直问甲,x是不是1就可以了。如果甲连续k+1轮都回答“不是”,那我们就成功排除了1这个数。如果甲某次回答“是”,那接下来的k个问题,我们如法炮制之前的二分法就可以成功排除\frac{N-1}{2^k}个数。

综上所述,只要N大于2^k,那我们就能不断地排除一些数。于是当n\geq 2^k,乙有必胜策略!

第二个子问题,请听下回分解。

Categories
Mathematics

Excerpt from Kunen

People living in M cannot construct a G which is P-generic over M. They may believe on faith that there exists a being to whom their universe, M, is countable. Such a being will have a generic G and an f_G=\cup G. The people in M do not know what G and f_G are but they have names for them, \Gamma and \Phi. They may also read the preceding few paragraphs and thus figure out certain properties of G and f_G

I could only say that Paul Cohen’s idea on the Continuum Hypothesis is amazing. And I am quite curious about his religious belief.

Categories
Mathematics

Comments on equality

This semester, I am assigned as a grader for 21-355 Principle of Mathematical Analysis I since I did really bad in the iTA test last year and ended up in the category 3 which doesn’t allow me to hold any recitation for undergraduate students. Anyway, I need to find some fun in the given situation. That’s my principle for life.

The textbook for the course is Rudin’s book, one of the most classic introductory books in analysis. As I need to grade several problems from the exercises part, I read through the first chapter and kept in mind what results students can use in their proofs. But quickly, I was stuck at the point where students started to use things like, if a=b, then ac=bc or if a=b then a+c=b+c.

Then I questioned myself, how can I use the axioms given in Rudin’s book to justify those arguments? I know I was being nerdy at that time, but it turned out that those field axioms tell us nothing about the interplay between the equality and the two operators, namely addition and multiplication. Moreover, in the proof for some basic properties of field Rudin took those arguments as granted.

One might say “It is obvious, isn’t it?”. But my response is “No.”. In first order logic with equality, those arguments actually used the axioms of equality.

To exaggerate a little bit, how does maths scare a lot of people? I can imagine a scenario that someone is looking up a solution using this kind of ‘tricks’ and get depressed since he didn’t think of such an apparent way of solving the problem.

But in fact, that’s not this guy’s fault, it’s the deficiency hidden in maths education. In my opinion, there are no magic in mathematics, but axioms and rules that people use them unconsciously. So maths educators, especially for those in a primary level, should pay more attention to these obvious issues. Sometimes, one really need to emphasis the trivial things at the beginning level.

Categories
Mathematics

Mathematical Abbreviations

Here is a short list of mathematical abbreviations which troubled me the first time I graded students’ papers. You should be familiar with these words if you have used English as your first language in mathematics for years. But for students who has no experience of writing maths in English, this list might help.

AbbreviationDefinition
ETS / ETP / STPenough to show / enough to prove / sufficient to prove
FSOC / FSOCWMAfor sake of contradiction / for sake of contradiction, we may assume
NTS / WTP / WTSneed to show / want to prove / want to show
TFAEthe following are equivalent
W^5which was what we want
WLOG / WALOG / WLOGWMAwithout loss of generality / without any loss of generality / without loss of generality, we may assume
WRTwith respect to

Categories
Mathematics

How to say maths terms in English?

On Maths Presentation Workshop at Carnegie Mellon University, Professor Brandon was trying to to help us understand the undergrad in United States and improve our teaching skills. This was my first time knowing how to say maths terms in English. And I want to share with you some of the tips. Before I start, I would like to abbreviate some of the points that Prof. Brandon mentioned concerning teaching.

  • Lousy background. American students have a lousy background in mathematics, but it doesn’t mean they are not smart. You will see students writing \frac{1}{a}+\frac{1}{b}=\frac{1}{a+b} or \sqrt{a}+\sqrt{b}=\sqrt{a+b}, but it’s not their fault. Even their high school teachers would say ‘Well, it’s just a different way to do those calculations.’
  • Cheat on homework. A student who is copied by others is considered cheating as much as the student who copied him or her.
  • Handholding. Some students do not know how to learn things. They need someone to guide them, for instance, how to use textbooks, how to prepare exams, etc.
  • 2 tips on teaching skills. Keep eye contact. Reword things and write it down.

Finally, here is the list that gives you some examples of spoken Maths.

Maths Terms How to say it?
\frac{1}{x^n} 1 over x to the power of n
\sqrt[3]{x}, \sqrt[4]{x} cube root of x, fourth root of x
\sin^{n}x sine to the power of n of x
f\circ g f composed with g
\lim_{x\rightarrow a}f(g(x)) limit as x approaches a of f of g of x
x\rightarrow \pi^{-} x approaches pi from the left
\frac{d}{dx}f(x) the derivative of f of x (with respect to x) or d over dx of f of x
\frac{\partial x}{\partial y} del y over del x
\nabla, \Delta gradient, Laplacian
\log_a(xy) log (with respect to) base a of x times y
\ln x natural log of x or ln of x
f(x)g(x)]_a^b f of x times g of x evaluated from a to b
f(x,y), (a,b) f of x comma y, a comma b
f'(x), f^{\prime\prime}(x), f^{\prime\prime\prime}(x) f prime of x, f double prime of x, f triple prime of x
S_n, \cdots, f_x s sub n, dot dot dot or etc, f sub x
Categories
Mathematics

John 椭球

对于一个三角形T,一定可以找到一个椭圆E,满足,E\subseteq T\subseteq 2E。对于一个平行四边形P,一定可以找到一个椭圆E^*,满足,E^*\subseteq P\subseteq \sqrt{2}E^*。由于在仿射变换下三角形,平行四边形,椭圆,线段比例都保持,所以只需要对正三角形和正方形进行证明就可以了。

实际上,John 定理断言,每一个n维凸体K都有一个相应的椭球E满足,E\subseteq K\subseteq nE。对每一个中心对称凸体C,都有一个相应的椭球E^*满足,E^*\subseteq C\subseteq \sqrt{n}E^*

为了证明 John 定理,我们需要引入 John 椭球的概念,为此需要证明 John-Loewner 椭球定理:对于任意一个n维空间中的含有内点的紧子集,存在唯一的椭球包含K,使得椭球体积达到最小,此时,称该椭球为 John 椭球。

证明(概要):利用椭球与n阶正定对称阵的联系,考虑所有包含 K 的椭球的中心和其对应的正定对称阵构成空间C_K,证明\det函数在C_K上取到最大值,存在性得证。如果\detC_K中有两个极大值点,可以通过这两个极大值点构造C_K中的元素,使得\det在该元素上的取值更大(这里需要利用\ln \detC_K上的凹性),由此导出矛盾,唯一性得证。

作为应用,我们考虑所有的GL_n(\mathbb{R})的紧子群G,令K=\cup_{g\in G}{g(B^n)},其中B^n是单位球,此时K含有内点。于是K是在任意G中元素作用下稳定,如果EK的 John 椭球,那么E在任意G中作用下也稳定(这是因为G的紧致性保证了其任意元素g的行列式为1,于是g(E)明显包含K,且体积与E相等,由 John 椭球的唯一性知E的稳定性),由此可知存在v\in GL_n(\mathbb{R}),使得对任意g\in G,有,v^{-1}gv(B^n)=B^n,故v^{-1}Gv\subseteq SO_n(\mathbb{R}),也就是说在相差一个共轭的程度上,正交群是极大的紧子群。

Categories
Mathematics

井田问题

Matrix67 致敬,我也来写点科普文章。

假期给初一、初二小孩讲几何课的时候,都提到了下面的命题:四边形 ABCD中, E,F依次是边 BC上的两个三等分点, G,H依次是边 AD上两个三等分点,求证 S_{ABCD}=3S_{EFHG}

三分田地

这道题目立即让我联想起了张景中院士的一本科普书,上课的时候我记不得这本书的名字了,后来回去搜索了一下,叫《新概念几何》。我的爸妈一般不给我买书(可能是因为买过两套四大名著我都不看的关系吧),记忆中他们还给我买过的书只有《十万个为什么(小学版)》和《新概念几何》了。

在《新概念几何》中提到过一个问题,叫做井田问题:传说从前的农民要分地的话,没有很精准的丈量工具,也不懂几何或是微积分,所以都采取一些近似的方法。有一天,村里的九户人家发现了一块四边形的田地,他们打算平分这块田。于是他们通过脚步丈量四条边后,将四边都三等分后,如下图将田分为了九块。

井田问题

很明显,这块田地没有被九等分。但会不会像之前的“三分田地”问题一样,有类似的结论呢?很自然的一个猜测是最中间的土地面积是总面积的九分之一。其数学表述如下:四边形 ABCD中, E,F依次是边 BC上的两个三等分点, G,H依次是边 AD上两个三等分点, I,J依次是边 AB上的两个三等分点, K,L依次是边 DC上两个三等分点,求证 S_{ABCD}=9S_{MNOP}

由于整个问题犹如在一块田中写一个井字,井田问题由此得来。

记得当时上课的时候,由于是即兴想到这个问题,双核同时运转,只顾着把故事编得生动一些,没腾出脑子去想解答,所以留下这个问题让小朋友自己回家思考了……

实际上,建立在之前的三分田地的结论上,这个问题是很容易解决的:

由之前的结论,红色四边形 IJLK的面积是整个面积的三分之一。假设我们能再证明 M,P N,O分别是 IK, JL的三等分点,那再次利用三分田地的结论,我们就能证明红黑相交的部分 MPNO的面积是红色四边形 IJLK的三分之一,也就是原四边形的九分之一了。

最后我们证明 M,P N,O确实分别是 IK, JL的三等分点,为此我们只证明 M IK的三等分点。

为了能看得更清楚,我们考虑与需要证明的结论有关的部分。

辅助图

由定比分点公式,我们有关于面积的等式:

S_{GKE} = \frac{2}{3}S_{GDE}+\frac{1}{3}S_{GCE}=2\left(\frac{2}{3}S_{GAE}+\frac{1}{3}S_{GBE}\right)=2S_{GIE}.

于是 KM = 2IM,就证明了需要的结论。

通过类似的论证,读者还能将井田问题中的三等分替换为五等分、七等分甚至任意 2n+1等分。进行类似的面积划分后,最中间那块的面积便是 (2n+1)^2分之一了。

Categories
Mathematics

数学之路——对话恽之玮学长

记者:姜子麟,王志宇,盛开

恽之玮学长是北京大学数学科学学院 00 级的学生,后去 Princeton 读研,2009 年至 2010 年期间在高等研究所(Institute of Advanced Study)工作,目前在 MIT 就职,主要工作方向为几何表示论(Geometric Representation Theory)。

王志宇:学长是什么时候开始下决心做纯数学?

恽之玮:是在高中和大学之间,参加完竞赛之后。我有一段时间不想做纯数学,也可能是题做多了有点厌烦,想念计算机或者其他的。后来碰到南大的一个老师,她跟我说纯数学还是很有意思的,越念下去越有意思。我进大学后就一直觉得纯数学很有趣,后来就没有动摇过。

王志宇:你们那届做纯数学的有多少人?

恽之玮:现在还在做的大概有十个人或者更多。从之前和之后的比较来说这都是一个 local 的最大值,整体的不能说。

姜子麟:现在是否诱惑比较多,容易半途转行?

恽之玮:你说是应用方面,嗯,这个一直有很多人,数学学院毕业的做这些的是大多数人。做金融,统计啊等等。我个人对这些不太感兴趣。

姜子麟:学长当年如果转向了计算机,现在的轨迹会是什么样的?

恽之玮:我高中时候喜欢的那种计算机其实就是编程序,像现在的环境,清华有姚班,我也有可能去上。你们也知道那些竞赛题,有些组合跟计算机题挺像的,当时做的有点腻了,如果以我当时的状态,可能就不会选择进一步研究类似组合的问题。我觉得数学的好处在于,可能解问题时,最后还是要用一些组合的办法,但问题的方面更加广,像几何的数论的。同样的想法,可以有不同的包装。表面完全不同的问题,最后的想法又是相通的。

盛开:学长刚接触到大学课程的时候,对那些课程的感受是怎么样的?

恽之玮:上大学之前,我买了一本拓扑的教材。当时看不懂,但我觉得他讲得很清楚。大一寒假里,我再翻一翻,越看越有意思,拓扑是吸引我的一个方面。还有一个就是 Galois 理论,当时老师也没有时间教,主要是自己看的。像 Galois 理论,它证明五次方程没有根式解,中学里就听过,但一直不知道是怎么证的。我一直对它挺感兴趣。我对群的概念比较感兴趣,这两个都涉及到群的概念。对我比较影响比较大的几门课,高峡老师当时给我们讲了“模形式”,这门课非常适合本科生,是一门很综合的学科。现在不用着急,今后肯定要学一学。它把拓扑,复变还有群论这些东西都综合在一起,学了之后你会发现,前面的东西你可能都没学懂,真正用的时候不会操作。

王志宇:学长在低年级的时候就学这些高级课程?

恽之玮:我是在大三时候学的模形式。我听说你们现在越学越早,有些同学二年级就开始学代数几何。我前面两年对拓扑和微分几何感兴趣,后面才转向代数几何。我当时还参加讨论班,郑志明老师讲动力系统。高峡老师的讨论班不限定题目,可以读一个东西或者自己想点问题。当时陈维桓开了个研究生课程,是黎曼几何,我也去听了这个课。那个讲的更一般一些,相比本科的微分几何来说。其实研究生课程也不见得比本科的难多少,如果你把数学分析和代数学的比较好的话。

恽之玮:抽象代数其实应该早点学。依照我们的传统,比我更早学数学的人当中,学几何还有分析的比较多,学代数、数论、表示论这些都相对比较少一些。这可能就是因为我们代数学的时间比较靠后,而且讲的东西也不是很多。你们现在有抽代二,抽代二其实是研究生的课。如果把抽象代数一、二这两门都上下来,肯定比我们当时学得多。至少Galois理论,作为一个本科生,作为一个数学系的本科生,你一定得知道是怎么回事。相当于是数学在前一个世纪,也就是十九世纪最大的突破吧。学数学肯定得知道这个。

姜子麟:有时候会觉得学习代数有点枯燥,这方面学长怎么看?

恽之玮:你们学抽象代数是吧。我当时学群的时候也觉得很抽象,证明的那些关于群的定理,都是很抽象也很一般的,那些论述对所有群都是成立的。其实学群不应该这么学,应该学个别的具体的群。而且不要光学群,要学习群在空间中的作用。群作用的概念现在是有的,但是你对于每一个群都要想一想它作用在什么上面。如果你凭空造出来一个群,不作用在任何东西上面,它是没有意义的。比如说学习 Galois 理论你就知道了,这个群作用在这个域上。如果群不作用在任何东西上,就没有存在的意义。

王志宇:学长能举一些更具体的例子吗?

恽之玮:以前那个杨磊老师——现在不知道还开不开课,我们受他影响挺大的——他当时叫我们念 Klein 的《二十面体与五次方程》。那本书是二十世纪初翻译的,语言跟现在的有些不一样。但是它的内容都是很具体的、可以动手算的,我们当时做了个二十面体模型来看这里的对称群。它还可以被用来解某些五次方程;或者换句话说,五次方程不能用根式解,但是可以用椭圆函数解。相当于当时的数学基本上都能归纳到这本书里了。当时数学其实已经很丰富了,当时的比如模型式的理论,虽然不算很系统,但是模型式里的特殊函数、矩阵、线性变换群,还有 Riemann 曲面、常微分方程——你可以将里面的东西翻译成 Riemann 曲面的语言。这样书如果认真看的话可以学到很多东西。一般书讲的很抽象,上来看两页就不想看。但这本书不用当任务看,你可以当消遣看。能当消遣看的数学书不多。

恽之玮:我在 Princeton 的导师叫 MacPherson,是个拓扑学家,也做代数几何和表示论,但他拿手的是拓扑。当然他现在在考虑一些应用数学问题,也很有意思。我没有跟他学过这些东西,但是完全是纯数学里的想法,相当于开创了拓扑的一个新的方向。他告诉我,一定要注重例子。比如说找几个曲线、曲面作为例子来算;定理成立与否的条件,这个条件是否必要也,可以举个例子。这一方面帮助你理解一般性的定理:有些时候通过分析一个例子你可以就知道一般性的问题该怎么做了;另一方面在今后做研究的时候,很多研究中的问题都来源于例子,而不是抽象的理论。最终的研究成品可能是一个推广了的、发展了的理论,是一套抽象的东西,但其实研究者的出发点肯定不是这样的。有一个具体的问题或具体的例子,原来的工具不够用了,从而迫使他来发现、来改进别人的结果,于是新发展了一套工具,而不是为了发展这个而发展。一定要以例子为中心。也有个别不看例子的,像 Grothendieck,他是代数几何的一代宗师。这种人毕竟是少数,他的逻辑思维的能力太强了,很少能有人能达到这个程度。其他的数学家,虽然写的有些东西很抽象,但他的想法,如果你跟他交谈或者是了解他私下里怎么思考,其实都是例子。

姜子麟:那怎么能挑选一个比较好的问题作为自己的入手?

恽之玮:其实我这方面也不算成功,我刚开始做本科生科研,跟着姜伯驹老师和王诗宬老师,最后什么都没有写出来,我也是不好意思的。其实本科生科研里有很多很具体的东西,上手并不难,是可以操作操作的。但是我自己没有找到什么比较好的问题。

王志宇:您当时是跟那两位老师做什么方面的?

恽之玮:姜伯驹老师让我读一些关于辫群的东西。那是很有意思的东西,我现在还很感兴趣,和纽结理论有一些关系,里面又有一些代数的操作,你可以具体地去算。但是我最后也没做出什么东西来。王诗宬老师当时开了一门课叫几何群论,几何群论的主要观点就是把每一个群都看成空间的基本群,换句话说就是把群作用在一个拓扑空间上。把每一个用几何的观点来看,然后推导出一些纯粹关于群的一些性质,也就是代数的性质。但是用几何的观点去看,你就看得自然一些。这个也是很有意思的一个学问,但是我的主要的经历还是放在学上面了。其实本科生搞科研可能还是早了点,当时我也没有找到什么合适的问题。

王志宇:学长是认为本科生具有的预备知识太少了嘛?

恽之玮:对,后来我在 Princeton 读研究生的时候有一些同学,他们本科的时候多少做过一些科研的,而且有的还写出文章来了,但是我觉得那个意思也不是很大。老师是手把手地把问题给你,大概怎么做给你,然后当然你自己需要一些聪明才智。但是即使你做出来,最后对整个框架性的东西还是不太了解。这个问题为什么要这么提,它的来龙去脉怎么样,不可能在这么短的时间又做出来又了解它的背景,总会是有得有失。他们拿出来的成果表面上比我们厉害,因为本科生的时候就有论文发表。但是我觉得不用急,不用急着做原创性的东西,但不要说人家做过的我就不去算,如果你要练习就还是要上手。你做的目的不是为了发现一个新的东西,至少你不能有意地去发现,你可以研究例子的时候无意地去发现。有时候你盯着一个最简单的例子来问,就可以问很多很多问题。基本上任何一个简单的问题最后问到足够深都能问到那种 open problem 不一定是别人做不出来的,但至少是别人没有考虑过的。这个也是很重要的一件事:提出问题。

王志宇:平面几何给人以形象的直观,但现代的几何就非常抽象,与我们所能看见的事物似乎没有什么联系了,这些理论究竟是在研究什么样的问题呢?

恽之玮:这个问题物理上有一些动机,研究弦理论的学者认为世界是高维度的,并且具有一些结构,例如 Calabi–Yau 流形。数学上,比如代数几何,它的动机是研究多项式方程组的解,当变量增多的时候,维数自然就高了。代数几何学家想问题的时候,经常画一些图,只用到了一维,二维,三维的直觉来画一些类比的图形。比如画一条代数曲线,代数曲线应该是一个 Riemann 面,是二维的,他可能就画一条线来类比。很多情况看是看不见的,最多就能看到三维流形。我们平常只能看到最简单的三维流形,是在欧氏空间中的,那还有很多很复杂的三维流形,那些就只有从事三维流形研究的人能够想象了,他会做一些手术,比如粘合。比如 Thurston,他主要研究三维流形,他就能在脑子里清楚的看到一些三维流形。也有时候,他们不是在想那个流形本身,而是划归为一些相对熟悉的东西,比如高维的球面,虽然不知道它的样子,但是还是能稍微想想的。还有一些流形,或者是代数几何中研究的代数簇,它局部上有的时候就像一个球一样,有的可能带有奇点,比如可能像一个锥。其实都是在用一些熟悉的图形在类比,只不过维数变高了。

王志宇:学长在 Princeton 学习和工作辛苦吗?能保持一直高效率的状态吗?

恽之玮:还可以,以前田刚老师跟我们说,他每天要工作 12 个小时。我数来数去我数不到,反正我自己工作不到 12 个小时。我觉得有时候不要太苛求自己了,只要不浪费时间就可以了。我们当时,第一年要通过一个博士生资格考试,那个就可能花一个学期准备准备,那些都是考基础的知识为主,加上两个自选科目,形式是口试问答。之后最困难的就是需要找一个问题做,这有时候比解决问题还要困难。各个导师不一样,有的导师会给你一个问题,有的导师甚至把答案都告诉你,有的导师不给你问题也不给你答案,这个看运气了。但是最终你是要自己找一个问题了。

姜子麟:物理学家推导命题的时候经常不会特别严格,那做数学的时候会不会在某些步骤就一步跨过去呢?

恽之玮:对,你先要往前看几步,大概看看这个路能否走通。不能说,我一直往前推进,遇到一个问题就一直卡在那里。有的时候,你可能觉得这不是一个大问题,比如稍微加一个条件,或者是觉得再花点功夫能做出来。所以应该先往前看几步,回来再补一补。当然这个要有一定的直觉了,有时候你可能看的是不对的,觉得能做出来,其实做不出来,这也有可能发生,这就比较糟糕了。至少需要试着举几个例子,有时候一试就发现错了。

王志宇:学长接下来是准备一直待在美国吗?

恽之玮:这个目前还不知道,我接下来就是做两年博士后,做完了还要再找工作,希望能找到个比如教授的职务。现在美国属于找工作比较苦难的时期,因为金融危机之后,对于学校经费的影响有一定的滞后效应金融危机后,金融市场可能已经好转了,但是学校的拨款,有的是政府拨款,有的是私人捐款,还是会受到前两年的影响。今年用的可能是去年或者前年的资金,或者是前年就已经规划好了,所以它会滞后一些,这对找工作有一些影响。

姜子麟:据我所知,美国的教授职务属于继承式的,比如某个方向上一个教授退下去了,才会招一个做相同方向的教授,这些会对决定自己研究方向或者寻找工作带来很多限制吗?

恽之玮:美国有一些系只重点发展几个方向,基本上教授的总数总是保持差不多的,可以说退一个才能要一个。有几年是退休的高峰,工作就比较好找。申请工作的时候,不会第一名到第五十名的学校都去申请。事实上,有些学校肯定不会录取你,因为那个学校没有做你那个方向的。比如做代数几何的,到有的学校,那里一个做代数几何的都没有,审你材料的人不会对你感兴趣,所以选择不是特别多。美国有那么多学校,上百个学校,具体到每个方向,能有二十个学校选择已经很不错了。

王志宇:在美国,从学校出来,如果无法找到相应的工作,会怎么办?

恽之玮:一般就只能自己退出了。每个学校,比如说,今年招了 20 个研究生,但是它只招 5 个博士后,也就是说,如果每个学校都这么算的话,有四分之三的人就找不到工作了。如果是这样的话,有些人会选择往世界其他地方分,有回国的,有中途转业,但不是转得特别离谱的,比如转到应用方向的,或者是统计的,有的就在统计或者应用的方面做一些应用或者研究,做教授,有的就到工业界。

姜子麟:您当年是如何申请到 Princeton 的研究生的?

恽之玮:我觉得它一般就是看成绩。

王志宇:学长当年有多少同龄人去了像 Princeton 这样的比较好的学校?

恽之玮:有一个袁新意是我同学,他比我早一年毕业,3 年就毕业了,他在 Columbia 大学做得很好,是张寿武的学生。我这一年毕业的有张伟,后来也师从张寿武,也在 Columbia 大学。还有一个叫朱歆文,去了 Berkley 大学。当年申请有两个 99 级的,比我们高一级,他们是三加二,三年本科,两年研究生,有一个和我去了 Princeton 大学,我们是同学,还有一个去了 MIT。Stanford 有一个念统计的同学。所以这些学校每年都会要北大的学生。

姜子麟:在 Princeton 求学的过程中,有没有发现国外学生一些特别的地方值得我们学习?

恽之玮:我说的可能不是普遍情况,但是就我接触到的,我有一个师兄,也是和我同一个导师的,给我印象最深的就是他的问问题的能力比较强。比如我们听同样一个报告,听完了我就问不出问题来,而他就能问出问题来,而且问问题问得很有意思,问的是那种确实值得去思考的问题。有的人虽然也能问问题,但问的内容,就比如这个理论能不能推广到高维,比较一般性的问题实际上没有多大意思。我的那个师兄问的问题就是针对那个讲座的,特别具体的一些问题,这个能力,我觉得,不是能够教出来的。可能大家讨论的话,情况会好一点,讨论班的话,会比较随便一点。比如黑板上就抄着一个定理,也不用管它的证明,那能不能就这个定理问一些问题?这个定理一定要去欣赏,很多时候写了一个定理,然后开始证明,证明完了,课也结束了。其实证明是其次的,如果真正感兴趣了,再去看证明,但首先需要欣赏,这个定理漂亮在哪里?有用在哪里?要花甚至比看证明更多的时间来想这个问题。当然,我现在是这么说,但我学的时候也不是这样,自学的时候也是看看定理的叙述,有时候自己再想一想,有时候想不出来就看看人家怎么做的。其实这些都是中学的学习方法,是不对的,真正学数学不应该这么学。

Categories
Mathematics

Hall's Theorem

In 1935, the mathematician Philip Hall discovered a criteria of a perfect matching on a bipartite graph, known as Hall’s theorem, aka marriage theorem.

Considering two sets of vertices, denoted as A=\{a_1, \cdots, a_m\} and B=\{b_1, \cdots, b_n\}. Edges are connected between a_i and b_j for some pairs of (i, j). Here, we admit multiple edges between two vertices. This graph, named as G, is called a bipartite graph. One may think the bipartite graph as a model of a marriage system. Taken set A and set B as boys and girls, an edge between a boy and a girl tells that they might be a couple. A natural question arises that whether there exists a perfect matching, that is, every boy can find his girl without conflicting with other boys. Put formally, a perfect matching is an injective map from A to B which induces a subgraph of G.

Hall’s theorem asserted that there is a perfect matching in above bipartite graph G if and on if for every subsets S of A, the cardinal number of S(number of the members in a set) does not exceed the number of the neighbors of S. Here, the neighbors of S are all vertices in B which are connected with one vertex in S.

The necessary condition apparently holds. On the other hand, the proof of the sufficient part is a little bit tricky. Challenge yourself with this problem.

Now we represent several applications of Hall’s theorem.

  • Let A=(a_{ij}) be a matrix of order n with nonnegative integers as entries, such that every row and column of A has sum l. Then A is the sum of l permutation matrices.
  • Suppose all vertices of a directed graph G have the same in-degrees and out-degrees of l, a fixed positive integer. Thus the graph can be separated as l groups of circles. Every vertex appears exactly once in each group, and each edge appears in only one circle.
  • (Birkhoff-von Neumann theorem) A doubly stochastic matrix, is a square matrix of nonnegative real numbers, each of whose rows and columns sums to 1. Assertion: every doubly stochastic matrix is mere a convex combination of permutation matrices.

The first two questions are equivalent with each other if one noticed the relation between a graph and its adjacent matrix. In addition, in the first problem, we construct a bipartite graph G consisting of 2n vertices, u_1, \cdots, u_n and v_1, \cdots, v_n and a_{ij} edges between each pair of vertices u_i and v_j. It can be easily verified that this graph meets the condition required in Hall’s theorem through contradictory argument. Thus we get a perfect matching in this graph, which conversely represent a permutation matrix. After Eliminating the perfect matching in this graph, it become to the situation of l-1 in-degrees and out-degrees. This allows us to use mathematical induction on the parameter l.

If the entries of the matrix are all rational in the third problem, the result is reduced to the first problem. Still, we shall cover the gap between rational numbers and reals. Denote all permutation matrix as M_1, \cdots, M_{n!}. For each matrix (a_{ij}) meets the requirement that the sum of each row or each column is 1, one can approach it by (a_{ij}^n) with rational entries which is also doubly stochastic. Thus we can decompose the rational matrix (a_{ij}^n) into a convex combination of permutation matrices with rational coefficient c_k^n, where k \in \{1, \cdots, n!\}. By finding convergent subsequence in each (c_k^n)_{n=1}^\infty, one can require that all these subsequences are convergent for the same indices, and they converge to a limit c_k. After all, the original matrix (a_{ij}) can be expressed as a convex combination of permutation matrices with coefficients c_k.