What does the following program print?
    # Python 2 (uses xrange and the print statement)
    import eventlet
    import threading

    count = 0


    def count_10000():
        global count
        for i in xrange(10000):
            count += 1


    def count_in_threads():
        threads = []
        for i in xrange(5):
            t = threading.Thread(target=count_10000)
            threads.append(t)
            t.start()
        # wait for all threads to finish
        for t in threads:
            t.join()


    def count_in_coroutines():
        pool = eventlet.GreenPool()
        for i in xrange(5):
            pool.spawn_n(count_10000)
        # wait for all coroutines to finish
        pool.waitall()


    count_in_threads()
    print count
    count = 0
    count_in_coroutines()
    print count
On my machine the output was:
19598
50000
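In fact, on both single-core and multi-core CPUs, the value of count under multithreading is nondeterministic (it can end up anywhere up to 50000), whereas under coroutines it is always 50000. There is a similar question on Stack Overflow: why-use-threading-data-race-will-occur-but-will-not-use-gevent.
Within one Python process, only one coroutine is running at any moment, so coroutines are essentially pseudo-concurrent. You might object: because of Python's Global Interpreter Lock (GIL), only one thread is running at any moment within a Python process too, so why does a race condition appear with threads?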
(Figure omitted. Original image source: UnderstandingGIL.)
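Unlike threads, coroutines are scheduled by the application itself, and the operating system is unaware of them. The moment at which a thread gets switched out is unpredictable, but the application switches coroutines only under specific conditions. A coroutine switch happens only in the following situations:
- sleep: for example eventlet.sleep()
- I/O: for example network I/O, disk I/O and so on, when it goes through eventlet's cooperative ("green") primitives.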
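To make the switching rule concrete, here is a minimal sketch of cooperative scheduling written with plain Python 3 generators. It is only a stand-in for eventlet's green threads and not how eventlet is implemented: the names `counter`, `run_round_robin` and `count_cooperatively` are hypothetical, and a bare `yield` plays the role of `eventlet.sleep()` or a blocking I/O call.

```python
from collections import deque


def counter(state, increments):
    """Hypothetical task: bump the shared counter, then offer to switch."""
    for _ in range(increments):
        state["count"] += 1  # runs to completion; nothing can interrupt it
        yield                # the only place this task can be switched out


def run_round_robin(tasks):
    """Toy scheduler: resume each task in turn until all have finished."""
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)      # run the task up to its next yield
        except StopIteration:
            continue        # task finished; drop it
        queue.append(task)  # task yielded; schedule it again


def count_cooperatively(num_tasks=5, increments=10000):
    state = {"count": 0}
    run_round_robin([counter(state, increments) for _ in range(num_tasks)])
    return state["count"]


print(count_cooperatively())  # 50000
```

Because a task can only lose control at its own `yield`, the increment is never split, and the total is the same on every run.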
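So a coroutine is never switched out in the middle of count += 1, which makes the operation atomic. From the CPU's point of view, count += 1 breaks down into three steps:
- read the value of count
- add 1 to it
- write the result back to count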
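The three steps can be observed in CPython's bytecode with the standard `dis` module. The snippet below is a small Python 3 sketch; the function name `increment` is made up for illustration, and the exact instruction names differ between CPython versions, so it only checks that the load and the store of `count` are separate instructions.

```python
import dis

count = 0


def increment():
    global count
    count += 1


# Collect the names of the bytecode instructions `count += 1` compiles to.
opnames = [ins.opname for ins in dis.get_instructions(increment)]
print(opnames)

# The read (LOAD_GLOBAL) and the write-back (STORE_GLOBAL) are distinct
# instructions, so a thread switch can land between them.
load_index = opnames.index("LOAD_GLOBAL")
store_index = opnames.index("STORE_GLOBAL")
print(load_index < store_index)
```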
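Because thread switches can happen at arbitrary points, count += 1 is not guaranteed to be atomic, and a race condition becomes possible. For example, two threads may both read the same value of count, each add 1 to it, and both write back the same result, so one of the two increments is lost.
This is why the value of count under multithreading is nondeterministic, although it never exceeds 50000.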
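The lost update can be reproduced deterministically by forcing the interleaving by hand. The following Python 3 sketch uses generators to pause after each of the three steps; `increment_in_steps` and `simulate_lost_update` are hypothetical names, and the fixed schedule is an assumption chosen to trigger the race, whereas real threads interleave unpredictably.

```python
def increment_in_steps(state):
    """Hypothetical worker: one `count += 1`, pausing after each step."""
    value = state["count"]   # step 1: read count
    yield "read"
    value = value + 1        # step 2: add 1
    yield "add"
    state["count"] = value   # step 3: write count back
    yield "write"


def simulate_lost_update():
    state = {"count": 0}
    thread_a = increment_in_steps(state)
    thread_b = increment_in_steps(state)
    # Forced schedule: both workers read before either one writes.
    for worker in (thread_a, thread_b, thread_a, thread_b, thread_a, thread_b):
        next(worker)
    return state["count"]


print(simulate_lost_update())  # 1, although two increments ran
```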
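To sum up, global variables shared between Python coroutines do not need a lock because the following two conditions together rule out race conditions:
- Within one process, only one coroutine is running at any given time.
- Coroutine switches are conditional: they are triggered only at points such as I/O and sleep.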
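For comparison, the threaded counter has neither guarantee, so it needs an explicit lock to become deterministic. Below is a minimal Python 3 sketch of that fix; `count_in_threads_locked` is a hypothetical helper, not part of the original program, and it keeps its state local so that it can be called repeatedly.

```python
import threading


def count_in_threads_locked(num_threads=5, increments=10000):
    """Hypothetical locked variant of the threaded counter."""
    state = {"count": 0}
    lock = threading.Lock()

    def worker():
        for _ in range(increments):
            # Only one thread at a time may run the read-add-write sequence.
            with lock:
                state["count"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    # Wait for all threads to finish.
    for t in threads:
        t.join()
    return state["count"]


print(count_in_threads_locked())  # 50000
```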