Example Code for Multi-Core Parallel Computing in Python

Updated: 2017-11-07 17:10:03 Author: Jekyll & whiteglass

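When I used to write small programs I never cared about parallelism. Running on a single core was fine, and my computer only had two cores with four hyperthreads (I will simply call them all "cores" below), so fiddling with parallelism seemed pointless unless the task was I/O-bound. Then I got access to a machine with 32 cores and 128 GB of memory, and seeing a pile of idle cores in htop naturally made me want to put them to work. It turns out that parallelism in Python is really quite simple.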

multiprocessing vs threading

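Python's standard library is comprehensive and pleasant to use, which is one of the reasons I like Python so much. It ships two libraries for running work concurrently: multiprocessing and threading. Reaching for threads is the natural first idea: the overhead is (intuitively) small, memory is shared, and threads are used all the time in other languages. However, I can say with confidence that if you are on the CPython implementation, choosing threading means saying goodbye to parallel computation (it can even end up slower than a single thread), unless the task is I/O-bound.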

GIL

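CPython is the Python implementation distributed by python.org. Yes, Python is a language, and it has several implementations, such as PyPy, Jython, and IronPython. CPython is by far the most widely used, to the point that it is almost synonymous with Python.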

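CPython uses the GIL (Global Interpreter Lock) to simplify the interpreter's implementation, so the interpreter executes bytecode from only one thread at a time. In other words, unless the threads are waiting on I/O, multithreading in CPython is a complete lie!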
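
To see the effect described above, here is a minimal sketch that runs the same CPU-bound workload once on a thread pool and once on a process pool. The helper names count_primes and timed_map, the input size, and the worker count are illustrative assumptions, not taken from the original article. The timings depend on the machine, so none are shown.

```python
import time
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool

def count_primes(limit):
  """CPU-bound toy workload: count primes below `limit` by trial division."""
  count = 0
  for n in range(2, limit):
    if all(n % d for d in range(2, int(n ** 0.5) + 1)):
      count += 1
  return count

def timed_map(pool_cls, workers, inputs):
  """Map count_primes over inputs with the given pool class; return (results, seconds)."""
  start = time.perf_counter()
  with pool_cls(workers) as pool:
    results = pool.map(count_primes, inputs)
  return results, time.perf_counter() - start

if __name__ == '__main__':
  inputs = [50000] * 4
  for name, pool_cls in [('threads', ThreadPool), ('processes', Pool)]:
    results, seconds = timed_map(pool_cls, 4, inputs)
    print('%-9s %.2fs' % (name, seconds))
```

On a multi-core machine running standard CPython, the process pool would be expected to finish noticeably faster, while the thread pool takes roughly as long as a plain loop.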

These two references on the GIL are well written:

http://cenalulu.github.io/python/gil-in-python/
http://www.dabeaz.com/python/UnderstandingGIL.pdf

multiprocessing.Pool

Since the GIL rules out threading for CPU-bound work, let's take a proper look at multiprocessing. (Of course, if you are not using CPython and have no GIL to worry about, that is great too.)

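First, a simple, blunt, and very practical tool: multiprocessing.Pool. If your task can be written as ys = map(f, xs), then, as you probably know, it has the shape that is easiest to parallelize, and running it in parallel in Python could hardly be simpler. As an example, let's square every number: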

import multiprocessing

def f(x):
  return x * x

# The guard keeps worker processes from re-running this block on
# platforms that start workers by re-importing the main module.
if __name__ == '__main__':
  cores = multiprocessing.cpu_count()
  pool = multiprocessing.Pool(processes=cores)
  xs = range(5)

  # method 1: map
  print(pool.map(f, xs))  # prints [0, 1, 4, 9, 16]

  # method 2: imap
  for y in pool.imap(f, xs):
    print(y)      # 0, 1, 4, 9, 16, respectively

  # method 3: imap_unordered
  for y in pool.imap_unordered(f, xs):
    print(y)      # may be in any order

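map returns a list directly, while the two functions whose names start with i return iterators. imap yields results in input order, whereas imap_unordered yields them in arbitrary order.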
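
Because imap_unordered does not preserve order, a common workaround is to send each input together with its index and rebuild the ordering afterwards. The sketch below shows one way to do that. The helper names square_with_index and restore_order are hypothetical and not part of multiprocessing.

```python
import multiprocessing

def square_with_index(item):
  """Return (index, result) so the caller can restore the input order."""
  index, x = item
  return index, x * x

def restore_order(pairs, size):
  """Place (index, value) pairs, arriving in any order, into a list."""
  ys = [None] * size
  for index, y in pairs:
    ys[index] = y
  return ys

if __name__ == '__main__':
  xs = [3, 1, 4, 1, 5]
  with multiprocessing.Pool(processes=2) as pool:
    # Consume the iterator inside the with block: leaving it stops the pool.
    pairs = pool.imap_unordered(square_with_index, enumerate(xs))
    print(restore_order(pairs, len(xs)))  # [9, 1, 16, 1, 25]
```

This keeps the benefit of handling results as soon as they finish, while still ending up with an ordered list.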

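When the computation takes a long time, we may want a progress bar, and this is where the i-prefixed functions pay off. A small trick helps here: writing \r moves the cursor back to the start of the line without starting a new one, which is enough to build a simple progress display.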

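import sys

# continues the example above: pool, f and xs are already defined
cnt = 0
for _ in pool.imap_unordered(f, xs):
  cnt += 1  # count the finished item before reporting it
  sys.stdout.write('done %d/%d\r' % (cnt, len(xs)))
  sys.stdout.flush()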

More complex operations

For more complex operations you can use multiprocessing.Process objects directly. To communicate between processes you can use:

multiprocessing.Pipe
multiprocessing.Queue
synchronization primitives
shared variables
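
As a rough illustration of the last two items in the list, the sketch below has several processes increment one shared counter, using multiprocessing.Value for the shared variable and multiprocessing.Lock as the synchronization primitive. The function name add_many and the counts are illustrative assumptions.

```python
import multiprocessing

def add_many(counter, lock, n):
  """Increment the shared counter n times, holding the lock for each update."""
  for _ in range(n):
    with lock:
      counter.value += 1

if __name__ == '__main__':
  counter = multiprocessing.Value('i', 0)  # shared C int, initially 0
  lock = multiprocessing.Lock()
  workers = [multiprocessing.Process(target=add_many, args=(counter, lock, 1000))
             for _ in range(4)]
  for p in workers:
    p.start()
  for p in workers:
    p.join()
  print(counter.value)  # 4000
```

Without the lock, the read-modify-write in counter.value += 1 could interleave between processes and lose updates.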

Of these, the one I strongly recommend is Queue, because many real scenarios are simply the producer-consumer model, and a Queue solves them. It is easy to use: first create the Queue in the parent process, then pass it to Process through args or kwargs.
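
A minimal producer-consumer sketch along these lines might look as follows. The parent acts as the producer and one child process as the consumer. Using None as a "stop" sentinel is a convention assumed for this sketch, not something multiprocessing requires.

```python
import multiprocessing

STOP = None  # sentinel telling the consumer to exit (a convention of this sketch)

def consumer(taskq, resultq):
  """Read numbers from taskq until the sentinel arrives; put squares on resultq."""
  while True:
    x = taskq.get()
    if x is STOP:
      break
    resultq.put(x * x)

if __name__ == '__main__':
  taskq = multiprocessing.Queue()
  resultq = multiprocessing.Queue()
  p = multiprocessing.Process(target=consumer, args=(taskq, resultq))
  p.start()

  xs = [1, 2, 3]
  for x in xs:      # the parent acts as the producer
    taskq.put(x)
  taskq.put(STOP)

  # Fetch the results before joining so the child can flush its queue.
  print([resultq.get() for _ in xs])  # [1, 4, 9]
  p.join()
```

The sentinel lets the consumer leave its loop and exit on its own, so the parent can simply join it. With several consumers, one sentinel per consumer would be needed.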

Notes on using tools such as Theano or TensorFlow

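Be aware that running import theano or import tensorflow, or importing any other tool that calls CUDA, has side effects. Those side effects are copied verbatim into the child processes, which then fail with errors such as: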

could not retrieve CUDA device count: CUDA_ERROR_NOT_INITIALIZED

The fix is to make sure the parent process never imports these tools, and to let each child process import them itself once it has been created.

If you use Process, do the import inside the target function. For example:

import multiprocessing

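def hello(taskq, resultq):
  # Import inside the child so the parent never initializes CUDA.
  import tensorflow as tf
  # TensorFlow 1.x API: let the session grow GPU memory on demand.
  config = tf.ConfigProto()
  config.gpu_options.allow_growth = True
  sess = tf.Session(config=config)
  while True:  # loops until the parent calls p.terminate()
    name = taskq.get()
    res = sess.run(tf.constant('hello ' + name))
    resultq.put(res)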

if __name__ == '__main__':
  taskq = multiprocessing.Queue()
  resultq = multiprocessing.Queue()
  p = multiprocessing.Process(target=hello, args=(taskq, resultq))
  p.start()

  taskq.put('world')
  taskq.put('abcdabcd987')
  taskq.close()

  print(resultq.get())
  print(resultq.get())

  p.terminate()
  p.join()

If you use Pool, write a function that performs the import, and pass that function to the Pool constructor as the initializer. For example:

import multiprocessing

def init():
  # Runs once in every worker process; the globals are per-process.
  global tf
  global sess
  import tensorflow as tf
  config = tf.ConfigProto()  # TensorFlow 1.x API
  config.gpu_options.allow_growth = True
  sess = tf.Session(config=config)

def hello(name):
  return sess.run(tf.constant('hello ' + name))

if __name__ == '__main__':
  pool = multiprocessing.Pool(processes=2, initializer=init)
  xs = ['world', 'abcdabcd987', 'Lequn Chen']
  print(pool.map(hello, xs))
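
The same initializer pattern can be tried without a GPU or TensorFlow installed. In the sketch below, a plain dictionary stands in for the expensive per-process setup. The names _state, init_worker, and greet are hypothetical, and initargs is used to pass an argument to the initializer.

```python
import multiprocessing

_state = {}  # per-process storage, filled in by init_worker

def init_worker(prefix):
  """Runs once in each worker process; stands in for an expensive import or setup."""
  _state['prefix'] = prefix

def greet(name):
  """Use the state prepared by init_worker instead of rebuilding it per call."""
  return '%s %s' % (_state['prefix'], name)

if __name__ == '__main__':
  with multiprocessing.Pool(processes=2, initializer=init_worker,
                            initargs=('hello',)) as pool:
    print(pool.map(greet, ['world', 'abcdabcd987']))
    # ['hello world', 'hello abcdabcd987']
```

Each worker calls init_worker exactly once when it starts, so the setup cost is paid per process rather than per task.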
