Python編程產(chǎn)生非均勻隨機(jī)數(shù)的幾種方法代碼分享

更新時(shí)間：2017年12月13日 15:42:50 作者：冬木遠(yuǎn)景

這篇文章主要介紹了Python編程產(chǎn)生非均勻隨機(jī)數(shù)的幾種方法代碼分享，具有一定借鑒價(jià)值，需要的朋友可以參考下。

1.反變換法

設(shè)需產(chǎn)生分布函數(shù)為F(x)的連續(xù)隨機(jī)數(shù)X。若已有[0,1]區(qū)間均勻分布隨機(jī)數(shù)R,則產(chǎn)生X的反變換公式為：

F(x)=r, 即x=F^-1(r)

反函數(shù)存在條件：如果函數(shù)y=f(x)是定義域D上的單調(diào)函數(shù),那么f(x)一定有反函數(shù)存在,且反函數(shù)一定是單調(diào)的。分布函數(shù)F(x)為是一個(gè)單調(diào)遞增函數(shù)，所以其反函數(shù)存在。從直觀意義上理解，因?yàn)閞一一對(duì)應(yīng)著x,而在[0,1]均勻分布隨機(jī)數(shù)R≤r的概率P(R≤r)=r。因此,連續(xù)隨機(jī)數(shù)X≤x的概率P(X≤x)=P(R≤r)=r=F(x)

即X的分布函數(shù)為F(x)。

例子:下面的代碼使用反變換法在區(qū)間[0, 6]上生成隨機(jī)數(shù)，其概率密度近似為P(x)=e^-x

import numpy as np
import matplotlib.pyplot as plt
# probability distribution we're trying to calculate
p = lambda x: np.exp(-x)
# CDF of p
CDF = lambda x: 1-np.exp(-x)
# invert the CDF
invCDF = lambda x: -np.log(1-x)
# domain limits
xmin = 0 # the lower limit of our domain
xmax = 6 # the upper limit of our domain
# range limits
rmin = CDF(xmin)
rmax = CDF(xmax)
N = 10000 # the total of samples we wish to generate
# generate uniform samples in our range then invert the CDF
# to get samples of our target distribution
R = np.random.uniform(rmin, rmax, N)
X = invCDF(R)
# get the histogram info
hinfo = np.histogram(X,100)
# plot the histogram
plt.hist(X,bins=100, label=u'Samples');
# plot our (normalized) function
xvals=np.linspace(xmin, xmax, 1000)
plt.plot(xvals, hinfo[0][0]*p(xvals), 'r', label=u'p(x)')
# turn on the legend
plt.legend()
plt.show()

一般來(lái)說(shuō)，直方圖的外廓曲線接近于總體X的概率密度曲線。

2.舍選抽樣法(Rejection Methold)

用反變換法生成隨機(jī)數(shù)時(shí)，如果求不出F-1(x)的解析形式或者F(x)就沒(méi)有解析形式，則可以用F-1(x)的近似公式代替。但是由于反函數(shù)計(jì)算量較大，有時(shí)也是很不適宜的。另一種方法是由Von Neumann提出的舍選抽樣法。下圖中曲線w(x)為概率密度函數(shù)，按該密度函數(shù)產(chǎn)生隨機(jī)數(shù)的方法如下：

基本的rejection methold步驟如下：

1. Draw x uniformly from [xmin xmax]

2. Draw x uniformly from [0, ymax]

3. if y < w(x)，accept the sample, otherwise reject it

4. repeat

即落在曲線w(x)和X軸所圍成區(qū)域內(nèi)的點(diǎn)接受，落在該區(qū)域外的點(diǎn)舍棄。

例子:下面的代碼使用basic rejection sampling methold在區(qū)間[0, 10]上生成隨機(jī)數(shù)，其概率密度近似為P(x)=e^-x

# -*- coding: utf-8 -*-
'''
The following code produces samples that follow the distribution P(x)=e^−x 
for x=[0, 10] and generates a histogram of the sampled distribution.
'''
import numpy as np
import matplotlib.pyplot as plt
P = lambda x: np.exp(-x)
# domain limits
xmin = 0 # the lower limit of our domain
xmax = 10 # the upper limit of our domain
# range limit (supremum) for y
ymax = 1
N = 10000  # the total of samples we wish to generate
accepted = 0 # the number of accepted samples
samples = np.zeros(N)
count = 0  # the total count of proposals
# generation loop
while (accepted < N):
    # pick a uniform number on [xmin, xmax) (e.g. 0...10)
  x = np.random.uniform(xmin, xmax)
    # pick a uniform number on [0, ymax)
  y = np.random.uniform(0,ymax)
    # Do the accept/reject comparison
  if y < P(x):
    samples[accepted] = x
    accepted += 1
    count +=1
  print count, accepted
# get the histogram info
# If bins is an int, it defines the number of equal-width bins in the given range 
(n, bins)= np.histogram(samples, bins=30) # Returns: n-The values of the histogram,n是直方圖中柱子的高度
# plot the histogram
plt.hist(samples,bins=30,label=u'Samples')  # bins=30即直方圖中有30根柱子
# plot our (normalized) function
xvals=np.linspace(xmin, xmax, 1000)
plt.plot(xvals, n[0]*P(xvals), 'r', label=u'P(x)')
# turn on the legend
plt.legend()
plt.show()

>>>

99552 10000

3.推廣的舍取抽樣法

從上圖中可以看出，基本的rejection methold法抽樣效率很低，因?yàn)殡S機(jī)數(shù)x和y是在區(qū)間[xmin xmax]和區(qū)間[0 ymax]上均勻分布的，產(chǎn)生的大部分點(diǎn)不會(huì)落在w(x)曲線之下(曲線e-x的形狀一邊高一邊低，其曲線下的面積占矩形面積的比例很小，則舍選抽樣效率很低)。為了改進(jìn)簡(jiǎn)單舍選抽樣法的效率，可以構(gòu)造一個(gè)新的密度函數(shù)q(x)(called a proposal distribution from which we can readily draw samples)，使它的形狀接近p(x)，并選擇一個(gè)常數(shù)k使得kq(x)≥w(x)對(duì)于x定義域內(nèi)的值都成立。對(duì)應(yīng)下圖，首先從分布q(z)中生成隨機(jī)數(shù)z0，然后按均勻分布從區(qū)間[0 kq(z0)]生成一個(gè)隨機(jī)數(shù)u0。 if u0 > p(z0) then the sample is rejected,otherwise u0 is retained. 即下圖中灰色區(qū)域內(nèi)的點(diǎn)都要舍棄?？梢?jiàn)，由于隨機(jī)點(diǎn)u0只出現(xiàn)在曲線kq(x)之下，且在q(x)較大處出現(xiàn)次數(shù)較多，從而大大提高了采樣效率。顯然q(x)形狀越接近p(x)，則采樣效率越高。

根據(jù)上述思想，也可以表達(dá)采樣規(guī)則如下：

1. Draw x from your proposal distribution q(x)

2. Draw y uniformly from [0 1]

3. if y < p(x)/kq(x) , accept the sample, otherwise reject it

4. repeat

下面例子中選擇函數(shù)p(x)=1/(x+1)作為proposal distribution，k=1。曲線1/(x+1)的形狀與e-x相近。

import numpy as np
import matplotlib.pyplot as plt
p = lambda x: np.exp(-x)     # our distribution
g = lambda x: 1/(x+1)      # our proposal pdf (we're choosing k to be 1)
CDFg = lambda x: np.log(x +1)  # generates our proposal using inverse sampling
# domain limits
xmin = 0 # the lower limit of our domain
xmax = 10 # the upper limit of our domain
# range limits for inverse sampling
umin = CDFg(xmin)
umax = CDFg(xmax)
N = 10000 # the total of samples we wish to generate
accepted = 0 # the number of accepted samples
samples = np.zeros(N)
count = 0 # the total count of proposals
# generation loop
while (accepted < N):
    # Sample from g using inverse sampling
  u = np.random.uniform(umin, umax)
  xproposal = np.exp(u) - 1
  # pick a uniform number on [0, 1)
  y = np.random.uniform(0, 1)
  # Do the accept/reject comparison
  if y < p(xproposal)/g(xproposal):
    samples[accepted] = xproposal
    accepted += 1
    count +=1
  print count, accepted
# get the histogram info
hinfo = np.histogram(samples,50)
# plot the histogram
plt.hist(samples,bins=50, label=u'Samples');
# plot our (normalized) function
xvals=np.linspace(xmin, xmax, 1000)
plt.plot(xvals, hinfo[0][0]*p(xvals), 'r', label=u'p(x)')
# turn on the legend
plt.legend()
plt.show()

>>>

24051 10000