Python RuntimeError errors and how to fix them
Here is the error message in question:
RuntimeError:
An attempt has been made to start a new process before the
current process has finished its bootstrapping phase.
This probably means that you are not using fork to start your
child processes and you have forgotten to use the proper idiom
in the main module:
if __name__ == '__main__':
freeze_support()
...
The "freeze_support()" line can be omitted if the program
is not going to be frozen to produce an executable.
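The error comes from the multiprocessing module on platforms that start child processes with "spawn" rather than "fork" (Windows always; macOS by default since Python 3.8). Each spawned child re-imports the main module, so any process-creating code at module top level runs again inside the child before its bootstrapping has finished. A minimal sketch (a hypothetical example, not from the original post) that reproduces the error:

import multiprocessing as mp

def square(x):
    return x * x

# The Pool is created at import time. Under "spawn", every child
# re-imports this module, tries to create its own Pool while still
# bootstrapping, and the RuntimeError above is raised.
pool = mp.Pool(2)
print(pool.map(square, range(4)))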
Here is the original code that triggered the error in my case:
import multiprocessing as mp
import time
from urllib.request import urlopen, urljoin
from bs4 import BeautifulSoup
import re

base_url = "https://morvanzhou.github.io/"

# crawl: fetch a page
def crawl(url):
    response = urlopen(url)
    time.sleep(0.1)
    return response.read().decode()

# parse: extract the title, internal links and canonical url from a page
def parse(html):
    soup = BeautifulSoup(html, 'html.parser')
    urls = soup.find_all('a', {"href": re.compile('^/.+?/$')})
    title = soup.find('h1').get_text().strip()
    page_urls = set([urljoin(base_url, url['href']) for url in urls])
    url = soup.find('meta', {'property': "og:url"})['content']
    return title, page_urls, url

unseen = set([base_url])
seen = set()
restricted_crawl = True

pool = mp.Pool(4)                       # created at module top level: this is the bug
count, t1 = 1, time.time()
while len(unseen) != 0:                 # still get some url to visit
    if restricted_crawl and len(seen) > 20:
        break
    print('\nDistributed Crawling...')
    crawl_jobs = [pool.apply_async(crawl, args=(url,)) for url in unseen]
    htmls = [j.get() for j in crawl_jobs]       # request connection

    print('\nDistributed Parsing...')
    parse_jobs = [pool.apply_async(parse, args=(html,)) for html in htmls]
    results = [j.get() for j in parse_jobs]     # parse html

    print('\nAnalysing...')
    seen.update(unseen)                 # seen the crawled
    unseen.clear()                      # nothing unseen

    for title, page_urls, url in results:
        print(count, title, url)
        count += 1
        unseen.update(page_urls - seen)         # get new url to crawl
print('Total time: %.1f s' % (time.time()-t1))  # 16 s !!!

Here is the corrected code:
import multiprocessing as mp
import time
from urllib.request import urlopen, urljoin
from bs4 import BeautifulSoup
import re

base_url = "https://morvanzhou.github.io/"

# crawl: fetch a page
def crawl(url):
    response = urlopen(url)
    time.sleep(0.1)
    return response.read().decode()

# parse: extract the title, internal links and canonical url from a page
def parse(html):
    soup = BeautifulSoup(html, 'html.parser')
    urls = soup.find_all('a', {"href": re.compile('^/.+?/$')})
    title = soup.find('h1').get_text().strip()
    page_urls = set([urljoin(base_url, url['href']) for url in urls])
    url = soup.find('meta', {'property': "og:url"})['content']
    return title, page_urls, url

def main():
    unseen = set([base_url])
    seen = set()
    restricted_crawl = True

    pool = mp.Pool(4)                   # now created only inside main()
    count, t1 = 1, time.time()
    while len(unseen) != 0:             # still get some url to visit
        if restricted_crawl and len(seen) > 20:
            break
        print('\nDistributed Crawling...')
        crawl_jobs = [pool.apply_async(crawl, args=(url,)) for url in unseen]
        htmls = [j.get() for j in crawl_jobs]       # request connection

        print('\nDistributed Parsing...')
        parse_jobs = [pool.apply_async(parse, args=(html,)) for html in htmls]
        results = [j.get() for j in parse_jobs]     # parse html

        print('\nAnalysing...')
        seen.update(unseen)             # seen the crawled
        unseen.clear()                  # nothing unseen

        for title, page_urls, url in results:
            print(count, title, url)
            count += 1
            unseen.update(page_urls - seen)         # get new url to crawl
    print('Total time: %.1f s' % (time.time()-t1))  # 16 s !!!


if __name__ == '__main__':
    main()

In summary: wrap the code you want to run in a function (here, main()) and add

if __name__ == '__main__':
    main()

at the bottom of the module; that alone resolves the error.
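The same guard applied to the minimal sketch from earlier:

import multiprocessing as mp

def square(x):
    return x * x

def main():
    # The Pool is now created only in the parent process, after its
    # bootstrapping phase has finished; spawned children merely import
    # the module (picking up square) without creating another Pool.
    with mp.Pool(2) as pool:
        print(pool.map(square, range(4)))

if __name__ == '__main__':
    main()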
Python error: RuntimeError
This section covers errors of the form "RuntimeError: ... fails to pass a sanity check due to a bug in the windows runtime".
Causes of this error
1. An incompatibility between your Python and numpy versions; for example, the combination I was using, Python 3.9 with numpy 1.19.4, triggers this error.
2. numpy 1.19.4 is problematic with many current Python versions.
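To confirm which combination you are running, you can print both versions. Note that importing numpy directly would re-trigger the sanity-check error, since the check runs at import time; this sketch (assuming Python 3.8+ for importlib.metadata) reads the installed version from package metadata instead:

import sys
from importlib.metadata import version

# Read numpy's version from its package metadata without importing it,
# because the failing sanity check runs inside "import numpy".
print(sys.version)
print(version("numpy"))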
Solution
In PyCharm, downgrade numpy under File->Settings->Project:pycharmProjects->Project Interpreter:
1. Open the Project Interpreter page.
2. Double-click numpy to edit its version.
3. Tick the version checkbox (the version cannot be changed otherwise) and select the lower version you need.
Once that is done, rerun the program.
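If you are not using PyCharm, the same downgrade can be done from the command line; numpy 1.19.3 is the release commonly suggested for this particular bug, but any version compatible with your Python should work:

pip install numpy==1.19.3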
The above is based on my personal experience; I hope it gives everyone a useful reference, and I hope you will continue to support 腳本之家.