Capturing a Python script's output and parsing it into a DataFrame: an example

Updated: 2020-07-07 18:21:57  Author: 喝粥也會胖的唐僧

This article shows how to capture the output that a running Python script prints and parse it into a pandas DataFrame. Hopefully it is a useful reference.

While learning xgboost, I wanted to plot a learning curve, but the trained model had no evals_result_ attribute:

AttributeError: 'Booster' object has no attribute 'evals_result_'

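The cause is that I was not using the classifier or regressor wrapper, and I trained with train rather than fit. Reading the source code, only fit sets evals_result_, so the attribute is missing after training with train. I still wanted the learning curve, so I needed some other way to get the per-iteration training metrics.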

The metrics are already there in the run's output: the screen shows the values for every iteration. So I decided to try parsing what the script prints to the screen. The idea is crude, but it is simple and direct.

This takes two steps:

1) Capture the screen output

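import subprocess
import pandas as pd

# run the training script as a child process and capture everything it prints
top_info = subprocess.Popen(["python", "main.py"], stdout=subprocess.PIPE)
out, err = top_info.communicate()  # err is None because stderr is not piped
out_info = out.decode('utf-8', errors='replace')
lines = out_info.splitlines()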

Note: main.py here is the Python script you were previously running directly, i.e. the training script.
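
The capture step can also be tried without a real training script. The following is a minimal sketch, not the author's code: `child_code` and `capture_lines` are hypothetical names introduced for illustration, and the two printed lines are made-up sample values in the same tab-separated style as the training log. It assumes Python 3.5 or later for `subprocess.run`, uses `sys.executable` so the child runs under the same interpreter, and folds stderr into stdout in case the log is written there.

```python
import subprocess
import sys

# Hypothetical stand-in for main.py: a tiny child script that prints
# two made-up lines in the tab-separated style of the training log.
child_code = r'''
print("[0]\ttrain-auc:0.70\teval-auc:0.65")
print("[1]\ttrain-auc:0.80\teval-auc:0.72")
'''

def capture_lines(cmd):
    """Run cmd, wait for it to finish, and return its output as a list of lines."""
    result = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # fold stderr in, in case the log goes there
        universal_newlines=True,   # decode bytes to str
        check=True,                # raise if the child exits with an error
    )
    return result.stdout.splitlines()

lines = capture_lines([sys.executable, "-c", child_code])
print(lines)
```

To capture a real run, replace the `-c` arguments with the script path, for example `capture_lines([sys.executable, "main.py"])`.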

2) Parse the captured output:

ln = 0        # iteration number we expect to see next
lst = dict()  # {iteration: {metric name: metric value}}
for line in lines:
    # the fields of each log line are separated by tabs
    if line.strip().startswith('[{}]\ttrain-auc:'.format(ln)):
        tmp = line.strip().split('\t')
        t1 = tmp[1].split(':')  # ['train-auc', value]
        t2 = tmp[2].split(':')  # ['eval-auc', value]
        lst[ln] = {t1[0]: float(t1[1]), t2[0]: float(t2[1])}
        ln += 1
# one row per iteration; column names are taken from the metric names in the log
json_df = pd.DataFrame.from_dict(lst, orient='index')
json_df = json_df.rename_axis('numIter').reset_index()
print(json_df)
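
The parsing step above is tied to exactly two metrics named train-auc and eval-auc. As a hedged alternative, here is a small sketch that uses a regular expression to pick up any number of `name:value` fields per iteration. `ITER_RE` and `parse_eval_lines` are hypothetical names, the sample lines are made up, and the log format is assumed to be the `[iteration]` prefix followed by whitespace-separated `name:value` pairs.

```python
import re

# Matches an iteration prefix such as "[12]" at the start of a line.
ITER_RE = re.compile(r"^\[(\d+)\]")

def parse_eval_lines(lines):
    """Return one dict per evaluation line: the iteration plus each metric."""
    records = []
    for line in lines:
        line = line.strip()
        match = ITER_RE.match(line)
        if match is None:
            continue  # not an evaluation line
        record = {"numIter": int(match.group(1))}
        # Everything after the prefix is whitespace-separated "name:value" pairs.
        for field in line[match.end():].split():
            name, sep, value = field.rpartition(":")
            if not sep:
                continue  # no colon, so not a metric field
            try:
                record[name] = float(value)
            except ValueError:
                continue  # skip fields whose value is not numeric
        if len(record) > 1:  # keep only lines that carried at least one metric
            records.append(record)
    return records

sample = [
    "some unrelated log line",
    "[0]\ttrain-auc:0.70\teval-auc:0.65",
    "[1]\ttrain-auc:0.80\teval-auc:0.72",
]
records = parse_eval_lines(sample)
print(records)
```

Because the result is a list of dicts, passing it to `pd.DataFrame(records)` should give one row per iteration with the metric names as columns.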

Full code:

import subprocess
import pandas as pd

# step 1: run the training script and capture everything it prints
top_info = subprocess.Popen(["python", "main.py"], stdout=subprocess.PIPE)
out, err = top_info.communicate()  # err is None because stderr is not piped
out_info = out.decode('utf-8', errors='replace')
lines = out_info.splitlines()

# step 2: parse the per-iteration metrics
ln = 0        # iteration number we expect to see next
lst = dict()  # {iteration: {metric name: metric value}}
for line in lines:
    # the fields of each log line are separated by tabs
    if line.strip().startswith('[{}]\ttrain-auc:'.format(ln)):
        tmp = line.strip().split('\t')
        t1 = tmp[1].split(':')  # ['train-auc', value]
        t2 = tmp[2].split(':')  # ['eval-auc', value]
        lst[ln] = {t1[0]: float(t1[1]), t2[0]: float(t2[1])}
        ln += 1
# one row per iteration; column names are taken from the metric names in the log
json_df = pd.DataFrame.from_dict(lst, orient='index')
json_df = json_df.rename_axis('numIter').reset_index()
print(json_df)

Here is the result: