Building a Fake Review Detection and Visualization System in Python

Updated: 2023-04-19 08:52:59  Author: 空白=null

This article walks through a simple fake review detection system with a web front end, built in Python. The example code for each module is shown in full.

1.app.py

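This file holds all of the Flask logic. Routing rules map each URL to a page: GET requests render the page, POST requests receive the form data entered on it, and the result is passed back to the front-end template.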

# -*- coding: utf-8 -*-
"""

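Services:

- Automatic word cloud generation:
1. Process text scraped from a URL supplied by the user.
2. Process a text string entered by the user.
3. Process a local text file: the user places the file in the text folder and gives its file name.

- Key information extraction
- Sentiment analysis
- User review analysis
- User profiling

Back-end design:
1. Service interface design
1.1 Page request design
1.2 Data request design
2. Error request handling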


"""

import os
from src import config
from src.exe import LR_xitong
from src.exe import file
from src.exe import yelp_claw

import json

import requests
from flask import Flask, render_template, request, redirect, url_for, send_from_directory
from flask_sqlalchemy import SQLAlchemy
from sqlalchemy import and_
from werkzeug.utils import secure_filename

# from src.exe import exe_02
# from src.exe import exe_03
# from src.exe import exe_05
# from src.exe import exe_06

# from src.exe import exe_01, exe_02, exe_03, exe_05, exe_06
 



## =================================== Routing configuration ===================================

##############################################################################################
# NOTE: this runs one prediction at import time, so the input text file must already exist.
print(LR_xitong.predict_review())
## Part 1 ++++++++++++++++++++++++++++++++++++++++++++++++++++


#==================================================================
# Login: connect to the database
app = Flask(__name__, template_folder=config.template_dir,static_folder=config.static_dir)
HOSTNAME = "127.0.0.1"
PORT = 3306
USERNAME = "root"
PASSWORD = "root"
DATABASE = "database_learn"
app.config[
    'SQLALCHEMY_DATABASE_URI'] = \
    f"mysql+pymysql://{USERNAME}:{PASSWORD}@{HOSTNAME}:{PORT}/{DATABASE}?charset=utf8mb4"
app.config['SQLALCHEMY_TRACK_MODIFICATIONS'] = True
db = SQLAlchemy(app)

@app.route("/")
def index():
    return render_template("register.html")


class User(db.Model):
    __tablename__ = 'user_list1'  # table name
    id = db.Column(db.Integer, primary_key=True)  # primary key
    username = db.Column(db.String(255), unique=True)
    # Passwords are not unique: two users may pick the same one.
    password = db.Column(db.String(255))

    # Printable representation of the object (similar to Java's toString)
    def __repr__(self):
        return '<User username: %r password: %r>' % (self.username, self.password)


# Database operations
# Create
def add_object(user):
    db.session.add(user)
    db.session.commit()
    print("Added %r" % user)
with app.app_context():
    user = User()
    user = db.session.merge(user)  # merge the detached instance into the session
    # user.username = 'li三'
    # user.password = '123456'
    # add_object(user)

# Read (and_ requires: from sqlalchemy import and_)
# def query_object(user, query_condition_u, query_condition_p):
#     result = user.query.filter(and_(user.username == query_condition_u, user.password == query_condition_p))
#     print("Query %r done" % user)
#     return result
# Delete
# def delete_object(user):
#     result = user.query.filter(user.username == '11111').all()
#     db.session.delete(result)
#     db.session.commit()
# Update
# def update_object(user):
#     result = user.query.filter(user.username == '111111').all()
#     result.title = 'success2018'


@app.route("/login", methods=['POST'])
def login():
    username1 = request.form.get("username")
    password1 = request.form.get("password")
    # Query through the model class; a row matching both fields means the login is valid.
    if User.query.filter_by(username=username1, password=password1).first() is not None:
        print("Login succeeded")
        return render_template("text_classification1.html")
    else:
        print("Login failed")
        print(username1)
        return render_template("register.html")


#===========================================================
# Registration:
@app.route("/register", methods=['POST'])
def register():
    username1 = request.form.get("username")
    password1 = request.form.get("password")
    # Usernames are unique: add the user if the name is free,
    # otherwise return an "already registered" message.
    if User.query.filter_by(username=username1).first() is None:
        add_object(User(username=username1, password=password1))
        return render_template("login.html")
    else:
        print("Already registered")
        message = "Already registered"
        return render_template("register.html", message=message)






## Part 2 Automatic word cloud generation ++++++++++++++++++++++++++++++++++++++++++++++++++++
def read_file(filepath):
    """
    Read the local file and return its text.

    Parameters
    ----------
    filepath : TYPE-str
        DESCRIPTION: the text file path.

    Returns
    -------
    content : TYPE-str
        DESCRIPTION: the file content.

    """
    with open(filepath, 'r', encoding='utf-8') as f:
        return f.read()


def save_to_file(filepath, content):
    with open(filepath, 'w', encoding='utf-8') as f:
        f.write(content)

def check_url(url):
    """
    Check whether the URL can be accessed normally.

    Opens the URL with a browser-like User-Agent and reports the outcome.

    Parameters
    ----------
    url : TYPE-str
        DESCRIPTION: the URL.

    Returns
    -------
    ok : TYPE-bool
        DESCRIPTION: True if the page opened, False otherwise.

    """
    # "import urllib" alone does not guarantee that the submodules are loaded.
    import urllib.request
    import urllib.error
    import time
    
    opener = urllib.request.build_opener()
    opener.addheaders = [('User-agent', 'Mozilla/49.0.2')] #Mozilla/5.0 (Windows NT 6.1; WOW64; rv:6.0) Gecko/20100101 Firefox/6.0
    url = url.replace('\n','').strip()
    try:
        opener.open(url)
        print(url + ' successfully accessed.')
        return True
    except urllib.error.HTTPError:
        print(url + ' = Error when accessing the page.')
        time.sleep(2)
    except urllib.error.URLError:
        print(url + " = Error when accessing the page.")
        time.sleep(2)
    time.sleep(0.1)
    return False
    

       



##############################################################################################




##############################################################################################
## Part 3 Text preprocessing
## Part 3.2 Key information extraction -- multi-text analysis -- topic analysis






##############################################################################################

## Part 4 Text classification
# /classification_1 handles a single text
# English
@app.route("/classification_1",methods=['GET'])
def review_classification_home():
    return render_template("text_classification1.html")

@app.route("/classification_1",methods=['POST'])
def review_classification_input():
    text = request.form.get('inputtext')
    if text is None:
        return render_template("text_classification1.html")
    text1 = text  # keep the text exactly as entered
    if not text.isascii():  # not English: translate it first
        url = 'http://fanyi.youdao.com/translate?smartresult=dict&smartresult=rule'
        data = {
            'i': text,
            'from': 'AUTO',
            'to': 'AUTO',
            'smartresult': 'dict',
            'client': 'fanyideskweb',
            'salt': '16071715461327',
            'sign': 'f5d5d5c129878e8e36558fb321b16f85',
            'ts': '1607171546132',
            'bv': 'd943a2cf8cbe86fb2d1ff7fcd59a6a8c',
            'doctype': 'json',
            'version': '2.1',
            'keyfrom': 'fanyi.web',
            'action': 'FY_BY_REALTlME',
            'typoResult': 'false'
        }

        # Send the POST request and read the response
        response = requests.post(url, data=data)
        result = json.loads(response.text)

        # Parse the translation result
        translate_result = result['translateResult'][0][0]['tgt']
        print("Translation:", translate_result)
        text = translate_result
    try:
        save_to_file(config.classificaion_input_text_path, text)  # English text
        save_to_file(config.classificaion_input_text1_path, text1)  # text as originally entered
        print(text)
        return redirect('/download_classification')
    except Exception:
        return render_template("text_classification1.html")
####################################################################################
#####################################################################################
# Classification result
@app.route('/download_classification', methods=['GET'])
def review_classification():
    cur = LR_xitong.predict_review()
    print("Returning the result")
    return render_template("classification.html", curinput=cur)


# Classification result: download the output
@app.route('/download_classification', methods=['POST'])
def download_review_classification():
    file_dir, filename = os.path.split(config.download_classification_input_text_save_path)
    print("Sending the saved file")
    return send_from_directory(file_dir, filename, as_attachment=True)
######################################################################################
# Batch text processing
@app.route("/classification_2",methods=['GET'])
def pilialng():
    return render_template("text_classification2.html")


@app.route('/classification_2', methods=['POST'])
def get_import_file():
    userfile = request.files.get('loadfile')
    if userfile:
        filename = secure_filename(userfile.filename)
        types = ['xlsx', 'csv', 'xls']
        if filename.split('.')[-1] in types:
            uploadpath = os.path.join(config.save_dir, filename)
            userfile.save(uploadpath)
            save_to_file(config.wc_input_file_save_path, uploadpath)
            print('File uploaded')
            return redirect('/download_classification_2')
    # No file, or an unsupported extension: show the upload page again.
    return render_template("text_classification2.html")

#=============================
# Batch result
@app.route('/download_classification_2', methods=['GET'])
def rt_keyinfo_import_file():
    filepath=read_file(config.wc_input_file_save_path)
    cur = file.predict(filepath)  # returns the list of [review, label] pairs
    return render_template("classification2.html", curinput=cur)


# 03 tab3 key info generation: download the output
@app.route('/download_classification_2', methods=['POST'])
def download_keyinfo_3():
    # NOTE: save() is commented out in the batch module shown later; it has to be restored for this route to work.
    file.save()
    return '', 204  # a Flask view must return a valid response; the integer 0 is not one

##############################################################################################
# URL input
@app.route("/classification_3", methods=['GET'])
def keyinfo_home_1():
    return render_template("text_classification3.html")


# 01 tab1 key info extraction: read the front-end input
@app.route('/classification_3', methods=['POST'])
def get_keyinfo_url():
    try:
        # Drop the first 25 characters (the length of "https://www.yelp.com/biz/"),
        # leaving the restaurant name used in the URL.
        url = request.form.get('texturl')[25:]
        save_to_file(config.keyinfo_input_url_path, url)
        # if check_url(url):
        #     save_to_file(config.keyinfo_input_url_path, url)
        #     print('add URL: ' + url)
        return redirect('/download_classification_3')
    except Exception:
        return render_template("text_classification3.html")


# 01 tab1 key info generation: data request
@app.route('/download_classification_3', methods=['GET'])
def rt_keyinfo_url():
    res_name=read_file(config.keyinfo_input_url_path)  # the restaurant name
    # Scrape the reviews and store them in yelp_reviews.csv
    yelp_claw.claw(res_name)
    cur = file.predict('yelp_reviews.csv')
    return render_template("classification3.html", curinput=cur)


# 01 tab1 key info generation: download the output
@app.route('/download_classification_3', methods=['POST'])
def download_keyinfo_1():
    file_dir, filename = os.path.split(config.download_keyinfo_input_url_save_path)
    return send_from_directory(file_dir, filename, as_attachment=True)













##############################################################################################
    








# #############################  Error handling  ###########################
# 403 error
@app.errorhandler(403)
def miss(e):
    return render_template('error-403.html'), 403


# 404 error
@app.errorhandler(404)
def error404(e):
    return render_template('error-404.html'), 404


# 405 error
@app.errorhandler(405)
def erro405r(e):
    return render_template('error-405.html'), 405


# 500 error
@app.errorhandler(500)
def error500(e):
    return render_template('error-500.html'), 500




# Main entry point
if __name__ == "__main__":
    app.run()
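
The login and register routes above store and compare the password as plain text. A minimal sketch of salted hashing with the standard library follows. `hash_password` and `verify_password` are hypothetical helpers that are not part of the original project, and the iteration count is an illustrative assumption.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 200_000  # assumed work factor; tune it for your hardware


def hash_password(password: str) -> str:
    """Return 'salt$hash' (both hex) for storage in the password column."""
    salt = secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt.hex() + "$" + digest.hex()


def verify_password(password: str, stored: str) -> bool:
    """Recompute the hash with the stored salt and compare in constant time."""
    salt_hex, digest_hex = stored.split("$")
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), bytes.fromhex(salt_hex), ITERATIONS
    )
    return hmac.compare_digest(digest.hex(), digest_hex)


stored = hash_password("123456")
print(verify_password("123456", stored))  # correct password
print(verify_password("654321", stored))  # wrong password
```

With helpers like these, registration would store `hash_password(password1)`, and login would look the user up by username only and then call `verify_password` on the stored value. The resulting string is 97 characters long, so it fits the existing `String(255)` column.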

2.LR_xitong.py

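This module implements detection for a single review. It trains a logistic regression (LR) model on the dataset when the module is loaded, keeps the fitted model in memory, and uses it to classify the sentence vector computed for each new input.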

## Basic libraries
import numpy as np


## Logistic regression model
from sklearn.linear_model import LogisticRegression
import pandas as pd
from sklearn import linear_model
from src.exe import Singlesentence
# The star import supplies vec(), read_file() and config used below.
from Singlesentence import *
import tensorflow as tf
from tensorflow import keras

## Demo: classification with LogisticRegression

## Build the dataset
train_data_features=pd.read_csv(r'D:\BaiduNetdiskDownload\yelp\new\BHAN+W\res.csv') # the CSV needs a header row
file_name = r'D:\BaiduNetdiskDownload\yelp\yelp_rzj\label.csv' # training label file
label_name = 'label1' # name of the label column
# Extract the review labels
def getLabel():
    df_data=pd.read_csv(file_name, encoding='utf-8')
    data = list(df_data[label_name])
    return data
label = getLabel()
x_fearures = train_data_features
y_label = label


## Create the logistic regression model
lr_clf = LogisticRegression()

## Fit the model on the dataset
lr_clf = lr_clf.fit(x_fearures, y_label)

def predict_review():

    x_fearures_new1=[vec()]
    ## Classify the new sentence vector with the fitted model


    y_label_new1_predict=lr_clf.predict(x_fearures_new1)


    if y_label_new1_predict[0] == 1:
        a='genuine'
    else:
        a='fake'
    print('The New point 1 predict class:\n',a)
    ## Logistic regression is a probabilistic model, p = p(y=1|x,\theta), so predict_proba gives the class probabilities
    y_label_new1_predict_proba=lr_clf.predict_proba(x_fearures_new1)
    print('The New point 1 predict Probability of each class:\n',y_label_new1_predict_proba)
    a1=read_file(config.classificaion_input_text_path)  # English (possibly translated) text
    b=read_file(config.classificaion_input_text1_path)  # text as originally entered
    # Show the text the user typed, whether or not it was translated.
    if a1==b:
        inputtext=a1
    else:
        inputtext=b
    curinput={'inputtext':inputtext,'a':a,'proba':y_label_new1_predict_proba}
    return curinput
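
The comment in predict_review() refers to p = p(y=1|x,θ). To illustrate what `predict` and `predict_proba` compute for a binary model, here is a minimal pure-Python sketch. The weights, intercept and feature values are made-up numbers, not parameters of the trained model, and the helper names are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(x, weights, bias):
    """Return [p(y=0|x), p(y=1|x)] for a binary logistic regression."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    p1 = sigmoid(z)
    return [1.0 - p1, p1]


def predict_label(x, weights, bias):
    """Class 1 ('genuine' in this article) when p(y=1|x) exceeds 0.5."""
    return 1 if predict_proba(x, weights, bias)[1] > 0.5 else 0


weights = [0.8, -1.2, 0.5]  # hypothetical coefficients
bias = 0.1                  # hypothetical intercept
x = [1.0, 0.5, 2.0]         # hypothetical 3-dimensional sentence vector

proba = predict_proba(x, weights, bias)
print(proba, predict_label(x, weights, bias))
```

In the real module the feature vector has 200 dimensions and the coefficients come from `lr_clf.fit`, but the decision rule is the same: the label is 1 when the weighted sum plus intercept is positive.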

3.singleSentence.py

This module turns the text into a vector with the BERT-whitening model.

#! -*- coding: utf-8 -*-
# A simple linear transformation (whitening) can match or even surpass BERT-flow.

from utils import *
import os, sys
import numpy as np
import xlsxwriter
import re
from src import config
import pandas as pd
import tensorflow as tf
from tensorflow import keras
def save_to_file(filepath, content):
    """
    Write the text to the local file.

    Parameters
    ----------
    filepath : TYPE-str
        DESCRIPTION: the file save path.

    Returns
    -------
    content : TYPE-str
        DESCRIPTION: the text.

    """
    f = open(filepath, 'w', encoding='utf-8')
    f.write(content)
    f.close()

def read_file(filepath):
    """
    Read the local file and transform to text.

    Parameters
    ----------
    filepath : TYPE-str
        DESCRIPTION: the text file path.

    Returns
    -------
    content : TYPE-str
        DESCRIPTION:The preprocessed news text.

    """
    f = open(filepath,'r',encoding='utf-8')
    content = f.read()
    f.close()
    return content
def load_mnli_train_data1(filename):
    df = pd.read_csv(filename, encoding='gbk')
    # Split data and label
    data = df['comment_text']
    D = []
    with open(filename, encoding='gbk') as f:
        for i, l in enumerate(f):
            if i > 0:
                l = l.strip().split(',')
                # Sentence delimiters (English and Chinese punctuation)
                pattern = r'\.|\?|\~|!|。|、|；|‘|’|【|】|·|!|…|（|）'
                result_list = re.split(pattern, data[i-1])
                for text in result_list:
                    D.append((text, l[-1]))
    return D
def convert_to_ids1(data, tokenizer, maxlen=64):
    """Convert text data to token ids
    """
    a_token_ids= []
    for d in tqdm(data):
        token_ids = tokenizer.encode(d, maxlen=maxlen)[0]
        a_token_ids.append(token_ids)

    a_token_ids = sequence_padding(a_token_ids)

    return a_token_ids

def convert_to_vecs1(data, tokenizer, encoder, maxlen=64):
    """Convert text data to vectors
    """
    a_token_ids = convert_to_ids1(data, tokenizer, maxlen)
    with session.as_default():
        with session.graph.as_default():
            a_vecs = encoder.predict([a_token_ids,
                              np.zeros_like(a_token_ids)],
                             verbose=True)

    return a_vecs


# TensorFlow 1.x session setup (in TF 2.x these live under tf.compat.v1)
config1 = tf.ConfigProto(
    device_count={'CPU': 1},
    intra_op_parallelism_threads=1,
    allow_soft_placement=True
)
session = tf.Session(config=config1)
keras.backend.set_session(session)
# BERT configuration

config_path = r'D:\HomeWork\Paper\ZhangRong\BERT\BERT\GLUE\BERT_BASE_DIR\uncased_L-12_H-768_A-12\bert_config.json'
checkpoint_path =r'D:\HomeWork\Paper\ZhangRong\BERT\BERT\GLUE\BERT_BASE_DIR\uncased_L-12_H-768_A-12\bert_model.ckpt'
dict_path = r'D:\HomeWork\Paper\ZhangRong\BERT\BERT\GLUE\BERT_BASE_DIR\uncased_L-12_H-768_A-12\vocab.txt'

# Build the tokenizer
tokenizer = get_tokenizer(dict_path)

# Build the model
encoder = get_encoder(config_path, checkpoint_path)

# Load the NLI pre-trained weights


encoder.load_weights(r'D:\downloads\BERT-whitening-main\BERT-whitening-main\eng\weights\_res200.weights')

def vec():
    data=read_file(config.classificaion_input_text_path)
    print("Inside vec():",data)
    # pattern = r'\.|\?|\~|!|。|、|；|‘|’|【|】|·|!|…|（|）'
    # result_list = re.split(pattern, data)
    # D1=[]
    # for text in result_list:
    #     D1.append(text)
    # nli_data = D1
    # NOTE: data is a str, so convert_to_ids1 iterates over it character by character;
    # pass a list of sentences (see the commented split above) for sentence-level vectors.
    nli_data = data
    # TODO: filter out nonsensical sentences here, or is removing stop words better?
    nli_a_vecs= convert_to_vecs1(
        nli_data, tokenizer, encoder
    )
    # nli_a_vecs=nli_a_vecs.reshape((2,384))
    # Compute the whitened vectors
    kernel, bias = compute_kernel_bias([nli_a_vecs],n_components=200)
    # np.save('weights/hotel.kernel.bias' , [kernel, bias])
    kernel = kernel[:, :768]
    a_vecs = transform_and_normalize(nli_a_vecs, kernel, bias) # shape=[number of rows, 200]
    # Average the rows: [number of rows, 200] -> [200]
    a=[0]*200 # 200 is the dimension of the final vector
    for i in a_vecs:
        a=a+i
    output = a/len(a_vecs)
    return output
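
The whitening step in vec() relies on `compute_kernel_bias` and `transform_and_normalize` from `utils`, which the article does not show. The sketch below illustrates the general whitening recipe with NumPy: center the vectors, decorrelate them through an SVD of the covariance matrix, keep the first `n_components` directions, then L2-normalize. The function names `fit_whitening` and `apply_whitening`, the random toy data and the dimensions are assumptions for illustration, not the project's actual helpers.

```python
import numpy as np


def fit_whitening(vecs: np.ndarray, n_components: int):
    """Return (kernel, bias) estimated from a corpus of sentence vectors."""
    mu = vecs.mean(axis=0, keepdims=True)        # (1, d) mean vector
    cov = np.cov(vecs.T)                         # (d, d) covariance matrix
    u, s, _ = np.linalg.svd(cov)
    kernel = u @ np.diag(1.0 / np.sqrt(s))       # rotate, then rescale each axis
    return kernel[:, :n_components], -mu


def apply_whitening(vecs: np.ndarray, kernel: np.ndarray, bias: np.ndarray):
    """Center, project onto the kept directions and L2-normalize each row."""
    out = (vecs + bias) @ kernel
    norms = np.linalg.norm(out, axis=1, keepdims=True)
    return out / np.clip(norms, 1e-8, None)


rng = np.random.default_rng(0)
corpus_vecs = rng.normal(size=(500, 16))         # toy stand-in for BERT outputs
kernel, bias = fit_whitening(corpus_vecs, n_components=4)
whitened = apply_whitening(corpus_vecs, kernel, bias)
print(kernel.shape, whitened.shape)
```

One design point worth noting: the covariance is only meaningful when it is estimated from many vectors, so a common arrangement is to fit the kernel and bias once on the training corpus, save them, and reuse them for each new review.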

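4.Batch text processing

This code is very similar to the single-text version above. The difference is in predict(), which reads a file and vectorizes every review in it instead of a single text.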

#! -*- coding: utf-8 -*-
# A simple linear transformation (whitening) can match or even surpass BERT-flow.

from utils import *
import os, sys
import numpy as np
import xlsxwriter
import re
from src import config
import pandas as pd
import tensorflow as tf
from tensorflow import keras
def save_to_file(filepath, content):
    """
    Write the text to the local file.

    Parameters
    ----------
    filepath : TYPE-str
        DESCRIPTION: the file save path.

    Returns
    -------
    content : TYPE-str
        DESCRIPTION: the text.

    """
    f = open(filepath, 'w', encoding='utf-8')
    f.write(content)
    f.close()

def read_file(filepath):
    """
    Read the local file and transform to text.

    Parameters
    ----------
    filepath : TYPE-str
        DESCRIPTION: the text file path.

    Returns
    -------
    content : TYPE-str
        DESCRIPTION:The preprocessed news text.

    """
    f = open(filepath,'r',encoding='utf-8')
    content = f.read()
    f.close()
    return content
def load_mnli_train_data2(filename):
    # df = pd.read_csv(filename, encoding='gbk')
    # Split data and label
    # data = df['comment_text']
    D = []
    with open(filename, encoding='gbk') as f:
        for i, l in enumerate(f):
            if i > 0:
                D.append(l)
    return D
def load_mnli_train_data3(filename):
    df = pd.read_csv(filename, encoding='gbk')
    data = df['comment_text']
    D = []
    for d in data:
        D.append(d)
    return D
def convert_to_ids1(data, tokenizer, maxlen=64):
    """Convert text data to token ids
    """
    a_token_ids= []
    for d in tqdm(data):
        token_ids = tokenizer.encode(d, maxlen=maxlen)[0]
        a_token_ids.append(token_ids)

    a_token_ids = sequence_padding(a_token_ids)

    return a_token_ids

def convert_to_vecs1(data, tokenizer, encoder, maxlen=64):
    """Convert text data to vectors
    """
    a_token_ids = convert_to_ids1(data, tokenizer, maxlen)
    with session.as_default():
        with session.graph.as_default():
            a_vecs = encoder.predict([a_token_ids,
                              np.zeros_like(a_token_ids)],
                             verbose=True)

    return a_vecs


config1 = tf.ConfigProto(
    device_count={'CPU': 1},
    intra_op_parallelism_threads=1,
    allow_soft_placement=True
)
session = tf.Session(config=config1)
keras.backend.set_session(session)
# BERT configuration

config_path = r'D:\HomeWork\Paper\ZhangRong\BERT\BERT\GLUE\BERT_BASE_DIR\uncased_L-12_H-768_A-12\bert_config.json'
checkpoint_path =r'D:\HomeWork\Paper\ZhangRong\BERT\BERT\GLUE\BERT_BASE_DIR\uncased_L-12_H-768_A-12\bert_model.ckpt'
dict_path = r'D:\HomeWork\Paper\ZhangRong\BERT\BERT\GLUE\BERT_BASE_DIR\uncased_L-12_H-768_A-12\vocab.txt'

# Build the tokenizer
tokenizer = get_tokenizer(dict_path)

# Build the model
encoder = get_encoder(config_path, checkpoint_path)

# Load the NLI pre-trained weights


encoder.load_weights(r'D:\downloads\BERT-whitening-main\BERT-whitening-main\eng\weights\_res200.weights')







# Compute the vector
def vec1(nli_data):
    # TODO: filter out nonsensical sentences here, or is removing stop words better?
    # nli_data = preProcess(nli_data) # strip web page residue first
    nli_a_vecs = convert_to_vecs1(
        nli_data, tokenizer, encoder
    )

    # Compute the whitened vectors
    kernel, bias = compute_kernel_bias([nli_a_vecs], n_components=200)
    # np.save('weights/hotel.kernel.bias' , [kernel, bias])
    kernel = kernel[:, :768]
    a_vecs = transform_and_normalize(nli_a_vecs, kernel, bias)  # shape=[number of rows, 200]
    # Average the rows: [number of rows, 200] -> [200]
    a = [0] * 200  # 200 is the dimension of the final vector
    for i in a_vecs:
        a = a + i
    output = a / len(a_vecs)
    return output


## Logistic regression model
from sklearn.linear_model import LogisticRegression
import pandas as pd
from sklearn import linear_model
from src.exe import Singlesentence
from Singlesentence import *
import tensorflow as tf
from tensorflow import keras

## Demo: classification with LogisticRegression

## Build the dataset
train_data_features=pd.read_csv(r'D:\BaiduNetdiskDownload\yelp\new\BHAN+W\res.csv') # the CSV needs a header row
file_name = r'D:\BaiduNetdiskDownload\yelp\yelp_rzj\label.csv' # training label file
label_name = 'label1' # name of the label column
# Extract the review labels
def getLabel():
    df_data=pd.read_csv(file_name, encoding='utf-8')
    data = list(df_data[label_name])
    return data
label = getLabel()
x_fearures = train_data_features
y_label = label


## Create the logistic regression model
lr_clf = LogisticRegression()

## Fit the model on the dataset
lr_clf = lr_clf.fit(x_fearures, y_label)

def predict(filepath):
    Data = []
    # Run the predictions
    data = load_mnli_train_data3(filepath)
    for input_text in data:
        # Preprocess: drop the HTML entity &#39; and every character that is not a letter, digit or whitespace
        input_text = re.sub(r"&#39;", "", input_text)
        input_text = re.sub(r"[^a-zA-Z0-9\s]", "", input_text)
        pred = lr_clf.predict([vec1(input_text)])
        if pred[0] == 1:
            Data.append([input_text, 'genuine'])
        else:
            Data.append([input_text, 'fake'])
    curinput={'Data':Data,'filename':filepath,'url':read_file(config.keyinfo_input_url_path) }
    print(Data)
    return curinput
# predict()

# def save():
# # Write the contents of Data to a table
#     dd=pd.DataFrame(predict().Data,columns=['comment','label'])
#     file='D:\downloads\predict_file.csv'
#     dd.to_csv(file)
#     return file
#
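
The two `re.sub` calls in predict() delete the entity `&#39;` and every non-alphanumeric character, so a tag such as `<br>` leaves the letters "br" glued to the neighbouring words. A minimal sketch of a cleaning step that decodes entities and strips tags before removing punctuation is shown below. `clean_review` is a hypothetical helper and the sample string is made up.

```python
import html
import re


def clean_review(text: str) -> str:
    """Decode HTML entities, drop tags, keep only letters, digits and spaces."""
    text = html.unescape(text)                   # "&#39;" becomes "'"
    text = re.sub(r"<[^>]+>", " ", text)         # replace tags such as <br> with a space
    text = re.sub(r"[^a-zA-Z0-9\s]", "", text)   # same character filter as predict()
    return re.sub(r"\s+", " ", text).strip()     # collapse repeated whitespace


sample = "I don&#39;t like it!<br>Never again."
print(clean_review(sample))  # prints: I dont like it Never again
```

The character filter is kept identical to the one in predict(), so the classifier still sees text of the same shape; only the tag and entity handling differs.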

5.Web scraping code

import requests
import csv

# Set the API key and the API endpoint URL
# API_KEY =  'GET https://api.yelp.com/v3/businesses/north-india-restaurant-san-francisco/reviews'
# API_HOST = 'https://api.yelp.com/v3'
# REVIEWS_PATH = '/businesses/{}/reviews'
#
# # Set the restaurant ID and the request headers
# business_id = 'NORTH-INDIA-RESTAURANT-SAN-FRANCISCO'
# headers = {'Authorization': 'Bearer %s' % API_KEY}
#
# # Request the restaurant's reviews
# url = API_HOST + REVIEWS_PATH.format(business_id)

# The review endpoint was found by inspecting the page's requests; fetch it and parse the JSON for the fields we need
def claw(res_name):
    # businessid=res_name
    i=0  # offset of the first review; only the first page is fetched


    print(res_name + " is the restaurant name")


    response = requests.get('https://www.yelp.com/biz/{}/review_feed?start={}'.format(res_name,i))
    reviews = response.json()['reviews']
    # Write the review data to a CSV file
    with open('yelp_reviews.csv', mode='w', encoding='utf-8', newline='') as file:
        writer = csv.writer(file)
        writer.writerow(['User Name', 'User_URL', 'Review Data', 'Rating', 'comment_text', 'Review Count'])
        for review in reviews:
            user_name = review['user']['altText']  # user name
            user_link = review['user']['link'][21:]  # user profile link, first 21 characters dropped
            review_count = review['user']['reviewCount']  # number of reviews by the user
            rating = review['rating']  # review rating
            text = review['comment']['text']  # review text
            data = review['localizedDate']  # review date
            writer.writerow([user_name, user_link, data, rating, text, review_count])
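
claw() receives the restaurant name that app.py cuts out of the submitted URL with a fixed `[25:]` slice, and it requests only the page at `start=0`. The sketch below shows one way to extract the name with `urllib.parse` and to build the URLs for several pages. `business_alias` and `review_feed_urls` are hypothetical helpers, and the page size of 10 is an assumption about the endpoint rather than something the article states.

```python
from urllib.parse import urlparse


def business_alias(page_url: str) -> str:
    """Return the path segment after /biz/, without query string or fragment."""
    parts = [p for p in urlparse(page_url.strip()).path.split("/") if p]
    if len(parts) < 2 or parts[0] != "biz":
        raise ValueError("not a Yelp business URL: " + page_url)
    return parts[1]


def review_feed_urls(alias: str, pages: int, page_size: int = 10):
    """Build review_feed URLs for consecutive pages (page_size is assumed)."""
    template = "https://www.yelp.com/biz/{}/review_feed?start={}"
    return [template.format(alias, n * page_size) for n in range(pages)]


alias = business_alias(
    "https://www.yelp.com/biz/north-india-restaurant-san-francisco?osq=indian"
)
print(alias)
print(review_feed_urls(alias, pages=3))
```

Parsing the URL keeps a trailing query string such as `?osq=indian` out of the restaurant name, which the fixed slice would pass through to the request.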

That is about all of the main code. Next comes the visualization interface:
