A Simple Digit Recognition Program in Python

Updated: 2025-12-25 08:54:51 Author: ufdf

This article shows how to use a fully connected neural network (MLP) to recognize MNIST digits. It covers the code for defining, training, and testing the model, and explains what the saved model weights file contains.

Simple code

Model definition, model.py

import torch.nn as nn

# Define a simple fully connected (MLP) model
class SimpleModel(nn.Module):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(28 * 28, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 10)
        self.relu = nn.ReLU()
        self.dropout = nn.Dropout(0.2)

    def forward(self, x):
        x = self.flatten(x)  # [B, 1, 28, 28] -> [B, 784]
        x = self.relu(self.fc1(x))
        x = self.dropout(x)
        x = self.relu(self.fc2(x))
        x = self.dropout(x)
        x = self.fc3(x)  # no activation on the output layer (CrossEntropyLoss applies log-softmax internally)
        return x
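
As a quick sanity check on the size of this model, the number of learnable parameters can be worked out from the layer widths alone. The sketch below is plain Python arithmetic and is not part of the original files; `layer_sizes` and `count_linear_params` are names introduced here for illustration only.

```python
# Layer widths taken from model.py: 784 -> 128 -> 64 -> 10
layer_sizes = [28 * 28, 128, 64, 10]

def count_linear_params(sizes):
    """Sum weights (fan_in * fan_out) and biases (fan_out) over consecutive layers."""
    total = 0
    for fan_in, fan_out in zip(sizes, sizes[1:]):
        total += fan_in * fan_out + fan_out
    return total

total_params = count_linear_params(layer_sizes)
print(total_params)  # prints 109386
```

ReLU, Dropout, and Flatten add no learnable parameters, so the three Linear layers account for the whole count.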

Next, the training code, train.py

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
from model import SimpleModel  # imported from model.py

# Configuration
batch_size = 64
learning_rate = 0.001
num_epochs = 10
model_save_path = 'mnist_mlp.pth'

# Data
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
])
train_dataset = torchvision.datasets.MNIST(root='./data', train=True, transform=transform, download=True)
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True)

# Model, loss, optimizer
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = SimpleModel().to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

# Training
print(f"Training on {device}...")
model.train()
for epoch in range(num_epochs):
    total_loss = 0.0
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        total_loss += loss.item()
    print(f'Epoch [{epoch+1}/{num_epochs}], Loss: {total_loss/len(train_loader):.4f}')

# Save the learned weights
torch.save(model.state_dict(), model_save_path)
print(f"Model saved to {model_save_path}")
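
The `Normalize((0.1307,), (0.3081,))` step in the transform above subtracts the MNIST mean and divides by the MNIST standard deviation, after `ToTensor()` has already scaled pixels to the range [0, 1]. A minimal sketch of that arithmetic in plain Python, with a hypothetical `normalize` helper that is not part of train.py:

```python
MNIST_MEAN = 0.1307  # same constants as in the transform
MNIST_STD = 0.3081

def normalize(pixel):
    """Apply (pixel - mean) / std to one pixel value in [0, 1]."""
    return (pixel - MNIST_MEAN) / MNIST_STD

background = normalize(0.0)  # black background pixel
stroke = normalize(1.0)      # fully white stroke pixel
print(f"{background:.4f} {stroke:.4f}")  # prints -0.4242 2.8215
```

So the network never sees raw 0..255 values; any image passed in at prediction time needs the same scaling.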

Training

Before training, we need to install the Python dependencies:

pip install torch torchvision

Now we can start training the model! Run python ./train.py and you should see output similar to this:

Training on cpu...
Epoch [1/10], Loss: 0.3501
Epoch [2/10], Loss: 0.1702
Epoch [3/10], Loss: 0.1335
Epoch [4/10], Loss: 0.1141
Epoch [5/10], Loss: 0.1027
Epoch [6/10], Loss: 0.0915
Epoch [7/10], Loss: 0.0884
Epoch [8/10], Loss: 0.0801
Epoch [9/10], Loss: 0.0769
Epoch [10/10], Loss: 0.0715
Model saved to mnist_mlp.pth

A file named mnist_mlp.pth is created in the current directory. It is a PyTorch model weights file: in essence a serialized dictionary (the state_dict) that stores the values of all learnable parameters in the network, such as weights and biases.
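
To see what such a file holds, we can load a state_dict and list its keys and tensor shapes. The sketch below is only an illustration under a few assumptions: it repeats the parameterized layers of model.py so that it runs on its own, uses freshly initialized (untrained) weights, and writes to an in-memory buffer that stands in for mnist_mlp.pth.

```python
import io

import torch
import torch.nn as nn

class SimpleModel(nn.Module):
    """Same Linear layers as model.py; repeated here so the sketch is self-contained."""
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(28 * 28, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 10)

# Save to an in-memory buffer (a stand-in for the .pth file), then load it back.
buffer = io.BytesIO()
torch.save(SimpleModel().state_dict(), buffer)
buffer.seek(0)
state_dict = torch.load(buffer, map_location='cpu')

# Each entry maps a parameter name to a tensor.
shapes = {name: tuple(tensor.shape) for name, tensor in state_dict.items()}
for name, shape in shapes.items():
    print(name, shape)
```

With the real file, replacing the buffer with the path 'mnist_mlp.pth' in `torch.load` should list the same six entries, holding the trained values.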

Testing the model

Now let's try the model on a digit image of our own.
predict.py

# predict.py
import torch
import torchvision.transforms as transforms
from PIL import Image
from model import SimpleModel
import argparse
import os

def predict_image(image_path, model_path='mnist_mlp.pth', device='cpu'):
    # 1. Load the model and move it to the target device
    model = SimpleModel().to(device)
    model.load_state_dict(torch.load(model_path, map_location=device))
    model.eval()  # inference mode (disables dropout)

    # 2. Image preprocessing (must match what was used in training!)
    transform = transforms.Compose([
        transforms.Grayscale(num_output_channels=1),  # convert to grayscale
        transforms.Resize((28, 28)),                   # resize to 28x28
        transforms.ToTensor(),                         # convert to a Tensor in [0, 1]
        transforms.Normalize((0.1307,), (0.3081,))    # MNIST mean / standard deviation
    ])

    # 3. Load and preprocess the image
    image = Image.open(image_path).convert('L')  # force grayscale (handles RGB input)
    input_tensor = transform(image)              # shape: [1, 28, 28]
    # Add a batch dimension -> [1, 1, 28, 28], on the same device as the model
    input_batch = input_tensor.unsqueeze(0).to(device)

    # 4. Inference
    with torch.no_grad():
        output = model(input_batch)
        probabilities = torch.softmax(output, dim=1)
        predicted_class = torch.argmax(probabilities, dim=1).item()
        confidence = probabilities[0][predicted_class].item()

    return predicted_class, confidence

if __name__ == '__main__':
    parser = argparse.ArgumentParser(description='Predict digit in an image using trained MLP')
    parser.add_argument('image_path', type=str, help='Path to the input image (e.g., digit.png)')
    args = parser.parse_args()

    if not os.path.exists(args.image_path):
        print(f"Error: image file '{args.image_path}' not found!")
        raise SystemExit(1)

    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    digit, conf = predict_image(args.image_path, device=device)

    print(f"Predicted digit: {digit}")
    print(f"Confidence: {conf:.4f} ({conf*100:.2f}%)")
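
The confidence reported by predict.py is simply the largest softmax probability over the ten output values (logits). A minimal sketch of that step in plain Python, using made-up logits rather than real model output:

```python
import math

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for digits 0..9 (not produced by the trained model)
logits = [0.5, -1.2, 3.0, 0.1, -0.3, 0.0, -2.0, 1.1, 0.4, -0.7]

probabilities = softmax(logits)
predicted_class = max(range(len(probabilities)), key=probabilities.__getitem__)
confidence = probabilities[predicted_class]
print(predicted_class, round(confidence, 4))
```

Because softmax preserves the ordering of the logits, the predicted digit is the same whether argmax is taken before or after it; softmax only matters for the reported confidence.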

We can run python .\predict.py .\data\digit.png to see how the prediction turns out.
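
One thing to watch for with your own images: MNIST digits are white strokes on a black background, so a photo or scan of dark ink on light paper may be misclassified unless it is inverted first. The sketch below is one possible way to handle this and is not part of the original predict.py; `to_mnist_polarity` and its brightness threshold are assumptions made for illustration, and the heuristic (a mostly bright image means a light background) will not fit every input.

```python
from PIL import Image, ImageOps, ImageStat

def to_mnist_polarity(image, threshold=127):
    """Return a grayscale copy, inverted if the image is mostly bright.

    Assumption: a high mean brightness means dark ink on a light background.
    """
    gray = image.convert('L')
    mean_brightness = ImageStat.Stat(gray).mean[0]
    if mean_brightness > threshold:
        gray = ImageOps.invert(gray)
    return gray

# Synthetic all-white "paper" image, used only to demonstrate the call
paper = Image.new('L', (28, 28), 255)
fixed = to_mnist_polarity(paper)
print(fixed.getpixel((0, 0)))  # prints 0
```

In predict.py, a helper like this could be applied to the opened image before the transform is run.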
