A Detailed Look at torch.utils.data.DataLoader in PyTorch, with Examples

Updated: 2022-09-27 09:46:31  Author: 進擊的程小白

torch.utils.data.DataLoader is mainly used to split data into batches. This article walks through its parameters one by one and illustrates them with example code.

1. dataset (type: Dataset)

The dataset from which the data is loaded; this is where the raw data enters the pipeline. PyTorch provides its own structure for this, torch.utils.data.Dataset.

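2. batch_size (type: int)

The number of samples per batch (default: 1); set it to suit your situation. When training, PyTorch does not feed the model one row at a time, which would be inefficient, but in bundles. This parameter defines how many rows go to the network at each step; with a value of 1 the data is fed row by row. Each batch contains batch_size samples, taken in the dataset's original order unless shuffle is True. If the number of samples in the dataset is not a multiple of batch_size, the last batch contains all the remaining samples. To discard that incomplete final batch instead, set drop_last to True.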
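As a quick check of the batching rule described above, here is a minimal sketch that iterates over an 11-sample dataset with batch_size=5 and shuffle left at its default of False. The variable names are illustrative only.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# 11 samples with values 1..11
x = torch.arange(1, 12, dtype=torch.float32)
dataset = TensorDataset(x)

# shuffle defaults to False, so samples come out in their original order
loader = DataLoader(dataset, batch_size=5)

batches = [batch for (batch,) in loader]
batch_sizes = [len(b) for b in batches]
print(batch_sizes)           # the last batch holds the remainder
print(batches[0].tolist())   # first batch, in dataset order
```

Because 11 is not a multiple of 5, the final batch holds the single leftover sample.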

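3. shuffle (type: bool)

Shuffling; default False. Whether to reshuffle the data at every epoch. Shuffling the order of the input makes consecutive samples less correlated, but if the data has sequential structure whose order matters, do not set this to True.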
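The sketch below illustrates shuffling under one assumption that is not part of the original text: a seeded torch.Generator is passed through the DataLoader's generator argument so that the shuffled order can be repeated. The helper one_epoch is a hypothetical name used only for this illustration.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.arange(10))

def one_epoch(seed):
    # A freshly seeded generator makes the shuffled order repeatable
    g = torch.Generator()
    g.manual_seed(seed)
    loader = DataLoader(dataset, batch_size=4, shuffle=True, generator=g)
    # Flatten all batches of one epoch into a single list of values
    return [i for (batch,) in loader for i in batch.tolist()]

first = one_epoch(seed=0)
second = one_epoch(seed=0)
print(first)
print(first == second)  # same seed, same order
```

Every sample still appears exactly once per epoch; shuffling changes only the order in which samples are grouped into batches.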

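4. collate_fn (type: callable, that is, any object that can be called like a function)

Merges a list of samples into a mini-batch; the default is None, in which case PyTorch's default collation is used, which stacks the samples into batched tensors. Supply your own function when the samples cannot simply be stacked, for example when they have different lengths.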
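To make the role of collate_fn concrete, here is a minimal sketch of a custom collate function that pads variable-length sequences. The function name pad_collate and the toy data are assumptions made for illustration; only DataLoader and pad_sequence come from PyTorch.

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader

# Toy dataset: a plain list of 1-D tensors with different lengths
sequences = [torch.ones(3), torch.ones(5), torch.ones(2), torch.ones(4)]

def pad_collate(samples):
    """Merge a list of samples into one zero-padded batch plus the original lengths."""
    lengths = torch.tensor([len(s) for s in samples])
    padded = pad_sequence(samples, batch_first=True, padding_value=0.0)
    return padded, lengths

loader = DataLoader(sequences, batch_size=2, collate_fn=pad_collate)

shapes = []
for padded, lengths in loader:
    shapes.append(tuple(padded.shape))
    print(padded.shape, lengths.tolist())
```

Each batch is padded only to the length of its own longest sequence, so the two batches here have different widths.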

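5. batch_sampler (type: Sampler)

Batch sampling; default None. It works like sampler, but returns a batch of indices at a time (note: indices, not data). It is mutually exclusive with batch_size, shuffle, sampler, and drop_last: a batch sampler already decides which samples go into each batch and in what order, so the options for fixed-size sequential batching, shuffling, and sampling cannot be combined with it.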
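The following minimal sketch shows a batch sampler yielding lists of indices. It assumes the built-in BatchSampler wrapped around a SequentialSampler, which is just one possible choice; any iterable that yields lists of indices could be used.

```python
import torch
from torch.utils.data import BatchSampler, DataLoader, SequentialSampler, TensorDataset

dataset = TensorDataset(torch.arange(7))

# BatchSampler wraps an index sampler and yields lists of indices
batch_sampler = BatchSampler(SequentialSampler(dataset), batch_size=3, drop_last=False)
index_batches = list(batch_sampler)
print(index_batches)

# batch_size, shuffle, sampler and drop_last are left at their defaults here
loader = DataLoader(dataset, batch_sampler=batch_sampler)
data_batches = [batch.tolist() for (batch,) in loader]
print(data_batches)
```

In this toy dataset each value equals its index, so the index batches and the data batches print the same numbers.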

6. sampler (type: Sampler)

Sampling; default None. Defines the strategy for drawing samples from the dataset. If a sampler is specified, shuffle must not be set to True.
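As an illustration of a sampling strategy, the sketch below uses SubsetRandomSampler to draw only from the even indices. That policy is an arbitrary choice for this example. It also shows that combining a sampler with shuffle=True is rejected.

```python
import torch
from torch.utils.data import DataLoader, SubsetRandomSampler, TensorDataset

dataset = TensorDataset(torch.arange(10))

# Draw only from the even indices, in random order
even_indices = list(range(0, 10, 2))
sampler = SubsetRandomSampler(even_indices)

loader = DataLoader(dataset, batch_size=2, sampler=sampler)
seen = sorted(i for (batch,) in loader for i in batch.tolist())
print(seen)

# A custom sampler cannot be combined with shuffle=True
try:
    DataLoader(dataset, sampler=sampler, shuffle=True)
    rejected = False
except ValueError as err:
    rejected = True
    print("ValueError:", err)
```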

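7. num_workers (type: int)

Number of workers; default 0. How many subprocesses to use for loading data. With 0, the data is loaded in the main process. Note: the value must be greater than or equal to 0; a negative value raises a ValueError.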

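8. pin_memory (type: bool)

Memory pinning; default False. If True, the loader copies tensors into pinned (page-locked) host memory before returning them, which speeds up the subsequent transfer to a CUDA device.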
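A minimal sketch of how pin_memory is typically paired with a device transfer follows. It assumes pinning is enabled only when a CUDA device is available, and it falls back to the CPU otherwise so that it runs anywhere.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

use_cuda = torch.cuda.is_available()
device = torch.device("cuda" if use_cuda else "cpu")

dataset = TensorDataset(torch.randn(8, 3))

# Pinning only pays off when batches are later copied to a GPU
loader = DataLoader(dataset, batch_size=4, pin_memory=use_cuda)

for (batch,) in loader:
    # With a pinned source tensor, non_blocking=True allows the host-to-device
    # copy to overlap with computation; on the CPU it has no effect
    batch = batch.to(device, non_blocking=True)
    print(batch.device, tuple(batch.shape))
```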

9. drop_last (type: bool)

Drop the last batch; default False. Once batch_size is set, the final batch may contain fewer samples than batch_size. This parameter controls whether that incomplete batch is discarded.
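The effect of drop_last on the number of batches per epoch can be checked with len(loader). This short sketch uses the same sizes as the examples in this article, 11 samples and a batch size of 5.

```python
import math
import torch
from torch.utils.data import DataLoader, TensorDataset

n_samples, batch_size = 11, 5
dataset = TensorDataset(torch.arange(n_samples))

keep_loader = DataLoader(dataset, batch_size=batch_size, drop_last=False)
drop_loader = DataLoader(dataset, batch_size=batch_size, drop_last=True)

# len(loader) is the number of batches per epoch
print(len(keep_loader), math.ceil(n_samples / batch_size))  # rounds up
print(len(drop_loader), n_samples // batch_size)            # rounds down
```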

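10. timeout (type: numeric)

Timeout; default 0. If positive, it is the time limit for collecting a batch from the workers, and an error is raised if no batch arrives within that time. The value must always be greater than or equal to 0.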

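11. worker_init_fn (type: callable)

Worker initialization function; default None. If it is not None, it is called in each worker subprocess with the worker ID (an int from 0 to num_workers - 1) as its argument, after seeding and before data loading begins.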
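One common use of worker_init_fn is to seed other random number generators inside each worker. The sketch below follows that pattern under stated assumptions: seed_worker and build_loader are hypothetical helper names, and the loader is built with num_workers=0 so that the snippet runs anywhere without starting subprocesses.

```python
import random
import torch
from torch.utils.data import DataLoader, TensorDataset

def seed_worker(worker_id):
    # Inside a worker, torch.initial_seed() returns that worker's own seed;
    # reuse it to seed Python's random module as well
    worker_seed = torch.initial_seed() % 2**32
    random.seed(worker_seed)

def build_loader(num_workers):
    dataset = TensorDataset(torch.arange(8))
    return DataLoader(
        dataset,
        batch_size=4,
        num_workers=num_workers,
        worker_init_fn=seed_worker,  # not called when num_workers=0
    )

# Main-process loading; no worker subprocesses are started
loader = build_loader(num_workers=0)
total = sum(len(batch) for (batch,) in loader)
print(total)
```

To see the function take effect, build the loader with a positive num_workers. On platforms that start workers by spawning a new interpreter, the iteration should then live under an `if __name__ == '__main__':` guard, as in the examples in this article.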

An example illustrating batch_size:

"""
    Batch training: split the data into small batches for training.
    DataLoader wraps the dataset being used and yields one batch at a time.
"""
import torch
import torch.utils.data as Data
 
BATCH_SIZE = 5
 
x = torch.linspace(1, 11, 11)
y = torch.linspace(11, 1, 11)
print(x)
print(y)
# Wrap the tensors in a dataset
torch_dataset = Data.TensorDataset(x, y)
loader = Data.DataLoader(
    # Draw batch_size samples from the dataset at each step
    dataset=torch_dataset,
    batch_size=BATCH_SIZE,
    shuffle=True,
    # num_workers=2,
)
 
def show_batch():
    for epoch in range(3):
        for step, (batch_x, batch_y) in enumerate(loader):
            # training
            print("step:{}, batch_x:{}, batch_y:{}".format(step, batch_x, batch_y))

if __name__ == '__main__':
    show_batch()

The output is (the batch contents differ from run to run because shuffle=True):

tensor([ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11.])
tensor([11., 10., 9., 8., 7., 6., 5., 4., 3., 2., 1.])
step:0, batch_x:tensor([ 3., 2., 8., 11., 1.]), batch_y:tensor([ 9., 10., 4., 1., 11.])
step:1, batch_x:tensor([ 5., 6., 7., 4., 10.]), batch_y:tensor([7., 6., 5., 8., 2.])
step:2, batch_x:tensor([9.]), batch_y:tensor([3.])
step:0, batch_x:tensor([ 9., 7., 10., 2., 4.]), batch_y:tensor([ 3., 5., 2., 10., 8.])
step:1, batch_x:tensor([ 5., 11., 3., 6., 8.]), batch_y:tensor([7., 1., 9., 6., 4.])
step:2, batch_x:tensor([1.]), batch_y:tensor([11.])
step:0, batch_x:tensor([10., 5., 7., 4., 2.]), batch_y:tensor([ 2., 7., 5., 8., 10.])
step:1, batch_x:tensor([3., 9., 1., 8., 6.]), batch_y:tensor([ 9., 3., 11., 4., 6.])
step:2, batch_x:tensor([11.]), batch_y:tensor([1.])


With drop_last=True:

"""
    Batch training: split the data into small batches for training.
    DataLoader wraps the dataset being used and yields one batch at a time.
"""
import torch
import torch.utils.data as Data
 
BATCH_SIZE = 5
 
x = torch.linspace(1, 11, 11)
y = torch.linspace(11, 1, 11)
print(x)
print(y)
# Wrap the tensors in a dataset
torch_dataset = Data.TensorDataset(x, y)
loader = Data.DataLoader(
    # Draw batch_size samples from the dataset at each step
    dataset=torch_dataset,
    batch_size=BATCH_SIZE,
    shuffle=True,
    # num_workers=2,
    drop_last=True,
)
 
def show_batch():
    for epoch in range(3):
        for step, (batch_x, batch_y) in enumerate(loader):
            # training
            print("step:{}, batch_x:{}, batch_y:{}".format(step, batch_x, batch_y))

if __name__ == '__main__':
    show_batch()

The corresponding output is (each epoch now yields only two full batches, and the leftover sample is dropped):

tensor([ 1., 2., 3., 4., 5., 6., 7., 8., 9., 10., 11.])
tensor([11., 10., 9., 8., 7., 6., 5., 4., 3., 2., 1.])
step:0, batch_x:tensor([ 9., 2., 7., 4., 11.]), batch_y:tensor([ 3., 10., 5., 8., 1.])
step:1, batch_x:tensor([ 3., 5., 10., 1., 8.]), batch_y:tensor([ 9., 7., 2., 11., 4.])
step:0, batch_x:tensor([ 5., 11., 6., 1., 2.]), batch_y:tensor([ 7., 1., 6., 11., 10.])
step:1, batch_x:tensor([ 3., 4., 10., 8., 9.]), batch_y:tensor([9., 8., 2., 4., 3.])
step:0, batch_x:tensor([10., 4., 9., 8., 7.]), batch_y:tensor([2., 8., 3., 4., 5.])
step:1, batch_x:tensor([ 6., 1., 11., 2., 5.]), batch_y:tensor([ 6., 11., 1., 10., 7.])

