A Brief Look at Using the TensorFlow Semantic Segmentation API (Training DeepLab on Cityscapes)
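Pitfalls I ran into:
1. Environment:
- TensorFlow 1.8 + CUDA 9.0 + cuDNN 7.0 + Anaconda3 + Python 3.5
- Neither the latest TensorFlow 1.12 nor 1.10 worked. Both failed with an error about not being able to get a convolution algorithm (convolution algorithm...).
2. Dataset conversion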
# Exit immediately if a command exits with a non-zero status.
set -e
CURRENT_DIR=$(pwd)
WORK_DIR="."
# Root path for Cityscapes dataset.
CITYSCAPES_ROOT="${WORK_DIR}/cityscapes"
# Create training labels.
python "${CITYSCAPES_ROOT}/cityscapesscripts/preparation/createTrainIdLabelImgs.py"
# Build TFRecords of the dataset.
# First, create output directory for storing TFRecords.
OUTPUT_DIR="${CITYSCAPES_ROOT}/tfrecord"
mkdir -p "${OUTPUT_DIR}"
BUILD_SCRIPT="${CURRENT_DIR}/build_cityscapes_data.py"
echo "Converting Cityscapes dataset..."
python "${BUILD_SCRIPT}" \
--cityscapes_root="${CITYSCAPES_ROOT}" \
--output_dir="${OUTPUT_DIR}"
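- First, install the cityscapesScripts module in the current conda environment. The installed version must support Python 3.5.
- By default, cityscapesscripts/preparation/createTrainIdLabelImgs.py converts every JSON annotation in the test, train and val folders under gtFine into a *_labelTrainIds.png image. However, a dozen or so JSON files in the test folder have a broken encoding. Each one stops the script with an error, and I had to remove them one at a time!!!
- Then run build_cityscapes_data.py to convert the images and labels to TFRecord format.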
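Instead of rerunning createTrainIdLabelImgs.py and removing one broken file per failure, you can locate all the offending JSON files in a single pass. The helper below is hypothetical: `find_bad_json` is not part of cityscapesScripts. It assumes two things. First, the failures show up as UTF-8 decoding or JSON parsing errors. Second, the annotations follow the usual `*_gtFine_polygons.json` naming.

```python
import json
import os
import sys


def find_bad_json(root):
    """Hypothetical helper: list *_polygons.json files under `root` that
    cannot be read as UTF-8 JSON (assumed to be the files that make
    createTrainIdLabelImgs.py fail)."""
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith("_polygons.json"):
                continue
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", encoding="utf-8") as f:
                    json.load(f)
            except ValueError:
                # UnicodeDecodeError and json.JSONDecodeError both subclass ValueError.
                bad.append(path)
    return sorted(bad)


if __name__ == "__main__":
    # Default path is an assumption; pass the gtFine/test directory explicitly if it differs.
    root = sys.argv[1] if len(sys.argv) > 1 else "./cityscapes/gtFine/test"
    for path in find_bad_json(root):
        print(path)
```

Move the listed files out of gtFine before running the conversion. If the errors in your setup instead come from the locale's default encoding, change the `encoding` argument to match.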
3. Training code for Cityscapes
- Put the training command in a script file, train_deeplab_cityscapes.sh:
#!/bin/bash
PATH_TO_INITIAL_CHECKPOINT='/home/rjw/tf-models/research/deeplab/pretrain_models/deeplabv3_cityscapes_train/model.ckpt'
PATH_TO_TRAIN_DIR='/home/rjw/tf-models/research/deeplab/datasets/cityscapes/exp/train_on_train_set/train/'
PATH_TO_DATASET='/home/rjw/tf-models/research/deeplab/datasets/cityscapes/tfrecord'
WORK_DIR='/home/rjw/tf-models/research/deeplab'
# From tensorflow/models/research/
python "${WORK_DIR}"/train.py \
--logtostderr \
--training_number_of_steps=40000 \
--train_split="train" \
--model_variant="xception_65" \
--atrous_rates=6 \
--atrous_rates=12 \
--atrous_rates=18 \
--output_stride=16 \
--decoder_output_stride=4 \
--train_crop_size=513 \
--train_crop_size=513 \
--train_batch_size=1 \
--fine_tune_batch_norm=False \
--dataset="cityscapes" \
--tf_initial_checkpoint=${PATH_TO_INITIAL_CHECKPOINT} \
--train_logdir=${PATH_TO_TRAIN_DIR} \
--dataset_dir=${PATH_TO_DATASET}
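The repeated `--atrous_rates` and `--train_crop_size` flags are not typos. In the DeepLab code of this era they are multi-valued integer flags (`DEFINE_multi_integer`), so each occurrence appends one value: three atrous rates, then crop height and width. The sketch below only illustrates that behaviour, using the standard library's argparse with `action="append"`. DeepLab itself parses flags with TensorFlow's flags module, not argparse, and `parse_deeplab_like_args` is a made-up name.

```python
import argparse


def parse_deeplab_like_args(argv):
    """Illustrative parser that mimics multi-integer flags with argparse.

    Not DeepLab's real parser; only the flag names are borrowed from the
    training script above.
    """
    parser = argparse.ArgumentParser()
    # Each repetition of these flags appends one integer to a list.
    parser.add_argument("--atrous_rates", type=int, action="append")
    parser.add_argument("--train_crop_size", type=int, action="append")
    parser.add_argument("--output_stride", type=int, default=16)
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_deeplab_like_args([
        "--atrous_rates=6", "--atrous_rates=12", "--atrous_rates=18",
        "--output_stride=16",
        "--train_crop_size=513", "--train_crop_size=513",
    ])
    height, width = args.train_crop_size
    print(args.atrous_rates, height, width)
```

The same reading applies to `--eval_crop_size=1025 --eval_crop_size=2049` in the evaluation script: height 1025, width 2049.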
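Parameter notes:
training_number_of_steps: number of training iterations;
train_crop_size: crop size of the training images. My GPU has only 8 GB of memory, so I set it to 513;
train_batch_size: training batch size, kept at 1, again because of hardware limits;
fine_tune_batch_norm=False: whether to fine-tune (update) the batch norm parameters, not whether batch norm is used at all. The official recommendation is to set it to False when train_batch_size is smaller than 12. This setting is important: without it, training crashed with an error at around step 2000;
tf_initial_checkpoint: the pretrained checkpoint to start from. Here it is the deeplabv3_cityscapes_train model downloaded earlier, i.e. ../research/deeplab/pretrain_models/deeplabv3_cityscapes_train/model.ckpt. Pass the checkpoint prefix model.ckpt, not the model.ckpt.index file;
train_logdir: directory where the training checkpoints are saved. It was created together with the project directories at the start; here it is set to "../research/deeplab/datasets/cityscapes/exp/train_on_train_set/train/";
dataset_dir: path to the dataset, i.e. the TFRecord directory created earlier; here it is set to "../research/deeplab/datasets/cityscapes/tfrecord".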
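To put training_number_of_steps and train_batch_size in perspective, the number of passes over the training set is steps × batch size ÷ number of training images. This sketch assumes the standard Cityscapes gtFine train split of 2975 images. The helper name is made up for illustration.

```python
def epochs_seen(steps, batch_size, num_images):
    """Approximate number of passes over the training set (illustrative helper)."""
    return steps * batch_size / float(num_images)


if __name__ == "__main__":
    # Settings from train_deeplab_cityscapes.sh; 2975 gtFine train images is an assumption.
    e = epochs_seen(steps=40000, batch_size=1, num_images=2975)
    print("about %.1f epochs" % e)
```

With these settings, each training image is seen roughly 13 times.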
4. Evaluation test
- Evaluation script:
#!/bin/bash
PATH_TO_INITIAL_CHECKPOINT='/home/rjw/tf-models/research/deeplab/pretrain_models/deeplabv3_cityscapes_train/'
# checkpoint_dir below evaluates the pretrained model; pass ${PATH_TO_CHECKPOINT} instead to evaluate the model trained above.
PATH_TO_CHECKPOINT='/home/rjw/tf-models/research/deeplab/datasets/cityscapes/exp/train_on_train_set/train/'
PATH_TO_EVAL_DIR='/home/rjw/tf-models/research/deeplab/datasets/cityscapes/exp/train_on_train_set/eval/'
PATH_TO_DATASET='/home/rjw/tf-models/research/deeplab/datasets/cityscapes/tfrecord'
WORK_DIR='/home/rjw/tf-models/research/deeplab'
# From tensorflow/models/research/
python "${WORK_DIR}"/eval.py \
--logtostderr \
--eval_split="val" \
--model_variant="xception_65" \
--atrous_rates=6 \
--atrous_rates=12 \
--atrous_rates=18 \
--output_stride=16 \
--decoder_output_stride=4 \
--eval_crop_size=1025 \
--eval_crop_size=2049 \
--dataset="cityscapes" \
--checkpoint_dir=${PATH_TO_INITIAL_CHECKPOINT} \
--eval_logdir=${PATH_TO_EVAL_DIR} \
--dataset_dir=${PATH_TO_DATASET}
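The eval crop of 1025×2049 is chosen to cover a full Cityscapes frame (1024×2048) in one pass. A commonly cited DeepLab rule of thumb is to pick crop sizes of the form output_stride × k + 1; for training, 513 = 16 × 32 + 1. The hypothetical helpers below find the smallest such size that covers a given image dimension, and check whether a size follows the rule. Treat the rule as a convention rather than a hard requirement of eval.py.

```python
def min_crop_size(image_dim, output_stride=16):
    """Smallest size >= image_dim of the form output_stride * k + 1
    (illustrative helper, not part of DeepLab)."""
    k = -(-(image_dim - 1) // output_stride)  # ceil((image_dim - 1) / output_stride)
    return output_stride * k + 1


def is_aligned(crop_size, output_stride=16):
    """True if crop_size follows the output_stride * k + 1 convention."""
    return (crop_size - 1) % output_stride == 0


if __name__ == "__main__":
    # Cityscapes frames are 1024 (height) x 2048 (width).
    print(min_crop_size(1024), min_crop_size(2048))
    print(is_aligned(513), is_aligned(512))
```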
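- Result: model.ckpt-40000 is the model obtained after training for 40000 iterations starting from the initial (pretrained) model. I then evaluated the initial model itself, and its miou_1.0 is also quite low: 0.496, versus 0.478 for model.ckpt-40000 in the log below. I am not sure whether some parameter is set incorrectly!!!
- Note: the officially released checkpoints, including the initial model used here, do not ship with a file named "checkpoint". You need to add one by hand before eval.py can find the model in checkpoint_dir.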
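The "checkpoint" file that eval.py looks for in checkpoint_dir is a small text file naming the latest checkpoint prefix. Below is a minimal sketch that writes one, assuming the extracted files are named model.ckpt.* as in the paths above. The helper name is hypothetical. If TensorFlow is available, tf.train.update_checkpoint_state can do the same job.

```python
import os
import sys


def write_checkpoint_file(ckpt_dir, prefix="model.ckpt"):
    """Write a minimal `checkpoint` state file (text format) into ckpt_dir.

    Assumes ckpt_dir already contains prefix.index / prefix.data-* files.
    A relative prefix is resolved relative to ckpt_dir.
    """
    content = (
        'model_checkpoint_path: "%s"\n'
        'all_model_checkpoint_paths: "%s"\n' % (prefix, prefix)
    )
    path = os.path.join(ckpt_dir, "checkpoint")
    with open(path, "w") as f:
        f.write(content)
    return path


if __name__ == "__main__":
    # Usage: python write_ckpt_state.py <checkpoint_dir>  (script name is hypothetical)
    if len(sys.argv) > 1:
        print(write_checkpoint_file(sys.argv[1]))
```

For the evaluation script above, the directory to pass is PATH_TO_INITIAL_CHECKPOINT, i.e. /home/rjw/tf-models/research/deeplab/pretrain_models/deeplabv3_cityscapes_train/.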
INFO:tensorflow:Restoring parameters from /home/rjw/tf-models/research/deeplab/datasets/cityscapes/exp/train_on_train_set/train/model.ckpt-40000
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Starting evaluation at 2018-12-18-07:13:08
INFO:tensorflow:Evaluation [50/500]
INFO:tensorflow:Evaluation [100/500]
INFO:tensorflow:Evaluation [150/500]
INFO:tensorflow:Evaluation [200/500]
INFO:tensorflow:Evaluation [250/500]
INFO:tensorflow:Evaluation [300/500]
INFO:tensorflow:Evaluation [350/500]
INFO:tensorflow:Evaluation [400/500]
INFO:tensorflow:Evaluation [450/500]
miou_1.0[0.478293568]
INFO:tensorflow:Waiting for new checkpoint at /home/rjw/tf-models/research/deeplab/pretrain_models/deeplabv3_cityscapes_train/
INFO:tensorflow:Found new checkpoint at /home/rjw/tf-models/research/deeplab/pretrain_models/deeplabv3_cityscapes_train/model.ckpt
INFO:tensorflow:Graph was finalized.
2018-12-18 15:18:05.210957: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1435] Adding visible gpu devices: 0
2018-12-18 15:18:05.211047: I tensorflow/core/common_runtime/gpu/gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2018-12-18 15:18:05.211077: I tensorflow/core/common_runtime/gpu/gpu_device.cc:929] 0
2018-12-18 15:18:05.211100: I tensorflow/core/common_runtime/gpu/gpu_device.cc:942] 0: N
2018-12-18 15:18:05.211645: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 9404 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
INFO:tensorflow:Restoring parameters from /home/rjw/tf-models/research/deeplab/pretrain_models/deeplabv3_cityscapes_train/model.ckpt
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Starting evaluation at 2018-12-18-07:18:06
INFO:tensorflow:Evaluation [50/500]
INFO:tensorflow:Evaluation [100/500]
INFO:tensorflow:Evaluation [150/500]
INFO:tensorflow:Evaluation [200/500]
INFO:tensorflow:Evaluation [250/500]
INFO:tensorflow:Evaluation [300/500]
INFO:tensorflow:Evaluation [350/500]
INFO:tensorflow:Evaluation [400/500]
INFO:tensorflow:Evaluation [450/500]
miou_1.0[0.496331513]
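miou_1.0 is the mean intersection-over-union over the evaluation classes. As a reference for reading those numbers, the sketch below computes the per-class IoU = TP / (TP + FP + FN) from a confusion matrix and averages it. It is a plain-Python illustration, not DeepLab's metric code, which relies on TensorFlow's streaming mean-IoU metric. The sketch assumes pixels with the ignore label (255) were already excluded. Classes with an empty denominator are skipped, which is a choice made here.

```python
def mean_iou(confusion):
    """Mean IoU from a square confusion matrix.

    confusion[i][j] = number of pixels with ground truth i predicted as j.
    Classes that appear in neither ground truth nor prediction are skipped.
    """
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                        # truth c, predicted other
        fp = sum(confusion[r][c] for r in range(n)) - tp   # predicted c, truth other
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / float(denom))
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy 3-class example with made-up pixel counts.
    cm = [[8, 2, 0],
          [1, 6, 1],
          [0, 0, 2]]
    print(round(mean_iou(cm), 4))  # 0.6646
```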
5. Visualization test
- Generate segmentation result images in the vis directory:
#!/bin/bash
PATH_TO_CHECKPOINT='/home/rjw/tf-models/research/deeplab/datasets/cityscapes/exp/train_on_train_set/train/'
PATH_TO_VIS_DIR='/home/rjw/tf-models/research/deeplab/datasets/cityscapes/exp/train_on_train_set/vis/'
PATH_TO_DATASET='/home/rjw/tf-models/research/deeplab/datasets/cityscapes/tfrecord'
WORK_DIR='/home/rjw/tf-models/research/deeplab'
# From tensorflow/models/research/
python "${WORK_DIR}"/vis.py \
--logtostderr \
--vis_split="val" \
--model_variant="xception_65" \
--atrous_rates=6 \
--atrous_rates=12 \
--atrous_rates=18 \
--output_stride=16 \
--decoder_output_stride=4 \
--vis_crop_size=1025 \
--vis_crop_size=2049 \
--dataset="cityscapes" \
--colormap_type="cityscapes" \
--checkpoint_dir=${PATH_TO_CHECKPOINT} \
--vis_logdir=${PATH_TO_VIS_DIR} \
--dataset_dir=${PATH_TO_DATASET}
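--colormap_type="cityscapes" colours each predicted trainId with the standard Cityscapes palette. The sketch below shows the idea on a tiny label map in plain Python. The RGB values are the usual Cityscapes colours for the 19 training classes. Mapping the ignore label 255, or any other out-of-range ID, to black is an assumption of this sketch.

```python
# Standard Cityscapes colours for trainIds 0..18.
CITYSCAPES_PALETTE = [
    (128, 64, 128),   # 0 road
    (244, 35, 232),   # 1 sidewalk
    (70, 70, 70),     # 2 building
    (102, 102, 156),  # 3 wall
    (190, 153, 153),  # 4 fence
    (153, 153, 153),  # 5 pole
    (250, 170, 30),   # 6 traffic light
    (220, 220, 0),    # 7 traffic sign
    (107, 142, 35),   # 8 vegetation
    (152, 251, 152),  # 9 terrain
    (70, 130, 180),   # 10 sky
    (220, 20, 60),    # 11 person
    (255, 0, 0),      # 12 rider
    (0, 0, 142),      # 13 car
    (0, 0, 70),       # 14 truck
    (0, 60, 100),     # 15 bus
    (0, 80, 100),     # 16 train
    (0, 0, 230),      # 17 motorcycle
    (119, 11, 32),    # 18 bicycle
]


def colorize(label_map):
    """Map a 2-D list of trainIds to a 2-D list of RGB tuples.

    IDs outside 0..18 (e.g. the ignore label 255) become black (assumption).
    """
    black = (0, 0, 0)
    return [[CITYSCAPES_PALETTE[v] if 0 <= v < len(CITYSCAPES_PALETTE) else black
             for v in row]
            for row in label_map]


if __name__ == "__main__":
    print(colorize([[0, 13], [10, 255]]))
```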
以上為個(gè)人經(jīng)驗(yàn),希望能給大家一個(gè)參考,也希望大家多多支持腳本之家。