A Detailed Guide to Cartoonizing GIF Animations and Videos with Python
Preface
Reference article: Cartoonizing Photos with Python (Python實現照片卡通化)
I kept tinkering with the model so that it can also cartoonize GIF animations and videos. After all, if it works on a single image, it should work on video too, no problem. So, having already thrown a punch at the dimensional wall, I'm adding a couple of kicks.
Project GitHub repository: GitHub address
Environment dependencies
Besides the dependencies from the reference article, a few extra packages need to be added to requirements.txt:
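Roughly, based on the imports used in the scripts below (versions unpinned; torch, torchvision and Pillow from the reference article are assumed to be installed already), the extra packages are:

imageio
ffmpy
opencv-python
numpy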

If anything else about the environment is unclear, see the article linked in the preface, which explains the setup in detail.
Core code
No more chit-chat; the GIF code comes first.
GIF animation cartoonization
The implementation code is as follows:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time : 2021/12/5 18:10
# @Author : 劍客阿良_ALiang
# @Site :
# @File : gif_cartoon_tool.py
from PIL import Image, ImageEnhance, ImageSequence
import torch
from torchvision.transforms.functional import to_tensor, to_pil_image
from torch import nn
import os
import torch.nn.functional as F
import uuid
import imageio
# -------------------------- hy add 01 --------------------------
class ConvNormLReLU(nn.Sequential):
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, padding=1, pad_mode="reflect", groups=1, bias=False):
        pad_layer = {
            "zero": nn.ZeroPad2d,
            "same": nn.ReplicationPad2d,
            "reflect": nn.ReflectionPad2d,
        }
        if pad_mode not in pad_layer:
            raise NotImplementedError
        super(ConvNormLReLU, self).__init__(
            pad_layer[pad_mode](padding),
            nn.Conv2d(in_ch, out_ch, kernel_size=kernel_size, stride=stride, padding=0, groups=groups, bias=bias),
            nn.GroupNorm(num_groups=1, num_channels=out_ch, affine=True),
            nn.LeakyReLU(0.2, inplace=True)
        )


class InvertedResBlock(nn.Module):
    def __init__(self, in_ch, out_ch, expansion_ratio=2):
        super(InvertedResBlock, self).__init__()
        self.use_res_connect = in_ch == out_ch
        bottleneck = int(round(in_ch * expansion_ratio))
        layers = []
        if expansion_ratio != 1:
            layers.append(ConvNormLReLU(in_ch, bottleneck, kernel_size=1, padding=0))
        # dw
        layers.append(ConvNormLReLU(bottleneck, bottleneck, groups=bottleneck, bias=True))
        # pw
        layers.append(nn.Conv2d(bottleneck, out_ch, kernel_size=1, padding=0, bias=False))
        layers.append(nn.GroupNorm(num_groups=1, num_channels=out_ch, affine=True))
        self.layers = nn.Sequential(*layers)

    def forward(self, input):
        out = self.layers(input)
        if self.use_res_connect:
            out = input + out
        return out


class Generator(nn.Module):
    def __init__(self, ):
        super().__init__()
        self.block_a = nn.Sequential(
            ConvNormLReLU(3, 32, kernel_size=7, padding=3),
            ConvNormLReLU(32, 64, stride=2, padding=(0, 1, 0, 1)),
            ConvNormLReLU(64, 64)
        )
        self.block_b = nn.Sequential(
            ConvNormLReLU(64, 128, stride=2, padding=(0, 1, 0, 1)),
            ConvNormLReLU(128, 128)
        )
        self.block_c = nn.Sequential(
            ConvNormLReLU(128, 128),
            InvertedResBlock(128, 256, 2),
            InvertedResBlock(256, 256, 2),
            InvertedResBlock(256, 256, 2),
            InvertedResBlock(256, 256, 2),
            ConvNormLReLU(256, 128),
        )
        self.block_d = nn.Sequential(
            ConvNormLReLU(128, 128),
            ConvNormLReLU(128, 128)
        )
        self.block_e = nn.Sequential(
            ConvNormLReLU(128, 64),
            ConvNormLReLU(64, 64),
            ConvNormLReLU(64, 32, kernel_size=7, padding=3)
        )
        self.out_layer = nn.Sequential(
            nn.Conv2d(32, 3, kernel_size=1, stride=1, padding=0, bias=False),
            nn.Tanh()
        )

    def forward(self, input, align_corners=True):
        out = self.block_a(input)
        half_size = out.size()[-2:]
        out = self.block_b(out)
        out = self.block_c(out)
        if align_corners:
            out = F.interpolate(out, half_size, mode="bilinear", align_corners=True)
        else:
            out = F.interpolate(out, scale_factor=2, mode="bilinear", align_corners=False)
        out = self.block_d(out)
        if align_corners:
            out = F.interpolate(out, input.size()[-2:], mode="bilinear", align_corners=True)
        else:
            out = F.interpolate(out, scale_factor=2, mode="bilinear", align_corners=False)
        out = self.block_e(out)
        out = self.out_layer(out)
        return out
# -------------------------- hy add 02 --------------------------
def handle(gif_path: str, output_dir: str, type: int, device='cpu'):
    _ext = os.path.basename(gif_path).strip().split('.')[-1]
    # pick the weights file according to the requested style type
    if type == 1:
        _checkpoint = './weights/paprika.pt'
    elif type == 2:
        _checkpoint = './weights/face_paint_512_v1.pt'
    elif type == 3:
        _checkpoint = './weights/face_paint_512_v2.pt'
    elif type == 4:
        _checkpoint = './weights/celeba_distill.pt'
    else:
        raise Exception('type not support')
    os.makedirs(output_dir, exist_ok=True)
    net = Generator()
    net.load_state_dict(torch.load(_checkpoint, map_location="cpu"))
    net.to(device).eval()
    result = os.path.join(output_dir, '{}.{}'.format(uuid.uuid1().hex, _ext))
    img = Image.open(gif_path)
    out_images = []
    # cartoonize the GIF frame by frame
    for frame in ImageSequence.Iterator(img):
        frame = frame.convert("RGB")
        with torch.no_grad():
            image = to_tensor(frame).unsqueeze(0) * 2 - 1
            out = net(image.to(device), False).cpu()
            out = out.squeeze(0).clip(-1, 1) * 0.5 + 0.5
            out = to_pil_image(out)
        out_images.append(out)
    # out_images[0].save(result, save_all=True, loop=True, append_images=out_images[1:], duration=100)
    imageio.mimsave(result, out_images, fps=15)
    return result


if __name__ == '__main__':
    print(handle('samples/gif/128.gif', 'samples/gif_result/', 3, 'cuda'))
Code notes:
1. The parameters of the main handle method are: GIF path, output directory, type, and device (cpu by default; pass cuda to use the GPU).
2. The type selects the model; 3 is recommended, as it handles portraits more vividly.
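A minimal calling sketch, assuming the code above is saved as gif_cartoon_tool.py (the sample path is the one from the __main__ block):

import torch

from gif_cartoon_tool import handle

# fall back to the CPU when no CUDA device is available
device = 'cuda' if torch.cuda.is_available() else 'cpu'
# type 3 -> face_paint_512_v2.pt; the result is written into the output
# directory under a random (uuid) file name and the path is returned
result_path = handle('samples/gif/128.gif', 'samples/gif_result/', 3, device)
print(result_path)

Note that the output GIF is written with a fixed fps=15 via imageio; if you want to keep the original frame timing instead, one option is to re-enable the commented-out PIL save line in handle and pass duration=img.info.get('duration', 100), which reuses the source GIF's frame delay in milliseconds.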
Run it and verify
Below is the GIF material I prepared

The execution result is as follows

Take a look at the effect

Haha, this is kind of fun.
Video cartoonization
The implementation code is as follows:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time : 2021/12/5 0:26
# @Author : 劍客阿良_ALiang
# @Site :
# @File : video_cartoon_tool.py
from PIL import Image, ImageEnhance
import torch
from torchvision.transforms.functional import to_tensor, to_pil_image
from torch import nn
import os
import torch.nn.functional as F
import uuid
import cv2
import numpy as np
import time
from ffmpy import FFmpeg
# -------------------------- hy add 01 --------------------------
class ConvNormLReLU(nn.Sequential):
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, padding=1, pad_mode="reflect", groups=1, bias=False):
        pad_layer = {
            "zero": nn.ZeroPad2d,
            "same": nn.ReplicationPad2d,
            "reflect": nn.ReflectionPad2d,
        }
        if pad_mode not in pad_layer:
            raise NotImplementedError
        super(ConvNormLReLU, self).__init__(
            pad_layer[pad_mode](padding),
            nn.Conv2d(in_ch, out_ch, kernel_size=kernel_size, stride=stride, padding=0, groups=groups, bias=bias),
            nn.GroupNorm(num_groups=1, num_channels=out_ch, affine=True),
            nn.LeakyReLU(0.2, inplace=True)
        )


class InvertedResBlock(nn.Module):
    def __init__(self, in_ch, out_ch, expansion_ratio=2):
        super(InvertedResBlock, self).__init__()
        self.use_res_connect = in_ch == out_ch
        bottleneck = int(round(in_ch * expansion_ratio))
        layers = []
        if expansion_ratio != 1:
            layers.append(ConvNormLReLU(in_ch, bottleneck, kernel_size=1, padding=0))
        # dw
        layers.append(ConvNormLReLU(bottleneck, bottleneck, groups=bottleneck, bias=True))
        # pw
        layers.append(nn.Conv2d(bottleneck, out_ch, kernel_size=1, padding=0, bias=False))
        layers.append(nn.GroupNorm(num_groups=1, num_channels=out_ch, affine=True))
        self.layers = nn.Sequential(*layers)

    def forward(self, input):
        out = self.layers(input)
        if self.use_res_connect:
            out = input + out
        return out


class Generator(nn.Module):
    def __init__(self, ):
        super().__init__()
        self.block_a = nn.Sequential(
            ConvNormLReLU(3, 32, kernel_size=7, padding=3),
            ConvNormLReLU(32, 64, stride=2, padding=(0, 1, 0, 1)),
            ConvNormLReLU(64, 64)
        )
        self.block_b = nn.Sequential(
            ConvNormLReLU(64, 128, stride=2, padding=(0, 1, 0, 1)),
            ConvNormLReLU(128, 128)
        )
        self.block_c = nn.Sequential(
            ConvNormLReLU(128, 128),
            InvertedResBlock(128, 256, 2),
            InvertedResBlock(256, 256, 2),
            InvertedResBlock(256, 256, 2),
            InvertedResBlock(256, 256, 2),
            ConvNormLReLU(256, 128),
        )
        self.block_d = nn.Sequential(
            ConvNormLReLU(128, 128),
            ConvNormLReLU(128, 128)
        )
        self.block_e = nn.Sequential(
            ConvNormLReLU(128, 64),
            ConvNormLReLU(64, 64),
            ConvNormLReLU(64, 32, kernel_size=7, padding=3)
        )
        self.out_layer = nn.Sequential(
            nn.Conv2d(32, 3, kernel_size=1, stride=1, padding=0, bias=False),
            nn.Tanh()
        )

    def forward(self, input, align_corners=True):
        out = self.block_a(input)
        half_size = out.size()[-2:]
        out = self.block_b(out)
        out = self.block_c(out)
        if align_corners:
            out = F.interpolate(out, half_size, mode="bilinear", align_corners=True)
        else:
            out = F.interpolate(out, scale_factor=2, mode="bilinear", align_corners=False)
        out = self.block_d(out)
        if align_corners:
            out = F.interpolate(out, input.size()[-2:], mode="bilinear", align_corners=True)
        else:
            out = F.interpolate(out, scale_factor=2, mode="bilinear", align_corners=False)
        out = self.block_e(out)
        out = self.out_layer(out)
        return out
# -------------------------- hy add 02 --------------------------
def handle(video_path: str, output_dir: str, type: int, fps: int, device='cpu'):
    _ext = os.path.basename(video_path).strip().split('.')[-1]
    # pick the weights file according to the requested style type
    if type == 1:
        _checkpoint = './weights/paprika.pt'
    elif type == 2:
        _checkpoint = './weights/face_paint_512_v1.pt'
    elif type == 3:
        _checkpoint = './weights/face_paint_512_v2.pt'
    elif type == 4:
        _checkpoint = './weights/celeba_distill.pt'
    else:
        raise Exception('type not support')
    os.makedirs(output_dir, exist_ok=True)
    # extract the audio track from the source video
    _audio = extract(video_path, output_dir, 'wav')
    net = Generator()
    net.load_state_dict(torch.load(_checkpoint, map_location="cpu"))
    net.to(device).eval()
    result = os.path.join(output_dir, '{}.{}'.format(uuid.uuid1().hex, _ext))
    capture = cv2.VideoCapture(video_path)
    size = (int(capture.get(cv2.CAP_PROP_FRAME_WIDTH)), int(capture.get(cv2.CAP_PROP_FRAME_HEIGHT)))
    print(size)
    videoWriter = cv2.VideoWriter(result, cv2.VideoWriter_fourcc(*'mp4v'), fps, size)
    cul = 0
    with torch.no_grad():
        # cartoonize the video frame by frame (OpenCV frames are BGR and are fed to the network as-is)
        while True:
            ret, frame = capture.read()
            if ret:
                print(ret)
                image = to_tensor(frame).unsqueeze(0) * 2 - 1
                out = net(image.to(device), False).cpu()
                out = out.squeeze(0).clip(-1, 1) * 0.5 + 0.5
                out = to_pil_image(out)
                contrast_enhancer = ImageEnhance.Contrast(out)
                img_enhanced_image = contrast_enhancer.enhance(2)
                enhanced_image = np.asarray(img_enhanced_image)
                videoWriter.write(enhanced_image)
                cul += 1
                print('frame {} done'.format(cul))
            else:
                break
    videoWriter.release()
    # merge the original audio back into the new video
    _final_video = video_add_audio(result, _audio, output_dir)
    return _final_video
# -------------------------- hy add 03 --------------------------
def extract(video_path: str, tmp_dir: str, ext: str):
    file_name = '.'.join(os.path.basename(video_path).split('.')[0:-1])
    print('file: {}, extracting audio'.format(file_name))
    if ext == 'mp3':
        return _run_ffmpeg(video_path, os.path.join(tmp_dir, '{}.{}'.format(uuid.uuid1().hex, ext)), 'mp3')
    if ext == 'wav':
        return _run_ffmpeg(video_path, os.path.join(tmp_dir, '{}.{}'.format(uuid.uuid1().hex, ext)), 'wav')


def _run_ffmpeg(video_path: str, audio_path: str, format: str):
    ff = FFmpeg(inputs={video_path: None},
                outputs={audio_path: '-f {} -vn'.format(format)})
    print(ff.cmd)
    ff.run()
    return audio_path
# merge an audio track into a video
def video_add_audio(video_path: str, audio_path: str, output_dir: str):
    _ext_video = os.path.basename(video_path).strip().split('.')[-1]
    _ext_audio = os.path.basename(audio_path).strip().split('.')[-1]
    if _ext_audio not in ['mp3', 'wav']:
        raise Exception('audio format not support')
    _codec = 'copy'
    if _ext_audio == 'wav':
        _codec = 'aac'
    result = os.path.join(
        output_dir, '{}.{}'.format(
            uuid.uuid4(), _ext_video))
    ff = FFmpeg(
        inputs={video_path: None, audio_path: None},
        outputs={result: '-map 0:v -map 1:a -c:v copy -c:a {} -shortest'.format(_codec)})
    print(ff.cmd)
    ff.run()
    return result


if __name__ == '__main__':
    print(handle('samples/video/981.mp4', 'samples/video_result/', 3, 25, 'cuda'))
Code notes
1. The parameters of the main method are: video path, output directory, type, fps (frame rate), and device (cpu by default; pass cuda to use the GPU).
2. The type selects the model; 3 is recommended, as it handles portraits more vividly.
3. Design of the code: first extract the audio track from the video, then cartoonize the video frame by frame and write the frames into a new video, and finally merge the new video with the original audio.
For how to extract audio from a video, see my other article: python 提取視頻中的音頻 (Extracting audio from a video with Python)
For how to merge audio into a video, see my other article: Python 視頻添加音頻 (Adding audio to a video with Python)
4. Temporary files are produced along the way and are not cleaned up; modify the code yourself if needed (a cleanup hint follows the sketch below).
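A minimal calling sketch, assuming the code above is saved as video_cartoon_tool.py; reading the source frame rate with OpenCV is an extra step of mine so the output keeps the original playback speed:

import cv2
import torch

from video_cartoon_tool import handle

video_path = 'samples/video/981.mp4'
# fall back to the CPU when no CUDA device is available
device = 'cuda' if torch.cuda.is_available() else 'cpu'

# read the source fps so the cartoonized video plays at the same speed
capture = cv2.VideoCapture(video_path)
fps = int(round(capture.get(cv2.CAP_PROP_FPS)))
capture.release()

final_video = handle(video_path, 'samples/video_result/', 3, fps, device)
print(final_video)

As for cleanup: handle leaves the intermediate wav file (_audio) and the silent cartoonized video (result) in the output directory; a simple fix is to call os.remove(_audio) and os.remove(result) at the end of handle, after video_add_audio has returned.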
Verify it
Below is a screenshot of the video material I prepared; I will upload it to GitHub.

Execution result

A screenshot of the effect

Not bad at all.
Summary
This time there is actually quite a lot to summarize. First, some issues with the models currently shipped by this open-source project.
1. I tested quite a few images; overall, Asian faces are not cartoonized very well, while Western faces come out noticeably better. The training data is probably still insufficient, which is understandable, since building annotated data specifically for cartoonization sounds like a painful job. So when you use the project, keep an eye on whether newer models have been released.
2. If a video contains subtitles, they get cartoonized too. Consider using material where the video and the subtitles are kept separate; the result will be better.
That concludes this detailed guide to cartoonizing GIF animations and videos with Python. For more on the topic, see the other related articles on 腳本之家.