如何上线部署Pytorch深度学习模型到生产环境中

这期内容当中小编将会给大家带来有关如何上线部署Pytorch深度学习模型到生产环境中，文章内容丰富且以专业的角度为大家分析和叙述，阅读完这篇文章希望大家可以有所收获。

创新互联建站-成都网站建设公司，专注成都网站设计、成都网站制作、外贸网站建设、网站营销推广，域名与空间，虚拟空间，网站改版维护有关企业网站制作方案、改版、费用等问题，请联系创新互联建站。

Pytorch模型部署准备

Pytorch和TensorFlow是目前使用最广泛的两种深度学习框架，在上一篇文章《自动部署深度神经网络模型TensorFlow（Keras）到生产环境中》中我们介绍了如何通过AutoDeployAI的AI模型部署和管理系统DaaS（Deployment-as-a-Service）来自动部署TensorFlow模型，本篇我们将介绍如果通过DaaS来自动部署Pytorch深度神经网络模型，同样我们需要：

安装Python DaaS-Client
初始化DaasClient
创建项目

完整的代码，请参考Github上的Notebook：deploy-pytorch.ipynb

Pytorch自定义运行时

DaaS是基于Kubernetes的AI模型自动部署系统，模型运行在Docker Container中，在DaaS中被称为运行时（Runtime），有两类不同的运行时，分别为网络服务运行环境（Environment）和任务运行环境（Worker）。Environment用于创建网络服务（Web Service），而Worker用于执行任务（Job）的部署，比如模型评估和批量预测等。DaaS默认自带了四套运行时，分别针对Environment和Worker基于不同语言Python2.7和Python3.7，自带了大部分常用的机器学习和深度学习类库，但是因为Docker镜像（Image）大小的缘故，暂时没有包含Pytorch库。

DaaS提供了自定义运行时功能，允许用户把自定义Docker镜像注册为Runtime，满足用户使用不同模型类型，模型版本的定制需求。下面，我们以部署Pytorch模型为例，详细介绍如何创建自定义运行时:

1. 构建Docker镜像：

一般来说，有两种方式创建Image，一种是通过Dockerfile构建（docker build），一种是通过Container生成（docker commit），这里我们使用第一种方式。无论那一种方式，都需要选定一个基础镜像，这里为了方便构建，我们选择了Pytorch官方镜像pytorch/pytorch:1.5.1-cuda10.1-cudnn7-runtime。

为了创建网络服务运行时，除了包含模型运行的依赖类库外，还需要额外安装网络服务的一些基础库，完整的列表请参考requirements-service.txt。下载requirements-service.txt文件到当前目录，创建Dockerfile：

FROM pytorch/pytorch:1.5.1-cuda10.1-cudnn7-runtime

RUN mkdir -p /daas
WORKDIR /daas

COPY requirements-service.txt /daas

RUN pip install -r requirements-service.txt && rm -rf /root/.cache/pip

构建Image：

docker build -f Dockerfile -t pytorch:1.0 .

2. 推送Docker镜像到Kubernetes中：

构建好的Docker镜像必须推送到安装DaaS的Kubernetes环境能访问的地方，不同的Kubernetes环境有不同的Docker镜像访问机制，比如本地镜像，私有或者公有镜像注册表（Image Registry）。下面以Daas-MicroK8s为例，它使用的是MicroK8s本地镜像缓存（Local Images Cache）：

docker save pytorch:1.0 > pytorch.tar
microk8s ctr image import pytorch.tar

3. 创建Pytorch运行时：

登陆DaaS Web页面后，点击顶部菜单环境 / 运行时定义，下面页面会列出所有的有效运行时，可以看到DaaS自带的四种运行时：

如何上线部署Pytorch深度学习模型到生产环境中

点击按钮创建运行时，创建基于pytorch:1.0镜像的Environment运行时:

如何上线部署Pytorch深度学习模型到生产环境中

默认部署Pytorch模型

训练Pytorch模型。

使用torchvision中的MNIST数据来识别用户输入的数字，以下代码参考官方实例：Image classification (MNIST) using Convnets。

首先，定义一个无参函数返回用户定义模型类（继承自torch.nn.Module）的一个实例，函数中包含所有的依赖，可以独立运行，也就是说包含引入的第三方库，定义的类、函数或者变量等等。这是能自动部署Pytorch模型的关键。

# Define a function to create an instance of the Net class
def create_net():
    import torch
    import torch.nn as nn  # PyTorch's module wrapper
    import torch.nn.functional as F

    class Net(nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            self.conv1 = nn.Conv2d(1, 32, 3, 1)
            self.conv2 = nn.Conv2d(32, 64, 3, 1)
            self.dropout1 = nn.Dropout2d(0.25)
            self.dropout2 = nn.Dropout2d(0.5)
            self.fc1 = nn.Linear(9216, 128)
            self.fc2 = nn.Linear(128, 10)

        def forward(self, x):
            x = self.conv1(x)
            x = F.relu(x)
            x = self.conv2(x)
            x = F.relu(x)
            x = F.max_pool2d(x, 2)
            x = self.dropout1(x)
            x = torch.flatten(x, 1)
            x = self.fc1(x)
            x = F.relu(x)
            x = self.dropout2(x)
            x = self.fc2(x)
            output = F.log_softmax(x, dim=1)
            return output
    return Net()

为了快速训练出模型，修改epochs=3

import torch
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.optim.lr_scheduler import StepLR

def train(model, device, train_loader, optimizer, epoch, log_interval):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % log_interval == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.item()))

def test(model, device, test_loader):
    model.eval()
    test_loss = 0
    correct = 0
    with torch.no_grad():
        for data, target in test_loader:
            data, target = data.to(device), target.to(device)
            output = model(data)
            test_loss += F.nll_loss(output, target, reduction='sum').item()  # sum up batch loss
            pred = output.argmax(dim=1, keepdim=True)  # get the index of the max log-probability
            correct += pred.eq(target.view_as(pred)).sum().item()

    test_loss /= len(test_loader.dataset)

    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))

use_cuda = torch.cuda.is_available()
batch_size = 64
test_batch_size = 1000
seed = 1234567
lr = 1.0
gamma = 0.7
log_interval = 10
epochs = 3

torch.manual_seed(seed)
device = torch.device("cuda" if use_cuda else "cpu")

kwargs = {
   'batch_size': batch_size}
if use_cuda:
    kwargs.update({
   'num_workers': 1,
                   'pin_memory': True,
                   'shuffle': True},
                  )

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.1307,), (0.3081,))
    ])
dataset1 = datasets.MNIST('./data', train=True, download=True, transform=transform)
dataset2 = datasets.MNIST('./data', train=False, transform=transform)
train_loader = torch.utils.data.DataLoader(dataset1, **kwargs)
test_loader = torch.utils.data.DataLoader(dataset2, **kwargs)

model = create_net().to(device)
optimizer = optim.Adadelta(model.parameters(), lr=lr)

scheduler = StepLR(optimizer, step_size=1, gamma=gamma)

for epoch in range(1, epochs + 1):
    train(model, device, train_loader, optimizer, epoch, log_interval)
    test(model, device, test_loader)
    scheduler.step()

发布Pytorch模型

模型训练成功后，通过客户端publish函数，发布模型到DaaS服务器端。通过设置测试数据集x_test和y_test，DaaS会自动侦测模型输入数据格式（类型和维数），挖掘模式（分类或者回归），评估模型，并且自动存储x_test中的第一行数据作为样例数据，以方便模型测试使用。参数source_object指定为上面定义的create_net函数，该函数代码会被自动存储到DaaS系统中。

batch_idx, (x_test, y_test) = next(enumerate(test_loader))

# Publish the built model into DaaS
publish_resp = client.publish(model,
                              name='pytorch-mnist',
                              x_test=x_test,
                              y_test=y_test,
                              source_object=create_net,
                              description='A Pytorch MNIST classification model')
pprint(publish_resp)

结果如下：

{
   'model_name': 'pytorch-mnist', 'model_version': '1'}

测试Pytorch模型

调用test函数，指定runtime为之前创建的pytorch：

test_resp = client.test(publish_resp['model_name'], 
                        model_version=publish_resp['model_version'],
                        runtime='pytorch')
pprint(test_resp)

返回值test_resp是一个字典类型的结果，记录了测试API信息，如下：

The runtime "pytorch" is starting
Waiting for it becomes available... 

{'access_token': 'eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiJ9.eyJ1aWQiOjEwMDAsInVzZXJuYW1lIjoiYWRtaW4iLCJyb2xlIjoiYWRtaW4iLCJleHAiOjE1OTYwNzkyNzksImlhdCI6MTU5NjAzMjQ3OX0.kLO5R-yiTY6xOo14sAxZGwetQqiq5hDfPs5WZ7epSkDWKeDvyLkVP4VzWQxxlPyUX6SgGeCx0pq-of6SYVLPcOmR54a6W7b4ZfKgllKrssdMqaStclv0S2OFHeVXDIoy4cyoB99MjNaXOc6FCbNB4rae0ufu-eZLLYGlHbvV_c3mJtIIBvMZvonU1WCz6KDU2fEyDOt4hXsqzW4k7IvhyDP2geHWrkk0Jqcob8qag4qCYrNHLWRs8RJXBVXJ1Y9Z5PdhP6CGwt5Qtyf017s7L_BQW3_V9Wq-_qv3_TwcWEyCBTQ45RcCLoqzA-dlCbYgd8seurnI3HlYJZPOcrVY5w',
 'endpoint_url': 'https://192.168.64.7/api/v1/test/deployment-test/pytorch/test',
 'payload': {'args': {'X': [{'tensor_input': [[[[...], [...], ...]]]}],
                      'model_name': 'pytorch-mnist',
                      'model_version': '1'}}}

tensor_input是一个维数为(1, 1, 28, 28)的嵌套数组，以上未列出完整的数据值。

使用requests库调用测试API：

response = requests.post(test_resp['endpoint_url'],
                         headers={'Authorization': 'Bearer {token}'.format(token=test_resp['access_token'])},
                         json=test_resp['payload'],
                         verify=False)
pprint(response.json())

返回结果：

{'result': [{'tensor_output': [[-21.444242477416992,
                                -20.39040756225586,
                                -17.134702682495117,
                                -16.960391998291016,
                                -20.394105911254883,
                                -22.380189895629883,
                                -29.211040496826172,
                                -1.311301275563892e-06,
                                -20.16324234008789,
                                -13.592040061950684]]}],
 'stderr': [],
 'stdout': []}

测试结果除了预测值，还包括标准输出和标准错误输出的日志信息，方便用户的查看和调试。

验证测试结果

把预测结果与本地模型结果进行比较：

import numpy as np

desired = model(x_test[[0]]).detach().numpy()
actual = response.json()['result'][0]['tensor_output']
np.testing.assert_almost_equal(actual, desired)

正式部署Pytorch模型

测试成功后，可以进行正式的模型部署。与测试API test 类似，同样需要指定runtime为之前创建的pytorch。为了提升部署的性能和稳定性，可以为运行环境指定CPU核数、内存大小以及部署副本数，这些都可以通过 deploy 函数参数设定。

deploy_resp = client.deploy(model_name=publish_resp['model_name'], 
                            deployment_name=publish_resp['model_name'] + '-svc',
                            model_version=publish_resp['model_version'],
                            runtime='pytorch')
pprint(deploy_resp)

返回结果：

The deployment "pytorch-mnist-svc" created successfully
Waiting for it becomes available... 

{'access_token': 'eyJ0eXAiOiJKV1QiLCJhbGciOiJSUzI1NiJ9.eyJ1aWQiOjEwMDAsInVzZXJuYW1lIjoiYWRtaW4iLCJyb2xlIjoiYWRtaW4iLCJwcm9qZWN0TmFtZSI6Ilx1OTBlOFx1N2Y3Mlx1NmQ0Ylx1OGJkNSIsInByb2plY3RMYWJlbCI6ImRlcGxveW1lbnQtdGVzdCIsImlhdCI6MTU5NjAyODU2N30.iBGyYxCjD5mB_o2IbMkSKRlx9YVvfE3Ih-6LOE-cmp9VoDde-t3JLcDdS3Fg7vyVSIbre6XmYDQ_6IDjzy8XEOzxuxxdhwFPnW8Si1P-fbln5HkPhbDukImShM5ZAcfmD6fNWbz2S0JIgs8rM15d1WKGTC3n9yaXiVumWV1lTKImhl1tBF4ay_6YdCqKmLsrLX6UqbcZA5ZTqHaAG76xgK9vSo1aOOstKLTcloEkswpuMtkYo6ByouLznqQ_yklAYTthdrKX623OJdO3__DOkULq8E-am_c6R7FtyRvYwr4O5BKeHjKCxY6pHmc6PI4Yyyd_TJUTbNPX9fPxhZ4CRg',
 'endpoint_url': 'https://192.168.64.7/api/v1/svc/deployment-test/pytorch-mnist-svc/predict',
 'payload': {'args': {'X': [{'tensor_input': [[[[...],[...],...]]]}]}}}

使用requests库调用正式API：

response = requests.post(deploy_resp['endpoint_url'],
                         headers={'Authorization': 'Bearer {token}'.format(token=deploy_resp['access_token'])},
                         json=deploy_resp['payload'],
                         verify=False)
pprint(response.json())

结果如下：

{'result': [{'tensor_output': [[-21.444242477416992,
                                -20.39040756225586,
                                -17.134702682495117,
                                -16.960391998291016,
                                -20.394105911254883,
                                -22.380189895629883,
                                -29.211040496826172,
                                -1.311301275563892e-06,
                                -20.16324234008789,
                                -13.592040061950684]]}]}

正式部署结果和测试结果是相同的，除了通过DaaS-Client客户端程序，模型测试和模型部署，也可以在DaaS Web客户端完成，这里就不再赘述。

自定义部署Pytorch模型

在上面的默认模型部署中，我们看到模型的输入数据是维数为(, 1, 28, 28)的张量（Tensor），输出结果是(, 10)的张量，客户端调用部署REST API时，必须进行数据预处理和结果后处理，包括读取图像文件，转换成需要的张量格式，并且调用和模型训练相同的数据变换，比如上面的归一化操作（Normalize），最后通过张量结果计算出最终识别出的数字。

为了减轻客户端的负担，我们希望这些操作都能在部署服务器端完成，客户端直接输入图像，服务器端直接返回最终的识别数字。在DaaS中，可以通过模型自定义部署功能来满足以上需求，它允许用户自由添加任意的数据预处理和后处理操作，下面我们详细介绍如何自定义部署上面的Pytorch模型。

登陆DaaS Web客户端，查看pytorch-mnist模型信息：

如何上线部署Pytorch深度学习模型到生产环境中

切换到实时预测标签页，点击命令生成自定义实时预测脚本，生成预定义脚本：

如何上线部署Pytorch深度学习模型到生产环境中

我们看到函数create_net内容会被自动写入到生成的预测脚本中，点击命令高级设置，选择网络服务运行环境为pytorch：

如何上线部署Pytorch深度学习模型到生产环境中

点击作为API测试命令，页面切换到测试页面，修改preprocess_files函数，引入模型训练时的图像处理操作：

def preprocess_files(args):
    """preprocess the uploaded files"""
    files = args.get('files')
    if files is not None:
        # get the first record object in X if it's present
        if 'X' in args:
            record = args['X'][0]
        else:
            record = {}
            args['X'] = [record]
        
        from torchvision import transforms
        transform = transforms.Compose([
            transforms.ToTensor(),
            transforms.Normalize((0.1307,), (0.3081,))
            ])

        import numpy as np
        from PIL import Image
        for key, file in files.items():
            img = Image.open(file)
            normed = transform(img)
            record[key] = normed.numpy()

    return args

完成后，输入函数名predict，选择请求正文基于表单，输入名称tensor_input，选择文件，点击上传测试图像test.png（该图像为上面测试使用的数据），点击提交，右侧响应页面将会显示预测结果：

如何上线部署Pytorch深度学习模型到生产环境中

可以看到，结果与默认部署输出相同。继续修改postprocess函数为：

def postprocess(result):
    """postprocess the predicted results"""
    import numpy as np
    return [int(np.argmax(np.array(result).squeeze(), axis=0))]

重新提交，右侧响应页面显示结果为：

如何上线部署Pytorch深度学习模型到生产环境中

测试完成后，可以创建正式的部署，切换到部署标签页，点击命令添加网络服务，输入服务名称pytorch-mnist-custom-svc，网络服务运行环境选择pytorch，其他使用默认选项，点击创建。进入到部署页面后，点击测试标签页，该界面类似之前的脚本测试界面，输入函数名predict，请求正文选择基于表单，输入名称tensor_input，类型选择文件，点击上传测试的图片后，点击提交：

如何上线部署Pytorch深度学习模型到生产环境中

到此，正式部署已经测试和创建完成，用户可以使用任意的客户端程序调用该部署服务。点击以上界面中的生成代码命令，显示如何通过curl命令调用该服务，测试如下：

如何上线部署Pytorch深度学习模型到生产环境中

通过ONNX部署Pytorch模型

除了通过以上的原生部署，Pytorch库本身支持导出ONNX格式，所以通过ONNX来部署Pytorch模型是另一个选择，ONNX部署的优势是模型部署不再需要依赖Pytorch库，也就是不需要创建上面的pytorch运行时。可以使用DaaS默认自带的运行时Python 3.7 - Function as a Service，它包含了ONNX Runtime CPU版本用于支持ONNX模型预测。

转换Pytorch模型到ONNX：

# Export the model
torch.onnx.export(model,                     # model being run
                  x_test[[0]],               # model input (or a tuple for multiple inputs)
                  'mnist.onnx',              # where to save the model (can be a file or file-like object)
                  export_params=True,        # store the trained parameter weights inside the model file
                  opset_version=10,          # the ONNX version to export the model to
                  do_constant_folding=True,  # whether to execute constant folding for optimization
                  input_names = ['tensor_input'],   # the model's input names
                  output_names = ['tensor_output'], # the model's output names
                  dynamic_axes={
   'input' : {
   0 : 'batch_size'},    # variable lenght axes
                                'output' : {
   0 : 'batch_size'}}
                  )

发布ONNX模型：

publish_resp = client.publish('mnist.onnx',
                              name='pytorch-mnist-onnx',
                              x_test=x_test,
                              y_test=y_test,
                              description='A Pytorch MNIST classification model in ONNX')
pprint(publish_resp)

结果如下：

{
   'model_name': 'pytorch-mnist-onnx', 'model_version': '1'}

测试ONNX模型

上面，我们通过客户端的test函数来进行模型测试，这里我们使用另一个方式，在DaaS Web页面中测试模型。登陆DaaS Web客户端，进入pytorch-mnist-onnx模型页面，切换到测试标签页，我们看到DaaS自动存储了一条测试数据，点击提交命令，测试该条数据，如图：

如何上线部署Pytorch深度学习模型到生产环境中

我们看到，该ONNX模型和原生Pytorch模型测试结果是一致的。

默认部署和自定义部署ONNX模型

关于在DaaS Web界面中如何为ONNX模型创建默认部署和自定义部署，请参考文章《使用ONNX部署深度学习和传统机器学习模型》，流程相同，就不再这里赘述。

试用DaaS(Deployment-as-a-Service)

本文中，我们介绍了在DaaS中如何原生部署Pytorch模型，整个流程非常简单，对于默认部署，只是简单调用几个API就可以完成模型的部署，而对于自定义部署，DaaS提供了方便的测试界面，可以随时程序修改脚本进行测试，调试成功后再创建正式部署。在现实的部署中，为了获取更高的预测性能，用户需要更多的修改自定义预测脚本，比如更优的数据处理，使用GPU等。DaaS提供了简单易用的部署框架允许用户自由的定制和扩展。

如果您想体验DaaS模型自动部署系统，或者通过我们的云端SaaS服务，或者本地部署，请发送邮件到 autodeploy.ai#outlook.com（# 替换为 @），并说明一下您的模型部署需求。

上述就是小编为大家分享的如何上线部署Pytorch深度学习模型到生产环境中了，如果刚好有类似的疑惑，不妨参照上述分析进行理解。如果想知道更多相关知识，欢迎关注创新互联行业资讯频道。

当前文章：如何上线部署Pytorch深度学习模型到生产环境中
转载源于：http://scyanting.com/article/pgsihj.html

如何上线部署Pytorch深度学习模型到生产环境中

Pytorch模型部署准备

Pytorch自定义运行时

1. 构建Docker镜像：

2. 推送Docker镜像到Kubernetes中：

3. 创建Pytorch运行时：

默认部署Pytorch模型

训练Pytorch模型。

发布Pytorch模型

测试Pytorch模型

验证测试结果

正式部署Pytorch模型

自定义部署Pytorch模型

通过ONNX部署Pytorch模型

转换Pytorch模型到ONNX：

发布ONNX模型：

测试ONNX模型

默认部署和自定义部署ONNX模型

试用DaaS(Deployment-as-a-Service)

其他资讯