Tutorial 1: Training your own model with your own dataset

If you followed the quickstart, you saw how easy it is to train with the models and datasets from our vision package by enabling the API. In this tutorial we are going to train a custom model with a custom dataset.

We will show you how to prepare your model and dataset so the NPU library can process them. All you need is a model, some data, and the following 2 lines of code:

model = npu.compile(model, input_shape)
model_trained = npu.train(model, train_data, val_data, loss, optim, batch_size, epochs)

Goals of this tutorial:

  • Learn the basic steps to train your own model and dataset using the NPU API.

  • Prepare the model and dataset for our accelerator cards.

  • Train a simple ConvNet using PyTorch, TensorFlow 2, and MXNet.

Prepare your dataset

The train function expects two types of data: training data and validation data. Each of these subsets must contain the input data (x) and the labels or targets (y). Make sure you have the same number of data samples as labels! There are three data formats the train function understands: dataset as numpy, dataset as id, and global dataset. We already covered global datasets in the quickstart, so we will skip them here.

Dataset as numpy

Use this option when your data is not available on the dashboard. We will use PyTorch's torchvision package to import and prepare our data: we will download FashionMNIST and split it into training and validation subsets.

import torchvision.datasets as dset
import os
from pathlib import Path

# Create folder to save Datasets if it doesn't exist already
CWD = os.getcwd()
DATA_PATH = CWD + '/datasets'
Path(DATA_PATH).mkdir(parents=True, exist_ok=True)

train_ds = dset.FashionMNIST(DATA_PATH, train=True, download=True)

# Get the first 40,000 samples, reshape to [N,C,H,W] and transform to numpy.
x_train = train_ds.data.unsqueeze(1).numpy()[0:40000]
y_train = train_ds.targets.numpy()[0:40000]

# Get the last 20,000 samples, reshape to [N,C,H,W] and transform to numpy.
x_val = train_ds.data.unsqueeze(1).numpy()[40000:]
y_val = train_ds.targets.numpy()[40000:]

# Training and validation data are now ready to be sent over with the API
train_data = (x_train, y_train)
val_data = (x_val, y_val)
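
Since the train function requires the same number of samples and labels, a quick sanity check before sending the data over is worthwhile (the shapes below follow from the 40,000/20,000 split above):

# Sanity check: sample and label counts must match
assert x_train.shape[0] == y_train.shape[0]
assert x_val.shape[0] == y_val.shape[0]
print(x_train.shape)  # (40000, 1, 28, 28)
print(x_val.shape)    # (20000, 1, 28, 28)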

Dataset as id

Use this option when your data is already available on the dashboard or has been previously uploaded as a numpy array. Referencing data could not be simpler: as shown in the video below, copy the id of the data you want to use for training or validation and pass it as a string to the training and validation data variables.

To find your data id in the dashboard, go to Datasets > My Datasets and select the relevant data row to expand the details and show the full id.

# Training and validation data are now ready to be sent over with the API
train_data = '5efc73fs204a837aace59587'
val_data = '5efc740s5284a8f7aade39509'

Prepare your model

We currently support PyTorch, MXNet, and TensorFlow 2 models. There are four ways you can load a model to be compiled by the NPU API: as an object, as a file, by id, and as a global model. We already covered global models in the quickstart, so we will skip them here.

Model as an object

Use this option when you build your model with your favorite library. The resulting model is a Python object that the NPU API can read directly. The same model is shown below in each of the three supported libraries, starting with PyTorch.

import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        nn.Module.__init__(self)
        self.layer1 = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, padding=1),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2))
        self.layer2 = nn.Sequential(
            nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3),
            nn.BatchNorm2d(32),
            nn.ReLU(),
            nn.MaxPool2d(2))
        self.fc1 = nn.Linear(in_features=32*6*6, out_features=10)

    def forward(self, x):
        x = self.layer1(x)
        x = self.layer2(x)
        x = x.flatten(start_dim=1)
        x = self.fc1(x)
        return x

# Initialise your model
model = Net()
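
As a quick optional check, you can push a dummy batch through the network to confirm the output shape (one 28x28 grayscale image in, ten class scores out):

# Optional sanity check with a dummy input
out = model(torch.ones(1, 1, 28, 28))
print(out.shape)  # torch.Size([1, 10])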

The same network in TensorFlow 2:

import tensorflow as tf
from tensorflow.keras import Model, layers

class Net(Model):
    # Set layers.
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = layers.Conv2D(16, kernel_size=3, padding='same')
        self.bn1 = layers.BatchNormalization()
        self.relu = layers.ReLU()
        self.maxp1 = layers.MaxPool2D(2)
        self.conv2 = layers.Conv2D(32, kernel_size=3, padding='same')
        self.bn2 = layers.BatchNormalization()
        self.maxp2 = layers.MaxPool2D(2)
        self.flatten = layers.Flatten()
        self.fc1 = layers.Dense(10)

    # Set forward pass.
    def call(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxp1(x)
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.relu(x)
        x = self.maxp2(x)
        x = self.flatten(x)
        x = self.fc1(x)
        return x

# Initialise your model and run 1 forward pass
model = Net()
model.predict(tf.random.normal((1, 28, 28, 1)))

And the same network in MXNet:

import mxnet
from mxnet import gluon
from mxnet.gluon import nn

class Net(gluon.HybridBlock):
    def __init__(self, **kwargs):
        super(Net, self).__init__(**kwargs)
        self.net1 = nn.HybridSequential()
        self.net1.add(nn.Conv2D(channels=16, kernel_size=3, padding=1),
                nn.BatchNorm(),
                nn.Activation(activation='relu'),
                nn.MaxPool2D(pool_size=2, strides=2))
        self.net2 = nn.HybridSequential()
        self.net2.add(nn.Conv2D(channels=32, kernel_size=3, padding=1),
                nn.BatchNorm(),
                nn.Activation(activation='relu'),
                nn.MaxPool2D(pool_size=2, strides=2))
        self.flat = nn.Flatten()
        self.fc1 = nn.Dense(10)

    def hybrid_forward(self, F, x, *args, **kwargs):
        x = self.net1(x)
        x = self.net2(x)
        return self.fc1(self.flat(x))

# Initialise your model and run 1 forward pass
model = Net()
model.initialize()
model.hybridize()
model(mxnet.nd.ones([1, 1, 28, 28]))

Model as a file

Use this option when you have previously trained a model on your own machine (or anywhere else) and want to continue training it using our NPU API.

Note

For MXNet and TensorFlow, the files containing the model definition and weights must be packaged in a tar archive. For PyTorch, the file can be passed directly in .pt or .pth format.

The following snippets show how to save your model in each framework so it can be loaded by the NPU API.

# PyTorch: save the whole model (dill handles custom classes)
import dill
torch.save(model, "model.pt", pickle_module=dill)

# TensorFlow 2: save in SavedModel format (creates a 'model' directory)
model.save("model")

# MXNet: export the hybridized model to model-symbol.json and model-0001.params
model.export("model", epoch=1)

After saving your TensorFlow or MXNet model, tar the resulting files or folder using the following command on Linux or macOS:

tar -cvf model.tar /path/to/file /path/to/other_file
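
For example, using the file names produced by the save calls above (adjust the paths to your own setup):

# TensorFlow 2: model.save("model") creates a 'model' directory
tar -cvf model.tar model/

# MXNet: model.export("model", epoch=1) creates these two files
tar -cvf model.tar model-symbol.json model-0001.params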

The model variable should now contain the path to your file, as shown below:

model = '/path/to/file/model.tar'

Model as id

Use this option when your model is available on the dashboard or has been previously uploaded. As with the data, referencing a model by id could not be simpler: as shown in the video below, copy the id of the model you want to use and store it in the model variable as a string. We will pass it as an argument to the training function later.

To find your model id in the dashboard, go to Models > My Models and select the relevant model row to expand the details and show the full id.

# Your model is ready to be sent over with the API
model = '5efcc746be822ea6ba6b642c'

Compile and train the model

Everything up to this point is the workflow you are already used to. Here is where the magic happens: simply call compile on the model, then pass the compiled model to npu.train() with the associated parameters.

The NPU will train your model, and you can follow the training progress on our dashboard.

import npu

npu.api(API_TOKEN)

# Compile the model if it is not already on the dashboard.
# input_shape is the shape of a single sample: here one 28x28
# grayscale image.
model = npu.compile(model, input_shape=[1, 28, 28])

model_trained = npu.train(model,
                          train_data=train_data,
                          val_data=val_data,
                          loss=npu.loss.SoftmaxCrossEntropyLoss,
                          optim=npu.optim.SGD(lr=0.01),
                          batch_size=64,
                          epochs=10)