Tutorial 1: Training your own model with your own dataset¶
If you followed the quickstart, you saw how easy it is to train with the models and datasets from our vision package once the API is enabled. In this tutorial we will train a custom model with a custom dataset. We will show you how to prepare your dataset so the NPU library can process it, and we will train the same ConvNet with each of the currently supported AI libraries so you can see the flexibility of the NPU library.
Goal of this tutorial:
Learn the basic steps to train your own model on your own dataset using the NPU API.
Prepare the dataset for our accelerator cards.
Train a simple ConvNet using PyTorch, TensorFlow 2 and MXNet.
Prepare your dataset¶
The train function expects two types of data: training data and validation data. Each of these subsets must contain the data inputs (x) and the labels or targets (y). Make sure you have the same number of data samples as labels! The train function understands three data formats: dataset as numpy, dataset as id and global dataset. Global datasets were already covered in the quickstart, so we will skip them here.
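For instance, with the numpy format each subset is simply a pair of arrays whose first dimensions agree. A minimal sketch (the arrays below are synthetic placeholders, not a real dataset):

```python
import numpy as np

# Synthetic stand-in data: 100 grayscale 28x28 images in [N, C, H, W] layout
x_train = np.random.rand(100, 1, 28, 28).astype(np.float32)
y_train = np.random.randint(0, 10, size=100)

# The number of data samples must match the number of labels
assert x_train.shape[0] == y_train.shape[0]

train_data = (x_train, y_train)
```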
Dataset as numpy¶
Use this option when your data is not available on the dashboard. We will use the PyTorch torchvision package to import and prepare our data: we will download FashionMNIST and split it into training and validation subsets.
```python
import os
from pathlib import Path

import torchvision.datasets as dset

# Create a folder to save datasets if it doesn't exist already
DATA_PATH = os.path.join(os.getcwd(), 'datasets')
Path(DATA_PATH).mkdir(parents=True, exist_ok=True)

train_ds = dset.FashionMNIST(DATA_PATH, train=True, download=True)

# Take the first 40,000 samples, reshape to [N, C, H, W] and convert to numpy
x_train = train_ds.data.unsqueeze(1).numpy()[:40000]
y_train = train_ds.targets.numpy()[:40000]

# Take the remaining 20,000 samples for validation
x_val = train_ds.data.unsqueeze(1).numpy()[40000:]
y_val = train_ds.targets.numpy()[40000:]

# Training and validation data are now ready to be sent over with the API
train_data = (x_train, y_train)
val_data = (x_val, y_val)
```
Dataset as id¶
Use this option when your data is already available on the dashboard or has been previously uploaded as a numpy array. Referencing data couldn't be simpler: copy the id of the data you want to use for training or validation and pass it as a string to the training and validation data variables, as shown below.
To find your data id in the dashboard, go to Datasets > My Datasets and select the relevant data row to expand the details and show the full id.
```python
# Training and validation data are now ready to be sent over with the API
train_data = '5efc73fs204a837aace59587'
val_data = '5efc740s5284a8f7aade39509'
```
Prepare your model¶
We currently support PyTorch, MXNet and TensorFlow 2 models. There are four ways you can load a model to be compiled by the NPU API: as an object, as a file, by id and as a global model. Global models were already covered in the quickstart, so we will skip them here.
Model as an object¶
Use this option when you build your model with your favorite library. The resulting model is a Python object that the NPU API can read directly. The same model can be built in any of the three supported AI libraries.
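As a sketch, here is one way such a model could be built as an object in PyTorch (the layer sizes are illustrative, not necessarily the exact ConvNet used in this tutorial):

```python
import torch.nn as nn

# A small ConvNet for 28x28 grayscale inputs; layer sizes are illustrative
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),            # 28x28 -> 14x14
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),            # 14x14 -> 7x7
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 10),  # 10 FashionMNIST classes
)
```

The equivalent TensorFlow 2 or MXNet model works the same way: build the object as usual with your library of choice and pass it straight to the NPU API.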
Model as a file¶
Use this option when you have previously trained a model on your own machine (or anywhere else) and want to continue training with our NPU API.
For MXNet and TensorFlow, the files containing the weights and model definition must be packaged in a tar archive. For PyTorch, the file can be supplied directly in .pt or .pth format.
The following snippets show you how to save your models correctly to be loaded by the NPU API.
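For example, a PyTorch model can be saved straight to a .pt file (the model and filename below are placeholders):

```python
import torch
import torch.nn as nn

# Placeholder model; any trained PyTorch model is saved the same way
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))

# PyTorch models can be handed to the API directly as a .pt / .pth file
torch.save(model, 'model.pt')
```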
After saving your TensorFlow or MXNet model, tar the resulting files or folder using the following command on Linux or macOS:
```shell
tar -cvf model.tar /path/to/file /path/to/other_file
```
The model variable should now contain the path to your archive, as shown below:
```python
model = '/path/to/file/model.tar'
```
Model as id¶
Use this option when your model is available on the dashboard or has been previously uploaded. As with data, referencing a model by id couldn't be simpler: copy the id of the model you want to use and store it as a string in the model variable. We will pass it as an argument to the training function later.
To find your model id in the dashboard, go to Models > My Models and select the relevant model row to expand the details and show the full id.
```python
# Your model is ready to be sent over with the API
model = '5efcc746be822ea6ba6b642c'
```
Compile and train the model¶
All of the workflow up to this point should be familiar. Here is where the magic happens: you simply call npu.compile on the model and pass the compiled model to npu.train() with the associated parameters. The NPU will train your model, and you can follow the training progress on our dashboard.
```python
import npu

npu.api(API_TOKEN)

# Compile the model if it is not already in the Dashboard
model = npu.compile(model, input_shape=[1, 28, 28])

model_trained = npu.train(model,
                          train_data=train_data,
                          val_data=val_data,
                          loss=npu.loss.SoftmaxCrossEntropyLoss,
                          optim=npu.optim.SGD(lr=0.01),
                          batch_size=64,
                          epochs=10)
```