Tutorial 3: Asynchronous Training and Callbacks

Beyond the basics of training and prediction, our API introduces two major features: asynchronous training/prediction and callbacks.

Asynchronous functionality

We will fine-tune a pretrained resnet18 model from our global models, first launching several training tasks concurrently and then chaining trainings together.

import npu
import npu.vision.models as models
import npu.vision.datasets as dset

npu.api(API_TOKEN)

# Samples used for training and validation
SAMPLES = 5000
# Validation samples starting point
VAL = 30000
for i in range(5):
    min_range = i * SAMPLES
    max_range = (i + 1) * SAMPLES
    trained_model = npu.train(models.resnet18(pretrained=True),
                              train_data=dset.CIFAR10[min_range:max_range],
                              val_data=dset.CIFAR10[min_range+VAL:max_range+VAL],
                              loss=npu.loss.SparseCrossEntropyLoss,
                              optim=npu.optim.SGD(lr=0.01),
                              batch_size=128,
                              epochs=3,
                              asynchronous=True)

Out:

Token successfully authenticated
Started training. View status at https://dashboard.neuro-ai.co.uk/tasks?task_id=5ee7982236bbaecaba3d6a10
Started training. View status at https://dashboard.neuro-ai.co.uk/tasks?task_id=5ee7982236bbaecaba3d6a11
Started training. View status at https://dashboard.neuro-ai.co.uk/tasks?task_id=5ee7982336bbaecaba3d6a12
Started training. View status at https://dashboard.neuro-ai.co.uk/tasks?task_id=5ee7982336bbaecaba3d6a13
Started training. View status at https://dashboard.neuro-ai.co.uk/tasks?task_id=5ee7982336bbaecaba3d6a14

On the dashboard we can see that all of our training tasks are running concurrently. This means we can test a variety of datasets and hyperparameters without having to wait for each run to finish before trying something different. Below, each of the 5 training tasks is running.

../_images/multi.png
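For reference, the index arithmetic in the loop above carves the dataset into disjoint windows; this framework-free sketch (plain Python, no npu calls) shows the exact train/validation ranges each of the 5 tasks receives:

```python
# Reproduce the slicing arithmetic from the training loop:
# each iteration takes a fresh 5000-sample training window, and a
# validation window offset by 30000 samples.
SAMPLES = 5000  # samples per training window
VAL = 30000     # validation starting offset

windows = []
for i in range(5):
    min_range = i * SAMPLES
    max_range = (i + 1) * SAMPLES
    windows.append({
        "train": (min_range, max_range),
        "val": (min_range + VAL, max_range + VAL),
    })

print(windows[0])  # {'train': (0, 5000), 'val': (30000, 35000)}
print(windows[4])  # {'train': (20000, 25000), 'val': (50000, 55000)}
```

Because the windows never overlap, each concurrent task trains and validates on distinct samples.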

We can also chain previously trained models in a similarly asynchronous manner. The first training uses the resnet18 global model; each subsequent training uses the model produced by the previous one, and so on.

model = models.resnet18(pretrained=True)
for i in range(5):
    model = npu.train(model,
                      train_data=dset.CIFAR10.train,
                      val_data=dset.CIFAR10.val,
                      batch_size=128,
                      epochs=3,
                      asynchronous=True)
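The essence of this loop is a data dependency: each call consumes the model returned by the previous call. A minimal sketch of that pattern in plain Python, with a hypothetical `train_step` standing in for `npu.train` (so it runs without the npu service):

```python
# `train_step` stands in for npu.train: it takes the current model and
# returns a trained one, so each iteration builds on the previous result.
def train_step(model, round_no):
    # Hypothetical stand-in: append the round number so the chain is visible.
    return model + [round_no]

model = []  # stands in for models.resnet18(pretrained=True)
for i in range(5):
    model = train_step(model, i)

print(model)  # [0, 1, 2, 3, 4] -- each round built on the last
```

With `asynchronous=True`, npu can resolve this dependency chain for you: each task is queued to start as soon as the model it depends on is ready, without blocking your script.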

Callbacks

Callbacks allow you to run your own function once training has been completed.

def say_hi(args):
    print("Hello World!")

model = npu.train(model,
                  train_data=dset.CIFAR10.train,
                  val_data=dset.CIFAR10.val,
                  loss=npu.loss.SparseCrossEntropyLoss,
                  optim=npu.optim.SGD(lr=0.01),
                  batch_size=128,
                  epochs=2,
                  asynchronous=True,
                  callback=say_hi)

This will print Hello World! as soon as training completes. Callbacks are powerful: they let you access variables and metrics from your training task and change behaviour accordingly.
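As a sketch of a metrics-aware callback: the exact payload npu passes to the callback is not shown in this tutorial, so here we assume a dict of training results, and the keys (`task_id`, `val_accuracy`) are illustrative rather than the real schema:

```python
# Hypothetical callback that reacts to training metrics. The argument
# structure (a dict with "task_id" and "val_accuracy") is an assumption
# for illustration, not the documented npu payload.
def on_train_done(result):
    acc = result.get("val_accuracy")
    if acc is not None and acc < 0.5:
        msg = (f"Task {result.get('task_id')}: accuracy {acc:.2f} is low; "
               "consider more epochs or a different learning rate.")
    else:
        msg = f"Task {result.get('task_id')} finished (accuracy={acc})."
    print(msg)
    return msg

# Simulated invocation, as the platform would do after training:
on_train_done({"task_id": "5ee79822", "val_accuracy": 0.42})
```

The same pattern scales naturally: a callback could log metrics to your own tracking system, or kick off a follow-up training run when results fall below a threshold.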