ML training on a microcontroller

Researchers at MIT have developed programming techniques that enable ML training in less than a quarter of a megabyte of memory, making it suitable for microcontrollers and other edge hardware with limited resources.

The techniques can be used to train machine learning models on a microcontroller in a matter of minutes and enable them to adapt to new data collected by the device’s sensors.

Until now, the problem with implementing such solutions has been that edge devices are often constrained in memory and processing power. At one end of the scale, tiny IoT devices based on microcontrollers may have as little as 256KB of SRAM, which is barely enough for the inference work of some deep learning models, let alone training.

Deep learning training systems such as PyTorch and TensorFlow, by contrast, typically run on clusters of servers with gigabytes of memory at their disposal, and while edge deep learning inference frameworks do exist, some of them lack support for the backpropagation needed to adjust a model's weights.

In contrast, the algorithms and framework the MIT researchers have developed reduce the amount of computation required to train models. A typical deep learning model undergoes hundreds of updates as it learns, and because millions of weights and activations may be involved, training a model requires much more memory than running a pre-trained one.
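
As a rough back-of-the-envelope illustration of that gap (the model and activation sizes below are assumptions for illustration, not figures from the MIT work):

```python
# Illustrative arithmetic only: the sizes below are assumed, not measured.
weights = 1_000_000                        # assume one million weights
inference_bytes = weights * 4              # fp32 weights are roughly all inference holds

# Training additionally keeps a gradient per weight, optimizer state
# (e.g. momentum), and the activations saved for backpropagation.
gradients   = weights * 4
momentum    = weights * 4
activations = 2_000_000 * 4                # assumed activation count

training_bytes = inference_bytes + gradients + momentum + activations
print(f"inference: ~{inference_bytes / 1e6:.0f} MB")   # ~4 MB
print(f"training:  ~{training_bytes / 1e6:.0f} MB")    # ~20 MB
# Either figure dwarfs the 256KB of SRAM on a typical microcontroller.
```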

One of the MIT solutions that makes the training process more efficient is ‘sparse update’, which uses an algorithm to identify only the most important weights to update during each round of training and skips the gradient computation for less important layers and sub-tensors.
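
A minimal PyTorch sketch of the general idea follows; the choice of which layer counts as ‘important’ here is a placeholder for illustration, not the paper's actual selection criterion.

```python
import torch
import torch.nn as nn

# Toy model standing in for a convolutional network.
model = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 10))

# Pretend the selection algorithm decided only the last layer is worth
# updating (a placeholder choice for illustration).
important = {"2.weight", "2.bias"}
for name, param in model.named_parameters():
    param.requires_grad_(name in important)

# Frozen parameters receive no gradients, so the backward pass skips them
# and the activations they would have needed are never stored.
x, y = torch.randn(8, 64), torch.randint(0, 10, (8,))
loss = nn.functional.cross_entropy(model(x), y)
loss.backward()
print([n for n, p in model.named_parameters() if p.grad is not None])
# -> ['2.weight', '2.bias']
```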

The algorithm works by freezing the weights one at a time until it detects accuracy dipping to a set threshold. The remaining weights are then updated, while the activations corresponding to the frozen weights no longer need to be stored.
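
One hedged way to picture that search is the sketch below, where `evaluate()` is a hypothetical helper that briefly trains with the current freeze pattern and returns validation accuracy; it stands in for whatever measurement the real system performs.

```python
def freeze_until_dip(model, evaluate, threshold):
    """Tentatively freeze layers one at a time, stopping at the first
    freeze that drags accuracy below `threshold`. A sketch only:
    `evaluate` is a hypothetical helper, not part of the MIT code."""
    for layer in model.children():
        for p in layer.parameters():
            p.requires_grad_(False)          # tentatively freeze this layer
        if evaluate(model) < threshold:      # accuracy dipped too far:
            for p in layer.parameters():
                p.requires_grad_(True)       # undo the last freeze
            break                            # and stop searching
    # Whatever remains trainable is what gets updated each round, and
    # activations feeding only frozen layers need not be stored.
```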

The second solution reduces the size of the weights using quantization, typically from 32 bits to just 8, cutting the amount of memory needed for both training and inference. The system also changes the order of steps in the training process so that more work is completed in the compilation stage, before the model is deployed on the edge device.
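
A minimal sketch of this kind of 8-bit quantization, assuming a simple symmetric per-tensor scale (the real system's quantization scheme and compile-time graph reordering are more involved):

```python
import numpy as np

def quantize_int8(w):
    """Map float32 weights onto int8 using one per-tensor scale factor."""
    scale = np.abs(w).max() / 127.0                 # largest value -> +/-127
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale             # approximate originals

w = np.random.randn(32, 64).astype(np.float32)
q, scale = quantize_int8(w)
print(f"{w.nbytes} bytes -> {q.nbytes} bytes")      # 8192 -> 2048: a 4x cut
print(f"max error: {np.abs(w - dequantize(q, scale)).max():.4f}")
```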

The final part of the solution is a lightweight training system, the Tiny Training Engine (TTE), that implements these algorithms on a simple microcontroller. According to the paper describing this research, the newly developed framework is the first machine learning solution to enable on-device training of convolutional neural networks with a memory budget of less than 256KB.

The researchers now say they want to apply what they have learned to other machine learning models and types of data, such as language models and time-series data.

The techniques used could “shrink the size of larger models without sacrificing accuracy, and reduce the carbon footprint of training large-scale, machine-learning models.”