How to do classification using the MAX78000 neural network accelerator

Artificial intelligence (AI) and computer science have grown in stature in recent years, notably in the form of personal technologies. While the convergence of the two is still in its infancy, together they have the potential to revolutionize the lives of consumers and companies. There are challenges, however, especially in practical implementation, because AI requires serious computing power.

Tasks such as simple classification are less complex and can run on a low-power microcontroller. Maxim has introduced the MAX78000, a low-power, energy-efficient microcontroller for processing such AI tasks. Its hardware is built around a convolutional neural network (CNN) accelerator that enables battery-powered applications to execute AI inferences.

Classification on the MAX78000 involves three steps: data preparation, model training, and inference. Each is discussed in detail in the following sections.

Data preparation

In the application development cycle, one of the first steps is to prepare and preprocess the available data to create training and validation/test datasets. In addition to the usual data preprocessing, several hardware constraints have to be considered to run the model on the MAX78000. For training, input data is expected to be in the range [-128/128, +127/128]. When evaluating quantized weights, or when running on hardware, input data is instead expected to be in the native MAX7800X range of [-128, +127]. If the available data range is [0, 255], it needs to be divided by 256 to bring it into the [0, 1] range.

Data sources can contain unprocessed data files in different sizes and formats. The dataset and data loader functions are responsible for handling the necessary conversions. One of the key roles of a data loader is to adjust data ranges and manage data types before the dataset is presented to the CNN model. The TorchVision package
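The range conversions described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the toolchain's own code: the function names to_unit_range, to_hardware_range, and to_training_range are hypothetical, and the sketch assumes that unsigned 8-bit input is centred by subtracting 0.5 after the division by 256.

```python
def to_unit_range(pixel: int) -> float:
    """Map an unsigned 8-bit value in [0, 255] to [0, 1) by dividing by 256."""
    if not 0 <= pixel <= 255:
        raise ValueError("pixel must be in [0, 255]")
    return pixel / 256


def to_hardware_range(pixel: int) -> int:
    """Map [0, 255] to the native signed range [-128, +127].

    Assumption: the data is centred by subtracting 0.5 from the [0, 1) value.
    """
    return round((to_unit_range(pixel) - 0.5) * 256)


def to_training_range(pixel: int) -> float:
    """Map [0, 255] to the training range [-128/128, +127/128]."""
    return to_hardware_range(pixel) / 128
```

With these helpers, the darkest pixel (0) maps to -128 on hardware and -1.0 for training, and the brightest pixel (255) maps to +127 and +127/128 respectively.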

Model creation and training

Convolutional Neural Networks (CNNs) are useful because they can learn position- and scale-independent features from input data. Convolutional kernels are usually small, such as 3x3 or 7x7, which makes them much more memory efficient.
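To make the memory argument concrete, the parameter count of a convolutional layer can be compared with that of a fully connected layer producing an output of the same size. The sketch below is illustrative only; the layer sizes (a 32x32 RGB input mapped to 16 feature maps) are assumptions chosen for the example, not values from the MAX78000 documentation.

```python
def conv2d_params(in_channels: int, out_channels: int,
                  kernel_size: int, bias: bool = True) -> int:
    """Number of trainable parameters in a square 2D convolution layer."""
    weights = kernel_size * kernel_size * in_channels * out_channels
    return weights + (out_channels if bias else 0)


def dense_params(in_features: int, out_features: int, bias: bool = True) -> int:
    """Number of trainable parameters in a fully connected layer."""
    return in_features * out_features + (out_features if bias else 0)


# Assumed example: 32x32 RGB input, 16 output feature maps of the same size.
conv_count = conv2d_params(in_channels=3, out_channels=16, kernel_size=3)
dense_count = dense_params(in_features=32 * 32 * 3, out_features=32 * 32 * 16)
print(conv_count, dense_count)
```

The convolution shares one small kernel across all positions of the image, so its parameter count does not depend on the image size, whereas the fully connected layer grows with both input and output resolution.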

The PyTorch or TensorFlow-Keras toolchain can be used to develop ML models for the MAX78000. The model is built from a set of predefined subclasses that represent the hardware. It is trained using floating-point weights and the training data. Weights can be quantized during training or after training. The result of the quantization can be evaluated on the test dataset to determine the decrease in accuracy caused by quantization.

The MAX78000 synthesizer tool (ai8xize) accepts a PyTorch checkpoint or ONNX files exported from TensorFlow as input, along with a model description in YAML format. A sample input data file (.npy file) is also provided, both for the synthesizer and for verifying the model on the hardware. The inference result for this data file is compared to the expected output of the pre-synthesis model.

The MAX78000 synthesizer automatically generates the C code that runs on the MAX78000. The C code includes Application Programming Interface (API) calls to load the weights and the provided sample data onto the hardware, perform an inference on the sample data, and compare the classification result with the expected result as a pass or fail. This generated C code can be used as an example for building custom applications. The figure below shows the general development flow of the MAX78000.
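The effect of quantizing floating-point weights can be illustrated with a small sketch. It assumes weights in the range [-1, 1) and a simple symmetric power-of-two scaling to signed integers; this is an illustrative scheme with hypothetical function names, and the actual toolchain may quantize differently.

```python
def quantize_weight(weight: float, bits: int = 8) -> int:
    """Scale a float weight to a signed integer and clamp it to the valid range."""
    levels = 2 ** (bits - 1)          # 128 for 8-bit weights
    quantized = round(weight * levels)
    return max(-levels, min(levels - 1, quantized))


def dequantize_weight(quantized: int, bits: int = 8) -> float:
    """Map a quantized integer weight back to its floating-point value."""
    return quantized / 2 ** (bits - 1)


def max_quantization_error(weights, bits: int = 8) -> float:
    """Largest absolute difference between original and quantized weights."""
    return max(
        abs(w - dequantize_weight(quantize_weight(w, bits), bits))
        for w in weights
    )
```

Comparing max_quantization_error for different bit widths gives a rough feel for how much precision is lost, although only an evaluation on the test dataset shows the real impact on accuracy.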
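The pass/fail comparison performed on the sample data can be pictured with the following sketch. It is written in Python purely to illustrate the idea; the generated code itself is C, and the function names here are hypothetical rather than part of any generated API.

```python
def classify(outputs):
    """Return the index of the largest output value (the predicted class)."""
    if not outputs:
        raise ValueError("outputs must not be empty")
    return max(range(len(outputs)), key=lambda i: outputs[i])


def known_answer_test(actual, expected) -> str:
    """Return "PASS" when every output value matches the expected one."""
    if len(actual) != len(expected):
        return "FAIL"
    return "PASS" if all(a == e for a, e in zip(actual, expected)) else "FAIL"
```

In this picture, the expected values come from running the pre-synthesis model on the same sample input, so a mismatch points to a problem in the deployment rather than in the model itself.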

Figure 1: Development flow of the MAX78000

Model testing and inference

Maxim has developed the MAX78000 EV kit for evaluating trained models and demonstrating inference. Let us assume that the model is trained with a face classification dataset. Face identification is done in three main steps:

Face Extraction: Detect faces in an image and extract a rectangular sub-image containing only one face.
Face Alignment: Determine the rotation angle (3D) of the face in the sub-image to compensate for the effects of affine transformations.
Face Identification: Identify the person using the extracted and aligned sub-image.

There are several methods for the first two steps. Multi-task Cascaded Convolutional Networks (MTCNN) can handle the face detection and alignment steps. The MAX78000 EV kit is used to identify faces in uncropped images, each containing only one face.

The adopted approach is based on learning a signature, i.e., an embedding, for each facial image, whose distance to another embedding gives a measure of the similarity between the two faces. Small distances are expected between faces of the same person and large distances between faces of distinct people.
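The idea of identifying a person by embedding distance can be sketched as a nearest-neighbour lookup. This is a minimal sketch under stated assumptions: it uses Euclidean distance, a hypothetical gallery of enrolled embeddings keyed by name, and a hypothetical rejection threshold; the embeddings in practice would come from the CNN model.

```python
import math


def euclidean_distance(a, b) -> float:
    """Euclidean distance between two embeddings of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def identify(query, gallery, threshold):
    """Return the name of the closest enrolled embedding.

    Returns None when even the closest embedding is farther away than
    the threshold, i.e. the face is treated as unknown.
    """
    best_name, best_distance = None, float("inf")
    for name, embedding in gallery.items():
        distance = euclidean_distance(query, embedding)
        if distance < best_distance:
            best_name, best_distance = name, distance
    return best_name if best_distance <= threshold else None
```

For example, with a gallery of {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}, a query of [0.9, 0.1] lies closest to the embedding stored for "alice".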

Figure 2: Inference on MAX78000 EV Kit

FaceNet is one of the most popular CNN-based models developed for the embedding-based face recognition approach. Triplet loss is the key to its success. This loss function requires three input samples: an anchor, a positive sample of the same identity as the anchor, and a negative sample of a different identity. The triplet loss function yields small values when the anchor is close to the positive sample and far from the negative sample.

However, this model has 7.5 million parameters, which is quite large for the MAX78000. It also requires 1.6G floating-point operations, which makes it very difficult to run on many mobile or IoT devices. Therefore, a new model architecture with fewer than 450,000 parameters was designed to fit the MAX78000.

The knowledge distillation method was adopted to develop this small CNN model from FaceNet, as FaceNet is a widely regarded neural network for FaceID applications.

In machine learning, knowledge distillation is the process of transferring knowledge from a large model to a small model. A large model has a higher knowledge capacity than a small one, but this capacity may not be fully utilized. The goal here is therefore to train a small network to match the behavior of the large network.
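A common way to write the triplet loss is max(d(a, p) - d(a, n) + margin, 0), where d is the squared Euclidean distance between embeddings. The sketch below implements that form in plain Python; the default margin of 0.2 and the function names are assumptions made for illustration.

```python
def squared_distance(a, b) -> float:
    """Squared Euclidean distance between two embeddings of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin: float = 0.2) -> float:
    """Triplet loss for a single (anchor, positive, negative) triple.

    The loss is zero once the negative is farther from the anchor than
    the positive by at least the margin (assumed value, see lead-in).
    """
    positive_distance = squared_distance(anchor, positive)
    negative_distance = squared_distance(anchor, negative)
    return max(positive_distance - negative_distance + margin, 0.0)
```

When the positive coincides with the anchor and the negative is far away, the loss is zero; swapping the roles of the positive and negative samples produces a large loss that pushes training to separate the identities.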
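One simple way to make a small network match a large one is to penalize the difference between the embeddings the two networks produce for the same input. The sketch below uses the mean squared error for this purpose. The text does not state which loss was actually used, so this choice and the function name are assumptions made for illustration.

```python
def distillation_loss(student_embedding, teacher_embedding) -> float:
    """Mean squared error between student and teacher embeddings.

    Assumption: both networks output embeddings of the same length,
    and the student is trained to reproduce the teacher's embedding.
    """
    if len(student_embedding) != len(teacher_embedding):
        raise ValueError("embeddings must have the same length")
    if not student_embedding:
        raise ValueError("embeddings must not be empty")
    total = sum(
        (s - t) ** 2 for s, t in zip(student_embedding, teacher_embedding)
    )
    return total / len(student_embedding)
```

Minimizing this value over many training images drives the small network toward the same embedding space as the large network, so that distances between its embeddings carry a similar meaning.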