Thursday 8 August 2019

    Neural network inference pipeline for videos in Tensorflow

    Just as we saw a huge influx of images in the past decade or so, we are now seeing a lot of videos being produced on social media. The need to understand and moderate videos using machine learning has never been greater.

    In this post, I will show you how to build an efficient pipeline to processes videos in Tensorflow.  For simplicity, let us consider a Resnet50 model pre-trained on Imagenet. Pretty straightforward, using tf.keras.applications 

    Now, let us break it up to see what exactly is happening:
    1. We are loading the model with weights.
    2. We are reading an image and resizing it to 224x224.
    3. Do some preprocessing of the image.
    4. Run inference.
    5. Do some post processing. 

    If we want to do something similar for large videos, we need to have a pipeline that takes a stream of frames from the video, applies preprocess transformations, run inference of frames, unravel the inferences and apply post processing.

    We can see that doing all these in a sequence - frame by frame is clearly not the right thing as it is slow and inefficient. In order to tackle this, we will use and run inference in batch.

    First, lets create a generator that can produce frames from a video:

    We will use the method to create a dataset object out of this.

    Now let us define a function which does resizing, normalization and other preprocessing steps that are required on a batch of frames. Then, using the batch operation on the dataset created above, create a batch of size 64. Map the preprocess method that we defined onto the batch in parallel on CPU as it is a CPU intensive task.

    It is important to make sure that I/O is parallelized as much as possible. For best performance, instructions that are well suited for CPU should run on CPU and the ones suited for GPU should run on GPU. Also, If you observe the code above, we are prefetching. What this means is that, before consuming the dataset, a batch of 64 frames are preprocessed and is ready for consumption. By the time we run inference on a batch of frames, the next batch is ready for consumption. This is very important because, it ensures that we utilize CPU, GPU, I/O at its highest potential. Here, we are prefetching one unit at a time; and for your usecase, you may prefetch a different size. I always run nvidia-smi to tune the batch size, number of workers, prefetching etc so that both CPU and GPU are always in use in my job.

    Let's put all of this together with running the actual inference:

    This is all good, but what if you have some post processing that can be parallelized on CPU? That will keep the GPU idle till it is processed. So, let's make the inference step a generator that is part of the pipeline and feeds to a second dataset object.

    I have used post processing to do things like writing to disk, data visualization, generating content etc that can be parallelized. This was a simple guide to design data pipelines for inference in Tensorflow. Here is the code to a sightly complicated architecture that generates a video output (observe how I pass on the original frame through the pipeline).

    More resources: