
Thursday, 8 August 2019

Neural network inference pipeline for videos in Tensorflow

Just as we saw a huge influx of images in the past decade or so, we are now seeing a lot of videos being produced on social media. The need to understand and moderate videos using machine learning has never been greater.

In this post, I will show you how to build an efficient pipeline to process videos in Tensorflow. For simplicity, let us consider a Resnet50 model pre-trained on Imagenet. Loading it and running inference on a single image is pretty straightforward using tf.keras.applications:
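A minimal sketch of that flow, with input.jpg as a placeholder path:

    import tensorflow as tf

    # Load a Resnet50 model with Imagenet weights.
    model = tf.keras.applications.ResNet50(weights="imagenet")

    # Read an image and resize it to the 224x224 input size Resnet50 expects.
    img = tf.io.read_file("input.jpg")  # placeholder path
    img = tf.image.decode_jpeg(img, channels=3)
    img = tf.image.resize(img, (224, 224))

    # Preprocess (channel reordering and mean subtraction) and add a batch dim.
    batch = tf.keras.applications.resnet50.preprocess_input(img)[tf.newaxis, ...]

    # Run inference and post-process the scores into readable labels.
    preds = model.predict(batch)
    print(tf.keras.applications.resnet50.decode_predictions(preds, top=3))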


Now, let us break it up to see what exactly is happening:
  1. We load the model with weights.
  2. We read an image and resize it to 224x224.
  3. We preprocess the image.
  4. We run inference.
  5. We post-process the predictions.

If we want to do something similar for large videos, we need a pipeline that takes a stream of frames from the video, applies the preprocessing transformations, runs inference on the frames, unravels the results, and applies post-processing.

Doing all of this in sequence, frame by frame, is clearly not the right approach: it is slow and inefficient. To tackle this, we will use tf.data.Dataset and run inference in batches.

First, let's create a generator that can produce frames from a video:
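A minimal sketch, assuming OpenCV (cv2) handles the video decoding:

    import cv2  # assuming OpenCV is used to decode the video

    def frame_generator(video_path):
        """Yield raw RGB frames from a video file, one at a time."""
        cap = cv2.VideoCapture(video_path)
        while True:
            success, frame = cap.read()
            if not success:
                break
            # OpenCV decodes to BGR; convert to RGB before feeding the model.
            yield cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
        cap.release()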


We will use the tf.data.Dataset.from_generator method to create a dataset object out of this.
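Roughly like this, with video.mp4 as a placeholder path (output_types and output_shapes describe the raw uint8 frames the generator yields):

    dataset = tf.data.Dataset.from_generator(
        lambda: frame_generator("video.mp4"),  # placeholder path
        output_types=tf.uint8,
        output_shapes=tf.TensorShape([None, None, 3]),  # height x width x RGB
    )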


Now let us define a function which does the resizing, normalization and other preprocessing steps that are required on a batch of frames. Then, using the batch operation on the dataset created above, create batches of size 64, and map the preprocess method onto each batch in parallel on the CPU, since preprocessing is a CPU-intensive task.
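A sketch of those steps; AUTOTUNE lets tf.data pick the degree of CPU parallelism:

    def preprocess(frames):
        """Resize and normalize a batch of frames for Resnet50."""
        frames = tf.image.resize(frames, (224, 224))
        return tf.keras.applications.resnet50.preprocess_input(frames)

    batched = (
        dataset
        .batch(64)  # group frames into batches of 64
        .map(preprocess, num_parallel_calls=tf.data.experimental.AUTOTUNE)  # parallel on CPU
        .prefetch(1)  # keep one preprocessed batch ready ahead of the GPU
    )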


It is important to make sure that I/O is parallelized as much as possible. For best performance, instructions that are well suited for the CPU should run on the CPU, and the ones suited for the GPU should run on the GPU. Also, if you observe the code above, we are prefetching. This means that before the dataset is consumed, a batch of 64 frames is already preprocessed and ready for consumption. By the time we run inference on one batch of frames, the next batch is ready. This is very important because it ensures that we utilize the CPU, GPU and I/O to their full potential. Here, we are prefetching one unit at a time; for your use case, you may prefetch a different amount. I always run nvidia-smi while tuning the batch size, number of parallel workers, prefetch size, etc., so that both the CPU and GPU are always in use.


Let's put all of this together with running the actual inference:
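For example, something along these lines, using the model and batched dataset defined above:

    import numpy as np

    # Iterate over the batched dataset and run inference one batch at a time.
    all_preds = []
    for batch in batched:
        all_preds.append(model.predict(batch))

    predictions = np.concatenate(all_preds, axis=0)  # one row of scores per frame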


This is all good, but what if you have some post-processing that can be parallelized on the CPU? Doing it inline would keep the GPU idle until it finishes. So, let's make the inference step a generator that is part of the pipeline and feeds a second dataset object.
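A sketch of that two-stage pipeline; the postprocess function here is just a placeholder (an argmax over the class scores):

    def inference_generator():
        """Run the model batch by batch and yield its predictions."""
        for batch in batched:
            yield model.predict(batch)

    predictions_ds = tf.data.Dataset.from_generator(
        inference_generator,
        output_types=tf.float32,
        output_shapes=tf.TensorShape([None, 1000]),  # 1000 Imagenet classes
    )

    def postprocess(preds):
        # Placeholder post-processing: pick the top class per frame.
        return tf.argmax(preds, axis=-1)

    # The map runs on the CPU in parallel while the generator keeps the GPU busy.
    results = predictions_ds.map(
        postprocess, num_parallel_calls=tf.data.experimental.AUTOTUNE
    ).prefetch(1)

    for batch_of_labels in results:
        ...  # write to disk, visualize, etc.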


I have used post-processing to do things like writing to disk, data visualization, generating content, etc., that can be parallelized. This was a simple guide to designing data pipelines for inference in Tensorflow. Here is the code for a slightly more complicated architecture that generates a video output (observe how I pass the original frame through the pipeline).


More resources:
https://www.tensorflow.org/guide/performance/datasets
https://www.tensorflow.org/beta/guide/data