Posts

    Monday 30 September 2024

    Magic Cauldron: Harry Potter Themed Gender Reveal Party

    Earlier this year, we had a fun-filled Harry Potter themed gender reveal party. For the reveal, I built a Magic Cauldron which would reveal the gender. Check it out for yourself!

    For this I needed:

    1. A Cauldron.
    2. WS2812B LED array.
    3. Arduino UNO.
    4. Bread board and jumper wires.
    5. Dry ice.
    6. Kasa Smart bulbs.

    I will go over the build in the following sections:
    1. The Mist.
    2. Serial Bus.
    3. LED orchestration.
    4. Room effect.

    The Mist

    Dry ice is solid carbon dioxide at a very low temperature of -78.5°C. When placed in room-temperature water, it rapidly sublimates to create the fog/mist effect. The hotter the water, the denser the mist and the quicker the effect occurs, so it is important to strike a balance: I preferred to heat the water in the microwave for 10 seconds before I began. To prevent the mist from spreading too far, place the water in a smaller bowl inside the cauldron.

    Make sure to get the dry ice on the day of the event if you don't have the means to store it. Even when stored in the freezer, you will lose a lot of it through sublimation. Always handle dry ice with care: never touch it with bare hands, as it can cause instant frostbite, and keep it out of reach of children.

    Serial Bus

    The serial bus acts as a communication channel between my computer and the Arduino via a wired connection. 
    We didn’t know the baby’s gender, and neither did anyone else. The plan was to have a trusted person open the envelope and provide the answer to start the reveal!
    Enter mode: b: boy, g: girl, d: dry run: 

    I used the serial bus to relay this information to the Arduino.
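
    A minimal sketch of the computer side of this exchange, assuming a one-byte protocol in which the letter typed at the prompt is sent as-is. The name `send_mode` and the single-byte format are illustrative assumptions, not the actual protocol; `port` can be any object with a `write` method, for example a `serial.Serial` instance from pyserial.

```python
import io

# Hypothetical mapping from the prompt letters to a description of each mode.
MODES = {"b": "boy", "g": "girl", "d": "dry run"}


def send_mode(port, mode):
    """Validate the mode letter and write it to the port as a single byte."""
    mode = mode.strip().lower()
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {sorted(MODES)}")
    port.write(mode.encode("ascii"))
    return MODES[mode]


# Stand-in for the serial port so the sketch runs without hardware.
fake_port = io.BytesIO()
selected = send_mode(fake_port, "g")
```

    Validating the letter before writing keeps a typo at the prompt from ever reaching the board.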


    LED orchestration

    I used a WS2812B LED array with 60 individually addressable LEDs, controlled through the FastLED.h library. The possibilities of what you can create with it are only limited by your imagination.

    There were 4 stages to the reveal build-up:

    1. Lightning 

    • Every LED has a 50% probability of being turned on.
    • Random delay between 10 and 100 ms before all LEDs are turned off.
    • Random delay between 10 and 100 ms before repeating, until 10 s have elapsed.
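
    The lightning stage can be simulated off the board. This is a sketch in Python rather than the Arduino code, assuming 60 LEDs and that the 10 s budget counts both the on and the off delays; `lightning_schedule` is a hypothetical helper name.

```python
import random


def lightning_schedule(num_leds=60, total_ms=10_000, rng=None):
    """Return a list of (frame, on_ms, off_ms) tuples covering about total_ms."""
    rng = rng or random.Random()
    schedule = []
    elapsed = 0
    while elapsed < total_ms:
        # Each LED is independently on or off with 50% probability.
        frame = [rng.random() < 0.5 for _ in range(num_leds)]
        on_ms = rng.randint(10, 100)   # time the flash stays visible
        off_ms = rng.randint(10, 100)  # dark pause before the next flash
        schedule.append((frame, on_ms, off_ms))
        elapsed += on_ms + off_ms
    return schedule


schedule = lightning_schedule(rng=random.Random(42))
```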


    2. Brewing portion 

    • LEDs are green.
    • They are turned on in sequence, then reset after a delay.


    3. Exponential backoff 

    This is right before the big reveal, so it builds up anticipation. This backfired on me as it made me very nervous 😆

    • LEDs are white.
    • All LEDs are turned on at the same time before turning off after a delay.
    • This delay is reduced exponentially to build up the suspense till it reaches a minimum.
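
    A sketch of how such a shrinking delay could be generated, written in Python for illustration. The starting delay, the decay factor and the minimum are assumed values, not the ones used on the Arduino.

```python
def backoff_delays(start_ms=1000.0, factor=0.8, minimum_ms=20.0):
    """Yield flash delays that shrink geometrically until they reach the minimum."""
    delay = start_ms
    while delay > minimum_ms:
        yield delay
        delay *= factor
    yield minimum_ms


delays = list(backoff_delays())
```

    Each delay is a fixed fraction of the previous one, so the flashes speed up quickly at first and then level off at the minimum.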

    It is hard to capture this on video because of the low shutter speed of a phone camera.


    4. Final reveal

    All the LEDs are turned on in blue for a boy or pink for a girl, initially with the same effect as in (2) before settling on one color.


    Room Effect


    I purchased some smart bulbs from Amazon that could be controlled from the command line. I set up two of them in the room, and after the cauldron reveal, the bulbs would light up blue for a boy or pink for a girl, illuminating the entire room in the chosen color.


    Conclusion

    As Harry Potter fans, this was a fun and exciting way for us to reveal our baby's gender. We hope it inspires and helps others who are looking to create a similar experience. You can check out the whole codebase here (I hope I get to clean it up some day).

    Sunday 22 September 2024

    Kakashi: The Copycat Robot

    In this post, I want to share "Kakashi: The Copycat Robot", a fun side project I built a few years ago. The name is inspired by the famous character from Naruto, Kakashi Hatake, also known as the Copycat Ninja.

    The goal of this robot was to mimic Kakashi's ability to copy movements—though, of course, in a more limited way. Check it out for yourself!

    Here are the things I used to build this:

    1. Arduino UNO board.
    2. MAX7219 8x8 LED matrix.
    3. 3D printed pan and tilt brackets (2x).
    4. 4 servo motors.
    5. Bread board and jumper wires.

    I will go through it in the following sections:
    • The Sharingan
    • Pan and Tilt motion 
    • Controller - Serial bus
    • Tracking algorithm

    The Sharingan

    Of course, our Kakashi needs a Sharingan! For those unfamiliar, the Sharingan is the special eye that grants Kakashi his copycat abilities in Naruto.



    For this, I used a MAX7219 8x8 LED matrix. It has 5 pins which I connected as follows:

    • VCC - connect to 5V 
    • GND - connect to ground
    • DIN - data in (pin defined in ports.h)
    • CS - chip select (pin defined in ports.h)
    • CLK - clock (pin defined in ports.h)



    Then I found an LED editor, which I used to create a hex mapping of the Sharingan at different angles, and wrote code that loops through the frames.
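
    To show what a hex mapping means here, the sketch below converts an 8x8 pattern into one byte per row, which is the form an 8x8 LED matrix frame is usually stored in. The pattern is an illustrative ring, not one of the actual Sharingan frames, and the bit order (leftmost pixel as the most significant bit) is an assumption that depends on the wiring and the driver library.

```python
def pattern_to_hex(rows):
    """Convert 8 strings of 8 characters ('#' = LED on) into one byte per row."""
    if len(rows) != 8 or any(len(row) != 8 for row in rows):
        raise ValueError("expected an 8x8 pattern")
    # The leftmost character becomes the most significant bit of the row byte.
    return [int("".join("1" if ch == "#" else "0" for ch in row), 2) for row in rows]


# Illustrative ring pattern, not the actual Sharingan frames.
RING = [
    "..####..",
    ".#....#.",
    "#......#",
    "#..##..#",
    "#..##..#",
    "#......#",
    ".#....#.",
    "..####..",
]

frame = pattern_to_hex(RING)
print(" ".join(f"0x{b:02X}" for b in frame))
# prints: 0x3C 0x42 0x81 0x99 0x99 0x81 0x42 0x3C
```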


    Pan and Tilt

    Pan and tilt are two motions that, used in combination, can cover basically any movement.


    I used two of these to mimic arm movements. Each one is made up of a pan and tilt bracket, which you can either 3D print or purchase pre-made from Amazon. I attached two servo motors to each bracket. I won't go into assembly details, as there are plenty of great tutorials available on how to put one together. 

    Each servo motor has 3 pins - 5V power, ground and control. I connected the four control cables to the following ports:


    I wrote a simple class to control the 4 servo motors and map them to pan and tilt actions.

       

    Controller - Serial Bus

    The serial bus acts as a communication channel between my computer and the Arduino via a wired connection. I use it to control the 8x8 LED display and handle pan and tilt actions. This setup is flexible and has been useful in several other projects as well. 
    On the client side, I implemented a simple class that sends control messages. It can also record and play back actions, similar to how Kakashi copies techniques and reuses them.
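
    A rough sketch of what a record-and-playback client could look like. The class name, the newline-terminated text commands such as `P1:90`, and the timing scheme are all assumptions made for illustration; the real message format is not shown in this post. `port` is any object with a `write` method.

```python
import io
import time


class RecordingController:
    """Send text commands to a port and optionally record them for replay."""

    def __init__(self, port, clock=time.monotonic, sleep=time.sleep):
        self.port = port
        self.clock = clock
        self.sleep = sleep
        self.recording = []  # list of (seconds since previous command, command)
        self._last = None

    def send(self, command, record=True):
        now = self.clock()
        if record:
            gap = 0.0 if self._last is None else now - self._last
            self.recording.append((gap, command))
            self._last = now
        self.port.write((command + "\n").encode("ascii"))

    def playback(self):
        """Replay the recorded commands with their original spacing."""
        for gap, command in list(self.recording):
            self.sleep(gap)
            self.send(command, record=False)


port = io.BytesIO()
controller = RecordingController(port, sleep=lambda seconds: None)  # skip waits in the demo
controller.send("P1:90")
controller.send("T1:45")
controller.playback()
```

    Storing the gap between commands rather than absolute timestamps makes a recording replayable at any later time.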


    On the Arduino, I receive these messages and perform the appropriate action.


    Tracking Algorithm

       

    In this section, I'll explain how I mapped my real-world movements to control the robot's actions. There were three main requirements:

    1. Hand Tracking: The system needed to track my hand movements and map them to four angles, corresponding to the servo motors in the pan and tilt setup.

    2. Scale Invariance: It had to be scale-invariant, meaning I could start from any position and move freely, with the robot replicating the same actions regardless of where I started.

    3. Smooth Movements: The movements had to be smooth, taking into account the bandwidth limitations of the serial bus and the movement speed of the servo motors while being fault tolerant. 

    For hand tracking, I needed a model that could quickly provide hand landmarks while running efficiently on CPU/MPS (for Mac). Since high accuracy wasn't critical, I opted for the EfficientDet model via MediaPipe. You can find more details in the kakashi.py file.




    Once I have the hand landmarks, I extract three key pieces of information from each hand:

    1. Center of the Hand (landmark 0)
    2. Palm Height (difference between landmarks 5 and 0) — used to scale the coordinates.
    3. Average Position of Finger Tips (landmarks 4, 8, 12, 16, 20) — since not all fingers might always be visible.
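
    The three quantities above could be computed as follows. This is a sketch that assumes the landmarks arrive as 21 `(x, y)` pairs in MediaPipe's hand landmark order, that palm height is the Euclidean distance between landmarks 5 and 0, and that all five fingertips are present; `hand_features` is a hypothetical name, not a function from kakashi.py.

```python
import math

FINGERTIPS = (4, 8, 12, 16, 20)


def hand_features(landmarks):
    """Extract center, palm height and mean fingertip position from 21 (x, y) points."""
    if len(landmarks) != 21:
        raise ValueError("expected 21 hand landmarks")
    center = landmarks[0]
    # Euclidean distance between landmark 5 and landmark 0.
    palm_height = math.dist(landmarks[5], landmarks[0])
    tips = [landmarks[i] for i in FINGERTIPS]
    tip_mean = (
        sum(x for x, _ in tips) / len(tips),
        sum(y for _, y in tips) / len(tips),
    )
    return center, palm_height, tip_mean


# Synthetic landmarks for a quick check.
points = [(0.0, 0.0)] * 21
points[5] = (3.0, 4.0)
for tip in FINGERTIPS:
    points[tip] = (1.0, 2.0)
center, palm_height, tip_mean = hand_features(points)
```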

    With the tracking data available for each frame, the next step is to map it to the pan and tilt actions, i.e., the four angles for the servo motors.

    A servo motor can move between 0 and 180 degrees. I set the motors to point forward at 0 degrees, and whenever the program starts, the motors reset to this position. The tracking data from the first frame (td₀) serves as the reference point.

    For each subsequent frame, we calculate the distance along the x and y axes relative to the reference frame. This distance is scaled by palm height to maintain scale invariance. After scaling, the distance is limited to the range -3 to +3, normalized to between 0 and 1, and then converted into a corresponding angle between 0 and 180 degrees.
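
    One way to read this mapping, for a single axis, is sketched below. Treating the -3 to +3 range as a clipping limit is my interpretation, and `displacement_to_angle` is a hypothetical name rather than the function in the repository. Under this reading, zero displacement lands on the midpoint of the servo range.

```python
def displacement_to_angle(current, reference, palm_height, limit=3.0):
    """Map a 1-D displacement from the reference frame to a servo angle in degrees."""
    if palm_height <= 0:
        raise ValueError("palm_height must be positive")
    scaled = (current - reference) / palm_height   # scale invariance
    clipped = max(-limit, min(limit, scaled))      # keep within [-3, +3]
    normalized = (clipped + limit) / (2 * limit)   # now in [0, 1]
    return normalized * 180.0                      # now in [0, 180]


no_move = displacement_to_angle(0.5, 0.5, 0.1)     # 90.0, the midpoint
far_right = displacement_to_angle(1.0, 0.0, 0.1)   # 180.0 after clipping
far_left = displacement_to_angle(-1.0, 0.0, 0.1)   # 0.0 after clipping
```

    Dividing by palm height means a hand close to the camera and a hand far from it produce the same angle for the same relative movement.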

    Here is the code that does this:


    Then we put all this together and voila, we have Kakashi: The Copycat Robot!

    PS: Feel free to check out the whole code on GitHub (I hope I get to clean it up someday).

    Thursday 8 August 2019

    Neural network inference pipeline for videos in Tensorflow

    Just as we saw a huge influx of images in the past decade or so, we are now seeing a lot of videos being produced on social media. The need to understand and moderate videos using machine learning has never been greater.

    In this post, I will show you how to build an efficient pipeline to process videos in Tensorflow. For simplicity, let us consider a ResNet50 model pre-trained on ImageNet. This is pretty straightforward using tf.keras.applications:


    Now, let us break it up to see what exactly is happening:
    1. We are loading the model with weights.
    2. We are reading an image and resizing it to 224x224.
    3. We are doing some preprocessing of the image.
    4. We are running inference.
    5. We are doing some post-processing.

    If we want to do something similar for large videos, we need a pipeline that takes a stream of frames from the video, applies preprocessing transformations, runs inference on the frames, unravels the inferences and applies post-processing.

    We can see that doing all of this in sequence, frame by frame, is clearly not the right approach, as it is slow and inefficient. To tackle this, we will use tf.data.Dataset and run inference in batches.

    First, let's create a generator that can produce frames from a video:


    We will use the tf.data.Dataset.from_generator method to create a dataset object out of this.


    Now let us define a function which does resizing, normalization and other preprocessing steps that are required on a batch of frames. Then, using the batch operation on the dataset created above, create a batch of size 64. Map the preprocess method that we defined onto the batch in parallel on CPU as it is a CPU intensive task.


    It is important to make sure that I/O is parallelized as much as possible. For best performance, instructions that are well suited to the CPU should run on the CPU and the ones suited to the GPU should run on the GPU. Also, if you observe the code above, we are prefetching. This means that, before the dataset is consumed, a batch of 64 frames is preprocessed and ready for consumption. By the time we finish running inference on one batch of frames, the next batch is ready. This is very important because it ensures that we use the CPU, GPU and I/O to their full potential. Here, we are prefetching one batch at a time; for your use case, you may prefetch a different number. I always run nvidia-smi while tuning the batch size, number of workers, prefetching etc. so that both the CPU and GPU are always in use in my job.
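
    To make the idea of batching and prefetching concrete without Tensorflow, here is a plain-Python sketch. It is not how tf.data is implemented; it only mimics conceptually what batching, mapping a preprocess function and prefetching one batch do. The helper names and the stand-in `preprocess` and "inference" steps are illustrative assumptions.

```python
import queue
import threading


def batched(items, batch_size):
    """Group an iterable into lists of at most batch_size items."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def prefetch(iterable, buffer_size=1):
    """Produce items on a background thread so the consumer rarely waits."""
    done = object()  # sentinel marking the end of the stream
    buffer = queue.Queue(maxsize=buffer_size)

    def producer():
        for item in iterable:
            buffer.put(item)
        buffer.put(done)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        item = buffer.get()
        if item is done:
            return
        yield item


def preprocess(batch):
    # Stand-in for resizing and normalization: scale pixel values to [0, 1].
    return [value / 255.0 for value in batch]


frames = range(10)  # stand-in for decoded video frames
pipeline = prefetch(map(preprocess, batched(frames, 4)), buffer_size=1)
results = [sum(batch) for batch in pipeline]  # stand-in for model inference
```

    Because `map` is lazy, the preprocessing runs on the producer thread, so the next batch is being prepared while the current one is being consumed.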


    Let's put all of this together with running the actual inference:


    This is all good, but what if you have some post processing that can be parallelized on CPU? That will keep the GPU idle till it is processed. So, let's make the inference step a generator that is part of the pipeline and feeds to a second dataset object.


    I have used post-processing to do things like writing to disk, data visualization, generating content etc. that can be parallelized. This was a simple guide to designing data pipelines for inference in Tensorflow. Here is the code for a slightly more complicated architecture that generates a video output (observe how I pass the original frame through the pipeline).


    More resources:
    https://www.tensorflow.org/guide/performance/datasets
    https://www.tensorflow.org/beta/guide/data 

    Wednesday 13 June 2018

    Finding Where's Waldo using Mask R-CNN

    When I was a kid, I really loved solving Where's Waldo. There were a few books (it used to be called Where's Wally) in our school library on which I spent hours finding Waldo. For people who do not know what it is: Waldo, a unique character, is hidden among hundreds of other characters, and you have to find him in all the chaos in the image.

    Now that I am too old to be solving it and too busy to spend hours on such things, I decided to build a system that uses deep learning to automatically solve it, and spent weeks building it.

    I started off by treating this as a classification problem with two classes, Waldo and not Waldo, similar to Hot dog - not Hot dog. Once the classification problem is solved, we can apply a class activation mapping (CAM) layer to find Waldo's activations in the image and thus find Waldo. However, I couldn't find enough images of Waldo. I found this repo, which has about 20 images. And as there are only 20 Waldos versus thousands of not-Waldo characters, the classes are highly imbalanced. I still tried, but the results weren't that great.

    When I looked to see whether someone had already worked on it, I found a Medium post which used Tensorflow's Faster R-CNN model to do this. But I didn't want to just find bounding boxes; I wanted to actually mask out Waldo in the image. I did get more images of Where's Waldo from it, though.

    Then I came across this paper on Mask R-CNN, which sounded promising for this use case. And it was indeed much better than my earlier approach:


    Waldo masked out in the image

    Original Image

    In this post I would like to share how I was able to get the data, tag it and train a model to solve Where's Waldo. You can check out my code on GitHub here.

    Fork deepwaldo on Github

    Mask R-CNN 


    The main idea here is to:

    1. Take the input image and pass it into a set of convolutional layers that generate a feature map for the given image.
    2. Take this feature map and pass it into a region proposal network, which generates rectangular regions that might contain an object of one of the final classes. This has its own classification loss (rpn_class_loss) and bounding box loss (rpn_bbox_loss).
    3. Take these regions, filter them using something like non-max suppression, and pass them into an ROI pooling layer.
    4. The regions are then reshaped and passed on to a set of convolutional layers which predict whether there is an object in them or not. This again has its own classification and bounding box losses.
    5. A separate set of convolutional layers predicts, for every pixel in the predicted bounding box, whether it belongs to the given class or not. This essentially gives the required mask. Here, in addition to the bounding box and classification losses, we also have a mask loss.
    6. You run all these networks together, backpropagating all the losses.

    If you want a clearer explanation, check out the lecture in CS231n.
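
    Since non-max suppression comes up in the steps above, here is a small self-contained sketch of the greedy version of it. It is a generic illustration, not the code used inside the Mask R-CNN implementation; boxes are assumed to be `(x1, y1, x2, y2)` tuples and the 0.5 threshold is an arbitrary choice.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, threshold=0.5):
    """Return indices of boxes kept after greedy non-max suppression."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a better one.
        if all(iou(boxes[i], boxes[j]) <= threshold for j in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = non_max_suppression(boxes, scores)  # [0, 2]: the second box overlaps the first
```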


    Mask R-CNN arch from CS231n (In this case we have a 28x28 mask instead of 14x14)


    Data


    As I mentioned earlier, I got 20 images from this repo and a few more images from the Medium post that used Faster R-CNN, for a total of 29 images. I split these into 26 for training and 3 for validation. Then I used the VIA tool (VGG Image Annotator) to manually draw masks over Waldo in every image. You can find the images and annotations in my GitHub repo here.


    Training


    I trained the model for 30 epochs with 100 steps per epoch. The losses on tensorboard:






    If you want to train on your own dataset, first set your configurations in the waldo_config.py file or use the default.

    Then run the following, which downloads the Mask R-CNN model trained on the COCO dataset to the MODEL_DIR folder and trains a model with the data in the DATA_DIR folder:
    python train.py
    For prediction, you can run the following, which shows a popup with Waldo detected in the image:

    python predict.py [MODEL PATH] [PATH TO IMAGE]
    # for example
    python predict.py models/logs/waldo20180612T1628/mask_rcnn_waldo_0030.h5 data/val/5.jpg

    In conclusion, the Mask R-CNN algorithm works fairly well at finding Waldo in cases where it has already seen a similar type of Waldo image. It also looks like it works much better when the image quality is good and Waldo is clearly visible. But I think it is still great, since we only had a very tiny dataset to train on.


    Tuesday 23 January 2018

    Higher level ops for building neural network layers with deeplearn.js

    I have been meddling with Google's deeplearn.js lately for fun. It is surprisingly good given how new the project is, and it seems to have a solid roadmap. However, it still lacks something like tf.layers and tf.contrib.layers, which have many higher level functions that have made using Tensorflow so easy. It looks like they will be added to Graphlayers in the future, but the priority as of now is to fix the lower level APIs first, which totally makes sense.

    So, I quickly built one for tf.layers.conv2d and tf.layers.flatten which I will share in this post. I have made them as close to function definitions in tensorflow as possible.

    1.  conv2d - Functional interface for the 2D convolution layer.

    Arguments:
    • inputs Tensor input.
    • filters Integer, the dimensionality of the output space (i.e. the number of filters in the convolution).
    • kernel_size Number to specify the height and width of the 2D convolution window.
    • graph Graph object.
    • strides Number to specify the strides of convolution.
    • padding One of "valid" or "same" (case-insensitive).
    • data_format "channels_last" or "channels_first".
    • activation Optional. Activation function which is applied on the final layer of the function. The function should accept Tensor and graph as parameters.
    • kernel_initializer An initializer object for the convolution kernel.
    • bias_initializer An initializer object for bias.
    • name String which represents the name of the layer.
    Returns:

    Tensor output.
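
    To give a feel for how kernel_size, strides and padding determine the shape of the returned tensor, here is a small sketch in Python. It follows Tensorflow's convention for "valid" and "same" padding; that this layer behaves identically is an assumption, and the function names are made up for the example.

```python
import math


def conv_output_size(input_size, kernel_size, stride=1, padding="valid"):
    """Output length along one spatial dimension of a 2D convolution."""
    padding = padding.lower()  # the argument is case-insensitive
    if padding == "same":
        return math.ceil(input_size / stride)
    if padding == "valid":
        return math.ceil((input_size - kernel_size + 1) / stride)
    raise ValueError('padding must be "valid" or "same"')


def conv2d_output_shape(height, width, filters, kernel_size, strides=1, padding="valid"):
    """Shape of the output tensor in channels_last layout: (height, width, filters)."""
    return (
        conv_output_size(height, kernel_size, strides, padding),
        conv_output_size(width, kernel_size, strides, padding),
        filters,
    )


valid_shape = conv2d_output_shape(28, 28, filters=32, kernel_size=5)  # (24, 24, 32)
same_shape = conv2d_output_shape(28, 28, filters=32, kernel_size=5, strides=2, padding="SAME")  # (14, 14, 32)
```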

    Usage:

    Add this to your code:

    2. flatten - Flattens an input tensor.


    I wrote these snippets while building a tool using deeplearn.js where I do things like loading datasets, batching and saving checkpoints, along with visualization. I will share more on that in future posts.

    Thursday 11 January 2018

    Hacking FaceNet using Adversarial examples


    With the rise in popularity of deep learning based face recognition systems and their application in security/authentication, it is important to make sure that it is not that easy to fool them. I recently finished the 4th course on deeplearning.ai, where there is an assignment which asks us to build a face recognition system, FaceNet. While I was working on the assignment, I couldn't stop thinking about how easy it is to fool it with adversarial examples. In this post I will tell you how I managed to do it.

    First off, some basics about FaceNet. Unlike image recognition systems, which map every image to a class, a face recognition system cannot assign a class label to every face. This is because, one, there are far too many faces in the real world to assign a class to each of them and, two, a fixed set of classes cannot accommodate new people the system should handle. So instead, we build a system that learns similarities and dissimilarities. Basically, there is a neural network similar to what we have in image recognition, but instead of applying softmax at the end, we take the logits as the embedding for the given input image and then minimize something called the triplet loss. Consider a face A, with a positive match P and a negative match N. If f is the embedding function and L is the triplet loss, we have this:

    Triplet loss
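
    For reference, this is the triplet loss in the form it usually takes in the FaceNet paper, with f the embedding function. The margin α is a hyperparameter that is not named in the text above; it is included here because it is part of the standard form of the loss.

```latex
L(A, P, N) = \max\left( \lVert f(A) - f(P) \rVert_2^2 - \lVert f(A) - f(N) \rVert_2^2 + \alpha,\ 0 \right)
```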

    Basically, it is incentivizing small distance between A - P and large distance between A - N. Also, I really recommend watching Ian Goodfellow's lecture from Stanford's CS231n course if you want to know about adversarial examples.

    Like I said earlier, this thought came to me while doing an assignment from the 4th course on deeplearning.ai, which can be found here, and I have built on top of it. The main idea is to find a small noise that, when added to someone's photo, causes virtually no visual change yet makes FaceNet identify them as the target.





    Benoit (attacker)
    Add noise
    Kian
    Kian Actual (Target)

    First, let's load the images of the attacker Benoit and the target Kian.


    Now say that the original attacker image is A, the attacker image with noise added is A`, and the target image is T. We want to define the triplet loss to achieve two things:

    1. Minimize the distance between A` and T
    2. Maximize the distance between A` and A (original)

    In other words, the triplet loss L is:

    L (A, P, N) = L (A`, T, A)

    Now, let's compute the gradient of the logits with respect to the input image 



    These gradients are used to obtain the adversarial noise as follows :

    noise = noise - step_size * gradients
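
    The update rule can be tried out on a toy problem. The sketch below replaces FaceNet with a tiny linear embedding so that the gradient can be written by hand; the matrix `W`, the four-value "images" and the step size are all made up for illustration, and the loss is the pull-toward-target minus push-from-original distance without a margin. With a real network the gradients would come from the framework instead.

```python
import math

# Toy linear "embedding" f(x) = W x standing in for FaceNet (an assumption for this sketch).
W = [[0.5, -0.2, 0.1, 0.3],
     [0.1, 0.4, -0.3, 0.2]]
attacker = [1.0, 0.0, -1.0, 0.5]   # stand-in for the attacker image
target = [-0.5, 1.0, 0.5, 0.0]     # stand-in for the target image


def embed(x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def backproject(e):
    """Multiply an embedding-space vector by the transpose of W."""
    return [sum(W[k][i] * e[k] for k in range(len(W))) for i in range(len(W[0]))]


def loss_gradient(noise):
    """Gradient of ||f(adv)-f(target)||^2 - ||f(adv)-f(attacker)||^2 w.r.t. the noise."""
    adversarial = [a + n for a, n in zip(attacker, noise)]
    f_adv, f_tgt, f_att = embed(adversarial), embed(target), embed(attacker)
    pull = backproject([2 * (x - y) for x, y in zip(f_adv, f_tgt)])  # toward the target
    push = backproject([2 * (x - y) for x, y in zip(f_adv, f_att)])  # away from the original
    return [p - q for p, q in zip(pull, push)]


def distance_to_target(noise):
    adversarial = [a + n for a, n in zip(attacker, noise)]
    return math.dist(embed(adversarial), embed(target))


noise = [0.0] * len(attacker)
step_size = 0.1
before = distance_to_target(noise)
for _ in range(10):
    gradients = loss_gradient(noise)
    noise = [n - step_size * g for n, g in zip(noise, gradients)]
after = distance_to_target(noise)
```

    In this toy setup the embedding distance to the target shrinks with every step; the noise is not small here because the "images" only have four values.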

    According to the assignment, an L2 distance between embeddings of less than 0.7 indicates that two faces belong to the same person. So let's aim for that.



    The distance decreases from 0.862257 to 0.485102 which is considered enough in this case.

    L2 distance between embeddings of attacker and target
    This is impressive because all of this is done without visibly altering the image, just by adding a little calculated noise!



    Also note that the L2 distances indicate that the generated image is closer to Kian than to Benoit, in spite of looking practically identical to Benoit. So there you go, adversarial example generation for FaceNet.


    Sunday 17 December 2017

    Tensorflow and AEM

    It has been a while since Google released Tensorflow support for Java. Even though it is still in its infancy, I feel like it has everything we need. Build computation graphs - check, run session and compute stuff - check, GPU support - check. Now, if you have all the time in the world to reinvent the wheel, you can pretty much build anything in Java that you can build using Python or C++.

    So, I have been working on Adobe Experience Manager since I joined Adobe and recently, I started experimenting with several use cases where machine learning can help in content creation and discovery. As I have zero knowledge in building any deep learning models in Java, I decided to build everything in Java. How hard can it be? Right? Right? Sarcasm aside, as I mentioned earlier, Tensorflow for Java has everything we need, and as it internally uses JNI, we can have interoperability with Python and C++ (that's why I preferred this over deeplearning4j).

    First off, I followed their official guide for the setup and faced a lot of hurdles along the way. In this post, I will show you how I managed to successfully set up Tensorflow on AEM (or any Felix-based system).

    Step 1

    Add the dependency to your pom.xml file. Note that the scope is set to compile.


    Step 2

    Add this configuration to your maven-bundle-plugin.


    Step 3

    Build and install to your AEM instance. Then, navigate to /system/console/bundles/ and look for the bundle which contains the dependency. See if the "Exported Packages" section has the following packages:


    Step 4

    Install JNI if necessary (this is mentioned in the link that I shared earlier).

    Then place the library file in the appropriate place.

    Testing 

    Let's write a simple Sling servlet to check if everything is working as expected. As I mentioned earlier, Tensorflow for Java is still in its infancy. So, I wrote a helper class a while back to manipulate the computation graph. Get GraphBuilder.java and place it where it is accessible to the Sling servlet.

    GraphBuilder.java


    The following sling servlet includes things like:
    • Creating a computation graph
    • Creating placeholders, constants etc
    • Arithmetic operations, matrix multiplication.
    • Feeding data and computing values of placeholders.


    When you go to /services/tftest you should get something like this:

    4 -2 3 0 FLOAT tensor with shape [3, 3] 14 Testing done!
    Now you can start building any deep learning model on AEM. Also, I will be writing about some of the real life applications of deep learning in content creation and content discovery. So stay tuned!


    Wednesday 29 March 2017

    Most original prize at The 2017 Deep Learning Hackathon

    Although I have worked on several deep learning projects in the past, I still consider myself to be a newbie in deep learning because of all the new things that keep coming up and it is so hard to keep up with all that. So, I decided to take part in "The 2017 Deep Learning Hackathon" by Deepgram to work on something I have been wanting to do for a while now.

    I built something called Medivh, named after the prophet from Warcraft who has seen the future. The idea was to build a tool for web developers to predict how users are going to see / use the site even before deploying. Basically, it generates heat maps on websites which show where users might look. Example:



    I will write another post with all the technical details. Here is a sneak peek of how it was done.




    Apart from building that, we got an opportunity to interact with people like Bryan Catanzaro (maker of cuDNN and VP at Nvidia), Jiaji Huang from Baidu, Jonathan Hseu from Google Brain etc.

    We also got to interact with people from Deepgram and their Caffe-like framework called Kur, which seems pretty good. I think I'll write a review about Kur after playing around with it for some more time.

    Also this:

    This is me presenting before the results.
    For Medivh, I won the "Most original" prize: an Nvidia Titan X Pascal. What a beauty!


    Thursday 9 March 2017

    Introducing mailing in crontab-ui

    crontab-ui now has an option to send mails after the execution of jobs, with output and errors attached as text files. It uses nodemailer internally, and all the options available through nodemailer are available here.

    Defaults


    To change the default transporter and mail config you can modify config/mailconfig.js.
    var transporterStr = 'smtps://user%40gmail.com:password@smtp.gmail.com';
    
    var mailOptions = {
        from: '"Fred Foo 👥" <foo@blurdybloop.com>', // sender address
        to: 'bar@blurdybloop.com, baz@blurdybloop.com', // list of receivers
        subject: 'Job Test#21 Executed ✔', // Subject line
        text: 'Test#21 results attached 🐴', // plaintext body
        html: '<b>Test#21 🐴</b> results attached' // html body
    };

    Troubleshooting


    Make sure that you have node at /usr/local/bin/node; otherwise, you need to create a symlink like this:
    ln -s [location of node] /usr/local/bin/node

    Setting up crontab-ui on raspberry pi

    In this tutorial I will show you how to set up crontab-ui on a Raspberry Pi.

    Step 1

    Find your architecture
    uname -a
    Linux raspberrypi 4.4.50-v7+ #970 SMP Mon Feb 20 19:18:29 GMT 2017 armv7l GNU/Linux
    
    Note that it is ARMv7. Download and extract the latest node.
    wget https://nodejs.org/dist/v7.7.2/node-v7.7.2-linux-armv7l.tar.xz
    tar -xJf node-v7.7.2-linux-armv7l.tar.xz
    sudo mv node-v7.7.2-linux-armv7l /opt/node

    Step 2

    Remove the old nodejs if it is already installed and append the latest node to the $PATH (note the >>, which appends to ~/.bashrc instead of overwriting it)
    sudo apt-get purge nodejs
    echo 'export PATH=$PATH:/opt/node/bin' >> ~/.bashrc
    source ~/.bashrc

    Step 3

    Install crontab-ui and pm2, then start crontab-ui.
    npm install -g crontab-ui
    npm install -g pm2
    pm2 start crontab-ui
    Now crontab-ui should be running. Visit http://localhost:8000 in your browser to see if it is working.

    Step 4 (Optional)

    In order to be able to access crontab-ui from outside, you have to forward port 8000. Install nginx and configure it.
    sudo apt-get install nginx
    sudo vi /etc/nginx/sites-available/default
    Paste the following lines in the file:
    server {
        listen 8001;
    
        server_name localhost;
    
        location / {
            proxy_pass http://localhost:8000;
        }
    }
    Restart nginx
    sudo service nginx restart
    Now, crontab-ui should be accessible from outside through port 8001. So, to access crontab-ui, go to
    <ip address of pi>:8001
    You can also set up HTTP authentication by following this.
    Thanks!