Introducing MoCaDeSyMo

Like every other office, ours has different options for lunch. You can take your break and visit the cafeteria, you can order food from a local restaurant, or you can just step outside and buy something from the food truck. I’m a regular food truck customer, but this option has some downsides. My office is on the opposite side of the building from the front door, so I don’t see the truck arriving. This means I regularly end up “pinging” colleagues who have a direct line of sight via Skype for Business to find out when I should go for lunch. Right now only a few people on our team have a window facing the front door, so they get pinged a lot. To end their misery, we came up with the idea of some kind of push notification.

The first idea was to write a mobile app that the driver of the food truck installs on his phone. The app would recognise our GPS coordinates and push a simple message to Azure upon arrival at our front door. This could have worked, but it depended on the driver’s phone and his willingness to join in. After some further thinking, we came up with the idea of using a Raspberry Pi that takes pictures of the scene, uploads them, and uses Cognitive Services to check whether the food truck is present or not. So without further ado, here is MoCaDeSyMo.

MoCaDeSyMo (mobile carbohydrate delivery system monitor) is the red fellow on the left of the picture. Inside his red body is a Raspberry Pi 3 and the default Pi camera module.

On weekdays between 10:45 and 11:30 MoCaDeSyMo takes a picture every minute and tries to detect the food truck of a local snack bar called “Imbiss Rainer”. His main target is to recognise something like this:

To make sure everyone on the team knows about the arrival of the truck, MoCaDeSyMo uses Azure Cognitive Services to predict whether the food truck is present. If the probability is above a given threshold, a Microsoft Teams connector is triggered and we get notified with a message like this:

This post discusses how we created the system based on this high-level architecture:

The Raspberry part

In order to make this possible we use a simple Raspberry Pi 3 with a camera module and a default Raspbian Jessie image as its operating system. Raspbian has the best tooling for the Pi camera, and since taking pictures is the main task, we decided to stick with this operating system instead of installing Windows 10 IoT Core. This decision also means we are using Linux as our base system to talk to Azure, a new experience for us but something I can highly recommend given that the shell script we are using has only 35 lines of code.
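The original script isn’t reproduced here, but a minimal sketch of what it does could look like this. The storage account, resource group, container, crop geometry and function route are placeholders, and cropping via ImageMagick is just one way to do it:

```bash
#!/bin/bash
# Sketch of the capture-and-upload script. Account names, container, resource
# group and the function URL are placeholders, not the original values.

STORAGE_ACCOUNT="mystorageaccount"
CONTAINER="images"
TIMESTAMP=$(date +%Y%m%d-%H%M%S)
IMAGE="/home/pi/images/${TIMESTAMP}.jpg"

# Take a picture with the Pi camera module
raspistill -o "${IMAGE}"

# Crop to the area in front of the door (ImageMagick; geometry is a placeholder)
convert "${IMAGE}" -crop 1280x720+600+300 "${IMAGE}"

# Non-interactive login with a service principal (credentials via environment variables)
az login --service-principal -u "${SP_APP_ID}" -p "${SP_PASSWORD}" --tenant "${SP_TENANT}"

# Fetch the storage key and upload the image to blob storage
STORAGE_KEY=$(az storage account keys list \
  --resource-group my-resource-group \
  --account-name "${STORAGE_ACCOUNT}" \
  --query "[0].value" -o tsv)

az storage blob upload \
  --account-name "${STORAGE_ACCOUNT}" \
  --account-key "${STORAGE_KEY}" \
  --container-name "${CONTAINER}" \
  --name "${TIMESTAMP}.jpg" \
  --file "${IMAGE}"

# Hand the blob URL over to the Azure Function for classification
BLOB_URL="https://${STORAGE_ACCOUNT}.blob.core.windows.net/${CONTAINER}/${TIMESTAMP}.jpg"
curl -s "https://<function-app>.azurewebsites.net/api/<function-name>?code=<function-key>&imageUrl=${BLOB_URL}"
```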

That’s all the code running on the Pi. It takes a picture and crops it to focus only on the part we are interested in. To take the picture we use a tool called raspistill, which has a lot to offer. To take some training pictures for the Cognitive Services API, we used just one command.
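A call along these lines does the trick (the output filename pattern is a placeholder):

```bash
# Run for 30 seconds (-t, in ms) and take a picture every 2 seconds (-tl, in ms)
raspistill -t 30000 -tl 2000 -o training_%04d.jpg
```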

This takes a picture every two seconds for a period of 30 seconds, which is very useful for creating sample data to train the image recognition algorithm.

Then we use the Azure CLI 2.0 (https://azure.github.io/projects/clis/), which is based on Python, to log in and upload the image to blob storage. The best option in terms of authentication is to use a service principal to log in to Azure. You could use your own login and password, but storing your password in a shell script on a Pi isn’t something your security people will be OK with. So just use the Azure CLI to create a service principal directly from Linux and use it in your script.
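Creating one is a single Azure CLI call; the name below is a placeholder:

```bash
# Create a service principal; the command prints the appId, password and tenant
az ad sp create-for-rbac --name "mocadesymo-pi"

# These values can then be used for a non-interactive login in the script
az login --service-principal -u <appId> -p <password> --tenant <tenant>
```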

Azure Function

After uploading the picture, we trigger an Azure Function with the URL of the new item in the blob storage. To get prediction data for our image, the function calls a Custom Vision endpoint.
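A rough sketch of that call is below; the project ID, prediction key and exact endpoint URL come from the Custom Vision portal, so treat them as placeholders:

```csharp
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

public static class FoodTruckDetector
{
    // Placeholder endpoint; the real URL and key are shown on the prediction page of the portal
    private const string PredictionUrl =
        "https://southcentralus.api.cognitive.microsoft.com/customvision/v1.0/Prediction/<project-id>/url";

    public static async Task<string> GetPredictionJsonAsync(string imageUrl)
    {
        using (var client = new HttpClient())
        {
            // Custom Vision authenticates the prediction call via this header
            client.DefaultRequestHeaders.Add("Prediction-Key", "<prediction-key>");

            // The /url variant of the endpoint expects the blob URL wrapped in a small JSON object
            var body = new StringContent("{\"Url\": \"" + imageUrl + "\"}", Encoding.UTF8, "application/json");
            var response = await client.PostAsync(PredictionUrl, body);
            response.EnsureSuccessStatusCode();

            // Raw JSON with the per-tag probabilities
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```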

This service is available at https://www.customvision.ai and has been in preview for a few weeks. Within this service, you can create a project that provides image recognition predictions. “Easily customize your own state-of-the-art computer vision models that fit perfectly with your unique use case. Just bring a few examples of labeled images and let Custom Vision do the hard work.” That’s the marketing slogan, and for our example it delivered exactly that.

We used 77 pictures with the truck present and 164 images without it to train the system via the web interface. With just one iteration of training we got results like this one:

Above we use the URL of the image in the blob storage in the Quick Test feature of the Custom Vision service. It gives us 100% for “false” because I tagged all the images without the food truck as “false”.

Below I uploaded a picture from my hard drive with the truck present and we get a 100% probability for “true”. We also get 1.1% for “false”, which is something I need to investigate in detail because it adds up to 101.1% in terms of probability, which sounds wrong to me. But maybe it’s a feature I’m not aware of. (UPDATE: Of course I was wrong. The probabilities are not exclusive; the image recognition service returns the probability of each tag applying to the image. It doesn’t know that the tags are mutually exclusive, hence the probabilities don’t have to sum to 100%.)

The endpoint we called in our Custom Vision project returns a JSON object.
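It is roughly of this shape; the IDs and numbers below are made-up placeholders, and the field names reflect what the preview API returned at the time:

```json
{
  "Id": "00000000-0000-0000-0000-000000000000",
  "Project": "00000000-0000-0000-0000-000000000000",
  "Iteration": "00000000-0000-0000-0000-000000000000",
  "Created": "2017-07-18T10:45:00Z",
  "Predictions": [
    { "TagId": "00000000-0000-0000-0000-000000000001", "Tag": "true",  "Probability": 0.999 },
    { "TagId": "00000000-0000-0000-0000-000000000002", "Tag": "false", "Probability": 0.011 }
  ]
}
```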

To parse the response, we use some class definitions that can be found in a sample project (http://aihelpwebsite.com/Blog/EntryId/1025/Microsoft-Cognitive-Custom-Vision-Service-ndash-A-Net-Core-Angular-4-Application-Part-One).
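Sketched out, they look something like this; the property names mirror the response fields above and are not necessarily the exact classes from the sample:

```csharp
using System;
using System.Collections.Generic;

// Shape of the Custom Vision prediction response (mirrors the JSON above)
public class PredictionResult
{
    public Guid Id { get; set; }
    public Guid Project { get; set; }
    public Guid Iteration { get; set; }
    public DateTime Created { get; set; }
    public List<Prediction> Predictions { get; set; }
}

public class Prediction
{
    public Guid TagId { get; set; }
    public string Tag { get; set; }
    public double Probability { get; set; }
}
```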

With these objects, we only need to take the TagId from the response above and query the response for the probabilities of the tags we are interested in.
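With JSON.NET that boils down to a small helper; this is a sketch and the class and method names are mine:

```csharp
using System.Linq;
using Newtonsoft.Json;

public static class PredictionParser
{
    // Deserialise the response and pull out the probability of the "true" tag
    public static double GetTruckProbability(string json)
    {
        var result = JsonConvert.DeserializeObject<PredictionResult>(json);
        return result.Predictions
            .Where(p => p.Tag == "true")
            .Select(p => p.Probability)
            .FirstOrDefault();
    }
}
```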

The final thing is to call our Microsoft Teams connector to inform us about the probabilities for the image we took.
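The connector is a plain incoming webhook, so a simple POST with a text payload is enough; the webhook URL and message wording below are placeholders:

```csharp
using System.Net.Http;
using System.Text;
using System.Threading.Tasks;

public static class TeamsNotifier
{
    // Placeholder URL; the real one comes from the "Incoming Webhook" connector in Teams
    private const string WebhookUrl = "https://outlook.office.com/webhook/<connector-id>";

    public static async Task NotifyAsync(double truckProbability)
    {
        using (var client = new HttpClient())
        {
            // A minimal Teams message only needs a "text" property
            var payload = "{\"text\": \"Imbiss Rainer probability: " + truckProbability.ToString("P1") + "\"}";
            var content = new StringContent(payload, Encoding.UTF8, "application/json");
            await client.PostAsync(WebhookUrl, content);
        }
    }
}
```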

Conclusion

Within only a few hours we had a working prototype that uses our Raspberry Pi to take pictures and upload them to Azure. In an Azure Function we call the Custom Vision API to retrieve the probabilities from our custom image recognition model. This information is then pushed to a Microsoft Teams channel. All that within roughly 8-9 hours of work spread over three days. Of course, the code isn’t production-ready, and the use case is very simplistic and maybe nothing our customers can relate to immediately. But it certainly shows what is possible with the current technology and some imagination and creativity.