Is it Mila?

One of the great things about the Internet is that people create all sorts of silly, but interesting, stuff. I was recently fascinated by a deep learning project in which an app classifies images as “hotdog” or “not hotdog”. The project was inspired by a fictional app that appears in HBO’s show Silicon Valley, and it was organized by an employee at HBO.

The creator of the app wrote an excellent article outlining how the team approached building it: from gathering data, through designing and training a deep neural network, to shipping apps to the Android and iPhone app stores.

Naturally, I thought to myself: perhaps I can be silly too. So I started a small project to try and classify whether an image contains my dog Mila or not. (Also, the architecture for the hotdog app is called DeepDog, so as you can see, it is all deeply connected!)

The is-mila project is not as large and detailed as the hotdog project (for example, I am not building an app), but it was a fun way to get to know deep learning a bit better.

The full code for the project is available on GitHub, and you are welcome to try classifying a photo yourself.

A simple start

One of the obstacles to any kind of machine learning task is getting good training data. Fortunately, I have been using Flickr for years, and many of my photos have Mila in them. Furthermore, most of these photos are tagged with “Mila”, so it seemed like a good idea to use the Flickr photos as the basis for training the network.

Mila as a puppy

I prepared a small script and command-line interface (CLI) for fetching pictures via the Flickr API. Of course, my data was not as clean as I thought it would be, so I had to manually move some photos around. I also removed photos that only showed Mila from a great distance or with her back to the camera.
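For illustration, the fetching part could be sketched roughly like this, assuming the third-party flickrapi package; the credentials, user id, and URL format are placeholders and assumptions on my part, and the actual CLI in the repository may differ:

```python
# Rough sketch: list direct image URLs for photos tagged "Mila",
# using the third-party "flickrapi" package. Credentials are placeholders.
import flickrapi

flickr = flickrapi.FlickrAPI("API_KEY", "API_SECRET", format="parsed-json")
response = flickr.photos.search(user_id="USER_ID", tags="Mila", per_page=500)

for photo in response["photos"]["photo"]:
    # Build a direct image URL from the returned photo metadata.
    url = (f"https://live.staticflickr.com/"
           f"{photo['server']}/{photo['id']}_{photo['secret']}.jpg")
    print(url)  # the real script would download this into the training set
```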

In the end, I had 263 photos of Mila. There were many more “not Mila” photos available, of course, but I used only 263 “not Mila” photos so the training sets for the two classes “Mila” and “not Mila” would be the same size. I do not really want to discuss overfitting, data quality, classification accuracy, etc. in this post, but those are interesting topics for another time.

For the deep learning part, I used Keras, a deep learning library that is a bit simpler to get started with than e.g. TensorFlow. In the first iteration, I created a super-simple convolutional neural network (CNN) with just three convolutional layers and one fully-connected layer (and some MaxPooling and Dropout layers in between).
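As a rough illustration, a Keras model along those lines could be defined like this; the layer sizes and input shape are assumptions, not the project’s exact configuration:

```python
# Minimal sketch of a small CNN for binary classification.
# Layer sizes and the 128x128 input shape are illustrative assumptions.
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(128, 128, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dropout(0.5),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # single output: "Mila" or "not Mila"
])
```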

Training this network was faster than I expected and only took a few minutes. In my latest run, the training accuracy settled at around 79% and the validation accuracy (i.e. for photos that were not used to train the network) at 77% after 57 epochs of roughly six seconds each. This is not very impressive, but for binary classification, anything clearly above 50% is at least better than a coin flip.
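In Keras terms, training boils down to a compile-and-fit call along these lines; the optimizer, batch size, and preloaded image arrays are assumptions for the sketch:

```python
# Hypothetical training step; train_images/train_labels and the validation
# arrays are assumed to be preloaded NumPy arrays of images and 0/1 labels.
model.compile(optimizer="adam",
              loss="binary_crossentropy",  # standard loss for two classes
              metrics=["accuracy"])

history = model.fit(train_images, train_labels,
                    validation_data=(val_images, val_labels),
                    epochs=57, batch_size=32)
```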

Finally, I created a simple website for testing the classification. I did not bother using a JavaScript transpiler/bundler like Babel/Webpack, so the site only works in modern browsers. You can try the simple classification here if you like.

The results from this initial experiment were interesting. In the validation set, most of the photos containing Mila were correctly classified as Mila, and a few were classified as not Mila for no obvious reasons. For example, these two images are from a similar setting, with similar lighting, but with different positioning of Mila, and they are classified differently:

Mila, correctly classified
Mila, incorrectly classified as not Mila

Perhaps more surprising, though, are the false positives: photos classified as Mila when Mila is not in them. Here are some examples:

Sports car, classified as Mila
Rainbow crosswalk, classified as Mila
Goats, classified as Mila

Mila is certainly fast, but she is no sports car :-)

As of this writing, I am still uncertain what the simple network sees in the photos it is given. I have not investigated this yet, but it would be an interesting topic to dive into at a later stage.

Going deeper

A cool feature of Keras is that it comes with a few pre-trained deep learning architectures. In an effort to improve accuracy, I tried my luck with a slightly modified MobileNet architecture, using weights pre-trained on the ImageNet dataset, which contains a big and diverse set of images.

The Keras-provided MobileNet network is 55 layers deep, so it is quite a different beast than the “simple” network outlined above. But by freezing the weights of the existing network layers and adding a few extra output layers as needed for my use case (binary classification of “Mila” and “not Mila”), training became less complex since there were fewer weights to adjust.
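A sketch of that setup could look as follows, assuming the MobileNet bundled with Keras and its ImageNet weights; the added classification head here is my own illustration, not necessarily the one used in the project:

```python
# Transfer-learning sketch: freeze the pre-trained MobileNet base and
# train only a small, newly added classification head.
from tensorflow.keras.applications import MobileNet
from tensorflow.keras import layers, models

base = MobileNet(weights="imagenet", include_top=False,
                 input_shape=(224, 224, 3))
base.trainable = False  # keep the ImageNet weights frozen

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),  # binary "Mila" / "not Mila" output
])
```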

After training the network for 48 epochs of about 18 seconds each, the training accuracy settled around 97% and the validation accuracy at 98%. The high accuracy was surprising and felt like an excellent result! For example, the Mila pictures shown above were now both correctly classified, and the sports car and rainbow crosswalk were no longer classified as being Mila. However, the goats were still “Mila”, so something was still not quite right…

You can try out the network here if you like.

At this point, I had a hunch that the increased accuracy of MobileNet was mainly due to its ability to detect dogs in pictures (and the occasional goat). Unfortunately, it was worse than that: photos of dogs, cats, birds, butterflies, bears, kangaroos, and even a squirrel were all classified as being Mila.

It seemed I had not created a Mila detector, but an animal detector. I had kind of expected a result like this, but it was still a disappointing realization, and this is also where the story ends for now.

Sneaky squirrels and other animals

To summarize, I tried to create an image classifier that could detect Mila in photos, but in the current state of the project, this is not really possible. Writing this blog post feels like the end of the journey, but there are still many tweaks and improvements that could be made.

For example, it would be interesting to know why the “simple” network saw a rainbow crosswalk as Mila, and it would be nice to figure out how to improve the quality of the MobileNet predictions so that it does not just say that all pets are Mila. One idea could be to clean the training data a bit more, e.g. by including more pets in the “not Mila” photo set, or perhaps by restricting the Mila photos to close-ups to improve the consistency and quality of that part of the data.

One thing is for sure: there is always room for improvement, and working on this project has been a nice learning experience so far. As an added benefit, I managed to mention squirrels in a (technical) blog post, and I will leave you with a picture of the sneaky “Mila” squirrel:

Sneaky squirrel, classified as Mila

(I like squirrels. A lot. It was all worth it just for the squirrel.)

University is what you make of it

Being a developer in a position far removed from academia, I am often confronted with the question of whether my university degree was worth the effort. Or, to put it more mildly: would I be where I am today without it? I usually arrive at the same conclusion: yes, it was definitely worth it for me. And here is an important thing to keep in mind about higher education: it is what you make of it.

Anecdotally, I know both sides of the education opinion spectrum very well. When I was growing up, higher education was the most important thing in the world, and people who did not go through university were frowned upon. I have also often heard the song of how companies hunger for computer science graduates, and how good it is to have a Master’s degree and not “just” a Bachelor’s degree.

On the other hand, I have met many people who told me that education is a waste of time. I also know at least a handful of professional developers who are self-taught, and some of them wear that as a badge of honor, sometimes dismissing education outright and calling it useless.

I reject the mentality of both these extremes, and statistics like the 2017 Stack Overflow Survey seem to indicate that the industry as a whole has a more nuanced view of education. According to the survey, 76.5% of all professional developers have a Bachelor’s degree or higher, which means that roughly one out of every four professional developers does not have a formal education. At the same time, 32% (almost a third of all developers) respond that education is not very important, but most of the responses are grouped around the middle, with education being “somewhat important”.

Education or not, neither is right or wrong, and I think it is important to have a balanced view of this. However, I do not want to dismiss the feelings involved. I would be lying if I said it did not affect me when, as a mid-twenties graduate without professional experience, I saw much younger self-taught programmers with better business opportunities than I had. But then I remind myself that they probably did not build a neural network for image classification by hand, nor did they have the opportunity to discuss computer ethics with like-minded peers. And those things gave me immense joy. Likewise, I can sympathize with the feelings on the opposite side, although it would be disingenuous of me to presume what those feelings are.

The outcome in both cases is the same: it is easy to feel doubt and resentment. From my point of view, this comes in the shape of “why the hell did I waste time in university?” and “how come they got by without a degree?”. When these feelings emerge, they have to be put to rest quickly, because they are not helpful, and most importantly, they miss the point.

Because in the end, when it comes to professional development, as with many other parts of life, there is no right or wrong path to take. Higher education is not a measure of success, but it should not be dismissed either. University can be a tremendously rewarding experience, and the outcome is what you make of it, if you want it to be.

… and let’s not forget the parties…


Photo by Ian Schneider on Unsplash.

Dig the Data, StoreGrader edition

A new Dig the Data was published yesterday. It has some data insights from StoreGrader, an app I have been working on for a while now.

For this edition of Dig the Data, I wanted to create a nice-looking interactive infographic combining both static and interactive elements. My previous Dig the Data visualization was quite minimal but fully interactive. However, it lacked a bit of the “niceness” that some static graphics can provide (as well as the magic touch of a designer, which I am not). A good example of this “niceness” is the first Dig the Data, where the entire visualization is a static image created by my colleague Julia.

This time, I teamed up with Maria to create a visualization that combines static insights (with a bit of animation) and interactive graphs to explore.

I am very pleased with the result, and you can check out the post here.