Today, I ran through the image captioning tutorial from TensorFlow, because why not. I trained it on a reduced set of 5,000 images instead of the tutorial's full dataset, but otherwise the code is identical.
The purpose of the model is to look at an image and predict a caption for it, e.g. “man sits next to computer” if the model were looking at me right now.
The results are quite hilarious at this point, and it has given me some ideas for future development that could potentially delight or confuse.
Anyway, for now, here is a caption that is as close to being accurate as I could get today. It’s a picture of Mila, my dog, with the predicted caption “a little dog sitting on dirt field”. Not too bad.
The model has a bit of randomness in its output, so the same image can produce different captions on each run. For example, the image above also produced “a dog is surfing on a field with it’s toppings in front of grass” as well as the completely nonsensical “a fire up of a green grass next to a grass in a grassy field”.
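That randomness comes from how the caption is decoded: instead of always picking the single most likely next word, the decoder samples each word from its predicted probability distribution, so repeated runs can wander down different paths. Here is a minimal sketch of that idea, assuming a toy vocabulary and made-up probabilities in place of the real model's softmax output:

```python
import numpy as np

# Toy vocabulary; the real model's vocabulary comes from the training captions.
vocab = ["a", "dog", "grass", "field", "sitting", "<end>"]

def sample_caption(rng, max_len=8):
    """Build a caption by sampling one word at a time.

    Greedy decoding would always take the argmax and give the same
    caption every time; sampling from the distribution instead is
    what makes repeated runs produce different captions.
    """
    caption = []
    for _ in range(max_len):
        # Stand-in for the decoder's per-step softmax output.
        logits = rng.normal(size=len(vocab))
        probs = np.exp(logits) / np.exp(logits).sum()
        word = vocab[rng.choice(len(vocab), p=probs)]
        if word == "<end>":
            break
        caption.append(word)
    return " ".join(caption)

rng = np.random.default_rng(0)
print(sample_caption(rng))
print(sample_caption(rng))  # usually a different caption for the "same image"
```

This is only an illustration of the sampling step, not the tutorial's actual decoder, which conditions those probabilities on image features and the words generated so far.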
I also tried a few other images, but I am too lazy to look up the proper attribution for including them in this post, so here is another image of Mila with the strange caption “a dog that is standing in tall grass in the grassy field with a large dog and white dog is standing in a grassy field next to a tree”.
There is also the slightly more boring “a dog standing in front of a grass” and my personal favorite (although it’s a bit disturbing) “a dressed dog in a field eaten horse to enjoy zebras grass in the grass”.
That is it for now. Not much content here, but I felt like writing a post. Stay tuned for possibly more silliness when the spark of energy hits.