Image captioning model

Progress on my various hobby projects (generating images, creating lyrics or even detecting my dog in an image) is slow, but I still experiment a little when I have the time.

Today, I ran through the image captioning tutorial from TensorFlow, because why not. I trained it on a smaller set of 5000 images than the tutorial uses, but otherwise the code is identical.

The purpose of the model is to look at an image and predict a caption for it, e.g. “man sits next to computer” if the model were looking at me right now.

The results are quite hilarious at this point, and it has given me some ideas for future development that could potentially delight or confuse.

Anyway, for now, here is a caption that is as close to being accurate as I could get today. It’s a picture of Mila, my dog, with the predicted caption “a little dog sitting on dirt field”. Not too bad.

Image of Mila, my dog, sitting on a lawn
Best predicted caption: a little dog sitting on dirt field

The model has a bit of randomness in its output, so the same image might produce multiple different results. For example, the image above also produced “a dog is surfing on a field with it’s toppings in front of grass” as well as the completely nonsensical “a fire up of a green grass next to a grass in a grassy field”.
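For anyone curious where that randomness comes from: as far as I understand it, the decoder produces a score for every word in its vocabulary at each step, and if the next word is drawn at random in proportion to those scores (instead of always taking the top one), the same image can produce different captions. Below is a minimal sketch of that idea in Python with NumPy. It is not the tutorial's code; the function names, the tiny six-word vocabulary, the scores and the temperature parameter are all made up for illustration.

```python
import numpy as np

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    shifted = logits - np.max(logits)
    exp = np.exp(shifted)
    return exp / exp.sum()

def greedy_next_word(logits):
    # Deterministic: always pick the single most likely word id.
    return int(np.argmax(logits))

def sampled_next_word(logits, rng, temperature=1.0):
    # Stochastic: draw a word id in proportion to its probability.
    # The temperature knob is a hypothetical addition for illustration.
    probs = softmax(np.asarray(logits, dtype=float) / temperature)
    return int(rng.choice(len(probs), p=probs))

# Hypothetical scores for one decoding step over a tiny vocabulary;
# a real decoder produces a vector like this over thousands of words.
vocab = ["a", "dog", "sitting", "on", "grass", "zebras"]
logits = np.array([0.5, 2.0, 1.5, 0.3, 1.8, 0.1])

print(vocab[greedy_next_word(logits)])  # always "dog"
rng = np.random.default_rng()
print([vocab[sampled_next_word(logits, rng)] for _ in range(5)])  # varies per run
```

Greedy decoding would give the same caption every time. Lowering the temperature below 1.0 makes the sampling lean harder towards the most likely words, which in this toy example would mean fewer zebras.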

I also tried a few other images, but I am too lazy to look up the proper attribution for including them in this post. So here is another image of Mila, with the strange caption “a dog that is standing in tall grass in the grassy field with a large dog and white dog is standing in a grassy field next to a tree”.

Best predicted caption: a dog that is standing in tall grass in the grassy field with a large dog and white dog is standing in a grassy field next to a tree

There is also the slightly more boring “a dog standing in front of a grass” and my personal favorite (although it’s a bit disturbing) “a dressed dog in a field eaten horse to enjoy zebras grass in the grass”.

That is it for now. Not much content here, but I felt like writing a post. Stay tuned for possibly more silliness when the spark of energy hits.


The Painting Dataset

In my continuing exploration of generative adversarial networks, I found “The Painting Dataset” and wrote a short script to extract the images from it. Only about 1200 of the paintings seem to be valid, but it is at least something.


Mighty mighty world

I found these lyrics hidden away in my notes. They were created using my updated lyrics generator earlier this year after retraining the neural network using a sentence embedding instead of a word embedding (lyrics generator code). I guess I was waiting to publish it as part of a larger post. No need. This masterpiece stands on its own :-)

hello world

a mighty mighty mighty mighty mighty mighty a world

world world

and mighty
mighty mighty

all mighty mom mighty

babies mom
the mom


roman you roman mom

plastic cat’s roman roman

the pagan pagan pagan pagan pagan pagan pagan pagan pagan i pagan oh
and dear


Gene Lyrica Two