{"id":2805,"date":"2020-11-01T21:43:53","date_gmt":"2020-11-01T20:43:53","guid":{"rendered":"https:\/\/davidlebech.com\/thoughtflow\/?p=2805"},"modified":"2022-03-16T20:59:41","modified_gmt":"2022-03-16T19:59:41","slug":"image-captioning-model","status":"publish","type":"post","link":"https:\/\/davidlebech.com\/thoughtflow\/image-captioning-model\/","title":{"rendered":"Image captioning model"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">Progress is slow for my various hobby efforts to <a href=\"https:\/\/github.com\/dlebech\/gan\">generate images<\/a>, <a href=\"https:\/\/github.com\/dlebech\/lyrics-generator\">create lyrics<\/a> or even <a href=\"https:\/\/github.com\/dlebech\/is-mila\">detect my dog in an image<\/a>, but I do still experiment a little bit when I have the time for it.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Today, I ran through the <a href=\"https:\/\/www.tensorflow.org\/tutorials\/text\/image_captioning\">image captioning tutorial<\/a> from TensorFlow, because why not. I trained it on a smaller set of 5000 images than the tutorial uses, but otherwise the code is identical.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The purpose of the model is to look at an image and predict a caption for it, e.g. &#8220;man sits next to computer&#8221; if the model were looking at me right now.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The results are quite hilarious at this point, and they have given me some ideas for future development that could potentially delight or confuse.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Anyway, for now, here is a caption that is as close to being accurate as I could get today. It&#8217;s a picture of Mila, my dog, with the predicted caption &#8220;a little dog sitting on dirt field&#8221;. 
Not too bad.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"683\" src=\"https:\/\/davidlebech.com\/thoughtflow\/wp-content\/uploads\/mila-1024x683.jpg\" alt=\"Image of Mila, my dog, sitting on a lawn\" class=\"wp-image-2808\" srcset=\"https:\/\/davidlebech.com\/thoughtflow\/wp-content\/uploads\/mila-1024x683.jpg 1024w, https:\/\/davidlebech.com\/thoughtflow\/wp-content\/uploads\/mila-300x200.jpg 300w, https:\/\/davidlebech.com\/thoughtflow\/wp-content\/uploads\/mila-150x100.jpg 150w, https:\/\/davidlebech.com\/thoughtflow\/wp-content\/uploads\/mila-768x512.jpg 768w, https:\/\/davidlebech.com\/thoughtflow\/wp-content\/uploads\/mila.jpg 1200w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption>Best predicted caption: a little dog sitting on dirt field<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">The model has a bit of randomness in its output, so the same image might produce multiple different results. 
For example, the image above also produced &#8220;a dog is surfing on a field with it&#8217;s toppings in front of grass&#8221; as well as the completely nonsensical &#8220;a fire up of a green grass next to a grass in a grassy field&#8221;.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">I also tried a few other images, but I am too lazy to look up the proper attribution for including them in this post, so here is another image of Mila with the strange caption &#8220;a dog that is standing in tall grass in the grassy field with a large dog and white dog is standing in a grassy field next to a tree&#8221;.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"683\" src=\"https:\/\/davidlebech.com\/thoughtflow\/wp-content\/uploads\/mila-1-1024x683.jpg\" alt=\"Another image of Mila, my dog\" class=\"wp-image-2809\" srcset=\"https:\/\/davidlebech.com\/thoughtflow\/wp-content\/uploads\/mila-1-1024x683.jpg 1024w, https:\/\/davidlebech.com\/thoughtflow\/wp-content\/uploads\/mila-1-300x200.jpg 300w, https:\/\/davidlebech.com\/thoughtflow\/wp-content\/uploads\/mila-1-150x100.jpg 150w, https:\/\/davidlebech.com\/thoughtflow\/wp-content\/uploads\/mila-1-768x512.jpg 768w, https:\/\/davidlebech.com\/thoughtflow\/wp-content\/uploads\/mila-1.jpg 1200w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption>Best predicted caption: a dog that is standing in tall grass in the grassy field with a large dog and white dog is standing in a grassy field next to a tree<\/figcaption><\/figure>\n\n\n\n<p class=\"wp-block-paragraph\">There is also the slightly more boring &#8220;a dog standing in front of a grass&#8221; and my personal favorite (although it&#8217;s a bit disturbing) &#8220;a dressed dog in a field eaten horse to enjoy zebras grass in the grass&#8221;.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">That is it for now. Not much content here, but I felt like writing a post. 
Stay tuned for possibly more silliness when the spark of energy hits.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Progress is slow for my various hobby efforts to generate images, create lyrics or even detect my dog in an image, but I do still experiment a little bit when I have the time for it. Today, I ran through the image captioning tutorial from Tensorflow, because why not. I trained it on a reduced [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[37],"tags":[200,191,203,238],"class_list":["post-2805","post","type-post","status-publish","format-standard","hentry","category-projects","tag-classification","tag-deep-learning","tag-tensorflow","tag-text-generation"],"_links":{"self":[{"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/posts\/2805","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/comments?post=2805"}],"version-history":[{"count":0,"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/posts\/2805\/revisions"}],"wp:attachment":[{"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/media?parent=2805"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/categories?post=2805"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/tags?post=2805"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}