{"id":2376,"date":"2019-03-17T18:36:45","date_gmt":"2019-03-17T17:36:45","guid":{"rendered":"https:\/\/davidlebech.com\/thoughtflow\/?p=2376"},"modified":"2021-05-05T18:03:56","modified_gmt":"2021-05-05T16:03:56","slug":"reinforcement-learning","status":"publish","type":"post","link":"https:\/\/davidlebech.com\/thoughtflow\/reinforcement-learning\/","title":{"rendered":"Reinforcement learning"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">I have been looking into a machine learning technique called <a href=\"https:\/\/en.wikipedia.org\/wiki\/Reinforcement_learning\">reinforcement learning<\/a> (RL) lately. This had been on my TODO list for a while, and I must say, this field is incredibly exciting! I played around with some <a href=\"https:\/\/gym.openai.com\/\">OpenAI Gym<\/a> environments and re-implemented two RL algorithms, mostly based on code from other authors.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">After spending many hours on this, I can still only get my algorithm to solve the Cartpole problem, where the goal is to balance a pole on a moving cart (video below). I haven&#8217;t yet cracked a continuous-action problem like Pendulum, where the goal is to swing the pendulum into an upright position and keep it there (video below).<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Anyway, here is <a href=\"https:\/\/github.com\/dlebech\/reinforcement-learning\">my implementation of the RL algorithms<\/a>. 
Perhaps it will be useful for someone :-)<\/p>\n\n\n\n<figure class=\"wp-block-video\"><video height=\"1080\" style=\"aspect-ratio: 1920 \/ 1080;\" width=\"1920\" controls src=\"https:\/\/davidlebech.com\/thoughtflow\/wp-content\/uploads\/cartpole.webm\"><\/video><\/figure>\n\n\n\n<figure class=\"wp-block-video\"><video height=\"1080\" style=\"aspect-ratio: 1920 \/ 1080;\" width=\"1920\" controls src=\"https:\/\/davidlebech.com\/thoughtflow\/wp-content\/uploads\/pendulum.webm\"><\/video><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>I have been looking into a machine learning technique called reinforcement learning (RL) lately. This had been on my TODO list for a while, and I must say, this field is incredibly exciting! I played around with some OpenAI Gym environments and re-implemented two RL algorithms, mostly based on code from other authors. After spending [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[114],"tags":[194,218],"class_list":["post-2376","post","type-post","status-publish","format-standard","hentry","category-code","tag-machine-learning","tag-reinforcement-learning"],"_links":{"self":[{"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/posts\/2376","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/comments?post=2376"}],"version-history":[{"count":0,"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/posts\/2376\/revisions"}],"wp:attachment":[{"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/
media?parent=2376"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/categories?post=2376"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/tags?post=2376"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}