{"id":720,"date":"2013-05-04T21:36:15","date_gmt":"2013-05-04T19:36:15","guid":{"rendered":"http:\/\/thoughtflow.dk\/?p=720"},"modified":"2018-11-04T11:11:18","modified_gmt":"2018-11-04T10:11:18","slug":"experimental-features","status":"publish","type":"post","link":"https:\/\/davidlebech.com\/thoughtflow\/experimental-features\/","title":{"rendered":"Experimental features"},"content":{"rendered":"<p><em>This post is about Antecons, a product recommendation engine, now part of <a href=\"https:\/\/conversio.com\/\">Conversio<\/a>. Antecons is no longer commercially available, but I have kept my developer diary on my website with permission.<\/em><\/p>\n<hr>\n<p>Yesterday, I found out exactly what it means when Google warns about their experimental App Engine features: Your code might eventually break. Let me be clear: I am not blaming Google. They give you fair warning:<\/p>\n<blockquote><p>Mapreduce is an experimental, innovative, and rapidly changing new feature for Google App Engine. Unfortunately, being on the bleeding edge means that we may make backwards-incompatible changes to Mapreduce.<\/p><\/blockquote>\n<p>I have written about <a href=\"\/thoughtflow\/mapreduce-and-frequent-patterns\/\" title=\"Frequent patterns and MapReduce\">my usage of the MapReduce framework<\/a> earlier. Yesterday, I updated the MapReduce framework to the latest version, only to see that my custom Datastore reader had suddenly stopped working and I was seeing exceptions in my MapReduce pipeline. Bummer.<\/p>\n<p>Long story short, I spent a day debugging the new code and finally got it working by:<\/p>\n<ol>\n<li>Digging through the MapReduce framework code. Hurray for open source!<\/li>\n<li>Dropping the idea of running FP-Growth on batches of entities and instead running the mapping function on each entity.<\/li>\n<\/ol>\n<p>That second point probably requires some explanation. I am not sure I can give a good one, but maybe some pseudo-Python will help. 
The biggest change happened in the map step of the Frequent Patterns MapReduce pipeline. Basically, I went from this:<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\ndef map_batch_of_transactions(batch):\r\n    # fpgrowth is the pipeline's FP-Growth module (import not shown).\r\n    frequent_patterns = fpgrowth.run(batch)\r\n    for p in frequent_patterns:\r\n        yield p, p.support\r\n<\/pre>\n<p>to this:<\/p>\n<pre class=\"brush: python; title: ; notranslate\" title=\"\">\r\nimport itertools\r\n\r\ndef map_single_transaction(transaction):\r\n    # Emit every pair of items in the transaction with a count of 1.\r\n    pairs = itertools.combinations(transaction, 2)\r\n    for p in pairs:\r\n        yield p, 1\r\n<\/pre>\n<p>The MapReduce shuffler takes care of grouping together patterns with the same key, so with the new method, the shuffler has more work to do because the same pattern is yielded once for every transaction that contains it. Let&#8217;s say we have the pattern:<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\na,b (support: 4)\r\n<\/pre>\n<p>Before, the shuffler would just receive:<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\n('a,b', 4)\r\n<\/pre>\n<p>but now it will receive:<\/p>\n<pre class=\"brush: plain; title: ; notranslate\" title=\"\">\r\n('a,b', 1)\r\n('a,b', 1)\r\n('a,b', 1)\r\n('a,b', 1)\r\n<\/pre>\n<p>On the other hand, FP-Growth no longer has to run, so the map step of the pipeline has more predictable performance characteristics. It remains to be seen whether the change has a significant impact on the entire MapReduce process. I am currently testing this.<\/p>\n<p>So anyway, the whole point of this post was: If a feature is experimental, watch out. Sounds obvious, right? Well&#8230;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This post is about Antecons, a product recommendation engine, now part of Conversio. Antecons is no longer commercially available, but I have kept my developer diary on my website with permission. 
Yesterday, I found out exactly what it means when Google warns about their experimental App Engine features: Your code might eventually break. Let me [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[69],"tags":[147,20,11,45,84],"class_list":["post-720","post","type-post","status-publish","format-standard","hentry","category-antecons","tag-antecons","tag-debugging","tag-open-source","tag-programming","tag-python"],"_links":{"self":[{"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/posts\/720","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/comments?post=720"}],"version-history":[{"count":0,"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/posts\/720\/revisions"}],"wp:attachment":[{"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/media?parent=720"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/categories?post=720"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/tags?post=720"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}