What happened next

Future Or Bust! by Paul Hocksenar

In July last year (2014), I started working full-time on Antecons. Back then, I wrote a post about complete independence where I predicted that I could support myself for up to half a year without income. This turned out to be true, but now that we have entered a new year, it is time for an update. I was nervous about writing this update until very recently, when a story about monetary failure took an interesting turn for the better.

The challenge

A lot of people try to hide their failures, myself included. For someone who is afraid of failure, of taking risks and of facing challenges, it seems like a very weird and poor life choice to give up a lucrative and comfortable position as a consultant to work without income or earning potential for an extended period of time. The explanation is simple though: Things were starting to get fun.

When Antecons launched on Shopify in 2013, it was mostly developed while doing freelance consulting work. Having a high-revenue commercial product was never really the goal. I simply wanted to see if it was possible for me to start something from scratch and make it through all the way to a finished product and not just an abandoned side-project.

When the first few customers started coming in, I realized that this product might have some potential, and it was quite a different feeling having customers that bought a service rather than buying my labor. It was… fun. So I started working on Antecons full-time in July with the vision of slowly building up a list of stable clients. I had a good relationship and regular contact with a webshop house, and they sounded very interested in offering Antecons to their webshop customers, so I pushed towards a first “revenue milestone” with them as a reseller. It was a great summer and I built an API beta that I was satisfied with offering to potential buyers.

Fail

My plan failed. Nothing came of my contact with the webshop reseller, and in the meantime, Shopify sales only increased very slowly. Antecons was even featured on the front page of the Shopify app store, but there were disappointingly few signups.

There is no need to go into great detail about the technicalities, because the main problem was that I neglected sales too much so I failed to sell. That makes sense now, but I did not realize it soon enough… so I failed. Period. The end.

Late fall came. I was not making any money, and the product was not going to generate significant revenue anytime soon. I had to take on a few hours of consulting work again to pay the bills, which meant that I had less time to work on Antecons, and in the beginning of December, I was hired full-time as a Python developer for Neodev.

I had mixed feelings about starting a new job, because it felt like a defeat to stop working on Antecons. Both the job and my colleagues were great though, and it was sometimes quite difficult to explain to friends and family how having a rewarding and well-compensated job could still feel a bit like a step back or a let-down. Nevertheless, it was difficult to leave Antecons behind. As it turned out though, the departure was short-lived.

An unexpected journey

It was exactly one day after signing my new contract that I was contacted by Adii, a successful entrepreneur with experience in the e-commerce field. Adii was looking for a recommendation engine to improve a product upsell feature in a young startup called Receiptful. Initially, I was skeptical and without hope, because I had basically given up on monetizing Antecons since it was not making any real money. But after a few weeks of communication, we decided to work together and I was “acquihired” (yes, that is a real word) and Antecons was revived before it had even drawn its final breath.

Fast forward to today and I have been with Receiptful for a month, working on integrating Antecons as part of the Receiptful system. It is really great to be able to work on Antecons and data analysis in a full-time position and it has also presented some new challenges, but that is a topic for a different post.

Working independently was a great experience. It was an unexpected journey with an unexpected ending. A new and different chapter has now begun: Life in a startup.

Product similarities and relations

This post is about Antecons, a product recommendation engine, now part of Conversio. Antecons is no longer commercially available, but I have kept my developer diary on my website with permission.


I recently started to implement a feature that analyzes similarities and relations between products and factors this analysis into the recommendations that are created. As mentioned in the previous post, more data means better recommendations.

In fact, adding product analysis to the equation is a huge improvement to Antecons for several reasons:

  1. Brand new shops will (probably) see recommendations immediately even if they do not have any sales yet.
  2. Some products might get very few sales or page views. Product relations help improve visibility of these products.
  3. The shop owner is indirectly influencing the recommendations with the tags that are added to a product.

This feature is now fully rolled out but there are still some technical details that are being tweaked. If you are interested in the technicalities, read on.

Complexity

There are some tiny problems with finding product relations: Complexity and cost. One approach is to compare each product with every other product. This requires O(n²) comparisons (where n is the number of products), which is not ideal but sounds OK since the analysis does not have to run very often.

The first approach I tried was to create a pipeline that reads batches of products and compares each of these products to all products that “come after” that product for a total of n(n-1)/2 comparisons. This is not a problem for a few hundred products, but with a few thousand products it starts to get problematic. If we have 10,000 products, the analysis will have to perform about 50 million product entity reads. On Google App Engine (GAE), each entity fetch is 1 read for the entity and 1 read for the query that fetched the entity. Reading the products in batches of 50 would thus require about one million queries and a total of about 51 million reads. On GAE, datastore reads cost $0.06 per 100,000 operations, so the price for running this analysis would be at least $30, and that is only reading the data…
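As a quick sanity check, the estimate can be reproduced in a few lines of Python. This is only the arithmetic, using the figures quoted in this post (10,000 products, batches of 50, $0.06 per 100,000 reads) and counting each pair of distinct products once; the variable names are made up for the illustration.

```python
# Back-of-the-envelope cost of the pairwise comparison, using the
# figures quoted in the post. All names here are illustrative.
n = 10_000                    # number of products
batch_size = 50               # entities fetched per query
price_per_100k_reads = 0.06   # USD per 100,000 datastore reads

entity_reads = n * (n - 1) // 2           # one entity read per compared pair
query_reads = entity_reads // batch_size  # one extra read per batch query
total_reads = entity_reads + query_reads
cost = total_reads / 100_000 * price_per_100k_reads

print(entity_reads, query_reads, total_reads, round(cost, 2))
```

The total comes out at roughly 51 million reads and a little over $30, which matches the estimate in the text.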

Needless to say, this has failed as a scalable and affordable solution and I should have done the math before going down that path but… lesson learned.

MapReduce to the rescue?

The second approach I tried was to let the MapReduce framework do some of the work for us. The idea would be to run through all products exactly once and map each product to key-value pairs consisting of tags and product keys. The map and reduce steps could be written something like this:

from itertools import combinations
from mapreduce import operation

def product_map(product):
    # Create combinations of tags
    tag_combos = combinations(product.tags, 2)

    # Yield each combination of tags, sorted so that the key does not
    # depend on the order of the tags on the product.
    for tag_subset in tag_combos:
        sorted_tag_subset = sorted(tag_subset)
        yield sorted_tag_subset, product

def product_reduce(tags, products):
    # Create combinations of products that share this pair of tags.
    product_combos = combinations(products, 2)

    # Calculate the similarity and shared tags of each combination of products.
    # ProductRelation is the datastore model for a relation (not shown here).
    for combo in product_combos:
        relation = ProductRelation(p1=combo[0], p2=combo[1])
        yield operation.db.Put(relation)

The above code is not exactly how I did it but pretty close. The problem with this is that the number of relations that need to be stored is the same, so I am still storing (potentially) massive amounts of data.

Locality-sensitive hashing and good ol’ queries

When I started developing Antecons for Google App Engine, I minimized the number of indexed properties per entity. Since then, I have learned that it is better to focus on minimizing the number of entities, so having up to n² product relations as separate entities did not seem to be the way to go. For tag relations, indexing the tags for each product seemed to be an obvious choice so I did that. This way, it is easy to select related products based on tags with some datastore queries instead of querying separate relation entities.
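To illustrate the idea of querying an index on tags instead of storing relation entities, here is a minimal in-memory sketch in Python. It is not the actual Antecons code and does not use the datastore: a plain dictionary stands in for the indexed tags property, and the product ids, tags and the function name related_by_tags are all made up for the example.

```python
from collections import defaultdict

# Hypothetical catalogue: product id -> set of tags.
products = {
    "p1": {"shirt", "cotton", "red"},
    "p2": {"shirt", "linen"},
    "p3": {"mug", "red"},
    "p4": {"poster"},
}

# Stand-in for an indexed tags property: tag -> ids of products with that tag.
tag_index = defaultdict(set)
for product_id, tags in products.items():
    for tag in tags:
        tag_index[tag].add(product_id)

def related_by_tags(product_id):
    """Return (other product id, shared tag count), most shared tags first."""
    shared = defaultdict(int)
    for tag in products[product_id]:
        for other in tag_index[tag]:
            if other != product_id:
                shared[other] += 1
    return sorted(shared.items(), key=lambda item: (-item[1], item[0]))

print(related_by_tags("p1"))
```

Nothing is stored per pair of products here. The only extra data is the index on tags, and relations are computed when they are asked for.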

Finding product similarities, however, was a trickier problem to solve. For example, how is it possible to find products with similar titles based on a datastore query? Can we split the title into tokens and query for each of these tokens? Should we use full-text search? What if a product uses two different spellings? What if similar products could be grouped into buckets that can be queried? Ok, now we are on to something…

Locality-sensitive hashing is a technique that does exactly this: Given a set of web documents, each document is hashed to a specific bucket such that documents in the same bucket are similar. Given a new web document, we can find similar documents by looking in the bucket that the document belongs to.

After some testing, I ended up using an implementation of simhash. Now, every time a product is saved, three simhash buckets are calculated and these can then be used to query for similar products. In other words, we only store three extra fields per product, a very efficient and scalable solution.
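For readers who have not seen simhash before, here is a minimal sketch of the general technique in Python. It is not the implementation used in Antecons. The tokenization, the use of md5 as the token hash, the 64-bit fingerprint size and the way the fingerprint is cut into three bands are all assumptions made for the illustration.

```python
import hashlib
import re
from collections import Counter

BITS = 64  # fingerprint size (an assumption for this sketch)

def token_hash(token):
    """Stable 64-bit hash of a token; md5 is an arbitrary choice here."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def simhash(text):
    """Let every token vote on every bit; keep the bits with a positive sum."""
    votes = [0] * BITS
    tokens = Counter(re.findall(r"\w+", text.lower()))
    for token, weight in tokens.items():
        h = token_hash(token)
        for bit in range(BITS):
            votes[bit] += weight if (h >> bit) & 1 else -weight
    return sum(1 << bit for bit in range(BITS) if votes[bit] > 0)

def buckets(fingerprint, bands=3):
    """Cut the fingerprint into `bands` slices that can be stored and queried.

    With 64 bits and 3 bands, each slice is 21 bits and the top bit is unused.
    """
    width = BITS // bands
    mask = (1 << width) - 1
    return [(band, (fingerprint >> (band * width)) & mask)
            for band in range(bands)]

def hamming(a, b):
    """Number of bits in which two fingerprints differ."""
    return bin(a ^ b).count("1")

fp = simhash("Red Cotton Shirt")
print(hex(fp), buckets(fp))
```

With this banding scheme, two fingerprints that differ in at most two bits always share at least one of the three bucket values, since two differing bits can touch at most two bands. A lookup for similar products is then a query for products that have any of the three bucket values, followed by a Hamming distance check on the candidates if needed.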

Conclusion

I am happy to have added extra recommendation data to Antecons with product relations and similarities. This is not the end of it though, since I am already considering how I can improve the above approach so it is faster and more robust. I will continue to write on the blog when there are new improvements for Antecons.

Thank you for reading!

More data, better recommendations


Today, we have published an improvement for the Antecons recommendation algorithm. In the beginning, recommendations were based on an analysis of order data for a webshop, which has turned out to work quite well. But more data is better. Starting today, Antecons will also analyze data based on what products customers are looking at on the webshop. This improves the recommendations, especially for products that have recently been added to a shop and have not sold much yet.

There are many other ideas and features in the pipeline and one of them is adding similarity measures as a recommendation tool. That is, similarity in terms of common product tags and similar product titles. This is probably going to find its way into Antecons in the near future, possibly as an opt-in feature.

As an extra note, the back-end and infrastructure of Antecons are constantly improving, thanks in part to the constant improvements being made to Google App Engine. Scalability and reliability are key elements for a high-performance app like Antecons and GAE makes it possible to focus on the app instead of the infrastructure. This might sound like a sales pitch for GAE but actually, it is one of Antecons’ secret weapons.

Experimental features


Yesterday, I found out exactly what it means when Google warns about their experimental App Engine features: Your code might eventually break. Let me be clear, I am not blaming Google. They give you fair warning:

Mapreduce is an experimental, innovative, and rapidly changing new feature for Google App Engine. Unfortunately, being on the bleeding edge means that we may make backwards-incompatible changes to Mapreduce.

I have written about my usage of the MapReduce framework earlier. Yesterday, I updated the MapReduce framework to the latest version only to see that my custom Datastore reader had suddenly stopped working and I was seeing exceptions in my MapReduce pipeline. Bummer.

Long story short, I spent a day debugging the new code and finally got it working by:

  1. Digging through the MapReduce framework code. Hurray for open source!
  2. Dropping the idea of running FP-Growth on batches of entities and instead running the mapping function on each entity.

That second point probably requires some explanation, and I am not sure I can give a good one, but maybe some pseudo-Python will help. The biggest change happened in the map-step of the Frequent Patterns MapReduce pipeline. Basically I went from this:

def map_batch_of_transactions(batch):
    frequent_patterns = fpgrowth.run(batch)
    for p in frequent_patterns:
        yield p, p.support

to this:

import itertools

def map_single_transaction(transaction):
    # Sort the items so that (a, b) and (b, a) end up under the same key.
    frequent_patterns = itertools.combinations(sorted(transaction), 2)
    for p in frequent_patterns:
        yield p, 1

The MapReduce shuffler takes care of grouping together patterns with the same key so with the new method, the shuffler will have more work to do since the same patterns will be yielded more often. Let’s say we have the pattern:

a,b (support: 4)

Before, the shuffler would just receive:

('a,b', 4)

but now it will receive:

('a,b', 1)
('a,b', 1)
('a,b', 1)
('a,b', 1)

On the other hand, FP-growth does not have to run so the map-step of the pipeline has more predictable performance characteristics. It remains to be seen if the change has significant impact on the entire MapReduce process. I am currently testing this.
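The effect on the shuffler can be tried out locally with a small simulation. This is only a sketch in plain Python and does not use the App Engine MapReduce library: shuffle and reduce_support are made-up stand-ins for what the framework does, and the transactions are invented so that the pattern a,b ends up with a support of 4.

```python
import itertools
from collections import defaultdict

def map_single_transaction(transaction):
    """Emit every pair of items in the transaction with a count of 1."""
    for pair in itertools.combinations(sorted(set(transaction)), 2):
        yield ",".join(pair), 1

def shuffle(pairs):
    """Stand-in for the shuffler: group emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_support(key, values):
    """Sum the counts for one pattern to get its support."""
    return key, sum(values)

# Invented example data: four orders that all contain both a and b.
transactions = [["a", "b"], ["b", "a", "c"], ["a", "b"], ["a", "b", "d"]]

emitted = [kv for t in transactions for kv in map_single_transaction(t)]
grouped = shuffle(emitted)
support = dict(reduce_support(k, v) for k, v in grouped.items())

print(grouped["a,b"])   # the four separate 1s the shuffler has to group
print(support["a,b"])   # the support after the reduce step
```

The map step is trivial and predictable, and the price is that the shuffler receives one key-value pair per occurrence of a pattern instead of one per pattern.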

So anyway, the whole point of this post was: If a feature is experimental, watch out. Sounds obvious, right? Well…

Creating a Shopify plugin


In the last post about Antecons, I wrote about frequent patterns and MapReduce. Since then, this base functionality has been tweaked to work a little better but the main focus has been on creating the first plugin for Antecons: A Shopify app.

I have previously mentioned that it is important to me that Antecons can be easy to set up. Although focusing on the API initially was a great way to get started, it has become clear after some thought that the reach is limited if the entire product is just an API from the beginning. So I have begun implementing a plugin (or an app as they call it) for Shopify. There are two reasons for choosing Shopify as the first integrated platform for Antecons:

  1. It is easy to get started with app development on Shopify and their API is quite extensive.
  2. Shopify was simply something I knew beforehand so I did not need to research other options too much.

So far, it has been a good experience working with the Shopify API. Integrating with Python/Google App Engine was very easy thanks to their open source API bindings for Python.

The Shopify app for Antecons is not finished yet, or at least I do not consider it to be in a state where I can release it. However, it is in a state where it installs correctly and delivers recommendations to the webshop it is installed on. Instead of writing a lot about that, below are some screenshots that show the functionality. The suggestions by Antecons are the little “You might also like” products.

Screenshot: Antecons installation start
Screenshot: Antecons app authentication
Screenshot: Antecons installed
Screenshot: Product page suggestion
Screenshot: Cart page suggestion