Photo Amaze

Screenshot of Photo Amaze app

About a month ago, I attended the wedding of a childhood friend. Since I have had some extra free time lately, I came up with an idea of combining my interest in 3D with an app that could be used for the wedding. The result was Photo Maze, a 3D maze where the guests at the wedding could upload a photo from their phones and it would appear immediately in the maze, giving the bride and groom a kind of interactive photo album from their party.

I felt the urge to develop this idea a bit further, and it has now been renamed to Photo Amaze, a pun on maze and amaze. It is available for everyone, and I hope you will try it out!

Product similarities and relations

This post is about Antecons, a product recommendation engine, now part of Conversio. Antecons is no longer commercially available, but I have kept my developer diary on my website with permission.


I recently started to implement a feature that analyzes similarities and relations between products and factors this analysis into the recommendations that are created. As mentioned in the previous post, more data means better recommendations.

In fact, adding product analysis to the equation is a huge improvement to Antecons for several reasons:

  1. Brand new shops will (probably) see recommendations immediately even if they do not have any sales yet.
  2. Some products might get very few sales or page views. Product relations help improve visibility of these products.
  3. The shop owner is indirectly influencing the recommendations with the tags that are added to a product.

This feature is now fully rolled out but there are still some technical details that are being tweaked. If you are interested in the technicalities, read on.

Complexity

There are some tiny problems with finding product relations: complexity and cost. One approach is to compare each product with every other product. This requires O(n²) comparisons (where n is the number of products), which is not ideal, but it sounds OK since the analysis does not have to run very often.

The first approach I tried was to create a pipeline that reads batches of products and compares each of these products to all products that “come after” that product, for a total of n(n-1)/2 comparisons. This is not a problem for a few hundred products, but with a few thousand products it starts to get problematic. If we have 10,000 products, the analysis will have to perform about 50 million product entity reads. On Google App Engine (GAE), a query costs 1 read for the query itself plus 1 read for each entity it fetches. Reading the products in batches of 50 would thus require about one million queries and a total of about 51 million reads. On GAE, datastore reads cost $0.06 per 100,000 operations, so the price for running this analysis would be at least $30, and that is only reading the data…
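The arithmetic above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the function name and its parameters are made up for illustration, and the batch size and price are the ones quoted in the text.

```python
def pairwise_read_cost(n_products, batch_size=50, price_per_100k_reads=0.06):
    """Estimate datastore reads and cost for comparing every pair of products."""
    # Each product is compared with every product that comes after it.
    entity_reads = n_products * (n_products - 1) // 2
    # Every batch query counts as one additional read.
    query_reads = entity_reads // batch_size
    total_reads = entity_reads + query_reads
    cost = total_reads / 100000.0 * price_per_100k_reads
    return entity_reads, query_reads, total_reads, cost


entity_reads, query_reads, total_reads, cost = pairwise_read_cost(10000)
print("{:,} entity reads, {:,} queries, {:,} reads in total, about ${:.2f}".format(
    entity_reads, query_reads, total_reads, cost))
# prints: 49,995,000 entity reads, 999,900 queries, 50,994,900 reads in total, about $30.60
```

Doubling the number of products roughly quadruples the cost, which is the real problem with the pairwise approach.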

Needless to say, this failed as a scalable and affordable solution. I should have done the math before going down that path but… lesson learned.

MapReduce to the rescue?

The second approach I tried was to let the MapReduce framework do some of the work for us. The idea would be to run through all products exactly once and map each product to key-value pairs consisting of tags and product keys. The map and reduce steps could be written something like this:

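from itertools import combinations
from mapreduce import operation

def product_map(product):
    # Create all combinations of two tags.
    tag_combos = combinations(product.tags, 2)

    # Yield each combination of tags, sorted and converted to a tuple
    # so that the same two tags always produce the same key.
    for tag_subset in tag_combos:
        sorted_tag_subset = tuple(sorted(tag_subset))
        yield sorted_tag_subset, product

def product_reduce(tags, products):
    # Create all combinations of two products that share these tags.
    product_combos = combinations(products, 2)

    # Calculate the similarity and shared tags of each combination of products.
    # ProductRelation is the datastore model for a relation (defined elsewhere).
    for combo in product_combos:
        relation = ProductRelation(p1=combo[0], p2=combo[1])
        yield operation.db.Put(relation)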

The above code is not exactly how I did it, but it is pretty close. The problem with this approach is that the number of relations that need to be stored is the same as before, so I am still storing (potentially) massive amounts of data.
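To see why the number of relations does not shrink, the map, shuffle and reduce steps can be simulated locally in plain Python. This is a sketch under assumptions, not the actual pipeline: the `products` dictionary layout and the `related_pairs` function are hypothetical, and an in-memory dictionary stands in for the MapReduce shuffler and the datastore.

```python
from collections import defaultdict
from itertools import combinations


def related_pairs(products):
    """products: dict mapping a product key to a list of tags (hypothetical layout)."""
    # Map + shuffle: group product keys under every sorted pair of tags.
    groups = defaultdict(list)
    for key, tags in sorted(products.items()):
        for tag_pair in combinations(sorted(set(tags)), 2):
            groups[tag_pair].append(key)

    # Reduce: every pair of products in a group becomes one stored relation.
    relations = defaultdict(set)
    for tag_pair, keys in groups.items():
        for p1, p2 in combinations(keys, 2):
            relations[(p1, p2)].update(tag_pair)
    return relations


products = {
    "p1": ["shirt", "cotton", "blue"],
    "p2": ["shirt", "cotton", "red"],
    "p3": ["shirt", "cotton", "blue"],
    "p4": ["mug", "ceramic"],
}
relations = related_pairs(products)
print(len(relations))                    # prints: 3
print(sorted(relations[("p1", "p3")]))   # prints: ['blue', 'cotton', 'shirt']
```

The three shirts already produce all three possible pairs between them. If most products in a shop share at least one pair of tags, the number of relations approaches n(n-1)/2 again.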

Locality-sensitive hashing and good ol’ queries

When I started developing Antecons for Google App Engine, I minimized the number of indexed properties per entity. Since then, I have learned that it is better to focus on minimizing the number of entities, so having up to n² product relations as separate entities did not seem to be the way to go. For tag relations, indexing the tags for each product seemed to be an obvious choice, so I did that. This way, it is easy to select related products based on tags with some datastore queries instead of querying separate relation entities.
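The idea of looking up related products at query time, instead of storing a relation entity per pair, can be illustrated without the datastore. In this minimal sketch an in-memory inverted index stands in for the indexed tags property; the function names, the `products` layout and the ranking by number of shared tags are assumptions for illustration only.

```python
from collections import Counter, defaultdict


def build_tag_index(products):
    """Map each tag to the set of product keys that carry it."""
    index = defaultdict(set)
    for key, tags in products.items():
        for tag in tags:
            index[tag].add(key)
    return index


def related_by_tags(product_key, products, index, limit=5):
    """Return (product key, shared tag count) pairs, most shared tags first."""
    shared = Counter()
    for tag in set(products[product_key]):
        for other in index.get(tag, ()):
            if other != product_key:
                shared[other] += 1
    # Ties are broken by product key so the result is stable.
    return sorted(shared.items(), key=lambda kv: (-kv[1], kv[0]))[:limit]


products = {
    "p1": ["shirt", "cotton", "blue"],
    "p2": ["shirt", "cotton", "red"],
    "p3": ["shirt", "cotton", "blue"],
    "p4": ["mug", "ceramic"],
}
index = build_tag_index(products)
print(related_by_tags("p1", products, index))  # prints: [('p3', 3), ('p2', 2)]
```

Nothing is stored per pair of products here. The only extra data is the index on the tags, which grows with the number of products rather than with the number of pairs.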

Finding product similarities, however, was a trickier problem to solve. For example, how is it possible to find products with similar titles based on a datastore query? Can we split the title into tokens and query for each of these tokens? Should we use full-text search? What if a product uses two different spellings? What if similar products could be grouped into buckets that can be queried? Ok, now we are on to something…

Locality-sensitive hashing is a technique that does exactly this: Given a set of web documents, each document is hashed to a specific bucket such that documents in the same bucket are similar. Given a new web document, we can find similar documents by looking in the bucket that the document belongs to.

After some testing, I ended up using an implementation of simhash. Now, every time a product is saved, three simhash buckets are calculated and these can then be used to query for similar products. In other words, we only store three extra fields per product, a very efficient and scalable solution.
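For readers who have not seen simhash before, here is a minimal sketch of the idea. It is not the implementation used in Antecons, and the source does not say how the three buckets are derived. The sketch assumes a 48-bit fingerprint built from word tokens hashed with MD5, split into three 16-bit bands that each serve as one bucket key; all function names are hypothetical.

```python
import hashlib
import re


def simhash(text, bits=48):
    """Compute a simhash fingerprint from the word tokens of a text."""
    counts = [0] * bits
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.md5(token.encode("utf-8")).hexdigest()
        token_hash = int(digest, 16) & ((1 << bits) - 1)
        # Each token votes +1 or -1 on every bit position.
        for i in range(bits):
            counts[i] += 1 if (token_hash >> i) & 1 else -1
    fingerprint = 0
    for i, count in enumerate(counts):
        if count > 0:
            fingerprint |= 1 << i
    return fingerprint


def simhash_buckets(fingerprint, bits=48, bands=3):
    """Split the fingerprint into bands; each band is one bucket key."""
    width = bits // bands
    mask = (1 << width) - 1
    return ["%d:%x" % (band, (fingerprint >> (band * width)) & mask)
            for band in range(bands)]


def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


title_buckets = simhash_buckets(simhash("Blue cotton shirt"))
print(title_buckets)
```

With three bands, two fingerprints that differ in at most two bits always share at least one bucket, because at least one band must be untouched. Similar titles tend to produce fingerprints with a small Hamming distance, so querying for products that share any of the three bucket values gives a list of candidates that can then be ranked.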

Conclusion

I am happy to have added extra recommendation data to Antecons with product relations and similarities. This is not the end of it though, since I am already considering how I can improve the above approach so it is faster and more robust. I will continue to write on the blog when there are new improvements for Antecons.

Thank you for reading!

More data, better recommendations

Today, we have published an improvement for the Antecons recommendation algorithm. In the beginning, recommendations were based on an analysis of order data for a webshop, which has turned out to work quite well. But more data is better. Starting today, Antecons will also analyze data based on what products customers are looking at on the webshop. This improves the recommendations, especially for products that have recently been added to a shop and have not sold much yet.

There are many other ideas and features in the pipeline and one of them is adding similarity measures as a recommendation tool. That is, similarity in terms of common product tags and similar product titles. This is probably going to find its way into Antecons in the near future, possibly as an opt-in feature.

As an extra note, the back-end and infrastructure of Antecons are constantly improving, thanks in part to the constant improvements being made to Google App Engine. Scalability and reliability are key elements for a high-performance app like Antecons, and GAE makes it possible to focus on the app instead of the infrastructure. This might sound like a sales pitch for GAE, but it actually is one of Antecons’ secret weapons.

Experimental features

Yesterday, I found out exactly what it means when Google warns about their experimental App Engine features: Your code might eventually break. Let me be clear, I am not blaming Google. They give you fair warning:

Mapreduce is an experimental, innovative, and rapidly changing new feature for Google App Engine. Unfortunately, being on the bleeding edge means that we may make backwards-incompatible changes to Mapreduce.

I have written about my usage of the MapReduce framework earlier. Yesterday, I updated the MapReduce framework to the latest version only to see that my custom Datastore reader suddenly had stopped working and I was seeing exceptions in my MapReduce pipeline. Bummer.

Long story short, I spent a day debugging the new code and finally got it working by:

  1. Digging through the MapReduce framework code. Hurray for open source!
  2. Dropping the idea of running FP-Growth on batches of entities and instead running the mapping function on each entity.

That second point probably requires some explanation to really grasp. I am not sure I will be able to explain it well, but maybe some pseudo-Python will help. The biggest change happened in the map-step of the Frequent Patterns MapReduce pipeline. Basically I went from this:

def map_batch_of_transactions(batch):
    frequent_patterns = fpgrowth.run(batch)
    for p in frequent_patterns:
        yield p, p.support

to this:

import itertools

def map_single_transaction(transaction):
    # Sort the items so the same pair always produces the same key.
    frequent_patterns = itertools.combinations(sorted(transaction), 2)
    for p in frequent_patterns:
        yield p, 1

The MapReduce shuffler takes care of grouping together patterns with the same key so with the new method, the shuffler will have more work to do since the same patterns will be yielded more often. Let’s say we have the pattern:

a,b (support: 4)

Before, the shuffler would just receive:

('a,b', 4)

but now it will receive:

('a,b', 1)
('a,b', 1)
('a,b', 1)
('a,b', 1)

On the other hand, FP-Growth does not have to run, so the map-step of the pipeline has more predictable performance characteristics. It remains to be seen if the change has a significant impact on the entire MapReduce process. I am currently testing this.
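The new flow can be tried out locally with a small simulation. This is a sketch, not the real pipeline: a dictionary stands in for the MapReduce shuffler, and the reduce step that sums the emitted ones into a support count is an assumption, since it is not shown in this post. The sample transactions are made up so that the pattern a,b ends up with support 4.

```python
import itertools
from collections import defaultdict


def map_single_transaction(transaction):
    """Emit every pair of items in one transaction with a count of 1."""
    for pair in itertools.combinations(sorted(set(transaction)), 2):
        yield ",".join(pair), 1


def shuffle(emitted):
    """Group emitted values by key, like the MapReduce shuffler does."""
    grouped = defaultdict(list)
    for key, value in emitted:
        grouped[key].append(value)
    return grouped


def reduce_support(key, values):
    """Sum the emitted ones to get the support of a pattern (assumed reduce step)."""
    return key, sum(values)


transactions = [["a", "b"], ["b", "a", "c"], ["a", "b"], ["a", "b", "d"]]
emitted = [kv for t in transactions for kv in map_single_transaction(t)]
support = dict(reduce_support(k, v) for k, v in shuffle(emitted).items())

print(len(emitted))    # prints: 8
print(support["a,b"])  # prints: 4
```

The shuffler has to handle eight key-value pairs for these four transactions, but each map call only does a fixed, small amount of work for its own transaction.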

So anyway, the whole point of this post was: If a feature is experimental, watch out. Sounds obvious right? Well…

Creating a Shopify plugin

In the last post about Antecons, I wrote about frequent patterns and MapReduce. Since then, this base functionality has been tweaked to work a little better but the main focus has been on creating the first plugin for Antecons: A Shopify app.

I have previously mentioned that it is important to me that Antecons can be easy to set up. Although focusing on the API initially was a great way to get started, it has become clear after some thought that the reach is limited if the entire product is just an API from the beginning. So I have begun implementing a plugin (or an app as they call it) for Shopify. There are two reasons for choosing Shopify as the first integrated platform for Antecons:

  1. It is easy to get started with app development on Shopify and their API is quite extensive.
  2. Shopify was simply something I knew beforehand, so I did not need to research other options too much.

So far, it has been a good experience working with the Shopify API. Integrating with Python/Google App Engine was very easy thanks to their open source API bindings for Python.

The Shopify app for Antecons is not finished yet, or at least I do not consider it to be in a state where I can release it. However, it is in a state where it installs correctly and delivers recommendations to the webshop it is installed on. Instead of writing a lot about that, below are some screenshots that show the functionality. The suggestions by Antecons are the little “You might also like” products.

Antecons installation start
Antecons app authentication
Antecons installed
Product page suggestion
Cart page suggestion