Thought Flow

Tag: debugging

  • Experimental features

    This post is about Antecons, a product recommendation engine, now part of Conversio. Antecons is no longer commercially available, but I have kept my developer diary on my website with permission.


    Yesterday, I found out exactly what it means when Google warns about their experimental App Engine features: Your code might eventually break. Let me be clear: I am not blaming Google. They give you fair warning:

    Mapreduce is an experimental, innovative, and rapidly changing new feature for Google App Engine. Unfortunately, being on the bleeding edge means that we may make backwards-incompatible changes to Mapreduce.

    I have written earlier about how I use the MapReduce framework. Yesterday, I updated the framework to the latest version, only to find that my custom Datastore reader had suddenly stopped working and my MapReduce pipeline was throwing exceptions. Bummer.

    Long story short, I spent a day debugging the new code and finally got it working by:

    1. Digging through the MapReduce framework code. Hurray for open source!
    2. Dropping the idea of running FP-Growth on batches of entities and instead running the mapping function on each entity.

    That second point probably needs some explanation, and I am not sure I can give a good one, but maybe some pseudo-Python will help. The biggest change happened in the map step of the Frequent Patterns MapReduce pipeline. Basically, I went from this:

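    # fpgrowth stands in for the FP-Growth implementation (not shown here)
    def map_batch_of_transactions(batch):
        # Mine the whole batch, then emit each pattern with its batch-level support
        frequent_patterns = fpgrowth.run(batch)
        for p in frequent_patterns:
            yield p, p.support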
    

    to this:

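    import itertools

    def map_single_transaction(transaction):
        # Sort so that the same two items always produce the same key
        items = sorted(transaction)
        # Every 2-item combination is a candidate pattern seen once
        for p in itertools.combinations(items, 2):
            yield p, 1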
    

    The MapReduce shuffler takes care of grouping together patterns with the same key. With the new method, the shuffler has more work to do, since the same pattern is yielded once for every transaction that contains it. Let’s say we have the pattern:

    a,b (support: 4)
    

    Before, the shuffler would just receive:

    ('a,b', 4)
    

    but now it will receive:

    ('a,b', 1)
    ('a,b', 1)
    ('a,b', 1)
    ('a,b', 1)
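
    To make the difference concrete, here is a minimal sketch that simulates the per-transaction map step, a shuffler and a summing reducer in plain Python. It does not use the App Engine MapReduce library: shuffle, reduce_support and the sample transactions are hypothetical names and data made up for illustration, and the string keys simply mirror the 'a,b' notation above.

```python
import itertools
from collections import defaultdict


def map_single_transaction(transaction):
    """Emit (pair, 1) for every 2-item combination in one transaction."""
    # Deduplicate and sort so that the same two items always give the same key.
    items = sorted(set(transaction))
    for pair in itertools.combinations(items, 2):
        yield ",".join(pair), 1


def shuffle(pairs):
    """Group values by key, imitating what a MapReduce shuffler does."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_support(key, values):
    """Sum the per-transaction counts to recover the pattern's support."""
    return key, sum(values)


# Hypothetical data: four transactions that all contain both 'a' and 'b'.
transactions = [["a", "b"], ["b", "a", "c"], ["a", "b"], ["a", "b", "d"]]

emitted = [kv for t in transactions for kv in map_single_transaction(t)]
grouped = shuffle(emitted)
supports = dict(reduce_support(k, v) for k, v in grouped.items())
```

    With this sample data, grouped["a,b"] holds four 1s and supports["a,b"] comes out as 4, the same support the batch version would have reported in a single record.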
    

    On the other hand, FP-Growth no longer has to run, so the map step of the pipeline has more predictable performance characteristics. It remains to be seen whether the change has a significant impact on the entire MapReduce process. I am currently testing this.
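
    As a rough way to reason about the extra shuffler load: a mapper that emits every 2-item combination produces n(n-1)/2 records for a transaction with n distinct items. The sketch below just adds that up. The function name and the transaction sizes are made up for illustration and are not measurements from Antecons.

```python
from math import comb  # available in Python 3.8 and later


def pairs_emitted(transaction_sizes):
    """Total number of (pair, 1) records emitted for the given transaction sizes."""
    return sum(comb(n, 2) for n in transaction_sizes)


# Hypothetical sizes: number of distinct items in each transaction.
sizes = [2, 5, 10, 50]
total = pairs_emitted(sizes)
```

    The count grows quadratically with transaction size, so a few very large transactions can dominate what the shuffler has to handle.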

    So anyway, the whole point of this post was: If a feature is experimental, watch out. Sounds obvious, right? Well…

  • Guess-driven development

    A few days ago, I received a link to a blog post called “Some lesser-known truths about programming”. Among other things, it states:

    Bad programmers spend much of that 90% debugging code by randomly making changes and seeing if they work.

    Patrick, my business partner, jokingly calls this Guess-Driven Development, and I am now taking the liberty of putting the term in writing. I will admit that I have fallen into this type of development quite a few times. But now I have started to wonder: Where is the fine line between guessing and exploring?

    When faced with a strange bug or error in some system, we are taught to use tools such as a debugger, and over time we hopefully become more adept at fixing bugs. But sometimes (many times) I have solved a problem by almost randomly trying out different solutions. So is this guessing or exploring? I don’t have the answer.

    Maybe I should have listened more carefully in Software Engineering class?