Thought Flow

Tag: programming

  • Intellectual stimulus

    After a year of doing freelance programming/consulting, I have a better understanding of why some people choose academia or research as a career path: Intellectual stimulation. I have been working mostly with website and webshop backend programming and on a day-to-day basis, the recipe is basically the same:

    1. Create a datatable in a database.
    2. Create a webpage form so userdata can be put into this table.
    3. Display this userdata in various versions on the website/webshop.

    Over time, I see myself writing the same or very similar code over and over again for every new feature that needs to be added even when I try to reuse the same code — and it bores the hell out of me sometimes.

    Actually, while studying, I was not particularly interested in theory. I always liked the programming more than the theory and the latter is usually more emphasized. It is ofted noted that college is not preparing students with proper programming skills for “the real world” and that a dedicated software development education might be a good idea. I agree.

    But now I’m going of at a tangent. My point is, when I was in school, I was displeased with theory (because “hated” is such a strong word) and I fell asleep while reading any research paper or computer science book (this still haunts me. Reading CS books seems to be important for developers. I just cannot do it). But now, I miss the intellectual stimuli from academia. This is one of the reasons I have been pushing myself to find time to work on Antecons, a project that has forced me to dig into a number of research papers on association rule learning. (By the way, the recommendations made by Antecons are actually being added by the users to the shopping carts at vildmedvin.dk. I can thus assume that it is raising profits. This is a great success, given its currently limited and quite crude functionality)

    This blog post does not have a particular message or opinion. This is thought flow and I have just been wondering what other people do to cope with boredom in their jobs. Sometimes, boring work is inevitable, I know. But when more than half of one’s work is boring, is it then time to start rethinking one’s situation?

  • Strange Linq-to-SQL performance when using Count

    Yesterday, I was writing a Linq-to-SQL query and noticed a quite remarkable difference in performance between two very similar queries (seconds versus minutes of running time). I thought it would be worth sharing.

    Basically, I have a bunch of orders that are represented by two SQL tables, called Basket and BasketItem. If an order e.g. consists of the purchase of items A, B and C, the corresponding rows in the tables could be:

    Basket:     ID = 1
    BasketItem: ID = 1   BasketID = 1   ItemID = A
    BasketItem: ID = 2   BasketID = 1   ItemID = B
    BasketItem: ID = 3   BasketID = 1   ItemID = C
    

    This is a fairly common way to represent order information in a relational database, establishing a one-to-many relationship between an order and its separate order lines.

    Now, somewhere in the code I am working on, I have an item set (for example { A, B }) and I want to figure out in how many orders these two items occur together. The items are represented in a list:

    // items is given as a parameter
    // but manually initialized here for descriptive purposes
    List<object> items = new List<object>() { "A", "B" };
    

    First version of the solution to the problem looked like this:

    int count = 0;
    foreach (Basket b in Baskets)
    {
      IEnumerable<BasketItem> bItems = b.GetBasketItems();
      if (items.All(item => bItems.Any(bItem => bItem.ItemID.Equals(item))))
        count++;
    }
    return count;
    

    If the BasketItems for a Basket contains all the given items, increment a count. This is obviously not optimal since it requires all BasketItems to be fetched from the database. After a lot of pondering and test, I came up with the following Linq query:

    int minLines = items.Count;
    var count =
      from basket in Baskets
      where
        (from basketItem in BasketItems
         where basketItem.BasketID == basket.BasketID &&
         items.Contains(basketItem.ItemID)
         select basketItem).Count() >= minLines
      select basket;
    return count.Count();
    

    If the query finds items.Count or more BasketItems for a given Basket, the basket is selected and the number of valid baskets are returned in the end. This dropped the running time from ~50 seconds to ~8 seconds for about 7000 calls to the method with the above code (for different items lists), a nice improvement.

    I then thought that the above Linq query could be further improved by using Count() directly instead of using Where clauses. It then rewrites to:

    var count =
      AnteconsBaskets.Count(basket =>
        AnteconsBasketItems.Count(basketItem =>
          basketItem.BasketID == basket.BasketID &&
          items.Contains(basketItem.ItemID)) >= minLines);
    return count;
    

    Using the above code, I had to manually halt the program after waiting more than 10 minutes for execution to complete.

    The question now is: Why did this code perform so poorly compared to the code in listing 2? I cannot offer an explanation of this but I can hint at it. The code in listing 2 translates into an SQL query where COUNT(*) is used in the sub-query for the BasketItem selection. For listing 3, Count() translates into an SQL query that uses a SELECT CASE WHEN query with three cases. Why this is so slow, I do not know but I will definitely not use Count() like this in the future.

  • Lisp

    Is Lisp easy?I try. I really do. I read articles, I write code, I frustrate myself and I force myself to spend countless hours making things work. Whatever I do, I keep bumping my head into yet another wall. I’m trying to grasp Lisp.

    According to John McCarthy, Lisp represents a local optimum in the space of programming languages. What he and everyone else fail to tell people learning Lisp (at least me) is that is represents a global minimum in the space of programming language learnability.

    There are many good resources to help understanding Lisp everywhere, ranging from almost philosophical over paedagogical introductions to more technical articles. I have also managed to do something in Lisp, like implementing FastICA for Independent Component Analysis in Clojure, a dialect of Lisp, and at the moment, I’m trying to implement FP-growth, a well-known and fairly scalable Association Rule Mining algorithm. And yes, whenever something succeeds (after many hours of pondering), it is a pleasing experiencing to notice how few lines are sometimes needed to accomplish complex tasks.

    But I’m not satisfied. I’m frustrated. When I was learning Java in the early days of my academic career, I was rarely frustrated when faced with new problems to solve. But learning Lisp is like having a very unstable nuclear power plant living in the brain. Meltdowns are inevitable and occur quite often.

    And the worst thing about Lisp is not Lisp itself. It is the feeling of incompetence that hits you in the face whenever you cannot figure something out. The feeling of mediocrity is not very pleasant and the meltdowns are tough on one’s self-esteem. The thought: “Maybe I am just an average programmer” pops up constantly.

    “I want to believe” that Lisp is great. I hope I will see the light. I’m looking for the promised epiphany. And I most certainly will not settle with mediocrity… ever.