Thought Flow

Technology and other things

Author: David

  • Spotify royalties

    Spotify is a cool service but I do not agree with how they pay out royalties. In this post, I will propose a different way.

    The current royalty calculation is explained by Spotify like this: There is a big chunk of money (the revenue) and each artist is paid according to their global “market share”. The share is calculated by taking the number of artist streams and dividing it by the total number of streams on Spotify. 1

    On the surface this looks like a good thing because everyone is paying for everyone. But the problem is that the equation does not account for how much each user actually uses Spotify. Sometimes I can go days without using Spotify, and every second I am not using the service, the market shares of the artists I listen to go down relative to the artists favored by users who use Spotify more than I do. For example, if I stream two Radiohead tracks during one month and another user streams eight tracks from Justin Bieber, the market share for Justin Bieber will be four times higher than that of Radiohead, simply because the other user uses Spotify more.

    I think this is an unfair way of distributing royalties and I am not the first one to say this. 2 Instead of calculating a global market share for each artist, I propose to calculate the market share for each artist as the average market share value of that artist for each user.

    So instead of:

    for each artist:
      market_share = artist.streams / total_streams
    

    I propose:

    for each artist:
      market_share_sum = 0
    
      for each user:
        market_share_sum +=
          user.artist.streams / user.total_streams
    
      market_share = market_share_sum / number_of_users
    

    The calculation is probably more complicated than what is explained by Spotify but I do not think the proposed change is unreasonable. Let us see how it fixes the market share calculation bias from the example before.

    Old market share calculation:

    Radiohead.streams = 2
    JustinBieber.streams = 8
    total_streams = 10
    
    Radiohead.market_share =
      Radiohead.streams / total_streams = 
      2 / 10 = 20%
    JustinBieber.market_share =
      JustinBieber.streams / total_streams =
      8 / 10 = 80%
    

    New market share calculation:

    David.Radiohead.streams = 2
    David.total_streams = 2
    someone.JustinBieber.streams = 8
    someone.total_streams = 8
    number_of_users = 2
    
    Radiohead.market_share =
      (David.Radiohead.streams / David.total_streams +
       someone.Radiohead.streams / someone.total_streams)
      / number_of_users = 
      (2/2 + 0/8) / 2 = 50%
    JustinBieber.market_share =
      (David.JustinBieber.streams / David.total_streams +
       someone.JustinBieber.streams / someone.total_streams)
      / number_of_users = 
      (0/2 + 8/8) / 2 = 50%
    

    The two artists now have an equal market share. The reason I think this is fair is that it values our listening preferences equally, not the time we spend listening.
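    For anyone who wants to play with the numbers, here is a minimal Python sketch of both calculations applied to the example above. The data layout (`streams` as a dictionary of users to per-artist stream counts) and the function names are my own assumptions for illustration, and so is the choice to skip users with no streams at all, since the pseudocode above would otherwise divide by zero. It is not how Spotify computes anything.

```python
from collections import defaultdict

# Hypothetical listening data from the example: user -> {artist: streams}.
streams = {
    "David": {"Radiohead": 2},
    "someone": {"Justin Bieber": 8},
}


def global_market_share(streams):
    """Current model: an artist's streams divided by all streams."""
    totals = defaultdict(int)
    for per_artist in streams.values():
        for artist, count in per_artist.items():
            totals[artist] += count
    all_streams = sum(totals.values())
    return {artist: count / all_streams for artist, count in totals.items()}


def per_user_market_share(streams):
    """Proposed model: the average of each user's own share for the artist."""
    # Assumption: users who streamed nothing are left out of the average.
    active = {u: a for u, a in streams.items() if sum(a.values()) > 0}
    shares = defaultdict(float)
    for per_artist in active.values():
        user_total = sum(per_artist.values())
        for artist, count in per_artist.items():
            shares[artist] += count / user_total
    return {artist: total / len(active) for artist, total in shares.items()}


print(global_market_share(streams))    # {'Radiohead': 0.2, 'Justin Bieber': 0.8}
print(per_user_market_share(streams))  # {'Radiohead': 0.5, 'Justin Bieber': 0.5}
```

    Since every active user's shares add up to one, the averaged shares also add up to one, so they can still be used to split a fixed pool of revenue.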

    I love Spotify and have been a happy (paying) customer for almost two years. The 99 SEK per month price means that I have spent more money on music in the last two years than I did in the ten years before that and I am sure I am not alone. Spotify says that about 70% of their revenue is paid to artists and rights holders so to me, it seems like a win for the industry. But I hope they redo their royalty calculation. Until then, my limited usage does not warrant a premium account. I really don’t want to support Justin Bieber while I’m sleeping.

  • Teaching philosophy

    Teaching is fun. Unfortunately, I have not had the chance to teach since 2010 at the University of Oregon but I found an old paper I wrote for a teaching effectiveness seminar and I thought I would re-print it here. It has not been edited except for a spelling error.

    Teaching philosophy

    Diverse classes, different situations and changing environments require a differentiated teaching approach in each new teaching situation. Nonetheless, there are a few things that I like to bring with me to the classroom, no matter the subject, content or form of the class that I am teaching. In this short paper, I will try to define and describe my general teaching philosophy.

    Motivation is a key element. There are different aspects to this. First, I think it is important to show one’s own motivation for the material. If the students sense that the teacher has a desire to teach the material, I believe this will have a positive effect on them. Second, motivating the material for the students themselves and giving them a sense of where they are going with the material is important.

    I strive to be as open-minded as possible. For me, being open-minded covers a lot of important traits like being friendly towards students and open to input. But another important aspect of being open-minded is being able to admit one’s own mistakes and willing to change accordingly. Along the same lines, being sympathetic and understanding are also important traits for the open-minded teacher. Every student is a human being and it is important to remember and understand that we are all individuals with different personalities and issues to deal with.

    From my previous experience, I have learned that the success of a particular class is directly proportional to the amount of preparation that I do before the class. Being well-prepared is almost self-explanatory but it still deserves to be mentioned here. Well-preparedness also entails being knowledgeable about the material that is being taught which is crucial to how well the students will come to understand the material.

    Overall, I think personality is the most important part of being a good teacher. Being well-prepared and knowledgeable should definitely not be neglected but the key words for my general teaching philosophy are motivation, open-mindedness and sympathy.

  • Open-sourcing the past

    While studying at the University of Oregon, I worked as a teaching assistant in three different computer science courses. One of them was CIS 323 Data Structures Lab but this course was a bit special because it had its own course number and I was teaching it almost on my own.

    It was quite a roller coaster ride. 3

    Anyway, throughout the course, we implemented some classic and often-used data structures and algorithms in various forms. In my opinion, the most notable data structure we implemented was a fairly new balanced tree data structure called the left-leaning red-black tree (LLRB), invented by Robert Sedgewick in 2008. Back in the beginning of 2010, I could not find any publicly available C++ implementation of the LLRB tree, 4 which made it fun to use in class because it was very new. This means that there is a possibility that my implementation was the first-ever implementation of the LLRB tree in C++. It is a fun thought but it is not very significant, considering it is only a few lines of code, the delete operation was not implemented and it was never released. Until now.

    I recently went through some old course material and found the code. So I emailed the University of Oregon and the course supervisor and with their permission, here is the code which I might expand with a few more data structures once I have looked through the material. I have refactored the code from the original but it still has the mark of a C++ beginner. It was fun going through it again though.
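    For readers who have not met the LLRB tree before, the insert operation can be sketched in a few lines. The sketch below is in Python and follows the commonly published 2-3 variant of Sedgewick's algorithm; it is not the C++ code from the course, and the names (`Node`, `insert`, `inorder`, `height`) are my own choices for illustration. Like the course version, it leaves out deletion.

```python
RED, BLACK = True, False


class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None
        self.color = RED  # new nodes are always attached with a red link


def is_red(node):
    return node is not None and node.color == RED


def rotate_left(h):
    # Turn a right-leaning red link into a left-leaning one.
    x = h.right
    h.right = x.left
    x.left = h
    x.color = h.color
    h.color = RED
    return x


def rotate_right(h):
    # Mirror image of rotate_left.
    x = h.left
    h.left = x.right
    x.right = h
    x.color = h.color
    h.color = RED
    return x


def flip_colors(h):
    # Split a temporary 4-node: push the red link up to the parent.
    h.color = RED
    h.left.color = BLACK
    h.right.color = BLACK


def _insert(h, key):
    if h is None:
        return Node(key)
    if key < h.key:
        h.left = _insert(h.left, key)
    elif key > h.key:
        h.right = _insert(h.right, key)
    # Restore the left-leaning invariants on the way back up.
    if is_red(h.right) and not is_red(h.left):
        h = rotate_left(h)
    if is_red(h.left) and is_red(h.left.left):
        h = rotate_right(h)
    if is_red(h.left) and is_red(h.right):
        flip_colors(h)
    return h


def insert(root, key):
    root = _insert(root, key)
    root.color = BLACK  # the root is always black
    return root


def inorder(node):
    return [] if node is None else inorder(node.left) + [node.key] + inorder(node.right)


def height(node):
    return 0 if node is None else 1 + max(height(node.left), height(node.right))


root = None
for key in [5, 1, 4, 2, 3]:
    root = insert(root, key)
print(inorder(root))  # [1, 2, 3, 4, 5]
```

    Inserting keys in sorted order is a good way to see the balancing at work: a plain binary search tree would degenerate into a linked list, while `height(root)` here stays logarithmic in the number of keys.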

  • More data, better recommendations

    This post is about Antecons, a product recommendation engine, now part of Conversio. Antecons is no longer commercially available, but I have kept my developer diary on my website with permission.


    Today, we have published an improvement for the Antecons recommendation algorithm. In the beginning, recommendations were based on an analysis of order data for a webshop, which has turned out to work quite well. But more data is better. Starting today, Antecons will also analyze data based on what products customers are looking at on the webshop. This improves the recommendations, especially for products that have recently been added to a shop and have not sold much yet.

    There are many other ideas and features in the pipeline and one of them is adding similarity measures as a recommendation tool. That is, similarity in terms of common product tags and similar product titles. This is probably going to find its way into Antecons in the near future, possibly as an opt-in feature.
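    To make the tag idea a bit more concrete, here is a small Python sketch of one possible similarity measure: the Jaccard index over product tags (shared tags divided by all tags across the two products). This is only an assumption about what such a measure could look like, not the Antecons implementation, and the product data and function names are made up for illustration.

```python
def tag_similarity(tags_a, tags_b):
    """Jaccard index: shared tags divided by all distinct tags."""
    a, b = set(tags_a), set(tags_b)
    if not a and not b:
        return 0.0  # two untagged products tell us nothing
    return len(a & b) / len(a | b)


def most_similar(product_id, products, top_n=3):
    """Rank the other products by tag similarity to the given product."""
    target = products[product_id]
    scored = [
        (tag_similarity(target, tags), other_id)
        for other_id, tags in products.items()
        if other_id != product_id
    ]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [other_id for score, other_id in scored[:top_n] if score > 0]


# Hypothetical catalog: product id -> tags.
products = {
    "red-mug": ["kitchen", "mug", "red"],
    "blue-mug": ["kitchen", "mug", "blue"],
    "tea-towel": ["kitchen", "textile"],
    "notebook": ["office", "paper"],
}

print(most_similar("red-mug", products))  # ['blue-mug', 'tea-towel']
```

    A measure like this needs no sales or browsing history at all, which is why it could complement the order and view data for products that are brand new.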

    As an extra note, the back-end and infrastructure of Antecons are constantly improving, thanks in part to the constant improvements being made to Google App Engine. Scalability and reliability are key elements for a high-performance app like Antecons and GAE makes it possible to focus on the app instead of the infrastructure. This might sound like a sales pitch for GAE but actually, it is one of Antecons’ secret weapons.

  • Experimental features

    This post is about Antecons, a product recommendation engine, now part of Conversio. Antecons is no longer commercially available, but I have kept my developer diary on my website with permission.


    Yesterday, I found out exactly what it means when Google warns about their experimental App Engine features: Your code might eventually break. Let me be clear, I am not blaming Google. They give you fair warning:

    Mapreduce is an experimental, innovative, and rapidly changing new feature for Google App Engine. Unfortunately, being on the bleeding edge means that we may make backwards-incompatible changes to Mapreduce.

    I have written about my usage of the MapReduce framework earlier. Yesterday, I updated the MapReduce framework to the latest version only to see that my custom Datastore reader had suddenly stopped working and I was seeing exceptions in my MapReduce pipeline. Bummer.

    Long story short, I spent a day debugging the new code and finally got it working by:

    1. Digging through the MapReduce framework code. Hurray for open source!
    2. Dropping the idea of running FP-Growth on batches of entities and instead running the mapping function on each entity.

    That second point probably requires some explanation, and I am not sure I can give a good one, but maybe some pseudo-Python will help. The biggest change happened in the map-step of the Frequent Patterns MapReduce pipeline. Basically, I went from this:

    def map_batch_of_transactions(batch):
        frequent_patterns = fpgrowth.run(batch)
        for p in frequent_patterns:
            yield p, p.support
    

    to this:

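    import itertools

    def map_single_transaction(transaction):
        # Every pair of items in the transaction is a candidate pattern.
        frequent_patterns = itertools.combinations(transaction, 2)
        for p in frequent_patterns:
            yield p, 1  # counts as one occurrence of the pattern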
    

    The MapReduce shuffler takes care of grouping together patterns with the same key so with the new method, the shuffler will have more work to do since the same patterns will be yielded more often. Let’s say we have the pattern:

    a,b (support: 4)
    

    Before, the shuffler would just receive:

    ('a,b', 4)
    

    but now it will receive:

    ('a,b', 1)
    ('a,b', 1)
    ('a,b', 1)
    ('a,b', 1)
    

    On the other hand, FP-growth does not have to run so the map-step of the pipeline has more predictable performance characteristics. It remains to be seen if the change has a significant impact on the entire MapReduce process. I am currently testing this.
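    The whole flow can be tried out locally without App Engine. The sketch below is plain Python that imitates the map, shuffle and reduce steps in memory; the `shuffle` and `reduce_support` functions and the sample transactions are stand-ins I made up for illustration, not the MapReduce framework's API. It also assumes that items are sorted and de-duplicated before pairing so that the same pair always ends up under the same key, and it uses tuples as keys instead of the 'a,b' strings shown above.

```python
import itertools
from collections import defaultdict


def map_single_transaction(transaction):
    # Assumption: sort and de-duplicate so ('a', 'b') and ('b', 'a') share a key.
    for pair in itertools.combinations(sorted(set(transaction)), 2):
        yield pair, 1


def shuffle(mapped):
    """Stand-in for the shuffler: group all values by key."""
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    return groups


def reduce_support(key, values):
    """Sum the ones to get the support of a pattern."""
    return key, sum(values)


# Hypothetical transactions; 'a' and 'b' occur together in four of them.
transactions = [
    ["a", "b"],
    ["b", "a", "c"],
    ["a", "b"],
    ["a", "b", "d"],
    ["c", "d"],
]

mapped = [kv for t in transactions for kv in map_single_transaction(t)]
grouped = shuffle(mapped)
supports = dict(reduce_support(k, v) for k, v in grouped.items())

print(grouped[("a", "b")])   # [1, 1, 1, 1]
print(supports[("a", "b")])  # 4
```

    Note that this only counts pairs and applies no minimum support threshold, so it illustrates the shape of the new map-step rather than a full replacement for FP-growth.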

    So anyway, the whole point of this post was: If a feature is experimental, watch out. Sounds obvious, right? Well…