{"id":174,"date":"2011-01-21T13:24:36","date_gmt":"2011-01-21T12:24:36","guid":{"rendered":"http:\/\/thoughtflow.dk\/?p=174"},"modified":"2011-01-21T13:25:40","modified_gmt":"2011-01-21T12:25:40","slug":"strange-linq-to-sql-performance-when-using-count","status":"publish","type":"post","link":"https:\/\/davidlebech.com\/thoughtflow\/strange-linq-to-sql-performance-when-using-count\/","title":{"rendered":"Strange Linq-to-SQL performance when using Count"},"content":{"rendered":"<p>Yesterday, I was writing a Linq-to-SQL query and noticed a quite remarkable difference in performance between two very similar queries (seconds versus minutes of running time). I thought it would be worth sharing.<\/p>\n<p>Basically, I have a bunch of orders that are represented by two SQL tables, called <code>Basket<\/code> and <code>BasketItem<\/code>. If an order e.g. consists of the purchase of items A, B and C, the corresponding rows in the tables could be:<\/p>\n<pre>\r\nBasket:     ID = 1\r\nBasketItem: ID = 1   BasketID = 1   ItemID = A\r\nBasketItem: ID = 2   BasketID = 1   ItemID = B\r\nBasketItem: ID = 3   BasketID = 1   ItemID = C\r\n<\/pre>\n<p>This is a fairly common way to represent order information in a relational database, establishing a one-to-many relationship between an order and its separate order lines.<\/p>\n<p>Now, somewhere in the code I am working on, I have an item set (for example { A, B }) and I want to figure out in how many orders these two items occur together. The items are represented in a list:<\/p>\n<pre class=\"brush: csharp; title: ; notranslate\" title=\"\">\r\n\/\/ items is given as a parameter\r\n\/\/ but manually initialized here for descriptive purposes\r\nList&lt;object&gt; items = new List&lt;object&gt;() { &quot;A&quot;, &quot;B&quot; };\r\n<\/pre>\n<p>First version of the solution to the problem looked like this:<\/p>\n<pre class=\"brush: csharp; title: Listing 1; notranslate\" title=\"Listing 1\">\r\nint count = 0;\r\nforeach (Basket b in Baskets)\r\n{\r\n  IEnumerable&lt;BasketItem&gt; bItems = b.GetBasketItems();\r\n  if (items.All(item =&gt; bItems.Any(bItem =&gt; bItem.ItemID.Equals(item))))\r\n    count++;\r\n}\r\nreturn count;\r\n<\/pre>\n<p>If the <code>BasketItem<\/code>s for a <code>Basket<\/code> contains all the given items, increment a count. This is obviously not optimal since it requires all <code>BasketItem<\/code>s to be fetched from the database. After a lot of pondering and test, I came up with the following Linq query:<\/p>\n<pre class=\"brush: csharp; title: Listing 2; notranslate\" title=\"Listing 2\">\r\nint minLines = items.Count;\r\nvar count =\r\n  from basket in Baskets\r\n  where\r\n    (from basketItem in BasketItems\r\n     where basketItem.BasketID == basket.BasketID &amp;&amp;\r\n     items.Contains(basketItem.ItemID)\r\n     select basketItem).Count() &gt;= minLines\r\n  select basket;\r\nreturn count.Count();\r\n<\/pre>\n<p>If the query finds <code>items.Count<\/code> or more <code>BasketItem<\/code>s for a given <code>Basket<\/code>, the basket is selected and the number of valid baskets are returned in the end. This dropped the running time from ~50 seconds to ~8 seconds for about 7000 calls to the method with the above code (for different <code>items<\/code> lists), a nice improvement.<\/p>\n<p>I then thought that the above Linq query could be further improved by using <code>Count()<\/code> directly instead of using <code>Where<\/code> clauses. It then rewrites to:<\/p>\n<pre class=\"brush: csharp; title: Listing 3; notranslate\" title=\"Listing 3\">\r\nvar count =\r\n  AnteconsBaskets.Count(basket =&gt;\r\n    AnteconsBasketItems.Count(basketItem =&gt;\r\n      basketItem.BasketID == basket.BasketID &amp;&amp;\r\n      items.Contains(basketItem.ItemID)) &gt;= minLines);\r\nreturn count;\r\n<\/pre>\n<p>Using the above code, I had to manually halt the program after waiting more than 10 minutes for execution to complete.<\/p>\n<p>The question now is: Why did this code perform so poorly compared to the code in listing 2? I cannot offer an explanation of this but I can hint at it. The code in listing 2 translates into an SQL query where <code>COUNT(*)<\/code> is used in the sub-query for the <code>BasketItem<\/code> selection. For listing 3, <code>Count()<\/code> translates into an SQL query that uses a <code>SELECT CASE WHEN<\/code> query with three cases. Why this is so slow, I do not know but I will definitely not use <code>Count()<\/code> like this in the future.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Yesterday, I was writing a Linq-to-SQL query and noticed a quite remarkable difference in performance between two very similar queries (seconds versus minutes of running time). I thought it would be worth sharing. Basically, I have a bunch of orders that are represented by two SQL tables, called Basket and BasketItem. If an order e.g. [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[3],"tags":[46,53,45,54],"class_list":["post-174","post","type-post","status-publish","format-standard","hentry","category-software","tag-net","tag-linq","tag-programming","tag-sql"],"_links":{"self":[{"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/posts\/174","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/comments?post=174"}],"version-history":[{"count":0,"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/posts\/174\/revisions"}],"wp:attachment":[{"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/media?parent=174"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/categories?post=174"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/tags?post=174"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}