{"id":3014,"date":"2023-06-11T17:22:42","date_gmt":"2023-06-11T15:22:42","guid":{"rendered":"https:\/\/davidlebech.com\/thoughtflow\/?p=3014"},"modified":"2023-06-11T17:22:46","modified_gmt":"2023-06-11T15:22:46","slug":"not-materialized","status":"publish","type":"post","link":"https:\/\/davidlebech.com\/thoughtflow\/not-materialized\/","title":{"rendered":"NOT MATERIALIZED"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">I managed to reduce a PostgreSQL view&#8217;s query time by more than 80% with just the two keywords <code>NOT MATERIALIZED<\/code> and by reading the actual database manual when GPT-4 gave up.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Below, I will explain the context of the speedup.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">I recently moved a little bit of business logic from an API endpoint and into a PostgreSQL view, because it made some things a bit easier to manage. The view uses common table expression (CTE), and one of them looks something like this (not the actual code, but very similar):<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nWITH first_and_last_score AS (\n  SELECT DISTINCT ON (company_id, user_id)\n    company_id,\n    user_id,\n    LAST_VALUE(score) OVER (PARTITION BY company_id ORDER BY created_at) AS last_company_score,\n    LAST_VALUE(score) OVER (PARTITION BY company_id, user_id ORDER BY created_at) AS last_user_score\n  FROM scores\n)\n-- ...\n-- more CTEs and queries\n-- ...\n<\/pre><\/div>\n\n\n<p class=\"wp-block-paragraph\">The query finds the last score for each company and the last score for each user in that company using the <code>LAST_VALUE<\/code> window function.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">I was not particularly nervous using the above code for live data, because I knew that the view would <em>always<\/em> be used with a filter on company_id, and the CTE would thus not to a full table scan of the <code>scores<\/code> table because the query optimizer would be smart enough to filter the score table first.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Alas, when profiling the view, <em><strong>it was indeed making a table scan<\/strong><\/em> on the <code>scores<\/code> table in my view, and it was a major bottle-neck for performance.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">It took several hours with trial and error to figure out what was going on, until I finally decided to actually read the Postgresql manual on CTEs, specifically how <a rel=\"noreferrer noopener\" href=\"https:\/\/www.postgresql.org\/docs\/14\/queries-with.html#id-1.5.6.12.7\" target=\"_blank\">CTEs are automatically folded into the parent query<\/a>:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><em>&#8220;&#8230; it can be folded into the parent query, allowing joint optimization of the two query levels. By default, this happens if the parent query references the <code>WITH<\/code> query just once, but not if it references the <code>WITH<\/code> query <strong>more than once<\/strong>&#8220;<\/em><br>(Emphasis mine)<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The problem was that I referenced the <code>scores<\/code> CTE twice and thus it was materialized and did not take advantage of the filter on company ID, I was using on the view.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">In the end, adding <code><strong>NOT MATERIALIZED<\/strong><\/code> to the CTE was the fix:<\/p>\n\n\n<div class=\"wp-block-syntaxhighlighter-code \"><pre class=\"brush: sql; title: ; notranslate\" title=\"\">\nWITH first_and_last_score AS NOT MATERIALIZED (\n  ...\n)\n<\/pre><\/div>\n\n\n<p class=\"wp-block-paragraph\">Just this change made reduced query time by roughly 80% for most queries to the view, and since the view was ultimately being served to a front-end, it was the different between &#8220;this feels a bit slow&#8221; to &#8220;this feels instant&#8221;.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I managed to reduce a PostgreSQL view&#8217;s query time by more than 80% with just the two keywords NOT MATERIALIZED and by reading the actual database manual when GPT-4 gave up. Below, I will explain the context of the speedup. I recently moved a little bit of business logic from an API endpoint and into [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":3020,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[114,29],"tags":[169,247,248,54],"class_list":["post-3014","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-code","category-tips","tag-data","tag-database","tag-optimization","tag-sql"],"_links":{"self":[{"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/posts\/3014","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/comments?post=3014"}],"version-history":[{"count":5,"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/posts\/3014\/revisions"}],"predecessor-version":[{"id":3021,"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/posts\/3014\/revisions\/3021"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/media\/3020"}],"wp:attachment":[{"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/media?parent=3014"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/categories?post=3014"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/davidlebech.com\/thoughtflow\/wp-json\/wp\/v2\/tags?post=3014"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}