CTE executing multiple times

Tag: sql-server-2008-r2 , common-table-expression Author: w312119795 Date: 2013-01-15

I have this structure:

WITH my_cte
    AS
    (
    SELECT y.name
    FROM 
        WHData.dbo.vw_data x
        INNER JOIN WHData.dbo.vw_DimNames y
                ON x.nameKey = y.CasinoKey
    WHERE DateKey >= CONVERT(CHAR(8),DATEADD(mm,-2,GETDATE() - DAY(GETDATE())) + 1,112)--two complete months ago    
    GROUP BY y.name
    )
SELECT *
FROM    WHAnalysis.dbo.tb_otherData a
WHERE   NOT EXISTS
                (
            SELECT 1
            FROM my_cte b
            WHERE b.name = a.name 
            );

If I run the code in the CTE in isolation it takes 3 secs; but the complete script just runs and runs.

If I move away from a CTE and use an indexed temporary table then it all runs in 4secs.

What I assume is happening is that the CTE is executing against each record of data in tb_otherData so it'll probably take, as there are 2000 records, 2000 x 3sec ....too long!

The temp table solution is fine but, out of interest, is there a way to change the CTE code so that it runs quickly? Are there some CTE tricks that I am missing?


EDIT

If I switch to a good old fashioned sub-query then the execution plan is absolutely identical:

SELECT *
FROM   WHAnalysis.dbo.tb_otherData a
WHERE name not in 
        (
            SELECT y.name
            FROM 
            WHData.dbo.vw_data x
            INNER JOIN WHData.dbo.vw_DimNames y
            ON x.nameKey = y.CasinoKey
             WHERE DateKey >= CONVERT(CHAR(8),DATEADD(mm,-2,GETDATE() - DAY(GETDATE())) + 1,112)--two complete months ago   
              GROUP BY y.name
              );
CTEs are just syntactic sugar in this case - they make you think they are only evaluating once but that is often not the case. Are you looking for an explanation or a workaround (e.g. writing the query without a CTE and without a temp table)?
@AaronBertrand actually I've just tested and the execution plan is EXACTLY the same if I switch to a sub-query. I was actually wondering if I could keep the CTE but maybe there was a trick I was missing to make it faster? p.s. got into a slightly heated discussion with a_horse_with_no_name the other day who told me I should alias columns like x as mycolumn but also mentioned your name - I thought you were of the school mycolumn = x ?
Yes, a CTE is just a different way to write a sub query. If we're not talking recursive they will likely optimize identically. And yes I'm alias = column but it's just a preference and not worth arguing over. I'll argue against people using column as 'alias' OR 'alias' = column, but only to get rid of the stupid string delimiters that are confusing and deprecated.
Is DateKey from WHData.dbo.vw_data or WHData.dbo.vw_DimNames?

Best Answer

Try to use this query instead of using subquery with EXISTS:

SELECT a. *
FROM    WHAnalysis.dbo.tb_otherData a
LEFT JOIN
(   SELECT y.name as name
    FROM WHData.dbo.vw_data x
         INNER JOIN WHData.dbo.vw_DimNames y
                ON x.nameKey = y.CasinoKey
         WHERE DateKey >= 
          CONVERT(CHAR(8),DATEADD(mm,-2,GETDATE() - DAY(GETDATE())) + 1,112)    
) b on a.name=b.name
WHERE b.name is null

If DateKey is from WHData.dbo.vw_data then it will be:

SELECT a. *
FROM    WHAnalysis.dbo.tb_otherData a
LEFT JOIN WHData.dbo.vw_DimNames y on a.name=y.name
LEFT JOIN WHData.dbo.vw_data x
   on y.CasinoKey= x.nameKey 
      and (
          x.DateKey >= 
          CONVERT(CHAR(8),DATEADD(mm,-2,GETDATE() - DAY(GETDATE())) + 1,112)
          )
WHERE x.nameKey is null

comments:

+1 this L.O.J is spot on; for some reason it is much faster - the execution plan when using a LOJ has no nested loops in it; maybe that is the gain.
If it is possible you should avoid nested subqueries and IN subqueries using JOINs instead.
@valex why? In a lot of cases, EXISTS / NOT EXISTS performs better than JOINs (and for inner joins and non-nullable columns, IN/EXISTS can produce the exact same plan). Here's a blog post I wrote on NOT EXISTS. But you should always test rather than make sweeping generalizations.