Quote Originally Posted by JeenLeen View Post
Thanks.
I figured the issue was the "second pass through the data" to do a DISTINCT or the sorting via ORDER BY. Well, actually the log note said it was running out of memory while doing the sorting, so I'm pretty sure it's the ORDER BY. But I felt like asking.

I generally only do LEFT JOINs out of habit, but it seemed more intuitive to do RIGHT JOINs for this particular project, and just got me wondering.
I recognize that your OP question was theory rather than application, but as general advise, I always remind my programmers to do each step in isolation if you can help it. Sorts in one step, minimize each table about to be joined to the smallest size you can manage (making a sub-table with reduced variables for one; reducing down by filter criteria; and doing distincts), then doing joins. I can't count the number of times a programmer with 25+ years of experience who can run circles around me (a managers with hopefully competent but by no means exhaustive programming experience) has said that there's 'nothing wrong' with their code and it couldn't possibly be the issue and then I find that the code is trying to join 2-3 multi-million rowed, 300-variable dimension tables while doing a distinct and a filter and multiple complex case-whens.