SQL Server Hater: Greatest/Least

The title explain itself.

SQL Server does not have two, maybe not very popular but hell - very useful functions. Those functions are greatest and least.

For those who are not familiar with them. Both functions are scalar functions that return the smallest or the greatest value from the parameter list. Typically those allow variable list of parameters (very like coalesce). These are not aggregation functions (like max and min) as those work on the row level.

A contextless example:

select 
    greatest( 1, 2, 3, 4, 5 ) as gt, 
    least( 1, 2, 3, 4, 5 ) as lt
from dual;

GT LT
---------- ----------
5 1

When I was searching the internet to find out if MS SQL Server does offer such functionality I came across many discussions on how to do it otherwise and boy... people don't seem to understand what's the point in using those functions!

Most common answer to the problem is use case/when. Yup, that will work if you have two scalar values ready to be compared. Another one seem to be... let's call it unpivot-and-aggregate.. sure, seems like a reasonable solution from both points of view - ease of use and optimisation...

Let's try then. Obviously case/when method only allows us to do one comparison at the time so we would need to stack case statements to make it work. Let's try. Let's assume this as our data set:

Oracle:
select 1 as a, 2 as b, 3 as c, 4 as d, 5 as e from dual;
MS SQL:
select 1 as a, 2 as b, 3 as c, 4 as d, 5 as e;

and let's stack case statements...

select a, b, c, d, e,
case when 
 case when 
  case when a > b then a else b end > case when c > d then c else d end
 then
  case when a > b then a else b end
 else
  case when c > d then c else d end
 end > e
then
 case when 
  case when a > b then a else b end > case when c > d then c else d end
 then
  case when a > b then a else b end
 else
  case when c > d then c else d end
 end
else
 e
end as gt,
case when 
 case when 
  case when a < b then a else b end < case when c < d then c else d end
 then
  case when a < b then a else b end
 else
  case when c < d then c else d end
 end < e
then
 case when 
  case when a < b then a else b end < case when c < d then c else d end
 then
  case when a < b then a else b end
 else
  case when c < d then c else d end
 end
else
 e
end as lt
from (
 select 1 as a, 2 as b, 3 as c, 4 as d, 5 as e
) dataset

W H A T ? ? ?

Yup, case/when doesn't allow you to use the result of the condition in the then/else part so it has to be recalculated again. What would be the solution? Well... subquery, obviously... but remember - calculating max/min requires all values to be compared agains each other, at least in pairs when using golden divide kind of methods... so... it would generally require some number of subqueries (log(2) or something) to calculate this:

select a, b, c, d, e,
case when pg > e then pg else e end as gt,
case when pl < e then pl else e end as lt
from (
 select a, b, c, d, e,
 case when pg1 > pg2 then pg1 else pg2 end as pg,
 case when pl1 < pl2 then pl1 else pl2 end as pl
 from (
  select a, b, c, d, e,
  case when a > b then a else b end as pg1,
  case when c > d then c else d end as pg2,
  case when a < b then a else b end as pl1,
  case when c < d then c else d end as pl2
  from (
   select 1 as a, 2 as b, 3 as c, 4 as d, 5 as e
  ) dataset
 ) precalc1
) precalc2

Looks much better but still bollocks. It would be much easier to maintain but it doesn't look nice.

How about the unpivot?

with
 dataset as (select 1 as a, 2 as b, 3 as c, 4 as d, 5 as e)
select * from dataset
cross join
(
 select max(val) as gt, min(val) as lt
 from (
  select *
  from dataset
  unpivot
  (
    val
    for colname in (a, b, c, d, e)
  ) x
 ) agg
) calc

Even though this sounds like a stupid solution it appears to me as much more sensible one. It will probably scan the dataset twice but it's much more compact nevertheless.

Let's get back to people commenting that in SQL Server you can do case/when and you don't need least/greatest. What they don't realise is the fact that (1) you don't always want to compare just two values (as presented above) or (2) the compared values are not always simple scalar values. In my case it's often some random, long analytical function so in most cases I am forced to use case with subquery just to minimise maintenance risk of having same, complicated expression in multiple places...

One of the other solutions offered by MS SQL users was to expand your database with custom C# code. Well... seems tempting but in real life you're not always allowed to do such stuff. It's always a risky business...

Have you got any other solutions (apart from the obvious - switching to Oracle ;D)?

Best

SQL Server Hater

Thursday, August 8, 2019

Greatest/Least

No comments:

Post a Comment

Non fractional division (int/float)

Search This Blog