SQL Server Hater: Multicolumn IN statement

Well... Multicolumn IN statement. What about it? Well... it doesn't exist in SQL server...

What am I even talking about? What multicolumn IN statement? What the heck is it?

Well.. you know what the IN statement is? What it does? No? So, it's used for filtering rows by a set of values, for example give me finance data for Q1 and Q2 of each year. The list don't have to be a static one, it can be a result of a subquery, for example give me finance data for quarters listed in a configuration table.

Remember a table function we created in another post? We'll use it to create a sales generator to have some data to work on ;)

-- thank you Microsoft for disallowing non-deterministic functions in UDFs... duh...
create or alter view StupidSQLServerRand as select rand() as rnd;
go

create or alter function GenerateSales ()
RETURNS @t TABLE ( 
 sale_year int,
 sale_month int,
 sale_product int,
 sale_amount float
)
AS
begin
 declare @y_iterator int = 1
 declare @m_iterator int = 1
 declare @p_iterator int = 1
 
 declare @p_max int
 declare @sale float
 declare @tmp float

 while @y_iterator <= 4
 begin
  while @m_iterator <= 12
  begin
   select @p_max = rnd*11 from StupidSQLServerRand
   while @p_iterator <= @p_max
   begin
    select @sale = rnd*100 + 0.1 from StupidSQLServerRand
    insert into @t (sale_year, sale_month, sale_product, sale_amount ) values ( 2000+@y_iterator, @m_iterator, @p_iterator, @sale )
    set @p_iterator = @p_iterator + 1
   end
   set @p_iterator = 1
   set @m_iterator = @m_iterator + 1
  end
  set @m_iterator = 1
  set @y_iterator = @y_iterator + 1
 end

 return
end
go

select * into SalesDetails from GenerateSales ()

select sale_year, sale_month, sum(sale_amount) as total_sale 
into SaleTotals
from SalesDetails group by sale_year, sale_month

-- thank you Microsoft for disallowing non-deterministic functions in UDFs... duh...
create or alter view StupidSQLServerRand as select rand() as rnd;
go

create or alter function GenerateSales ()
RETURNS @t TABLE ( 
 sale_year int,
 sale_month int,
 sale_product int,
 sale_amount float
)
AS
begin
 declare @y_iterator int = 1
 declare @m_iterator int = 1
 declare @p_iterator int = 1
 
 declare @p_max int
 declare @sale float
 declare @tmp float

 while @y_iterator <= 4
 begin
  while @m_iterator <= 12
  begin
   select @p_max = rnd*11 from StupidSQLServerRand
   while @p_iterator <= @p_max
   begin
    select @sale = rnd*100 + 0.1 from StupidSQLServerRand
    insert into @t (sale_year, sale_month, sale_product, sale_amount ) values ( 2000+@y_iterator, @m_iterator, @p_iterator, @sale )
    set @p_iterator = @p_iterator + 1
   end
   set @p_iterator = 1
   set @m_iterator = @m_iterator + 1
  end
  set @m_iterator = 1
  set @y_iterator = @y_iterator + 1
 end

 return
end
go

select * into SalesDetails from GenerateSales ()

select sale_year, sale_month, sum(sale_amount) as total_sale 
into SaleTotals
from SalesDetails group by sale_year, sale_month

Now we have two tables: SalesDetails and SaleTotals that we can work on.

So, let's get all the details for months 1, 3 and 4 of each year:

select sale_year, sale_month, sale_amount
from SalesDetails 
where sale_month in (1, 3, 4);

Now, let's use subquery. Let's see all the details for years when sale amount exceeded 2500:

select sale_year, sale_month, sale_amount
from SalesDetails 
where sale_year in (
    select sale_year 
    from SaleTotals 
    group by sale_year 
    having sum(total_sale) > 2500
);

So far, so good.

But what if we wanted to see the details for months when the sale was above a certain threshold? For example:

select * from SaleTotals where total_sale > 500;

In Oracle, we could use multiple columns in the in statement, like so:

select sale_year, sale_month, sale_product, sale_amount
from SalesDetails 
where (sale_year, sale_month) in (
    select sale_year, sale_month from SaleTotals where total_sale > 500
) order by 1, 2;

No problem.

But in SQL Server we're getting:
Msg 4145, Level 15, State 1, Line 100
An expression of non-boolean type specified in a context where a condition is expected, near ','.

Creative again, huh?

There are better and worse solutions. One of the latter type, would be to concatenate key columns somehow. In this case it could work, especially if we did not just concatenate but if we combined columns somehow. For year and month easy arithmetic would work, for example:

select sale_year, sale_month, sale_product, sale_amount
from SalesDetails 
where (sale_year, sale_month) in (
    select sale_year, sale_month from SaleTotals where total_sale > 500
) order by 1, 2;

but in case of some more random data, especially text, especially when codepage is not known (usually for n-prefixed text columns) it probably isn't good approach. The same applies to sort of autoincrement ID's. They can get very large so arithmetics may be out of scope and concatenating may lead to amiguity. In this case some separating characters would do the work but... how about performance? How about the limits of string variables (length)?

Another common alternative is inner join:

select d.sale_year, d.sale_month, d.sale_product, d.sale_amount
from SalesDetails d
inner join
(
 select sale_year, sale_month from SaleTotals where total_sale > 500
) t
on
 d.sale_year = t.sale_year and d.sale_month = t.sale_month
;

But this approach has another issue you need to be aware of. What will happen when there's more then one key value in the subquery (two entries for the same month and year in case of this example)?
We will start getting duplicates in the output. You need to get rid of duplicates prior to join either by getting distinct values or some aggregation (or whatever else might be the case for you):

select d.sale_year, d.sale_month, d.sale_product, d.sale_amount
from SalesDetails d
inner join
(
 select distinct sale_year, sale_month from SaleTotals where total_sale > 500
) t
on
 d.sale_year = t.sale_year and d.sale_month = t.sale_month
;

Best

SQL Server Hater

Monday, September 2, 2019

Multicolumn IN statement

No comments:

Post a Comment

Non fractional division (int/float)

Search This Blog