Monday, September 2, 2019

Multicolumn IN statement

Well... Multicolumn IN statement. What about it? Well... it doesn't exist in SQL server...

What am I even talking about? What multicolumn IN statement? What the heck is it?

Well.. you know what the IN statement is? What it does? No? So, it's used for filtering rows by a set of values, for example give me finance data for Q1 and Q2 of each year. The list don't have to be a static one, it can be a result of a subquery, for example give me finance data for quarters listed in a configuration table.

Remember a table function we created in another post? We'll use it to create a sales generator to have some data to work on ;)



Now we have two tables: SalesDetails and SaleTotals that we can work on.

So, let's get all the details for months 1, 3 and 4 of each year:
select sale_year, sale_month, sale_amount
from SalesDetails 
where sale_month in (1, 3, 4);

Now, let's use subquery. Let's see all the details for years when sale amount exceeded 2500:
select sale_year, sale_month, sale_amount
from SalesDetails 
where sale_year in (
    select sale_year 
    from SaleTotals 
    group by sale_year 
    having sum(total_sale) > 2500
);

So far, so good.

But what if we wanted to see the details for months when the sale was above a certain threshold? For example:
select * from SaleTotals where total_sale > 500;

In Oracle, we could use multiple columns in the in statement, like so:
select sale_year, sale_month, sale_product, sale_amount
from SalesDetails 
where (sale_year, sale_month) in (
    select sale_year, sale_month from SaleTotals where total_sale > 500
) order by 1, 2;

No problem.

But in SQL Server we're getting:
Msg 4145, Level 15, State 1, Line 100
An expression of non-boolean type specified in a context where a condition is expected, near ','.


Creative again, huh?

There are better and worse solutions. One of the latter type, would be to concatenate key columns somehow. In this case it could work, especially if we did not just concatenate but if we combined columns somehow. For year and month easy arithmetic would work, for example:
select sale_year, sale_month, sale_product, sale_amount
from SalesDetails 
where (sale_year, sale_month) in (
    select sale_year, sale_month from SaleTotals where total_sale > 500
) order by 1, 2;
but in case of some more random data, especially text, especially when codepage is not known (usually for n-prefixed text columns) it probably isn't good approach. The same applies to sort of autoincrement ID's. They can get very large so arithmetics may be out of scope and concatenating may lead to amiguity. In this case some separating characters would do the work but... how about performance? How about the limits of string variables (length)?

Another common alternative is inner join:
select d.sale_year, d.sale_month, d.sale_product, d.sale_amount
from SalesDetails d
inner join
(
 select sale_year, sale_month from SaleTotals where total_sale > 500
) t
on
 d.sale_year = t.sale_year and d.sale_month = t.sale_month
;
But this approach has another issue you need to be aware of. What will happen when there's more then one key value in the subquery (two entries for the same month and year in case of this example)?
We will start getting duplicates in the output. You need to get rid of duplicates prior to join either by getting distinct values or some aggregation (or whatever else might be the case for you):
select d.sale_year, d.sale_month, d.sale_product, d.sale_amount
from SalesDetails d
inner join
(
 select distinct sale_year, sale_month from SaleTotals where total_sale > 500
) t
on
 d.sale_year = t.sale_year and d.sale_month = t.sale_month
;


Best

No comments:

Post a Comment

Non fractional division (int/float)

For me a RDBMS is a high level tool. Something very opposite to using low level languages like C, C++ or... Assembler. From user/developer p...