For me a RDBMS is a high level tool. Something very opposite to using low level languages like C, C++ or... Assembler. From user/developer perspective it should exist outside of binary realm. You shouldn't be exposed to any binary considerations. Until you want to, of course.
It's obvious, that data has some binary representation and it may and could be beneficial if the user, and developer especially, is aware of it but it shouldn't be a requirement.
This binary representation is a crucial thing to know for administrators (maybe not on the row but a data file level). The database performance relies on it greatly. It's best if such representation is aligned with the hardware. Things like endianness, word lengths or architecture (RISC/CISC) imply different internal implementations. We can imagine that if datafiles are in line with such factors RDBMS will perform better as there will be no conversions necessary.True, but it's the administrator's task to make it work like that (if RDBMS allows for different datafile formats). Unfortunately (my point of view) for SQL Server users it always existed in Microsoft x86 based world and as such - these things never ever changed (apart from being expanded with new features).
It's very obvious that SQL Server developers decided to incorporate internal CPU database into server's datatypes. We can see types like int, bigint or float. For sure, it's probably quick and very effective, especially when it comes to storage but it also introduces some of pitfalls.
The most notable one is a no-reminder division when both numbers are integer:
select2/3asresult
result ----------- 0
To make sure you get what you want (unless it IS what you want) you have to cast at least one of them to floating point:
result --------------------------------------- 0.666666
(1 row affected)
result ---------------------- 0,666666666666667
(1 row affected)
result --------------------------------------- 0.666666
(1 row affected)
The other thing worth mentioning is a way the floating point data is represented binary. It's all explained nicely in the SQL Servers documentation so I won't do this again. Suffice to say the precision is highly influenced by the exponent so, let me quote documentation exactly: "Floating point data is approximate; therefore, not all values in the data type range can be represented exactly".
Fortunately SQL Server does support other numeric datatypes like numeric for example which I strongly encourage anyone to consider using, especially when working on finance data.
Just make sure you know what you're doing and it's generally a good idea to stay away of binary data unless that's exactly what you want to do. It's all right to use int/bigint for identifiers but is very questionable for me.
The first thing you're faced with when you switch from Oracle to SQL Server is object naming convention. There are books about how those database work. The whole topic would cover a lot of areas such as administration, access control and what not, so in big points, no details.
In Oracle world you have database instances that don't know anything about each other. Inside such instance you have schemas and you generally work within their boundaries. Since schema is basically an user there's very limited interaction between them (some preivileges are required). In MS SQL Server however there might be multiple databases existing at the same time and schemas, though existing, are merely a logical separators.
In Oracle you're strictly limited to 1 and 2-part naming conventions: [SCHEMA.]<OBJECT>
where [SCHEMA.] is not obligatory and if omitted CURRENT_SCHEMA is used (CURRENT_SCHEMA is a system variable and it generally works as MS SQL's USE statement, by default CURRENT_SCHEMA is a logged user name). That's it.
In MS SQL Server, however, you can use one of: 1, 2, 3 and 4-part naming conventions: [INSTANCE.][DATABASE.][SCHEMA.]<OBJECT>
It's pretty self explanatory. One note - in case of SQL Server default [SCHEMA] is called dbo. In case of 3 and 4-part naming conventions you can still omit the schema name when pointing to objects in dbo by using a so called double-dot convention (just remove dbo from the query).
Each of the following would lead to the same table:
usemastergoselect * from SaleTotals
goselect * from dbo.SaleTotals
goselect * from master.dbo.SaleTotals
goselect * from master..SaleTotals
goselect * from dbserver.master.dbo.SaleTotals
go
It's not so much different to the Oracle. It certainly makes sense. No issues about it whatsoever from my side.
My issue is that SQL Server is very inconsistent when it comes to enforcing these conventions. This mainly relates to DDL operations and... yes, you guessed it: functions.
So, what did I expect? I expected all those conventions can be used in all situations, but no. They apply only to the select statement and not in it's entirety - minus function calls of course ;)
How about DDL then?
createtable DropMe1 ( x int )
go-- Commands completed successfully.createtable dbo.DropMe2 ( x int )
go-- Commands completed successfully.createtable master.dbo.DropMe3 ( x int )
go-- Commands completed successfully.createtable dbserver.master.dbo.DropMe4 ( x int )
goMsg 117, Level 15, State 1, Line 168The object name 'dbserver.master.dbo.DropMe4' contains more than the maximum number of prefixes.
The maximum is 2.
Ok. Inconsistent? Yes, but that's fine. You can't access objects from another server directly anyway so it's not a problem at all. Strange they thought of 4-part naming convention anyway but I'm going to pass it and not bother about it ever again.
Let's try something else, shall we?
usemastergocreateview DropMe_V1 asselect1as x
go-- Commands completed successfully.createview dbo.DropMe_V2 asselect1as x
go-- Commands completed successfully.createview master.dbo.DropMe_V3 asselect1as x
goMsg 166, Level 15, State 1, Line 177'CREATE/ALTER VIEW' does not allow specifying the database name as a prefix to the object name.
What? You can't create view using 3-part naming convention? Duh... Didn't expect that (as a matter of fact - I did, I already knew it would happen ;D). It's very common for other DDL commands, including functions. We need one to demonstrate another example of such inconsistency.
Msg 195, Level 15, State 10, Line 196 'DropMeF' is not a recognized built-in function name.
Oh... bummer. It appears to me as Microsoft decided that schema prefix is mandatory as not to confuse UDFs with built-in functions. I suspect the MS developers have a strong affection towards conditional if and case statements (that suspicion is a topic for another blog post). This is bollocks to me. I would rather see a form of object already exist when accidently trying to overwrite system function instead of stumbling on something like this.
After this we already can expect something strange when using double-dot, can't we?
select master..DropMeF () as fresult;
Msg 102, Level 15, State 1, Line 199 Incorrect syntax near '.'.
Of course! Since we already assumed they enforce schema name in case of function call so it's no different in this case. Still stupid.
Do you know any other (different) cases of such inconsistency in SQL Server?
Just recently, while working on another post on multicolumn IN statement I came across another random SQL Server stupidity. Maybe it's an exaggregation but very irritating thing nevertheless. Every other day I find SQL Server's User Defined Functions to be more and more crippled, useless... I have tons of things to write about (and will do) but since I just stumbled upon this one I'll do it now.
Well... seems like UDFs in SQL Server must be deterministic by design. The server disallows you to create any functions that would encorporate anything that Server knows is non deterministic and, first and foremost, that includes system functions like rand() and newid().
Msg 443, Level 16, State 1, Procedure NonDeterministic, Line 6 [Batch Start Line 121] Invalid use of a side-effecting operator 'rand' within a function.
Why is that even possible? What is the reasoning behind it? Why a function has to be deterministic in the SQL Server? Well... since people have already found workarounds and it's easy to write function that will pass compilation and will behave undeterministically.. what's the point of all of this?
createoralterview RandomView asselectrand() as RandomValue
gocreateoralterfunction NonDeterministic ()
returnsfloatasbegindeclare @rv floatselect @rv = RandomValue from RandomView
return @rv
endgoselect dbo.NonDeterministic() as r1, dbo.NonDeterministic() as r2, dbo.NonDeterministic() as r3
from (select1asidunion all select2asidunion all select3asid) severalrowsofdata;
createorreplacefunction NonDeterministic
returnnumberisbeginreturn dbms_random.value(0,1);
end;
/
select NonDeterministic as r1, NonDeterministic as r2, NonDeterministic as r3
from (
select1asidfrom dual union all
select2asidfrom dual union all
select3asidfrom dual
) severalrowsofdata;
Even more. In Oracle ALL functions are non-deterministic by definition! However, you can set them to be deterministic but this still won't disallow anything from the function's body! It's only hinting the database engine that this function will return same output when given same parameters thus allowing for caching its results. You can obviously put yourself in trouble when not paying attention ;) Just take a look:
createorreplacefunction NonDeterministic2
returnnumberdeterministicisbeginreturn dbms_random.value(0,1);
end;
/
select NonDeterministic2 as r1, NonDeterministic2 as r2, NonDeterministic2 as r3
from (
select1asidfrom dual union all
select2asidfrom dual union all
select3asidfrom dual
) severalrowsofdata;
Well... Multicolumn IN statement. What about it? Well... it doesn't exist in SQL server...
What am I even talking about? What multicolumn IN statement? What the heck is it?
Well.. you know what the IN statement is? What it does? No? So, it's used for filtering rows by a set of values, for example give me finance data for Q1 and Q2 of each year. The list don't have to be a static one, it can be a result of a subquery, for example give me finance data for quarters listed in a configuration table.
Remember a table function we created in another post? We'll use it to create a sales generator to have some data to work on ;)
-- thank you Microsoft for disallowing non-deterministic functions in UDFs... duh...createoralterview StupidSQLServerRand asselectrand() as rnd;
go
createoralterfunction GenerateSales ()
RETURNS @t TABLE (
sale_year int,
sale_month int,
sale_product int,
sale_amount float
)
ASbegindeclare @y_iterator int = 1declare @m_iterator int = 1declare @p_iterator int = 1declare @p_max intdeclare @sale floatdeclare @tmp floatwhile @y_iterator <= 4beginwhile @m_iterator <= 12beginselect @p_max = rnd*11from StupidSQLServerRand
while @p_iterator <= @p_max
beginselect @sale = rnd*100 + 0.1from StupidSQLServerRand
insertinto @t (sale_year, sale_month, sale_product, sale_amount ) values ( 2000+@y_iterator, @m_iterator, @p_iterator, @sale )
set @p_iterator = @p_iterator + 1endset @p_iterator = 1set @m_iterator = @m_iterator + 1endset @m_iterator = 1set @y_iterator = @y_iterator + 1endreturnendgoselect * into SalesDetails from GenerateSales ()
select sale_year, sale_month, sum(sale_amount) as total_sale
into SaleTotals
from SalesDetails groupby sale_year, sale_month
-- thank you Microsoft for disallowing non-deterministic functions in UDFs... duh...createoralterview StupidSQLServerRand asselectrand() as rnd;
go
createoralterfunction GenerateSales ()
RETURNS @t TABLE (
sale_year int,
sale_month int,
sale_product int,
sale_amount float
)
ASbegindeclare @y_iterator int = 1declare @m_iterator int = 1declare @p_iterator int = 1declare @p_max intdeclare @sale floatdeclare @tmp floatwhile @y_iterator <= 4beginwhile @m_iterator <= 12beginselect @p_max = rnd*11from StupidSQLServerRand
while @p_iterator <= @p_max
beginselect @sale = rnd*100 + 0.1from StupidSQLServerRand
insertinto @t (sale_year, sale_month, sale_product, sale_amount ) values ( 2000+@y_iterator, @m_iterator, @p_iterator, @sale )
set @p_iterator = @p_iterator + 1endset @p_iterator = 1set @m_iterator = @m_iterator + 1endset @m_iterator = 1set @y_iterator = @y_iterator + 1endreturnendgoselect * into SalesDetails from GenerateSales ()
select sale_year, sale_month, sum(sale_amount) as total_sale
into SaleTotals
from SalesDetails groupby sale_year, sale_month
Now we have two tables: SalesDetails and SaleTotals that we can work on.
So, let's get all the details for months 1, 3 and 4 of each year:
select sale_year, sale_month, sale_amount
from SalesDetails
where sale_month in (1, 3, 4);
Now, let's use subquery. Let's see all the details for years when sale amount exceeded 2500:
select sale_year, sale_month, sale_amount
from SalesDetails
where sale_year in (
select sale_year
from SaleTotals
groupby sale_year
havingsum(total_sale) > 2500
);
So far, so good.
But what if we wanted to see the details for months when the sale was above a certain threshold? For example:
select * from SaleTotals where total_sale > 500;
In Oracle, we could use multiple columns in the in statement, like so:
select sale_year, sale_month, sale_product, sale_amount
from SalesDetails
where (sale_year, sale_month) in (
select sale_year, sale_month from SaleTotals where total_sale > 500
) orderby1, 2;
No problem.
But in SQL Server we're getting: Msg 4145, Level 15, State 1, Line 100 An expression of non-boolean type specified in a context where a condition is expected, near ','.
Creative again, huh?
There are better and worse solutions. One of the latter type, would be to concatenate key columns somehow. In this case it could work, especially if we did not just concatenate but if we combined columns somehow. For year and month easy arithmetic would work, for example:
select sale_year, sale_month, sale_product, sale_amount
from SalesDetails
where (sale_year, sale_month) in (
select sale_year, sale_month from SaleTotals where total_sale > 500
) orderby1, 2;
but in case of some more random data, especially text, especially when codepage is not known (usually for n-prefixed text columns) it probably isn't good approach. The same applies to sort of autoincrement ID's. They can get very large so arithmetics may be out of scope and concatenating may lead to amiguity. In this case some separating characters would do the work but... how about performance? How about the limits of string variables (length)?
Another common alternative is inner join:
select d.sale_year, d.sale_month, d.sale_product, d.sale_amount
from SalesDetails d
innerjoin
(
select sale_year, sale_month from SaleTotals where total_sale > 500
) t
on
d.sale_year = t.sale_year and d.sale_month = t.sale_month
;
But this approach has another issue you need to be aware of. What will happen when there's more then one key value in the subquery (two entries for the same month and year in case of this example)?
We will start getting duplicates in the output. You need to get rid of duplicates prior to join either by getting distinct values or some aggregation (or whatever else might be the case for you):
select d.sale_year, d.sale_month, d.sale_product, d.sale_amount
from SalesDetails d
innerjoin
(
selectdistinct sale_year, sale_month from SaleTotals where total_sale > 500
) t
on
d.sale_year = t.sale_year and d.sale_month = t.sale_month
;