T-SQL – Matt Uliasz few small ideas

Automatic generation of EF entities and mappings from existing database schema

This is a bunch of useful scripts for generating C# Entity Framework Code First POCOs from existing database schema.

First, let’s create T-SQL function for generating C# class for EF entity:

This function deliberately ignores certain columns, (Created On/By, Modified On/By). This properties are pushed to EntityBase class, which our POCO inherits from. All other columns in DB table will be mapped to corresponding C# type.

Next, let’s create T-SQL function for EF configuration (mapping class):

Now, let’s put it all together in one script:

All source code is available here

Clustered vs. Non-clustered indexes in SQL Server

In this article I will make attempt to outline differences between clustered and non-clustered indexes in SQL Server. Be fully aware that it is not going to be comprehensive explanation, but rather brief overview highlighting purpose of both types of indexes.

Indexes

Indexes are one of the most important part of database optimization, especially for large and growing database tables. They work in very similar way to index in a book – just imagine massive address book, with thousands pages but no index. Searching this book for specific address is nearly impossible without index.

When query is ran against SQL Server database, Query Optimizer analyses all available indexes for tables involved, and creates most efficient execution plan to run statement. SQL Server reads data in 8KB pages, so if table is small enough to fit on one or two pages, there is no need for an index at all, as SQL Server will only have to make one or two read operations. This operation is called Full Table Scan and for small table it is the most efficient way of pulling the data out. On the other hand, if table has 1,000,000 records and size of row allows reading only 50 rows per page, number of read operations rises to 20,000. That clearly will take a lot of time and this is where indexes come into play. Indexes can massively improve SQL query, however is worth mentioning, that it does not come for free – indexes can take a lot of storage. When index is involved in selecting data from table, this operation is called Index Seek. In general we can distinguish two types of indexes: clustered and non-clustered.

Clustered Index

This index assumes that data in a table is physically sorted in specific order. This means that when new row is inserted, it physically goes to the place determined by index, allowing data after new row to be moved to next page if required. The table can only have one clustered index. This index is specifically useful if we use BETWEEN statement in WHERE clause of our statement (i.e. SELECT * FROM Table WHERE Col1 BETWEEN ‘A’ AND ‘Z’).

Non-clustered Index

When this type of index is used logical order of index does not match physical order of data stored on the disk. Index plays a role of “pointer to row”, allowing faster access to the row that is requested.

A non-clustered index is a special type of index in which the logical order of the index does not match the physical stored order of the rows on disk. The leaf node of a non-clustered index does not consist of the data pages. Instead, the leaf nodes contain index rows. This type of index works best when used having = in WHERE clause (i.e. SELECT * FROM Table WHERE Col1 = ‘A’) or if the column is used in JOIN.

Reference

http://msdn.microsoft.com/en-us/library/ms188783.aspx – MSDN on indexes
http://en.wikipedia.org/wiki/Index_(database) – Wikipedia on indexes
http://www.sql-server-performance.com/articles/per/index_data_structures_p1.aspx – good article explaining difference between clustered and non-clustered indexes much deeper

How to refer to previous row in SQL 2005

In this post I will describe simple mechanism allowing to refer back to previous row in SQL Server 2005/2008 database table.

For the purpose of this article let’s create a table dbo.AgentLog:

create table dbo.AgentLog (
	 AgentID	int
	,LogInTime	datetime
	,LogOutTime	datetime
	)
go

insert dbo.AgentLog
select 1, '2010-09-08 09:40:00.000', '2010-09-08 10:14:00.000'
union all
select 1, '2010-09-08 10:35:00.000', '2010-09-08 12:04:00.000'
union all select 1, '2010-09-08 13:11:00.000', '2010-09-08 14:24:00.000'
union all select 1, '2010-09-08 14:35:00.000', '2010-09-08 16:08:00.000'
union all select 2, '2010-09-08 09:00:00.000', '2010-09-08 09:23:00.000'
union all select 2, '2010-09-08 10:12:00.000', '2010-09-08 12:05:00.000'
union all select 2, '2010-09-08 13:12:00.000', '2010-09-08 15:07:00.000'
union all select 2, '2010-09-08 15:10:00.000', '2010-09-08 16:25:00.000'
go

Our table will look like this:

AgentID LogInTime LogOutTime
----------- ----------------------- -----------------------
1           2010-09-08 09:40:00.000 2010-09-08 10:14:00.000
1           2010-09-08 10:35:00.000 2010-09-08 12:04:00.000
1           2010-09-08 13:11:00.000 2010-09-08 14:24:00.000
1           2010-09-08 14:35:00.000 2010-09-08 16:08:00.000
2           2010-09-08 09:00:00.000 2010-09-08 09:23:00.000
2           2010-09-08 10:12:00.000 2010-09-08 12:05:00.000
2           2010-09-08 13:12:00.000 2010-09-08 15:07:00.000
2           2010-09-08 15:10:00.000 2010-09-08 16:25:00.000

The task is to for each agent list all breaks the person made and its length. To do this I need to somehow refer back to previous row to calculate difference in minutes between login and previous logout.

To accomplish the task I will use two features new to SQL Server 2005 (when compared to SQL Server 2000). These are CTE (Common Table Expression) and Row_Number() function.

First one in essence can be thought as variable table of results from SELECT statement. This data set only exists in current scope.

Row_Number() is a SQL Server built-in function that adds extra identity column to data set retrieved with SELECT statement.

SELECT
     Row_Number() over (partition by AgentID order by LogInTime) AS RowID
    ,*
FROM dbo.AgentLog

Row_Number() takes partition by argument, which tells function that you want unique row numbers against specific column. If this isn’t provided, SQL Server will just number all rows on one-by-one basis. Combining CTE and Row_Number() we can now get the following:

;with AgentLogCTE as
(select Row_Number() over (partition by AgentID order by LogInTime) as RowID, * from dbo.AgentLog )

This CTE then contains following information:

RowID                AgentID     LogInTime               LogOutTime
-------------------- ----------- ----------------------- -----------------------
1                    1           2010-09-08 09:40:00.000 2010-09-08 10:14:00.000
2                    1           2010-09-08 10:35:00.000 2010-09-08 12:04:00.000
3                    1           2010-09-08 13:11:00.000 2010-09-08 14:24:00.000
4                    1           2010-09-08 14:35:00.000 2010-09-08 16:08:00.000
1                    2           2010-09-08 09:00:00.000 2010-09-08 09:23:00.000
2                    2           2010-09-08 10:12:00.000 2010-09-08 12:05:00.000
3                    2           2010-09-08 13:12:00.000 2010-09-08 15:07:00.000
4                    2           2010-09-08 15:10:00.000 2010-09-08 16:25:00.000

Now what we need to do is just join this table to itself in some clever way, linking current row to previous one:

select
    t1.AgentID
   ,t2.LogOutTime as PrevLogout
   ,t1.LogInTime as Login
   ,convert(varchar, t1.LogInTime - t2.LogOutTime, 108) as BreakTime
from AgentLogCTE t1
   join AgentLogCTE t2
      on t2.RowID = (t1.RowID-1) and t1.AgentID = t2.AgentID
go

And this is it! Resultset looks like the following:

AgentID     PrevLogout              Login                   BreakTime
----------- ----------------------- ----------------------- ------------------------------
1           2010-09-08 10:14:00.000 2010-09-08 10:35:00.000 00:21:00
1           2010-09-08 12:04:00.000 2010-09-08 13:11:00.000 01:07:00
1           2010-09-08 14:24:00.000 2010-09-08 14:35:00.000 00:11:00
2           2010-09-08 09:23:00.000 2010-09-08 10:12:00.000 00:49:00
2           2010-09-08 12:05:00.000 2010-09-08 13:12:00.000 01:07:00
2           2010-09-08 15:07:00.000 2010-09-08 15:10:00.000 00:03:00

Whole SQL script explained in this article can be obtained from here.

How to drop all schema objects in MS SQL 2005

I needed to test my SQL implementation script that was a part of big SQL Server Data Warehouse release. For anybody who have ever done large SQL implementation fundamental question is always “Have I scripted everything that is needed?”. So how to check if your implementation script contains all objects you require? Here is a few steps that helped me:

Collate all individual scripts into one or more implementation script (I recommend having one script per development, but it always depends on circumstances).
Order your scripts so you know the sequence of running them. This is specifically important if there is dependency between objects created by different scripts.
Create backout script for each implementation one and order them as well.
Drop all objects that come into release. “How to drop all schema objects in one go?” – this is where it turns out to be priceless to have DBA in your team:

declare @schema varchar(200)
select @schema = 'MySchema'

select
'DROP ' + case
    when o.xtype = 'U' then 'TABLE'
    when o.xtype = 'V' then 'VIEW'
    when o.xtype = 'P' then 'PROCEDURE'
    when o.xtype = 'FN' then 'FUNCTION'
    end + ' ' + s.name + '.' + o.name as SQL
from
    sys.sysobjects as o
    join sys.schemas as s on o.uid = s.schema_id
where
    s.name = @schema and
    o.xtype in ('U','V','P','FN')
union all
select
'DROP SCHEMA ' + @schema

This will give you DROP statement for each table, view, stored procedure and user defined function in your schema, as well as schema itself. Just grab whichever statement you need and execute it!