Friday, March 2, 2012

On Primary key and Clustered indexes

There are many articles around the net about this topic, and great DBAs have written a lot on it already. Still, I see a number of myths regarding clustered indexes going around among newbies. In fact, a misconception held by a good friend and colleague prompted me to write this entry.

Following are some myths many people hold.

Myth #1) A clustered index is always the primary key, and vice versa
It is true that when we create a primary key on a table, a clustered index is created by default. But we can certainly opt for a nonclustered index if we have a reason to do so.

E.g., we create a table as:

CREATE TABLE ClusterMythBuster
(
PkColumn INT NOT NULL,
Column1 VARCHAR(10),
Column2 VARCHAR(25),
EgFiltercol VARCHAR(10)
)


Now the table is a heap (meaning there is no clustered index, and no particular physical order in which the data is stored in the table).
Now I will create a primary key that is not a clustered index:


ALTER TABLE ClusterMythBuster ADD CONSTRAINT Pk_ClusterMythBuster PRIMARY KEY NONCLUSTERED (PkColumn)

Now let us add a clustered index to this table on a non-key column:


CREATE CLUSTERED INDEX IX_ClusterMythBuster_EgFiltercol ON ClusterMythBuster (EgFiltercol)

Now we have a clustered index that is not a primary key as well as a primary key that is not a clustered index.
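We can verify this from the catalog views. A quick sketch, assuming the objects were created exactly as above:

```sql
-- List the indexes on the table, their type, and whether
-- each one backs the primary key constraint
SELECT i.name, i.type_desc, i.is_primary_key
FROM sys.indexes AS i
WHERE i.object_id = OBJECT_ID('ClusterMythBuster')
```

This should show Pk_ClusterMythBuster as NONCLUSTERED with is_primary_key = 1, and IX_ClusterMythBuster_EgFiltercol as CLUSTERED with is_primary_key = 0.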


Myth #2) A clustered index cannot be created on a non-unique column
This is also not true. It is already evident from the earlier example, but we can seal it by inserting some values into the above table.


insert into ClusterMythBuster values (1,'Test1','Test2','1')
insert into ClusterMythBuster values (2,'Test1','Test2','1')
insert into ClusterMythBuster values (3,'Test1','Test2','1')
insert into ClusterMythBuster values (4,'Test1','Test2','2')
insert into ClusterMythBuster values (5,'Test1','Test2','2')
insert into ClusterMythBuster values (6,'Test1','Test2','1')

See that now, since the clustered index is created on the EgFiltercol column, the data will be stored ordered by that column: first all the ones, then the twos.

Please note: it is generally recommended to create the clustered index on an ever-increasing column, since that causes less fragmentation.
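If you want to see how fragmented an index actually is, one sketch (available from SQL 2005 onwards; the table name is just the one from this example) uses the sys.dm_db_index_physical_stats DMV:

```sql
-- Average fragmentation percentage for each index on the table
SELECT OBJECT_NAME(ps.object_id) AS table_name,
       i.name AS index_name,
       ps.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(
         DB_ID(), OBJECT_ID('ClusterMythBuster'), NULL, NULL, 'LIMITED') AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id AND i.index_id = ps.index_id
```

A clustered index on an ever-increasing key will typically show much lower avg_fragmentation_in_percent after sustained inserts than one on a random key.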


Myth #3) A clustered index cannot be created on a combination of columns
This is also not true, since we can create a composite clustered index.

See the example below

First we need to drop the existing clustered index, since we can have only one clustered index per table:

DROP INDEX ClusterMythBuster.IX_ClusterMythBuster_EgFiltercol


CREATE CLUSTERED INDEX IX_ClusterMythBuster_CompCluster ON ClusterMythBuster (Column1, Column2)


However, I would like to end by saying that, ironically, all the above myths serve a purpose, since it is generally safest to choose a column that is unique and ever increasing. It is also better if the clustered index is narrow (fewer columns).
But my point is that if you have your own valid reasons, you can opt otherwise.

Please see these links

http://www.sqlservercentral.com/articles/Indexing/68563/
http://msdn.microsoft.com/en-us/library/ms190639.aspx
http://msdn.microsoft.com/en-us/library/ms177443.aspx
http://msdn.microsoft.com/en-us/library/aa933131(v=sql.80).aspx

Thursday, March 1, 2012

Auto Growth considerations

Auto growth is the event that occurs when a query needs additional space in the database in order to execute.

SQL Server provides two options if auto growth is enabled: growth in MB, or growth as a percentage of the current file size.

When the currently allocated size is filled, an auto growth event is triggered. During this event SQL Server requests the required space from the OS and initializes it.

Auto growth is resource intensive, so it is recommended to choose a growth setting such that the event does not occur too often, while at the same time the increment is not so large that the auto growth itself takes a long time to complete.


You can alter the file size and growth settings in the database properties window.



Alternatively, you can use a T-SQL command to change the database size and growth settings:


ALTER DATABASE databaseName
MODIFY FILE (NAME=LogicalFilename,FILEGROWTH=XMB)
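To verify the current settings afterwards, one sketch queries sys.master_files (note that when is_percent_growth = 0, the growth value is stored in 8 KB pages; 'DatabaseName' is a placeholder):

```sql
-- Current size and growth increment for each file of a database
SELECT name,
       size * 8 / 1024 AS size_mb,
       CASE WHEN is_percent_growth = 1
            THEN CAST(growth AS VARCHAR(10)) + ' %'
            ELSE CAST(growth * 8 / 1024 AS VARCHAR(10)) + ' MB'
       END AS growth_setting
FROM sys.master_files
WHERE database_id = DB_ID('DatabaseName')
```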


To get the logical file name and other information, you can use the following command:


USE DatabaseName
GO
EXEC sp_helpfile


or


EXEC sp_helpdb 'DatabaseName'

Now the question is: how can we find the optimum auto growth size?
The answer is that you first need to monitor your database growth and decide based on that.

If you are using SQL 2005 or above and you have the default trace enabled, you can use the following query to find this.




IF (SELECT CONVERT(INT, value_in_use) FROM sys.configurations WHERE name = 'default trace enabled') = 1
BEGIN
    DECLARE @curr_tracefilename VARCHAR(500);
    DECLARE @base_tracefilename VARCHAR(500);
    DECLARE @indx INT;

    SELECT @curr_tracefilename = path FROM sys.traces WHERE is_default = 1;
    SET @curr_tracefilename = REVERSE(@curr_tracefilename);
    SELECT @indx = PATINDEX('%\%', @curr_tracefilename);
    SET @curr_tracefilename = REVERSE(@curr_tracefilename);
    SET @base_tracefilename = LEFT(@curr_tracefilename, LEN(@curr_tracefilename) - @indx) + '\log.trc';

    SELECT
        (DENSE_RANK() OVER (ORDER BY StartTime DESC)) % 2 AS l1,
        CONVERT(INT, EventClass) AS EventClass,
        DatabaseName,
        Filename,
        (Duration / 1000) AS Duration,
        StartTime,
        EndTime,
        (IntegerData * 8.0 / 1024) AS ChangeInSize
    FROM ::fn_trace_gettable(@base_tracefilename, DEFAULT)
    WHERE EventClass >= 92 AND EventClass <= 95
      AND ServerName = @@SERVERNAME --AND DatabaseName = DB_NAME()
    ORDER BY StartTime DESC;
END
ELSE
    SELECT -1 AS l1, 0 AS EventClass, 0 AS DatabaseName, 0 AS Filename,
           0 AS Duration, 0 AS StartTime, 0 AS EndTime, 0 AS ChangeInSize




To determine the space utilized inside the data files of each database, use:





CREATE TABLE ##DB_usage
(
    Database_name VARCHAR(255),
    Logical_name VARCHAR(255),
    PhysicalLocation VARCHAR(1000),
    Size FLOAT,
    Used FLOAT
)

EXEC sp_msforeachdb
'USE ?; INSERT INTO ##DB_usage
SELECT ''?'', name, physical_name, size*8/1024, FILEPROPERTY(name, ''SpaceUsed'')*8/1024 FROM sys.database_files'

SELECT * FROM ##DB_usage
DROP TABLE ##DB_usage



Tuesday, July 5, 2011

Beware of Nolock

There is a common misconception among developers regarding the use of NOLOCK in their queries. In a recent meeting with the development team, they told me they had included NOLOCK so that reads would not lock the table. I explained to them that, at the default isolation level, a SELECT query takes only a shared lock on the resource. So what happens when you put the NOLOCK hint in your query?
- It retrieves the data ignoring locks held by updates.

This can be demonstrated by a simple example.

In SQL Server Management Studio, open a query window and run the following code.


create Table Fruits
(
FruitName varchar(25),
Quantity int
)

Go

insert into Fruits values ('Apple',100)
Go
insert into Fruits values ('Orange',50)
Go
insert into Fruits values ('Strawberry',500)
Go

Begin tran

Update Fruits set Quantity =75 where FruitName='Orange'


See that after updating the quantity for Orange, we have not committed the transaction.
Now open another query window and run

select * from Fruits

This query will wait indefinitely (until the first transaction is committed or rolled back).

Now cancel this query, and try running the same query with the NOLOCK hint:
select * from Fruits with (nolock)

It completes immediately.
Note that the quantity of Orange is shown as 75, whereas we have not yet decided whether to commit the transaction in the first window.
This is called a dirty read.

To conclude, I will quote the app team head:

"This means I have only two options: either my application is extremely slow, or my data is often inconsistent."

Well, we should try other options to tune the code before resorting to NOLOCK. That said, there are many situations where a dirty read is not an issue. But it all depends on your business requirements.
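One such option, if your workload can tolerate the extra tempdb usage, is row versioning. As a sketch (run it when no other connections are active in the database; the database name is a placeholder):

```sql
-- With this on, readers no longer block behind writers; a SELECT
-- at the default isolation level sees the last committed version
-- of each row from the version store instead of waiting
ALTER DATABASE YourDatabase SET READ_COMMITTED_SNAPSHOT ON
```

With this setting enabled, the plain "select * from Fruits" in the second window would complete immediately and show the last committed quantity for Orange (50), rather than the uncommitted 75.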

Monday, July 4, 2011

Blocking : Uncommitted Open transactions

Often we come across blocking in SQL Server, which affects the performance of your production server. There is a lot of information on the net about tuning your queries to decrease blocking, or choosing an appropriate isolation level to improve concurrent access. But here I want to mention a rare scenario that can sometimes go unnoticed.

When you are analyzing blocks on SQL server, normally you may run a query like

Select * from sysprocesses where blocked <>0

And you will look at the head blockers.

But I would recommend that you also check this from time to time:

Select * from sysprocesses where status='sleeping' and open_tran>0

If this returns any rows, you may want to look further at the host, the program, and the query that was executed most recently on that SPID.
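To see the last statement the client submitted on a suspicious session, you can use DBCC INPUTBUFFER (the SPID value 53 below is just an example; substitute the one returned by the query above):

```sql
-- Shows the last batch sent by the client on session 53
DBCC INPUTBUFFER (53)
```

The hostname and program_name columns of sysprocesses tell you where that session came from.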

A developer should thoroughly check the entire flow of his stored procedure and ensure that it leaves no uncommitted transactions when it completes.

In some funny situations I have seen people enable implicit transactions in their query window and simply forget to issue a commit. By the time we report the issue to them, they have erased and retyped many other queries in the same window, and will not remember which query left the transaction open.