How to decide if to use an index?


How to decide if to use an index?

When deciding whether or not to add a non-clustered index to a column of a table, it is useful to first found out how selective it is. By this, what we want to know is the ratio of unique rows to total rows (based on a specific column) found in the table. Generally, if a column is not more than 95% unique, then the Query Optimizer might not even use the index. If this is the case, then adding the non-clustered index may be a waste of disk space. If fact, adding a non-clustered index that is never used will hurt a table’s performance.

Another useful reason to determine the selectivity of a column is to decide what is the best order to position indexes in a composite index. This is because you will get the best performance out of a composite index if the columns are arranged so that the most selective is the first one, the next most selective, the second one, and so on.

So how do you determine the selectivity of a column? One way is to run the following script on any column you are considering for a non-clustered index. This example script is designed to be used with the Northwind database, so you will need to modify it appropriately for your use.

–Finds the Degree of Selectivity for a Specific Column in a Row
Declare @total_unique float
Declare @total_rows float
Declare @selectivity_ratio float

SELECT @total_unique = 0
SELECT @total_rows = 0
SELECT @selectivity_ratio = 0

–Finds the Total Number of Unique Rows in a Table
–Be sure to replace OrderID below with the name of your column
–Be sure to replace [Order Details] below with your table name
SELECT @total_unique = (SELECT COUNT(DISTINCT OrderID) FROM [Order Details])

–Calculates Total Number of Rows in Table
–Be sure to replace [Order Details] below with your table name
SELECT @total_rows = (SELECT COUNT(*) FROM [Order Details])

–Calculates Selectivity Ratio for a Specific Column
SELECT @selectivity_ratio = ROUND((SELECT @total_unique/@total_rows),2,2)
SELECT @selectivity_ratio as ‘Selectivity Ratio’

The results in this case is 42%, which means that adding a non-clustered index to the OrderID column of the Order Details table in the Northwind database is probably not a very good idea

Avoid DSN connections:

When making your connection to SQL Server, choose DSN-less connections for the fastest connection. Not only does it make database connections faster, it allows you to use the OLE DB provider, which is not available using a DSN-based connection. The OLE DB provider is the fastest provider you can use to access SQL Server.

If you do need to use a DSN, select System DSNs over File DSNs, because they are faster when making connections

one row of data

If you know that the results of a query from within a stored procedure you call will return only one row of data (and not an entire recordset), don’t open an ADO Recordset for the purpose of retrieving the data. Instead, use a stored procedure output parameter for the best performance.

bind columns to field

When looping through recordsets, be sure you bind columns to field objects before the looping begins. Don’t use the Fields collection of the Recordset object to assign values for fields in a Recordset within each loop, as it incurs much more overhead.

MoveFirst, moveprevious

Avoid using the MoveFirst method of the RecordSet object when using a Forward-Only cursor. In effect, when you use this method, it re-executes the entire query and repopulates the Forward-Only cursor, increasing server overhead and killing performance.

CacheSize

By default, the CacheSize property of the ADO Recordset object determines how many rows are fetched from a server-side cursor at a time. The default is one. This means each row of the recordset is returned one at a time from the server to the client. This is very inefficient.

The CacheSize property needs to be set to a much higher figure, such as between 100 and 500, depending on the number of rows that are to be eventually returned from the server to the client.

You can use Profiler to capture the communication between your VB application and SQL Server, and if you see only one record being returned at a time via a cursor, you know you are in trouble.

select top N

If your application allows users to run queries, but you are unable in your application to easily prevent users from returning hundreds, even thousands of unnecessary rows of data they don’t need, consider using the TOP operator within the query. This way, you can limit how may rows are returned, even if the user doesn’t enter any criteria to help reduce the number or rows returned to the client.

ROWLOCK

GetRows method

If you are VB developer and need to access SQL Server data, but don’t have the time or interest in learning how to write stored procedures, consider using the GetRows method of the RecordSet object. The GetRows method is used to pull all the records from the recordset into an array, which is much faster than using embedded Transact-SQL to download a RecordSet to your application

Don´t use ODBC

When creating a connection using ADO, be sure you use the OLE DB provider, not the older ODBC provider for SQL Server, or the ODBC provider for OLE DB. The parameter you will use in your connection string is “provider=sqloledb”. The OLE DB provider performs much more efficiently than the ODBC provider, providing better performance. You may not notice the speed boost if your application makes a single connection and the connection always stays open. But if your applications opens and closes hundreds or thousands of connections, then the benefit will become apparent.

Don’t use DAO to access SQL Server

While this may seem like ancient history, don’t use DAO to access SQL Server, it is performance suicide. Also avoid ODBCDirect. Instead, use RDO or ADO, with ADO being the current preference. You would be surprised at how many SQL Server-based applications that are still around using this old and poorly-performing technology

When using an ADO recordset to return data from SQL Server, the most efficient way is to use a firehose cursor :
When using an ADO recordset to return data from SQL Server, the most efficient way is to use what is often called a firehose cursor. The firehouse cursor is really an incorrect term because it is not a cursor. A firehose cursor is just a method to quickly move data from SQL Server to the client that requested it.

Essentially, a firehose cursor sends the requested data (from the query) to an output buffer on SQL Server. Once the output buffer is full, it waits until the client can retrieve the data from the output buffer. Then the output buffer is filled again. This process repeats over and over until all of the data is sent to the client. Another advantage of this method is that records are only locked long enough to be moved to the output buffer, helping to boost concurrency and performance.

When you open an ADO RecordSet and use its default settings, a firehose cursor is automatically used by default. If you want to specify a firehouse cursor manually, you can do so by using these property settings:

CursorType = adForwardOnly

CursorLocation = adUseServer

LockType = adLockReadOnly

CacheSize = 1

When the client receives the data from the firehose cursor, the data should be read into a local data structure for local use by the client

Example:

UPDATE Users WITH (ROWLOCK)
SET Username = ‘fred’ WHERE Username = ‘foobar’
Using ROWLOCK politely asks SQL Server to only use row-level locks. You can use this in SELECT, UPDATE, and DELETE statements, but iti sbest used in UPDATE and DELETE statements. You’d think that an UPDATE in which you specify the primary key would always cause a row lock, but when SQL Server gets a batch with a bunch of these, and some of them happen to be in the same page (depending on this situation, this can be quite likely, e.g. updating all files in a folder, files which were created at pretty much the same time), you’ll see page locks, and bad things will happen. And if you don’t specify a primary key for an UPDATE or DELETE, there’s no reason the database wouldn’t assume that a lot won’t be affected, so it probably goes right to page locks, and bad things happen.

By specifically requesting row-level locks, these problems are avoided. However, be aware that if you are wrong and lots of rows are affected, either the database will take the initiative and escalate to page locks, or you’ll have a whole army of row locks filling your server’s memory and bogging down processing. One thing to be particularly aware of is the “Management/Current Activity” folder with Enterprise Manager. It takes a long time to load information about a lot of locks. The information is valuable, and this technique is very helpful, but don’t be surprised if you see hundreds of locks in the “Locks/Processes” folder after employing this technique. Just be glad you don’t have lock timeouts or deadlocks.

NOLOCK
Example:

SELECT COUNT(UserID)
FROM Users WITH (NOLOCK)
WHERE Username LIKE ‘foobar’
Using NOLOCK politely asks SQL Server to ignore locks and read directly from the tables. This means you completely circumvent the lock system, which is a major performance and scalability improvement. However, you also completely circumvent the lock system, which means your code is living dangerously. You might read the not-necessarily-valid uncommitted modifications of a running transaction. This is a calculated risk.

For financial code and denormalized aggregates (those little counters of related data that you stash away and try desperately to keep accurate), you should play it safe and not use this technique. But I think you’ll find that for better than 90% of your application, it would not be that big of a deal if a user (or even intermediate code) saw an uncommitted modification. In fact, you’ll probably find that most of your data never or only very rarely changes, in which case the overhead of locking the data is almost always completely wasted.

For example, if I want to count all users that joined Streamload.com between June 1 and August 31 of Y2K, there’s no reason for me to lock anything: that number was cast in stone the moment September 1, 2000 rolled around. Another example is the file listings you see on Streamload.com: it doesn’t much matter if you don’t see the exact perfect data, since either you don’t own the data and it doesn’t much matter what you see, or you do own the data and you know perfectly well whether you just modified the data or not and whether new files have finished uploading.

Just don’t use this type of data as the basis for modifications to the database, and don’t use it when it’s really important that the user not see the wrong thing (an account statement or balance, for instance).

You can only use NOLOCK in SELECT statements. This includes inner queries, and the SELECT clause of the INSERT statement. You can and should use NOLOCK in joins, for example:

SELECT COUNT(Users.UserID)
FROM Users WITH (NOLOCK)
JOIN UsersInUserGroups WITH (NOLOCK) ON
Users.UserID = UsersInUserGroups.UserID

It’s difficult to quantify the performance gain by applying these techniques and it is impossible to speculate as to the effects this would have on your website. often it is a night and day improvement. And you won’t find this if you go searching through the documentation for help with lock contention. The docs recommend rewriting your app so that tables are referenced – and hence, locks are attained – in the same order throughout ,keeping transactions short and in one batch (a good idea, but in practice “yeah, right!”), use a low isolation level (also a good idea: NOLOCK takes this to an extreme), and use bound connections to allow processes to (share locks and) cooperate (sounds like a very complicated bad idea).

Från Khan sql dba – mcitp        http://www.addarr.com

Advertisements

One thought on “How to decide if to use an index?

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s