Emmanuel Katto Dubai: How can I improve the performance of a slow-running SQL query in my large database?

emmanuelkatto24
Hello, community!

I’m Emmanuel Katto from Dubai, United Arab Emirates (UAE), currently tackling a project where I need to optimize a large SQL query that is running very slowly. My database contains millions of records, and the query involves multiple JOINs and subqueries.

As I’m relatively new to advanced SQL optimization techniques, I would greatly appreciate any guidance or best practices you could share. I’m particularly interested in:

  1. Indexing: What are the most effective strategies for indexing large tables to enhance query performance?
  2. Query Structure: Are there specific patterns or methods I should consider when rewriting the query for improved efficiency?
  3. Tools and Techniques: Can you recommend any tools for analyzing query performance and pinpointing bottlenecks?
I haven’t yet found a solution to these challenges, so any insights or examples would be incredibly helpful. Thank you in advance for your assistance!

Best regards!
Emmanuel Katto
 
Not clear if your BE is Access, SQL Server, or something else, but this link may be of help
 
CJ's link gives you some insights. In particular, if your comparison uses LIKE rather than = to find matching values and the value you want isn't at the beginning of the field, you have to put a wildcard in front of the pattern, as in X LIKE "*xxxx". That leading wildcard kills the use of any index on field X. However, there are other considerations.
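To make that concrete, here is a small sketch in ANSI-style SQL. The table, field, and index names are invented for illustration; Access's own dialect uses * instead of % as the wildcard and has no comment syntax, so read the -- lines as annotations.

    -- Hypothetical Customers table with an index on LastName.
    -- Leading wildcard: the index on LastName cannot be used; every row is scanned.
    SELECT CustomerID, LastName
    FROM Customers
    WHERE LastName LIKE '%son';

    -- Trailing wildcard only: the engine can seek straight into the index.
    SELECT CustomerID, LastName
    FROM Customers
    WHERE LastName LIKE 'Smi%';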

You can index several fields, and there is a temptation among new Access users to index every field. I think the limit is still 32 indexes per table, but there will be severe performance hits if you frequently update records with multiple indexed fields. The cost consideration is that when you update an indexed record, you update the record and each of its indexes as well. When the table is large, this can become a speed problem, too.
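A quick sketch of that cost, again with invented names: both DDL statements below speed up certain reads, but afterwards every write that touches those fields must maintain the indexes as well.

    CREATE INDEX idxLastName ON Employees (LastName);
    CREATE INDEX idxDeptID ON Employees (DeptID);

    -- This single UPDATE now writes the record plus both index structures,
    -- so heavy update traffic gets slower with every index you add.
    UPDATE Employees
    SET LastName = 'Smith', DeptID = 10
    WHERE EmployeeID = 12345;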

The first idea for indexing is to decide which fields you will search most frequently and how many records will come back if your search is based on a single field. In other words, how efficient is the index at finding what you wanted?

If you have a unique identifier of some sort, searching on that unique ID field returns a single record regardless of how many records are in the DB. The cardinality of that index is 1. It could be a person's employee ID number, or it could be an inventory SKU code ... anything that is unique. This is the most efficient kind of field to index.

If you are searching for only female employees, you can search on the gender code, but typically that search will return about 50% of the employee list. The cardinality of that field is <number of records>/2, which makes it inefficient for indexing. You don't get a good return on the investment of indexing gender.

If you are searching for a person's eye color, that might return a smaller group than the "gender" search above, but the question is, how often would you run such a search unless you are running against a database of criminals? If you do an "eye color" search once a year, is it worth the space implied by indexing?

Therefore, your decision about what to index must weigh cardinality (efficiency) against frequency of use.
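If you want to eyeball the cardinality of a candidate field before indexing it, a simple GROUP BY against the table (Employees and Gender here are invented names) shows how many rows each value would return:

    -- A few values with huge counts (like Gender) mark a poor index
    -- candidate; many values with small counts mark a good one.
    SELECT Gender, COUNT(*) AS RowsReturned
    FROM Employees
    GROUP BY Gender;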

Writing queries is its own art, at least some of the time. In essence, your SQL gives the DB engine a series of instructions on how to narrow down the number of records to be affected, because the more records you involve, the more work there is to do. You want to reduce the result set as rapidly as possible because, eventually, SQL has to loop through the affected records one at a time. Theory says that SQL affects the members of the result set simultaneously, but we know it merely seems that way because SQL doesn't return to us until it is finished; the actual query processing is ALMOST like a black box.

This link is one of many you could find that discuss the "order of SQL operations" - meaning, when you execute an SQL statement, how it is processed internally (i.e., opening the black box).


In this article, you discover that the FROM and JOIN clauses are evaluated first. If you have fields on which a JOIN makes sense, the JOIN is a highly efficient way to reduce the result set. Note also that for Access, putting an index on a field that participates in JOIN clauses a lot will make those JOINs far more efficient. WHERE clauses are second in the list. After that, the HAVING clause (if you have one) is the last chance to reduce the result set. Therefore, to get the fastest possible performance out of a query, set it up to do the greatest amount of reduction as early as possible.
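Here is a sketch of that principle with invented names. Both queries return the same answer, but the first aggregates every department and only then discards most of the work, while the second lets WHERE shrink the result set before any grouping happens:

    -- Late reduction: GROUP BY runs over the whole table, then HAVING
    -- throws away every department but one.
    SELECT DeptID, COUNT(*) AS Headcount
    FROM Employees
    GROUP BY DeptID
    HAVING DeptID = 10;

    -- Early reduction: WHERE filters first, so the grouping touches
    -- far fewer records.
    SELECT DeptID, COUNT(*) AS Headcount
    FROM Employees
    WHERE DeptID = 10
    GROUP BY DeptID;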

I don't know what tools are available these days because I've been retired for over 8 years now. New stuff pops up all over the place; products come and go. I will have to defer to my colleagues who are still active to answer that for you. However, you can ask Access to show you a query execution plan, which will tell you whether you have index scans or relation scans. You don't want many relation scans, as they are the slowest case: for instance, a search for female employees would result in a relation scan on the gender field.
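In Access/Jet, the plan comes from the JETSHOWPLAN registry switch, which dumps the plan of each query to a SHOWPLAN.OUT text file; if your back end turns out to be SQL Server, the equivalent is built into T-SQL. A minimal sketch (the table and query are invented):

    -- SQL Server: show the estimated plan as text instead of running the query.
    SET SHOWPLAN_TEXT ON;
    GO
    SELECT EmployeeID, LastName
    FROM Employees
    WHERE Gender = 'F';
    GO
    SET SHOWPLAN_TEXT OFF;
    GO

If the plan says Table Scan rather than Index Seek on the field you filtered, you are looking at one of those relation scans.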

 
