Optimizing Slow Query (Query Grid Diagram Attached)

ions

Access User
Dear MS Access Expert

Attached is a gif file of my query, showing the joins and the criteria. My goal is to speed this query up.

**************** Table Specs *****************

TestSubInfo: 15 fields, 46,000 records
SubInfo: 25 fields, 29,000 records
Customer: 35 fields, 4000 records (This table should be broken down but assume it will not change)

All fields involved in joins are indexed.

*****************Criteria Fields Specs *********************

Criteria Fields data types:

Invoice: Long
ReadyInvoice: Boolean
BrokerID: Long
GeneratorID: Long
BillTo: Text (Length 1)
UP: Currency
Description: Text (Length 75)


The only criteria fields currently with an index are GeneratorID and BrokerID (enclosed in green in the attached Gif)

***************** Questions *********************

How would you make this query run faster?

Would you index all the fields that have criteria?

Would you index the Boolean ReadyInvoice field?

The field BillTo currently has only two possible data Entries… “B” or “G”. Is it worth indexing?

Would you index a Description field that has infinite data entry possibilities?

Would you create a compound index made up of multiple fields? What field order would you use for the compound index? Please note that there are two other queries: one for just Broker records and one for only Generator records. The query currently being examined combines both Broker and Generator records and is 3-5 times slower than the separate Broker / Generator queries.

How would you re-order (re-sequence) the criteria fields if you are using a compound index? How would you re-sequence the criteria fields if you opted not to use a compound index?

Thank you for helping me optimize this query.
 

Attachments

  • Query_Criteria.gif (24.5 KB)

TexanInParis

Registered User.
I would put a single index on Subinfo.BrokerID and Subinfo.GeneratorID and see if that helps. Don't bother indexing the text field, the boolean field, or the "B" "G" field.
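If that means one multi-field index spanning both columns, the Jet DDL for it would look something like the sketch below (the index name is just a placeholder of mine; you can equally build the index in table Design view):

Code:
CREATE INDEX idxBrokerGenerator
ON SubInfo (BrokerID, GeneratorID);

Run it once as a DDL query against the back-end .mdb.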

John Viescas, author
Microsoft Office Access 2007 Inside Out
Building Microsoft Access Applications
Microsoft Office Access 2003 Inside Out
SQL Queries for Mere Mortals
http://www.viescas.com/
(Paris, France)
 

MarkK

bit cruncher
Everything you asked above? Try it. See what difference it makes. Seems to me it takes longer to post the question and wait for the response than to just do it and see.
One thing I would change is the logic in the WHERE clause of the query in your .gif image.
You have something like...
Code:
WHERE 
(A Is Null And B Is Not Null And C = True And Customer = "Bob") OR 
(A Is Null And B Is Not Null And C = True And Customer = "Bill")
And most of that is duplication. Remove the duplicate logic...
Code:
WHERE 
(A Is Null And B Is Not Null And C = True) And 
(Customer = "Bob" OR Customer = "Bill")
and you'll have a much more efficient evaluation to do for each record.
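With the criteria fields you listed, the factored version might end up looking roughly like this (I'm guessing at the actual predicate from your field list, since all I have is the .gif):
Code:
WHERE
(Invoice Is Null And ReadyInvoice = True And UP Is Not Null And Description Is Not Null)
And
(BrokerID = [Forms]![Invoice]![Customer] Or GeneratorID = [Forms]![Invoice]![Customer])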
Cheers,
 

TexanInParis

Registered User.
Why not index a text field or those other fields?

Because there's probably not much benefit. For the boolean field, there's only a benefit if most of the values are, for example, True, and you're looking for False. Same thing with the "B" "G" field that has only two values.

Also, your suggestion to rearrange the predicate probably won't work. The query optimizer in Access will rearrange it back in most cases, preferring to solve a series of ORed ANDs rather than ANDed ORs.

John Viescas
 

Banana

split with a cherry atop.
I wonder if using derived tables will simplify things....


Code:
SELECT ...
FROM (
   (
      SELECT ...
      FROM TestSubInfo
      WHERE Invoice IS NULL AND
            ReadyInvoice = True AND
            UP IS NOT NULL AND
            Description IS NOT NULL
   ) AS t
   INNER JOIN (
      SELECT ...
      FROM SubInfo
      WHERE BrokerID = [Forms]![Invoice]![Customer] OR
            GeneratorID = [Forms]![Invoice]![Customer]
   ) AS s
   ON t.TEST = s.TEST
) 
INNER JOIN Customer AS c
ON s.GeneratorID = c.GeneratorID;
 

TexanInParis

Registered User.
I would like to think that the database engine optimizer is smart enough to think of that, but it would be worth a try in an attempt to influence it.
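If you want to check what the optimizer actually does with a rewrite like that, Jet 4.0 can dump its query plans to a showplan.out file once a JETSHOWPLAN registry value is set. A .reg sketch is below; the exact key path varies with engine version and Windows bitness, ShowPlan has known blind spots (it doesn't log every kind of query), and you should turn it off again when you're done.

Code:
Windows Registry Editor Version 5.00

; Jet 4.0: create the Debug key and the JETSHOWPLAN value, then restart Access.
[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Debug]
"JETSHOWPLAN"="ON"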

John Viescas
 

Banana

split with a cherry atop.
Truth be told, I haven't seriously experimented with Jet's SHOWPLAN, but what I do know is that there's no "magical optimization", so it's possible for the optimizer of whatever product we happen to use to go astray. An example: it's commonly recommended to use a frustrated join for returning non-matches rather than an IN() subquery. Yet I've found with both MySQL and Jet (I can't remember whether I tested SQL Server as well) that IN() is actually faster than the frustrated join in two corner cases: when there are no non-matches to be found, or when the ratio of matches to non-matches is seriously skewed. That is also why several products offer some kind of optimizer 'hint' to cover corner cases where the optimizer has a blind spot (an aside: I generally dislike hints, as they tempt some people to tamper with the optimizer when they shouldn't).
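For clarity, by "frustrated join" I mean the unmatched-rows pattern, and the subquery alternative is the NOT IN() form; the two sketches below show what I have in mind, using GeneratorID purely for illustration:

Code:
SELECT c.GeneratorID
FROM Customer AS c LEFT JOIN SubInfo AS s
     ON c.GeneratorID = s.GeneratorID
WHERE s.GeneratorID Is Null;

versus

Code:
SELECT c.GeneratorID
FROM Customer AS c
WHERE c.GeneratorID NOT IN
      (SELECT s.GeneratorID FROM SubInfo AS s WHERE s.GeneratorID Is Not Null);

(The Is Not Null in the subquery matters: a single Null in the IN() list makes NOT IN return no rows at all.)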
 

ions

Access User
Thanks for all the responses.

John, the query currently does have GeneratorID and BrokerID indexed; however, you don't recommend adding any other indexes, for the reasons you outlined. Thanks.

Also, John, you believe that re-arranging the criteria fields into a specific order will NOT have any impact, because the optimizer re-arranges them anyway.

That only leaves me with one thing to try: Banana's idea. If Banana's idea does not speed things up, have I done everything I can to optimize this query?

Other possibilities I can think of:

1) Double-check whether there are any unnecessary fields, to minimize network traffic. This includes criteria fields, which most likely do not have to be sent over the wire.


2) Archive records that have been invoiced and are over X years old.
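For 2), what I have in mind is an append-then-delete pair run as two separate queries, along these lines (TestSubInfo_Archive and InvoiceDate are placeholders; I haven't settled on the actual archive table or which date field to key off):

Code:
INSERT INTO TestSubInfo_Archive
SELECT *
FROM TestSubInfo
WHERE Invoice Is Not Null
  AND InvoiceDate < DateAdd("yyyy", -5, Date());

Code:
DELETE *
FROM TestSubInfo
WHERE Invoice Is Not Null
  AND InvoiceDate < DateAdd("yyyy", -5, Date());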

If you can think of anything else I would appreciate it.

Thank you.
 

TexanInParis

Registered User.
Ions-

You mention "network." What is the back end database? If SQL Server, then yes, eliminating fields that you don't need except for filters can reduce the amount of data transmitted. If the back end is an .mdb file, then it doesn't matter. When Access is the front end to a shared .mdb on a network server, *all* the database manipulation (including index processing) is done on the client machine.

John Viescas
 

ions

Access User
The backend is .MDB not SQL Server.

John I want to ensure I understand this concept correctly because it’s so important. We previously discussed how JET leverages indexed fields.

The following is my understanding of the entire distinction between JET and SQL Server. (Please correct and comment)

At the RecordS Level:

When I put a filter / criteria on an indexed field, JET will only transmit Pages from the hard disk that contain records that meet the query criteria. JET will not send the entire table over the network just Pages that contain the records in question. This is still not as efficient as SQL Server because SQL Server literally only sends the record data vs. the entire page (A Page may contain record data along with unnecessary data hence the reason this approach is less efficient).

If the filter / criteria is on a non-indexed field, JET will send the entire table.

John what happens if I have criteria on an indexed FieldA and a non-indexed FieldB in a (FieldA AND FieldB) predicate? Is Jet smart enough to only send Pages matching criteria in FieldA? I am assuming Yes, hence the reason you told me to index GeneratorID and BrokerID above.

At the Record Level:

If using Jet as the backend, the Page(s) will contain the entire record with all the fields so limiting the number of fields in a query has no impact on the amount of data transmitted when the backend is JET.
SQL Server will only send the necessary fields not the entire record.


Fact 3 (Please correct): When optimizing queries, limiting the data sent over the network is the greatest concern for speed since transmitting data over the wire is the greatest bottleneck.

Thank you for all the responses.
 

TexanInParis

Registered User.
>>The backend is .MDB not SQL Server.

>>The following is my understanding of the entire distinction between JET and SQL Server. (Please correct and comment)

>>At the RecordS Level:

>>When I put a filter / criteria on an indexed field, JET will only transmit Pages from the hard disk that contain records that meet the query criteria. JET will not send the entire table over the network just Pages that contain the records in question. This is still not as efficient as SQL Server because SQL Server literally only sends the record data vs. the entire page (A Page may contain record data along with unnecessary data hence the reason this approach is less efficient).

That is correct. But keep in mind that Access has to fetch the index pages first. Sometimes it'll decide to do a full scan anyway if the index has low granularity - which is why I said that an index on the yes/no field or the character field with only two values probably wouldn't help.

>>If the filter / criteria is on a non-indexed field, JET will send the entire table.

No, not if there are other indexed fields on which you have placed criteria.

>>John what happens if I have criteria on an indexed FieldA and a non-indexed FieldB in a (FieldA AND FieldB) predicate? Is Jet smart enough to only send Pages matching criteria in FieldA? I am assuming Yes, hence the reason you told me to index GeneratorID and BrokerID above.

Yes.

>>At the Record Level:

>>If using Jet as the backend, the Page(s) will contain the entire record with all the fields so limiting the number of fields in a query has no impact on the amount of data transmitted when the backend is JET.
>>SQL Server will only send the necessary fields not the entire record.

That is correct, but Access will keep only the fields requested in the record buffer, thus saving memory on the client machine.


>>Fact 3 (Please correct): When optimizing queries, limiting the data sent over the network is the greatest concern for speed since transmitting data over the wire is the greatest bottleneck.

Absolutely.

>>Thank you for all the responses.

You're welcome!
John Viescas
 

ions

Access User
Thanks for the answers John

>>That is correct. But keep in mind that Access has to fetch the index pages first.

John, does the entire process with an indexed field work as described below?

JET sends all the Index Pages for the entire table over the network to the client? The client then processes the Index Pages and maintains a list of pointers to the Record Pages it requires. The client then requests from JET or the Server OS ??? to send the required Record Pages. Once the client receives all the Record Pages, the query is finally processed on the client?

Is this correct?

Question 2:

Suppose my query returned a recordset of 10 records.

Suppose I have 50,000 records and decide to archive 5 years worth of data so I am only working with 25,000 records.

Despite my archiving there is a 99% probability that the query will return the same # of records ( 10 records in the above example) as before archiving.


I can see two gains I will get from archiving in the above scenario.

Archiving will reduce the # of Index Pages transmitted over the wire? (Approximately Half)

Archiving will reduce the # of pointers to Record Pages due to statement above, hence less records will have to be sent over the network.

Are the above statements correct? Is it worth archiving the data or will the user experience be almost the same?

Question 3

Finally, suppose I decide NOT to archive and instead add an additional filter to my query. Suppose I put a date range of 5 years on an indexed date field.

This scenario is not as good as question 2 because I am not reducing the # of Index Pages transmitted over the wire but I will get gains by reducing the record data transmitted over the wire due to the additional 5 year date filter

Am I correct?

Thank you for the great feedback.
 

Banana

split with a cherry atop.
>>JET sends all the Index Pages for the entire table over the network to the client? The client then processes the Index Pages and maintains a list of pointers to the Record Pages it requires. The client then requests from JET or the Server OS ??? to send the required Record Pages. Once the client receives all the Record Pages, the query is finally processed on the client?

>>Is this correct?

That's the gist, yes. And the request goes through Jet, not the server OS.

>>Suppose my query returned a recordset of 10 records.

>>Suppose I have 50,000 records and decide to archive 5 years worth of data so I am only working with 25,000 records.

>>Despite my archiving there is a 99% probability that the query will return the same # of records (10 records in the above example) as before archiving.

>>I can see two gains I will get from archiving in the above scenario.

>>Archiving will reduce the # of Index Pages transmitted over the wire? (Approximately Half)

>>Archiving will reduce the # of pointers to Record Pages due to statement above, hence less records will have to be sent over the network.

>>Are the above statements correct? Is it worth archiving the data or will the user experience be almost the same?

Well... it's not the total number of rows that matters but rather the criteria we can use to filter. I'm fairly sure that Jet is smart enough to know when not to ask for all index pages... There are also metadata pages, so Jet can look into those and work out which index pages it needs. Pretend the criteria ask for dates between 1/1/1999 and 2/1/1999, and there are a total of 100 index pages indexing that date field. The metadata tells Jet the range of dates each index page covers, so Jet knows it doesn't have to read all the pages; it can skip directly to the index page that contains the starting criterion (e.g. 1/1/1999), fetch further index pages, if any, until it has covered everything between 1/1/1999 and 2/1/1999, and then read those to work out which record pages to fetch.

I could be severely wrong in my understanding, however. But the more general principle that fewer records in a table is faster is usually accurate, so there may be some benefit to be had in horizontal partitioning, just not on the same scale (e.g. cutting the table in half into two smaller tables does not necessarily cut the processing in half).

>>Question 3

>>Finally, suppose I decide NOT to archive and instead add an additional filter to my query. Suppose I put a date range of 5 years on an indexed date field.

>>This scenario is not as good as question 2 because I am not reducing the # of Index Pages transmitted over the wire but I will get gains by reducing the record data transmitted over the wire due to the additional 5 year date filter

>>Am I correct?

Already explained above.
 

TexanInParis

Registered User.
Banana-

FWIW, I agree with you.

John Viescas
 

ions

Access User
Thank you for the explanations.

Banana, I believe I understood your explanation of metadata for index pages. After reading it, I would conclude that archiving will not give me a significant performance gain over merely keeping all the records and adding a filter that simulates archiving.

Please note I am basing performance gain on reducing network traffic.

The logic is:

Metadata is sent first over the network to determine which index pages to analyze. Whether or not I archive, basically the same amount of metadata will be sent.

The metadata limits the # of index pages requested over the network.

The index pages limit the # of record pages requested over the network.

Hence, whether I archive or simply add criteria that simulate archiving, I should not see a performance gain in terms of network traffic for the above scenario.

Above Scenario:

Despite my archiving there is a 99% probability that the query will return the same # of records ( 10 records in the above example) as before archiving.


Please comment

Thank you.
 

TexanInParis

Registered User.
Local time
Today, 17:49
Joined
Sep 9, 2009
Messages
112
>>Thanks for the answers John

>>>>That is correct. But keep in mind that Access has to fetch the index pages first.

>>John, does the entire process with an indexed field work as described below?

>>JET sends all the Index Pages for the entire table over the network to the client? The client then processes the Index Pages and maintains a list of pointers to the Record Pages it requires. The client then requests from JET or the Server OS ??? to send the required Record Pages. Once the client receives all the Record Pages, the query is finally processed on the client?

>>Is this correct?

No, that is not correct. The index pages are a B-Tree, so Access fetches the root page(s) of the index, then figures out which lower level index pages are applicable to the problem and fetches those. It does not fetch all the index pages for most requests. From the index pages, Access decides which data pages to fetch. When there are multiple indexes that apply, Access does an intersection of the index pages to find only the relevant data blocks. Access picked up the index intersection technology from FoxPro - one of the reasons that MSFT bought that product.
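To make the intersection idea concrete: in a query shaped like the sketch below, Jet can walk the BrokerID index and the date index independently and intersect the two sets of page pointers before it reads any data pages. (The date field is purely illustrative; it assumes such a field exists in SubInfo and is indexed.)

Code:
SELECT s.*
FROM SubInfo AS s
WHERE s.BrokerID = [Forms]![Invoice]![Customer]
  AND s.InvoiceDate >= DateAdd("yyyy", -5, Date());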

>>Question 2:

>>Suppose my query returned a recordset of 10 records.

>>Suppose I have 50,000 records and decide to archive 5 years worth of data so I am only working with 25,000 records.

>>Despite my archiving there is a 99% probability that the query will return the same # of records (10 records in the above example) as before archiving.

>>I can see two gains I will get from archiving in the above scenario.

>>Archiving will reduce the # of Index Pages transmitted over the wire? (Approximately Half)

Possibly, but not by half.

>>Archiving will reduce the # of pointers to Record Pages due to statement above, hence less records will have to be sent over the network.

Again, the number of index pages will be reduced, but because the JET engine "walks down" the B-Tree, the actual number of index pages fetched won't necessarily be that much different.

>>Question 3

>>Finally, suppose I decide NOT to archive and instead add an additional filter to my query. Suppose I put a date range of 5 years on an indexed date field.

>>This scenario is not as good as question 2 because I am not reducing the # of Index Pages transmitted over the wire but I will get gains by reducing the record data transmitted over the wire due to the additional 5 year date filter

>>Am I correct?

See my discussion above about intersecting index values. Any time you can add an index that applies to a complex problem, you'll probably help performance.

How long is it taking to solve this query, and how many rows are you fetching? What's the SQL of your query? (And the SQL of any queries referenced in your final query?) How many people are sharing the database, and what's the speed of your network? (You need 100 Mbps minimum to work well with Access.) What else is on the server?

John Viescas
 
