Query is fast to run, but very slow to export results

Rudolph583

Hi

I have run into a rather strange issue with a query and I'm wondering if any of you can think of a solution. I have a SELECT query which itself is built from several sub-queries, and some of those in turn are built from sub-sub-queries. So overall my SELECT query depends on about 20 underlying queries, all of which are SELECT queries.

Before splitting my SELECT query into various sub-queries it used to be rather slow (about 2 minutes of run time). Thanks to the splitting it now runs very fast: it takes less than 5 seconds to execute and to display the result in the Access datasheet view. The result is about 10,000 rows across 9 fields.

So far, so good. But when it comes to actually using the result of the SELECT query, the issues start:

1. When I click the top-left arrow button in the Access datasheet view (to select all rows, aiming to copy them to the clipboard), Access seems to re-compute the query, and it keeps computing at high CPU load for many minutes and essentially just freezes. This is surprising, because the query result has already been computed and is already displayed in the datasheet view, so there should not be any need for Access to re-compute it. And even if for some reason it has to recompute after all, it should only take the same 5 seconds as the original computation. Instead it computes for a LONG time and then just freezes and crashes.

2. Next, I tried taking the result of the SELECT query and storing it in a temporary table so I can work with it more efficiently, without any risk of Access re-computing the result. To do that, I created a "Make Table" query, which produces a "SELECT * INTO MyTemporaryTable FROM NameofMySELECTQuery" type of SQL statement. While the creation of that temporary table eventually does succeed, it takes about 100 times longer than the runtime of my SELECT query. This is unexpected, because it should be almost as fast as my SELECT query, with a little overhead for writing the result to a table - maybe 1 second of extra compute time for 10,000 rows, no more. Also, while the "Make Table" query is being executed, I noticed that the progress bar at the bottom of Access behaves in a strange way: it progresses to 100%, then jumps back to about 95%, and keeps jumping back and forth between 95% and 100% for several minutes until it's done.

3. I tried exporting the result of my SELECT query using the "External Data -> Export -> Excel" button in the ribbon. While it eventually does succeed, it again takes many minutes, although the SELECT query itself (before exporting) runs in less than 5 seconds.

4. Lastly, I tried building another query on top of my SELECT query. Nothing heavy, just adding a bit more data, something that typically computes in about a second. Same behaviour again: instead of the total compute time going from, say, 3 seconds to 4 seconds, it now takes many minutes to run.

I'd be grateful for any suggestions on what could be causing this or how to debug it. If at all possible, I'd like to avoid changing the structure of my SELECT query and making it less nested, because without the nesting the compute time of the SELECT query itself increases too much.

Thanks!
 
But thanks to splitting it into sub-queries it now runs very fast and now takes less than 5 seconds to execute and to display the result in the Access datasheet view. The result is about 10,000 rows across 9 different fields.
To be clear - is that how long it takes to display the initial screen, or how long until the record count of 10,000 records at the bottom is populated? What matters is the latter - until that is populated you won't be able to do anything with the query. What you are describing suggests that your 5 seconds is how long it takes to display the initial screen. Very simplistically, let's say that initial screen is 20 records: 5 seconds for 20 records is 1/4 second per record, and 1/4 second x 10,000 rows is 2,500 seconds, or around 40 minutes, for the query to run completely.

With regards to query efficiency - ensure all fields used in joins, criteria and sorting are indexed, and avoid Like criteria with an initial * - it negates the use of the index. Don't use domain or UDF functions (with a few possible exceptions) and don't use sub queries - although faster than domain functions, they still need to execute for each row. Instead use aliased queries wherever possible.

And apply criteria as soon as you can in your query chain - it means less data is passed up to the next level.
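For example (table and field names made up), push the criteria into the lowest level and keep the Like index-friendly:

Code:
SELECT C.CustomerPK, C.CustomerName, O.OrderDate
FROM tblCustomers AS C INNER JOIN
    (SELECT OrderPK, CustomerFK, OrderDate
     FROM tblOrders
     WHERE OrderDate >= #01/01/2022#) AS O
ON O.CustomerFK = C.CustomerPK
WHERE C.CustomerName Like "Smith*"

The date criteria is applied before anything is passed up to the outer query, and because the Like has no leading * an index on CustomerName can still be used. Aircode, just to show the idea.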

Minor point, but interrogating indexes based on strings takes longer than interrogating indexes based on numbers. A simplistic example - a long type number takes 4 bytes. If that number is stored as a string it takes 2 bytes per character plus (if I remember correctly) a further 2 bytes. So a 5 digit number takes 4 bytes, while the same number as text takes 12 bytes - 3 times the size and potentially 3 times slower as a result. This is very simplistic; indexing algorithms have improved significantly over the years, so perhaps only 50% slower, and it will really only be apparent in very large datasets.
 
Hi CJ_London, many thanks for your detailed response. You are making a great point. The less than 5 seconds is indeed only the runtime to display the initial datasheet screen, and it takes much longer for the number of records to be displayed. That explains why exporting and other things take so long.

I had already implemented all the best practices that you suggested, except for one point: I am using quite a lot of sub- and sub-sub-queries. Mostly I use them to apply selection criteria early on, to reduce the number of records as soon as possible. In your experience, is it advisable to do so? Asking because you also mentioned that sub-queries lead to (slow) row-by-row execution. I'm wondering which of the two has the bigger effect on performance.

Also, would you be able to point me to an example SQL statement for an aliased query and how I can use it instead of a sub-query in a way which is optimal for runtime? Thanks again.
 
With multiple levels of queries, have you tried creating a table in your local front end instead of the back end that gets filled by sequential actions? Then you have a limited set of data to work with for your query. Once done, delete the records in your local temporary table.

This can avoid having the same subquery rerun multiple times. In general not the best option, but if your structure is that complicated it's worth a shot.
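Something along these lines (names are just placeholders, each statement saved as its own action query or run with CurrentDb.Execute, and assuming the two sub-queries return the same fields):

Code:
SELECT * INTO tmpLocalResults FROM qrySubQueryA;
INSERT INTO tmpLocalResults SELECT * FROM qrySubQueryB;
DELETE * FROM tmpLocalResults;

You'd run the first two to fill the local table, point your final query or export at tmpLocalResults, and run the DELETE afterwards to clear it out.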
 
Perhaps a side-end would be better as well, if this is run continually.
 
would you be able to point me to an example SQL statement for an aliased query and how I can use it instead of a sub-query in a way which is optimal for runtime?
Difficult to provide an example that would be relevant to you as it depends on what the subquery is returning, but for example, if you had a subquery to get the latest order date, it might look like this

Code:
SELECT OrderPK, SupplierFK, (SELECT Max(OrderDate) FROM tblOrders AS A WHERE A.OrderFK = B.OrderPK) AS LatestOrderDate
FROM tblOrderHeaders AS B

Could be written as an aliased query
Code:
SELECT *
FROM (SELECT B.OrderPK, B.SupplierFK, Max(A.OrderDate) AS LatestOrderDate
      FROM tblOrderHeaders AS B INNER JOIN tblOrders AS A ON A.OrderFK = B.OrderPK
      GROUP BY B.OrderPK, B.SupplierFK) AS Q1

The above is aircode to illustrate the principle - the subquery is executed row by row, the aliased query is executed once
 
When you use nested queries, Access tries to build a query that merges everything. It doesn't run the queries separately unless the optimizer determines that to be the optimum solution. I rarely use temp tables, but when I do, I use the "side-end" concept so the temp tables can be used by multiple users simultaneously and don't bloat the FE or the BE. Sometimes you can do better at optimizing your query than Access can, because YOU have inside information whereas Access has to operate on general rules. So I might try optimizing the data with multiple temp tables.
 
One factor that you have to watch out for is improper JOINs. One way this can occur is if you had a choice between using a JOIN or a WHERE to filter out records and you chose the WHERE clause. It turns out that queries have a defined order of performing their various clauses. The JOIN is ALWAYS preferable, based on the idea that FROM and JOIN clauses are executed earliest in query processing, whereas WHERE clauses come slightly later. That means that if you have JOIN clauses, they reduce the amount of work done by the WHERE phase of the same query. If you have a complex WHERE, this can add up quickly. (For really small tables, you might never notice the difference.)

Short example:

Code:
SELECT A.X, B.Y, A.Z, B.W FROM A, B WHERE A.X = B.V AND B.Y < 1100 AND A.Z LIKE "*XYZ*" ;
SELECT A.X, B.Y, A.Z, B.W FROM A INNER JOIN B ON A.X = B.V WHERE B.Y < 1100 AND A.Z LIKE "*XYZ*" ;

This is a contrived example to illustrate a point: Both examples should theoretically produce the same result. HOWEVER...

The first example creates a permutation of all possible combinations of A records and B records, then later (when it gets around to the WHERE clause) filters out a bunch from a recordset of size A.COUNT * B.COUNT - which can get very big. If A has 200 records and B has 5,000 records, the permutation join returns 1,000,000 records. To be clear, a "permutation JOIN" occurs when you name two source tables but don't use a JOIN clause in the FROM clause.

The second example uses the JOIN to reduce the recordset immediately to only the records where A.X = B.V, thus reducing the work to be done by the later elements in the WHERE clause. Since this is no longer a permutation JOIN, the returned recordset can never be more than the size of the largest table, so 5000 records would come back. A LOT easier to examine than 1 million records.

There is a hidden "gotcha" in my example that multiplies this effect because using a LIKE clause with leading "*" guarantees the slowest possible search - a "contains" search - that negates any value of an index on field A.Z in the examples.

The reason I bring this up is that based on the results, you wouldn't be able to tell the difference - but based on the timing, it would be a world of difference. Using JOIN clauses, where possible, will DRASTICALLY improve the speed of your queries. Which is why when using sub-queries, you might do better if you can somehow turn the sub-query into a single query with a JOIN operation. EVERY JOIN will help your speed if you have a complex WHERE clause or other elements such as GROUP BY or HAVING and particularly if you have Aggregate clauses or ORDER BY clauses.
 

(One of the) moral(s) of the story is that rendering the first FEW records on-screen when the query first "finishes" running is VERY different than actually accessing all records, which is required when exporting, of course.

An example of something that can cause a MASSIVE difference between when the query finishes running and when all records are accessed would be a VBA function used in the query (something I wouldn't recommend doing except when absolutely necessary). You learn this when you start trying to Page Down after the query finishes running.
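For example, something like this (FormatAccount being a made-up VBA function in a standard module):

Code:
SELECT OrderPK, FormatAccount([AccountNumber]) AS DisplayCode
FROM tblOrders

The first screen paints almost instantly, but paging through or exporting forces the VBA function to be evaluated once per row, for every row in the result.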

My advice would be to look for ways to optimize your query.
 
whereas WHERE clauses come slightly later.
Are you sure about that? I would think that a WHERE, especially when used against an indexed field, would execute first to limit the joined resultset.
 
Are you sure about that? I would think that a WHERE, especially when used against an indexed field, would execute first to limit the joined resultset.

Every reference I've ever seen says that FROM/JOIN is handled BEFORE the WHERE clauses. Since I can't claim knowledge of EVERY possible SQL engine, I can't be sure. However, I've never seen any SQL "order of execution" reference that mentions an override of the order of clause execution. We also know that ACE & JET did not and do not perform sub-clause optimization (a.k.a. keyhole optimization in older references.) Therefore, what I have to go on would answer your question with "FROM/JOIN first, WHERE second." If you find a reference that says Access performs that particular operation in an unusual order, I'd be interested. But - forgive me if I misremembered this - doesn't a JOIN ... ON sequence require indexes too?
 
Post the SQL and let us help. Is your db contained in one file or split into FRONT END and BACK END - and if so, is the BE still an accdb file?
 
Maybe create a UDF that will dump all those 20 sub-queries to temp tables,
then join the temp tables again for your final export.
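Rough sketch with invented names (each statement run separately, e.g. from that VBA routine with CurrentDb.Execute):

Code:
SELECT * INTO tmpSub1 FROM qrySub1;
SELECT * INTO tmpSub2 FROM qrySub2;
SELECT T1.*, T2.ExtraField
FROM tmpSub1 AS T1 INNER JOIN tmpSub2 AS T2 ON T1.KeyField = T2.KeyField;

the final join then reads plain local tables instead of re-running the whole sub-query tree for every row.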
 
The optimizer works on GP (general principles), so someone calculated the cost of the join first vs the where, with and without indexes on either, and made a decision to always perform one first and the other second, rather than deciding on the fly based on the presence/absence of indexes. Since the OP, by analyzing his data, can determine which reduction shrinks the record set the most, the fastest, he can use temp tables to force the decision to do the "correct" reduction first.

I don't expect the use of temp tables to increase the speed, but it could, so you might as well fiddle around and see if you can find the optimum solution manually based on actual data.

PS, you probably need to try both with and without indexes on your temp tables. Have fun;)
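If you want to test the indexed version, adding an index to a temp table is just plain DDL, e.g. (table and field names made up):

Code:
CREATE INDEX idxKeyField ON MyTemporaryTable (KeyField);

Drop it again (DROP INDEX idxKeyField ON MyTemporaryTable) to compare the timings without it.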
 
Hi all, many thanks for the numerous replies! There were a lot of great suggestions and I'm working on finding the best combination of the mentioned measures to optimise the query runtime.
 
