Speed up a Group By query (table with over 6 million records)

Hello1
Registered User
I have a query on just one table, but the table holds a lot of data: more than 6 million records. It contains data from 2006 up to 2018 (no new records are added or removed), but I only need 2012 onwards. So I thought I'd make a new table containing nothing before 2012. I did, but the gain was only about 15 seconds: the original query with the same criteria takes about 3 minutes, and against the new 2012-onwards table it's just 15 seconds faster.

The table contains the amounts customers have paid and the amounts they still need to pay. The query returns a field for how much every customer owes (CustomerDebts) and a field for how much they have paid (CustomerPayments); besides those two, it returns CustomerId, CustomerKontoNumber, and two other fields that default to 0.
The Group By is done on CustomerId, First on CustomerKontoNumber, Expression on the two fields that return 0, and Sum on the two fields CustomerDebts and CustomerPayments.

Here is the actual SQL with criteria (I translated it from another language, so hopefully it's easier to follow):

Code:
SELECT First(CLng(Right(CStr([ItemDatabase].[Konto]),5))) AS CustomerId, ItemDatabase.Konto AS Konto,
 0 AS StartingBalance, Sum(ItemDatabase.CustomerDebts) AS CustomerDebts,
 Sum(ItemDatabase.CustomerPayments) AS CustomerPayments, 0 AS InterestDebt

FROM ItemDatabase

WHERE (((ItemDatabase.ItemDate)>#1/1/2012# And (ItemDatabase.ItemDate)<[forms]![frmBalanceForming]![txtDateStart])
AND ((Left([Konto],4))='2010' Or (Left([Konto],4))='2011' Or (Left([Konto],4))='2012')
AND (([OrderId] & "/" & [OrderType])<>"1/PS") AND ((ItemDatabase.CustomerDebts)<>0)
AND ((ItemDatabase.ItemType)<>"K"))
OR (((ItemDatabase.Date)>#1/1/2012# And (ItemDatabase.Date)<=[forms]![frmBalanceForming]![txtDateStart])
AND ((Left([Konto],4))='2010' Or (Left([Konto],4))='2011' Or (Left([Konto],4))='2012')
AND (([OrderId] & "/" & [OrderType])<>"1/PS") AND ((ItemDatabase.CustomerPayments)<>0))

GROUP BY ItemDatabase.Konto;

The Group By was actually set to this:
GROUP BY CLng(Right(CStr([ItemDatabase].[Konto]),5));

and I changed it to GROUP BY ItemDatabase.Konto; which improved the query speed by about 20 seconds (tested on the original database, not the one split at 2012).

I use this query later in another query, so I was thinking that instead of a query I could write these results to a temp table and use that temp table in the second query later on. That wouldn't speed up the query I'm trying to improve here, but it might help the second one.

The Performance Analyzer gave me a tip to index ItemDatabase.Date, and I did, but I didn't see any improvement, so I set it back to not indexed.

Any tips are welcome!
Thanks :)
 

Attachments

  • ItemDatabase.PNG

theDBguy
Staff member
Hi. Just a quick question, you said the table has more than 6 million records but "no new records are added to it or removed," correct? If so, do the existing records get changed though? If so, how often?
 

Hello1
Registered User
No, they won't be changed at all for at least a year; it stays as it is. Maybe some will be added yearly, after the new year or so, but that would be done manually, not from code or anything.
 

theDBguy
Staff member
Okay, in that case, would you consider using your GROUP BY query to make a temporary table? You'd have the result of the GROUP BY query to use wherever you need it, and you'd only have to run the GROUP BY query once, at least until you get new or updated records.
 

Hello1
Registered User
Hmm, but the date criteria changes often; the user enters it through a form. Wouldn't that create a problem?
 

theDBguy
Staff member
Ah, yes. That complicates things. Sorry.
 

isladogs
MVP / VIP
You do need to index the fields used in the WHERE clause in order to improve performance. But you also need to modify the filters so they can actually use those indexes.

For example, let's assume you index the following fields:
1. ItemDate - filtering for > #1/1/2012# is fine, as it allows the query to use the index
2. Konto - appears to be another date field, but filtering for Left(Konto,4)='2010' Or '2011' Or '2012' prevents Access from using any index you create. Instead, filter for Konto Between #1/1/2010# And #12/31/2012#

Similar arguments may apply to the other fields used in the WHERE filter.
For more information, see the article Optimise Queries on my website.
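
As a minimal sketch, if Konto is actually stored as text rather than as a date (the screenshot attached to the first post should confirm which), the index-friendly equivalent compares the bare field against a string range:

Code:
WHERE ItemDatabase.Konto >= '2010' And ItemDatabase.Konto < '2013'

Either way, the point is the same: the indexed field appears unmodified on one side of the comparison, so the index can be used instead of evaluating Left() against all 6 million rows.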

Also, Date is a reserved word in Access and should never be used as a field name.

EDIT
Is your database split?
With 6 million records, I suspect your data file is getting very large - possibly over 1GB?
It may be a good idea to consider upsizing to SQL Server
 

Hello1
Registered User
Nope, it's not split. It probably should be, but for the time being I have to leave it like this. Indeed, it's over 1GB. I was reading your link today (I saw it on my other thread), and that's when I moved the Group By from the CLng expression to just Konto, but obviously I should read it a few more times.
Thanks, I will try the suggestions now.

Edit: Btw, Konto is a text field, not a date field; I put a screenshot of the table structure in the attachment.

I keep the database on the C: drive of the server PC, and it's not on the SQL Server. Users access the server PC over RDP. Would the database run faster on SQL Server than on the local PC?
 

isladogs
MVP / VIP
Sorry, I had missed the screenshot...
You made the right decision to simplify the GROUP BY clause.
Indexing should make a significant difference to the times... if used properly.

I don't think you've mentioned how many records are found, but are the quoted times the time to load the query results, or do they include the time to scroll to the last record?

Using SQL Server without making other changes that take advantage of its features will not speed up the query; it may well be slower. However, there are many things SQL Server can do to speed up query times if you make sure the processing is done on the server and not in Access.
 

Hello1
Registered User
I saw that indexing is quite recommended, and it should be especially useful for this table because nothing will be added to it for a long time and definitely nothing will be updated.
I'm just not quite sure which fields to index. I tried the date field, but somehow the time to finish the query stayed the same. The query returns about 26,000 records, and that's without the time needed to reach the last record.
Another thing: this is all being tested on my weak dual-core laptop. On the server it takes about 55 seconds to run the whole code; for the query alone maybe 35-40 seconds, I'm not sure.

Edit: currently only the first 4 fields are indexed, plus DocumentType, but that one is not used in the query.
 

isladogs
MVP / VIP
Indexing is STRONGLY recommended.
It can make SELECT queries up to 250 times faster, but it makes UPDATE queries slower, as the indexes need to be updated as well as the field values.

Indexing should be used on all fields regularly used in WHERE, HAVING, GROUP BY or ORDER BY clauses, unless the field only has a limited number of values; e.g. boolean fields should not be indexed.

Your First expression in the SELECT clause is complex. Try to simplify it, or run stacked queries: compute the value in a first query, then use it in a second.

For info, my UK postcodes table has about 50 fields and over 2.6 million records. Its total size is over 1.6GB. However, I can search it in less than a second by indexing carefully.

One last thing: you have a composite PK of four fields.
You may well find it faster to use an autonumber PK field and create a composite index for those four fields. Try it on a backup copy of the DB.
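
A minimal sketch of that last change in Jet DDL, assuming the existing key kept the designer-default constraint name PrimaryKey, and with Field1 to Field4 standing in for the four key fields (the Access query window runs one statement at a time, so execute them separately):

Code:
ALTER TABLE ItemDatabase DROP CONSTRAINT PrimaryKey;
ALTER TABLE ItemDatabase ADD COLUMN ID COUNTER CONSTRAINT PK_ItemDatabase PRIMARY KEY;
CREATE INDEX idxOldCompositeKey ON ItemDatabase (Field1, Field2, Field3, Field4);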
 

Hello1
Registered User
Reaching the last record happens within a second, pretty fast.
Now I've actually noticed that the number of records doesn't match between the two different Group By fields.
But I could fix that by creating a new table into which I export all records from ItemDatabase where the date is 2012 onwards, applying the CLng function but without the Group By, so that I can then Group By CustomerId in the new table?

Edit: I forgot to add the attachments; here they are. Also, I was writing before your last post, so I didn't see it. I will try out the suggestions.
 

Attachments

  • GropByKonto.PNG
  • GropByClng.PNG

Hello1
Registered User
Alright, I made a new table from the above query, a bit modified (table in attachment).
SQL for the make-table query:

Code:
SELECT CLng(Right(CStr([ItemDatabase].[Konto]),5)) AS CustomerId, ItemDatabase.Konto, 
ItemDatabase.ItemDate, 0 AS StartingBalance, ItemDatabase.CustomerDebts, 
ItemDatabase.CustomerPayments, 0 AS InterestDebt INTO ItemDatabase2012_2018

FROM ItemDatabase

WHERE (((ItemDatabase.ItemDate)>#1/1/2012#) AND ((ItemDatabase.CustomerDebts)<>0) 
AND ((Left([Konto],4))='2010' Or (Left([Konto],4))='2011' Or (Left([Konto],4))='2012') 
AND (([OrderId] & "/" & [OrderType])<>"1/PS") AND ((ItemDatabase.ItemType)<>"K")) 
OR (((ItemDatabase.ItemDate)>#1/1/2012#) AND ((ItemDatabase.CustomerPayments)<>0) 
AND ((Left([Konto],4))='2010' Or (Left([Konto],4))='2011' Or (Left([Konto],4))='2012') 
AND (([OrderId] & "/" & [OrderType])<>"1/PS"));

So I'm inserting everything necessary into the new table. Knowing that only ItemDate will be filtered through the form in the second query, I included everything greater than 1/1/2012.
Now that I have my new table with everything necessary in it, I'm ready to remake the old query to Group By from it. As with ItemDatabase, the data in ItemDatabase2012_2018 won't be modified; only once a year will some new records be added manually.

In the new table I only included the fields needed for the new query, and I don't intend to use this table anywhere else, so is that OK? I mean, there are no composite keys or primary keys in it, nor other fields from the old table.

Here is the SQL of the new query with the new table:

Code:
SELECT ItemDatabase2012_2018.CustomerId, First(ItemDatabase2012_2018.Konto) AS Kont,
First(ItemDatabase2012_2018.StartingBalance) AS StartingBalance,
Sum(ItemDatabase2012_2018.CustomerDebts) AS CustomerDebts,
Sum(ItemDatabase2012_2018.CustomerPayments) AS CustomerPayments,
First(ItemDatabase2012_2018.InterestDebt) AS InterestDebt

FROM ItemDatabase2012_2018

WHERE (((ItemDatabase2012_2018.ItemDate)<[forms]![frmBalanceForming]![txtItemDateStart]))
OR (((ItemDatabase2012_2018.ItemDate)<=[forms]![frmBalanceForming]![txtItemDateStart]))

GROUP BY ItemDatabase2012_2018.CustomerId;

Now I'm doing the Group By on CustomerId, because it's no longer a complex expression, and I'm getting the same number of records as the original query. The only WHERE criterion I kept is ItemDate, because I no longer need the others; I got "rid" of them during the table creation.
The query now finishes in 13 seconds. I think that's a decent improvement; hopefully I didn't mess anything up with all this :D

My question now is: do I need a primary key here, or should I also add the 4 composite key fields from my first table? For now I don't see a reason for it, because nothing in the table will be modified, but later, when adding new data, it might matter?
Also, should the fields I Sum on be indexed too?

Thanks!
 

Attachments

  • NewTableItemDatabase2012_2018.PNG

arnelgp
These two criteria are redundant; you can leave one of them out:
Code:
WHERE (((ItemDatabase2012_2018.ItemDate)<[forms]![frmBalanceForming]![txtItemDateStart])) 
OR (((ItemDatabase2012_2018.ItemDate)<=[forms]![frmBalanceForming]![txtItemDateStart]))
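
Since ItemDate <= [txtItemDateStart] already matches every row that ItemDate < [txtItemDateStart] does, the whole clause reduces to just:

Code:
WHERE ItemDatabase2012_2018.ItemDate <= [forms]![frmBalanceForming]![txtItemDateStart]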
Also, you already have an index there: fields with names like "ID" or "num" are indexed automatically.
 

Galaxiom
Super Moderator
Applying any function to the field value in the WHERE clause will cause every record to be processed.

Code:
[OrderId] & "/" & [OrderType] <> "1/PS"

should be expressed as:

Code:
Not ([OrderId] = "1" And [OrderType] = "PS")

This expression can use the indexes on the fields.

In this case the query optimiser might even be smart enough to convert the expression itself, but the principle is universal: always use SARGable query structures and avoid applying any processing to every record.
 

Hello1
Registered User
arnelgp said:
you already have an index there: fields with names like "ID" or "num" are indexed automatically

So every field that is a Number is indexed by default?
These ORs still confuse me when there are multiple criteria.

Galaxiom: thanks, that sped up my query, and another query had the same kind of field. Before the change it needed a little over 6 seconds to execute; after the change it's under 2 seconds.
Probably the index on those fields was important too; I didn't test it without the index.

I managed to speed up some of my other queries as well with all your suggestions. Thanks a lot! :)
 

isladogs
MVP / VIP
Hello1 said:
So every field that is a Number is indexed by default?

No, but ID fields are indexed automatically by Access.

Hello1 said:
These ORs still confuse me when there are multiple criteria.

Arnel's point was that you don't need both lines: <=[forms]![frmBalanceForming]![txtItemDateStart] covers both < and =, so the other line is unnecessary.

6 seconds is a lot better than your original time... 3 minutes?

I suggest you try all the changes in a query on the original table and see how long that now takes.
 

Galaxiom
Super Moderator
isladogs said:
No, but ID fields are indexed automatically by Access.

This behaviour is controlled by a setting:
Options > Object Designers > AutoIndex on Import/Create

The default setting is "ID;key;code;num".

Fields whose names begin or end with any of these strings are automatically indexed on creation. You can enter whatever names you want there.
 

Hello1
Registered User
Hey guys, I just haven't found time to do further testing.
I have another weird, similar problem, though with a different query and different tables involved.

It's a query with two subqueries and one table:

Code:
FROM (qryPocStanjeAll RIGHT JOIN Konto ON qryPocStanjeAll.Konto = Konto.Konto) LEFT JOIN qryPrometAll ON Konto.Konto = qryPrometAll.Kont

If I order by Konto.Konto (a primary key, indexed, text type with length 9), the query executes forever. If I remove the ORDER BY, the query executes in about 4 to 6 seconds, but reaching the last record takes forever.
There are only 2 tables involved in all the queries, and both are linked from SQL Server.
Now, if I run the same query with the ORDER BY from my .mdb database, it executes in about 4-6 seconds and reaches the last record in about a second.

I just don't understand what it could be. Everything is the same, except that in the first situation the tables come from SQL Server and in the second from an .mdb.

The tables should be identical, because I upsized from that same .mdb when I migrated to SQL Server.

I came up with a workaround: I create 2 temporary tables, fill them with the data from the 2 subqueries, and then use those tables instead of the subqueries. That runs fine, but I would like to figure out the core of the problem; it seems to be in the subqueries and the JOINs? However, when I execute the subqueries alone they run pretty fast, on both SQL Server and the .mdb.

Thanks!
 

Hello1
Registered User
When I make a pass-through query, writing the SQL in SQL Server and then pasting it into the Access pass-through query, it runs pretty fast, even faster than the .mdb version. However, this is the first time I've done it this way, and my Access query has an IIf expression, IsNull, and takes some data from unbound form fields, and that's giving me trouble in the SQL Server query; I'm not quite sure how to write it.
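
For reference, a rough sketch of how those Access expressions usually translate to T-SQL (the field and table names here are made up for illustration):

Code:
-- Access:  IIf(IsNull([CustomerPayments]), 0, [CustomerPayments])
-- T-SQL equivalent:
SELECT COALESCE(CustomerPayments, 0) AS CustomerPayments
FROM dbo.ItemDatabase2012_2018
WHERE ItemDate <= '20120101';

Note that a pass-through query runs entirely on SQL Server, so it cannot resolve [forms]![...] references; the usual approach is to build the SQL string in VBA and concatenate the form's value into it (dates formatted as yyyymmdd) before assigning it to the pass-through query.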
 
