Mythbusters - Speed Comparison Tests - Having vs Where (1 Viewer)

isladogs

MVP / VIP
Local time
Today, 08:47
Joined
Jan 14, 2017
Messages
18,186
This is a follow up to a recent thread Desperate-Count Help Needed https://www.access-programmers.co.uk/forums/showpost.php?p=1597549&postcount=8

Part of this included discussion of the comparative efficiency of aggregate queries using WHERE with HAVING.
I thought that Galaxiom explained this very clearly so I've quoted him below:

Code:
SELECT field1, Sum(field2)
FROM table
GROUP BY field1
HAVING field 1 = something

Code:
SELECT field1, Sum(field2)
FROM table
WHERE field 1 = something
GROUP BY field1

The first one is what the Access query designer encourages a novice to build. The second is what the query should be.

The difference is that the WHERE is applied before the GROUP BY while the HAVING is applied after. The first query will group all the records then only return the Having. The second query selects only the "something" records and the Group becomes trivial.

I agreed completely with what was written by Galaxiom in that post.
I decided to adapt my speed comparison test utility to demonstrate this point for future use.

Unfortunately, the results were nothing like as clear cut as I had expected so I did two different versions with different datasets

a) Postcodes
For this test, I used my large Postcodes table with 2.6 million records.
Unfortunately, I cannot upload the database used as the large dataset means it is 240 MB in size

The number of postcodes for selected areas, districts, sectors & zones were counted and the results appended to a data table. There are around 700 aggregated records created

This was done in 2 ways:
1. HAVING - data is grouped then filtered, counted & added to the data table one at a time
Code:
CurrentDb.Execute "INSERT INTO PostcodesCount ( TotalPostcodes, PostcodeArea, PostcodeDistrict,  PostcodeSector, PostcodeZone )" & _
                " SELECT Count(Postcodes.ID) AS TotalPostcodes, Postcodes.PostcodeArea, Postcodes.PostcodeDistrict, Postcodes.PostcodeSector, Postcodes.PostcodeZone" & _
                " FROM Postcodes" & _
                " GROUP BY Postcodes.InUse, Postcodes.PostcodeArea, Postcodes.PostcodeDistrict, Postcodes.PostcodeSector, Postcodes.PostcodeZone," & _
                    " Right([PostcodeDistrict],1), Right([PostcodeSector],1), Right([PostcodeZone],1)" & _
                " HAVING (((Postcodes.InUse)=True)" & _
                " AND ((Postcodes.PostcodeArea) Like 'B*' Or (Postcodes.PostcodeArea) Like 'C*' Or (Postcodes.PostcodeArea) Like 'H*' Or (Postcodes.PostcodeArea) Like '*L')" & _
                " AND ((Right([PostcodeDistrict],1))='0' Or (Right([PostcodeDistrict],1))='3' Or (Right([PostcodeDistrict],1))='5' Or (Right([PostcodeDistrict],1))='8')" & _
                " AND ((Right([PostcodeSector],1))='1' Or (Right([PostcodeSector],1))='2' Or (Right([PostcodeSector],1))='5' Or (Right([PostcodeSector],1))='8')" & _
                " AND ((Right([PostcodeZone],1))='A' Or (Right([PostcodeZone],1))='N' Or (Right([PostcodeZone],1))='Q' Or (Right([PostcodeZone],1))='X'));"

2. WHERE - data is filtered then grouped, counted & added one at a time

Code:
 CurrentDb.Execute "INSERT INTO PostcodesCount ( TotalPostcodes, PostcodeArea, PostcodeDistrict, PostcodeSector, PostcodeZone )" & _
                " SELECT Count(Postcodes.ID) AS TotalPostcodes, Postcodes.PostcodeArea, Postcodes.PostcodeDistrict, Postcodes.PostcodeSector, Postcodes.PostcodeZone" & _
                " FROM Postcodes" & _
                " WHERE (((Postcodes.InUse)=True)" & _
                " AND ((Postcodes.PostcodeArea) Like 'B*' Or (Postcodes.PostcodeArea) Like 'C*' Or (Postcodes.PostcodeArea) Like 'H*' Or (Postcodes.PostcodeArea) Like '*L')" & _
                " AND ((Right([PostcodeDistrict],1))='0' Or (Right([PostcodeDistrict],1))='3' Or (Right([PostcodeDistrict],1))='5' Or (Right([PostcodeDistrict],1))='8')" & _
                " AND ((Right([PostcodeSector],1))='1' Or (Right([PostcodeSector],1))='2' Or (Right([PostcodeSector],1))='5' Or (Right([PostcodeSector],1))='8')" & _
                " AND ((Right([PostcodeZone],1))='A' Or (Right([PostcodeZone],1))='N' Or (Right([PostcodeZone],1))='Q' Or (Right([PostcodeZone],1))='X'))" & _
                " GROUP BY Postcodes.InUse, Postcodes.PostcodeArea, Postcodes.PostcodeDistrict, Postcodes.PostcodeSector, Postcodes.PostcodeZone," & _
                    " Right([PostcodeDistrict],1), Right([PostcodeSector],1), Right([PostcodeZone],1);"

The tests were each run 10 times and the average calculated for each test



Unexpectedly using HAVING was slightly faster but the difference in times was negligible


b) Patients
I then decided to try again adapting the dataset from my recent patient login kiosk mode example.
I imported the same 3300 records 9 times over to give a dataset of around 30000 records.
Duplicating patient names and dates of birth doesn’t matter for these tests.
Each record includes gender & date of birth (both indexed).

The database for this test is small enough to upload (attached)

Code was used to count the number of records grouping by birth day & birth month as well as gender and appended these record counts to another table.
The tests each created just over 700 records on each run.
As there were fewer records in this dataset, the code looped through each 3 times in each test
All tests were repeated 5 times and average values calculated


This time, there were 3 different tests:
1. HAVING - data is first grouped then filtered, counted and appended one at a time
Code:
For I = 1 To LC 'loop count
        For M = 1 To 12 'month count
            For D = 1 To 31 'day count
                db.Execute "INSERT INTO tblData ( RecordCount, Gender, BirthDateMonth )" & _
                    " SELECT Count(tblPatients.PatientID) AS CountOfPatientID, tblPatients.Gender,     Left([DOB],5) AS BirthDateMonth" & _
                    "  FROM tblPatients" & _
                    " GROUP BY tblPatients.Gender, Left([DOB],5), Month([DOB]), Day([DOB])" & _
                    " HAVING (((Month([DOB]))=" && M && ") AND ((Day([DOB]))=" && D && "));"
           Next D
        Next M
    Next I

2. WHERE – data is first filtered then grouped, counted and appended one at a time
Code:
    For I = 1 To LC 'loop count
        For M = 1 To 12 'month count
            For D = 1 To 31 'day count
                db.Execute "INSERT INTO tblData ( RecordCount, Gender, BirthDateMonth )" & _
                   " SELECT Count(tblPatients.PatientID) AS CountOfPatientID, tblPatients.Gender, Left([DOB],5) AS BirthDateMonth" & _
                   "  FROM tblPatients" & _
                   " WHERE (((Month([DOB]))=" && M && ") AND ((Day([DOB]))=" && D && "))" & _
                   " GROUP BY tblPatients.Gender, Left([DOB],5), Month([DOB]), Day([DOB]);"
            Next D
        Next M
    Next I

3. INSERT – data is grouped then all records were appended ‘at once’
Code:
For I = 1 To LC 'loop count
                db.Execute "INSERT INTO tblData ( RecordCount, BirthDateMonth, Gender )" & _
                    " SELECT Count(tblPatients.PatientID) AS RecordCount, Left([DOB],5) AS BirthDateMonth, tblPatients.Gender" & _
                    " FROM tblPatients" & _
                    " GROUP BY Left([DOB],5), tblPatients.Gender, Day([DOB]), Month([DOB])" & _
                    " ORDER BY tblPatients.Gender, Left([DOB],5);"
    Next I

Results were as follows


In these tests WHERE was faster than HAVING but the difference between them was once again negligible.
INSERT is of course dramatically faster than either of the other 2 methods as it all happens ‘at once’ rather than 'row by agonising row' (RBAR) in a recordset loop


=======================================================

Incidentally, I ran 4 different versions of the HAVING tests to compare the relative efficiency of each
All gave EXACTLY the same results but the times were VERY different
The comparative speeds were this time what I had expected

Method 1 (as above) - using Month & Day functions - approx. 78s for 3 loops
Code:
db.Execute "INSERT INTO tblData ( RecordCount, Gender, BirthDateMonth )" & _
                    " SELECT Count(tblPatients.PatientID) AS RecordCount, tblPatients.Gender, Left([DOB],5) AS BirthDateMonth" & _
                    "  FROM tblPatients" & _
                    " GROUP BY tblPatients.Gender, Left([DOB],5), Month([DOB]), Day([DOB])" & _
                    " HAVING (((Month([DOB]))=" & M & ") AND ((Day([DOB]))=" & D & "));"

Method 2 - as method 1 but also formatting dates by mm/dd/yyyy - approx. 298s for 3 loops
Code:
db.Execute "INSERT INTO tblData ( RecordCount, Gender, BirthDateMonth )" & _
                    " SELECT Count(tblPatients.PatientID) AS RecordCount, tblPatients.Gender, Left([DOB],5) AS BirthDateMonth" & _
                    "  FROM tblPatients" & _
                    " GROUP BY tblPatients.Gender, Left([DOB],5), Month(Format([DOB],'mm/dd/yyyy')), Day(Format([DOB],'mm/dd/yyyy'))" & _
                    " HAVING (((Month(Format([DOB],'mm/dd/yyyy')))=" & M & ") AND ((Day(Format([DOB],'mm/dd/yyyy')))=" & D & "));"

Method 3 - using Left & Mid functions - approx. 170s for 3 loops
Code:
db.Execute "INSERT INTO tblData ( RecordCount, Gender, BirthDateMonth )" & _
                    " SELECT Count(tblPatients.PatientID) AS RecordCount, tblPatients.Gender, Left([DOB],5) AS BirthDateMonth" & _
                    "  FROM tblPatients" & _
                    " GROUP BY tblPatients.Gender, Left([DOB],5), Mid([DOB],4,2), Left([DOB],2)" & _
                    " HAVING ((Int(Mid([DOB],4,2))=" & M & ") AND (Int(Left([DOB],2))=" & D & "));"

Method 4 – using Date Part functions – approx. 160s for 3 loops
Code:
db.Execute "INSERT INTO tblData ( RecordCount, Gender, BirthDateMonth )" & _
                    " SELECT Count(tblPatients.PatientID) AS RecordCount, tblPatients.Gender, Left([DOB],5) AS BirthDateMonth" & _
                    "  FROM tblPatients" & _
                    " GROUP BY tblPatients.Gender, Left([DOB],5), DatePart('m',[DOB]), DatePart('d',[DOB])" & _
                    " HAVING ((DatePart('m',[DOB])=" & M & ") AND (DatePart('d',[DOB])=" & D & "));"

========================================================

Overall Conclusions
The HAVING vs WHERE tests were done to provide evidence that the use of HAVING is far less efficient than using WHERE.
Logically this MUST be the case
However, the tests haven’t worked out anything like I had expected.

Now it may be that my code is flawed or my datasets not sufficiently large to make the relative efficiency of each test create significant differences.
:banghead:

OR I've just managed to show that this commonly expressed viewpoint is actually not true or has minimal effect. :confused:

I would appreciate others running the same tests and providing feedback on the results obtained.

I would also appreciate comments on the tests themselves together with any suggested improvements or alterations to the process.
 

Attachments

  • TestResults Having vs Where - 6A.PNG
    TestResults Having vs Where - 6A.PNG
    58.8 KB · Views: 812
  • TestResults Having vs Where - 6B.PNG
    TestResults Having vs Where - 6B.PNG
    55.8 KB · Views: 771
  • SpeedComparisonTest_v6B_HavingWhere.zip
    887.7 KB · Views: 360
Last edited:

CJ_London

Super Moderator
Staff member
Local time
Today, 08:47
Joined
Feb 19, 2013
Messages
16,553
Re: Speed Comparison Tests - Having vs Where

Like you, I've always believed WHERE provides a better optimisation than HAVING, but according to this thread, the execution plan is the same

https://stackoverflow.com/questions/328636/which-sql-statement-is-faster-having-vs-where

OR I've just managed to show that this commonly expressed viewpoint is actually not true or has minimal effect.
looks like not true:)

Ran your patients test and came out with results circa 44 seconds, also with a very small difference between the two methods.
 

isladogs

MVP / VIP
Local time
Today, 08:47
Joined
Jan 14, 2017
Messages
18,186
Re: Speed Comparison Tests - Having vs Where

Good to see you get similar results to me (except for the fact yours complete in less than 2/3 of the time that mine do) :mad:

The SO link you provided was certainly interesting ....

It seems that getting unexpected results with these speed tests is getting to be a habit ....

Various tests done previously are available from my website: http://www.mendipdatasystems.co.uk/speed-comparison-tests/4594424200 and in various places here at AWF
1. Handling nulls: Trim / Len / Nz
2. CurrentDB vs DBEngine(0)(0)
3. DoEvents vs DBIdle.RefreshCache
 

CJ_London

Super Moderator
Staff member
Local time
Today, 08:47
Joined
Feb 19, 2013
Messages
16,553
Re: Speed Comparison Tests - Having vs Where

except for the fact yours complete in less than 2/3 of the time that mine do
i7 processor;)
 

Galaxiom

Super Moderator
Staff member
Local time
Today, 19:47
Joined
Jan 20, 2009
Messages
12,849
Re: Speed Comparison Tests - Having vs Where

Well there is a long held illusion gone.:eek: I adopted it very long ago when I saw a performance improvement in a query I changed.

Goes to show the importance of replicating experimental results in any science and ensuring the observations are not due to unexpected external factors.

Thanks to Colin for thorough testing.
 

jdraw

Super Moderator
Staff member
Local time
Today, 04:47
Joined
Jan 23, 2006
Messages
15,361
Re: Speed Comparison Tests - Having vs Where

My results

 

Attachments

  • speedHavingWhereInsert.PNG
    speedHavingWhereInsert.PNG
    57.4 KB · Views: 686

NauticalGent

Ignore List Poster Boy
Local time
Today, 04:47
Joined
Apr 27, 2015
Messages
6,280
Re: Speed Comparison Tests - Having vs Where

Colin, these tests of yours are simply fantastic. Thanks and please continue, Mythbusters has always been one of my favorite shows and this ranks right up there....
 

isladogs

MVP / VIP
Local time
Today, 08:47
Joined
Jan 14, 2017
Messages
18,186
@NG
Just renamed the thread based on your comment :)
The irony here is I wasn't intending it to bust any myths

I already have another planned. Watch this space....

@jdraw
Thanks for sharing your results. Good to know someone has a PC slower than mine!
 

Minty

AWF VIP
Local time
Today, 08:47
Joined
Jul 26, 2013
Messages
10,353
I'll chime in and concur. I had always changed Having to Where if it appeared during a quick querybuild.
Interesting that the Insert takes longer on my "Spaceship spec" laptop.:confused:


EDIT : As an addendum obviously in all the cases where the Having clause can be moved to a Where clause without altering the result set the optimizer is (surprisingly to most of us) doing its job.

If you were to apply the Having to a aggregated result e.g. a Sum(Your field) > 100 then the clause can't be moved, and the "slowness" will apply.
 

Attachments

  • HavingWhereSpeeds.jpg
    HavingWhereSpeeds.jpg
    60.4 KB · Views: 655
Last edited:

isladogs

MVP / VIP
Local time
Today, 08:47
Joined
Jan 14, 2017
Messages
18,186
Yessss!
My old PC is faster than Minty's dream machine, at least for the Insert.

Actually this may be due to the limitations of using system time for precise timing.
This only happens about 60-100 times per second so accuracy is really only to+ or -0.016s.
For larger times, that really isn't significant especially if you do an average of several tests.
However for short time intervals it can be proportionately significant.
In fact if you try the Insert for a single run (loop=1), you may get a negative time recorded.

However if yours really is slower on the Insert test than others are getting, that really would be a mythbuster.

As an aside, I really wish inline images would appear smaller by default at AWF as they do at other forums so you click them for an enlarged view.
Is there a setting for this or shall I suggest it to Jon?
 

isladogs

MVP / VIP
Local time
Today, 08:47
Joined
Jan 14, 2017
Messages
18,186
For info, I have just posted another thread in the code repository explaining the use of the little documented Jet ShowPlan feature to optimise queries / VBA SQL statements. See https://www.access-programmers.co.uk/forums/showthread.php?t=302257

I have also used this particular test as an example.
It should make clear why the times for the HAVING & WHERE tests are almost identical despite going against long held beliefs by many of us. The reality is that no matter how you write your query, Access will try to optimise it so it seems WHERE / HAVING are treated the same

Its also worth pointing out that even Allen Browne got it wrong. See http://allenbrowne.com/QueryPerfIssue.html

The following is a quote from that web page which I read years ago and had until now believed to be the case

WHERE versus HAVING
Totals queries (those with a GROUP BY clause) can have both a WHERE clause and a HAVING clause. The WHERE is executed first - before aggregation; the HAVING is executed afterwards - when the totals have been calculated. It makes sense, then, to put your criteria in the WHERE clause, and use the HAVING clause only when you must apply criteria on the aggregated totals.

This is not obvious in the Access query designer. When you add a field to the design grid, Access sets the Total row to Group By, and the temptation is type your criteria under that. If you do, the criteria end up in the HAVING clause. To use the WHERE clause, add the field to the grid a second time, and choose Where in the Total row.
 

theDBguy

I’m here to help
Staff member
Local time
Today, 01:47
Joined
Oct 29, 2018
Messages
21,357
Hi,


Just curious... Would it be possible that Access is smart enough to convert the HAVING clause into a WHERE clause when it doesn't use any aggregation, so that's why the two SQL statements ran almost the same?


Just wondering...
 

CJ_London

Super Moderator
Staff member
Local time
Today, 08:47
Joined
Feb 19, 2013
Messages
16,553
be interesting to see the execution plan for a query that groups a number of fields (i.e. no aggregation) and SELECT DISTINCT.
 

isladogs

MVP / VIP
Local time
Today, 08:47
Joined
Jan 14, 2017
Messages
18,186
Would it be possible that Access is smart enough to convert the HAVING clause into a WHERE clause when it doesn't use any aggregation, so that's why the two SQL statements ran almost the same?

Well it definitely optimises whatever way it thinks is best....

be interesting to see the execution plan for a query that groups a number of fields (i.e. no aggregation) and SELECT DISTINCT.

As requested, two grouped queries with SELECT DISTINCT - one HAVING & the other WHERE

HAVING (Query1H)
Code:
SELECT DISTINCT Postcodes.PostcodeArea, Postcodes.PostcodeDistrict, Postcodes.PostcodeSector
FROM Postcodes
GROUP BY Postcodes.PostcodeArea, Postcodes.PostcodeDistrict, Postcodes.PostcodeSector, Postcodes.InUse
HAVING (((Postcodes.InUse)=True));

WHERE (Query 1W)
Code:
SELECT DISTINCT Postcodes.PostcodeArea, Postcodes.PostcodeDistrict, Postcodes.PostcodeSector
FROM Postcodes
WHERE (((Postcodes.InUse)=True))
GROUP BY Postcodes.PostcodeArea, Postcodes.PostcodeDistrict, Postcodes.PostcodeSector;

The 2 execution plans are attached
What I need to study is why both versions repeatedly scan a table (tblKickout) that is nothing to do with the query
But that's a job for another day .. unless either of you can enlighten me!
 

Attachments

  • showplan2.zip
    577 bytes · Views: 330

CJ_London

Super Moderator
Staff member
Local time
Today, 08:47
Joined
Feb 19, 2013
Messages
16,553
Hi Colin,

Sorry, wasn't very clear:eek: - what I meant was


Code:
SELECT DISTINCT Postcodes.PostcodeArea, Postcodes.PostcodeDistrict, Postcodes.PostcodeSector
FROM Postcodes


versus

Code:
SELECT Postcodes.PostcodeArea, Postcodes.PostcodeDistrict, Postcodes.PostcodeSector
FROM Postcodes
GROUP BY Postcodes.PostcodeArea, Postcodes.PostcodeDistrict, Postcodes.PostcodeSector

What I need to study is why both versions repeatedly scan a table (tblKickout) that is nothing to do with the query
But that's a job for another day .. unless either of you can enlighten me!
just a guess but is there a relationship between tblKickout and postcodes?
 

isladogs

MVP / VIP
Local time
Today, 08:47
Joined
Jan 14, 2017
Messages
18,186
Hi CJ

There is no relationship between Postcodes & tblKickout
The Postcodes tables was a linked table from my UK Postal Addresses app which includes a forced closedown procedure. Every 30s, that app checks the status of a boolean field in tblKickout and shuts down if its true.
I just didn't expect that to be included in the ShowPlan as I didn't link that table
I assume its included as its being checked in the background all the time

To avoid complications, I'll use a local table this time (tblPatients from the Having vs Where test)

Query1
Code:
SELECT DISTINCT tblPatients.Surname, tblPatients.Forename, tblPatients.Gender, tblPatients.DOB
FROM tblPatients;
ShowPlan1
--- Query1 ---

- Inputs to Query -
Table 'tblPatients'
- End inputs to Query -

store result in temporary table

Query2
Code:
SELECT tblPatients.Surname, tblPatients.Forename, tblPatients.Gender, tblPatients.DOB
FROM tblPatients
GROUP BY tblPatients.Surname, tblPatients.Forename, tblPatients.Gender, tblPatients.DOB;

ShowPlan2
--- Query2 ---

- Inputs to Query -
Table 'tblPatients'
- End inputs to Query -

01) Group table 'tblPatients'

Not sure that tells you anything unexpected...
As you would expect, both give the same output of 3338 records
 
Last edited:

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Today, 03:47
Joined
Feb 28, 2001
Messages
26,996
In regards to this test, I'm obviously behind the main thrust here, partly because I'm busy taking care of my wife after here surgery. However, a fine point comes to mind.

Does the test make a bigger difference (or ANY difference) when you do the comparison with vs. without indexing? I.e. is it possible that the old "HAVING clause" suggestion is correct for unindexed tables but less important when working with indexed tables?

Pat Hartman asked about whether the queries that break up dates were adequate to negate the effect of the indexes. If you REALLY want to negate the index effect, remove them. Otherwise you are left with doubt. Rather obviously, the 2.8 million Post Code case is one where you really wouldn't want to remove indexes, but the Patient list might be more reasonable.
 

isladogs

MVP / VIP
Local time
Today, 08:47
Joined
Jan 14, 2017
Messages
18,186
Hi Doc
The issue is indeed with indexes but not quite in the way you described.

Anyway, time for a large slice of humble pie. The tests were flawed!

I first became suspicious when I did similar tests as part of my Optimising Queries article:

At that occasion, I found WHERE was faster and also used the Jet ShowPlan feature to analyse exactly what was happening

I revisited these tests well over a year ago after a private message exchange with @Pat Hartman.
Pat suggested the tests might not actually be using the indexing ...and she was indeed correct!

However, I've only just got around to updating my website article today:

The issue with both sets of tests was that although the fields were all indexed, the queries had been constructed in a way that meant the indexing wasn't being used. For example, the postcodes queries were:

HAVING:
Rich (BB code):
INSERT INTO PostcodesCount ( TotalPostcodes, PostcodeArea, PostcodeDistrict, PostcodeSector, PostcodeZone )
SELECT Count(Postcodes.ID) AS CountOfID, Postcodes.PostcodeArea, Postcodes.PostcodeDistrict, Postcodes.PostcodeSector, Postcodes.PostcodeZone
FROM Postcodes
GROUP BY Postcodes.PostcodeArea, Postcodes.PostcodeDistrict, Postcodes.PostcodeSector, Postcodes.PostcodeZone, Postcodes.InUse, Year([Introduced])
HAVING (((Postcodes.InUse)=True) AND ((Year([Introduced])) Between 1985 And 1996));

WHERE:
Rich (BB code):
INSERT INTO PostcodesCount ( TotalPostcodes, PostcodeArea, PostcodeDistrict, PostcodeSector, PostcodeZone )
SELECT Count(Postcodes.ID) AS CountOfID, Postcodes.PostcodeArea, Postcodes.PostcodeDistrict, Postcodes.PostcodeSector, Postcodes.PostcodeZone
FROM Postcodes
WHERE (((Postcodes.InUse)=True) AND ((Year([Introduced])) Between 1985 And 1996))
GROUP BY Postcodes.PostcodeArea, Postcodes.PostcodeDistrict, Postcodes.PostcodeSector, Postcodes.PostcodeZone;

The important part is shown in BOLD RED text

The 'non-indexed' filter meant that the whole dataset had to be scanned in each case - so the time difference was minimal

AvgResultsNonIndexed6A.png


The Jet ShowPlan output clearly shows a full dataset scan being used

ShowPlanNonIndexedWHERE6A.png


So I amended both queries to use the indexes correctly:

HAVING:
Rich (BB code):
INSERT INTO PostcodesCount ( TotalPostcodes, PostcodeArea, PostcodeDistrict, PostcodeSector, PostcodeZone )
SELECT Count(Postcodes.ID) AS CountOfID, Postcodes.PostcodeArea, Postcodes.PostcodeDistrict, Postcodes.PostcodeSector, Postcodes.PostcodeZone
FROM Postcodes
GROUP BY Postcodes.PostcodeArea, Postcodes.PostcodeDistrict, Postcodes.PostcodeSector, Postcodes.PostcodeZone, Postcodes.InUse, Postcodes.Introduced
HAVING (((Postcodes.InUse)=True) AND ((Postcodes.Introduced) Between #1/1/1985# And #12/31/1996#));

WHERE:
Rich (BB code):
INSERT INTO PostcodesCount ( TotalPostcodes, PostcodeArea, PostcodeDistrict, PostcodeSector, PostcodeZone )
SELECT Count(Postcodes.ID) AS CountOfID, Postcodes.PostcodeArea, Postcodes.PostcodeDistrict, Postcodes.PostcodeSector, Postcodes.PostcodeZone
FROM Postcodes
WHERE (((Postcodes.InUse)=True) AND ((Postcodes.Introduced) Between #1/1/1985# And #12/31/1996#))
GROUP BY Postcodes.PostcodeArea, Postcodes.PostcodeDistrict, Postcodes.PostcodeSector, Postcodes.PostcodeZone;

The amended filter meant that Rushmore query optimisation was used in both cases so the times for each were significantly reduced ....
AND now WHERE was now almost 20% faster than HAVING

AvgResultsIndexed6A.png


Thus is the updated JETShowPlan output:

ShowPlanIndexedWHERE6A.png


For full details please read my revised web article (link given above)

My apologies for giving the wrong conclusions earlier.
From reading various threads at StackOverflow & elsewhere, many others have made the same mistake.

Anyway, this really is a case of a myth being re-busted!
 
Last edited:

CJ_London

Super Moderator
Staff member
Local time
Today, 08:47
Joined
Feb 19, 2013
Messages
16,553
We often see OP's using aggregate queries without any aggregation rather than SELECT DISTINCT - do you have an example to compare a SELECT DISTINCT v GROUP BY with perhaps a criteria or two?
 

Users who are viewing this thread

Top Bottom