Go Back   Access World Forums > Microsoft Access Discussion > Tables

 
Reply
 
Thread Tools Rate Thread Display Modes
Old 09-13-2018, 06:15 AM   #1
BeeJayEff
Newly Registered User
 
Join Date: Sep 2013
Location: Somerset, UK
Posts: 198
Thanks: 37
Thanked 10 Times in 10 Posts
BeeJayEff is on a distinguished road
Indexing

One of my tables includes a pair of Yes/No fields : [SoldFlag] and [ReturnedFlag]. There are a number of queries which use these flags to search for items which are (Sold) or which are (Sold and not Returned) for example. I am not aware of any performance issues with these queries.

Running the Performance Analyzer results in recommendations that both these fields should be indexed, whereas I was of the belief that it was generally not a good idea to index fields which had very few possible values. To add some numbers/context, the table has nearly 400,000 records, of which 30,000 are Sold and of those, 800 have been Returned.

Should I index or not ?

BeeJayEff is offline   Reply With Quote
Old 09-13-2018, 06:28 AM   #2
arnelgp
Registered User
 
arnelgp's Avatar
 
Join Date: May 2009
Location: somewhere out there
Posts: 6,839
Thanks: 55
Thanked 2,183 Times in 2,094 Posts
arnelgp is just really nice arnelgp is just really nice arnelgp is just really nice arnelgp is just really nice
Re: Indexing

i agree with you.
__________________
"Never stop learning, because life never stops teaching"
arnelgp is offline   Reply With Quote
Old 09-13-2018, 07:31 AM   #3
The_Doc_Man
Happy Retired Curmudgeon
 
Join Date: Feb 2001
Location: Suburban New Orleans, LA, USA
Posts: 12,568
Thanks: 62
Thanked 1,196 Times in 1,096 Posts
The_Doc_Man is a splendid one to behold The_Doc_Man is a splendid one to behold The_Doc_Man is a splendid one to behold The_Doc_Man is a splendid one to behold The_Doc_Man is a splendid one to behold The_Doc_Man is a splendid one to behold The_Doc_Man is a splendid one to behold
Re: Indexing

The idea of to index or not to index (aye, THAT is the question) is that you need to know if there is a big enough benefit to outweigh the increased maintenance costs associated with the index that MUST be updated every time you add or delete a record and every time you change the indexed value for a record.

In the absence of information regarding value distribution, the cardinality (expected return percentage on a query of an indexed field) is 50% for T/F data. As tables get really big, the cost of the index gets really big and there is no really clear advantage to that index, taking overhead into account.

You, however, have information regarding distribution and you therefore know that you expect different results than the usual 50% predicted by purely statistical expectation. Therefore, this is the question you must ask:

How often will I look for the records for which those flags are FALSE?

If you will always and only look for TRUE cases, then your expected cardinality is asymmetric and skews the index/not index decision. If you expect 30K out of 400K to be TRUE then your TRUE expectation is 7.5% and your FALSE expectation is 92.5%. SO if your expectation of query "direction" is always for the TRUE case, you gain 92.5% of the time by having a query. (This is a simplified approach, not rigorously derived.)

Your other field expects 800 out of 400K which is 0.2% TRUE and 99.8% false. Again, if the queries are always in favor of seeking TRUE then you win 99.8% of the time by using an index.

The third case is the combination of the SOLD/RETURNED treated sequentially (i.e. first find SOLD then find RETURNED from that) which is 800 out of 30K which is 8/300 = 6.66% expectation of TRUE, and that gives you 93.33% improvement. Although, to be honest, SQL would probably not do them sequentially if both fields appeared in the same query.

Given that you have uneven expectations, the nature of your expected queries and searches will determine whether an index on a Boolean field is useful. Ask yourself how often you will search for TRUE and how often you search for FALSE. Then consider my comments in making your decision.

Remember, if the direction of query selection is equal between TRUE and FALSE, you lose the advantage because you in essence negated your knowledge of skewed distribution.

Hope this helps you to make your decision.

__________________
I'm a certified grandpa (3 times now) and proud of it.
Retired over one year and survived being home all day with the wife. She must really love me.
If I have helped you, please either click the thanks or click the scales.
The_Doc_Man is offline   Reply With Quote
The Following 2 Users Say Thank You to The_Doc_Man For This Useful Post:
BeeJayEff (09-13-2018), sonic8 (09-13-2018)
Old 09-13-2018, 08:24 AM   #4
BeeJayEff
Newly Registered User
 
Join Date: Sep 2013
Location: Somerset, UK
Posts: 198
Thanks: 37
Thanked 10 Times in 10 Posts
BeeJayEff is on a distinguished road
Re: Indexing

Quote:
Originally Posted by The_Doc_Man View Post
Hope this helps you to make your decision.
Awwww, yes it does but it means I need to put more thought into it ! I suspect that by far the most common search is for (not Sold), followed by (Sold and not Returned), followed by (Sold and Returned). There should be no cases of ((not Sold) and Returned). So I need to look at the frequency of searches as well as the distribution of values.
BeeJayEff is offline   Reply With Quote
Old 09-13-2018, 11:11 AM   #5
Pat Hartman
Super Moderator
 
Join Date: Feb 2002
Location: Stratford,Ct USA
Posts: 27,448
Thanks: 13
Thanked 1,422 Times in 1,355 Posts
Pat Hartman is a glorious beacon of light Pat Hartman is a glorious beacon of light Pat Hartman is a glorious beacon of light Pat Hartman is a glorious beacon of light Pat Hartman is a glorious beacon of light Pat Hartman is a glorious beacon of light
Re: Indexing

There is another factor at play here. YOU don't build the execution plan for the query, the query engine does. The query engine has rules it uses to determine whether or not to even use an existing index and I believe that even if you create this index, the query engine will not use it since the engine will know that there are only two possible values.

Personally, I don't use this type of flag. I would have used two date fields since that conveys the Y/N info AND provides the date the action happened. I don't know if Access maintains statistics for the count of Y values for each Y/N field. If it does, it would know whether or not it makes sense to use the index.
__________________
Bridge Players Still Know All the Tricks
Pat Hartman is offline   Reply With Quote
Old 09-13-2018, 11:58 AM   #6
The_Doc_Man
Happy Retired Curmudgeon
 
Join Date: Feb 2001
Location: Suburban New Orleans, LA, USA
Posts: 12,568
Thanks: 62
Thanked 1,196 Times in 1,096 Posts
The_Doc_Man is a splendid one to behold The_Doc_Man is a splendid one to behold The_Doc_Man is a splendid one to behold The_Doc_Man is a splendid one to behold The_Doc_Man is a splendid one to behold The_Doc_Man is a splendid one to behold The_Doc_Man is a splendid one to behold
Re: Indexing

Good point, Pat. I said what I said because I am not 100% sure that Access would bypass the index. (NOT saying categorically that it will go one way or the other.) However, there IS another possibility: Change the two Y/N fields to a single encoded byte integer field for the states NOT SOLD, SOLD, RETURNED (since they are mutually exclusive when viewed that way). In that case, the index would work.

However, BeeJayEff is concerned that his queries all run the wrong way anyway, and that would defeat the algorithmic value of the information about skewed distribution.
__________________
I'm a certified grandpa (3 times now) and proud of it.
Retired over one year and survived being home all day with the wife. She must really love me.
If I have helped you, please either click the thanks or click the scales.
The_Doc_Man is offline   Reply With Quote
Old 09-13-2018, 12:46 PM   #7
Pat Hartman
Super Moderator
 
Join Date: Feb 2002
Location: Stratford,Ct USA
Posts: 27,448
Thanks: 13
Thanked 1,422 Times in 1,355 Posts
Pat Hartman is a glorious beacon of light Pat Hartman is a glorious beacon of light Pat Hartman is a glorious beacon of light Pat Hartman is a glorious beacon of light Pat Hartman is a glorious beacon of light Pat Hartman is a glorious beacon of light
Re: Indexing

I was going to suggest that but you still have only three values and that isn't enough to solve the problem so I wouldn't combine the fields since that just causes other issues.

__________________
Bridge Players Still Know All the Tricks
Pat Hartman is offline   Reply With Quote
Old 09-13-2018, 02:35 PM   #8
CJ_London
Super Moderator
 
Join Date: Feb 2013
Location: UK
Posts: 10,463
Thanks: 40
Thanked 3,381 Times in 3,276 Posts
CJ_London is just really nice CJ_London is just really nice CJ_London is just really nice CJ_London is just really nice CJ_London is just really nice
Re: Indexing

other thing to do is just to add the indexes and see how it performs, not just for extracting data, but also for inserting as well. If there is no perceptible improvement, you have your answer. Frequently you will have other criteria which will have a much bigger impact on performance

I've not done any trials to prove it but believe a faster index is achieved if you have the situation where true (or any value for that matter) is only applied to a relatively small percentage of records - set the index to ignore nulls and then populate with 1 for true and null for false - and set the number type to byte if you want to make it even smaller

It might mean playing around with presentation a bit on returned records depending on controls used

-nz(yesnofield,0)


or if predominately expecting Nos, populate with 0 and use

-nz(yesnofield,1)

also worthwhile ignoring nulls on fields which are often not populated but should be indexed, perhaps middle names or a FK link to a spouse record (referential integrity will not be affected)
__________________
CJ_London
_______________________
A little thanks goes a long way. If you have found this post useful, please tick the thanks button
CJ_London is offline   Reply With Quote
Old 09-13-2018, 02:53 PM   #9
isladogs
Part time moderator
 
isladogs's Avatar
 
Join Date: Jan 2017
Location: Somerset, UK
Posts: 7,350
Thanks: 92
Thanked 1,811 Times in 1,686 Posts
isladogs is a glorious beacon of light isladogs is a glorious beacon of light isladogs is a glorious beacon of light isladogs is a glorious beacon of light isladogs is a glorious beacon of light
Re: Indexing

Some months ago I tried indexing in a similar situation to what you describing.
One such was a Boolean field where approx. 80% of 2.6 million records were true.

Results
1. Table size increased due to indexing thus increasing overall database size.
2. Searches using SELECT were approximately the same speed as before. No real benefit.
3. UPDATE queries took MUCH longer on that field or using that field.
IIRC, approx. 60s instead of 40s

I removed the index.

By contrast indexing definitely helped improve searches where fields had multiple values. In many cases, results were obtained n less than 10% of the time with no indices.
__________________
If this answer has helped, please click the Thanks button and/or click the 'reputation scales' symbol on the left.

Website links:
To view links or images in signatures your post count must be 10 or greater. You currently have 0 posts.
,
To view links or images in signatures your post count must be 10 or greater. You currently have 0 posts.
,
To view links or images in signatures your post count must be 10 or greater. You currently have 0 posts.
,
To view links or images in signatures your post count must be 10 or greater. You currently have 0 posts.


Colin
Previously known as ridders : Access 2010 32-bit, Access 2016 32-bit & 64-bit, SQL Server Express 2014, Windows 10,
To view links or images in signatures your post count must be 10 or greater. You currently have 0 posts.
,
To view links or images in signatures your post count must be 10 or greater. You currently have 0 posts.
isladogs is offline   Reply With Quote
Old 09-13-2018, 03:03 PM   #10
CJ_London
Super Moderator
 
Join Date: Feb 2013
Location: UK
Posts: 10,463
Thanks: 40
Thanked 3,381 Times in 3,276 Posts
CJ_London is just really nice CJ_London is just really nice CJ_London is just really nice CJ_London is just really nice CJ_London is just really nice
Re: Indexing

did you try the ignore nulls basis? - in your case populate when false, null when true
__________________
CJ_London
_______________________
A little thanks goes a long way. If you have found this post useful, please tick the thanks button
CJ_London is offline   Reply With Quote
Old 09-13-2018, 06:55 PM   #11
The_Doc_Man
Happy Retired Curmudgeon
 
Join Date: Feb 2001
Location: Suburban New Orleans, LA, USA
Posts: 12,568
Thanks: 62
Thanked 1,196 Times in 1,096 Posts
The_Doc_Man is a splendid one to behold The_Doc_Man is a splendid one to behold The_Doc_Man is a splendid one to behold The_Doc_Man is a splendid one to behold The_Doc_Man is a splendid one to behold The_Doc_Man is a splendid one to behold The_Doc_Man is a splendid one to behold
Re: Indexing

CJ, the problem is that BeeJayEff suggests he would be using the FALSE case significantly, so using the nulls for FALSE should actually cost you more time, since you have that NZ function floating around to handle the NULL cases. Not to mention that intentionally inserting NULL in a field is problematic. Among other things, you have to assure that the "Required" flag is FALSE in the field definition, because if it isn't you CAN'T insert a NULL. You would get a constraint violation.
__________________
I'm a certified grandpa (3 times now) and proud of it.
Retired over one year and survived being home all day with the wife. She must really love me.
If I have helped you, please either click the thanks or click the scales.
The_Doc_Man is offline   Reply With Quote
Old 09-14-2018, 01:26 AM   #12
CJ_London
Super Moderator
 
Join Date: Feb 2013
Location: UK
Posts: 10,463
Thanks: 40
Thanked 3,381 Times in 3,276 Posts
CJ_London is just really nice CJ_London is just really nice CJ_London is just really nice CJ_London is just really nice CJ_London is just really nice
Re: Indexing

@Doc - I was suggesting using a byte field type rather than a Boolean and was really contributing to the general discussion about performance optimisation where there is a significant imbalance between true and false. So in Bejays case, you would set 'true' to 1 and 'false' to null. when looking for false you would use a criteria <>1 rather than =null.

I don't see the need in this case to set a default value, since the bulk of the records are false. And personally I avoid using checkboxes for this sort of think - to easy to change a value without realising it

I accept the need to use the nz function if you want to display 'false/No/0' but in many cases a blank will do - in a list it is often easier to identity those rows that are the exception if the 'norm' is blank.
__________________
CJ_London
_______________________
A little thanks goes a long way. If you have found this post useful, please tick the thanks button
CJ_London is offline   Reply With Quote
The Following User Says Thank You to CJ_London For This Useful Post:
BeeJayEff (09-14-2018)
Old 09-14-2018, 02:38 AM   #13
BeeJayEff
Newly Registered User
 
Join Date: Sep 2013
Location: Somerset, UK
Posts: 198
Thanks: 37
Thanked 10 Times in 10 Posts
BeeJayEff is on a distinguished road
Re: Indexing

Thanks to all for your contributions - I hadn't expected so much discussion !

Quote:
Originally Posted by CJ_London View Post
other thing to do is just to add the indexes and see how it performs, not just for extracting data, but also for inserting as well. If there is no perceptible improvement, you have your answer. Frequently you will have other criteria which will have a much bigger impact on performance
The thing is that the suggestion to index these two fields only arose from using the Performance Analyzer - which cannot of course know how frequently the fields are referenced. I am not aware that there is in fact any real-world performance issue arising from the absence of indexes and given the amount of effort required to instrument the code and analyze the results, I shall for the time being leave things as they are. It ain't broke, so I ain't fixin' it !
BeeJayEff is offline   Reply With Quote
Old 09-14-2018, 05:05 AM   #14
The_Doc_Man
Happy Retired Curmudgeon
 
Join Date: Feb 2001
Location: Suburban New Orleans, LA, USA
Posts: 12,568
Thanks: 62
Thanked 1,196 Times in 1,096 Posts
The_Doc_Man is a splendid one to behold The_Doc_Man is a splendid one to behold The_Doc_Man is a splendid one to behold The_Doc_Man is a splendid one to behold The_Doc_Man is a splendid one to behold The_Doc_Man is a splendid one to behold The_Doc_Man is a splendid one to behold
Re: Indexing

Quote:
It ain't broke, so I ain't fixin' it !
Spoken like a true software engineer!
__________________
I'm a certified grandpa (3 times now) and proud of it.
Retired over one year and survived being home all day with the wife. She must really love me.
If I have helped you, please either click the thanks or click the scales.
The_Doc_Man is offline   Reply With Quote
Old 09-14-2018, 05:29 AM   #15
The_Doc_Man
Happy Retired Curmudgeon
 
Join Date: Feb 2001
Location: Suburban New Orleans, LA, USA
Posts: 12,568
Thanks: 62
Thanked 1,196 Times in 1,096 Posts
The_Doc_Man is a splendid one to behold The_Doc_Man is a splendid one to behold The_Doc_Man is a splendid one to behold The_Doc_Man is a splendid one to behold The_Doc_Man is a splendid one to behold The_Doc_Man is a splendid one to behold The_Doc_Man is a splendid one to behold
Re: Indexing

CJ: The issue really isn't whether you use NULL or 0 or FALSE. The problem is that the larger number of records has FALSE for both flags. I think that in terms of optimization, you might not see much because BeeJayEff is contemplating that the search goes for FALSE more often than not.

BeeJayEff: While mulling this over, a thought occurred to me.

IF there is other information to be recorded regarding the sale (besides just the "SOLD" flag), put that in a CHILD TABLE of the main table with a one-to-many (main/SOLD) relationship. Then make an INNER JOIN query when looking at SOLD properties. You use one/many because one-one cases are rare, and one/many allows for cases of one/none.

If there is information to be recorded regarding the return (besides just the "RETURNED" flag), put THAT in a CHILD table of the main table. Again, make it a one/many (main/RETURNED) relationship. Then make an INNER JOIN query when looking at only RETURNED properties.

When looking for never-sold properties, write a query with a WHERE clause that includes the two child tables as

Code:
WHERE ... AND (PropID NOT IN ( SELECT PropID FROM SalesTable ) ) AND ( PropID NOT IN ( SELECT PropID FROM ReturnTable ) ) ...
and that setup WOULD use indexes. Of course, if there is no other information to record for SOLD and for RETURNED, then a two-field (ID, flag) table might be overkill. When looking for SOLD but not RETURNED, you would have an inner join on the SOLD table but a NOT IN clause to the RETURNED table.

Another wrinkle in the works would be if you can sell a returned property a second time, you have to do something to invalidate the record for the first sale. That is a detail to be considered if and only if you chose to go this way.

__________________
I'm a certified grandpa (3 times now) and proud of it.
Retired over one year and survived being home all day with the wife. She must really love me.
If I have helped you, please either click the thanks or click the scales.
The_Doc_Man is offline   Reply With Quote
Reply

Thread Tools
Display Modes Rate This Thread
Rate This Thread:

Forum Jump

Similar Threads
Thread Thread Starter Forum Replies Last Post
Indexing geister General 13 01-17-2017 02:46 AM
help with indexing accessnewbie79915 Tables 1 03-03-2010 06:05 PM
indexing qwertyjjj Queries 0 10-02-2006 05:24 AM
Indexing M_Mike SQL Server 2 06-08-2006 03:29 PM
indexing scratch Tables 1 09-05-2005 07:20 PM




All times are GMT -8. The time now is 08:57 PM.


Microsoft Access Help
General
Tables
Queries
Forms
Reports
Macros
Modules & VBA
Theory & Practice
Access FAQs
Code Repository
Sample Databases
Video Tutorials

Featured Forum post


Sponsored Links


Powered by vBulletin®
Copyright ©2000 - 2018, Jelsoft Enterprises Ltd.
(c) copyright 2017 Access World