Long-running delete of duplicates

No one suggested creating a single index using all the fields.

Just create indices on each field that can be indexed (i.e. not memo fields).
And what good would that do? I do have a few such indexes, for lookups, and they work nicely. But for this task, I am looking for duplicate records across the ENTIRE set of fields, except the unique PK.

An index speeds up finding a record by a key value. More indexes do not make this faster - you can't look in multiple indexes at once.
 
I explained why you are wrong.
You didn't, and I'm not.

Actually they can - but only the first 256 chars.
You can't with the graphical designer - it doesn't even offer memo fields for indexing. You have to do it with DDL. And if it only indexes part of the field, it would be useless for my purpose. Even if it indexed the whole field, I still couldn't have one index covering the entire table, because there is a limit of 32 fields per index.
 
No point arguing - I've given you a direction involving hashing and indexing. Up to you whether you try it.
Yes, I acknowledged that the hash might be a good idea, and thanked you for the tip.

I have also explained, several times, why indexing the table as it is now is not possible, and why, even if it were, it would likely still not solve the real problem: the delete query doing repeated scans instead of holding an intermediate result. If you have any ideas on THAT topic, I'd be pleased to hear them. Instead, you told me about a situation you worked on that is only somewhat similar, and how you invented a solution to prevent duplicates during mass imports. That is not what I needed: this user enters one record at a time, manually, and I was trying to get rid of existing duplicates without having to wait several months for the query to complete.
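For the record, the hashing direction is easy to sketch. Below is a minimal illustration, not anyone's production code, using Python's built-in sqlite3 as a stand-in for the Access back end; the table name tbl, the PK column id, and the file name data.db are assumptions. The point is that one sequential pass computes a single hash per row, so neither the memo fields nor the 32-fields-per-index limit get in the way, and the subsequent deletes are keyed by PK alone, so the engine never re-scans the wide table.

```python
# A minimal sketch of hash-based de-duplication, assuming a table "tbl"
# with a unique PK column "id" in a file "data.db" (all hypothetical).
import hashlib
import sqlite3

conn = sqlite3.connect("data.db")
cur = conn.cursor()

# Discover the column names once; every field except the PK contributes
# to the hash.
cols = [row[1] for row in cur.execute("PRAGMA table_info(tbl)")]
data_cols = [c for c in cols if c != "id"]

seen = {}   # hash of the non-PK fields -> PK of the first row seen
dupes = []  # PKs of rows whose contents were already seen

# One sequential scan of the table. Memo-length text is no obstacle to
# a hash, and the 32-fields-per-index limit is irrelevant because the
# comparison key is a single value.
select = "SELECT id, {} FROM tbl".format(", ".join(data_cols))
for row in cur.execute(select):
    pk, fields = row[0], row[1:]
    digest = hashlib.sha256(repr(fields).encode("utf-8")).hexdigest()
    if digest in seen:
        dupes.append(pk)
    else:
        seen[digest] = pk

# Delete the duplicates in batches, keyed by PK only, so no further
# scans of the wide table are needed.
BATCH = 500
for i in range(0, len(dupes), BATCH):
    batch = dupes[i:i + BATCH]
    placeholders = ", ".join("?" * len(batch))
    cur.execute("DELETE FROM tbl WHERE id IN ({})".format(placeholders), batch)

conn.commit()
conn.close()
```

On real data you would confirm a field-by-field match before deleting, since two distinct rows can in principle share a hash.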
 
