Long-running delete of duplicates

No one suggested creating a single index using all the fields.

Just create indices on each field that can be indexed (i.e. not memo fields).
And what good would that do? I do have a few such indexes, for lookups, and they work nicely. But for this task, I am looking for duplicate records across the ENTIRE set of fields, except the unique PK.

An index speeds up finding a record by a key value. More indexes do not make this faster - you can't look in multiple indexes at once.
 
I explained why you are wrong.
You didn't, and I'm not.

Actually they can - but only the first 256 chars.
You can't with the graphical designer - it doesn't even offer memo fields for indexing. You have to do it with DDL. And if it only indexes part of the field, it would be useless for my purpose. Even if it indexed the whole field, I still couldn't have one index covering the entire table, because there is a limit of 32 fields per index.
 
No point arguing - I've given you a direction involving hashing and indexing. Up to you whether you try it.
Yes, I acknowledged that the hash might be a good idea, and thanked you for the tip.

I have also explained, several times, why indexing the table as it is now is not possible, and why, even if it were, it would likely still not solve the real problem: the delete query doing repeated scans instead of holding an intermediate result. If you have any ideas on THAT topic, I'd be pleased to hear them. Instead, you told me about a situation you worked on that is only somewhat similar, and how you invented a solution to prevent duplicates during mass imports. That is not what I needed: this user enters one record at a time, manually, and I was trying to get rid of existing duplicates without having to wait several months for the query to complete.
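For the record, the hashing direction is easy to sketch. Below is a minimal illustration, not anyone's production code, using Python's built-in sqlite3 as a stand-in for the Access back end; the table name tbl, the PK column id, and the file name data.db are assumptions. The point is that one sequential pass computes a single hash per row, so neither the memo fields nor the 32-fields-per-index limit get in the way, and the subsequent deletes are keyed by PK alone, so the engine never re-scans the wide table.

```python
# A minimal sketch of hash-based de-duplication, assuming a table "tbl"
# with a unique PK column "id" in a file "data.db" (all hypothetical).
import hashlib
import sqlite3

conn = sqlite3.connect("data.db")
cur = conn.cursor()

# Discover the column names once; every field except the PK contributes
# to the hash.
cols = [row[1] for row in cur.execute("PRAGMA table_info(tbl)")]
data_cols = [c for c in cols if c != "id"]

seen = {}   # hash of the non-PK fields -> PK of the first row seen
dupes = []  # PKs of rows whose contents were already seen

# One sequential scan of the table. Memo-length text is no obstacle to
# a hash, and the 32-fields-per-index limit is irrelevant because the
# comparison key is a single value.
select = "SELECT id, {} FROM tbl".format(", ".join(data_cols))
for row in cur.execute(select):
    pk, fields = row[0], row[1:]
    digest = hashlib.sha256(repr(fields).encode("utf-8")).hexdigest()
    if digest in seen:
        dupes.append(pk)
    else:
        seen[digest] = pk

# Delete the duplicates in batches, keyed by PK only, so no further
# scans of the wide table are needed.
BATCH = 500
for i in range(0, len(dupes), BATCH):
    batch = dupes[i:i + BATCH]
    placeholders = ", ".join("?" * len(batch))
    cur.execute("DELETE FROM tbl WHERE id IN ({})".format(placeholders), batch)

conn.commit()
conn.close()
```

On real data you would confirm a field-by-field match before deleting, since two distinct rows can in principle share a hash.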
 
