Finding duplicate records

KitaYama

Well-known member
Local time
Today, 10:13
Joined
Jan 6, 2022
Messages
1,756
A little hard to explain and it may be a little confusing.
I have a table with following fields:

PK (AutoNumber)
UserFK (Number)
SPS(text)
obj (text)
Lst (text)

I need a query that finds all the records that all fields (except PK) are the same, keeps the record with the largest PK, and deletes the rest.

PKUserFKSPSobjlst
10010LayoutEquations4
1016BaseBody59
1026BaseBody59
1036BaseBody59
10490BaseBody59
1056BaseBody59
1066BaseBody59
10710PlanesEquations4
10810LayoutEquations4
10910LayoutEquations4

In above sample data:

100 - 108 - 109 are duplicated. (all fields except PK are the same)
100 - 108 has to be deleted, 109 will remain (because it's the largest PK)
(107 is not duplicated, because SPS field is different) No action is necessary for this record.

101 - 102 - 103 - 105 - 106 are duplicated. (all fields except PK are the same)
101 - 102 - 103 - 105 has to be deleted, 106 will remain (because it's the largest PK)
(104 is not duplicated, because UserFK field is different) No action is necessary for this record.

If mixing a find duplicate with a delete query is not possible, a query that shows the duplicates can help too.
I talked to my supervisor to allow me to export a part of data to offer here, but unfortunately I wasn't able to get his OK.


Note : At present, I've written a function to find and delete these records, but since the table has a little less than a million records, I thought a query may be a wiser idea. ( If possible at all)

Thanks for any kind of help.
 
Last edited:
@KitaYama

I don't think your table is displaying as you intended, A lot of the info you reference is not visible.
I copied and pasted here. So the result were messed up.
I did some correcting.
On my side, it seems to be OK.
Can you check one more time?

thanks.
 
Most important note: Unwanted duplicates cannot occur in a table if a unique index has been placed over the corresponding field combination. You don't have to get rid of waste you don't produce.

the table has a little less than a million records
Larger amounts of data are always a small challenge. Try something like this:
SQL:
DELETE DISTINCTROW
   A.*
FROM
   TabX AS A
      INNER JOIN TabX AS B
      ON A.Key1 = B.Key1
         AND
      A.Key2 = B.Key2
         AND
      A.Key3 = B.Key3
         AND
      A.Key4 = B.Key4
         AND
      A.Key5 = B.Key5
WHERE
   A.PK < B.PK
Here too, indexing would help, ideally a composite index over key1-key5, at this point no unique index yet.

The simple select query:
SQL:
SELECT 
   MAX(A.PK) AS PK, 
   A.Key1, 
   A.Key2, 
   A.Key3, 
   A.Key4, 
   A.Key5 
FROM 
   TabX AS A 
GROUP BY 
   A.Key1, 
   A.Key2, 
   A.Key3, 
   A.Key4, 
   A.Key5
 
Last edited:
Most important note: Unwanted duplicates cannot occur in a table if a unique index has been placed over the corresponding field combination.
The tables resides in a CAD application. CADs, are applications used in engineering and 3D design. They have their own sql server databases for their own use. And for sure, I don't have control over them, and even if I had, certainly I don't want to take a step that may bring a several thousands/per year application down and ask them for support. I'm just bringing data from their database to extract what I need.

Thanks for the sql. I'll give it a try to see how it goes.
 
Last edited:
That means you create your own tables?
Both yes and no.
To use the data we need, we had to bring the data from their tables to ours. But the application costs so much and we don't want to take any risk. Though the exporting side of the data may never get harmed, but we were much comfortable to just bring down the data and work from there, rather than to work on a live set of data.
I know what you mean, and I'm sure you're correct. But the support team we contacted to, asked us to work on a copy or they don't take any responsibility.
Here, where I live, when some support guy says "Do it on your responsibility", nobody even breaths.

Anyway, both your SQLs were perfect.
I really appreciate your time and help.

Million thanks.
 
To any Admin who reads this.
Since the software update, we are not able to set a question as solved. If any of you can mark this thread as solved, please do so.

Copy button on the top right side of the Code sections, doesn't responds too.

Thanks.
 
you may also try:
Code:
DELETE *
FROM YourTable
WHERE PK NOT IN (
    SELECT Max(PK)
    FROM YourTable AS SubQuery
    GROUP BY USERFK, SPS, OBJ, LST);
 
you may also try:
Code:
DELETE *
FROM YourTable
WHERE PK NOT IN (
    SELECT Max(PK)
    FROM YourTable AS SubQuery
    GROUP BY USERFK, SPS, OBJ, LST);
Since the case is solved now, I have no way to test it. But I kept in a code library as a starting point for study.
Thanks.
 
you can test it on you sample data on post#1.
 
you may also try:
Code:
DELETE *
FROM YourTable
WHERE PK NOT IN (
    SELECT Max(PK)
    FROM YourTable AS SubQuery
    GROUP BY USERFK, SPS, OBJ, LST);
@arnelgp
if you get a few spare moment could you please try to explain for me what is going on in that code?
 

Users who are viewing this thread

Back
Top Bottom