SQL Subquery to delete duplicate records. (1 Viewer)

eddix99

Registered User.
Local time
Today, 17:52
Joined
Jul 7, 2016
Messages
32
Tearing my hair out here!

I have a table with over 100k records. Many of these records have multiple duplicates, but all have a unique primary key.
I'm trying to write a query in Access SQL, where only the record with the lowest primary key value for each record will display. From there I can do an update query to add a 'true' attribute to a checkbox and then delete all the records with a 'false' attribute; thus clearing out all the duplicates
To do this I need to write a query and include a sub query:

SELECT [Business Data_TEST].ID, [Business Data_TEST].[Company Name],[Business Data_TEST].[Keep?]
FROM [Business Data_TEST]
WHERE Min(id) AND
(SELECT Business Data_TEST].[Company Name]
FROM [Business Data_TEST] AS alt
WHERE alt.company name = [Business Data_TEST].[Company Name])
;

I cant get the query to work and I'm sure that it would take seconds for someone with MS SQL skills to identify where i'm going wrong. [I'm pretty sure i'm going wrong at the first WHERE statement]
I include a demo DB
Many thx in advance.
 

Attachments

  • Demo DB.accdb
    500 KB · Views: 55

eddix99

Registered User.
Local time
Today, 17:52
Joined
Jul 7, 2016
Messages
32
I had a look at another thread on this forum and made some changes, this is my latest attempt:

SELECT [Business Data_TEST].[ID], [Business Data_TEST].[Company Name], [Business Data_TEST].[zip/postal code],[Business Data_TEST].[Keep?]
FROM [Business Data_TEST]

WHERE ID <>
(SELECT Min(ID)
FROM [Business Data_TEST] AS alt
WHERE alt.company name = [Business Data_TEST].[Company Name]
)
;
 

MarkK

bit cruncher
Local time
Today, 01:22
Joined
Mar 17, 2004
Messages
8,180
You may want to GROUP BY the fields that are duplicated, then find the Min(ID), something like...
Code:
SELECT Min(ID) As MinID, Count(*) As DupeCount, CompanyName, ZipCode
FROM Table
GROUP BY CompanyName, ZipCode
...as a way to select the rows you are talking about.
hth
Mark
 

eddix99

Registered User.
Local time
Today, 17:52
Joined
Jul 7, 2016
Messages
32
Thanks for the reply. I've had a go at this but get a 'syntax error in query expression' error message, specifically at the sub query part. Anyway, heres my latest attempt:
Code:
SELECT [Business Data_TEST].ID, [Business Data_TEST].[Company Name], [Business Data_TEST].[zip/postal code], [Business Data_TEST].Telephone1 AS Expr1, [Business Data_TEST].[Keep?]
FROM [Business Data_TEST]

WHERE [Business Data_TEST].ID <>

(SELECT Min(ID) AS MinID, Count (*) AS DupeCount,Company Name
FROM [Business Data_TEST]
GROUP BY CompanyName
)
;
 

MarkK

bit cruncher
Local time
Today, 01:22
Joined
Mar 17, 2004
Messages
8,180
Does that sub query work as a stand-alone query? Make sure that works first.

Then, in your WHERE clause, you can't use '<>' to compare a single field value in the main query against a sub query that returns multiple rows. You'll need the IN operator, like...
Code:
SELECT *
FROM YourTable
WHERE ID [I]<NOT>[/I] IN
   ( 
   SELECT Min(ID)
   FROM OtherTable
   GROUP BY Company, Zip
   )
hth
Mark
 

eddix99

Registered User.
Local time
Today, 17:52
Joined
Jul 7, 2016
Messages
32
I got it working. Thank you so much for your assistance. I took your advice and got the subquery working before transplanting it back into the main query.

my code was:
Code:
SELECT [Business Data_TEST].ID, [Business Data_TEST].[Company Name], [Business Data_TEST].[Keep?]
FROM [Business Data_TEST]

WHERE ID IN
 (SELECT  min(ID) AS lowid 
FROM [Business Data_TEST]
GROUP BY [Company Name]
)

ORDER BY [Business Data_TEST].[Company Name]

;
 

eddix99

Registered User.
Local time
Today, 17:52
Joined
Jul 7, 2016
Messages
32
This query gives the results that i'm looking for, but takes upwards of 20mins to run.

My original table with duplicates had over 140,000 records. Now I have whittled it down to just over 30,000.

:)
 

Users who are viewing this thread

Top Bottom