How to get rid of double records (SQL) - HELP! (1 Viewer)

bsk13

Registered User.
Local time
Today, 12:44
Joined
Sep 14, 2018
Messages
11
Hi,
I (really urgent) need to get rid of double records in the result of a query.
Any help would be highly appreciated! :)

My database looks like this:

orders.partno
orders.date
orders.data1
orders.data2
orders.data3

I need to write an SQL-query that gives me all the fields from the table above as result, but every partno should only exist ONCE (means "removing" any double partno records). Important is that when there are double partno records, it should keep the record with the most recent orders.date (if there are several records with the same orders.date for one orders.partno, it doesn't matter which of them is kept / removed).

I am unfortunately not experienced at all in SQL (I can read easy code but not write).

An employee working with SQL, but not for access, sent me this. He just hade time to sent me a general answer (below). I have not been able to transfer this proposal into a working SQL (where t would be my table orders above I assume):

His quick generic proposal:
select
distinct
t.a,
t.b,
(select MAX(c) from #test where a = t.a) as c
from #test t

I tried to write an SQL like this (with my table and fields):
select
distinct
orders.orderno,
orders.date,
orders.data1,
orders.data2,
orders.data3,
(select MAX(c) from #test where partno = orders.partno) as c
from #test orders

I get a syntax error on the select statement (I don't know if I correctly understood how the field c should be handled)

I hope you can help me out. Many thanks in advance!
 

bsk13

Registered User.
Local time
Today, 12:44
Joined
Sep 14, 2018
Messages
11
Many thanks for a quick reply!

Unfortunately I don't think this solution will work in the situation where I have two identical dates for a double record. Then I think I will get both of those records with the highest date (for that specific partno), not only one of them.
 

arnelgp

..forever waiting... waiting for jellybean!
Local time
Today, 18:44
Joined
May 7, 2009
Messages
19,231
do you want to delete the dup with early date or just ignore them from the result?
 

bsk13

Registered User.
Local time
Today, 12:44
Joined
Sep 14, 2018
Messages
11
Thanks for your interest in my problem!

I just want to ignore them in the result (the database itself should not be changed).

Example:

Partno...Date............Data1 (and Data2 etc.)
abc...24th of Dec...999....(keep this record OR...
abc...24th of Dec....888 ... keep this, but only one of them!)
abc....21st of Dec....777
def....19th of Dec....666...keep this
def....7th of Dec....555
ghi.....1st of January...444...Of course keep this
 
Last edited:

arnelgp

..forever waiting... waiting for jellybean!
Local time
Today, 18:44
Joined
May 7, 2009
Messages
19,231
SELECT Orders.PartNo, Orders.Date, Orders.Data1, Orders.Data2, Orders.Data3
FROM Orders
GROUP BY Orders.PartNo, Orders.Date, Orders.Data1, Orders.Data2, Orders.Data3
HAVING (((Orders.Date) In (select max(t1.[date]) from Orders As t1 Where t1.PartNo=Orders.[PartNo])));
 

arnelgp

..forever waiting... waiting for jellybean!
Local time
Today, 18:44
Joined
May 7, 2009
Messages
19,231
my previous post will not work.

i resort to using a user-define function.
copy and paste in a Standard Module.

note however that for this to work you
need to put a sorting order in the query
by:

Orders.PartNo ASC, Orders.Date DESC

your query:

SELECT Orders.PartNo, Orders.Date, Orders.Data1, Orders.Data2, Orders.Data3
FROM Orders WHERE fncInclude([PartNo])
ORDER BY Orders.PartNo, Orders.Date DESC;

Code:
Public Function fncInclude(pPart As Variant) As Boolean
    
    Static thisPart As String
    
    If thisPart = "" Or thisPart <> pPart & "" Then
        thisPart = pPart & ""
        fncInclude = True
        Exit Function
        
    End If
End Function
 

bsk13

Registered User.
Local time
Today, 12:44
Joined
Sep 14, 2018
Messages
11
Thanks again for your second reply!
Update: I realized that I probably have done a thinking error. Your solution might work. I will check again!

I tried this and I got an answer. It was however too few lines.

In totalt I have 22.712 records in the table "orders".
I know that 87 are double => 22.712 - 87 = 22.625 lines should be in the output.
Your query however only give 22.223 lines in the output (402 lines "too few").
Do you think you can find the problem?

Extra info:
This short query gives 22.625 lines as output:
SELECT DISTINCT
orders.partno, orders.date
FROM orders;
 
Last edited:

bsk13

Registered User.
Local time
Today, 12:44
Joined
Sep 14, 2018
Messages
11
Hi again,
close to the goal! I have found out that your query does the correct job. Many thanks!! But....

I however found out a problem with my data. Some of the fields orders.partno contains some trailing blanks. Blanks at the end are not actually valid. The field should always be 15 positions. Any positions after this are not valid.

Therefore, some records look different, but only differs because one "partno" is 15 positions with some blanks after (like 19 positions in total) and some are "just 15 positions" (without any blanks).

If this somehow could be implemented in your solution it would work: "adjusted PartNo" = LEFT (PartNo;15) (but I still would like to call the field partno if possible).

Or even better:
Get rid of these annoying blanks when creating the source "table" orders.

The "table" orders is "created" using this query:
SELECT Orders.PartNo, Orders.Date, Orders.Data1, Orders.Data2, Orders.Data3 FROM Excelfile1
UNION SELECT Orders.PartNo, Orders.Date, Orders.Data1, Orders.Data2, Orders.Data3 FROM Excelfile2
UNION SELECT Orders.PartNo, Orders.Date, Orders.Data1, Orders.Data2, Orders.Data3 FROM Excelfile3
(and another 4 similar Excelfiles in the same way)
If I could truncate the field PartNo in the Union-query above, only saving 15 leftmost positions of PartNo, the problem would be fully solved and I would even have a better “order table” than before! :)
Many thanks in advance!
 

arnelgp

..forever waiting... waiting for jellybean!
Local time
Today, 18:44
Joined
May 7, 2009
Messages
19,231
if you include the date field you will get more. you only need 1 record per partno, am i correct?

select distinct partno from orders

give me ur feedback.
 

arnelgp

..forever waiting... waiting for jellybean!
Local time
Today, 18:44
Joined
May 7, 2009
Messages
19,231
ooops my mistake, you are right, 1 record per partno+date:

Code:
Public Function fncInclude(pPart As Variant, pDate as Date) As Boolean
    
    Static thisPart As String
    
    If thisPart = "" Or thisPart <> pPart & pDate & "" Then
        thisPart = pPart & pDate &  ""
        fncInclude = True
        Exit Function
        
    End If
End Function

change the criteria of the query and add the date field:



... WHERE fncInclude([PartNo], [Date])
 

isladogs

MVP / VIP
Local time
Today, 11:44
Joined
Jan 14, 2017
Messages
18,209
Post 9 was moderated for some reason. Posting to trigger email notifications
 

bsk13

Registered User.
Local time
Today, 12:44
Joined
Sep 14, 2018
Messages
11
Hi, there were some confusions in the conversation above, because one of my answers wasn't published (somehow "blocked" / delayed by the system).

However, you are not wrong, you are right!
I only need ONE instance per PartNo (and the additional data is selected from the record with the highest date). This means your first query (post #7) does the job correctly and 22.223 lines is correct (i was wrong). Thank you!!!

I have one remaining problem, described in post #9 above (especially the part below "Or even better:"). Can you please help out with this last thing?
 

bsk13

Registered User.
Local time
Today, 12:44
Joined
Sep 14, 2018
Messages
11
Hi, I have bad news. Your query and macro runs perfect within MS Access. We are however calling an Access Query from within Excel, and there it crashes with an error message saying that the function doesn't exist. I guess macros can't be access this way.

Therefore I wonder if there might be a plain SQL-solution without a macro?
 

arnelgp

..forever waiting... waiting for jellybean!
Local time
Today, 18:44
Joined
May 7, 2009
Messages
19,231
here is a pure SQL without the function:

1.)
SELECT T1.PartNo, T1.tDate As [Date],
(SELECT TOP 1 [Data1] FROM Orders AS T2 WHERE T2.PartNo=T1.PartNo And T2.[Date]=T1.tDate) As [Data1],
(SELECT TOP 1 [Data2] FROM Orders AS T2 WHERE T2.PartNo=T1.PartNo And T2.[Date]=T1.tDate) As [Data2],
(SELECT TOP 1 [Data3] FROM Orders AS T2 WHERE T2.PartNo=T1.PartNo And T2.[Date]=T1.tDate) As [Data3]
FROM (SELECT PartNo, Max([Date]) As [TDate] FROM Orders GROUP BY PartNo) AS T1


regarding the union query you can use the Left function:

2.)
SELECT LEFT(PartNo, 15) As TPart, .... From excel1
Union
SELECT LEFT(PartNo, 15) As TPart, ... From excel2


now since PartNo has change to TPart in the Union, you also change it in query #1:

SELECT T1.TPart As PartNo, T1.tDate As [Date],
(SELECT TOP 1 [Data1] FROM Orders AS T2 WHERE T2.PartNo=T1.TPart And T2.[Date]=T1.tDate) As [Data1],
(SELECT TOP 1 [Data2] FROM Orders AS T2 WHERE T2.PartNo=T1.TPart And T2.[Date]=T1.tDate) As [Data2],
(SELECT TOP 1 [Data3] FROM Orders AS T2 WHERE T2.PartNo=T1.TPart And T2.[Date]=T1.tDate) As [Data3]
FROM (SELECT TPart, Max([Date]) As [TDate] FROM Orders GROUP BY PartNo) AS T1
 

bsk13

Registered User.
Local time
Today, 12:44
Joined
Sep 14, 2018
Messages
11
Thanks! I tried the query, and unfortunately I get huge performance problems (first page after a while, then the screen locked "for ever").

Do you think there is a way to write the query to come around the performance problem?

Many, many thanks for all the time you spent on my topic so far!!
 

arnelgp

..forever waiting... waiting for jellybean!
Local time
Today, 18:44
Joined
May 7, 2009
Messages
19,231
you know if the final step as you suggest in post #9 that the final is excel, you do not need to involve msa. just use excel. consolidate all sheets into one. select all columns. sort by partno and date. remove duplicate (by partno). then youre done.
 

bsk13

Registered User.
Local time
Today, 12:44
Joined
Sep 14, 2018
Messages
11
Thanks for the fast reply. The table "orders" is not just sent back to Excel. It is joined with some huge table in our datawarehouse and used for several reports. Therefore it has to be in Access. Hmmm... this is really difficult.
 

arnelgp

..forever waiting... waiting for jellybean!
Local time
Today, 18:44
Joined
May 7, 2009
Messages
19,231
suggest you do it first in excel, much simpler.
after coming to the final result, 1 record per partno+max date, then you can
create a link table in msa and use this link table instead of building a slow query.
 

bsk13

Registered User.
Local time
Today, 12:44
Joined
Sep 14, 2018
Messages
11
Hi. Thanks again for your suggestion and all your effort in this thread!
To do it in Excel is not really easy. Seven independent (similar looking) "order files" are updated by different users spread over the company. MS access links to these seven and creates an aggregated report. All reporting works dynamically "on-demand" for the users. Therefore it must work without any "manual steps" bu "super user". Perhaps there are no possible MS Access solution that will meet the demands I wrote on in earlier posts (like no macro, high speed etc), and I instead have to re-think the concept or think somehow unconventional...
 

Users who are viewing this thread

Top Bottom