Solved Exclude Duplicate Records from a Query

davegoodo

Member
Local time
Tomorrow, 02:08
Joined
Jan 11, 2024
Messages
93
I have a query say, qryMain that has records I'm interested in, however the query contains some duplicates. I created a find duplicates wizard query to identify those records qryDupes. The qryDupes records are on 3 fields. So I'm thinking can I do a Join between these 2 recordsets to exclude the qryDupes records?

I'm not sure how to do this, would I need to use SQL to achieve it? Or are there easier ways?
 
I think it's a question of establishing why you are seeing duplicates. Do the duplicates reflect the query you are trying to write, or does it mean your data contains unwanted duplicates.
 
... would I need to use SQL to achieve it? Or are there easier ways?
SQL IS easy.
When telling stories about queries, it's helpful to show the SQL statement itself.
You should also take into account that queries rely on table data. If there are already unwanted duplicates, you should apply your leverage there and correct them.
 
I think it's a question of establishing why you are seeing duplicates. Do the duplicates reflect the query you are trying to write, or does it mean your data contains unwanted duplicates.
Thanks for the reply,
I understand why the duplicates are there, but they don't fit into the analysis I want to do. So in terms of what you have replied I'd say they are unwanted duplicates.
 
SQL IS easy.
When telling stories about queries, it's helpful to show the SQL statement itself.
You should also take into account that queries rely on table data. If there are already unwanted duplicates, you should apply your leverage there and correct them.
Thanks,
Maybe it's easy for you. The duplicates are unwanted but understandable as to why they are there. I just want to exclude them at this stage. I imagine there is an SQL statement which has the qryMain records and NOT IN qryDupes like a join that excludes the duplicates. I don't know what you mean by "apply your leverage there". I'm not sufficiently versed in SQL at the moment.
 
The duplicates are unwanted
With a unique index on the field or field combination that contains the duplicates, you can safely prevent unwanted duplicates in the table.
 
I imagine there is an SQL statement
A concrete proposal can only be created based on a specific specification. The SQL statement is a fairly comprehensive definition.
 
You can make your query a totals query, which will automatically show only 1 of each different grouped selection.

Such a query becomes non updatable though, which is awkward sometimes.
 
A concrete proposal can only be created based on a specific specification. The SQL statement is a fairly comprehensive definition.
I looked at the table in the first instance and saw a couple of dupes. I then ran the Find Dupes Wizard that gave a list of 12 records that are dupes.
Rather than delete them, I thought I could run a join that would not include the dupes. But that requires SQL which I'm very rusty at. I thought someone might give me an outline of what the SQL would look like?
 
You can make your query a totals query, which will automatically show only 1 of each different grouped selection.

Such a query becomes non updatable though, which is awkward sometimes.
Thanks,
Can I make a query into a totals query? That sounds like a good idea. I just want to exclude the dupes so I can analyse the non-dupe records.
I'll try that out and get back to you. Thank you.

I just did that but it didn't exclude the dupes. The SQL is attached.
 

Attachments

Last edited:
You can make your query a totals query, which will automatically show only 1 of each different grouped selection.

Such a query becomes non updatable though, which is awkward sometimes.

I've realised why there may be some confusion.. I'll need to explain further.
These queries concern analysing Horse-Racing information. The query I ran concerned getting a set of results where some horses have a characteristic called X. There is usually only one horse per race has that trait called X. When I did a group by query I found there were instances in the dataset where there were 2 horses with X. So I did a Find Duplicates query for the whole dataset there were 6 instances where there were 2 horses with X. So to analyse the race results where there is only one horse with trait X I wanted to exclude the instances from the Find Dupes query that returned 12 horses that were duplicates of trait X. Does that make any sense?

The duplicates are in fields RaceDate, Track, RaceNum and the Horse. The find dupes query revealed there were two instances of RaceNum which is/are the duplicates, but the horses names are unique. The problem is that there were 2 occurences of trait X for each of the Race Numbers. The horses names are different (not technicallly duplications). But the 6 pairs of horses in the same race (dupes) which had trait X. I hope that helps.

Summing up, the problem is there is only supposed to be 1 horse per race with trait X. I have a set of 233 races but there are six races where there are 2 horses with X. They are no good to my analysis. I only want to analyse the 233 - 12 = 221 races. To do that I need to exclude these instances where there are 2 horses with X. That's it in a big nutshell.
 

Attachments

Last edited:
So in the races with more than 1 candidate with Trait X, you want to only consider one of them.

In the races that have two candidates, are they both genuine candidates, or is the data at fault. If they are both genuine then maybe it's your premise that there can only be one example per race that's at fault.

If the data is not at fault, then do you want perhaps to exclude one of the candidates at random? That's the hard bit. Databases are predicated on set theory, and it's not easy therefore to remove an item from a set, that truly belongs on the set.

However, if you want to just ignore the "problem races", then first have a query to find the races with more than one Trait X. Then use an unmatched query to select the other races, and extract your Trait X Data from the unmatched query.
 
So in the races with more than 1 candidate with Trait X, you want to only consider one if them.

In the races that have two candidates, are they both genuine candidates, or us the data at fault. If they are both genuine then it's your premise that there can only be one example per race that's at fault.

If the data is not at fault, then do you want perhaps to exclude one of the candidates at random? That's the hard bit. Databases are predicated on set theory, and it's not easy therefore to remove an item from a set, that truly belongs on the set.

Maybe have a query to find the races with more than one Trait X. Then use an unmatched query to select the other races, and extract your Trait X Dara from the unmatched query.
Thanks Gemma,
So in the races with more than 1 candidate with Trait X I don't want to consider either of them. They are disqualified from further consideration.
That is why I want to exclude them from further analysis. I only want 1 candidate per race, not 2. The person who created the data made a judgement that two horses qualified and assigned both horses with trait X. I am only interested in analysing races where there is only one horse with Trait X. That's it.

I was thinking maybe I need to do an update query to the instances where there are two and mark them in a way that I can filter them out of the dataset. The data is at fault in the sense that the author of the data couldn't decide between the two horses and assigned both Trait X. That is not in my definition.

I like your suggestion of the unmatched query. I already have the query which finds races with more than one Trait X. I'll have to brush up on how to do the unmatched query. Does Access have an unmatched query wizard? I just checked it does. So I could put the original query with all records including those with 2 candidates with Trait X, and the Dupes query with only the 2 candidates with Trait X. Is that right? And then do the unmatched query?
 
Have I missed something but is there any reason why nobody has suggested using the DISTINCT / DISTINCT ROW predicate?

As a aside, which may not be of much use in this particular case, remember that no duplicate records are returned when you use a UNION operation, unless you specify ALL.
 
Last edited:
Thanks Gemma,
So in the races with more than 1 candidate with Trait X I don't want to consider either of them. They are disqualified from further consideration.
That is why I want to exclude them from further analysis. I only want 1 candidate per race, not 2. The person who created the data made a judgement that two horses qualified and assigned both horses with trait X. I am only interested in analysing races where there is only one horse with Trait X. That's it.

I was thinking maybe I need to do an update query to the instances where there are two and mark them in a way that I can filter them out of the dataset. The data is at fault in the sense that the author of the data couldn't decide between the two horses and assigned both Trait X. That is not in my definition.

I like your suggestion of the unmatched query. I already have the query which finds races with more than one Trait X. I'll have to brush up on how to do the unmatched query. Does Access have an unmatched query wizard? I just checked it does. So I could put the original query with all records including those with 2 candidates with Trait X, and the Dupes query with only the 2 candidates with Trait X. Is that right? And then do the unmatched query?
The unmatched query. You join all the races with the query identifying the multiple traits, and select the records that aren't matched. I'm pretty sure there's a wizard.
 
Have I missed something but is there any reason why nobody has suggested using the DISTINCT / DISTINCT ROW predicate?

As a aside, which may not be of much use in this particular case, remember that no duplicate records are returned when you use a UNION operation, unless you specify ALL.
It won't really help, because some of the horse races have multiple rows. The OP wanted to ignore those races, not just select a random row.
 
Probably all you need is this:
SQL:
SELECT
  RaceDate,
  Track,
  RaceNum,
  Horse
FROM qry2PlusCount
GROUP BY
  RaceDate,
  Track,
  RaceNum,
  Horse
HAVING COUNT(*) = 1
;
(Provided qry2PlusCount provides the correct info)
 
It won't really help, because some of the horse races have multiple rows. The OP wanted to ignore those races, not just select a random row.

The initial post says duplicate records and this is what DISTINCT ROW does. A record is a row: if they are duplicates only one will be seen.
 
I'm with DickyP on this one. SELECT DISTINCT seems to be what is needed if all you are doing is looking for 1 record per horse or per race/number combo or whatever is the X factor.

However, there is another approach, "divide and conquer."

Add a yes/no field to the table, maybe call it "DQX", to help you with the selection and exclusion.
Before you do your analysis, run an UPDATE query: UPDATE mytable SET DQX = FALSE ;
Then when you do your "find duplicates" query, mark the ones that have duplicates as DQX = TRUE.
Then when you do your analysis, select your records but include in the WHERE clause "... AND DQX=FALSE ..." so that you won't consider either of the horses in the races where duplication occurred.
 

Users who are viewing this thread

Back
Top Bottom