Need a query to identify duplicate customer orders (1 Viewer)

Ingeneeus

Registered User.
Local time
Today, 05:45
Joined
Jul 29, 2011
Messages
89
OK, I'm back, if you are by any chance still monitoring this thread (and I hope you are). I managed to eliminate the, uh, false positives that were being generated by the "shipping" and "sales tax" SKUs.
Code:
If [COLOR=Green](Not strOutSKU Like "Shipping") And (Not strOutSKU Like "Sales Tax*")[/COLOR] And (strOutSKU = strInnSKU) And (strOutCustID = strInnCustID) Then _ 
MsgBox "It appears that Customer " & strOutCustID & " has previously ordered " & strInnSKU & " - " & strInnProduct & " within the past 60 days"
You are correct that it's still returning too many records. When I run it with the date set to November 2 2015, it's reporting each repeated SKU, but doesn't seem to be using the CustomerID to determine whether a specific customer has ordered a specific SKU before. Thus, if SKU 855114005270 has been purchased by both CustomerID 115 and 277 (which in this date range it has), the MsgBox is triggered. It appears to be just looking at repeated purchases of the same SKU.

That seems to be the rub. I'm afraid your code is still above my head. Is there any more information I could provide that would help unravel this?
 

MarlaC

Registered User.
Local time
Today, 08:45
Joined
Aug 1, 2013
Messages
82
Here's one way that might work:
1. Create a query with your order header and details tables linked, with criteria on the order date field where order date <Date() and >Date()-7 (or however many days you want to check against).
2. For each order you cycle through in your import, check customer ID and SKU against that query called as a recordset, and if a match is found, report it to your user via MsgBox.

Edit: Sorry, I didn't see page 2 of the thread. Looks like UncleGizmo has it well in hand.
 

Uncle Gizmo

Nifty Access Guy
Staff member
Local time
Today, 13:45
Joined
Jul 9, 2003
Messages
16,304
I think the results you are getting are the answer to the question, however maybe your question has changed slightly?

I'm approaching the solution with the idea that we could catch the duplications if we were able to check each day's import against previous orders.

My interpretation of the question is:- Produce a list of today's orders, run through that list and find any duplicates within the Last 60 days.

I've done a screen recording showing the code working for a particular day that day being the 2nd of November 2015. See YouTube Video HERE:- MS Access - Find Previous Orders
 

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Today, 07:45
Joined
Feb 28, 2001
Messages
27,266
Layering queries is perfectly legal.

Starting with innermost query: Join your OrderDetails, Orders, and OrderNumber tables to extract Customer, SKU, Order Date, Quantity, perhaps your Product Name, etc. Here, you would filter the input by asking the query to do something like taking the difference in days between OrderDate and Now() and saying that you only accept the last 7 days (or 14 days or whatever window you seek.) Or you could do a DateAdd of Now() and a negative number of days and then saying you would take only dates equal or later than the computed cutoff date. That limits the number of records to the most recent x days. LIMIT THE INNER QUERY DATA SIZE. This is based on a hypergeometric consideration of how to most efficiently limit the sizes of sets for subsequent computation.

Next layer of query: Using the innermost query as the basis, do a GROUP BY of CUSTOMER then SKU (leaving SKU/Product Name, Order Date, and any other fields as detail items.) Write queries to compute the DMax and DMin of the Order Dates for that customer and that SKU. E.g.

Code:
DMax( "[OrderDate]", "innerquery", "( [Customer]=" & CStr( [Customer] ) " ) AND " _
         "( [SKU] = " & CStr( [SKU] ) & ")" )

Something similar to the above... and name the query's field MinDate or MaxDate as appropriate. Also do a DSum similar to the above for the Quantity field, title it as something like TotOrdered.

Now, one last layer ... Retrieve the CUSTOMER, SKU, MINDATE, MAXDATE, TOTORDERED - and include a DateDiff in days between MAXDATE and MINDATE showing the positive number of days between the earliest and latest orders. Here, you can do ONE MORE FILTER - "WHERE TOTORDERED > 1" - so that you don't waste time considering orders that weren't duplicated.

That outer query returns to you all records where the customer ordered multiple quantities of a single SKU within the window of consideration. You could easily build a list of these records showing customer, SKU, quantity, and the number of days between the first and last order.

Now... you can ask yourself why all the gyrations? Answer: Consider Julius Caesar's advice when trying to subdue Gaul. Divide and conquer. Break the problem up into pieces-parts where you manage an individual sub-task needed to complete the whole task. By doing nested queries, each with a different purpose in the computation, you can break it up into manageable pieces. You can ALSO open the inner query or middle query to see what each intermediate recordset looks like - an easy form of debugging.
 

Ingeneeus

Registered User.
Local time
Today, 05:45
Joined
Jul 29, 2011
Messages
89
I think the results you are getting are the answer to the question, however maybe your question has changed slightly?



My interpretation of the question is:- Produce a list of today's orders, run through that list and find any duplicates within the Last 60 days.

I've done a screen recording showing the code working for a particular day that day being the 2nd of November 2015. See YouTube Video HERE:- MS Access - Find Previous Orders

Hi, Uncle -- thanks for the YouTube video. I apologize if my original post wasn't clear about what I was trying to; I had hoped to articulate it better. I should have said that I need a way to get an alert if a given customer is attempting to make a repeat purchase of a particular item. With the nature of our business, this is almost always a mistake. Ideally, we'd like to be catching those before we generate purchase orders to our suppliers.

Quick edit -- I liked the YouTube idea so much I decided to borrow it. Here are the results I'm getting: https://youtu.be/vuwDZ5iACCU
(I changed your MsgBox text to make it a little easier for me to follow)
 
Last edited:

Ingeneeus

Registered User.
Local time
Today, 05:45
Joined
Jul 29, 2011
Messages
89
Layering queries is perfectly legal.

Starting with innermost query: Join your OrderDetails, Orders, and OrderNumber tables to extract Customer, SKU, Order Date, Quantity, perhaps your Product Name, etc. Here, you would filter the input by asking the query to do something like taking the difference in days between OrderDate and Now() and saying that you only accept the last 7 days (or 14 days or whatever window you seek.) Or you could do a DateAdd of Now() and a negative number of days and then saying you would take only dates equal or later than the computed cutoff date. That limits the number of records to the most recent x days. LIMIT THE INNER QUERY DATA SIZE. This is based on a hypergeometric consideration of how to most efficiently limit the sizes of sets for subsequent computation.

Next layer of query: Using the innermost query as the basis, do a GROUP BY of CUSTOMER then SKU (leaving SKU/Product Name, Order Date, and any other fields as detail items.) Write queries to compute the DMax and DMin of the Order Dates for that customer and that SKU. E.g.

Code:
DMax( "[OrderDate]", "innerquery", "( [Customer]=" & CStr( [Customer] ) " ) AND " _
         "( [SKU] = " & CStr( [SKU] ) & ")" )
Something similar to the above... and name the query's field MinDate or MaxDate as appropriate. Also do a DSum similar to the above for the Quantity field, title it as something like TotOrdered.

Now, one last layer ... Retrieve the CUSTOMER, SKU, MINDATE, MAXDATE, TOTORDERED - and include a DateDiff in days between MAXDATE and MINDATE showing the positive number of days between the earliest and latest orders. Here, you can do ONE MORE FILTER - "WHERE TOTORDERED > 1" - so that you don't waste time considering orders that weren't duplicated.

That outer query returns to you all records where the customer ordered multiple quantities of a single SKU within the window of consideration. You could easily build a list of these records showing customer, SKU, quantity, and the number of days between the first and last order.

Hi, Doc Man -- thank you for taking an interest in my request. I hope I'm up to following you on this. I've got my first -- innermost -- query ("Items Ordered") up and running. It's reporting every SKU ordered by every CustomerID going back 60 days.
So far, so good.
Then, I create a second query ("Items Ordered pt2"), using Items Ordered as the data source. This second query will have the same fields as the first query, but I set it up for "Group By," making CustomerID the first grouping element and SKU the second.

This is where it gets a little murky.

Do I create a VBA module and designate Items Ordered pt2 as a recordset
Code:
Dim ItemsOrdered as DAO.Recordset
Set ItemsOrdered = Items Ordered pt 2
So that I can then employ your DMax and DMin code (once I figure out what it's doing)? Sorry if this seems like a strange question, but I'm in a little over my head here.
 

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Today, 07:45
Joined
Feb 28, 2001
Messages
27,266
Not required.

You might have some typing to do, but basically all three queries are created in the query design window (or at worst in the query SQL window). When you use the query builder, point to that JOIN query as the data source.

DMax( "[OrderDate]", "innerquery", "( [Customer]=" & CStr( [Customer] ) " ) AND " _
"( [SKU] = " & CStr( [SKU] ) & ")" )

This would be in one field (followed by "AS MaxDate" ) and you would have those other domain aggregates in two other fields - for MinDate and TotOrdered.

I'm not going to swear on a stack of Bibles here, but I think the date difference between MinDate and MaxDate can ALSO be created in this query, but if that one causes SQL to balk, you can do the date difference in the THIRD layer of the query, which you design by first selecting the SECOND layer query as the source of your data.

Now, if you wanted to do something special here regarding how you ask whether to ignore the second/subsequent order, that can be driven by a form using the THIRD layer query as its recordsource. You could populate a combo box or list box (probably list box, since that allows multi-select) with all records from the FIRST query (that shows individual order entries) that have the correct SKU and Customer ID.

Then, if you select one or more elements from the list box, you can make a command button to whack the order with the right SKU, product, etc. ... whatever your business rules suggest you must do. You could either cancel the entire order or choose to modify the order by removing elements thereof. That is going to be a decision for you because I don't know (and don't want to know) your business rules.
 

Ingeneeus

Registered User.
Local time
Today, 05:45
Joined
Jul 29, 2011
Messages
89
Hi, Doc Man --
Sorry for the delayed response. This site was down for me most of yesterday afternoon. Just kept getting "can't find server" errors. A colleague of mine said she was having trouble with several UK-based websites as well, so maybe it was something more systemic.

In any case, I researched DMax and DMin (and CStr), and attempted to implement your suggested query parameter. It took a bit longer than I had hoped -- I had to insert a couple of additional ampersands between some of the quotes. It was my intention to post the modified code here to see if what I did was viable. Unfortunately, the changes are on virtual "sticky note" on my PC at the office, so I won't be able to get around to doing that until Monday.

I ran the second level query with the DMax and DMin parameters to see what it would do, and I think I may have not done something correctly. It seems to be returning far too many results, and the DMax and DMin columns are producing "Type Mismatch" errors.

I will provide more details on Monday. Thank you again for following up on this -- I really appreciate it!
 

Ingeneeus

Registered User.
Local time
Today, 05:45
Joined
Jul 29, 2011
Messages
89
Ah, Monday, when nothing can possibly go wrong.

All right, this is what I was trying to post up here when the forum went down on Friday. As I said, I had to insert a couple of "&s" into your sample code, because Access kept telling me "The expression you entered contained invalid syntax: You may have entered an operand without an operator." I eventually changed it to
Code:
MaxDate: DMax("[OrderDate]","Items Ordered","( [CustomerID]=" & CStr([CustomerID]) [B][COLOR=SeaGreen]&[/COLOR][/B] " ) AND _ 
" [B][COLOR=SeaGreen]&[/COLOR][/B] "( [SKU] = " & CStr([SKU]) & ")") 
and
MinDate: DMin("[OrderDate]","Items Ordered","( [CustomerID]=" & CStr([CustomerID]) [B][COLOR=SeaGreen]&[/COLOR][/B] " ) AND _ 
" [B][COLOR=SeaGreen]&[/COLOR][/B] "( [SKU] = " & CStr([SKU]) & ")")
and stopped getting that error.

Unfortunately, when I ran the query with those parameters, I got a "Data type mismatch in criteria expression" warning the moment I hit the ! Run button. Over and over and over again.
I noticed that Access had assigned the DMax and Dmin parameters to Group By, so I removed that from each and that time it ran, but the MaxDate and MinDate columns in the resulting datasheet view had nothing but #Error. If I clicked into any of the #Error cells, I got the "Data type mismatch in criteria expression" again. It also appears to be creating some duplicate lines (see attached image)

I will continue tinkering with it :banghead: but any insight you could share would be most appreciated.
 

Attachments

  • Items Ordered pt2.jpg
    Items Ordered pt2.jpg
    64.8 KB · Views: 101

MarlaC

Registered User.
Local time
Today, 08:45
Joined
Aug 1, 2013
Messages
82
When you removed Group By, what did you change it to? It should be Expression, as should all calculated fields and domain aggregates that aren't using the query's built in Sum, Max, etc.

Also, when you use strings as criteria, as in
Code:
MinDate: DMin("[OrderDate]","Items Ordered","( [CustomerID]=" & CStr([CustomerID]) & " ) AND _ 
" & "( [SKU] = " & CStr([SKU]) & ")")
they need to be enclosed as strings, as in
Code:
MinDate: DMin("[OrderDate]","Items Ordered","( [CustomerID]='" & CStr([CustomerID]) & "' ) AND _ 
" & "( [SKU] = '" & CStr([SKU]) & "')")
(note single quotes ' inside the double quotes " surrounding your string criteria).
 

Users who are viewing this thread

Top Bottom