Procedures for importing and duplicate checking? (1 Viewer)

AndrewS

Registered User.
Local time
Today, 09:56
Joined
Feb 21, 2017
Messages
30
A request to see if I don't need to reinvent the wheel ...

We've been using Eventbrite for managing bookings on workshops etc and I've been asked to look at importing that data into our Access contacts database. I can download the booking data from the Eventbrite website in Excel format, and it's no problem just uploading it. But I'd like to minimise manual process and avoid duplicates.

So what I'm looking for is if anyone has any procedures (I couldn't see anything in the Code Repository forum here) that they're willing to share, and that would cover uploading from Excel, checking for duplicates and (this is my feeling as to the way to go), presenting the user with lists of current records and possible duplicates in the new data, so they can accept or reject possible matches.

I appreciate this is a bit cheeky, but I'm sure I'm not the first person to need to do something like this, and even something vaguely similar would be useful to get my thinking on the right lines.

Cheers.
 

Minty

AWF VIP
Local time
Today, 09:56
Joined
Jul 26, 2013
Messages
10,371
I don't think you'll find something pre-written, as most excel imports processes are bespoke to the data layout coming in and the layout required within your system.

The easiest way is normally to have some form of staging table that you upload the new data into. You then query that against your existing data.

You can present this query result as a list box perhaps, depending on the number of records. You can then select the list box items you want to import, and do an append query, for either just the selected items, or the selected items and any that aren't matched (new data?)
 

Cronk

Registered User.
Local time
Today, 18:56
Joined
Jul 4, 2013
Messages
2,772
To add to Minty's comments, and to give an example, I had a case of taking Excel data of military awards to individuals and trying to sort out duplicate personnel. While military service number was one identifying field, some had enlisted multiple times (with different service numbers) and at one stage when different military services had in the past issued the same set of service numbers to respective recruits.

Then there is the possibility of different forms of the same name eg William/Bill. On top of that, there will be typos.
 

jdraw

Super Moderator
Staff member
Local time
Today, 04:56
Joined
Jan 23, 2006
Messages
15,378
Further to the comments by Minty and Cronk which I agree with, many people have spent careers cleansing data. A staging type table is often part of the solution, and you can probably make use of unique composite index.
An issue you will face is to clearly identify what makes a duplicate, or said differently - is person A the same person as B or C?

I found this re temp/staging table in a response to a similar question:
"Most likely you use a temp table when you are importing records from elsewhere (like, say, Excel, or another db, perhaps maintained by someone else). Then a form is used to manipulate the record data (manually, with queries or with VBA procedures) - adding, deleting, modifying records/data until the records are in the condition you want them to add to a "permanent" table."

Good luck.
 
Last edited:

Uncle Gizmo

Nifty Access Guy
Staff member
Local time
Today, 09:56
Joined
Jul 9, 2003
Messages
16,280
One of my rainy day projects is to build an MS Access database for handling my bank account details.

Basically you export your bank details to comma separated values (CSV) and then they should import nicely into Access... No they don't!, not in all cases! One particular bank, the comma separated values do not format correctly in to MS Access, and I don't know why.

The solution I have come up with reads the comma separated value file, extractS the header records, creates a table, then imports everything as text into the newly created table. The routine then parses through the text and according to certain rules, checks the text entries and decides what type of data it is. Using the results of these decisions to build another table, I suppose you could call it a truth table for want of a better name. The code routine counts the numbers of particular data types in each column in the truth table. If there was a 100% text or integer or currency then it would convert that particular column of the imported table into that particular data type.

This works fine for small tables (Less than 70 Columns) but runs into problems with bigger tables! Still haven't solved that problem yet. I have mentioned it in AWF before. (Joining Large Tables together) I have shelved the project until I can fix this problem, or at least come to terms with the fact that it has this bloody problem!
 
Last edited:

JHB

Have been here a while
Local time
Today, 10:56
Joined
Jun 17, 2012
Messages
7,732
..One particular bank, the comma separated values do not format correctly in to MS Access, and I don't know why.
...
Do the commas been placed correct in the file, when you look at it in program like Notesblok?
If the values are easy to mask, then you could post a part of the original file, where you know there is a problem.
You have not written what exactly goes wrong, but what I suspect is that there may be some hidden characters in the file, like Tab, Return etc..
 

Users who are viewing this thread

Top Bottom