Ideas for ways to automate importing external data

cricketbird

Registered User.
Local time
Yesterday, 20:09
Joined
Jun 17, 2013
Messages
108
I have a database containing nutrient information for a variety of foods. We regularly get nutrient reports from various analytical laboratories. The reports are in Excel, and they name both the foods and nutrients with a variety of non-standardized names. The data tends to come to us in wide/crosstab format, but my database tables are structured in long/normalized format.

My current process for importing these data is:
  • Manually rename each food to match our food names
  • Manually rename each nutrient to match our nutrient names
  • Use R (statistics software) to "flatten" the data (see below)
  • Import flattened spreadsheet into Access
  • Run an append query

This process is tedious and prone to error (mine). I feel I would be better off developing a form that would allow me to match the lab sample names with our food names, the lab nutrient names with our nutrients, and import everything automatically. I just can't figure out how to start. Is Access even the best tool for this?

The various structures are below. Any suggestions on an approach would be greatly appreciated.

CB


Example Lab 1 Report:
Food Protein Fat Calcium
Apples 5 5 2
Bananas 4 9 3

Example Lab 2 Report:
Food Protein(g) Fat(g) Calcium(mg/kg)
Apple 6 6 3
Banana 5 10 5

Flattened/Normalized File from R combining both labs:
Food Nutrient Value Lab_ID Date
Apple Protein_g 5 1 12/1/2016
Apple Fat_g 5 1 12/1/2016
Apple Calcium_mg 2 1 12/1/2016
Banana Protein_g 4 1 12/1/2016
Banana Fat_g 9 1 12/1/2016
Banana Calcium_mg 3 1 12/1/2016
Apple Protein_g 6 2 12/1/2016
Apple Fat_g 6 2 12/1/2016
Apple Calcium_mg 3 2 12/1/2016
Banana Protein_g 5 2 12/1/2016
Banana Fat_g 10 2 12/1/2016
Banana Calcium_mg 5 2 12/1/2016
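For reference, here is roughly what the melt step in R does, sketched in dependency-free Python (the function and column names here are illustrative, not my actual R script):

```python
# A minimal sketch of the reshape/melt idea: each wide row fans out
# into one long row per nutrient column.
def melt(rows, id_col, lab_id, date):
    long_rows = []
    for row in rows:
        for col, value in row.items():
            if col == id_col:
                continue  # the Food column identifies the row, it is not a nutrient
            long_rows.append({
                "Food": row[id_col],
                "Nutrient": col,
                "Value": value,
                "Lab_ID": lab_id,   # constant for the whole spreadsheet
                "Date": date,
            })
    return long_rows

# Example Lab 1 report in wide/crosstab form.
lab1 = [
    {"Food": "Apples", "Protein": 5, "Fat": 5, "Calcium": 2},
    {"Food": "Bananas", "Protein": 4, "Fat": 9, "Calcium": 3},
]
flat = melt(lab1, "Food", lab_id=1, date="12/1/2016")
```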

My database:
FOOD_TABLE
Food_ID
Food_Name
etc.

NUTRIENT_TABLE
Nutrient_ID
Nutrient_Name

NUTRIENT_DATA
Data_ID
Data_Date
Lab_ID
Food_ID
Nutrient_ID
Nutrient_Value
 

gemma-the-husky

Super Moderator
Staff member
Local time
Today, 01:09
Joined
Sep 12, 2006
Messages
15,656
Flattened/Normalized File from R combining both labs:

The table layout looks OK.

Import this as a table (or link to it, but I generally import). You should then be able to write a series of queries to extract the data you want into the appropriate tables.
 

cricketbird

Registered User.
Local time
Yesterday, 20:09
Joined
Jun 17, 2013
Messages
108
Thanks - but this still involves me manually coding all the original data files in Excel. I guess I was hoping I could use Access to recode the data somehow. Right now the first two steps of my process can take over an hour per report (there are hundreds of items on these reports), and I get 3-5 reports per day.
 

Tieval

Still Clueless
Local time
Today, 01:09
Joined
Jun 26, 2015
Messages
475
You cannot use Access as a brain :mad:

What you could do is import the details of the datasheet into a table such as 'tempdata'.

You could then load this into a sub-form for viewing with the main form for data entry. The main form would contain your real fields and could consist of drop-down boxes etc. to ease data entry.

Once you close the form the correct data would be entered and the information cleared from the tempdata table.
 

Frothingslosh

Premier Pale Stale Ale
Local time
Yesterday, 20:09
Joined
Oct 17, 2012
Messages
3,276
I once created a procedure that did basically what you're asking. It involved a staging table, multiple append queries to compress the columns at issue into one field, and a whole lot of data validation.

It can be done, but it's going to take a fair amount of work up front and some maintenance, not to mention a way to tell if the automated procedure runs into a new column it's not equipped for.
 

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Yesterday, 19:09
Joined
Feb 28, 2001
Messages
27,184
Read up on Access using automation methods called "Application Objects." What you want here is an Excel Application Object. You will also need to look up the File System Object and you will need to learn how to use a File Picker. Read up on VBA if you aren't already fluent, because to do this in an automated way, you are going to need your programming skills. Hold on to your hat, 'cause this is going to be a roller-coaster ride.

First: Set up a working table to contain your name, nutrient, date, lab ID, nutrient value, and whatever else you are staging. You will do your imports through this table.

Second: Set up translation tables to let you look up a raw name and come back with a translated name. Two columns, most likely. RName (raw name) and XName (translated name). You can do this for food names and for nutrients in separate tables if you wish.

Third: Though you COULD theoretically do this with a Macro that does a RunCode, you MIGHT do better if you ran this from a form due to the better error handling capabilities associated with a Class Module.

OK, on the form build a button that says DO IT. Behind the button (in the OnClick routine) you will have code to either directly or via subroutine do the following steps.

1. Open up a file-picker dialog. Pick a spreadsheet.

2. Create an Excel application object. (In my discussion I'll call it xlObj, but the name is arbitrary.) Use it to open the picked file as a workbook. Tell it to select the correct worksheet, which will become your ActiveSheet. (Sort of like the "ME" name for forms - it is a shortcut.)

3. Write a loop to step through the worksheet one row at a time. That isn't as hard as it sounds. You will know how many rows you have from ActiveSheet.UsedRange.Rows.Count; you can address each row with ActiveSheet.Rows(n).

4. Read the row's first column (ActiveSheet.Cells(n, 1)) to see if it is the header or a data line. The header seems to contain the keyword "FOOD", so that is one tell-tale item. You can also test the other columns to see if the 2nd or later column is numeric (using the IsNumeric(string) function, which returns TRUE if the text string is numeric or FALSE if not).

5. Have two arrays, NutrName(1 To 4) and NutrValue(1 To 4) - but if there are actually more columns, then use 1 to however many columns you really needed. You showed 4 columns, but this method would work for 100 columns just as well.

For the row that contains the header, for each column you do a nutrient name lookup (DLookup) of what is in that column of the spreadsheet and store the translated name in the corresponding NutrName array slot.

For each row that contains a food name, do a food name lookup for column 1 only and store that in NutrName(1); for each nutrient column, do a Val(string) conversion on the text that appears in the spreadsheet for that column and store it in the corresponding NutrValue array slot.

6. When you have finished the row, write a second loop that does an INSERT INTO for array members 2 to n (n being the number of columns you imported) where you build a dynamic SQL query to insert the food name that you stored in column 1 and the nutrient name and value from the correct array slot.

7. If this all worked without error then have two more controls for Lab ID and Sample Date on the working form. Enter the Lab ID and Date (which are, after all, constant for a single spreadsheet, right?) Enable a button to do the append. Build one more SQL INSERT INTO query, this time importing the contents of your temp table with the fields from that table and the LabID and SampleDate fields supplied by the form's controls.

8. Always, always, always: Close the xlObj.Workbook without saving. Close the xlObj object. Never leave these two items dangling when you are done, whether with success or failure.
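To make steps 4 through 6 concrete, here is a rough sketch of the same loop in Python. The translation dicts stand in for the DLookup translation tables, and every name in it is illustrative rather than taken from a real lab report:

```python
# Translation tables: raw lab name -> our standardized name.
# In Access these would be two-column tables queried with DLookup.
NUTRIENT_XLAT = {"Protein(g)": "Protein_g", "Fat(g)": "Fat_g",
                 "Calcium(mg/kg)": "Calcium_mg"}
FOOD_XLAT = {"Apple": "Apples", "Banana": "Bananas"}

def parse_sheet(rows):
    header = rows[0]                 # e.g. ["Food", "Protein(g)", ...]
    # Step 4/5: translate each header cell once, up front.
    nutr_names = [NUTRIENT_XLAT[h] for h in header[1:]]
    records = []
    for row in rows[1:]:             # data lines only
        food = FOOD_XLAT[row[0]]     # food name lookup for column 1
        # Step 6: one (food, nutrient, value) record per nutrient cell.
        for name, value in zip(nutr_names, row[1:]):
            records.append((food, name, float(value)))
    return records

sheet = [["Food", "Protein(g)", "Fat(g)", "Calcium(mg/kg)"],
         ["Apple", 6, 6, 3],
         ["Banana", 5, 10, 5]]
```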

Now, the wrinkles... If you run across a name you have not seen before, you have a couple of options.

First is to pop up a message box that says "Cannot translate food name <gutschmecken>", and then disable the button to do the append.

Second option is to pop up an input box that says "Cannot translate food name <gutschmecken>. What name should I use?" Then synthesize an INSERT INTO for the appropriate table (nutrient or food name) and add the new name and new translation. Then take that same input name and use it as though the DLookup had succeeded and returned that translation.
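That second option can be sketched in a few lines (the ask callback stands in for the InputBox, and the dict update stands in for the INSERT INTO the translation table; names are illustrative):

```python
# Translate a raw name; on a miss, ask the user once, remember the
# answer, and carry on as if the lookup had succeeded.
def translate(raw, table, ask):
    if raw not in table:
        table[raw] = ask(raw)   # InputBox + append to translation table
    return table[raw]

foods = {"Apple": "Apples"}
name = translate("gutschmecken", foods, ask=lambda r: "Good-Taster")
```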

Final advice: Read this SLOWLY. I have to presume you have some programming skills, and this will require you to use them. But just remember Julius Caesar's advice: Divide and Conquer! Take each little part and contemplate it.
 

Tieval

Still Clueless
Local time
Today, 01:09
Joined
Jun 26, 2015
Messages
475
Obviously Access would not be able to handle gutschmecken as it is a matter of taste:D
 

Frothingslosh

Premier Pale Stale Ale
Local time
Yesterday, 20:09
Joined
Oct 17, 2012
Messages
3,276
Doc, having done precisely what the OP is asking for, let me just point out that you can do the same thing by importing the data into a staging table first, and it will run a LOT faster via recordset than via automation. For a 6,000 line spreadsheet, it dropped from 10 minutes to about 45 seconds in the app where I had to do it.

As long as the data is clean, the import wouldn't be a problem, but if it's not clean he's going to have issues either way.
 

cricketbird

Registered User.
Local time
Yesterday, 20:09
Joined
Jun 17, 2013
Messages
108
Thanks everyone for the speedy and detailed advice! The_Doc_Man - thank you very much for such a detailed procedure. I think that is a direction I will head, with some modifications that have struck me in the meantime.

Here are my current thoughts on changing the procedure using your suggestions:
1) Flatten the data FIRST using R (really - someone needs to translate the reshape>melt procedure from R into VBA. Surely I'm not the only one who has to deal with flattening data)
2) Import flattened data into Access as a table or temp table
3) Use the validation tables that The_Doc_Man and Frothingslosh suggested to contain an ever-growing list of what various labs call various foods and nutrients (and LOVE the option to automatically add new items as they pop up at the end of The_Doc_Man's process).
4) Use queries to join the various tables together and append a final "clean" version of the data to my data table.
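Steps 3 and 4 can be sketched with sqlite3 as a stand-in for Access (the table and column names below are illustrative, not my actual schema):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE staging (raw_food TEXT, nutrient TEXT, value REAL);
CREATE TABLE food_xlat (raw_name TEXT, food_id INTEGER);
CREATE TABLE nutrient_data (food_id INTEGER, nutrient TEXT, value REAL);
""")
# Step 2: flattened rows land in the staging table as-is.
con.executemany("INSERT INTO staging VALUES (?,?,?)",
                [("Apple", "Protein_g", 6), ("Banana", "Fat_g", 10)])
# Step 3: the ever-growing translation table.
con.executemany("INSERT INTO food_xlat VALUES (?,?)",
                [("Apple", 1), ("Banana", 2)])

# Step 4: the append query -- only rows whose raw name has a
# translation are moved into the clean data table.
con.execute("""
INSERT INTO nutrient_data (food_id, nutrient, value)
SELECT x.food_id, s.nutrient, s.value
FROM staging s JOIN food_xlat x ON s.raw_food = x.raw_name
""")
```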

Do you see advantages/disadvantages with this approach vs. the others mentioned?

CB
 

gemma-the-husky

Super Moderator
Staff member
Local time
Today, 01:09
Joined
Sep 12, 2006
Messages
15,656
You have to do it the first time, to work out the processes involved.

As long as the Excel spreadsheet doesn't change, your process will work forever.

I think Access might have some data analysis options, but even if it does, you need to check it yourself, and what you have described is not complicated.
 

cricketbird

Registered User.
Local time
Yesterday, 20:09
Joined
Jun 17, 2013
Messages
108
Hey! I think I've got something going here that is a little faster:

1) Flatten data in R, import to Access (hoping to eventually automate this, but so far, manually) - looks like the flattened file above
2) SQL query adds two columns to the imported table: LinkedNutrient and LinkedFood
3) A form based on the above SQL query uses combo boxes to pick nutrients and foods from existing tables and puts those values in the LinkedNutrient and LinkedFood columns.
4) Another SQL query formats the data (still working on some unit conversions) and appends it to my nutrient data table.

I've got the pieces working now for data that needs no unit conversions (mg to g, % to ppm, etc.) - just have to wrap it in VBA to get it to flow nicely. Once I've got it well oiled, I'll post back. Will work on unit conversions and final touches tomorrow. Another step will be to modify the #3 above and save the matched values for future use. But, I'll need to do it manually for a while to build up my table of potential matches.
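For the unit conversions, a small table-driven approach may be enough. This is a sketch only: the canonical units below (grams for macronutrients, ppm for minerals) are my assumption, and the factor table would need to cover whatever units the labs actually report:

```python
# Map each incoming unit to (canonical unit, multiplier).
# Note 1 mg/kg is already 1 ppm, and 1% is 10,000 ppm.
TO_CANONICAL = {
    "g":     ("g",   1.0),
    "mg":    ("g",   0.001),
    "mg/kg": ("ppm", 1.0),
    "%":     ("ppm", 10_000.0),
}

def convert(value, unit):
    canon_unit, factor = TO_CANONICAL[unit]
    return value * factor, canon_unit
```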

Thanks everyone for your ideas. Static - I started off using yours as the base. Appreciate the example!

CB
 

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Yesterday, 19:09
Joined
Feb 28, 2001
Messages
27,184
Frothingslosh, you are absolutely correct and I could have tossed out something that used an UPDATE based on a JOIN to a translation table. My method, though not as fast as using straight SQL, might have been conceptually easier to understand.

I am well aware of how many ways there are to skin a cat, having been cited many times by the ASPCA.
 

cricketbird

Registered User.
Local time
Yesterday, 20:09
Joined
Jun 17, 2013
Messages
108
I ended up with a form laid out sequentially into the following steps:
  1. The first item is a combo box that lets you choose the lab format (so far, I've only got this set up for one lab). This is next to a button that launches the corresponding script in R. The R script prompts the user for an excel file and flattens the data using the column names (i.e. nutrient analyses) that are specific to that lab.
  2. A button that opens the file picker to choose the flattened file exported from R in step 1, applies some conversion factors, lines up the column headers (analyses' names) with our in-house names for those, adds a column for FOODID (our database primary key) and puts this in a temporary table.
  3. A datasheet view of the temp table, with one column being a combo box listing our food names, so the user can manually line up the imported foods with our food names and review the data for any odd numbers or anything that didn't import correctly.
  4. A button to import the data you have just viewed and edited. This appends it to our main database.
 

RogerCooper

Registered User.
Local time
Yesterday, 17:09
Joined
Jul 30, 2014
Messages
286
I have an analogous problem. I need to import reports from our customers with sales from their branches, so I can compute commissions for field reps.

The customers vary greatly in format. On the other hand, I only have about 10 customers who do this.

My approach is to save the customer reports, removing superfluous header information. I create links to the reports and design a unique query to populate a standardized table in Access. Once the query is designed, everything is done automatically with a macro.
 

NauticalGent

Ignore List Poster Boy
Local time
Yesterday, 20:09
Joined
Apr 27, 2015
Messages
6,341
My approach is to save the customer reports, removing superfluous header information. I create links to the reports and design a unique query to populate a standardized table in Access. Once the query is designed, everything is done automatically with a macro.

Maybe a simpler approach would be to provide your customers with a standard form instead...?
 
