Massive Task - Dont Know Where to Start

  • Thread starter Thread starter Qoo
  • Start date Start date
Q

Qoo

Guest
Hi Guys

Ive recently been set an Task that involves collecting data and storing it in an Access Database. Sounds easy, however, there is a massive amount of data in various forms.
What also makes this difficult is the level of detail that my boss would like to see. This isnt too bad as i did a bit of visual basic and im sure i could make a few formulas etc etc but i dont know where to begin.
There is a robot that manufactures batches of cells. Each batch is then tested in groups of 24 cells. A Excel spread sheet is then generated by each 24 cells and data recorded in a number of test fields ( levels of readings) and ultimately wether the unit passes or fails. The thing is, this would be no problem, but for fact that there is such a large quantity of data. This is due to the fact that it has been generating information and just storing it in a folder up until now and there is just so much of it.
An Access data base is required to read certain data and display it - however the excel spread sheets they have been saved on are not linked, and it would take me literally months to manually convert each sheet to a link table. There are over 40 folders each containing over 150 or more excel spread sheets
and i have no idea how to start getting this amount of data into an Access table. Is it possible to extract this data without linking? If so how would i begin extracting that amount of data?

Do you guys have any suggestions about how i should start extracting all this information? Or any other tips for massive databases? Any help at all would be welcome as i literally dont know how to start going about this :D

Cheers guys!

Qoo
 
First thing I do is plan how you want the database set up. Don't worry about forms, reports etc. right now. Start by how you want to set up the tables and the relationships. Once you get that down the rest of it will be easy. I find the hardest part of designing a database program is how the information is stored.
 
Couldn't agree more, Maxx

Get it all down on paper. All the tables and fields and their relationships to each other. Then set about the forms etc.
 
Thanks for the advice
The task in itself is relitively simple - i have now worked out all relationships and how the problem can be solved. My only worry now is wether it is possible to directly access an already created excel spread sheet using access and use it to collate specific data. The prospect of linking 6000 + spreadsheets (and thats only for 1/3 of the year! ) , however, doesnt really appeal.
There is also the additional problem that some of the fields in the table are erroneous and would need to be removed. It would be impossible to go about this manually so i would need a method of removing erroneous data - in order to ensure more accurate results, without risk of altering any other data.
Ideally there would be some kind of command that would enable me to remove an entire row from a non-linked excel spread sheet. Is this possible in access? (sounds like it may need me to do something with the SQL - something i know nothing about)
Is it also possible to acess data directly from an unlinked spread sheet?

thanks again guys!

Qoo
 
If you need to cleanse your data then you're better off in Excel for several reasons:
1) Excel is less fussy about datatypes. Will hold garbage without throwing #Errors all over
2) Easier to cut copy paste between rows that between records in Access
3) Easier to 'eyeball' a set of data
4) Find and filter can be used to help identify data in Excel without the need to write queries

If you've got 6000 spreadsheets, you're going to have to deal with them sometime, they won't go away! Think about aggregating the data into one, or a small number of large spreadsheets, cleanse the data and then import into Access.

If you need to, identify the resources you need to do this project, and then get them. Problems are big if you are short of resources. With adequate resources they are simply tasks. Good project management begins with a plan. Developing the database is only part of the project.
 
Hiya

Thanks for the advice Nieleg - maybe tackling this beast in excel is the way forward. I have a meeting with my boss tomorrow where im going to show him my findings once i have worked out all my options.
Ironically im very "up" for doing this project, however, the only thing that worries me is that my position is not permanent here and as a result i cant just say "YES this can be done - but it will take time" as im only going to be here for another few months and i not sure if its long enough to deal with that volume of data.
6000 + records isnt even the beggining of it ! thats only data that was aquired between sept - dec 2002 ! I dread to think how much there is now!

Thanks again for the help everybody

cheers

Qoo
 
Qoo, you can do this, particularly if you have everything in a single folder. I'm going to randomly toss some concepts at you as things Access will help you in doing. You say you are comfortable in VB, so VBA should not be tough on you.

First, consider the Application.FileSearch object as a way to find your files. The help on the FileSearch object points you to ways to step through the files you find matching any template.

The way I would approach this part of the problem is to keep two folders. As you import each file, rename it (using Name As, q.v.) to a sub-folder or another folder name parallel to (at the same level as) the one holding the .XLS files. If it is in folder NOTYET, you haven't imported it yet. If it is in GOTIT, you have. No big, tricky lookups after the fact.

Second, the import operation isn't necessarily that bad IF you have a way to pre-screen records from the spreadsheet. I have taken both of these two approaches, each one works differently.

First, you could just import the spreadsheet to a table with pre-defined text fields. Then write some filtration queries (or Delete queries) to ignore or remove the stuff you don't want. Then, when your filtration is done, write an Update query between the temporary table and your final data table. Be sure to erase the temporary table before each new import. Format changes to the imported data would occur probably in the Update query.

Alternatively, you can open an Application.Excel object, for which the parent object is a WorkBook which is a collection of WorkSheets. The ActiveSheet (the selected Worksheet) is just a collection of Rows, each of which is in turn a collection of Cells. You can random-access your way through this thing with some simple nested loops, row by row. In this case, you could dynamically test each cell in your code. If a row passes muster, you could have a recordset open to your primary table. Add a new record (.AddNew) and append it to the recordset (.Update).

If something about the name of the file counts as valid data (that does not appear in the spreadsheet), you should have that available to store in the recordset, and in that case, the .Excel object is probably easier to manage.

This whole concept hinges on the ability to encode validation rules (perhaps more than one) for each field to be imported. If you cannot encode the rules, you would have trouble automating this process anyway. So once you have your data laid out, spend some time on defining the rules for data validation.

Hope these ideas help.
 

Users who are viewing this thread

Back
Top Bottom