Need some guidance with normalizing data from excel import (1 Viewer)

Chaga

Member
Local time
Today, 17:47
Joined
Aug 19, 2022
Messages
34
Hello everyone,
I am an ex access developer, but haven't done any work in the past 15+ years so am quite rusty.
Would love it if you could give me some insight as to how to proceed with my issue(s).
I've decided to get out of retirement and help a friend of mine who's in HR, and migrate her data from excel to access. The excel sheet stores mostly CV info about people, and there are around 3,000 records. She just typed in the information the way it was in the resumes.
I need to normalize that data to be able to store it in a db, but am not quite sure as to which work is better done in excel before importing, and which must be done after.
I'll list down all the columns I have in the sheet with my comments next to them.

ID : I added this one to have a record ID
Name : This field contains composite names, always starts with first name, but might have up to 5 middle/last names mostly for Spanish/Portuguese/Arabic/Asian people. The problem I am facing is that for Spanish/Portuguese, the format could be FMML (AYRTON BRENO FARTURA TIAGO), while Arabic it could be FFL(Sheikh Omar Sharif) or FLL(Hassam Abdul Rasheed). What would be the best approach into dealing with that? Use first as F and last as L, with everything in the middle as M?
Phone & Alternate Phone : Here the format varies, but shouldn't be hard to fix. Ive got 1111111,+11111111, (1)111 111 etc...
Location : No Issue
Previous Positions : These cells contain all previous positions in bullet points. I have managed to separate them using the text to column feature in excel, so now I have a sheet with ID, POS1,POS2,etc...
Interview1 Time : NI
Interview1 Date : NI
Interview1 Interviewer : NI
Vaccinated : Has the vaccine name and number of shots taken or simply a yes
Last Salary : Has some that are different currencies
Expected Salary : NI
Contracted : Has Y/N or current company name with date till end of contract.
Visa : Y/N
Applying Position : Has separate positions separated by "/" (Stage Manager / Cast Manager / Creative assistant director / Choreographer)
Feedback : NI
Rate : 1 to 5 with .25 increments
Shortlisted : Y/N
Interview2 Time : Need to be added
Interview2 Date : Need to be added
Interview2 Interviewer : NI
Manager Feedback : NI
Manager Recommended Position : has bullet points and "/" (•Cast Manager / Coordinator •Stage Manager / Coordinator)
Manager rating : Same as Rate
Shortlisted for final interview : Y/N
Past Experience : Bullet points with a list of the previous companies

I also need to add a final interview time and date

I have attached an excel sheet that has the basic table structures I am planning on creating, any input/corrections are more than welcome.
I am using office 2016, but last one I worked on was probably 2006, so have no idea about any new features.
I wish the data didn't have personal information, or else I would've attached a sample.

Thank you in advance.
 

Attachments

  • TableList.zip
    7.3 KB · Views: 96

plog

Banishment Pending
Local time
Today, 09:47
Joined
May 11, 2011
Messages
11,646
The first thing I notice is that you mocked up a database in Excel. For 10% more work you could have actually had something to start with. You would have been able to use the Relationship Tool, been able to throw some sample data in there to test a few things and actually have something you could use.

That aside, it looks good for the most part--other than not being able to see the relationships among tables.. The only thing I see is I'm not a fan of tables with only 1 real field in them (autonumbers are not real pieces of data). tblCountries, tblGenders, tblDivisions etc don't need to exist. Instead of using their ID as a foreign key into whatever table you should just store the actual value.

Make this in Access, throw some sample data at it and then see if it serves your needs. Then import everything into it if it does.
 

June7

AWF VIP
Local time
Today, 06:47
Joined
Mar 9, 2014
Messages
5,472
Breaking up composite names is not as simple as taking first word as first name and last word as last name. A first name could be Betty Jo. A last name could be Van Buren.

Also, do not use spaces nor punctuation/special characters in object naming.
 

Chaga

Member
Local time
Today, 17:47
Joined
Aug 19, 2022
Messages
34
Breaking up composite names is not as simple as taking first word as first name and last word as last name. A first name could be Betty Jo. A last name could be Van Buren.

Also, do not use spaces nor punctuation/special characters in object naming.
I am aware of that, but other than manually entering names, what would be the best approach?
 

Chaga

Member
Local time
Today, 17:47
Joined
Aug 19, 2022
Messages
34
The first thing I notice is that you mocked up a database in Excel. For 10% more work you could have actually had something to start with. You would have been able to use the Relationship Tool, been able to throw some sample data in there to test a few things and actually have something you could use.

That aside, it looks good for the most part--other than not being able to see the relationships among tables.. The only thing I see is I'm not a fan of tables with only 1 real field in them (autonumbers are not real pieces of data). tblCountries, tblGenders, tblDivisions etc don't need to exist. Instead of using their ID as a foreign key into whatever table you should just store the actual value.

Make this in Access, throw some sample data at it and then see if it serves your needs. Then import everything into it if it does.
I am an old school access user where we'd make tables for everything to enforce referential integrity.
It's not gonna be a big db, so no harm done here.
I've created a basic db and also added an excel with some data. As I said, I mostly need to figure out what work should be done in excel, prior to importing data, and what can be done after the import.
Thanks
 

Attachments

  • JennHR.zip
    68.4 KB · Views: 95

June7

AWF VIP
Local time
Today, 06:47
Joined
Mar 9, 2014
Messages
5,472
I am aware of that, but other than manually entering names, what would be the best approach?
However you automate break up of names, really need to do a review of each and every name to make sure the splitting is correct for each. There is no easy way to ensure this data integrity.
 
Last edited:

mike60smart

Registered User.
Local time
Today, 15:47
Joined
Aug 6, 2017
Messages
1,905
I am an old school access user where we'd make tables for everything to enforce referential integrity.
It's not gonna be a big db, so no harm done here.
I've created a basic db and also added an excel with some data. As I said, I mostly need to figure out what work should be done in excel, prior to importing data, and what can be done after the import.
Thanks
Using Data - Text to Columns in Excel gives you the following output
 

Attachments

  • Names.PNG
    Names.PNG
    17.9 KB · Views: 88

Pat Hartman

Super Moderator
Staff member
Local time
Today, 10:47
Joined
Feb 19, 2002
Messages
43,275
tblCountries, tblGenders, tblDivisions etc don't need to exist.
@plog you post this frequently. I'd like to understand your thinking. Are you saying that this data should not be validated? Or are you saying that it should be stored at the table level in a combo with a value list? The first is just plain wrong and the second is hard to manage and requires the programmer to physically manage the list. Personally, I solved the problem of the proliferation of lookup tables 40 years ago. Long before I came to Access. For Access, I turned it into a mini-app that can be imported into any database. It is an excellent way to manage lots of small, simple lookup tables without using embedded lists or creating separate tables for each. If you want, you can implement security. I only used three levels - Programmer (things that the users could never change because they involved code to support the data changes), Admin (only admins), Anyone. Most clients never use the Anyone setting because people typically don't pay attention and left to their own devices will just add mis-spellings to the list.

@Chaga Given that Access is my tool of choice rather than Excel, I would use Access for anything that requires coding to fix. Since some of the fields can contain anything, they aren't suitable for searching and would be too hard to clean up so I would load them into long text fields if they could be more than 255 characters and leave it at that.

For the salary, I would attempt to convert it before loading it so it is all in the currency of your country. Same thing when you get to the data entry part. If you want to support currency conversions, you are going to need a source of daily currency rates. You can either do this as you need them or set up an automated job to update them daily or weekly or monthly, whichever makes sense. I don't have a link but there are websites that will download currency conversion lists in Excel format. Those can be linked to or imported.

Vaccine is also nebulous. I'm assuming this is related to COVID because prior to COVID, no one cared about vaccinations. To bad the COVID "vaccine" isn't an actual vaccine. In this case, if you want to keep the data, I would attempt to normalize it because if you care about it, it matters as we get new variants if a person has gotten his four boosters for the year:(

I don't like mushing names so I agree, use the first space to identify the first name and the last space to identify the last name and everything in the middle is the middle name. You can add the prefix and suffix fields if the data includes that and use a function with lists of prefixes and suffixes to see if the first or last is a prefix/suffix. You don't want a first name of Dr. or a last name of Jr.

There are several fields which I would put into separate tables such as interview times and interview could occur multiple times per individual. Ratings typically happen annually and so those belong in a separate table. Same with applications and the position applied for. Applications can also happen multiple times and actually the interview is a child of the Application.

Please post a picture of your schema and we will comment further.
 

plog

Banishment Pending
Local time
Today, 09:47
Joined
May 11, 2011
Messages
11,646
Or are you saying that it should be stored at the table level in a combo with a value list?

This one. Usually. More often than not, you don't even need a value list.

You don't need a Gender table nor is it hard to administer it with my method. Also, Chaga has tblRatings with an autonumber primary key that is used to look up a numeric field. So instead of storing the numeric value needed he's storing a numeric value related to the actual number needed. Unnecessary.

I've seen this play out time and again--either using a table to relate to just a number or used to hold a virtually immutable list. People read up on normaliziation then over do it. Not everything needs its own table.
 

Gasman

Enthusiastic Amateur
Local time
Today, 15:47
Joined
Sep 21, 2011
Messages
14,306
Well I used to have a Value list for a combo consisting of Male and Female just because that was all that was needed at that time.
That no longer flies these days. :) One would be best having a table for all the new permutations? :)
 

Pat Hartman

Super Moderator
Staff member
Local time
Today, 10:47
Joined
Feb 19, 2002
Messages
43,275
@plog the tables or value lists are used to ensure that only the CORRECT values are entered. You need one or the other if you want valid entries. Apparently, you've never worked in the insurance industry where there are dozens of code lists for every application. Are you suggesting that these should be kept as value lists on tables and the programmer should make updates as necessary. I'm trying to understand your position. I don't get why you recommend that people don't enforce validation on their lookup lists. I agree that separate tables for each list gets unwieldy which is why I came up with this solution 40 years ago.

Have you ever looked at the link I posted? It solves the problem by using two tables and two forms and a report to manage an "infinite" number of simple lookup tables.

There are a few common "lists" where you wouldn't expect changes. Gender and Marital status are just two. Looks like Gender has gone wacko and if you didn't use a table before, you need one now. Even a value list doesn't work any more if you want to actually keep track of them because new "genders" are invented daily:( Since we are OK with the polygamy of Islam, I would expect new marital status' any day now. Not for Mormons, mind you but polygamy is OK for Muslims.
 

plog

Banishment Pending
Local time
Today, 09:47
Joined
May 11, 2011
Messages
11,646
Yikes, I really touched a polygamy/islam/gender nerve with my last response. I apologize. I will try to avoid those topics as well as Trump, racial equality, homosexuality, abortion and the moon landing in this response. But no guarantees.

Your insurance issue is not applicable to this discussion. Codes don't live in a vacuum. By their nature they are references to other data and are data themselves = 2 pieces of real data.

Also, no, I didn't read the link you didn't post in this thread.
 

Pat Hartman

Super Moderator
Staff member
Local time
Today, 10:47
Joined
Feb 19, 2002
Messages
43,275
Sorry,

@plog I'm not trying to start a fight with you. I'm trying to understand your position and so far, your response has not been clear. Let's start again. If you don't think that validation lists should be stored in tables, what do you think is the solution? Are you suggesting value lists at the table level for dozens of items? How do you propose to update them? The interface barely works for 6 items, let alone dozens. Or are you saying that we shouldn't use combos at all and just make people type into a text field?

The link above is to a solution I created a long time ago. It has stood the test of time. I use it in every application I build once I know I need at least one list for a combo.
 

June7

AWF VIP
Local time
Today, 06:47
Joined
Mar 9, 2014
Messages
5,472
I am also confused by Plog's statement. Why would you not have a table of countries even if there is only ID and CountryName fields?
 

Users who are viewing this thread

Top Bottom