I'm building a database that's importing Excel contact data into it, and the data (which consists of many thousands of records) is very inconsistent. In the Excel doc, there's a "coverage" field ("coverage" indicates the area a particular contact covers/manages, so it can be multiple geographic locations) and this coverage field has wildly inconsistent entries: sometimes it's just the country "England", sometimes a state "New York", sometimes a region "Northwest", sometimes it's a combination "New York City, New York." In summary, there's no consistency at all. I believe I have two main options for cleaning this up and normalizing: (please provide thoughts on if there are more):
- Create many to many relationships for each type of data I find: people_country, people_city, people_region, people_state, etc.
- Denormalize all coverage entries into a "locations" table, and then create a many to many relation between people_locations, with a lookup table that provides metadata for each location