255 Column Limit

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Today, 07:43
Joined
Feb 28, 2001
Messages
27,186
Based on these answers, your 5 spreadsheets would relate to 5 SETS of records from the same tables. If everything is the same for each provider, you can save space and simplify the structure by stacking the five sheets "vertically" and adding a column for "Service Provider." This is part of the process of normalization.
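For illustration - assuming five linked sheets with identical columns (the sheet and field names below are invented, not from the thread) - the "vertical stacking" can be done with a UNION ALL query whose output is appended once into a single table carrying the Service Provider column:

Code:
SELECT "Provider A" AS ServiceProvider, AppointmentID, PatientID, ApptDate
FROM Sheet_ProviderA
UNION ALL
SELECT "Provider B", AppointmentID, PatientID, ApptDate
FROM Sheet_ProviderB
UNION ALL
SELECT "Provider C", AppointmentID, PatientID, ApptDate
FROM Sheet_ProviderC;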
 

LarryE

Active member
Local time
Today, 05:43
Joined
Aug 18, 2021
Messages
591
This is very complex, and the design is the easy part. Converting all of the EXCEL data will require time and experience with the process. Having said that, I must say you are going about it in the proper manner. Learning about table normalization and referential integrity is paramount. You are building a house, and a properly built foundation comes first - it will eliminate many, many problems in the future.

Attached is a file that has a table design that MIGHT be appropriate. Please look at Database Tools>Relationships to see a very preliminary design. Of course I have no idea about specific fields you need in each table, but you can enter those as required. I will post the design so others can comment as well:

[Image: preliminary table relationship design]

Note:
Instead of having separate RCADS, CurrentView and ORS tables for survey answers, you could just include those respective IDs in a single answer table.
 

Attachments

  • Medical.accdb
    1.3 MB · Views: 50

naa123

New member
Local time
Today, 13:43
Joined
Oct 30, 2023
Messages
20
Thank you very much for this, @LarryE! It is very helpful!

I just wanted to ask about the structure of the questionnaire tables. This goes back to posts #15/#16/#17 on this thread, so apologies if I am going round in circles, I just want to ensure I understand everything!

Before I received all of this helpful advice, I had planned to structure the Current View questionnaire as just one table (as below), but I am now learning it is best to have answer tables (e.g. CurrentView in the above design) separate from survey question tables (e.g. CurrentViewSurvey in the above design). I understand this is best because of limits on table and record width, and because databases tend to work best with long, thin tables. However, it would be a lot easier to import data as we currently have it into a table like the one below, especially as we get new data. Would I likely run into issues structuring my data in this way? With the way I plan to query tables, I do not foresee ending up with a record above 4000 bytes.


Appointment ID | Completed by | CV Q1 | CV Q2 | CV Q3 | CV Q4 | CV Q5
 

GaP42

Active member
Local time
Today, 22:43
Joined
Apr 27, 2020
Messages
338
A cautionary word ... I cannot provide insight on the particular data collected in the surveys; however, I have experience in managing health data collected for various purposes across a very large health system - for statutory reporting and for epidemiological and performance reporting. Data collected for these purposes is formulated into structures that support data analysis rather than transaction processing. @naa123, you need to be clear about the purpose of your data collection. It appears likely that it is focused on analysis and reporting - the submission of spreadsheets is likely to continue as the method of data acquisition - and this database will not have a front end to support data entry on a record-by-record basis.

If what I am saying is correct, then the process of normalisation may not be entirely appropriate, as it is focussed upon preserving data "truth" on a transaction-processing basis by ensuring only one instance/copy of a data item is maintained, alongside those other items which describe the same "thing". In a repository where the data is to be made available for analysis and reporting, this may not be the focus - the data can instead be structured to meet/facilitate reporting. Much of the reporting I have dealt with involved constructing the data sets into "flat files" to which those with tools such as SAS could easily apply their analyses.

The appropriate storage of data was focussed on the atomic level of data being collected - what "events" or "measures" were being collected - making these the central focus of the data structure, with things like who, where, and type as dimensions of those events or measures. This is, in brief, a star or snowflake style schema for your database.

The issue is not about saving storage space - it is about the width (number of columns) of your tables. Sure, each spreadsheet does not need its own table, but the data acquisition process needs to check the state of the data, assign foreign keys to link to reference data, and populate the "fact" table. Your fact tables seem to be focussed on Survey/Questionnaire responses and standard measures. The dimensions are Appointments (when the measure was taken), Providers (Dept/Institutions), Dr, and of course Patient. "Recommendations" appears to be a bit of an anomaly - it is free text and not standardised (and therefore not well formed for data analytics), and there is no data on diagnosis.
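To make the star/snowflake idea concrete, here is a minimal DDL sketch - every table and field name below is invented for illustration: a narrow "fact" table of responses keyed to dimension tables. Run each CREATE TABLE separately in Access's SQL view.

Code:
CREATE TABLE DimPatient (
    PatientID AUTOINCREMENT CONSTRAINT pkPat PRIMARY KEY,
    CommonPatientRef TEXT(50)
);

CREATE TABLE DimProvider (
    ProviderID AUTOINCREMENT CONSTRAINT pkProv PRIMARY KEY,
    ProviderName TEXT(100)
);

CREATE TABLE FactSurveyResponse (
    ResponseID AUTOINCREMENT CONSTRAINT pkResp PRIMARY KEY,
    PatientID LONG CONSTRAINT fkPat REFERENCES DimPatient (PatientID),
    ProviderID LONG CONSTRAINT fkProv REFERENCES DimProvider (ProviderID),
    ApptDate DATETIME,
    QuestionCode TEXT(20),
    AnswerValue TEXT(255)
);

Each row is one answer to one measure at one appointment - long and thin, so the 255-column limit never bites.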
 

LarryE

Active member
Local time
Today, 05:43
Joined
Aug 18, 2021
Messages
591
naa123 said: "I just wanted to ask about the structure of the questionnaire tables... Would I likely run into issues structuring my data in this way? With the way I plan to query tables, I do not foresee ending up with a record above 4000 bytes."
My thought is that you would have a standard set of questions for all 3 questionnaires, stored in their respective survey tables. Then, when a patient makes an appointment, someone will need to determine which survey questionnaires are appropriate to be answered. After that determination is made, there should be separate command buttons on the Appointments form that APPEND the active appointment's AppointmentID foreign key to the appropriate answer table (RCADS, CurrentView, ORS - some or all of them). Then the answers can be entered. That way, only the appropriate questionnaires and answers are completed for that appointment for that patient.

You stated that each survey may or may not be completed for each patient at each appointment, but they all need to be available to be completed. Which ones will need to be determined; then simply click the appropriate button(s) and complete the questionnaire.
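For instance, a minimal sketch of what one of those buttons might run (the form, control, and table names here are my own invention, not taken from the attached db):

Code:
Private Sub cmdAddRCADS_Click()
    ' Seed one stub row in the RCADS answer table for the active appointment;
    ' the answers can then be entered on the RCADS subform.
    If Not IsNull(Me.AppointmentID) Then
        CurrentDb.Execute "INSERT INTO RCADS (AppointmentID) " & _
                          "VALUES (" & Me.AppointmentID & ")", dbFailOnError
        Me.subRCADS.Requery   ' refresh the answer subform
    End If
End Sub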
 

GaP42

Active member
Local time
Today, 22:43
Joined
Apr 27, 2020
Messages
338
I will persist with the line I raised earlier, as I have not yet seen confirmation from the OP that the envisaged method of operation is to capture the data in the db at the point of patient care as described in post #25 - i.e. at the appointment. Typically the survey data is collected on a paper form or perhaps in survey software, and then transferred to spreadsheets. (It may also be captured using the clinic information system/module in use within the Department, which can vary between Depts and be interfaced to the Patient Admin System. Another system to capture patient data is another source of error and data redundancy in these circumstances.)

@naa123 please provide confirmation of the above (this post / post #25, and #24) - it is fundamental to what you need.
 

Pat Hartman

Super Moderator
Staff member
Local time
Today, 08:43
Joined
Feb 19, 2002
Messages
43,275
GaP42 said: "Typically the survey data is collected on a paper form or perhaps in survey software, and then transferred to spreadsheets."
Why would you enter the data into spreadsheets? That's an unnecessary step. Why not enter it into the computer directly? If you get surveys from others, you can normalize them on the way in. If they are wider than 255 columns, you simply create multiple append procedures: procedure 1 selects fields 1-250; procedure 2 selects field 1 (the PK) plus fields 251-300; and so on. Once the data is normalized, you can work with it easily.
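For example - assuming the incoming rows have landed in a staging table tblStaging with SurveyID as the PK (names invented, not from the post) - each append "procedure" is just a query that carries the PK plus its own block of fields:

Code:
INSERT INTO tblAnswersA (SurveyID, Q1, Q2, Q3)
SELECT SurveyID, Q1, Q2, Q3
FROM tblStaging;

INSERT INTO tblAnswersB (SurveyID, Q4, Q5, Q6)
SELECT SurveyID, Q4, Q5, Q6
FROM tblStaging;

In practice each SELECT would list its own run of up to ~250 fields; the PK is repeated in every query so the pieces can be rejoined later.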

Here is a very simple example of a survey database. It lets you define surveys and respondents, and then enter the answers for each respondent. The app allows a respondent to take the same survey multiple times. The response form is set up with two types of answers: one uses a combo box, the other a text box. You can get fancy and overlay controls so that you have the option of combo, text, date, or number and, in the current event of a row, set focus to the control you need for that particular question. This adds complexity, but I've done it for other applications where I needed to collect a very large range of data points, not all of which were relevant for each survey. This sample includes a grouping option which is used to organize the questions.
 

Attachments

  • surveyExampleSimple_20231111.zip
    834.3 KB · Views: 64

LarryE

Active member
Local time
Today, 05:43
Joined
Aug 18, 2021
Messages
591
GaP42 said: "I will persist with the line I raised earlier - as I have not yet seen confirmation from the OP that the envisaged method of operation is to capture the data in the db at the point of patient care... please provide confirmation of the above - it is fundamental to what you need."
It's pretty clear from the OP's statement in post #20 - "With the Current View, this is usually only filled out once during the first appointment, unless circumstances significantly change. The other 2 measures should be completed at each appointment, however this is not always the case." - that the survey questions are collected at the time of the appointment, if at all.

Any system should be able to print the questionnaires first and then collect and enter the data. That said, survey questions could most certainly be distributed to patients via e-mail before the actual appointment. Clinics and doctors do that all the time.
 

GaP42

Active member
Local time
Today, 22:43
Joined
Apr 27, 2020
Messages
338
LarryE said: "It's pretty clear from the OP's statement in post #20 that the survey questions are collected at the time of the appointment, if at all... survey questions could most certainly be distributed to patients via e-mail before the actual appointment."
That's right, @LarryE - the data is collected at the appointment. What I am asking is whether the OP is seeking to change the method of collection at the appointment, or whether the problem is about how to manage the data (for analysis) after the collection. Does the OP have the ability to effect change in the collection process (managing change in at least 5 clinics, and the work practices in place), or is it about how to handle the data after collection? The clinics will have software to manage appointment/patient data, and may collect the survey data there too. If the data is being extracted as separate Excel spreadsheets at each clinic, then an appointment management system is probably not being sought (and a home-grown system should not be recommended, given the constraints that exist around managing patient data).

If the survey data is supplementary/additional to the patient management systems (PMS) employed at the clinics, then yes, a process for collection of survey data may be needed - but consider: if the PMS exists, should there not be a method to get the accurate patient/provider/therapist/referral/appointment data from there, rather than trying to recreate it in the survey data-gathering application? There would be considerable data redundancy and duplicate data entry if such integration is not supported, with implications for work practices.
 

LarryE

Active member
Local time
Today, 05:43
Joined
Aug 18, 2021
Messages
591
@naa123
Could you please at least provide the Excel sheets' column names so we can know what fields they are tracking? You can just make a copy of one of the sheets and then delete all the rows so that just the column names show.

Thank you.
 

naa123

New member
Local time
Today, 13:43
Joined
Oct 30, 2023
Messages
20
GaP42 said: "I will persist with the line I raised earlier... @naa123 please provide confirmation of the above (this post / post #25, and #24) - it is fundamental to what you need."
Hi all, apologies for any confusion I have caused by not being clear on this. I am still getting my head around a lot of it myself (and learning a lot!) and in doing so may have misunderstood/missed key information. GaP42, you’re right – this database will not have a front end to support data entry, instead, data is collected by the service through their clinical software and this is then transferred to spreadsheets. I get access to the data at the point that it has been transferred to the spreadsheet – I am not able to change the collection method for these services. What I am concerned with is managing the data after collection so it can then be reported on. From the advice provided, it seems that normalization may not be appropriate given the purpose of this Access database.

Currently, the data is received on spreadsheets, then cleaned for analysis and reporting on patient outcomes, waiting times etc. Previously, all cleaned data was stored in Excel, but as the amount of data received is growing, I decided to move this master database to Access in the hope that it can better handle the large amount of data. What I need to think about now is how best to import the large flat Excel files and structure them into appropriate tables to link through queries for analysis. Ideally, I would want to link tables to the cleaned Excel spreadsheets so that data in Access is updated as and when new data is received/added to the cleaned spreadsheets.
 

ebs17

Well-known member
Local time
Today, 14:43
Joined
Feb 7, 2020
Messages
1,946
naa123 said: "What I need to think about now is how best to import the large flat Excel files and structure them into appropriate tables... so that data in Access is updated as and when new data is received."
The first step should always be to create a functional database schema in the database, according to database rules, not according to the Excel tables that are offered to you.

The database schema would give you an idea of how to break down the wide tables into narrower tables (fewer than 255 fields) so that query access is possible.

There are solutions for importing from Excel tables into the tables of a database schema using queries.
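One possible pattern, sketched under my own assumptions (the names and path below are illustrative): link the cleaned workbook so Access always sees the current data, then run append queries from the linked sheet into the schema tables, picking up only new rows.

Code:
' Link the cleaned workbook (run once; the link then stays current).
DoCmd.TransferSpreadsheet acLink, acSpreadsheetTypeExcel12Xml, _
    "xlCleaned", "C:\Data\CleanedMaster.xlsx", True

' Append only rows whose key is not already in the schema table.
CurrentDb.Execute _
    "INSERT INTO tblAppointment (ApptRef, ApptDate) " & _
    "SELECT x.ApptRef, x.ApptDate FROM xlCleaned AS x " & _
    "LEFT JOIN tblAppointment AS t ON x.ApptRef = t.ApptRef " & _
    "WHERE t.ApptRef IS NULL", dbFailOnError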
 

GaP42

Active member
Local time
Today, 22:43
Joined
Apr 27, 2020
Messages
338
@naa123 - appreciate the confirmation of the situation. You have indicated you do data cleansing using the spreadsheets. Does this include ensuring, for instance, that if the same patient attends more than one clinic (not just the same clinic multiple times), the appointment records point to the same patient - not by name, but by a common ID?
Also, I saw in your Word doc that you capture postcode and age. Since age is provided rather than DoB, it is specific to the appointment and not the patient (age at attendance). I presume you will not calculate back to DoB. Likewise for postcode: if it is used to assign the patient to a geographic location at the time of attendance - which may vary, since the patient may move - and retaining the postcode at the time of attendance matters (e.g. for epidemiological reasons), then it too is an attribute of the appointment. These are questions for your benefit in the design/assignment of attributes to the appropriate tables, as well as for others who may read this case, as it may not be what is normally encountered.

In the process of data acquisition, your database can/should use stages: the process acquires the extract and logs its receipt, then applies cleansing routines and reports on the results of the cleansing. Decide what to do if data is not acceptable - some records may have to be rejected entirely, while others may be accepted but with errors logged. You can assign "red tape" fields - from which batch did a record originate, does it have an associated error record? etc. And you probably need to assign foreign keys (such as for the Patient ID, if there is not a common source for this across the data collection). These are the ETL (Extract, Transform, Load) steps. They can be built up as your sophistication develops with the business needs - that is, unless you are OK with continuing to handle the ETL through the spreadsheets.

Once ready in the staging tables you can then load into the tables supporting analyses.
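A minimal sketch of those stages in VBA (all table and field names here are hypothetical): log the batch, tag every staged row with its batch ID, and divert rows that fail a cleansing rule to a reject table.

Code:
Dim db As DAO.Database
Dim rs As DAO.Recordset
Dim batchID As Long

Set db = CurrentDb
' 1. Log receipt of the extract and capture its batch ID.
db.Execute "INSERT INTO tblBatch (SourceFile, ReceivedOn) " & _
           "VALUES ('ClinicA_2023-11.xlsx', Now())", dbFailOnError
Set rs = db.OpenRecordset("SELECT @@IDENTITY")
batchID = rs(0)
rs.Close

' 2. Stage the rows, tagging each with its originating batch ("red tape").
db.Execute "INSERT INTO tblStaging (BatchID, ApptRef, PatientRef) " & _
           "SELECT " & batchID & ", ApptRef, PatientRef FROM xlExtract", dbFailOnError

' 3. Log rows failing a cleansing rule rather than loading them.
db.Execute "INSERT INTO tblReject (BatchID, ApptRef, Reason) " & _
           "SELECT BatchID, ApptRef, 'Missing patient ref' FROM tblStaging " & _
           "WHERE PatientRef IS NULL AND BatchID = " & batchID, dbFailOnError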

A preliminary, high-level view (much dependent on a limited concept of what data you have):
[Image: preliminary high-level relationship diagram]


However, with respect to the survey data the structure is extended:

[Image: extended diagram adding the survey response tables]

Note the 1:1 relationship of Appointment to each survey response - these could all be combined, but as you have encountered, if they were placed in one table the items would extend beyond the 255-column limit.
The data sets/queries are selectable by the dimensions of clinic, patient (type), and clinician/Dr to obtain the relevant appointments and the associated survey responses, as needed for detailed analysis. As said, this is a high-level conceptual view - there may be other dimensions not represented (e.g. calendar dates/periods, if you are not just selecting on the basis of appointment date).
You may also need to rigorously manage the definition of surveys (assuming variations may occur over time), your clinicians (presuming name is not sufficient - a medical practice ID?), and clinics/providers.

Your spreadsheets are identical in structure, so if all data is good, then you need to work out a standard process by which the Patient, Appointment and (Survey)Response tables are populated from each.
 

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Today, 07:43
Joined
Feb 28, 2001
Messages
27,186
While the "best answer" is complex, so is the problem. If you want to think of it this way, perhaps it can help.

Access CANNOT externally map that extra-wide Excel sheet as though it were a table any better than it could make a native Access table that wide. "No" still means "No" on the width question. However, Excel AS EXCEL can map it, and from Access, you CAN open an Excel Application Object that can read and manipulate the rows and columns of a spreadsheet, even if it is a huge one. You can open an Excel workbook, activate one of the sheets, and start random-accessing the cells by row number and column number.

Therefore, it might be possible to write code to handle the import problem one row at a time by selectively extracting data from the desired columns and performing various recordset operations - for example, take the contents of the first 12 columns as one record, but then store columns 13-20 as another record in a different table. Doing this properly will take careful analysis to determine, ON PAPER, the mapping of what goes where from each row. The analysis will be tedious but will give you a better mental grasp of the process. This process is certainly part of what GaP42 was discussing, and it might lead to a "standard process" to find and distribute the table data. Once you design that process fully, the code will be pretty simple for most of the data storage because you will have a road map.
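A bare-bones sketch of that approach (the path, sheet, field, and table names are invented; late binding, so no Excel reference is required):

Code:
Dim xl As Object, wb As Object, ws As Object
Dim rs As DAO.Recordset
Dim r As Long

Set xl = CreateObject("Excel.Application")
Set wb = xl.Workbooks.Open("C:\Data\WideSurvey.xlsx", ReadOnly:=True)
Set ws = wb.Worksheets("Data")
Set rs = CurrentDb.OpenRecordset("tblAnswers")

For r = 2 To ws.UsedRange.Rows.Count      ' row 1 holds the headers
    rs.AddNew
    rs!ApptRef = ws.Cells(r, 1).Value     ' column 1: the key, kept in every target table
    rs!Q13 = ws.Cells(r, 13).Value        ' pluck only the columns this table owns
    rs!Q14 = ws.Cells(r, 14).Value
    rs.Update
Next r

rs.Close
wb.Close SaveChanges:=False
xl.Quit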

I have done a decomposition of spreadsheets before, because when I took over our department's computer inventory, I discovered it had been kept in a messy spreadsheet designed by non-programmer engineers. That same concept - open the workbook, scan the sheet, pluck data from specific columns as needed - worked great for me at least twice before. Tedious to start - but eminently doable, and once you get accustomed to it, it goes easily and smoothly.
 

naa123

New member
Local time
Today, 13:43
Joined
Oct 30, 2023
Messages
20
Thank you very much everyone for your very helpful responses (and patience with me!), I really appreciate it. Everything feels a lot clearer now and I have some great advice to consider going forward.

GaP42, thank you for your detailed guidance in your recent post - it is extremely useful. Your questions around the cleansing process and the data acquisition process are also very helpful to think about.
 

GaP42

Active member
Local time
Today, 22:43
Joined
Apr 27, 2020
Messages
338
naa123 said: "Thank you very much everyone for your very helpful responses... Everything feels a lot clearer now and I have some great advice to consider going forward."
Happy to hear you are seeing a way forward. Please reach out directly through mail if you wish to discuss/review some details.
A further thought: while your problem was about the appropriate data structure to use, an MS Access back end database was the assumption in this scenario. An Access db has a 2 GB size limit. Your ETL processing / Access front end can be split from the back end database, so the BE only has to store the data. Further, you can keep the data in a larger-capacity DBMS to which the Access FE is linked, though doing this will rely upon available services and expertise. While I am not sure about the direct translation of Excel spreadsheet file size to Access BE file size (it depends upon the specifics of the translation), do some calculations of the size/frequency of your spreadsheets and the rate of growth of your data - and, more particularly, do a "test" import of your spreadsheets into the Access database.
 
