Question about best practice on overwriting records vs deleting records

I have a database tracking various audio, video and network connections. Each cable is identified by an alphanumeric code consisting of three characters and a four-digit number, along with the source and destination of the cable and the equipment connected by the cable.

Currently, our practice is to delete the record for a given cable number from the database, which so far hasn't been a problem. However, I recently finished a project where my cable numbers were in the 9000 range. Due to limitations in the other ways that we document how equipment is wired and connected, the numeric portion of the cable number can't exceed 9999. To avoid having cable numbers that exceed 9999, I was thinking it would be best to overwrite records with new data instead of completely deleting the old records from the database, then give the user an option to view a report that lists all of the decommissioned cable numbers present in the database so that they could effectively re-use a cable number.

I know it's a basic question, but I am just curious whether overwriting records could eventually cause the record or database to become corrupt.
I almost always err on the side of deactivating records: mark them as deleted, don't actually delete them.

Reason: it's really, really hard to troubleshoot things that don't exist.
 
A long field always occupies 4 bytes, regardless of whether it has been filled or not, and 577907 occupies the same 4 bytes as a Long as 2 does.
How does a record get longer?

It DOESN'T get longer. But you miss the point. When a transaction is underway and doesn't get committed (specifically including DB.Execute sql, dbFailOnError), the WHOLE TRANSACTION rolls back. But you and I both know that the instructions are executed one at a time. Since the records ALL get processed before you regain control after an UPDATE query, it LOOKS monolithic to us - but it is not. (It really can't be!)

Where this becomes an issue is that you make an updated copy of record A, but since there is no COMMIT just yet, the new record A* has to just sit there until the COMMIT arrives - and therefore, so does the OLD record, because it hasn't been obviated just yet either. Now go on to update record B and do the same thing. Within the scope of that COMMIT operation, you have updated records A*, B*, C*, ... and original records A, B, C. The old versions of the records are just waiting for the new records to become current. This means that the virtual memory boundary was at least temporarily "stretched" by the combined lengths of A*, B*, and C*. In the meantime, A, B, and C are still in their correct places.

So...two cases apply.

First, imagine that the transaction fails and has to be rolled back. OK, just delete A*, B*, C*, etc. from where they were waiting, which would be in the next free space after the currently occupied table space in the DB's virtual memory. It might be possible to re-shrink virtual memory boundaries, though I tend to doubt it. Windows DOES have a way to adjust virtual memory size of your process - an API call, ... but it is an expensive operation because I think it requires you to be swapped out to virtual memory in order to adjust some of the virtual memory registers that map your process to physical memory.

Second, let's say the transaction succeeds and the new records have to become current. So if you want Access to be efficient, are you going to COPY the set of A*, B*, C* over the places occupied by A, B, C..., potentially having to copy some number of bytes into the middle of a buffer area, taking into account that if you updated a string field to be longer, you CAN'T do an overlay? Or do you just change the linked list that identifies/locates the table's records so that the pointers now point to A*, B*, C* and then go back to mark A, B, and C as deleted? Which one will take longer? Which one moves less total data? Access works on pointers a LOT. An open query has a bunch of pointers to identify the records being selected, updated, or deleted.

Bloat occurs because those old records A, B, and C are now deleted data, which doesn't get reclaimed until the next C&R. Access CAN'T reclaim that space on-the-fly because it requires you to step through the deleted data records to find one of the right size. HINT: This scan causes virtual memory thrashing, which totally screws your memory performance to the floor. Or into the basement.

So it isn't size-change in a record that causes bloat. It is the fact that you have rollback rules including the dbFailOnError case.
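
For what it's worth, here is a minimal sketch of the explicit transaction pattern being described, wrapping Execute with dbFailOnError; the SQL string, table and field names are placeholders only, not the poster's actual schema:

    Public Sub UpdateInsideTransaction()
        Dim ws As DAO.Workspace
        Set ws = DBEngine(0)                 ' default workspace

        On Error GoTo RollBackIt
        ws.BeginTrans
        ' Every row touched by this UPDATE is part of one transaction;
        ' none of the new row versions becomes current until CommitTrans.
        CurrentDb.Execute _
            "UPDATE tblCableInformation SET SourceRack = 12 WHERE CableNumber = 2202", _
            dbFailOnError
        ws.CommitTrans
        Exit Sub

    RollBackIt:
        ws.Rollback                          ' the whole batch of changes is discarded
        MsgBox "Update failed and was rolled back: " & Err.Description
    End Sub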
 
Long story short, I've never experienced bloat from updates. Period. Not even perceptible, anyway
 
Long story short, I've never experienced bloat from updates. Period. Not even perceptible, anyway

For very small numbers of updated records, given that Windows rounds databases to the nearest Megabyte, a large database with limited churn of records won't show a lot of bloat right away.
 
I just created a 2007 Access database with Table1 containing one million records with one column of Text datatype all containing the value "dog".
Database was 26 MB.

I then updated all million records to "cat" ... db still 26 MB.
 
Well, I know I will see a size change if I lengthen the text, or if I delete and re-append, and I know the sequential nature of the identity keys will be lost - all of that I knew. I was just pointing out that a simple update, all else being equal, does not cause a size change.
 
Try reading the directions one more time. Let me repeat it for you.


June likes the reuse idea. I disagree. I think it is worse than deleting, but that is just an opinion, especially since I believe that both are wrong. The real issue is: do you have paperwork that references the generated IDs? You have not actually addressed this question.
I am still not following you on how to create the unique index based on two fields of the form. This is the process that I followed based on your directions.

I opened the indexes dialog box and the PrimaryKey Index is already listed.

I create a new index on the first blank line and call it fullCableNumber, and the field name that I select is my field called cableCategory_PK. The data type of cableCategory_PK is a number because I have a parent table that contains all of the cable categories that we use. I then add the second field to the index, which is the cable number field. Those are the two fields that are used to generate the cable number HDV2022. When I try to save that index, I get an error message that Access is unable to save because it would result in duplicate values in the indexes, primary keys or relationships.


And yes, we have paperwork that references the generated IDs.
 
What Access is saying is that your data has at least one pair of rows where those key values are duplicated, so it can't set the new index as unique. If it is supposed to be unique, you need to find the duplicates and resolve the issue.
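
A find-duplicates query along these lines would show the offending pairs; the table and field names here are only guesses based on your posts, so adjust them to your actual schema:

    SELECT cableCategory_PK, cableNumber, Count(*) AS HowMany
    FROM tblCableInformation
    GROUP BY cableCategory_PK, cableNumber
    HAVING Count(*) > 1;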
 
You need to clean up the existing data first. OR add a third field to the PK. NEVER include the autonumber PK in a multi-field index.

Since you have paperwork, I would NOT delete records and I would NOT overwrite records. Try to implement my suggestion where you keep the two underlying fields as the compound unique index and the concatenated field as non-unique. When you need to archive a record, you set the archive date and set the two underlying data fields to Null.
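
As a rough sketch of that suggestion in SQL, run as two separate queries; names such as tblCableInformation, cableNumber, ArchiveDT and CableID are assumptions, not the actual schema:

    CREATE UNIQUE INDEX idxCategoryAndNumber
    ON tblCableInformation (cableCategory_PK, cableNumber);

    UPDATE tblCableInformation
    SET ArchiveDT = Date(), cableCategory_PK = Null, cableNumber = Null
    WHERE CableID = [ID of the cable to archive];

The idea is that once the two underlying fields are nulled out on archive, the archived row no longer collides with a new row that reuses the same category and number.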
When you say to add a third field to the PK, do you mean add a third field to the index that I am trying to create, or add a third field to the parent table for the cableCategory_PK?

I can't think of a way to clean up the data to remove the duplicate entries error, because there are multiple cables in each category, so there will always be duplicates if I try to create an index based on the cableCategory_PK.
 
Yes, I meant unique index.

We have gotten off track here. I thought there were originally THREE fields that, when combined, constituted uniqueness.
No, there were only two fields that are combined to determine uniqueness. Up to this point I haven't concatenated the two fields. I have added a field labeled fullCableNumber that will be a concatenation of the cableCategory_PK and the cableNumber fields. Then I will be able to use the fullCableNumber field as the unique index for the first index, and then I will create an ArchiveDT field on the table that I can use as the second non-unique index.

When I archive the record, I can null out the cableCategory_PK and the cableNumber fields. After that, I just need to figure out how to find the gaps between cable numbers in each category.
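
One way to find reusable numbers is a query that, within each category, looks for a cable number whose successor doesn't exist. The same caveat about assumed table and field names applies, and it reports the first free number after each used one rather than whole ranges:

    SELECT t.cableCategory_PK, t.cableNumber + 1 AS NextFreeNumber
    FROM tblCableInformation AS t
    WHERE t.cableNumber < 9999
      AND NOT EXISTS
          (SELECT t2.cableNumber
           FROM tblCableInformation AS t2
           WHERE t2.cableCategory_PK = t.cableCategory_PK
             AND t2.cableNumber = t.cableNumber + 1);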
 
I thought that reuse happened when you connected the cable to something else
I am sorry, I may have been using some terminology incorrectly. There are two situations where we would reuse a cable number. The first situation is just like you described: HDV2202 connects our old digital beta tape deck to input 279 on our video router. Then the 20-year-old digital beta tape deck bites the dust and we have to replace it with a new server that records everything to an SSD. After I have gotten the new server up and running, I would go into the database, pull the record for HDV2202, and edit the record to reflect the changes that were made. So if the old digital beta deck was in rack 13, but we installed the new server into rack 12, I would change the source rack from 13 to 12. Then I would change the source description to note that HDV2202 is now connected to the new server and not the old digital beta tape deck.

The other situation where we would reuse a cable number is after equipment has been disposed of and the cables have been removed from the rack, and this is the situation where I am not sure whether it would be better to delete or archive the record. We remove the old digital tape deck along with the cable connecting it to the router, so now there is no tape deck connected to router input 279 via cable HDV2202. The record for that cable is deleted and we print out new hard copies of the cable numbers. The hard copies now go HDV2199, HDV2200, HDV2201 and HDV2203.

Six months down the road we have to connect a new studio camera to the video router. We look in the database and see that there is no HDV2202 in the database, and we decide to reuse HDV2202. The way that my current database is set up, the first record for HDV2202 has been deleted, so we create a new record that reflects that HDV2202 connects Camera 5 in Studio A to Router Input 135, and we print out a hard copy of the cable information.

I am not too worried about the first situation because, with the exception of the cable category, cable source rack and cable destination rack, every field on the cable information table is either short text, date/time or a check box. So I am just changing the string of characters in the source description field on the cable information table.

The second situation is the one that I am most concerned about. If I keep just deleting the records, I am afraid that my database will be corrupted at some point and we'll lose a bunch of data. But if I archive the record, I have to make sure that the archived data doesn't interfere with the new data and that I am not getting duplicate records when I print a hard copy of the information or when I run a search for a cable number.

I have attached a copy of my database in case that might help.
 


In my mind it comes down to this: You created a database application in the first place in order to keep track of transactions involving your products and services. Deleting or overwriting records means you no longer have any idea what really happened in the past. You've turned your database into a glorified spreadsheet.

If you do not care what happened last week, last month, last year or at any prior point in time, it's a moot point, I guess. However, personally, I'd look into replacing it with a spreadsheet, which, ironically, would be easier to update.
 
I can't make any of these decisions for you. I've given you my opinion. If there is a paper trail (or email or whatever), I would not delete or overwrite the data. I would always want an audit trail. Here's a simple real-world example. You have an order entry system. A customer adds 3 items to his order. He then deletes item 2 before going to checkout. Should I archive item 2 or just delete it, since it never really happened? Likely as not, I would allow a delete in this case. But once the order has proceeded to checkout, the customer has seen the completed order, and whether or not he printed it out, he might have. So if at this point he goes back to his order entry and asks to delete item 2, I would now choose to archive it rather than delete it, because there could be a reference to the item in the real world and I would need to be able to answer the question of what happened to item 2 should the customer call and ask why his order contained only two items when he ordered three.
I am not trying to get you to make the decision for me. I am simultaneously trying to weigh the pros and cons as I see them and figure out how to code this. Having an archive of the old cable numbers would be beneficial.
 
We are 47 posts into this. You seem to agree that having the archive is the correct approach. Now it's a matter of details. As I said, I don't like the idea I threw out of keeping two versions of the user key in the record. I think there is a better way and the better way would be to create a child table. Post a picture of your schema so we can move on to that.
Pat:
Although I have not been participating, I was curious myself, so here is the design. I created it from the attachment in post #44.

He is using RI, but is not cascading anything. The db has only a few thousand records. I can't believe "bloating" would ever be an issue. He should probably just APPEND records he wants to archive to an archival table, then delete the record in tblCableInformation and forget it. You still have an archived record and could then still reuse the CableNumber field in other connections. He said he was worried that deleting records might cause some sort of corruption, but I don't know why he believes that. Passing it back to you. 😁
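
A rough sketch of that append-then-delete approach, with made-up names (tblCableArchive having the same columns as tblCableInformation, and CableID standing in for whatever the key of the row is), run as two separate queries or Execute calls:

    INSERT INTO tblCableArchive
    SELECT *
    FROM tblCableInformation
    WHERE CableID = [Cable to archive];

    DELETE
    FROM tblCableInformation
    WHERE CableID = [Cable to archive];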
 
He said he was worried that deleting records might cause some sort of corruption, but I don't know why he believes that. Passing it back to you. 😁
There was another Access forum that I used to be active on where I was told that when you delete records that have a relationship with records in another table, it's possible the relationship between the two records might not be deleted correctly, and that could potentially corrupt the database.
 
We are 47 posts into this. You seem to agree that having the archive is the correct approach. Now it's a matter of details. As I said, I don't like the idea I threw out of keeping two versions of the user key in the record. I think there is a better way and the better way would be to create a child table. Post a picture of your schema so we can move on to that.
Cable Number Database Relationships.jpg

We are using this database to track equipment maintenance and equipment problems as well as cable numbers. I isolated the relationships for the cable number portion of the database.
 
No point in having a relationship then?
 
There was another Access forum that I used to be active on where I was told that when you delete records that have a relationship with records in another table, it's possible the relationship between the two records might not be deleted correctly, and that could potentially corrupt the database.
If you check the Cascade Delete Related Records box when you create the relationships, then Access will delete the corresponding records correctly.
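
If you ever need to rebuild that relationship in code, the DAO equivalent looks roughly like this; the relation, table and field names are only guesses at the schema, and an existing relation with the same name would have to be deleted first:

    Public Sub CreateCascadingRelation()
        Dim db As DAO.Database
        Dim rel As DAO.Relation

        Set db = CurrentDb
        ' Parent table, child table, and the cascade-delete attribute.
        Set rel = db.CreateRelation("relCategoryToCable", _
                                    "tblCableCategory", _
                                    "tblCableInformation", _
                                    dbRelationDeleteCascade)
        ' Field in the parent table and its matching foreign key in the child table.
        rel.Fields.Append rel.CreateField("cableCategory_PK")
        rel.Fields("cableCategory_PK").ForeignName = "cableCategory_PK"
        db.Relations.Append rel
    End Sub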
 
It seems like Category, CableType, and CableNumber get assigned when a cable is added.

Then Source and Destination Rack get assigned when the cable is installed somewhere.

What does ActiveCable mean?

It also seems like Source and Destination Rack could change if you move a cable.

Can DrawingUpdate change after the record is created?
Yes, Category, CableType and CableNumber are assigned when a cable is added. The source and destination get assigned when a cable is installed somewhere, and the source and destination racks can be changed if the cable gets moved and installed somewhere else.

ActiveCable was my initial attempt to signify that a cable was still physically present between the source and destination racks but was not connected to anything. I also used it to make sure that when we printed out a report of the cable numbers, we didn't have those unconnected cables in the printout.
 
It's been a few days now. Have you resolved your issues?
 
