Solved: Should I change the database design?

Currently, I have an Excel sheet that I import using the following:

Code:
DoCmd.TransferSpreadsheet acImport, acSpreadsheetTypeExcel12, "all_matters", "R:\Close Files\accessimport.xls", True

This Excel sheet has 16 columns and about 52,000 rows, and it does have a unique ID. About 10-20 rows are added five days a week, and 20-50 rows could have between 1 and 12 columns updated daily. (The data comes from a separate system; I currently don't have a way to make an ODBC connection to that system.) This table is the primary table of the database.

[Screenshot of the spreadsheet with fake data: Excel.jpg]



When we refresh the Excel sheet, it contains the existing 52,000 rows plus any new rows, with the updated rows already changed in place.

Currently, I am deleting the table and then importing the new Excel file. This is working; I am just wondering if it is the best way to handle it, or whether I should be doing an append and update instead. I have 11 other tables that use the RowID as a foreign key, but I noticed that when I delete the table "all_matters" it deletes the relationships. I am also worried that this could cause problems once I have multiple users. I was thinking about importing the Excel file into a table "all_matters1" and then doing an update and append into "all_matters". Honestly, I am not sure of the best way to handle this, and I would appreciate any advice.
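Something like this rough sketch is what I was picturing (it assumes "all_matters1" already exists with the same 16 columns as the spreadsheet):

Code:
' Clear the staging table, then pull the spreadsheet into it
CurrentDb.Execute "DELETE FROM all_matters1;", dbFailOnError
DoCmd.TransferSpreadsheet acImport, acSpreadsheetTypeExcel12, "all_matters1", _
    "R:\Close Files\accessimport.xls", True
' ...and then run an update query and an append query from all_matters1 into all_matters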



Also, while we are talking about design, I am thinking about splitting my database into a front end and a back end, but I had a question. If I split it, is there a way to make a copy that can be put on a laptop for use offsite and then have the data merged back into the main database? (This would need to be easy, because it would be done quite frequently.) If so, how would I do this?
 
Since you are looking at data changes in existing rows AND new rows, AND you have relationships for dependent rows that you don't want to lose, you have the worst of all situations. Let's do a "divide and conquer" approach here.

First question/comment: Is there ANY field in the spreadsheet that would tell you unequivocally that this was an existing row that was updated? Like a last change date or something. Because if you can identify changed rows then your daily update becomes manageable. If not, it is still possible but becomes a real pain in the patootie.

If you COULD identify changed rows easily, then your daily operation is a three-parter. First, link to the spreadsheet as an external table; it doesn't matter if it is read-only as long as you have a way to know which lines changed. Second, run an INSERT INTO (append) query from that spreadsheet/table for any rows whose unique ID number is not in the main table. Third, run an UPDATE query from that spreadsheet/table for the existing but changed rows.
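Just to sketch that (and it is only a sketch: it assumes the linked sheet gets the name LS, that its columns line up with all_matters, and that it carries some LastChanged stamp; all three are assumptions):

Code:
' 1. Link (not import) the spreadsheet - read-only is fine for this purpose
DoCmd.TransferSpreadsheet acLink, acSpreadsheetTypeExcel12, "LS", _
    "R:\Close Files\accessimport.xls", True

' 2. Append brand-new rows (IDs not yet in the main table)
CurrentDb.Execute "INSERT INTO all_matters SELECT LS.* FROM LS " & _
    "WHERE LS.RowID NOT IN (SELECT RowID FROM all_matters);", dbFailOnError

' 3. Update existing rows stamped since the last run
'    (an UPDATE query along the lines sketched further down this thread,
'    filtered on LS.LastChanged)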

If there is no easy way to know that something has changed, you have only a brute-force method available, in which you compare fields in order to identify the records that need updating. It might be possible to write a single query for this, but it is guaranteed not to run fast, because that external table won't be indexed; you'll be doing a relation scan.

Second question/comment: That external edit/subsequent merge requires a feature that is no longer supported as an automatic feature on modern versions of Access. There is the question of how often you intended to do this AND whether you would be the only person doing it.

As to the take-home/edit/reconcile feature, you'll have to "roll your own" on that one. Basically, you need a copy of the table with one more flag that says "I edited this while I was at home." (You could pick a shorter name for that, of course.) Make your copy to your laptop, setting the flag to false for all records. Then, any records you change, you set the flag to TRUE. Then to reconcile, you copy back the file and link to that copied table as an external file. Run an update query of the main table based on those records that have change flags.
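A very rough sketch of that reconcile step, assuming the copied-back laptop table is linked as all_matters_laptop and the flag field is called EditedOffsite (both names invented here), with only a couple of columns shown:

Code:
' Copy edited-at-home rows back into the main table
CurrentDb.Execute _
    "UPDATE all_matters INNER JOIN all_matters_laptop AS LT " & _
    "ON all_matters.RowID = LT.RowID " & _
    "SET all_matters.ClientName = LT.ClientName, " & _
    "    all_matters.Notes = LT.Notes " & _
    "WHERE LT.EditedOffsite = True;", dbFailOnError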
 
Agreeing with what has been posted so far, and emphasizing that anything you can do to avoid the bloat that comes from deleting and re-inserting large quantities of records is worth doing: if it were me, I would double-check whether there is anything I could do to exert some control over the Excel process. For example, some method that would essentially make sure each row was 'tagged' in a way that lets you tell whether it is New or Modified (like Doc Man has mentioned).

If you can collaborate with the folks who modify this Excel file and are allowed to make some VBA-driven enhancements to it, adding Modified and Created datetime-stamp columns would be easy.
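For instance (a sketch only, not tested against your workbook): a Worksheet_Change handler in the sheet's code module could stamp a date into a spare column whenever a row is touched. It assumes the data lives in columns A:P and that column Q is free for the stamp; adjust both to the real layout.

Code:
Private Sub Worksheet_Change(ByVal Target As Range)
    Dim changed As Range, c As Range
    ' Only react to edits inside the data columns (A:P assumed here)
    Set changed = Intersect(Target, Me.Range("A:P"))
    If changed Is Nothing Then Exit Sub
    Application.EnableEvents = False        ' avoid re-triggering this event
    For Each c In changed.Cells
        Me.Cells(c.Row, "Q").Value = Now    ' Modified datetime stamp Access can read
    Next c
    Application.EnableEvents = True
End Sub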
 
Since you are looking at data changes in existing rows AND new rows, AND you have relationships for dependent rows that you don't want to lose, you have the worst of all situations. [...]
Thank you so much. Unfortunately, I don't think there is a way to get a field that shows whether a row has changed. So how should I do the update? Should I use a recordset and a Do While Not EOF loop with a bunch of If statements? Do you have an example to get me started?

Here is the partial code I had started; not even sure if this will work.
Code:
Public Sub Updateall_Matters()
    Dim db As DAO.Database
    Dim rs As DAO.Recordset     ' main table
    Dim rs1 As DAO.Recordset    ' just imported from Excel

    Set db = CurrentDb
    Set rs = db.OpenRecordset("all_matters", dbOpenDynaset)
    Set rs1 = db.OpenRecordset("all_matters1", dbOpenDynaset)

    Do While Not rs1.EOF
        ' find the matching row in the main table (assumes RowID is numeric)
        rs.FindFirst "RowID = " & rs1("RowID")
        If Not rs.NoMatch Then
            If rs("ClientName") <> rs1("ClientName") Then
                rs.Edit
                rs("ClientName") = rs1("ClientName")
                rs.Update
            End If
            ' ...and the same compare/assign for each of the other columns
        End If
        rs1.MoveNext
    Loop

    rs.Close: rs1.Close
    Set db = Nothing
End Sub
 
NEVER do this in a loop of If tests if there is another way.

Call your main table MT and your linked spreadsheet LS.

Code:
UPDATE MT INNER JOIN LS ON MT.RowID = LS.RowID
  SET MT.Matter = LS.Matter, 
    MT.MatterDescription = LS.MatterDescription, 
    MT.MatterType = LS.MatterType, 
    .... (other fields to be copied)
  WHERE MT.Matter <> LS.Matter 
    OR MT.MatterDescription <> LS.MatterDescription 
    OR MT.MatterType <> LS.MatterType
    OR ..... ;

Then as a separate action,

Code:
INSERT INTO MT (RowID, Matter, MatterDescription, MatterType, ... (other fields to be inserted) ...)
SELECT RowID, Matter, MatterDescription, MatterType, ... list of corresponding fields
FROM LS WHERE LS.RowID NOT IN (SELECT MT1.RowID FROM MT AS MT1);

First query tries to find and update rows that don't match. WARNING: This will be slower than anyone would like.

Second query tries to import rows that don't exist in the main table yet.
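If the two statements above are stored as saved queries, say qryUpdateChanged and qryAppendNew (names made up here), the daily run from VBA could be as small as this:

Code:
Dim db As DAO.Database
Set db = CurrentDb
db.Execute "qryUpdateChanged", dbFailOnError
Debug.Print db.RecordsAffected & " existing rows updated"
db.Execute "qryAppendNew", dbFailOnError
Debug.Print db.RecordsAffected & " new rows appended"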
 
If you COULD identify changed rows easily then your daily operation is a three-parter.

To find which records have at least one changed field, hash all the field values together (you can exclude the unique ID), then compare that hash with a hash you store in the destination table, joining on the unique ID.

a hash function along these lines

Code:
Public Function hashFields(ParamArray flds() As Variant) As String
    Static UTF As Object
    Static md5 As Object
    Dim TextToHash() As Byte
    Dim bytes() As Byte
    Dim i As Integer
    Dim fldStr As String

    If flds(0) = "" Then    'free memory after the query has been run; call hashFields("") after the .Execute command
        Set UTF = Nothing
        Set md5 = Nothing
    Else
        'concatenate the fields into one string (can be a mix of text/numbers;
        'since the hash never needs to be reversed, consistency is all that matters)
        For i = 0 To UBound(flds)
            fldStr = "|" & flds(i) & fldStr
        Next i

        fldStr = Mid(fldStr, 2) 'remove the initial |; not essential, just my habit

        'create objects
        If UTF Is Nothing Then Set UTF = CreateObject("System.Text.UTF8Encoding")
        If md5 Is Nothing Then Set md5 = CreateObject("System.Security.Cryptography.MD5CryptoServiceProvider")

        'convert to a hash string (hex format)
        TextToHash = UTF.Getbytes_4(fldStr)
        bytes = md5.ComputeHash_2((TextToHash))
        hashFields = ConvToHexString(bytes) 'helper to convert the byte array to hex - see below

    End If

End Function
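
The ConvToHexString helper isn't shown above; a minimal version (my own guess, assuming it just turns the MD5 byte array into lowercase hex) would be:

Code:
Public Function ConvToHexString(bytes() As Byte) As String
    Dim i As Long
    Dim s As String
    For i = LBound(bytes) To UBound(bytes)
        s = s & Right$("0" & Hex$(bytes(i)), 2)   ' pad each byte to two hex digits
    Next i
    ConvToHexString = LCase$(s)                   ' lowercase, as in the samples below
End Function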

examples of output
?hashFields("aaa", "bbb", 0, #12/1/2019#, True)
d95565b39def6238d35cdcd23117bb93
?hashFields("aaa", "ccc", 0, #12/1/2019#, True)
a9fa7e639bed4c1bf9ed4b4821f20d19

even a change of case will be picked up
?hashFields("aaa", "CCC", 0, #12/1/2019#, True)
b05fd0197e030d191c2b6f9a46a00199

can use in a query like this
Code:
SELECT *
FROM tblSource INNER JOIN tblDestination ON tblSource.UniqueID = tblDestination.UniqueID
WHERE hashFields(tblSource.fld1, tblSource.fld2, ...) <> tblDestination.hashresult
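
Once the changed rows have been copied across, the stored hash needs refreshing so the next comparison starts clean; something along these lines (fld1/fld2 stand in for the real column names, and hashresult is assumed to be a text field in tblDestination):

Code:
UPDATE tblDestination INNER JOIN tblSource
    ON tblDestination.UniqueID = tblSource.UniqueID
SET tblDestination.fld1 = tblSource.fld1,
    tblDestination.fld2 = tblSource.fld2,
    tblDestination.hashresult = hashFields(tblSource.fld1, tblSource.fld2)
WHERE hashFields(tblSource.fld1, tblSource.fld2) <> tblDestination.hashresult;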
 
NEVER do this in a loop of If tests if there is another way. [...]
Thank you, I will test it in the morning and let you know.
 
To find which records have at least one changed field, hash all the field values together (you can exclude the unique ID), then compare that hash with a hash you store in the destination table. [...]
Thank you, I will test this also. I was reading about hashing this morning and saw some hashing examples in SQL, but was not sure whether it was possible in Access.
 
not sure if hashing will work with long text/memo fields - if you are using them, you'll have to try and see.
 
Good point, CJ. We have seen articles in which LONG TEXT variables get truncated at 255 bytes. So if there is a LONG TEXT field involved, it could happen that only the first 255 bytes are hashed. Thus, changes made by appending text to a long text field would be missed.

But I also think I recall that using TransferSpreadsheet won't import LONG TEXT cases anyway, so that might be a different question entirely.
 
NEVER do this in a loop of If tests if there is another way. [...]
So here is what I have so far
Code:
db.Execute "UPDATE all_matters INNER JOIN all_matters1 ON all_matters.RowID = all_matters1.RowID" & _
  " SET all_matters.ClientName = all_matters1.ClientName," & _
   " all_matters.Matter = all_matters1.Matter, " & _
   " all_matters.MatterDescription = all_matters1.MatterDescription, " & _
   " all_matters.ClosedDate = all_matters1.ClosedDate, " & _
   " all_matters.MatterType = all_matters1.MatterType, " & _
   " all_matters.ClosedFile1 = all_matters1.ClosedFile1, " & _
   " all_matters.ClosedFile2 = all_matters1.ClosedFile2, " & _
   " all_matters.Years = all_matters1.Years, " & _
   " all_matters.Notes = all_matters1.Notes, " & _
   " all_matters.DestructionDate = all_matters1.DestructionDate, " & _
   " all_matters.ProposedDestructionD = all_matters1.ProposedDestructionD" & _
  " WHERE all_matters.ClientName <> all_matters1.ClientName " & _
   " OR all_matters.Matter <>all_matters1.Matter " & _
   " OR all_matters.MatterDescription <>all_matters1.MatterDescription " & _
   " OR all_matters.MatterType <> all_matters1.MatterType " & _
   " OR all_matters.ClosedFile1 <> all_matters1.ClosedFile1 OR all_matters.ClosedFile1 Is Null Or all_matters.ClosedFile1 = """" " & _
   " OR all_matters.ClosedFile2 <> all_matters1.ClosedFile2 OR all_matters.ClosedFile2 Is Null Or all_matters.ClosedFile2 = """" " & _
   " OR all_matters.Years <> all_matters1.Years OR all_matters.Years Is Null Or all_matters.Years = """" " & _
   " OR all_matters.Notes <> all_matters1.Notes OR all_matters.Notes Is Null Or all_matters.Notes = """" " & _
   " OR all_matters.DestructionDate <> all_matters1.DestructionDate OR all_matters.DestructionDate Is Null" & _
   " OR all_matters.ProposedDestructionD <> all_matters1.ProposedDestructionD OR all_matters.ProposedDestructionD Is Null;"

I didn't add a blank on the dates because it was giving a mismatch error.

If the all_matters1 field is null and the all_matters field is not null, it does not update. I'm not sure of the best way to handle the update so that it will remove the data from the all_matters field and make it null. How can I fix this?

Never mind; after a night of sleep I realized I could add a second OR condition testing the all_matters1 field for Null. This seems to work.
 
As part of the comparisons, every field that CAN be null should be bracketed by an NZ(field,default-value) function. If the field is text, you can use a quoted string for the default value. If the field is a number, you can use a number as the default value. The problem is that NULL is a problem-child in that it is never equal to anything - including another NULL. So you use NZ(,) to trap and replace nulls.
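Applied to the WHERE clause in your code above, that might look something like this (a sketch only; pick default values that can never occur as real data):

Code:
... WHERE Nz(all_matters.ClientName, '') <> Nz(all_matters1.ClientName, '')
    OR Nz(all_matters.ClosedFile1, '') <> Nz(all_matters1.ClosedFile1, '')
    OR Nz(all_matters.Years, '') <> Nz(all_matters1.Years, '')
    OR Nz(all_matters.Notes, '') <> Nz(all_matters1.Notes, '')
    OR Nz(all_matters.DestructionDate, #1/1/1900#) <> Nz(all_matters1.DestructionDate, #1/1/1900#)
    OR Nz(all_matters.ProposedDestructionD, #1/1/1900#) <> Nz(all_matters1.ProposedDestructionD, #1/1/1900#);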
 
As part of the comparisons, every field that CAN be null should be bracketed by an NZ(field, default-value) function. [...]
Thank you
 
