I have a wide variable length text file. The record width is between 150 and 450 characters. To add to the complication the file as multiple record types in each file, in no particular order.
What I would like to do is write a piece of VBA code that will suck all the data into an existing table so the new table can be further massaged into the component parts.
The table the data is going into is extremely simple:
Field1 - Text - 255 char
Field2 - Text - 255 char
There has to be a simple way to do something like this. I have been playing around with Macro and the import function, my only problem is the imput file name changes often. Everything else stays the same.
There is one line of non-delimited text. There is no way to set up fields to be imported since the structure of any given line could be one of 6 different things.
I am not the one who created the export/import text file. It is coming from the government and this is what I am given.
So how do you propose splitting the long (or wide) text into 2 separate fields?
Why not just have one memo (long text) field instead and dump the whole lot in that?
I ran into something like this once, where the text file could have each line be a different record type, but they were identifiable via the first three letters in the line, and each record type had a specific fixed-width format (up to 1600 characters in length).
I set up a definitions table, with a field for the identifier, an ID field, and a field indicating which table the record gets placed in, and linked it to a formatting table that was little more than that ID as a foreign key and then fields for field name, field starting location, and field length.
The app would read the file line-by-line, identify the format for each line, parse it out based on the corresponding format, and drop the completed record into the table corresponding to the record type.
I also spent entirely too long trying to convince anyone who would listen that that file should be updated to a more modern format.
Anyway, assuming you have a way to identify a format for each line, you might be able to use the same approach.
We won't even go into what I had to do to import a file that used multi-row records formatted for a green-bar printer. That one was a mess.
I have often used a similar line by line technique to that suggested by Frothingslosh for reading text reports with variably line wrapped data interspersed by page headers and various lines of other unwanted junk.
There is virtually no limit to what can be done this way. You just have to define the logic.
Nothing really to be gained by importing into a table for processing.
Of course the same idea applies to any text file whether it be CSV, XML, JSON etc. Any such file can be split into component fields providing the structure is known so rules can be applied to parse it.
However in post 3 the OP suggested there was no clear structure.
That is the reason I asked how he proposed it should be split
If there is truly no structure then the data is incomprehensible. Otherwise the user must be applying some kind of logic to interpret it. That logic should be able to be expressed as code, though it may be very complex.
I can't see how dropping it in a table is really going to help.
Might be able to open it as a text file for input, then do an INPUT LINE on it and stuff that into a LONG TEXT record. But if there is no infrastructure in the record, I don't know what you would do with it afterwards.
Yeah, if there's truly no structure to the record, then at best it needs human intervention for each line, and at worst it's meaningless gibberish. I took his comments to mean simply that there's no one logic applicable to every line.
There is one line of non-delimited text. There is no way to set up fields to be imported since the structure of any given line could be one of 6 different things.
There is one line of non-delimited text. There is no way to set up fields to be imported since the structure of any given line could be one of 6 different things.
"One of six different things" sounds pretty clear to me. Perfectly suited to line by line parsing which is where Frothy and I were taking the conversation until post 7 implied that we were on the wrong track with:
Any such file can be split into component fields providing the structure is known so rules can be applied to parse it.
However in post 3 the OP suggested there was no clear structure
Without SOME kind of sample, we can bump around in the dark all night. You can read long text from a file by opening it in text mode. Without more data, however, such as info on these putative six formats, it will be hard to get more specific.
Here is an example that reads a kss file. This file is created by most CAD programs and is imported into an application I built to manage a drawing log that tracks drawings as they are sent from one team to another.
For this particular file format, the record type sequence has meaning since one of the record types may occur multiple times for a particular drawing and the parent keys are not repeated in the child rows so the program has to keep track of when there is a break in the group of records related to a particular drawing.
Just FYI our drawings are for the steel infrastructure of a building or bridge.
Code:
Public Sub ImportKSS(frm As Form)
Dim db As DAO.Database
Dim qd As DAO.QueryDef
Dim tdD As DAO.TableDef 'Drawing
Dim rsD As DAO.Recordset
Dim tdS As DAO.TableDef 'Sequences
Dim rsS As DAO.Recordset
Dim tdR As DAO.TableDef 'Revisions
Dim rsR As DAO.Recordset
Dim Job As String
Dim JobID As Variant
Dim HoldDrawingID As Variant
Dim strFullFileName As String
Dim strFileName As String
Dim iPos As Integer
Dim iLen As Integer
Dim strLine As String
Dim strKrec As Variant
Dim strHrec As Variant
Dim strDrec As Variant
Dim strSrec As Variant
Dim strSQL As String
Dim strWhere As String
Dim strMsg As String
Dim vDesc As Variant
Dim vDrawingPfx As Variant
Dim vDrawingNum As Variant
Dim vDrawingSfx As Variant
Dim vDrawingID As Variant
Dim strDrawingNum As String
Dim iStart As Integer
Dim iEnd As Integer
Dim FileNum As Long
Dim CountDrawings As Long
Dim CountRev As Long
Dim PrevFullDwgNum As String
Dim CurFullDwgNum As String
Dim aNamePart() As String
On Error GoTo ImportKISS_Error
'strFullFileName = "C:\Data\DrawingProject\ExtractSamples\BS-2525_MIS_list.kss"
strFullFileName = frm.txtPath
JobID = frm.cboJob
'Delete old records
DoCmd.RunMacro "mWarningsOff"
DoCmd.RunSQL "Delete * From TEMP_AccumulateSequences;"
DoCmd.RunMacro "mWarningsOn"
'Open recordsets
Set db = CurrentDb()
Set tdD = db.TableDefs!tblDrawings
Set rsD = tdD.OpenRecordset(dbOpenDynaset, dbSeeChanges)
Set tdS = db.TableDefs!TEMP_AccumulateSequences
Set rsS = tdS.OpenRecordset(dbOpenDynaset, dbSeeChanges)
Set tdR = db.TableDefs!tblRevisions
Set rsR = tdR.OpenRecordset(dbOpenDynaset, dbSeeChanges)
'process file
CountDrawings = 0
CountRev = 0
FileNum = FreeFile
Close FileNum
Open strFullFileName For Input As FileNum
'set initial values
CurFullDwgNum = ""
PrevFullDwgNum = ""
vDrawingID = ""
HoldDrawingID = ""
DoEvents
Do While Not EOF(FileNum)
Line Input #FileNum, strLine
Debug.Print strLine
Select Case Left(strLine, 1)
Case "K" 'kiss file definition
strKrec = Split(strLine, ",")
GoSub Krec
Case "H" 'header
strHrec = Split(strLine, ",")
GoSub Hrec
Case "D" 'Detail
strDrec = Split(strLine, ",")
GoSub Drec
frm.Repaint
Case "S" 'Sequence
strSrec = Split(strLine, ",")
GoSub Srec
Case Else
End Select
Loop
'append summarized sequence records
DoCmd.RunMacro "mWarningsOff"
DoCmd.OpenQuery "qAppendSequences"
DoCmd.RunMacro "mWarningsOn"
'update suffixes
Set qd = db.QueryDefs!qUpdateSfx
qd.Parameters!EnterJobID = JobID
qd.Execute
ImportKISS_Exit:
rsD.Close
rsR.Close
rsS.Close
Close FileNum
strMsg = CountDrawings & " Drawings Imported" & vbCrLf & CountRev & " Revisions Imported"
'MsgBox strMsg, vbOKOnly
frm.txtDwgCount = CountDrawings
frm.txtRevCount = CountRev
Exit Sub
'====================================================================
Krec:
If strKrec(1) = "1.0" Or strKrec(1) = "1.1" Then
Else
MsgBox "Record format is not recognized. Call programmer to fix.", vbOKOnly
Exit Sub
End If
Krec_Exit:
Return
'----------------------------------------------
Hrec:
'modified form to have user select the job rather than relying on data in Hrec
Return
'----------------------------------------------
Drec:
'parse drawing number into pfx, num, sfx
vDrawingPfx = Null
vDrawingNum = Null
vDrawingSfx = Null
strDrawingNum = strDrec(1)
''''iStart = InStr(1, strDrawingNum, "[0-9]")
aNamePart = Split(strDrawingNum, "-")
If UBound(aNamePart) > 0 Then
vDrawingPfx = aNamePart(0)
vDrawingNum = aNamePart(1)
vDrawingSfx = aNamePart(2)
Else
vDrawingNum = strDrawingNum
End If
''''remove dlookup for speed
''''''check for duplicates (KSS file has duplicate drawing "D" records. Bypass all but first
''''' vDrawingID = DLookup("DrawingID", "qFullDrawingNum", "JobID = " & JobID & " AND FullDrawingNum = '" & strDrawingNum & "'")
''''' Replaced with a new field that concatenates all parts because SQL Server does not support
''''' a unique index with null columns.
CurFullDwgNum = (vDrawingPfx + "-") & vDrawingNum & ("-" + vDrawingSfx) ''vDrawingPfx & "-" & vDrawingNum & "-" & vDrawingSfx
frm.txtCurrentDwg = CurFullDwgNum
strWhere = "JobID = " & JobID & " AND (FullDrawingNum = '" & strDrawingNum & "' OR FullDwgName = '" & strDrawingNum & "')"
If CurFullDwgNum = PrevFullDwgNum Then ' duplicate dwg rec
If vDrawingID & "" = "" Then 'should never happen
vDrawingID = DLookup("DrawingID", "qFullDrawingNumLookup", strWhere)
End If
Else
vDrawingID = DLookup("DrawingID", "qFullDrawingNumLookup", strWhere)
End If
If vDrawingID & "" <> "" Then 'The drawing already exists so check the revision.
If strDrec(2) & "" = "" Then 'No rev on kss file so assume rev 0
strWhere = "DrawingID = " & vDrawingID & " AND RevNum = '0'"
Else
strWhere = "DrawingID = " & vDrawingID & " AND RevNum = '" & strDrec(2) & "'"
End If
If DCount("*", "tblRevisions", strWhere) > 0 Then
GoTo Drec_Exit
Else
HoldDrawingID = vDrawingID
CountRev = CountRev + 1
GoTo Drec_AddRev
End If
End If
rsD.AddNew
rsD!JobID = JobID
rsD!DrawingNum = vDrawingNum
rsD!DrawingPfx = vDrawingPfx
rsD!DrawingSfx = vDrawingSfx
rsD!FullDwgName = CurFullDwgNum '(vDrawingPfx + "-") & vDrawingNum & ("-" + vDrawingSfx)
rsD!DrawingTypeID = 9 'default type imported from KSS file
''''' removed to increase speed -- replaced with an update query run at the end of the import
'''''''' If vDrawingSfx & "" = "" Then
'''''''' Else
'''''''' vDesc = DLookup("SfxDesc", "lkpSfxDesc", "DrawingSfx = '" & vDrawingSfx & "'") 'get desc
'''''''' If IsNull(vDesc) Then 'insert new suffix
'''''''' vDesc = Null
'''''''''''' vDesc = "Added from KSS file"
'''''''''''' strsql = "Insert into lkpSfxDesc (DrawingSfx, SfxDesc, UpdateDT, UpdateBy) Values ('" & vDrawingSfx & "', '" & vDesc & "', #" & Now() & "#, 'KSS File')"
'''''''''''' DoCmd.RunMacro "mWarningsOff"
'''''''''''' DoCmd.RunSQL strsql
'''''''''''' DoCmd.RunMacro "mWarningsOn"
'''''''' End If
'''''''' rsD!Desc = vDesc
'''''''' End If
rsD!Quantity = strDrec(5)
rsD!AssemblyMark = strDrec(3)
rsD!PartMark = strDrec(4)
rsD!Desc = strDrec(11)
rsD!UpdateBy = Environ("UserName")
rsD!UpdateDT = Now()
'DrawingID = rsD!DrawingID 'only works for Jet/ACE
rsD.Update
HoldDrawingID = db.OpenRecordset("SELECT @@Identity")(0) 'must be outside .AddNew for SQL Server
vDrawingID = HoldDrawingID
PrevFullDwgNum = CurFullDwgNum
CountDrawings = CountDrawings + 1
Drec_AddRev:
''delete existing sequences.
Set qd = db.QueryDefs!qDelSequences
qd.Parameters!EnterDrawingID = HoldDrawingID
qd.Execute
''add new rev
rsR.AddNew
rsR!DrawingID = HoldDrawingID
If strDrec(2) & "" = "" Then
rsR!RevNum = "0"
Else
rsR!RevNum = strDrec(2)
End If
rsR!UpdateBy = Environ("UserName")
rsR!UpdateDT = Now()
rsR.Update
Drec_Exit:
Return
'----------------------------------------------
Srec:
If HoldDrawingID & "" = "" Then 'drawing wasn't added so don't add sequences
GoTo Srec_Exit
End If
rsS.AddNew
rsS!DrawingID = HoldDrawingID
rsS!PhaseSeqNum = strSrec(1)
rsS!QtyPieces = strSrec(2)
rsS!UpdateBy = Environ("UserName")
rsS!UpdateDT = Now()
rsS.Update
Srec_Exit:
Return
'----------------------------------------------
'''''''CreateJobRec:
''''''' strSQL = "Insert Into tblJob(Job, JobPrefix, UpdateDT,UpdateBy) Values(" & QUOTE & Job
''''''' strSQL = strSQL & QUOTE & ", " & QUOTE & Left(Job, 2) & QUOTE & ", Now(), "
''''''' strSQL = strSQL & QUOTE & Environ("UserName") & QUOTE & ")"
''''''' ''''strSQL = "Insert Into tblJob(Job,UpdateDT,UpdateBy) Values(" & QUOTE & Job & QUOTE & ", Now(), " & QUOTE & Environ("UserName") & QUOTE & ")"
''''''' DoCmd.RunSQL strSQL
''''''' JobID = DMax("JobID", "tblJob", "Job = '" & Job & "'")
''''''' If IsNull(JobID) Then
''''''' MsgBox "Job could not be created for Job -" & Job & "-"
''''''' Exit Sub
''''''' End If
'''''''CreateJobRec_Exit:
''''''' Return
'----------------------------------------------
ImportKISS_Error:
Select Case Err.Number
Case 3022
Resume Drec_Exit
Case 9 'subscript out of range
Resume Next
Case Else
MsgBox "Error " & Err.Number & " (" & Err.Description & ") in procedure ImportKISS of Module Module1"
Resume ImportKISS_Exit
Resume Next
End Select
End Sub
Public Function GetJob(txtString As String) As String
Dim iUnderscore As Integer
Dim iSpace As Integer
Dim iPeriod As Integer
Dim iLen As Integer
'function finds the first delimiter so Job name can be isolated.
'users sometimes use space, sometimes _, and sometimes nothing to delimit the "Job" part of the file name.
'When using a series of unrelated if statements, you don't find the "first" delimiter, you find the one you looked for first.
'So this function solves that problem by comparing the position of subsequent searches and choosing the lowest value.
On Error GoTo GetDelim_Error
iLen = 0
iUnderscore = Nz(InStr(txtString, "_"), 0) 'find _ if present
iSpace = InStr(txtString, " ") 'find space if _ not present
iPeriod = InStr(txtString, ".") 'find . if _ and space not present
If iUnderscore <= 0 Then 'underscore not present
iUnderscore = 0
End If
If iSpace <= 0 Then
iSpace = 0
End If
If iPeriod <= 0 Then
iPeriod = 0
End If
'find first delimiter
iLen = iPeriod
If iSpace > 0 And iSpace < iLen Then
iLen = iSpace
End If
If iUnderscore > 0 And iUnderscore < iLen Then
iLen = iUnderscore
End If
'check file type
If iPeriod > 0 Then
If Mid(txtString, iPeriod + 1, 3) = "kss" Then
Else
MsgBox "File type is not kss. Please choose a kss file.", vbOKOnly
GetJob = "Error"
Exit Function
End If
Else
MsgBox "File type is invalid. Please choose a kss file.", vbOKOnly
GetJob = "Error"
Exit Function
End If
'set return value
If iLen > 0 Then
GetJob = Left(txtString, iLen - 1) 'ilen is the position of the first delimiter
Else
GetJob = txtString
End If
GetDelim_Exit:
Exit Function
GetDelim_Error:
Select Case Err.Number
Case Else
MsgBox "Error " & Err.Number & " (" & Err.Description & ") in procedure GetDelim of Module modImportKSS"
End Select
End Function