Admittedly, I am a bit of a hack at this. My programming is based in 13 years of VB and C++ programming with a smattering a SQL thrown in over the years. Now suddenly I find myself doing some serious data mining. I sometimes revert to code to plow through the data using brute force, which is what I did here. It works, but it is slow. The pace is so fast at this job it is difficult sometimes to be able to take the time stand back learn something new.
Update: Well, I finally got around to running your very eloquent query on some real data. It was 6 months worth of labs for one clinic (44,000 records). After 30 minutes I gave up and had to kill it. I'm not sure how long it would take if I let it finish. This is in no way a criticism of your query, but I was surprised it was so slow. My old way of doing it with multiple queries and loops took about 3 minutes. I thought the Jet engine would have sped things up.
The way I did it was to use a query to get the last dates for all labs for all patients. I then use those dates to get the lab values and this is put out to a table. I then pull out all of the LDLs and run the code below to get duplicates and take the lowest value.
Code:
Private Sub RemoveDoubleDates()
Dim sSQL As String
Dim RecLDLs As Recordset
Dim iHighVal As Long
Dim sItemName As String
Dim sChartnum As String
Dim dLabDate As Date
'Find double labs
sSQL = "SELECT tblQrtlyLDLAll.ChartNumber, tblQrtlyLDLAll.LastLab, Count(tblQrtlyLDLAll.ChartNumber) AS CountOfChartNumber"
sSQL = sSQL & " FROM tblQrtlyLDLAll"
sSQL = sSQL & " GROUP BY tblQrtlyLDLAll.ChartNumber, tblQrtlyLDLAll.LastLab"
sSQL = sSQL & " HAVING (((Count(tblQrtlyLDLAll.ChartNumber)) > 1))"
With CurrentDb.OpenRecordset(sSQL, dbOpenDynaset)
If Not .EOF And Not .BOF Then
Do While Not .EOF
sSQL = "SELECT tblLab.ChartNumber, tblLab.ItemName, tblLab.LabDate, tblLab.LabValue"
sSQL = sSQL & " FROM tblLab"
sSQL = sSQL & " WHERE (((tblLab.ChartNumber)='" & .Fields("ChartNumber") & "') AND ((tblLab.ItemName)='LDL' Or (tblLab.ItemName)='LDL(Direct)') AND ((tblLab.LabDate)=#" & .Fields("LastLab") & "#))"
Set RecLDLs = CurrentDb.OpenRecordset(sSQL, dbOpenDynaset)
iHighVal = -10
If Not RecLDLs.EOF And Not RecLDLs.BOF Then
Do While Not RecLDLs.EOF
If RecLDLs.Fields("LabValue") >= iHighVal Then
iHighVal = RecLDLs.Fields("LabValue")
sItemName = RecLDLs.Fields("ItemName")
sChartnum = RecLDLs.Fields("ChartNumber")
dLabDate = RecLDLs.Fields("LabDate")
End If
RecLDLs.MoveNext
Loop
End If
RecLDLs.Close
If iHighVal <> -10 Then
sSQL = "DELETE tblQrtlyLDLAllWithVlaue.*, tblQrtlyLDLAllWithVlaue.LastLab, tblQrtlyLDLAllWithVlaue.ChartNumber, tblQrtlyLDLAllWithVlaue.ItemName, tblQrtlyLDLAllWithVlaue.LabValue"
sSQL = sSQL & " FROM tblQrtlyLDLAllWithVlaue"
sSQL = sSQL & " WHERE (((tblQrtlyLDLAllWithVlaue.LastLab)=#" & dLabDate & "#) AND ((tblQrtlyLDLAllWithVlaue.ChartNumber)='" & sChartnum & "') AND ((tblQrtlyLDLAllWithVlaue.ItemName)='" & sItemName & "') AND ((tblQrtlyLDLAllWithVlaue.LabValue)=" & iHighVal & "))"
DoCmd.RunSQL sSQL
End If
.MoveNext
Loop
End If
.Close
End With
'make table of chart numbers with two lab dates
sSQL = "SELECT tblQrtlyLDLAllWithVlaue.ChartNumber INTO tblQrtlyLDLDoubles"
sSQL = sSQL & " FROM tblQrtlyLDLAllWithVlaue"
sSQL = sSQL & " GROUP BY tblQrtlyLDLAllWithVlaue.ChartNumber"
sSQL = sSQL & " HAVING (((Count(tblQrtlyLDLAllWithVlaue.ChartNumber))>1))"
DoCmd.RunSQL sSQL
'make a table with the lowest lab dates
sSQL = "SELECT tblQrtlyLDLAllWithVlaue.ChartNumber, Min(tblQrtlyLDLAllWithVlaue.LastLab) AS LabDate INTO tblQrtlyLDLMinDate"
sSQL = sSQL & " FROM tblQrtlyLDLAllWithVlaue INNER JOIN tblQrtlyLDLDoubles ON tblQrtlyLDLAllWithVlaue.ChartNumber = tblQrtlyLDLDoubles.ChartNumber"
sSQL = sSQL & " GROUP BY tblQrtlyLDLAllWithVlaue.ChartNumber;"
DoCmd.RunSQL sSQL
'go through the table and reset the chart number on the earlier dates
With CurrentDb.OpenRecordset("tblQrtlyLDLMinDate", dbOpenDynaset)
If Not .EOF And Not .BOF Then
Do While Not .EOF
sSQL = "SELECT tblQrtlyLDLAllWithVlaue.LastLab, tblQrtlyLDLAllWithVlaue.ChartNumber, tblQrtlyLDLAllWithVlaue.ItemName, tblQrtlyLDLAllWithVlaue.LabValue"
sSQL = sSQL & " FROM tblQrtlyLDLAllWithVlaue"
sSQL = sSQL & " WHERE (((tblQrtlyLDLAllWithVlaue.LastLab)=#" & .Fields("LabDate") & "#) AND ((tblQrtlyLDLAllWithVlaue.ChartNumber)='" & .Fields("ChartNumber") & "'));"
Set RecLDLs = CurrentDb.OpenRecordset(sSQL, dbOpenDynaset)
If Not RecLDLs.EOF And Not RecLDLs.BOF Then
RecLDLs.Edit
RecLDLs!ChartNumber = "xzxzxzxz"
RecLDLs.Update
End If
RecLDLs.Close
.MoveNext
Loop
End If
.Close
End With
'delete those chartnumbers we reset
sSQL = "DELETE tblQrtlyLDLAllWithVlaue.*, tblQrtlyLDLAllWithVlaue.ChartNumber"
sSQL = sSQL & " FROM tblQrtlyLDLAllWithVlaue"
sSQL = sSQL & " WHERE (((tblQrtlyLDLAllWithVlaue.ChartNumber)='xzxzxzxz'))"
DoCmd.RunSQL sSQL
End Sub
In a data operation there's two sides to every problem really.
That you went with recordsets first isn't always a bad idea. (Naturally the out and out programmer in you encouraged you to do so - that's understandable).
But your incling to attempt to find a query method is also correct and commendable.
It's part of being a database developer where you always walk that balancing line between query Vs code. The decision needs to be made quite often.
In this case - Brent gave you a correct query.
He very well knew himself that it wasn't going to fly (corrolated subqueries very rarely do once the number of rows increase).
One thing to always consider - is hybrid solutions.
You yourself have some query calls in your code based solution. That's natural - and is effectively employing both methods.
What you can also do - is employ some code based decision making, before then running your query.
The classic example of this is a Temporary table.
(Temporary tables aren't always necessary "temporary" - the phrase "Intermediate table" would be more appropriate really).
Loding data into a temp table can allow you to sidestep the rules of normalisation while you're performing some heavy lifting with the database engine - having set up the objects to do so in code first.
(I realise it could be said that your main table is temporary in a sense - being populated by an external View - this in itself can be to our advantage).
In this case - I'd say you'll benefit from removing the dependancy on that calculated column.
Determining
IIf(Left(tblLabTest.ItemName,3)='LDL','LDL',tblLabTest.ItemName)
for every row in both instances of the table is inevitably costing you performance during the Join operations.
While adding Indexes to your table might help (though perhaps not substantially) - you'd lose any such index use for the ItemName field due to the calculations being performed upon it.
So I say - either create a temporary table, where you append the data from tblLabTest to operate upon and include another column to *store* the calculated results of the immediate if statement - or simply include that "temporary" column in your tblLabTest table - and perform a single update to that column prior to running your query.
If possible - the Update will be a quicker solution (than deleting and re-appending rows to your temp table).
A quick test here shows 0.69 seconds Vs 0.95 seconds.
You're then going to receive real benefit from that move. The query engine can work with stored data and not calculating on the fly. (You could even index this "calculated" column for more performance).
Ultimately though - you'll still be performing a corrolated subquery on 44,000 rows.
It's gonna sting.
So in conjunction with your Temp table - also consider a rather more complex looking query - but it's not when you think about it.
It's just nested due to necessity of your requirements
It's currently based on you having a tblTemp as described earlier - but could work otherwise if needed.
Code:
SELECT tblLabTest.LabTestID, tblLabTest.ChartNumber, tblLabTest.LabDate, tblLabTest.LabValue, tblLabTest.ItemName
FROM tblLabTest INNER JOIN
(
SELECT Min(T3.LabTestID) AS MinLabTestID
FROM tblTemp AS T3
INNER JOIN
(
SELECT T2.ChartNumber, T2.LabDate, T2.CalcItemName, Min(T2.LabValue) AS MinLabValue
FROM tblTemp AS T2
INNER JOIN
(
SELECT T.ChartNumber, T.CalcItemName, Max(T.LabDate) AS MaxLabDate
FROM tblTemp AS T
GROUP BY T.ChartNumber, T.CalcItemName
) QD
ON T2.LabDate = QD.MaxLabDate AND T2.CalcItemName = QD.CalcItemName AND T2.ChartNumber = QD.ChartNumber
GROUP BY T2.ChartNumber, T2.LabDate, T2.CalcItemName
) QV
ON T3.CalcItemName = QV.CalcItemName AND T3.LabValue = QV.MinLabValue AND T3.LabDate = QV.LabDate AND T3.ChartNumber = QV.ChartNumber
GROUP BY T3.ChartNumber, T3.LabDate, T3.LabValue, T3.CalcItemName
) AS QID
ON QID.MinLabTestID = tblLabTest.LabTestID
I'm sure you'll realise that you could create your temp table the first time by just ammending the Append statement slighty... into another Maketable (as you used in your original table creating code).
SELECT LabTestID, ChartNumber, LabDate, LabValue, IIf(Left([ItemName],3) ='LDL','LDL',[ItemName]) As CalcItemName INTO tblTemp
FROM tblLabTest
I'm sure you'll realise that you could create your temp table the first time by just ammending the Append statement slighty... into another Maketable (as you used in your original table creating code).
SELECT LabTestID, ChartNumber, LabDate, LabValue, IIf(Left([ItemName],3) ='LDL','LDL',[ItemName]) As CalcItemName INTO tblTemp
FROM tblLabTest
Yep, that's exactly what I did. The TransferText (Access 03) method does not allow you to add a auto-number as a PK during the transfer. Even if you set up a schema for the transfer with the auto-number. The "Let Access Add a Primary Key" option can not be stored with the schema. In my work atmosphere you can't expect people to import their own data and generate reports. I almost always design interfaces that allow them to select date ranges, and other options, and then click a button to get the report.
The CSV file I import is much, much larger than 44,000 records. In code I create a new table (CreateTableDef) with an auto-number PK. I then add the records I want to query to that. That takes only a few seconds.
I modified the original query written by datAdrenaline to turn it in to a MT query. Ultimately, the results get spit out to a report with multiple calculated fields. Designing reports off tables is much, much faster than designing them off a query.
In my test with only a few hundred records it works fine. In the real world though, it does not perform well at all. I can see the problems with the query and why it is so slow. What really amazed me was just how slow it was. I let it run for 34 minutes and then gave up. I never timed my other method, but it was less than 3 minutes. That's a huge difference.
One thing to always consider - is hybrid solutions.
You yourself have some query calls in your code based solution. That's natural - and is effectively employing both methods.
What you can also do - is employ some code based decision making, before then running your query.
The classic example of this is a Temporary table.
(Temporary tables aren't always necessary "temporary" - the phrase "Intermediate table" would be more appropriate really).
Loding data into a temp table can allow you to sidestep the rules of normalisation while you're performing some heavy lifting with the database engine - having set up the objects to do so in code first.
(I realise it could be said that your main table is temporary in a sense - being populated by an external View - this in itself can be to our advantage).
In this case - I'd say you'll benefit from removing the dependancy on that calculated column.
Determining
IIf(Left(tblLabTest.ItemName,3)='LDL','LDL',tblLabTest.ItemName)
for every row in both instances of the table is inevitably costing you performance during the Join operations.
While adding Indexes to your table might help (though perhaps not substantially) - you'd lose any such index use for the ItemName field due to the calculations being performed upon it.
So I say - either create a temporary table, where you append the data from tblLabTest to operate upon and include another column to *store* the calculated results of the immediate if statement - or simply include that "temporary" column in your tblLabTest table - and perform a single update to that column prior to running your query.
If possible - the Update will be a quicker solution (than deleting and re-appending rows to your temp table).
A quick test here shows 0.69 seconds Vs 0.95 seconds.
You're then going to receive real benefit from that move. The query engine can work with stored data and not calculating on the fly. (You could even index this "calculated" column for more performance).
So in conjunction with your Temp table - also consider a rather more complex looking query - but it's not when you think about it.
It's just nested due to necessity of your requirements
It's currently based on you having a tblTemp as described earlier - but could work otherwise if needed.
Code:
SELECT tblLabTest.LabTestID, tblLabTest.ChartNumber, tblLabTest.LabDate, tblLabTest.LabValue, tblLabTest.ItemName
FROM tblLabTest INNER JOIN
(
SELECT Min(T3.LabTestID) AS MinLabTestID
FROM tblTemp AS T3
INNER JOIN
(
SELECT T2.ChartNumber, T2.LabDate, T2.CalcItemName, Min(T2.LabValue) AS MinLabValue
FROM tblTemp AS T2
INNER JOIN
(
SELECT T.ChartNumber, T.CalcItemName, Max(T.LabDate) AS MaxLabDate
FROM tblTemp AS T
GROUP BY T.ChartNumber, T.CalcItemName
) QD
ON T2.LabDate = QD.MaxLabDate AND T2.CalcItemName = QD.CalcItemName AND T2.ChartNumber = QD.ChartNumber
GROUP BY T2.ChartNumber, T2.LabDate, T2.CalcItemName
) QV
ON T3.CalcItemName = QV.CalcItemName AND T3.LabValue = QV.MinLabValue AND T3.LabDate = QV.LabDate AND T3.ChartNumber = QV.ChartNumber
GROUP BY T3.ChartNumber, T3.LabDate, T3.LabValue, T3.CalcItemName
) AS QID
ON QID.MinLabTestID = tblLabTest.LabTestID
... Since this is a "Challenge" per se ... I have came up with some code. It processes 80,000 records in about 10 seconds, and of that 10s, <5s of it was opening the recordset and managing the tables and >5s to actually process the data. I left the timer (that times the whole process) in the code. I did not alter your original structure at all (so No ID field, nor a permanant computed column. The premise is the I open a recordset with tblLabTest as the source, loop that recordset, and add the first record of each group (based on Chart and Item) to a second table. The SQL statement uses an expression to get the computed ItemName (vItemName) {side note the "v" stands for "virtual" when I am speaking it in my mind}. The SQL also puts in the appropriate sorts to force the latest lab to the top of each group. I figure since I'm looping a recordset, there was no need to add a column to the main table then execute an UPDATE statement to update that column. As stated, I do create a table for the results of the process (tblLabTest_Lastest).... This type of work seems to be in line with MS's claim that "Information Workers" are the core Access users!!.
Here is the code ... (if you want a sample db I can post ... but I just used your code to create a table with 80000 records, then used the following to process that data)
Code:
Public Sub ProcessData()
Dim strSQL As String
Dim cnn As ADODB.Connection
Dim strChart As String, strItemName As String
Dim rstLabTest As ADODB.Recordset
Dim rstLabTest_Latest As ADODB.Recordset
Dim sglTimer As Single
'Start clock
sglTimer = Timer
'Point to the CurrentDB, and set up the rst objects
Set cnn = CurrentProject.Connection
Set rstLabTest = New ADODB.Recordset
Set rstLabTest_Latest = New ADODB.Recordset
'Make a table (or delete the contents) for the results
If DLookup("[Id]", "MSysObjects", "[Name]='tblLabTest_Latest'") Then
strSQL = "DELETE FROM tblLabTest_Latest"
Else
strSQL = "SELECT ChartNumber, LabDate, LabValue, ItemName" & _
" INTO tblLabTest_Latest" & _
" FROM tblLabTest" & _
" WHERE 1=0"
End If
cnn.Execute strSQL, , adCmdText
RefreshDatabaseWindow
'Open the LATEST lab table
strSQL = "SELECT * FROM tblLabTest_Latest WHERE 1=0"
rstLabTest_Latest.Open strSQL, cnn, adOpenKeyset, adLockPessimistic, adCmdText
'Open the lab tests and loop the rst.
strSQL = "SELECT *" & _
" FROM (SELECT *, IIF(Left(ItemName,3) = 'LDL','LDL', ItemName) As vItemName" & _
" FROM tblLabTest) As vTbl" & _
" ORDER BY ChartNumber," & _
" vItemName," & _
" LabDate DESC, LabValue"
With rstLabTest
'Set up the recordset for speedy looping
.CursorLocation = adUseClient
.CursorType = adOpenForwardOnly
.LockType = adLockReadOnly
.Open strSQL, cnn
'Loop the tests
Do Until .EOF
'Record the lastest test if appropriate
If strChart <> !ChartNumber Or _
strItemName <> !vItemName Then
'Add the record to the results
rstLabTest_Latest.AddNew Array(0, 1, 2, 3) _
, Array(!ChartNumber, !LabDate, !LabValue, !ItemName)
'Remember chart and lab we are on
strChart = !ChartNumber
strItemName = !vItemName
End If
'Go to the next lab or chart record
.MoveNext
Loop
End With
'Clean up
rstLabTest.Close
rstLabTest_Latest.Close
'Stop clock
Debug.Print Timer - sglTimer
End Sub
OK then - let's go with direct solutions then (no intermediate stages)
(Losing really useful efficiency here though - the temp table / calc column is a real time saver!)
Code:
SELECT tblLabTest.LabTestID, tblLabTest.ChartNumber, tblLabTest.LabDate, tblLabTest.LabValue, tblLabTest.ItemName
FROM tblLabTest INNER JOIN (SELECT Min(T3.LabTestID) AS MinLabTestID
FROM tblLabTest AS T3
INNER JOIN
(
SELECT T2.ChartNumber, T2.LabDate, IIf(Left(T2.[ItemName],3)="LDL","LDL",T2.[ItemName]) As CalcItemName2, Min(T2.LabValue) AS MinLabValue
FROM tblLabTest AS T2
INNER JOIN
(
SELECT T.ChartNumber, IIf(Left(T.[ItemName],3)="LDL","LDL",T.[ItemName]) As CalcItemName1, Max(T.LabDate) AS MaxLabDate
FROM tblLabTest AS T
GROUP BY T.ChartNumber, IIf(Left(T.[ItemName],3)="LDL","LDL",T.[ItemName])
) QD
ON T2.LabDate = QD.MaxLabDate AND IIf(Left(T2.[ItemName],3)="LDL","LDL",T2.[ItemName]) = QD.CalcItemName1 AND T2.ChartNumber = QD.ChartNumber
GROUP BY T2.ChartNumber, T2.LabDate, IIf(Left(T2.[ItemName],3)="LDL","LDL",T2.[ItemName])
) QV
ON IIf(Left(T3.[ItemName],3)="LDL","LDL",T3.[ItemName]) = QV.CalcItemName2 AND T3.LabValue = QV.MinLabValue AND T3.LabDate = QV.LabDate AND T3.ChartNumber = QV.ChartNumber
GROUP BY T3.ChartNumber, T3.LabDate, T3.LabValue, IIf(Left(T3.[ItemName],3)="LDL","LDL",T3.[ItemName])
) AS QID ON QID.MinLabTestID = tblLabTest.LabTestID;
Just for fun though IMO. Please consider the temp table / calc column.
With the calculated column and tblTemp, I got Leighs processing the data in about 5s ... (consistently slightly faster than my code based solution ... but not by much)
Without the calc column I got Leighs solution at a little over 8s
{note this is for 80,000 records being processed}
.... Looks like Leigh gets the prize at this point!!! .... Excellent Job my friend!!! ...
EDITS ADDED ...
I changed my code to use DAO recordsets and the processing time was just over 4s ...