Query Challange

Nicely done!

Admittedly, I am a bit of a hack at this. My programming is based in 13 years of VB and C++ programming with a smattering a SQL thrown in over the years. Now suddenly I find myself doing some serious data mining. I sometimes revert to code to plow through the data using brute force, which is what I did here. It works, but it is slow. The pace is so fast at this job it is difficult sometimes to be able to take the time stand back learn something new.

Thanks again!
 
>> Nicely done! <<

Thank you!! ... Please let me know if "things" come up with the sample ... keep that board handy just in case!! ... :D
 
Update: Well, I finally got around to running your very eloquent query on some real data. It was 6 months worth of labs for one clinic (44,000 records). After 30 minutes I gave up and had to kill it. I'm not sure how long it would take if I let it finish. This is in no way a criticism of your query, but I was surprised it was so slow. My old way of doing it with multiple queries and loops took about 3 minutes. I thought the Jet engine would have sped things up.

The way I did it was to use a query to get the last dates for all labs for all patients. I then use those dates to get the lab values and this is put out to a table. I then pull out all of the LDLs and run the code below to get duplicates and take the lowest value.
Code:
Private Sub RemoveDoubleDates()
    Dim sSQL As String
    Dim RecLDLs As Recordset
    Dim iHighVal As Long
    Dim sItemName As String
    Dim sChartnum As String
    Dim dLabDate As Date
    
    'Find double labs
    sSQL = "SELECT tblQrtlyLDLAll.ChartNumber, tblQrtlyLDLAll.LastLab, Count(tblQrtlyLDLAll.ChartNumber) AS CountOfChartNumber"
    sSQL = sSQL & " FROM tblQrtlyLDLAll"
    sSQL = sSQL & " GROUP BY tblQrtlyLDLAll.ChartNumber, tblQrtlyLDLAll.LastLab"
    sSQL = sSQL & " HAVING (((Count(tblQrtlyLDLAll.ChartNumber)) > 1))"
    With CurrentDb.OpenRecordset(sSQL, dbOpenDynaset)
        If Not .EOF And Not .BOF Then
            Do While Not .EOF
                sSQL = "SELECT tblLab.ChartNumber, tblLab.ItemName, tblLab.LabDate, tblLab.LabValue"
                sSQL = sSQL & " FROM tblLab"
                sSQL = sSQL & " WHERE (((tblLab.ChartNumber)='" & .Fields("ChartNumber") & "') AND ((tblLab.ItemName)='LDL' Or (tblLab.ItemName)='LDL(Direct)') AND ((tblLab.LabDate)=#" & .Fields("LastLab") & "#))"
                Set RecLDLs = CurrentDb.OpenRecordset(sSQL, dbOpenDynaset)
                iHighVal = -10
                If Not RecLDLs.EOF And Not RecLDLs.BOF Then
                    Do While Not RecLDLs.EOF
                        If RecLDLs.Fields("LabValue") >= iHighVal Then
                            iHighVal = RecLDLs.Fields("LabValue")
                            sItemName = RecLDLs.Fields("ItemName")
                            sChartnum = RecLDLs.Fields("ChartNumber")
                            dLabDate = RecLDLs.Fields("LabDate")
                        End If
                        RecLDLs.MoveNext
                    Loop
                End If
                RecLDLs.Close
                If iHighVal <> -10 Then
                    sSQL = "DELETE tblQrtlyLDLAllWithVlaue.*, tblQrtlyLDLAllWithVlaue.LastLab, tblQrtlyLDLAllWithVlaue.ChartNumber, tblQrtlyLDLAllWithVlaue.ItemName, tblQrtlyLDLAllWithVlaue.LabValue"
                    sSQL = sSQL & " FROM tblQrtlyLDLAllWithVlaue"
                    sSQL = sSQL & " WHERE (((tblQrtlyLDLAllWithVlaue.LastLab)=#" & dLabDate & "#) AND ((tblQrtlyLDLAllWithVlaue.ChartNumber)='" & sChartnum & "') AND ((tblQrtlyLDLAllWithVlaue.ItemName)='" & sItemName & "') AND ((tblQrtlyLDLAllWithVlaue.LabValue)=" & iHighVal & "))"
                    DoCmd.RunSQL sSQL
                End If
                .MoveNext
            Loop
        End If
        .Close
    End With
    
    'make table  of chart numbers with two lab dates
    sSQL = "SELECT tblQrtlyLDLAllWithVlaue.ChartNumber INTO tblQrtlyLDLDoubles"
    sSQL = sSQL & " FROM tblQrtlyLDLAllWithVlaue"
    sSQL = sSQL & " GROUP BY tblQrtlyLDLAllWithVlaue.ChartNumber"
    sSQL = sSQL & " HAVING (((Count(tblQrtlyLDLAllWithVlaue.ChartNumber))>1))"
    DoCmd.RunSQL sSQL
    
    'make a table with the lowest lab dates
    sSQL = "SELECT tblQrtlyLDLAllWithVlaue.ChartNumber, Min(tblQrtlyLDLAllWithVlaue.LastLab) AS LabDate INTO tblQrtlyLDLMinDate"
    sSQL = sSQL & " FROM tblQrtlyLDLAllWithVlaue INNER JOIN tblQrtlyLDLDoubles ON tblQrtlyLDLAllWithVlaue.ChartNumber = tblQrtlyLDLDoubles.ChartNumber"
    sSQL = sSQL & " GROUP BY tblQrtlyLDLAllWithVlaue.ChartNumber;"
    DoCmd.RunSQL sSQL
    
    'go through the table and reset the chart number on the earlier dates
    With CurrentDb.OpenRecordset("tblQrtlyLDLMinDate", dbOpenDynaset)
        If Not .EOF And Not .BOF Then
            Do While Not .EOF
    
                sSQL = "SELECT tblQrtlyLDLAllWithVlaue.LastLab, tblQrtlyLDLAllWithVlaue.ChartNumber, tblQrtlyLDLAllWithVlaue.ItemName, tblQrtlyLDLAllWithVlaue.LabValue"
                sSQL = sSQL & " FROM tblQrtlyLDLAllWithVlaue"
                sSQL = sSQL & " WHERE (((tblQrtlyLDLAllWithVlaue.LastLab)=#" & .Fields("LabDate") & "#) AND ((tblQrtlyLDLAllWithVlaue.ChartNumber)='" & .Fields("ChartNumber") & "'));"
                Set RecLDLs = CurrentDb.OpenRecordset(sSQL, dbOpenDynaset)
                If Not RecLDLs.EOF And Not RecLDLs.BOF Then
                    RecLDLs.Edit
                    RecLDLs!ChartNumber = "xzxzxzxz"
                    RecLDLs.Update
                End If
                RecLDLs.Close
                .MoveNext
            Loop
        End If
        .Close
    End With

    'delete those chartnumbers we reset
    sSQL = "DELETE tblQrtlyLDLAllWithVlaue.*, tblQrtlyLDLAllWithVlaue.ChartNumber"
    sSQL = sSQL & " FROM tblQrtlyLDLAllWithVlaue"
    sSQL = sSQL & " WHERE (((tblQrtlyLDLAllWithVlaue.ChartNumber)='xzxzxzxz'))"
    DoCmd.RunSQL sSQL
    
End Sub
 
In a data operation there's two sides to every problem really.
That you went with recordsets first isn't always a bad idea. (Naturally the out and out programmer in you encouraged you to do so - that's understandable).
But your incling to attempt to find a query method is also correct and commendable.
It's part of being a database developer where you always walk that balancing line between query Vs code. The decision needs to be made quite often.

In this case - Brent gave you a correct query.
He very well knew himself that it wasn't going to fly (corrolated subqueries very rarely do once the number of rows increase).

One thing to always consider - is hybrid solutions.
You yourself have some query calls in your code based solution. That's natural - and is effectively employing both methods.
What you can also do - is employ some code based decision making, before then running your query.
The classic example of this is a Temporary table.
(Temporary tables aren't always necessary "temporary" - the phrase "Intermediate table" would be more appropriate really).
Loding data into a temp table can allow you to sidestep the rules of normalisation while you're performing some heavy lifting with the database engine - having set up the objects to do so in code first.
(I realise it could be said that your main table is temporary in a sense - being populated by an external View - this in itself can be to our advantage).

In this case - I'd say you'll benefit from removing the dependancy on that calculated column.
Determining
IIf(Left(tblLabTest.ItemName,3)='LDL','LDL',tblLabTest.ItemName)
for every row in both instances of the table is inevitably costing you performance during the Join operations.
While adding Indexes to your table might help (though perhaps not substantially) - you'd lose any such index use for the ItemName field due to the calculations being performed upon it.
So I say - either create a temporary table, where you append the data from tblLabTest to operate upon and include another column to *store* the calculated results of the immediate if statement - or simply include that "temporary" column in your tblLabTest table - and perform a single update to that column prior to running your query.

If possible - the Update will be a quicker solution (than deleting and re-appending rows to your temp table).
A quick test here shows 0.69 seconds Vs 0.95 seconds.
You're then going to receive real benefit from that move. The query engine can work with stored data and not calculating on the fly. (You could even index this "calculated" column for more performance).

Simple Append Query example:
INSERT INTO tblTemp ( LabTestID, ChartNumber, LabDate, LabValue, CalcItemName )
SELECT LabTestID, ChartNumber, LabDate, LabValue, IIf(Left([ItemName],3)='LDL','LDL',[ItemName])
FROM tblLabTest

Ultimately though - you'll still be performing a corrolated subquery on 44,000 rows.
It's gonna sting.

So in conjunction with your Temp table - also consider a rather more complex looking query - but it's not when you think about it.
It's just nested due to necessity of your requirements
It's currently based on you having a tblTemp as described earlier - but could work otherwise if needed.

Code:
SELECT tblLabTest.LabTestID, tblLabTest.ChartNumber, tblLabTest.LabDate, tblLabTest.LabValue, tblLabTest.ItemName
FROM tblLabTest INNER JOIN 
  (
  SELECT  Min(T3.LabTestID) AS MinLabTestID
    FROM tblTemp AS T3 
      INNER JOIN
      (
        SELECT T2.ChartNumber, T2.LabDate, T2.CalcItemName, Min(T2.LabValue) AS MinLabValue
        FROM tblTemp AS T2 
          INNER JOIN 
            (
              SELECT T.ChartNumber, T.CalcItemName, Max(T.LabDate) AS MaxLabDate
              FROM tblTemp AS T
              GROUP BY T.ChartNumber, T.CalcItemName
            ) QD
          ON T2.LabDate = QD.MaxLabDate AND T2.CalcItemName = QD.CalcItemName AND T2.ChartNumber = QD.ChartNumber
        GROUP BY T2.ChartNumber, T2.LabDate, T2.CalcItemName
      ) QV
      ON T3.CalcItemName = QV.CalcItemName AND T3.LabValue = QV.MinLabValue AND T3.LabDate = QV.LabDate AND T3.ChartNumber = QV.ChartNumber
    GROUP BY T3.ChartNumber, T3.LabDate, T3.LabValue, T3.CalcItemName
  ) AS QID 
ON QID.MinLabTestID = tblLabTest.LabTestID

Cheers.
 
I'm sure you'll realise that you could create your temp table the first time by just ammending the Append statement slighty... into another Maketable (as you used in your original table creating code).

SELECT LabTestID, ChartNumber, LabDate, LabValue, IIf(Left([ItemName],3) ='LDL','LDL',[ItemName]) As CalcItemName INTO tblTemp
FROM tblLabTest
 
I'm sure you'll realise that you could create your temp table the first time by just ammending the Append statement slighty... into another Maketable (as you used in your original table creating code).

SELECT LabTestID, ChartNumber, LabDate, LabValue, IIf(Left([ItemName],3) ='LDL','LDL',[ItemName]) As CalcItemName INTO tblTemp
FROM tblLabTest

Yep, that's exactly what I did. The TransferText (Access 03) method does not allow you to add a auto-number as a PK during the transfer. Even if you set up a schema for the transfer with the auto-number. The "Let Access Add a Primary Key" option can not be stored with the schema. In my work atmosphere you can't expect people to import their own data and generate reports. I almost always design interfaces that allow them to select date ranges, and other options, and then click a button to get the report.

The CSV file I import is much, much larger than 44,000 records. In code I create a new table (CreateTableDef) with an auto-number PK. I then add the records I want to query to that. That takes only a few seconds.

I modified the original query written by datAdrenaline to turn it in to a MT query. Ultimately, the results get spit out to a report with multiple calculated fields. Designing reports off tables is much, much faster than designing them off a query.

In my test with only a few hundred records it works fine. In the real world though, it does not perform well at all. I can see the problems with the query and why it is so slow. What really amazed me was just how slow it was. I let it run for 34 minutes and then gave up. I never timed my other method, but it was less than 3 minutes. That's a huge difference.
 
One thing to always consider - is hybrid solutions.
You yourself have some query calls in your code based solution. That's natural - and is effectively employing both methods.
What you can also do - is employ some code based decision making, before then running your query.
The classic example of this is a Temporary table.
(Temporary tables aren't always necessary "temporary" - the phrase "Intermediate table" would be more appropriate really).
Loding data into a temp table can allow you to sidestep the rules of normalisation while you're performing some heavy lifting with the database engine - having set up the objects to do so in code first.
(I realise it could be said that your main table is temporary in a sense - being populated by an external View - this in itself can be to our advantage).

You're preaching to the choir here. You basically just described my job.

In this case - I'd say you'll benefit from removing the dependancy on that calculated column.
Determining
IIf(Left(tblLabTest.ItemName,3)='LDL','LDL',tblLabTest.ItemName)
for every row in both instances of the table is inevitably costing you performance during the Join operations.
While adding Indexes to your table might help (though perhaps not substantially) - you'd lose any such index use for the ItemName field due to the calculations being performed upon it.
So I say - either create a temporary table, where you append the data from tblLabTest to operate upon and include another column to *store* the calculated results of the immediate if statement - or simply include that "temporary" column in your tblLabTest table - and perform a single update to that column prior to running your query.

Totally. It is an eloquent solution but is very inefficient. As I said a few messages up, this will most likely take multiple queries.

If possible - the Update will be a quicker solution (than deleting and re-appending rows to your temp table).
A quick test here shows 0.69 seconds Vs 0.95 seconds.
You're then going to receive real benefit from that move. The query engine can work with stored data and not calculating on the fly. (You could even index this "calculated" column for more performance).

Simple Append Query example:
INSERT INTO tblTemp ( LabTestID, ChartNumber, LabDate, LabValue, CalcItemName )
SELECT LabTestID, ChartNumber, LabDate, LabValue, IIf(Left([ItemName],3)='LDL','LDL',[ItemName])
FROM tblLabTest

Ultimately though - you'll still be performing a corrolated subquery on 44,000 rows.
It's gonna sting.

Yep.

So in conjunction with your Temp table - also consider a rather more complex looking query - but it's not when you think about it.
It's just nested due to necessity of your requirements
It's currently based on you having a tblTemp as described earlier - but could work otherwise if needed.

Code:
SELECT tblLabTest.LabTestID, tblLabTest.ChartNumber, tblLabTest.LabDate, tblLabTest.LabValue, tblLabTest.ItemName
FROM tblLabTest INNER JOIN 
  (
  SELECT  Min(T3.LabTestID) AS MinLabTestID
    FROM tblTemp AS T3 
      INNER JOIN
      (
        SELECT T2.ChartNumber, T2.LabDate, T2.CalcItemName, Min(T2.LabValue) AS MinLabValue
        FROM tblTemp AS T2 
          INNER JOIN 
            (
              SELECT T.ChartNumber, T.CalcItemName, Max(T.LabDate) AS MaxLabDate
              FROM tblTemp AS T
              GROUP BY T.ChartNumber, T.CalcItemName
            ) QD
          ON T2.LabDate = QD.MaxLabDate AND T2.CalcItemName = QD.CalcItemName AND T2.ChartNumber = QD.ChartNumber
        GROUP BY T2.ChartNumber, T2.LabDate, T2.CalcItemName
      ) QV
      ON T3.CalcItemName = QV.CalcItemName AND T3.LabValue = QV.MinLabValue AND T3.LabDate = QV.LabDate AND T3.ChartNumber = QV.ChartNumber
    GROUP BY T3.ChartNumber, T3.LabDate, T3.LabValue, T3.CalcItemName
  ) AS QID 
ON QID.MinLabTestID = tblLabTest.LabTestID
Cheers.

I'll take a look at this. Thanks.
 
... Since this is a "Challenge" per se ... I have came up with some code. It processes 80,000 records in about 10 seconds, and of that 10s, <5s of it was opening the recordset and managing the tables and >5s to actually process the data. I left the timer (that times the whole process) in the code. I did not alter your original structure at all (so No ID field, nor a permanant computed column. The premise is the I open a recordset with tblLabTest as the source, loop that recordset, and add the first record of each group (based on Chart and Item) to a second table. The SQL statement uses an expression to get the computed ItemName (vItemName) {side note the "v" stands for "virtual" when I am speaking it in my mind}. The SQL also puts in the appropriate sorts to force the latest lab to the top of each group. I figure since I'm looping a recordset, there was no need to add a column to the main table then execute an UPDATE statement to update that column. As stated, I do create a table for the results of the process (tblLabTest_Lastest).... This type of work seems to be in line with MS's claim that "Information Workers" are the core Access users!!.

Here is the code ... (if you want a sample db I can post ... but I just used your code to create a table with 80000 records, then used the following to process that data)
Code:
Public Sub ProcessData()
 
    Dim strSQL As String
    Dim cnn As ADODB.Connection
    Dim strChart As String, strItemName As String
    Dim rstLabTest As ADODB.Recordset
    Dim rstLabTest_Latest As ADODB.Recordset
    Dim sglTimer As Single
 
    'Start clock
    sglTimer = Timer
 
    'Point to the CurrentDB, and set up the rst objects
    Set cnn = CurrentProject.Connection
    Set rstLabTest = New ADODB.Recordset
    Set rstLabTest_Latest = New ADODB.Recordset
 
    'Make a table (or delete the contents) for the results
    If DLookup("[Id]", "MSysObjects", "[Name]='tblLabTest_Latest'") Then
        strSQL = "DELETE FROM tblLabTest_Latest"
    Else
        strSQL = "SELECT ChartNumber, LabDate, LabValue, ItemName" & _
             " INTO tblLabTest_Latest" & _
             " FROM tblLabTest" & _
             " WHERE 1=0"
    End If
    cnn.Execute strSQL, , adCmdText
    RefreshDatabaseWindow
 
    'Open the LATEST lab table
    strSQL = "SELECT * FROM tblLabTest_Latest WHERE 1=0"
    rstLabTest_Latest.Open strSQL, cnn, adOpenKeyset, adLockPessimistic, adCmdText
 
    'Open the lab tests and loop the rst.
    strSQL = "SELECT *" & _
             " FROM (SELECT *, IIF(Left(ItemName,3) = 'LDL','LDL', ItemName) As vItemName" & _
                   " FROM tblLabTest) As vTbl" & _
             " ORDER BY ChartNumber," & _
                      " vItemName," & _
                      " LabDate DESC, LabValue"
 
    With rstLabTest
 
        'Set up the recordset for speedy looping
        .CursorLocation = adUseClient
        .CursorType = adOpenForwardOnly
        .LockType = adLockReadOnly
        .Open strSQL, cnn
 
        'Loop the tests
        Do Until .EOF
 
            'Record the lastest test if appropriate
            If strChart <> !ChartNumber Or _
               strItemName <> !vItemName Then
 
                'Add the record to the results
                rstLabTest_Latest.AddNew Array(0, 1, 2, 3) _
                    , Array(!ChartNumber, !LabDate, !LabValue, !ItemName)
 
                'Remember chart and lab we are on
                strChart = !ChartNumber
                strItemName = !vItemName
 
            End If
 
            'Go to the next lab or chart record
            .MoveNext
 
        Loop
    End With
 
    'Clean up
    rstLabTest.Close
    rstLabTest_Latest.Close
 
    'Stop clock
    Debug.Print Timer - sglTimer
 
End Sub

{Note: ... I did not test Leigh's solution ...}
 
Last edited:
OK then - let's go with direct solutions then (no intermediate stages) :-p
(Losing really useful efficiency here though - the temp table / calc column is a real time saver!)

Code:
SELECT tblLabTest.LabTestID, tblLabTest.ChartNumber, tblLabTest.LabDate, tblLabTest.LabValue, tblLabTest.ItemName
FROM tblLabTest INNER JOIN (SELECT  Min(T3.LabTestID) AS MinLabTestID
    FROM tblLabTest AS T3 
      INNER JOIN
      (
        SELECT T2.ChartNumber, T2.LabDate, IIf(Left(T2.[ItemName],3)="LDL","LDL",T2.[ItemName]) As CalcItemName2, Min(T2.LabValue) AS MinLabValue
        FROM tblLabTest AS T2 
          INNER JOIN 
            (
              SELECT T.ChartNumber,  IIf(Left(T.[ItemName],3)="LDL","LDL",T.[ItemName]) As CalcItemName1, Max(T.LabDate) AS MaxLabDate
              FROM tblLabTest AS T
              GROUP BY T.ChartNumber,  IIf(Left(T.[ItemName],3)="LDL","LDL",T.[ItemName])
            ) QD
          ON T2.LabDate = QD.MaxLabDate AND  IIf(Left(T2.[ItemName],3)="LDL","LDL",T2.[ItemName])  = QD.CalcItemName1 AND T2.ChartNumber = QD.ChartNumber
        GROUP BY T2.ChartNumber, T2.LabDate,  IIf(Left(T2.[ItemName],3)="LDL","LDL",T2.[ItemName]) 
      ) QV
      ON  IIf(Left(T3.[ItemName],3)="LDL","LDL",T3.[ItemName])  = QV.CalcItemName2 AND T3.LabValue = QV.MinLabValue AND T3.LabDate = QV.LabDate AND T3.ChartNumber = QV.ChartNumber
    GROUP BY T3.ChartNumber, T3.LabDate, T3.LabValue,  IIf(Left(T3.[ItemName],3)="LDL","LDL",T3.[ItemName]) 
  ) AS QID ON QID.MinLabTestID = tblLabTest.LabTestID;

Just for fun though IMO. Please consider the temp table / calc column.
 
With the calculated column and tblTemp, I got Leighs processing the data in about 5s ... (consistently slightly faster than my code based solution ... but not by much)

Without the calc column I got Leighs solution at a little over 8s

{note this is for 80,000 records being processed}

.... Looks like Leigh gets the prize at this point!!! .... Excellent Job my friend!!! ...


EDITS ADDED ...
I changed my code to use DAO recordsets and the processing time was just over 4s ...
 
Last edited:

Users who are viewing this thread

Back
Top Bottom