Solved Is it possible to search for gaps between records?

ThatTransportationPerson · May 27, 2022

Hello everyone!

I'm attempting to do a "Gap Analysis" where I look for gaps in our data. Here's an example table:

State	Start Mile	End Mile
California	2.6	3.5
California	3.5	4.7
California	5	6.1
Florida	0	1.4
Florida	1.6	2

For California, there is a gap between 4.7 (end mile of second record) and 5 (begin mile of 3rd record). For Florida, there is a gap between 1.4 and 1.6. These gaps would be easy enough to find if they were in the same record, but unfortunately I will have to compare different fields in different records. I'm at a total loss as to how to go about this, or if it is even possible. If anyone has any suggestions, I would be very grateful.

MajP · May 27, 2022

Yes it can be done. You describe other situations than above, can you elaborate on those? I will write the SQL for this case.

CJ_London · May 27, 2022

would be better if your records had a primary key to uniquely identify them but as a start you would left join the table to itself on End Mile to Start Mile. Also help if you told us what you are going to do about it - adjust one of the records? measure the difference? something else?

This identifies those records which don't have a matching start mile

Code:

SELECT A.*
FROM myTable A LEFT JOIN myTable B ON A.State=B.State AND A.endMile=B.startMile
WHERE B.StartMile is Null

This will include the last record for each state since there is no later record

ThatTransportationPerson · May 27, 2022

Hello MajP,

Thanks for replying! Are you asking about what I wrote here?

These gaps would be easy enough to find if they were in the same record, but unfortunately I will have to compare different fields in different records.

What I mean is that I will have to group records based on the "State" field, and then compare the first record's "End Mile" to the next record's "Start Mile", like so:

If the End Mile and Start Mile do not match, those two records would be included in the final query table. I hope this clears up any confusion. And thank you again!

MajP · May 27, 2022

@CJ_London, That is an easier way than I thought.
I was thinking you pull the previous record.

Code:

SELECT tblmileage.state,
       tblmileage.startmiles,
       tblmileage.endmiles,
       (SELECT TOP 1 B.endmiles
        FROM   tblmileage AS B
        WHERE  tblmileage.endmiles > b.endmiles
               AND ( b.state = tblmileage.state )
        ORDER  BY endmiles DESC) AS PreviousEnd
FROM   tblmileage
ORDER  BY tblmileage.state,
          tblmileage.startmiles,
          tblmileage.endmiles;

QryPreviousEnd QryPreviousEnd

State	StartMiles	EndMiles	PreviousEnd
CA	2.6	3.5
CA	3.5	4.7	3.5
CA	5	6.1	4.7
FL	0	1.4
FL	1.6	2	1.4

Then you simply find where Startmiles <> previousEnd and PreviousEnd is Not Null

However, this still could be useful if you need to figure out the amount of the gap.

MajP · May 27, 2022

@ThatTransportationPerson

I misunderstood. When you said

These gaps would be easy enough to find if they were in the same record, but unfortunately I will have to compare different fields in different records.

I thought you had other fields besides what was shown. I understood the shown problem. FYI, this is a pretty common issue. Are you doing data entry in Access? If so you should be able to validate the entry so that a gap cannot be entered.

MajP · May 27, 2022

I think in hindsight this is a better solution you get a clean table without the end records. It gets confusing to see the last record, because you will have to ignore that. Also this lets you see the actual discrepancy. You may want to account for the value of the gap. However, writing subqueries is more work.
qryMajPGaps qryMajPGaps

State	StartMiles	EndMiles	PreviousEnd
CA	5	6.1	4.7
FL	1.6	2	1.4

arnelgp · May 28, 2022

you can also use VBA to get the result.
i created a function fnGapToTable() in Module1.
this will save the "gaps" to gapTable table.

there is a sample form dsGapTable that calls the function.
open the dsGapTable in datasheet view.

Pat Hartman · May 28, 2022

Once you clean up the data, are you going to add validation code to keep gaps from reappearing?

Galaxiom · May 28, 2022

arnelgp said:
you can also use VBA to get the result.

Solutions like that are generally known as RBAR (Row By Agonising Row).

A solution that uses the database engine's capacity to process sets of data will almost inevitably be more efficient and faster.

arnelgp · May 29, 2022

Galaxiom said:
Solutions like that are generally known as RBAR (Row By Agonising Row).

it's an alternative suggestion.
do you have anything to suggest to the op (your own idea)?

MajP · May 29, 2022

Galaxiom said:
Solutions like that are generally known as RBAR (Row By Agonising Row).

A solution that uses the database engine's capacity to process sets of data will almost inevitably be more efficient and faster.

I would not be surprised on a large data set if @arnelgp solution would be faster. My solution uses a subquery and ACE tends to choke on simple subqueries. A straight read through a snapshot recordset is pretty fast.

Pat Hartman · May 29, 2022

If we know a solution will work, we don't need to keep offering alternatives unless the OP is having trouble implementing the original solution or the original solution is flawed for some reason. It just confuses the OP and turns a thread which should be 2-3 posts into dozens of posts riddled with confusion. If you object to the original method, then feel free to post an alternative but explain why your alternative is better to give the OP some framework for making a decision.

Personally, I rarely use sub selects for the reason MajP offered. Access does not optimize them well and if you have a large dataset (which I usually do), they can be slow. There are situations where a sub select is required so you don't have options but this case has an option of using a self join to a query with criteria (left joins don't work when the criteria is on the right table) which I didn't post because the sub select will probably work fine and learning how to construct them may help the OP in the future. For some reason, if you layout the query the way I described, Access works out how to optimize it better. It is logically identical to the sub query but it seems to give Access a little guidance as to how to construct the best execution plan.

arnelgp · May 29, 2022

Pat Hartman said:
which I didn't post because the sub select will probably work fine

by all means show us, so we may learn *******************

[Jon's edit: Let's attack the problem, not people. Thank you.]

CJ_London · May 29, 2022

That is an easier way than I thought.
I was thinking you pull the previous record.

you can do it the other way, but then you will always get the first record as there is no previous record. With more data such as a PK, these can be avoided.

to include the previous record, you can use a non standard join. Again would be easier with a PK. Not sure of the benefit of showing the previous start mile but if required, join this query to your table on state and prevendmile This does not use a subquery and subject to proper indexing should be the fastest way

Code:

SELECT A.State, A.StartMile, A.EndMile, Max(B.EndMile) AS PrevEndMile, [a].[startmile]-Max([b].[endmile]) AS Diff
FROM tblMiles AS A INNER JOIN tblMiles AS B ON ( B.endmile<=A.startmile) AND (A.State = B.State)
GROUP BY A.State, A.StartMile, A.EndMile
HAVING [a].[startmile]-Max([b].[endmile])<>0;

non standard join is (B.EndMile < A.StartMile), so this can only be viewed in the sql window. You can create the query in the query builder then go to the sql window and change the = to <

Produces this result

Query1 Query1

State	StartMile	EndMile	PrevEndMile	Diff
California	5	6.1	4.7	0.3
Florida	1.6	2	1.4	0.2

Pat Hartman · May 29, 2022

by all means show us, so we may learn ********************

Was it necessary to be snarky?

I did not post the alternative because this isn't a competition and we have no idea if the OP needs an alternative. If he asks, I will post it. Otherwise, I won't add to the confusion.

KitaYama · May 30, 2022

@Pat Hartman
With all respects, if it was me, I would like to see all my options and possibilities. Then choose the one I think suits my case best. You may see it as a competition, but for me it's a wide range of solutions which not only helps me to learn, but also see how things can be done by observing a problem from a different angle.
If there's only one solution, though it works and solves my problem, I will never know any other work around exists. Having multiple solutions teaches me how to step back and see the problem and search for different ways to solve the problem, the next time I face a wall.

I can't talk for the OP, but the next time I post a question, I would really like to hear your idea. Even if the previous one works just fine. A talented programmer like you can always help a newbie like me to learn more and see the different ways to solve a single problem.

My apologies to OP for going off topic.

Galaxiom · May 30, 2022

MajP said:
I would not be surprised on a large data set if @arnelgp solution would be faster. My solution uses a subquery and ACE tends to choke on simple subqueries. A straight read through a snapshot recordset is pretty fast.

I can't say with ACE. I mostly work with SQL Server and an experience with a function comes to mind that was very instructive.

The function needed to calculate the average daily balance from the running account balances stored in financial transaction records. The balance had to come from the last transaction of the day. Easy to get the last transaction of the day but many days had no transactions. So I opened a cursor (SQL equivalent of a recordset in VBA) and looped through, repeating the balance for the missing days. It was quick enough for one account but the query was slow when the function was applied to each of hundreds of accounts.

When I had a bit more time I rewrote the function without the cursor but was disappointed when it was slightly slower than the one with the cursor. But then I integrated the SQL version into the query so it calculated all the running balances together. Remarkably, it didn't take much longer to run for a hundred balances than calculating just the one average balance.

Minty · May 30, 2022

SQL Server can also use the LAG() function to get previous row data which is very efficient.
It allows grouping and an offset to be used.

However, we're miles off topic now.

Auntiejack56 · May 30, 2022

Sometimes well documented RBAR in VBA can be great in a DB that is going back to an SME who has NFC.

Solved Is it possible to search for gaps between records?

New member

You've got your good things, and you've got mine.

Super Moderator

New member

You've got your good things, and you've got mine.

You've got your good things, and you've got mine.

You've got your good things, and you've got mine.

Attachments

..forever waiting... waiting for jellybean!

Attachments

Super Moderator

Super Moderator

..forever waiting... waiting for jellybean!

You've got your good things, and you've got mine.

Super Moderator

..forever waiting... waiting for jellybean!

Super Moderator

Super Moderator

Well-known member

Super Moderator

AWF VIP

Registered User.

Similar threads

Users who are viewing this thread