Nth select (1 Viewer)

manu

Registered User.
Local time
Today, 16:00
Joined
Jun 28, 2001
Messages
51
Is there a utility which can select records using "nth select" so an even representation can be selected from a given data set
 

Drew

Registered User.
Local time
Today, 21:00
Joined
Aug 18, 2001
Messages
41
hi manu,

you need to use a syntax like "SELECT TOP n myTable.ID..." in your query,

HTH

Drew

[This message has been edited by Drew (edited 08-22-2001).]
 

manu

Registered User.
Local time
Today, 16:00
Joined
Jun 28, 2001
Messages
51
Thanx for your prompt reply Drew

I am a complete rookie when it comes to familiarity with expressions etc.. May I take the Liberty of requesting a walk through your idea and help in creating an expression as suggested by you.

Thanks for your time and patience my friend
 

Drew

Registered User.
Local time
Today, 21:00
Joined
Aug 18, 2001
Messages
41
Of course. Having looked at your question again i'm not sure my answer was really what you're after. It will return you the top so many records rather than a random selection. If that's what you're after then:

Assuming you have your query written which pulls all the data that you want a selection of, go into SQL view ( View | SQL View ) and just add 'TOP n' without the '' after SELECT and before th first field name. eg instead of
"SELECT myTable.ID..." you would now have "SELECT TOP n myTable.ID...".

If you wanted a random selection then post back, i'm sure i have some code here somewhere to do that. OR are you after a way of selecting a representation of the 'average' record within the dataset?

Drew

[This message has been edited by Drew (edited 08-22-2001).]
 

manu

Registered User.
Local time
Today, 16:00
Joined
Jun 28, 2001
Messages
51
Thank You

Yes, Random selection is what I am after. As an e.g.
"I have a dataset of 1000 records and would like to select 200 records. To achieve even representation across the entire dataset, I should pick every 5th record from my dataset (1000/200 = 5)

The utility/code that I am looking for basically would make me specify:

1. Total record count (e.g. 1000)
2. Sample size needed (e.g. 200)

the code at this point will go figure the "nth value("5" in the above e.g.)

subsequently, will return the sample dataset

Apologies if I got a bit long winded there.

Thanks, and all help is appreciated

Manu
 

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Today, 15:00
Joined
Feb 28, 2001
Messages
27,261
If I may butt in with a thought...

Add a Yes/No field to your table. This is not very expensive to do. Make its name [Selected] or [Chosen] or [Picked] or something like these but that doesn't conflict with any of your other names. I'll used [Chosen] because it is a short name.

Now write two update queries for your table. The first one should set every [Chosen] (or whatever its name is) to False.

The second one should set [Chosen] to True with a criteria field of

Rnd() < fraction

where fraction is the decimal fractional representation of what percent of the table you want to pick. So if it is 1 out of 5, 1/5 = 0.20 so the criteria would be

Rnd() < 0.20

Then, one last query (for all records with [Chosen] = True) gives you a random sample.

Don't like the sample? Changed the table and want to re-run it? Just repeat the process. The Rnd function initializes itself from the system clock if you don't run a randomize, so this should produce a different sequence each time.

This ought to work. Just make sure when you build the query that it doesn't try to use "Rnd" as a text value. Sometimes the expression builder doesn't quite get it right. But this should work even within a simple query.
 

Drew

Registered User.
Local time
Today, 21:00
Joined
Aug 18, 2001
Messages
41
Okay, i'm off home shortly but will pick up again tomorrow. Where to go next depends on how you want to use the data - do you just want to print a report or do you want to be able to manipulate it further?

Catch u later

Drew


<extra bit> I like Docs post above, but Rnd either needs to be randomized somehow or it will remain the same for the entire query ( is it actually called for every record?). I think it'll need to be run from code to work. Otherwise way better than what i was thinking of. Anyway, hometime for me, bye 4 now


[This message has been edited by Drew (edited 08-22-2001).]
 

manu

Registered User.
Local time
Today, 16:00
Joined
Jun 28, 2001
Messages
51
Have a Good Day Drew...

The code for nth select should allow choosing all/selected fields for qualified records.

So, the 200 records(post nth select)can include all/selected fields from the original dataset.

Regards,

Manu
 

manu

Registered User.
Local time
Today, 16:00
Joined
Jun 28, 2001
Messages
51
Hi The_Doc_Man

Thanks for your input,

Tried your suggestion,

Having trouble in 2nd update query with criteria:Rnd() < 0.20

Expression Builder defaults the function as Rnd(<number> ) < 0.20

Tried:
1.deleting the <number> Rnd() < 0.20
2.replace with record count Rnd(1000) < 0.20
where 1000 is the total record count of dataset

neither of the above updates the table with True

Any further suggestions??

Regards,

Manu
 

Drew

Registered User.
Local time
Today, 21:00
Joined
Aug 18, 2001
Messages
41
Morning Manu,

Working with Docs idea above here's some code that will mark the records as selected

Code:
Sub changeYN(bytPercSel As Byte)
    Dim rst As Recordset
    Dim dbs As Database
    Dim x As Integer
    Dim lngTotalRecords As Long
    Dim lngNumToSet As Long
    
    Set dbs = CurrentDb
    dbs.Execute "UPDATE org_employees SET chosen=False"
    Set rst = dbs.OpenRecordset("org_employees", dbOpenDynaset)
    Randomize
    
    rst.MoveLast
    lngTotalRecords = rst.RecordCount
    lngNumToSet = (bytPercSel * 0.01 * lngTotalRecords)
Do Until x >= lngNumToSet
    With rst
        .MoveFirst
        .Move Int(lngTotalRecords * Rnd)
        If Not .Fields("chosen") Then
            .Edit
            .Fields("chosen") = True
            .Update
            x = x + 1
        End If
    End With
Loop
End Sub

Its DAO so if you're in 2K you'll either need to change it accoringly or add the references in, you may also want to change the recordset opening to something more sensible rather than grabbing the entire table, any problems post back

HTH

Drew

However - this doesn't select every 5th record, it selects 1/5 of records ( if you pass in 20 as bytPercSel, thinking about it, you'll want to check that bytPercSel is less than 100 before it sets off... )

[This message has been edited by Drew (edited 08-23-2001).]
 

v.khawaja

New member
Local time
Today, 21:00
Joined
Aug 2, 2001
Messages
9
Or
Private Sub cmdWhatEver()
Dim rst As Recordset
Dim dbs As Database
Dim intCounter As Integer
Set dbs = CurrentDb
dbs.Execute "UPDATE tbltablename SET chosen=False"
Set rst = dbs.OpenRecordset("tbltablename", dbOpenDynaset)
intCounter = 1
rst.MoveFirst
While Not rst.EOF
If intCounter = 5 Then
rst.Edit
rst!chosen = True
rst.Update
intCounter = 1
End If
intCounter = intCounter + 1
rst.MoveNext
Wend
End Sub
 

manu

Registered User.
Local time
Today, 16:00
Joined
Jun 28, 2001
Messages
51
Good Morning Drew,

Kindly read my e-mail message.

Regards,

Manu
 

Drew

Registered User.
Local time
Today, 21:00
Joined
Aug 18, 2001
Messages
41
erm, bad news on that one i'm afraid - i broke my linux box the other day, i'm planning to do a fresh install tonight but may not have time to set everything up. Can u post here?
 

manu

Registered User.
Local time
Today, 16:00
Joined
Jun 28, 2001
Messages
51
Thanks for your detailed response this morning, however, I will need some guidance as to:

1. Where should I insert this code

2. Next steps after code insertion

3. Your posted message talks to DAO...??

4. What should I expect to see in my table after running this code/utility


My apologies for these very basic/silly questions and all your time and patience is much appreciated.

Regards,

Manu
 

Drew

Registered User.
Local time
Today, 21:00
Joined
Aug 18, 2001
Messages
41
you're welcome, they're not silly questions ( not to me anyhow, the only stupid question is the unasked one ). Okay:-

1/ In a module - just cut and paste. Doesn't matter if it's a new one or not, up to you.

2/ depends where you want to call it from. If you're just doing this db for yourself then you could just put
Code:
Sub RunTheThing()
  changeYN 50
End Sub
above the already pasted routine. The 50 here refers to what percentage of records will be 'selected' from the whole group, if you want 1 in 5 then 20 will do it etc etc You'll also need the change the references to my field and table names in there ( ie 'org_employees' and 'chosen' ) to whatever yours are called.

4/ If you've added the Chosen field to your table then x% of the records will be marked as true, the remaining 100-x% will be false still. You can then build any queries you want based on that field being true. Every time you run the code it will change all the records to false and randomly reselect some more.

3/You only need to worry about DAO/ADO if you're on Access 2000. I only have 97 so i'm not totaly sure of the changes that need to be made to make it a2k friendly. But if u are on 2k post back, there's plenty of people here that will know. The TLAs refer to Data Access Objects and ActiveX Data Objects - basically microsoft has changed the way it's databases present data to the outside world. I guess ADO is better(?) as it's good to work with in ASP but can't really comment in any deep way,

If that doesn't help any please post back,

Drew

[This message has been edited by Drew (edited 08-23-2001).]
 

manu

Registered User.
Local time
Today, 16:00
Joined
Jun 28, 2001
Messages
51
Here we go..

Code as entered in My Module:
*********************************************
Sub Runthething()
changeYN 20
Dim rst As Recordset
Dim dbs As Database
Dim x As Integer
Dim lngTotalRecords As Long
Dim lngNumToSet As Long
Set dbs = CurrentDb
dbs.Execute "UPDATE TA2 SET chosen=False"
Set rst = dbs.OpenRecordset("TA2", dbOpenDynaset)
Randomize
rst.MoveLast
lngTotalRecords = rst.RecordCount
lngNumToSet = (bytPercSel * 0.01 * lngTotalRecords)
Do Until x >= lngNumToSet
With rst
.MoveFirst
.Move Int(lngTotalRecords * Rnd)
If Not .Fields("chosen") Then
.Edit
.Fields("chosen") = True
.Update
x = x + 1
End If
End With
Loop
End Sub

*********************************************

Additional Info:

Table Name - TA2
Yes/No Field - Chosen

Receiving the following Compile Error:

"COMPILE ERROR: SUB OR FUNCTION NOT DEFINED"


Please advise further steps

Manu
 

manu

Registered User.
Local time
Today, 16:00
Joined
Jun 28, 2001
Messages
51
Oops, forgot to mention, I am on Access97 FYI

Manu
 

Drew

Registered User.
Local time
Today, 21:00
Joined
Aug 18, 2001
Messages
41
sorry, that wasn't very clear. You should end up with 2 seperate subs. One as Sub changeYN(bytPercSel As Byte) and one as Sub RunTheThing(). Then when you want to run it, put yourself into the Sub RunTheThing() one and either hit F5 or the play button on the top on the Visual Basic toolbar ( if you have it visible ). At the moment it's probably choking on the changeYN 20 line.
Your changes to the table name looks fine and as your on 97 too you'll probably be okay once they're split back out into 2 routines again.

Do post back if you get any more probs

Drew
 

manu

Registered User.
Local time
Today, 16:00
Joined
Jun 28, 2001
Messages
51
Beautiful....

Works like a charm, a comment though:

The routine selects records randomly and not necessarily sequentially. Here's what I mean:

Given Dataset - 1000 records

Sample size needed - 250

A true Nth select routine should go and select every 4th(1000/250)record to create a truly representative sample.

The current routine works on %, where in answer to the above situation, it will return the required 250 records but not necessarily every 4th record.

Subsequent to running the routine, I went and looked at my table, records were duly selected and marked in the "chosen" field, occassionally though, as many as 4 records were chosen together before a skip to look for the next qualified record.

I hope I am not beating a "dead horse" here

Thanks for all your effort and patience

Manu
 

pcs

Registered User.
Local time
Today, 15:00
Joined
May 19, 2001
Messages
398
i'm kind of hesitant to weigh-in on this one and i'm prepared for the abuse from those members who are more statistically-inclined.


but,
----you wrote---
A true Nth select routine should go and select every 4th(1000/250)record to create a truly representative sample.
----------------

i think if you want a purely random dataset selecting the nth record is a problem as it imposes an 'order' on the data.

think that getting a random % of the total record count (which you now have) is a better sample...

hth,
al
 

Users who are viewing this thread

Top Bottom