I am a complete rookie when it comes to familiarity with expressions etc.. May I take the Liberty of requesting a walk through your idea and help in creating an expression as suggested by you.
Of course. Having looked at your question again i'm not sure my answer was really what you're after. It will return you the top so many records rather than a random selection. If that's what you're after then:
Assuming you have your query written which pulls all the data that you want a selection of, go into SQL view ( View | SQL View ) and just add 'TOP n' without the '' after SELECT and before th first field name. eg instead of
"SELECT myTable.ID..." you would now have "SELECT TOP n myTable.ID...".
If you wanted a random selection then post back, i'm sure i have some code here somewhere to do that. OR are you after a way of selecting a representation of the 'average' record within the dataset?
Drew
[This message has been edited by Drew (edited 08-22-2001).]
Yes, Random selection is what I am after. As an e.g.
"I have a dataset of 1000 records and would like to select 200 records. To achieve even representation across the entire dataset, I should pick every 5th record from my dataset (1000/200 = 5)
The utility/code that I am looking for basically would make me specify:
1. Total record count (e.g. 1000)
2. Sample size needed (e.g. 200)
the code at this point will go figure the "nth value("5" in the above e.g.)
Add a Yes/No field to your table. This is not very expensive to do. Make its name [Selected] or [Chosen] or [Picked] or something like these but that doesn't conflict with any of your other names. I'll used [Chosen] because it is a short name.
Now write two update queries for your table. The first one should set every [Chosen] (or whatever its name is) to False.
The second one should set [Chosen] to True with a criteria field of
Rnd() < fraction
where fraction is the decimal fractional representation of what percent of the table you want to pick. So if it is 1 out of 5, 1/5 = 0.20 so the criteria would be
Rnd() < 0.20
Then, one last query (for all records with [Chosen] = True) gives you a random sample.
Don't like the sample? Changed the table and want to re-run it? Just repeat the process. The Rnd function initializes itself from the system clock if you don't run a randomize, so this should produce a different sequence each time.
This ought to work. Just make sure when you build the query that it doesn't try to use "Rnd" as a text value. Sometimes the expression builder doesn't quite get it right. But this should work even within a simple query.
Okay, i'm off home shortly but will pick up again tomorrow. Where to go next depends on how you want to use the data - do you just want to print a report or do you want to be able to manipulate it further?
Catch u later
Drew
<extra bit> I like Docs post above, but Rnd either needs to be randomized somehow or it will remain the same for the entire query ( is it actually called for every record?). I think it'll need to be run from code to work. Otherwise way better than what i was thinking of. Anyway, hometime for me, bye 4 now
[This message has been edited by Drew (edited 08-22-2001).]
Working with Docs idea above here's some code that will mark the records as selected
Code:
Sub changeYN(bytPercSel As Byte)
Dim rst As Recordset
Dim dbs As Database
Dim x As Integer
Dim lngTotalRecords As Long
Dim lngNumToSet As Long
Set dbs = CurrentDb
dbs.Execute "UPDATE org_employees SET chosen=False"
Set rst = dbs.OpenRecordset("org_employees", dbOpenDynaset)
Randomize
rst.MoveLast
lngTotalRecords = rst.RecordCount
lngNumToSet = (bytPercSel * 0.01 * lngTotalRecords)
Do Until x >= lngNumToSet
With rst
.MoveFirst
.Move Int(lngTotalRecords * Rnd)
If Not .Fields("chosen") Then
.Edit
.Fields("chosen") = True
.Update
x = x + 1
End If
End With
Loop
End Sub
Its DAO so if you're in 2K you'll either need to change it accoringly or add the references in, you may also want to change the recordset opening to something more sensible rather than grabbing the entire table, any problems post back
HTH
Drew
However - this doesn't select every 5th record, it selects 1/5 of records ( if you pass in 20 as bytPercSel, thinking about it, you'll want to check that bytPercSel is less than 100 before it sets off... )
[This message has been edited by Drew (edited 08-23-2001).]
Or
Private Sub cmdWhatEver()
Dim rst As Recordset
Dim dbs As Database
Dim intCounter As Integer
Set dbs = CurrentDb
dbs.Execute "UPDATE tbltablename SET chosen=False"
Set rst = dbs.OpenRecordset("tbltablename", dbOpenDynaset)
intCounter = 1
rst.MoveFirst
While Not rst.EOF
If intCounter = 5 Then
rst.Edit
rst!chosen = True
rst.Update
intCounter = 1
End If
intCounter = intCounter + 1
rst.MoveNext
Wend
End Sub
erm, bad news on that one i'm afraid - i broke my linux box the other day, i'm planning to do a fresh install tonight but may not have time to set everything up. Can u post here?
you're welcome, they're not silly questions ( not to me anyhow, the only stupid question is the unasked one ). Okay:-
1/ In a module - just cut and paste. Doesn't matter if it's a new one or not, up to you.
2/ depends where you want to call it from. If you're just doing this db for yourself then you could just put
Code:
Sub RunTheThing()
changeYN 50
End Sub
above the already pasted routine. The 50 here refers to what percentage of records will be 'selected' from the whole group, if you want 1 in 5 then 20 will do it etc etc You'll also need the change the references to my field and table names in there ( ie 'org_employees' and 'chosen' ) to whatever yours are called.
4/ If you've added the Chosen field to your table then x% of the records will be marked as true, the remaining 100-x% will be false still. You can then build any queries you want based on that field being true. Every time you run the code it will change all the records to false and randomly reselect some more.
3/You only need to worry about DAO/ADO if you're on Access 2000. I only have 97 so i'm not totaly sure of the changes that need to be made to make it a2k friendly. But if u are on 2k post back, there's plenty of people here that will know. The TLAs refer to Data Access Objects and ActiveX Data Objects - basically microsoft has changed the way it's databases present data to the outside world. I guess ADO is better(?) as it's good to work with in ASP but can't really comment in any deep way,
If that doesn't help any please post back,
Drew
[This message has been edited by Drew (edited 08-23-2001).]
Code as entered in My Module:
*********************************************
Sub Runthething()
changeYN 20
Dim rst As Recordset
Dim dbs As Database
Dim x As Integer
Dim lngTotalRecords As Long
Dim lngNumToSet As Long
Set dbs = CurrentDb
dbs.Execute "UPDATE TA2 SET chosen=False"
Set rst = dbs.OpenRecordset("TA2", dbOpenDynaset)
Randomize
rst.MoveLast
lngTotalRecords = rst.RecordCount
lngNumToSet = (bytPercSel * 0.01 * lngTotalRecords)
Do Until x >= lngNumToSet
With rst
.MoveFirst
.Move Int(lngTotalRecords * Rnd)
If Not .Fields("chosen") Then
.Edit
.Fields("chosen") = True
.Update
x = x + 1
End If
End With
Loop
End Sub
sorry, that wasn't very clear. You should end up with 2 seperate subs. One as Sub changeYN(bytPercSel As Byte) and one as Sub RunTheThing(). Then when you want to run it, put yourself into the Sub RunTheThing() one and either hit F5 or the play button on the top on the Visual Basic toolbar ( if you have it visible ). At the moment it's probably choking on the changeYN 20 line.
Your changes to the table name looks fine and as your on 97 too you'll probably be okay once they're split back out into 2 routines again.
The routine selects records randomly and not necessarily sequentially. Here's what I mean:
Given Dataset - 1000 records
Sample size needed - 250
A true Nth select routine should go and select every 4th(1000/250)record to create a truly representative sample.
The current routine works on %, where in answer to the above situation, it will return the required 250 records but not necessarily every 4th record.
Subsequent to running the routine, I went and looked at my table, records were duly selected and marked in the "chosen" field, occassionally though, as many as 4 records were chosen together before a skip to look for the next qualified record.
i'm kind of hesitant to weigh-in on this one and i'm prepared for the abuse from those members who are more statistically-inclined.
but,
----you wrote---
A true Nth select routine should go and select every 4th(1000/250)record to create a truly representative sample.
----------------
i think if you want a purely random dataset selecting the nth record is a problem as it imposes an 'order' on the data.
think that getting a random % of the total record count (which you now have) is a better sample...