Solved Find all races where at least 2 horses from current race have competed in last 5 years (1 Viewer)

3DogLegs

New member
Local time
Today, 22:35
Joined
Aug 14, 2022
Messages
9
I have slowly built my horse racing database for many years and am now at the stage where I need help with some queries.
Given a race with say 10 horses I want to find all races prior to the current race where at least 2 horses from current race have competed against one another. I want to look at form connections over approx last 5 years.

I am using MS-Access 2010, and MS-Excel 2010 (I know - they're old like me but still work). I use Python3 for downloading data from web.

My tables have following structure:
  • tblMeeting:
    • ID_Meeting (PK),
    • MtgDate,
    • MtgVenue,
    • MtgWeather,
    • TrackCondition, etc ...
  • tblRace:
    • ID_Race (PK),
    • ID_Meeting (FK),
    • RaceNo,
    • RaceDistance,
    • RaceClass, etc ...
  • tblField:
    • ID_Field (PK),
    • ID_Race (FK),
    • IDHorse,
    • Tote,
    • Jockey,
    • Trainer,
    • WgtCarried,
    • FinishPosn,
    • Margin,
    • TimeAdj,
    • Rating, etc ...
Using the following query I can get a listing of ALL the horses in the current race, their total average ratings, and total number of runs, over last 5 years (approx).

I'm using UK & AUS date format here.
varDateFrom = CDate("01/09/2012")
varDateTo = CDate("16/07/2017")

SQL:
SELECT tblField.ID_Horse, Round(Avg([Rating]),10) AS AvgRtgH, Count(tblField.Rating) AS Runs
FROM (tblMeet INNER JOIN tblRace ON tblMeet.ID_Meeting = tblRace.ID_Meeting)
    INNER JOIN tblField ON tblRace.ID_Race = tblField.ID_Race
WHERE (((tblMeet.MtgDate) Between #" & varDateFrom & "# And #" & varDateTo - 1 & "#))
GROUP BY tblField.ID_Horse
HAVING (((tblField.ID_Horse) In
    (SELECT tblField.ID_Horse
    FROM (tblMeet
        INNER JOIN tblRace ON tblMeet.ID_Meeting = tblRace.ID_Meeting)
        INNER JOIN tblField ON tblRace.ID_Race = tblField.ID_Race
    WHERE (((tblMeet.MtgDate)=#" & varDateTo & "#) AND ((tblRace.RaceNo)=1) AND ((tblField.FStatus)='RUN')))
    ));

If possible, I would like to tweak this query to restrict the averages, and run counts, to only those races where 2 or more of the current field have raced each other in previous races.

I hope I have used correct formatting.
Hoping someone can point me in the right direction.


[Edit]
For some reason I cannot respond to contributors as I get this message. Anyway I will be away for a couple of days but will respond here by weekend
Regards
 

Attachments

  • spam-message.png
    spam-message.png
    11.2 KB · Views: 58
Last edited:

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Today, 07:35
Joined
Feb 28, 2001
Messages
27,186
I might suggest that you look into using the IN clause and have a layered query. In one layer you would have the list of horses in the current field. Then build a query to count the number of horses in some arbitrary field not more than five years ago and the filter would be WHERE Horse_ID IN (the query for the current field). With no further filter, this would be the list of times the horses in the current field ran within the last 5 years. Then pick only those fields with counts greater than one. I would write this for you but I'm getting a bit tired and you DON'T want to use my queries when I'm getting sleepy.

Here is a link to the IN operator if you are not familiar with it.

 

ebs17

Well-known member
Local time
Today, 14:35
Joined
Feb 7, 2020
Messages
1,946
Given a race with say 10 horses I want to find all races prior to the current race where at least 2 horses from current race have competed against one another. I want to look at form connections over approx last 5 years.
A pretty, somewhat challenging task.

Can you upload a sample database? At least with me the abstraction ability is not enough to write everything completely from the head on the paper, for trying out some realistic sample data is very helpful.

look at form connections
1) Which characteristics according to the table are to be considered there?
2) Do you want to compare 2 horses against each other, or a comparison of all horses to be considered? The second case is likely to be very large and full of gaps.

Of course, Doc may present a complete and correct solution in your sense right away.
 

plog

Banishment Pending
Local time
Today, 07:35
Joined
May 11, 2011
Messages
11,646
I think I got this with just one subquery:

Code:
SELECT tblMeeting.MtgDate AS MeetingDate, main.ID_Race AS RaceID, main.IDHorse AS mainHorse, competitors.IDHorse AS competitorHorse
FROM ((tblField AS main INNER JOIN tblField AS competitors ON main.ID_Race = competitors.ID_Race) INNER JOIN tblRace ON main.ID_Race = tblRace.ID_Race) INNER JOIN tblMeeting ON tblRace.ID_Meeting = tblMeeting.ID_Meeting
WHERE (((main.IDHorse)<>[competitors].[IDHorse]))
ORDER BY main.ID_Race, main.IDHorse;

Paste that in a query and name it 'SimilarRaces_sub1'. It divides your data into a main horse and all its competitors for a race and brings in the date so you can filter on that later as well.

Then this query will tell you which races have had at least 2 similar competitors:

Code:
SELECT main.MeetingDate, main.RaceID
FROM SimilarRaces_sub1 AS main INNER JOIN SimilarRaces_sub1 AS [prior] ON (main.competitorHorse = prior.competitorHorse) AND (main.mainHorse = prior.mainHorse)
WHERE (((main.MeetingDate)>[prior].[MeetingDate]) AND ((main.RaceID)<>[prior].[RaceID]))
GROUP BY main.MeetingDate, main.RaceID;

It uses the sub query and finds the races where the main horse and its competitors match in prior races. Any race that appears in this list meets your criteria of having past competitors. I didn't apply your 5 year criteria, but you should be able to when you use that final query.
 

ebs17

Well-known member
Local time
Today, 14:35
Joined
Feb 7, 2020
Messages
1,946
Small correction for simple quantity (without reversals):
Code:
WHERE main.IDHorse < competitors.IDHorse
 

3DogLegs

New member
Local time
Today, 22:35
Joined
Aug 14, 2022
Messages
9
Wow. Thanks for the response.

I thought this may need some more information to aid you good folk so, I have created a sample DBase with tables and queries and also include an excel pivot table.

As previously stated I am trying to establish form-lines between horses for each race. I have created a pivot table to help me in my attempt to explain what I am trying to do. I do not expect that I will achieve this in a single query but, who knows.

In the pivot table, (its actually the query above), I am able to display the field for each race and the ratings for each horse in that race. The rows are the HorseIDs and the columns are the serial date of the race with the race number appended ie
'42428_3' which is "28/02/2016", RaceNo 3

This is what I am trying to achieve in access using queries or if necessary recordsets and temp tables, I'm open to suggestions.

The initial query gives me the total number of runs per horse and the average of those runs. If I highlight each row for a race I can then locate previous races where some of these horse have competed against each other - a slow laborious task.

I hope this makes sense. As I have said I am open to suggestions.
 

Attachments

  • Racing.zip
    1 MB · Views: 78

ebs17

Well-known member
Local time
Today, 14:35
Joined
Feb 7, 2020
Messages
1,946
Below is a crosstab query that should meet your requirements. There are three parameters to enter to run it:
- the date of the meeting
- the duration of the review in months (I arbitrarily changed this from years to months, which should give you a little more variability)
- the number of the race on the day
SQL:
PARAMETERS
   parMtgDate Date,
   parMonthsBack Long,
   parRaceNo Long
;
TRANSFORM
   FIRST(Q.Rating) AS R
SELECT
   Q.ID_Horse,
   COUNT(Q.Rating) AS RaceCount,
   Round(AVG(Q.Rating), 10) AS RaceAverage
FROM
   (
      SELECT
         tblField.ID_Horse,
         tblField.Rating,
         CLng(tblMeeting.MtgDate) & "_" & parRaceNo AS RaceDate
      FROM
         (tblMeeting
            INNER JOIN tblRace
            ON tblMeeting.ID_Meeting = tblRace.ID_Meeting
         )
         INNER JOIN tblField
         ON tblRace.ID_Race = tblField.ID_Race
      WHERE
         tblMeeting.MtgDate BETWEEN DateAdd("m", - parMonthsBack, parMtgDate - 1)
            AND
         parMtgDate - 1
            AND
         tblField.ID_Horse IN
            (
               SELECT
                  tblField.ID_Horse
               FROM
                  (tblMeeting
                     INNER JOIN tblRace
                     ON tblMeeting.ID_Meeting = tblRace.ID_Meeting
                  )
                  INNER JOIN tblField
                  ON tblRace.ID_Race = tblField.ID_Race
               WHERE
                  tblMeeting.MtgDate = parMtgDate
                     AND
                  tblRace.RaceNo = parRaceNo
            )
         ) AS Q
GROUP BY
   Q.ID_Horse
PIVOT
   Q.RaceDate

Code:
CLng(tblMeeting.MtgDate) & "_" & parRaceNo AS RaceDate

' could be exchanged to
Format(tblMeeting.MtgDate, "yymmdd") & "_" & parRaceNo AS RaceDate
The date in short format is not quite as cryptic to read as the number.
 
Last edited:

3DogLegs

New member
Local time
Today, 22:35
Joined
Aug 14, 2022
Messages
9
Below is a crosstab query that should meet your requirements. There are three parameters to enter to run it:
- the date of the meeting
- the duration of the review in months (I arbitrarily changed this from years to months, which should give you a little more variability)
- the number of the race on the day
SQL:
PARAMETERS
   parMtgDate Date,
   parMonthsBack Long,
   parRaceNo Long
;
TRANSFORM
   FIRST(Q.Rating) AS R
SELECT
   Q.ID_Horse,
   COUNT(Q.Rating) AS RaceCount,
   Round(AVG(Q.Rating), 10) AS RaceAverage
FROM
   (
      SELECT
         tblField.ID_Horse,
         tblField.Rating,
         CLng(tblMeeting.MtgDate) & "_" & parRaceNo AS RaceDate
      FROM
         (tblMeeting
            INNER JOIN tblRace
            ON tblMeeting.ID_Meeting = tblRace.ID_Meeting
         )
         INNER JOIN tblField
         ON tblRace.ID_Race = tblField.ID_Race
      WHERE
         tblMeeting.MtgDate BETWEEN DateAdd("m", - parMonthsBack, parMtgDate - 1)
            AND
         parMtgDate - 1
            AND
         tblField.ID_Horse IN
            (
               SELECT
                  tblField.ID_Horse
               FROM
                  (tblMeeting
                     INNER JOIN tblRace
                     ON tblMeeting.ID_Meeting = tblRace.ID_Meeting
                  )
                  INNER JOIN tblField
                  ON tblRace.ID_Race = tblField.ID_Race
               WHERE
                  tblMeeting.MtgDate = parMtgDate
                     AND
                  tblRace.RaceNo = parRaceNo
            )
         ) AS Q
GROUP BY
   Q.ID_Horse
PIVOT
   Q.RaceDate

Code:
CLng(tblMeeting.MtgDate) & "_" & parRaceNo AS RaceDate

' could be exchanged to
Format(tblMeeting.MtgDate, "yymmdd") & "_" & parRaceNo AS RaceDate
The date in short format is not quite as cryptic to read as the number.
Perfect!! Thanks for that. It does exactly what I wanted.
Now my next problem is dealing with the Access limitation of 255 columns but that can wait for now.
 

3DogLegs

New member
Local time
Today, 22:35
Joined
Aug 14, 2022
Messages
9
I think I got this with just one subquery:

Code:
SELECT tblMeeting.MtgDate AS MeetingDate, main.ID_Race AS RaceID, main.IDHorse AS mainHorse, competitors.IDHorse AS competitorHorse
FROM ((tblField AS main INNER JOIN tblField AS competitors ON main.ID_Race = competitors.ID_Race) INNER JOIN tblRace ON main.ID_Race = tblRace.ID_Race) INNER JOIN tblMeeting ON tblRace.ID_Meeting = tblMeeting.ID_Meeting
WHERE (((main.IDHorse)<>[competitors].[IDHorse]))
ORDER BY main.ID_Race, main.IDHorse;

Paste that in a query and name it 'SimilarRaces_sub1'. It divides your data into a main horse and all its competitors for a race and brings in the date so you can filter on that later as well.

Then this query will tell you which races have had at least 2 similar competitors:

Code:
SELECT main.MeetingDate, main.RaceID
FROM SimilarRaces_sub1 AS main INNER JOIN SimilarRaces_sub1 AS [prior] ON (main.competitorHorse = prior.competitorHorse) AND (main.mainHorse = prior.mainHorse)
WHERE (((main.MeetingDate)>[prior].[MeetingDate]) AND ((main.RaceID)<>[prior].[RaceID]))
GROUP BY main.MeetingDate, main.RaceID;

It uses the sub query and finds the races where the main horse and its competitors match in prior races. Any race that appears in this list meets your criteria of having past competitors. I didn't apply your 5 year criteria, but you should be able to when you use that final query.
Thanks for that. I am currently working my way through the code to make sure I understand how it works. Will get back to you..
 

ebs17

Well-known member
Local time
Today, 14:35
Joined
Feb 7, 2020
Messages
1,946
Now my next problem is dealing with the Access limitation of 255 columns
What would be your goal in doing so? You don't really want to see 254 columns, do you?

The first and easy way would be to start with a smaller value for parameter parMonthBack. A smaller period of time will automatically result in less race days and therefore less columns.

With a little more effort it would also be possible to iterate the value for parMonthBack by code so that you get a maximum number of columns that can be displayed.

I want to look at form connections over approx last 5 years.
I wonder what usable value the rating of a horse 5 years ago can have for a race today - before starting a technical implementation there is always the question of sense.
 

3DogLegs

New member
Local time
Today, 22:35
Joined
Aug 14, 2022
Messages
9
What would be your goal in doing so? You don't really want to see 254 columns, do you?

The first and easy way would be to start with a smaller value for parameter parMonthBack. A smaller period of time will automatically result in less race days and therefore less columns.

With a little more effort it would also be possible to iterate the value for parMonthBack by code so that you get a maximum number of columns that can be displayed.


I wonder what usable value the rating of a horse 5 years ago can have for a race today - before starting a technical implementation there is always the question of sense.
The program I am trying to implement is taken from Bob Wilkins book "Bioenergetics and Racehorse Ratings". As part of the final stages there is a convergence performed on all races for all horses currently running. The average racing life of racehorses is 3 - 5 years so that is what I am working on that now. By performing the convergence on all races for all horses over a period of time gives a more accurate measure of their abilities. Obviously this would require a huge table or pivot table or even a sparse matrix. I am still trying to decide on which method of attack. I have already done work on Speed, Pace and Weight as separate databases and so this Power/Work/Energy approach is my current challenge.
Thanks for all your help and no doubt I will have more questions. Cheers
 

ebs17

Well-known member
Local time
Today, 14:35
Joined
Feb 7, 2020
Messages
1,946
If you need ALL the data, the pivot step should not be done. There is no limit for rows, and SQL works row-oriented, so you can still do something with the long list afterwards.
 

gemma-the-husky

Super Moderator
Staff member
Local time
Today, 13:35
Joined
Sep 12, 2006
Messages
15,657
it's tricky, because in a race with 10 runners you have 45 possible pairs to check, (also some races may have more than 2 matching runners)

You might be able to compare sets of runners, effectively put the runner ids for both races into temporary tables, and then count the matches.

You might need 2 large arrays (of dimension your total number of racehorses). Load the the array with 1s for the horse id's you want to compare. Do a boolean AND on the array, and see if you get 2 or more matches.

your arrays will be wide because you will have several thousand horses, but the calculations should be quick because the required operations are basic instructions.

You might find this discussion of a similar problem useful. It also started with python!
Can you find: five five-letter words with twenty-five unique letters? - YouTube

Do you have a sample set of data. Say, the last few grand nationals and cheltenham gold cups, to try to find matching runners in these races?
sorry - I see you have done this. I will have a look at your data.
 
Last edited:

gemma-the-husky

Super Moderator
Staff member
Local time
Today, 13:35
Joined
Sep 12, 2006
Messages
15,657
I've had a look at your data.

In the horse-reference, where you have year and reference, is the reference number the same for the same horse in multiple years.

eg would Kauto Star always be K122 irrespective of the year?
 

3DogLegs

New member
Local time
Today, 22:35
Joined
Aug 14, 2022
Messages
9
If you need ALL the data, the pivot step should not be done. There is no limit for rows, and SQL works row-oriented, so you can still do something with the long list afterwards.
Sorry. been away for a week. Yes I think that may be the way to go. I am still getting my head around this task. I have mocked up a small Excel pivot table with 4 horses over 5 races. It takes approx 25 iterations to almost converge. I'll attach the .xlsx
 

Attachments

  • Converge_Example.zip
    30.9 KB · Views: 58

3DogLegs

New member
Local time
Today, 22:35
Joined
Aug 14, 2022
Messages
9
I've had a look at your data.

In the horse-reference, where you have year and reference, is the reference number the same for the same horse in multiple years.

eg would Kauto Star always be K122 irrespective of the year?
Yes you are right. I have gone back over my data and K122 being the Brand for the Horse will be unique. I think I have to simplify my approach.
 

Users who are viewing this thread

Top Bottom