Help with calculation query

torok · Jun 13, 2004

Hi everyone!

I'm having a lot of difficulty with this one. Here's a tiny piece of the table:

Code:

Year | Profession | Cohort | Supply | Additions | Subtractions
2000  Doctor           1        100        10            10
2000  Doctor           2        100         10           10

The cohorts represent age groups, in 5-year increments. So Cohort 1 might be ages [16-20], Cohort 2 might be ages [21-25]

I want to calculate Additions =((0.8*Additions) + (0.2*Additions for the previous cohort)). My query runs once every "year" to calculate the additions and subtractions for the next year, but people age one year so the additions and subtractions have to be modified.

Any suggestions?

Pat Hartman · Jun 13, 2004

That's why you shouldn't be storing a person's age even in this form. A person's "age" could change tomorrow! Calculate the cohort in the query. That way it always reflects a person's true age.

torok · Jun 13, 2004

Pat Hartman said:
That's why you shouldn't be storing a person's age even in this form. A person's "age" could change tomorrow! Calculate the cohort in the query. That way it always reflects a person's true age.

Hi Pat! Your comments are *always* appreciated... but I'm not sure what you mean by that.... These tables are huge, and they're used for scenarios in labour market models. This data has to reflect the number of people in a particular cohort, somehow. If there's another way to do this, I'd love to hear it, it's giving me headaches. So far I've managed to make to following query with a self-join, where the table in my original post is called z_1_add_subtract:

Code:

SELECT [z_1_add_subtract].[Additions]*0.8 AS Additions1, [z_1_add_subtract_1].[Additions]*0.2 AS Additions2, z_1_add_subtract.Year, z_1_add_subtract.Profession, z_1_add_subtract.Sex, z_1_add_subtract.CohortS_ID
FROM z_1_add_subtract, z_1_add_subtract AS z_1_add_subtract_1
WHERE (((z_1_add_subtract.Year)=[z_1_add_subtract_1].[Year]) AND ((z_1_add_subtract.Profession)=[z_1_add_subtract_1].[Profession]) AND ((z_1_add_subtract.Sex)=[z_1_add_subtract_1].[Sex]) AND (([z_1_add_subtract].[CohortS_ID]-1)=[z_1_add_subtract_1].[CohortS_ID]));

It almost works! Problem is, when it tries to do 0.8*Cohort 1 + 0.2* Cohort0, it finds that there is no Cohort with number 0. Instead of just calculating 0.8*Cohort1 and leaving it at that (which is what I want), it ignores that line altogether.

You wouldn't believe how long I've spent thinking about this model. It's got 10 very large tables in it, with compound primary keys like Year/Profession/Sex/Cohort, Year/Profession/Cohort/Treatments, etc. with 4 professions, 2 sexes, 10 cohorts, 14 treatments.... You put the data in, then the model runs a dozen or so queries to calculate supply and demand, projected for 25 years into the future based on each year's variables.

KKilfoil · Jun 14, 2004

Hi Pat! Your comments are *always* appreciated... but I'm not sure what you mean by that.... These tables are huge, and they're used for scenarios in labour market models. This data has to reflect the number of people in a particular cohort, somehow. If there's another way to do this, I'd love to hear it, it's giving me headaches.

Your data appears to group age groups into 5 year lumps, and you are making the assumption that each year, exactly 20% of each lump 'graduates' to the next cohort.

Over many years, isn't your assumption going to smooth out the size of each cohort? And where are you going to get reasonable numbers for the people that should be entering the 'youngest' cohort? You may have validity issues towards the end of your calculations.

If you have (or can get) the data by individual birth year rather than cohort, lumping people into cohorts each year into the future would be easier and more accurate, { but I bet you already knew that

}

What you CAN do with your data is to 'split up' the cohorts into 'YearOfBirth' using your 20% assumption, into new tables, and then 'rebuild' new cohorts, in a query, each year as you progress into the future. Your assumption will still lead to initial errors, but at least the errors won't further accumulate as you 'age' your data to look into the future.

KKilfoil · Jun 14, 2004

On the other hand...

Have you thought about simply performing your analysis in 5-year 'steps'? Each cohort will then age exactly enough to simply become the next cohort.

Once you have that done, you can assume that the results are reasonably accurate for the 'median' age member of each cohort, and then interpolate the values for the remainding members of that cohort, and/or interpolate what the results would have been for the intervening years, that you did not directly calculate.

I suspect that the inaccuracies you would be introducing would not be a whole lot worse than the ones introduced by the '20% assumption', and would save you some headaches.

HTH

torok · Jun 14, 2004

I appreciate all your comments - they get me thinking in new ways about an old problem.

I could see some inaccuracies coming in by assuming that people in the cohorts are evenly distributed by age, but that's to be expected. Every year that the model calculates, it calculates how many people are coming in to each cohort based on a bunch of factors in other tables. Doing it in 5-yr increments wouldn't work because part of the model's design requirements dictate that drastic changes can happen in any year (new school being built that results in more graduates from a profession, massiev budget cuts resulting in layoffs, etc)

Basically, a user can edit any of a couple dozen fields for any cohort/sex/profession/year combination (there are thousands of possible fields in total) in any year from 2005 through 2025, and the model calculates supply/demand for all those years based on the data. Each year's data can affect the next year's calculations.

This query is for the supply side, obviously. The table I showed in the first post is actually much larger, but the additions and subtractions are calculated fields that basically sum the addition/subtraction fields that the user fills out, and they're used by a query that calculates the next year's supply. (Supply for next year = supply for this year + additions - subtractions, but make sure the cohorts age by a year and do a few other calcs too)

I would appreciate any more comments - I'm not beyond redesigning this model and have a few ideas based on the excellent comments so far.

Thanks again!

Help with calculation query

torok

Registered User.

Pat Hartman

Super Moderator

torok

Registered User.

KKilfoil

Registered User.

KKilfoil

Registered User.

torok

Registered User.

Similar threads

Users who are viewing this thread