Computing Trend Lines - "How To" (1 Viewer)

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Today, 04:41
Joined
Feb 28, 2001
Messages
26,996
Earlier this month someone came up with a question about trend lines and I made an off-handed reference that it might be done simply via a query and a quick & dirty recordset operation. So I thought it might be a good idea to post this in a way to make in findable by search later.

You can use the following general ideas to do a linear regression. Here are the formulas and data you need.

Requirements:
1. A table with lists of data to be analyzed this way, where you have one independent value and a corresponding dependent value. Further, you need to believe the formula describing that dependency is a straight line.

I.e you have to assume the mathematical relationship: y = a + bx.

The goal is therefore to find a and b having lists of x and corresponding y.

2. The table (suggesting for scientific purposes) should contain these X and Y values in either SINGLE or DOUBLE format - and I always work in DOUBLE if doing regressions for reasons listed later.

3. Write this query, perhaps calling it GENSUM:

Code:
SELECT COUNT(X) AS N, SUM(X) AS SUMX, SUM(Y) AS SUMY, SUM(X*X) AS SUMX2, SUM(Y*Y) AS SUMY2, SUM(X*Y) AS SUMXY 
FROM MYDATATABLE 
WHERE {fill in any filtration/selection criteria here};

NOTE: ORDER BY clauses are NOT NEEDED in this context. See later discussion. GROUP BY clauses would let you do this for multiple sets of sums at once for disjoint data sets in the same table, but for this discussion I am keeping it simple.

4. In your VBA code where you need to do this computation (and I will not assume it to be a subroutine on its own for this example):

Code:
...

'   Define variables to be used in computation

DIM N AS DOUBLE
DIM SUMX AS DOUBLE
DIM SUMY AS DOUBLE
DIM SUMX2 AS DOUBLE
DIM SUMY2 AS DOUBLE
DIM SUMXY AS DOUBLE

'    Define variables that we wanted to see

DIM A AS DOUBLE
DIM B AS DOUBLE

'    Define a couple of working variables for intermediates

DIM DIVISOR AS DOUBLE
DIM DIVIDEND AS DOUBLE

'    Define a way to see the results of the query

DIM RS AS RECORDSET

....

'    The actual mode of opening is ALMOST irrelevant because you are
'    not going to scan multiple records via the explicit recordset.

SET RS = CURRENTDB.OPENRECORDSET( "GENSUM", dbOpenDynaset )
RS.MOVEFIRST

'    Transfer the variables from the recordset to the code
'    (It makes typing easier later)

WITH RS
    N = ![N]
    SUMX = ![SUMX]
    SUMY = ![SUMY]
    SUMXY = ![SUMXY]
    SUMY2 = ![SUMY2]
    SUMX2 = ![SUMX2]
END WITH

RS.CLOSE

'   Finally, do the math.

DIVISOR = ( N * SUMX2 ) - ( SUMX * SUMX )

DIVIDEND = ( N * SUMXY ) - ( SUMX * SUMY )
B = DIVIDEND / DIVISOR      'B is the slope

DIVIDEND = ( SUMY * SUMX2 ) - ( SUMX * SUMXY )
A = DIVIDEND / DIVISOR      'A is the Y-intercept

DISCUSSION:

In essence, let an Access summation query form the sums, then copy them from the recordset implied by that summation query to your code, then apply the formulas. Now, a fine point: Since addition is an "associative" operation and is also "commutative" you do not need an ORDER BY in the generation of the sums. Having the pairs in ascending, descending, or random order makes NO DIFFERENCE to the math.

There is another pitfall to be mentioned. If it should ever happen that you do this, there is a question to be asked: Is this relationship really expected to be a straight line? If the answer is NO then you need another type of regression. The above method will work but you will need more sums, and the higher the order of the polynomial you are using, the faster the power series starts to mount up. For example, to get a quadratic equation, you will need to diddle with 4th-power summations, which is why I suggested DOUBLE variables - they have a wider exponent range than SINGLE variables. By the time you get to quartic (4th-order) polynomials, you are talking about sums involving the 16th power of your variables, which could quickly blow SINGLE sums out of the water.

Note also that if you have Excel installed along with Access, there is a matrix math library you can use, and some of the really messy equations for higher polynomials can be managed more easily via matrix multiplication. A web search on "VBA MATRIX MATH" will lead you to articles involving use of a list of worksheet functions that look like obj.WorksheetFunction.MMult, obj.WorksheetFunction.MInverse, etc. A web search on "Regression using Sums" will include articles on how to do this via matrix methods.
 

June7

AWF VIP
Local time
Today, 01:41
Joined
Mar 9, 2014
Messages
5,423
Interesting article. I had to build matrix calculation in Access VBA. Tested and compared 3 methods.

1. matrix functions in Excel cells

2. Excel matrix functions referenced in VBA

3. all VBA, no Excel functions (found code)

Disturbing outcome was each method produced different numbers. Method 3 showed the greatest variance.

I guess query would be a 4th method. Not sure if it would be usable for my situation.
 
Last edited:

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Today, 04:41
Joined
Feb 28, 2001
Messages
26,996
To be honest, I only tried the Excel matrix functions once or twice. The SQL method is a simpler procedure for limited degrees of polynomial. Actually works with other things as well, such as taking logarithms or exponentials and THEN forming the sums. Just kind of depends on the original formula.
 

Users who are viewing this thread

Top Bottom