And I thought this was simple....

dynamictiger

Registered User.
Local time
Today, 01:21
Joined
Feb 3, 2002
Messages
270
This issue is difficult to describe, so please bear with me.

I have attached a sample database below to have a look at.

There is a table tblData this contains numbers. I have reduced it for this post to a few records the original has several millions.

A second table tblVariable contains another set of numbers.

The task is to search for matches in tblData…sounds simple enough. However its not quite that straightforward.

In addition to finding matches the need is to find a specific number of matches termed pass for discussion purpose.

A very small example is helpful to better understand this.

Working on 25 records only:
n1
n2
n3
n4
n5
n6
n7
1​
2​
3​
4​
5​
6​
7​
1​
2​
3​
4​
5​
6​
8​
1​
2​
3​
4​
5​
6​
9​
1​
2​
3​
4​
5​
6​
10​
1​
2​
3​
4​
5​
6​
11​
1​
2​
3​
4​
5​
6​
12​
1​
2​
3​
4​
5​
6​
13​
1​
2​
3​
4​
5​
6​
14​
1​
2​
3​
4​
5​
6​
15​
1​
2​
3​
4​
5​
6​
16​
1​
2​
3​
4​
5​
6​
17​
1​
2​
3​
4​
5​
6​
18​
1​
2​
3​
4​
5​
6​
19​
1​
2​
3​
4​
5​
6​
20​
1​
2​
3​
4​
5​
6​
21​
1​
2​
3​
4​
5​
6​
22​
1​
2​
3​
4​
5​
6​
23​
1​
2​
3​
4​
5​
6​
24​
1​
2​
3​
4​
5​
6​
25​
1​
2​
3​
4​
5​
6​
26​
1​
2​
3​
4​
5​
6​
27​
1​
2​
3​
4​
5​
6​
28​
1​
2​
3​
4​
5​
6​
29​
1​
2​
3​
4​
5​
6​
30​
1​
2​
3​
4​
5​
6​
31​


Using the tblVariable of two values 1 and 27 then the task is to find all matches of data containing 1 and 27 in any column of tblData. Shortened too:
n1​
n2​
n3​
n4​
n5​
n6​
n7​
1​
2​
3​
4​
5​
6​
20​
1​
2​
3​
4​
5​
6​
8​
1​
2​
3​
4​
5​
6​
9​
1​
2​
3​
4​
5​
6​
10​
1​
2​
3​
4​
5​
6​
11​
1​
2​
3​
4​
5​
6​
27​
This example would be incorrect(And is shortened for clarity). As it contains records with 1 and any other number.

Sorry got to continue next post
 

Attachments

This example would be incorrect(And is shortened for clarity). As it contains records with 1 and any other number.

The correct return would be:

n1​
n2​
n3​
n4​
n5​
n6​
n7​
1​
2​
3​
4​
5​
6​
27​


As both 1 and 27 are present in the data returned.



Moving on if there were three numbers in tblVariable being 1, 3, 20 then on 3 pass the return required would be:
n1​
n2​
n3​
n4​
n5​
n6​
n7​
1​
2​
3​
4​
5​
6​
20​




However if the three numbers in tblVariable being 1,3,20 and the pass is 2 then correct return would be:

n1​
n2​
n3​
n4​
n5​
n6​
n7​
1​
2​
3​
4​
5​
6​
20​
1​
2​
3​
4​
5​
6​
8​
1​
2​
3​
4​
5​
6​
9​
1​
2​
3​
4​
5​
6​
10​
1​
2​
3​
4​
5​
6​
11​
1​
2​
3​
4​
5​
6​
27​

As each row contains two of the variables of 1 and 3 or 1 and 20 or 3 and 20.

As I don’t know the contents of tblVariable or the number of passes although I do know the limit of passes is 7 how can I create a SQL statement that can process this data?

I have been thinking along the lines shown in the sample database using query1, query2 and query3. And indeed this may work however I think someone with a bit more math bend than me may be able to save a lot of time and processing.

Thanks in advance.
 
Man that's hard to follow. But I think I got it and the answer is--normalize your data properly. tblData is not properly set up for the task you want to achieve.

As you have presented it tblData should not contain 7 fields but 3 fields. You have this:

n1,n2,n3,n4,n5,n6,n7
1,14,12,4,5,6,7
1,14,13,4,5,6,7

It should be stored like this:

r, c, v
1,1,1
1,2,14
1,3,12
1,4,4
1,5,5
1,6,6
1,6,7
2,1,1
2,2,14
2,3,13
2,4,4
2,5,5
2,6,6
2,7,7

You essentially have a spreadsheet with rows (r) and columns (c) that hold values (v). My structure is how it should be stored in a database to achieve what you want.

When you do that you can build a query to find out how many matches that table has in tblVariable and can find the records you want.
 
The open-ended search for an arbitrary number of potential matches in a "searching" with some number of elements in a "matching" table cannot be easily done. SQL tends to like certainty. You have some uncertainty in where you would find these matches. At least from the example you showed plus your discussion, you would want to find any records that matched any of its fields with any fields of a second table. You imply that the second table might be somewhat variable. (Or at least that seems to be what I saw.)

You are not going to be able to do this with SQL. Per your discussion, ANY column in the "searching" table could match with any column in the "matching" table. The number of combinations is nasty, at least 35 if I did the combination formula right.

There is another issue and that is that your samples don't show a unique row identifier. Without one, I think you are flat-out doomed to handle this. You cannot do this with a single SQL query. You might be able to write some VBA to do some looping but this is a seriously difficult problem that will take some thought about how to do it.
 
Do you need to just return the correct records or do you have to show the matches? If you just have to return the records with N matches that is easy. Showing what is matched will be challenging, and may need to be done in code.

As Plog suggested this required a normalizing query.
 

Attachments

not sure how theoretical this is but assuming a) all values are numeric and b) there is a PK to each record, you could create a union query to 'normalise' with a simple criteria (field in (1,3,20)) then group by PK and having count=3

Code:
SELECT myTable.*
FROM myTable INNER JOIN (SELECT PK FROM (SELECT PK FROM myTable WHERE N1 in (1,3,20)
UNION ALL SELECT PK FROM myTable WHERE N2 in  (1,3,20)
UNION ALL.....) AS UQ
GROUP BY PK
HAVING Count(PK)=3) T ON myTable.PK=Y.PK
 
It almost sounds like you are trying to build some kind of model here, which Access really isn't a tool for, but there are many tools out there for.

Otherwise, as has been mentioned, you need to stop what you are doing and go back to the beginning of structuring your data properly. Properly structured data would never yield a problem anywhere remotely near something like this
 
Attached is a database to demonstrate how easy it is with a good structure. I renamed your tblData to oldtblData and created a properly structured table with all its data and named it tblData. Then I set up 2 sub queries to prepare the data for the matching (sub1 & sub2). Then I made a cross-tab query (Main) to display the results like you want them.

Currently it is set up to find records with at least 2 of these values (1,3,20) on the full 10000 records you created. To change it, put the candidate numbers into tblVariable and change the criteria in sub2 to the number of matches a line must have.
 

Attachments

FYI,
I already normalized your data using the union query. Just do a make table from that.
 
Man that's hard to follow. But I think I got it and the answer is--normalize your data properly. tblData is not properly set up for the task you want to achieve.

As you have presented it tblData should not contain 7 fields but 3 fields. You have this:

n1,n2,n3,n4,n5,n6,n7
1,14,12,4,5,6,7
1,14,13,4,5,6,7

It should be stored like this:

r, c, v
1,1,1
1,2,14
1,3,12
1,4,4
1,5,5
1,6,6
1,6,7
2,1,1
2,2,14
2,3,13
2,4,4
2,5,5
2,6,6
2,7,7

You essentially have a spreadsheet with rows (r) and columns (c) that hold values (v). My structure is how it should be stored in a database to achieve what you want.

When you do that you can build a query to find out how many matches that table has in tblVariable and can find the records you want.
Thanks for the thought. Normalised in this way I dont think is an option as it will result in over 18 million records.
 
Your reason doesn't support your assertion. What's so horrible about 18 million records?
 
Databases are much more efficient in records (rows) vs fields (columns). 18 million records in three columns is more storage efficient and much faster to query than the same amount of data stored in 7 columns. However as shown you can do a normalizing union query which is fast.
 
Your sample data leads to a situation that doesn't match up with your words. Can ANY number be in ANY columns of the dataset, rather than just having rows that look like <1,2,3,4,5,6,41>,<1,2,3,4,5,6,67>? Can you have a row that looks like <1,15,2,6,9,4,31>?

If so, you are really going to stretch the abilities of Access or any other SQL engine. And doing this with VBA will only slow it down more.

You claim to have created a simplified version of this problem for discussion, and I appreciate that you did so. But what does the "real" problem look like and what is the actual problem trying to do? Your answer here should NOT be "to match up numbers in rows" but rather should tell us what the rows are all about and why you want to match them up this way. Your problem as stated is TOO abstract. That abstraction gives us nothing to really grasp.
 
OP now has 4 suggestions to consider - posts 5, 6, 9 and 14.
 

Users who are viewing this thread

Back
Top Bottom