Select records that exclude a list of words (1 Viewer)

AleXX

New member
Local time
Today, 05:00
Joined
Jul 17, 2018
Messages
4
I have a table with a field that contains a description (something like "this is my description"); right now I'm able to select all the records that contain a word from another table

Code:
SELECT *
FROM table1, table2
WHERE (((InStr([Description],[table2].[word]))>0));

I can't find the way to do the opposite, I mean to exclude the words :banghead:
 

AleXX

New member
Local time
Today, 05:00
Joined
Jul 17, 2018
Messages
4
yes I've tried to add NOT in front of Instr function and also to put =0 but the words are not filtered
 

plog

Banishment Pending
Local time
Yesterday, 22:00
Joined
May 11, 2011
Messages
11,638
Why do you have two unlinked tables in the FROM? Usually, 1 data source (table/query) comprises the FROM and then additional sources are joined to it (INNER JOIN, LEFT JOIN).

Why? Perhaps you can give us a plain english explanation of the purpose of this query. Take a step back and give us the big picture.

I mean every record from table1 is going to show in the results because of this.
 

AleXX

New member
Local time
Today, 05:00
Joined
Jul 17, 2018
Messages
4
The purpose of the query is to extract data from a table and copy to another (there should be a INSERT INTO above the query I posted).

This is the original table (table1):
Code:
ID   |Description                 |Other field...
--------------------------------------------------
 1   |the cat is on the table     |
 2   |the cat is under the table  |
 3   |the dog is outside the door |
 4   |the door is closed          |

This is the table with the list of words to include (table2):
Code:
ID   |Word to include              
----------------------------------
 1   |table 
 2   |door

This should be the table with the list of words to exclude (table3):
Code:
ID   |Word to exclude              
----------------------------------
 1   |under 
 2   |closed

This is the final result I will find in the new table (table4):
Code:
ID   |Description                 |Other field...
--------------------------------------------------
  1  |the cat is on the table     |
  2  |the dog is outside the door |
 

plog

Banishment Pending
Local time
Yesterday, 22:00
Joined
May 11, 2011
Messages
11,638
Great explanation. Here is the SQL to achieve what you want:

Code:
SELECT table1.Description, Max(InStr([Description],[WordToInclude])) AS Include, Max(InStr([Description],[WordToExclude])) AS Exclude
FROM table1, Table2, Table3
GROUP BY table1.Description
HAVING (((Max(InStr([Description],[WordToInclude])))>1) AND ((Max(InStr([Description],[WordToExclude])))=0));

This provides you with the unique records in table1 that have included words and do not have excluded words. Depending on your dataset, this might take a ton of time to run--because of all the tables in the FROM clause.
 

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Yesterday, 22:00
Joined
Feb 28, 2001
Messages
27,131
Your problem is that your are focusing on individual words in the description field, so you need something that is not normally very "SQL"-ish. SQL is designed to manipulate whole fields. If your description field was a simple list of single words, you could do something similar to

Code:
SELECT [I]a bunch of fields[/I] FROM [I]mytable[/I] WHERE [I]mytable.tested-word[/I] NOT IN (SELECT [I]bad-word[/I] FROM [I]bad-word-list[/I]) ;

The wrinkle in this is that you have to separate the keywords from the "chaff" words in that field. Words like "the", "cat", "is" etc. And the descriptive fields are of varying size in characters but ALSO in words. E.g. "The door is closed" has fewer characters AND fewer WORDS than "the dog is outside the door." And this makes life INCREDIBLY difficult. If you write a function to do the split, you have to somehow generate records for EACH WORD so that you can get a "yeah/nay" vote on inclusion or exclusion.

My approach would be to somehow identify how many description words need to be separated and then separate them into a child table with maybe <ID, word, flag> as the tuple where the ID matches the ID of the original record. You would need one record per word in each description, minus the "chaff" words, and initialize the flags to FALSE. Then you could run a query to set the flag TRUE for any word that is in the table of no-no words. Then select the records based on having a summation query of the child table joined to an ordinary query of the main table.

Code:
Query A:
UPDATE child-table SET FLAG = TRUE WHERE child-table.word IN (SELECT bad-word.word FROM bad-word) ;
 
Query B:
SELECT ID, COUNT(FLAG) AS CFLAG FROM child-table WHERE FLAG = TRUE GROUP BY ID;

Query C:
SELECT ID, Description, other-stuff FROM parent-table INNER JOIN QueryB ON parent-table.ID = QueryB.ID WHERE QueryC.CFlag = 0 ;

Run query A to set the flags then run query C to get the results. Query B just feeds query C.

But the trick will be in generating that child table. THAT might require you to write VBA-level recordset code.

Another method comes to mind but depending on how many of the "main" records you have, you might hate it. Add a field to the main table, call it FLAG, and initialize it to FALSE. Then run a ugly Cartesian join of the description field to the bad-words field.

Code:
UPDATE Flag FROM bad-word, main SET FLAG = TRUE WHERE main.description LIKE "*'" & bad-word.word & "'*" ;"

SELECT ID, description .... FROM main WHERE FLAG = false;

(And watch out for the apostrophes embedded in that first query.)

This could be fast IF AND ONLY IF the main table is limited in size. It is a Cartesian JOIN (which is to say, improperly constrained JOIN) so it will test every combination of records in the main table vs. every word in the bad-word table. IF you have your simple case, then you have 2 bad words and four main records - so you test 8 combinations. If you have a huge case then you have, say, 100 bad words and 1000 main records, you will run this test against 100,000 virtual (i.e. combined) records. AND because of the LIKE operator with asterisks on either side of the bad-word, indexes on the description field will be useless because this will HAVE to be processed as a relation scan.

One more "gotcha" - check for whether "Description" is a reserved word. I think it is, and you don't want to use reserved words as field names. Access gets "twitchy" when you do that.
 
Last edited:

Frothingslosh

Premier Pale Stale Ale
Local time
Yesterday, 23:00
Joined
Oct 17, 2012
Messages
3,276
Another thing to keep in mind:

Is there any possibility at ALL that your search words can be included in other words? If so, you have a problem.

As an example, if you add the sentence 'He disclosed the documents lying on the table' to table 1, your current rules will exclude the sentence due to a false positive in the word 'disclosed'. You aren't going to be able to use spaces to narrow it down - the string could begin with the matching word, and any word can have punctuation after it.

So if there is the slightest chance of false positives, how will you deal with them?
 

MajP

You've got your good things, and you've got mine.
Local time
Yesterday, 23:00
Joined
May 21, 2018
Messages
8,525
I am no expert in Regular expressions but know they are very fast.
tblWords

TheWord WordType
table Inc
door Inc
under Exc
closed Exc

Before running your query, make a pattern for what you are looking for and store in public variables

Code:
Public IncludePattern As String
Public ExcludePattern As String

Public Sub MakePatterns()
  Dim rs As DAO.Recordset
  Dim colWords As New Collection
  Set rs = CurrentDb.OpenRecordset("select TheWord from tblWords where WordType = 'Inc'")
  Do While Not rs.EOF
    colWords.Add (rs!TheWord)
    If colWords.Count = 1 Then colWords.Add (rs!TheWord)
    rs.MoveNext
    If rs.EOF Then colWords.Add colWords(colWords.Count)
  Loop
  IncludePattern = GetPattern(colWords)
  Set colWords = New Collection
  Set rs = CurrentDb.OpenRecordset("select TheWord from tblWords where WordType = 'Exc'")
  Do While Not rs.EOF
    colWords.Add (rs!TheWord)
    If colWords.Count = 1 Then colWords.Add (rs!TheWord)
    rs.MoveNext
    If rs.EOF Then colWords.Add colWords(colWords.Count)
  Loop
  ExcludePattern = GetPattern(colWords)
End Sub

Public Function GetPattern(colWords As Collection) As String
  '^(?=.*\bjack\b)(?=.*\bjames\b).*$
  Dim SearchString As String
  Dim strsql As String
  Dim I As Integer
  Dim TheWord As String
  For I = 1 To colWords.Count
      If SearchString = "" Then
        SearchString = "|\" & colWords(I) & "\b"
       Else
        SearchString = SearchString & "|\b" & colWords(I)
      End If
  Next I
  SearchString = "/" & SearchString & "/g"
  GetPattern = SearchString
End Function

Your two patterns
Code:
includepattern
\btable\b|\bdoor\b

ExcludePattern
\bunder\b|\bclosed\b
I am no expert in regexp but seems to be the correct pattern to find whole words.
Once you make the patter you can use the helper functions in sql

Code:
SELECT 
  tblComments.comment, 
  included([Comment]) AS Included, 
  excluded([comment]) AS Excluded
FROM 
  tblComments
Code:
Public Function Included(SearchIn As String) As Boolean
  Included = IsMatch(SearchIn, IncludePattern)
 ' Debug.Print IncludePattern & " include"
End Function
Public Function Excluded(SearchIn As String) As Boolean
  Excluded = IsMatch(SearchIn, ExcludePattern)
 ' Debug.Print ExcludePattern & " exclude"
End Function
Public Function IsMatch(SearchIn As String, ThePattern As String) As Boolean
  'This function returns a collection of emails found in a string
  'Need Microsoft VBScript Regular Expressions
   Dim objRegExp As VBScript_RegExp_55.RegExp
   Dim objMatch As VBScript_RegExp_55.match
   Dim ReturnMatches As VBScript_RegExp_55.MatchCollection

   'Create a regular expression object.
   Set objRegExp = New RegExp
  'Set the pattern by using the Pattern property.
   objRegExp.Pattern = ThePattern
  'Set Case Insensitivity.
   objRegExp.ignorecase = True
  'Set global applicability. Not sure what that does
   objRegExp.Global = True
  'Test whether the String can be compared. Not sure what that does
  If (objRegExp.test(SearchIn) = True) Then
     Set ReturnMatches = objRegExp.Execute(SearchIn)   ' Execute search.
     'lets assume you only get one match
     IsMatch = ReturnMatches.Count > 0
   Else
     Debug.Print "Could not compare string"
   End If
 '
End Function

I would think this would work well for very longs lists of words and strings to search.
 

MajP

You've got your good things, and you've got mine.
Local time
Yesterday, 23:00
Joined
May 21, 2018
Messages
8,525
To show it works with whole words only
Code:
comment	Included	Excluded
table	               -1      0
door	               -1      0
Under                0      -1
closed               0      -1
table closed     -1    -1
doornumber1    0      0
table2 under 	0     	-1
sorry about the formatting.
 

JHB

Have been here a while
Local time
Today, 05:00
Joined
Jun 17, 2012
Messages
7,732
..I can't find the way to do the opposite, I mean to exclude the words :banghead:
You don't need the opposite you can use an unmatch query.
Run a query which create a temporary table included all the words you want.
Code:
SELECT tblMain.ID, tblMain.Description INTO tbltemp
FROM tblMain, tblInclude
WHERE (((InStr([Description],[Words]))>0));
Then create a query based on the temporary table which exclude all the words you don't want.
Code:
SELECT tbltemp.ID, tbltemp.Description
FROM tbltemp, tblExclude
WHERE (((InStr([Description],[Words]))>0));
Then run a query which find all the records in the temporary table that aren't found in the in the query above.
Code:
SELECT tbltemp.ID, tbltemp.Description
FROM tbltemp LEFT JOIN qryFindExclude ON tbltemp.[ID] = qryFindExclude.[ID]
WHERE (((qryFindExclude.ID) Is Null));
Sample database is attached, open the form and click the button.
I don't know how fast it is with many records, but I think it is faster as what you have been suggested until now. :)
 

Attachments

  • IncludeExcludeWords.accdb
    504 KB · Views: 63

MajP

You've got your good things, and you've got mine.
Local time
Yesterday, 23:00
Joined
May 21, 2018
Messages
8,525
provided a solution in the Excel forum that used regexp expression that I thought was rather neat
Take a look at thread 9 for how to do this in access.
 

MajP

You've got your good things, and you've got mine.
Local time
Yesterday, 23:00
Joined
May 21, 2018
Messages
8,525
I believe this is a way to modify the where clause to get only full words. A full word with have a space before it and after it if in the middle of the text. If it is at the beginning of the text it will have a space after it. If it is at the end of the text it will have a space before. Check those three conditions
Code:
SELECT RandoMText.id, 
       RandomText.description, 
       tblwords.theword, 
FROM   randomtext, 
       tblwords 
WHERE  Instr(([description], "" & [theword] & "") > 0 
             OR RIGHT([description], Len([theword]) + 1) = "" & [theword] 
             OR LEFT([description], Len([theword]) + 1) = "" & [theword] )) = - 1
I tested this and my Regex method on a 15000 records and 30 words to check. This method ran surprisingly fast with the cartesian join in about 3-4 seconds and the regex was actually slower at about 8-10. Would be interesting to see what happens as the number of search words increase.
 

Users who are viewing this thread

Top Bottom