Measure text occurrences MS Access SQL (1 Viewer)

demp22

New member
Local time
Today, 10:36
Joined
Jul 9, 2015
Messages
4
I was wondering whether it is possible (and if it is, the way) to perform the following task in Microsoft Access: there are two tables; first one consists of three columns 1) ID 2) tweet text and 3) date. Second table is a single vector of words, like a lexicon. For every row of the first table, I want to measure occurrences of the words of the second table (lexicon) in the tweet text column (2). Following, I want to add a new column in the first table, in which I will keep a record of these occurrences. My ultimate purpose is to perform some sort of sentiment analysis.
In case this helps, this is what I have done so far:

SELECT *
FROM Tweet_data
WHERE Tweet_text LIKE "*" & Positive_sentiment & "*";

However, I most probably have to make some changes in the part following the LIKE

If you think there is a more practical way to perform such task (sentiment analysis) I am open to suggestions.
Thanks!
 

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Today, 12:36
Joined
Feb 28, 2001
Messages
27,175
What you have described is possibly not normalized already. Your post can be interpreted two ways (at least at first reading), because one could infer that your "keyword" table, of arbitrary length, could define words of positive or negative status for your tweet analysis. Do you want to make a column in your table #1 for each word in table #2 or do you want to make a column in table #1 that gets a count for ANY match in table #2?

If you want separate counts for keywords, you need to understand junction tables, which is how Access tracks many-to-many relationships.

If you want one aggregate count, this MIGHT (and I re-emphasize MIGHT) be a case where if Table #2 is short enough, you would do better to build an in-memory array of strings from Table #2 and then write a function that can be called in SQL to count matches and update the "matching keywords" slot in Table #1. Note that the field for this count has to already exist before you would run the scan.
 

plog

Banishment Pending
Local time
Today, 12:36
Joined
May 11, 2011
Messages
11,646
I probably wouldn't store the matches. It really depends on how large each data set is. Below is a query that will do the matching for you:

Code:
SELECT Tweet_data.ID, Tweet_data.Tweet_text, lexicon.ID, lexicon.Positive_sentiment, IIf([Tweet_text] Like "*" & [Positive_sentiment] & "*",1,0) AS [Match]
FROM lexicon, Tweet_data;

You provided the table and field names of your Tweet data, but you weren't specific about the lexicon. For that, I used "lexicon" as the table name and "Positive_sentiment" as the field name.

The above SQL will show you what the matches are between the two data sets.
 

demp22

New member
Local time
Today, 10:36
Joined
Jul 9, 2015
Messages
4
I probably wouldn't store the matches. It really depends on how large each data set is. Below is a query that will do the matching for you:

Code:
SELECT Tweet_data.ID, Tweet_data.Tweet_text, lexicon.ID, lexicon.Positive_sentiment, IIf([Tweet_text] Like "*" & [Positive_sentiment] & "*",1,0) AS [Match]
FROM lexicon, Tweet_data;
You provided the table and field names of your Tweet data, but you weren't specific about the lexicon. For that, I used "lexicon" as the table name and "Positive_sentiment" as the field name.

The above SQL will show you what the matches are between the two data sets
.

Although this is a big first step (THANKS!!!), here is what it does not going right. I want a summative count for each row of the initial tweet_text field. For example, lets say i have 10 entries in the respective field, I need 10 different counters (one for each row), counting matches between each row (of the tweet_text) and the entire lexicon.

FYI lexicon is about 4K words and tweets is around 3.5M entries
 

CJ_London

Super Moderator
Staff member
Local time
Today, 18:36
Joined
Feb 19, 2013
Messages
16,607
I want a summative count for each row of the initial tweet_text field
This sql cannot be shown in the query grid but should give you want you want if posted into the sql window and field and table names are correct

Code:
 SELECT Lexicon.Positive_sentiment, Count(*) As SummaryCount
 FROM Tweet_data INNER JOIN Lexicon ON Tweet_data.Tweet_text Like "*" & Lexicon.Positive_sentiment & "*"
 GROUP BY Lexicon.Positive_sentiment
 

CJ_London

Super Moderator
Staff member
Local time
Today, 18:36
Joined
Feb 19, 2013
Messages
16,607
FYI lexicon is about 4K words and tweets is around 3.5M entries
Note that this will be a slow query due to the volume and the fact you are using *...* (which cannot use indexing). Suggest test on a smaller dataset to ensure it does what you want, then run on the full set and take the afternoon off:)

You might want to turn it into a maketable query to preserve the results
 

plog

Banishment Pending
Local time
Today, 12:36
Joined
May 11, 2011
Messages
11,646
lets say i have 10 entries in the respective field, I need 10 different counters (one for each row), counting matches between each row (of the tweet_text) and the entire lexicon.

I have no idea what that means. Can you demonstrate with data please?

FYI lexicon is about 4K words and tweets is around 3.5M entries

That's 14 billion permutations you need to check (4K * 3.5M). No matter what method you use, VBA or SQL, this is going to take a huge amount of time to run.
 

spikepl

Eledittingent Beliped
Local time
Today, 19:36
Joined
Nov 3, 2010
Messages
6,142
AFAIk MS SQL has text indexing capabilities, so that such undertaking presumably can be executed much faster in MS SQL than Access
 

demp22

New member
Local time
Today, 10:36
Joined
Jul 9, 2015
Messages
4
Quote:
I want a summative count for each row of the initial tweet_text field
This sql cannot be shown in the query grid but should give you want you want if posted into the sql window and field and table names are correct

Code:

SELECT Lexicon.Positive_sentiment, Count(*) As SummaryCount
FROM Tweet_data INNER JOIN Lexicon ON Tweet_data.Tweet_text Like "*" & Lexicon.Positive_sentiment & "*"
GROUP BY Lexicon.Positive_sentiment

Almost there (THANKS!!!) However this returns the occurrences based on the lexicon terms. Let me give you an example of what I am looking for:

Lets say I have these three tweets:
a) I love ice cream and I like summer
b) I like apples
c) I have no prior SQL knowledge so thanks very much

And the lexicon consists of the words
1) Like
2) Love

My expected counter outcome should be like:
a) 2
b) 1
c) 0
 

plog

Banishment Pending
Local time
Today, 12:36
Joined
May 11, 2011
Messages
11,646
That's not a far step from where I got you. Just make it a Totals Query and Sum the Match field.
 

CJ_London

Super Moderator
Staff member
Local time
Today, 18:36
Joined
Feb 19, 2013
Messages
16,607
OK so you want a query which is going to return 3.5m rows? Would love to know what you plan to use the data for.

Code:
 SELECT Tweet_data.Tweet_text, Count(*) As SummaryCount
FROM Tweet_data, Lexicon
 WHERE Tweet_data.Tweet_text Like "*" & Lexicon.Positive_sentiment & "*"
GROUP BY Tweet_data.Tweet_text
Advantage - you can see this in the query grid
Disadvantage - rather than taking the afternoon off, take the week:)

You may also find with such a large dataset you will get out of memory errors, so perhaps better to split your Tweet_data into a number of smaller recordsets

Ah yes - very similar to Plogs
 

jdraw

Super Moderator
Staff member
Local time
Today, 13:36
Joined
Jan 23, 2006
Messages
15,379
I posted this in a duplicate post.

It may be more than you need, but it may have some use.


I found this via Google and an old thread on this forum

I have not read the details on how to use this utility. I have downloaded it and converted to 2003.

It seems this should do a lot more than you have asked, but will do what you have asked.
Of course reading the instructions is the first major step.

Good luck.

Note:
To anyone who is familiar, or who becomes familiar with this (Lexical Analysis of Texts) utility, please post a review so that others could benefit.
 

CJ_London

Super Moderator
Staff member
Local time
Today, 18:36
Joined
Feb 19, 2013
Messages
16,607
interesting links - also made me realise that Like *....* is not sufficient since 'glove' would also be counted for 'love'. So the code would need to be amended slightly to include a space at both ends to ensure a separate word

Code:
WHERE Tweet_data.Tweet_text Like "* " & Lexicon.Positive_sentiment & " *"
but then what about if the word is the first or last in a sentence? - suppose you could use

WHERE " " & Tweet_data.Tweet_text & " " Like "* " & Lexicon.Positive_sentiment & " *"

and catering for other punctuation such as commas and fullstops?

WHERE " " & Tweet_data.Tweet_text & " " Like "* " & Lexicon.Positive_sentiment & "[., ]*"

Then there is context - should 'like' be counted in 'Its so like them'?
 

Users who are viewing this thread

Top Bottom