Word Sentiment Value Tables

BlueIshDan

☠
Local time
Today, 11:35
Joined
May 15, 2014
Messages
1,122
Hello!

I've been playing around with the Outlook.Application object, and have come to the conclusion that I would like to play with sentiment scoring emails by indexing every use of every word. The only thing I'm missing is sentiment values.

So, that leads me to ask if any of you have come across any kind of database that contains this data. ANY KIND OF DATABASE would help drastically! =]

Kindest Regards,
Dan
 

jdraw

Super Moderator
Staff member
Local time
Today, 10:35
Joined
Jan 23, 2006
Messages
15,363
Dan,

Hadn't heard the term, but found this.
 

BlueIshDan

Thank you for your consideration and time for finding the wiki page. I've been researching and reading up on it a lot for the past few days. Would be a fun project I think :p
 

BlueIshDan

Just a cool piece of information to share:

The fourth most used word in my 5000 emails is Please =]
 

jdraw

Dan,

Here is the first part of the word frequency distribution from the PDF article you posted. I don't know if it's relevant.
Code:
Word	Frequency	%

,	488	5.79
.	441	5.23
the	382	4.53
of	229	2.72
a	200	2.37
is	154	1.83
and	150	1.78
in	138	1.64
to	127	1.51
)	115	1.36
(	115	1.36
opinion	111	1.32
on	108	1.28
or	105	1.25
opinions	89	1.06
an	84	1.00
are	81	0.96
:	63	0.75
for	60	0.71
this	59	0.70
it	59	0.70
be	58	0.69
that	57	0.68
feature	55	0.65
phone	54	0.64
we	49	0.58
not	49	0.58
object	48	0.57
e	48	0.57
can	48	0.57
”	47	0.56
“	47	0.56
sentence	46	0.55
as	46	0.55
sentences	44	0.52
negative	43	0.51
positive	39	0.46
i	38	0.45
2	37	0.44
opinionated	36	0.43
also	36	0.43
f	35	0.42
example	35	0.42
1	35	0.42
sentiment	34	0.40
some	32	0.38
with	30	0.36
product	29	0.34
o	29	0.34
which	28	0.33
review	28	0.33
may	28	0.33
each	27	0.32
g	25	0.30
one	23	0.27
set	22	0.26
]	22	0.26
[	22	0.26
about	21	0.25
summary	20	0.24
have	20	0.24
analysis	20	0.24
text	19	0.23
reviews	19	0.23
expresses	19	0.23
classification	19	0.23
two	18	0.21
such	18	0.21
quality	18	0.21
objects	18	0.21
information	18	0.21
document	18	0.21
d	18	0.21
cellular	18	0.21
expressed	17	0.20
but	17	0.20
all	17	0.20
subjective	16	0.19
mining	16	0.19
many	16	0.19
definition	16	0.19
called	16	0.19
because	16	0.19
any	16	0.19
its	15	0.18
h	15	0.18
from	15	0.18
based	15	0.18
applications	15	0.18
4	15	0.18
voice	14	0.17
them	14	0.17
s	14	0.17
other	14	0.17
objective	14	0.17
express	14	0.17
battery	14	0.17
6	14	0.17
their	13	0.15
research	13	0.15
most	13	0.15
individual	13	0.15
however	13	0.15
holder	13	0.15
features	13	0.15
whether	12	0.14
what	12	0.14
Web	12	0.14
was	12	0.14
there	12	0.14
so	12	0.14
see	12	0.14
problem	12	0.14
only	12	0.14
has	12	0.14
following	12	0.14
by	12	0.14
been	12	0.14
at	12	0.14
5	12	0.14
my	11	0.13
jk	11	0.13
figure	11	0.13
emotions	11	0.13
data	11	0.13
3	11	0.13
user	10	0.12
use	10	0.12
they	10	0.12
term	10	0.12
structured	10	0.12
more	10	0.12
model	10	0.12
language	10	0.12
j	10	0.12
direct	10	0.12
chapter	10	0.12
camera	10	0.12
bar	10	0.12
attributes	10	0.12
when	9	0.11
thus	9	0.11
t	9	0.11
subjectivity	9	0.11
size	9	0.11
products	9	0.11
people	9	0.11
orientation	9	0.11
often	9	0.11
life	9	0.11
iPhone	9	0.11
general	9	0.11
different	9	0.11
8	9	0.11
7	9	0.11
>	9	0.11
<	9	0.11
will	8	0.09
types	8	0.09
then	8	0.09
th	8	0.09
shows	8	0.09
results	8	0.09
oo	8	0.09
number	8	0.09
now	8	0.09
no	8	0.09
need	8	0.09
natural	8	0.09
like	8	0.09
known	8	0.09
itself	8	0.09
important	8	0.09
given	8	0.09
expressions	8	0.09
existing	8	0.09
does	8	0.09
components	8	0.09
.....

This is the document summary:

Different words/items counted: 1,368
Total words: 6,797
Total punctuation: 995
Total other text: 242
Total characters: 41,307
Total paragraphs: 1,436
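
For anyone who wants to reproduce a table like the one above, here is a minimal sketch in Python. It is not how the tool used above works internally; the tokenizing rule (runs of letters, digits and apostrophes are words, every other non-space character is its own item) and the function name `frequency_table` are assumptions made for illustration.

```python
import re
from collections import Counter

# Assumed tokenizing rule: a run of letters/digits/apostrophes is one word,
# and any other non-space character (punctuation) is counted as its own item.
TOKEN_RE = re.compile(r"[A-Za-z0-9']+|[^\sA-Za-z0-9']")

def frequency_table(text):
    """Return (token, count, percent) rows sorted by descending count."""
    tokens = TOKEN_RE.findall(text)
    total = len(tokens)
    counts = Counter(tokens)
    return [(tok, n, round(100.0 * n / total, 2))
            for tok, n in counts.most_common()]

sample = "the phone is good, the camera is not."
rows = frequency_table(sample)
for token, count, pct in rows[:3]:
    print(f"{token}\t{count}\t{pct}")
```

Case is preserved here, so "Web" and "web" would be counted separately; lower-casing the text first is an easy change if that is not wanted.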
 

The_Doc_Man

Immoderate Moderator
Staff member
Local time
Today, 09:35
Joined
Feb 28, 2001
Messages
26,999
The reason you don't see so many obvious databases on your word frequency issue is that government security wonks have snapped up a lot of the good analyzers for "footprint" analysis.

You might or might not believe this, but you can identify people by their code's in-line documentation patterns. Not only the amount of comments in-line with instructions (on the same line) but also the header comments (with no instructions on the line). I used to challenge some folks - for whom I was the supervisor in an engineering programming shop - to try to hide their styles from me. I was never wrong about who wrote what, though admittedly it was a small sample set.

If you watch CSI: Cyber (CBS network, Sunday night), you'll see them sometimes portray the use of coding patterns as a way to identify hackers. You might be skeptical about whether that is possible - but I'm not. All the really good semantic analyzers are being sold to the government. Oh, you can still find a few here and there - just not as many as you would like.
 

jdraw

Agreed, there are a lot of analyzers, along with weighting factors/multipliers.
I used NoteTab Light, which has a Tools utility that offers text statistics.

I have seen all sorts of people with theories, one of which I always thought was interesting.
One person I worked with was trying to relate the degree of interest/focus the government had towards safety/accident prevention to the number of accidents on a certain highway, as a function of the number of telephone/hydro poles along the accident area and the number of serious injuries.

I'm sure there are profilers who analyze just about everything/anything and can draw conclusions that satisfy some audience. Just do a little Google/Bing searching, then switch over to Skype and watch the marketing literature follow along....
 

BlueIshDan

I'm not looking for frequency, I'm looking for sentiment values. Positive / Negative word scoring.

With that said, thank you for your information as well =]
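
To make the distinction concrete, here is a rough sketch of positive/negative word scoring in Python. The words and weights in `SENTIMENT` are invented placeholders, not values from any published sentiment list, and `score_text` is a hypothetical helper name.

```python
import re

# Placeholder lexicon: these words and weights are made up for illustration.
# A real table would come from a published or hand-tuned sentiment word list.
SENTIMENT = {
    "thanks": 1.0, "great": 2.0, "please": 0.5,
    "problem": -1.0, "urgent": -0.5, "broken": -2.0,
}

def score_text(text, lexicon=SENTIMENT):
    """Sum the lexicon value of every word occurrence; unknown words add 0."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(word, 0.0) for word in words)

print(score_text("Thanks, the fix works great"))    # 3.0
print(score_text("Urgent: the report is broken"))   # -2.5
```

A frequency table only says how often a word appears; a table like this says which direction each appearance pushes the score.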
 

BlueIshDan

You're awesome! Haha

I did add these words to my seed already as well :p
 

BlueIshDan

FYI, this turned out to be a really cool and informative project for me. I've been able to query my set of 5,000 incoming emails and come up with a spectrum of positive, neutral, and negative emails.

The results seem to be effective, even with a small table of un-tuned word sentiment values.
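
One way such a spectrum could be produced, sketched in Python rather than as the actual query used here: divide each email's summed word score by its word count and bucket the average. The `threshold` value, the function name `label_email` and the sample rows are all assumptions for illustration.

```python
def label_email(score, word_count, threshold=0.05):
    """Bucket an email by its average sentiment per word.

    The threshold is an arbitrary illustrative cut-off, not a tuned value.
    """
    if word_count == 0:
        return "neutral"
    average = score / word_count
    if average > threshold:
        return "positive"
    if average < -threshold:
        return "negative"
    return "neutral"

# (email id, summed word score, word count): made-up rows for illustration.
emails = [(1, 3.0, 20), (2, -2.5, 10), (3, 0.5, 40)]

# Sort from most positive to most negative average score.
spectrum = sorted(emails, key=lambda row: row[1] / row[2], reverse=True)
for email_id, score, words in spectrum:
    print(email_id, label_email(score, words))
```

Dividing by word count keeps long emails from dominating the ends of the spectrum just because they contain more scored words.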

Fun times =]

Emails Table:


Sentiment Scores Query:


I have also been able to output pivot tables displaying how many times distinct words were used by each individual, showing both word popularity and communication amounts.
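
The same kind of pivot can be sketched outside the database. The Python below is an illustration only, assuming the emails are available as (sender, body) pairs; the sender names, bodies and the helper `word_usage_by_sender` are hypothetical.

```python
import re
from collections import Counter, defaultdict

def word_usage_by_sender(emails):
    """Map sender -> Counter of words, from (sender, body) pairs."""
    usage = defaultdict(Counter)
    for sender, body in emails:
        usage[sender].update(re.findall(r"[a-z']+", body.lower()))
    return usage

# Made-up sample data for illustration.
emails = [
    ("alice", "Please send the report"),
    ("bob", "Please call me, please"),
    ("alice", "Thanks"),
]
usage = word_usage_by_sender(emails)

# Print a words-by-senders pivot: one row per word, one column per sender.
senders = sorted(usage)
words = sorted({word for counts in usage.values() for word in counts})
print("word\t" + "\t".join(senders))
for word in words:
    print(word + "\t" + "\t".join(str(usage[s][word]) for s in senders))
```

Row totals give word popularity, and column totals give how much each person writes.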

Glimpse:
 

Attachments

  • WordUsage.jpg (112.3 KB)
  • EmailWords.jpg (95.4 KB)
  • SentimentScores.jpg (91 KB)
