mining subjective data

Posted by Kaiyuan Chen on October 15, 2017

Survey on Mining Subjective Data on the Web

by Mikalai Tsytsarau · Themis Palpanas

This survey reviews the development of sentiment analysis and opinion mining. It might be useful for extraction of logic.

Subjectivity Analysis

Subjectivity analysis as a whole can be decomposed into three components:

  • identify
  • classify
  • aggregate

Opinion mining covers the first two steps; opinion aggregation is a post-processing step.

Identify: separate opinionative phrases from non-opinionative ones.

Classify: distinguish positive texts from negative ones. A complication is the neutral and irrelevant categories, which fit neither polarity.

Machine learning:

1) Learn a model from a corpus of training data (supervised or unsupervised). 2) Classify unseen data with the trained model. SVMs work well for this (Pang and Lee, 2002).
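The two steps can be sketched in a few lines of plain Python. This is only an illustration: it swaps in a small Naive Bayes classifier with add-one smoothing instead of an SVM, so that it runs without external libraries. The training sentences and the names `train` and `classify` are made up for this sketch.

```python
import math
from collections import Counter, defaultdict

def train(labeled_docs):
    """Step 1: learn per-class word counts from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for text, label in labeled_docs:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, class_counts, vocab

def classify(text, model):
    """Step 2: label unseen text with the most probable class."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in class_counts:
        # Log prior of the class
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities
            score += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy training corpus, invented for illustration only
TRAIN = [
    ("great movie loved it", "pos"),
    ("wonderful acting great plot", "pos"),
    ("terrible movie hated it", "neg"),
    ("boring plot awful acting", "neg"),
]
model = train(TRAIN)
```

Calling `classify("great wonderful plot", model)` on this toy corpus picks the positive class, since those words occur more often in the positive training texts.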

Dictionary approach:

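A pre-built dictionary contains the opinion (sentiment) of each word. The document score is the weighted average of the word sentiments, where S_w is the dictionary sentiment of word w:

$$S(D) = \frac{\sum_{w \in D} S_w \cdot weight(w) \cdot modifier(w)}{\sum_{w \in D} weight(w)}$$

Words can change polarity in different contexts.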
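A minimal sketch of dictionary-based scoring, assuming the document score is the weight-normalized sum of word sentiment times a modifier. The lexicon entries, the weights, and the rule that a preceding negator flips polarity are all assumptions made for illustration.

```python
# Hypothetical lexicon: word -> (sentiment in [-1, 1], weight)
LEXICON = {
    "good": (1.0, 1.0),
    "excellent": (1.0, 2.0),
    "bad": (-1.0, 1.0),
    "awful": (-1.0, 2.0),
}
# Assumed modifier rule: a negator right before a word flips its polarity
NEGATORS = {"not", "never", "no"}

def document_sentiment(text):
    """Return sum(sentiment * weight * modifier) / sum(weight) over lexicon words."""
    tokens = text.lower().split()
    numerator = 0.0
    denominator = 0.0
    for i, token in enumerate(tokens):
        if token not in LEXICON:
            continue
        sentiment, weight = LEXICON[token]
        negated = i > 0 and tokens[i - 1] in NEGATORS
        modifier = -1.0 if negated else 1.0
        numerator += sentiment * weight * modifier
        denominator += weight
    # Texts without any lexicon word are treated as neutral
    return numerator / denominator if denominator else 0.0
```

With this toy lexicon, "not good" scores -1.0: the modifier is how the sketch models a word changing polarity in context.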

Statistical approach:

Point-wise Mutual Information (PMI): probability values are replaced with the frequencies of term occurrence F(x) and co-occurrence F(x near y):

$$PMI(x, y) = \log_2 \frac{F(x\ near\ y)}{F(x) F(y)}$$

Sentiment polarity is then calculated as the difference between the PMI values computed against two opposing lists of words (positive and negative).
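The PMI-based polarity can be sketched as follows. The frequency tables are invented toy numbers, and the `floor` applied to zero co-occurrence counts is an assumption of this sketch to keep the logarithm defined.

```python
import math

def pmi(f_near, f_x, f_y, floor=0.01):
    """PMI with frequencies in place of probabilities:
    log2(F(x near y) / (F(x) * F(y))).
    `floor` (an assumption) replaces zero co-occurrence counts."""
    return math.log2(max(f_near, floor) / (f_x * f_y))

def polarity(word, freq, near_freq, pos_words, neg_words):
    """Sum of PMI against positive words minus sum against negative words."""
    pos = sum(
        pmi(near_freq.get((word, p), 0), freq[word], freq[p]) for p in pos_words
    )
    neg = sum(
        pmi(near_freq.get((word, n), 0), freq[word], freq[n]) for n in neg_words
    )
    return pos - neg

# Toy counts, invented for illustration
FREQ = {"superb": 100, "excellent": 1000, "poor": 1000}
NEAR = {("superb", "excellent"): 80, ("superb", "poor"): 5}

score = polarity("superb", FREQ, NEAR, ["excellent"], ["poor"])
```

Here `score` is about 4, because "superb" co-occurs with "excellent" 16 times more often than with "poor" and log2(16) = 4. A positive value suggests positive polarity.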

Semantic Approach:

WordNet: similar to statistical methods, two sets of seed words with positive and negative sentiments are used as a starting point for bootstrapping the construction of a dictionary.
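The bootstrapping idea can be illustrated with a breadth-first expansion from the seed words. The `SYNONYMS` graph below is a hypothetical stand-in for WordNet relations, and the rule that the first label assigned to a word wins is a simplifying assumption.

```python
from collections import deque

# Hypothetical synonym graph standing in for WordNet relations
SYNONYMS = {
    "good": ["fine", "great"],
    "great": ["superb"],
    "bad": ["poor", "awful"],
    "awful": ["dreadful"],
}

def bootstrap(pos_seeds, neg_seeds, synonyms, max_depth=2):
    """Grow a polarity dictionary outward from seed words."""
    lexicon = {w: 1 for w in pos_seeds}
    lexicon.update({w: -1 for w in neg_seeds})
    queue = deque((w, 0) for w in lexicon)
    while queue:
        word, depth = queue.popleft()
        if depth >= max_depth:
            continue  # stop expanding beyond max_depth hops from a seed
        for neighbor in synonyms.get(word, []):
            if neighbor not in lexicon:  # first label wins (assumption)
                lexicon[neighbor] = lexicon[word]  # inherit the polarity
                queue.append((neighbor, depth + 1))
    return lexicon

lexicon = bootstrap(["good"], ["bad"], SYNONYMS)
```

Starting from "good" and "bad", the sketch labels "superb" as positive and "dreadful" as negative after two hops. Limiting `max_depth` is one way to keep polarity from drifting as the expansion moves away from the seeds.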

Opinion Aggregation

The key is feature extraction and aggregation.
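A minimal sketch of the aggregation step, assuming feature extraction has already produced (feature, score) pairs. Averaging the scores per feature is only one possible aggregation choice, and the example opinions are made up.

```python
from collections import defaultdict

def aggregate(opinions):
    """Average sentiment per feature.
    `opinions` is an iterable of (feature, score) pairs;
    returns feature -> (mean score, number of opinions)."""
    totals = defaultdict(lambda: [0.0, 0])
    for feature, score in opinions:
        totals[feature][0] += score
        totals[feature][1] += 1
    return {f: (s / n, n) for f, (s, n) in totals.items()}

# Hypothetical extracted opinions about a product
OPINIONS = [
    ("battery", 1.0),
    ("battery", -1.0),
    ("screen", 1.0),
    ("battery", 1.0),
]
summary = aggregate(OPINIONS)
```

Keeping the count next to the mean lets a reader see how many opinions support each feature's score.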