Applications and Challenges for Sentiment Analysis : A Survey

DOI : 10.17577/IJERTV2IS2331




1 Mr. Saifee Vohra, 2 Prof. Jay Teraiya

1P.G. Student, Marwadi Education Foundation Group of Institutes, Rajkot, Gujarat

2Asst. Professor, IT department, Marwadi Education Foundation Group of Institutes, Rajkot, Gujarat

Abstract

With the rapid development of Web 2.0 applications such as microblogging, social networks, e-commerce sites, news portals and web forums, users generate reviews, comments, recommendations, ratings and feedback. This user-generated content can be about products, people, events, etc., and it is very useful for businesses, governments and individuals. Although this content is meant to be helpful, its sheer volume requires automated techniques for mining and analysis, because manual analysis of so much content is difficult. Sentiment analysis is the automated mining of attitudes, opinions, and emotions from text, speech, and database sources through Natural Language Processing (NLP). This paper presents a survey of sentiment analysis applications and challenges, together with the approaches and techniques used.

  1. Introduction

    What other people think has always been an important piece of information for most of us during the decision-making process [11]. Today, users of Web 2.0 actively contribute content to product review websites, blogs, social media and web forums. Opinions and reviews can now be found almost everywhere: blogs, social media such as Facebook and Twitter, web forums, e-commerce sites, RSS feeds, etc. These opinions are helpful for both business organizations and individuals, but the huge amount of such opinionated text becomes overwhelming to users. How to analyze and summarize the opinions expressed in such large volumes of text is a very interesting problem for researchers. This research domain is usually called Sentiment Analysis or Opinion Mining.

    Sentiment analysis is the automated mining of attitudes, opinions, and emotions from text, speech, and database sources through Natural Language Processing. It involves classifying opinions in text into categories such as "positive", "negative" or "neutral". It is often referred to as subjectivity analysis, opinion mining, and appraisal extraction [11].

    The main fields of research in sentiment analysis are Subjectivity Detection, Sentiment Prediction, Aspect-Based Sentiment Summarization, Text Summarization for Opinions, Contrastive Viewpoint Summarization, Product Feature Extraction and Opinion Spam Detection. Subjectivity Detection determines whether a text is opinionated or not. Sentiment Prediction predicts the polarity of a text, i.e. whether it is positive or negative. Aspect-Based Sentiment Summarization provides a sentiment summary in the form of star ratings or scores for individual features. Text Summarization generates a few sentences that summarize the reviews of a product. Contrastive Viewpoint Summarization puts an emphasis on contradicting opinions. Product Feature Extraction extracts the features of a product from its reviews. Opinion Spam Detection is concerned with identifying fake or bogus opinions in reviews.

    Sentiment prediction can be done at the document level, the sentence level and the phrase level. At the document level, the sentiment of the entire document is summarized as positive, negative or objective. Sentence-level prediction classifies individual sentiment-bearing sentences. At the phrase level, the phrases within a sentence are classified according to their polarity.

    The objective of this paper is to explore the concept of sentiment analysis in the field of Natural Language Processing, together with its applications and challenges. The paper is organized as follows: Section 2 gives an overview of the most commonly used approaches and techniques in sentiment analysis. Section 3 describes applications, Section 4 discusses the challenges present in sentiment analysis, and Section 5 concludes the paper.

  2. Approaches and Techniques for Sentiment Analysis

    1. Approaches

      Depending on the task at hand and the perspective of the person doing the sentiment analysis, the approach can be discourse-driven, relationship-driven, language-model-driven, or keyword-driven [14].

      1. Knowledge-based approach

        The main task in this approach is the construction of word lexicons that indicate the positive class or the negative class. The sentiment values of the words in the lexicon are determined before the sentiment analysis itself is carried out. Lexicons can be created in different ways: by starting with some seed words and using linguistic heuristics to add more words, or by starting with some seed words and adding other words based on their frequency in a text. SentiWordNet 3.0 is a publicly available lexical resource explicitly devised to support sentiment classification and opinion mining applications [5].
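As a rough illustration of the seed-word strategy above, the following Python sketch expands a seed lexicon through synonym and antonym links: a synonym inherits the polarity of the word it was reached from, and an antonym receives the opposite polarity. The small THESAURUS dictionary and the function name expand_lexicon are hypothetical; a real system might take these relations from a resource such as WordNet, and would need a policy for words reached with conflicting polarities (here the first assignment simply wins).

```python
from collections import deque

# Hypothetical toy thesaurus: word -> (synonyms, antonyms).
# A real system might draw these relations from a resource such as WordNet.
THESAURUS = {
    "good": (["fine", "nice"], ["bad"]),
    "nice": (["pleasant"], []),
    "bad": (["poor", "awful"], ["good"]),
    "poor": (["inferior"], ["excellent"]),
    "excellent": (["superb"], ["poor"]),
}

def expand_lexicon(seeds, thesaurus):
    """Breadth-first propagation of polarity from seed words:
    synonyms keep the polarity, antonyms flip it."""
    lexicon = dict(seeds)        # word -> +1 (positive) or -1 (negative)
    queue = deque(seeds)         # words whose neighbours are still to be visited
    while queue:
        word = queue.popleft()
        synonyms, antonyms = thesaurus.get(word, ([], []))
        links = [(s, 1) for s in synonyms] + [(a, -1) for a in antonyms]
        for neighbour, sign in links:
            if neighbour not in lexicon:          # first assignment wins
                lexicon[neighbour] = lexicon[word] * sign
                queue.append(neighbour)
    return lexicon

lexicon = expand_lexicon({"good": 1, "bad": -1}, THESAURUS)
print(sorted(lexicon.items()))
```

Starting from only "good" and "bad", the sketch labels "excellent" as positive because it is reached as an antonym of the negative word "poor".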

      2. Relationship-based approach

        In this approach, the different relationships between features and components are analyzed for the sentiment classification task, for example relationships between different participants or relationships between product features. If one wants to know customers' sentiment about a product brand, one may compute it as a function of the sentiments on its different features or components.
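A minimal sketch of computing brand-level sentiment as a function of feature-level sentiments, assuming the feature scores already exist (for example from a feature-level classifier) and using a weighted average as one possible aggregation function. The scores, the weights and the helper brand_sentiment are illustrative assumptions, not values or methods from the cited works.

```python
# Hypothetical feature-level sentiment scores in [-1, 1] for one phone brand.
feature_sentiment = {"battery": -0.6, "screen": 0.8, "camera": 0.4, "price": 0.2}
# Hypothetical importance weights for each feature (an assumption).
feature_weight = {"battery": 0.4, "screen": 0.2, "camera": 0.3, "price": 0.1}

def brand_sentiment(scores, weights):
    """Weighted average of feature sentiments: one possible aggregation function."""
    total_weight = sum(weights[f] for f in scores)
    return sum(scores[f] * weights[f] for f in scores) / total_weight

print(round(brand_sentiment(feature_sentiment, feature_weight), 3))
```

Here the negative battery score is outweighed by the other features, giving a mildly positive overall value of about 0.06; a different weighting could reverse that result.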

      3. Language models

        In this approach, n-gram language models are built, using either the presence or the frequency of n-grams as features. In topic-based text classification, n-gram frequency generally gives better results; the frequency is usually converted to TF-IDF weights to take the importance of a term for a document into account. However, Pang et al. [1] found that term presence gives better results than term frequency for sentiment classification: their research on movie reviews shows that uni-gram presence is well suited to sentiment analysis. Dave et al. [6], on the other hand, found that bi-grams and tri-grams worked better than uni-grams in the sentiment classification of product reviews.
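The difference between term presence and term frequency, and the TF-IDF weighting mentioned above, can be sketched in plain Python. The two-document corpus and the function names are hypothetical, and TF-IDF is shown in one common form (raw count multiplied by log(N / df)); other variants exist.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Return the contiguous n-grams (as tuples) of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def frequency_features(tokens, n=1):
    """Term-frequency vector: how many times each n-gram occurs."""
    return Counter(ngrams(tokens, n))

def presence_features(tokens, n=1):
    """Binary term-presence vector: 1 if the n-gram occurs at all."""
    return {g: 1 for g in set(ngrams(tokens, n))}

def tf_idf(docs, n=1):
    """TF-IDF weights per document: raw count * log(N / document frequency)."""
    counts = [frequency_features(d, n) for d in docs]
    df = Counter(g for c in counts for g in c)   # number of documents containing g
    N = len(docs)
    return [{g: tf * math.log(N / df[g]) for g, tf in c.items()} for c in counts]

docs = [
    "great acting great plot".split(),
    "boring plot".split(),
]
print(frequency_features(docs[0]))
print(presence_features(docs[0]))
print(ngrams(docs[0], 2))          # bi-grams of the first review
print(tf_idf(docs)[0])
```

In this corpus "plot" occurs in both documents, so its IDF, and therefore its TF-IDF weight, is zero, while the repeated "great" gets twice the weight of "acting". The presence vector, by contrast, records "great" only once.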

      4. Discourse structures and semantics

        This approach uses the discourse relations between text components for classification. In many reviews, the overall sentiment is expressed at the end of the text [1]. In this discourse-driven approach, the sentiment of the whole review is obtained by determining the sentiment of the different discourse components and the discourse relations that exist between them. In such an approach, the last paragraph of a review might be given more weight when determining the sentiment of the whole review.
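The end-weighting idea can be sketched as a weighted mean of per-paragraph scores. This assumes paragraph-level sentiment scores are already available from some other classifier; the helper review_sentiment, the example scores and the final-paragraph weight of 3.0 are hypothetical choices for illustration, and a full discourse-driven system would also model the relations between components.

```python
def review_sentiment(paragraph_scores, last_weight=3.0):
    """Weighted mean of paragraph scores in which the last paragraph counts more.
    Expects at least one paragraph; last_weight is an illustrative assumption."""
    weights = [1.0] * len(paragraph_scores)
    weights[-1] = last_weight
    total = sum(w * s for w, s in zip(weights, paragraph_scores))
    return total / sum(weights)

# Hypothetical review: two mildly negative paragraphs, a positive verdict at the end.
scores = [-0.5, -0.4, 0.9]
print(sum(scores) / len(scores))   # unweighted mean
print(review_sentiment(scores))    # end-weighted mean
```

With equal weights this review averages to zero; weighting the closing verdict makes it positive (about 0.36).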

    2. Techniques

      Sentiment analysis can be implemented using both supervised and unsupervised methods of classification. Supervised methods have generally shown better performance than unsupervised methods. However, unsupervised methods are important too, because supervised methods demand large amounts of labeled training data, which are very expensive to obtain, whereas unlabeled data are easy to acquire. Most domains other than movie reviews lack labeled training data, and in such cases unsupervised methods are very useful for developing applications.

          1. Supervised Techniques

            Supervised techniques can be implemented by building a classifier that is trained on examples, which are usually labelled manually. The most commonly used supervised algorithms are Support Vector Machines (SVM), the Naive Bayes classifier and Maximum Entropy. Supervised techniques have been shown to outperform unsupervised ones; for example, Pang et al. [1] found that machine-learning classifiers beat baselines built from human-chosen sentiment word lists.
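To make the supervised setting concrete, the sketch below trains a multinomial Naive Bayes classifier, one of the algorithms listed above, with add-one (Laplace) smoothing over word-presence features (each word counted at most once per document, in line with the term-presence finding of Pang et al. [1]). The class NaiveBayesSentiment and the four training reviews are hypothetical; a realistic classifier would need far more labelled data.

```python
import math
from collections import Counter, defaultdict

class NaiveBayesSentiment:
    """Multinomial Naive Bayes over binary word-presence features,
    with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_docs = Counter(labels)            # documents per class
        self.word_counts = defaultdict(Counter)      # class -> word -> count
        self.vocab = set()
        for tokens, label in zip(docs, labels):
            words = set(tokens)                      # presence, not frequency
            self.word_counts[label].update(words)
            self.vocab |= words
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def log_posterior(self, tokens, label):
        """Unnormalized log P(label) + sum of log P(word | label)."""
        n_docs = sum(self.class_docs.values())
        score = math.log(self.class_docs[label] / n_docs)
        denom = self.total_words[label] + len(self.vocab)
        for word in set(tokens):
            if word in self.vocab:                   # ignore words never seen in training
                score += math.log((self.word_counts[label][word] + 1) / denom)
        return score

    def predict(self, tokens):
        """Return the class with the highest (unnormalized) log posterior."""
        return max(self.class_docs, key=lambda c: self.log_posterior(tokens, c))

# Hypothetical labelled training reviews.
train = [
    ("a great and moving film".split(), "positive"),
    ("great acting and a wonderful plot".split(), "positive"),
    ("a boring and predictable film".split(), "negative"),
    ("terrible acting and a boring plot".split(), "negative"),
]
model = NaiveBayesSentiment().fit([t for t, _ in train], [l for _, l in train])
print(model.predict("a wonderful film".split()))
print(model.predict("boring and terrible".split()))
```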

            Cui et al. [12] have argued that SVMs are more appropriate for sentiment classification because they perform better when a review contains both positive and negative words. However, when the training set is small, a Naive Bayes classifier might be more appropriate, because SVMs require a large set of data to build a high-quality classifier. One of the most important tasks in sentiment classification is selecting an appropriate set of features. The most commonly used features in sentiment classification are introduced below.

            Term presence and their frequency:

            These features include uni-grams or n-grams together with their frequency or presence, and they have been widely and successfully used in sentiment classification. Pang et al. [1] report that uni-grams give better results than bi-grams in movie-review sentiment analysis, whereas Dave et al. [6] report that bi-grams and tri-grams give better polarity classification of product reviews.

            Part of speech information:

            Part-of-speech (POS) information is used to disambiguate word sense, which in turn guides feature selection [11]. POS tagging is useful for identifying adjectives and adverbs in sentences, which point to opinion words, and nouns, which point to product features.
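The sketch below shows how POS tags can separate candidate opinion words from candidate product features. To stay self-contained it uses a hand-written, hypothetical tagger output in the Penn Treebank tag set (a tagger such as NLTK's nltk.pos_tag returns (word, tag) pairs in this form); the helper split_by_pos is also hypothetical, and treating every adjective, adverb or noun as a candidate is a simplification.

```python
# Hypothetical tagger output (Penn Treebank tags) for the review sentence
# "The battery life is really short but the screen is bright".
tagged = [
    ("The", "DT"), ("battery", "NN"), ("life", "NN"), ("is", "VBZ"),
    ("really", "RB"), ("short", "JJ"), ("but", "CC"), ("the", "DT"),
    ("screen", "NN"), ("is", "VBZ"), ("bright", "JJ"),
]

def split_by_pos(tagged_tokens):
    """Collect candidate opinion words (adjectives/adverbs) and
    candidate product features (nouns) from (word, tag) pairs."""
    opinion_words, feature_words = [], []
    for word, tag in tagged_tokens:
        if tag.startswith("JJ") or tag.startswith("RB"):   # JJ, JJR, JJS, RB, RBR, RBS
            opinion_words.append(word.lower())
        elif tag.startswith("NN"):                          # NN, NNS, NNP, NNPS
            feature_words.append(word.lower())
    return opinion_words, feature_words

opinions, features = split_by_pos(tagged)
print(opinions)
print(features)
```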

            Negations:

            Negation is also an important feature to take into account, since it has the potential to reverse a sentiment [11]. For example, in "The book is great" and "The book is not great", the negation word "not" makes the second sentence negative.
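One common way to make negation visible to a classifier is to mark the words in its scope. The sketch below follows the heuristic used by Pang et al. [1], prefixing NOT_ to every word between a negation word and the next punctuation mark, so that "great" and "NOT_great" become different features. The negation word list, the tokenizer and the function names are illustrative assumptions.

```python
import re

# Illustrative, not exhaustive; words ending in "n't" are also treated as negations.
NEGATION_WORDS = {"not", "no", "never", "cannot"}
PUNCTUATION = {".", ",", ";", ":", "!", "?"}

def tokenize(text):
    """Lower-case and split into words (keeping apostrophes) and punctuation marks."""
    return re.findall(r"[a-z']+|[.,;:!?]", text.lower())

def is_negation(token):
    return token in NEGATION_WORDS or token.endswith("n't")

def mark_negation(tokens):
    """Prefix NOT_ to tokens between a negation word and the next punctuation mark."""
    marked, negating = [], False
    for tok in tokens:
        if tok in PUNCTUATION:
            negating = False          # punctuation closes the negation scope
            marked.append(tok)
        elif is_negation(tok):
            negating = True
            marked.append(tok)
        else:
            marked.append("NOT_" + tok if negating else tok)
    return marked

print(mark_negation(tokenize("The book is not great, but the ending works.")))
```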

            Opinion words and phrases:

            Opinion words and phrases, such as "like", "nice", "hate" and "I'd suggest that…", express positive or negative opinions. The main approaches to identifying the semantic orientation (positive or negative), or polarity, of opinion words are statistical-based and lexicon-based.

            The main limitation of supervised learning is that it is dependent on the amount and quality of the training data and may fail when training data are insufficient.

          2. Unsupervised Techniques

      In the unsupervised technique, classification is done by comparing the features of a given text against word lexicons whose sentiment values are determined before their use. For example, one can start with lexicons of positive and negative words and analyze the document whose sentiment is to be found: if the document contains more positive words than negative words, it is classified as positive; otherwise it is classified as negative. The most prominent work using unsupervised methods for opinion mining and sentiment detection is by Turney [2]. He uses "excellent" and "poor" as seed words, chosen because reviews rated on a five-star scale commonly describe one star as "poor" and five stars as "excellent", and calculates the semantic orientation of phrases using pointwise mutual information with these words. The sentiment of a document is calculated as the average semantic orientation of all such phrases. He achieved 66% accuracy in the movie review domain.
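Both unsupervised ideas in this paragraph can be sketched briefly: the count-based lexicon rule, and Turney's semantic orientation, where SO(phrase) = PMI(phrase, "excellent") - PMI(phrase, "poor") reduces to a log-ratio of hit counts. Turney obtained the counts from search-engine queries with a NEAR operator; here every count, phrase and function name is hypothetical, and the 0.01 smoothing constant is an assumption used to avoid division by zero.

```python
import math

def lexicon_classify(tokens, positive_words, negative_words):
    """Counting rule: more positive than negative words -> positive, else negative."""
    pos = sum(tok in positive_words for tok in tokens)
    neg = sum(tok in negative_words for tok in tokens)
    return "positive" if pos > neg else "negative"

# Hypothetical hit counts standing in for search-engine queries such as
#   "low fees" NEAR "excellent"
# The numbers are invented for illustration only.
HITS = {"excellent": 1_000_000, "poor": 800_000}
NEAR_HITS = {
    ("low fees", "excellent"): 120, ("low fees", "poor"): 30,
    ("unethical practices", "excellent"): 5, ("unethical practices", "poor"): 60,
    ("online service", "excellent"): 200, ("online service", "poor"): 150,
}

def semantic_orientation(phrase, smoothing=0.01):
    """SO = PMI(phrase, "excellent") - PMI(phrase, "poor").

    Corpus size and hits(phrase) cancel, leaving a log-ratio of hit counts;
    the smoothing constant avoids division by zero for unseen pairs.
    """
    near_excellent = NEAR_HITS.get((phrase, "excellent"), 0) + smoothing
    near_poor = NEAR_HITS.get((phrase, "poor"), 0) + smoothing
    return math.log2((near_excellent * HITS["poor"]) / (near_poor * HITS["excellent"]))

def classify_by_orientation(phrases):
    """Average SO over a review's phrases; a positive average means 'positive'."""
    average = sum(semantic_orientation(p) for p in phrases) / len(phrases)
    return "positive" if average > 0 else "negative"

print(lexicon_classify("a nice and useful phone".split(), {"nice", "useful"}, {"bad"}))
print(classify_by_orientation(["low fees", "online service"]))
print(classify_by_orientation(["unethical practices", "online service"]))
```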

      Ting-Chun Peng and Chia-Chun Shih [13] use part-of-speech (POS) patterns to extract the sentiment phrases of each review. They use each unknown sentiment phrase as a query term and retrieve the top-N relevant phrases from a search engine. The sentiment of the unknown phrase is then computed from the sentiments of nearby known relevant phrases using lexicons. Gang Li and Fei Liu [10] developed an approach for clustering documents into a positive group and a negative group based on the k-means clustering algorithm.
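The clustering idea can be illustrated with a plain k-means (Lloyd's algorithm) that splits documents into two groups. This is a generic sketch, not Li and Liu's method: the 2-D document vectors (counts of positive and negative words) are hypothetical, as is the rule used afterwards to decide which unnamed cluster to call "positive".

```python
import math

def kmeans(points, k, max_iter=100):
    """Lloyd's k-means. Centroids start at the first k points (a simple,
    deterministic choice; k-means++ or random restarts are common alternatives)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean).
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:          # converged: no assignment changed
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                   # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids

# Hypothetical document vectors: (positive-word count, negative-word count).
points = [(3, 0), (0, 2), (4, 1), (1, 3), (2, 0), (0, 4)]
labels, centroids = kmeans(points, k=2)

# Clustering yields unnamed groups; the cluster whose centroid has the larger
# (positive - negative) difference is called "positive" (an assumed naming rule).
positive_cluster = max(range(2), key=lambda c: centroids[c][0] - centroids[c][1])
names = ["positive" if lab == positive_cluster else "negative" for lab in labels]
print(names)
```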

  3. Applications of sentiment analysis

    Sentiment analysis can be used in diverse fields for various purposes. This section discusses some of the common ones. The examples presented in this section are not exhaustive but simply a snapshot of the possibilities.

      1. Online Commerce

        The most common use of sentiment analysis is in e-commerce. Websites allow their users to describe their shopping experience and the quality of products. They provide a summary of each product and its features by assigning ratings or scores, so customers can easily view opinions and recommendations on the whole product as well as on specific features, often presented as a graphical summary. Popular merchant websites such as amazon.com provide reviews from editors as well as from customers, together with rating information. http://tripadvisor.in is a popular website that provides reviews of hotels and travel destinations; it contains 75 million opinions and reviews worldwide. By analyzing this huge volume of opinions, sentiment analysis helps such websites turn dissatisfied customers into promoters.

      2. Voice of the Market (VOM)

        Voice of the Market is about determining what customers feel about the products or services of competitors. Accurate and timely information from the Voice of the Market helps in gaining competitive advantage and in new product development. Detecting such information as early as possible helps direct and target key marketing campaigns. Sentiment analysis helps corporations obtain customer opinion in real time, and this real-time information helps them design new marketing strategies, improve product features and predict the chances of product failure.

        Zhang et al. [7] proposed Weakness Finder, a system that helps manufacturers find the weaknesses of their products from Chinese reviews by using aspect-based sentiment analysis. Both commercial and free sentiment analysis services are available: Radian6, Sysomos, Viralheat and Lexalytics are commercial services, while free tools such as www.tweettfeel.com and www.socialmention.com are also available.

      3. Voice of the Customer (VOC)

        Voice of the Customer is concerned with what individual customers are saying about products or services, which means analyzing the reviews and feedback of customers. VOC is a key element of Customer Experience Management. It helps identify new opportunities for product innovation. Extracting customer opinions also helps identify the functional requirements of products, as well as non-functional requirements such as performance and cost.

      4. Brand Reputation Management

        Brand Reputation Management is concerned with managing your reputation in the market. Opinions from customers or any other parties can damage or enhance that reputation. Brand Reputation Management (BRM) is focused on the product and the company rather than on the customer. One-to-many conversations now take place online at a high rate, which creates opportunities for organizations to manage and strengthen their brand reputation. Brand perception is no longer determined only by advertising, public relations and corporate messaging; brands are now the sum of the conversations about them. Sentiment analysis helps determine how a company's brand, product or service is perceived by the online community.

      5. Government

    Sentiment analysis helps governments assess their strengths and weaknesses by analyzing opinions from the public. For example: "If this is the state, how do you expect truth to come out? The MP who is investigating 2G scam himself is deeply corrupt." [15]. This example clearly shows negative sentiment about the government.

    Whether it is tracking citizens' opinions on the new 108 emergency service, identifying strengths and weaknesses in a recruitment campaign for government jobs, assessing the success of the electronic submission of tax returns, or many other areas, we can see the potential for sentiment analysis.

  4. Challenges for Sentiment analysis

    Sentiment analysis classifies text as positive, negative or objective, so it can be thought of as a text classification task. Topic-based text classification can have as many classes as there are topics, whereas sentiment analysis has only three. However, many factors make sentiment analysis more difficult than traditional text classification. Some of these factors are described below.

      1. Coreference resolution

        Coreference resolution is the problem of identifying what a pronoun or a noun phrase refers to. For example, in "We watched the movie and went to dinner; it was awful.", what does "it" refer to?

        Coreference resolution may be useful for topic- or aspect-based sentiment analysis and may improve the accuracy of opinion mining.

      2. Temporal Relations

        The time at which a review was written may be important for sentiment analysis. A reviewer may have thought in 2008 that Windows Vista was good but, by 2009, hold a negative opinion of it because of the new Windows 7. Assessing opinions that change over time in this way may improve the performance of a sentiment analysis system, and it helps us observe whether a certain product improves with time or whether people change their opinion about a product.

      3. Sarcastic sentences

        Text may contain sarcastic and ironic sentences. For example, "What a great car, it stopped working on the second day." In such cases, positive words can carry a negative meaning. Sarcastic or ironic sentences can be hard to identify, which can lead to erroneous opinion mining.

      4. Requirement of World Knowledge

        Knowledge about world facts, events and people is often required to classify text correctly. Consider the following example [9]:

        "Casablanca and a lunch comprising of rice and fish: a good Sunday"

        A system without world knowledge classifies the sentence above as positive because of the word "good", but it is an objective sentence, because Casablanca is the name of a famous movie.

      5. Domain Considerations

        The accuracy of sentiment classification can be influenced by the domain of the items to which it is applied, because many words change their meaning from domain to domain. For example [11], "Go read the book" has positive sentiment in the book domain, whereas it indicates negative sentiment in the movie domain.

      6. Grouping synonyms

        Texts often contain different words with the same meaning, and such words should be identified and grouped together for accurate classification. Identifying them is difficult, because people often use different words to describe the same feature. For example, "voice" and "sound" both refer to the same feature in phone reviews.
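Once synonym groups are known, grouping mentions is a simple lookup, as in the sketch below; the hard part the paragraph describes is discovering the groups in the first place. The FEATURE_SYNONYMS table and the helper group_mentions are hypothetical.

```python
# Hypothetical synonym groups for phone reviews: canonical name -> aliases.
FEATURE_SYNONYMS = {
    "sound": ["voice", "audio", "speaker"],
    "battery": ["battery life", "charge"],
}
# Invert into a lookup table: surface form -> canonical feature name.
CANONICAL = {alias: feature
             for feature, aliases in FEATURE_SYNONYMS.items()
             for alias in aliases + [feature]}

def group_mentions(mentions):
    """Count feature mentions after mapping synonyms to one canonical name."""
    counts = {}
    for mention in mentions:
        feature = CANONICAL.get(mention.lower(), mention.lower())  # unknown terms stay as-is
        counts[feature] = counts.get(feature, 0) + 1
    return counts

print(group_mentions(["voice", "sound", "Audio", "charge", "screen"]))
```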

      7. Thwarted Expectations

        Some texts begin in one context and end in a different one that reverses the sentiment. For example,

        The cast was not good, actors performed poorly, but I liked it.

        In the review above, the last clause makes the whole review positive. If only term frequency were considered, the review would be classified as negative because it contains more negative words.
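The effect can be reproduced with a tiny lexicon-based scorer. In the sketch, "not" flips the polarity of the next word, so the plain count for the review above sums to negative; a crude heuristic that scores only the clause after the last "but" recovers the positive reading. The lexicon, the tokenizer and both classification functions are illustrative assumptions, and the "but" rule can itself misfire, for instance when the clause after "but" is only a minor aside.

```python
import re

# Tiny illustrative lexicon (an assumption, not a published resource).
LEXICON = {"good": 1, "great": 1, "liked": 1, "poorly": -1, "bad": -1}

def tokens_of(text):
    return re.findall(r"[a-z]+", text.lower())

def score(tokens):
    """Sum word polarities, flipping a word that directly follows 'not'."""
    total = 0
    for i, tok in enumerate(tokens):
        polarity = LEXICON.get(tok, 0)
        if i > 0 and tokens[i - 1] == "not":
            polarity = -polarity
        total += polarity
    return total

def count_based(text):
    """Whole-text count: positive only if the summed polarity is above zero."""
    return "positive" if score(tokens_of(text)) > 0 else "negative"

def but_aware(text):
    """Heuristic: if the text contains 'but', score only what follows the last 'but'."""
    tokens = tokens_of(text)
    if "but" in tokens:
        last_but = len(tokens) - 1 - tokens[::-1].index("but")
        tokens = tokens[last_but + 1:]
    return "positive" if score(tokens) > 0 else "negative"

review = "The cast was not good, actors performed poorly, but I liked it."
print(count_based(review))
print(but_aware(review))
```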

      8. Negation

        In traditional text classification, small differences between two pieces of text do not change the meaning very much. In sentiment analysis, however, "the movie was great" is very different from "the movie was not great". Negation handling is a difficult task in sentiment analysis because negation reverses polarity. Negation can also be expressed through sarcasm, or implicitly in sentences that do not contain any negative words.

      9. Review Spam Detection

    On product review sites, many people write fake reviews, called review spam, to promote their own products with undeserved positive opinions or to defame their competitors' products with false negative opinions. Opinion spam identification therefore has a great impact on industry: if an opinion service contains a large amount of spam, it degrades the users' experience, and a user who has been misled by the opinions provided may never use the system again.

  5. Conclusion

    Applying sentiment analysis to mine huge amounts of data has become an important research problem. This paper summarizes some of the most common applications of, and challenges in, sentiment analysis. Business organizations and academics are now putting effort into finding the best system for sentiment analysis. Although some of the algorithms used in sentiment analysis give good results, no algorithm can yet resolve all the challenges. Most researchers have reported that Support Vector Machines (SVM) achieve higher accuracy than other algorithms, but SVMs also have limitations. Sentiment classification has also been found to be domain dependent. Different types of classification algorithms should be combined in order to overcome their individual drawbacks, benefit from each other's merits and enhance sentiment classification performance.

    There is a huge need in industry for such applications, because every company wants to know how consumers feel about its products and services and about those of its competitors, and sentiment analysis can be developed for new applications. The techniques and algorithms used for sentiment analysis have made good progress, but many challenges in this field remain unsolved, and future research can address them.

  6. References

  1. B. Pang, L. Lee, and S. Vaithyanathan, "Thumbs up? Sentiment classification using machine learning techniques," Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, vol. 10, 2002, pp. 79-86.

  2. P. Turney, "Thumbs up or thumbs down? Semantic orientation applied to unsupervised classification of reviews," Proceedings of the Association for Computational Linguistics (ACL), 2002, pp. 417-424.

  3. M. Taboada, J. Brooke, M. Tofiloski, K. Voll, and M. Stede, "Lexicon-based methods for sentiment analysis," Computational Linguistics, vol. 37, 2011, pp. 267-307.

  4. M. Hu and B. Liu, "Mining and summarizing customer reviews," Proceedings of the Tenth ACM International Conference on Knowledge Discovery and Data Mining, Seattle, 2004, pp. 168-177.

  5. S. Baccianella, A. Esuli, and F. Sebastiani, "SentiWordNet 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining," Proceedings of the Seventh Conference on International Language Resources and Evaluation, 2010, pp. 2200-2204.

  6. K. Dave, S. Lawrence, and D. M. Pennock, "Mining the peanut gallery: Opinion extraction and semantic classification of product reviews," Proceedings of WWW, 2003, pp. 519-528.

  7. W. Zhang, H. Xu, and W. Wan, "Weakness Finder: Find product weakness from Chinese reviews by using aspects based sentiment analysis," Expert Systems with Applications, Elsevier, vol. 39, 2012, pp. 10283-10291.

  8. B. Liu, Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data, Springer, 2006.

  9. A. Joshi, A. R. Balamurali, P. Bhattacharyya, and R. Mohanty, "C-feel-it: a sentiment analyzer for microblogs," Proceedings of ACL: Systems Demonstrations, HLT '11, 2011, pp. 127-132.

  10. G. Li and F. Liu, "A Clustering-Based Approach on Sentiment Analysis," IEEE International Conference on Intelligent System and Knowledge Engineering, Hangzhou, China, 2010, vol. 1, pp. 331-337.

  11. B. Pang and L. Lee, "Opinion mining and sentiment analysis," Foundations and Trends in Information Retrieval, vol. 2, no. 1-2, 2008, pp. 1-135.

  12. H. Cui, V. Mittal, and M. Datar, "Comparative Experiments on Sentiment Classification for Online Product Reviews," Proceedings of AAAI-06, 2006, pp. 1265-1270.

  13. T. Peng and C. Shih, "An Unsupervised Snippet-Based Sentiment Classification Method for Chinese Unknown Phrases without Using Reference Word Pairs," Proceedings of the International Conference on Web Intelligence and Intelligent Agent Technology, 2010, pp. 243-248.

  14. G. Gebremeskel, "Sentiment Analysis of Twitter Posts about News," Master's Thesis, University of Malta, May 2011.

  15. http://proudtobeindian.net/indian-media-exposed
