Opinion Mining and Sentiment Analysis Application for Opinion Classification from Education Questionnaire

Amir Hamzah

doi:10.17577/IJERTV6IS110226

Volume 06, Issue 11 (November 2017)


Open Access
Authors : Amir Hamzah
Paper ID : IJERTV6IS110226
Volume & Issue : Volume 06, Issue 11 (November 2017)
Published (First Online): 28-11-2017
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License


Amir Hamzah 1)

1) Department of Informatics Engineering, Institut Sains & Teknologi AKPRIND, Yogyakarta, Indonesia

Abstract Questionnaires that measure academic services with multiple-choice answers generally also provide columns for comments and advice. In the data analysis, the comments and suggestions written by thousands of students cannot be utilized because analysis tools are lacking, even though they may contain student opinions on various matters such as facilities, faculty, the library, and others. Opinion mining and sentiment analysis, as new tools in text mining, can be applied to make use of these comments and suggestions. This research applied an HMM-POS Tagger, which uses a Hidden Markov Model trained on POS-tagged data, to assign POS tags to sentences automatically. By applying POS-tag patterns, each comment can be classified as an opinion or not. If it is an opinion, its target and its orientation, positive or negative, can then be determined. The data used were 1,000 comments tagged manually with POS tags and 1,000 comments as test data. Sentiment analysis was applied using four classification methods, namely SVM, NBC, ME and KM-Clustering. The results showed that the accuracy of the POS tagger was 0.95 and the average accuracy of the four classification methods was 0.85.

Keywords: HMM POS Tagger; opinion classification

INTRODUCTION

Opinions and their orientation are among the most important inputs to policy decisions. The right decision is strongly influenced by the analysis of opinions from various sources related to that decision. In the business world, for example, a production manager who wants to add a product needs an analysis of reviews of the goods already on the market. In the management of educational services at universities, the level of satisfaction with learning services can be measured from students' opinions about the learning process. Opinions appear in a variety of situations: they may be deliberately requested by an assessment instrument, such as a request for suggestions in a questionnaire, or appear naturally in an online forum provided by the college's official site. The volume of online opinion in the form of free text keeps growing, and it is generally not utilized because it is unstructured.

The internet and other online information sources are growing very rapidly. Online data and information from companies and organizations are generally unstructured, with text accounting for up to 80% [5]. The emergence of social media such as Facebook (2004) and Twitter (2006) has encouraged activities such as reviews, discussion forums, blogs, micro-blogs, comments, and posts that multiply the number of text documents on the internet, because social media is used by both individuals and organizations to share information for various interests. This information explosion further complicates the process of data mining, as had been predicted [10]. Therefore, research in opinion mining has become a very important topic in addition to the earlier topics of data mining and text mining.

One branch of research that evolved from the information explosion on the internet is sentiment analysis and opinion mining. Opinion mining is challenging because it accumulates research challenges from Information Retrieval (IR), namely Information Extraction, Information Summarization, and Document Classification, and from Natural Language Processing (NLP), such as Named Entity Recognition (NER) and Document Subjectivity Analysis [9]. This research branch examines how to extract opinions from online media and how to analyze those opinions. Sentiment analysis, or opinion mining, is the computational study of people's opinions, appraisals, and emotions toward entities, events, and their attributes [6].

Applying sentiment analysis and opinion mining to policy evaluation and decision making promises a more practical and economical way than the classical questionnaire approach. Critics of the questionnaire method point out that it is very time consuming and expensive, and that its results sometimes fail to capture the real problem. Questionnaires and interviews are considered weak because people generally do not like to answer survey questions, which are sometimes long-winded. Opinion mining extracts opinions by listening rather than by asking, as a questionnaire does, and thus reflects reality more accurately [13]. Opinion mining can go further and capture the emotions of the opinion holders [7].

The Institute of Science and Technology AKPRIND, as a higher education institution, constantly seeks to improve its learning management services. For that purpose, at the end of each semester the academic administration evaluates the learning service using a questionnaire instrument. So far, one part of the questionnaire data could not be used and analyzed, namely the students' suggestions.

This data amounts to thousands of suggestions, or more exactly opinions, from the participants of all courses. The suggestions or opinions can be about the academic atmosphere, lecturers, classrooms, air conditioners, overhead projectors, or other campus facilities. Over the years this data has been piling up unused.

The problem addressed in this research is how to build a software prototype that can extract opinions from a collection of comment documents and then determine the target and the orientation of each opinion.

The purpose of this study is to review the application of opinion mining and sentiment analysis techniques to the suggestions and opinions of students. The research is also designed to create a prototype of opinion mining and sentiment analysis software that can extract opinions, analyze them, map their targets, and set their orientation.

RELATED WORKS

A study of opinion mining applications was conducted by [4] on the English National Health Service website, which captured 6,412 free-text comments from treated patients. The analysis of comments related to hygiene, hospital services, and various aspects of hospital responsibilities agreed between 81% and 89% with the quantitative ratings provided through the questionnaire.

In opinion mining, one method to detect opinions is to apply an HMM-POS Tagger. The method is applied to a collection of training data consisting of opinion and non-opinion sentences in which each word has been given a POS tag (part-of-speech label). The program is expected to extract opinions from the comment text of the questionnaire, as well as to find the object of each opinion.

There are several approaches to automated POS tagging: rule-based, probabilistic, and transformation-based. A rule-based POS tagger assigns a tag to a word based on manually created linguistic rules; for example, a word is tagged NOUN if it follows an ADJECTIVE. The probabilistic approach determines the most likely tag of a token from probabilities estimated on a manually tagged corpus. The transformation-based approach combines the rule-based and probabilistic approaches to derive symbolic rules automatically from the corpus [11]. The use of the Hidden Markov Model for POS tagging of Bahasa Indonesia was examined by [14], which reported an accuracy of 96.2%, and by [15], which reported an accuracy of 92.2%.

THEORETICAL BACKGROUND

1. HMM POS-TAGGER

HMM-POS Tagger is a method for performing POS tagging on a sentence automatically, based on the characteristics of the POS-tag data in a training collection. Part-of-speech (POS) tagging, also known as grammatical tagging, is the process of assigning a POS tag to each word in a sentence. A part of speech is a grammatical category of words in a sentence, e.g. verb (VERB), noun (NOUN), adjective (ADJECTIVE), and others. POS tagging is an important tool in many natural language processing applications such as disambiguation, parsing, question answering, and machine translation. Because manually assigning part-of-speech tags to the words in a sentence is costly, exhausting, and time-consuming, there has been wide interest in automating the POS tagging process [3].

If a sentence consists of n words (w_i: i = 1, .., n) and a POS tag is to be assigned to each word of the sentence (t_i: i = 1, .., n), the problem can be formulated as finding the tag sequence that maximizes:

$$\hat{t}_1^n = \arg\max_{t_1^n} P(t_1^n \mid w_1^n) \tag{1}$$

By applying Bayes' theorem for conditional probability, (1) can be written as:

$$\hat{t}_1^n = \arg\max_{t_1^n} \frac{P(w_1^n \mid t_1^n)\,P(t_1^n)}{P(w_1^n)} \tag{2}$$

Since the denominator is the same for every candidate tag sequence of a given sentence, (2) can be written as:

$$\hat{t}_1^n = \arg\max_{t_1^n} P(w_1^n \mid t_1^n)\,P(t_1^n) \tag{3}$$

Equation (3) can be simplified by making two assumptions:

1. The probability of a word depends only on its POS tag.

$$P(w_1^n \mid t_1^n) \approx \prod_{i=1}^{n} P(w_i \mid t_i) \tag{4}$$

2. The probability of a POS tag depends only on the previous POS tag.

$$P(t_1^n) \approx \prod_{i=1}^{n} P(t_i \mid t_{i-1}) \tag{5}$$

Applying (4) and (5) to (3) gives:

$$\hat{t}_1^n = \arg\max_{t_1^n} \prod_{i=1}^{n} P(w_i \mid t_i)\,P(t_i \mid t_{i-1}) \tag{6}$$

The HMM-POS Tagger thus produces sentences in which every word carries a POS tag. In the next step, these POS-tag patterns are used to determine whether a text is an opinion or not. The POS-tag patterns are also used to determine the object of the opinion.

2. SENTIMENT ANALYSIS

Sentiment analysis of the comments that have been detected as opinions is done by applying opinion classification to determine the opinion orientation. Many methods are available for text classification. In this research we used four methods, namely the Naive Bayes Classifier (NBC), Support Vector Machine (SVM), Maximum Entropy (ME), and K-Means Clustering (KMC).

NBC Method

The NBC method takes the collection of opinion documents as D = {d_1, d_2, …, d_|D|} and the collection of categories as C = {c_1, c_2, …, c_|C|}. NBC classification is done by finding the probability P(C = c_j | D = d_i), i.e. the probability of category c_j given that document d_i is observed. The document is seen as a tuple of words <w_1, w_2, …, w_n>, whose frequency of occurrence is assumed to be a random variable with a Bernoulli probability distribution [8]. Classifying the document means finding the maximum value of:

$$V_{MAP} = \arg\max_{c_j \in C} P(c_j \mid w_1, w_2, \ldots, w_n) \tag{7}$$

Applying Bayes' theorem gives:

$$V_{MAP} = \arg\max_{c_j \in C} \frac{P(w_1, w_2, \ldots, w_n \mid c_j)\,P(c_j)}{P(w_1, w_2, \ldots, w_n)} \tag{8}$$

Since the denominator is constant for a document, and assuming that the words are independent of each other, equation (8) can be written as:

$$V_{MAP} = \arg\max_{c_j \in C} P(c_j) \prod_{i=1}^{n} P(w_i \mid c_j) \tag{9}$$

In practice P(c_j) is approximated by:

$$P(c_j) = \frac{|doc_j|}{|sample|} \tag{10}$$

where |doc_j| is the number of documents in category j and |sample| is the number of sample (training) documents.

In computation P(w_i | c_j) is approximated by:

$$P(w_i \mid c_j) = \frac{n_i + 1}{n + |vocabulary|} \tag{11}$$

where n_i is the frequency of occurrence of the word w_i in category c_j, n is the total number of word occurrences in the documents of category c_j, and |vocabulary| is the number of distinct words in the sample document collection.

SVM Method

Support Vector Machine (SVM) was first developed by [1] and described in more detail by [2]. The SVM concept can be explained as the search for the best hyperplane that separates two classes in the input space. Consider a collection of documents of the form:

$$\mathcal{D} = \{(x_i, y_i) \mid x_i \in \mathbb{R}^p,\ y_i \in \{-1, 1\}\} \tag{12}$$

where y_i is 1 or -1 and indicates the class to which x_i belongs. Each x_i is a real p-dimensional vector. The goal is to find the maximum-margin hyperplane that separates the points with y_i = 1 from the points with y_i = -1.

Any hyperplane can be written as the set of points x satisfying w · x - b = 0, where (·) indicates the dot product. The vector w is a normal vector perpendicular to the hyperplane. The quantity b / ||w|| determines the offset of the hyperplane from the origin along the normal vector w. We choose w and b to maximize the margin, that is, the distance between two parallel hyperplanes that are as far apart as possible while still separating the data.

These hyperplanes can be described by the equations:

$$w \cdot x - b = 1 \tag{13}$$

and

$$w \cdot x - b = -1 \tag{14}$$

The distance between the two hyperplanes is 2 / ||w||, so we want to minimize ||w||. To prevent data points from falling into the margin, the following constraints are added. For each i, either

$$w \cdot x_i - b \ge 1 \quad \text{for } x_i \text{ in the first class} \tag{15}$$

or

$$w \cdot x_i - b \le -1 \quad \text{for } x_i \text{ in the second class} \tag{16}$$

This can be rewritten as:

$$y_i (w \cdot x_i - b) \ge 1, \quad \text{for all } 1 \le i \le n \tag{17}$$

So the problem of finding the maximum-margin hyperplane is the optimization problem: minimize ||w|| subject to the constraint, for every i = 1, …, n,

$$y_i (w \cdot x_i - b) \ge 1 \tag{18}$$

Maximum Entropy (ME) Method

Classification with Maximum Entropy (ME) applies information theory. Entropy is the average amount of information contained in a set of events X = {x_1, x_2, …, x_n}, which can be expressed as:

$$H(p) = \sum_{x \in X} p(x) \log_e \frac{1}{p(x)} = -\sum_{x \in X} p(x) \log_e p(x) \tag{19}$$

Here H(p) is the average amount of information of the set of events X, and p(x) is the probability of the occurrence of x in the set X. The Maximum Entropy (ME) method maximizes the value of H(p). The maximum value of H(p) is obtained when the distribution over X is uniform, so that p(x) = 1 / |X|, where |X| is the cardinality of the set X.

The ME method is applied to document classification through the conditional probability of a document class given a document. Suppose that the set of document classes is A = {a_1, a_2, …, a_c} and the collection of documents is D = {d_1, d_2, …, d_n}. The class a of a document d is determined by choosing the class with the maximum conditional probability p(a | d), where p is the probability distribution with maximum entropy.
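The claim that the entropy of equation (19) is largest for a uniform distribution can be checked numerically. The following is only an illustrative sketch; the two example distributions are made up for the demonstration and do not come from the questionnaire data.

```python
import math


def entropy(probs):
    """H(p) = -sum p(x) ln p(x), using the natural log as in equation (19)."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Two hypothetical distributions over a set X with |X| = 4.
uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.70, 0.10, 0.10, 0.10]

h_uniform = entropy(uniform)
h_skewed = entropy(skewed)

# For the uniform case H(p) equals ln|X|; any other distribution scores lower.
print(h_uniform, math.log(len(uniform)), h_skewed)
```

The uniform distribution reaches ln 4, while the skewed one stays below it, which is the property the ME method relies on when no further constraints are known.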
  
KM-C Method

The K-means clustering approach groups objects using a cluster center as the grouping criterion. The cluster center is the average of all member objects of the cluster. Suppose we have a collection of documents D = {d_i | i = 1, 2, …, |D|} = {d_1, d_2, …, d_|D|} to be clustered into K clusters. Here d_i is a real-valued vector that represents the document. The vector has dimension n, the number of unique words in the document collection. The document collection can be represented by a matrix of size n x |D|, denoted by [x_ij], whose element x_ij is the Term Frequency (TF), i.e. the number of occurrences of term (word) i in document j. To obtain better accuracy, the term frequency matrix is converted into a real-valued matrix that takes into account the number of documents containing the i-th word, using Inverse Document Frequency (IDF) weighting. In addition, the document vector is normalized so that its length is always 1. The weighting that combines TF with IDF is known as TF-IDF weighting and can be formulated as:

$$w_{ij} = \frac{(\ln(f_{ij}) + 1)\,\log\frac{N}{n_i}}{\sqrt{\sum_{k}\left[(\ln(f_{kj}) + 1)\,\log\frac{N}{n_k}\right]^2}} \tag{20}$$

where f_ij is the frequency of term i in document j, N is the number of documents in the collection, and n_i is the number of documents containing term i.

The K-means clustering algorithm starts by taking K initial vectors as the seeds of the cluster centers. The distance of every document vector to each cluster center is then calculated. The document vector closest to a cluster center becomes a member of that cluster. The cluster center is the average of all vectors in the cluster.

The K-means algorithm can be written as follows:
1. Take K objects as the seeds of the K cluster centers.
2. For all objects: locate the cluster with the closest center and assign the object to that cluster.
3. Recalculate the center of each cluster as the average of all objects in the cluster.
4. Calculate and evaluate the criterion function. If the criterion function does not change, the algorithm stops.

PROPOSED METHOD

Research Materials

The materials used in this research are the POS tags for the HMM-POS Tagger and the pairs of POS tags used as rules for opinion detection and target detection, which were taken from [14] and [12] with modification. These rules take the form of POS-tag collections. The POS tags used by the HMM POS Tagger are shown in TABLE I, and the POS Tagger training data that uses this table is shown in Fig. 1. The rules for opinion detection are based on the POS-tag pairs shown in TABLE II. The rules for target detection are based on the POS-tag patterns shown in TABLE III.

The source of the comment data was a set of text documents containing the suggestions collected from the student questionnaire of the Institute of Science and Technology AKPRIND over 6 semesters:
1. Semester 1 2014/2015: 4,128 comments
2. Semester 2 2013/2014: 3,152 comments
3. Semester 1 2013/2014: 3,801 comments
4. Semester 2 2013/2014: 2,551 comments
5. Semester 1 2012/2013: 3,663 comments
6. Semester 2 2012/2013: 1,883 comments

From these collections, 1,000 documents were selected as training data and 1,000 documents were selected as test data.

Data Analysis

The research is divided into two stages: pre-processing of the documents, followed by opinion detection and extraction, target detection, and finally opinion classification.

Pre-processing Document and training

Documents in the form of student comments with irregular form are converted into standard sentences. In the next step, the collection of training data is parsed so that the frequency of occurrence of each word with each POS tag can be determined. The POS-tag codes are listed in Table I. The training data for the POS Tagger has the format shown in Fig. 1.

<DOC-0001>tolong/VBI gedungnya/NNG dibersihkan/VBP <*DOC>
<DOC-0002>birokrasi/NNU kampus/NNC yang/PR jelek/JJ membuat/VBT mahasiswa/NNC merasa/VBT dikhianati/VBP <*DOC>
<DOC-0003>kursi/NNC di/IN dalam/IN ruangan/NNU kurang/RB mengenakkan/JJ <*DOC>
<DOC-0004>lebih/RB ditingkatkan/VBP lagi/RB <*DOC>
<DOC-0005>acnya/NNG kurang/RB dingin/JJ <*DOC>
…
<DOC-0999>pasang/VBT ac/NNC pak/UH soalnya/NNG panas/JJ <*DOC>

Fig. 1. POS-Tagger training data format
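The parsing step described above, reading the Fig. 1 format and counting how often each word occurs with each POS tag, can be sketched as follows. This is a minimal sketch, not the author's program: the regular expression, the function names, and the `<s>` start-of-sentence marker are assumptions made for illustration, and the closing marker is accepted in either the `<*DOC>` or `</DOC>` spelling.

```python
import re
from collections import Counter

# Two sample documents copied from the Fig. 1 format.
SAMPLE = """<DOC-0001>tolong/VBI gedungnya/NNG dibersihkan/VBP <*DOC>
<DOC-0005>acnya/NNG kurang/RB dingin/JJ
<*DOC>"""

# Assumed pattern: document id, body, then a closing marker.
DOC_RE = re.compile(r"<DOC-(\d+)>(.*?)<[*/]DOC>", re.DOTALL)


def parse_tagged(text):
    """Return a list of sentences, each a list of (word, tag) pairs."""
    sentences = []
    for _doc_id, body in DOC_RE.findall(text):
        pairs = [tuple(tok.rsplit("/", 1)) for tok in body.split() if "/" in tok]
        sentences.append(pairs)
    return sentences


def count_frequencies(sentences):
    """Count (tag, word) and (previous tag, tag) occurrences."""
    emissions = Counter()    # basis for estimating P(w_i | t_i)
    transitions = Counter()  # basis for estimating P(t_i | t_{i-1})
    for sent in sentences:
        prev = "<s>"  # hypothetical start-of-sentence marker
        for word, tag in sent:
            emissions[(tag, word)] += 1
            transitions[(prev, tag)] += 1
            prev = tag
    return emissions, transitions


sentences = parse_tagged(SAMPLE)
emissions, transitions = count_frequencies(sentences)
print(sentences[0])
print(transitions[("NNG", "RB")])
```

The two counters hold the raw frequencies from which the word-given-tag and tag-given-previous-tag probabilities of an HMM tagger are estimated.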

The training data for opinion classification has the format shown in Fig. 2. In the training data, the orientation of each opinion document has been set manually as positive or negative and is labeled on the document with the code (+) or (-).

<DOC-0001(+)>mengajarnya sudah oke, perlu ditingkatkan </DOC>

<DOC-0002(-)> kamar mandi kotor sekali, tolong dibersihkan yang rutin </DOC>

<DOC-0003(+)>pak joko ngajarnya memang joss..</DOC>

<DOC-0004 (-)>kipas angin kurang baik, fasilitas wivi tolong lebih ditingkatkan lagi biar baik </DOC>

<DOC-0005(-)>lebih ditingkatkan lagi fasilitas perpustakannya </DOC>

<DOC-0006(+)>birokrasi kampus sangat jelek urusan terlalu bertele-tele </DOC>

Fig. 2. Classification data training format
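To show how labeled data in the Fig. 2 format can feed a Naive Bayes classifier, the sketch below parses a few of the documents and scores a new comment with a class prior multiplied by add-one smoothed word probabilities. It is a hedged illustration only: the tokenizer, the function names, and the use of log probabilities (to avoid numeric underflow) are assumptions, not details given in the paper.

```python
import math
import re
from collections import Counter, defaultdict

# Four documents copied from the Fig. 2 format.
TRAIN = """<DOC-0001(+)>mengajarnya sudah oke, perlu ditingkatkan </DOC>
<DOC-0002(-)> kamar mandi kotor sekali, tolong dibersihkan yang rutin </DOC>
<DOC-0003(+)>pak joko ngajarnya memang joss..</DOC>
<DOC-0005(-)>lebih ditingkatkan lagi fasilitas perpustakannya </DOC>"""

DOC_RE = re.compile(r"<DOC-\d+\s*\(([+-])\)>(.*?)</DOC>", re.DOTALL)


def tokenize(text):
    """Assumed tokenizer: lowercase alphabetic runs."""
    return re.findall(r"[a-z]+", text.lower())


def train(text):
    doc_count = Counter()             # number of documents per class
    word_count = defaultdict(Counter)  # word frequencies per class
    for label, body in DOC_RE.findall(text):
        doc_count[label] += 1
        word_count[label].update(tokenize(body))
    vocab = {w for counts in word_count.values() for w in counts}
    return doc_count, word_count, vocab


def classify(sentence, doc_count, word_count, vocab):
    total_docs = sum(doc_count.values())
    best_label, best_score = None, -math.inf
    for label in doc_count:
        n = sum(word_count[label].values())
        # Class prior: documents in the class over all training documents.
        score = math.log(doc_count[label] / total_docs)
        for w in tokenize(sentence):
            # Add-one smoothed word probability: (n_i + 1) / (n + |vocabulary|).
            score += math.log((word_count[label][w] + 1) / (n + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


doc_count, word_count, vocab = train(TRAIN)
print(classify("kamar mandi kotor", doc_count, word_count, vocab))
print(classify("pak joko memang joss", doc_count, word_count, vocab))
```

With only four training documents the result is driven by exact word overlap, so the sketch shows the mechanics rather than realistic accuracy.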

After the POS Tagger training data has been processed, the POS tag of each word in the sentence collection can be determined. The next steps, opinion extraction and opinion classification, follow the flowchart shown in Fig. 3.

TABLE I. POS TAGGER DATA

No	POS	POS Name	Example
1	OP	Open Parenthesis	({[
2	CP	Close Parenthesis	)}]
3	GM	Slash	/
4	;	Semicolon	;
5	:	Colon	:
6		Quotation	" and '
7	.	Sentence terminator	.
8	,	Comma	,
9	–	Dash	–
10		Ellipses
11	JJ	Adjective	Baik, Bagus
12	RB	Adverb	Sekali, sangat
13	NNC	Countable Noun	Kursi, Kulkas
14	NNU	Uncountable Noun	Gula, hujan
15	NNP	Proper Noun	Toyota, Sony
16	NNG	Genitive Noun	Motornya
17	VBI	Intransitive Verb	Pergi
18	VBT	Transitive Verb	Membeli
19	VBP	Passive Verb	ditingkatkan, diperbaiki
20	IN	Preposition	Di, Dari, Ke
21	MD	Modal	Bisa
22	CC	Coor-Conjunction	Dan, Atau, tetapi
23	SC	Subor-Conjunction	Jika, Ketika
24	DT	Determiner	Para, Ini, Itu
25	UH	Interjection	Wah, Aduh, Oi
26	CDO	Ordinal Numerals	Pertama, Kedua, Ketiga
27	CDC	Collective Numerals	Berdua
28	CDP	Primary Numerals	Satu, Dua, Tiga
29	CDI	Irregular Numerals	Beberapa
30	PRP	Personal Pronouns	Saya, Mereka
31	WP	WH-Pronouns	Apa, Siapa, Dimana
32	PRN	Number Pronouns	Kedua-duanya
33	PRL	Locative Pronouns	Sini, Situ
34	NEG	Negation	Bukan, Tidak
35	SYM	Symbols	#,%,^,&,*
36	RP	Particles	Pun, Kah
37	FW	Foreign Words	Word

TABLE II. RULE FOR OPINION DETECTION

No	Rule	Examples
1	RB JJ	sangat buruk, dengan bagus, memang jelek
2	RB VB	semoga berjalan, jika memilih
3	NN JJ	LCDnya jelek, alatnya bagus
4	NN VB	Ngajarnya membosankan, perkataannya menjengkelkan
5	JJ VB	mudah difahami, cepat memahami
6	CK JJ	bagus atau baik, tetapi malas
7	JJ BB	sama bagus
8	VB VB	membuat pusing, membikin bosan
9	JJ RB	indah sekali, bagus sekali
10	VB JJ	membikin bingung
11	NEG JJ	tidak seindah, tidak semudah
12	NEG VB	tidak mengerti, tidak memahami, bukan mengajar
13	PRP VBI	saya menyukai, kita suka
14	PRP VBT	kita suka
15	VBT NN	memiliki kedekatan, memiliki kepekaan
16	MD VBT	Perlu mengambil referensi
17	MD VBI	Perlu dikembangkan
18	UH VBP	Tolong dicat, tolong diperbaiki
19	JJ VBP	Mudah diterima, sulit dipahami

TABLE III. RULE FOR TARGET DETECTION

No	Rule	Examples
1	NN	Ac, lcd, internet
2	NNG	Laboratoriumnya, lcdnya
3	NNP	Pak Joko, Bu Yuli, Pengok
4	NN NN	kantin kampus, ac pengok
5	NN CC NN	Kampus dan lab
6	NN IN NN	AC di Klas

Fig. 3. Flowchart of opinion extraction and classification

Opinion Extraction and Target Detection

Opinion extraction is performed after pre-processing and POS tagging. A document is determined to be an opinion by applying the opinion detection rules of TABLE II, and its target is determined by applying the target detection rules of TABLE III, following the scheme in Fig. 3. After the opinions and their targets have been obtained, the last step is opinion classification, which determines whether each opinion is positive or negative.
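The rule-matching step can be sketched on an already tagged sentence. The sketch uses only a handful of the pairs from TABLE II and the single-noun target rules from TABLE III. How a coarse rule tag such as NN or VB is matched against the finer tags of TABLE I (NNC, NNG, VBP, and so on) is not stated in the paper, so prefix matching is an assumption here, as are the function names.

```python
# A subset of the TABLE II pairs, written as (first tag, second tag).
OPINION_RULES = [("RB", "JJ"), ("NN", "JJ"), ("NN", "VB"), ("NEG", "JJ"), ("UH", "VBP")]

# TABLE III rules 1 to 3 (NN, NNG, NNP) all start with this prefix.
TARGET_PREFIX = "NN"


def matches(tag, rule_tag):
    """Assumption: a coarse rule tag matches any tag that starts with it."""
    return tag.startswith(rule_tag)


def is_opinion(tagged):
    """True if any adjacent tag pair in the sentence matches an opinion rule."""
    tags = [tag for _word, tag in tagged]
    return any(
        matches(first, rule_first) and matches(second, rule_second)
        for first, second in zip(tags, tags[1:])
        for rule_first, rule_second in OPINION_RULES
    )


def find_targets(tagged):
    """Return the words whose tag is a noun tag (single-word target rules only)."""
    return [word for word, tag in tagged if tag.startswith(TARGET_PREFIX)]


comment = [("acnya", "NNG"), ("kurang", "RB"), ("dingin", "JJ")]
request = [("tolong", "VBI"), ("gedungnya", "NNG"), ("dibersihkan", "VBP")]
neutral = [("saya", "PRP"), ("di", "IN"), ("kampus", "NNC")]

print(is_opinion(comment), find_targets(comment))
print(is_opinion(request), find_targets(request))
print(is_opinion(neutral))
```

Multi-word target rules such as NN NN or NN IN NN would need a sequence match similar to `is_opinion`; they are left out to keep the sketch short.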

RESULT AND DISCUSSION

The prototype for pre-processing the document collection for the HMM POS-Tagger has been developed with the interface shown in Fig. 4.

Fig. 4. The result of document pre-processing

The HMM POS-Tagger demonstration, which assigns POS tags to a document input to the system, is shown in Fig. 5. The sentence tolong gedungnya dibersihkan is tagged by the system as tolong/VBI gedungnya/NNG dibersihkan/VBP, with 100% accuracy.

Fig. 5. Result of HMM POS Tagger
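The tagging shown in Fig. 5 corresponds to finding the tag sequence that maximizes the product of P(w_i | t_i) and P(t_i | t_{i-1}), as in equation (6). A common way to do this search is the Viterbi algorithm, sketched below on three training sentences taken from Fig. 1. The paper does not say which decoding or smoothing scheme the prototype uses, so the Viterbi search, the add-one smoothing, and the `<s>` start marker are assumptions for illustration.

```python
import math
from collections import Counter

TRAIN = [
    [("tolong", "VBI"), ("gedungnya", "NNG"), ("dibersihkan", "VBP")],
    [("acnya", "NNG"), ("kurang", "RB"), ("dingin", "JJ")],
    [("lebih", "RB"), ("ditingkatkan", "VBP"), ("lagi", "RB")],
]
START = "<s>"  # hypothetical start-of-sentence marker


def train(sentences):
    """Collect emission and transition counts from tagged sentences."""
    emit, trans = Counter(), Counter()
    tag_total, prev_total = Counter(), Counter()
    vocab = set()
    for sent in sentences:
        prev = START
        for word, tag in sent:
            emit[(tag, word)] += 1
            trans[(prev, tag)] += 1
            tag_total[tag] += 1
            prev_total[prev] += 1
            vocab.add(word)
            prev = tag
    return emit, trans, tag_total, prev_total, vocab


def viterbi(words, model):
    """Return the most probable tag sequence for a non-empty word list."""
    emit, trans, tag_total, prev_total, vocab = model
    tags = sorted(tag_total)
    v_size, t_size = len(vocab) + 1, len(tags)  # +1 leaves room for unknown words

    def log_emit(tag, word):  # add-one smoothed P(w_i | t_i)
        return math.log((emit[(tag, word)] + 1) / (tag_total[tag] + v_size))

    def log_trans(prev, tag):  # add-one smoothed P(t_i | t_{i-1})
        return math.log((trans[(prev, tag)] + 1) / (prev_total[prev] + t_size))

    # Each column maps tag -> (best log probability so far, previous tag).
    columns = [{t: (log_trans(START, t) + log_emit(t, words[0]), None) for t in tags}]
    for word in words[1:]:
        column = {}
        for t in tags:
            score, prev = max((columns[-1][p][0] + log_trans(p, t), p) for p in tags)
            column[t] = (score + log_emit(t, word), prev)
        columns.append(column)

    # Follow the back-pointers from the best final tag.
    tag = max(tags, key=lambda t: columns[-1][t][0])
    path = [tag]
    for column in reversed(columns[1:]):
        tag = column[tag][1]
        path.append(tag)
    return list(reversed(path))


model = train(TRAIN)
result = viterbi(["tolong", "gedungnya", "dibersihkan"], model)
print(result)
```

Working in log space replaces the product in equation (6) with a sum, which keeps the scores numerically stable for longer sentences.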

Opinion extraction for various opinion targets is shown in Fig. 6, where the target can be selected from a combo box component.

Fig. 6. Opinion extraction without classification

The classification of opinions into two classes, positive or negative, is demonstrated in Fig. 7. The classification method can be selected from four options (NBC, SVM, MaxEnt and KM-Klus). The figure also shows that NBC classification for the target Perpustakaan (library) has an accuracy of 81.54%.

Fig. 7. Opinion extraction with classification

Performance Testing Program

Program performance is measured through the precision and recall of the HMM POS Tagger in opinion detection and target detection. Test results using 1,000 training documents and 1,000 test documents are shown in Table IV.

TABLE IV. PERFORMANCE OF HMM POS TAGGER

Performance Measurement	Application of HMM-POS Tagger
	Opinion Detection	Target Detection
Precision	0.95	0.91
Recall	0.92	0.89
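Precision and recall, as reported in Table IV, are computed from the counts of true positives, false positives, and false negatives. The short sketch below shows the calculation for a detection task such as opinion detection. The six example labels are invented for the illustration and are not the paper's test data.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for two equal-length lists of booleans."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)      # detected and correct
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)  # detected but wrong
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)  # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical labels: True means "this comment is an opinion".
actual = [True, True, True, False, False, True]
predicted = [True, True, False, True, False, True]

precision, recall = precision_recall(predicted, actual)
print(precision, recall)
```

Here three of the four detected opinions are correct and three of the four real opinions are found, so both measures come out at 0.75.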

The performance of the classification program is measured through accuracy. The comparison of the four methods is shown in Table V. The results show that the KMC method has the best performance of the four methods.

TABLE V. PERFORMANCE OF CLASSIFICATION

Performance Measurement	Classification Methods
	NBC	SVM	ME	KMC
Accuracy	0.84	0.83	0.84	0.88
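The KMC result in Table V rests on the TF-IDF weighting of equation (20) and the four-step K-means algorithm of the KM-C Method section. Both can be sketched in a few lines of plain Python. The four toy documents, the choice of seed documents, and the use of squared Euclidean distance are assumptions made for the illustration; the paper does not specify the distance measure or how seeds are chosen.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build unit-length vectors with weight (ln f_ij + 1) * log(N / n_i)."""
    n_docs = len(docs)
    vocab = sorted({w for d in docs for w in d})
    df = Counter(w for d in docs for w in set(d))  # documents containing each word
    vectors = []
    for d in docs:
        tf = Counter(d)
        raw = [(math.log(tf[w]) + 1) * math.log(n_docs / df[w]) if tf[w] else 0.0
               for w in vocab]
        norm = math.sqrt(sum(x * x for x in raw))
        vectors.append([x / norm for x in raw] if norm else raw)
    return vectors


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, seed_indices, max_iter=100):
    centers = [vectors[i][:] for i in seed_indices]  # step 1: seeds
    assignment = None
    for _ in range(max_iter):
        # Step 2: assign every vector to the closest center.
        new_assignment = [
            min(range(len(centers)), key=lambda k: sq_dist(v, centers[k]))
            for v in vectors
        ]
        # Step 4: stop when the assignment no longer changes.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 3: recompute each center as the mean of its members.
        for k in range(len(centers)):
            members = [v for v, a in zip(vectors, assignment) if a == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return assignment


# Hypothetical toy comments, already tokenized.
docs = [
    "ac kurang dingin".split(),
    "ac ruangan kurang dingin".split(),
    "dosen mengajar bagus".split(),
    "dosen mengajar sangat bagus".split(),
]
vectors = tfidf_vectors(docs)
labels = kmeans(vectors, seed_indices=[0, 2])
print(labels)
```

K-means only groups the documents; mapping each resulting cluster to a positive or negative orientation is a separate step that this sketch does not cover.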
CONCLUSION

In this study a prototype for pre-processing the collection of comments has been developed, together with a system prototype for opinion extraction and classification. The program demonstrates how to extract opinions using an HMM POS-Tagger, and it is equipped with sentiment analysis using four opinion classification methods, namely NBC, SVM, ME and KM-Clustering. Precision and recall in opinion detection and target detection average over 90%, and the accuracy of opinion classification with the four methods averages above 80%. The results also show that the KMC method has the best accuracy of the four methods.

REFERENCES

[1] Boser, B.E., Guyon, I.M. and Vapnik, V.N., 1992, "A Training Algorithm for Optimal Margin Classifiers", Proceedings of the 5th Annual ACM Workshop on Computational Learning Theory, pp. 1171-1183.
[2] Cortes, C. and Vapnik, V., 1995, "Support-Vector Networks", Machine Learning, 20, pp. 273-297.
[3] Cutting, D., Kupiec, J., Pedersen, J. and Sibun, P., 1992, "A Practical Part-of-Speech Tagger", Xerox Palo Alto Research Center, Proceedings of the Third Conference on Applied Natural Language Processing, pp. 133-140.
[4] Greaves, F., Cano, D.R., Millet, C., Darzi, A. and Donaldson, L., 2013, "Use of Sentiment Analysis for Capturing Patient Experience From Free-Text Comments", Journal of Medical Internet Research, 15:11, e239. Online publication date: 1-Jan-2013.
[5] Grimes, S., 2013, "Unstructured Data and the 80 Percent Rule", Clarabridge Bridgepoints.
[6] Liu, B., 2010, "Sentiment Analysis: A Multifaceted Problem", IEEE Intelligent Systems, 25 (3), pp. 76-80.
[7] Loia, L. and Senatore, S., 2014, "A fuzzy-oriented sentic analysis to capture the human emotion in Web-based content", Knowledge-Based Systems 58, pp. 75-85. Online publication date: 1-Mar-2014.
[8] McCallum, A. and Nigam, K., 1998, "A Comparison of Event Models for Naive Bayes Text Classification", AAAI/ICML-98 Workshop on Learning for Text Categorization, pp. 41-48.
[9] Pang, B., Lee, L. and Vaithyanathan, S., 2002, "Thumbs up?: Sentiment Classification Using Machine Learning Techniques", Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing (EMNLP '02), Volume 10, pp. 79-86.
[10] Putten, P.V.D., Kok, J. and Gupta, A., 2002, "Why the Information Explosion can be Bad for Data Mining, and How Data Fusion Provides a Way Out", Proceedings of the 2nd SIAM International Conference on Data Mining, pp. 11-13.
[11] Pisceldo, F., Manurung, R. and Adriani, M., 2009, "Probabilistic Part-of-Speech Tagging for Bahasa Indonesia", Third International MALINDO Workshop, colocated event ACL-IJCNLP, Singapore.
[12] Rozi, I.F., Pramono, S.H. and Dahlan, E.A., 2012, "Implementasi Opinion Mining (Analisis Sentimen) untuk Ekstraksi Data Opini Publik pada Perguruan Tinggi", Jurnal EECCIS, Vol. 6, No. 1, June 2012.
[13] Shelke, N.M., Deshpande, S. and Thakre, V., 2012, "Survey of Techniques for Opinion Mining", International Journal of Computer Applications (0975-8887), Volume 57, No. 13, November 2012.
[14] Wicaksono, A.F. and Purwarianti, A., 2010, "HMM Based Part-of-Speech Tagger for Bahasa Indonesia", Proceedings of the 4th International MALINDO Workshop, Jakarta.
[15] Widhiyanti, K. and Harjoko, A., 2012, "POS Tagging for Bahasa Indonesia dengan HMM dan Rule Based", INFORMATIKA, Vol. 8, No. 2, November 2012, pp. 151-167.
