- Authors : Mohammad Khoiron, Surya Sumpeno, Adhi Dharma Wibawa
- Paper ID : IJERTV5IS010616
- Volume & Issue : Volume 05, Issue 01 (January 2016)
- Published (First Online): 28-01-2016
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License: This work is licensed under a Creative Commons Attribution 4.0 International License
Microblogging Analysis for Determining Public Policy Priority based on Public Opinion using Naive Bayes and Analytical Hierarchy Process Algorithm
Mohammad Khoiron
Telematika-CIO, Electrical Engineering Department Faculty of Industrial Technology
Institut Teknologi Sepuluh Nopember Surabaya
Surya Sumpeno
Multimedia and Networking Engineering Department Faculty of Industrial Technology
Institut Teknologi Sepuluh Nopember Surabaya
Adhi Dharma Wibawa
Multimedia and Networking Engineering Department Faculty of Industrial Technology
Institut Teknologi Sepuluh Nopember Surabaya
Abstract-The main tasks of a government are making and implementing public policy and evaluating the public policies that have been made. Often these three tasks fail to satisfy the expectations of the wider community because policies are not based on the aspirations of the society the government serves. Public policy is more often determined by political considerations and the interests of certain elites.
Given these problems, a rapid and inexpensive way is needed to obtain data on what the community expects from a public policy. Such data can be obtained through microblogging analysis, by monitoring the public policy issues that people discuss on microblogging media within a certain period.
The analysis uses the Naïve Bayes algorithm to classify whether an opinion expressed by the public on microblogging media has negative, positive, or neutral sentiment. The classification results are then used to determine public policy priorities with the Analytical Hierarchy Process (AHP) algorithm, which serves as a reference for making public policy that is expected to satisfy justice and public expectations.
Keywords: Public Policy, Public Policy Priority, Sentiment Analysis, Classification, Naïve Bayes, Analytical Hierarchy Process
INTRODUCTION
A characteristic of a modern democratic society is community involvement in making public policy. This involvement starts with the government's planning and continues through the implementation of the policy. Community involvement is necessary because public policy affects people's daily lives. Therefore, a democratic government should always involve the community in determining public policy.
In Indonesia today, people appear increasingly enthusiastic about discussing the public policies produced by the government. Such enthusiasm is very positive insofar as it provides other perspectives for the benefit of society. Public debate marks the dynamics of a society. The extent of community involvement cannot be separated from the reform era, which is still unfolding with a wide range of dynamics and risks.
One medium frequently used to express public opinion is microblogging social media. Microblogging sites such as Twitter, Tumblr, and Facebook have become very popular means of communication among Internet users, with millions of messages appearing every day.
The free message format and ease of access from various platforms have led Internet users to shift from blogs and mailing lists to microblogging services. As a result, many users post about the products and services they use, express their views on politics and religion, and criticize public policies.
Twitter, a microblogging site with over 500 million users and 400 million tweets per day, allows users to share messages as short texts called tweets. Twitter can serve as a data source of opinion and public sentiment, and that data can be used efficiently for marketing or social studies.
This paper discusses Twitter sentiment analysis using the Naïve Bayes algorithm, whose results can be used as a consideration for determining public policy priorities with the Analytical Hierarchy Process algorithm, so that the resulting policies are expected to fulfill the expectations and desires of the community.
LITERATURE REVIEW
Public Policy
Public policy, as part of political decision-making, is a set of rules made by the government to solve the various problems and issues in society. Public policy is also a government decision about whether or not to take a certain action.
In a society within the jurisdiction of a country, various problems often arise, and the government, which holds full responsibility for the lives of the people, should be able to resolve them. Public policy made and issued by the state is expected to be a solution to these problems. Public policy is a decision made to overcome the problems in a particular activity undertaken by the government within the framework of governance (Mustopadidjaja, 2002).
Naïve Bayes Classifier
The Naïve Bayes classifier is an algorithm that finds the highest probability in order to assign test data to the most appropriate category (Feldman and Sanger, 2007). In this research, the test data are tweet documents. Document classification has two stages: first, training on documents whose categories are known; second, classifying documents whose categories are unknown.
In the Naïve Bayes classifier, each document is represented by a tuple of attributes (x1, x2, x3, …, xn), where x1 is the first word, x2 is the second word, and so on, while V is the set of tweet categories. During classification, the algorithm searches for the category with the highest probability among all tested document categories (V_MAP), using the following equation:
$$ V_{MAP} = \arg\max_{V_j \in V} \frac{P(x_1, x_2, x_3, \ldots, x_n \mid V_j)\, P(V_j)}{P(x_1, x_2, x_3, \ldots, x_n)} \tag{b.1} $$
Because P(x1, x2, x3, …, xn) is constant for all categories Vj, the equation can be written as follows:
$$ V_{MAP} = \arg\max_{V_j \in V} P(x_1, x_2, x_3, \ldots, x_n \mid V_j)\, P(V_j) \tag{b.2} $$
Under the naïve assumption that the words are conditionally independent given the category, the equation can be simplified as follows:
$$ V_{MAP} = \arg\max_{V_j \in V} P(V_j) \prod_{i=1}^{n} P(x_i \mid V_j) \tag{b.3} $$
Description:
Vj : tweet category, j = 1, 2, 3; in this research:
j1 : negative sentiment tweet category
j2 : positive sentiment tweet category
j3 : neutral sentiment tweet category
P(xi|Vj) : probability of xi in category Vj
P(Vj) : probability of category Vj
P(Vj) and P(xi|Vj) are calculated during training with the following equations:
$$ P(V_j) = \frac{|docs_j|}{|contoh|} \tag{b.4} $$
$$ P(x_i \mid V_j) = \frac{n_k + 1}{n + |kosakata|} \tag{b.5} $$
Description:
|docs_j| : number of documents in category j
|contoh| : number of documents in all categories ("contoh" means "examples")
n_k : number of occurrences of word x_i in the documents of category j
n : total number of word occurrences in category j
|kosakata| : number of distinct words across all categories ("kosakata" means "vocabulary")
Analytical Hierarchy Process
AHP (Analytical Hierarchy Process) is a decision support method that decomposes a complex multi-factor problem into a hierarchy, where each level is formed of specific elements. The main tool of AHP is a functional hierarchy whose main input is human perception. The hierarchy allows a complex or unstructured problem to be divided into sub-problems, which are then arranged into a hierarchical form (Kusrini, 2007).
Decision makers are involved in judging the relative importance of these factors. The overall objective of the decision is placed at the top of the hierarchy, while the criteria and decision alternatives occupy successively lower levels. The AHP stages are as follows:
The establishment of a hierarchy
A hierarchy is a tree structure used to represent how influence spreads from the goal down to the elements at the most basic level.
Pairwise Comparison
This step estimates the priority weights of a set of criteria or alternatives from a square pairwise comparison matrix A = [aij], in which every value must be positive. Each reverse comparison is the reciprocal of the original judgment: aji = 1/aij for all i, j = 1, 2, 3, …, n.
The final normalized weight wi of the i-th factor is as follows:
$$ w_i = \frac{1}{n} \sum_{j=1}^{n} \frac{a_{ij}}{\sum_{k=1}^{n} a_{kj}}, \qquad i = 1, 2, \ldots, n \tag{b.6} $$
Pairwise comparison assesses the relative degree of importance of one element compared with another. The comparison scale used in AHP according to Kusrini is as follows:
| Value | Description |
|---|---|
| 1 | Criterion / alternative A is as important as criterion / alternative B |
| 3 | A is slightly more important than B |
| 5 | A is clearly more important than B |
| 7 | A is very clearly more important than B |
| 9 | A is absolutely more important than B |
| 2, 4, 6, 8 | Used when hesitating between two adjacent values |
Table 1. Comparison Scale (Source: Kusrini, 2007:134)
Consistency checking
This step checks whether the pairwise comparisons on which a decision is based remain within specified limits. The measure of consistency, or of deviation from consistency, is called the consistency index (CI), defined as follows:
$$ CI = \frac{\lambda_{max} - n}{n - 1} \tag{b.7} $$
where λmax is the maximum eigenvalue of the pairwise comparison matrix and n is its order.
The consistency index of a randomly generated reciprocal comparison matrix with values on the 1-9 scale, for each matrix size, is called the Random Index (RI), shown in the following table:
| Matrix order | RI value |
|---|---|
| 1, 2 | 0.00 |
| 3 | 0.58 |
| 4 | 0.90 |
| 5 | 1.12 |
| 6 | 1.24 |
| 7 | 1.32 |
| 8 | 1.41 |
| 9 | 1.45 |
| 10 | 1.49 |
Table 2. List of Random Index (Source: Kusrini, 2007:136)
The consistency ratio (CR) is defined as the ratio between CI and RI for a matrix of the same order:
$$ CR = \frac{CI}{RI} \tag{b.8} $$
If CR ≤ 0.1, the comparisons are acceptable. If CR is greater than 0.1, the decision maker needs to review the judgments.
Overall weight evaluation
The weight obtained for each criterion is multiplied by the value of that criterion for each alternative; the best alternative is the one with the highest resulting priority.
Group decision-making / establishing policies
To produce a group decision, each member of the group makes their own judgments according to their own model, and the results are then combined.
METHODOLOGY
Flowchart of Research Methodology
Picture 2. Flowchart of Research Methodology
Determining Public Policy
Public policy covers a very broad sphere, so an example set of public policies is needed to simulate prioritization based on public opinion from Twitter. In this research, the public policies to be prioritized are the MDGs (Millennium Development Goals), which consist of eight goals.
Determining Keywords
After determining the public policies to be prioritized, the next step is selecting keywords that represent each policy. The keywords are used to search Twitter for public opinion that is expected to match the corresponding policy. The keywords and the public policies they represent are listed below in Bahasa Indonesia, with English translations in parentheses:
| No | Public Policy based on MDGs | Keywords |
|---|---|---|
| 1 | Memberantas Kemiskinan dan Kelaparan Ekstrem (Eradicate Extreme Poverty and Hunger) | Kemiskinan (poverty); Kelaparan (hunger) |
| 2 | Mewujudkan Pendidikan Dasar untuk Semua (Achieve Universal Primary Education) | Pendidikan (education); Buta huruf (illiteracy) |
| 3 | Mendorong Kesetaraan Gender dan Pemberdayaan Perempuan (Promote Gender Equality and Empower Women) | Kesetaraan gender (gender equality); Pemberdayaan perempuan (women's empowerment) |
| 4 | Menurunkan Angka Kematian Anak (Reduce Child Mortality) | Kematian bayi (infant mortality); Imunisasi (immunization) |
| 5 | Meningkatkan Kesehatan Ibu (Improve Maternal Health) | Kesehatan ibu (maternal health); Kesehatan reproduksi (reproductive health) |
| 6 | Memerangi HIV dan AIDS Malaria Serta Penyakit Lainnya (Combat HIV/AIDS, Malaria, and Other Diseases) | Cegah HIV (prevent HIV); Cegah penyakit (prevent disease) |
| 7 | Memastikan Kelestarian Lingkungan (Ensure Environmental Sustainability) | Keanekaragaman hayati (biodiversity); Kelestarian lingkungan (environmental sustainability) |
| 8 | Mengembangkan Kemitraan Global untuk Pembangunan (Develop a Global Partnership for Development) | Akses internet (Internet access); Perdagangan bebas (free trade) |
Table 3. List of Keywords
Data Harvesting
Tweet data were harvested using the Twitter Streaming APIs. Public opinion on Twitter was searched for and collected over two months based on the predefined keywords, and the harvested data were stored in a database.
Pre Processing
Before feature selection, the harvested tweet data must be preprocessed so that the sentiment analysis gives more accurate results. Preprocessing includes:
Cleansing
Removal of URLs, @mentions, #hashtags, and delimiters (punctuation and other non-alphanumeric symbols).
Case Folding
At this stage, all uppercase characters are converted to lowercase.
Parsing
At this stage, a tweet or sentence is separated into words.
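The three preprocessing steps above can be sketched in Python as follows. This is a minimal sketch, not the authors' implementation: the regular expressions are assumptions, and the symbol filter keeps only ASCII letters, digits, and whitespace, which suits Indonesian text but would drop accented characters.

```python
import re

# Assumed patterns: the paper lists what is removed, not the exact rules.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#\w+")
SYMBOL_RE = re.compile(r"[^0-9A-Za-z\s]")  # anything but ASCII letters, digits, whitespace


def cleanse(tweet: str) -> str:
    """Cleansing: remove URLs, @mentions, #hashtags, then remaining symbols."""
    for pattern in (URL_RE, MENTION_RE, HASHTAG_RE, SYMBOL_RE):
        tweet = pattern.sub(" ", tweet)
    return tweet


def case_fold(text: str) -> str:
    """Case folding: convert all uppercase characters to lowercase."""
    return text.lower()


def parse(text: str) -> list[str]:
    """Parsing: split the sentence into words on whitespace."""
    return text.split()


def preprocess(tweet: str) -> list[str]:
    return parse(case_fold(cleanse(tweet)))


if __name__ == "__main__":
    print(preprocess("Angka KEMISKINAN naik lagi?! @menteri #kemiskinan http://t.co/abc"))
    # ['angka', 'kemiskinan', 'naik', 'lagi']
```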
Feature Selection
Feature selection is done before learning and classification. This stage has two processes:
Stop Word Removal
Elimination of words that are not characteristic (unique) of a document (e.g., "di", "oleh", "pada", "sebuah", "karena").
Stemming
Mapping and decomposing the various forms (variants) of a word to its base word (stem) by removing affixes, whether prefixes, suffixes, or infixes, from each word.
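A minimal sketch of this feature-selection stage, assuming tokens produced by the preprocessing step. The stop-word set holds only the five examples quoted above (the full list is not reproduced here), and `stem` is a hypothetical pluggable parameter: a real Indonesian stemmer, such as one based on the Nazief-Adriani algorithm, would be passed in, while the default leaves words unchanged.

```python
from typing import Callable, Iterable

# Illustrative stop words: only the examples quoted in the text.
STOP_WORDS = {"di", "oleh", "pada", "sebuah", "karena"}


def remove_stop_words(tokens: Iterable[str], stop_words: set[str] = STOP_WORDS) -> list[str]:
    """Drop every token that appears in the stop-word set."""
    return [t for t in tokens if t not in stop_words]


def select_features(
    tokens: Iterable[str],
    stem: Callable[[str], str] = lambda word: word,
) -> list[str]:
    """Stop-word removal followed by stemming of each remaining token.

    `stem` is a placeholder for an Indonesian stemmer; the default is the
    identity function, so words are returned unchanged.
    """
    return [stem(t) for t in remove_stop_words(tokens)]


if __name__ == "__main__":
    tokens = ["angka", "kemiskinan", "naik", "karena", "harga", "di", "pasar"]
    print(select_features(tokens))
    # ['angka', 'kemiskinan', 'naik', 'harga', 'pasar']
```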
Learning and Classification
After feature selection, learning and classification are performed using the Naïve Bayes algorithm in two stages:
First stage
Training on tweet documents whose categories (negative, positive, or neutral sentiment) are known.
Second stage
Classifying tweet documents whose categories (negative, positive, or neutral sentiment) are unknown.
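The two stages map onto `fit` (training, equations (b.4) and (b.5)) and `predict` (classification, equation (b.3)) in the following sketch of a multinomial Naïve Bayes classifier. The class and method names are hypothetical. Log-probabilities replace the product in (b.3) to avoid numeric underflow, which does not change the argmax, and words never seen during training are simply skipped, an assumption the paper does not discuss.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTweetClassifier:
    """Multinomial Naive Bayes following equations (b.3)-(b.5)."""

    def fit(self, documents: list[list[str]], labels: list[str]) -> "NaiveBayesTweetClassifier":
        self.vocabulary = {word for doc in documents for word in doc}
        doc_counts = Counter(labels)   # |docs_j| for each category
        total_docs = len(documents)    # |contoh|
        # (b.4): P(V_j) = |docs_j| / |contoh|, stored as a logarithm
        self.log_prior = {c: math.log(doc_counts[c] / total_docs) for c in doc_counts}

        word_counts: dict[str, Counter] = defaultdict(Counter)
        for doc, label in zip(documents, labels):
            word_counts[label].update(doc)

        self.log_likelihood: dict[str, dict[str, float]] = {}
        for c in doc_counts:
            n = sum(word_counts[c].values())      # all word occurrences in category c
            denom = n + len(self.vocabulary)      # n + |kosakata|
            # (b.5): P(x_i | V_j) = (n_k + 1) / (n + |kosakata|)
            self.log_likelihood[c] = {
                w: math.log((word_counts[c][w] + 1) / denom) for w in self.vocabulary
            }
        return self

    def predict(self, document: list[str]) -> str:
        # (b.3) in log space: argmax_j [log P(V_j) + sum_i log P(x_i | V_j)];
        # words not seen during training are skipped.
        scores = {
            c: self.log_prior[c]
            + sum(self.log_likelihood[c][w] for w in document if w in self.vocabulary)
            for c in self.log_prior
        }
        return max(scores, key=scores.get)


if __name__ == "__main__":
    docs = [["harga", "naik", "mahal"], ["bantuan", "bagus", "terima", "kasih"],
            ["rapat", "hari", "ini"], ["mahal", "susah"], ["bagus", "program"]]
    labels = ["negative", "positive", "neutral", "negative", "positive"]
    clf = NaiveBayesTweetClassifier().fit(docs, labels)
    print(clf.predict(["harga", "mahal"]))  # negative
```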
Validation and Evaluation
This stage validates and evaluates how accurately the Naïve Bayes learning and classification perform.
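The paper does not state whether accuracy was measured on a hold-out set or by cross-validation, so the sketch below only shows the metrics themselves: overall accuracy and a confusion matrix of manual labels versus predicted labels. The function names are illustrative.

```python
from collections import Counter


def accuracy(actual: list[str], predicted: list[str]) -> float:
    """Share of tweets whose predicted sentiment equals the manual label."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("expected two non-empty label lists of equal length")
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)


def confusion_matrix(actual: list[str], predicted: list[str]) -> Counter:
    """Count how often each (manual label, predicted label) pair occurs."""
    return Counter(zip(actual, predicted))


if __name__ == "__main__":
    actual = ["negative", "positive", "neutral", "negative"]
    predicted = ["negative", "neutral", "neutral", "negative"]
    print(accuracy(actual, predicted))  # 0.75
    print(confusion_matrix(actual, predicted))
```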
Determining Priorities of Policy
Using the tweet sentiment obtained with the Naïve Bayes algorithm, the next process is determining public policy priorities with the Analytical Hierarchy Process (AHP) algorithm. For each public policy, the hierarchy is formed from the number of positive sentiment tweets, the number of negative sentiment tweets, the number of neutral tweets, the number of retweets, the number of tweets in the form of questions, and the number of tweets that are neither retweets nor questions (direct tweets).
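The six counts per policy can be tallied from tweets that already carry a sentiment label and a tweet type. In this sketch the record type `LabeledTweet` and the label strings are hypothetical; each tweet adds one to its sentiment (negative, positive, or neutral) and one to its type (direct, retweet, or question).

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class LabeledTweet(NamedTuple):
    policy: str      # policy code, e.g. "A1"
    sentiment: str   # "negative", "positive" or "neutral"
    tweet_type: str  # "direct", "retweet" or "question"


# Criterion order of the hierarchy: negative, positive, neutral, direct, retweet, question.
CRITERIA = ("negative", "positive", "neutral", "direct", "retweet", "question")


def criteria_counts(tweets: list[LabeledTweet]) -> dict[str, dict[str, int]]:
    """Per-policy counts of the six criteria used in the AHP hierarchy."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for t in tweets:
        counts[t.policy][t.sentiment] += 1
        counts[t.policy][t.tweet_type] += 1
    # Counter returns 0 for criteria that never occurred for a policy.
    return {policy: {k: c[k] for k in CRITERIA} for policy, c in counts.items()}
```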
D. RESULT AND ANALYSIS
Data Harvesting
Tweet data were collected between June and July 2015, with the following results:
| No. | Keyword | Number of tweets |
|---|---|---|
| 1 | Kemiskinan | 50226 |
| | Kelaparan | 72404 |
| 2 | Pendidikan | 96060 |
| | Buta huruf | 10350 |
| 3 | Kesetaraan gender | 1636 |
| | Pemberdayaan perempuan | 7444 |
| 4 | Kematian bayi | 1766 |
| | Imunisasi | 11620 |
| 5 | Kesehatan ibu | 3609 |
| | Kesehatan reproduksi | 2712 |
| 6 | Cegah HIV | 345 |
| | Cegah penyakit | 4082 |
| 7 | Keanekaragaman hayati | 2509 |
| | Kelestarian lingkungan | 1703 |
| 8 | Akses internet | 12804 |
| | Perdagangan bebas | 2854 |
| | Total tweets | 282124 |
Table 4. Number of tweets on each keyword
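Summing Table 4 by policy makes the tweet volume behind each MDG goal explicit. The sketch below uses the codes A1-A8 from Table 6, which follow the numbering of Table 3; the dictionary layout is an illustrative choice.

```python
# Tweet counts from Table 4, grouped by policy code (A1-A8 follow Table 3's numbering).
TWEETS_PER_KEYWORD: dict[str, dict[str, int]] = {
    "A1": {"Kemiskinan": 50226, "Kelaparan": 72404},
    "A2": {"Pendidikan": 96060, "Buta huruf": 10350},
    "A3": {"Kesetaraan gender": 1636, "Pemberdayaan perempuan": 7444},
    "A4": {"Kematian bayi": 1766, "Imunisasi": 11620},
    "A5": {"Kesehatan ibu": 3609, "Kesehatan reproduksi": 2712},
    "A6": {"Cegah HIV": 345, "Cegah penyakit": 4082},
    "A7": {"Keanekaragaman hayati": 2509, "Kelestarian lingkungan": 1703},
    "A8": {"Akses internet": 12804, "Perdagangan bebas": 2854},
}


def policy_totals(table: dict[str, dict[str, int]]) -> dict[str, int]:
    """Sum the keyword counts of each policy."""
    return {policy: sum(keywords.values()) for policy, keywords in table.items()}


def rank_by_volume(table: dict[str, dict[str, int]]) -> list[str]:
    """Policy codes ordered from most to fewest collected tweets."""
    totals = policy_totals(table)
    return sorted(totals, key=totals.get, reverse=True)


if __name__ == "__main__":
    totals = policy_totals(TWEETS_PER_KEYWORD)
    print(totals, sum(totals.values()))
    print(rank_by_volume(TWEETS_PER_KEYWORD))
```

Running it gives 122630 tweets for A1 and 106410 for A2, a grand total of 282124 as in the table, and the volume order A1, A2, A8, A4, A3, A5, A6, A7. This is raw tweet volume only, not an AHP result.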
Training Data
From the harvested tweets, 3,000 tweets were taken as training data. The training tweets were sampled according to each keyword's share of the collected tweets so that every keyword is represented. The training data were then labeled manually as having negative, positive, or neutral sentiment.
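The paper does not say how each keyword's share of the 3,000 training tweets was rounded. One reasonable sketch, assuming largest-remainder rounding, allocates quotas proportional to the Table 4 counts so that they sum exactly to 3,000; the function name is hypothetical.

```python
import math

# Keyword counts from Table 4.
KEYWORD_COUNTS = {
    "Kemiskinan": 50226, "Kelaparan": 72404, "Pendidikan": 96060, "Buta huruf": 10350,
    "Kesetaraan gender": 1636, "Pemberdayaan perempuan": 7444, "Kematian bayi": 1766,
    "Imunisasi": 11620, "Kesehatan ibu": 3609, "Kesehatan reproduksi": 2712,
    "Cegah HIV": 345, "Cegah penyakit": 4082, "Keanekaragaman hayati": 2509,
    "Kelestarian lingkungan": 1703, "Akses internet": 12804, "Perdagangan bebas": 2854,
}


def proportional_quota(counts: dict[str, int], sample_size: int) -> dict[str, int]:
    """Split `sample_size` across keywords in proportion to their tweet counts.

    Largest-remainder rounding: take the floor of each exact share, then give
    the leftover units to the keywords with the largest fractional parts, so
    the quotas sum exactly to `sample_size`.
    """
    total = sum(counts.values())
    exact = {k: sample_size * c / total for k, c in counts.items()}
    quota = {k: math.floor(v) for k, v in exact.items()}
    leftover = sample_size - sum(quota.values())
    by_remainder = sorted(exact, key=lambda k: exact[k] - quota[k], reverse=True)
    for k in by_remainder[:leftover]:
        quota[k] += 1
    return quota


if __name__ == "__main__":
    quota = proportional_quota(KEYWORD_COUNTS, 3000)
    print(quota, sum(quota.values()))
```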
Pre Processing
The 3,000 training tweets go through the following preprocessing stages:
- Cleansing
- Case Folding
- Parsing
Feature Selection
Next, features are selected from the preprocessed training data. This stage consists of:
Stop Word Removal
A list of words that carry no meaning is removed from each training tweet document. The list was taken from the research of Tala (Tala, F. Z., 2003).
Stemming
After stop word removal, the training tweet documents are stemmed with the Sastrawi PHP library, which is based on the stemming algorithm of Nazief and Adriani.
Learning and Classification
After feature selection, learning and classification are performed with the Naïve Bayes algorithm in two stages:
First stage
Using the WEKA software, the Naïve Bayes algorithm with TF-IDF features was trained on the training tweets whose categories are known, obtaining an accuracy of 73.8%.
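For reference, a minimal TF-IDF weighting is sketched below, using raw term frequency times ln(N/df). This is only one common variant: WEKA's StringToWordVector filter has its own TF and IDF transform options, so the exact feature values used in the paper may differ.

```python
import math
from collections import Counter


def tf_idf(documents: list[list[str]]) -> list[dict[str, float]]:
    """Weight each word of each document by tf * ln(N / df).

    tf: occurrences of the word in the document; df: number of documents
    containing the word; N: number of documents. A word that occurs in
    every document gets weight 0.
    """
    n_docs = len(documents)
    df = Counter(word for doc in documents for word in set(doc))
    vectors = []
    for doc in documents:
        tf = Counter(doc)
        vectors.append({w: tf[w] * math.log(n_docs / df[w]) for w in tf})
    return vectors


if __name__ == "__main__":
    docs = [["harga", "naik"], ["harga", "turun"], ["program", "bagus"]]
    for vector in tf_idf(docs):
        print(vector)
```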
Second stage
Next, the tweet documents of unknown category are classified.
Direct tweets, retweets, and question tweets are identified by filtering on the strings '?' and 'RT @'.
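Presumably this filtering runs on the raw tweet text, since cleansing removes both '?' and @mentions. A sketch of the rule, assuming that the retweet check takes precedence over the question check (the paper does not state an order):

```python
def tweet_type(text: str) -> str:
    """Classify a raw tweet as 'retweet', 'question' or 'direct'.

    Assumption: a retweet that also contains '?' is counted as a retweet.
    """
    if "RT @" in text:
        return "retweet"
    if "?" in text:
        return "question"
    return "direct"


if __name__ == "__main__":
    for t in ["RT @user: kapan subsidi cair?", "Kapan bantuan datang?", "Harga beras naik"]:
        print(t, "->", tweet_type(t))
```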
The overall classification and filtering results are shown in the following table:
Table 5. Classification and Filtering Results
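| Code | Public policy alternative |
|---|---|
| A1 | Memberantas Kemiskinan dan Kelaparan Ekstrem (Eradicate Extreme Poverty and Hunger) |
| A2 | Mewujudkan Pendidikan Dasar untuk Semua (Achieve Universal Primary Education) |
| A3 | Mendorong Kesetaraan Gender dan Pemberdayaan Perempuan (Promote Gender Equality and Empower Women) |
| A4 | Menurunkan Angka Kematian Anak (Reduce Child Mortality) |
| A5 | Meningkatkan Kesehatan Ibu (Improve Maternal Health) |
| A6 | Memerangi HIV dan AIDS Malaria Serta Penyakit Lainnya (Combat HIV/AIDS, Malaria, and Other Diseases) |
| A7 | Memastikan Kelestarian Lingkungan (Ensure Environmental Sustainability) |
| A8 | Mengembangkan Kemitraan Global untuk Pembangunan (Develop a Global Partnership for Development) |
Table 6. Policy Priority Alternatives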
Determination of Public Policy Priorities
Using the data obtained in the preceding stage, public policy priorities are determined with the Analytical Hierarchy Process (AHP) algorithm in the following steps:
The establishment of a hierarchy
Picture 7. The Establishment of A Hierarchy
Description:
K1 = number of negative tweets
K2 = number of positive tweets
K3 = number of neutral tweets
K4 = number of direct tweets
K5 = number of retweets
K6 = number of question tweets
Pairwise Comparison
The main objective of this study is to rank public policies according to public opinion, placing first the policies that receive the most negative opinion. The pairwise comparison matrix is therefore built with the following criteria:
- The number of negative tweets is slightly more important than the number of positive tweets.
- The number of negative tweets is slightly more important than the number of neutral tweets.
- The number of negative tweets is slightly more important than the number of direct tweets.
- The number of negative tweets is slightly more important than the number of retweets.
- The number of negative tweets is slightly more important than the number of question tweets.
- The number of positive tweets is slightly more important than the number of neutral tweets.
- The number of direct tweets is slightly more important than the number of question tweets.
With reference to Saaty's 1-9 scale (Thomas L. Saaty), the pairwise comparison matrix is constructed as shown in the table:
Table 7. Pairwise Comparison Matrix
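Table 7 itself is not reproduced here. As a sketch, the judgments listed above can be turned into a reciprocal matrix using value 3 ("slightly more important") from Table 1, under the assumption that every pair not listed is judged equal (1); the authors' matrix may differ in those cells. The names `JUDGMENTS` and `reciprocal_matrix` are hypothetical, and exact fractions keep the property a_ji = 1/a_ij exact.

```python
from fractions import Fraction

CRITERIA = ["K1", "K2", "K3", "K4", "K5", "K6"]

# Judgments stated in the text; "slightly more important" = 3 on Saaty's scale.
JUDGMENTS = {
    ("K1", "K2"): 3, ("K1", "K3"): 3, ("K1", "K4"): 3, ("K1", "K5"): 3, ("K1", "K6"): 3,
    ("K2", "K3"): 3,
    ("K4", "K6"): 3,
}


def reciprocal_matrix(items: list[str], judgments: dict[tuple[str, str], int]) -> list[list[Fraction]]:
    """Build a pairwise comparison matrix: a_ii = 1, a_ij from the judgments,
    a_ji = 1 / a_ij, and 1 (equal importance) for every unlisted pair."""
    index = {name: i for i, name in enumerate(items)}
    n = len(items)
    a = [[Fraction(1)] * n for _ in range(n)]
    for (row_name, col_name), value in judgments.items():
        i, j = index[row_name], index[col_name]
        a[i][j] = Fraction(value)
        a[j][i] = 1 / Fraction(value)
    return a


if __name__ == "__main__":
    for name, row in zip(CRITERIA, reciprocal_matrix(CRITERIA, JUDGMENTS)):
        print(name, [str(x) for x in row])
```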
Pairwise Comparison Matrix Normalization
The next phase normalizes the pairwise comparison matrix by dividing each value in a column by the sum of that column.
Table 8. Pairwise Comparison Matrix Normalization
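The normalization just described, followed by averaging each row as in equation (b.6), can be sketched as follows; the function name is illustrative. For a perfectly consistent matrix built from known weights, it recovers those weights exactly, which makes a handy check.

```python
def ahp_weights(matrix: list[list[float]]) -> list[float]:
    """Priority weights by equation (b.6): divide each entry by its column
    sum, then average each row of the normalized matrix."""
    n = len(matrix)
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    normalized = [[matrix[i][j] / col_sums[j] for j in range(n)] for i in range(n)]
    return [sum(row) / n for row in normalized]


if __name__ == "__main__":
    # Consistent matrix a_ij = w_i / w_j built from weights 0.5, 0.3, 0.2.
    consistent = [[1, 5 / 3, 5 / 2], [3 / 5, 1, 3 / 2], [2 / 5, 2 / 3, 1]]
    print(ahp_weights(consistent))
```

For this example the function returns 0.5, 0.3, and 0.2, up to floating-point rounding.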
Consistency Ratio Checking (CR)
A consistency check is required to see whether the pairwise comparison matrix we have created has consistent values. The matrix is consistent if CR <= 0.1.
Maximum eigenvalue:
λmax = 6.2889
Consistency Index (CI):
CI = (λmax - n)/(n - 1) = 0.057777778
Consistency Ratio (CR):
The RI value is taken from the Random Index table; for a matrix of order 6, RI = 1.24.
CR = CI/RI = 0.046594982 (CR <= 0.1, so the matrix is consistent)
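The paper does not state how λmax was obtained. The sketch below uses the common approximation λmax ≈ mean of (Aw)_i / w_i, then applies equations (b.7) and (b.8) with the Random Index values of Table 2; the function names are illustrative. Plugging in the reported λmax = 6.2889 with n = 6 gives CI ≈ 0.05778 and CR ≈ 0.0466, matching the reported values to the precision of the rounded λmax.

```python
# Random Index values from Table 2, keyed by matrix order.
RANDOM_INDEX = {1: 0.00, 2: 0.00, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def lambda_max(matrix: list[list[float]], weights: list[float]) -> float:
    """Approximate the principal eigenvalue as the mean of (A w)_i / w_i."""
    n = len(matrix)
    aw = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    return sum(aw[i] / weights[i] for i in range(n)) / n


def consistency(lmax: float, n: int) -> tuple[float, float]:
    """Return (CI, CR) from equations (b.7) and (b.8)."""
    if n <= 2:
        return 0.0, 0.0  # orders 1 and 2 are always consistent (RI = 0)
    ci = (lmax - n) / (n - 1)
    return ci, ci / RANDOM_INDEX[n]


if __name__ == "__main__":
    ci, cr = consistency(6.2889, 6)  # reported maximum eigenvalue, order 6
    print(f"CI = {ci:.5f}, CR = {cr:.4f}, consistent: {cr <= 0.1}")
```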
Weight Evaluation
Table 9. Weight Evaluation Table
Based on the weight evaluation, the priority order of public policies obtained with the Analytical Hierarchy Process (AHP) algorithm is as follows:
1. Eradicate Extreme Poverty and Hunger
2. Achieve Universal Primary Education
3. Global Partnership for Development
4. Reduce Child Mortality
5. Promote Gender Equality and Empower Women
6. Improve Maternal Health
7. Combat HIV/AIDS, Malaria, and Other Diseases
8. Ensure Environmental Sustainability
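Table 9 is not reproduced here, so the sketch below uses hypothetical counts to show the overall weight evaluation: each criterion weight is multiplied by the alternative's local priority under that criterion, and the products are summed. Taking an alternative's share of a criterion's total count as its local priority is an assumption; the paper does not state how the tweet counts were converted. All function names are illustrative.

```python
def local_priorities_from_counts(counts: dict[str, dict[str, int]]) -> dict[str, dict[str, float]]:
    """Assumed local priority: an alternative's share of each criterion's total count."""
    criteria = list(next(iter(counts.values())))
    totals = {k: sum(alt[k] for alt in counts.values()) for k in criteria}
    return {
        name: {k: (alt[k] / totals[k] if totals[k] else 0.0) for k in criteria}
        for name, alt in counts.items()
    }


def overall_priorities(weights: dict[str, float],
                       local: dict[str, dict[str, float]]) -> dict[str, float]:
    """Weighted sum over criteria of each alternative's local priorities."""
    return {name: sum(weights[k] * scores[k] for k in weights) for name, scores in local.items()}


def rank(scores: dict[str, float]) -> list[str]:
    """Alternatives ordered from highest to lowest overall priority."""
    return sorted(scores, key=scores.get, reverse=True)


if __name__ == "__main__":
    weights = {"K1": 0.6, "K2": 0.4}  # hypothetical criterion weights
    counts = {"A1": {"K1": 60, "K2": 10},  # hypothetical tweet counts
              "A2": {"K1": 30, "K2": 40},
              "A3": {"K1": 10, "K2": 50}}
    scores = overall_priorities(weights, local_priorities_from_counts(counts))
    print(scores, rank(scores))
```

With these hypothetical inputs, A1 scores 0.40, A2 0.34, and A3 0.26, so the ranking is A1, A2, A3.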
CONCLUSION
This research shows that microblogging analysis can be taken into consideration when determining public policy priorities that are closer to the aspirations and desires of the community.
It can also be seen that the public readily expresses opinions on matters that affect their daily lives, as shown by poverty and education ranking first and second in the number of tweets collected, which corresponds to their first and second places in the public policy priority ranking.
The data presented in this research are preliminary results that could still be improved. The authors intend to improve classification accuracy by using other methods and better features.
BIBLIOGRAPHY
Berry, M. W. & Kogan, J. 2010. Text Mining: Applications and Theory. Wiley: United Kingdom.
Feldman, R. & Sanger, J. 2007. The Text Mining Handbook: Advanced Approaches in Analyzing Unstructured Data. Cambridge University Press: New York.
Han, J. & Kamber, M. 2006. Data Mining: Concepts and Techniques, Second Edition. Morgan Kaufmann: San Francisco.
Kusrini. 2007. Konsep dan Aplikasi Sistem Pendukung Keputusan. Andi: Yogyakarta.
Nazief & Adriani. 1996. Confix Stripping: Approach to Stemming Algorithm for Bahasa Indonesia. Technical report, Faculty of Computer Science, University of Indonesia, Depok.
Pang, B., Lee, L. & Vaithyanathan, S. 2002. Thumbs up? Sentiment Classification Using Machine Learning Techniques. In Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, pp. 79-86. Stroudsburg: Association for Computational Linguistics.
Prasad, S. 2011. Micro-blogging Sentiment Analysis Using Bayesian Classification Methods.
Sunni, I. & Widyantoro, D. H. 2012. Analisis Sentimen dan Ekstraksi Topik Penentu Sentimen pada Opini Terhadap Tokoh Publik.
Tala, F. Z. 2003. A Study of Stemming Effects on Information Retrieval in Bahasa Indonesia. M.Sc. thesis, Master of Logic Project, Institute for Logic, Language and Computation, Universiteit van Amsterdam, The Netherlands.
Wang, A. H. 2010. Don't Follow Me: Twitter Spam Detection. Proceedings of the 5th International Conference on Security and Cryptography (SECRYPT), Athens, 2010, pp. 1-10. California: IEEE.