Sentiment Analysis on Review Data of MyHRMIS Mobile Using Lexicon-based Approach

DOI : 10.17577/IJERTV12IS120051


R Ismai, NN Zulkifli, Mohd Saufi MR
Faculty of Humanities, Management and Science, Universiti Putra Malaysia Bintulu Sarawak Campus
Nyabau Road, P.O. Box 396, 97008 Bintulu, Sarawak, Malaysia

Abstract: This paper reports user sentiment towards the MyHRMIS Mobile application using a lexicon-based approach. A total of 2184 reviews were scraped from the Google Play Store and App Store. Following pre-processing, a cleaned dataset of 2144 reviews was retained. Two lexicon-based approaches, the VADER method and a SentiWordNet-based approach, were used to label each review as either positive or negative. SVM was then applied to the labeled dataset to test and compare the performance of the two methods. The performance evaluation shows that the VADER method outperforms the SentiWordNet-based approach, with an accuracy of 91.39%, a precision of 91.61%, and a recall of 98.65%. The VADER method also classified substantial volumes of data quickly and efficiently. Hence, we suggest that the VADER method, rather than the SentiWordNet-based approach, can be used to extract sentiment from MyHRMIS Mobile application review data.

Keywords: Sentiment Analysis; VADER; SentiWordNet; SVM; MyHRMIS Mobile

  1. INTRODUCTION

    Malaysia's e-Government started with the introduction of the Multimedia Super Corridor (MSC) in 1996, with the goal of transforming administrative processes and service delivery using ICT and multimedia. Seven (7) projects were implemented under the MSC: e-Services, ELX, e-Syariah, ePerolehan, HRMIS, SPP II and GEO [1].

    Human Resource Management Information System (HRMIS) is a technological solution designed to streamline and enhance the functions and operations of the human resource department. By automating key procedures, the HRMIS facilitates the efficient administration of human resources within an organization. The HRMIS represents a convergence of human resource management and information technology. HRMIS has been implemented in Malaysian government agencies to keep up with the latest technological trends [2]. The HRMIS aims to link various government entities through the utilization of centralized databases.

    The Public Service Department (PSD) has made significant progress in the field of mobile applications since 2014. This includes the introduction of MyHRMIS Profile and MyHRMIS eGL in 2015, followed by MyHRMIS Self Check-in in 2016. In 2017, PSD introduced MyHRMIS Care, and most recently, in 2019, MyHRMIS Keluar Pejabat.

    Following the principles of Speed, Integrity, and Professionalism, and considering user opinions and suggestions, the PSD has undertaken a rebranding initiative for its six (6) MyHRMIS Mobile applications. This effort involves the development of a new mobile application, called MyHRMIS Mobile, which consolidates all the functionalities previously offered by the individual MyHRMIS applications. The MyHRMIS Mobile application has been enhanced to provide a more refined user experience, characterized by professionalism, simplicity, attractiveness, and organization. Despite the government's efforts to encourage the use of HRMIS, the rate of adoption among government servants is still low [1], [3].

    This situation motivates researchers to take the application as an object of study, given its extensive use among Malaysian government servants. Many government servants write reviews of the MyHRMIS Mobile application on the Google Play Store and App Store based on their experience. This study examines the concerns raised in those reviews on both stores. Furthermore, the study aims to assess the performance of the sentiment analysis outcomes using Support Vector Machines (SVM).

    The primary objective of this research is to assist the Public Service Department and the Malaysian government in understanding government servants' positive and negative perceptions of the MyHRMIS Mobile application. Additionally, this study aims to provide empirical evidence that can contribute to existing theories and serve as a valuable resource for future research in this field.

  2. LITERATURE REVIEW

    1. Sentiment analysis

      Sentiment analysis is an approach that examines the perceptions and characteristics of a certain group in order to interpret the standing and review of content [4]. It pertains to the examination of consumer reviews and information, encompassing both subjective and objective phrases [5]. The examination of consumer opinions on the rating and review yields sentiments that can be categorized as positive, negative, or neutral [6].

      Sentiment analysis is commonly conducted using machine learning techniques, text mining methodologies, natural language processing (NLP) algorithms, and classification models. Sentiment analysis is applied in industry and digital business, as evidenced by studies of ratings and reviews on the Google Play Store [6]–[8] and of comments on Amazon product purchases [9], [10]. These sources provide insights into the opinions and reactions of users regarding related objects and real-world services, which can assist management in making crucial decisions about sales, marketing, and customer support.

      Broadly, there are three approaches to detecting and classifying emotions expressed in text: lexicon-based approaches, machine learning-based approaches, and hybrid strategies. The lexicon-based approach makes use of word polarity, whereas machine learning approaches treat sentiment detection as a classification problem and can be further divided into unsupervised, semi-supervised, and supervised learning [11].

      Many factors influence sentiment analysis performance, such as the quality of the input dataset and the text categorization. The effectiveness of a lexicon-based approach relies on how the lexical resource assigns polarity to each sentence. Several lexical resources are available for the English language, including VADER, SentiWordNet, WordNet-Affect, MPQA, and SenticNet [12]. Lexicon resources commonly distinguish three polarities: positive, negative, and neutral. For simplicity, this study uses only the positive and negative categories.

    2. Valence Aware Dictionary for sEntiment Reasoning (VADER) Method

      VADER is a sentiment analysis tool that operates on a lexicon and rule-based approach [13]. It can effectively process various linguistic elements such as words, abbreviations, slang, emoticons, and emojis that are frequently encountered on social media platforms [14]. It is typically much quicker than machine learning algorithms because no training is required [13]. For every textual document it generates a vector of sentiment ratings that represent negative, neutral, positive, and compound polarities [14].

      The VADER method utilizes both qualitative and quantitative techniques to construct and verify a sentiment lexicon that is tailored to certain contexts [15]. This combination has the potential to improve the accuracy of sentiment analysis models across several domains, such as social media, film reviews, and product reviews [16]. The negative, neutral, and positive scores each fall within the range of 0 to 1. The compound score can be viewed as an overall assessment that combines the other scores, normalized to range between -1 (most negative) and 1 (most positive).

      The efficacy, user-friendliness, and efficiency of the VADER method in sentiment classification of Twitter data have been demonstrated [16]. A recent study on the sentiment analysis of Bitcoin, a digital asset, reveals a correlation between the sentiment expressed in tweets and the price of Bitcoin [14].
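      As a rough illustration of how a compound score ends up between -1 and 1, the sketch below sums word valences and applies the normalization x / sqrt(x^2 + alpha) with alpha = 15, which is the form used in the reference VADER implementation. The tiny lexicon and its values are invented for this example, and the sketch deliberately omits VADER's rules for negation, intensifiers, punctuation, and capitalization.

```python
import math

# Hypothetical mini-lexicon with made-up valences, for illustration only.
# The real VADER lexicon is far larger and rates entries from -4 to +4.
TOY_LEXICON = {"good": 2.0, "easy": 2.0, "slow": -1.0, "crash": -2.0}


def normalize(raw_sum: float, alpha: float = 15.0) -> float:
    """Squash an unbounded valence sum into the open interval (-1, 1)."""
    return raw_sum / math.sqrt(raw_sum * raw_sum + alpha)


def toy_compound(text: str) -> float:
    """Sum the valences of known words, then normalize the total."""
    words = text.lower().split()
    raw_sum = sum(TOY_LEXICON.get(word, 0.0) for word in words)
    return normalize(raw_sum)


if __name__ == "__main__":
    for review in ["good and easy app", "slow app crash", "app"]:
        print(review, "->", round(toy_compound(review), 4))
```

      Because the normalization is monotonic, a review with more positive than negative valence always receives a positive compound score, and a review with no lexicon words receives exactly 0.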

    3. SentiWordNet-based Approach

      SentiWordNet is a lexical resource for sentiment analysis and opinion mining that is built on the WordNet database. The database consists of a collection of lemmas, each associated with one or more sets of synonyms referred to as "synsets" [17]. Each synset is associated with polarity scores that indicate positive and negative sentiment. The positive sentiment score (Pos(s)) and the negative sentiment score (Neg(s)) each lie within the interval of 0 to 1. SentiWordNet 3.0 is an upgraded version of the publicly accessible SentiWordNet 1.0, primarily intended for research projects. It is presently licensed to over 300 research groups and is used by research projects across the globe [17].
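      A minimal sketch of how synset scores can be turned into a word-level polarity is given below. The objectivity score follows the SentiWordNet convention that Pos(s), Neg(s), and Obj(s) sum to 1. The example entries and their values are hypothetical, and averaging over a word's senses is an assumption made for this example, not necessarily the aggregation used in this study.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class SynsetScore:
    pos: float  # Pos(s), between 0 and 1
    neg: float  # Neg(s), between 0 and 1

    @property
    def obj(self) -> float:
        """Objectivity: the three scores of a synset sum to 1."""
        return 1.0 - (self.pos + self.neg)

    @property
    def polarity(self) -> float:
        """Net polarity of the synset, between -1 and 1."""
        return self.pos - self.neg


# Hypothetical entries: each word maps to the scores of its senses.
TOY_SYNSETS: Dict[str, List[SynsetScore]] = {
    "easy": [SynsetScore(0.5, 0.0), SynsetScore(0.25, 0.25)],
    "bad": [SynsetScore(0.0, 0.625)],
}


def word_polarity(word: str) -> float:
    """Average the net polarity over all senses; unknown words score 0."""
    senses = TOY_SYNSETS.get(word, [])
    if not senses:
        return 0.0
    return sum(sense.polarity for sense in senses) / len(senses)
```

      Averaging treats every sense as equally likely. A POS-aware or sense-disambiguated lookup would pick a single synset instead.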

    4. Support Vector Machine (SVM)

      SVMs are a type of supervised machine learning algorithm commonly employed for both classification and regression tasks. The algorithm aims to identify an optimal hyperplane in an N-dimensional space, where N represents the number of features. This hyperplane is selected to maximize the gap, or margin, between different classes of data points. The data points can thereafter be categorized into two distinct classes. SVMs are often considered a suitable option for datasets characterized by ambiguous distribution and a large number of features, as they are relatively unaffected by outliers [17].

      SVM has been widely and effectively employed in numerous text classification research projects because of its prominent advantages. These advantages include robustness in high-dimensional spaces, suitability for various functions, strength in handling erratic sample sets, and the ability to address the majority of issues encountered in linear free-text classification [18]. Furthermore, SVM has demonstrated promising outcomes in the field of opinion mining, exceeding other machine learning methods. However, one drawback of SVM is that its accuracy is sensitive to the choice of suitable parameters [18], [19].
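      To make the maximum-margin idea concrete, the following is a minimal sketch of a linear SVM trained by sub-gradient descent on the regularized hinge loss. It is a simplified stand-in written for illustration and is not the SVM implementation used in this study. The learning rate, the regularization strength, and the toy data are all assumptions.

```python
from typing import List, Sequence, Tuple


def decision_value(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Signed score of a point relative to the hyperplane w.x + b = 0."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_linear_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lr: float = 0.1,
    lam: float = 0.01,
    epochs: int = 100,
) -> Tuple[List[float], float]:
    """Fit w and b by sub-gradient descent; labels must be +1 or -1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * decision_value(w, b, xi) < 1:
                # Point is inside the margin or misclassified:
                # move the hyperplane towards classifying it correctly.
                w = [wj + lr * (yi * xj - lam * wj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Point is safely outside the margin: only shrink w,
                # which corresponds to widening the margin.
                w = [wj - lr * lam * wj for wj in w]
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1


if __name__ == "__main__":
    # Toy, linearly separable data invented for this example.
    X_train = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]]
    y_train = [1, 1, -1, -1]
    w, b = train_linear_svm(X_train, y_train)
    print(predict(w, b, [2.5, 2.5]), predict(w, b, [-2.5, -2.5]))
```

      The sensitivity to parameters noted above shows up here as the choice of lr and lam: a larger lam favours a wider margin at the cost of more training points falling inside it.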

      Drawing from the aforementioned literature, we intend to conduct a comparative analysis between the VADER method and the SentiWordNet-based approach in order to discover users' perceptions regarding the MyHRMIS Mobile application. Hence, the primary objective of this study is to determine the most effective lexicon-based approach for analyzing sentiment in the context of the MyHRMIS Mobile application.

  3. METHODOLOGY

    This paper focuses on analyzing Malaysian government servants' reviews of the MyHRMIS Mobile application. The data were obtained with the Python programming language, using the google-play-scraper library for web scraping.

    To receive input from multiple platforms, data was collected from the Google Play Store and App Store. Fig. 1 shows the flowchart of this work.

    Fig. 1. Sentiment analysis flowchart

    The sentiment polarity is categorized as positive or negative according to the rules outlined in Table I. The two lexicon-based approaches employed are the VADER method and the SentiWordNet-based approach.

    TABLE I. SENTIMENT POLARITY RULES

    | No. | Expression      | Polarity |
    |-----|-----------------|----------|
    | 1   | Positivity >= 0 | positive |
    | 2   | Negativity < 0  | negative |

    The data obtained from sentiment analysis were further analyzed using Support Vector Machines (SVM) in order to evaluate the performance of each method. The performance metrics are accuracy, recall, and precision.

      1. Data Collection

        We collected reviews of the MyHRMIS Mobile application from the Google Play Store and App Store by employing the Python programming language and utilizing the google-play-scraper library. The data used in this study were scraped on August 18, 2023. There are 1812 reviews from the Google Play Store and 369 reviews from the App Store. The total combined review data is 2184.

      2. Data preprocessing

        A total of 2184 unique reviews in English and Bahasa Melayu were collected from the Google Play Store and App Store. The data were pre-processed before we performed sentiment analysis because most users' reviews are informal and unstructured, which could have an impact on word polarity or text feature extraction [20]–[22]. We implemented a basic data-cleaning process as follows:

        • Converting all reviews into English text by using Google Cloud Translation.

        • Replacing upper-case letters with lower-case letters to prevent the software from incorrectly classifying words based on case.

        • Removing hashtags, usernames, and hyperlinks that start with "www," "http," and "https."

        • Reducing duplicated characters within certain words. Some users will type the same characters over and over to show how strongly they feel, so these words that are not in the dictionaries should be changed into correct words.

        • Expanding contractions in reviews such as "isn't" or "don't" because they will become meaningless words after the punctuation has been removed.

        • Clearing any non-alphabetical characters or symbols, such as punctuation, numbers, and other special symbols, that could potentially hinder the extraction of features from the text.

        • Removing redundant or empty reviews and generating a refined dataset.
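        The cleaning steps listed above can be sketched roughly as follows. The translation step is left out because it requires an external service, the contraction table is a small illustrative subset, and collapsing runs of three or more identical letters down to two is only one common heuristic for repeated characters. None of these choices is confirmed as the exact procedure used in this study.

```python
import re
from typing import Iterable, List

# Small illustrative subset; a full table would be much longer.
CONTRACTIONS = {
    "isn't": "is not",
    "don't": "do not",
    "can't": "cannot",
    "won't": "will not",
}


def clean_review(text: str) -> str:
    """Apply the basic cleaning steps to a single English review."""
    text = text.lower().replace("\u2019", "'")  # lower-case, straighten quotes
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # hyperlinks
    text = re.sub(r"[#@]\w+", " ", text)  # hashtags and usernames
    text = re.sub(r"([a-z])\1{2,}", r"\1\1", text)  # "goooood" -> "good"
    for short, full in CONTRACTIONS.items():  # expand before stripping "'"
        text = text.replace(short, full)
    text = re.sub(r"[^a-z\s]", " ", text)  # punctuation, digits, symbols
    return " ".join(text.split())  # collapse whitespace


def clean_dataset(reviews: Iterable[str]) -> List[str]:
    """Clean every review, then drop empty and duplicated results."""
    seen = set()
    cleaned = []
    for review in reviews:
        text = clean_review(review)
        if text and text not in seen:
            seen.add(text)
            cleaned.append(text)
    return cleaned
```

        Contractions are expanded before non-alphabetical characters are cleared, since removing the apostrophe first would leave fragments such as "isn t".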

    Furthermore, it is essential to perform additional preprocessing steps, such as stemming and POS tagging, for certain sentiment analysis methods like SentiWordNet-based approach [21]. This step aims to develop and enhance strategies for text cleaning, polarity calculation, and sentiment classification models through the utilization of two distinct approaches to sentiment analysis: the lexicon-based approach and the machine-learning-based technique.

    The final step involves performing the sentiment analysis. The method employed in this study is a lexicon-based approach, using the VADER method and the SentiWordNet-based approach. These lexical resources employ a scoring technique to compute the polarity score for each sentence. In the task of sentiment detection, both VADER and SentiWordNet compute the negativity and positivity scores associated with individual words. These scores are subsequently aggregated to yield a final negativity and positivity score for the entire phrase. Then, we analyze and contrast the results obtained from the lexicon-based approaches, evaluating their performance and predictive precision.
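    The aggregation of word scores and the labelling rule of Table I can be sketched as below. The word polarities are invented for this example, and summing them is an assumed aggregation. Each lexicon has its own scoring scheme in practice.

```python
from typing import Dict, List

# Hypothetical word polarities, invented for this example only.
TOY_POLARITY: Dict[str, float] = {
    "easy": 0.5,
    "thank": 0.75,
    "slow": -0.5,
    "error": -0.75,
}


def review_score(tokens: List[str], lexicon: Dict[str, float] = TOY_POLARITY) -> float:
    """Sum word polarities; words missing from the lexicon contribute 0."""
    return sum(lexicon.get(token, 0.0) for token in tokens)


def label_review(score: float) -> str:
    """Apply the Table I rule: score >= 0 is positive, score < 0 is negative."""
    return "positive" if score >= 0 else "negative"


if __name__ == "__main__":
    for tokens in (["easy", "thank"], ["slow", "error", "app"], ["app"]):
        score = review_score(tokens)
        print(tokens, score, label_review(score))
```

    Under this rule a review whose score is exactly 0, for instance one with no lexicon words, is labelled positive, because the study uses only two classes.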

  4. RESULT AND DISCUSSION

    In this study, two main analyses were conducted. The first analysis labelled sentiment polarity using the VADER method and the SentiWordNet-based approach. The second analysis evaluated the performance of the VADER method and the SentiWordNet-based approach using SVM.

    Fig. 2 shows a word cloud generated directly from the raw text data without any pre-processing. Without additional pre-processing of the text, it is difficult to pick out meaningful words, because the resulting word cloud is a mixture of English and Bahasa Melayu.

    Fig. 2. Word Cloud with No Pre-processing

    Although we can see some relevant words, e.g., "update app" and "tak boleh", the word cloud is a blend of English and Bahasa Melayu. Next, the data is subjected to a cleaning process that involves the removal of hyperlinks, elimination of ampersands, deletion of punctuation marks, substitution of "US" with "USA" to avoid confusion with the pronoun "us," and conversion of all text to lowercase. Stop words, which are commonly used terms that do not carry significant meaning, are then eliminated. Additionally, any words that contain non-ASCII letters are excluded. Furthermore, the words are lemmatized, meaning they are reduced to their base or dictionary form. Following the completion of the pre-processing procedures, a cleaned dataset consisting of 2144 reviews was retained.

    Fig. 3. Word Cloud with Pre-processing Representing All Reviews.

    The data cleaning process reveals the most frequently used words, excluding stop words. As depicted in Fig. 3, many reviews cover noteworthy subjects: users mentioned words such as "app", "update" and "leave", and praised the app with words such as "thank" and "easy application".

    The cleaned dataset included in this study comprises a total of 2144 reviews, which were obtained using web scraping from both the Google Play Store and the App Store. The dataset is then analyzed using RapidMiner Studio Version Educational 10.1.

    Sentiments are classified into two distinct categories: positive and negative. Table II presents the sentiment polarity labelled using the VADER method and the SentiWordNet-based approach. The VADER method labelled 1772 reviews as positive and 366 as negative, while the SentiWordNet-based approach labelled 1473 reviews as positive and 665 as negative.

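    TABLE II. SENTIMENT LABELLED USING VADER AND SENTIWORDNET

    | Method       | Sentiment: Positive (%) | Sentiment: Negative (%) |
    |--------------|-------------------------|-------------------------|
    | VADER        | 1772 (82.9%)            | 366 (17.1%)             |
    | SentiWordNet | 1473 (68.9%)            | 665 (31.1%)             |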

    TABLE III. PERFORMANCE EVALUATION SVM

    | Performance Metric | VADER  | SentiWordNet |
    |--------------------|--------|--------------|
    | Accuracy           | 91.39% | 84.66%       |
    | Precision          | 91.61% | 84.10%       |
    | Recall             | 98.65% | 95.86%       |

    Table III shows the results of the performance evaluation of VADER and SentiWordNet. It contains the accuracy, precision and recall measured with SVM. The accuracy of VADER is higher than that of SentiWordNet, with a value of 91.39% compared with 84.66%. This shows that the VADER classification is more accurate in predicting positive and negative reviews of the MyHRMIS Mobile application.

    The VADER classification also has a higher precision, 91.61%, compared with 84.10% for SentiWordNet. This implies that, under the VADER labelling, a larger share of the reviews predicted as positive are actually positive.

    The recall of the VADER classification is likewise higher, at 98.65% compared with 95.86% for SentiWordNet. This suggests that VADER recovers a larger share of the actual positive reviews in the dataset.
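    For reference, the three metrics can be computed from the counts of a binary confusion matrix as in the sketch below, with "positive" treated as the target class. The counts in the usage example are hypothetical and are not taken from this study.

```python
from typing import Dict


def evaluate(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, precision and recall for the positive class.

    tp: positive reviews predicted as positive
    fp: negative reviews predicted as positive
    fn: positive reviews predicted as negative
    tn: negative reviews predicted as negative
    """
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical counts, chosen only to illustrate the arithmetic.
    print(evaluate(tp=90, fp=10, fn=5, tn=45))
```

    Precision is lowered by false positives and recall by false negatives, so a method that labels most reviews as positive can reach a high recall while its precision and accuracy remain lower.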

    The findings of the study suggest that the VADER method can be used to measure the sentiments expressed in reviews and classify the dataset accordingly, producing better results than the SentiWordNet-based approach.

  5. CONCLUSION

The objective of this paper was to efficiently evaluate the sentiment of government staff towards the MyHRMIS Mobile application using Google Play Store and App Store reviews. This study presents an experimental analysis in which the dataset was labelled with positive and negative sentiment using the VADER method and the SentiWordNet-based approach. SVM was then applied to the labelled dataset to test and compare the performance of SentiWordNet and VADER. The VADER method demonstrated efficient and quick classification of substantial volumes of data. Hence, we suggest that the VADER model, rather than the SentiWordNet-based approach, can be used to extract sentiment from MyHRMIS Mobile application review data. In future research, we intend to enhance our methodology by using more extensive datasets, a machine-learning-based approach, and a hybrid approach.

ACKNOWLEDGMENT

This study is part of a research project that has received financial support from Universiti Putra Malaysia under the Putra IPM Grant (GP-IPM/2022/9730800).

REFERENCES

[1] S. U. Yusuf, J. Taslim, W. A. Wan Adnan, and S. K. Baharudin, "Usability evaluation of Human Resource Management Information System (HRMIS)," Proc. – 2014 3rd Int. Conf. User Sci. Eng. Exp. Eng. Engag. i-USEr 2014, pp. 204–209, 2015, doi: 10.1109/IUSER.2014.7002703.

[2] A. S. M. Zahari, M. A. Harun, S. F. M. Hamzah, and S. M. Salleh, "The influence of Human Resource Management Information System (HRMIS) Application towards Employees Efficiency and Satisfaction," J. Phys. Conf. Ser., vol. 1019, no. 1, 2018, doi: 10.1088/1742-6596/1019/1/012077.

[3] E. S. Alias, S. H. Mohd Idris, N. S. Ashaari, and H. Kasimin, "Evaluating e-government services in Malaysia using the EGOVSAT model," Proc. 2011 Int. Conf. Electr. Eng. Informatics, ICEEI 2011, no. July, pp. 1–5, 2011, doi: 10.1109/ICEEI.2011.6021740.

[4] F. N. Ribeiro, M. Araújo, P. Gonçalves, M. André Gonçalves, and F. Benevenuto, "SentiBench - a benchmark comparison of state-of-the-practice sentiment analysis methods," EPJ Data Sci., vol. 5, no. 1, pp. 1–29, 2016, doi: 10.1140/epjds/s13688-016-0085-1.

[5] S. Becken, B. Stantic, J. Chen, A. R. Alaei, and R. M. Connolly, "Monitoring the environment and human sentiment on the Great Barrier Reef: Assessing the potential of collective sensing," J. Environ. Manage., vol. 203, pp. 87–97, 2017, doi: 10.1016/j.jenvman.2017.07.007.

[6] S. Wahyu Handani, D. Intan Surya Saputra, Hasirun, R. Mega Arino, and G. Fiza Asyrofi Ramadhan, "Sentiment analysis for go-jek on google play store," J. Phys. Conf. Ser., vol. 1196, no. 1, pp. 1–7, 2019, doi: 10.1088/1742-6596/1196/1/012032.

[7] S. Fransiska and A. Irham Gufroni, "Sentiment Analysis Provider by.U on Google Play Store Reviews with TF-IDF and Support Vector Machine (SVM) Method," Sci. J. Informatics, vol. 7, no. 2, pp. 2407–7658, 2020, [Online]. Available: http://journal.unnes.ac.id/nju/index.php/sji

[8] D. Pratmanto, R. Rousyati, F. F. Wati, A. E. Widodo, S. Suleman, and R. Wijianto, "App Review Sentiment Analysis Shopee Application in Google Play Store Using Naive Bayes Algorithm," J. Phys. Conf. Ser., vol. 1641, no. 1, pp. 1–8, 2020, doi: 10.1088/1742-6596/1641/1/012043.

[9] K. S. Srujan, S. S. Nikhil, H. Raghav Rao, K. Karthik, B. S. Harish, and H. M. Keerthi Kumar, "Classification of amazon book reviews based on sentiment analysis," vol. 672, no. March. Springer Singapore, 2018. doi: 10.1007/978-981-10-7512-4_40.

[10] N. Nandal, R. Tanwar, and J. Pruthi, "Machine learning based aspect level sentiment analysis for Amazon products," Spat. Inf. Res., vol. 28, no. 5, pp. 601–607, 2020, doi: 10.1007/s41324-020-00320-2.

[11] A. A. Q. Aqlan, B. Manjula, and R. Lakshman Naik, "A study of sentiment analysis: Concepts, techniques, and challenges," vol. 28. Springer Singapore, 2019. doi: 10.1007/978-981-13-6459-4_16.

[12] C. Tho, Y. Heryadi, I. H. Kartowisastro, and W. Budiharto, "A Comparison of Lexicon-based and Transformer-based Sentiment Analysis on Code-mixed of Low-Resource Languages," Proc. 2021 1st Int. Conf. Comput. Sci. Artif. Intell. ICCSAI 2021, vol. 1, no. October, pp. 81–85, 2021, doi: 10.1109/ICCSAI53272.2021.9609781.

[13] C. Hutto and E. Gilbert, "VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text," Proc. Int. AAAI Conf. Web Soc. Media, vol. 8, no. 1, pp. 216–225, May 2014, doi: 10.1609/icwsm.v8i1.14550.

[14] T. Pano and R. Kashef, "A Complete VADER-Based Sentiment Analysis of Bitcoin (BTC) Tweets during the Era of COVID-19," Big Data Cogn. Comput., vol. 4, no. 33, pp. 1–17, 2020, doi: 2504-2289/4/4/33.

[15] A. G. Budianto, B. Wirjodirdjo, I. Maflahah, and D. Kurnianingtyas, "Sentiment Analysis Model for KlikIndomaret Android App During Pandemic Using Vader and Transformers NLTK Library," IEEE Int. Conf. Ind. Eng. Eng. Manag., vol. 2022-Decem, pp. 423–427, 2022, doi: 10.1109/IEEM55944.2022.9989577.

[16] S. Elbagir and J. Yang, "Twitter Sentiment Analysis Using Natural Language Toolkit and VADER Sentiment," in The International MultiConference of Engineers and Computer Scientists 2019, 2019, vol. 0958.

[17] S. Baccianella, A. Esuli, and F. Sebastiani, "SENTIWORDNET 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining," in Proceedings of the Seventh International Conference on Language Resources and Evaluation (LREC'10), 2010, pp. 2200–2204.

[18] D. A. Kristiyanti, D. A. Putri, E. Indrayuni, A. Nurhadi, and A. H. Umam, "E-Wallet Sentiment Analysis Using Naïve Bayes and Support Vector Machine Algorithm," J. Phys. Conf. Ser., vol. 1641, no. 1, p. 012079, Nov. 2020, doi: 10.1088/1742-6596/1641/1/012079.

[19] A. S. H. Basari, B. Hussin, I. G. P. Ananta, and J. Zeniarja, "Opinion mining of movie review using hybrid method of support vector machine and particle swarm optimization," Procedia Eng., vol. 53, pp. 453–462, 2013, doi: 10.1016/j.proeng.2013.02.059.

[20] U. Naseem, I. Razzak, M. Khushi, P. W. Eklund, and J. Kim, "COVIDSenti: A Large-Scale Benchmark Twitter Data Set for COVID-19 Sentiment Analysis," IEEE Trans. Comput. Soc. Syst., vol. 8, no. 4, pp. 976–988, 2021, doi: 10.1109/TCSS.2021.3051189.

[21] Y. Qi and Z. Shabrina, "Sentiment analysis using Twitter data: a comparative application of lexicon and machine learning based approach," Soc. Netw. Anal. Min., pp. 1–14, 2023, doi: 10.1007/s13278-023-01030-x.

[22] D. A. Kristiyanti, D. A. Putri, E. Indrayuni, A. Nurhadi, and A. H. Umam, "E-Wallet Sentiment Analysis Using Naïve Bayes and Support Vector Machine Algorithm," J. Phys. Conf. Ser., vol. 1641, no. 1, p. 012079, Nov. 2020, doi: 10.1088/1742-6596/1641/1/012079.