User Categorization Based on Trust Analysis for Reputation System in E-Commerce

DOI : 10.17577/IJERTCONV5IS01129


Reena Mahe

Shree L R Tiwari College of Engineering Mira Road, Mumbai, India

Prof. Seema Kolkur

Thadomal Shahani Engineering College Bandra, Mumbai, India

Abstract: In e-commerce applications, users often give reviews and feedback about the products they have used. Most current reputation systems depend directly on users' ratings to calculate a score for the product. However, the reliability of these scores needs to be verified, because some users may intend to misrepresent a product positively or negatively, and their false feedback can affect the buying decisions of potential users. There is therefore a need to improve current reputation systems by establishing the trustworthiness of users, feedback and products. In this work, a new architecture is proposed that detects the genuineness of users by means of a questionnaire. Every user (feedback provider) is redirected to a set of questions reflecting different features of the product. A user is considered genuine if his opinion about the product matches that of the majority of users. Only feedback given by genuine users is considered. This trust analysis system helps to categorize users based on genuineness and to generate the global reputation score of the product.

Keywords: Reputation systems, E-Commerce, Sentiment Analysis, Textual feedback, Polarity.

  1. INTRODUCTION

    In an e-commerce environment, buying and selling is not easy because the participants are not physically present to assess the reliability of the product. Customers are unable to see the product or verify its quality, and the risk of being cheated by the other party is high. Although many technologies exist to make transactions more secure, they remain insufficient to build a trustworthy reputation for the seller or product. Whom to trust and which product to buy becomes solely the individual's decision. In such circumstances, established reputation systems assist users in making decisions in online shopping.

    An online reputation system gives a clue about the quality of a product or service. However, a reputation system can be attacked to either degrade or boost the reputation score of a particular product or service. Dealing with malicious ratings in reputation systems has been recognized as an important but difficult task [1]. A reputation system becomes ineffective when genuine users are outnumbered by malicious ones. The rating score is important for both parties, companies and consumers: consumers make decisions based on this score, while companies learn about the reputation of their products and can take appropriate action to improve product quality for customer satisfaction.

    Existing reputation systems were designed on the assumption that users will provide honest feedback. In practice, however, such systems generally include malicious users, which leads to problems in cooperation, aggregation and evaluation. Some mechanism is therefore required to detect malicious users who provide dishonest feedback to upgrade or degrade the reputation score for personal or professional reasons. The system should also reduce the impact of unfair ratings and improve trust in the reputation score. The main objective of the proposed system is to distinguish between honest and dishonest (malicious) users and to generate a trustworthy reputation score for the product. This score can help subsequent users in decision making. The submitted feedback reflects the feedback provider's opinion about the product.

  2. LITERATURE SURVEY

    People use the internet for entertainment, knowledge, shopping and business purposes. A reputation system collects feedback from users, aggregates it as evidence and presents the aggregated results to normal users. Since the internet became a medium for online transactions, many defence schemes have been developed to protect reputation systems. This section briefly reviews the research areas relevant to reputation systems for e-commerce and to sentiment analysis.

    1. Reputation Systems

      Hasnae Rahimi and Hanan El Bakkali [2] proposed a new Trust Reputation System (TRS) for e-commerce applications. The system calculates the reputation of a product by analysing the user's attitude toward a collection of prefabricated textual feedbacks. It calculates the trust degree of the user according to his subjective choice (like or dislike) and according to the trustworthiness of the feedback. It then calculates the global trust reputation score of the product and generates the trustworthiness of the user's given feedback.

      The TRS is based on two algorithms: a text mining algorithm and a reputation algorithm. The text mining algorithm classifies the feedbacks by category in a knowledge base. Each feedback already has a degree of trustworthiness, which represents the trust degree of the user who provided it. The user is then asked to give an opinion (like/dislike) on each review, and the user's trust degree is generated from these opinions together with the trustworthiness degree of each liked or disliked feedback.

      An advantage of this system is that the user is asked to provide both a rating and a feedback for the product, which is unusual. The concordance between the two is verified in the first step in order to avoid any conflict. The algorithm generates a trustful reputation score of a product using the trust degree of the user as a coefficient. At the end of the execution, the algorithm assigns a trustworthiness degree to the feedback. The disadvantage is that the trust degrees calculated for the user and the product from the very first feedbacks will not be reliable, as there are not yet any prefabricated feedbacks to present to the feedback provider to learn his attitude.

      Jnanamurthy HK and Sanjay Singh [1] proposed a new method to detect malicious users in online reputation systems using the Quality Repository Approach (QRA). It focuses on anomalies in both the rating-value domain and the malicious-user domain. In a complex collusion attack, malicious users work as a group to reduce the reputation score by giving dishonest ratings. According to its authors, QRA is very efficient at detecting malicious users' ratings and provides an aggregate trustful rating. It consists of four modules:

      • Change Detector

      • Quality Repository

      • Behavior analysis

      • Aggregation algorithm

        A threshold is a region that marks a boundary for a new state. Threshold selection plays an important role in finding malicious users and in deciding whether a user is a true user. Selecting the threshold for a newly launched product is not an easy task, because it is impossible to predict whether such a product is good or bad. This is the major disadvantage of this system.

        The authors of [3] proposed a novel personalized approach for effectively handling unfair ratings in an enhanced centralized reputation system. They consider the scenario where consumer agents elicit reputation ratings of provider agents from other consumer agents, known as advisor agents. In many multiagent settings, agents are self-interested. When the consumer agent is not confident in its private reputation ratings, it can also use what the authors refer to as the public reputation of the advisor agent. This public reputation is calculated from the advisor agent's ratings for all provider agents in the system. A weighted average of the private and public reputations is then computed to represent the trustworthiness of the advisor agent. Trustworthiness decreases more or less depending on whether the advisor agent provides more or fewer unfair ratings. This method for an enhanced centralized reputation system is inspired by the approaches used in distributed reputation systems. The authors claim that the approach is effective even when the majority of advisor agents provide large numbers of unfair ratings, because it adjusts to rely more heavily on the private reputations of advisor agents. Its applicability is limited, as it accepts only binary ratings; the range of ratings could be extended.
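The weighted average of private and public reputation described for [3] can be illustrated with a minimal sketch. The function name `advisor_trust`, the weight `w` and the example values are assumptions made for illustration only; how the weight is actually derived is defined in [3] and is not reproduced here.

```python
def advisor_trust(private_rep: float, public_rep: float, w: float) -> float:
    """Weighted average of an advisor's private and public reputation.

    w is a hypothetical confidence weight in [0, 1]: the more confident the
    consumer agent is in its own (private) ratings, the closer w is to 1.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return w * private_rep + (1.0 - w) * public_rep


if __name__ == "__main__":
    # A consumer that mostly trusts its own experience with the advisor.
    print(advisor_trust(private_rep=0.8, public_rep=0.4, w=0.75))
```

With `w` close to 1 the result follows the private reputation, which mirrors the claim that the approach can rely more heavily on private reputations when many advisors are unfair.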

        As the value of reputation systems is widely recognized, the incentive to manipulate such systems is rapidly growing [4]. In the study by Y. Liu and Y. Sun, a complete anomaly detection scheme, TAUCA (Temporal And User Correlation Analysis), was designed and evaluated for securing feedback-based online reputation systems. Jabeen Begum et al. [5] provide a similar technique named TATA (Joint Temporal and Trust Analysis). It protects online reputation systems from a new angle and combines time-domain anomaly detection with Dempster-Shafer theory-based trust computation.

        Yuhong Liu, Yafei Yang, and Yan Lindsay Sun [6] propose a scheme that detects collaborative unfair raters based on similarity in their rating behaviors. They address the unfair rating problem by detecting the abnormal signals from both user-domain and rating-domain.

        H. Yu, M. Kaminsky, P. B. Gibbons, and A. Flaxman introduced SybilGuard [7], a decentralized system for restricting the corruptive impact of Sybil attacks by limiting both the number and the size of Sybil groups. In a Sybil attack, a malicious client obtains numerous fake identities and pretends to be multiple distinct nodes in the system.

        Panayotis Fouliras provides a new Reputation Management System (RMS) [8]. The paper presents several novel but simple ideas for dealing with fake ratings by malicious users and with the multiple fake user accounts used to submit them. Other vital aspects of a successful reputation system are summarized as follows:

        • Recent ratings should get more weight than older ones.

        • A good sales record for low price range transactions should not carry the full weight when the seller sells an item at a higher price range. A higher price range means a higher profit for the seller, hence a stronger incentive to turn malicious.

        • Entities must be long lived; ratings should show the life of an entity.

        • When a party approaches a certain threshold m of malicious incidents the reputation metric should be reduced to the minimum.

        • The amount of details for each transaction should be kept to a minimum, reducing the chances of information explosion.

        • The raw rating given by an entity to the associated party should be simple and easy to understand.
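Two of the aspects listed above, weighting recent ratings more heavily and dropping the reputation to the minimum once a threshold m of malicious incidents is reached, can be sketched as follows. The exponential decay with a `half_life` parameter, the function name and all default values are assumptions for illustration; they are not taken from [8].

```python
def reputation(ratings, now, half_life=30.0, malicious_incidents=0, m=3, minimum=0.0):
    """Time-weighted reputation from (timestamp_in_days, value) pairs.

    Assumption: a rating's weight halves every `half_life` days, so recent
    ratings count more than older ones.
    """
    # Once the (hypothetical) threshold m is reached, return the minimum.
    if malicious_incidents >= m:
        return minimum
    weights = [0.5 ** ((now - t) / half_life) for t, _ in ratings]
    total = sum(weights)
    if total == 0:
        return minimum  # no ratings yet
    return sum(w * v for w, (_, v) in zip(weights, ratings)) / total


if __name__ == "__main__":
    history = [(0, 1.0), (30, 5.0)]  # an old low rating and a recent high one
    print(reputation(history, now=30))
    print(reputation(history, now=30, malicious_incidents=3))
```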

    2. Sentiment Analysis

    Sentiment analysis is the procedure by which information is extracted from the opinions, appraisals and emotions of people regarding entities, events and their attributes. This information can be used to check the polarity of opinions. Sentiment analysis is not the main focus of the proposed system, but a small part of it is included to calculate the final global score of the product.

    Sentiment analysis can be performed at three different levels, depending on the granularity required. Document-level sentiment analysis [9,10,11,12,13] is the simplest form of classification and can be done using supervised or unsupervised machine learning approaches. The same approaches can also be used for sentence-level classification [14,15,16,17,18], in which polarity is calculated for each sentence, since each sentence is treated as a separate unit and can carry a different opinion. Minqing Hu and Bing Liu's work [19] is the pioneering work in feature-level sentiment analysis. The authors of [20] propose a new approach that uses a feature-oriented appraisal words lexicon. It is a fine-grained approach in which review categorization is based on the attitude and polarity of the adjectival words for the frequent features of the product.

  3. PROPOSED SYSTEM

    The system detects malicious users and generates a trustworthy reputation score for the product that can help subsequent users in decision making. Feedback, reviews, scores, recommendations and any other information given by users are very important for online reputation systems. E-commerce users rely on these opinions about a product to form their own trust. The aim of this system is to make this feedback more reliable. The proposed system combines two modules, as shown in Figure 1:

    Module I: Genuineness detection for feedback providers

    Module II: Sentiment Analysis (SA)

    Figure 1: System Block Diagram

    Module I performs the trust analysis of the user, who can be either genuine or non-genuine. This categorization is done with a supervised machine learning approach, using a questionnaire that is given to the user to learn his opinion about the other features of the product.

    While submitting feedback, the user is redirected to a small set of questions designed specifically for the same product and is asked to answer them before the feedback is submitted. Each question has a trustworthiness score from 1 to 4. The result of the questionnaire is compared with those of the users who have already submitted their feedback and have been proved genuine. This approach follows the concept of majority rule. Detecting dishonest users at the time feedback or reviews are accepted is a different and efficient approach. The new method uses Naïve Bayes, based on majority rule, to provide a reliable reputation score.
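The majority-rule comparison can be illustrated with a small sketch that is deliberately simpler than the Naive Bayes classifier the system actually uses. The function names and the sample answers are hypothetical; each inner list holds one user's scores (1 to 4) for the questionnaire.

```python
from collections import Counter


def majority_answers(genuine_responses):
    """Most common score per question among users already judged genuine."""
    # zip(*rows) turns per-user rows into per-question columns.
    return [Counter(column).most_common(1)[0][0] for column in zip(*genuine_responses)]


def agreement(user_answers, majority):
    """Fraction of questions on which the user agrees with the majority."""
    matches = sum(1 for a, m in zip(user_answers, majority) if a == m)
    return matches / len(majority)


if __name__ == "__main__":
    genuine = [[4, 3, 4], [4, 3, 3], [4, 2, 4]]  # hypothetical earlier users
    majority = majority_answers(genuine)
    print(majority, agreement([4, 3, 1], majority))
```

A new user whose agreement is low would be a candidate for the non-genuine category; where to draw that line is a design decision the paper leaves to the classifier.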

    The supervised machine learning approach uses the Naive Bayes algorithm. It does not need a lot of data to perform well; it needs only enough data to understand the probabilistic relationship of each attribute, in isolation, with the output variable. In this system, it makes reliable estimates of the probability of each class irrespective of the size of the dataset. The dataset for the Naive Bayes algorithm is generated from the questionnaire given to users. The attributes of this dataset are the different product features for which the user's attitude needs to be observed. It consists of all users' opinions about these features and the score they have given for every question. Preparing this dataset is a challenge in the beginning, as there is no basis on which to categorize the initial users. Lab test results and user testing approaches can be followed to overcome this drawback. The Naive Bayes equation is used to calculate the posterior probability of each class: the frequencies and probabilities are calculated and the NB model is prepared. Once the NB model is prepared, likelihoods for a new instance can be estimated from these frequencies, and the model can be used to predict whether the next user is Genuine or NonGenuine based on a different set of evidence.
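For reference, the posterior probability mentioned here takes the standard Naive Bayes form. The symbols are assumptions introduced for this note: c is a class (Genuine or NonGenuine) and x_1, ..., x_n are the scores a user gives to the n questions.

```latex
P(c \mid x_1, \dots, x_n)
  = \frac{P(c)\,\prod_{i=1}^{n} P(x_i \mid c)}{P(x_1, \dots, x_n)}
```

The denominator is the same for every class, so comparing classes only requires the numerator, which is built from the class frequencies and the per-question score frequencies.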

    The class with the highest posterior probability is the outcome of the prediction. The outcome of a behaviour is predicted by observing evidence. Generally, it is better to have more than one piece of evidence to support the prediction, and typically the more evidence is gathered, the better the classification accuracy.

    Module II is applied only to users found genuine. In this module, sentiment analysis is performed on the feedback given by a genuine user. Positive or negative polarity is checked with the help of a positive dictionary and a negative dictionary. Finally, the overall score for the product is determined by calculating the polarity of the feedbacks.

    In this module, the raw data is taken and pre-processing steps are applied, including tokenization, stemming and stop word removal. The numbers of positive and negative words in the given feedback are compared: if there are more positive words than negative words, the feedback is positive; if there are more negative words, it is negative; and if the counts are equal, it is neutral. This is a simple method to determine the polarity of the feedback. A lexicon-based approach is used to extract sentiments from the text and classify the text according to polarity. For the lexicon approach, two dictionaries are kept in two separate files: one contains all positive words and the other all negative words.
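The frequency-based model and the highest-posterior prediction can be sketched as a small categorical Naive Bayes classifier. This is an illustration under stated assumptions, not the authors' implementation: the function names, the toy data and the use of add-one (Laplace) smoothing are choices made here, and log probabilities are used only to avoid numerical underflow.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, labels):
    """Count class frequencies and per-question score frequencies."""
    class_counts = Counter(labels)
    # feature_counts[(question_index, label)][score] -> frequency
    feature_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for i, score in enumerate(row):
            feature_counts[(i, label)][score] += 1
    return class_counts, feature_counts


def predict_nb(model, row, n_values=4):
    """Return the class with the highest (log) posterior for one user."""
    class_counts, feature_counts = model
    total = sum(class_counts.values())
    best_label, best_log_post = None, -math.inf
    for label, count in class_counts.items():
        log_post = math.log(count / total)  # prior P(c)
        for i, score in enumerate(row):
            freq = feature_counts[(i, label)][score]
            # Assumption: add-one smoothing over the n_values possible scores.
            log_post += math.log((freq + 1) / (count + n_values))
        if log_post > best_log_post:
            best_label, best_log_post = label, log_post
    return best_label


if __name__ == "__main__":
    rows = [[4, 4, 3], [4, 3, 4], [3, 4, 4], [1, 1, 2], [2, 1, 1]]
    labels = ["Genuine", "Genuine", "Genuine", "NonGenuine", "NonGenuine"]
    model = train_nb(rows, labels)
    print(predict_nb(model, [4, 4, 4]), predict_nb(model, [1, 1, 1]))
```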
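The pre-processing steps can be sketched as follows. The stop-word list and the suffix-stripping stemmer are toy assumptions chosen to keep the example self-contained; a real system would more likely use an established stemmer and a fuller stop-word list.

```python
import re

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"the", "a", "an", "is", "are", "was", "and", "or", "of", "to", "it", "this"}
# Suffixes tried in order by the toy stemmer.
SUFFIXES = ("ing", "ed", "ly", "s")


def stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(feedback):
    """Tokenize, drop stop words, then stem the remaining tokens."""
    tokens = re.findall(r"[a-z]+", feedback.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


if __name__ == "__main__":
    print(preprocess("The camera is working nicely and the battery lasts"))
```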

    Procedure for Sentiment Analysis:

    Initialization: PosiCnt = 0; NegCnt = 0

    For each substring Tj (j = 1 to K) in the user feedback
        For each pattern Pi (i = 1 to n) in the positive dictionary P
            If substring Tj matches a suffix of pattern Pi
                If a mismatch occurs at the next comparison
                    Find (if it exists) the rightmost occurrence of Tj
                    Check that T0' is not a suffix of Pi ∈ P
                    Check that T0' is not a prefix of Pi ∈ P
                    PosiCnt = PosiCnt + 1
                End If
            End If
        End For
    End For
    For each substring Tj (j = 1 to K) in the user feedback
        For each pattern Nc (c = 1 to m) in the negative dictionary N
            If substring Tj matches a suffix of pattern Nc
                If a mismatch occurs at the next comparison
                    Find (if it exists) the rightmost occurrence of Tj
                    Check that T0' is not a suffix of Nc ∈ N
                    Check that T0' is not a prefix of Nc ∈ N
                    NegCnt = NegCnt + 1
                End If
            End If
        End For
    End For
    If PosiCnt > NegCnt
        Feedback is positive
    End If
    If NegCnt > PosiCnt
        Feedback is negative
    End If
    If PosiCnt equals NegCnt
        Feedback is neutral
    End If
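The counting logic of the procedure above can be sketched in a simplified form. This sketch replaces the substring and suffix matching with whole-word lookups in two small sets, which is an assumption made for brevity; the word lists and function names are hypothetical stand-ins for the two dictionary files.

```python
from collections import Counter

# Hypothetical stand-ins for the positive and negative dictionary files.
POSITIVE = {"good", "great", "excellent", "fast", "love"}
NEGATIVE = {"bad", "poor", "slow", "terrible", "hate"}


def polarity(feedback, positive=POSITIVE, negative=NEGATIVE):
    """Classify one feedback by comparing positive and negative word counts."""
    words = [t.strip(".,!?;:") for t in feedback.lower().split()]
    posi_cnt = sum(1 for w in words if w in positive)
    neg_cnt = sum(1 for w in words if w in negative)
    if posi_cnt > neg_cnt:
        return "positive"
    if neg_cnt > posi_cnt:
        return "negative"
    return "neutral"


def summarize(feedbacks):
    """Tally polarities over the feedbacks of genuine users."""
    return Counter(polarity(f) for f in feedbacks)


if __name__ == "__main__":
    reviews = [
        "Great camera, good battery but slow charging.",
        "Terrible screen and poor sound",
        "Good camera, bad battery",
    ]
    print(summarize(reviews))
```

The tally returned by `summarize` is the kind of per-product count of positive and negative feedbacks from which an overall score could then be derived.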

  4. RESULTS AND DISCUSSION

    The mobile phone was chosen as the product, with specific features, to implement this system. Observations were made for five different mobile phone models so that the analysis could be done efficiently and accurately; hence five different data sets were prepared. We collected around 350 reviews from different users for the different mobile phones. The primary challenge was to collect the initial genuine feedback, because each subsequent feedback is judged against it. To make this easy and authentic, five Google Forms were prepared, one per phone, and shared only with persons who currently use that phone or have used it in the past. Each user was asked to rate the quality of different features of the phone on a linear scale from 1 to 4. Once each data set was prepared, it was used to train the Naïve Bayes classifier to categorize future feedback providers.

    The graphs show the results for the five phones. Figure 2 shows the categorization of users based on genuineness. Figure 3 shows the numbers of positive and negative feedbacks; polarity was observed only for the feedback of genuine users.

    Figure 2: Module 1 analysis in bar chart form (Genuine Users vs. NonGenuine Users)

    Figure 3: Module 2 analysis in bar chart form (Positive Feedbacks vs. Negative Feedbacks)

  5. CONCLUSION

A common problem in reputation systems is unfair ratings, which are used to unfairly increase or decrease the reputation of an entity. The proposed system ensures that only true and trusted feedback is displayed and rejects false and ill-intentioned feedback, thus providing a trustworthy reputation score for a specific product or service that supports relying parties in making the right decision when interacting with an e-commerce application. Products and services sold online thereby receive ratings that reflect their capabilities, which helps customers choose which product or service to buy and in turn helps to build trust in online transactions, since only true product ratings and trusted user reviews are shown. Overall, this approach enhances trustworthiness, detects malicious users who insert dishonest ratings, bears considerable potential and might thus lead to substantially more robust reputation systems and an enhanced user experience.

REFERENCES

  1. Jnanamurthy, H.; Singh, S., "Detection and Filtering of Collaborative Malicious Users in Reputation System using Quality Repository Approach", Advances in Computing, Communications and Informatics (ICACCI), 2013 International Conference on, pp. 466-471, 22-25 Aug. 2013, doi: 10.1109/ICACCI.2013.6637216

  2. Hasnae Rahimi, Hanan El Bakkali, "New Reputation Algorithm for Evaluating Trustworthiness in E-Commerce Context", IEEE, 2013

  3. Jie Zhang and Robin Cohen, "A Personalized Approach to Address Unfair Ratings in Multiagent Reputation Systems", in Proc. of the Fifth Int. Joint Conf. on Autonomous Agents and Multiagent Systems (AAMAS) Workshop on Trust in Agent Societies, 2006

  4. Yuhong Liu and Yan (Lindsay) Sun, "Anomaly Detection in Feedback-based Reputation Systems through Temporal and Correlation Analysis", in Proc. of 2nd IEEE Int. Conf. on Social Computing, Aug. 2010

  5. Dr. S. Jabeen Begum, Mr. G. Rajesh Kumar, R. Varanambigai, "Reputation Management using Trust Based Decision Making System through Temporal and Correlation Analysis", IJRITCC, Volume 2, Issue 5, May 2014

  6. Yuhong Liu, Yafei Yang, and Yan Lindsay Sun, "Detection of collusion behaviors in online reputation systems", 42nd Asilomar Conference on Signals, Systems and Computers, IEEE, 2008

  7. Haifeng Yu, Michael Kaminsky, Phillip B. Gibbons, Abraham Flaxman, "SybilGuard: Defending Against Sybil Attacks via Social Networks", ACM 1-59593-308, Sep. 2006

  8. Panayotis Fouliras, "A novel reputation-based model for e-commerce", Springer-Verlag, April 2011

  9. Peter D. Turney, (2002), "Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews", Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, pp. 417-424

  10. Richa Sharma, Shweta Nigam and Rekha Jain, (2014), "Opinion Mining Of Movie Reviews at Document Level", International Journal on Information Theory (IJIT), Vol. 3, No. 3

  11. Yan Zhao, Suyu Dong and Leixiao Li, (2014), "Sentiment Analysis on News Comments Based on Supervised Learning Method", International Journal of Multimedia and Ubiquitous Engineering, Vol. 9, No. 7, pp. 333-346

  12. Lina L. Dhande and Dr. Prof. Girish K. Patnaik, (2014), "Analyzing Sentiment of Movie Review Data using Naive Bayes Neural Classifier", International Journal of Emerging Trends & Technology in Computer Science (IJETTCS), Volume 3, Issue 4, ISSN 2278-6856

  13. Gautam Kumar, Pawan Kumar Goel, Sanjeev Kumar Chauhan, Anand Kumar Pandey, (2012), "Opinion mining and summarization for customer reviews", International Journal of Engineering Science and Technology (IJEST), Vol. 4, No. 08, ISSN: 0975-5462

  14. V. S. Jagtap, Karishma Pawar, (2013), "Analysis of different approaches to Sentence-Level Sentiment Classification", International Journal of Scientific Engineering and Technology, pp. 164-170

  15. Raisa Varghese, Jayasree M, (2013), "A Survey on Sentiment Analysis and Opinion Mining", International Journal of Research in Engineering and Technology (IJRET), eISSN: 2319-1163, pISSN: 2321-7308

  16. Gizem Gezici, Berrin Yanikoglu, Dilek Tapucu, and Yucel Saygin, (2012), "New Features for Sentiment Analysis: Do Sentences Matter?", First International Workshop on Sentiment Discovery from Affective Data (SDAD)

  17. S Padmaja and Prof. S Sameen Fatima, (2013), "Opinion Mining and Sentiment Analysis - An Assessment of Peoples' Belief: A Survey", International Journal of Ad hoc, Sensor & Ubiquitous Computing (IJASUC), Vol. 4, No. 1

  18. Aurangzeb Khan, Baharum Baharudin, (2011), "Sentiment Classification by Sentence Level Semantic Orientation using SentiWordNet from Online Reviews and Blogs", Int. J Comp Sci. Emerging Tech, Vol. 2, No. 4

  19. Minqing Hu and Bing Liu, (2004), "Mining and Summarizing Customer Reviews", ACM

  20. D D Chaudhari, R A Deshmukh, A B Bagwan, P K Deshmukh, (2013), "Feature based approach for Review Mining Using Appraisal Words", IEEE, ISBN 978-1-4799-1082-3
