User Categorization Based on Trust Analysis for Reputation System in E-Commerce

Reena Mahe; Prof. Seema Kolkur

doi:10.17577/IJERTCONV5IS01129

ICIATE - 2017 (Volume 5 - Issue 01)


Open Access
Authors : Reena Mahe, Prof. Seema Kolkur
Paper ID : IJERTCONV5IS01129
Volume & Issue : ICIATE – 2017 (Volume 5 – Issue 01)
Published (First Online): 24-04-2018
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License


Reena Mahe

Shree L R Tiwari College of Engineering Mira Road, Mumbai, India

Prof. Seema Kolkur

Thadomal Shahani Engineering College Bandra, Mumbai, India

Abstract: In e-commerce applications, users often post reviews and feedback about the products they have used. Most current reputation systems depend directly on users' ratings to calculate a score for the product. However, the reliability of these scores needs to be verified, because some users may intend to misrepresent a product positively or negatively and can give false feedback that affects the buying decisions of potential users. There is therefore a need to improve current reputation systems by establishing the trustworthiness of users, feedback and products. In this work, a new architecture is proposed that detects the genuineness of users. Detection is done through a questionnaire: every user (feedback provider) is redirected to a set of questions reflecting different features of the product. A user is considered genuine if his opinion about the product matches that of the majority of users. Only feedback given by genuine users is considered. This trust analysis system helps to categorize users based on genuineness and to generate the global reputation score of the product.

Keywords: Reputation systems, E-Commerce, Sentiment Analysis, Textual feedbacks, Polarity.

INTRODUCTION

In an e-commerce environment, buying and selling is not easy because the participants are not physically present to assess the reliability of the product. Customers cannot see the product or verify its quality, and the risk of being cheated by the other party is high. Although many technologies exist to make transactions more secure, they remain insufficient to build a trustworthy reputation for the seller or product. Whom to trust and which product to buy becomes solely the individual's decision. In such circumstances, reputation systems are an established mechanism that assists users in making decisions when shopping online.

An online reputation system gives a clue about the quality of a product or service. However, a reputation system can be attacked to either degrade or boost the reputation score of a particular product or service. Dealing with malicious ratings in reputation systems has been recognized as an important but difficult task [1]. A reputation system becomes ineffective when genuine users are outnumbered by malicious ones. The rating score is important for both parties, companies and consumers: consumers make decisions based on this score, while companies learn about the reputation of their products and can take appropriate action to improve product quality and customer satisfaction.

Existing reputation systems were designed on the assumption that users will provide honest feedback. In practice, such systems generally also include malicious users, which leads to problems in cooperation, aggregation and evaluation. Some mechanism is therefore required to detect malicious users who provide dishonest feedback to raise or lower the reputation score for personal or professional reasons. The system should also reduce the impact of unfair ratings and improve trust in the reputation score. The main objective of the proposed system is to distinguish honest users from dishonest or malicious ones and to generate a trustworthy reputation score for the product. This score can help subsequent users in decision making. The submitted feedback reflects the feedback provider's opinion about the product.
LITERATURE SURVEY

People use the internet for entertainment, knowledge, shopping and business purposes. A reputation system collects feedback from users, aggregates it as evidence and presents the aggregated results to normal users. Since the internet became a medium for online transactions, many defence schemes have been developed to protect reputation systems. This section briefly reviews the research areas relevant to reputation systems for e-commerce and to sentiment analysis.
1. Reputation Systems
  
  Hasnae Rahimi and Hanan El Bakkali [2] proposed a new Trust Reputation System (TRS) for e-commerce applications. The system calculates the reputation of a product by analysing the user's attitude toward a collection of prefabricated textual feedbacks. It calculates the trust degree of the user according to his subjective choice (like or dislike) and according to the trustworthiness of the feedback. It then calculates the global trust reputation score of the product and generates the trustworthiness of the user's own feedback.

  The TRS is based on two algorithms: a text mining algorithm and a reputation algorithm. The text mining algorithm classifies the feedbacks by category in a knowledge base. Each feedback already has a degree of trustworthiness, which represents the trust degree of the user who provided it. The user is then asked to give an opinion (like/dislike) on each review, and the user's trust degree is generated from these opinions together with the trustworthiness degree of the liked or disliked feedbacks.

  An advantage of this system is that the user is asked to provide both a rating and a feedback for the product, which is unusual. The concordance between the two is verified in the first step to avoid any conflict. The algorithm generates a trustful reputation score for the product using the trust degree of the user as a coefficient. At the end of the execution, the algorithm assigns a trustworthiness degree to the feedback. The disadvantage is that the trust degrees calculated for the user and the product from the very first feedbacks will not be reliable, because at that stage there are no prefabricated feedbacks to show the feedback provider in order to learn his attitude.

  Jnanamurthy HK and Sanjay Singh [1] proposed a method to detect malicious users in online reputation systems using the Quality Repository Approach (QRA). It focuses on anomalies in both the rating-value domain and the malicious-user domain. In a complex collusion attack, malicious users work as a group to reduce the reputation score by giving dishonest ratings. According to the authors, QRA detects malicious users' ratings efficiently and provides an aggregate trustful rating. It consists of four modules:
  - Change Detector
  - Quality Repository
  - Behavior analysis
  - Aggregation algorithm
    
    A threshold is a value that marks the boundary of a new state. Threshold selection plays an important role in finding malicious users and in deciding whether a user is a true user. Selecting the threshold for a newly launched product is not easy, because it is impossible to predict whether such a product is good or bad. This is the main disadvantage of this system.

    Zhang and Cohen [3] proposed a personalized approach for handling unfair ratings in an enhanced centralized reputation system. They consider the scenario where consumer agents elicit reputation ratings of provider agents from other consumer agents, known as advisor agents. In many multiagent settings, agents are self-interested. When the consumer agent is not confident in its private reputation rating of an advisor agent, it can also use what the authors refer to as the public reputation of the advisor agent. This public reputation is calculated from the advisor agent's ratings for all provider agents in the system. A weighted average of the private and public reputations is then computed to represent the trustworthiness of the advisor agent. Trustworthiness decreases more or less depending on whether the advisor agent provides more or fewer unfair ratings. This method for an enhanced centralized reputation system is inspired by approaches used in distributed reputation systems. The authors claim that the approach remains effective even when the majority of advisor agents provide large numbers of unfair ratings, because it adjusts to rely more heavily on the private reputations of advisor agents. Its applicability is limited because it accepts only binary ratings; support for a range of ratings would be an improvement.

    As the value of reputation systems is widely recognized, the incentive to manipulate such systems is growing rapidly [4]. In the study by Y. Liu and Y. Sun, a complete anomaly detection scheme, TAUCA (Temporal And User Correlation Analysis), was designed and evaluated for securing feedback-based online reputation systems. Jabeen Begum et al. [5] provide a similar technique named TATA (Joint Temporal and Trust Analysis). It protects online reputation systems from a new angle by combining time-domain anomaly detection with Dempster-Shafer theory-based trust computation.

    Yuhong Liu, Yafei Yang and Yan Lindsay Sun [6] propose a scheme that detects collaborative unfair raters based on similarity in their rating behaviour. They address the unfair rating problem by detecting abnormal signals in both the user domain and the rating domain.

    H. Yu, M. Kaminsky, P. B. Gibbons and A. Flaxman introduced SybilGuard [7], a decentralized system that restricts the corruptive influence of Sybil attacks by limiting both the number and the size of Sybil groups. In a Sybil attack, a malicious client obtains numerous fake identities and pretends to be multiple distinct nodes in the system.

    Panayotis Fouliras presents a new Reputation Management System (RMS) [8]. The paper presents several novel but simple ideas for dealing with fake ratings by malicious users and with the multiple fake user accounts used to submit them. Some other vital aspects of a successful reputation system are summarized as follows:
    - Recent ratings should get more weight than older ones.
    - A good sales record for low price range transactions should not carry the full weight when the seller sells an item at a higher price range. A higher price range means a higher profit for the seller, hence a stronger incentive to turn malicious.
    - Entities must be long lived; ratings should show the life of an entity.
    - When a party approaches a certain threshold m of malicious incidents the reputation metric should be reduced to the minimum.
    - The amount of details for each transaction should be kept to a minimum, reducing the chances of information explosion.
    - The raw rating given by an entity to the associated party should be simple and easy to understand.
2. Sentiment Analysis
Sentiment analysis is the procedure by which information is extracted from the opinions, appraisals and emotions of people regarding entities, events and their attributes. This information can be used to check the polarity of opinions. Sentiment analysis is not the main focus of the proposed system, but a small part of it is included to calculate the final global score of the product.

Sentiment analysis can be performed at three levels, depending on the granularity required. Document-level sentiment analysis [9,10,11,12,13] is the simplest form of classification and can be done using supervised or unsupervised machine learning approaches. The same approaches can also be used for sentence-level classification [14,15,16,17,18], in which polarity is calculated for each sentence, since each sentence is treated as a separate unit and can carry a different opinion. The work of Minqing Hu and Bing Liu [19] is the pioneering work in feature-level sentiment analysis. The authors of [20] propose a new approach that uses a feature-oriented lexicon of appraisal words. It is a fine-grained approach in which review categorization is based on the attitude and polarity of the adjectival words for the frequent features of the product.
PROPOSED SYSTEM

The system detects malicious users and generates a trustworthy reputation score for the product that can help subsequent users in decision making. Feedback, reviews, scores, recommendations and any other information given by users are very important for online reputation systems. E-commerce users rely on these opinions about a product to form their own trust. The aim of this system is to make this feedback more reliable. The proposed system is a combination of two modules, as shown in Figure 1:

Module I: Genuineness detection for feedback providers
Module II: Sentiment Analysis (SA)

Figure 1: System Block Diagram

Module I performs the trust analysis of the user, who can be either genuine or non-genuine. The categorization uses a supervised machine learning approach, with the help of a questionnaire given to the user to learn his opinion about the other features of the product.

While submitting feedback, the user is redirected to a short set of questions designed specifically for the same product, which he is asked to complete before the feedback is submitted. Each question is answered with a trustworthiness score from 1 to 4. The result of the questionnaire is compared with the results of the users who have already submitted feedback and have been found genuine. This approach follows the concept of majority rule. Detecting dishonest users at the time feedback is accepted is a different and efficient approach. It is a new method that uses Naïve Bayes, based on majority rule, to provide a reliable reputation score.

In the supervised machine learning approach, the Naïve Bayes algorithm is used. It does not need a lot of data to perform well; it only needs enough data to learn the probabilistic relationship of each attribute, in isolation, with the output variable. In this system, it is expected to give usable estimates of the probability of each class even for a small dataset. The dataset for the Naïve Bayes algorithm is generated from the questionnaire given to users. Its attributes are the different features of the product for which the users' attitude needs to be observed. It contains every user's opinion about these features, that is, the score given for each question. Preparing this dataset is a challenge in the beginning, as there is no basis on which to categorize the initial users. Lab test results and user testing can be used to overcome this drawback. The Naïve Bayes equation is used to calculate the posterior probability of each class: the frequencies and probabilities are calculated from the dataset to prepare the NB model. Once the model is prepared, likelihoods for a new instance can be estimated from these frequencies, and the model can be used to predict whether the next user is Genuine or NonGenuine based on a different set of evidence.

The class with the highest posterior probability is the outcome of the prediction. The outcome of a behaviour is predicted by observing evidence. Generally, it is better to have more than one piece of evidence to support the prediction, and typically the more evidence is gathered, the better the classification accuracy.
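The frequency-based Naive Bayes step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the three-question layout, the training rows and the use of add-one (Laplace) smoothing are all assumptions made for the example. Each row holds one user's questionnaire scores (1 to 4), and the label records whether that user was found genuine.

```python
import math
from collections import Counter, defaultdict


def train_nb(rows, labels, scores=(1, 2, 3, 4)):
    """Count class frequencies and per-question score frequencies."""
    class_counts = Counter(labels)
    # feature_counts[label][question_index][score] -> frequency
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    for row, label in zip(rows, labels):
        for i, score in enumerate(row):
            feature_counts[label][i][score] += 1
    return class_counts, feature_counts, scores


def predict_nb(model, row):
    """Return the class with the highest posterior probability."""
    class_counts, feature_counts, scores = model
    total = sum(class_counts.values())
    best_label, best_logp = None, -math.inf
    for label, n in class_counts.items():
        logp = math.log(n / total)  # prior P(class)
        for i, score in enumerate(row):
            # Likelihood P(score | class) with add-one smoothing
            # (an assumption; the paper does not mention smoothing).
            count = feature_counts[label][i][score]
            logp += math.log((count + 1) / (n + len(scores)))
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


# Hypothetical questionnaire answers for three product features.
rows = [(4, 4, 3), (4, 3, 4), (3, 4, 4), (1, 1, 2), (1, 2, 1)]
labels = ["Genuine", "Genuine", "Genuine", "NonGenuine", "NonGenuine"]
model = train_nb(rows, labels)
print(predict_nb(model, (4, 4, 4)))
```

Working in log space avoids underflow when many questions are multiplied together, and smoothing keeps a score that was never seen for a class from forcing its probability to zero.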

Module II operates on the feedback of users found genuine by Module I. In this module, sentiment analysis is performed on the feedback given by a genuine user. Positive or negative polarity is checked with the help of a positive dictionary and a negative dictionary. Finally, the overall score of the product is determined from the polarity of the feedbacks.

In this module, the raw data is taken and pre-processing steps are applied, including tokenization, stemming and stop word removal. In a given feedback, the numbers of positive and negative words are compared. If there are more positive words than negative words, the feedback is positive; if there are more negative words, it is negative; equal counts give a neutral feedback. This is a simple method of determining the polarity of a feedback. A lexicon-based approach is used to extract sentiment from the text and classify it according to polarity. Two dictionaries are kept in two separate files: one contains all positive words and the other all negative words.
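The pre-processing steps named above (tokenization, stop word removal, stemming) might look like the sketch below. The stop word list, the suffix list and the function names are illustrative assumptions; the crude suffix stripper merely stands in for a real stemmer such as Porter's and is not taken from the paper.

```python
import re

# Tiny illustrative stop word list (an assumption, not the authors' list).
STOP_WORDS = {"the", "a", "an", "is", "are", "and", "but", "of", "to", "it", "this"}

# Suffixes removed by the toy stemmer, longest first.
SUFFIXES = ("ing", "ed", "ly", "s")


def stem(token):
    """Strip one common suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(feedback):
    """Lower-case, tokenize, drop stop words, then stem each token."""
    tokens = re.findall(r"[a-z']+", feedback.lower())
    return [stem(t) for t in tokens if t not in STOP_WORDS]


print(preprocess("The camera is amazing and the battery lasts long!"))
```

If stemming is applied to the feedback, the words in the positive and negative dictionaries would need to be stemmed the same way so that lookups still match.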

Procedure for Sentiment Analysis:

Initialization: PosiCnt = 0; NegCnt = 0

1. For each substring Tj (j = 1 to K) in the user feedback
2.   For each pattern Pi (i = 1 to n) in the positive dictionary P
3.     If substring Tj matches a suffix of pattern Pi
4.       If a mismatch occurs at the next comparison
5.         Then find (if it exists) the rightmost occurrence of Tj
6.         Check that T0' is not a suffix of Pi in P
7.         Check that T0' is not a prefix of Pi in P
8.         PosiCnt = PosiCnt + 1
9.       End If
10.     End If
11.   End For
12. End For
13. For each substring Tj (j = 1 to K) in the user feedback
14.   For each pattern Nc (c = 1 to m) in the negative dictionary N
15.     If substring Tj matches a suffix of pattern Nc
16.       If a mismatch occurs at the next comparison
17.         Then find (if it exists) the rightmost occurrence of Tj
18.         Check that T0' is not a suffix of Nc in N
19.         Check that T0' is not a prefix of Nc in N
20.         NegCnt = NegCnt + 1
21.       End If
22.     End If
23.   End For
24. End For
25. If PosiCnt > NegCnt
26.   Feedback is positive
27. End If
28. If NegCnt > PosiCnt
29.   Feedback is negative
30. End If
31. If PosiCnt equals NegCnt
32.   Feedback is neutral
33. End If
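The counting and comparison in the procedure can be sketched in a few lines. Note the swap: instead of the suffix and prefix string-matching steps, this sketch uses whole-word set lookup, which serves the same purpose of counting only complete dictionary words. The function names and the sample dictionaries are hypothetical, and a regular-expression tokenizer is assumed.

```python
import re


def count_matches(tokens, dictionary):
    """Count tokens that are whole-word entries of the dictionary."""
    return sum(1 for token in tokens if token in dictionary)


def classify_feedback(feedback, positive_words, negative_words):
    """Compare positive and negative word counts (PosiCnt vs. NegCnt)."""
    tokens = re.findall(r"[a-z']+", feedback.lower())
    posi_cnt = count_matches(tokens, positive_words)
    neg_cnt = count_matches(tokens, negative_words)
    if posi_cnt > neg_cnt:
        return "positive"
    if neg_cnt > posi_cnt:
        return "negative"
    return "neutral"  # equal counts, including no matches at all


# Hypothetical miniature dictionaries; the paper keeps them in two files.
POSITIVE = {"good", "great", "excellent"}
NEGATIVE = {"bad", "poor", "slow"}

print(classify_feedback("Great camera, good battery, but slow charging", POSITIVE, NEGATIVE))
```

Storing each dictionary as a set makes every lookup constant time, so the cost grows with the length of the feedback rather than with the size of the dictionaries.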
RESULTS AND DISCUSSION

A mobile phone was chosen as the product, with specific features, to implement this system. Observations were made for five different mobile phone models so that the analysis could be done efficiently and accurately; hence five different datasets were prepared. We collected around 350 reviews from different users for the different mobile phones. The primary challenge was to collect the initial genuine feedbacks, because subsequent feedbacks are evaluated against them. To make this easy and authentic, five Google Forms were prepared for the five phones and shared only with people who currently use that phone or have used it in the past. Each user was asked to rate the quality of different features of the phone on a linear scale from 1 to 4. Once the dataset was prepared, it was used to train the Naïve Bayes classifier to categorize future feedback providers.

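The graphs show the results for five different phones. Figure 2 shows the categorization of users based on genuineness. Figure 3 shows the numbers of positive and negative feedbacks. The polarity of the feedback was observed for genuine users only.

Figure 2: Module 1 analysis in bar chart form
[Bar chart not reproduced. Series: Genuine Users, NonGenuine Users; vertical axis 0 to 80. Bar labels as extracted: 12, 17, 12, 20, 15, 40, 53, 52, 48, 69.]

Figure 3: Module 2 analysis in bar chart form
[Bar chart not reproduced. Series: Positive Feedbacks, Negative Feedbacks; vertical axis 0 to 70. Bar labels as extracted: 58, 39, 35, 40, 35, 8, 5, 11, 18, 13.]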
CONCLUSION

Unfair ratings, which are used to unfairly increase or decrease the reputation of an entity, are a common problem. The proposed system aims to ensure that only true and trusted feedbacks are displayed and that false and ill-intentioned feedbacks are rejected, thus providing a trustworthy reputation score for a specific product or service and supporting relying parties in taking the right decision while interacting with an e-commerce application. Products and services sold online then receive ratings that reflect their actual capabilities, which helps customers make the right choice about which product or service to buy and in turn helps to build trust in online transactions, since only true product ratings and trusted user reviews are shown. Overall, this approach enhances trustworthiness, detects malicious users who insert dishonest ratings, bears considerable potential and might thus lead to substantially more robust reputation systems and an enhanced user experience.

REFERENCES

[1] Jnanamurthy, H.; Singh, S., "Detection and Filtering of Collaborative Malicious Users in Reputation System using Quality Repository Approach", Advances in Computing, Communications and Informatics (ICACCI), 2013 International Conference on, pp. 466-471, 22-25 Aug. 2013, doi: 10.1109/ICACCI.2013.6637216
[2] Hasnae Rahimi, Hanan EL Bakkali, "New Reputation Algorithm for Evaluating Trustworthiness in E-Commerce Context", IEEE, 2013
[3] Jie Zhang and Robin Cohen, "A Personalized Approach to Address Unfair Ratings in Multiagent Reputation Systems", in Proc. of the Fifth Int. Joint Conf. on Autonomous Agents and Multiagent Systems (AAMAS) Workshop on Trust in Agent Societies, 2006
[4] Yuhong Liu and Yan (Lindsay) Sun, "Anomaly Detection in Feedback-based Reputation Systems through Temporal and Correlation Analysis", in Proc. of 2nd IEEE Int. Conf. on Social Computing, Aug 2010
[5] Dr. S. Jabeen Begum, Mr. G. Rajesh Kumar, R. Varanambigai, "Reputation Management using Trust Based Decision Making System through Temporal and Correlation Analysis", IJRITCC, Volume: 2 Issue 5, May 2014
[6] Liu Yuhong, Yafei Yang, and Yan Lindsay Sun, "Detection of collusion behaviors in online reputation systems", 42nd Asilomar Conference on Signals, Systems and Computers, IEEE, 2008
[7] Haifeng Yu, Michael Kaminsky, Phillip B. Gibbons, Abraham Flaxman, "SybilGuard: Defending Against Sybil Attacks via Social Networks", ACM 1-59593-308, Sep, 2006
[8] Panayotis Fouliras, "A novel reputation-based model for e-commerce", Springer-Verlag, April, 2011
[9] Peter D. Turney, (2002), "Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews", Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, pp. 417-424
[10] Richa Sharma, Shweta Nigam and Rekha Jain, (2014), "Opinion Mining Of Movie Reviews at Document Level", International Journal on Information Theory (IJIT), Vol.3, No.3
[11] Yan Zhao, Suyu Dong and Leixiao Li, (2014), "Sentiment Analysis on News Comments Based on Supervised Learning Method", International Journal of Multimedia and Ubiquitous Engineering, Vol.9, No.7, pp. 333-346
[12] Lina L. Dhande and Dr. Prof. Girish K. Patnaik, (2014), "Analyzing Sentiment of Movie Review Data using Naive Bayes Neural Classifier", International Journal of Emerging Trends & Technology in Computer Science (IJETTCS), Volume 3, Issue 4, ISSN 2278-6856
[13] Gautam Kumar, Pawan kumar Goel, Sanjeev kumar Chauhan, Anand kumar Pandey, (2012), "Opinion mining and summarization for customer reviews", International Journal of Engineering Science and Technology (IJEST), Vol. 4 No.08, ISSN: 0975-5462
[14] V. S. Jagtap, Karishma Pawar, (2013), "Analysis of different approaches to Sentence-Level Sentiment Classification", International Journal of Scientific Engineering and Technology, pp. 164-170
[15] Raisa Varghese, Jayasree M, (2013), "A Survey on Sentiment Analysis and Opinion Mining", International Journal of Research in Engineering and Technology (IJRET), eISSN: 2319-1163, pISSN: 2321-7308
[16] Gizem Gezici, Berrin Yanikoglu, Dilek Tapucu, and Yucel Saygin, (2012), "New Features for Sentiment Analysis: Do Sentences Matter?", First International Workshop on Sentiment Discovery from Affective Data (SDAD)
[17] S Padmaja and Prof. S Sameen Fatima, (2013), "Opinion Mining and Sentiment Analysis - An Assessment of Peoples' Belief: A Survey", International Journal of Ad hoc, Sensor & Ubiquitous Computing (IJASUC), Vol.4, No.1
[18] Aurangzeb Khan, Baharum Baharudin, (2011), "Sentiment Classification by Sentence Level Semantic Orientation using SentiWordNet from Online Reviews and Blogs", Int. J Comp Sci. Emerging Tech, Vol-2 No 4
[19] Minqing Hu and Bing Liu, (2004), "Mining and Summarizing Customer Reviews", ACM
[20] D D Chaudhari, R A Deshmukh, A B Bagwan, P K Deshmukh, (2013), "Feature based approach for Review Mining Using Appraisal Words", IEEE, ISBN 978-1-4799-1082-3
