- Open Access
- Authors : Akshatha T M, Dr. M. N Veena
- Paper ID : IJERTCONV8IS14051
- Volume & Issue : NCETESFT – 2020 (Volume 8 – Issue 14)
- Published (First Online): 28-08-2020
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License: This work is licensed under a Creative Commons Attribution 4.0 International License
Machine Learning Framework for Detecting Spammer and Fake Users on Twitter
Akshatha T M
Department of MCA, PES College of Engineering, Mandya, Karnataka, India
Dr. M. N Veena
Department of MCA, PES College of Engineering, Mandya, Karnataka, India
Abstract Twitter has rapidly become an online source for acquiring real-time information about users. Twitter is an Online Social Network (OSN) where users can share anything and everything, such as news, opinions, and even their moods. Several arguments can be held over different topics, such as politics, current affairs, and important events. When a user tweets something, it is instantly conveyed to their followers, allowing them to spread the received information at a much broader level. With the evolution of OSNs, the need to study and analyze users' behaviors on online social platforms has intensified. Spammers can be identified based on: (i) fake content, (ii) URL-based spam detection, (iii) spam in trending topics, and (iv) fake user identification. In this work, machine learning algorithms are used to identify fake users and spammers on Twitter.
Keywords Spammers, fake identification, machine learning, online social platform.
INTRODUCTION
Several research works have been carried out on Twitter, one of the most popular social media platforms. Nowadays most people use Twitter, and Twitter also has fake users, so this work focuses on identifying fake users on Twitter. Fake users are identified based on: (i) fake content, (ii) URL-based spam detection, (iii) spam in trending topics, and (iv) fake user identification. Identifying fake users matters because they waste other users' time: they post frequently, and their posts are not relevant to the other users.
LITERATURE REVIEW
One survey reviews new methods and techniques for Twitter spam detection and presents a comparative study of the current approaches. Another survey examines the different behaviors exhibited by spammers on the Twitter social network and provides a literature review that identifies the existence of spammers on Twitter. Despite all the existing studies, there is still a gap in the existing literature.
Fake Content
Fake content is false information posted by a user. The admin decides whether a user's tweet is fake based on the likes, dislikes, and comments it receives. A single tweet is not enough to decide whether the user is fake, because a user may post a tweet based on some (possibly wrong) information; the admin therefore has to check the user's history.
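The paper does not give a concrete rule for this decision. As a minimal sketch, assuming a tweet is treated as suspicious when most of its votes are dislikes or several comments flag it as fake, and a user is flagged only when a large share of their history is suspicious, judging the whole history rather than a single tweet could look like this. The `Tweet` fields, the thresholds, and the function names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    likes: int
    dislikes: int
    fake_flags: int  # hypothetical: number of comments from other users calling the tweet fake


def tweet_looks_fake(t: Tweet, dislike_ratio: float = 0.6, min_flags: int = 3) -> bool:
    """Hypothetical per-tweet rule: mostly disliked, or flagged in several comments."""
    votes = t.likes + t.dislikes
    ratio = t.dislikes / votes if votes else 0.0
    return ratio >= dislike_ratio or t.fake_flags >= min_flags


def user_looks_fake(history: list[Tweet], min_tweets: int = 5, fake_share: float = 0.5) -> bool:
    """Judge a user on their whole history, never on a single tweet."""
    if len(history) < min_tweets:
        return False  # too little history to decide either way
    suspicious = sum(tweet_looks_fake(t) for t in history)
    return suspicious / len(history) >= fake_share
```

A user with one badly received tweet is not flagged (the `min_tweets` guard), which mirrors the point that one tweet alone is not evidence; the thresholds would need tuning on labelled data.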
URL-Based Spam Detection
Users can post tweets on Twitter, and these tweets can contain URLs. Spammers or fake users may post fake URLs to obtain other users' personal information, or even to access their bank details if a user grants permission to access the data.
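A simple way to illustrate URL-based checking is a domain blacklist. The sketch below assumes a hypothetical set `BLACKLISTED_DOMAINS` (a real deployment would rely on a maintained threat feed) and uses only the Python standard library to extract URLs from a tweet and test each host, including its subdomains, against that list.

```python
import re
from urllib.parse import urlparse

URL_PATTERN = re.compile(r"https?://\S+")

# Hypothetical blacklist for illustration only.
BLACKLISTED_DOMAINS = {"free-prizes.example", "bank-login.example"}


def extract_urls(tweet_text: str) -> list[str]:
    """Return all http(s) URLs found in the tweet text."""
    return URL_PATTERN.findall(tweet_text)


def is_spam_url(url: str) -> bool:
    """True if the URL's host is a blacklisted domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in BLACKLISTED_DOMAINS)


def tweet_has_spam_url(tweet_text: str) -> bool:
    return any(is_spam_url(u) for u in extract_urls(tweet_text))
```

Matching on the parsed hostname rather than on the raw string avoids flagging a URL just because a blacklisted name appears in its path.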
Spam in Trending Topics
Users can post on trending topics, and the admin identifies whether such a post is fake using the minimum weight algorithm.
Fake User
The admin identifies fake users using the k-Means algorithm, as part of a hybrid detection technique.
PROPOSED METHODOLOGY
In this paper, the detection of fake users is divided into four tasks: (i) fake content, (ii) URL-based spam detection, (iii) detecting spam in trending topics, and (iv) fake user identification. Machine learning algorithms, namely Random Forest, Minimum Weight, and K-Means, are applied at different stages to identify fake users and spammers on Twitter.
3.1 Random Forest Algorithm
Random forest is a supervised machine learning algorithm used for classification. In this paper it is used to identify spammers: the spammer categories are defined first, and spammers are then identified by classifying users into these categories.
Steps for Random Forest algorithm
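Step 1: Draw several random subsets of training data (training data 1 to N) from the training dataset.
Step 2: Build a decision tree from each subset, using the information (features) it contains.
Step 3: Predict the class of new data by combining the outputs of all the trees, typically by majority vote.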
Figure 1: Random Forest Algorithm (diagram: the training dataset is split into training data 1 to N, whose results are combined into a single prediction)
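For illustration only, the three steps can be sketched in pure Python. This is a deliberately small random forest whose trees are depth-one decision stumps split on Gini impurity; the feature layout assumed here (followers/following ratio, tweets per day, share of tweets containing URLs) and the labels (1 = spammer, 0 = genuine) are hypothetical, not the features or dataset used in this work. In practice a library implementation such as scikit-learn's `RandomForestClassifier` would normally be used.

```python
import random
from collections import Counter


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_stump(X, y, features):
    """Fit a one-split decision tree (a stump) using only the given feature indices."""
    best = None
    for f in features:
        values = sorted({row[f] for row in X})
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2  # candidate threshold midway between two observed values
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, majority(left), majority(right))
    if best is None:  # no selected feature varies in this sample
        m = majority(y)
        return (features[0], float("inf"), m, m)
    return best[1:]


def fit_forest(X, y, n_trees=25, n_features=2, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Step 1: draw a bootstrap sample (rows drawn with replacement)
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        Xb, yb = [X[i] for i in idx], [y[i] for i in idx]
        # Step 2: build a tree on that sample from a random subset of features
        feats = rng.sample(range(len(X[0])), n_features)
        forest.append(fit_stump(Xb, yb, feats))
    return forest


def predict(forest, row):
    # Step 3: every tree votes; the majority label is the prediction
    votes = [left if row[f] <= t else right for f, t, left, right in forest]
    return majority(votes)
```

Calling `fit_forest(X, y)` on a few labelled accounts and then `predict(forest, row)` returns the majority vote of the trees; bootstrap sampling and random feature subsets are what make the trees differ from one another.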
Minimum Weight Algorithm
The minimum weight algorithm is part of the decision tree family. It takes fine-grained data (the smallest divisible object), where each object consists of a single keyword.
Steps of the minimum weight algorithm:
Step 1: Initialize a machine learning model object for the problem.
Step 2: Identify the optimal model weights for the particular training dataset by calling the fit method of the object initialized in Step 1.
Step 3: Predict the labels for a test dataset by calling the predict method of the object initialized in Step 1.
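These steps follow the generic initialize/fit/predict pattern. Because the paper does not define the minimum weight computation precisely, the sketch below substitutes a naive-Bayes-style keyword log-odds weighting as a stand-in: each single keyword receives a weight learned from labelled tweets, and a tweet is labelled spam (1) when the summed weights of its keywords are positive. The class name, the `smoothing` parameter, and the zero decision threshold are assumptions.

```python
import math
from collections import Counter


class KeywordWeightClassifier:
    """Hypothetical stand-in: per-keyword log-odds weights learned from labelled tweets."""

    def __init__(self, smoothing: float = 1.0):
        # Step 1: initialise the model object with empty weights
        self.smoothing = smoothing
        self.weights: dict[str, float] = {}

    def fit(self, tweets: list[str], labels: list[int]) -> "KeywordWeightClassifier":
        # Step 2: learn one weight per keyword from the training dataset
        spam, ham = Counter(), Counter()
        for text, label in zip(tweets, labels):
            (spam if label == 1 else ham).update(set(text.lower().split()))
        n_spam = sum(labels)
        n_ham = len(labels) - n_spam
        s = self.smoothing
        for word in spam.keys() | ham.keys():
            p_spam = (spam[word] + s) / (n_spam + 2 * s)  # smoothed share of spam tweets with word
            p_ham = (ham[word] + s) / (n_ham + 2 * s)     # smoothed share of normal tweets with word
            self.weights[word] = math.log(p_spam / p_ham)
        return self

    def predict(self, tweets: list[str]) -> list[int]:
        # Step 3: label each test tweet by the summed weight of its keywords
        return [int(sum(self.weights.get(w, 0.0) for w in set(t.lower().split())) > 0)
                for t in tweets]
```

Unseen keywords get weight 0, so they neither push a tweet towards spam nor away from it.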
K-Means Algorithm
K-Means is an unsupervised learning algorithm used for clustering. In this work it is used to identify fake users on Twitter.
Steps of K-Means Algorithm:
Step 1: Choose K, the number of clusters to be generated by the algorithm.
Step 2: Randomly select K data points as the initial cluster centroids.
Step 3: Assign each data point to its nearest centroid, then recompute each centroid as the mean of the points assigned to it.
Step 4: Repeat Step 3 until the centroids are optimal, i.e., the assignment of data points to clusters no longer changes.
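These steps describe standard k-means (Lloyd's algorithm). A minimal pure-Python sketch follows; clustering accounts on features such as followers/following ratio and tweets per day is an illustrative assumption, and the function name and parameters are hypothetical.

```python
import math
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means (Lloyd's algorithm) on sequences of numeric features."""
    rng = random.Random(seed)
    # Step 2: pick k random data points as the initial centroids
    centroids = [list(p) for p in rng.sample(points, k)]
    assignment = None
    for _ in range(max_iter):
        # Step 3a: assign every point to its nearest centroid (Euclidean distance)
        new_assignment = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Step 4: stop once the assignment no longer changes
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 3b: recompute each centroid as the mean of its members
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, assignment
```

K-means only returns unlabeled groups, so deciding which cluster represents fake users still needs a separate rule or manual inspection. Features on very different scales (such as a ratio near 1 and a count in the hundreds) would normally be standardized first, since the largest-scale feature otherwise dominates the distance.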
These are the algorithms used in this work.
Figure 2: Proposed Model
EXPERIMENTAL RESULTS
Figure 3: User dataset
The table in Figure 3 contains the user information; it also stores the URLs of the users' images.
Figure 4: User Profile
The user profile page contains the user's information, which is also displayed to other Twitter users. From this page, the type of the user can easily be identified.
Figure 5: Search Friend
The search friends page is used to search for users who are on Twitter, so that a Twitter user can easily find their friends.
Figure 6: View Post Comment
On Twitter, users post pictures and information for their friends, who may then comment on those pictures. On this page, users can easily view the comments sent by their friends.
Figure 7 : Add Spammer Filter
This page is on the admin side: the admin adds spam words to spammer categories, and based on these categories spammer users can easily be identified. The random forest algorithm is used here.
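The admin-side filter can be pictured as a mapping from spammer categories to the words the admin has registered. The sketch below shows only that lookup step, not the random forest classifier used in this work; the names `spam_filter`, `add_spam_word`, and `spam_categories` are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical store: spammer category -> set of spam words added by the admin
spam_filter = defaultdict(set)


def add_spam_word(category: str, word: str) -> None:
    """Admin action: register a spam word under a spammer category."""
    spam_filter[category].add(word.lower())


def spam_categories(tweet: str) -> set[str]:
    """Return every category whose words appear in the tweet (empty set means no match)."""
    words = set(re.findall(r"[a-z0-9']+", tweet.lower()))
    return {cat for cat, vocab in spam_filter.items() if words & vocab}
```

Tokenizing with a regular expression rather than `str.split` keeps punctuation such as "discount!" from hiding a registered word.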
Figure 8: Spammer Detection
On this page, the admin identifies spammers and their categories.
The minimum weight algorithm is used here.
Figure 9: Fake user Identification
Figure 9 is also important for identifying fake users on Twitter: this page shows the information of the fake users. The k-Means algorithm is used to find the fake users on Twitter.
Figure 10: Fake User Identification Result
The graph shows the posts of the fake users.
CONCLUSION
Although successful strategies have been developed for spam detection and fake user recognition on Twitter, many problems remain open for further research. The issues are highlighted as follows. False news identification on social media is a problem that needs to be explored because of the serious repercussions of such news at the individual as well as the collective level. Another related subject worth exploring is the discovery of rumor sources on social media. While a few experiments based on different techniques have already been performed to identify the origins of misinformation, more advanced approaches, e.g., social network-based approaches, can be extended because of their demonstrated efficacy.
REFERENCES
[1] F. Masood, G. Ammad, A. Almogren, and A. Abbas, "Spammer Detection and Fake User Identification on Social Networks," May 2019.
[2] I. Inuwa-Dutse, M. Liptrott, and I. Korkontzelos, "Detection of spam-posting accounts on Twitter."
[3] M. A. Wani and S. Jabina, "A sneak into the Devil's Colony: Fake Profiles in Online Social Networks."
[4] M. Fire, G. Katz, and Y. Elovici, "Strangers Intrusion Detection: Detecting Spammers and Fake Profiles in Social Networks Based on Topology Anomalies."
[5] A. Bhangare, S. Ghodke, K. Walunj, and U. Yewale, "Twitter Spammer Detection."
[6] K. Shu, A. Sliva, S. Wang, and J. Tang, "Fake News Detection on Social Media: A Data Mining Perspective."
[7] S. Marsland, Machine Learning: An Algorithmic Perspective.
[8] N. Eshraqi, M. Jalali, and M. H. Moattar, "Detecting spam tweets in Twitter using a data stream clustering algorithm," in Proc. Int. Congr. Technol., Commun. Knowl. (ICTCK), Nov. 2015, pp. 347-351.
[9] C. Chen, Y. Wang, J. Zhang, Y. Xiang, W. Zhou, and G. Min, "Statistical features-based real-time detection of drifted Twitter spam," IEEE Trans., Apr. 2017.
[10] C. Buntain and J. Golbeck, "Automatically identifying fake news in popular Twitter threads," Nov. 2017.
[11] C. Chen, J. Zhang, Y. Xie, Y. Xiang, W. Zhou, M. M. Hassan, A. AlElaiwi, and M. Alrubaian, "A performance evaluation of machine learning-based streaming spam tweets detection," Sep. 2015.
[12] G. Stafford and L. L. Yu, "An evaluation of the effect of spam on Twitter trending topics," Sep. 2013.
[13] M. Mateen, M. A. Iqbal, M. Aleem, and M. A. Islam, "A hybrid approach for spam detection for Twitter," Jan. 2017.
[14] A. Gupta and R. Kaushal, "Improving spam detection in online social networks," Mar. 2015.
[15] B. D. Parameshachari et al., "Epileptic Seizure Detection Using Machine Learning," in 1st Int. Conf. on Emerging Trends in Engineering, Innovative Science and Management (ICETEISM-2019), 2019.