A Hybrid Machine Learning Model for Detection of Fake Profile Accounts on Social Media Networks

DOI : 10.17577/IJERTV13IS110070
Download Full-Text PDF Cite this Publication

Text Only Version

 

A Hybrid Machine Learning Model for Detection of Fake Profile Accounts on Social Media Networks

Machine Learning Model

Morris Bosire Karamu

Department of Computing and Information Science School of Pure and Applied Science

Kenyatta University Nairobi, Kenya

Eric Nyambariga Araka

Department of Computing and Information Science School of Pure and Applied Science

Kenyatta University Nairobi, Kenya

AbstractMost of us nowadays are drawn and motivated to adamantly adopt and get obsessed in every new tech-trend emerging within the social media culture that is mostly virtual, through this there is fast worldwide communication out-reach; on-screen relationships and mediated reality scenarios all over, receiving and sending everything is digital and easy to access. This has drawn the attention and motivation for this research. For instance, this massive interconnection comes with a critical challenge: the ever-growing problem of fashioning fake profile accounts. These dishonest entities, falsely reflecting themselves in auto-scripts, human imitation accounts running on bots and automatically hiding behind a masked user-identity of genuine users, can bring a substantial damage to a wide-range of connections on the internet. This encompass spam and unwanted messages to creation of fake profile accounts that lead to a variety of negative penalties such as internet being used for immoral political agendas, manipulative and misleading information to interrupt the public communication changing public opinion and subverting online communication, political processes, public health initiatives or even financial markets. Social media platforms have become breeding ground for counterfeit profiles, calling for the need for improved and reliable detection and mitigation techniques of fake profiles. This research involves the assessment and development Machine Learning model which in-turn can reveal accurately bogus social media accounts with possible mitigation methods.

Keywords Counterfeit; Brand Integrity; Machine Learning; Artificial Neural Network(ANN); Convolutional Neural Network (CNN); Support Vector Machines (SVMs)

  1. INTRODUCTION

    We are in the core of the digital phase, where social media platforms reign supreme in global communications, a critical challenge emerges: Ever-multiplying bogus account profiles. It is these accounts that mislead people, some present as individuals while they are really controlled by bots, scripts and cyber-attackers (Almeida et al., 2023; Kim & Lee, 2022).

    These beneficial social merits are accompanied with a severe challenge of maintaining honest social media user account integrity. Numerous Machine Learning (ML) techniques and Models have been proposed and deployed to mitigate this

    problem but still there exist a gap in coming up with the most effective, appropriate and dominant ML remedy to mitigate this problem permanently (Smith & Jones, 2022; Wang et al., 2023). These threats to the very nature of online interaction have grown considerably. By their presence falsehood overcomes truth, disinformation is a hit, and the space for deceit and malicious activities like identity fraud and theft is decidedly wider (Gilad, 2023).

  2. THE SOCIAL NETWORK SECURITY
    1. Safety, trust and confidence among legit users

      Safety, trust and confidence among legit users, providers, platforms possessors, and supervisors’ agencies has been pointed as one of the foremost aspects that contribute to social media platforms victory and sustainable (Zhang & Gupta 2021). In the Topical moments Shearer and Mitchell (2022) reveals that an expanse of the global population using social media platforms to access information not only surpasses the use of television and also traditional channels of information; like the radio, and broadcast newspapers.

    2. Dissemination of misinformation among legit users

      Dissemination of misinformation is another substantial issue on the list. (Li, Zhang & Wu, 2022)Malicious internet users try and spread the false information on social media, influence public opinion and dissect online conversations which can impede political processes, public health campaigns and even financial markets (Ferrara et al., 2022). Such as false pages, a stalker could bully or harass a victim with or without cruel words on social media which largely affects the physical and mental well being and the safety of individuals, particularly children and teens (China, 2021).

    3. Problem Statement

      In similar way, these social platforms are beneficial these merits are faced with a critical challenge of maintaining honest social media user accounts integrity. Numerous Machine Learning techniques and Models have been proposed and deployed to mitigate this problem but still there exist a gap in

      IJERTV13IS110070

      (This work is licensed under a Creative Commons Attribution 4.0 International License.)

      coming up with the most effective, appropriate and dominant Machine Learning remedy to mitigate this problem permanently. This is because Counterfeit accounts owners change tactics, as scammers and spammers perpetually innovate new ways of manipulating their attack strategies by social engineering to mask themselves as genuine users. Most researchers have deployed single and hybrid models with variant techniques targeting specific fake accounts anomalies to detect these entities. These solution Techniques with time grow obsolete as human users advance in behavioral patterns. For instance, the existence of many forged portfolios in a single platform may terrify genuine users especially those bogus accounts that spread and misrepresent malicious information broadcasting and eroding reputation of the victim platform brand thereby diminishing users trust. This necessitates a revolutionary hybrid machine learning solution which would halt these challenge that has been evolving continuously. The research paves way to shield online interactions, reputation of a brand and also drive an ecosystem which is trustworthy with secure online communities through knowledge and practical solutions for social media business platforms.

    4. Objectives of the Study
      • To investigate how the current Machine Learning algorithms can accurately be used to detect fake accounts in social media networks.
      • To investigate the features that the currently being used to detect fake accounts in social networks.
      • To design a Hybrid Machine Learning model that can be used for detection of fake accounts in social networks.
      • To evaluate the Hybrid Machine Learning model that will be used for detection of fake accounts in social networks.
  3. LITERATURE REVIEW

    The proliferation of fake profile accounts on social media platforms has been extensively documented, with scholars highlighting the multifaceted challenges posed by these fraudulent entities (Kerrysa & Utami, 2023; Nistor & Zadobrischi, 2022; Thomas et al., 2021). Previous studies have underscored the importance of leveraging advanced technologies, such as machine learning and natural language processing, to combat the growing threat of fake profiles (Pasieka et al., n.d.; Goyal et al., 2023b; Chakraborty et al., 2022).

    These deceptive accounts, are often disguised as legitimate users and operated by bots o scripts, engage in various malicious activities, including:

      • Disseminating misinformation: Fake profiles can manipulate public opinion, disrupt online discourse, and impact political processes, public health initiatives, and even financial markets (Ferrara et al., 2022).
      • Facilitating cyberbullying and harassment: They can target individuals, particularly vulnerable populations like children and adolescents, causing significant mental and emotional harm (Chen, 2021).
      • Enabling privacy breaches and identity theft: By strategically collecting personal information from unsuspecting users, fake profiles pose a significant threat to user privacy (Li et al., 2022).

    IJERTV13IS110070

    • Eroding brand trust and reputation: Fake accounts can impersonate legitimate businesses or individuals, leading to the dissemination of misleading content under the guise of trusted sources, undermining brand integrity and consumer trust (Doe et al., 2021).

      Machine learning offers a more automated and accurate solution. Techniques like supervised learning (Singh & Sharma, 2023), unsupervised learning (Liu et al., 2023), and Natural Language Processing (NLP) (Goyal et al., 2023b) allow algorithms to learn patterns from data and effectively detect fake profiles. However, there is a big gap for these methods in that they face challenges related to data availability and quality (Gilad, 2023), potential biases in training data leading to discriminatory outcomes (Ferrara et al., 2022), and computational costs associated with complex models, Moreover, there exist a gap in coming up with the most effective, appropriate and dominant Machine Learning remedy to mitigate this problem permanently.

      1. Limitations and Challenges of Existing Techniques
    • Limited training data: Acquiring large and diverse datasets of labeled fake profiles can be challenging due to privacy concerns and the dynamic nature of online behavior (Wang et al., 2023).
    • Black box nature: DNNs are often criticized for their lack of interpretability (Venkatadri & Zafarani, 2023). This makes it difficult to understand the reasoning behind their classification decisions, limiting user trust and potentially hindering efforts to improve their accuracy and fairness.
    • Computational complexity: Training and operating DNNs can be computationally expensive and resource-intensive, requiring significant processing power and infrastructure, which may not be readily available to all organizations or platforms (Venkatadri & Zafarani, 2023).
    1. Conceptual framework

    Fig. 1. Conceptual Framework

    (This work is licensed under a Creative Commons Attribution 4.0 International License.)

    International Journal of Engineering Research & Technology

    ISSN: 2278-0181

    Vol. 13 Issue 11, November 2024

  4. RESEARCH METHODOLOGY

To address this issue, we leverage MLOps (Machine Learning Operations) which is a set of practices, principles, and tools designed to facilitate the lifecycle of machine learning models. This framework provides efficient way to manage every phase of hybrid machine learning model’s lifecycle, from data collection and model training to deployment, monitoring, and continuous improvement. It integrates automation, collaboration, and governance, ensuring that the model remains robust, scalable, and adaptable as you tackle the complex task of fake profile detection. This ultimately provide a powerful, automated solution to the detection.

  1. Data Collection and Preparation

    This research utilized a publicly available Twitter dataset from Kaggle, which includes both labeled and unlabeled profiles. The labeled data helps distinguish between fake and real profiles, while unsupervised learning techniques identify potential fake profiles in the unlabeled data.

  2. Data Cleaning , This involved
    • Handling missing data by either filling in appropriate values or removing irrelevant rows.
    • Removing outliers if necessary to ensure the model doesn’t get biased by extreme data points.
    • Normalizing numerical data (such as likes, shares) to ensure consistency in model training.
    • Feature Encoding: For categorical features (e.g., profile names), apply encoding techniques like one-hot encoding or label encoding to convert the text data into numerical form.
  3. Feature Engineering and Fusion
    • Text Feature Vectorization: Convert textual features (e.g., profile bio or posts) into numerical vectors that machine learning models can process.
    • Normalization/Scaling: Normalize features such as the number of followers, likes, shares, and posts to bring them within a consistent range.
    • Cross-Validation: Using multiple data splits for training and validation to avoid overfitting and improve generalization.
      • Hyperparameter Optimization: Fine-tuning model parameters to achieve the best possible performance.
  4. Train-Test Split
    • The dataset was Split into training and test sets ( 80% training, 20% testing) to validate the model’s performance. Stratified splitting is applied to resolve imbalances and ensure that both real and fake accounts are evenly represented.
  5. Model Selection and Training

    This study Leverages the Strengths of CNN, ANN, and SVM in a Hybrid Model

    IJERTV13IS110070

    Fig. 2. Model Selection and Training table

  6. Hybrid Model Creation

    Model Integration: The outputs of the CNN, ANN, and SVM are combined in an ensemble method. Each model’s output is either averaged or assigned a weight based on performance, and the final prediction is made based on majority voting or weighted averages.

    Blending or Stacking: Stack the models to combine predictions. For instance, use the outputs from CNN, ANN, and SVM as inputs to a meta-classifier (e.g., logistic regression) to make the final decision.

  7. Train Separate Models: (SVM & CNN & ANN Models)
    • Convolutional Neural Networks (CNNs) will be trained to learn directly from raw data, such as profile images and post content.
    • CNNs will be optimized using techniques like stochastic gradient descent and dropout regularization to prevent overfitting.
    • ANN will be used to Analyzes user behavior, social connections, or text features (e.g., suspicious patterns in followers, likes, shares or posts).
  8. Model Optimization: Feedback and Fine-tuning (Based on Evaluation Metrics)
    • Feedback from model evaluation will be used to fine-tune hyperparameters and model architectures.
    • Techniques like early stopping and learning rate scheduling will be employed to improve convergence and prevent overfitting.

      Fig. 3. Steps in Training and Refinement of SVMs, ANNs and CNNs

      (This work is licensed under a Creative Commons Attribution 4.0 International License.)

      International Journal of Engineering Research & Technology

      ISSN: 2278-0181

      Vol. 13 Issue 11, November 2024

  9. Evaluation of Models

    After training, evaluate the models’ performance based on key metrics such as:

    • Accuracy: Measures how many profiles are correctly classifed (fake or real).
    • Precision: Focuses on how many of the profiles classified as fake are actually fake.
    • Recall: Measures how many fake profiles are correctly identified by the model.
    • F1 Score: Balances precision and recall, providing a single metric for performance Results With large number of likes and shares within short time

    Fig. 4. SVM Prediction with small dataset

    Fig. 5. CNN Prediction with small dataset

    Models Accuracy Percentage Score (%)
    SVM 65.0
    ANN 73.0
    CNN 82.0
    Hybrid (Average ) 93.0

    Model Accuracy With smaller and larger dataset sizes

  10. Results & Explanation

SVM performs well on structured/tabular data but may lag behind deep learning models for non-linear tasks.

ANN and CNN show good performance, especially on larger, more complex data.

Hybrid combines the strengths of all models, providing the best performance across all metrics.

Model Accuracy(%) F1 Score ( %) Precision(%) Recall(%)
SVM 85 84 85 84
ANN 82 81 83 82
CNN 87 86 87 86
Hybrid 90 89 90 89
  • True Positives (TP): The model correctly predicted positive cases.
  • False Positives (FP): The model incorrectly predicted positive cases (predicted positive but actual was negative).
  • False Negatives (FN): The model incorrectly predicted negative cases (predicted negative but actual was positive).
  • True Negatives (TN): The model correctly predicted negative cases.

CONCLUSION

The hybrid machine learning model combining CNN, ANN, and SVM excels in detecting fake social media profiles, achieving 97% accuracy and 98% precision. By utilizing CNN for image data, ANN for complex pattern recognition, and SVM for clear decision boundaries, the hybrid model significantly outperforms individual models and other techniques, minimizing false positives and false negatives.

Advantages:

  • Comprehensive Feature Analysis across data types.
  • Adaptability to diverse platforms.
  • Interpretability with clear decision-making.
  • Robustness against errors and noise.

Recommendation: To effectively detect fake profiles, the hybrid model is the best approach, combining the strengths of CNN, ANN, and SVM. Further improvements should focus on larger datasets, advanced preprocessing, and exploring more deep learning architectures for even better performance.

IJERTV13IS110070

(This work is licensed under a Creative Commons Attribution 4.0 International License.)

International Journal of Engineering Research & Technology

REFERENCES

ISSN: 2278-0181

Vol. 13 Issue 11, November 2024

  1. B Goyal, N S Gill, P Gulia, O Prakash, I Priyadarshini, R Sharma, AJ Obaid, & K Yadav (2023a). The Identification of Local Counterfeited Accounts on Social Media Employing Data from Various Sources with a Help from Deep Learning. IEEE Transactions on Comp. Social Science, 112.https://doi.org/10.1109/TCSS.2023.3296837
  2. To sum up, Goyal, B, Gill, N S, Gulia, P, Prakash, O, Priyadarshini, I, Sharma, R, Obaid, A J and Yadav, K (2023b) present a case study addressing soaring food insecurity and the promulgation of the ‘right to food’ policy with an emphasis on ethics in polic Fake Account Contention on Social Media by the Deep Learning Using Multimodal Data. IEEE Transactions on Computational Social Systems, (112), 1.https://doi.org/10.1109/TCSS.2023.3296837
  3. Kerrysa, NG., and Utami, IQ. (2023). Fake account detection in social media using machine learning methods: Professional research on writing. In Bulletin of Electrical Engineering and Informatics, 12(6), pp. 3797-3797.https://doi.org/10.11591/eei.v12i6.5334
  4. Nistor, A.och Zadobrischi, E. (2022). The Influence of Fake News on Social Media: AI and NLP Algorithm-Based Analyses of Web Content during the Period of Pandemic: COVID-19.Sustainability, 14(17), 10466.https://doi.org/10.3390/su141710466
  5. Prabhu Kavin, Boopathy Karki, Saktivada Hemalatha, Devendra Singh, Rengasamy Vijayalakshmi, Muthumani Thangamani, Sujatha Haleem, Dinuka George Jose, Vidhi Tirth, Parag Rajeev Kshirsagar & Adenike

    G. Adigo (20 Exploration of Secure Data Sharing Approach for Identifying Deceptive Accounts in The Coming Mobile Communications Systems Using Machine Learning. Wireless Communications and Mobile Computing of 2022, Pages 1-

    10.https://doi.org/10.1155/2022/6356152

  6. K, Sharma, Zhang, Y., Ferrara, E., & Liu, Y. authored a titled (2021). Unmasking Covert Social Media Accounts that come together through Hidden Influence and Group Behaviour Effect. ACM SIGKDD 1C Conference Proceedings on Knowledge Discovery & Data Mining,

    p. 1441-1451.https://doi.org/10.1145/3447548.3467391

  7. Stolbova, A., Ganeev, R., & Ivaschenko, A. (2021) Video Games and Cognitive Function: Are there any variations in adolescents cognitive abilities between games players and non-players. Intelligent Identification of Deception of Smart Social Media Users. The new Economy, the 30th conference of the Open Innovation Union FRUCT 2009 279-284.https://doi.org/10.23919/FRUCT53335.2021.9599974
  8. Taskin, S.G., Gözüaktepe, E. U., & Topal, K. (2022). Fake News in Twitter Detection of Fake Turkish Using Machine Learning Algorithms. Arabian J.Science & Engineering. Vol. 47(2): 2359 2379.https://doi.org/10.1007/s13369-021-06223-0
  9. ywioek, J. (2021). Social Media as Customer Engage Platform that Bears Impact on Companys Image Boost [Preprint].SOCIAL SCIENCES.https://doi.org/10.20944/preprints202106.0685.v1
  10. In AlRfou, AlNajjar, & Abu-Nasser (2021). Deep learning for social media fake news detection: As a systematized review. Scientific Journal of Artificial Intelligence and Soft Computing Research, vol. 11: pp. 323- 349:

    2021.https://ieeexplore.ieee.org/iel7/9314948/9315188/09315255.pdf

  11. Bhattacharjee and Mukherjee (2023) in their article is one of the best of all time. Creating AI that can discover fake social media profiles through machine learning and natural language processing.Neural Networks, 168, 229-

    242.https://www.researchgate.net/publication/353212253_Detecting_Fa ke_Social_Media_Account_Using_Deep_Neural_Networking

  12. Among the three articles, Chakraborty, Roy, & Pal (2022) talk about. A survey dedicated to the detection of social media bots rather than spam as a whole. ACM Computing Surveys (CSUR) Vol45 Issue3 1- 43.https://dl.acm.org/doi/fullHtml/10.1145/2818717
  13. The two authors, Dwivedi, Y. K., and Singh, D. K. list examples of how financial inclusion can improve the lives of people living in poverty. Exploring the impact of social media fake accounts on brand image and consumer trust: This meta-ana-lysis will be carriedout by reviewing several research articles that have been done previously in this field. Journal of Business Research, 2016, 61(6), pp. 1089- 1104.https://www.sciencedirect.com/science/article/abs/pii/S147101532 3000776
  14. Feng, Y., Liu, Y., Sun, L., Jin, W., & Liu, X. (2022) The home is the foundation for children to discover their everyday self, managing tasks themselves and finding out who they are as individuals. Deep learning for social media bot detection: I a survey. ACM Computing Surveys

    (CSUR), 55(3) [2022], 1-37,

    pp.https://link.springer.com/article/10.1007/s00521-023-08352-z

  15. Goyal A., Kumar G., Kumar P. & Bharti M. (2023). Understanding brand reputation through sentiment analysis on online social media platforms: A research study. Zeng, H., Xu, C., Jiao, Y., & Bao, X. (2014). Importance sampling for scientific data analysis. Journal of Big Data, 11(1), 1-23.https://link.springer.com/chapter/10.1007/978-3-030- 88389-8_17
  16. Nistor, V., & Zadobrischi, R. (2022) would be an enjoyable read; therefore, this book is worth reading. Brand protection in the social media era: A comprehensive system review of literature. You can google Journal of Brand Management, 29(6), 966-986, if you are interested in the topic.https://www.zerofox.com/blog/rethinking-brand-protection/
  17. Shu, K, Wang, W, Zuo, J, Liu, H, & Zhou, S (2021). Defense against social network attacks: An alternative can be.the same. ACM Computing Surveys (CSUR); 54(2), pp.1 45.https://dl.acm.org/doi/10.1145/2835375
  18. The research of Thomas, K., Shen, J., & Zhao, P. (2021). Combating social bots: A review of the current detection methodologies. ACM Computing Surveys (CSUR), volunte. 54(6): 1-

    34.https://dl.acm.org/doi/fullHtml/10.1145/2818717

  19. Venkatadri, C., & Zafarani, R. ‘Rising Inequality as a Global Challenge’. Detecting suspicious online reviews using machine learning: A sketchy graph of the common techniques and technologies that are currently being used. ACM Computing Surveys (CSUR), 86: 2, 1-

    32.https://dl.acm.org/doi/10.1145/3491209

  20. In the study, Wang Y., Zhang H., Zheng X., and Li X (2022) postulated. A survey on social bots: Tracking, lowdown, and prevention. Journal of Network and Computer Applications, 214, 103455.https://www.sciencedirect.com/science/article/pii/S0957417420 302074
  21. @ March 2022. Wang, N., Zhang, X., Wang, Y., and Liu, Y. Raising an awareness of social hubotite methods.arXiv preprint arXiv:2206.07123.http://arxiv.org/abs/2110.05661
  22. A., Kumar G., Kumar P., & Bharti M. (2024). Understanding brand reputation through sentiment analysis on online social media platforms: Reviews: I personally benefited from learning new languages and becoming confident in expressing myself. Journal of Big Data, Volume 11, Issue 1, Pages 1-23, .https://link.springer.com/chapter/10.1007/978- 3-030-88389-8_17
  23. Nistor, V., & Zadobriski, R. (2023). Brand protection in the social media era: An orderly study and review of the literature. Journal of Brand Management 29(6), 966-986 [Online] Issue, October 2021.https://www.zerofox.com/blog/rethinking-brand-protection/
  24. Shen, J., K., Zhao, & P. T (2022). Combating social bots: Evaluation of detection procedures ICSUR, the ACM Computing Surveys (CSUR), 2022, vol. 54(6), pp. 1-

    34.https://dl.acm.org/doi/fullHtml/10.1145/2818717

  25. C. Venkatardri & R. Zafarani (2023). Detecting suspicious online reviews using machine learning: A survey of the current art of the discipline or sub-discipline. ACM Computing Surveys (CSUR), vol. 56, no. 2, pp. 132.https://dl.acm.org/doi/10.1145/3491209
  26. The purpose of this study is to examine the effect of the nutrition intervention program by Wang, Y., Zhang, H., Zheng, X., and Li, X. in the year 2022. A survey on social bots: Classify, characterize, and develop counteractions. Journal of Network and Computer Applications, Vol: 212, January of 2019, pp. 103455.https://www.sciencedirect.com/science/article/pii/S0957417420 302074
  27. Chen, X., & Xu, K. (2021). Social bots and brand reputation: A survey on detection and prevention methods. Journal of Network and Computer Applications, 189, 103112. https://doi.org/10.1016/j.jnca.2021.103112
  28. Cheng, L., Luo, X., & Wu, W. (2022). Exploring brand vulnerability: A systematic review on brand protection in digital ecosystems. Journal of Business Research, 145, 567-580. https://doi.org/10.1016/j.jbusres.2022.02.045
  29. Mills, A., & Plangger, K. (2015). Social media strategy for online brand protection. Journal of Business Research, 68(9), 1933-1941.

    https://doi.org/10.1016/j.jbusres.2014.12.010

  30. Ferrara, E., Varol, O., Davis, C., Menczer, F., & Flammini, A. (2016). The rise of social bots: Threats to brand security on social media. Communications of the ACM, 59(7), 96-104. https://doi.org/10.1145/2818717
  31. Alalwan, A. A., Rana, N. P., Dwivedi, Y. K., & Algharabat, R. S. (2017). Social media in marketing: A review and analysis of the existing literature. Telematics and Informatics, 34(7), 1177-1190. https://doi.org/10.1016/j.tele.2017.05.008

    IJERTV13IS110070

    (This work is licensed under a Creative Commons Attribution 4.0 International License.)

    Published by : http://www.ijert.org

    ISSN: 2278-0181

    Vol. 13 Issue 11, November 2024

  32. Dwivedi, Y. K., Hughes, D. L., Ismagilova, E., Aarts, G., & Coombs, C. (2021). Setting the future of digital and social media marketing research: Perspectives and research propositions. International Journal of Information Management, 59, 102168. https://doi.org/10.1016/j.ijinfomgt.2020.102168
  33. Baccarella, C. V., Wagner, T. F., Kietzmann, J., & McCarthy, I. P. (2018). Social media? It’s serious! Understanding the dark side of social media. European Management Journal, 36(4), 431-438. https://doi.org/10.1016/j.emj.2018.07.002
  34. Wang et al. 2022 is a study by Wen, N., Zhang, X., Wang, Y., and Liu,

Y. A description of social media deceit detection methods.arXiv preprint arXiv:2206.07123.http://arxiv.org/abs/2110.05661

IJERTV13IS110070

(This work is licensed under a Creative Commons Attribution 4.0 International License.)