- Open Access
- Authors : Akash R Nair, Arathy Krishna, Ashik Das Th, Diya Merin Babu, Anjana Sekhar
- Paper ID : ICCIDT2K23-211
- Volume & Issue : Volume 11, Issue 02
- Published (First Online): 15-06-2023
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License: This work is licensed under a Creative Commons Attribution 4.0 International License
An Intelligent Hate Speech Detection System For Safetalk Using Bidirectional LSTM
Akash R Nair¹, Arathy Krishna², Ashik Das TH³, Diya Merin Babu⁴, Anjana Sekhar⁵
¹⁻⁴ B.Tech Students, Computer Science & Engineering, Mangalam College of Engineering
⁵ Assistant Professor, Computer Science & Engineering, Mangalam College of Engineering
ABSTRACT

Hate speech is clearly directed at social groups on the Internet, especially on social media, and its recognition is becoming increasingly important in order to identify and curb its transmission. Detection remains difficult, however, because hate speech overlaps with patterns of informal and indirect communication, such as sarcasm and the glorification of immoral behavior. Hate speech that attacks another person's religion, ethnicity, or sexual orientation is prohibited by law, and the vast amount of user-generated content on the Internet makes manual moderation impractical. In this study, we propose a hate speech detection method for the SafeTalk online system based on bidirectional Long Short-Term Memory (Bi-LSTM) networks.
RELATED WORK
Deep learning techniques have proven to be very effective for this task: the performance of deep learning-based classifiers has outperformed classical machine learning models such as support vector machines (SVMs), gradient-boosted decision trees (GBDTs), and logistic regression. Among the deep learning-based classifiers, convolutional neural networks (CNNs) record local patterns, while long short-term memory (LSTM) models based on recurrent networks, or Gated Recurrent Unit (GRU) models, capture long-range dependencies. At the same time, however, online hate leads to conflict and hatred, making platforms unattractive for users. Although research shows that hate is a cross-platform problem, there is little work on detection models that use cross-platform data; to fill this research gap, one study collects comments from four platforms, including YouTube, Reddit, and Wikipedia.
One line of work addresses the problem of detecting hate speech in tweets. Hate speech detection on Twitter is essential for applications such as analyzing controversial events, creating AI chatbots, content recommendations, and sentiment analysis. The task is defined as being able to classify a tweet as racist, sexist, or neither, and the complexity of natural language constructs makes this very challenging. Extensive tests with several deep learning architectures were performed to learn how to embed semantic information and handle this complexity, and experiments on a benchmark of annotated tweets show that such deep models outperform modern character/word n-gram baselines.

A related study worked with a corpus in which about 80% of comments were labeled as non-hateful and 20% as hateful. Several algorithms (Logistic Regression, Naive Bayes, Support Vector Machine, XGBoost, and Neural Networks) were then tested with different text representations (Bag-of-Words, TF-IDF, and their combinations). Although all of them outperformed the benchmark keyword approach, XGBoost using all the features performed best, and feature importance analysis indicated which features had the most impact on predictions.

[4] "Hate me, don't hate me" detects hate campaigns on social media. While promoting communication and content sharing, social networking sites are also used to launch campaigns against specific groups. Cyberbullying, inciting self-harm, and sexual harassment are among the serious effects of large-scale online hate; such attacks can be made against groups of vulnerable people and can escalate into physical violence. With the dramatic increase in the use of social networks, there has also been an increase in malicious activity aimed at exploiting these platforms, directed at individuals (online followers, products) or at specific groups. This work addresses the alarming spread of such hate campaigns: as a benchmark, the authors looked at the text comments that appeared on a set of Italian public pages, using multiple hate categories to distinguish them.
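To make the classical comparison setup above concrete, the following is a minimal sketch of a Bag-of-Words/TF-IDF baseline of the kind these studies benchmark against. It is illustrative only and not the cited authors' code; the CSV path and the "text"/"label" column names are hypothetical.

```python
# Illustrative TF-IDF baseline of the kind the surveyed studies compare
# against (not the cited authors' code). The CSV path and column names
# ("text", "label") are hypothetical.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

df = pd.read_csv("tweets_labeled.csv")
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42, stratify=df["label"])

vec = TfidfVectorizer(ngram_range=(1, 2), max_features=50000)
X_train_tfidf = vec.fit_transform(X_train)   # fit the vocabulary on training data only
X_test_tfidf = vec.transform(X_test)

clf = LogisticRegression(max_iter=1000, class_weight="balanced")
clf.fit(X_train_tfidf, y_train)
print(classification_report(y_test, clf.predict(X_test_tfidf)))
```

Swapping LogisticRegression for an XGBoost or SVM classifier reproduces the other baselines mentioned above without changing the feature pipeline.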
METHODOLOGY
Building a hate speech detection model using a bidirectional LSTM involves the following steps (a minimal code sketch follows this list):

- Data collection: Collect a large dataset of labeled data containing examples of hate speech and non-hate speech. There are several publicly available datasets for hate speech detection that can be used.
- Preprocessing: Preprocess the data by removing information such as stopwords, punctuation, and special characters. Then tokenize the text and convert the tokens to integers to feed into the LSTM model.
- Word embeddings: Use pre-trained word embeddings such as fastText or Word2Vec to represent each token with a dense vector. This will help the model learn better semantic relationships between words.
- Model architecture: Define the LSTM model with an embedding layer followed by one or more bidirectional LSTM layers. The output of the LSTM layers feeds into a fully connected layer with a sigmoid activation to produce a binary classification. Bidirectional LSTMs process the text in both forward and backward directions, allowing them to capture dependencies in the text.
- Output layer: The output layer of the hate speech detection model is a binary classifier that outputs the probability of the text being hateful or not.
- Training: Train the LSTM model on the preprocessed data using the backpropagation algorithm. Adjust hyperparameters such as the batch size and the number of epochs to optimize performance (a training sketch appears after the system architecture figure below).
- Evaluation: Evaluate the performance of the trained model on a test dataset. Common evaluation metrics include accuracy, precision, recall, and F1 score.
- Tuning: Fine-tune the hyperparameters of the model to optimize performance. This can include the number of LSTM layers, the number of units per layer, and the learning rate.
- Deployment: Deploy the trained model to classify new text as hateful or not.

Overall, using a bidirectional LSTM for hate speech detection involves a combination of natural language processing techniques and machine learning algorithms. It is important to use a large and diverse dataset to train the model and to carefully evaluate its performance to ensure reliable results.
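As a concrete reading of the preprocessing, embedding, and architecture steps above, here is a minimal sketch in Keras. The vocabulary size, sequence length, LSTM width, and the variable `train_texts` are illustrative assumptions; the paper does not report its exact settings.

```python
# Minimal sketch of the preprocessing + model-architecture steps above.
# VOCAB_SIZE, MAX_LEN, the LSTM width, and `train_texts` are illustrative
# assumptions; the paper does not report its exact settings.
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB_SIZE = 20000   # assumed vocabulary size
MAX_LEN = 100        # assumed maximum sequence length
EMBED_DIM = 100      # a common fastText/Word2Vec dimensionality

# Tokenize the cleaned text and convert tokens to padded integer sequences.
tokenizer = tf.keras.preprocessing.text.Tokenizer(num_words=VOCAB_SIZE)
tokenizer.fit_on_texts(train_texts)                    # train_texts: list of cleaned strings
sequences = tokenizer.texts_to_sequences(train_texts)
X_train = tf.keras.preprocessing.sequence.pad_sequences(sequences, maxlen=MAX_LEN)

# Embedding -> bidirectional LSTM -> dense sigmoid, as described above.
# (The Embedding layer can also be initialized from pre-trained fastText or
# Word2Vec vectors instead of being learned from scratch.)
model = models.Sequential([
    layers.Embedding(VOCAB_SIZE, EMBED_DIM),
    layers.Bidirectional(layers.LSTM(64)),   # reads the sequence in both directions
    layers.Dense(1, activation="sigmoid"),   # probability that the text is hateful
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```

The single `Dense(1, sigmoid)` head matches the binary output layer described in the list; stacking a second `Bidirectional(LSTM(...))` layer is the natural extension when tuning depth.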
C. System Architecture
Fig. 1. System architecture: DATA COLLECTION → DATA PRE-PROCESSING → MODEL BUILDING → EVALUATING MODEL.
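A hedged sketch of the training step from the list above (the MODEL BUILDING stage in Fig. 1); the batch size, epoch count, and the 0/1 label vector `y_train` are placeholders rather than values reported by the paper.

```python
# Training sketch for the "Training" step in the list above; batch size,
# epoch count, and the 0/1 label vector `y_train` are placeholders.
history = model.fit(
    X_train, y_train,
    validation_split=0.1,   # hold out 10% of training data to watch for overfitting
    batch_size=32,
    epochs=5,
)
```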
Once the relevant features are selected, the model is trained on the data using a suitable machine learning algorithm. There are many algorithms to choose from, such as linear regression, decision trees, and neural networks. The model learns from the input data and adjusts its parameters to minimize the error or loss. The trained model is then evaluated on a test set to assess its performance and accuracy; the evaluation metrics depend on the type of problem. For a classification problem, for example, they include accuracy, precision, recall, and F1 score. Once the model is trained and evaluated, it can be deployed in production to make predictions on new data. The deployment process involves integrating the model into the existing system, testing it, and monitoring its performance over time. The complete ML pipeline typically consists of a data preprocessing layer, a feature extraction layer, a model training layer, and a model deployment layer; these layers can be implemented using technologies like Python, scikit-learn, TensorFlow, PyTorch, etc.
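To make the evaluation stage concrete, the sketch below computes the metrics named above with scikit-learn; `test_texts` and `y_test` are assumed to be held-out data preprocessed with the same tokenizer that was fitted on the training set.

```python
# Evaluation sketch: the metrics named above, computed with scikit-learn.
# `test_texts` / `y_test` are assumed held-out data, preprocessed with the
# same tokenizer that was fitted on the training set.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

X_test = tf.keras.preprocessing.sequence.pad_sequences(
    tokenizer.texts_to_sequences(test_texts), maxlen=MAX_LEN)

probs = model.predict(X_test).ravel()
preds = (probs >= 0.5).astype(int)        # threshold the sigmoid output at 0.5

precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, preds, average="binary")
print(f"accuracy={accuracy_score(y_test, preds):.3f} "
      f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```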
FUTURE SCOPE

As natural language processing (NLP) techniques continue to evolve, BiLSTM (Bidirectional Long Short-Term Memory) remains a deep learning architecture that has proven effective in text classification tasks such as sentiment analysis and hate speech detection. Promising directions for research and development include:

- Multilingual hate speech detection: BiLSTM models can be extended to detect hate speech in multiple languages, which is especially important due to the global nature of social media and the internet. As a result, researchers are developing models capable of detecting hate speech in multiple languages, which will have important implications for content monitoring and moderation (a tokenization sketch follows this list).
- Contextualization: Hate speech detection models need contextualization to better understand the meaning of language.
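As one possible starting point for the multilingual direction, and purely as a sketch, the word-level tokenizer in the methodology could be swapped for a language-agnostic subword tokenizer such as SentencePiece trained on a mixed-language corpus; the corpus file, model prefix, and vocabulary size below are assumptions, not part of the paper.

```python
# Speculative sketch for the multilingual direction: replace the word-level
# tokenizer with a language-agnostic subword tokenizer. The corpus file,
# model prefix, and vocabulary size are assumptions.
import sentencepiece as spm

# Train a shared subword vocabulary on a mixed-language corpus
# ("corpus_multilang.txt": hypothetical file, one sentence per line).
spm.SentencePieceTrainer.train(
    input="corpus_multilang.txt", model_prefix="hate_ml", vocab_size=16000)

sp = spm.SentencePieceProcessor(model_file="hate_ml.model")
ids = sp.encode("Ceci est un exemple de texte", out_type=int)  # works for any language
# The resulting integer IDs can feed the same Embedding + BiLSTM model
# described in the methodology, with VOCAB_SIZE set to 16000.
```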
CONCLUSION

In daily life, as the use of social media increases, people seem to think that they can say or write anything online. As a reflection of this, hate speech has increased, creating a need to automate the process of classifying such data. Interpretable natural language processing and deep learning methods have been adopted in recent years to support this automation.
REFERENCES

E. Choi, M. T. Bahadori, A. Schuetz, W. F. Stewart, and J. Sun, "Doctor AI: Predicting clinical events via recurrent neural networks," in Proc. 1st Mach. Learn. Healthcare Conf., 2016, pp. 301-318.

…, E. Aldana, and K. Stein, "Artificial intelligence in the health care space: How we can trust what we cannot know," Stan. L. Pol'y Rev., vol. 30, p. 399, Jul. 2019.

D. Gunning, "Explainable artificial intelligence (XAI)," Defense Advanced Research Projects Agency (DARPA), p. 2, 2017.

A. Holzinger, G. Langs, H. Denk, K. Zatloukal, and H. Müller, "Causability and explainability of artificial intelligence in medicine," WIREs Data Mining Knowl. Discovery, vol. 9, no. 4, p. e1312, Jul. 2019.

A. Vellido, "The importance of interpretability and visualization in machine learning for applications in medicine and health care," Neural Comput. Appl., vol. 32, Feb. 2019.

… et al., "Global, regional, and national incidence, prevalence, and years lived with disability for 354 diseases and injuries for 195 countries and territories, 1990-2017: A systematic analysis for the Global Burden of Disease Study 2017," Lancet, vol. 392, no. 10159, pp. 1789-1858, 2018, doi: 10.1016/S0140-6736(18)32279-7.