- Open Access
- Authors : Akash R Nair, Arathy Krishna, Ashik Das Th, Diya Merin Babu, Anjana Sekhar
- Paper ID : ICCIDT2K23-211
- Volume & Issue : Volume 11, Issue 02
- Published (First Online): 15-06-2023
- ISSN (Online) : 2278-0181
- Publisher Name : IJERT
- License: This work is licensed under a Creative Commons Attribution 4.0 International License
An Intelligent Hate Speech Detection System For Safetalk Using Bidirectional LSTM
Akash R Nair¹, Arathy Krishna², Ashik Das TH³, Diya Merin Babu⁴, Anjana Sekhar⁵
¹⁻⁴ B.Tech Students, Computer Science & Engineering, Mangalam College of Engineering
⁵ Assistant Professor, Computer Science & Engineering, Mangalam College of Engineering
ABSTRACT

Hate speech is clearly directed at social groups on the Internet, especially on social media, and its recognition is becoming increasingly important in order to identify and curb its transmission. Detection remains difficult, however, because hate speech overlaps with patterns of informal and indirect communication, such as sarcasm and the glorification of immoral behavior. Hate speech that attacks another person's religion, ethnicity, or sexual orientation is prohibited by law, and the vast amount of user-generated content on the Internet makes manual moderation impractical. In this study, we propose a hate speech detection method for the SafeTalk online system based on bidirectional Long Short-Term Memory (Bi-LSTM) networks.
RELATED WORK
Deep learning techniques have proven to be very effective for this task: the performance of deep learning-based classifiers has outperformed classical machine learning models such as support vector machines (SVMs), gradient-boosted decision trees (GBDTs), and logistic regression. Among the deep learning-based classifiers, convolutional neural networks (CNNs) record local patterns, while long short-term memory (LSTM) models based on recurrent networks, or Gated Recurrent Unit (GRU) models, capture long-range dependencies. At the same time, however, online hate leads to conflict and hatred, making platforms unattractive for users. Although research shows that hate is a cross-platform problem, there is little work on detection models that use cross-platform data; to fill this research gap, one study collects comments from four platforms, including YouTube, Reddit, and Wikipedia.
One line of work addresses the problem of detecting hate speech in tweets. Hate speech detection on Twitter is essential for applications such as analyzing controversial events, creating AI chatbots, content recommendations, and sentiment analysis. The task is defined as being able to classify a tweet as racist, sexist, or neither, and the complexity of natural language constructs makes this very challenging. Extensive tests with several deep learning architectures were performed to learn how to embed semantic information and handle this complexity, and experiments on a benchmark of annotated tweets show that such deep models outperform modern character/word n-gram baselines.

A related study worked with a corpus in which about 80% of comments were labeled as non-hateful and 20% as hateful. Several algorithms (Logistic Regression, Naive Bayes, Support Vector Machine, XGBoost, and Neural Networks) were then tested with different text representations (Bag-of-Words, TF-IDF, and their combinations). Although all of them outperformed the benchmark keyword approach, XGBoost using all the features performed best, and feature importance analysis indicated which features had the most impact on predictions.

[4] "Hate me, don't hate me" detects hate campaigns on social media. While promoting communication and content sharing, social networking sites are also used to launch campaigns against specific groups. Cyberbullying, inciting self-harm, and sexual harassment are among the serious effects of large-scale online hate; such attacks can be made against groups of vulnerable people and can escalate into physical violence. With the dramatic increase in the use of social networks, there has also been an increase in malicious activity aimed at exploiting these platforms, directed at individuals (online followers, products) or at specific groups. This work addresses the alarming spread of such hate campaigns: as a benchmark, the authors looked at the text comments that appeared on a set of Italian public pages, using multiple hate categories to distinguish them.
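To make the classical comparison setup above concrete, the following is a minimal sketch of a Bag-of-Words/TF-IDF baseline of the kind these studies benchmark against. It is illustrative only and not the cited authors' code; the CSV path and the "text"/"label" column names are hypothetical.

```python
# Illustrative TF-IDF baseline of the kind the surveyed studies compare
# against (not the cited authors' code). The CSV path and column names
# ("text", "label") are hypothetical.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report

df = pd.read_csv("tweets_labeled.csv")
X_train, X_test, y_train, y_test = train_test_split(
    df["text"], df["label"], test_size=0.2, random_state=42, stratify=df["label"])

vec = TfidfVectorizer(ngram_range=(1, 2), max_features=50000)
X_train_tfidf = vec.fit_transform(X_train)   # fit the vocabulary on training data only
X_test_tfidf = vec.transform(X_test)

clf = LogisticRegression(max_iter=1000, class_weight="balanced")
clf.fit(X_train_tfidf, y_train)
print(classification_report(y_test, clf.predict(X_test_tfidf)))
```

Swapping LogisticRegression for an XGBoost or SVM classifier reproduces the other baselines mentioned above without changing the feature pipeline.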
METHODOLOGY
Building a hate speech detection model using a bidirectional LSTM involves the following steps (a minimal code sketch follows this list):

- Data collection: Collect a large dataset of labeled data containing examples of hate speech and non-hate speech. There are several publicly available datasets for hate speech detection that can be used.
- Preprocessing: Preprocess the data by removing information such as stopwords, punctuation, and special characters. Then tokenize the text and convert the tokens to integers to feed into the LSTM model.
- Word embeddings: Use pre-trained word embeddings such as fastText or Word2Vec to represent each token with a dense vector. This will help the model learn better semantic relationships between words.
- Model architecture: Define the LSTM model with an embedding layer followed by one or more bidirectional LSTM layers. The output of the LSTM layers feeds into a fully connected layer with a sigmoid activation to produce a binary classification. Bidirectional LSTMs process the text in both forward and backward directions, allowing them to capture dependencies in the text.
- Output layer: The output layer of the hate speech detection model is a binary classifier that outputs the probability of the text being hateful or not.
- Training: Train the LSTM model on the preprocessed data using the backpropagation algorithm. Adjust hyperparameters such as the batch size and the number of epochs to optimize performance (a training sketch appears after the system architecture figure below).
- Evaluation: Evaluate the performance of the trained model on a test dataset. Common evaluation metrics include accuracy, precision, recall, and F1 score.
- Tuning: Fine-tune the hyperparameters of the model to optimize performance. This can include the number of LSTM layers, the number of units per layer, and the learning rate.
- Deployment: Deploy the trained model to classify new text as hateful or not.

Overall, using a bidirectional LSTM for hate speech detection involves a combination of natural language processing techniques and machine learning algorithms. It is important to use a large and diverse dataset to train the model and to carefully evaluate its performance to ensure reliable results.
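As a concrete reading of the preprocessing, embedding, and architecture steps above, here is a minimal sketch in Keras. The vocabulary size, sequence length, LSTM width, and the variable `train_texts` are illustrative assumptions; the paper does not report its exact settings.

```python
# Minimal sketch of the preprocessing + model-architecture steps above.
# VOCAB_SIZE, MAX_LEN, the LSTM width, and `train_texts` are illustrative
# assumptions; the paper does not report its exact settings.
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB_SIZE = 20000   # assumed vocabulary size
MAX_LEN = 100        # assumed maximum sequence length
EMBED_DIM = 100      # a common fastText/Word2Vec dimensionality

# Tokenize the cleaned text and convert tokens to padded integer sequences.
tokenizer = tf.keras.preprocessing.text.Tokenizer(num_words=VOCAB_SIZE)
tokenizer.fit_on_texts(train_texts)                    # train_texts: list of cleaned strings
sequences = tokenizer.texts_to_sequences(train_texts)
X_train = tf.keras.preprocessing.sequence.pad_sequences(sequences, maxlen=MAX_LEN)

# Embedding -> bidirectional LSTM -> dense sigmoid, as described above.
# (The Embedding layer can also be initialized from pre-trained fastText or
# Word2Vec vectors instead of being learned from scratch.)
model = models.Sequential([
    layers.Embedding(VOCAB_SIZE, EMBED_DIM),
    layers.Bidirectional(layers.LSTM(64)),   # reads the sequence in both directions
    layers.Dense(1, activation="sigmoid"),   # probability that the text is hateful
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
```

The single `Dense(1, sigmoid)` head matches the binary output layer described in the list; stacking a second `Bidirectional(LSTM(...))` layer is the natural extension when tuning depth.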
C. System Architecture
Fig. 1. System architecture: DATA COLLECTION → DATA PRE-PROCESSING → MODEL BUILDING → EVALUATING MODEL.
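A hedged sketch of the training step from the list above (the MODEL BUILDING stage in Fig. 1); the batch size, epoch count, and the 0/1 label vector `y_train` are placeholders rather than values reported by the paper.

```python
# Training sketch for the "Training" step in the list above; batch size,
# epoch count, and the 0/1 label vector `y_train` are placeholders.
history = model.fit(
    X_train, y_train,
    validation_split=0.1,   # hold out 10% of training data to watch for overfitting
    batch_size=32,
    epochs=5,
)
```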
Once the relevant features are selected, the model is trained on the data using a suitable machine learning algorithm. There are many algorithms to choose from, such as linear regression, decision trees, and neural networks. The model learns from the input data and adjusts its parameters to minimize the error or loss. The trained model is then evaluated on a test set to assess its performance and accuracy; the evaluation metrics depend on the type of problem. For a classification problem, for example, they include accuracy, precision, recall, and F1 score. Once the model is trained and evaluated, it can be deployed in production to make predictions on new data. The deployment process involves integrating the model into the existing system, testing it, and monitoring its performance over time. The complete ML pipeline typically consists of a data preprocessing layer, a feature extraction layer, a model training layer, and a model deployment layer; these layers can be implemented using technologies like Python, scikit-learn, TensorFlow, PyTorch, etc.
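To make the evaluation stage concrete, the sketch below computes the metrics named above with scikit-learn; `test_texts` and `y_test` are assumed to be held-out data preprocessed with the same tokenizer that was fitted on the training set.

```python
# Evaluation sketch: the metrics named above, computed with scikit-learn.
# `test_texts` / `y_test` are assumed held-out data, preprocessed with the
# same tokenizer that was fitted on the training set.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

X_test = tf.keras.preprocessing.sequence.pad_sequences(
    tokenizer.texts_to_sequences(test_texts), maxlen=MAX_LEN)

probs = model.predict(X_test).ravel()
preds = (probs >= 0.5).astype(int)        # threshold the sigmoid output at 0.5

precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, preds, average="binary")
print(f"accuracy={accuracy_score(y_test, preds):.3f} "
      f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```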
FUTURE SCOPE

As natural language processing (NLP) techniques continue to evolve, BiLSTM (Bidirectional Long Short-Term Memory) remains a deep learning architecture that has proven effective in text classification tasks such as sentiment analysis and hate speech detection. Promising directions for research and development include:

- Multilingual hate speech detection: BiLSTM models can be extended to detect hate speech in multiple languages, which is especially important due to the global nature of social media and the internet. As a result, researchers are developing models capable of detecting hate speech in multiple languages, which will have important implications for content monitoring and moderation (a tokenization sketch follows this list).
- Contextualization: Hate speech detection models need contextualization to better understand the meaning of language.
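As one possible starting point for the multilingual direction, and purely as a sketch, the word-level tokenizer in the methodology could be swapped for a language-agnostic subword tokenizer such as SentencePiece trained on a mixed-language corpus; the corpus file, model prefix, and vocabulary size below are assumptions, not part of the paper.

```python
# Speculative sketch for the multilingual direction: replace the word-level
# tokenizer with a language-agnostic subword tokenizer. The corpus file,
# model prefix, and vocabulary size are assumptions.
import sentencepiece as spm

# Train a shared subword vocabulary on a mixed-language corpus
# ("corpus_multilang.txt": hypothetical file, one sentence per line).
spm.SentencePieceTrainer.train(
    input="corpus_multilang.txt", model_prefix="hate_ml", vocab_size=16000)

sp = spm.SentencePieceProcessor(model_file="hate_ml.model")
ids = sp.encode("Ceci est un exemple de texte", out_type=int)  # works for any language
# The resulting integer IDs can feed the same Embedding + BiLSTM model
# described in the methodology, with VOCAB_SIZE set to 16000.
```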
CONCLUSION

In daily life, as the use of social media increases, people seem to think that they can say or write anything online. As a reflection of this, hate speech has increased, creating a need to automate the process of classifying such data. Interpretable natural language processing and deep learning methods have been adopted in recent years to support this automation.
REFERENCES

E. Choi, M. T. Bahadori, A. Schuetz, W. F. Stewart, and J. Sun, "Doctor AI: Predicting clinical events via recurrent neural networks," in Proc. 1st Mach. Learn. Healthcare Conf., 2016, pp. 301-318.

…, E. Aldana, and K. Stein, "Artificial intelligence in the health care space: How we can trust what we cannot know," Stan. L. Pol'y Rev., vol. 30, p. 399, Jul. 2019.

D. Gunning, "Explainable artificial intelligence (XAI)," Defense Advanced Research Projects Agency (DARPA), p. 2, 2017.

A. Holzinger, G. Langs, H. Denk, K. Zatloukal, and H. Müller, "Causability and explainability of artificial intelligence in medicine," WIREs Data Mining Knowl. Discovery, vol. 9, no. 4, p. e1312, Jul. 2019.

A. Vellido, "The importance of interpretability and visualization in machine learning for applications in medicine and health care," Neural Comput. Appl., vol. 32, Feb. 2019.

… et al., "Global, regional, and national incidence, prevalence, and years lived with disability for 354 diseases and injuries for 195 countries and territories, 1990-2017: A systematic analysis for the Global Burden of Disease Study 2017," Lancet, vol. 392, no. 10159, pp. 1789-1858, 2018, doi: 10.1016/S0140-6736(18)32279-7.