Personality Trait Classification Using CNN-LSTM Model

Joffin George; Koshy M Varkey; Vidul Venogopalan; Rahul G; Ms.Neema George

doi:10.17577/ICCIDT2K23-222

ICCIDT- 2023 (Volume 11 – Issue 01)


Open Access
Paper ID : ICCIDT2K23-222
Volume & Issue : Volume 11, Issue 01 (June 2023)
Published (First Online): 11-06-2023
ISSN (Online) : 2278-0181
Publisher Name : IJERT
License: This work is licensed under a Creative Commons Attribution 4.0 International License


Joffin George, Koshy M Varkey, Vidul Venogopalan, Rahul G, Ms. Neema George

Dept. of CSE, Mangalam College of Engineering, Ettumanoor, Kerala, India

Email: joffingeorge10@gmail.com, koshymvarkey2000@gmail.com, vidulvenugopal2001@gmail.com, rahulgpalakuzhiyil@gmail.com, neema.george@mangalam.in

Keywords: Personality Trait, CNN, LSTM, Keras Tokenizer, Deep Learning

INTRODUCTION

Cognitive-based analysis studies the emotions that people express in their tweets. Personality plays an important role in characterising an individual, and it can be judged from several kinds of input such as text, audio, and video. A CNN extracts the basic features of a sentence but does not retain information about earlier inputs. We introduce a technique that combines CNN and LSTM to improve on the existing system, which classifies user behaviour using various deep learning models. We also add audio input, which is converted to text and processed by the same CNN+LSTM model.
1. RESEARCH STUDY MOTIVATION
  
  Researchers have conducted various studies on personality detection. Personality trait detection is a classification problem: the user supplies a tweet as text and receives the personality categories that correspond to it. The aim of this paper is to build a strong model for personality trait detection. We categorise the traits as pairs (I-E, N-S, T-F, and J-P) and classify each tweet into these categories. Users can also supply their tweets as audio, and the system detects the individual's personality from that audio.
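The four trait pairs above can be treated as four separate binary targets. The following is a minimal sketch of that encoding, assuming (this is not stated in the paper) that each training example carries a four-letter type code such as "INTJ". The names `DIMENSIONS`, `split_type_code`, and `to_binary_targets` are hypothetical.

```python
# Hypothetical helpers: the paper does not describe its label format.
# Assumption: labels are four-letter codes, one letter per trait pair.
DIMENSIONS = [("I", "E"), ("N", "S"), ("T", "F"), ("J", "P")]


def split_type_code(code):
    """Return a dict mapping each pair name (e.g. 'I-E') to the letter in `code`."""
    code = code.strip().upper()
    if len(code) != 4:
        raise ValueError(f"expected a four-letter code, got {code!r}")
    labels = {}
    for letter, (first, second) in zip(code, DIMENSIONS):
        if letter not in (first, second):
            raise ValueError(f"{letter!r} is not valid for pair {first}-{second}")
        labels[f"{first}-{second}"] = letter
    return labels


def to_binary_targets(code):
    """Encode each pair as 0 (first letter of the pair) or 1 (second letter)."""
    labels = split_type_code(code)
    return [0 if labels[f"{a}-{b}"] == a else 1 for a, b in DIMENSIONS]
```

With this encoding, one binary classifier can be trained per pair, which matches the per-pair experiments reported later in the paper.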
2. PROBLEM BACKGROUND
  
  Cognitive-based sentiment analysis (SA) applications have gained popularity in recent years among online communities as a way to learn about people's attitudes and personality traits towards various topics, laws, and other things. However, because social media information is so diverse, analysing text with the present techniques to find personality traits takes a lot of time. It has therefore become essential to classify personality traits automatically for use in social media content extraction and analysis. There has been a lot of research in the fields of text-based SA, lexicon generation, cognition, aspect-based SA, and visual SA. However, further study of cognitive-based social media analysis is needed, with an emphasis on extracting and classifying personality features. The current system cannot handle both audio and text input; our suggested method handles both.
3. RESEARCH PROBLEM
  
  The present systems for personality trait classification use a limited number of models. These techniques rely on older models, which need to be improved to raise the accuracy of the system. We treat personality trait detection as a classification problem. Furthermore, we include speech-to-text: the user gives a tweet as audio, and our trained model identifies the personality trait from that audio. Through this we show that our model handles both text and audio input.
4. CONTRIBUTION
1. We explore a combined model in which a CNN extracts features from text and an LSTM retains them across the sequence.
2. We used SVM and conducted tests with other classifiers such as logistic regression, decision tree, and k-nearest neighbour.
3. Our proposed system showed very good performance and results against the existing system models.
4. The proposed system can help companies analyse the personality of their employees.
5. Furthermore, we accept audio input through speech-to-text as well as text input, which is an added benefit for users.
REVIEW OF LITERATURE

This section presents a study of personality trait classification. It discusses the practices and approaches that have addressed the problem of personality prediction and the methods used for this purpose. Specifically, the literature focuses on machine learning approaches to personality recognition, and the following studies are reviewed in detail:

A. Sentiment Analysis of Arabic Tweets from Twitter

Arabic sentiment analysis poses challenges because of the informal and noisy nature of the language and its rich morphology, and existing approaches still need improvement in accuracy and efficiency. To address this, the study uses a corpus-based approach for sentiment analysis of Arabic tweets from Twitter. It applies a Discriminative Multinomial Naïve Bayes (DMNB) method together with an N-gram tokenizer, stemming, and term frequency-inverse document frequency (TF-IDF) techniques. The approach is evaluated with a set of performance metrics on a public Twitter dataset. The experimental results show the effectiveness of the approach, which outperformed related works and improved accuracy by 0.3%.
1. Recognizing Personality from reading text speech
  
  This study examines the relationship between an individual's read speech and personality traits using the Five-Factor Model of Personality. It involves 140 subjects whose read speech was processed with the openSMILE toolkit, using the ComParE 2013 audio feature set for feature extraction. A kernel SVM classifier was used for classification, along with five filter feature selection approaches and Principal Component Analysis. SVM models were trained individually for each trait using repeated cross-validation and five different feature sets. The best Unweighted Average Recall (UAR) achieved ranges from 74% to 80%, depending on the trait being analysed. These results suggest that automatic identification of speaker personality from read speech is a promising area for further research.
2. System for Personality and Happiness Detection
  
  This study proposes a platform for assessing personality and happiness based on Eysenck's theory. The platform collects text messages from social media, specifically WhatsApp, and applies machine learning algorithms to classify them into distinct personality categories. Although the relationship between personality features and happiness is not yet clear, correlations may emerge in future work. The platform is described in detail, and various sources of messages are used as a proof of concept. Researchers have traditionally used both direct methods (e.g., the EPQ-R questionnaire) and indirect methods to gain insight into human personality, with written text being one of the latter. Because personality is thought to be consistent across situations and time, trained psychologists can infer a person's personality profile by observing their behaviour. Based on existing research, it is reasonable to assume that individuals exhibit distinctive patterns of written expression that correspond to their personalities.
3. Personality Detection of players in an educational game
  
  This study discusses the use of Educational Data Mining (EDM) to model student behaviour and personality in Intelligent Tutoring Systems (ITS). Specifically, the authors introduce an approach that uses data mining techniques and NLP to detect student personality and behaviour automatically in an educational game. The framework relies on the classification of input excerpts into six different personality classes, using algorithms such as Naive Bayes (NB), Support Vector Machine (SVM), and Decision Tree (DT). Traditional techniques for detecting psychopathy, such as the Hare Psychopathy Checklist and the Psychopathy Checklist-Revised (PCL-R), rely on manual assessment. However, textual content from social media can also be used to detect the personality traits of individuals. Earlier studies have applied supervised machine learning approaches, such as SVM, NB, and DT, to identify personality traits in students during educational games. The results showed that using n-grams as features gave the best performance compared with other feature sets.
4. Predicting the Big Five Personality Traits Using Facial Images of Students
  
  This study shows how to predict college students' personality characteristics from static facial images. It focuses on the relationship between self-reported personality characteristics and facial features. To achieve this, the authors constructed a dataset of 13,347 data pairs composed of facial images and personality characteristics. They trained a deep neural network on 10,667 sample pairs from the dataset and used the remaining samples to test (1335 pairs) and validate (1335 pairs) the prediction of self-reported Big Five personalities. The results show that personality traits can be reliably predicted from facial images with an accuracy that exceeds 70%. In the five-character tag classification, recognition of neuroticism and extroversion was the most accurate, with prediction accuracy exceeding 90%.
PROPOSED SYSTEM

A. MOTIVATION

Various deep learning models, including CNN, GRU, RNN, and LSTM, have been used for personality classification. However, none of these models alone captures semantic information effectively. Combining deep learning models, as in CNN+LSTM, lets us take advantage of both the CNN and the LSTM to capture context information more effectively. Moreover, the LSTM helps comprehend context more efficiently by carrying information forward in one direction. Our study aims to classify the personality traits 'I (Introversion)-E (Extroversion)', 'N (Intuition)-S (Sensing)', 'T (Thinking)-F (Feeling)', and 'J (Judging)-P (Perceiving)' from textual data. To achieve this goal, we propose a deep neural network that combines a Convolutional Neural Network with Long Short-Term Memory (CNN+LSTM), which shows great potential.

Referring to Figure 1, the suggested approach for categorising personality traits from social media texts comprises four modules: input tweet text, data pre-processing, feature extraction, and a deep neural network. The first module acquires the required data, and pre-processing then prepares the social media text for analysis. Next, feature extraction converts the raw data into a numerical representation: using word embeddings, each word is encoded as a real-valued vector that the network can process. In the final module, these vectors are fed into the hidden layers of the deep neural network, which use the CNN and LSTM models. The LSTM model learns long-term information to classify user text across the personality traits "I-E," "N-S," "T-F," and "J-P."
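The conversion from raw text to a fixed-length numerical sequence can be illustrated without any library. The sketch below is an assumption-laden stand-in for the Keras Tokenizer named in the keywords, not the authors' code: `tokenize`, `build_vocab`, and `encode` are hypothetical names, the tokenisation rule is a guess, and id 0 is reserved for padding.

```python
import re


def tokenize(text):
    """Lowercase the text and keep runs of letters (apostrophes allowed)."""
    return re.findall(r"[a-z']+", text.lower())


def build_vocab(texts):
    """Assign ids from 1 upward, most frequent words first; 0 is kept for padding."""
    counts = {}
    for text in texts:
        for token in tokenize(text):
            counts[token] = counts.get(token, 0) + 1
    # Ties in frequency are broken alphabetically so the result is deterministic.
    ordered = sorted(counts, key=lambda word: (-counts[word], word))
    return {word: idx for idx, word in enumerate(ordered, start=1)}


def encode(text, vocab, max_len):
    """Map known tokens to ids, keep the last max_len ids, and left-pad with zeros."""
    ids = [vocab[token] for token in tokenize(text) if token in vocab]
    ids = ids[-max_len:]
    return [0] * (max_len - len(ids)) + ids


texts = [
    "I am finding the lack of me in these posts very alarming",
    "the posts are fine",
]
vocab = build_vocab(texts)
padded = encode(texts[1], vocab, max_len=6)
```

Every encoded sequence has the same length, which is what allows a batch of tweets to be passed to an embedding layer as one integer matrix.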

The CNN model extracts the significant features from the input data. The four modules are described in the following sections.
In the last step, a deep neural network categorises personality traits from the input text. The CNN+LSTM model consists of three layers: the input layer, the hidden layer, and the output layer. These layers are explained below:
1. INPUT LAYER
  
  The deep neural network's input layer takes in the incoming data. The embedding layer of the Keras library converts words into real-valued vectors, and this numerical representation captures semantic information.
2. HIDDEN LAYER
  
  Multiple CNN and LSTM layers make up the deep neural network's hidden layer. The following are the layers and components of the CNN model:
  1. Convolutional Layer
    
    This layer extracts features from the incoming data using a linear operation known as convolution. A filter is passed over the input data, and the resulting feature map is passed through a nonlinear activation function such as ReLU, which sets negative values to zero.
  2. Pooling Layer
    
    Max pooling, a downsampling operation, is applied to the output of the previous layer to reduce the size of the feature map produced by convolution.
  3. LSTM Layer
    
    The LSTM layer is added to learn long-term information. It takes the output of the CNN model as input and retains both recent and older information over long spans of the sequence. The output of the LSTM layer is then passed to the output layer.
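The three hidden-layer components described above can be traced on a toy input. The following is a simplified sketch, not the model used in the paper: it uses a single convolution filter and a single LSTM unit with scalar state, whereas the paper reports 64 filters and 50 units. All function names and the weight layout are hypothetical.

```python
import math


def conv1d_relu(seq, kernel, bias=0.0):
    """Slide `kernel` over `seq` with zero padding (output length equals input length), then apply ReLU."""
    k = len(kernel)
    left = k // 2
    padded = [0.0] * left + list(seq) + [0.0] * (k - 1 - left)
    out = []
    for i in range(len(seq)):
        total = bias + sum(w * x for w, x in zip(kernel, padded[i:i + k]))
        out.append(max(0.0, total))  # ReLU sets negative responses to zero
    return out


def max_pool1d(seq, size=2):
    """Keep the maximum of each non-overlapping window of `size` values."""
    return [max(seq[i:i + size]) for i in range(0, len(seq) - size + 1, size)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h, c, weights):
    """One LSTM time step with scalar input and state.

    `weights` maps each gate name to a tuple (w_x, w_h, bias).
    """
    def pre_activation(name):
        w_x, w_h, b = weights[name]
        return w_x * x + w_h * h + b

    forget = sigmoid(pre_activation("forget"))
    write = sigmoid(pre_activation("input"))
    read = sigmoid(pre_activation("output"))
    candidate = math.tanh(pre_activation("candidate"))
    c_new = forget * c + write * candidate  # cell state carries long-term memory
    h_new = read * math.tanh(c_new)         # hidden state is the step's output
    return h_new, c_new


def run_hidden_layers(seq, kernel, lstm_weights):
    """Convolution, ReLU, max pooling, then an LSTM over the pooled features."""
    features = max_pool1d(conv1d_relu(seq, kernel))
    h, c = 0.0, 0.0
    for x in features:
        h, c = lstm_step(x, h, c, lstm_weights)
    return h  # final hidden state, handed to the output layer
```

The cell state `c` is what lets the LSTM keep information from early positions: the forget gate decides how much of it survives each step, while the convolution and pooling stages only look at a local window.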
3. OUTPUT LAYER
After feature extraction, downsampling, and long-term memory at the convolutional, pooling, and LSTM layers, respectively, the output layer of the deep neural network categorises the learned features. In this layer, the input text, such as "I am finding the lack of me in these posts very alarming," is classified using the "softmax" function into four personality characteristic classes: "I-E," "N-S," "T-F," and "J-P." The class to which the softmax function assigns the highest probability is given to the input text as its target label.
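The label assignment described above amounts to a softmax followed by an argmax. Here is a minimal sketch, assuming the output layer produces one raw score per class; the two-class example uses the two poles of one trait pair, which is an assumption about how the classes are arranged, and `predict_label` is a hypothetical name.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities; subtracting the max avoids overflow."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_label(scores, labels):
    """Return the label with the highest softmax probability and that probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


# Illustrative scores only; they do not come from the paper.
label, confidence = predict_label([2.0, 0.5], ["Introversion", "Extroversion"])
```

Because softmax preserves the ordering of the scores, the predicted label is the same as taking the largest raw score; the probabilities matter mainly for training and for reporting confidence.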

EXPERIMENTS AND RESULTS

We tried various CNN+LSTM configurations with different parameter values for categorising input text across the personality traits. We used a single layer with a variety of parameter settings to find the best output; in particular, we tried various settings for the LSTM layer's "units" parameter. Optimising the CNN+LSTM parameters in this way increases the classifier's effectiveness. The layers used in our model, with their parameters and values, are listed in Table 5.

Proposed model layers	Parameters and their values
CNN layer	kernel size = 3×3, filters = 64, padding = same, layers of pooling = 2
LSTM layer	units = 50
Dropout layer	rate = 0.2
Dense layer	classes = 2, activation = relu
Further parameters	length of input size = 823, epochs = 5, batch_size = 32, output_dimension = 500
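To see how these settings fit together, the output shape of each layer can be traced for one input sequence. This is a sketch under several assumptions that the table does not confirm: the convolution runs along the token axis, "output_dimension" is read as an embedding size of 500, "layers of pooling = 2" is read as a pooling window of 2, and the LSTM returns only its final state. `trace_shapes` is a hypothetical helper.

```python
def trace_shapes(input_length=823, embed_dim=500, filters=64,
                 pool_size=2, lstm_units=50, classes=2):
    """Return (layer name, output shape) pairs for a single input sequence."""
    shapes = [("embedding", (input_length, embed_dim))]
    # 'same' padding keeps the sequence length; channels become `filters`.
    shapes.append(("conv", (input_length, filters)))
    # Non-overlapping max pooling divides the length by pool_size (floored).
    shapes.append(("max_pool", (input_length // pool_size, filters)))
    # Assumption: the LSTM emits only its final hidden state.
    shapes.append(("lstm", (lstm_units,)))
    # Dropout changes values during training but never the shape.
    shapes.append(("dropout", (lstm_units,)))
    shapes.append(("dense", (classes,)))
    return shapes


for name, shape in trace_shapes():
    print(f"{name:<10} {shape}")
```

Under these assumptions the 823-token input shrinks to 411 positions after pooling, and the LSTM compresses that sequence into a 50-value vector before the two-class dense layer.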

Experiment predicting Introversion-Extroversion

In this experiment, we test how well the proposed CNN+LSTM model predicts whether the input text or speech reflects Introversion or Extroversion. The findings are shown in Table 6. The overall accuracy obtained is 0.88 (88%). The model performed better for the Introversion personality characteristic, because its F-measure (0.92) is greater than the F-measure for Extroversion (0.72).

Personality Characteristic	F-measure	Accuracy
Introversion	0.92	0.88
Extroversion	0.72	0.88

Experiment predicting Intuition-Sensing

In this experiment, we test how well the proposed CNN+LSTM model predicts whether the input text or speech reflects Intuition or Sensing. The findings are shown in Table 7. The overall accuracy obtained is 0.91 (91%). The model performed better for the Intuition personality characteristic, because its F-measure (0.95) is greater than the F-measure for Sensing (0.62).

Personality Characteristic	F-measure	Accuracy
Intuition	0.95	0.91
Sensing	0.62	0.91

Experiment predicting Thinking-Feeling

In this experiment, we test how well the proposed CNN+LSTM model predicts whether the input text or speech reflects Thinking or Feeling. The findings are shown in Table 8. The overall accuracy obtained is 0.85 (85%). The model performed better for the Feeling personality characteristic, because its F-measure (0.86) is greater than the F-measure for Thinking (0.84).

Personality Characteristic	F-measure	Accuracy
Thinking	0.84	0.85
Feeling	0.86	0.85

Experiment predicting Judging-Perception

In this experiment, we test how well the proposed CNN+LSTM model predicts whether the input text or speech reflects Judging or Perception. The findings are shown in Table 9. The overall accuracy obtained is 0.80 (80%). The model performed better for the Perception personality characteristic, because its F-measure (0.80) is greater than the F-measure for Judging (0.70).

Personality Characteristic	F-measure	Accuracy
Judging	0.70	0.80
Perception	0.80	0.80
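The paper does not state how the F-measure was computed. A common choice, assumed here, is the per-class F1 score: the harmonic mean of precision and recall for that class. The sketch below shows how the two columns of such a table could be derived from predictions; the labels are made-up toy data, not the paper's results.

```python
def class_f_measure(y_true, y_pred, positive):
    """F1 score for one class: harmonic mean of its precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Toy labels for illustration only.
y_true = ["I", "I", "I", "E", "E"]
y_pred = ["I", "I", "E", "E", "I"]
f_intro = class_f_measure(y_true, y_pred, "I")
f_extro = class_f_measure(y_true, y_pred, "E")
overall = accuracy(y_true, y_pred)
```

Accuracy is a single number per experiment, which is why it repeats on both rows of each table, while the F-measure differs per class and exposes weaker performance on the less frequent class.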

CONCLUSION

In this work we predicted the personality of an individual from text and audio. We achieved this by applying a deep learning model that combines CNN and LSTM. The max pooling layer helps extract the basic features of the input. The data is processed sequentially, and the LSTM retains information from prior data. The model has an average accuracy of 88%.