Facial Expressions Recognition using EEG Based on Machine Learning and Deep Neural Network Methods

DOI : 10.17577/IJERTCONV9IS05083


M. Prakash,

Assistant Professor, Department of Computer Science (SF),

Erode Arts and Science College (Autonomous), Erode 638009, Tamilnadu, India.

Dr. R N Muhammad Ilyas,

Assistant Professor, Department of Computer Science, The New College (Autonomous), Chennai – 14, Tamilnadu, India.

Dr. K Sankar,

Assistant Professor, Department of Computer Science, The New College (Autonomous), Chennai – 14, Tamilnadu, India.

Abstract- Real-time expression recognition has been an active field of research over the past few years. Our work aims to classify the expressions of physically challenged individuals (deaf, dumb and laid up) and children with disorders using facial landmarks and electroencephalograph (EEG) signals. We employ a convolutional neural network (CNN) and a long short-term memory (LSTM) classifier. We develop an algorithm for emotion recognition that tracks virtual markers with an optical flow algorithm. It works effectively under uneven lighting, head rotation (up to 25°), different backgrounds, and various skin tones. Six facial emotions (happiness, sadness, anger, fear, disgust, and surprise) are collected using 10 virtual markers. Fifty-five college students (35 male and 20 female) with a mean age of 22 participated in the facial expression recognition experiment, and 19 college students volunteered to provide EEG signals. Initially, Haar-like features are used for face and eye detection. Then, virtual markers are placed at defined locations on the subject's face using a mathematical model and tracked with the Lucas-Kanade optical flow algorithm.

The distance between the center of the subject's face and each marker position is used as a feature for facial expression classification. This distance feature is statistically validated using a one-way analysis of variance with a significance level of p < 0.01. In addition, the fourteen signals collected from the channels of the EEG headset (Emotiv EPOC+) are used as features for expression classification. Finally, the features are validated using multiple-fold cross-validation and fed to the LSTM and CNN classifiers. We achieve a recognition rate of 99.25% using the CNN for expression detection based on facial landmarks, while the maximum recognition rate achieved using the LSTM classifier is 87.96% for emotion detection based on EEG signals.

INTRODUCTION

One of the important ways humans display emotions is through facial expressions. Facial expression recognition is one of the most powerful, natural and immediate means for human beings to communicate their emotions and intentions.

In some circumstances, such as hospitalization or disability, humans can be restricted from showing their emotions; hence, better recognition of other people's emotions leads to more effective communication. Automatic human emotion recognition has received much attention recently with the introduction of IoT and smart environments in hospitals, smart homes and smart cities. Intelligent personal assistants (IPAs), such as Siri, Alexa, Cortana and others, use natural language processing to communicate with humans, but when augmented with emotion recognition, they can reach a higher level of effective communication and human-like intelligence.

Due to the fast advancement of artificial intelligence (AI) and machine learning, these techniques are actively applied in many domains, including spam detection, in which a spam classifier sorts email according to specific criteria and moves unwanted and unsolicited messages to a spam folder [1]. Machine learning is also widely used in data mining, for example in market analysis, to cope with the large amount of data produced every day and to detect the probability of fraud in customers' insurance claims [2]. For example, enhanced Fraud Miner uses the clustering-based data mining method Lingo to identify frequent patterns [3]. In addition, machine learning drives advances in the medical space, such as revenue cycle management (i.e., payments) and understanding patient health through a focus on clinically data-rich environments [4,5].

Moreover, machine learning algorithms have played a significant role in pattern recognition and pattern classification problems, especially in facial expression recognition and electroencephalography (EEG), over the past several decades [6-8]. The field has come of age and has revolutionized several areas in computing and beyond, including human-computer interaction (HCI) [9,10]. Human-computer interaction has become a part of our daily life. Additionally, emotion recognition is an important aspect of human interaction and communication because of its cost-effectiveness, reliable recognition and short computational time, among other advantages [10]. In other words, it is a potential nonverbal communication medium for creating diverse situations that support richer interaction with humans through close alignment with human-human communication [10-13]. Facial expression analysis is an interesting and challenging problem with important applications in many areas, such as human-computer interaction and medicine.

Several works are based on facial landmarks to extract features that help in emotion detection. Nguyen et al. [16] present a potential approach that uses 68 facial landmarks to detect three kinds of emotion in real time (negative, blank, and positive) using a single camera. Their proposed system can detect emotions using both 79 new features and the 26 geometrical features (10 eccentricity, 3 linear and 13 differential features) from Ref. [17], with an average accuracy of 70.65%. Palestra et al. [18] compute 32 geometric facial features (linear, eccentricity, polygonal and slope) based on 20 facial landmarks for automatic facial expression recognition. Thus, the expressions of humans can easily be understood by recent HMI systems [10,12-14]. Among the many methods studied in the literature, some used still images and perceived emotion by measuring the dimensions of the lips and eyes [10,11,15]. Biosensors, such as electromyograms (EMGs) and electroencephalograms (EEGs), have been used to perceive facial muscle changes and to capture brain activity.

There have been efforts to develop multimodal emotion recognition that incorporates facial expressions [11]. Facial recognition suffers from limitations such as light intensity, face position, and background changes. The multimodal strategy achieves a better emotion recognition rate than the single-modal strategy [19].

However, combining facial expressions with other modalities, such as speech signals, gestures and biosignals, will not efficiently detect emotions in dumb, deaf and paralyzed patients when developing intelligent human-machine interface (HMI) assistive systems, due to the above-mentioned issues. Such an HMI can be convenient for people who are totally physically disabled, as well as patients with special needs, when seeking help. To date, most of the existing methods work offline and are not useful for real-time applications [10,11]. Hence, the present work is mainly focused on developing a multimodal intelligent HMI system that works in a real-time environment. It recognizes emotions using facial expressions and EEG based on machine learning and deep neural network methods.

To identify emotional facial expressions in real time, the changes in facial reactions are measured. The facial action coding system (FACS) is a human observer-based system designed by Ekman and Friesen [15], which is used to detect subtle changes in facial features and controls facial models by manipulating single actions, called action units (AUs). These AUs are taken as a reference for defining ten virtual markers for detecting six basic emotional facial expressions: sadness, anger, happiness, fear, disgust and surprise. Accordingly, if these facial muscle responses are studied and used as a guide to pinpoint the expressions, the emotional state of humans can be perceived in real time [11]. Hence, the main objective of this study is to recognize six basic emotional expressions of the subjects using facial virtual markers and EEG signals. The proposed system therefore has low computational complexity (execution time, memory) and works in real-time applications.

Because it uses a marker-based approach, in which ten virtual markers are placed on the face, rather than an image-pixel-based approach that requires a longer computational time to detect the face [20], the system design is simpler and the computational time and memory requirements are reduced. In real-time systems, face detection is implemented either by using the pixels of the image or by using Haar-like features with an AdaBoost cascade classifier, which efficiently detects objects, including human faces, in a given image or video sequence [21,22]. Haar-like feature-based face detection can process a 384 x 288 pixel face image in approximately 0.067 s [23], and an AdaBoost cascaded classifier has been used to reduce computational time [20]. Most of the works in the literature focus on developing offline emotion recognition systems. This study, in contrast, aims to develop an algorithm for real-time emotion recognition using virtual markers tracked by an optical flow algorithm that works effectively under uneven lighting, subject head rotation (up to 25°), different backgrounds, and various skin tones. The distance features computed from the facial virtual markers and the fourteen signals recorded from an EPOC+ device are used to classify the emotional expressions using a convolutional neural network (CNN) and a long short-term memory (LSTM) classifier. This serves the aim of the system: to help physically disabled people (deaf, dumb, and laid up), and to help autistic children recognize the feelings of others.

2. Methods and materials

Fig. 1 shows the structure of the proposed system of our study. As illustrated in Fig. 1, we used two approaches to detect the subject's emotion:

Emotion detection using facial landmarks and

Emotion detection using EEG signals.

Fig: 1 System Structure.

  1. Emotion detection using facial landmarks

In emotion detection using facial landmarks, data collection took place on the same day after consent was obtained from each subject who volunteered for our study. Two facial expression databases were developed: one with a total of 30 subjects (15 male, 15 female) for automated marker placement, and another with a total of 55 subjects (35 male, 20 female) for testing and validating the proposed system. The subjects had a mean age of 22.9 years. They were male and female healthy undergraduate university students with no history of cognitive, muscular, facial or emotional disorders.

The subjects were requested to sit in front of a computer with a built-in camera and express six different emotional expressions (happiness, anger, fear, sadness, surprise, and disgust) in a video sequence for the purpose of data collection. They were requested to express a particular emotion in a controlled environment (room temperature: 26 °C; lighting intensity: 50 lux; distance between the camera and the subject: 0.95 m).

Facial expression for each emotion lasted 60 s and was performed once by each subject. Therefore, each subject required 6 min to express all the emotions, as illustrated in Fig. 2. However, each subject took approximately 20 min in total, including instructions and the baseline state. The number of virtual distance samples collected from each subject over all emotions is 2284 on average, saved in CSV format, and the total number of virtual distance samples collected from all subjects over all emotions is 125,620.

    Fig: 2 Data acquisition protocol for facial emotion detection.

        1. Face feature extraction

In this study, an HD camera was used to capture the subject's face and create a grayscale image, which simplifies image processing for facial expression recognition. Then, using the grayscale image, the subject's eyes are detected, and ten virtual markers (action units) are placed on the subject's face at defined locations using a mathematical model, as shown in Fig. 3.

          Fig: 3 Ten virtual locations using a mathematical model.
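As a concrete illustration of this step, the following is a minimal sketch of Haar-cascade face and eye detection on a grayscale frame; OpenCV and its pretrained cascades are assumed here, since the text specifies Haar-like features but not a particular implementation.

```python
import cv2

# Minimal sketch: Haar-cascade face and eye detection on a grayscale frame.
# OpenCV's pretrained cascades are an assumed implementation choice; the text
# only states that Haar-like features and a grayscale image are used.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye.xml")

def detect_face_and_eyes(frame_bgr):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.3, minNeighbors=5)
    results = []
    for (x, y, w, h) in faces:
        roi = gray[y:y + h, x:x + w]          # face region of interest
        eyes = eye_cascade.detectMultiScale(roi)
        results.append(((x, y, w, h), eyes))
    return gray, results
```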

The Lucas-Kanade optical flow algorithm is used to track the position of each virtual marker while the subject expresses an emotion. The ten features are derived as the distance between each marker and the center point, as depicted in Fig. 4.

          Fig: 4 Ten distances between each marker point and the center point.

          Fig: 5 The center point (C) and p_m1 coordinates at the distance of m1.

In the current study, all the distance data were calculated using the Pythagorean theorem [24] and stored in CSV format during the data acquisition process for further processing. In Fig. 5, for the right mouth corner, line m1 is the hypotenuse of a right triangle in which the side parallel to the x-axis is dx [the difference between the x-coordinate of p_m1 (x_p_m1) and that of the center point (x_c)] and the side parallel to the y-axis is dy [the difference between the y-coordinate of p_m1 (y_p_m1) and that of the center point (y_c)]. Thus, the formula for computing the distance is given in Equation (1):

m1 = sqrt(dx^2 + dy^2)   (1)
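A minimal sketch of the marker tracking and distance computation described above, assuming OpenCV's pyramidal Lucas-Kanade implementation and leaving out the mathematical model that initializes the ten marker positions:

```python
import cv2
import numpy as np

lk_params = dict(winSize=(15, 15), maxLevel=2,
                 criteria=(cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 0.03))

def track_and_measure(prev_gray, gray, markers, center):
    """Track the ten virtual markers with Lucas-Kanade optical flow and return
    the ten center-to-marker distances of Equation (1) for one frame.

    markers: float32 array of shape (10, 1, 2); center: (xc, yc) tuple.
    """
    new_markers, status, _err = cv2.calcOpticalFlowPyrLK(
        prev_gray, gray, markers, None, **lk_params)
    pts = new_markers.reshape(-1, 2)
    dx = pts[:, 0] - center[0]
    dy = pts[:, 1] - center[1]
    distances = np.sqrt(dx ** 2 + dy ** 2)   # Pythagorean distance feature
    return new_markers, distances
```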

        2. Facial landmarks classification

A convolutional neural network was used in our system to obtain improved facial emotion detection, since CNNs have proven effective in other computer vision tasks such as face recognition [25] and object detection [26]. In addition, predictions are based on the information given at a particular time [27]. Fig. 6 shows the network structure used for emotion detection with facial landmarks. The network takes an input and attempts to predict the output emotion. It has eight stages, including convolution, pooling and fully connected layers with rectified linear unit (ReLU) operations, which preserve quality while making convergence much faster [28]. The numbers of filters were 32, 64, and 128 with a filter size of 5 x 5 for the convolutional layers, and the number of output nodes in the fully connected layer was 6, with the Adam optimizer and a dropout rate of 0.3.

    Fig. 6. Facial landmarks system structure.
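A minimal Keras sketch of a network matching this description; the input shape, pooling layout and hidden dense width are assumptions, while the filter counts and size, dropout rate, optimizer and six-way softmax follow the text:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_facial_cnn(input_shape=(48, 48, 1), num_classes=6):
    # Three 5x5 convolution blocks (32, 64, 128 filters) with ReLU and pooling,
    # then a fully connected head with dropout 0.3 and a 6-way softmax output.
    model = models.Sequential([
        layers.Input(shape=input_shape),          # assumed input shape
        layers.Conv2D(32, (5, 5), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (5, 5), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(128, (5, 5), activation="relu", padding="same"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(256, activation="relu"),     # assumed hidden width
        layers.Dropout(0.3),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```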

      1. Emotion detection using EEG signals

In emotion detection using EEG signals, data collection took place on the same day after written consent was obtained from each subject who volunteered for this study. For the purpose of EEG data collection, a video was created from six main clips to obtain the best reaction of the brain in terms of electrical activity. In the beginning, video clips were created using emotional images from the International Affective Picture System [29] and music [30]. These clips were supposed to elicit the six emotions to be predicted. The video clips were then tested on a small sample to confirm the elicited emotions by prompting the volunteers to vote for the emotion they felt while watching each clip. Accordingly, additional video clips were selected based on certain criteria, one of which was that a clip must have a minimum of one million views on YouTube. Each clip was 2 min long and referred to one emotion. Therefore, the total length of the video, including the gaps between the clips, was 13 min and 10 s. After the video clips were determined, all the subjects were asked to watch the video, classify the emotional categories according to what they felt, and describe the effect they consciously believed the clips had caused.

In this way, raw EEG data from the 14 channels were collected at a sampling rate of 128 Hz using the Emotiv EPOC device [31]. Thus, an EEG signal database was developed from a total of 19 subjects (7 males and 12 females) for EEG signal investigation, with a total of 1,700,000 records of signal data. The subjects had a mean age of 22.9 years. Data collection was conducted in a laboratory environment and required approximately 1 h per subject. All subjects were healthy undergraduate university students with no history of cognitive, muscular, facial, or emotional disorders. EEG recording was performed using the EPOC+ interface, which is connected wirelessly to the Emotiv EPOC device and records the EEG signals coming from the brain while the subject watches the video. As each emotion's video clip lasted 2 min with a 10 s gap between clips and was shown once to each subject, each subject required 13 min and 10 s to record the EEG signals, as illustrated in Fig. 7. However, each subject took approximately 1 h in total, including instructions and a baseline state.

        Fig. 7. Data acquisition protocol for EEG emotion detection.
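A minimal sketch of how the 14-channel, 128 Hz recording could be segmented into labeled examples per 2-minute emotion clip; the window length and clip boundaries are assumptions, since the text does not specify them:

```python
import numpy as np

FS = 128          # sampling rate (Hz), as stated for the Emotiv EPOC
N_CHANNELS = 14   # EEG channels
CLIP_SEC = 120    # each emotion clip lasts 2 min

def segment_clip(raw, clip_start_sec, win_sec=2, emotion_label=0):
    """Cut one 2-minute emotion clip into fixed-length windows.

    raw: array of shape (n_samples, N_CHANNELS) holding the continuous recording.
    Returns (windows, labels), where windows has shape
    (n_win, win_sec * FS, N_CHANNELS) and labels repeats the clip's emotion id.
    """
    start = clip_start_sec * FS
    clip = raw[start:start + CLIP_SEC * FS]
    win = win_sec * FS
    n_win = len(clip) // win
    windows = clip[:n_win * win].reshape(n_win, win, N_CHANNELS)
    labels = np.full(n_win, emotion_label)
    return windows, labels
```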

        1. EEG signal preprocessing

Artifacts from different sources contaminate the EEG signals during recording, such as eye blinks, eye movements, muscle movements, respiration, and sweat. A preprocessing step is therefore needed to remove this unwanted information, which can otherwise cause significant distortion.
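The text does not name the artifact-removal technique; a common first step is zero-phase band-pass filtering, sketched below under that assumption with illustrative cutoff frequencies:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def bandpass_eeg(raw, fs=128, low=0.5, high=45.0, order=4):
    """Zero-phase band-pass filter applied channel-wise.

    raw: array of shape (n_samples, n_channels). The 0.5-45 Hz band is an
    illustrative choice for suppressing drift and high-frequency muscle noise;
    the paper does not state its actual preprocessing parameters.
    """
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    return filtfilt(b, a, raw, axis=0)
```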

        2. EEG signal classification

A long short-term memory (LSTM) network model was used to train the affective model and obtain improved EEG-based emotion detection. LSTM is a special type of recurrent neural network (RNN) architecture used in deep learning; it is well suited to classifying, processing, and making predictions on time series data and can process entire sequences, such as the EEG recordings in this study. Nineteen participants were asked to watch the same video, containing six different parts, so that the emotions elicited by these clips could be recognized. Then, the collected raw EEG data were fed to the proposed model, as shown in Fig. 8. The proposed model achieved the highest accuracy among the three models evaluated (conventional, fully connected layers, and grid search).

The LSTM and dropout layers are used to learn features from the raw EEG signals; the dropout layer reduces overfitting by preventing too many units from co-adapting. Finally, a dense layer with the softmax activation function is used for classification.
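A minimal Keras sketch matching this description; the LSTM-dropout-dense(softmax) layout follows the text, while the number of LSTM units, window length and dropout rate are assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_eeg_lstm(timesteps=256, n_channels=14, num_classes=6):
    # An LSTM layer learns temporal features from raw EEG windows; dropout
    # limits co-adaptation; a softmax dense layer outputs the six emotions.
    model = models.Sequential([
        layers.Input(shape=(timesteps, n_channels)),
        layers.LSTM(64),              # assumed number of units
        layers.Dropout(0.5),          # assumed dropout rate
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```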

  2. Data validation

System validation aims to evaluate the accuracy of the system after development. It was tested with data collected specifically for testing.

    1. Facial landmarks database validation

For emotion detection using facial landmarks, two facial expression databases were developed: one with a total of 30 subjects (15 male, 15 female) for automated marker placement, and another with a total of 55 subjects (25 male, 30 female) for testing and validating the proposed system. The subjects were requested to sit in front of a computer with a built-in camera and express six different emotional expressions (happiness, anger, fear, sadness, surprise, and disgust) in a video sequence for the purpose of data collection. They were requested to express a particular emotion in a controlled environment (room temperature: 26 °C; lighting intensity: 50 lux; distance between the camera and the subject: 0.95 m). Thus, the proposed method achieves a highest accuracy of 99.81%.

    2. EEG signals database validation

The EEG data for the 19 subjects are fed into a three-fold cross-validation procedure to split the set of features into training and testing sets. Thus, the proposed method achieves a highest emotion detection rate of 87.25%.
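A minimal sketch of the three-fold split, assuming scikit-learn's KFold and reusing the hypothetical build_eeg_lstm helper sketched earlier:

```python
import numpy as np
from sklearn.model_selection import KFold

def three_fold_accuracy(X, y, build_model, epochs=100):
    """X: (n_examples, timesteps, channels) EEG windows; y: integer labels.

    build_model: callable returning a compiled Keras model, e.g. build_eeg_lstm.
    Returns the mean test accuracy over the three folds.
    """
    scores = []
    for train_idx, test_idx in KFold(n_splits=3, shuffle=True, random_state=0).split(X):
        model = build_model()
        model.fit(X[train_idx], y[train_idx], epochs=epochs, verbose=0)
        _, acc = model.evaluate(X[test_idx], y[test_idx], verbose=0)
        scores.append(acc)
    return float(np.mean(scores))
```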

3. Experimental results and discussion

For emotion detection using facial landmarks, data were collected from 55 subjects for testing; their ages were between 20 and 25, and they were undergraduate students (25 males, 30 females). The accuracy was then measured at 100 epochs for the collected data, for both normalized and non-normalized data, as illustrated in Table (1). The collected facial landmark data were normalized using Equation (2) to bring all data values into [0, 1].

where Zi is the ith normalized data value. Both models were run for up to 300 epochs to check the shape of the accuracy curve and obtain the highest accuracy for emotion detection.
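A minimal sketch of the normalization step, assuming Equation (2) takes the standard min-max form implied by the [0, 1] range:

```python
import numpy as np

def minmax_normalize(x):
    """Scale a feature column to [0, 1]; the min-max form is an assumption."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())
```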

As shown in Fig. 8, the accuracy of emotion detection using facial landmarks increases with the number of epochs up to 300. Therefore, the system can recognize emotions from facial landmarks with 99.81% accuracy using the proposed model at 300 epochs. In contrast, the proposed model for EEG signals shows a direct relationship between accuracy and the number of epochs only up to 100, after which the curve stops increasing, as shown in Fig. 9. As a result, the highest emotion recognition accuracy for EEG signals is 87.25% at 100 epochs.

Fig: 8 Accuracy vs. #Epochs (Facial Landmarks).

Our system places exactly 10 virtual markers automatically to detect emotions. It is trained using our own database, and the maximum accuracy was 99.81%. This means that the system detects emotions from facial landmarks with high accuracy and without requiring many features. However, emotion detection using facial landmarks was more accurate since it depends on external physical features.

Fig: 9 Accuracy vs. #Epochs (EEG).

The results in Table (2) are calculated from the confusion-matrix counts, where TP, FP, TN and FN denote true positives, false positives, true negatives and false negatives, respectively. Table (2) shows the performance factors for each emotion: 0: Anger, 1: Disgust, 2: Fear, 3: Sad, 4: Smile and 5: Surprise. It shows the perfect precision and the proportion of actual positives and negatives that are accurately identified for each emotion. Thus, the collected data can be used as a benchmark by other researchers.
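A minimal sketch of the standard per-class metrics that these counts support; exactly which of them appear in Table (2) is an assumption:

```python
def per_class_metrics(tp, fp, tn, fn):
    """Standard confusion-matrix metrics for one emotion class."""
    precision   = tp / (tp + fp)
    sensitivity = tp / (tp + fn)          # true positive rate (recall)
    specificity = tn / (tn + fp)          # true negative rate
    accuracy    = (tp + tn) / (tp + tn + fp + fn)
    return {"precision": precision, "sensitivity": sensitivity,
            "specificity": specificity, "accuracy": accuracy}
```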

Our system is based on just 10 virtual distances for the facial landmarks and 14 channels of raw data for the EEG signals. In other words, the performance of the system is improved because the number of training facial landmark features is smaller than in previous work, as shown in Table (2).

Many exhaustive tests were conducted to minimize errors and obtain the desired accuracy. The highest recognition rate obtained in this research was 99.81% for emotion detection using facial landmarks with the 10 virtual markers. However, the highest accuracy for emotion detection using EEG signals was 87.25%, which can be improved by collecting more data from more subjects and by finding techniques to extract more features from the EEG signals.

CONCLUSION AND FUTURE WORK

An algorithm for real-time emotion recognition using virtual markers tracked by an optical flow algorithm has been developed to create a real-time emotion recognition system with low computational complexity (execution time, memory) using facial expressions and EEG signals. This algorithm works effectively under uneven lighting, subject head rotation (up to 25°), different backgrounds, and various skin tones. The system aims to help physically disabled people (deaf, dumb and laid up).

The results show that the system can recognize emotions with an accuracy of 99.25% using facial landmarks and 87.96% using EEG signals. The data used in this work were collected at Kuwait University. It was difficult to collect data from many subjects due to the students' schedules and timings; thus, only a few subjects were available for data collection.

For future work, the system's precision and accuracy can be improved by collecting more data from more subjects. Additionally, techniques can be applied to extract more features from the EEG signals [14]. Beyond improving the system's techniques, putting the subjects in real situations in which they express genuine feelings can help to improve the system's accuracy for the EEG signals.

REFERENCES

  1. Dada EG, Bassi JS, Chiroma H, Abdulhamid SM, Adetunmbi AO, Ajibuwa OE. Machine learning for email spam filtering: review, approaches and open research problems. Heliyon 2019;5(6):e01802. https://doi.org/10.1016/j.heliyon.2019.e01802.

  2. Xie M. Development of artificial intelligence and effects on financial system. J Phys Conf 2019;1187:032084. https://doi.org/10.1088/1742-6596/1187/3/032084.

  3. Hegazy O, Soliman OS, Salam MA. A machine learning model for stock market prediction. Int J Comput Sci Telecommun 2014;4(12):1623.

  4. Beckmann JS, Lew D. Reconciling evidence-based medicine and precision medicine in the era of big data: challenges and opportunities. Genome Med 2016;8(1):1349.

  5. Weber GM, Mandl KD, Kohane IS. Finding the missing link for big biomedical data. JAMA 2014;311(24):2479-80.

  6. Loconsole C, Chiaradia D, Bevilacqua V, Frisoli A. Real-time emotion recognition: an improved hybrid approach for classification performance. Intelligent Computing Theory 2014:320-31.

  7. Huang X, Kortelainen J, Zhao G, Li X, Moilanen A, Seppanen T, Pietikainen M. Multi-modal emotion analysis from facial expressions and electroencephalogram. Comput Vis Image Understand 2016;147:114-24.

    https://doi.org/10.1016/j.cviu.2015.09.015.

  8. Raheel A, Majid M, Anwar SM. Facial expression recognition based on electroencephalography. In: 2019 2nd international conference on computing, mathematics and engineering technologies (iCoMET), Sukkur, Pakistan; 2019. p. 1-5.

  9. Vassilis S, Herrmann Jürgen. Where do machine learning and human-computer interaction meet? 1997.

  10. Keltner D, Ekman P. In: Lewis M, Haviland-Jones JM, editors. Facial expression of emotion. Handbook of emotions. New York: Guilford Press; 2000. p. 236-49.

  11. Ekman P. Darwin and facial expression: a century of research in review. United State Of America: Academic Press Ishk; 2006. p. 1973.

  12. Ekman P, Friesen WV. Constants across cultures in the face and emotion. J Pers Soc Psychol 1971;17(2):124.

  13. Ekman P. Darwin and facial expression: a century of research in review. United State Of America: Academic Press Ishk; 2006. p. 1973.

  14. Ekman P, Friesen WV, Ancoli S. Facial signs of emotional experience. J Pers Soc Psychol 1980;39:1123-34.

  15. Ekman P, Friesen WV, Ancoli S. Facial signs of emotional experience. J Pers Soc Psychol 1980;39:1123-34.

  16. Nguyen BT, Trinh MH, Phan TV, Nguyen HD. An efficient real- time emotion detection using camera and facial landmarks. In: 2017 seventh international conference on information science and technology (ICIST); 2017. https://doi.org/10.1109/icist.2017.7926765.

  17. Loconsole C, Miranda CR, Augusto G, Frisoli A, Orvalho V. Real-time emotion recognition: novel method for geometrical facial features extraction. Proceedings of the International Conference on Computer Vision Theory and Applications (VISAPP) 2014:378-85.

  18. Palestra Giuseppe, Pettinicchio Adriana, Del Coco Marco, Carcagnì Pierluigi, Leo Marco, Distante Cosimo. Improved performance in facial expression recognition using 32 geometric features. In: Proceedings of the 18th international conference on image analysis and processing (ICIAP); 2015. p. 518-28.

  19. Zhang J, Yin Z, Cheng P, Nichele S. Emotion recognition using multi-modal data and machine learning techniques: a tutorial and review. Information fusion. 2020.

  20. Wilson PI, Fernandez J. Facial feature detection using Haar classifiers. J Comput Small Coll 2006;21(4):127-33. ISSN 1937-4771.

  21. Zhao G, Pietikainen M. Dynamic Texture Recognition Using Volume Local Binary Patterns. In: Vidal R, Heyden A, Ma Y, editors. Dynamical Vision. WDV 2006, WDV 2005. Lecture Notes in Computer Science, vol. 4358. Berlin, Heidelberg: Springer; 2007.

  22. Das PK, Behera HS, Pradhan SK, Tripathy HK, Jena PK. A modified real time A* algorithm and its performance analysis for improved path planning of mobile robot. In: Computational intelligence in data mining, Springer India, vol. 2; 2015. p. 221-34.

  23. Viola P, Jones M. Rapid object detection using a boosted cascade of simple features. In: Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR); 2001.

  24. Sally Judith D, Paul Sally. "Chapter 3: Pythagorean triples". Roots to research: a vertical development of mathematical problems. American Mathematical Society Bookstore; 2007. p. 63. ISBN 0821844032.

  25. Sun Y, Chen Y, Wang X, Tang X. Deep learning face representation by joint identification-verification. In: Proc. Adv. Neural Inf. Process. Syst.; 2014. p. 1988-96.

  26. Ouyang W, Wang X, Zeng X, Qiu S, Luo P, Tian Y, Li H, Yang S, Wang Z, Loy C-C, Tang X. DeepID-Net: deformable deep convolutional neural networks for object detection. In: Proc. IEEE Conf. Comput. Vis. Pattern Recogn.; 2015. p. 2403-12.

  27. Dong C, Loy CC, He K, Tang X. Image super-resolution using deep convolutional networks. IEEE Trans Pattern Anal Mach Intell 2016;38(2):295-307. https://doi.org/10.1109/tpami.2015.2439281.

  28. Krizhevsky A, Sutskever I, Hinton G. ImageNet classification with deep convolutional neural networks. In: Proc. Adv. Neural Inf. Process. Syst.; 2012. p. 1097-105.

  29. Lang PJ, Bradley MM, Cuthbert BN. International affective picture system (IAPS): affective ratings of pictures and instruction manual. Gainesville, FL: University of Florida; 2008. Technical Report A-8.

  30. Bhattacharya J, Lindsen JP. Music for a brighter world: brightness judgment bias by musical emotion. PloS One 2016;11(2):e0148959. https://doi.org/10.1371/journal.pone.0148959.

  31. https://www.emotiv.com/.

  32. Sangaiah AK, Arumugam M, Bian G-B. An intelligent learning approach for improving ECG signal classification and arrhythmia analysis. Artif Intell Med 2019: 101788. https://doi.org/10.1016/j.artmed.2019.101788.
