Cognitive Behaviour Analysis Task | Data Modalities | Dataset Type | Feature Extraction | Description |
---|---|---|---|---|
Lie/Deception Detection | Audio | Unimodal | Mel-Frequency Cepstral Coefficients (MFCC) [17]; Spectral Kurtosis, MFCC, Spectral Spread, Spectral Centroid, Tonal Power Ratio, blood pressure, and respiration rate [66] | A linear-kernel Support Vector Machine (SVM) classifier was applied to the processed speech signals, achieving detection accuracies of 88.23% for lies and 84.52% for truths [17]. The MMO-DBN method [66] combines the Monarch Butterfly Optimization [95] and Moth Search [91] algorithms with a Deep Belief Network, achieving an accuracy of 98.4% |
Lie/Deception Detection | Images | Unimodal | Facial features extracted using OpenFace [39] | A fraud detection framework identifies persons acting dishonestly in video clips by extracting the proportions of their facial micro-expressions [38], using an expression database covering Happiness/Joy, Surprise, Anger, Disgust/Contempt, and Sadness with a classification accuracy of 85%. A Long Short-Term Memory (LSTM) network trained on facial videos from the Real-life Trial (RLT), Silesian Deception, and Bag-of-Lies datasets classified facial features with an accuracy of 89.49% [39] |
Lie/Deception Detection | Audio and Video | Multimodal | Verbal features: unigrams and bigrams derived from a bag-of-words representation [18]. Non-verbal features: eye, eyebrow, and mouth movements (facial expressions) and hand movements and trajectories (hand gestures) | A Decision Tree algorithm was trained on these features to classify truth and deception with an accuracy of up to 75% |
Lie/Deception Detection | Audio, video, and text | Multimodal | Improved Dense Trajectories (video), Mel-Frequency Cepstral Coefficients (MFCC) from audio, and GloVe vector representations of transcripts (text) | A linear SVM was applied to classify truth and deception with an accuracy of 87.73% |
Lie/Deception Detection | Audio, video, and EEG | Multimodal | Attention-enhanced frequency-distributed spectrograms (audio), two-stream CNN (video frames), Bi-LSTM (EEG) | The study investigates the Bag-of-Lies dataset using audio, video, and EEG data, applying late fusion of a two-stream CNN (video), a CNN over attention-enhanced frequency-distributed spectrograms (audio), and a Bi-LSTM network (EEG) to detect lies, achieving 83.5% accuracy with multimodal fusion |
Lie/Deception Detection | Audio, video, and EEG | Multimodal | Audio frames, concatenated LBP face images from 20 frames per video, concatenated EEG channels | In [40], LieNet, a novel deep convolutional neural network, is developed to detect multiscale variations of dishonesty; preprocessed audio, video, and EEG signals are individually input into LieNet for feature extraction. The framework is trained with data augmentation methods, achieving high accuracy rates on the BOL, RLT, and MU3D databases. Other deception detection techniques are also reported in the literature [41, 42] |
Lie/Deception Detection | Audio, video, and micro-expression features | Multimodal | 3D-CNN [43] (video), CNN and Word2Vec (text), openSMILE [44] toolkit (audio), 39 manually annotated micro-expressions | [43] proposes a neural network model for deceit detection using audio, video, text, and micro-expression features. Features are extracted using a 3D-CNN, a CNN, the openSMILE toolkit, and binary annotations; the fused features are fed to a multilayer perceptron for classification, achieving a maximum accuracy of 96.14% |
Lie/Deception Detection | Audio, Video, EEG, Gaze | Multimodal | LBP features from 20 frames per video; Zero-Crossing Rate, Spectral Centroid, Spectral Bandwidth, Spectral Roll-off, Chroma frequencies, and MFCC (audio); PyGaze (gaze); 100 points from a CSV file for each channel (EEG) | The research presented in [19] collected data from four modalities and used different ML models to analyse and classify them: LBP features with SVM, Random Forest, and MLP for video; frequency-based properties with Random Forest/KNN for audio; a CNN-based classifier and Random Forest/MLP for EEG; and fixations, eye blinks, and pupil size as features for gaze |
Stress/Emotion Detection | EEG | Unimodal | Differential Entropy (DE), Power Spectral Density (PSD), Differential Asymmetry (DASM), Differential Caudality, and Rational Asymmetry (RASM) | In [21], DBNs were used to classify positive, neutral, and negative emotions from EEG data band-pass filtered between 0.3 and 50 Hz, using DE, PSD, DASM, Differential Caudality, and RASM features and achieving an average accuracy of 86.08%; SVM, LR, and KNN were also evaluated as classifiers |
Stress/Emotion Detection | EEG | Unimodal | Empirical Mode Decomposition (EMD), Discrete Wavelet Transform (DWT), and a combined DWT-EMD | In [52], EEG characteristics were extracted using EMD, DWT, and DWT-EMD, and classifiers such as KNN, SVM, and ANN were used to distinguish the intrinsic properties of real, neutral, and performed smiles, with average accuracies of 94.3% and 84.1% using DWT-EMD with ANN in the alpha and beta bands, respectively |
Stress/Emotion Detection | ECG | Unimodal | Peak detection followed by HRV feature extraction | In the MAUS dataset [27], HRV statistical and frequency-domain features are extracted, and SVM is applied for binary classification, achieving an accuracy of 71.6% for ECG using LOSO and mixed-subject fivefold cross-validation |
Stress/Emotion Detection | PPG (Wrist) | Unimodal | Peak detection followed by HRV feature extraction | In the MAUS dataset [27], HRV statistical and frequency-domain features are extracted, and SVM is applied for binary classification, achieving an accuracy of 66.7% for wrist PPG using LOSO and mixed-subject fivefold cross-validation |
Stress/Emotion Detection | PPG (Fingertip) | Unimodal | Peak detection followed by HRV feature extraction | In the MAUS dataset [27], HRV statistical and frequency-domain features are extracted, and SVM is applied for binary classification, achieving an accuracy of 59.9% for fingertip PPG using LOSO and mixed-subject fivefold cross-validation |
Stress/Emotion Detection | Text | Unimodal | GloVe embeddings | The cognitive approach to psychotherapy aims to modify negative thoughts; NLP was employed to create schemas from cognitive processes demonstrated by healthy individuals, which were then categorised into nine groups and mapped using GloVe embeddings with KNN, SVM, and RNN classifiers |
Stress/Emotion Detection | ECG, GSR | Multimodal | ECG: HRV features (statistical and frequency-domain); GSR: statistical features | In the SWELL-KW dataset [22], stress detection was performed using ECG and GSR modalities with preprocessing and feature extraction, and KNN and SVM classifiers achieved 66.52% and 72.82% accuracy, respectively |
Abnormal Behaviour Detection | Text | Unimodal | Bag of Words, SkipGram, GloVe | The corpus used in this study was taken from the Koko platform, which contains 500,000 posts on mental health issues. It was annotated into three classes: thinking errors (such as black-and-white thinking and catastrophising), emotions (including anger and anxiety), and situations (such as bereavement and work). The posts can have multiple labels, and different deep-learning techniques were used with word embeddings to classify them. The CNN-GloVe model achieved the highest F1 score of 57.8% |
Abnormal Behaviour Detection | Images | Unimodal | Social Force Flow [31]: the interaction force is mapped onto the image plane for every pixel in every frame | The Social Force concept is used to locate abnormal behaviours in crowd footage by overlaying a grid of particles on each frame, advecting them with the space–time average of the optical flow, and measuring the interaction forces between particles treated as individuals. The method achieved 94% accuracy using a bag-of-words approach to categorise frames as normal or abnormal |
Abnormal Behaviour Detection | EEG | Unimodal | Quadratic time–frequency distribution (QTFD) | The study in [50] uses the quadratic time–frequency distribution (QTFD) to analyse EEG signals and track changes in their spectral characteristics over time, extracting time–frequency features for subject-dependent SVM classification of emotions on a 2D arousal–valence plane |
Abnormal Behaviour Detection | EEG | Unimodal | Power Spectral Density and the Burg autoregressive model [51] | A technique proposed for emotion recognition combines dynamic functional network patterns with regional brain activations computed using Power Spectral Density and the Burg autoregressive model, achieving up to 90.3% accuracy in differentiating true/genuine versus neutral, true/genuine versus fake, and neutral versus fake emotions [51] |
Abnormal Behaviour Detection | EEG | Unimodal | DWT, EMD, and DWT-EMD | In [52], SVM, KNN, and ANN classifiers were used on EEG data to identify genuine smiles, fake/acted smiles, and neutral expressions. EEG features were extracted using three time–frequency analysis techniques (DWT, EMD, and DWT-EMD) at three frequency bands. When distinguishing genuine from fake emotional expressions, the DWT-EMD technique yielded the highest classification accuracies in the alpha band: 94.3%, 92.4%, and 83.8% with ANN, SVM, and KNN, respectively |
Abnormal Behaviour Detection | ECG, EDA, EMG, BVP, Accelerometer, Respiration, and Temperature | Multimodal | Forward Selection method | In [54], Forward Selection was used for feature selection and SMOTE was used to balance the imbalanced WESAD dataset, with non-linear algorithms such as GBDT, RF, ET, and DT evaluating information gain through Gini impurity or Friedman MSE |
Abnormal Behaviour Detection | ECG, EDA, EMG, BVP, Accelerometer, Respiration, and Temperature | Multimodal | PCA, Quantile Transformer, and Standard Scaler preprocessing | This study analyses bio-signals to detect stress using deep learning and machine learning on the WESAD dataset, applying PCA, Quantile Transformer, and Standard Scaler preprocessing and using six machine learning methods for binary classification, with leave-one-subject-out cross-validation to avoid personalisation [55] |
Abnormal Behaviour Detection | Accelerometer, EDA, Temperature | Multimodal | Statistical features: mean, standard deviation, dynamic range, and minimum and maximum values | A new stress-tracking system is proposed based on a GRU RNN, which is useful in situations where not all modalities are reliable stress predictors. The system performs binary classification, considering only the ACC, EDA, and TEMP signals with statistical parameters for feature engineering. The GRU mitigates the vanishing gradient problem of RNNs, and the selected indicators are used to distinguish between stress and non-stress circumstances [57] |
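
Several rows above (the MAUS entries [27]) rely on peak detection followed by time-domain HRV feature extraction. A minimal Python sketch of that pipeline, assuming a clean ECG-like signal; the 250 Hz sampling rate, half-maximum threshold, and 0.4 s refractory distance are illustrative choices, not the dataset's actual parameters:

```python
import numpy as np
from scipy.signal import find_peaks

def hrv_features(signal, fs):
    """Detect R-peaks and derive basic time-domain HRV statistics."""
    # Peak detection: keep peaks above half the maximum, at least 0.4 s apart
    peaks, _ = find_peaks(signal, height=0.5 * signal.max(),
                          distance=int(0.4 * fs))
    rr = np.diff(peaks) / fs                          # RR intervals in seconds
    return {
        "mean_rr": rr.mean(),                         # mean RR interval (s)
        "sdnn": rr.std(ddof=1),                       # SDNN
        "rmssd": np.sqrt(np.mean(np.diff(rr) ** 2)),  # RMSSD
        "mean_hr": 60.0 / rr.mean(),                  # mean heart rate (bpm)
    }

# Synthetic "ECG": one sharp spike every 0.8 s (75 bpm) at fs = 250 Hz
fs = 250
sig = np.zeros(10 * fs)
sig[::int(0.8 * fs)] = 1.0
feats = hrv_features(sig, fs)
```

Frequency-domain HRV features (LF/HF power and their ratio) would additionally require resampling the RR series and estimating its spectrum, which is omitted here for brevity.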
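
The frequency-based audio descriptors listed for [19] (zero-crossing rate, spectral centroid, spectral bandwidth, spectral roll-off) can be computed per frame with plain NumPy. This is an illustrative, power-weighted formulation rather than the study's exact implementation; the 8 kHz sampling rate and 85% roll-off threshold are assumptions:

```python
import numpy as np

def spectral_features(frame, fs):
    """Frame-level audio descriptors: ZCR, centroid, bandwidth, roll-off."""
    # Zero-crossing rate: fraction of adjacent samples with a sign change
    zcr = np.mean(np.abs(np.diff(np.sign(frame))) > 0)
    power = np.abs(np.fft.rfft(frame)) ** 2           # power spectrum
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    p = power / power.sum()                           # normalised weights
    centroid = np.sum(freqs * p)                      # spectral centroid (Hz)
    bandwidth = np.sqrt(np.sum((freqs - centroid) ** 2 * p))
    # Roll-off: frequency below which 85% of the spectral energy lies
    rolloff = freqs[np.searchsorted(np.cumsum(p), 0.85)]
    return zcr, centroid, bandwidth, rolloff

# Sanity check on a pure 440 Hz tone: the centroid should sit near 440 Hz
# and the ZCR near 2 * 440 / fs
fs = 8000
t = np.arange(2048) / fs
zcr, centroid, bw, roll = spectral_features(np.sin(2 * np.pi * 440 * t), fs)
```

In practice these descriptors are averaged or pooled over many overlapping frames before being fed to the Random Forest/KNN classifiers mentioned in the table.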
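
Differential Entropy, used alongside PSD in [21], reduces under a Gaussian assumption to 0.5 ln(2πeσ²) of the band-filtered signal. A sketch of band-wise DE extraction; the band edges and 4th-order Butterworth filter are common conventions, not necessarily those used in [21]:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

# Canonical EEG bands in Hz (an assumed convention)
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 50)}

def differential_entropy(x):
    """DE under a Gaussian assumption: 0.5 * ln(2 * pi * e * var(x))."""
    return 0.5 * np.log(2 * np.pi * np.e * np.var(x))

def band_de(eeg, fs):
    """Band-pass each canonical EEG band, then take its DE."""
    feats = {}
    for name, (lo, hi) in BANDS.items():
        sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        feats[name] = differential_entropy(sosfiltfilt(sos, eeg))
    return feats

# White noise carries more power in wider bands, so gamma DE > delta DE
rng = np.random.default_rng(0)
fs = 200
de = band_de(rng.standard_normal(20 * fs), fs)
```

DASM and RASM are then simple differences and ratios of these DE values between symmetric left/right electrode pairs.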
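
Local Binary Patterns appear as the video feature in both [40] and [19]. A basic 3×3 LBP is sketched below; the cited studies may use circular or uniform LBP variants, and in practice the per-frame histogram of the codes, not the code image itself, serves as the feature vector:

```python
import numpy as np

def lbp_image(img):
    """Basic 3x3 Local Binary Pattern: threshold the 8 neighbours of each
    interior pixel against the centre and pack the bits into a code 0-255."""
    # Neighbour offsets, clockwise from the top-left corner
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    centre = img[1:-1, 1:-1]
    codes = np.zeros(centre.shape, dtype=np.int32)
    for bit, (dy, dx) in enumerate(offsets):
        # Shifted view holding the neighbour at (dy, dx) for every centre pixel
        neigh = img[1 + dy: img.shape[0] - 1 + dy,
                    1 + dx: img.shape[1] - 1 + dx]
        codes += (neigh >= centre).astype(np.int32) << bit
    return codes

# A centre pixel darker than all 8 neighbours sets every bit: code 255
img = np.array([[5, 5, 5],
                [5, 1, 5],
                [5, 5, 5]], dtype=np.uint8)
code = lbp_image(img)
```

For the video pipelines in the table, such codes are computed per face image, histogrammed, and the histograms from the 20 sampled frames concatenated into one descriptor.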
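
The statistical features used by the GRU-based system [57] (mean, standard deviation, dynamic range, and min/max values) amount to simple per-window aggregates, for example:

```python
import numpy as np

def window_stats(x):
    """Per-window statistical features over one signal segment."""
    return {
        "mean": float(np.mean(x)),
        "std": float(np.std(x)),
        "range": float(np.ptp(x)),  # dynamic range = max - min
        "min": float(np.min(x)),
        "max": float(np.max(x)),
    }

# One feature dict per sliding window and per signal (ACC, EDA, TEMP);
# the sequence of these vectors is what feeds the GRU
feats = window_stats(np.array([1.0, 2.0, 4.0, 3.0]))
```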