Dataset Name | Data Modalities | Dataset Type | Cognitive Behavior Analysis Task | Description |
---|---|---|---|---|
Miami University Deception Detection Database (MU3D) [15] | Videos | Multimodal | Lie/Deception Detection | The dataset comprises 320 videos of individuals providing both truthful and deceptive statements, encompassing both audio and video modalities [15] |
Silesian Deception Database [16] | Videos | Multimodal | Lie/Deception Detection | The dataset consists of 101 high-speed camera recordings of subjects, captured at a resolution of 640×480 and a frame rate of 100 frames per second. Over 1.1 million coded frames serve as the ground truth for detecting deception cues on the subject's face during truthful and deceptive statements |
ReLiDDB (ReGIM-Lab Lie Detection Database) [17] | Speech Signals | Unimodal | Lie/Deception Detection | The dataset consists of recordings of false and true declarations captured in various indoor and outdoor settings, intended for preliminary-investigation scenarios. It includes speech signals from 40 subjects responding to hypothetical scenarios; approximately 37% of the samples are false declarations and 68% are true declarations [17] |
Deception Detection and Physiological Monitoring (DDPM) [58] | Visible-light, near-infrared, and thermal video; audio; text annotations; pulse oximeter data (70 subjects over 13 h) | Multimodal | Lie/Deception Detection | The database encompasses approximately 13 h of recordings from 70 subjects, comprising over 8 million video frames captured in the visible-light, near-infrared, and thermal spectra, together with relevant metadata, audio data, and pulse oximeter data [58]. For each interviewee, RGB, near-infrared, and long-wave infrared recordings, cardiac pulse, blood oxygenation, and audio were collected and annotated for further analysis [58]. Recordings capture interview scenarios in which the interviewee attempts to deceive the interviewer with certain answers |
SJTU Emotion EEG Dataset (SEED) [21] | EEG Signals | Unimodal | Stress/Emotion Detection | The SEED dataset consists of three-class emotional EEG data acquired from 15 individuals. During data collection, participants were exposed to emotional film clips representing positive, negative, and neutral emotions [21] |
EEG dataset for genuine &amp; acted emotional expressions [52] | EEG Signals | Unimodal | Stress/Emotion Detection | The dataset supports classifying emotions as genuine, neutral, or simulated. During data collection, participants wore an EEG headset while photos or movie clips were displayed on a computer monitor; the participants' emotional fluctuations in response to these visual stimuli are reflected in the captured EEG data [52] |
A Database for Emotion Analysis using Physiological Signals (DEAP) [20] | EEG signals, peripheral physiological signals, and multimedia content analysis | Multimodal | Stress/Emotion Detection | EEG and peripheral physiological signals of 32 individuals were recorded while each watched 40 one-minute music video excerpts. Participants rated each video on arousal, valence, like/dislike, dominance, and familiarity. Frontal face video was also recorded for 22 of the 32 participants [20] |
SWELL dataset [22] | Computer logs, facial expressions from video recordings, body postures, and HRV | Multimodal | Stress/Emotion Detection | This dataset contains readings from 25 participants subjected to neutral, interruption, and time-pressure conditions for a total of 3 h each [22]. The data collected comprise computer logs, facial expressions from video recordings, body postures from a Kinect 3D sensor, ECG readings, and skin conductance levels from body sensors |
Physical Activity and Stress (PASS) dataset [23] | ECG, EDA, respiration, temperature | Multimodal | Stress/Emotion Detection | This dataset documents the experimental procedure employed and descriptive statistics of the participants' neurophysiological signals captured under various circumstances, with participants performing tasks of varying stress levels [23] |
Continuous stress detection on nurses in a hospital [24] | EDA, ECG, accelerometer data, temperature | Multimodal | Stress/Emotion Detection | This dataset provides physiological stress indicators for nurses working in real hospital environments during the COVID-19 pandemic. It was created primarily for research on stress in the workplace and was collected from Empatica E4 data streams [24]. Alongside continuous physiological monitoring, nurses periodically filled out surveys about the factors that contributed to their stress |
PURE [25] | Video, pulse rate, SpO2 readings | Multimodal | Stress/Emotion Detection | Ten subjects were asked to perform different head motions. This benchmark dataset was introduced to study how head movement during measurement affects remote pulse estimation [25] |
COFACE [26] | Videos, physiological signals (contact photoplethysmography and respiration) | Multimodal | Stress/Emotion Detection | This dataset includes 160 videos and physiological data collected from 40 healthy adults (70% men, 30% women) over several days under realistic conditions. Participants were recorded for one minute with a standard webcam while their physiological signals were captured using a blood-volume pulse sensor and a respiration belt [26] |
MAUS [27] | ECG, PPG, GSR signals | Multimodal | Stress/Emotion Detection | This dataset was collected from 22 healthy graduate students who received guidelines and signed a consent form before the test [27]. Participants had a mean age of 23 years (standard deviation 1.7), and their mental workload was assessed using wearable sensors |
VIPL-HR [28] | Visible-light videos, near-infrared videos | Multimodal | Stress/Emotion Detection | This dataset consists of 2,378 visible-light (VIS) videos and 752 near-infrared (NIR) videos of 107 subjects, intended for remote heart rate estimation from face videos. The VIPL-HR database covers diverse variations, including head movements, illumination changes, and changes in acquisition devices [28] |
MuSe-Stress (Multimodal Sentiment Analysis – Stress) dataset [29] | Text, audio, video, and physiological data such as skin temperature, skin conductance, breathing rate, and heart rate | Multimodal | Stress/Emotion Detection | This dataset comprises recordings of 28 college students from the University of Michigan (nine female, 19 male) across two sessions: one during which an external stressor (the University of Michigan's final exam period) was present, and one during which the stressor was absent. Each recording lasts about 45 min in total, and each individual is exposed to various emotional stimuli, including brief movies and emotionally evocative questions [29]. MuSe provides three separate datasets used to analyse sentiment and to detect emotions and humour |
Koko website [30] | Text (corpus) | Unimodal | Abnormal Behavior Detection | The corpus contains 500,000 written posts annotated into three classes: thinking errors, emotions, and situations [30] |
Abnormal crowd behaviour dataset [31] | Video | Multimodal | Abnormal Behavior Detection | This dataset is a collection of normal and abnormal crowd recordings of pedestrians in crowded areas, intended for computer vision methods. It consists of videos of 11 different escape-event scenarios shot in 3 indoor and outdoor settings; each video starts with a segment of regular behaviour and concludes with segments of deviant conduct [31] |
Wearable Stress and Affect Detection (WESAD) dataset [32] | ACC, BVP, ECG, EMG, EDA, RESP, TEMP | Multimodal | Abnormal Behavior Detection | This publicly available dataset contains physiological data from 15 individuals, captured from chest- and wrist-worn devices during a lab experiment. The data were collected under five conditions: Baseline, Amusement, Stress, Meditation, and Recovery [32] |
Multimodal Analysis of Human Nonverbal Behavior in Conversations – Human–Computer Interaction (MAHNOB-HCI) dataset [33] | Audio signals, face videos, eye gaze data, and peripheral/central nervous system physiological signals | Multimodal | Abnormal Behavior Detection | This dataset comprises face videos, audio signals, eye gaze data, and physiological signals from the peripheral and central nervous systems. Two experiments were conducted with 27 participants of diverse cultural backgrounds and genders. In the first, participants watched 20 emotional videos and self-reported their feelings using specific emotional keywords. In the second, short films and photos were shown with and without tags, and participants rated their agreement or disagreement with the displayed tags. The captured videos and corresponding physiological reactions were segmented and stored in a database [33] |
Bio-reactions and faces for emotion-based personalisation (BIRAFFE) dataset [34] | ECG, GSR, facial expression signals, and hand movements via accelerometer and gyroscope | Multimodal | Abnormal Behavior Detection | Individuals were subjected to audio-video stimuli and a three-level emotion-evoking game. The full BIRAFFE dataset comprises data gathered from 201 of 206 participants; for some participants, data were not properly collected due to, e.g., application crashes, lost Bluetooth signal, or poor electrode contact. Complete data are ultimately available for 141 subjects [34] |
MuSe (Multimodal Sentiment Analysis)-Physio dataset [35] | EDA, GSR, audio, video, heart rate, respiration | Multimodal | Abnormal Behavior Detection | In this database, human annotations were used to predict psycho-physiological responses. The 69 participants (49 of them female), aged between 18 and 39 years, provide about 6 h of data for the MuSe-Stress and MuSe-Physio sub-challenges. Besides audio, video, and text, the ECG, RESP, and BPM signals can optionally be utilised [35] |
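A common preprocessing step shared by the physiological datasets above (e.g., WESAD, DEAP, MAUS) is to segment continuous signals into fixed-length, overlapping windows before feature extraction or classification. The sketch below illustrates this windowing on a synthetic signal; the 4 Hz sampling rate and the window/step sizes are illustrative assumptions, not parameters taken from any of these datasets.

```python
import numpy as np

def sliding_windows(signal, fs, win_s, step_s):
    """Segment a 1-D signal into overlapping windows.

    fs: sampling rate (Hz); win_s/step_s: window and step length in seconds.
    Returns an array of shape (n_windows, window_samples).
    """
    win = int(fs * win_s)
    step = int(fs * step_s)
    n = (len(signal) - win) // step + 1
    return np.stack([signal[i * step : i * step + win] for i in range(n)])

# Hypothetical 60 s EDA-like recording sampled at 4 Hz (synthetic data).
fs = 4
eda = np.sin(np.linspace(0, 6 * np.pi, 60 * fs))

# 10 s windows with a 5 s step (50% overlap).
windows = sliding_windows(eda, fs=fs, win_s=10, step_s=5)
print(windows.shape)  # (11, 40): 11 windows of 40 samples each
```

Each row of `windows` can then be summarised with per-window statistics (mean, variance, peak counts) to form the feature vectors typically fed to stress or emotion classifiers.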