
Table 1 Cognitive Behavior Analysis Datasets and Data Acquisition Methods

From: Machine learning for cognitive behavioral analysis: datasets, methods, paradigms, and research directions

Dataset Name

Data Modalities

Dataset Type

Cognitive Behavior Analysis Task

Description

Miami University Deception Detection Database (MU3D) [15]

Videos

Multimodal

Lie/Deception Detection

The dataset comprises 320 videos featuring individuals providing both truthful and deceptive statements. It is a multimodal dataset encompassing both audio and video modalities [15]

Silesian Deception Database [16]

Videos

Multimodal

Lie/Deception Detection

The dataset consists of 101 high-speed camera recordings of subjects captured at a resolution of 640×480 and a frame rate of 100 frames per second. Within the database, over 1.1 million coded frames serve as the ground truth for detecting deception cues on the subjects' faces during truthful and deceptive statements

ReLiDDB (ReGIM-Lab Lie Detection Database) [17]

Speech Signals

Unimodal

Lie/Deception Detection

The dataset consists of recordings of false and true declarations captured in various indoor and outdoor settings. It includes speech signals from 40 subjects who were presented with hypothetical scenarios for preliminary investigation. Approximately 37% of the samples are false declarations and 68% are true declarations

Deception Detection and Physiological Monitoring (DDPM) [58]

Thermal video frames, text (annotations), audio, and pulse oximeter data for 70 subjects over 13 h

Multimodal

Lie/Deception Detection

The database encompasses approximately 13 h of recordings from 70 subjects, comprising over 8 million video frames captured in the visible-light, near-infrared, and thermal spectra, along with relevant metadata, audio data, and pulse oximeter data [58]. The interviewees' RGB, near-infrared, and long-wave infrared recordings, cardiac pulse, blood oxygenation, and audio were collected and annotated for further analysis [58]. The scenario is one in which the interviewee tries to trick the interviewer by giving certain answers

SJTU Emotion EEG data set (SEED) [21]

EEG Signals

Unimodal

Stress/Emotion Detection

The SEED dataset, or SJTU Emotion EEG Dataset [21], consists of three-class emotional EEG data obtained from 15 individuals. During data collection, participants were exposed to emotional film clips representing positive, negative, and neutral emotions [21]

EEG data set for genuine & acted emotional expressions [52]

EEG Signals

Unimodal

Stress/Emotion Detection

The dataset supports classifying emotions into genuine, neutral, or simulated categories. During data collection, participants wore an EEG headset while photos or movie clips were displayed on a computer monitor [52]. The participants' emotions fluctuated in response to the visual stimuli, and these fluctuations are reflected in the captured EEG data [52]

A Database for Emotion Analysis using Physiological Signals (DEAP) [20]

EEG signals, peripheral physiological signals, and multimedia content analysis

Multimodal

Stress/Emotion Detection

In this dataset, the EEG and peripheral physiological signals of 32 individuals were recorded while they watched 40 one-minute-long music video excerpts. Participants rated each video for arousal, valence, liking, dominance, and familiarity. Frontal face video was also recorded for 22 of the 32 participants [20]

SWELL_KW dataset [22, 46]

Computer logs, facial expressions from video recordings, body postures, and HRV

Multimodal

Stress/Emotion Detection

This dataset contains readings from 25 participants who were subjected to neutral, interruption, and pressure conditions for 3 h each [22]. The data collected comprise computer logs, facial expressions from video recordings, body postures captured with a Kinect 3D sensor, ECG readings, and skin conductance levels from body sensors

Physical Activity and Stress (PASS) dataset [23]

ECG, EDA, respiration, temperature

Multimodal

Stress/Emotion Detection

This dataset documents the experimental procedure employed and provides descriptive statistics of the participants' neurophysiological signals captured under various circumstances. Participants were asked to perform tasks of varying stress levels

Continuous stress detection on nurses in a hospital [24]

EDA, ECG, accelerometer data, temperature

Multimodal

Stress/Emotion Detection

This dataset provides physiological stress indicators for nurses working in real hospital environments during the COVID-19 pandemic. It was created primarily for research on stress in the workplace and was collected using data streams from Empatica E4 devices [24]. Physiological data were monitored continuously, and nurses periodically filled out surveys on the factors that contributed to their stress

PURE [25]

Video, pulse rate, SpO2 readings

Multimodal

Stress/Emotion Detection

Ten subjects were asked to perform different head motions. This benchmark dataset was introduced to examine how much the head moves during the measurement [25]

COFACE [26]

Videos, physiological signals (contact photoplethysmography and respiration)

Multimodal

Stress/Emotion Detection

This dataset includes 160 videos and physiological data collected from 40 healthy adults over several days under realistic conditions. The group was composed of 70% men and 30% women. Participants were recorded for one minute using a standard webcam, while their physiological data were recorded using a blood-volume pulse sensor and a respiration belt [26]

MAUS [27]

ECG, PPG, GSR signals

Multimodal

Stress/Emotion Detection

This dataset includes data collected from 22 healthy graduate students who were given guidelines and signed a consent form before the test [27]. The participants' ages averaged 23 years with a standard deviation of 1.7 years. Mental workload was assessed using wearable sensors

VIPL-HR [28]

Visible light videos, near-infrared videos

Multimodal

Stress/Emotion Detection

This dataset consists of 2,378 visible light (VIS) videos and 752 near-infrared (NIR) videos capturing 107 subjects. The VIPL-HR database encompasses diverse variations, including head movements, illumination variations, and changes in acquisition devices [28]. Remote heart rate estimation is performed from face videos

MuSe (Multimodal Sentiment Analysis) – Stress (MuSe-Stress) dataset [29]

Text, audio, video, and physiological data like skin temperature, skin conductance, breathing rate, and heart rate

Multimodal

Stress/Emotion Detection

This dataset consists of recordings of 28 college students from the University of Michigan (nine females and 19 males) captured in two sessions: one during which an external stressor (the University of Michigan's final exam period) was present, and one during which the stressor was absent. Each recording lasts about 45 min in total, and each individual is exposed to various emotional stimuli, including brief films and questions that evoke strong emotions [29]. Three separate datasets are used to analyse sentiment and to detect emotions and humour

Koko website [30]

Text (corpus)

Unimodal

Abnormal Behavior Detection

The corpus contained 500,000 written posts and was annotated into three classes: thinking errors, emotions, and situations [30]

Abnormal crowd behaviour dataset [31]

Video

Multimodal

Abnormal Behavior Detection

This dataset is a collection of normal and abnormal crowd recordings. The collection consists of videos of 11 different escape-event scenarios shot in 3 indoor and outdoor settings. Each video starts with a segment of regular behaviour and concludes with segments of deviant conduct [31]. Computer vision methods are applied to the videos of pedestrians collected in crowded areas

Wearable Stress and Affect Detection (WESAD) dataset [32]

ACC, BVP, ECG, EMG, EDA, RESP, TEMP

Multimodal

Abnormal Behavior Detection

This dataset is a publicly available collection of physiological data from 15 individuals recorded during a lab experiment using chest- and wrist-worn devices. The data were collected under five conditions: baseline, amusement, stress, meditation, and recovery [32]

Multimodal Analysis of Human Nonverbal Behavior in Conversations – Human–Computer Interaction (MAHNOB-HCI) dataset [33]

Audio signals, face videos, eye gaze data, and peripheral/central nervous system physiological signals

Multimodal

Abnormal Behavior Detection

This dataset comprises multimodal data, including face videos, audio signals, eye gaze data, and physiological signals from the peripheral and central nervous systems. Two experiments were conducted with 27 participants of diverse cultural backgrounds and genders. In the first experiment, participants watched 20 emotional videos and self-reported their feelings using specific emotional keywords. The second experiment involved showing short films and photos with and without tags, and participants rated their agreement or disagreement with the displayed tags. The captured videos and the corresponding physical reactions were segmented and stored in a database [33]

Bio-reactions and faces for emotion-based personalisation (BIRAFFE) dataset [34]

ECG, GSR, facial expression signals, and hand movements through the accelerometer and gyroscope

Multimodal

Abnormal Behavior Detection

Individuals were subjected to audio–video stimuli and a three-level emotion-evoking game. The full BIRAFFE dataset consists of data gathered from 201 of the 206 participants; some data were not properly collected for certain participants owing to, for example, application crashes, lost Bluetooth signal, and poor electrode contact. In the end, complete data are available for 141 subjects [34]

MuSe (Multimodal Sentiment Analysis)-Physio dataset [35]

EDA, GSR, audio, video, heart rate, respiration

Multimodal

Abnormal Behavior Detection

In this database, human annotations were used to predict psycho-physiological responses. The 69 participants (49 of them female) are aged between 18 and 39 years and provide about 6 h of data for the MuSe-Stress and MuSe-Physio sub-challenges. Besides audio, video, and text, the ECG, RESP, and BPM signals can optionally be utilised
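Many of the stress- and emotion-detection corpora in this table (e.g., SWELL_KW, PASS, the nurse-stress dataset, MAUS, and WESAD) provide continuous wearable signals such as EDA, ECG, or PPG alongside condition labels, and a common first processing step before applying machine learning is to segment those streams into fixed-length windows and compute simple per-window features. The sketch below illustrates only that generic step; the 4 Hz sampling rate, the 60 s windows, the synthetic EDA stream, and the sliding_windows helper are illustrative assumptions and do not correspond to any listed dataset's actual file format or API.

```python
import numpy as np

def sliding_windows(signal, fs, win_s=60.0, step_s=30.0):
    """Split a 1-D physiological signal sampled at fs Hz into
    overlapping windows of win_s seconds with a step_s-second stride."""
    win = int(win_s * fs)
    step = int(step_s * fs)
    return np.stack([signal[i:i + win]
                     for i in range(0, len(signal) - win + 1, step)])

# Synthetic stand-in for a wrist-worn EDA stream (4 Hz, 10 minutes).
# Real corpora such as WESAD or the nurse-stress dataset ship their own formats.
fs = 4
eda = np.random.default_rng(0).normal(loc=2.0, scale=0.3, size=fs * 600)
windows = sliding_windows(eda, fs)

# Simple per-window statistics often used as a baseline feature set
# for window-level stress/affect classification.
features = np.column_stack([windows.mean(axis=1),
                            windows.std(axis=1),
                            windows.max(axis=1) - windows.min(axis=1)])
print(windows.shape, features.shape)  # (19, 240) (19, 3)
```

In practice, the window and stride lengths are chosen to match each dataset's protocol (for example, the duration of a labelled condition block), and the resulting per-window features are fed to a conventional classifier.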