Emotion elicitation stimuli
Emotion elicitation in the subject is required to obtain a high-quality database for building an emotion classification model [6, 43]. To obtain a good emotion dataset, it is important that the emotion is elicited naturally, since EEG reflects central (and associated autonomic) nervous system activity. Many protocols for eliciting emotions in subjects have been proposed using various types of stimuli, the most common being audiovisual. Images have also been used: the international affective picture system (IAPS) [44, 45] is a well-known project that developed an international framework for emotion elicitation based on static pictures. The link between music and emotions has likewise inspired the use of music and audio stimuli to elicit emotions [40]. Memory recall [43] and multimodal approaches [6] are other strategies that have been used.
The purpose of an emotion elicitation technique is to stimulate the desired emotion in the subject while minimizing the possibility of stimulating multiple emotions at once. A study by Gross and Levenson showed that psychologically characterized film clips achieve better outcomes owing to their dynamic profile [46].
Data used
In this review article, results from various entropy approaches have been reviewed. Different researchers used different emotion databases according to the needs of their work and the suitability of the data for obtaining better results.
Two benchmark databases are widely used: the dataset for emotion analysis using physiological signals (DEAP) and the SJTU Emotion EEG Dataset (SEED). The other datasets were recorded in a similar manner but are not publicly available.
DEAP dataset
The dataset for emotion analysis using physiological signals (DEAP) is a multimodal dataset designed for the study of human affective states by Queen Mary University of London. It was recorded with a 32-channel BioSemi acquisition system and contains the electroencephalogram (EEG) and peripheral physiological signals of 32 participants, each of whom viewed 40 one-minute music video excerpts. The subjects rated each video for arousal, valence, like/dislike, dominance, and familiarity. Frontal face video was also recorded for 22 of the 32 subjects. A unique stimulus selection approach was used, combining affective tag retrieval from the last.fm website, video highlight detection, and an online assessment tool. The dataset is freely accessible, allowing other researchers to use it to validate their methods of estimating affective state. It was first presented in a paper by Koelstra et al. [46]. For further details on the DEAP database, see https://www.eecs.qmul.ac.uk/mmv/datasets/deap/index.html.
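As a minimal illustrative sketch (not part of the cited works), the snippet below shows how one subject of the preprocessed Python release of DEAP is typically read; the file name, field names, and array shapes are assumptions based on the dataset documentation and should be checked against the downloaded files.

```python
# Sketch: reading one subject from the preprocessed (Python) DEAP release.
import pickle

import numpy as np

with open("s01.dat", "rb") as f:
    subject = pickle.load(f, encoding="latin1")   # dict with 'data' and 'labels'

eeg = subject["data"]       # assumed shape: (40 trials, 40 channels, 8064 samples @ 128 Hz)
labels = subject["labels"]  # assumed shape: (40 trials, 4): valence, arousal, dominance, liking

# The first 32 channels are EEG; the remaining 8 are peripheral signals.
eeg_only = eeg[:, :32, :]
print(eeg_only.shape, labels.shape)
```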
SEED dataset
The SEED dataset includes 62-channel EEG recordings from 15 subjects (7 males and 8 females, 23.27 \(\pm\) 2.37 years), acquired according to the international 10–20 system. The subjects' emotions were elicited by 15 film clips, each about 4 min long. The dataset covers three emotion categories (positive, neutral, negative), with five film clips assigned to each category. Each participant took part in three sessions, separated by an interval of one week or longer [47, 48]. For further details on the SEED database, see http://bcmi.sjtu.edu.cn/home/seed/seed.html.
Entropy feature extraction
To build an emotion recognition system, features must be extracted that best describe the behavior (static or dynamic) of brain electrical activity during different emotional states. This article assesses how distinct kinds of emotions are characterized by different entropy features.
Entropy is a dynamic feature that describes the chaotic nature of a system and evaluates the amount of information it carries, which can be employed to isolate the necessary information from interfering data [49]. The greater the value of entropy, the greater the irregularity of the system. This section provides a concise description of the various entropies used as features to classify different emotions.
Sample entropy
Sample entropy (SampEn) quantifies a physiological signal’s complexity irrespective of the signal length and is therefore straightforward to implement. Conceptually, sample entropy is based on the conditional probability that two sequences of length ‘n + 1’ randomly selected from a signal will match, given that they match for the first ‘n’ elements of the sequences. Here ‘match’ means that the distance between the sequences is less than some criterion ‘k’, usually 20% of the standard deviation of the data sequence under consideration. Distance is measured in a vector sense. Defining ‘k’ as a fraction of the standard deviation eliminates the dependence of SampEn on signal amplitude. The conditional probability is estimated as the ratio of the unconditional probabilities of the sequences matching for lengths ‘n + 1’ and ‘n’, and SampEn is calculated as the negative logarithm of this conditional probability [49]. Thus, SampEn is defined as:
$${\text{Sample}}\;{\text{Entropy}} = - \log \left( {\frac{{A^{n} \left( k \right)}}{{B^{n} \left( k \right)}}} \right),$$
(1)
where \(B^{n} (k)\) is the estimated probability that two sequences match for n points, and \(A^{n} \left( k \right)\) is the estimated probability that the sequences match for n + 1 points. \(A^{n} \left( k \right)\) and \(B^{n} (k)\) are evaluated from data using a relative frequency approach.
Sample entropy is comparatively reliable and reduces the bias of approximate entropy [31]. A greater value of estimated sample entropy suggests that the signal is highly unpredictable, and a lower value suggests that the signal is predictable. The value of ‘n’ has to be chosen by the researcher and differs from work to work.
SampEn’s advantage is that it can distinguish a number of systems from one another. It agrees much better than approximate entropy with theoretical values for random numbers. Self-matches are not included in this entropy, so the bias decreases. The entropy estimates are relatively stable across various sample lengths. However, a lack of consistency for sparse data is the key downside of SampEn.
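A minimal NumPy sketch of Eq. (1) is shown below; the template length n = 2 and the tolerance of 0.2 times the standard deviation are illustrative defaults, not values prescribed by the cited works.

```python
import numpy as np

def sample_entropy(x, n=2, k_frac=0.2):
    """Sample entropy of a 1-D signal (Eq. 1): -log(A^n(k) / B^n(k))."""
    x = np.asarray(x, dtype=float)
    k = k_frac * np.std(x)                       # tolerance as a fraction of SD

    def count_matches(m):
        # All overlapping templates of length m; count pairs whose Chebyshev
        # distance is below the tolerance k (self-matches excluded).
        templates = np.array([x[i:i + m] for i in range(len(x) - m)])
        dists = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=2)
        np.fill_diagonal(dists, np.inf)          # exclude self-matches
        return np.sum(dists < k)

    B = count_matches(n)
    A = count_matches(n + 1)
    return -np.log(A / B) if A > 0 and B > 0 else np.inf

# Example on a noisy sine wave
rng = np.random.default_rng(0)
sig = np.sin(np.linspace(0, 20 * np.pi, 1000)) + 0.1 * rng.standard_normal(1000)
print(sample_entropy(sig))
```

The pairwise-distance matrix makes the memory cost quadratic in the signal length, so this form is best suited to short segments.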
Dynamic sample entropy
As the name suggests, dynamic sample entropy (DySampEn) is the dynamic extension of sample entropy applied to EEG signals. DySampEn is often viewed as a dynamic feature of EEG signals that can track the temporal dynamics of the emotional state over time. It is determined by calculating SampEn from the EEG signal over a set of consecutive sliding time windows [29]. Employing sliding time windows with window length \(t_{w}\) and moving (step) length ∆t, DySampEn can be expressed as:
$${\text{DySampEn}}\left( {n,k} \right) _{\left\langle l \right\rangle } = {\text{SampEn}}\left( {n,k} \right)_{\left\langle l \right\rangle } , \quad 1 \le l \le w,$$
(2)
where the subscript \(\left\langle l \right\rangle\) indexes the sliding time windows (l = 1, 2, 3,…, \(w\)), \(w = \left[ {\frac{{T - t_{w} }}{\Delta t}} \right] + 1\) is the number of sliding time windows, T is the total length of the EEG signal, and \(\left[ \cdot \right]\) is the floor function, which rounds \(\frac{{T - t_{w} }}{\Delta t}\) down to the largest integer not exceeding it.
The time window length \(t_{w}\) and moving window length ∆t have to be chosen according to the needs of the research. Since EEG has a temporal profile, the benefit of this entropy is that it can provide emotionally relevant signal patterns that are more accurate for emotion classification than those obtained from plain sample entropy [18, 29].
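A minimal sketch of Eq. (2) follows; it reuses the sample_entropy() function from the previous section, and the window length and step (in samples) are illustrative choices.

```python
import numpy as np

def dynamic_sample_entropy(x, t_w=512, dt=128, n=2, k_frac=0.2):
    """SampEn recomputed over sliding windows of length t_w with step dt (Eq. 2)."""
    x = np.asarray(x, dtype=float)
    w = (len(x) - t_w) // dt + 1          # number of sliding windows (floor)
    return np.array([
        sample_entropy(x[l * dt: l * dt + t_w], n=n, k_frac=k_frac)
        for l in range(w)
    ])

# Example: entropy profile of one channel sampled at 128 Hz (4-s windows, 1-s step)
# profile = dynamic_sample_entropy(eeg_channel, t_w=512, dt=128)
```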
Differential entropy
Differential entropy (DE) is the entropy of a continuous random variable and measures its uncertainty. It is also related to the minimum description length. Its mathematical formulation may be written as
$$h\left( X \right) = - \int_{X} f\left( x \right)\log \left[ {f\left( x \right)} \right]{\text{d}}x,$$
(3)
where X is a random variable and \(f\left( x \right)\) is its probability density function. For a time series X obeying the Gaussian distribution \(N\left( {\mu ,\sigma^{2} } \right)\), its differential entropy can be expressed as [50]:
$$\begin{aligned} h\left( X \right) & = - \int_{ - \infty }^{\infty } \frac{1}{{\sqrt {2\pi \sigma^{2} } }}e^{{ - \frac{{\left( {x - \mu } \right)^{2} }}{{2\sigma^{2} }}}} \log \left[ {\frac{1}{{\sqrt {2\pi \sigma^{2} } }}e^{{ - \frac{{\left( {x - \mu } \right)^{2} }}{{2\sigma^{2} }}}} } \right]{\text{d}}x \\ & = \frac{1}{2}\log \left( {2\pi e\sigma^{2} } \right). \\ \end{aligned}$$
(4)
The disadvantage is that estimating this entropy is quite difficult in practice, as it requires estimating the density of X, which is recognized to be both theoretically difficult and computationally demanding [3, 51].
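Under the Gaussian assumption of Eq. (4), DE of a band-pass-filtered EEG segment reduces to a function of its variance. The sketch below illustrates this; the band edges, filter order, and sampling rate are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def differential_entropy(x):
    """Closed-form DE for a Gaussian-distributed signal (Eq. 4)."""
    return 0.5 * np.log(2 * np.pi * np.e * np.var(x))

def band_de(x, fs, low, high, order=4):
    """DE of a band-pass-filtered copy of the signal."""
    b, a = butter(order, [low / (fs / 2), high / (fs / 2)], btype="band")
    return differential_entropy(filtfilt(b, a, x))

# Example: DE of the alpha band (8-13 Hz) of one channel sampled at 128 Hz
# de_alpha = band_de(eeg_channel, fs=128, low=8, high=13)
```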
Power spectral entropy
Power spectral entropy (PSE) is a standardized (normalized) form of Shannon’s entropy. It uses the amplitude components of the power spectrum of the time series to evaluate entropy from the data [3, 50], i.e., it measures the spectral complexity of an EEG signal and is therefore also regarded as a frequency-domain information entropy [51].
Mathematically it is given by
$${\text{PSE}} = - \mathop \sum \limits_{f} P_{f} \log P_{f} ,$$
(5)
where \(P_{f}\) is the normalized power spectral density at frequency f.
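An illustrative sketch of Eq. (5) is given below; the use of Welch's method and the segment length are assumptions made for the example, not prescriptions from the cited works.

```python
import numpy as np
from scipy.signal import welch

def power_spectral_entropy(x, fs, nperseg=256):
    """Shannon entropy of the normalized power spectral density (Eq. 5)."""
    _, psd = welch(x, fs=fs, nperseg=nperseg)
    p = psd / np.sum(psd)                 # normalize the PSD to a distribution
    p = p[p > 0]                          # avoid log(0)
    return -np.sum(p * np.log(p))

# pse = power_spectral_entropy(eeg_channel, fs=128)
```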
Shannon’s entropy (ShEn) is a function of a set of related variables that varies linearly with the logarithm of the number of possibilities. It is also a measure of data spread and is widely used to determine the dynamic order of a system.
Shannon’s entropy is based on the additivity law for composite systems: if a system is divided into two statistically independent subsystems A and B, then according to the additivity law
$$S\left( {A, \, B} \right) = S\left( A \right) + S\left( B \right),$$
where S(A, B) is the total entropy of the system and S(A) and S(B) are the entropies of subsystems A and B, respectively. Shannon’s entropy therefore successfully addresses extensive (additive) systems involving short-ranged effective microscopic interactions. Physically, “dividing the total system into subsystems” implies that the subsystems are spatially separated in such a way that there is no residual interaction or correlation. If the system is governed by a long-range interaction, statistical independence can never be realized by any spatial separation, since the influence of the interaction persists at all distances and correlations therefore always exist for such systems. This explains the disadvantage of ShEn: it fails for non-extensive (non-additive) systems governed by long-range interactions. It also overestimates the entropy level when a larger number of domains are considered, and it does not capture the temporal relationship between values extracted from a time series signal [18].
Wavelet entropy
One of the quantitative measures in the study of brain dynamics is wavelet entropy (WE). It quantifies the degree of disorder associated with a multi-frequency signal response [52].
It is obtained as
$${\text{WE}} = - \mathop \sum \limits_{i < 0} P_{i} \ln P_{i} ,$$
(6)
where \(P_{i}\) is the probability (relative wavelet energy) of the time series signal at resolution level \(i\) [18].
It is used to recognize episodic behavior in EEG signals and provides better results for time-varying EEG [53]. The benefit of wavelet entropy is that it efficiently detects subtle variations in a dynamic signal. It requires less computation time, noise can be eliminated easily, and its performance is largely parameter-independent [18].
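A minimal sketch of Eq. (6) follows, in which the relative wavelet energies across decomposition levels are treated as a probability distribution; the PyWavelets package, the 'db4' mother wavelet, and the number of levels are illustrative assumptions.

```python
import numpy as np
import pywt

def wavelet_entropy(x, wavelet="db4", level=5):
    """Shannon entropy of the relative wavelet energies per level (Eq. 6)."""
    coeffs = pywt.wavedec(x, wavelet, level=level)        # one array per level
    energies = np.array([np.sum(c ** 2) for c in coeffs])
    p = energies / np.sum(energies)                        # relative wavelet energy
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# we = wavelet_entropy(eeg_channel)
```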
EMD approximate entropy
Approximate entropy (ApEn) is a ‘regularity statistic’ measuring the randomness of the fluctuations in a given data set. Generally, one may assume that the presence of repeated fluctuation patterns in a data set makes it somewhat less complex than a data set without many repetitive patterns. Approximate entropy quantifies the likelihood that similar patterns of observations are not followed by further similar observations. A data set with many recurring motifs/patterns has a notably lower ApEn, whereas a more complex, i.e., less predictable, data set has a higher ApEn [54]. The ApEn estimation algorithm is described in many papers [33, 54,55,56,57]. Mathematically it can be calculated as
$${\text{ApEn}}\; \left( {n,k,N} \right) = \ln \left( {\frac{{C_{n} \left( k \right)}}{{C_{n + 1} \left( k \right)}}} \right),$$
(7)
where \(C_{n} \left( k \right)\) and \(C_{n + 1} \left( k \right)\) are the mean pattern counts for patterns of length \(n\) and \(n + 1\), respectively. ApEn is robust to noise and requires relatively little data. It detects changes in a series and compares the similarity of samples through the pattern length \(n\) and the similarity coefficient \(k\). The appropriate selection of the parameters ‘n’ (subseries length), k (similarity tolerance/coefficient), and N (data length) is critical, and they are chosen according to the needs of the research. Traditionally, for some clinical datasets, ‘n’ is set at 2 or 3, ‘k’ is set between 0.1 and 0.25 times the standard deviation of the time series under consideration (to eliminate the dependence of the entropy on the signal’s amplitude), and N is taken as equal to or greater than 1000. However, these values do not always produce optimal results for all types of data. The cited work presents a method that employs empirical mode decomposition (EMD) to decompose the EEG data and then calculates the ApEn of the decomposed components, and is therefore called E-ApEn. EMD is a time–frequency analysis method that decomposes nonlinear signals into oscillations at various frequencies.
The advantages of EMD-ApEn are that it can be computed for shorter datasets with high interference and that it effectively distinguishes various systems based on their level of periodicity and chaos [49, 54]. The disadvantages are that it depends strongly on the length of the input signal [58], that meaningful interpretation of the entropy is compromised by significant noise, and that, because of the length dependence, it is a biased statistic [18, 59].
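The sketch below illustrates the general EMD-ApEn idea: decompose the signal into intrinsic mode functions and compute ApEn on each. It follows the standard phi-difference formulation of ApEn rather than the exact notation of Eq. (7); the PyEMD package and the parameter choices (n = 2, k = 0.2·SD) are assumptions, not the cited method itself.

```python
import numpy as np
from PyEMD import EMD   # assumed third-party package providing empirical mode decomposition

def approximate_entropy(x, n=2, k_frac=0.2):
    """ApEn via the standard phi(n) - phi(n+1) formulation (O(N^2) memory)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    k = k_frac * np.std(x)

    def phi(m):
        templates = np.array([x[i:i + m] for i in range(N - m + 1)])
        dists = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=2)
        # Self-matches are counted in ApEn, which is the source of its bias.
        counts = np.mean(dists <= k, axis=1)
        return np.mean(np.log(counts))

    return phi(n) - phi(n + 1)

def emd_apen(x, **kwargs):
    """ApEn of each intrinsic mode function obtained by EMD."""
    imfs = EMD()(np.asarray(x, dtype=float))
    return [approximate_entropy(imf, **kwargs) for imf in imfs]
```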
Kolmogorov–Sinai entropy
The volatility of a data signal over time is assessed using the entropy defined by Kolmogorov and Sinai, known for short as KS entropy. It is determined by identifying pairs of points on the phase-space trajectory that are close to each other but not correlated in time. The divergence rate of these point pairs yields the value of KSE [60], calculated as
$${\text{KSE}} = \mathop {\lim }\limits_{r \to 0} \mathop {\lim }\limits_{m \to \infty } \frac{1}{\tau }\ln \frac{{C_{m} \left( {r, N_{m} } \right)}}{{C_{m + 1} \left( {r, N_{m + 1} } \right)}},$$
(8)
where \(C_{m} \left( {r, N_{m} } \right)\) is the correlation function, which gives the probability of two points being closer to each other than r. A higher KSE value signifies higher unpredictability; as a consequence, KSE does not give accurate results for signals containing even slight noise.
Its advantage is that it effectively differentiates between periodic and chaotic systems [61, 62]. Its main limitation is that it decays toward zero as the signal length increases [63].
Permutation entropy
The intricacy of brain activity can also be measured using symbolic dynamics theory, in which a data set is mapped to a symbolic sequence from which the permutation entropy (PE) is computed. The highest value of PE is 1, signifying that the data series is completely unpredictable, whereas the lowest value of PE is 0, signifying that the data series is entirely predictable [4, 76]. At higher frequencies, permutation entropy increases with the irregularity of the data series, whereas at lower frequencies the permutations associated with regular oscillations occur only seldom.
Mathematically PE is described as
$${\text{PE}} = - \mathop \sum \limits_{i = 1}^{n} P_{i} \log_{2} P_{i} ,$$
(9)
where \(P_{i}\) represents the relative frequency of the possible ordinal (permutation) patterns and \(n\) is the permutation order, with \(n \ge 2\) [63,64,65].
Permutation entropy is a complexity measure for chaotic and non-stationary time series in the presence of dynamical noise. The algorithm is reliable, effective, and yields results quickly regardless of the noise level in the data [64, 65]. It can therefore be used to process huge data sets without preprocessing or fine-tuning of complexity parameters [13]. Its advantages are that it is simple, robust, and has low computational complexity. It is applicable to real and noisy data [66], requires no model assumptions, and is suitable for the analysis of nonlinear processes [67]. Its main limitation is its inability to include all ordinal patterns (permutations) of order ‘n’ when ‘n’ is assigned a larger value for a finite input time series [18, 68].
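A minimal sketch of Eq. (9) is shown below; the ordinal patterns are counted from a delay embedding and the result is normalized by log2(m!) so that it lies in [0, 1], matching the bounds described above. The order m, the delay, and the normalization choice are illustrative assumptions.

```python
from collections import Counter
from math import factorial

import numpy as np

def permutation_entropy(x, m=3, delay=1, normalize=True):
    """Shannon entropy of ordinal-pattern frequencies (Eq. 9), optionally normalized."""
    x = np.asarray(x, dtype=float)
    patterns = [
        tuple(np.argsort(x[i:i + m * delay:delay]))   # rank order of each window
        for i in range(len(x) - (m - 1) * delay)
    ]
    counts = np.array(list(Counter(patterns).values()), dtype=float)
    p = counts / counts.sum()
    pe = -np.sum(p * np.log2(p))
    return pe / np.log2(factorial(m)) if normalize else pe

# pe = permutation_entropy(eeg_channel, m=3)
```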
Singular spectrum entropy
The entropy calculated from singular spectrum analysis (SSA) components is known as singular spectrum entropy. SSA is an important signal decomposition method based on principal component analysis, which decomposes the original time series into the sum of a small number of interpretable components. SSA usually involves two complementary stages: decomposition and reconstruction. The decomposition stage consists of two steps, embedding and singular value decomposition (SVD); the reconstruction stage also consists of two steps, grouping and diagonal averaging [69]. The singular spectrum entropy function represents the instability of the energy distribution and is a predictor of event-related desynchronization (ERD) and event-related synchronization (ERS) [70].
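A minimal sketch of this idea follows: the signal is embedded into a trajectory matrix, its singular values are computed (the SSA decomposition step), and Shannon's entropy is taken over the normalized singular spectrum. The embedding length L and the normalization of the singular values themselves (rather than their squared energies) are illustrative assumptions.

```python
import numpy as np

def singular_spectrum_entropy(x, L=20):
    """Shannon entropy of the normalized singular spectrum of the trajectory matrix."""
    x = np.asarray(x, dtype=float)
    K = len(x) - L + 1
    traj = np.array([x[i:i + L] for i in range(K)]).T   # L x K trajectory matrix
    s = np.linalg.svd(traj, compute_uv=False)            # singular values
    p = s / np.sum(s)                                    # normalized singular spectrum
    p = p[p > 0]
    return -np.sum(p * np.log(p))

# sse = singular_spectrum_entropy(eeg_channel, L=20)
```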
Multiscale fuzzy entropy
Measures of fuzziness are known as fuzzy information measures, and the measure of the quantity of fuzzy information gained from a fuzzy set or fuzzy system is known as fuzzy entropy. Unlike the classical entropies, no probabilistic concept is needed to define fuzzy entropy. This is because fuzzy entropy captures vagueness and ambiguity uncertainties, while Shannon entropy captures randomness (probabilistic) uncertainty.
Multiscale fuzzy entropy extracts multiple scales of the original time series with a coarse-graining method and then calculates the entropy of each scale separately. Assuming an EEG signal with ‘N-point’ samples is reconstructed to obtain a set of ‘m-dimensional’ vectors, with \(n\) and \(r\) the gradient and width of the fuzzy membership function and \(D_{ij}^{m}\) the similarity degree of two vectors (fuzzy membership matrix), the final expression for fuzzy entropy is
$${\text{Fuzzy}}\;{\text{entropy}} \left( {m,n,r} \right) = \mathop {\lim }\limits_{N \to \infty } \left[ {\ln \varphi^{m} \left( {n,r} \right) - \ln \varphi^{m + 1} \left( {n,r} \right) } \right].$$
(10)
For EEG signals, where the number of time series samples N is limited, it can also be defined simply as \(\left[ {\ln \varphi^{m} \left( {n,r} \right) - \ln \varphi^{m + 1} \left( {n,r} \right)} \right]\), where \(\varphi^{m} \left( {n,r} \right)\) is computed from the set of m-dimensional vectors (and \(\varphi^{m+1}\) from the corresponding (m + 1)-dimensional vectors) as:
$$\varphi^{m} \left( {n,r} \right) = \frac{1}{{\left( {N - m} \right)}}\mathop \sum \limits_{i = 1}^{N - m} \left[ {\frac{1}{N - m - 1}\mathop \sum \limits_{j = 1,j \ne i}^{N - m} D_{ij}^{m} } \right],$$
(11)
with the fuzzy membership matrix \(D_{ij}^{m} = \mu \left( {d_{ij}^{m} , n , r} \right) = \exp \left( { - (d_{ij}^{m} )^{n} /r} \right),\) where \(d_{ij}^{m}\) is the distance between the \(i\)th and \(j\)th vectors.
For the detailed mathematical formulation one can refer to [71, 72]. The advantage of this entropy is that it is insensitive to noise while being highly sensitive to changes in the information content [32, 68, 71].
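A minimal sketch of Eqs. (10)–(11) follows, together with the coarse-graining step that makes the measure multiscale; the parameter values (m = 2, n = 2, r = 0.2·SD) and the scale range are illustrative assumptions.

```python
import numpy as np

def fuzzy_entropy(x, m=2, n=2, r_frac=0.2):
    """Fuzzy entropy with exponential membership exp(-(d_ij^m)^n / r) (Eqs. 10-11)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    r = r_frac * np.std(x)

    def phi(dim):
        # Mean-centred embedding vectors of length `dim` (N - m of them, as in Eq. 11)
        vecs = np.array([x[i:i + dim] - np.mean(x[i:i + dim]) for i in range(N - m)])
        d = np.max(np.abs(vecs[:, None, :] - vecs[None, :, :]), axis=2)
        D = np.exp(-(d ** n) / r)          # fuzzy similarity degree D_ij
        np.fill_diagonal(D, 0.0)           # exclude j = i comparisons
        return np.sum(D) / ((N - m) * (N - m - 1))

    return np.log(phi(m)) - np.log(phi(m + 1))

def coarse_grain(x, scale):
    """Average consecutive non-overlapping blocks of `scale` samples."""
    x = np.asarray(x, dtype=float)
    n_pts = len(x) // scale
    return x[:n_pts * scale].reshape(n_pts, scale).mean(axis=1)

def multiscale_fuzzy_entropy(x, scales=range(1, 6), **kwargs):
    return [fuzzy_entropy(coarse_grain(x, s), **kwargs) for s in scales]
```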
Recurrence quantification analysis entropy
This is a measure of the average information contained in the distribution of line segments (line structures) in a recurrence plot. A recurrence plot is a visualization, or graph, of a square matrix built from the input time series. This is one of the state-space-trajectory-based approaches of recurrence quantification analysis (RQA). It helps to compute the number and duration of the recurrences of a chaotic system [73]. RQA evaluates the recurrence behavior of a data set and is computed to portray a time-varying input signal in terms of its complexity and randomness. Recurrence entropy helps detect chaos–chaos transitions, unstable periodic orbits, and time delays, and it extracts appropriate information from short and nonlinear input signals [18, 30].
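A minimal sketch of this measure follows: a recurrence matrix is built from a delay embedding, the lengths of its diagonal line structures are collected, and Shannon's entropy of their distribution is returned. The recurrence threshold, embedding parameters, and minimum line length are illustrative assumptions.

```python
from collections import Counter

import numpy as np

def rqa_entropy(x, m=3, delay=1, eps_frac=0.2, l_min=2):
    """Shannon entropy of the diagonal line-length distribution of a recurrence plot."""
    x = np.asarray(x, dtype=float)
    n_vec = len(x) - (m - 1) * delay
    vecs = np.array([x[i:i + m * delay:delay] for i in range(n_vec)])
    dists = np.max(np.abs(vecs[:, None, :] - vecs[None, :, :]), axis=2)
    R = dists < eps_frac * np.std(x)                   # recurrence matrix

    # Collect diagonal line lengths above the main diagonal (the plot is symmetric)
    lengths = []
    for k in range(1, n_vec):
        run = 0
        for v in np.append(np.diagonal(R, offset=k), False):   # trailing False flushes runs
            if v:
                run += 1
            else:
                if run >= l_min:
                    lengths.append(run)
                run = 0

    if not lengths:
        return 0.0
    counts = np.array(list(Counter(lengths).values()), dtype=float)
    p = counts / counts.sum()
    return -np.sum(p * np.log(p))
```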
Classification
After extracting features that appear appropriate for the emotional responses, these are used to build a classification model with the intent of recognizing specific emotions from the proposed attributes. Different classifiers have been used by various researchers for emotion recognition, such as K-nearest neighbor (KNN) [74], support vector machines (SVM) [3, 29, 32, 51, 52], the integration of a deep belief network and SVM (DBN-SVM) [75], the channel frequency convolutional neural network (CFCNN) [76], the multilayer perceptron, the time-delay neural network, the probabilistic neural network (PNN) [77], and the least-squares SVM. It is difficult to compare the different classification algorithms, as different research works employ different datasets, which differ significantly in the manner in which emotions are evoked. In general, the recognition rate is significantly greater when various physiological signals such as EEG, GSR, PPG, etc., are employed together than when a single physiological signal is used for emotion recognition [78].
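The sketch below illustrates one common end-to-end arrangement: per-channel entropy features feeding an SVM evaluated with cross-validation. It reuses sample_entropy() and the DEAP arrays (eeg_only, labels) from the earlier sketches; the choice of feature, the binary high/low valence threshold (rating > 5), and the SVM settings are illustrative assumptions and are not taken from any single cited study.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def trial_features(trial):
    # trial: (channels, samples); one entropy value per channel
    return np.array([sample_entropy(ch) for ch in trial])

# X: (n_trials, n_channels) feature matrix; y: binary high/low valence labels
# X = np.array([trial_features(t) for t in eeg_only])
# y = (labels[:, 0] > 5).astype(int)          # assumed threshold on the valence rating

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
# scores = cross_val_score(clf, X, y, cv=5)
# print(scores.mean())
```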