Fuzzy clustering-based feature extraction method for mental task classification
 Akshansh Gupta^{1} and
 Dhirendra Kumar^{2}
Received: 3 May 2016
Accepted: 22 August 2016
Published: 3 September 2016
Abstract
A brain computer interface (BCI) is a communication system by which a person can send messages or requests for basic necessities without using peripheral nerves and muscles. Response to mental task-based BCI is one of the privileged areas of investigation. Electroencephalography (EEG) signals are used to represent brain activity in the BCI domain. For any mental task classification model, the performance of the learning model depends on the features extracted from the EEG signal. In the literature, wavelet transform and empirical mode decomposition are two popular feature extraction methods used to analyze signals with nonlinear and nonstationary properties. Combining the strengths of both techniques, an adaptive filter-based method with a theoretical foundation, known as the empirical wavelet transform (EWT), has recently been proposed to decompose nonlinear and nonstationary signals. However, EWT does not work well for signals whose components overlap in the frequency and time domains, and it fails to provide good features for further classification. In this work, the fuzzy c-means algorithm is used along with EWT to handle this problem. The experimental results show that EWT combined with fuzzy clustering outperforms EWT alone for the EEG-based response to mental task problem. Further, in mental task classification, the ratio of samples to features is very small. To handle this problem, we have also used three well-known multivariate feature selection methods, viz. Bhattacharyya distance (BD), ratio of scatter matrices (SR), and linear regression (LR). The experimental results demonstrate that these methods improve the performance of mental task classification considerably. A ranking method and Friedman’s statistical test are also performed to rank and compare different combinations of feature extraction and feature selection methods, which endorse the efficacy of the proposed approach.
Keywords
1 Introduction
Brain computer interface (BCI) is a communication system by which a person can send messages or requests for basic necessities via his or her brain signals without using peripheral nerves and muscles [1]. It is one of the areas which has contributed to the development of neuron-based techniques to provide solutions for disease prediction, communication, and control [2–4]. Three acquisition modalities for capturing signals corresponding to brain activities have been discussed in the literature [5, 6], viz. invasive (microelectrode array), semi-invasive [electrocorticography (ECoG)], and noninvasive (EEG). EEG is a widely preferred technique to capture brain activity for BCI systems [7, 4] because it records brain signals in a nonsurgical manner, leading to low cost. Response to mental tasks is one of the BCI systems [8], which is found to be more pragmatic for patients with locomotor impairments. This system is based on the assumption that different mental activities lead to typical, distinguishable, and task-specific patterns of the EEG signal. The success of this BCI system depends on the classification accuracy of brain signals. Extraction of relevant and distinct features from the EEG signal associated with different mental tasks is necessary to develop an efficient classification model.
In the literature, a number of analytic approaches have been employed by the BCI community for better representation of the EEG signal, such as band power [9], amplitude values of EEG signals [10], power spectral density (PSD) [11–13], and autoregressive (AR) and adaptive autoregressive (AAR) parameters [14]. However, the primary issue with AR modeling is that the accuracy of the spectral estimate is highly dependent on the selected model order. An insufficient model order tends to blur the spectrum, whereas an overly large order may create artificial peaks in the spectrum. Moreover, the frequency spectrum of the EEG signal is observed to vary over time, indicating that the EEG signal is nonstationary. Consequently, a feature extraction method should be chosen that can model the nonstationary behavior of the signal for better representation.
The wavelet transform (WT) [15, 16] is an effective technique that can be used to analyze both the time and frequency contents of a signal. However, WT uses fixed basis mother wavelets, independent of the processed signal, which makes it non-adaptive. Another successful method for feature extraction, empirical mode decomposition (EMD) [17], represents a nonlinear and nonstationary signal in terms of modes that correspond to the underlying signal. EMD is a data-driven approach that does not use a fixed set of basis functions, but is self-adaptive according to the processed signal. It decomposes a signal into finite, well-defined, low-frequency and high-frequency components known as intrinsic mode functions (IMFs) or modes.
Due to the multichannel nature of EEG data, the dimensionality of the extracted features is very large, but the available number of samples per class is usually small in such applications. Hence, the problem suffers from the curse of dimensionality [18], which also leads to the peaking phenomenon when designing a classifier [19]. To overcome this problem, dimensionality reduction using feature selection is suggested in the literature [20].
In this paper, a two-phase approach has been used to determine a reduced set of relevant and nonredundant features to solve the above-mentioned issues. In the first phase, features in terms of eight different parameters are extracted from the decomposed EEG signal using the empirical wavelet transform (EWT) or the proposed fuzzy clustering-based EWT (FEWT). In the second phase, a multivariate filter feature selection approach is employed to select a set of relevant and nonredundant features. To investigate the performance of different combinations of the two feature extraction methods and the multivariate feature selection methods, experiments are performed on a publicly available EEG dataset [4].
The rest of the paper is organized as follows: EWT is discussed briefly in Sect. 2. The proposed feature extraction technique for mental task classification and the fuzzy c-means (FCM) algorithm are discussed in Sect. 3. Multivariate feature selection methods are covered in Sect. 4. The experimental setup and results are discussed in Sect. 5. Finally, Sect. 6 includes conclusions and future work.
2 Empirical wavelet transform
The EEG signal is nonlinear and nonstationary in nature [21]. To deal with this nature of the EEG signal, fixed basis functions based on the WT [22, 23] and the adaptive filter-based EMD method [24, 25] have been applied in the recent past. The major concern with the EMD method is its lack of mathematical theory [26]. Combining the properties of these two methods, Gilles [26] has recently proposed a new adaptive basis transform called EWT to extract the modes of an amplitude-modulated–frequency-modulated (AM–FM) signal. Building a family of adaptive (empirical) wavelets for the signal to be processed is equivalent to forming a set of bandpass filters in the Fourier spectrum. Adaptability is achieved by making the filters’ supports depend on the location of the information in the spectrum of the signal [26].
Let \(\omega\) denote the frequency, which belongs to the Fourier support \(\left[ 0,\pi \right]\), segmented into N contiguous segments. Further, \(\omega _{n}\) denotes the limit between segments (\(\omega _{0}=0\) and \(\omega _{N}=\pi\)), and \(\Lambda _{n}=\left[ \omega _{n-1},\omega _{n}\right]\) denotes a segment such that \(\bigcup _{n=1}^{N}\Lambda _{n}=\left[ 0,\pi \right]\). In the work of Gilles [26], each segment is assumed to have a transition phase of width \(2\tau _{n}\) centered around \(\omega _{n}\).
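One way to obtain the segment limits \(\omega _{n}\) is to look for local maxima of the Fourier spectrum, in the spirit of the detection step described in [26]. The following Python sketch is a simplified illustration and not the exact procedure of [26]: the function name ewt_boundaries, the choice of keeping the N largest peaks, and the placement of each limit midway between neighboring peaks are all assumptions made for this example.

```python
import numpy as np

def ewt_boundaries(signal, n_segments):
    """Return N+1 limits omega_0 = 0 < ... < omega_N = pi (radians per sample).

    Simplified sketch: keep the n_segments largest local maxima of the
    magnitude spectrum and place a limit midway between neighboring maxima.
    """
    signal = np.asarray(signal, dtype=float)
    spectrum = np.abs(np.fft.rfft(signal))
    # Normalized angular frequencies of the rfft bins, in [0, pi].
    omega = 2.0 * np.pi * np.fft.rfftfreq(signal.size)
    # Interior local maxima of the spectrum (bins 1 .. len-2).
    is_peak = (spectrum[1:-1] > spectrum[:-2]) & (spectrum[1:-1] >= spectrum[2:])
    peaks = np.flatnonzero(is_peak) + 1
    if peaks.size < n_segments:
        raise ValueError("not enough spectral maxima for the requested segments")
    # Keep the largest peaks, then restore ascending frequency order.
    keep = np.sort(peaks[np.argsort(spectrum[peaks])[-n_segments:]])
    interior = 0.5 * (omega[keep[:-1]] + omega[keep[1:]])
    return np.concatenate(([0.0], interior, [np.pi]))
```

For a signal with three dominant spectral peaks and n_segments=3, the function returns four limits, so that each segment \(\Lambda _{n}\) contains one of the peaks.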
3 Proposed feature extraction approach
Although EWT was proposed by Gilles [26] for building adaptive wavelets to represent the signal being processed, the author has noted that the method might fail to decompose the signal properly when the input is composed of more than one chirp overlapping in both the time and frequency domains, as is the case for EEG signals (due to their multichannel nature). As the performance of the classification model is highly dependent on the extracted features, features obtained from EEG signals using EWT alone are not suitable for producing an efficient classification model. Keeping this point in consideration, the well-known fuzzy clustering method has been employed in this paper. The proposed method deals with this problem of EWT by reassigning the features extracted by EWT to the most similar segment using the FCM algorithm. The resulting processed signal is then able to produce a good classification model. A brief description of FCM is given in the next subsection.
3.1 Fuzzy c-means
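As an illustration of the FCM algorithm [27], the following Python sketch implements the standard alternating updates of cluster centers and membership degrees. It is a minimal sketch under stated assumptions and not the authors' implementation: the function name fuzzy_c_means, the fuzzifier m = 2, the random initialization, and the stopping tolerance are choices made for this example.

```python
import numpy as np

def fuzzy_c_means(X, n_clusters, m=2.0, max_iter=100, tol=1e-5, seed=0):
    """Standard fuzzy c-means on the rows of X (n_samples x n_features).

    Returns (centers, U), where U[i, j] is the membership degree of
    sample i in cluster j and every row of U sums to 1.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Random initial memberships, normalized per sample.
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(max_iter):
        # Centers are membership-weighted means with weights U**m.
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        # Euclidean distance of every sample to every center.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.fmax(dist, 1e-12)  # avoid division by zero
        # Membership update: u_ij proportional to d_ij ** (-2 / (m - 1)).
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U
```

A hard assignment, if needed, can be read off as U.argmax(axis=1); the soft memberships are what allow a sample near two overlapping segments to be associated with the more similar one.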
3.2 Feature coding
The proposed approach of extracting features from the EEG signal is carried out in three steps. In the first step, the signal is decomposed into the desired number of supports (segments) through EWT. In the second step, the FCM clustering algorithm is employed to avoid overlap among the segments obtained from the first step. In the third and final step, eight statistical or uncertainty parameters (root mean square, Lempel–Ziv complexity measure [31], Shannon entropy, central frequency, maximum frequency, variance, skewness, and kurtosis) are calculated to represent each segment more compactly, as every signal has distinguishable properties in terms of a set of statistical parameters associated with it. Two signals may have the same value for one or more of these parameters, which is why a set of parameters is used rather than a single one. In this work, these eight parameters are selected empirically.
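The third step can be sketched in Python as follows. The exact definitions used for each parameter are not given here, so the sketch makes several assumptions that should be read as illustrative only: the Lempel–Ziv complexity is computed on the signal binarized about its median, the Shannon entropy is taken from a 16-bin amplitude histogram, the central frequency is the power-weighted mean frequency, and the maximum frequency is the frequency of the largest spectral peak. The function names are hypothetical.

```python
import numpy as np

def lempel_ziv_complexity(bits):
    """Number of phrases in the Lempel-Ziv (1976) parsing of a binary sequence."""
    s = "".join("1" if b else "0" for b in bits)
    i, count, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Grow the phrase while it can be copied from the preceding text.
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        count += 1
        i += length
    return count

def segment_features(x, fs, n_bins=16):
    """Eight parameters for one segment x sampled at fs Hz (non-constant x)."""
    x = np.asarray(x, dtype=float)
    mu, sigma = x.mean(), x.std()
    z = (x - mu) / sigma
    # Power spectrum of the mean-removed segment.
    power = np.abs(np.fft.rfft(x - mu)) ** 2
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    # Amplitude histogram for the entropy estimate (assumed estimator).
    counts, _ = np.histogram(x, bins=n_bins)
    p = counts[counts > 0] / counts.sum()
    return {
        "rms": np.sqrt(np.mean(x ** 2)),
        "lz_complexity": lempel_ziv_complexity(x > np.median(x)),
        "shannon_entropy": -np.sum(p * np.log2(p)),
        "central_frequency": np.sum(freqs * power) / np.sum(power),
        "max_frequency": freqs[np.argmax(power)],
        "variance": sigma ** 2,
        "skewness": np.mean(z ** 3),
        "kurtosis": np.mean(z ** 4),  # non-excess kurtosis
    }
```

Calling segment_features on each segment of each channel and concatenating the resulting values gives one feature vector per trial.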
4 Feature selection
The feature vector obtained from each channel comprises all the features constructed with the above statistical parameters. The final feature vector obtained after concatenating the features from six channels is large, i.e., each feature vector contains 144 parameters (3 EWT segments \(\times\) 8 parameters \(\times\) 6 channels). Hence, feature selection is carried out to exclude noisy, irrelevant, and redundant features.
Two major categories of feature selection methods are the filter method and the wrapper method. In the filter method, the relevance of features is determined on the basis of inherent properties such as distance, consistency, and correlation, without involving any classifier. Hence, it may not choose the most relevant feature set for the learning algorithm. Alternatively, the wrapper method [32] tends to find a relevant feature subset better suited to a given learning algorithm. However, the wrapper method is computationally more costly since the classifier needs to be trained for each feature subset separately. The filter method, on the other hand, is computationally less intensive and bias-free. Filter methods have a simple structure with a straightforward search strategy such as forward selection, backward selection, or a combination of both.
The filter approach is further classified into two categories [20]: univariate (ranking) and multivariate (feature subset). A feature ranking method uses a scoring function to measure the relevance of each feature individually. These methods are simple to compute. Several research works have used univariate filter methods in the BCI field [33–36]. It is noted that the reduced set of relevant features obtained using univariate methods significantly improves the classification accuracy. However, univariate methods ignore the correlation among the features. Hence, the selected feature subset may have high redundancy among features and may not provide high discriminatory capacity.
Among wrapper approaches [37, 38], the seminal work of Keirn and Aunon [4] used a combination of forward sequential feature selection and an exhaustive search to obtain a subset of relevant and nonredundant features for mental task classification. However, the wrapper approach is not suitable for high-dimensional data as it is computationally expensive.
On the other hand, a time-efficient multivariate filter method finds features which are relevant to the class and nonredundant among themselves. It thus overcomes the limitations of both univariate and wrapper approaches. We have therefore preferred the most widely used multivariate filter feature selection methods, namely the Bhattacharyya distance measure [39], ratio of scatter matrices [40], and LR [41], for selecting relevant and nonredundant features. A brief discussion of these techniques is given below.
4.1 Bhattacharyya distance
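For two classes modeled as multivariate Gaussians, the Bhattacharyya distance has a well-known closed form with one term driven by the difference of the class means and one by the difference of the class covariances. The following Python sketch evaluates that closed form for a candidate feature subset. It is an illustration only: the Gaussian assumption and the function name bhattacharyya_distance are choices made for this example, and how the criterion is embedded in the search over subsets is not shown.

```python
import numpy as np

def bhattacharyya_distance(X1, X2):
    """Bhattacharyya distance between two classes under a Gaussian assumption.

    X1, X2: arrays of shape (n_samples, n_features), one per class,
    restricted to the candidate feature subset.
    """
    X1 = np.asarray(X1, dtype=float)
    X2 = np.asarray(X2, dtype=float)
    mu1, mu2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = np.atleast_2d(np.cov(X1, rowvar=False))
    S2 = np.atleast_2d(np.cov(X2, rowvar=False))
    S = 0.5 * (S1 + S2)  # average class covariance
    diff = mu1 - mu2
    # Term 1: separation of the class means.
    term_mean = 0.125 * diff @ np.linalg.solve(S, diff)
    # Term 2: dissimilarity of the class covariances (log-determinants).
    _, logdet = np.linalg.slogdet(S)
    _, logdet1 = np.linalg.slogdet(S1)
    _, logdet2 = np.linalg.slogdet(S2)
    term_cov = 0.5 * (logdet - 0.5 * (logdet1 + logdet2))
    return term_mean + term_cov
```

A larger value indicates better class separability for the subset, and the distance is zero when both classes have identical means and covariances.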
4.2 Ratio of scatter matrices
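A common form of the scatter-matrix criterion [40] is the trace of the product of the inverse within-class scatter matrix and the between-class scatter matrix. The Python sketch below computes this quantity for a candidate feature subset. Whether this particular form is the one used in this work is an assumption, as is the function name scatter_ratio.

```python
import numpy as np

def scatter_ratio(X, y):
    """Separability criterion J = trace(Sw^{-1} Sb) for features X and labels y."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    n, d = X.shape
    overall_mean = X.mean(axis=0)
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        prior = Xc.shape[0] / n
        class_mean = Xc.mean(axis=0)
        centered = Xc - class_mean
        # Prior-weighted class covariance contributes to Sw.
        Sw += prior * (centered.T @ centered) / Xc.shape[0]
        # Prior-weighted outer product of the mean offset contributes to Sb.
        offset = (class_mean - overall_mean)[:, None]
        Sb += prior * (offset @ offset.T)
    return np.trace(np.linalg.solve(Sw, Sb))
```

The criterion grows when the class means move apart relative to the spread within each class, so a subset with a higher value is preferred.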
4.3 Linear regression
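Regression-based forward selection [41] can be sketched as a greedy search that, at each step, adds the feature giving the best least-squares fit of the target on the features chosen so far. The Python sketch below uses the coefficient of determination \(R^{2}\) as the fit criterion. The criterion, the function names r_squared and forward_select, and the numeric coding of the class label as the regression target are assumptions made for this illustration.

```python
import numpy as np

def r_squared(X, y):
    """Coefficient of determination of a least-squares fit of y on X (with intercept)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    residual = y - A @ coef
    ss_res = residual @ residual
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def forward_select(X, y, k):
    """Greedily pick k feature indices, each maximizing R^2 given those already chosen."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    selected = []
    remaining = list(range(X.shape[1]))
    for _ in range(k):
        best = max(remaining, key=lambda j: r_squared(X[:, selected + [j]], y))
        selected.append(best)
        remaining.remove(best)
    return selected
```

Because each candidate is scored jointly with the features already selected, a feature that merely duplicates an earlier choice adds little to the fit and is unlikely to be picked, which is how redundancy is discouraged.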
5 Experimental setup and results
5.1 Dataset
Each trial is of 10 s duration, recorded with a sampling frequency of 250 Hz, which results in 2500 sample points per trial. More details about the data can be found in the work of Keirn and Aunon [4].^{1}
5.2 Construction of feature vector and classification
For all the multivariate filter methods, the top 25 features were incrementally included one by one to develop the decision model of a support vector classifier (SVC) using 10-fold cross-validation. A Gaussian kernel was used, and grid search was used to find the optimal choice of the regularization constant C and the kernel parameter gamma.
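The evaluation loop described above can be sketched with scikit-learn as follows. This is a hedged illustration and not the authors' code: the function name evaluate_top_k, the parameter grid, the feature standardization step, and the stratified shuffling of folds are assumptions, and X_ranked is assumed to hold the features already ordered by one of the selection methods.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def evaluate_top_k(X_ranked, y, max_features=25, seed=0):
    """Cross-validated accuracy of an RBF-kernel SVC for the top 1..max_features features.

    X_ranked: (n_samples, n_features) with columns sorted from most to
    least relevant by a feature selection method.
    """
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    # Assumed search grid for the regularization constant C and kernel width gamma.
    grid = {"svc__C": [0.1, 1, 10, 100], "svc__gamma": [1e-3, 1e-2, 1e-1, 1]}
    scores = []
    for k in range(1, min(max_features, X_ranked.shape[1]) + 1):
        model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
        search = GridSearchCV(model, grid, cv=cv, scoring="accuracy")
        search.fit(X_ranked[:, :k], y)
        # Mean 10-fold accuracy of the best (C, gamma) pair for this k.
        scores.append(search.best_score_)
    return scores
```

Note that reporting the best grid-search score on the same folds is slightly optimistic; a nested cross-validation would give a less biased estimate at a higher computational cost.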
5.3 Results

The performance of the classification model improves significantly when the fuzzy clustering method is incorporated along with EWT, compared to EWT alone, with or without feature selection, for all binary combinations of mental tasks and all the mentioned subjects.

The classification accuracy of a given classifier increases drastically with the application of feature selection methods (BD, LR, and SR) compared to without feature selection (WFS), irrespective of the feature extraction method.

From Figs. 4 and 8, 100 % classification accuracy is achieved for some binary combinations of mental tasks for subject 2 and subject 7.
5.4 Ranking of various combinations of feature selection methods with proposed FEWT method
We have applied a robust ranking approach used by Gupta et al. [43] to study the relative performance of various combinations of feature selection methods with the proposed feature extraction method, i.e., FEWT, with respect to EWT. The combinations are ranked on the basis of the percentage gain in classification accuracy with respect to the maximum classification accuracy obtained using the EWT feature extraction method in combination with the various feature selection methods.
A mathematical description of this ranking procedure is as follows:
5.5 Friedman statistical test
Friedman ranking of different combinations of feature selection and extraction methods
Combination  Ranking
LR_FEWT      1
BD_FEWT      2.4
SR_FEWT      2.95
LR_EWT       4.3
WFS_FEWT     5.25
SR_EWT       5.75
BD_EWT       6.35
WFS_EWT      8
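Average ranks of this kind, and the Friedman statistic [45] built from them, can be computed as in the following Python sketch. The function names and the input layout (one row per classification problem, one column per method, higher score is better) are assumptions made for this example, and the statistic is given without a tie correction. Any scores passed in by the reader are their own; none of the values in the table above are reproduced by this code.

```python
import numpy as np

def average_ranks(scores):
    """Mean rank of each method over all problems (rank 1 = best, ties share ranks).

    scores: array of shape (n_problems, n_methods), higher is better.
    """
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape
    ranks = np.empty((n, k))
    for i, row in enumerate(scores):
        order = np.argsort(-row, kind="stable")
        r = np.empty(k)
        r[order] = np.arange(1, k + 1)
        # Tied scores receive the average of the ranks they span.
        for value in np.unique(row):
            tie = row == value
            r[tie] = r[tie].mean()
        ranks[i] = r
    return ranks.mean(axis=0)

def friedman_statistic(scores):
    """Friedman chi-square statistic (no tie correction) for the given scores."""
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape
    R = average_ranks(scores)
    return 12.0 * n / (k * (k + 1)) * (np.sum(R ** 2) - k * (k + 1) ** 2 / 4.0)
```

Under the null hypothesis that all methods perform equally, the statistic approximately follows a chi-square distribution with k - 1 degrees of freedom, where k is the number of methods compared.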
6 Conclusion and future work
An adaptive transform with a theoretical foundation, EWT, has been proposed in the recent past to analyze a signal on the basis of its content. EWT fails to handle signals which overlap in the time and frequency domains, as is the case with EEG signals from multiple channels. This work has suggested employing FCM after EWT for better representation of the EEG signal for further classification of mental tasks. It can be concluded from the experimental results that the proposed approach outperforms the original EWT technique. It is also noted that the features from multiple channels generate a large feature vector, while the available number of samples is small. In such a situation, the performance of the learning model degrades in terms of classification accuracy and learning time. To overcome this limitation, this paper has investigated and compared three well-known multivariate filter methods to determine a minimal subset of relevant and nonredundant features. The experimental findings endorse that feature selection enhances the performance of the learning model. A ranking mechanism and the Friedman statistical test have also been performed to strengthen the experimental findings.
As the use of FCM enhances the performance of the EWT technique for mental task classification, it would be worthwhile to explore other fuzzy-based clustering methods, such as those that have been explored in image segmentation [46]. It will also be interesting to explore whether FEWT would work in other types of BCI such as motor imagery and multi-mental task classification.
Declarations
Acknowledgments
Both authors express their gratitude to the Council of Scientific and Industrial Research (CSIR), India, for the financial support received in performing this research work. The first author is also thankful for the support under Grant Number BT/BI/03/004/2003(C) of the Bioinformatics Division, Department of Biotechnology, Ministry of Science and Technology, Government of India. The authors are also thankful to the anonymous reviewers for their constructive suggestions.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Authors’ Affiliations
References
 Graimann B, Allison B, Pfurtscheller G (2010) Brain–computer interfaces: a gentle introduction. Springer, Berlin, pp 1–27
 Anderson CW, Stolz EA, Shamsunder S (1998) Multivariate autoregressive models for classification of spontaneous electroencephalographic signals during mental tasks. IEEE Trans Biomed Eng 45(3):277–286
 Babiloni F, Cincotti F, Lazzarini L, Millan J, Mourino J, Varsta M, Heikkonen J, Bianchi L, Marciani M (2000) Linear classification of low-resolution EEG patterns produced by imagined hand movements. IEEE Trans Rehabil Eng 8(2):186–188
 Keirn ZA, Aunon JI (1990) A new mode of communication between man and his surroundings. IEEE Trans Biomed Eng 37(12):1209–1214
 Kübler A (2000) Brain computer communication: development of a brain computer interface for locked-in patients on the basis of the psychophysiological self-regulation training of slow cortical potentials (SCP). Schwäbische VerlagsGesellschaft, Baltimore
 Schalk G (2008) Brain-computer symbiosis. J Neural Eng 5(1):P1
 Akram F, Han HS, Kim TS (2014) A P300-based brain computer interface system for words typing. Comput Biol Med 45:118–125
 Bashashati A, Fatourechi M, Ward RK, Birch GE (2007) A survey of signal processing algorithms in brain-computer interfaces based on electrical brain signals. J Neural Eng 4(2):R32
 Pfurtscheller G, Neuper C, Flotzinger D, Pregenzer M (1997) EEG-based discrimination between imagination of right and left hand movement. Electroencephalogr Clin Neurophysiol 103(6):642–651
 Kaper M, Meinicke P, Grossekathoefer U, Lingner T, Ritter H (2004) BCI competition 2003–data set IIB: support vector machines for the P300 speller paradigm. IEEE Trans Biomed Eng 51(6):1073–1076
 Chiappa S, Donckers N, Bengio S, Vrins F (2004) HMM and IOHMM modeling of EEG rhythms for asynchronous BCI systems. ESANN 193–204
 Moore MM (2003) Real-world applications for brain-computer interface technology. IEEE Trans Neural Syst Rehabil Eng 11(2):162–165
 Palaniappan R, Paramesran R, Nishida S, Saiwaki N (2002) A new brain-computer interface design using fuzzy artmap. IEEE Trans Neural Syst Rehabil Eng 10(3):140–148
 Penny WD, Roberts SJ, Curran EA, Stokes MJ et al (2000) EEG-based communication: a pattern recognition approach. IEEE Trans Rehabil Eng 8(2):214–215
 Daubechies I et al (1992) Ten lectures on wavelets, vol 61. SIAM, Philadelphia
 Mallat SG (1989) A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans Pattern Anal Mach Intell 11(7):674–693
 Huang NE, Shen Z, Long SR, Wu MC, Shih HH, Zheng Q, Yen NC, Tung NC, Tung CC, Liu HH (1998) The empirical mode decomposition and the hilbert spectrum for nonlinear and nonstationary time series analysis. Proc R Soc Lond 454(1971):903–995
 Bellman RE (1961) Adaptive control processes: a guided tour, vol 4. Princeton University Press, Princeton
 Jain AK, Duin RPW, Mao J (2000) Statistical pattern recognition: a review. IEEE Trans Pattern Anal Mach Intell 22(1):4–37
 Guyon I, Elisseeff A (2003) An introduction to variable and feature selection. J Mach Learn Res 3:1157–1182
 Berger H (1929) Über das elektrenkephalogramm des menschen. Eur Arch Psychiatry Clin Neurosci 87(1):527–570
 Gupta A, Agrawal R, Kaur B (2012) A three phase approach for mental task classification using EEG. In: Proceedings of the International conference on advances in computing, communications and informatics. ACM, pp 898–904
 Hazarika N, Chen JZ, Tsoi AC, Sergejew A (1997) Classification of EEG signals using the wavelet transform. Signal Process 59(1):61–72
 Diez PF, Mut V, Laciar E, Torres A, Avila E (2009) Application of the empirical mode decomposition to the extraction of features from EEG signals for mental task classification. In: Annual International conference of the IEEE engineering in medicine and biology society 2009. IEEE, pp 2579–2582
 Gupta A, Agrawal R (2012) Relevant feature selection from EEG signal for mental task classification. In: Advances in knowledge discovery and data mining. Springer, Berlin, pp 431–442
 Gilles J (2013) Empirical wavelet transform. IEEE Trans Signal Process 61(16):3999–4010
 Bezdek JC, Ehrlich R, Full W (1984) FCM: the fuzzy c-means clustering algorithm. Comput Geosci 10(2–3):191–203
 Zadeh LA (1965) Fuzzy sets. Inf Control 8(3):338–353
 Nguyen HT (1978) A note on the extension principle for fuzzy sets. J Math Anal Appl 64(2):369–380
 Tiwari PK, Srivastava AK (2014) Zadeh extension principle: a note. Ann Fuzzy Math Inform 9(1):37–41
 Lempel A, Ziv J (1976) On the complexity of finite sequences. IEEE Trans Inf Theory 22(1):75–81
 Kohavi R, John GH (1997) Wrappers for feature subset selection. Artif Intell 97(1):273–324
 Guerrero-Mosquera C, Verleysen M, Vazquez AN (2010) EEG feature selection using mutual information and support vector machine: a comparative analysis. In: 2010 Annual International conference of the IEEE engineering in medicine and biology society (EMBC). IEEE, pp 4946–4949
 Koprinska I (2010) Feature selection for brain-computer interfaces. In: Washio SC, Tsumoto SI, Yamada TO, Inokuchi A (eds) New frontiers in applied data mining. Springer, Berlin, pp 106–117
 Murugappan M, Ramachandran N, Sazali Y et al (2010) Classification of human emotion from EEG using discrete wavelet transform. J Biomed Sci Eng 3(04):390
 Rodriguez-Bermudez G, Garcia-Laencina PJ, Roca-Dorda J (2013) Efficient automatic selection and combination of EEG features in least squares classifiers for motor imagery brain-computer interfaces. Int J Neural Syst 23(4):1350015
 Bhattacharyya S, Sengupta A, Chakraborti T, Konar A, Tibarewala D (2014) Automatic feature selection of motor imagery EEG signals using differential evolution and learning automata. Med Biol Eng Comput 52(2):131–139
 Dias NS, Kamrunnahar M, Mendes PM, Schiff S, Correia JH (2010) Feature selection on movement imagery discrimination and attention detection. Med Biol Eng Comput 48(4):331–341
 Bhattacharyya A (1946) On a measure of divergence between two multinomial populations. Sankhyā: Indian J Stat 7(4):401–406
 Devijver PA, Kittler J (1982) Pattern recognition: a statistical approach, vol 761. Prentice-Hall, London
 Park HS, Yoo SH, Cho SB (2007) Forward selection method with regression analysis for optimal gene selection in cancer classification. Int J Comput Math 84(5):653–667
 Chernoff H (1952) A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. Ann Math Stat 493–507
 Gupta A, Agrawal R, Kaur B (2015) Performance enhancement of mental task classification using EEG signal: a study of multivariate feature selection methods. Soft Comput 19(10):2799–2812
 Derrac J, García S, Molina D, Herrera F (2011) A practical tutorial on the use of nonparametric statistical tests as a methodology for comparing evolutionary and swarm intelligence algorithms. Swarm Evolut Comput 1(1):3–18
 Friedman M (1937) The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J Am Stat Assoc 32(200):675–701
 Verma H, Agrawal RK, Kumar N (2014) Improved fuzzy entropy clustering algorithm for MRI brain image segmentation. Int J Imaging Syst Technol 24(4):277–283