EEG-based classification of epilepsy and PNES: EEG microstate and functional brain network features

Epilepsy and psychogenic non-epileptic seizures (PNES) often show over-lap in symptoms, especially at an early disease stage. During a PNES, the electrical activity of the brain remains normal but in case of an epileptic seizure the brain will show epileptiform discharges on the electroencephalogram (EEG). In many cases an accurate diagnosis can only be achieved after a long-term video monitoring combined with EEG recording which is quite expensive and time-consuming. In this paper using short-term EEG data, the classification of epilepsy and PNES subjects is analyzed based on signal, functional network and EEG microstate features. Our results showed that the beta-band is the most useful EEG frequency sub-band as it performs best for classifying subjects. Also the results depicted that when the coverage feature of the EEG microstate analysis is calculated in beta-band, the classification shows fairly high accuracy and precision. Hence, the beta-band and the coverage are the most important features for classification of epilepsy and PNES patients.


Introduction
Abnormal electrical activity in the brain can cause epileptic seizures. When a person has repeated seizures, this condition is called epilepsy. Hence epilepsy is a transient occurrence of signs and/or symptoms due to abnormal excessive and/or synchronous neuronal activity in the brain [1]. The visible effect (i.e., the seizure) varies from temporary confusion, loss of awareness. Patients seldomly are prior aware of the occurrence of seizures increasing the risk of physical injury. Psychogenic nonepileptic seizures (PNES) are events resembling an epileptic seizure, but without the characteristic electrical discharges associated with epileptic seizures [2] that have a psychogenic origin [3]. The symptoms of PNES usually reflect a psychological conflict that is often associated with distress, disability, and have a poor prognosis when not timely and accurately diagnosed and treated [4]. PNES episodes are not purposely produced by the patient, and the patient is not aware that the seizures are non-epileptic, so the patient may become anxious when having these symptoms. The presentation of the differential diagnosis should be done early in the course of treatment for better patient acceptance, and treatment options should be presented early in the evaluation period [5].
Early diagnosis of epilepsy or PNES is critical. Because of delay in early diagnosis, many patients experience significant morbidity from inappropriate treatment, including adverse effects of antiepileptic drugs and aggressive interventions, such as intubation for pseudostatus epilepticus [6]. However, PNES may be misdiagnosed as epilepsy, and patients are often treated with an incorrect diagnosis [7] with potentially important side-effects. The failure to recognize the psychological cause of the disorder detracts physicians from addressing associated

Open Access
Brain Informatics *Correspondence: n.ahmadi@tue.nl 1 Department of Mathematics and Computer Science, Eindhoven University of Technology, TU/e, P.O.Box: 513, 5600MB Eindhoven, NL, The Netherlands Full list of author information is available at the end of the article psychopathology, and enhances secondary somatization processes [5].
During a PNES, the brain's electrical activity remains normal but in case of an epileptic seizure, interictal epileptiform discharge (IED) occurs. Hence optimal differential diagnosis between epilepsy and PNES can be made based on video-EEG monitoring, during which an attempt is done to record a seizure while recording video and EEG. Besides interpretation of the semiology on the video, the EEG can help in the differentiation between both. If muscle activity is not to prominent, the occurrence of ictal electrical discharges during a seizure can confirm the diagnosis of epilepsy over PNES. When no ictal discharges are observed not certain diagnosis can be made; however, semiology often helps in the diagnosis. In Ref. [8] a clear guidance on standards for the diagnosis of PNES has been delineated. However, long-time EEG monitoring and recording are quite expensive. If we can exclude PNES patients based on a short-term EEG recording, this would reduce the recording cost and burden waiting lists for EEG monitoring units.
It has been shown that the evolutionary pattern of the frequencies of rhythmic movement artifacts on EEG during PNES differs from that of epileptic seizure [9]. Convulsive PNES were demonstrated to display a characteristic pattern of rhythmic movement artifact that remains stable over time during the event, whereas the EEG activity during convulsive epileptic seizure tends to evolve over time [9]. This finding indicated that timefrequency analysis of data from a wristband movement monitor has the potential to be utilized as a diagnostic tool to differentiate between PNES and epileptic seizure with a high sensitivity and specificity [10,11]. Using a seizure detection and classification algorithm, Naganur et al. [11] examined the diagnostic utility of an automated analysis with an ambulatory accelerometer using EEG moments that show seizure-like activity. Also, in our previous work [12] we classified the PNES and epileptic seizure with a very high accuracy using EEG data including seizure-like activities.
However, as we mentioned earlier, the issue in EEG/ video monitoring of the patients with epileptic seizure is that the IED occurs unpredictably. Hence, it is necessary to record EEG for a very long time to see if any epileptiform discharges occur and then use those data for further analysis. Therefore, the aim of this research is finding discriminative features in short-term EEG signals and brain networks in epilepsy patients compared to PNES subjects in the absence of an IED (or seizure) to effectively classify these two groups. Classification of the disorders using IED-free EEG data makes the classification quite challenging. To the best knowledge of the authors, no similar work has been reported in the literature.
At first, we use EEG signal features for automatic classification of the groups. The first EEG signal analysis step is known as feature extraction that aims at describing the EEG signals by (ideally) a few relevant values called features [13]. Such features should capture the information embedded in EEG signals that is relevant to describe the mental states to identify, while rejecting the noise and other non-relevant information. Hence, the purpose of feature extraction is not only to reduce the dimensionality but also to extract more useful/dominant information hidden in the signals by avoiding unnecessary or redundant information.
We also apply the functional brain network analysis to extract network features for classification purpose. A functional network is a mathematical representation of the brain and is defined by a collection of nodes and links between pairs of nodes. Nodes in a functional brain network represent brain regions, while links represent functional connections corresponding to the magnitude of the temporal correlation between node pairs [14]. Functional connectivity is highly timedependent, often changing in a matter of tens or hundreds of milliseconds as functional connections are continually modulated by sensory stimuli and task context. A network formulation simplifies the analysis of brain by providing mathematical tools able to capture different aspects of its organization in a compact and straightforward manner [15,16]. Graph theoretical methods have been extensively applied to many neuroimaging datasets to describe the topological properties of both functional and structural networks.
In the absence of an IED, the EEG signals of epilepsy and PNES are quite similar and common EEG signal and/ or network features may not act as accurate discriminative parameters for classification purpose. Hence, we need to apply a capable analysis with high resolution in time to extract discriminative features. Hence, we also apply EEG microstate analysis to explore if abnormalities in microstates can identify patients with epilepsy and PNES with high accuracy. Microstate analysis is an alternative EEG representation that defines states of the multichannel EEG recording by spatial topographies of electric potentials over the electrode array. This method was first proposed by Lehmann et al. [17], who showed that the alpha frequency band (8-12 Hz) of a multichannel resting-state EEG recording can be parsed into a few number of discrete quasi-stable states that remain dominant for around 80-120 µs before abruptly transitioning to another state. These quasi-stable states are defined by topographic maps of electric potentials recorded in a multichannel array over the scalp. These periods of states are called functional microstates and the discrete spatial configurations are known as microstate classes/maps. Compared to other EEG analysis techniques, spatial analysis of EEG using microstates has several advantages. Most importantly, the spatial topography of the EEG recording can be defined at any data point independently of the preceding topography and therefore has millisecond resolution. Hence, microstates are better suited to detect rapid, dynamic activity in large-scale neurocognitive networks than many traditional methods like frequency power EEG analysis [18]. The spatial EEG signal analysis with microstates simultaneously considers the signal from all electrodes to create a global representation of a functional state. The rich syntax of the microstate time series offers a variety of new quantifications of the EEG signal with potential neurophysiological relevance [19]. In addition, parsing the EEG into microstates can be used to select epochs of interest that correspond to a certain microstate class, which can be further examined using other analysis methods such as timefrequency analysis. Therefore, EEG microstate analysis offers a capable, cost-worthy and clinically translatable neurophysiological approach to study large-scale neural networks and investigate temporally coherent network activity, as it has been suggested to reflect global functional states of the brain in health and brain disorders [19][20][21][22].
The rest of this paper is structured as follows. In Sect. 2, the mathematical methods that we apply for classification purpose are presented. Classification results are presented in Sect. 3, following by concluding remarks in Sect. 4.

Clinical EEG data
The dataset used in this section was obtained from Ghent University Hospital in Belgium with whom a larger multidisciplinary brain research program, called Neu3CA [23], is ongoing. The EEG recordings were obtained from 5 epilepsy and 5 PNES patients. The recordings from each patient include 27 EEG recording electrodes (based on the standard 10-20 acquisition system) and reference (G2) on the right mastoid bone plus the ground (G1) on the left mastoid bone. The sampling rate of all data channels is 256 Hz and the duration of each acquired raw EEG data is 3 h. The 27 channels are Fp1, Fpz, Fp2, F7, F3, Fz, F4, F8, C3, Cz, C4, T7, T8, P7, P8, P3, Pz, P4, O1, Oz, O2, T9, T10, FT9, FT10, TP9 and TP10.
For each patients groups, 50 IED-free epochs, which are termed as subjects, with the duration of 16 s and with the same classification labels were extracted as they contain the least amount of noise or artifact. Thus, we have 100 subjects including 50 Epilepsy and 50 PNES epoches. Then, all epochs were band-passed filtered for the frequency range of 1-40 Hz to further minimize contamination by high-frequency artifact. Finally, each segment is decomposed to its sub-band frequencies. The main frequency sub-bands are delta (below 4 Hz), theta (4-8 Hz), alpha (9-13 Hz), beta (14-30 Hz) and gamma (above 30 Hz).
To avoid overfitting, we conduct classification experiments using cross-validation. For this purpose, we randomly select 1 Epilepsy subject and 1 PNES subject where each subject includes 10 epoches with the same label. Therefore, there are totally 5 × 5 = 25 pairs of cross-validation experiments. The results reported in this paper will be the average of these 25 pairs.

EEG signal analysis
In this paper, a wavelet-based time-frequency scheme [23] is applied to decompose the EEG signals into its sub-bands. The wavelet decomposition is a smooth and quickly vanishing oscillating function with good localization in both frequency and time. Then we use different features based on the EEG signals to transform raw signals in each sub-bands into more informative signatures or fingerprints of the brain network. Note that the signal features are extracted from each single EEG channel and then all of the extracted features are used as the input data for the classifiers. Here, the selected signal features are presented below briefly.

Energy
Discrete time signals are the signals that can be defined and represented at certain time instants of the sequence. As we mentioned before, the sampling rate of all data channels is 256 Hz. It means that the voltage of brain at different locations has been recorded every 1/256 s. Hence, the EEG signals can be considered as discrete signals. In the discrete domain, the energy of the signal is given by [24] where i represents the recording time instant, x i the voltage of signal at i and n the total number of time instants.

Entropy-based features
Entropy measure shows the amount of randomness and uncertainty in the signal; therefore, the more fluctuating signal has a higher value of entropy. In other words, entropy reflects how well one can predict the behavior of each respective part of the trajectory from the other. Basically, higher entropy indicates more complex or chaotic systems, thus, less predictability.
Shannon entropy (ShE): Shannon entropy [25] is a nonlinear measure quantifying the degree of complexity in a signal. Let X be a set of a discrete EEG signal variables X = {x 1 , x 2 , . . . , x n }; x i ∈ R d . Now, the Shannon entropy is defined as Spectral entropy (SE): Spectral entropy (SE) computation uses Shannon's entropy formula to represent the power spectral densities of the EEG signal as probabilities [26]. For this purpose, fast Fourier's transformation (FFT) is used to obtain the spectrum. The normalized SE corresponding to the frequency range [f 1 , f 2 ] is calculated from 1-s epochs of 27-channel EEG signals of epileptic and PNES group as follows [27]: where N [f 1 , f 2 ] equals the total number of frequency components in the frequency range and P(f i ) represents the probability of the ith frequency component. Renyi entropy (RE): Renyi entropy, as an index of diversity, is generalizations of Shannon entropy that depend on a parameter [28]. If p(x i ) is a probability distribution on a finite set, its Renyi entropy of order α is defined as RE = 1 1−α ln n i=1 p(x i ) α , where 0 < α < ∞ . Renyi entropy approaches Shannon entropy as α → 1 [29]. In our study, the value of α is taken as 2. Steps involved in RE are quite similar to computing ShE.

Fractal dimension-based features
Fractals are mathematical sets with a high degree of geometrical complexity that can model many natural phenomena. A very important characteristic of fractals, useful for their description and classification, is their fractal dimension. The fractal dimension of a set in metric space, such as an EEG signal, can be computed from several different measures [30].
Fractal box dimension (FBD): For calculating this measure, a box with different side lengths is used to describe the change of the signal waveform. Smaller side lengths of the box lead to a longer calculation time, but the recognition rate of the signal will increase. Smaller side lengths of the box lead to a longer calculation time, but the recognition rate of the signal will increase. The idea is to apply continuous hypercube mesh coverage to the curve. If we consider X as a non-empty compact subset of the real plane, then the capacity dimension is defined as where N min (ǫ) is the smallest number of boxes with a side length ǫ required to cover X. The box dimension merely represents the geometric dimension of the signal, but does not reflect the density distribution in the planar space.

Higuchi fractal dimension (HFD):
The HFD is a fast non-linear computational method for obtaining the fractal dimension of signals even when very few data points are available [31]. HFD is used to quantify the complexity and self-similarity of a signal. To compute the HFD, the dataset is divided into a k-length sub-dataset as where n is the total length of the data sequence, k is a constant and m = 1, 2, ..., k . The length L m (k) for each sub-dataset is then computed as where the mean of L m (k) for each k is computed to find the HFD as It should be mentioned that to determine the maximum value for k, we followed the recommendation of Doyle et al. at [32]. For this purpose, a maximum number of reconstructed datasets, e.g., K max =5, is determined by the user. For each reconstructed dataset the curve length is calculated and plotted against its corresponding k value on a log-log scale. The resulting slope, fitted by a leastsquares method, represents the fractal dimension of the original data. Determining K max is by a process of examining the data and plotting the fractal dimension over a range of K max ; the point at which the fractal dimension plateaus is considered a saturation point beyond which no benefit could be gained from further calculations.
Best results for the current data were obtained using a K max =20.
Katz fractal dimension (KFD): The KFD is derived directly from the waveform, eliminating the pre-processing step of creating a binary sequence, can be defined as [33] (4) (7) KFD = log 10 (n) where n is the number of steps in the curve, L is the total length of the signal that is to say, the sum of the distance between successive points. Also d is the Euclidean distance between the first point in the series and the point that provides the furthest distance with respect to the first point.

Functional network analysis
Various complex network measures can be used to analyze the functional brain network and characterize one or more aspects of local or global brain connectivity.
To create a functional network, a matrix containing the EEG channels pairwise correlations is required. Thus, one needs to calculate the synchronizations among all pairs of signals and deduce the respective correlation (or adjacency) matrix. Applying a synchronization measure results in the calculation of a correlation matrix with each row representing a node and each column on that row representing the relationship between the current node and every other node in the network. Links between nodes are weighted which represent strength of correlation or causal interactions in functional networks.
In this paper, a synchronization measure based on the horizontal visibility graph (HVG) is applied to calculate correlation matrix and construct the functional network. Visibility algorithms are a family of methods that map signals as graphs nonlinearly [34][35][36]. The HVG algorithm provides an effective method to map EEG signals to a graph permitting a mutual relationship between dynamical properties of signals and topological properties of the graph. Therefore, the information on EEG signals is obtained just by analyzing the characteristics of the graph. In our previous works, we showed that the synchronization measure based on the HVG algorithm is a robust measure for finding correlation among chaotic, noisy and stochastic signals [37], and also this measure is less sensitive to the brain volume conduction effect and is able to predict the coupling degree correctly even with strongly overlapping signals [38]. This synchronization measure is presented here shortly.

HVG-based synchronization measure
Let x(t) be a univariate time series of N discrete data ( t = 1, 2, ..., N ). The visibility graph algorithm converts the time series x(t) to a graph, as a data point x(t) is mapped into a node in the graph. The time point (i.e., a point on the time series) represents a moment in which the data are recorded (see Fig. 1a). By applying the HVG algorithm, an EEG time series of size N maps to a visibility graph with N nodes. In this algorithm, two arbitrary data nodes t * and t ⋆ in the graph are connected if [35] (8) According to the HVG geometric criterion, two data points are connected if one can draw a horizontal line in the time series joining them that does not intersect any intermediate data height. Therefore, by applying the HVG, a signal of size N maps to a graph with N nodes, as the first node in Fig. 1b is associated with the first time point in Fig. 1a. The second node corresponds to the second time point of the EEG time series, and so on.
After constructing the visibility graph, the degree of each node is determined. The degree of node t is the number of links connected to node t. Therefore, by counting the number of links that have node t as an endpoint, we can determine the degree of each node. Then, by considering the degrees of all nodes, a degree sequence (DS) time series is obtained. The corresponding DSs of the HVG algorithms are shown in Fig. 1c as time series. Next, the similarity of two time series x(t) and y(t) is approximated by calculating the Cross-Correlation (CC) function between the DSs of the corresponding visibility graphs. The cross-correlation function measures the linear correlation between two time series as a function of their delay time, which is of interest because such a time delay may reflect a causal relationship between the time series. The CC between two time series x(t) and y(t) with the same N samples' length, where t denotes discrete time (t = 1, ..., N ) , is expressed as where t = 1, ..., N denotes discrete time and h ∈ {−(N − 1), ..., 0, ..., N − 1} denotes time lag. Here, CC = ±1 presents the complete linear direct and inverse correlations, respectively, and CC = 0 indicates lack of linear correlation for a given time lag.
After constructing the functional network at each frequency sub-band, some selected complex network measures are determined as following to detect aspects of the brain network.

Clustering coefficient
The clustering coefficient assesses the degree to which nodes tend to cluster together. In brain network studies, the clustering coefficient is considered to be a measure of the local connectivity of the functional brain network. Brain networks are "small worlds" in which different functional units can work independently but are connected to each other through hubs. A high clustering coefficient indicates the presence of local cliques forming specialized functional units. Given a weighted network G, the local clustering coefficient c i for node i is defined as [39] where w ij = w ij /max(w ij ) is the scaled weight. Here, d i (d i − 1)/2 is the maximum possible number of links when the subgraph of neighbors of node i is completely connected. The global clustering coefficient for the whole graph is the average of the local values and is defined as [40] (10) where N is the number of nodes in the graph. It is clear that 0 ≤ c i ≤ 1 and 0 ≤ C ≤ 1 . Note that c i = 1 if node i is the center of a fully interconnected cluster and c i = 0 if the neighbors of node i are not connected to each other.

Strength
Strength is one of the most basic structural properties of a weighted graph. The vertex strength is defined as the where j ∈ neighbor(i) and w represents the weighted adjacency matrix, in which w ij is the weight on the edge between node i and j [41].

Betweenness centrality
Centrality refers to the relative importance of a vertex within the network. Mostly, the vertices in a network with higher centrality index values are perceived as being the more important vertices. Betweenness centrality quantifies the number of times that a node acts as a bridge along the shortest path between two other nodes.
In an undirected network, a path between two nodes that has the minimum number of links is referred to as the shortest path between these two nodes. In the context of brain network analysis, a brain region (or EEG recording site) has a high betweenness centrality index if it is strategically located as a midpoint between several pairs of brain regions, and therefore, controls the flow of information across the brain network.
Consider an undirected graph G = (V , E) , where V and E denote its node and link set, respectively. For three distinct nodes v 1 , v 2 , v 3 ∈ V , let σ v 1 ,v 3 = 0 be the number of shortest paths between v 1 and v 3 in G, and let σ v 1 ,v 3 (v 2 ) be the number of shortest paths between v 1 and v 3 that pass through v 2 . The betweenness centrality index of node v 2 is defined as [42] The average node betweenness centrality of the graph is defined as follows: The betweenness centrality lies between zero and N − 1 2 , where the value 0 is obtained if and only if all neighbors of v i induce a maximal clique in G.

Eigenvector centrality and largest eigenvalue
Eigenvector centrality is a global measure of centrality, as it does not focus on the immediate vicinity of nodes but instead considers all possible indirect connections.
It operates under the premise that connections to nodes that are themselves well-connected should be given more weight than connections to less well-connected nodes.
Eigenvector centrality for all nodes in the network, then, is simply given by the eigenvector corresponding to the largest eigenvalue (also called the Perron eigenvalue). In brain network studies, the eigenvector centrality is a measure that approximates the centrality or the importance of a brain region to the corresponding functional network. Eigenvector centrality attributes a value to each voxel in the brain, such that a voxel receives a large value if it is strongly correlated with many other nodes that are themselves central within the network. A brain region has higher eigenvector centrality if its neighbors are also highly central. It has been demonstrated that eigenvector centrality is a computationally efficient tool for capturing intrinsic neural architecture on a voxel-wise level [43]. For a matrix A ∈ R N ×N , a number is an eigenvalue if, for some vector � c � = 0 [41], Here, the centrality vector c is the eigenvector of the adjacency matrix A associated with the eigenvalue . In general, eigenvectors give the direction of spread of data, while the eigenvalue is the intensity of spread in a particular direction or of that respective eigenvector. Given the weighted correlation matrix A of network G, it is wise to choose the largest eigenvalue, max , in the absolute value of matrix. By virtue of the Perron-Frobenius theorem [41], this choice implies that if the graph is strongly connected, then the eigenvector solution c is both unique and positive.

EEG microstate analysis
For microstate analysis, we follow the standard steps in microstate segmentation presented in [44]. For this purpose, the EEG data at different bands were imported to MATLAB (vR2016a) using the EEGLAB toolbox (v14.1.2) [45,46]. First, we need to calculate the global field power (GFP) at each data point which represents the magnitude of the field strength at each moment in time. The GFP at each data point is equal to the root of the mean of the squared potential differences at all N electrodes, i.e., V i (t) , from the mean of instantaneous potentials across electrodes, i.e., V i (t) , equivalently, the standard deviation across all electrodes of the EEG for the ith data point [47]. Formally, Topographies that occur at local peaks of the GFP(t) curve represent instants of greatest field strength and highest SNR. Since the field topography remains essentially stable between two peaks of the GFP(t) curve and changes during the troughs, the topographies at GFP(t) maxima are representative of topographies at (15) surrounding data points in time [18,48]. Thus, representation of the EEG data as a set of topographies at local GFP(t) maxima is a valid data reduction method. Therefore for each subject, the topographies at local GFP(t) peaks are extracted. These topographies are called the original maps and which are submitted to a clustering algorithm, such as K-means, to obtain the desired number of cluster maps with the goal of maximizing the similarity between the EEG samples and the prototypes of the microstates they are assigned to. A schematic overview of the microstate analysis is shown in Fig. 2.
In this work we aim to compare the cluster maps of two different groups (i.e., epilepsy and PNES patient groups) and then identify patients using a machine learning technique. Each subject may result in different number of microstate cluster maps. Therefore, it would be quite complicated to compare the temporal characteristics of microstates' maps between the two groups. Hence, it would be ideal to have a set of global cluster maps that represent the recordings of all subjects in both group and then fit these common maps to the individual data for further investigations. Therefore, we apply a data aggregation scheme for each group, as 5000 original maps at GFP(t) maxima of each subject, with the minimum peak distance of 20 µs, are extracted and concatenated to create a new series of topographic original maps. This aggregated series explain variance in both of our datasets, consisting of 100 subjects including 50 epilepsy and 50 PNES. As the next step, the aggregated series is submitted to the modified K-means clustering algorithm to obtain the global microstate cluster maps.

Effective number of cluster maps
Finding the optimum number of cluster maps is crucial for capturing the informative features of the data and avoids over/under-fitting. Selecting the number of cluster microstates is not a straightforward choice to make [21,49,50]. In this paper, we apply cross-validation criterion [51] as a measure of fit for selecting the effective number of microstates, because this measure is polarity-invariant as it is assumed in the segmentation of spontaneous EEG data.
The cross-validation criterion (CV) [51] optimizes the ratio between the global explained variance and the degrees of freedom for a given set of cluster maps. This measure is related to the residual noise, ǫ , and the goal is therefore to obtain a low value of CV.
where C is number of EEG channels, K number of microstate clusters and σ 2 an estimator of the variance of the residual noise calculated as where N is number of time samples, x n is the nth time sample of the recorded EEG, a l n signifies the topographical map assigned to nth EEG sample and l n is the microstate label of the n-th EEG sample. Practically, the CV criterion pointing to the best clustering solution at its smallest value.
The decision for the right number of clusters obviously reflects a trade-off between the goodness of fit and the complexity a high number of microstates brings to the segmentation. Hence according to the CV and GEV plots (see Fig. 3, the optimum numbers of global cluster maps are 3, for alpha and beta-bands, and 4 for delta and thetabands. The topographies of the global cluster maps are shown in Fig. 4.

Back-fitting microstates maps to EEG
Once the global cluster maps have been determined, they are fitted-back to each individual subject's EEG and its corresponding GFP(t) data to define the microstates and extract different features. Back-fitting procedure assigns microstate labels to EEG data point based on which cluster map they are most topographically similar with using the global map dissimilarity (GMD) measure. The GMD is a distance measure that is invariant to the strength of the signal and instead only looks at how similar the topographical maps look. For two EEG samples, x n and x n ′ , GMD is calculated as By normalizing with GFP, two EEG samples that belong to the same microstate, but have different strength, will achieve a low GMD distance.
Hence, the obtained global cluster maps are fitted backward to the original data calculating the spatial correlation between each template and the topography at each time instant corresponding to the maximum value of GFP. Such a procedure allows to represent the EEG time series in terms of sequence of microstates and to extrapolate variables of interest. Figure 5 shows an epoch of EEG data of two different subjects as a function of global cluster microstates at different EEG bands. It is worth mentioning that the unwanted noise in EEG recording can appear as a short microstate segments after the back-fitting procedure. To eliminate this issue, the small maps rejection algorithm is implemented to temporally smooth the microstates after the back-fitting. For this purpose, we introduce a threshold (here: 30 µs) which defines the minimum duration (18) 2 Schematic flowchart of the EEG microstate analysis. Each EEG datum is used to calculate the GFP curve at each data point. The electric potentials of all electrodes at moments of local maxima of the GFP curve are plotted to generate topographic original maps. The original maps are submitted to a clustering algorithm, which groups the submitted maps into a small set of clusters (here: 3) based on topographic similarity, and optimal number of cluster microstate maps is generated for each subject. Finally, the cluster maps are back-fitted to the GFP curve and each data point is labeled with the cluster map that they best correlated to. Therefore, the multichannel EEG recording is now described as a series of alternating microstates for the microstate segments to last. Hence, the label of each microstate segment with duration lower than the threshold changes to the next most likely microstate cluster map as measured by the GMD measure.

EEG microstate features
The basic temporal dynamics of microstates are described by occurrence (k), duration (k), and coverage (k). Occurrence (k) reflects the average number of times per second a microstate is dominant, the Duration (k) is defined as the average duration of a given microstate (in milliseconds), and the Coverage (k) reflects the fraction of time a given microstate is active. These features are inputted to various selected classifiers, independently.

Training/test set split
As introduced in Sect. 2.1, there are totally 5 epilepsy and 5 PNES patients and each patient has 50 IED-free epochs. To augment the data, we transfer these 10 patients into 100 subjects including 50 Epilepsy and 50 PNES epochs. I.e., we transfer each patient into 10 subjects by diving the epochs with a fixed duration.
To avoid overfitting, we conduct classification experiments using cross-validation. However, the data are augmented from limited number of patients so it requires a specific data split to prevent the classifier learning the patterns of each patient. For this purpose, we randomly select 1 Epilepsy patient and 1 PNES patient where each patient includes 10 epochs with the same label (totally 20 subjects) as the test set. The The results reported in this paper will be the average of these 25 pairs.

Features classification
In machine learning and statistics, classification is a supervised learning approach in which the computer program learns from the data input with labels given to it and then uses this learning to classify new observation.
In this work, we apply various well-known classification techniques, including k-Nearest-Neighbors [52], Decision Tree [53], Neural Network [54], Random Forest [55], Naive Bayes [56], Support Vector Machine with linear and radial kernels (SVM-Linear, SVM-RBF) [57] and Gradient Boosting [58], for classification of subjects. The specific kind of function being learned and the assumptions built into it are what distinguish among the various types of classifiers.
The usual model performance measures for evaluating a classification model are precision (or positive predictive value), recall (or sensitivity, true positive rate), accuracy and specificity (or true negative rate). Precision is calculated as the number of correct positive predictions The receiver operating characteristic (ROC) curve is a plot of specificity in the x axis and recall in the y axis. Hence, the ROC curve is a plot of the false-positive rate ( x-axis) versus the true-positive rate ( y-axis) for a number of different subjects threshold values between 0.0 and 1.0. Area under the ROC curve is a measure of model performance. The area under the curve (AUC) of a random classifier is 50% and that of a perfect classifier is 100% . For practical situations, an AUC of over 70% is desirable [59].

Results
In this section, the classification results of epilepsy and PNES, in the absence of an interictal discharge, from real multichannel EEG data are presented based on the EEG signal, functional network and EEG microstate features.

EEG features' classification
In this section, the classification of EEG signal features which were extracted from each single channels is presented. Table 1 shows the performance of the selected EEG signal features in the classification task using different classifiers. Here, the results for two evaluation metrics, i.e., precision and recall, at different EEG frequency sub-bands are presented. It can be seen that different sub-bands have different performance w.r.t selected signal features. Conclusions for each individual sub-band are as following: • In alpha-band, the spectral entropy performs best among the features and the SVM classifier (with linear and RBF kernels) performs best among the classification techniques. The Renyi entropy is the second best feature in the classification tasks. Besides, features such as Higuchi fractal dimension are the worst feature to distinguish subjects because it only achieves about 50% precision. • In beta-band, all features except Higuchi fractal dimension achieve acceptable classification results. Similar to alpha-band, the SVM (with linear and RBF kernels) performs best among all classifiers, and the Higuchi fractal dimension is the worst feature to distinguish subjects. • In delta-band, Katz Fractal dimension and signal energy perform relatively better than other features. Entropy-based features, including Shanon, spectral and Renyi, did not perform well in classifying subjects. Similarly, the SVM (with linear and RBF kernels) performs best in most features. • In theta-band, Katz fractal dimension performs the best among all features. But other features achieve poor performance as the precision is around 50% . Similarly, the SVM (with linear and RBF kernels) performs best in most features. • In gamma-band, almost all features obtain poor performance except signal energy. It indicates that gamma-band may not be a very effective band for classifying the subjects in experiments.
The receiver operating characteristic (ROC) curves for different sub-bands are shown in Figs. 6, 7, 8, 9 and 10. Considering the area under the curves (AUC) we can conclude that the classifiers that use features extracted from the beta-band perform best in classifying subjects as the AUC is the beta-band that is higher than other sub-bands for all selected features. Also, it is observed that the gamma-band performs worst using different features. Furthermore, the alpha-band performs relatively good but still much worse than beta-band. The deltaband and the theta-band are similar to random guess.
We also considered a combination of all selected features as the input for the classifiers. The results are presented in Table 2. It can be seen that by combining all the selected features, the classification precision and recall become larger for all sub-bands.

Network features classification
By applying the horizontal visibility graph algorithm, the synchronizations among all pairs of EEGs are calculated. Then, the correlation matrices and corresponding functional brain networks are constructed to extract selected network measures, i.e., clustering coefficient, strength, betweenness centrality, eigenvector centrality and largest eigenvalue (see Sect. 2.2). At first, the classification techniques were applied on each network measure independently. However, the classification results were poor.
Hence, a combination of all selected features was considered as the input for the classifiers.
The precision, recall and accuracy of the classification methods with the best performances for the combination of all the networks' features at different EEG bands are presented in Table 3. From these results, we can say that the functional network features are not strong discriminative features to be used for the classification of the epileptic seizure and PNES. However from the results, we can conclude that functional network features are robust to the classification task, i.e., different bands perform similarly in classification precision/recall. Among different bands, gamma-band performs best while theta-band performs worst. Also, among different applied classifiers, the SVM either with linear or with RBF kernel performs best for all EEG bands. The only exception is delta-band where Random Forest classifier performs best. Note that for the gamma-band the results of the SVM (RBF) are about 5 % less than the results of the Random Forest technique.

Microstate features classification
These microstate features are inputted to various selected classifiers, independently. The classification precision, recall and accuracy are presented in Tables 4, 5 and 6. Also, Table 7 presents the classification results when all three features are considered as inputs for classifiers. From these results it can be seen that the microstate analysis in beta-band leads to more accurate results compared to other EEG bands. Also, the kNN classifier is a superior technique for doing classification. The only exception is when coverage (k) is the classification input, where Random Forest classifier performs slightly better than the kNN model. Furthermore, it is observed that the occurrence (k) is the weakest discriminative feature as it results in overall accuracy of 68.8% with 72.8% precision and 68.9% recall, whereas duration (k), coverage (k) and combination of all features mostly result in accuracy, precision and recall higher than 80%. To further evaluate the importance of the frequency bands, the so-called leave-one-out tests are performed. For this purpose, each microstate feature (i.e., occurrence (k), duration (k) and precision (k) and also combination of all three features) in all bands (i.e., alpha, beta, delta and theta) is inputted to classifiers independently to measure accuracy, precision and recall of the classification. The results of this test are shown under the header of All in   Table 8. Then, we eliminate one of the frequency bands and do the classification again. The results are shown as All-alpha, All-beta, All-delta and All-theta in Table 8.
The results show that the alpha, delta and theta-bands do not contain important data for microstate analysis as by eliminating them from the classification procedure, the accuracy, precision ad recall not only does not decrease significantly, but also become pronounced for some cases. However, the results for the beta frequency band are quite different. It can be seen that by eliminating the beta-band from the classification, the values of accuracy, precision and recall reduce significantly which highlight the importance of the beta-band in microstate analysis. This importance is confirmed by all selected classification techniques presented in this work.
The classification accuracy of the proposed system is also evaluated through receiver operating characteristic (ROC) curves for different microstate measures        Table 8 Calculated classification accuracy, precision and recall using all frequency bands (All), and excluding alpha-band (All-alpha), beta-band (All-beta), delta-band (All-delta) and theta-band (All-theta)  This indicates that the beta-band is most accurate subband for our classification purpose. Furthermore, it is obvious from Fig. 12 that the coverage mainly results in larger AUC compared to other presented measures. The importance of the microstate features is presented in Table 9. The results show that by leaving out the coverage from the classification in beta-band, the accuracy, precision and recall of the classification reduce significantly compared to other measures. Hence, the coverage and beta-band are the most important features for classification of epileptic seizure and PNES using the microstate analysis.

Conclusion
In this paper, we investigated the EEG signal and functional brain network features for the automatic classification of epilepsy and PNES patients. An epileptic seizure is a transient occurrence of signs due to abnormal excessive or synchronous neuronal activity in the brain, where as PNES are events resembling an epileptic seizure, but without the characteristic electrical discharges associated with epileptic seizure. Hence, in the absence of the electrical discharge, the PNES is commonly misdiagnosed as an epileptic seizure. Generally, by performing a long-time EEG monitoring and recording the physicians can see if epileptiform discharges occur that aid in diagnosing the disorder. However, this monitoring is quite expensive and time-consuming. Hence, we aimed to effectively classify these two brain disorders in the absence of a seizure by analyzing various short-term EEG signal and network features using machine learning algorithms. All of our results showed that the beta-band is the most representative frequency sub-band for subject classification. Generally, the classification based on the EEG signal features and functional network features does not lead to classification with a strong performance even if various classification techniques are applied. The prediction accuracy was found to be around 80% when the classification was computed based on the microstate features extracted from the beta-bands.