A dynamic directed transfer function for brain functional network-based feature extraction

Directed transfer function (DTF) is good at characterizing the pairwise interactions from whole brain network and has been applied in discrimination of motor imagery (MI) tasks. Considering the fact that MI electroencephalogram signals are more non-stationary in frequency domain than in time domain, and the activated intensities of α band (8–13 Hz) and β band [13–30 Hz, with \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta_{1}$$\end{document}β1(13–21 Hz) and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta_{2}$$\end{document}β2(21–30 Hz) included] have considerable differences for different subjects, a dynamic DTF (DDTF) with variable model order and frequency band is proposed to construct the brain functional networks (BFNs), whose information flows and outflows are further calculated as network features and evaluated by support vector machine. Extensive experiments are conducted based on a public BCI competition dataset and a real-world dataset, the highest recognition rate achieve 100% and 86%, respectively. The experimental results suggest that DDTF can reflect the dynamic evolution of BFN, the best subject-based DDTF appears in one of four frequency sub-bands (α, β, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\beta_{1} ,$$\end{document}β1, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$${ }\beta_{2}$$\end{document}β2) for discrimination of MI tasks and is much more related to the current and previous states. Besides, DDTF is superior compared to granger causality-based and traditional feature extraction methods, the t-test and Kappa values show its statistical significance and high consistency as well.


Introduction
Brain-computer interface (BCI) is a technology that allows people with healthy bodies or motor impairments to communicate with external environment through their brains' activity without the assistance of peripheral nerves and muscles [1,2]. Motor imagery (MI) refers to a mental process by which an individual rehearses or simulates a given action, for example, imaging the left or right hand movement without actually executing it. Motor imagery electroencephalography (MI-EEG) is commonly used in BCI owing to its advantages of high temporal resolution, high portability and few risks. Complex motor imagery can activate scattered areas of cortex, mainly including the somatosensory and somatomotor areas, and the brain activations vary with different MI tasks. MI-EEG is a multi-channel non-stationary and time-frequency signal with spatial distribution characteristic, and it shows obvious individual differences. The MI responsive frequency bands are not consistent for inter-/intrasubjects, this reflects the subject-based characteristics of MI-EEG. The MI-based BCI interprets mental activities by identifying EEG signals of different MI tasks, and realizes the control and exchange of information between the brain and the outside world. The key to improve recognition accuracy is how to effectively use these characteristics of MI-EEG for feature extraction [3].
In the past years, a variety of feature extraction methods have been used in classifying mental tasks. Common spatial pattern (CSP) is one of the most popular and

Open Access
Brain Informatics *Correspondence: 861737447@qq.com 1 Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China Full list of author information is available at the end of the article efficient algorithms which is particularly renowned for the high classification rates, notably in BCI competitions [4][5][6]. However, the classical CSP also has its limitations which are sensitive to noise, overfitting and individual variability. To tackle these issues, a series of CSP-based methods have been proposed, including 'analytic signal'based CSP (ACSP) [6], correlation-based channel selection common spatial pattern (CCS-RCSP) [7], common complex-spatio spectral pattern (CCSSP) [8] and complex common spatial patterns (CCSPs) [9]. The variants of CSP methods make full of the multi-channel and spatial distribution characteristics of MI-EEG for feature extraction and achieve preferable classification accuracies. Regretfully, the information transfer and flow between channels have not received much attention. In fact, EEG researchers have tended to reveal that during performing even simple motor or cognitive tasks, many different functional areas widely scattered over the brain are mutually interconnected and exchange their information with one another, thus making it hard to isolate one or two regions where the activity takes place [10]. Physiologically speaking, when humans are involved in a motor task, our brains function as a complex network with the interactions of specialized, spatially distributed but functionally linked brain regions [11][12][13][14][15], contributing to the related brain functions. Therefore, it is an important strategy for ameliorating MI-EEG feature extraction to discover and use information flow between multi-channel EEG signals.
Recent years have witnessed that many researchers topologically characterize brain functional network by employing a mathematical framework called graph theory [16][17][18]. In graph theory, a network is defined as a graph formed by a set of nodes interconnected by edges [19]. Nodes in large-scale brain functional networks usually represent regions of interest (ROIs) or EEG electrodes, while links represent functional connectivity [17,18,[20][21][22]. Generally, we can construct directed or undirected brain functional networks depending on adopting symmetric or asymmetric metrics of coupling between two signals. Anyway, in order to obtain abundant and exact properties, directed brain functional network is employed in this study. Granger causality (GC), as one of the commonly used techniques to construct a directed brain functional network, is aimed to illuminate casual temporal relations and directional nature of information flow for a pair of EEG channels. The basic notion of GC was originally conceived by Wiener [23], later adopted and formalized by Granger in the form of linear regression model. Granger's causality concept has attracted a lot of people's attention and has been extensively applied in the field of economics and neuroscience [24]. Since spectral properties are significant in biomedical signal analysis, extension of the concept to the frequency domain representation of time series was formulated by Geweke [25]. Although the above methods had lots of applications in many areas, it should be noted that they estimated directional information flow through bivariate approaches without using the whole covariance structure for a multivariate system. Unfortunately, it was reported that bivariate measures in certain cases could potentially give spurious connections and misleading results especially when some signals are fed from common channel sources, as is very likely in neurobiological systems [25].
To meet the demands, a full multivariate estimator, i.e., directed transfer function (DTF), was proposed to overcome the limitations of bivariate autoregressive methods and characterize directional connectivity as well as spectral properties of the interactions between any given pair of brain signals, and only one multivariate autoregressive (MVAR) model was required to be estimated simultaneously from all the EEG time recordings [26][27][28]. That is, all signals are regarded as members of one system and their mutual influences (not limited to pairwise connections) are considered. Subsequently, many DTF-based methods were developed. Ding et al. [29] proposed a short time directed transfer function (SDTF), in which the entire data were divided into short overlapping time intervals for computation of DTF measure on each interval. Ginter et al. [30] and Yi et al. [31] calculated SDTF to exhibit the casual relations of brain functional networks caused by motor imagery afterwards. Korzeniewska [32] first introduced a full frequency Directed Transfer Function (ffDTF) in which the normalization of DTF was performed over the full frequency band, then dDTF [32] was derived by multiplying ffDTF by partial coherence to only show direct connections while DTF and ffDTF reveal both direct and indirect connections. Billinger [33] showed the possibility of reliable classification of MI tasks through ffDTF and dDTF. Later in [34], dDTF was applied to feature extraction with power spectral density. Adaptive Directed Transfer Function (ADTF) was originally proposed by Wilke et al. [35] for abnormal physiological signals, such as those during epileptic seizures. The authors established time-varying coefficient matrices by Kalman filter algorithm, constructed MVAAR models and got time-varying brain functional networks. In the following years, ADTF was also extended to the construction of time-varying brain function networks of other EEG signals. Li et al. [15,36] used ADTF to investigate the time-varying P300 network patterns and evaluate time-varying MI-EEG functional networks as well as observe MI information processing in different stages. Wang et al. [37] presented a novel wavelet-based directed transfer function (WDTF) method by combining the wavelet decomposition and the directed transfer function (DTF) algorithm for patient-specific seizure detection.
To sum up, DTF and its extended methods have successively been applied to construct brain functional networks and shown their effectiveness for different EEG signals. It is a potential problem that how to grasp the activated characteristics of MI-EEG signals to further improve DTF and find effective feature parameters to identify MI tasks. In order to fully represent the variation characteristics of MI-EEG in frequency domain, we will present a dynamic DTF, named as DDTF. The experiments are conducted based on publicly available MI-EEG data from BCI competition and real acquisition data from real-world, and the results suggest that this metric not only extracts effective information, but also achieves a high classification accuracy while only one or two order frequency domain model is used to select the best frequency band and calculate DDTF, demonstrating the adaptive and dynamic characteristics of MI-EEG.
The rest of the paper is organized as follows: in Sect. 2, the DDTF-based feature extraction method is described in detail. Experimental research is performed in Sect. 3. Section 4 provides the discussions, and conclusions are drawn in the final section.

Methods
By adaptively selecting the frequency band and model order m, DTF is developed to generate DDTF, which is applied to construction and feature extraction of brain functional network, the main process is as follows.

Preprocessing of MI-EEG signals
Suppose that x 0 (t) = x 01 (t), x 02 (t), . . . , x 0N0 (t) T ∈ R N0×K0 represents the original MI-EEG signals, where N 0 and K 0 refer to the number of total channels and sample points, respectively.
x 0 (t) is spatially filtered with CAR filter at first and the filtered signal is expressed as Step 2: Optimal sample interval selection.
The optimal sample interval [a, b] is selected with the most obvious event-related desynchronization (ERD)/event-related synchronization (ERS) physiological phenomenon and the MI-EEG signals in this time period are denoted as (1) (2) Step 3: Bandpass filtering. The important information in EEG signals is often hidden in frequency with respect to allied cognitive tasks [38] and it is generally accepted that most of the motor imagery-related EEG signals are coded in the frequency band of 8-30 Hz [39]. What's more, the β band (13-30 Hz), with β 1 or lower β sub-band (13-21 Hz) and β 2 or higher β sub-band (21)(22)(23)(24)(25)(26)(27)(28)(29)(30) included, has been reported to be good physiological predictors during motor imagery tasks compared to other frequency bands [40]. This information is significant, as it has been verified that the interdependency at different frequency ranges may own distinct physiological functions [41]. At this point, x 2 (t) is bandpass filtered to α band and β band ( β 1 and β 2 sub band if necessary) and the filtered signals are later described as Step 4: Channel selection.
Considering the computational complexity and feature information redundancy, N channels are selected to cover as many brain regions as possible in the premise of guaranteeing their asymmetries, the signals after channel selection are represented as

The proposed dynamic DTF (DDTF)
In this subsection, DTF is improved, named as DDTF, in which the model order and frequency band are changing and adaptively selected by recognition rates. The detailed introduction takes α band for example.
Step 1: Let us consider a multivariate autoregressive model (MVAR), the present state of MI-EEG signals can be approximated by a weighted sum of p α previous values of all selected channels over α band, and a random noise series is also added: where e α (t) = e α 1 (t), e α 2 (t), . . . , e α N (t) T is the estimation error which is a multivariate uncorrelated white noise sequence with zero mean.
Assume that each element of matrix A α (r) is denoted as a α uv , the coefficient matrix can be set as parts a α uv (r), u = v quantify time-lagged influences between different channels of EEG signals x α u (t) and x α v (t) . The model coefficients can be got iteratively so as to minimize the error between the actual (measured) and predicted values. The model order p α , which denotes the maximum time lag, is used to disclose the influences of past states on the current state. The optimal model order is determined by Schwarz Bayesian Criterion (SBC) [42] and is used for MVAR model fitting to MI-EEG data.
Step 2: Once the MVAR model is fully constructed, the dynamic spectral analysis is performed based on Fourier transform. The B m α f , which has a varying order m α , is defined as follows: where B 0 f = −I (I being the identity matrix). t is the temporal interval between two samples, j represents the imagery unit, f denotes frequency, and we have where X α f and E α f are the transforms of x α (t) and e α (t) , respectively.
Step 3: Equation (7) can be rewritten as follows: Here H m α f is called the transfer matrix, and based on it we define the DDTF from channel s to channel l at frequency f as follows: where H m α ls f is the element in the lth row of column s of transfer matrix H m α f . In addition, a normalization procedure is further executed when θ m α ls f 2 is divided by the quadratic sum of all elements in the lth row as follows: which describes the ratio of influence from channel s to channel l with respect to the joint influence from all the other channels to channel l and has a value between 0 to 1. Value close to 1 means that the channel s has great impact on the channel l while value close to 0 shows that the channel s makes little contribution to channel l.
Step 4: To get the mean value of Eq. (10) under different frequencies, the accumulation over α band is applied: Here f 1 f 1 , f 2 equal to 8 Hz and 13 Hz (i.e., the lower and upper bound of α band), respectively. Perform the same steps for x β (t),x β 1 (t), x β 2 (t) and we get where f 3 , f 4 are 21 Hz and 30 Hz, respectively.

Brain functional network construction based on DDTF
ϒ m α ls reveals the direction and strength between two channels in the total α band. For example, ϒ m α 12 represents the connection intensity from channel 2 to channel 1 and vice versa. The brain functional network of α band then can be constructed with EEG electrodes as nodes (8) , and ϒ m α , the adjacency matrix (AM), as links between channels. Similarly, the brain functional networks of β, β 1 and β 2 bands can also be constructed separately.

Definition of feature parameters
Feature parameters can be obtained according to the brain functional networks and adjacency matrices. Given channel g (g = 1, 2, . . . , N) for example, the gth row of feature matrix ϒ m α ,ϒ m β , ϒ m β 1 and ϒ m β 2 are summed to acquire the inflow information to channel g for α, β,β 1 and β 2 bands, respectively: As a logical sequence, the outflow from channel g is obtained by summing the gth column of adjacency matrix for each band as follows: Inflow characterizes the total information received by a particular destination channel g from other channels while outflow describes gross messages transmitted from the given source node g to the rest of network. Both explore the information interaction processes between specific areas of the brain and other regions.
Furthermore, the inflow and outflow are combined to define information flow (similarly, we take the channel g as an example): .
Information flow indicates the role of channel g playing in the process of information transmission. The larger information flow is, the greater contribution g has to other channels. Conversely, it means that there is barely no or very little info from channel g when the value is approaching to 0.

Construction of a feature vector
Let us take α band for example, there is an OUT m α g and IF m α g for channel g, OUT m α and IF m α can be obtained when the features related to all the N channels are assembled, namely, where N means the number of selected channels. They are fused in serial to form the network feature vector of α band, namely F m α : Analogously, feature vectors of β, β 1 or β 2 band can be acquired as follows: After this process, feature vectors at a characteristic frequency for those directed interconnections are input to SVM classifier to discriminate different motor imagery tasks.

Feature evaluation by SVM
Given each m α , we have the corresponding features F m α and accuracies Acc m α . Similarly, F m β and Acc m β can be obtained when MI-EEG signals are filtered to β band. As is said before, motor imagery response frequency bands are not consistent for each individual subject. Therefore, it is of vital importance to find subjectspecific frequency band, and accuracies in α band and β band are compared. Suppose Acc m α are higher than Acc m β , it means α band is the final frequency band we are seeking for. Instead, the best classification accuracy is hidden in β band. Filter MI-EEG signals to β 1 (13-21 Hz) and β 2 band (21-30 Hz), repeat the above operations and the best accuracy as well as frequency band can ultimately be acquired.

Feature evaluation by SVM
Compare Acc mβ , Acc mβ 1 and Acc mβ 2 Output: the optimal frequency band and the best accuracy 3 Experimental research

Data description and preprocessing
The proposed method was first examined in detail based on a publicly available motor imagery dataset, which was provided by BCI Lab, Graz University of Technology. The datasets generated and analyzed during current study is available in the BCI Competition III Dataset IIIa repository, http:// www. bbci. de/ compe tition/ iii [43]. After that, it was applied to a second motor imagery dataset which was acquired from a real-world experiment. The dataset is available from the corresponding author on reasonable request. For easy of description, the referenced datasets were renamed as Dataset A and Dataset B below.

Dataset A: BCI competition III Dataset IIIa
The public EEG dataset considered originally consists of 60 EEG recordings referenced to the left mastoid and with the right mastoid serving as ground, the electrode position distribution is shown according to the scheme in Fig. 1. EEG was sampled at 250 Hz, it was online filtered by a bandpass filter between 1 and 50 Hz to remove artifacts. Dataset consists of three subjects performing left hand and right hand motor imagery tasks. A training set and a testing set are available for each subject. Both sets contain 45 trials per class for subject 1 (labeled as 'k3b'), and 30 trials per class for subjects 2 and 3 (labeled as 'k6b' and 'l1b'). The subjects sat in a comfortable chair with armrests and were instructed to perform a specific motor imagery task once the relevant cue was shown on a screen, until a fixation cross disappeared. Each trial lasted 8 s. After a trial began, the first 2 s was quiet, at t = 2 s an acoustic stimulus indicated the beginning of the trial, and a cross "+"was displayed; then from t = 3 s an arrow to the left or right was displayed for 1 s; at the same time the subject was asked to imagine left hand or right hand movement, respectively, until the cross disappeared at t = 7 s. The subject took a break and then conducted the next experiment. The MI-EEG collection timing scheme is shown in Fig. 2.

Dataset B: real acquisition data
The real MI-EEG data were acquired by Neuroscan EEG signal recording and analysis system. EEGs were collected using 64 Ag/AgCl electrodes placed according to the international 10-20 system. The subjects are 9 healthy graduate students aged between 23 and 26 (marked as S1-S9, respectively), including two females and seven males. All of them are right-handed and without cognitive impairment. The MI tasks included imagining right-hand pronation and supination, and the sampling frequency was 1000 Hz. In order to ensure the quality of MI-EEG signals, the acquisition experiment of each subject was divided into 5 runs, one run consists of 20 trials (10 for each of two possible classes), i.e., 50 trials per class for each subject. Each subject should be trained in 2 runs of MI tasks before formal acquisition experiment to guarantee he/she was familiar with the experimental process. Each trial lasted 8 s, and the timing diagram is shown in Fig. 3. The first 2 s was preparation period, the "+" cursor appeared on the screen; at t = 2 s the motor imagery prompt tone appeared, and the prompt video was displayed on the screen for 2 s, i.e., imagining the right-hand pronation or supination; the subject performed the corresponding MI task according to the video prompt at 4-8 s; the end sound of MI task appeared on 8 s, the subject took a rest and then proceeded to the next experiment. Thereafter, the original collected MI-EEG signals were down-sampled to 250 Hz and the ocular artifacts were removed. Then, a CAR spatial filter was used for baseline correction. The 4.5-7.5 s MI-EEG was further intercepted and filtered to different frequency bands. Meanwhile, 28 channels covered by green circles in Fig. 4 are selected to reduce information redundancy

Adaptive selection of the best frequency band and m value for each subject
For subject 'k3b' , MVAR were fitted to the preprocessed MI-EEG data (both α and β band) and the model order of α and β band were p α = 3 and p β = 8, respectively. As is known to us all, the wider frequency band the signal, the more complex the signal is. Obviously, a higher model order is needed to match and describe the signal. Furthermore, B m α f and B m β f were calculated according to Eq. (6). Following the steps in Sect. 2, features F m α and F m β were finally obtained and then assessed by SVM. For comparative research, experiments were also carried on MI-EEG signals filtered to 8-30 Hz with the model order of 10. The 10 × 10-fold cross-validation (CV) methodology was used to eliminate the contingency in feature extraction of MI-EEG signals as well as increase the reliability of experimental results, i.e., it uses nine observations to train the classifiers and the remaining one for testing the model. The final classification accuracy is the average of 10 runs [44], as is shown in Fig. 5a and m refers to the varying model order in subsequent sections. It can be clearly seen from Fig. 5a that the highest recognition rate without frequency band selection, i.e., 91.67%, is lower than 96.78%, which is the best accuracy after band selection. In addition, classification accuracies of β band are always higher than those of α band. Considering β band is broader and may contain redundant information, it is refined to find the most active band for recognition. Under this consideration, we separated β band into 2 sub-bands, i.e., β 1 band or lower β band (13-21 Hz) and β 2 band or higher β band (21-30 Hz). The same experiments were performed on MI-EEG signals in the two sub-bands, as is displayed in Algorithm. The model orders of β 1 and β 2 band were 5 and final recognition rates are exhibited in Fig. 5b for better observations. In Fig. 5b, the signals in β 2 band possess higher accuracies than those in β 1 band except m is 1. Unbelievably and reasonably, the recognition rate reaches 100% when m is set to 2 in β 2 band.
The same operations were carried on for subjects 'k6b' and 'l1b' , the average recognition rates with  10 × 10-fold CV are demonstrated in Fig. 6a for 'k6b' and Fig. 6b for 'l1b' , respectively. Different from subject 'k3b' , subjects 'k6b' and 'l1b' show better separability in α band comparing to β band under left/right hand motor imagery tasks. Nevertheless, subject 'l1b' performs better than 'k6b' and gets higher accuracies. This phenomenon perfectly tallies with subject-based characteristic of MI-EEG signals. Affected by internal and external environments, subjects show great diversities. Even for the same subject, the recognition rates vary a lot in various frequency bands, as shown in Fig. 5a, b. Moreover, classification results of MI-EEG signals filtered to  Hz are worse than those under adaptive frequency band selection ( α band) for subjects 'k6b' and 'l1b' . All the above reveals that dividing EEG signals into varying frequency bands can improve the classification performances.
Similarly, recognition rates get largely promoted when m is set to 1 or 2 instead of the original model order p despite in either frequency band for all the subjects. This is because MI-EEG signals are non-stationary in frequency domain, the information of previous 1 or 2 moment is much more related to the current state and thus is more beneficial for recognition than all the information is considered. According to Figs Table 1.

Classification rates without frequency band selection
In this section, the classification rates without frequency band selection for 3 subjects are displayed in Fig. 7 for convenient observations.
By comparing the results in Fig. 7 with those in Figs. 5 and 6, we can find the best results without frequency band selection for subjects 'k3b' , 'k6b' and 'l1b' are 91.67%, 65.58% and 78.00%, respectively, which are much lower than 100.00%, 82.25% and 99.75% with frequency band selection. It further verifies that adaptive selection of the best frequency band for individual subject can improve the classification rates. In the meantime, it is observed that the best classification accuracies are achieved when m value was set to 2 for all the three subjects and this phenomenon coincides with the above ones. What's more, accuracies gain a certain degree of improvement ranging from 1.75 to 4.39% under selection of m values, which is lower than 8.33-17.06% under selection of both best frequency band and m values. This indicates that the double selection of the best frequency band and m values for each subject can yield the best results, which demonstrates the effectiveness of DDTF.

Visualizations of adjacency matrices, brain functional networks, outflows and inflows
It is noted from the results in Sects. 3.2.1 and 3.2.2 that recognition rates get promoted when m equals to 1 or 2 in any frequency band for each subject, so we take one trial of 'k3b' in α band for example to see the distinctions of adjacency matrices, brain functional networks, outflows and inflows under different m values. In addition, we denote 'LH' task as imagine left hand movement and 'RH' task as imagine right hand movement for convenient expressions. The adjacency matrices under different m values for 'LH' and 'RH' tasks are expressed in Fig. 8. Horizontal and vertical axes represent the new channel number, elements in the matrices reveal the direction and strength of information between two channels. The closer the color is to red, the higher the intensity. On the contrary, the closer the color is to blue, the lower the intensity. Based on graph theory, the corresponding brain functional networks are constructed with EEG electrodes as nodes and the adjacency matrices as links between channels, as shown in Fig. 9. As shown in Figs. 8 and 9, channels located in the central parietal area, which are much more related with MI, have higher intensities. Conversely,  channels far away from central parietal area have lower intensities because they are less related with motor activities. Figures 10 and 11 demonstrate the outflows and inflows from all channels, respectively. Considering the spatial distribution of the networks, we can note from the figure that the electrodes located in the central parietal area, which are reported to be related with somatosensory and motor activity [45,46], have larger outflows. Meanwhile, the contralateral primary sensorimotor area was activated during MI when one of the upper limbs was involved, information originates mainly from right parts of the brain when imaging left hand movement and vice versa. It is generally accepted that the movement of a body is denominated by the contralateral part of the brain. The adjacency matrices, brain functional networks, outflows and inflows in β band of 'k3b' when m equals to 2 are also visualized in Fig. 12 for comparison. It can be observed that there are considerable differences between α and β band, which further provides theoretical basis for our method.

Experimental results on Dataset B
Aiming a better insight of potential performance in real acquisition data, DDTF is extended to Dataset B. Under the same experimental procedure in Sect. 3.2, the 10 × 10-fold cross-validation classification accuracies are shown in Fig. 13 when DDTF is applied to extract features from 9 subjects, and the best parameters for each subject are summarized in Table 2. It can be found from Fig. 13 and Table 2 that the optimal frequency bands are varying for different subjects. Even for the same subject, the classification results over different frequency bands also have a significant disparity, which fully reflect the individual differences of MI-EEG. Therefore, the personalized selection of parameters will be helpful to improve the recognition rate for each subject. Not coincidentally, for any frequency band of all subjects, the recognition rates are greatly improved when m is equal to 1 or 2 instead of the original order p, which is consistent with the conclusion of Dataset A and further indicates the correctness and universality of DDTF.

Comparison with GC-based brain functional network methods
To verify the validity of DDTF, some experiments were carried out to compare with GC-based brain functional network methods by which MI-EEG signals were filtered to 8-30 Hz and applied to construction of GC-based brain functional network, the extracted feature vectors were fed to SVM classifier. The average recognition results of 10 × 10-fold CV are displayed in Fig. 14. From Fig. 14, it is easy to see that the classification performance resulted by Time-domain GC (TGC) is not ideal, this is mainly because this method only considers the interactions in time domain while ignoring the spectral properties of MI-EEG signals. Frequency-domain GC (FGC) uses both time and frequency information, which yields a slightly higher accuracies. However, these two methods just focus on information flow between two channels without considering the integrality of the whole brain. Despite the results of TGC and FGC being poor, DTF shows the advantages of multivariate autoregressive methods over traditional univariate methods for each subject in terms of classification accuracy. In addition, for each subject, the recognition rates obtained by using DDTF are significantly enhanced than those by using DTF. The time domain model and frequency domain model share the same model order in DTF, however, the frequency domain model cannot truly be used to construct and embody the changes of MI-EEG brain functional networks. Moreover, the individual differences of allied cognitive tasks are observed in DDTF, which produces better quality information. For all subjects, our method achieves relatively higher recognition accuracies than other GC-based methods, indicating the superiority of DDTF.

Statistical analysis
(1) Kappa coefficient In this section, more statistical analyses were executed to confirm the effectiveness of DDTF. The kappa coefficient, which is generally thought to be a more robust measure than simple percent agreement calculation, takes into account the agreement occurring by chance. It is a common indicator for evaluating the performance of BCI systems [47]. The calculation of kappa coefficient is defined as: where p 0 indicates the classification accuracy, p e represents the probability of opportunity consistency. For a two-class task, if the number of samples across classes is equal, then the value of p e is 0.5. According to Eq. (18), the kappa coefficients of TGC, FGC, DTF as well as DDTF with 10 × 10-fold CV were calculated. The results are shown in Fig. 15. Figure 15 indicates that DDTF achieves the highest kappa value among four methods for each subject. Especially compared to the original DTF, DDTF makes the kappa value increase by 0.39 for subject 'k6b' and 0.45 for subject 'l1b' . Although there are variations in kappa values for different subjects, the mean kappa value of DDTF is improved by 0.37, 0.66 and 0.68 compared to DTF, FGC and TGC methods, respectively, which reveals that DDTF has better consistency in classification. Furthermore, the mean value of DDTF is 0.88, which is higher than 0.8, this illuminates a very good level of agreement.

(2) t-Test
To further analyze the differences between DTF and DDTF statistically, a two-sample t-test is proceeded to inspect whether there is a significant difference when they are available for MI-EEG feature extrac- Adjacency matrices, brain functional networks, outflows and inflows of 'k3b' in β frequency band a AM for 'LH' task, b BFN for 'LH' task, c AM for 'RH' task, d BFN for 'RH' task, e outflows for 'LH' task, f inflows for 'LH' task, g outflows for 'RH' task, h inflows for 'RH' task tion. Suppose that M DTF and M DDTF denote the mean values of tenfold CV accuracies generated by DTF and DDTF, similarly, S 2 DTF and S 2 DDTF stand for the variances, n DTF and n DDTF express the numbers of the results for the two methods, respectively. Then, the t-test statistic is calculated as follows: (19) p = P t > t µ (n DDTF + n DTF−2 ) ≤ 0.05. It can be calculated that the values of p for 3 subjects are 2.0402 × 10 -16 , 3.0029 × 10 -22 and 4.8337 × 10 -19 , respectively, and they are all less than 0.05. Hence, the null hypothesis H 0 is rejected at the 0.05 significance level, which implies DDTF outperforms DTF in MI-EEG feature extraction.

Comparison with multiple traditional feature extraction methods
CSP and its variants have been widely applied in feature extraction of MI-EEG and gained good recognition results based on BCI competition III Dataset. To further illustrate the feasibility of DDTF in this paper, the experiments were conducted to compare with multiple CSPbased methods in references [7][8][9]47]. Table 3 illustrates  the detailed information. It is clear that DDTF achieves the highest recognition rate of 100.00% and 99.75% for subjects 'k3b' and 'l1b' , respectively, and the average classification accuracy, i.e., 94.00%, is superior to the best one with 91.87% in the CSP-based methods. The variants of CSP methods extract features in consideration of multi-channel and spatial distribution characteristics of MI-EEG signals while pitifully neglecting the relationships among EEG sensors. DDTF effectively excavates the interrelationship between multi-channel EEG signals, correctly analyzes the information exchange over the whole brain, and has better applicability in extracting features from MI-EEG signals.

Comparative experiments on Dataset B
In this section, DDTF was compared with the original DTF and GC-based BFN methods on Dataset B, and the results are shown in Fig. 16 and Table 4. It can be seen from Fig. 16 that for each subject, the classification accuracy of DTF is significantly higher than those of TGC, FGC and DTF when used for MI-EEG feature extraction. The average classification accuracy of DDTF increases by 14% compared to DTF, and S6 gets the highest, i.e., 26%. The Kappa coefficients in Table 4 shows that DDTF has an significant improvement in different degrees for 9 subjects, and the average Kappa value of DDTF is 0.61, which has an increase of 0.31, 0.41 and 0.38 relative to those of DTF, TGC and FGC, respectively, revealing that DDTF has the best consistency.

Discussion
In this paper, DDTF was proposed for construction and feature extraction of brain functional network. In DDTF, the optimal frequency band is adaptively selected for each subject, which preferably detects the subject-based feature of MI-EEG, and B m f is defined with a varying order m , this is helpful to improve the classification rates. In addition, the best classification accuracy in any frequency band is achieved for each subject when m value equals to 1 or 2.
To seek for the reasons, we took subject 'k3b' from Dataset A as an example and drew the changing curves of channels 27 and 35 MI-EEG signals in time and frequency domains, which are illustrated as Fig. 17. Figure 17a-e shows the variations of channels 27 and 35 in time and frequency domains when filtered to α, β, β 1 , β 2 and α + β frequency bands, respectively. As is known to all, MI-EEG signals are approximately stationary in short time intervals when the MVAR model is constructed in time domain. Therefore, the time domain MVAR model can express the changes of time  domain signals correctly. However, the non-stationarity of frequency domain signals becomes very intense because of the activated characteristics generated by motor imagery in either α or β frequency band, which can be clearly seen from Fig. 17. Although DTF can correctly reflect the quantitative relationships between the time and frequency domain model, it may not effectively express the characteristics of the frequency domain MI-EEG signals. Therefore, we built the frequency domain MVAR model with model order of 1 based on the MI-EEG filtered to α band, the results perfectly match with the non-stationarity of MI-EEGs in frequency domain. In [26][27][28], the time domain model and frequency domain model share the same model order, which ignores or weakens the non-stationarity of frequency domain signals. Particularly, DDTF is the same as DTF [26][27][28] when m α equals to p α while DDTF can represent the variation characteristics of MI-EEG in frequency domain and the constructed brain functional networks are more veritable and objective. The same experiments were also carried on other subjects and we got similar conclusions. The adjacency matrices and brain functional networks in Figs. 8 and 9 show that with the increase of m values, information transfers get sharp augments and brain functional networks are in the transition from transient state to stable state. Figure 10 indicates the electrodes with larger outflows are located in the central parietal area which are related to motor activity, and the information originates mainly from right part of the brain when imaging left hand movement and vice versa, which is in accordance with the well-accepted theory. What is said above further provides theoretical support for our method.
In addition, DDTF were compared with GC-based BFN feature extraction methods, and DDTF achieved the highest average classification accuracies based on two Datasets and 12 subjects, demonstrating its effectiveness in discrimination of motor imagery tasks, as shown in Figs. 14 and 16. Robustness test was also performed on both Datasets A and B by using Kappa coefficient, the results are shown in Fig. 15 and Table 4. It means that the proposed DDTF achieved the relatively higher Kappa values and even got greatly improvement for each subject compared to GCbased methods and original DTF. This indicates DDTF has good adaptive ability to different subjects, the main reason may be that DDTF can dynamically determine the specific model order and related frequency bands according to the cortical activation induced by motor imagery. Besides, a two-sample t-test statistic was designed to explore whether there was a significant difference between DTF and DDTF in MI-EEG feature extraction. The results implied the superiority and feasibility of DDTF in brain functional network-based feature extraction. This provided a new idea for extracting the features of MI-EEG as well as enhancing the adaptivity of feature extraction.

Conclusions
A dynamic DTF, called DDTF, was developed in this study. It is calculated based on a dynamic frequency domain model with a lower order, and only the most related information is beneficial for recognition of MI-EEG, i.e., the previous 1 or 2 moment is much more related to the current state, and the brain functional network changes from transient state to stable state with the increment of model order. Meanwhile, the best frequency band can be adaptively sought for each individual subject. This makes it more closely coincident with the subject-based characteristic of MI-EEG, yielding better features and recognition rates. Extended experimental results have suggested that DDTF achieves excellent performance in brain functional network-based feature extraction. In the future study, we intend to integrate the DDTF-based brain functional network with other feature extraction methods to improve its property. In addition, it will provide a good prospect for disease diagnosis, seizure detection and rehabilitation effect evaluation.