Localization of epileptic seizure focus by computerized analysis of fMRI recordings

By computerized analysis of cortical activity recorded via fMRI for pediatric epilepsy patients, we implement algorithmic localization of epileptic seizure focus within one of eight cortical lobes. Our innovative machine learning techniques involve intensive analysis of large matrices of mutual information coefficients between pairs of anatomically identified cortical regions. Drastic selection of pairs of regions with biologically significant inter-connectivity provides efficient inputs for our multi-layer perceptron (MLP) classifier. By imposing rigorous parameter parsimony to avoid overfitting, we construct a small-size MLP with very good percentages of successful classification.


Introduction
Epilepsy is a common brain disorder that affects 6 out of 1000 children in the United States [36,52]. Although medication is the mainstay of treatment for these patients, seizures are resistant to anti-epileptogenic drugs in approximately 30% of children. Such medically refractory epilepsies are frequently "focal", defined as originating habitually from a relatively localized region of the brain. In appropriately selected patients, resection of this localized seizure origin can be curative. Despite the promise of this approach, surgical outcomes remain highly variable, even in optimal candidates [6,7,17,24]. Emerging studies motivated by these inconsistencies have demonstrated that pediatric focal epilepsy is not a localized disorder, but rather a disease of large-scale, distributed neural networks [40]. Furthermore, rather than resulting from abnormal activity generated by one or more abnormal foci, the primary organization of seizure origin occurs within a functionally and anatomically connected set of brain regions-a seizure network [2]. Although the exact relationship between ictal dynamics and the inter-ictal seizure network remains to be defined, pathologic connectivity within the seizure network likely contributes to the transition between inter-ictal and ictal states [40,43]. In other words, the capacity of this network for seizure generation results not only from abnormal hyperactivity within intrinsically dysplastic neural elements, but also from abnormal interactions between otherwise "normal" brain predisposed by pathologic interictal connectivity within the seizure network [38,43]. In some cases this network may be relatively restricted in extent; at other times, it encompasses neuronal populations dispersed throughout multiple distant brain structures [40,50]. Regardless of extent, failure to sufficiently resect or disconnect the seizure network may be associated with continued seizures after removal of the seizure focus, thereby contributing to surgical failure [1,18]. At the current time, there is no modality that can accurately map the extent of the seizure network.
Advances in MRI combined with mathematical network approaches have demonstrated great potential to study brain networks non-invasively in a variety of populations, including children with focal epilepsy [4,5,20,21,41,[45][46][47]. Resting-state functional MRI is one method by which connectivity in the brain can be measured. This sequence quantifies the blood oxygen level dependent (BOLD) signal over time as an indirect measure of neuronal activity [3,25]. Functionally connected elements of the brain exhibit similar spontaneous BOLD signal fluctuations at rest [3], which enables one to capture interactions between brain regions in vivo. Within the network framework, the brain is represented as a graph, a mathematical construct consisting of a set of nodes connected by edges [12,14,26,28,30,41,42]. The nodes are defined as anatomical regions of the brain; the edge between each pair of nodes is estimated as the magnitude of association between their BOLD time series [14,23,29,34,53]. Studies based on such resting state functional network constructs have consistently shown global aberrations of the epileptic brain from normal; they have also demonstrated that network features have the potential to predict clinically relevant aspects of brain function in children with epilepsy [22,26,28,30,32,38,44,55]. However, little data exist regarding the use of neuroimaging-defined network models to identify the seizure network in an individual child.
The goal of this study was to assess the potential of MR imaging-defined networks to depict seizure networks. Specifically, we trained an artificial neural network (ANN) to identify the location of seizure origin given access only to features derived from a restingstate functional network. We measured the accuracy of the learning algorithm against the lobe of seizure origin determined at multidisciplinary epilepsy surgery conference. We selected this reference standard-which relies on the consensus of multiple objective tests as well as the clinical expertise from multiple disciplines-as an accurate and reliable assessment of the lobe of seizure origin [18,56]. Identification of the lobe of seizure origin can be considered an initial step toward the ultimate validation of the potential for resting-state functional networks to visualize seizure networks in children with focal epilepsy.

Patients
The HIPAA-compliant study was approved by the local institutional review board. Informed consent was waived for this retrospective study. Patient medical records were retrospectively reviewed to identify patients with the following inclusion criteria: (1) pediatric age group (21 years of age or younger); (2) a clinical diagnosis of focal epilepsy; (3) an available 3-T MR imaging of brain, including a resting-state fMRI sequence; and (4) lobe of seizure origin identified at multidisciplinary epilepsy conference. This determination relies on the consensus of multiple objective tests (including MRI and EEG) as well as the clinical expertise from multiple disciplines (including Neurology, Neurosurgery, and Neuroradiology) [19,20]. Images were performed from January 2012 to December 2017. Forty-six patients were included. Lobe diagnoses for the cohort are presented in Table 1.

MR imaging
All imaging was performed on a 3-T Achieva system (Philips, Andover, Massachusetts) with a 32-channel phased array coil. The following sequences were obtained: 1.

Image processing and analysis
The processing pipeline was implemented using MAT-LAB scripts (version 7.13, MathWorks, Inc) in which adapter functions were embedded to execute FreeSurfer reconstruction (version 5.3.0) and several FMRIB Software Library (FSL) suite tools [39]. Details regarding this pathway have been previously described [26,28,30]. A brief summary is provided here.
The reference space was created from images of one patient in our database, who had no visible abnormality and with optimal registration to MNI space [10]. Structural imaging data for each patient were aligned to a standard reference template (MNI152) using FSL's non-linear registration algorithm [37,39]. Nodes in the network were defined on the template according to parcellation of whole-brain gray matter as follows: First, FreeSurfer reconstruction of cerebral cortical surfaces was performed on the T1 structural image. This processing stream includes motion correction, skull stripping, intensity normalization, segmentation of white matter and gray matter structures, parcellation of the gray matter and white matter boundary, and surface deformation following intensity gradients which Table 1 Lobe diagnosis of the cohort The 5 classes are not perfectly disjoint because 4 patients were ambiguously diagnosed by clinicians and thus belong to 2 classes simultaneously (RT/C5, RF/ RT, LF/LT, LT/RF) optimally place the gray matter/white matter and gray matter/cerebrospinal fluid borders [8,9]. The pial and gray-white surfaces were visually inspected using the Freeview software for accurate placement.
Next, a self-developed MATLAB program was applied to the FreeSurfer output to further subdivide the 148 standard gray matter regions according to their surface area [26][27][28][29][30]. During this process, each region was iteratively divided into two new regions of equal size until the surface area of each parcel (as defined on the FreeSurfer gray-white surface mesh) was less than a size threshold of 350 mm 2 . The final parcellation contained 780 nodes. Each surface parcel was then converted into a volume mask of gray matter at that region to form a node on the network. All nodes defined in reference space were transformed into each individual patient's space by applying the non-linear transformation matrix (12 degrees-of-freedom) obtained during registration.
The first 5 volumes in each resting state functional data were removed to allow magnetization to reach equilibrium. Standard preprocessing and independent component analysis (ICA) of the functional data sets was performed using FSL MELODIC [39], consisting of motion correction, interleaved slice timing correction, brain extraction, spatial smoothing with a Gaussian kernel full width at half maximum of 5 mm, and high pass temporal filtering equivalent to 100 s (0.01 Hz). Affine boundary-based registration as implemented in FSL FLIRT was then used to align the pre-processed functional image volumes for each patient to that individual's structural T1 dataset using linear registration [19]. The inverse transformation matrix was calculated in this step and subsequently used to transform all masks from structural to functional space. Mean BOLD-signal time series were then computed for each node.

Mutual information between discrete random variables
Mutual information MI(X, Y ) between two random variables X and Y quantifies the amount of information brought by the knowledge of X for the prediction ofY . This quantification is symmetric in X and Y , and is based on the concept of conditional entropy. When X and Y can take only a finite number of values denoted x k and y m , their marginal and joint probability distributions are: Then the entropies of X, Y , and Z = (X, Y ) are given by: The mutual information between X and Y is then given by: where MI(X, Y ) = 0 if and only if X and Y are independent. When X and Y have a bivariate Gaussian distribution, the mutual information MI(X, Y ) is also computable as an explicit increasing function of the squared correlation: However, in the non-Gaussian case, this formula is not valid. Mutual information can offer major advantages over correlation, especially in the pathologic brain [33,35,54]. Mutual information has remarkable invariance property which we now recall. For any random variables (U , V ) and any strictly increasing functions f and g , the random variables X = f (U) and Y = g(V ) verify: In this formula, f and g can be non-linear functions. This MI invariance property is not satisfied by correlations, since Cor(U, V ) and Cor(f (U ), g(V )) are generally different unless f and g are both linear functions. In the study of fMRI recordings of brain activity, the strong functional invariance of mutual information offers an interesting advantage, which we now outline.

Advantages of mutual information to estimate cortex connectivity
Denote our N = 780 cortex parcels by CP 1 , CP 2 , . . . , CP N . For any two parcels CP i , and CP j , brain activity recordings by fMRI provide, after standard pre-treatments, two sequences of n = 295 recordings: which can be considered as random samples of n observations for two random variables Y i and Y j . To evaluate the level of neural interaction between the two parcels i and j , many publications use the correlations Cor(Y i , Y j ) or their absolute values, which can easily be estimated from the data sequences S(i) and S(j). In this paper, we have preferred instead to use the same data S(i) and S(j) to estimate the mutual information MI(Y i , Y j ) . We now indicate why this choice has a strong pragmatic and theoretical advantage over the use of Cor(Y i , Y j ) . Indeed, the actual values Y i (n) and Y j (n) generated by fMRI recordings after pretreatment are essentially of the form: where at time n, the numbers BOL i (n) and BOL j (n) represent the BOLD (blood oxygen level dependent) signals for the two parcels CP i and CP j . Here, the functions f (u) and g(v) are two increasing functions of u and v , which are essentially unknown to the experimenter because they are determined by many variable factors such as the actual hardware settings of the fMRI acquisition modalities, the meta-parameters of the fMRI signal reconstruction software, the geometric positions of CP i and CP j within the cortex, etc. Hence, contrary to the actually observed data sequences S(i) and S(j) , the two BOLD signal sequences and BS j = [BOL j (1), . . . , BOL j (n)] are in fact not directly observable. Consider the sequences BS(i) and BS j as random samples from two random variables BOL i and BOL j . Since the sequences S j and BS j are linked by the function f , the observable random vari- Most changes in the instrumental settings of the fMRI hardware may modify f and g in an unknown way. Moreover, if one changes the parcels, the functions f and g will also be modified. Fortunately, due to the functional invariance of mutual information mentioned above, we have: for arbitrary increasing functions f and g . By contrast, Cor(Y i , Y j ) and Cor(BOL i , BOL j ) can generally be quite different unless f and g are linear functions, which is quite unlikely in this context. So to estimate cortex connectivity by quantifying the strength of pairwise interactions between the BOLD signals BOL i , BOL j at parcels CP i and CP j , the mutual information MI(Y i , Y j ) will be more stable and more relevant than the correlation Cor(Y i , Y j ) . Consistent with theoretical superiority, the real-world advantages of MI have been documented in this setting [35,54].

Numerical computation of mutual information matrix
As above, we expect strong values of MI i,j = MI Y i , Y j to detect the frequent presence of simultaneous high blood oxygen levels BOL i and BOL j within cortex parcels CP i and CP j , a phenomenon indicative of strong neuronal interactivity between CP i and CP j . Each observed time series Y i (t) involves n = 295 observations of the continuous random variable Y i , we discretize each Y i to transform it into a random variable taking only 5 values. To this end we first compute 6 quantiles of the n observations Y i (t) , at levels 0, 20, 40, 60, 80, 100%. This subdivides the range of To evaluate the accuracy of our mutual information estimates, we have applied a variant of the "Jackknife resampling" technique by randomly taking out 10% of time points (29 time points) and re-calculating MI on the new data set. This process has been repeated 1000 times and the absolute error of MI estimation has been calculated. For 60% of the mutual information matrix coefficients MI i,j , the relative estimation error on MI i,j is inferior to 10%. The use of 20 bins instead of 5 bins to generate the discretized Y i (t) tends to double the average errors of estimation for the MI i,j , as can be expected from theory. Indeed, our exploration of more detailed discretization, and even optimized discretization, has shown that using 5 bins of equal frequency 20% for each Y i was quite effective in our context. Figure 1 displays two nodes whose BOLD time courses have high MI and another two nodes whose time courses have low MI.

Mutual information between pairs of cortex subregions
For each patient, there are roughly 780 parcels of size 350 mm 2 and for each parcel one time series with 295 recordings. We characterize the cortex connectivity of each patient by the symmetric 780 × 780 mutual information matrix MI i,j . The coefficients of this MI matrix provide a very large number of features ∼ = 780 2 /2 for each patient [31]. To reduce the number of features we have developed and implemented an innovative multiscale analysis of these MI matrices.  MI(A, B) . We have thus separately tested 4 quantiles namely 70, 75, 80, 85%. These tests showed that, using any one of these 4 possible quantiles to implement our features selection algorithm, we ended up with fairly similar performances for our neural network classifier, with a small qualitative advantage for the 75% quantile. This explains our choice for the definition of the regional MI coefficients MI(A, B).
We then compute the coefficients MIreg(m, n) = MI(REG m , REG n ) for any two regions in our list of 148 anatomically defined cortex regions to obtain for each patient a symmetric matrix MIreg of "inter-region connectivity". MIreg has size 148 × 148, and each one of its (148*149)/2 = 11,026 distinct coefficients will provide 11,026 potential input features MIreg(m, n) for our patient classification task. We have also studied the symmetric 10 × 10 symmetric matrices MIlob of mutual information between our 10

Automatic classification by multilayer perceptron
The chart in Fig. 3 shows the workflow of the pipeline from fMRI time series to epileptic seizure focus localization.
Here we explain the MLP architecture and its input selection procedure.

Architecture of our multi-layer perceptron (MLP)
To implement automatic classification of seizure origin by computer analysis of fMRI recordings of resting state brain activity, we have decided to use and evaluate Artificial Neural Networks (ANNs). Our ANNs are feed forward MLPs (multi-layer perceptrons), technically similar to the classical type of ANNs used to classify handwritten digits (MNIST data).
We first implement, as detailed further on, our feature selection algorithm which extracts a set of f highly discriminating input features, selected among our 11,026 region/region mutual information coefficients MIreg(m, n). For each patient, the f selected input features are encoded by f "neurons" which constitute the 1st layer of our MLP. The 2nd layer (called the hidden layer) contains h neurons, with all-to-all "synaptic" links between input layer and hidden layer. Each hidden neuron has a Sigmoid Response Function combining the f input features and using f + 1 unknown parameters ( f weights and one threshold). The 3rd layer of our MLP contains k = 5 neurons (1 neuron for each one of our k classes of patients). Each one of these k neurons is linked to all the h hidden neurons and has a Sigmoid Response function, involving h + 1 unknown parameters. The 4th MLP layer is the output layer and contains k neurons. Each output neuron is linked to all the k neurons of the 3rd layer. The response function of output neuron j is classically computed by the SoftMax formula, which involves no unknown parameter, and which generates the probability p(j) that the current f inputs come from a patient belonging to class CL(j).

Parameters parsimony constraints for the MLP inputs
Most MLPs trained on very large training data sets, such as the MNIST data, have many layers successively generated by deep learning, and a quite large number of neurons. But here, given the moderate number of diagnosed patients in our data set, we have instead focused on implementing a radical reduction of the number of weights and thresholds to be learned during MLP training, in order to enforce a robust learning of synaptic Fig. 3 Flowchart for the pipeline weights and thresholds. Enforcing a strong parsimony for the number of unknown parameters (weights and thresholds) to be "learned" by an MLP is indeed known to enhance the robustness of automatic classifiers [15,16]. Indeed, deep theoretical results of Vapnik [48] have demonstrated that this is a good recipe to perform stable automatic learning on a moderately large training set.
In such situations, feedforward neural networks (such as MLPs) are known to exhibit better generalization capacity and less overfitting when their architecture involves fewer weights. But one still wants the MLPs to achieve good classification performance, which often requires to increase the number of weights. Our approach to implement an ideal balance between these two criteria has involved a meticulous algorithmic selection of a small number of highly discriminating input features for our MLPs. Let " f " be the number of selected input features and " k " be the number of patient classes. Our MLP classifiers involve 4 successive layers of artificial neurons: an input layer of size f , a hidden layer of size h , a 3rd layer of size k = 5 , and finally an output layer of size k which computes k probabilities p(1) . . . p(k) verifying p(1) + · · · + p(k) = 1 . The number w of weights and thresholds (also called "unknown parameters") of such an MLP is given by: Call W the vector of all w unknown parameters. Each patient provides a vector U of f input values for our MLP, and the outputs p(1) . . . p(k) computed by the MLP are all of the form p j = G j (U, W ) where G j is an explicit non-linear function of U and W . But these functions are linearly dependent because by construction one always has G 1 + · · · + G k = 1. For the nth training case, the input U n is known, as well as the true values v 1 (n) . . . v k (n) for the k MLP outputs p(1) . . . p(k) . In the coding of true outputs, for a case in class CL(j) , the only non-zero true output is v j = 1 , so that v 1 + · · · + v k = 1. Hence the nth training case yields k non-linear equations satisfied by the vector W of unknown parameters, namely: But these k equations are linearly dependent because as seen above their sum is always equal to 1. Thus, each training case yields in fact only (k − 1) non-linear independent constraints to be satisfied by the w unknown parameters. Hence a training set of size S provides (k − 1) × S non-linear equations to be satisfied by w unknown parameters. To avoid overfitting, one should "ideally" have w < (k − 1)S , which yields the following dimensional constraint on " h " and " f ": Parameters parsimony constraint h f + k + 1 < (k − 1)S − k.
Here, we have k = 5 patient classes and S = 45 because we use leave-one-out training. Hence, the parameters parsimony constraint on (h, f ) becomes here as h f + 6 ≤ 175.

Implementation of automatic learning
After selecting f highly discriminating input features for our MLP, we also fix the size h of its hidden layer, making sure that f and h verify the parsimony constraint h f + 6 ≤ 175 . During training of this MLP, when a case is ambiguously diagnosed as belonging to 2 classes CL(i) and CL(j) , we assign to this case MLP outputs v i = v j = 1/2 . Automatic learning is performed by standard Gradient Descent, to minimize the average "Cross Entropy" between true outputs and MLP-computed outputs. Indeed, for classification tasks, minimizing cross entropy is generally more efficient than minimizing mean squared error.
Since our training set is of moderate size, we implement the classical leave-one-out technique. Namely, one eliminates one patient from the training set before performing automatic learning; then one evaluates by 0 or 1 the correctness of our MLP classification of this left-out patient. Final performance is the average of these evaluations over all possible choices of the left-out patient.

Input selection for minimal size MLP classifiers
We want to select "features" discriminating between 5 classes of patients CL(1),…CL(5) of sizes s(1) . . . s(5) . This requires solving at least 10 basic tasks of the type "discriminate CL(p) versus CL(q) ". Consider any feature F computable from fMRI recordings, and hence providing for each patient PAT a number F (PAT ). When J p (F ) and J q (F ) are nearly disjoint intervals, we naturally consider that the classes CL(p) and CL(q) are strongly separated by feature F.To quantify this notion, define the separability sep(J , J ′ ) of two intervals J and J′ with intersection I = J ∩ J ′ by: where K is the length of interval K . Then 0 ≤ sep(J , J ′ ) ≤ 1 , and values of sep(J , J ′ ) close to 1 mean that J , J ′ are nearly disjoint. Now define DIS p,q (F ) = sep(J p (F ), J q (F )) as the discriminating power of feature F to differentiate between classes CL(p) and CL(q).
To select a small but efficient set of " f " input features for our MLP classifier, we start from the (148)(149)/2 mutual information coefficients MIreg(m, n) at the cortex regions "scale". For each pair m ≤ n , the "regions based" feature F m,n = MIreg(m, n) provides for each patient an indicator of the neuronal interaction level between cortex regions REG m and REG n . Fix any two distinct classes CL(p), CL(q). For all (m, n) with 1 ≤ m ≤ n ≤ 148 , we compute the power POW (m, n) = DIS p,q (F m,n ) of feature F m,n to discriminate CL(p) vs CL(q) . Then we rank the features F m,n by decreasing POW (m, n) , and select only the top two F m,n having highest discriminating power POW (m, n).

Inter-region connectivity input to MLP
For each one of the 10 pairs CL(p), CL(q) with 1 ≤ p < q ≤ 5, the preceding selection algorithm selects two regions Reg m , Reg n . This algorithm, then, generates a set of 20 pairs of cortex regions REG m , REG n with strongly discriminating coefficients MIreg(m, n) (Fig. 4).
To respect the parameter parsimony constraint, we keep only a set of 18 of these Inter-region Connectivity Coefficients, to be used below as a set of 18 inputs features F m1,n1 , . . . , F m18,n18 for our first MLP classifier.

Intra-region connectivity input to MLP
The intra-region connectivity coefficient MIreg(m, m) of cortex region Reg m is an indicator of the level of neuronal interactions within Reg m . The selection algorithm outlined above can detect the regions Reg m for which the feature F m,m = MIreg(m, m) has strong power POW (m, m) to discriminate between at least two classes CL(p), CL(q). We have thus selected a set of 21 such cortex regions, 21 intra-connectivity coefficients (Fig. 5), from which we keep only a set of 18 coefficients to be used below as input features of our second MLP classifier.

Discriminating power for inter-and intra-region connectivity coefficients:
For each one of the 10 basic discrimination tasks CL(p) vs CL(q) the highest discriminating power DIS(p, q) reached for this task among all the connectivity coefficients MIreg(m, n), is a crude indicator of "how well" the task may be solved by an efficient classifier. In Fig. 6, we display in red the ten DIS(p, q) values reached by inter-connectivity coefficients and in blue the DIS(p, q) values reached by intra-connectivity coefficients. Clearly the red values dominate the blue values.

MLP classifier based on inter-region connectivity coefficients
The set of 18 inter-connectivity coefficient MIreg(m, n) selected above provides the f = 18 inputs of our first MLP classifier, for which we impose a hidden layer size h = 7 . The 4 layers of this MLP have then sizes (18,7,5,5). Our parameter parsimony equation becomes h f + 6 = 168 < 175 , and is hence correctly verified, again assuring a reasonable robustness of the MLP training results.
After training this small-size MLP, we evaluate its performances by leave-one-out technique. The global percentage of successful patient classifications over all 5 patient classes was 89 ± 2.1%. In Table 2, we display the percentage of successful classifications within each patient class. Most errors occur for discrimination between Classes RF and LF.

MLP classifier based on 18 intra-region connectivity coefficients
The set of 18 intra-connectivity coefficients MIreg(m, m) selected above provides f = 18 inputs for our second MLP classifier which has again a 4-layer architecture with dimensions (18,7,5,5) (see Section 2) and hence satisfies our parsimony equation. After training this second MLP, performance evaluated by leave-one-out technique demonstrates a global percentage of successes of 88 ± 2.1%. Table 3 displays classification accuracy within each patient class. Results are slightly weaker than for our first MLP, particularly for Class RF and Class LF. Discrimination between these two classes generate most errors of classification.

Patient classification by support vector machine (SVM) with highly discriminating inputs:
The set of 18 inter-connectivity/intra-connectivity coefficients MIreg(m, n) selected above can be input to a multiclass SVM classifier that uses a one-vs-rest approach. Table 4 displays classification accuracy within each patient class (leave-one-out technique).

Discussion
Our goal was to explore automatic classification of fMRI data in young epileptic patients to identify the origin of epileptic seizures. Given the moderate size of our data set, one challenge was to avoid using machine learning techniques involving large numbers of parameters.
We have solved this challenge by introducing a rigorous "parameters parsimony principle". A second challenge was to identify strongly discriminating input features computable from our fMRI data. To this end, we have systematically used large matrices of mutual information coefficients between all pairs of roughly 780 time series extracted from each patient's fMRI data. We have introduced a computable version of mutual information between any two cortex regions, as an indicator of neuronal interactions between these two regions. This led us to define and compute the discriminating power of all these 148 2 /2 inter-region connectivity coefficients.
We have then extracted one set of 18 strongly discriminating inter-region connectivity coefficients, and used them as input features for a small-size multi-layer perceptron (MLP) with 4 layers (dimensions: 18, 7, 5, 5), including a "SoftMax" terminal layer. After automatic training by a leave-one-out technique, this MLP provided 89% successful patients classification. For comparison, a standard support vector machine applied to our patient classification has yielded less than 71% successes. Our innovative development of mutual information between pairs of cortex regions, and of algorithmic selection of highly discriminating pairs of cortex regions has shown good capacity to extract useful and interpretable brain activity features from fMRI recordings. The small size of the MLP classifier we have thus constructed rigorously avoids overfitting and reaches good performance on our group of epileptic patients. To confirm these exploratory findings, we plan to test our approach on larger groups of patients.
We also designed and implemented an analysis to interpret why interactivity between two specific regions (say A, B ) can separate two specific lobes (say L 1 ,L 2 ). We found out that for almost all the patients with seizure focus in L 1 , we can find within L 1 one or more cortex regions having strong interaction with A and with B , while all subregions of L 2 have much weaker simultaneous interactions with A and B . This means that A, B are simultaneously excited whenever specific subregions of L 1 become strongly active, while hyperactivity in any subregion of L 2 has a weak impact on the simultaneous activity of A and B . For example, the inter-connectivity between the regions = A "Transverse frontopolar gyri and sulci" and = B "Vertical ramus of the anterior segment of the lateral sulcus" has strong discriminating power between lobes L 1 = LF and L 2 = RF and we can actually find two subregions U and V of LF with high interactivity with A: MI(U , A) = 0.22 and MI(V , B) = 0.25, while for all subregions K of RF one has: MI(K , A) < 0.12 and MI(K , B) < 0.11. These data suggest the potential for a similar approach to depict specific nodes that constitute important drivers of the seizure network.
Our methods are generally consistent with previous applications of network connectivity to invasive EEG. Wilke et al. observed a correlation of betweenness centrality, a network graph metric, with the location of  network nodes whose resection was likely to result in seizure freedom [51]. Sinha et al. also validated a model of epileptogenicity based on connectivity against surgical outcomes in a cohort consisting largely of adults [38]. They observed a correlation between their index of epileptogenicity and the putative seizure onset zone; they also found areas of high epileptogenicity outside of the resection zone in several cases of unsuccessful surgery. Tomlinson et al. applied a network approach to a pediatric cohort with focal epilepsy [44]. They observed a significant increase in connectivity outside of the seizure onset zone; further, global synchrony (a measure of the overall strength of connectivity) within the field of invasive EEG could be used to classify patients with regard to seizure-free outcome after surgery. These studies not only reinforce the relevance of network features to seizure onset, they highlight the importance of broader areas of dysconnectivity (beyond the traditional zone of seizure onset) in achieving seizure freedom. Invasive monitoring in the form of electrocorticography (ECOG) or stereo EEG (SEEG) is gold standard for localization of the seizure onset zone. These modalities, however, are limited in their spatial sampling; large areas of the brain are left unexplored, leading to the potential for erroneous and biased conclusions [13]. Optimal use of invasive monitoring therefore requires an accurate pre-test hypothesis regarding the location and extent of the epileptogenic network. It is likely, therefore, that optimal patient outcomes would result from the addition of global network approaches to standard invasive monitoring.

Conclusion
We developed machine learning algorithms to evaluate the connectivity obtained from resting-state fMRI in terms of differentiating the lobe of seizure origin in a pediatric cohort with focal epilepsy. These findings support the potential for neuroimaging-based network constructs to depict pathophysiologically relevant features of seizure genesis. If these approaches can be tailored to identify individual elements within the seizure network, biomarkers based on functional networks may ultimately contribute to personalized management strategies in children undergoing epilepsy surgery.