 Open Access
Identifying HIV-induced subgraph patterns in brain networks with side information
 Bokai Cao^{1},
 Xiangnan Kong^{2},
 Jingyuan Zhang^{1},
 Philip S. Yu^{1, 3} and
 Ann B. Ragin^{4}
https://doi.org/10.1007/s40708-015-0023-1
© The Author(s) 2015
 Received: 4 October 2015
 Accepted: 2 November 2015
 Published: 16 November 2015
Abstract
Investigating brain connectivity networks for neurological disorder identification has attracted great interest in recent years, but most studies focus on the graph representation alone. In addition to brain networks derived from the neuroimaging data, hundreds of clinical, immunologic, serologic, and cognitive measures may also be documented for each subject. These measures compose multiple side views encoding a tremendous amount of supplemental information for diagnostic purposes, yet they are often ignored. In this paper, we study the problem of subgraph selection from brain networks with side information guidance and propose a novel solution to find an optimal set of subgraph patterns for graph classification by exploring a plurality of side views. We derive a feature evaluation criterion, named gSide, to estimate the usefulness of subgraph patterns based upon side views. Then we develop a branch-and-bound algorithm, called gMSV, to efficiently search for optimal subgraph patterns by integrating the subgraph mining process and the procedure of discriminative feature selection. Empirical studies on graph classification tasks for neurological disorders using brain networks demonstrate that subgraph patterns selected by the multi-side-view-guided subgraph selection approach can effectively boost graph classification performance and are relevant to disease diagnosis.
Keywords
 Subgraph pattern
 Graph mining
 Side information
 Brain network
1 Introduction
Modern neuroimaging techniques have enabled us to model the human brain as a brain connectivity network or a connectome. Unlike traditional data with vector-based feature representations, brain networks are inherently graphs, composed of brain regions as the nodes (e.g., insula, hippocampus, thalamus) and functional/structural connectivities between the brain regions as the links. The linkage structure in these brain networks can encode tremendous information concerning the integrated activity of the human brain. For example, in brain networks derived from functional magnetic resonance imaging (fMRI), connections/links can encode correlations between brain regions in functional activity, while structural links in diffusion tensor imaging (DTI) can capture white matter fiber pathways connecting different brain regions. The complex structures and the lack of vector representations within these graph data raise a challenge for data mining. An effective model for mining the graph data should be able to extract a set of subgraph patterns for further analysis. Motivated by such challenges, graph mining research problems, in particular graph classification, have received considerable attention in the last decade.
Despite its value and significance, the feature selection problem for graph data using auxiliary views has not been studied in this context so far. There are two major difficulties in learning from multiple side views for graph classification, as follows:
1.1 The primary view in graph representation
Graph data naturally compose the primary view for graph mining problems, from which we want to select discriminative subgraph patterns for graph classification. However, it raises a challenge for data mining with the complex structures and the lack of vector representations. Conventional feature selection approaches in vector spaces usually assume that a set of features are given before conducting feature selection. In the context of graph data, however, subgraph features are embedded within the graph structures and usually it is not feasible to enumerate the full set of subgraph features for a graph dataset before feature selection. Actually, the number of subgraph features grows exponentially with the size of graphs.
1.2 The side views in vector representations
Figure 2 illustrates two strategies of leveraging side views in the process of selecting subgraph patterns. Conventional graph classification approaches treat side views and subgraph patterns separately and may only combine them at the final stage of training a classifier. As a result, the valuable information embedded in side views is not fully leveraged in the feature selection process. Most subgraph mining approaches focus on the drug discovery problem, which has access to a large amount of graph data for chemical compounds. For neurological disorder identification, however, the number of subjects is usually limited, so only a small sample of brain networks is available. Therefore, it is critical to learn knowledge from other possible sources. We notice that transfer learning can borrow supervision knowledge from the source domain to help the learning on the target domain, e.g., finding a good feature representation [10], mapping relational knowledge [24, 25], and learning across graph databases [29]. However, to the best of our knowledge, these methods do not consider transferring complementary information from vector-based side views to a graph database whose instances are complex structural graphs.
To solve the above problems, in this paper, we introduce a novel framework that fuses heterogeneous data sources at an early stage. In contrast to existing subgraph mining approaches that focus on a single view of the graph representation, our method can explore multiple vector-based side views to find an optimal set of subgraph features for graph classification. We first verify side information consistency via statistical hypothesis testing. Based on auxiliary views and the available label information, we design an evaluation criterion for subgraph features, named gSide. By deriving a lower bound, we develop a branch-and-bound algorithm, called gMSV, to efficiently search for optimal subgraph features with pruning, thereby avoiding exhaustive enumeration of all subgraph features. In order to evaluate our proposed model, we conduct experiments on graph classification tasks for neurological disorders, using fMRI and DTI brain networks. The experiments demonstrate that our subgraph selection approach using multiple side views can effectively boost graph classification performance. Moreover, we show that gMSV is more efficient by pruning the subgraph search space via gSide.
2 Problem formulation
A motivation for this work is the premise that side information could be strongly correlated with neurological status. Before presenting the subgraph feature selection model, we first introduce the notations that will be used throughout this paper. Let \({\mathcal{D}}=\{G_1,\ldots ,G_n\}\) denote the graph dataset, which consists of n graph objects. The graphs within \({\mathcal{D}}\) are labeled by \([y_1,\ldots ,y_n]^\top\), where \(y_i\in \{-1,+1\}\) denotes the binary class label of \(G_i\).
Definition 1
(Graph) A graph is represented as \(G =(V,E)\), where \(V=\{v_1,\ldots ,v_{n_v}\}\) is the set of vertices, \(E\subseteq V\times V\) is the set of edges.
Definition 2
(Subgraph) Let \(G'=(V',E')\) and \(G=(V,E)\) be two graphs. \(G'\) is a subgraph of G (denoted as \(G'\subseteq G\)) iff \(V'\subseteq V\) and \(E'\subseteq E\). If \(G'\) is a subgraph of G, then G is a supergraph of \(G'\).
Definition 3
(Side view) A side view is a set of vector-based features \({\mathbf{z}}_i=[z_1,\ldots ,z_d]^\top\) associated with each graph object \(G_i\), where d is the dimensionality of this view. A side view is denoted as \({\mathcal{Z}}=\{{\mathbf{z}}_1,\ldots ,{\mathbf{z}}_n\}\).

How to leverage the valuable information embedded in multiple side views to evaluate the usefulness of a set of subgraph patterns?

How to efficiently search for the optimal subgraph patterns without exhaustive enumeration in the primary graph space?
Table 1 Important notations
Symbol | Definition and description
\(\vert \cdot \vert\) | Cardinality of a set
\(\Vert \cdot \Vert\) | Norm of a vector
\({\mathcal{D}}=\{G_1,\ldots ,G_n\}\) | Given graph dataset, \(G_i\) denotes the ith graph in the dataset
\({\mathbf{y}}=[y_1,\ldots ,y_n]^\top\) | Class label vector for graphs in \({\mathcal{D}}\), \(y_i\in \{-1,+1\}\)
\({\mathcal{S}}=\{g_1,\ldots ,g_m\}\) | Set of all subgraph patterns in the graph dataset \({\mathcal{D}}\)
\({\mathbf{f}}_i=[f_{i1},\ldots ,f_{in}]^\top\) | Binary vector for subgraph pattern \(g_i\), \(f_{ij}=1\) iff \(g_i\subseteq G_j\), otherwise \(f_{ij}=0\)
\({\mathbf{x}}_j=[x_{1j},\ldots ,x_{mj}]^\top\) | Binary vector for \(G_j\) using subgraph patterns in \({\mathcal{S}}\), \(x_{ij}=1\) iff \(g_i\subseteq G_j\), otherwise \(x_{ij}=0\)
\(X=[x_{ij}]^{m\times n}\) | Matrix of all binary vectors in the dataset, \(X=[{\mathbf{x}}_1,\ldots ,{\mathbf{x}}_n]=[{\mathbf{f}}_1,\ldots ,{\mathbf{f}}_m]^\top \in \{0,1\}^{m\times n}\)
\({\mathcal{T}}\) | Set of selected subgraph patterns, \({\mathcal{T}}\subseteq {\mathcal{S}}\)
\({\mathcal{I}}_{\mathcal{T}}\in \{0,1\}^{m\times m}\) | Diagonal matrix indicating which subgraph patterns are selected from \({\mathcal{S}}\) into \({\mathcal{T}}\)
min_sup | Minimum frequency threshold; frequent subgraphs are contained in at least min_sup \(\times \vert {\mathcal{D}}\vert\) graphs
k | Number of subgraph patterns to be selected
\(\lambda ^{(p)}\) | Weight of the pth side view (default: 1)
\(\kappa ^{(p)}\) | Kernel function on the pth side view (default: RBF kernel)
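The binary matrix X in the notation above can be illustrated with a small sketch. It assumes that every node carries a unique label (as with named brain regions), so that the subgraph test of Definition 2 reduces to an edge-subset check; for graphs with repeated node labels a subgraph isomorphism test would be needed instead. The helper names `make_graph`, `contains`, and `indicator_matrix` are hypothetical.

```python
from typing import FrozenSet, List, Tuple

Edge = Tuple[int, int]
Graph = FrozenSet[Edge]  # a graph over a shared, uniquely labeled node set, stored as its edge set


def make_graph(edges: List[Edge]) -> Graph:
    """Store undirected edges in a canonical (min, max) orientation."""
    return frozenset((min(u, v), max(u, v)) for u, v in edges)


def contains(pattern: Graph, graph: Graph) -> bool:
    """g is a subgraph of G when every edge of the pattern also appears in the graph."""
    return pattern <= graph


def indicator_matrix(patterns: List[Graph], graphs: List[Graph]) -> List[List[int]]:
    """Return X with X[i][j] = 1 iff patterns[i] is a subgraph of graphs[j], else 0."""
    return [[1 if contains(g, G) else 0 for G in graphs] for g in patterns]


# Toy data (illustrative only): three graphs and three candidate patterns.
graphs = [make_graph([(0, 1), (1, 2)]), make_graph([(0, 1)]), make_graph([(1, 2), (2, 3)])]
patterns = [make_graph([(0, 1)]), make_graph([(2, 1)]), make_graph([(0, 1), (1, 2)])]
X = indicator_matrix(patterns, graphs)
```

Row i of `X` plays the role of \({\mathbf{f}}_i\) and column j the role of \({\mathbf{x}}_j\).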
Table 2 Demographic characteristics
Characteristic | HIV | Control | p
Age (mean years \(\pm\) SD) | 33.3 \(\pm\) 10.1 | 31.4 \(\pm\) 8.9 | 0.45
Gender (% male) | 89 % | 76 % | 0.22
Race (% white) | 62 % | 76 % | 0.22
Education (% college) | 81 % | 90 % | 0.29
3 Data analysis
A motivation for this work is that the side information could be strongly correlated with the health state of a subject. Before proceeding, we first introduce the real-world data used in this work and investigate whether the available information from side views has any potential impact on neurological disorder identification.
3.1 Data collections
In this paper, we study the real-world datasets collected from the Chicago Early HIV Infection Study at Northwestern University [27]. The clinical cohort includes 56 HIV (positive) and 21 seronegative controls (negative). Demographic information is presented in Table 2. The HIV and seronegative groups did not differ significantly in age, gender, racial composition or education level. More detailed information about data acquisition can be found in [5]. The datasets contain functional magnetic resonance imaging (fMRI) and diffusion tensor imaging (DTI) for each subject, from each of which a brain network can be constructed.
For fMRI data, we used the DPARSF toolbox^{1} to extract a sequence of responses from each of the 116 anatomical volumes of interest (AVOI), where each AVOI represents a different brain region. The correlations of brain activities among different brain regions are computed. Positive correlations are used as links among brain regions. In detail, functional images were realigned to the first volume, slice timing corrected, normalized to the MNI template, and spatially smoothed with an 8-mm Gaussian kernel. The linear trend of the time series was removed and temporal band-pass filtering (0.01–0.08 Hz) was applied. Before the correlation analysis, several sources of spurious variance were also removed from the data through linear regression: (i) six parameters obtained by rigid body correction of head motion, (ii) the whole-brain signal averaged over a fixed region in atlas space, (iii) signal from a ventricular region of interest, and (iv) signal from a region centered in the white matter. Each brain is represented as a graph with 90 nodes corresponding to 90 cerebral regions, excluding 26 cerebellar regions.
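The final step of the fMRI pipeline, turning regional time series into a graph by keeping positive correlations as links, can be sketched as follows. This is a minimal illustration under the assumption that Pearson correlation is used and that every strictly positive correlation becomes a link; the preprocessing done by DPARSF is not reproduced, and `pearson` and `functional_network` are hypothetical helper names.

```python
import math
from typing import Dict, List, Tuple


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation of two equally long series (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def functional_network(series: List[List[float]]) -> Dict[Tuple[int, int], float]:
    """Link regions i < j whenever their time series are positively correlated."""
    links = {}
    for i in range(len(series)):
        for j in range(i + 1, len(series)):
            r = pearson(series[i], series[j])
            if r > 0:
                links[(i, j)] = r
    return links


# Toy time series for three regions (illustrative values, not study data).
toy = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [4.0, 3.0, 2.0, 1.0],
]
network = functional_network(toy)
```

In the toy input, regions 0 and 1 rise together and are linked, while region 2 is anti-correlated with both and stays unconnected.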
For DTI data, we used the FSL toolbox^{2} to extract the brain networks. The processing pipeline consists of the following steps: (i) correct the distortions induced by eddy currents in the gradient coils and use affine registration to a reference volume for head motion, (ii) delete non-brain tissue from the image of the whole head [15, 30], (iii) fit the diffusion tensor model at each voxel, (iv) build up distributions on diffusion parameters at each voxel, and (v) repetitively sample from the distributions of voxel-wise principal diffusion directions. As with the fMRI data, the DTI images were parcellated into 90 regions (45 for each hemisphere) by propagating the Automated Anatomical Labeling (AAL) to each image [34]. Min-max normalization was applied on link weights.
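Min-max normalization of link weights can be sketched as below. The text does not say whether the minimum and maximum are taken per graph or over the whole dataset; this sketch assumes per-graph scaling, and `min_max_normalize` is a hypothetical helper name.

```python
from typing import Dict, Tuple

Edge = Tuple[int, int]


def min_max_normalize(weights: Dict[Edge, float]) -> Dict[Edge, float]:
    """Rescale link weights linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(weights.values()), max(weights.values())
    if hi == lo:  # all weights equal: there is no spread to rescale
        return {edge: 0.0 for edge in weights}
    return {edge: (w - lo) / (hi - lo) for edge, w in weights.items()}


# Toy link weights for one graph (illustrative values only).
raw = {(0, 1): 12.0, (0, 2): 3.0, (1, 2): 30.0}
scaled = min_max_normalize(raw)
```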
Table 3 Hypothesis testing results (p values) to verify side information consistency
Side views | fMRI dataset | DTI dataset
Neuropsychological tests | 1.3220e−20 | 3.6015e−12
Flow cytometry | 5.9497e−57 | 5.0346e−75
Plasma luminex | 9.8102e−06 | 7.6090e−06
Freesurfer | 2.9823e−06 | 1.5116e−03
Overall brain microstructure | 1.0403e−02 | 8.1027e−03
Localized brain microstructure | 3.1108e−04 | 5.7040e−04
Brain volumetry | 2.0024e−04 | 1.2660e−02
3.2 Verifying side information consistency
We study the potential impact of side information on selecting subgraph patterns via statistical hypothesis testing. Side information consistency means that the side view similarity between two instances with the same label is likely to be larger than that between two instances with different labels. We use hypothesis testing to validate whether this statement holds in the fMRI and DTI datasets.
The t test results, p values, are summarized in Table 3. The results show that there is strong evidence, with significance level \(\alpha =0.05\), to reject the null hypothesis on the two datasets. In other words, we validate the existence of side information consistency in neurological disorder identification, thereby paving the way for our next study of leveraging multiple side views for discriminative subgraph selection.
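One possible reading of this test is sketched below: compute a kernel similarity for every pair of subjects within a side view, split the pairs by whether the two labels agree, and compare the two group means with a one-sided two-sample test. The exact hypotheses and test variant are not spelled out in the text, so the Welch-style statistic, the normal approximation for the p value, the `gamma` parameter, and the function names are all assumptions. Pairs that share a subject are not independent, which this sketch ignores.

```python
import math
from itertools import combinations
from statistics import NormalDist, mean, variance
from typing import List, Sequence, Tuple


def rbf_kernel(a: Sequence[float], b: Sequence[float], gamma: float = 1.0) -> float:
    """RBF similarity exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def consistency_test(Z: List[Sequence[float]], y: List[int], gamma: float = 1.0) -> Tuple[float, float]:
    """One-sided Welch-style test that same-label pairs are more similar on average."""
    same, diff = [], []
    for i, j in combinations(range(len(y)), 2):
        (same if y[i] == y[j] else diff).append(rbf_kernel(Z[i], Z[j], gamma))
    se = math.sqrt(variance(same) / len(same) + variance(diff) / len(diff))
    t = (mean(same) - mean(diff)) / se
    # Normal approximation to the t distribution; reasonable only when there are many pairs.
    p = 1.0 - NormalDist().cdf(t)
    return t, p


# Toy one-dimensional side view in which the two classes are well separated.
Z = [[0.0], [0.1], [0.2], [0.3], [2.0], [2.1], [2.2], [2.3]]
y = [+1, +1, +1, +1, -1, -1, -1, -1]
t_stat, p_value = consistency_test(Z, y)
```

A small p value here would be taken as evidence for side information consistency in that view.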
4 Multi-side-view discriminative subgraph selection
4.1 Exploring multiple side views: gSide
Intuitively, Eq. (5) will minimize the distance between subgraph features of similar instance pairs with \(\kappa ^{(p)}_{ij}\ge \mu ^{(p)}\), while maximizing the distance between dissimilar instance pairs with \(\kappa ^{(p)}_{ij}<\mu ^{(p)}\) in each view. In this way, the side view information is effectively used to guide the process of discriminative subgraph selection. The fact verified in Sect. 3.2 that the side view information is clearly correlated with the pre-specified label information can be very useful, especially in the semi-supervised setting.
With pre-specified information for labeled graphs, we further consider that the optimal set of subgraph patterns should satisfy the following constraints: labeled graphs in the same class should be close to each other; labeled graphs in different classes should be far away from each other. Intuitively, these constraints tend to select the most discriminative subgraph patterns based on the graph labels. Such an idea has been well explored in the context of dimensionality reduction and feature selection [2, 32].
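The intuition of pulling similar pairs together and pushing dissimilar pairs apart can be made concrete with a simplified per-pattern score. The sketch below is not the gSide criterion itself, whose exact form is not reproduced in this text; it assumes a single side view, a kernel matrix `kappa`, a threshold `mu`, and equal weighting of the two pair sets, and `pairwise_score` is a hypothetical name. Lower scores indicate a pattern whose occurrence agrees on similar pairs and differs on dissimilar ones.

```python
from itertools import combinations
from typing import List, Sequence


def pairwise_score(f: Sequence[int], kappa: List[List[float]], mu: float) -> float:
    """Mean squared difference over similar pairs minus that over dissimilar pairs."""

    def avg(values: List[int]) -> float:
        return sum(values) / len(values) if values else 0.0

    similar, dissimilar = [], []
    for i, j in combinations(range(len(f)), 2):
        gap = (f[i] - f[j]) ** 2
        (similar if kappa[i][j] >= mu else dissimilar).append(gap)
    return avg(similar) - avg(dissimilar)


# Toy kernel matrix: subjects 0 and 1 are alike, as are subjects 2 and 3.
kappa = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
mu = 0.5
aligned = pairwise_score([1, 1, 0, 0], kappa, mu)     # pattern follows the side view
misaligned = pairwise_score([1, 0, 1, 0], kappa, mu)  # pattern cuts across it
```

The pattern that occurs in exactly one of the two similarity groups receives the lower (better) score.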
Definition 4
4.2 Searching with a lower bound: gMSV
Now we address the second problem discussed in Sect. 2, and propose an efficient method to find the optimal set of subgraph patterns from a graph dataset with multiple side views.
A straightforward solution to the goal of finding an optimal feature set is the exhaustive enumeration, i.e., we could first enumerate all subgraph patterns from a graph dataset, and then calculate the gSide values for all subgraph patterns. In the context of graph data, however, it is usually not feasible to enumerate the full set of subgraph patterns before feature selection. Actually, the number of subgraph patterns grows exponentially with the size of graphs. Inspired by recent advances in graph classification approaches [7, 20, 21, 37], which nest their evaluation criteria into the subgraph mining process and develop constraints to prune the search space, we adopt a similar approach by deriving a different constraint based upon gSide.
By adopting the gSpan algorithm proposed by Yan and Han [38], we can enumerate all the subgraph patterns for a graph dataset in a canonical search space. In order to prune the subgraph search space, we now derive a lower bound of the gSide value:
Theorem 1
Proof
We can now nest the lower bound into the subgraph mining steps in gSpan to efficiently prune the DFS code tree. During the depth-first search through the DFS code tree, we always maintain the current top-k best subgraph patterns according to gSide and the temporarily suboptimal gSide value (denoted by \(\theta\)) among all the gSide values calculated so far, which serves as the threshold for entering the top-k set. If \({\hat{q}}(g_i)\ge \theta\), the gSide value of any supergraph \(g_j\) of \(g_i\) is no less than \({\hat{q}}(g_i)\) according to Theorem 1, i.e., \(q(g_j)\ge {\hat{q}}(g_i)\ge \theta\). Thus, we can safely prune the subtree rooted at \(g_i\) in the search space. If \({\hat{q}}(g_i)<\theta\), we cannot prune this subtree since there might exist a supergraph \(g_j\) of \(g_i\) such that \(q(g_j)<\theta\). Whenever a subgraph \(g_i\) has a better gSide value than some subgraph in \({\mathcal{T}}\), it is added to \({\mathcal{T}}\) and the worst subgraph is removed from \({\mathcal{T}}\). Then we recursively search for the next subgraph in the DFS code tree. The branch-and-bound algorithm gMSV is summarized in Algorithm 1.
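The pruning rule can be illustrated with a small branch-and-bound sketch. It is a simplification under stated assumptions, not Algorithm 1: patterns are plain edge sets over uniquely labeled nodes and are enumerated by adding edges in a fixed order rather than through gSpan's DFS codes; the score is assumed to be a quadratic form \(q={\mathbf{f}}^\top L{\mathbf{f}}\) for some given symmetric matrix `L`; and the bound keeps only the negative entries of `L`. That bound is valid for any such `L` because a supergraph can occur only in graphs where its subgraph occurs. All function names are hypothetical.

```python
import heapq
from typing import FrozenSet, List, Tuple

Edge = Tuple[int, int]
Pattern = FrozenSet[Edge]


def support(pattern: Pattern, graphs: List[FrozenSet[Edge]]) -> List[int]:
    """Indices of graphs that contain every edge of the pattern."""
    return [j for j, G in enumerate(graphs) if pattern <= G]


def q(supp: List[int], L: List[List[float]]) -> float:
    """Quadratic score f^T L f for the 0/1 indicator f of `supp` (lower is better)."""
    return sum(L[a][b] for a in supp for b in supp)


def q_lower(supp: List[int], L: List[List[float]]) -> float:
    """Lower bound on q for any supergraph: sum only the negative entries of L."""
    return sum(min(0.0, L[a][b]) for a in supp for b in supp)


def top_k_patterns(graphs: List[FrozenSet[Edge]], L: List[List[float]], k: int,
                   min_count: int = 1, prune: bool = True):
    """Depth-first search over edge sets; returns (top-k list, number of patterns scored)."""
    edges = sorted(set().union(*graphs))
    best: List[Tuple[float, int, Pattern]] = []  # heap of negated scores: root is the worst kept
    explored = 0

    def visit(pattern: Pattern, start: int) -> None:
        nonlocal explored
        for idx in range(start, len(edges)):
            child = pattern | {edges[idx]}
            supp = support(child, graphs)
            if len(supp) < min_count:
                continue  # infrequent: no supergraph can be frequent either
            explored += 1
            score = q(supp, L)
            if len(best) < k:
                heapq.heappush(best, (-score, explored, child))
            elif score < -best[0][0]:
                heapq.heapreplace(best, (-score, explored, child))
            theta = -best[0][0] if len(best) == k else float("inf")
            if prune and q_lower(supp, L) >= theta:
                continue  # no supergraph of `child` can beat the current top-k
            visit(child, idx + 1)

    visit(frozenset(), 0)
    return sorted((-s, sorted(p)) for s, _, p in best), explored


# Toy data: graphs 0-1 form one class, graphs 2-3 the other (illustrative only).
a, b, c, d = (0, 1), (1, 2), (2, 3), (3, 4)
graphs = [frozenset({a, b, c}), frozenset({a, b}), frozenset({c, d}), frozenset({a, c, d})]
L = [
    [-1.0, -1.0, 1.0, 1.0],
    [-1.0, -1.0, 1.0, 1.0],
    [1.0, 1.0, -1.0, -1.0],
    [1.0, 1.0, -1.0, -1.0],
]
pruned, n_pruned = top_k_patterns(graphs, L, k=2, prune=True)
full, n_full = top_k_patterns(graphs, L, k=2, prune=False)
```

With and without pruning the search returns the same best scores; pruning only reduces how many patterns are scored. When several patterns tie on score, the two runs may keep different representatives.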
5 Experiments
In order to evaluate the performance of the proposed solution to the problem of feature selection for graph classification using multiple side views, we tested our algorithm on brain network datasets derived from neuroimaging, as introduced in Sect. 3.1.
5.1 Experimental setup

gMSV: The proposed discriminative subgraph selection method using multiple side views. Following the observation in Sect. 3.2 that side information consistency is verified to be significant in all the side views, the parameters in gMSV are simply set to \(\lambda ^{(1)}=\cdots =\lambda ^{(v)}=1\) for experimental purposes. In the case where some side views are suspected to be redundant, we can adopt the alternative optimization strategy to iteratively select discriminative subgraph patterns and update view weights.

gSSC: A semi-supervised feature selection method for graph classification based upon both labeled and unlabeled graphs. The parameters in gSSC are set to \(\alpha =\beta =1\) unless otherwise specified [21].

Discriminative Subgraphs (Conf, Ratio, Gtest, HSIC): Supervised feature selection methods for graph classification based upon confidence [12], frequency ratio [16–18], G test score [37], and HSIC [20], respectively. The top-k discriminative subgraph features are selected in terms of different discrimination criteria.

Frequent Subgraphs (Freq): In this approach, the evaluation criterion for subgraph feature selection is based upon frequency. The top-k frequent subgraph features are selected.
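As a point of reference, the simplest of these baselines, Freq, can be sketched in a few lines. The sketch assumes the patterns have already been mined and encoded as the binary matrix X from the notation table, with min_sup given as a fraction of the dataset; `top_k_frequent` is a hypothetical name, and tie-breaking by original order is an arbitrary choice.

```python
from typing import List


def top_k_frequent(X: List[List[int]], k: int, min_sup: float) -> List[int]:
    """Indices of the k most frequent patterns among those meeting min_sup.

    X[i][j] = 1 iff pattern i occurs in graph j; min_sup is a fraction of the dataset.
    """
    n = len(X[0])
    counts = [sum(row) for row in X]
    frequent = [i for i, c in enumerate(counts) if c >= min_sup * n]
    # Ties are broken by the original pattern order (sorted is stable).
    return sorted(frequent, key=lambda i: -counts[i])[:k]


# Toy occurrence matrix: four patterns over four graphs (illustrative only).
X = [
    [1, 1, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 0, 1, 1],
]
selected = top_k_frequent(X, k=2, min_sup=0.5)
```

Because this criterion never looks at labels or side views, it serves as an unsupervised baseline.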
5.2 Performance on graph classification
The experimental results on fMRI and DTI datasets are shown in Figs. 3 and 4, respectively. The average performances with different numbers of features of each method are reported. Classification accuracy is used as the evaluation metric.
In Fig. 3, our method gMSV achieves a classification accuracy as high as 97.16 % on the fMRI dataset, which is significantly better than the union of other subgraph-based features and side view features. The black solid line denotes the method MSV, the simplest baseline that uses only side view data. Conf and Ratio do slightly better than MSV. Freq adopts an unsupervised process for selecting subgraph patterns, resulting in a performance comparable to MSV, which indicates that the selected subgraphs contribute no additional information. Other methods that use different discrimination scores without leveraging the guidance from side views perform even worse than MSV in graph classification, because they evaluate the usefulness of subgraph patterns solely based on the limited label information from a small sample of brain networks. The selected subgraph patterns can potentially be redundant or irrelevant, thereby compromising the effects of side view data. Importantly, gMSV outperforms the semi-supervised approach gSSC, which explores the unlabeled graphs based on the separability property. This indicates that rather than simply considering that unlabeled graphs should be separated from each other, it would be better to regularize such separability/closeness to be consistent with the available side views.
5.3 Time and space complexity
Next, we evaluate the effectiveness of pruning the subgraph search space by adopting the lower bound of gSide in gMSV. In this section, we compare the runtime performance of two implementation versions of gMSV: the pruning gMSV uses the lower bound of gSide to prune the search space of subgraph enumerations, as shown in Algorithm 1; the unpruning gMSV denotes the method without pruning in the subgraph mining process, i.e., with line 13 of Algorithm 1 deleted. We tested both approaches and recorded the average CPU time used and the average number of subgraph patterns explored during the procedure of subgraph mining and feature selection.
The comparisons with respect to the time complexity and the space complexity are shown in Figs. 5 and 6, respectively. On both datasets, the unpruning gMSV needs to explore an exponentially larger subgraph search space as we decrease the min_sup value in the subgraph mining process. When the min_sup value is too low, the subgraph enumeration step in the unpruning gMSV can run out of memory. However, the pruning gMSV is still effective and efficient when the min_sup value becomes very low, because it reduces the subgraph search space via the lower bound of gSide, so its running time and space requirement do not increase as much as those of the unpruning gMSV.
The focus of this paper is to investigate side information consistency and explore multiple side views in discriminative subgraph selection. As potential alternatives to the gSpan-based branch-and-bound algorithm, we could employ other more sophisticated searching strategies with our proposed multi-side-view evaluation criterion, gSide. For example, we can replace the G test score in LEAP [37] or the log ratio in COM [17] and GAIA [18] with gSide. However, as shown in Figs. 5 and 6, our proposed solution with pruning, gMSV, can still run at \(min\_sup=4\%\); considering the limited number of subjects in medical experiments as introduced in Sect. 3.1, gMSV is efficient enough for neurological disorder identification, where subgraph patterns supported by too few graphs are not desired.
5.4 Effects of side views
Table 4 Average classification performances of gMSV on the fMRI dataset with different single side views
Side views | Precision | Recall | F1
Neuropsychological tests | 0.851 | 0.679 | 0.734
Flow cytometry | 0.919 | 0.872 | 0.892
Plasma luminex | 0.769 | 0.682 | 0.710
Freesurfer | 0.851 | 0.737 | 0.785
Overall brain microstructure | 0.824 | 0.500 | 0.618
Localized brain microstructure | 0.686 | 0.605 | 0.637
Brain volumetry | 0.739 | 0.737 | 0.731
All side views | 1.000 | 0.949 | 0.973
Table 5 Average classification performances of gMSV on the DTI dataset with different single side views
Side views | Precision | Recall | F1
Neuropsychological tests | 0.630 | 0.705 | 0.662
Flow cytometry | 0.847 | 0.808 | 0.822
Plasma luminex | 0.801 | 0.705 | 0.744
Freesurfer | 0.664 | 0.632 | 0.644
Overall brain microstructure | 0.626 | 0.679 | 0.647
Localized brain microstructure | 0.717 | 0.775 | 0.741
Brain volumetry | 0.616 | 0.679 | 0.644
All side views | 1.000 | 0.951 | 0.974
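The three metrics in the tables above can be computed for a single evaluation run as sketched below, assuming the HIV group (+1) is the positive class; `precision_recall_f1` is a hypothetical helper. Note that the reported values are averages, presumably over repeated runs or folds, and an average of per-run F1 scores generally differs from the harmonic mean of the averaged precision and recall. For instance, the harmonic mean of 0.851 and 0.679 is about 0.755, whereas the first fMRI row reports 0.734.

```python
from typing import List, Tuple


def precision_recall_f1(y_true: List[int], y_pred: List[int], positive: int = 1) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for the given positive class in one evaluation run."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy predictions for six subjects (illustrative only).
y_true = [1, 1, 1, 1, -1, -1]
y_pred = [1, 1, 1, -1, 1, -1]
precision, recall, f1 = precision_recall_f1(y_true, y_pred)
```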
5.5 Feature evaluation
6 Related work
To the best of our knowledge, this paper is the first work exploring side information in the task of subgraph feature selection for graph classification. Our work is related to subgraph mining techniques and multi-view feature selection problems. We briefly discuss both of them.
Mining subgraph patterns from graph data has been studied extensively by many researchers. In general, a variety of filtering criteria are proposed. A typical evaluation criterion is frequency, which aims at searching for frequently appearing subgraph features in a graph dataset satisfying a pre-specified min_sup value. Most of the frequent subgraph mining approaches are unsupervised. For example, Yan and Han developed a depth-first search algorithm: gSpan [38]. This algorithm builds a lexicographic order among graphs, and maps each graph to a unique minimum DFS code as its canonical label. Based on this lexicographic order, gSpan adopts the depth-first search strategy to mine frequent connected subgraphs efficiently. Many other approaches for frequent subgraph mining have also been proposed, e.g., AGM [14], FSG [22], MoFa [3], FFSM [13], and Gaston [26].
Moreover, the problem of supervised subgraph mining has been studied in recent work which examines how to improve the efficiency of searching the discriminative subgraph patterns for graph classification. Yan et al. introduced two concepts, structural leap search and frequency-descending mining, and proposed LEAP [37], which is one of the first works in discriminative subgraph mining. Thoma et al. proposed CORK, which can yield a near-optimal solution using greedy feature selection [33]. Ranu and Singh proposed a scalable approach, called GraphSig, that is capable of mining discriminative subgraphs with a low frequency threshold [28]. Jin et al. proposed COM, which takes into account the co-occurrences of subgraph patterns, thereby facilitating the mining process [17]. Jin et al. further proposed an evolutionary computation method, called GAIA, to mine discriminative subgraph patterns using a randomized searching strategy [18]. Our proposed criterion gSide can be combined with these efficient searching algorithms to speed up the process of mining discriminative subgraph patterns by substituting the G test score in LEAP [37] or the log ratio in COM [17] and GAIA [18]. Zhu et al. designed a diversified discrimination score based on the log ratio which can reduce the overlap between selected features by considering the embedding overlaps in the graphs [39]. A similar idea can be integrated into gSide to improve feature diversity.
There are some recent works on incorporating multi-view learning and feature selection. Tang et al. studied unsupervised multi-view feature selection by constraining that similar data instances from each view should have similar pseudo-class labels [31]. Cao et al. explored the tensor product to bring different views together in a joint space and presented a dual method of tensor-based multi-view feature selection [4]. Aggarwal et al. considered side information for text mining [1]. However, these methods are limited in requiring a set of candidate features as input, and therefore are not directly applicable to graph data. Wu et al. considered the scenario where one object can be described by multiple graphs generated from different feature views and proposed an evaluation criterion to estimate the discriminative power and the redundancy of subgraph features across all views [36]. In contrast, in this paper, we assume that one object can have other data representations of side views in addition to the primary graph view.
In the context of graph data, the subgraph features are embedded within the complex graph structures and usually it is not feasible to enumerate the full set of features for a graph dataset before the feature selection. Actually, the number of subgraph features grows exponentially with the size of graphs. In this paper, we explore the side information from multiple views to effectively facilitate the procedure of discriminative subgraph mining. Our proposed feature selection for graph data is integrated into the subgraph mining process, which can efficiently prune the search space, thereby avoiding exhaustive enumeration of all subgraph features.
7 Conclusion and future work
We presented an approach for selecting discriminative subgraph features using multiple side views. By leveraging available information from multiple side views together with graph data, the proposed method gMSV can achieve very good performance on the problem of feature selection for graph classification, and the selected subgraph patterns are relevant to disease diagnosis. This approach has broad applicability for yielding new insights into brain network alterations in neurological disorders and for early diagnosis.
A potential extension to our method is to combine fMRI and DTI brain networks to find discriminative subgraph patterns in the sense of both functional and structural connections. Other extensions include better exploring weighted links in the multi-side-view setting. It would also be interesting to apply our model to other domains where one can find graph data and side information aligned with the graph. For example, in bioinformatics, chemical compounds can be represented by graphs based on their inherent molecular structures and are associated with properties such as drug repositioning, side effects, and ontology annotations. Leveraging all of this information to find discriminative subgraph patterns can be transformative for drug discovery.
Declarations
Acknowledgments
This work is supported in part by NSF through grants III-1526499, CNS-1115234, and OISE-1129076, Google Research Award, the Pinnacle Lab at Singapore Management University, and NIH through grant R01-MH080636.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Authors’ Affiliations
References
 Aggarwal CC, Zhao Y, Yu PS (2012) On the use of side information for mining text data. TKDE pp 1–1Google Scholar
 BarHillel A, Hertz T, Shental N, Weinshall D (2005) Learning a mahalanobis metric from equivalence constraints. J Mach Learn Res 6(6):937–965MATHMathSciNetGoogle Scholar
 Borgelt C, Berthold MR (2002) Mining molecular fragments: finding relevant substructures of molecules. In: IEEE ICDM, pp 51–58Google Scholar
 Cao B, He L, Kong X, Yu PS, Hao Z, Ragin AB (2014) Tensorbased multiview feature selection with applications to brain diseases. In: IEEE ICDM, pp 40–49Google Scholar
 Cao B, Kong X, Kettering C, Yu PS, Ragin AB (2015) Determinants of HIVinduced brain changes in three different periods of the early clinical course: a data mining analysis. NeuroImageGoogle Scholar
 Cao B, Kong X, Yu PS (2015) A review of heterogeneous data mining for brain disorder identification. Brain Informatics. doi:10.1007/s4070801500213
 Cao B, Zhan L, Kong X, Yu PS, Vizueta N, Altshuler LL, Leow AD (2015) Identification of discriminative subgraph patterns in fMRI brain networks in bipolar affective disorder. In: Brain informatics and health, SpringerGoogle Scholar
 Castelo JMB, Sherman SJ, Courtney MG, Melrose RJ, Stern CE (2006) Altered hippocampalprefrontal activation in HIV patients during episodic memory encoding. Neurology 66(11):1688–1695View ArticleGoogle Scholar
 Chang CC, Lin CJ (2001) LIBSVM: a library for support vector machines. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
 Dai W, Xue GR, Yang Q, Yu Y (2007) Co-clustering based classification for out-of-domain documents. In: ACM KDD, pp 210–219
 Luthert PJ, Lantos PL (1993) Neuronal number and volume alterations in the neocortex of HIV infected individuals. J Neurol Neurosurg Psychiatry 56(5):481–486
 Gao C, Wang J (2010) Direct mining of discriminative patterns for classifying uncertain data. In: ACM KDD, pp 861–870
 Huan J, Wang W, Prins J (2003) Efficient mining of frequent subgraphs in the presence of isomorphism. In: IEEE ICDM, pp 549–552
 Inokuchi A, Washio T, Motoda H (2000) An apriori-based algorithm for mining frequent substructures from graph data. In: Principles of data mining and knowledge discovery. Springer, pp 13–23
 Jenkinson M, Pechaud M, Smith S (2005) BET2: MR-based estimation of brain, skull and scalp surfaces. In: Eleventh annual meeting of the organization for human brain mapping, p 17
 Jin N, Wang W (2011) LTS: Discriminative subgraph mining by learning from search history. In: IEEE ICDE, pp 207–218
 Jin N, Young C, Wang W (2009) Graph classification based on pattern co-occurrence. In: ACM CIKM, pp 573–582
 Jin N, Young C, Wang W (2010) GAIA: graph classification using evolutionary computation. In: ACM SIGMOD, pp 879–890
 Kong X, Ragin AB, Wang X, Yu PS (2013) Discriminative feature selection for uncertain graph classification. In: SIAM SDM, pp 82–93
 Kong X, Yu PS (2010) Multi-label feature selection for graph classification. In: IEEE ICDM, pp 274–283
 Kong X, Yu PS (2010) Semi-supervised feature selection for graph classification. In: ACM KDD, pp 793–802
 Kuramochi M, Karypis G (2001) Frequent subgraph discovery. In: IEEE ICDM, pp 313–320
 Langford TD, Letendre SL, Larrea GJ, Masliah E (2003) Changing patterns in the neuropathogenesis of HIV during the HAART era. Brain Pathol 13(2):195–210
 Mihalkova L, Huynh T, Mooney RJ (2007) Mapping and revising Markov logic networks for transfer learning. In: AAAI, vol 7, pp 608–614
 Mihalkova L, Mooney RJ (2009) Transfer learning from minimal target data by mapping across relational domains. In: IJCAI, vol 9, pp 1163–1168
 Nijssen S, Kok JN (2004) A quickstart in frequent structure mining can make a difference. In: ACM KDD, pp 647–652
 Ragin AB, Du H, Ochs R, Wu Y, Sammet CL, Shoukry A, Epstein LG (2012) Structural brain alterations can be detected early in HIV infection. Neurology 79(24):2328–2334
 Ranu S, Singh AK (2009) GraphSig: a scalable approach to mining significant subgraphs in large graph databases. In: IEEE ICDE, pp 844–855
 Shi X, Kong X, Yu PS (2012) Transfer significant subgraphs across graph databases. In: SIAM SDM, pp 552–563
 Smith SM (2002) Fast robust automated brain extraction. Hum Brain Mapp 17(3):143–155
 Tang J, Hu X, Gao H, Liu H (2013) Unsupervised feature selection for multi-view data in social media. In: SIAM SDM, pp 270–278
 Tang W, Zhong S (2006) Pairwise constraints-guided dimensionality reduction. In: SDM workshop on feature selection for data mining
 Thoma M, Cheng H, Gretton A, Han J, Kriegel HP, Smola AJ, Song L, Yu PS, Yan X, Borgwardt KM (2009) Near-optimal supervised feature selection among frequent subgraphs. In: SIAM SDM, pp 1076–1087
 Tzourio-Mazoyer N, Landeau B, Papathanassiou D, Crivello F, Etard O, Delcroix N, Mazoyer B, Joliot M (2002) Automated anatomical labeling of activations in SPM using a macroscopic anatomical parcellation of the MNI MRI single-subject brain. Neuroimage 15(1):273–289
 Wang X, Foryt P, Ochs R, Chung JH, Wu Y, Parrish T, Ragin AB (2011) Abnormalities in resting-state functional connectivity in early human immunodeficiency virus infection. Brain Connect 1(3):207–217
 Wu J, Hong Z, Pan S, Zhu X, Cai Z, Zhang C (2014) Multi-graph-view learning for graph classification. In: IEEE ICDM, pp 590–599
 Yan X, Cheng H, Han J, Yu PS (2008) Mining significant graph patterns by leap search. In: ACM SIGMOD, pp 433–444
 Yan X, Han J (2002) gSpan: graph-based substructure pattern mining. In: IEEE ICDM, pp 721–724
 Zhu Y, Yu JX, Cheng H, Qin L (2012) Graph classification: a diversified discriminative feature selection approach. In: ACM CIKM, pp 205–214