- Original Research
- Open Access

# A structural equation model for imaging genetics using spatial transcriptomics

- Sjoerd M. H. Huisman\(^{1, 2}\)
- Ahmed Mahfouz\(^{1, 2}\)
- Nematollah K. Batmanghelich\(^{4}\)
- Boudewijn P. F. Lelieveldt\(^{1, 2, 3}\)
- Marcel J. T. Reinders\(^{1, 2}\)
- for the Alzheimer’s Disease Neuroimaging Initiative

**Received:** 24 January 2018 · **Accepted:** 21 October 2018 · **Published:** 2 November 2018

## Abstract

Imaging genetics deals with relationships between genetic variation and imaging variables, often in a disease context. The complex relationships between brain volumes and genetic variants have been explored with both dimension reduction methods and model-based approaches. However, these models usually do not make use of the extensive knowledge of the spatio-anatomical patterns of gene activity. We present a method for integrating genetic markers (single nucleotide polymorphisms) and imaging features, which is based on a causal model and, at the same time, uses the power of dimension reduction. We use structural equation models to find latent variables that explain brain volume changes in a disease context, and which are in turn affected by genetic variants. We make use of publicly available spatial transcriptome data from the Allen Human Brain Atlas to specify the model structure, which reduces noise and improves interpretability. The model is tested in a simulation setting and applied to a case study using data from the Alzheimer’s Disease Neuroimaging Initiative.

## Keywords

- Imaging genetics
- Brain genetics
- Structural equation modelling
- ADNI
- Allen Brain Atlas

## 1 Introduction

The aim of imaging genetics studies is to find associations between genetic variants and imaging features, often in a disease context [1]. This scheme extends beyond traditional genome-wide association studies (GWAS) by identifying genetic associations of imaging biomarkers with the assumption that these biomarkers are a more direct reflection of the genetic effects. Thus, they could provide a stronger association signal [2]. Additionally, the identified associations are likely to provide new insights into the underlying disease mechanisms as well as new hypotheses about the anatomical and/or functional locations involved in complex diseases [3].

So far, imaging genetics studies have largely focused on the brain [1, 3–6], despite efforts to extend their application to other fields [7]. Several large consortia have gathered data from thousands of subjects to understand the effects of genetic variants on brain structure and function [8]. One of the hallmark sources for imaging genetics studies is the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database [9]. This database contains single nucleotide polymorphism (SNP) and structural MRI data for Alzheimer’s patients, individuals with late mild cognitive impairment, and cognitively normal controls.

One of the largest challenges facing imaging genetics studies is the statistical power needed to identify reliable associations. In a typical GWAS, researchers have to correct for the number of independent tests performed (i.e. number of independent SNPs tested) in order to limit the number of false-positive discoveries. However, a genome-wide brain-wide imaging genetic study will not only have to correct for the number of independent SNPs, but also for the number of independent imaging features tested. As a result, many studies are underpowered to identify reliable associations. One of the largest imaging genetics studies [10] analysed over 30,000 individuals within the Enhancing Neuro Imaging Genetics through Meta-Analysis (ENIGMA) consortium. They performed a genome-wide association of SNPs with seven brain volumes and identified only eight genome-wide significant SNPs.
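To make the multiple-testing burden concrete, a naive Bonferroni correction divides the family-wise error rate by the product of the number of SNPs and the number of imaging features. The sketch below uses hypothetical counts purely for illustration (about one million independent common SNPs, the usual rationale for the \(5 \times 10^{-8}\) genome-wide threshold, and one versus seven volumes); the function name is our own, and the numbers do not reproduce the analysis of [10].

```python
def bonferroni_threshold(alpha: float, n_snps: int, n_features: int) -> float:
    """Per-test threshold when every SNP is tested against every imaging feature."""
    n_tests = n_snps * n_features      # size of the family of tests
    return alpha / n_tests

# Hypothetical counts: ~1e6 independent SNPs, tested against 1 vs. 7 brain volumes.
gwas_only = bonferroni_threshold(0.05, 1_000_000, 1)
imaging_gwas = bonferroni_threshold(0.05, 1_000_000, 7)
print(f"{gwas_only:.1e}")     # 5.0e-08, the familiar genome-wide threshold
print(f"{imaging_gwas:.1e}")  # 7.1e-09
```

Each additional independent imaging feature lowers the per-test threshold further, which is why reducing the number of effective imaging features matters for power.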

Despite the high dimensionality of the imaging data (millions of voxels), the actual number of independent tests for which we need to correct in an imaging genetics study is far smaller than the number of voxels. Due to the spatial relationships between voxels, measurements from neighbouring voxels are usually highly correlated. A common approach is to test genetic associations for anatomically defined brain regions [2]. Several studies have shown that both neuroanatomical parcellation and connectivity of the brain are strongly reflected in gene expression patterns across the brain [11–13]. The public availability of brain transcriptome atlases from the Allen Institute for Brain Science [14] provides an opportunity to use these transcriptional signatures to group the anatomically defined brain regions, further limiting the number of effective tests.
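The point that correlated features reduce the effective number of tests can be illustrated numerically. The sketch below uses one eigenvalue-based estimator, \(M_{\text{eff}} = (\sum_i \sqrt{\lambda_i})^2 / \sum_i \lambda_i\), where \(\lambda_i\) are the eigenvalues of the feature correlation matrix. This estimator, the simulated "voxels" and all names are our own choices for illustration (several alternative estimators exist, and the paper does not compute \(M_{\text{eff}}\)). It returns \(M\) for \(M\) uncorrelated features and 1 for perfectly correlated ones.

```python
import numpy as np

def effective_number_of_tests(corr: np.ndarray) -> float:
    """Eigenvalue-based effective number of tests: (sum sqrt(lambda))^2 / sum(lambda)."""
    eigvals = np.clip(np.linalg.eigvalsh(corr), 0.0, None)  # drop tiny negative round-off
    return float(np.sum(np.sqrt(eigvals)) ** 2 / np.sum(eigvals))

rng = np.random.default_rng(0)
n_subjects, n_voxels, n_regions = 500, 40, 4
# Hypothetical data: 40 "voxels" driven by 4 regional signals plus small noise,
# mimicking the strong correlation between neighbouring voxels.
regional = rng.normal(size=(n_subjects, n_regions))
voxels = np.repeat(regional, n_voxels // n_regions, axis=1)
voxels += 0.3 * rng.normal(size=voxels.shape)
corr = np.corrcoef(voxels, rowvar=False)        # columns are the variables
m_eff = effective_number_of_tests(corr)
print(f"{n_voxels} voxels, about {m_eff:.1f} effective tests")
```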

Several methods have been proposed to identify associations between genetic variants and imaging features by applying dimension reduction, such as variations of canonical correlation analysis [15], and independent component analysis (commonly used in a functional MRI context) [4]. Others have opted to model the interactions between the different data types explicitly, for instance using graphical Bayesian models [16, 17], which capture a more mechanistic, causal view of the data. These models consist of a directed acyclic graph, which can easily be made to incorporate covariates, including possible confounding factors. Both of these studies use relatively small candidate SNP sets, because they aim at understanding SNP–brain relationships rather than discovering genome-wide associations. However, these Bayesian models are quite challenging to specify and fit.

In this work, we propose a method to identify associations between candidate genetic variants and imaging features allowing for the incorporation of prior knowledge. The proposed method combines a graphical model with dimension reduction to model the effect of SNPs on brain imaging features through a set of latent variables. We use a maximum likelihood structural equation modelling (SEM) approach to find the edge weights of our model [18]. By performing dimensionality reduction within the model, we reduce the number of parameters to be estimated. In addition, the model allows for easy incorporation of information from the Allen Human Brain Atlas [12] to inform the grouping of brain regions based on the similarity of their transcriptional profiles.

Our model uses the transcriptional profiles for grouping because we consider gene expression to be an intermediate phenotype that links SNPs to brain imaging features. Most disease-associated SNPs are located near regulatory regions of the genome [19], and the effects of SNPs on expression tend to be tissue- and cell-type-specific [20]. Gene expression data of brain regions reflect cell type composition and anatomical similarity [12] and capture a wide range of brain-specific molecular pathways [21]. For these reasons, the region groups in the dimension reduction are based on spatial gene expression data of the brain.

## 2 Materials and methods

The interplay between genetic variation, brain anatomy, and disease symptoms is complex. We use a structural equation model with latent variables [18] to model these relationships. We assume that the genetic variation is exogenous; in other words, the genetic variation in a study population is not caused by disease or brain anatomy. This variation does, however, affect the brain. For example, in Alzheimer’s disease, genetic variants may influence the immune response and amyloid \(\beta\) concentrations in the brain, which may in turn lead to shrinkage of several brain areas [22]. Large-scale imaging initiatives, such as ADNI, offer an opportunity to study this shrinkage of brain regions. It can be estimated from MRI data of diseased individuals and controls, and expressed as cortical thickness and subcortical volume measurements.

In our graphical model, we define groups of brain regions based on the transcriptional profiles of these areas in the healthy brain. Areas that share patterns of gene expression in a normal brain may be similarly affected by genetic variation. For each of the region groups, we introduce one latent variable. This latent variable is affected by the genetic variants and in turn causes changes in the relevant brain regions. This makes our model similar to a principal component analysis (PCA) on each set of brain regions, combined with a regression of the latent variables on the SNPs. However, in our model all weights are estimated jointly, and the latent variables reflect not only the correlations between the regions (as in a conventional PCA), but also those between regions and SNPs and among the SNPs.
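To make the comparison with PCA concrete, the sketch below implements the two-step procedure that the model resembles: a PCA per region group, followed by an ordinary least-squares regression of each first-component score on the SNPs. It is only an analogue under our own assumptions (hypothetical function name, simulated data, sign convention), not the SEM itself; the SEM estimates loadings and regression weights jointly, which this two-step version does not.

```python
import numpy as np

def two_step_pca_regression(x, g, groups):
    """Two-step analogue: PCA per region group, then OLS of the first component
    score on the SNPs. Returns (n x m) scores and (m x p) SNP weights."""
    scores, coefs = [], []
    for cols in groups:
        block = x[:, cols]
        # First principal component of the centred region block via SVD.
        _, _, vt = np.linalg.svd(block - block.mean(axis=0), full_matrices=False)
        loading = vt[0] * np.sign(vt[0, 0])    # orient like the first region of the group
        z_hat = block @ loading
        # Regress the component score on the SNPs (no intercept: data are centred).
        a_hat, *_ = np.linalg.lstsq(g, z_hat, rcond=None)
        scores.append(z_hat)
        coefs.append(a_hat)
    return np.column_stack(scores), np.vstack(coefs)

# Hypothetical data: 3 SNPs, 2 latent variables, groups of 4 and 3 regions.
rng = np.random.default_rng(1)
n = 500
g = rng.binomial(2, 0.3, size=(n, 3)).astype(float)
g -= g.mean(axis=0)                                  # centred SNPs
z = np.column_stack([g[:, 0], -g[:, 1]]) + 0.5 * rng.normal(size=(n, 2))
x = np.column_stack([z[:, [0] * 4], z[:, [1] * 3]]) + 0.5 * rng.normal(size=(n, 7))
x = (x - x.mean(axis=0)) / x.std(axis=0)             # centred and scaled regions
scores, a_hat = two_step_pca_regression(x, g, groups=[[0, 1, 2, 3], [4, 5, 6]])
print(np.round(a_hat, 2))
```

In this toy example the first group's score loads positively on SNP 0 and the second group's score negatively on SNP 1, mirroring how the data were generated.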

### 2.1 Variables used

We model the relationship between single nucleotide polymorphisms (SNPs) and brain region measurements. Let \({\mathbf {g}}_i \in {\mathbb {R}}^{p}\) be a vector of centred (zero-mean) SNP values, and \({\mathbf {x}}_i \in {\mathbb {R}}^{q}\) a vector of centred (zero-mean) and scaled (\({\text {sd}}=1\)) brain region measurements, both for individual *i*. Both types of measurements are centred to eliminate intercepts from the model. The brain measurements are, in addition, scaled to unit variance to compensate for the considerably larger variance in thickness or volume of larger brain areas. The genetic variants and brain measurements are connected in the model by a set of latent variables, \({\mathbf {z}}_i \in {\mathbb {R}}^{m}\).
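The preprocessing described above amounts to column-wise centring of both data types and additional scaling of the brain measurements. A minimal sketch, assuming individuals in rows and variables in columns; the function name and toy values are hypothetical, and we use the sample standard deviation (`ddof=1`) as R's `scale` does.

```python
import numpy as np

def prepare_inputs(snps: np.ndarray, regions: np.ndarray):
    """Centre SNP values and standardise brain-region measurements (columns = variables)."""
    g = snps - snps.mean(axis=0)              # zero-mean SNPs: removes intercepts
    x = regions - regions.mean(axis=0)
    x = x / x.std(axis=0, ddof=1)             # unit variance per region
    return g, x

# Hypothetical example: 4 individuals, 2 SNPs (minor-allele counts 0/1/2),
# and 2 regions whose raw scales differ by three orders of magnitude.
snps = np.array([[0, 1], [1, 2], [2, 0], [1, 1]], dtype=float)
regions = np.array([[2.5, 4100.0], [2.7, 3900.0], [2.4, 4300.0], [2.6, 4000.0]])
g, x = prepare_inputs(snps, regions)
print(g)
print(np.round(x, 3))
```

SNP values are only centred, not scaled, so their weights stay on the allele-count scale.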

In addition to the variables included in the model, we have two other sources of information. In defining the model structure, we make use of external information on the brain region measurements, in the form of brain region groups with a shared transcriptional profile. These groups are defined based on the spatial gene expression data of the healthy adult brain. Finally, the goal is to understand disease-related phenotypes. The disease labels are not used in the modelling stage. However, we hypothesize that if the variation in the data is related to a disease state, the latent variables will reflect this. After model fitting, we therefore associate each individual’s estimated latent variable score with his or her disease status.
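A minimal sketch of this post hoc association step, assuming a plain logistic regression of a binary diagnosis on the estimated latent scores with Wald tests for the coefficients. The paper does not spell out the test, so the Newton-Raphson fit, the Wald *p* values, the function name and the simulated inputs are our assumptions.

```python
import math
import numpy as np

def logistic_wald(scores: np.ndarray, status: np.ndarray, n_iter: int = 25):
    """Logistic regression of a 0/1 status on latent scores (with intercept),
    fitted by Newton-Raphson; returns coefficients and two-sided Wald p values."""
    X = np.column_stack([np.ones(len(status)), scores])
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        hessian = X.T @ (X * (p * (1.0 - p))[:, None])    # observed information
        beta = beta + np.linalg.solve(hessian, X.T @ (status - p))
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    z = beta / np.sqrt(np.diag(cov))
    # Two-sided normal p value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    pvals = np.array([math.erfc(abs(zi) / math.sqrt(2.0)) for zi in z])
    return beta, pvals

# Hypothetical scores for 3 latent variables; only the first one drives disease.
rng = np.random.default_rng(2)
latent_scores = rng.normal(size=(400, 3))
status = rng.binomial(1, 1.0 / (1.0 + np.exp(-1.5 * latent_scores[:, 0]))).astype(float)
beta, pvals = logistic_wald(latent_scores, status)
print(np.round(beta, 2), pvals)
```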

### 2.2 The graphical model
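The equations of this subsection did not survive extraction. Judging from how they are used later in the text (Eq. 1 predicts the brain regions with error \(\varvec{\zeta }\), Eq. 2 predicts the latent variables with error \(\varvec{\varepsilon }\), \({\mathbf {B}}\) holds the latent-to-brain-region weights and \({\mathbf {A}}\) the SNP-to-latent weights), the two structural equations are presumably of the following form. This is a reconstruction from context, not the authors' verbatim notation.

\[
{\mathbf {x}}_i = {\mathbf {B}}\,{\mathbf {z}}_i + \varvec{\zeta }_i, \qquad {\mathbf {B}} \in {\mathbb {R}}^{q \times m}, \tag{1}
\]
\[
{\mathbf {z}}_i = {\mathbf {A}}\,{\mathbf {g}}_i + \varvec{\varepsilon }_i, \qquad {\mathbf {A}} \in {\mathbb {R}}^{m \times p}. \tag{2}
\]

In the applications described here, each row of \({\mathbf {B}}\) has a single nonzero entry, in the column of the latent variable of that region's group. Substituting (2) into (1) gives the reduced form \({\mathbf {x}}_i = {\mathbf {B}}{\mathbf {A}}\,{\mathbf {g}}_i + {\mathbf {B}}\varvec{\varepsilon }_i + \varvec{\zeta }_i\).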

### 2.3 Model implied covariance
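Assuming the structural equations take the form \({\mathbf {x}} = {\mathbf {B}}{\mathbf {z}} + \varvec{\zeta }\) (Eq. 1) and \({\mathbf {z}} = {\mathbf {A}}{\mathbf {g}} + \varvec{\varepsilon }\) (Eq. 2), and that \({\mathbf {g}}\), \(\varvec{\varepsilon }\) and \(\varvec{\zeta }\) are mutually uncorrelated (the assumptions listed in Sect. 2.4), the implied covariance of the observed vector \(({\mathbf {x}}, {\mathbf {g}})\) follows from standard rules: \(\mathrm{Cov}({\mathbf {z}}) = {\mathbf {A}}\varvec{\Sigma }_g{\mathbf {A}}^{\mathrm{T}} + \varvec{\Psi }_\varepsilon\), \(\mathrm{Cov}({\mathbf {x}}) = {\mathbf {B}}\,\mathrm{Cov}({\mathbf {z}})\,{\mathbf {B}}^{\mathrm{T}} + \varvec{\Psi }_\zeta\) and \(\mathrm{Cov}({\mathbf {x}}, {\mathbf {g}}) = {\mathbf {B}}{\mathbf {A}}\varvec{\Sigma }_g\). The symbols \(\varvec{\Sigma }_g\), \(\varvec{\Psi }_\varepsilon\), \(\varvec{\Psi }_\zeta\) and the function name are our own. The sketch computes this block matrix and checks it against a large simulated sample; maximum likelihood SEM software then chooses the free parameters so that this implied matrix matches the sample covariance as closely as possible.

```python
import numpy as np

def implied_covariance(A, B, sigma_g, psi_eps, psi_zeta):
    """Model-implied covariance of the stacked vector (x, g), assuming
    x = B z + zeta, z = A g + eps, with g, eps and zeta mutually uncorrelated."""
    cov_z = A @ sigma_g @ A.T + psi_eps
    cov_xx = B @ cov_z @ B.T + psi_zeta
    cov_xg = B @ A @ sigma_g
    return np.block([[cov_xx, cov_xg], [cov_xg.T, sigma_g]])

# Hypothetical parameters: 3 SNPs, 2 latent variables, 4 regions.
rng = np.random.default_rng(3)
p, m, q, n = 3, 2, 4, 200_000
A = np.array([[1.0, 0.0, -1.0], [0.0, 1.0, 0.0]])
B = np.array([[1.0, 0.0], [0.8, 0.0], [0.0, 1.0], [0.0, 1.2]])  # one latent per region
sigma_g = np.diag([0.42, 0.32, 0.18])
psi_eps, psi_zeta = 0.5 * np.eye(m), 0.3 * np.eye(q)

# Monte Carlo check: simulate from Eqs. (1)-(2) and compare covariances.
g = rng.multivariate_normal(np.zeros(p), sigma_g, size=n)
z = g @ A.T + rng.multivariate_normal(np.zeros(m), psi_eps, size=n)
x = z @ B.T + rng.multivariate_normal(np.zeros(q), psi_zeta, size=n)
sample = np.cov(np.hstack([x, g]), rowvar=False)
model = implied_covariance(A, B, sigma_g, psi_eps, psi_zeta)
print(np.abs(sample - model).max())   # small, shrinking as n grows
```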

### 2.4 Model assumptions and estimation

Some elements of the implied covariance are often assumed to be zero. These assumptions lead to a strong simplification of the implied covariance. It is common in a regression setting to assume that the predictor variables and error variables are independent. In our case, the error independence assumption leads to \({\mathbf {g}}\varvec{\varepsilon }^{\mathrm{T}} = \varvec{\varepsilon }{\mathbf {g}}^{\mathrm{T}} = 0\). In addition, we assume that the errors in the brain region predictions (Eq. 1) are independent of the errors in the latent variable predictions (Eq. 2). This means that \(\varvec{\zeta }\varvec{\varepsilon }^{\mathrm{T}} = \varvec{\varepsilon }\varvec{\zeta }^{\mathrm{T}} = 0\). Finally, we assume that the errors in brain region prediction are independent of the SNPs, so \({\mathbf {g}}\varvec{\zeta }^{\mathrm{T}} = \varvec{\zeta }{\mathbf {g}}^{\mathrm{T}} = 0\).

Model fitting is performed with the *lavaan* package in *R* [23]. For identifiability, we fix the loading of the first brain region measurement in each region group (latent variable) to 1. This not only fixes the scale of each latent variable, but also ensures that the resulting latent variable has the same direction of effect as the first brain region measurement. For example, a reduction in volume of the first brain region will result in a reduction in the corresponding latent variable. All error variances of the brain region measurements (the variance of \(\varvec{\zeta }\)) are assumed to be equal within each region group, as in principal component analysis.
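Why a loading must be fixed can be checked numerically: multiplying latent variable *k* by a nonzero constant while dividing its loadings by the same constant (and rescaling its SNP weights and error variance accordingly) leaves the implied covariance of the regions unchanged, so the data cannot distinguish these solutions. The sketch below, with hypothetical helper names and parameter values, applies the rescaling that sets the first loading of each group to 1 and confirms that the implied covariance is unchanged. A negative first loading is flipped to +1, which is what gives each latent variable the direction of its first region.

```python
import numpy as np

def implied_cov_x(A, B, sigma_g, psi_eps, psi_zeta):
    """Implied covariance of the regions x under x = B z + zeta, z = A g + eps."""
    return B @ (A @ sigma_g @ A.T + psi_eps) @ B.T + psi_zeta

def fix_first_loading(A, B, psi_eps, first_rows):
    """Rescale latent k by c_k = B[first_rows[k], k] so its first loading becomes 1.
    first_rows[k] is the row of the first region of group k (hypothetical helper)."""
    scale = B[first_rows, np.arange(B.shape[1])]     # current first loadings c_k
    B_fixed = B / scale                              # column k divided by c_k
    A_fixed = A * scale[:, None]                     # SNP weights of latent k times c_k
    psi_fixed = psi_eps * np.outer(scale, scale)     # D Psi D with D = diag(c)
    return A_fixed, B_fixed, psi_fixed

A = np.array([[0.5, -0.2], [0.0, 0.7]])
B = np.array([[2.0, 0.0], [1.5, 0.0], [0.0, -0.4], [0.0, 0.9]])
sigma_g, psi_eps, psi_zeta = np.diag([0.4, 0.3]), np.diag([0.6, 0.2]), 0.5 * np.eye(4)
A1, B1, psi1 = fix_first_loading(A, B, psi_eps, first_rows=[0, 2])
print(np.allclose(implied_cov_x(A, B, sigma_g, psi_eps, psi_zeta),
                  implied_cov_x(A1, B1, sigma_g, psi1, psi_zeta)))  # True
```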

The model fit in *lavaan* yields estimates for \({\mathbf {B}}\), \(\mathbf {A}\), and the covariance matrices of the error variables \(\varvec{\varepsilon }\) and \(\varvec{\zeta }\). Each of these parameter estimates is provided with robust *p* values (for the hypothesis of being equal to zero), when using the *MLM* estimation procedure [23]. Using the estimated model parameters, one can then calculate unbiased Bartlett scores for the latent variables [24].
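The Bartlett score of an individual is a generalized least-squares estimate of the latent variables from the region measurements, \(\hat{{\mathbf {z}}}_i = ({\mathbf {B}}^{\mathrm{T}}\varvec{\Psi }_\zeta ^{-1}{\mathbf {B}})^{-1}{\mathbf {B}}^{\mathrm{T}}\varvec{\Psi }_\zeta ^{-1}{\mathbf {x}}_i\), where \(\varvec{\Psi }_\zeta\) (our notation) is the error covariance of the regions. Because the score weight matrix multiplied by \({\mathbf {B}}\) equals the identity, the scores are conditionally unbiased. The sketch treats \({\mathbf {B}}\) and \(\varvec{\Psi }_\zeta\) as known and uses only the measurement part of the model; SEM software computes scores from the full fitted model, so this is an illustration rather than a reimplementation.

```python
import numpy as np

def bartlett_scores(x: np.ndarray, B: np.ndarray, psi_zeta: np.ndarray) -> np.ndarray:
    """Bartlett scores z_hat = (B' Psi^-1 B)^-1 B' Psi^-1 x for each row of x."""
    psi_inv = np.linalg.inv(psi_zeta)
    weights = np.linalg.solve(B.T @ psi_inv @ B, B.T @ psi_inv)  # (m x q), weights @ B = I
    return x @ weights.T

# Hypothetical loadings and error variances: 4 regions, 2 latent variables.
B = np.array([[1.0, 0.0], [0.7, 0.0], [0.0, 1.0], [0.0, 1.3]])
psi_zeta = np.diag([0.3, 0.5, 0.2, 0.4])
z_true = np.array([[1.0, -2.0], [0.5, 0.0]])
x = z_true @ B.T                         # noise-free region measurements
print(bartlett_scores(x, B, psi_zeta))   # recovers z_true
```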

### 2.5 Data

*Simulated data.* The model is evaluated on both simulated and real data. In the simulation, we first generated SNP values (\({\mathbf {g}}_i\)) in accordance with Hardy–Weinberg equilibrium. The minor allele frequencies were independently drawn from a beta distribution with shape parameters \(\alpha =1\) and \(\beta =2\). Then we simulated latent variables (\({\mathbf {z}}_i\)) as a linear combination of the SNP values, with Gaussian noise (\({\hbox {sd}} = 2\)). Each of these latent variables determined the region measurements (\({\mathbf {x}}_i\)) of a set of regions (a region group), with added Gaussian noise (\({\hbox {sd}} = 2\)). This part of the simulation is in line with Eqs. (1) and (2) and Fig. 1. Finally, we used a logistic model in which a linear combination of some of the latent variables determined the probability of observing a phenotype. These binary phenotypes (disease versus healthy) were then drawn from a Bernoulli distribution.

We simulated 100 independent data sets, each of 500 individuals. Each time, we set the number of SNPs to 20 and the number of latent variables (and therefore region groups) to 5. We randomly selected 10 SNP-to-latent weights (\({\mathbf {A}}\)) to be either 1 or \(-\,1\). The 5 region groups contain 20, 10, 10, 5, and 5 regions, respectively, for a total of 50 brain region measurements. Each latent variable has latent-to-brain-region weights (in \({\mathbf {B}}\)) that were uniformly sampled between 0.5 and 1.5. All other elements of \({\mathbf {B}}\) were set to zero, which restricts each latent variable to affect only its own region group. Finally, two of the five latent variables were randomly selected to affect the disease probability, with weights of either 10 or \(-10\). All other latent-to-phenotype weights were set to zero.
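The simulation design described in the two preceding paragraphs can be sketched as follows. The function and argument names are hypothetical; the absence of an intercept in the logistic phenotype model and the use of uncentred genotypes in the latent-variable equation are our assumptions, since the text does not specify them.

```python
import numpy as np

def simulate_dataset(rng, n=500, n_snps=20, group_sizes=(20, 10, 10, 5, 5),
                     n_nonzero=10, noise_sd=2.0, n_causal_latents=2):
    """One simulated data set: genotypes g, regions x, phenotype y, and true A, B."""
    m, q = len(group_sizes), sum(group_sizes)
    maf = rng.beta(1.0, 2.0, size=n_snps)                     # minor allele frequencies
    g = rng.binomial(2, maf, size=(n, n_snps)).astype(float)  # Hardy-Weinberg genotypes
    # SNP-to-latent weights A (m x p): n_nonzero random entries set to +1 or -1.
    A = np.zeros((m, n_snps))
    idx = rng.choice(m * n_snps, size=n_nonzero, replace=False)
    A.flat[idx] = rng.choice([-1.0, 1.0], size=n_nonzero)
    # Latent-to-region weights B (q x m): each region loads only on its own group.
    group = np.repeat(np.arange(m), group_sizes)
    B = np.zeros((q, m))
    B[np.arange(q), group] = rng.uniform(0.5, 1.5, size=q)
    z = g @ A.T + noise_sd * rng.normal(size=(n, m))          # Eq. (2) with noise
    x = z @ B.T + noise_sd * rng.normal(size=(n, q))          # Eq. (1) with noise
    # Phenotype: logistic model on randomly chosen latents with weights +/-10
    # (no intercept: an assumption), then Bernoulli draws.
    w = np.zeros(m)
    causal = rng.choice(m, size=n_causal_latents, replace=False)
    w[causal] = rng.choice([-10.0, 10.0], size=n_causal_latents)
    prob = 1.0 / (1.0 + np.exp(-(z @ w)))
    y = rng.binomial(1, prob)
    return g, x, y, A, B, causal

rng = np.random.default_rng(42)
datasets = [simulate_dataset(rng) for _ in range(100)]
g, x, y, A, B, causal = datasets[0]
```

The robustness settings of the next paragraph correspond to varying `noise_sd` (separately for latents and regions, in a fuller version) and `n_nonzero`.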

To test the robustness of our method, we also simulated data for a range of alternative parameter settings. We varied the amount of noise in the latent variables (\({\mathbf {z}}_i\)) and the region measurements (\({\mathbf {x}}_i\)) between 1 and 5. The number of nonzero SNP-to-latent weights (in \({\mathbf {A}}\)) was varied from 2 to 20. Finally, we constructed data sets with misspecified latent-to-brain-region weights (in \({\mathbf {B}}\)). To this end, we swapped links between latent variables and regions. In each swap, a region was disconnected from its original latent variable and instead connected to another latent variable. To retain the sizes of the region groups, another region of that second latent variable was then connected to the first latent variable. Each swap therefore resulted in two misspecified links. We made sure not to swap regions back to their original latent variables.
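The swap procedure can be sketched as below. This is a simplified version under our own assumptions (hypothetical function name, assignment stored as a vector of group labels per region): it only swaps regions that are still in their original group, which guarantees both that each swap adds exactly two misspecified links and that no region is ever moved back. The authors' procedure may have allowed regions to be moved more than once.

```python
import numpy as np

def swap_links(group: np.ndarray, n_swaps: int, rng) -> np.ndarray:
    """Misspecify region-to-latent links by swapping pairs of regions between
    groups; group sizes are preserved and each swap adds two wrong links."""
    original = np.asarray(group)
    current = original.copy()
    for _ in range(n_swaps):
        untouched = np.flatnonzero(current == original)     # regions never moved
        pairs = [(i, j) for a, i in enumerate(untouched) for j in untouched[a + 1:]
                 if current[i] != current[j]]              # must be in different groups
        if not pairs:
            raise ValueError("no valid swap left")
        i, j = pairs[rng.integers(len(pairs))]
        current[i], current[j] = current[j], current[i]    # exchange group labels
    return current

group = np.repeat(np.arange(5), [20, 10, 10, 5, 5])   # 50 regions in 5 groups
swapped = swap_links(group, n_swaps=3, rng=np.random.default_rng(0))
print((swapped != group).sum())   # 6 misspecified links
```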

*ADNI data and preprocessing.* The real data used in the preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu) [9]. The ADNI was launched in 2003 as a public–private partnership, led by Principal Investigator Michael W. Weiner, MD. The primary goal of ADNI has been to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), other biological markers, and clinical and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer’s disease (AD). For up-to-date information, see www.adni-info.org.

The ADNI database contains measurements on a large number of cognitively normal (CN) controls, individuals with late mild cognitive impairment (LMCI), and individuals with Alzheimer’s disease (AD). The measurements in the database include patient demographics, raw and processed MRI data, biomarker data, and SNP data. For the brain volumes, we made use of the UCSF cross-sectional FreeSurfer (version 4.3) cortical thickness and white matter parcellation measurements. For the SNPs, we made use of the ADNI 1 Illumina Human 610-Quad BeadChip data, with imputation as previously described [17]. In the end, we selected volumes, SNPs, and diagnoses for 746 individuals. These data were split into two equal parts of 373 individuals, one used as a training set and one as a validation set, to guard against over-fitting in the modelling process.

Our methodology is not suited to genome-wide analysis. Instead, it aims to find the effects of specific SNPs on a set of latent variables. As candidate SNPs, we selected a set of 35 polymorphisms associated with Alzheimer’s disease according to the results of the International Genomics of Alzheimer’s Project (IGAP) study [25]. IGAP is a two-stage GWAS of Alzheimer’s disease in individuals of European ancestry. In stage 1, IGAP used genotyped and imputed data on 7,055,881 SNPs of 17,008 Alzheimer’s disease cases and 37,154 controls. In stage 2, 11,632 SNPs were genotyped and tested for association in an independent set of 8,572 Alzheimer’s disease cases and 11,312 controls. Finally, a meta-analysis was performed combining results from stages 1 and 2. We selected the known SNPs, the stage 1 discoveries, and the stage 1 and stage 2 discoveries from Table 2, and the suggestive SNPs from Supplementary Table 4 of [25].

The nine region groups (corresponding to the latent variables), with the brain regions they contain

| Region group code | Allen Brain Atlas (ABA) region |
|---|---|
| CrCortex | Cingulate gyrus |
| CrCortex | Frontal lobe |
| CrCortex | Insula |
| CrCortex | Middle frontal gyrus |
| CrCortex | Occipital lobe |
| CrCortex | Parahippocampal gyrus |
| CrCortex | Parietal lobe |
| CrCortex | Temporal lobe |
| Hippocam | Hippocampal formation |
| Amygdala | Amygdala |
| Striatum | Striatum |
| DorsThal | Dorsal thalamus |
| SubCort1 | Myelencephalon |
| SubCort2 | Globus pallidus |
| SubCort2 | White matter |
| ClCortex | Cerebellar cortex |
| SulcSpac | Sulci and spaces |

## 3 Results

### 3.1 Simulation

The second measure for model comparison is the ability to retrieve the correct SNPs. In each of our simulated data sets, two of the five latent variables have an effect on the phenotype (disease status). All SNPs that affect either of these two latent variables therefore also affect the phenotype, and we consider those to be the SNPs with a true effect. We now consider how these SNPs are ranked for importance by our SEM analysis and by two alternative approaches. From our SEM fit, we extracted the robust SNP *p* values for predicting the latent variables (i.e. the *p* values for the estimates in \({\mathbf {A}}\)). These indicate the importance of a SNP in predicting the latent variables. In addition, we used the *p* values of the latent variables in a logistic regression on the phenotype. These show the importance of a latent variable in predicting the phenotype. As a result, each path from a SNP through a latent variable to the phenotype carries two *p* values: one for the latent variable prediction and one for the phenotype prediction.

We considered combining these *p* values in two ways: (1) for each SNP we took the maximum *p* value of the two per latent variable and then the minimum *p* value over the five latent variables; or (2) for each SNP we used Fisher’s method [26] to combine the two *p* values per latent variable (\(-2 \sum \log (p_i)\)) and then took the minimum *p* value over the five latent variables. Note that Fisher’s method is meant for *p* values testing the same null hypothesis, which is not the case here. Both methods yield a score (*p* value) for SNP importance. We varied a threshold for this score from 0 to 1 and compared the set of SNPs with values below this threshold to the set of SNPs with a known true effect. In this way, we constructed a receiver operating characteristic curve for SNP retrieval and calculated the corresponding area under the curve (AUC).
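The two combination rules and the ROC evaluation can be sketched as follows, with hypothetical function names and made-up *p* values. For two *p* values, Fisher's statistic \(X = -2(\ln p_1 + \ln p_2)\) follows a \(\chi ^2\) distribution with 4 degrees of freedom under the null, whose survival function reduces to the closed form \(p_1 p_2 (1 - \ln (p_1 p_2))\); the AUC is computed as the probability that a true SNP receives a lower score than a null SNP.

```python
import numpy as np

def fisher_two(p1, p2):
    """Fisher's method for two p values: chi-square (4 df) survival function
    of X = -2 ln(p1 p2), i.e. p1 p2 (1 - ln(p1 p2))."""
    prod = p1 * p2
    return prod * (1.0 - np.log(prod))

def snp_scores(p_snp_latent, p_latent_pheno, method="max"):
    """p_snp_latent: (n_snps, n_latents) p values for A; p_latent_pheno: (n_latents,)
    logistic-regression p values. One score per SNP; lower means more important."""
    p_pheno = np.broadcast_to(p_latent_pheno, p_snp_latent.shape)
    if method == "max":
        per_path = np.maximum(p_snp_latent, p_pheno)   # both links must be strong
    else:
        per_path = fisher_two(p_snp_latent, p_pheno)
    return per_path.min(axis=1)                         # best path over latents

def auc(scores, is_true):
    """ROC AUC for selecting SNPs with score < threshold (ties count 1/2)."""
    diff = scores[is_true][:, None] - scores[~is_true][None, :]
    return float(np.mean((diff < 0) + 0.5 * (diff == 0)))

p_A = np.array([[0.001, 0.40], [0.30, 0.002], [0.50, 0.60]])  # 3 SNPs x 2 latents
p_L = np.array([0.01, 0.70])                # only latent 0 affects the phenotype
is_true = np.array([True, False, False])    # only SNP 0 acts through latent 0
max_scores = snp_scores(p_A, p_L, "max")
fisher_scores = snp_scores(p_A, p_L, "fisher")
print(max_scores, fisher_scores, auc(max_scores, is_true))
```

In this toy example SNP 1 obtains a Fisher score of about 0.011 purely from its strong link to a latent variable without phenotype effect, whereas its max score stays at 0.3, illustrating the weakness of the Fisher combination noted below.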

*p* value per SNP–latent combination (*SEM max*) or using Fisher’s method (*SEM Fisher*), and of the GWAS-like approaches. The *SEM max* method has the highest average AUC, indicating that it is best able to rank the SNPs by their importance for the phenotype. Note that the *SEM Fisher* method has the disadvantage that either a strong SNP-to-latent or a strong latent-to-phenotype effect alone can lead to a low combined *p* value, regardless of the other value. The observed difference between the univariate and multivariate approaches is very small, which is to be expected since the simulated SNP values are independent.

To test the robustness of our model, we also compared the models for a range of alternative simulation settings. Additional file 2: Fig. S1 shows the results of these simulations. The amount of noise on the latent variables has a similar impact on all compared methods. With a large amount of noise on the brain region measurements, the prediction of phenotypes remains best with our model, but the identification of SNPs is better with methods that do not make use of the region volume data. The number of SNPs with a nonzero effect on the latent variables has little impact on the simulation results. Misspecification of the region groups, on the other hand, specifically degrades the performance of our method. This shows that our approach is somewhat sensitive to the specification of the brain region groups.

### 3.2 ADNI application

We apply our methodology to the Alzheimer’s Disease Neuroimaging Initiative (ADNI) data [9]. We selected 35 SNPs and 105 brain region volumes for 746 individuals. The brain regions were divided into nine region groups based on the gene expression patterns of matching brain areas in the healthy human brain [12, 21]. Each of the nine brain region groups has one corresponding latent variable, and each latent variable has a unique set of brain region measurements attached to it. Additional file 3: Fig. S2 shows the volume loadings for each of the latent variables. Since the first loading for each latent variable is set to 1, each latent variable has the same direction of effect as its first region measurement. All but two of the region volumes have a positive loading. Two regions in the subcortical group 2 (*SubCort2*) are negatively correlated with the latent variable scores, reflecting a more heterogeneous signal in this group.

*APOE*, on the hippocampal region group (Bonferroni-corrected \(p = 2.28 \times 10^{-4}\)). In the validation set, here used as a replicate, this effect was again significant (Bonferroni-corrected \(p=8.66 \times 10^{-3}\)). None of the other associations are significant after multiple testing correction. This *APOE* allele is known to be associated with a decrease in hippocampal volume, both in individuals with mild cognitive impairment [27] and in Alzheimer’s disease [28].

*APOE* for Alzheimer’s disease.

## 4 Conclusion

We have proposed the use of a maximum likelihood structural equation model for combining SNP data and structural brain area measurements. The model makes use of external gene expression data to define groups of brain regions that may respond similarly to genetic variation. For each of these region groups, we define a latent variable, which captures the relationship between the regions in a group and genetic variation. We have applied the model to simulated data sets to show that it can capture disease-relevant variation and identify causal SNPs. In addition, we have applied the model to the ADNI data set, containing Alzheimer’s patients, individuals with late mild cognitive impairment, and cognitively healthy controls. One SNP, linked to *APOE*, shows a reproducible significant relationship to the latent variable that captures hippocampal volume change. This latent variable, and the ones representing the cerebral cortex, amygdala, and sulci and spaces, also significantly associate with the disease diagnosis. This shows that our approach can be used to integrate several data types and yield interpretable results.

The fitting process of the structural equation model has relatively high computational cost. It is truly multivariate, which makes it infeasible at the moment to perform genome-wide analysis. It does have advantages for incorporating a large number of variables, since it allows for straightforward inclusion of constraints on the parameter estimates [23]. With a constraint on the sum of squared weights, one could for instance implement a ridge regression. In addition, the model allows for the inclusion of additional data. This can be done either in the specification of the model structure, as we have done for the region groups, or by adding observed variables to the model. In our model, we chose to group brain regions based on the similarity of their expression profiles in the healthy brain. An interesting extension to the model would be to incorporate a layer of latent variables to reflect a grouping of the SNPs. These groups could also be based on the similarity of the brain-wide expression patterns of the associated genes.

These results show that maximum likelihood SEM is a versatile approach for data integration, which can be used to elucidate the relationships between genetic variation, structural brain phenotypes, and brain disease.

## Declarations

### Authors’ contributions

SMHH, AM, NKB, BPFL and MJTR were involved in initiating the study and developing the model. SMHH drafted the initial manuscript and performed the analyses. AM, NKB, BPFL and MJTR edited the manuscript. All authors read and approved the final manuscript.

### Authors’ Information

Sjoerd Huisman is a PhD student in the Delft Bioinformatics Lab at Delft University of Technology. Ahmed Mahfouz is an assistant professor at the Leiden Computational Biology Center at the Leiden University Medical Center. Kayhan Batmanghelich is an assistant professor in the Department of Biomedical Informatics and the Intelligent Systems Program with secondary appointments in the Computer Science and Electrical Engineering Departments at the University of Pittsburgh. Boudewijn P.F. Lelieveldt is professor of Biomedical Imaging at the Department of Radiology of Leiden University Medical Center, Leiden, the Netherlands, where he is heading the Division of Image Processing (LKEB). Marcel Reinders is a professor in Bioinformatics within the Faculty of Electrical Engineering, Mathematics and Computer Science at the Delft University of Technology in which he heads the Pattern Recognition and Bioinformatics section.

### Acknowledgements

Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense Award Number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC.; Johnson & Johnson Pharmaceutical Research & Development LLC.; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.

Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI contributed to the design and implementation of ADNI and/or provided data, but did not participate in analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf.

### Competing interests

The authors declare that they have no competing interests.

### Funding

Funding was provided by the Dutch Technology Foundation STW, as part of the STW Project 12721: *Genes in Space* under the *ImaGene* STW Perspective Program, and by the European Union Seventh Framework Programme (FP7/2007-2013) under Grant Agreement 604102 (Human Brain Project). This work is partially supported by NIH Award Number 1R01HL141813-01. We thank the Competitive Medical Research Fund (CMRF) grant for their funding.

### Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Open Access**This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

## Authors’ Affiliations

## References

- Liu J, Calhoun VD (2014) A review of multivariate analyses in imaging genetics. Front Neuroinform 8(March):29. https://doi.org/10.3389/fninf.2014.00029
- Hibar DP, Kohannim O, Stein JL, Chiang MC, Thompson PM (2011) Multilocus genetic analysis of brain images. Front Genet 2(October):1–11. https://doi.org/10.3389/fgene.2011.00073
- Franke B, Stein JL, Ripke S, Anttila V, Hibar DP, van Hulzen KJE et al (2016) Genetic influences on schizophrenia and subcortical brain volumes: large-scale proof of concept. Nat Neurosci 19(3):420–431. https://doi.org/10.1038/nn.4228
- Calhoun VD, Liu J, Adalı T (2009) A review of group ICA for fMRI data and ICA for joint inference of imaging, genetic, and ERP data. NeuroImage 45(1):S163–S172. https://doi.org/10.1016/j.neuroimage.2008.10.057
- Vounou M, Janousova E, Wolz R, Stein JL, Thompson PM, Rueckert D et al (2012) Sparse reduced-rank regression detects genetic associations with voxel-wise longitudinal phenotypes in Alzheimer’s disease. NeuroImage 60(1):700–716. https://doi.org/10.1016/j.neuroimage.2011.12.029
- Stein JL, Medland SE, Vasquez AA, Hibar DP, Senstad RE, Winkler AM et al (2012) Identification of common variants associated with human hippocampal and intracranial volumes. Nat Genet 44(5):552–561. https://doi.org/10.1038/ng.2250
- Batmanghelich NK, Saeedi A, Cho M, Estepar RSJ, Golland P (2015) Generative method to discover genetically driven image biomarkers. In: Colchester ACF, Hawkes DJ (eds) Information processing in medical imaging. Volume 511 of lecture notes in computer science. Springer, Berlin, pp 30–42
- Medland SE, Jahanshad N, Neale BM, Thompson PM (2014) Whole-genome analyses of whole-brain data: working within an expanded search space. Nat Neurosci 17(6):791–800. https://doi.org/10.1038/nn.3718
- Mueller SG, Weiner MW, Thal LJ, Petersen RC, Jack C, Jagust W et al (2005) The Alzheimer’s Disease Neuroimaging Initiative. Neuroimaging Clin N Am 15(4):869–877. https://doi.org/10.1016/j.nic.2005.09.008
- Hibar DP, Stein JL, Renteria ME, Arias-Vasquez A, Desrivières S, Jahanshad N et al (2015) Common genetic variants influence human subcortical brain structures. Nature 520(7546):224–229. https://doi.org/10.1038/nature14101
- Ko Y, Ament SA, Eddy JA, Caballero J, Earls JC, Hood L et al (2013) Cell type-specific genes show striking and distinct patterns of spatial expression in the mouse brain. Proc Natl Acad Sci 110(8):3095–3100. https://doi.org/10.1073/pnas.1222897110
- Hawrylycz MJ, Lein ES, Guillozet-Bongaarts AL, Shen EH, Ng L, Miller JA et al (2012) An anatomically comprehensive atlas of the adult human brain transcriptome. Nature 489(7416):391–399. https://doi.org/10.1038/nature11405
- Richiardi J, Altmann A, Milazzo AC, Chang C, Chakravarty MM, Banaschewski T et al (2015) Correlated gene expression supports synchronous activity in brain networks. Science 348(6240):1241–1244. https://doi.org/10.1126/science.1255905
- Sunkin SM, Ng L, Lau C, Dolbeare T, Gilbert TL, Thompson CL et al (2013) Allen Brain Atlas: an integrated spatio-temporal portal for exploring the central nervous system. Nucleic Acids Res 41(Database issue):D996–D1008. https://doi.org/10.1093/nar/gks1042
- Du L, Huang H, Yan J, Kim S, Risacher SL, Inlow M et al (2016) Structured sparse canonical correlation analysis for brain imaging genetics: an improved GraphNet method. Bioinformatics 32(10):1544–1551. https://doi.org/10.1093/bioinformatics/btw033
- Stingo FC, Guindani M, Vannucci M, Calhoun VD (2013) An integrative Bayesian modeling approach to imaging genetics. J Am Stat Assoc 108(503):37–41. https://doi.org/10.1080/01621459.2013.804409
- Batmanghelich NK, Dalca AV, Quon G, Sabuncu MR, Golland P (2016) Probabilistic modeling of imaging, genetics and diagnosis. IEEE Trans Med Imaging 0062(c):1. https://doi.org/10.1109/TMI.2016.2527784
- Bollen KA (1989) Structural equations with latent variables. Wiley, New York
- Maurano MT, Humbert R, Rynes E, Thurman RE, Haugen E, Wang H et al (2012) Systematic localization of common disease-associated variation in regulatory DNA. Science 337(6099):1190–1195. https://doi.org/10.1126/science.1222794
- Ardlie KG, Deluca DS, Segre AV, Sullivan TJ, Young TR, Gelfand ET et al (2015) The genotype-tissue expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science 348(6235):648–660. https://doi.org/10.1126/science.1262110
- Huisman SMH, van Lew B, Mahfouz A, Pezzotti N, Höllt T, Michielsen L et al (2017) BrainScope: interactive visual exploration of the spatial and temporal human brain transcriptome. Nucleic Acids Res 45(10):e83. https://doi.org/10.1093/nar/gkx046
- Bettens K, Sleegers K, Van Broeckhoven C (2013) Genetic insights in Alzheimer’s disease. Lancet Neurol 12(1):92–104. https://doi.org/10.1016/S1474-4422(12)70259-4
- Rosseel Y (2012) lavaan: an R package for structural equation modeling. J Stat Softw 48(2):1–20
- DiStefano C, Zhu M, Mîndrilă D (2009) Understanding and using factor scores: considerations for the applied researcher. Pract Assess Res Eval 14(20):1–11
- Lambert JC, Ibrahim-Verbaas CA, Harold D, Naj AC, Sims R, Bellenguez C et al (2013) Meta-analysis of 74,046 individuals identifies 11 new susceptibility loci for Alzheimer’s disease. Nat Genet 45(12):1452–1458. https://doi.org/10.1038/ng.2802
- Fisher R (1950) Statistical methods for research workers. Biological monographs and manuals. No. V., 11th edn. Oliver and Boyd, Edinburgh
- Farlow MR, He Y, Tekin S, Xu J, Lane R, Charles HC (2004) Impact of APOE in mild cognitive impairment. Neurology 63(10):1898–1901. https://doi.org/10.1212/01.WNL.0000144279.21502.B7
- Schuff N, Woerner N, Boreta L, Kornfield T, Shaw LM, Trojanowski JQ et al (2009) MRI of hippocampal volume loss in early Alzheimer’s disease in relation to ApoE genotype and biomarkers. Brain 132(4):1067–1077. https://doi.org/10.1093/brain/awp007