Open Access

# Local dimension-reduced dynamical spatio-temporal models for resting state network estimation

Gilson Vieira^{1}, Edson Amaro^{2} and Luiz A. Baccalá^{3}

Brain Informatics **2**:11

https://doi.org/10.1007/s40708-015-0011-5

© The Author(s) 2015

**Received:** 13 October 2014 | **Accepted:** 19 January 2015 | **Published:** 3 February 2015

## Abstract

To overcome the limitations of independent component analysis (ICA), today’s most popular analysis tool for investigating whole-brain spatial activation in resting state functional magnetic resonance imaging (fMRI), we present a new class of local dimension-reduced dynamical spatio-temporal models that dispenses with the independence assumptions which severely limit deeper connectivity descriptions between spatial components. The new method combines novel concepts of group sparsity with contiguity-constrained clusterization to produce physiologically consistent regions of interest in illustrative fMRI data, whose causal interactions may then be easily estimated, something impossible under the usual ICA assumptions.

## Keywords

- Resting state fMRI
- Dynamical spatio-temporal models
- Brain connectivity
- Sparsity

## 1 Introduction

There is an ever-growing and pressing need to describe accurately how brain regions are dynamically interrelated in resting state fMRI [4]. Because of the nature of BOLD signals, resting state interactions cannot be split into separate space and time descriptions, especially when the focus lies on characterizing spatial changes associated with a small number of regions of interest. The chief challenge is that any dynamical spatio-temporal model (DSTM) of fMRI datasets demands many parameters to describe what is also a large number of observed variables which, nonetheless, enjoy a great deal of spatial redundancy [3, 5, 37]. Estimating the spatial origin of signal variability from only relatively short data records using DSTMs is problematic, especially under the rather usual unfavourable signal-to-noise ratio (SNR) conditions [8, 24, 28, 34].

To circumvent the limitations of modelling high-dimensional systems, Wikle and Cressie [33] proposed dimension-reduced DSTMs aimed at capturing nonstationary spatial dependence under optimal state representations using Kalman filtering. In their DSTM formulation, they invoke an a priori defined orthogonal basis to expand the redistribution kernel of a discrete time/continuous space, linear integro-difference equation (IDE) in terms of a finite linear combination of spatial components [33]. This idea was further supported in [14] and extended in [26], which considered parametrized redistribution kernels of arbitrary shape that meet homogeneity conditions in both space and time. Even though the basis changes of [33] improve the understanding of high-dimensional processes, they by no means ensure sparse solutions, which are key to achieving statistically robust dynamical descriptions.
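The dimension-reduction idea of [33] can be sketched as follows (the notation here is ours and purely illustrative):

```latex
% Discrete-time/continuous-space IDE with redistribution kernel k over domain D:
x_{t+1}(s) = \int_{D} k(s,r)\, x_{t}(r)\, \mathrm{d}r + \eta_{t}(s), \qquad s \in D.
% Expanding the kernel on an a priori chosen orthogonal basis \{\phi_i\},
k(s,r) \approx \sum_{i=1}^{p} \sum_{j=1}^{p} a_{ij}\, \phi_i(s)\, \phi_j(r),
% and writing x_t(s) \approx \sum_{i=1}^{p} \alpha_{i,t}\, \phi_i(s),
% projection onto the basis reduces the infinite-dimensional IDE to a
% finite vector autoregression on the expansion coefficients:
\boldsymbol{\alpha}_{t+1} = \mathbf{A}\, \boldsymbol{\alpha}_{t} + \boldsymbol{\eta}_{t}.
```

The reduced coefficient process can then be handled by standard Kalman filtering, which is what makes the approach tractable for high-dimensional fields.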

Model robustness has alternatively been sought by indirect means, for example through LASSO regression [29] and basis pursuit [6] for model selection and denoising, through sparse component analysis for blind source separation [39], and finally through iterative thresholding algorithms for image deconvolution and reconstruction [12, 17]. The latter methods seek sparsity by minimizing a penalized loss function that trades goodness of fit against the number of basis elements that make up the signal. Recently, more attention has been given to group sparsity, where groups of variables are selected/shrunken simultaneously rather than individually (for a review see [2]). This is achieved by minimizing an objective function that adds to a quadratic error term a regularization term reflecting a priori beliefs or data-driven analysis to induce group sparsity [35, 36, 38].
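For concreteness, the block-shrinkage operation underlying group sparsity can be sketched as the proximal operator of the group-lasso penalty. This is a generic NumPy illustration, not the specific estimator used in this paper; the function name and grouping scheme are ours:

```python
import numpy as np

def group_soft_threshold(w, groups, lam):
    """Proximal operator of the group-lasso penalty lam * sum_g ||w_g||_2.

    Each group of coefficients is shrunk toward zero as a block: the whole
    group is either kept (uniformly scaled down) or zeroed out together,
    which is the 'select/shrink simultaneously' behaviour described above.
    """
    out = np.zeros_like(w, dtype=float)
    for g in groups:
        norm = np.linalg.norm(w[g])
        if norm > lam:
            # scale the whole group down by the same factor
            out[g] = (1.0 - lam / norm) * w[g]
        # else: the entire group is set to zero (already zero in `out`)
    return out
```

For instance, with `w = [3, 4, 0.1, 0.1]`, groups `[[0, 1], [2, 3]]` and `lam = 1`, the first group (norm 5) is scaled by 0.8 while the second (norm below 1) vanishes entirely.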

The present paper extends the results in [31] about local dimension-reduced DSTMs (LDSTMs) involving state-space formulations that are suited to datasets of high dimensionality such as fMRI. LDSTMs take advantage of a sparsifying spatial wavelet transformation to represent the data through fewer significant parameters, which are then combined via sparsity and contiguity-constrained clustering to initialize the observation matrix and sources of a tailored expectation maximization (EM) algorithm. The main assumptions here are that the system is overdetermined (there exist more observed signals than sources) and that the columns of the observation matrix act as point-spreading functions (see Sect. 2). Finally, results are gauged using simulated data (Sect. 4), followed by a further illustration of the method through directed connectivity disclosure on real resting state fMRI data.

## 2 Problem formulation

The EM algorithm has long been the favourite tool to solve (1,2) for \({\mathbf {x}}_{t}\) because (3) is sure to converge to at least a local maximum [13, 27]. The traditional EM algorithm starts with randomly generated solutions for all parameters and then re-iterates its two main steps until the maximum of (3) is attained. It begins with the E-step, where the unknown \({\mathbf {x}}_{t}\) are replaced by their expected values given the data and the current model parameter estimates. Under Gaussian assumptions, the expected system states \({\mathbf {x}}_{t}\) are obtained via the Rauch–Tung–Striebel (RTS) smoother [25]. In the second step, the M-step, one estimates the model parameters by maximizing the conditional expected likelihood from the previous E-step. In practice, EM algorithm performance degrades rapidly for high-dimensional systems under (1,2); its solution may even become indeterminate, and improper initialization often deteriorates estimate quality.
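The E-step just described can be sketched for a generic linear-Gaussian state-space model. The sketch below uses our own minimal notation (state transition \({\mathbf {H}}\), observation matrix \({\mathbf {A}}\), mirroring the roles these matrices play in (1,2)) and returns only the smoothed state means; a full EM implementation would also propagate smoothed covariances into the M-step:

```python
import numpy as np

def rts_smooth(z, H, A, Q, R, x0, P0):
    """E-step sketch: Rauch-Tung-Striebel smoothing for
        x_t = H x_{t-1} + w_t,  w_t ~ N(0, Q)   (state equation)
        z_t = A x_t     + v_t,  v_t ~ N(0, R)   (observation equation)
    Returns the smoothed state means E[x_t | z_1..z_T].
    """
    T, p = len(z), len(x0)
    xf = np.zeros((T, p)); Pf = np.zeros((T, p, p))   # filtered
    xp = np.zeros((T, p)); Pp = np.zeros((T, p, p))   # one-step predicted
    x, P = x0, P0
    for t in range(T):                 # forward Kalman filter pass
        x, P = H @ x, H @ P @ H.T + Q  # predict
        xp[t], Pp[t] = x, P
        S = A @ P @ A.T + R            # innovation covariance
        K = P @ A.T @ np.linalg.inv(S) # Kalman gain
        x = x + K @ (z[t] - A @ x)     # update with observation z_t
        P = P - K @ A @ P
        xf[t], Pf[t] = x, P
    xs = xf.copy()
    for t in range(T - 2, -1, -1):     # backward RTS pass
        J = Pf[t] @ H.T @ np.linalg.inv(Pp[t + 1])
        xs[t] = xf[t] + J @ (xs[t + 1] - xp[t + 1])
    return xs
```

On a toy scalar system observing a constant level through noise, the smoothed trajectory converges to that level from both ends, which is exactly what the E-step supplies to the subsequent M-step.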

To achieve robust EM solutions, we take into account two common neuroscientist concerns as to what constitutes meaningful brain activity components: (a) \({\mathbf {x}}_{t}\) should be an economic (i.e. compact/low-dimensional) dynamical representation of the brain resting state fMRI dataset as a whole, and (b) solutions must be spatially localized, i.e. their associated activation areas should mathematically reflect point-spreading functions. We show that the latter assumptions allow estimating not only the parameters of (1,2) but also \({\mathbf {x}}_{t}\), using the simpler Local Sparse Component Analysis discussed in [32] applied to \({\mathbf {z}}_{t}\). The present algorithm is summarized in Fig. 1. The aim is to find initial estimates of the observation matrix and system states, which are then used to initialize an EM algorithm for the maximization of (3).

## 3 Algorithm details

### 3.1 Sparsifying spatial wavelet transformation

### 3.2 Contiguity-constrained clustering

The next step consists of determining which time series of wavelet coefficients \(\hat{\mathbf {s}}^{m}\) are associated with each spatial component \({\mathbf {a}}_{k} {\mathbf {x}}^{k}\), where \({\mathbf {a}}_{k}\) is the \(k\)-th column of \({\mathbf {A}}\) and \({\mathbf {x}}^{k}\) is the \(k\)-th row of \({\mathbf {X}}\). For this, we use the spatial localization assumption: as the columns of the observation matrix are point-spreading functions, they should be perfectly described by wavelet coefficients forming localized spatial patterns. Each spatial component can then be determined by a clustering algorithm enforcing spatial contiguity. One way of achieving this is to apply complete-linkage hierarchical clustering with a dissimilarity measure that combines the temporal correlation between the time series and the physical distance between the wavelet coefficients. Complete-linkage hierarchical clustering is attractive here because it yields relatively homogeneous clusters, a key property for the subsequent accurate reduction of cluster dimensionality.
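A minimal sketch of this clustering step, assuming SciPy's hierarchical clustering tools. The particular way of mixing correlation and distance below (a prohibitive dissimilarity for spatially distant pairs) is one illustrative choice, not necessarily the paper's exact combination rule:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def contiguous_complete_linkage(series, coords, max_dist, n_clusters):
    """Complete-linkage clustering of wavelet-coefficient time series with a
    dissimilarity mixing temporal correlation and spatial distance.

    series: (n, T) array of coefficient time series.
    coords: (n, d) spatial positions of the coefficients.
    Pairs farther apart than max_dist receive a prohibitive dissimilarity,
    so complete linkage never merges spatially non-contiguous coefficients.
    """
    corr = np.corrcoef(series)                              # temporal similarity
    dist = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    D = 1.0 - np.abs(corr)                                  # correlation dissimilarity
    D[dist > max_dist] = 1e6                                # enforce contiguity
    np.fill_diagonal(D, 0.0)
    Z = linkage(squareform(D, checks=False), method='complete')
    return fcluster(Z, t=n_clusters, criterion='maxclust')
```

Because complete linkage scores a merge by the *worst* pair across two clusters, a single prohibitive entry is enough to keep non-contiguous coefficients in separate clusters.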

### 3.3 Within cluster dimensionality reduction

### 3.4 LDSTM parameter estimation

## 4 Numerical illustration

We used Daubechies (D2) functions to transform the data and gauged performance over 100 Monte Carlo simulations, leading to the mean and standard deviation results shown in Fig. 3. Algorithm effectiveness was evaluated in terms of how well the sources were recovered, as measured by their correlation with the estimated \({\mathbf {x}}_{t}\), and by how well \({\mathbf {H}}_{l}\) and \({\mathbf {Q}}\) could be estimated, as gauged by computing the connectivity between states using Partial Directed Coherence (PDC) [1].
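For reference, PDC can be computed from estimated VAR coefficient matrices \({\mathbf {H}}_{1},\ldots,{\mathbf {H}}_{L}\) as sketched below, using the original (non-generalized) PDC normalization of [1]; the implementation details are ours:

```python
import numpy as np

def pdc(H_list, freqs):
    """Partial Directed Coherence from VAR matrices H_1..H_L.

    out[f, i, j] measures the directed influence j -> i at normalized
    frequency freqs[f] in [0, 0.5] (original PDC definition: each column
    of |A_bar(f)| is normalized by its Euclidean norm).
    """
    p = H_list[0].shape[0]
    out = np.zeros((len(freqs), p, p))
    for fi, f in enumerate(freqs):
        # A_bar(f) = I - sum_l H_l exp(-i 2 pi f l)
        Abar = np.eye(p, dtype=complex)
        for l, H in enumerate(H_list, start=1):
            Abar = Abar - H * np.exp(-2j * np.pi * f * l)
        denom = np.sqrt(np.sum(np.abs(Abar) ** 2, axis=0))  # column norms
        out[fi] = np.abs(Abar) / denom
    return out
```

By construction the squared PDC entries in each column sum to one, so a structurally absent connection (a zero entry in every \({\mathbf {H}}_{l}\)) yields exactly zero PDC at all frequencies.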

### 4.1 Simulation results

## 5 Real fMRI data

For further illustration purposes, we used fMRI images from seven healthy volunteers under a resting state protocol (approved by the local ethical committee and under individual informed written consent).

### 5.1 Image data acquisition

Whole-brain fMRI images (\(\hbox {TR}=600\hbox { ms}, \hbox {TE}=33\hbox { ms}, 32\) slices, \(\hbox {FOV} = 247 \times 247 \hbox { mm}\), matrix size \(128 \times 128\), in-plane resolution \(1.975 \times 1.975 \hbox { mm}\), slice thickness \(3.5\hbox { mm}\) with \(1.8\hbox { mm}\) of gap) were acquired on a 3T Siemens system using a Multiplexed Echo Planar Imaging sequence (multi-band acceleration factor of \(4\)) [16]. To aid in the localization of functional data, high-resolution T1-weighted images were also acquired with an MPRAGE sequence (\(\hbox {TR} = 2500\hbox { ms}, \hbox { TE} = 3.45 \,\hbox {ms}\), inversion time = 1000 ms, \(256 \times 256 \hbox { mm}\) FOV, \(256 \times 256\) in-plane matrix, \(1 \times 1 \times 1 \hbox { mm}\) voxel size, \(7\,^{\circ }\) flip angle).

### 5.2 LDSTM preprocessing

Motion and slice-timing correction and temporal high-pass filtering (retaining fluctuations above \(0.005\,\hbox {Hz}\)) were carried out using FEAT \(\hbox {v}5.98\). The fMRI data were aligned to the grey matter mask via FreeSurfer’s automatic registration tools (v. 5.0.0), resulting in BOLD signals extracted from regions with predominantly neuronal cell bodies. To allow group analysis by temporal concatenation of the participants’ fMRIs, individual grey matter images were registered to the 3-mm-thick Montreal Neurological Institute (MNI) template using a 12-parameter affine transform. To generate the spatial wavelet transformation, we used 3D Daubechies (D2) functions up to level 3. The model order for the dynamical component in (1) was defined by the Akaike information criterion.
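The sparsifying effect of the D2 transform (the two-tap Daubechies filter, i.e. the Haar filter) is easiest to see in one dimension; the paper applies the 3-D analogue up to level 3. This toy implementation is our own illustration, not the paper's code:

```python
import numpy as np

def haar_dwt(x, levels):
    """Multilevel 1-D Daubechies D2 (Haar) wavelet transform.

    Returns [approx, detail_L, ..., detail_1] (coarsest detail first).
    Smooth signals concentrate their energy in a few coefficients,
    which is the sparsifying property the LDSTM pipeline exploits.
    Requires len(x) divisible by 2**levels.
    """
    s2 = np.sqrt(2.0)
    details = []
    a = np.asarray(x, dtype=float)
    for _ in range(levels):
        even, odd = a[0::2], a[1::2]
        details.append((even - odd) / s2)   # detail coefficients
        a = (even + odd) / s2               # approximation coefficients
    return [a] + details[::-1]
```

The transform is orthogonal, so the total energy (sum of squared coefficients) is preserved; for a locally smooth signal almost all of it migrates into the approximation band, leaving the detail bands nearly empty.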

### 5.3 ICA processing

### 5.4 Image results

#### 5.4.1 LDSTM results

The absence of artificial stochastic model constraints made it possible to expose the dynamic connectivity between the identified components. Figure 7a summarizes the connectivity network estimated using PDC applied to the reconstructed system components. In addition, PDC highlights that resting state connectivity is present mainly at low frequencies (Fig. 7b), corroborating several studies of resting state brain connectivity [4].

#### 5.4.2 ICA results

Among the 30 component maps obtained by performing PICA across all participants, 14 were considered artifactual due to scanner and physiological noise; their signal variance is related to cerebrospinal fluid and white matter, head motion and large vessels. Figure 8 depicts fourteen functional components that match previously reported resting state studies. They comprise the default mode network (IC\(2\), IC\(9\), IC\(10\)) and brain regions involved in visual (IC\(1\), IC\(4\)), auditory/motor (IC\(5\)), sensory/motor (IC\(8\)), attentional (IC\(7\), IC\(6\), IC\(12\), IC\(13\)) and executive functions (IC\(7\), IC\(11\), IC\(14\)). In addition, we found two components rarely reported in resting state studies: a cerebellum component (IC\(16\)) and a brainstem component (IC\(15\)).

## 6 Discussion

The cortical components identified by LDSTM (Fig. 5) reflect most of the data variability and coincide with traditional resting state regions observed across different individuals, data acquisitions and analysis techniques. They comprise the default mode network \((\hbox {SC}8)\) and brain regions involved in visual \((\hbox {SC}1, \hbox {SC}2, \hbox {SC}5, \hbox {SC}6)\), motor \((\hbox {SC}13, \hbox {SC}14, \hbox {SC}7)\) and attentional functions \((\hbox {SC}9, \hbox {SC}10, \hbox {SC}17, \hbox {SC}18)\), indicating that most of the ICA components (Fig. 8) can in fact be decomposed into several local sparse components. Importantly, the present results were obtained without any additional assumption such as source independence and/or stationarity. All that was assumed was the spatial localization of \({\mathbf {a}}_{k}\), which aligns with the observation in [11] that ICA effectiveness for brain fMRI is linked to its ability to handle sparse sources rather than independent ones. This can be explained by noting that ICA preprocessing projects the data into a reduced-dimensional subspace via the singular value decomposition, which in turn confines the sources to regions of high signal variance.

PDC analysis shows a network where information flows from regions in the superior parietal cortex (SPC) to regions in the cerebellum (CER) and anterior cingulate. As expected, the right SPC sends information to the left CER, and the left SPC to the right CER. Although the relationship between these structures is known, this result stresses two main systems engaged in this network. The connectivity between SPC and CER is in line with recent studies showing evidence of a cerebellar-parietal network involved in phonological storage [22]. In addition, visual–parietal–cerebellar interactions are expected in light of studies of effective connectivity using fMRI [20]. We also observe a network running from the left to the right parietal cortex passing through the posterior cingulate. Altogether, we believe that our results provide insight into how the regions of the fronto-parietal network interact, and they highlight understudied aspects of the cerebellum’s role in this network during resting state.

In our model, LDSTM identified approximately 50 % of its components in the cerebellum. This result is surprising, as the rate of cerebellar components identified in resting state using ICA is generally below 20 % [4]. Some of these regions seem to be related to noise sources, being located near cerebellar arteries and veins: components SM1, SM2, SM12, SM17 and SM18 run along the superior surface of the cerebellum near the superior cerebellar veins, while components SM8 and SM9 extend into the end of the straight sinus near the internal cerebral veins. On the other hand, the idea that the cerebellum should present as many components as the cortex is encouraging. Many recent fMRI studies have shown that different cerebellar regions are critical for processing higher-order functions in different cognitive domains, much as occurs in the cortex [30]. It is worth mentioning that in these studies cerebellar clusters are always smaller than those of corresponding functionality in the cortex. We believe that some differences between ICA and LDSTM may be explained in part by the features of the domain in which each represents the sources.

Since spatial wavelet analysis efficiently encodes the data’s neighbourhood information via an orthogonal transformation, the present method properly addresses a number of issues in whole-brain connectivity estimation. The first is the lack of knowledge about the spatial localization of the sources: the method provides a data-driven approach to locating the main sources of data variability, thus avoiding the effects and uncertainties of a priori region-of-interest delineation. The second is that the new method naturally employs multi-scale transformations to create a compact model of the images, a feature of growing importance as higher-resolution images become available, whose computational processing load may thereby be substantially mitigated. Finally, and most importantly, unlike ICA, the method permits deeper connectivity analysis between the identified spatial components because no independence assumption is made a priori.

## 7 Conclusions

Here, an EM-based algorithm was presented for LDSTM identification. By projecting high-dimensional datasets into smoothness spaces, one can describe the system’s spatial components via a reduced number of parameters. Further dimension reduction and denoising are obtained by soft-vector thresholding under contiguity-constrained hierarchical clustering. Simulation results corroborate that the new algorithm can outperform the traditional EM approach even under mild conditions. Even with very large datasets, as in the fMRI example, LDSTM shows promise in its ability to parcellate the human brain into well-localized, physiologically plausible regions of spatio-temporal brain activation patterns.

## Declarations

### Acknowledgments

This work was supported by CNPq Grant 307163/2013-0 to L.A.B. We also thank NAPNA (Núcleo de Neurociência Aplicada) of the University of São Paulo, and acknowledge FAPESP Grant 2005/56464-9 (CInAPCe), during which part of this work took place.

### Conflict of interest

The authors declare that they have no conflict of interest.

**Open Access**This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.

## References

- Baccala LA, de Brito CSN, Takahashi DY, Sameshima K (2013) Unified asymptotic theory for all partial directed coherence forms. Philos Trans R Soc A 371(1997):20120158
- Bach F, Jenatton R, Mairal J, Obozinski G (2011) Structured sparsity through convex optimization. arXiv e-print arXiv:1109.2397
- Beckmann CF, Smith SM (2004) Probabilistic independent component analysis for functional magnetic resonance imaging. IEEE Trans Med Imaging 23(2):137–152
- Biswal BB, Mennes M, Zuo X-N, Gohel S, Kelly C, Smith SM, Beckmann CF, Adelstein JS, Buckner RL, Colcombe S, Dogonowski A-M, Ernst M, Fair D, Hampson M, Hoptman MJ, Hyde JS, Kiviniemi VJ, Kötter R, Li S-J, Lin C-P, Lowe MJ, Mackay C, Madden DJ, Madsen KH, Margulies DS, Mayberg HS, McMahon K, Monk CS, Mostofsky SH, Nagel BJ, Pekar JJ, Peltier SJ, Petersen SE, Riedl V, Rombouts SARB, Rypma B, Schlaggar BL, Schmidt S, Seidler RD, Siegle GJ, Sorg C, Teng G-J, Veijola J, Villringer A, Walter M, Wang L, Weng X-C, Whitfield-Gabrieli S, Williamson P, Windischberger C, Zang Y-F, Zhang H-Y, Castellanos FX, Milham MP (2010) Toward discovery science of human brain function. Proc Natl Acad Sci USA 107(10):4734–4739
- Blumensath T, Jbabdi S, Glasser MF, Van Essen DC, Ugurbil K, Behrens TEJ, Smith SM (2013) Spatially constrained hierarchical parcellation of the brain with resting-state fMRI. NeuroImage 76:313–324
- Chen SS, Donoho DL, Saunders MA (1998) Atomic decomposition by basis pursuit. SIAM J Sci Comput 20(1):33–61
- Combettes PL, Wajs VR (2005) Signal recovery by proximal forward-backward splitting. Multiscale Model Simul 4(4):1168–1200
- Cortes J (2009) Distributed Kriged Kalman filter for spatial estimation. IEEE Trans Autom Control 54(12):2816–2827
- Cressie N, Wikle CK (2011) Statistics for spatio-temporal data. Wiley, Hoboken
- Damoiseaux JS, Rombouts SARB, Barkhof F, Scheltens P, Stam CJ, Smith SM, Beckmann CF (2006) Consistent resting-state networks across healthy subjects. Proc Natl Acad Sci 103(37):13848–13853
- Daubechies I, Roussos E, Takerkart S, Benharrosh M, Golden C, D’Ardenne K, Richter W, Cohen JD, Haxby J (2009) Independent component analysis for brain fMRI does not select for independence. Proc Natl Acad Sci USA 106(26):10415–10422
- Daubechies I, Defrise M, De Mol C (2003) An iterative thresholding algorithm for linear inverse problems with a sparsity constraint. arXiv e-print math/0307152
- Dempster AP, Laird NM, Rubin DB (1977) Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc Ser B 39(1):1–38
- Dewar M, Scerri K, Kadirkamanathan V (2009) Data-driven spatio-temporal modeling using the integro-difference equation. IEEE Trans Signal Process 57(1):83–91
- Donoho DL, Johnstone IM, Kerkyacharian G, Picard D (1995) Wavelet shrinkage: asymptopia? J R Stat Soc Ser B (Methodol) 57(2):301–369
- Feinberg DA, Moeller S, Smith SM, Auerbach E, Ramanna S, Glasser MF, Miller KL, Ugurbil K, Yacoub E (2010) Multiplexed echo planar imaging for sub-second whole brain fMRI and fast diffusion imaging. PLoS One 5(12):e15710
- Figueiredo MAT, Nowak RD (2003) An EM algorithm for wavelet-based image restoration. IEEE Trans Image Process 12(8):906–916
- Friston KJ, Frith CD, Liddle PF, Frackowiak RSJ (1993) Functional connectivity: the principal-component analysis of large (PET) data sets. J Cereb Blood Flow Metab 13(1):5–14
- Georgiev P, Theis F, Cichocki A, Bakardjian H (2007) Sparse component analysis: a new tool for data mining. In: Pardalos PM, Boginski VL, Vazacopoulos A (eds) Data mining in biomedicine, number 7 in Springer optimization and its applications. Springer, New York, pp 91–116
- Kellermann T, Regenbogen C, De Vos M, Mößnang C, Finkelmeyer A, Habel U (2012) Effective connectivity of the human cerebellum during visual attention. J Neurosci 32(33):11453–11460
- Lohmann G, Volz KG, Ullsperger M (2007) Using non-negative matrix factorization for single-trial analysis of fMRI data. Neuroimage 37(4):1148–1160
- Macher K, Böhringer A, Villringer A, Pleger B (2014) Cerebellar-parietal connections underpin phonological storage. J Neurosci 34(14):5029–5037
- Mallat SG (2009) A wavelet tour of signal processing: the sparse way. Elsevier/Academic Press, Amsterdam
- Mardia KV, Goodall C, Redfern EJ, Alonso FJ (1998) The Kriged Kalman filter. Test 7(2):217–282
- Rauch HE, Striebel CT, Tung F (1965) Maximum likelihood estimates of linear dynamic systems. J Am Inst Aeronaut Astronaut 3(8):1445–1450
- Scerri K, Dewar M, Kadirkamanathan V (2009) Estimation and model selection for an IDE-based spatio-temporal model. IEEE Trans Signal Process 57(2):482–492
- Shumway RH, Stoffer DS (1982) An approach to time series smoothing and forecasting using the EM algorithm. J Time Ser Anal 3(4):253–264
- Stoodley CJ, Schmahmann JD (2009) Functional topography in the human cerebellum: a meta-analysis of neuroimaging studies. Neuroimage 44(2):489–501
- Theophilides CN, Ahearn SC, Grady S, Merlino M (2003) Identifying West Nile virus risk areas: the dynamic continuous-area space-time system. Am J Epidemiol 157(9):843–854
- Tibshirani R (1994) Regression shrinkage and selection via the lasso. J R Stat Soc Ser B 58:267–288
- Vieira G, Amaro E, Baccala LA (2014) Local dimension-reduced dynamical spatio-temporal models for resting state network estimation. In: Hutchison D, Kanade T, Kittler J, Kleinberg JM, Kobsa A, Mattern F, Mitchell JC, Naor M, Nierstrasz O, Pandu Rangan C, Steffen B, Terzopoulos D, Tygar D, Weikum G, Ślęzak D, Tan A-H, Peters JF, Schwabe L (eds) Brain informatics and health. Springer International Publishing, Cham, pp 436–446
- Vieira G, Amaro E, Baccala LA (2014) Local sparse component analysis for blind source separation: an application to resting state fMRI. In: Proceedings of IEEE EMBS conference, IEEE
- Wikle CK, Cressie N (1999) A dimension-reduced approach to space-time Kalman filtering. Biometrika 86(4):815–829
- Woolrich MW, Jenkinson M, Michael Brady J, Smith SM (2004) Fully Bayesian spatio-temporal modeling of fMRI data. IEEE Trans Med Imaging 23(2):213–231
- Wright SJ, Nowak RD, Figueiredo MAT (2009) Sparse reconstruction by separable approximation. IEEE Trans Signal Process 57(7):2479–2493
- Yuan M, Lin Y (2006) Model selection and estimation in regression with grouped variables. J R Stat Soc 68(1):49–67
- Zalesky A, Fornito A, Harding IH, Cocchi L, Yücel M, Pantelis C, Bullmore ET (2010) Whole-brain anatomical networks: does the choice of nodes matter? NeuroImage 50(3):970–983
- Zhao P, Rocha G (2009) The composite absolute penalties family for grouped and hierarchical variable selection. Ann Stat 37(6A):3468–3497
- Zibulevsky M, Pearlmutter BA (2001) Blind source separation by sparse decomposition in a signal dictionary. Neural Comput 13(4):863–882