
Robust unified Granger causality analysis: a normalized maximum likelihood form


Unified Granger causality analysis (uGCA) recasts the conventional two-stage Granger causality analysis as a unified code-length guided framework. We have presented several forms of uGCA for investigating causal connectivities; each form has its own characteristics and approaches the ground-truth network well in its suitable context. In this paper, we compare these forms of uGCA in detail and recommend a relatively more robust one, uGCA-NML, for more general scenarios. We then clarify the distinguishing advantages of uGCA-NML in a synthetic 6-node network. Moreover, uGCA-NML showed good robustness in mental arithmetic experiments, identifying a stable similarity among causal networks under visual and auditory stimuli. Owing to its stability and accuracy, uGCA-NML is a preferred choice in this unified causal investigation paradigm.


Granger causality analysis (GCA) [1, 2], a statistical prediction tool, describes causal relationships between candidate events by comparing the residuals of nested regressions. The original GCA only describes information flow between variables mathematically; it is predictive and may not capture the underlying causal relationships between events in a strict philosophical sense. However, owing to its simple, data-driven form of causal discovery, GCA has been widely applied and developed since it was introduced into brain science. Considering the limitations of the conventional GCA research paradigm, we proposed a unified paradigm, uGCA, to investigate causal networks in the brain [3, 4]. This unified paradigm uses code length to guide causal discovery and relies on the minimum description length (MDL) principle to guide the generalized model selection throughout the process. A unified mathematical theory, no subjective choice of confidence level, and free comparison of candidate models make uGCA more advantageous.

So far, we have extended uGCA to several forms beyond the crude two-part form, each formalized upon a different mathematical theory. The uGCA-TP form derives from a two-part coding scheme that describes a fitting-error term and a model-complexity term, and it behaves like a Lagrange duality solving procedure. On the other hand, by specifying priors on its parameter space, the uGCA-MIX form behaves like a Bayes estimator; a simple approximation to this model selection problem yields the stochastic information criterion (SIC). Earlier two-part codes still retain some inherent redundancy. Thus the normalized maximum likelihood (NML) form of MDL, which takes Fisher information into account, was developed based on the coding scheme of Shtarkov [5, 6]. In general, the NML form restricts the second-part description of the two-part MDL to a data space identified by the parameter estimate [7]. This scheme for generic model selection was formally introduced by Rissanen in 1996, who also discussed its connection with minimax theory. It yields a sharper description length, the stochastic complexity, and the associated universal process for a class of parametric processes [8]. In addition, this description form is built on the maximum-likelihood estimate (MLE), which is required to satisfy the central limit theorem (CLT) [6, 9]. In this light, the associated uGCA-NML seems to be a more sensible choice: it not only eliminates the inherent redundancy in the coding process but also removes the need for priors on the parameter space.

In previous studies, we focused on demonstrating the advantages of the uGCA paradigm over the conventional GCA paradigm. Although the characteristics of the different forms of uGCA had been described [4], we did not choose among them. In this study, we conclude that uGCA-NML is the better choice for most causal investigations. Beyond the advantages mentioned above, most current scientific research tends toward larger samples and bigger data, which increasingly satisfies the CLT requirement of uGCA-NML. At the same time, we consider uGCA-NML more consistent with our original intention of investigating causality under a unified mathematical principle, and this form incorporates the generalized model selection issues into the code length guided framework more precisely.

The rest of the article is organized as follows. In Sect. 2, we first briefly describe the code length guided causal investigation paradigm. We then state uGCA-NML, derived from the NML form, in detail, and derive its generalized formulas within a general model class. From these, the formula for description length guided causal investigation in an ordinary linear model follows. In Sect. 3, we illustrate its advantages over other uGCA forms in synthetic experiments on a 6-node network. More importantly, in a task-related fMRI data set, uGCA-NML identified consistent and more stable causal networks for mental arithmetic under different stimuli. Sections 4 and 5 compare the several forms from a mathematical modeling standpoint and discuss potential future developments.


Initially, we attempt to integrate the whole process of causal discovery into a unified mathematical theoretical framework. Inspired by the development of current coding theory and general computer theory, we consider incorporating the generalized model selection issues of GCA into the same benchmark, from which a unified code length guided causal investigation paradigm has emerged. At the same time, derived from information theory and stochastic complexity, the MDL principle has presented a systematic solution to the optimization problem of generalized model selection, and has different forms to cope with the diversity of data sources. Consequently, we developed the uGCA paradigm to explore causal relationships based on code length by means of the MDL principle.

Description length guided causal investigation

Considering two variables, \({X_{N}}\) and \({Y_{N}}\), the description models associated with \({X_{N}}\) are written as

$$\begin{aligned} {\left\{ \begin{array}{ll} X_{t}=\sum _{j=1}^{n_{1}}a_{1j}X_{t-j}+\epsilon _{1t}\\ X_{t}=\sum _{j=1}^{n_{2}}a_{2j}X_{t-j}+\sum _{j=1}^{n_{3}}b_{2j}Y_{t-j}+\epsilon _{2t}, \end{array}\right. } \end{aligned}$$

where \(\epsilon _{1t}\) and \(\epsilon _{2t}\) are the fitting residuals. Distilling the concept of the GCA paradigm, the causal effect from Y to X within the uGCA paradigm is defined by

$$\begin{aligned} \begin{aligned} F_{Y\rightarrow X}=L_{X}-L_{X+Y}, \end{aligned} \end{aligned}$$

where \(L_{X}\) denotes the shortest code length of the restricted model in Eq. (1), and \(L_{X+Y}\) denotes the shortest code length of the unrestricted model in Eq. (1), obtained after adding \(Y_{N}\). A causal effect from Y to X exists when \(F_{Y\rightarrow X} >0\); otherwise, no causal effect from Y to X is identified. The conditional form of GCA has already been introduced into the uGCA paradigm and extended to large-scale network analysis [3, 4]. The derivation of the code length associated with the optimal model in the uGCA-NML form is given in detail below.
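As a minimal sketch of the two regressions in Eq. (1), the following Python code builds the lagged regressors and fits the restricted and unrestricted models by ordinary least squares. The helper names (`lagged`, `fit_rss`, `restricted_and_unrestricted`) and the toy data are illustrative assumptions, not part of the original method; the residual sums of squares it returns are the quantities from which a code length such as \(L_{X}\) or \(L_{X+Y}\) would then be computed.

```python
import numpy as np

def lagged(series, order, max_lag):
    """Matrix whose row for time t holds series[t-1], ..., series[t-order]."""
    n = len(series)
    return np.column_stack([series[max_lag - j:n - j] for j in range(1, order + 1)])

def fit_rss(target, design):
    """Least-squares fit; returns (coefficients, residual sum of squares)."""
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ beta
    return beta, float(resid @ resid)

def restricted_and_unrestricted(x, y, n1, n2, n3):
    """Fit both models of Eq. (1) on the same time points t = m, ..., N-1."""
    m = max(n1, n2, n3)
    target = x[m:]
    restricted = fit_rss(target, lagged(x, n1, m))
    unrestricted = fit_rss(target, np.hstack([lagged(x, n2, m), lagged(y, n3, m)]))
    return restricted, unrestricted

# Toy data (illustrative only): y drives x with a one-step delay.
rng = np.random.default_rng(0)
N = 500
y = rng.normal(size=N)
x = np.zeros(N)
for t in range(1, N):
    x[t] = 0.5 * x[t - 1] + 0.4 * y[t - 1] + rng.normal()

(beta_r, rss_r), (beta_u, rss_u) = restricted_and_unrestricted(x, y, n1=2, n2=2, n3=2)
print(f"RSS restricted = {rss_r:.1f}, RSS unrestricted = {rss_u:.1f}")
```

Both models are fitted on the same time points so that their code lengths describe the same data; the lag orders are fixed here for brevity, whereas in uGCA they are themselves selected by code length.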

uGCA-NML—minimax solution for inherent redundancy

Universal coding, as suggested by Kolmogorov, constructs a code for data sequences such that, asymptotically as the length of the data sequence increases, the average code length per symbol approaches the entropy of the source that generated the data. Different universal coding schemes can thus be compared in terms of their average code redundancy in the worst case, i.e., the maximum excess of the average code length over the entropy within the candidate model class. Later, Clarke and Barron [10, 11] provided a very accurate asymptotic formula for the code redundancy of a code defined by a mixture density:

$$\begin{aligned} f_{w}(x^{n})= \int f(x^{n}|\theta )\mathrm{d}\omega (\theta ), \end{aligned}$$


$$\begin{aligned} E_{\theta }\ln \dfrac{f(x^{n}|\theta )}{f_{w}(x^{n})}=\frac{k}{2}\ln \frac{n}{2\pi e}+\ln \dfrac{|I(\theta )|^{1/2}}{\omega (\theta )}+o(1). \end{aligned}$$
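The asymptotic redundancy formula can be checked numerically in a simple case. The sketch below assumes a Bernoulli model (\(k=1\), \(I(\theta )=1/(\theta (1-\theta ))\)) with a uniform prior \(\omega (\theta )=1\), for which the mixture density has the closed form \(f_{w}(x^{n})=s!(n-s)!/(n+1)!\) with \(s\) the number of ones. This choice of model and prior is an illustrative assumption; code lengths are in nats.

```python
import math

def exact_redundancy(n, theta):
    """E_theta ln[f(x^n | theta) / f_w(x^n)] for Bernoulli data and a uniform prior."""
    total = 0.0
    for s in range(n + 1):
        log_binom = math.lgamma(n + 1) - math.lgamma(s + 1) - math.lgamma(n - s + 1)
        log_f = s * math.log(theta) + (n - s) * math.log(1 - theta)
        # Uniform mixture: f_w(x^n) = s!(n-s)!/(n+1)! = 1 / ((n+1) * C(n, s))
        log_fw = -math.log(n + 1) - log_binom
        # Weight by the probability that a sequence has exactly s ones.
        total += math.exp(log_binom + log_f) * (log_f - log_fw)
    return total

def asymptotic_redundancy(n, theta, k=1):
    """Right-hand side of the asymptotic formula without the o(1) term."""
    fisher = 1.0 / (theta * (1.0 - theta))  # Fisher information per symbol
    prior_density = 1.0                     # uniform prior on [0, 1]
    return (0.5 * k * math.log(n / (2 * math.pi * math.e))
            + math.log(math.sqrt(fisher) / prior_density))

n, theta = 2000, 0.3
exact = exact_redundancy(n, theta)
approx = asymptotic_redundancy(n, theta)
print(f"exact = {exact:.4f}, asymptotic = {approx:.4f}")
```

For parameter values away from the boundary the two numbers should agree closely at this sample size, while the term \(\ln |I(\theta )|^{1/2}\) grows without bound as \(\theta\) approaches 0 or 1.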

Universal coding has since evolved into so-called universal modeling, which is no longer restricted to how to encode data but rather seeks optimal models, above all an optimal universal model. Distilling these ideas, the MDL principle for statistical inference is a universal modeling principle that generalizes the older idea of parameter estimation in statistics [12,13,14] and incorporates model complexity, which affects all aspects of model performance, into its coding scheme [8].

Unfortunately, the code length in the earlier coding theorems [15, 16] cannot be sharpened beyond a constant term, however large the data set is, and the second term on the right-hand side of Eq. (4) suggests that this constant can be large indeed when the Fisher information of the data-generating machinery is nearly singular. Hence, such code lengths, taken as the stochastic complexity, would not serve the intended purpose of providing a yardstick by which model classes can be compared given a finite and possibly even small amount of data. For this reason, Rissanen pointed out that the problem of coding data sequences without redundancy [8] should be reconsidered, paying attention to any potentially large additional terms that may arise.

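Among the earlier coding schemes, one stands out as an intuitively appealing candidate for the sought-for code, the so-called maximum-likelihood code, which normalizes the likelihood evaluated at the maximum-likelihood estimate (MLE) \({\hat{\theta }}(x^{n})\) and is given by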

$$\begin{aligned} {\hat{f}}(x^{n})=\dfrac{ f(x^{n}|{\hat{\theta }}(x^{n}))}{\int f(x^{n}|{\hat{\theta }}(x^{n}))\mathrm{d} x^{n}}, \end{aligned}$$

The case of finite alphabets was also treated in [17, 18], but without an explicit, easy-to-calculate formula. Obviously, for infinite alphabets, the integration domain must be bounded for the code to exist. By presenting an implementable version of this coding scheme, in which the maximum-likelihood estimates \({\hat{\theta }}(x^{n})\) are quantized, it was shown to be equivalent to a two-part code, as discussed in [13], with the inherent redundancy removed. In this case, as long as \({\hat{\theta }}_{n}\) exists for all \(x^{n}\), we have

$$\begin{aligned} P^{(n)}_{nml}(x^{n})=\dfrac{P_{{\hat{\theta }}_{n}}(x^{n})}{\sum P_{{\hat{\theta }}_{n}}(x^{n})}. \end{aligned}$$

The sequence of distributions \(P^{(1)}_{nml}\), \(P^{(2)}_{nml}\), ..., constitutes the minimax optimal universal model relative to the considered class \({\mathcal {M}}\); it assigns to each \(x^{n}\) a probability proportional to the likelihood at the MLE for \(x^{n}\) [19]. This line of research was carried further in [6, 8], where, for sequences \(x^{n}\) such that \({\hat{\theta }}(x^{n})\in \Gamma\), the code length is

$$\begin{aligned} \begin{aligned} L_{n}=-\log f(x^{n}|{\hat{\theta }}(x^{n}))+\frac{k}{2}\ln \frac{n}{2\pi }+\ln \int _{\Gamma }\sqrt{|I(\theta )|}\mathrm{d}\theta +o(1). \end{aligned} \end{aligned}$$
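To make the NML distribution and its asymptotic code length concrete, the sketch below evaluates both for a Bernoulli model, chosen here only because its normalizer is a finite sum that can be computed exactly. For this model \(k=1\) and \(\int _{0}^{1}\sqrt{|I(\theta )|}\mathrm{d}\theta =\pi\), so the asymptotic penalty is \(\frac{1}{2}\ln \frac{n}{2\pi }+\ln \pi\). The function names are hypothetical and natural logarithms are assumed throughout.

```python
import math

def log_max_likelihood(ones, n):
    """ln of the maximized Bernoulli likelihood (s/n)^s ((n-s)/n)^(n-s), with 0^0 = 1."""
    if ones == 0 or ones == n:
        return 0.0
    return ones * math.log(ones / n) + (n - ones) * math.log((n - ones) / n)

def log_parametric_complexity(n):
    """ln of the NML normalizer: sum over all binary x^n of the maximized likelihood."""
    terms = []
    for s in range(n + 1):
        log_binom = math.lgamma(n + 1) - math.lgamma(s + 1) - math.lgamma(n - s + 1)
        terms.append(log_binom + log_max_likelihood(s, n))
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))

def nml_code_length(ones, n):
    """-ln P_nml(x^n) in nats for a binary sequence containing `ones` ones."""
    return -log_max_likelihood(ones, n) + log_parametric_complexity(n)

n = 1000
exact_penalty = log_parametric_complexity(n)
asymptotic_penalty = 0.5 * math.log(n / (2 * math.pi)) + math.log(math.pi)
print(f"exact = {exact_penalty:.4f}, asymptotic = {asymptotic_penalty:.4f}")
```

The exact and asymptotic penalties differ only by the vanishing remainder term. For model classes such as the Gaussian family the corresponding integral diverges, which is why the parameter range has to be restricted before this formula can be used.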

The non-integrability that arises in this MLE-based procedure is then the key issue to be solved. Some of the most important model classes, for example, the Gaussian and exponential families, are such that the square root of the Fisher information is not integrable and the parameter space is not compact. For these cases, the asymptotic formula Eq. (7) for the stochastic complexity requires modification. How such issues can be handled has been illustrated by calculating an asymptotic expression for the stochastic complexity in the all-important Gaussian family, as needed in regression analysis [8]. In the family of Gaussian distributions, the Fisher information is given by

$$\begin{aligned} |I(\beta ,\tau )|=|S|/(2\tau ^{k+2}), \end{aligned}$$

and the integral of its square root, treated in [6, 9], is

$$\begin{aligned} \int _{\beta ^{'}S\beta \le R}\int _{\tau _{0}}^{\infty }|I(\beta ,\tau )|^{1/2}d\tau d\beta =(2|S|)^{1/2}\left({\frac{R}{\tau _{0}}}\right)^{k/2}\frac{V_{k}}{k}, \end{aligned}$$

where \(V_{k}R^{\frac{k}{2}}=|S|^{-\frac{1}{2}}\,2(\pi R)^{\frac{k}{2}}/\left( k\Gamma (\frac{k}{2})\right)\) denotes the volume of the k-dimensional ball \(B=\{\beta ^{'}S\beta \le R\}\). The lower bound \(\tau _{0}\) is determined by the precision with which the data are written; the estimates \({\hat{\tau }}_{0}=RSS/n\), with RSS the residual sum of squares, and \({\hat{R}}=({\hat{\beta }}^{'}X^{'}_{t-k}X_{t-k}{\hat{\beta }})/n\) are obtained by MLE. Thus the shortest code length (\(L_{X}\) or \(L_{X+Y}\)), derived from Eq. (7), becomes

$$\begin{aligned} \begin{aligned} { L_{uGCA-NML} }=n\ln \sqrt{2\pi \tau } +\frac{RSS}{2\tau }+\frac{k}{2}\ln \frac{n}{2}-\log \Gamma \left({\frac{k}{2}}\right)+\frac{k}{2}\log \frac{{\hat{R}}}{\tau _{0}}-2\log k . \end{aligned} \end{aligned}$$
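The following Python sketch evaluates this code length for a least-squares regression and uses it to compute \(F_{Y\rightarrow X}=L_{X}-L_{X+Y}\). It rests on several assumptions that are not stated in the text: both \(\tau\) and \(\tau _{0}\) are set to \(RSS/n\), every logarithm is taken as the natural logarithm, and the same fixed lag order is used for all regressors instead of being selected by code length. The function names and the toy data are hypothetical.

```python
import math
import numpy as np

def lagged(series, order, max_lag):
    """Matrix whose row for time t holds series[t-1], ..., series[t-order]."""
    n = len(series)
    return np.column_stack([series[max_lag - j:n - j] for j in range(1, order + 1)])

def ugca_nml_code_length(target, design):
    """Code length of the L_uGCA-NML formula in nats (natural logs assumed)."""
    n, k = design.shape
    beta, *_ = np.linalg.lstsq(design, target, rcond=None)
    fitted = design @ beta
    rss = float(np.sum((target - fitted) ** 2))
    tau = rss / n                        # assumption: tau = tau_0 = RSS / n
    r_hat = float(fitted @ fitted) / n   # beta' X'X beta / n
    return (n * math.log(math.sqrt(2 * math.pi * tau)) + rss / (2 * tau)
            + 0.5 * k * math.log(n / 2) - math.lgamma(k / 2)
            + 0.5 * k * math.log(r_hat / tau) - 2 * math.log(k))

def causal_effect(source, target_series, order=2):
    """F_{source -> target} = L_X - L_{X+Y}, with one fixed lag order for all terms."""
    t = target_series[order:]
    own = lagged(target_series, order, order)
    both = np.hstack([own, lagged(source, order, order)])
    return ugca_nml_code_length(t, own) - ugca_nml_code_length(t, both)

# Toy data (illustrative only): y drives x, but x does not drive y.
rng = np.random.default_rng(1)
N = 1000
x, y = np.zeros(N), np.zeros(N)
for t in range(1, N):
    y[t] = 0.6 * y[t - 1] + rng.normal()
    x[t] = 0.5 * x[t - 1] + 0.5 * y[t - 1] + rng.normal()

f_y_to_x = causal_effect(y, x)
f_x_to_y = causal_effect(x, y)
print(f"F(Y->X) = {f_y_to_x:.2f}, F(X->Y) = {f_x_to_y:.2f}")
```

A positive value is read as evidence for a causal effect in that direction; no confidence level enters the decision, since the extra regressors must shorten the total code length by more than their own complexity cost.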

Synthetic experiment protocol

To reveal the distinctive behavior of uGCA-NML among the several forms, a synthetic network was generated by

$$\begin{aligned} \begin{aligned} {\left\{ \begin{array}{ll} x1_{t}=0.68x1_{t-1}-0.24x1_{t-2}+0.45x2_{t-1}-0.15x2_{t-2}+\epsilon _{1}\\ x2_{t}=0.76x2_{t-1}-0.34x2_{t-2}+0.33x1_{t-1}-0.12x1_{t-2}+\epsilon _{2}\\ x3_{t}=0.72x3_{t-1}-0.36x3_{t-2}+0.30x1_{t-1}-0.09x1_{t-2}+\epsilon _{3}\\ x4_{t}=0.68x4_{t-1}-0.22x4_{t-2}+0.42x2_{t-1}-0.19x2_{t-2}+0.33x5_{t-1}-0.14x5_{t-2}+\epsilon _{4}\\ x5_{t}=0.62x5_{t-1}-0.29x5_{t-2}+0.32x2_{t-1}-0.12x2_{t-2}+0.42x4_{t-1}-0.18x4_{t-2}+\epsilon _{5}\\ x6_{t}=0.75x6_{t-1}-0.26x6_{t-2}+0.41x3_{t-1}-0.22x3_{t-2}+0.38x5_{t-1}-0.15x5_{t-2}+\epsilon _{6}. \end{array}\right. } \end{aligned} \end{aligned}$$

Several uGCA forms and conventional GCA were then compared on this synthetic 6-node network, whose structure is presented in Fig. 1. The noise terms \(\epsilon _i\) \((i=1, 2,\ldots , 6)\) were Gaussian with mean 0.
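A minimal simulation sketch of this 6-node network is given below. The coefficients are copied from the equations above; the burn-in length, the random seed, and the function name `simulate_network` are assumptions made for illustration, and the noise variance is left as a parameter.

```python
import numpy as np

# (target node, source node): (lag-1 coefficient, lag-2 coefficient)
COEFFS = {
    (1, 1): (0.68, -0.24), (1, 2): (0.45, -0.15),
    (2, 2): (0.76, -0.34), (2, 1): (0.33, -0.12),
    (3, 3): (0.72, -0.36), (3, 1): (0.30, -0.09),
    (4, 4): (0.68, -0.22), (4, 2): (0.42, -0.19), (4, 5): (0.33, -0.14),
    (5, 5): (0.62, -0.29), (5, 2): (0.32, -0.12), (5, 4): (0.42, -0.18),
    (6, 6): (0.75, -0.26), (6, 3): (0.41, -0.22), (6, 5): (0.38, -0.15),
}

def simulate_network(n_samples, noise_var, burn_in=200, seed=0):
    """Simulate the second-order 6-node network with zero-mean Gaussian noise."""
    rng = np.random.default_rng(seed)
    a1 = np.zeros((6, 6))
    a2 = np.zeros((6, 6))
    for (target, source), (c1, c2) in COEFFS.items():
        a1[target - 1, source - 1] = c1
        a2[target - 1, source - 1] = c2
    total = n_samples + burn_in
    noise = rng.normal(0.0, np.sqrt(noise_var), size=(total, 6))
    data = np.zeros((total, 6))
    for t in range(2, total):
        data[t] = a1 @ data[t - 1] + a2 @ data[t - 2] + noise[t]
    return data[burn_in:]  # discard the transient from the zero initial state

data = simulate_network(n_samples=1000, noise_var=0.2)
print(data.shape)  # (1000, 6)
```

Column `i` of `data` holds node `i + 1`. Nodes 1 and 2 reach node 6 only through nodes 3 and 5, so any detected edge \(1 \rightarrow 6\) or \(2 \rightarrow 6\) is an indirect, spurious connection.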

Fig. 1

Relationships of simulation data sets in the 6-node networks

fMRI data within mental arithmetic experiment protocol

In this mental arithmetic experiment, ten subjects performed simple one-digit (consisting of 1–10) serial addition (SSA) and complex two-digit (consisting of 1–5) serial addition (CSA) under a visual stimulus while their brain activity was measured with fMRI. Immediately afterwards, each subject performed the same serial addition tasks under an auditory stimulus. Nine right-handed healthy subjects (four female, \(24 \pm 1.5\) years old) and one left-handed healthy female subject (24 years old) participated. The experimental data of one subject (a right-handed male) were removed due to excessive head motion. All subjects volunteered to participate in this study and gave written informed consent.

Experiments and results

Synthetic data

Figure 2 illustrates the causal networks obtained by the several uGCA forms and conventional GCA. For true connectivities, all uGCA forms except uGCA-MIX, as well as conventional GCA, performed well. As shown in previous research [4], uGCA-MIX is more likely to produce false negatives because it introduces priors on the distribution of the estimated parameters. The uGCA-TP and uGCA-NML forms had a very stable true positive rate (TPR). For false connectivities, the advantages of the uGCA paradigm emerged distinctly. Specifically, uGCA-MIX and uGCA-NML ensured a higher true negative rate (TNR), which means that both identify a sparse connection network. Even for uGCA-TP, false positives were kept at a low level. Conventional GCA, however, was clearly poor at eliminating false connectivities, whether its confidence level was 0.05 or 0.01; quite a few false positives occurred, especially for \(1 \rightarrow 6\) and \(2 \rightarrow 6\). Although the experimental results showed that a stricter confidence level improved its TNR, the subjectivity of selecting a confidence level brings another problem. That is, the ground truth is given in a synthetic experiment, but in real data such prior knowledge is usually lacking, so there is no uniform yardstick for choosing a confidence level. As the comparisons in Table 1 show, uGCA-NML obtained higher TNR and TPR and was less affected by the varying noise. At the same time, uGCA-NML achieved the highest ground-truth rate, which conveys the method's ability to recover the real network directly and precisely. However, all methods produced more false connectivities as the noise variance increased, and these were all associated with the connectivities \(1 \rightarrow 6\) and \(2 \rightarrow 6\). We attribute these additional false connectivities under different noise levels to the specific structural network in Fig. 1 [3, 4]. Generally speaking, uGCA-TP, uGCA-NML, and conventional GCA all had good robustness to noise [4]. However, uGCA-NML clearly identified true connectivities with a higher TPR while ensuring a higher TNR to eliminate false connections.
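The rates discussed above can be computed from binary adjacency matrices as in the following sketch. The list of true edges is read off the equations of the synthetic network; treating the ground-truth rate as the fraction of samples whose estimated network matches the true network exactly is an assumption made here, as are the function names.

```python
import numpy as np

# Directed edges (source, target) read off the synthetic network equations.
TRUE_EDGES = [(2, 1), (1, 2), (1, 3), (2, 4), (5, 4), (2, 5), (4, 5), (3, 6), (5, 6)]

def adjacency(edges, n_nodes=6):
    """Boolean matrix with adj[source - 1, target - 1] = True for each edge."""
    adj = np.zeros((n_nodes, n_nodes), dtype=bool)
    for src, dst in edges:
        adj[src - 1, dst - 1] = True
    return adj

def tpr_tnr(estimated, truth):
    """True positive and true negative rates over the off-diagonal entries."""
    off_diag = ~np.eye(truth.shape[0], dtype=bool)
    tp = np.sum(estimated & truth & off_diag)
    tn = np.sum(~estimated & ~truth & off_diag)
    return tp / np.sum(truth & off_diag), tn / np.sum(~truth & off_diag)

def ground_truth_rate(estimated_networks, truth):
    """Fraction of estimated networks identical to the true network (assumed definition)."""
    return float(np.mean([np.array_equal(est, truth) for est in estimated_networks]))

truth = adjacency(TRUE_EDGES)
# Hypothetical estimate: all true edges plus the spurious edges 1 -> 6 and 2 -> 6.
estimate = adjacency(TRUE_EDGES + [(1, 6), (2, 6)])

tpr, tnr = tpr_tnr(estimate, truth)
gt_rate = ground_truth_rate([truth, estimate], truth)
print(f"TPR = {tpr:.3f}, TNR = {tnr:.3f}, ground-truth rate = {gt_rate:.2f}")
```

With 9 true edges among the 30 possible directed edges, the two spurious edges leave the TPR at 1 and reduce the TNR to 19/21.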

Fig. 2

Causal connectivities obtained by several uGCA forms and conventional GCA. The top row shows results at the low noise level (var = 0.2), the middle row at the middle noise level (var = 0.4), and the bottom row at the high noise level (var = 0.6). The data length was set to 1000

Table 1 Comparison between uGCA methods and conventional GCA under different noise levels

To further confirm the advantage of uGCA-NML, the data length was varied from 150 to 500. Conventional GCA identified all true connectivities with high accuracy when the data length was above 500, as shown in Fig. 3. However, several false connectivities, such as \(1 \rightarrow 6\) and \(2 \rightarrow 6\), also rose to a high level as the data length varied from 200 to 1000. The uGCA-TP form ensured a high TPR when the data length was 300, and when the data length reached 500, almost all true positives were identified. uGCA-TP eliminated more false positives as the data length increased, although the false connectivity \(1 \rightarrow 6\) also increased somewhat. uGCA-MIX achieved higher accuracy in identifying true negatives at shorter data lengths than uGCA-TP. However, uGCA-MIX could not identify the true positives with high accuracy even when the data length was 1000. It kept false positives at a very low level, with the highest accuracy in eliminating these spurious connectivities, and thus identified a very sparse connection network. Similarly to uGCA-TP, uGCA-NML ensured good accuracy in identifying true positives when the data length was above 300 and recovered almost all of these connectivities when the data length was 500. As the direct comparisons in Table 2 show, uGCA-NML identified the ground-truth network in Fig. 1 for almost every synthetic data sample when the data length was 500. In contrast, the other uGCA forms could not reach the same accuracy when the data length was above 300. These results also demonstrate that when the data length is below 200, distorted causal networks are identified for both individuals and groups, making causal investigations unconvincing. The specific structural network also led to a decline in the ground-truth rate, TPR, and TNR, for which the false connections came almost entirely from \(1 \rightarrow 6\) and \(2 \rightarrow 6\). Therefore, as the data length increased, the performance of uGCA-NML showed the most obvious improvement. uGCA-NML seems to rely on long data lengths to ensure good identification ability, and it is less affected by noise. Of course, uGCA-TP can be regarded as a conservative choice.

Fig. 3

Causal connectivities obtained by uGCA and conventional GCA under different data length. From top row to bottom row, the data length is 150, 200, 300, 500

Table 2 Comparison between uGCA methods and conventional GCA under different data lengths

fMRI data within mental arithmetic experiment

During the tasks, the brain was engaged in mental arithmetic in both conditions, so the working scenarios can be considered similar regardless of the specific stimulus (visual or auditory). Using the Statistical Parametric Mapping (SPM) software, we obtained the brain regions activated by mental arithmetic, shown in Fig. 4. Within these regions, mapped through statistical inference, each method identified the causal connectivities of the mental arithmetic network in its own feature space. Then, by comparing the similarity of the mental arithmetic networks under different stimuli, we can quantitatively compare the characteristics of the several uGCA forms on causal networks from real fMRI data [3, 20].

Fig. 4

Mental arithmetic of CSA-control state under the two stimuli (visual and auditory), the activation regions were processed by SPM12. a CSA-control state under visual stimulus. b CSA-control state under auditory stimulus (\(P < 0.0001\), uncorrected)

To compare the similarity of the causal networks obtained by different methods, we quantify the mutual information between the causal networks under visual and auditory stimuli. Let the joint distribution of two random variables \((X, Y)\) be \(p(x, y)\) and the marginal distributions be \(p(x)\) and \(p(y)\), respectively. The mutual information is the relative entropy between the joint distribution \(p(x, y)\) and the product of the marginal distributions \(p(x)p(y)\), that is

$$\begin{aligned} I(X;Y)=\sum _{x\in X}\sum _{y\in Y}p(x,y)\log \frac{p(x,y)}{p(x)p(y)}. \end{aligned}$$

In our mental arithmetic experiment, the variables \((X, Y)\) are the causal networks under visual and auditory stimuli, respectively. Intuitively, these two causal networks should be isomorphic, which means that their mutual information should remain at a high level. Thus, different causal investigation methods can be ranked by the mutual information between the two causal networks, shown in Fig. 5. The mutual information clearly shows that uGCA identified causal connectivities better than conventional GCA, whatever the confidence level. Among the several uGCA forms, the mutual information remained at a high level for all of them, in good agreement with the simulation results. However, results across these 9 samples show that uGCA-NML obtained a more stable identification level, which demonstrates its advantage. In general, the uGCA paradigm shows a clear superiority over conventional GCA, and uGCA-NML is the most recommended choice among these forms.
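The mutual information between two causal networks can be sketched as follows. The text does not specify how the joint distribution is estimated from the networks, so this example assumes that each off-diagonal entry of the two binary adjacency matrices is one observation of the pair \((x, y)\); the base-2 logarithm, the function name, and the example edge lists are likewise illustrative assumptions.

```python
import numpy as np

def network_mutual_information(net_visual, net_auditory):
    """Mutual information (bits) between the edge indicators of two binary networks."""
    off_diag = ~np.eye(net_visual.shape[0], dtype=bool)
    xs = net_visual[off_diag].astype(int)
    ys = net_auditory[off_diag].astype(int)
    joint = np.zeros((2, 2))
    for i, j in zip(xs, ys):
        joint[i, j] += 1           # empirical joint counts over edge positions
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    mask = joint > 0               # skip empty cells, using 0 * log 0 = 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / (px @ py)[mask])))

def adjacency(edges, n_nodes=6):
    adj = np.zeros((n_nodes, n_nodes), dtype=bool)
    for src, dst in edges:
        adj[src - 1, dst - 1] = True
    return adj

# Hypothetical networks: the auditory network has two extra edges.
visual_edges = [(1, 2), (2, 1), (1, 3), (2, 4), (5, 4), (2, 5), (4, 5), (3, 6), (5, 6)]
visual = adjacency(visual_edges)
auditory = adjacency(visual_edges + [(1, 6), (2, 6)])

mi_same = network_mutual_information(visual, visual)
mi_diff = network_mutual_information(visual, auditory)
print(f"identical networks: {mi_same:.3f} bits, differing networks: {mi_diff:.3f} bits")
```

Under this estimate, identical networks attain the maximum possible value, the entropy of the edge indicator, and every differing edge lowers the mutual information.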

Fig. 5

Mutual information of the obtained mental arithmetic networks under two stimuli (visual stimulus and auditory stimulus)

To further illustrate this superiority, the causal networks obtained by the different methods for individual subjects are shown in Fig. 6. According to the mutual information in Fig. 5, uGCA-TP obtained the causal networks with the highest similarity in these samples. Clearly, the uGCA paradigm identified more similar causal networks between the two stimuli than conventional GCA. In these samples, most edges in the mental arithmetic network (containing nodes 1, 2, 3, 4) were identical, and only a few edges differed. Even in the whole task networks, only several edges differed in the 6-node networks. For comparison, a 6-node directed network with binary edges (0 or 1) has 30 possible edges and hence \(2^{30}\) possible connection patterns. In subject 1, uGCA-TP and uGCA-NML had 7 different edges, and uGCA-MIX had 10 different edges. However, for the driven nodes, uGCA-NML and uGCA-TP obtained more consistent results, in which node 4 was the driven node. For subject 5, uGCA-TP had 7 different edges and uGCA-NML had 9, whereas uGCA-MIX had 5. Despite these differences, all uGCA forms obtained an identical driven node, node 2. In subject 8, uGCA-TP and uGCA-MIX had only 3 different edges, and uGCA-NML had only 4. Their driven nodes were also identical. In general, the identical mental arithmetic networks obtained through uGCA-MIX showed that the isomorphic mapping phenomenon in these three subjects was clear, which suggests that the ability of these subjects to perform mental arithmetic tasks may be more prominent. Although uGCA-TP performed better on the mutual-information similarity measure, the causal network structures of uGCA-NML seemed to be more consistent. On the other hand, these results illustrated that uGCA-MIX had poor robustness to interference. As mentioned above, uGCA-NML can identify true connections well while eliminating the influence of false connections, and thus obtains a sparser connection matrix.
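The edge counts used in this comparison amount to simple arithmetic on binary adjacency matrices, as the sketch below shows. The two example networks are hypothetical and serve only to illustrate how the number of differing edges is counted against the number of possible networks.

```python
import numpy as np

def differing_edges(net_a, net_b):
    """Number of directed edges present in exactly one of two binary networks."""
    off_diag = ~np.eye(net_a.shape[0], dtype=bool)
    return int(np.sum((net_a != net_b) & off_diag))

n_nodes = 6
n_possible_edges = n_nodes * (n_nodes - 1)    # ordered node pairs without self-loops
n_possible_networks = 2 ** n_possible_edges   # each edge is either absent or present

# Hypothetical networks under the two stimuli (entry [i, j] means node i+1 -> node j+1).
visual = np.zeros((n_nodes, n_nodes), dtype=bool)
auditory = np.zeros((n_nodes, n_nodes), dtype=bool)
visual[0, 1] = visual[1, 2] = visual[3, 0] = True
auditory[0, 1] = auditory[1, 2] = auditory[3, 2] = True

n_diff = differing_edges(visual, auditory)
print(n_possible_edges, n_possible_networks, n_diff)  # 30 1073741824 2
```

Against roughly \(10^{9}\) possible networks, two networks that differ in only a handful of the 30 possible edges are far more similar than chance would suggest.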

Fig. 6

Causal networks in the mental arithmetic tasks obtained by uGCA methods and conventional GCA, respectively. With the conventional GCA approach, the connected edges of the causal networks under the two stimuli were to a large extent distinct. In contrast, for uGCA methods, the connection networks commonly showed high similarity. Nodes 1, 2, 3, 4 belong to the internal network of the mental arithmetic tasks. The stimulus input nodes were CAL.L, CAL.R, ITG.L, and ITG.R, respectively. The solid lines represent causal connectivities within the mental arithmetic network, and the dashed lines represent causal connectivities involving the input stimulus nodes


Combining previous and current synthetic data experiments, this study provides further evidence for the advantage of uGCA-NML in causal investigation. As discussed in previous studies, because it places priors on the parameter estimates, uGCA-MIX tends to obtain a sparse causal network, but it may sometimes (at some specific noise levels or network architectures) lead to very poor causal identification because of this over-fitting model selection procedure. uGCA-NML, whatever the noise level, better eliminated the influence of false connections while finding the real connections, and thus obtained a sparse connection matrix more accurately. As for uGCA-TP, its overall performance may be a compromise between uGCA-MIX and uGCA-NML [3, 4].

In the fMRI experiment, we have demonstrated in previous studies that the mental arithmetic networks obtained by uGCA were more similar and that the isomorphic phenomenon seemed more obvious. In contrast to conventional GCA, with which only a few subjects seemed to show clear isomorphism, uGCA integrates the conventional two-stage GCA scheme into a unified framework. We consider this isomorphic mapping involving mental arithmetic to be a continuous closed process, which requires consistent mathematical principles throughout the quantification of isomorphism; otherwise, a breakpoint may be introduced. In mathematics, this is called a singular point, and the related operations should be closed; otherwise, the processed results may deviate from the original space and become very distorted. A widely accepted view states that the original model space that generated the data set cannot be found at all. Thus, for the code length of model complexity, the several uGCA forms provide different solutions, which map the descriptive model into different feature spaces to approach the original model space from different aspects. With the help of mutual information, we further compared the several uGCA forms and conventional GCA. The uGCA paradigm had a clear advantage over conventional GCA. Among these forms, uGCA-NML obtained more stable results while ensuring accurate causal networks, identifying high-level similarities of causal connectivities. uGCA-TP also obtained nearly identical connection networks under visual and auditory stimuli, and uGCA also identified some acceptable results. Because it adopts a crude two-part coding version, uGCA-TP benefits from this parsimonious coding scheme and also has some advantages in real fMRI data.

To sum up, uGCA-NML is preferable among these forms. Compared with uGCA-TP, it eliminates the inherent redundancy of model parameter estimation, and compared with uGCA-MIX, it does not require a prior and gives more stable causal identification. Moreover, these results indicate that causal isomorphism does exist during mental arithmetic tasks. In fact, the postulate of an isomorphic mapping in the brain under similar tasks is not fabricated from a single experimental phenomenon. Over the years, some researchers have tried to demonstrate that the brain perceives the world by analogical reasoning [21,22,23,24,25,26,27]. Other researchers have suggested using category theory to demonstrate mathematically how analogical reasoning in the human brain gets rid of the spurious inferences that puzzle traditional artificial intelligence modeling (called systematicness) [28,29,30]. Consequently, a more unified causal investigation method, uGCA-NML, will be more appropriate for a brain with such logical rigor.


The uGCA paradigm first maps the original space into a unified code length guided space and then identifies the causal connectivities. This allows data sets to retain their original correlations as much as possible, thus yielding an optimal approximate description of their correlations in the original space. Different uGCA forms approach the ground truth from different aspects and obtain the optimal descriptive model in their own characteristic spaces. In this paper, we conclude that uGCA-NML is preferable among these uGCA forms. The several uGCA forms have their own advantages, and, especially for this kind of exploratory causal investigation, the comparison of different methods remains controversial. Nevertheless, for causal investigation in our unified code length guided framework, uGCA-NML is the most recommended choice.

Availability of data and materials

Data is available from the corresponding author upon request.


References

1. Granger CWJ (1969) Investigating causal relations by econometric models and cross-spectral methods. Econometrica 37(3):424–438
2. Granger C, Newbold P (1974) Spurious regressions in econometrics. J Econometr 2:111–120
3. Li F, Wang X, Lin Q, Hu Z (2020) Unified model selection approach based on minimum description length principle in Granger causality analysis. IEEE Access 8:68400–68416
4. Hu Z, Li F, Wang X, Lin Q (2021) Description length guided unified Granger causality analysis. IEEE Access 9:13704–13716
5. Shtarkov YM (1987) Universal sequential coding of single messages. Transl Prob Inform Transmission 23:175–186
6. Barron A, Rissanen J, Yu B (1998) The minimum description length principle in coding and modeling. IEEE Trans Inform Theor 44(6):2743–2760
7. Hansen MH, Yu B (2001) Model selection and the principle of minimum description length. Publicat Am Statis Assoc 96(454):746–774
8. Rissanen JJ (1996) Fisher information and stochastic complexity. IEEE Trans Inform Theor 42(1):40–47
9. Rissanen J (2000) MDL denoising. IEEE Trans Inform Theor 46(7):2537–2543
10. Clarke BS, Barron AR (1990) Information-theoretic asymptotics of Bayes methods. IEEE Trans Inform Theor 36(3):453–471
11. Clarke BS, Barron AR (1994) Jeffrey's prior is asymptotically least favorable under entropy risk. J Statis Plan Inference 41(1):37–60
12. Hannan EJ, Rissanen J (1982) Recursive estimation of mixed autoregressive-moving average order. Biometrika 69(1):81–94
13. Rissanen J (1983) A universal prior for integers and estimation by minimum description length. Ann Statis 11(2):416–431
14. Rissanen J (1987) Stochastic complexity. J R Statis Soc 49(3):223–239
15. Rissanen J (1984) Universal coding, information, prediction, and estimation. IEEE Trans Inform Theor 30(4):629–636
16. Rissanen J (1986) Stochastic complexity and modeling. Ann Statis 14(3):1080–1100
17. Davisson L (1983) Minimax noiseless universal coding for Markov sources. IEEE Trans Inform Theor 29(2):211–215
18. Krichevsky R, Trofimov V (1981) The performance of universal encoding. IEEE Trans Inform Theor 27(2):199–207
19. Grünwald PD (2007) The minimum description length principle. MIT Press, USA
20. Li F, Wang X, Shi P, Lin Q, Hu Z (2020) Neural network can be revealed with functional MRI: evidence from self-consistent experiment. Nat Commun (Submitted)
21. Medin DL, Goldstone RL, Gentner D (1993) Respects for similarity. Psychol Rev 100(2):254
22. Gentner D, Markman AB (1997) Structure mapping in analogy and similarity. Am Psychol 52(1):45
23. Hofstadter DR (1995) Fluid concepts and creative analogies: computer models of the fundamental mechanisms of thought. Basic Books
24. Hofstadter DR (2001) Analogy as the core of cognition. The Analogical Mind: Perspectives from Cognitive Science 499–538
25. Hofstadter DR (2008) Metamagical Themas: questing for the essence of mind and pattern. Basic Books
26. Gick ML, Holyoak KJ (1980) Analogical problem solving. Cognitive Psychology 12:306–355
27. Hummel JE, Holyoak KJ (1997) Distributed representations of structure: a theory of analogical access and mapping. Psychol Rev 104(3):427
28. Phillips S, Wilson WH (2010) Categorial compositionality: a category theory explanation for the systematicity of human cognition. PLoS Comput Biol 6(7):e1000858
29. Phillips S, Wilson WH (2011) Categorial compositionality II: universal constructions and a general theory of (quasi-) systematicity in human cognition. PLoS Comput Biol 7(8):e1002102
30. Phillips S, Wilson WH (2012) Categorial compositionality III: F-(co)algebras and the systematicity of recursive capacities in human cognition. PLoS One 7(4):e35028


Acknowledgements

Not applicable.


Funding

This work was supported in part by the National Key Research and Development Program of China under Grant 2018YFA0701400 and in part by the Public Projects of the Science Technology Department of Zhejiang Province under Grant LGF20H180015.

Author information




Contributions

ZH and FL conceived and designed the experiments. FL performed the experiments, analyzed the data, and wrote the manuscript. ZH, FL, MC, JS, YT and QL edited and approved the final version of the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Zhenghui Hu.

Ethics declarations

Ethics approval and consent to participate

The data set used in the present work was obtained from a study on students carried out at the Zhejiang University of Technology. The study and all procedures were approved by the College of Science, Zhejiang University of Technology Internal Review Board; all subjects volunteered to participate in this study and gave informed written consent.

Consent for publication

We consent to the publication of this article.

Competing interests

The authors declare that there is no conflict of interest regarding the publication of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit


About this article


Cite this article

Hu, Z., Li, F., Cheng, M. et al. Robust unified Granger causality analysis: a normalized maximum likelihood form. Brain Inf. 8, 15 (2021).



Keywords

  • Unified Granger causality analysis
  • Normalized maximum likelihood
  • Inherent redundancy
  • Granger causality analysis
  • fMRI