Stroke recovery phenotyping through network trajectory approaches and graph neural networks

Stroke is a leading cause of neurological injury characterized by impairments in multiple neurological domains including cognition, language, sensory and motor functions. Clinical recovery in these domains is tracked using a wide range of measures that may be continuous, ordinal, interval or categorical in nature, which can present challenges for multivariate regression approaches. This has hindered stroke researchers’ ability to achieve an integrated picture of the complex time-evolving interactions among symptoms. Here, we use tools from network science and machine learning that are particularly well-suited to extracting underlying patterns in such data, and may assist in prediction of recovery patterns. To demonstrate the utility of this approach, we analyzed data from the NINDS tPA trial using the Trajectory Profile Clustering (TPC) method to identify distinct stroke recovery patterns for 11 different neurological domains at 5 discrete time points. Our analysis identified 3 distinct stroke trajectory profiles that align with clinically relevant stroke syndromes, characterized both by distinct clusters of symptoms, as well as differing degrees of symptom severity. We then validated our approach using graph neural networks to determine how well our model performed predictively for stratifying patients into these trajectory profiles at early vs. later time points post-stroke. We demonstrate that trajectory profile clustering is an effective method for identifying clinically relevant recovery subtypes in multidimensional longitudinal datasets, and for early prediction of symptom progression subtypes in individual patients. This paper is the first work introducing network trajectory approaches for stroke recovery phenotyping, and is aimed at enhancing the translation of such novel computational approaches for practical clinical application.


Dynamic and multi-domain nature of stroke recovery
The process of neurological recovery after brain injuries such as stroke entails complex interactions among multiple variables that change dynamically over time [1,2]. It is well known that the degree of recovery after stroke varies widely between individuals [3][4][5][6], where each patient's recovery pattern uniquely reflects the combined influence of their lesion size and location [7], baseline health status, time to initial treatment [8], and response to medical treatment or rehabilitation, among many other intrinsic and extrinsic factors. Recovery trajectories furthermore vary depending on the specific neurological domain(s) affected (i.e., for motor, language, or sensory impairments) [9,10], and each of these symptoms may show varying responsiveness to treatment. For example, language problems (aphasia), right-sided motor symptoms, and spatial perceptual problems (hemineglect) are reportedly less responsive than other symptoms to treatment with tissue plasminogen activator (tPA) [11]. Stroke recovery is therefore notoriously heterogeneous in terms Open Access Brain Informatics *Correspondence: s.krishnagopal@ucl.ac.uk of the type and severity of residual symptoms, as well as the timecourse of progression and/or resolution of those symptoms [12].
An important goal for stroke research is to reduce the 'noise' arising from this inherent heterogeneity by stratifying patients who are likely to have similar symptom trajectories. The heterogeneity of symptoms and time-varying recovery patterns inherent to stroke make it an area especially well-suited area for data-driven approaches. The increasing availability of large scale stroke datasets has led to a recent explosion in the use of data-science methods for stroke research [13]. For example, machine learning analyses of stroke clinical data [12,14] have been used to characterize symptom clusters [15], predict outcomes [3], and define composite measures of recovery [16].

Limitations of conventional regression and machine learning approaches versus network science
While machine learning (ML) approaches have been successful in a variety of analytical tasks, they often present challenges for interpretation and subsequent application. By contrast, network science tools are explicit in their modeling, making them more useful for studying medical data where clinical interpretation is paramount. Additionally, typical ML approaches focus on prediction, while taking the outcome itself at face value; in contrast, network approaches attempt to improve how the outcome itself is captured. While ML tools may not be as readily interpretable as network approaches for various types of analyses (such as understanding the interactions that underlie disease recovery patterns) ML still presents several desirable properties, particularly in terms of datadriven predictive ability, and may therefore be useful for prognostication of patient recovery patterns. Conventionally, statistical tools such as mixed-effects regression are used for modeling longitudinal data in disciplines where repeated measures designs are particularly relevant, such as education, motor learning, and psychology [17][18][19]. Mixed-effects models have tremendous flexibility in their ability to accommodate different types of study designs and data types [20][21][22]. Such models are thus increasingly used in fields like neurorehabilitation where serial measures of recovery constitute a central focus [2,18,23]. Most importantly for our present purposes, mixed-effect models provide a means of estimating a unique trajectory for each person by combining the models fixed-effects with random-slopes and intercepts to obtain a unique (non)linear trajectory for each person. These trajectories can then be compared across different domains of recovery to see which domains covary or vary independently from each other.

Network science and trajectory profile clustering for stroke research
Insights into the complex patterns of symptom evolution can be gained through the computational power of network analysis. The field of network medicine [24] studies disease manifestation and progression as a function of multiple interacting disease variables, which may be of similar or different types. Network approaches also produce intuitive data visualizations that can facilitate interaction between clinicians and data scientists to yield novel insights on disease. However, with few exceptions, most network medicine studies have focused on biomolecular data [25,26] rather than characterizing patients' patterns of symptom progression over time.
Recently, Krishnagopal et al. [27] introduced a network-based approach called Trajectory Profile Clustering (TPC) that groups patients based on similar patterns of symptom evolution. The intuitiveness and ability of TPC to integrate variables on multiple different scales make it especially useful for studying disease severity, progression, and recovery. Multi-layer [28] types of trajectory clustering have also shown success in clinically validated disease trajectory prediction in Parkinson's disease. We argue that TPC offers unique advantages for stroke recovery research based on its ability to simultaneously: (i) identify the dominant variables that differentiate stroke recovery subtypes; (ii) account for temporal disease progression patterns; and (iii) delineate distinct symptom groupings. This paper is the first work introducing network trajectory approaches for stroke recovery phenotyping, and is aimed at enhancing the translation of such novel computational approaches for practical clinical application.
When analyzing recovery trajectories with TPC, an obvious question that arises is at what stage do patients begin to stratify into distinct trajectory clusters (i.e., when do they begin to show symptom patterns unique to their recovery subtype)? The timing of medical treatments might be one important influence on the timecourse of recovery subtype stratification. For example in stroke, stratification might be expected to occur based on when patients receive treatments such as tPA or clot retrieval. Naively, this may appear to be a problem of simply measuring the differences between trajectory subtypes at each timepoint. However, since treatment efficacy for the same individual at different timepoints is not unrelated, more sophisticated tools are required to extract the timescale of separation. We can investigate these questions through graphical tools such as graph neural networks in machine learning.

Graph neural networks for the study of neurological disorders
The field of machine learning has been revolutionized by recent advances in deep neural networks, especially convolutional neural networks (CNNs) [29]. Conventionally, CNNs use local connections, shared weights and multiple layers to extract representations of data. However, CNNs work in a Euclidean domain and are best suited for use with images. By contrast, other deep learning methods can operate on a graph domain (i.e., graph neural networks or GNNs) [30]. Convolutional variants of graph neural networks provide a framework for transferring deep learning operators into a non-Euclidean (graphical) domain, and have been successful in a variety of tasks such as graph classification, node identification, link prediction in protein interactions, knowledge graphs, and social network analysis, among others. Of particular interest here, they have been successfully applied to study neurological disorders including Alzheimer's disease and autism [31,32], and could similarly have utility in the study of stroke.

Stroke dataset analyzed
To demonstrate the utility of using TPC and GNN for stroke recovery research, we analyzed cases from the well-characterized NINDS tPA trial data set [33]. This study was a randomized, double-blinded, placebo-controlled trial that compared the effects of intravenous tPA (a thrombolytic agent used in ischemic stroke to dissolve blood clots) versus placebo treatment in 624 patients. The data set captured neurologic deficits on the NIH Stroke Scale (NIHSS) [34], which is the most widely used measurement scale for stroke neurologic deficits, and has well-defined clinimetric properties [35,36]. Each item is scored on a scale (from 0-2 or 0-5), with higher values indicating greater stroke severity. The NINDS tPA trial captured NIHSS scores across 5 time points: at hospital admission, at 2 h, 24 h, 7-10 days, and 3 months poststroke. Here, we examined symptom progression in 11 neurologic domains as assessed by 15 individual item subscores on the NIHSS. A description of the items/ variables is given in Table 1. We excluded a total of 135 cases who had imputed data at any time point (134) and/or had died (118). We excluded these cases because the imputation approach that had been used could distort patterns of change in scores for individual patients (i.e., missing values were imputed as the worst score for each NIHSS item). After exclusions, there were 489 remaining cases for analysis. Further, we treated time as a series of discrete observations, 0-4, starting with the patients' assessment at admission. We have to treat time discretely rather than continuously because of how the data are coded in the NINDS tPA trial database. Ideally, we could measure time continuously in days or years, preserving the variability in assessment times [17][18][19], but that information was not available to us. Instead, in both the mixed-effect and TPC models, we fit trajectories based on discrete time. Although this transformation of the time variable means that absolute changes in time are arbitrary (i.e., 0-2 is 24 h, but 2-4 is potentially 3 months), relative changes in time are still meaningful (i.e., negative slopes mean that neurological deficits were improving over time, at the choosen timepoints) (Fig. 1).

Mixed-effects regression model
To obtain individual trajectories for each person on the different domains of the NIHSS, we fit a series of ordinal (cumulative link [21]) and Poisson (generalized linear [22]) mixed-models with random-intercepts and slopes for each subject. We focus on the ordinal models in the text because cumulative link models are designed to handle ordered, but non-continuous, response data. We present the Poisson models in the Additional file 1: Appendix S1. Although these models provide a reasonable fit to the data, it is less clear that the NIHSS items meet the assumption of treating the responses as counts (i.e., the language item is scored as 0 = normal, 1 = mild aphasia, 2 = severe aphasia, 3 = total/global aphasia, but severe aphasia is not necessarily twice mild aphasia). However, the Poisson model might be more familiar to many readers than the ordinal model, and the models do largely agree in correlations between trajectories over time as shown in the Additional file 1: Appendix S1. The models do not completely agree, however, so we defer to the ordinal model as more appropriate given the scoring of the NIHSS. The NINDS dataset contains repeated measures at 5 time points across 11 different neurological domains (as measured by 15 different NIHSS assessment items), resulting in 75 observations per patient. To understand how changes in different symptoms relate to each other over time, we extracted the random-effects from each model to get a unique trajectory for each individual in each domain. Ideally, this estimation could be done in a single multilevel model that nests time within each domain within each participant. However, because the NIHSS domains all have numerically different maxima, they cannot all be estimated in the same ordinal model. As such, we chose to fit a unique model for each domain, extract the slopes for each person, and then compare those slopes across domains. Thus, it is important to remember that absolute differences in the outcome are difficult (if not impossible) to interpret (i.e., is total sensory loss, a 2 on the sensory domain, equivalent in severity to total gaze palsy, a 2 on the gaze domain?). However, relative changes across domains are still meaningful (i.e., negative slopes mean improvement over time for all domains and if the slopes are positively correlated, that means the symptoms tend to improve together).
Models also included fixed-effects of Time, Group (tPA versus placebo), and the Group × Time interaction. Estimates, standard errors, and p-values for all models are presented in the Additional file 1: Appendix S1. Statistical significance was defined as ( α = 0.05 ) for all tests. Although the fixed-effects are presented in the results below, we want to emphasize that demonstrating the efficacy of tPA is not the goal of our analysis. Our goal is to describe how individuals change over time across domains and see which domains tend to be correlated with each other (doing this first with mixed-effect models and then with TPC). We present the effects of Group and the Group × Time interactions as an internal validity Note that some items (e.g., ataxia) range from 0 to 2, others from 0 to 3 (e.g., language), and others from 0 to 4 (e.g., arm and leg measures). The proportion of participant obtaining that score at each timepoint is shown via a color-coded stacked histogram (total N = 489) check so that these analysis can be compared to past work showing the efficacy of tPA [33].

Trajectory profile clustering
The Trajectory Profile Clustering algorithm [27] is designed to group together patients based on the similarities of their disease trajectories. In essence, it uses graphical tools to generate trajectory profiles for each individual that track their evolution of symptoms across time, then clusters them into communities of similarly behaving individuals that define a recovery subtype. The algorithm proceeds as follows: 1. Model using bipartite networks At time point t we construct a bipartite graph modeling connections between N individuals and V disease variables/ symptoms. The connections between the individuals and symptoms are encoded through an adjacency matrix A t of size N × V . For M time points, we can represent the set of these bipartite graphs as an N × V × M by stacking the A t across time points to generate a tensor X where X ivt gives the value of individual i's disease symptom v at time t.

Threshold and binarize to obtain trajectory profile
We threshold each symptom to set values less than a fixed fraction κ of the maximum score for the symptom to zero. For example, if symptom ν takes score values in (0, 1, 2, 3, 4, 5), if κ = 0.5 , we binarize scores such that scores below 5 × 0.5 = 2.5 are set to 0, and above are set to 1. We call this thresholded matrix the trajectory profile matrix, T i for patient i, that contains a representation of how the set of symptoms that a patient is severely affected by varies with time. The matrix entries of T i are calculated as follows: Since the range of values for each symptom represents the entire scale of severity, this thresholding ensures that patients are only considered 'connected' to symptoms that they severely express. 3. Create a patient-patient network based on trajectory similarity We create a patient-patient network P of all patients. The nodes of this network denote patients, and the strength of a link between patient i and patient j captures the similarity of their trajectory profiles. P has an adjacency matrix given by: In other words, P ij gives the number of matrix entries for which trajectory profile T i has the same value as T j . This formulation implies that symptoms are equally weighted. While the approach is amenable to non-uniform weighting, there is little clinical consensus on the relative importance of different symptoms. Hence, in the interest of not introducing external bias, we choose uniform weighting, adopting an agnostic approach that assumes all symptoms/ indicators are equally important. Other applications may require unequal weighting for symptoms and different time points, in which case one may calculate the patient-patient matrix as follows: where w vt is the weight of symptom v at time t. 4. Cluster the network to identify subtypes We then perform Louvain community detection [37] to maximize the Newman-Girvan modularity function [38] on the network defined by the adjacency matrix P. Such community detection allows us to identify 'communities' of patients, where individuals within a community have a relatively more similar stroke recovery profiles than patients between communities. As is common in network community detection approaches, the number of communities is not set a priori, but rather chosen so that the modularity is maximized. This process allows us to cluster trajectory profiles, and hence patients, into subtypes which have high intra-subtype similarity. The subtypes are denoted by C 1 , C 2 , . . . C L , where each C l is a collection of trajectory profiles of the patients in that subtype, and L is the total number of subtypes. 5. Construct aggregate profiles to characterize each subtype: We average the trajectory profiles of all patients in each community C l to obtain the 'community/subtype profile' S l . The subtype profile is indicative of the symptom features that describe the subtype. More specifically, it is the normalized average of the trajectory profiles of all the patients in that subtype, i.e., S l is a V × M matrix with elements defined by where N l is the total number of individuals in subtype C l .

Graph neural network
Graph neural networks (GNNs) [30] are a machine learning approach that captures the relationships represented in graphs through message passing between the nodes of those graphs. GNNs take a graph as input and pass them several layers of nodes, artificial 'neurons' . Here we use (4) S l vt = i∈C l T i vt N l , graph neural networks to identify the timepoints that are most relevant in determining stroke recovery subtypes. Specifically, we train a graph neural network on symptom-symptom graphs generated at each timepoint, and test the accuracy of a GNN in its ability to classify an individual into the correct recovery subtype using data from a single timepoint. A higher accuracy at a given timepoint implies that the recovery subtypes attributed to the patients are strongly correlated with the symptom profiles at that timepoint. We generate a symptom-symptom interaction graph G t at each timepoint t where the nodes represent the disease symptoms. This graph is undirected (i.e., if node x is connected to node y, vice-versa is also true, i.e., the adjacency matrix of this graph is symmetric). The graph is generated as follows. First we generate a symptompatient binary interaction network for a given timepoint as in step (3) of the previous section. We then project it to symptom space to obtain the symptom interaction profile of each patient at a timepoint. The corresponding adjacency matrix (of size V × V ) for the graph for patient i is given by Lastly, We repeat this for each individual such that there exist N symptom-symptom interaction graphs at each timepoint.
We then separate the individual cases into a training data set (70% of total individuals) and a test data set (30% of total individuals) used to validate our approach. A convolutional graph neural network is trained on a graph classifying task for each time point, with labels provided by the subtypes/communities of that individual. The stratification of individuals into their recovery subtypes at each timepoint is then measured by testing the accuracy of the GNN on the test data for each timepoint.
The graph neural network takes as input symptomsymptom networks where we consider the 15 NIHSS assessment items as the nodes. The network consists of an input layer, a single hidden layer, and an output layer. The hidden layer comprises 64 artificial neurons. The input is processed through two graph convolutional layers with ReLU nonlinearities. We then calculate the graph representation by averaging all the neuron representations in the output layer, which contains an equal number of neurons to the number of subtypes. The output is passed through a softmax classifier that yields the probability of the graph belonging to a particular category/subtype. We use cross-entropy loss and the adaptive moment estimation (ADAM) optimizer.

Key patterns in slopes from the ordinal mixed-effect models
Full details of the models are presented in the Additional file 1: Appendix S1. In brief, however, there were statistically , which showed that the placebo group fared worse overall or improved more slowly over time in several domains compared to the tPA group. Inspection of the person-level coefficients of this model also provides some insights relevant to the current goal of creating behavioral phenotypes. As shown in Fig. 2, and as would be expected given the common occurrence of post-stroke hemiplegia (weakening on one side of the body), some of the strongest correlations were for the ipsilateral arm and leg. The right arm and leg showed a similar timecourse of change in impairment ( r = 0.80 ), as did the left arm and leg ( r = 0.81 ). Second, there are also patterns in the correlation matrix consistent with lateralization of function as affected by unilateral stroke. For instance, the NIHSS item for extinction was positively associated with left arm/leg deficits (both from right hemisphere damage; r = 0.36 − 0.40 ), and much less associated with right arm/leg deficits (from left hemisphere damage; r = 0.08 − 0.14 ). Gaze was also positively associated with extinction r = 0.53 , possibly reflecting the fact that gaze deviation is typically more pronounced in patients with hemineglect [39]. Lastly, language was positively associated with the right arm/ leg (all left hemisphere effected; r = 0.56 − 0.57 ) and trivially associated with the left arm/leg (which are right hemisphere effected, r = − 0.07 − 0.04).
In sum, ordinal mixed-effect regression provides us with analytical replication of past work (i.e., the superiority of tPA to placebo as shown in several different domains of the NIHSS) and new insights into how sets of symptoms co-evolve over time. However, these correlations between individual trajectories do not tell us which individuals cluster together nor do they provide us clear guidance on where cut-offs between different groups of individuals should be drawn, or when clusters begin to show reliable separation. There multiple analytical approaches that one could take to achieve these aims (e.g., cluster analysis of the random-slopes at successive timepoints could theoretically achieve this goal following the mixed-effect models). Acknowledging the diversity of possible methods, we present TPC as a pragmatic method that can both establish the common multidimensional trajectories that individuals tend to show and classify those individuals based on their trajectories.

Stroke recovery subtypes identified by TPC
Maximizing modularity on the patient-patient trajectory-similarity network gives us three distinct recovery subtypes. It is worth mentioning that the number of subtypes are not predetermined, but are optimally chosen such that the modularity is maximized, i.e., the subtypes are optimally separated. Figure 3 shows the clinical profiles of each subtype. The darkness of the shade of grey for each symptom over time denotes the fraction of patients who had a value above threshold for that symptom. To reiterate, the thresholding ensures that patients are only considered affected with symptoms for which they display relatively high severity, defined to be above the population median. Our analysis identifies 3 distinct stroke trajectory profiles that align with clinically relevant stroke syndromes, characterized both by distinct clusters of symptoms, as well as differing degrees of symptom severity over time. Several key features of the identified subtypes warrant comment. First, our TPC approach identifies a 'mildly affected' group that was the least symptomatic of the three subtypes both in terms of baseline severity and 3-month residual symptoms. As a group, this subtype showed a mixture of features that are not clearly lateralizing. In addition, two severely affected subtypes are readily identified that correspond to left and right hemisphere syndromes: We find a 'left motor' subtype, showing severely impaired left arm and leg strength together with hemineglect, but with essentially no right-sided motor symptoms (red boxes, Fig. 3), and a 'right motor' subtype, showing severely impaired right arm and right leg strength together with aphasia (blue boxes, Fig. 3). Additionally, spatial perception scale items are most affected in the 'left motor' group (corresponding clinically to a right hemisphere syndrome with hemispatial neglect). Conversely, the language and question-answering items are most affected in the 'right motor' group (corresponding to a left hemisphere syndrome with primarily expressive aphasia). These findings are in alignment with prior factor analysis on the clinimetric properties of the NIHSS [40] and principal component analysis (PCA) to define common behavioral clusters [41], as well as results from our mixed-effects model in Sect. 3.1. The fact that our results capture clinically relevant subtypes and corroborate these prior findings supports the content validity of this analytic approach. Additionally, our clustering approaches reveal subtype structure at a finer scale (both in terms of symptoms as well as longitudinal symptom evolution) than can be achieved with PCA, and results that are clinically consistent.
It is notable that all three identified trajectory subtypes included both tPA-and placebo-treated patients, suggesting that treatment effects were less defining characteristics of patient recovery profiles than were initial severity and stroke laterality. TPC also provides interesting insights into patterns of symptom prevalence over time across subtypes. Spatial perception deficits (hemineglect) are present in both the left-motor and right-motor subtypes, but tend to be milder and have better resolution in left hemisphere strokes. This observation reinforces the importance of targeted screening during rehabilitation for hemineglect symptoms in both left-and right-hemisphere stroke, since persistent milder symptoms that could be amenable to treatment might otherwise be overlooked. Visual deficits are also present in both left-and right-hemisphere strokes as would be expected, but contrary to conventional understanding that visual deficits resolve less well than hemineglect, the overall prevalence of persistent visual symptoms at 3 months is lower than for hemineglect.

Effects of tPA treatment and time
A natural extension of our TPC subtyping is to study the timecourse of stratification into these subtypes. One might wonder whether subtype (and consequently the expected recovery profile of a patient) is largely driven by baseline symptom severity or by early stroke treatments. Machine learning is particularly well-posed to answer such questions. Since we operate in the graph domain, we use graph neural networks. We first extract trajectory profiles independently on patients who had received tPA within 3 h of stroke onset compared to patients who had received a placebo. In Fig. 4, we see that the identified subtypes retain the same symptom clusters identified in Fig. 3, but that overall symptom severity is lower in the tPA-treated population (Fig. 4A), particularly in symptoms that are dominant identifiers of the group. For instance, in the left-motor group, assessment items for gaze, left arm and leg strength, and pain/ pinprick sensation showed higher recovery in the tPAtreated group (Fig. 4A) versus placebo group (Fig. 4B). Similarly, in the right-motor group, assessment items for right arm and leg strength, and command-following (SLOCCOM) showed higher recovery in the tPA-treated group (Fig. 4A) compared to the placebo group (Fig. 4B). As expected, the effect of tPA is less obvious in the minimally impaired group. We explore in Fig. 5 the accuracy of a neural network in predicting the subtype of an individual given data at a discrete timepoint. We generated a symptom-interaction network for each individual at each timepoint, and trained a convolutional-GNN to learn properties of interaction with neighbors. The convolutions are used for averaging over the neighborhood. If the learned properties for that timepoint are separated according to subtypes, then data at that timepoint is considered a good predictor of the subtype. In the training stage, we assume that the subtype is known to the neural network, which attempts to learn correlations between symptom-interaction patterns and the subtype. We then test to identify if the features learned by the neural network are consistent with the actual subtypes of the test patients.  Figure 5 shows that there is a difference in predictive accuracy at baseline for the tPA vs. Placebo groups. The baseline timepoint is a more accurate predictor of subtype for patients who received tPA. This finding may seem unexpected if one posits that tPA 'rescues' an otherwise poor prognosis with severe baseline symptoms predicting poor outcomes. However, the predictive accuracy for the tPA group rapidly increases during the first 2 h, and given the expected timecourse for therapeutic effects of tPA, this finding provides additional validation of our approach. For the Placebo group, predictive accuracy grows at comparable rates (comparable slopes) up to the 24-h mark, showing peak predictive accuracy at the 7-10 day mark, with standard error on the order 10 −2 . The tPA group showed a further uptick in predictive accuracy by the 3-month mark, suggesting that treatment continues to exert an effect on recovery subtype stratification even in the later stages of post-stroke. One might speculate that this is the result of tPA treatment salvaging a greater 'reserve' of neural tissue for later rehabilitation therapies to act upon. Our report on the rapid increase in predictive accuracy from 2 to 24 h post-stroke furthermore aligns with recent work by Heitsch et al. [42] who reported on the early change in NIHSS scores between 6 and 24 h as a dynamic phenotype associated with longterm outcomes.

Conclusion and future work
In this work we introduce a network-based, data-driven method for stroke recovery analysis. First, we analyze the NINDS tPA stroke dataset using conventional quantitative medicine methods including a ordinal mixedeffects regression model, examining the effects of time, group (tPA vs. Placebo), and their interactions across neurological domains. Further, to identify stroke recovery subtypes and examine their characteristics at a finer resolution, we use the Trajectory Profile Clustering method which accounts not only for symptom severity at different timepoints, but also symptom interactions and their temporal evolution. Of note, although the analytical approach is clinically agnostic, we identify  Test accuracy denoting predictive power of a graph neural network as a function of timepoint. Accuracy plotted separately for patients that received tPA treatment, placebo patients and all patients (tPA treated+placebo). 70% data used for training the neural network, 30% for testing. Number of training epochs = 100. We use a 2 layer graph convolutional neural network with 16 hidden units and relu nonlinearity at both layers subtypes that are clinically relevant. In particular, we identify a mildly affected recovery subtype comprising a larger proportion of patients who received tPA. Additionally, we observed that the two other recovery subtypes stratify as left-versus right-sided hemiplegia. Additionally, we identified that left motor deficits are strongly correlated with deficits in gaze and extinction, whereas right motor deficits correlated with deficits in language. These results again are biologically relevant, and are further validated by convergent findings in the mixed-effects regression models. Lastly, we use graph neural networks to study how much of the stratification into subtypes is identifiable at different time points, and found that stroke recovery trajectories were largely defined within the first 24 h, consistent with the expected pharmacodynamics of tPA treatment delivered in the first 3 h after stroke.
This paper is the first work introducing network trajectory approaches for stroke recovery phenotyping, and is aimed at enhancing the translation of such novel computational approaches for practical clinical application. This work presents a data-driven method that is widely applicable to heterogenous neurological disorders such as stroke, and bridges the fields of predictive medicine and network informatics. Because our approach is uniquely adapted to accommodate input variables on multiple scales, future applications could include the integration of other types of data that may contribute to the heterogeneity of recovery, such as data on patient genotypes.