A wavelet-based feature extraction method is proposed in this section to harness multi-domain (time, frequency and space) information in the MEG data. Conventional signal processing approaches address the problem from a time and/or frequency perspective. The proposed approach also leverages the spatial information of the signals, which is available thanks to the high-resolution data provided by the employed MEG systems. Coupled with the time–frequency analysis of wavelet decomposition, the proposed method performs a spatial–time–frequency analysis of the MEG data for a three-way classification of AD, MCI and HC participants.

This section is divided into two parts. Section 2.1 describes the overall system design, i.e. the expert system pipeline that starts from raw brain signals and ends with the final classification decision. Section 2.2 presents the new approach based on two-dimensional wavelet decomposition. It focuses on the proposed segmentation approach for MEG signal patching, which is inspired by image segmentation techniques.

### System design

All the brainwave signals in this study were captured using 306-channel Elekta MEG systems [2, 3]. These systems are equipped with two major types of sensors: 102 magnetometers and 204 planar gradiometers. Of the gradiometers, 102 measure the derivative along the latitude direction and 102 along the longitude direction [15]. A magnetometer consists of a single coil that measures the magnetic flux perpendicular to its surface. A planar gradiometer consists of a "figure-of-eight" coil configuration, and the measured signal is the difference between the two loops of the "eight" [15]. The two sensor types measure different quantities on different scales. The magnetometer and gradiometer data may therefore be considered two different modalities, and they were processed separately in this work (Fig. 1).
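Because the magnetometer and gradiometer data are treated as separate modalities, a practical first step is to split the channels by sensor type. The sketch below assumes the usual Neuromag/Elekta naming scheme. In that scheme, MEG channel names ending in 1 are magnetometers, and names ending in 2 or 3 are the two planar gradiometers at the same location. The helper name and the toy channel list are hypothetical.

```python
# Hypothetical helper: split Elekta/Neuromag MEG channel names by sensor type.
# Assumes the naming scheme "MEGxxxk": k == 1 marks a magnetometer and
# k in {2, 3} marks one of the two planar gradiometers at the same location
# (102 locations x 3 sensors = 306 channels).


def split_by_sensor_type(ch_names):
    """Return (magnetometer_names, gradiometer_names)."""
    mags, grads = [], []
    for name in ch_names:
        if not name.startswith("MEG"):
            continue  # skip stimulus, EEG, EOG and other non-MEG channels
        if name[-1] == "1":
            mags.append(name)
        elif name[-1] in ("2", "3"):
            grads.append(name)
    return mags, grads


# Toy channel list covering two sensor locations plus a trigger channel.
names = ["MEG0111", "MEG0112", "MEG0113", "MEG0121", "MEG0122", "MEG0123", "STI101"]
mags, grads = split_by_sensor_type(names)
print(mags)   # ['MEG0111', 'MEG0121']
print(grads)  # ['MEG0112', 'MEG0113', 'MEG0122', 'MEG0123']
```

In practice, a library such as MNE-Python can select channels by type from the recording's metadata, which is more robust than parsing names.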

The signals captured by the Elekta system were originally sampled at 1000 Hz. The preprocessing had three aims: to extract the frequency band of interest, to avoid aliasing, and to speed up computation. The signals were therefore low-pass filtered with a 4th-order Butterworth filter and brought down to a 200 Hz rate, which is consistent with the 2000 samples per 10-s epoch used below. They were then further low-pass filtered to 80 Hz to better capture the signal trend. Conventionally, only frequencies up to 60 Hz are used in EEG-based brain activation analysis, whereas MEG signals may reveal useful content in higher frequency ranges [16].
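The epoch length given below (2000 samples per 10 s) implies a 200 Hz rate after filtering. The sketch below therefore decimates the 1000 Hz data by 5 between two 4th-order Butterworth low-pass stages, using SciPy. Three choices are assumptions of this sketch rather than details from the text: the zero-phase filtering (`sosfiltfilt`), the 90 Hz anti-aliasing cut-off, and the decimation by plain sample dropping.

```python
import numpy as np
from scipy import signal

FS_RAW = 1000   # Elekta acquisition rate (Hz), from the text
FS_NEW = 200    # assumed rate after decimation (2000 samples per 10-s epoch)
AA_CUTOFF = 90  # anti-aliasing cut-off (Hz): an assumption, below FS_NEW / 2
LP_CUTOFF = 80  # final low-pass cut-off (Hz), from the text


def preprocess(x, order=4):
    """Band-limit, decimate and low-pass MEG data along the last axis."""
    # 4th-order Butterworth anti-aliasing low-pass at the original rate.
    sos_aa = signal.butter(order, AA_CUTOFF, btype="low", fs=FS_RAW, output="sos")
    x = signal.sosfiltfilt(sos_aa, x, axis=-1)
    # Keep every 5th sample: 1000 Hz -> 200 Hz.
    x = x[..., :: FS_RAW // FS_NEW]
    # Further 4th-order Butterworth low-pass at 80 Hz on the decimated data.
    sos_lp = signal.butter(order, LP_CUTOFF, btype="low", fs=FS_NEW, output="sos")
    return signal.sosfiltfilt(sos_lp, x, axis=-1)


rng = np.random.default_rng(0)
raw = rng.standard_normal((3, 10 * FS_RAW))  # toy data: 3 channels, 10 s
clean = preprocess(raw)
print(clean.shape)  # (3, 2000)
```

Zero-phase filtering avoids shifting the signal trend in time, at the cost of being non-causal. That is acceptable for offline analysis of recorded data.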

The low-pass filtered signals were first segmented into 10-s (2000-sample) epochs. Sensors from a selected region of interest (ROI) were grouped to produce a series of *M* × *N* matrices that can be interpreted as 2D images, where *M* denotes the number of samples per epoch and *N* the number of sensors in the ROI. These images were fed into a series of two-dimensional wavelet packet decomposition (WPD) filter banks for preliminary feature extraction [17]. The details of this approach are presented in Sect. 2.2.
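The segmentation step can be sketched with NumPy reshaping. The sketch makes four assumptions: the epochs do not overlap, the data are already at 200 Hz, the ROI is given as a hypothetical list of sensor indices, and any incomplete final epoch is discarded.

```python
import numpy as np


def make_meg_images(data, roi_idx, fs=200, epoch_s=10):
    """Cut (channels x samples) data into M x N images for one ROI.

    data    : array of shape (n_channels, n_samples), already filtered
    roi_idx : indices of the sensors in the ROI (hypothetical input)
    Returns shape (n_epochs, M, N): M = fs * epoch_s time samples (rows),
    N = len(roi_idx) sensors (columns).
    """
    m = fs * epoch_s                      # samples per epoch (2000 here)
    roi = data[roi_idx]                   # (N, n_samples)
    n_epochs = roi.shape[1] // m          # drop an incomplete final epoch
    roi = roi[:, : n_epochs * m]
    # (N, n_epochs, M) -> (n_epochs, M, N): time runs down, sensors across.
    return roi.reshape(len(roi_idx), n_epochs, m).transpose(1, 2, 0)


data = np.arange(6 * 4500, dtype=float).reshape(6, 4500)  # toy: 6 sensors, 22.5 s
imgs = make_meg_images(data, roi_idx=[0, 2, 3])
print(imgs.shape)  # (2, 2000, 3): 2 epochs, M = 2000 samples, N = 3 sensors
```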

After the preliminary feature extraction, the resulting wavelet coefficients were processed separately for the magnetometer and gradiometer channels. For the magnetometers, the mean of the WPD coefficients of each image was computed. For the gradiometers, the standard deviation (SD) of the derivative of the wavelet coefficients of each image was computed (Fig. 1). Using the mean and SD in this final stage of feature extraction follows well-established approaches reported in [5, 18], and serves both to reduce dimensionality and to extract effective features. The resulting features were then fed to a 3-nearest neighbour (3-NN) classifier and a quadratic Bayes normal classifier (QBNC) to generate confidence scores. These two classifiers performed best in preliminary tests of a number of mainstream classifiers with optimized parameters. The other candidates included back-propagation neural networks, support vector machines (SVM) with a polynomial kernel and linear discriminant analysis. Because of the limited amount of data available, deep learning networks were not employed in this study. To leverage the two modalities (magnetometer- and gradiometer-based), score-level fusion was used to further improve recognition performance.
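The final feature and classification stages can be sketched in plain NumPy. Several details are assumptions rather than statements from the text:

- The gradiometer "derivative" is taken as a first difference along the time axis.
- The QBNC is implemented as a Gaussian classifier with one full covariance matrix per class (often called quadratic discriminant analysis), plus a small ridge term.
- The 3-NN confidence is the fraction of neighbours in each class.
- Fusion averages the score matrices (sum rule).

The features below are synthetic and the function names are hypothetical.

```python
import numpy as np


def mag_feature(coeffs):
    """Magnetometer feature: mean of one image's WPD coefficients."""
    return float(np.mean(coeffs))


def grad_feature(coeffs):
    """Gradiometer feature: SD of the derivative of the WPD coefficients.
    Assumption: the derivative is a first difference along the time (row) axis."""
    return float(np.std(np.diff(coeffs, axis=0)))


def knn_scores(X_tr, y_tr, X_te, n_classes, k=3):
    """k-NN confidence: fraction of the k nearest training points in each class."""
    dist = np.linalg.norm(X_te[:, None, :] - X_tr[None, :, :], axis=2)
    labels = y_tr[np.argsort(dist, axis=1)[:, :k]]  # (n_test, k)
    return np.stack([np.mean(labels == c, axis=1) for c in range(n_classes)], axis=1)


def qbnc_scores(X_tr, y_tr, X_te, n_classes, reg=1e-6):
    """Quadratic Bayes normal classifier: one Gaussian (full covariance) per class.
    `reg` is a small ridge added to each covariance, an assumption of this sketch."""
    d = X_tr.shape[1]
    log_post = np.empty((len(X_te), n_classes))
    for c in range(n_classes):
        Xc = X_tr[y_tr == c]
        mu = Xc.mean(axis=0)
        cov = np.atleast_2d(np.cov(Xc, rowvar=False)) + reg * np.eye(d)
        diff = X_te - mu
        maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
        _, logdet = np.linalg.slogdet(cov)
        log_post[:, c] = np.log(len(Xc) / len(X_tr)) - 0.5 * (logdet + maha)
    log_post -= log_post.max(axis=1, keepdims=True)  # numerically stable softmax
    post = np.exp(log_post)
    return post / post.sum(axis=1, keepdims=True)


def fuse(*score_mats):
    """Score-level fusion by the (assumed) sum rule: average the score matrices."""
    return np.mean(np.stack(score_mats), axis=0)


# Synthetic two-feature data per modality for classes 0, 1, 2 (e.g. AD, MCI, HC).
rng = np.random.default_rng(1)
centres = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
y = np.repeat(np.arange(3), 30)
X_mag = centres[y] + 0.5 * rng.standard_normal((90, 2))
X_grad = centres[y] + 0.5 * rng.standard_normal((90, 2))
te = np.arange(90) % 3 == 0  # every third sample held out for testing
tr = ~te

scores = []
for X in (X_mag, X_grad):
    scores.append(knn_scores(X[tr], y[tr], X[te], n_classes=3))
    scores.append(qbnc_scores(X[tr], y[tr], X[te], n_classes=3))
fused = fuse(*scores)
pred = fused.argmax(axis=1)
print("accuracy:", np.mean(pred == y[te]))
```

Averaging normalised scores keeps each classifier/modality pair on the same [0, 1] scale. Weighted sums or product rules are common alternatives.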

### Two-dimensional WPD for MEG images

One novelty of the proposed approach is the concatenation of multiple MEG signals into an image for two-dimensional multivariate wavelet analysis. Rather than extracting features from each sensor individually, multiple nearby sensors (following the standard MEG sensor labelling system [19]) are concatenated to form an MEG image. The horizontal direction of the image represents spatial information (sensors, *N*), and the vertical direction represents temporal information (samples within an epoch, *M*).

Compared with EEG, MEG systems are usually equipped with a larger number of sensors (roughly 300 vs. 100), and such a dense sensor distribution gives higher spatial resolution for the measured brainwaves. However, signals captured by different sensors are conventionally treated independently for feature extraction. This disregards the fact that nearby sensors pick up similar activity from the underlying neuronal sources, so combining them may enhance the measurement of that activity. Ignoring the spatial relationships between sensors therefore makes it difficult to benefit from the high spatial resolution of the MEG sensor array.

In this study, the two-dimensional wavelet packet decomposition (2D-WPD) is employed to analyse the characteristics of the signals obtained through magnetometers and gradiometers. The first level of the 2D-WPD splits each MEG image into four nodes of sub-band coefficients in the wavelet domain: the approximation, horizontal detail, vertical detail and diagonal detail coefficients [17, 20]. Each subsequent level splits every node in the same way. The approximation reflects the low-frequency part (shape) of the signal. The three detail nodes extract the high-frequency part of the signal along three directions. The horizontal detail reflects temporal information, the vertical detail reflects spatial information, and the diagonal detail reflects both. Multi-resolution analysis across overlapping frequency ranges is also achieved as the decomposition level increases (see Fig. 2). For an image \(f(x,y)\), the 2D-WPD can be expressed as:

$$ f(x,y) = \frac{1}{\sqrt{MN}} \sum_{m} \sum_{n} W_{\varphi}(j_{0},m,n)\, \varphi_{j_{0},m,n}(x,y) + \frac{1}{\sqrt{MN}} \sum_{i=H,V,D} \sum_{j=j_{0}}^{\infty} \sum_{m} \sum_{n} W_{\psi}^{i}(j,m,n)\, \psi_{j,m,n}^{i}(x,y), $$

(1)

where \(W_{\varphi}\) and \(W_{\psi}^{i}\) in Eq. 1 are defined as follows:

$$ W_{\varphi}(j_{0},m,n) = \frac{1}{\sqrt{MN}} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x,y)\, \varphi_{j_{0},m,n}(x,y), $$

(2)

$$ W_{\psi}^{i}(j,m,n) = \frac{1}{\sqrt{MN}} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} f(x,y)\, \psi_{j,m,n}^{i}(x,y), \quad i \in \{H,V,D\}, $$

(3)

in which \(j_{0}\) is an arbitrary starting scale and \(W_{\varphi}(j_{0},m,n)\) defines an approximation of \(f(x,y)\) at scale \(j_{0}\) (Eq. 2). \(W_{\psi}^{i}(j,m,n)\) adds the horizontal, vertical and diagonal details for scales \(j \ge j_{0}\) (Eq. 3). The basis functions in these expansions, scaled by *j* and shifted by *m* and *n*, are given by Eqs. 8–9. They are built from the separable scaling function and the three directional wavelet functions of Eqs. 4–7, and can be used to synthesize the original signal [21]:

$$ \varphi \left( {x,y} \right) = \varphi \left( x \right)\varphi \left( y \right), $$

(4)

$$ \psi^{H} \left( {x,y} \right) = \psi \left( x \right)\varphi \left( y \right), $$

(5)

$$ \psi^{V}(x,y) = \varphi(x)\psi(y), $$

(6)

$$ \psi^{D}(x,y) = \psi(x)\psi(y), $$

(7)

$$ \varphi_{j,m,n} \left( {x,y} \right) = 2^{\frac{j}{2}} \varphi \left( {2^{j} x - m,2^{j} y - n} \right), $$

(8)

$$ \psi_{j,m,n}^{i}(x,y) = 2^{\frac{j}{2}}\, \psi^{i}(2^{j}x - m, 2^{j}y - n), \quad i \in \{H,V,D\}. $$

(9)
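A wavelet packet decomposition applies the four-way split described by these equations recursively to every node, not only to the approximation. The sketch below implements one level with the standard separable construction: φ(x)φ(y) for the approximation, ψ(x)φ(y) for the horizontal detail, φ(x)ψ(y) for the vertical detail and ψ(x)ψ(y) for the diagonal detail. Here x indexes image rows (time) and y indexes columns (sensors). The sketch then recurses to build the packet tree. It assumes a Haar wavelet (the mother wavelet is not specified in this section) and image dimensions divisible by `2**level`. The function names are hypothetical. A library implementation such as PyWavelets' `WaveletPacket2D` would normally be used instead, with its own orientation conventions.

```python
import numpy as np


def haar_step(img):
    """One level of a separable orthonormal 2-D Haar transform.

    x indexes rows (time), y indexes columns (sensors); both sizes must be even.
    Returns (A, H, V, D) sub-bands, each half the size in both directions.
    """
    s = 1 / np.sqrt(2)
    lo_x = (img[0::2] + img[1::2]) * s        # phi along x
    hi_x = (img[0::2] - img[1::2]) * s        # psi along x
    a = (lo_x[:, 0::2] + lo_x[:, 1::2]) * s   # phi(x) phi(y)
    h = (hi_x[:, 0::2] + hi_x[:, 1::2]) * s   # psi(x) phi(y)
    v = (lo_x[:, 0::2] - lo_x[:, 1::2]) * s   # phi(x) psi(y)
    d = (hi_x[:, 0::2] - hi_x[:, 1::2]) * s   # psi(x) psi(y)
    return a, h, v, d


def wpd2(img, level):
    """Full 2-D wavelet packet tree: every node is split again at each level.

    Returns a dict mapping node paths such as 'ah' to coefficient arrays
    (4 ** level leaves in total).
    """
    nodes = {"": img}
    for _ in range(level):
        nodes = {path + tag: band
                 for path, arr in nodes.items()
                 for tag, band in zip("ahvd", haar_step(arr))}
    return nodes


rng = np.random.default_rng(0)
img = rng.standard_normal((2000, 8))  # toy image: M = 2000 samples x N = 8 sensors
leaves = wpd2(img, level=2)
print(len(leaves), leaves["aa"].shape)  # 16 (500, 2)
```

Because the Haar filters are orthonormal, the total energy of all leaves equals the energy of the input image. This is a convenient sanity check for any custom implementation.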

When the two-dimensional analysis is performed, the wavelet approximation captures the low-frequency content and the temporal trend of multiple signals simultaneously. In this implementation, the detail coefficients describe each MEG image patch from three aspects. The horizontal detail captures temporal information, the vertical detail captures spatial information, and the diagonal detail captures both spatial and temporal content. As shown in Fig. 2, the three detail nodes retain the high-frequency part of the signal within each decomposition level. Because they combine the time, frequency and spatial domains, features based on this extraction method may be more effective than conventional approaches in solving challenging classification problems. In the next section, this feature extraction method is applied to the three-class problem of distinguishing AD, MCI and HC subjects.
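The mapping of detail nodes to domains can be checked on toy images. The separable construction is ψ^H = ψ(x)φ(y), ψ^V = φ(x)ψ(y) and ψ^D = ψ(x)ψ(y), with x indexing the rows (time samples) and y the columns (sensors) of the *M* × *N* image. Under this convention, an image that changes only over time should put its detail energy in the horizontal node. An image that changes only across sensors should put it in the vertical node. The Haar filters and the toy images are assumptions for illustration, and real MEG images mix both kinds of variation.

```python
import numpy as np


def haar_step(img):
    """One level of a separable orthonormal 2-D Haar transform (rows = time x)."""
    s = 1 / np.sqrt(2)
    lo_x = (img[0::2] + img[1::2]) * s        # phi along x (time)
    hi_x = (img[0::2] - img[1::2]) * s        # psi along x (time)
    a = (lo_x[:, 0::2] + lo_x[:, 1::2]) * s   # phi(x) phi(y)
    h = (hi_x[:, 0::2] + hi_x[:, 1::2]) * s   # psi(x) phi(y)
    v = (lo_x[:, 0::2] - lo_x[:, 1::2]) * s   # phi(x) psi(y)
    d = (hi_x[:, 0::2] - hi_x[:, 1::2]) * s   # psi(x) psi(y)
    return a, h, v, d


def energy(band):
    return float(np.sum(band ** 2))


t = np.arange(64)
# Same oscillation on all 8 sensors: the image varies over time only.
time_only = np.tile(np.sin(2 * np.pi * t / 4)[:, None], (1, 8))
# Constant over time, alternating sign across sensors: varies over space only.
space_only = np.tile(np.array([1.0, -1.0] * 4), (64, 1))

for name, img in [("time_only", time_only), ("space_only", space_only)]:
    _, h, v, d = haar_step(img)
    print(name, "H:", energy(h), "V:", energy(v), "D:", energy(d))
# time_only: V and D are exactly 0; space_only: H and D are exactly 0.
```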