 Original Research
 Open Access
Side-channel attacks against the human brain: the PIN code case study (extended version)
 Joseph Lange^{1},
 Clément Massart^{1},
 André Mouraux^{1} and
 François-Xavier Standaert^{1}
 Received: 13 April 2017
 Accepted: 15 October 2018
 Published: 29 October 2018
Abstract
We revisit the side-channel attacks with brain–computer interfaces (BCIs) first put forward by Martinovic et al. at the USENIX 2012 Security Symposium. For this purpose, we propose a comprehensive investigation of concrete adversaries trying to extract a PIN code from electroencephalogram signals. Overall, our results confirm the possibility of partial PIN recovery with high probability of success in a more quantified manner and at the same time put forward the challenges of full/systematic PIN recovery. They also highlight that the attack complexities can vary significantly as a function of the adversarial capabilities (e.g., supervised/profiled vs. unsupervised/non-profiled), hence leading to an interesting trade-off between their efficiency and practical relevance. We then show that similar attack techniques can be used to threaten the privacy of BCI users. We finally use our experiments to discuss the impact of such attacks on the security and privacy of BCI applications at large, and the important emerging societal challenges they raise.
Keywords
 Brain–computer interfaces (BCIs)
 Electroencephalography (EEGs)
 Security
 Privacy
1 Introduction
State of the art The increasing deployment of brain–computer interfaces (BCIs), which allow devices to be controlled based on cerebral activity, has been a permanent trend over the last decade. While originally specialized to the medical domain (e.g., [1, 2]), such interfaces can now be found in a variety of applications. Notable examples include drowsiness estimation for safe driving [3] and gaming [4]. Quite naturally, these new capabilities come with new security and privacy issues, since the signals BCIs exploit can generally be used to extract various types of sensitive information [5, 6]. For example, at the USENIX 2012 Security Symposium, Martinovic et al. showed empirical evidence that electroencephalogram (EEG) signals can be exploited in simple, yet effective attacks to (partially) extract private information such as credit card numbers, PIN codes, dates of birth and locations of residence from users [7]. These impressive results leveraged a broad literature in neuroscience, which established the possibility to extract such private information (e.g., see [8] for lie detection and [9] for neural markers of religious convictions). Or, less invasively, they can be connected to linguistic research on the reactions of the brain to semantic associations and incongruities (e.g., [10–12]). All these threats are gaining relevance with the availability of EEG-based gaming devices to the general public [13, 14].

Can we exactly extract private information with high success rate by increasing the number of observations in side-channel attacks exploiting BCIs?

How does the effectiveness of unsupervised (aka non-profiled) side-channel attacks exploiting BCIs compare to supervised (aka profiled) ones?

How efficiently can an adversary build a sufficiently accurate model for supervised (aka profiled) side-channel attacks exploiting BCIs?

How similar/different are the behavior and the resistance of different users in the context of side-channel attacks exploiting BCIs?
Contributions For this purpose, we propose an in-depth study of (a variation of) one of the case studies in [7], namely side-channel PIN code recovery attacks, which share some similarities with key recovery attacks against embedded devices. In this respect, our contributions are threefold. After a description of our experimental settings (Sect. 2), we first describe a methodology allowing us to analyze the informativeness of EEG signals and their impact on security with confidence (Sect. 3). While this methodology indeed borrows tools from the field of side-channel attacks against cryptographic implementations, it also deals with new constraints (e.g., the limited amount of observations available for the evaluations and the less regular distribution of these observations, for which a very systematic and principled approach is particularly important). Second, we provide a comprehensive experimental evaluation of our side-channel attacks against the human brain using this methodology (Sect. 4). We combine information-theoretic and security analyses in the supervised/profiled and unsupervised/non-profiled contexts, provide quantified estimates for the complexity of the attacks and pay particular attention to the stability of and confidence in our results. Finally, and after a brief excursion toward the privacy issues raised by our experiments (i.e., what happens if the adversary aims to recover the user IDs rather than the PIN codes?), we conclude by discussing consequences for the security and privacy of BCI-based applications and list interesting scopes for further research (Sect. 6).
Admittedly, and as will be detailed next, our results can be seen as positive or negative. That is, we show at the same time that partial information about PINs can be extracted with confidence and that full PIN extractions are challenging because of the high cardinality of the target and the risk of false positives. So they should mostly be viewed as a warning flag that such partial information extraction is possible and may become critical when the cardinality of the target decreases and/or large amounts of data are available to the adversary.^{1}
2 Experimental setting and threat model
In our experiments, eight people (next denoted as users) agreed to provide the 4-digit PIN code that they consider the most significant to them, meaning the one they use the most frequently in their daily life. This PIN code was given by the users before the experiment started, stored during the experiment and deleted afterward for confidentiality reasons. Five other random 4-digit codes were generated for each user (meaning a total of six 4-digit codes per user).
Each (real or random) PIN was then shown on a computer exactly 150 times to each user (in a random order), meaning a total of 900 events for which we recorded the EEG signal in sets of 300, together with a tag T ranging from 1 to 6 (with \(T=1\) the correct PIN and \(T=2\) to 6 the incorrect ones). We used 32 Ag–AgCl electrodes for the collection of the EEG signals. These were placed on the scalp using a WaveGuard cap from Cephalon, following the international 10-10 system. The stimulus onset asynchrony (SOA) was set to 1.009 s (i.e., slightly more than 1 s, to reduce the environmental noise). The time each PIN was shown was set to 0.5 s. When no PIN was displayed on the screen, a + sign was maintained in order to keep the focus of the user on the center of the screen. We additionally ensured that two identical 4-digit codes were always separated by at least two other 4-digit codes. The split of our experiments into sub-experiments of 300 events was motivated by a maximum duration of 5 min, during which we assumed the users to remain focused on the screen. The signals were amplified and sampled at a 1000 Hz rate with a 32-channel ASALAB EEG system from Advanced NeuroTechnologies. Finally, and in order to identify eye blinks which potentially perturb the EEG signal, we added two bipolar surface electrodes on the upper left and lower right sides of the right eye and rejected the records for which such an artifact was observed. This slightly reduced the total number of events stored for each user. (Precisely, this number was reduced to 900, 818, 853, 870, 892, 887, 878, 884, for users 1–8.)
This simplified setting naturally comes with limitations. First and concretely, the number of possible PIN codes for a typical smart card would of course be much larger than the six we investigate (e.g., 10,000 for a 4-digit PIN). In this respect, we first insist that the primary goal of the following experiments is to investigate the information leakages in EEG signals thoroughly, and this limited number of PIN codes allowed us to draw conclusions with good statistical confidence. Yet, we also note that this setting could be extended to a reasonable threat model. For example, one could target \(\approx 1000\) different users by repeatedly showing them \(\approx 10\) PIN codes among the 10,000 possible ones and recover one PIN with good confidence. Second, and since the attacks we carry out essentially test familiar versus unfamiliar information, there is also a risk of false positives (e.g., an all-zero code or a close-to-correct code). In this respect, our mitigation plan is to exploit statistical tools minimizing the number of false negatives, therefore potentially allowing enumeration among the most likely candidates [18].
3 Methodology
In this section, we describe the methodology we used in order to assess and better quantify the feasibility of side-channel attacks against the human brain. Concretely, and contrary to the case of embedded devices where the leakage distributions are supposed to be stable and the number of observations made by the adversary can be large, we deal with a very different challenge. Namely, we need to cope with irregular distributions possibly affected by outliers and can only assume a limited number of observations.
As a result, the following sections mainly aim to convince the reader that our treatment of the EEG signals is not biased by dataset-specific overfitting. For this purpose, our strategy is twofold. First, we apply the same (pre)processing methods to the measurements of all the users. This means the same selection of electrodes, the same dimensionality reduction and probability density function (PDF) estimation tools (with identical parameters), and the same definition of outliers. Second, we systematically verified that our results were at the same time consistent with neurophysiological expectations and stable across a sufficient range of (pre)processing parameters. As a result, our primary focus is on the confidence in and stability of the results, more than on their optimality (which is an interesting scope for further research). In other words, we want to guarantee that EEG signals provide exploitable side-channel information for PIN code recovery and to evaluate a sufficient number of observations for which such an attack can be performed with good success probability.
3.1 Notations
We denote the (multivariate) EEG signals of our experiments with a random variable \(\varvec{O}\), a sample EEG signal as \(\varvec{o}\), and the set of all the observations available for evaluation as \(\mathcal {O}\). These observations depend on (at least) three parameters: the user under investigation, next denoted with a random variable U such that \(u\in \{1,2,\ldots ,8\}\); the nature of the 4-digit code observed (i.e., whether it is correct or a random PIN), next denoted with a random variable P such that \(p\in \{0,1\}\); and a noise random variable N. Each observation is initially made of 32 vectors of 1000 samples, corresponding to 32 electrodes and \(\approx 1\) s per event.
3.2 Supervised (aka profiled) evaluation

From a practical point of view, building a model for all the PINs and users seems impractical in real-world settings: this would require being able to collect multiple observations for each of the 10,000 possible values of a 4-digit code. Furthermore, and as discussed in Sect. 3.3, our real versus random profiling allowed us to lean toward realistic (non-profiled) attacks.

From a neurophysiological point of view, the information we aim to extract is based on event-related potentials (ERPs) that have been shown to reflect semantic associations and incongruities [10–12]. In this respect, while we can expect a user to react differently to real and random 4-digit codes, there is no reason for them to treat the random codes differently. (Up to problems due to the appearance of other “significant” values that may lead to false positives, as will be discussed next.)
Preprocessing As a first step, all the observations were preprocessed using a band-pass filter. We set the low-frequency cutoff to 0.5 Hz to remove the slow drifts in the EEG signals and the high-frequency cutoff to 30 Hz to remove muscle artifacts and 50 Hz noise.
Finally, a look at the standard deviation curves in Fig. 4 suggests that the measurements are quite noisy, hence non-trivial to exploit with a limited amount of observations. This will be confirmed in our following PDF estimation phase and therefore motivates the dimensionality reduction in the next section (intuitively because using more dimensions can possibly lead to better signal extraction, which can mitigate the effect of a large noise level).
Dimensionality reduction The evaluation of our metrics requires building a probabilistic model, which may become data intensive as the number of dimensions in the observations increases. For example, directly estimating a 2000-dimensional PDF corresponding to our selected electrodes is not possible. In order to deal with this problem, we follow the standard approach of reducing dimensionality. More precisely, we use principal component analysis (PCA), which was shown to provide excellent results in the context of side-channel attacks against cryptographic devices [19]. We investigate two options in this direction.
Yet, one possible drawback of the previous method is that estimating the average traces \(\bar{\varvec{o}}^j\) becomes expensive when the number of PIN codes increases. In order to deal with and quantify the impact of this limitation, we also considered a “raw PCA,” where we directly reduce the dimensionality based on raw traces, next denoted as \(\varvec{R}_{1:N_d}\leftarrow {\mathsf {PCA}}\big (\{\varvec{o}_i\}_{i\approx 1:900}\big )\). While this approach is not expected to extract the information as effectively, it allows deriving a much larger number of dimensions than in the previous (average) case. Concretely though, exploiting dimensions 1–5 only was a good trade-off between the informativeness of the dimensionality reduction, the risk of overfitting (useless) dataset-dependent patterns and the risk of outliers in our experiments (see the paragraph on outliers).
As a result of this dimensionality reduction phase, the observation vectors \(\varvec{o}(1:2000)\) (which correspond to the concatenation of the measurements for our two selected electrodes) are reduced to smaller vectors \(\varvec{R}_{1:N_d}\times \varvec{o}\) (i.e., each dimension o(d) corresponds to the scalar product between the original observations \(\varvec{o}\) and a 2000-element vector \(\varvec{R}_d\)). We recall that PCA is not claimed to be an optimal dimensionality reduction, since it optimizes a criterion (i.e., the variance between the raw or mean traces) which does not capture all the information in our measurements. However, it is a natural first step in our investigations, and we could verify that our following conclusions are not affected by slight variations of the number of extracted dimensions (i.e., adding one or two dimensions), which therefore fits our (primary) confidence and stability goal.
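The projection \(\varvec{R}_{1:N_d}\times \varvec{o}\) can be illustrated with a minimal sketch. It assumes that the principal directions are obtained by power iteration with deflation on the centered raw traces, which is only one of several ways to compute a PCA and is not claimed to be the routine used in our experiments. The function names (`center`, `principal_directions`, `project`) are hypothetical.

```python
import math
import random


def center(rows):
    """Subtract the per-dimension mean from every observation."""
    n, d = len(rows), len(rows[0])
    mu = [sum(r[k] for r in rows) / n for k in range(d)]
    return [[r[k] - mu[k] for k in range(d)] for r in rows]


def principal_directions(rows, n_dims, iters=200, seed=0):
    """Return n_dims unit vectors R_1..R_{n_dims} (power iteration + deflation)."""
    rng = random.Random(seed)
    x = center(rows)
    d = len(x[0])
    comps = []
    for _ in range(n_dims):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for _ in range(iters):
            # w = X^T (X v) is proportional to the sample covariance applied to v.
            xv = [sum(a * b for a, b in zip(r, v)) for r in x]
            w = [sum(x[i][k] * xv[i] for i in range(len(x))) for k in range(d)]
            # Deflation: remove what the earlier directions already explain.
            for c in comps:
                dot = sum(a * b for a, b in zip(w, c))
                w = [a - dot * b for a, b in zip(w, c)]
            norm = math.sqrt(sum(a * a for a in w))
            v = [a / norm for a in w]
        comps.append(v)
    return comps


def project(o, comps):
    """Reduce one observation to len(comps) scalar products R_d . o."""
    return [sum(a * b for a, b in zip(c, o)) for c in comps]
```

In the setting of this section, `rows` would hold the \(\approx 900\) traces of 2000 samples and `n_dims` would be 5 for the raw PCA; a dedicated linear algebra library would be preferable at that size.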
PDF estimation We now describe the main ingredient of our supervised/profiled evaluation, namely the PDF estimation for which we exploit the knowledge of the p values for the observations in the profiling sets.
In order to build a model \(\hat{\mathsf {f}}_{\mathrm {model}}(\varvec{o}_{1:N_d}|p)\), we first take advantage of the fact that the dimensions of the \(\varvec{o}_{1:N_d}\) vectors after PCA are orthogonal. By additionally considering them as independent, this allows us to reduce the PDF estimation problem from one \(N_d\)-variate one to \(N_d\) univariate ones. Based on this simplification, the standard approach in side-channel analysis is to assume the observations to be normally distributed and to build Gaussian templates [20]. Yet, in our experiments no such obvious assumption on the distributions in hand was a priori available. As a result, we first considered a (non-parametric) kernel density estimation as used in [21], which has slower convergence but avoids any risk of biased evaluations [22]. Kernel density estimation is a generalization of histograms. Instead of bundling samples together in bins, it adds (for each observation) a small kernel centered on the value of the observation to the estimated PDF. The resulting estimation, which is a sum of kernels, is smoother than histograms and usually converges faster. Concretely, kernel density estimation requires selecting a kernel function (we used a Gaussian one) and setting the bandwidth parameter (which can be seen as a counterpart to the bin size in histograms). The optimal choice of the bandwidth depends on the distribution of the observations, which is unknown in our case. So we need to rely on a heuristic and used Silverman’s rule-of-thumb for this purpose [23].^{4}
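As an illustration, the following minimal sketch builds a univariate Gaussian kernel density estimate and combines \(N_d\) of them as a product, following the independence simplification above. The bandwidth uses the common \(1.06\,\hat{\sigma }\,n^{-1/5}\) form of Silverman's rule; this specific variant is an assumption, since the text does not state which form was applied. All function names are hypothetical.

```python
import math
import statistics


def silverman_bandwidth(samples):
    """Rule-of-thumb bandwidth, 1.06 * sigma * n^(-1/5) (assumed variant)."""
    n = len(samples)
    return 1.06 * statistics.stdev(samples) * n ** (-1 / 5)


def gaussian_kde(samples, bandwidth=None):
    """Return a function x -> estimated density (sum of Gaussian kernels)."""
    h = bandwidth if bandwidth is not None else silverman_bandwidth(samples)
    scale = 1.0 / (len(samples) * h * math.sqrt(2 * math.pi))

    def pdf(x):
        return scale * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)

    return pdf


def product_model(per_dim_samples):
    """Joint density as a product of one univariate KDE per PCA dimension."""
    pdfs = [gaussian_kde(s) for s in per_dim_samples]

    def joint(o):
        return math.prod(f(x) for f, x in zip(pdfs, o))

    return joint
```

Here `per_dim_samples[d]` would contain the projected profiling observations for dimension d and one value of p, so that one such product model is built for the real PIN and one for the random PINs.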
Evaluation metrics Following the general principles put forward in [17], our evaluations will be based on a combination of information-theoretic and security analyses. The first ones aim at evaluating whether exploitable information is available in the EEG signals; the second ones at evaluating how efficiently this information can be exploited to mount a side-channel attack. Note that since we do not assume the users to behave identically, these metrics will always be evaluated and discussed for each user independently.
 1. The observations’ conditional distribution is estimated from a profiling set. We denote this phase with
$$\begin{aligned} \hat{\mathsf {f}}^{(j)}_{\mathrm {model}}(\varvec{o}|p)\leftarrow {\mathcal {O}}_{\mathsf {p}}^{(j)}. \end{aligned}$$
Note that the \(\hat{\Pr }_{\mathrm {model}}[p|\varvec{o}]\) factor involved in the PI definition is directly derived via Bayes’ theorem as:
$$\begin{aligned} \hat{\Pr }_{\mathrm {model}}[p|\varvec{o}]=\frac{\hat{\mathsf {f}}^{(j)}_{\mathrm {model}}(\varvec{o}|p)\cdot \Pr [p]}{\sum _{p^*} \hat{\mathsf {f}}^{(j)}_{\mathrm {model}}(\varvec{o}|p^*)\cdot \Pr [p^*]}. \end{aligned}$$
 2. The model is tested by computing the PI estimate:
$$\begin{aligned} \hat{\mathrm {PI}}^{(j)}(P;\varvec{O})={\mathrm {H}}[P]+\sum _{p=0}^1\Pr [p]\cdot \sum _{\varvec{o}\in {\mathcal {O}}_{\mathsf {t}}^{(j)}|p} \frac{1}{n_{p}^j} \cdot \log _{2} \hat{\Pr }_{\mathrm {model}}[p|\varvec{o}], \end{aligned}$$
with \(n_{p}^j\) the number of observations in the test set \({\mathcal {O}}_{\mathsf {t}}^{(j)}|p\).
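The two steps above can be sketched as follows for a single profiling/test split. This is a minimal illustration, assuming that `models[p]` is any callable returning the estimated density \(\hat{\mathsf {f}}_{\mathrm {model}}(\varvec{o}|p)\) and that `test_sets[p]` lists the test observations with label p; these names and data layouts are hypothetical.

```python
import math


def posterior(o, p, models, priors):
    """Bayes' theorem: Pr_model[p | o] from the per-class densities and priors."""
    num = models[p](o) * priors[p]
    den = sum(models[c](o) * priors[c] for c in models)
    return num / den


def perceived_information(test_sets, models, priors):
    """PI estimate: H[P] + sum_p Pr[p] * mean_o log2 Pr_model[p | o]."""
    entropy = -sum(pr * math.log2(pr) for pr in priors.values() if pr > 0)
    total = 0.0
    for p, observations in test_sets.items():
        n_p = len(observations)
        logs = sum(math.log2(posterior(o, p, models, priors)) for o in observations)
        total += priors[p] * logs / n_p
    return entropy + total
```

A model that does not distinguish the two classes returns the prior as posterior and therefore yields a PI of zero, while a model that assigns too little probability to the correct class can drive the estimate negative.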
Success rate and average rank In order to confirm that the estimated PI indeed leads to concrete attacks, we consider two simple security metrics. Here, the main challenge is that we only have models for the real and random PIN codes, while the actual observations in the test set naturally come from six different events. As a result, we first considered the success rate event per event. For this purpose, the \(\approx 900\) observations are split into 6 sets of \(\approx 150\) observations that correspond to the six different tag values. Based on these 6 sets, we can compute the probability that the observations are correctly classified as real or random as a function of the number of observations exploited in the attack, next denoted as q. This is done by averaging a success function \(\mathsf {S}\) that is computed as follows. If \(q=1\): \(\mathsf {S}(\varvec{o}_1)=1\) if \(\hat{\Pr }_{\mathsf {model}}[p|\varvec{o}_1]>\hat{\Pr }_{\mathsf {model}}[\bar{p}|\varvec{o}_1]\) and \(\mathsf {S}(\varvec{o}_1)=0\) otherwise (where \(\bar{p}\) denotes the incorrect event); if \(q=2\): \(\mathsf {S}(\varvec{o}_1,\varvec{o}_2)=1\) if \(\hat{\Pr }_{\mathsf {model}}[p|\varvec{o}_1]\times \hat{\Pr }_{\mathsf {model}}[p|\varvec{o}_2]>\hat{\Pr }_{\mathsf {model}}[\bar{p}|\varvec{o}_1]\times \hat{\Pr }_{\mathsf {model}}[\bar{p}|\varvec{o}_2]\) and \(\mathsf {S}(\varvec{o}_1,\varvec{o}_2)=0\) otherwise; and so on for larger q. Concretely, this success rate is an interesting metric to check whether the observations generated by different incorrect PIN values indeed behave similarly.
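A minimal sketch of the success function and of its average over resampled sets of q observations is given below. It assumes that the posteriors \(\hat{\Pr }_{\mathsf {model}}[p|\varvec{o}_i]\) of the correct event have already been computed for one tag and lie strictly between 0 and 1, and it works with sums of logarithms rather than products to avoid numerical underflow. The function names and the resampling strategy are illustrative assumptions.

```python
import math
import random


def success(post_correct, post_incorrect):
    """S = 1 if prod Pr[p | o_i] > prod Pr[p_bar | o_i], and 0 otherwise."""
    log_c = sum(math.log(x) for x in post_correct)
    log_i = sum(math.log(x) for x in post_incorrect)
    return 1 if log_c > log_i else 0


def success_rate(post_correct, q, trials=1000, seed=0):
    """Average S over `trials` random subsets of q observations of one tag.

    With two events (real vs. random), Pr[p_bar | o] = 1 - Pr[p | o].
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        subset = rng.sample(post_correct, q)
        hits += success(subset, [1.0 - x for x in subset])
    return hits / trials
```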
Of course, an adversary eventually wants to compare the likelihoods of different PIN values. For this purpose, we also considered the average rank of the correct PIN in an experiment where we gradually increase the number of observations per tag q, but this time consider sets of 6 observations at once that we classify only according to the model for the real PIN. This leads to vectors \((\hat{\Pr }_{\mathrm {model}}[p|\varvec{o}_1^1],\hat{\Pr }_{\mathrm {model}}[p|\varvec{o}_1^2],\hat{\Pr }_{\mathrm {model}}[p|\varvec{o}_1^3],\ldots ,\hat{\Pr }_{\mathrm {model}}[p|\varvec{o}_1^6])\) if \(q=1\), \((\hat{\Pr }_{\mathrm {model}}[p|\varvec{o}_1^1] \times \hat{\Pr }_{\mathrm {model}}[p|\varvec{o}_2^1],\ldots ,\hat{\Pr }_{\mathrm {model}}[p|\varvec{o}_1^6]\times \hat{\Pr }_{\mathrm {model}}[p|\varvec{o}_2^6])\) if \(q=2\), ..., where the superscripts denote the tag from which the observations originate. The average rank is then obtained by sorting this vector and estimating the sample mean of the position of tag 1 in the sorted vector.
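The average rank computation can be sketched as follows. The sketch assumes that `post_by_tag[t]` lists the values \(\hat{\Pr }_{\mathrm {model}}[p|\varvec{o}^t]\) obtained with the real-PIN model for the observations of tag t, that all values are strictly positive, and that sets of q observations are drawn by resampling; the names are hypothetical.

```python
import math
import random


def rank_of_correct(scores_by_tag, correct_tag=1):
    """Position (1 = best) of the correct tag when sorting by decreasing score."""
    ordered = sorted(scores_by_tag, key=scores_by_tag.get, reverse=True)
    return ordered.index(correct_tag) + 1


def average_rank(post_by_tag, q, trials=1000, seed=0):
    """Sample mean of the rank of tag 1 when q observations per tag are combined."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        # Log of the product of the q posteriors, one score per tag.
        scores = {
            tag: sum(math.log(x) for x in rng.sample(values, q))
            for tag, values in post_by_tag.items()
        }
        total += rank_of_correct(scores)
    return total / trials
```

An average rank of 1 means that the correct PIN is always the most likely candidate, while a value of 3.5 corresponds to a random guess among six candidates.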
Connecting the metrics (sanity check) Note that as discussed in [25], information-theoretic and security metrics can be connected (i.e., a model that leads to a positive PI should lead to successful attacks).^{5} We consider both types of metrics in our experiments because the first ones allow a better assessment of the confidence in the evaluations (see the following paragraph on confidence), while the second ones lead to simpler intuitions regarding the concrete impact of the attacks.
Outliers As mentioned in the Dimensionality Reduction paragraph, the main drawback of the raw PCA is that it extracts the useful EEG information less efficiently, which we mitigate by using more dimensions. Unfortunately, this comes with an additional caveat. Namely, the less efficient information extraction combined with the addition of more dimensions increases the risk of outliers (i.e., observations that would classify the correct PIN value very badly for some dimensions, possibly leading to a negative PI). In this particular case, we considered an additional post-processing (after the dimensionality reduction and model building phases). Namely, given the \(\approx 900\) probabilities \(\hat{\Pr }[p|\varvec{R}_{1:N_d}\times \varvec{o}_i]\), we clipped the ones below 0.001 to this minimum value. This choice is admittedly heuristic, yet did consistently lead to positive results for all the users. It is motivated by limiting the weight of the log probabilities for the outliers in the PI estimation. We insist that this treatment of outliers is only needed for the raw PCA. For the average PCA, we did not reject any observation (other than the ones in Sect. 2).
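The effect of this post-processing on the log probabilities can be illustrated with a short sketch; the function names are hypothetical and the input values are made up for illustration only.

```python
import math


def clip_probabilities(probs, floor=0.001):
    """Raise every probability below `floor` to `floor`."""
    return [max(p, floor) for p in probs]


def mean_log2(probs):
    """Average log2 probability, i.e., the quantity that enters the PI estimate."""
    return sum(math.log2(p) for p in probs) / len(probs)


# One extreme outlier dominates the mean unless it is clipped.
raw = [0.6, 0.7, 0.55, 1e-12]
print(mean_log2(raw), mean_log2(clip_probabilities(raw)))
```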
Confidence By using \(\approx 900\)-fold cross-validation, we can guarantee that our PI estimates will be based on 900 observations, leading to 900 values for the log probabilities \(\log _2(\hat{\Pr }[p|\varvec{R}_{1:N_d}\times \varvec{o}_i])\). Since this remains a limited amount of data compared to the case of side-channel attacks against cryptographic implementations, and the extracted PI values are small, we completed our information-theoretic evaluations by computing a confidence interval for the PI estimates. To avoid any distribution-specific assumption, we computed a 90% bootstrap confidence interval [26], by resampling 100 bootstrap samples out of our 900 log probabilities, computing the 100 bootstrap sample means, sorting them and using the 95th and 5th percentiles as the endpoints of the intervals.^{6} For simplicity, this was only done for the PI metric and not for the success rate and average rank since (1) successful Bayesian attacks are implied by the information-theoretic analysis [25], (2) these metrics are more expensive to sample (e.g., we have only one evaluation of the success function with \(q\approx 150\) per user), and (3) they are only exhibited to provide intuitions regarding the exploitability of the observations (i.e., the attack complexities).
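A minimal sketch of such a percentile bootstrap is given below. It assumes resampling with replacement and a nearest-rank percentile definition, neither of which is specified in the text; the function name is hypothetical.

```python
import math
import random


def bootstrap_interval(values, n_boot=100, lower=5, upper=95, seed=0):
    """Percentile bootstrap interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    # One mean per bootstrap sample (drawn with replacement), then sort.
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_boot))

    def percentile(q):
        # Nearest-rank definition (an assumption of this sketch).
        index = min(n_boot - 1, max(0, math.ceil(q / 100 * n_boot) - 1))
        return means[index]

    return percentile(lower), percentile(upper)
```

Applied to the \(\approx 900\) log probabilities of one user, the two endpoints bound the mean log probability, to which the constant entropy term is then added to obtain an interval for the PI.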
3.3 Unsupervised (aka non-profiled) analysis
While supervised (aka profiled) analyses are the method of choice to gain understanding about the information available in a side-channel, their practical applicability is of course questionable. Indeed, building a model for a target user may not always be feasible, and this is particularly true in the context of attacks against the human brain since, as will be discussed in Sect. 4.3, models built for one user are not always (directly) exploitable against another user. In this section, we therefore propose an unsupervised/non-profiled extension of the previous (supervised/profiled) information-theoretic evaluation. To the best of our knowledge, this variation was never described as such in the open literature (although it shares some similarities with the non-profiled attacks surveyed in [21]). For this purpose, our starting point is the observation from Fig. 3 that, in an unsupervised/non-profiled context, one can take advantage of the fact that the (e.g., mean) traces of the EEG signals corresponding to the correct PIN value may stand out. As a result, a natural idea is to compute the PI metric 6 times independently, each time assuming a different (possibly random) tag to be correct during an “on-the-fly” modeling phase. If the traces corresponding to the (truly) correct PIN are more singular (compared to the others), we can expect the PI estimated with this PIN to be larger, leading to a successful attack.
Of course, such an attack implies an additional neurophysiological assumption (while in the supervised/profiled setting, we just exploit any information available). Yet, it nicely fits the intuitions discussed in the rest of this section, which makes it a good candidate for concrete evaluation. Furthermore, we mention that directly recovering the correct PIN value may not always be necessary: as in the case of side-channel analysis, reducing the rank of the correct PIN value down to an enumerable one may be sufficient [18].
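The idea of computing the PI once per candidate tag can be sketched as follows. This is a deliberately simplified illustration and not the evaluation procedure of this paper: it assumes one-dimensional observations, Gaussian models instead of kernel density estimation, a uniform prior, and a single profiling/test split instead of cross-validation. All names are hypothetical.

```python
import math
import statistics


def gaussian_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def pi_for_candidate(profiling, test, candidate):
    """PI estimate (uniform prior) when `candidate` is assumed to be the real PIN.

    `profiling` and `test` are lists of (tag, observation) pairs.
    """
    fits = {}
    for p in (0, 1):
        xs = [o for tag, o in profiling if (tag == candidate) == bool(p)]
        fits[p] = (statistics.mean(xs), statistics.stdev(xs))
    pi = 1.0  # H[P] = 1 bit for a uniform binary prior
    for p in (0, 1):
        xs = [o for tag, o in test if (tag == candidate) == bool(p)]
        logs = 0.0
        for o in xs:
            f = {c: gaussian_pdf(o, *fits[c]) for c in (0, 1)}
            logs += math.log2(f[p] / (f[0] + f[1]))
        pi += 0.5 * logs / len(xs)
    return pi


def non_profiled_attack(profiling, test, tags):
    """Return the tag with the largest PI estimate, and all the estimates."""
    scores = {t: pi_for_candidate(profiling, test, t) for t in tags}
    return max(scores, key=scores.get), scores
```

The attack succeeds when the returned tag is the correct one; the full dictionary of estimates can also be sorted to obtain a rank for enumeration.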
4 Experimental results
4.1 Supervised (aka profiled) evaluation
As in the previous section, we start with the results of our supervised/profiled evaluations, which will be in two (information-theoretic and security) parts. Beforehand, there is one last choice regarding the computation of \(\hat{\Pr }[p|\varvec{R}_{1:N_d}\times \varvec{o}_i]\) via Bayes’ theorem. Namely, should we consider maximum likelihood or maximum a posteriori attacks (i.e., should we take advantage of the a priori knowledge of \(\Pr [p]\) or consider a uniform prior)? Interestingly, in our context ignoring this prior and performing maximum likelihood attacks is more relevant, since we mostly want to avoid false negatives (i.e., correct PINs that would be classified as random ones), which prevent efficient enumeration. Since the prior on P increases the amount of such errors (due to the a priori bias of 5/6 toward random PIN values), the rest of this section reports on the results of maximum likelihood attacks.
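The impact of this choice can be illustrated with a small numerical sketch, in which the density values are made up for illustration and the function name is hypothetical.

```python
def posterior_real(f_real, f_random, prior_real):
    """Pr[real | o] via Bayes' theorem for two events (real vs. random PIN)."""
    num = f_real * prior_real
    return num / (num + f_random * (1.0 - prior_real))


# Hypothetical densities for one observation that slightly favors the real PIN.
f_real, f_random = 0.3, 0.2
print(posterior_real(f_real, f_random, 0.5))    # uniform prior (maximum likelihood)
print(posterior_real(f_real, f_random, 1 / 6))  # prior of 1/6 for the real PIN
```

With the uniform prior, the observation is classified as a real PIN (posterior above 0.5), whereas the 5/6 bias toward random values turns the same observation into a false negative.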
4.1.1 Perceived information

The value of the PI estimated using the maximum profiling set (i.e., the extreme right values in the graphs). It reflects the informativeness of the model built in the profiling phases and is correlated with the success rate of the online (maximum likelihood) attack using this model [25]. Positive PI values indicate that the model is sound (up to Footnote 5) and should lead to successful online attacks if the number of observations (i.e., the q parameter in our notations) used by the adversary is sufficient.

The number of traces in the profiling set required to reach a positive PI. It reflects the (offline) complexity of the model estimation (profiling) phase [27].
In this respect, the results in Fig. 7 show a positive convergence for the two illustrated users, yet toward different PI values, which indicates that the informativeness of the EEG signals differs between them. Next, and quite interestingly, we also see that the difference between average PCA (in the left part of the figure) and raw PCA (in the right part) confirms the expected intuitions. Namely, the fact that raw PCA reduces dimensionality based on less meaningful criteria and requires more dimensions implies a slower model convergence. Typically, model convergence was observed in the 100 observations’ range with average PCA and required up to 400 traces with raw PCA. For completeness, Table 1 contains the estimated PI values with maximum profiling set, for the different users and types of PCA. Except for one user (User 5) for which we could never reach a positive PI value with confidence,^{7} this analysis suggests that all the users lead to exploitable information and confirms the advantage of average PCA. A similar table obtained with the Gaussian profiling is given in Appendix 1.
Table 1 Estimated PI values with maximum profiling set
User  \(\hat{\mathrm {PI}}(P;O)\) with avg. PCA  \(\hat{\mathrm {PI}}(P;O)\) with raw PCA
1  0.0739  0.0618 
2  0.1643  0.1315 
3  0.1494  0.1398 
4  0.0920  0.0228 
5  \(\varnothing\)  \(\varnothing\) 
6  0.0521  0.0214 
7  0.0759  0.0568 
8  0.1697  0.0458 
4.1.2 Success rate and average rank
As discussed in the previous section, our information-theoretic analysis is a method of choice to determine whether discriminant information can be extracted from EEG signals with confidence. Yet, it does not lead to obvious intuitions regarding the actual complexity of an online attack where an adversary obtains a set of q fresh observations and tries to detect whether some of them correspond to a real PIN value. Therefore, we now provide the results of our complementary security analysis and estimate the success rate and average key rank metrics. As previously mentioned, these evaluations come with less confidence, since for large q values such as \(q=150\) we can have only one evaluation of the success function. Concretely, the best success rate/average key rank estimates are therefore obtained for \(q=1\). We took advantage of resampling when estimating them for larger q’s.
Figures 8 and 9 illustrate that these metrics are indeed correlated with the value of the PI estimates using the maximum profiling set, which explains the more efficient attacks against Users 2, 3 and 8. Concretely, the average rank figure suggests that the correct PIN value can be exactly extracted in our 6-PIN case study with 5–10 observations for the most informative users and 30–40 observations for the least informative ones. The success rate curves also bring meaningful intuitions since they highlight that all (correct and random) PIN values can be correctly classified with our profiled models (in slightly more traces). This confirms our neurophysiological assumption from the previous section that the users react similarly to all random values.^{8}
Besides, Fig. 8 is interesting since it shows how confidently the correct PIN value is classified independently of the others. Hence, its results would essentially scale with a larger number of PIN values.
4.2 Unsupervised (aka non-profiled) analysis
First, looking at the first line of the figure, which corresponds to the correct PIN value, we can now confirm that the PI estimates of Sect. 4.1.1 are sufficiently accurate (e.g., the confidence intervals clearly guarantee a positive PI). Second, the confidence intervals for the random PIN values (i.e., tags 2–6) confirm the observation from our success rate curves (Fig. 8) that the users react similarly to all random values. Third, the middle and bottom parts of the figure show the results of 2 (resp. 4) non-profiled attacks where the profiling set was split into 2 (resp. 4) independent parts (without resampling), therefore leading to the evaluation of 2 (resp. 4) confidence intervals for each tag value. Concretely, the PI estimate for the correct PIN value consistently started to overlap with the ones of random PINs for all users as soon as the number of attack traces q was below 200, and no clear gain for the correct PIN could be noticed below \(q=100\). This confirms the intuition that unsupervised/non-profiled side-channel attacks are generally more challenging than supervised/profiled ones (here, by an approximate factor of 5–10 depending on the users).
This conclusion also nicely matches the one in Sect. 4.1.1, Fig. 7, where we already observed that the (offline) estimation of an informative model is more expensive than its (online) exploitation for PIN code recovery as measured by the success rate and average rank (by similar factors). Indeed, in the unsupervised/nonprofiled context such an estimation has to be performed “onthefly”.
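The splitting of a trace set into independent parts, each yielding its own confidence interval, can be sketched as follows. This is a minimal illustration under stated assumptions: the interval is a percentile bootstrap interval, and `statistic` is a placeholder for whatever estimate is of interest (in the setting above it would be the PI estimate for a given tag). The function names, the default number of bootstrap rounds and the confidence level are hypothetical choices.

```python
import random


def split_disjoint(traces, parts):
    """Split traces into consecutive, non-overlapping subsets (no resampling).

    Leftover traces are dropped when len(traces) is not a multiple of parts.
    """
    size = len(traces) // parts
    return [traces[i * size:(i + 1) * size] for i in range(parts)]


def bootstrap_interval(values, statistic, n_boot=1000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for statistic(values)."""
    rng = random.Random(seed)
    n = len(values)
    estimates = sorted(
        statistic([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lo = estimates[int((1 - level) / 2 * n_boot)]
    hi = estimates[min(n_boot - 1, int((1 + level) / 2 * n_boot))]
    return lo, hi


def intervals_per_part(traces, parts, statistic):
    """One confidence interval per independent part of the trace set."""
    return [bootstrap_interval(p, statistic) for p in split_disjoint(traces, parts)]
```

Since each part contains fewer traces, its interval widens as the number of parts grows, which matches the overlap between correct and random PIN values observed for small q.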
4.3 Model portability
Since the previous section suggests a significant advantage of supervised/profiled attacks over unsupervised/non-profiled ones, a natural question is whether the profiling can lead to realistic attack models. Clearly, estimating a model for the correct PIN of each user an adversary would like to target seems hardly realistic (especially if 10,000 PIN values are considered). Therefore, and in order to get around this drawback, a solution would be to use the model built for one user against another user. Although limited by the number of users in our experiments, we made preliminary analyses in this direction. Interestingly, while for most pairs of users the resulting attacks failed and the PI estimates remained negative, we also found two pairs of users for which the models could be mutually exchanged. Namely, targeting User 1 (resp. User 6) with the model of User 6 (resp. User 1) leads to a PI of 0.0211 (resp. 0.0357). And targeting User 1 (resp. User 3) with the model of User 3 (resp. User 1) leads to a PI of 0.0281 (resp. 0.0246). Intuitively, this positive result is in part explained by the similar shapes of the first eigenvectors used to reduce the dimensionality when estimating these models. Overall, this problem of model portability is in fact similar to the problem of variability faced in the context of side-channel attacks against cryptographic devices [24]. Hence, it is an interesting scope for further research to investigate how advanced profiling techniques (e.g., profiling multiple users jointly with mixture models) could be used to increase the practical relevance of supervised/profiled attacks against the human brain.
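The cross-user evaluation described above can be illustrated with a minimal Python sketch that estimates the PI of a model built for one user on the traces of another. It is a simplification under stated assumptions: one-dimensional features, Gaussian templates rather than kernel density estimation, uniform priors over the event labels, and purely synthetic toy values. All names are hypothetical.

```python
import math
from statistics import mean, pvariance


def fit_gaussian_templates(profiling):
    """profiling: {label: [feature, ...]} -> {label: (mean, variance)}."""
    return {k: (mean(v), max(pvariance(v), 1e-12)) for k, v in profiling.items()}


def log_pdf(x, mu, var):
    """Log-density of a univariate Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mu) ** 2 / var)


def perceived_information(templates, test):
    """PI estimate in bits: H[K] + average log2 of the model's posterior."""
    labels = list(templates)
    pi = math.log2(len(labels))  # entropy of a uniformly distributed label
    for k in labels:
        acc = 0.0
        for x in test[k]:
            logs = [log_pdf(x, *templates[c]) for c in labels]
            m = max(logs)  # log-sum-exp trick for numerical stability
            log_norm = m + math.log(sum(math.exp(v - m) for v in logs))
            acc += (log_pdf(x, *templates[k]) - log_norm) / math.log(2)
        pi += acc / (len(test[k]) * len(labels))
    return pi


# Synthetic toy features (not measured data): user B reacts like user A,
# user C reacts in the opposite way.
user_a = {"correct": [0.8, 1.0, 1.2, 0.9, 1.1], "random": [-0.2, 0.0, 0.2, -0.1, 0.1]}
user_b = {"correct": [0.9, 1.1], "random": [0.0, 0.1]}
user_c = {"correct": [0.0, 0.1], "random": [0.9, 1.1]}

model_a = fit_gaussian_templates(user_a)
pi_a_on_b = perceived_information(model_a, user_b)  # positive: model is portable
pi_a_on_c = perceived_information(model_a, user_c)  # negative: model misleads
```

A negative estimate indicates that the borrowed model is worse than a blind guess for the target user, which is the situation we observed for most pairs of users.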
5 From security issues to privacy issues
Table: Estimated PI values with maximum profiling set
User | \(\hat{\mathrm {PI}}(U=u;O)\) with avg PCA | \(\hat{\mathrm {PI}}(U=u;O)\) with raw PCA
1 | 0.7044 | 0.5257
2 | 0.7217 | 0.6378
3 | 0.2680 | 0.2138
4 | 0.3337 | 0.8044
5 | \(\varnothing\) | \(\varnothing\)
6 | 0.2620 | 0.4254
7 | 0.4003 | 0.5650
8 | 1.4532 | 1.1351
6 Consequences and conclusions
The results in this paper lead to two conclusions.
First, and from the security point of view, our experiments show that PIN extraction attacks using BCIs are feasible, yet require several observations to succeed with high probability. In this respect, the difference between the complexity of successful supervised/profiled attacks (around 10 correct PIN observations) and unsupervised/non-profiled attacks (in the hundreds) is noticeable. It suggests the aggregation of users into classes for which the models are sufficiently similar as an interesting scope for further research (which would require larger scale experiments, with more users). In this setting, a better investigation of the impact of enumeration would also be worthwhile. Indeed, the reduction of the average rank of the correct PIN is also significant in our analyses. Therefore, combining side-channel attacks against the human brain with some enumeration power can reduce the number of observations required to succeed. (Roughly, we can assume that the average key rank will be reduced exponentially in the number of observations, as usually observed in side-channel attacks [30].)
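The tradeoff between enumeration power and number of observations can be sketched as follows. This minimal Python example assumes that the rank of the correct PIN after each observation has already been recorded for several independent experiments; the function name, the input layout and the default target success rate are hypothetical.

```python
def observations_needed(rank_traces, budget, target=0.9):
    """Smallest number of observations reaching the target success rate.

    rank_traces[e][i] is the rank of the correct PIN in experiment e after
    i + 1 observations. An experiment succeeds if the correct PIN is among
    the `budget` most likely candidates, i.e., the adversary may enumerate
    up to `budget` PIN values. Returns None if the target is never reached.
    """
    n = len(rank_traces)
    for i in range(len(rank_traces[0])):
        hits = sum(1 for ranks in rank_traces if ranks[i] <= budget)
        if hits / n >= target:
            return i + 1
    return None
```

With a budget of 1 the function reduces to the usual success rate criterion, while larger budgets show how many observations can be saved by testing a few candidates.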
More generally, our results suggest that extracting concrete PIN codes from EEG signals, while theoretically feasible and potentially damaging for some users and PINs, is not yet a very critical threat for systematic PIN extraction. This may change in the future, if/when massive amounts of BCI signals start to be collected. Besides, other targets with smaller cardinality could already be more worrying (e.g., extracting the knowledge of one relative among a set of unknown people displayed on a screen), because they avoid issues related to users losing their focus during overly long experiments.
Second, and given the importance of profiling for efficient information extraction from EEG signals, our experiments also underline that privacy issues may be even more worrying than security ones in BCI-based applications. Indeed, when it comes to privacy, the adversary trying to identify a user is much less limited in his profiling abilities. In fact, any correlation between his target user and some feature found in a dataset is potentially exploitable. Furthermore, the amount and types of correlations that can be exhibited in this case are potentially unbounded, which makes the associated risks very hard to quantify. In this respect, the data minimization principle does not seem to be a sufficient answer: it may very well be that the EEG signals collected for one (e.g., gaming) activity can be used to reveal various other types of (e.g., medical, political) correlations. Anonymity is probably not the right answer either (since correlations with groups of users may be as discriminant as personal ones). And such issues are naturally amplified in case of malicious applications (e.g., it seems possible to design a BCI-based game where situations lead the users to incidentally reveal preferences). So overall, it appears as an important challenge to design tools that provide evidence of "fair treatment" when manipulating sensitive data such as EEG signals, which can be connected to emerging challenges related to computations on encrypted data [31].
The experiments described next were approved by the local Research Ethics Committee and performed in compliance with the Code of Ethics of the World Medical Association (Declaration of Helsinki). All participants gave written informed consent.
We further checked systematically that other electrodes did not provide so much more discriminating information that our conclusions would be affected.
Note that for completeness, we also considered simple Gaussian templates. Comparing nonparametric and parametric approaches was useful in our experiments, in order to gain confidence that the kernel density estimation is not capturing dataset-specific features. Yet, since no significant variation was noticed, the following sections will focus on the results obtained with kernel density estimation.
More precisely, the PI is an average metric, so what is needed is that each line of the PI matrix defined in [17] (corresponding to 6 different events in our study) is positive, which we confirmed with the success rate analysis.
We note that confidence intervals estimated based on a Gaussian assumption did not lead to different conclusions in our case study.
As mentioned in Sect. 2, this is due to the presence of another familiar event for this user, which he mentioned to us after the experiments were performed. Further analysis of this critical case was not possible since the experiment approved by our ethical board was conditioned on the fact that no user PIN was stored.
We may expect more singularities (such as the one of User 5) to appear and launch false alarms in case studies with more PIN values. Yet, this would not contradict the trend of a significantly reduced average rank for the correct PIN value.
Despite a positive PI, the key rank for User 7 also stabilizes to 2. Yet, in this case we observed that it is due to one single misleading observation that is not rejected by our outlier management.
Declarations
Authors’ contributions
JL, CM and AM participated in the data collection part of the paper. JL, CM and FXS participated in the data analysis part of the paper. All coauthors participated in the writing of the paper. All authors read and approved the final manuscript.
Authors’ information
Joseph Lange is a research engineer at Google. Clément Massart is a PhD student at UCL. André Mouraux and François-Xavier Standaert are professors at UCL.
Acknowledgements
François-Xavier Standaert is a senior associate researcher of the Belgian Fund for Scientific Research (FNRS-F.R.S.). This work has been funded in part by the FEDER Project CryptoMediaUCL and the ERC Project 724725.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
References
1. Engel J, Kuhl DE, Phelps ME, Crandall PH (1982) Comparative localization of foci in partial epilepsy by PCT and EEG. Ann Neurol 12(6):529–537
2. Portas CM, Krakow K, Allen P, Josephs O, Armony JL, Frith CD (2000) Auditory processing across the sleep-wake cycle: simultaneous EEG and fMRI monitoring in humans. Neuron 28(3):991–999
3. Lin C, Wu R, Liang S, Chao W, Chen Y, Jung T (2005) EEG-based drowsiness estimation for safety driving using independent component analysis. IEEE Trans Circuits Syst 52–I(12):2726–2738
4. Coyle D, Príncipe JC, Lotte F, Nijholt A (2013) Guest editorial: brain/neuronal - computer game interfaces and interaction. IEEE Trans Comput Intell AI Games 5(2):77–81
5. Bonaci T, Calo R, Chizeck HJ (2015) App stores for the brain: privacy and security in brain–computer interfaces. IEEE Technol Soc Mag 34(2):32–39
6. Ienca M (2016) Hacking the brain: brain–computer interfacing technology and the ethics of neurosecurity. Ethics Inf Technol 18(2):117–129
7. Martinovic I, Davies D, Frank M, Perito D, Ros T, Song D (2012) On the feasibility of side-channel attacks with brain-computer interfaces. In: Kohno T (ed) USENIX security symposium. Proceedings. USENIX Association, pp 143–158
8. Farwell LA, Donchin E (1991) The truth will out: interrogative polygraphy (lie detection) with event-related brain potentials. Psychophysiology 28(5):531–547
9. Inzlicht M, McGregor I, Hirsh JB, Nash K (2009) Neural markers of religious conviction. Psychol Sci 20(3):385–392
10. Berlad I, Pratt H (1995) P300 in response to the subject’s own name. Electroencephalogr Clin Neurophysiol 96(5):472–474
11. Kutas M, Hillyard SA (1980) Reading senseless sentences: brain potentials reflect semantic incongruity. Science 207:203–205
12. Kutas M, Hillyard SA (1984) Brain potentials during reading reflect word expectancy and semantic association. Nature 307:161–163
13. http://emotiv.com/. Last retrieved July 2016
14. http://neurosky.com/. Last retrieved July 2016
15. Mangard S, Oswald E, Popp T (2007) Power analysis attacks - revealing the secrets of smart cards. Springer, Berlin
16. http://www.chesworkshop.org/. Last retrieved July 2016
17. Standaert F, Malkin T, Yung M (2009) A unified framework for the analysis of side-channel key recovery attacks. In: Joux A (ed) EUROCRYPT. Proceedings, volume 5479 of LNCS. Springer, pp 443–461
18. Veyrat-Charvillon N, Gérard B, Renauld M, Standaert F (2012) An optimal key enumeration algorithm and its application to side-channel attacks. In: Knudsen LR, Wu H (eds) SAC. Proceedings, volume 7707 of LNCS. Springer, pp 390–406
19. Archambeau C, Peeters E, Standaert F, Quisquater J (2006) Template attacks in principal subspaces. In: Goubin L, Matsui M (eds) CHES 2006. Proceedings, volume 4249 of LNCS. Springer, pp 1–14
20. Chari S, Rao JR, Rohatgi P (2002) Template attacks. In: Kaliski Jr BS, Koç ÇK, Paar C (eds) CHES. Proceedings, volume 2523 of LNCS. Springer, pp 13–28
21. Batina L, Gierlichs B, Prouff E, Rivain M, Standaert F, Veyrat-Charvillon N (2011) Mutual information analysis: a comprehensive study. J Cryptol 24(2):269–291
22. Durvaux F, Standaert F, Veyrat-Charvillon N (2014) How to certify the leakage of a chip? In: Nguyen PQ, Oswald E (eds) EUROCRYPT. Proceedings, volume 8441 of LNCS. Springer, pp 459–476
23. Silverman BW (1986) Density estimation for statistics and data analysis. Chapman & Hall, London
24. Renauld M, Standaert F, Veyrat-Charvillon N, Kamel D, Flandre D (2011) A formal study of power variability issues and side-channel attacks for nanoscale devices. In: Paterson KG (ed) EUROCRYPT 2011. Proceedings, volume 6632 of LNCS. Springer, pp 109–128
25. Duc A, Faust S, Standaert F (2015) Making masking security proofs concrete - or how to evaluate the security of any leaking device. In: Oswald E, Fischlin M (eds) EUROCRYPT 2015. Proceedings, Part I, volume 9056 of LNCS. Springer, pp 401–429
26. Efron B, Tibshirani RJ (1994) An introduction to the bootstrap. CRC Press, Boca Raton
27. Standaert F, Koeune F, Schindler W (2009) How to compare profiled side-channel attacks? In: Abdalla M, Pointcheval D, Fouque P, Vergnaud D (eds) ACNS. Proceedings, volume 5536 of LNCS, pp 485–498
28. Marcel S, Millán JR (2007) Person authentication using brainwaves (EEG) and maximum A posteriori model adaptation. IEEE Trans Pattern Anal Mach Intell 29(4):743–752
29. Paranjape RB, Mahovsky J, Benedicenti L, Koles Z (2001) The electroencephalogram as a biometric. In: Electrical and Computer Engineering, vol 2. IEEE, pp 1363–1366
30. Veyrat-Charvillon N, Gérard B, Standaert F (2013) Security evaluations beyond computing power. In: Johansson T, Nguyen PQ (eds) EUROCRYPT. Proceedings, volume 7881 of LNCS. Springer, pp 126–141
31. Smart NP (2016) Computing on encrypted data. Kayaks and Dreadnoughts in a sea of crypto (September 2016)