Near-channel classifier: symbiotic communication and classification in high-dimensional space

Brain-inspired high-dimensional (HD) computing represents and manipulates data using very long, random vectors with dimensionality in the thousands. This representation provides great robustness for various classification tasks where classifiers operate at low signal-to-noise ratio (SNR) conditions. Similarly, hyperdimensional modulation (HDM) leverages the robustness of complex-valued HD representations to reliably transmit information over a wireless channel, achieving an SNR gain similar to state-of-the-art codes. Here, we first propose methods to improve HDM in two ways: (1) reducing the complexity of encoding and decoding operations by generating, manipulating, and transmitting bipolar or integer vectors instead of complex vectors; (2) increasing the SNR gain by 0.2 dB using a new soft-feedback decoder, which can also increase the additive superposition capacity of HD vectors by up to 1.7× in noise-free cases. Secondly, we propose to combine the encoding/decoding aspects of communication with classification into a single framework by relying on multifaceted HD representations. This leads to a near-channel classification (NCC) approach that avoids transformations between different representations and the overhead of multiple layers of encoding/decoding, hence reducing the latency and complexity of a wireless smart distributed system while providing robustness against noise and interference from other nodes. We provide a use case for wearable hand gesture recognition with 5 classes from 64 EMG sensors, where the encoded vectors are transmitted to a remote node for either performing NCC or reconstruction of the encoded data.
In NCC mode, the original classification accuracy of 94% is maintained, even in a channel at an SNR of 0 dB, by transmitting 10,000-bit vectors. We remove the redundancy by reducing the vector dimensionality to 2048 bits, which still exhibits graceful degradation: less than a 6% accuracy loss occurs in a channel at −5 dB and with interference from 6 nodes that simultaneously transmit their encoded vectors. In the reconstruction mode, the soft-feedback decoder improves the mean-squared error by up to 20 dB, compared to standard decoding, when transmitting 2048-dimensional vectors.


Introduction
This motivates combining communication and ML layers into a single framework for wireless distributed smart sensing systems, as shown in Fig. 1.
One viable option is to exploit novel representations in high-dimensional (HD) computing [10][11][12][13], where data are represented by very long, random vectors (dimension D = 1,000–10,000). Inspired by the size of the brain's circuits, these vectors are holographic and (pseudo)random with independent and identically distributed (i.i.d.) components [10]. As the vectors are composed through a set of well-defined mathematical operations, they can be queried, decomposed [14], and reasoned about [15,16]. For learning and classification tasks, HD computing was initially applied to text analytics, where each discrete symbol can be readily mapped to a random vector to be combined across text [17][18][19][20]. More recently, HD computing has been extended to operate with a set of analog inputs [21][22][23][24][25], mainly in several biosignal processing applications, or with event-driven inputs from neuromorphic dynamic vision sensors [26].
HD vectors are very tolerant to noise, variations, or faulty computations due to their redundant i.i.d. representation, in which information symbols are spread holographically across many components [10,20,27]. This makes HD computing a prime candidate for implementation on emerging nanoscale hardware operating at low signal-to-noise ratio (SNR) conditions [28][29][30]. In a similar vein, methods have been proposed to make use of the robustness of HD vectors in various communication layers [31][32][33][34][35][36][37]. Particularly, the recent hyperdimensional modulation (HDM) [33] can be interpreted as a spreading modulation scheme whose spreading gain linearly improves with the vector dimension, allowing higher error tolerance with increased dimensionality. Multiple spread vectors are superposed before transmission; at the receiver, an iterative feedback decoder denoises the query vector by subtracting the estimated vectors. In low SNR channels where each value cannot be reliably demodulated, HDM can still achieve successful demodulation of symbols without requiring explicit error correction.
In an initial effort, it was shown that HDM exhibits a bit error rate (BER) comparable to that of low-density parity check (LDPC) and Polar codes with fewer decoding operations [33]. Moreover, HDM was shown to be more collision-tolerant than conventional orthogonal modulations (e.g., OFDM) in highly congested low-power wide-area networks [34]. However, the HDM proposed in [33] represents symbols using complex-valued components in a vector, hence we call it Complex-HDM; it requires more bits per symbol to be transmitted and involves energy-hungry fast Fourier transform (FFT) operations in encoding and decoding.
Here, we first address these shortcomings of Complex-HDM by simplifying its encoding/decoding operations, and improving its SNR gain. Next, we demonstrate how our approach can effectively blur the boundaries between communication and ML by relying on a unified HD representation system. This paper makes the following three main contributions (highlighted in Fig. 1 as well).
First, in Sect. 3, we propose Integer-HDM, which superposes bipolar vectors. These vectors can be rematerialized in an encoder with a combination of simple lookup and permutation operations that are hardware-friendly [38]. Further, the burden of decoding complexity is lowered by using associative memory (AM) searches, purely with integer arithmetic instead of performing FFTs. Such best-match searches use cheap clean-up operations, which scale better than FFT searches on long codes, and can be efficiently implemented with analog in-memory computing [30]. Our Integer-HDM achieves the same SNR gain as Complex-HDM [33] under additive white Gaussian noise (AWGN) without relying on expensive FFT operations in encoding and decoding.
Secondly, to improve the SNR gain, we propose a soft-feedback decoding mechanism which additionally takes the estimation's confidence into account (Sect. 4). Although the soft-feedback involves floating-point operations, it improves the SNR gain of the Integer-HDM by 0.2 dB at a BER of 10^−4. To simplify the soft-feedback decoder, it is quantized to 4.1 fixed-point without any degradation in the SNR gain under AWGN. Further, we have observed that our soft-feedback decoder can be combined with an optimized minimum mean-squared error (MMSE) readout to increase the number of superposed vectors that can be successfully decomposed in a noise-free case. This effectively improves the capacity of HD superposition by 1.7× for noise-free information retrieval; we improve the number of encoded information bits in a 500-dimensional HD vector [14] from 0.7 to 1.2 bits/dimension.

Fig. 1 Overview of this work with colors pointing to Sects. 3, 4 and 5 in the paper. Sensor data are encoded to a high-dimensional vector x ∈ Z^D using a novel Integer-HDM encoder and transmitted over a noisy channel. At the receiver, the perturbed vector y is either decomposed by an optimized soft-feedback decoder (the improved Integer-HDM decoder) to reconstruct the sensory data, or directly classified by a near-channel classifier (NCC) without any decoding step.

Thirdly, we propose to combine channel coding, source coding, and ML classification into a single unified layer exploiting multifaceted HD representations. This approach avoids transformations between representations and the addition of multiple layers of encoding/decoding. The approach is inspired by the structural similarities between the Integer-HDM encoding and the spatial feature encoding in HD classifiers used for multichannel biosignal classification tasks [22,25]. In practice, we reuse the spatial encoding for both data transmission and classification; hence, we avoid the transition between different representations.
The encoded vector can be reliably transmitted to the receiver, where it is either decoded to analyze the underlying data, or directly classified, enabling near-channel classification (NCC). In Sect. 5, we present a use case for wearable hand gesture recognition (5-class) based on electromyography (EMG) signals from 64 sensors [22], where encoded vectors are transmitted either to perform NCC or to reconstruct the underlying features at the receiver. In NCC mode, the 10,000-bit representation shows great robustness by maintaining the noise-free accuracy of 94% at an SNR as low as 0 dB. Reducing the vector dimension to 2048 bits, where there is no redundancy, also exhibits graceful degradation in the presence of AWGN and interference from other sensor nodes, allowing down to −5 dB SNR and up to 6 simultaneously sending sensor nodes at less than a 6% accuracy loss compared to the noise-free case. Moreover, the soft-feedback decoder guarantees successful reconstruction of the features even in noisy environments and improves the mean-squared reconstruction error by up to 20 dB compared to standard decoding at dimension D = 2048.
In the remainder of the paper, Sect. 2 provides background on HD computing, the creation and decomposition of HD superpositions, and HDM. Section 6 concludes the paper.

High-dimensional computing
The brain's circuits are massive in terms of numbers of neurons and synapses, suggesting that large circuits are fundamental to the brain's functioning. HD computing [10], aka holographic reduced representations [12], the semantic pointer architecture [39], or vector symbolic architectures [13,40], explores this idea by computing with vectors as ultrawide words. These vectors are D-dimensional (the number of dimensions is in the thousands) and (pseudo)random with independent and identically distributed (i.i.d.) components. They thus conform to a holographic or holistic representation: the encoded information is distributed equally over all D components such that no component is more responsible for storing any piece of information than another. Such a representation maximizes robustness for the most efficient use of redundancy [10].
In this work, we focus on multiply-add-permute (MAP) architectures [13], which define the multiplication (*) as the element-wise multiplication between two vectors, the addition (+) as the element-wise addition among multiple vectors, and the permutation (π) as the random shuffling of the vector elements. Multiplication and permutation yield dissimilar vectors compared to their input vector, whereas addition preserves similarity and is often used to represent sets. The permutation can be realized with hardware-friendly cyclic shifts (ρ). We compare two D-dimensional vectors x and y with the cosine similarity:

sim(x, y) = ⟨x, y⟩ / (‖x‖₂ ‖y‖₂),

where ⟨·, ·⟩ is the ℓ2-inner product and ‖·‖₂ the ℓ2-norm. The cosine similarity reflects the angle between vectors, neglecting their length/norm.
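As an illustration, the MAP operations and the cosine similarity can be sketched in a few lines of NumPy; the dimension and the helper names are illustrative, not part of the paper's specification:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 10_000  # vector dimensionality (illustrative)

# Random bipolar HD vectors with i.i.d. components
x = rng.choice([-1, 1], size=D)
y = rng.choice([-1, 1], size=D)

def bind(a, b):
    """Multiplication (*): element-wise product, dissimilar to its inputs."""
    return a * b

def bundle(*vs):
    """Addition (+): element-wise sum, remains similar to all inputs."""
    return np.sum(vs, axis=0)

def permute(a, r=1):
    """Permutation realized as a hardware-friendly cyclic shift (rho)."""
    return np.roll(a, r)

def cos_sim(a, b):
    """Cosine similarity: <a, b> / (||a||_2 * ||b||_2)."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
```

Two random bipolar vectors are quasi-orthogonal (similarity near 0), binding and permutation produce dissimilar outputs, while bundling preserves similarity to each input.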
The high dimensionality guarantees that all elements in the dictionary are orthogonal with high probability, aka quasi-orthogonality. Information can be encoded by HD superposition: a string of information symbols (q_1, q_2, ..., q_V), q_i ∈ {1, 2, ..., N} ∀i, is mapped to the corresponding elements in the dictionary, permuted, and superposed via addition:

x = Σ_{v=1}^{V} π_v(E c(q_v)),

where E := (e_1, e_2, ..., e_N) ∈ {−1, 1}^{D×N} is the matrix representation of the IM containing the atomic vectors as columns, and c(q_v) ∈ {0, 1}^N is an all-zero vector except element q_v, which is one. Note that all permutations π_v are distinct.
The individual vectors in the superposition can be retrieved by the associative memory (AM) search:

ĉ_v = E^T π_v^{−1}(x),

where ĉ_v ∈ R^N and (·)^T denotes the transpose. The estimated index q̂_v is the one with the highest value in ĉ_v:

q̂_v = argmax_i (ĉ_v)_i.

Increasing the number of superposed vectors yields a higher information density; therefore, HD superposition can be used for compression. For example, it has been successfully applied for compressing model weights in deep neural networks [41]. However, the number of correct retrievals from highly compressed representations is limited by the number of superposed vectors V; an increasing V yields a lower signal-to-interference ratio (SIR) for retrieval.
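A minimal sketch of HD superposition and AM retrieval, assuming a random bipolar dictionary and one distinct random permutation per slot (the sizes D, N, V and the function names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
D, N, V = 1024, 32, 4  # dimension, IM size, number of superposed vectors

E = rng.choice([-1, 1], size=(D, N))            # item memory (dictionary)
perms = [rng.permutation(D) for _ in range(V)]  # distinct permutation per slot

def encode(symbols):
    """Superpose permuted dictionary vectors: x = sum_v pi_v(E c(q_v))."""
    return np.sum([E[:, q][perms[v]] for v, q in enumerate(symbols)], axis=0)

def retrieve(x):
    """AM search: c_hat_v = E^T pi_v^{-1}(x); take the index with the
    highest inner product for every slot v."""
    out = []
    for v in range(V):
        inv = np.empty(D, dtype=x.dtype)
        inv[perms[v]] = x            # undo the slot permutation
        out.append(int(np.argmax(E.T @ inv)))
    return out
```

With these sizes the self-interference of the other V − 1 vectors is far below the signal term, so retrieval succeeds with high probability.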
The superposition x has integer-valued elements instead of bipolar elements; it can be bipolarized by setting negative elements to " −1 " and positive to " +1 ". If the number of superposed vectors is even, ties at zero are broken at random, or by simply adding another deterministic (random) vector to the superposition before bipolarizing (see [38]). Even though bipolarizing the superposition is common practice in HD computing, it heavily affects both the number of retrievable vectors and the noise resiliency in HD superposition.

Hyperdimensional modulation
Hyperdimensional modulation (HDM) [33] superposes complex-valued vectors using the rows of the discrete Fourier transform (DFT) matrix as entries in the IM. The mapping is realized by transforming the sparse vector c_v with a DFT, whereas the readout matrix corresponds to the inverse DFT; both can be efficiently implemented with the FFT and inverse FFT. Additional information is encoded by having multiple non-zero values in c_v and modulating the non-zero values with phase-shift keying. Decoding is performed in multiple iterations, subtracting the last iteration's estimation from the superposition for the next estimation. An additional cyclic redundancy check (CRC) validates the estimation's correctness; if the CRC fails, the decoder searches through a list of the most probable alternative solutions, correcting single, double, or triple errors. This yields an SNR gain of 1.75 dB at a BER of 10^−5. Overall, the presented decoding resulted in similar SNR gains compared to LDPC and Polar codes [33].

Integer-HDM
This section presents the first main contribution of the paper: we introduce Integer-HDM, a new modulation scheme that transmits the superposition of bipolar vectors, depicted in Fig. 2. We present a novel encoding scheme that effectively increases the IM size (i.e., the dictionary) while keeping the memory footprint small, which allows a high code throughput to be achieved even on resource-limited devices. An iterative unit-feedback decoder decomposes the transmitted vector to obtain the estimated bit-string. Our decoder is inspired by Complex-HDM [33], but instead of requiring FFT operations it relies only on efficient AM searches. We experimentally evaluate the SNR gain in an AWGN channel and show that our novel encoding achieves the same SNR gain as Complex-HDM.

Memory-efficient encoding
We start with the description of a memory-efficient encoding of a binary input string u of length k to a D-dimensional integer vector, defined as

f: {0, 1}^k → Z^D, u ↦ x.

We define the throughput r of the code in bits per channel use as

r = k/D.

The ultimate goal is to find an encoding function with a high code throughput while ensuring that the encoded vector is robust against errors occurring during transmission.
The left side of Fig. 2 illustrates the proposed encoding scheme. First, the input string u is divided into V equally sized sub-strings (u_1, u_2, ..., u_V). Each sub-string is encoded separately with its corresponding encoding module. In the following, we explain the encoding of u_1 and then the generalization to all other encoding modules.

Fig. 2 Integer-HDM encoder and decoder: binary string u ∈ {0, 1}^k is encoded to the HD superposition x ∈ Z^D, which is transmitted over an AWGN channel. The received vector y is decoded using an iterative unit-feedback decoder, yielding the estimation û.
First, the bit-to-index (b2i) block maps the bit-string u_1 of length k/V to the IM index q_1, rotation index r_1, and sign index s_1. To generate the indexes, we split the bit-string into three slices that are mapped to their corresponding integer values. The resulting indexes are then used for encoding information in the HD space.
The IM builds the central part of the encoding and serves as a random but fixed dictionary. It stores N bipolar vectors of dimension D, where the entries are drawn randomly with an equal number of "+1" and "−1". The IM index q_1 is used to read out the corresponding vector in the IM. The number of information bits k_q which can be encoded with an IM of size N is

k_q = log2(N).

The IM grows exponentially with the number of bits we want to encode. As a consequence, the code throughput of tightly resource-limited devices would be restricted. To relax the memory requirements, we extend the encoding by rotation encoding ρ^{r_1}, which applies a cyclic rotation by r_1 positions to the vector. A cyclic rotation is an alternative, hardware-friendly random permutation. The shifted result is quasi-orthogonal to its input vector. The number of available shifts is limited to the number of dimensions D, resulting in a maximum of

k_r = log2(D)

additionally encoded bits. The rotation encoding virtually increases the IM size by a factor of D without requiring any additional memory. In the next step, the vector is multiplied with the sign modulator s_1 ∈ {−1, 1}. This further gives

k_s = 1

bit.
We illustrate the encoding with an example assuming dimension D = 64 and an IM size of N = 8. The bit-string u_1 contains k_q + k_r + k_s = 3 + 6 + 1 = 10 bits, e.g., u_1 = (0100100010). The bit-to-index block splits the bit-string into three slices (010|010001|0) and maps them to the corresponding integer indexes q_1 = 2, r_1 = 17, and s_1 = −1. Finally, the encoded vector is

x_1 = s_1 · ρ^{r_1}(e_{q_1}).

The described encoding steps are identical among the different encoding blocks; the same IM is shared among all blocks. In the last step, the encoded vectors are permuted with a unique, random permutation π_v per encoding block and superposed, resulting in the final vector x. The final throughput of the code is

r = V (k_q + k_r + k_s) / D = V (log2(N) + log2(D) + 1) / D.
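The worked example can be reproduced with a short sketch of one encoding module. The item memory here is randomly generated, and mapping the sign bit 0 to −1 is an assumption chosen to be consistent with the example above:

```python
import numpy as np

rng = np.random.default_rng(2)
D, N = 64, 8                 # dimension and IM size from the example
k_q, k_r, k_s = 3, 6, 1      # log2(N), log2(D), and one sign bit

E = rng.choice([-1, 1], size=(D, N))  # shared item memory (random here)

def b2i(u):
    """bit-to-index: split the bit-string into the (q, r, s) slices."""
    q = int(u[:k_q], 2)
    r = int(u[k_q:k_q + k_r], 2)
    s = 1 if u[-1] == "1" else -1     # assumed mapping: bit 0 -> -1
    return q, r, s

def encode_block(u):
    """One encoding module: IM lookup, cyclic rotation, sign modulation."""
    q, r, s = b2i(u)
    return s * np.roll(E[:, q], r)    # x_1 = s_1 * rho^{r_1}(e_{q_1})
```

Each block thus carries log2(N) + log2(D) + 1 = 10 bits in this configuration.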

FFT-free decoding based on associative memory
We present an iterative unit-feedback decoder, depicted in Fig. 2, which decomposes the transmitted vector y to estimate the bit-string û. It consists of an estimation and a feedback stage. In the estimation stage, the indexes q̂_v, r̂_v, and ŝ_v are estimated for every block v individually. The estimated indexes are encoded to the corresponding vector x̂_v using the same encoding as described in the previous part. To perform the estimation in the next iteration, the encoded vectors x̂_v are subtracted from the input vector y, removing the interference from the other vectors in the superposition.
The estimation in block v starts with computing the inner products between the inversely permuted input vector and all elements in the associative memory (AM):

a_v(q, r) = ⟨ρ^{−r}(π_v^{−1}(ỹ_v)), e_q⟩,

where π_v^{−1} is the inverse permutation of block v and ρ^{−r} the cyclic shift by (−r) elements. The estimated item and rotation indexes are those that maximize the absolute value of the inner product:

(q̂_v, r̂_v) = argmax_{q,r} |a_v(q, r)|,

and the estimated sign is the sign of the maximizing inner product:

ŝ_v = sign(a_v(q̂_v, r̂_v)).

After encoding the estimated indexes to the vectors x̂_v, the input vector is cleaned up for the estimation in the next iteration i + 1:

ỹ_v^{(i+1)} = y − Σ_{w≠v} x̂_w^{(i)}.

In the first iteration, all feedback vectors are initialized to zero, i.e., x̂_v^{(0)} = 0. The decoding is repeated until all estimated indexes converge, or until a maximum number of iterations is reached without convergence. Finally, the estimated indexes are mapped to the bit-string û.
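A compact sketch of the unit-feedback decoder in a noise-free setting: it searches jointly over item and rotation indexes and subtracts the re-encoded estimates of the other blocks before each new pass. The configuration values are illustrative, and a practical implementation would organize the rotation search far more efficiently:

```python
import numpy as np

rng = np.random.default_rng(3)
D, N, V = 512, 64, 5
E = rng.choice([-1, 1], size=(D, N))
perms = [rng.permutation(D) for _ in range(V)]

def encode(idx, rot, sgn):
    """Superpose V sign-modulated, rotated, permuted dictionary vectors."""
    return np.sum([sgn[v] * np.roll(E[:, idx[v]], rot[v])[perms[v]]
                   for v in range(V)], axis=0)

def decode(y, iters=10):
    """Unit-feedback decoding: estimate each block, re-encode it, and
    subtract the other blocks' estimates before the next estimation."""
    fb = [np.zeros(D) for _ in range(V)]   # feedback vectors, init to zero
    est = [(0, 0, 1)] * V
    for _ in range(iters):
        prev = list(est)
        for v in range(V):
            y_v = y - (sum(fb) - fb[v])    # clean-up of other blocks
            inv = np.empty(D)
            inv[perms[v]] = y_v            # inverse permutation
            # AM search over all item/rotation pairs (absolute inner product)
            scores = np.stack([E.T @ np.roll(inv, -r) for r in range(D)])
            r, q = np.unravel_index(np.abs(scores).argmax(), scores.shape)
            s = int(np.sign(scores[r, q]))
            est[v] = (int(q), int(r), s)
            fb[v] = s * np.roll(E[:, q], r)[perms[v]]
        if est == prev:                    # all estimates converged
            break
    return est
```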
The computations in the proposed unit-feedback decoder are dominated by the AM search depicted in Eq. (14). These AM searches allow for a high degree of parallelism and only require additions and subtractions, thanks to the bipolar representation of the dictionary. Moreover, the search can be efficiently deployed to a computational memory [42], such as phase-change memory, where the inner product is computed in constant time O(1) in the analog domain by leveraging Kirchhoff's law. When applied to a language classification problem, performing the AM search in phase-change memory has been shown to be over 100× more energy efficient than an optimized digital implementation [30].

Experimental results
This section evaluates the BER vs. SNR performance for Integer-HDM and other state-of-the-art (SoA) codes. We assume an AWGN channel with the received signal in the baseband y being modeled as

y = x + n,

where x is the sent vector containing V accumulated vectors, and n is AWGN with n ∼ N(0, (V/SNR) I_D), with SNR the signal-to-noise ratio. We define the energy per information bit over the noise floor as E_b/N_0 := SNR/(2r).

Figure 3a shows the BER vs. SNR behavior of Integer-HDM when varying the number of superposed vectors V and the IM size N while fixing the dimension to D = 512. Transmitting a single vector (V = 1) shows the highest noise resiliency but results in the lowest code throughput (r = 0.031−0.041 for N = 64−2048). Integer-HDM allows us to flexibly increase the number of superposed vectors, resulting in a linear increase in code throughput; e.g., superposing nine vectors achieves the highest coding rate of r = 0.37. Transmitting more vectors at the same time reduces the self-induced SIR; hence, a higher SNR is required to achieve the same BER.
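The channel model above can be sketched as follows; `awgn` and `ebn0_db` are hypothetical helper names, with the noise variance V/SNR taken from the model:

```python
import numpy as np

rng = np.random.default_rng(4)

def awgn(x, snr_db, V):
    """AWGN channel: y = x + n with n ~ N(0, (V / SNR) I_D); V superposed
    bipolar vectors give the sent signal a per-element power of V."""
    snr = 10.0 ** (snr_db / 10.0)
    return x + rng.normal(0.0, np.sqrt(V / snr), size=np.shape(x))

def ebn0_db(snr_db, r):
    """Energy per information bit over noise floor: Eb/N0 = SNR / (2 r)."""
    return snr_db - 10.0 * np.log10(2.0 * r)
```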
The number of decoding iterations for the same code configurations is shown in Fig. 3b. Iterative decoding is not helpful when transmitting only one vector (V = 1), as no denoising of other superposed vectors is needed; thus, decoding is terminated after the first iteration. Conversely, when superposing more than one vector, the number of decoding iterations depends heavily on the number of superposed vectors, the IM size, and the SNR. However, the number of iterations converges towards two when increasing the SNR. More importantly, a low number of iterations is observed in low BER regimes (where the code eventually operates); e.g., Integer-HDM in configuration V = 7 and N = 512 requires ≈ 0 dB at a BER of 10^−4 and takes only 2.44 decoding iterations at the same SNR.
Next, we compare Integer-HDM to Complex-HDM [33] and a Polar code. Like in Complex-HDM [33], we evaluate the codes in short block lengths ( D = 512 ) at a throughput of r = 1/4 . Complex-HDM sends vectors with complex-valued elements of block length D = 256 at a throughput of r c = 1/2 bits per complex channel use, which is equivalent to our setting with r = 1/4 bits per real channel use and a block length of D = 512.
The integer codes are configured to V = 7 and N = 512, yielding a throughput of r = 0.2598. A rate-1/4 Polar code at equal block length 512 serves as a second baseline. We use it according to the downlink configuration specified by 3GPP for 5G New Radio (NR) [43]: the information bits are appended with 24 CRC bits. For decoding the soft symbols, we use CRC-aided successive cancellation list decoding with list length L = 4 [44]. As the L = 4 list decoder utilizes a part of the information in the CRC bits, we count two of these towards the parity bits. We consider the remaining 22 bits as effective information bits for the comparison, as block errors are not detected in the HDM case. As a result, the effective information bits comprise 106 information bits plus 22 CRC information bits for the Polar code.

Figure 4 shows the waterfall diagram of all considered codes. Our proposed Integer-HDM with unit-feedback decoder performs on par with Complex-HDM [33] without needing CRC-aided decoding or FFT operations. Moreover, it requires fewer decoding iterations than Complex-HDM (2.44 vs. 2.9 at 0 dB SNR). The rate-1/4 Polar code outperforms the HD-based codes: it requires 1.2 dB less SNR at a BER of 10^−6. However, this comes at the cost of a higher number of decoding operations: Polar codes have been shown to require 1.2× more decoding operations than Complex-HDM (336 vs. 280 operations per information bit) [33]. The high decoding complexity impacts the overall power consumption of the system, which includes encoding, transmission, and decoding [45]. Complex-HDM has already been shown to require fewer decoding operations than Polar codes; we reduce the complexity further by lowering the number of decoding iterations and replacing the FFT-based decoding with cheap AM searches that can be efficiently implemented in the analog domain [30].

Soft-feedback decoding
This section proposes enhancements to the decoder, introducing a new soft-feedback strategy and quantization schemes for more efficient decoding. Figure 5 depicts the soft-feedback decoding mechanism that scales the currently estimated vector according to the confidence of the previous estimation. Estimations with low confidence are attenuated in the feedback, which results in a damped behavior. We show that the new soft-feedback decoding increases the number of correct vectors retrieved in both the AWGN and noise-free case.

Soft-feedback decoding
The feedback stage reconstructs the estimated vector to remove the noise from the superposition in order to increase the SIR. However, it is not clear in advance how much the past estimations should influence the future ones. The unit-feedback strategy, used both in Complex-HDM and our standard Integer-HDM, weighs all estimations equally with factor one, which can have limitations. For example, if the number of wrong estimations outweighs the correct ones, the feedback decreases the SIR instead of increasing it. Moreover, we observed oscillatory behavior in the unit-feedback decoder, illustrated in Fig. 6. To this end, we propose a soft-feedback scaling function, which attenuates estimations with low confidence:

λ_v = min(1, ĉ_max/D),

where ĉ_max is the highest absolute inner product of the previous estimation, normalized by the dimension D and interpreted as its confidence. As the normalized inner product can exceed one, we limit the feedback scaling to be less than or equal to one. The example in Fig. 6 illustrates the soft-feedback scaling's effectiveness: the oscillations are no longer present, and we converge to the correct solution.
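The scaling rule can be sketched as a one-liner, under the assumption that the confidence is the maximum absolute inner product normalized by the dimension D (the helper name is hypothetical):

```python
def soft_feedback_scale(max_abs_ip, D):
    """Confidence-based feedback scaling: the previous estimation's highest
    absolute inner product, normalized by D and capped at one. Low-confidence
    estimates are attenuated in the feedback, damping oscillations."""
    return min(max_abs_ip / D, 1.0)
```

The feedback vector is then the re-encoded estimate multiplied by this scale, instead of the unit weight used by the standard decoder.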

Quantized Soft-feedback decoding
For most FEC codes, the decoding complexity is significantly higher than the encoding complexity. This also holds for our proposed Integer-HDM; therefore, any reduction of the computational requirements for decoding is desirable. We start by quantizing the decoder to fixed-point, where we quantize every value in the decoder to a fixed-point representation with m magnitude bits (integer) and q fractional bits, denoted as "fixed-point m.q". The quantization mainly affects the input vector y as well as the damped feedback vector x̂. The range of expected values of the input vector depends on the number of added vectors V. For example, with V = 3, we expect values in {−3, −1, 1, 3}, which can be represented by m = 3 integer bits. If we reduced the number of integer bits, high values would get clipped, which is not desirable in the decoding process. The feedback scaling takes values in [0, 1]; a quantization to q = 1 fractional bits and arbitrary m yields scaling factors in {0, 0.5, 1}.
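One possible reading of the "fixed-point m.q" quantizer is sketched below, assuming a signed representation with m integer bits and round-to-nearest (the exact rounding and range conventions are not specified in the text):

```python
import numpy as np

def fixed_point(x, m=4, q=1):
    """Quantize to 'fixed-point m.q': q fractional bits, and values outside
    the assumed signed m-bit integer range are clipped."""
    step = 2.0 ** -q                            # resolution of q fractional bits
    lo, hi = -2.0 ** (m - 1), 2.0 ** (m - 1) - step
    return np.clip(np.round(np.asarray(x, dtype=float) / step) * step, lo, hi)
```

With q = 1, the feedback scale indeed collapses to the three levels {0, 0.5, 1} mentioned above.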
In addition to the quantization of the general decoder to fixed-point arithmetic, we further reduce the complexity by quantizing the AM search. The dominating operation in the AM search is the inner product between the query vector ỹ and all vectors in the dictionary e_q ∈ {−1, 1}^D. We quantize the query vector before the AM search by mapping it to the nearest neighbor from the set of values in the original, noise-free case. Figure 7 shows the histograms of the elements in an encoded vector with dimension D = 512 and V = 7. The elements in x take values in {−7, −5, ..., 5, 7}, where values with large amplitude are less probable than small values close to 0. We then add AWGN (0 dB SNR) to the encoded vector, yielding y. In the readout quantization, we map the values to the nearest neighbor of the values in the original, noise-free case. Moreover, we limit the values to V′ due to the low probability of values with large amplitudes. In the extreme case, we set V′ = 1, which reduces the inner product to a Hamming similarity computation. If V′ > 1, the inner product can be computed with integer or binary arithmetic, mapping the values to a thermometer code.
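The readout quantization can be sketched as nearest-level mapping followed by clipping to V′; the level set {−V, −V+2, ..., V} assumes an odd number of superposed bipolar vectors, as in the example above:

```python
import numpy as np

def quantize_readout(y, V, V_prime):
    """Map each query element to the nearest noise-free level in
    {-V, -V+2, ..., V} (V odd), then clip magnitudes to V'."""
    levels = np.arange(-V, V + 1, 2)
    idx = np.argmin(np.abs(np.asarray(y)[:, None] - levels[None, :]), axis=1)
    return np.clip(levels[idx], -V_prime, V_prime)
```

Setting V′ = 1 yields a bipolar query (Hamming-like search); larger V′ keeps a few integer levels, which can be expanded into a thermometer code for binary arithmetic.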

MMSE-optimized readout
We consider an alternative AM readout matrix to E, determined by minimizing the mean-squared error between the estimated ĉ_v and the ground truth vector c_v [14]:

F_v = argmin_F 𝔼[ ‖c_v − F^T x‖₂² ],

where we assume no sign and rotation encoding for simplicity. The minimum mean square error (MMSE) estimator can be found by solving a linear regression problem, providing a training set of R samples with ground truth symbol vectors c_v and their encoded HD superposition x. The MMSE readout matrix F_v can be found with stochastic gradient descent (SGD) minimizing the MSE between ground truth symbol vectors c_v and estimated symbol vectors ĉ_v on the training set. Note that we neither have to inversely permute the superposition x nor require knowledge of the underlying dictionary; the readout matrix is only learned from empirical data. However, a separate readout matrix F_v is needed for every superposed vector, which increases the memory footprint, specifically for large V. The MMSE readout has been shown to increase the number of superposed vectors that can be successfully retrieved with high probability p_c [14], compared to the standard AM search. Consequently, this results in a higher operational capacity of the superposition, which is defined as the number of bits/dimension:

Capacity(p_c) = (V/D) p_c log2(p_c N).    (23)
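The MMSE readout can be sketched with ordinary least squares standing in for SGD (both minimize the same quadratic objective); the sizes and the restriction to a single slot are illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)
D, N, V, R = 512, 5, 20, 2000   # small dictionary, many superposed vectors

E = rng.choice([-1, 1], size=(D, N))
perms = [rng.permutation(D) for _ in range(V)]

def superpose(q):
    """Superpose permuted dictionary vectors for symbol indexes q."""
    return np.sum([E[:, q[v]][perms[v]] for v in range(V)], axis=0)

# Training set: superpositions paired with ground-truth one-hot vectors
Q = rng.integers(0, N, size=(R, V))
X = np.stack([superpose(Q[i]) for i in range(R)])

# Learn one readout matrix F_v per slot; only slot v = 0 is shown.
# Least squares on (X, C0) replaces the SGD training described above.
C0 = np.eye(N)[Q[:, 0]]
F0, *_ = np.linalg.lstsq(X, C0, rcond=None)

def mmse_retrieve(x):
    """Readout needing neither inverse permutation nor the dictionary."""
    return int(np.argmax(x @ F0))
```

Note that the readout operates directly on the raw superposition x; the permutation and dictionary structure are absorbed into the learned matrix.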

Experimental results
We compare our novel soft-feedback decoder in AWGN simulations using both the full-precision floating-point and the quantized decoder. Moreover, we evaluate the accuracy of correct retrieval from HD superpositions in the noise-free case using different decoding strategies.

Soft-feedback decoding
First, we compare the soft-feedback with the unit-feedback decoder used in Integer-HDM and Complex-HDM, shown in Fig. 4. The Integer-HDM code is in the same configuration as in the previous experiment (i.e., D = 512, N = 512, and V = 7). The soft-feedback decoder increases the SNR gain by 0.2 dB compared to the unit-feedback decoder. As a result, Integer-HDM with soft-feedback reduces the SNR gap to the rate-1/4 Polar code (0.7 dB at a BER of 10^−4 and 0.8 dB at a BER of 10^−5).

Quantized Soft-feedback decoding
We analyze the performance of the soft-feedback decoder when quantizing specific parts of the decoder, described in Sect. 4.2. We start with the quantization of the AM readout, i.e., the values in the query vectors ỹ fed to the AM readout. The results in Fig. 8 illustrate that when quantizing the vector elements to bipolar values (i.e., {−1, 1} at V ′ = 1 ), the code performance degrades significantly, compared to the full-precision AM readout. Similar degradation was observed when quantizing the encoded vector x to bipolar values before sending it over the channel. On allowing more levels ( V ′ = 7 ), however, the code performance can be re-established.
When quantizing the entire decoding to fixed-point arithmetic (see Fig. 9), one fractional and four integer bits are sufficient to achieve the same performance as the decoder in floating-point. In addition to the desired reduction in decoding complexity, this result also gives valuable insight into the soft-feedback decoder: a feedback scale taking values in c ∈ {0, 0.5, 1} is sufficient. This yields three options for feedback: take the estimation fully into account (c = 1), ignore it (c = 0), or partly use it (c = 0.5).

Recall from noise-free superpositions
Finally, we experimentally evaluate the decoding performance of the presented feedback decoder and different readout matrices (standard AM and MMSE) in the noise-free case. We measure the probability of correct retrieval p_c and derive the operational capacity as in (23). For comparison, we use the same configurations as in [14], with D = 500 and IM sizes N = 5 and N = 100.

Figure 10 shows the accuracy and the resulting capacity for the decoder without feedback, with unit-feedback, and with soft-feedback. Moreover, we conducted experiments with the MMSE estimator with and without feedback. The MMSE decoder performed similarly with unit and soft-feedback; therefore, we only show unit-feedback results.
Considering the estimators' accuracy without feedback at small IM sizes (N = 5), the MMSE readout can decode a much larger number of superposed vectors with 100% accuracy than the standard AM readout (V = 134 vs. V = 12). However, the advantage of MMSE over the AM readout vanishes when increasing the IM size (N = 100).
The feedback decoder significantly increases the number of correctly retrieved vectors at small IM sizes for both the MMSE and AM readout (V = 250 and V = 100 for AM soft-feedback and MMSE unit-feedback, respectively). Moreover, the soft-feedback further increases the accuracy compared to unit-feedback, especially at larger IM sizes (N = 100). Generally, the feedback decoder moves the corner point of 100% correct recoveries to larger V; however, the accuracy descent is much steeper than for non-iterative estimations. This later yet steeper descent shows that the denoising is only effective down to a certain SIR (i.e., up to a certain number of added vectors V). If the SIR gets too low, most of the estimations are wrong, and the feedback adds even more interference.
Considering the capacity, MMSE unit-feedback significantly improves the capacity at small dictionary sizes (N = 5) compared to the current SoA MMSE readout (1.2 vs. 0.7 bits/dimension). This capacity cannot be sustained at larger dictionary sizes. In contrast, the AM readout with unit- or soft-feedback keeps the maximum capacity constant (≈ 0.6 bits/dimension), with soft-feedback achieving slightly higher capacity than unit-feedback.
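The operational capacity used above can be evaluated directly; a minimal Python sketch under our reading of (23), where D is the vector dimension:

```python
import math

def operational_capacity(p_c, V, N, D):
    # Operational capacity in bits per dimension, following (23):
    # Capacity(p_c) = (V / D) * p_c * log2(p_c * N)
    return (V / D) * p_c * math.log2(p_c * N)
```

For instance, with perfect retrieval (p_c = 1), the capacity reduces to V · log2(N) / D, i.e., the raw information content of the superposition per dimension.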

Case study: hybrid near-channel classification and data transmission in EMG-based gesture recognition
This section extends the application of pure data transmission with a classification task in EMG-based gesture recognition [22], illustrated in Fig. 11. Our hybrid system provides two modes: (1) a classification mode, where the received bipolar vector is used to estimate the gesture using an AM search; (2) a data transmission mode, where the quantized features are reconstructed at the receiver for further analysis. In related work, alternative hybrid approaches compress EMG data using rakeness-based compressed sensing [46] or a stacked autoencoder [47] before sending the data to the receiver. The received data can be reconstructed or classified using an artificial neural network (ANN). However, these representations are sensitive to noise when used in connection with ANNs [48], while the HD representation in our approach is naturally robust against noise, as we will experimentally show in this section.

flexEMG dataset
We use the dataset from a study in [22], which contains recordings of three healthy male subjects. Each subject participated in three sessions recorded on three different days. We only use sessions one and three, which contain a separate training set and test set. The subjects performed four different gestures (fist, raise, lower, open) plus the rest class in ten runs, yielding a total of 10 · 5 = 50 trials per training and test set. The data were acquired with 64 electrodes, uniformly distributed on a flexible 16 × 4 grid of size 29.3 cm × 8.2 cm. Finally, the data were sampled at 1 kS/s and sent to a base station over BLE.

Classification
We propose a spatiotemporal encoding that differs from [22] by exclusively using bipolar MAP operations instead of multiplicative mappings. First, the data of every EMG channel are pre-processed the same way: each channel is passed through a digital notch filter with a 60 Hz stopband and a Q-factor of 50, an 8th-order Butterworth bandpass filter, an absolute-value computation, and a moving-average filter with 100 taps, and is then downsampled by 100×, yielding ten samples per second. Moreover, the samples are normalized with the 95% quantile of the training data per channel, which yields features f_t^{ch} ∈ [0, 1] with high probability (i.e., p = 0.95 on the training set).
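The rectification, smoothing, downsampling, and normalization steps of this chain can be sketched as follows (a Python sketch, not the original implementation; the notch and Butterworth filters are omitted for brevity, and clipping values above the 95% quantile is our assumption):

```python
import numpy as np

def preprocess_channel(x, q95, taps=100, down=100):
    """Rectify, smooth, downsample, and normalize one EMG channel sampled
    at 1 kS/s, yielding 10 feature samples per second in [0, 1].
    (Notch and bandpass filtering would precede this step.)"""
    rect = np.abs(x)                          # full-wave rectification
    kernel = np.ones(taps) / taps             # 100-tap moving average
    smooth = np.convolve(rect, kernel, mode="same")
    feats = smooth[::down]                    # downsample by 100x
    return np.clip(feats / q95, 0.0, 1.0)     # normalize by 95% quantile
```

Here q95 is the per-channel 95% quantile computed on the training data; at test time, the rare samples above it are clipped to 1 in this sketch.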
For mapping features to HD vectors, we quantize them to L = 128 levels and map each level to a corresponding value vector stored in a continuous IM (CiM) [23]. The CiM is shared among all channels and is constructed as follows. First, a bipolar seed vector is drawn randomly, which corresponds to level l = 1. For level l = 2, we invert D/(2L) values at random positions. For the remaining levels, we continue inverting an increasing number of bits until we have inverted D/2 elements for level l = L, which yields orthogonal vectors for levels l = 1 and l = L. This mapping is fully bipolar and more hardware-friendly than the multiplicative mapping used in [22], which relies on floating-point multiplications.
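The CiM construction can be sketched as follows; the linear flip schedule (reaching D/2 inversions at level L) is one plausible reading of the description above:

```python
import numpy as np

def build_cim(D, L, rng):
    """Continuous item memory: L bipolar vectors whose similarity to the
    level-1 seed decreases linearly, with levels 1 and L orthogonal."""
    flip_order = rng.permutation(D)        # fixed random flip positions
    seed = rng.choice([-1, 1], size=D)     # seed vector for level l = 1
    cim = np.empty((L, D), dtype=int)
    for l in range(L):
        n_flip = round(l * (D / 2) / (L - 1))  # 0 flips ... D/2 flips
        v = seed.copy()
        v[flip_order[:n_flip]] *= -1       # invert a growing prefix of positions
        cim[l] = v
    return cim
```

Because the flipped positions are nested, nearby levels share most of their elements, which gives the distance-preserving property exploited later for reconstruction.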
The embedded value vectors are circularly permuted, depending on the channel index, and superposed, resulting in the compressed representation x_t. The encoding is completed by bipolarizing x_t and building a 5-gram out of five consecutive vectors with random permutations and binding (∗). Overall, the encoding achieves a throughput of

(24) r = (64 channels · 7 bits · 5-gram) / D,

which can, depending on the dimension of the HD vector, result in compression (e.g., r = 4.375 at D = 512).
The encoded vector is modulated (e.g., with BPSK) and sent to the receiver over a wireless channel. At the receiver, the demodulated signal y is classified with an AM search. The AM stores a prototype vector per class. Each prototype is learned by accumulating all encoded vectors of the training samples of its class and finally bipolarizing the result. For classification, the query vector y is compared to all prototype vectors using the AM readout; the class with the best-matching prototype is the estimated label [23].

Fig. 11 EMG use-case with hybrid modes: a classification mode (c), and a data transmission mode (d). Both modes use the same spatial encoder consisting of pre-processing, quantization to 128 levels, mapping to HD vectors with continuous item memory (CiM), and superposition. In classification mode (c), n subsequent vectors are converted to bipolar vectors with σ(·), encoded to an n-gram, transmitted over a wireless channel, and classified with the AM search. In data transmission mode (d), the spatial vector is directly transmitted over the channel and decomposed using the iterative HDM decoder.
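Prototype learning and the AM search can be sketched as follows, with synthetic stand-ins for the encoded EMG vectors (function names are ours):

```python
import numpy as np

def train_am(encoded, labels, n_classes):
    """Accumulate the encoded training vectors per class, then bipolarize
    each accumulator into a prototype vector."""
    am = np.zeros((n_classes, encoded.shape[1]))
    for x, c in zip(encoded, labels):
        am[c] += x
    return np.sign(am + 1e-12)  # break zero ties toward +1

def classify(am, y):
    """AM search: the class of the prototype best matching the query y."""
    return int(np.argmax(am @ y))
```

Since both prototypes and queries are bipolar, the dot-product readout is equivalent (up to an affine map) to a Hamming-distance search.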

Data transmission
The availability of the underlying data that led to a certain decision or classification can be helpful in many applications, e.g., by allowing interpretability of the model or analysis of the data by a medical specialist. To address this demand, we propose an additional data transmission mode, where the spatially encoded vector x_t is sent to the receiver and decoded with an iterative HDM decoder. This comes with minimal additional requirements at the sensing node, compared to standard approaches where features are encoded with separate source and channel coding.
In contrast to the quasi-orthogonal IM used for encoding in Sect. 3, the CiM is non-orthogonal, i.e., not every quantization level q_i has an orthogonal vector. This makes exact decoding of the features difficult; however, the distance-preserving CiM mapping reduces the effective error in the reconstruction. For example, an estimation of e_{q+1} instead of e_q translates to an error of only 1/L.
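Without the iterative feedback, decoding one channel's feature amounts to de-permuting the query and correlating it against all CiM levels; because neighboring levels are similar, a wrong estimate typically lands on an adjacent level, keeping the feature error near 1/L. A hypothetical Python sketch (helper names and the truncation-based quantizer are our assumptions):

```python
import numpy as np

def encode_spatial(features, cim, L):
    """Superpose per-channel CiM vectors, each circularly shifted
    by its channel index."""
    x = np.zeros(cim.shape[1])
    for ch, f in enumerate(features):
        level = min(int(f * L), L - 1)      # quantize feature in [0, 1] to L levels
        x += np.roll(cim[level], ch)        # channel-dependent circular permutation
    return x

def decode_channel(y, cim, ch):
    """Estimate one channel's feature by AM readout over the CiM (no feedback)."""
    L = cim.shape[0]
    q = np.roll(y, -ch)                     # undo the channel permutation
    return int(np.argmax(cim @ q)) / L      # best-matching level, back to [0, 1]
```

Even when interference from the other superposed channels perturbs the argmax, the distance-preserving level vectors keep the decoded value close to the true feature.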

Classification
We assess the classification performance in the noise-free, single-node AWGN, and multi-node interference cases. The classification accuracy is defined as the ratio between the number of correct estimations and the total number of estimations, given that the classifier makes a new estimation every 100 ms. All models were implemented and tested in MATLAB 2019b. Table 1 shows the classification accuracy in the noise-free case. A support vector machine (SVM) with linear kernel and cost parameter C = 500 on pre-processed, flattened features in float-32 precision with dimension 320 (64 channels × 5-gram) [49], as well as an HD classifier with multiplicative mapping [22], serve as baselines. Both HD classifiers operate at a dimension of D = 10,000. The SVM marginally outperforms the HD classifiers by 0.14% and 2%; however, in contrast to the HD classifiers, the SVM does not support online updates of the model, which is crucial for practical deployment of EMG applications [49]. The bipolar feature embedding using the CiM instead of the float-based multiplicative mapping in the HD classification yields only a small accuracy degradation (95.99% vs. 94.13%).
Next, we evaluate the classification accuracy when the query vector is exposed to noise:

(25) y = x + n,

where x ∈ {−1, 1}^D is the encoded vector and n ∼ N(0, SNR^{−1} I_D) is AWGN. Figure 12 shows the average classification accuracy for different vector dimensions, depending on the SNR. In the high-SNR regime (SNR = 10 dB), a reduction in dimension results in a slight accuracy degradation (e.g., 93.91% at D = 8192 vs. 86.32% at D = 512). When decreasing the SNR, we observe a graceful accuracy degradation with superior performance at higher dimensions: at D = 4096, the absolute accuracy loss compared to the noise-free case is less than 4% down to −10 dB SNR (91.16% vs. 94.13%).
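The noise model in (25) can be reproduced as follows (a Python sketch; the conversion from an SNR given in dB to linear scale is ours):

```python
import numpy as np

def add_awgn(x, snr_db, rng):
    """Corrupt a bipolar vector with AWGN of variance 1/SNR per element;
    since x has unit per-element power, this realizes the given SNR."""
    snr = 10.0 ** (snr_db / 10.0)          # dB -> linear SNR
    n = rng.normal(0.0, np.sqrt(1.0 / snr), size=x.shape)
    return x + n
```

At 0 dB, the noise power equals the unit signal power of the bipolar vector, so roughly 16% of the elements flip sign, yet the AM search remains reliable at high dimensions.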
As an additional experiment, we bipolarize the query vector y before the AM search, shown in dashed lines. This allows a more efficient AM search requiring only Hamming distance computations; however, it results in an additional accuracy loss.

Table 1 Classification accuracy (%) on the 5-class EMG-based gesture recognition task using 64-channel flexEMG data [22]. We compare a linear SVM, an HD classifier with multiplicative embedding, and our HD classifier with bipolar CiM embedding. Both HD classifiers operate at dimension D = 10,000.

Furthermore, we demonstrate the robustness of our distributed representations in the presence of interference from unrelated nodes as well as AWGN, shown in Fig. 13. The nodes operate at D = 2048, where the effective throughput is r = 1.094; hence, the encoding does not add any redundancy. The HD representation exhibits robustness against the interference: with up to 6 interfering nodes at high SNR (10 dB), the classification accuracy drops by only 4.07% (93.50% vs. 89.43%). Moreover, a graceful accuracy degradation is observed at a low SNR of −5 dB with 6 interfering nodes, where an accuracy of 87.75% is maintained.

Reconstruction of features
Finally, we reconstruct the encoded features with the soft-feedback decoder in the presence of AWGN. We measure the mean-squared error (MSE) between reconstructed and original features during active gesture intervals of all subjects in sessions 1 and 3. The time between trials is not considered for reconstruction. Also, the encoded vector is exposed to AWGN. Figure 14 shows the MSE depending on the SNR using either the soft-feedback decoder or the AM search without feedback. Akin to the previous classification results, higher-dimensional representations show higher noise resiliency, yielding a lower MSE. Moreover, the soft-feedback further improves the retrieval of the features with up to 10 dB MSE reduction compared to the AM readout without feedback. As a result, the soft-feedback decoder allows the vector dimension to be reduced while still ensuring a lower MSE: at 10 dB SNR, soft-feedback at dimensions D = 2048–8192 achieves a lower MSE than the AM readout at all considered dimensions D ≤ 8192. At dimension D = 2048, the soft-feedback decoder achieves a maximal reconstruction gain of 20 dB MSE at 10 dB SNR compared to the AM readout without feedback.
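The reported reconstruction gains are on a dB scale; a minimal helper (ours) makes the conversion explicit:

```python
import numpy as np

def mse_db(original, reconstructed):
    """Mean-squared error between feature sequences, in dB."""
    mse = np.mean((np.asarray(original) - np.asarray(reconstructed)) ** 2)
    return 10.0 * np.log10(mse)
```

On this scale, a 20 dB improvement corresponds to a 100× lower MSE.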
For illustration, Fig. 15 depicts the original features of subject 1 in the training set of the first session, the features reconstructed with the soft-feedback decoder, and the features reconstructed with the AM readout without feedback. The reconstruction from the AM readout without feedback shows many faulty estimations that do not follow the ground truth, particularly visible as peaks during the rest state. In contrast, the soft-feedback decoder's estimation follows the ground truth more accurately.

Conclusion
This paper investigates the use of robust and distributed HD representations in wireless communication and classification. We propose a novel encoding, called Integer-HDM, that generates integer-valued vectors based on bipolar seed vectors, cyclic shift encoding, sign modulation, and superposition. A new soft-feedback decoder successfully decomposes the vectors, improving the decoding performance in both noise-free and AWGN scenarios. Achieving a similar SNR gain as complex HDM [33], the proposed Integer-HDM does not require FFT operations and can be quantized to low-resolution fixed-point arithmetic. In a classification use-case, EMG-based hand gesture recognition demonstrates the robustness of HD representations against AWGN and other interfering sensing nodes; thus, the same spatial encoding can be used for classification as well as for reconstruction of the underlying features. Further investigations can be made into the decoding of bipolarized superpositions and n-gram encoded vectors, e.g., using resonator networks [50, 51].