Periodic Reporting for period 1 - ZERO-TRAIN-BCI (Combining constrained based learning and transfer learning to facilitate Zero-training Brain-Computer Interfacing)
Reporting period: 2015-04-01 to 2017-03-31
Brain-Computer Interfaces (BCIs) are designed to let users control a computer directly through their brain signals. Important use-cases for BCI are the restoration of communication for paralysed patients, the restoration of motor control, and gaming.
The first BCI prototypes required the user to generate brain signals that the computer could recognise easily. The computer was pre-programmed, and the user had to learn how to control their brain signals to gain control over the device. The introduction of machine learning shifted part of the learning process to the computer: by recording brain signals together with the user's intention, the computer can learn how to produce the desired output. Recording this labelled data requires a calibration session, which typically takes around 15-30 minutes. During the recording of this calibration data, the user cannot be productive with the BCI. Since patients often have a limited attention span, this time must be kept as short as possible. Furthermore, the underlying statistics of the brain signals change over time, which creates a need for frequent re-calibration.
The BCI community has invested much effort in reducing the need for calibration data. In this project, we build upon these efforts and develop a new generation of BCI decoders that do not require explicitly labelled data for each subject. Instead, we use unsupervised learning, weakly supervised learning and transfer learning to build a true zero-training brain-computer interface, one that a new user can operate without prior calibration.
Based on this analysis, we have introduced Learning from Label Proportions (LLP), proposed to the machine learning community by N. Quadrianto in 2009, to the BCI community. The LLP framework is especially suitable for BCI because it is guaranteed to converge to the optimal solution given enough data. For BCI, this means the decoder will become the optimal one, and the method yields an inherently adaptive decoder. The drawback of LLP is that it is only applicable to specific paradigm designs. The co-design of paradigm and decoder is one of the key novelties of this project.
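The core LLP idea can be illustrated with a minimal sketch. Assume, purely for illustration (the function name, dimensions and proportions below are our own toy assumptions, not the project's implementation), that stimuli are presented in two sequences whose target/non-target proportions are known by design. The class-conditional mean responses can then be recovered from the unlabelled group averages by solving a small linear system:

```python
import numpy as np

def recover_class_means(group_means, proportions):
    """Recover [mu_target, mu_nontarget] from per-sequence averages.

    group_means: (2, d) array, average response of each stimulus sequence.
    proportions: target proportion of each sequence (known by design).
    Solves A @ [mu_target, mu_nontarget] = group_means.
    """
    A = np.array([[p, 1.0 - p] for p in proportions])  # mixing matrix
    return np.linalg.solve(A, group_means)

# Toy simulation: two known class means, mixed in known ratios.
mu_target = np.array([2.0, 0.0])
mu_nontarget = np.array([0.0, 1.0])
props = [0.6, 0.2]
groups = np.stack([p * mu_target + (1 - p) * mu_nontarget for p in props])

est = recover_class_means(groups, props)  # rows: target, non-target means
```

No label of any individual trial is used; only the aggregate proportions enter, which is why more unlabelled data keeps improving the estimate.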
Specifically, we have performed a simulation of an LLP decoder on data containing imagined movement. We found that LLP can decode imagined movement, but only under the following pre-conditions: (1) the amount of unlabelled data is large; (2) the BCI paradigm supports LLP decoding; (3) a good feature representation is already available, so the raw EEG does not have to be used directly.
We also demonstrated the substantial benefit of LLP decoding in a series of experiments on Event-Related Potential (ERP) BCIs. There we showed that LLP results in a reliable decoder. However, while the decoder is remarkably stable, it is often not as accurate as an Expectation-Maximisation (EM) based unsupervised decoder (Kindermans 2012, PLoS One).
For this reason, we investigated the transfer of knowledge between the EM and LLP classifiers. By combining them in an analytically optimal manner, we built a new BCI decoder that is more reliable than previous unsupervised decoders and learns very efficiently. Our experiments indicate that this mixture of BCI decoders can readily replace traditional supervised decoders without introducing additional drawbacks.
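One standard way to combine two unbiased estimators in an analytically optimal manner is inverse-variance weighting, which yields the minimum-variance convex combination. The sketch below is an assumption-laden illustration of that principle (the function name and the numbers are ours), not the project's exact formulation:

```python
import numpy as np

def combine_estimates(mu_em, var_em, mu_llp, var_llp):
    """Minimum-variance convex combination of two independent,
    unbiased estimates of the same quantity (inverse-variance
    weighting). The weight shifts towards whichever estimator is
    currently less noisy."""
    gamma = var_llp / (var_em + var_llp)  # weight on the EM estimate
    return gamma * mu_em + (1.0 - gamma) * mu_llp

# Toy numbers: a noisy EM estimate (variance 4) and a stable LLP
# estimate (variance 1); the combination leans towards LLP.
mu = combine_estimates(np.array([1.0]), 4.0, np.array([0.0]), 1.0)
```

The intuition matches the experimental finding above: early on, the guaranteed-but-slower LLP estimate dominates, and as the EM estimate stabilises it receives more weight, so the mixture inherits both reliability and efficiency.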
The disadvantage of LLP-based and EM-based decoding is that the application must support these types of decoders by producing suitably structured data. Deep learning, on the other hand, is generally applicable and has transformed the machine learning community in recent years. Therefore, we also investigated whether deep learning is suitable for BCI. Our initial results are promising and will be validated using two analysis methods that were developed during this project: PatternNet and PatternLRP. In contrast to popular analysis methods for neural networks, PatternNet and PatternLRP produce the correct explanation for a linear model. On top of that, they also produce improved explanations on standard deep learning benchmarks.
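The linear-model case can be made concrete with a small sketch (the toy data below is our own assumption). For a linear model y = w·x, the weight vector w must cancel any distractor in the data and therefore generally does not point in the signal direction; the signal pattern, estimated as a = Cov[x, y] / Var[y], recovers it:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
s = rng.standard_normal(n)           # latent signal of interest
d = rng.standard_normal(n)           # independent distractor
a_s = np.array([1.0, 0.0])           # true signal pattern
a_d = np.array([1.0, 1.0])           # distractor pattern
X = np.outer(s, a_s) + np.outer(d, a_d)  # observed data, shape (n, 2)

# A linear model that extracts s must cancel the distractor,
# so its weight vector is NOT the signal direction:
w = np.array([1.0, -1.0])            # X @ w == s exactly
y = X @ w

# Pattern estimate for the linear model: a = Cov[x, y] / Var[y]
yc = y - y.mean()
a = (X - X.mean(axis=0)).T @ yc / (yc @ yc)
```

Here a comes out close to the true signal direction [1, 0] even though the model's weights are [1, -1], which is exactly the failure mode of weight-based explanations that the pattern-based view avoids.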
- We proposed to design BCI systems as a whole. Instead of optimising the paradigm/application and the decoder independently, we propose to optimise them jointly. By adapting the BCI paradigm and the decoder to each other, a new generation of BCI systems was developed. To showcase this idea, we developed the LLP-based BCI, which does not require label information but is guaranteed to find the optimal decoder when enough data is available. While previous adaptive BCI decoders worked well empirically, none of them had this guarantee of converging to a good solution. This is important for patient applications. When the patient is actively using the BCI, the ground truth (the user's intention) is not known. Under the traditional BCI methodology, such data was of little value for further analysis; with the LLP-BCI, it becomes as valuable as labelled data.
- We combined LLP and EM based decoders to build a new type of true zero-training BCI. Our experiments show that within fewer than 10 trials, the decoder is as accurate as a traditional supervised system, but without needing a calibration session. This is a major step towards plug-and-play BCI systems, from which patients can benefit, and it also allows BCI to be used for non-medical applications such as gaming.
- Finally, we have also evaluated deep-learning methods for BCI. While the initial results are promising, our key contribution is our new analysis methods. These methods, PatternNet and PatternLRP, are able to produce the correct explanation for a linear model; previous approaches were not. This is remarkable, since linear models are the simplest neural networks. We therefore argue that any explanation method intended to illuminate the decision process of a deep neural network should at least work reliably on a linear model. Since PatternNet and PatternLRP achieve this for the first time, and since our experiments indicate that they also produce better explanations of highly non-linear deep architectures, this can be considered a major leap towards reliable knowledge extraction from machine learning models. This is an essential step for machine learning to not only produce high-quality results, but also to allow for interpretation of those results, an aspect that we believe will push science as a whole forward.