Work Package 5 Research

SPHERE researchers are developing theoretical and empirical evaluations of static and structured classification models on structured datasets, which should inform SPHERE about which models to use. Researchers can encode the structural nature of these problems into features, which:


Hyperstream is a large-scale, flexible and robust software package for processing streaming data.

Hyperstream overcomes the limitations of other computational engines and provides high-level interfaces to execute complex nesting, fusion, and prediction in both online and offline forms in streaming environments. Although developed specifically for SPHERE, Hyperstream is a general-purpose tool that is well suited to the design, development, and deployment of algorithms and predictive models across a wide range of sequential prediction problems.

This software has been designed from the outset to be domain-independent, in order to provide maximum value to the wider community. Key aspects of the software include the capability to create complex interlinked workflows, and a computational engine that is designed to be "compute-on-request", meaning that no unnecessary resources are used.
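
As a rough illustration of the "compute-on-request" idea, the sketch below computes stream values only for the time window that is asked for and reuses anything already computed. It is not Hyperstream's API: the class name `OnRequestStream`, its `window` method and the integer timestamps are all hypothetical and chosen for this example only.

```python
class OnRequestStream:
    """Toy stream whose values are computed only when a time window is requested."""

    def __init__(self, compute):
        self.compute = compute  # hypothetical function: timestamp -> value
        self.cache = {}         # timestamp -> value that has already been computed
        self.calls = 0          # number of times `compute` actually ran

    def window(self, start, end):
        """Return values for integer timestamps in [start, end), computing only missing ones."""
        out = []
        for t in range(start, end):
            if t not in self.cache:
                self.cache[t] = self.compute(t)
                self.calls += 1
            out.append(self.cache[t])
        return out


squares = OnRequestStream(lambda t: t * t)
first = squares.window(0, 5)   # computes timestamps 0..4
second = squares.window(3, 8)  # reuses 3 and 4, computes only 5..7
```

Here `squares.calls` counts eight computations rather than ten, because the overlap between the two windows is served from the cache.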



Beta calibration: a well-founded and easily implemented improvement on logistic calibration for binary classifiers
M Kull, T Silva Filho, P Flach
Artificial Intelligence and Statistics, 623-631
For optimal decision making under variable class distributions and misclassification costs a classifier needs to produce well-calibrated estimates of the posterior probability. Isotonic calibration is a powerful non-parametric method that is however prone to overfitting on smaller datasets; hence a parametric method based on the logistic curve is commonly used. While logistic calibration is designed for normally distributed per-class scores, we demonstrate experimentally that many classifiers including Naive Bayes and Adaboost suffer from a particular distortion where these score distributions are heavily skewed. In such cases logistic calibration can easily yield probability estimates that are worse than the original scores. Moreover, the logistic curve family does not include the identity function, and hence logistic calibration can easily uncalibrate a perfectly calibrated classifier. In this paper we solve all these problems with a richer class of calibration maps based on the beta distribution. We derive the method from first principles and show that fitting it is as easy as fitting a logistic curve. Extensive experiments show that beta calibration is superior to logistic calibration for Naive Bayes and Adaboost.
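
The following is a minimal sketch of a beta calibration map, assuming the three-parameter form 1 / (1 + 1 / (e^c s^a / (1 - s)^b)). In log-odds space this form is linear in ln(s) and -ln(1 - s), so it can be fitted like a logistic regression on those two features. The plain gradient-descent fit, the function names and the default learning rate are choices made for this illustration and are not taken from the authors' implementation; no sign constraint on a and b is enforced here.

```python
import math

EPS = 1e-12  # keeps log() finite for scores at exactly 0 or 1


def _features(s):
    """Map a score in [0, 1] to the two features ln(s) and -ln(1 - s)."""
    s = min(max(s, EPS), 1.0 - EPS)
    return math.log(s), -math.log(1.0 - s)


def beta_calibration_map(s, a, b, c):
    """Calibrated probability 1 / (1 + 1 / (e^c * s^a / (1 - s)^b))."""
    x1, x2 = _features(s)
    return 1.0 / (1.0 + math.exp(-(a * x1 + b * x2 + c)))


def log_loss(scores, labels, a, b, c):
    """Mean negative log-likelihood of the 0/1 labels under the calibrated scores."""
    total = 0.0
    for s, y in zip(scores, labels):
        p = min(max(beta_calibration_map(s, a, b, c), EPS), 1.0 - EPS)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(scores)


def fit_beta_calibration(scores, labels, lr=0.1, steps=2000):
    """Fit (a, b, c) by gradient descent on the log-loss, starting from the identity map."""
    a, b, c = 1.0, 1.0, 0.0
    feats = [_features(s) for s in scores]
    n = len(scores)
    for _ in range(steps):
        ga = gb = gc = 0.0
        for (x1, x2), y in zip(feats, labels):
            p = 1.0 / (1.0 + math.exp(-(a * x1 + b * x2 + c)))
            ga += (p - y) * x1
            gb += (p - y) * x2
            gc += p - y
        a, b, c = a - lr * ga / n, b - lr * gb / n, c - lr * gc / n
    return a, b, c


scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]  # made-up classifier scores
labels = [0, 0, 1, 0, 0, 1, 1, 0, 1]                      # made-up true classes
a, b, c = fit_beta_calibration(scores, labels)
```

With a = b = 1 and c = 0 the map returns its input unchanged, which is the identity function that the abstract notes is missing from the logistic family.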

Unsupervised learning of sensor topologies for improving activity recognition in smart environments
N Twomey, T Diethe, I Craddock, P Flach
Neurocomputing 234, 93-106

There has been significant recent interest in sensing systems and ‘smart environments’, with a number of longitudinal studies in this area. Typically the goal of these studies is to develop methods to predict, at any one moment of time, the activity or activities that the resident(s) of the home are engaged in, which may in turn be used for determining normal or abnormal patterns of behaviour (e.g. in a health-care setting). Classification algorithms, such as Conditional Random Fields (CRFs), typically consider sensor activations as features but these are often treated as if they were independent, which in general they are not. Our hypothesis is that learning patterns based on combinations of sensors will be more powerful than single sensors alone. The exhaustive approach – to take all possible combinations of sensors and learn classifier weights for each combination – is clearly computationally prohibitive. We show that through the application of signal processing and information-theoretic techniques we can learn about the sensor topology in the home (i.e. learn an adjacency matrix) which enables us to determine the combinations of sensors that will be useful for classification ahead of time. As a result we can achieve classification performance better than that of the exhaustive approach, whilst only incurring a small cost in terms of computational resources. We demonstrate our results on several datasets, showing that our method is robust in terms of variations in the layout and the number of residents in the house. Furthermore, we have incorporated the adjacency matrix into the CRF learning framework and have shown that it can improve performance over multiple baselines.

Probabilistic Sensor Fusion for Ambient Assisted Living
T Diethe, N Twomey, M Kull, P Flach, I Craddock
arXiv preprint arXiv:1702.01209

There is a widely-accepted need to revise current forms of health-care provision, with particular interest in sensing systems in the home. Given a multiple-modality sensor platform with heterogeneous network connectivity, as is under development in the Sensor Platform for HEalthcare in Residential Environment (SPHERE) Interdisciplinary Research Collaboration (IRC), we face specific challenges relating to the fusion of the heterogeneous sensor modalities. We introduce Bayesian models for sensor fusion, which aim to address the challenges of fusion of heterogeneous sensor modalities. Using this approach we are able to identify the modalities that have most utility for each particular activity, and simultaneously identify which features within that activity are most relevant for a given activity. We further show how the two separate tasks of location prediction and activity recognition can be fused into a single model, which allows for simultaneous learning and prediction for both tasks. We analyse the performance of this model on data collected in the SPHERE house, and show its utility. We also compare against some benchmark models which do not have the full structure, and show how the proposed model compares favourably to these methods.

BDL.NET: Bayesian dictionary learning in Infer.NET
T Diethe, N Twomey, P Flach
Machine Learning for Signal Processing

We introduce and analyse a flexible and efficient implementation of Bayesian dictionary learning for sparse coding. By placing Gaussian-inverse-Gamma hierarchical priors on the coefficients, the model can automatically determine the required sparsity level for good reconstructions, whilst also automatically learning the noise level in the data, obviating the need for heuristic methods for choosing sparsity levels. This model can be solved efficiently using Variational Message Passing (VMP), which we have implemented in the Infer.NET framework for probabilistic programming and inference. We analyse the properties of the model via empirical validation on several accelerometer datasets. We provide source code to replicate all of the experiments in this paper.


Subgroup discovery with proper scoring rules
H Song, M Kull, P Flach, G Kalogridis
Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 492-510
Subgroup Discovery is the process of finding and describing sufficiently large subsets of a given population that have unusual distributional characteristics with regard to some target attribute. Such subgroups can be used as a statistical summary which improves on the default summary of stating the overall distribution in the population. A natural way to evaluate such summaries is to quantify the difference between predicted and empirical distribution of the target. In this paper we propose to use proper scoring rules, a well-known family of evaluation measures for assessing the goodness of probability estimators, to obtain theoretically well-founded evaluation measures for subgroup discovery. From this perspective, one subgroup is better than another if it has lower divergence of target probability estimates from the actual labels on average. We demonstrate empirically on both synthetic and real-world data that this leads to higher quality statistical summaries than the existing methods based on measures such as Weighted Relative Accuracy.
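
To make the idea of scoring a statistical summary concrete, the sketch below uses the Brier score, one well-known proper scoring rule, to compare the default summary (a single overall rate) with a summary that reports one rate inside a subgroup and another outside it. This is an illustration only and not the evaluation measure derived in the paper; the function names, the toy labels and the choice to use the complement's own rate outside the subgroup are assumptions. Both the subgroup and its complement are assumed to be non-empty.

```python
def brier(labels, probs):
    """Mean squared difference between predicted probability and 0/1 label."""
    return sum((p - y) ** 2 for y, p in zip(labels, probs)) / len(labels)


def rate(labels):
    """Fraction of positive labels, i.e. the empirical target probability."""
    return sum(labels) / len(labels)


def summary_scores(labels, in_subgroup):
    """Return the Brier score of the default summary and of the subgroup-based summary."""
    overall = rate(labels)
    p_in = rate([y for y, m in zip(labels, in_subgroup) if m])
    p_out = rate([y for y, m in zip(labels, in_subgroup) if not m])
    default = brier(labels, [overall] * len(labels))
    with_subgroup = brier(labels, [p_in if m else p_out for m in in_subgroup])
    return default, with_subgroup


labels = [1, 1, 1, 0, 0, 0, 0, 1]                                   # toy binary target
in_subgroup = [True, True, True, True, False, False, False, False]  # toy subgroup membership
default, with_subgroup = summary_scores(labels, in_subgroup)
```

On this toy data the default summary scores 0.25 and the subgroup-based summary scores 0.1875; a lower score means the summary's probabilities sit closer to the actual labels on average.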

On the need for structure modelling in sequence prediction
N Twomey, T Diethe, P Flach
Machine Learning 104 (2-3), 291-314
There is no uniform approach in the literature for modelling sequential correlations in sequence classification problems. It is easy to find examples of unstructured models (e.g. logistic regression) where correlations are not taken into account at all, but there are also many examples where the correlations are explicitly incorporated into a—potentially computationally expensive—structured classification model (e.g. conditional random fields). In this paper we lay theoretical and empirical foundations for clarifying the types of problem which necessitate direct modelling of correlations in sequences, and the types of problem where unstructured models that capture sequential aspects solely through features are sufficient. The theoretical work in this paper shows that the rate of decay of auto-correlations within a sequence is related to the excess classification risk that is incurred by ignoring the structural aspect of the data. This is an intuitively appealing result, demonstrating the intimate link between the auto-correlations and excess classification risk. Drawing directly on this theory, we develop well-founded visual analytics tools that can be applied a priori on data sequences and we demonstrate how these tools can guide practitioners in specifying feature representations based on auto-correlation profiles. Empirical analysis is performed on three sequential datasets. With baseline feature templates, structured and unstructured models achieve similar performance, indicating no initial preference for either model. We then apply the visual analytics tools to the datasets, and show that classification performance in all cases is improved over baseline results when our tools are involved in defining feature representations.
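
As a small illustration of an auto-correlation profile, the sketch below computes the sample auto-correlation of a label sequence at increasing lags. It is a generic estimator rather than the visual analytics tools described in the paper; the function names and the two toy sequences are made up for this example.

```python
def autocorrelation(seq, lag):
    """Sample auto-correlation of a numeric sequence at a non-negative lag."""
    n = len(seq)
    mean = sum(seq) / n
    var = sum((x - mean) ** 2 for x in seq)
    if var == 0:
        return 0.0  # constant sequence: correlation is undefined, report 0 by convention
    cov = sum((seq[t] - mean) * (seq[t + lag] - mean) for t in range(n - lag))
    return cov / var


def autocorrelation_profile(seq, max_lag):
    """Auto-correlation at lags 0, 1, ..., max_lag."""
    return [autocorrelation(seq, k) for k in range(max_lag + 1)]


slow = [0] * 10 + [1] * 10  # labels that change rarely
fast = [0, 1] * 10          # labels that alternate at every step

slow_profile = autocorrelation_profile(slow, 5)
fast_profile = autocorrelation_profile(fast, 5)
```

The slowly changing sequence keeps a high positive correlation at lag 1, while the alternating sequence is strongly negatively correlated at lag 1. Profiles of this kind are one way to inspect how quickly sequential dependence decays before choosing feature representations.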

ADLTM: A Topic Model for Discovery of Activities of Daily Living in a Smart Home
Y Chen, T Diethe, P Flach
International Joint Conference on Artificial Intelligence. AAAI Press, pp. 1404-1410

We present an unsupervised approach for discovery of Activities of Daily Living (ADL) in a smart home. Activity discovery is an important enabling technology, for example to tackle the healthcare requirements of elderly people in their homes. The technique applied most often is supervised learning, which relies on expensive labelled data and lacks the flexibility to discover unseen activities. Building on ideas from text mining, we present a powerful topic model and a segmentation algorithm that can learn from unlabelled sensor data. The model has been evaluated extensively on datasets collected from real smart homes. The results demonstrate that this approach can successfully discover the activities of residents, and can be effectively used in a range of applications such as detection of abnormal activities and monitoring of sleep quality, among many others.

The SPHERE challenge: Activity recognition with multimodal sensor data
Niall Twomey, Tom Diethe, Meelis Kull, Hao Song, Massimo Camplani, Sion Hannuna, Xenofon Fafoutis, Ni Zhu, Pete Woznowski, Peter Flach, Ian Craddock
arXiv preprint arXiv:1603.00797
This paper outlines the Sensor Platform for HEalthcare in Residential Environment (SPHERE) project and details the SPHERE challenge that will take place in conjunction with European Conference on Machine Learning and Principles and Practice of Knowledge Discovery (ECML-PKDD) between March and July 2016. The SPHERE challenge is an activity recognition competition where predictions are made from video, accelerometer and environmental sensors. Monetary prizes will be awarded to the top three entrants, with Euro 1,000 being awarded to the winner, Euro 600 being awarded to the first runner up, and Euro 400 being awarded to the second runner up.

Active transfer learning for activity recognition
T Diethe, N Twomey, P Flach
European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning
We examine activity recognition from accelerometers, which provides at least two major challenges for machine learning. Firstly, the deployment context is likely to differ from the learning context. Secondly, accurate labelling of training data is time-consuming and error-prone. This calls for a combination of active and transfer learning. We derive a hierarchical Bayesian model that is a natural fit to such problems, and provide empirical validation on synthetic and publicly available datasets. The results show that by combining active and transfer learning, we can achieve faster learning with fewer labels on a target domain than by either alone.

Bayesian modelling of the temporal aspects of smart home activity with circular statistics
T Diethe, N Twomey, P Flach
Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 279-294

Typically, when analysing patterns of activity in a smart home environment, the daily patterns of activity are either ignored completely or summarised into a high-level “hour-of-day” feature that is then combined with sensor activities. However, when summarising the temporal nature of an activity into a coarse feature such as this, not only is information lost after discretisation, but also the strength of the periodicity of the action is ignored. We propose to model the temporal nature of activities using circular statistics, and in particular by performing Bayesian inference with Wrapped Normal (WN) and WN Mixture models. We firstly demonstrate the accuracy of inference on toy data using both Gibbs sampling and Expectation Propagation (EP), and then show the results of the inference on publicly available smart-home data. Such models can be useful for analysis or prediction in their own right, or can be readily combined with larger models incorporating multiple modalities of sensor activity.
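
The sketch below illustrates why time of day benefits from circular treatment. It maps hours to angles, computes a circular mean, and evaluates a Wrapped Normal density by truncating its infinite sum. It does not reproduce the Bayesian inference in the paper (no Gibbs sampling or EP); the function names and the truncation to 10 wraps on each side are assumptions made for this illustration.

```python
import math

TWO_PI = 2.0 * math.pi


def hours_to_angle(hour):
    """Map an hour of day in [0, 24) to an angle in [0, 2*pi)."""
    return TWO_PI * (hour % 24.0) / 24.0


def circular_mean_hours(hours):
    """Circular mean of times of day, returned in hours."""
    s = sum(math.sin(hours_to_angle(h)) for h in hours)
    c = sum(math.cos(hours_to_angle(h)) for h in hours)
    return (math.atan2(s, c) % TWO_PI) * 24.0 / TWO_PI


def wrapped_normal_pdf(theta, mu, sigma, wraps=10):
    """Wrapped Normal density on the circle, truncating the sum to 2 * wraps + 1 terms."""
    norm = 1.0 / (sigma * math.sqrt(TWO_PI))
    total = 0.0
    for k in range(-wraps, wraps + 1):
        z = (theta - mu + TWO_PI * k) / sigma
        total += math.exp(-0.5 * z * z)
    return norm * total


late_evening = [23.0, 1.0]                         # one event at 23:00, one at 01:00
arithmetic = sum(late_evening) / 2                 # 12.0, i.e. midday
circular = circular_mean_hours(late_evening)       # close to midnight (near 0 or 24)
```

The arithmetic mean places two late-night events at midday, whereas the circular mean places them at midnight, which is the behaviour an "hour-of-day" feature cannot capture after discretisation.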

Gaussian Process Model Re-Use
T Diethe, N Twomey, P Flach
Consider the situation where we have some pre-trained classification models for bike rental stations (or any other spatially located data). Given a new rental station (deployment context), we imagine that there might be some rental stations that are more similar to this station in terms of the daily usage patterns, whether or not these stations are close by. We propose to use a Gaussian Process (GP) to model the relationship between geographic location and the type of the station, as determined by heuristics based on the daily usage patterns. For a deployment station, we then find the closest stations in terms of the GP function output, and then use the models trained on these stations on the deployment station. We compare against several baselines, and show that this method is able to outperform those baselines.

Model Reuse with Subgroup Discovery
H Song, P Flach
In this paper we describe a method to reuse models with Model-Based Subgroup Discovery (MBSD), which is an extension of the Subgroup Discovery scheme. The task is to predict the number of bikes at a new rental station 3 hours in advance. Instead of training new models with the limited data from these new stations, our approach first selects a number of pre-trained models from old rental stations according to their mean absolute errors (MAE). For each selected model, we further perform MBSD to locate a number of subgroups on which the selected model has a deviating prediction performance. Then another set of pre-trained models is selected according only to their MAE over the subgroup. Finally, the predictions are made by averaging the predictions from the models selected during the previous two steps. The experiments show that our method performs better than selecting the trained model with the lowest MAE, and better than averaging the low-MAE models.
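
The sketch below illustrates only the first selection step and the final averaging: pre-trained models are ranked by MAE on the limited data from the new station, and the predictions of the best ones are averaged. The MBSD step that locates subgroups is not shown. The toy models, data and function names are hypothetical and exist only for this example.

```python
def mean_absolute_error(predictions, targets):
    """Average absolute difference between predictions and targets."""
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)


def select_lowest_mae(models, inputs, targets, k):
    """Return the k models with the lowest MAE on the given data."""
    ranked = sorted(
        models,
        key=lambda m: mean_absolute_error([m(x) for x in inputs], targets),
    )
    return ranked[:k]


def averaged_prediction(models, x):
    """Average the predictions of several models for one input."""
    return sum(m(x) for m in models) / len(models)


# Hypothetical pre-trained models: each maps a current bike count to a forecast.
models = [lambda x: x + 5, lambda x: x - 1, lambda x: x + 1, lambda x: 2 * x]
inputs = [4, 6, 8]   # toy observations from the new station
targets = [4, 6, 8]  # toy bike counts observed 3 hours later

best = select_lowest_mae(models, inputs, targets, k=2)
forecast = averaged_prediction(best, 10)
```

In this toy case the two selected models err in opposite directions, so their average of 10.0 is closer to the pattern in the data than either model alone.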

Bayesian active learning with evidence-based instance selection
N Twomey, T Diethe, P Flach
There are at least two major challenges for machine learning when performing activity recognition in the smart-home setting. Firstly, the deployment context may be very different to the context in which learning occurs, due to both individual differences in typical activity patterns and different house and sensor layouts. Secondly, accurate labelling of training data is an extremely time-consuming process, and the resulting labels are potentially noisy and error-prone. We propose that these challenges are best solved by combining transfer learning and active learning, and argue that hierarchical Bayesian methods are particularly well suited to problems of this nature. We introduce a new active learning method that is based on Bayesian model selection, and hence fits more naturally within the Bayesian framework than previous decision-theoretic approaches, and is able to cope with situations that the simple but naïve method of uncertainty sampling cannot. These initial results are promising and show the applicability of Bayesian model selection for active learning. We provide some experimental results combining two publicly available accelerometry data-sets for activity recognition, where we transfer from one data-set to another before performing active learning. This effectively transfers existing models to new domains, where the parameters may be adapted to the new context if required. Here the results demonstrate that transfer learning is effective, and that the proposed evidence-based active selection method can be more effective than baseline methods for the subsequent active learning.
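
For reference, the uncertainty sampling baseline mentioned in the abstract can be sketched in a few lines: it queries the unlabelled instance whose predicted class distribution has the highest entropy. This shows only that baseline, not the evidence-based selection method proposed in the paper; the function names and the toy pool of predictions are assumptions.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def uncertainty_sampling(predictive_distributions):
    """Return the index of the instance with the most uncertain prediction."""
    return max(
        range(len(predictive_distributions)),
        key=lambda i: entropy(predictive_distributions[i]),
    )


# Toy predicted class probabilities for three unlabelled instances.
pool = [[0.9, 0.1], [0.55, 0.45], [0.99, 0.01]]
query = uncertainty_sampling(pool)
```

The second instance is selected because its prediction is closest to uniform. The rule looks only at the current predictive distribution of each instance.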

Bayesian active transfer learning in smart homes
T Diethe, N Twomey, P Flach
There are at least two major challenges for machine learning in the smart-home setting. Firstly, the deployment context will be very different to the context in which learning occurs, due to both individual differences in typical activity patterns and different house and sensor layouts. Secondly, accurate labelling of training data is an extremely time-consuming process, and the resulting labels are potentially noisy and error-prone. The resulting framework is therefore a combination of active and transfer learning. We argue that hierarchical Bayesian methods are particularly well suited to problems of this nature, and give a possible formulation of such a model.

A Machine Learning Approach to Objective Cardiac Event Detection
N Twomey, PA Flach

This paper presents an automated framework for the detection of the QRS complex from Electrocardiogram (ECG) signals. We introduce an artefact-tolerant pre-processing algorithm which emphasises a number of characteristics of the ECG that are representative of the QRS complex. With this processed ECG signal we train Logistic Regression and Support Vector Machine classification models. With our approach we obtain over 99.7% detection sensitivity and precision on the MIT-BIH database without using supplementary de-noising or pre-emphasis filters.

Context modulation of sensor data applied to activity recognition in smart homes
N Twomey, P Flach
In this paper we present a method of modulating the context of data captured in smart homes. We show that we can dramatically adapt their sensor network topology and that this approach can be used to help understand various aspects of such sensor environments. We demonstrate how, with our software, we can discover the importance of individual sensors, clusters of sensors and sensor categories for resident identification and activity recognition. Finally, we validate the utility of context modulation in a number of experimental scenarios that show how the activity recognition is affected by each sensor topology elicited by these scenarios.