ONCORELIEF’S AI MODELS VERIFICATION

The consortium is happy to announce that it has completed the first design of the Artificial Intelligence (AI) models that ONCORELIEF will use to predict a patient’s quality of life. CERTH, SUITE5 and EXUS, in close collaboration with the medical partners, have reached a consensus on the features that will be used to train the Machine Learning (ML) models. CERTH has since produced a preliminary design of the ML models and of the training and evaluation process, which will be finalized once real data are available.

According to the adopted ONCORELIEF Data Model, data are collected from the different data sources (Big Data), transformed to FHIR format and stored in the back-end storage. There are three data sources: (1) measurements from the wearable sensors; (2) questionnaires answered by the patients; and (3) the patient’s medical record. The first provides dynamic (time-dependent) data, while the last two provide static data, which are updated only when they change. About 300 patients, covering both colorectal cancer (CRC) and Acute Myeloid Leukaemia (AML), are expected to provide the training data.
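The split between dynamic and static data can be pictured with a small container per patient. This is only an illustrative sketch: the class name `PatientRecord`, its field names and the sensor names are assumptions made for this example and are not taken from the ONCORELIEF Data Model or from FHIR.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class PatientRecord:
    """Hypothetical per-patient container (names are illustrative only)."""
    patient_id: str
    cancer_type: str  # "CRC" or "AML"
    # Static data: questionnaire answers and medical-record entries.
    static_features: Dict[str, float] = field(default_factory=dict)
    # Dynamic data: (timestamp in seconds, sensor name, value).
    sensor_readings: List[Tuple[float, str, float]] = field(default_factory=list)

    def update_static(self, name: str, value: float) -> None:
        """Overwrite a static feature when its value changes."""
        self.static_features[name] = value

    def add_reading(self, timestamp: float, sensor: str, value: float) -> None:
        """Append one time-stamped wearable measurement."""
        self.sensor_readings.append((timestamp, sensor, value))

    def series(self, sensor: str) -> List[float]:
        """Return the values of one sensor, ordered by timestamp."""
        rows = sorted(r for r in self.sensor_readings if r[1] == sensor)
        return [value for _, _, value in rows]
```

A record built this way keeps the time-dependent sensor stream separate from the slowly changing questionnaire and medical-record values, which is the distinction the models rely on.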

To select the most suitable ML model, several Deep Learning algorithms, such as Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs), will be explored and compared on accuracy. CNNs are useful when dimensionality reduction is needed, which is not the case here, since the raw data are not images. RNNs, on the other hand, can capture dependencies across time sequences. They require more parameters than CNNs, but only half as many as Deep Neural Networks (DNNs). Finally, RNNs can process long input sequences without increasing the network size. Since the objective of the project is to predict the patient’s Well Being and Quality of Life Index based on historical data, RNNs look more suitable for this task.
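The point about sequence length can be checked with simple arithmetic. The sketch below assumes a single simple (Elman) recurrent layer and, for comparison, a single dense layer applied to the whole flattened sequence; the layer sizes (8 input features, 32 hidden units) are arbitrary values chosen for illustration.

```python
def simple_rnn_params(n_features: int, n_hidden: int) -> int:
    """Weights of one simple recurrent layer: input, recurrent and bias terms."""
    return n_hidden * n_features + n_hidden * n_hidden + n_hidden


def flattened_dense_params(seq_len: int, n_features: int, n_hidden: int) -> int:
    """Weights of one dense layer that reads the whole flattened sequence."""
    return n_hidden * seq_len * n_features + n_hidden


# Compare both counts for sequences of 7, 30 and 90 time steps.
for seq_len in (7, 30, 90):
    print(seq_len,
          simple_rnn_params(8, 32),
          flattened_dense_params(seq_len, 8, 32))
```

The recurrent layer reuses the same weights at every time step, so its count does not depend on the sequence length, whereas the dense layer grows linearly with it.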

Thus, several RNN architectures will be trained with the incoming data. After training, the models will be validated on data that were not used in the training phase, with deep generative models used to enhance the validation, in order to select the most accurate architecture.

The training process includes the following steps:

  • Training is performed from scratch, using an initial training set from the collected data.

  • After a sufficient amount of new data is collected (for example every week or month), or when the prediction error exceeds a predefined threshold, the ML model will be re-trained at the back-end, updating the previously trained model with the new data.

  • The trained and fine-tuned ML model will be converted to a “lite” version, suitable for installation and use on mobile devices.
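The decision logic behind these steps can be sketched as follows. This is a minimal illustration, not the project's implementation: the class `RetrainPolicy`, the function `training_cycle`, the action labels and the threshold values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RetrainPolicy:
    """Hypothetical re-training policy; both thresholds are placeholders."""
    min_new_samples: int = 500   # assumed "sufficient amount" of new data
    max_error: float = 0.15      # assumed prediction-error threshold

    def should_retrain(self, new_samples: int, current_error: float) -> bool:
        """Re-train when enough new data arrived or the error is too high."""
        return (new_samples >= self.min_new_samples
                or current_error > self.max_error)


def training_cycle(model_version: int, new_samples: int,
                   current_error: float, policy: RetrainPolicy):
    """Return (next model version, action) for one pass of the process."""
    if model_version == 0:
        # No model exists yet: train from scratch on the initial set.
        return 1, "train_from_scratch"
    if policy.should_retrain(new_samples, current_error):
        # Update the previous model, then export a "lite" mobile version.
        return model_version + 1, "fine_tune_and_export_lite"
    return model_version, "keep_current_model"
```

For example, `training_cycle(3, 40, 0.30, RetrainPolicy())` triggers a re-training because the error exceeds the assumed threshold, even though little new data has arrived.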

This training and evaluation process is described in the following flow-chart.

Figure 1. Model Training and Evaluation Graph.

The next step is to decide on a suitable neural network (NN) architecture for the specified data set. The team has reviewed a number of methods from the literature to select an approach that fits the needs of the ONCORELIEF project. These are briefly presented in the following paragraphs and in Figures 2 and 3.

In [1], an RNN architecture designed specifically for the clinical domain is described, which combines static and dynamic information in order to predict future events. Based on this idea, a suitable general architecture is proposed in Fig. 2. This architecture will be trained in different configurations (different numbers of dynamic inputs, different depths, different types of units, etc.).

Figure 2. Recurrent Neural Network (RNN) Architecture.
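The general idea of combining a recurrent summary of the dynamic inputs with the static inputs can be sketched in a few lines. This is a deliberately simplified, untrained toy model written in plain Python; it is not the architecture of Fig. 2 or of the cited paper. The class name `StaticDynamicRNN`, the use of a simple tanh recurrent unit, the direct concatenation of the static features and the random weights are all assumptions made for illustration.

```python
import math
import random


def init_matrix(rows, cols, rng):
    """Small random weights; a real model would learn these by training."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class StaticDynamicRNN:
    """Toy model: simple RNN over dynamic inputs, fused with static inputs."""

    def __init__(self, n_dynamic, n_static, n_hidden, seed=0):
        rng = random.Random(seed)
        self.n_hidden = n_hidden
        self.w_in = init_matrix(n_hidden, n_dynamic, rng)    # input weights
        self.w_rec = init_matrix(n_hidden, n_hidden, rng)    # recurrent weights
        self.w_out = init_matrix(1, n_hidden + n_static, rng)[0]

    def forward(self, sequence, static):
        """Return one score from a dynamic sequence and a static vector."""
        hidden = [0.0] * self.n_hidden
        for x_t in sequence:
            # The same weights are applied at every time step.
            from_input = matvec(self.w_in, x_t)
            from_state = matvec(self.w_rec, hidden)
            hidden = [math.tanh(a + b) for a, b in zip(from_input, from_state)]
        # Fuse the last hidden state with the static features.
        fused = hidden + list(static)
        return sum(w * f for w, f in zip(self.w_out, fused))
```

Changing `n_dynamic`, `n_hidden` or the type of recurrent unit corresponds to the different configurations mentioned above; the same model instance accepts sequences of any length.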

In [2], generative models such as LSTM NNs are used to extract temporal information from dynamic data. This information can then be combined with the static features to improve classification performance. The architecture depicted in Fig. 3 is inspired by these ideas. During the training phase, a variety of ML algorithms will be implemented and trained in the different modules of the proposed architecture.

Figure 3. Ensemble Architecture Using Different ML modules.
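The combination step can be illustrated with a feature-level fusion sketch. Note the substitution: instead of a generative model such as an LSTM, the example uses plain summary statistics (mean, last value, average change) as a stand-in for the extracted temporal information. The function names `temporal_summary` and `fuse` are hypothetical.

```python
from statistics import mean


def temporal_summary(series):
    """Stand-in temporal features: mean, last value, average step change."""
    if not series:
        raise ValueError("series must contain at least one value")
    changes = [b - a for a, b in zip(series, series[1:])]
    trend = mean(changes) if changes else 0.0
    return [mean(series), series[-1], trend]


def fuse(static_features, dynamic_series):
    """Concatenate static features with the summary of each dynamic series."""
    fused = list(static_features)
    for series in dynamic_series:
        fused.extend(temporal_summary(series))
    return fused
```

The fused vector has a fixed length regardless of how long each series is, so it can be passed to any conventional classifier placed in the downstream module.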

References

  1. Predicting Clinical Events by Combining Static and Dynamic Information Using Recurrent Neural Networks, C. Esteban, O. Staeck et al., 2016 IEEE International Conference on Healthcare Informatics (ICHI), DOI: 10.1109/ICHI.2016.16, October 2016.

  2. Combining Static and Dynamic Features for Multivariate Sequence Classification, A. Leontjeva and I. Kuzovkin, 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA), DOI: 10.1109/DSAA.2016.10, October 2016.