Description:

Jeanine Houwing-Duistermaat and Haiyan Liu
from the Department of Statistics, University of Leeds

Abstract:
The growing availability of temporal datasets provides many opportunities for methods development. Examples are the integration and joint analysis of multiple temporal datasets and the modelling of sparse and irregular data from Electronic Health Records (EHRs). One of our motivating studies aims to build a prediction tool for the progression of Scleroderma using data from EHRs. Scleroderma is a rare, clinically heterogeneous multisystem disorder which greatly affects patients’ physical and psychological functioning. Since only 15% of patients show progression of the disease, predicting progression is important for clinicians and patients when deciding on follow-up and treatment strategies. One indicator of disease progression is a drop in DLCO, an index of lung function capacity. In our dataset, we have DLCO measurements for 152 patients, each with 2 to 7 visits over 60 months. DLCO appears to change continuously over time, so the measurements can be treated as (sparse) functional data. In addition to the historical DLCO measurements, we have access to measurements of four biomarkers. Our aim is to predict Scleroderma progression from a patient’s historical data, together with the information from all other patients and the biomarkers. The methodological challenges here are the sparsity and irregularity of the data.
To address these challenges, we propose a functional principal component analysis (FPCA) method and a scalar-on-function regression method. Restricted maximum likelihood is used to estimate the eigenelements of the underlying covariance function, and the scores are estimated by conditional expectation. The DLCO trajectories are then recovered using the truncated Karhunen-Loève decomposition based on the estimated eigenelements and scores. A similar FPCA procedure is also applied to predict a patient’s DLCO value at the last visit by borrowing information from all other patients and from the patient’s own history (with the last-visit DLCO value removed).
We will present our methods and the results of the data analysis, and discuss future challenges in modelling temporal datasets.
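
As an illustration of the conditional expectation step and the truncated Karhunen-Loève reconstruction mentioned in the abstract, the following is a minimal sketch and not the speakers' implementation. It assumes the mean function, eigenfunctions, eigenvalues and measurement-error variance have already been estimated (the restricted maximum likelihood step is not shown), and it uses the standard conditional expectation formula for sparse functional data under a Gaussian assumption. The function names, the toy basis and all numerical values are hypothetical.

```python
import numpy as np


def conditional_scores(y, mu, phi, lam, sigma2):
    """Conditional-expectation FPC scores for one sparsely observed patient.

    y      : (n,)   measurements at the patient's visit times
    mu     : (n,)   mean function evaluated at those times
    phi    : (n, K) eigenfunctions evaluated at those times
    lam    : (K,)   eigenvalues
    sigma2 : measurement-error variance
    """
    # Covariance of the observed vector: Phi Lambda Phi^T + sigma^2 I
    cov_y = phi @ np.diag(lam) @ phi.T + sigma2 * np.eye(len(y))
    # E[xi | y] = Lambda Phi^T cov_y^{-1} (y - mu)
    return lam * (phi.T @ np.linalg.solve(cov_y, y - mu))


def reconstruct(mu_grid, phi_grid, scores):
    """Truncated Karhunen-Loeve reconstruction at arbitrary time points."""
    return mu_grid + phi_grid @ scores


def toy_basis(t, horizon=60.0):
    """Hypothetical mean and two orthonormal eigenfunctions on [0, horizon] months."""
    t = np.asarray(t, dtype=float)
    mu = 70.0 - 0.1 * t  # made-up mean trajectory, for illustration only
    phi = np.column_stack([
        np.full_like(t, 1.0 / np.sqrt(horizon)),
        np.sqrt(2.0 / horizon) * np.cos(np.pi * t / horizon),
    ])
    return mu, phi


# Usage: predict the value at month 60 from three earlier visits.
visits = np.array([0.0, 12.0, 36.0])
y_obs = np.array([68.0, 66.5, 61.0])  # illustrative numbers, not real data
lam = np.array([100.0, 25.0])         # assumed eigenvalues
mu_obs, phi_obs = toy_basis(visits)
scores = conditional_scores(y_obs, mu_obs, phi_obs, lam, sigma2=4.0)
mu_new, phi_new = toy_basis(np.array([60.0]))
predicted_last_visit = reconstruct(mu_new, phi_new, scores)
```

The scores are shrunk towards zero when the measurement-error variance is large relative to the eigenvalues, which is how a patient with few visits borrows information from the estimated population mean and covariance.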

Date:
14-06-2018
Location:
La Sapienza, Rome. Città Universitaria. Statistical Sciences, 4th floor, Room 34. 11:30