Lasso-penalized clusterwise linear regression

Abstract: In clusterwise regression analysis, the goal is to predict a response variable from a set of explanatory variables, where each predictor's contribution to the response depends on the cluster. The number of candidate predictors is typically large: some of these variables may be useful, while others contribute very little to the prediction. A well-known method for variable selection is the lasso, with the penalty calibrated by minimizing the Bayesian Information Criterion (BIC). However, available approaches to computing lasso-penalized estimators are time-consuming and/or require approximate schemes, making the tuning of the penalty cumbersome. To ease this computation, we introduce an expectation-maximization (EM) algorithm with closed-form updates. It is based on an iterative scheme in which the Least Angle Regression algorithm is used to update the component-specific regression coefficients. We show that this approach, in addition to shortening the computation time of the lasso-penalized solution, gives an optimal grid for BIC minimization. The method is assessed by means of a simulation study and an application to Major League Baseball salary data from the 1990s.

Key words: clusterwise linear regression, penalized likelihood, feature selection.
Roberto Di Mari (Università di Catania)
16/12/2019 - 14:00
[Aula VII, Edificio di Scienze Statistiche (CU002), Sapienza Università di Roma, Città Universitaria]