A Multiobjective Optimization Perspective on Support Vector Machines for Data Classification

DSS Seminar, hybrid mode: online and in person.

Abstract:
The data classification problem is encountered in diverse application domains, such as healthcare (e.g. disease diagnosis), production (e.g. failure and reliability prediction), and marketing (e.g. promotion response prediction), among others. Support Vector Machines (SVMs) offer an optimization-based approach to the classification problem. The original SVM formulation seeks to minimize both the generalization and empirical errors and is therefore a bicriteria problem. However, the formulation is expressed as a single scalarized objective function that uses a balancing parameter. We propose taking a traditional multiobjective optimization approach instead of relying on a single balancing parameter. We thus formulate the problem as a biobjective optimization problem whose objectives are the generalization and empirical errors. Using an l1-norm formulation makes it possible to apply linear programming techniques. We observe that the computational cost of enumerating all extreme nondominated SVM classifiers may not be too high when the objectives are linear, thanks to a simple mechanism motivated by sensitivity analysis. Experiments on real data, compared against multiple runs of an SVM model to tune the balancing parameter, show that although the standard tuning approach achieves comparable results for the most part, its performance may be less than satisfactory on some instances. Moreover, re-running an SVM model multiple times with different values of the balancing parameter is always more expensive than our comprehensive approach.

Next, we study the imbalanced data classification problem. This case is particularly difficult because the larger class typically outweighs the smaller one, and further balancing of the empirical errors of the two classes becomes necessary.
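The abstract does not give the formulation explicitly; as a rough illustration only (our own sketch, not the speaker's exact model), the scalarized l1-norm soft-margin SVM can be written as a linear program by splitting the weight vector as w = u - v with u, v >= 0, and solved with an off-the-shelf LP solver. All names and the toy data below are ours.

```python
import numpy as np
from scipy.optimize import linprog

def l1_svm_lp(X, y, C=1.0):
    """Illustrative l1-norm soft-margin SVM as a linear program.

    min  ||w||_1 + C * sum(xi)
    s.t. y_i (w . x_i + b) >= 1 - xi_i,  xi >= 0

    The l1 norm is linearized via w = u - v with u, v >= 0.
    Variable order in the LP: u (d), v (d), b (1), xi (n).
    """
    n, d = X.shape
    c = np.concatenate([np.ones(d), np.ones(d), [0.0], C * np.ones(n)])
    # Margin constraints rewritten as A_ub @ z <= b_ub:
    # -y_i x_i . u + y_i x_i . v - y_i b - xi_i <= -1
    A = np.hstack([-y[:, None] * X, y[:, None] * X, -y[:, None], -np.eye(n)])
    b_ub = -np.ones(n)
    bounds = [(0, None)] * (2 * d) + [(None, None)] + [(0, None)] * n
    res = linprog(c, A_ub=A, b_ub=b_ub, bounds=bounds, method="highs")
    w = res.x[:d] - res.x[d:2 * d]
    b = res.x[2 * d]
    return w, b

# toy linearly separable data, two points per class
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -1.0], [-3.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = l1_svm_lp(X, y, C=10.0)
```

Varying C here is exactly the balancing-parameter tuning the abstract contrasts with enumerating the nondominated classifiers of the biobjective problem directly.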
We propose a triobjective SVM formulation in which the empirical errors of the two classes are handled separately, alongside the generalization error. As the three-objective optimization problem is more difficult to solve, we explore different strategies for sampling nondominated classifiers. We observe that these sampling strategies differ in classification accuracy and computational cost on real imbalanced data sets. We conclude by identifying some future research directions.

Event link: https://uniroma1.zoom.us/j/87523454651?pwd=wblqgRgOjOs18YcRlprGsmLBz3KzPm.1
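To illustrate the kind of sampling involved (a simple weighted-sum scheme of our own, not necessarily one of the strategies the talk evaluates), one can give the two classes separate slack penalties and sweep a grid of per-class weights, collecting one candidate classifier per weight pair. The helper name and toy data are hypothetical.

```python
import numpy as np
from scipy.optimize import linprog

def l1_svm_classwise(X, y, C_pos, C_neg):
    """l1-norm SVM LP with separate slack penalties per class:
    a weighted-sum scalarization of the triobjective formulation.
    Variable order: u (d), v (d), b (1), xi (n), with w = u - v."""
    n, d = X.shape
    cost = np.where(y > 0, C_pos, C_neg)  # per-point slack penalty
    c = np.concatenate([np.ones(d), np.ones(d), [0.0], cost])
    A = np.hstack([-y[:, None] * X, y[:, None] * X, -y[:, None], -np.eye(n)])
    res = linprog(c, A_ub=A, b_ub=-np.ones(n),
                  bounds=[(0, None)] * (2 * d) + [(None, None)] + [(0, None)] * n,
                  method="highs")
    return res.x[:d] - res.x[d:2 * d], res.x[2 * d]

# imbalanced toy data: three positives, one negative
X = np.array([[1.0, 1.0], [2.0, 1.5], [3.0, 2.0], [-1.0, -1.0]])
y = np.array([1.0, 1.0, 1.0, -1.0])
# sample a small grid of per-class weights; each pair yields a candidate
classifiers = [l1_svm_classwise(X, y, cp, cn)
               for cp in (0.5, 5.0) for cn in (0.5, 5.0)]
```

Each grid point costs one LP solve, which is why the talk's comparison of sampling strategies by accuracy and computational cost matters.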
Speaker:
Prof.ssa Serpil Sayin
Speaker affiliation:
Koç University's College of Administrative Sciences and Economics, Istanbul
Date:
25/06/2024 - 12:00
Venue:
Hybrid mode. In person: room 34, 4th floor, Dipartimento di Scienze Statistiche. Online: https://uniroma1.zoom.us/j/87523454651?pwd=wblqgRgOjOs1