Autore: 
L. Amorosi, J. Puerto, C.Valverde
Abstract: 
Hierarchical clustering is a statistical technique to study the occurring groups (clusters) within a dataset creating a hierarchy of clusters. This is represented by a rooted tree (dendrogram) whose leaves correspond to the data points, and each internal node represents the cluster containing its descendant leaves. Among methods to perform hierarchical clustering, the agglomerative ones are based on greedy procedures that return a sequence of nested partitions, where each level up joins two clusters of the lower partition relying on a local criterion. In this work, motivated by the lack of exact approaches that guarantee global optimality, we focus on a unified mathematical programming formalisation that embeds single and complete linkage procedures. Through preliminary experiments, we evaluate, according to different measures commonly used in this context, the dendrograms obtained from the exact resolution of the formulations and those produced by the greedy approach. Furthermore, by exploiting the mathematical formulation, we also present a scalable matheuristic algorithm capable of generating better quality dendrograms than those produced by the greedy approach, even for large-sized datasets.
Parole Chiave: 
Data science; Hierarchical clustering; Mathematical programming;
Tipo di pubblicazione: 
Rapporto Tecnico
Codice Pubblicazione: 
1
Allegato Pubblicazione: 
ISSN:
2279-798X