Autore:
L. Amorosi, J. Puerto, C.Valverde
Abstract:
Hierarchical clustering is a statistical technique to study the occurring groups (clusters) within a dataset
creating a hierarchy of clusters. This is represented by a rooted tree (dendrogram) whose leaves correspond
to the data points, and each internal node represents the cluster containing its descendant leaves. Among
methods to perform hierarchical clustering, the agglomerative ones are based on greedy procedures that
return a sequence of nested partitions, where each level up joins two clusters of the lower partition relying on
a local criterion. In this work, motivated by the lack of exact approaches that guarantee global optimality,
we focus on a unified mathematical programming formalisation that embeds single and complete linkage
procedures. Through preliminary experiments, we evaluate, according to different measures commonly used
in this context, the dendrograms obtained from the exact resolution of the formulations and those produced
by the greedy approach. Furthermore, by exploiting the mathematical formulation, we also present a scalable
matheuristic algorithm capable of generating better quality dendrograms than those produced by the greedy
approach, even for large-sized datasets.
Parole Chiave:
Data science; Hierarchical clustering; Mathematical programming;
Tipo di pubblicazione:
Rapporto Tecnico
Codice Pubblicazione:
1
Allegato Pubblicazione:
Contatto:
ISSN:
2279-798X