Abstract
Given a dataset D partitioned in clusters, the joint distance function (JDF) J (x) at any point x is the harmonic mean of the distances between x and the cluster centers. The JDF is a continuous function, capturing the data points in its lower level sets (a property called contour approximation), and is a useful concept in probabilistic clustering and data analysis. In particular, contour approximation allows a compact representation of the data: for a dataset in Rn with N points organized in K clusters, the JDF requires K centers and covariances (if Mahalanobis distances are used), for a total of Kn (n + 3) / 2 parameters, and a considerable reduction of storage if N ≫ K, n. The JDF of the whole dataset, J (D) {colon equals} ∑ {J (x) : x ∈ D}, is a measure of the classifiability of the data, and can be used to determine the "right" number of clusters for D. A duality theory for the JDF J (D) is given, in analogy with Kuhn's geometric duality theory for the Fermat-Weber location problem. The JDF J (D) is the optimal value of a primal problem (P), for which a dual problem (D) is given, with a sharp lower bound on J (D).
| Original language | English (US) |
|---|---|
| Pages (from-to) | 2771-2780 |
| Number of pages | 10 |
| Journal | Linear Algebra and Its Applications |
| Volume | 430 |
| Issue number | 10 |
| DOIs | |
| State | Published - May 1 2009 |
All Science Journal Classification (ASJC) codes
- Algebra and Number Theory
- Numerical Analysis
- Geometry and Topology
- Discrete Mathematics and Combinatorics
Keywords
- Clustering
- Contour approximation of data
- Duality
- Harmonic mean
- Joint distance function
- Mahalanobis distance
- Weiszfeld method
Fingerprint
Dive into the research topics of 'Contour approximation of data: A duality theory'. Together they form a unique fingerprint.Cite this
- APA
- Author
- BIBTEX
- Harvard
- Standard
- RIS
- Vancouver