Dynamic visualization for L1 fusion convex clustering in near-linear time
B Zhang, J Chen, Y Terada - Uncertainty in Artificial …, 2021 - proceedings.mlr.press
B Zhang, J Chen, Y Terada
Uncertainty in Artificial Intelligence, 2021 • proceedings.mlr.press
Abstract
Convex clustering has drawn recent attention because of its competitive performance and its guarantee of global optimality. However, its high computational cost makes convex clustering infeasible for large-scale data sets. We propose a novel method that solves the L1 fusion convex clustering problem by dynamic programming. We develop the Convex clustering Path Algorithm In Near-linear Time (C-PAINT) to construct the solution path efficiently. C-PAINT yields the exact solution, whereas general-purpose convex solvers applied to convex clustering depend on tuning parameters such as step size and threshold and usually need many iterations to converge. Apart from a sorting step that takes almost no time in practice, the main part of the algorithm runs in linear time. Thus, C-PAINT scales better than other state-of-the-art algorithms. Moreover, C-PAINT enables path visualization of clustering solutions for large data. In particular, experiments show that the proposed method can solve convex clustering with 10^7 data points in two minutes. We demonstrate the proposed method on both synthetic and real data. Our algorithms are implemented in the dpcc R package.
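To make the role of the sorting step concrete, here is a minimal Python sketch for a single penalty value in one dimension. It is not C-PAINT and does not use the dpcc package. It assumes unit fusion weights, i.e. the objective (1/2) * sum_i (x_i - u_i)^2 + lam * sum_{i<j} |u_i - u_j|, and relies on the fact that the optimal u keeps the order of x. After sorting, the penalty is then linear in u, so the problem reduces to a least-squares monotone fit, computed here with the standard pool-adjacent-violators routine. The function names `pava` and `l1_convex_clustering_1d` are hypothetical. Because the L1 penalty and the squared loss both split across coordinates, multivariate data could be handled one coordinate at a time under the same assumptions.

```python
import numpy as np


def pava(y):
    """Least-squares nondecreasing fit to y (pool adjacent violators)."""
    means, sizes = [], []
    for v in y:
        means.append(float(v))
        sizes.append(1)
        # Merge the last two blocks while they violate monotonicity.
        while len(means) > 1 and means[-2] > means[-1]:
            m2, s2 = means.pop(), sizes.pop()
            m1, s1 = means.pop(), sizes.pop()
            means.append((m1 * s1 + m2 * s2) / (s1 + s2))
            sizes.append(s1 + s2)
    return np.repeat(means, sizes)


def l1_convex_clustering_1d(x, lam):
    """Centroids for 1-D L1 fusion convex clustering, unit weights (assumed)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    order = np.argsort(x, kind="stable")   # the sorting step
    ranks = np.arange(1, n + 1)
    # For ordered u, sum_{i<j} |u_i - u_j| = sum_i (2i - n - 1) * u_i,
    # so the penalty only shifts each sorted observation.
    shifted = x[order] - lam * (2 * ranks - n - 1)
    fitted = pava(shifted)                 # enforce u_(1) <= ... <= u_(n)
    u = np.empty(n)
    u[order] = fitted                      # undo the sort
    return u
```

For example, with x = [10, 0, 11, 1], lam = 0 returns the data unchanged, lam = 1 fuses the points into two clusters at 2.5 and 8.5, and lam = 3 fuses everything at the overall mean 5.5. Points that share a centroid value belong to the same cluster, so evaluating the function over a grid of lam values gives a crude picture of the solution path.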