# k-Means Clustering of Lines for Big Data

### Wednesday, November 13th, 2019, 16:10

### Checkpoint 480

### k-Means Clustering of Lines for Big Data

### Yair Marom, University of Haifa

### Abstract:

The input to the k-mean for lines problem is a set L of n lines in R^d, and the goal is to compute a set of k centers (points) in R^d that minimizes the sum of squared distances over every line in L and its nearest center. This is a straightforward generalization of the k-mean problem where the input is a set of n points instead of lines

We suggest and the first PTAS that computes a (1+epsilon)-approximation to this problem in time O(n log n) for any constant epsilon>0, and k, d > 1.

Streaming input using ~log(n) memory, and distributed parallel computations are also provided.

This is by proving that there is always a weighted subset (called coreset) of dk^{O(k)} log (n)/epsilon^2 lines in L that approximates the sum of squared distances from L to any given set of k points.

Experimental results, open source, and applications for anomaly detection and localization via standard (2D) cameras are also provided.

Based on a NIPS'19 paper, co-authored with Dan Feldman

We suggest and the first PTAS that computes a (1+epsilon)-approximation to this problem in time O(n log n) for any constant epsilon>0, and k, d > 1.

Streaming input using ~log(n) memory, and distributed parallel computations are also provided.

This is by proving that there is always a weighted subset (called coreset) of dk^{O(k)} log (n)/epsilon^2 lines in L that approximates the sum of squared distances from L to any given set of k points.

Experimental results, open source, and applications for anomaly detection and localization via standard (2D) cameras are also provided.

Based on a NIPS'19 paper, co-authored with Dan Feldman