Spark-TS

Posted on: Mon 14 December 2015

Link parkin’:

Large-scale time-series data shows up across a variety of domains. In this post, I’ll introduce Spark-TS, a library developed by Cloudera’s Data Science team (and in use by customers) that enables analysis of data sets comprising millions of time series, each with millions of measurements. Spark-TS runs atop Apache Spark, and exposes Scala and Python APIs.

Deployed by Cloudera with real customers, according to them. Sorely needed. Appreciate the Python modules, which I hope aren’t too far behind the Scala API.