
BlinkDB

Cool! BlinkDB provides interactive-timescale querying of massive data by letting you declare how much error you’re willing to tolerate in the answer. Hive on top, Hadoop and/or Spark underneath.

Today’s web is predominantly data-driven. People increasingly depend on enormous amounts of data (spanning terabytes or even petabytes in size) to make intelligent business and personal decisions. Often the time it takes to make these decisions is critical. However, unfortunately, quickly analyzing large volumes of data poses significant challenges. For instance, scanning 1TB of data may take minutes, even when the data is spread across hundreds of machines and read in parallel. BlinkDB is a massively parallel, sampling-based approximate query engine for running interactive queries on large volumes of data. The key observation in BlinkDB is that one can make perfect decisions in the absence of perfect answers.
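To make the sampling idea concrete, here is a minimal Python sketch of answering an aggregate from a random sample and attaching an error bar to it. This is not BlinkDB's code or API: the function name `approximate_mean`, its parameters, and the synthetic latency data are all assumptions made for illustration, and the interval uses a plain normal approximation.

```python
import math
import random
import statistics


def approximate_mean(values, sample_size, z=1.96, seed=None):
    """Estimate the mean of `values` from a uniform random sample.

    Hypothetical helper for illustration only. Returns (estimate, margin),
    where margin is the half-width of an approximate confidence interval
    (z=1.96 is roughly 95% under a normal approximation).
    """
    if not 2 <= sample_size <= len(values):
        raise ValueError("sample_size must be between 2 and len(values)")
    rng = random.Random(seed)
    sample = rng.sample(values, sample_size)  # uniform, without replacement
    estimate = statistics.fmean(sample)
    # Standard error of the sample mean: s / sqrt(n)
    std_error = statistics.stdev(sample) / math.sqrt(sample_size)
    return estimate, z * std_error


if __name__ == "__main__":
    # Synthetic stand-in for a large table column (assumed data).
    data_rng = random.Random(0)
    latencies = [data_rng.expovariate(1 / 200.0) for _ in range(1_000_000)]

    est, margin = approximate_mean(latencies, 10_000, seed=1)
    exact = statistics.fmean(latencies)
    print(f"approx mean = {est:.1f} +/- {margin:.1f} (exact = {exact:.1f})")
```

The sketch reads only 1% of the rows, and the margin shrinks in proportion to one over the square root of the sample size, so a looser error tolerance means a smaller sample and a faster answer.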

Paper preprint available. Go Bears!!

Via @bigdata

[embed]https://twitter.com/bigdata/status/291228309910614016[/embed]

© 2008-2024 C. Ross Jam. Built using Pelican. Theme based upon Giulio Fidente’s original svbhack, and slightly modified by crossjam.