The Impala Inhale

Cloudera’s Impala sounds like an exciting way to query HDFS and HBase data at interactive speeds. But the installation dependencies are sort of painful, basically forcing you to use either Cloudera Manager or Cloudera’s packages for Red Hat Enterprise Linux. The fun-filled requirements even include this gem:

Impala creates and uses a user and group named impala. Do not delete this account or group and do not modify the account’s or group’s permissions and rights. Ensure no existing systems obstruct the functioning of these accounts and groups. For example, if you have scripts that delete user accounts not in a white-list, add these accounts to the list of permitted accounts.

So now the user account space gets littered, along with a bunch of other config files scattered across the filesystem. Yick!
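
For what it’s worth, the accommodation that requirement asks for might look something like the sketch below. Only the account name impala comes from the quoted requirement; the function name, the permitted set, and the sample account names are all made up for illustration, and this is not taken from any Cloudera tooling.

```python
# Hypothetical cleanup helper: decide which accounts a pruning script may delete.
# Only the name "impala" comes from the quoted Impala requirement; every other
# name here is an assumption made for illustration.

PERMITTED_ACCOUNTS = {"root", "daemon", "impala"}  # "impala" added per the requirement


def accounts_to_delete(existing, permitted=PERMITTED_ACCOUNTS):
    """Return the accounts that are not on the permitted list, sorted by name."""
    return sorted(set(existing) - set(permitted))


if __name__ == "__main__":
    found = ["root", "impala", "olduser", "daemon"]
    # impala survives the purge because it is on the permitted list
    print(accounts_to_delete(found))
```

The same entry would be needed for the impala group wherever groups get pruned, which is exactly the sort of extra bookkeeping I’d rather not pick up.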

That makes Shark look a lot more attractive for a small-scale applied research project. I think you can run Shark, and Spark, pretty much from tarballs and at user level. And avoiding that enterprisey inhale is a good way to reduce complexity.

© 2008-2024 C. Ross Jam. Built using Pelican. Theme based upon Giulio Fidente’s original svbhack, and slightly modified by crossjam.