
A Chihuahua in Your Pig

That would be a Pig UDF for GraphChi. A complete hack of the best kind:

Pig is a powerful query language for Hadoop, commonly used for large-scale data processing. It is now possible to run GraphChi programs as part of Pig scripts with just one line of script! This makes very large-scale graph computation easy on data stored in HDFS (the Hadoop Distributed File System). Because GraphChi ultimately executes on a single Hadoop machine (see HowGraphChiForPigWorks), the size of the Hadoop cluster is not a limiting factor.

GraphChi for Pig is a viable alternative to Giraph, a distributed graph engine built on top of Hadoop. With GraphChi, you can develop your algorithms on your laptop (with realistically sized data) and then deploy them to run on the big cluster. GraphChi will also often run faster and use far fewer resources than the alternatives.

© 2008-2024 C. Ross Jam.