Distributed Data Analysis at Facebook
December 1st, 2009 by analyticjournalism

This post is a few months old, but we're wondering whether any readers have used Hive or tried to deploy it in a newsroom, where “exploring and analyzing data…[is] everyone’s responsibility.”


Exploring and analyzing data isn’t the responsibility of one team here at Facebook; it’s everyone’s responsibility. “Move fast” is one of our core values, and to facilitate fast data-driven decisions, the Data Infrastructure Team has created tools like Hive and its UI sidekick, HiPal, to make analyzing Facebook’s petabytes of data easy for anyone in the company. The Data Science team runs open tutorial sessions for groups eager to run their own analysis using these tools. And non-programmers on every team have fearlessly rolled up their sleeves to learn how to write Hive queries.

Today, Facebook counts 29% of its employees (and growing!) as Hive users. More than half (51%) of those users are outside of Engineering. They come from distinct groups like User Operations, Sales, Human Resources, and Finance. Many of them had never used a database before working here. Thanks to Hive, they are now all data ninjas who are able to move fast and make great decisions with data.

If you like to move fast and want to be a data ninja (no matter what team you are in), check out our Careers page.

