Nathan, the chap who curates the valuable blog FlowingData, offers a bit of hope for journalists who are worried about their job prospects but have invested in learning data analysis. As we think about reinventing ourselves, consider the title “data scientist.”
Rise of the Data Scientist
As we've all read by now, Google's chief economist Hal Varian commented in January that statisticians would hold the sexy job of the next 10 years. Obviously, I wholeheartedly agree. Heck, I'd go a step further and say they're sexy now, both mentally and physically.
However, if you read the rest of Varian's interview, you'd know that by "statisticians" he meant a general title for anyone who can extract information from large datasets and then present something useful to non-experts.
Sexy Skills of Data Geeks
As a follow-up to Varian's now-popular quote among data fans, Michael Driscoll of Dataspora discusses the three sexy skills of data geeks. I won't rehash the post, but here are the three skills that Michael highlights:
- Statistics – traditional analysis you're used to thinking about
- Data Munging – parsing, scraping, and formatting data
- Visualization – graphs, tools, etc.
Oh, but there's more…
These skills align closely with Ben Fry's dissertation on Computational Information Design (2004). Fry, however, takes it a step further and argues for an entirely new field that combines the skills and talents of often disjoint areas of expertise:
- Computer Science – acquire and parse data
- Mathematics, Statistics, & Data Mining – filter and mine
- Graphic Design – represent and refine
- Infovis and Human-Computer Interaction (HCI) – interaction
After two years of highlighting visualization on FlowingData, I've noticed that collaborations between these fields are growing more common; more importantly, computational information design is edging closer to reality. We're seeing data scientists, people who can do it all, emerge from the rest of the pack.
Advantages of the Data Scientist
Think about all the visualization stuff you've been most impressed with or the groups that always seem to put out the best work. Martin Wattenberg. Stamen Design. Jonathan Harris. Golan Levin. Sep Kamvar. Why is their work always of such high quality? Because they're not just students of computer science, math, statistics, or graphic design.
Their combination of skills not only makes independent work easier and quicker; it also makes collaboration more exciting and expands what can be done. Visualization projects are often disjointed processes that involve a lot of waiting. A statistician waits for data from a computer scientist, a graphic designer waits for results from an analyst, or an HCI specialist waits for layouts from a graphic designer.
Now suppose several data scientists are working together. There will be less waiting, and the communication gaps between the fields will narrow.
How often have we seen a visualization tool that had an excellent concept and looked great on paper but lacked attention to HCI, so it was hard to use and no one gave it a chance? How many important (and interesting) analyses have we missed because certain ideas could not be communicated clearly? The data scientist can solve both problems.
An Application
The need for data scientists is especially evident in business, where informed decisions must be made quickly. A delayed decision can mean lost opportunity and profit. Terabytes of data are pouring in, whether from websites or from sales across the country, but in a field where Excel is the tool of choice (or of force), there are limits, hence all the tools, applications, and consultancies that have sprung up to help. Of course, the same applies outside business as well.
Learn and Prosper
Even if you're not into visualization, you're going to need at least a subset of the skills that Fry highlights if you want to seriously mess with data. Statisticians should know APIs, databases, and how to scrape data; designers should learn to do things programmatically; and computer scientists should know how to analyze and find meaning in data.
Basically, the more you learn, the more you can do, and the more in demand you will be as the amount of data grows and more people want to make use of it.