analyticjournalism.com

It's not "all about story" if you don't have anything to say. So go get some data.

How to Make a Heatmap – a Quick and Easy Solution

Jan 21st, 2010 by analyticjournalism

Thanks to Nathan at Flowing Data:

How to Make a Heatmap – a Quick and Easy Solution

By Nathan / Jan 21, 2010 to Featured, Statistical Visualization, Tutorials / 11 comments

How do you make a heatmap? This came from kerimcan in the FlowingData forums, and krees followed up with a couple of good links on how to do them in R. It really is super easy. Here's how to make a heatmap with just a few lines of code, but first, a short description of what a heatmap is.

The Heatmap

In case you don't know what a heatmap is, it's basically a table that has colors in place of numbers. Colors correspond to the level of the measurement. Each column can be a different metric like above, or it can be all the same like this one. It's useful for finding highs and lows and sometimes, patterns.

On to the tutorial.

Step 0. Download R

We're going to use R for this. It's a statistical computing language and environment, and it's free. Get it for Windows, Mac, or Linux. It's a simple one-click install for Windows and Mac. I've never tried Linux.

Did you download and install R? Okay, let's move on.

Step 1. Load the data

Like all visualization, you should start with the data. No data? No visualization for you.

For this tutorial, we'll use NBA basketball statistics from last season that I downloaded from databaseBasketball. I've made it available here as a CSV file. You don't have to download it though. R can do it for you.

I'm assuming you started R already. You should see a blank window.

[Image: the R console]

Now we'll load the data using read.csv().

nba <- read.csv("http://datasets.flowingdata.com/ppg2008.csv", sep=",")

We've read a CSV file from a URL and specified the field separator as a comma. The data is stored in nba.

Type nba in the window, and you can see the data.

[Image: the loaded data printed in the R console]

Step 2. Sort data

The data is sorted by points per game, greatest to least. Let's make it the other way around so that it's least to greatest.

nba <- nba[order(nba$PTS),]

We could just as easily have chosen to order by assists, blocks, etc.
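As a rough sketch of what ordering by a different column might look like, here is a made-up three-player data frame standing in for nba. The player names, the values, and the AST column name are assumptions for illustration, not taken from the real file.

```r
# Toy stand-in for the nba data frame; names and values are invented.
toy <- data.frame(
  Name = c("Player A", "Player B", "Player C"),
  PTS  = c(30.2, 25.3, 28.4),
  AST  = c(7.5, 11.0, 4.9),
  stringsAsFactors = FALSE
)

# Least to greatest by points, as in the step above.
by_points <- toy[order(toy$PTS), ]

# Greatest to least by assists instead.
by_assists <- toy[order(toy$AST, decreasing = TRUE), ]

print(by_points$Name)
print(by_assists$Name)
```

order() returns row positions, so the same pattern should work for any numeric column in your own data.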

Step 3. Prepare data

As is, the column names match the CSV file's header. That's what we want.

But we also want to name the rows by player name instead of row number, so type this in the window:

row.names(nba) <- nba$Name

Now the rows are named by player, and we don't need the first column anymore so we'll get rid of it:

nba <- nba[,2:20]
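If your own file has a different number of columns, a negative index is one way to drop the first column without counting to 20. This is a small sketch on a made-up data frame (names and values are hypothetical), not the NBA data itself.

```r
# Toy stand-in for the nba data frame; names and values are invented.
toy <- data.frame(
  Name = c("Player A", "Player B"),
  PTS  = c(30.2, 28.4),
  AST  = c(7.5, 4.9),
  stringsAsFactors = FALSE
)

# Label the rows by player name instead of row number.
row.names(toy) <- toy$Name

# Drop the first column (Name), whatever the total column count is.
toy <- toy[, -1]

print(toy)
```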

Step 4. Prepare data, again

Are you noticing something here? It's important to note that a lot of visualization involves gathering and preparing data. Rarely do you get data exactly how you need it, so you should expect to do some data munging before the visuals. Anyway, moving on.

The data was loaded into a data frame, but it has to be a data matrix to make your heatmap. The difference between a frame and a matrix is not important for this tutorial. You just need to know how to change it.

nba_matrix <- data.matrix(nba)
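If you're curious what the conversion actually does, here is a quick sketch on a made-up two-player data frame (the names and values are hypothetical). It only checks the type before and after and confirms that the row names carry over.

```r
# Toy data frame with player names as row names; values are invented.
toy <- data.frame(
  PTS = c(30.2, 28.4),
  AST = c(7.5, 4.9),
  row.names = c("Player A", "Player B")
)

# Convert the data frame to a numeric matrix.
toy_matrix <- data.matrix(toy)

print(is.data.frame(toy))      # the original is a data frame
print(is.matrix(toy_matrix))   # the result is a matrix
print(rownames(toy_matrix))    # player names survive the conversion
```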

Step 5. Make a heatmap

It's time for the finale. In just one line of code, build the heatmap:

nba_heatmap <- heatmap(nba_matrix, Rowv=NA, Colv=NA, col = cm.colors(256), scale="column", margins=c(5,10))

You should get a heatmap that looks something like this:

[Image: the resulting heatmap]

Step 6. Color selection

Maybe you want a different color scheme. Just change the argument to col, which is cm.colors(256) in the line of code we just executed. Type ?cm.colors for help on what colors R offers. For example, you could use more heat-looking colors:

nba_heatmap <- heatmap(nba_matrix, Rowv=NA, Colv=NA, col = heat.colors(256), scale="column", margins=c(5,10))

[Image: the heatmap using heat colors]

For the heatmap at the beginning of this post, I used the RColorBrewer library. Really, you can choose any color scheme you want. The col argument accepts any vector of hexadecimal-coded colors.
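Here is one way you might build your own color vector without installing anything extra. It uses colorRampPalette() from base R as a stand-in, not RColorBrewer, and the color choices and the small matrix are made up for illustration.

```r
# Build 256 hex colors running from white to a blue of our choosing.
# colorRampPalette() ships with base R (grDevices); the end colors are arbitrary.
blues <- colorRampPalette(c("white", "steelblue"))(256)

print(head(blues, 3))

# Toy 3x3 matrix standing in for nba_matrix; values are invented.
toy_matrix <- matrix(
  c(30.2, 28.4, 25.3,
    7.5, 4.9, 11.0,
    5.0, 7.6, 5.5),
  nrow = 3,
  dimnames = list(c("Player A", "Player B", "Player C"),
                  c("PTS", "AST", "REB"))
)

# Same call as in the tutorial, with the custom color vector.
toy_heatmap <- heatmap(toy_matrix, Rowv = NA, Colv = NA, col = blues,
                       scale = "column", margins = c(5, 10))
```

Swap in any two or more colors you like. The result is just a character vector of hex codes, which is all col needs.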

Step 7. Clean it up – optional

If you're using the heatmap to simply see what your data looks like, you can probably stop. But if it's for a report or presentation, you'll probably want to clean it up. You can fuss around with the options in R or you can save the graphic as a PDF and then import it into your favorite illustration software.
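Saving to PDF can be done from the menus, but a sketch of doing it in code with base R's pdf() device might look like this. The output file name, page size, and toy matrix are assumptions for illustration.

```r
# Toy 3x3 matrix standing in for nba_matrix; values are invented.
toy_matrix <- matrix(
  c(30.2, 28.4, 25.3,
    7.5, 4.9, 11.0,
    5.0, 7.6, 5.5),
  nrow = 3,
  dimnames = list(c("Player A", "Player B", "Player C"),
                  c("PTS", "AST", "REB"))
)

# Hypothetical output location; change it to wherever you want the file.
out_file <- file.path(tempdir(), "toy_heatmap.pdf")

# Open a PDF device, draw the heatmap into it, then close the device.
pdf(out_file, width = 7, height = 7)
heatmap(toy_matrix, Rowv = NA, Colv = NA, col = heat.colors(256),
        scale = "column", margins = c(5, 10))
dev.off()

print(out_file)
```

The PDF keeps the graphic as vectors, so the labels and cells should stay editable when you open it in illustration software.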

I personally use Adobe Illustrator, but you might prefer Inkscape, the open source (free) solution. Illustrator is kind of expensive, but you can probably find an old version on the cheap. I still use CS2. Adobe's up to CS4 already.

For the final basketball graphic, I used a blue color scheme from RColorBrewer and then lightened the blue shades, added a white border, changed the font, and organized the labels in Illustrator. Voila.

[Image: the revised NBA heatmap]

Rinse and repeat to use with your own data. Have fun heatmapping.

So what ARE people talking about?

Jan 14th, 2010 by analyticjournalism

One of the things we've noticed about journalism operations that allow comments and discussion on their web pages is that few take the time to analyze that interchange and content. Partially, that's because of a lack of tools. The “tldr Project” is a step toward meeting that challenge.

tldr PROJECT – http://demaws.net/projects/tldr#about

Recent years have seen a proliferation of large-scale discussion spaces on the internet. With increasing user participation, it is not uncommon to find discussion spaces with hundreds to thousands of messages/participants. This phenomenon can be observed on a wide variety of websites – news outlets, blogs, social media websites, community websites and support forums. While most of these discussion spaces are able to support small discussions, their effectiveness is greatly reduced as the discussions grow larger. Users participating in these discussions are overwhelmed by the sheer amount of information presented, and the systems that support these conversations are lacking in functionality that lets users navigate to content of interest.

tldr is an application for navigating through large-scale online discussions. The application visualizes structures and patterns within ongoing conversations to let the user browse to content of most interest. In addition to visual overviews, it also incorporates features such as thread summarization, non-linear navigation, multi-dimensional filtering, and various other features that improve the experience of participating in large discussions.

The current version of the application is functional for discussions on Reddit. This application will be released shortly. Until the application can be released, here is a video that presents many of the unique features built into the application. For best results, watch the video with HD turned on, or download a high-resolution version from Vimeo. More soon!

VISUALIZATION GALLERY

Here is a sample of patterns seen with the visualizations built into the application. Each of these visualizations presents unique insight into the nature of the conversation, and helps in discerning points of interest within a large conversation.

PUBLICATION

Narayan, Srikanth and Cheshire, Coye – “Not too long to read: The tldr Interface for Exploring and Navigating Large-Scale Discussion Spaces”. To appear in The 43rd Annual Hawaii International Conference on System Sciences – Persistent Conversations Track – Jan 2010

David Rumsey's website redesign

Dec 14th, 2009 by analyticjournalism

New davidrumsey.com Website Redesign

“For the first time since its launch in 1999, the www.davidrumsey.com website has been completely redesigned and updated. With better navigation and structure, users will find it easier to explore the site's many viewers and collection database with over 21,000 maps online. A new Blog has been added to the site, and includes entries for Recent Additions, News, Featured Maps, Related Sites, and Videos. Over 200 historic maps from the collection can be viewed in a new browser-based version of Google Earth, and users can enter the Second Life version of the map collection directly from a dedicated Second Life portal page on the site. And the collection ticker at the bottom of the home page shows the entire online map library in random order over about 10 hours. As always, all maps can be downloaded for free directly from the site at full resolution. And a new service from Pictopia allows purchase of reproductions of any map in the collection directly from the new LUNA viewing software.”

Cartography 2.0

Dec 11th, 2009 by analyticjournalism

From Internet Scout:

Cartography 2.0

http://cartography2.org/

“Professor Mark Harrower at the University of Wisconsin Madison's Department of Geography was frustrated with the “inability of traditional textbooks to keep pace with Web technologies.” So he and his colleagues set out to create Cartography 2.0, which is a “free knowledge base and e-textbook for students and professionals interested in interactive and animated maps.” First-time visitors might want to look over the “Purpose” section before diving into the separate “Chapters” of the book. All of the chapters can be found on the homepage, and they cover topics such as map animation, virtual globes, elements of design, and map interaction techniques. Each chapter contains descriptive essays, along with maps and diagrams that illustrate key principles. The “New Content” section on the homepage features the latest additions to the site, and overall this work is a model for educators who might be interested in crafting an engaging and dynamic online textbook.”

A fine how-to from FlowingData on making an "Interactive Area Graph"

Dec 9th, 2009 by analyticjournalism

Nathan, the guy behind the code at the FlowingData blog, offers up a good how-to for producing an interactive area graph.

How to Make an Interactive Area Graph with Flare

Posted by Nathan / Dec 9, 2009 to Tutorials / 3 comments

You've seen the NameExplorer from the Baby Name Wizard by Martin Wattenberg. It's an interactive area chart that lets you explore the popularity of names over time. Search by clicking on names or typing in a name in the prompt. It's simple. It's sexy. Everybody loves it.

This is a step-by-step guide on how to make a similar visualization in Actionscript/Flash with your own data and how to customize the design for whatever you need. We're after last week's graphic on consumer spending:

[Image: the consumer spending area graph]

Audience

This tutorial is for people with at least a little bit of programming experience. I'll try to make it as straightforward as possible, but the concepts might be a little hard to grasp if you've never written a line of code. Just a heads up. Of course it never hurts to try.

If you don't care about customization or integration into an application and don't mind putting your data in the public domain, you could also just dump your data into Many Eyes, and use the Stack Graph.

Get Adobe Flex Builder

Like I said, this is all in Actionscript, so before we start anything, I strongly recommend you get Adobe Flex Builder if you don't already have it. You can buy it, get a trial version from the Adobe site, or if you're in education, you can get it for free.

There are ways to compile Actionscript without Flex Builder, but they are more complicated. [read more here]

Swimming in Data? Three Benefits of Visualization

Dec 4th, 2009 by analyticjournalism

Good piece on dataviz from Harvard Business Publishing.

John Sviokla, The Near Futurist

Swimming in Data? Three Benefits of Visualization

4:11 PM Friday December 4, 2009

Tags: Information & technology, Knowledge management

“A good sketch is better than a long speech…” — a quote often attributed to Napoleon Bonaparte

The ability to visualize the implications of data is as old as humanity itself. Yet due to the vast quantities, sources, and sinks of data being pumped around our global economy at an ever increasing rate, the need for superior visualization is great and growing. To give dimension to the size of the challenge, the EMC reports that the “digital universe” added 487 exabytes — or 487 billion gigabytes — in 2008. They project that in 2012, we will add five times as much digital information as we did last year.

I believe that we will naturally migrate toward superior visualizations to cope with this information ocean. Since the days of the cave paintings, graphic depiction has always been an integral part of how people think, communicate, and make sense of the world. In the modern world, new information systems are at the heart of all management processes and organizational activities.

About ten years ago, I vividly remember visiting the Cabinet War Rooms in the basement of Whitehall, where Churchill had his war room during WW II. The desks were full of phones, and the walls covered with maps and information about troop levels and movements. These used color coded pieces of string to help Churchill's team easily understand what was happening:

On the one hand, I was struck by how primitive their information environment was only sixty years ago. But on the other, I found it reassuring to see how similar their approach was to war fighting today. The mode, quality and speed of data capture have changed greatly from the 1940s, but the paradigm for visualization of the terrain, forces, and strategy is almost identical to that of WWII. So, the good news is that even in a world of information surplus, we can draw upon deep human habits on how to visualize information to make sense of a dynamic reality. [more]

Distributed Data Analysis at Facebook

Dec 1st, 2009 by analyticjournalism

This is a few months old, but we're wondering if any readers have used Hive or tried to deploy it in newsrooms, where “exploring and analyzing data…[is] everyone's responsibility.”

Distributed Data Analysis at Facebook

Tuesday, August 11, 2009 at 2:53pm

Exploring and analyzing data isn’t the responsibility of one team here at Facebook; it’s everyone’s responsibility. “Move fast” is one of our core values, and to facilitate fast data-driven decisions, the Data Infrastructure Team has created tools like Hive and its UI sidekick, HiPal, to make analyzing Facebook’s petabytes of data easy for anyone in the company. The Data Science team runs open tutorial sessions for groups eager to run their own analysis using these tools. And non-programmers on every team have fearlessly rolled up their sleeves to learn how to write Hive queries.

Today, Facebook counts 29% of its employees (and growing!) as Hive users. More than half (51%) of those users are outside of Engineering. They come from distinct groups like User Operations, Sales, Human Resources, and Finance. Many of them had never used a database before working here. Thanks to Hive, they are now all data ninjas who are able to move fast and make great decisions with data.

If you like to move fast and want to be a data ninja (no matter what team you are in), check out our Careers page.

Roll-your-own choropleth map with free tools

Nov 12th, 2009 by analyticjournalism

Nathan, honcho at FlowingData, has put together a fine tutorial on making a choropleth map using free tools. This is one bookmark you will want to save.

How to Make a US County Thematic Map Using Free Tools

Posted: 11 Nov 2009 10:57 PM PST

There are about a million ways to make a choropleth map. You know, the maps that color regions by some metric. The problem is that a lot of solutions require expensive software or have a high learning curve…or both. What if you just want a simple map without all the GIS stuff? In this post, I'll show you how to make a county-specific choropleth map using only free tools.

The Result

Here's what we're after. It's the most recent unemployment map from last week.

[Image: Unemployment in the United States]

Step 0. System requirements

Just as a heads up, you'll need Python installed on your computer. Python comes pre-installed on the Mac. I'm not sure about Windows. If you're on Linux, well, I'm sure you're a big enough nerd to already be fluent in Python.

We're going to make good use of the Python library Beautiful Soup, so you'll need that too. It's a super easy, super useful HTML/XML parser that you should come to know and love.

Step 1. Prepare county-specific data

The first step of every visualization is to get the data. You can't do anything without it. In this example we're going to use county-level unemployment data from the Bureau of Labor Statistics. However, you have to go through FTP to get the most recent numbers, so to save some time, download the comma-separated (CSV) file here.

Philip Meyer Awards – Call for Entries

Oct 16th, 2009 by analyticjournalism

The deadline for applications for the Philip Meyer Award is approaching. Established in 2005, the award was created to honor Philip Meyer’s pioneering efforts to utilize social science research methods to foster better journalism. The postmark deadline for entries is October 31, 2009.

The contest recognizes stories that incorporate survey research, probabilities and other social science tools in creative ways that lead to journalism vital to the community.

Three awards are given annually:

    * $500 for 1st place

    * $300 for 2nd place

    * $200 for 3rd place

Last year’s winning entries exposed bureaucratic lapses that have hindered the search for causes of SIDS; uncovered the NHTSA’s failure to consider non-deploying airbags as being a significant safety issue; and analyzed arrest and court data to reveal towns where blacks were being arrested in extraordinary numbers for minor offenses like loitering or jaywalking.

Work submitted for consideration must have been published between October 1, 2008 and September 30, 2009. A PDF of the entry form is available to download online (http://www.ire.org/resourcecenter/contest/MeyerEntryForm09.pdf). For additional information, please refer to the Philip Meyer Award FAQ page (http://ire.org/resourcecenter/contest/meyerawardfaq.html).

Award winners will be honored at the 2010 CAR Conference in Phoenix, Ariz.

The Philip Meyer Award is sponsored by the National Institute for Computer-Assisted Reporting, a joint program of IRE and the Missouri School of Journalism; the Knight Chair in Journalism at Arizona State University; and IRE.

—

Beth Kopine

Resource Center Director & Contest Coordinator

Investigative Reporters and Editors, Inc.

141 Neff Annex

Missouri School of Journalism

Columbia, MO 65211

Phone: 573-882-6668

Fax: 573-882-5431

Email: beth@ire.org

Could World of Warcraft be the new War and Peace?

Oct 2nd, 2009 by analyticjournalism

From the Nieman Foundation “Storyboard”:

Could World of Warcraft be the new War and Peace?

Whether Pacman or Halo first introduced you to video games, calling them “high art” might stretch the sensibilities. But boardwalk nickelodeons led to movies like The Godfather—could a similarly radical transformation be underway with games?

Narrative journalism draws many of its core principles from novels, films, and short stories. Elements like character development, scene-setting, and a narrative arc work whether the tale is true or made up.

Games, however, are different.

“There are characters and stories in games, just like there are characters and stories in linear media, so it feels like you’re dealing with something that’s in the same ballpark,” says Chris Swain, associate professor at the University of Southern California’s Games Institute. “But I actually believe that they’re very different.”
