An updated data clean-up tool: Google Refine
Nov 14th, 2010 by Tom Johnson

Check out Google Refine at



Google Refine is a power tool for working with messy data, cleaning it up, transforming it from one format into another, extending it with web services, and linking it to databases like Freebase.


Another fine tool for slicing and dicing data….
Nov 9th, 2010 by Tom Johnson

From …..

Find the names in your data with Mr. People

November 8, 2010 to Online Applications

Inspired by Shan Carter's simple data converter, appropriately named Mr. Data Converter, Matthew Ericson just put Mr. People online. The tool lets you paste a list of names, and it will parse the first and last name, suffix, title, and other parts for you. You can even have multiple names in a single row.

Years ago, while trying to clean up the names of donors in campaign finance data from the Federal Election Commission, I hacked together a Perl module — loosely based on the Lingua-EN-NameParse module — to standardize names. One port to Ruby later, I've finally put together a Web front end for it.

Getting data in the right format, whether for analysis or visualization, can be a huge pain. Imagine. All the data you need is right in front of you, but you can't do anything with it yet, because as often is the case, it's not in a nice and pretty rectangular format. So anything that makes this easier and quicker is an instant bookmark for me.

[Mr. People via @mericson]
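
For readers curious about what this kind of name parsing involves, here is a minimal Python sketch. It is not Mr. People's code and not the Lingua-EN-NameParse algorithm; the title and suffix lists and the function name `parse_name` are our own assumptions, and it handles only one name per string.

```python
# Hypothetical lookup lists; a real parser would need far longer ones.
TITLES = {"mr", "mrs", "ms", "dr", "sen", "rep", "gov"}
SUFFIXES = {"jr", "sr", "ii", "iii", "iv", "md", "phd"}


def _key(token):
    """Normalize a token for lookup: lowercase, periods removed."""
    return token.lower().replace(".", "")


def parse_name(raw):
    """Split one name string into title, first, middle, last and suffix."""
    raw = raw.strip()
    if "," in raw:
        head, tail = [part.strip() for part in raw.split(",", 1)]
        if _key(tail) in SUFFIXES:
            raw = f"{head} {tail}"   # "John Smith, Jr."
        else:
            raw = f"{tail} {head}"   # "Smith, John Q." becomes "John Q. Smith"
    tokens = raw.replace(",", " ").split()

    # A leading token from the title list is treated as the title.
    title = tokens.pop(0) if tokens and _key(tokens[0]) in TITLES else ""

    # The first suffix token after the first name is pulled out.
    suffix = ""
    for i, token in enumerate(tokens):
        if i > 0 and _key(token) in SUFFIXES:
            suffix = tokens.pop(i)
            break

    first = tokens[0] if tokens else ""
    last = tokens[-1] if len(tokens) > 1 else ""
    middle = " ".join(tokens[1:-1])
    return {"title": title, "first": first, "middle": middle,
            "last": last, "suffix": suffix}
```

Calling `parse_name("Smith, John Q.")` and `parse_name("Dr. Martin Luther King, Jr.")` exercises both the "Last, First" order and the trailing-suffix case. Real donor lists are messier than this (compound surnames, organizations, several names in one row), which is exactly why a purpose-built tool is worth bookmarking.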


Agent-Based Modelling: The Next 15 Years
Nov 2nd, 2010 by Tom Johnson

The Journal of Artificial Societies and Social Simulation has been around for 13 years, and it has become increasingly important for analytic journalists who believe that simulation modeling is, and increasingly will be, a keystone perspective for serious, value-added journalism.  And it's one of the FREE e-journals available.  Support it if you can.

In the meantime, if you're not familiar with agent-based modeling, check out this article by Lynne Hamill; it will point you to some useful concepts and tools:

Agent-Based Modelling: The Next 15 Years

by Lynne Hamill


This short note makes recommendations for the future direction of research in agent-based modelling (ABM). It is a personal view based on my experience as a policy adviser who has recently come to ABM. I suggest that to promote the use of ABM, the ABM community needs to demonstrate the value of modelling to other social scientists by showing-by-doing and offering training projects; and to produce tools, guidance on good-practice and basic building blocks. Then the policy contexts most likely to benefit from ABM need to be identified along with any new data requirements, so that the usefulness of ABM can be demonstrated to policy analysts. This is, in my view, the challenge facing the ABM community for the next 15 years.


Race and ethnicity mapped by block
Sep 20th, 2010 by Tom Johnson

Census tract data and maps, while better than nothing, can often deceive because the size of the tract is greatly influenced by population size, not area. It is not uncommon that natural and constructed barriers — mountains or freeways — influence the movements and spatial demographics of a tract. Ah, but BLOCK data, now there is some fine, fine-grained data that we can use to extract insights and meaning.
Once again, tips us off to a good visualization of population data and the resulting maps.


Race and ethnicity mapped by block

Sep 20, 2010 to Mapping

A taxonomy of transitions

Instead of breaking up demographics by defined boundaries, Bill Rankin uses dots to show the more subtle changes across neighborhoods in a map of Chicago using block-specific data from the US Census.

Any city-dweller knows that most neighborhoods don't have stark boundaries. Yet on maps, neighborhoods are almost always drawn as perfectly bounded areas, miniature territorial states of ethnicity or class. This is especially true for Chicago, where the delimitation of Chicago's official “community areas” in the 1920s was one of the hallmarks of the famous Chicago School of urban sociology.

Each dot represents 25 people of the map color's corresponding ethnicity.
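
The dot-per-25-people idea can be sketched in a few lines of Python. This is only an illustration of the general technique, not Rankin's or Fischer's actual method: we assume each block can be stood in for by a simple rectangle, that dots are scattered uniformly at random inside it, and that leftover people below a full 25 are dropped. The function and parameter names are hypothetical.

```python
import random

PEOPLE_PER_DOT = 25  # the ratio stated for these maps


def dots_for_block(counts, bbox, rng=None):
    """Return a list of (x, y, group) dots for one census block.

    counts: dict mapping a group name to its population in the block
    bbox:   (min_x, min_y, max_x, max_y), a rectangle standing in for
            the block's real outline (an assumption of this sketch)
    rng:    optional random.Random, so results can be reproduced
    """
    rng = rng or random.Random()
    min_x, min_y, max_x, max_y = bbox
    dots = []
    for group, population in counts.items():
        # One dot per full 25 people; the remainder is ignored here.
        for _ in range(population // PEOPLE_PER_DOT):
            x = rng.uniform(min_x, max_x)
            y = rng.uniform(min_y, max_y)
            dots.append((x, y, group))
    return dots
```

Because the dots are placed block by block rather than tract by tract, the random scatter stays inside a very small area, which is what lets the gradual transitions between neighborhoods show up.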

Eric Fischer takes the next step and applies the same method to forty major cities. Here are the maps for Los Angeles, San Francisco, and New York, respectively. Same color-coding applies. You definitely see the separation, but zoom in and you see much more subtle transitions.

[Eric Fischer via Data Pointed]

Taking control of Google Maps
Sep 8th, 2010 by Tom Johnson

Patrick Cain, who correctly describes himself as “a journalist who makes maps for the Web,” has posted a couple of neat sets of tips to his blog. Basically, they suggest ways to tweak some of Google's code to improve presentation. Check out his blog tips at


Simplifying map display

I’ve never been a fan of the way Google Maps handles local labels (neighbourhoods, for example) – they are often redundant, inconsistent and wrong, as well as cluttering the map visually.

These examples didn’t take long to collect:

Leslieville is so nice they labeled it twice:

Same with the Bridle Path, more or less:

Google has solved the unresolvable Beach vs. Beaches debate by using both labels:

Forest Hill South is not an ambition of well-off Annexites, but is actually north of Forest Hill:

Ramping up your statistical skills
Sep 3rd, 2010 by Tom Johnson

From FlowingData….

Statistical literacy guides for the basics

Sep 3, 2010 to Statistics

Guide to statistical charts - before and after

“You can get pretty far with data graphics with just limited statistical knowledge, but if you want to take your skills, resume, and portfolio to the next level, you should learn standard data practices. Of all places, UK Parliament has some short and free guides to help you with basic statistical concepts. They provide 13 notes, each only two or three pages long that can help you with stuff like how to adjust for inflation, confidence intervals and statistical significance, or basic graph suggestions [pdf]. I like.”
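
Two of the concepts mentioned above, adjusting for inflation and confidence intervals, come down to short formulas. The Python sketch below shows the standard textbook versions; it is not taken from the Parliament guides, the function names are our own, and the interval uses the common normal approximation for a sample proportion, which is only reasonable for fairly large samples.

```python
import math


def adjust_for_inflation(amount, index_then, index_now):
    """Express an old amount in today's money using a price index.

    index_then and index_now are values of the same price index
    (for example a consumer price index) in the two periods.
    """
    return amount * index_now / index_then


def proportion_confidence_interval(p, n, z=1.96):
    """Approximate confidence interval for a sample proportion.

    p: observed proportion (0 to 1), n: sample size,
    z: 1.96 corresponds to roughly 95 percent confidence.
    """
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin
```

With made-up numbers: an amount of 100 when the index stood at 80 is worth `adjust_for_inflation(100, 80, 120)` when the index reaches 120, and a poll of 100 people split 50/50 gives `proportion_confidence_interval(0.5, 100)`, a margin of a little under ten percentage points either way.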



WSJ goes the distance with the Google Maps API
Aug 10th, 2010 by Tom Johnson

A good piece on how the Wall Street Journal crew created a fine set of maps illustrating various major-city marathons.  Go here for the complete piece.

WSJ goes the distance with the Google Maps API

Sunday, August 08, 2010

The following guest blog post was written by Albert Sun of the Wall Street Journal. He takes us behind the scenes in the creation of a recent news graphic titled: “Going the Distance: Comparing Marathons”.

The Google Maps API has been a great boon for news websites and a great help in creating all kinds of interactive graphics involving maps. Here at the WSJ we're big fans of the API and happy that Google continues to improve it and roll out new features.

We got the idea to map out marathon routes from a story by Kevin Helliker about how, despite its beautiful scenic route, the San Francisco marathon was still very unpopular. The difficulty and the hilly terrain kept people from attempting it. To help people see this better, we decided to compare the San Francisco marathon to the big three US marathons: Boston, New York and Chicago.

The code for our marathons graphic grew out of a similar graphic we did for our coverage of the Tour de France. In this one, we managed to incorporate many improvements. Two new features of the Google Maps API played a big role in this graphic. The Elevation API let us quickly and easily get a comparison between the different routes.

Styled Maps let us give the map more of a distinctive WSJ look. We have a distinctive style for our maps in print, and there is some reluctance to run maps online that deviate from that style. Styled Maps lets us get close enough for what we're trying to show. When Styled Maps first became available, we used the Styled Map Wizard to create a set of different looks for different types of maps, trying to recreate our own map style.

Along with the Google Maps API, we used jQuery for its wealth of convenience functions and how much easier it makes writing programs in JavaScript. The core of the graphic is a basic Polyline drawn in Google Maps showing the route.  [more]
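
To make the route comparison concrete, here is a small Python sketch of one way to summarize how hilly a course is once you have elevation samples along it. This is our own illustration, not the WSJ's code, and it makes no API calls: it assumes the elevations (in meters, in route order) have already been obtained from some source, and the function name is hypothetical.

```python
def climb_and_descent(elevations_m):
    """Return (total_climb, total_descent) in meters for a route.

    elevations_m: elevation samples in the order they occur along
    the route. Each uphill step adds to the climb and each downhill
    step adds to the descent.
    """
    climb = 0.0
    descent = 0.0
    for previous, current in zip(elevations_m, elevations_m[1:]):
        change = current - previous
        if change > 0:
            climb += change
        else:
            descent -= change
    return climb, descent
```

Running this on the sample profile of each marathon gives a pair of numbers per race that can be compared directly. Note that the totals depend on how closely the samples are spaced, so routes should be sampled at similar intervals before comparing them.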


Internship in Bolivia
Jul 14th, 2010 by Tom Johnson

Jack Kinsella, of the Bolivian Express, writes:

“The Bolivian Express is an English-language magazine in La Paz, Bolivia, set up by Bolivian graduates in collaboration with students from around the world. We are a subsidiary of the Grupo Express Press, which publishes another magazine in Bolivia, Revista Metro. We would love it if you could include us in your database of journalism internships and feature us on your website.

“The Bolivian Express has just started an ongoing journalism internship program in Bolivia where interns take Spanish classes, journalism classes, photography classes and cinematography classes. Participants are paired with Bolivians in La Paz and are then expected to explore Bolivian culture, eventually producing four pages of content for our magazine each month. This content is then passed to our editors who offer feedback, helping to improve our interns' writing skills. Due to the large number of classes offered, our internship is perfect for students with a strong passion for learning.

“Our magazine is distributed on the ground, in the skies and, within the next week, online.

“For more information see our website:

Use "BatchGeo" to quickly generate Google Maps with multiple locations
Jun 19th, 2010 by Tom Johnson

If you've acquired a spreadsheet file with a bunch of addresses, you can quickly map them using BatchGeo.  We haven't tried it yet with a huge data set, but it works nicely with a couple hundred addresses.  Check out BatchGeo at

“Have locations in a spreadsheet? Well, try this free and unique tool to…

  • Map them using Google Maps
  • Publish a map on your Web site
  • Create a store locator
  • Get coordinates, print maps, and more!

Get started by following the steps below, or check out our video tutorials

What could I use this for?

  • Create a map – Copy directly from a spreadsheet program such as Excel, Numbers, or the free Google Docs or OpenOffice Calc.
  • Distance Calculator – Calculate the distance in miles or kilometers to several locations from a single address.
  • Satellite Photos – Addresses are linked to Google Maps for satellite photos and driving directions.
  • Make your own Google Earth KML – Quickly create KML files with your address data for 3D viewing in Google Earth.
  • Get postal codes / zip codes – Retrieve postal or zip codes for a given address internationally.
  • Print a map – Make a printable map with your addresses on it.
  • Save a map – Create a map with your locations and associated data to a web page for later use.
  • Create a store locator – Map your store properties, and then link to them from your website.
  • Get center coordinates (centroids) for a listing of zip codes, cities, or states.
  • For quick single address geocodes, zip code, city or state centroids use our Single Address Lookup Tool.”
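
If you want to check a distance calculation like the one in the list above, the usual approach is the haversine formula. The Python sketch below is our own illustration and not BatchGeo's implementation: it assumes the addresses have already been geocoded to latitude and longitude, it treats the Earth as a sphere, and it returns straight-line ("as the crow flies") distances rather than driving distances. The function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0   # mean radius of a spherical Earth
KM_PER_MILE = 1.609344


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def distances_from(origin, places):
    """Return (name, km, miles) tuples, nearest first.

    origin: (lat, lon) of the single starting address
    places: dict mapping a place name to its (lat, lon)
    """
    results = []
    for name, (lat, lon) in places.items():
        km = haversine_km(origin[0], origin[1], lat, lon)
        results.append((name, km, km / KM_PER_MILE))
    return sorted(results, key=lambda row: row[1])
```

Feeding in one newsroom address and a couple hundred geocoded locations gives a sorted list that is easy to paste back into a spreadsheet.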

JavaScript Solutions for Charts and Graphs
Jun 15th, 2010 by Tom Johnson

15 Crazy Useful JavaScript Solutions for Charts and Graphs

Graphs and charts are a great way to break down the information at hand to the user in a descriptive and visually enticing manner. These visual structures allow you to easily simplify complex data and output easier-to-understand content. Everyone can use a graph or chart; however, not everyone has the right tools to create an effective one. Below we’ve compiled the best JavaScript graph and chart solutions. We chose to compile a list of JavaScript solutions because of their flexibility and functionality.

