A bit of creative Analytic Journalism, Oprah-wise
Aug 11th, 2008 by Tom Johnson

The NYTimes moves an interesting short today describing how a couple of economists did some creative analysis suggesting that Oprah was worth a million-plus primary votes for Obama.

Endorsement From Winfrey Quantified: A Million Votes

Published: August 10, 2008

Presidential candidates make the most of celebrity supporters, showing
them off in television ads and propping them on podiums to stand and
wave. No doubt Mike Huckabee’s aborted campaign for the Republican nomination got some sort of bump from those commercials of him with Chuck Norris, right?

Or maybe not. Politicians and pundits routinely claim that celebrity
endorsements have little sway on voters, and two economists set out
recently to test the premise. What they found was that at least one
celebrity does hold influence in the voting booth: Oprah Winfrey.

The economists, Craig Garthwaite and Timothy Moore of the University of Maryland, College Park, contend that Ms. Winfrey’s endorsement of Barack Obama last year gave
him a boost of about one million votes in the primaries and caucuses.
Their conclusions were based partly on a county-by-county analysis of
subscriptions to O: The Oprah Magazine and sales figures for books that
were included in her book club.

Those data points were cross-referenced with the votes cast for Mr.
Obama in various polling precincts. The results showed a correlation
between magazine sales and the vote share obtained by Mr. Obama, and
extrapolated an effect of 1,015,559 votes.

“We think people take political information from all sorts of sources
in their daily life,” Mr. Moore said in an e-mail message, “and for
some people Oprah is clearly one of them.”

In their as-yet-unpublished research paper on the topic, the economists
trace celebrity endorsements back to the 1920 campaign of Warren
Harding (who had Al Jolson, Lillian Russell and Douglas Fairbanks in his corner), and call Ms. Winfrey “a celebrity of nearly unparalleled influence.”

The economists did not, however, look at how Ms. Winfrey’s endorsement
of Mr. Obama may have affected her own popularity. A number of people —
women in particular — were angry that Ms. Winfrey threw her first-ever
political endorsement to a man rather than his female opponent.

The research did not try to measure the influence of other stars’
endorsements; for instance, no similar measures were available for
Obama supporters like the actress Jessica Alba or Pete Wentz of the
band Fall Out Boy. “If a celebrity endorsement is ever going to have an
empirically identifiable audience, then it is likely to be hers,” the
researchers said of Ms. Winfrey. Sorry, Chuck Norris.

More good work out of the UC Berkeley Viz Lab
Jul 31st, 2008 by Tom Johnson

A helpful post from Nathan at FlowingData

New Version of Flare Visualization Toolkit Released

Posted Jul 31, 2008 to Software, Visualization by Nathan

A new version of Flare, the data visualization toolkit for ActionScript (which means it runs in Flash), was released yesterday with a number of major improvements over the previous version. The toolkit was created and is maintained by the UC Berkeley Visualization Lab and was one of the first bits of ActionScript I got my hands on. The effort-to-output ratio was pretty satisfying, so if you want to learn ActionScript for data visualization, check out Flare. The tutorial is a good place to start.

Here are some sample applications created with Flare:

Direct to the Dashboard
Jan 28th, 2008 by JTJ

We've been fans of the dashboard approach for a long time because dashboard graphics can give readers a quick snapshot of multiple sets of dynamic data. Charley Kyd, who studied journalism some years back, has developed a nifty plug-and-play package, Dashboard Kit #1, to generate these. And below is a recent and relevant posting from Jorge Camoes that gives us some good tips on the topic.


10 tips to improve your Excel dashboard

Posted: 26 Jan 2008 06:42 PM CST

Posts in the series Excel Dashboard

  1. How to create a dashboard in Excel
  2. 10 tips to improve your Excel dashboard

Excel is a great (but underrated) BI tool. Several BI vendors have given up fighting it and offer Excel add-ins as front-ends for their BI solutions. So, if you want to create a dashboard you should consider Excel, since it offers better functionality than many other applications for a fraction of the cost and development time. I know that Excel is not a one-size-fits-all solution, but before looking elsewhere you should be sure that Excel cannot meet your requirements. Let me share with you some random tips from my experience with the Demographic Dashboard.

But, shouldn’t I just ask my IT department to create the dashboard?

This is a fact: many IT departments hate Excel. IT spends millions on BI solutions and users keep using Excel. Why? Because they know it, they like it, they feel in control and can do whatever they want with the data. Ask your BI manager to replicate the image above using an expensive BI solution and he'll come back six months later with something you didn't ask for, to answer a need you don't have anymore (I know, I'm oversimplifying…). Do you know Master Foo Defines Enterprise Data?

1. Get to the point, solve a business need

So, you have your idea for a dashboard, you've discussed the project with the users (right?) and you are ready. But where to start? Remember this: a graph, a table, the entire dashboard are merely instruments for solving a business need. It's about insights, not about data, not about design.

2. Don’t use formulas

Yes, I know, this is Excel, and it is supposed to have formulas. What I am telling you is that you should aim to minimize the number of independent formulas, and this should be a fundamental constraint on your overall strategy. Too often I see Excel used as a database application. It is not one; it is a spreadsheet (not everyone finds this obvious).

Over the years I have had my share of “spreadsheet hell”: a lookup formula in the middle of nowhere would reference a wrong range for no apparent reason. An update cycle adds a new column and suddenly there are errors all over the place. You leave the project for a week and when you come back you don't know what all those formulas mean. Even if everything goes smoothly, the auditing department wants to trace every single result.

But how do you minimize the use of formulas? If your data table resides in an Excel sheet you'll have to rely heavily on lookup formulas, and that's one of the highways to spreadsheet hell. Instead, get the data from an external source (Access, an OLAP cube…) and bring it into Excel. Calculations should be performed at the source. After removing all the formulas you can, the remaining ones should be as clear as possible.

3. Abuse Pivot Tables

Every object (graph, table) in the Demographic Dashboard is linked to a pivot table. Let me give you an example. One of the charts shows population growth over the years, using 1996 as the reference. Pivot tables can calculate that directly; I don't need to add a new layer of complexity by using formulas (to calculate the actual values and lookup formulas to retrieve them).

The population table has 200,000 records, so I couldn't fit it into Excel's limit of 65 thousand rows (yes, that has changed in Excel 2007, but it is debatable whether a table with a million rows in a spreadsheet application can be considered good practice). By using a pivot table I can overcome that limit.
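The pivot-table idea here, aggregating raw records once instead of maintaining per-row formulas, translates outside Excel as well. A minimal Python sketch (the records, regions and figures below are invented for illustration, not the Demographic Dashboard's actual data):

```python
from collections import defaultdict

# Hypothetical records: (year, region, population) rows, standing in
# for a large population table like the dashboard's 200,000 records.
records = [
    (1996, "North", 1000), (1996, "South", 2000),
    (2000, "North", 1100), (2000, "South", 2100),
    (2004, "North", 1250), (2004, "South", 2050),
]

# Pivot-style aggregation: total population per year.
totals = defaultdict(int)
for year, region, pop in records:
    totals[year] += pop

# Growth index with 1996 as the reference year (1996 = 100),
# the same derived measure a pivot table can compute directly.
base = totals[1996]
growth = {year: round(100 * total / base, 1)
          for year, total in sorted(totals.items())}
print(growth)
```

The point mirrors the tip: the raw table is aggregated in one pass at the source, and the report layer only reads the small summary, so no lookup formulas need to touch the 200,000 rows.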

4. Use named ranges

Being able to use self-documenting formulas ("=sales-costs" is much simpler to understand than "=$D$42-$F$55") is one of several uses of named ranges. But they are also the building blocks of interaction with the user, and they make your Excel dashboard more robust.

5. Use as many sheets as you need, or more

You don't have to pay for each additional sheet you use in a workbook, so use as many as you need. Each cell in your dashboard report sheet should point to some other sheet where you actually perform the calculations. You should have at least three groups of sheets: one with the dashboard report itself, one with the base data and another with supporting data, definitions, parameters, etc. Also add a glossary sheet and a help sheet.

6. Use autoshapes as placeholders

Once you know what you need, start playing with the dashboard sheet. Use autoshapes to test alternative layouts or, better yet, use real objects (charts, tables…) linked to some dummy data.

7. Get rid of junk

There are two ways to wow your users: by designing a dashboard that actually answers needs, or by planting gauges and pie charts all over the place (the latter can guarantee you a promotion in some dubious workplaces, but it will not help you in the long run). In the series on Xcelsius Dashboards you can see how difficult it is to create something beyond the most basic and irrelevant charts.

So, get rid of Excel defaults (take a look at this before/after example) and just try to make your dashboard as clean and clear as possible. You’ll find many tips around here to improve your charts, so I’ll not repeat myself.

8. Do you really need that extra-large chart?

Charts are usually larger than they should be. What really matters in a chart is the pattern, not the individual values, and the pattern can be seen even in a very small chart.

9. Implement some level of interaction

A dashboard is not an exploratory tool; it is something that should give you a clear picture of what is going on. But I believe that at least a basic level of interaction should be provided. Users like to play with the tools, and they can learn a lot more than by just looking at a static image.

10. Document your work

Please, please, structure and document your workbook. Excel is a very flexible environment, but with flexibility comes responsibility… I am not a very organized person myself, but from time to time I try the tourist's point of view: I pretend I have never seen the file in my life and try to understand it. If I can't, or it takes me too long, either I must redesign it or write a document that explains the basic structure and flow.

Bonus tip: there is always something missing…

Once you have a prototype, users will come up with new ideas. Some of them can be implemented; others will ruin your project, and if you accept them you'll have to restart from scratch. So make sure the specifications are understood and approved, and that the consequences of a radical change are clear.

This is far too incomplete, but I’ll try to improve it. Will you help? Do you have good tips specific to the design of Excel dashboards? Please share them in the comments.


The Dataweb and the DataFeret
Jan 3rd, 2008 by Tom Johnson

Marylaine Block's always informative “Neat New Stuff” [Neat New Stuff I Found This Week at] tipped us to TheDataWeb site and its interesting tool, DataFerrett.

“TheDataWeb is a network of online data libraries that the DataFerrett application accesses the data through. Data topics include, census data, economic data, health data, income and unemployment data, population data, labor data, cancer data, crime and transportation data, family dynamics, vital statistics data, . . . As a user, you have an easy access to all these kinds of data. As a participant in TheDataWeb, you can publish your data to TheDataWeb and, in turn, benefit as a provider to the consumer of data.”

What is DataFerrett?
DataFerrett is a unique data mining and extraction tool. It allows you to select a databasket full of variables and then recode those variables as you need. You can then develop and customize tables. By selecting the results in your table you can create a chart or graph for visual presentation in an HTML page. Save your data in the databasket and save your table for continued reuse. DataFerrett helps you locate and retrieve the data you need across the Internet to your desktop or system, regardless of where the data resides. DataFerrett:
* lets you receive data in the form in which you need it (whether extracted to an ASCII, SAS, SPSS, or Excel/Access file);
* lets you move seamlessly between query, analysis, and visualization of data in one package;
* lets data providers share their data more easily and manage their own online data.
DataFerrett runs from the application icon installed on your desktop.

Check it out at


SoCal fire maps
Oct 24th, 2007 by Tom Johnson

Today, literally hundreds of square kilometers of Southern California, from Los Angeles to San Diego, are burning. Some very alert newspapers and radio stations, though, are using Google Maps and a service called Twitter to update the maps on a regular basis. A good example, I think, of the applied tools of analytic journalism.

Southern California fires on Google Maps


The Coming Phase
Sep 23rd, 2007 by Tom Johnson

We were pleased to see last week (via the NICAR listserv) that multiple newspapers, at least in the U.S., have discovered they can get public-records databases, create specialized look-up tools for their front ends and post them on their websites. Let's keep on keeping on with this.
It seems quite possible that the next phase of bringing bits and bytes to the people might well be in the realm of 3D, mapping and simulation modeling. To that end, take a look at the “Terrain Tools & Software Packages” jumpstation. This is a nifty collection of commercial and open-source apps that can make your job easier and more interesting.

More on Benford's Law
Jul 30th, 2007 by JTJ

We've long been intrigued by Benford's Law and its potential for Analytic Journalism. Today we ran across a new post by Charley Kyd that both explains the Law and presents some clear formulas for its application.

An Excel 97-2003 Tutorial:

Use Benford's Law with Excel

To Improve Business Planning

Benford's Law addresses an amazing characteristic of data. Not only does the formula help to identify fraud, it could also help you to improve your budgets and forecasts.

by Charley Kyd

July, 2007


(Follow this link for the Excel 2007 version.)

Unless you're a public accountant, you probably haven't experimented with Benford's Law.

Auditors sometimes use this fascinating statistical insight to uncover fraudulent accounting data. But it might reveal a useful strategy for investing in the stock market. And it might help you to improve the accuracy of your budgets and forecasts.

This article will explain Benford's Law, show you how to calculate it with Excel, and suggest ways that you could put it to good use.

From a hands-on-Excel point of view, the article describes new uses for the SUMPRODUCT function and discusses the use of local and global range names.  [Read more…]
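Kyd's tutorial works the formulas in Excel with SUMPRODUCT; as a rough companion sketch in Python (not his implementation), the law's expected first-digit distribution, P(d) = log10(1 + 1/d), can be compared against a dataset's observed leading digits. The powers-of-two dataset below is chosen only because it is a well-known Benford-conforming sequence:

```python
import math
from collections import Counter

def benford_expected():
    """Benford's Law: P(first digit = d) = log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}

def first_digit_freq(values):
    """Observed relative frequency of leading digits (positive numbers)."""
    digits = [int(str(v)[0]) for v in values]
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}

# Powers of 2 are a classic Benford-conforming sequence.
data = [2 ** k for k in range(1, 200)]
expected = benford_expected()
observed = first_digit_freq(data)
for d in range(1, 10):
    print(d, round(expected[d], 3), round(observed[d], 3))
```

A digit 1 should lead about 30.1% of the time and a digit 9 only about 4.6%; large gaps between the expected and observed columns are what auditors treat as a flag worth investigating.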



Simulation modeling
Jul 21st, 2007 by JTJ

Assoc. Prof. Paul M. Torrens, at Arizona State University's School of Geographical Sciences (torrens at geosimulation dot com) continues to turn out interesting simulation models. Most recently they are about crowd movement, but the methods are applicable to many venues. See his work at


The Beauty of Statistics
Jul 11th, 2007 by JTJ

FYI: From the O'Reilly Radar

Unveiling the Beauty of Statistics

Posted: 11 Jul 2007 03:01 AM CDT

By Jesse Robbins

I presented last week at the OECD World Forum in Istanbul along with Professor Hans Rosling, Mike Arrington, John Gage and teams from MappingWorlds, Swivel (disclosure: I am an adviser to Swivel) and Many Eyes. We were the “Web2.0 Delegation” and it was an incredible experience.

The Istanbul Declaration signed at the conference calls for governments to make their statistical data freely available online as a “public good.” The declaration also calls for new measures of happiness and well-being, going beyond just economic output and GDP. This requires the creation of new tools, which the OECD envisions will be “wiki for progress.” Expect to hear more about these initiatives soon.

This data, combined with new tools like Swivel and MappingWorlds, is powerful. Previously this information was hard to acquire and the tools to analyze it were expensive and hard to use, which limited its usefulness. Now regular people can access, visualize and discuss this data, creating an environment where knowledge can be shared and explored.

H.G. Wells predicted that “Statistical thinking will one day be as necessary for efficient citizenship as the ability to read or write.” Proponents of specific public policies often use statistics to support their view; they can select the data that fit the policy. Democratization of statistics allows citizens to see the data that doesn't fit the policy, giving the public the power to challenge policymakers with new interpretations.

I highly recommend you watch Professor Rosling's exceptional summary of these exciting changes (where I got the title for this post), as well as his talks at TED.



Doing urban modeling with real data
Jul 3rd, 2007 by JTJ

Once again, O'Reilly's Radar tips us to an interesting application of cell phone GPS data, this time to illustrate daily traffic activity in Rome.

Real Time Rome: Using Cellphones To Model a City's Movements

Posted: 02 Jul 2007 01:14 PM CDT

By Brady Forrest

[Image: Rome at different times of the day]

MIT's Senseable City Lab is using cellphone data to model Rome's populations. The project is called Real Time Rome. It was an exhibit at the architecture conference La Biennale di Venezia's show Global Cities (Sept 10 – Nov 19, 2006).

The exhibit is described in an MIT article:

Real Time Rome features seven large animations, projected on transparent plexiglass screens. One screen shows traffic congestion around the city, while another screen shows the exact movements of all the city's buses and taxis. Another screen is able to track Romans celebrating major events like the World Cup or the city's annual White Nights festival (Notte Bianca, which will happen on Sept. 9, the evening before the Biennale's architecture exhibition opening). Additional screens show how tourists use urban spaces and how cars and pedestrians move about the city.

and how the data was collected:

Ratti's team obtains its data anonymously from cell phones, GPS devices on buses and taxis, and other wireless mobile devices, using advanced algorithms developed by Telecom Italia, the principal sponsor of the project. These algorithms are able to discern the difference between, say, a mobile phone signal from a user who is stuck in traffic and one that is sitting in the pocket of a pedestrian wandering down the street. Data are made anonymous and aggregated from the beginning, so there are no implications for individual privacy.

This certainly would be a more cost-effective method of gathering traffic data for determining commute times. Imagine if predictive systems could prepare us for the onslaught of traffic from a baseball game just letting out by watching the fans head toward their cars. Or let us know that a highway is about to be flooded by traffic from a side road. Would you put up with your location being (formally) tracked in exchange for this service?

[BBC via Data Mining]
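Telecom Italia's actual algorithms are proprietary, but the basic idea the article describes, telling a phone stuck in traffic apart from one in a pedestrian's pocket, comes down to classifying a movement trace by its speed profile. A toy sketch (the thresholds and labels are invented for illustration, not taken from the Real Time Rome project):

```python
def classify_trace(speeds_kmh):
    """Label a movement trace from a sequence of speed samples (km/h).

    Toy heuristic: pedestrians average walking speed; congested vehicles
    show stop-and-go (low average but high peaks); everything else is
    treated as free-flowing traffic. Thresholds are illustrative only.
    """
    avg = sum(speeds_kmh) / len(speeds_kmh)
    if avg < 7:
        return "pedestrian"
    if avg < 15 and max(speeds_kmh) > 25:
        return "vehicle in congestion"  # stop-and-go pattern
    return "vehicle in free flow"

print(classify_trace([4, 5, 5, 6]))           # pedestrian
print(classify_trace([0, 30, 2, 28, 0, 25]))  # vehicle in congestion
print(classify_trace([50, 55, 60]))           # vehicle in free flow
```

Aggregating such labels over thousands of anonymized traces, rather than inspecting any individual one, is what lets a system paint the kind of city-wide congestion picture the exhibit projected.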

