Alfredo Covaleda,
Bogota, Colombia
Stephen Guerin,
Santa Fe, New Mexico, USA
James A. Trostle,
Trinity College, Hartford, Connecticut, USA
Today, literally hundreds of square kilometers of Southern California — Los Angeles to San Diego — are burning. Some very alert newspapers and radio stations, though, are using Google Maps and a program called Twitter (www.twitter.com) to update the maps on a regular basis. A good example, I think, of applied tools of analytic journalism.
Southern California fires on Google Maps
We've long been intrigued with Benford's Law and its potential for Analytic Journalism. Today we ran across a new post by Charley Kyd that both explains the Law and presents some clear formulas for its application.
An Excel 97-2003 Tutorial:
by Charley Kyd July, 2007 (Email Comments)
(Follow this link for the Excel 2007 version.)
Unless you're a public accountant, you probably haven't experimented with Benford's Law.
Auditors sometimes use this fascinating statistical insight to uncover fraudulent accounting data. But it might reveal a useful strategy for investing in the stock market. And it might help you to improve the accuracy of your budgets and forecasts.
This article will explain Benford's Law, show you how to calculate it with Excel, and suggest ways that you could put it to good use.
From a hands-on-Excel point of view, the article describes new uses for the SUMPRODUCT function and discusses the use of local and global range names. [Read more…]
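For readers who would rather experiment outside Excel, here is a minimal Python sketch of the idea. It is not Kyd's method (his tutorial uses SUMPRODUCT and range names). It assumes only the standard statement of Benford's Law, that a leading digit d is expected with probability log10(1 + 1/d). The function names are our own, and the sample data (the powers of 2) are purely illustrative.

```python
import math
from collections import Counter


def benford_expected():
    """Expected share of each leading digit 1-9 under Benford's Law."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


def leading_digit(value):
    """First significant digit of a nonzero number.

    Uses scientific notation (e.g. '1.46...e+05'), so the first
    character of the formatted string is the leading digit.
    """
    return int(f"{abs(value):.15e}"[0])


def observed_shares(values):
    """Share of values starting with each digit 1-9 (zeros are skipped)."""
    nonzero = [v for v in values if v != 0]
    counts = Counter(leading_digit(v) for v in nonzero)
    return {d: counts.get(d, 0) / len(nonzero) for d in range(1, 10)}


# Illustrative data only: powers of 2 are a textbook Benford-like sequence.
sample = [2 ** n for n in range(1, 101)]
expected = benford_expected()
observed = observed_shares(sample)

for digit in range(1, 10):
    print(f"{digit}: expected {expected[digit]:.3f}, observed {observed[digit]:.3f}")
```

With real figures (expense claims, vote tallies, budget lines), a large gap between the observed and expected columns is a reason to ask questions, not proof of fraud. Plenty of legitimate data sets do not follow Benford's Law.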
FYI: From the O'Reilly Radar
Unveiling the Beauty of Statistics
Posted: 11 Jul 2007 03:01 AM CDT
By Jesse Robbins
I presented last week at the OECD World Forum in Istanbul along with Professor Hans Rosling, Mike Arrington, John Gage and teams from MappingWorlds, Swivel (disclosure: I am an adviser to Swivel) and Many Eyes. We were the “Web2.0 Delegation” and it was an incredible experience.
The Istanbul Declaration signed at the conference calls for governments to make their statistical data freely available online as a “public good.” The declaration also calls for new measures of happiness and well-being, going beyond just economic output and GDP. This requires the creation of new tools, which the OECD envisions will be “wiki for progress.” Expect to hear more about these initiatives soon.
This data combined with new tools like Swivel and MappingWorlds is powerful. Previously this information was hard to acquire and the tools to analyze it were expensive and hard to use, which limited its usefulness. Now, regular people can access, visualize and discuss this data, creating an environment where knowledge can be shared and explored.
H.G. Wells predicted that “Statistical thinking will one day be as necessary for efficient citizenship as the ability to read or write.” Proponents of specific public policies often use statistics to support their view. They have the ability to select the data to fit with the policy. Democratization of statistics allows citizens to see the data that doesn't fit the policy, giving the public the power to challenge policymakers with new interpretations.
I highly recommend you watch Professor Rosling's exceptional summary of these exciting changes (where I got the title for this post), as well as his talks at TED.
So the NYT did backtrack on the percent-of-change error described yesterday without assigning blame. That's fine. But the correction suggests another big story that we have only seen parts of. That is, of all the U.S. presence in Iraq — military and contractors — how many and what proportion are actually on the streets and how many and in what capacity are in support categories.
This weekend, friend-of-the-IAJ Joe Traub sent the following to the editor of the New York Times. Here's the story Joe is talking about: “White House….“
The headline error is bad enough (it's only in the hed, not in the story) and should be a huge embarrassment to the NYT. But the error gets compounded because, while the Times no longer sets the agenda for the national discussion, it is still thought of (by most?) as the paper of record. Consequently, as other colleagues have pointed out, the reduction percentage gets picked up by other journalists who don't bother to do the math (or who cannot do the math). See, for example:
* CBS News: “Troop Retreat In '08?” (This video has a shot of the NYT story even though the percentage is not mentioned. Could it be that the TV folks don't think viewers can do the arithmetic?)
(NB: We could not yet find on the NPR site the transcript of the radio story that picked up the 50 percent error. But run a Google search with “cut in Troops by 50%” and note the huge number of bloggers who also went with the story without doing the math.)
Colleague Steve Doig has queried the reporter of the piece, David Sanger, asking if the mistake is that of the NYT or the White House. No answer yet received, but Doig later commented: “Sanger's story did talk about reducing brigades from 20 to 10. That's how they'll justify the ‘50% reduction’ headline, I guess, despite the clear reference higher up to cutting 146,000 troops to 100,000.”
Either way, it is a serious blunder of a fundamental sort on an issue most grave. It should have been caught, but then most journalists are WORD people and only word people, we guess.
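For anyone who wants to check the arithmetic, here is a minimal sketch in Python using only the figures quoted above (146,000 troops to 100,000, and 20 brigades to 10). The function name is our own.

```python
def percent_change(old, new):
    """Percent change from an old value to a new one (negative means a cut)."""
    return (new - old) / old * 100


troops_change = percent_change(146_000, 100_000)
brigades_change = percent_change(20, 10)

print(f"Troops:   {troops_change:.1f}%")
print(f"Brigades: {brigades_change:.1f}%")
```

The troop figure works out to a cut of roughly 31.5 percent. Only the brigade count is halved.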
We would also point out the illogical construction that the NYT uses consistently in relaying statistical change over time. To wit: “… could lower troop levels by the midst of the 2008 presidential election to roughly 100,000, from about 146,000…” We wince.
English is read from left to right. Most English calendars and horizontal timelines are read from left to right. When writing about statistical change, the same convention should be followed: the oldest dates and data precede the newest or future dates and data. Therefore, this would best be written: “…could lower troop levels from about 146,000 to roughly 100,000 by the midst of the 2008 presidential election.”
Thanks to our friend at the University de Zulia in Maracaibo, Prof. Maria-Isabel Neuman, we just learned about this Rosetta Stone of data visualization. This is a must-see: “A Periodic Table of Visualization Methods.” http://www.visual-literacy.org/pages/documents.htm These guys in Switzerland at the Visual-Literacy Project have pulled together, in a wonderfully coherent fashion, the multiple concepts that many of us have been working on for years. Be sure to also take a look at the paper by Lengler and Eppler at the bottom of the “Maps” page. It's a good, tight explanation of what they are up to. We like their definition:
But we're not so sure that “permanent” is crucial or should even be included. If they are referring to “method,” then that would seem to limit the opportunity for refinements over time. And if they are talking about the resulting displays of data, might not that reduce the possibility of dynamic data displays, say real-time traffic flows or changes in the stock market? Simulations? Oh, well, a refinement ripe for discussion.
No story? Then check out Swivel, a web site rich with data (and the display of data) that you didn't know about and which is pregnant with possibilities for a good news feature. And often a news feature that could be localized. Here, for example, is a posting from the SECRECY REPORT CARD 2005 illustrating the changing trends in the classification and de-classification of U.S. government data. (You can probably guess the direction of the curves.)
The number of classified documents is steadily increasing, while the number of pages being declassified is dwindling. These data were uploaded by mcroydon.
FYI, folks:
Prior to 2006, analysts had to make do with increasingly out-of-date detailed information about households and individuals while they waited for the next decennial census. Starting in 2006, this information will be made available on an annual basis in the ACS.
This course shows what sort of information is included, how to obtain it, and what methodological and sample size issues present themselves.
If you have not made use of similar Census data previously, learn how you can leverage these improvements in data currency and timeliness for your projects. If you have used decennial census data before, you will benefit by learning about the methodological differences between this Survey and the decennial census long form – they affect the results and you may make errors if you don't know how to handle the differences.
Ms. Taeuber, a senior policy advisor at the University of Baltimore's Jacob France Institute, has 30 years of experience at the U.S. Census Bureau, directed the analytic staff for the American Community Survey, and received the Commerce Dept.'s Gold Medal Award for her innovative work on the American Community Survey. She is the author of “The American Community Survey: Updated Information for America's Communities,” and more.
As with all online courses at statistics.com, there are no set hours when you must be online; we estimate you will need 7-15 hours per week.
Register: http://www.statistics.com/courses/census
Peter Bruce
courses@statistics.com
P.S. Also coming up:
Nov. 3 – Cluster Analysis (useful for customer segmentation)
Nov. 17 – How to deal with missing data
Nov. 27 – Basic Concepts in Probability and Statistics
statistics.com
612 N. Jackson St.
Arlington, VA 22201
USA
Another unique investigation by The New York Times gets A1 play in this Sunday's edition (1 Oct. 2006) under the hed “Campaign Cash Mirrors a High Court's Rulings.” Adam Liptak and Janet Roberts (who probably did the heavy lifting on the data analysis) took a long-term look at who contributed to the campaigns of Ohio's Supreme Court justices. It ain't a pretty picture if one believes the justices should be above lining their own pockets, whether it's a campaign fund or otherwise.
In any event, there seems to be a clear correlation between contributions (and the sources) and the outcome of too many cases. A sidebar, “Case Studies: West Virginia and Illinois,” would suggest there is much to be harvested by reporters in other states. There is, thankfully, a fine description of how the data for the study was collected and analyzed. See “How Information Was Collected.”
There are two accompanying infographics, one (“Ruling on Contributors' Cases” ) is much more informative than the other (“While the Case Is Being Heard, Money Rolls In” ), which is a good, but confusing, attempt to illustrate difficult concepts and relationships.
At the end of the day, though, we are grateful for the investigation, data crunching and stories.
Any discipline always has subsets of argument, typically about definitions, methodologies, process or significance. Statistics, of course, is no different. Below is an interesting article from the Washington Monthly about what constitutes statistical significance. The article is OK, but the commentary below it is even better. See http://www.blogware.com/admin/index.cgi/cmd=post_article