Alfredo Covaleda,
Bogota, Colombia
Stephen Guerin,
Santa Fe, New Mexico, USA
James A. Trostle,
Trinity College, Hartford, Connecticut, USA
We've long been intrigued with Benford's Law and its potential for Analytic Journalism. Today we ran across a new post by Charley Kyd that both explains the Law and presents some clear formulas for applying it.
An Excel 97-2003 Tutorial:
by Charley Kyd July, 2007 (Email Comments)
(Follow this link for the Excel 2007 version.)
Unless you're a public accountant, you probably haven't experimented with Benford's Law.
Auditors sometimes use this fascinating statistical insight to uncover fraudulent accounting data. But it might reveal a useful strategy for investing in the stock market. And it might help you to improve the accuracy of your budgets and forecasts.
This article will explain Benford's Law, show you how to calculate it with Excel, and suggest ways that you could put it to good use.
From a hands-on-Excel point of view, the article describes new uses for the SUMPRODUCT function and discusses the use of local and global range names. [Read more…]
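Kyd's tutorial does the computation in Excel with SUMPRODUCT; as a language-neutral illustration of the same idea, here is a minimal Python sketch (the function names and sample data are ours, not Kyd's). Benford's Law predicts that the leading digit d of naturally occurring numbers appears with probability log10(1 + 1/d), and an audit compares observed leading-digit frequencies against that curve.

```python
import math
from collections import Counter

def benford_expected(d: int) -> float:
    """Benford's Law: probability that the leading digit is d (1-9)."""
    return math.log10(1 + 1 / d)

def leading_digit_freqs(values):
    """Observed frequency of each leading digit in a list of positive numbers."""
    digits = [int(str(abs(v)).lstrip("0.")[0]) for v in values if v]
    counts = Counter(digits)
    n = len(digits)
    return {d: counts.get(d, 0) / n for d in range(1, 10)}

# Benford predicts digit 1 leads about 30.1% of the time, digit 9 only ~4.6%.
for d in range(1, 10):
    print(d, round(benford_expected(d), 3))
```

Large gaps between the observed frequencies and the Benford curve are what auditors treat as a flag worth investigating, not as proof of fraud.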
FYI: From the O'Reilly Radar
Unveiling the Beauty of Statistics
Posted: 11 Jul 2007 03:01 AM CDT
By Jesse Robbins
I presented last week at the OECD World Forum in Istanbul along with Professor Hans Rosling, Mike Arrington, John Gage and teams from MappingWorlds, Swivel (disclosure: I am an adviser to Swivel) and Many Eyes. We were the “Web2.0 Delegation” and it was an incredible experience.
The Istanbul Declaration signed at the conference calls for governments to make their statistical data freely available online as a “public good.” The declaration also calls for new measures of happiness and well-being, going beyond just economic output and GDP. This requires the creation of new tools, which the OECD envisions will be “wiki for progress.” Expect to hear more about these initiatives soon.
This data combined with new tools like Swivel and MappingWorlds is powerful. Previously this information was hard to acquire, and the tools to analyze it were expensive and hard to use, which limited its usefulness. Now regular people can access, visualize and discuss this data, creating an environment where knowledge can be shared and explored.
H.G. Wells predicted that “Statistical thinking will one day be as necessary for efficient citizenship as the ability to read or write.” Proponents of specific public policies often use statistics to support their view. They have the ability to select the data to fit with the policy. Democratization of statistics allows citizens to see the data that doesn't fit the policy, giving the public the power to challenge policymakers with new interpretations.
I highly recommend you watch Professor Rosling's exceptional summary of these exciting changes (where I got the title for this post), as well as his talks at TED.
This weekend, friend-of-the-IAJ Joe Traub sent the following to the editor of the New York Times. Here's the story Joe is talking about: “White House….“
The headline error is bad enough (it appears only in the hed, not in the story) and should be a huge embarrassment to the NYT. But the error gets compounded because, while the Times no longer sets the agenda for the national discussion, it is still thought of (by most?) as the paper of record. Consequently, as other colleagues have pointed out, the reduction percentage gets picked up by other journalists who don't bother to do the math (or who cannot do the math). See, for example:

* CBS News, "Troop Retreat In '08?" (This video has a shot of the NYT story even though the percentage is not mentioned. Could it be that the TV folks don't think viewers can do the arithmetic?)

(NB: We could not yet find on the NPR site the transcript of the radio story that picked up the 50 percent error. But run a Google search with "cut in Troops by 50%" and note the huge number of bloggers who also went with the story without doing the math.)

Colleague Steve Doig has queried the reporter of the piece, David Sanger, asking if the mistake is that of the NYT or the White House. No answer yet received, but Doig later commented: "Sanger's story did talk about reducing brigades from 20 to 10. That's how they'll justify the '50% reduction' headline, I guess, despite the clear reference higher up to cutting 146,000 troops to 100,000."
Either way, it is a serious blunder of a fundamental sort on an issue most grave. It should have been caught, but then most journalists are WORD people and only word people, we guess.
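The arithmetic the headline writer skipped takes one line. Using the figures from the story itself, cutting 146,000 troops to 100,000 is roughly a 31.5 percent reduction, not 50; only the brigade count (20 down to 10) yields a halving. A quick check, as a sketch:

```python
def percent_change(old: float, new: float) -> float:
    """Percent change from old to new; negative means a reduction."""
    return (new - old) / old * 100

print(round(percent_change(146_000, 100_000), 1))  # -31.5: about a 31.5% cut
print(round(percent_change(20, 10), 1))            # -50.0: the brigade figure
```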
We would also point out the illogical construction that the NYT uses consistently in relaying statistical change over time. To wit: “… could lower troop levels by the midst of the 2008 presidential election to roughly 100,000, from about 146,000…” We wince.
English is read from left to right. Most English calendars and horizontal timelines are read from left to right. When writing about statistical change, the same convention should be followed: the oldest dates and data precede the newest or future dates and data. The sentence would therefore be better written: "…could lower troop levels from about 146,000 to roughly 100,000 by the midst of the 2008 presidential election."
No story? Then check out Swivel, a web site rich with data, and displays of data, that you didn't know about and which is pregnant with possibilities for a good news feature. And often a news feature that could be localized.

Here, for example, is a posting from the SECRECY REPORT CARD 2005 illustrating the changing trends in the classification and de-classification of U.S. government data. (You can probably guess the direction of the curves.)
The number of classified documents is steadily increasing, while the number of pages being declassified is dwindling. These data were uploaded by mcroydon.
Paul Parker, of the Providence (Rhode Island) Journal, is the Quick this week, and an impressive list of folks on the state's voter registration rolls are the Dead. Below is a note Parker posted to the NICAR-L listserv. The great thing about this is the recipe Parker provides for an analytic journalist's cookbook. Said he:
Here's the link: http://www.projo.com/extra/election/content/deadvoters9_11-09-06_DN2P2GR.33b46ef.html
I know it's CAR 101, but I'll outline how we did it (which is also explained in the story):
1. Get your state's central voter registration database.
2. Get your state slice of the Social Security Administration's Death Master File from IRE/NICAR.
3. Run a match on First Name, Last Name and Date of Birth.
4. Exclude matches where middle initials conflict. (Allow P=PETER or P=NULL, but not P=G.)
5. Calculate a per capita rate for each city/town by dividing the number of dead people by the total registered.
6. Interview the biggest offenders about why they're the biggest offenders.
This was so easy, and now everyone at the paper thinks I'm some sort of journalism deity. (And the voter registration people called to ask, "Where do I get a copy of that Social Security list.")
As for the possibility of false positives, we pointed this out in the story, which I think sufficed because the odds are low enough. I also hand-checked a few against our obituary archives.
—Paul Parker
Reporter
The Providence Journal
75 Fountain Street
Providence, RI 02902
401-277-7360
pparker@projo.com
Then David Heath, at the Seattle Times, layered in his experience. Said he:
Eric Lipton has a piece in Wednesday's (4 Oct. 2006) NYTimes about some "new" research efforts to come up with software "that would let the [U.S.] government monitor negative opinions of the United States or its leaders in newspapers and other publications overseas." (See "Software Being Developed to Monitor Opinions of U.S.") Surely this is an interesting problem, and one made especially difficult when the translation factor kicks in.
This is not, however, the first attempt to gin up such software. We have long admired the work done some years ago at the Pacific Northwest National Laboratory on the ThemeRiver™ visualization.
We hope the PNNL will continue by giving us more of this intriguing tool.
Another unique investigation by The New York Times gets A1 play in this Sunday's edition (1 Oct. 2006) under the hed “Campaign Cash Mirrors a High Court's Rulings.” Adam Liptak and Janet Roberts (who probably did the heavy lifting on the data analysis) took a long-term look at who contributed to the campaigns of Ohio's Supreme Court justices. It ain't a pretty picture if one believes the justices should be above lining their own pockets, whether it's a campaign fund or otherwise.
In any event, there seems to be a clear correlation between contributions (and their sources) and the outcomes of too many cases. A sidebar, "Case Studies: West Virginia and Illinois," suggests there is much to be harvested by reporters in other states. There is, thankfully, a fine description of how the data for the study were collected and analyzed. See "How Information Was Collected."
There are two accompanying infographics, one (“Ruling on Contributors' Cases” ) is much more informative than the other (“While the Case Is Being Heard, Money Rolls In” ), which is a good, but confusing, attempt to illustrate difficult concepts and relationships.
At the end of the day, though, we are grateful for the investigation, data crunching and stories.
Any discipline always has subsets of argument, typically about definitions, methodologies, process or significance. Statistics, of course, is no different. Below is an interesting article from the Washington Monthly about what constitutes statistical significance. The article is OK, but the commentary below it is even better. See http://www.blogware.com/admin/index.cgi/cmd=post_article
Eight or nine years back we attended one of the first Crime Mapping conferences sponsored by the National Institute of Justice and found it to be one of the most creative and practical events of this type. (We also have very high regard for the ESRI Users Conference and the Special Libraries Assoc. meetings.) So we want to be sure to let all analytic journos know about next year's Crime Mapping confab, scheduled for March 28 to 31, 2007 in Pittsburgh, Penn. Here's part of the official call for papers:
The Mapping & Analysis for Public Safety Program announces its Call for Papers for the Ninth Crime Mapping Research Conference in Pittsburgh, PA at the Omni William Penn Hotel, March 28 to 31, 2007. The deadline for submission is Friday, September 29th.... The theme of this conference will be Spatial Approaches to Understanding Crime & Demographics. The use of Geographic Information Systems (GIS) and spatial data analysis techniques has become a prominent tool for analyzing criminal behavior and the impacts of the criminal justice system on society. Classical and spatial statistics have been merged to form more comprehensive approaches to understanding social problems from research and practical standpoints. These methods allow for the measurement of proximity effects on places by neighboring areas, leading to a multi-dimensional and less static understanding of factors that contribute to or repel crime across space.

The 9th Crime Mapping Research Conference will be about demonstrating the use and development of methodologies for practitioners and researchers. The MAPS Program is anticipating the selection of key accepted presentations for further development of an electronic monograph on GIS, Spatial Data Analysis and the Study of Crime in the following year. Its purpose will be to demonstrate the fusing of classical and spatial analysis techniques to enhance policy decisions. Methods should not be limited to the use of classical and spatial statistics but should also demonstrate the unique capabilities of GIS in preparing, categorizing and visualizing data for analysis....
For more, see: http://www.ojp.usdoj.gov/nij/maps/
Lest any of us think that Social Network Analysis is something new, please take the time to read this wonderful, albeit personal, history of the field. Edward O. Laumann, of the University of Chicago, has been swimming in these waters for more than 40 years. His address to the International Network for Social Network Analysis's 26th Annual Sunbelt Conference in Vancouver, Canada, in April 2006, tells much about how we have arrived at the current state of SNA. See "A 45-Year Retrospective of Doing Networks": http://www.insna.org/Connections-Web/Volume27-1/8.Laumann.pdf