U.S. Terror Targets: Petting Zoo and Flea Market?
Jul 13th, 2006 by JTJ

Regular readers know that the IAJ has long been interested in the quality of the data in public records databases.  The NY Times of 12 July 2006 carries a front-page story by Eric Lipton on just how bad the data is in the “National Asset Database.”  As Lipton's story points out:

“The National Asset Database, as it is
known, is so flawed, the inspector general found, that as of January, Indiana,
with 8,591 potential terrorist targets, had 50 percent more listed sites than
New York (5,687) and more than twice as many as California (3,212), ranking the
state the most target-rich place in the nation….

“But the audit says that lower-level
department officials agreed that some older information in the inventory “was
of low quality and that they had little faith in it.”

“The presence of large numbers of out-of-place
assets taints the credibility of the data,” the report says.”
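
The comparisons in the quoted passage are easy to check by hand, and doing so is a good habit with any "mythical number." A minimal sketch in Python, using only the three site counts quoted above (the variable and function names are ours, not the inspector general's):

```python
# Site counts as quoted from the inspector general's audit in the Times story.
sites = {"Indiana": 8591, "New York": 5687, "California": 3212}


def ratio(a, b):
    """Return how many times larger the count for state a is than for state b."""
    return sites[a] / sites[b]


in_vs_ny = ratio("Indiana", "New York")
in_vs_ca = ratio("Indiana", "California")

print(f"Indiana vs. New York: {(in_vs_ny - 1) * 100:.0f} percent more sites")
print(f"Indiana vs. California: {in_vs_ca:.2f} times as many sites")
```

Both claims hold up: Indiana has roughly 51 percent more listed sites than New York and about 2.7 times as many as California.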

Sigh.  This is not a new problem, or even one that we can hang on the Bush Administration.  It started with the Clinton Administration in 1998.  “In 1998, President Clinton issued Presidential Decision Directive No. 63
(PDD-63), Critical Infrastructure Protection, which set forth principles for
protecting the nation by minimizing the threat of smaller-scale terrorist attacks
against information technology and geographically-distributed supply chains
that could cascade and disrupt entire sectors of the economy.” [Source here.]

Link to the PDF of the Inspector General's Report at

A MUST read: The (Ongoing) Vitality of Mythical Numbers
Jun 30th, 2006 by JTJ

“The (Ongoing) Vitality of Mythical Numbers
This article serves as a valuable reminder that we should view
all statistics, no matter how frequently they are used in
public arguments, with skepticism until we know who produced
them and how they were derived.”


Neat New Stuff I Found This Week


Copyright, Marylaine Block, 1999-2006.

Sometimes what is NOT there is more important
Jun 27th, 2006 by JTJ

Steve Bass, a PC World columnist, had an item this week that reminds us that a good analytic journalist is always thinking about what is NOT in the data.  He writes:

Risky Business: Stealth Surfing at Work

Not long after I told my buddy about Anonymizer, I heard from another friend, an IT director for a fairly large company. It may
not be such a good idea to surf anonymously at the office:

“I recently had an employee, an MIS employee at that, fired. He was using Anonymizer at work. We have a tracking system (Web
Inspector) and I kept noticing that he was leaving no tracks.

“I consulted with my supervisor and he decided that I should analyze the employee's system. I found footprints, hacking, and a
batch file he used to delete all Internet traces. So I sent the system off to forensics and they found all the bits, each and
every one. We're now in legal limbo. The employee is being fired, not for the hacking or the batch file, but for using the Anonymizer.

“Thought maybe you'd be interested in hearing about the dangers of using the Anonymizer in the workplace. They claim the
Anonymizer hides your tracks at work–but I guess not all of them.”

–Name Withheld, Network and Computer Systems Administrator

I asked George Siegel, my network guru, what he thought. Here's what he said: “It's interesting to note how the user was
initially discovered — by the absence of anything incriminating. Network professionals have logs showing just about everything
that goes on and they look for any deviation from the norm. I can always tell who is up to no good… their computers are
scrupulously clean.”
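
The same "look for what is missing" test can be run on any log. A minimal sketch, assuming a hypothetical staff roster and a hypothetical list of (user, URL) proxy-log records; none of this reflects how Web Inspector or any real tracking product stores its data:

```python
from collections import Counter


def silent_users(roster, log_entries):
    """Return roster members who never appear in the log, in sorted order."""
    seen = Counter(user for user, _url in log_entries)
    return sorted(user for user in roster if seen[user] == 0)


# Hypothetical roster and proxy-log records (user, URL); not real data.
roster = ["alice", "bob", "carol"]
log_entries = [
    ("alice", "http://example.com/news"),
    ("carol", "http://example.com/mail"),
    ("alice", "http://example.com/weather"),
]

quiet = silent_users(roster, log_entries)
print(quiet)
```

An empty trail is only a lead, of course. The user may be on vacation, so the absence needs checking against other records before anyone draws a conclusion.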

Ver 1.0 — The beat goes on
Apr 18th, 2006 by JTJ

We're pulling together the final pieces following the Ver 1.0
workshop in Santa Fe last week.  Twenty journalists, social
scientists, computer scientists, educators, public administrators and
GIS specialists met in Santa Fe April 9-12 to consider the question,
“How can we verify data in public records databases?” 

The papers,
PowerPoint slides and some initial results of three breakout groups are
now posted for the public on the Ver1point0 group site at Yahoo.  Check it out.

Ver 1.0 kicks off. Statistician George Duncan opening speaker.
Apr 9th, 2006 by JTJ

Late this afternoon, the 20 participants in Ver 1.0
will be gathering at the Inn of the Governors in Santa Fe, NM for the
first session of the workshop.  The first, set-the-tone speaker is
George Duncan, professor of statistics at Carnegie Mellon University.  George will be speaking on “Confidentiality: What Does It Mean for Journalists’ Use of Public Databases?”

We will post George's address as soon as possible, along with those of other participants in coming days.

We are very pleased with the high-powered thinkers who are in or coming to
Santa Fe to address the major problem of how we verify the data in
public records databases.  The proceedings of the workshop will,
we hope, be published by the end of the month and also available online.

Covering financial markets, etc.
Apr 3rd, 2006 by JTJ

Some new online resources for understanding, and engaging in, analytic journalism.  See the site for:

Online Tutorials

Covering Financial Markets

Prepared by Chris Roush
Director of the Carolina Business News Initiative, University of North Carolina at Chapel Hill

Using Numbers Effectively

Prepared by Curt Hazlett
Senior Presenter, Donald W. Reynolds National Center for Business Journalism at the American Press Institute

Understanding Financial Statements

Prepared by James K. Gentry, Ph.D.
Professor and former dean, School of Journalism and Mass Communications, University of Kansas


SEC Filings

Prepared by Chris Roush
Director of the Carolina Business News Initiative, University of North Carolina at Chapel Hill

Covering the Economy

Prepared by Merrill Goozner
Freelance Writer and Former Chief Economics Correspondent, The Chicago Tribune

Summer workshop on IPUMS databases
Mar 20th, 2006 by JTJ

A good learning opportunity in the Land of Lakes this summer….

Dear IPUMS Users,

I am pleased to announce the first annual IPUMS Summer Workshop, to be held
in Minneapolis on July 19th-21st. This training session will cover four
major databases: IPUMS-USA, IPUMS-International, IPUMS-CPS, and the North
Atlantic Population Project (NAPP).

For more information, please visit

I hope to see some of you in Minneapolis this summer.


Steven Ruggles
Principal Investigator
IPUMS Projects

Can Network Theory Thwart Terrorists?
Mar 14th, 2006 by JTJ

We should have caught this on Friday, but….

Patrick Radden Keefe (The Century Foundation) offers up a good overview of the pros and cons of Social Network Analysis in last Friday's (12 March 2006) edition of The New York Times.  In “Can Network Theory Thwart Terrorists?” he says that “the [N.S.A.] intercepts some 650 million communications worldwide every day.”  Well, that's a nice round number, but one so large that we wonder how, for example, to account for basic variables such as the length of a call.  (You don't suppose the good folks at the N.S.A. have to wait while the “Please wait.  A service technician will be with you shortly” messages are being replayed for 18 minutes, do you?)
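
A little arithmetic shows why the length-of-call question matters. The sketch below takes the 650 million figure at face value; the average call length is purely our assumption, chosen only to show how fast the totals grow:

```python
SECONDS_PER_DAY = 24 * 60 * 60

# The daily figure quoted in the Times article.
intercepts_per_day = 650_000_000

# Average rate implied by the daily figure.
per_second = intercepts_per_day / SECONDS_PER_DAY


def listening_hours(avg_minutes):
    """Hours of audio per day if every intercept lasted avg_minutes (an assumption)."""
    return intercepts_per_day * avg_minutes / 60


# Hypothetical three-minute average; the article gives no such number.
hours = listening_hours(3)

print(f"{per_second:,.0f} intercepts per second")
print(f"{hours:,.0f} hours of material per day at 3 minutes each")
```

That works out to more than 7,500 intercepts every second. Under the three-minute assumption it is about 32.5 million hours of material a day, which says something about how much of it any person could ever hear.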

We think Social Network Analysis is another of those tools in its infancy, but one with (a) great potential and (b) an equally great development curve.

SJ Mercury-News Series: "Tainted Trials, Stolen Justice."
Jan 23rd, 2006 by JTJ

Friend-of-IAJ Griff Palmer alerts us to an impressive series this week that examines the conduct of the DA's office in Santa Clara County, California.  If nothing else, the series illustrates why good, vital-to-the-community journalism takes time and is expensive.  Rick Tulsky, Griff and other colleagues spent three years (not three days, but YEARS) on the story.  Griff writes:

“I invite you all to take a look at “Tainted Trials, Stolen Justice.”
This five-day series was three years in the making. It starts in
today's Mercury News:

Registration is required to view the Merc's content. I'm not sure yet
if this URL will be cumulative or will only point to each day's part.
If the latter, I'll work to get the entire package pulled together
under one URL.

The Merc's on-line presentation includes a multimedia presentation, with Flash graphics, streaming audio and streaming video.

The project's backbone is reporter Rick Tulsky's review of  every 
criminal appeal originating out of Santa Clara County Superior Court
for five years. Rick was aided in his review by staff writers Julie
Patel and Mike Zapler.

Rick has a law degree, and he used
his legal training to analyze these cases for prosecutorial error,
defense error and judicial error. He went over the cases with the Santa
Clara County District Attorney's Office, defense attorneys and judges.
He recruited seasoned criminal justice scholars and former judges and
prosecutors to review his findings.

Rick's findings: Santa Clara County's criminal justice system, while
far from broken, is systemically troubled by serious flaws that bias
the system in prosecutors' favor and, in the worst cases, lead to
outright miscarriages of justice. Rick found that more than a third of
the 727 cases he analyzed were marred by some form of questionable
conduct on the part of prosecutors, defense attorneys or judges. He
found that California's Sixth Appellate District routinely found
prosecutorial and judicial error to be harmless to criminal defendants
— in dozens of instances, resorting to factual distortions and flawed
reasoning to reach their conclusions.

This analysis has at
least one serious limitation: It doesn't compare Rick's Santa Clara
County findings with similar data from any other jurisdiction. It would
frankly have been impossible, at least within three years, to conduct a
similar case review on a broader scale.

To help us examine how Santa Clara County's criminal justice system
differs from those of other counties, I captured 10 years' worth of
felony arrest disposition data from the Criminal Justice Statistics
Center, maintained by the California Attorney General's Office.
I hand-keyed another four years' worth of CJSC data that were available
only on paper. (I did a rough estimate at one point and determined that
I'd keyed in somewhere in the neighborhood of 10,000 cells of data.)

This analysis showed us that, within the accuracy limitations of the
CJSC data, Santa Clara County stood out for having one of the highest
conviction rates and one of the lowest judicial dismissal rates among
all counties with populations of 100,000 or more.
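
For readers who want to try a comparison like this one, here is a minimal sketch of the general idea in Python. It is not Griff's actual method or data: the county names, figures and field names are invented for illustration, and only the 100,000-population cutoff comes from his note.

```python
# Hypothetical county records; every figure here is invented for illustration.
counties = [
    {"name": "County A", "population": 1_700_000,
     "dispositions": 20_000, "convictions": 17_000, "dismissals": 400},
    {"name": "County B", "population": 900_000,
     "dispositions": 10_000, "convictions": 7_000, "dismissals": 900},
    {"name": "County C", "population": 50_000,
     "dispositions": 500, "convictions": 450, "dismissals": 5},
]

MIN_POPULATION = 100_000  # the cutoff mentioned in the note above


def ranked_rates(rows, min_population=MIN_POPULATION):
    """Compute conviction and dismissal rates for large counties, highest conviction rate first."""
    out = []
    for row in rows:
        if row["population"] < min_population or row["dispositions"] == 0:
            continue  # skip small counties and avoid dividing by zero
        out.append({
            "name": row["name"],
            "conviction_rate": row["convictions"] / row["dispositions"],
            "dismissal_rate": row["dismissals"] / row["dispositions"],
        })
    return sorted(out, key=lambda r: r["conviction_rate"], reverse=True)


ranked = ranked_rates(counties)
for r in ranked:
    print(f'{r["name"]}: {r["conviction_rate"]:.1%} convicted, '
          f'{r["dismissal_rate"]:.1%} dismissed')
```

The population filter matters: a small county with a handful of cases can post an extreme rate that means very little.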

As Rick's attention turned to the appellate
system, my attention was drawn to an interactive database system
maintained by the California Administrative Office of the Courts:

I requested a copy of the underlying database from the AOC, only to be
stonewalled. Months of effort on our attorneys' part yielded only one
summary spreadsheet from the AOC.

Thanks to discussions on
this list and at NICAR conferences, I knew it should be possible to
programmatically retrieve the contents of the AOC database.  With Aron
Pilhofer's and John Perry's Perl scripting tutorials, and with lots of
generous coaching from John, I put together scripts that harvested the
criminal appeals data from the AOC system and parsed it from HTML into
delimited files.”
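
Griff's scripts were written in Perl and are not reproduced here. As a rough illustration of the parsing half of the job only, the sketch below uses Python's standard library to turn an HTML table into tab-delimited text. The sample markup and case number are made up, and the real AOC pages may be laid out quite differently.

```python
import csv
import io
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of each <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def html_to_delimited(html, delimiter="\t"):
    """Parse table rows out of html and return them as delimited text."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    buf = io.StringIO()
    csv.writer(buf, delimiter=delimiter, lineterminator="\n").writerows(parser.rows)
    return buf.getvalue()


# Made-up sample markup; the case number is hypothetical.
sample = """<table>
<tr><th>Case</th><th>Outcome</th></tr>
<tr><td>H000001</td><td>Affirmed</td></tr>
</table>"""

text = html_to_delimited(sample)
print(text)
```

Fetching the pages is a separate step, and one worth doing politely, with pauses between requests.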

That data retrieval underlies the numbers that appear in the final day of this series.

So is there a story in the song(s)?
Jan 14th, 2006 by JTJ

From Complexity Digest:

Semantic Descriptors To Help The Hunt For Music (Innovations-report)

Excerpts: You like a certain song and want to hear other
tracks like it, but don't know how to find them? Ending the
needle-in-a-haystack problem of searching for music on the Internet or
even in your own hard drive is a new audio-based music information
retrieval system. Currently under development by the SIMAC project, it
is a major leap forward in the application of semantics to audio
content, allowing songs to be described not just by artist, title and
genre but by their actual musical properties such as rhythm, timbre,
harmony, structure and instrumentation. This allows comparisons between
songs to be made (…).

Source: Semantic Descriptors To Help

Should this come to fruition, might there be stories in patterns (regional
patterns) in music?  How could we map this?  And when?
