Some nice tools for AJ
Jun 9th, 2007 by JTJ

Picking up some interesting Web 2.0 tools at the IRE's annual conference, this year in Phoenix.

  •  The —

     Good jumpstation for APIs, Mashups, How-To info, etc.

  • CityCon —

    “CityCon allows you to find detailed information about any member of the current 110th U.S. Congress.  Use the Input field above to query the CityCon database and the Internet for a U.S. City, State, Senator or Representative.”

  • —

    “ brings together campaign contributions and how legislators vote, providing an unprecedented window into the connections between money and politics.  We currently cover the California Legislature and U.S.”


The NYT DOES run a correction on its percentage screw-up
May 28th, 2007 by JTJ

So the NYT did backtrack on the percent-of-change error described yesterday, without assigning blame.  That's fine.  But the correction suggests another big story that we have seen only in parts: of the entire U.S. presence in Iraq, military and contractors alike, how many, and what proportion, are actually on the streets, and how many serve in support capacities?

[New York Times] Corrections: For the Record
Published: May 28, 2007 [Monday]
A front-page headline on Saturday about
concepts being developed by the Bush administration to reduce United
States combat forces in Iraq by as much as half next year referred
imprecisely to the overall effect on troop levels. As the story
indicated, removing half of the 20 combat brigades now in Iraq by the
end of 2008, one of the ideas under consideration, would cut the total
number of troops there by about one-third, from 146,000 to roughly
100,000, not by 50 percent. That is because many of the troops that
would remain in Iraq are in training or support units, not in combat
forces. (Go to Article)

NYT needs to install a "math checker" on every copy editor's desk
May 27th, 2007 by JTJ

This weekend, friend-of-the-IAJ Joe Traub sent the following to the editor of the New York Times.  Here's the story Joe is talking about: “White House….

To the Editor:

The headline on page 1 on May 26 states
“White House Said to Debate '08 Cut in Troops by 50%”
The article reports a possible reduction to 100,000 troops
from 146,000. That's 31.5%, not 50%. NPR's Morning Edition
picked up the story from the NYT and also reported 50%.

Joseph F. Traub
The writer is a Professor of Computer Science at Columbia University

The headline error is bad enough (it appears only in the hed, not in the story) and should be a huge embarrassment to the NYT.  But the error gets compounded: while the Times no longer sets the agenda for the national discussion, it is still thought of (by most?) as the paper of record.  Consequently, as other colleagues have pointed out, the reduction percentage gets picked up by other journalists who don't bother to do the math (or who cannot do it).
See, for example:
* CBS News — “Troop Retreat In '08?” — (This video has a shot of the NYT story even though the percentage is not mentioned.  Could it be that the TV folks don't think viewers can do the arithmetic?)
(NB: We could not yet find on the NPR site the transcript of the radio story that picked up the 50 percent error.  But run a Google search with “cut in Troops by 50%” and note the huge number of bloggers who also went with the story without doing the math.)
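
The check the headline writers skipped is one line of arithmetic. A minimal sketch in Python (the troop figures come from the story; the helper function is ours):

```python
def pct_change(old, new):
    """Percent change from an old value to a new value."""
    return (new - old) / old * 100

drop = pct_change(146_000, 100_000)   # troop figures from the story
print(f"{drop:.1f}%")                 # prints "-31.5%" -- nowhere near -50%

# Halving the 20 combat brigades really is a 50% cut -- but of combat
# brigades only, not of total troops, which is what the hed implied.
```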

Colleague Steve Doig has queried the reporter of the piece, David Sanger, asking if the mistake is that of the NYT or the White House.  No answer yet received, but Doig later commented: “Sanger's story did talk about reducing brigades from 20 to 10. That's
how they'll justify the “50% reduction” headline, I guess, despite the
clear reference higher up to cutting 146,000 troops to 100,000.”

Either way, it is a serious blunder of a fundamental sort on an issue most grave.  It should have been caught, but then most journalists are WORD people and only word people, we guess.

We would also point out the illogical construction that the NYT uses consistently in relaying statistical change over time.  To wit: “… could lower troop levels by the midst of the 2008 presidential election to roughly 100,000, from about 146,000…”  We wince. 

English is read from left to right.  Most English calendars and horizontal timelines are read from left to right.  When writing about statistical change, the same convention should be followed: oldest dates and data precede newer or future dates and data.  Therefore, this would be better written: “…could lower troop levels from about 146,000 to roughly 100,000 by the midst of the 2008 presidential election.”

GeoCommons (Another tip from O'Reilly Radar)
May 24th, 2007 by JTJ


GeoCommons, Share Your GeoData

Posted: 23 May 2007 01:59 PM CDT

By Brady Forrest

geocommons map

GeoCommons is a new mapping site that allows members to use a variety of datasets
to create their own maps. It provides free geodata, a map builder
tool, the ability to create heat maps, and a map hosting site. An API
will be available shortly. GeoCommons comes from FortiusOne, a Washington, D.C. company. The public beta is going to be released
Monday, May 28th, at Where 2.0's launchpad.

When building a map you can use one of the 1500 data sets (with 2
billion data attributes) that they have made freely available. The data
sets vary widely and include things like “Identity Theft 2006”, “Coral
Reef Bleaching – Worldwide”, “Starbucks Locations – Worldwide”, and
“HAZUS – Seattle, WA – Resident Demographics”. As you can see below,
data can be viewed in a tabular format prior to loading it onto a map.
Data sets can be combined so that you can see “The Prices of Living in NYC & SF” and “Barack vs. Clinton – Show Me the Money!” — it seems to me that Barack has more widespread support.

O'Reilly Radar tips us to an update on online mapping
May 22nd, 2007 by JTJ

We are finding O'Reilly's Radar an increasingly valuable site/blog to keep up with interesting developments in Web 2.0, publishing and the general Digital Revolution.  Brady Forrest's contribution below is an example.


Trends of Online Mapping Portals

Posted: 21 May 2007 04:34 PM CDT

By Brady Forrest

Last week there were several announcements that show the direction
of the online mapping portals. Satellite images and slippy maps are no
longer differentiators for attracting users; everyone has them, and as I
noted last week, companies have now cropped up to service
companies that want their own maps. Some of the new differentiators
are immersive experiences, owning the stack, and data!

Immersive experience within the browser – A couple of weeks ago Google maps added building frames that are visible at street level in some cities. These 2.5D frames are very clean and useful when trying to place something on a street.

google 2.5d maps

Now the Mercury News (warning: annoying reg required; found via TechCrunch) is reporting that these buildings will soon be fully fleshed out.

The Mercury News has learned that Google has quietly
licensed the sensing technology developed by a team of Stanford
University students that enabled Stanley, a Volkswagen Touareg R5, to
win the 2005 DARPA Grand Challenge. In that race, the Stanford robotic
car successfully drove more than 131 miles through the Mojave Desert in
less than seven hours.

The technology will enable Google to map out photo-realistic 3-D
versions of cities around the world, and possibly regain ground it has
lost to Microsoft's 3-D mapping application known as Virtual Earth.

The license will be exclusive, but don't think Google
will be the only ones with 3-D in the browser. Microsoft has had 3-D
for a while now (unfortunately, it requires the .NET framework; my
assumption is that the team is busy converting it to Silverlight). 3-D
is going to become a standard part of mapping applications. The trick
will be making sure that the extra data doesn't get in the way of the
user's quest to get information. Buildings are slow to render and can
obscure directions.

This strategy is a nice complement to their current strategy of
gathering and harnessing 3-D models from users. Currently these are
only available in Google Earth. The primary location to get them is
Google's 3D Warehouse. I suspect that we will start to see user contributed models on Google Maps.

No word on how many cities Google will roll out their 3D models in or when the new data will be available via their API.

Data, Data, & More Data – Until recently, search
engines did not provide neighborhoods as a way of searching cities.
Neighborhoods are an incredibly useful, if hard to define, method of
defining an area of a city.

Google has now added
neighborhood data to their index, but they have not really done much
with it. If you know the neighborhood name, then you can use it to
supplement searching a city. However, if you are uncertain or
unaware of the feature, then you are SOL. There is no indication
that the feature exists, how widespread it is, or what the boundaries
of the neighborhood are. I hope that they continue to expand on this.

ask neighborhood map

Ask, on the other hand, has done a great job with this feature (see above).
They surface nearby neighborhood names for easy follow-on searches (see
below). They show you the bounds of the neighborhood quite clearly.

ask neighborhoods

Ask is using data from SF startup Urban Mapping. Urban Mapping claims
complete coverage of ~300 urban areas in the US and Canada (with Europe
coming). This isn't an easy problem. Urban Mapping has been working at
it for quite some time and is known for having a good data set. They
have also been aggregating transit data. An interesting thing to note
is that many of the same neighborhoods available on Ask are also
available on Google maps (examples: Tenderloin, SF: Google, Ask; Civic Center, SF: Google, Ask)
No word yet if any of the other big engines are going to add
neighborhood data, but my guess is that it will soon become a standard
feature; it's too useful to not have.

Own the Stack – Until recently, Yahoo! used deCarta to handle creating directions (or routing). They have announced
that they have taken ownership of this part of the stack and have built
their own routing engine. Ask and Google still use deCarta. Microsoft
has always had their own. Yahoo! is hoping to make their new engine a
differentiator. In some ways this is analogous to Microsoft's purchase
of Vexcel, a 3D imagery provider. Microsoft did not want the same 3D data as Google Earth or any other search engine for its 3D world.

I think that any vendor servicing Google, Microsoft, Ask, Yahoo or
MapQuest will have to keep an eye on their next source of revenue.
Those contracts aren't going to necessarily last too long. The geostack
is too valuable to outsource.

There is only one part of the stack that I think *might* be too
expensive for any one of the engines to buy or build outright. That's
the street data, and it's a data source primarily supplied by two
companies, NAVTEQ and Tele Atlas. NAVTEQ has a market cap of 3.5 billion dollars as of this writing; Tele Atlas has one of 1.4 billion pounds. These would be spendy purchases. Microsoft is currently working closely with Facet Technology Corporation to collect street data for cities to add a street-level 3D layer (see Facet's SightMap
for a preview), but Facet is not collecting data to match the
other players. It will be interesting to see if Yahoo! parlays its partnership with OpenStreetMap into a data play.

Is Your Baseball Team Overpaid?
May 20th, 2007 by JTJ

An interesting piece of analysis and visual infographics posted today on the O'Reilly Radar site.

Assuming you have a baseball team, Ben Fry will let you answer that
question. He has created a tool for visualizing the salary of Major
League Baseball teams versus their performance in 2007. As he explains:

This sketch looks at all 30 Major League Baseball Teams and ranks them
on the left according to their day-to-day standings. The lines connect
each team to their 2007 salary, listed on the right.

Drag the date at the top to move through the season. The first ten
days of the season are omitted because the rankings to (at least) that
point are statistically silly. You can also use the arrow keys on the
keyboard to move forward or backward one day.

A steep blue line means that the team is doing well for its money,
which reflects well on the team's General Manager. A steep red line
implies that the team is throwing away money. The thickness of the line
is proportional to the team's salary relative to the others.

The images above are captures of the beginning-of-season rankings
(left) as compared to now (right). It looks like Boston is now at a
break-even point, whereas the Yankees are sinking and a bit overpaid. I
wonder if any GM compensation decisions are made based on this.
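
Fry's over/underpaid reading boils down to comparing two rankings: standings on one side, payroll on the other. A rough sketch of that comparison in Python — the team names and figures below are made up for illustration, not Fry's data:

```python
# Hypothetical (team, winning_pct, payroll_in_millions) tuples.
teams = [
    ("Boston",      0.620, 143.0),
    ("Yankees",     0.510, 190.0),
    ("Cleveland",   0.600,  61.0),
    ("Kansas City", 0.400,  67.0),
]

# Rank 1 = best record; rank 1 = biggest payroll.
by_record  = sorted(teams, key=lambda t: -t[1])
by_payroll = sorted(teams, key=lambda t: -t[2])
record_rank  = {t[0]: i + 1 for i, t in enumerate(by_record)}
payroll_rank = {t[0]: i + 1 for i, t in enumerate(by_payroll)}

for name, _, _ in teams:
    # Positive gap ~ Fry's steep blue line (good record, modest payroll);
    # negative gap ~ his steep red line (big payroll, poor record).
    gap = payroll_rank[name] - record_rank[name]
    label = "bargain" if gap > 0 else "overpaid" if gap < 0 else "break-even"
    print(f"{name:12s} record #{record_rank[name]}  payroll #{payroll_rank[name]}  -> {label}")
```

With these invented numbers, the big-payroll, middling-record Yankees come out "overpaid" and low-payroll, high-standing Cleveland a "bargain" — the same comparison Fry draws with line slope and color.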

4th Lake Arrowhead Conference on Human Complex Systems
Apr 26th, 2007 by JTJ

We're at the UCLA conference center attending the
4th Lake Arrowhead Conference on Human Complex Systems

First take:

Complex Systems: Phenomena, Characteristics & Research Questions
Rouse tells us that NSF has come to recognize the potential value of Agent Based Modeling.  The agency will, probably in the fall of 2008, issue calls for proposals to do multi-disciplinary research into complex systems and ABM.
See Human/Technology
Interaction in Complex Systems

Second take:

Organizational Metrics with the Quantum Approach: Constructing an Organization of Quantum Agents

Bill Lawless' interesting work finds that groups operating on a “consensus model” are less effective and efficient than “majority model” decision-making groups.

Third take:

“The Emergence of Efficient Social Networks by Dynamic Reinforcement”

Chasparis' work has implications for journalism institutions IF they understand that they can (should?) be the hub (or node) for facilitating transactions between users and those with the desired resources, and/or between the journalistic institution and the community.  The presentation is complicated and laden with equations — after all, the authors are in mechanical engineering — but study well their implications for how networks are created and emerge.

What this presentation suggests is that we could model circulation/promotion campaigns by “selling” one subscription to an individual household.  Then, having planted that seed of recognition and brand AND assuming that there is neighbor-to-neighbor communication, we fertilize that seed by delivering for free our paper to the immediately adjacent neighbors.  And, perhaps, we use stick-on/peel off labels to publicize something special for that node of concentration.  Now we have created a potential point of commonality for the neighbors to talk about and, we hope, appreciate.  The question then becomes “How can we create added value” for that cluster of subscribers.

Second point raised: Can we model the optimum term for subscription offers?  Is 13 weeks best, or five?  Let's find out.

Fourth Take (Thur. afternoon):

“Intermediated Cultural Cognition: Putting Materiality Back into Simulations”

See Gessler's homepage for an excellent collection of visual and dynamic tools for modeling.

Fifth Take:

Sixth take:

“Implementing the Community-Practice Model for Agent-based Simulation”

Seventh take:

“Social Neuroscience: Lessons from Exploring Agents' Minds”

Session eight:

“Neighborhood Chance and Neighborhood Change”

Presentation on residential segregation modeling.  “Schelling suggests that segregation can emerge at the aggregate level even if it is not sought by the residents.”  Later findings (Bruch and Mare): segregation increases with indifference to segregation.  Why?  Not really a lack of indifference.  Also, equal granularity in the multicultural function.

Potentially good methods here for researching segregation of a city, but also might be applied to understanding concentrations of newspaper subscriptions and their correlated demographics.
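
The Schelling result mentioned above is easy to reproduce. Here is a minimal one-dimensional sketch in Python; the grid size, tolerance, and seed are our choices for illustration, not the presenters':

```python
import random

random.seed(1)

# A ring of 40 cells: two agent types plus some vacancies.
cells = ["A"] * 16 + ["B"] * 16 + [None] * 8
random.shuffle(cells)
TOLERANCE = 0.5   # content if at least half of occupied neighbors match

def unhappy(i):
    """True if the agent at i has too few same-type occupied neighbors."""
    me = cells[i]
    if me is None:
        return False
    neighbors = [cells[(i - 1) % len(cells)], cells[(i + 1) % len(cells)]]
    occupied = [n for n in neighbors if n is not None]
    if not occupied:
        return False
    return sum(n == me for n in occupied) / len(occupied) < TOLERANCE

for _ in range(2000):                      # relocation rounds
    movers = [i for i in range(len(cells)) if unhappy(i)]
    if not movers:
        break
    empties = [i for i in range(len(cells)) if cells[i] is None]
    i = random.choice(movers)              # an unhappy agent...
    j = random.choice(empties)             # ...moves to a random vacancy
    cells[j], cells[i] = cells[i], None

print("".join(c or "." for c in cells))    # same-type runs tend to emerge
```

No agent here wants segregation; each merely avoids being a local minority. Yet the ring typically settles into long same-type runs — Schelling's aggregate-from-micro point in a dozen lines.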

Friday Morning

First session

KLAUS JAFFE  Universidad Simón Bolívar
Sociodynamics: Towards a Fundamental Science of Social Dynamics

A fine demonstration of his modeling tool, SOCIODYNAMICA:
   “Sociodynamics is an interdisciplinary attempt to study the dynamics of
complex systems within the conceptual frame of subjects spanning
biology, sociology, politics, history, economy and other sciences.”

Second session

Computational Dialects and Communities of Discourse

Pertinent presentation on the importance of the language used in modeling: the ambiguity of words versus the lack of it in agents' actions.  Also the need to build in threats & intimidation, humor and irony, etc.

Interesting discussion of what he terms “discourse communities.”  i.e. “Dynamic interplay of cultural resources and situated identities.”

Third session

Predicting Risky Behavior in Tribal Societies: Validating Decision Paradigms and Exploring Models

Addresses issue of “deep uncertainty” — experts agree on what the basic theory is, but can't agree on what the parameters and metrics should be.  And in his field, anthropology, can't even agree on what should be the issues and framework.

His approach is to apply a number of theoretical metrics (15 models) to building a “society” (based on good anthro data) and see which works best.  An approach closely related to exploratory data analysis that analytic journalists often use.

Commonalities of models that worked well:
    1) Agents were quasi-optimal (smart)
    2) Agents were nonetheless diverse (heterogeneous; e.g., individual agents doing different things)

Fourth session

An Approach to Simulating Mobility and Migratory Behavior in Tijuana

Interesting related link here for Agenda Setting Workshop on Social Simulation.

Good presentation on simulation (computational modeling?) of the tuberculosis cycle in Tijuana, plus looking at models of corruption.  He points out that the Chinese population in Tijuana is growing very fast.  Interesting, and valuable, application of Maslow's pyramid-of-needs concepts (i.e., starting with physical needs, to social, to moral needs).

Working on integrating Beer's Viable System Model with transactional analysis models.


Fifth Session

Interpretive Heatbugs: Aggressive Acts and Voluntary Contributions

Sixth Session

*) Warmer-climate societies tend to be poorer.  (Because the ready availability of food means that individuals do not have to think long-term.)
*) Refutes the idea that “we are blind to emergence.”  Instead points to the “emergence” of academic disciplines.  [So how come some disciplines, e.g. biology, seem to have many more emerging sub-disciplines than those from the Humanities (and especially journalism)?]

Seventh Session:

“Smart Parts Logistics Systems as Complex Adaptive Systems: How to Design a Model to Manage an Artificial World?”

Objective: to make logistics systems work in/as complex adaptive models.
[Essentially, this is about the best — most efficient — way to receive raw materials and deliver the finished product to customers of various types.  Could have direct application for publishing industry, if it only knew about such methods.]

They are researching how to build RFID chips into products like cars to imbue the product with enough intelligence to, for example, figure out the optimal way to get itself to a truck or ship.

PlaSMA: Multiagent-based simulation for logistics

Data Visualization is Everywhere!
Apr 12th, 2007 by JTJ

This doesn't have anything to do with Analytic Journalism per se, but
while flying from Cairo to Dubai recently I looked out the window at
39,000 feet somewhere over the sands of central Saudi Arabia.  What to
my wondering eyes did appear, but an expanse of pie charts. 

Of course these are irrigated crops.  A friend in Dubai, who grew up in
Saudi Arabia, said the reason they are not all completely filled
circles is that some growers don't have enough money (yet) to buy
the equipment necessary to complete the 360-degree irrigation.

Good jump site for satellite maps
Apr 12th, 2007 by JTJ

Our thanks to someone somewhere who pointed us to “Flashearth,” an interesting site under development that supplies links to multiple mapping programs drawing on global satellite imagery.  They are: Google Maps; Microsoft VE (aerial); Microsoft VE (labels); Yahoo Maps; (aerial); (physical); OpenLayers; NASA Terra (daily).

The sites vary in the degree of “zoomability,” but each offers slightly different capability and data.  In any event, it is most likely worthy of a bookmark.

The good stuff just keeps coming and coming
Feb 25th, 2007 by JTJ

We realize there is a robust handful of very good infographic reporters and designers working out there for many different publications, but the gang at the NY Times just keeps on keepin' on with innovative — and 98 percent of the time — highly informative infographics and visual displays of data.  Today's (25 Feb 2007) edition is a basket rich with fine examples:

* “Truck Sales Slip, Tripping Up Chrysler” (Business Section, p. 8). Offers up a complex (they often are) “treemap” of vehicle sales.

* “Who Do You Think We Are?” (Week in Review – Op-Art, p. 15).  Ben Schott, author of “Schott's Original Miscellany” and “Schott's Almanac 2007,” a yearbook of American society, presents some basic line and bar charts, but on subjects of interest to AJ readers.  Specifically, “Confidence in Institutions” (the “press” is the lowest, even below Congress) and “Newspaper Readership.”  (And you already know what that graph looks like.)

* “How Two Rights Can Make a Wrong” (Week in Review – p. 5).  Howard Markel, M.D., and Bill Marsh give us a fine graphic illustrating complex drug interactions.
