Something cool for the Excel day-trippers
Aug 18th, 2006 by JTJ

OK, OK.  Maybe we've crossed over some line of social acceptability, but this is a neat addition to the analytic journalist's toolbox.  My friend Mike Collins tips us off to:

Lightweight data exploration in Excel


Lifehacker, delicious folks! This post generated a ton of great community ideas. Check out our followup post to see some more ideas and to download a spreadsheet with demos. Thanks.

We often are given a chunk of data in Excel that we need to explore.
Of course, the first tool you should pull out of your toolbox in cases
like this is the trusty PivotTable (it slices, it dices!). But at times
we have to dig a little deeper into the toolbox and pull out the
in-cell bar chart. Here’s what it looks like.

In cell bar charts in Excel

This picture shows some Major League Baseball data. I’m graphing the
number of walks each player has taken. The bar graphs are built using
the Excel REPT function which lets you repeat text a certain number of
times. REPT looks like this:

REPT(text, number_times)

For instance, REPT("X",10) gives you "XXXXXXXXXX". REPT can also
repeat a phrase; REPT("Oh my goodness! ",3) gives "Oh my goodness! Oh
my goodness! Oh my goodness! " (my daughter’s an Annie fan).

For in-cell bar charts, the trick is to repeat a single bar “|”.
When formatted in 8 point Arial font, single bars look like bar graphs.
Here’s the formula behind the bars:

The formula behind the bar
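The image showing the original formula did not survive here, so what follows is only a sketch of the same idea in Python, not the formula from the post. The helper names `rept` and `in_cell_bar`, the `scale` parameter, and the walk totals are all assumptions made up for illustration.

```python
def rept(text: str, number_times: float) -> str:
    """Mimic Excel's REPT: repeat text, truncating a fractional count.

    Assumption: a negative count yields an empty string here, whereas
    Excel itself reports an error for a negative count.
    """
    return text * max(int(number_times), 0)


def in_cell_bar(value: float, scale: float = 1.0, mark: str = "|") -> str:
    """Draw one mark per `scale` units of `value` (hypothetical helper)."""
    return rept(mark, value / scale)


# Hypothetical walk totals, for illustration only (not the data in the post).
walks = {"Player A": 62, "Player B": 35, "Player C": 8}

for name, count in walks.items():
    # Name, raw number, then the bar, much like adjacent spreadsheet columns.
    print(f"{name:<10}{count:>4} {in_cell_bar(count, scale=2)}")
```

Dividing by a scale factor keeps long bars from overflowing the column; in a spreadsheet the same effect would come from dividing the cell value before passing it to REPT.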

What are some practical uses of in-cell bar graphs? For starters,
they offer a good way to profile a dataset that has hundreds or
thousands of rows. Here’s a picture of in-cell bars compared to a
standard Excel bar graph for a dataset with about 500 rows. It can be a
lot easier to scan the results when they’re in-cell.

Exploring tall data with in-cell bar graphs
Exploring the same data with an Excel bar graph

Another usage is lightweight dashboards. The report below compares a
number of metrics for players using both in-cell bar graphs as well as
conditional formatting. The conditional formatting highlights the top
25% of each metric in green and the bottom 25% in red, but that is a
story for another day.
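The post leaves the conditional formatting for another day, but the top-25%/bottom-25% rule can be roughed out in Python. This is a sketch under assumptions: the function name `quartile_flags` is hypothetical, and the cutoffs use inclusive quartiles, which may not match however the original spreadsheet computed them.

```python
import statistics


def quartile_flags(values):
    """Label each value 'green' (top quarter), 'red' (bottom quarter), or ''.

    Assumption: cutoffs are the inclusive first and third quartiles;
    values equal to a cutoff are flagged.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    flags = []
    for v in values:
        if v >= q3:
            flags.append("green")
        elif v <= q1:
            flags.append("red")
        else:
            flags.append("")
    return flags


# Hypothetical metric values, for illustration only.
metric = [10, 20, 30, 40, 50, 60, 70, 80, 90]
for value, flag in zip(metric, quartile_flags(metric)):
    print(f"{value:>3} {flag}")
```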

The formula behind the bar

"Making sense of the world by having fun with statistics!"
Aug 16th, 2006 by JTJ

Fascinating display of global statistics on the Gapminder site.  The homepage currently has some dynamic displays related to
Human Development Trends: 2005.  Well worth watching, but be sure to scroll down the page to scan all the useful articles and presentations available.

Then, perhaps saving the best for last, go to the Gapminder Tool.  Note that you can play with the axes to change (a) what is graphed and (b) how it is graphed (log or linear), and hit the play button on the bottom to see how the numbers changed over the past years.  [Thanks to Patti Schank for this good tip.]

Search statistics through Google and watch it move with Gapminder

Google Subscribed Links makes it possible to search deep into Gapminder's moving graphs visualizing world development.

Subscribe or go straight to the graph.

Contact with questions or suggestions for improvements.

A MUST read: The (Ongoing) Vitality of Mythical Numbers
Jun 30th, 2006 by JTJ

“The (Ongoing) Vitality of Mythical Numbers
This article serves as a valuable reminder that we should view
all statistics, no matter how frequently they are used in
public arguments, with skepticism until we know who produced
them and how they were derived.”


Neat New Stuff I Found This Week


Copyright, Marylaine Block, 1999-2006.

A most-helpful statistics site
Jun 25th, 2006 by JTJ

From the good folks at Internet Scout:

HyperStat Online [Last reviewed December 19, 1997]

Does the mere mention of the phrase “sampling distributions” bring a tingle
to your spine? Visitors to this site will fear this basic concept of
statistics (along with many others) no longer, as it does a fine job of
explaining them in a fashion that is both lucid and jargon-free. Created and
maintained by Professor David M. Lane of Rice University, the HyperStat
Online site contains an online introductory statistics textbook, complete
with sections on normal distributions, confidence intervals, prediction, and
the logic of hypothesis testing. Each section contains a number of discrete
subsections, and users can feel free to browse around at their leisure.
Professor Lane has also included a number of external links to related
resources, including a visual statistics site by David Krus of Arizona State
University and a “Stat Primer”, authored by Bud Gerstman of San Jose State
University. Overall, this site is tremendously helpful, and will be of great
assistance to those entering the world of statistics for the first time.

Challenging the DATA of conventional wisdom
Jun 12th, 2006 by JTJ

Kudos this morning to National Public Radio's reporting on a Duke professor who thought the numbers on Chinese engineering grads seemed a little off kilter.

Figures on Chinese Engineers Fail to Add Up

Morning Edition, June 12, 2006 · A report cited in The New York Times
and quoted on the House floor claimed China graduates nine times as
many engineers as the U.S. Skeptical, a Duke professor had students
check the numbers.

For our readers in the UK….
Mar 23rd, 2006 by JTJ

Bridging quantitative and qualitative methods for social sciences using text mining techniques

Organiser: Dr Sophia Ananiadou ((0161) 3063092),
School of Informatics, University of Manchester, and National Centre for Text Mining

Date and location

28 April 2006, Weston Conference Centre, University of Manchester.


To register for this workshop please complete the registration form.


This workshop aims to bring together researchers from different
subject areas (computer scientists, computational linguists, social
scientists, psychologists, etc.) in order to explore how text mining
techniques can revolutionise quantitative and qualitative research
methods in social sciences. New technologies from text mining (e.g.
information extraction, summarisation, question-answering, text
categorisation, sectioning, topic identification, etc.), which go beyond
concordances, frequency counts, etc., can be used for quantitative and
qualitative content analysis of different data types (e.g. transcripts
of interviews, questionnaire analysis, archives, chatroom files,
weblogs, etc.). The semantic analysis of new text types, e.g. weblogs, is
important for sociologists and political scientists in inferring social
trends. Reputation and sentiment analysis collects and identifies
people’s opinions, attitudes and sentiments in text. Text mining
techniques also aid metadata creation for qualitative data and
facilitate their sharing.

Summer workshop on IPUMS databases
Mar 20th, 2006 by JTJ

A good learning opportunity in the Land of Lakes this summer….

Dear IPUMS Users,

I am pleased to announce the first annual IPUMS Summer Workshop, to be held
in Minneapolis on July 19th-21st. This training session will cover four
major databases: IPUMS-USA, IPUMS-International, IPUMS-CPS, and the North
Atlantic Population Project (NAPP).

For more information, please visit

I hope to see some of you in Minneapolis this summer.


Steven Ruggles
Principal Investigator
IPUMS Projects

What about those polls, eh?
Mar 10th, 2006 by JTJ

Marylaine Block, at Ex Libris: an E-Zine for Librarians and Other Information Junkies, tips us off to another good blog for analytic journalists.  Click below to see what Charles Franklin has to say about presidential polls.

Political Arithmetik – Where Numbers and Politics Meet
by Charles Franklin, a professor at the University of Wisconsin who teaches statistical analysis of
polls, public opinion and election results. He helps people understand
issues like political bias in poll samples and questions, and provides
historical context for current data.

More lies, loudly spoken, from the Bush Administration?
Jan 20th, 2006 by JTJ

Posted on Thu, Jan. 19, 2006

Feds dispute mine safety report


Knight Ridder Newspapers

WASHINGTON – Federal mine safety officials on Wednesday disputed a Knight Ridder analysis showing a dramatic reduction in the dollar amount of large fines for mine safety violations during the Bush administration, saying in an Internet posting that those fines are actually up.

Mine Safety and Health Administration spokesman Dirk Fillpot said that Knight Ridder made “assumptions that were incorrect” in its Jan. 6 analysis.  But when Knight Ridder conducted a new analysis in the manner suggested by Fillpot using MSHA's newest database, it showed the same dramatic drop.

The newest data show a 43 percent reduction in proposed median major fines from the last five years of the Clinton administration when compared with the first five years of the Bush administration. That's the same percentage reduction found in Knight Ridder's original analysis, using a smaller, online database of MSHA violations.

When asked about that drop and the analysis, Fillpot refused Wednesday to answer 11 specific questions about MSHA's fines, its analysis or the posting of its critique.  Instead Fillpot repeated a prepared statement that said “it is unfortunate that Knight Ridder's analysis of MSHA's penalties was inaccurate.”

But four statistical experts who looked at the databases and analyses said Knight Ridder's findings were accurate and that MSHA's assessment didn't contradict the newspaper's findings of smaller fines during the Bush administration.

“It's really wrong for them (MSHA) to say you're incorrect,” said John Grego, a professor of statistics at the University of South Carolina in Columbia. “There's no question that the average/median proposed penalty has gone down.”

MSHA's response “is looking at two different things and making a statement as if they are looking at the same thing,” said Jeff Porter, a database library director for Investigative Reporters and Editors Inc., an association of journalists. Porter also teaches data analysis at the University of Missouri School of Journalism.

On its Web site, MSHA said the size of the final assessments, which are lower after bargaining and appeals, is up by “nearly 38 percent.”

Knight Ridder looked only at proposed fines because some of the actual fines are determined not by MSHA, but by administrative judges when mining companies appeal those penalties. Further, Knight Ridder found that fines finally assessed and paid are still lower on average in the Bush administration.

Fillpot wouldn't explain how his agency came up with the 38 percent figure.

The statistical experts said they couldn't understand how MSHA figured that out. Fillpot said “that information is taken from actual MSHA enforcement records and is accurate.”  He refused to elaborate.

In an unusual posting on the Internet on the Martin Luther King Jr. Day holiday on Monday, MSHA said, “Knight-Ridder's numbers are inaccurate, obscuring the reality that penalties issued by MSHA have gone up during this Administration — not down.''

After Knight Ridder questioned the posting, it was taken down Tuesday afternoon. It went back up Wednesday morning.

Among fines of $10,000 or more, the median penalty levied in the past five years was $27,139. During the last five years of the Clinton administration, the comparable fine was $47,913, according to Knight Ridder's analysis of the newest data from MSHA.
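For readers who want to check the arithmetic, here is a small Python sketch. It assumes the 43 percent figure quoted earlier in the article is derived from the two medians just given ($47,913 and $27,139); the function name `percent_change` is ours, not Knight Ridder's.

```python
def percent_change(old: float, new: float) -> float:
    """Percentage change from old to new; negative means a drop."""
    return (new - old) / old * 100


# Median large fine (among fines of $10,000 or more), as reported in the article.
clinton_median = 47_913
bush_median = 27_139

drop = percent_change(clinton_median, bush_median)
print(f"Change in median large fine: {drop:.1f}%")
```

Under that assumption the two medians work out to a drop of roughly 43 percent, consistent with the figure the article reports.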

That data, which included 221 large fines that weren't in the publicly available database used by Knight Ridder for its initial analysis, show that the total number of large fines increased to 527 in the Bush administration from 461 during the last five Clinton years.

Fillpot declined to say where those extra fines came from or why they weren't in the online database.

(Johnson reports for Lexington Herald-Leader.)


Indirect indicators. Or maybe not.
Dec 5th, 2005 by Tom Johnson

Journalists have a tendency to be too literal.  We want to ask a
question and we want the response to be a quote that is without
ambiguity.  One that fills in some of the space between our
anecdotes.  But other times, we need tools that work like a
periscope, a device that allows us to look at the object not directly
but through a helpful lens.  Such periscopes for analyzing the
economy are indirect indicators.

(5 Dec. 2005) NYTimes' Business Section was loaded with references to
such indicators that journos could keep in mind when looking for
devices to show and explain what's happening.  Check out “What's Ahead: Blue Skies, or More Forecasts of Them?”  Be sure to click on the link “Graphic: Indicators From Everyday Life.”

Another indirect indicator was mentioned Sunday on National Public Radio in “Economic Signs Remain Strong.”  There, an economist said he tracks changes in the “titanium dioxide” data; the compound is used in all white paint and reflects manufacturing production.