PDF Tables: Outstanding tool extracts tables to Excel
Jun 13th, 2015 by Tom Johnson

I just gave this a spin using the City of Santa Fe 2015 budget, a 150-pager. The conversion seemed very fast and quite accurate. Unless you need the text, it is even faster if you edit out the text pages and run only the pages containing the desired tables. Each PDF page becomes a separate Excel sheet, which can then be sliced-and-diced as necessary.
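
As a rough sketch of that last step, here is one way the per-page tables might be stitched back together once each sheet has been saved as CSV. It assumes every page repeats the same header row. The sample rows and the `stack_pages` name are made up for illustration and are not part of PDF Tables.

```python
import csv
import io

def stack_pages(pages):
    """Combine per-page CSV texts into one list of rows, keeping the header once.

    Assumes (hypothetically) that every page starts with the same header row.
    """
    combined = []
    header = None
    for page_text in pages:
        rows = [r for r in csv.reader(io.StringIO(page_text)) if r]
        if not rows:
            continue  # skip pages that held no table
        if header is None:
            header = rows[0]
            combined.append(header)
        # Drop the repeated header on each page, keep the data rows
        data = rows[1:] if rows[0] == header else rows
        combined.extend(data)
    return combined

# Made-up sample standing in for two exported budget pages
page1 = "Department,Amount\nFire,100\nPolice,200\n"
page2 = "Department,Amount\nParks,50\n"
table = stack_pages([page1, page2])
total = sum(int(row[1]) for row in table[1:])
print(table)
print(total)
```

With everything in one table, sorting, filtering and totals work the same way they would on a single spreadsheet tab.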

Kudos to the ScraperWiki folks.

Accurately extract tables from PDFs
No more time consuming and error prone copying and pasting
https://pdftables.com/

-tj

Tracking campaign contributions with MapLight
Jun 19th, 2014 by Tom Johnson

MapLight, a 501(c)(3) foundation, recently announced its “extensive mapping project examining the geographic origin of contributions to legislators by state; contributions from companies to legislators by state; and roll call votes by state and district on key bills in Congress.”

Today’s news peg points to “Who in Your State Has Contributed Money to Majority Leader Candidate Kevin McCarthy (R-CA)?”

MapLight looks to be a good addition to our GIS toolbox.

Important conference on Quantifying Journalism at Columbia J-School
May 30th, 2014 by Tom Johnson

The first Tow Research conference, Quantifying Journalism: Metrics, Data and Computation, on May 30, 2014, reflected on a big year in data journalism. It brought together academics, practitioners and technologists to explore three critical questions at the heart of the data journalism conversation.
https://www.youtube.com/watch?v=JMDMCIv0-So

An up-dated data clean-up tool at Google-Refine
Nov 14th, 2010 by Tom Johnson

Check out Google-Refine at http://code.google.com/p/google-refine/


Google Refine is a power tool for working with messy data, cleaning it up, transforming it from one format into another, extending it with web services, and linking it to databases like Freebase.


 

Distributed Data Analysis at Facebook
Dec 1st, 2009 by analyticjournalism

This is a few months old, but we're wondering if any readers have used Hive or tried to deploy it in newsrooms, where “exploring and analyzing data…[is] everyone's responsibility.”

Distributed Data Analysis at Facebook

Exploring and analyzing data isn’t the responsibility of one team here at Facebook; it’s everyone’s responsibility. “Move fast” is one of our core values, and to facilitate fast data-driven decisions, the Data Infrastructure Team has created tools like Hive and its UI sidekick, HiPal, to make analyzing Facebook’s petabytes of data easy for anyone in the company. The Data Science team runs open tutorial sessions for groups eager to run their own analysis using these tools. And non-programmers on every team have fearlessly rolled up their sleeves to learn how to write Hive queries.

Today, Facebook counts 29% of its employees (and growing!) as Hive users. More than half (51%) of those users are outside of Engineering. They come from distinct groups like User Operations, Sales, Human Resources, and Finance. Many of them had never used a database before working here. Thanks to Hive, they are now all data ninjas who are able to move fast and make great decisions with data.

If you like to move fast and want to be a data ninja (no matter what team you are in), check out our Careers page.


 

The Evolution of Darwin's Ideas
Sep 7th, 2009 by analyticjournalism

FlowingData passes along the link to this fine piece of work by Ben Fry, “Ben Fry Visualizes the Evolution of Darwin’s Ideas.” Journos could use a similar approach to analyze the evolution of the ideas of public officials.

Ben Fry Visualizes the Evolution of Darwin’s Ideas

Posted by Nathan / Sep 7, 2009 to Artistic Visualization / 2 comments

“Ben Fry, well-known for Processing and plenty of other data goodness, announced his most recent piece, On the Origin of Species: The Preservation of Favoured Traces, made possible by The Complete Work of Charles Darwin Online.

The visualization explores the evolution of Charles Darwin's theory of, uh, evolution. It began as a less-defined 150,000-word text in the first edition and grew and developed to a 190,000-word theory in the sixth edition.

Watch where the updates in the text occur over time. Chunks are removed, chunks are added, and words are changed. Blocks are color-coded by edition. Roll over blocks to see the text underneath.

As usual, excellent work, Mr. Fry.”


 

 

Designing for Big Data
Apr 29th, 2009 by analyticjournalism

Much of this is well-known to those of us who have worked with dataviz for the past decade or two, but Jeffrey Veen's closing conclusions are solid and worth reviewing.

Key quote from Jeffrey Veen: “We need to create tools to help people manipulate THEIR data.”

He offers good examples of how to use large data sets to find and tell stories and, if desired, to answer YOUR questions about the data.


Video: Designing for Big Data

This is a 20-minute talk I gave at the Web2.0 Expo in San Francisco a couple weeks ago. In it, I describe two trends: how we're shifting as a culture from consumers to participants, and how technology has enabled massive amounts of data to be recorded, stored, and analyzed. Putting those things together has resulted in some fascinating innovations that echo data visualization work that's been happening for centuries.

I've given this talk a few times now, but this particular delivery really went well. Only having 20 minutes forced me to really stay focused, and the large audience was very engaged. I'll be giving an extended version of this talk in June at the UX London conference, with a deeper look at how we integrated design and research while I was at Google.

http://www.veen.com/jeff/archives/001000.html


 

How the right kind of data visualization could lead to new research questions or insights.
Dec 30th, 2008 by analyticjournalism

Nathan, over at Flowingdata.com, posts this interesting data visualization from the Baylor College of Medicine. No, it probably doesn't give a science writer a story in itself, but the concept of taking a complex data set and illustrating that data with the right tool, in this case Circos, could generate some interesting reporting vectors. For example, could Circos show us something about traffic patterns? Ambulance or fire department response times? We're not sure, but we hope someone could probe this a bit.

Researchers Map Chaos Inside Cancer Cell

Posted by Nathan / Dec 29, 2008 to Network Visualization / 2 comments

The thing about cancer cells is that they suck. Their DNA is all screwy. They've got chunks of DNA ripped out and reinserted into different places, which is just plain bad news for the cells in our body that play nice. You know, kind of like life. Researchers at the Baylor College of Medicine in Houston have compared the DNA of a certain type of breast cancer cell to a normal cell and mapped the differences (and similarities) with the above visualization.

The graphic summarizes their results. Round the outer ring are shown the 23 chromosomes of the human genome. The lines in blue, in the third ring, show internal rearrangements, in which a stretch of DNA has been moved from one site to another within the same chromosome. The red lines, in the bull's eye, designate switches of DNA from one chromosome to another.

Some design would benefit the graphic so that your eyes don't bounce around when you look at the technicolor genome but it's interesting nevertheless.

Check out the Flare Visualization Toolkit or Circos if you're interested in implementing a similar visualization with the above network technique.



http://flowingdata.com/2008/12/29/researchers-map-chaos-inside-cancer-cell




 

Three Tuesdays workshop on data and the political campaigns at the Santa Fe Complex
Sep 27th, 2008 by Tom Johnson

Handicapping the Horserace

Published by Don Begley at 10:09 pm under Complex News, event

  • September 30, 2008 – 6:30-8 pm
  • October 7, 2008 – 6:30-8 pm
  • October 14, 2008 – 6:30-8 pm

It’s human nature: Elections and disinformation go hand-in-hand. We idealize the competition of ideas and the process of debate while we listen to the whisper campaigns telling us of the skeletons in the other candidate’s closet. Or, we can learn from serious journalism to tap into the growing number of digital tools at hand and see what is really going on in this fall’s campaigns. Join journalist Tom Johnson for a three-part workshop at Santa Fe Complex to learn how you can be your own investigative reporter and get ready for that special Tuesday in November.

Over the course of three Tuesdays, beginning September 30, Johnson will show workshop participants how to do the online research needed to understand what’s happening in the fall political campaign. There will be homework assignments and participants will contribute to the Three Tuesdays wiki so their discoveries will be available to the general public.

Everyone is welcome but space will be limited. A suggested donation of $45 covers all three events or $20 will help produce each session. Click here to sign up.

  • The Daily Tip Sheet (September 30, 6:30 pm)

    Newspapers are a ‘morning line’ tip sheet. There isn’t enough room for what you need to know.

    Newspapers can be a good jumping-off point for political knowledge, but they rarely have enough staff, staff time and space to really drill down into a topic. Ergo, it is increasingly up to citizens to do the research to preserve democracy and help inform voters. Tonight we will be introduced to some of the city, state and national web sites that can help in our reporting and to a few digital tools for saving and retrieving what we find.
  • Swimming Against the Flow (October 7, 6:30 pm)

    How to track data to their upstream sources.

    A web page and its data are not static events. (Well, usually they are not.) Web pages and digital data all carry “signs” of where they came from, who owns the site(s) and sometimes who links to the sites. We will discuss how investigators can use these attributes to our advantage, and also take a step back to consider the “architecture of sophisticated web searching.”
  • The Payoff (October 14, 6:30 pm)

    Yup, it IS about following the money. But then what?

    Every election season, new web sites come along that make it easier to follow the money — election money. This final workshop looks at some of those sites and focuses on how to get their data into a spreadsheet. Then what? A short intro to slicing-and-dicing the numbers. (Even if you are a spreadsheet maven, please come and act as a coach.)
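
For those who want a head start on the slicing-and-dicing, a minimal sketch in Python follows. The column names and dollar figures are invented for illustration; a real download from a campaign-finance site will have its own layout, so treat `total_by` as a hypothetical helper rather than a recipe for any particular site.

```python
import csv
import io
from collections import defaultdict

# Invented sample data; real campaign-finance exports will differ
SAMPLE = """donor,city,amount
A. Smith,Santa Fe,250
B. Jones,Albuquerque,1000
C. Lee,Santa Fe,500
D. Ruiz,Taos,100
"""

def total_by(rows, key, value="amount"):
    """Sum the numeric `value` column for each distinct entry in `key`."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += float(row[value])
    return dict(totals)

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
by_city = total_by(rows, "city")
# Largest totals first, like sorting a spreadsheet column
ranked = sorted(by_city.items(), key=lambda kv: kv[1], reverse=True)
for city, amount in ranked:
    print(f"{city}: ${amount:,.0f}")
```

This is the same thing a pivot table does in a spreadsheet: group the rows by one column, add up another, then sort to see who gave the most.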

This workshop is NOT a sit-and-take-it-in event. We’re looking for folks who want to do some beginning hands-on (“on-line hands-on,” that is) investigation of New Mexico politics. And that means homework assignments and contributing to our Three Tuesdays wiki. Participants are also encouraged to bring a laptop if they can. Click here to sign up.


Tom Johnson’s 30-year career path in journalism is one that regularly moved from the classroom to the newsroom and back. He worked for TIME magazine in El Salvador in the mid-80s, was the founding editor of MacWEEK, and a deputy editor of the St. Louis Post-Dispatch. His areas of interest are analytic journalism, dynamic simulation models of publishing systems, complexity theory, the application of Geographic Information Systems in journalism and the impact of the digital revolution on journalism and journalism education. He is the founder and co-director of the Institute for Analytic Journalism and a member of the Advisory Board of Santa Fe Complex.


 

JAGIS at The University of Hong Kong
Dec 16th, 2007 by Tom Johnson

What have we here? Cooperation between two academic departments in the same university? Largely unheard of in most schools, but it has happened with positive results in Hong Kong.

23 Nov 2007
http://www.hku.edu/press/news_detail_5671.html

Power Distribution of the Four Political Camps, Seeing the 2007 District Council Election Results with Maps

The Department of Geography and the Journalism and Media Studies Centre of The University of Hong Kong (HKU) announced today (November 23) an analysis of results of the 2007 District Council Election of four political camps from the spatial perspective.

Dr. P.C. Lai, Associate Professor of the Department of Geography, and her team applied the Geographic Information System (GIS) to analyze results of the District Council Election. The GIS technology was used to explore the power re-distribution of the four political camps or affiliations – pro-government, pro-democrat, moderate (Liberal Party) and independent candidates – of the said election.
