Alfredo Covaleda,
Bogota, Colombia
Stephen Guerin,
Santa Fe, New Mexico, USA
James A. Trostle,
Trinity College, Hartford, Connecticut, USA
Kudos to the ScraperWiki folks.
Accurately extract tables from PDFs. No more time-consuming and error-prone copying and pasting: https://pdftables.com/
-tj
MapLight, a 501(c)(3) foundation, recently announced its “extensive mapping project examining the geographic origin of contributions to legislators by state; contributions from companies to legislators by state; and roll call votes by state and district on key bills in Congress.”
Today’s news peg points to “Who in Your State Has Contributed Money to Majority Leader Candidate Kevin McCarthy (R-CA)?”
MapLight looks to be a good addition to our GIS toolbox.
The first Tow Research conference, Quantifying Journalism: Data, Metrics, and Computation, held on May 30, 2014, reflected on a big year in data journalism. It brought together academics, practitioners and technologists to explore three critical questions at the heart of the data journalism conversation. https://www.youtube.com/watch?v=JMDMCIv0-So
Check out Google-Refine at http://code.google.com/p/google-refine/
Google Refine is a power tool for working with messy data, cleaning it up, transforming it from one format into another, extending it with web services, and linking it to databases like Freebase.
This is a few months old, but we're wondering if any readers have used Hive or tried to deploy it in newsrooms, where “exploring and analyzing data…[is] everyone's responsibility.”
Exploring and analyzing data isn’t the responsibility of one team here at Facebook; it’s everyone’s responsibility. “Move fast” is one of our core values, and to facilitate fast data-driven decisions, the Data Infrastructure Team has created tools like Hive and its UI sidekick, HiPal, to make analyzing Facebook’s petabytes of data easy for anyone in the company. The Data Science team runs open tutorial sessions for groups eager to run their own analysis using these tools. And non-programmers on every team have fearlessly rolled up their sleeves to learn how to write Hive queries.
Today, Facebook counts 29% of its employees (and growing!) as Hive users. More than half (51%) of those users are outside of Engineering. They come from distinct groups like User Operations, Sales, Human Resources, and Finance. Many of them had never used a database before working here. Thanks to Hive, they are now all data ninjas who are able to move fast and make great decisions with data.
If you like to move fast and want to be a data ninja (no matter what team you are in), check out our Careers page.
FlowingData passes along the link to this fine piece of work by Ben Fry: “Ben Fry Visualizes the Evolution of Darwin’s Ideas.” Journos could use a similar approach to analyze the evolution of the ideas of public officials.
“Ben Fry, well-known for Processing and plenty of other data goodness, announced his most recent piece, On the Origin of Species: The Preservation of Favoured Traces, made possible by The Complete Work of Charles Darwin Online.
The visualization explores the evolution of Charles Darwin's theory of, uh, evolution. It began as a less-defined 150,000-word text in the first edition and grew and developed to a 190,000-word theory in the sixth edition.
Watch where the updates in the text occur over time. Chunks are removed, chunks are added, and words are changed. Blocks are color-coded by edition. Roll over blocks to see the text underneath.
As usual, excellent work, Mr. Fry.”
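For journos who want to try the same idea on a public official's statements, here is a minimal sketch of the underlying comparison, not Fry's actual method. It uses Python's standard difflib module to list the word-level spans that were removed, added, or changed between two versions of a text. The function name word_changes and the two sample statements are hypothetical, made up for illustration.

```python
import difflib


def word_changes(old_text, new_text):
    """Return (operation, old_words, new_words) for each changed span.

    Both texts are split on whitespace and compared word by word.
    The operation is one of 'replace', 'delete', or 'insert'.
    """
    old_words = old_text.split()
    new_words = new_text.split()
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # skip spans that are identical in both versions
            changes.append(
                (tag, " ".join(old_words[i1:i2]), " ".join(new_words[j1:j2]))
            )
    return changes


if __name__ == "__main__":
    # Hypothetical statements, invented for this example only.
    first = "We will cut taxes for every family in the state"
    second = "We will consider cutting taxes for working families in the state"
    for change in word_changes(first, second):
        print(change)
```

Run across many dated versions of a speech, platform, or policy page, the same comparison would give the raw material for a color-coded timeline like the one described above.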
Much of this is well known to those of us who have worked with dataviz for the past decade or two, but Jeffrey Veen's closing conclusions are solid and worth reviewing.
Key quote from Jeffrey Veen: “We need to create tools to help people manipulate THEIR data.”
Good examples of how to use large data sets to find and tell stories and, if desired, to answer YOUR questions about the data.
This is a 20-minute talk I gave at the Web2.0 Expo in San Francisco a couple weeks ago. In it, I describe two trends: how we're shifting as a culture from consumers to participants, and how technology has enabled massive amounts of data to be recorded, stored, and analyzed. Putting those things together has resulted in some fascinating innovations that echo data visualization work that's been happening for centuries.
I've given this talk a few times now, but this particular delivery really went well. Only having 20 minutes forced me to really stay focused, and the large audience was very engaged. I'll be giving an extended version of this talk in June at the UX London conference, with a deeper look at how we integrated design and research while I was at Google.
http://www.veen.com/jeff/archives/001000.html
Nathan, over at Flowingdata.com, posts this interesting data visualization from the Baylor College of Medicine. No, it probably doesn't give a science writer a story in itself, but the concept of taking a complex data set and illustrating that data with the right tool (in this case, Circos) could generate some interesting reporting vectors. For example, could Circos show us something about traffic patterns? Ambulance or fire department response times? We're not sure, but we hope someone will probe this a bit.
The thing about cancer cells is that they suck. Their DNA is all screwy. They've got chunks of DNA ripped out and reinserted into different places, which is just plain bad news for the cells in our body that play nice. You know, kind of like life. Researchers at the Baylor College of Medicine in Houston have compared the DNA of a certain type of breast cancer cell to a normal cell and mapped the differences (and similarities) with the above visualization.
The graphic summarizes their results. Round the outer ring are shown the 23 chromosomes of the human genome. The lines in blue, in the third ring, show internal rearrangements, in which a stretch of DNA has been moved from one site to another within the same chromosome. The red lines, in the bull's eye, designate switches of DNA from one chromosome to another.
Some design would benefit the graphic so that your eyes don't bounce around when you look at the technicolor genome but it's interesting nevertheless.
Check out the Flare Visualization Toolkit or Circos if you're interested in implementing a similar visualization with the above network technique.
http://flowingdata.com/2008/12/29/researchers-map-chaos-inside-cancer-cell
Published by Don Begley at 10:09 pm under Complex News, event
It’s human nature: Elections and disinformation go hand-in-hand. We idealize the competition of ideas and the process of debate while we listen to the whisper campaigns telling us of the skeletons in the other candidate’s closet. Or, we can learn from serious journalism to tap into the growing number of digital tools at hand and see what is really going on in this fall’s campaigns. Join journalist Tom Johnson for a three-part workshop at Santa Fe Complex to learn how you can be your own investigative reporter and get ready for that special Tuesday in November.
Over the course of three Tuesdays, beginning September 30, Johnson will show workshop participants how to do the online research needed to understand what’s happening in the fall political campaign. There will be homework assignments and participants will contribute to the Three Tuesdays wiki so their discoveries will be available to the general public.
Everyone is welcome but space will be limited. A suggested donation of $45 covers all three events or $20 will help produce each session. Click here to sign up.
This workshop is NOT a sit-and-take-it-in event. We’re looking for folks who want to do some beginning hands-on (“on-line hands-on,” that is) investigation of New Mexico politics. And that means homework assignments and contributing to our Three Tuesdays wiki. Participants are also encouraged to bring a laptop if they can. Click here to sign up.
Tom Johnson’s 30-year career path in journalism is one that regularly moved from the classroom to the newsroom and back. He worked for TIME magazine in El Salvador in the mid-80s, was the founding editor of MacWEEK, and a deputy editor of the St. Louis Post-Dispatch. His areas of interest are analytic journalism, dynamic simulation models of publishing systems, complexity theory, the application of Geographic Information Systems in journalism and the impact of the digital revolution on journalism and journalism education. He is the founder and co-director of the Institute for Analytic Journalism and a member of the Advisory Board of Santa Fe Complex.
What have we here? Cooperation between two academic departments in the same university? Largely unheard of in most schools, but it has happened with positive results in Hong Kong.
23 Nov 2007 http://www.hku.edu/press/news_detail_5671.html
Power Distribution of the Four Political Camps, Seeing the 2007 District Council Election Results with Maps
The Department of Geography and the Journalism and Media Studies Centre of The University of Hong Kong (HKU) announced today (November 23) an analysis of results of the 2007 District Council Election of four political camps from the spatial perspective.
Dr. P.C. Lai, Associate Professor of the Department of Geography, and her team applied the Geographic Information System (GIS) to analyze results of the District Council Election. The GIS technology was used to explore the power re-distribution of the four political camps or affiliations – pro-government, pro-democrat, moderate (Liberal Party) and independent candidates – of the said election. [more]