Alfredo Covaleda,
Bogota, Colombia
Stephen Guerin,
Santa Fe, New Mexico, USA
James A. Trostle,
Trinity College, Hartford, Connecticut, USA
Friday's highlights from the conference in Amsterdam: Henk van Ess gave two fine training sessions, one yesterday and one this morning. The first was Training 02: Forensic surfing (Thursday 14.00 – 15.15). How can you figure out the reliability of a website – even without opening the site? How do you find the owner of a web site? How can you see how old a page is, even if it doesn't say 'Page last updated at..'? How do you find the author of a Word document? Welcome to the world of forensic surfing. Extra: CD-ROM with the course 'Internet Detective' for all participants. Watch the HTML version at www.searchbistro.com/forensic.htm The second session was Hacking with Google (Friday 9.30 – 10.45):
“People make mistakes. They put sensitive data on servers. They forget to remove delicate material. They leave directories open with hidden files. Learn how to use Google in a different way. The best search techniques for finding secret documents from governments, institutions and companies. Open them with the right questions. Henk van Ess (AD, Netherlands) teaches you what sort of words you have to type, which special syntax you have to use and how you should interpret the answers. Note: this training will teach you how to find material that shouldn't be on the web. It doesn't teach you how to hack into systems.” This presentation can be viewed at www.searchbistro.com/hack.htm There is a companion book – The Google Hacker’s Guide: Understanding and Defending Against the Google Hacker by Johnny Long (johnny@ihackstuff.com) — partial section at www.searchbistro.com/googlehacks.pdf
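For readers who have not met the "special syntax" the session description mentions, here is a rough sketch. It assumes the widely documented Google operators site:, filetype:, intitle: and inurl:. The helper names build_query and search_url, and the example.org domain, are made up for illustration; nothing here is taken from the van Ess presentation itself.

```python
from urllib.parse import urlencode


def build_query(terms, site=None, filetype=None, intitle=None, inurl=None):
    """Combine plain search terms with a few common Google operators.

    Hypothetical helper: the operators are real, the function is only a sketch.
    """
    parts = list(terms)
    if site:
        parts.append(f"site:{site}")          # restrict results to one domain
    if filetype:
        parts.append(f"filetype:{filetype}")  # restrict results to one file type
    if intitle:
        parts.append(f'intitle:"{intitle}"')  # phrase must appear in the page title
    if inurl:
        parts.append(f"inurl:{inurl}")        # word must appear in the URL
    return " ".join(parts)


def search_url(query):
    """Turn a query string into a Google search URL (nothing is fetched)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


# Example: spreadsheets mentioning "budget" on a placeholder domain.
query = build_query(["budget"], site="example.org", filetype="xls")
print(query)
print(search_url(query))
```

The sketch only assembles the query; deciding which words to combine and how to read the results is the part the training covers.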
http://www.sfexaminer.com/articles/2005/09/26/opinion/20050926_op03_policies.txt Using psychological science to set policy.
Profs. David Kleinbaum and Nancy Barker will present their online short course “Analysis of Epidemiologic Data” Oct. 14 – Nov. 11 at statistics.com. Topics covered in the course include: simple analysis of 2×2 tables, control of extraneous variables (including an introduction to logistic regression), stratified analysis, and matching.
David Kleinbaum, a professor at Emory University's Rollins School of Public Health, is internationally known for his textbooks in statistical and epidemiologic methods and as an outstanding teacher. He is the author of “ActivEpi” and “Epidemiologic Research: Principles and Quantitative Methods” and has also taught over 150 short courses over the past 30 years throughout the world.
Nancy Barker is a consulting biostatistician and a co-author of the “ActivEpi Companion Text,” and has over 10 years of experience teaching short courses in epidemiology and biostatistics at Emory University and the Centers for Disease Control and Prevention.
As with all online courses at statistics.com, there are no set hours when you must be online, and you can interact with the instructor over a period of 4 weeks via a private discussion board. We estimate you will need about 10 hours per week.
Registration: $399 ($299 academic)
http://www.statistics.com/content/courses/epi3/index.html
Peter Bruce
pbruce@statistics.com
P.S. Also coming up: “Clinical Trial Design,” Oct. 21 – Nov. 18, with Dr Vance Berger.
statistics.com
612 N. Jackson St.
Arlington, VA 22201
USA
Another piece in The Guardian this week (some of the Brit papers are a very good read) discusses how Tesco harvests, and then replants, customer data. This is of interest because Tesco, a British company, is hankering after the U.S. grocery chain Albertson's. See “Tesco stocks up on inside knowledge of shoppers' lives” below and the “Profile of an upmarket C10 deserter” sidebar.
· Crucible database is exhaustive – and secret
· Government bodies are tapped for information
Guardian
The company refuses to reveal the information it holds, yet Tesco is selling access to this database to other big consumer groups, such as Sky, Orange and Gillette. “It contains details of every consumer in the UK at their home address across a range of demographic, socio-economic and lifestyle characteristics,” says the marketing blurb of dunnhumby, the Tesco subsidiary in question. It has “added intelligent profiling and targeting” to its data through a software system called Zodiac. This profiling can rank your enthusiasm for promotions, your brand loyalty, whether you are a “creature of habit” and when you prefer to shop. As the blurb puts it: “The list is endless if you know what you are looking for.”
This publicity material was, until recently, available on the website of dunnhumby, but now appears less forthcoming. Attempts by a number of Guardian reporters to retrieve their own personal information under the Data Protection Act led to a four-month battle; the request was ultimately denied, so the Guardian has appealed to the Information Commissioner. Tesco has provided some personal data held by Clubcard, the loyalty scheme that monitors members' shopping and which has been credited with fuelling the supermarket group's astronomical growth in the past decade.
But as far as Crucible is concerned, the company admits it has “put great effort into designing our services” so information is classed in a way that circumvents disclosure provisions in the Data Protection Act. Clues about the content of dunnhumby's database have appeared in the company's marketing literature. Crucible, it says, is a “massive pool” of consumer data. “In the perfect world, we would know everything we need to know about consumers. We would have a complete picture: attitudes, behaviour, lifestyle. In reality, we never know as much as we would like.” But Crucible, it suggests, has got much further than rival systems by pooling data from several sources and then using the vast Clubcard data pool to profile customers.
Together, Crucible and Zodiac can generate a map of how an individual thinks, works and, more importantly, shops. The map classifies consumers across 10 categories: wealth, promotions, travel, charities, green, time poor, credit, living style, creature of habit and adventurous.
A “Mrs Pumpkin” is cited: she makes pennies work when she shops, mostly uses cash, has a steady repertoire of products but experiments with the new, shops at various times, spends a little more on eco-friendly items, is involved with charitable giving, is rarely away and likes promotions for things she buys.
How does Tesco get the information? Clubcard is used to target promotions at particular cardholders. But Crucible is separate, and Tesco insists that while loyalty scheme data is used by Crucible, it is used anonymously rather than on a house-by-house, name-by-name basis.
Dunnhumby's chairman, Clive Humby, offers a few more clues. Companies such as Experian, Claritas and Equifax have databases on individuals and Crucible collects from them all. Any questionnaire you may have completed, any reader offers you responded to, are bought to build up a picture of attitudes and habits. Crucible also trawls the electoral roll, collecting names, ages and housing information. It uses data from the Land Registry, Office for National Statistics and other bodies to generate a profile of the area you live in. Zodiac is employed to provide a more detailed profile. The combination is valuable to many consumer goods firms: dunnhumby generated profits of £4m on sales of £28m in the last year for which accounts are available. Some £12m of business was done directly with Tesco.
Mr Humby and Edwina Dunn founded dunnhumby. The two have a reputation as shrewd operators in the marketing industry and still own shares in the firm alongside Tesco's majority stake. How the supermarket group and other customers use the data is less clear. One former employee involved in the company's marketing told the Guardian that it can be used to decide how to target offers to individuals or where to open new stores.
A Tesco spokesman said last night: “All work carried out by dunnhumby is regulated by the Data Protection Act and the Direct Marketing Association Code of Practice.” But, as the supermarket unveils yet another set of sparkling half-year figures today, one thing is clear: while past success may have been built on the company knowing its customers, Tesco plans to secure its future by knowing everyone else's customers as well.
Profile of an upmarket C10 deserter
When it comes to my personal information, I'm a natural paranoid. So when signing up for a Tesco Clubcard to get those cashback vouchers and offers, I made a point of providing as little information as the application would allow.
No matter. According to Tesco's disclosures under the Data Protection Act (DPA), in the year my card was in use the supermarket managed to build a substantial – if rather wayward – portrait of this reluctant shopper's habits. A formal DPA request, followed by numerous letters to and fro, a terse telephone conversation and finally, a fax explaining that, yes, this information would be used in a journalistic exercise, finally produced two sides of information.
Apparently, I'm a gal who hankers after “finer foods” – indeed, a “natural chef”, though friends tell me this probably has more to do with my tendency to cook with natural ingredients than any signs of being a budding Nigella. I am, Tesco determines, “upmarket” – a reference, I suspect, to my habit of buying organic food (Green & Black's mint chocolate being a particular favourite).
The database defines me through the past four years, placing me in the mysterious “C10” category for 2003, having been an “H13” a year earlier – whatever that means. My “family type” is “other,” though alternative social options are not listed. Most importantly for the supermarket, I just don't spend as much as I could there. Under “share of spend” with Tesco I am deemed to have “potential”.
My household carries a “reference number”, the date of my last visit, with branches used in the past. It says whether I have used Clubcard vouchers and correctly states I do not want my personal information to be passed to other parts of the “Tesco Group”. There is no information as to whether I am diabetic, teetotal or have a special diet.
Five slots describe my “shopping habits”; each carries the words “Not shopped in last eight weeks”. Clearly, I'm a Tesco deserter and a prime candidate for those £10-off vouchers that have been dropping through the letter box of late.
· To learn how to get your personal information under the Data Protection Act, see www.guardian.co.uk/foi
The UK paper The Guardian carries a couple of interesting pieces this week on the British company The Press Association, or as it is now known, the PA Group. Essentially, they demonstrate that investment in creative people who can leverage digital technology can make money.
See “The new heart of British journalism” and “Service used by every paper makes only 1% of the money”.
The new heart of British journalism
A sleepy Yorkshire town has become the hub of an international publishing operation
Martin Wainwright
Tuesday September 20, 2005
Twice now, extraordinary things have happened to the sleepy market town of Howden – little more than a village on the rich, flat land where the river Humber is joined by the Yorkshire Ouse. The first time, in the 1920s when the local airfield became the centre of Britain's airship industry, ended abruptly with the loss of the R101 (and the then air minister) in a storm over northern France. The second time is now, and it shows no sign of collapsing at all.
Quietly over a decade, Howden has become one of the biggest centres of journalism in the country. More than 650 staff of the Press Association – well over double the organisation's workforce in London – occupy buildings scattered round the quaint streets, as if an Oxbridge college had dropped in. Editorial trainees are in the Bishop's Manor, a medieval roost with jumbo plasma TV screens in the fireplaces where the Bishops of Durham used to warm up after trekking down from the north-east. Guests from London stay in a redbrick Georgian manor house which looks like something out of Jane Austen.
The high command of PA Sport has the vast, curving top floor of a purpose-built office block which replaced the town's redundant police station and magistrates' court two years ago. From here, among scores of other sports information services, Premier League goals and match analysis are texted live to mobile phones all over the world.
Howden is the main laboratory for PA's expansion from a comprehensive and reliable news-wire into the structural support for newspapers, websites, television, radio and magazines. The guts of the service is produced elsewhere, by reporters at news events, parliament or sports fixtures, but the processing and ever more imaginative marketing go on in Yorkshire.
Tony Watson, PA's editorial director, a multiple award-winner and former editor of the Yorkshire Post, relishes the innovation. Outside his office on the ground floor, reporters' material is slimmed into Teletext bulletins (“An excellent subediting exercise,” he says. “The contents have to have exactly the right wordage to fill a line across the screen.”) On the next floor up, the same data is repackaged for listings and, with extra content, for breaking-news sections on websites, including the Guardian's. On the top floor it gets reprocessed again for sport.
Another section turns it into mini-bulletins for mobiles, text-only or with pictures. There are initiatives to expand it into digital TV, with a studio just opened and a specialist journalists' training course starting next month. Although PA has always been, and remains, modestly anonymous, its Howden super-office is starting to publish on a scale most editors must envy.
Touring the main building, Watson points out a wall pinned with national and international news pages from British local newspapers. Copy has always been provided for these by PA but now staff at Howden offer story choice and complete page layout too. A couple of those magazines dished out by rail companies are produced here with advertising and printing subcontracted to regional newspaper customers of PA. A canny use of partnerships has been part of the agency's growth. The editorial centre grew out of joint working with now vanished Westminster Press. PA Weather, which now sells its meteorology to road-gritting departments as well as the media, has just taken over the other, Dutch half of the joint operation.
Howden is now full up, says Watson, whose colleague Chris Buckley, managing director of PA Sport, takes over half the middle floor on Saturdays, when football needs 70 extra staff and the listings terminals are briefly unoccupied. There has been criticism about PA pay rates – this month the National Union of Journalists published a survey showing levels as low as £12,000 a year at Howden. But the size of the operation is buoying the flagging local economy, and vacancies are quickly filled.
And now there is India. By November, 50 staff will be backing up the Yorkshire operation in offices in Mangalore, on the south-west coast of India, which are also designed to be a jumping off point for further news and sport packaging overseas. “There's tremendous interest in British sport in Asia,” says Watson, describing automated systems in Howden which text or email results, as they happen, in Cantonese, Thai, Mandarin and many other languages. “But there's also a growing number of fixtures locally, which we can handle either for other markets or for the countries involved.”
Two recent deals see PA distributing German sports results in Germany and – from this autumn – selling South African premier league reports and results within South Africa. Mr Buckley says: “They're holding the World Cup there in five years' time and Fifa has recommended the data-processing system as a model for the rest of Africa.”
After the R101 tragedy in 1930, there was gloom in Howden when glamorous airship designers stopped coming from London. Today, the “Howden Flyer”, a direct, two-hour train service from London which stops at the town six times a day to drop off largely PA clients, is only going to get busier.
Dwight Hines posts an interesting opportunity to the IRE listserv:
“I am going to participate as an internet journalist in IBM's Project Serrano Beta program. If you read the material below, you will see that the beauty, or the absolute brute force ability of the system being developed by IBM is the capacity to search lots of data bases and integrate the information. It seems to me that this is ideal for those involved in investigative reporting at global or local levels, or criminal justice issues, who need lots of flexibility and crank power to draw information from all over. If you are interested in participating in the Beta program, please contact me. You will be able to define the system that you need, working with the IBM folks and other journalists. Obviously, the more different people and different media organizations participating, the more power the system will have. I don't think antitrust issues or intellectual property rights will be an issue until the system is working, but those are just two areas that will become important, along with differences in laws in different countries. This ain't gonna be your Gramma's google.

Dwight Hines, I do not work for IBM nor do I take goodies from them in any way.

Project Serrano Beta Programs: Enterprise search and Data modeling and integration design

Project Serrano extends WebSphere(r) Information Integration with enhanced search and data modeling and integration design. It expands the source accessibility, functionality, performance, and localization of already robust information integration technologies, to help customers manage their growing information requirements in both structured and unstructured domains. Project Serrano Beta includes two programs:

Rational(r) Data Architect will combine traditional data modeling capabilities with metadata discovery, mapping, and analysis, all organized by a modular project-based structure.

WebSphere Information Integration (II) OmniFind Edition finds information stored across the enterprise in file systems, content archives, databases, collaboration systems, and applications.
http://www-306.ibm.com/software/data/integration/beta.html

==================

WebSphere Information Integrator OmniFind Edition
http://www-306.ibm.com/software/data/integration/db2ii/editions_womnifind.html

Features and benefits

Key search features include:
• search results with sub-second response from enterprise content such as intranets, extranets, corporate public websites, relational database systems, file systems, and content repositories.
• supported sources such as HTTP/HTTPS, news groups (NNTP), file systems, Domino(r) databases, Microsoft(r) Exchange public folders, DB2(r) Content Manager, DB2 Universal Database™ (DB2 UDB), DB2 UDB for z/OS(r), Informix(r), and Oracle databases. Documentum and FileNet support is provided through WebSphere(r) II Content Edition.
• state-of-the art relevancy algorithms for corporate content.

The new OmniFind Edition provides numerous technology and business benefits. It:
• scales to millions of documents and thousands of users
• fits easily into enterprise Java™ applications with appropriate security so that confidential information is not exposed
• eases administration for quick set up
• utilizes background analysis to minimize administrator tasks required to get high quality search results
• provides highly relevant search results and the framework for richer text analysis
• includes a seamless upgrade to WebSphere II OmniFind for WebSphere Portal customers who can leverage existing taxonomies for navigation and categorization, migrate rules for rule-based classification, and surface the same user experience through the WebSphere Portal Search Center”
From the Librarians' Index to the Internet….
GISc Resources for Hurricane Katrina
http://ucgis.org/Katrina/
http://lii.org?recs=027428
Subjects:
* Geographic information systems
* Emergency management
* Hurricane Katrina, 2005
Friend Steve Guerin sends this from Santa Fe….
The Disaster Dynamics Project at UCAR looks timely: http://swiki.ucar.edu/dd/2
Check out the Hurricane Landfall game: http://swiki.ucar.edu/dd/71
The Hurricane Landfall Disaster Dynamics Game is a four-player virtual strategy game about the interaction between natural disasters and urban planning. The game is computerized; it plays like a traditional physical boardgame, but there are simulation components that require significant computation. The game's architecture is client-server, with each player having her own computer.
Individual machines allow moves to be made in parallel and enable players to access private representations of the game state in addition to the public representation. The server is typically run on the instructor's computer, and will also provide facilitation tools.
This seems to be the best tool we've seen to track individuals who may be unaccounted for following Katrina.
Lycos: Katrina Missing Persons Site http://www.lycos.com/katrina/
With multiple small databases of survivors, we desperately needed one search engine that would search through all of them, and Lycos created one. The site lists all the databases it searches through. If you're aware of others, please fill out Lycos' form to add them.
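The idea of one search box over many small databases can be sketched in a few lines. This is only an illustration of the general approach, not a description of how Lycos built its site: the function name search_all, the source names, and the records are all invented placeholders.

```python
def search_all(databases, name):
    """Look a name up in every source and merge the hits.

    Each hit is tagged with the source it came from so a reader can follow
    it back to the original list. Matching is a simple case-insensitive
    substring test, which is an assumption made for this sketch.
    """
    needle = name.strip().lower()
    hits = []
    for source, records in databases.items():
        for record in records:
            if needle in record["name"].lower():
                hits.append({"source": source, **record})
    return hits


# Invented placeholder data standing in for separate survivor lists.
databases = {
    "shelter_list_a": [{"name": "Jane Doe", "status": "safe"}],
    "registry_b": [
        {"name": "John Roe", "status": "unknown"},
        {"name": "jane doe", "status": "safe"},
    ],
}

results = search_all(databases, "Jane Doe")
for hit in results:
    print(hit["source"], hit["name"], hit["status"])
```

A real service would also have to cope with misspellings, duplicate entries and sources that change by the hour, which this sketch ignores.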