How big data can result in bad data

A couple of years back, ratings agency Standard & Poor’s downgraded American debt. Not because of the state of the economy, but because of an error in its original calculations: a mere US$2.1 trillion.

Nate Silver, the poster child for analytic predictions, told a recent conference that the financial crisis was as much about bad modelling as greed. The ratings agencies, he said, based their assumptions on past mortgages, not the number of people who’d default after the over-exuberance of banks in giving loans to anybody and everybody.

Welcome to the world of bad data, something that’s caught on even in Australia. GS1, the agency responsible for barcoding and other product identification systems, recently released a report on the impact of bad data on the grocery industry. It found bad data costs Australian grocery retailers $350 million, and that 65 percent of ‘data misalignment’ problems led to lost sales.

Bad data mostly falls into two categories. The first is the ‘rubbish in, rubbish out’ phenomenon: starting with the wrong dataset because of false records, or simply looking at the wrong information (for example, your customers’ incomes, which aren’t really related to their ages and may even corrupt the analysis). “It’s crucial big data efforts only consider the right set of data points and that they’re clean,” is how David Bernstein, VP of eQuest’s big data division, puts it.

In one example Jonah Lehrer describes in his book How We Decide, early test audiences gave the hit show Seinfeld a thumbs down using the feedback dial employed by focus groups, while copycat shows of the 1990s smash Friends scored exceedingly well even though they later failed. Researchers soon discovered viewers were rating shows on familiarity, and had to adjust their questioning approach accordingly.

In fields like health or the military, bad inputs can be tragic. In 2010 a US drone crew spotted a group of Afghan villagers and attacked, killing all 23. Concerned with protecting US troops nearby, the operators were so flooded with incoming intelligence that they overlooked the fact that the gathering contained women and children, and was almost certainly civilian.

We even have bad data in fiction. In Arthur C Clarke’s 2001: A Space Odyssey (and Stanley Kubrick’s film), the ship’s AI, HAL 9000, is programmed with human-like responses. So when secret orders it can’t reveal send it into a feedback loop of paranoia, it solves the logic problem by killing the crew.

The second and more common type of bad data is poor interpretation, often because of a lack of context. LatentView Analytics CEO Venkat Viswanathan uses the analogy of noise-cancelling headphones to illustrate that it’s all about the threshold demanded by the circumstances. “Maybe you can hear glass breaking outside while the music’s playing,” he says. “Maybe it’s important, like when there are small children under your supervision, but not if you’re in a restaurant where breaking glass isn’t a big deal.”

BloomReach head of marketing Joelle Kaufman says data interpretation is all about intuition and control. “Machine-generated data can be useful, but without the human element of intuition and control it can be downright offensive. We’ve all been the victim of poorly targeted data-driven marketing – I recently purchased a dryer and have been stalked by dryers on the web ever since. I only need one.”

Omer Trajman, VP of field operations at big data applications company WibiData, agrees the human element is what matters most, pointing out that when there’s a bad reading, the blame lies with us, not the data. “There’s no such thing as bad data,” he says, “just bad models. Data analysts just need to extract the right data at the right time, then act on the insight in a timely manner.”

Some industries seem to be pausing for thought, and some might even have been burned. Kaufman cites a Duke University business school study which found the share of marketing projects using analytics to drive decisions decreased from 37 percent in February 2012 to 30 percent in February this year.

But eQuest’s Bernstein believes the smartest executives and ICT managers are learning that the resources to interpret data are as important as the mass of data itself. “As famed software architect Grady Booch said, ‘a fool with a tool is still a fool’,” he says. “Big data isn’t a be-all and end-all you just switch on and wait for amazing answers. Data itself – no matter how big – has little value unless it can drive quicker, more effective decisions.”
