What do unstructured data and Santa Claus have in common?

Santa Claus as a huge balloon

Image courtesy of Bart Fields

There is no such thing as unstructured data. There, I said it.

Structure is inherent in the definition of data. No structure means no information means no data. Like “clearly misunderstood,” “unstructured data” is an oxymoron.

Some have proposed “semi-structured data” to overcome this logical issue, but this alternative is no less discriminatory. Whatever part of the data lacks structure has no informational content and is thus not data. Contradiction persists.

“Big data” generally refers to data whose large size or structure or rate of change or complexity contributes to the difficulty of working with it in some way. “Unstructured data” refers to the subset of big data for which structure is part of the problem. These phrases are subjective and point to shortcomings in tools or users rather than something inherent about the data itself. They are highly context dependent and, as such, are often misused and misunderstood.

Which causes miscommunication. Unstructured communication, if you will.

Please regard this as a cease and desist letter for the use of the phrase “unstructured data” unless in a defamatory or humorous context (warning: it’s not that funny, either). We have plenty of words to describe specific instances of The Data Formerly Known as Unstructured: “text,” “non-relational,” “the Web.”

If you absolutely need a term, perhaps “differently structured data” would be somewhat more palatable to my fellow pedants. Another possibility, “multi-structured data,” is beginning to gain some momentum. Just keep the context crystal clear and we won’t come looking for you.

P.S. Apologies to any children whose Yuletide dreams were just crushed.

Quantrepreneurs

Image courtesy of Argonne National Laboratory

Google will learn just a tiny bit more about me (and you, the reader) from this post, enabling the search engine giant to (probabilistically) increase its bottom line through better targeted advertisements. You’re welcome, Google.

A hugely transformative data revolution is upon us. Machine-readable information is being generated and captured at an astounding and rapidly accelerating rate. At the same time, a ballooning army of alchemists and applications attempts to transform it into value of various forms. Which results in even more data. Kaboom.

On the data generation side are technologies that digitize and enable the creation of new digital information: the vast human-generated data factories of the internet, increasingly sophisticated devices in our homes and handbags, and sensors galore in places most people don’t imagine.

On the value generation side are data scientists (also in places most people don’t imagine), enthusiasts, knowledge workers, and an ever expanding array of hardware and software.

At the center of the data revolution are quantrepreneurs, who innovate new ways to generate, capture, transform, and wring value from data, linking the two sides and propelling the revolution forward, evolving the information age into the age of actionable insight. This is their story.


Telling the full story of this revolution necessitates touching on many topics: data (of course), science, technology, engineering, math, business, innovation, entrepreneurship, privacy/transparency and the law, design, storytelling, statistics, and more. Content will be wide ranging and will include case studies, opinions, thought experiments, predictions, rants, musings, and practical advice.

Non-specialists are welcome. In fact, this blog is really for you. Despite the recent explosion of online chatter about the data revolution, many sources still make it seem like magic. It’s not magic. Data Bitten aims to provide a voice that is significantly underrepresented in the conversation by taking a first-principles approach to the data revolution. We’ll demystify and debunk. We’ll be skeptical. We’ll expose the magic for what it is: simply good science or iterative engineering or a smart idea. The goal is to make this world accessible to a wider audience and, through a focus on the practical, to enable and encourage greater understanding and participation.

Are you ready to get bitten by the data bug?

