What do unstructured data and Santa Claus have in common?
There is no such thing as unstructured data. There, I said it.
Structure is inherent in the definition of data. No structure means no information means no data. Like “clearly misunderstood,” “unstructured data” is an oxymoron.
Some have proposed “semi-structured data” to overcome this logical issue, but this alternative is no less discriminatory. Whatever part of the data lacks structure has no informational content and is thus not data. Contradiction persists.
“Big data” generally refers to data whose large size or structure or rate of change or complexity contributes to the difficulty of working with it in some way. “Unstructured data” refers to the subset of big data for which structure is part of the problem. These phrases are subjective and point to shortcomings in tools or users rather than something inherent about the data itself. They are highly context dependent and, as such, are often misused and misunderstood.
Which causes miscommunication. Unstructured communication, if you will.
Please regard this as a cease and desist letter for the use of the phrase “unstructured data” unless in a defamatory or humorous context (warning: it’s not that funny, either). We have plenty of words to describe specific instances of The Data Formally Known as Unstructured: “text,” “non-relational,” “the Web.”
If you absolutely need a term, perhaps “differently structured data” would be somewhat more palatable to my fellow pedants. Another possibility, “multi-structured data,” is beginning to gain some momentum. Just keep the context crystal clear and we won’t come looking for you.
P.S. Apologies to any children whose Yuletide dreams were just crushed.
One Response to “What do unstructured data and Santa Claus have in common?”
wow, I had never really thought about this issue before! But I’m right there with you about people who are not careful about nomenclature.