Use Word Clouds Cautiously
By Robert Relihan, Senior Vice President
Those of us who face the daunting task of sifting through mounds of verbal data -- forty individual interviews or a three-week online community -- synthesizing all of those words, and presenting our analysis clearly, succinctly, AND with impact have been intrigued by word clouds. They seem simple and elegant; they reduce all of those words to a picture of the important themes. Now, there are numerous web sites to help us create word clouds, Wordle being only the best known.
So, why do I feel vaguely dissatisfied when I look at word clouds, even the ones I have created? They often seem to miss the point or be overly simplistic. Creating a word cloud sometimes feels like I have cheated myself and my audience.
Jacob Harris of The New York Times makes the case against word clouds and, in the process, gives a brief primer on how to report data. It is an incredibly useful article.
He begins with a concept that should become a mantra for market research professionals. The critique of word clouds is based on the principles of data journalism. When many of us began our careers, the model for reporting was the academic paper. A long time ago, I wrote reports with footnotes. Now we strive for clarity and simplicity. The magazine or newspaper (online versions, of course) is our guide. Data journalism should be our art.
"Visualization is reporting, with many of the same elements that would make a traditional story effective: a narrative that pares away extraneous information to find a story in the data; context to help the reader understand the basics of the subject; interviewing the data to find its flaws and be sure of our conclusions. Prettiness is a bonus; if it obliterates the ability to read the story of the visualization, it's not worth adding some wild new style or strange interface."
The ways Harris shows word clouds going wrong provide us with a road map for good reporting or, rather, good data journalism.
- Word clouds are based on a very rudimentary textual analysis. In most cases, a phrase-level or a thematic analysis would provide richer and more penetrating insight. The general lesson of this observation is that we need to focus on the concepts that knit together the words consumers use, and not on the words themselves.
- Word clouds are often used when textual analysis is not the appropriate tool. As Harris says, in our analysis we should not confuse "signifiers with what they signify." We need to use the appropriate methods for getting below the surface of consumers' comments. Simply digesting their words will not do that.
- Word clouds have a dirty secret. They really aren't analysis. They leave readers with the task of peering at the image and discerning the meaning themselves. Word clouds make the assumption that the meaning is obvious. But any analysis worth its salt requires some explanation; it requires framing and focus.
- Finally, word clouds miss the narrative. I am not saying what we write or present should be long, dreary marches through the data. Hardly. We need to find the thread or threads that bring fresh insight to a particular area of consumer behavior. There are often several reasons why two words might dominate a word cloud. We need to create the story for the reader or listener that makes just one of these reasons the most compelling and the most relevant. And, incidentally, that narrative still can be visual.
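To make the first point concrete, here is a minimal sketch in Python of the difference between the single-word counts that typically drive a word cloud and a simple phrase-level count. The comments, the stopword list, and the function names are all hypothetical, invented for illustration; this is not how Wordle or any particular tool works, and a real thematic analysis would go well beyond counting adjacent word pairs.

```python
import re
from collections import Counter

# Hypothetical stopword list; word cloud tools usually drop small words,
# including "not", before counting.
STOPWORDS = {"the", "a", "is", "it", "i", "to", "and", "but", "was", "not"}

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def word_counts(comments):
    """Count single words after removing stopwords (the word cloud view)."""
    counts = Counter()
    for comment in comments:
        counts.update(w for w in tokenize(comment) if w not in STOPWORDS)
    return counts

def phrase_counts(comments, n=2):
    """Count runs of n adjacent words, keeping the small words in place."""
    counts = Counter()
    for comment in comments:
        tokens = tokenize(comment)
        counts.update(
            " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
        )
    return counts

# Invented consumer comments, for illustration only.
comments = [
    "The price is not fair",
    "Not fair, the price went up again",
    "A fair price and good service",
]

print(word_counts(comments).most_common(2))
print(phrase_counts(comments)["not fair"], phrase_counts(comments)["fair price"])
```

In this toy data, "fair" and "price" would dominate a word cloud equally, yet the phrase counts show that "not fair" appears more often than "fair price." The two big words alone cannot tell the reader which story is the right one.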
A word cloud tries to make us believe its immediacy produces insight when, in fact, it may mask the narrative we "data journalists" should be creating.