Tag Clouds For Visualizing Text Mining Results
Tag clouds have, seemingly overnight, become a fixture of the internet landscape. They were initially conceived as the online equivalent of a content dump: buzzwords are put in some kind of logical, attractive presentation to give a quick overview of what a company, conference, community or blog is all about.
I’m finding it interesting to consider exactly how much information can be presented via this new visualization. In the example below, the terms are the countries of the world, and the font size indicates each country’s population.

The color (brown and blue) is used only for appearance, to visually separate the countries (a pity, from an information perspective).
So let’s list some information possibilities:
• Color and color scale (e.g. from light to dark)
• Size (either of the word or the space on the cloud)
• Proximity and Geographical relations
The first two qualities give information about a term relative to all the other terms. The third can show the relationship between individual terms. This invites some very exciting possibilities when combined with text mining.
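To make the first two channels concrete, here is a minimal sketch of how two numbers per term could be turned into a font size and a light-to-dark shade. The function names (`scale`, `cloud_styles`), the point sizes, the gray levels and the example values are all assumptions made up for illustration, not taken from any particular tag cloud tool.

```python
def scale(value, lo, hi, out_lo, out_hi):
    """Linearly map value from the range [lo, hi] to [out_lo, out_hi]."""
    if hi == lo:
        # All terms share one value, so use the middle of the output range.
        return (out_lo + out_hi) / 2
    return out_lo + (value - lo) * (out_hi - out_lo) / (hi - lo)


def cloud_styles(terms, min_pt=10, max_pt=40):
    """Map {term: (size_value, shade_value)} to {term: (font_pt, hex_gray)}.

    min_pt and max_pt are arbitrary illustrative font sizes.
    """
    size_values = [s for s, _ in terms.values()]
    shade_values = [c for _, c in terms.values()]
    styles = {}
    for term, (s, c) in terms.items():
        pt = round(scale(s, min(size_values), max(size_values), min_pt, max_pt))
        # A higher shade value gives a darker gray (lower channel level).
        level = round(scale(c, min(shade_values), max(shade_values), 220, 30))
        styles[term] = (pt, "#{0:02x}{0:02x}{0:02x}".format(level))
    return styles


# Hypothetical terms with made-up values: (size value, shade value).
example_terms = {"alpha": (100, 0.0), "beta": (50, 1.0), "gamma": (0, 0.5)}
styles = cloud_styles(example_terms)
for term, (pt, color) in styles.items():
    print(term, pt, color)
```

Both channels here are relative: each term is styled by where it falls between the smallest and largest value in the cloud, which matches the idea that size and color describe a term in relation to all the others.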
Text mining goes beyond simple keyword recognition to give insights as to how terms are related in a text body. It can also give ‘weights’ to terms that are more ‘important’ (for example, terms that have more value in distinguishing between document types).
For instance, the following example…

In this graph, the size of the block indicates how often a term occurs in the articles of ‘Smart Business’. The color indicates how strongly the term is weighted. For example, “+ document” (the plus indicates that ‘stemming’ has occurred, so multiple forms of the same word are counted together) occurs relatively frequently, and its high weighting shows that it also has high categorization value (e.g. it occurs in relatively few separate documents).
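The two numbers behind such a graph can be sketched in a few lines. This is only an illustration under assumptions: the weight below is a simple inverse-document-frequency score, which I am using as a stand-in because it rewards terms that occur in few separate documents; the weighting actually used by the text mining tool is not specified here. The function name `term_stats` and the tiny token lists are made up, and stemming is assumed to have happened already.

```python
import math
from collections import Counter


def term_stats(documents):
    """Return {term: (total_count, weight)} for a list of tokenized documents.

    total_count is how often the term occurs overall (block size).
    weight is log(N / number of documents containing the term), an
    assumed stand-in for the tool's weighting (block color).
    """
    total = Counter()
    doc_freq = Counter()
    for tokens in documents:
        total.update(tokens)          # every occurrence counts
        doc_freq.update(set(tokens))  # each document counts once per term
    n_docs = len(documents)
    return {t: (total[t], math.log(n_docs / doc_freq[t])) for t in total}


# Made-up, already-stemmed token lists standing in for four articles.
docs = [
    ["document", "document", "document", "business"],
    ["business", "market"],
    ["business", "model"],
    ["business", "market"],
]
stats = term_stats(docs)
for term, (count, weight) in sorted(stats.items()):
    print(term, count, round(weight, 3))
```

In this toy data, "document" is frequent but concentrated in a single article, so it gets both a large count and a high weight, while "business" appears in every article and gets a weight of zero despite being the most frequent term.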
I’d be interested in hearing more of your ideas about what makes a good, informative tag cloud.
This entry was posted on Monday, November 30th, 2009 at 7:00 AM and is filed under Market Research, Modeling, Text Mining.