A UX-Related Hierarchy from Folksonomy

Creating structure from a user-generated tag cloud
Research
Author

Paul Matthews

Published

June 19, 2020

Background

This work is part of a research and development collaboration with Glean.ly, a new UX research repository designed to solve the problem of storing user research for large companies.

We are interested in how people tag UX-related information, so we analysed tagging on UX Stack Exchange, a social Q&A community for UX issues.

Algorithm

Several published algorithms can create a hierarchy from a folksonomy tag network. Here, I used the UX Stack Exchange tags and user posts, which are available in the Stack Exchange data dump.
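As a rough illustration of the input, the tag list of each question can be pulled out of the dump before any graph is built. The sketch below is in Python rather than the R used for this work, and it rests on assumptions: that the dump's Posts.xml holds one `row` element per post, that questions carry `PostTypeId="1"`, and that the `Tags` attribute is written either as `<a><b>` or as `|a|b|`. The function names `parse_tags` and `read_post_tags` are hypothetical.

```python
import io
import re
import xml.etree.ElementTree as ET


def parse_tags(raw):
    """Split a Tags attribute such as '<css><forms>' or '|css|forms|' into a list."""
    return [t for t in re.split(r"[<>|]+", raw or "") if t]


def read_post_tags(source):
    """Yield one tag list per question row in a Posts.xml-style file or file object."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        # Assumption: questions are the rows with PostTypeId="1".
        if elem.tag == "row" and elem.get("PostTypeId") == "1":
            tags = parse_tags(elem.get("Tags"))
            if tags:
                yield tags
        elem.clear()  # free the element's contents once it has been read


# Tiny invented sample standing in for a real Posts.xml file.
sample = io.BytesIO(
    b'<posts>'
    b'<row Id="1" PostTypeId="1" Tags="&lt;css&gt;&lt;forms&gt;" />'
    b'<row Id="2" PostTypeId="2" />'
    b'<row Id="3" PostTypeId="1" Tags="|time|dates|" />'
    b'</posts>'
)
tag_lists = list(read_post_tags(sample))
```

Passing a file path instead of the in-memory sample should work the same way, since `iterparse` accepts either.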

The original work on deriving a hierarchy from a folksonomy was done by Heymann & Garcia-Molina (2006), who used a similarity measure between tags (based on co-occurrence on objects) to generate hierarchies for Delicious and Flickr tags. I used an alternative approach from Tibély et al. (2013), who reported slightly better results. The main steps are:

  1. Generate a directed graph from tag co-occurrences on the data objects (in this case SE posts). The graph edges are duplicated to go in both directions;
  2. Apply a global pruning process based on a threshold of the total local weight of incoming links on a node (those with over 0.3 of the total are retained);
  3. Calculate z-scores for the remaining edges and use these to select the best candidate parent for each node (the highest z-score indicating the node with which another node is most often paired);
  4. Select the global root from the remaining root nodes and then reattach the others.
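The four steps above can be sketched in Python on toy data. This is not the R code used for this work, and several details are assumptions made for illustration: the pruning rule is read as "keep an edge if it carries more than 0.3 of the target node's total incoming weight"; the z-score uses a hypergeometric null model for co-occurrence counts; a candidate parent must be strictly more frequent than its child (a simple stand-in for whatever ranking the published algorithm uses); and the global root is taken to be the most frequent remaining root. All function names and the example posts are invented.

```python
from collections import Counter
from itertools import permutations
from math import sqrt


def cooccurrence_edges(posts):
    """Step 1: count tags and directed co-occurrence edges (both directions)."""
    tag_counts, weights = Counter(), Counter()
    for tags in posts:
        unique = sorted(set(tags))
        tag_counts.update(unique)
        weights.update(permutations(unique, 2))  # yields (a, b) and (b, a)
    return tag_counts, weights


def prune_incoming(weights, threshold=0.3):
    """Step 2 (assumed reading): keep a -> b if it exceeds threshold * b's incoming total."""
    incoming = Counter()
    for (_a, b), w in weights.items():
        incoming[b] += w
    return {(a, b): w for (a, b), w in weights.items() if w > threshold * incoming[b]}


def z_score(w, n_a, n_b, n_posts):
    """Observed co-occurrence versus a hypergeometric expectation (assumed null model)."""
    if n_posts < 2:
        return 0.0
    mean = n_a * n_b / n_posts
    var = mean * (1 - n_a / n_posts) * (n_posts - n_b) / (n_posts - 1)
    return (w - mean) / sqrt(var) if var > 0 else 0.0


def build_hierarchy(posts, threshold=0.3):
    """Steps 3 and 4: pick the best parent per tag, then attach spare roots."""
    tag_counts, weights = cooccurrence_edges(posts)
    kept = prune_incoming(weights, threshold)
    best = {}  # child -> (z, parent)
    for (parent, child), w in kept.items():
        # Assumption: only a strictly more frequent tag may act as parent.
        if tag_counts[parent] <= tag_counts[child]:
            continue
        z = z_score(w, tag_counts[parent], tag_counts[child], len(posts))
        if child not in best or z > best[child][0]:
            best[child] = (z, parent)
    parents = {child: parent for child, (_z, parent) in best.items()}
    # Assumption: the most frequent parentless tag becomes the global root.
    roots = [t for t in tag_counts if t not in parents]
    root = max(roots, key=lambda t: (tag_counts[t], t))
    for t in roots:
        if t != root:
            parents[t] = root
    return parents, root


# Invented toy posts, not taken from the UX Stack Exchange data.
posts = [
    ["design", "css"], ["design", "css"], ["design", "css", "flexbox"],
    ["design", "color"], ["design", "color"], ["css", "flexbox"],
    ["design"], ["forms"],
]
parents, root = build_hierarchy(posts)
```

On this toy input, `flexbox` co-occurs with both `design` and `css`, but its z-score with `css` is higher, so `css` is chosen as its parent even though `design` is the more frequent tag.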

Results

The resulting hierarchy looks like this:

                 levelName weight
1  website-design              NA
2  administration              25
3  advertisement              256
4  ajax                        64
5  background                  25
6  branding                    25
7  browser                    400
8  contact-us                  64
9  conversion                 400
10 conversion-rate            361
11 css                       2500
12 dates                      256
13 datetimepicker             289
14 time                       529
15 time-zones                  25
16 design-process             225
17 product-management          25
18 files                       36
19 flat-design                 49
20 ... 40 nodes w/ 391 sub     NA

Visualisation

Here is a PDF of the full hierarchy

The hierarchy can be visualised as a radial tree (pruned here to show only tags with 200 or more occurrences).
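One way such an occurrence-based pruning could work is sketched below in Python. It assumes the hierarchy is stored as a child-to-parent mapping and that a tag falling below the threshold is dropped together with its whole subtree; how descendants of dropped tags were actually handled for the figure is not stated here, so that choice is an assumption. The function name `prune_by_count` and the example counts are invented.

```python
def prune_by_count(parents, counts, min_count=200):
    """Keep a tag only if it and every ancestor occur at least min_count times."""
    def keep(tag):
        # Walk up to the root; the root has no entry in `parents`.
        while tag is not None:
            if counts.get(tag, 0) < min_count:
                return False
            tag = parents.get(tag)
        return True

    return {child: parent for child, parent in parents.items() if keep(child)}


# Invented toy hierarchy and counts, not the real UX Stack Exchange figures.
toy_parents = {
    "css": "website-design",
    "flexbox": "css",
    "grid": "flexbox",
    "forms": "website-design",
}
toy_counts = {"website-design": 900, "css": 500, "flexbox": 40, "grid": 250, "forms": 300}
pruned = prune_by_count(toy_parents, toy_counts)
```

Here `grid` is dropped despite having 250 occurrences, because its parent `flexbox` falls below the threshold and the sketch removes whole subtrees.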

The resulting taxonomy needs improvement, but it can be seen as a useful first step, based as it is on real user tagging activity. Interesting further research questions are how much it has been shaped by:

  1. Tag autosuggest, with users perhaps more likely to choose the first suggestions;
  2. Tag popularity: there could be a rich-get-richer effect where users choose popular tags in order to increase the visibility of their posts.

The R code used is available here. It should work for any other SE site, though it might struggle with larger sites, as the code is not yet optimised.

Attribution

Content created by the UX Stack Exchange community, licensed under Creative Commons Attribution-ShareAlike 4.0 International (CC BY-SA 4.0).

References

Heymann, P., & Garcia-Molina, H. (2006). Collaborative Creation of Communal Hierarchical Taxonomies in Social Tagging Systems. Stanford InfoLab Technical Report. http://heymann.stanford.edu/taghierarchy.html

Tibély, G., Pollner, P., Vicsek, T., & Palla, G. (2013). Extracting tag hierarchies. PLoS ONE, 8(12), 1–46. https://doi.org/10.1371/journal.pone.0084133