    levelName                weight
 1  website-design           NA
 2  administration           25
 3  advertisement            256
 4  ajax                     64
 5  background               25
 6  branding                 25
 7  browser                  400
 8  contact-us               64
 9  conversion               400
10  conversion-rate          361
11  css                      2500
12  dates                    256
13  datetimepicker           289
14  time                     529
15  time-zones               25
16  design-process           225
17  product-management       25
18  files                    36
19  flat-design              49
20  ... 40 nodes w/ 391 sub  NA
Background
This work is part of a research and development collaboration with Glean.ly, a new UX research repository designed to solve the problem of storing user research for large companies.
We are interested in how people tag UX-related information, so we analysed UX Stack Exchange, a social Q&A community on UX issues.
Algorithm
Several published algorithms create a hierarchy from a folksonomy tag network. Here, I used the UX Stack Exchange tags and user posts, which are available in the Stack Exchange data dump.
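As a rough illustration of the input, here is a minimal Python sketch (not the R code used for this work) that turns per-post tag strings into tag lists and counts. It assumes each post carries its tags as an angle-bracket-delimited string such as <css><ajax>, which is how older data dumps store them; other dump versions may use a different delimiter. The sample rows and the names parse_tags and sample_posts are hypothetical.

```python
import re
from collections import Counter

def parse_tags(raw):
    """Split a tag string such as '<css><ajax>' into ['css', 'ajax'].

    Assumes angle-bracket delimiters; adjust the pattern for other formats.
    """
    return re.findall(r"<([^<>]+)>", raw)

# Hypothetical tag strings standing in for real posts from the dump.
sample_posts = ["<css><browser>", "<css><ajax>", "<time><time-zones><dates>"]

# One list of tags per post, plus how many posts use each tag.
tag_sets = [parse_tags(raw) for raw in sample_posts]
tag_counts = Counter(tag for tags in tag_sets for tag in tags)
```

The tag_sets list (one list of tags per post) is the only input the hierarchy steps need.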
The original work on hierarchy-from-folksonomy was done by Heymann & Garcia-Molina (2006), who used a similarity measure between tags (based on co-occurrence on objects) to generate hierarchies for Delicious and Flickr tags. I used an alternative approach from Tibély et al. (2013), which reported slightly better results. The main steps are:
- Generate a directed graph from tag co-occurrences on the data objects (in this case SE posts). The graph edges are duplicated to go in both directions;
- Apply a global pruning process based on a threshold of the total local weight of incoming links on a node (those with over 0.3 of the total are retained);
- Calculate z-scores for the remaining edges and use these to select the best candidate parent for each node (the highest z-score indicating the node with which another node is most often paired);
- Select the global root from the remaining root nodes and then reattach the others.
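The first three steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the R code used for this work: the pruning rule reads the 0.3 threshold as a share of the total incoming weight of the destination node, and the z-score compares the observed co-occurrence count with a hypergeometric null model in which tags are assigned to posts at random. Neither detail is spelled out here, and all function names are hypothetical. Root selection and reattachment (the last step) are left out, and the sketch does not guard against cycles.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

def cooccurrence(tag_sets):
    """Count posts per tag and posts per unordered tag pair."""
    tag_counts, pair_counts = Counter(), Counter()
    for tags in tag_sets:
        unique = sorted(set(tags))
        tag_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    return tag_counts, pair_counts

def directed_edges(pair_counts):
    """Duplicate each undirected pair into two directed edges (src, dst)."""
    edges = {}
    for (a, b), weight in pair_counts.items():
        edges[(a, b)] = weight
        edges[(b, a)] = weight
    return edges

def prune_incoming(edges, threshold=0.3):
    """Keep an edge only if it carries more than `threshold` of the total
    incoming weight of its destination (one reading of the pruning step)."""
    incoming_total = Counter()
    for (_, dst), weight in edges.items():
        incoming_total[dst] += weight
    return {(src, dst): w for (src, dst), w in edges.items()
            if w / incoming_total[dst] > threshold}

def z_score(w, n_a, n_b, n_posts):
    """z-score of w co-occurrences under a hypergeometric null (assumption)."""
    mean = n_a * n_b / n_posts
    var = (n_a * n_b * (n_posts - n_a) * (n_posts - n_b)
           / (n_posts ** 2 * (n_posts - 1)))
    return (w - mean) / sqrt(var) if var > 0 else 0.0

def candidate_parents(tag_sets, threshold=0.3):
    """Map each tag to the source of its highest-z retained incoming edge."""
    tag_counts, pair_counts = cooccurrence(tag_sets)
    kept = prune_incoming(directed_edges(pair_counts), threshold)
    n_posts = len(tag_sets)
    best = {}
    for (src, dst), w in kept.items():
        z = z_score(w, tag_counts[src], tag_counts[dst], n_posts)
        if dst not in best or z > best[dst][1]:
            best[dst] = (src, z)
    return {child: parent for child, (parent, _) in best.items()}

# Hypothetical toy posts: two general tags, each paired with four specific ones.
toy_posts = [
    ["css", "browser"], ["css", "ajax"],
    ["css", "background"], ["css", "flat-design"],
    ["conversion", "conversion-rate"], ["conversion", "advertisement"],
    ["conversion", "branding"], ["conversion", "contact-us"],
]
toy_parents = candidate_parents(toy_posts)
```

In the toy data, each specific tag receives all of its incoming weight from one general tag, so that edge survives pruning, while every edge into css or conversion carries only a quarter of the incoming weight and is dropped. Those two tags end up with no candidate parent, which makes them the remaining root nodes that the final step would have to resolve.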
Results
The resulting hierarchy looks like this:
Visualisation
Here is a PDF of the full hierarchy.
The hierarchy can be visualised as a radial tree (pruned here to show only tags with 200 or more occurrences).
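One way to prune a hierarchy by occurrence count before plotting is sketched below in Python. It assumes the hierarchy is held as a child-to-parent mapping that forms a tree (no cycles) and that a surviving tag whose parent is dropped should be reattached to its nearest surviving ancestor; the radial tree here may have been pruned differently. The function name, the mapping and the counts are hypothetical.

```python
def prune_by_count(parents, counts, min_count=200):
    """Drop tags used fewer than `min_count` times and reattach each
    surviving tag to its nearest surviving ancestor, if it has one."""
    kept = {tag for tag, count in counts.items() if count >= min_count}
    pruned = {}
    for tag in kept:
        ancestor = parents.get(tag)
        # Climb past dropped ancestors until a kept one (or the top) is reached.
        while ancestor is not None and ancestor not in kept:
            ancestor = parents.get(ancestor)
        if ancestor is not None:
            pruned[tag] = ancestor
    return pruned

# Hypothetical example: 'a' and 'c' fall below the threshold, so 'b' moves up.
example_parents = {"a": "root", "b": "a", "c": "root"}
example_counts = {"root": 1000, "a": 150, "b": 300, "c": 50}
example_pruned = prune_by_count(example_parents, example_counts)
```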
The resulting taxonomy needs improvement, but it can be seen as a useful first step, based as it is on real user tagging activity. Interesting further research questions are how much it has been shaped by:
- Tag autosuggest, with users perhaps more likely to choose the first suggestions;
- Tag popularity - there could be a rich-get-richer effect where users want to use popular tags in order to increase the visibility of their posts.
The R code used is available here. It should work for any other SE site, though it might struggle with larger sites, as the code is not yet optimised.
Attribution
Content created by the UX Stack Exchange community, licensed under Creative Commons Attribution-ShareAlike 4.0 International.
References
Heymann, P., & Garcia-Molina, H. (2006). Collaborative Creation of Communal Hierarchical Taxonomies in Social Tagging Systems. Stanford InfoLab Technical Report. http://heymann.stanford.edu/taghierarchy.html
Tibély, G., Pollner, P., Vicsek, T., & Palla, G. (2013). Extracting tag hierarchies. PLoS ONE, 8(12), 1–46. https://doi.org/10.1371/journal.pone.0084133