Data clustering is an important problem in statistics: how do we separate a set of values into distinct groups?
In the context of network science, clustering is often based on the nodes' connectivity. However, we might also be interested in other perspectives. For instance, we might want to cut off the top N most influential nodes in a network or in a group. How would we do that?
The most typical approach is called k-means. It takes a set of values and a target number of groups, and divides the values so that the differences inside each group are minimized while the differences between the groups are maximized. Cartesian k-means (or ckmeans) is an implementation of this approach that is available in R and used in many statistical packages.
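To make "differences inside each group" concrete, here is a minimal Python sketch of the quantity that k-means tries to minimize: the sum of squared distances between each value and the mean of its own group. The function name `within_group_ss` and the sample values are our own, chosen purely for illustration; they are not taken from any particular library.

```python
def within_group_ss(groups):
    """Sum of squared deviations of each value from its own group's mean."""
    total = 0.0
    for group in groups:
        mean = sum(group) / len(group)
        total += sum((x - mean) ** 2 for x in group)
    return total


# Illustrative values only: two ways of splitting the same six numbers.
good_split = [[1, 2, 3], [10, 11, 12]]
bad_split = [[1, 2, 10], [3, 11, 12]]

# The tighter grouping scores lower, so k-means would prefer it.
print(within_group_ss(good_split))
print(within_group_ss(bad_split))
```

The lower the score, the more compact the groups are, which for a fixed set of values also means the groups are further apart from one another.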
A one-dimensional counterpart of k-means is the Jenks natural breaks algorithm. We found it has useful applications in network science when there is a need to cut off the top N most influential nodes. This can be achieved by running the Jenks natural breaks algorithm on a set of values (e.g. the betweenness centrality of the top nodes) together with the number of groups we need (e.g. 2: the top N nodes and the rest). The nodes that fall into the highest group are the ones we display in InfraNodus' top influential nodes field.
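The idea can be sketched in a few lines of Python. This is not the code InfraNodus runs; it is a simple natural-breaks-style routine that uses dynamic programming to find the split of sorted one-dimensional values with the smallest total within-group sum of squares. The function name `natural_breaks`, the node labels, and the centrality scores below are all hypothetical.

```python
def natural_breaks(values, n_groups):
    """Split 1-D values into n_groups contiguous groups (after sorting)
    that minimise the total within-group sum of squared deviations."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_groups <= n:
        raise ValueError("n_groups must be between 1 and len(values)")

    # Prefix sums give the cost of any slice data[i:j] in constant time.
    prefix, prefix_sq = [0.0], [0.0]
    for x in data:
        prefix.append(prefix[-1] + x)
        prefix_sq.append(prefix_sq[-1] + x * x)

    def cost(i, j):
        # Sum of squared deviations from the mean for data[i:j], with j > i.
        s = prefix[j] - prefix[i]
        return (prefix_sq[j] - prefix_sq[i]) - s * s / (j - i)

    inf = float("inf")
    # best[k][j]: minimal cost of splitting data[:j] into k groups.
    # cut[k][j]: start index of the last group in that optimal split.
    best = [[inf] * (n + 1) for _ in range(n_groups + 1)]
    cut = [[0] * (n + 1) for _ in range(n_groups + 1)]
    best[0][0] = 0.0
    for k in range(1, n_groups + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = best[k - 1][i] + cost(i, j)
                if candidate < best[k][j]:
                    best[k][j] = candidate
                    cut[k][j] = i

    # Walk the cut table backwards to recover the groups, lowest first.
    groups, j = [], n
    for k in range(n_groups, 0, -1):
        i = cut[k][j]
        groups.append(data[i:j])
        j = i
    groups.reverse()
    return groups


# Hypothetical betweenness centrality scores for seven nodes.
scores = {"a": 0.42, "b": 0.39, "c": 0.35, "d": 0.08,
          "e": 0.05, "f": 0.04, "g": 0.01}

groups = natural_breaks(scores.values(), 2)
threshold = min(groups[-1])  # lowest score that still belongs to the top group
top_nodes = [node for node, score in scores.items() if score >= threshold]
print(top_nodes)
```

Note that we do not choose N in advance: the number of top nodes follows from where the natural break falls in the data. Asking for more than two groups would give finer tiers of influence.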