Again, apologies for the telegraphic language: it seemed better to share these notes quickly than to spend too long on the grammar. I hope it all makes sense (and if I've erred, do correct me in the comments below). Thanks.
A warning: the Pajek team use ‘cluster’ and ‘partition’ interchangeably.
There are two basic approaches to comprehending large networks: statistics and decomposition.
Decomposition of large graphs (or ‘divide and conquer’) was one of the primary design goals of the Pajek software. There are four basic strategies:
Cut-out: extract a part of the network for closer scrutiny.
Context: select a sub-network (as above), but do not remove the remainder; instead, reduce each remaining group to a single vertex.
Reduction: reduce all the more cohesive regions of the network to vertices.
Hierarchy: split the network up into progressively decomposed groups.
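The reduction and context strategies can be sketched in a few lines of Python. This is only an illustration of the idea, not Pajek's own shrinking procedure: the function `shrink`, the `clusterN` labels and the toy network are all assumptions made for the example.

```python
from collections import Counter

def shrink(edges, partition, keep=None):
    """Collapse each cluster to one vertex, except the cluster `keep`.

    `partition` maps vertex -> cluster number. Lines inside a collapsed
    cluster disappear; lines between groups are counted.
    """
    def label(v):
        c = partition[v]
        # Vertices of the kept cluster stay as they are (the 'context' view).
        return v if c == keep else f"cluster{c}"

    reduced = Counter()
    for u, v in edges:
        a, b = label(u), label(v)
        if a != b:
            reduced[tuple(sorted((a, b)))] += 1
    return dict(reduced)

# Hypothetical toy network and partition.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("c", "e"), ("d", "e"), ("e", "f")]
partition = {"a": 1, "b": 1, "c": 1, "d": 2, "e": 2, "f": 3}

reduction = shrink(edges, partition)        # every cluster becomes a vertex
context = shrink(edges, partition, keep=1)  # cluster 1 is kept in full
```

Here `reduction` holds only lines between clusters, while `context` still shows a, b and c individually with the rest of the network collapsed around them.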
Cuts: these can be vertex-cuts or line-cuts, but of course the problem is deciding at what level to cut. The advice offered was to look for natural breaks in the histogram of the value that you want to use for the cut [e.g. Info/Network/Line values]. To quote, this is largely “art or witchcraft”! The advantage of cuts is that they are very general and can employ any exogenous or endogenous variable, which is nice.
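As a rough illustration of a line-cut, the sketch below drops every line whose value falls below a threshold and then lists the connected groups that remain. The names `line_cut` and `components`, the toy data and the threshold of 4 are assumptions for the example; choosing the threshold is still the "witchcraft" part.

```python
from collections import defaultdict

def line_cut(edges, threshold):
    """Keep only lines whose value is at least `threshold`."""
    return [(u, v, w) for u, v, w in edges if w >= threshold]

def components(edges):
    """Connected components of an undirected edge list, as sorted lists."""
    adj = defaultdict(set)
    for u, v, _ in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, result = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        stack, comp = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            comp.append(node)
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        result.append(sorted(comp))
    return result

# Hypothetical toy data: (vertex, vertex, line value).
edges = [("a", "b", 5), ("b", "c", 4), ("c", "d", 1), ("d", "e", 6), ("e", "f", 5)]

# The values 1 | 4 5 5 6 show a natural break between 1 and 4, so cut at 4.
groups = components(line_cut(edges, threshold=4))
```

With this toy data the single weak line c-d is removed, leaving two groups.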
Cores: these are very quick to calculate, in any tool. Again, the key is to ask yourself which structural property should hold your group together.
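For the plain degree-based case, cores can be sketched with the usual peeling idea: repeatedly remove the vertex of smallest remaining degree. This is a minimal illustration that assumes degree is the structural property of interest; `core_numbers` and the toy network are made up for the example, and the version here favours clarity over speed.

```python
def core_numbers(adj):
    """Return the core number of each vertex of an undirected graph.

    `adj` maps each vertex to the set of its neighbours.
    """
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Peel off the vertex with the smallest remaining degree.
        v = min(remaining, key=lambda x: (degree[x], x))
        k = max(k, degree[v])
        core[v] = k
        remaining.remove(v)
        for nb in adj[v]:
            if nb in remaining:
                degree[nb] -= 1
    return core

# Hypothetical toy network: a triangle a-b-c with a pendant vertex d on c.
adj = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
cores = core_numbers(adj)
```

The triangle forms a 2-core, while the pendant vertex only belongs to the 1-core.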
Line-cuts: Pajek offers some subtleties here. To use ‘triangular weights’, a ‘main task flow’ procedural description (using menu items as in the latest version of Pajek to date) would be:
- Net>Count>3-rings>Undirected
- … (I noted “Network>linevalues” – clearly wrong - will check and correct…)
- Net>Transform>Remove>Lines with value>Lower than…
- (then remove the isolates)
- Net>Partition>Degree>Compute all degree
- Operations>Extract from partition>1…*
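The idea behind these steps can be sketched outside Pajek as follows: weight each line by the number of triangles it belongs to, remove lines below a threshold, and drop any vertex left without lines. This is an assumption-laden illustration, not a description of what Pajek does internally; the function names, the toy network and the threshold of 1 are all hypothetical.

```python
def build_adjacency(edges):
    """Adjacency sets for an undirected edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def triangle_weights(adj):
    """Weight each undirected line by the number of triangles it lies on."""
    weights = {}
    for u in adj:
        for v in adj[u]:
            if u < v:
                # Each common neighbour of u and v closes one triangle.
                weights[(u, v)] = len(adj[u] & adj[v])
    return weights

def prune(adj, min_triangles):
    """Keep lines on at least `min_triangles` triangles; drop isolates."""
    weights = triangle_weights(adj)
    kept = sorted(e for e, w in weights.items() if w >= min_triangles)
    vertices = sorted({v for e in kept for v in e})
    return kept, vertices

# Hypothetical toy network: two triangles joined by a bridge c-d,
# plus a pendant vertex g attached to f.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d"),
         ("d", "e"), ("d", "f"), ("e", "f"), ("f", "g")]
kept, vertices = prune(build_adjacency(edges), min_triangles=1)
```

In the toy network the bridge c-d and the pendant line f-g lie on no triangle, so both are cut, and g disappears as an isolate.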
Generalized cores: also interesting, perhaps to measure how productive someone is within their group (see the Pajek website for recently-updated slides on this).
Triangular connectivity (e.g. 3-cliques): a generalization of connectivity – triads can be connected by vertex or edge. Can be extended to 4-rings. [Why are 4-rings interesting? My favourite example is used by Sean Bergin of DSTO: imagine a double date – as long as neither couple is fighting, and both same-sex pairs get along, the night will be a success – there’s no real need for the man in one couple to talk to the woman in the other, or vice versa!] Triangular connectivity can be used to operationalise the concept of weak ties (arguing that all ties in short cycles are strong ties, and all other ties are ‘weak’) by joining 3- and 4-rings together into a network.
Islands: a nice alternative to the standard practice of the global ‘greater than’ cut-off value was demonstrated. To use a hill-walking analogy, instead of classifying mountains as ‘anything over x feet’ we often identify any local peak with a height drop of more than x feet between it and neighbouring peaks as a mountain. In Pajek, Islands allow for this more local definition of ‘peak-ness’.
- Net>Partitions>Islands>Generate Network with Islands (will show the ‘below the water’ links as well)
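To make the hill-walking analogy concrete, here is a simplified sketch of vertex islands: connected groups whose heights are all greater than those of their outside neighbours, restricted to a size range. It assumes heights on vertices and invents the parameters `min_size` and `max_size` for illustration; Pajek's own Islands procedure may differ in detail.

```python
def components_at_or_above(adj, height, level):
    """Connected components of the vertices with height >= level."""
    keep = {v for v in adj if height[v] >= level}
    seen, comps = set(), []
    for start in sorted(keep):
        if start in seen:
            continue
        stack, comp = [start], set()
        seen.add(start)
        while stack:
            node = stack.pop()
            comp.add(node)
            for nb in adj[node]:
                if nb in keep and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        comps.append(frozenset(comp))
    return comps

def vertex_islands(adj, height, min_size, max_size):
    """Maximal groups of allowed size that stand above their surroundings."""
    candidates = set()
    for level in set(height.values()):
        for comp in components_at_or_above(adj, height, level):
            if min_size <= len(comp) <= max_size:
                candidates.add(comp)
    # Keep only islands that are not contained in a larger allowed island.
    maximal = [c for c in candidates if not any(c < o for o in candidates)]
    return sorted(sorted(c) for c in maximal)

# Hypothetical toy network: a path a-b-c-d-e-f-g with a height per vertex.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "e"},
       "e": {"d", "f"}, "f": {"e", "g"}, "g": {"f"}}
height = {"a": 9, "b": 8, "c": 2, "d": 5, "e": 6, "f": 1, "g": 1}

islands = vertex_islands(adj, height, min_size=2, max_size=3)
global_cut = components_at_or_above(adj, height, level=7)
```

A global cut at height 7 finds only the high peak a-b, whereas the island search also picks up the lower local peak d-e, because it stands clear of its own neighbours.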
Clustering
(N.B. these techniques are marked with a ‘*’ in the menus – this indicates that they should not be used with large networks, but only on networks of ‘some hundreds’ of vertices. Hierarchical clustering on very large networks is coming soon, however!) Perhaps surprisingly, the key function is found at
- File>Network>Export Matrix to EPS>… (as the key visualisation is viewed as a printed matrix outside Pajek, in an EPS viewer). It is possible to use the hierarchy options in Pajek to improve the clustering tree further.
*GSView tip: use [Media>User-defined] to set the sheet to a size large enough to print large matrices!