Announcements‎ > ‎

Complex Networks and Topological Features

posted Nov 25, 2009, 12:30 AM by Olaf Bochmann   [ updated Apr 27, 2010, 8:27 AM ]

The study of complex networks is a more resent activity in complex systems science. It is largely inspired by empirical studies of real world networks like social-, gene-, neuronal-, and computer networks. 

A complex network is essentially a graph with "non-trivial" topological features. Let's first review some trivial features for regular or random networks, as they are known from graph theory.

Trivial Network Features

Trivial networks such as lattices and random graphs have been studies extensively in the past. Lattices are regular graphs whose drawing corresponds to some grid/mesh/lattice. A random graph is a graph that is generated by a random process. Some of the more prominent graph properties will be discussed. Graph properties or invariants depend on the abstract structure only, not on the representation.  They include degree, clustering coefficient, connectivity (scalars); degree sequence, characteristic polynomial, Tutte polynominal (sequences/polynominals). 

Degree / Degree Distribution

The degree of a node is the number of connections it has to other nodes. There is a maximum degree, a minimum degree and a degree sum of a graph. A vertex with degree 0 is called an isolated vertex, a vertex with degree 1 is called a leaf vertex and the edge connected to that vertex is called a pendant edge.

The degree distribution is the probability distribution of these degrees over the whole network. This is an important property of both real world networks and theoretical networks. For example the random graph has a binomial distribution of degrees or a Poisson distribution for lage networks. Most real world networks are highly right-skewed, meaning that a large majority of nodes have low degree but a small number, known as "hubs", have high degree (see scale-free-networks below).

The degree sequence is a non-increasing sequence of vertex degrees. Graphs with the same degree sequence are isomorphic. Isomorphism captures the idea that the structure of the graphs is the same - ignoring individual distinctions, e.g. the number of cycles in the isomorphic graphs is the same, however node coloring and graph drawings may differ. 

Clustering Coefficient

The clustering coefficient assesses the degree to which nodes tend to cluster together, e.g. in social networks, nodes tend to create tightly knit groups characterised by a relatively high density of ties - greater than the average probability of a tie randomly established between two nodes. There is a global and a local version of this measure. The global clustering coefficient gives an overall indication of the clustering in the network, whereas the local one gives an indication of the embeddedness of single nodes.

The global clustering coefficient is based on triplets of nodes - three nodes that are connected by either two (open triplet) or three (closed triplet) undirected ties. It is calculated as the ratio of the number of closed triplets and the total number of triplets (both open and closed).

The local clustering coefficient of a node quantifies how close its neighbors are to being a clique (complete graph). It is used to determine whether a graph is a small-world network (below). The local clustering coefficient for a node is then given by the proportion of links between the vertices within its neighbourhood divided by the number of links that could possibly exist between them. The measure is 1 if every neighbour connected to the node is also connected to every other vertex within the neighbourhood, and 0 if no vertex that is connected to the node connects to any other vertex that is connected to the node.

Connectivity

Connectivity of a graph is a measure of robustness as a network. Two vertices are connected, if there is a path between them. Otherwise they are disconnected. A graph is connected if every pair of vertices is connected. 

Vertex connectivity is the smallest vertex cut of a connected graph - the number of vertexes that must be removed in order to disconnect the graph. Local connectivity is the size of a smallest vertex cut separating two vertexes in a graph. Graph connectivity equals the minimal local connectivity.

Edge connectivity is the smallest number of edge cuts that renders the graph disconnected. Local edge connectivity is the smallest number of edge cuts, that disconnect the two edges. Again, the smallest local edge connectivity in a graph is the graph egde connectivity.

The vertex- and edge-connectivities of a disconnected graph are both 0. The complete graph on n vertices has edge-connectivity equal to n − 1. Every other simple graph on n vertices has strictly smaller edge-connectivity. In a tree, the local edge-connectivity between every pair of vertices is 1. Edge connectivity is bounded by vertex connectivity and the latter is bounded by the minimum degree of the graph. 

Non-trivial Network Features

cell signaling network

The gadget spec URL could not be found

Most real world networks show topological features like a heavy tail in the degree distribution, a high clustering coefficient, assortativity or disassortativity among vertices, community structure, and hierarchical structure.

Assortativity

Assortativity refers to a preference of a node to attach to other nodes that are similar or different. Similarity is often but not necessarily expressed in terms of a nodes degree, i.e. in social networks, highly connected nodes prefer to attach to other highly connected nodes (assortativity). The oposite effect can be observed in technological and biological networks, where highly connected nodes tend to attach to low degree nodes (dissortativity). 

Assortativity is measured as correlation between two nodes. The most common measures are assortativity coefficient and neighbor connectivity. The assortativity coefficient is a Pearson correlation coefficient between pais of nodes. The range of the measure r is from -1 for perfect assortative mixing patterns to 1 for completely disassortative. 

Scale-free networks

In scale-free networks the degree distribution follows the power law. The degree distribution of these networks has no characteristic scale - some vertices have a degree that is orders of magnitude larger than the average (hubs). Networks exhibiting such a distribution are very different from networks, where edges existed independently and at random (Poisson process). Scale-free networks can be created with the Yule process (Gibrat principle, Matthew effect, cumulative advantage, preferential attachement).

These networks show a high robustness to the random vertex cut, i.e., the vast majority of vertices remain connected together in a giant component. However, they are clearly sensible to targeted attacks on hubs. Also, these critical vertices are the ones with the highest degree, and have thus been implicated in the spread of disease (natural and artificial) in social and communication networks, and in the spread of fads.

The detection and characterization of power laws is complicated by the large fluctuations that occur in the tail of the distribution—the part of the distribution representing large but rare events—and by the difficulty of identifying the range over which power-law behavior holds. Commonly used methods for analyzing power-law data, such as least-squares fitting, can produce substantially inaccurate estimates of parameters for power-law distributions, and even in cases where such methods return accurate answers they are still unsatisfactory because they give no indication of whether the data obey a power law at all. Finally, A. Clauset at all. [1] got there brilliant paper published, where they present a principled statistical framework for discerning and quantifying power-law behavior in empirical data. The approach combines maximum-likelihood fitting methods with goodness-of-fit tests based on the Kolmogorov–Smirnov (KS) statistic and likelihood ratios. The linked page hosts implementations of the methods described in the article and this site contains the 24 data sets.

[1] A. Clauset, C. R. Shalizi, and M. E. J. Newman. Power-law distributions in empirical data. SIAM Review, 51(4):661–703, 2009.


Small-world networks

A small-world network (in analogy to six degrees of separation phenomenon) is a regular graph with the addition of only a small number of long-range links. Even if the diameter of a regular graph is proportional to the size of the network, it can be transformed into a "small world" in which the average number of edges between any two vertices is very small. In small-world networks the diameter should grow as the logarithm of the size of the network, while the clustering coefficient stays large.

A wide variety of abstract graphs exhibit the small-world property, e.g., random graphs and scale-free networks. Real world networks such as the World Wide Web and the metabolic network also exhibit this property.

Network Data Sets

The SNAP project on Stanford University maintains a collection of network data sets. Here is a subset of it.

Social Networks

online social networks, edges represent interactions between people

Name Type Nodes Edges Description
soc-Epinions1 Directed 75,879 508,837 Who-trusts-whom network of Epinions.com
soc-LiveJournal1 Directed 4,847,571 6,8993,773 LiveJournal online social network
soc-Slashdot0811 Directed 77,360 905,468 Slashdot social network from November 2008
soc-Slashdot0922 Directed 82,168 948,464 Slashdot social network from February 2009
wiki-Vote Directed 7115 103,689 Wikipedia who-votes-on-whom network

Communication Networks

email communication networks with edges representing communication

Name Type Nodes Edges Description
email-EuAll Directed 265,214 420,045 Email network from a EU research institution
email-Enron Directed 36,692 367,662 Email communication network from Enron
wiki-Talk Directed 2,394,385 5,021,410 Wikipedia talk (communication) network

Citation Networks

nodes represent papers, edges represent citations

Name Type Nodes Edges Description
cit-HepPh Directed, Temporal, Labeled 34,546 421,578 Arxiv High Energy Physics paper citation network
cit-HepTh Directed, Temporal, Labeled 27,770 352,807 Arxiv High Energy Physics paper citation network
cit-Patents Directed, Temporal, Labeled 3,774,768 16,518,948 Citation network among US Patents

Collaboration Networks

nodes represent scientists, edges represent collaborations (co-authoring a paper)

Name Type Nodes Edges Description
ca-AstroPh Undirected 18,772 396,160 Collaboration network of Arxiv Astro Physics
ca-CondMat Undirected 23,133 186,936 Collaboration network of Arxiv Condensed Matter
ca-GrQc Undirected 5,242 28,980 Collaboration network of Arxiv General Relativity
ca-HepPh Undirected 12,008 237,010 Collaboration network of Arxiv High Energy Physics
ca-HepTh Undirected 9,877 51,971 Collaboration network of Arxiv High Energy Physics Theory
Comments