Networks from SNAP (Stanford Network Analysis Platform)

Network Data Sets, Jure Leskovec
http://snap.stanford.edu/data/index.html
email jure at cs.stanford.edu

Included in the Univ. of Florida Sparse Matrix Collection, June 2010, by Tim Davis.

* Social networks: online social networks; edges represent interactions between people
* Communication networks: email communication networks; edges represent communication
* Citation networks: nodes represent papers; edges represent citations
* Collaboration networks: nodes represent scientists; edges represent collaborations (co-authoring a paper)
* Web graphs: nodes represent webpages; edges are hyperlinks
* Blog and Memetracker graphs: nodes represent time-stamped blog posts; edges are hyperlinks
* Amazon networks: nodes represent products; edges link commonly co-purchased products
* Internet networks: nodes represent computers; edges represent communication
* Road networks: nodes represent intersections; edges represent roads connecting the intersections
* Autonomous systems: graphs of the Internet
* Signed networks: networks with positive and negative edges (friend/foe, trust/distrust)

The number of nodes in the tables below can differ from the number of nodes in
the graphs as stored in the UF Sparse Matrix Collection. For some problems, the
node count in the tables excludes nodes of zero degree. These problems are
marked with a "+" in front of the number of nodes.
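The "+" convention means the table counts only nodes that have at least one
edge, while the stored matrix keeps its full dimension. A minimal sketch of
that count, assuming the graph is given as 0-based (i, j) positions of the
nonzeros of an n-by-n adjacency matrix; the function name
count_nonisolated_nodes is hypothetical and not part of SNAP or the UF tools.

```python
def count_nonisolated_nodes(n, entries):
    """Count nodes with nonzero total degree in an n-by-n adjacency pattern.

    entries: iterable of 0-based (i, j) positions of nonzeros (edge i -> j).
    A node has zero degree if it appears in no row and no column entry.
    """
    touched = set()
    for i, j in entries:
        if not (0 <= i < n and 0 <= j < n):
            raise ValueError(f"entry ({i}, {j}) is outside a {n}-by-{n} matrix")
        touched.add(i)  # node i has an out-edge
        touched.add(j)  # node j has an in-edge
    return len(touched)


# Hypothetical 5-node matrix in which node 4 has no edges at all.
n = 5
entries = [(0, 1), (1, 2), (2, 0), (3, 1)]
print(n, count_nonisolated_nodes(n, entries))  # prints "5 4"
```

Here the matrix dimension is 5, but a "+"-style table entry would report 4 nodes.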
--------------------------------------------------------------------------------
Social networks
--------------------------------------------------------------------------------

Name               Type       Nodes        Edges         Description
soc-Epinions1      Directed   + 75,879     508,837       Who-trusts-whom network of Epinions.com
soc-LiveJournal1   Directed   4,847,571    68,993,773    LiveJournal online social network
soc-Slashdot0811   Directed   77,360       905,468       Slashdot social network from November 2008
soc-Slashdot0922   Directed   82,168       948,464       Slashdot social network from February 2009
wiki-Vote          Directed   + 7,115      103,689       Wikipedia who-votes-on-whom network

--------------------------------------------------------------------------------
Communication networks
--------------------------------------------------------------------------------

Name               Type       Nodes        Edges         Description
email-EuAll        Directed   265,214      420,045       Email network from an EU research institution
email-Enron **     Directed   36,692       367,662       Email communication network from Enron
wiki-Talk          Directed   2,394,385    5,021,410     Wikipedia talk (communication) network

(** NOTE: The email-Enron graph is listed as "directed" in the SNAP data set,
but for every edge (i,j) there is also an edge (j,i). That is, the graph
appears to be undirected. The adjacency matrix in the UF Collection is
symmetric.)
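The email-Enron observation (every edge (i,j) has a matching (j,i)) is a
structural symmetry test on the nonzero pattern. A minimal sketch, assuming the
edges are available as a list of (i, j) pairs; the function name
is_pattern_symmetric is hypothetical, and edge weights are ignored.

```python
def is_pattern_symmetric(entries):
    """Return True if every directed edge (i, j) has a reverse edge (j, i).

    Only the nonzero pattern is checked; values/weights are not compared.
    """
    edges = set(entries)  # duplicate entries collapse to one position
    return all((j, i) in edges for (i, j) in edges)


print(is_pattern_symmetric([(0, 1), (1, 0), (1, 2), (2, 1)]))  # True
print(is_pattern_symmetric([(0, 1), (1, 2), (2, 1)]))          # False: (1, 0) is missing
```

A directed graph that passes this test, as email-Enron does, has a
structurally symmetric adjacency matrix and can be treated as undirected.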
--------------------------------------------------------------------------------
Citation networks
--------------------------------------------------------------------------------

Name          Type                          Nodes       Edges        Description
cit-HepPh     Directed, Temporal, Labeled   34,546      421,578      Arxiv High Energy Physics paper citation network
cit-HepTh     Directed, Temporal, Labeled   27,770      352,807      Arxiv High Energy Physics paper citation network
cit-Patents   Directed, Temporal, Labeled   3,774,768   16,518,948   Citation network among US Patents

--------------------------------------------------------------------------------
Collaboration networks
--------------------------------------------------------------------------------

Name          Type         Nodes     Edges     Description
ca-AstroPh    Undirected   18,772    396,160   Collaboration network of Arxiv Astro Physics
ca-CondMat    Undirected   23,133    186,936   Collaboration network of Arxiv Condensed Matter
ca-GrQc       Undirected   5,242     28,980    Collaboration network of Arxiv General Relativity
ca-HepPh      Undirected   12,008    237,010   Collaboration network of Arxiv High Energy Physics
ca-HepTh      Undirected   9,877     51,971    Collaboration network of Arxiv High Energy Physics Theory

NOTE: the number of edges listed above for the ca-* graphs counts each edge
twice. In the UF collection, this is exactly the number of entries in the
sparse adjacency matrix.
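To make the "each edge counted twice" convention concrete, here is a minimal
sketch, assuming an undirected graph given as a list of 0-based node pairs,
that computes how many entries its symmetric sparse adjacency matrix would
hold. The function name symmetric_nnz is hypothetical.

```python
def symmetric_nnz(undirected_edges):
    """Number of nonzeros in the symmetric adjacency matrix of an undirected graph.

    Each edge {i, j} with i != j is stored twice, as (i, j) and (j, i);
    a self-loop {i, i} occupies a single diagonal entry.
    """
    pairs = {(min(i, j), max(i, j)) for (i, j) in undirected_edges}  # drop duplicates
    loops = sum(1 for (i, j) in pairs if i == j)
    return 2 * (len(pairs) - loops) + loops


edges = [(0, 1), (1, 2), (2, 0), (1, 0)]  # (1, 0) repeats the edge (0, 1)
print(symmetric_nnz(edges))  # prints 6: three undirected edges, each stored twice
```

Conversely, dividing such an entry count by 2 recovers the number of
undirected edges when the graph has no self-loops.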
--------------------------------------------------------------------------------
Web graphs
--------------------------------------------------------------------------------

Name            Type       Nodes       Edges       Description
web-BerkStan    Directed   685,230     7,600,595   Web graph of Berkeley and Stanford
web-Google      Directed   + 875,713   5,105,039   Web graph from Google
web-NotreDame   Directed   325,729     1,497,134   Web graph of Notre Dame
web-Stanford    Directed   281,903     2,312,497   Web graph of Stanford.edu

--------------------------------------------------------------------------------
Product co-purchasing networks
--------------------------------------------------------------------------------

Name         Type       Nodes     Edges       Description
amazon0302   Directed   262,111   1,234,877   Amazon product co-purchasing network from March 2 2003
amazon0312   Directed   400,727   3,200,440   Amazon product co-purchasing network from March 12 2003
amazon0505   Directed   410,236   3,356,824   Amazon product co-purchasing network from May 5 2003
amazon0601   Directed   403,394   3,387,388   Amazon product co-purchasing network from June 1 2003

--------------------------------------------------------------------------------
Internet peer-to-peer networks
--------------------------------------------------------------------------------

Name             Type       Nodes      Edges     Description
p2p-Gnutella04   Directed   + 10,876   39,994    Gnutella peer-to-peer network from August 4 2002
p2p-Gnutella05   Directed   8,846      31,839    Gnutella peer-to-peer network from August 5 2002
p2p-Gnutella06   Directed   8,717      31,525    Gnutella peer-to-peer network from August 6 2002
p2p-Gnutella08   Directed   6,301      20,777    Gnutella peer-to-peer network from August 8 2002
p2p-Gnutella09   Directed   8,114      26,013    Gnutella peer-to-peer network from August 9 2002
p2p-Gnutella24   Directed   26,518     65,369    Gnutella peer-to-peer network from August 24 2002
p2p-Gnutella25   Directed   22,687     54,705    Gnutella peer-to-peer network from August 25 2002
p2p-Gnutella30   Directed   36,682     88,328    Gnutella peer-to-peer network from August 30 2002
p2p-Gnutella31   Directed   62,586     147,892   Gnutella peer-to-peer network from August 31 2002

--------------------------------------------------------------------------------
Road networks
--------------------------------------------------------------------------------

Name         Type         Nodes         Edges       Description
roadNet-CA   Undirected   + 1,965,206   5,533,214   Road network of California
roadNet-PA   Undirected   + 1,088,092   3,083,796   Road network of Pennsylvania
roadNet-TX   Undirected   + 1,379,917   3,843,320   Road network of Texas

NOTE: each edge is counted twice in the roadNet data above. The numbers above
correspond to the number of entries in the sparse adjacency matrix.

--------------------------------------------------------------------------------
Autonomous systems graphs
--------------------------------------------------------------------------------

Name                    Type         Nodes           Edges            Description
as-735 (735 graphs)     Undirected   103-6,474       243-13,233       735 daily instances (graphs) from November 8 1997 to January 2 2000
as-Skitter              Undirected   1,696,415       * 11,095,298     Internet topology graph, from traceroutes run daily in 2005
as-Caida (122 graphs)   Directed     8,020-26,475    36,406-106,762   The CAIDA AS Relationships Datasets, from January 2004 to November 2007
Oregon-1 (9 graphs)     Undirected   10,670-11,174   22,002-23,409    AS peering information inferred from Oregon route-views between March 31 and May 26 2001
Oregon-2 (9 graphs)     Undirected   10,900-11,461   31,180-32,730    AS peering information inferred from Oregon route-views between March 31 and May 26 2001

NOTE: (*) the SNAP data lists each edge just once for the as-Skitter and
Oregon-* matrices. This is exactly half the number of entries in the sparse
adjacency matrix in the UF collection.
--------------------------------------------------------------------------------
Signed networks
--------------------------------------------------------------------------------

Name                      Type                  Nodes     Edges     Description
soc-sign-epinions         Directed              131,828   841,372   Epinions signed social network
wiki-Elec                 Directed, Bipartite   7,000     100,000   Wikipedia adminship election data (excluded from UF collection)
soc-sign-Slashdot081106   Directed              77,357    516,575   Slashdot Zoo signed social network from November 6 2008
soc-sign-Slashdot090216   Directed              81,871    545,671   Slashdot Zoo signed social network from February 16 2009
soc-sign-Slashdot090221   Directed              82,144    549,202   Slashdot Zoo signed social network from February 21 2009

--------------------------------------------------------------------------------
Blog graphs and Memetracker data
--------------------------------------------------------------------------------

Memetracker data contains timestamped phrase and link information for news
media articles and blog posts from 1.5 million different blogs and news
websites. The data spans 10 months, from August 2008 to May 2009, with several
hundred million documents.
http://www.memetracker.org/data.html

(Note that this data has not been imported into the UF Sparse Matrix Collection.)

Network types

* Directed: directed network
* Undirected: undirected network
* Bipartite: bipartite network
* Multigraph: network has multiple edges between a pair of nodes
* Temporal: for each node/edge we know the time when it appeared in the network
* Labeled: network contains labels (weights, attributes) on nodes and/or edges

Network statistics

Dataset statistic                  Meaning
Nodes                              Number of nodes in the network
Edges                              Number of edges in the network
Nodes in largest WCC               Number of nodes in the largest weakly connected component
Edges in largest WCC               Number of edges in the largest weakly connected component
Nodes in largest SCC               Number of nodes in the largest strongly connected component
Edges in largest SCC               Number of edges in the largest strongly connected component
Average clustering coefficient     Average over all nodes of the local clustering coefficient (fraction of pairs of a node's neighbors that are themselves connected)
Number of triangles                Number of triples of mutually connected nodes (considering the network as undirected)
Fraction of closed triangles       Number of connected triples of nodes / number of (undirected) length-2 paths
Diameter (longest shortest path)   Maximum undirected shortest path length (sampled over 1,000 random nodes)
90-percentile effective diameter   90th percentile of the undirected shortest path length distribution (sampled over 1,000 random nodes)

================================================================================

Note that some versions of these graphs already appear in the UF collection.
Some have similar names:

web-BerkStan    Kamvar/Stanford_Berkeley
    in SNAP/:    n: 685,230   nz: 7,600,595
    in Kamvar/:  n: 683,446   nz: 7,583,376

    I obtained Kamvar/Stanford_Berkeley directly from Sep Kamvar. It is
    slightly smaller than the version in SNAP, so it is likely that Sep
    created multiple versions of the graph.

web-Google appears only in SNAP.
web-NotreDame   Barabasi/NotreDame_www
    in SNAP/:      n: 325,729   nz: 1,497,134
    in Barabasi/:  n: same      nz: 929,849

    Barabasi/NotreDame_www is an exact copy of the graph of that name in the
    Pajek data set. The SNAP collection has a different version of this
    graph, of which SNAP/web-NotreDame is an exact copy. It is possible that
    Barabasi's version is yet a third version of this graph.

web-Stanford    Kamvar/Stanford (same size and nnz)
    n: 281,903   nz: 2,312,497

    SNAP/web-Stanford and Kamvar/Stanford have the same number of nodes and
    edges. However, their nonzero patterns differ.

cit-HepTh       Pajek/HEP-th-new
    n: 27,770   nz: 352,807

    Pajek/HEP-th-new is identical to SNAP/cit-HepTh. Since it is small, I
    have decided to include both in the collection, to keep the SNAP/
    collection complete.

cit-HepPh appears only in SNAP.
ca-HepPh appears only in SNAP.
ca-HepTh appears only in SNAP.
Pajek/HEP-th appears only in the Pajek collection.

cit-Patents     Pajek/patents
    in SNAP/:   n: 3,774,768   nz: 16,518,948
    in Pajek/:  n: same        nz: 14,970,767

    Both of these come from the NBER data. However, the edges are not the
    same: SNAP/cit-Patents is a strict superset of Pajek/patents. If
    A0 = Pajek/patents and A1 = SNAP/cit-Patents, then all edges in A0 appear
    in A1, and nnz(A1-A0) = nnz(A1)-nnz(A0) = 1,548,181.

    The auxiliary data is not the same. Pajek/patents contains more auxiliary
    data for each node. This data can be used to interpret the
    SNAP/cit-Patents graph as well, since the nodes match up from one graph
    to the other.
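The version comparisons above (same size but different nonzero pattern for
web-Stanford; a strict superset for cit-Patents) come down to set differences
of nonzero positions. A minimal sketch, assuming both graphs share the same
node numbering and are given as lists of (i, j) positions; the function name
compare_patterns and the tiny stand-in graphs are hypothetical. It compares
patterns only, not numerical values.

```python
def compare_patterns(a0, a1):
    """Compare the nonzero patterns of two graphs on the same node set.

    a0, a1: iterables of (i, j) positions (duplicate entries are ignored).
    Returns (only_in_a0, only_in_a1) as sets of positions.
    """
    s0, s1 = set(a0), set(a1)
    return s0 - s1, s1 - s0


# Superset case (like Pajek/patents vs. SNAP/cit-Patents, in miniature).
a0 = [(0, 1), (1, 2)]
a1 = [(0, 1), (1, 2), (2, 0)]
only0, only1 = compare_patterns(a0, a1)
print(not only0, len(only1), len(set(a1)) - len(set(a0)))  # prints "True 1 1"

# Same size, different pattern (like web-Stanford vs. Kamvar/Stanford).
b0 = [(0, 1), (1, 2)]
b1 = [(0, 1), (2, 1)]
print(compare_patterns(b0, b1))  # prints "({(1, 2)}, {(2, 1)})"
```

When only_in_a0 is empty, every edge of A0 appears in A1, so the number of
extra edges equals nnz(A1) - nnz(A0), which is the relation quoted for
cit-Patents.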