Datasets Used in My Papers

Plain Graphs

| Name | #nodes | #edges | #labels | Type | URL |
|---|---|---|---|---|---|
| Youtube | 1,138,499 | 2,990,443 | 47 | undirected | [raw] [preprocessed] |
| TWeibo | 2,320,895 | 50,655,143 | 100 | directed | [raw] [preprocessed] |
| Orkut | 3,072,441 | 117,185,084 | 100 | undirected | [raw] [preprocessed] |
| In-2004 | 1,382,908 | 16,539,643 | - | directed | [raw] [preprocessed] |
| DBLP | 5,425,963 | 17,298,032 | - | undirected | [raw] [preprocessed] |
| Pokec | 1,632,803 | 30,622,564 | - | directed | [raw] [preprocessed] |
| LiveJournal | 4,847,571 | 68,475,391 | - | directed | [raw] [preprocessed] |
| IT-2004 | 41,291,594 | 1,135,718,909 | - | directed | [raw] [preprocessed] |
| Twitter | 41,652,230 | 1,468,365,182 | - | directed | [raw] [preprocessed] |
| Friendster | 65,608,366 | 1,806,067,135 | - | undirected | [raw] [preprocessed] |
| UK-2007 | 105,896,555 | 3,738,733,648 | - | directed | [raw] [preprocessed] |
| UK-union | 133,633,040 | 5,475,109,924 | - | directed | [raw] [preprocessed] |
| ClueWeb12 | 978,408,098 | 42,574,107,469 | - | directed | [raw] |
| ClueWeb09 | 1,684,868,322 | 7,939,635,651 | - | directed | [raw] [preprocessed] |

Please cite our papers if you publish results based on our preprocessed datasets.

@article{yang13homogeneous,
  title={Homogeneous Network Embedding for Massive Graphs via Reweighted Personalized PageRank},
  author={Yang, Renchi and Shi, Jieming and Xiao, Xiaokui and Yang, Yin and Bhowmick, Sourav S},
  journal={Proceedings of the VLDB Endowment},
  volume={13},
  number={5},
  year={2020}
}

@article{shi13realtime,
  title={Realtime Index-Free Single Source SimRank Processing on Web-Scale Graphs},
  author={Shi, Jieming and Jin, Tianyuan and Yang, Renchi and Xiao, Xiaokui and Yang, Yin},
  journal={Proceedings of the VLDB Endowment},
  volume={13},
  number={7},
  year={2020}
}

Attributed Graphs

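| Name | Type | #nodes | #edges | #attributes | #labels | URL |
|---|---|---|---|---|---|---|
| Wiki | directed | 2,405 | 17,981 | 4,973 | 19 | [raw] [preprocessed] |
| Cora | directed | 2,708 | 5,429 | 1,433 | 7 | [raw] [preprocessed] |
| Citeseer | directed | 3,312 | 4,660 | 3,703 | 6 | [raw] [preprocessed] |
| Pubmed | directed | 19,717 | 44,338 | 500 | 3 | [raw] [preprocessed] |
| BlogCatalog | undirected | 5,196 | 343,486 | 8,189 | 6 | [raw] [preprocessed] |
| PPI | undirected | 56,944 | 818,716 | 50 | 121 | [raw] [preprocessed] |
| Flickr | undirected | 7,575 | 479,476 | 12,047 | 9 | [raw] [preprocessed] |
| Facebook | undirected | 4,039 | 88,234 | 1,283 | 193 | [raw] [preprocessed] |
| Twitter | directed | 81,306 | 1,768,149 | 216,839 | 4,065 | [raw] [preprocessed] |
| Google+ | directed | 107,614 | 13,673,453 | 15,907 | 468 | [raw] [preprocessed] |
| TWeibo | directed | 2,320,895 | 50,655,143 | 1,657 | 8 | [raw] [preprocessed] |
| MAG | directed | 59,249,719 | 978,147,253 | 2,000 | 100 | [raw] [preprocessed] |
| MAG-SC | directed | 10,541,560 | 265,219,994 | 2,784,240 | 8 | [raw] [preprocessed] |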

These datasets are also available in PyTorch Geometric. Node attributes can be loaded as a sparse matrix with the following code:

from scipy import sparse

# Load the node-attribute matrix stored in SciPy's sparse .npz format.
features = sparse.load_npz("attrs.npz")
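
To illustrate what the loaded object looks like, here is a minimal sketch that round-trips a tiny made-up matrix through the same .npz format and then inspects it. The toy data, the temporary file path, and the assumption that rows correspond to nodes and columns to attributes are illustrative only; they are not taken from the actual attrs.npz files.

```python
import os
import tempfile

import numpy as np
from scipy import sparse

# Hypothetical stand-in for attrs.npz: 4 nodes x 5 binary attributes.
# Assumption: rows are nodes and columns are attributes.
rows = np.array([0, 0, 1, 2, 3, 3])
cols = np.array([0, 3, 1, 4, 0, 2])
vals = np.ones(6, dtype=np.float32)
toy = sparse.csr_matrix((vals, (rows, cols)), shape=(4, 5))

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "attrs.npz")
    sparse.save_npz(path, toy)        # writes the format that load_npz reads
    features = sparse.load_npz(path)  # returns a SciPy sparse matrix

# CSR layout makes per-row (per-node) access cheap.
features = features.tocsr()
num_nodes, num_attrs = features.shape
print(num_nodes, num_attrs, features.nnz)

# Column indices of the non-zero attributes of node 3,
# read directly from the CSR index arrays.
start, end = features.indptr[3], features.indptr[4]
node3_attrs = features.indices[start:end]
print(node3_attrs.tolist())
```

For the larger datasets, keeping the matrix sparse is the point: calling `features.toarray()` on a matrix with millions of rows and columns would likely exhaust memory.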

Please cite our paper if you publish results based on our preprocessed datasets.

@article{yang2020scaling,
  title={Scaling Attributed Network Embedding to Massive Graphs},
  author={Yang, Renchi and Shi, Jieming and Xiao, Xiaokui and Yang, Yin and Liu, Juncheng and Bhowmick, Sourav S},
  journal={Proceedings of the VLDB Endowment},
  volume={14},
  number={1},
  pages={37--49},
  year={2021},
  publisher={VLDB Endowment}
}

Dataset Repositories

| Name | Type | Collected by |
|---|---|---|
| SNAP | Graphs & Networks | Stanford |
| LAW | Graphs & Networks | UNIMI |
| BioSNAP | Biomedical Networks | Stanford |
| KONECT | Graphs & Networks | Jérôme Kunegis |
| AMiner | Academic Networks | AMiner |
| UCI Network Data Repository | Graphs & Networks | UCI Datalab |
| Network Repository | Graphs & Networks | - |
| Open Academic Graph | Academic Networks | Microsoft |
| Open Graph Benchmark | Graphs & Networks | Stanford |
| TuDatasets | Graphs & Networks | Christopher Morris et al. |
| StreamingGraphs | Streaming Graphs | Yibo Yao |
| ARB | Graphs & Networks | Austin R. Benson |
| SuiteSparse Matrix Collection | Matrix/Graphs | TAMU |
| Web Data Commons | Hyperlink Graphs/Web Tables/RDFa | University of Mannheim |
| Yahoo Webscope Datasets | Graphs/Ratings/Languages/Advertising | Yahoo |
| UCI Machine Learning Repository | Multivariate/Text/Time-Series | UCI |
| Yelp Open Dataset | Businesses/Reviews/User Data | Yelp |
| Recommender Systems Datasets | Graphs/Interactions/Reviews/Ratings | UCSD |
| Microsoft News Dataset | User Behavior Logs | Microsoft |
| Search Query Logs | Query Logs | Jeff Huang |
| AOL DS | Query Logs | Ricardo Campos |