Importing datasets

This module makes the graphs collected by the ‘Graph Layout Benchmark Datasets’ project of the Northeastern University Visualization Lab easily accessible from NetworkX.

The project aims to collect datasets used for graph layout algorithms and make them available for long-term access. The graphs are stored on the Open Science Framework (OSF) platform.

Information about the individual datasets can be found on the project homepage. For more information, refer to the corresponding short paper [1].

Usage

To get a list of all available datasets, call get_available_datasets():

>>> available_datasets = get_available_datasets()
>>> print(available_datasets)
['subways', 'code', 'rome', 'chess', 'steinlib', ...

To iterate over all graphs of a given dataset, call iterate_dataset():

>>> for name, g in iterate_dataset('subways'):
...     print("'{name}' has {n} vertices and {m} edges".format(name=name, n=g.order(), m=len(g.edges())))
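A loop like the one above can also collect results instead of printing them. The following is a minimal sketch, not part of this module: collect_sizes is a hypothetical helper, and it assumes only that each yielded graph offers the NetworkX methods order() and number_of_edges(). It takes the iterator as an argument, so it can be tried out without downloading anything.

```python
def collect_sizes(named_graphs):
    """Map each graph name to a (vertex count, edge count) tuple.

    `named_graphs` is any iterable of (name, graph) pairs, for example
    the iterator returned by iterate_dataset('subways').
    collect_sizes is a hypothetical helper, not part of the module.
    """
    sizes = {}
    for name, g in named_graphs:
        # order() and number_of_edges() are standard NetworkX Graph methods.
        sizes[name] = (g.order(), g.number_of_edges())
    return sizes
```

With the module imported, a call could look like sizes = collect_sizes(iterate_dataset('subways')).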

The module downloads, caches and updates the graphs automatically. If you run into problems or want to free up disk space, you can delete all saved data with the following command:

>>> clear_cache()

Methods

datasets.clear_cache() → None

If there are any issues with corrupted data, call this method to delete all graphs saved to disk.
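One way to use this in practice is to clear the cache only when loading fails and then try once more. The sketch below rests on assumptions: load_with_cache_reset is a hypothetical helper, and because this documentation does not say which exception corrupted data raises, it catches Exception broadly. Both callables are passed in, so the helper does not depend on how the module is imported.

```python
def load_with_cache_reset(load, clear_cache):
    """Call load(); if it fails, clear the cache once and try again.

    Assumption: corrupted cache data shows up as an exception while
    loading. The exception type is not documented, so this sketch
    catches Exception broadly; narrow it once the real type is known.
    """
    try:
        return load()
    except Exception:
        clear_cache()  # delete all graphs saved to disk
        return load()  # a second failure propagates to the caller
```

A call could look like graphs = load_with_cache_reset(lambda: list(iterate_dataset('subways')), clear_cache). Retrying only once keeps a persistent problem, such as a missing network connection, from turning into an endless loop.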

datasets.get_available_datasets() → List[str]

Returns the identifying names of all available datasets. For more information on a specific dataset, refer to the homepage of the ‘Graph Layout Benchmark Datasets’ project.

Example:

>>> available_datasets = get_available_datasets()
>>> print(available_datasets)
['subways', 'code', 'rome', 'chess', 'steinlib', ...

Returns:

List of available dataset names

Return type:

List[str]

datasets.iterate_dataset(dataset: str, adapt_attributes: bool = True) → Iterator[Tuple[str, Graph]]

Returns an iterator over all graphs in the specified dataset.

Example:

>>> for name, g in iterate_dataset('subways'):
...     print("'{name}' has {n} vertices and {m} edges".format(name=name, n=g.order(), m=len(g.edges())))

To obtain a list of all available dataset names, call get_available_datasets().

Parameters:
  • dataset (str) – name of the dataset.

  • adapt_attributes (bool) – If True, “weight”, “x” and “y” are converted to float (if present), and a “pos” attribute is calculated from “x” and “y”.

Returns:

Iterator over (name, graph) tuples, one for each graph in the dataset

Return type:

Iterator[Tuple[str, Graph]]
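The effect of adapt_attributes can be pictured on a single attribute dictionary. The sketch below is an illustration only and not the module's implementation: adapt_attribute_dict is a hypothetical name, and representing “pos” as an (x, y) tuple is an assumption, since the documentation only states that it is calculated from “x” and “y”.

```python
def adapt_attribute_dict(attrs):
    """Return a copy of `attrs` with numeric fields converted.

    Hypothetical illustration of the documented behaviour; the actual
    implementation in the module may differ.
    """
    adapted = dict(attrs)
    # Convert the three documented attributes to float if they are present.
    for key in ("weight", "x", "y"):
        if key in adapted:
            adapted[key] = float(adapted[key])
    # Assumption: "pos" is stored as an (x, y) tuple.
    if "x" in adapted and "y" in adapted:
        adapted["pos"] = (adapted["x"], adapted["y"])
    return adapted
```

For example, adapt_attribute_dict({"x": "1.5", "y": "2", "weight": "3"}) returns {"x": 1.5, "y": 2.0, "weight": 3.0, "pos": (1.5, 2.0)}. A “pos” entry of this form matches what NetworkX drawing functions expect when node positions are given as a dictionary mapping each node to a coordinate pair.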

Bibliography