Importing datasets

This module makes the graphs collected by the ‘Graph Layout Benchmark Datasets’ project of the Northeastern University Visualization Lab easily accessible from NetworkX.

The project aims to collect datasets used for evaluating graph layout algorithms and to make them available for long-term access. The graphs are stored on the Open Science Framework (OSF) platform.

Information about the individual datasets can be found on the project homepage. For more information, refer to the corresponding short paper [1].

Usage

To get a list of all available datasets, call get_available_datasets():

>>> available_datasets = get_available_datasets()
>>> print(available_datasets)
['subways', 'code', 'rome', 'chess', 'steinlib', ...

To iterate over all graphs of a given dataset, call iterate_dataset():

>>> for name, g in iterate_dataset('subways'):
...     print(f"'{name}' has {g.order()} vertices and {len(g.edges())} edges")
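If you want to keep these statistics instead of printing them, the same loop can collect them into a list. This is a minimal sketch that assumes only that each yielded graph provides order() and edges(), as NetworkX graphs do; summarize is a hypothetical helper, not part of this module.

```python
from typing import Any, Iterable, List, Tuple


def summarize(pairs: Iterable[Tuple[str, Any]]) -> List[Tuple[str, int, int]]:
    """Collect (name, vertex count, edge count) for each (name, graph) pair.

    Hypothetical helper: `pairs` would typically be iterate_dataset('subways').
    """
    stats = []
    for name, g in pairs:
        # order() is the number of vertices; len(edges()) the number of edges
        stats.append((name, g.order(), len(g.edges())))
    return stats
```

Usage would then be stats = summarize(iterate_dataset('subways')), which yields one tuple per graph.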

The module takes care of downloading, caching, maintaining and updating the graphs automatically. If you run into problems or want to free up disk space, you can delete all cached data with the following command:

>>> clear_cache()

Methods

datasets.clear_cache() → None

If there are any issues with corrupted data, call this function to delete all graphs saved to disk.

datasets.get_available_datasets() → List[str]

Returns the identifying names of all available datasets. For more information on a specific dataset, refer to the homepage of the ‘Graph Layout Benchmark Datasets’ project.

Example:

>>> available_datasets = get_available_datasets()
>>> print(available_datasets)
['subways', 'code', 'rome', 'chess', 'steinlib', ...

Returns:

List of available dataset names

Return type:

List[str]
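Because dataset names are plain strings, it can help to check a user-supplied name against this list before calling iterate_dataset(). A minimal sketch, assuming available is the list returned by get_available_datasets(); require_dataset is a hypothetical helper, not part of this module.

```python
from typing import Sequence


def require_dataset(name: str, available: Sequence[str]) -> str:
    """Return `name` unchanged if it is a known dataset, else raise ValueError.

    `available` is assumed to be the result of get_available_datasets().
    """
    if name not in available:
        # List the valid choices so the error message is actionable
        raise ValueError(
            f"unknown dataset {name!r}; choose one of: {', '.join(sorted(available))}"
        )
    return name
```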

datasets.get_available_graph_names(dataset: str) → List[str]

Returns the names of all graphs available in the given dataset. Use get_available_datasets() to obtain a list of available datasets.

Parameters:

dataset (str) – Name of the dataset

Returns:

List of available graph names

Return type:

List[str]

datasets.get_specific_graph(dataset: str, graph_name: str, adapt_attributes: bool = True) → Graph

Use this function if you only want to retrieve a single graph from a dataset instead of iterating over the whole dataset using iterate_dataset().

Note: Do not use this function to iterate over the whole dataset. You might run into a ‘429 Too Many Requests’ HTTP error, because the connection to OSF is re-established on each call.
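When you need several graphs from the same dataset, the note above points to a single pass with iterate_dataset() instead. The sketch below keeps only the requested graphs, assuming pairs is that iterator; select_graphs is a hypothetical helper. It stops consuming the iterator once every requested graph has been found, which only saves work if iterate_dataset() fetches graphs lazily; this page does not say whether it does.

```python
from typing import Any, Dict, Iterable, Set, Tuple


def select_graphs(pairs: Iterable[Tuple[str, Any]], wanted: Iterable[str]) -> Dict[str, Any]:
    """Keep only the graphs whose names are in `wanted`, using a single pass."""
    wanted_set: Set[str] = set(wanted)
    selected: Dict[str, Any] = {}
    for name, g in pairs:
        if name in wanted_set:
            selected[name] = g
            if len(selected) == len(wanted_set):
                # All requested graphs found; stop consuming the iterator early
                break
    return selected
```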

Parameters:
  • dataset (str) – Name of the dataset

  • graph_name (str) – Name of the graph in the dataset

  • adapt_attributes (bool) – If True, “weight”, “x” and “y” are converted to float (if present), and a “pos” attribute is calculated from “x” and “y”.
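The conversion described for adapt_attributes can be pictured with the following sketch of the documented behavior. It is not the module's code, and it rests on assumptions: that the raw attributes arrive as strings, that “weight” is an edge attribute while “x” and “y” are node attributes, and that “pos” is stored as an (x, y) tuple. adapt_node_attrs and adapt_edge_attrs are hypothetical names.

```python
from typing import Any, MutableMapping


def adapt_node_attrs(attrs: MutableMapping[str, Any]) -> None:
    """Convert "x"/"y" to float in place and derive "pos" when both exist."""
    for key in ("x", "y"):
        if key in attrs:
            attrs[key] = float(attrs[key])
    if "x" in attrs and "y" in attrs:
        # Assumption: "pos" is an (x, y) tuple, the form NetworkX drawing code accepts
        attrs["pos"] = (attrs["x"], attrs["y"])


def adapt_edge_attrs(attrs: MutableMapping[str, Any]) -> None:
    """Convert "weight" to float in place, if present (assumed to be an edge attribute)."""
    if "weight" in attrs:
        attrs["weight"] = float(attrs["weight"])
```

With a NetworkX graph G, the same helpers could be applied via for _, d in G.nodes(data=True): adapt_node_attrs(d) and for _, _, d in G.edges(data=True): adapt_edge_attrs(d), since those views yield the graph's own attribute dictionaries.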

Returns:

The graph in question as a networkX graph

Return type:

nx.Graph

Raises:

ValueError – if the given graph is not available in the dataset
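Since a missing graph is reported as a ValueError, a caller can fall back gracefully. A minimal sketch, assuming fetch is get_specific_graph (passed in as a parameter so the helper can be exercised without network access); try_get_graph is a hypothetical helper, not part of this module.

```python
from typing import Any, Callable, Optional


def try_get_graph(
    fetch: Callable[[str, str], Any], dataset: str, graph_name: str
) -> Optional[Any]:
    """Return the graph, or None if `fetch` reports it as unavailable."""
    try:
        return fetch(dataset, graph_name)
    except ValueError:
        # Documented behavior: ValueError means the graph is not in the dataset
        return None
```

A call would look like g = try_get_graph(get_specific_graph, dataset, graph_name), followed by a check for None.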

datasets.iterate_dataset(dataset: str, adapt_attributes: bool = True) → Iterator[Tuple[str, Graph]]

Returns an iterator over all graphs in the specified dataset.

Example:

>>> for name, g in iterate_dataset('subways'):
...     print(f"'{name}' has {g.order()} vertices and {len(g.edges())} edges")

To obtain a list of all available dataset names, call get_available_datasets().

Parameters:
  • dataset (str) – Name of the dataset

  • adapt_attributes (bool) – If True, “weight”, “x” and “y” are converted to float (if present), and a “pos” attribute is calculated from “x” and “y”.

Returns:

Iterator over (name, graph) pairs for all graphs in the dataset

Return type:

Iterator[Tuple[str, Graph]]
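Because the return value is an iterator, it can be consumed partially or materialized in full. A minimal sketch, assuming pairs is the iterator returned by iterate_dataset(); first_graphs and as_dict are hypothetical helpers. Whether taking only the first few items also avoids downloading the rest depends on how the module fetches graphs, which this page does not state.

```python
from itertools import islice
from typing import Any, Dict, Iterator, List, Tuple


def first_graphs(pairs: Iterator[Tuple[str, Any]], k: int) -> List[Tuple[str, Any]]:
    """Take at most k (name, graph) pairs, leaving the remainder unconsumed."""
    return list(islice(pairs, k))


def as_dict(pairs: Iterator[Tuple[str, Any]]) -> Dict[str, Any]:
    """Materialize the remaining pairs into a name -> graph mapping.

    Note that this keeps every graph in memory at once.
    """
    return dict(pairs)
```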

Bibliography