The extraction module

Functions necessary for the extraction of graph features.

hcga.extraction._load_feature_class(feature_name)

Load the feature class from its feature name.
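Loading a class from its name is commonly done with Python's standard `importlib`. The sketch below assumes each feature lives in its own module and is looked up by module path and class name. `load_class` and its arguments are hypothetical and not part of hcga's API; the real helper may locate the class differently.

```python
from importlib import import_module


def load_class(module_path, class_name):
    """Import ``module_path`` and return its attribute ``class_name``.

    Hypothetical helper illustrating dynamic class loading (not hcga's API).
    """
    module = import_module(module_path)  # raises ModuleNotFoundError if absent
    return getattr(module, class_name)  # raises AttributeError if absent


# A standard-library class stands in for a feature class here.
ordered_dict_cls = load_class("collections", "OrderedDict")
print(ordered_dict_cls.__name__)  # OrderedDict
```

Loading by name lets new feature modules be added to a package without editing a central registry.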

hcga.extraction._print_runtimes(all_features_df)

Print sorted runtimes.
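A minimal sketch of printing sorted runtimes, assuming the runtimes are available as a mapping from feature name to seconds (hcga itself works from a dataframe). Sorting from slowest to fastest is an assumed choice, and `print_sorted_runtimes` is a hypothetical name.

```python
def print_sorted_runtimes(runtimes):
    """Print feature runtimes from slowest to fastest.

    ``runtimes`` is assumed to map a feature name to its runtime in seconds.
    Returns the sorted (name, seconds) pairs so they can be reused.
    """
    ordered = sorted(runtimes.items(), key=lambda item: item[1], reverse=True)
    total = sum(runtimes.values())
    for name, seconds in ordered:
        # Share of the total runtime, guarding against a zero total.
        share = 100 * seconds / total if total > 0 else 0.0
        print(f"{name:<20} {seconds:8.3f} s  ({share:5.1f}%)")
    return ordered


# Hypothetical runtimes for three feature classes.
example = {"centrality": 1.2, "basic_stats": 0.05, "spectrum": 3.4}
ordered = print_sorted_runtimes(example)
```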

hcga.extraction._set_graph_labels(features, graphs)

Add the graph labels to the features dataframe.
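The idea of attaching labels to feature rows can be sketched without pandas, assuming features are stored as a mapping from graph id to a row of feature values. The `Graph` record, its `graph_id` and `label` fields, and `set_graph_labels` are hypothetical stand-ins for hcga's graph objects and dataframe.

```python
from dataclasses import dataclass


@dataclass
class Graph:
    """Hypothetical stand-in for a graph carrying an id and a label."""
    graph_id: int
    label: int


def set_graph_labels(features, graphs):
    """Add a 'label' entry to each graph's feature row.

    ``features`` is assumed to map graph id -> {feature name: value}.
    Graphs without a feature row are skipped.
    """
    for graph in graphs:
        if graph.graph_id in features:
            features[graph.graph_id]["label"] = graph.label
    return features


rows = {0: {"n_nodes": 5}, 1: {"n_nodes": 8}}
labelled = set_graph_labels(rows, [Graph(0, 1), Graph(1, 0)])
```

Keying rows by graph id keeps labels aligned with features even if graphs were processed out of order.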

hcga.extraction.compute_all_features(graphs, list_feature_classes, n_workers=1, with_runtimes=False)

Compute features for all graphs in the collection.

Parameters
  • graphs (GraphCollection object) – GraphCollection object with loaded graphs (see graph.py)

  • list_feature_classes (list) – list of feature classes found in ./features

  • n_workers (int) – number of workers for parallel processing

  • with_runtimes (bool) – compute the run time of each feature

Returns

dataframe of calculated features for the graph collection.

Return type

(DataFrame)
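One common way to honour an `n_workers` argument is to map a per-graph function over the collection with `multiprocessing.Pool`. This is a sketch of the pattern, not hcga's implementation: graphs are assumed to be undirected adjacency dicts, and `count_nodes_and_edges` and `compute_for_all` are hypothetical helpers.

```python
from multiprocessing import Pool


def count_nodes_and_edges(adjacency):
    """Toy per-graph 'feature extraction' on an undirected adjacency dict."""
    n_nodes = len(adjacency)
    # Each undirected edge appears in two neighbour lists.
    n_edges = sum(len(neighbours) for neighbours in adjacency.values()) // 2
    return {"n_nodes": n_nodes, "n_edges": n_edges}


def compute_for_all(graphs, extract_one, n_workers=1):
    """Apply ``extract_one`` to every graph, optionally in parallel.

    With n_workers == 1 everything runs in the current process; otherwise a
    pool of ``n_workers`` processes is used. Output order matches ``graphs``.
    """
    if n_workers == 1:
        return [extract_one(graph) for graph in graphs]
    with Pool(processes=n_workers) as pool:
        return pool.map(extract_one, graphs)


triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path = {0: [1], 1: [0, 2], 2: [1]}
results = compute_for_all([triangle, path], count_nodes_and_edges, n_workers=1)
```

With more than one worker, `extract_one` must be a picklable module-level function, and on platforms that spawn processes the call should sit under an `if __name__ == "__main__":` guard.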

hcga.extraction.extract(graphs, n_workers, mode='fast', normalize_features=True, statistics_level='basic', with_runtimes=False, with_node_features=True, timeout=10, connected=False, weighted=True)

Main function to extract graph features.

Parameters
  • graphs (GraphCollection object) – GraphCollection object with loaded graphs (see graph.py)

  • n_workers (int) – number of workers for parallel processing

  • mode (str) – one of ‘fast’, ‘medium’ or ‘slow’; with ‘fast’, only features that are fast to compute are run

  • normalize_features (bool) – normalise features by number of nodes and number of edges

  • statistics_level (str) – ‘basic’ or ‘advanced’; the level of summary statistics computed for features that return distributions

  • with_runtimes (bool) – compute the run time of each feature

  • with_node_features (bool) – include node features in feature extraction

  • timeout (int) – number of seconds after which the calculation of a feature is cancelled

  • connected (bool) – if True, only the largest connected component of each graph is used for feature extraction

  • weighted (bool) – calculations will consider edge weights where possible.

Returns

dataframe of computed features (DataFrame) and dataframe of meta information about the computed features (DataFrame).

Return type

(DataFrame)
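With `connected=True`, only the largest connected component enters feature extraction. A graph library such as NetworkX provides this directly (for example via `networkx.connected_components`); the stdlib-only sketch below, assuming an undirected graph stored as an adjacency dict, shows the breadth-first search behind it. `largest_connected_component` and `restrict_to` are hypothetical names.

```python
from collections import deque


def largest_connected_component(adjacency):
    """Return the node set of the largest connected component.

    ``adjacency`` maps each node to its neighbours (undirected graph).
    Ties between equally large components go to the first one found.
    """
    seen = set()
    best = set()
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects one whole component.
        component = {start}
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    component.add(neighbour)
                    queue.append(neighbour)
        if len(component) > len(best):
            best = component
    return best


def restrict_to(adjacency, nodes):
    """Induced subgraph on ``nodes`` as a new adjacency dict."""
    return {n: [m for m in adjacency[n] if m in nodes] for n in adjacency if n in nodes}


# Two components: a triangle {0, 1, 2} and a single edge {3, 4}.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4], 4: [3]}
lcc = restrict_to(graph, largest_connected_component(graph))
```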

hcga.extraction.feature_extraction(graph, list_feature_classes, with_runtimes=False)

Extract features for a single graph.

Parameters
  • graph (Graph object) – Graph object (see graph.py)

  • list_feature_classes (list) – list of feature classes found in ./features

  • with_runtimes (bool) – compute the run time of each feature

Returns

dataframe of calculated features for a given graph.

Return type

(DataFrame)
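Per-graph extraction with optional timing can be sketched as a loop over feature callables, timing each with `time.perf_counter`. This assumes feature classes can be modelled as functions returning a dict of named values; `extract_single_graph`, the `class.feature` naming scheme, and the toy features `basic` and `degrees` are hypothetical.

```python
import time


def extract_single_graph(graph, feature_functions, with_runtimes=False):
    """Run every feature function on one graph and merge the results.

    ``feature_functions`` maps a hypothetical feature-class name to a callable
    returning {feature name: value}. If ``with_runtimes`` is True, a second
    dict with the wall-clock seconds spent in each callable is also returned.
    """
    features = {}
    runtimes = {}
    for name, compute in feature_functions.items():
        start = time.perf_counter()
        for feature_name, value in compute(graph).items():
            # Prefix with the class name so features from different classes cannot collide.
            features[f"{name}.{feature_name}"] = value
        runtimes[name] = time.perf_counter() - start
    if with_runtimes:
        return features, runtimes
    return features


def basic(adjacency):
    return {"n_nodes": len(adjacency)}


def degrees(adjacency):
    degs = [len(neighbours) for neighbours in adjacency.values()]
    return {"max_degree": max(degs), "mean_degree": sum(degs) / len(degs)}


star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
feats, times = extract_single_graph(star, {"basic": basic, "degrees": degrees}, with_runtimes=True)
```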

hcga.extraction.get_list_feature_classes(mode='fast', normalize_features=True, statistics_level='basic', n_node_features=0, timeout=10)

Generates and returns the list of feature classes to compute for a given mode.

Parameters
  • mode (str) – one of ‘fast’, ‘medium’ or ‘slow’; with ‘fast’, only features that are fast to compute are run

  • normalize_features (bool) – normalise features by number of nodes and number of edges

  • statistics_level (str) – ‘basic’ or ‘advanced’; the level of summary statistics computed for features that return distributions

  • n_node_features (int) – dimension of node features for feature constructors

  • timeout (int) – number of seconds after which the calculation of a feature is cancelled

Returns

list of feature class instances (list) and dataframe with feature information (DataFrame).

Return type

(list)
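Selecting feature classes by `mode` can be sketched as filtering a registry by cost tier. This assumes the tiers are cumulative (‘medium’ also runs fast features, ‘slow’ runs everything), which the description above implies but does not spell out; `FeatureSpec`, `MODES`, and `select_features` are hypothetical.

```python
from dataclasses import dataclass

# Speed tiers ordered from cheapest to most expensive (assumed ordering).
MODES = ("fast", "medium", "slow")


@dataclass
class FeatureSpec:
    """Hypothetical record describing one feature class and its cost tier."""
    name: str
    speed: str  # one of MODES


def select_features(specs, mode="fast"):
    """Keep the feature classes whose tier is no slower than ``mode``."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}, got {mode!r}")
    limit = MODES.index(mode)
    return [spec for spec in specs if MODES.index(spec.speed) <= limit]


registry = [
    FeatureSpec("basic_stats", "fast"),
    FeatureSpec("centrality", "medium"),
    FeatureSpec("spectrum", "slow"),
]
fast_only = [spec.name for spec in select_features(registry, "fast")]
medium = [spec.name for spec in select_features(registry, "medium")]
```

Validating `mode` up front turns a typo such as ‘fats’ into an immediate error instead of an empty feature set.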