The extraction module

Functions necessary for the extraction of graph features.

hcga.extraction._load_feature_class(feature_name): Load the feature class corresponding to a feature name.
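`_load_feature_class` resolves a feature name to a class at run time. Below is a minimal sketch of that pattern with `importlib`. It assumes each feature lives in its own module inside a package and that the class name is known. `load_feature_class` and its arguments are hypothetical and are not hcga's code. The usage lines borrow standard-library modules so the sketch runs on its own:

```python
import importlib


def load_feature_class(package, feature_name, class_name):
    """Import ``package.feature_name`` and return its attribute ``class_name``.

    Hypothetical helper: the package layout and how the class name is chosen
    are assumptions for illustration, not hcga's actual convention.
    """
    module = importlib.import_module(f"{package}.{feature_name}")
    return getattr(module, class_name)


# Standard-library modules stand in for feature modules here.
decoder_cls = load_feature_class("json", "decoder", "JSONDecoder")
mapping_cls = load_feature_class("collections", "abc", "Mapping")
print(decoder_cls.__name__)  # prints JSONDecoder
```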

hcga.extraction._print_runtimes(all_features_df): Print the feature runtimes in sorted order.
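Printing sorted runtimes comes down to ordering feature names by elapsed time. The sketch below assumes the runtimes are already in a plain `{feature_name: seconds}` dict, whereas the real helper receives a dataframe. It prints the slowest features first, an arbitrary but convenient choice for spotting bottlenecks. `print_runtimes` is a hypothetical name:

```python
def print_runtimes(runtimes):
    """Print feature runtimes from slowest to fastest and return the order.

    ``runtimes`` is assumed to map a feature name to its runtime in seconds.
    """
    ordered = sorted(runtimes.items(), key=lambda item: item[1], reverse=True)
    for name, seconds in ordered:
        print(f"{name}: {seconds:.3f} s")
    return ordered


ordered = print_runtimes({"clustering": 0.120, "degree": 0.004, "spectrum": 1.500})
# prints:
# spectrum: 1.500 s
# clustering: 0.120 s
# degree: 0.004 s
```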

hcga.extraction._set_graph_labels(features, graphs): Set the graph labels in the features dataframe.

hcga.extraction.compute_all_features(graphs, list_feature_classes, n_workers=1, with_runtimes=False)

Compute features for all graphs in the collection.

Parameters

graphs (GraphCollection object) – GraphCollection object with loaded graphs (see graph.py)
list_feature_classes (list) – list of feature classes found in ./features
n_workers (int) – number of workers for parallel processing
with_runtimes (bool) – compute the run time of each feature

Returns

dataframe of calculated features for the graph collection.

Return type

(DataFrame)
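The per-graph parallelism behind `compute_all_features` can be sketched with `concurrent.futures`. This illustrates the pattern under stated assumptions and is not hcga's implementation: graphs are plain adjacency dicts, the single "feature" is a hypothetical node and edge count, and the result is a list of dicts rather than a DataFrame. Threads keep the sketch simple. For CPU-bound feature code, a `ProcessPoolExecutor` (or `multiprocessing.Pool`) would normally be the better choice:

```python
from concurrent.futures import ThreadPoolExecutor


def extract_single(graph):
    """Hypothetical per-graph extractor returning node and edge counts.

    ``graph`` maps each node to the set of its neighbours (undirected).
    """
    n_nodes = len(graph)
    # Each undirected edge appears in two neighbour sets.
    n_edges = sum(len(nbrs) for nbrs in graph.values()) // 2
    return {"n_nodes": n_nodes, "n_edges": n_edges}


def compute_all(graphs, n_workers=1):
    """Compute one feature row per graph, in parallel when n_workers > 1."""
    if n_workers == 1:
        return [extract_single(g) for g in graphs]
    # Executor.map preserves input order, so row i still belongs to graph i.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(extract_single, graphs))


triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
path = {0: {1}, 1: {0, 2}, 2: {1}}
rows = compute_all([triangle, path], n_workers=2)
print(rows)  # prints [{'n_nodes': 3, 'n_edges': 3}, {'n_nodes': 3, 'n_edges': 2}]
```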

hcga.extraction.extract(graphs, n_workers, mode='fast', normalize_features=True, statistics_level='basic', with_runtimes=False, with_node_features=True, timeout=10, connected=False, weighted=True)

Main function to extract graph features.

Parameters

graphs (GraphCollection object) – GraphCollection object with loaded graphs (see graph.py)
n_workers (int) – number of workers for parallel processing
mode (str) – ‘fast’, ‘medium’ or ‘slow’; with ‘fast’, only features that are fast to compute are run
normalize_features (bool) – normalise features by the number of nodes and the number of edges
statistics_level (str) – ‘basic’ or ‘advanced’; the level of summary statistics computed for features that return distributions
with_runtimes (bool) – compute the run time of each feature
with_node_features (bool) – include node features in feature extraction
timeout (int) – number of seconds before the calculation for a feature is cancelled
connected (bool) – True will make sure that only the largest connected component of a graph is used for feature extraction.
weighted (bool) – calculations will consider edge weights where possible.
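One plausible reading of `normalize_features=True` is that each raw feature is also reported divided by the graph's node count and by its edge count, so that graphs of different sizes become comparable. The sketch below assumes that reading. The function name, the `_N`/`_E` suffixes, and the NaN fallback for empty graphs are hypothetical choices, not hcga's:

```python
def normalize_features(features, n_nodes, n_edges):
    """Add node- and edge-normalised copies of each raw feature.

    Keeping the raw values and the ``_N`` / ``_E`` suffixes are assumptions.
    """
    normalized = dict(features)
    for name, value in features.items():
        # Guard against division by zero for empty or edgeless graphs.
        normalized[f"{name}_N"] = value / n_nodes if n_nodes else float("nan")
        normalized[f"{name}_E"] = value / n_edges if n_edges else float("nan")
    return normalized


row = normalize_features({"n_triangles": 4.0}, n_nodes=8, n_edges=10)
print(row)  # prints {'n_triangles': 4.0, 'n_triangles_N': 0.5, 'n_triangles_E': 0.4}
```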

Returns

dataframe of features (DataFrame): dataframe of meta information of computed features.

Return type

(DataFrame)
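With `connected=True`, only the largest connected component of each graph is used. The idea can be sketched in plain Python with a breadth-first search over an adjacency dict. The adjacency-dict representation and the function names are assumptions made for illustration. hcga works with its own graph objects:

```python
from collections import deque


def largest_connected_component(adj):
    """Return the node set of the largest connected component.

    ``adj`` maps every node to an iterable of neighbours (undirected graph).
    """
    seen = set()
    best = set()
    for start in adj:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from ``start``.
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in component:
                    component.add(nbr)
                    queue.append(nbr)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


def restrict(adj, nodes):
    """Return the subgraph of ``adj`` induced by ``nodes``."""
    return {n: {m for m in adj[n] if m in nodes} for n in nodes}


graph = {0: {1}, 1: {0, 2}, 2: {1}, 3: {4}, 4: {3}}
lcc = restrict(graph, largest_connected_component(graph))
print(sorted(lcc))  # prints [0, 1, 2]
```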

hcga.extraction.feature_extraction(graph, list_feature_classes, with_runtimes=False)

Extract features for a single graph.

Parameters

graph (Graph object) – Graph object (see graph.py)
list_feature_classes (list) – list of feature classes found in ./features
with_runtimes (bool) – compute the run time of each feature

Returns

dataframe of calculated features for a given graph.

Return type

(DataFrame)
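The single-graph loop with optional timing can be sketched as follows. This is a simplified stand-in, not hcga's code. Each feature is a hypothetical plain function keyed by name, whereas hcga uses feature classes. The graph is an adjacency dict, and timings use `time.perf_counter`:

```python
import time


def extract_one_graph(graph, feature_functions, with_runtimes=False):
    """Apply each feature function to one graph, optionally timing each call.

    ``feature_functions`` maps a feature name to a callable taking the graph.
    """
    features = {}
    runtimes = {}
    for name, func in feature_functions.items():
        start = time.perf_counter()
        features[name] = func(graph)
        if with_runtimes:
            # Wall-clock seconds spent computing this single feature.
            runtimes[name] = time.perf_counter() - start
    if with_runtimes:
        return features, runtimes
    return features


star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
funcs = {
    "n_nodes": len,
    "max_degree": lambda g: max(len(nbrs) for nbrs in g.values()),
}
values, timings = extract_one_graph(star, funcs, with_runtimes=True)
print(values)  # prints {'n_nodes': 4, 'max_degree': 3}
```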

hcga.extraction.get_list_feature_classes(mode='fast', normalize_features=True, statistics_level='basic', n_node_features=0, timeout=10)

Generates and returns the list of feature classes to compute for a given mode.

Parameters

mode (str) – ‘fast’, ‘medium’ or ‘slow’; with ‘fast’, only features that are fast to compute are run
normalize_features (bool) – normalise features by the number of nodes and the number of edges
statistics_level (str) – ‘basic’ or ‘advanced’; the level of summary statistics computed for features that return distributions
n_node_features (int) – dimension of node features for feature constructors
timeout (int) – number of seconds before the calculation for a feature is cancelled
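A feature that returns a distribution (for example, one value per node) has to be reduced to scalars before it can be used as a column. The sketch below uses Python's `statistics` module. The split of statistics between ‘basic’ and ‘advanced’ is an assumption for illustration, not hcga's actual list, and `summarize` is a hypothetical name:

```python
import statistics


def summarize(values, statistics_level="basic"):
    """Reduce a distribution-valued feature to scalar summary statistics.

    Which statistics belong to each level is assumed, not taken from hcga.
    """
    if statistics_level not in ("basic", "advanced"):
        raise ValueError(f"unknown statistics_level: {statistics_level!r}")
    summary = {
        "mean": statistics.fmean(values),
        "min": min(values),
        "max": max(values),
    }
    if statistics_level == "advanced":
        summary["median"] = statistics.median(values)
        summary["std"] = statistics.pstdev(values)  # population std deviation
    return summary


degrees = [1, 1, 2, 2, 4]
basic = summarize(degrees)
advanced = summarize(degrees, statistics_level="advanced")
print(basic)  # prints {'mean': 2.0, 'min': 1, 'max': 4}
```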

Returns

list of feature class instances (DataFrame): dataframe with feature information

Return type

(list)
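Selecting feature classes for a mode can be sketched as a filter over declared modes. The sketch assumes that each feature declares the modes it may run in and that a cheap feature is also listed under every slower mode. Under that assumption, ‘fast’ selects the smallest set and ‘slow’ the largest. `FeatureSpec` and `select_features` are hypothetical stand-ins for hcga's feature classes:

```python
from dataclasses import dataclass, field


@dataclass
class FeatureSpec:
    """Hypothetical stand-in for a feature class and the modes it runs in."""

    name: str
    modes: list = field(default_factory=list)


def select_features(available, mode="fast"):
    """Keep the features whose declared modes include ``mode``."""
    if mode not in ("fast", "medium", "slow"):
        raise ValueError(f"unknown mode: {mode!r}")
    return [spec for spec in available if mode in spec.modes]


available = [
    FeatureSpec("degree", ["fast", "medium", "slow"]),
    FeatureSpec("clustering", ["medium", "slow"]),
    FeatureSpec("graph_spectrum", ["slow"]),
]
print([s.name for s in select_features(available, "fast")])  # prints ['degree']
print([s.name for s in select_features(available, "medium")])  # prints ['degree', 'clustering']
```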
