The extraction module
Functions necessary for the extraction of graph features.
-
hcga.extraction._load_feature_class(feature_name)
Load the feature class for a given feature name.
-
hcga.extraction._set_graph_labels(features, graphs)
Set the graph labels in the features dataframe.
-
hcga.extraction.compute_all_features(graphs, list_feature_classes, n_workers=1, with_runtimes=False)
Compute features for all graphs.
- Parameters
graphs (GraphCollection object) – GraphCollection object with loaded graphs (see graph.py)
list_feature_classes (list) – list of feature classes found in ./features
n_workers (int) – number of workers for parallel processing
with_runtimes (bool) – compute the run time of each feature
- Returns
dataframe of calculated features for the graph collection.
- Return type
(DataFrame)
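The role of n_workers can be illustrated with a minimal, standard-library sketch. It is not hcga's implementation: the adjacency-dict graphs, the count_edges feature and the compute_all helper are hypothetical stand-ins, and threads are used here only to keep the example portable (the library may well use processes instead).

```python
from concurrent.futures import ThreadPoolExecutor


def count_edges(graph):
    """Hypothetical feature: number of undirected edges in an adjacency dict."""
    return sum(len(neighbours) for neighbours in graph.values()) // 2


def compute_all(graphs, feature_fn, n_workers=1):
    """Apply feature_fn to every graph, optionally with n_workers threads."""
    if n_workers <= 1:
        # Serial fallback: no pool is created for a single worker.
        return [feature_fn(graph) for graph in graphs]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # Executor.map returns results in the same order as the inputs.
        return list(pool.map(feature_fn, graphs))


# Two toy graphs stored as adjacency dicts (an assumption of this sketch).
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
path = {0: [1], 1: [0, 2], 2: [1]}

results = compute_all([triangle, path], count_edges, n_workers=2)
print(results)
```

Because the results keep the input order, each row of a resulting table could be matched back to its graph without extra bookkeeping.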
-
hcga.extraction.extract(graphs, n_workers, mode='fast', normalize_features=True, statistics_level='basic', with_runtimes=False, with_node_features=True, timeout=10, connected=False, weighted=True)
Main function to extract graph features.
- Parameters
graphs (GraphCollection object) – GraphCollection object with loaded graphs (see graph.py)
n_workers (int) – number of workers for parallel processing
mode (str) – one of ‘fast’, ‘medium’ or ‘slow’; with ‘fast’, only features that are fast to compute are run
normalize_features (bool) – normalise features by number of nodes and number of edges
statistics_level (str) – ‘basic’ or ‘advanced’; the level of summary statistics computed for features that return distributions
with_runtimes (bool) – compute the run time of each feature
with_node_features (bool) – include node features in feature extraction
timeout (int) – number of seconds before the calculation for a feature is cancelled
connected (bool) – if True, only the largest connected component of each graph is used for feature extraction
weighted (bool) – if True, calculations consider edge weights where possible
- Returns
dataframe of features, together with a dataframe (DataFrame) of meta information about the computed features.
- Return type
(DataFrame)
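The connected and normalize_features options can be sketched in plain Python. This is one plausible reading of the parameter descriptions above, not hcga's code: the adjacency-dict representation and the helper names largest_connected_component and normalised_variants are hypothetical.

```python
from collections import deque


def largest_connected_component(adjacency):
    """Return the node set of the largest connected component, found by BFS."""
    seen = set()
    best = set()
    for start in adjacency:
        if start in seen:
            continue
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in component:
                    component.add(neighbour)
                    queue.append(neighbour)
        seen |= component
        if len(component) > len(best):
            best = component
    return best


def normalised_variants(value, n_nodes, n_edges):
    """Return the raw value plus versions divided by node and edge counts."""
    return {
        "raw": value,
        "per_node": value / n_nodes,
        "per_edge": value / n_edges,
    }


# A triangle (0, 1, 2) plus a separate edge (3, 4).
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4], 4: [3]}

component = largest_connected_component(graph)
# Restrict the graph to the largest component, as connected=True suggests.
subgraph = {n: [m for m in graph[n] if m in component] for n in component}

n_edges = sum(len(v) for v in subgraph.values()) // 2
total_degree = sum(len(v) for v in subgraph.values())
variants = normalised_variants(total_degree, len(subgraph), n_edges)
print(sorted(component), variants)
```

Dividing by node and edge counts makes a feature value comparable across graphs of different sizes, which is presumably the motivation for normalize_features.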
-
hcga.extraction.feature_extraction(graph, list_feature_classes, with_runtimes=False)
Extract features for a single graph.
- Parameters
graph (Graph object) – Graph object (see graph.py)
list_feature_classes (list) – list of feature classes found in ./features
with_runtimes (bool) – compute the run time of each feature
- Returns
dataframe of calculated features for a given graph.
- Return type
(DataFrame)
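As a rough illustration of what with_runtimes implies for a single graph, the sketch below times each feature function as it runs. The feature functions n_nodes and max_degree, the extract_single helper and the dict-based return values are hypothetical; the real function works with feature classes and returns a DataFrame.

```python
import time


def n_nodes(graph):
    """Hypothetical feature: number of nodes."""
    return len(graph)


def max_degree(graph):
    """Hypothetical feature: largest node degree."""
    return max(len(neighbours) for neighbours in graph.values())


def extract_single(graph, feature_fns, with_runtimes=False):
    """Evaluate each feature on one graph, optionally recording run times."""
    features = {}
    runtimes = {}
    for fn in feature_fns:
        start = time.perf_counter()
        features[fn.__name__] = fn(graph)
        # Elapsed wall-clock time in seconds for this feature only.
        runtimes[fn.__name__] = time.perf_counter() - start
    if with_runtimes:
        return features, runtimes
    return features


# A star graph: node 0 is connected to nodes 1, 2 and 3.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

features, runtimes = extract_single(star, [n_nodes, max_degree], with_runtimes=True)
print(features)
```

Per-feature run times of this kind are useful for deciding which features belong in a faster or slower mode.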
-
hcga.extraction.get_list_feature_classes(mode='fast', normalize_features=True, statistics_level='basic', n_node_features=0, timeout=10)
Generates and returns the list of feature classes to compute for a given mode.
- Parameters
mode (str) – one of ‘fast’, ‘medium’ or ‘slow’; with ‘fast’, only features that are fast to compute are run
normalize_features (bool) – normalise features by number of nodes and number of edges
statistics_level (str) – ‘basic’ or ‘advanced’; the level of summary statistics computed for features that return distributions
n_node_features (int) – dimension of node features for feature constructors
timeout (int) – number of seconds before the calculation for a feature is cancelled
- Returns
list of feature class instances, together with a dataframe (DataFrame) of feature information
- Return type
(list)
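The mode and statistics_level parameters can be pictured with a small sketch. Everything in it is an assumption made for illustration: the three feature classes, the rule that a slower mode also includes the faster features, and the particular statistics chosen for each level are hypothetical and may differ from what hcga does.

```python
import statistics

# Assumed ordering: a slower mode also includes all faster features.
MODE_RANK = {"fast": 0, "medium": 1, "slow": 2}


class DegreeFeature:
    mode = "fast"


class CommunityFeature:
    mode = "medium"


class CentralityFeature:
    mode = "slow"


ALL_CLASSES = [DegreeFeature, CommunityFeature, CentralityFeature]


def select_feature_classes(mode="fast"):
    """Keep the classes whose cost does not exceed the requested mode."""
    limit = MODE_RANK[mode]
    return [cls for cls in ALL_CLASSES if MODE_RANK[cls.mode] <= limit]


def summarise(values, statistics_level="basic"):
    """Reduce a distribution (e.g. node degrees) to scalar features."""
    summary = {
        "mean": statistics.mean(values),
        "std": statistics.pstdev(values),  # population standard deviation
    }
    if statistics_level == "advanced":
        summary["median"] = statistics.median(values)
        summary["min"] = min(values)
        summary["max"] = max(values)
    return summary


fast_names = [cls.__name__ for cls in select_feature_classes("fast")]
degree_summary = summarise([1, 2, 2, 3], statistics_level="advanced")
print(fast_names, degree_summary)
```

Summarising a distribution into a fixed set of scalars is what allows graphs of different sizes to share the same feature columns.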