The cli app¶

Hcga app module.

Users can interact with hcga directly from the command line using the purpose-built command line interface app.

Users who prefer to work with hcga from Python directly (e.g. in a notebook) should use the hcga class instead.

Below is a short example of the commands necessary to run the ENZYMES dataset directly from the command line:

hcga get_data ENZYMES

hcga extract_features datasets/ENZYMES.pkl -m fast -n -1 --timeout 10

hcga feature_analysis ENZYMES

Alternatively, these commands can be bundled into a single bash file; see ‘run_example.sh’ in the examples folder.
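As a rough illustration of what such a bundled script could look like, the sketch below chains the three commands for one dataset. It is not the contents of run_example.sh. The `run` helper and the `DRY_RUN` variable are hypothetical conveniences of this sketch (they are not hcga options), and the `datasets/<name>.pkl` path assumes the default get_data folder, as in the example above.

```shell
#!/usr/bin/env bash
# Sketch of a bundled pipeline; an illustration, not run_example.sh itself.
set -eu

# Assumption: DRY_RUN is specific to this sketch. With DRY_RUN=1 (the default
# here) each command is printed instead of executed; set DRY_RUN=0 to run hcga.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Run the three steps for the dataset name given as $1.
hcga_pipeline() {
    run hcga get_data "$1"
    # Assumes get_data wrote datasets/<name>.pkl (its default folder).
    run hcga extract_features "datasets/$1.pkl" -m fast -n -1 --timeout 10
    run hcga feature_analysis "$1"
}

hcga_pipeline ENZYMES
```

Saved as a file, this prints the three commands by default; calling it with `DRY_RUN=0` in the environment would execute them in order and stop at the first failure because of `set -e`.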

cli¶

Cli.

cli [OPTIONS] COMMAND [ARGS]...

Options

-v, --verbose¶

extract_features¶

Extract features from dataset of graphs and save the feature matrix, info and labels.

cli extract_features [OPTIONS] DATASET

Options

-rf, --results-folder <results_folder>¶

Location of results

Default: results

-n, --n-workers <n_workers>¶

Number of workers for multiprocessing

Default: 1

-m, --mode <mode>¶

Mode of features to extract (fast, medium, slow)

Default: fast

--norm, --no-norm¶

Normalise features by number of edges/nodes (by default not)

Default: True

--node-feat, --no-node-feat¶

Use node features if any.

Default: True

-sl, --stats-level <stats_level>¶

Level of statistical features (basic, medium, advanced)

Default: advanced

--timeout <timeout>¶

Timeout for feature evaluations.

Default: 10.0

-of, --output-file <output_file>¶

Location of results, by default same as initial dataset

--runtimes, --no-runtimes¶

Output runtimes

Default: False

--connected, --no-connected¶

Remove disconnected components

Default: False

Arguments

DATASET¶

Required argument
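To show how the options above combine, here is a small sketch that assembles one extract_features command per mode and prints it. Only flags listed in this section are used. The per-mode results folder names (`results_fast`, and so on), the function name `build_extract_cmd` and the chosen values are assumptions made for illustration, not hcga conventions.

```shell
#!/usr/bin/env bash
set -eu

# Dataset file as produced in the ENZYMES example.
dataset="datasets/ENZYMES.pkl"

# Print an extract_features command for the mode given as $1.
# Options used: -m mode, -n workers, -sl stats level, --timeout,
# --runtimes (output runtimes), -rf results folder.
build_extract_cmd() {
    set -- hcga extract_features "$dataset" \
        -m "$1" \
        -n 4 \
        -sl basic \
        --timeout 30 \
        --runtimes \
        -rf "results_$1"
    printf '%s\n' "$*"
}

for mode in fast medium slow; do
    build_extract_cmd "$mode"
done
```

Replacing the final `printf` line inside the function with `"$@"` would execute each command instead of printing it.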

feature_analysis¶

Analysis of the features extracted in feature_file.

cli feature_analysis [OPTIONS] DATASET

Options

-rf, --results-folder <results_folder>¶

Location of results

Default: ./results

-ff, --feature-file <feature_file>¶

Location of features

Default: all_features.pkl

--analysis-type <analysis_type>¶

Type of analysis: classification, regression or unsupervised.

Default: classification

--graph-removal <graph_removal>¶

Fraction of failed features above which a graph is removed from the dataset.

Default: 0.3

-i, --interpretability <interpretability>¶

Interpretability of features to consider

Default: 1

-m, --model <model>¶

Model for feature analysis (RF, LGBM, XG)

Default: XG

--kfold, --no-kfold¶

Use k-fold cross-validation

Default: True

--reduce-set, --no-reduce-set¶

Whether to recompute accuracies with a reduced set of top features.

Default: True

--reduced-set-size <reduced_set_size>¶

Number of uncorrelated top features to consider in top reduced feature classification.

Default: 100

--reduced-set-max-correlation <reduced_set_max_correlation>¶

Maximum correlation to allow for selection of top features for reduced feature classification.

Default: 0.9

-p, --plot, -np, --no-plot¶

Optionally plot analysis results

Default: True

--max-feats-plot <max_feats_plot>¶

Number of top features to plot with violins.

Default: 20

--n-splits <n_splits>¶

Number of splits for k-fold; None will use an automatic estimation.

--n-repeats <n_repeats>¶

Number of repeats of k-folds for better averaged accuracies.

Default: 1

Arguments

DATASET¶

Required argument
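The options above can be combined in a single call. The sketch below builds and prints one feature_analysis command for a given dataset and model, using only flags listed in this section. The function name `build_analysis_cmd` and the chosen values (5 splits, 3 repeats, 50 features, 0.8 correlation) are arbitrary assumptions for illustration.

```shell
#!/usr/bin/env bash
set -eu

# Print a feature_analysis command for dataset $1 and model $2 (RF, LGBM or XG).
# The values below are examples only; every flag comes from the option list above.
build_analysis_cmd() {
    set -- hcga feature_analysis "$1" \
        --analysis-type classification \
        -m "$2" \
        --n-splits 5 \
        --n-repeats 3 \
        --reduced-set-size 50 \
        --reduced-set-max-correlation 0.8 \
        --no-plot
    printf '%s\n' "$*"
}

build_analysis_cmd ENZYMES RF
```

As before, swapping the `printf` line for `"$@"` would run the command rather than print it.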

get_data¶

Generate the benchmark or test data.

DATASET_NAME can be either:

TESTDATA: to generate a synthetic dataset for testing
DD, ENZYMES, REDDIT-MULTI-12K, PROTEINS, MUTAG, or any other dataset hosted on https://ls11-www.cs.tu-dortmund.de/people/morris/graphkerneldatasets

cli get_data [OPTIONS] DATASET_NAME

Options

-f, --folder <folder>¶

Location to save dataset

Default: ./datasets

Arguments

DATASET_NAME¶

Required argument
