rhf package

Submodules

rhf.rhf module

Main module.

class rhf.rhf.Node[source]

Bases: object

Node object

class rhf.rhf.RHF(num_trees=100, max_height=5, split_criterion='kurtosis', check_duplicates=True)[source]

Bases: object

Random Histogram Forest. Builds an ensemble of Random Histogram Trees.

Parameters:
  • num_trees (int) – number of trees
  • max_height (int) – maximum height of each tree
  • split_criterion (str) – split criterion to use - ‘kurtosis’ or ‘random’
  • check_duplicates (bool) – check duplicates in each leaf
check_hash(data)[source]

Checks if there are duplicates in the dataset

Parameters: data – dataset
fit(data)[source]

Fit function: builds the ensemble and returns the scores

Parameters: data – the dataset to fit
Returns: scores – anomaly scores
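
The reference above does not say how the scores are computed. As a rough sketch only, assuming the information-content scoring usually described for Random Histogram Forests (a point's score is the sum, over trees, of the log of the dataset size divided by the size of the leaf the point falls in), the calculation could look like the following. The function name `anomaly_score` and its arguments are hypothetical and not part of `rhf.rhf`.

```python
import math


def anomaly_score(leaf_sizes, n_samples):
    """Sum over trees of log(n_samples / leaf_size).

    leaf_sizes: for one point, the size of the leaf it lands in, one entry per tree.
    Assumed scoring rule, not confirmed by this reference.
    """
    return sum(math.log(n_samples / size) for size in leaf_sizes)


# A point isolated in tiny leaves scores higher than one sitting in crowded leaves.
isolated = anomaly_score([1, 2, 1], n_samples=100)
crowded = anomaly_score([40, 55, 60], n_samples=100)
```

Under this assumption, larger scores indicate points that tend to end up in sparsely populated leaves.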
get_hash(data)[source]

Builds hash of data for duplicates identification

Parameters: data – dataset
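
To illustrate what hashing rows for duplicate identification can look like, here is a minimal pure-Python sketch. It is not the implementation behind `get_hash` or `check_hash`; the helper names `row_hashes`, `duplicate_counts` and `has_duplicates` are hypothetical, and the real methods may operate on arrays and use a different hash.

```python
from collections import Counter


def row_hashes(data):
    """Hash each row so that identical rows share the same hash value."""
    return [hash(tuple(row)) for row in data]


def duplicate_counts(data):
    """Map each row hash to the number of rows that carry it."""
    return Counter(row_hashes(data))


def has_duplicates(data):
    """Return True if at least one row hash occurs more than once."""
    return any(count > 1 for count in duplicate_counts(data).values())


data = [[1.0, 2.0], [3.0, 4.0], [1.0, 2.0]]
hashes = row_hashes(data)
```

Here the first and third rows are identical, so they receive the same hash and `has_duplicates(data)` reports a duplicate.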
class rhf.rhf.RandomHistogramTree(data=None, max_height=None, split_criterion='kurtosis')[source]

Bases: object

Random Histogram Tree object

Parameters:
  • max_height (int) – max height of the tree
  • split_criterion (str) – split criterion to use: ‘kurtosis’ or ‘random’
build(node, data)[source]

Function which recursively builds the tree

Parameters:
  • node – current node
  • data – data corresponding to current node
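
The recursion can be pictured with the following self-contained sketch. It is an illustration under stated assumptions, not the library's code: `SketchNode`, `pick_split`, `build_sketch` and `leaves` are hypothetical names, the extra `indexes`, `max_height` and `rng` arguments are not in the documented `build(node, data)` signature, and a uniform random split stands in for whichever split criterion is configured.

```python
import random


class SketchNode:
    """Illustrative node; not the rhf.rhf.Node class."""

    def __init__(self, depth, parent=None):
        self.depth = depth
        self.parent = parent
        self.left = None
        self.right = None
        self.feature_index = None
        self.feature_split = None
        self.indexes = None  # filled in only when the node becomes a leaf


def pick_split(rows, rng):
    """Pick a non-constant feature and a split value inside its range."""
    columns = list(zip(*rows))
    candidates = [i for i, col in enumerate(columns) if min(col) < max(col)]
    if not candidates:
        return None  # every feature is constant, nothing to split on
    i = rng.choice(candidates)
    return i, rng.uniform(min(columns[i]), max(columns[i]))


def build_sketch(node, data, indexes, max_height, rng):
    """Recursively split the rows in `indexes` until a stopping rule applies."""
    rows = [data[i] for i in indexes]
    can_split = node.depth < max_height and len(indexes) > 1
    split = pick_split(rows, rng) if can_split else None
    if split is None:
        node.indexes = list(indexes)  # turn the node into a leaf
        return
    node.feature_index, node.feature_split = split
    f, s = split
    left = [i for i in indexes if data[i][f] < s]
    right = [i for i in indexes if data[i][f] >= s]
    node.left = SketchNode(node.depth + 1, node)
    node.right = SketchNode(node.depth + 1, node)
    build_sketch(node.left, data, left, max_height, rng)
    build_sketch(node.right, data, right, max_height, rng)


def leaves(node):
    """Collect all leaves below (and including) `node`."""
    if node.indexes is not None:
        return [node]
    return leaves(node.left) + leaves(node.right)


data = [[0.1, 5.0], [0.4, 5.0], [0.35, 5.0], [0.8, 5.0],
        [0.9, 5.0], [0.2, 5.0], [0.75, 5.0], [3.0, 5.0]]
root = SketchNode(depth=0)
build_sketch(root, data, list(range(len(data))), max_height=3, rng=random.Random(1))
```

Every row index ends up in exactly one leaf, and no leaf sits deeper than `max_height`.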
build_tree(data)[source]

Build tree function: generates the root node, then builds the rest of the tree recursively

Parameters: data – the dataset
generate_node(depth=None, parent=None)[source]

Generates a new node

Parameters:
  • depth (int) – depth of the node
  • parent (Node) – parent node
set_leaf(node, data)[source]

Transforms generic node into leaf

Parameters:
  • node – generic node to transform into leaf
  • data – node data used to define node size and data indexes corresponding to node
class rhf.rhf.Root[source]

Bases: rhf.rhf.Node

Root node object

rhf.rhf.get_kurtosis_feature_split(data)[source]

Get attribute split according to Kurtosis Split

Parameters: data – the dataset of the node
Returns:
  • feature_index: the attribute index to split
  • feature_split: the attribute value to split
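
As a hedged illustration of a kurtosis-based split, the sketch below assumes that a feature is drawn with probability proportional to log(kurtosis + 1) and that the split value is then drawn uniformly between that feature's minimum and maximum. This reference does not confirm the rule; the names `kurtosis` and `kurtosis_feature_split` and the `rng` argument are hypothetical.

```python
import math
import random


def kurtosis(values):
    """Pearson kurtosis: fourth central moment divided by squared variance."""
    if min(values) == max(values):
        return 0.0  # constant feature: treated as carrying no weight
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 ** 2)


def kurtosis_feature_split(data, rng=random):
    """Return (feature_index, feature_split) for a list of equal-length rows."""
    columns = list(zip(*data))
    # Assumed weighting: log(kurtosis + 1), so constant features get weight 0.
    weights = [math.log(kurtosis(col) + 1.0) for col in columns]
    total = sum(weights)
    if total <= 0:
        raise ValueError("all features are constant; no split is possible")
    # Draw r in [0, total] and walk the cumulative weights until r is covered.
    r = rng.uniform(0.0, total)
    cumulative = 0.0
    feature_index = None
    for i, w in enumerate(weights):
        if w <= 0:
            continue
        cumulative += w
        feature_index = i  # doubles as a fallback against rounding at the end
        if r <= cumulative:
            break
    col = columns[feature_index]
    feature_split = rng.uniform(min(col), max(col))
    return feature_index, feature_split


data = [[1.0, 5.0], [2.0, 5.0], [3.0, 5.0], [10.0, 5.0]]
feature_index, feature_split = kurtosis_feature_split(data, random.Random(0))
```

In this small example the second feature is constant, so only the first feature can be chosen and the split value falls inside its observed range.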
rhf.rhf.get_random_feature_split(data)[source]

Get attribute split according to Random Split

Parameters: data – the dataset of the node
Returns:
  • feature_index: the attribute index to split
  • feature_split: the attribute value to split
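
For comparison, a random split can be sketched as picking a feature uniformly at random and then a split value uniformly within that feature's range. This is an assumption about the behaviour, not the library's code; in particular, skipping constant features is a choice made here for the illustration, and `random_feature_split_sketch` and its `rng` argument are hypothetical.

```python
import random


def random_feature_split_sketch(data, rng=random):
    """Return (feature_index, feature_split) for a list of equal-length rows."""
    columns = list(zip(*data))
    # Only features with more than one distinct value can separate the rows.
    candidates = [i for i, col in enumerate(columns) if min(col) < max(col)]
    if not candidates:
        raise ValueError("all features are constant; no split is possible")
    feature_index = rng.choice(candidates)
    col = columns[feature_index]
    feature_split = rng.uniform(min(col), max(col))
    return feature_index, feature_split


data = [[7.0, 0.5], [7.0, 1.5], [7.0, 4.0]]
feature_index, feature_split = random_feature_split_sketch(data, random.Random(42))
```

Passing a seeded `random.Random` instance keeps the sketch reproducible without touching the global random state.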

Module contents

Top-level package for rhf.