rankeval.model package

The rankeval.model module includes utilities to load a model and to dump it in any of the supported model formats.

class rankeval.model.RTEnsemble(file_path, name=None, format='QuickRank', base_score=None, learning_rate=1, n_trees=None)

Bases: object

Class for efficient modelling of an ensemble-based model composed of binary regression trees.

This class only provides the sketch of the data structures used for storing the model. The responsibility for correctly filling these data structures is delegated to the various model proxies.

Load the model from the file identified by file_path using the given format.

file_path : str

The path to the file where the model has been saved.

name : str

The name to be given to the current model.

format : ['QuickRank', 'ScikitLearn', 'XGBoost', 'LightGBM']

The format of the model to load.

base_score : None or float

The initial prediction score of all instances (global bias). If None, the default value of each software is used (0.5 for XGBoost, 0.0 for all the others).

learning_rate : None or float

The learning rate used by the model to shrink the contribution of each tree. By default it is set to 1 (no shrinking at all).

n_trees : None or int

The maximum number of trees to load from the model. By default it is set to None, meaning the method will load all the trees.

file : str: The path to the file where the model has been saved.
name : str: The name to be given to the current model.
n_trees : integer: The number of regression trees in the ensemble.
n_nodes : integer: The total number of nodes (splitting nodes and leaves) in the ensemble.
trees_root : list of integers: Numpy array holding the indexes of the root nodes of the regression trees composing the ensemble. The indexes refer to the following data structures: trees_left_child, trees_right_child, trees_nodes_value, and trees_nodes_feature.
trees_weight : list of floats: Numpy array holding the weights of the regression trees composing the ensemble.
trees_left_child : list of integers: Numpy array modelling the structure (shape) of the regression trees, considering only the left children. Given a node of a regression tree (a single cell in this array), the value identifies the index of its left child. If the node is a leaf, the value is -1.
trees_right_child : list of integers: Numpy array modelling the structure (shape) of the regression trees, considering only the right children. Given a node of a regression tree (a single cell in this array), the value identifies the index of its right child. If the node is a leaf, the value is -1.
trees_nodes_value : list of floats: Numpy array holding either the output of a leaf node (if the node is a leaf, as determined by the trees_left_child and trees_right_child data structures) or the splitting value of the node (with respect to the feature identified by the trees_nodes_feature data structure).
trees_nodes_feature : list of integers: Numpy array holding the feature id used by each splitting node (or -1 if the node is a leaf).
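The flat layout described by these attributes can be illustrated with a minimal sketch. It uses plain Python lists instead of Numpy arrays, a hypothetical two-tree ensemble, and assumes that instances whose feature value is less than or equal to the threshold go to the left child; the helper names `is_leaf` and `leaf_for` are illustrative and not part of rankeval.

```python
# Hypothetical toy ensemble with two trees stored in flat, node-indexed lists.
# Tree 0: node 0 splits on feature 0 at 0.5, with leaves 1 and 2.
# Tree 1: node 3 splits on feature 1 at 2.0, with leaf 4 and split node 5,
#         which in turn has leaves 6 and 7.
trees_root = [0, 3]
trees_weight = [1.0, 1.0]
trees_left_child = [1, -1, -1, 4, -1, 6, -1, -1]
trees_right_child = [2, -1, -1, 5, -1, 7, -1, -1]
trees_nodes_feature = [0, -1, -1, 1, -1, 0, -1, -1]
trees_nodes_value = [0.5, -1.0, 1.0, 2.0, 0.2, 0.8, 0.4, 0.6]


def is_leaf(index):
    """A node is a leaf when both of its child indexes are -1."""
    return trees_left_child[index] == -1 and trees_right_child[index] == -1


def leaf_for(tree_id, x):
    """Walk one tree from its root and return the index of the leaf reached.

    Assumption: feature value <= threshold goes left, otherwise right.
    """
    node = trees_root[tree_id]
    while not is_leaf(node):
        feature = trees_nodes_feature[node]
        threshold = trees_nodes_value[node]
        if x[feature] <= threshold:
            node = trees_left_child[node]
        else:
            node = trees_right_child[node]
    return node


x = [0.7, 1.5]
leaves = [leaf_for(t, x) for t in range(len(trees_root))]
```

Storing all trees in shared arrays means a node index is enough to look up its children, feature and value, and trees_root is the only per-tree entry point needed.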

model : RTEnsemble: The loaded model as an RTEnsemble object.

clear_cache()

Clear the scoring objects from the internal cache of the model. Call this method at the end of the analysis of the current model (otherwise the memory will be freed automatically on object deletion).

copy(n_trees=None)

Create a copy of this model, with all the trees up to the given number. By default n_trees is set to None, meaning that all the trees are copied.

n_trees : None or int: The number of trees the copied model will have.

model : RTEnsemble: The copied model, pruned of all the trees exceeding the given number of trees.
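The pruning performed by such a copy can be sketched on the flat arrays. This is only an illustration under the assumption that the nodes of each tree are stored contiguously and in tree order, so that tree k owns the node range starting at trees_root[k]; the `truncate` helper and the dict-based model are hypothetical and not how rankeval implements copy.

```python
# Hypothetical two-tree model held in a plain dict of lists.
model = {
    "n_trees": 2,
    "n_nodes": 8,
    "trees_root": [0, 3],
    "trees_weight": [1.0, 1.0],
    "trees_left_child": [1, -1, -1, 4, -1, 6, -1, -1],
    "trees_right_child": [2, -1, -1, 5, -1, 7, -1, -1],
    "trees_nodes_feature": [0, -1, -1, 1, -1, 0, -1, -1],
    "trees_nodes_value": [0.5, -1.0, 1.0, 2.0, 0.2, 0.8, 0.4, 0.6],
}


def truncate(model, n_trees=None):
    """Return a new dict holding only the first n_trees trees of `model`."""
    total = len(model["trees_root"])
    if n_trees is None or n_trees >= total:
        n_trees = total
    # Assumption: tree k owns the node range [trees_root[k], trees_root[k + 1]).
    if n_trees < total:
        n_nodes = model["trees_root"][n_trees]
    else:
        n_nodes = model["n_nodes"]
    node_arrays = ("trees_left_child", "trees_right_child",
                   "trees_nodes_value", "trees_nodes_feature")
    pruned = {name: list(model[name][:n_nodes]) for name in node_arrays}
    pruned["trees_root"] = list(model["trees_root"][:n_trees])
    pruned["trees_weight"] = list(model["trees_weight"][:n_trees])
    pruned["n_trees"] = n_trees
    pruned["n_nodes"] = n_nodes
    return pruned


small = truncate(model, 1)
full = truncate(model)
```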

initialize(n_trees, n_nodes)

Initialize the internal data structures to reflect the given shape and size of the ensemble. This method should be called only by the proxy models (the format-specific loaders/savers).

n_trees : integer: The number of regression trees in the ensemble.
n_nodes : integer: The total number of nodes (splitting nodes and leaves) in the ensemble

is_leaf_node(index)

Return true if the node identified by the given index is a leaf node, false otherwise.

index : integer: The index of the node to test

save(f, format='QuickRank')

Save the model to the file identified by f, using the given model format.

f : str: The path to the file where the model has to be saved.
format : str: The format to use for saving the model.

status : bool: Returns true if the save is successful, false otherwise.

score(dataset, detailed=False)

Score the given model on the given dataset. Depending on the detailed parameter, the scoring will be either basic (i.e., compute only the document scores) or detailed (i.e., besides computing the document scores, also analyze several characteristics of the model). The scorer is cached for the lifetime of the model instance.

dataset : Dataset: The dataset to be scored
detailed : bool: True if the model has to be scored in a detailed fashion, false otherwise

y_pred : numpy 1d array (n_instances): The predictions made by scoring the model on the given dataset
partial_y_pred : numpy 2d array (n_instances x n_trees): The predictions made by scoring the model on the given dataset, on a tree basis (i.e., tree by tree and instance by instance)
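The relation between the two outputs can be sketched in pure Python. The sketch assumes that each per-tree prediction is the leaf output multiplied by the tree weight, that the final score adds base_score to the sum of the per-tree predictions, and that values less than or equal to the threshold go left; the toy ensemble and the helper names are hypothetical, and the actual rankeval scorer may differ.

```python
# Hypothetical two-tree ensemble in the flat layout.
base_score = 0.0
trees_root = [0, 3]
trees_weight = [0.5, 0.5]
trees_left_child = [1, -1, -1, 4, -1, -1]
trees_right_child = [2, -1, -1, 5, -1, -1]
trees_nodes_feature = [0, -1, -1, 1, -1, -1]
trees_nodes_value = [0.5, -1.0, 1.0, 2.0, 0.2, 0.8]


def tree_output(tree_id, x):
    """Return the output of the leaf reached by instance x in one tree."""
    node = trees_root[tree_id]
    while trees_left_child[node] != -1:
        # Assumption: feature value <= threshold goes left.
        go_left = x[trees_nodes_feature[node]] <= trees_nodes_value[node]
        node = trees_left_child[node] if go_left else trees_right_child[node]
    return trees_nodes_value[node]


def score(X):
    """Return (y_pred, partial_y_pred) for a list of instances."""
    n_trees = len(trees_root)
    partial_y_pred = [
        [trees_weight[t] * tree_output(t, x) for t in range(n_trees)]
        for x in X
    ]
    y_pred = [base_score + sum(row) for row in partial_y_pred]
    return y_pred, partial_y_pred


X = [[0.7, 1.5], [0.1, 3.0]]
y_pred, partial_y_pred = score(X)
```

Here partial_y_pred has one row per instance and one column per tree, matching the (n_instances x n_trees) shape given above.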

class rankeval.model.ProxyQuickRank

Bases: object

Class providing the implementation for loading/storing a QuickRank model from/to file.

static load(file_path, model)

Load the model from the file identified by file_path.

file_path : str: The path to the file where the model has been saved.
model : RTEnsemble: The model instance to fill.

static save(file_path, model)

Save the model to the file identified by file_path.

file_path : str: The path to the file where the model has to be saved.
model : RTEnsemble: The RTEnsemble model to save to file.

status : bool: Returns true if the save is successful, false otherwise.

class rankeval.model.ProxyLightGBM

Bases: object

Class providing the implementation for loading/storing a LightGBM model from/to file.

static load(file_path, model)

Load the model from the file identified by file_path.

file_path : str: The path to the file where the model has been saved.
model : RTEnsemble: The model instance to fill.

static save(file_path, model)

Save the model to the file identified by file_path.

file_path : str: The path to the file where the model has to be saved.
model : RTEnsemble: The RTEnsemble model to save to file.

status : bool: Returns true if the save is successful, false otherwise.

class rankeval.model.ProxyXGBoost

Bases: object

Class providing the implementation for loading/storing an XGBoost model from/to file.

static load(file_path, model)

Load the model from the file identified by file_path.

file_path : str: The path to the file where the model has been saved.
model : RTEnsemble: The model instance to fill.

static save(file_path, model)

Save the model to the file identified by file_path.

file_path : str: The path to the file where the model has to be saved.
model : RTEnsemble: The RTEnsemble model to save to file.

status : bool: Returns true if the save is successful, false otherwise.

class rankeval.model.ProxyScikitLearn

Bases: object

Class providing the implementation for loading/storing a Scikit-Learn model from/to file.

static export_scikit_model(model, file_path)

static load(file_path, model)

Load the model from the file identified by file_path.

file_path : str: The path to the file where the model has been saved.
model : RTEnsemble: The model instance to fill.

static save(file_path, model)

Save the model to the file identified by file_path.

file_path : str: The path to the file where the model has to be saved.
model : RTEnsemble: The RTEnsemble model to save to file.

status : bool: Returns true if the save is successful, false otherwise.

Submodules

rankeval.model.proxy_LightGBM module

Class providing the implementation for loading/storing a LightGBM model from/to file.

The LightGBM project is described here: https://github.com/Microsoft/LightGBM

The LightGBM format adopts a textual representation using arrays for storing split nodes (both features and thresholds), leaf values and tree structure. Not all the information reported in the model is useful for the different analyses, thus only the relevant parts are parsed.

NOTE: the leaf outputs of the regression trees already take into account the weight of the tree (i.e., the learning rate or shrinkage factor). In order to preserve the scoring made by rankeval (which multiplies the leaf output by the tree weight), the weights of the trees are set equal to 1.
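A small numeric sketch shows why the tree weight is set to 1 in this case. The learning rate and leaf values below are hypothetical; the point is only that multiplying already-shrunk leaf outputs by the learning rate again would apply the shrinkage twice.

```python
learning_rate = 0.1

# Hypothetical leaf outputs before shrinkage, and the values a model file
# would hold if the shrinkage is already folded into the leaves.
raw_leaf_outputs = [2.0, -1.0, 0.5]
stored_leaf_outputs = [learning_rate * v for v in raw_leaf_outputs]

# Scoring multiplies each leaf output by the tree weight.
tree_weight = 1.0
contributions = [tree_weight * v for v in stored_leaf_outputs]

# Using the learning rate as tree weight would shrink the leaves twice.
double_shrunk = [learning_rate * v for v in stored_leaf_outputs]
```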

NOTE: currently rankeval supports the loading of LightGBM models only if they have been trained with missing-value handling disabled, i.e., by setting the corresponding parameter of the training method to False (use_missing=False). This is required because LtR datasets do not have missing values but do have feature values equal to zero (while LightGBM considers zero-valued features as missing values).

class rankeval.model.proxy_LightGBM.ProxyLightGBM

Bases: object

Class providing the implementation for loading/storing a LightGBM model from/to file.

static load(file_path, model)

Load the model from the file identified by file_path.

file_path : str: The path to the file where the model has been saved.
model : RTEnsemble: The model instance to fill.

static save(file_path, model)

Save the model to the file identified by file_path.

file_path : str: The path to the file where the model has to be saved.
model : RTEnsemble: The RTEnsemble model to save to file.

status : bool: Returns true if the save is successful, false otherwise.

rankeval.model.proxy_QuickRank module

Class providing the implementation for loading/storing a QuickRank model from/to file.

The QuickRank project is described here: http://quickrank.isti.cnr.it

The QuickRank format adopts an XML representation. There is a header section, identified by the “info” tag, with the most important parameters adopted to learn the model. The description of the ensemble follows, with a node for each tree, identified by the “tree” tag, followed by the description of the tree (with splitting and leaf nodes). The splitting nodes are described by two pieces of information: the feature id used for splitting, and the threshold value. Leaf nodes, on the other hand, are described by an “output” tag with the value as content.
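As an illustration of how such a file could be turned into flat node records, here is a minimal sketch based on the standard library XML parser. Only the “info”, “tree” and “output” tags come from the description above; the “ranker”, “ensemble”, “split”, “feature” and “threshold” tags, the “weight” and “pos” attributes, and the `flatten` helper are assumptions made for the example, and this is not the rankeval loader.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-tree model; tag and attribute names other than
# info/tree/output are assumptions about the layout.
XML = """
<ranker>
  <info><type>MART</type><trees>1</trees></info>
  <ensemble>
    <tree id="1" weight="0.1">
      <split>
        <feature>3</feature>
        <threshold>0.5</threshold>
        <split pos="left"><output>-0.2</output></split>
        <split pos="right"><output>0.7</output></split>
      </split>
    </tree>
  </ensemble>
</ranker>
"""


def flatten(split, nodes):
    """Append the subtree rooted at `split` to `nodes`; return its index."""
    index = len(nodes)
    output = split.find("output")
    if output is not None:
        # Leaf node: no feature, no children, value is the leaf output.
        nodes.append({"feature": -1, "value": float(output.text),
                      "left": -1, "right": -1})
        return index
    node = {"feature": int(split.find("feature").text),
            "value": float(split.find("threshold").text),
            "left": -1, "right": -1}
    nodes.append(node)
    for child in split.findall("split"):
        node[child.get("pos")] = flatten(child, nodes)
    return index


root = ET.fromstring(XML)
nodes, roots, weights = [], [], []
for tree in root.iter("tree"):
    weights.append(float(tree.get("weight")))
    roots.append(flatten(tree.find("split"), nodes))
```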

class rankeval.model.proxy_QuickRank.ProxyQuickRank

Bases: object

Class providing the implementation for loading/storing a QuickRank model from/to file.

static load(file_path, model)

Load the model from the file identified by file_path.

file_path : str: The path to the file where the model has been saved.
model : RTEnsemble: The model instance to fill.

static save(file_path, model)

Save the model to the file identified by file_path.

file_path : str: The path to the file where the model has to be saved.
model : RTEnsemble: The RTEnsemble model to save to file.

status : bool: Returns true if the save is successful, false otherwise.

rankeval.model.proxy_ScikitLearn module

Class providing the implementation for loading/storing a Scikit-Learn model from/to file.

class rankeval.model.proxy_ScikitLearn.ProxyScikitLearn

Bases: object

Class providing the implementation for loading/storing a Scikit-Learn model from/to file.

static export_scikit_model(model, file_path)

static load(file_path, model)

Load the model from the file identified by file_path.

file_path : str: The path to the file where the model has been saved.
model : RTEnsemble: The model instance to fill.

static save(file_path, model)

Save the model to the file identified by file_path.

file_path : str: The path to the file where the model has to be saved.
model : RTEnsemble: The RTEnsemble model to save to file.

status : bool: Returns true if the save is successful, false otherwise.

rankeval.model.proxy_XGBoost module

Class providing the implementation for loading/storing an XGBoost model from/to file. The model has to be saved using the textual representation, i.e., by using the following method:

.. code-block:: python

    import xgboost as xgb
    ...
    bst = xgb.train(param, dtrain, num_round)
    bst.dump_model('xgboost.model')

The XGBoost project is described here: https://github.com/dmlc/xgboost

The XGBoost format adopts a textual representation where each line of the file represents a single split node or a leaf node, with several attributes describing the feature and the threshold involved (in case of a split node) or the output (in case of a leaf). Each node is identified by a unique integer; the lines also carry additional information that is not useful for rankeval and is thus ignored.
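The line-oriented structure described above can be parsed with a couple of regular expressions. The following is a minimal sketch, not the rankeval proxy: it assumes a dump where each tree starts with a `booster[i]:` header, split lines look like `id:[fK<threshold] yes=..,no=..,missing=..` and leaf lines look like `id:leaf=value`; the sample text and the `parse_dump` helper are hypothetical.

```python
import re

# Hypothetical two-tree dump in the layout assumed above.
DUMP = """booster[0]:
0:[f2<0.5] yes=1,no=2,missing=1
\t1:leaf=-0.2
\t2:leaf=0.7
booster[1]:
0:leaf=0.1
"""

# Split node: node id, feature index, threshold, then the child ids.
SPLIT = re.compile(r"(\d+):\[f(\d+)<([^\]]+)\] yes=(\d+),no=(\d+)")
# Leaf node: node id and output; any trailing attributes are ignored.
LEAF = re.compile(r"(\d+):leaf=([^,\s]+)")


def parse_dump(text):
    """Return one {node_id: node} dict per tree found in the dump."""
    trees = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("booster["):
            trees.append({})
            continue
        split = SPLIT.match(line)
        leaf = LEAF.match(line)
        if split:
            node_id, feature, threshold, yes, no = split.groups()
            trees[-1][int(node_id)] = (
                "split", int(feature), float(threshold), int(yes), int(no))
        elif leaf:
            node_id, value = leaf.groups()
            trees[-1][int(node_id)] = ("leaf", float(value))
    return trees


trees = parse_dump(DUMP)
```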

NOTE: XGBoost version 0.6 does not properly dump the model. As reported in this issue:

https://github.com/dmlc/xgboost/issues/2077

the precision of the dump is not sufficient and causes inconsistencies with the XGBoost model. These inconsistencies cause rankeval scoring to return different predictions with respect to the original model. Without a fix by the XGBoost authors, DO NOT USE this proxy.
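The effect of an insufficiently precise dump can be illustrated with a short sketch. It does not reproduce the exact behaviour of XGBoost 0.6: the six-significant-digit format, the threshold and the feature value are all hypothetical, chosen only to show how a rounded threshold can send an instance down a different branch.

```python
import struct


def to_float32(value):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", value))[0]


threshold = to_float32(0.123456789)  # threshold as held by the trained model
dumped = "%.6g" % threshold          # hypothetical low-precision text dump
reloaded = float(dumped)             # threshold as parsed back from the dump

x = 0.1234569                        # feature value between the two thresholds
original_goes_left = x < threshold
reloaded_goes_left = x < reloaded
```

With these values the instance takes one branch under the original threshold and the other under the reloaded one, so the two models reach different leaves and can return different predictions.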

class rankeval.model.proxy_XGBoost.ProxyXGBoost

Bases: object

Class providing the implementation for loading/storing an XGBoost model from/to file.

static load(file_path, model)

Load the model from the file identified by file_path.

file_path : str: The path to the file where the model has been saved.
model : RTEnsemble: The model instance to fill.

static save(file_path, model)

Save the model to the file identified by file_path.

file_path : str: The path to the file where the model has to be saved.
model : RTEnsemble: The RTEnsemble model to save to file.

status : bool: Returns true if the save is successful, false otherwise.

rankeval.model.rt_ensemble module

Class for efficient modelling of an ensemble-based model of binary regression trees.

class rankeval.model.rt_ensemble.RTEnsemble(file_path, name=None, format='QuickRank', base_score=None, learning_rate=1, n_trees=None)

Bases: object

Class for efficient modelling of an ensemble-based model composed of binary regression trees.

This class only provides the sketch of the data structures used for storing the model. The responsibility for correctly filling these data structures is delegated to the various model proxies.

Load the model from the file identified by file_path using the given format.

file_path : str

The path to the file where the model has been saved.

name : str

The name to be given to the current model.

format : ['QuickRank', 'ScikitLearn', 'XGBoost', 'LightGBM']

The format of the model to load.

base_score : None or float

The initial prediction score of all instances (global bias). If None, the default value of each software is used (0.5 for XGBoost, 0.0 for all the others).

learning_rate : None or float

The learning rate used by the model to shrink the contribution of each tree. By default it is set to 1 (no shrinking at all).

n_trees : None or int

The maximum number of trees to load from the model. By default it is set to None, meaning the method will load all the trees.

file : str: The path to the file where the model has been saved.
name : str: The name to be given to the current model.
n_trees : integer: The number of regression trees in the ensemble.
n_nodes : integer: The total number of nodes (splitting nodes and leaves) in the ensemble.
trees_root : list of integers: Numpy array holding the indexes of the root nodes of the regression trees composing the ensemble. The indexes refer to the following data structures: trees_left_child, trees_right_child, trees_nodes_value, and trees_nodes_feature.
trees_weight : list of floats: Numpy array holding the weights of the regression trees composing the ensemble.
trees_left_child : list of integers: Numpy array modelling the structure (shape) of the regression trees, considering only the left children. Given a node of a regression tree (a single cell in this array), the value identifies the index of its left child. If the node is a leaf, the value is -1.
trees_right_child : list of integers: Numpy array modelling the structure (shape) of the regression trees, considering only the right children. Given a node of a regression tree (a single cell in this array), the value identifies the index of its right child. If the node is a leaf, the value is -1.
trees_nodes_value : list of floats: Numpy array holding either the output of a leaf node (if the node is a leaf, as determined by the trees_left_child and trees_right_child data structures) or the splitting value of the node (with respect to the feature identified by the trees_nodes_feature data structure).
trees_nodes_feature : list of integers: Numpy array holding the feature id used by each splitting node (or -1 if the node is a leaf).

model : RTEnsemble: The loaded model as an RTEnsemble object.

clear_cache()

Clear the scoring objects from the internal cache of the model. Call this method at the end of the analysis of the current model (otherwise the memory will be freed automatically on object deletion).

copy(n_trees=None)

Create a copy of this model, with all the trees up to the given number. By default n_trees is set to None, meaning that all the trees are copied.

n_trees : None or int: The number of trees the copied model will have.

model : RTEnsemble: The copied model, pruned of all the trees exceeding the given number of trees.

initialize(n_trees, n_nodes)

Initialize the internal data structures to reflect the given shape and size of the ensemble. This method should be called only by the proxy models (the format-specific loaders/savers).

n_trees : integer: The number of regression trees in the ensemble.
n_nodes : integer: The total number of nodes (splitting nodes and leaves) in the ensemble

is_leaf_node(index)

Return true if the node identified by the given index is a leaf node, false otherwise.

index : integer: The index of the node to test

save(f, format='QuickRank')

Save the model to the file identified by f, using the given model format.

f : str: The path to the file where the model has to be saved.
format : str: The format to use for saving the model.

status : bool: Returns true if the save is successful, false otherwise.

score(dataset, detailed=False)

Score the given model on the given dataset. Depending on the detailed parameter, the scoring will be either basic (i.e., compute only the document scores) or detailed (i.e., besides computing the document scores, also analyze several characteristics of the model). The scorer is cached for the lifetime of the model instance.

dataset : Dataset: The dataset to be scored
detailed : bool: True if the model has to be scored in a detailed fashion, false otherwise

y_pred : numpy 1d array (n_instances): The predictions made by scoring the model on the given dataset
partial_y_pred : numpy 2d array (n_instances x n_trees): The predictions made by scoring the model on the given dataset, on a tree basis (i.e., tree by tree and instance by instance)