rankeval.analysis package
The rankeval.analysis package implements the functionality for analysing the behaviour of several ranking models with respect to several metrics and datasets. It provides a comprehensive set of analyses for tuning, evaluating and comparing Gradient Boosted Regression Tree models devoted to learning a ranking function.
Submodules
rankeval.analysis.effectiveness module
This module implements several effectiveness analyses focused on assessing the performance of the models in terms of accuracy. These functions can be applied to several models at the same time, so as to directly compare the analyses performed.

rankeval.analysis.effectiveness.document_graded_relevance(datasets, models, bins=100, start=None, end=None)
This method implements a per-label analysis of the model, i.e., it computes the cumulative distribution of the predicted scores. For each relevance label available in each dataset, it provides the fraction of documents with a predicted score smaller than a given score (the scores are binned between start and end). Plotting these fractions yields one curve per relevance label: the larger the distance between the curves, the greater the model’s discriminative power.
Parameters:
 datasets : list of Dataset
 The datasets on which to analyze the behaviour of the given models
 models : list of RTEnsemble
 The models to analyze
 bins : int or None
 Number of equispaced bins for which to compute the cumulative distribution of the predicted scores. If bins is None, the maximum number of queries across all the datasets is used as the bins value.
 start : int or None
 The start point of the range over which the cumulative distribution of the predicted scores is computed. If start is None, the minimum predicted score is used as the starting point of the range.
 end : int or None
 The end point of the range over which the cumulative distribution of the predicted scores is computed. If end is None, the maximum predicted score is used as the end point of the range.
Returns:
 graded_relevance : xarray.DataArray
 A DataArray containing, for each model, dataset and relevance label, the fraction of documents with a predicted score smaller than each bin threshold.
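To make the per-label curves concrete, here is a minimal plain-Python sketch of the computation described above. It is not rankeval's implementation: the function name graded_relevance_cdf is hypothetical, a flat list of labels and a flat list of predicted scores stand in for a Dataset scored by an RTEnsemble, and a dict stands in for the returned xarray.DataArray. As an assumption, unspecified start/end default to the minimum/maximum predicted score.

```python
from collections import defaultdict


def graded_relevance_cdf(labels, scores, bins=100, start=None, end=None):
    """Hypothetical helper: per-label cumulative distribution of predicted scores.

    Returns (thresholds, curves) where curves[label][b] is the fraction of
    documents with that label whose score is strictly smaller than thresholds[b].
    """
    # Assumption: the default range is [min, max] of the predicted scores.
    if start is None:
        start = min(scores)
    if end is None:
        end = max(scores)
    # Equispaced thresholds from start to end, both included.
    step = (end - start) / (bins - 1) if bins > 1 else 0.0
    thresholds = [start + i * step for i in range(bins)]

    # Group the predicted scores by relevance label.
    by_label = defaultdict(list)
    for label, score in zip(labels, scores):
        by_label[label].append(score)

    curves = {}
    for label, label_scores in sorted(by_label.items()):
        n = len(label_scores)
        curves[label] = [sum(s < t for s in label_scores) / n for t in thresholds]
    return thresholds, curves
```

With labels [0, 0, 1, 2, 2] and scores [1.0, 2.0, 3.0, 4.0, 5.0], the curve of label 0 rises first and the curve of label 2 last; the wider the gap between such curves, the better the model separates the labels.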

rankeval.analysis.effectiveness.model_performance(datasets, models, metrics)
This method implements the model performance analysis (part of the effectiveness analysis category): it evaluates each model on each dataset with each of the given metrics.
Parameters:
 datasets : list of Dataset
 The datasets on which to analyze the behaviour of the given models with the given metrics
 models : list of RTEnsemble
 The models to analyze
 metrics : list of Metric
 The metrics to use for the analysis
Returns:
 metric_scores : xarray.DataArray
 A DataArray containing the metric scores of the models using the given metrics on the given datasets.
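As an illustration of the dataset/model/metric grid that model_performance reports, the following sketch evaluates toy scoring functions with one common NDCG@k variant (gain 2^rel - 1, logarithmic discount) and averages it over queries. Everything here is a hypothetical stand-in: performance_grid and ndcg_at_k are not rankeval functions, dicts replace the Dataset, RTEnsemble and Metric objects, and averaging over queries is an assumption about how a per-dataset score is obtained.

```python
import math


def dcg_at_k(ranked_labels, k):
    # One common DCG variant: gain 2^rel - 1, discount log2(position + 1).
    return sum((2 ** rel - 1) / math.log2(pos + 2)
               for pos, rel in enumerate(ranked_labels[:k]))


def ndcg_at_k(labels, scores, k=10):
    # Rank the documents of one query by decreasing predicted score.
    ranked = [label for _, label in sorted(zip(scores, labels), key=lambda p: -p[0])]
    idcg = dcg_at_k(sorted(labels, reverse=True), k)
    return dcg_at_k(ranked, k) / idcg if idcg > 0 else 0.0


def performance_grid(datasets, models, metrics):
    """Hypothetical stand-in for the dataset/model/metric grid.

    datasets: {name: [(labels, feature_rows), ...]}, one entry per query
    models:   {name: callable(feature_rows) -> scores}
    metrics:  {name: callable(labels, scores) -> float}
    Returns {(dataset, model, metric): metric averaged over queries}.
    """
    results = {}
    for d_name, queries in datasets.items():
        for m_name, model in models.items():
            query_scores = [model(rows) for _, rows in queries]
            for k_name, metric in metrics.items():
                values = [metric(labels, s)
                          for (labels, _), s in zip(queries, query_scores)]
                results[(d_name, m_name, k_name)] = sum(values) / len(values)
    return results
```

Keying results by (dataset, model, metric) mirrors the three dimensions of the returned DataArray while staying dependency-free.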

rankeval.analysis.effectiveness.query_class_performance(datasets, models, metrics, query_classes)
This method implements the analysis of the effectiveness of a given model by providing a breakdown of its performance over query classes. Whenever a query classification is provided (e.g., navigational, informational or transactional queries, or the number of terms composing the query), it reports the model effectiveness over each class. This analysis is especially important in a production environment, as it allows calibrating the ranking infrastructure with respect to a specific context.
Parameters:
 datasets : list of Dataset
 The datasets on which to analyze the behaviour of the given models with the given metrics
 models : list of RTEnsemble
 The models to analyze
 metrics : list of Metric
 The metrics to use for the analysis
 query_classes : list of lists
 A list containing one list of classes per Dataset. The i-th item of the j-th list identifies the class of the i-th query of the j-th Dataset.
Returns:
 query_class_performance : xarray.DataArray
 A DataArray containing the per-class metric scores of each model using the given metrics on the given datasets.
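The per-class breakdown can be sketched as a group-by over per-query metric scores. This is a minimal illustration, assuming the per-query scores of one model on one dataset are already computed; per_class_performance is a hypothetical name, and averaging within each class is an assumption about how the per-class score is aggregated.

```python
from collections import defaultdict


def per_class_performance(query_scores, query_classes):
    """Hypothetical helper: average per-query metric scores within each class.

    query_scores[i] is the metric score of the i-th query and query_classes[i]
    its class, for one model on one dataset.
    """
    if len(query_scores) != len(query_classes):
        raise ValueError("exactly one class is needed per query")
    sums, counts = defaultdict(float), defaultdict(int)
    for score, cls in zip(query_scores, query_classes):
        sums[cls] += score
        counts[cls] += 1
    return {cls: sums[cls] / counts[cls] for cls in sums}
```

For example, scores [1.0, 0.5, 0.25, 0.75] with classes ["nav", "info", "info", "nav"] give 0.875 for "nav" and 0.375 for "info".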

rankeval.analysis.effectiveness.query_wise_performance(datasets, models, metrics, bins=None, start=None, end=None)
This method implements the analysis of the model on a query-wise basis, i.e., it computes the cumulative distribution of a given performance metric: for example, the fraction of queries with an NDCG score smaller than any given threshold, over the set of queries described in the dataset.
Parameters:
 datasets : list of Dataset
 The datasets on which to analyze the behaviour of the given models with the given metrics
 models : list of RTEnsemble
 The models to analyze
 metrics : list of Metric
 The metrics to use for the analysis
 bins : int or None
 Number of equispaced bins for which to compute the cumulative distribution of the given metric. If bins is None, the maximum number of queries across all the datasets is used as the bins value.
 start : int or None
 The start point of the range over which the cumulative distribution of the given metric is computed. If start is None, the minimum metric score is used as the starting point of the range.
 end : int or None
 The end point of the range over which the cumulative distribution of the given metric is computed. If end is None, the maximum metric score is used as the end point of the range.
Returns:
 metric_scores : xarray.DataArray
 A DataArray containing, for each model, dataset and metric, the cumulative distribution of the query-wise metric scores, i.e., the fraction of queries with a metric score smaller than each bin threshold.
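A sketch of the query-wise cumulative distribution, following the defaults documented above (bins defaults to the largest number of queries in any dataset; start and end default to the minimum and maximum metric score). query_wise_cdf is a hypothetical helper, not the rankeval API, and the per-query metric scores are passed in as plain lists.

```python
from bisect import bisect_left


def query_wise_cdf(per_query_scores, bins=None, start=None, end=None):
    """Hypothetical helper.

    per_query_scores: {dataset_name: [metric score of each query]}.
    Returns the thresholds and, per dataset, the fraction of queries whose
    score is strictly smaller than each threshold.
    """
    all_scores = [s for scores in per_query_scores.values() for s in scores]
    if bins is None:
        # Documented default: the maximum number of queries across datasets.
        bins = max(len(scores) for scores in per_query_scores.values())
    if start is None:
        start = min(all_scores)
    if end is None:
        end = max(all_scores)
    step = (end - start) / (bins - 1) if bins > 1 else 0.0
    thresholds = [start + i * step for i in range(bins)]

    cdfs = {}
    for name, scores in per_query_scores.items():
        ordered = sorted(scores)
        # bisect_left returns how many sorted scores are < threshold.
        cdfs[name] = [bisect_left(ordered, t) / len(ordered) for t in thresholds]
    return thresholds, cdfs
```

Sorting each dataset's scores once and using bisect_left keeps every threshold lookup logarithmic in the number of queries, instead of rescanning all scores per bin.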

rankeval.analysis.effectiveness.rank_confusion_matrix(datasets, models, skip_same_label=False)
RankEval provides a novel rank-oriented confusion matrix, which reports, for any given pair of relevance labels l_i and l_j, the number of document pairs in which a document with label l_i receives a smaller predicted score than a document with label l_j. When l_i > l_j, this is the number of mis-ranked document pairs. The matrix can thus be seen as a breakdown of the model’s ranking effectiveness over the relevance labels.
Parameters:
 datasets : list of Dataset
 The datasets on which to analyze the behaviour of the given models
 models : list of RTEnsemble
 The models to analyze
 skip_same_label : bool
 If True, pairs of documents with the same label are skipped; if False, they are counted as well.
Returns:
 ranked_matrix : xarray.DataArray
 A DataArray reporting, for any given pair of relevance labels l_i and l_j, the number of document pairs in which a document with label l_i receives a smaller predicted score than a document with label l_j
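The pair counting behind the rank-oriented confusion matrix can be sketched for the documents of a single query as follows. rank_confusion_counts is a hypothetical name; restricting the count to one query and keying a dict by (l_i, l_j) instead of building an xarray.DataArray are assumptions of this sketch. Summing the cells with l_i > l_j gives the number of mis-ranked pairs.

```python
from collections import Counter
from itertools import permutations


def rank_confusion_counts(labels, scores, skip_same_label=False):
    """Hypothetical helper for the documents of a single query.

    Returns {(l_i, l_j): number of ordered document pairs in which the document
    labelled l_i gets a strictly smaller predicted score than the one labelled l_j}.
    """
    counts = Counter()
    for i, j in permutations(range(len(labels)), 2):
        if skip_same_label and labels[i] == labels[j]:
            continue
        if scores[i] < scores[j]:
            counts[(labels[i], labels[j])] += 1
    return dict(counts)
```

For labels [2, 1, 0] and scores [0.2, 0.9, 0.5], the cells (2, 1) and (2, 0) are both 1, so two pairs are mis-ranked.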

rankeval.analysis.effectiveness.tree_wise_average_contribution(datasets, models)
This method provides the average contribution given by each tree of each model to the scoring of the datasets.
Parameters:
 datasets : list of Dataset
 The datasets on which to analyze the behaviour of the given models
 models : list of RTEnsemble
 The models to analyze
Returns:
 average_contribution : xarray.DataArray
 A DataArray containing the average contribution given by each tree of each model to the scoring of the given datasets. The average contributions are reported tree by tree.
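A minimal sketch of the tree-wise average contribution, assuming a toy tree encoding (a leaf is a float, an internal node is a (feature, threshold, left, right) tuple), optional per-tree weights, and that a tree's contribution is its weighted output averaged over all documents. predict_tree and average_tree_contributions are hypothetical helpers, not rankeval's RTEnsemble API.

```python
def predict_tree(node, x):
    """Toy encoding: a leaf is a float, an internal node is a tuple
    (feature_index, threshold, left, right); go left when x[feature] <= threshold."""
    while isinstance(node, tuple):
        feature, threshold, left, right = node
        node = left if x[feature] <= threshold else right
    return node


def average_tree_contributions(trees, X, weights=None):
    """Hypothetical helper: weighted output of each tree averaged over documents X."""
    if weights is None:
        weights = [1.0] * len(trees)
    return [w * sum(predict_tree(tree, x) for x in X) / len(X)
            for tree, w in zip(trees, weights)]
```

A tree whose average contribution stays close to zero adds little to the final scores on that dataset.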

rankeval.analysis.effectiveness.tree_wise_performance(datasets, models, metrics, step=10)
This method implements the analysis of the model on a tree-wise basis (part of the effectiveness analysis category).
Parameters:
 datasets : list of Dataset
 The datasets on which to analyze the behaviour of the given models with the given metrics
 models : list of RTEnsemble
 The models to analyze
 metrics : list of Metric
 The metrics to use for the analysis
 step : int
 Step size identifying the evenly spaced numbers of trees at which the top-k model performance is evaluated (e.g., step=100 means that the method evaluates the model performance at 100, 200, 300, etc. trees).
Returns:
 metric_scores : xarray.DataArray
 A DataArray containing the metric scores of each model using the given metrics on the given datasets. The metric scores are reported cumulatively, tree by tree, i.e., for the top 10 trees, the top 20 trees, etc., with the step size between the numbers of trees set by the step parameter.
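The tree-wise evaluation can be sketched by accumulating the per-tree outputs incrementally and evaluating the metric every step trees, so that each tree is added only once. tree_wise_scores is hypothetical, tree_outputs[t][d] is assumed to hold the (already weighted) output of tree t on document d, and always reporting the full ensemble when the number of trees is not a multiple of step is an assumption.

```python
def tree_wise_scores(tree_outputs, labels, metric, step=10):
    """Hypothetical helper.

    tree_outputs[t][d]: output of tree t on document d.
    metric: callable(labels, scores) -> float.
    Returns [(n_trees, metric value of the first n_trees trees), ...].
    """
    n_trees = len(tree_outputs)
    checkpoints = list(range(step, n_trees + 1, step))
    if not checkpoints or checkpoints[-1] != n_trees:
        checkpoints.append(n_trees)  # assumption: always report the full ensemble
    running = [0.0] * len(labels)
    results, done = [], 0
    for cp in checkpoints:
        # Add only the trees not yet accumulated, so each tree is summed once.
        for t in range(done, cp):
            running = [r + o for r, o in zip(running, tree_outputs[t])]
        done = cp
        results.append((cp, metric(labels, list(running))))
    return results
```

Incremental accumulation costs one pass over the trees in total, rather than re-scoring the top-k prefix from scratch at every checkpoint.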
rankeval.analysis.feature module
This module implements the feature importance analysis.

rankeval.analysis.feature.feature_importance(model, dataset, metric=None)
This method computes the feature importance relative to the given model and dataset.
Parameters:
 dataset : Dataset
 The dataset used to evaluate the model (typically the one used to train the model).
 model : RTEnsemble
 The model whose features we want to evaluate.
 metric : rankeval.metrics.Metric
 The metric used to compute the feature gain at each split node. The default metric is the Mean Squared Error (MSE).
Returns:
 feature_importance : xarray.DataArray
 A DataArray containing the feature importance scores, one for each feature of the given model scored on the given dataset. Two main pieces of information are stored in the DataArray:
  feature_importance: a vector of importance values, one for each feature in the given model. Each reported value is the sum of the improvements, in terms of MSE, brought by the feature on the given dataset. An improvement is computed as the difference between the MSE before and after a split node, i.e., how much the MSE decreases as a result of the split.
  feature_count: a vector of count values, one for each feature in the given model. Each count is the number of times the feature is used in a split node, i.e., to improve the MSE.
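The split gain described above can be illustrated with a CART-style computation: the MSE of a node's targets around their mean, minus the MSE after partitioning the same documents into the two children, then summed (and counted) per feature over all split nodes. This is a generic sketch under stated assumptions, not rankeval's code: the choice of targets (labels or residuals) and all function names are hypothetical.

```python
from collections import defaultdict


def sse(values):
    # Sum of squared errors of the values around their own mean.
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def split_gain(targets, feature_values, threshold):
    """Hypothetical helper: delta MSE of one split node, computed on the
    documents reaching that node (left child: feature_value <= threshold)."""
    left = [y for y, x in zip(targets, feature_values) if x <= threshold]
    right = [y for y, x in zip(targets, feature_values) if x > threshold]
    n = len(targets)
    return (sse(targets) - sse(left) - sse(right)) / n


def accumulate_importance(splits):
    """splits: iterable of (feature_id, gain) pairs, one per split node."""
    importance, count = defaultdict(float), defaultdict(int)
    for feature, gain in splits:
        importance[feature] += gain
        count[feature] += 1
    return dict(importance), dict(count)
```

The two returned dicts play the roles of the feature_importance and feature_count vectors listed above.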
rankeval.analysis.statistical module
This module implements several statistical analyses, namely the bias/variance decomposition of the error and statistical significance tests.

rankeval.analysis.statistical.bias_variance(datasets=[], algos=[], metrics=[], L=10, k=2)
This method computes the bias vs. variance decomposition of the error. The approach used here is based on the works of [Webb05] and [Dom05].
Each instance of the dataset is scored L times. A single scoring is achieved by splitting the dataset at random into k folds; each fold is scored by the model M trained on the remaining folds. [Webb05] recommends the use of 2 folds.
If the metric is MSE, the standard decomposition is used. The Bias for an instance x is defined as the squared error of the average prediction of the L trained models w.r.t. the true label y, i.e., $\mathrm{Bias}(x) = \left(\mathbb{E}_L[\hat{y}(x)] - y\right)^2$. The Variance for an instance x is measured across the L trained models: $\mathrm{Var}(x) = \mathbb{E}_L\left[\left(\hat{y}(x) - \mathbb{E}_L[\hat{y}(x)]\right)^2\right]$. Both are averaged over all instances in the dataset.
If the metric is one of the IR quality measures, we resort to the bias/variance decomposition of the mean squared error of the given metric w.r.t. its ideal value, e.g., for NDCG, $\mathbb{E}_L\left[(1 - \mathrm{NDCG})^2\right]$. Note that a formal bias/variance decomposition for such measures has not been proposed yet.
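For the MSE case, the two quantities can be computed directly from the L predictions of each instance. A minimal sketch, assuming the predictions of the L trained models have already been collected (bias_variance_mse is a hypothetical helper, and the repeated k-fold training is omitted). With these definitions, bias plus variance equals the MSE of the individual predictions, which the sketch also returns so the identity can be checked.

```python
def bias_variance_mse(predictions, y):
    """Hypothetical helper.

    predictions[l][i]: prediction of the l-th trained model for instance i.
    Returns (bias, variance, mse) averaged over instances, with
    bias(x) = (E_L[y_hat] - y)^2 and variance(x) = E_L[(y_hat - E_L[y_hat])^2].
    """
    L, n = len(predictions), len(y)
    bias = variance = mse = 0.0
    for i in range(n):
        preds = [predictions[l][i] for l in range(L)]
        mean_pred = sum(preds) / L
        bias += (mean_pred - y[i]) ** 2
        variance += sum((p - mean_pred) ** 2 for p in preds) / L
        mse += sum((p - y[i]) ** 2 for p in preds) / L
    return bias / n, variance / n, mse / n
```
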
Parameters:
 datasets : list of rankeval.dataset.Dataset
 The dataset instances.
 algos : list of functions
 Each item should be a wrapper of a learning algorithm. The function should accept four parameters: train_X, train_Y, train_q, test_X.
  train_X: numpy.ndarray storing a 2D matrix of size num_docs x num_features
  train_Y: numpy.ndarray storing a vector of the documents’ relevance labels
  train_q: numpy.ndarray storing a vector of query lengths
  test_X: numpy.ndarray with the same layout as train_X
 A model is trained on train_X, train_Y and train_q, and is used to score test_X. A numpy.ndarray with such scores must be returned.
 metrics : list of “mse” or rankeval.metrics.metric.Metric
 The metrics used to compute the error.
 L : int
 Number of iterations, i.e., the number of times each instance is scored
 k : int
 Number of folds.
Returns:
 bias_variance : xarray.DataArray
 A DataArray containing the bias/variance decomposition of the error for any given dataset, algorithm and metric.
[Webb05] Webb, Geoffrey I., and Paul Conilione. “Estimating bias and variance from data.” Pre-publication manuscript (2005).
[Dom05] Domingos, P. “A unified bias-variance decomposition.” In Proceedings of the 17th International Conference on Machine Learning (2000), pp. 231-238.

rankeval.analysis.statistical.statistical_significance(datasets, model_a, model_b, metrics, n_perm=100000)
This method computes the statistical significance of the performance difference between model_a and model_b.
Parameters:
 datasets : list of Dataset
 The datasets on which to analyze the behaviour of the given models with the given metrics
 model_a : RTEnsemble
 The first model considered.
 model_b : RTEnsemble
 The second model considered.
 metrics : list of Metric
 The metrics to use for the analysis
 n_perm : int
 Number of permutations for the randomization test.
Returns:
 stat_sig : xarray.DataArray
 A DataArray containing the statistical significance of the performance difference between model_a and model_b on each of the given datasets, for each of the given metrics.
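The randomization test behind n_perm can be sketched as a paired sign-flipping test on per-query metric scores: under the null hypothesis the two models are exchangeable on each query, so each per-query difference keeps or flips its sign with probability 1/2. This is one common formulation, not necessarily the one rankeval uses; randomization_test is a hypothetical name and the (hits + 1) / (n_perm + 1) p-value correction is an assumption.

```python
import random


def randomization_test(scores_a, scores_b, n_perm=100_000, seed=0):
    """Hypothetical helper: two-sided paired randomization test.

    scores_a[q] and scores_b[q] are the metric scores of the two models on query q.
    Returns an estimated p-value for the difference in mean score.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("both models must be evaluated on the same queries")
    rng = random.Random(seed)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    observed = abs(sum(diffs) / n)
    hits = 0
    for _ in range(n_perm):
        # Under the null hypothesis each per-query difference keeps or flips
        # its sign with probability 1/2.
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs) / n
        if abs(permuted) >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_perm + 1)
```

A small p-value means that random sign assignments rarely produce a mean difference as large as the observed one.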
rankeval.analysis.topological module
This module implements several analyses focused on the topological characteristics of ensemble-based LtR models. These functions can be applied to several models, so as to directly compare the shapes of the resulting forests (e.g., forests trained by different LtR algorithms).

class rankeval.analysis.topological.TopologicalAnalysisResult(model, include_leaves)
Bases: object
This class stores the result of the topological analysis of a model. Several pieces of low-level information are stored in this class and then re-elaborated to provide high-level analyses.
Analyzes the model from a topological perspective.
Parameters:
 model : RTEnsemble
 The model to analyze from the topological perspective
 include_leaves : bool
 Whether the leaves have to be included in the analysis or not
Attributes:
 model : RTEnsemble
 The model analyzed
 height_trees : numpy array
 The ordered heights of the trees composing the ensemble
 topology : scipy.sparse.csr_matrix
 The matrix used to store low-level information related to the aggregated shape of the trees. Each matrix cell identifies a tree node by a pair of coordinates (row, col), where row is the depth and col is the column with respect to a full binary tree.

avg_tree_shape()
Computes the fraction of trees having each node with respect to a full binary tree. The fraction is obtained by normalizing the count by the number of trees composing the ensemble model.
Returns:
 fractions : scipy.sparse.csr_matrix
 Sparse matrix with the same shape as the topology matrix, where each cell identifies a tree node by a pair of coordinates (row, col), with row the depth and col the column with respect to a full binary tree. Each cell value is the number of trees having that node, normalized by the number of trees.
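To illustrate the (row, col) coordinates, the sketch below walks toy trees (an internal node is a (left, right) tuple, a leaf is None), assigns the children of column c at depth r the columns 2c and 2c + 1 at depth r + 1, and normalizes the node counts by the number of trees. A dict keyed by (row, col) stands in for the scipy.sparse.csr_matrix, and average_tree_shape is a hypothetical name.

```python
from collections import Counter


def node_coordinates(tree, include_leaves=True):
    """Yield (row, col) of each node: row is the depth, col the position the node
    would have at that depth in a full binary tree (children of c are 2c, 2c+1)."""
    stack = [(tree, 0, 0)]
    while stack:
        node, row, col = stack.pop()
        is_leaf = node is None
        if include_leaves or not is_leaf:
            yield row, col
        if not is_leaf:
            left, right = node
            stack.append((left, row + 1, 2 * col))
            stack.append((right, row + 1, 2 * col + 1))


def average_tree_shape(trees, include_leaves=True):
    """Hypothetical helper: fraction of trees having each (row, col) node."""
    counts = Counter()
    for tree in trees:
        counts.update(node_coordinates(tree, include_leaves))
    return {coord: c / len(trees) for coord, c in counts.items()}
```

For a stump (None, None) and a deeper tree ((None, None), None), the root and both depth-1 nodes have fraction 1.0, while the two depth-2 leaves have fraction 0.5.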

describe_tree_height()
Computes several descriptive statistics of the height of the trees.
Returns:
 nobs : int
 Number of trees
 minmax : tuple of ndarrays or floats
 Minimum and maximum height of trees
 mean : ndarray or float
 Arithmetic mean of tree heights.
 variance : ndarray or float
 Unbiased variance of the tree heights; the denominator is the number of trees minus one.
 skewness : ndarray or float
 Skewness, based on moment calculations with denominator equal to the number of trees, i.e., no degrees of freedom correction.
 kurtosis : ndarray or float
 Kurtosis (Fisher). The kurtosis is normalized so that it is zero for the normal distribution. No degrees of freedom correction is applied.
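The fields above match those returned by scipy.stats.describe with its defaults (unbiased variance, biased skewness and Fisher kurtosis). For illustration, here is a dependency-free sketch of the same statistics; describe_heights is a hypothetical helper, and returning NaN for skewness and kurtosis when all heights are equal is an assumption.

```python
import math


def describe_heights(heights):
    """Hypothetical dependency-free version of the statistics listed above."""
    n = len(heights)
    if n < 2:
        raise ValueError("at least two trees are needed for the unbiased variance")
    mean = sum(heights) / n
    # Central moments with denominator n (no degrees of freedom correction).
    m2 = sum((h - mean) ** 2 for h in heights) / n
    m3 = sum((h - mean) ** 3 for h in heights) / n
    m4 = sum((h - mean) ** 4 for h in heights) / n
    variance = m2 * n / (n - 1)  # unbiased: denominator n - 1
    if m2 == 0:
        # All heights equal: skewness and kurtosis are undefined.
        skewness = kurtosis = math.nan
    else:
        skewness = m3 / m2 ** 1.5
        kurtosis = m4 / m2 ** 2 - 3.0  # Fisher: 0 for a normal distribution
    return n, (min(heights), max(heights)), mean, variance, skewness, kurtosis
```

For heights [1, 2, 3, 4] this gives a mean of 2.5, a variance of 5/3, zero skewness and a kurtosis of about -1.36.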

fullness_per_level()
Computes the normalized number of trees with a full level i, for each level i of a full binary tree. The normalization is done by the number of trees.
Returns:
 fullness : np.array
 An array whose length equals the maximum height of a tree in the ensemble, where the j-th cell indicates how full the j-th level of the trees is (normalized by the number of trees).
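One way to read "how full" a level is: the fraction of the 2^j possible positions at depth j that are occupied, averaged over the trees. The sketch below computes that quantity from per-node tree counts keyed by (row, col); both this interpretation and the helper name level_fullness are assumptions, not the documented rankeval behaviour.

```python
def level_fullness(topology_counts, n_trees):
    """Hypothetical helper. topology_counts maps (row, col) to the number of
    trees having that node; returns, per depth j, the average fraction of the
    2**j positions of a full binary tree that are occupied."""
    height = max(row for row, _ in topology_counts) + 1
    fullness = [0.0] * height
    for (row, _), count in topology_counts.items():
        fullness[row] += count / 2 ** row
    return [f / n_trees for f in fullness]
```

Under this reading, a value of 1.0 at depth j means every tree has all 2^j nodes at that depth.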

rankeval.analysis.topological.topological_analysis(model, include_leaves=True)
This method implements the topological analysis of an ensemble-based LtR model. Given a model, it studies the shape of each tree composing the model and returns several pieces of information useful for gaining insight into the shape of the trees: their completeness (level by level), their min/max/mean height, and the fraction of trees having a specific node (where each node is identified by a pair of coordinates (row, col), with row the depth and col the column with respect to a full binary tree).
Parameters:
 model : RTEnsemble
 The model to analyze
 include_leaves : bool
 Whether the leaves have to be included in the analysis or not
Returns:
 object : TopologicalAnalysisResult
 The topological analysis result, from which several pieces of information can be retrieved