gflownet.evaluator.abstract

Abstract evaluator class for GFlowNetAgent.

Warning

Should not be used directly, but subclassed to implement specific evaluators for different tasks and environments.

See BaseEvaluator for a default, concrete implementation of this abstract class.

This class handles the logic shared by all evaluators. Subclasses must implement the abstract eval(), plot() and eval_top_k() methods; eval() and plot() are called by the eval_and_log() method:

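def eval_and_log(self, it, metrics=None):
    # Compute the metrics and the intermediate data used for plotting.
    results = self.eval(metrics=metrics)
    # Store each metric value as an attribute of the agent.
    for m, v in results["metrics"].items():
        setattr(self.gfn, m, v)

    # Log the metrics under their human-readable display names.
    metrics_to_log = {
        METRICS[k]["display_name"]: v for k, v in results["metrics"].items()
    }

    # Build the figures from the intermediate data returned by eval().
    figs = self.plot(**results["data"])

    self.logger.log_metrics(metrics_to_log, it, self.gfn.use_context)
    self.logger.log_plots(figs, it, use_context=self.gfn.use_context)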

See gflownet.evaluator for a full-fledged example and gflownet.evaluator.base for a concrete implementation of this abstract class.
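The eval()/plot() contract can be exercised end to end with stand-in classes. The sketch below mirrors the eval_and_log() flow shown above; MiniEvaluator, StubLogger, StubAgent, MeanRewardEvaluator and the "mean_reward" metric are hypothetical names invented for illustration and are not part of gflownet. The real AbstractEvaluator also parses its config, tracks requirements and exposes gfn as a read-only property.

```python
from abc import ABC, abstractmethod

# Hypothetical registry shaped like METRICS: name -> display name and requirements.
METRICS = {
    "mean_reward": {"display_name": "Mean reward", "requirements": ["samples"]},
}


class StubLogger:
    """Stand-in logger that records calls instead of sending them anywhere."""

    def __init__(self):
        self.logged_metrics = []
        self.logged_plots = []

    def log_metrics(self, metrics, step, use_context):
        self.logged_metrics.append((step, metrics))

    def log_plots(self, figs, step, use_context=False):
        self.logged_plots.append((step, figs))


class StubAgent:
    """Stand-in for a GFlowNetAgent exposing only what this sketch needs."""

    use_context = False

    def __init__(self, rewards):
        self.rewards = rewards


class MiniEvaluator(ABC):
    """Simplified stand-in for AbstractEvaluator that keeps only eval_and_log()."""

    def __init__(self, gfn_agent, logger):
        self.gfn = gfn_agent  # a plain attribute here, not a read-only property
        self.logger = logger

    @abstractmethod
    def eval(self, metrics=None):
        """Return {"metrics": {name: value}, "data": {...}}."""

    @abstractmethod
    def plot(self, **kwargs):
        """Return {figure_name: figure}."""

    def eval_and_log(self, it, metrics=None):
        results = self.eval(metrics=metrics)
        for m, v in results["metrics"].items():
            setattr(self.gfn, m, v)
        metrics_to_log = {
            METRICS[k]["display_name"]: v for k, v in results["metrics"].items()
        }
        figs = self.plot(**results["data"])
        self.logger.log_metrics(metrics_to_log, it, self.gfn.use_context)
        self.logger.log_plots(figs, it, use_context=self.gfn.use_context)


class MeanRewardEvaluator(MiniEvaluator):
    """Toy concrete evaluator: one metric, no figures."""

    def eval(self, metrics=None):
        rewards = self.gfn.rewards
        mean = sum(rewards) / len(rewards)
        return {"metrics": {"mean_reward": mean}, "data": {"rewards": rewards}}

    def plot(self, rewards=None, **kwargs):
        # A real evaluator would build figures (e.g. a reward histogram) here.
        return {}


evaluator = MeanRewardEvaluator(StubAgent([1.0, 2.0, 3.0]), StubLogger())
evaluator.eval_and_log(it=10)
print(evaluator.gfn.mean_reward)  # 2.0
print(evaluator.logger.logged_metrics)  # [(10, {'Mean reward': 2.0})]
```

Because MiniEvaluator declares eval() and plot() with abc.abstractmethod, instantiating it (or a subclass missing either method) raises TypeError, which is the same guarantee the "Should not be used directly" warning describes.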

Attributes

METRICS – All metrics that can be computed by a BaseEvaluator.
ALL_REQS – Union of all requirements of all metrics in METRICS.

Classes

AbstractEvaluator

Abstract evaluator class for GFlowNetAgent.

Module Contents

gflownet.evaluator.abstract.METRICS

All metrics that can be computed by a BaseEvaluator.

Structured as a dict with the metric names as keys and the metric display names and requirements as values.

Requirements determine which kinds of data or samples are needed to compute each metric.

Display names are used to log the metrics and to display them in the console.

Implementations of AbstractEvaluator can add new metrics to this dict by implementing the method AbstractEvaluator.define_new_metrics().

gflownet.evaluator.abstract.ALL_REQS: Union of all requirements of all metrics in METRICS.
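Assuming METRICS has the structure described above (each metric name mapped to a dict with "display_name" and "requirements"), ALL_REQS amounts to a set union over every metric's requirement list. The entries below are illustrative only: "l1" and "kl" appear in the eval() examples further down, but the display names and requirement lists here are made up.

```python
# Illustrative METRICS-shaped dict; display names and requirements are hypothetical.
METRICS = {
    "l1": {"display_name": "L1", "requirements": ["density"]},
    "kl": {"display_name": "KL", "requirements": ["density"]},
    "my_custom_metric": {
        "display_name": "My custom metric",
        "requirements": ["density", "new_req"],
    },
}

# Union of every metric's requirements.
ALL_REQS = set().union(*(m["requirements"] for m in METRICS.values()))
print(sorted(ALL_REQS))  # ['density', 'new_req']
```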

class gflownet.evaluator.abstract.AbstractEvaluator(gfn_agent=None, **config)

Abstract evaluator class for GFlowNetAgent.

In charge of evaluating the GFlowNetAgent, computing metrics, plotting figures and optionally logging the results using the GFlowNetAgent’s Logger.

You can use the from_dir() or from_agent() class methods to easily instantiate this class from a run directory or an existing in-memory GFlowNetAgent.

Use set_agent() to set the evaluator’s GFlowNetAgent after initialization if it was not provided at instantiation through the gfn_agent argument.

This __init__ function will call, in order:

- update_all_metrics_and_requirements(), which uses the new metrics defined in the define_new_metrics() method to update the global METRICS and ALL_REQS variables in classes inheriting from AbstractEvaluator.
- self.metrics = self.make_metrics(self.config.metrics), using make_metrics().
- self.reqs = self.make_requirements(), using make_requirements().

Parameters:

gfn_agent (GFlowNetAgent, optional) – The GFlowNetAgent to evaluate. By default None. Should be set using the from_dir() or from_agent() class methods.
config (dict) – The configuration of the evaluator. Will be converted to an OmegaConf instance and stored in the self.config attribute.

config

The configuration of the evaluator.

Type:: omegaconf.OmegaConf

metrics

Dictionary of metrics to compute, with the metric names as keys and the metric display names and requirements as values.

Type:: dict

reqs

The set of requirements for the metrics. Used to decide which kinds of data or samples are needed to compute the metrics.

Type:: set[str]

logger

The logger to use to log the results of the evaluation. Will be set to the GFlowNetAgent’s logger.

Type:: Logger

property gfn

Get the GFlowNetAgent to evaluate.

This is a read-only property. Use the set_agent() method to set the GFlowNetAgent.

Returns:: GFlowNetAgent – The GFlowNetAgent to evaluate.
Raises:: ValueError – If the GFlowNetAgent has not been set.

set_agent(gfn_agent)

Set the GFlowNetAgent to evaluate after initialization.

It is then accessible through the self.gfn property.

Parameters:: gfn_agent (GFlowNetAgent) – The GFlowNetAgent to evaluate.
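The read-only gfn property combined with set_agent() follows a common Python pattern: keep the agent in a private attribute, expose it through a property without a setter, and raise if it is read before being set. A minimal sketch; the class name AgentHolder and the private attribute _gfn_agent are assumptions, not gflownet's actual internals.

```python
class AgentHolder:
    """Hypothetical stand-in showing the gfn / set_agent() pattern."""

    def __init__(self, gfn_agent=None):
        self._gfn_agent = gfn_agent

    @property
    def gfn(self):
        # No setter is defined, so `holder.gfn = x` raises AttributeError.
        if self._gfn_agent is None:
            raise ValueError("The GFlowNetAgent has not been set; use set_agent().")
        return self._gfn_agent

    def set_agent(self, gfn_agent):
        self._gfn_agent = gfn_agent


holder = AgentHolder()
try:
    holder.gfn
except ValueError as err:
    print(err)
holder.set_agent("my-agent")  # any object works in this sketch
print(holder.gfn)  # my-agent
```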

define_new_metrics()

Method to be implemented by subclasses to define new metrics.

Example

def define_new_metrics(self):
    return {
        "my_custom_metric": {
            "display_name": "My custom metric",
            "requirements": ["density", "new_req"],
        }
    }

Returns:: dict – Dictionary of new metrics to add to the global METRICS dict.

update_all_metrics_and_requirements(): Method to be implemented by subclasses to update the global dict of metrics and requirements.
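One way such an update could work is to merge the dict returned by define_new_metrics() into METRICS and then recompute ALL_REQS. This sketch uses module-level stand-ins; the function name merge_new_metrics and the starting METRICS contents are hypothetical, and the real method may behave differently (for example, when a new metric reuses an existing name).

```python
# Hypothetical starting registry and its requirements.
METRICS = {"l1": {"display_name": "L1", "requirements": ["density"]}}
ALL_REQS = {"density"}


def merge_new_metrics(new_metrics):
    """Add new metrics to METRICS and refresh ALL_REQS, both mutated in place."""
    METRICS.update(new_metrics)
    ALL_REQS.clear()
    for metric in METRICS.values():
        ALL_REQS.update(metric["requirements"])


# Same shape as the define_new_metrics() example above.
merge_new_metrics(
    {
        "my_custom_metric": {
            "display_name": "My custom metric",
            "requirements": ["density", "new_req"],
        }
    }
)
print(sorted(METRICS))  # ['l1', 'my_custom_metric']
print(sorted(ALL_REQS))  # ['density', 'new_req']
```

Mutating the two containers in place, rather than rebinding the names, means any module that already imported METRICS or ALL_REQS sees the update.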

classmethod from_dir(path, no_wandb=True, print_config=False, device='cuda', load_final_ckpt=True)

Instantiate a BaseEvaluator from a run directory.

Parameters:

cls (BaseEvaluator) – Class to instantiate.
path (Union[str, os.PathLike]) – Path to the run directory from which to load the GFlowNetAgent.
no_wandb (bool, optional) – Prevent wandb initialization, by default True
print_config (bool, optional) – Whether or not to print the resulting (loaded) config, by default False
device (str, optional) – Device to use for the instantiated GFlowNetAgent, by default “cuda”
load_final_ckpt (bool, optional) – Use the latest possible checkpoint available in the path, by default True

Returns:

BaseEvaluator – Instance of BaseEvaluator with the GFlowNetAgent loaded from the run.

classmethod from_agent(gfn_agent)

Instantiate a BaseEvaluator from a GFlowNetAgent.

Parameters:

cls (BaseEvaluator) – Evaluator class to instantiate.
gfn_agent (GFlowNetAgent) – Instance of GFlowNetAgent to use for the BaseEvaluator.

Returns:

BaseEvaluator – Instance of BaseEvaluator with the provided GFlowNetAgent.

make_metrics(metrics=None)

Parse metrics from a dict, a list, a string or None.

- If None, all metrics are selected.
- If a string, it can be a comma-separated list of metric names, with or without spaces.
- If a list, it should be a list of metric names (keys of METRICS).
- If a dict, its keys should be metric names; its values are ignored and reassigned from METRICS.

All metrics must be in METRICS.

Parameters:: metrics (Union[str, List[str], dict], optional) – Metrics to compute when running the eval() method. Defaults to None, i.e. all metrics in METRICS are computed.
Returns:: dict – Dictionary of metrics to compute, with the metric names as keys and the metric display names and requirements as values.
Raises:: ValueError – If a metric name is not in METRICS.
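The parsing rules above can be sketched as a standalone function. parse_metrics and the small METRICS dict are hypothetical stand-ins for make_metrics() and the module's registry; the real method is bound to the evaluator and may differ in details.

```python
# Hypothetical registry used only for this sketch.
METRICS = {
    "l1": {"display_name": "L1", "requirements": ["density"]},
    "kl": {"display_name": "KL", "requirements": ["density"]},
    "my_custom_metric": {"display_name": "My custom metric", "requirements": ["new_req"]},
}


def parse_metrics(metrics=None):
    """Return {name: METRICS[name]} following the make_metrics() rules."""
    if metrics is None:
        names = list(METRICS)
    elif isinstance(metrics, str):
        # Comma-separated, spaces optional: "l1,kl" or "l1, kl".
        names = [name.strip() for name in metrics.split(",") if name.strip()]
    else:
        # A list of names, or a dict whose keys are names (values are ignored).
        names = list(metrics)
    unknown = [name for name in names if name not in METRICS]
    if unknown:
        raise ValueError(f"Unknown metrics: {unknown}")
    return {name: METRICS[name] for name in names}


print(list(parse_metrics("l1, kl")))  # ['l1', 'kl']
print(list(parse_metrics({"kl": None})))  # ['kl']
```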

make_requirements(reqs=None, metrics=None)

Make requirements for the metrics to compute.

If metrics is provided, it is turned into a dict of metrics (passed through make_metrics() first if it is not already a dict), and the requirements are collected from each metric’s "requirements" entry.
Otherwise, the requirements are computed from the reqs argument:
- If reqs is "all", all requirements of all metrics are computed.
- If reqs is None, the evaluator’s self.reqs attribute is used.
- If reqs is a list, it is used as the requirements.

Parameters:

reqs (Union[str, List[str]], optional) – The metrics requirements. Either "all", a list of requirements or None to use the evaluator’s self.reqs attribute. By default None.
metrics (Union[str, List[str], dict], optional) – The metrics to compute requirements for. If not a dict, will be passed to make_metrics(). By default None.

Returns:

set[str] – The set of requirements for the metrics.
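A sketch of the same decision logic as a free function. collect_requirements, its current_reqs argument (standing in for self.reqs) and the METRICS contents are hypothetical; for brevity, non-dict metrics are expected to be a list of metric names here instead of being routed through make_metrics().

```python
# Hypothetical registry used only for this sketch.
METRICS = {
    "l1": {"display_name": "L1", "requirements": ["density"]},
    "my_custom_metric": {
        "display_name": "My custom metric",
        "requirements": ["density", "new_req"],
    },
}


def collect_requirements(reqs=None, metrics=None, current_reqs=frozenset()):
    """Return the set of requirements, following the make_requirements() rules."""
    if metrics is not None:
        if not isinstance(metrics, dict):
            # Simplification: treat non-dict input as a list of metric names.
            metrics = {name: METRICS[name] for name in metrics}
        return {req for metric in metrics.values() for req in metric["requirements"]}
    if reqs == "all":
        return {req for metric in METRICS.values() for req in metric["requirements"]}
    if reqs is None:
        return set(current_reqs)  # stands in for self.reqs
    return set(reqs)


print(sorted(collect_requirements(metrics=["my_custom_metric"])))  # ['density', 'new_req']
print(sorted(collect_requirements(reqs="all")))  # ['density', 'new_req']
print(collect_requirements(reqs=["density"]))  # {'density'}
```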

should_log_train(step)

Check whether training logs should be written at the current step. The decision is based on the self.config.train.period attribute.

Set self.config.train.period to None or a negative value to disable training logs.

Parameters:: step (int) – Current iteration step.
Returns:: bool – True if train logging should be done at the current step, False otherwise.

should_eval(step)

Check if testing should be done at the current step. The decision is based on the self.config.test.period attribute.

Set self.config.test.first_it to True if testing should be done at the first iteration step. Otherwise, the first test happens after self.config.test.period steps.

Set self.config.test.period to None or a negative value to disable testing.

Parameters:: step (int) – Current iteration step.
Returns:: bool – True if testing should be done at the current step, False otherwise.
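The periodic checks above share one idea: a step qualifies when it is a multiple of the configured period, and a None or negative period disables the check. The sketch below is a free-function take on should_eval(). The name is_due, the assumption that steps are counted from 1, and treating a period of 0 as disabled (to avoid a modulo by zero) are choices made for this example, not documented behavior.

```python
def is_due(step, period, first_it=False):
    """Return True if a periodic action (e.g. testing) should run at `step`.

    Assumes steps are counted from 1.
    """
    if period is None or period <= 0:
        # None or negative disables the action; 0 is treated the same way here.
        return False
    if first_it and step == 1:
        return True
    return step % period == 0


print([s for s in range(1, 11) if is_due(s, period=4)])  # [4, 8]
print([s for s in range(1, 11) if is_due(s, period=4, first_it=True)])  # [1, 4, 8]
print(is_due(5, period=None))  # False
```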

should_eval_top_k(step)

Check whether top k plots and metrics should be computed at the current step. The decision is based on the self.config.test.top_k and self.config.test.top_k_period attributes.

Set self.config.test.top_k to None or a negative value to disable top k plots and metrics.

Parameters:: step (int) – Current iteration step.
Returns:: bool – True if top k plots and metrics should be computed at the current step, False otherwise.

should_checkpoint(step)

Check whether a checkpoint should be saved at the current step. The decision is based on the self.checkpoints.period attribute.

Set self.checkpoints.period to None or a negative value to disable checkpointing.

Parameters:: step (int) – Current iteration step.
Returns:: bool – True if checkpoints should be done at the current step, False otherwise.

abstract plot(**kwargs)

The main method to plot results.

Called by the eval_and_log() method to plot the results of the evaluation. It receives the "data" entry of the dict returned by eval():

# in eval_and_log
results = self.eval(metrics=metrics)
figs = self.plot(**results["data"])

Returns:: dict – Dictionary of figures to log, with the figure names as keys and the figures as values.

abstract eval(metrics=None, **plot_kwargs)

The main method to compute metrics and intermediate results.

This method should return a dict with two keys: "metrics" and "data".

The "metrics" key should map the computed metric names to their values, and the "data" key should contain the intermediate results needed to plot them.

Example

>>> metrics = None # use the default metrics from the config file
>>> results = gfne.eval(metrics=metrics)
>>> plots = gfne.plot(**results["data"])

>>> metrics = "all" # compute all metrics, regardless of the config
>>> results = gfne.eval(metrics=metrics)

>>> metrics = ["l1", "kl"] # compute only the L1 and KL metrics
>>> results = gfne.eval(metrics=metrics)

>>> metrics = "l1,kl" # alternative syntax
>>> results = gfne.eval(metrics=metrics)

See Basic concepts for more details about metrics.

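Parameters:: metrics (Union[str, dict, list], optional) – Which metrics to compute, by default None.
Returns:: dict – Dictionary with a "metrics" key (metric names mapped to values) and a "data" key (intermediate results passed to plot()).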

abstract eval_top_k(it)

Evaluate the GFlowNetAgent’s top k samples performance.

Classes extending this abstract class should implement this method.

Parameters:

it (int) – Current iteration step.

Returns:

dict – Dictionary with the following schema:

{
    "metrics": {str: float},
    "figs": {str: plt.Figure},
    "summary": {str: float},
}
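The documented return schema can be written down as a typing.TypedDict, which lets a static type checker flag missing or misspelled keys in an eval_top_k() implementation. TopKResults and empty_top_k_results are hypothetical names, and figures are typed as Any to avoid depending on matplotlib; in practice they would be plt.Figure objects.

```python
from typing import Any, Dict, TypedDict


class TopKResults(TypedDict):
    """Shape of the dict returned by eval_top_k()."""

    metrics: Dict[str, float]
    figs: Dict[str, Any]  # plt.Figure values in practice
    summary: Dict[str, float]


def empty_top_k_results() -> TopKResults:
    # An empty but schema-conforming result.
    return {"metrics": {}, "figs": {}, "summary": {}}


results = empty_top_k_results()
print(sorted(results))  # ['figs', 'metrics', 'summary']
```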

eval_and_log(it, metrics=None)

Evaluate the GFlowNetAgent and log the results with its logger.

Will call self.eval() and log the results using the GFlowNetAgent’s logger log_metrics() and log_plots() methods.

Parameters:

it (int) – Current iteration step.
metrics (Union[str, List[str]], optional) – List of metrics to compute, by default the evaluator’s metrics attribute.

eval_and_log_top_k(it)

Evaluate the GFlowNetAgent’s top k samples performance and log the results with its logger.

Parameters:: it (int) – Current iteration step.