gflownet.evaluator ================== .. py:module:: gflownet.evaluator .. autoapi-nested-parse:: An ``Evaluator`` is a class that is used to compute metrics and generate plots. It serves two complementary purposes: 1. Evaluate the performance of the agent during training and to log the results. 2. Evaluate the performance of a trained agent, from a directory containing the agent's checkpoints for instance. .. note:: This dual use explains some seaminlgy redundant methods / or arguments to methods. For instance in :meth:`gflownet.evaluator.abstract.AbstractEvaluator.eval` the ``metrics`` argument will never change during the training of a GflowNet (it will always be ``None``, *i.e.* inherited from the config file) but a user looking to evaluate a trained agent may want to specify different metrics to compute without altering the config file. .. important:: Prefer the :meth:`~gflownet.evaluator.abstract.AbstractEvaluator.from_dir` and :meth:`~gflownet.evaluator.abstract.AbstractEvaluator.from_agent` class methods to instantiate an evaluator. Typical call stack: 1. :meth:`gflownet.gflownet.GFlowNetAgent.train` calls the evaluator's 2. :meth:`~gflownet.evaluator.abstract.AbstractEvaluator.should_eval`. If it returns ``True`` then :meth:`~gflownet.gflownet.GFlowNetAgent.train` calls 3. :meth:`~gflownet.evaluator.abstract.AbstractEvaluator.eval_and_log` which itself calls 4. :meth:`~gflownet.evaluator.base.BaseEvaluator.eval` as ``results = self.eval(metrics=None)`` and then ``figs = self.plot(**results["data"])`` 5. finally, :meth:`~gflownet.evaluator.abstract.AbstractEvaluator.eval_and_log` logs the results using the GFlowNetAgent's logger as ``self.logger.log_metrics(results["metrics"])`` and ``self.logger.log_plots(figs)``. .. _evaluator basic concepts: Basic concepts -------------- The evaluator is used to compute metrics and generate plots. It is used to evaluate the performance of the agent during training and to log the results. It is also intended to be used to evaluate the performance of a trained agent. The ``metrics`` keyword argument usually reflect to a description of which quantities are to be computed. They can take the following forms: - ``None``: all metrics defined in the config file / in the evaluator's ``.config.metrics`` attribute will be computed. - ``"all"``: all known metrics as defined in :const:`~gflownet.evaluator.abstract.AbstractEvaluator.METRICS` will be computed. - Note that classes that inherit from :class:`~gflownet.evaluator.abstract.AbstractEvaluator` can define new metrics with the :meth:`~gflownet.evaluator.abstract.AbstractEvaluator.define_new_metrics` method. - ``list``: a list of metric names to be computed. The names must be keys of :const:`~gflownet.evaluator.abstract.AbstractEvaluator.METRICS`. - ``dict``: a dictionary that is a subset of :const:`~gflownet.evaluator.abstract.AbstractEvaluator.METRICS`. The concept of ``requirements`` is used to avoid unnecessary computations. If a metric requires a certain quantity to be computed, then the evaluator will only compute that quantity if the metric is requested. This is done by the :meth:`~gflownet.evaluator.abstract.AbstractEvaluator.make_requirements` method and can be used in methods that compute metrics and plots like ``if "some_req" in reqs`` (see below for an example). .. _using an evaluator: Using an Evaluator ------------------ .. code-block:: python # How to create a new evaluator: from gflownet.evaluator.base import BaseEvaluator gfn_run_dir = "PUT_YOUR_RUN_DIR_HERE" # a run dir contains a .hydra folder gfne = BaseEvaluator.from_dir(gfn_run_dir) results = gfne.eval() for name, metric in results["metrics"].items(): print(f"{name:20}: {metric:.4f}") data = results.get("data", {}) plots = gfne.plot(**data) print( "Available figures in plots:", ", ".join([fname for fname, fig in plots.items() if fig is not None]) or "None", ) Implementing your own evaluator ------------------------------- In general, you will inherit from :class:`~gflownet.evaluator.base.BaseEvaluator` and override the following methods: * ``define_new_metrics``: define new metrics and associated requirements. * ``eval``: compute the metrics and return them as a ``dict``: ``{"metrics": {metric_name: metric_value}, "data": {str: Any}}``. * ``plot``: return a ``dict`` of figures as ``{figure_title: figure}``. By default, the training loop will call the ``eval_and_log`` method which itself calls the ``eval`` method to log the metrics, and the ``plot`` method to log the figures: .. code-block:: python def eval_and_log(self, metrics=None, **plot_kwargs): results = self.eval(metrics=metrics) for m, v in results["metrics"].items(): setattr(self.gfn, m, v) mertics_to_log = { METRICS[k]["display_name"]: v for k, v in results["metrics"].items() } figs = self.plot(**results["data"]) self.logger.log_metrics(mertics_to_log, it, self.gfn.use_context) self.logger.log_plots(figs, it, use_context=self.gfn.use_context) Example implementation: .. code-block:: python # gflownet/evaluator/my_evaluator.py from gflownet.evaluator.base import BaseEvaluator class MyEvaluator(BaseEvaluator): def define_new_metrics(self): ''' This method is called when the class is instantiated and is used to update the global METRICS and ALL_REQS variables. ''' my_metrics = super().define_new_metrics() my_metrics["new_metric"] = { "display_name": "My custom metric", "requirements": ["density", "new_req"], } return my_metrics def my_custom_metric(self, some, arguments): ''' Your metric-computing method. It should return a dict with two keys: ``"metrics"`` and ``"data"``. The "metrics" key should contain the new metric(s) and the "data" key should contain the intermediate results that can be used to plot the new metric(s). Its arguments will come from the `eval()` method below. Parameters ---------- some : type description arguments : type description Returns ------- dict A dict with two keys: ``"metrics"`` and ``"data"``. ''' intermediate = some + arguments return { "metrics": { "my_custom_metric": intermediate ** (-0.5) }, "data": { "some_other": some ** 2, "arguments": arguments, "intermediate": intermediate, } } ... def my_custom_plot( self, some_other=None, arguments=None, intermediate=None, **kwargs ): ''' Your plotting method. It should return a dict with figure titles as keys and the figures as values. Its arguments will come from the `plot()` method below, and basically come from the "data" key of the output of other metrics-computing functions. Parameters ---------- some_other : type, optional description, by default None arguments : type, optional description, by default None intermediate : type, optional description, by default None Returns ------- dict A dict with figure titles as keys and the figures as values. ''' # whatever gets to **kwargs will be ignored, this is used to handle # methods with varying signatures. figs = {} if some_other is not None: f = plt.figure() # some plotting procedure for some_other figs["My Title"] = f if arguments is not None: f = plt.figure() # some other plotting procedure with both figs["My Other Title"] = f elif arguments is not None: f = plt.figure() # some other plotting procedure with arguments figs["My 3rd Title"] = f if intermediate is not None: f = plt.figure() # some other plotting procedure with intermediate figs["My 4th Title"] = f return figs def plot(self, **kwargs): ''' Your custom plot method. It should return a dict with figure titles as keys and the figures as values. It will be called by the `eval_and_log` method to log the figures, and given the "data" key of the output of other metrics-computing functions. Returns ------- dict A dict with figure titles as keys and the figures as values. ''' figs = super().plot(**kwargs) figs.update(self.my_custom_plot(**kwargs)) return figs def eval(self, metrics=None, **plot_kwargs): ''' Your custom eval method. It should return a dict with two keys: ``"metrics"`` and ``"data"``. It will be called by the `eval_and_log` method to log the metrics, Parameters ---------- metrics : Union[list, dict], optional The metrics you want to compute in this evaluation procedure, by default None, meaning the ones defined in the config file. Returns ------- dict A dict with two keys: ``"metrics"`` and ``"data"``. ''' metrics = self.make_metrics(metrics) reqs = self.make_requirements(metrics=metrics) results = super().eval(metrics=metrics, **plot_kwargs) if "new_req" in reqs: some = self.gfn.sample_something() arguments = utils.some_other_function() my_results = self.my_custom_metric(some, arguments) results["metrics"].update(my_results.get("metrics", {})) results["data"].update(my_results.get("data", {})) return results Then define your own ``evaluator`` in the config file: .. code-block:: yaml # config/evaluator/my_evaluator.yaml defaults: - base _target_: gflownet.evaluator.my_evaluator.MyEvaluator # any other params hereafter will extend or override the base class params: period: 1000 .. note:: In general, you should not override the ``make_requirements`` or ``make_metrics`` methods. They should be used as-is in your ``eval`` method (or any other) to decide which metrics and plots to compute. In the previous example, the ``define_new_metrics`` method is used to define new metrics and associated requirements. It will be called when the ``MyEvaluator`` class is instantiated, in the init of :class:`~gflownet.evaluator.abstract.AbstractEvaluator`. By defining a new requirement, you ensure that the new metrics and plots will only be computed if user asks for a metric that requires such computations. By default, the training loop will call :meth:`~gflownet.evaluator.abstract.AbstractEvaluator.eval_and_log` which itself calls :meth:`~gflownet.evaluator.abstract.AbstractEvaluator.eval` so if you override ``eval()`` as above, the new metrics and plots will be computed and logged. Similarly, :meth:`~gflownet.evaluator.abstract.AbstractEvaluator.eval_and_log` will compute the ``dict`` of figures as ``fig_dict = self.plot(**results["data"])`` where ``results`` is the output of :meth:`~gflownet.evaluator.abstract.AbstractEvaluator.eval`. Submodules ---------- .. toctree:: :maxdepth: 1 /autoapi/gflownet/evaluator/abstract/index /autoapi/gflownet/evaluator/base/index