gflownet.evaluator.abstract
===========================

.. py:module:: gflownet.evaluator.abstract

.. autoapi-nested-parse::

   Abstract evaluator class for GFlowNetAgent.

   .. warning::

       Should not be used directly, but subclassed to implement specific evaluators for
       different tasks and environments.

       See :class:`~gflownet.evaluator.base.BaseEvaluator` for a default, concrete
       implementation of this abstract class.

   This class handles some logic that will be the same for all evaluators.
   The only requirements for a subclass are to implement the
   :meth:`~gflownet.evaluator.abstract.AbstractEvaluator.eval` and
   :meth:`~gflownet.evaluator.abstract.AbstractEvaluator.plot` methods which will be
   called by the
   :meth:`~gflownet.evaluator.abstract.AbstractEvaluator.eval_and_log` method:

   .. code-include :: :meth:`gflownet.evaluator.abstract.AbstractEvaluator.eval_and_log`

   .. code-include :: :func:`gflownet.evaluator.abstract.AbstractEvaluator.eval_and_log`

   .. code-include :: :class:`gflownet.gflownet.abstract.AbstractEvaluator`

   .. code-include :: :func:`gflownet.utils.common.gflownet_from_config`

   .. code-block:: python

       def eval_and_log(self, it, metrics=None):
           results = self.eval(metrics=metrics)
           for m, v in results["metrics"].items():
               setattr(self.gfn, m, v)

           metrics_to_log = {
               METRICS[k]["display_name"]: v for k, v in results["metrics"].items()
           }

           figs = self.plot(**results["data"])

           self.logger.log_metrics(metrics_to_log, it, self.gfn.use_context)
           self.logger.log_plots(figs, it, use_context=self.gfn.use_context)

   See :mod:`gflownet.evaluator` for a full-fledged example and
   :mod:`gflownet.evaluator.base` for a concrete implementation of this abstract class.


Attributes
----------

.. autoapisummary::

   gflownet.evaluator.abstract.METRICS
   gflownet.evaluator.abstract.ALL_REQS


Classes
-------

.. autoapisummary::

   gflownet.evaluator.abstract.AbstractEvaluator


Module Contents
---------------

.. py:data:: METRICS

   All metrics that can be computed by a ``BaseEvaluator``.
   Structured as a dict with the metric names as keys and the metric display names and
   requirements as values.

   Requirements are used to decide which kind of data / samples is required to compute
   the metric.

   Display names are used to log the metrics and to display them in the console.

   Implementations of :class:`AbstractEvaluator` can add new metrics to this dict by
   implementing the method :meth:`AbstractEvaluator.define_new_metrics`.

.. py:data:: ALL_REQS

   Union of all requirements of all metrics in :const:`METRICS`.

.. py:class:: AbstractEvaluator(gfn_agent=None, **config)

   Abstract evaluator class for :class:`GFlowNetAgent`.

   In charge of evaluating the :class:`GFlowNetAgent`, computing metrics, plotting
   figures, and optionally logging results using the :class:`GFlowNetAgent`'s
   :class:`Logger`.

   You can use the :meth:`from_dir` or :meth:`from_agent` class methods to easily
   instantiate this class from a run directory or an existing in-memory
   :class:`GFlowNetAgent`.

   Use :meth:`~gflownet.evaluator.abstract.AbstractEvaluator.set_agent` to set the
   evaluator's :class:`GFlowNetAgent` after initialization if it was not provided at
   instantiation as ``GflowNetEvaluator(gfn_agent=...)``.

   This ``__init__`` function will call, in order:

   1. :meth:`update_all_metrics_and_requirements`, which uses new metrics defined in the
      :meth:`define_new_metrics` method to update the global :const:`METRICS` and
      :const:`ALL_REQS` variables in classes inheriting from :class:`AbstractEvaluator`.

   2. ``self.metrics = self.make_metrics(self.config.metrics)`` using
      :meth:`make_metrics`

   3. ``self.reqs = self.make_requirements()`` using :meth:`make_requirements`

   :param gfn_agent: The GFlowNetAgent to evaluate. By default None. Should be set using
                     the :meth:`from_dir` or :meth:`from_agent` class methods.
   :type gfn_agent: GFlowNetAgent, optional
   :param config: The configuration of the evaluator. Will be converted to an OmegaConf
                  instance and stored in the ``self.config`` attribute.
   :type config: dict

   .. attribute:: config

      The configuration of the evaluator.

      :type: :class:`omegaconf.OmegaConf`

   .. attribute:: metrics

      Dictionary of metrics to compute, with the metric names as keys and the metric
      display names and requirements as values.

      :type: dict

   .. attribute:: reqs

      The set of requirements for the metrics. Used to decide which kind of data /
      samples is required to compute the metric.

      :type: set[str]

   .. attribute:: logger

      The logger to use to log the results of the evaluation. Will be set to the
      GFlowNetAgent's logger.

      :type: Logger


   .. py:attribute:: config


   .. py:attribute:: metrics


   .. py:attribute:: reqs


   .. py:property:: gfn

      Get the ``GFlowNetAgent`` to evaluate.

      This is a read-only property. Use the :meth:`set_agent` method to set the
      ``GFlowNetAgent``.

      :returns: :class:`GFlowNetAgent` -- The ``GFlowNetAgent`` to evaluate.

      :raises ValueError: If the ``GFlowNetAgent`` has not been set.


   .. py:method:: set_agent(gfn_agent)

      Set the ``GFlowNetAgent`` to evaluate after initialization.

      It is then accessible through the ``self.gfn`` property.

      :param gfn_agent: The ``GFlowNetAgent`` to evaluate.
      :type gfn_agent: :class:`GFlowNetAgent`


   .. py:method:: define_new_metrics()

      Method to be implemented by subclasses to define new metrics.

      .. admonition:: Example

          .. code-block:: python

              def define_new_metrics(self):
                  return {
                      "my_custom_metric": {
                          "display_name": "My custom metric",
                          "requirements": ["density", "new_req"],
                      }
                  }

      :returns: *dict* -- Dictionary of new metrics to add to the global
                :const:`METRICS` dict.


   .. py:method:: update_all_metrics_and_requirements()

      Method to be implemented by subclasses to update the global dict of metrics and
      requirements.


   .. py:method:: from_dir(path, no_wandb = True, print_config = False, device = 'cuda', load_final_ckpt = True)
      :classmethod:

      Instantiate a BaseEvaluator from a run directory.

      :param cls: Class to instantiate.
      :type cls: BaseEvaluator
      :param path: Path to the run directory from which to load the GFlowNetAgent.
      :type path: Union[str, os.PathLike]
      :param no_wandb: Prevent wandb initialization, by default True
      :type no_wandb: bool, optional
      :param print_config: Whether or not to print the resulting (loaded) config, by
                           default False
      :type print_config: bool, optional
      :param device: Device to use for the instantiated GFlowNetAgent, by default "cuda"
      :type device: str, optional
      :param load_final_ckpt: Use the latest possible checkpoint available in the path,
                              by default True
      :type load_final_ckpt: bool, optional

      :returns: *BaseEvaluator* -- Instance of BaseEvaluator with the GFlowNetAgent
                loaded from the run.


   .. py:method:: from_agent(gfn_agent)
      :classmethod:

      Instantiate a BaseEvaluator from a GFlowNetAgent.

      :param cls: Evaluator class to instantiate.
      :type cls: BaseEvaluator
      :param gfn_agent: Instance of GFlowNetAgent to use for the BaseEvaluator.
      :type gfn_agent: GFlowNetAgent

      :returns: *BaseEvaluator* -- Instance of BaseEvaluator with the provided
                GFlowNetAgent.


   .. py:method:: make_metrics(metrics=None)

      Parse metrics from a dict, a list, a string or ``None``.

      - If ``None``, all metrics are selected.
      - If a string, it can be a comma-separated list of metric names, with or without
        spaces.
      - If a list, it should be a list of metric names (keys of :const:`METRICS`).
      - If a dict, its keys should be metric names and its values will be ignored: they
        will be assigned from :const:`METRICS`.

      All metrics must be in :const:`METRICS`.

      :param metrics: Metrics to compute when running the :meth:`.eval` method. Defaults
                      to ``None``, i.e. all metrics in :const:`METRICS` are computed.
      :type metrics: Union[str, List[str], dict], optional

      :returns: *dict* -- Dictionary of metrics to compute, with the metric names as
                keys and the metric display names and requirements as values.

      :raises ValueError: If a metric name is not in :const:`METRICS`.


   .. py:method:: make_requirements(reqs=None, metrics=None)

      Make requirements for the metrics to compute.

      1. If ``metrics`` is provided, it should be a dict of metrics (any other form is
         first passed to :meth:`make_metrics`). The requirements are computed from the
         ``requirements`` attribute of the metrics.

      2. Otherwise, the requirements are computed from the ``reqs`` argument:

         - If ``reqs`` is ``"all"``, all requirements of all metrics are computed.
         - If ``reqs`` is ``None``, the evaluator's ``self.reqs`` attribute is used.
         - If ``reqs`` is a list, it is used as the requirements.

      :param reqs: The metrics requirements. Either ``"all"``, a list of requirements or
                   ``None`` to use the evaluator's ``self.reqs`` attribute. By default
                   ``None``.
      :type reqs: Union[str, List[str]], optional
      :param metrics: The metrics to compute requirements for. If not a dict, will be
                      passed to :meth:`make_metrics`. By default None.
      :type metrics: Union[str, List[str], dict], optional

      :returns: *set[str]* -- The set of requirements for the metrics.


   .. py:method:: should_log_train(step)

      Check if training logs should be done at the current step. The decision is based
      on the ``self.config.train.period`` attribute.

      Set ``self.config.train.period`` to ``None`` or a negative value to disable
      train logging.

      :param step: Current iteration step.
      :type step: int

      :returns: *bool* -- True if train logging should be done at the current step,
                False otherwise.


   .. py:method:: should_eval(step)

      Check if testing should be done at the current step. The decision is based on the
      ``self.config.test.period`` attribute.

      Set ``self.config.test.first_it`` to ``True`` if testing should be done at the
      first iteration step. Otherwise, testing will be done after
      ``self.config.test.period`` steps.

      Set ``self.config.test.period`` to ``None`` or a negative value to disable
      testing.

      :param step: Current iteration step.
      :type step: int

      :returns: *bool* -- True if testing should be done at the current step, False
                otherwise.


   .. py:method:: should_eval_top_k(step)

      Check if top k plots and metrics should be done at the current step.
      The decision is based on the ``self.config.test.top_k`` and
      ``self.config.test.top_k_period`` attributes.

      Set ``self.config.test.top_k`` to ``None`` or a negative value to disable top k
      plots and metrics.

      :param step: Current iteration step.
      :type step: int

      :returns: *bool* -- True if top k plots and metrics should be done at the current
                step, False otherwise.


   .. py:method:: should_checkpoint(step)

      Check if checkpoints should be done at the current step. The decision is based on
      the ``self.checkpoints.period`` attribute.

      Set ``self.checkpoints.period`` to ``None`` or a negative value to disable
      checkpoints.

      :param step: Current iteration step.
      :type step: int

      :returns: *bool* -- True if checkpoints should be done at the current step, False
                otherwise.


   .. py:method:: plot(**kwargs)
      :abstractmethod:

      The main method to plot results.

      Will be called by the :meth:`eval_and_log` method to plot the results of the
      evaluation. Will be passed the ``"data"`` entry of the results of the
      :meth:`eval` method:

      .. code-block:: python

          # in eval_and_log
          results = self.eval(metrics=metrics)
          figs = self.plot(**results["data"])

      :returns: *dict* -- Dictionary of figures to log, with the figure names as keys
                and the figures as values.


   .. py:method:: eval(metrics=None, **plot_kwargs)
      :abstractmethod:

      The main method to compute metrics and intermediate results.

      This method should return a dict with two keys: ``"metrics"`` and ``"data"``.

      The ``"metrics"`` key should contain the new metric(s) and the ``"data"`` key
      should contain the intermediate results that can be used to plot the new
      metric(s).

      .. admonition:: Example

          >>> metrics = None # use the default metrics from the config file
          >>> results = gfne.eval(metrics=metrics)
          >>> plots = gfne.plot(**results["data"])

          >>> metrics = "all" # compute all metrics, regardless of the config
          >>> results = gfne.eval(metrics=metrics)

          >>> metrics = ["l1", "kl"] # compute only the L1 and KL metrics
          >>> results = gfne.eval(metrics=metrics)

          >>> metrics = "l1,kl" # alternative syntax
          >>> results = gfne.eval(metrics=metrics)

      See :ref:`evaluator basic concepts` for more details about ``metrics``.

      :param metrics: Which metrics to compute, by default ``None``.
      :type metrics: Union[str, dict, list], optional


   .. py:method:: eval_top_k(it)
      :abstractmethod:

      Evaluate the ``GFlowNetAgent``'s top k samples performance.

      Classes extending this abstract class should implement this method.

      :param it: Current iteration step.
      :type it: int

      :returns: *dict* -- Dictionary with the following schema:

                .. code-block:: python

                    {
                        "metrics": {str: float},
                        "figs": {str: plt.Figure},
                        "summary": {str: float},
                    }


   .. py:method:: eval_and_log(it, metrics=None)

      Evaluate the GFlowNetAgent and log the results with its logger.

      Will call ``self.eval()`` and log the results using the GFlowNetAgent's logger
      ``log_metrics()`` and ``log_plots()`` methods.

      :param it: Current iteration step.
      :type it: int
      :param metrics: List of metrics to compute, by default the evaluator's
                      ``metrics`` attribute.
      :type metrics: Union[str, List[str]], optional


   .. py:method:: eval_and_log_top_k(it)

      Evaluate the GFlowNetAgent's top k samples performance and log the results with
      its logger.

      :param it: Current iteration step.
      :type it: int
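
The metric selection rules documented for ``make_metrics`` and
``make_requirements`` can be illustrated with a minimal, standalone sketch. It does
not import ``gflownet`` and is not the library's implementation: the contents of the
``METRICS`` dict below (display names, the ``"mean_reward"`` entry and the
``"rewards"`` requirement) and the helper names ``parse_metrics`` and
``collect_requirements`` are assumptions made for illustration only. It covers the
four documented input forms (``None``, string, list, dict) and the ``ValueError`` on
unknown names.

.. code-block:: python

    # Hypothetical stand-in for the module-level METRICS dict.
    # The real entries in gflownet.evaluator.abstract may differ.
    METRICS = {
        "l1": {"display_name": "L1 error", "requirements": ["density"]},
        "kl": {"display_name": "KL divergence", "requirements": ["density"]},
        "mean_reward": {"display_name": "Mean reward", "requirements": ["rewards"]},
    }


    def parse_metrics(metrics=None):
        """Select entries of METRICS from None, a string, a list or a dict."""
        if metrics is None:
            # None selects every known metric.
            names = list(METRICS)
        elif isinstance(metrics, str):
            # Comma-separated names, with or without spaces.
            names = [name.strip() for name in metrics.split(",") if name.strip()]
        else:
            # A list of names, or a dict whose keys are names (values are ignored).
            names = list(metrics)

        unknown = [name for name in names if name not in METRICS]
        if unknown:
            raise ValueError(f"Unknown metrics: {unknown}")

        # Values always come from METRICS, whatever the input form was.
        return {name: METRICS[name] for name in names}


    def collect_requirements(metrics):
        """Return the union of the requirements of the selected metrics."""
        return {req for spec in metrics.values() for req in spec["requirements"]}


    selected = parse_metrics("l1, kl")
    requirements = collect_requirements(selected)

With these assumed entries, ``"l1, kl"``, ``["l1", "kl"]`` and
``{"l1": None, "kl": None}`` all select the same two metrics, and their
requirements collapse to a single set because both metrics depend on ``"density"``.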