gflownet.proxy.base
===================

.. py:module:: gflownet.proxy.base

.. autoapi-nested-parse::

   Base class of GFlowNet proxies


Attributes
----------

.. autoapisummary::

   gflownet.proxy.base.LOGZERO


Classes
-------

.. autoapisummary::

   gflownet.proxy.base.Proxy


Module Contents
---------------

.. py:data:: LOGZERO
   :value: -1000.0
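
As a rough illustration of why a finite constant is convenient here, the sketch below maps a reward of zero to ``LOGZERO`` instead of negative infinity. The helper ``safe_log`` is hypothetical and is not part of ``gflownet.proxy.base``; the only thing taken from this page is the documented value of the constant.

```python
import math

LOGZERO = -1000.0  # value documented above


def safe_log(reward):
    """Hypothetical helper: log of a non-negative reward, with log(0) -> LOGZERO."""
    if reward <= 0.0:
        # math.log(0.0) would raise; a large negative constant stays finite.
        return LOGZERO
    return math.log(reward)


print(safe_log(0.0))
print(safe_log(1.0))
```

A finite stand-in for ``log(0)`` keeps downstream sums and differences of log-rewards well defined.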


.. py:class:: Proxy(device = 'cpu', float_precision = 32, reward_function = 'identity', logreward_function = None, reward_function_kwargs = {}, reward_min = 0.0, do_clip_rewards = False, **kwargs)

   Bases: :py:obj:`abc.ABC`


   Base Proxy class for GFlowNet proxies.

   A proxy provides the values that are fed into a reward function. Depending on
   ``reward_function``, the reward may be the proxy output itself or a function
   of it.

   :param device: The device to be passed to torch tensors.
   :type device: str or torch.device
   :param float_precision: The floating point precision to be passed to torch tensors.
   :type float_precision: int or torch.dtype
   :param reward_function: The transformation applied to the proxy outputs to obtain a GFlowNet
                           reward. See :py:meth:`Proxy._get_reward_functions`.
   :type reward_function: str or Callable
   :param logreward_function: The transformation applied to the proxy outputs to obtain a GFlowNet
                              log reward. See :py:meth:`Proxy._get_reward_functions`. If None (default),
                              the logarithm of the reward function is used. Passing a Callable may
                              improve the numerical stability of the transformation.
   :type logreward_function: Callable or None
   :param reward_function_kwargs: A dictionary of arguments to be passed to the reward function.
   :type reward_function_kwargs: dict
   :param reward_min: The minimum value allowed for rewards. The default, 0.0, results in a
                      minimum log reward of :py:const:`LOGZERO`. Note that certain loss
                      functions, for example the Forward Looking loss, may not work as desired
                      if the minimum reward is 0.0. It may be set to a small positive value
                      close to zero to avoid numerical instability.
   :type reward_min: float
   :param do_clip_rewards: Whether to clip the rewards according to the minimum value.
   :type do_clip_rewards: bool


   .. py:attribute:: device
      :value: 'cpu'


   .. py:attribute:: float
      :value: 32


   .. py:attribute:: reward_function_kwargs


   .. py:attribute:: reward_function
      :value: 'identity'


   .. py:attribute:: logreward_function
      :value: None


   .. py:attribute:: reward_min
      :value: 0.0


   .. py:attribute:: do_clip_rewards
      :value: False


   .. py:method:: setup(env=None)


   .. py:method:: __call__(states)
      :abstractmethod:


      Implement this method to call the ``get_reward`` method of the appropriate proxy
      class (EI, UCB, Proxy, Oracle, etc.).

      :param states:
      :type states: ndarray


   .. py:method:: rewards(states, log = False, return_proxy = False)

      Computes the rewards of a batch of states.

      The rewards are computed by first calling the proxy function, then
      transforming the proxy values according to the reward function.

      :param states: A batch of states in proxy format.
      :type states: tensor or list or array
      :param log: If True, returns the logarithm of the rewards. If False (default), returns
                  the natural rewards.
      :type log: bool
      :param return_proxy: If True, returns the proxy values, alongside the rewards, as the second
                           element in the returned tuple.
      :type return_proxy: bool

      :returns: * **rewards** (*tensor*) -- The reward or log-reward of all elements in the batch.
                * **proxy_values** (*tensor (optional)*) -- The proxy value of all elements in the batch. Included only if return_proxy
                  is True.


   .. py:method:: proxy2reward(proxy_values)

      Transform a tensor of proxy values into rewards.

      If do_clip_rewards is True, rewards are clipped to self.reward_min.

      :param proxy_values: The proxy values corresponding to a batch of states.
      :type proxy_values: tensor

      :returns: *tensor* -- The reward of all elements in the batch.


   .. py:method:: proxy2logreward(proxy_values)

      Transform a tensor of proxy values into log-rewards.

      NaN values are set to self.logreward_min.

      :param proxy_values: The proxy values corresponding to a batch of states.
      :type proxy_values: tensor

      :returns: *tensor* -- The log-reward of all elements in the batch.


   .. py:method:: get_min_reward(log = False)

      Returns the minimum value of the (log) reward, retrieved from self.reward_min
      and self.logreward_min.

      :param log: If True, returns the logarithm of the minimum reward. If False (default),
                  returns the natural minimum reward.
      :type log: bool

      :returns: *float* -- The minimum (log) reward.


   .. py:method:: get_max_reward(log = False)

      Returns the maximum value of the (log) reward, retrieved from self.optimum, in
      case it is defined.

      :param log: If True, returns the logarithm of the maximum reward. If False (default),
                  returns the natural maximum reward.
      :type log: bool

      :returns: *float* -- The maximum (log) reward.


   .. py:property:: optimum

      Returns the optimum value of the proxy.

      Not implemented by default but may be implemented for synthetic proxies or when
      the optimum is known.

      The optimum is used, for example, to accelerate rejection sampling when
      sampling from the reward function.


   .. py:method:: infer_on_train_set()
      :abstractmethod:


      Implement this method in specific proxies.
      It should return the ground-truth and proxy values on the proxy's training set.