gflownet.envs.cube
==================

.. py:module:: gflownet.envs.cube

.. autoapi-nested-parse::

   Classes to represent hyper-cube environments


Attributes
----------

.. autoapisummary::

   gflownet.envs.cube.CELL_MIN
   gflownet.envs.cube.CELL_MAX


Classes
-------

.. autoapisummary::

   gflownet.envs.cube.CubeBase
   gflownet.envs.cube.ContinuousCube
   gflownet.envs.cube.HybridCube


Module Contents
---------------

.. py:data:: CELL_MIN
   :value: -1.0


.. py:data:: CELL_MAX
   :value: 1.0


.. py:class:: CubeBase(n_dim = 2, min_incr = 0.1, n_comp = 1, beta_params_min = 1.0, beta_params_max = 100.0, epsilon = 1e-06, kappa = 0.001, ignored_dims = None, fixed_distr_params = {'beta_weights': 1.0, 'beta_alpha': 10.0, 'beta_beta': 10.0, 'bernoulli_bts_prob': 0.1, 'bernoulli_eos_prob': 0.1}, random_distr_params = {'beta_weights': 1.0, 'beta_alpha': 10.0, 'beta_beta': 10.0, 'bernoulli_bts_prob': 0.1, 'bernoulli_eos_prob': 0.1}, **kwargs)

   Bases: :py:obj:`gflownet.envs.base.GFlowNetEnv`, :py:obj:`abc.ABC`


   Base class for hyper-cube environments, continuous or hybrid versions of the
   hyper-grid in which the continuous increments are modelled by a (mixture of) Beta
   distribution(s).

   The state space is the value of each dimension, defined in the closed interval
   [0, 1]. If the value of a dimension gets larger than 1 - min_incr, then the
   trajectory is ended (the only possible action is EOS).

   .. attribute:: n_dim

      Dimensionality of the hyper-cube.

      :type: int

   .. attribute:: min_incr

      Minimum increment in the actions, in (0, 1). This is necessary to ensure
      that all trajectories have finite length.

      :type: float

   .. attribute:: n_comp

      Number of components in the mixture of Beta distributions.

      :type: int

   .. attribute:: epsilon

      Small constant to control the clamping interval of the inputs to the
      calculation of log probabilities. Clamping interval will be [epsilon, 1 -
      epsilon]. The larger the value, the lower the probability of incurring an
      unbounded result due to numerical precision, but the lower the precision too.
      Default: 1e-6.

      :type: float

   .. attribute:: kappa

      Small constant to control the intervals of the generated sets of states (in a
      grid or uniformly). States will be in the interval [kappa, 1 - kappa]. Default:
      1e-3.

      :type: float

   .. attribute:: ignored_dims

      Boolean mask of ignored dimensions. This can be used for trajectories that may
      have multiple dimensions coupled or fixed. For each dimension, True if ignored,
      False otherwise. If None, no dimension is ignored.

      :type: list


   .. py:attribute:: n_dim
      :value: 2


   .. py:attribute:: min_incr
      :value: 0.1


   .. py:attribute:: n_comp
      :value: 1


   .. py:attribute:: beta_params_min
      :value: 1.0


   .. py:attribute:: beta_params_max
      :value: 100.0


   .. py:attribute:: source


   .. py:attribute:: epsilon
      :value: 1e-06


   .. py:attribute:: kappa
      :value: 0.001


   .. py:attribute:: continuous
      :value: True


   .. py:method:: get_action_space()
      :abstractmethod:


      Constructs list with all possible actions (excluding end of sequence)


   .. py:method:: get_policy_output(params)
      :abstractmethod:


      Defines the structure of the output of the policy model, from which an
      action is to be determined or sampled, by returning a vector with a fixed
      random policy. As a baseline, the policy is uniform over the dimensionality of
      the action space.

      Continuous environments will generally have to override this method.


   .. py:method:: get_mask_invalid_actions_forward(state = None, done = None)
      :abstractmethod:


      Returns a list with the length of the action space, with values:

      - True if the forward action is invalid from the current state.
      - False otherwise.

      For continuous or hybrid environments, this mask corresponds to the discrete
      part of the action space.


   .. py:method:: get_mask_invalid_actions_backward(state=None, done=None, parents_a=None)
      :abstractmethod:


      Returns a list with the length of the action space, with values:

      - True if the backward action is invalid from the current state.
      - False otherwise.

      For continuous or hybrid environments, this mask corresponds to the discrete
      part of the action space.

      The base implementation should be common to all discrete spaces as it
      relies on get_parents, which is environment-specific and must be implemented.
      Continuous environments will probably need to implement their own version of
      this method.


   .. py:method:: states2proxy(states)

      Prepares a batch of states in "environment format" for a proxy: clips the
      states into [0, 1] and maps them to [CELL_MIN, CELL_MAX]

      :param states: A batch of states in environment format, either as a list of states or as a
                     single tensor.
      :type states: list or tensor

      :returns: *A tensor containing all the states in the batch.*
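
      As a rough illustration of the clip-and-rescale step, the sketch below assumes
      that the mapping from [0, 1] to [CELL_MIN, CELL_MAX] is linear, which this page
      does not state, and it uses plain Python lists instead of tensors. The helper
      name ``to_proxy_range`` is hypothetical and not part of the module.

      .. code-block:: python

         CELL_MIN = -1.0
         CELL_MAX = 1.0


         def to_proxy_range(states):
             """Clip each value into [0, 1], then rescale to [CELL_MIN, CELL_MAX]."""
             scale = CELL_MAX - CELL_MIN  # assumed linear rescaling
             return [
                 [CELL_MIN + scale * min(max(x, 0.0), 1.0) for x in state]
                 for state in states
             ]


         # The value 1.2 lies outside the cube and is clipped to 1.0 first.
         batch = [[0.0, 0.5], [1.0, 1.2]]
         proxy_batch = to_proxy_range(batch)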


   .. py:method:: states2policy(states)

      Prepares a batch of states in "environment format" for the policy model: clips
      the states into [0, 1] and maps them to [-1.0, 1.0]

      :param states: A batch of states in environment format, either as a list of states or as a
                     single tensor.
      :type states: list or tensor

      :returns: *A tensor containing all the states in the batch.*


   .. py:method:: state2readable(state)

      Converts a state (a list of positions) into a human-readable string
      representing a state.


   .. py:method:: readable2state(readable)

      Converts a human-readable string representing a state into a state as a list of
      positions.


   .. py:method:: get_parents(state = None, done = None, action = None)
      :abstractmethod:


      Determines all parents and actions that lead to state.

      :param state: Representation of a state
      :type state: list
      :param done: Whether the trajectory is done. If None, done is taken from instance.
      :type done: bool
      :param action: Last action performed
      :type action: int

      :returns: * **parents** (*list*) -- List of parents in state format
                * **actions** (*list*) -- List of actions that lead to state for each parent in parents


   .. py:method:: sample_actions_batch(policy_outputs, mask = None, states_from = None, is_backward = False, random_action_prob = 0.0, temperature_logits = 1.0)
      :abstractmethod:


      Samples a batch of actions from a batch of policy outputs.


   .. py:method:: get_logprobs(policy_outputs, is_forward, actions, mask_invalid_actions = None, loginf = 1000)

      Computes log probabilities of actions given policy outputs and actions.


   .. py:method:: step(action)

      Executes step given an action.

      :param action: Action to be executed. An action is a tuple with two values:
                     (dimension, increment).
      :type action: tuple

      :returns: * **self.state** (*list*) -- The sequence after executing the action
                * **action** (*tuple*) -- Action executed
                * **valid** (*bool*) -- False, if the action is not allowed for the current state, e.g. stop at the
                  root state


   .. py:method:: fit_kde(samples, kernel = 'gaussian', bandwidth = 0.1)

      Fits a Kernel Density Estimator on a batch of samples.

      :param samples: A batch of samples in proxy format.
      :type samples: tensor
      :param kernel: An identifier of the kernel to use for the density estimation. It must be a
                     valid kernel for the scikit-learn method
                     :py:meth:`sklearn.neighbors.KernelDensity`.
      :type kernel: str
      :param bandwidth: The bandwidth of the kernel.
      :type bandwidth: float


   .. py:method:: plot_reward_samples(samples, samples_reward, rewards, alpha = 0.5, dpi = 150, max_samples = 500, **kwargs)

      Plots the reward contour alongside a batch of samples.

      :param samples: A batch of samples from the GFlowNet policy in proxy format. These samples
                      will be plotted on top of the reward density.
      :type samples: tensor
      :param samples_reward: A batch of samples containing a grid over the sample space, from which the
                             reward has been obtained. These samples are used to plot the contour of
                             reward density.
      :type samples_reward: tensor
      :param rewards: The rewards of samples_reward. It should be a vector of dimensionality
                      n_per_dim ** 2, sorted such that each block rewards[i *
                      n_per_dim:i * n_per_dim + n_per_dim] corresponds to the rewards at the i-th
                      row of the grid of samples, from top to bottom. The same is assumed for
                      samples_reward.
      :type rewards: tensor
      :param alpha: Transparency of the reward contour.
      :type alpha: float
      :param dpi: Dots per inch, indicating the resolution of the plot.
      :type dpi: int
      :param max_samples: Maximum number of samples to include in the plot.
      :type max_samples: int
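
      The row ordering expected for ``rewards`` can be pictured with the following
      sketch, which splits a flat list of n_per_dim ** 2 values into rows using the
      slice given in the parameter description. The helper name ``rewards_to_rows``
      is hypothetical, and plain lists stand in for tensors.

      .. code-block:: python

         def rewards_to_rows(rewards, n_per_dim):
             """Split a flat reward vector into n_per_dim rows, top row first."""
             if len(rewards) != n_per_dim ** 2:
                 raise ValueError("rewards must have n_per_dim ** 2 elements")
             return [
                 rewards[i * n_per_dim : i * n_per_dim + n_per_dim]
                 for i in range(n_per_dim)
             ]


         # A 3 x 3 grid of placeholder rewards, flattened row by row.
         flat_rewards = [0, 1, 2, 3, 4, 5, 6, 7, 8]
         rows = rewards_to_rows(flat_rewards, n_per_dim=3)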


   .. py:method:: plot_kde(samples, kde, alpha = 0.5, dpi=150, colorbar = True, **kwargs)

      Plots the density previously estimated from a batch of samples via KDE over the
      entire sample space.

      :param samples: A batch of samples containing a grid over the sample space. These samples
                      are used to plot the contour of the estimated density.
      :type samples: tensor
      :param kde: A scikit-learn KDE object fit with a batch of samples.
      :type kde: KDE
      :param alpha: Transparency of the density contour.
      :type alpha: float
      :param dpi: Dots per inch, indicating the resolution of the plot.
      :type dpi: int


.. py:class:: ContinuousCube(**kwargs)

   Bases: :py:obj:`CubeBase`


   Continuous hyper-cube environment (continuous version of a hyper-grid) in which the
   action space consists of the increment of each dimension d, modelled by a mixture
   of Beta distributions. The state space is the value of each dimension. In order to
   ensure that all trajectories are of finite length, actions have a minimum increment
   for all dimensions determined by min_incr. If the value of any dimension is larger
   than 1 - min_incr, then that dimension can't be further incremented. In order to
   ensure the coverage of the state space, the first action (from the source state) is
   not constrained by the minimum increment.

   Actions do not represent absolute increments but rather the relative increment with
   respect to the distance to the edges of the hyper-cube, from the minimum increment.
   That is, if dimension d of a state has value 0.3, the minimum increment (min_incr)
   is 0.1 and the maximum value is 1.0, an action of 0.5 will increment the value of
   the dimension by 0.1 + 0.5 * (1.0 - 0.3 - 0.1) = 0.1 + 0.5 * 0.6 = 0.4. Therefore,
   the value of d in the next state will be 0.3 + 0.4 = 0.7.

   .. attribute:: n_dim

      Dimensionality of the hyper-cube.

      :type: int

   .. attribute:: min_incr

      Minimum increment in the actions, in (0, 1). This is necessary to ensure
      that all trajectories have finite length.

      :type: float

   .. attribute:: n_comp

      Number of components in the mixture of Beta distributions.

      :type: int


   .. py:attribute:: mask_dim_base
      :value: 3


   .. py:method:: get_action_space()

      The action space is continuous, thus not defined as such here.

      The actions contained in the action space are "representatives"

      The actions are tuples of length n_dim + 1, where the value at position d
      indicates the increment of dimension d, and the value at position -1 indicates
      whether the action is from or to source (1), or 0 otherwise.

      EOS is indicated by np.inf for all dimensions.

      The action space consists of two representatives and the EOS action:

      - Generic increment action, not from or to source: (0, 0, ..., 0, 0)
      - Generic increment action, from or to source: (0, 0, ..., 0, 1)
      - EOS: (inf, inf, ..., inf, inf)
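
      The three entries listed above can be written out as tuples of length
      n_dim + 1. This is only a sketch: the function name ``build_action_space`` is
      hypothetical, and the order of the entries in the returned list is assumed to
      follow the order of the list above.

      .. code-block:: python

         import math


         def build_action_space(n_dim):
             """Return the two increment representatives followed by EOS."""
             # Last position is the source flag: 0 (not from/to source) or 1.
             generic = (0.0,) * n_dim + (0.0,)
             from_or_to_source = (0.0,) * n_dim + (1.0,)
             # EOS uses inf at every position, including the flag.
             eos = (math.inf,) * (n_dim + 1)
             return [generic, from_or_to_source, eos]


         action_space = build_action_space(n_dim=2)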


   .. py:method:: action2representative(action)

      Replaces the continuous values of an action by 0s (the "generic" or
      "representative" action in the first position of the action space), so that
      they can be compared against the action space or a mask.

      If the action is EOS, it is returned as is.

      :param action: An actual action of the Cube environment (with continuous values)
      :type action: tuple

      :returns: *tuple* -- A representative of the action, where continuous values are replaced by
                zeros.


   .. py:method:: get_policy_output(params)

      Defines the structure of the output of the policy model.

      The policy output will be used to initialize a distribution, from which an
      action is to be determined or sampled. This method returns a vector with a
      fixed policy defined by params.

      The environment consists of both continuous and discrete actions.

      Continuous actions

      For each dimension d of the hyper-cube and component c of the mixture, the
      output of the policy should return:
        1) the weight of the component in the mixture,
        2) the pre-alpha parameter of the Beta distribution to sample the increment,
        3) the pre-beta parameter of the Beta distribution to sample the increment.

      These parameters are the first n_dim * n_comp * 3 elements of the policy
      output, such that the first 3 x C elements correspond to the first dimension,
      and so on.

      Discrete actions

      Additionally, the policy output contains one logit (pos -1) of a Bernoulli
      distribution to model the (discrete) forward probability of selecting the EOS
      action and another logit (pos -2) for the (discrete) backward probability of
      returning to the source node.

      Therefore, the output of the policy model has dimensionality D x C x 3 + 2,
      where D is the number of dimensions (self.n_dim) and C is the number of
      components (self.n_comp).
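
      The layout described above can be summarised by a small index calculation. The
      sketch below only encodes what is stated here: one contiguous block of 3 x C
      elements per dimension, followed by the back-to-source logit (pos -2) and the
      EOS logit (pos -1). The ordering of the parameters inside each block is not
      assumed, and the function name ``policy_output_layout`` is hypothetical.

      .. code-block:: python

         def policy_output_layout(n_dim, n_comp):
             """Describe where each part of the policy output vector lives."""
             block = 3 * n_comp  # weight, pre-alpha and pre-beta per component
             dim_slices = [slice(d * block, (d + 1) * block) for d in range(n_dim)]
             size = n_dim * block + 2
             return {
                 "dim_slices": dim_slices,
                 "source_logit": size - 2,  # same element as position -2
                 "eos_logit": size - 1,     # same element as position -1
                 "size": size,
             }


         layout = policy_output_layout(n_dim=2, n_comp=3)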

      .. seealso:: _beta_params_to_policy_outputs()


   .. py:method:: get_mask_invalid_actions_forward(state = None, done = None)

      The action space is continuous, thus the mask is not only of invalid actions as
      in discrete environments, but also an indicator of "special cases", for example
      states from which only certain actions are possible.

      The values of True/False intend to approximately stick to the semantics in
      discrete environments, where the mask is of "invalid" actions, but it is
      important to note that a direct interpretation in this sense does not always
      apply.

      For example, the mask values of special cases are True if the special cases they
      refer to are "invalid". In other words, the values are False if the state has
      the special case.

      The forward mask has the following structure:

      - 0 : whether a continuous action is invalid. True if the value at any
        dimension is larger than 1 - min_incr, or if done is True. False otherwise.
      - 1 : special case when the state is the source state. False when the state is
        the source state, True otherwise.
      - 2 : whether EOS action is invalid. EOS is valid from any state, except the
        source state or if done is True.
      - [-n_dim:] : dimensions that should be ignored when sampling actions or
        computing logprobs. This can be used for trajectories that may have
        multiple dimensions coupled or fixed. For each dimension, True if ignored,
        False otherwise.
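
      The structure of the forward mask can be transcribed literally into a short
      function. This is a sketch under assumptions rather than the library code: the
      function name ``forward_mask`` is hypothetical, the source state is passed in
      explicitly because its value is not given on this page, and the value used in
      the usage lines is a placeholder.

      .. code-block:: python

         def forward_mask(state, source, done, min_incr, ignored_dims=None):
             """Build [continuous invalid, not source, EOS invalid, *ignored dims]."""
             if ignored_dims is None:
                 ignored_dims = [False] * len(state)
             is_source = list(state) == list(source)
             # Position 0: no continuous action if done or any value > 1 - min_incr.
             continuous_invalid = done or any(x > 1.0 - min_incr for x in state)
             # Position 1: False only when the state is the source state.
             not_source = not is_source
             # Position 2: EOS is invalid from the source state or when done.
             eos_invalid = done or is_source
             return [continuous_invalid, not_source, eos_invalid] + list(ignored_dims)


         source = [-1.0, -1.0]  # placeholder value for the source state
         mask_source = forward_mask(source, source, done=False, min_incr=0.1)
         mask_edge = forward_mask([0.3, 0.95], source, done=False, min_incr=0.1)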


   .. py:method:: get_mask_invalid_actions_backward(state=None, done=None, parents_a=None)

      The action space is continuous, thus the mask is not only of invalid actions as
      in discrete environments, but also an indicator of "special cases", for example
      states from which only certain actions are possible.

      In order to approximately stick to the semantics in discrete environments,
      where the mask is of "invalid" actions, that is the value is True if an action
      is invalid, the mask values of special cases are True if the special cases they
      refer to are "invalid". In other words, the values are False if the state has
      the special case.

      The backward mask has the following structure:

      - 0 : whether a continuous action is invalid. True if the value at any
        dimension is smaller than min_incr, or if done is True. False otherwise.
      - 1 : special case when back-to-source action is the only possible action.
        False if any dimension is smaller than min_incr, True otherwise.
      - 2 : whether EOS action is invalid. False only if done is True, True
        (invalid) otherwise.
      - [-n_dim:] : dimensions that should be ignored when sampling actions or
        computing logprobs. This can be used for trajectories that may have
        multiple dimensions coupled or fixed. For each dimension, True if ignored,
        False otherwise. By default, no dimension is ignored.


   .. py:method:: get_valid_actions(mask = None, state = None, done = None, backward = False)

      Returns the list of non-invalid (valid, for short) actions according to the
      mask of invalid actions.

      As a continuous environment, the returned actions are "representatives", that
      is the actions represented in the action space.

      :param mask: The mask of a state. If None, it is computed in place.
      :type mask: list (optional)
      :param state: A state in GFlowNet format. If None, self.state is used.
      :type state: list (optional)
      :param done: Whether the trajectory is done. If None, self.done is used.
      :type done: bool (optional)
      :param backward: True if the transition is backwards; False if forward.
      :type backward: bool

      :returns: *list* -- The list of representatives of the valid actions.


   .. py:method:: get_parents(state = None, done = None, action = None)

      Defined only because it is required. A ContinuousEnv should be created to avoid
      this issue.


   .. py:method:: relative_to_absolute_increments(states, increments_rel, is_backward)

      Returns a batch of absolute increments (actions) given a batch of states,
      relative increments and minimum_increments.

      Given a dimension value x, a relative increment r, and a minimum increment m,
      then the absolute increment a is given by:

      Forward:

      a = m + r * (1 - x - m)

      Backward:

      a = m + r * (x - m)


   .. py:method:: absolute_to_relative_increments(states, increments_abs, is_backward)

      Returns a batch of relative increments (as sampled by the Beta distributions)
      given a batch of states, absolute increments (actions) and minimum_increments.

      Given a dimension value x, an absolute increment a, and a minimum increment m,
      then the relative increment r is given by:

      Forward:

      r = (a - m) / (1 - x - m)

      Backward:

      r = (a - m) / (x - m)
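
      The two conversions are inverses of each other, which the following scalar
      sketch makes explicit using the formulas given for
      relative_to_absolute_increments and absolute_to_relative_increments. The
      function names are hypothetical, and scalars stand in for the batched tensors
      the methods operate on.

      .. code-block:: python

         def relative_to_absolute(x, r, m, is_backward=False):
             """a = m + r * (1 - x - m) forward, a = m + r * (x - m) backward."""
             span = (x - m) if is_backward else (1.0 - x - m)
             return m + r * span


         def absolute_to_relative(x, a, m, is_backward=False):
             """r = (a - m) / (1 - x - m) forward, r = (a - m) / (x - m) backward."""
             span = (x - m) if is_backward else (1.0 - x - m)
             return (a - m) / span


         x, m = 0.3, 0.1
         a_fwd = relative_to_absolute(x, 0.5, m)
         r_fwd = absolute_to_relative(x, a_fwd, m)
         a_bwd = relative_to_absolute(x, 0.5, m, is_backward=True)
         r_bwd = absolute_to_relative(x, a_bwd, m, is_backward=True)

      Note that the span is zero when x equals 1 - m (forward) or m (backward), so a
      real implementation has to treat those states separately.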


   .. py:method:: sample_actions_batch(policy_outputs, mask = None, states_from = None, is_backward = False, random_action_prob = 0.0, temperature_logits = 1.0)

      Samples a batch of actions from a batch of policy outputs.

      This method overrides the method of GFlowNetEnv because this is a continuous
      environment.

      :param policy_outputs: The output of the GFlowNet policy model.
      :type policy_outputs: tensor
      :param mask: The mask of invalid actions. For continuous or mixed environments, the mask
                   may be a tensor of arbitrary length containing information about special
                   states, as defined elsewhere in the environment.
      :type mask: tensor
      :param states_from: The states originating the actions, in GFlowNet format. Ignored in discrete
                          environments and only required in certain continuous environments.
      :type states_from: tensor
      :param is_backward: True if the actions are backward, False if the actions are forward
                          (default).
      :type is_backward: bool
      :param random_action_prob: The probability of sampling a random action. If larger than zero, the model
                                 outputs will be replaced by a random policy vector with probability
                                 `random_action_prob`, according to a Bernoulli distribution.
      :type random_action_prob: float, optional
      :param temperature_logits: A scalar by which the model outputs are divided to temper the sampling
                                 distribution.
      :type temperature_logits: float, optional

      :returns: **actions** (*list*) -- The list of sampled actions.


   .. py:method:: get_logprobs(policy_outputs, actions, mask, states_from, is_backward)

      Computes log probabilities of actions given policy outputs and actions.

      :param policy_outputs: The output of the GFlowNet policy model.
      :type policy_outputs: tensor
      :param mask: The mask containing information about invalid actions and special cases.
      :type mask: tensor
      :param actions: The actions (absolute increments) from each state in the batch for which to
                      compute the log probability.
      :type actions: list or tensor
      :param states_from: The states originating the actions, in GFlowNet format. They are required
                          so as to compute the relative increments and the Jacobian.
      :type states_from: tensor
      :param is_backward: True if the actions are backward, False if the actions are forward
                          (default). Required, since the computation for forward and backward actions
                          is different.
      :type is_backward: bool
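
      The Jacobian mentioned for ``states_from`` arises from the change of variables
      between the relative increment r, which follows the Beta distribution, and the
      absolute increment a. The sketch below illustrates this for one dimension, one
      Beta component and a forward action, using only the standard library. It
      assumes the forward relation a = m + r * (1 - x - m), clamps r to [epsilon,
      1 - epsilon] as described for the ``epsilon`` attribute, and uses hypothetical
      function names; the library works on batched tensors and mixtures, so its
      implementation may differ.

      .. code-block:: python

         import math


         def beta_logpdf(r, alpha, beta):
             """Log density of a Beta(alpha, beta) distribution at r in (0, 1)."""
             log_norm = math.lgamma(alpha) + math.lgamma(beta) - math.lgamma(alpha + beta)
             return (alpha - 1.0) * math.log(r) + (beta - 1.0) * math.log(1.0 - r) - log_norm


         def forward_increment_logprob(x, a, m, alpha, beta, epsilon=1e-6):
             """Log density of the absolute increment a from a state value x."""
             span = 1.0 - x - m
             r = (a - m) / span
             # Clamp so that log(r) and log(1 - r) stay finite at the borders.
             r = min(max(r, epsilon), 1.0 - epsilon)
             # a = m + r * span implies |dr/da| = 1 / span (the Jacobian term).
             return beta_logpdf(r, alpha, beta) - math.log(span)


         # With alpha = beta = 1 the Beta is uniform, leaving only the Jacobian.
         logprob_uniform = forward_increment_logprob(0.3, 0.4, 0.1, 1.0, 1.0)
         # r would be 1.0 here; clamping keeps the result finite.
         logprob_border = forward_increment_logprob(0.3, 0.7, 0.1, 2.0, 2.0)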


   .. py:method:: step(action)

      Executes step given an action. An action is the absolute increment of each
      dimension.

      :param action: Action to be executed. An action is a tuple of length n_dim, with the
                     absolute increment for each dimension.
      :type action: tuple

      :returns: * **self.state** (*list*) -- The sequence after executing the action
                * **action** (*tuple*) -- Action executed
                * **valid** (*bool*) -- False, if the action is not allowed for the current state, e.g. stop at the
                  root state


   .. py:method:: step_backwards(action)

      Executes backward step given an action. An action is the absolute decrement of
      each dimension.

      :param action: Action to be executed. An action is a tuple of length n_dim, with the
                     absolute decrement for each dimension.
      :type action: tuple

      :returns: * **self.state** (*list*) -- The sequence after executing the action
                * **action** (*tuple*) -- Action executed
                * **valid** (*bool*) -- False, if the action is not allowed for the current state, e.g. stop at the
                  root state


   .. py:method:: get_grid_terminating_states(n_states, kappa = None)

      Constructs a grid of terminating states within the range of the hyper-cube.

      :param n_states: Requested number of states. The actual number of states will be rounded up
                       such that all dimensions have the same number of states.
      :type n_states: int
      :param kappa: Small constant indicating the distance to the theoretical limits of the
                    cube [0, 1], in order to avoid inaccuracies in the computation of the log
                    probabilities due to clamping. The grid will thus be in [kappa, 1 -
                    kappa]. If None, self.kappa will be used.
      :type kappa: float
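
      One way to picture the rounding of ``n_states`` and the role of ``kappa`` is
      the sketch below. It is not the library implementation: the function name
      ``grid_terminating_states`` and the explicit ``n_dim`` argument are
      assumptions, evenly spaced points are assumed, and plain lists are returned.

      .. code-block:: python

         import itertools


         def grid_terminating_states(n_states, n_dim, kappa=1e-3):
             """Build a regular grid in [kappa, 1 - kappa]^n_dim."""
             # Smallest number of points per dimension covering n_states.
             n_per_dim = 1
             while n_per_dim ** n_dim < n_states:
                 n_per_dim += 1
             if n_per_dim == 1:
                 points = [0.5]  # arbitrary choice of this sketch for a single point
             else:
                 step = (1.0 - 2.0 * kappa) / (n_per_dim - 1)
                 points = [kappa + i * step for i in range(n_per_dim)]
             return [list(state) for state in itertools.product(points, repeat=n_dim)]


         # 10 requested states in 2 dimensions are rounded up to a 4 x 4 grid.
         states = grid_terminating_states(n_states=10, n_dim=2, kappa=0.001)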


   .. py:method:: get_uniform_terminating_states(n_states, seed = None, kappa = None)

      Constructs a set of terminating states sampled uniformly within the range of
      the hyper-cube.

      :param n_states: Number of states in the returned list.
      :type n_states: int
      :param kappa: Small constant indicating the distance to the theoretical limits of the
                    cube [0, 1], in order to avoid inaccuracies in the computation of the log
                    probabilities due to clamping. The states will thus be uniformly sampled in
                    [kappa, 1 - kappa]. If None, self.kappa will be used.
      :type kappa: float
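
      A minimal sketch of uniform sampling within [kappa, 1 - kappa], assuming the
      seed is used to make the samples reproducible. The function name
      ``uniform_terminating_states`` and the explicit ``n_dim`` argument are
      hypothetical, and the standard library generator replaces whatever random
      number generator the library uses.

      .. code-block:: python

         import random


         def uniform_terminating_states(n_states, n_dim, seed=None, kappa=1e-3):
             """Sample n_states points uniformly in [kappa, 1 - kappa]^n_dim."""
             rng = random.Random(seed)  # local generator, global state untouched
             return [
                 [rng.uniform(kappa, 1.0 - kappa) for _ in range(n_dim)]
                 for _ in range(n_states)
             ]


         samples = uniform_terminating_states(n_states=5, n_dim=2, seed=0)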


.. py:class:: HybridCube(**kwargs)

   Bases: :py:obj:`gflownet.envs.composite.setfix.SetFix`


   :param subenvs: An iterable containing the set of the sub-environments.
   :type subenvs: iterable


   .. py:attribute:: n_dim


   .. py:method:: states2proxy(states)

      Prepares a batch of states in environment format for a proxy.

      The input states are in the environment format of the Set. The outputs contain
      only the Cube part and the format is as in
      :py:meth:`gflownet.envs.cube.ContinuousCube.states2proxy`.

      :param states: A batch of states in Set environment format.
      :type states: list

      :returns: *A tensor containing all the states in the batch.*


   .. py:method:: get_grid_terminating_states(n_states, kappa = None)

      Constructs a grid of terminating states within the range of the hyper-cube.

      :param n_states: Requested number of states. The actual number of states will be rounded up
                       such that all dimensions have the same number of states.
      :type n_states: int
      :param kappa: Small constant indicating the distance to the theoretical limits of the
                    cube [0, 1], in order to avoid inaccuracies in the computation of the log
                    probabilities due to clamping. The grid will thus be in [kappa, 1 -
                    kappa]. If None, self.kappa will be used.
      :type kappa: float


   .. py:method:: get_uniform_terminating_states(n_states, seed = None, kappa = None)

      Constructs a set of terminating states sampled uniformly within the range of
      the hyper-cube.

      :param n_states: Number of states in the returned list.
      :type n_states: int
      :param kappa: Small constant indicating the distance to the theoretical limits of the
                    cube [0, 1], in order to avoid inaccuracies in the computation of the log
                    probabilities due to clamping. The states will thus be uniformly sampled in
                    [kappa, 1 - kappa]. If None, self.kappa will be used.
      :type kappa: float


   .. py:method:: fit_kde(samples, kernel = 'gaussian', bandwidth = 0.1)

      Fits a Kernel Density Estimator on a batch of samples.

      Simply calls fit_kde() of CubeBase.


   .. py:method:: plot_reward_samples(samples, samples_reward, rewards, alpha = 0.5, dpi = 150, max_samples = 500, **kwargs)

      Plots the reward contour alongside a batch of samples.

      Simply calls plot_reward_samples() of CubeBase.


   .. py:method:: plot_kde(samples, kde, alpha = 0.5, dpi=150, colorbar = True, **kwargs)

      Plots the density previously estimated from a batch of samples via KDE over the
      entire sample space.

      Simply calls plot_kde() of CubeBase.