gflownet.envs.grid
==================

.. py:module:: gflownet.envs.grid

.. autoapi-nested-parse::

   Classes to represent hyper-grid environments.


Classes
-------

.. autoapisummary::

   gflownet.envs.grid.Grid


Module Contents
---------------

.. py:class:: Grid(n_dim = 2, length = 3, max_increment = 1, max_dim_per_action = 1, cell_min = -1, cell_max = 1, **kwargs)

   Bases: :py:obj:`gflownet.envs.base.GFlowNetEnv`


   Hyper-grid environment: a grid with n_dim dimensions and length cells per
   dimension.

   The state space is the entire grid and each state is represented by the vector of
   coordinates along each dimension. For example, in 3D, the origin is at [0, 0, 0]
   and after incrementing dimension 0 by 2, dimension 1 by 3 and dimension 2 by 1, the
   state is [2, 3, 1].

   The action space is the increment to be applied to each dimension. For instance,
   (0, 0, 1) will increment dimension 2 by 1 and the action that goes from [1, 1, 1]
   to [2, 3, 1] is (1, 2, 0).
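
   The relation between states and actions described above can be sketched in
   plain Python. The helper ``apply_action`` is hypothetical and not part of the
   ``Grid`` API; it only illustrates the element-wise increment.

   ```python
   def apply_action(state, action):
       """Return the state reached by adding each increment to its dimension."""
       return [s + a for s, a in zip(state, action)]


   origin = [0, 0, 0]
   # Increment dimension 0 by 2, dimension 1 by 3 and dimension 2 by 1.
   state = apply_action(origin, (2, 3, 1))
   # The action (1, 2, 0) moves [1, 1, 1] to [2, 3, 1].
   next_state = apply_action([1, 1, 1], (1, 2, 0))
   ```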

   .. attribute:: n_dim

      Dimensionality of the grid

      :type: int

   .. attribute:: length

      Size of the grid (cells per dimension)

      :type: int

   .. attribute:: max_increment

      Maximum increment of each dimension by the actions.

      :type: int

   .. attribute:: max_dim_per_action

      Maximum number of dimensions to increment per action. If -1, then
      max_dim_per_action is set to n_dim.

      :type: int

   .. attribute:: cell_min

      Lower bound of the cells range

      :type: float

   .. attribute:: cell_max

      Upper bound of the cells range

      :type: float


   .. py:attribute:: n_dim
      :value: 2


   .. py:attribute:: length
      :value: 3


   .. py:attribute:: max_increment
      :value: 1


   .. py:attribute:: max_dim_per_action
      :value: 1


   .. py:attribute:: cells


   .. py:attribute:: source


   .. py:attribute:: eos


   .. py:method:: get_action_space()

      Constructs list with all possible actions, including eos. An action is
      represented by a vector of length n_dim where each index d indicates the
      increment to apply to dimension d of the hyper-grid.
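
      A minimal sketch of how such an action space could be enumerated. The
      function ``build_action_space`` is hypothetical, and representing eos as
      the all-zeros tuple is an assumption made for illustration; the actual
      ordering and eos encoding used by ``Grid`` may differ.

      ```python
      from itertools import product


      def build_action_space(n_dim=2, max_increment=1, max_dim_per_action=1):
          """Enumerate increment vectors; EOS is assumed to be the all-zeros tuple."""
          if max_dim_per_action == -1:
              max_dim_per_action = n_dim
          actions = []
          for action in product(range(max_increment + 1), repeat=n_dim):
              n_changed = sum(1 for inc in action if inc > 0)
              # Keep actions that increment between 1 and max_dim_per_action dimensions.
              if 0 < n_changed <= max_dim_per_action:
                  actions.append(action)
          eos = (0,) * n_dim  # assumption: all-zeros marks the end of the trajectory
          actions.append(eos)
          return actions


      actions = build_action_space(n_dim=2, max_increment=1, max_dim_per_action=1)
      ```

      With the default arguments this yields one action per dimension plus eos.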


   .. py:method:: get_mask_invalid_actions_forward(state = None, done = None)

      Returns a list with one entry per action in the action space, with values:
          - True if the forward action is invalid from the current state.
          - False otherwise.
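
      The mask semantics can be sketched as follows. ``mask_invalid_forward`` is
      a hypothetical stand-alone function, and treating every action as invalid
      once the trajectory is done is an assumption, not a statement about the
      actual implementation.

      ```python
      def mask_invalid_forward(state, actions, length, done=False):
          """Return one boolean per action: True marks an invalid action."""
          if done:
              # Assumption: no action is valid once the trajectory is done.
              return [True] * len(actions)
          mask = []
          for action in actions:
              # An action is invalid if it moves any coordinate past the last cell.
              out_of_bounds = any(s + a >= length for s, a in zip(state, action))
              mask.append(out_of_bounds)
          return mask


      actions = [(0, 1), (1, 0), (0, 0)]  # last entry plays the role of eos
      mask = mask_invalid_forward([2, 0], actions, length=3)
      ```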


   .. py:method:: states2proxy(states)

      Prepares a batch of states in "environment format" for the proxy: each state is
      a vector of length n_dim with values in the range [cell_min, cell_max].

      See: states2policy()

      :param states: A batch of states in environment format, either as a list of states or as a
                     single tensor.
      :type states: list or tensor

      :returns: *A tensor containing all the states in the batch.*
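
      A rough sketch of the mapping, assuming the cells are evenly spaced
      between cell_min and cell_max and that length is greater than 1. The
      function ``states_to_proxy`` is hypothetical and returns nested lists
      rather than a tensor to stay dependency-free.

      ```python
      def states_to_proxy(states, length=3, cell_min=-1.0, cell_max=1.0):
          """Map integer grid positions to evenly spaced values in [cell_min, cell_max]."""
          # Assumption: cells are evenly spaced; requires length > 1.
          step = (cell_max - cell_min) / (length - 1)
          return [[cell_min + pos * step for pos in state] for state in states]


      proxy_states = states_to_proxy([[0, 0], [1, 2], [2, 1]])
      ```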


   .. py:method:: states2policy(states)

      Prepares a batch of states in "environment format" for the policy model: states
      are one-hot encoded.

      The output is a 2D tensor, with the second dimension of size length * n_dim,
      where each n-th successive block of length elements is a one-hot encoding of
      the position in the n-th dimension.

      Example (n_dim = 3, length = 4):
        - state: [0, 3, 1]
        - policy format: [1, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0]
                         |     0    |      3    |      1    |

      :param states: A batch of states in environment format, either as a list of states or as a
                     single tensor.
      :type states: list or tensor

      :returns: *A tensor containing all the states in the batch.*
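
      The one-hot layout described above can be reproduced with a short
      sketch. ``states_to_policy`` is a hypothetical helper that returns nested
      lists instead of a tensor; it is meant only to illustrate the encoding.

      ```python
      def states_to_policy(states, n_dim, length):
          """One-hot encode each coordinate and concatenate the blocks per state."""
          encoded = []
          for state in states:
              row = [0] * (n_dim * length)
              for dim, pos in enumerate(state):
                  # Block `dim` spans indices [dim * length, (dim + 1) * length).
                  row[dim * length + pos] = 1
              encoded.append(row)
          return encoded


      policy_states = states_to_policy([[0, 3, 1]], n_dim=3, length=4)
      ```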


   .. py:method:: readable2state(readable, alphabet={})

      Converts a human-readable string representing a state into a state as a list of
      positions.


   .. py:method:: state2readable(state = None, alphabet={})

      Converts a state (a list of positions) into a human-readable string
      representing a state.
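
      A possible round trip between the two representations, assuming a
      readable format of space-separated positions in square brackets. Both the
      format and the helper names are assumptions for illustration; the format
      used by ``Grid`` is not specified here.

      ```python
      def state_to_readable(state):
          """Format a state as space-separated positions in brackets (assumed format)."""
          return "[" + " ".join(str(pos) for pos in state) + "]"


      def readable_to_state(readable):
          """Parse the assumed bracketed format back into a list of positions."""
          return [int(token) for token in readable.strip("[]").split()]


      readable = state_to_readable([0, 3, 1])
      state = readable_to_state(readable)
      ```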


   .. py:method:: get_parents(state = None, done = None, action = None)

      Determines all parents and actions that lead to state.

      :param state: Representation of a state, as a list of length n_dim where each element is
                    the position in the corresponding dimension.
      :type state: list
      :param done: Whether the trajectory is done. If None, done is taken from instance.
      :type done: bool
      :param action: Ignored
      :type action: None

      :returns: * **parents** (*list*) -- List of parents in state format
                * **actions** (*list*) -- List of actions that lead to state for each parent in parents
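
      The parent search can be sketched as subtracting every non-eos action
      from the state and keeping the results that stay inside the grid.
      ``find_parents`` is a hypothetical helper, and the handling of ``done``
      (the only parent is the state itself, reached via eos) is an assumption.

      ```python
      def find_parents(state, actions, eos, done=False):
          """Return (parents, parent_actions) for a state under the given actions."""
          if done:
              # Assumption: a finished trajectory was reached by applying eos.
              return [list(state)], [eos]
          parents, parent_actions = [], []
          for action in actions:
              if action == eos:
                  continue
              parent = [s - a for s, a in zip(state, action)]
              # A parent is valid only if no coordinate becomes negative.
              if all(pos >= 0 for pos in parent):
                  parents.append(parent)
                  parent_actions.append(action)
          return parents, parent_actions


      eos = (0, 0)
      actions = [(0, 1), (1, 0), eos]
      parents, parent_actions = find_parents([1, 0], actions, eos)
      ```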


   .. py:method:: step(action, skip_mask_check = False)

      Executes step given an action.

      :param action: Action to be executed. An action is a tuple of int values indicating the
                     increment to apply to each dimension.
      :type action: tuple
      :param skip_mask_check: If True, skip computing forward mask of invalid actions to check if the
                              action is valid.
      :type skip_mask_check: bool

      :returns: * **self.state** (*list*) -- The state after executing the action
                * **action** (*tuple*) -- Action executed
                * **valid** (*bool*) -- False, if the action is not allowed for the current state.
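
      A simplified, stateless sketch of the step logic and its three return
      values. ``take_step`` is hypothetical; leaving the state unchanged on eos
      and on invalid actions is an assumption made for this illustration.

      ```python
      def take_step(state, action, length, eos):
          """Try to apply an action; return (state, action, valid)."""
          if action == eos:
              # Assumption: eos ends the trajectory without moving.
              return list(state), action, True
          candidate = [s + a for s, a in zip(state, action)]
          if any(pos >= length for pos in candidate):
              # Invalid action: leave the state as it was.
              return list(state), action, False
          return candidate, action, True


      state, action, valid = take_step([2, 0], (1, 0), length=3, eos=(0, 0))
      ```

      Here the action would move dimension 0 past the last cell, so ``valid`` is
      False and the state is returned unchanged.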


   .. py:method:: get_all_terminating_states()


   .. py:method:: get_uniform_terminating_states(n_states, seed = None)


   .. py:method:: plot_reward_samples(samples, samples_reward, rewards, dpi = 150, n_ticks_max = 50, reward_norm = True, **kwargs)

      Plots the reward density as a 2D histogram on the grid, alongside a histogram
      representing the samples density.

      It is assumed that the rewards correspond to the entire domain of the grid and are
      sorted from left to right (first) and top to bottom of the grid of samples.

      :param samples: A batch of samples from the GFlowNet policy in proxy format. These samples
                      will be plotted on top of the reward density.
      :type samples: tensor
      :param samples_reward: A batch of samples containing a grid over the sample space, from which the
                             reward has been obtained. Ignored by this method.
      :type samples_reward: tensor
      :param rewards: The rewards of samples_reward. It should be a vector of dimensionality
                      length ** 2 and be sorted such that each block at rewards[i *
                      length:i * length + length] corresponds to the rewards at the i-th
                      row of the grid of samples, from top to bottom.
      :type rewards: tensor
      :param dpi: Dots per inch, indicating the resolution of the plot.
      :type dpi: int
      :param n_ticks_max: Maximum number of ticks to include in the axes.
      :type n_ticks_max: int
      :param reward_norm: Whether to normalize the histogram. True by default.
      :type reward_norm: bool
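
      The expected ordering of ``rewards`` can be illustrated by splitting the
      flat vector into rows. ``rewards_to_rows`` is a hypothetical helper that
      works on plain lists; it is not part of the plotting code.

      ```python
      def rewards_to_rows(rewards, length):
          """Split a flat vector of length ** 2 rewards into rows of the grid."""
          if len(rewards) != length ** 2:
              raise ValueError("expected length ** 2 rewards")
          # Row i (counted from the top) is rewards[i * length : i * length + length].
          return [rewards[i * length : i * length + length] for i in range(length)]


      rows = rewards_to_rows([0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9], length=3)
      ```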