gflownet.envs.ctorus
====================

.. py:module:: gflownet.envs.ctorus

.. autoapi-nested-parse::

   Classes to represent continuous hyper-torus environments.


Classes
-------

.. autoapisummary::

   gflownet.envs.ctorus.ContinuousTorus


Module Contents
---------------

.. py:class:: ContinuousTorus(n_dim = 2, length_traj = 1, n_comp = 1, policy_encoding_dim_per_angle = None, vonmises_min_concentration = 0.001, fixed_distr_params = {'vonmises_mean': 0.0, 'vonmises_concentration': 0.5}, random_distr_params = {'vonmises_mean': 0.0, 'vonmises_concentration': 0.001}, **kwargs)

   Bases: :py:obj:`gflownet.envs.base.GFlowNetEnv`


   Initializes a ContinuousTorus environment.

   :param n_dim: Dimensionality of the torus.
   :type n_dim: int
   :param length_traj: Fixed length of the trajectory.
   :type length_traj: int
   :param n_comp: Number of components in the mixture of von Mises distributions used to
                  sample angle increments.
   :type n_comp: int
   :param policy_encoding_dim_per_angle: Dimensionality of the policy encodings of the angles.
   :type policy_encoding_dim_per_angle: int
   :param vonmises_min_concentration: Minimum value allowed for the concentration parameter of the von Mises
                                      distributions.
   :type vonmises_min_concentration: float
   :param fixed_distr_params: Dictionary of parameters of the von Mises distribution that defines the
                              fixed distribution of the environment. It must contain two keys with float
                              values: ``vonmises_mean`` and ``vonmises_concentration``.
   :type fixed_distr_params: dict
   :param random_distr_params: Dictionary of parameters of the von Mises distribution that defines the
                               random distribution of the environment. It must contain two keys with float
                               values: ``vonmises_mean`` and ``vonmises_concentration``.
   :type random_distr_params: dict


   .. py:attribute:: n_dim
      :value: 2


   .. py:attribute:: length_traj
      :value: 1


   .. py:attribute:: n_comp
      :value: 1


   .. py:attribute:: policy_encoding_dim_per_angle
      :value: None


   .. py:attribute:: vonmises_min_concentration
      :value: 0.001


   .. py:attribute:: source_angles


   .. py:attribute:: source


   .. py:attribute:: eos


   .. py:attribute:: continuous
      :value: True


   .. py:property:: mask_dim

      Returns the dimensionality of the masks.

      The mask consists of two fixed flags.

      :returns: *The dimensionality of the masks.*


   .. py:method:: get_action_space()

      The action space is continuous, thus not defined as such here.

      The actions are tuples of length n_dim, where the value at position d indicates
      the increment of dimension d.

      EOS is indicated by np.inf for all dimensions.

      This method defines self.eos and the returned action space is simply a
      representative (arbitrary) action with an increment of 0.0 in all dimensions,
      and EOS.
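
      A minimal sketch of the structure described above, assuming plain tuples
      and ``math.inf`` in place of ``np.inf``. The helper name
      ``build_action_space`` is hypothetical and is not part of the library:

      .. code-block:: python

         import math


         def build_action_space(n_dim):
             """Return (action_space, eos) for a torus with n_dim dimensions."""
             # EOS: infinity in every dimension.
             eos = tuple([math.inf] * n_dim)
             # Representative: an arbitrary continuous action with zero increments.
             representative = tuple([0.0] * n_dim)
             return [representative, eos], eos


         action_space, eos = build_action_space(n_dim=2)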


   .. py:method:: action2representative(action)

      Returns the arbitrary, representative action in the action space, so that the
      action can be contrasted with the action space and masks. If the action is
      EOS, EOS is returned.
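
      As a rough illustration of this mapping, assuming a two-element action
      space made of a zero-increment representative and EOS (the helper name
      ``to_representative`` is hypothetical):

      .. code-block:: python

         import math


         def to_representative(action, n_dim):
             """Map any action to its representative in the action space."""
             eos = tuple([math.inf] * n_dim)
             representative = tuple([0.0] * n_dim)
             # EOS maps to itself; every continuous action maps to the
             # representative.
             return eos if tuple(action) == eos else representative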


   .. py:method:: get_policy_output(params)

      Defines the structure of the output of the policy model, from which an
      action is to be determined or sampled, by returning a vector with a fixed
      random policy.

      For each dimension d of the hyper-torus and component c of the mixture, the
      output of the policy should return:

      1) the weight of the component in the mixture
      2) the location of the von Mises distribution to sample the angle increment
      3) the log concentration of the von Mises distribution to sample the angle
         increment

      Therefore, the output of the policy model has dimensionality D x C x 3, where D
      is the number of dimensions (self.n_dim) and C is the number of components
      (self.n_comp). The first 3 x C entries in the policy output correspond to the
      first dimension, and so on.
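
      The layout can be sketched as a flat index computation. The ordering of
      the three parameters within each component (weight, location, log
      concentration) is an assumption made for illustration, and
      ``policy_output_index`` is a hypothetical helper:

      .. code-block:: python

         def policy_output_index(d, c, p, n_comp):
             """Flat index of parameter p of component c in dimension d.

             p is 0 (weight), 1 (location) or 2 (log concentration).
             """
             # Each dimension owns a contiguous block of 3 * n_comp entries.
             return (d * n_comp + c) * 3 + p


         n_dim, n_comp = 2, 2
         output_size = n_dim * n_comp * 3
         # Indices that belong to the first dimension.
         first_dim = [
             policy_output_index(0, c, p, n_comp)
             for c in range(n_comp)
             for p in range(3)
         ]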


   .. py:method:: get_mask_invalid_actions_forward(state = None, done = None)

      The action space is continuous, thus the mask is not of invalid actions as
      in discrete environments, but an indicator of "special cases", for example
      states from which only certain actions are possible.

      The "mask" has 2 elements - to match the mask of backward actions - but only
      one is needed for forward actions, thus both elements take the same value,
      according to the following:

      - If done is True, then the mask is True.
      - If the number of actions (state[-1]) is equal to the (fixed) trajectory
        length, then only EOS is valid and the mask is True.
      - Otherwise, any continuous action is valid (except EOS) and the mask is False.
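
      The three cases above can be sketched as follows, assuming a state stored
      as a list whose last element is the number of actions. The function name
      ``mask_forward`` is hypothetical:

      .. code-block:: python

         def mask_forward(state, done, length_traj):
             """Return the 2-element forward mask for a state."""
             # Trajectory already finished.
             if done:
                 return [True, True]
             # Fixed trajectory length reached: only EOS is valid.
             if state[-1] == length_traj:
                 return [True, True]
             # Any continuous action (except EOS) is valid.
             return [False, False]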


   .. py:method:: get_mask_invalid_actions_backward(state=None, done=None, parents_a=None)

      The action space is continuous, thus the mask is not of invalid actions as
      in discrete environments, but an indicator of "special cases", for example
      states from which only certain actions are possible.

      The "mask" has 2 elements to capture the 2 special cases in backward actions. The
      possible values of the mask are the following:

      - mask[0]:
          - True, if only the "return-to-source" action is valid.
          - False otherwise.
      - mask[1]:
          - True, if only the EOS action is valid, that is if done is True.
          - False otherwise.
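
      A minimal sketch of these two flags. It assumes that the
      "return-to-source" case corresponds to a state with exactly one action
      taken, as described for ``sample_actions_batch()``, and that a done state
      sets only the second flag. The function name ``mask_backward`` is
      hypothetical:

      .. code-block:: python

         def mask_backward(state, done):
             """Return the 2-element backward mask for a state."""
             # Done: the only valid backward action is EOS.
             if done:
                 return [False, True]
             # Assumption: after a single action, the only valid backward
             # action is the one that returns to the source.
             if state[-1] == 1:
                 return [True, False]
             return [False, False]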


   .. py:method:: get_valid_actions(mask = None, state = None, done = None, backward = False)

      Returns the list of non-invalid (valid, for short) actions according to the
      mask of invalid actions.

      As a continuous environment, the returned actions are "representatives", that
      is the actions represented in the action space.

      :param mask: The mask of a state. If None, it is computed in place.
      :type mask: list (optional)
      :param state: A state in GFlowNet format. If None, self.state is used.
      :type state: list (optional)
      :param done: Whether the trajectory is done. If None, self.done is used.
      :type done: bool (optional)
      :param backward: True if the transition is backwards; False if forward.
      :type backward: bool

      :returns: *list* -- The list of representatives of the valid actions.


   .. py:method:: get_parents(state = None, done = None, action = None)

      Defined only because it is required. A ContinuousEnv should be created to avoid
      this issue.


   .. py:method:: step(action, skip_mask_check = False)

      Executes forward step given an action.

      See: _step().

      :param action: Action to be executed. An action is a vector where the value at position d
                     indicates the increment in the angle at dimension d.
      :type action: tuple
      :param skip_mask_check: Ignored because the action space is fully continuous, therefore there
                              is nothing to check.
      :type skip_mask_check: bool

      :returns: * **self.state** (*list*) -- The state after executing the action
                * **action** (*tuple*) -- Action executed
                * **valid** (*bool*) -- False, if the action is not allowed for the current state, e.g. stop at the
                  root state
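
      A simplified sketch of a forward step, assuming a state of the form
      ``[angle_1, ..., angle_D, n_actions]`` and that angles are wrapped back
      into [0, 2pi). Both the wrapping and the helper name ``step_sketch`` are
      assumptions for illustration, and the ``done`` flag is left out:

      .. code-block:: python

         import math

         TWO_PI = 2 * math.pi


         def step_sketch(state, action, length_traj):
             """Apply a forward action and return (state, action, valid)."""
             n_actions = state[-1]
             is_eos = all(math.isinf(a) for a in action)
             at_max_length = n_actions == length_traj
             # EOS is valid only at the end of the trajectory, where it is
             # also the only valid action.
             if is_eos != at_max_length:
                 return state, action, False
             if is_eos:
                 return state, action, True
             # Add the increment to each angle and wrap into [0, 2*pi).
             angles = [
                 (angle + inc) % TWO_PI
                 for angle, inc in zip(state[:-1], action)
             ]
             return angles + [n_actions + 1], action, True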


   .. py:method:: step_backwards(action, skip_mask_check = False)

      Executes backward step given an action.

      See: _step().

      :param action: Action to be executed. An action is a vector where the value at position d
                     indicates the increment in the angle at dimension d.
      :type action: tuple
      :param skip_mask_check: Ignored because the action space is fully continuous, therefore there
                              is nothing to check.
      :type skip_mask_check: bool

      :returns: * **self.state** (*list*) -- The state after executing the action
                * **action** (*tuple*) -- Action executed
                * **valid** (*bool*) -- False, if the action is not allowed for the current state, e.g. stop at the
                  root state


   .. py:method:: states2proxy(states)

      Prepares a batch of states in "environment format" for the proxy: each state is
      a vector of length n_dim where each value is an angle in radians. The n_actions
      item is removed.

      :param states: A batch of states in environment format, either as a list of states or as a
                     single tensor.
      :type states: list or tensor

      :returns: *A tensor containing all the states in the batch.*


   .. py:method:: states2policy(states)

      Prepares a batch of states in "environment format" for the policy model: if
      policy_encoding_dim_per_angle >= 2, then the state (angles) is encoded using
      trigonometric components.

      :param states: A batch of states in environment format, either as a list of states or as a
                     single tensor.
      :type states: list or tensor

      :returns: *A tensor containing all the states in the batch.*
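
      One possible trigonometric encoding of a single angle is sketched below.
      The choice of integer frequencies 1, 2, ... and the ordering of cosines
      before sines are assumptions, not a statement of what the library does,
      and ``encode_angle`` is a hypothetical helper:

      .. code-block:: python

         import math


         def encode_angle(angle, encoding_dim):
             """Encode an angle (radians) with encoding_dim trigonometric values."""
             n_freq = encoding_dim // 2
             # cos(k * angle) and sin(k * angle) for k = 1, ..., n_freq.
             cos_part = [math.cos(k * angle) for k in range(1, n_freq + 1)]
             sin_part = [math.sin(k * angle) for k in range(1, n_freq + 1)]
             return cos_part + sin_part

      An encoding of this kind maps angles 0 and 2pi to the same vector, which
      reflects the periodicity of the torus.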


   .. py:method:: state2readable(state)

      Converts a state (a list of positions) into a human-readable string
      representing a state. Angles are converted into degrees in [0, 360].


   .. py:method:: readable2state(readable)

      Converts a human-readable string representing a state into a state as a list of
      positions. Angles are converted back to radians.


   .. py:method:: sample_actions_batch(policy_outputs, mask = None, states_from = None, is_backward = False, random_action_prob = 0.0, temperature_logits = 1.0)

      Samples a batch of actions from a batch of policy outputs. The angle increments
      that form the actions are sampled from a mixture of von Mises distributions.

      A distinction between forward and backward actions is made and specified by the
      argument is_backward, in order to account for the following special cases:

      Forward:

      - If the number of steps is equal to the maximum, then the only valid action is
        EOS.

      Backward:

      - If the number of steps is equal to 1, then the only valid action is to return
        to the source. The specific action depends on the current state.

      :param policy_outputs: The output of the GFlowNet policy model.
      :type policy_outputs: tensor
      :param mask: The mask containing information about special cases.
      :type mask: tensor
      :param states_from: The states originating the actions, in GFlowNet format.
      :type states_from: tensor
      :param is_backward: True if the actions are backward, False if the actions are forward
                          (default).
      :type is_backward: bool
      :param random_action_prob: The probability of sampling a random action.
      :type random_action_prob: float, optional
      :param temperature_logits: A scalar by which the model outputs are divided to temper the sampling
                                 distribution.
      :type temperature_logits: float, optional
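
      Sampling one angle increment from a mixture of von Mises distributions
      can be sketched with the standard library alone. The sketch assumes that
      mixture weights are given as logits and that the concentration is
      obtained as ``exp(log_concentration) + min_concentration``. Both are
      assumptions, and ``sample_angle_increment`` is a hypothetical helper that
      handles a single dimension rather than a batch:

      .. code-block:: python

         import math
         import random


         def sample_angle_increment(logits, locations, log_concentrations,
                                    min_concentration=0.001, rng=random):
             """Sample one angle increment in radians from the mixture."""
             # Unnormalised softmax weights; random.choices normalises them.
             max_logit = max(logits)
             weights = [math.exp(w - max_logit) for w in logits]
             # Pick a mixture component.
             component = rng.choices(range(len(weights)), weights=weights, k=1)[0]
             # Keep the concentration strictly above the minimum.
             concentration = (
                 math.exp(log_concentrations[component]) + min_concentration
             )
             return rng.vonmisesvariate(locations[component], concentration)

      Passing ``rng=random.Random(seed)`` makes the sketch reproducible.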


   .. py:method:: get_logprobs(policy_outputs, actions, mask, states_from = None, is_backward = False)

      Computes log probabilities of actions given policy outputs and actions.

      :param policy_outputs: The output of the GFlowNet policy model.
      :type policy_outputs: tensor
      :param mask: The mask containing information about special cases.
      :type mask: tensor
      :param actions: The actions (angle increments) from each state in the batch for which to
                      compute the log probability.
      :type actions: list or tensor
      :param states_from: Ignored.
      :type states_from: tensor
      :param is_backward: Ignored.
      :type is_backward: bool
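
      For a single dimension, the log probability of an angle increment under a
      mixture of von Mises distributions can be sketched as below. The sketch
      takes concentrations directly (not their logarithms), approximates the
      Bessel function with a truncated series that is adequate only for
      moderate concentrations, and ignores special cases such as EOS. All
      function names are hypothetical:

      .. code-block:: python

         import math


         def log_bessel_i0(kappa, n_terms=50):
             """Log of the modified Bessel function I0 via its power series."""
             total, term = 1.0, 1.0
             for k in range(1, n_terms):
                 term *= (kappa / 2) ** 2 / k ** 2
                 total += term
             return math.log(total)


         def vonmises_logpdf(x, loc, kappa):
             """Log density of a von Mises distribution at angle x."""
             return (
                 kappa * math.cos(x - loc)
                 - math.log(2 * math.pi)
                 - log_bessel_i0(kappa)
             )


         def mixture_logprob(x, logits, locations, concentrations):
             """Log density of x under the mixture, via log-sum-exp."""
             max_logit = max(logits)
             log_norm = max_logit + math.log(
                 sum(math.exp(w - max_logit) for w in logits)
             )
             terms = [
                 w - log_norm + vonmises_logpdf(x, loc, kappa)
                 for w, loc, kappa in zip(logits, locations, concentrations)
             ]
             max_term = max(terms)
             return max_term + math.log(sum(math.exp(t - max_term) for t in terms))

      If the dimensions are treated as independent, the log probability of a
      full action would be the sum of the per-dimension values.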


   .. py:method:: copy()


   .. py:method:: get_grid_terminating_states(n_states)

      Samples n terminating states by sub-sampling the state space as a grid, where n
      / n_dim points are obtained for each dimension.

      :param n_states: The number of terminating states to sample.
      :type n_states: int

      :returns: **states** (*list*) -- A list of randomly sampled terminating states.


   .. py:method:: get_uniform_terminating_states(n_states, seed = None)


   .. py:method:: fit_kde(samples, kernel = 'gaussian', bandwidth = 0.1)

      Fits a Kernel Density Estimator on a batch of samples.

      The samples are previously augmented in order to account for the periodic
      aspect of the sample space.

      :param samples: A batch of samples in proxy format.
      :type samples: tensor
      :param kernel: An identifier of the kernel to use for the density estimation. It must be a
                     valid kernel for the scikit-learn method
                     :py:meth:`sklearn.neighbors.KernelDensity`.
      :type kernel: str
      :param bandwidth: The bandwidth of the kernel.
      :type bandwidth: float


   .. py:method:: plot_reward_samples(samples, samples_reward, rewards, min_domain = -np.pi, max_domain = 3 * np.pi, alpha = 0.5, dpi = 150, max_samples = 500, **kwargs)

      Plots the reward contour alongside a batch of samples.

      The samples are previously augmented in order to visualise the periodic aspect
      of the sample space. It is assumed that the rewards are sorted from left to
      right (first) and top to bottom of the grid of samples.

      :param samples: A batch of samples from the GFlowNet policy in proxy format. These samples
                      will be plotted on top of the reward density.
      :type samples: tensor
      :param samples_reward: A batch of samples containing a grid over the sample space, from which the
                             reward has been obtained. Ignored by this method.
      :type samples_reward: tensor
      :param rewards: The rewards of samples_reward. It should be a vector of dimensionality
                      n_per_dim ** 2 and be sorted such that each block at rewards[i *
                      n_per_dim:i * n_per_dim + n_per_dim] corresponds to the rewards at the i-th
                      row of the grid of samples, from top to bottom.
      :type rewards: tensor
      :param min_domain: Minimum value of the domain to keep in the plot.
      :type min_domain: float
      :param max_domain: Maximum value of the domain to keep in the plot.
      :type max_domain: float
      :param alpha: Transparency of the reward contour.
      :type alpha: float
      :param dpi: Dots per inch, indicating the resolution of the plot.
      :type dpi: int
      :param max_samples: Maximum number of samples to include in the plot.
      :type max_samples: int


   .. py:method:: plot_kde(samples, kde, alpha = 0.5, dpi=150, colorbar = True, **kwargs)

      Plots the density previously estimated from a batch of samples via KDE over the
      entire sample space.

      :param samples: A batch of samples containing a grid over the sample space. These samples
                      are used to plot the contour of the estimated density.
      :type samples: tensor
      :param kde: A scikit-learn KDE object fit with a batch of samples.
      :type kde: KDE
      :param alpha: Transparency of the density contour.
      :type alpha: float
      :param dpi: Dots per inch, indicating the resolution of the plot.
      :type dpi: int


   .. py:method:: augment_samples(samples, exclude_original = False)
      :staticmethod:


      Augments a batch of samples by applying the periodic boundary conditions from
      [0, 2pi) to [-2pi, 4pi) for all dimensions.
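
      The idea can be sketched in pure Python: every sample is copied with
      shifts of -2pi, 0 and +2pi in each dimension, giving 3 ** n_dim copies.
      This is an illustrative sketch operating on lists rather than tensors,
      not the library implementation, and ``augment_samples_sketch`` is a
      hypothetical name:

      .. code-block:: python

         import itertools
         import math


         def augment_samples_sketch(samples, exclude_original=False):
             """Replicate samples in [0, 2*pi) over [-2*pi, 4*pi) per dimension."""
             n_dim = len(samples[0])
             offsets = (-2 * math.pi, 0.0, 2 * math.pi)
             augmented = []
             for shift in itertools.product(offsets, repeat=n_dim):
                 # The all-zero shift reproduces the original samples.
                 if exclude_original and all(s == 0.0 for s in shift):
                     continue
                 for sample in samples:
                     augmented.append([x + s for x, s in zip(sample, shift)])
             return augmented

      For a two-dimensional torus, each sample yields 9 copies, or 8 when
      ``exclude_original`` is True.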