gflownet.envs.composite.setbase
===============================

.. py:module:: gflownet.envs.composite.setbase

.. autoapi-nested-parse::

   Classes implementing the family of Set meta-environments, which allow combining
   multiple sub-environments in no specific order.


Classes
-------

.. autoapisummary::

   gflownet.envs.composite.setbase.BaseSet


Module Contents
---------------

.. py:class:: BaseSet(can_alternate_subenvs=True, **kwargs)

   Bases: :py:obj:`gflownet.envs.composite.base.CompositeBase`


   Initializes the BaseSet.

   :param can_alternate_subenvs: If True, actions of different sub-environments can alternate and each
                                 sub-environment action is preceded and followed by a meta-action to toggle
                                 the sub-environment. If False, once a sub-environment is activated, only
                                 actions of that sub-environment can be performed until it gets done (its
                                 EOS action is performed).
   :type can_alternate_subenvs: bool


   .. py:attribute:: can_alternate_subenvs
      :value: True


   .. py:property:: n_toggle_actions
      :type: int


      Returns the number of actions to toggle sub-environments or unique environments.

      If the Set allows alternating actions between sub-environments, the number of
      toggle actions is the number of sub-environments. Otherwise, toggle actions
      activate unique environments and the number of unique environments is returned.
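
      As a rough illustration of the rule above, the following sketch counts toggle
      actions for a list of sub-environment type names. The helper
      ``count_toggle_actions`` and the use of type names to identify unique
      environments are assumptions made for this example; they are not part of the
      library.

      .. code-block:: python

         def count_toggle_actions(subenv_types, can_alternate_subenvs):
             """Hypothetical helper mirroring the rule described above."""
             if can_alternate_subenvs:
                 # One toggle action per sub-environment.
                 return len(subenv_types)
             # One toggle action per unique environment.
             return len(set(subenv_types))


         # Assumed example: two sub-environments of one type and one of another.
         subenv_types = ["grid", "grid", "cube"]
         n_alternating = count_toggle_actions(subenv_types, True)
         n_sequential = count_toggle_actions(subenv_types, False)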


   .. py:method:: get_action_space()

      Constructs a list with all possible actions, including EOS.

      The action space of a Set environment consists of:
          - The actions to activate specific sub-environments or unique environments.
          - The EOS action.
          - The concatenation of the actions of all unique environments.

      In order to make all actions the same length (required to construct batches of
      actions as a tensor), the actions are zero-padded from the back.

      In order to make all actions unique, the unique environment index is added as
      the first element of the action.

      Note that the actions of unique environments are only added once to the action
      space, regardless of how many elements of the unique environment
      (sub-environments) there are in the set. In other words, identical environments
      that are part of the Set share the actions and a given action will have an
      effect on the sub-environment that is active.

      The actions to activate a specific sub-environment are represented as:
      (-1, subenv index, ZERO-PADDING)

      See:
      - :py:meth:`~gflownet.envs.composite.setbase.BaseSet._pad_action`
      - :py:meth:`~gflownet.envs.composite.setbase.BaseSet._depad_action`
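
      The padding scheme described above can be sketched as follows. The functions
      ``pad_action``, ``depad_action`` and ``toggle_action`` are hypothetical
      stand-ins written for this example. The actual ``_pad_action`` and
      ``_depad_action`` methods may use different signatures; in particular,
      passing the original action length to ``depad_action`` is an assumption.

      .. code-block:: python

         def pad_action(action, idx_unique, action_dim):
             """Prefix with the unique environment index, then zero-pad from the back."""
             padded = (idx_unique,) + tuple(action)
             return padded + (0,) * (action_dim - len(padded))


         def depad_action(action, action_len):
             """Drop the index prefix and the padding (original length assumed known)."""
             return tuple(action[1 : 1 + action_len])


         def toggle_action(idx_subenv, action_dim):
             """Build an action of the form (-1, subenv index, ZERO-PADDING)."""
             return (-1, idx_subenv) + (0,) * (action_dim - 2)


         action_dim = 4
         padded = pad_action((2, 1), idx_unique=0, action_dim=action_dim)
         restored = depad_action(padded, action_len=2)
         toggle = toggle_action(1, action_dim)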


   .. py:method:: action_produces_permutation(action, is_backward = False)

      Determines whether an action produces permutations in the resulting state.

      The Set introduces actions that produce permutations, in particular in the key
      ``_keys`` of the state. These actions are introduced if
      ``self.can_alternate_subenvs`` is False.

      In particular, the actions that produce permutations are backward actions that
      toggle a sub-environment.

      Note that this method does not check whether all relevant substates are
      identical, in which case there is effectively only one permutation.
      Instead, True is returned if the action *could* produce permutations in the
      resulting state.

      :param action: An action of the environment.
      :type action: tuple
      :param is_backward: Whether the transition to consider is backward (True) or forward (False).
      :type is_backward: bool

      :returns: *bool* -- Whether the input action produces permutations in the resulting state, in
                the direction indicated by ``is_backward``.


   .. py:method:: get_mask_invalid_actions_forward(state = None, done = None)

      Computes the forward actions mask of the state.

      The mask of the Set environment is the concatenation of the following:

      - A one-hot encoding of the index of the sub-environment or unique environment
        (True at the index of the active environment). All False if the only valid
        actions are meta-actions.
      - Actual (main) mask of invalid actions:

        - The mask of the actions to activate a sub-environment or unique
          environment, OR
        - The mask of the active sub-environment.

      The mask is False-padded from the back up to mask_dim.
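
      The layout described above can be sketched with plain Python lists. The
      helper ``build_set_mask`` and its arguments are hypothetical; in particular,
      the length of the one-hot part (``n_envs``) is an assumption of this example.

      .. code-block:: python

         def build_set_mask(idx_active, n_envs, main_mask, mask_dim):
             """Concatenate the one-hot part and the main mask, then False-pad."""
             # One-hot encoding of the active environment; all False if idx_active is -1.
             one_hot = [i == idx_active for i in range(n_envs)]
             mask = one_hot + list(main_mask)
             # False-padding from the back up to mask_dim.
             return mask + [False] * (mask_dim - len(mask))


         # Environment 1 is active and its own mask is [True, False].
         mask_active = build_set_mask(1, 3, [True, False], mask_dim=7)
         # No active environment: the main mask covers the three toggle actions.
         mask_toggle = build_set_mask(-1, 3, [False, True, False], mask_dim=7)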


   .. py:method:: get_mask_invalid_actions_backward(state = None, done = None)

      Computes the backward actions mask of the state.

      The mask of the Set environment is the concatenation of the following:

      - A one-hot encoding of the index of the subenv (True at the index of the
        active environment). All False if no sub-environment is active.
      - Actual (main) mask of invalid actions:

        - The mask of the actions to activate a sub-environment, OR
        - The mask of the active sub-environment.

      The mask is False-padded from the back up to mask_dim.


   .. py:method:: mask_conditioning(mask, env_cond, backward)

      Conditions the input mask based on the restrictions imposed by a conditioning
      environment, env_cond.

      This method is overridden because the base mask_conditioning would change the
      mask unaware of the special Set format. Therefore, this method calls the
      mask_conditioning() method of the currently relevant sub-environment and
      returns the mask with the correct Set format.


   .. py:method:: step(action, skip_mask_check = False)

      Executes forward step given an action.

      Actions may be either sub-environment actions or set actions. If the former,
      the action is performed by the corresponding sub-environment and then the set
      state is updated accordingly. If the latter, no sub-environment is involved and
      the changes are in the meta-data of the state (active subenv and toggle flag).

      Because the same action may correspond to multiple sub-environments, the action
      will always be performed on the active sub-environment.

      - Toggle actions:
          - Activate the corresponding sub-environment if no sub-environment is
            currently active.
              - If can_alternate_subenvs is True, the toggle flag is set to 1.
          - Reset the active sub-environment flag to -1 if a sub-environment is
            currently active.
              - The toggle flag is expected to be 0 and it remains 0.
      - Environment actions:
          - Updates the corresponding sub-environment as well as the set state.
          - If can_alternate_subenvs is True, the toggle flag is set to 0.

      :param action: Action to be executed. The input action is global, that is padded.
      :type action: tuple

      :returns: * **self.state** (*dict*) -- The state after executing the action.
                * **action** (*tuple*) -- Action executed.
                * **valid** (*bool*) -- False, if the action is not allowed for the current state. True otherwise.
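
      The meta-data updates listed above can be sketched as a small state machine.
      This is a simplified illustration, not the library's implementation: the
      function ``forward_meta_update`` is hypothetical, the meta-data is reduced to
      two integers (active sub-environment and toggle flag), toggle actions are
      assumed to start with -1, and the update of the sub-environment itself is
      omitted.

      .. code-block:: python

         def forward_meta_update(active, toggle_flag, action, can_alternate_subenvs=True):
             """Return the (active, toggle_flag) pair after a forward action."""
             if action[0] == -1:  # toggle action: (-1, subenv index, padding)
                 if active == -1:
                     # Activate the requested sub-environment.
                     active = action[1]
                     if can_alternate_subenvs:
                         toggle_flag = 1
                 else:
                     # Deactivate: the toggle flag is expected to be 0 and stays 0.
                     if toggle_flag != 0:
                         raise ValueError("Toggle flag expected to be 0")
                     active = -1
             elif can_alternate_subenvs:
                 # Sub-environment action: the toggle flag is set to 0.
                 toggle_flag = 0
             return active, toggle_flag


         after_toggle = forward_meta_update(-1, 0, (-1, 1, 0))
         after_env_action = forward_meta_update(*after_toggle, (0, 5, 0))
         after_untoggle = forward_meta_update(*after_env_action, (-1, 1, 0))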


   .. py:method:: step_backwards(action, skip_mask_check = False)

      Executes backward step given an action.

      Actions may be either sub-environment actions or set actions. If the former,
      the action is performed by the corresponding sub-environment and then the set
      state is updated accordingly. If the latter, no sub-environment is involved and
      the changes are in the meta-data of the state (active subenv and toggle flag).

      Because the same action may correspond to multiple sub-environments, the action
      will always be performed on the active sub-environment.

      - Toggle actions:
          - Activate the corresponding sub-environment if no sub-environment is
            currently active.
          - Reset the active sub-environment flag to -1 if a sub-environment is
            currently active.
          - Set the toggle flag to 0.
      - Environment actions:
          - Updates the corresponding sub-environment as well as the set state.
          - If can_alternate_subenvs is True, the toggle flag is set to 1.

      :param action: Action to be executed. The input action is global, that is padded.
      :type action: tuple

      :returns: * **self.state** (*dict*) -- The state after executing the action.
                * **action** (*tuple*) -- Action executed.
                * **valid** (*bool*) -- False, if the action is not allowed for the current state. True otherwise.
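
      The backward meta-data updates listed above can be sketched in the same
      simplified way. The function ``backward_meta_update`` is hypothetical, the
      meta-data is reduced to two integers (active sub-environment and toggle
      flag), toggle actions are assumed to start with -1, and the update of the
      sub-environment itself is omitted.

      .. code-block:: python

         def backward_meta_update(active, toggle_flag, action, can_alternate_subenvs=True):
             """Return the (active, toggle_flag) pair after a backward action."""
             if action[0] == -1:  # toggle action: (-1, subenv index, padding)
                 # Activate if nothing is active, otherwise deactivate.
                 active = action[1] if active == -1 else -1
                 # In both cases the toggle flag is set to 0.
                 toggle_flag = 0
             elif can_alternate_subenvs:
                 # Sub-environment action: the toggle flag is set to 1.
                 toggle_flag = 1
             return active, toggle_flag


         # Walk backwards from a state with no active sub-environment.
         back_toggle = backward_meta_update(-1, 0, (-1, 1, 0))
         back_env_action = backward_meta_update(*back_toggle, (0, 5, 0))
         back_untoggle = backward_meta_update(*back_env_action, (-1, 1, 0))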


   .. py:method:: get_parents(state = None, done = None, action = None)

      Determines all parents and actions that lead to state.

      :param state: State in environment format. If None, self.state is used.
      :type state: dict
      :param done: Whether the trajectory is done. If None, self.done is used.
      :type done: bool
      :param action: Ignored.
      :type action: tuple

      :returns: * **parents** (*list*) -- List of parents in state format
                * **actions** (*list*) -- List of actions that lead to state for each parent in parents


   .. py:method:: sample_actions_batch(policy_outputs, mask = None, states_from = None, is_backward = False, random_action_prob = 0.0, temperature_logits = 1.0)

      Samples a batch of actions from a batch of policy outputs.

      This method calls the sample_actions_batch() method of the sub-environment
      corresponding to each state in the batch, or samples the actions to activate a
      sub-environment for the states with no active sub-environment.

      Note that in order to call sample_actions_batch() of the sub-environments, we
      need to first extract the part of the policy outputs, the masks and the states
      that correspond to the sub-environment.


   .. py:method:: get_logprobs(policy_outputs, actions, mask, states_from, is_backward)

      Computes log probabilities of actions given policy outputs and actions.

      :param policy_outputs: The output of the GFlowNet policy model.
      :type policy_outputs: tensor
      :param mask: The mask containing information about invalid actions and special cases.
      :type mask: tensor
      :param actions: The actions (global) from each state in the batch for which to compute the
                      log probability.
      :type actions: list or tensor
      :param states_from: The states originating the actions, in environment format.
      :type states_from: tensor
      :param is_backward: True if the actions are backward, False if the actions are forward
                          (default).
      :type is_backward: bool


   .. py:method:: action2representative(action)

      Replaces the part of the action associated with a sub-environment by its
      representative. The part of the action that identifies the sub-environment
      concerned by the action remains unaffected.

      :param action: An action of the Set environment (padded)
      :type action: tuple

      :returns: *tuple* -- A representative of the action, re-padded as a Set action that should be in
                the action space.


   .. py:method:: get_valid_actions(mask = None, state = None, done = None, backward = False)

      Returns the list of non-invalid (valid, for short) actions according to the
      mask of invalid actions.

      This method is overridden because the mask of a Set of environments does not
      cover the entire action space, but only the relevant sub-environment or the
      toggle actions, depending on the state. Therefore, this method calls the
      get_valid_actions() method of the active sub-environment or retrieves the valid
      toggle actions and returns the padded actions.


   .. py:method:: get_policy_output(params)

      Defines the structure of the output of the policy model.

      This method is overridden to add the policy outputs corresponding to the Set
      actions (actions to activate a sub-environment and EOS). The policy output is
      the concatenation of the policy outputs of the Set actions and the policy
      outputs of the unique environments, the latter obtained from the parent's
      method.

      :param params: A list of distribution parameters. This list has as many elements as
                     there are unique environments, since all sub-environments of the same
                     environment type are expected to be identical.
      :type params: list


   .. py:method:: is_source(state = None)

      Returns True if the environment's state or the state passed as parameter (if
      not None) is the source state of the environment.

      This method is overridden for efficiency (for example, it would return False
      immediately if the meta-data part of the state is not the source's) and to
      cover special uses of the Set.

      :param state: None, or a state in environment format.
      :type state: dict

      :returns: *bool* -- Whether the state is the source state of the environment


   .. py:method:: equal(state_x, state_y)

      Checks whether the two input states are equal.

      This method is overridden in order to account for the fact that states with
      permuted substates must be considered equal if the permutations are indeed
      equivalent. The permutation of substates is not done by permuting the
      substates directly but by permuting the list of keys in ``state["_keys"]``.

      Thus, this method returns True if all keys of the state dictionary are equal
      (except ``_keys`` which is ignored) and the substates are equal, after
      accounting for the permutation.

      This method uses the parent method in order to compare the substates. If a
      substate is a dictionary containing the key ``_keys``, then it is assumed to be
      a Set state and the current method is used. If Set states appear deeper in the
      substates, the comparison may not behave as expected.

      :param state_x: One of the Set states to be compared.
      :type state_x: dict
      :param state_y: The other Set state to be compared.
      :type state_y: dict

      :returns: *bool* -- True if the two input states are equal; False otherwise.
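
      One possible reading of this comparison is sketched below. It is not the
      library's implementation: the function ``states_equal`` is hypothetical, and
      the state layout used here (substates stored under integer keys, their order
      given by ``_keys``, and a single extra meta-data key ``_active``) is an
      assumption of this example. Nested Set states are not handled.

      .. code-block:: python

         def states_equal(state_x, state_y):
             """Compare two Set-like states, following the order given by ``_keys``."""
             # Compare the meta-data entries, ignoring ``_keys``.
             meta_x = {k: v for k, v in state_x.items() if isinstance(k, str) and k != "_keys"}
             meta_y = {k: v for k, v in state_y.items() if isinstance(k, str) and k != "_keys"}
             if meta_x != meta_y:
                 return False
             keys_x, keys_y = state_x["_keys"], state_y["_keys"]
             if len(keys_x) != len(keys_y):
                 return False
             # Compare substates position by position after applying each permutation.
             return all(state_x[kx] == state_y[ky] for kx, ky in zip(keys_x, keys_y))


         x = {"_active": -1, "_keys": [0, 1], 0: [1, 2], 1: [3, 4]}
         # Same substates as x, stored under swapped keys with a swapped key order.
         y = {"_active": -1, "_keys": [1, 0], 0: [3, 4], 1: [1, 2]}
         # Swapped substates without a matching change in the key order.
         z = {"_active": -1, "_keys": [0, 1], 0: [3, 4], 1: [1, 2]}
         x_equals_y = states_equal(x, y)
         x_equals_z = states_equal(x, z)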


   .. py:method:: __eq__(other, ignored_keys = [])

      Checks whether the current environment instance is equal to the input
      environment instance.

      This method is overridden to ignore the keys:
          - ``envs_unique_cache``

      :param other: The environment instance to be compared.
      :type other: GFlowNetEnv
      :param ignored_keys: A list of keys (strings) to be ignored in the comparison. This parameter
                           may be used by subclasses that may need to ignore certain keys.
      :type ignored_keys: list

      :returns: *bool* -- True if the environments' attributes are considered equal; False otherwise.