gflownet.envs.tetris
====================

.. py:module:: gflownet.envs.tetris

.. autoapi-nested-parse::

   An environment inspired by the game of Tetris.


Attributes
----------

.. autoapisummary::

   gflownet.envs.tetris.PIECES
   gflownet.envs.tetris.PIECES_COLORS


Classes
-------

.. autoapisummary::

   gflownet.envs.tetris.Tetris


Module Contents
---------------

.. py:data:: PIECES

.. py:data:: PIECES_COLORS

.. py:class:: Tetris(width = 10, height = 20, pieces = ['I', 'J', 'L', 'O', 'S', 'T', 'Z'], rotations = [0, 90, 180, 270], allow_redundant_rotations = False, allow_eos_before_full = False, **kwargs)

   Bases: :py:obj:`gflownet.envs.base.GFlowNetEnv`


   Tetris environment: an environment inspired by the game of Tetris. It is not
   meant to be a playable game, but rather a toy environment with an intuitive state
   and action space.

   The state space is a 2D board, with all the combinations of pieces on it. Each piece
   added to the board is identified by a number that starts from
   piece_idx * max_pieces_per_type and is incremented by 1 with each new piece of
   the same type. This number fills the cells of the board where the piece is
   located, which makes it possible to tell apart pieces of the same type.
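
   As a rough illustration of the numbering scheme described above, the following
   sketch computes the identifier of a new piece and recovers the piece type from a
   cell value. The helper names are hypothetical and not part of the class; 100 is the
   documented default of max_pieces_per_type. Indexing piece types from 1, so that 0
   can stand for an empty cell, is an assumption.

   .. code-block:: python

      MAX_PIECES_PER_TYPE = 100  # documented default of max_pieces_per_type


      def piece_id(piece_idx, n_same_type_on_board):
          """Identifier written into the cells of a newly added piece.

          piece_idx: index of the piece type (assumed to start at 1).
          n_same_type_on_board: pieces of that type already on the board.
          """
          return piece_idx * MAX_PIECES_PER_TYPE + n_same_type_on_board


      def piece_type_of(cell_value):
          """Recovers the piece type index from a non-empty cell value."""
          return cell_value // MAX_PIECES_PER_TYPE


      first = piece_id(2, 0)   # first piece of type 2
      second = piece_id(2, 1)  # second piece of type 2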

   An action is the choice of a piece, its rotation and the horizontal location
   where the piece is dropped. The action space may be constrained as needed.

   .. attribute:: width

      Width of the board.

      :type: int

   .. attribute:: height

      Height of the board.

      :type: int

   .. attribute:: pieces

      Pieces to use, identified by [I, J, L, O, S, T, Z]

      :type: list

   .. attribute:: rotations

      Valid rotations, from [0, 90, 180, 270]

      :type: list


   .. py:attribute:: device


   .. py:attribute:: int


   .. py:attribute:: width
      :value: 10


   .. py:attribute:: height
      :value: 20


   .. py:attribute:: pieces
      :value: ['I', 'J', 'L', 'O', 'S', 'T', 'Z']


   .. py:attribute:: rotations
      :value: [0, 90, 180, 270]


   .. py:attribute:: allow_redundant_rotations
      :value: False


   .. py:attribute:: allow_eos_before_full
      :value: False


   .. py:attribute:: max_pieces_per_type
      :value: 100


   .. py:attribute:: piece2idx


   .. py:attribute:: idx2piece


   .. py:attribute:: piece2mat


   .. py:attribute:: rot2idx


   .. py:attribute:: source


   .. py:attribute:: eos


   .. py:attribute:: piece_rotation_mat


   .. py:attribute:: piece_rotation_mask_mat


   .. py:method:: get_action_space()

      Constructs a list with all possible actions, including eos. An action is
      represented by a tuple of length 3 (piece, rotation, col). The piece is
      represented by its index, the rotation by the integer rotation in degrees,
      and the location by the horizontal cell of the board occupied by the left-most
      part of the piece.
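
      The construction described above can be sketched as follows. This is a minimal
      illustration, not the class's implementation: the SHAPES dictionary, the
      rotate90 helper, the 1-based piece index and the eos tuple (-1, -1, -1) are all
      assumptions made for the example, and redundant rotations are not filtered out.

      .. code-block:: python

         # Hypothetical piece shapes as 0/1 matrices (list of rows); not the module's PIECES.
         SHAPES = {
             "I": [[1], [1], [1], [1]],
             "O": [[1, 1], [1, 1]],
         }


         def rotate90(mat):
             """Rotates a matrix by 90 degrees clockwise."""
             return [list(row) for row in zip(*mat[::-1])]


         def build_action_space(pieces, rotations, width, eos=(-1, -1, -1)):
             """Lists all (piece, rotation, col) tuples that fit in the board width."""
             actions = []
             for piece_idx, piece in enumerate(pieces, start=1):
                 for rot in rotations:
                     mat = SHAPES[piece]
                     for _ in range(rot // 90):
                         mat = rotate90(mat)
                     piece_width = len(mat[0])
                     # col is the board column of the left-most part of the piece.
                     for col in range(width - piece_width + 1):
                         actions.append((piece_idx, rot, col))
             actions.append(eos)  # assumed encoding of the end-of-sequence action
             return actions


         actions = build_action_space(["I", "O"], [0, 90], width=4)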


   .. py:method:: get_mask_invalid_actions_forward(state = None, done = None)

      Returns a list with one entry per action in the action space, with values:

      - True if the forward action is invalid from the current state.
      - False otherwise.
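
      Since True marks an invalid action, the mask has to be negated when selecting
      the actions that can be taken. The snippet below illustrates the convention with
      hand-written values; the action tuples, the eos encoding (-1, -1, -1) and the
      mask are hypothetical and were not produced by the environment.

      .. code-block:: python

         # Hypothetical action space: tuples (piece, rotation, col), with eos last.
         action_space = [(1, 0, 0), (1, 0, 1), (2, 90, 0), (-1, -1, -1)]
         # Hypothetical mask for some state: True marks an invalid action.
         mask = [False, True, False, True]

         # Keep only the actions whose mask entry is False.
         valid_actions = [a for a, invalid in zip(action_space, mask) if not invalid]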


   .. py:method:: states2proxy(states)

      Prepares a batch of states in "environment format" for a proxy: simply
      converts non-zero (non-empty) cells into 1s.

      :param states: A batch of states in environment format, either as a list of states or as a
                     single tensor.
      :type states: list of 2D tensors or 3D tensor

      :returns: *A tensor containing all the states in the batch.*
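
      The conversion can be pictured with the following sketch, which uses nested
      Python lists instead of tensors to stay dependency-free. The binarize helper is
      hypothetical and only mirrors the described behaviour.

      .. code-block:: python

         def binarize(states):
             """Maps every non-zero (non-empty) cell to 1 and every empty cell to 0."""
             return [
                 [[1 if cell != 0 else 0 for cell in row] for row in board]
                 for board in states
             ]


         # A batch with a single 2 x 3 board holding two pieces (ids 100 and 401).
         batch = [[[0, 100, 100], [401, 401, 0]]]
         proxy_batch = binarize(batch)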


   .. py:method:: states2policy(states)

      Prepares a batch of states in "environment format" for the policy model.

      See states2proxy().

      :param states: A batch of states in environment format, either as a list of states or as a
                     single tensor.
      :type states: list of 2D tensors or 3D tensor

      :returns: *A tensor containing all the states in the batch.*


   .. py:method:: state2readable(state = None)

      Converts a state (board) into a human-friendly string.


   .. py:method:: readable2state(readable, alphabet={})

      Converts a human-readable string representing a state into a state as a list of
      positions.


   .. py:method:: get_parents(state = None, done = None, action = None)

      Determines all parents and actions that lead to state.

      See: _is_parent_action()

      :param state: Representation of a state (board).
      :type state: list
      :param done: Whether the trajectory is done. If None, done is taken from instance.
      :type done: bool
      :param action: Ignored
      :type action: None

      :returns: * **parents** (*list*) -- List of parents in state format
                * **actions** (*list*) -- List of actions that lead to state for each parent in parents


   .. py:method:: step(action, skip_mask_check = False)

      Executes step given an action.

      :param action: Action to be executed. An action is a tuple of three int values
                     (piece, rotation, col), as described in get_action_space().
      :type action: tuple
      :param skip_mask_check: If True, skip computing forward mask of invalid actions to check if the
                              action is valid.
      :type skip_mask_check: bool

      :returns: * **self.state** (*list*) -- The state after executing the action
                * **action** (*tuple*) -- Action executed
                * **valid** (*bool*) -- False if the action is not allowed for the current state, True otherwise.
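
      To give an intuition of what dropping a piece at a column involves, here is a
      simplified sketch on a board stored as a list of rows (top row first). It is an
      assumption-laden illustration, not the method's code: the drop_piece helper is
      hypothetical, the piece must fit in the top rows of the board before falling,
      and the piece is already given in its rotated form.

      .. code-block:: python

         def drop_piece(board, piece, col, piece_id):
             """Drops a 0/1 piece matrix with its left-most column at col.

             Returns (new_board, valid); the input board is left unchanged.
             """
             height, width = len(board), len(board[0])
             p_h, p_w = len(piece), len(piece[0])
             if col < 0 or col + p_w > width or p_h > height:
                 return board, False

             def collides(top):
                 # True if the piece placed with its top row at `top` overlaps a filled cell.
                 return any(
                     piece[r][c] and board[top + r][col + c]
                     for r in range(p_h)
                     for c in range(p_w)
                 )

             if collides(0):
                 return board, False
             top = 0
             # Let the piece fall one row at a time until it would collide or leave the board.
             while top + 1 + p_h <= height and not collides(top + 1):
                 top += 1
             new_board = [row[:] for row in board]
             for r in range(p_h):
                 for c in range(p_w):
                     if piece[r][c]:
                         new_board[top + r][col + c] = piece_id
             return new_board, True


         O = [[1, 1], [1, 1]]
         empty = [[0, 0, 0] for _ in range(4)]  # height 4, width 3
         b1, ok1 = drop_piece(empty, O, 0, 400)
         b2, ok2 = drop_piece(b1, O, 1, 401)
         b3, ok3 = drop_piece(b2, O, 0, 402)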


   .. py:method:: set_state(state, done = False)

      Sets the state and done. If done is True but incompatible with the state (that is,
      allow_eos_before_full is False and the state is not full), done is forced to
      False and a warning is printed. Also ensures that the state is a tensor.
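
      The consistency rule for done can be written as a small decision function. The
      sketch below is hypothetical: resolve_done is not part of the class, and whether
      the board is full is passed in as a flag rather than computed from a state.

      .. code-block:: python

         import warnings


         def resolve_done(done, board_is_full, allow_eos_before_full):
             """Returns the value of done that is compatible with the state."""
             if done and not allow_eos_before_full and not board_is_full:
                 warnings.warn("done=True is incompatible with a non-full board; using done=False")
                 return False
             return done


         forced = resolve_done(True, board_is_full=False, allow_eos_before_full=False)
         kept = resolve_done(True, board_is_full=False, allow_eos_before_full=True)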


   .. py:method:: plot_samples_topk(samples, rewards, k_top = 10, n_rows = 2, dpi = 150, **kwargs)

      Plot tetris boards of top K samples.

      :param samples: List of terminating states sampled from the policy.
      :type samples: list
      :param rewards: Rewards of the samples.
      :type rewards: list
      :param k_top: The number of samples that will be included in the plot. The k_top samples
                    with the highest reward are selected.
      :type k_top: int
      :param n_rows: Number of rows in the plot. The number of columns is calculated
                     from n_rows and k_top.
      :type n_rows: int
      :param dpi: DPI (dots per inch) of the figure, to determine the resolution.
      :type dpi: int
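
      The selection and layout logic can be sketched without any plotting code. The
      topk_layout helper is hypothetical, and rounding the number of columns up with
      a ceiling division is an assumption about how n_rows and k_top are combined.

      .. code-block:: python

         import math


         def topk_layout(rewards, k_top=10, n_rows=2):
             """Returns the indices of the k_top highest rewards and the number of columns."""
             k = min(k_top, len(rewards))
             order = sorted(range(len(rewards)), key=lambda i: rewards[i], reverse=True)[:k]
             n_cols = math.ceil(k / n_rows)  # assumed rule for the grid width
             return order, n_cols


         indices, n_cols = topk_layout([0.1, 0.9, 0.5, 0.7, 0.3], k_top=3, n_rows=2)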