gflownet.buffer.base
====================

.. py:module:: gflownet.buffer.base

.. autoapi-nested-parse::

   Base Buffer class to handle train and test data sets, replay buffer, etc.


Classes
-------

.. autoapisummary::

   gflownet.buffer.base.BaseBuffer


Module Contents
---------------

.. py:class:: BaseBuffer(env, proxy, datadir, replay_buffer = None, replay_capacity = 0, train = None, test = None, use_main_buffer=False, check_diversity = False, diversity_check_reward_similarity = 0.1, progress_process_dataset = False, **kwargs)

   Initializes the Buffer.

   :param datadir: The directory where the data sets and buffers are stored. The default is
                   ./data/, but the logger sets it first and passes it to the Buffer for
                   consistency, in particular to handle resumed runs.
   :type datadir: str or PosixPath
   :param replay_buffer: A path to a file containing a replay buffer. If provided, the initial
                         replay buffer will be loaded from this file. This is useful for
                         resuming runs. By default it is None, which initializes an empty buffer and
                         creates a new file.
   :type replay_buffer: str or PosixPath
   :param replay_capacity: Size of the replay buffer. By default, it is zero, thus no replay buffer is
                           used.
   :type replay_capacity: int
   :param train: A dictionary describing the training data. The dictionary can have the
                 following keys:
                     - type : str
                         Type of data. It can be one of the following:
                             - pkl: a pickled file. Requires path.
                             - csv: a CSV file. Requires path.
                             - all: all terminating states of the environment.
                             - grid: a grid of terminating states. Requires n.
                             - uniform: terminating states uniformly sampled. Requires n.
                             - random: terminating states sampled randomly from the initial
                               GFN policy. Requires n.
                     - path : str
                         Path to a CSV or pickled file (for type={pkl, csv})
                     - n : int
                         Number of samples (for type={grid, uniform, random})
                     - seed : int
                         Seed for random sampling (for type={uniform, random})
   :type train: dict
   :param test: A dictionary describing the test data. It has the same structure as the
                train dictionary.
   :type test: dict
   :param use_main_buffer: If True, a main buffer is kept up to date, that is all training samples are
                           added to a buffer. It is False by default because of the potentially large
                           memory usage it can incur.
   :type use_main_buffer: bool
   :param check_diversity: If True, new samples are only added to the buffer if they are not close to
                           any of the samples already present in the buffer. env.isclose() is used
                           for the comparison. It is False by default because this comparison can
                           easily take most of the running time with an uncertain impact on the
                           performance. The implementation should be improved to make this functional.
   :type check_diversity: bool
   :param diversity_check_reward_similarity: The accepted level of similarity of rewards to include samples from the
                                             replay buffer in the diversity check. Assuming check_diversity is True,
                                             given a sample x with reward R(x), the diversity check will only be
                                             performed against those samples in the replay buffer whose reward
                                             difference with respect to R(x) is smaller than
                                             diversity_check_reward_similarity times the difference between the maximum
                                             reward and the minimum reward in the replay buffer. By default, it is 0.1.
                                             If the value is negative (for example, -1), the diversity check is
                                             performed against the full replay buffer. Note that a value of 0.0 is
                                             equivalent to not performing any diversity check at all.
   :type diversity_check_reward_similarity: float
   :param progress_process_dataset: Whether to show a progress bar while processing the data sets. False by
                                    default.
   :type progress_process_dataset: bool
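
   As a rough illustration of the ``train`` and ``test`` dictionaries described above,
   the following sketch builds two configurations using only the documented keys. The
   paths and values are hypothetical, and the helper ``missing_keys`` is not part of
   gflownet; it merely restates the requirements listed in the parameter description.

   .. code-block:: python

      # Hypothetical configurations; only the keys documented above are used.
      train_config = {"type": "csv", "path": "data/train.csv"}
      test_config = {"type": "uniform", "n": 100, "seed": 0}

      # Illustrative lookup (not part of gflownet): keys that each type requires,
      # as stated in the parameter description.
      REQUIRED = {
          "pkl": {"path"},
          "csv": {"path"},
          "all": set(),
          "grid": {"n"},
          "uniform": {"n"},
          "random": {"n"},
      }


      def missing_keys(config):
          """Returns the documented required keys that are absent from config."""
          return REQUIRED[config["type"]] - set(config)


      print(missing_keys(train_config))  # set()
      print(missing_keys({"type": "grid"}))  # {'n'}

   A check of this kind can catch an incomplete configuration before any data set is
   built.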


   .. py:attribute:: datadir


   .. py:attribute:: env


   .. py:attribute:: proxy


   .. py:attribute:: replay_capacity
      :value: 0


   .. py:attribute:: train_config
      :value: None


   .. py:attribute:: test_config
      :value: None


   .. py:attribute:: use_main_buffer
      :value: False


   .. py:attribute:: check_diversity
      :value: False


   .. py:attribute:: diversity_check_reward_similarity
      :value: 0.1
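
      The role of this threshold, as described for the constructor parameter, can be
      sketched with NumPy as follows. This is an illustration under assumptions, not
      the library's implementation: the function name
      ``candidates_for_diversity_check`` and the array-based layout are hypothetical.

      .. code-block:: python

         import numpy as np


         def candidates_for_diversity_check(reward, replay_rewards, similarity=0.1):
             """Returns a boolean mask of the replay samples to compare against.

             Hypothetical helper: a negative similarity selects the full buffer.
             Otherwise, only samples whose reward differs from `reward` by less
             than similarity * (max - min) of the replay rewards are selected.
             """
             replay_rewards = np.asarray(replay_rewards, dtype=float)
             if similarity < 0.0:
                 return np.ones(len(replay_rewards), dtype=bool)
             margin = similarity * (replay_rewards.max() - replay_rewards.min())
             return np.abs(replay_rewards - reward) < margin


         mask = candidates_for_diversity_check(0.52, [0.0, 0.5, 0.55, 1.0])
         print(mask.tolist())  # [False, True, True, False]

      With a similarity of 0.0 the margin is zero, so the strict inequality selects
      no samples, which matches the note that 0.0 amounts to no diversity check.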


   .. py:attribute:: progress_process_dataset
      :value: False


   .. py:attribute:: replay_updated
      :value: False


   .. py:method:: init_replay(replay_buffer_path = None)

      Initializes the replay buffer.

      If a path to an existing replay buffer file is provided, then the replay buffer
      is initialized from it. Otherwise, a new empty buffer is created.

      :param replay_buffer_path: A path to a file containing a replay buffer. If provided, the initial
                                 replay buffer will be loaded from this file. This is useful for
                                 resuming runs. By default it is None, which initializes an empty
                                 buffer and creates a new file.
      :type replay_buffer_path: str or PosixPath

      :returns: * **replay** (*pandas.DataFrame*) -- DataFrame with the initial replay buffer.
                * **replay_csv** (*PosixPath*) -- Path of the CSV that will store the replay buffer.
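
      The load-or-create behaviour described above can be sketched with pandas as
      follows. This is a minimal sketch, not the actual method: the function name
      ``init_replay_sketch``, the column names and the file name ``replay.csv`` are
      assumptions made for illustration.

      .. code-block:: python

         import tempfile
         from pathlib import Path

         import pandas as pd

         # Hypothetical column names; the actual replay buffer schema may differ.
         COLUMNS = ["samples", "trajectories", "rewards", "iter"]


         def init_replay_sketch(datadir, replay_buffer_path=None):
             """Loads a replay buffer from a CSV if a path is given, else creates one."""
             if replay_buffer_path is not None:
                 replay_csv = Path(replay_buffer_path)
                 replay = pd.read_csv(replay_csv, index_col=0)
             else:
                 # Assumed file name for a newly created, empty buffer.
                 replay_csv = Path(datadir) / "replay.csv"
                 replay = pd.DataFrame(columns=COLUMNS)
                 replay.to_csv(replay_csv)
             return replay, replay_csv


         with tempfile.TemporaryDirectory() as tmp:
             replay, replay_csv = init_replay_sketch(tmp)
             resumed, _ = init_replay_sketch(tmp, replay_buffer_path=replay_csv)
             same_columns = list(resumed.columns) == COLUMNS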


   .. py:property:: replay_samples


   .. py:property:: replay_trajectories


   .. py:property:: replay_rewards


   .. py:method:: save_replay()


   .. py:method:: load_replay_from_path(path = None)

      Loads a replay buffer stored as a CSV file.


   .. py:method:: add(samples, trajectories, rewards, it, buffer='main', criterion='greater')

      Adds a batch of samples (with the trajectory actions and rewards) to the buffer.

      :param samples: A batch of terminating states.
      :type samples: list
      :param trajectories: The list of trajectory actions of each terminating state.
      :type trajectories: list
      :param rewards: The reward of each terminating state.
      :type rewards: list or tensor
      :param it: Iteration number.
      :type it: int
      :param buffer: Identifier of the buffer: main or replay
      :type buffer: str
      :param criterion: Identifier of the criterion. Currently, only greater is implemented.
      :type criterion: str
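
      The semantics of the ``greater`` criterion are not spelled out here. One
      plausible reading, sketched below purely as an assumption, is that once a
      capacity-limited buffer is full, a new sample replaces the lowest-reward entry
      only if its reward is greater. The function ``add_greater_sketch`` and the
      heap-based storage are hypothetical and do not reflect the actual
      implementation.

      .. code-block:: python

         import heapq


         def add_greater_sketch(buffer, sample, reward, capacity):
             """Keeps the `capacity` highest-reward samples in a min-heap.

             Assumed behaviour, for illustration only: when the buffer is full, a
             new sample replaces the lowest-reward entry only if its reward is
             greater. Returns True if the sample was added.
             """
             if len(buffer) < capacity:
                 heapq.heappush(buffer, (reward, sample))
                 return True
             if reward > buffer[0][0]:
                 heapq.heapreplace(buffer, (reward, sample))
                 return True
             return False


         buffer = []
         batch = [("a", 0.1), ("b", 0.5), ("c", 0.3), ("d", 0.2)]
         added = [add_greater_sketch(buffer, s, r, capacity=2) for s, r in batch]
         print(added)  # [True, True, True, False]
         print(sorted(buffer))  # [(0.3, 'c'), (0.5, 'b')]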


   .. py:method:: make_data_set(config)

      Constructs a data set as a DataFrame according to the configuration.


   .. py:method:: compute_stats(data)
      :staticmethod:


   .. py:method:: select(df, n, mode = 'permutation', rng = None)
      :staticmethod:


      Selects a subset of n data points from df, according to the criterion
      indicated by mode.

      The data may be a training set or a replay buffer.

      The mode argument can be one of the following:
          - permutation: data points are sampled uniformly from the data, without
            replacement, using the random generator rng.
          - uniform: data points are sampled uniformly from the data, with
            replacement, using the random generator rng.
          - weighted: data points are sampled with probability proportional to their
            score.

      :param df: A DataFrame containing the data samples. The columns represent the
                 sample attributes and each row contains the values of these
                 attributes for one sample. If mode == "weighted", the data must
                 contain sample scores (column "scores" or "rewards").
      :type df: pandas.DataFrame
      :param n: The number of samples to select from the dictionary.
      :type n: int
      :param mode: Sampling mode. Options: permutation, uniform, weighted.
      :type mode: str
      :param rng: A numpy random number generator, used for the permutation and uniform
                  modes. Ignored otherwise.
      :type rng: np.random.Generator

      :returns: *filtered_data_dict* -- The data of the n selected samples, a subset of the input data.
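
      The three sampling modes can be sketched with NumPy and pandas as follows.
      This is a minimal sketch rather than the actual method: the function name
      ``select_sketch`` is hypothetical, the scores are assumed to be non-negative
      and stored in a ``rewards`` column, and, unlike the description above, the
      sketch also draws the weighted samples from ``rng``.

      .. code-block:: python

         import numpy as np
         import pandas as pd


         def select_sketch(df, n, mode="permutation", rng=None):
             """Selects n rows from df according to mode (illustrative only)."""
             if rng is None:
                 rng = np.random.default_rng()
             if mode == "permutation":
                 # Uniform sampling without replacement.
                 indices = rng.permutation(len(df))[:n]
             elif mode == "uniform":
                 # Uniform sampling with replacement.
                 indices = rng.integers(0, len(df), size=n)
             elif mode == "weighted":
                 # Probability proportional to the score (assumed non-negative).
                 scores = df["rewards"].to_numpy(dtype=float)
                 indices = rng.choice(len(df), size=n, p=scores / scores.sum())
             else:
                 raise ValueError(f"Unknown mode: {mode}")
             return df.iloc[indices]


         df = pd.DataFrame({"samples": list("abcd"), "rewards": [0.0, 1.0, 0.0, 0.0]})
         rng = np.random.default_rng(0)
         permuted = select_sketch(df, 3, "permutation", rng)
         weighted = select_sketch(df, 2, "weighted", rng)
         print(weighted["samples"].tolist())  # ['b', 'b']

      In the weighted call, all probability mass sits on sample ``b``, so it is the
      only sample that can be drawn.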