gflownet.envs.cube
==================

.. py:module:: gflownet.envs.cube

.. autoapi-nested-parse::

   Classes to represent hyper-cube environments


Attributes
----------

.. autoapisummary::

   gflownet.envs.cube.CELL_MIN
   gflownet.envs.cube.CELL_MAX


Classes
-------

.. autoapisummary::

   gflownet.envs.cube.CubeBase
   gflownet.envs.cube.ContinuousCube
   gflownet.envs.cube.HybridCube


Module Contents
---------------

.. py:data:: CELL_MIN
   :value: -1.0


.. py:data:: CELL_MAX
   :value: 1.0


.. py:class:: CubeBase(n_dim = 2, min_incr = 0.1, n_comp = 1, beta_params_min = 1.0, beta_params_max = 100.0, epsilon = 1e-06, kappa = 0.001, ignored_dims = None, fixed_distr_params = {'beta_weights': 1.0, 'beta_alpha': 10.0, 'beta_beta': 10.0, 'bernoulli_bts_prob': 0.1, 'bernoulli_eos_prob': 0.1}, random_distr_params = {'beta_weights': 1.0, 'beta_alpha': 10.0, 'beta_beta': 10.0, 'bernoulli_bts_prob': 0.1, 'bernoulli_eos_prob': 0.1}, **kwargs)

   Bases: :py:obj:`gflownet.envs.base.GFlowNetEnv`, :py:obj:`abc.ABC`


   Base class for hyper-cube environments, continuous or hybrid versions of the
   hyper-grid in which the continuous increments are modelled by a (mixture of) Beta
   distribution(s).

   The state space is the value of each dimension, defined in the closed interval
   [0, 1]. If the value of a dimension gets larger than 1 - min_incr, then the
   trajectory is ended (the only possible action is EOS).

   .. attribute:: n_dim

      Dimensionality of the hyper-cube.

      :type: int

   .. attribute:: min_incr

      Minimum increment in the actions, in (0, 1). This is necessary to ensure
      that all trajectories have finite length.

      :type: float

   .. attribute:: n_comp

      Number of components in the mixture of Beta distributions.

      :type: int

   .. attribute:: epsilon

      Small constant to control the clamping interval of the inputs to the
      calculation of log probabilities. The clamping interval will be
      [epsilon, 1 - epsilon]. The larger the value, the lower the probability of
      incurring an unbounded result due to numerical precision, but the lower the
      precision too.
      Default: 1e-6.

      :type: float

   .. attribute:: kappa

      Small constant to control the intervals of the generated sets of states (in
      a grid or uniformly). States will be in the interval [kappa, 1 - kappa].
      Default: 1e-3.

      :type: float

   .. attribute:: ignored_dims

      Boolean mask of ignored dimensions. This can be used for trajectories that
      may have multiple dimensions coupled or fixed. For each dimension, True if
      ignored, False otherwise. If None, no dimension is ignored.

      :type: list

   .. py:attribute:: n_dim
      :value: 2

   .. py:attribute:: min_incr
      :value: 0.1

   .. py:attribute:: n_comp
      :value: 1

   .. py:attribute:: beta_params_min
      :value: 1.0

   .. py:attribute:: beta_params_max
      :value: 100.0

   .. py:attribute:: source

   .. py:attribute:: epsilon
      :value: 1e-06

   .. py:attribute:: kappa
      :value: 0.001

   .. py:attribute:: continuous
      :value: True

   .. py:method:: get_action_space()
      :abstractmethod:

      Constructs a list with all possible actions (excluding end of sequence).

   .. py:method:: get_policy_output(params)
      :abstractmethod:

      Defines the structure of the output of the policy model, from which an
      action is to be determined or sampled, by returning a vector with a fixed
      random policy. As a baseline, the policy is uniform over the dimensionality
      of the action space.

      Continuous environments will generally have to override this method.

   .. py:method:: get_mask_invalid_actions_forward(state = None, done = None)
      :abstractmethod:

      Returns a list with the length of the action space, with values:

      - True if the forward action is invalid from the current state.
      - False otherwise.

      For continuous or hybrid environments, this mask corresponds to the discrete
      part of the action space.

   .. py:method:: get_mask_invalid_actions_backward(state=None, done=None, parents_a=None)
      :abstractmethod:

      Returns a list with the length of the action space, with values:

      - True if the backward action is invalid from the current state.
      - False otherwise.

      For continuous or hybrid environments, this mask corresponds to the discrete
      part of the action space.
      The base implementation below should be common to all discrete spaces as it
      relies on get_parents, which is environment-specific and must be implemented.
      Continuous environments will probably need to implement their own version of
      this method.

   .. py:method:: states2proxy(states)

      Prepares a batch of states in "environment format" for a proxy: clips the
      states into [0, 1] and maps them to [CELL_MIN, CELL_MAX].

      :param states: A batch of states in environment format, either as a list of
                     states or as a single tensor.
      :type states: list or tensor

      :returns: *A tensor containing all the states in the batch.*

   .. py:method:: states2policy(states)

      Prepares a batch of states in "environment format" for the policy model:
      clips the states into [0, 1] and maps them to [-1.0, 1.0].

      :param states: A batch of states in environment format, either as a list of
                     states or as a single tensor.
      :type states: list or tensor

      :returns: *A tensor containing all the states in the batch.*

   .. py:method:: state2readable(state)

      Converts a state (a list of positions) into a human-readable string
      representing a state.

   .. py:method:: readable2state(readable)

      Converts a human-readable string representing a state into a state as a list
      of positions.

   .. py:method:: get_parents(state = None, done = None, action = None)
      :abstractmethod:

      Determines all parents and actions that lead to state.

      :param state: Representation of a state
      :type state: list
      :param done: Whether the trajectory is done. If None, done is taken from
                   instance.
      :type done: bool
      :param action: Last action performed
      :type action: int

      :returns: * **parents** (*list*) -- List of parents in state format
                * **actions** (*list*) -- List of actions that lead to state for
                  each parent in parents

   .. py:method:: sample_actions_batch(policy_outputs, mask = None, states_from = None, is_backward = False, random_action_prob = 0.0, temperature_logits = 1.0)
      :abstractmethod:

      Samples a batch of actions from a batch of policy outputs.

   .. py:method:: get_logprobs(policy_outputs, is_forward, actions, mask_invalid_actions = None, loginf = 1000)

      Computes log probabilities of actions given policy outputs and actions.

   .. py:method:: step(action)

      Executes step given an action.

      :param action: Action to be executed. An action is a tuple with two values:
                     (dimension, increment).
      :type action: tuple

      :returns: * **self.state** (*list*) -- The sequence after executing the action
                * **action** (*int*) -- Action executed
                * **valid** (*bool*) -- False, if the action is not allowed for the
                  current state, e.g. stop at the root state

   .. py:method:: fit_kde(samples, kernel = 'gaussian', bandwidth = 0.1)

      Fits a Kernel Density Estimator on a batch of samples.

      :param samples: A batch of samples in proxy format.
      :type samples: tensor
      :param kernel: An identifier of the kernel to use for the density estimation.
                     It must be a valid kernel for the scikit-learn class
                     :py:class:`sklearn.neighbors.KernelDensity`.
      :type kernel: str
      :param bandwidth: The bandwidth of the kernel.
      :type bandwidth: float

   .. py:method:: plot_reward_samples(samples, samples_reward, rewards, alpha = 0.5, dpi = 150, max_samples = 500, **kwargs)

      Plots the reward contour alongside a batch of samples.

      :param samples: A batch of samples from the GFlowNet policy in proxy format.
                      These samples will be plotted on top of the reward density.
      :type samples: tensor
      :param samples_reward: A batch of samples containing a grid over the sample
                             space, from which the reward has been obtained. These
                             samples are used to plot the contour of reward density.
      :type samples_reward: tensor
      :param rewards: The rewards of samples_reward. It should be a vector of
                      dimensionality n_per_dim ** 2, sorted such that each block
                      rewards[i * n_per_dim:i * n_per_dim + n_per_dim] corresponds
                      to the rewards at the i-th row of the grid of samples, from
                      top to bottom. The same is assumed for samples_reward.
      :type rewards: tensor
      :param alpha: Transparency of the reward contour.
      :type alpha: float
      :param dpi: Dots per inch, indicating the resolution of the plot.
      :type dpi: int
      :param max_samples: Maximum number of samples to include in the plot.
      :type max_samples: int

   .. py:method:: plot_kde(samples, kde, alpha = 0.5, dpi=150, colorbar = True, **kwargs)

      Plots the density previously estimated from a batch of samples via KDE over
      the entire sample space.

      :param samples: A batch of samples containing a grid over the sample space.
                      These samples are used to plot the contour of the estimated
                      density.
      :type samples: tensor
      :param kde: A scikit-learn KDE object fit with a batch of samples.
      :type kde: KDE
      :param alpha: Transparency of the density contour.
      :type alpha: float
      :param dpi: Dots per inch, indicating the resolution of the plot.
      :type dpi: int


.. py:class:: ContinuousCube(**kwargs)

   Bases: :py:obj:`CubeBase`


   Continuous hyper-cube environment (continuous version of a hyper-grid) in which
   the action space consists of the increment of each dimension d, modelled by a
   mixture of Beta distributions. The state space is the value of each dimension.
   In order to ensure that all trajectories are of finite length, actions have a
   minimum increment for all dimensions determined by min_incr. If the value of any
   dimension is larger than 1 - min_incr, then that dimension can't be further
   incremented. In order to ensure the coverage of the state space, the first
   action (from the source state) is not constrained by the minimum increment.

   Actions do not represent absolute increments but rather the relative increment
   with respect to the distance to the edges of the hyper-cube, from the minimum
   increment. That is, if dimension d of a state has value 0.3, the minimum
   increment (min_incr) is 0.1 and the maximum value is 1.0, a relative increment
   of 0.5 will increment the value of the dimension by
   0.1 + 0.5 * (1.0 - 0.3 - 0.1) = 0.1 + 0.5 * 0.6 = 0.4, following the forward
   formula given in relative_to_absolute_increments. Therefore, the value of d in
   the next state will be 0.3 + 0.4 = 0.7.

   .. attribute:: n_dim

      Dimensionality of the hyper-cube.
      :type: int

   .. attribute:: min_incr

      Minimum increment in the actions, in (0, 1). This is necessary to ensure
      that all trajectories have finite length.

      :type: float

   .. attribute:: n_comp

      Number of components in the mixture of Beta distributions.

      :type: int

   .. py:attribute:: mask_dim_base
      :value: 3

   .. py:method:: get_action_space()

      The action space is continuous, thus not defined as such here.

      The actions contained in the action space are "representatives".

      The actions are tuples of length n_dim + 1, where the value at position d
      indicates the increment of dimension d, and the value at position -1
      indicates whether the action is from or to source (1), or 0 otherwise.

      EOS is indicated by np.inf for all dimensions.

      The action space consists of two representatives and the EOS action:

      - Generic increment action, not from or to source: (0, 0, ..., 0, 0)
      - Generic increment action, from or to source: (0, 0, ..., 0, 1)
      - EOS: (inf, inf, ..., inf, inf)

   .. py:method:: action2representative(action)

      Replaces the continuous values of an action by 0s (the "generic" or
      "representative" action in the first position of the action space), so that
      they can be compared against the action space or a mask. If the action is
      EOS, it is returned as is.

      :param action: An actual action of the Cube environment (with continuous
                     values)
      :type action: tuple

      :returns: *tuple* -- A representative of the action, where continuous values
                are replaced by zeros.

   .. py:method:: get_policy_output(params)

      Defines the structure of the output of the policy model.

      The policy output will be used to initialize a distribution, from which an
      action is to be determined or sampled. This method returns a vector with a
      fixed policy defined by params.

      The environment consists of both continuous and discrete actions.
      Continuous actions

      For each dimension d of the hyper-cube and component c of the mixture, the
      output of the policy should return:

      1) the weight of the component in the mixture,
      2) the pre-alpha parameter of the Beta distribution to sample the increment,
      3) the pre-beta parameter of the Beta distribution to sample the increment.

      These parameters are the first n_dim * n_comp * 3 elements of the policy
      output, such that the first 3 x C elements correspond to the first dimension,
      and so on.

      Discrete actions

      Additionally, the policy output contains one logit (pos -1) of a Bernoulli
      distribution to model the (discrete) forward probability of selecting the EOS
      action and another logit (pos -2) for the (discrete) backward probability of
      returning to the source node.

      Therefore, the output of the policy model has dimensionality D x C x 3 + 2,
      where D is the number of dimensions (self.n_dim) and C is the number of
      components (self.n_comp).

      See ``_beta_params_to_policy_outputs()``.

   .. py:method:: get_mask_invalid_actions_forward(state = None, done = None)

      The action space is continuous, thus the mask is not only of invalid actions
      as in discrete environments, but also an indicator of "special cases", for
      example states from which only certain actions are possible.

      The True/False values are intended to approximately follow the semantics of
      discrete environments, where the mask is of "invalid" actions, but it is
      important to note that a direct interpretation in this sense does not always
      apply.

      For example, the mask values of special cases are True if the special cases
      they refer to are "invalid". In other words, the values are False if the
      state has the special case.

      The forward mask has the following structure:

      - 0 : whether a continuous action is invalid. True if the value at any
        dimension is larger than 1 - min_incr, or if done is True. False otherwise.
      - 1 : special case when the state is the source state. False when the state
        is the source state, True otherwise.
      - 2 : whether EOS action is invalid. EOS is valid from any state, except the
        source state or if done is True.
      - -n_dim: : dimensions that should be ignored when sampling actions or
        computing logprobs. This can be used for trajectories that may have
        multiple dimensions coupled or fixed. For each dimension, True if ignored,
        False otherwise.

   .. py:method:: get_mask_invalid_actions_backward(state=None, done=None, parents_a=None)

      The action space is continuous, thus the mask is not only of invalid actions
      as in discrete environments, but also an indicator of "special cases", for
      example states from which only certain actions are possible.

      In order to approximately follow the semantics of discrete environments,
      where the mask is of "invalid" actions, that is, the value is True if an
      action is invalid, the mask values of special cases are True if the special
      cases they refer to are "invalid". In other words, the values are False if
      the state has the special case.

      The backward mask has the following structure:

      - 0 : whether a continuous action is invalid. True if the value at any
        dimension is smaller than min_incr, or if done is True. False otherwise.
      - 1 : special case when back-to-source action is the only possible action.
        False if any dimension is smaller than min_incr, True otherwise.
      - 2 : whether EOS action is invalid. False only if done is True, True
        (invalid) otherwise.
      - -n_dim: : dimensions that should be ignored when sampling actions or
        computing logprobs. This can be used for trajectories that may have
        multiple dimensions coupled or fixed. For each dimension, True if ignored,
        False otherwise. By default, no dimension is ignored.

   .. py:method:: get_valid_actions(mask = None, state = None, done = None, backward = False)

      Returns the list of non-invalid (valid, for short) actions according to the
      mask of invalid actions.

      As a continuous environment, the returned actions are "representatives", that
      is, the actions represented in the action space.
      :param mask: The mask of a state. If None, it is computed in place.
      :type mask: list (optional)
      :param state: A state in GFlowNet format. If None, self.state is used.
      :type state: list (optional)
      :param done: Whether the trajectory is done. If None, self.done is used.
      :type done: bool (optional)
      :param backward: True if the transition is backwards; False if forward.
      :type backward: bool

      :returns: *list* -- The list of representatives of the valid actions.

   .. py:method:: get_parents(state = None, done = None, action = None)

      Defined only because it is required. A ContinuousEnv should be created to
      avoid this issue.

   .. py:method:: relative_to_absolute_increments(states, increments_rel, is_backward)

      Returns a batch of absolute increments (actions) given a batch of states,
      relative increments and the minimum increment.

      Given a dimension value x, a relative increment r, and a minimum increment m,
      the absolute increment a is given by:

      - Forward: a = m + r * (1 - x - m)
      - Backward: a = m + r * (x - m)

   .. py:method:: absolute_to_relative_increments(states, increments_abs, is_backward)

      Returns a batch of relative increments (as sampled by the Beta distributions)
      given a batch of states, absolute increments (actions) and the minimum
      increment.

      Given a dimension value x, an absolute increment a, and a minimum increment
      m, the relative increment r is given by:

      - Forward: r = (a - m) / (1 - x - m)
      - Backward: r = (a - m) / (x - m)

   .. py:method:: sample_actions_batch(policy_outputs, mask = None, states_from = None, is_backward = False, random_action_prob = 0.0, temperature_logits = 1.0)

      Samples a batch of actions from a batch of policy outputs. This method
      overrides the method of GFlowNetEnv because this is a continuous environment.

      :param policy_outputs: The output of the GFlowNet policy model.
      :type policy_outputs: tensor
      :param mask: The mask of invalid actions.
                   For continuous or mixed environments, the mask may be a tensor
                   with an arbitrary length containing information about special
                   states, as defined elsewhere in the environment.
      :type mask: tensor
      :param states_from: The states originating the actions, in GFlowNet format.
                          Ignored in discrete environments and only required in
                          certain continuous environments.
      :type states_from: tensor
      :param is_backward: True if the actions are backward, False if the actions
                          are forward (default).
      :type is_backward: bool
      :param random_action_prob: The probability of sampling a random action. If
                                 larger than zero, the model outputs will be
                                 replaced by a random policy vector with
                                 probability `random_action_prob`, according to a
                                 Bernoulli distribution.
      :type random_action_prob: float, optional
      :param temperature_logits: A scalar by which the model outputs are divided to
                                 temper the sampling distribution.
      :type temperature_logits: float, optional

      :returns: **actions** (*list*) -- The list of sampled actions.

   .. py:method:: get_logprobs(policy_outputs, actions, mask, states_from, is_backward)

      Computes log probabilities of actions given policy outputs and actions.

      :param policy_outputs: The output of the GFlowNet policy model.
      :type policy_outputs: tensor
      :param mask: The mask containing information about invalid actions and
                   special cases.
      :type mask: tensor
      :param actions: The actions (absolute increments) from each state in the
                      batch for which to compute the log probability.
      :type actions: list or tensor
      :param states_from: The states originating the actions, in GFlowNet format.
                          They are required to compute the relative increments and
                          the Jacobian.
      :type states_from: tensor
      :param is_backward: True if the actions are backward, False if the actions
                          are forward. Required, since the computation for forward
                          and backward actions is different.
      :type is_backward: bool

   .. py:method:: step(action)

      Executes step given an action. An action is the absolute increment of each
      dimension.
      :param action: Action to be executed. An action is a tuple of length n_dim,
                     with the absolute increment for each dimension.
      :type action: tuple

      :returns: * **self.state** (*list*) -- The sequence after executing the action
                * **action** (*int*) -- Action executed
                * **valid** (*bool*) -- False, if the action is not allowed for the
                  current state, e.g. stop at the root state

   .. py:method:: step_backwards(action)

      Executes backward step given an action. An action is the absolute decrement
      of each dimension.

      :param action: Action to be executed. An action is a tuple of length n_dim,
                     with the absolute decrement for each dimension.
      :type action: tuple

      :returns: * **self.state** (*list*) -- The sequence after executing the action
                * **action** (*int*) -- Action executed
                * **valid** (*bool*) -- False, if the action is not allowed for the
                  current state, e.g. stop at the root state

   .. py:method:: get_grid_terminating_states(n_states, kappa = None)

      Constructs a grid of terminating states within the range of the hyper-cube.

      :param n_states: Requested number of states. The actual number of states will
                       be rounded up such that all dimensions have the same number
                       of states.
      :type n_states: int
      :param kappa: Small constant indicating the distance to the theoretical
                    limits of the cube [0, 1], in order to avoid inaccuracies in
                    the computation of the log probabilities due to clamping. The
                    grid will thus be in [kappa, 1 - kappa]. If None, self.kappa
                    will be used.
      :type kappa: float

   .. py:method:: get_uniform_terminating_states(n_states, seed = None, kappa = None)

      Constructs a set of terminating states sampled uniformly within the range of
      the hyper-cube.

      :param n_states: Number of states in the returned list.
      :type n_states: int
      :param kappa: Small constant indicating the distance to the theoretical
                    limits of the cube [0, 1], in order to avoid inaccuracies in
                    the computation of the log probabilities due to clamping. The
                    states will thus be uniformly sampled in [kappa, 1 - kappa].
                    If None, self.kappa will be used.
      :type kappa: float


.. py:class:: HybridCube(**kwargs)

   Bases: :py:obj:`gflownet.envs.composite.setfix.SetFix`


   :param subenvs: An iterable containing the set of the sub-environments.
   :type subenvs: iterable

   .. py:attribute:: n_dim

   .. py:method:: states2proxy(states)

      Prepares a batch of states in environment format for a proxy.

      The input states are in the environment format of the Set. The outputs
      contain only the Cube part and the format is as in
      :py:meth:`gflownet.envs.cube.ContinuousCube.states2proxy`.

      :param states: A batch of states in Set environment format.
      :type states: list

      :returns: *A tensor containing all the states in the batch.*

   .. py:method:: get_grid_terminating_states(n_states, kappa = None)

      Constructs a grid of terminating states within the range of the hyper-cube.

      :param n_states: Requested number of states. The actual number of states will
                       be rounded up such that all dimensions have the same number
                       of states.
      :type n_states: int
      :param kappa: Small constant indicating the distance to the theoretical
                    limits of the cube [0, 1], in order to avoid inaccuracies in
                    the computation of the log probabilities due to clamping. The
                    grid will thus be in [kappa, 1 - kappa]. If None, self.kappa
                    will be used.
      :type kappa: float

   .. py:method:: get_uniform_terminating_states(n_states, seed = None, kappa = None)

      Constructs a set of terminating states sampled uniformly within the range of
      the hyper-cube.

      :param n_states: Number of states in the returned list.
      :type n_states: int
      :param kappa: Small constant indicating the distance to the theoretical
                    limits of the cube [0, 1], in order to avoid inaccuracies in
                    the computation of the log probabilities due to clamping. The
                    states will thus be uniformly sampled in [kappa, 1 - kappa].
                    If None, self.kappa will be used.
      :type kappa: float

   .. py:method:: fit_kde(samples, kernel = 'gaussian', bandwidth = 0.1)

      Fits a Kernel Density Estimator on a batch of samples.
      Simply calls fit_kde() of CubeBase.

   .. py:method:: plot_reward_samples(samples, samples_reward, rewards, alpha = 0.5, dpi = 150, max_samples = 500, **kwargs)

      Plots the reward contour alongside a batch of samples.

      Simply calls plot_reward_samples() of CubeBase.

   .. py:method:: plot_kde(samples, kde, alpha = 0.5, dpi=150, colorbar = True, **kwargs)

      Plots the density previously estimated from a batch of samples via KDE over
      the entire sample space.

      Simply calls plot_kde() of CubeBase.
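
The increment formulas documented for ``ContinuousCube.relative_to_absolute_increments`` and ``ContinuousCube.absolute_to_relative_increments`` can be illustrated with a minimal sketch in plain Python. This is not the library's implementation, which operates on batches of tensors. The function names ``relative_to_absolute`` and ``absolute_to_relative`` and the scalar arguments are assumptions made for illustration only.

```python
import math


def relative_to_absolute(x, r, m, is_backward=False):
    """Absolute increment a for a dimension value x, a relative increment r
    in [0, 1] and a minimum increment m (hypothetical scalar helper)."""
    # Forward: room left up to 1 beyond the minimum increment.
    # Backward: room left down to 0 beyond the minimum increment.
    span = (x - m) if is_backward else (1.0 - x - m)
    return m + r * span


def absolute_to_relative(x, a, m, is_backward=False):
    """Inverse mapping: relative increment r for an absolute increment a.
    Undefined (division by zero) when the available span is exactly 0."""
    span = (x - m) if is_backward else (1.0 - x - m)
    return (a - m) / span


# Forward example: x = 0.3, min_incr = 0.1, relative increment 0.5.
x, m, r = 0.3, 0.1, 0.5
a_fwd = relative_to_absolute(x, r, m)          # 0.1 + 0.5 * 0.6, about 0.4
r_back = absolute_to_relative(x, a_fwd, m)     # recovers about 0.5

# Backward example: from x = 0.7, a relative decrement of 1.0 spans the whole
# value, that is, an absolute decrement of about 0.7.
a_bwd = relative_to_absolute(0.7, 1.0, m, is_backward=True)
```

Under these assumptions, the two mappings are inverses of each other whenever the available span is non-zero, which is why the states are needed to convert between what the Beta distributions sample and the absolute increments used as actions.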