gflownet.envs.cube
Classes to represent hyper-cube environments
Attributes
Classes
Base class for hyper-cube environments, continuous or hybrid versions of the …
Continuous hyper-cube environment (continuous version of a hyper-grid) in which the …
Module Contents
- class gflownet.envs.cube.CubeBase(n_dim=2, min_incr=0.1, n_comp=1, beta_params_min=1.0, beta_params_max=100.0, epsilon=1e-06, kappa=0.001, ignored_dims=None, fixed_distr_params={'beta_weights': 1.0, 'beta_alpha': 10.0, 'beta_beta': 10.0, 'bernoulli_bts_prob': 0.1, 'bernoulli_eos_prob': 0.1}, random_distr_params={'beta_weights': 1.0, 'beta_alpha': 10.0, 'beta_beta': 10.0, 'bernoulli_bts_prob': 0.1, 'bernoulli_eos_prob': 0.1}, **kwargs)[source]
Bases:
gflownet.envs.base.GFlowNetEnv, abc.ABC
Base class for hyper-cube environments, continuous or hybrid versions of the hyper-grid in which the continuous increments are modelled by a (mixture of) Beta distribution(s).
The state space is the value of each dimension, defined in the closed interval [0, 1]. If the value of a dimension gets larger than 1 - min_incr, then the trajectory is ended (the only possible action is EOS).
- Parameters:
n_dim (int)
min_incr (float)
n_comp (int)
beta_params_min (float)
beta_params_max (float)
epsilon (float)
kappa (float)
ignored_dims (Optional[List[bool]])
fixed_distr_params (dict)
random_distr_params (dict)
- min_incr[source]
Minimum increment in the actions, in (0, 1). This is necessary to ensure that all trajectories have finite length.
- Type:
float
- epsilon[source]
Small constant to control the clamping interval of the inputs to the calculation of log probabilities. The clamping interval will be [epsilon, 1 - epsilon]. The larger the value, the lower the probability of an unbounded result due to numerical precision, but the lower the precision too. Default: 1e-6.
- Type:
float
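The role of the clamping interval can be illustrated with a minimal pure-Python sketch. The helper name `clamp_unit` is hypothetical and not part of the library; it only shows why inputs are restricted to [epsilon, 1 - epsilon] before logarithms are taken.

```python
import math


def clamp_unit(x, epsilon=1e-6):
    """Clamp x into [epsilon, 1 - epsilon] (hypothetical helper, for illustration)."""
    return min(max(x, epsilon), 1.0 - epsilon)


# log(x) and log(1 - x) both appear in the Beta log-density and diverge at the
# edges of [0, 1]; after clamping, both logarithms stay finite.
for x in (0.0, 0.5, 1.0):
    c = clamp_unit(x)
    print(x, c, math.log(c), math.log(1.0 - c))
```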
- kappa[source]
Small constant to control the intervals of the generated sets of states (in a grid or uniformly). States will be in the interval [kappa, 1 - kappa]. Default: 1e-3.
- Type:
float
- ignored_dims
Boolean mask of ignored dimensions. This can be used for trajectories that may have multiple dimensions coupled or fixed. For each dimension, True if ignored, False otherwise. If None, no dimension is ignored.
- Type:
list
- abstract get_action_space()[source]
Constructs list with all possible actions (excluding end of sequence)
- abstract get_policy_output(params)[source]
Defines the structure of the output of the policy model, from which an action is to be determined or sampled, by returning a vector with a fixed random policy. As a baseline, the policy is uniform over the dimensionality of the action space.
Continuous environments will generally have to overwrite this method.
- Parameters:
params (dict)
- Return type:
torchtyping.TensorType[policy_output_dim]
- abstract get_mask_invalid_actions_forward(state=None, done=None)[source]
- Returns a list with one entry per action in the action space, with values:
True if the forward action is invalid from the current state.
False otherwise.
For continuous or hybrid environments, this mask corresponds to the discrete part of the action space.
- Parameters:
state (Optional[List])
done (Optional[bool])
- Return type:
List
- abstract get_mask_invalid_actions_backward(state=None, done=None, parents_a=None)[source]
- Returns a list with one entry per action in the action space, with values:
True if the backward action is invalid from the current state.
False otherwise.
For continuous or hybrid environments, this mask corresponds to the discrete part of the action space.
The base implementation below should be common to all discrete spaces as it relies on get_parents, which is environment-specific and must be implemented. Continuous environments will probably need to implement their own version of this method.
- states2proxy(states)[source]
Prepares a batch of states in “environment format” for a proxy: clips the states into [0, 1] and maps them to [CELL_MIN, CELL_MAX]
- Parameters:
states (list or tensor) – A batch of states in environment format, either as a list of states or as a single tensor.
- Returns:
A tensor containing all the states in the batch.
- Return type:
torchtyping.TensorType[batch, state_dim]
- states2policy(states)[source]
Prepares a batch of states in “environment format” for the policy model: clips the states into [0, 1] and maps them to [-1.0, 1.0]
- Parameters:
states (list or tensor) – A batch of states in environment format, either as a list of states or as a single tensor.
- Returns:
A tensor containing all the states in the batch.
- Return type:
torchtyping.TensorType[batch, state_dim]
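The clip-and-rescale step described for states2policy() can be sketched in plain Python as follows. This is an illustration only: the function name `rescale_unit_states` is hypothetical, plain lists stand in for the tensors the library returns, and the same pattern would apply to states2proxy() with [CELL_MIN, CELL_MAX] as the target range (the values of those constants are not stated here).

```python
def rescale_unit_states(states, low=-1.0, high=1.0):
    """Clip each value into [0, 1] and map it linearly onto [low, high]."""
    out = []
    for state in states:
        clipped = [min(max(x, 0.0), 1.0) for x in state]
        out.append([low + (high - low) * x for x in clipped])
    return out


# The value 1.2 lies outside the cube and is clipped to 1.0 before rescaling.
batch = [[0.0, 0.5], [1.0, 1.2]]
policy_batch = rescale_unit_states(batch)
print(policy_batch)
```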
- state2readable(state)[source]
Converts a state (a list of positions) into a human-readable string representing a state.
- Parameters:
state (List)
- Return type:
str
- readable2state(readable)[source]
Converts a human-readable string representing a state into a state as a list of positions.
- Parameters:
readable (str)
- Return type:
List
- abstract get_parents(state=None, done=None, action=None)[source]
Determines all parents and actions that lead to state.
- Parameters:
state (list) – Representation of a state
done (bool) – Whether the trajectory is done. If None, done is taken from instance.
action (int) – Last action performed
- Returns:
parents (list) – List of parents in state format
actions (list) – List of actions that lead to state for each parent in parents
- Return type:
Tuple[List[List], List[Tuple[int, float]]]
- abstract sample_actions_batch(policy_outputs, mask=None, states_from=None, is_backward=False, random_action_prob=0.0, temperature_logits=1.0)[source]
Samples a batch of actions from a batch of policy outputs.
- Parameters:
policy_outputs (torchtyping.TensorType[n_states, policy_output_dim])
mask (Optional[torchtyping.TensorType[n_states, policy_output_dim]])
states_from (Optional[List])
is_backward (Optional[bool])
random_action_prob (Optional[float])
temperature_logits (Optional[float])
- Return type:
Tuple[List[Tuple], torchtyping.TensorType[n_states]]
- get_logprobs(policy_outputs, is_forward, actions, mask_invalid_actions=None, loginf=1000)[source]
Computes log probabilities of actions given policy outputs and actions.
- Parameters:
policy_outputs (torchtyping.TensorType[n_states, policy_output_dim])
is_forward (bool)
actions (Union[List, torchtyping.TensorType[n_states, action_dim]])
mask_invalid_actions (torchtyping.TensorType[batch_size, policy_output_dim])
loginf (float)
- Return type:
torchtyping.TensorType[batch_size]
- step(action)[source]
Executes step given an action.
- Parameters:
action (tuple) – Action to be executed. An action is a tuple with two values: (dimension, increment).
- Returns:
self.state (list) – The state after executing the action
action (tuple) – Action executed
valid (bool) – False, if the action is not allowed for the current state, e.g. stop at the root state
- Return type:
Tuple[List[float], Tuple[int, float], bool]
- fit_kde(samples, kernel='gaussian', bandwidth=0.1)[source]
Fits a Kernel Density Estimator on a batch of samples.
- Parameters:
samples (tensor) – A batch of samples in proxy format.
kernel (str) – An identifier of the kernel to use for the density estimation. It must be a valid kernel for the scikit-learn method sklearn.neighbors.KernelDensity().
bandwidth (float) – The bandwidth of the kernel.
- plot_reward_samples(samples, samples_reward, rewards, alpha=0.5, dpi=150, max_samples=500, **kwargs)[source]
Plots the reward contour alongside a batch of samples.
- Parameters:
samples (tensor) – A batch of samples from the GFlowNet policy in proxy format. These samples will be plotted on top of the reward density.
samples_reward (tensor) – A batch of samples containing a grid over the sample space, from which the reward has been obtained. These samples are used to plot the contour of reward density.
rewards (tensor) – The rewards of samples_reward. It should be a vector of dimensionality n_per_dim ** 2 and be sorted such that each block rewards[i * n_per_dim:i * n_per_dim + n_per_dim] corresponds to the rewards at the i-th row of the grid of samples, from top to bottom. The same is assumed for samples_reward.
alpha (float) – Transparency of the reward contour.
dpi (int) – Dots per inch, indicating the resolution of the plot.
max_samples (int) – Maximum number of samples to include in the plot.
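The row-block ordering expected for rewards can be made concrete with a small sketch. The reward values and the helper `reward_row` are made up for illustration; only the slicing rule comes from the parameter description above.

```python
n_per_dim = 3

# Hypothetical rewards for a 3 x 3 grid, flattened row by row (top to bottom).
rewards = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]


def reward_row(rewards, i, n_per_dim):
    """Return the rewards of the i-th grid row from the flattened vector."""
    return rewards[i * n_per_dim : i * n_per_dim + n_per_dim]


# Rebuild the 2D layout: rows[0] is the top row of the grid.
rows = [reward_row(rewards, i, n_per_dim) for i in range(n_per_dim)]
print(rows)
```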
- plot_kde(samples, kde, alpha=0.5, dpi=150, colorbar=True, **kwargs)[source]
Plots the density previously estimated from a batch of samples via KDE over the entire sample space.
- Parameters:
samples (tensor) – A batch of samples containing a grid over the sample space. These samples are used to plot the contour of the estimated density.
kde (KDE) – A scikit-learn KDE object fit with a batch of samples.
alpha (float) – Transparency of the density contour.
dpi (int) – Dots per inch, indicating the resolution of the plot.
colorbar (bool)
- class gflownet.envs.cube.ContinuousCube(**kwargs)[source]
Bases:
CubeBase
Continuous hyper-cube environment (continuous version of a hyper-grid) in which the action space consists of the increment of each dimension d, modelled by a mixture of Beta distributions. The state space is the value of each dimension. In order to ensure that all trajectories are of finite length, actions have a minimum increment for all dimensions determined by min_incr. If the value of any dimension is larger than 1 - min_incr, then that dimension can’t be further incremented. In order to ensure the coverage of the state space, the first action (from the source state) is not constrained by the minimum increment.
Actions do not represent absolute increments but rather the relative increment with respect to the distance to the edges of the hyper-cube, measured from the minimum increment. That is, if dimension d of a state has value 0.3, the minimum increment (min_incr) is 0.1 and the maximum value is 1.0, an action of 0.5 will increment the value of the dimension by 0.1 + 0.5 * (1.0 - 0.3 - 0.1) = 0.1 + 0.5 * 0.6 = 0.4. Therefore, the value of d in the next state will be 0.3 + 0.4 = 0.7.
- min_incr[source]
Minimum increment in the actions, in (0, 1). This is necessary to ensure that all trajectories have finite length.
- Type:
float
- get_action_space()[source]
The action space is continuous, thus not defined as such here.
The actions contained in the action space are “representatives”.
The actions are tuples of length n_dim + 1, where the value at position d indicates the increment of dimension d, and the value at position -1 indicates whether the action is from or to source (1), or 0 otherwise.
EOS is indicated by np.inf for all dimensions.
The action space consists of the EOS action and two representatives:
- Generic increment action, not from or to source: (0, 0, …, 0, 0)
- Generic increment action, from or to source: (0, 0, …, 0, 1)
- EOS: (inf, inf, …, inf, inf)
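A minimal sketch of how these three representative actions could be assembled is shown below. The function name `build_representative_action_space` is hypothetical, and `math.inf` is used in place of `np.inf` to keep the example dependency-free; the ordering follows the description above.

```python
import math


def build_representative_action_space(n_dim):
    """Build the three representative actions described above (illustrative only)."""
    generic = (0.0,) * n_dim + (0.0,)          # increment, not from or to source
    generic_source = (0.0,) * n_dim + (1.0,)   # increment, from or to source
    eos = (math.inf,) * (n_dim + 1)            # end of sequence
    return [generic, generic_source, eos]


action_space = build_representative_action_space(n_dim=2)
print(action_space)
```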
- action2representative(action)[source]
Replaces the continuous values of an action by 0s (the “generic” or “representative” action in the first position of the action space), so that they can be compared against the action space or a mask.
If the action is EOS, it is returned as is.
- Parameters:
action (tuple) – An actual action of the Cube environment (with continuous values)
- Returns:
tuple – A representative of the action, where continuous values are replaced by zeros.
- Return type:
Tuple
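One possible reading of this method is sketched below. It is not the library's implementation: the name `to_representative` is hypothetical, and the sketch assumes that only the first n_dim (continuous) values are zeroed while the from-or-to-source flag in the last position is kept.

```python
import math


def to_representative(action):
    """Map a concrete action to its representative (illustrative sketch)."""
    if all(math.isinf(v) for v in action):
        return action  # EOS is returned as is
    # Assumption: zero the continuous increments, keep the source flag.
    return (0.0,) * (len(action) - 1) + (action[-1],)


print(to_representative((0.25, 0.4, 0.0)))
print(to_representative((0.25, 0.4, 1.0)))
print(to_representative((math.inf, math.inf, math.inf)))
```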
- get_policy_output(params)[source]
Defines the structure of the output of the policy model.
The policy output will be used to initialize a distribution, from which an action is to be determined or sampled. This method returns a vector with a fixed policy defined by params.
The environment consists of both continuous and discrete actions.
Continuous actions
For each dimension d of the hyper-cube and component c of the mixture, the output of the policy should return:
the weight of the component in the mixture,
the pre-alpha parameter of the Beta distribution to sample the increment,
the pre-beta parameter of the Beta distribution to sample the increment.
These parameters are the first n_dim * n_comp * 3 of the policy output such that the first 3 x C elements correspond to the first dimension, and so on.
Discrete actions
Additionally, the policy output contains one logit (pos -1) of a Bernoulli distribution to model the (discrete) forward probability of selecting the EOS action and another logit (pos -2) for the (discrete) backward probability of returning to the source node.
Therefore, the output of the policy model has dimensionality D x C x 3 + 2, where D is the number of dimensions (self.n_dim) and C is the number of components (self.n_comp).
See _beta_params_to_policy_outputs().
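The layout described above can be sketched as follows. The helper names are hypothetical and plain lists stand in for tensors. The sketch only separates the per-dimension blocks of 3 x C values and the two trailing logits; it makes no assumption about how weights, pre-alphas and pre-betas are ordered inside each block.

```python
def policy_output_dim(n_dim, n_comp):
    """Dimensionality D x C x 3 + 2 of the policy output."""
    return n_dim * n_comp * 3 + 2


def split_policy_output(policy_output, n_dim, n_comp):
    """Split a flat policy output into per-dimension blocks and the two logits."""
    block = 3 * n_comp
    per_dim = [policy_output[d * block : (d + 1) * block] for d in range(n_dim)]
    bts_logit = policy_output[-2]  # back-to-source logit (position -2)
    eos_logit = policy_output[-1]  # EOS logit (position -1)
    return per_dim, bts_logit, eos_logit


n_dim, n_comp = 2, 3
dummy = list(range(policy_output_dim(n_dim, n_comp)))  # placeholder values
per_dim, bts_logit, eos_logit = split_policy_output(dummy, n_dim, n_comp)
print(len(dummy), per_dim, bts_logit, eos_logit)
```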
- Parameters:
params (dict)
- Return type:
torchtyping.TensorType[policy_output_dim]
- get_mask_invalid_actions_forward(state=None, done=None)[source]
The action space is continuous, thus the mask is not only of invalid actions as in discrete environments, but also an indicator of “special cases”, for example states from which only certain actions are possible.
The values of True/False intend to approximately stick to the semantics in discrete environments, where the mask is of “invalid” actions, but it is important to note that a direct interpretation in this sense does not always apply.
For example, the mask values of special cases are True if the special cases they refer to are “invalid”. In other words, the values are False if the state has the special case.
The forward mask has the following structure:
0 : whether a continuous action is invalid. True if the value at any dimension is larger than 1 - min_incr, or if done is True. False otherwise.
1 : special case when the state is the source state. False when the state is the source state, True otherwise.
2 : whether EOS action is invalid. EOS is valid from any state, except the source state or if done is True.
-n_dim: : dimensions that should be ignored when sampling actions or computing logprobs. This can be used for trajectories that may have multiple dimensions coupled or fixed. For each dimension, True if ignored, False otherwise.
- Parameters:
state (Optional[List])
done (Optional[bool])
- Return type:
List
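The forward mask structure can be sketched by following the description literally. This is a hedged illustration, not the library's code: the function `forward_mask` is hypothetical, and because the representation of the source state is not described here, the caller passes an explicit `is_source` flag.

```python
def forward_mask(state, done, is_source, min_incr, ignored_dims=None):
    """Build [continuous_invalid, not_source, eos_invalid] + ignored_dims."""
    if ignored_dims is None:
        ignored_dims = [False] * len(state)
    # 0: continuous action invalid if any value exceeds 1 - min_incr, or if done.
    continuous_invalid = done or any(x > 1.0 - min_incr for x in state)
    # 1: special case flag, False only at the source state.
    not_source = not is_source
    # 2: EOS invalid at the source state or if done.
    eos_invalid = done or is_source
    return [continuous_invalid, not_source, eos_invalid] + list(ignored_dims)


print(forward_mask([0.3, 0.95], done=False, is_source=False, min_incr=0.1))
print(forward_mask([0.0, 0.0], done=False, is_source=True, min_incr=0.1))
```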
- get_mask_invalid_actions_backward(state=None, done=None, parents_a=None)[source]
The action space is continuous, thus the mask is not only of invalid actions as in discrete environments, but also an indicator of “special cases”, for example states from which only certain actions are possible.
In order to approximately stick to the semantics in discrete environments, where the mask is of “invalid” actions, that is the value is True if an action is invalid, the mask values of special cases are True if the special cases they refer to are “invalid”. In other words, the values are False if the state has the special case.
The backward mask has the following structure:
0 : whether a continuous action is invalid. True if the value at any dimension is smaller than min_incr, or if done is True. False otherwise.
1 : special case when back-to-source action is the only possible action. False if any dimension is smaller than min_incr, True otherwise.
2 : whether EOS action is invalid. False only if done is True, True (invalid) otherwise.
-n_dim: : dimensions that should be ignored when sampling actions or computing logprobs. This can be used for trajectories that may have multiple dimensions coupled or fixed. For each dimension, True if ignored, False otherwise. By default, no dimension is ignored.
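As an illustration, the backward mask structure can be written out by following the description literally. The function `backward_mask` is a hypothetical sketch, not the library's implementation, and the real method may treat edge cases (such as done states) differently.

```python
def backward_mask(state, done, min_incr, ignored_dims=None):
    """Build [continuous_invalid, back_to_source_flag, eos_invalid] + ignored_dims."""
    if ignored_dims is None:
        ignored_dims = [False] * len(state)
    below_min = any(x < min_incr for x in state)
    # 0: continuous action invalid if any value is below min_incr, or if done.
    continuous_invalid = done or below_min
    # 1: False when back-to-source is the only possible action.
    back_to_source_flag = not below_min
    # 2: EOS is valid (False) only if the trajectory is done.
    eos_invalid = not done
    return [continuous_invalid, back_to_source_flag, eos_invalid] + list(ignored_dims)


print(backward_mask([0.05, 0.6], done=False, min_incr=0.1))
print(backward_mask([0.4, 0.6], done=True, min_incr=0.1))
```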
- get_valid_actions(mask=None, state=None, done=None, backward=False)[source]
Returns the list of non-invalid (valid, for short) actions according to the mask of invalid actions.
As a continuous environment, the returned actions are “representatives”, that is the actions represented in the action space.
- Parameters:
mask (list (optional)) – The mask of a state. If None, it is computed in place.
state (list (optional)) – A state in GFlowNet format. If None, self.state is used.
done (bool (optional)) – Whether the trajectory is done. If None, self.done is used.
backward (bool) – True if the transition is backwards; False if forward.
- Returns:
list – The list of representatives of the valid actions.
- Return type:
List[Tuple]
- get_parents(state=None, done=None, action=None)[source]
Defined only because it is required. A ContinuousEnv should be created to avoid this issue.
- Parameters:
state (List)
done (bool)
action (Tuple[int, float])
- Return type:
Tuple[List[List], List[Tuple[int, float]]]
- relative_to_absolute_increments(states, increments_rel, is_backward)[source]
Returns a batch of absolute increments (actions) given a batch of states, relative increments and minimum_increments.
Given a dimension value x, a relative increment r, and a minimum increment m, then the absolute increment a is given by:
Forward:
a = m + r * (1 - x - m)
Backward:
a = m + r * (x - m)
- Parameters:
states (torchtyping.TensorType[n_states, n_dim])
increments_rel (torchtyping.TensorType[n_states, n_dim])
is_backward (bool)
- absolute_to_relative_increments(states, increments_abs, is_backward)[source]
Returns a batch of relative increments (as sampled by the Beta distributions) given a batch of states, absolute increments (actions) and minimum_increments.
Given a dimension value x, an absolute increment a, and a minimum increment m, then the relative increment r is given by:
Forward:
r = (a - m) / (1 - x - m)
Backward:
r = (a - m) / (x - m)
- Parameters:
states (torchtyping.TensorType[n_states, n_dim])
increments_abs (torchtyping.TensorType[n_states, n_dim])
is_backward (bool)
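The two conversions are inverses of each other, which a scalar sketch makes easy to check. The functions below are hypothetical single-value versions of the batched methods, written directly from the formulas above; they do not handle the degenerate case in which the available span (1 - x - m, or x - m) is zero.

```python
import math


def relative_to_absolute(x, r, m, is_backward=False):
    """a = m + r * (1 - x - m) forward, a = m + r * (x - m) backward."""
    span = (x - m) if is_backward else (1.0 - x - m)
    return m + r * span


def absolute_to_relative(x, a, m, is_backward=False):
    """r = (a - m) / (1 - x - m) forward, r = (a - m) / (x - m) backward."""
    span = (x - m) if is_backward else (1.0 - x - m)
    return (a - m) / span


# Forward: x = 0.2, r = 0.5, m = 0.1 gives a = 0.1 + 0.5 * 0.7.
a_fwd = relative_to_absolute(0.2, 0.5, 0.1)
# Backward: x = 0.6, r = 0.5, m = 0.1 gives a = 0.1 + 0.5 * 0.5.
a_bwd = relative_to_absolute(0.6, 0.5, 0.1, is_backward=True)
print(a_fwd, a_bwd)
```

Feeding either absolute increment back through `absolute_to_relative` with the same x and m recovers r = 0.5 up to floating-point error.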
- sample_actions_batch(policy_outputs, mask=None, states_from=None, is_backward=False, random_action_prob=0.0, temperature_logits=1.0)[source]
Samples a batch of actions from a batch of policy outputs.
This method overrides the method of GFlowNetEnv because this is a continuous environment.
- Parameters:
policy_outputs (tensor) – The output of the GFlowNet policy model.
mask (tensor) – The mask of invalid actions. For continuous or mixed environments, the mask may be a tensor of arbitrary length containing information about special states, as defined elsewhere in the environment.
states_from (tensor) – The states originating the actions, in GFlowNet format. Ignored in discrete environments and only required in certain continuous environments.
is_backward (bool) – True if the actions are backward, False if the actions are forward (default).
random_action_prob (float, optional) – The probability of sampling a random action. If larger than zero, the model outputs will be replaced by a random policy vector with probability random_action_prob, according to a Bernoulli distribution.
temperature_logits (float, optional) – A scalar by which the model outputs are divided to temper the sampling distribution.
- Returns:
actions (list) – The list of sampled actions.
- Return type:
Tuple[List[Tuple], torchtyping.TensorType[n_states]]
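The roles of random_action_prob and temperature_logits can be sketched for a single policy output as follows. This is a simplified illustration under stated assumptions, not the library's sampling code: the function `select_policy_output` is hypothetical, plain lists stand in for tensors, and the sketch assumes that the temperature is applied only to the model outputs and not to the random policy vector.

```python
import random


def select_policy_output(policy_output, random_policy, random_action_prob=0.0,
                         temperature_logits=1.0, rng=None):
    """Pick the random policy with probability random_action_prob (Bernoulli draw);
    otherwise return the model outputs divided by the temperature."""
    rng = rng or random.Random()
    if rng.random() < random_action_prob:
        return list(random_policy)
    return [v / temperature_logits for v in policy_output]


model_out = [2.0, 4.0, -6.0]
random_out = [0.0, 0.0, 0.0]
tempered = select_policy_output(model_out, random_out, temperature_logits=2.0)
replaced = select_policy_output(model_out, random_out, random_action_prob=1.0)
print(tempered, replaced)
```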
- get_logprobs(policy_outputs, actions, mask, states_from, is_backward)[source]
Computes log probabilities of actions given policy outputs and actions.
- Parameters:
policy_outputs (tensor) – The output of the GFlowNet policy model.
mask (tensor) – The mask containing information about invalid actions and special cases.
actions (list or tensor) – The actions (absolute increments) from each state in the batch for which to compute the log probability.
states_from (tensor) – The states originating the actions, in GFlowNet format. They are required so as to compute the relative increments and the Jacobian.
is_backward (bool) – True if the actions are backward, False if the actions are forward (default). Required, since the computation for forward and backward actions is different.
- Return type:
torchtyping.TensorType[batch_size]
- step(action)[source]
Executes step given an action. An action is the absolute increment of each dimension.
- Parameters:
action (tuple) – Action to be executed. An action is a tuple of length n_dim, with the absolute increment for each dimension.
- Returns:
self.state (list) – The state after executing the action
action (tuple) – Action executed
valid (bool) – False, if the action is not allowed for the current state, e.g. stop at the root state
- Return type:
Tuple[List[float], Tuple[int, float], bool]
- step_backwards(action)[source]
Executes backward step given an action. An action is the absolute decrement of each dimension.
- Parameters:
action (tuple) – Action to be executed. An action is a tuple of length n_dim, with the absolute decrement for each dimension.
- Returns:
self.state (list) – The state after executing the action
action (tuple) – Action executed
valid (bool) – False, if the action is not allowed for the current state, e.g. stop at the root state
- Return type:
Tuple[List[float], Tuple[int, float], bool]
- get_grid_terminating_states(n_states, kappa=None)[source]
Constructs a grid of terminating states within the range of the hyper-cube.
- Parameters:
n_states (int) – Requested number of states. The actual number of states will be rounded up such that all dimensions have the same number of states.
kappa (float) – Small constant indicating the distance to the theoretical limits of the cube [0, 1], in order to avoid inaccuracies in the computation of the log probabilities due to clamping. The grid will thus be in [kappa, 1 - kappa]. If None, self.kappa will be used.
- Return type:
List[List]
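A minimal sketch of such a grid, including the rounding up of n_states, is given below. The function `grid_terminating_states` is hypothetical and takes n_dim explicitly because it is not bound to an environment instance; the way the library spaces the grid points is assumed here to be uniform.

```python
import itertools


def grid_terminating_states(n_states, n_dim, kappa=1e-3):
    """Return a regular grid in [kappa, 1 - kappa] ** n_dim with at least n_states points."""
    # Smallest number of points per dimension whose n_dim-th power covers n_states.
    n_per_dim = 1
    while n_per_dim ** n_dim < n_states:
        n_per_dim += 1
    if n_per_dim == 1:
        points = [0.5]  # arbitrary choice for the degenerate single-point grid
    else:
        step = (1.0 - 2.0 * kappa) / (n_per_dim - 1)
        points = [kappa + i * step for i in range(n_per_dim)]
    return [list(p) for p in itertools.product(points, repeat=n_dim)]


# Requesting 10 states in 2 dimensions is rounded up to a 4 x 4 grid.
states = grid_terminating_states(n_states=10, n_dim=2, kappa=0.001)
print(len(states))
```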
- get_uniform_terminating_states(n_states, seed=None, kappa=None)[source]
Constructs a set of terminating states sampled uniformly within the range of the hyper-cube.
- Parameters:
n_states (int) – Number of states in the returned list.
kappa (float) – Small constant indicating the distance to the theoretical limits of the cube [0, 1], in order to avoid inaccuracies in the computation of the log probabilities due to clamping. The states will thus be uniformly sampled in [kappa, 1 - kappa]. If None, self.kappa will be used.
seed (int)
- Return type:
List[List]
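Uniform sampling inside the shrunken cube can be sketched with the standard library. The function `uniform_terminating_states` is a hypothetical stand-in that takes n_dim explicitly; the random number generator used by the library is not specified here, so the sampled values will differ from the library's even with the same seed.

```python
import random


def uniform_terminating_states(n_states, n_dim, seed=None, kappa=1e-3):
    """Sample n_states points uniformly in [kappa, 1 - kappa] ** n_dim."""
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    return [
        [rng.uniform(kappa, 1.0 - kappa) for _ in range(n_dim)]
        for _ in range(n_states)
    ]


samples_a = uniform_terminating_states(5, n_dim=2, seed=0)
samples_b = uniform_terminating_states(5, n_dim=2, seed=0)
print(samples_a == samples_b)
```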
- class gflownet.envs.cube.HybridCube(**kwargs)[source]
Bases:
gflownet.envs.composite.setfix.SetFix
- Parameters:
subenvs (iterable) – An iterable containing the set of the sub-environments.
- states2proxy(states)[source]
Prepares a batch of states in environment format for a proxy.
The input states are in the environment format of the Set. The outputs contain only the Cube part and the format is as in gflownet.envs.cube.ContinuousCube.states2proxy().
- Parameters:
states (list) – A batch of states in Set environment format.
- Returns:
A tensor containing all the states in the batch.
- Return type:
torchtyping.TensorType[batch, state_proxy_dim]
- get_grid_terminating_states(n_states, kappa=None)[source]
Constructs a grid of terminating states within the range of the hyper-cube.
- Parameters:
n_states (int) – Requested number of states. The actual number of states will be rounded up such that all dimensions have the same number of states.
kappa (float) – Small constant indicating the distance to the theoretical limits of the cube [0, 1], in order to avoid inaccuracies in the computation of the log probabilities due to clamping. The grid will thus be in [kappa, 1 - kappa]. If None, self.kappa will be used.
- Return type:
List[List]
- get_uniform_terminating_states(n_states, seed=None, kappa=None)[source]
Constructs a set of terminating states sampled uniformly within the range of the hyper-cube.
- Parameters:
n_states (int) – Number of states in the returned list.
kappa (float) – Small constant indicating the distance to the theoretical limits of the cube [0, 1], in order to avoid inaccuracies in the computation of the log probabilities due to clamping. The states will thus be uniformly sampled in [kappa, 1 - kappa]. If None, self.kappa will be used.
seed (int)
- Return type:
List[List]
- fit_kde(samples, kernel='gaussian', bandwidth=0.1)[source]
Fits a Kernel Density Estimator on a batch of samples.
Simply calls fit_kde() of CubeBase.
- Parameters:
samples (torchtyping.TensorType[batch_size, state_proxy_dim])
kernel (str)
bandwidth (float)
- plot_reward_samples(samples, samples_reward, rewards, alpha=0.5, dpi=150, max_samples=500, **kwargs)[source]
Plots the reward contour alongside a batch of samples.
Simply calls plot_reward_samples() of CubeBase.
- Parameters:
samples (torchtyping.TensorType[batch_size, state_proxy_dim])
samples_reward (torchtyping.TensorType[batch_size, state_proxy_dim])
rewards (torchtyping.TensorType[batch_size])
alpha (float)
dpi (int)
max_samples (int)
- plot_kde(samples, kde, alpha=0.5, dpi=150, colorbar=True, **kwargs)[source]
Plots the density previously estimated from a batch of samples via KDE over the entire sample space.
Simply calls plot_kde() of CubeBase.
- Parameters:
samples (torchtyping.TensorType[batch_size, state_proxy_dim])
alpha (float)
colorbar (bool)