gflownet.envs.ctorus

Classes to represent continuous hyper-torus environments.

Classes

ContinuousTorus

Initializes a ContinuousTorus environment.

Module Contents

class gflownet.envs.ctorus.ContinuousTorus(n_dim=2, length_traj=1, n_comp=1, policy_encoding_dim_per_angle=None, vonmises_min_concentration=0.001, fixed_distr_params={'vonmises_mean': 0.0, 'vonmises_concentration': 0.5}, random_distr_params={'vonmises_mean': 0.0, 'vonmises_concentration': 0.001}, **kwargs)[source]

Bases: gflownet.envs.base.GFlowNetEnv

Initializes a ContinuousTorus environment.

Parameters:
  • n_dim (int) – Dimensionality of the torus.

  • length_traj (int) – Fixed length of the trajectory.

  • n_comp (int) – Number of components in the mixture of von Mises distributions used to sample angle increments.

  • policy_encoding_dim_per_angle (int) – Dimensionality of the policy encodings of the angles.

  • vonmises_min_concentration (float) – Minimum value allowed for the concentration parameter of the von Mises distributions.

  • fixed_distr_params (dict) – Dictionary of parameters of the von Mises distribution that defines the fixed distribution of the environment. It must contain two keys with float values: vonmises_mean and vonmises_concentration.

  • random_distr_params (dict) – Dictionary of parameters of the von Mises distribution that defines the random distribution of the environment. It must contain two keys with float values: vonmises_mean and vonmises_concentration.

n_dim = 2[source]
length_traj = 1[source]
n_comp = 1[source]
policy_encoding_dim_per_angle = None[source]
vonmises_min_concentration = 0.001[source]
source_angles[source]
source[source]
eos[source]
continuous = True[source]
property mask_dim[source]

Returns the dimensionality of the masks.

The mask consists of two fixed flags.

Returns:

The dimensionality of the masks.

get_action_space()[source]

The action space is continuous, thus not defined as such here.

The actions are tuples of length n_dim, where the value at position d indicates the increment of dimension d.

EOS is indicated by np.inf for all dimensions.

This method defines self.eos and the returned action space is simply a representative (arbitrary) action with an increment of 0.0 in all dimensions, and EOS.

action2representative(action)[source]

Returns the arbitrary, representative action in the action space, so that the action can be contrasted with the action space and masks. If the action is EOS, EOS is returned.

Parameters:

action (Tuple)

Return type:

Tuple
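
The relationship between continuous actions, the representative action and EOS described above can be illustrated with a minimal standalone sketch. It does not use the gflownet package: the names N_DIM, EOS, REPRESENTATIVE and action_space are hypothetical, and math.inf stands in for np.inf.

```python
import math

N_DIM = 2  # hypothetical dimensionality, matching the default n_dim=2

# EOS is a tuple of infinities, one per dimension.
EOS = tuple([math.inf] * N_DIM)
# Representative of every continuous action: a zero increment in all dimensions.
REPRESENTATIVE = tuple([0.0] * N_DIM)

# The "action space" only holds the representative action and EOS.
action_space = [REPRESENTATIVE, EOS]


def action2representative(action):
    """Map any continuous action to the representative; keep EOS as EOS."""
    if action == EOS:
        return EOS
    return REPRESENTATIVE
```

With this convention, any tuple of angle increments such as (0.3, -1.2) is looked up in the action space as (0.0, 0.0), while EOS maps to itself.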

get_policy_output(params)[source]

Defines the structure of the output of the policy model, from which an action is to be determined or sampled, by returning a vector with a fixed random policy.

For each dimension d of the hyper-torus and component c of the mixture, the output of the policy should return

  1. the weight of the component in the mixture

  2. the location of the von Mises distribution to sample the angle increment

  3. the log concentration of the von Mises distribution to sample the angle increment

Therefore, the output of the policy model has dimensionality D x C x 3, where D is the number of dimensions (self.n_dim) and C is the number of components (self.n_comp). The first 3 x C entries in the policy output correspond to the first dimension, and so on.

Parameters:

params (dict)

Return type:

torchtyping.TensorType[policy_output_dim]
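
The layout of the policy output can be sketched in plain Python as follows. The description only states that the first 3 x C entries belong to the first dimension; the sketch additionally assumes that each component occupies three consecutive entries (weight, location, log concentration). The helper names policy_output_indices and fixed_policy_output are hypothetical and a list stands in for the tensor.

```python
def policy_output_indices(d, c, n_comp):
    """Return the flat indices of (weight, location, log concentration)
    for dimension d and mixture component c.

    Assumption: each component occupies 3 consecutive entries.
    """
    base = (d * n_comp + c) * 3
    return base, base + 1, base + 2


def fixed_policy_output(n_dim, n_comp, params):
    """Build a flat policy output that holds the same parameters everywhere."""
    # All weight entries are set to 1.0, that is an equally weighted mixture.
    output = [1.0] * (n_dim * n_comp * 3)
    for d in range(n_dim):
        for c in range(n_comp):
            _, i_loc, i_conc = policy_output_indices(d, c, n_comp)
            output[i_loc] = params["vonmises_mean"]
            # The dictionary value is stored as given, without transformation.
            output[i_conc] = params["vonmises_concentration"]
    return output
```

For n_dim=2 and n_comp=3 the resulting vector has 2 x 3 x 3 = 18 entries, and the entries of the first component of the second dimension start at index 9.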

get_mask_invalid_actions_forward(state=None, done=None)[source]

The action space is continuous, thus the mask is not of invalid actions as in discrete environments, but an indicator of “special cases”, for example states from which only certain actions are possible.

The “mask” has 2 elements - to match the mask of backward actions - but only one is needed for forward actions, thus both elements take the same value, according to the following:

  • If done is True, then the mask is True.

  • If the number of actions (state[-1]) is equal to the (fixed) trajectory length, then only EOS is valid and the mask is True.

  • Otherwise, any continuous action is valid (except EOS) and the mask is False.

Parameters:
  • state (Optional[List])

  • done (Optional[bool])

Return type:

List
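
The three rules above can be sketched as a small standalone function. The name forward_mask and the explicit length_traj argument are hypothetical; the state is assumed to be a list whose last item is the number of actions taken so far.

```python
def forward_mask(state, done, length_traj):
    """Two-element forward mask; both elements always share one value."""
    n_actions = state[-1]  # last item of the state counts the actions taken
    # True if the trajectory is done or only EOS remains valid.
    only_eos = bool(done) or n_actions == length_traj
    return [only_eos, only_eos]
```

For example, with length_traj=3 the state [1.0, 2.0, 3] yields [True, True], while [0.0, 0.0, 0] yields [False, False].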

get_mask_invalid_actions_backward(state=None, done=None, parents_a=None)[source]

The action space is continuous, thus the mask is not of invalid actions as in discrete environments, but an indicator of “special cases”, for example states from which only certain actions are possible.

The “mask” has 2 elements to capture the 2 special cases in backward actions. The possible values of the mask are the following:

  • mask[0]:
    • True, if only the “return-to-source” action is valid.

    • False otherwise.

  • mask[1]:
    • True, if only the EOS action is valid, that is if done is True.

    • False otherwise.
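
A minimal sketch of the backward mask follows. It assumes, based on the description of sample_actions_batch(), that "return-to-source" is the only valid action when exactly one action has been taken (state[-1] == 1). The function name backward_mask is hypothetical.

```python
def backward_mask(state, done):
    """Two-element backward mask: [only return-to-source, only EOS]."""
    if done:
        # From a done state, the only valid backward action is EOS.
        return [False, True]
    if state[-1] == 1:
        # Assumption: one step away from the source, only return-to-source is valid.
        return [True, False]
    return [False, False]
```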

get_valid_actions(mask=None, state=None, done=None, backward=False)[source]

Returns the list of non-invalid (valid, for short) actions according to the mask of invalid actions.

As a continuous environment, the returned actions are “representatives”, that is the actions represented in the action space.

Parameters:
  • mask (list (optional)) – The mask of a state. If None, it is computed in place.

  • state (list (optional)) – A state in GFlowNet format. If None, self.state is used.

  • done (bool (optional)) – Whether the trajectory is done. If None, self.done is used.

  • backward (bool) – True if the transition is backwards; False if forward.

Returns:

list – The list of representatives of the valid actions.

Return type:

List[Tuple]

get_parents(state=None, done=None, action=None)[source]

Defined only because it is required. A ContinuousEnv should be created to avoid this issue.

Parameters:
  • state (List)

  • done (bool)

  • action (Tuple[int, float])

Return type:

Tuple[List[List], List[Tuple[int, float]]]

step(action, skip_mask_check=False)[source]

Executes forward step given an action.

See: _step().

Parameters:
  • action (tuple) – Action to be executed. An action is a vector where the value at position d indicates the increment in the angle at dimension d.

  • skip_mask_check (bool) – Ignored because the action space is fully continuous, therefore there is nothing to check.

Returns:

  • self.state (list) – The sequence after executing the action

  • action (tuple) – Action executed

  • valid (bool) – False, if the action is not allowed for the current state, e.g. stop at the root state

Return type:

Tuple[List[float], Tuple[float], bool]
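
As an illustration of what a forward step amounts to, here is a standalone sketch that does not call the library. It assumes that the state is a list [angle_1, ..., angle_D, n_actions], that angles are wrapped into [0, 2π), and that EOS is valid only once the fixed trajectory length has been reached. The function name forward_step and its length_traj argument are hypothetical.

```python
import math

TWO_PI = 2 * math.pi


def forward_step(state, action, length_traj):
    """Apply angle increments to a state [angle_1, ..., angle_D, n_actions].

    Returns (new_state, action, valid), mirroring the documented return value.
    """
    *angles, n_actions = state
    is_eos = all(math.isinf(a) for a in action)
    if n_actions == length_traj:
        # Only EOS is valid once the fixed trajectory length is reached.
        return list(state), action, is_eos
    if is_eos:
        # EOS is not valid before the trajectory is complete.
        return list(state), action, False
    # Assumption: angles are wrapped into [0, 2*pi).
    new_angles = [(a + da) % TWO_PI for a, da in zip(angles, action)]
    return new_angles + [n_actions + 1], action, True
```

Under these assumptions, an increment of 3π applied to an angle of 0 results in an angle of π, and the action counter grows by one.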

step_backwards(action, skip_mask_check=False)[source]

Executes backward step given an action.

See: _step().

Parameters:
  • action (tuple) – Action to be executed. An action is a vector where the value at position d indicates the increment in the angle at dimension d.

  • skip_mask_check (bool) – Ignored because the action space is fully continuous, therefore there is nothing to check.

Returns:

  • self.state (list) – The sequence after executing the action

  • action (tuple) – Action executed

  • valid (bool) – False, if the action is not allowed for the current state, e.g. stop at the root state

Return type:

Tuple[List[float], Tuple[float], bool]

states2proxy(states)[source]

Prepares a batch of states in “environment format” for the proxy: each state is a vector of length n_dim where each value is an angle in radians. The n_actions item is removed.

Parameters:

states (list or tensor) – A batch of states in environment format, either as a list of states or as a single tensor.

Returns:

A tensor containing all the states in the batch.

Return type:

torchtyping.TensorType[batch, state_proxy_dim]

states2policy(states)[source]

Prepares a batch of states in “environment format” for the policy model: if policy_encoding_dim_per_angle >= 2, then the state (angles) is encoded using trigonometric components.

Parameters:

states (list or tensor) – A batch of states in environment format, either as a list of states or as a single tensor.

Returns:

A tensor containing all the states in the batch.

Return type:

torchtyping.TensorType[batch, policy_input_dim]
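
The exact trigonometric encoding is not specified above. One common choice, sketched here purely as an assumption, is to represent each angle θ by cos(kθ) and sin(kθ) for k = 1, ..., policy_encoding_dim_per_angle // 2, and to append the number of actions unchanged. The helper encode_angle is hypothetical, and lists stand in for tensors.

```python
import math


def encode_angle(angle, encoding_dim):
    """Encode one angle as [cos(k*angle)..., sin(k*angle)...], k = 1..encoding_dim // 2."""
    n_freq = encoding_dim // 2
    cosines = [math.cos(k * angle) for k in range(1, n_freq + 1)]
    sines = [math.sin(k * angle) for k in range(1, n_freq + 1)]
    return cosines + sines


def states2policy(states, encoding_dim=None):
    """Encode a batch of states [angle_1, ..., angle_D, n_actions] for a policy."""
    encoded = []
    for state in states:
        *angles, n_actions = state
        if encoding_dim is not None and encoding_dim >= 2:
            row = [v for angle in angles for v in encode_angle(angle, encoding_dim)]
        else:
            # Without encoding, the angles are passed through unchanged.
            row = list(angles)
        # Assumption: the action counter is kept as the last feature.
        encoded.append(row + [float(n_actions)])
    return encoded
```

An encoding of this kind maps the angles 0 and 2π to the same features, which a raw angle value does not.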

state2readable(state)[source]

Converts a state (a list of positions) into a human-readable string representing a state. Angles are converted into degrees in [0, 360].

Parameters:

state (List)

Return type:

str

readable2state(readable)[source]

Converts a human-readable string representing a state into a state as a list of positions. Angles are converted back to radians.

Parameters:

readable (str)

Return type:

List

sample_actions_batch(policy_outputs, mask=None, states_from=None, is_backward=False, random_action_prob=0.0, temperature_logits=1.0)[source]

Samples a batch of actions from a batch of policy outputs. The angle increments that form the actions are sampled from a mixture of von Mises distributions.

A distinction between forward and backward actions is made and specified by the argument is_backward, in order to account for the following special cases:

Forward:

  • If the number of steps is equal to the maximum, then the only valid action is EOS.

Backward:

  • If the number of steps is equal to 1, then the only valid action is to return to the source. The specific action depends on the current state.

Parameters:
  • policy_outputs (tensor) – The output of the GFlowNet policy model.

  • mask (tensor) – The mask containing information about special cases.

  • states_from (tensor) – The states originating the actions, in GFlowNet format.

  • is_backward (bool) – True if the actions are backward, False if the actions are forward (default).

  • random_action_prob (float, optional) – The probability of sampling a random action.

  • temperature_logits (float, optional) – A scalar by which the model outputs are divided to temper the sampling distribution.

Return type:

Tuple[List[Tuple], torchtyping.TensorType[ContinuousTorus.sample_actions_batch.n_states]]
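
Sampling a single angle increment from a mixture of von Mises distributions can be sketched with the Python standard library alone. This is not the library's batched implementation. It assumes that the mixture weights are given as logits passed through a softmax and that the minimum concentration is added to the exponentiated log concentration. The function name sample_increment is hypothetical, and the special cases (EOS, return to source) are left out.

```python
import math
import random


def sample_increment(weight_logits, locations, log_concentrations, rng,
                     min_concentration=0.001):
    """Sample one angle increment from a mixture of von Mises distributions."""
    # Softmax over the component logits gives the mixture weights.
    max_logit = max(weight_logits)
    exps = [math.exp(w - max_logit) for w in weight_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Pick a component according to the mixture weights.
    c = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    # Assumption: the minimum concentration acts as an additive offset.
    kappa = math.exp(log_concentrations[c]) + min_concentration
    # random.vonmisesvariate expects a mean angle in [0, 2*pi].
    return rng.vonmisesvariate(locations[c] % (2 * math.pi), kappa)
```

For example, sample_increment([0.0], [math.pi], [5.0], random.Random(0)) draws an increment concentrated around π. An action would then be the tuple of one such increment per dimension.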

get_logprobs(policy_outputs, actions, mask, states_from=None, is_backward=False)[source]

Computes log probabilities of actions given policy outputs and actions.

Parameters:
  • policy_outputs (tensor) – The output of the GFlowNet policy model.

  • mask (tensor) – The mask containing information about special cases.

  • actions (list or tensor) – The actions (angle increments) from each state in the batch for which to compute the log probability.

  • states_from (tensor) – Ignored.

  • is_backward (bool) – Ignored.

Return type:

torchtyping.TensorType[batch_size]
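
For reference, the log probability of an action under a mixture of von Mises distributions can be sketched as below. The sketch assumes independent dimensions, so that the log densities are summed over dimensions, and uses a truncated power series for the modified Bessel function I0, which is adequate only for moderate concentrations. All function names are hypothetical and the special cases handled through the mask are not covered.

```python
import math


def log_bessel_i0(kappa, n_terms=60):
    """log I0(kappa) via the power series sum_k (kappa / 2)^(2k) / (k!)^2."""
    total, term = 1.0, 1.0
    for k in range(1, n_terms):
        term *= (kappa / 2.0) ** 2 / (k * k)
        total += term
    return math.log(total)


def vonmises_logpdf(x, loc, kappa):
    """Log density of a von Mises distribution at angle x."""
    return kappa * math.cos(x - loc) - math.log(2 * math.pi) - log_bessel_i0(kappa)


def mixture_logprob(increments, mixtures):
    """Sum over dimensions of the log density of each increment.

    mixtures holds, per dimension, a list of (weight, location, concentration)
    tuples whose weights sum to 1.
    """
    logprob = 0.0
    for x, components in zip(increments, mixtures):
        terms = [math.log(w) + vonmises_logpdf(x, loc, kappa)
                 for w, loc, kappa in components]
        # Log-sum-exp over the components for numerical stability.
        m = max(terms)
        logprob += m + math.log(sum(math.exp(t - m) for t in terms))
    return logprob
```

With a single component of concentration 0, the density is uniform on the circle and the log probability of any increment is -log(2π) per dimension.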

copy()[source]
get_grid_terminating_states(n_states)[source]

Samples n terminating states by sub-sampling the state space as a grid, where n / n_dim points are obtained for each dimension.

Parameters:

n_states (int) – The number of terminating states to sample.

Returns:

states (list) – A list of randomly sampled terminating states.

Return type:

List[List]

get_uniform_terminating_states(n_states, seed=None)[source]
Parameters:
  • n_states (int)

  • seed (int)

Return type:

List[List]

fit_kde(samples, kernel='gaussian', bandwidth=0.1)[source]

Fits a Kernel Density Estimator on a batch of samples.

The samples are previously augmented in order to account for the periodic aspect of the sample space.

Parameters:
  • samples (tensor) – A batch of samples in proxy format.

  • kernel (str) – An identifier of the kernel to use for the density estimation. It must be a valid kernel for the scikit-learn method sklearn.neighbors.KernelDensity().

  • bandwidth (float) – The bandwidth of the kernel.

plot_reward_samples(samples, samples_reward, rewards, min_domain=-np.pi, max_domain=3 * np.pi, alpha=0.5, dpi=150, max_samples=500, **kwargs)[source]

Plots the reward contour alongside a batch of samples.

The samples are previously augmented in order to visualise the periodic aspect of the sample space. It is assumed that the rewards are sorted from left to right (first) and top to bottom of the grid of samples.

Parameters:
  • samples (tensor) – A batch of samples from the GFlowNet policy in proxy format. These samples will be plotted on top of the reward density.

  • samples_reward (tensor) – A batch of samples containing a grid over the sample space, from which the reward has been obtained. Ignored by this method.

  • rewards (tensor) – The rewards of samples_reward. It should be a vector of dimensionality n_per_dim ** 2 and be sorted such that each block at rewards[i * n_per_dim:i * n_per_dim + n_per_dim] corresponds to the rewards at the i-th row of the grid of samples, from top to bottom.

  • min_domain (float) – Minimum value of the domain to keep in the plot.

  • max_domain (float) – Maximum value of the domain to keep in the plot.

  • alpha (float) – Transparency of the reward contour.

  • dpi (int) – Dots per inch, indicating the resolution of the plot.

  • max_samples (int) – Maximum number of samples to include in the plot.
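
The expected ordering of rewards can be made concrete with a short sketch that splits the flat vector into the rows of the grid. The function name rewards_to_rows is hypothetical and a list stands in for the tensor.

```python
def rewards_to_rows(rewards, n_per_dim):
    """Split a flat reward vector into the rows of an n_per_dim x n_per_dim grid."""
    if len(rewards) != n_per_dim ** 2:
        raise ValueError("rewards must contain n_per_dim ** 2 values")
    # Row i is the block rewards[i * n_per_dim : i * n_per_dim + n_per_dim].
    return [
        rewards[i * n_per_dim : i * n_per_dim + n_per_dim]
        for i in range(n_per_dim)
    ]
```

For instance, with n_per_dim=2 the vector [1, 2, 3, 4] is read as a first (top) row [1, 2] and a second row [3, 4].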

plot_kde(samples, kde, alpha=0.5, dpi=150, colorbar=True, **kwargs)[source]

Plots the density previously estimated from a batch of samples via KDE over the entire sample space.

Parameters:
  • samples (tensor) – A batch of samples containing a grid over the sample space. These samples are used to plot the contour of the estimated density.

  • kde (KDE) – A scikit-learn KDE object fit with a batch of samples.

  • alpha (float) – Transparency of the density contour.

  • dpi (int) – Dots per inch, indicating the resolution of the plot.

  • colorbar (bool)

static augment_samples(samples, exclude_original=False)[source]

Augments a batch of samples by applying the periodic boundary conditions from [0, 2pi) to [-2pi, 4pi) for all dimensions.

Parameters:
  • samples (numpy.array)

  • exclude_original (bool)

Return type:

numpy.array
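
The periodic augmentation can be sketched as follows, using lists instead of NumPy arrays. The sketch assumes that every sample is copied with each combination of shifts of -2π, 0 and +2π per dimension, which yields 3 ** n_dim copies in total. The order of the copies is arbitrary here and may differ from the library's.

```python
import itertools
import math


def augment_samples(samples, exclude_original=False):
    """Replicate samples in [0, 2*pi) over [-2*pi, 4*pi) in every dimension."""
    two_pi = 2 * math.pi
    n_dim = len(samples[0]) if samples else 0
    augmented = []
    # One shift in {-1, 0, 1} periods per dimension.
    for shifts in itertools.product((-1, 0, 1), repeat=n_dim):
        if exclude_original and not any(shifts):
            # The all-zero shift reproduces the original samples.
            continue
        for sample in samples:
            augmented.append([x + s * two_pi for x, s in zip(sample, shifts)])
    return augmented
```

In two dimensions, each sample therefore yields 9 points, or 8 when exclude_original is True.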