dna
Class to represent DNA sequences.
Module Contents
- class dna.DNA(proxy_fmt='onehot-np', **kwargs)
Bases: gflownet.envs.sequences.base.SequenceBase
- Parameters:
proxy_fmt (str) – Specifies the proxy format. The default is 'onehot-np'. Options:
- onehot: One-hot encoding
- letters: The nucleobases as a list of strings
- np or numpy: numpy array, for the onehot case
- torch or tensor: torch tensor, for the onehot case
- states2proxy_onehot(states)
Prepares a batch of states in “environment format” for a proxy model by one-hot encoding them. If numpy is True (the default), the output is converted to a numpy array; otherwise it remains a torch tensor.
- Example, with max_length = 5:
Sequence (tokens): ACGC
state: [1, 2, 4, 2, 0]
- policy format:
[0, 1, 0, 0, 0, (A)
0, 0, 1, 0, 0, (C)
0, 0, 0, 0, 1, (G)
0, 0, 1, 0, 0, (C)
1, 0, 0, 0, 0] (PAD)
- Parameters:
states (tensor) – A batch of states in environment format, either as a list of states or as a single tensor.
- Returns:
The one-hot encoding of all the states in the batch, as a numpy array or, if numpy conversion is disabled, as a torch tensor.
- Return type:
Union[torchtyping.TensorType[batch, policy_input_dim], numpy.typing.NDArray]
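The encoding in the example above can be reproduced with a minimal numpy sketch. This is an illustration only, not the library's implementation: the helper name one_hot_states and its n_classes argument are hypothetical, and the width of 5 (index 0 for padding plus four nucleobase indices) is an assumption read off the 5-column rows in the example.

```python
import numpy as np


def one_hot_states(states, n_classes):
    """One-hot encode a batch of integer states (hypothetical helper).

    Each state is a fixed-length sequence of token indices; the result has
    one flattened row of length max_length * n_classes per state.
    """
    states = np.asarray(states, dtype=np.int64)
    # Row i of the identity matrix is the one-hot vector for index i.
    one_hot = np.eye(n_classes, dtype=np.int64)[states]
    # (batch, max_length, n_classes) -> (batch, max_length * n_classes)
    return one_hot.reshape(states.shape[0], -1)


# ACGC followed by one padding position, as in the example above.
batch = [[1, 2, 4, 2, 0]]
# Assumption: 5 classes, i.e. padding (0) plus four nucleobase indices.
encoded = one_hot_states(batch, n_classes=5)

# View the single encoded state as one row per sequence position.
print(encoded.reshape(5, 5))
```

Flattening each state into a single row matches the batch-by-input-dimension shape given in the return type; the element dtype used here (int64) is a choice of this sketch and may differ from what the method returns.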