dna
===

.. py:module:: dna

.. autoapi-nested-parse::

   Class to represent DNA sequences.


Attributes
----------

.. autoapisummary::

   dna.NUCLEOBASES
   dna.PAD_TOKEN


Classes
-------

.. autoapisummary::

   dna.DNA


Module Contents
---------------

.. py:data:: NUCLEOBASES
   :value: ('A', 'C', 'T', 'G')


.. py:data:: PAD_TOKEN
   :value: '0'


.. py:class:: DNA(proxy_fmt = 'onehot-np', **kwargs)

   Bases: :py:obj:`gflownet.envs.sequences.base.SequenceBase`

   :param proxy_fmt: Specifies the proxy format. Options:

                     - onehot: One-hot encoding
                     - letters: The nucleobases as a list of strings
                     - np or numpy: numpy array, for the onehot case
                     - torch or tensor: torch tensor, for the onehot case
   :type proxy_fmt: str

   .. py:method:: states2proxy_onehot(states)

      Prepares a batch of states in "environment format" for a proxy model:
      states are one-hot encoded.

      If numpy is True (default), the output is converted into a numpy array;
      otherwise it remains a torch tensor.

      Example, with max_length = 5:

      - Sequence (tokens): ACGC
      - state: [1, 2, 4, 2, 0]
      - one-hot encoding:

        | [0, 1, 0, 0, 0, (A)
        | 0, 0, 1, 0, 0, (C)
        | 0, 0, 0, 0, 1, (G)
        | 0, 0, 1, 0, 0, (C)
        | 1, 0, 0, 0, 0] (PAD)

      :param states: A batch of states in environment format, either as a list
                     of states or as a single tensor.
      :type states: list or tensor

      :returns: *A numpy array containing the one-hot encoding of all the
                states in the batch.*
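
The ``ACGC`` example given for ``DNA.states2proxy_onehot`` can be reproduced with a minimal sketch in plain Python. This is not the library implementation: ``tokens_to_state`` and ``onehot_state`` are hypothetical helpers introduced only for illustration, and the sketch assumes that index 0 marks padding and that nucleobases are numbered from 1 in the order of ``NUCLEOBASES``, as the example suggests. It returns a flat list rather than a numpy array or torch tensor.

.. code-block:: python

   NUCLEOBASES = ("A", "C", "T", "G")  # as documented for dna.NUCLEOBASES
   PAD_INDEX = 0  # assumption: index 0 marks a padding position in a state


   def tokens_to_state(sequence, max_length):
       """Hypothetical helper: map letters to indices 1..4 and pad with 0."""
       indices = [NUCLEOBASES.index(token) + 1 for token in sequence]
       return indices + [PAD_INDEX] * (max_length - len(indices))


   def onehot_state(state, n_tokens=len(NUCLEOBASES) + 1):
       """Hypothetical helper: flat one-hot encoding of a single state."""
       flat = []
       for index in state:
           row = [0] * n_tokens  # one slot for PAD plus one per nucleobase
           row[index] = 1
           flat.extend(row)
       return flat


   state = tokens_to_state("ACGC", max_length=5)
   encoded = onehot_state(state)

   # Print one row of five values per sequence position.
   for position in range(len(state)):
       print(encoded[position * 5:(position + 1) * 5])

Each position contributes five values: the first slot marks padding and the remaining four mark A, C, T and G, so a state of length 5 yields 25 values.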