selfies

Class to represent SELFIES molecules.

Attributes

SELFIES_VOCAB_SMALL

PAD_TOKEN

Classes

Selfies

Module Contents

selfies.SELFIES_VOCAB_SMALL = ['[#Branch1]', '[#Branch2]', '[#C]', '[#N]', '[=Branch1]', '[=Branch2]', '[=C]', '[=N]', '[=O]',...
selfies.PAD_TOKEN = '[nop]'
class selfies.Selfies(selfies_vocab=None, **kwargs)

Bases: gflownet.envs.sequences.base.SequenceBase

Parameters:

selfies_vocab (List[str] | None) – The list of SELFIES tokens to use as the vocabulary. If None (default), the small vocabulary defined in SELFIES_VOCAB_SMALL is used.
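
The default-vocabulary behaviour described above can be sketched as follows. This is a minimal illustration, not the class's actual constructor: resolve_vocab and EXAMPLE_VOCAB_SMALL are hypothetical names, and the example list is a short stand-in because the real SELFIES_VOCAB_SMALL is truncated in this listing.

```python
from typing import List, Optional

# Hypothetical stand-in for SELFIES_VOCAB_SMALL; the real list is longer.
EXAMPLE_VOCAB_SMALL = ["[#C]", "[#N]", "[=C]", "[=N]", "[=O]"]


def resolve_vocab(selfies_vocab: Optional[List[str]] = None) -> List[str]:
    """Return the given vocabulary, or a copy of the small default if None."""
    if selfies_vocab is None:
        # Copy so that callers cannot mutate the module-level default.
        return list(EXAMPLE_VOCAB_SMALL)
    return list(selfies_vocab)
```

Under these assumptions, resolve_vocab() yields the small default, while resolve_vocab(["[C]", "[O]"]) keeps the caller's tokens unchanged.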

selfies_vocab = None
states2proxy(states)

Prepare a batch of states for a SELFIES-string proxy.

The proxy representation is the compact SELFIES string obtained by concatenating all non-padding tokens in the sequence.

Parameters:

states (list or tensor) – A batch of states in environment format, either as a list of states or as a single tensor.

Returns:

A list containing one SELFIES string per state.

Return type:

List[str]
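
The conversion described for states2proxy might look roughly like the sketch below. It assumes that each state is a fixed-length sequence of integer token indices and that index 0 maps to the padding token; neither assumption is confirmed by this page. The names states_to_selfies and EXAMPLE_TOKENS are hypothetical, and only list input is covered.

```python
from typing import List, Sequence

PAD_TOKEN = "[nop]"  # same value as selfies.PAD_TOKEN

# Hypothetical index-to-token table; index 0 is assumed to be padding.
EXAMPLE_TOKENS = [PAD_TOKEN, "[C]", "[=O]", "[N]"]


def states_to_selfies(
    states: Sequence[Sequence[int]],
    tokens: Sequence[str] = EXAMPLE_TOKENS,
    pad_token: str = PAD_TOKEN,
) -> List[str]:
    """Join the non-padding tokens of each state into one SELFIES string."""
    strings = []
    for state in states:
        # Map each index to its token and drop padding before joining.
        kept = [tokens[int(i)] for i in state if tokens[int(i)] != pad_token]
        strings.append("".join(kept))
    return strings


batch = [[1, 2, 0, 0], [3, 1, 1, 0]]
print(states_to_selfies(batch))  # ['[C][=O]', '[N][C][C]']
```

Dropping the padding before joining is what makes the result "compact": the returned strings contain only real SELFIES symbols, so their lengths can differ even though the input states share one padded length.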