GFlowNet#

This repository implements GFlowNets, generative flow networks for probabilistic modelling, in PyTorch. A guiding design principle of this implementation is the separation of the logic of the GFlowNet agent from the environments on which the agent can be trained. This separation makes it easy to extend the code with new environments for new applications. Configuration is handled with Hydra.

Contributors#

Many wonderful scientists and developers have contributed to this repository: Alex Hernandez-Garcia, Nikita Saxena, Alexandra Volokhova, Michał Koziarski, Divya Sharma, Pierre Luc Carrier and Victor Schmidt. The GFlowNet implementation was initially part of github.com/InfluenceFunctional/ActiveLearningPipeline.

Research#

This repository has been used in at least the following research articles:

Installation#

Quickstart: If you simply want to install everything, run setup_all.sh.

  • This project requires Python 3.10 and CUDA 11.8.

  • Setup is currently only supported on Ubuntu. It should also work on OSX, but you will need to handle the package dependencies yourself.

  • The recommended installation is as follows:

python3.10 -m venv ~/envs/gflownet  # Initialize your virtual env.
source ~/envs/gflownet/bin/activate  # Activate your environment.
./prereq_ubuntu.sh  # Installs some packages required by dependencies.
./prereq_python.sh  # Installs python packages with specific wheels.
./prereq_geometric.sh  # OPTIONAL - for the molecule environment.
pip install .[all]  # Install the remaining elements of this package.

Aside from the base packages, you can optionally install development tools with the dev tag, materials dependencies with the materials tag, or molecule packages with the molecules tag. The simplest option is to use the all tag, as above, which installs all dependencies.

How to train a GFlowNet model#

To train a GFlowNet model with the default configuration, simply run

python main.py user.logdir.root=<path/to/log/files/>

Alternatively, you can create a user configuration file in config/user/<username>.yaml specifying a logdir.root and run

python main.py user=<username>
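As a minimal sketch of what such a user configuration file could contain, the helper below writes a YAML file with a single nested key, logdir.root. The function name write_user_config and the example paths are hypothetical, and the sketch assumes that logdir.root is the only key needed; the actual file may require additional keys.

```python
from pathlib import Path


def write_user_config(config_root: str, username: str, logdir_root: str) -> Path:
    """Write <config_root>/user/<username>.yaml with a single logdir.root entry.

    Hypothetical helper for illustration; it is not part of the repository.
    """
    user_dir = Path(config_root) / "user"
    user_dir.mkdir(parents=True, exist_ok=True)
    config_path = user_dir / f"{username}.yaml"
    # The nested YAML keys below correspond to the dotted name logdir.root.
    config_path.write_text(f"logdir:\n  root: {logdir_root}\n")
    return config_path
```

For example, write_user_config("config", "alice", "/path/to/log/files") would create config/user/alice.yaml, which can then be selected with user=alice.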

Using Hydra, you can easily specify any variable of the configuration in the command line. For example, to train GFlowNet with the trajectory balance loss, on the continuous torus (ctorus) environment and the corresponding proxy:

python main.py gflownet=trajectorybalance env=ctorus proxy=torus

The above command overrides the default env and proxy configurations with the files config/env/ctorus.yaml and config/proxy/torus.yaml, respectively.
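When launching several runs from a script, it can help to assemble the command line from a mapping of overrides. The following is a small sketch of that idea; build_train_command is a hypothetical helper, not a function provided by this repository or by Hydra.

```python
import shlex


def build_train_command(overrides: dict[str, str], script: str = "main.py") -> list[str]:
    """Turn a mapping of key=value overrides into an argument list for the script."""
    return ["python", script] + [f"{key}={value}" for key, value in overrides.items()]


command = build_train_command(
    {"gflownet": "trajectorybalance", "env": "ctorus", "proxy": "torus"}
)
# Print the command as it would be typed in a shell.
print(shlex.join(command))
# prints: python main.py gflownet=trajectorybalance env=ctorus proxy=torus
```

The resulting list can be passed to subprocess.run(command, check=True) to start the training run from Python.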

Hydra configuration is hierarchical, so nested variables are addressed with dotted names. For instance, a handy variable to change while debugging is logger.do.online: setting logger.do.online=False avoids logging to wandb.
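To illustrate how a dotted name such as logger.do.online addresses a nested configuration value, here is a simplified sketch using plain dictionaries. It is not Hydra's actual implementation: apply_override is a hypothetical function, the starting configuration is made up for the example, and only boolean literals are parsed.

```python
from copy import deepcopy


def apply_override(config: dict, override: str) -> dict:
    """Return a copy of config with one 'a.b.c=value' override applied."""
    dotted_key, raw_value = override.split("=", 1)
    *parents, leaf = dotted_key.split(".")
    updated = deepcopy(config)
    node = updated
    # Walk down the hierarchy, creating intermediate levels if they are missing.
    for name in parents:
        node = node.setdefault(name, {})
    # Interpret true/false as booleans; keep any other value as a string.
    literals = {"true": True, "false": False}
    node[leaf] = literals.get(raw_value.lower(), raw_value)
    return updated


# Illustrative starting configuration (not the repository's defaults).
config = {"logger": {"do": {"online": True}}}
debug_config = apply_override(config, "logger.do.online=False")
print(debug_config)
# prints: {'logger': {'do': {'online': False}}}
```

The original dictionary is left untouched, so the same base configuration can be reused with different overrides.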

GFlowNet loss functions#

Currently, the implementation includes the following GFlowNet losses:

Logging to wandb#

The repository supports logging of train and evaluation metrics to wandb.ai, but it is disabled by default. In order to enable it, set the configuration variable logger.do.online to True.

Cite#

Bibtex Format

@misc{hernandez-garcia2024,
  author = {Hernandez-Garcia, Alex and Saxena, Nikita and Volokhova, Alexandra and Koziarski, Michał and Sharma, Divya and Viviano, Joseph D and Carrier, Pierre Luc and Schmidt, Victor},
  title  = {gflownet},
  url    = {https://github.com/alexhernandezgarcia/gflownet},
  year   = {2024},
}

Or CFF file