BioLMTox Info

Chance Challacombe

Dec 26, 2023

This page explains applications of BioLMTox and documents its usage for classification and embedding extraction on BioLM.


Toxin classification has important applications in both industry and research. It is a long-standing biosecurity concern and is relevant to protein, DNA, and drug design. BioLMTox applies the pre-train, fine-tune paradigm: it adapts the pre-trained ESM-2 protein language model for general toxin classification.

Model Background

BioLMTox is a protein language model fine-tuned for general toxin classification, meaning it is intended to work across different domains of life and sequence lengths. It was trained on a selection of sequences from UniProt, UniRef50, and comparable state-of-the-art (SOTA) datasets.

Applications of BioLMTox

BioLMTox classification predictions and embeddings can be:

  • used to augment biosecurity screening, for example by incorporating BioLMTox predictions before wet lab testing or alongside other computational screening software.

  • used to discriminate between toxin and non-toxin homologs that may bypass standard sequence similarity methods.

  • incorporated into public-facing APIs, web apps, and chat agents to reduce dual-use risks.
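As a rough illustration of the first two points, the sketch below combines a toxin score with the result of a separate similarity screen, so that a sequence is flagged if either check fires. It makes no API calls and does not reflect the actual BioLMTox response format: the `screen` function, the `ScreenResult` fields, the 0.5 threshold, and the assumption that the score is probability-like in [0, 1] are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Hypothetical cutoff; this page does not specify a decision threshold.
TOXIN_THRESHOLD = 0.5


@dataclass
class ScreenResult:
    sequence_id: str
    toxin_score: float    # assumed: probability-like classifier score in [0, 1]
    similarity_hit: bool  # assumed: flag from a separate sequence-similarity screen
    flagged: bool
    reason: str


def screen(sequence_id: str, toxin_score: float, similarity_hit: bool = False,
           threshold: float = TOXIN_THRESHOLD) -> ScreenResult:
    """Flag a sequence if either the classifier or the similarity screen fires."""
    if not 0.0 <= toxin_score <= 1.0:
        raise ValueError("toxin_score is assumed to lie in [0, 1]")
    reasons = []
    if toxin_score >= threshold:
        reasons.append("classifier")
    if similarity_hit:
        reasons.append("similarity")
    return ScreenResult(
        sequence_id=sequence_id,
        toxin_score=toxin_score,
        similarity_hit=similarity_hit,
        flagged=bool(reasons),
        reason="+".join(reasons) or "none",
    )


# Made-up scores for three candidate sequences, for illustration only.
candidates = [
    ("seq_a", 0.93, False),  # classifier flags it, similarity screen does not
    ("seq_b", 0.12, True),   # similarity screen flags it, classifier does not
    ("seq_c", 0.05, False),  # neither check fires
]
results = [screen(*c) for c in candidates]
flagged_ids = [r.sequence_id for r in results if r.flagged]
print(flagged_ids)  # ['seq_a', 'seq_b']
```

Taking the union of the two checks is a deliberately conservative choice: a homolog that slips past the similarity screen can still be caught by the classifier, and vice versa. A real pipeline would need a threshold calibrated on held-out data.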

BioLM Benefits

  • Always-on, auto-scaling, GPU-backed APIs with highly scalable parallelization.

  • Save money on infrastructure, GPU costs, and development time.

  • Quickly integrate multiple embeddings into your workflows.

  • Interact with the endpoint using natural language and our Chat Agents.

  • Rapidly screen for biosecurity risks.

  • Get ahead of potential biosecurity regulations and laws.