ESM-1v API#

Zeeshan Siddiqui

Oct 18, 2023


This page shows and explains how to use ESM-1v. It also documents the BioLM API for ESM-1v and demonstrates the no-code and code interfaces to the model.

API Usage#

There are six BioLM endpoints for ESM-1v: one for each of the five ESM-1v models, and one that combines all five. These endpoints are:

The BioLM API ESM-1v predict endpoints have been customized to return the likelihood of every amino acid (AA) at any <mask> position, so you can easily compare the likelihood of the sequence being functional with the wild-type (WT) residue against a single-AA mutation at that position. To get a direct “what is the likelihood of function of this sequence” answer out of the model, mask one AA and read the score that the API returns for the WT AA at that position.

The BioLM API has five distinct single-model endpoints because ESM-1v consists of five models trained on the same data with different random seeds. The likelihoods each one returns for the same input are therefore slightly different. The best results are achieved by averaging the likelihoods given by all five models for a given AA at a given position, which is what the sixth BioLM API endpoint does.

For example, using the ESM-1v model 1, the predict API endpoint is https://biolm.ai/api/v2/esm1v-n1/predict/.
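The averaging across models described above can be sketched locally as follows. This is a minimal illustration, not the API's implementation: the function names `average_scores` and `log_ratio` are hypothetical, the scores are made up, and the sketch assumes each model returns an entry for the same set of amino acids in the shape shown in the JSON response section.

```python
import math
from collections import defaultdict


def average_scores(per_model_results):
    """Average the score of each amino acid across several models.

    per_model_results holds one list per model; each list contains
    dictionaries with "token_str" and "score" keys.
    Assumes every model reports the same set of amino acids.
    """
    totals = defaultdict(float)
    for model_results in per_model_results:
        for entry in model_results:
            totals[entry["token_str"]] += entry["score"]
    n_models = len(per_model_results)
    return {aa: total / n_models for aa, total in totals.items()}


def log_ratio(scores, wild_type, mutant):
    """Return log(p_mutant / p_wild_type).

    A negative value means the mutant residue is scored as less likely
    than the wild-type residue at the masked position.
    """
    return math.log(scores[mutant] / scores[wild_type])


# Made-up scores from two hypothetical models for one masked position.
model_a = [{"token_str": "L", "score": 0.10}, {"token_str": "S", "score": 0.08}]
model_b = [{"token_str": "L", "score": 0.12}, {"token_str": "S", "score": 0.06}]

avg = average_scores([model_a, model_b])
print(avg)
print(log_ratio(avg, wild_type="L", mutant="S"))
```

In practice the combined endpoint performs this averaging for you; the sketch is only meant to make the arithmetic explicit.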

Making Requests#

cURL:

curl --location 'https://biolm.ai/api/v2/esm1v-n1/predict/' \
   --header "Authorization: Token $BIOLMAI_TOKEN" \
   --header 'Content-Type: application/json' \
   --data '{
    "items": [
        {
            "sequence": "QERLEUTGR<mask>SLYNIVAT"
        }
    ]
}'

Python (requests):

import json
import os

import requests

url = "https://biolm.ai/api/v2/esm1v-n1/predict/"

# Request body: one dictionary per input sequence
payload = json.dumps({
    "items": [
        {
            "sequence": "QERLEUTGR<mask>SLYNIVAT"
        }
    ]
})
# The API token is read from the BIOLMAI_TOKEN environment variable
headers = {
    'Authorization': 'Token {}'.format(os.environ['BIOLMAI_TOKEN']),
    'Content-Type': 'application/json'
}

response = requests.request("POST", url, headers=headers, data=payload)

print(response.text)

Python (biolmai SDK):

import biolmai
seqs = ["QERLEUTGR<mask>SLYNIVAT"]

cls = biolmai.ESM1v1()
resp = cls.Predict(seqs)

R (RCurl):

library(RCurl)
headers = c(
    'Authorization' = paste('Token', Sys.getenv('BIOLMAI_TOKEN')),
    "Content-Type" = "application/json"
)
payload = "{
    \"items\": [
        {
            \"sequence\": \"QERLEUTGR<mask>SLYNIVAT\"
        }
    ]
}"
res <- postForm("https://biolm.ai/api/v2/esm1v-n1/predict/", .opts=list(postfields = payload, httpheader = headers, followlocation = TRUE), style = "httppost")
cat(res)

JSON Response#

{
   "results": [
       [
           {
               "token": 4,
               "token_str": "L",
               "score": 0.10017549991607666,
               "sequence": "Q E R L E U T G R L S L Y N I V A T"
           },
           {
               "token": 8,
               "token_str": "S",
               "score": 0.07921414822340012,
               "sequence": "Q E R L E U T G R S S L Y N I V A T"
           },
           {
               "token": 10,
               "token_str": "R",
               "score": 0.0782080590724945,
               "sequence": "Q E R L E U T G R R S L Y N I V A T"
           },

Note

The above response is only a small snippet of the full JSON response. Each of these dictionaries corresponds to one of the acceptable amino acids.
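To read a response like the snippet above, it can help to index the predictions by amino acid. The following is a minimal sketch: the helper name `scores_by_residue` is hypothetical, and the `response` dictionary below is hand-built from the three entries shown in the snippet rather than taken from a live API call.

```python
def scores_by_residue(response, item_index=0):
    """Map each predicted amino acid to its score for one input item.

    response is the parsed JSON body; item_index selects which input
    sequence's predictions to read from the "results" array.
    """
    return {
        entry["token_str"]: entry["score"]
        for entry in response["results"][item_index]
    }


# Hand-built from the first three entries of the example response.
response = {
    "results": [
        [
            {"token": 4, "token_str": "L", "score": 0.10017549991607666,
             "sequence": "Q E R L E U T G R L S L Y N I V A T"},
            {"token": 8, "token_str": "S", "score": 0.07921414822340012,
             "sequence": "Q E R L E U T G R S S L Y N I V A T"},
            {"token": 10, "token_str": "R", "score": 0.0782080590724945,
             "sequence": "Q E R L E U T G R R S L Y N I V A T"},
        ]
    ]
}

scores = scores_by_residue(response)
best = max(scores, key=scores.get)  # amino acid with the highest score
print(best, scores[best])
```

With a real response, `scores` would contain one entry per amino acid returned by the API, so the score of the wild-type residue can be looked up directly by its one-letter code.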

Request Definitions#

items:

A list of dictionaries, with each dictionary corresponding to one model input.

sequence:

The input sequence for the model. In the examples above, the position to be scored is replaced with a <mask> token.
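A request body in this shape can be assembled programmatically. The sketch below is illustrative only: `mask_position` and `build_payload` are hypothetical helper names, and the unmasked sequence is inferred from the example request by filling the mask with "L", the top prediction in the example response.

```python
import json

MASK = "<mask>"


def mask_position(sequence, index):
    """Replace the residue at a 0-based index with the <mask> token."""
    if not 0 <= index < len(sequence):
        raise IndexError("index is outside the sequence")
    return sequence[:index] + MASK + sequence[index + 1:]


def build_payload(masked_sequences):
    """Serialize masked sequences into the {"items": [...]} request body."""
    return json.dumps({"items": [{"sequence": s} for s in masked_sequences]})


masked = mask_position("QERLEUTGRLSLYNIVAT", 9)
payload = build_payload([masked])
print(masked)
print(payload)
```

The resulting `payload` string can be sent as the body of the POST request shown in the Making Requests section.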

Response Definitions#

results:

This is the main key in the JSON object that contains an array of model results. Each element in the array represents a set of predictions for one input instance.

score:

The likelihood that the model assigns to the predicted token at the masked position. A higher score indicates higher confidence.

token:

The predicted token’s identifier as per the model’s tokenization scheme. It’s an integer that corresponds to a particular token (in this case, a particular amino acid) in the model’s vocabulary.

token_str:

The predicted token as a string, that is, the amino acid that was predicted to fill in the masked position in the sequence.

sequence:

Represents the complete sequence with the masked position filled in by the predicted token.
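The relationship between these fields can be checked with a short sketch. It assumes, based on the example response, that the returned sequence is space-separated and otherwise identical to the input with the mask filled in; `fill_mask` and `entry_matches_input` are hypothetical helper names.

```python
def fill_mask(masked_sequence, token_str, mask="<mask>"):
    """Fill the first mask token in the input with the predicted residue."""
    return masked_sequence.replace(mask, token_str, 1)


def entry_matches_input(masked_sequence, entry):
    """Check that an entry's sequence equals the input with the mask filled.

    Assumes the returned sequence separates residues with single spaces,
    as in the example response.
    """
    returned = entry["sequence"].replace(" ", "")
    return returned == fill_mask(masked_sequence, entry["token_str"])


# Entry copied from the example response.
entry = {
    "token": 8,
    "token_str": "S",
    "score": 0.07921414822340012,
    "sequence": "Q E R L E U T G R S S L Y N I V A T",
}
print(entry_matches_input("QERLEUTGR<mask>SLYNIVAT", entry))
```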