Batching and Input Flexibility

The BioLM Python client accepts several input formats and batching strategies. This document describes each supported input type, explains how auto-batching works, and shows how to batch manually for custom workflows.

Supported Input Formats

You can provide input in several ways:

1. Single item (string or dict):
  • For a single sequence or context.

  • Example:

    biolm(entity="esm2-8m", action="encode", type="sequence", items="MSILVTRPSPAGEEL")
    
2. List of values (strings, numbers, etc):
  • For a batch of simple items (e.g. sequences). Pass a type so the client knows how to interpret the values.

  • Example:

    biolm(entity="esm2-8m", action="encode", type="sequence", items=["SEQ1", "SEQ2"])
    
3. List of dicts:
  • For a batch of structured items. Type is inferred from the dict keys.

  • Example:

    biolm(entity="esmfold", action="predict", items=[{"sequence": "SEQ1"}, {"sequence": "SEQ2"}])
    
4. Generators and iterators (memory-efficient):
  • Pass a generator or any iterable instead of a list. The client consumes it batch-by-batch, so you never hold all items in memory at once.

  • Ideal for large files, streams, or lazy data pipelines.

  • Note: The generator is fully consumed during the call; you cannot iterate it again afterwards.

  • Example:

    def sequences_from_file(path):
        with open(path) as f:
            for line in f:
                seq = line.strip()
                if seq:
                    yield {"sequence": seq}
    
    result = biolm(entity="esm2-8m", action="encode", items=sequences_from_file("sequences.txt"))
    
5. List of lists of dicts (advanced/manual batching):
  • Each inner list is treated as a batch and sent as a single API request.

  • Useful for custom batching, controlling batch size, or mixing valid/invalid items.

  • Example:

    batches = [
        [{"sequence": "SEQ1"}, {"sequence": "SEQ2"}],  # batch 1
        [{"sequence": "SEQ3"}],                        # batch 2
    ]
    biolm(entity="esmfold", action="predict", items=batches)
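
To illustrate what batch-by-batch consumption of a generator (format 4) looks like, here is a minimal sketch using only the standard library. `iter_batches` and the batch size of 2 are hypothetical names and values chosen for illustration; the client's internal implementation may differ.

```python
from itertools import islice


def iter_batches(items, batch_size):
    """Yield lists of at most batch_size items from any iterable, lazily."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def sequences():
    # Stand-in for a generator that reads sequences from a file or stream.
    for seq in ["SEQ1", "SEQ2", "SEQ3"]:
        yield {"sequence": seq}


gen = sequences()
# Collected into a list only for demonstration; a real consumer would
# process each batch as it arrives instead of keeping them all.
batches = list(iter_batches(gen, batch_size=2))
# The generator is now exhausted, so iterating it again yields nothing.
leftover = list(gen)
```

Only one batch needs to be in memory at a time, which is why a generator suits large files, and why it cannot be reused after the call.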
    

How auto-batching works

The client asks the API for the model’s maximum batch size, splits your input into batches of that size, and sends each batch as a separate request. Results come back in the same order as your input. You don’t need to split manually.

Example:

# If the model's max batch size is 8, this will be split into 2 requests:
items = ["SEQ" + str(i) for i in range(12)]
result = biolm(entity="esm2-8m", action="encode", type="sequence", items=items)
# result is a list of 12 results, in order
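
The splitting step can be sketched in plain Python. This is an illustration of the idea, not the client's code: `auto_batch` and `fake_send_batch` are hypothetical, with `fake_send_batch` standing in for a single API request.

```python
def auto_batch(items, max_batch, send_batch):
    """Split items into chunks of at most max_batch and collect results in order."""
    results = []
    for start in range(0, len(items), max_batch):
        chunk = items[start:start + max_batch]
        results.extend(send_batch(chunk))  # one request per chunk
    return results


calls = []


def fake_send_batch(chunk):
    # Hypothetical stand-in for one API request; records the batch size.
    calls.append(len(chunk))
    return [{"input": item} for item in chunk]


items = ["SEQ" + str(i) for i in range(12)]
results = auto_batch(items, max_batch=8, send_batch=fake_send_batch)
```

With 12 items and a maximum batch size of 8, `calls` records one request of 8 items and one of 4, and `results` keeps the input order.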

Advanced: Manual Batching with List of Lists

  • If you provide a list of lists of dicts, each inner list is treated as a batch.

  • This disables auto-batching: you control the batch size and composition.

  • Useful for:
    • Forcing certain items to be batched together (e.g., for error isolation).

    • Working around API limits or bugs.

    • Testing error handling with mixed valid/invalid batches.

Example:

# Two batches: first has 2 items, second has 1
items = [
    [{"sequence": "SEQ1"}, {"sequence": "BADSEQ"}],  # batch 1
    [{"sequence": "SEQ3"}],                          # batch 2
]
result = biolm(entity="esmfold", action="predict", items=items, stop_on_error=False)
# result is a flat list, one entry per item in input order: [result1, result2, result3]
# (entries for items in a failed batch may be error dicts)
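
How per-batch results might flatten into one list can be sketched as follows. Everything here is an assumption made for illustration: `run_manual_batches` and `fake_send_batch` are hypothetical, the stub fails a whole batch when any item is invalid, and the `{"error": ...}` shape is a placeholder rather than the client's actual error format.

```python
def run_manual_batches(batches, send_batch, stop_on_error=False):
    """Send each inner list as one request and flatten the results."""
    flat = []
    for batch in batches:
        try:
            flat.extend(send_batch(batch))
        except ValueError as exc:
            if stop_on_error:
                raise
            # Assumed shape: one error dict per item in the failed batch.
            flat.extend({"error": str(exc)} for _ in batch)
    return flat


def fake_send_batch(batch):
    # Hypothetical request stub: the whole batch fails if any item is invalid.
    if any(item["sequence"] == "BADSEQ" for item in batch):
        raise ValueError("invalid sequence in batch")
    return [{"ok": item["sequence"]} for item in batch]


batches = [
    [{"sequence": "SEQ1"}, {"sequence": "BADSEQ"}],  # batch 1 fails as a unit
    [{"sequence": "SEQ3"}],                          # batch 2 succeeds
]
flat = run_manual_batches(batches, fake_send_batch)
```

The sketch shows why grouping matters: the valid item in batch 1 shares the fate of its invalid neighbour, while batch 2 is unaffected.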

Input validation

  • List of dicts: type is inferred from the keys.

  • List of plain values (e.g. strings): pass a type (e.g. sequence) so the client knows how to interpret them.

  • List of lists (manual batching): each inner list must be a list of dicts.
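
The three rules above can be expressed as a small pre-flight check. This is a sketch under assumptions, not the client's validator: `check_items`, its `item_type` parameter, and the error messages are hypothetical.

```python
def check_items(items, item_type=None):
    """Return a label for the input shape, or raise ValueError if a rule is broken."""
    if not isinstance(items, list) or not items:
        raise ValueError("expected a non-empty list")
    if all(isinstance(x, dict) for x in items):
        return "list of dicts"  # type is inferred from the keys
    if all(isinstance(x, list) for x in items):
        if all(isinstance(y, dict) for inner in items for y in inner):
            return "list of lists"  # manual batching
        raise ValueError("each inner list must be a list of dicts")
    if any(isinstance(x, (dict, list)) for x in items):
        raise ValueError("mixed item shapes")
    if item_type is None:
        raise ValueError("plain values need a type, e.g. 'sequence'")
    return "list of values"
```

Running a check like this before a large job catches shape mistakes locally, before any request is sent.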

Sequence validity

Protein sequences must use only valid amino acid letters. The client accepts the 20 standard amino acid letters plus extended codes such as B, X, Z, U, and O (e.g. ACDEFGHIKLMNPQRSTVWYBXZUO).
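
A quick local check against that alphabet might look like the sketch below. `VALID_LETTERS` and `invalid_letters` are hypothetical names, and the check assumes uppercase input; whether the client accepts lowercase letters is not stated here.

```python
# Alphabet copied from the example set above.
VALID_LETTERS = set("ACDEFGHIKLMNPQRSTVWYBXZUO")


def invalid_letters(sequence):
    """Return the sorted list of characters that are not in the accepted alphabet."""
    return sorted(set(sequence) - VALID_LETTERS)


ok = invalid_letters("MSILVTRPSPAGEEL")  # every letter is in the alphabet
bad = invalid_letters("MSIL1J")          # "1" and "J" are not
```

An empty list means the sequence passes; otherwise the list names the offending characters.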

Batch size and schema

You can read the maximum batch size from the schema:

from biolmai.core.http import BioLMApi
model = BioLMApi("esm2-8m")
schema = model.schema("esm2-8m", "encode")
max_batch = model.extract_max_items(schema)
print("Max batch size:", max_batch)
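
Once the maximum batch size is known, the number of requests a job will need follows from a ceiling division. The helper below is a hypothetical convenience, not part of the client.

```python
import math


def request_count(n_items, max_batch):
    """Number of requests needed when each request carries at most max_batch items."""
    return math.ceil(n_items / max_batch)


needed = request_count(12, 8)  # 12 items, at most 8 per request
```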

Batching and errors

If a batch contains invalid items, the whole batch may fail. You can either halt on the first failed batch or process all batches and receive error dicts in the results. With the API client you can also retry failed batches as single items. See Error Handling for details and examples.
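
The idea of retrying a failed batch as single items can be sketched as follows. `send_with_single_item_retry` and `fake_send_batch` are hypothetical, the stub stands in for one API request, and the `{"error": ...}` shape is an assumption; see Error Handling for the client's actual behavior.

```python
def send_with_single_item_retry(batches, send_batch):
    """Send each batch; if it fails, resend its items one at a time."""
    results = []
    for batch in batches:
        try:
            results.extend(send_batch(batch))
        except ValueError:
            for item in batch:
                try:
                    results.extend(send_batch([item]))
                except ValueError as exc:
                    results.append({"error": str(exc)})  # assumed error shape
    return results


def fake_send_batch(batch):
    # Hypothetical request stub: the whole batch fails if any item is invalid.
    if any(item["sequence"] == "BADSEQ" for item in batch):
        raise ValueError("invalid sequence in batch")
    return [{"ok": item["sequence"]} for item in batch]


batches = [
    [{"sequence": "SEQ1"}, {"sequence": "BADSEQ"}],
    [{"sequence": "SEQ3"}],
]
results = send_with_single_item_retry(batches, fake_send_batch)
```

In this sketch the valid item that shared a batch with `BADSEQ` is recovered on retry, and only the invalid item ends up as an error entry.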

Summary Table

Input Format              Auto-batching?              Use Case
------------------------  --------------------------  -------------------------
Single value/dict         Yes                         Single item
List of values            Yes (pass type)             Batch of simple items
List of dicts             Yes                         Batch of structured items
Generator/iterator        Yes (consumed in batches)   Large streams, low memory
List of lists of dicts    No (manual batching)        Custom batch control

Examples

Batching with list of dicts:

from biolmai import biolm

items = [{"sequence": "SEQ1"}, {"sequence": "SEQ2"}]
result = biolm(entity="esm2-8m", action="encode", items=items)

Batching with list of values:

items = ["SEQ1", "SEQ2"]
result = biolm(entity="esm2-8m", action="encode", type="sequence", items=items)

Manual batching with list of lists:

batches = [
    [{"sequence": "SEQ1"}, {"sequence": "BADSEQ"}],  # batch 1
    [{"sequence": "SEQ3"}],                          # batch 2
]
result = biolm(entity="esmfold", action="predict", items=batches, stop_on_error=False)

Best practices

  • Prefer a list of values or dicts and let the client auto-batch.

  • For large datasets (files, streams), use a generator so items are consumed batch-by-batch.

  • For very large result sets, write to disk (see Disk output in Usage).

  • Use manual batching (list of lists) only when you need custom batch sizes or composition.
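
As an example of the generator recommendation, the sketch below reads a FASTA file lazily and yields one dict per record. `fasta_records` is a hypothetical helper, and the parsing is deliberately minimal (header lines start with `>`, sequence lines are concatenated); it is not a full FASTA parser.

```python
import os
import tempfile


def fasta_records(path):
    """Lazily yield {"sequence": ...} dicts from a FASTA file, one per record."""
    parts = []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if parts:
                    yield {"sequence": "".join(parts)}
                parts = []
            elif line:
                parts.append(line)
    if parts:
        yield {"sequence": "".join(parts)}


# Small demonstration file with two records.
with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as tmp:
    tmp.write(">a\nMSIL\nVTRP\n>b\nSPAG\n")
records = list(fasta_records(tmp.name))
os.remove(tmp.name)
```

In real use the generator would be passed directly as `items=fasta_records(path)` instead of being collected into a list, so that records are read only as batches are sent.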

See Also