NatureLM-audio

Paper · Code · Model · BEANS-Zero

Updates

2025-05-27: We've updated NatureLM-audio with a flexible merge between the original Llama 3.1 8B weights and the LoRA fine-tuned weights. Merging in the original weights improves prompt flexibility at the cost of some performance on bioacoustic tasks. See the Usage page and the paper for details.
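The update note does not say how the merge is parameterized. One common way to blend a base model with LoRA weights is to scale the low-rank update before adding it: W = W0 + λ·(α/r)·(B·A), where λ = 0 recovers the original Llama weights and λ = 1 gives the fully fine-tuned ones. The sketch below assumes that scheme for a single weight matrix. `merge_lora` and `merge_scale` are hypothetical names for illustration, not the NatureLM-audio API.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product a @ b."""
    inner = len(b)  # shared dimension: columns of a == rows of b
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def merge_lora(
    w0: Matrix,
    lora_a: Matrix,
    lora_b: Matrix,
    alpha: float,
    rank: int,
    merge_scale: float,
) -> Matrix:
    """Hypothetical merge: W0 + merge_scale * (alpha / rank) * (B @ A).

    Shapes: w0 is (d_out, d_in), lora_b is (d_out, rank), lora_a is (rank, d_in).
    merge_scale = 0.0 returns the base weights; 1.0 applies the full LoRA update.
    """
    if not 0.0 <= merge_scale <= 1.0:
        raise ValueError("merge_scale must be in [0, 1]")
    delta = matmul(lora_b, lora_a)  # low-rank update, same shape as w0
    scale = merge_scale * alpha / rank
    return [
        [w + scale * d for w, d in zip(w_row, d_row)]
        for w_row, d_row in zip(w0, delta)
    ]


if __name__ == "__main__":
    w0 = [[1.0, 0.0], [0.0, 1.0]]
    a = [[0.5, 0.5]]    # rank-1 A: (1, 2)
    b = [[1.0], [2.0]]  # rank-1 B: (2, 1)
    for lam in (0.0, 0.5, 1.0):
        print(lam, merge_lora(w0, a, b, alpha=2.0, rank=1, merge_scale=lam))
```

Intermediate values of `merge_scale` trade some of the fine-tuned behavior for the base model's broader instruction following, which matches the trade-off described in the update note.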

Overview

NatureLM-audio is the first audio-language foundation model designed specifically for bioacoustics. It combines a fine-tuned audio encoder (BEATs) with a large language model (Llama 3.1 8B Instruct), enabling researchers to query bioacoustic data using natural language.

Key capabilities:

Flexible task support: species classification, detection, call-type and life-stage classification, audio captioning, and individual counting
Zero-shot generalization: trained on bioacoustic, speech, and music data, the model transfers acoustic knowledge to unseen species and taxa
Real-world scale: designed for the large, diverse, and sparsely labeled datasets typical of conservation fieldwork
Accessible by design: no task-specific fine-tuning required; researchers interact with the model through plain-English prompts