NatureLM-audio
Updates
Overview
NatureLM-audio is the first audio-language foundation model designed specifically for bioacoustics. It combines a fine-tuned audio encoder (BEATs) with a large language model (Llama 3.1 8B Instruct), enabling researchers to query bioacoustic data in natural language.
Key capabilities:
Flexible task support: species classification, detection, call type and life stage classification, audio captioning, and individual counting
Zero-shot generalization: trained across bioacoustics, speech, and music, the model transfers acoustic knowledge to unseen species and taxa
Real-world scale: designed for large, diverse, and sparsely labeled datasets typical of conservation fieldwork
Accessible by design: no task-specific fine-tuning required; researchers interact with the model through plain-English prompts