NatureLM-audio

Updates

2025-05-27: We've updated NatureLM-audio with a flexible merge of the original Llama 3.1 8B weights and the LoRA fine-tuned weights. Merging with the original weights improves prompt flexibility at the cost of some bioacoustic task performance. See the Usage page and paper for details.
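
To illustrate the trade-off, here is a minimal sketch of one common way to merge two sets of weights: linear interpolation controlled by a single coefficient. This is an assumption made for illustration; the exact merge scheme used for NatureLM-audio is described in the paper, not here. The function name `merge_weights`, the `alpha` parameter, and the toy parameter dictionaries are all hypothetical.

```python
from typing import Dict, List

# Toy stand-in for a model state dict: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def merge_weights(base: Weights, finetuned: Weights, alpha: float) -> Weights:
    """Linearly interpolate between base and fine-tuned parameters.

    alpha = 0.0 returns the base weights, alpha = 1.0 the fine-tuned weights.
    (Hypothetical helper; not the NatureLM-audio API.)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if base.keys() != finetuned.keys():
        raise ValueError("parameter names do not match")

    merged: Weights = {}
    for name, base_values in base.items():
        tuned_values = finetuned[name]
        if len(base_values) != len(tuned_values):
            raise ValueError(f"shape mismatch for {name}")
        # Each merged value is a weighted average of the two sources.
        merged[name] = [
            (1.0 - alpha) * b + alpha * t
            for b, t in zip(base_values, tuned_values)
        ]
    return merged


# Toy example with a single three-element parameter.
base = {"layer.weight": [0.0, 1.0, 2.0]}
finetuned = {"layer.weight": [1.0, 1.0, 4.0]}
halfway = merge_weights(base, finetuned, alpha=0.5)
```

Under this assumed scheme, a lower `alpha` keeps the model closer to the original language model (more flexible prompting), while a higher `alpha` keeps it closer to the fine-tuned weights (stronger task performance).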

Overview

[Figure: NatureLM-audio architecture diagram]

NatureLM-audio is the first audio-language foundation model designed specifically for bioacoustics. It combines a fine-tuned audio encoder (BEATs) with a large language model (Llama 3.1 8B Instruct), enabling researchers to query bioacoustic data using natural language.

Key capabilities:

  • Flexible task support: species classification, detection, call-type and life-stage classification, audio captioning, and individual counting

  • Zero-shot generalization: trained across bioacoustics, speech, and music, the model transfers acoustic knowledge to unseen species and taxa

  • Real-world scale: designed for large, diverse, and sparsely labeled datasets typical of conservation fieldwork

  • Accessible by design: no task-specific fine-tuning required; researchers can interact with the model through plain-English prompts
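
Because every task is driven by a text prompt, switching tasks amounts to changing the question asked about the same audio. The sketch below shows one way to organize such prompts in Python. The prompt wordings, the `PROMPT_TEMPLATES` table, and the `build_prompt` helper are illustrative assumptions, not the prompts the model was trained on or part of its API; see the Usage page for the recommended phrasing.

```python
# Hypothetical prompt templates, one per task family listed above.
# The wording is illustrative only.
PROMPT_TEMPLATES = {
    "species_classification": "What is the common name of the focal species in this audio?",
    "detection": "Is there a {target} vocalizing in this audio? Answer yes or no.",
    "call_type": "What type of call is the {target} making in this audio?",
    "life_stage": "What is the life stage of the {target} in this audio?",
    "captioning": "Describe the sounds in this audio.",
    "counting": "How many individuals are vocalizing in this audio?",
}


def build_prompt(task: str, **fields: str) -> str:
    """Fill in the template for `task` with any task-specific fields."""
    try:
        template = PROMPT_TEMPLATES[task]
    except KeyError:
        raise ValueError(f"unknown task: {task!r}") from None
    return template.format(**fields)


detection_prompt = build_prompt("detection", target="barn owl")
caption_prompt = build_prompt("captioning")
```

The resulting strings would be passed to the model alongside the audio clip; how the audio and prompt are actually submitted depends on the inference interface documented on the Usage page.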