Quick Start Guide

This guide applies to NatureLM-audio v1.1, available through the Interactive Demo on Hugging Face Spaces.

Below are sample prompts to try with the model and a few practical tips. See the Prompting Guide for the full task reference, prompt variants, and advanced configuration.

Tips

Trim clips to under 10 seconds. The Interactive Demo processes only the first 10 seconds of audio. If your recording is longer, trim it with the scissors icon in the bottom right of the audio player.
Use a shortlist when you can. Providing a list of candidate species improves accuracy — even a rough shortlist based on location or habitat helps.
For Yes/No questions, always include "Answer Yes or No." Without this, the model may respond with species names rather than a yes or no answer.
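
If you prepare clips and prompts in a script before using the demo, the tips above can be sketched as follows. This is an illustrative sketch only: `trim_samples` and `yes_no_prompt` are hypothetical helper names, not part of NatureLM-audio, and the clip is assumed to be a plain sequence of mono samples at a known sample rate (16 kHz here is an assumption, not a documented requirement).

```python
from typing import Sequence

MAX_SECONDS = 10.0  # the Interactive Demo processes only the first 10 seconds
YES_NO_SUFFIX = "Answer Yes or No."


def trim_samples(samples: Sequence[float], sample_rate: int,
                 max_seconds: float = MAX_SECONDS) -> Sequence[float]:
    """Keep only the first `max_seconds` of a mono clip (hypothetical helper)."""
    if sample_rate <= 0:
        raise ValueError("sample_rate must be positive")
    max_len = int(max_seconds * sample_rate)
    return samples[:max_len]


def yes_no_prompt(question: str) -> str:
    """Append the 'Answer Yes or No.' instruction unless it is already there."""
    question = question.strip()
    if question.lower().endswith(YES_NO_SUFFIX.lower()):
        return question
    return f"{question} {YES_NO_SUFFIX}"


# Example: 12 seconds of silence at an assumed 16 kHz sample rate.
clip = [0.0] * (16_000 * 12)
trimmed = trim_samples(clip, 16_000)
prompt = yes_no_prompt("Is there a bird vocalizing in this recording?")
```

Here `trimmed` holds the first 10 seconds of the clip, and `prompt` ends with the instruction that keeps the model from answering with species names.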

Core Tasks

Species Detection	What are the common names for the species in the audio, if any?
Species Identification	What species is vocalizing in this audio recording? Common name?
	What is the scientific name of the focal species in the audio?
	Which of these is the focal species in the audio? Options: American Robin, Song Sparrow, House Finch, Black-capped Chickadee
	List the scientific names of all species vocalizing in this audio clip.
	Given the context: 'country: US, recorded in temperate forest, June', what is the common name for the focal species in the audio?
Taxonomy	What is the genus of the focal species in the audio?
	What is the family of the focal species in the audio?
	What is the taxonomic name of the focal species in the audio?
Call Type & Behavior	What type of vocalization or call is this?
	Is this a call or a song?
	Is an alarm call present in this recording? Answer Yes or No.
	Is a flight call present in this recording? Answer Yes or No.
Life Stage	Is the focal species an adult or juvenile?
Presence / Absence	Is there a bird vocalizing in this recording? Answer Yes or No.
	Does this recording contain mammal vocalizations? Answer Yes or No.
	Are there any animal vocalizations in this recording? Answer Yes or No.
Captioning	Caption the audio, using common names for any animal species.
	Caption this audio with a rich, detailed description. Avoid specific species names.
Environmental Sounds	Which of these non-animal sounds are present in the recording? rain, wind, traffic, running water. Answer with a comma-separated list, or None.

If you have a candidate list for species ID, use the multiple-choice form: its accuracy (~91%) is substantially higher than that of open-ended identification (~77%).
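
As a minimal sketch, a multiple-choice prompt can be assembled from a shortlist like this. The function name `multiple_choice_prompt` is hypothetical, and the de-duplication and the two-candidate minimum are assumptions of this example rather than requirements of the model; the prompt wording follows the Species Identification sample above.

```python
from typing import Iterable


def multiple_choice_prompt(candidates: Iterable[str]) -> str:
    """Build the multiple-choice species prompt from a candidate shortlist."""
    seen = set()
    options = []
    for name in candidates:
        name = name.strip()
        # Skip blanks and case-insensitive duplicates, keeping the caller's order.
        if name and name.lower() not in seen:
            seen.add(name.lower())
            options.append(name)
    if len(options) < 2:
        # Assumption of this sketch: a single option is not a real choice.
        raise ValueError("provide at least two candidate species")
    return ("Which of these is the focal species in the audio? "
            "Options: " + ", ".join(options))


shortlist = ["American Robin", "Song Sparrow", "House Finch",
             "Black-capped Chickadee"]
mc_prompt = multiple_choice_prompt(shortlist)
```

With the shortlist shown, `mc_prompt` reproduces the multiple-choice sample prompt from the table.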

Experimental Tasks

Some tasks are experimental, and their results should be treated as exploratory.

For sample prompts and details on experimental tasks, see the Prompting Guide.

Limitations

NatureLM-audio currently performs best on birds, particularly North American and Western European species. It handles other taxa too, but with lower reliability. A few specific limitations are worth noting:

High-frequency calls above ~8 kHz (e.g. bats) are outside the model’s frequency range, so the model will not return meaningful results for them
Tropical regions (Neotropics, Southeast Asia) are harder due to data availability and species richness
The model won’t refuse a wrong-taxon prompt — if you ask “what whale is this?” on a bird recording, it will identify the bird anyway
It can’t interpret animal emotions or translate vocalizations beyond predicting call type
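
Because the model will not refuse a wrong-taxon prompt, one possible guard is to ask a Presence / Absence question first and only send the taxon-specific question when the answer is Yes. The sketch below is an assumption-laden illustration, not a documented workflow: `ask` stands for any hypothetical callable that sends one prompt about a clip and returns the model's text reply, and the presence check is itself subject to the model's accuracy.

```python
from typing import Callable, Optional

# Hypothetical: sends one prompt about a fixed clip and returns the reply text.
Ask = Callable[[str], str]


def parse_yes_no(reply: str) -> Optional[bool]:
    """Map a reply to True/False, or None if it is neither Yes nor No."""
    word = reply.strip().strip(".!").lower()
    if word == "yes":
        return True
    if word == "no":
        return False
    return None


def ask_if_present(ask: Ask, presence_question: str,
                   follow_up: str) -> Optional[str]:
    """Send `follow_up` only when the presence check is answered with Yes."""
    if parse_yes_no(ask(presence_question)) is not True:
        return None  # taxon absent or reply unparseable: skip the follow-up
    return ask(follow_up)
```

For example, `presence_question` could be "Does this recording contain mammal vocalizations? Answer Yes or No." before any mammal-specific follow-up. A `None` result means the follow-up was never sent.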