Quick Start Guide
This guide applies to NatureLM-audio v1.1, available through the Interactive Demo on Hugging Face Spaces.
Below are sample prompts to try with the model and a few practical tips. See the Prompting Guide for the full task reference, prompt variants, and advanced configuration.
Tips
- Trim clips to 10 seconds or less. The Interactive Demo processes only the first 10 seconds of audio. If your recording is longer, trim it with the scissors icon at the bottom right of the audio player.
- Use a shortlist when you can. Providing a list of candidate species improves accuracy — even a rough shortlist based on location or habitat helps.
- For Yes/No questions, always include "Answer Yes or No." Without this, the model may respond with species names rather than a yes or no answer.
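If you would rather trim recordings before uploading them, the 10-second limit can be applied offline. The following is a minimal sketch using only Python's standard `wave` module. It assumes an uncompressed PCM WAV file; the function name `trim_wav` and the file paths are hypothetical and not part of NatureLM-audio or the demo.

```python
import wave

MAX_SECONDS = 10.0  # the Interactive Demo processes only the first 10 seconds


def trim_wav(src_path, dst_path, max_seconds=MAX_SECONDS):
    """Copy at most the first `max_seconds` of a PCM WAV file to `dst_path`.

    Returns the duration, in seconds, of the clip that was written.
    (Hypothetical helper for illustration; not part of the model's tooling.)
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frame_rate = src.getframerate()
        # Keep whichever is shorter: the whole file or the first max_seconds.
        n_keep = min(src.getnframes(), int(max_seconds * frame_rate))
        frames = src.readframes(n_keep)

    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)    # same channels, sample width, and rate
        dst.writeframes(frames)  # the header frame count is fixed up on close

    return n_keep / frame_rate


# Example call (paths are placeholders):
# trim_wav("recording.wav", "recording_10s.wav")
```

Compressed formats such as MP3 or FLAC would need a different library; for those, the trim tool in the audio player is the simpler route.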
Core Tasks
- Species Detection
- Species Identification
- Taxonomy
- Call Type & Behavior
- Life Stage
- Presence / Absence
- Captioning
- Environmental Sounds
If you have a candidate list for species ID, use the multiple-choice form: its accuracy (~91%) is substantially higher than that of open-ended identification (~77%).
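When you are preparing many prompts, the multiple-choice form and the "Answer Yes or No." suffix from the Tips section can be applied with small helpers. This is a sketch only: the function names are hypothetical, and the multiple-choice wording is illustrative rather than taken from the Prompting Guide, so substitute the guide's exact phrasing where it differs.

```python
YES_NO_SUFFIX = "Answer Yes or No."  # wording quoted in the Tips section


def multiple_choice_prompt(candidates):
    """Build a species-ID prompt restricted to a shortlist of candidates.

    The sentence template is an assumption made for illustration; check the
    Prompting Guide for the recommended wording.
    """
    names = [c.strip() for c in candidates if c.strip()]
    if len(names) < 2:
        raise ValueError("provide at least two candidate species")
    return "Which of these species is in the audio? Options: " + ", ".join(names) + "."


def yes_no_prompt(question):
    """Append 'Answer Yes or No.' unless the question already ends with it."""
    q = question.strip()
    if q.lower().endswith(YES_NO_SUFFIX.lower()):
        return q
    return f"{q} {YES_NO_SUFFIX}"


if __name__ == "__main__":
    print(multiple_choice_prompt(["American Robin", "Song Sparrow", "Northern Cardinal"]))
    print(yes_no_prompt("Is there a bird in this recording?"))
```

Even a rough shortlist based on location or habitat can be passed to `multiple_choice_prompt`; the helper only formats the list and does not validate species names.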
Experimental Tasks
Some tasks are experimental, and their results should be treated as exploratory. For sample prompts and details on these tasks, see the Prompting Guide.
Limitations
NatureLM-audio currently performs best on birds, particularly North American and Western European species. It handles other taxa too, but with lower reliability. A few specific limitations are worth noting:
- High-frequency calls above ~8 kHz (e.g. bats) are outside the model’s frequency range and will not return meaningful results.
- Tropical regions (Neotropics, Southeast Asia) are harder due to limited data availability and high species richness.
- The model won’t refuse a wrong-taxon prompt: if you ask “what whale is this?” on a bird recording, it will identify the bird anyway.
- It can’t interpret animal emotions or translate vocalizations beyond predicting call type.
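To catch recordings that probably fall foul of the ~8 kHz limit before submitting them, a rough pre-check is possible. The sketch below estimates frequency by counting zero crossings, which is only reasonable for clean, roughly tonal signals; noisy field recordings would need a proper spectral analysis. The function names are hypothetical, and the 8000 Hz threshold is the approximate figure quoted above, not a documented model parameter.

```python
import math

MODEL_LIMIT_HZ = 8000.0  # approximate limit from the Limitations list


def zero_crossing_frequency(samples, sample_rate):
    """Estimate the frequency of a roughly tonal signal from its sign changes.

    A pure tone at f Hz changes sign about 2 * f times per second, so
    f is approximated by crossings / (2 * duration).
    """
    crossings = 0
    prev_sign = 0
    for x in samples:
        sign = (x > 0) - (x < 0)
        if sign == 0:
            continue  # skip exact zeros so a crossing is not missed or doubled
        if prev_sign != 0 and sign != prev_sign:
            crossings += 1
        prev_sign = sign
    duration = len(samples) / sample_rate
    return crossings / (2.0 * duration) if duration > 0 else 0.0


def likely_out_of_range(samples, sample_rate, limit_hz=MODEL_LIMIT_HZ):
    """Return True if the estimated frequency exceeds the assumed limit."""
    return zero_crossing_frequency(samples, sample_rate) > limit_hz


def tone(freq_hz, sample_rate=44100, seconds=0.1):
    """Generate a synthetic sine tone, used here only to exercise the check."""
    n = int(sample_rate * seconds)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]


if __name__ == "__main__":
    for f in (2000.0, 10000.0):
        flagged = likely_out_of_range(tone(f), 44100)
        print(f"{f:.0f} Hz tone flagged as out of range: {flagged}")
```

A flagged clip is a hint, not a verdict: it is a prompt to inspect a spectrogram before trusting the model's answer.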