Quick Start Guide

This guide applies to NatureLM-audio v1.1, available through the Interactive Demo on Hugging Face Spaces.

Below are sample prompts to try with the model and a few practical tips. See the Prompting Guide for the full task reference, prompt variants, and advanced configuration.

Tips

  • Trim clips to 10 seconds or less. The Interactive Demo processes only the first 10 seconds of audio. If your recording is longer, trim it to the segment you care about with the scissors icon in the bottom right of the audio player.
  • Use a shortlist when you can. Providing a list of candidate species improves accuracy — even a rough shortlist based on location or habitat helps.
  • For Yes/No questions, always include "Answer Yes or No." Without this, the model may respond with species names rather than a yes or no answer.
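
If you prepare clips and prompts outside the demo, the first and last tips can be approximated in a few lines of Python. This is a minimal sketch, not part of NatureLM-audio: the helper names trim_wav and ensure_yes_no are hypothetical, and it assumes uncompressed PCM WAV input, which is all the standard-library wave module reads.

```python
import wave

MAX_SECONDS = 10  # the Interactive Demo only processes the first 10 seconds


def trim_wav(src_path, dst_path, max_seconds=MAX_SECONDS):
    """Copy at most the first `max_seconds` of a PCM WAV file to `dst_path`.

    Hypothetical helper. Returns the duration in seconds of the written file.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frame_rate = src.getframerate()
        # Never request more frames than the file actually contains.
        n_frames = min(src.getnframes(), int(max_seconds * frame_rate))
        frames = src.readframes(n_frames)

    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)    # same channels, sample width, and rate
        dst.writeframes(frames)  # the header frame count is corrected on close

    return n_frames / frame_rate


def ensure_yes_no(prompt):
    """Append "Answer Yes or No." unless the prompt already ends with it."""
    suffix = "Answer Yes or No."
    prompt = prompt.strip()
    if prompt.endswith(suffix):
        return prompt
    return f"{prompt} {suffix}"
```

Note that trim_wav always keeps the start of the file. If the vocalization of interest occurs later in the recording, cut that segment out first.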

Core Tasks

Species Detection
  • What are the common names for the species in the audio, if any?
Species Identification
  • What species is vocalizing in this audio recording? Common name?
  • What is the scientific name of the focal species in the audio?
  • Which of these is the focal species in the audio? Options: American Robin, Song Sparrow, House Finch, Black-capped Chickadee
  • List the scientific names of all species vocalizing in this audio clip.
  • Given the context: 'country: US, recorded in temperate forest, June', what is the common name for the focal species in the audio?
Taxonomy
  • What is the genus of the focal species in the audio?
  • What is the family of the focal species in the audio?
  • What is the taxonomic name of the focal species in the audio?
Call Type & Behavior
  • What type of vocalization or call is this?
  • Is this a call or a song?
  • Is an alarm call present in this recording? Answer Yes or No.
  • Is a flight call present in this recording? Answer Yes or No.
Life Stage
  • Is the focal species an adult or juvenile?
Presence / Absence
  • Is there a bird vocalizing in this recording? Answer Yes or No.
  • Does this recording contain mammal vocalizations? Answer Yes or No.
  • Are there any animal vocalizations in this recording? Answer Yes or No.
Captioning
  • Caption the audio, using common names for any animal species.
  • Caption this audio with a rich, detailed description. Avoid specific species names.
Environmental Sounds
  • Which of these non-animal sounds are present in the recording? rain, wind, traffic, running water. Answer with a comma-separated list, or None.

If you have a candidate list for species ID, use the multiple-choice form: its accuracy (~91%) is substantially higher than that of open-ended identification (~77%).
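
When the shortlist comes from a script (for example, a regional checklist), the multiple-choice prompt can be assembled programmatically. The sketch below is illustrative only: multiple_choice_prompt is a hypothetical helper, and it simply reuses the wording of the multiple-choice sample prompt under Species Identification.

```python
def multiple_choice_prompt(candidates):
    """Build a multiple-choice species prompt from a list of common names.

    Hypothetical helper; the wording mirrors the sample prompt in this guide.
    """
    # Drop blanks and case-insensitive duplicates, keeping the caller's order.
    seen = set()
    options = []
    for name in candidates:
        name = name.strip()
        if name and name.lower() not in seen:
            seen.add(name.lower())
            options.append(name)
    if len(options) < 2:
        raise ValueError("a multiple-choice prompt needs at least two candidates")
    return (
        "Which of these is the focal species in the audio? "
        "Options: " + ", ".join(options)
    )


prompt = multiple_choice_prompt(
    ["American Robin", "Song Sparrow", "House Finch", "Black-capped Chickadee"]
)
print(prompt)
```

The resulting string can be pasted into the demo's prompt box as-is.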

Experimental Tasks

Some tasks are experimental, and their results should be treated as exploratory. For sample prompts and details on these tasks, see the Prompting Guide.

Limitations

Today, NatureLM-audio performs best on birds, particularly North American and Western European species. It handles other taxa too, but with lower reliability. A few specific limitations are worth noting:

  • High-frequency calls above ~8 kHz (e.g. from bats) are outside the model’s frequency range and will not produce meaningful results.
  • Tropical regions (Neotropics, Southeast Asia) are harder because less data is available and species richness is higher.
  • The model won’t refuse a wrong-taxon prompt. If you ask “what whale is this?” on a bird recording, it will identify the bird anyway.
  • It can’t interpret animal emotions or translate vocalizations beyond predicting call type.