Supported Models

Official Models (ESP-AVES2)

The ESP-AVES2 model collection is available on HuggingFace: EarthSpeciesProject/esp-aves2

Available Models

| Model Name | Architecture | Training Data | HuggingFace |
|---|---|---|---|
| esp_aves2_sl_beats_all | BEATs | All (AudioSet + Bio) | Link |
| esp_aves2_sl_beats_bio | BEATs | Bioacoustics | Link |
| esp_aves2_naturelm_audio_v1_beats | BEATs + NatureLM | All | Link |
| esp_aves2_eat_all | EAT | All (AudioSet + Bio) | Link |
| esp_aves2_eat_bio | EAT | Bioacoustics | Link |
| esp_aves2_sl_eat_all_ssl_all | EAT (SSL) | All | Link |
| esp_aves2_sl_eat_bio_ssl_all | EAT (SSL) | Bioacoustics | Link |
| esp_aves2_effnetb0_all | EfficientNet-B0 | All (AudioSet + Bio) | Link |
| esp_aves2_effnetb0_bio | EfficientNet-B0 | Bioacoustics | Link |
| esp_aves2_effnetb0_audioset | EfficientNet-B0 | AudioSet | Link |
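
Any model name from the table above can be passed directly to load_model. Below is a minimal sketch; the top-level avex import path and the forward signature (batched waveform in, logits out) are assumptions, since this section only documents the load_model call itself.

```python
import torch

from avex import load_model  # assumed import path

# Load an official model by the name listed in the table above.
model = load_model("esp_aves2_effnetb0_all", device="cpu")

# 10 seconds of mono 16 kHz audio; random noise stands in for a real clip.
waveform = torch.randn(1, 16000 * 10)
logits = model(waveform)  # assumed signature: batched waveform -> class logits

# model.label_mapping gives human-readable class names; indexing it by
# class id is an assumption about its structure.
top_class = logits.argmax(dim=-1).item()
print(model.label_mapping[top_class])
```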

Supported Architectures

  • BEATs: Bidirectional Encoder representation from Audio Transformers
  • EAT: Efficient Audio Transformer models
  • EfficientNet: EfficientNet-based models adapted for audio classification
  • AVES: self-supervised audio encoder for bioacoustics
  • BirdMAE: masked autoencoder for bioacoustic representation learning

Labels vs Features Only

| Capability | Description |
|---|---|
| Classification with labels | The model has a trained classifier head and a class mapping (e.g. label_map.json). Use load_model("model_name", device="cpu") to get logits, and use model.label_mapping for human-readable class names. |
| Features / embeddings only | Any model can be loaded for embedding extraction by passing return_features_only=True. The model then returns feature tensors instead of classification logits. |
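
For feature extraction, the same loader is used with return_features_only=True. A sketch under the same import-path and forward-signature assumptions as above:

```python
import torch

from avex import load_model  # assumed import path

# Load a model as a feature extractor rather than a classifier.
model = load_model("esp_aves2_sl_beats_bio", device="cpu", return_features_only=True)

waveform = torch.randn(1, 16000 * 10)
features = model(waveform)  # feature tensor instead of classification logits
print(features.shape)
```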

How to see which models offer what

  • At runtime: call list_models(); the printed table has a "Trained Classifier" column (✅ = has checkpoint + class mapping, ❌ = backbone/features only), and the returned dict includes has_trained_classifier and num_classes per model.
  • Per model: call describe_model("model_name", verbose=True) to see "Has Trained Classifier", the checkpoint path, the class mapping path, and the number of classes (see the sketch below).
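
A short sketch of both discovery calls; the import path and the exact shape of the returned dict (one entry per model) are assumptions based on the description above:

```python
from avex import list_models, describe_model  # assumed import path

# Prints the overview table and returns per-model metadata.
models = list_models()
for name, info in models.items():
    print(name, info["has_trained_classifier"], info["num_classes"])

# Detailed report for one model: classifier status, checkpoint path,
# class mapping path, and number of classes.
describe_model("esp_aves2_effnetb0_all", verbose=True)
```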

All official ESP-AVES2 models have both a checkpoint and a class mapping, so they support classification with labels. They also support embedding extraction with load_model(..., return_features_only=True).

Model Configuration

Models are configured with YAML files that contain a model specification (model_spec). The official config files live in the avex/api/configs/official_models/ directory. These files define the model architecture, audio preprocessing parameters, and optional checkpoint and label-mapping paths.

Minimal Model Configuration:

```yaml
# Example: my_model.yml - Minimal configuration for model loading
model_spec:
  name: efficientnet
  pretrained: false
  device: cuda
  audio_config:
    sample_rate: 16000
    representation: mel_spectrogram
    n_mels: 128
  efficientnet_variant: b0
```

Full Model Configuration (with checkpoint):

```yaml
# Example: esp_aves2_effnetb0_all.yml - Complete configuration
# Optional: Default checkpoint path
checkpoint_path: hf://EarthSpeciesProject/esp-aves2-effnetb0-all/esp-aves2-effnetb0-all.safetensors

# Optional: Label mapping for human-readable predictions
class_mapping_path: hf://EarthSpeciesProject/esp-aves2-effnetb0-all/label_map.json

# Required: Model specification
model_spec:
  name: efficientnet
  pretrained: false
  device: cuda
  audio_config:
    sample_rate: 16000
    representation: mel_spectrogram
    n_mels: 128
    target_length_seconds: 10
  efficientnet_variant: b0
```

These configurations can be loaded directly with load_model("path/to/config.yml"). See the Custom Model Registration section for usage examples.
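
For example, to load the minimal configuration shown above (a sketch; the file name is illustrative and the import path is assumed):

```python
from avex import load_model  # assumed import path

# Point load_model at a YAML config file instead of a registered model name.
model = load_model("my_model.yml")
```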