Supported Models
Official Models (ESP-AVES2)
The ESP-AVES2 model collection is available on HuggingFace: EarthSpeciesProject/esp-aves2
Available Models
| Model Name | Architecture | Training Data | HuggingFace |
|---|---|---|---|
|  | BEATs | All (AudioSet + Bio) |  |
|  | BEATs | Bioacoustics |  |
|  | BEATs + NatureLM | All |  |
|  | EAT | All (AudioSet + Bio) |  |
|  | EAT | Bioacoustics |  |
|  | EAT (SSL) | All |  |
|  | EAT (SSL) | Bioacoustics |  |
|  | EfficientNet-B0 | All (AudioSet + Bio) |  |
|  | EfficientNet-B0 | Bioacoustics |  |
|  | EfficientNet-B0 | AudioSet |  |
Supported Architectures
BEATs: Bidirectional Encoder representation from Audio Transformers
EAT: Efficient Audio Transformer models
EfficientNet: EfficientNet-based models adapted for audio classification
AVES: AVES model for bioacoustics
BirdMAE: BirdMAE masked autoencoder for bioacoustic representation learning
Labels vs Features Only
| Capability | Description |
|---|---|
| Classification with labels | Model has a trained classifier head and a class mapping (e.g. |
| Features / embeddings only | Any model can be loaded for embedding extraction by passing |
How to see which models offer what
At runtime: Call list_models(). The printed table has a “Trained Classifier” column (✅ = has checkpoint + class mapping, ❌ = backbone/features only). The returned dict includes has_trained_classifier and num_classes per model.
Per model: Call describe_model("model_name", verbose=True) to see “Has Trained Classifier”, checkpoint path, class mapping path, and number of classes.
All official ESP-AVES2 models have both a checkpoint and a class mapping, so they support classification with labels. They also support embedding extraction with load_model(..., return_features_only=True).
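As an illustration of the two capabilities, here is a minimal sketch that sorts models by the has_trained_classifier field described above. The dict is hypothetical stand-in data with made-up model names and class counts. Its layout (model name mapped to a dict of fields) is an assumption about what list_models() returns, and the real dict may hold additional keys.

```python
from typing import Any


def split_by_capability(
    models: dict[str, dict[str, Any]],
) -> tuple[list[str], list[str]]:
    """Split model names into (classifier-capable, features-only) lists."""
    with_classifier: list[str] = []
    features_only: list[str] = []
    for name, info in sorted(models.items()):
        # A missing flag is treated as "no trained classifier".
        if info.get("has_trained_classifier", False):
            with_classifier.append(name)
        else:
            features_only.append(name)
    return with_classifier, features_only


# Hypothetical stand-in for the dict returned by list_models().
example_models = {
    "example_model_a": {"has_trained_classifier": True, "num_classes": 100},
    "example_model_b": {"has_trained_classifier": False, "num_classes": None},
}

labelled, embeddings_only = split_by_capability(example_models)
print("Classification with labels:", labelled)
print("Features / embeddings only:", embeddings_only)
```

Models in the second list would still be usable for embedding extraction, in line with the table above.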
Model Configuration
Models are configured with YAML files that contain the model specification (model_spec). The official config files are in the avex/api/configs/official_models/ directory. Each file defines the model architecture, the audio preprocessing parameters, and optional checkpoint and label mapping paths.
Minimal Model Configuration:
# Example: my_model.yml - Minimal configuration for model loading
model_spec:
  name: efficientnet
  pretrained: false
  device: cuda
  audio_config:
    sample_rate: 16000
    representation: mel_spectrogram
    n_mels: 128
  efficientnet_variant: b0
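
As a rough illustration of this structure, the sketch below checks an already-parsed config (a plain dict, as a YAML parser would produce) for the fields shown in the minimal example. The function name check_model_config and the choice of which keys to treat as required are assumptions made for illustration, as is the nesting of efficientnet_variant directly under model_spec. The library's own validation may differ.

```python
from typing import Any

# Keys taken from the minimal example; treating them as required is an assumption.
EXPECTED_SPEC_KEYS = ("name", "audio_config")
EXPECTED_AUDIO_KEYS = ("sample_rate", "representation")


def check_model_config(config: dict[str, Any]) -> list[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    spec = config.get("model_spec")
    if not isinstance(spec, dict):
        return ["missing 'model_spec' section"]

    problems: list[str] = []
    for key in EXPECTED_SPEC_KEYS:
        if key not in spec:
            problems.append(f"model_spec is missing '{key}'")

    audio = spec.get("audio_config")
    if audio is not None:
        if not isinstance(audio, dict):
            problems.append("audio_config must be a mapping")
        else:
            for key in EXPECTED_AUDIO_KEYS:
                if key not in audio:
                    problems.append(f"audio_config is missing '{key}'")
    return problems


# The minimal configuration, written as the dict a YAML parser would return.
minimal_config = {
    "model_spec": {
        "name": "efficientnet",
        "pretrained": False,
        "device": "cuda",
        "audio_config": {
            "sample_rate": 16000,
            "representation": "mel_spectrogram",
            "n_mels": 128,
        },
        "efficientnet_variant": "b0",
    }
}

print(check_model_config(minimal_config))
print(check_model_config({"checkpoint_path": "model.safetensors"}))
```

A check like this can catch an indentation slip (for example, audio_config landing at the top level) before the file is passed to load_model.
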
Full Model Configuration (with checkpoint):
# Example: esp_aves2_effnetb0_all.yml - Complete configuration

# Optional: Default checkpoint path
checkpoint_path: hf://EarthSpeciesProject/esp-aves2-effnetb0-all/esp-aves2-effnetb0-all.safetensors

# Optional: Label mapping for human-readable predictions
class_mapping_path: hf://EarthSpeciesProject/esp-aves2-effnetb0-all/label_map.json

# Required: Model specification
model_spec:
  name: efficientnet
  pretrained: false
  device: cuda
  audio_config:
    sample_rate: 16000
    representation: mel_spectrogram
    n_mels: 128
    target_length_seconds: 10
  efficientnet_variant: b0
These configurations can be loaded directly with load_model("path/to/config.yml"). See the Custom Model Registration section for usage examples.
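
The hf:// paths in the full configuration appear to follow an hf://<organization>/<repository>/<file> layout. The sketch below splits such a path into a repository ID and a file path using plain string handling. The helper name split_hf_path is hypothetical, the layout is inferred from the two example paths only, and how the library itself resolves these paths is not shown here.

```python
from typing import NamedTuple


class HfPath(NamedTuple):
    repo_id: str   # "<organization>/<repository>"
    filename: str  # path of the file inside the repository


def split_hf_path(path: str) -> HfPath:
    """Split 'hf://org/repo/file' into repo ID and filename (assumed layout)."""
    prefix = "hf://"
    if not path.startswith(prefix):
        raise ValueError(f"not an hf:// path: {path!r}")
    # maxsplit=2 keeps any further slashes inside the filename part.
    parts = path[len(prefix):].split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected hf://<org>/<repo>/<file>, got {path!r}")
    org, repo, filename = parts
    return HfPath(repo_id=f"{org}/{repo}", filename=filename)


checkpoint = split_hf_path(
    "hf://EarthSpeciesProject/esp-aves2-effnetb0-all/esp-aves2-effnetb0-all.safetensors"
)
print(checkpoint.repo_id)
print(checkpoint.filename)
```

The two parts correspond to the repo_id and filename arguments of huggingface_hub.hf_hub_download, which may be useful for fetching a checkpoint or label map manually.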