AVEX - Animal Vocalization Encoder Library Documentation

Getting Started

What is AVEX?

The Animal Vocalization Encoder (AVEX) library is an API for model loading and inference, and a Python-based system for training and evaluating bioacoustics representation learning models.

It provides:

  • Unified API for loading and using pre-trained audio models

  • Model Registry for managing model configurations

  • Plugin Architecture for custom model registration

  • Training and Evaluation tools for bioacoustics tasks

  • Embedding Extraction capabilities for downstream tasks

Quick Start

Basic Usage

from avex import list_models, load_model, describe_model

# List available models (prints a table and returns a dict with detailed info)
models = list_models()
print(f"Available models: {list(models.keys())}")

# Get detailed information about a model
describe_model("esp_aves2_naturelm_audio_v1_beats", verbose=True)
# Shows: model type, whether it has a trained classifier, number of classes, usage examples

# Load a pre-trained model with checkpoint (num_classes extracted automatically)
model = load_model("esp_aves2_sl_beats_all", device="cpu")

# For a new task, load a backbone and attach a probe head (classifier)
base = load_model("esp_aves2_naturelm_audio_v1_beats", return_features_only=True, device="cpu")

# Load for embedding extraction (returns unpooled features)
model = load_model("esp_aves2_naturelm_audio_v1_beats", return_features_only=True, device="cpu")
# Returns (batch, time_steps, 768) for BEATs instead of classification logits
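
As a quick sanity check, you can run a dummy waveform through the feature extractor. This is a minimal sketch, assuming the loaded model is a torch.nn.Module that accepts a raw waveform tensor of shape (batch, samples) at the model's expected sample rate (16 kHz for BEATs):

import torch

# Dummy 1-second clip at 16 kHz (BEATs' expected input rate)
waveform = torch.randn(1, 16000)
with torch.no_grad():
    features = model(waveform)  # assumed forward signature
print(features.shape)  # expected: (1, time_steps, 768) for BEATs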

Probes (Heads on Top of Backbones)

Probes are task-specific heads attached to pretrained backbones for transfer learning:

┌─────────────────────────────────────────────────────────┐
│                    Audio Input                          │
│              (batch, time_steps)                        │
└────────────────────┬────────────────────────────────────┘
                     │
                     ▼
         ┌───────────────────────┐
         │   Pretrained Backbone │
         │   (e.g., BEATs)       │
         │                       │
         │  ┌─────────────────┐  │
         │  │  Layer 1        │  │
         │  └────────┬────────┘  │
         │           │           │
         │  ┌────────▼────────┐  │
         │  │  Layer 2        │  │
         │  └────────┬────────┘  │
         │           │           │
         │  ┌────────▼────────┐  │
         │  │  ...            │  │
         │  └────────┬────────┘  │
         │           │           │
         │  ┌────────▼────────┐  │
         │  │  Last Layer     │◄─┼── target_layers=["last_layer"]
         │  └────────┬────────┘  │
         └───────────┼───────────┘
                     │
                     ▼
         ┌───────────────────────┐
         │   Embeddings          │
         │   (batch, dim)        │
         └───────────┬───────────┘
                     │
                     ▼
         ┌───────────────────────┐
         │   Probe Head          │
         │   (linear/MLP/etc.)   │
         └───────────┬───────────┘
                     │
                     ▼
         ┌───────────────────────┐
         │   Task Predictions    │
         │   (batch, num_classes)│
         └───────────────────────┘

Key Concepts:

  • Backbone: Pretrained model (frozen or fine-tunable)

  • Probe: Task-specific head (trained for your task)

  • Target Layers: Which backbone layers to extract features from

  • Online Training: Backbone + probe trained together

  • Offline Training: Embeddings pre-computed, probe trained separately
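
The snippet below puts these pieces together: it loads a backbone for feature extraction and attaches a linear probe on top of the frozen features.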

from avex import load_model
from avex.models.probes import build_probe_from_config
from avex.configs import ProbeConfig

# Load backbone for feature extraction
base = load_model("esp_aves2_naturelm_audio_v1_beats", return_features_only=True, device="cpu")

# Define a simple linear probe on the backbone features
probe_config = ProbeConfig(
    probe_type="linear",
    target_layers=["last_layer"],
    aggregation="mean",
    freeze_backbone=True,
    online_training=True,
)

probe = build_probe_from_config(
    probe_config=probe_config,
    base_model=base,
    num_classes=10,
    device="cpu",
)
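
Once built, the probe can be trained like any PyTorch module. This is a hypothetical sketch, assuming the returned probe is a torch.nn.Module that maps raw audio (batch, samples) to logits; with freeze_backbone=True only the head's parameters receive gradients:

import torch

# Optimize only the trainable (non-frozen) parameters
optimizer = torch.optim.AdamW(
    (p for p in probe.parameters() if p.requires_grad), lr=1e-3
)
criterion = torch.nn.CrossEntropyLoss()

waveform = torch.randn(4, 16000)     # dummy batch: 4 one-second clips at 16 kHz
labels = torch.randint(0, 10, (4,))  # dummy labels for 10 classes

optimizer.zero_grad()
logits = probe(waveform)             # assumed call signature; shape (4, 10)
loss = criterion(logits, labels)
loss.backward()
optimizer.step()

Because online_training=True, the frozen backbone runs inside the training loop; for offline training you would instead precompute embeddings once and train the probe on them separately.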

Note: Each model expects a specific sample rate (e.g., 16 kHz for BEATs, 32 kHz for Perch). Use describe_model() to check, and resample with librosa.resample() if needed. See Audio Requirements for details.
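
For example, to bring an arbitrary recording to the 16 kHz that BEATs-based models expect (the filename here is illustrative):

import librosa

# Load at the file's native sample rate, then resample only if needed
audio, sr = librosa.load("recording.wav", sr=None)
if sr != 16000:
    audio = librosa.resample(audio, orig_sr=sr, target_sr=16000)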

For more examples, see the examples/ directory:

  • 00_quick_start.py - Basic model loading and testing

  • 01_basic_model_loading.py - Loading models with different configurations

  • 02_checkpoint_loading.py - Working with checkpoints and class mappings

  • 03_custom_model_registration.py - Creating and registering custom models and ModelSpecs

  • 04_training_and_evaluation.py - Full training loop and evaluation examples

  • 05_embedding_extraction.py - Feature extraction with return_features_only=True (unpooled features)

  • 06_classifier_head_loading.py - Understanding classifier head behavior

Installation

The installation process depends on how you plan to use this package:

  • API user: you just want to load models and run inference.

  • Developer: you want to clone the repo, modify code, or contribute.

1. API Usage

This path is for users who want to install the package and use it as a library, for example to load models and run inference.

Prerequisites

  • Python 3.10, 3.11, or 3.12

Install with pip

pip install avex

Install with uv

# Option A: Add and install in one step
uv add avex

# Option B: If you've already added it to [project.dependencies] in pyproject.toml
uv sync

2. Training Setup (ESP Internal)

Training the original supervised learning models requires esp-data for dataset management; esp-data is currently available only to Earth Species Project team members.

Prerequisites

  • Python 3.10, 3.11, or 3.12

  • Git

  • GCP authentication:

gcloud auth login
gcloud auth application-default login

Install with pip

pip install "avex[dev]" \
  --extra-index-url https://oauth2accesstoken@us-central1-python.pkg.dev/okapi-274503/esp-pypi/simple/

This installs additional dependencies for training:

  • esp-data – dataset management

  • esp-sweep – hyperparameter sweeping

  • pytorch-lightning, mlflow, wandb – training infrastructure

For contributing to the codebase, see CONTRIBUTING.md.

Core Documentation

  • API Reference – Complete API documentation for model loading, registry, and management functions

  • Architecture – Framework architecture, core components, and plugin system

  • Supported Models – List of supported models and their configurations

  • Configuration – ModelSpec parameters, audio requirements, and configuration options

Usage Guides

Advanced Topics

  • Probe System – Understanding and using probes for transfer learning

  • API Probes – API reference for probe-related functionality

  • Custom Model Registration – Guide on registering custom model classes and loading pre-trained models

Contributing

  • Contributing Guide – Instructions for contributing to the project, including development setup, testing, and pull request process