End-to-end classification pipeline for 21 giant otter (Pteronura brasiliensis) call types using avex transfer learning.
Inspired by: ESP library / giant_otter
Dataset
- Source: Internet Archive
- Classes: 21 call types (barks, screams, contact calls, whistles, humming, …)
- Size: 9–32 recordings per class
Pipeline
audio files ──► embedding extraction ──► linear probe ──► accuracy + figures
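The linear-probe and accuracy stages can be sketched with scikit-learn. This is a minimal illustration, not the code in `train.py`: it assumes the embeddings are already pooled to shape `(N, D)` with one integer label per clip, and it uses `LogisticRegression` as the linear classifier with a stratified split (helpful given the uneven 9–32 recordings per class). The helper name `fit_linear_probe` and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def fit_linear_probe(embeddings: np.ndarray, labels: np.ndarray,
                     test_size: float = 0.2, seed: int = 0):
    """Hypothetical helper: train a linear classifier on frozen embeddings.

    Returns (fitted probe, accuracy on the held-out split). LogisticRegression
    is an assumed stand-in for the "linear" probe type in config.yaml.
    """
    # Stratify so every class keeps roughly the same share in train and test.
    x_train, x_test, y_train, y_test = train_test_split(
        embeddings, labels, test_size=test_size, stratify=labels, random_state=seed
    )
    probe = LogisticRegression(max_iter=1000)
    probe.fit(x_train, y_train)
    acc = accuracy_score(y_test, probe.predict(x_test))
    return probe, acc


# Synthetic stand-in data: 3 well-separated classes in a 768-d embedding space.
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(3, 768))
labels = np.repeat(np.arange(3), 20)
embeddings = centers[labels] + rng.normal(size=(60, 768))

probe, acc = fit_linear_probe(embeddings, labels)
print(f"test accuracy: {acc:.2f}")
```

Because the backbone stays frozen, only the probe's weights (`probe.coef_`, one row per class) are learned, which keeps training cheap even with few recordings per class.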
Both avex models are compared:
- `esp_aves2_sl_beats_all`: mean pool over temporal tokens, (N, T, 768) → (N, 768)
- `esp_aves2_effnetb0_all`: global average pool, (N, C, H, W) → (N, C)
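The two pooling steps reduce each model's output to one vector per clip. The sketch below uses NumPy arrays as stand-ins for the model outputs; the sizes T = 50, C = 1280, H = 4 and W = 10 are arbitrary placeholder values, not the actual shapes produced by the avex models, and the function names are hypothetical.

```python
import numpy as np


def pool_beats_tokens(tokens: np.ndarray) -> np.ndarray:
    """Mean-pool transformer tokens over the time axis: (N, T, D) -> (N, D)."""
    if tokens.ndim != 3:
        raise ValueError(f"expected (N, T, D), got shape {tokens.shape}")
    return tokens.mean(axis=1)


def pool_effnet_features(fmap: np.ndarray) -> np.ndarray:
    """Global average pool a CNN feature map over H and W: (N, C, H, W) -> (N, C)."""
    if fmap.ndim != 4:
        raise ValueError(f"expected (N, C, H, W), got shape {fmap.shape}")
    return fmap.mean(axis=(2, 3))


# Placeholder outputs for a batch of 4 clips (shapes are illustrative only).
tokens = np.random.default_rng(0).normal(size=(4, 50, 768))
fmap = np.random.default_rng(1).normal(size=(4, 1280, 4, 10))

beats_emb = pool_beats_tokens(tokens)
effnet_emb = pool_effnet_features(fmap)
print(beats_emb.shape, effnet_emb.shape)  # (4, 768) (4, 1280)
```

Both functions average away every axis except batch and feature, so the downstream probe sees a fixed-length vector regardless of clip duration or spectrogram size.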
Usage
```bash
cd examples/01_giant_otter_classifier

# Train probes for both models, save embeddings + UMAP figure
python train.py --config config.yaml

# Evaluate one model and save confusion matrix
python evaluate.py --config config.yaml --model esp_aves2_sl_beats_all
```
Configuration
Key fields in `config.yaml`:

```yaml
dataset:
  url: "https://archive.org/..."
  sample_rate: 16000
  window_seconds: 3.0

models:
  - name: "esp_aves2_sl_beats_all"
    pooling: "mean"  # mean | cls | max
  - name: "esp_aves2_effnetb0_all"
    pooling: "mean"

probe:
  type: "linear"
  test_size: 0.2
```
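With `sample_rate: 16000` and `window_seconds: 3.0`, each window holds 16000 × 3.0 = 48000 samples. The sketch below shows one way to cut a recording into such windows. It assumes non-overlapping windows with the last partial window zero-padded; the actual pipeline may drop, overlap, or pad windows differently, and `split_into_windows` is a hypothetical helper. The config values are written as a dict here to keep the example self-contained; in practice the file would be parsed with a YAML library such as PyYAML.

```python
import numpy as np

# Values mirror the dataset section of config.yaml (assumed to be loaded already).
cfg = {"dataset": {"sample_rate": 16000, "window_seconds": 3.0}}


def split_into_windows(waveform: np.ndarray, sample_rate: int,
                       window_seconds: float) -> np.ndarray:
    """Hypothetical helper: cut a mono waveform into fixed-length windows.

    Windows do not overlap; the final partial window is zero-padded (an
    assumption made for this sketch). Returns shape (n_windows, window_len).
    """
    window_len = int(round(sample_rate * window_seconds))
    n_windows = max(1, int(np.ceil(len(waveform) / window_len)))
    padded = np.zeros(n_windows * window_len, dtype=waveform.dtype)
    padded[: len(waveform)] = waveform
    return padded.reshape(n_windows, window_len)


sr = cfg["dataset"]["sample_rate"]
secs = cfg["dataset"]["window_seconds"]
audio = np.ones(int(7.5 * sr), dtype=np.float32)  # 7.5 s of dummy audio
windows = split_into_windows(audio, sr, secs)
print(windows.shape)  # (3, 48000)
```

A 7.5 s clip yields two full windows plus a half-filled third one; whether that padded tail helps or hurts accuracy is worth checking on the short call types.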
Outputs
| File | Description |
|---|---|
| | Per-model embedding arrays |
| | Interactive UMAP scatter (Plotly) |
| | Accuracy bar chart |
| | Normalised confusion matrix |
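A normalised confusion matrix is commonly row-normalised, so each row shows how the recordings of one true class were distributed across predicted classes. The sketch below assumes that convention (scikit-learn's `normalize="true"`); `evaluate.py` may normalise differently. The labels are toy values, not results from this dataset.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, confusion_matrix

# Toy labels for 3 classes (illustrative only).
y_true = np.array([0, 0, 0, 1, 1, 2, 2, 2, 2])
y_pred = np.array([0, 0, 1, 1, 1, 2, 2, 0, 2])

# Row-normalised: each row sums to 1 and the diagonal holds per-class recall.
cm = confusion_matrix(y_true, y_pred, normalize="true")

# The same result by hand: raw counts divided by each row's total.
counts = confusion_matrix(y_true, y_pred)
manual = counts / counts.sum(axis=1, keepdims=True)

# The mean of the diagonal equals balanced accuracy (mean per-class recall).
mean_recall = np.diag(cm).mean()
print(np.round(cm, 2))
print(round(mean_recall, 3), round(balanced_accuracy_score(y_true, y_pred), 3))
```

Row normalisation matters here because class sizes range from 9 to 32 recordings: raw counts would visually favour the larger classes, whereas the normalised view makes rare call types equally legible.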
Note: Embeddings saved here are used as input to BEATs Transformer Layer Analysis.