Usage¶
Generic batch embedding extraction for any directory of .wav or .flac files.
Inspired by: Perch / embed_audio.ipynb
Usage¶
cd examples/02_embed_audio
python embed_audio.py --config config.yaml
What it does¶
Scans
audio.input_dirfor.wav/.flacfilesShards each file into fixed-length overlapping windows
Runs both avex models in
return_features_only=TruemodeMean-pools the output tokens / spatial maps to a 1-D vector per window
Saves one
.ptfile per audio file per model (shape(n_windows, dim))
Already-extracted files are skipped — the script is idempotent.
Configuration¶
audio:
input_dir: "data/audio"
sample_rate: 16000
window_seconds: 5.0
hop_seconds: 2.5
models:
- name: "esp_aves2_sl_beats_all"
- name: "esp_aves2_effnetb0_all"
output:
embeddings_dir: "outputs/embeddings"
Output format¶
outputs/embeddings/
├── esp_aves2_sl_beats_all/
│ ├── recording_001.pt # shape: (n_windows, 768)
│ └── recording_002.pt
└── esp_aves2_effnetb0_all/
├── recording_001.pt # shape: (n_windows, C)
└── recording_002.pt
Tip
The embeddings produced here are the expected input format for ../../examples/07_interactive_visualization/visualize.