Audio AI Intern

Internship, 2025.04-2025.09, Amoon AI, 2025

From April to September 2025, I interned at Amoon AI, focusing on low-resource infant-cry understanding and deployable audio intelligence.

The project addressed a difficult acoustic recognition problem: infant cries have highly entangled acoustic patterns, ambiguous semantic boundaries, and few reliable labels. The goal was to improve both cry activity detection and infant-state classification over existing methods.

My work included building the data foundation for the team. I collected and manually filtered 20k high-confidence labeled audio clips, then used iterative pseudo-labeling to expand the training set while controlling label noise. This dataset supported downstream model development and internal experiments.
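
One common way to control label noise in iterative pseudo-labeling is to keep only the predictions that clear a confidence threshold in each round. The sketch below illustrates that general idea in Python. It is not the project's actual procedure: the function name `select_pseudo_labels`, the 0.9 threshold, and the example probabilities are all hypothetical.

```python
from typing import List, Tuple


def select_pseudo_labels(
    probs: List[List[float]], threshold: float = 0.9
) -> List[Tuple[int, int]]:
    """Return (clip_index, predicted_class) pairs for confident predictions only.

    `probs` holds one class-probability vector per unlabeled clip.
    The threshold value is an illustrative assumption.
    """
    selected = []
    for idx, p in enumerate(probs):
        best = max(range(len(p)), key=p.__getitem__)  # argmax over classes
        if p[best] >= threshold:
            selected.append((idx, best))
    return selected


# Hypothetical model outputs for three unlabeled clips over three classes.
example_probs = [
    [0.95, 0.03, 0.02],
    [0.40, 0.35, 0.25],
    [0.05, 0.02, 0.93],
]
confident = select_pseudo_labels(example_probs)
```

In an iterative setup, the selected clips would be added to the training set, the model retrained, and the selection repeated on the remaining unlabeled clips; raising the threshold trades coverage for lower label noise.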

On the modeling side, I fine-tuned multiple audio foundation models, including BEATs, Whisper, Audio-MAE, Wav2Vec2, Qwen-Audio, and CLAP. I also trained a Conformer-based audio model from scratch and explored practical improvements for infant-cry classification. By combining modeling techniques with log-based signals and phonetic rules, the multi-stage training pipeline reached a best five-class classification accuracy of 67.2% and resolved several corner cases.

For deployment, I iterated through six versions of an infant-VAD dataset and designed a lightweight cry detection algorithm. The final system reached 98% detection accuracy with a false-alarm rate below 5% and was deployed on edge devices.
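
For readers unfamiliar with the two figures above, the following Python sketch shows one conventional way to compute them from binary cry / no-cry decisions. The exact definitions used in the project are not stated here, so this assumes segment-level accuracy and a false-alarm rate of false positives over all true non-cry segments; the function name `detection_metrics` and the sample labels are hypothetical.

```python
from typing import Sequence, Tuple


def detection_metrics(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float]:
    """Return (accuracy, false_alarm_rate) for binary labels (1 = cry, 0 = no cry).

    Assumed definitions: accuracy = (TP + TN) / N,
    false-alarm rate = FP / (FP + TN).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 0 and p == 0:
            tn += 1
        else:
            fn += 1
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    negatives = fp + tn
    false_alarm_rate = fp / negatives if negatives else 0.0
    return accuracy, false_alarm_rate


# Made-up labels for eight audio segments, for illustration only.
truth = [1, 1, 1, 0, 0, 0, 0, 1]
preds = [1, 1, 0, 0, 1, 0, 0, 1]
accuracy, false_alarm_rate = detection_metrics(truth, preds)
```

Reporting both numbers matters for an always-on detector: accuracy alone can look high when cries are rare, while the false-alarm rate shows how often silence or background noise triggers the system.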

I also conducted a systematic literature review, helped write up experiments, and completed more than 60 of 120 comparative experiments, including benchmarking and analysis against related algorithms.