Sri Lankan elephant sound classification using deep learning
Date
2024
Authors
Dewmini, A.G.H.U.D.
Abstract
Understanding elephant caller types is crucial for wildlife conservation and ecological research. By decoding the intricate vocalizations of elephants, researchers gain valuable insights into their behaviour, social dynamics, and emotional expressions, which are pivotal for species conservation efforts. Elephant vocalizations also serve as indicators of ecosystem health and vitality, aiding ecological monitoring and biodiversity conservation initiatives. Furthermore, investigating caller types contributes to preserving the cultural heritage embodied in the profound connection between humans and elephants across generations. Studying elephant communication therefore not only advances scientific knowledge but also fosters harmony between humans and these majestic animals, supporting their long-term survival in the wild.
In this study, we address elephant caller-type classification using raw audio processing. Our focus is on lightweight models suitable for deployment on edge devices, including MobileNet, YAMNet, and RawNet, alongside a novel model, ElephantCallerNet, based on the ACDNet architecture. Notably, the ACDNet-based ElephantCallerNet achieves an accuracy of 89% on a raw audio dataset. Using Bayesian optimization, we tune key hyperparameters such as the learning rate, dropout, and kernel size, thereby enhancing model performance. We also examine the efficacy of spectrogram-based training, a prevalent approach in animal sound classification; comparative analysis shows that, for our dataset, raw audio processing outperforms spectrogram-based methods.
In contrast to other models in the literature that focus on a single caller type or on binary classification (such as identifying whether a sound is an elephant vocalization or not), our models classify three distinct caller types: Roar, Rumble, and Trumpet, which significantly increases the complexity of our experiments compared to those discussed in the literature. In elephant vocalization analysis, there has been limited exploration of directly processing raw audio data; predominantly, feature extraction techniques have been applied before training machine learning algorithms. In this work, we bypass such preprocessing stages and feed raw audio directly into the models to assess the feasibility and efficacy of training on unprocessed audio signals.
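As an illustrative sketch of the raw-audio pipeline described above (not the thesis's actual ElephantCallerNet implementation), the following Python snippet shows a minimal 1-D convolutional classifier over raw waveforms for the three caller types, combined with a Bayesian hyperparameter search over the learning rate, dropout, and kernel size using keras-tuner. The input length, layer widths, trial count, and search ranges are assumptions for demonstration only.

# Illustrative sketch, assuming TensorFlow/Keras and keras-tuner are available.
# This is NOT the thesis's ElephantCallerNet; layer sizes and the 1 s / 16 kHz
# input length are assumptions chosen for demonstration.
import tensorflow as tf
import keras_tuner as kt

NUM_CLASSES = 3          # Roar, Rumble, Trumpet
INPUT_SAMPLES = 16000    # assumed 1-second clips sampled at 16 kHz

def build_model(hp):
    """Small 1-D CNN over raw audio; hyperparameters are supplied by the tuner."""
    kernel_size = hp.Choice("kernel_size", [3, 5, 9])
    dropout = hp.Float("dropout", 0.1, 0.5, step=0.1)
    lr = hp.Float("learning_rate", 1e-4, 1e-2, sampling="log")

    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(INPUT_SAMPLES, 1)),   # raw waveform, no spectrogram
        tf.keras.layers.Conv1D(16, kernel_size, strides=2, activation="relu"),
        tf.keras.layers.MaxPooling1D(4),
        tf.keras.layers.Conv1D(32, kernel_size, activation="relu"),
        tf.keras.layers.MaxPooling1D(4),
        tf.keras.layers.Conv1D(64, kernel_size, activation="relu"),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dropout(dropout),
        tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model

# Bayesian optimization over the same hyperparameters named in the abstract:
# learning rate, dropout, and kernel size.
tuner = kt.BayesianOptimization(
    build_model,
    objective="val_accuracy",
    max_trials=20,
    directory="tuning",
    project_name="elephant_caller_types",
)
# train_ds / val_ds are assumed tf.data.Dataset objects yielding (waveform, label)
# pairs with labels in {0: Roar, 1: Rumble, 2: Trumpet}.
# tuner.search(train_ds, validation_data=val_ds, epochs=30)
# best_model = tuner.get_best_models(num_models=1)[0]

A spectrogram-based baseline, for comparison, would simply replace the raw-waveform input with a time-frequency representation (e.g., a log-mel spectrogram) and use 2-D convolutions; the abstract reports that, on this dataset, the raw-audio approach performed better.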
Keywords
ELEPHANT VOCALIZATION, MACHINE LEARNING-Supervised Learning, AUDIO-VISUAL REPRESENTATION-Feature Extraction, AUDIO DATA PROCESSING-Raw, ELEPHANT VOICES DATASET, ASIAN ELEPHANT VOCALIZATIONS DATASET, WILDLIFE CONSERVATION-Elephants, ECOLOGICAL RESEARCH-Elephants, COMPUTER SCIENCE AND ENGINEERING-Dissertation, MSc in Computer Science
Citation
Dewmini, A.G.H.U.D. (2023). Sri Lankan elephant sound classification using deep learning [Master's thesis, University of Moratuwa]. Institutional Repository University of Moratuwa.
https://dl.lib.uom.lk/handle/123/23688