Sri Lankan elephant sound classification using deep learning

Date

2024

Abstract

Understanding elephant caller types is crucial for wildlife conservation and ecological research. By decoding the vocalizations of elephants, researchers gain insights into their behaviour, social dynamics, and emotional expressions, all of which inform species conservation efforts. Elephant vocalizations also serve as indicators of ecosystem health, supporting ecological monitoring and biodiversity conservation initiatives. Investigating caller types further contributes to preserving the cultural heritage embodied in the long-standing connection between humans and elephants. Studying elephant communication therefore advances scientific knowledge while fostering coexistence between humans and elephants and helping to ensure their long-term survival in the wild.

In this study, we address elephant caller-type classification using raw audio processing. We focus on lightweight models suitable for deployment on edge devices, including MobileNet, YAMNet, and RawNet, and introduce a new model, ElephantCallerNet, based on the ACDNet architecture. The ACDNet-based ElephantCallerNet achieves 89% accuracy on a raw-audio dataset. Using Bayesian optimization, we tune key hyperparameters such as the learning rate, dropout rate, and kernel size to improve model performance. We also examine spectrogram-based training, a prevalent approach in animal sound classification, and find through comparative analysis that, for our dataset, raw audio processing outperforms spectrogram-based methods.

In contrast to models in the literature that focus on a single caller type or on binary classification (for example, deciding whether a sound is an elephant call or not), our models classify three distinct caller types: Roar, Rumble, and Trumpet, which makes our experiments considerably more complex. In elephant vocalization analysis, there has been limited exploration of processing raw audio directly; most prior work applies feature extraction techniques before training machine learning algorithms. In our investigation, we bypass these preprocessing stages and feed raw audio directly into the models to assess the feasibility and efficacy of training on unprocessed audio signals.
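The abstract describes training lightweight models directly on raw waveforms and tuning the learning rate, dropout, and kernel size with Bayesian optimization. The sketch below illustrates that general setup using Keras and KerasTuner; it is not the thesis's ElephantCallerNet/ACDNet architecture, and the layer sizes, the assumed 16 kHz one-second input window, the class ordering (Roar, Rumble, Trumpet), and the search ranges are illustrative assumptions.

# Illustrative sketch only: a small 1D CNN over raw waveforms with Bayesian
# hyperparameter search. NOT the thesis's ElephantCallerNet/ACDNet model.
import keras_tuner as kt
import tensorflow as tf
from tensorflow.keras import layers, models

SAMPLE_RATE = 16000          # assumed sampling rate
INPUT_LEN = SAMPLE_RATE * 1  # assumed 1-second clips
CLASSES = ["Roar", "Rumble", "Trumpet"]


def build_model(hp):
    # Hyperparameters named in the abstract: kernel size, dropout, learning rate.
    kernel_size = hp.Int("kernel_size", 3, 15, step=2)
    dropout = hp.Float("dropout", 0.1, 0.5)
    lr = hp.Float("learning_rate", 1e-4, 1e-2, sampling="log")

    model = models.Sequential([
        layers.Input(shape=(INPUT_LEN, 1)),  # raw mono waveform, no spectrogram
        layers.Conv1D(16, kernel_size, strides=2, activation="relu"),
        layers.MaxPooling1D(4),
        layers.Conv1D(32, kernel_size, activation="relu"),
        layers.MaxPooling1D(4),
        layers.Conv1D(64, kernel_size, activation="relu"),
        layers.GlobalAveragePooling1D(),
        layers.Dropout(dropout),
        layers.Dense(len(CLASSES), activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


# Bayesian optimization over the search space defined in build_model.
tuner = kt.BayesianOptimization(
    build_model,
    objective="val_accuracy",
    max_trials=20,
    directory="tuning",
    project_name="elephant_callers",
)
# x_train / x_val: float32 arrays of shape (n_clips, INPUT_LEN, 1);
# y_train / y_val: integer labels indexing into CLASSES.
# tuner.search(x_train, y_train, validation_data=(x_val, y_val), epochs=30)

In a setup like this, the tuner's best trial would supply the kernel size, dropout rate, and learning rate used to retrain the final raw-audio classifier.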

Citation

Dewmini, A.G.H.U.D. (2023). Sri Lankan elephant sound classification using deep learning [Master's thesis, University of Moratuwa]. Institutional Repository, University of Moratuwa. https://dl.lib.uom.lk/handle/123/23688
