dc.contributor.author Talagala, PD
dc.contributor.author Hyndman, RJ
dc.contributor.author Miles, KM
dc.date.accessioned 2023-05-25T08:34:27Z
dc.date.available 2023-05-25T08:34:27Z
dc.date.issued 2021
dc.identifier.citation Talagala, P. D., Hyndman, R. J., & Smith-Miles, K. (2021). Anomaly Detection in High-Dimensional Data. Journal of Computational and Graphical Statistics, 30(2), 360–374. en_US
dc.identifier.issn 1061-8600 en_US
dc.description.abstract The HDoutliers algorithm is a powerful unsupervised algorithm for detecting anomalies in high-dimensional data, with a strong theoretical foundation. However, it suffers from some limitations that significantly hinder its performance under certain circumstances. In this article, we propose an algorithm that addresses these limitations. We define an anomaly as an observation whose k-nearest neighbor distance with the maximum gap is significantly different from what we would expect if the distribution of k-nearest neighbors with the maximum gap is in the maximum domain of attraction of the Gumbel distribution. An approach based on extreme value theory is used for the anomalous threshold calculation. Using various synthetic and real datasets, we demonstrate the wide applicability and usefulness of our algorithm, which we call the stray algorithm. We also demonstrate how this algorithm can assist in detecting anomalies present in other data structures using feature engineering. We show the situations where the stray algorithm outperforms the HDoutliers algorithm in both accuracy and computational time. This framework is implemented in the open source R package stray. Supplementary materials for this article are available online. en_US
dc.language.iso en_US en_US
dc.publisher Taylor and Francis en_US
dc.subject Extreme value theory en_US
dc.subject High dimensional data en_US
dc.subject Nearest neighbour searching en_US
dc.subject Temporal data en_US
dc.subject Unsupervised outlier detection en_US
dc.title Anomaly detection in high-dimensional data en_US
dc.type Article-Full-text en_US
dc.identifier.year 2021 en_US
dc.identifier.journal Journal of Computational and Graphical Statistics en_US
dc.identifier.issue 2 en_US
dc.identifier.volume 30 en_US
dc.identifier.database Taylor & Francis Online en_US
dc.identifier.pgnos 360-374 en_US
dc.identifier.doi en_US
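
Illustrative note on the abstract: the score described there, "the k-nearest neighbor distance with the maximum gap", can be sketched in a few lines. The Python below is only one reading of that sentence and is not taken from the article or from the stray package. The function name knn_max_gap_scores, the default k=3, and the toy points are all assumptions made for illustration. The extreme-value-theory threshold mentioned in the abstract is deliberately left out, so the sketch only ranks observations.

```python
import math


def knn_max_gap_scores(points, k=3):
    """Score each point by the k-NN distance that follows its largest gap.

    Hypothetical helper: an interpretation of the abstract's definition,
    not the implementation shipped in the stray R package.
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, nearest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = dists[:k]
        if len(nearest) < 2:
            # With a single neighbour there is no gap to inspect.
            scores.append(nearest[0])
            continue
        # Gaps between successive nearest-neighbour distances.
        gaps = [b - a for a, b in zip(nearest, nearest[1:])]
        # Take the neighbour distance reached right after the largest gap.
        scores.append(nearest[gaps.index(max(gaps)) + 1])
    return scores


# Toy data (assumed): a tight cluster plus a two-point micro-cluster far away.
toy_points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10), (10.2, 10)]
toy_scores = knn_max_gap_scores(toy_points, k=3)
# Indices of the points ordered from most to least anomalous by this score.
ranking = sorted(range(len(toy_points)), key=lambda i: toy_scores[i], reverse=True)
```

In this toy set the two far-away points sit close to each other, so a plain 1-nearest-neighbor distance would make them look ordinary. Looking for the largest gap among the first k neighbor distances picks up the jump from the micro-cluster to the main cluster instead, which is the kind of limitation of nearest-neighbor scoring that the abstract says the stray algorithm addresses.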
