Abstract:
The security and privacy of the end-users are a few of the most important components of a communication network. Though end-to-end encryption (e.g., TLS/SSL) fulfils this requirement, it makes inspecting network traffic with legacy solutions such as Deep Packet Inspection difficult. Recent Machine Learning techniques have shown outstanding performance in encrypted traffic classification. Nevertheless, such approaches require efficient flow sampling at real enterprise-scale networks due to the sheer volume of transferred data. Through this paper, we propose a holistic architecture to extract flow information of encrypted data at multi Gbps line rate using sampling and sketching mechanisms, enabling network operators to estimate flow size distribution accurately and understand the behavior of VPN-obfuscated traffic. Using over 6000 video traffic traces, under three main evaluation scenarios based on trace duration and starting time point, we show that it is possible to achieve 99% accuracy for service provider classification and over 90% accuracy for content classification for a given service provider in the best case. We also deploy our solution at an operational enterprise-scale network leveraging kernel bypassing to demonstrate its capability to efficiently sample live traffic for analytics.
Citation:
Kattadige, C., Choi, K. N., Wijesinghe, A., Nama, A., Thilakarathna, K., Seneviratne, S., & Jourjon, G. (2021). SETA++: Real-time scalable encrypted traffic analytics in multi-gbps networks. IEEE transactions on network and service management, 18(3), 3244–3259. https://doi.org/10.1109/TNSM.2021.3085097