GAN-Based Tor Obfuscated Traffic Identification Framework for Imbalanced Datasets
First published: 2025-02-28
Abstract: Although deep learning models have demonstrated strong performance in identifying vanilla Tor traffic, the classification of obfuscated Tor traffic remains under-explored. A critical challenge is that the performance of deep learning models depends heavily on dataset quality, yet high-quality obfuscated traffic datasets, balanced and noise-free, are difficult to obtain. To bridge this gap, we propose a novel generative adversarial network (GAN) framework in which the generator supplements minority-class samples, directly improving the discriminator's classification performance on imbalanced datasets and simplifying the traditional GAN-based data augmentation process. Additionally, the framework processes traffic samples in a structure-aware manner, learning structured features at the packet, flow, and trace levels, and achieves better obfuscated traffic identification performance than previous work. Furthermore, we developed an automated traffic collection system to gather a comprehensive and up-to-date dataset of widely adopted obfuscation protocols, including obfs4, Snowflake, WebTunnel, dnstt, and Shadowsocks. Experiments show that the proposed model outperforms existing models on both balanced and imbalanced datasets, demonstrating robustness to low-quality datasets.
Keywords: generative adversarial network