GAN-Based Tor Obfuscated Traffic Identification Framework on Imbalanced Dataset


First published: 2025-02-28

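Zhao Pengfei ^{1,2}
Zhao Pengfei (born 2000), male, master's student. Main research interests: anonymous communication and privacy protection.
Lu Tianbo ^{1,2}
Lu Tianbo (born 1977), male, professor. Main research interests: anonymous communication and privacy protection.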

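1. School of Computer Science (National Pilot Software Engineering School), Beijing University of Posts and Telecommunications, Beijing 100876
2. Key Laboratory of Trustworthy Distributed Computing and Service (Beijing University of Posts and Telecommunications), Ministry of Education, Beijing 100876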

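Abstract: Deep learning models have shown excellent performance at identifying vanilla Tor traffic, but little research has applied such models to identifying traffic carried over Tor's obfuscated transports. The difficulty is that the performance of a deep learning model depends heavily on dataset quality, and high-quality obfuscated-traffic datasets are hard to obtain. To address this, the paper proposes a new generative adversarial network (GAN) framework in which the generator supplements minority-class samples, directly improving the discriminator's classification performance on imbalanced datasets and simplifying earlier GAN-based data augmentation pipelines. The framework also processes traffic samples in a structure-aware manner, learning structured traffic features at the packet, flow, and trace levels, and achieves better obfuscated-traffic identification performance than prior work. Finally, the authors use an automated traffic collection system to capture several widely deployed obfuscation protocols, namely obfs4, Snowflake, WebTunnel, dnstt, and Shadowsocks, and build a comprehensive, up-to-date dataset of Tor obfuscated transport protocols. Experiments show that the model outperforms existing models on both balanced and imbalanced datasets and is robust to low-quality datasets.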

Keywords: generative adversarial network


GAN-Based Tor Obfuscated Traffic Identification Framework on Imbalanced Dataset

Zhao Pengfei ¹
Lu Tianbo ²

1. School of Computer Science (National Pilot Software Engineering School), Beijing University of Posts and Telecommunications, Beijing 100876
2. Key Laboratory of Trustworthy Distributed Computing and Service (BUPT), Ministry of Education, Beijing 100876

Abstract: Although deep learning models have demonstrated strong performance in identifying vanilla Tor traffic, the classification of obfuscated Tor traffic remains under-explored. A critical challenge lies in the scarcity of high-quality datasets that are well balanced and noise-free, upon which deep learning models heavily rely for optimal performance. To bridge this gap, we propose a novel generative adversarial network (GAN) framework that enhances the discriminator's performance on imbalanced datasets by directly supplementing minority-class samples through the generator, simplifying the traditional GAN-based data augmentation process. Additionally, the framework employs a structure-aware approach to process traffic samples, learning structured features at the packet, flow, and trace levels, and achieves better obfuscated traffic identification performance than previous work. Furthermore, an automated traffic collection system was developed to gather a comprehensive and up-to-date dataset of widely adopted obfuscation protocols, including obfs4, Snowflake, WebTunnel, dnstt, and Shadowsocks. The proposed model outperforms existing models on both balanced and imbalanced datasets, showing robustness in low-quality dataset scenarios.

Keywords: generative adversarial network
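
The abstract's idea of supplementing minority-class samples can be illustrated with a small sketch. This is not the authors' implementation: the function name `synthetic_quota`, the "match the largest class" target, and the per-protocol counts are all assumptions made for illustration. It only computes how many generated samples each class would need for the training set to become balanced. In the paper's framework, the generator that would produce those samples is trained together with the discriminator.

```python
from collections import Counter


def synthetic_quota(labels):
    """Return, per class, how many extra samples are needed to match the largest class.

    Hypothetical helper: the balancing target (largest class size) is an
    assumption, not something stated in the paper.
    """
    counts = Counter(labels)
    target = max(counts.values())
    return {cls: target - n for cls, n in counts.items()}


# Illustrative, made-up class sizes for three obfuscation protocols.
labels = ["obfs4"] * 500 + ["snowflake"] * 120 + ["webtunnel"] * 40
quota = synthetic_quota(labels)
print(quota)  # {'obfs4': 0, 'snowflake': 380, 'webtunnel': 460}
```

A quota of zero for the majority class means the generator would only be asked for samples of the under-represented protocols.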

Funding:

1. National Natural Science Foundation of China (62162060)


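Citation:

Zhao Pengfei, Lu Tianbo. GAN-Based Tor Obfuscated Traffic Identification Framework on Imbalanced Dataset [EB/OL]. Beijing: Sciencepaper Online (中国科技论文在线) [2025-02-28]. https://www.paper.edu.cn/releasepaper/content/202502-146.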


Paper No.: 202502-146