Computer Science (计算机科学), 2020, Vol. 47, Issue 3: 48-53. doi: 10.11896/jsjkx.190700146
Special topic: Intelligent Software Engineering
KANG Yan, CUI Guo-rong, LI Hao, YANG Qi-yue, LI Jin-yuan, WANG Pei-yao (康雁, 崔国荣, 李浩, 杨其越, 李晋源, 王沛尧)
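Abstract: As the number of software products grows rapidly and their types become more diverse, mining features from software requirement texts and clustering those features has become a major challenge in software engineering. Clustering requirement texts provides a reliable safeguard for the software development process and reduces potential risks and negative effects in the requirements analysis phase. However, software requirement texts are highly dispersed, noisy, and sparse, and existing clustering work is limited to a single type of text and rarely considers the functional semantics of software requirements. Given these characteristics of requirement texts and the limitations of traditional clustering methods, this paper proposes a software requirement clustering algorithm that combines a self-attention mechanism with multi-path pyramid convolution (SA-MPCN&SOM). The method first captures global features through self-attention. It then applies multi-path pyramid convolution, mining requirement text features in depth along paths with different window sizes so that the text span each unit perceives grows by a multiple at every level. Finally, it fuses the features from all paths and clusters them with a self-organizing map (SOM). Experiments on software requirement data show that the proposed method mines and clusters requirement features effectively and outperforms other feature extraction approaches and clustering algorithms.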
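The data flow described in the abstract (self-attention, then several convolution paths with different window sizes, then feature fusion, then SOM clustering) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the window sizes (2, 3, 4), the two pyramid levels, the stride-2 max pooling used to double the perceived span, the 2x2 SOM grid, the random untrained kernels, and every function name below are assumptions made only to show how the pieces fit together.

```python
import numpy as np


def self_attention(x):
    """Scaled dot-product self-attention using x itself as queries, keys and values."""
    scores = x @ x.T / np.sqrt(x.shape[1])               # (T, T) token-to-token scores
    scores = scores - scores.max(axis=1, keepdims=True)  # stabilise the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x                                   # (T, d)


def conv_relu(x, kernel):
    """Zero-padded ('same') 1-D convolution over tokens, then ReLU.

    x: (T, d_in); kernel: (k, d_in, d_out).
    """
    k = kernel.shape[0]
    left = (k - 1) // 2
    padded = np.pad(x, ((left, k - 1 - left), (0, 0)))
    windows = np.stack([padded[i:i + k] for i in range(x.shape[0])])  # (T, k, d_in)
    return np.maximum(np.einsum("tkd,kde->te", windows, kernel), 0.0)


def pyramid_path(x, kernels):
    """One path: convolve, then halve the sequence with stride-2 max pooling,
    so a unit at the next level covers roughly twice as many tokens (assumed design)."""
    for kernel in kernels:
        x = conv_relu(x, kernel)
        if x.shape[0] > 1:
            if x.shape[0] % 2:                 # odd length: repeat the last row
                x = np.vstack([x, x[-1:]])
            x = x.reshape(-1, 2, x.shape[1]).max(axis=1)
    return x.max(axis=0)                       # global max pooling -> (d_out,)


def make_paths(rng, d_in, d_out, windows=(2, 3, 4), levels=2):
    """Random (untrained) kernels: one list of per-level kernels per window size."""
    dims = [d_in] + [d_out] * levels
    return [[rng.normal(scale=0.1, size=(k, dims[i], dims[i + 1]))
             for i in range(levels)] for k in windows]


def encode(x, paths):
    """Self-attention, then every pyramid path, then concatenation (feature fusion)."""
    attended = self_attention(x)
    return np.concatenate([pyramid_path(attended, kernels) for kernels in paths])


def train_som(data, rng, grid=(2, 2), epochs=30, lr0=0.5, sigma0=1.0):
    """Basic online SOM with linearly decaying learning rate and neighbourhood width."""
    coords = np.array([(r, c) for r in range(grid[0]) for c in range(grid[1])], dtype=float)
    n_units = len(coords)
    init = rng.choice(len(data), size=n_units, replace=len(data) < n_units)
    weights = data[init].astype(float)
    for epoch in range(epochs):
        frac = epoch / epochs
        lr, sigma = lr0 * (1 - frac), sigma0 * (1 - frac) + 1e-3
        for v in data[rng.permutation(len(data))]:
            bmu = np.argmin(((weights - v) ** 2).sum(axis=1))       # best-matching unit
            grid_dist2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
            h = np.exp(-grid_dist2 / (2 * sigma ** 2))              # neighbourhood weights
            weights += lr * h[:, None] * (v - weights)
    return weights


def assign_clusters(data, weights):
    """Label each vector with the index of its nearest SOM unit."""
    return np.array([np.argmin(((weights - v) ** 2).sum(axis=1)) for v in data])


# Toy run: random "word embeddings" stand in for real requirement texts.
rng = np.random.default_rng(0)
d_in, d_out = 8, 4
paths = make_paths(rng, d_in, d_out)
docs = [rng.normal(size=(int(rng.integers(5, 12)), d_in)) for _ in range(20)]
features = np.stack([encode(doc, paths) for doc in docs])  # (20, 3 paths * d_out)
som_weights = train_som(features, rng)
labels = assign_clusters(features, som_weights)
print(features.shape, labels)
```

Because the kernels are random and the inputs are synthetic, the resulting cluster labels carry no meaning. In a real system the convolution weights would be learned and the inputs would be embeddings of requirement texts; the sketch only shows how variable-length texts become fixed-length fused vectors that a SOM can cluster.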