Computer Science, 2020, Vol. 47, Issue 3: 48-53. doi: 10.11896/jsjkx.190700146

Special topic: Intelligent Software Engineering


Software Requirements Clustering Algorithm Based on Self-attention Mechanism and Multi-channel Pyramid Convolution

KANG Yan, CUI Guo-rong, LI Hao, YANG Qi-yue, LI Jin-yuan, WANG Pei-yao

  1. (College of Software, Yunnan University, Kunming 650091, China)
  • Received: 2019-07-22 Online: 2020-03-15 Published: 2020-03-30
  • Corresponding author: CUI Guo-rong ([email protected])
  • About the authors: KANG Yan, born in 1972, is a master's supervisor and a member of the China Computer Federation (CCF). Her main research interests include machine learning. CUI Guo-rong, born in 1995, master. His main research interests include natural language processing.
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (61762092, 61762089) and the Open Fund Project of the Yunnan Provincial Key Laboratory of Software Engineering (2017SE204).

Abstract: With the rapid growth in the number of software systems and the increasing diversity of their types, mining the textual features of software requirements and clustering those features has become a major challenge in software engineering. Clustering software requirements texts supports the software development process and reduces the potential risks and negative impacts of the requirements analysis phase. However, software requirements texts are highly dispersed, noisy, and sparse. Existing clustering work is limited to a single type of text and rarely considers the functional semantics of software requirements. In view of these characteristics and the limitations of traditional clustering methods, this paper proposes a software requirements clustering algorithm (SA-MPCN&SOM) that combines the self-attention mechanism with multi-channel pyramid convolution. The method first captures global features through the self-attention mechanism. It then uses multi-channel pyramid convolution to mine requirement text features in depth along channels with different window sizes, so that the span of text perceived grows by multiples. Finally, the multi-channel text features are fused and clustered with a self-organizing map (SOM). Experiments on software requirements data show that the proposed method mines and clusters requirement features well and outperforms other feature extraction methods and clustering algorithms.

Key words: Pyramid convolution, Requirements analysis, Self-attention mechanism, Text clustering, Text features

CLC number: TP309
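
The pipeline summarized in the abstract (self-attention, then several pyramid convolution channels with different window sizes, then feature fusion and SOM clustering) can be illustrated with a small NumPy sketch. This is only a toy under stated assumptions, not the authors' implementation: the abstract does not give the architecture, so the identity Q/K/V projections, the stride-2 max pooling used to widen the perceived span, the window sizes (2, 3, 4), the random untrained kernels, and all function names (`self_attention`, `pyramid_channel`, `extract_features`, `train_som`, `assign_clusters`) are hypothetical choices made for illustration.

```python
import numpy as np

def self_attention(x):
    """Scaled dot-product self-attention; Q = K = V = x (assumed simplification)."""
    scores = x @ x.T / np.sqrt(x.shape[1])           # (T, T) token-to-token scores
    scores -= scores.max(axis=1, keepdims=True)      # stabilize the softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)    # row-wise softmax
    return weights @ x                               # (T, d) globally mixed tokens

def conv1d_same(x, kernel):
    """ReLU(1-D convolution); x: (T, d_in), kernel: (k, d_in, d_out) -> (T, d_out)."""
    k = kernel.shape[0]
    left = (k - 1) // 2
    padded = np.pad(x, ((left, k - 1 - left), (0, 0)))
    out = np.stack([np.einsum('kd,kde->e', padded[t:t + k], kernel)
                    for t in range(x.shape[0])])
    return np.maximum(out, 0.0)

def pyramid_channel(x, kernels):
    """One channel: each level convolves, then halves the length by max pooling,
    so one position at the next level covers roughly twice as many tokens."""
    for kernel in kernels:
        x = conv1d_same(x, kernel)
        if x.shape[0] > 1:
            T = x.shape[0] - x.shape[0] % 2          # drop an odd trailing step
            x = x[:T].reshape(T // 2, 2, -1).max(axis=1)
    return x.max(axis=0)                             # global max pool -> (d_out,)

def extract_features(x, window_sizes=(2, 3, 4), levels=2, d_out=8, seed=0):
    """Fuse the channels by concatenation. Kernels are random, not trained;
    the fixed seed keeps them identical across documents."""
    rng = np.random.default_rng(seed)
    attended = self_attention(x)
    channels = []
    for k in window_sizes:
        kernels = [rng.normal(0.0, 0.1, (k, x.shape[1] if i == 0 else d_out, d_out))
                   for i in range(levels)]
        channels.append(pyramid_channel(attended, kernels))
    return np.concatenate(channels)

def train_som(data, grid=(2, 2), epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Minimal self-organizing map with linearly decaying rate and neighborhood."""
    rng = np.random.default_rng(seed)
    coords = np.array([(i, j) for i in range(grid[0]) for j in range(grid[1])],
                      dtype=float)
    weights = data[rng.integers(0, len(data), len(coords))].astype(float)
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs
        lr, sigma = lr0 * frac, max(sigma0 * frac, 1e-2)
        for v in data[rng.permutation(len(data))]:
            bmu = np.argmin(((weights - v) ** 2).sum(axis=1))   # best-matching unit
            dist2 = ((coords - coords[bmu]) ** 2).sum(axis=1)   # grid distance to BMU
            h = np.exp(-dist2 / (2.0 * sigma ** 2))             # neighborhood weights
            weights += lr * h[:, None] * (v - weights)
    return weights

def assign_clusters(data, weights):
    """Label each sample with the index of its nearest SOM unit."""
    d = ((data[:, None, :] - weights[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

# Toy usage: 6 random "requirement texts", each 12 tokens with 16-dim embeddings.
rng = np.random.default_rng(1)
docs = [rng.normal(size=(12, 16)) for _ in range(6)]
feats = np.stack([extract_features(d) for d in docs])   # shape (6, 24)
labels = assign_clusters(feats, train_som(feats))       # one SOM unit index per text
```

In a real system the token embeddings would come from a trained embedding model and the convolution kernels would be learned. The sketch only shows how the three stages connect and why stacking pooled levels widens the span of text each feature perceives.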