[1] LIU Yuhao, GAO Rong, YAN Lingyu, et al. A Long Text Classification Model Based on Graph Contrastive Learning [J]. Journal of Hubei University of Technology, 2023, (5): 67-74.

A Long Text Classification Model Based on Graph Contrastive Learning

Journal of Hubei University of Technology [ISSN: 1003-4684 / CN: 42-1752/Z]

Volume:
38
Issue:
2023, No. 5
Pages:
67-74
Section:
Publication Date:
2023-10-30

Article Info

Title:
A Long Text Classification Model Based on Graph Contrastive Learning
Article ID:
1003-4684(2023)05-0067-08
Author(s):
LIU Yuhao, GAO Rong, YAN Lingyu, YE Zhiwei
School of Computer Science, Hubei University of Technology, Wuhan 430068, China
Keywords:
text representation; long text classification; graph contrastive learning; negative sampling
CLC Number:
TP391.1
Document Code:
A
Abstract:
Current character-level text classification methods fall short on long texts: the input dimension grows too large for efficient computation, and long-range relationships in lengthy content are hard to capture, so accuracy suffers. To address this, a graph contrastive learning model for long text classification is proposed, based on an adaptive view generator and optimized negative sampling. The long text is first split into paragraphs, and each paragraph is embedded with a BERT-derived model. A graph is then built from the high-level structure of the text, with the paragraph embeddings as its nodes. An adaptive view generator augments the graph, and graph contrastive learning produces the text embedding; in the negative sampling phase of the contrastive objective, PU Learning is introduced to correct negative sampling bias. Finally, the text embedding is classified by two linear layers. Experiments on two Chinese datasets show that the method outperforms mainstream state-of-the-art models.
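The abstract describes a multi-stage pipeline: segment the long text into paragraphs, embed each paragraph, build a graph over the paragraph embeddings, learn a text representation by graph contrastive learning, and classify it with two linear layers. Below is a minimal sketch of the front end; the bert-base-chinese checkpoint, [CLS] pooling, the blank-line segmentation, and the chain-graph topology are all illustrative assumptions, not the authors' exact configuration.

# Sketch of the paragraph-to-graph front end (assumptions noted above).
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")  # assumed encoder
encoder = AutoModel.from_pretrained("bert-base-chinese")

def embed_paragraphs(paragraphs):
    """Encode each paragraph into one vector; each vector becomes a graph node."""
    batch = tokenizer(paragraphs, padding=True, truncation=True,
                      max_length=512, return_tensors="pt")
    with torch.no_grad():
        out = encoder(**batch)
    return out.last_hidden_state[:, 0]               # (P, 768) [CLS] vectors

def build_chain_graph(num_nodes):
    """Link consecutive paragraphs: a stand-in for the 'high-level structure'
    graph, whose exact topology the abstract does not specify."""
    src = torch.arange(num_nodes - 1)
    edges = torch.stack([src, src + 1])
    return torch.cat([edges, edges.flip(0)], dim=1)  # undirected, shape (2, E)

long_text = "第一段……\n\n第二段……\n\n第三段……"
paragraphs = long_text.split("\n\n")                 # naive segmentation
x = embed_paragraphs(paragraphs)                     # node features
edge_index = build_chain_graph(len(paragraphs))      # graph over paragraphs

The graph encoder's pooled output would then feed the "two linear layers" of the abstract, e.g. nn.Sequential(nn.Linear(768, 256), nn.ReLU(), nn.Linear(256, num_classes)), where the hidden width 256 is an assumed value.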
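The "adaptive view generator" implies augmentations that are learned rather than applied at random. One plausible reading, sketched under assumptions (the generator acts by dropping edges, a two-layer MLP scores each edge, and Gumbel-softmax sampling makes the choice differentiable), is:

# Sketch of a learned edge-dropping view generator. An MLP scores each
# edge and Gumbel-softmax sampling makes a near-discrete keep/drop choice,
# so the augmentation adapts during training. Shapes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EdgeDropViewGenerator(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.scorer = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                    nn.Linear(dim, 2))   # logits: [drop, keep]

    def forward(self, x, edge_index, tau=1.0):
        # Score each edge from the features of its two endpoints.
        h = torch.cat([x[edge_index[0]], x[edge_index[1]]], dim=-1)
        keep = F.gumbel_softmax(self.scorer(h), tau=tau, hard=True)[:, 1]
        # Hard 0/1 mask for clarity; to keep gradients flowing into the
        # scorer, training code would instead pass `keep` to the graph
        # encoder as a per-edge weight.
        return edge_index[:, keep.bool()]

Two such generators, producing two views of the same document graph, would supply the positive pair for the contrastive objective.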
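For negative sampling, the abstract only states that PU Learning knowledge corrects negative sampling bias, without giving a formula. A common PU-flavored formulation is a debiased InfoNCE loss: sampled "negatives" are treated as unlabeled, and an assumed class prior tau_plus is used to subtract the expected false-negative mass. The sketch below implements that idea; the paper's exact estimator may differ.

# Sketch of a PU-style debiased InfoNCE loss (tau_plus is an assumed
# class prior; the correction is clipped so the estimate stays positive).
import math
import torch
import torch.nn.functional as F

def debiased_info_nce(z1, z2, temperature=0.5, tau_plus=0.1):
    """z1, z2: (N, d) embeddings of two views; row i of each is a positive pair."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    n = z1.size(0)
    sim = torch.exp(z1 @ z2.t() / temperature)   # cross-view similarities
    pos = sim.diag()                             # matched (positive) pairs
    neg = sim.masked_fill(torch.eye(n, dtype=torch.bool), 0.0).sum(dim=1)
    m = n - 1                                    # unlabeled samples per anchor
    # PU correction: estimated mass of true negatives among the unlabeled.
    ng = (neg - m * tau_plus * pos) / (1.0 - tau_plus)
    ng = ng.clamp(min=m * math.exp(-1.0 / temperature))
    return -torch.log(pos / (pos + ng)).mean()

loss = debiased_info_nce(torch.randn(8, 128), torch.randn(8, 128))  # usage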

Memo

[Received] 2022-06-12
[First author] LIU Yuhao (1996-), male, from Xianning, Hubei; master's student at Hubei University of Technology; research interest: electronic information.
Last Update: 2023-10-26