[1] WU Yu, JIN Huazhong. Image Caption Generation Algorithm Based on Text Hierarchical Structure[J]. Journal of Hubei University of Technology, 2021, (4): 17-21.

 Image Caption Generation Algorithm Based on Text Hierarchical Structure

Journal of Hubei University of Technology [ISSN: 1003-4684 / CN: 42-1752/Z]

Volume:
Issue:
2021, No. 4
Pages:
17-21
Section:
Journal of Hubei University of Technology
Publication date:
2021-08-26

Article Info

Article ID:
1003-4684(2021)04-0017-05
Author(s):
 WU Yu, JIN Huazhong
 School of Computer Science, Hubei University of Technology, Wuhan 430068, Hubei, China
Keywords:
 image caption generation; language model; ordered neurons long short-term memory network (ON-LSTM); text hierarchical structure
CLC number:
TP3-0
Document code:
A
Abstract:
 Existing image caption generation algorithms rely on structurally simple language models in the decoding stage. Their weak decoding expressiveness easily leads to missing semantics. To address this problem, the ordered neurons long short-term memory network (ON-LSTM) is introduced to improve the decoder of an existing model: a two-layer LSTM architecture is constructed that explicitly extracts the hierarchical structure of the caption text and decodes richer semantic features. The model is trained and tested on the MSCOCO dataset, and the experimental results show that the improved algorithm generates captions that better match natural-language usage.
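The abstract gives no equations, so the following is only a minimal sketch of the ordered-neurons gating that ON-LSTM is generally described as using, not the authors' decoder. It assumes the commonly published formulation, in which a cumulative softmax ("cumax") produces monotonic master gates that decide which part of the cell state is kept, overwritten, or blended. All function and parameter names here are hypothetical, and the gate pre-activations are passed in directly rather than computed from learned weights.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cumax(xs):
    """Cumulative sum of softmax: non-decreasing values that end at 1."""
    out, running = [], 0.0
    for p in softmax(xs):
        running += p
        out.append(running)
    return out


def on_lstm_cell_update(c_prev, c_hat, f, i, f_logits, i_logits):
    """One ordered-neurons cell-state update (illustrative only).

    c_prev, c_hat : previous cell state and candidate cell state
    f, i          : ordinary LSTM forget/input gate activations in [0, 1]
    f_logits, i_logits : pre-activations of the master forget/input gates
    """
    master_f = cumax(f_logits)                      # non-decreasing, ends at 1
    master_i = [1.0 - v for v in cumax(i_logits)]   # non-increasing, ends at 0
    c_new = []
    for k in range(len(c_prev)):
        overlap = master_f[k] * master_i[k]
        # Inside the overlap the ordinary gates apply; outside it the
        # master gates alone decide whether to keep or overwrite.
        f_hat = f[k] * overlap + (master_f[k] - overlap)
        i_hat = i[k] * overlap + (master_i[k] - overlap)
        c_new.append(f_hat * c_prev[k] + i_hat * c_hat[k])
    return c_new
```

Because the master gates are monotonic across the neuron index, neurons at one end of the vector are overwritten more readily than those at the other end. This ordering is what lets the cell state encode a hierarchy of shorter-lived and longer-lived information, which matches the abstract's stated aim of extracting the hierarchical structure of the caption text.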

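Memo

[Received] 2021-03-15
[First author] WU Yu (b. 1996), male, from Xianning, Hubei; master's student at Hubei University of Technology; research interest: image caption generation.
[Corresponding author] JIN Huazhong (b. 1973), male, from Honghu, Hubei; associate professor at Hubei University of Technology; research interest: computer vision.
Last Update: 2021-08-27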