[1] ZHOU Xingguang, JIN Huazhong, XU Yudong, et al. An Image Description Generation Model Based on Multi-scale [J]. Journal of Hubei University of Technology, 2020, (2): 61-66.
An Image Description Generation Model Based on Multi-scale Features

Journal of Hubei University of Technology [ISSN: 1003-4684 / CN: 42-1752/Z]

Issue:: 2020, No. 2

Pages:: 61-66

Publication date:: 2020-04-30

Article Info

Title:: An Image Description Generation Model Based on Multi-scale

Article ID:: 1003-4684(2020)02-0061-06

Author(s):: ZHOU Xingguang; JIN Huazhong; XU Yudong; LI Qingqing; HU Man; School of Computer Science, Hubei University of Technology, Wuhan 430068, Hubei, China

Keywords:: image description generation; deep learning; multi-scale; image features

CLC number:: TP3-0

Document code:: A

Abstract:: In existing deep-learning models for image description generation, the encoder extracts a relatively narrow set of image features during the feature-encoding stage, so image information is underused and the generated text describes image content inaccurately and with vague semantics. To address this problem, this paper builds on VGG19 and changes how existing models encode image features: multi-scale image features are extracted and fused to capture richer image information. The model is trained and tested on the MSCOCO dataset. The experimental results show that the proposed model generates more accurate, more complete, and more meaningful image description sentences.
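The abstract does not state how the multi-scale features are extracted or fused, so the following is only a minimal sketch of one common approach: average-pooling a convolutional feature map over several grid sizes and concatenating the results. The function names `average_pool_grid` and `fuse_multiscale`, the pyramid levels `(1, 2, 4)`, and the toy feature map standing in for a VGG19 output are all assumptions made for illustration, not details taken from the paper.

```python
def average_pool_grid(channel, bins):
    """Average-pool one H x W channel into a bins x bins grid (row-major list)."""
    height, width = len(channel), len(channel[0])
    pooled = []
    for i in range(bins):
        # Floor for the start and ceil for the end, so every bin is non-empty.
        r0, r1 = (i * height) // bins, -((-(i + 1) * height) // bins)
        for j in range(bins):
            c0, c1 = (j * width) // bins, -((-(j + 1) * width) // bins)
            cells = [channel[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            pooled.append(sum(cells) / len(cells))
    return pooled


def fuse_multiscale(feature_map, levels=(1, 2, 4)):
    """Concatenate pooled descriptors from every pyramid level into one vector.

    feature_map is a list of channels, each an H x W nested list.
    The levels are hypothetical; the paper's actual scales are not given here.
    """
    fused = []
    for bins in levels:
        for channel in feature_map:
            fused.extend(average_pool_grid(channel, bins))
    return fused


# Toy 2-channel 4 x 4 map standing in for a VGG19 convolutional output.
toy_map = [
    [[float(r * 4 + c) for c in range(4)] for r in range(4)],
    [[1.0] * 4 for _ in range(4)],
]
fused_vector = fuse_multiscale(toy_map)
print(len(fused_vector))  # 2 channels * (1 + 4 + 16) bins = 42
```

Coarse levels summarize the whole image while finer levels keep local detail, so the concatenated vector carries information at several scales. In a captioning model, a vector of this kind would be passed to the language decoder in place of a single-scale feature.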



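Memo

Memo:: [Received] 2019-10-12
[Funding] College Students' Innovation and Entrepreneurship Training Program (S201910500074)
[First author] ZHOU Xingguang (b. 1993), male, from Xiaochang, Hubei; master's student at Hubei University of Technology; research interest: image description generation
[Corresponding author] JIN Huazhong (b. 1973), male, from Honghu, Hubei; associate professor at Hubei University of Technology; research interest: image processing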

Last Update: 2020-05-13