With the development of deep learning, networks can be trained to caption images automatically, given a set of images paired with human-written descriptions. Image captioning (IC) is the task of generating a textual description of a given image, and it is a fundamental task in deep learning. Chinese and English differ considerably as languages, and Chinese semantics are richer and more flexible. However, there is currently relatively little research on Chinese image description generation, and most existing models borrow from English IC techniques.