Chinese image captioning for visual disabilities

Background

Globally, at least 2.2 billion people have a near or distance vision impairment. According to the official website of the China Association for the Blind, China has the largest blind population in the world: 17.31 million people with visual disabilities, roughly one in every 100 people in China.


Fig. Vision disabilities

People with visual impairments face inconvenience in many aspects of daily life. They need social support for their material needs, and their growing spiritual and cultural needs must also be met. Movies can be followed through audio description, and text can be heard through audiobooks. However, these barrier-free cultural products take a long time to produce and are concentrated in more developed cities, so they do not reach everyone who needs them.

Fig. Barrier-free movie

With the development of deep learning, networks can be trained to caption images automatically, given images paired with human-written descriptions. Image captioning (IC) is the task of generating a textual description for a given image, and it is a fundamental task in deep learning. Chinese and English differ considerably, and Chinese semantics are richer and more flexible. However, there is relatively little research on Chinese image captioning at present, and most models borrow from English IC techniques.

Fig. Image from the Flickr8k-cn dataset

Data format & data loader

We provide 10 Chinese books, each with text descriptions of its pictures. The books range from 47 to 256 pages, and the number and style of images also vary from book to book.

Fig. Color ink painting

Fig. Monochrome cartoon

Fig. Color abstract painting

The pdf files are scans of the original books, and the corresponding txt files are manually annotated, translated versions. Pictures in the pdf and their descriptions in the txt appear at roughly the same locations; each description in the txt is marked with an obvious prompt such as "此处有张图" ("there is a picture here").
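As a starting point for a data loader, the prompt lines in a txt file can be located with a simple scan. The sketch below is only an illustration under assumptions: the names `MARKER`, `ImageNote`, and `find_image_notes` are hypothetical, the marker is assumed to appear literally as "此处有张图" (real files may use variants), and the exact position of the description relative to the marker is not specified here, so the function returns only the marker line and its approximate position in the text.

```python
from dataclasses import dataclass
from typing import List

# Assumption: the prompt appears literally in this form; real files may vary.
MARKER = "此处有张图"


@dataclass
class ImageNote:
    line_no: int    # 1-based line number of the marker line in the txt file
    rel_pos: float  # line_no / total lines, a rough position in (0, 1]
    text: str       # the marker line itself, with surrounding whitespace removed


def find_image_notes(txt: str, marker: str = MARKER) -> List[ImageNote]:
    """Return one ImageNote for every line that contains the marker."""
    lines = txt.splitlines()
    total = max(len(lines), 1)  # avoid division by zero on empty input
    notes = []
    for i, line in enumerate(lines, start=1):
        if marker in line:
            notes.append(ImageNote(i, i / total, line.strip()))
    return notes
```

Because pictures and descriptions sit at roughly the same locations, `rel_pos` could be compared with a picture's relative page position in the pdf to propose candidate pairs; any such matching would still need manual checking.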

Basic Problem (70 points): Translation pipeline

TODO
