Counterfactual Cycle-consistent Learning For Instruction Following And Generation In Vision-language Navigation

Hanqing Wang, Wei Liang, Jianbing Shen, Luc van Gool, Wenguan Wang. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022. 41 citations.


Since the rise of vision-language navigation (VLN), great progress has been made in instruction following: building a follower that navigates environments under the guidance of instructions. Far less attention has been paid to the inverse task, instruction generation: learning a speaker to generate grounded descriptions of navigation routes. Existing VLN methods train the speaker independently and often treat it as a data augmentation tool to strengthen the follower, ignoring rich cross-task relations. Here we describe an approach that learns the two tasks simultaneously and exploits their intrinsic correlations to boost the training of each: the follower judges whether a speaker-created instruction correctly explains the original navigation route, and vice versa. Because it does not need aligned instruction-path pairs, this cycle-consistent learning scheme is complementary to task-specific training targets defined on labeled data and can also be applied to unlabeled paths (sampled without paired instructions). A third agent, called the creator, is added to generate counterfactual environments. It greatly changes the current scenes yet leaves unchanged the novel items that are vital for executing the original instructions. More informative training scenes are thus synthesized, and the three agents compose a powerful VLN learning system. Extensive experiments on a standard benchmark show that our approach improves the performance of various follower models and produces accurate navigation instructions.
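
The cycle-consistency idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the speaker and follower below are hypothetical hand-written probability tables rather than neural networks, the names `SPEAKER`, `FOLLOWER`, `path_cycle_loss`, `instruction_cycle_loss`, and `cycle_loss` are invented for illustration, and greedy decoding with a negative log-likelihood score is an assumed stand-in for whatever objective the paper actually uses. The sketch only shows how each agent can score the other's output without aligned instruction-path pairs.

```python
import math

# Hypothetical toy models (assumption): probability tables, not the paper's networks.
# SPEAKER[path][instruction]  = P(instruction | path)
# FOLLOWER[instruction][path] = P(path | instruction)
SPEAKER = {
    "hall->kitchen": {"go to the kitchen": 0.9, "go to the bedroom": 0.1},
    "hall->bedroom": {"go to the kitchen": 0.2, "go to the bedroom": 0.8},
}
FOLLOWER = {
    "go to the kitchen": {"hall->kitchen": 0.7, "hall->bedroom": 0.3},
    "go to the bedroom": {"hall->kitchen": 0.25, "hall->bedroom": 0.75},
}


def greedy(dist):
    """Return the most probable outcome of a discrete distribution."""
    return max(dist, key=dist.get)


def path_cycle_loss(path):
    """path -> speaker -> instruction -> follower.

    The follower judges the speaker: the loss is the negative log-probability
    that the follower recovers the original path from the generated instruction.
    """
    instruction = greedy(SPEAKER[path])
    return -math.log(FOLLOWER[instruction][path])


def instruction_cycle_loss(instruction):
    """instruction -> follower -> path -> speaker (the reverse cycle).

    The speaker judges the follower: the loss is the negative log-probability
    that the speaker reproduces the original instruction from the followed path.
    """
    path = greedy(FOLLOWER[instruction])
    return -math.log(SPEAKER[path][instruction])


def cycle_loss(paths, instructions):
    """Mean cycle loss over unpaired paths and unpaired instructions."""
    losses = [path_cycle_loss(p) for p in paths]
    losses += [instruction_cycle_loss(i) for i in instructions]
    return sum(losses) / len(losses) if losses else 0.0


if __name__ == "__main__":
    # Paths and instructions are passed separately: no pairing is required.
    print(cycle_loss(["hall->kitchen", "hall->bedroom"], ["go to the bedroom"]))
```

In a trainable system the greedy decoding step is not differentiable, so some further mechanism would be needed to pass a learning signal through the generated instruction or path; the sketch leaves that question open.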
