Vision-and-Language Grounding

Speaker: Dr Qi Wu

Abstract: Vision-and-Language Navigation (VLN) is a recently emerged research direction that has attracted considerable attention from the computer vision, natural language processing and robotics communities. We opened up this direction in 2018 by proposing the first benchmarked VLN task and dataset, known as Room-to-Room (R2R). Two years on, many new models and datasets have been proposed, including our recently released REVERIE (Remote Embodied Visual Referring Expression in Real Indoor Environments). In this talk, I will first present the original VLN task and dataset and then discuss some of our recently proposed methods based on it. I will also introduce our REVERIE dataset and show a new general model that can solve all the VLN tasks in a single framework.

 

Speaker: Dr Abhinav Dhall

Abstract: The availability of image and video manipulation software has made it easier to create deepfake videos. In this work, we analyse the effectiveness of human implicit signals for aiding deepfake content analysis. We will present user-centric and content-centric approaches for detecting fake videos based on user gaze, audio and video signals. Furthermore, we will show how to localise the manipulation in time.

About DICTA

The International Conference on Digital Image Computing: Techniques and Applications (DICTA) is the flagship Australian conference on computer vision, image processing, pattern recognition, and related areas. DICTA was established in 1991 as the premier conference of the Australian Pattern Recognition Society (APRS).

Conference Managers

Please contact the team at Conference Design with any questions regarding the conference.
© 2020 Conference Design Pty Ltd