Vision-and-Language Grounding

Speaker: Dr Qi Wu

Abstract: Vision-and-Language Navigation (VLN) is a recently proposed research direction that has attracted a lot of attention from the computer vision, natural language processing and robotics communities. We opened up this direction in 2018 by proposing the first benchmarked VLN task and dataset, known as Room-to-Room (R2R). In the two years since, many new models and datasets have been proposed, including our recently released REVERIE (Remote Embodied Visual Referring Expression in Real Indoor Environments). In this talk, I will first present the original VLN task and dataset and then discuss some of our recently proposed methods based on it. I will also introduce our REVERIE dataset and show a new general model that can solve all the VLN tasks in a single framework.


Speaker: Dr Abhinav Dhall

Abstract: The availability of image and video manipulation software has made it easier to create deepfake videos. In this work, we analyse the effectiveness of human implicit signals for aiding deepfake content analysis. We will present user-centric and content-centric approaches for detecting fake videos based on user gaze, audio and video signals. Furthermore, we will show how to localise the manipulation in time.

Learning under label noise

Tongliang Liu

The University of Sydney


Tongliang Liu is a Lecturer with the School of Computer Science at the University of Sydney and a Visiting Scientist at RIKEN AIP, Japan. His research interests include machine learning and deep learning, with an emphasis on learning under label noise. He regularly serves as a meta-reviewer/reviewer for many conferences and journals. He was a recipient of ARC DECRA 2019 and was named in the Early Achievers Leaderboard by The Australian in 2020.

Lightweight Neural Architectures for Efficient Deep Learning

Chang Xu

The University of Sydney


Chang Xu is a Senior Lecturer and ARC DECRA Fellow at the School of Computer Science, University of Sydney. He has authored or co-authored over 90 CORE A*/A papers in machine learning and computer vision. He has received several paper awards, e.g. the Distinguished Paper Award at IJCAI 2018, and was recognized as a Top Ten Distinguished Senior PC Member at IJCAI 2017.

Context modelling for human action recognition and anticipation

Qiuhong Ke

The University of Melbourne

Human action analysis, including action recognition and anticipation, is very important in a wide range of real-world applications such as visual surveillance, self-driving systems and human-robot interaction. In this talk, I will introduce three of my previous works on context modelling for human action recognition and anticipation. The first work is about spatial-temporal context modelling for action recognition. The second is about global context modelling for early action recognition. The last is about time-conditioned context modelling for action anticipation.


Dr. Qiuhong Ke received her PhD degree from The University of Western Australia in 2018. She is currently a Lecturer at the School of Computing and Information Systems of The University of Melbourne. She was a Postdoctoral Researcher at the Max Planck Institute for Informatics from 2018 to 2019. Her research interests include action recognition and prediction.

Learning and Inference from Complex Data

Recent years have witnessed increasing applications where data exhibits complex dependencies, such as social networks, knowledge graphs, and biological data. Machine learning and inference are vital techniques for making sense of this kind of data.

This workshop will bring two invited talks on the theme of learning and inference over complex data to the DICTA audience. The first talk will cover how natural-language questions can be answered with the help of knowledge graphs, a powerful tool for representing complex relations among entities. The second talk will showcase how complicated statistical dependencies can be discovered from observational data via causal inference, a frontier of machine learning.

This workshop is targeted at professionals who would like to learn about the frontier of machine learning and inference for complex data. The workshop assumes background knowledge in supervised learning, basic statistical concepts and basic deep learning techniques.

Learning for Human-Robot Interaction

Prof. Dana Kulić

Monash University

Robots working in human environments need to learn from and adapt to their users. In this talk, I will describe the challenges of robot learning during human-robot interaction: what should be learned, and how can a user effectively provide feedback and input? I will illustrate these challenges with examples of robots in different roles and applications, including rehabilitation, collaboration in industrial and field settings, and education and entertainment.

Prof. Dana Kulić conducts research in robotics and human-robot interaction (HRI), and develops autonomous systems that can operate in concert with humans, using natural and intuitive interaction strategies while learning from user feedback to improve and individualize operation over long-term use.

Dana Kulić received the combined B.A.Sc. and M.Eng. degrees in electro-mechanical engineering, and the Ph.D. degree in mechanical engineering, from the University of British Columbia, Canada, in 1998 and 2005, respectively. From 2006 to 2009, Dr. Kulić was a JSPS Post-doctoral Fellow and a Project Assistant Professor at the Nakamura-Yamane Laboratory at the University of Tokyo, Japan. In 2009, Dr. Kulić established the Adaptive Systems Laboratory at the University of Waterloo, Canada, conducting research in human-robot interaction, human motion analysis for rehabilitation, and humanoid robotics. Since 2019, Dr. Kulić has been a Professor and Director of Monash Robotics at Monash University, Australia. In 2020, she was awarded an ARC Future Fellowship. Her research interests include robot learning, humanoid robots, human-robot interaction and mechatronics.

Deep Generative Prior

A/Prof. Chen Change Loy

Learning image priors is essential to solving various tasks of image restoration and manipulation, such as image colorization, image inpainting, and super-resolution. In the past decades, many image priors have been proposed to capture the statistics of natural images. Despite their successes, these priors often serve a dedicated purpose. In this talk, I will share our efforts in leveraging Generative Adversarial Networks (GANs) trained on large-scale natural images for richer priors. The deep generative prior (DGP) offers compelling results in restoring missing semantics, e.g., color, patch, resolution, in degraded images. It also enables various image manipulations, including random jittering and image morphing. I will further show the possibility of mining 3D geometric clues from an off-the-shelf 2D GAN that is trained on RGB images only. We found that a pre-trained GAN indeed contains rich 3D knowledge and thus can be used to recover the 3D shape from a single 2D image in an unsupervised manner. The recovered 3D shapes immediately allow high-quality image editing such as relighting and object rotation. Lastly, I will discuss how to overcome the efficiency bottleneck in GAN inversion so that a pre-trained GAN can be used for efficient and effective large-factor image super-resolution.


Chen Change Loy is a Nanyang Associate Professor with the School of Computer Science and Engineering, Nanyang Technological University, Singapore. He is also an Adjunct Associate Professor at the Chinese University of Hong Kong.

He received his PhD (2010) in Computer Science from the Queen Mary University of London. Before joining NTU, he served as a Research Assistant Professor at the MMLab of the Chinese University of Hong Kong, from 2013 to 2018. He is the recipient of the 2019 Nanyang Associate Professorship (Early Career Award) from Nanyang Technological University.

He is recognized by inclusion in the AI 2000 Most Influential Scholars Annual List. His research interests include computer vision and deep learning with a focus on image/video restoration, enhancement, and manipulation. His journal paper on image super-resolution was selected as the 'Most Popular Article' by TPAMI in 2016. It remains one of the top 10 articles to date. He serves as an Associate Editor of IJCV and TPAMI. He also serves/served as an Area Chair of CVPR 2021, CVPR 2019, BMVC 2019, ECCV 2018, and BMVC 2018. He has co-organized several workshops and challenges at major computer vision conferences.

Less is More: Visual Search without Image and User Ownership at the Edge

Professor Sean Gong

Queen Mary University of London

Visual search of unseen objects, as a Zero-Shot Learning problem, assumes the availability of at least a one-shot query image depicting a representation of the search target. This assumption breaks down when only a brief textual (or verbal) description of the search target is available, whilst visual data is either unavailable or removed for privacy protection, e.g. the removal of facial imagery from public video footage. Deep learning has been hugely successful for computer vision tasks in recent years because of the accessibility of shared and centralised large-scale training data pooled globally. However, increasing awareness of privacy concerns and a renewed focus on regional user-ownership of localised data pose new challenges to the conventional wisdom of centralised deep learning on big data, especially for human recognition tasks such as person re-identification.

In this talk, I will highlight challenges and recent progress on deep learning for text-guided visual search without any query visual input, and on decentralised learning from non-shared training data distributed across multiple user locations with independent, non-overlapping multi-domain labels. Both examples are generalisations of Zero-Shot Learning.


Shaogang Gong is Professor of Visual Computation at Queen Mary University of London (since 2001), a pioneer in computer vision research for visual surveillance and person re-identification, and for video analytics technology deployment in law enforcement video forensic analysis.

He served on the Steering Panel of the UK Government Chief Scientific Adviser’s Science Review.

Gong was a research fellow on the EU ESPRIT VIEWS (Visual Interpretation and Evaluation of Wide-area Scenes) project in 1989-1993, the world’s first multinational collaborative computer vision project on visual surveillance in urban environments. He led the EU Security Programme SAMURAI (Suspicious and Abnormal Behaviour Monitoring Using a Network of Cameras for Situation Awareness Enhancement) that pioneered Person Re-Identification (RE-ID) in-the-wild for public infrastructure protection in 2008-2011. Between 2009 and 2013, he led the UK government project on developing a system for Multi-Camera Object Tracking by RE-ID funded by the UK INSTINCT Programme (Innovative Science and Technology in Counter-Terrorism), in collaboration with BAE Systems. He won the 2019 Bruce Dickinson Entrepreneur of the Year Award, the 2019 Queen Mary Innovation Award, and the 2017 Queen Mary Academic Commercial Enterprise Award. A commercial system built on the patents and software from Gong’s research won the 2017 Global Frost & Sullivan Award for Technical Innovation for Law Enforcement Video Forensics Technology, and won the 2017 Aerospace Defence Security Innovation Award given by the UK Security Minister for “revolutionary solution to reviewing CCTV footage”.

Gong has authored and edited seven books, including Person Re-Identification, Visual Analysis of Behaviour, Video Analytics for Business Intelligence, Dynamic Vision: From Images to Face Recognition, and Analysis and Modelling of Faces and Gestures. His recent research has been on Zero-Shot Learning, Transfer Learning, Distributed Learning, Unsupervised and Semi-Supervised Deep Learning, Imbalanced Deep Learning, Deep Reinforcement Learning, Attention Deep Learning, and Human-In-The-Loop Active Learning.

Gong is a Turing Fellow of the Alan Turing Institute of Data Science and Artificial Intelligence, and was a Royal Society Research Fellow. He received his DPhil degree from Keble College, Oxford University in 1989, sponsored by GEC Hirst and the Royal Society. He is a Fellow of IEE (now IET), a Fellow of the British Computer Society, and a Member of the UK Computing Research Committee.

Creating robots that see

Peter Corke

Queensland University of Technology

This talk will define and motivate the problem of robotic vision, the challenges as well as recent progress at the Australian Centre for Robotic Vision. This includes component technologies such as novel cameras, deep learning for computer vision, transfer learning for manipulation, evaluation methodologies, and also end-to-end systems for applications such as logistics, agriculture, environmental remediation and asset inspection.


Peter Corke is a robotics researcher and educator. He is a Distinguished Professor of Robotic Vision at Queensland University of Technology, Director of the ARC Centre of Excellence for Robotic Vision, and Chief Scientist at Dorabot. His research is concerned with enabling robots to see, and with the application of robots to mining, agriculture and environmental monitoring. He created widely used open-source software for teaching and research, wrote the best-selling textbook “Robotics, Vision and Control”, created several MOOCs and the Robot Academy, and has won national and international recognition for teaching, including 2017 Australian University Teacher of the Year. He is a Fellow of the IEEE, the Australian Academy of Technology and Engineering, and the Australian Academy of Science; founding editor of the Journal of Field Robotics; founding multimedia editor of the International Journal of Robotics Research; member of the editorial advisory board of the Springer Tracts in Advanced Robotics series; former editor-in-chief of the IEEE Robotics & Automation Magazine and member of the executive editorial board of the International Journal of Robotics Research; recipient of the Qantas/Rolls-Royce and Australian Engineering Excellence awards; and has held visiting positions at Oxford, the University of Illinois, Carnegie Mellon University and the University of Pennsylvania. He received his undergraduate and Masters degrees in electrical engineering, and his PhD, from the University of Melbourne.

Deep learning hands on tutorial for general audience

This workshop is an introduction to how deep learning works and how you can create a neural network using TensorFlow v2. We start with the basics of deep learning, including what a neural network is, how information passes through the network, and how the network learns from data through the automated process of gradient descent. You will build, train and evaluate your very own network using a cloud GPU (Google Colab).
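The learning loop described above can be sketched in a few lines. The example below is an illustration only, not the workshop's actual notebook: a single-neuron model fitted to a toy line by repeated gradient descent steps on the mean squared error, written in plain NumPy so every step is visible (the workshop itself uses TensorFlow v2).

```python
import numpy as np

# Toy data drawn from the line y = 3x + 1
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=50)
y = 3.0 * x + 1.0

# A one-neuron "network": prediction = w*x + b
w, b = 0.0, 0.0
lr = 0.1  # learning rate

for _ in range(500):
    err = (w * x + b) - y            # forward pass and prediction error
    grad_w = 2.0 * np.mean(err * x)  # gradient of MSE w.r.t. w
    grad_b = 2.0 * np.mean(err)      # gradient of MSE w.r.t. b
    w -= lr * grad_w                 # gradient descent update
    b -= lr * grad_b

print(w, b)  # converges close to 3 and 1
```

In TensorFlow v2 the gradients come from automatic differentiation (e.g. `tf.GradientTape`) rather than being derived by hand, but the update rule is the same.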

We then proceed to look at image data and how we can train a convolutional neural network to classify images. You will extend your knowledge from the first part to design, train and evaluate this convolutional neural network.
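To make the building block concrete, the sketch below (again an illustration in plain NumPy, not the workshop material) shows the sliding-window operation a convolutional layer computes: a small kernel is swept over the image, and this hand-coded vertical-edge kernel responds only at the dark-to-bright boundary.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid 2-D cross-correlation, the core operation of a convolutional layer."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Dot product of the kernel with one image patch
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# 4x4 image with a vertical dark-to-bright edge, and a vertical-edge kernel
image = np.array([[0, 0, 1, 1]] * 4, dtype=float)
kernel = np.array([[1, -1],
                   [1, -1]], dtype=float)
response = conv2d(image, kernel)
print(response)  # non-zero only in the column where the edge sits
```

A real convolutional layer learns many such kernels from data rather than hand-coding them; training them is the same gradient descent process as in the first part of the workshop.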

This workshop is targeted at professionals with some data science knowledge who would like a theoretical and hands-on introduction to deep learning. The workshop assumes background knowledge in Python programming and an understanding of basic data science concepts such as training vs. testing data, overfitting, and regression. A high-level understanding of calculus and matrix operations is beneficial but not essential.



The International Conference on Digital Image Computing: Techniques and Applications (DICTA) is the flagship Australian conference on computer vision, image processing, pattern recognition, and related areas. DICTA was established in 1991 as the premier conference of the Australian Pattern Recognition Society (APRS).
