The goal of this project is to develop lightweight video models that exploit motion to reduce the amount of information they need to process.

Objectives

- To develop video models that take sparse point tracks as input instead of a sequence of images, improving data and computational efficiency: faster training and testing, and less training data required.
- To automatically learn to identify the most discriminative key points to track.
- To create an end-to-end system that identifies the points to track, tracks them, and performs video understanding.

Description

Video understanding is a fundamental ability of intelligent systems, from robots to drones to other AI systems. However, current video understanding builds on technology designed for images, and since videos can be orders of magnitude larger than images, video technology becomes expensive, both computationally and in terms of the amount of data needed for training. Moreover, most video technology ignores explicit motion information altogether. During this PhD, we will leverage advances in fast motion estimation to make video understanding both better and faster. We will then learn which regions of a scene motion estimation should focus on, and finally put everything together in an end-to-end, fast and efficient video understanding model. This fundamental technology will be useful for a range of concrete applications, including action recognition, anomaly detection of actions, gait identification, and detection and identification of moving objects, among others. (A small code sketch at the end of this page illustrates the track-based input idea.)

Research theme

Sensor Signal Processing

Principal supervisor

Dr Laura Sevilla-Lara
University of Edinburgh, Edinburgh Centre for Robotics
lsevilla@exseed.ed.ac.uk
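For illustration only, below is a minimal sketch of the kind of track-based classifier the first objective points towards. Everything in it, the PyTorch framing, the module and tensor names, the shapes and the pooling choice, is an assumption made for this example rather than part of the project.

```python
# Illustrative sketch only: a toy classifier over sparse point tracks rather
# than raw frames. All names, shapes and hyperparameters are assumptions made
# for this example; they are not part of the project description.
import torch
import torch.nn as nn


class TrackClassifier(nn.Module):
    """Classify an action from num_points tracked points observed over num_frames frames."""

    def __init__(self, num_points=32, num_frames=16, hidden=128, num_classes=10):
        super().__init__()
        # Each track is a num_frames x 2 sequence of (x, y) coordinates; embed it
        # with a small MLP, then pool over points and classify.
        self.track_encoder = nn.Sequential(
            nn.Linear(num_frames * 2, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
        )
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, tracks):
        # tracks: (batch, num_points, num_frames, 2) -- far smaller than a
        # (batch, num_frames, 3, H, W) stack of RGB frames.
        b, k, t, _ = tracks.shape
        per_track = self.track_encoder(tracks.reshape(b, k, t * 2))  # (b, k, hidden)
        pooled = per_track.mean(dim=1)                               # (b, hidden)
        return self.head(pooled)                                     # (b, num_classes)


if __name__ == "__main__":
    model = TrackClassifier()
    dummy_tracks = torch.randn(4, 32, 16, 2)  # 4 clips, 32 points, 16 frames each
    logits = model(dummy_tracks)
    print(logits.shape)  # torch.Size([4, 10])
```

The point of the sketch is the input size: 32 tracks over 16 frames amount to about a thousand numbers per clip, whereas 16 RGB frames at even modest resolution run into the millions, which is where the data and compute savings of track-based models would come from.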