The goal of this project is to develop lightweight video models that exploit motion to reduce the amount of information they need to process.

Objectives

- To develop video models that take sparse point tracks as input instead of a sequence of images, improving data and computational efficiency: faster training and testing, and less training data required.
- To automatically learn to identify the most discriminative key points to track.
- To create an end-to-end system that identifies the points to track, tracks them, and performs video understanding.

Description

Video understanding is a fundamental ability of intelligent systems, from robots to drones to other AI systems. However, current video understanding builds on technology designed for images, and since videos can be orders of magnitude larger than images, video technology becomes expensive, both computationally and in terms of the amount of data needed for training. Moreover, most video technology ignores explicit motion information altogether. During this PhD, we will leverage advances in fast motion estimation to make video understanding both better and faster. We will then learn which regions of a scene motion estimation should focus on, and finally put everything together in an end-to-end, fast and efficient video understanding model. This fundamental technology will be useful for a range of concrete applications, including action recognition, anomaly detection of actions, gait identification, and detection and identification of moving objects, among others. (A small code sketch at the end of this page illustrates the track-based input idea.)

Research theme

Sensor Signal Processing

Principal supervisor

Dr Laura Sevilla-Lara
University of Edinburgh, Edinburgh Centre for Robotics
lsevilla@exseed.ed.ac.uk
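For illustration only, below is a minimal sketch of the kind of track-based classifier the first objective points towards. Everything in it, the PyTorch framing, the module and tensor names, the shapes and the pooling choice, is an assumption made for this example rather than part of the project.

```python
# Illustrative sketch only: a toy classifier over sparse point tracks rather
# than raw frames. All names, shapes and hyperparameters are assumptions made
# for this example; they are not part of the project description.
import torch
import torch.nn as nn


class TrackClassifier(nn.Module):
    """Classify an action from num_points tracked points observed over num_frames frames."""

    def __init__(self, num_points=32, num_frames=16, hidden=128, num_classes=10):
        super().__init__()
        # Each track is a num_frames x 2 sequence of (x, y) coordinates; embed it
        # with a small MLP, then pool over points and classify.
        self.track_encoder = nn.Sequential(
            nn.Linear(num_frames * 2, hidden),
            nn.ReLU(),
            nn.Linear(hidden, hidden),
        )
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, tracks):
        # tracks: (batch, num_points, num_frames, 2) -- far smaller than a
        # (batch, num_frames, 3, H, W) stack of RGB frames.
        b, k, t, _ = tracks.shape
        per_track = self.track_encoder(tracks.reshape(b, k, t * 2))  # (b, k, hidden)
        pooled = per_track.mean(dim=1)                               # (b, hidden)
        return self.head(pooled)                                     # (b, num_classes)


if __name__ == "__main__":
    model = TrackClassifier()
    dummy_tracks = torch.randn(4, 32, 16, 2)  # 4 clips, 32 points, 16 frames each
    logits = model(dummy_tracks)
    print(logits.shape)  # torch.Size([4, 10])
```

The point of the sketch is the input size: 32 tracks over 16 frames amount to about a thousand numbers per clip, whereas 16 RGB frames at even modest resolution run into the millions, which is where the data and compute savings of track-based models would come from.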