Walterio Mayol-Cuevas Research

Friday, 30 September 2022

Sensor-level computer vision with pixel processor arrays for agile robots, Science Robotics 2022

Piotr Dudek, Thomas Richardson, Laurie Bose, Stephen Carey, Jianing Chen, Colin Greatwood, Yanan Liu, Walterio Mayol-Cuevas, Sensor-level computer vision with pixel processor arrays for agile robotsScience Robotics, 2022. [PDF@SR] (author's PDF is available upon request).

Vision processing for control of agile autonomous robots requires low-latency computation, within a limited power and space budget. This is challenging for conventional computing hardware. Parallel processor arrays (PPAs) are a new class of vision sensor devices that exploit advances in semiconductor technology, embedding a processor within each pixel of the image sensor array. Sensed pixel data are processed on the focal plane, and only a small amount of relevant information is transmitted out of the vision sensor. This tight integration of sensing, processing, and memory within a massively parallel computing architecture leads to an interesting trade-off between high performance, low latency, low power, low cost, and versatility in a machine vision system. Here, we review the history of image sensing and processing hardware from the perspective of in-pixel computing and outline the key features of a state-of-the-art smart camera system based on a PPA device, through the description of the SCAMP-5 system. We describe several robotic applications for agile ground and aerial vehicles, demonstrating PPA sensing functionalities including high-speed odometry, target tracking, obstacle detection, and avoidance. In the conclusions, we provide some insight and perspective on the future development of PPA devices, including their application and benefits within agile, robust, adaptable, and lightweight robotics.

Monday, 15 August 2022

On-Sensor Binarized CNN Inference with Dynamic Model Swapping in Pixel Processor Arrays, Frontiers in Neuroscience 2022

Liu Y, Bose L, Fan R, Dudek P and Mayol-Cuevas W. On-sensor binarized CNN inference with dynamic model swapping in pixel processor arrays. Frontiers in Neuroscience. Neuromorphic Engineering. 15 August 2022. [PDF]

In this research, we explore novel ways to develop and deploy visual inference on the image plane including dynamically swapping CNN modules on-the-fly to fit larger models in a Pixel Processor Array.

Sunday, 8 May 2022

Rebellion and Disobedience as Useful Tools in Human-Robot Interaction Research - The Handheld Robotics Case, RaD in AI workshop AAMAS 2022

W.W. Mayol-Cuevas Rebellion and Disobedience as Useful Tools in Human-Robot Interaction Research - The Handheld Robotics Case. CoRR abs/2205.03968. Rebellion and Disobedience in AI Workshop AAMAS (2022) [PDF]

This position paper* argues on the utility of rebellion and disobedience (RaD) in human-robot interaction (HRI). In general, we see two main opportunities in the use of controlled and well designed rebellion and disobedience: i) illuminate insight into the effectiveness of the collaboration (or lack of) and ii) prevent mistakes and correct user actions when in the user's own interest. Through the use of a close interaction modality, that of handheld robots, we discuss use cases for utility of rebellion and disobedience that can be applicable to other instances of HRI.

*Underpinning work in collaboration with Janis Stolzenwald and Austin Gregg-Smith

Monday, 4 October 2021

Yao Lu, Walterio W. Mayol-Cuevas: The Object at Hand: Automated Editing for Mixed Reality Video Guidance from Hand-Object Interactions. IEEE ISMAR 2021. [PDF]

Monday, 29 March 2021

Look-ahead fixations during visuomotor behavior: Evidence from assembling a camping tent. Journal of Vision 2021

Brian Sullivan, Casimir J. H. Ludwig, Dima Damen, Walterio Mayol-Cuevas, Iain D. Gilchrist; Look-ahead fixations during visuomotor behavior: Evidence from assembling a camping tent. Journal of Vision 2021;21(3):13. [PDF]

Monday, 1 March 2021

Weighted Node Mapping and Localisation on a Pixel Processor Array. IEEE ICRA 2021

H. Castillo Elizalde, Y. Liu, L. Bose and W. Mayol-Cuevas. Weighted Node Mapping and Localisation on a Pixel Processor Array. IEEE ICRA 2021. [PDF]

Wednesday, 30 September 2020

Fully Embedding Fast Convolutional Networks on Pixel Processor Arrays. ECCV 2020

L Bose, P Dudek, J Chen, S J. Carey, W Mayol-Cuevas. Fully Embedding Fast Convolutional Networks on Pixel Processor Arrays. ECCV. 2020 [PDF]

Visual Odometry Using Pixel Processor Arrays for Unmanned Aerial Systems in GPS Denied Environments. Frontiers in Robotics and AI, 2020

A. McConville, L. Bose, R. Clarke, W. Mayol-Cuevas, J. Chen, C. Greatwood, S. Carey,P. Dudek, T. Richardson. Visual Odometry Using Pixel Processor Arrays for Unmanned Aerial Systems in GPS Denied Environments. Frontiers in Robotics and AI, Vol 7, pp126. September, 2020. [Paper]

Friday, 31 July 2020

Geometric Affordance Perception: Leveraging Deep 3D Saliency With the Interaction Tensor. Frontiers in Neurorobotics 2020

E. Ruiz and W. Mayol-Cuevas, Geometric Affordance Perception: Leveraging Deep 3D Saliency With the Interaction Tensor. Frontiers in Neurorobotics, July, 2020. [Paper] [Code]

Tuesday, 30 June 2020

Action Modifiers: Learning From Adverbs in Instructional Videos, CVPR 2020

Hazel Doughty, Ivan Laptev, Walterio Mayol-Cuevas, Dima Damen; Action Modifiers: Learning From Adverbs in Instructional Videos. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 868-878. 2020. [PDF]

Tuesday, 29 October 2019

A camera that CNNs: Towards Embedded Neural Networks on Pixel Processor Arrays, (ICCV 2019 Oral)

We present a convolutional neural network implementation for pixel processor array (PPA) sensors. PPA hardware consists of a fine-grained array of general-purpose processing elements, each capable of light capture, data storage, program execution, and communication with neighboring elements. This allows images to be stored and manipulated directly at the point of light capture, rather than having to transfer images to external processing hardware. Our CNN approach divides this array up into 4x4 blocks of processing elements, essentially trading-off image resolution for increased local memory capacity per 4x4 "pixel". We implement parallel operations for image addition, subtraction and bit-shifting images in this 4x4 block format. Using these components we formulate how to perform ternary weight convolutions upon these images, compactly store results of such convolutions, perform max-pooling, and transfer the resulting sub-sampled data to an attached micro-controller. We train ternary weight filter CNNs for digit recognition and a simple tracking task, and demonstrate inference of these networks upon the SCAMP5 PPA system. This work represents a first step towards embedding neural network processing capability directly onto the focal plane of a sensor.

A Camera That CNNs: Towards Embedded Neural Networks on Pixel Processor Arrays, Laurie Bose, Jianing Chen, Stephen J. Carey, Piotr Dudek and Walterio Mayol-Cuevas. International Conference on Computer Vision ICCV 2019. Seoul, Korea, 2019. Arxiv [PDF].

Sunday, 29 September 2019

Rebellion and Obedience: The Effects of Intention Prediction in Cooperative Handheld Robots (IROS 2019)

Within this work, we explore intention inference for user actions in the context of a handheld robot
setup. Handheld robots share the shape and properties of handheld tools while being able to process task information and aid manipulation. Here, we propose an intention prediction model to enhance cooperative task solving. The model derives intention from the user’s gaze pattern which is captured using a robot-mounted remote eye tracker. The proposed model yields real-time capabilities and reliable accuracy up to 1.5 s prior to predicted actions being executed. We assess the model in an assisted pick and place task and show how the robot’s intention obedience or rebellion affects the cooperation with the robot.

Janis Stolzenwald and Walterio Mayol-Cuevas, "Rebellion and Obedience: The Effects of Intention Prediction in Cooperative Handheld Robots", to be published at 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, 2019. [PDF]

Thursday, 30 May 2019

The Pros and Cons: Rank-aware Temporal Attention for Skill Determination in Long Videos CVPR 2019

HAZEL DOUGHTY, WALTERIO MAYOL-CUEVAS AND DIMA DAMEN IN IEEE CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION (CVPR), 2019.

We present a new model to determine relative skill from long videos, through learnable temporal attention modules. Previous work formulates skill determination for common tasks as a ranking problem, yet measures skill from randomly sampled video segments. We believe this approach to be limiting since many parts of the video are irrelevant to assessing skill, and there may be variability in the skill exhibited throughout a video. Assessing skill from a single section may not reflect the overall skill in the video. We propose to train rank-specific temporal attention modules, learned with only video-level supervision, using a novel rank-aware loss function. In addition to attending to task-relevant video parts, our proposed loss jointly trains two attention modules to separately attend to video parts which are indicative of higher (pros) and lower (cons) skills. We evaluate the approach on the public EPIC-Skills dataset and additionally collect and annotate a larger dataset for skill determination with five previously unexplored tasks. Our method outperforms previous approaches and classic softmax attention on both datasets by over 4% pairwise accuracy, and as much as 12% on individual tasks. We also demonstrate our model's ability to attend to rank-aware parts of the video.

[Webpage] [arXiv] [Dataset]

Wednesday, 19 September 2018

Who's Better, Who's Best: Skill Determination in Video using Deep Ranking. (CVPR 2018 Spotlight)

Who's Better, Who's Best: Skill Determination in Video using Deep Ranking. CVPR PDF | arxiv
CVPR Spotlight talk

Published at CVPR2018.

Here we explore a new problem for egocentric perception. The study of how to determine who is better at a task in an unsupervised manner. We use Deep methods to rank videos in both medical tasks -for which there is expert scoring for validation- and also for the more complex daily living tasks -for which the scoring is ambiguous-.

Saturday, 1 September 2018

I Can See Your Aim: Estimating User Attention From Gaze For Handheld Robot Collaboration (IROS2018)

Extended version of our demo video for our IEEE/RSJ IROS 2018 submission [Arxiv]
Janis Stolzenwald, Walterio W. Mayol-Cuevas. "I Can See Your Aim: Estimating User Attention From Gaze For Handheld Robot Collaboration". IROS 2018.

Also see www.handheldrobotics.org for extended information on this project line.

Tuesday, 1 May 2018

Where can i do this? Geometric Affordances from a Single Example with the Interaction Tensor (ICRA2018)

IEEE ICRA2018:
Eduardo Ruiz, Walterio W. Mayol-Cuevas: Where can i do this? Geometric Affordances from a Single Example with the Interaction Tensor. ICRA 2018.

This paper introduces and evaluates a new tensor field representation to express the geometric affordance of one object over another. Evaluations also include crowdsourcing comparisons that confirm the validity of our affordance proposals, which agree on average 84% of the time with human judgments, which is 20-40% better than the baseline methods.

Tuesday, 24 October 2017

Visual Odometry for Pixel Processor Arrays (ICCV2017 Spotlight)

"Visual Odometry for Pixel Processor Arrays [PDF]", Laurie Bose, Jianing Chen, Stephen J. Carey, Piotr Dudek and Walterio Mayol-Cuevas. International Conference on Computer Vision (ICCV), Venice, Italy, 2017.
ICCV2017 Spotlight Talk

We present an approach of estimating constrained egomotion on a Pixel Processor Array (PPA). These devices embed processing and data storage capability into the pixels of the image sensor, allowing for fast and low power parallel computation directly on the image plane. Rather than the standard visual pipeline whereby whole images are transferred to an external general processing unit, our approach performs all computation upon the PPA itself, with the camera’s estimated motion as the only information output. Our approach estimates 3D rotation and a 1D scale less estimate of translation. We introduce methods of image scaling, rotation and alignment which are performed solely upon the PPA itself and form the basis for conducting motion estimation. We demonstrate the algorithms on a SCAMP-5 vision chip, achieving frame rates higher than 1000Hz and at about 2W power consumption. project website www.project-agile.org

Friday, 20 October 2017

Trespassing the Boundaries: Labeling Temporal Bounds for Object Interactions in Egocentric Video (ICCV2017)

Trespassing the Boundaries: Labeling Temporal Bounds for Object Interactions in Egocentric Video. D Moltisanti, M Wray, W Mayol-Cuevas, D Damen. International Conference on Computer Vision (ICCV), 2017. pdf

Monday, 25 September 2017

Visual target tracking with a Pixel Processor Array (IROS2017)

"Tracking control of a UAV with a parallel visual processor", C. Greatwood, L. Bose, T. Richardson, W. Mayol-Cuevas, J. Chen, S.J. Carey and P. Dudek, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2017 [PDF].

This paper presents a vision-based control strategy for tracking a ground target using a novel vision sensor featuring a processor for each pixel element. This enables computer vision tasks to be carried out directly on the focal plane in a highly efficient manner rather than using a separate general purpose computer. The strategy enables a small, agile quadrotor Unmanned Air Vehicle (UAV) to track the target from close range using minimal computational effort and with low power consumption. The tracking algorithm exploits the parallel nature of the visual sensor, enabling high rate image processing ahead of any communication bottleneck with the UAV controller. With the vision chip carrying out the most intense visual information processing, it is computationally trivial to compute all of the controls for tracking onboard. This work is directed toward visual agile robots that are power efficient and that ferry only useful data around the information and control pathways.
Visit the project website at:
www.project-agile.org

Wednesday, 8 March 2017

Automated capture and delivery of assistive task guidance with an eyewear computer: The GlaciAR system (Augmented Human 2017)

An approach that allows both automatic capture and delivery of mixed reality guidance fully onboard Google Glass.

From the paper:

Teesid Leelasawassuk, Dima Damen, Walterio Mayol-Cuevas, Automated capture and delivery of assistive task guidance with an eyewear computer: The GlaciAR system. Augmented Human 2017.

https://arxiv.org/abs/1701.02586

In this paper we describe and evaluate an assistive mixed reality system that aims to augment users in tasks by combining automated and unsupervised information collection with minimally invasive video guides. The result is a fully self-contained system that we call GlaciAR (Glass-enabled Contextual Interactions for Augmented Reality). It operates by extracting contextual interactions from observing users performing actions. GlaciAR is able to i) automatically determine moments of relevance based on a head motion attention model, ii) automatically produce video guidance information, iii) trigger these guides based on an object detection method, iv) learn without supervision from observing multiple users and v) operate fully on-board a current eyewear computer (Google Glass). We describe the components of GlaciAR together with user evaluations on three tasks. We see this work as a first step toward scaling up the notoriously difficult authoring problem in guidance systems and an exploration of enhancing user natural abilities via minimally invasive visual cues.

Wednesday, 4 November 2015

Investigating Spatial Guidance for a Cooperative Handheld Robot

How to provide feedback information to guide users of a handheld robotic device when performing a spatial exploration task? We are investigating this issue via various feedback methods for communicating a 5 degree of freedom target to a user. We compare against a non-robotic handheld wand and use various display alternatives including a stereoscopic VR display (Oculus Rift), a monocular see-through AR display and a 2D screen, as well as simple robot arm gesturing. Our results indicate a significant improvement when using the handheld robot and some interesting conclusions on the effect of various popular display technologies. More to come on a follow up publication... Watch this space and also handheldrobotics.org

Improving MAV Control by Predicting Aerodynamic Effects of Obstacles

Building on our previous work, we demonstrate how it is possible to improve flight control of a MAV that experiences aerodynamic disturbances caused by objects on its path. Predictions based on low resolution depth images taken at a distance are incorporated into the flight control loop on the throttle channel as this is adjusted to target undisrupted level flight. We demonstrate that a statistically significant improvement (p << 0:001) is possible for some common obstacles such as boxes and steps, compared to using conventional feedback-only control. Our approach and results are encouraging toward more autonomous MAV exploration strategies.

John Bartholomew, Andrew Calway, Walterio Mayol-Cuevas, Improving MAV Control by Predicting Aerodynamic Effects of Obstacles. IEEE/RSJ International Conference on Intelligent Robots and Systems. September 2015. [PDF]

Beautiful vs Useful Maps

We are developing methods for compressing maps in a more intelligent way than only looking at geometry. Specifically we look at the relocalisation criteria as a more useful metric. Our methods in the above slide show that they are much better at relocalisation than traditional compression approaches. Some of them also preserve geometry well.

Luis Contreras Toledo, Walterio Mayol-Cuevas, Trajectory-Driven Point Cloud Compression Techniques for Visual SLAM. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). October 2015. [PDF]

Tuesday, 1 September 2015

Estimating Visual Attention from a Head Mounted IMU

How to obtain attention for an eyewear computer without a gaze tracker? We developed this method for Google Glass.
This paper concerns with the evaluation of methods for the estimation of both temporal and spatial visual attention using a head-worn inertial measurement unit (IMU). Aimed at tasks where there is a wearer-object interaction, we estimate the when and the where the wearer is interested in. We evaluate various methods on a new egocentric dataset from 8 volunteers and compare our results with those achievable with a commercial gaze tracker used as ground-truth. Our approach is primarily geared for sensor-minimal EyeWear computing.

Teesid Leelasawassuk, Dima Damen, Walterio W Mayol-Cuevas, Estimating Visual Attention from a Head Mounted IMU. ISWC '15 Proceedings of the 2015 ACM International Symposium on Wearable Computers. ISBN 978-1-4503-3578-2, pp. 147–150. September 2015. PDF, 2045 Kbytes. External information

Wednesday, 6 May 2015

Accurate Photometric and Geometric Error Minimisation with Inverse Depth & What to landmark?

Two papers related to RGBD mapping from a nice collaboration with Daniel Gutierrez and Josechu Guerrero from the University of Zaragoza. Both to be presented at ICRA 2015. One of them nominated for Awards:

D. Gutiérrez-Gómez, W. Mayol-Cuevas, J.J. Guerrero. "Inverse Depth for Accurate Photometric and Geometric Error Minimisation in RGB-D Dense Visual Odometry", In IEEE International Conference on Robotics and Automation (ICRA), 2015. Nominated for Best Robotic Vision Paper Award. [pdf][video][code available]

D. Gutiérrez-Gómez, W. Mayol-Cuevas, J.J. Guerrero. "What Should I Landmark? Entropy of Normals in Depth Juts for Place Recognition in Changing Environments Using RGB-D Data", In IEEE International Conference on Robotics and Automation (ICRA), 2015.[pdf]

Monday, 1 September 2014

Cognitive Handheld Robots

We have been working for 2.5 years on prototypes (and since 2006 in the concept!) on what we think is a new extended type of robot. Handheld robots have the shape of tools and are intended to have cognition and action while cooperating with people. This video is from our first prototype back in November 2013. We are also offering details of its construction and 3D CAD models at www.handheldrobotics.org . We are currently developing a new prototype and more on this soon. Austin Gregg-Smith is sponsored by the James Dyson Foundation.

Austin Gregg-Smith and Walterio Mayol. The Design and Evaluation of a Cooperative Handheld Robot. IEEE International Conference on Robotics and Automation (ICRA). Seattle, Washington, USA. May 25th-30th, 2015. [PDF] Nominated for Best Cognitive Robotics Paper Award.

Saturday, 9 August 2014

Recognition and Reconstruction of Transparent Objects for Augmented Reality

Dealing with real transparent objects for AR is challenging due to their lack of texture and visual features as well as the drastic changes in appearance as the background, illumination and camera pose change. In this work, we explore the use of a learning approach for classifying transparent objects from multiple images with the aim of both discovering such objects and building a 3D reconstruction to support convincing augmentations. We extract, classify and group small image patches using a fast graph-based segmentation and employ a probabilistic formulation for aggregating spatially consistent glass regions. We demonstrate our approach via analysis of the performance of glass region detection and example 3D reconstructions that allow virtual objects to interact with them.

From our paper: Alan Francisco Torres-Gomez, Walterio Mayol-Cuevas, Recognition and reconstruction of transparent objects for Augmented Reality. ISMAR 2014. PDF available at here.

Friday, 1 August 2014

Discovering Task Relevant Objects, their Usage and Providing Video Guides from Multi-User Egocentric Video

Using a wearable gaze tracking setup, we have developed a fully unsupervised method for the discovery of task relevant objects (TRO) and how these objects have been used. A TRO is an object, or part of an object that a person interacts with during a task. We combine visual appearance, motion and user attention to discover TROs. Both static objects such as a coffee machine as well as movable ones such as a cup are discovered. We introduce the term Mode of Interaction to refer to the different ways in which TROs are used. Say, a cup can be lifted, washed, or poured into. When harvesting interactions with the same object from multiple operators, common modes of interaction can be found. We also developed an online fully unsupervised prototype for automatically extracting video guides of how the objects are used. The method automatically selects suitable video segments that indicate how others have used that object before.

Damen, Dima and Leelasawassuk, Teesid and Haines, Osian and Calway, Andrew and Mayol-Cuevas, Walterio (2014). You-Do, I-Learn: Discovering Task Relevant Objects and their Modes of Interaction from Multi-User Egocentric Video. British Machine Vision Conference (BMVC), Nottingham, UK. [pdf]
Damen, Dima and Haines, Osian and Leelasawassuk, Teesid and Calway, Andrew and Mayol-Cuevas, Walterio (2014). Multi-user egocentric Online System for Unsupervised Assistance on Object Usage. ECCV Workshop on Assistive Computer Vision and Robotics (ACVR), Zurich, Switzerland. [preprint]

We also have released the dataset available on this project webpage.

Wednesday, 23 April 2014

Learning to Predict Obstacle Aerodynamics from Depth Images for Micro Air Vehicles

This work develops a method to anticipate the aerodynamic ground effects a Micro Air Vehicle will have when going above different obstacles. This works by learning a mapping from depth images to the acceleration experienced from flying above a variety of objects. Computing full 3D aerodynamic effects using air flow simulation onboard a MAV is currently unfeasible. This work uses the alternative approach of learning the visual appearance of objects that produce a given effect which therefore turns the problem into one of regression rather than raw computation.
With the current "easiness" with which 3D maps are now possible to be constructed, this work in a way aims to enhance maps with information that is beyond purely geometric. We have also closed the control loop so we correct for the deviation in anticipation, but that is for another paper.

John Bartholomew, Andrew Calway and Walterio Mayol-Cuevas, Learning to Predict Obstacle Aerodynamics from Depth Images for Micro Air Vehicles by , IEEE ICRA 2014. [PDF]

Saturday, 12 October 2013

3D from Looking: Using Wearable Gaze Tracking for Hands-Free and Feedback-Free Object Modelling

How to get a 3D model of something one looks at without any clicks or even feedback to the user?

From our paper T. Leelasawassuk and W.W. Mayol-Cuevas. 3D from Looking: Using Wearable Gaze Tracking for Hands-Free and Feedback-Free Object Modelling. ISWC 2013.

Monday, 1 July 2013

Real-time 3D simultaneous localization and map-building for a dynamic walking humanoid robot

Image from our Advanced Robotics Journal paper

This work is with our Samsung Electronics colleagues where our real-time slam system has been used to provide Samsung's RoboRay with mapping capability. RoboRay is one of the most advanced humanoid robots and uses dynamic walking which imposes higher demands on the slam system as there is more agility and motion overall. More details in the paper:

Real-time 3D simultaneous localization and map-building for a dynamic walking humanoid robot
S Yoon, S Hyung, M Lee, KS Roh, SH Ahn, A Gee, P Bunnun, A Calway, & WW Mayol-Cuevas,
Advanced Robotics, published online on May 1st, 2013.

Friday, 28 June 2013

Real-Time Continuous 6D Relocalisation for Depth Cameras

In this work we present our fast (50Hz) relocalisation method based on simple visual descriptors plus a 3D geometrical test for a system performing visual 6-D relocalisation at every single frame and in real time. Continuous relocalisation is useful in re-exploration of scenes or for loop-closure in earnest. Our experiments suggest the feasibility of this novel approach that benefits from depth camera data, with a relocalisation performance of 73% while running on a single core onboard a moving platform over trajectory segments of about 120m. The system also reduces in 95% the memory footprint compared to a system using conventional SIFT-like descriptors.

J. Martinez-Carranza, Walterio Mayol-Cuevas. Real-Time Continuous 6D Relocalisation for Depth Cameras. Workshop on Multi VIew Geometry in RObotics (MVIGRO), in conjunction with Robotics Science and Systems RSS. Berlin, Germany. June, 2013. PDF
J. Martinez Carranza, A. Calway, W. Mayol-Cuevas, Enhancing 6D visual relocalisation with depth cameras. International Conference on Intelligent Robots and Systems IROS. November 2013.

Wednesday, 1 May 2013

Mapping and Auto Retrieval for Micro Air Vehicles (MAV) using RGBD Visual Odometry

These videos show work we been doing with our partners at Blue Bear for onboard visual mapping for MAVs. These are based on visual odometry mapping for working over large areas and build maps onboard the MAV using an asus xtion-pro RGBD camera mounted on the vehicle. One of the videos show autoretrieval of the vehicle where a human pilot first flies the vehicle through the space and then the map is used for relocalisation using the map built on the way back. The other video is on a nuclear reactor installation. These are works we been doing for a while on uses of our methods for industrial inspection.

Monday, 25 March 2013

March 2013 New Robotics MSc. I redesigned our joint MSc in Robotics which is now aimed to support students from various backgrounds in Engineering, Physics and Maths. I am looking forward to supervise MSc project here. Have a look at the programme here. The application deadline is August 31st.

Thursday, 1 November 2012

Integrating 3D Object Detection, Modelling and Tracking on a Mobile Phone

https://www.youtube.com/watch?v=t4JvAkKh730

Click image for video

This is from a paper at ISMAR 2012 where we combine three things on a mobile phone: 1) in-situ interactive object modelling, 2) real-time tracking based on the wireframe model of the object and 3) re-detection of a lost object using our textureless object detector. It works on a Nokia 900, but look for another post for the Android App.

Pished Bunnun, Dima Damen, Andrew Calway, Walterio Mayol-Cuevas, Integrating 3D Object Detection, Modelling and Tracking on a Mobile Phone. International Symposium on Mixed and Augmented Reality (ISMAR). November 2012. PDF.

Sunday, 7 October 2012

Predicting Micro Air Vehicle Landing Behaviour from Visual Texture

How does a UAV can decide where is best to land and what to expect if landing on a particular material? Here we develop a framework to predict the landing behaviour of a Micro Air Vehicle (MAV) from the visual appearance of the landing surface. We approach this problem by learning a mapping from visual texture observed from an onboard camera to the landing behaviour on a set of sample materials. In this case we exemplify our framework by predicting the yaw angle of the MAV after landing.

John Bartholomew, Andrew Calway, Walterio Mayol-Cuevas, Predicting Micro Air Vehicle Landing Behaviour from Visual Texture . IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). October 2012. PDF.

Egocentric Real-time Workspace Monitoring using an RGB-D Camera

We developed an integrated system for personal workspace monitoring based around an RGB-D sensor. The approach is egocentric, facilitating full flexibility, and operates in real-time, providing object detection and recognition, and 3D trajectory estimation whilst the user undertakes tasks in the workspace. A prototype on-body system developed in the context of work-flow analysis for industrial manipulation and assembly tasks is described. We evaluated on two tasks with multiple users, and results indicate that the method is effective, giving good accuracy performance.

Dima Damen, Andrew Gee, Walterio Mayol-Cuevas and Andrew Calway. Egocentric Real-time Workspace Monitoring using an RGB-D Camera, IROS 2012. PDF, Video available from here.

Monday, 3 September 2012

6D RGBD relocalisation

Relocalisation is about finding out where the camera is in translation and rotation (6D) when it visits the space for the first time after a map has been created, or if gets lost during tracking due to occlusion. This is also known as the "kidnapped robot" problem in Robotics and appears frequently in SLAM at the loopclosing stage. Here, we develop a fast relocalisation method for RGB-D cameras that operates in workplaces where low texture and some occlusion can be present. Videos are available here.

Andrew P. Gee, Walterio Mayol-Cuevas, 6D Relocalisation for RGBD Cameras Using Synthetic View Regression. Proceedings of the British Machine Vision Conference (BMVC). September 2012. PDF.