Guest Lecture

From 2D Similarities to 3D Correspondences

Guest Lecture by Leonhard Sommer
March 24th, 2026
16:00
Robot Learning Lab; Seminar Room, Georges-Köhler-Allee 080, 79110 Freiburg
Self-supervised visual representations often capture semantic part similarities: their features respond consistently to wings, legs, or wheels. However, these features encode 2D similarities rather than 3D correspondences. To reason in a 3D world, an ideal representation would provide 3D correspondences even under occlusion. We introduce a new evaluation of 3D correspondences directly in camera space and decompose the problem into 9D pose estimation and a morphable prior in canonical object space. Finally, we outline how large-scale canonicalization of object-centric videos could scale this approach to hundreds of object categories.