🚀 Exciting Opportunity for ETH Zürich Computer Science Master Students to do an Industry Thesis! 🚀
We offer two Master's thesis projects in collaboration with the Computational Robotics Lab at ETH Zurich.
Thesis Topic 1: Dual-Arm Parcel Manipulation
Motivation
Recent breakthroughs in robot learning (Diffusion Policy, ACT/ALOHA, RDT-1B) have unlocked dexterous single-arm manipulation from a handful of demonstrations. Yet coordinated dual-arm manipulation of real-world objects remains largely unsolved. Logistics is the ideal proving ground: parcels vary wildly in size, mass, and shape, and many are simply too large or heavy for a single arm. This thesis develops a learning-based controller that enables two robot arms (UR5/UR20) to jointly grasp and manipulate parcels under real warehouse conditions.
Goal
Develop a learning-based controller that enables two robot arms to jointly pick and manipulate parcels of varying sizes and shapes.
Approach
Teleoperate two arms (UR5 / UR20) to collect demonstrations.
Train a policy using imitation learning or diffusion policies.
Transfer the learned policy from simulation to the real robots.
Evaluate robustness across parcel shapes, weight distributions, and clutter.
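At its core, the imitation-learning step above fits a policy to teleoperated (observation, action) pairs. As a rough illustration of that idea, here is a minimal behavior-cloning sketch in PyTorch; the observation/action dimensions (6 joints per arm, a 6-DoF parcel pose) and the MLP architecture are illustrative assumptions, not the actual project setup, which would use richer observations and e.g. a diffusion-based action head.

```python
import torch
import torch.nn as nn

# Assumed, illustrative dimensions: 6 joints per UR arm, two arms,
# plus a 6-DoF parcel pose in the observation.
OBS_DIM = 6 * 2 + 6   # joint angles of both arms + parcel pose
ACT_DIM = 6 * 2       # joint-velocity targets for both arms


class DualArmPolicy(nn.Module):
    """A minimal MLP policy trained by behavior cloning on demonstrations."""

    def __init__(self, obs_dim=OBS_DIM, act_dim=ACT_DIM, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, act_dim),
        )

    def forward(self, obs):
        return self.net(obs)


def train_bc(policy, demo_obs, demo_act, epochs=100, lr=1e-3):
    """Fit the policy to (observation, action) pairs with an MSE loss."""
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    loss = loss_fn(policy(demo_obs), demo_act)
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(policy(demo_obs), demo_act)
        loss.backward()
        opt.step()
    return loss.item()


if __name__ == "__main__":
    torch.manual_seed(0)
    # Stand-in "demonstrations": random tensors in place of real teleop data.
    obs = torch.randn(256, OBS_DIM)
    act = torch.randn(256, ACT_DIM)
    policy = DualArmPolicy()
    final_loss = train_bc(policy, obs, act)
    print(f"final BC loss: {final_loss:.4f}")
```

A diffusion policy replaces the single regression head with a denoising model over action sequences, but the data pipeline (teleoperated demos in, supervised training out) is the same.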
Required Skills
Solid programming skills in Python
Familiarity with machine learning and deep learning frameworks (e.g. PyTorch)
Basic knowledge of robotics, control, or computer vision
Strong problem-solving skills and interest in experimental research
Skills developed during the Thesis
During the thesis, you will gain hands-on experience with modern learning-based robotics systems, including:
Robot manipulation and control with industrial robot arms (UR5 / UR20)
Imitation learning and diffusion policies for robot control
Teleoperation systems and demonstration data collection
Training and evaluating deep learning models for manipulation
Sim-to-real transfer and robot learning pipelines
Experimental evaluation on real robotic hardware
Thesis Topic 2: Physical Reasoning for Picking in Clutter
Motivation
Robots operating in real-world environments must manipulate objects that are rarely arranged neatly. In logistics centers and warehouses, objects often form cluttered piles where items support, block, or constrain each other. Removing the wrong object can destabilize the scene or cause failures. Humans naturally reason about these physical relationships and identify which objects can be safely removed. Enabling robots to develop a similar capability remains an open challenge in robotics. This project explores how robots can learn to infer physical relationships between objects in cluttered scenes and use this understanding to select better manipulation actions.
Goal
The goal of this thesis is to develop a learning-based system that enables a robot to select which object to pick next in a cluttered environment and how to perform the pick safely. Given observations from RGB-D cameras, the system should infer the physical relationships between objects in the scene and predict which item can be removed without destabilizing the pile. The project will focus on training a neural model that predicts the best next manipulation action by reasoning about support, blocking, and contact relationships between objects. The resulting system will output a policy that selects both a target object and an appropriate grasp strategy, maximizing the probability of successful manipulation while maintaining the stability of the remaining scene. The approach may combine structured scene representations, such as object-level graphs, with modern machine learning techniques including graph neural networks or transformer-based models, trained using a combination of large-scale simulation and real-world robot data.
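To make the object-level-graph idea concrete, here is a tiny rule-based sketch: the scene is a directed graph where an edge (a, b) means "a supports b", and an object is considered safe to pick when nothing still in the scene rests on it. This hand-written rule is only a stand-in for the learned relational model described above; in the thesis, a graph neural network or transformer trained in simulation would score these relationships from RGB-D input instead.

```python
# Hypothetical object-level scene graph: an edge (a, b) means "a supports b",
# so removing a could destabilize b.

def safe_to_pick(objects, supports):
    """Return the objects that can be removed without destabilizing the pile,
    i.e. those that currently support nothing."""
    supported = {a for a, b in supports}
    return sorted(o for o in objects if o not in supported)


def unstack_order(objects, supports):
    """Greedy pick order: repeatedly remove any currently safe object."""
    remaining, edges, order = set(objects), set(supports), []
    while remaining:
        safe = safe_to_pick(remaining, edges)
        if not safe:
            raise ValueError("cyclic support relations; no safe pick exists")
        pick = safe[0]
        remaining.discard(pick)
        edges = {(a, b) for a, b in edges if pick not in (a, b)}
        order.append(pick)
    return order


if __name__ == "__main__":
    # A simple three-box stack: A on the bottom, B in the middle, C on top.
    objects = ["A", "B", "C"]
    supports = [("A", "B"), ("B", "C")]
    print(safe_to_pick(objects, supports))   # only the top box is safe
    print(unstack_order(objects, supports))  # unstack top-down
```

The learned system would output a calibrated score per (object, grasp) pair rather than a binary safe/unsafe label, but the underlying graph reasoning is the same.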
Required Skills
Solid programming skills in Python
Familiarity with machine learning and deep learning frameworks (e.g. PyTorch)
Basic knowledge of robotics, computer vision, or 3D perception
Strong analytical and problem-solving skills
Experience with graph neural networks, 3D perception (point clouds, RGB-D), physics simulation, or robotic manipulation is beneficial but not strictly required.
Skills developed during the Thesis
During the thesis, you will gain hands-on experience in learning-based physical reasoning for robotics, including:
3D perception and scene understanding using RGB-D cameras and point clouds
Learning structured scene representations, such as object graphs and relational models
Graph neural networks and modern scene representation learning methods
Physics-based simulation and data generation for manipulation tasks
Designing policies for robotic manipulation in cluttered environments
Evaluating learning-based methods on realistic robotic tasks
Only open to current students in ETH Zurich's Computer Science Master's programme. Apply for more details!