Affordance-based grasping models trained from scratch can struggle to pick up new objects after 60 minutes of training (left). With pre-training from visual tasks, our affordance-based grasping models can easily generalize to picking up new objects with less than 10 minutes of training, even when evaluated with different hardware (middle: suction, right: gripper). |