Abstract:
In this paper, a multichannel vision-based approach for intelligent robotic grasping in cluttered environments is proposed. To address this problem, experiments are conducted on an open-source synthetic dataset consisting of color and depth images. The proposed approach uses a modified Cascade Mask R-CNN-based instance segmentation model to detect and classify the objects in the scene, achieving a high mAP@[0.5:0.95] score of 93.85% on the customized Meta-Grasp dataset. The captured depth data is processed over the segmented mask regions to approximate each object's position in a 3D coordinate system. The affinity between the objects' edge profiles is then calculated to estimate the spatial relations among the segmented objects in 3D space. This information is used to generate a priority order for object pickup such that objects in the top layer are picked first, followed by those in the underlying layers. The methodology was evaluated on a 6-class subset of the dataset with varying numbers of objects and placement configurations. The object classes and their mask positions were obtained successfully, and the computed priority order never scheduled a lower-layer object before an object lying above it. Overall, the proposed two-stage decision pipeline demonstrates its effectiveness in generating the pickup priority and sorting order for a multi-object scene and has potential applications in fully automated factories and smart manufacturing.
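To make the layering constraint concrete, the sketch below shows one way the pickup priority described above can be derived once pairwise "rests-on" relations between segmented objects are known; the paper's edge-profile affinity computation is not reproduced here, and the function name `pickup_order`, the `above` edge list, and the object ids are illustrative placeholders. Under the assumption that the depth-based affinity step yields such pairwise relations, a topological sort (Kahn's algorithm) guarantees that no lower-layer object is scheduled before an object lying on top of it.

```python
from collections import defaultdict, deque

def pickup_order(objects, above):
    """Derive a pickup priority order from pairwise layering relations.

    objects: iterable of object ids (e.g., one per segmentation mask).
    above:   list of (upper, lower) pairs meaning `upper` lies on top of
             `lower` and must therefore be picked first.

    Returns a list of object ids in which no lower-layer object appears
    before an object resting on it (Kahn's topological sort).
    """
    deps = defaultdict(set)    # lower -> set of uppers still covering it
    blocks = defaultdict(set)  # upper -> set of lowers it covers
    for upper, lower in above:
        deps[lower].add(upper)
        blocks[upper].add(lower)

    # Objects with nothing on top of them form the current top layer.
    ready = deque(obj for obj in objects if not deps[obj])
    order = []
    while ready:
        obj = ready.popleft()
        order.append(obj)
        for lower in blocks[obj]:
            deps[lower].discard(obj)
            if not deps[lower]:
                ready.append(lower)  # lower object is now exposed

    if len(order) != len(list(objects)):
        raise ValueError("cyclic layering relations; check affinity estimates")
    return order

# Illustrative scene: obj2 rests on obj0 and obj1; obj3 rests on obj2.
print(pickup_order(["obj0", "obj1", "obj2", "obj3"],
                   [("obj2", "obj0"), ("obj2", "obj1"), ("obj3", "obj2")]))
# -> ['obj3', 'obj2', 'obj0', 'obj1'] (relative order of the fully
#    exposed obj0 and obj1 is unconstrained and may vary)
```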