1. Place all the data in the following structure; there are 130 scenes in total.
|-- google1000
|   |-- scenes/
|   |   |-- scene_0000/
|   |   |-- scene_0001/
|   |   |-- ... ...
|   |   `-- scene_1499/
|   |
|   |-- models/
|   |   |-- 000/             # Details of the model of object 0
|   |   |-- ... ...
|   |   `-- 1029/
|   |-- models_down/
|   |   |-- 000.ply          # Downsampled model point cloud of object 0
|   |   |-- ... ...
|   |   `-- 1029.ply
|   |
|   |-- camera.json          # Camera intrinsics
|   |
|   `-- graspnet_labels_v3/  # Correspondence labels between scenes and the object models in ./models_down/; initially empty, to be filled by further labelling
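As a quick sanity check on the layout above, here is a minimal sketch (directory names taken from the tree; `root` is wherever google1000 is unpacked, and the helper names are our own) that enumerates the scene folders and the downsampled model ids:

```python
from pathlib import Path

def list_scenes(root):
    """Sorted scene directory names under <root>/scenes (scene_0000, ...)."""
    return sorted(p.name for p in (Path(root) / "scenes").iterdir() if p.is_dir())

def list_model_ids(root):
    """Sorted object ids taken from the .ply files in <root>/models_down."""
    return sorted(p.stem for p in (Path(root) / "models_down").glob("*.ply"))
```

Comparing `len(list_scenes(root))` and `len(list_model_ids(root))` against the counts documented above is an easy way to catch an incomplete download.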
2. Detailed structure of each scene (taking scene_0000 as an example)
|-- scene_0000
|   |-- blender_proc
|   |   |-- rgb
|   |   |   `-- 0000.jpg to 0049.jpg  # 50 RGB images
|   |   |-- depth
|   |   |   `-- 0000.png to 0049.png  # 50 depth images
|   |   |-- label
|   |   |   `-- 0000.png to 0049.png  # 50 object mask images; 0 is background, 1-88 denote the objects (1-indexed), same format as the YCB-Video dataset
|   |   |-- annotations
|   |   |   `-- 0000.xml to 0049.xml  # 50 object 6D pose annotations; 'pos_in_world' and 'ori_in_world' denote position and orientation w.r.t. the camera frame
|   |   |-- meta
|   |   |   `-- 0000.mat to 0049.mat  # 50 object 6D pose annotations, same format as the YCB-Video dataset for easy usage
|   |   |-- camK.npy                  # Camera intrinsics, shape: 3x3, [[f_x, 0, c_x], [0, f_y, c_y], [0, 0, 1]]
|   |   |-- camera_poses.npy          # 50 camera poses with respect to the first frame, shape: 50x(4x4)
|   |   `-- cam0_wrt_table.npy        # First frame's camera pose with respect to the table, shape: 4x4
|   |
|   `-- object_id_list.txt            # ids of the objects appearing in this scene
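The three .npy files compose naturally: camera_poses.npy stores each frame's pose relative to frame 0, and cam0_wrt_table.npy maps frame 0 into the table frame, so multiplying the two 4x4 matrices yields any frame's pose w.r.t. the table. A sketch under those assumptions (numpy; `blender_proc_dir` points at one scene's blender_proc folder, and the function names are our own):

```python
import numpy as np

def camera_pose_wrt_table(blender_proc_dir, frame):
    """4x4 pose of the camera at `frame` w.r.t. the table.

    Chains cam0_wrt_table (frame 0 -> table) with camera_poses[frame]
    (frame -> frame 0) by matrix multiplication.
    """
    poses = np.load(f"{blender_proc_dir}/camera_poses.npy")   # (num_frames, 4, 4)
    cam0 = np.load(f"{blender_proc_dir}/cam0_wrt_table.npy")  # (4, 4)
    return cam0 @ poses[frame]

def project(camK, pts_cam):
    """Project Nx3 camera-frame points through the 3x3 intrinsics in camK.npy."""
    uv = (camK @ pts_cam.T).T      # homogeneous image coordinates, Nx3
    return uv[:, :2] / uv[:, 2:3]  # divide by depth -> Nx2 pixel coordinates
```

For example, transforming an object pose from the annotations into the table frame is the same composition: multiply the camera-to-table matrix returned above by the object's 4x4 pose in the camera frame.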
Copyright © 2021 Machine Vision and Intelligence Group, Shanghai Jiao Tong University.