SeManticInDustry (S.MID): A Dataset for LiDAR Semantic Segmentation

Scenes

We aim to set up a large-scale benchmark for the LiDAR semantic segmentation task in robotic applications. Using an industrial robot, we collected a total of 38904 frames of hybrid-solid LiDAR data in different substations and annotated 25 categories.

Example of labeled cumulative point clouds in S.MID.

A scene demo of labeled cumulative point clouds in our novel dataset S.MID.

Semantic LiDAR dataset comparison. Frames ^† are listed as train/val/test. Number of classes ^‡ is given for single-frame evaluation, with the total number of annotated classes in brackets.
Datasets	Frames ^†	LiDAR	Types of LiDAR	Classes ^‡	Applications
nuScenes	28130/6019/6008	Velodyne-HDL-32E	Mechanical Spinning LiDAR	16 (32)	Autonomous Vehicle
SemanticKITTI	19130/4071/20351	Velodyne-HDL-64E	Mechanical Spinning LiDAR	19 (34)	Autonomous Vehicle
S.MID	13101/5000/20803	Livox Mid-360	Hybrid-Solid LiDAR	14 (25)	Industrial Robot

Sensors

The figures below show the sensors mounted on the industrial robot used to collect S.MID. Please note that only the data collected by the Livox Mid-360 and the corresponding labels are released with SMID_beta_v1_2 and SMID_v1_3.

The Livox Mid-360 suits industrial robots that perform scene understanding tasks, since its non-repetitive scanning mode covers a broader range of the scene. However, this mode is a double-edged sword: it also makes the point cloud relatively sparse and randomly distributed. The single-frame hybrid-solid LiDAR segmentation task therefore poses additional challenges for network design. (More details can be found in our paper.)

Label distributions

For the single-frame segmentation task, we merge the annotated labels into 14 classes (knife switch, main transformer, arrester, voltage transformer, busbar, switch, current transformer, scaffold, support column, road, other-ground, fence, fire shelter, wall). Imbalanced class counts are common in substation scenes. As with autonomous driving datasets, methods evaluated on S.MID must therefore cope with an imbalanced class distribution.
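One common way to handle such imbalance is to weight the training loss by inverse class frequency. The sketch below is only an illustration of that general heuristic, not the method used in our paper. It assumes the labels have already been mapped to integer ids 0 to 13 for the 14 classes; the function name and that id range are hypothetical.

```python
import numpy as np

NUM_CLASSES = 14  # single-frame evaluation classes listed above


def inverse_frequency_weights(labels, num_classes=NUM_CLASSES):
    """Return one loss weight per class, larger for rarer classes.

    ASSUMPTION: `labels` holds integer class ids in [0, num_classes).
    Classes with no points get weight 0 so they do not affect the loss.
    """
    labels = np.asarray(labels, dtype=np.int64)
    if labels.size and (labels.min() < 0 or labels.max() >= num_classes):
        raise ValueError("labels must lie in [0, num_classes)")

    # Number of points observed for each class.
    counts = np.bincount(labels, minlength=num_classes).astype(np.float64)

    weights = np.zeros(num_classes, dtype=np.float64)
    seen = counts > 0
    # "Balanced" heuristic: total / (classes present * points in class).
    weights[seen] = counts.sum() / (seen.sum() * counts[seen])
    return weights
```

In practice the counts would be accumulated over the whole training split and not over a single frame, and the resulting weights passed to a weighted cross-entropy loss.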

A diagram of the number of points in each class in S.MID.

Folder structure and format

Similar to SemanticKITTI, for each scan XXXXXX.bin in the hybrid folder we provide a file XXXXXX.label in the labels folder that contains a label for each point in binary format. Each label is a 32-bit unsigned integer (aka uint32_t), where the lower 16 bits correspond to the label. Visit our project page to learn more about how to load the dataset.
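As a minimal sketch of the label format described above, the following NumPy snippet reads a .label file and masks out the lower 16 bits. The function names are hypothetical. The layout of the .bin scans is not specified here, so `load_points` simply assumes the SemanticKITTI convention of four float32 values per point (x, y, z, intensity); please check the project page before relying on it.

```python
import numpy as np


def load_labels(label_path):
    """Read a XXXXXX.label file and split each uint32 into two halves."""
    raw = np.fromfile(label_path, dtype=np.uint32)
    semantic = raw & 0xFFFF  # lower 16 bits: the label
    upper = raw >> 16        # upper 16 bits: an instance id in SemanticKITTI;
                             # their use in S.MID is not stated here
    return semantic, upper


def load_points(bin_path, num_channels=4):
    """Read a XXXXXX.bin scan as an (N, num_channels) float32 array.

    ASSUMPTION: float32 values with 4 channels per point, as in
    SemanticKITTI. Adjust num_channels if the real layout differs.
    """
    flat = np.fromfile(bin_path, dtype=np.float32)
    return flat.reshape(-1, num_channels)
```

Under that assumption, a scan and its labels should have the same number of points, so checking `points.shape[0] == semantic.shape[0]` after loading is a cheap sanity test.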

SMID_beta_v1_3