Powering the Future of Robotics with Machine Perception

The Machine Perception Stack: Core Capabilities of LGMs
Large Geospatial Models (LGMs) unify multiple downstream tasks critical to Machine Perception, whether delivered through a single unified foundation model or through task-specific AI models. These tasks form the building blocks of intelligent robotic systems:
Visual Relocalization
Precisely pinpointing a robot’s position and orientation in both mapped and unmapped environments.
Depth Estimation (Metric Scaling)
Measuring physical distances and environmental dimensions from 2D images (see the sketch after this list).
3D Reconstruction
Generating detailed 3D models of objects and locations from monocular or binocular views.
Semantic 3D Visual Segmentation
Identifying and categorizing objects and their structures in 3D space.
These capabilities enable robots to navigate, manipulate objects, and interact with complex environments autonomously.
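To make the depth-estimation building block concrete, here is a minimal sketch that runs an off-the-shelf monocular depth model (MiDaS, loaded via torch.hub) on a single image and unprojects the result into a 3D point cloud with a pinhole camera model. The image path, camera intrinsics, and scale factor are illustrative assumptions: MiDaS predicts relative inverse depth, so recovering true metric depth requires exactly the kind of metric-scale ground truth discussed in this article.

```python
import cv2
import numpy as np
import torch

# Load a small monocular depth model and its matching preprocessing
# transform from torch.hub (weights download on first run).
midas = torch.hub.load("intel-isl/MiDaS", "MiDaS_small")
midas.eval()
midas_transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
transform = midas_transforms.small_transform

# "scene.jpg" is a placeholder input image.
img = cv2.cvtColor(cv2.imread("scene.jpg"), cv2.COLOR_BGR2RGB)

with torch.no_grad():
    pred = midas(transform(img))
    # Resize the prediction back to the input resolution.
    pred = torch.nn.functional.interpolate(
        pred.unsqueeze(1), size=img.shape[:2],
        mode="bicubic", align_corners=False,
    ).squeeze()

# MiDaS outputs *relative inverse* depth. A real metric-scaling step
# would fit scale/shift against known reference distances (e.g., dataset
# ground truth); a single made-up factor stands in for that step here.
inv_depth = pred.cpu().numpy()
depth = 1.0 / np.clip(inv_depth, 1e-6, None)
depth *= 0.1  # ASSUMED scale factor, not a calibrated value

# Unproject to a 3D point cloud with a pinhole model. The intrinsics
# (fx, fy, cx, cy) are assumed, not read from a calibrated camera.
h, w = depth.shape
fx = fy = 0.8 * w
cx, cy = w / 2.0, h / 2.0
u, v = np.meshgrid(np.arange(w), np.arange(h))
x = (u - cx) * depth / fx
y = (v - cy) * depth / fy
points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
print(f"Recovered {points.shape[0]} 3D points from one 2D image")
```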
The OVER 3D Maps Dataset: A Game-Changer for LGM Training
Training LGMs requires vast, high-quality datasets, much as Large Language Models (LLMs) rely on internet-scale text. LGMs, however, demand multi-view images, depth maps, and metric scaling data, which are notoriously scarce (the sketch after the list below illustrates what one such training sample contains). OVER’s 3D Maps Dataset redefines the standard with unparalleled scale and diversity:
- 150,000 3D Maps of diverse indoor and outdoor real-world locations
- 75 Million+ images with associated depth and metric scaling data
- Orders of magnitude larger than datasets powering current Vision Foundation Models
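OVER has not published a record schema, so the sketch below is purely hypothetical: a container for one multi-view training sample, showing the kinds of fields (RGB views, per-view depth maps, camera intrinsics and poses, and a metric scale factor) that the dataset description above implies. All names are illustrative.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class MultiViewSample:
    """Hypothetical LGM training sample; field names are illustrative,
    not OVER's actual schema."""
    images: np.ndarray      # (V, H, W, 3) uint8 RGB, V views of one scene
    depths: np.ndarray      # (V, H, W) float32 per-view depth, in meters
    intrinsics: np.ndarray  # (V, 3, 3) pinhole camera matrices
    poses: np.ndarray       # (V, 4, 4) camera-to-world transforms
    metric_scale: float     # factor mapping map units to meters
    scene_type: str         # "indoor", "outdoor", or "mixed"

    def validate(self) -> None:
        """Basic shape-consistency checks across views."""
        v = self.images.shape[0]
        assert self.depths.shape[0] == v, "one depth map per view"
        assert self.intrinsics.shape == (v, 3, 3), "one K matrix per view"
        assert self.poses.shape == (v, 4, 4), "one pose per view"
```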
To put this into perspective, the table below compares OVER’s dataset with popular datasets used for machine perception, highlighting its dominance in scale and real-world applicability:
| Dataset | Scene Count | Image Count | Scene Type | Resolution |
|---|---|---|---|---|
| Over the Reality | 150,000 | ≈75M | Mixed | 1920x1080 / 3840x2880 |
| 7Scenes | 7 | ≈20K-30K | Indoor | 640x480 |
| Replica | 18 | N/A | Indoor | ≈1080p |
| TUM RGBD | 39 | N/A | Indoor | 640x480 |
| Matterport3D | 90 | ≈194K | Indoor | 1280x1024 |
| HyperSim | 461 | ≈74K | Indoor | 1024x768 |
| Dynamic Replica | 524 | N/A | Indoor | ≈1080p |
| ScanNet++ | 1,006 | 280K (DSLR) | Indoor | 1920x1440 |
| ScanNet | 1,513 | N/A | Indoor | 1296x968 |
| ARKitScenes | 1,661 | N/A | Indoor | 1920x1440 |
| Virtual Kitti | 5 | N/A | Outdoor | 1242x375 |
| KITTI360 | 11 | N/A | Outdoor | 1408x376 |
| Spring | 47 | N/A | Outdoor | 1920x1080 |
| MegaDepth | 196 | ≈130K | Outdoor | Varies (≈1024x768) |
| ACID | 13,047 | N/A | Outdoor | 1080p |
| MIPNERF360 | 9 | ≈1-2K | Mixed | 1008x756 |
| Tanks&Temples | 21 | ≈3.5K | Mixed | 1920x1080 |
| ETH3D | 25 | N/A | Mixed | 4048x3032 / 752x480 |
| PointOdyssey | 195 | N/A | Mixed | 960x540 |
| TartanAir | 1,037 | N/A | Mixed | 640x480 |
| DL3DV-10K | 10,510 | N/A | Mixed | 3840x2160 (960p/480p) |
| RealEstate10K | 74,766 | N/A | Mixed | 720p-1080p (videos) |
| BlendedMVS | 113 | N/A | Mixed | 768x576 to 1600x1200 |

Beyond these headline numbers, the full comparison also tracks per-dataset support for real data, synthetic data, static scenes, dynamic scenes, camera data, point clouds, depth data, metric scaling, mesh data, LiDAR data, semantic labels, instance masks, and optical flow.
Bridging the Sim2Real Gap with Real-World Data
While robots are often trained in simulated environments for navigation and manipulation, the Sim2Real Gap remains a critical challenge: simulations struggle to replicate the complexity and variability of real-world settings, leading to failures in deployment.
OVER’s 3D Maps Dataset tackles this issue head-on by enabling:
State-of-the-Art 3D World Generators
Creating hyper-realistic synthetic environments grounded in real-world data.
Real-World Training Environments
Training robots directly on 150,000 high-fidelity 3D reconstructions of real locations (see the sketch at the end of this section).
By combining real-world scale with synthetic flexibility, OVER’s dataset empowers developers to build robust, adaptable robotic systems ready for real-world challenges.
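To make the second point concrete, here is a minimal sketch of what training in a real-world reconstruction can look like, using PyBullet as the physics backend. The mesh path and robot model are placeholders, and OVER’s actual export format and tooling are not specified here; this only shows the general pattern of loading a scanned scene as static collision geometry and dropping a robot into it.

```python
import pybullet as p
import pybullet_data

# Headless physics server; use p.GUI for a visual debug window.
p.connect(p.DIRECT)
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.setGravity(0, 0, -9.81)

# "scene.obj" is a placeholder for a reconstructed 3D map exported as a
# triangle mesh. GEOM_FORCE_CONCAVE_TRIMESH lets a concave scan serve as
# collision geometry (valid only for static, non-moving bodies).
scene_col = p.createCollisionShape(
    p.GEOM_MESH, fileName="scene.obj",
    flags=p.GEOM_FORCE_CONCAVE_TRIMESH,
)
scene_vis = p.createVisualShape(p.GEOM_MESH, fileName="scene.obj")
p.createMultiBody(
    baseMass=0,  # mass 0 -> static environment
    baseCollisionShapeIndex=scene_col,
    baseVisualShapeIndex=scene_vis,
)

# Drop a stock robot model (ships with pybullet_data) into the scan.
robot = p.loadURDF("r2d2.urdf", basePosition=[0, 0, 0.5])

# Step the simulation; a real training loop would apply actions and
# read sensor observations here instead.
for _ in range(240):
    p.stepSimulation()

pos, orn = p.getBasePositionAndOrientation(robot)
print("Robot settled at", pos)
```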