Hierarchical Open-Vocabulary 3D Scene Graphs for Language-Grounded Robot Navigation

Werby, Abdelrhman; Huang, Chenguang; Büchner, Martin; Valada, Abhinav; Burgard, Wolfram

doi:10.15607/RSS.2024.XX.077

arXiv:2403.17846 (cs)

[Submitted on 26 Mar 2024 (v1), last revised 3 Jun 2024 (this version, v2)]

Authors:Abdelrhman Werby, Chenguang Huang, Martin Büchner, Abhinav Valada, Wolfram Burgard

Abstract:Recent open-vocabulary robot mapping methods enrich dense geometric maps with pre-trained visual-language features. While these maps allow for the prediction of point-wise saliency maps when queried for a certain language concept, large-scale environments and abstract queries beyond the object level still pose a considerable hurdle, ultimately limiting language-grounded robotic navigation. In this work, we present HOV-SG, a hierarchical open-vocabulary 3D scene graph mapping approach for language-grounded robot navigation. Leveraging open-vocabulary vision foundation models, we first obtain state-of-the-art open-vocabulary segment-level maps in 3D and subsequently construct a 3D scene graph hierarchy consisting of floor, room, and object concepts, each enriched with open-vocabulary features. Our approach is able to represent multi-story buildings and allows robotic traversal of those using a cross-floor Voronoi graph. HOV-SG is evaluated on three distinct datasets and surpasses previous baselines in open-vocabulary semantic accuracy at the object, room, and floor levels while producing a 75% reduction in representation size compared to dense open-vocabulary maps. In order to prove the efficacy and generalization capabilities of HOV-SG, we showcase successful long-horizon language-conditioned robot navigation within real-world multi-story environments. We provide code and trial video data at this http URL.

Comments:	Code and video are available at this http URL
Subjects:	Robotics (cs.RO); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG)
Cite as:	arXiv:2403.17846 [cs.RO]
	(or arXiv:2403.17846v2 [cs.RO] for this version)
	https://doi.org/10.48550/arXiv.2403.17846
Related DOI:	https://doi.org/10.15607/RSS.2024.XX.077

Submission history

From: Chenguang Huang
[v1] Tue, 26 Mar 2024 16:36:43 UTC (37,891 KB)
[v2] Mon, 3 Jun 2024 17:12:25 UTC (47,053 KB)