Vision-Language Navigation (VLN)
Researching VLN for robotic navigation in unseen environments.
This research is currently in progress. I am exploring how quadruped robots can follow natural-language instructions to navigate novel environments using vision-language models.
Introduction
The framework is built on top of NaVILA, proposed by Cheng et al. NaVILA is a universal Vision-Language-Action (VLA) model that enables legged robots to navigate by following natural-language instructions. For more details, please refer to their website embedded above. Below are recent deployments demonstrating vision-language navigation on a Unitree A1 quadruped in indoor environments.
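To make the pipeline concrete, here is a minimal sketch of the kind of two-level control loop the NaVILA paper describes: a VLA model reads the instruction plus recent camera frames and emits a mid-level command in language, and a low-level locomotion policy executes it. The classes and functions below are stand-in stubs I wrote for illustration, not NaVILA's actual API.

```python
"""Illustrative two-level vision-language navigation loop (stubs only)."""

from collections import deque


class StubVLAModel:
    """Stand-in for the vision-language-action model (hypothetical API)."""

    def predict(self, instruction: str, frames: deque) -> str:
        # A real model reasons over the instruction and visual history;
        # this stub just stops once it has seen a few frames.
        return "stop" if len(frames) >= 8 else "move forward 0.5 meters"


class StubLocomotionPolicy:
    """Stand-in for the low-level legged locomotion controller."""

    def execute(self, action_text: str) -> None:
        print(f"executing: {action_text}")


def capture_frame() -> object:
    """Placeholder for grabbing an RGB frame from the onboard camera."""
    return object()


def navigate(instruction: str) -> None:
    vla = StubVLAModel()
    policy = StubLocomotionPolicy()
    history: deque = deque(maxlen=8)  # keep only the most recent frames as context

    while True:
        history.append(capture_frame())
        # The VLA model emits a mid-level action described in language,
        # e.g. "turn left 30 degrees" or "move forward 0.5 meters".
        action_text = vla.predict(instruction, history)
        if "stop" in action_text.lower():
            break
        # The locomotion policy turns the mid-level command into joint commands.
        policy.execute(action_text)


navigate("Walk past the sofa and stop at the kitchen doorway.")
```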
System Requirements
We set up the conda environment for NaVILA and run the VLN model on a server with an RTX 5090 GPU. Since the paper's setup used 40-series GPUs, there are some incompatibilities with the 50-series. For a detailed setup guide, please refer to these instructions written by Richard Wang.
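As a quick sanity check after installation (this is a general PyTorch check I use, not part of the NaVILA instructions), the snippet below verifies that the installed PyTorch build can actually drive a 50-series GPU. The RTX 5090 (Blackwell) reports compute capability 12.0, so the build needs sm_120 in its compiled architecture list; builds targeting only the 40-series (Ada, sm_89) will fail or fall back at kernel launch time.

```python
import torch

# Confirm a CUDA device is visible before querying it.
assert torch.cuda.is_available(), "No CUDA device visible to PyTorch"

print("torch:", torch.__version__, "cuda:", torch.version.cuda)
print("device:", torch.cuda.get_device_name(0))
# RTX 5090 should report (12, 0); RTX 40-series reports (8, 9).
print("capability:", torch.cuda.get_device_capability(0))
# The compiled architecture list must include 'sm_120' for the 5090.
print("compiled archs:", torch.cuda.get_arch_list())
```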
More updates are coming soon. Stay tuned!