ReplanVLM

graph TB
a(User Input)
b(Observe Image)
c(Decision Bot)
d(Task plan)
e(Code Generation)
f(Inner Bot)
g(Environment)
h(Extra Bot)

a-->c
b-->c
c-->d
d-->e
e-->f
f-->|No|c
f-->|Yes|g
g-->h
h-->|No|c
h-->|Yes|i(end)

  • Decision Bot:
    • generate a task plan from the user input and the observed images
    • generate code from the task plan
  • Inner Bot:
    • check the generated code for correctness
    • check the code against environment and codebase information
  • Extra Bot:
    • compare images taken before and after the action
    • return feedback if the action did not succeed
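
A minimal Python sketch of the replanning loop in the diagram, assuming each bot is a plain callable. The names `plan`, `generate_code`, `inner_check`, `extra_check`, `observe`, `execute` and the `Check` type are hypothetical and are not ReplanVLM's API. The `max_rounds` cap is also an assumption, because the diagram itself loops until the Extra Bot answers Yes.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Check:
    """Hypothetical verdict returned by the Inner Bot or the Extra Bot."""
    ok: bool
    feedback: str = ""


def replan_loop(
    user_input: str,
    observe: Callable[[], str],                     # camera: current image (a string here)
    plan: Callable[[str, str, str], str],           # Decision Bot: (input, image, feedback) -> task plan
    generate_code: Callable[[str], str],            # Decision Bot: task plan -> code
    inner_check: Callable[[str], Check],            # Inner Bot: check code before execution
    execute: Callable[[str], None],                 # Environment: run the code
    extra_check: Callable[[str, str, str], Check],  # Extra Bot: (plan, before, after) -> verdict
    max_rounds: int = 5,                            # assumed cap; the diagram has none
) -> bool:
    feedback = ""
    for _ in range(max_rounds):
        before = observe()
        task_plan = plan(user_input, before, feedback)
        code = generate_code(task_plan)
        inner = inner_check(code)
        if not inner.ok:                 # Inner Bot "No": back to the Decision Bot
            feedback = inner.feedback
            continue
        execute(code)                    # Inner Bot "Yes": act in the environment
        after = observe()
        extra = extra_check(task_plan, before, after)
        if extra.ok:                     # Extra Bot "Yes": task done
            return True
        feedback = extra.feedback        # Extra Bot "No": replan with the feedback
    return False
```

Both failure edges feed text back into the Decision Bot rather than retrying the same code, which is what makes this replanning and not a plain retry.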

Chain of Verification

::: block

  1. generate a baseline response with the LLM

  2. based on the user input and the baseline response, generate verification questions that fact-check the baseline response

  3. independently answer the verification questions (without showing the baseline response).

    then check each answer against the baseline response

  4. generate the final response from the baseline response and the feedback from step 3

:::
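
A minimal sketch of the four steps, assuming `llm` is any hypothetical prompt-in, text-out callable. The prompt wording and the CONSISTENT/INCONSISTENT verdict format are illustrative assumptions, not the method's actual prompts. The point to notice is step 3: the prompts that answer the verification questions never contain the baseline.

```python
from typing import Callable

LLM = Callable[[str], str]  # hypothetical: any prompt -> completion function


def chain_of_verification(query: str, llm: LLM) -> str:
    # 1. baseline response
    baseline = llm(f"Answer the question.\nQuestion: {query}")

    # 2. verification questions, planned from the query AND the baseline
    plan = llm(
        "Write verification questions, one per line, that fact-check the answer.\n"
        f"Question: {query}\nAnswer: {baseline}"
    )
    questions = [q.strip() for q in plan.splitlines() if q.strip()]

    # 3a. answer each question independently: the baseline is NOT in the prompt,
    #     so the model cannot simply copy its earlier (possibly wrong) claims
    answers = [llm(f"Answer concisely.\nQuestion: {q}") for q in questions]

    # 3b. then check each independent answer against the baseline
    verdicts = [
        llm(
            "Does the draft agree with this fact? Reply CONSISTENT or INCONSISTENT.\n"
            f"Draft: {baseline}\nFact: Q: {q} A: {a}"
        )
        for q, a in zip(questions, answers)
    ]

    # 4. final response from the baseline plus the step-3 feedback
    feedback = "\n".join(
        f"Q: {q}\nA: {a}\nVerdict: {v}" for q, a, v in zip(questions, answers, verdicts)
    )
    return llm(
        "Revise the draft so it agrees with the verification results; "
        "drop claims marked INCONSISTENT.\n"
        f"Question: {query}\nDraft: {baseline}\nVerification:\n{feedback}"
    )
```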


Adaptive Interactive Navigation

  • Task planning:
    • generate a skill tree
    • evaluate the nodes to find a high-level skeleton plan
  • Advisor:
    • interpret environment feedback as one of: Failure, New Object, Re-evaluation
  • Arborist:
    • add nodes for new information
    • prune failed nodes
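
A toy sketch of the Arborist operations on a skill tree, under clearly labeled assumptions. Each node carries a hypothetical scalar `cost`. The high-level skeleton is taken to be the lowest-cost root-to-leaf path. A parent left with no children after pruning is treated as a dead end and is pruned too. None of these rules, nor the names `SkillNode`, `best_skeleton`, `prune`, `add_node` or `arborist_update`, come from the original method.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple


class Event(Enum):
    """Advisor's interpretation of what happened in the environment."""
    FAILURE = "failure"
    NEW_OBJECT = "new object"
    REEVALUATION = "re-evaluation"


@dataclass
class SkillNode:
    skill: str
    cost: float  # hypothetical estimated cost; lower is better
    children: List["SkillNode"] = field(default_factory=list)


def best_skeleton(node: SkillNode) -> Tuple[float, List[str]]:
    """Assumed evaluation rule: cheapest root-to-leaf path in the tree."""
    if not node.children:
        return node.cost, [node.skill]
    sub_cost, sub_path = min((best_skeleton(c) for c in node.children), key=lambda t: t[0])
    return node.cost + sub_cost, [node.skill] + sub_path


def prune(node: SkillNode, failed: str) -> bool:
    """Drop the failed skill's subtree; return False if `node` should be removed."""
    if node.skill == failed:
        return False
    had_children = bool(node.children)
    node.children = [c for c in node.children if prune(c, failed)]
    # a node whose every child was pruned is a dead end, so prune it as well
    return not (had_children and not node.children)


def add_node(node: SkillNode, parent: str, new: SkillNode) -> bool:
    """Attach `new` under the first node named `parent`; True if attached."""
    if node.skill == parent:
        node.children.append(new)
        return True
    return any(add_node(c, parent, new) for c in node.children)


def arborist_update(
    root: SkillNode,
    event: Event,
    failed: Optional[str] = None,
    parent: Optional[str] = None,
    new: Optional[SkillNode] = None,
) -> Tuple[float, List[str]]:
    """Hypothetical dispatcher: edit the tree for the Advisor's event, then replan."""
    if event is Event.FAILURE and failed is not None:
        prune(root, failed)
    elif event is Event.NEW_OBJECT and parent is not None and new is not None:
        add_node(root, parent, new)
    # Event.REEVALUATION: tree unchanged; the skeleton is simply recomputed
    return best_skeleton(root)
```

Keeping the skeleton as a pure function of the tree means every Advisor event, including re-evaluation, ends in the same recompute step.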