ReplanVLM
---

graph TB
    a(User Input)
    b(Observe Image)
    c(Decision Bot)
    d(Task plan)
    e(Code Generation)
    f(Inner Bot)
    g(Environment)
    h(Extra Bot)
    a-->c
    b-->c
    c-->d
    d-->e
    e-->f
    f-->|No|c
    f-->|Yes|g
    g-->h
    h-->|No|c
    h-->|Yes|i(End)

---
- Decision Bot:
  - generate a task plan from the user input and observed images
  - generate code from the task plan
- Inner Bot:
  - check the generated code for correctness
  - check it against environment and codebase information
- Extra Bot:
  - compare images taken before and after the action
  - return feedback if the action did not succeed
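
The loop formed by the three bots can be sketched as follows. This is a minimal illustration, not ReplanVLM's actual interface: `Bots`, `run_task`, the callable signatures, and the `max_rounds` cap are all hypothetical, and task planning and code generation are collapsed into a single `decide` call for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

# Hypothetical types: every name here is an illustrative stand-in.
Check = Tuple[bool, str]  # (passed, feedback text)


@dataclass
class Bots:
    decide: Callable[[str, str, str], str]          # (instruction, image, feedback) -> code
    inner_check: Callable[[str], Check]             # code -> (ok, feedback)
    execute: Callable[[str], str]                   # code -> image observed after acting
    extra_check: Callable[[str, str, str], Check]   # (instruction, before, after) -> (ok, feedback)


def run_task(instruction: str, image: str, bots: Bots,
             max_rounds: int = 5) -> Optional[int]:
    """Return the round in which the task succeeded, or None if the cap is hit."""
    feedback = ""
    for round_no in range(1, max_rounds + 1):
        # Decision Bot: plan and generate code, using any feedback from the last round.
        code = bots.decide(instruction, image, feedback)

        # Inner Bot: a "No" sends us back to the Decision Bot before acting.
        ok, feedback = bots.inner_check(code)
        if not ok:
            continue

        # Environment: run the code and observe the result.
        after = bots.execute(code)

        # Extra Bot: compare the images before and after the action.
        ok, feedback = bots.extra_check(instruction, image, after)
        if ok:
            return round_no
        image = after  # the scene changed, so replan from the new observation
    return None
```

The round cap is an assumption added so the sketch always terminates; the diagram itself places no limit on replanning.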

Chain of Verification
---
::: block
1. generate a baseline response with the LLM
2. based on the user input and the baseline response, generate verification questions for the baseline response
3. independently answer the verification questions (without access to the baseline response), then check the answers against the baseline response
4. generate the final response based on the baseline response and the feedback from step 3
:::
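
The four steps can be sketched as a single function. This is a rough sketch under stated assumptions: `llm` is a hypothetical callable that maps a prompt string to a completion string, and the prompt wording is invented for illustration. The point to notice is that the prompt in step 3 that answers each verification question never includes the baseline response.

```python
from typing import Callable, List


def chain_of_verification(query: str, llm: Callable[[str], str]) -> str:
    """Run the four steps with a hypothetical prompt -> text callable."""
    # Step 1: baseline response.
    baseline = llm(f"Answer the question.\nQuestion: {query}")

    # Step 2: verification questions, one per line.
    raw = llm(
        "List verification questions for the draft, one per line.\n"
        f"Question: {query}\nDraft: {baseline}"
    )
    questions: List[str] = [q.strip() for q in raw.splitlines() if q.strip()]

    # Step 3: answer each question on its own, then compare with the draft.
    findings: List[str] = []
    for question in questions:
        # The baseline is deliberately left out of this prompt.
        answer = llm(f"Answer concisely.\nQuestion: {question}")
        verdict = llm(
            "Compare the draft with the verified answer and say whether they agree.\n"
            f"Draft: {baseline}\nQuestion: {question}\nVerified answer: {answer}"
        )
        findings.append(f"{question} -> {answer} ({verdict})")

    # Step 4: final response from the baseline plus the step 3 feedback.
    feedback = "\n".join(findings)
    return llm(
        "Revise the draft using the verification feedback.\n"
        f"Question: {query}\nDraft: {baseline}\nFeedback:\n{feedback}"
    )
```

Passing `llm` in as a parameter keeps the sketch independent of any particular model API and makes it easy to exercise with a stub.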

Adaptive Interactive Navigation
---
- Task planning:
  - generate the skill tree
  - evaluate the nodes and find the high-level skeleton
- Advisor:
  - interpret the environment: Failure, New Object, Re-evaluation
- Arborist:
  - add nodes for new information
  - prune the failed nodes
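
One possible data structure for the skill tree and its updates is sketched below. Everything here is an assumption made for illustration: the `SkillNode` fields, the `Event` names (taken from the three Advisor outcomes above), and in particular the greedy "follow the best-scoring child" rule used for the skeleton, which is a stand-in rather than the method's actual evaluation procedure.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Event(Enum):
    """Advisor outcomes (names are hypothetical)."""
    FAILURE = "failure"
    NEW_OBJECT = "new_object"
    REEVALUATION = "re-evaluation"


@dataclass
class SkillNode:
    name: str
    score: float = 0.0
    failed: bool = False
    children: List["SkillNode"] = field(default_factory=list)


def find(node: SkillNode, name: str) -> Optional[SkillNode]:
    """Depth-first search for a node by name."""
    if node.name == name:
        return node
    for child in node.children:
        hit = find(child, name)
        if hit is not None:
            return hit
    return None


def prune_failed(node: SkillNode) -> None:
    """Arborist: drop failed descendants (the root itself is never removed)."""
    node.children = [c for c in node.children if not c.failed]
    for child in node.children:
        prune_failed(child)


def apply_event(root: SkillNode, event: Event, target: str,
                new_name: str = "", score: float = 0.0) -> None:
    """Update the tree in response to one Advisor event."""
    node = find(root, target)
    if node is None:
        raise KeyError(target)
    if event is Event.FAILURE:
        node.failed = True
        prune_failed(root)
    elif event is Event.NEW_OBJECT:
        node.children.append(SkillNode(new_name, score))  # Arborist: add a node
    else:
        node.score = score  # re-evaluation: refresh the node's score


def skeleton(root: SkillNode) -> List[str]:
    """Greedy path from the root through the best-scoring child at each level."""
    path = [root.name]
    node = root
    while node.children:
        node = max(node.children, key=lambda c: c.score)
        path.append(node.name)
    return path
```

For example, marking a "push box" skill as failed removes its whole subtree, after which `skeleton` falls back to the next-best branch.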