ReplanVLM

—

graph TB
a(User Input)
b(Observe Image)
c(Decision Bot)
d(Task plan)
e(Code Generation)
f(Inner Bot)
g(Environment)
h(Extra Bot)

a-->c
b-->c
c-->d
d-->e
e-->f
f-->|No|c
f-->|Yes|g
g-->h
h-->|No|c
h-->|Yes|i(End)

—

Decision Bot:
- generate task plan based on user input and observed images
- generate code based on task plan
Inner Bot:
- check the generated code for correctness
- check it against environment and codebase information
Extra Bot:
- compare images taken before and after the action
- return feedback to the Decision Bot if the action did not succeed
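
The loop in the diagram can be sketched in Python as below. This is a minimal sketch, not the ReplanVLM implementation: the three bots and the environment are passed in as plain callables, and the names `replan_loop`, `Feedback`, `decide`, `inner_check`, `extra_check`, and the `max_rounds` cap are all hypothetical (the diagram itself shows no retry limit).

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Feedback:
    ok: bool
    message: str = ""


def replan_loop(
    user_input: str,
    observe: Callable[[], str],                            # returns the current image (any handle)
    decide: Callable[[str, str, List[str]], Tuple[str, str]],  # Decision Bot: (plan, code)
    inner_check: Callable[[str], Feedback],                # Inner Bot: checks code before execution
    execute: Callable[[str], None],                        # runs code in the environment
    extra_check: Callable[[str, str], Feedback],           # Extra Bot: compares before/after images
    max_rounds: int = 5,                                   # assumption: bound on replanning rounds
) -> Tuple[bool, List[str]]:
    """Run plan -> check -> execute -> verify; return (success, feedback history)."""
    history: List[str] = []
    for _ in range(max_rounds):
        before = observe()
        plan, code = decide(user_input, before, history)
        inner = inner_check(code)
        if not inner.ok:
            # "No" edge from Inner Bot back to Decision Bot
            history.append(f"inner: {inner.message}")
            continue
        execute(code)
        after = observe()
        extra = extra_check(before, after)
        if extra.ok:
            return True, history
        # "No" edge from Extra Bot back to Decision Bot
        history.append(f"extra: {extra.message}")
    return False, history
```

Passing the accumulated feedback back into `decide` is one way to let the Decision Bot replan with knowledge of earlier failures; how the actual system formats that feedback is not stated in these notes.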

Chain of Verification

—

::: block

1. generate a baseline response with the LLM
2. from the user input and the baseline response, generate verification questions about the baseline response
3. answer the verification questions independently (without access to the baseline response), then check the answers against the baseline response
4. generate the final response from the baseline response and the feedback from step 3

:::
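
The verification steps above can be sketched as a short pipeline. This is an illustrative sketch only: `llm` is a hypothetical callable that maps a prompt string to a completion string, the prompt wording is invented for the example, and `chain_of_verification` is not a function from any library.

```python
from typing import Callable, List, Tuple

LLM = Callable[[str], str]  # assumption: prompt in, completion out


def chain_of_verification(query: str, llm: LLM) -> str:
    # Baseline response
    baseline = llm(f"Answer the question.\nQuestion: {query}")

    # Verification questions, one per line
    raw = llm(
        "List verification questions, one per line, that check the facts in the answer.\n"
        f"Question: {query}\nAnswer: {baseline}"
    )
    questions: List[str] = [q.strip() for q in raw.splitlines() if q.strip()]

    # Independent answers: the baseline is deliberately left out of these prompts
    answers: List[Tuple[str, str]] = [
        (q, llm(f"Answer concisely.\nQuestion: {q}")) for q in questions
    ]

    # Check the independent answers against the baseline
    qa_text = "\n".join(f"Q: {q}\nA: {a}" for q, a in answers)
    feedback = llm(
        "Compare the draft with the independent answers and list any inconsistencies.\n"
        f"Draft: {baseline}\nIndependent answers:\n{qa_text}"
    )

    # Final response from baseline plus feedback
    return llm(
        "Revise the draft so that it resolves the listed inconsistencies.\n"
        f"Question: {query}\nDraft: {baseline}\nFeedback: {feedback}"
    )
```

Keeping the baseline out of the independent-answer prompts is the point of that step: the model cannot simply repeat a mistake it made in the draft.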

—

Task planning:
- generate skill tree
- evaluate nodes and find high-level skeleton
Advisor:
- interpret environment: Failure, New Object, Revaluation
Arborist:
- add nodes for new information
- prune failed nodes
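
A minimal sketch of the skill tree and the Arborist's two operations follows. Everything here is an assumption made for illustration: the names `SkillNode`, `Signal`, `skeleton`, and `apply_signal` are hypothetical, the skeleton is picked greedily by node score, and "Revaluation" is treated as re-scoring an existing node, which these notes do not confirm.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Signal(Enum):
    # The three interpretations the Advisor can return
    FAILURE = "failure"
    NEW_OBJECT = "new_object"
    REVALUATION = "revaluation"


@dataclass
class SkillNode:
    name: str
    score: float = 0.0
    failed: bool = False
    children: List["SkillNode"] = field(default_factory=list)


def find(node: SkillNode, name: str) -> Optional[SkillNode]:
    """Depth-first search for a node by name."""
    if node.name == name:
        return node
    for child in node.children:
        hit = find(child, name)
        if hit is not None:
            return hit
    return None


def prune_failed(node: SkillNode) -> None:
    """Arborist: drop every failed descendant (the root itself is kept)."""
    node.children = [c for c in node.children if not c.failed]
    for child in node.children:
        prune_failed(child)


def skeleton(root: SkillNode) -> List[str]:
    """Assumed heuristic: follow the highest-scoring child at each level."""
    path, node = [root.name], root
    while node.children:
        node = max(node.children, key=lambda c: c.score)
        path.append(node.name)
    return path


def apply_signal(root: SkillNode, signal: Signal, target: str,
                 parent: Optional[str] = None, score: float = 0.0) -> None:
    """Update the tree according to the Advisor's signal."""
    if signal is Signal.NEW_OBJECT:
        # Arborist: add a node for the new information
        host = find(root, parent) if parent else root
        if host is not None:
            host.children.append(SkillNode(target, score))
        return
    node = find(root, target)
    if node is None:
        return
    if signal is Signal.FAILURE:
        node.failed = True
        prune_failed(root)
    else:  # Signal.REVALUATION: assumed to mean re-scoring
        node.score = score
```

For example, marking the best child as failed removes it from the tree, so the next call to `skeleton` falls back to the best remaining sibling.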