Using llama3.2-vision:11b for the app agent #140

Open

ms-cleblanc opened this issue Dec 4, 2024 · 2 comments
@ms-cleblanc

I'm using GPT for my host agent, and its response has all of the components I would expect:

DEBUG: Json string before loading: {
    "Observation": "I observe that the Google Chrome application is available from the control item list, with the title of 'New Tab - Google Chrome'.",
    "Thought": "The user request can be solely completed on the Google Chrome application. I need to open the Google Chrome application and click on the + icon in the top bar to open a new tab.",
    "CurrentSubtask": "Open a new tab in Google Chrome by clicking on the + icon in the top bar.",
    "Message": ["(1) Locate the + icon in the top bar of Google Chrome.", "(2) Click on the + icon to open a new tab."],
    "ControlLabel": "4",
    "ControlText": "New Tab - Google Chrome",
    "Status": "CONTINUE",
    "Plan": [],
    "Questions": [],
    "Comment": "I plan to open a new tab in Google Chrome by clicking on the + icon in the top bar."
}

However, when I use Ollama as my app agent, the responses are not in the format UFO expects: there are no Observations, Thoughts, or even Plans. I do get a decent response from the llama model, though:

DEBUG: Json string before loading: {
    "id": 3,
    "title": "Open a new tab in Google Chrome by clicking on the + icon in the top bar.",
    "steps": [
        { "stepNumber": 1, "description": "Locate the + icon in the top bar of Google Chrome." },
        { "stepNumber": 2, "description": "Click on the + icon to open a new tab." }
    ],
    "image": "annotated screenshot"
}

What could I be doing wrong? How does the AppAgent know to provide Thoughts and Observations?
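
For reference, a quick check like this (just a throwaway snippet, not UFO code) confirms that the llama reply is missing every key the GPT reply contains:

```python
import json

# Keys present in the GPT host agent reply above; the llama reply has none of them.
EXPECTED_KEYS = {
    "Observation", "Thought", "CurrentSubtask", "Message", "ControlLabel",
    "ControlText", "Status", "Plan", "Questions", "Comment",
}

def missing_keys(json_string: str) -> set:
    """Return the expected keys absent from a model's raw JSON reply."""
    return EXPECTED_KEYS - json.loads(json_string).keys()

llama_reply = '{"id": 3, "title": "Open a new tab...", "steps": [], "image": "annotated screenshot"}'
print(sorted(missing_keys(llama_reply)))  # every expected key is missing
```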

@vyokky
Contributor

vyokky commented Dec 5, 2024

This is probably because the model you are using is not strong enough. We feed the same prompts to all models; if a model fails to follow the instructions, it may generate output in a format we do not expect.
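
If you want to keep experimenting with weaker models, one thing you could try (just a sketch, not something UFO ships) is to validate the reply against the keys you expect and re-ask the model with a reminder whenever it drifts. `call_model` below is a placeholder for whatever sends the prompt to your LLM:

```python
import json

# Keys expected back from the agent, taken from the GPT example in this thread.
EXPECTED_KEYS = {
    "Observation", "Thought", "ControlLabel", "ControlText",
    "Status", "Plan", "Comment",
}

def ask_until_valid(call_model, prompt, max_retries=3):
    """Re-prompt until the reply parses as JSON with the expected keys."""
    for _ in range(max_retries):
        raw = call_model(prompt)
        try:
            reply = json.loads(raw)
            missing = EXPECTED_KEYS - reply.keys()
        except json.JSONDecodeError:
            missing = EXPECTED_KEYS
        if not missing:
            return reply
        # Remind the model of the required schema and try again.
        prompt += (
            "\n\nYour previous reply was missing the keys "
            f"{sorted(missing)}. Respond with a single JSON object that "
            "contains exactly those keys and nothing else."
        )
    raise ValueError("model never produced the expected JSON schema")
```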

@ms-cleblanc
Author

Thanks for your help! I upgraded my VM and ran the 90b model, but I hit the same issue. The context window is 128K, just like GPT's, so I wonder why it's ignoring the prompt. I think Ollama wants the image as a filename rather than as bytes in the context window. Do you think that change might help?

DEBUG: Json string before loading: {"control_text": "Customer Service workspace", "control_type": "TabItem", "label": "13"}
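
For context, this is roughly what I was planning to try. As far as I can tell, Ollama's REST chat endpoint takes images as base64-encoded strings in the message's `images` field (file paths in the prompt are a CLI convenience), and it also seems to accept `"format": "json"` to nudge the model toward JSON-only output. Just a sketch; the model name and screenshot path are placeholders:

```python
import base64
import requests

# Rough sketch of sending a screenshot to llama3.2-vision through Ollama's
# REST API; model name and screenshot path are placeholders.
with open("annotated_screenshot.png", "rb") as f:
    screenshot_b64 = base64.b64encode(f.read()).decode()

resp = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "llama3.2-vision:11b",
        "messages": [{
            "role": "user",
            "content": "List the control items visible in this screenshot.",
            "images": [screenshot_b64],  # base64 strings, not file paths
        }],
        "format": "json",  # ask Ollama to constrain the reply to JSON
        "stream": False,
    },
    timeout=120,
)
print(resp.json()["message"]["content"])
```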
