1. Receive the image (as a URL or base64 data) and the user prompt.
2. Download the image from the URL, or decode the base64 data.
3. Prepare the image for analysis by the VLM.
4. Send the image and prompt to the `understand_image` tool.
5. The VLM analyzes the image according to the prompt.
6. Return the VLM's analysis of the image.
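
The steps above can be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the helper names `load_image_bytes`, `prepare_image`, and `analyze_image` are hypothetical, and the sketch assumes that `understand_image` is a callable accepting a base64-encoded image string and a prompt and returning text. The real tool's signature and expected image format may differ.

```python
import base64
import urllib.request
from typing import Callable


def load_image_bytes(source: str) -> bytes:
    """Step 2: return raw image bytes from a URL or from base64 data."""
    if source.startswith(("http://", "https://")):
        # Download the image; the 30 s timeout is an arbitrary choice.
        with urllib.request.urlopen(source, timeout=30) as resp:
            return resp.read()
    # Accept a data URI ("data:image/png;base64,...") or a bare base64 string.
    if source.startswith("data:"):
        source = source.split(",", 1)[1]
    return base64.b64decode(source)


def prepare_image(raw: bytes) -> str:
    """Step 3: encode the bytes as base64 text (assumed VLM input format)."""
    return base64.b64encode(raw).decode("ascii")


def analyze_image(
    source: str,
    prompt: str,
    understand_image: Callable[[str, str], str],
) -> str:
    """Steps 1-6: load the image, prepare it, and pass it with the prompt."""
    raw = load_image_bytes(source)
    encoded = prepare_image(raw)
    # Steps 4-5 happen inside the tool; step 6 returns its answer unchanged.
    return understand_image(encoded, prompt)
```

Passing the tool in as a parameter keeps the sketch independent of any particular client library and makes it possible to exercise the flow with a stub in place of the real VLM call.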