What it does
This is a simple demo workflow that extracts a license plate number from an image of a car submitted via a form – or, more generally, it shows how you can (see the sketch after this list):
– Use a form trigger to upload files and feed them into an LLM
– Swap out the LLM model used for image-to-text analysis
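The following is a minimal TypeScript sketch of the same flow outside n8n, assuming Node 18+ (for the global `fetch`), an `OPENROUTER_API_KEY` environment variable, and a hypothetical `analyzeImage` helper name. It mirrors what the workflow does: encode the uploaded image and send it together with a prompt to OpenRouter's OpenAI-compatible chat completions endpoint.

```ts
// Hypothetical standalone sketch: encode an image and ask an OpenRouter
// vision-capable model to extract the license plate number.
import { readFileSync } from "node:fs";

const OPENROUTER_API_KEY = process.env.OPENROUTER_API_KEY ?? "";

async function analyzeImage(imagePath: string, model: string, prompt: string): Promise<string> {
  // The form trigger delivers the file as binary data; here we read it from disk
  // and embed it as a base64 data URL.
  const base64 = readFileSync(imagePath).toString("base64");

  const response = await fetch("https://openrouter.ai/api/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${OPENROUTER_API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model,
      messages: [
        {
          role: "user",
          content: [
            { type: "text", text: prompt },
            { type: "image_url", image_url: { url: `data:image/jpeg;base64,${base64}` } },
          ],
        },
      ],
    }),
  });

  const data = await response.json();
  return data.choices[0].message.content.trim();
}

// Example usage:
// analyzeImage(
//   "car.jpg",
//   "google/gemini-2.0-flash-001",
//   "Extract the license plate number from this image. Reply with the plate number only."
// ).then(console.log);
```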
Set up steps
1. Import the workflow.
2. Ensure you have registered an account, purchased some credits, and created an API key for OpenRouter.ai.
3. Create or adapt the OpenRouter credential with your own API key.
4. Run “Test workflow” and submit an image of a car with a visible license plate to extract its number.
How to adapt
By changing the “prompt” in the “Settings” node, you can quickly adapt this exemplary workflow to other image-to-text use cases (see the prompt presets sketched after this list), such as:
– Summarization: “summarize what’s seen in the image”
– Location finding: “identify the location where the image was taken”
– Text extraction: “extract all text from the image and return it as markdown”
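As a small illustration, these prompt presets mirror the “prompt” field of the “Settings” node. The object keys and the `analyzeImage` helper from the earlier sketch are hypothetical names, not part of the workflow itself.

```ts
// Hypothetical prompt presets for different image-to-text use cases.
const prompts = {
  licensePlate: "Extract the license plate number from this image. Reply with the plate number only.",
  summary: "Summarize what's seen in the image.",
  location: "Identify the location where the image was taken.",
  textExtraction: "Extract all text from the image and return it as markdown.",
} as const;

// Reuse the same request logic with a different prompt, e.g.:
// analyzeImage("car.jpg", "openai/gpt-4o", prompts.summary).then(console.log);
```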
Thanks to OpenRouter, you can also quickly experiment with different models by simply changing the “model” in the “Settings” node (a comparison loop is sketched after this list). The following models gave good results for this demo use case:
– google/gemini-2.0-flash-001
– meta-llama/llama-3.2-90b-vision-instruct
– openai/gpt-4o
In contrast, llama-3.2-11b and even claude-3.5-sonnet did not recognize all characters in every test image.
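A quick comparison loop could look like the sketch below, reusing the hypothetical `analyzeImage` helper from above to run the same prompt against several OpenRouter model IDs.

```ts
// Hypothetical model comparison: run the same plate-extraction prompt
// against several OpenRouter models and print each answer side by side.
const candidateModels = [
  "google/gemini-2.0-flash-001",
  "meta-llama/llama-3.2-90b-vision-instruct",
  "openai/gpt-4o",
];

async function compareModels(imagePath: string, prompt: string): Promise<void> {
  for (const model of candidateModels) {
    const answer = await analyzeImage(imagePath, model, prompt);
    console.log(`${model}: ${answer}`);
  }
}
```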
Using a general-purpose LLM is a quick way to prototype an image-to-text application. For production deployments that need to be reliable and scalable, consider an API-based service purpose-built for the task (sketched after this list), such as:
– Google Cloud Vision API
– Microsoft Azure Computer Vision
– Azure AI Document Intelligence
– Amazon Textract
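For illustration, here is a hedged sketch of the dedicated-service route using Google Cloud Vision's TEXT_DETECTION feature via its REST endpoint. The `GOOGLE_VISION_API_KEY` environment variable and the function name are assumptions; in production you would typically use the official client library and service-account authentication instead.

```ts
// Hypothetical sketch: optical character recognition with the
// Google Cloud Vision REST API (TEXT_DETECTION feature).
import { readFileSync } from "node:fs";

async function detectTextWithVisionApi(imagePath: string): Promise<string> {
  const apiKey = process.env.GOOGLE_VISION_API_KEY ?? "";
  const base64 = readFileSync(imagePath).toString("base64");

  const response = await fetch(
    `https://vision.googleapis.com/v1/images:annotate?key=${apiKey}`,
    {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        requests: [
          {
            image: { content: base64 },
            features: [{ type: "TEXT_DETECTION" }],
          },
        ],
      }),
    }
  );

  const data = await response.json();
  // The first textAnnotation contains the full detected text block.
  return data.responses?.[0]?.textAnnotations?.[0]?.description ?? "";
}
```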