Vector Database as a Big Data Analysis Tool for AI Agents
This series of workflows shows how to build big data analysis tools for production-ready AI agents with the help of vector databases. The pipelines can be adapted to any image dataset, which makes them applicable to many production use cases.
Uploading (Image) Datasets to Qdrant
1. The first pipeline uploads an image dataset to Qdrant.
2. The second pipeline sets up cluster (class) centres and cluster (class) threshold scores needed for anomaly detection.
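The second pipeline's output can be pictured as one (centre, threshold) pair per class. The sketch below is a minimal, dependency-free illustration under stated assumptions: it takes each class centre to be the mean of the class's unit-normalised embeddings and each threshold to be the lowest cosine similarity of any class member to that centre. Neither rule is confirmed by the workflows; the helper names (`class_centre`, `class_threshold`, `build_centres`) and the toy 2-D vectors are hypothetical stand-ins for real Voyage AI embeddings stored in Qdrant.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two (not necessarily normalised) vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def class_centre(vectors: List[Vector]) -> Vector:
    """Assumed centre: the mean of the class's unit-normalised embeddings."""
    unit = [normalize(v) for v in vectors]
    dim = len(unit[0])
    return [sum(v[i] for v in unit) / len(unit) for i in range(dim)]


def class_threshold(vectors: List[Vector], centre: Vector) -> float:
    """Assumed threshold: the lowest similarity of any class member to its centre."""
    return min(cosine(v, centre) for v in vectors)


def build_centres(dataset: Dict[str, List[Vector]]) -> Dict[str, Tuple[Vector, float]]:
    """Map each class label to its (centre, threshold) pair."""
    result = {}
    for label, vectors in dataset.items():
        centre = class_centre(vectors)
        result[label] = (centre, class_threshold(vectors, centre))
    return result


# Hypothetical toy 2-D "embeddings"; real embedding vectors have far more dimensions.
toy = {
    "wheat": [[1.0, 0.0], [0.9, 0.1]],
    "rice": [[0.0, 1.0], [0.1, 0.9]],
}
centres = build_centres(toy)
```

Using the minimum similarity makes every training image count as "normal"; a percentile of the similarities would be a less outlier-sensitive alternative.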
Anomaly Detection Tool
The third pipeline is the anomaly detection tool. It takes any image as input and uses the preparatory work done in Qdrant to detect whether that image is an anomaly relative to the uploaded dataset.
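A minimal sketch of the decision rule, assuming the class centres and thresholds have already been computed and that similarity is cosine similarity: an image counts as an anomaly if its embedding does not reach the threshold of any class centre. The function name `is_anomaly` and the centre values are hypothetical; in the actual workflow the comparison would run against vectors stored in Qdrant rather than an in-memory dict.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def is_anomaly(embedding: Vector, centres: Dict[str, Tuple[Vector, float]]) -> bool:
    """Anomalous if the embedding reaches the threshold of no class centre."""
    for centre, threshold in centres.values():
        if cosine(embedding, centre) >= threshold:
            return False  # close enough to at least one known class
    return True


# Hypothetical centres and thresholds, as a preparatory step might produce them.
centres = {
    "wheat": ([1.0, 0.0], 0.9),
    "rice": ([0.0, 1.0], 0.9),
}
print(is_anomaly([0.99, 0.05], centres))  # False: near the "wheat" centre
print(is_anomaly([1.0, 1.0], centres))    # True: about 0.71 similarity to both
```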
For KNN (k-Nearest Neighbours) Classification
1. The first pipeline uploads an image dataset to Qdrant.
2. The second pipeline is the KNN classifier tool, which takes any image as input and classifies it against the dataset uploaded to Qdrant.
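The KNN idea itself is simple: find the k stored images most similar to the query and take a majority vote over their labels. The sketch below assumes cosine similarity and k = 3, and sorts an in-memory list in place of the nearest-neighbour search Qdrant would perform; `knn_classify` and the labelled toy vectors are hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def knn_classify(query: Vector, labelled: List[Tuple[Vector, str]], k: int = 3) -> str:
    """Majority label among the k most similar labelled vectors."""
    # Stand-in for a vector database nearest-neighbour search.
    ranked = sorted(labelled, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


labelled = [
    ([1.0, 0.0], "wheat"),
    ([0.9, 0.2], "wheat"),
    ([0.0, 1.0], "rice"),
    ([0.1, 0.9], "rice"),
    ([0.2, 0.8], "rice"),
]
print(knn_classify([0.95, 0.1], labelled))  # wheat: 2 of the 3 nearest are wheat
print(knn_classify([0.1, 1.0], labelled))   # rice
```

An odd k avoids ties between two classes; with more classes, ties can still occur, and `Counter.most_common` then keeps the label encountered first.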
To Recreate Both
You'll need to upload the crops and lands datasets from Kaggle to your own Google Cloud Storage bucket and recreate the API credentials/connections for Qdrant Cloud (you can use the Free Tier cluster), Voyage AI, and Google Cloud Storage.
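For orientation when recreating the Qdrant connection, the sketch below builds the JSON body that Qdrant's REST upsert endpoint (`PUT /collections/{collection_name}/points`) accepts: a `points` list where each point has an `id` (an unsigned integer or a UUID), a `vector`, and an optional `payload`. It only builds the body and sends nothing. The payload keys `label` and `path`, the record format, and the choice of UUIDv5 IDs derived from the image path are assumptions for illustration, not necessarily what the workflows use.

```python
import json
import uuid
from typing import Dict, List


def build_upsert_body(records: List[Dict]) -> str:
    """JSON body for Qdrant's REST upsert (PUT /collections/{name}/points)."""
    points = []
    for rec in records:
        points.append({
            # Deterministic UUID from the (hypothetical) image path, so
            # re-uploading the same image overwrites instead of duplicating.
            "id": str(uuid.uuid5(uuid.NAMESPACE_URL, rec["path"])),
            "vector": rec["embedding"],
            # Hypothetical payload fields used later for classification.
            "payload": {"label": rec["label"], "path": rec["path"]},
        })
    return json.dumps({"points": points})


records = [
    {"path": "gs://my-bucket/crops/wheat/001.jpg", "label": "wheat", "embedding": [0.1, 0.2, 0.3]},
]
print(build_upsert_body(records))
```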
Workflow for Setting Up Cluster (Class) Centres & Cluster (Class) Threshold Scores for Anomaly Detection
This preparatory workflow computes a centre and a threshold score for each cluster (class), so that anomalies can later be detected against these thresholds.
Here, we’re using two approaches to set up these centres: the “distance matrix approach” and the “multimodal embedding model approach.”
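The two approaches are not detailed here. One plausible reading of the "distance matrix approach" is to compute the full pairwise similarity matrix within a class and pick its medoid, the member with the highest total similarity to all others, as the class centre. The sketch below implements that interpretation only (the multimodal embedding model approach is not covered); the function names and toy vectors are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def similarity_matrix(vectors: List[Vector]) -> List[List[float]]:
    """Full pairwise cosine-similarity matrix for one class."""
    return [[cosine(a, b) for b in vectors] for a in vectors]


def medoid_index(vectors: List[Vector]) -> int:
    """Index of the member with the highest total similarity to all members."""
    matrix = similarity_matrix(vectors)
    totals = [sum(row) for row in matrix]
    return max(range(len(vectors)), key=lambda i: totals[i])


members = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
print(medoid_index(members))  # 1: the middle vector is most similar to the others overall
```

Unlike a mean vector, a medoid is an actual stored image, so the centre can be referenced by its point ID. The trade-off is cost: the matrix grows quadratically with class size.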