Test Chat - info

This experimental chat application is implemented in Python and deployed on Microsoft Azure using Kubernetes, Azure Functions, the Azure Container Registry, and other Azure services.

The backend runs distributed across three pods, with a load balancer in front.
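The load balancer's job can be sketched as simple round-robin dispatch across the three pods. This is a minimal illustration, not the actual balancing policy; the pod names are made up for the example.

```python
# Minimal round-robin sketch of how a load balancer might spread requests
# across the three backend pods. Pod names are illustrative assumptions,
# not the real deployment's.
from itertools import cycle

pods = cycle(["chat-backend-0", "chat-backend-1", "chat-backend-2"])

def next_pod() -> str:
    """Pick the next pod in round-robin order."""
    return next(pods)
```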

An Ollama instance deployed on Azure (slow, since it runs without a GPU) provides responses using the llama3.2:3b model. Additionally, users can access the gpt-4o model hosted on Azure AI (the fastest option), while the most advanced capabilities are available via gpt-5-mini on OpenAI.
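Conceptually, each request is routed to one of these three model backends. The sketch below is a guess at how such routing could look; the dictionary keys match the model names above, but the provider labels and function are illustrative assumptions about the app's internals.

```python
# Hypothetical routing table mapping the three available models to their
# providers. Names mirror the models described above; the structure itself
# is an assumption, not the app's real code.
MODEL_BACKENDS = {
    "llama3.2:3b": {"provider": "ollama", "note": "CPU-only, slow"},
    "gpt-4o": {"provider": "azure-ai", "note": "fastest option"},
    "gpt-5-mini": {"provider": "openai", "note": "most advanced"},
}

def route_model(name: str) -> str:
    """Return the provider responsible for the requested model."""
    try:
        return MODEL_BACKENDS[name]["provider"]
    except KeyError:
        raise ValueError(f"Unknown model: {name}")
```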

The current RAG (Retrieval-Augmented Generation) implementation is basic, built on a simple vector database.
RAG is one of the most common approaches for customizing responses based on the content of internal documents.
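The core RAG loop can be shown with a toy example: embed the documents, retrieve the one closest to the query, and prepend it to the prompt. The bag-of-words "embedding" below is only for illustration; the real implementation uses a vector database and a proper embedding model.

```python
# Toy RAG retrieval: bag-of-words vectors plus cosine similarity stand in
# for a real embedding model and vector database.
import re
from collections import Counter
from math import sqrt

def embed(text: str) -> Counter:
    """Crude bag-of-words 'embedding' (illustration only)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, docs: list[str]) -> str:
    """Return the document most similar to the query."""
    q = embed(query)
    return max(docs, key=lambda d: cosine(q, embed(d)))

docs = [
    "The chat backend runs on three Kubernetes pods.",
    "Responses can come from llama3.2:3b via Ollama.",
]
context = retrieve("Which model does Ollama serve?", docs)
prompt = f"Context: {context}\n\nQuestion: Which model does Ollama serve?"
```

The retrieved document is injected as context, which is how RAG grounds answers in internal content.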

Play with your own system prompt - have fun!

You can define your own prompt to control how answers are generated. The text entered below overrides the role radio buttons above.
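In chat APIs of this kind, a custom system prompt usually works by being placed as the first message of the conversation, ahead of the user's text, which is why it takes precedence over a preset role. The function below is a sketch following the common OpenAI-style message shape; it is an assumption about this app's internals, not its actual code.

```python
# Sketch of how a user-supplied system prompt shapes generation: it becomes
# the first message, so it overrides any preset role. The message format is
# the common OpenAI-style shape, assumed for illustration.
def build_messages(system_prompt: str, user_text: str) -> list[dict]:
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({"role": "user", "content": user_text})
    return messages

msgs = build_messages("Answer like a pirate.", "What is Kubernetes?")
```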