[Design] Prompt Templating #31

missBerg · 2024-12-05T21:42:14Z

The design proposal for Prompt Templating should include:

Motivation #49
Feature Definition
Control Plane API
Technical Implementation Proposal

This issue was created from a conversation during the Dec 5th community meeting:
https://docs.google.com/document/d/10e1sfsF-3G3Du5nBHGmLjXw5GVMqqCvFDqp_O65B0_w/edit?tab=t.0#bookmark=id.dz9gpy397ymu

sanjeewa-malalgoda · 2024-12-12T11:49:30Z

Motivation

As AI-based applications grow, managing prompts dynamically at the gateway level can become critical for scalability, security, and operational efficiency. Envoy AI Gateway can leverage Custom Resources (CRs) to manage and apply AI prompt templates as part of its integration with AI backends like LLMs.

Challenges Addressed

Handle diverse AI use cases with minimal application logic changes.
Centralize prompt template management within the gateway.
Ensure only pre-approved templates are used, mitigating the risk of misuse.

Feature Definition

The proposed feature introduces AI Prompt Templating Support to the Envoy AI Gateway. The key components are:
Template Definition and Management:
Templates are defined as part of a Custom Resource (CR) within Kubernetes.
Templates include placeholders (variables) and configurations for specific AI models or tasks.
Dynamic Updates:
Templates can be updated through Kubernetes CRs, making changes seamless without disrupting application workflows.
Template Application:
The gateway processes templates dynamically based on request metadata and template CR configurations.
Templates are applied to both requests and responses.
Integration with LLMRoute CR:
The feature integrates with the LLMRoute CR to specify which templates apply to a particular LLMRoute.
Each policy names the LLMRoutes it applies to, so a template is only used on routes it is attached to.
Pre-Approved Templates:
Only templates that are defined in Kubernetes CRs and attached to an LLMRoute CR are applied.

Control Plane Design

Example Custom Resources
LLMPolicy CR - We can extend this LLMPolicy CR with other LLM policies in the future. These can include prompt decoration, prompt templating, and other use cases that change request payloads before passing them to the LLM backend.

apiVersion: ai.envoyproxy.io/v1alpha1
kind: LLMPolicy
metadata:
  name: translation-policy
spec:
  description: "Policy for managing AI interactions with translation prompts and additional configurations"
  policies:
    promptTemplate:
      description: "Template for translating text between languages"
      variables:
        - name: source_language
          description: "Source language of the text"
          from:
            type: "header"
            value: "x-source-language"
        - name: target_language
          description: "Target language of the text"
          from:
            type: "path"
            value: "/translations/{target_language}" # Extract from path parameter
        - name: text
          description: "The text to translate"
          from:
            type: "body"
            xpath: "/payload/text" # XPath for extracting from XML or JSON body
      template: "Translate '{{text}}' from {{source_language}} to {{target_language}}."
  targetRefs:
    - apiVersion: ai.envoyproxy.io/v1alpha1
      kind: LLMRoute
      name: translation-route-1
    - apiVersion: ai.envoyproxy.io/v1alpha1
      kind: LLMRoute
      name: translation-route-2
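
The `variables` entries above can be read as extraction rules. The following is a minimal sketch of how a gateway might resolve them, not the proposed implementation. It assumes headers arrive as a dict with lowercase keys, that the body is JSON, and that the `xpath` value is treated as a slash-separated path into that JSON (XML bodies are not handled). The function names are hypothetical.

```python
import json
import re

# Hypothetical in-memory form of the `variables` list from the LLMPolicy example.
VARIABLES = [
    {"name": "source_language", "from": {"type": "header", "value": "x-source-language"}},
    {"name": "target_language", "from": {"type": "path", "value": "/translations/{target_language}"}},
    {"name": "text", "from": {"type": "body", "xpath": "/payload/text"}},
]


def extract_from_path(pattern: str, name: str, path: str) -> str:
    """Turn '/translations/{target_language}' into a regex and capture `name`."""
    parts = re.split(r"\{(\w+)\}", pattern)
    # Even indexes hold literal text, odd indexes hold parameter names.
    regex = "".join(
        re.escape(part) if i % 2 == 0 else f"(?P<{part}>[^/]+)"
        for i, part in enumerate(parts)
    )
    match = re.fullmatch(regex, path)
    if match is None:
        raise ValueError(f"path {path!r} does not match {pattern!r}")
    return match.group(name)


def extract_from_body(pointer: str, body: bytes) -> str:
    """Walk a JSON body using the slash-separated segments of `pointer`."""
    node = json.loads(body)
    for segment in pointer.strip("/").split("/"):
        node = node[segment]
    return str(node)


def extract_variables(variables, headers, path, body):
    """Resolve every variable from its declared source (header, path, or body)."""
    values = {}
    for var in variables:
        source = var["from"]
        kind = source["type"]
        if kind == "header":
            values[var["name"]] = headers[source["value"].lower()]
        elif kind == "path":
            values[var["name"]] = extract_from_path(source["value"], var["name"], path)
        elif kind == "body":
            values[var["name"]] = extract_from_body(source["xpath"], body)
        else:
            raise ValueError(f"unsupported source type {kind!r}")
    return values
```

A missing header or body field raises `KeyError` in this sketch; a real filter would need to decide whether to reject the request or fall back to a default.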

Example Workflow
The application sends a request to the Envoy Gateway with metadata that selects the translation-policy template.
Envoy Gateway:

Matches the request metadata to the translation-policy CR.
Substitutes variables dynamically using values from the request headers, path, and body.
Sends the processed prompt to the LLM backend configured in the LLMRoute CR.
The LLM backend processes the request and sends the response back through the gateway.
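
The substitution step in this workflow can be sketched as follows. This is an illustration under assumptions, not a chosen engine: it supports only the `{{name}}` placeholder form used in the example template, and the `render` function name is hypothetical.

```python
import re

# Matches {{name}} with optional whitespace inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: str, values: dict) -> str:
    """Replace every {{name}} in `template` with values[name]."""
    missing = sorted(
        {name for name in PLACEHOLDER.findall(template) if name not in values}
    )
    if missing:
        # Fail closed rather than send a half-filled prompt to the backend.
        raise KeyError(f"missing template variables: {', '.join(missing)}")
    # re.sub makes a single pass, so substituted text is never re-scanned:
    # a value that itself contains '{{...}}' is inserted literally.
    return PLACEHOLDER.sub(lambda m: str(values[m.group(1)]), template)


prompt = render(
    "Translate '{{text}}' from {{source_language}} to {{target_language}}.",
    {"text": "Good morning", "source_language": "en", "target_language": "fr"},
)
print(prompt)  # Translate 'Good morning' from en to fr.
```

Single-pass substitution keeps user-supplied values from expanding into further placeholders. It does not by itself address prompt injection through the substituted text.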

Technical Implementation Proposal

Prompt Template Engine:

Embed a lightweight templating engine in the Envoy Gateway, or reuse an existing capability.
Templates will be loaded dynamically from Kubernetes CRs.

Custom Resource Definitions (CRDs):

Define a CRD for LLMPolicy that references LLMRoute CRs, so that templates can be attached to routes.
Use the Kubernetes API server for CRUD operations on policies (including prompt templates).

Dynamic Configuration Loader:

Implement a watcher for Kubernetes CR changes to dynamically update the gateway's configuration.
Cache templates in memory for low-latency access during runtime.
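
A minimal sketch of the watcher-fed cache described here, assuming events carry the Kubernetes watch types `ADDED`, `MODIFIED`, and `DELETED` together with the full LLMPolicy object as a dict. The `TemplateCache` class is hypothetical, and the watch loop that would call `apply_event` is omitted.

```python
import threading


class TemplateCache:
    """In-memory index from LLMRoute name to the template attached to it."""

    def __init__(self):
        self._lock = threading.Lock()
        self._by_route = {}  # route name -> (policy name, template string)

    def apply_event(self, event_type: str, policy: dict) -> None:
        """Update the index from one watch event for an LLMPolicy object."""
        name = policy["metadata"]["name"]
        with self._lock:
            # Drop this policy's old entries first, so routes removed from
            # targetRefs (or a deleted policy) stop matching.
            self._by_route = {
                route: entry
                for route, entry in self._by_route.items()
                if entry[0] != name
            }
            if event_type in ("ADDED", "MODIFIED"):
                template = policy["spec"]["policies"]["promptTemplate"]["template"]
                for ref in policy["spec"]["targetRefs"]:
                    if ref["kind"] == "LLMRoute":
                        self._by_route[ref["name"]] = (name, template)

    def template_for(self, route: str):
        """Return the template attached to `route`, or None if there is none."""
        with self._lock:
            entry = self._by_route.get(route)
        return entry[1] if entry else None
```

If two policies target the same route, the last event wins in this sketch; the proposal does not say how such conflicts should be resolved. Returning `None` for an unattached route is one way to enforce that only pre-approved templates are applied.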

Request/Response Processing:

Extend Envoy's filter chain to include a Prompt Templating Filter.

The filter:

Extracts metadata from the request.
Matches the template name from the request with the LLMPolicy CR that defines it.
Applies the template by substituting variables.
Sends the processed prompt to the backend defined in the LLMBackend CR.

missBerg added the enhancement (New feature or request) and api (Control Plane API design) labels on Dec 5, 2024

missBerg assigned missBerg and sanjeewa-malalgoda, then unassigned missBerg, on Dec 12, 2024
