[Design] Prompt Templating #31

Open
1 of 4 tasks
missBerg opened this issue Dec 5, 2024 · 1 comment
Labels: api (Control Plane API design), enhancement (New feature or request)

Comments

@missBerg
Contributor

missBerg commented Dec 5, 2024

The design proposal for Prompt Templating should include:

  • Motivation #49
  • Feature Definition
  • Control Plane API
  • Technical Implementation Proposal

This issue was created from a conversation during the Dec 5th community meeting:
https://docs.google.com/document/d/10e1sfsF-3G3Du5nBHGmLjXw5GVMqqCvFDqp_O65B0_w/edit?tab=t.0#bookmark=id.dz9gpy397ymu

@missBerg missBerg added enhancement New feature or request api Control Plane API design labels Dec 5, 2024
@sanjeewa-malalgoda

sanjeewa-malalgoda commented Dec 12, 2024

Motivation

As AI-based applications grow, managing prompts dynamically at the gateway level can become critical for scalability, security, and operational efficiency. Envoy AI Gateway can leverage Custom Resources (CRs) to manage and apply AI prompt templates as part of its integration with AI backends like LLMs.

Challenges Addressed

  • Handle diverse AI use cases with minimal application logic changes.
  • Centralize prompt template management within the gateway.
  • Ensure only pre-approved templates are used, mitigating risks of misuse.

Feature Definition

The proposed feature introduces AI Prompt Templating Support to the Envoy AI Gateway. The key components are:

Template Definition and Management:

  • Templates are defined as part of a Custom Resource (CR) within Kubernetes.
  • Templates include placeholders (variables) and configurations for specific AI models or tasks.

Dynamic Updates:

  • Templates can be updated through Kubernetes CRs, so changes take effect without disrupting application workflows.

Template Application:

  • The gateway processes templates dynamically based on request metadata and template CR configurations.
  • Templates are applied to both requests and responses.

Integration with LLMRoute CR:

  • The feature integrates with the LLMRoute CR to specify which templates are applicable for a particular LLMRoute.
  • Templates are bound to the LLMRoute configuration by reference, ensuring compatibility.

Pre-Approved Templates:

  • Only templates that are defined in Kubernetes CRs and bound to an LLMRoute CR are applied.

Control Plane Design

Example Custom Resources
LLMPolicy CR: this CR can be extended in the future to carry other LLM policies as well. These can include prompt decoration, prompt templating, and other use cases that change the request payload before it is passed to the LLM backend.

apiVersion: ai.envoyproxy.io/v1alpha1
kind: LLMPolicy
metadata:
  name: translation-policy
spec:
  description: "Policy for managing AI interactions with translation prompts and additional configurations"
  policies:
    promptTemplate:
      description: "Template for translating text between languages"
      variables:
        - name: source_language
          description: "Source language of the text"
          from:
            type: "header"
            value: "x-source-language"
        - name: target_language
          description: "Target language of the text"
          from:
            type: "path"
            value: "/translations/{target_language}" # Extract from path parameter
        - name: text
          description: "The text to translate"
          from:
            type: "body"
            xpath: "/payload/text" # Path expression for extracting the field from the request body
      template: "Translate '{{text}}' from {{source_language}} to {{target_language}}."
  targetRefs:
    - apiVersion: ai.envoyproxy.io/v1alpha1
      kind: LLMRoute
      name: translation-route-1
    - apiVersion: ai.envoyproxy.io/v1alpha1
      kind: LLMRoute
      name: translation-route-2
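
To make the variable sources in the example concrete, here is a minimal Python sketch of how the three `from` types (header, path, body) could be resolved and substituted into the template. It is only an illustration under stated assumptions: the function names are hypothetical, the body is assumed to be JSON, and the `xpath` value is treated as a simple slash-separated key path rather than real XPath. The actual gateway would implement this in its own filter code.

```python
import json
import re

# Hypothetical in-memory form of the promptTemplate section of the LLMPolicy example.
PROMPT_TEMPLATE = {
    "variables": [
        {"name": "source_language", "from": {"type": "header", "value": "x-source-language"}},
        {"name": "target_language", "from": {"type": "path", "value": "/translations/{target_language}"}},
        {"name": "text", "from": {"type": "body", "xpath": "/payload/text"}},
    ],
    "template": "Translate '{{text}}' from {{source_language}} to {{target_language}}.",
}


def extract_path_param(pattern, path, name):
    """Match a pattern such as /translations/{target_language} against a request path."""
    # re.split with a capturing group alternates literal text and parameter names.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = "".join(
        f"(?P<{part}>[^/]+)" if i % 2 else re.escape(part)
        for i, part in enumerate(parts)
    )
    match = re.fullmatch(regex, path)
    if match is None:
        raise ValueError(f"path {path!r} does not match {pattern!r}")
    return match.group(name)


def extract_body_field(body, key_path):
    """Walk a JSON body along a slash-separated key path (assumption: not real XPath)."""
    node = json.loads(body)
    for key in key_path.strip("/").split("/"):
        node = node[key]
    return node


def resolve_variables(spec, headers, path, body):
    """Collect a value for every declared variable from its configured source."""
    values = {}
    for var in spec["variables"]:
        source = var["from"]
        if source["type"] == "header":
            values[var["name"]] = headers[source["value"]]
        elif source["type"] == "path":
            values[var["name"]] = extract_path_param(source["value"], path, var["name"])
        elif source["type"] == "body":
            values[var["name"]] = extract_body_field(body, source["xpath"])
        else:
            raise ValueError(f"unsupported source type {source['type']!r}")
    return values


def render(template, values):
    """Replace each {{name}} placeholder; an undeclared name raises KeyError."""
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: str(values[m.group(1)]), template)


prompt = render(
    PROMPT_TEMPLATE["template"],
    resolve_variables(
        PROMPT_TEMPLATE,
        headers={"x-source-language": "English"},
        path="/translations/French",
        body='{"payload": {"text": "Good morning"}}',
    ),
)
print(prompt)  # Translate 'Good morning' from English to French.
```

Failing loudly on a missing header, a non-matching path, or an unknown placeholder is a deliberate choice in this sketch: a request that cannot fill every variable should probably be rejected rather than forwarded with a half-rendered prompt.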

Example Workflow
The application sends a request to the Envoy AI Gateway with metadata indicating the translation prompt template (the translation-policy CR above).
The gateway:

  • Matches the request metadata to the translation-policy CR.
  • Substitutes variables dynamically using request metadata.
  • Sends the processed prompt to the LLM backend configured in the LLMRoute CR.

The LLM backend processes the request and sends the response back through the gateway.

Technical Implementation Proposal

Prompt Template Engine:

  • Embed a lightweight templating engine in the gateway, or reuse an existing capability.
  • Templates will be loaded dynamically from Kubernetes CRs.

Custom Resource Definitions (CRDs):

  • Define a CRD for LLMPolicy that references LLMRoute CRs, so that templates can be attached to routes.
  • Use the Kubernetes API server for CRUD operations on policies (including prompt templates).
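
One check the control plane could run when a policy is created or updated is that the template and its declared variables agree. The Python sketch below is an assumption about what such validation might look like, not part of the proposal itself; `validate_prompt_template` is a hypothetical name and the input mirrors the `promptTemplate` section of the example CR.

```python
import re

# Matches {{name}} placeholders, allowing optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def validate_prompt_template(spec):
    """Return a list of problems; an empty list means template and variables agree."""
    declared = [v["name"] for v in spec.get("variables", [])]
    used = set(PLACEHOLDER.findall(spec.get("template", "")))
    problems = []
    for name in sorted(used - set(declared)):
        problems.append(f"placeholder '{name}' has no matching variable")
    for name in sorted(set(declared) - used):
        problems.append(f"variable '{name}' is never used in the template")
    for name in sorted({n for n in declared if declared.count(n) > 1}):
        problems.append(f"variable '{name}' is declared more than once")
    return problems


spec = {
    "variables": [{"name": "text"}, {"name": "tone"}],
    "template": "Translate {{text}} to {{target_language}}.",
}
for problem in validate_prompt_template(spec):
    print(problem)
```

Rejecting an inconsistent policy at write time keeps rendering errors out of the request path.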

Dynamic Configuration Loader:

  • Implement a watcher for Kubernetes CR changes to dynamically update the gateway's configuration.
  • Cache templates in memory for low-latency access during runtime.
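
The watcher and cache could fit together roughly as in the following Python sketch. It does not use a real Kubernetes client: events are plain dictionaries with the `ADDED`, `MODIFIED`, and `DELETED` types that Kubernetes watches emit, and `TemplateCache` is a hypothetical name. The object layout follows the LLMPolicy example in this proposal.

```python
import threading


class TemplateCache:
    """Thread-safe in-memory store of prompt templates keyed by (namespace, name)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._templates = {}

    def apply_event(self, event):
        """Apply one watch event shaped like {"type": ..., "object": ...}."""
        meta = event["object"]["metadata"]
        key = (meta.get("namespace", "default"), meta["name"])
        with self._lock:
            if event["type"] in ("ADDED", "MODIFIED"):
                # Store only the part the request path needs.
                self._templates[key] = event["object"]["spec"]["policies"]["promptTemplate"]
            elif event["type"] == "DELETED":
                self._templates.pop(key, None)

    def get(self, namespace, name):
        """Return the cached promptTemplate, or None if the policy is unknown."""
        with self._lock:
            return self._templates.get((namespace, name))


def make_policy(template):
    """Build a minimal LLMPolicy-shaped object for the simulated events below."""
    return {
        "metadata": {"name": "translation-policy"},
        "spec": {"policies": {"promptTemplate": {"template": template}}},
    }


cache = TemplateCache()
cache.apply_event({"type": "ADDED", "object": make_policy("Translate '{{text}}'.")})
cache.apply_event({"type": "MODIFIED", "object": make_policy("Please translate '{{text}}'.")})
print(cache.get("default", "translation-policy")["template"])
```

In a real controller the events would come from a watch on the LLMPolicy resource, and the request path would only ever read from the cache.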

Request/Response Processing:

  • Extend Envoy's filter chain to include a Prompt Templating Filter.

The filter:

  • Extracts metadata from the request.
  • Matches the template name from the request with the promptTemplate defined in an LLMPolicy CR.
  • Applies the template by substituting variables.
  • Sends the processed prompt to the backend defined in the LLMBackend CR.
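
The filter steps can be sketched in Python as follows. This is a simplified illustration with assumptions: the lookup is keyed by the LLMRoute a request matched (since the example CR attaches to routes through `targetRefs`), variables are assumed to be already extracted from the request, and `index_by_route` and `process_request` are hypothetical names. Forwarding to the backend is left out.

```python
import re


def index_by_route(policies):
    """Map each LLMRoute name listed in targetRefs to the policy that targets it."""
    index = {}
    for policy in policies:
        for ref in policy["spec"]["targetRefs"]:
            if ref["kind"] == "LLMRoute":
                index[ref["name"]] = policy
    return index


def process_request(route_name, variables, index):
    """Return the rendered prompt, or raise LookupError if no template is attached."""
    policy = index.get(route_name)
    if policy is None:
        # Only templates attached to the route are applied (pre-approved templates).
        raise LookupError(f"no prompt template is attached to route {route_name!r}")
    template = policy["spec"]["policies"]["promptTemplate"]["template"]
    return re.sub(r"\{\{(\w+)\}\}", lambda m: str(variables[m.group(1)]), template)


policies = [
    {
        "metadata": {"name": "translation-policy"},
        "spec": {
            "policies": {
                "promptTemplate": {
                    "template": "Translate '{{text}}' from {{source_language}} to {{target_language}}."
                }
            },
            "targetRefs": [
                {"kind": "LLMRoute", "name": "translation-route-1"},
                {"kind": "LLMRoute", "name": "translation-route-2"},
            ],
        },
    }
]

index = index_by_route(policies)
prompt = process_request(
    "translation-route-1",
    {"text": "Hello", "source_language": "English", "target_language": "Spanish"},
    index,
)
print(prompt)  # Translate 'Hello' from English to Spanish.
```

Whether a route without an attached template should be rejected or passed through unchanged is a design decision the proposal leaves open; the sketch rejects it to show the pre-approved-only behavior.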
