
[Advanced Paste > Paste with AI] Custom Model / Endpoint Selection #32960

Open
nathancartlidge opened this issue May 22, 2024 · 42 comments
Labels
Idea-Enhancement New feature or request on an existing product Product-Advanced Paste Refers to the Advanced Paste module Tracker Issue that is used to collect multiple sub-issues about a feature

Comments

@nathancartlidge (Contributor)

Description of the new feature / enhancement

It should be possible to configure the model used (currently fixed as gpt-3.5-turbo) and the endpoint (currently fixed as OpenAI's) to arbitrary values.

Scenario when this would be used?

Sending requests to an alternative AI endpoint (e.g. a local model, internally hosted company models, or alternative AI providers), or ensuring higher-quality conversions (e.g. by pointing requests at gpt-4o).

Supporting information

Microsoft's documentation appears to suggest that the underlying library used for AI completions supports other providers; it just needs to be given an endpoint.

The currently used model is a hardcoded string in this repository

@nathancartlidge nathancartlidge added the Needs-Triage For issues raised to be triaged and prioritized by internal Microsoft teams label May 22, 2024
@htcfreek htcfreek added Idea-Enhancement New feature or request on an existing product Product-Advanced Paste Refers to the Advanced Paste module labels May 22, 2024
@minzdrav

minzdrav commented May 23, 2024

It would be nice to have local models too.
For example: https://ollama.com/
It supports Llama 3, Phi-3, and a lot of other models: https://ollama.com/library
C# client: https://github.com/awaescher/OllamaSharp

@nathancartlidge (Contributor, Author)

@minzdrav This would be enabled by my proposed change: Ollama provides partial support for the OpenAI API schema, so you'd be able to point the plugin at your local model (see the sketch below).
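
A minimal sketch of what that looks like (Python, purely illustrative; the module itself is C#). Ollama serves an OpenAI-compatible API on its default port 11434 under /v1; the model name and prompt here are placeholders:

import json
import urllib.request

# Ollama's OpenAI-compatible chat endpoint (default local port)
OLLAMA_URL = "http://localhost:11434/v1/chat/completions"

payload = {
    "model": "llama3",  # any locally pulled model, e.g. `ollama pull llama3`
    "messages": [{"role": "user", "content": "Summarise: PowerToys is a set of utilities."}],
}
req = urllib.request.Request(
    OLLAMA_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["choices"][0]["message"]["content"])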

@wcwong

wcwong commented May 23, 2024

In particular, supporting an Azure OpenAI endpoint would be a great first implementation. It would be even better if the Azure implementation supported Managed Identities, so we don't end up with the unmanageable mess of API key distribution and rotation.
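
For reference, a minimal sketch of the Managed Identity pattern being asked for, using the Python openai and azure-identity packages (the module itself is C#, so treat this purely as a shape illustration; the endpoint and deployment name are placeholders):

from azure.identity import DefaultAzureCredential, get_bearer_token_provider
from openai import AzureOpenAI

# The token provider resolves a managed identity (or developer credentials)
# at call time, so no API key ever needs to be distributed or rotated.
token_provider = get_bearer_token_provider(
    DefaultAzureCredential(),
    "https://cognitiveservices.azure.com/.default",
)
client = AzureOpenAI(
    azure_endpoint="https://YOUR-RESOURCE.openai.azure.com",  # placeholder resource
    azure_ad_token_provider=token_provider,
    api_version="2024-02-01",
)
resp = client.chat.completions.create(
    model="gpt-4o",  # your Azure deployment name
    messages=[{"role": "user", "content": "ping"}],
)
print(resp.choices[0].message.content)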

@htcfreek htcfreek added the Tracker Issue that is used to collect multiple sub-issues about a feature label May 24, 2024
@joadoumie joadoumie removed the Needs-Triage For issues raised to be triaged and prioritized by internal Microsoft teams label Jun 7, 2024
@AmirH-Amini

Supporting Groq would be nice too.

@htcfreek (Collaborator)

IMPORTANT
Regarding the planned custom AI model option: we should make sure that companies can (still) force an opt-out using Group Policy. And I think it would be great if companies could enforce a list of approved endpoints by Group Policy (a sketch of how such a policy could be read follows below).
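
A minimal sketch (Python, purely illustrative) of how a client could consume such an allowlist policy. PowerToys reads its policies from the Software\Policies\PowerToys registry hive, but the value name AdvancedPasteAllowedEndpoints and the semicolon-separated format are invented for this example:

import winreg

POLICY_KEY = r"SOFTWARE\Policies\PowerToys"
POLICY_VALUE = "AdvancedPasteAllowedEndpoints"  # hypothetical value name

def read_allowed_endpoints():
    """Return the enforced endpoint allowlist, or None if no policy is set."""
    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, POLICY_KEY) as key:
            value, _ = winreg.QueryValueEx(key, POLICY_VALUE)
            return [e.strip() for e in value.split(";") if e.strip()]
    except FileNotFoundError:
        return None  # no policy set: any endpoint may be configured

def endpoint_allowed(url: str) -> bool:
    allowed = read_allowed_endpoints()
    return allowed is None or any(url.startswith(prefix) for prefix in allowed)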

@wellmorq

wellmorq commented Jul 3, 2024

bump...

@alexonpeace

bump

@tjtanaa

tjtanaa commented Jul 19, 2024

Has anyone started working on this item?

@nathancartlidge (Contributor, Author)

nathancartlidge commented Jul 19, 2024

Has anyone started working on this item?

To my knowledge, no

The basics should be pretty easy to implement, though! All you'd need to do to allow a different API-compatible host and model is add two text fields to the settings page (model, URL) and wire them in exactly the same way the ChatGPT token field is currently wired into the app (as far as I know, they are just additional inputs to the same function in the associated library).
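
A minimal sketch of the idea, in Python for brevity (the actual module is C#, and the function name here is invented): endpoint and model simply become parameters alongside the existing API key, feeding the same completion call.

import json
import urllib.request

def ai_complete(system_instructions: str, user_message: str, api_key: str,
                endpoint: str = "https://api.openai.com/v1/chat/completions",
                model: str = "gpt-3.5-turbo") -> str:
    # endpoint and model are the two new settings; everything else is unchanged.
    payload = {
        "model": model,
        "messages": [
            {"role": "system", "content": system_instructions},
            {"role": "user", "content": user_message},
        ],
    }
    req = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {api_key}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]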

Obviously, making it "Microsoft-quality" will require more work on documentation and integration; see the points @htcfreek has raised in this thread for examples.

I'd be happy to take a look, but I won't be able to for at least a week so you may be better placed than me.

@htcfreek (Collaborator)

@nathancartlidge, @tjtanaa
Has anyone directly started to implement this feature? No. But @CrazeXD and @joadoumie are working on #33109, and I imagine their plans also include this issue, or at least depend on it.

@nathancartlidge

Obviously, making it "Microsoft-quality" will require more work on documentation and integration; see the points @htcfreek has raised in this thread for examples.

Are you referring to my comment regarding Group Policies above?

@nathancartlidge (Contributor, Author)

Yeah, that's what I was referring to! It's a great addition, but also the kind of thing I'd completely overlook when building this sort of feature :)

I hadn't seen that thread before, thanks for bringing it up. From a cursory reading it does look like their work could currently be independent of this, as it seems to cover exclusively non-AI features; however, I agree that it could make sense to combine them for the sake of reduced development overhead.

@tjtanaa

tjtanaa commented Jul 19, 2024

Has anyone started working on this item?

To my knowledge, no

The basics should be pretty easy to implement, though! All you'd need to do to allow a different API-compatible host and model is add two text fields to the settings page (model, URL) and wire them in exactly the same way the ChatGPT token field is currently wired into the app (as far as I know, they are just additional inputs to the same function in the associated library).

Obviously, making it "Microsoft-quality" will require more work on documentation and integration; see the points @htcfreek has raised in this thread for examples.

I'd be happy to take a look, but I won't be able to for at least a week, so you may be better placed than me.

Thank you very much for the suggestions, @nathancartlidge.

I have a prototype version, which has led me to consider a few changes; it would be great if I could get some input. I am planning to target the local-LLM use case on PCs without a dedicated GPU (in most cases, there are only enough resources to host one model at a time).

  1. I found that the Azure OpenAIClient is not fully compatible with some OpenAI-compatible APIs. I am thinking of implementing a simple class that invokes /v1/completions or /v1/chat/completions directly.
  2. Moreover, many open-source models are primarily chat/instruct models, and the chat-completions endpoint handles the model's prompt template. I am therefore thinking of adding an additional method (private Response<ChatCompletions> GetAIChatCompletion(string systemInstructions, string userMessage)) to the AICompletionHelper class that calls the chat-completions endpoint for custom endpoints.
  3. Within private Response<Completions> GetAICompletion(string systemInstructions, string userMessage), the model is discovered automatically through the /v1/models endpoint (see the sketch after this list).
  4. On the settings page, the user is able to see which endpoint it is pointing to (defaulting to the OpenAI service endpoint).
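
A minimal sketch of points 2 and 3 (Python, illustrative; the base URL is an assumed local OpenAI-compatible server): discover the single hosted model via /v1/models, then call the chat-completions endpoint so the server applies the model's prompt template.

import json
import urllib.request

BASE = "http://localhost:8000"  # assumed local OpenAI-compatible server

def get(path):
    with urllib.request.urlopen(BASE + path) as resp:
        return json.load(resp)

def post(path, payload):
    req = urllib.request.Request(
        BASE + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Point 3: only one model is hosted, so take the first entry from /v1/models.
model = get("/v1/models")["data"][0]["id"]

# Point 2: the chat-completions endpoint applies the prompt template server-side.
reply = post("/v1/chat/completions", {
    "model": model,
    "messages": [
        {"role": "system", "content": "You reformat text."},
        {"role": "user", "content": "as json: a=1, b=2"},
    ],
})
print(reply["choices"][0]["message"]["content"])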

Other feature improvements would be adding some common use cases as quick-access actions in the menu, such as:

  • Explain
  • Summarise
  • Key points

I also saw that there is a branch, dev/crloewen/advancedpaste-v2improvements, that has been adding more features, and that this feature has been previewed through official channels (e.g. I saw a YouTube video about it). But that branch seems to have been stale for two months.
If I am going to start working on this, should I start from that branch?

I am new to Group Policies. How is this feature implemented?

@geekloper

Is there any update on this feature? I’m really looking forward to it! 😊

@Aniket-Bhat

Wouldn't it be easier to add support for OpenRouter? That should cover most of the popular AI models and make the integration easier too, yes?

@CrazeXD

CrazeXD commented Sep 11, 2024

I believe using OpenRouter requires you to use their credit platform. That would not be useful to people who wish to use their own API keys.

@Zaazu

Zaazu commented Sep 12, 2024

I'd like to also advocate for Ollama support.

@k10876

k10876 commented Sep 23, 2024

Hi there,

I've personally made a version that fixes the AI to Aliyun Qwen, one of the AI services available in China. Please note that my edits still do NOT offer custom options, which is somewhat beyond my capabilities.

Since I'm not very experienced, I won't be making a pull request, as it's far from Microsoft's standards. I look forward to expert developers helping with this issue.

If you are interested in my edits, please go to https://github.com/k10876/PowerToys/tree/development-qwen (I'll offer a binary installer if I have time). The interface shows OpenAI, but it's actually Qwen-based. Simply plug in your API keys and get ready to enjoy!
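
For what it's worth, Qwen is also reachable through DashScope's OpenAI-compatible mode, so a configurable base URL and model name would make a fork like this unnecessary. A minimal sketch with the Python openai package (the model name is an example; check DashScope's docs for the current list):

from openai import OpenAI

client = OpenAI(
    api_key="sk-...",  # DashScope API key
    base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
)
resp = client.chat.completions.create(
    model="qwen-turbo",  # example model name
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)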

@nightkall

nightkall commented Oct 23, 2024

Google Gemini 1.5 Flash (Sep) is the fastest AI and has a free API.

It would also be nice to add some default AI actions, or functionality like in Writing Tools:
[screenshot of the Writing Tools menu omitted]

You select some text, choose or write a task for the AI, and it automatically replaces the selection with the AI-processed text (instead of copying and then choosing the option to paste the LLM-processed text). You can also translate by prompting 'in Spanish' (or whatever you want).

@Kiansjet

Kiansjet commented Oct 26, 2024

+1 on this. The quickest, minimal-effort change, since the team is probably busy, is to just let the user override the API root URL: a single settings entry, some string concatenation, and maybe slightly more robust error handling, since the endpoint isn't fixed any more.

Then, if the user wants to redirect the request to a local LLM server, or to a middleman script that proxies the request to a different model, or whatever, they can.

Over time, though: what everyone else above said.


As a temporary solution, I found that software like Fiddler Classic can be used to manually redirect calls to OpenAI's API anywhere you want. I'm not good with regex and I rushed it, but this works:
[screenshot of the Fiddler redirect rule omitted]

It runs just fine on a local gemma-2 9B (IQ4_XS) on LM Studio Server, but I found that some other models may have issues complying with the instruction not to write too much garbage. Llama in particular just kept trying to write me Python to complete the task after it had already completed the task.
[screenshots of the model output omitted]

@bj114514

I also need this feature very much. My network does not allow me to use OpenAI services, and I often have to work offline. I think we could add API address and model name options so that I can use models from Ollama and other providers.

@riedel

riedel commented Nov 3, 2024

Hi, I just want to chime in. Cool feature that I unfortunately cannot use right now...

Having OpenAI endpoints as the only option is, IMHO, a red flag for many commercial users in Europe due to GDPR concerns (especially coupling something to copy & paste).

I also wonder whether such bundling isn't rather anti-competitive in the end. Similar to the availability of different models in GitHub Copilot, it would seem good practice to offer different endpoints.

@HaoTian22

HaoTian22 commented Nov 6, 2024

I also noticed that Advanced Paste is incompatible with some alternative endpoint formats.
Some APIs need the request to contain "messages": [{"role": "user", "content": "xxxx"}] instead of "prompt": ["xxxxx"], and return "text" instead of "message" in the response; otherwise the server might respond with a 500.

I wrote a script for mitmproxy, with the help of an LLM. Maybe it can solve this issue.

Start command: mitmdump --mode local:PowerToys.AdvancedPaste.exe --mode upstream:http://proxy.server -s route.py

import json
from mitmproxy import http


def request(flow: http.HTTPFlow) -> None:
    # Rewrite outgoing Advanced Paste calls aimed at api.openai.com.
    if flow.request.host == "api.openai.com" and flow.request.method == "POST" and flow.request.headers.get("content-type", "").startswith("application/json"):
        try:
            # Redirect to your alternative endpoint
            flow.request.host = "your.alternative.endpoint"
            flow.request.path = "/v1/chat/completions"
            # flow.request.headers["authorization"] = "Bearer sk-xxx"  # token for the new endpoint
            flow.request.headers["http-referer"] = "http://localhost:8080/my_great_app"

            request_data = json.loads(flow.request.get_text())
            request_data["model"] = "gpt-4o-mini"

            # API compatibility: convert the legacy "prompt" field to chat "messages"
            if "prompt" in request_data:
                request_data["messages"] = [{"role": "user", "content": request_data["prompt"][0]}]
                del request_data["prompt"]

            flow.request.set_text(json.dumps(request_data))
        except json.JSONDecodeError:
            pass


def responseheaders(flow):
    # Disable streaming so the full response body can be rewritten below
    flow.response.stream = False


def response(flow: http.HTTPFlow) -> None:
    # Process the alternative endpoint's response
    if flow.request.host == "your.alternative.endpoint" and flow.response.headers.get("content-type", "").startswith("application/json"):
        try:
            response_data = json.loads(flow.response.get_text())

            # API compatibility: move chat-style "message" content into legacy "text"
            if "choices" in response_data:
                for choice in response_data["choices"]:
                    if "message" in choice:
                        choice["text"] = choice["message"].get("content", "")
                        del choice["message"]

            # Dummy usage block, since the client expects one
            response_data["usage"] = {"prompt_tokens": 88, "completion_tokens": 27, "total_tokens": 115}

            flow.response.set_text(json.dumps(response_data))
        except json.JSONDecodeError:
            pass

@an303042

an303042 commented Nov 7, 2024

+1 for ollama support on localhost/IP on local network.

@Jitsins

Jitsins commented Nov 13, 2024

+1 on this. The quickest, minimal-effort change, since the team is probably busy, is to just let the user override the API root URL: a single settings entry, some string concatenation, and maybe slightly more robust error handling, since the endpoint isn't fixed any more.

Then, if the user wants to redirect the request to a local LLM server, or to a middleman script that proxies the request to a different model, or whatever, they can.

Over time, though: what everyone else above said.

As a temporary solution, I found that software like Fiddler Classic can be used to manually redirect calls to OpenAI's API anywhere you want. I'm not good with regex and I rushed it, but this works: [screenshot omitted]

It runs just fine on a local gemma-2 9B (IQ4_XS) on LM Studio Server, but I found that some other models may have issues complying with the instruction not to write too much garbage. Llama in particular just kept trying to write me Python to complete the task after it had already completed the task. [screenshots omitted]

This works surprisingly well. I ended up adding the rules manually in FiddlerScript. Thanks!

@an303042

an303042 commented Nov 16, 2024

This works surprisingly well. I ended up adding the rules manually in FiddlerScript. Thanks!

Trying to get this to run, without success. Can you share your Fiddler script? I keep getting 400 responses from Ollama :(

@varrialeciro

Is there any news on the integration of Advanced Paste with Microsoft Copilot?

@BurningSpy

If they want to stay with OpenAI, which is understandable, I'd at least appreciate it if we could get a model selection for the currently available OpenAI models. For example, I'd prefer using gpt-4o or gpt-4 turbo instead of gpt-3.5.

That would already help a lot!
Local and other APIs as options would be cool too.

@CberYellowstone

Has there been any progress on this?

@risedangel

+1 for this request, any improvements?

@vfxturjo

+1 for this request, any improvements?

@KimLay233

+1 for this request, any improvements?
