
Chat: refactor Ollama chat client #3881

Draft: wants to merge 10 commits into main
Conversation

abeatrix (Contributor) commented:

  1. This PR fixes the issue of using PromptString as the prompt text in the messages we send to the LLM, which caused a regression in chat responses from some Ollama models.

  2. Updates the chat clients for Ollama and Groq with better error handling:
    [screenshot]

  3. Logs usage on completion for easier debugging (see the sketch below the screenshots).

Ollama:
[screenshot]

Groq:
[screenshot]
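For illustration only, a minimal sketch of what point 3 could look like, assuming the usage fields Ollama reports in its final streamed object (prompt_eval_count, eval_count, total_duration); the function name and the logger call are invented, not the PR's code:

```ts
// Sketch only, not the PR's code. Ollama's final streamed object (done: true)
// reports usage stats that can be logged for debugging.
interface OllamaUsageStats {
    done: boolean
    prompt_eval_count?: number // tokens in the prompt
    eval_count?: number // tokens generated
    total_duration?: number // total time, in nanoseconds
}

function logUsageOnComplete(packet: OllamaUsageStats): void {
    if (!packet.done) {
        return
    }
    // The real client presumably routes this through its own logger.
    console.debug('Ollama stream completed', {
        promptTokens: packet.prompt_eval_count,
        completionTokens: packet.eval_count,
        durationMs: (packet.total_duration ?? 0) / 1e6,
    })
}
```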

Test plan

  1. Run this branch in debug mode and follow our Ollama docs to set up Ollama with Cody
  2. Try asking Cody a question

Before

[screenshot]

After

[screenshot]

abeatrix requested review from dominiccooney and a team on April 20, 2024 03:04
@dominiccooney (Contributor) left a comment:

Could you tell me more about the specific reason PromptString was regressing Ollama prompts?

lib/shared/src/llm-providers/ollama/chat-client.ts (review thread: outdated, resolved)
lib/shared/src/llm-providers/ollama/chat-client.ts (review thread: outdated, resolved)
lib/shared/src/llm-providers/ollama/utils.ts (review thread: outdated, resolved)
@dominiccooney (Contributor) left a comment:

I believe there are still problems with Ollama response decoding, and this PR is on the right track. It just needs to go further in handling the incremental responses.

@dominiccooney (Contributor) left a comment:

Specific feedback inline.

    try {
        const { done, value } = await reader.read()
        if (typeof value === 'string') {
            const parsedData = JSON.parse(value) as OllamaGenerateResponse
@dominiccooney (Contributor) commented on the quoted lines:

This could be fragile. We shouldn't assume that the network response is chunked on JSON boundaries, either. We should split on newlines and then decode. The protocol is a bit like Server-Sent Events: "packets" are terminated by newlines.
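A minimal sketch of that approach (illustrative; the helper name is invented and this is not the PR's code): keep a rolling text buffer across reads and only parse complete, newline-terminated packets.

```ts
// A network read may end mid-JSON-object, so buffer the raw text and parse
// only the complete lines; the trailing partial line waits for the next read.
let buffer = ''

function onChunk(chunk: string, onPacket: (packet: unknown) => void): void {
    buffer += chunk
    const lines = buffer.split('\n')
    // The last element may be an incomplete packet; keep it buffered.
    buffer = lines.pop() ?? ''
    for (const line of lines) {
        if (line.trim()) {
            onPacket(JSON.parse(line))
        }
    }
}
```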

@abeatrix (Contributor, Author) replied:

@dominiccooney hey Dom, sorry I haven't got a chance to work on this since I'm still working on my sprint work, but I will try to pick this back up tomorrow. Will mark it as a draft until this is ready again. Appreciate your feedback!

abeatrix marked this pull request as draft on April 26, 2024 04:55
abeatrix changed the title from "Chat: Fix prompt issue in Ollama chat client" to "Chat: refactor Ollama chat client" on Apr 30, 2024
@abeatrix (Contributor, Author) commented:

@dominiccooney I've applied your feedback to the PR; the rough shape is sketched below the list:

  • Use a TextDecoderStream
  • Accumulate the decoded text into a string
  • Break on newlines
  • JSON.parse each chunk and act on it
  • Don't assume that, having seen the end of the network response, you have decoded all the characters in the response yet; the last packet may contain data.
    • Process if (value) first, so any decoded value is handled before checking if (done).
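Putting those points together, the read loop looks roughly like the sketch below. RESPONSE_SEPARATOR and OllamaGenerateResponse are names from the diff; the separator value, the minimal interface, and the function name are assumptions for illustration, not the PR's actual code.

```ts
// Minimal stand-ins for the PR's definitions (assumed, not copied from the diff).
const RESPONSE_SEPARATOR = /\r?\n/
interface OllamaGenerateResponse {
    response?: string
    done?: boolean
}

async function readOllamaChatStream(
    body: ReadableStream<Uint8Array>,
    onPacket: (packet: OllamaGenerateResponse) => void
): Promise<void> {
    // Decode bytes to text with a TextDecoderStream and read incrementally.
    const reader = body.pipeThrough(new TextDecoderStream()).getReader()
    let buffer = ''
    while (true) {
        const { done, value } = await reader.read()
        // Process value before checking done; the last packet may still contain data.
        if (value) {
            buffer += value
            const chunks = buffer.split(RESPONSE_SEPARATOR)
            // A trailing partial line stays buffered for the next iteration.
            buffer = chunks.pop() ?? ''
            for (const chunk of chunks.filter(Boolean)) {
                onPacket(JSON.parse(chunk) as OllamaGenerateResponse)
            }
        }
        if (done) {
            // Whatever is still buffered should be the last complete packet.
            if (buffer.trim()) {
                onPacket(JSON.parse(buffer) as OllamaGenerateResponse)
            }
            break
        }
    }
}
```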

abeatrix marked this pull request as ready for review on April 30, 2024 00:30
    // Splits the decoded chunk by the new lines and filters out empty strings.
    if (value) {
        for (const chunk of value.split(RESPONSE_SEPARATOR).filter(Boolean)) {
            const line = JSON.parse(chunk) as OllamaGenerateResponse
valerybugakov (Member) commented on the quoted lines:

Similar to the completions client, we should buffer the chunks in case they are not cut on JSON boundaries.

What do you think about reusing the response streaming/reading part from the completions client to keep this logic in one place?
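For illustration, one shape such a shared helper could take (all names here are hypothetical, not the completions client's actual API): an async generator over newline-delimited JSON packets that both the chat and completions clients iterate.

```ts
// Hypothetical shared helper: buffers decoded text and yields one parsed,
// newline-terminated JSON packet at a time.
async function* readNDJSONStream<T>(body: ReadableStream<Uint8Array>): AsyncGenerator<T> {
    const reader = body.pipeThrough(new TextDecoderStream()).getReader()
    let buffer = ''
    while (true) {
        const { done, value } = await reader.read()
        if (value) {
            buffer += value
            const lines = buffer.split('\n')
            buffer = lines.pop() ?? '' // a partial trailing line stays buffered
            for (const line of lines) {
                if (line.trim()) {
                    yield JSON.parse(line) as T
                }
            }
        }
        if (done) {
            if (buffer.trim()) {
                yield JSON.parse(buffer) as T
            }
            return
        }
    }
}

// Either client could then consume the stream in one place, e.g.:
// for await (const packet of readNDJSONStream<OllamaGenerateResponse>(response.body!)) { ... }
```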

@abeatrix (Contributor, Author) replied:

@valerybugakov yea that'd be great! I will update this PR to reuse the one you created!

abeatrix marked this pull request as draft on May 20, 2024 18:37