
Chat: sync token limit at model import time #3486

Merged: 17 commits merged into main from bee/sync-model-token on Mar 22, 2024
Conversation

@abeatrix (Contributor):

Building block for #3455

Previously, we had to check the authStatus and config every time we wanted to get the token limit for a model. This was inefficient: every caller needed access to authProvider and repeated the same config checks.

With this change, we sync the token limit along with the model when we import models. By including the contextWindow property in the model definitions, we can access the token limit for each model anywhere, without repeating the checks we used to do at import time.

This eliminates scattered token limit checks throughout the codebase and centralizes the token limit configuration within the model definitions, making the code easier to understand, maintain, and modify.
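To make that concrete, here is a minimal sketch of the shape this gives each model definition. The interface, field names beyond contextWindow, and the values are illustrative assumptions, not the actual definitions in lib/shared/src/models/dotcom.ts:

```ts
// Illustrative only: field names beyond `contextWindow` are assumptions.
interface ChatModel {
    model: string // e.g. 'anthropic/claude-2.1'
    default: boolean
    contextWindow: number // token limit, fixed when the model is imported
}

const defaultModels: ChatModel[] = [
    // The contextWindow value here is hypothetical, not the real limit.
    { model: 'anthropic/claude-2.1', default: true, contextWindow: 7000 },
]

// Any caller can read the limit directly, with no authStatus or config checks:
const tokenLimit = defaultModels[0].contextWindow
```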

Test plan

Check token limit

@dominiccooney (Contributor) left a comment:

Thanks for working on this! Some suggestions for naming to help people catch mixing up byte/character and token counts.

lib/shared/src/models/dotcom.ts (outdated, resolved):
```ts
) {
    const splittedModel = model.split('/')
    this.provider = getProviderName(splittedModel[0])
    this.title = splittedModel[1]?.replaceAll('-', ' ')
    this.default = isDefaultModel
    this.default = true
    this.contextWindow = tokenLimit ? tokenLimit * 4 : DEFAULT_FAST_MODEL_TOKEN_LIMIT
```
@dominiccooney (Contributor):

It would be great to put `tokenLimit * 4` inside a function, say `tokenCountToByteCount(tokenLimit)`, so that when we make this accurate we don't have to hunt the codebase for the digit 4 and can instead find uses of that function.

@abeatrix (Author):

@dominiccooney Thank you for pointing this out! I have renamed them to use character limit (since we already have a constant called `CHARS_PER_TOKEN` and a function called `tokensToChars`) and token limit, and updated the code based on your feedback.
I also added unit tests to cover the token and character limits.
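For reference, a minimal sketch of the pattern under the new names. The constant's value of 4 is assumed from the `tokenLimit * 4` expression in the excerpt above; the actual implementation in the codebase may differ:

```ts
// Assumed value, taken from the `tokenLimit * 4` expression above.
export const CHARS_PER_TOKEN = 4

// A named conversion instead of a bare `* 4`: when the estimate is made
// accurate later, there is one place to change, and call sites are findable.
export function tokensToChars(tokens: number): number {
    return tokens * CHARS_PER_TOKEN
}
```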

lib/shared/src/models/index.ts (outdated, resolved)
@abeatrix requested review from @dominiccooney and a team on March 21, 2024.
@dominiccooney (Contributor) left a comment:

Thanks for making it clear about tokens vs chars.

We can simplify this more. See how `ModelProvider.get` has a side effect, overwriting the primary models if the user is a dotcom user? It is surprising for a "get" to change state. And see how many auth status listeners push auth status to the model provider? That's too much. This can be simplified:

  • Make a class for holding all the models.
  • Move the statics from `ModelProvider` so they become instance properties of the new class.
  • Create an instance of the class when the extension starts.
  • Subscribe this new instance to auth status changes.
  • Do the dotcom handling along with the rest of the auth status change handling.
  • Then you don't need to call `setModelProvider` from all the other places where auth status changes.

Ideally, pass the instance of the new class to every place that needs models. But if that is too hard in one patch, you can make one static method to hang onto the single instance of it and we will clean it up later.
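A rough sketch of the shape being suggested. `ModelsService`, `AuthStatus`, `getDotComDefaultModels`, and the subscription signature are illustrative names and assumptions, not the actual API:

```ts
// Minimal stand-in types; the real ones live in lib/shared.
interface AuthStatus {
    isDotCom: boolean
}

interface ModelProvider {
    model: string
    contextWindow: number
}

// Assumed helper returning the dotcom default model list.
declare function getDotComDefaultModels(): ModelProvider[]

class ModelsService {
    // Former statics on ModelProvider become instance properties here.
    private primaryModels: ModelProvider[] = []

    constructor(onAuthStatusChange: (cb: (status: AuthStatus) => void) => void) {
        // One subscription, created when the extension starts, replaces the
        // N scattered listeners that pushed auth status to the model provider.
        onAuthStatusChange(status => this.handleAuthStatusChange(status))
    }

    private handleAuthStatusChange(status: AuthStatus): void {
        // Dotcom handling lives with the rest of the auth handling, instead
        // of as a surprising side effect inside a `get`.
        this.primaryModels = status.isDotCom ? getDotComDefaultModels() : []
    }

    getModels(): ModelProvider[] {
        // A pure read: `get` no longer mutates state.
        return this.primaryModels
    }
}
```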

lib/shared/src/models/index.ts (outdated, resolved)
vscode/src/chat/chat-view/SimpleChatPanelProvider.ts (outdated, resolved)
@dominiccooney (Contributor) left a comment:

Very nice. Some more feedback inline; I leave it up to your judgement.

lib/shared/src/models/index.ts (outdated, resolved):
```ts
if (process.env.CODY_SHIM_TESTING === 'true') {
    return []
}
// TODO (bee) watch file change to determine if a new model is added
```
@dominiccooney (Contributor):

There's no file to watch?

@abeatrix (Author):

I was thinking we could watch for changes in ~/.ollama/models/manifests/registry.ollama.ai/library, where the Ollama models are stored.
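A minimal sketch of what that watcher could look like with Node's fs.watch; `refreshOllamaModels` is a hypothetical hook into the model registry, and recursive watching has platform caveats:

```ts
import { watch } from 'node:fs'
import { homedir } from 'node:os'
import { join } from 'node:path'

// The directory mentioned above, where Ollama keeps its model manifests.
const ollamaLibrary = join(
    homedir(),
    '.ollama/models/manifests/registry.ollama.ai/library'
)

// Refresh the model list whenever a manifest is added, removed, or changed.
// Note: { recursive: true } is supported on macOS and Windows, and on Linux
// only in recent Node.js versions.
watch(ollamaLibrary, { recursive: true }, (_eventType, filename) => {
    console.log(`Ollama manifest changed: ${filename ?? 'unknown'}`)
    // refreshOllamaModels() // hypothetical hook into the model registry
})
```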

lib/shared/src/models/index.ts (outdated, resolved):
```ts
        ])
    )
}
setModelProviders(authStatus)
```
@dominiccooney (Contributor):

I guess one place is better than N places... Because the models are static, I think we should wire up setting the model providers to auth status at the top level. For example, if we made `ChatManager` lazier but still queried models for edits, we would have a bug.

Having less global state, and giving the model provider object responsibility for keeping itself up-to-date instead of free-riding on nearby objects, will make it easier to keep evolving this code.
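Continuing the illustrative `ModelsService` sketch from the earlier comment, the top-level wiring might look like this (the entry point and parameter shape are hypothetical):

```ts
// Hypothetical extension entry point; names are illustrative.
export function activate(authProvider: {
    onAuthStatusChange(cb: (status: AuthStatus) => void): void
}): ModelsService {
    // Wire models to auth status once, at the top level, so ChatManager and
    // friends never need to call setModelProviders themselves.
    const modelsService = new ModelsService(cb => authProvider.onAuthStatusChange(cb))
    return modelsService
}
```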

@abeatrix merged commit da1fe66 into main on Mar 22, 2024. 20 checks passed.
@abeatrix deleted the bee/sync-model-token branch on March 22, 2024.