tracking: context limit issues with @-mentioned files #3455

Closed
10 tasks done
abeatrix opened this issue Mar 18, 2024 · 2 comments
abeatrix commented Mar 18, 2024

This issue is spun off from #2965, where the original goal was to display warnings when @-mentioning massive files.

Context: https://sourcegraph.slack.com/archives/C05AGQYD528/p1710331429302729

Current state

Here is the current state of @-mentioning massive files summarized by @chillatom:

  1. The file size "check" that we use to display the warning is not the real token limit applied by the system; it is an approximation. This leads to odd behavior: we trigger a warning, but you can still include the file, or successfully copy and paste the file contents into the chat message.
  2. The warning check is computed per file, so if files A and B are each under the estimated limit, we trigger no warning, even if A+B is above the limit.
  3. If the file is actually too large to be included but the user submits anyway (e.g. vscode/src/completions/logger.ts, which is 860 LOC and ~6,700 tokens), it appears that we silently exclude the file from context and Cody hallucinates. 🔴
  4. If the file triggers the warning but isn't too large to be included (e.g. vscode/src/local-context/symf.ts, which is 641 LOC and 4,867 tokens), I see the content actually used as context and referenced correctly.
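
To make the gap in point 2 concrete, here is a minimal TypeScript sketch contrasting a per-file check with a combined check. Everything in it is hypothetical: the `MentionedFile` type, the function names, and the `CONTEXT_TOKEN_LIMIT` value are assumptions for illustration and are not taken from the Cody codebase or its real limit.

```typescript
// Hypothetical types and limit; none of these names come from the Cody codebase.
interface MentionedFile {
    path: string
    tokens: number // token count from whatever tokenizer the client uses
}

const CONTEXT_TOKEN_LIMIT = 6000 // assumed value, for illustration only

// Per-file check: flags only files that individually exceed the limit.
function filesOverLimit(files: MentionedFile[], limit: number): string[] {
    return files.filter(f => f.tokens > limit).map(f => f.path)
}

// Combined check: flags the whole set when the summed token count exceeds the limit.
function totalOverLimit(files: MentionedFile[], limit: number): boolean {
    const total = files.reduce((sum, f) => sum + f.tokens, 0)
    return total > limit
}

const mentioned: MentionedFile[] = [
    { path: 'a.ts', tokens: 4000 },
    { path: 'b.ts', tokens: 4000 },
]

console.log(filesOverLimit(mentioned, CONTEXT_TOKEN_LIMIT)) // [] (no per-file warning)
console.log(totalOverLimit(mentioned, CONTEXT_TOKEN_LIMIT)) // true (combined size is over)
```

With these made-up numbers, neither file trips the per-file check, yet together they exceed the assumed limit, which is the case the current warning misses.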

At a minimum, it feels like we should do a few things:

  1. Tie our warning to the actual operation of the product. If we say a file is too large, it should not be possible to include it.
  2. We should not silently exclude a file that has been explicitly @-mentioned.
  3. Consider the case of multiple @-mentioned files.
  4. Prioritize @-mentioned files over other fetched context (not sure if we do this today) so that we avoid silently removing a file the user explicitly referenced.
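
One possible shape for points 2 and 4, sketched in TypeScript under stated assumptions: `ContextItem`, `BudgetResult`, and `fitToBudget` are hypothetical names, and the greedy strategy is just one option, not how Cody actually assembles context. The idea is to place explicitly mentioned files first and to return whatever did not fit, so the caller can surface a warning instead of dropping files silently.

```typescript
// Hypothetical shapes; not taken from the Cody codebase.
interface ContextItem {
    path: string
    tokens: number
    explicit: boolean // true when the user @-mentioned the file
}

interface BudgetResult {
    included: ContextItem[]
    excluded: ContextItem[] // returned so the caller can warn about them
}

function fitToBudget(items: ContextItem[], limit: number): BudgetResult {
    // Explicit @-mentions go first; the sort is stable, so original order
    // is preserved within each group.
    const ordered = [...items].sort((a, b) => Number(b.explicit) - Number(a.explicit))
    const result: BudgetResult = { included: [], excluded: [] }
    let used = 0
    for (const item of ordered) {
        if (used + item.tokens <= limit) {
            result.included.push(item)
            used += item.tokens
        } else {
            // Does not fit in the remaining budget; keep going in case a
            // smaller item later in the list still fits.
            result.excluded.push(item)
        }
    }
    return result
}

const result = fitToBudget(
    [
        { path: 'search-result.ts', tokens: 3000, explicit: false },
        { path: 'mentioned.ts', tokens: 5000, explicit: true },
        { path: 'huge.ts', tokens: 6700, explicit: true },
    ],
    7000 // assumed limit, for illustration only
)

// Explicitly mentioned files that were dropped are the ones worth warning about.
const droppedMentions = result.excluded.filter(item => item.explicit).map(item => item.path)
console.log(droppedMentions) // ['huge.ts']
```

Returning the excluded items rather than discarding them keeps the decision about how to warn (inline message, blocked submit, and so on) in the UI layer.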

Over the longer term, I think we should explore:

  • Expanding the context windows, especially for some of the flagship models (I'll be writing a proposal here)
  • Consider summarization or proposition extraction from the file

Design Tasks


Engineering Tasks


  1. cody
  2. bug clients/jetbrains cody
  3. cody
    abeatrix

Other design ideas from @toolmantim: #3439 (comment)

abeatrix self-assigned this Mar 18, 2024
abeatrix changed the title from "Deal with @-mentioning massive files" to "tracking: context limit issues with @-mentioned files" Mar 18, 2024
taylorsperry commented

Just a note that once we've aligned on a path forward and those changes have shipped, we should ping @MaedahBatool to make sure the docs are up to date. (We know users have been confused about this.)

kalanchan commented

landed in v1.12
