Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Automatically start Embeddings indexing #4091

Merged
merged 8 commits into from May 16, 2024
Merged

Conversation

rafax
Copy link
Contributor

@rafax rafax commented May 8, 2024

Start Embedding indexing process on extension startup provided that Sourcegraph embedding API is used instead of OpenAI. Additionally, restarts embedding indexing process if a partial index is found.

Currently behind a dedicated feature flag, default off.

This doesn't remove the "Embedding consent" functionality as it is still needed for OpenAI (the current default).

Test plan

  • tested locally with un-indexed/partially indexed/indexed repos
    • un-indexed repo will automatically be indexed
    • partially indexed repo will continue indexing
    • fully indexed repo will be used for search (without re-indexing)
  • tested locally with repos not eligible for embedding
  • tested locally with auto-indexing flag off (caught an issue with indexing status not being reflected correctly)
  • tested locally with sourcegraph-embeddings flag off (making this PR a no-op)

@rafax rafax changed the title Automatically start indexing for Embeddings Automatically start Embeddings indexing May 8, 2024
@@ -310,6 +310,11 @@ const register = async (
searchViewProvider.initialize()
}

if (localEmbeddings) {
// kick-off embeddings initialization
localEmbeddings.start()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question to reviewers - is this the right place to start this? We start symf indexing in the block above, and the call to .start() takes <1ms in local testing, so this should be fine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds acceptable to me.

@rafax rafax requested review from dominiccooney and a team May 8, 2024 13:11
@rafax rafax marked this pull request as ready for review May 8, 2024 13:17
if (!loadedOk) {
// failed to load the index, let's see if we should start indexing
if (this.autoIndexingEnabled && this.modelConfig.provider === 'sourcegraph') {
this.index()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not an expert in this part of the code, so I need someone to verify my concern here.

It might be the case that this.eagerlyLoad will return true, even if there are partial embeddings present for the repo. In that case, we will need check the status and call this.indexRetry.

I see it mentioned in the test plan that you have covered this case but I just wanted to double-check.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@rafax rafax May 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for checking and raising this - the existing treatment for partially indexed repos is that we require user action to continue indexing (screenshot), I tested that that this PR doesn't change that logic, I planned to have a separate PR for "automatic restart" once I understand why we don't automatically re-index in the first place (what comes to mind and what I saw in code is that some of the indexing processes can kill the agent, so automatic re-start could make Cody / agent crashloop?)
Screenshot 2024-05-09 at 09 05 10

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dominiccooney (as the original author of this) - is there a specific reason why we don't automatically re-start indexing after the extension is restarted? Would you be ok with making that change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this PR to restart indexing on extension startup (when no errors occurred in the current run) if we're using Sourcegraph provider and the feature-flag is enabled.

@@ -310,6 +310,11 @@ const register = async (
searchViewProvider.initialize()
}

if (localEmbeddings) {
// kick-off embeddings initialization
localEmbeddings.start()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This sounds acceptable to me.

@rafax rafax requested a review from thenamankumar May 15, 2024 13:08
Copy link
Member

@thenamankumar thenamankumar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@rafax rafax enabled auto-merge (squash) May 15, 2024 14:27
@rafax rafax merged commit e3abb63 into main May 16, 2024
18 of 19 checks passed
@rafax rafax deleted the rg/auto_enable_embeddings branch May 16, 2024 07:59
rafax added a commit that referenced this pull request May 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants