infra: attempt to identify and reduce CI flakiness #2644

Draft: wants to merge 3 commits into base: master

@idoros (Collaborator) commented Aug 3, 2022

This draft helps identify the causes of CI flakiness and eliminate them. Each flaky case is logged here, researched, reproduced where possible, and then fixed in a separate PR.

Known cases


webpack e2e - timeout 10,000ms

packages\webpack-plugin\dist\test\e2e\3rd-party-standalone.spec.js

  • (3rd-party-standalone)
    • render css
    • override 3rd party

packages\experimental-loader\dist\test\basic-integration.spec.js

  • (basic-integration)
    • render css
    • css applied correctly

rollup e2e - timeout 30,000ms (macOS)

packages/rollup-plugin/dist/test/rollup-stc-config.spec.js

  • StylableRollupPlugin
    • should recovers in watch mode when broken "stc" source file in invalid

CLI diagnostics - timeout 25,000ms (windows)

packages\cli\dist\test\cli.spec.js

  • Stylable CLI
    • CLI diagnostics
      • should report diagnostics by default and exit the process with error exit 1

webpack watched project fails to update

packages/webpack-plugin/test/e2e/stc-watched-project.spec.ts

  • (st-watched-project)
    • build "stc" and webpack in the correct order

Fixed in #2655, which improves the test runner's actAndWait mechanism.


OUT OF MEMORY (macOS)

  • seems to only happen on macOS (all node versions)
  • fails after 40-50 minutes; something appears to be stuck in a loop

Fixed in #2647, which reduces the total job timeout. Instead of waiting 40-50 minutes for an unhelpful JavaScript heap out of memory error, the job now fails faster with a clearer log that shows the status of the related tests.


process not released (macOS)

packages/webpack-plugin/test/e2e/watched-project.spec.ts

This is the core issue behind the OUT OF MEMORY flakiness. With that resolved, we now get a more helpful log sooner (see the time gap between the test and the job failure):

image
This has something to do with the watch mechanism not releasing the test (maybe in the after hook) AND the test not timing out, but I couldn't reproduce it.

Note that when this happens, the Complete job step of the action has many orphan processes to terminate:
image

2022-01-18: this seems to be related to the CLI test-kit not clearing its timeout when running the CLI. Fixed in #2810.
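
For illustration only, here is a minimal sketch of the general pattern behind this kind of leak: a guard timer that is started next to some async work but never cleared will keep the Node process alive after the work is done. The `withTimeout` helper and its parameters are hypothetical and are not the actual CLI test-kit API; the real fix may look different.

```typescript
// Hypothetical helper (not the real CLI test-kit code): race some async work
// against a guard timer and always clear the timer afterwards.
async function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
    let timer: ReturnType<typeof setTimeout> | undefined;

    // Rejects if the work takes longer than `ms` milliseconds.
    const timeout = new Promise<never>((_, reject) => {
        timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms}ms`)), ms);
    });

    try {
        // Whichever promise settles first decides the outcome.
        return await Promise.race([work, timeout]);
    } finally {
        // Without this, a fast run leaves a pending timer behind,
        // and that timer keeps the process from exiting.
        clearTimeout(timer);
    }
}
```

Clearing the timer in `finally` covers both outcomes, so a run that finishes early does not leave a handle pending until the timeout would have fired.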

@idoros added the infrastructure and dev velocity labels Aug 3, 2022
@idoros self-assigned this Aug 3, 2022
@idoros force-pushed the ido/cli-flakiness branch 2 times, most recently from 41f72d9 to 017e788 August 9, 2022 08:39
@idoros added the plan label Aug 9, 2022
@idoros marked this pull request as draft August 9, 2022 10:50
Labels: dev velocity, infrastructure (Git, CI or otherwise infrastructure related), plan (a plan for organizing larger amounts of work)
Project status: 🎬 Ready for Work