
Fatal trace logic error - expected start event, got X #7764

Closed
connorjclark opened this issue Mar 28, 2019 · 27 comments
@connorjclark
Collaborator

connorjclark commented Mar 28, 2019

I attached the assets from a LR run that resulted in this error message.

assets.txt

I'd have provided the LHR too, but it's on a remote machine, too big, and scp keeps crashing (side note: what???)

Some notes:

  • could not get this to happen locally, neither via node lighthouse nor a local WRS. Only seems to happen in production LR environments
  • the provided assets were from running against bloomberg.com. @exterkamp has a whole list of other URLs it happens on too.
  • does not happen in prod LR
  • happens on ~5% of requests in canary LR

Related issues

#7691

@patrickhulce patrickhulce self-assigned this Mar 28, 2019
@patrickhulce
Collaborator

patrickhulce commented Mar 28, 2019

The issue here is that there's a v8.callFunction child event that is trying to be larger than its V8.Execute parent. It's only by 1 microsecond, so maybe if we wanted to be forgiving we could try to adjust event end times within some allowable threshold?
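That forgiving approach could look something like the following. This is a hypothetical sketch, not Lighthouse's actual implementation; the function name and the 1µs tolerance are assumptions:

```javascript
// Clamp a child trace event's end time to its parent's end when the
// overshoot is within a small tolerance (e.g. 1µs of clock jitter).
const ALLOWABLE_OVERSHOOT_US = 1; // assumption: forgive up to 1µs

/**
 * @param {{ts: number, dur: number}} parent complete event, µs timestamps
 * @param {{ts: number, dur: number}} child event nested inside parent
 * @returns {{ts: number, dur: number}} child, with dur trimmed if needed
 */
function clampChildToParent(parent, child) {
  const parentEnd = parent.ts + parent.dur;
  const overshoot = child.ts + child.dur - parentEnd;
  if (overshoot > 0 && overshoot <= ALLOWABLE_OVERSHOOT_US) {
    // Trim the child's duration so it ends exactly with its parent.
    return {...child, dur: parentEnd - child.ts};
  }
  // Larger overshoots are real logic errors and are left untouched here.
  return child;
}
```

With the events from this issue (a 5µs V8.Execute parent and a 6µs v8.callFunction child starting at the same timestamp), the child would be trimmed to 5µs.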

@brendankenny
Member

That definitely seems weird; you would think the trace system wouldn't let that be possible (logging an event isn't responsible for its own timing, right?). But I also vaguely remember dealing with something like this back when we were figuring out timestamp sorting.

@paulirish is trace viewer flexible in this way?

@patrickhulce
Collaborator

when we were figuring out timestamp sorting.

Yeah, I remember having trouble there, but that was when they were equal. This time it actually ends 1 microsecond after its parent should have ended.

@brendankenny
Member

but that was when they were equal though.

Ah, I was thinking about when we were looking at tts, since in theory if two things are in the same thread their tts should put them in the same order that ts does, but somehow it turns out that's not always true (or wasn't at the time).

Even less of an analogous situation :) but I think I remember trace viewer (or devtools timeline model?) just rolling with it since it sometimes happens, so I'm curious if it does the same for this.

@connorjclark
Collaborator Author

The issue here is that there's a v8.callFunction child event that is trying to be larger than its V8.Execute parent. It's only by 1 microsecond, so maybe if we wanted to be forgiving we could try to adjust event end times within some allowable threshold?

Could you point out the relevant trace events here? My first time looking at this trace and dunno how to find this myself

@connorjclark
Collaborator Author

for the uninitiated

@patrickhulce
Collaborator

Could you point out the relevant trace events here? My first time looking at this trace and dunno how to find this myself

I popped the trace into main-thread-tasks-test.js and put a debugger on the throw line :)

I deleted the setup but can split it back out again if you need the timestamps?

@connorjclark
Collaborator Author

connorjclark commented Apr 3, 2019

I deleted the setup but can split it back out again if you need the timestamps?

i gots it

(screenshot)

just the good parts:

{
"traceEvents": [
  {"pid":25,"tid":25,"ts":0,"ph":"M","cat":"__metadata","name":"thread_name","args":{"name":"CrRendererMain"}},
  {"pid":2,"tid":2,"ts":5369216088,"ph":"I","cat":"disabled-by-default-devtools.timeline","name":"TracingStartedInBrowser","s":"t","tts":126876,"args":{"data":{"frameTreeNodeId":1,"persistentIds":true,"frames":[{"frame":"CC0F3EE5A17C1A7B522189106CB19588","url":"about:blank","name":"","processId":25}]}}},
  {"pid":25,"tid":25,"ts":5369233648,"ph":"R","cat":"blink.user_timing","name":"navigationStart","tts":553722,"args":{"frame":"CC0F3EE5A17C1A7B522189106CB19588","data":{"documentLoaderURL":"https://www.bloomberg.com/","isLoadingMainFrame":true,"navigationId":"8E68D13D5E2F56F4F5DC6A966190ED1A"}}},
  {"pid":25,"tid":25,"ts":5369487132,"ph":"R","cat":"loading,rail,devtools.timeline","name":"firstPaint","tts":780878,"args":{"frame":"CC0F3EE5A17C1A7B522189106CB19588","data":{"navigationId":"8E68D13D5E2F56F4F5DC6A966190ED1A"}}},
  {"pid":25,"tid":25,"ts":5369487134,"ph":"R","cat":"loading,rail,devtools.timeline","name":"firstContentfulPaint","tts":780981,"args":{"frame":"CC0F3EE5A17C1A7B522189106CB19588","data":{"navigationId":"8E68D13D5E2F56F4F5DC6A966190ED1A"}}},
  {"pid":25,"tid":25,"ts":5370649595,"ph":"B","cat":"v8","name":"V8.Execute","tts":1298190,"args":{},"meta":"current task.parent"},
  {"pid":25,"tid":25,"ts":5370649600,"ph":"E","cat":"v8","name":"V8.Execute","tts":1298194,"args":{"runtime-call-stats":{}},"meta":"event"},
  {"pid":25,"tid":25,"ts":5370649595,"ph":"X","cat":"v8","name":"v8.callFunction","dur":6,"tdur":6,"tts":1298189,"args":{},"meta":"current task"}
]}

The surrounding v8 events might be relevant, but the above does give the same error.
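For anyone following along, the arithmetic behind the failure can be checked directly from the three v8 events above:

```javascript
// The V8.Execute parent is bounded by its 'B'/'E' pair; the v8.callFunction
// child is a complete ('X') event whose end is ts + dur.
const parentEnd = 5370649600;    // ts of the V8.Execute 'E' event
const childEnd = 5370649595 + 6; // ts + dur of the v8.callFunction 'X' event

// The child ends 1 microsecond after its parent -- the overlap that trips
// the "expected start event, got X" check.
console.log(childEnd - parentEnd); // → 1
```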

@connorjclark
Collaborator Author

diff --git a/lighthouse-core/computed/main-thread-tasks.js b/lighthouse-core/computed/main-thread-tasks.js
index 8efb0f32..207f436e 100644
--- a/lighthouse-core/computed/main-thread-tasks.js
+++ b/lighthouse-core/computed/main-thread-tasks.js
@@ -124,6 +124,9 @@ class MainThreadTasks {
         tasks.push(newTask);
         currentTask = newTask;
       } else {
+        while (currentTask.parent && currentTask.event.ph === 'X') {
+          currentTask = currentTask.parent;
+        }
         if (currentTask.event.ph !== 'B') {
           throw new Error(
             `Fatal trace logic error - expected start event, got ${currentTask.event.ph}`);

@patrickhulce does this make any sense? passed all the tests + error doesn't happen

@patrickhulce
Collaborator

Kinda, but it's definitely a fatal trace logic error for trace events to overlap like that. Later logic will generate nonsensical numbers if two events are executing at once like this, or if events aren't directly nested within their parents. We would have to force the X event to end at the timestamp of the B event if we go this route.

We should definitely add a test for this case too so something would've caught it; I can get on that :)
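A rough sketch of that recovery route, building on the diff above (hypothetical code, not the shipped fix; the `endTime` field and function name are assumptions):

```javascript
// When a new event arrives while 'X' (complete) tasks are still treated as
// open, walk up past them, forcing each one to end no later than the
// incoming event's timestamp so it cannot overlap its successor.
function closeDanglingCompleteTasks(currentTask, incomingTs) {
  while (currentTask.parent && currentTask.event.ph === 'X') {
    // Truncate the complete event rather than let it outlive the new event.
    currentTask.endTime = Math.min(currentTask.endTime, incomingTs);
    currentTask = currentTask.parent;
  }
  return currentTask;
}
```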

@Nikunj1703

Hi, I am also getting this error when I run the Lighthouse test from the CI. It never happens when I run the test locally.
This error appears in the Diagnostics of performance metrics for:

  1. Reduce JavaScript execution time
  2. Minimize main-thread work

Is there a work-around I can put in place?

@patrickhulce patrickhulce removed the 5.0 label May 28, 2019
@andystalick

We are still seeing intermittent Fatal trace logic error - expected start event, got X when testing our React site from the Lighthouse 5.0.0 system on web.dev.

@patrickhulce
Collaborator

Thanks for letting us know @andystalick! 5.0 shipped with a few of the workarounds but was still relatively conservative. It sounds like we'll have to be a bit more aggressive in our recovery from Chrome errors to fully take care of this.

@benschwarz
Contributor

@patrickhulce Seeing this on LH 5.1 too

@patrickhulce
Collaborator

Thanks @benschwarz, there was a lot more discussion around these cases in aslushnikov/tracium#8, whose fixes will be making their way over in our tracehouse refactor in the next few weeks.

@christian-bromann

christian-bromann commented Jun 18, 2019

In case this info helps: we at Sauce are only experiencing this issue (Fatal trace logic error - expected start event, got X) when running a Lighthouse test on Windows using Chrome 75 (see performance.json); on macOS it works flawlessly (see performance.json).

@christian-bromann

To give some context: we are using Puppeteer and Lighthouse to trace the browser when being automated via WebDriver and we experience the error above only for Chrome v75 on Windows. Looking into the flame chart of a simple trace (e.g. for a page like https://this-page-intentionally-left-blank.org/) we see more than just ordering issues, e.g.

Chrome v74 Windows (working_tracelog.txt):
(screenshot)

Chrome v75 Windows (borked_tracelog.txt):
(screenshot)

I see the same issue when using Tracium to get all computed main-thread tasks. If I remove the following 3 events from the borked tracelog, I get proper results back:

// ...
{"pid":4048,"tid":3652,"ts":67225000,"ph":"X","cat":"v8","name":"LocalWindowProxy::Initialize","dur":19000,"args":{"IsMainFrame":true}},
// ...
{"pid":4048,"tid":3652,"ts":67258000,"ph":"X","cat":"disabled-by-default-devtools.timeline","name":"RunTask","dur":7000,"args":{}},
// ...
{"pid":4048,"tid":3652,"ts":67276000,"ph":"X","cat":"disabled-by-default-devtools.timeline","name":"RunTask","dur":29000,"args":{}},
// ...

I will look more into this to see if I can make Lighthouse more forgiving without impacting the performance results.
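The manual workaround above could be scripted as a throwaway filter (hypothetical code; it just matches the three listed events by pid/tid/phase/timestamp before handing the trace to Lighthouse or Tracium):

```javascript
// Drop the three irreconcilable 'X' events from the borked tracelog,
// identified by the pid/tid/ts values shown above.
const badTimestamps = new Set([67225000, 67258000, 67276000]);

function dropIrreconcilableEvents(traceEvents) {
  return traceEvents.filter(e =>
    !(e.pid === 4048 && e.tid === 3652 && e.ph === 'X' &&
      badTimestamps.has(e.ts)));
}
```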

@patrickhulce
Collaborator

Thanks @christian-bromann! This effort is mostly underway (#9174 #9230), but your info does confirm what we're seeing in that sometimes a few trace events are just irreconcilable and the only option is to drop some of them to get back to a sensible state.

@cjboranp

So I've googled the error that I'm getting from PageSpeed Insights and found myself on this page. Is this the same error we would be getting there and, if so, is it just a waiting game before this gets resolved for the PageSpeed Insights site?

@patrickhulce
Collaborator

@cjboranp yes it's likely the same error and it's a waiting game before #9491 is merged and deployed to PSI.

@christian-bromann

@patrickhulce is there an effort to fix this in Chrome directly? I logged an issue in Chromium because this problem also messes up the performance tab in DevTools, e.g. here

@patrickhulce
Collaborator

@christian-bromann not that I'm aware of. You've already filed in the correct place though for it to be looked into.

@AdnanCukur

We are also experiencing this issue.

@Thanaen

Thanaen commented Nov 22, 2019

For information, the problem is still present on version 5.2.0!
(screenshot)

@patrickhulce
Collaborator

@Thanaen the fixes for these issues are in 5.6.0, so that would be expected.

@Thanaen

Thanaen commented Nov 22, 2019

Oh, okay!
Sorry for the inconvenience!

@patrickhulce
Collaborator

patrickhulce commented Jun 23, 2020

We expect this has been fixed by #9785 and #9786. The relative silence since that release went out is reassuring too, so we'll go ahead and close this out but if anyone runs into this in >=6.0.0 and can provide a trace, we'd love to hear about it :)
