Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Investigate increase in "bootup-time" errors #7691

Closed
connorjclark opened this issue Mar 25, 2019 · 12 comments
Closed

Investigate increase in "bootup-time" errors #7691

connorjclark opened this issue Mar 25, 2019 · 12 comments

Comments

@connorjclark
Copy link
Collaborator

connorjclark commented Mar 25, 2019

@patrickhulce

awful repro:


N=30

export CHROME_PATH="/Applications/Google Chrome Canary.app/Contents/MacOS/Google Chrome Canary"

mkdir -p timing-runs

for run in {1..$N}
do
	echo "run NS $run"
	output_path="timing-runs/ns-$run.json"
	node lighthouse-cli https://www.elbatal-egypt.com/nader/ --output json --output-path="$output_path" --chrome-flags="--enable-features=NetworkService" --save-assets 2> /dev/null
	grep -q "Invalid task timing data" "$output_path"
	if [ $? -eq 0 ]; then
		echo broke
	fi
done

for run in {1..$N}
do
	echo "run no NS $run"
	output_path="timing-runs/no-ns-$run.json"
	node lighthouse-cli https://www.elbatal-egypt.com/nader/ --output json --output-path="$output_path" --chrome-flags="--disable-features=NetworkService" --save-assets 2> /dev/null
	grep -q "Invalid task timing data" "$output_path"
	if [ $? -eq 0 ]; then
		echo broke
	fi
done

for trace in timing-runs/*.trace.json; do
	echo $trace
	grep '"ph":"B"' $trace | wc -l ; grep '"ph":"E"' $trace | wc -l ;
done

Runs 12 and 15 for NS on failed. NS off never failed. Using Canary 75.0.3745.0.

Just NS 12 and 15:

ns-12-and-15.zip

timing-runs/no-ns-1-0.trace.json
   11057
   11052
timing-runs/no-ns-10-0.trace.json
   13173
   13168
timing-runs/no-ns-11-0.trace.json
   11194
   11189
timing-runs/no-ns-12-0.trace.json
   10710
   10705
timing-runs/no-ns-13-0.trace.json
   10992
   10986
timing-runs/no-ns-14-0.trace.json
   11276
   11270
timing-runs/no-ns-17-0.trace.json
   10534
   10529
timing-runs/no-ns-18-0.trace.json
   10845
   10839
timing-runs/no-ns-19-0.trace.json
   11046
   11041
timing-runs/no-ns-2-0.trace.json
    9754
    9750
timing-runs/no-ns-20-0.trace.json
   11345
   11339
timing-runs/no-ns-21-0.trace.json
   12167
   12161
timing-runs/no-ns-22-0.trace.json
   11571
   11566
timing-runs/no-ns-23-0.trace.json
   11784
   11778
timing-runs/no-ns-24-0.trace.json
   12144
   12138
timing-runs/no-ns-25-0.trace.json
   12870
   12864
timing-runs/no-ns-26-0.trace.json
   11549
   11543
timing-runs/no-ns-27-0.trace.json
   12113
   12107
timing-runs/no-ns-28-0.trace.json
   11884
   11880
timing-runs/no-ns-29-0.trace.json
   11580
   11574
timing-runs/no-ns-3-0.trace.json
   11248
   11242
timing-runs/no-ns-30-0.trace.json
   11901
   11895
timing-runs/no-ns-4-0.trace.json
   10796
   10790
timing-runs/no-ns-5-0.trace.json
    3537
    3533
timing-runs/no-ns-6-0.trace.json
   11472
   11466
timing-runs/no-ns-7-0.trace.json
   10895
   10890
timing-runs/no-ns-8-0.trace.json
   10997
   10992
timing-runs/no-ns-9-0.trace.json
   10665
   10660
timing-runs/ns-1-0.trace.json
   12281
   12276
timing-runs/ns-10-0.trace.json
   11881
   11876
timing-runs/ns-11-0.trace.json
   12156
   12151
timing-runs/ns-12-0.trace.json
    8761
    8754
timing-runs/ns-13-0.trace.json
   12384
   12379
timing-runs/ns-14-0.trace.json
   11936
   11931
timing-runs/ns-15-0.trace.json
   12096
   12087
timing-runs/ns-16-0.trace.json
    4120
    4116
timing-runs/ns-17-0.trace.json
   11637
   11632
timing-runs/ns-18-0.trace.json
   11660
   11654
timing-runs/ns-19-0.trace.json
   12428
   12422
timing-runs/ns-2-0.trace.json
   11877
   11872
timing-runs/ns-20-0.trace.json
   10836
   10831
timing-runs/ns-21-0.trace.json
   11888
   11883
timing-runs/ns-22-0.trace.json
   11808
   11803
timing-runs/ns-23-0.trace.json
   12032
   12027
timing-runs/ns-24-0.trace.json
   11909
   11904
timing-runs/ns-25-0.trace.json
   11018
   11013
timing-runs/ns-26-0.trace.json
   11617
   11612
timing-runs/ns-27-0.trace.json
   11674
   11668
timing-runs/ns-28-0.trace.json
   12381
   12376
timing-runs/ns-29-0.trace.json
   14834
   14829
timing-runs/ns-3-0.trace.json
   11533
   11528
timing-runs/ns-30-0.trace.json
   12478
   12472
timing-runs/ns-4-0.trace.json
   12751
   12746
timing-runs/ns-5-0.trace.json
   11631
   11625
timing-runs/ns-6-0.trace.json
   11443
   11439
timing-runs/ns-7-0.trace.json
   11955
   11950
timing-runs/ns-8-0.trace.json
   12248
   12243
timing-runs/ns-9-0.trace.json
   11776
   11770
@connorjclark
Copy link
Collaborator Author

connorjclark commented Mar 25, 2019

I ran 30 more times without NS, still never happened.

I ran 30 times with NS on stable (73.0.3683.75), never happened. Maybe a bisect will be enlightening.

@connorjclark
Copy link
Collaborator Author

connorjclark commented Mar 25, 2019

this will not be fast

python tools/bisect-builds.py -a mac -g 625896 -b 643727 --use-local-cache --not-interactive -t 30 -c "bash $HOME/bisect-timing.sh  %p"

bisect-timing.sh

#!/bin/bash

LH=~/code/lighthouse/lighthouse-cli

killall -9  chrome
export CHROME_PATH=$1
qs="Invalid task timing data"
if node "$LH" https://www.elbatal-egypt.com/nader/ --output json --chrome-flags=--enable-features=NetworkService 2> /dev/null | grep -q "$qs"; then
  echo broke
  exit 1
else
  echo ok
  exit 0
fi

@patrickhulce
Copy link
Collaborator

Taking one for the team @hoten, respect 🖖

@connorjclark
Copy link
Collaborator Author

connorjclark commented Mar 26, 2019

Bisect ain't gonna work :(

Doesn't happen often enough to bisect. Plus, and I'm not 100%, I think it can happen in stable too?

@connorjclark connorjclark changed the title Invalid task timing data Investigate increase in "Invalid task timing data" Mar 26, 2019
@connorjclark
Copy link
Collaborator Author

I started looking into this because this audit error ~doubled when we updated LR's render server. I'm not having any luck repro'ing this consistently, and the above was my attempt at getting around it, but didn't work too well.

Couple reasons for punting -

  • In today's meeting, we also saw that overall audit errors dropped, so that's cool
  • I don't believe this should block a WRS update (and the associated PWAness that will come out of it).
  • No effect on perf score, just sometimes (3% in staging) miss out on a diagnostic.

So for now I suggest a punt.

@brendankenny
Copy link
Member

Are we still only planning on running with NS in LR?

So for now I suggest a punt.

I am worried about us noticing/remembering the issue in the future. main-thread-tasks throws an error, but if we start doing task breakdowns in new audits, it might not be as evident that something's wrong. It would be good to have a plan here, if only setting an alarm to when we'll get back into this.

@connorjclark
Copy link
Collaborator Author

Are we still only planning on running with NS in LR?

Yup. New versions of WRS don't even work w/ NS off.

I am worried about us noticing/remembering the issue in the future. main-thread-tasks throws an error, but if we start doing task breakdowns in new audits, it might not be as evident that something's wrong. It would be good to have a plan here, if only setting an alarm to when we'll get back into this.

Yeah lets not let it slide. I don't understand the task stuff yet, failed at reproing it, and see this as something that can wait - at least until after working out the NS / WRS / PWA stuff.

I'll set a reminder for after I/O. Hopefully can resolve it earlier.

@brendankenny
Copy link
Member

from the ns-12 trace, at first glance it seems like this is caused by there being two B events with no matching E events because the trace ends. Seems surprising that we haven't run into this before, actually, though maybe I'm missing the actual cause.

If that is right, seems ok to just discard those events?

@patrickhulce
Copy link
Collaborator

If that is right, seems ok to just discard those events?

Yeah that's a good workaround for now 👍 This was the result of the initial investigation too.

It's still somewhat disconcerting why we're missing so many more of them now than we were before though.

@brendankenny
Copy link
Member

here's the end of the ns-15 trace

Screen Shot 2019-03-26 at 15 41 18

It's still somewhat disconcerting why we're missing so many more of them now than we were before though.

maybe something changed with tracing where ending previously waited on any open events but now it ends immediately?

@brendankenny
Copy link
Member

This was the result of the initial investigation too

oh, ha, just saw the previous email about this. Late to the party :)

@connorjclark connorjclark changed the title Investigate increase in "Invalid task timing data" Investigate increase in "bootup-time" errors Mar 28, 2019
@brendankenny
Copy link
Member

mostly fixed in #7728, with more coming in #9230 and tracked in #7764

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants