Cannot retrieve content of pages that are >100MB. #4543
…4571)

This is the max message size that DevTools can emit over the DevTools protocol: https://cs.chromium.org/chromium/src/content/browser/devtools/devtools_http_handler.cc?type=cs&q=kSendBufferSizeForDevTools&sq=package:chromium&g=0&l=83

The test is failing on Firefox since Firefox crashes when allocating a 100MB string.

Fix #4543
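To make the two limits concrete, here is a minimal Python model of the size check. It is not Puppeteer or ws code: it assumes ws's default maxPayload is 100 * 1024 * 1024 bytes (ws rejects a message only when it is larger than maxPayload) and that Chromium's kSendBufferSizeForDevTools is 256 * 1024 * 1024 bytes. The helper names fits and max_retrievable_page are hypothetical.

```python
# Illustrative model of the message-size limits discussed in this issue.
# Assumed values: ws >= 6.0.0 default maxPayload and Chromium's
# kSendBufferSizeForDevTools. Helper names are hypothetical.

MIB = 1024 * 1024
WS_DEFAULT_MAX_PAYLOAD = 100 * MIB  # ws default cap on incoming messages
DEVTOOLS_MAX_MESSAGE = 256 * MIB    # assumed DevTools send buffer size


def fits(message_size: int, max_payload: int = WS_DEFAULT_MAX_PAYLOAD) -> bool:
    """Return True if a message of message_size bytes is within max_payload.

    Real protocol messages also carry JSON framing, so the page content
    that fits is slightly smaller than max_payload.
    """
    return message_size <= max_payload


def max_retrievable_page(max_payload: int) -> int:
    """Effective ceiling: the smaller of the client cap and the DevTools cap."""
    return min(max_payload, DEVTOOLS_MAX_MESSAGE)


page_size = 150 * MIB  # e.g. the content of a ~150 MiB page
print(fits(page_size))                                      # False
print(fits(page_size, DEVTOOLS_MAX_MESSAGE))                # True
print(max_retrievable_page(WS_DEFAULT_MAX_PAYLOAD) // MIB)  # 100
print(max_retrievable_page(512 * MIB) // MIB)               # 256
```

In this model, raising maxPayload beyond the DevTools ceiling gains nothing, which is why matching the client cap to the DevTools buffer is a natural choice.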
Thanks for the quick fix, @aslushnikov! Just as an FYI, related to your patch, it appears sending content >256MB using

In any case, I don't know if it's worth documenting, but I thought I'd mention these findings just in case.
Puppeteer >=1.11.0 cannot retrieve the content of HTML pages that are >100MB.
In Puppeteer 1.11.0, the ws dependency was bumped to ^6.1.0 (see d3f50ea). However, ws introduced a breaking change in 6.0.0 by adding a maxPayload option that caps WebSocket message sizes at 100MB by default (see websockets/ws#1402), and Puppeteer relies on the ws defaults (see https://github.com/GoogleChrome/puppeteer/blob/9c4b6d06e214946e38999b9325c7d10152a1cf69/lib/WebSocketTransport.js#L28).

Suggested Fix
Increasing the maxPayload option in ws allows us to circumvent this issue. I would suggest allowing users to customize ws options in Puppeteer so they can set a custom maxPayload size. Thanks!

Steps to reproduce
Environment:
Steps to reproduce
Create a test.html file >100MB in size. Quick example in Python:

(Note: display is set to none in the test file purely to speed up loading of the page. It can be omitted, but Puppeteer will take longer to load the page when you try to reproduce this issue.)
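A minimal sketch of such a generator (not the reporter's original snippet), assuming any HTML file over 100MB triggers the failure. It writes filler paragraphs inside a div with display: none, as in the note above, until the file passes a target size. The 110 MiB target, the filler text, and the function name write_large_html are assumptions.

```python
from pathlib import Path

TARGET_BYTES = 110 * 1024 * 1024  # assumed target, comfortably above 100 MiB
CHUNK = "<p>" + "lorem ipsum " * 80 + "</p>\n"  # hypothetical filler


def write_large_html(path: Path, target_bytes: int = TARGET_BYTES) -> int:
    """Write an HTML file of at least target_bytes and return its size in bytes."""
    head = '<!DOCTYPE html>\n<html><body>\n<div style="display: none">\n'
    tail = "</div>\n</body></html>\n"
    chunk = CHUNK.encode("utf-8")
    with path.open("wb") as f:
        written = f.write(head.encode("utf-8"))
        # Stream fixed-size chunks so the whole document is never held in memory.
        while written < target_bytes - len(tail):
            written += f.write(chunk)
        written += f.write(tail.encode("utf-8"))
    return written
```

Calling write_large_html(Path("test.html")) produces a file for the next step. Writing in binary chunks keeps memory use flat instead of building a 100MB+ string up front.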
Load the test.html file and try to retrieve the page content (add the path to your test.html file in the script below):

What is the expected result?
Page content is retrievable without error.
What happens instead?
We see the following traceback: