Page.setContent should wait for resources to be loaded #728

Closed
aslushnikov opened this issue Sep 9, 2017 · 39 comments

@aslushnikov
Contributor

(as mentioned in #486 and other places)

We need a way to wait for the page to load all of its resources after page.setContent.

The lifecycle events might help.

@aslushnikov
Contributor Author

aslushnikov commented Oct 4, 2017

Meanwhile, a good workaround for page.setContent that waits for all the resources to load:

await page.goto(`data:text/html,${html}`, { waitUntil: 'networkidle0' });
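One caveat with this workaround: characters such as `#` or `%` in the markup can be read as URL syntax when the HTML is interpolated raw into a data: URL. A minimal sketch, assuming a hypothetical helper `toDataUrl` (not part of Puppeteer) that percent-encodes the markup first, and assuming `page` is a Puppeteer Page:

```javascript
// Hypothetical helper (not part of Puppeteer): build a data: URL from markup.
// encodeURIComponent escapes characters such as '#' and '%' that would
// otherwise be interpreted as URL syntax.
function toDataUrl(html) {
  return `data:text/html;charset=utf-8,${encodeURIComponent(html)}`;
}

// Sketch of the workaround above; `page` is assumed to be a Puppeteer Page.
async function loadHtml(page, html) {
  await page.goto(toDataUrl(html), { waitUntil: 'networkidle0' });
}
```

Whether the encoding is needed depends on the markup; for HTML without reserved characters the original one-liner behaves the same.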

@cmazakas

cmazakas commented Oct 10, 2017

It's amazing that you posted the workaround. I'm running into exactly this issue while trying to use puppeteer as a service that generates PDFs from HTML.

Thank you for filing a formal issue.

kimmobrunfeldt added a commit to alvarcarto/url-to-pdf-api that referenced this issue Oct 13, 2017
aslushnikov added a commit to aslushnikov/puppeteer that referenced this issue Oct 23, 2017
This patch:
- migrates navigation watcher to use protocol-issued lifecycle events.
- removes `networkIdleTimeout` and `networkIdleInflight` options for
  `page.goto` method
- adds a new `networkidle0` value to the waitUntil option of navigation
  methods

References puppeteer#728.

BREAKING CHANGE:

As an implication of this new approach, the `networkIdleTimeout` and
`networkIdleInflight` options are no longer supported. Interested
clients should implement the behavior themselves using the `request` and
`response` events.
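
A minimal sketch of what "implement the behavior themselves" could look like. The helper name `waitForNetworkIdle` and its `idleTime` and `maxInflight` options are hypothetical, chosen to echo the removed options. The sketch listens to the `request`, `requestfinished` and `requestfailed` page events rather than `response`, an assumption made so that failed requests are also counted as finished:

```javascript
// Hypothetical helper: resolves once `page` has had at most `maxInflight`
// requests in flight for `idleTime` milliseconds without interruption.
function waitForNetworkIdle(page, { idleTime = 500, maxInflight = 0 } = {}) {
  return new Promise((resolve) => {
    let inflight = 0;
    let timer = null;

    // Restart the idle timer whenever the in-flight count changes.
    const check = () => {
      clearTimeout(timer);
      if (inflight <= maxInflight) timer = setTimeout(done, idleTime);
    };
    const onStart = () => { inflight += 1; check(); };
    const onEnd = () => { inflight = Math.max(0, inflight - 1); check(); };
    const done = () => {
      page.off('request', onStart);
      page.off('requestfinished', onEnd);
      page.off('requestfailed', onEnd);
      resolve();
    };

    page.on('request', onStart);
    page.on('requestfinished', onEnd);
    page.on('requestfailed', onEnd);
    check(); // nothing may be in flight yet, so start the timer right away
  });
}
```

The sketch has no overall timeout; a caller that needs one could race the returned promise against a timer.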
aslushnikov added a commit that referenced this issue Oct 24, 2017
aslushnikov added a commit to aslushnikov/puppeteer that referenced this issue Oct 24, 2017
This patch adds "options" parameter to the `page.setContent` method. The
parameter is the same as a navigation parameter and allows to specify
maximum timeout to wait for resources to be loaded, as well as to
describe events that should be emitted before the setContent operation
would be considered successful.

Fixes puppeteer#728.
aslushnikov added a commit that referenced this issue Oct 24, 2017
ithinkihaveacat pushed a commit to ithinkihaveacat/puppeteer that referenced this issue Oct 31, 2017
ithinkihaveacat pushed a commit to ithinkihaveacat/puppeteer that referenced this issue Oct 31, 2017
…eteer#1152)

@aslushnikov aslushnikov reopened this Nov 8, 2017
@murilozilli

I'm having trouble with this too on version 0.13-alpha, with the following navigation options:

"waitUntil": "networkidle2",
"timeout": 60000

@HanXHX

HanXHX commented Dec 7, 2017

I confirm the same trouble.

@Padam87

Padam87 commented Dec 13, 2017

With the latest release, this hangs too:

await page.goto(`data:text/html,${html}`, { waitUntil: 'networkidle' });

Wait for networkidle0 instead:

await page.goto(`data:text/html,${html}`, { waitUntil: 'networkidle0' });

@HanXHX

HanXHX commented Dec 15, 2017

Hey @aslushnikov

You said in #1312 to wait for https://chromium-review.googlesource.com/c/chromium/src/+/747805
That patch is now merged.

Is everything in place to resolve this issue?

Cheers!

@aslushnikov
Contributor Author

@HanXHX this requires more work upstream: in order to reuse lifecycle events, page.setContent should initiate a navigation, which in turn should be plumbed through browser-side navigation aka "plznavigate".

@tzieleniewski

Hi Team!

Please advise, as the workaround is not working in our case (puppeteer 1.9.0).
I am trying to convert XHTML content, which I provide as an escaped inline string.
The generated document contains the raw XHTML, and no requests are made for external resources (in this case, the CSS).

Example

'use strict'

const puppeteer = require('puppeteer');

(async () => {
    const browser = await puppeteer.launch()
    const page = await browser.newPage()
    await page.setRequestInterception(true)

    const xhtml = `<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <meta charset=utf-8"/> <meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8"/> <link href="css.css" media="all" rel="stylesheet" type="text/css"/> </head> <body> some content </body> </html>`

    console.log(xhtml)

    page.on('request', request => {
        console.log(`Intercepted request with URL: ${request.url()}`)
        request.continue()
    });

    await page.goto(`data:text/html,${xhtml}`, {
        waitUntil: 'networkidle0'
    });
    await page.pdf({
        path: 'xhtml.pdf'
    })
    await browser.close()
})()

Here is the initial document content

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
   <head>
      <meta charset=utf-8"/>
      <meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8"/>
      <link href="css.css" media="all" rel="stylesheet" type="text/css"/>
   </head>
   <body>
      some content 
   </body>
</html>

@ObviouslyGreen

Is there a way to set the url if we end up using the data:text/html workaround? Do relative paths for resources work via this method?

aslushnikov added a commit to aslushnikov/puppeteer that referenced this issue Nov 16, 2018
This roll includes:
- https://crrev.com/608658 - DevTools: emit "init" lifecycle event when document gets opened

References puppeteer#728
aslushnikov added a commit that referenced this issue Nov 16, 2018
@cape-dev

cape-dev commented Nov 20, 2018

I had the exact same problem with external resources, so the workaround from @aslushnikov helped me a lot. But as @ObviouslyGreen points out, it does not support resolving relative paths. I investigated what puppeteer takes as the "url" when using this workaround, and it is the whole HTML string (obviously).

I could solve the problem with relative paths (for me in CSS styles) with the following approach:

  1. Create a folder (let's name it dist) that holds all relative resources.
  2. Generate the HTML as needed (paths should be relative to the root of dist).
  3. Write the HTML file to the root of dist.
  4. Use the following code to load the HTML with all the relative resources resolved correctly:

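const path = require('path');

// `randomName` is the file name chosen when the HTML was written to dist.
const pathToHtml = path.join(__dirname, 'dist', `${randomName}.html`);

const page = await browser.newPage();
await page.goto(`file:${pathToHtml}`, { waitUntil: 'networkidle0' });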

Note that the HTML file needs the '.html' suffix for puppeteer to render it properly (at least this was the case for me).
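
The four steps above can be sketched end to end as follows. The helper names `writeHtmlFile` and `loadFromDisk` are hypothetical, and `page` is assumed to be a Puppeteer Page. The sketch uses Node's `url.pathToFileURL` to build the file URL, a substitution for the string concatenation in the comment above so that spaces and other special characters in the path are escaped:

```javascript
const fs = require('fs');
const path = require('path');
const { pathToFileURL } = require('url');

// Hypothetical helper: write `html` into `dir` and return a file:// URL for it.
// The default name keeps the '.html' suffix mentioned in the note above.
function writeHtmlFile(dir, html, name = 'index.html') {
  const filePath = path.join(dir, name);
  fs.writeFileSync(filePath, html, 'utf8');
  return pathToFileURL(filePath).href;
}

// `page` is assumed to be a Puppeteer Page; relative resources referenced by
// the HTML are expected to live in `dir` already.
async function loadFromDisk(page, dir, html) {
  await page.goto(writeHtmlFile(dir, html), { waitUntil: 'networkidle0' });
}
```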

aslushnikov added a commit to aslushnikov/puppeteer that referenced this issue Nov 20, 2018
aslushnikov added a commit that referenced this issue Nov 20, 2018
This patch teaches `page.setContent` to await resources in
the new document.

**NOTE**: This patch changes behavior: currently, `page.setContent`
awaits the `"domcontentloaded"` event; with this patch, we can now await
other lifecycle events, and switched default to the `"load"` event.

The change is justified since current behavior made `page.setContent`
unusable for its main designated usecases, pushing our client
to use [dataURL workaround](#728 (comment)).

Fixes #728
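
A minimal sketch of how the patched `page.setContent` might be used for PDF generation. The helper name `htmlToPdf` and the 30-second timeout are illustrative assumptions, and `page` is assumed to be a Puppeteer Page from a release that includes this patch:

```javascript
// Hypothetical helper; `page` is assumed to be a Puppeteer Page.
async function htmlToPdf(page, html, pdfPath) {
  // Override the new default ("load") and wait until the network has been
  // idle instead; give up after 30 seconds.
  await page.setContent(html, { waitUntil: 'networkidle0', timeout: 30000 });
  await page.pdf({ path: pdfPath });
}
```

Omitting the options object falls back to the `"load"` default described in the commit message.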
@cape-dev

cape-dev commented Nov 21, 2018

@aslushnikov Great to see the same options for page.setContent!

Is it now possible for page.setContent to load resources with relative paths, as I described in my workaround in the comment above?

@aslushnikov
Contributor Author

@kamekazemaster yeah, the paths should be resolved against the page's URL.

await page.goto('https://example.com');
// logo.png becomes https://example.com/logo.png
await page.setContent('<img src="/logo.png"></img>');
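
The resolution rule itself can be tried without launching a browser. A small sketch using the WHATWG `URL` class built into Node.js, which follows the same URL resolution rules browsers use; the helper name `resolveResource` is hypothetical:

```javascript
// Resolve a resource path against a page URL with the WHATWG URL class.
function resolveResource(resourcePath, pageUrl) {
  return new URL(resourcePath, pageUrl).href;
}

// A root-relative path replaces the whole path of the page URL.
console.log(resolveResource('/logo.png', 'https://example.com'));
// prints "https://example.com/logo.png"

// A plain relative path is resolved against the page's directory.
console.log(resolveResource('logo.png', 'https://example.com/docs/index.html'));
// prints "https://example.com/docs/logo.png"
```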

@tzieleniewski

@aslushnikov when can we expect the next release with the updated setContent?

randytarampi added a commit to randytarampi/resume-cli that referenced this issue Dec 15, 2018
Per puppeteer/puppeteer#728 and https://github.com/GoogleChrome/puppeteer/blob/master/docs/api.md#pagesetcontenthtml-options.

The latest `puppeteer` seems to have broken PDF generation (with the [`page.goto`](puppeteer/puppeteer#728 (comment)) workaround), at least locally for me both here in `resume-cli` and in [`jsonresume-theme-randytarampi`](https://www.npmjs.com/package/jsonresume-theme-randytarampi).
@iksheth

iksheth commented Aug 26, 2019

@kamekazemaster yeah, the paths should be resolved against the page's URL.

await page.goto('https://example.com');
// logo.png becomes https://example.com/logo.png
await page.setContent('<img src="/logo.png"></img>');

This solution is really helpful, though a little more detail in the description would have saved me some time.

Thank you.

@casesolved-co-uk

@kamekazemaster yeah, the paths should be resolved against the page's URL.

await page.goto('https://example.com');
// logo.png becomes https://example.com/logo.png
await page.setContent('<img src="/logo.png"></img>');

Can anyone confirm whether adding a <base href=...> tag to the HTML works instead of the initial page.goto above?
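
The thread does not answer this question. Per the HTML specification, a `<base href>` element sets the base URL that relative URLs in the document resolve against, so it is a reasonable thing to try; whether it behaves as expected with page.setContent in a given puppeteer release is an assumption to verify. A minimal sketch for experimenting, with a hypothetical helper `withBaseHref` that assumes the markup contains a `<head>` tag:

```javascript
// Hypothetical helper: insert a <base> element right after the opening <head>
// tag so that relative URLs resolve against `baseUrl`. Markup without a
// <head> tag is returned unchanged.
function withBaseHref(html, baseUrl) {
  return html.replace(
    /<head(\s[^>]*)?>/i,
    (openTag) => `${openTag}<base href="${baseUrl}">`
  );
}

// Possible usage, assuming `page` is a Puppeteer Page:
//   await page.setContent(withBaseHref(html, 'https://example.com/'));
```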
