Puppeteer does not follow javascript redirects - leads to infinite loading of page without timeout #3684

Closed
DaveRingelnatz opened this issue Dec 17, 2018 · 3 comments

Comments


DaveRingelnatz commented Dec 17, 2018

Steps to reproduce

Tell us about your environment:

What steps will reproduce the problem?

const puppeteer = require('puppeteer');

let browser;
let page;

(async() => {
  // ignoreHTTPSErrors is a puppeteer.launch option, not a Chromium command-line flag
  let config = {
    ignoreHTTPSErrors: true,
    args: ['--disable-gpu', '--no-sandbox', '--disable-setuid-sandbox']
  };

  browser = await puppeteer.launch(config);
  console.log(await browser.version());
  page = await browser.newPage();
  await page.setRequestInterception(true);
  page.on('request', interceptedRequest => {
    if (interceptedRequest.resourceType() === 'media') {
      console.log("request aborted");
      interceptedRequest.abort();
    }
    else {
      interceptedRequest.continue();
    }
  });

  async function collectLinks() {
    // don't use toJSON method of website -> delete it if existing
    delete Array.prototype.toJSON;

    // HTML4 elements
    let a = Array.from(document.querySelectorAll('a')).map((val) => val.href);

    // compose report object
    let dataObject = {};
    dataObject['a'] = JSON.stringify(a);
    //console.log(dataObject);
    return dataObject;
  }

  let response = await page.goto('http://clkuk.tradedoubler.com/click?p=190137&a=3063988&g=22260412&url=https://eu.puma.com/de/de/home/&epi=dede-shopping-stripe');
  await page.waitFor(5000);
  const status = response.status();
  if ((status >= 300) && (status <= 399)) {
    console.log('Redirect from', response.url(), 'to', response.headers()['location'])
  }
  let dataObject = await page.evaluate(collectLinks).catch(console.log);
  console.log(dataObject);
  console.log(response.request().redirectChain());
  await page.close();
  await browser.close();
})();

What is the expected result?

page.evaluate should either throw an error or time out, but neither happens. Alternatively, Puppeteer could follow the JavaScript redirects, which would also solve the problem.

What happens instead?

The script hangs indefinitely because the page never finishes loading; more precisely, the JavaScript redirects are not followed. If I open the page above in Chrome, I get redirected to this URL. If the evaluate script ran on that URL, everything would be fine. Unfortunately, the redirects do not work with Puppeteer for some reason.
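
Until the hang itself is fixed, one way to get the timeout behaviour asked for above is to race the hanging call against a timer. This is only a minimal sketch: `withTimeout` is a hypothetical helper, not part of the Puppeteer API, and it does not cancel the underlying call; it only stops the script from waiting on it forever.

```javascript
// Hypothetical helper (not a Puppeteer API): rejects if `promise`
// has not settled within `ms` milliseconds.
function withTimeout(promise, ms, label = 'operation') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms} ms`)),
      ms
    );
  });
  // Whichever settles first wins; the timer is cleared either way
  // so it does not keep the Node.js process alive.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```

In the script above, the evaluate call could then be wrapped as `await withTimeout(page.evaluate(collectLinks), 10000, 'page.evaluate')`, so a hung page produces a rejected promise that the existing `.catch` can handle.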

DaveRingelnatz changed the title from "Problem with javascript redirects" to "Puppeteer does not follow javascript redirects - leads to infinite loading of page without timeout" on Dec 18, 2018
@samgooi4189

Having the same issue.

The URL I visit is https://drive.google.com/?tab=io.
Puppeteer throws an error because the page uses a JavaScript redirect rather than a 302 redirect.

@DaveRingelnatz
Author

I would be happy if Puppeteer threw an error in my case. :)
My problem is the endless loop I am stuck in.

aslushnikov added a commit to aslushnikov/puppeteer that referenced this issue Jan 15, 2019
Drop requirement for matching "origin" and "content-type" headers
in requests and request interceptions. This way javascript redirects
that use form submission start working.

Fix puppeteer#3684.
@aslushnikov
Contributor

I can reproduce this with the following script:

const puppeteer = require('puppeteer');

(async() => {
  const browser = await puppeteer.launch({
    ignoreHTTPSErrors: true,
  });
  const page = await browser.newPage();
  await page.setRequestInterception(true);
  page.on('request', r => r.continue());

  await page.goto('http://clkuk.tradedoubler.com/click?p=190137&a=3063988&g=22260412&url=https://eu.puma.com/de/de/home/&epi=dede-shopping-stripe');
  await page.waitFor(5000);
  await page.evaluate(() => 7 * 8); // hangs here
  await browser.close();
})();

The core issue here is again request interception - #3471, since our heuristic fails to match interception with requestWillBeSent.

I'll try to relax the heuristic a little more for now and see if it still holds in the wild.
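
To illustrate what relaxing such a heuristic can mean, here is a rough sketch and not Puppeteer's actual implementation: two event payloads describing the same request are matched by a key built from URL, method, post data and headers. The function name `requestKey`, the payload shapes and the example URL are assumptions made for illustration; the ignored headers follow the commit message ("origin" and "content-type").

```javascript
// Hypothetical sketch, not Puppeteer's real matching code.
// Headers that are left out of the key when matching is relaxed.
const IGNORED_HEADERS = new Set(['origin', 'content-type']);

function requestKey(request, ignored = IGNORED_HEADERS) {
  // Lowercase and sort header names so ordering and casing do not matter.
  const entries = Object.entries(request.headers || {})
    .map(([name, value]) => [name.toLowerCase(), value])
    .filter(([name]) => !ignored.has(name))
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  return JSON.stringify({
    url: request.url,
    method: request.method,
    postData: request.postData,
    headers: entries,
  });
}

// Made-up payloads: the same form submission as seen by two different
// events, where only one of them carries Origin and Content-Type.
const willBeSent = {
  url: 'http://example.test/submit',
  method: 'POST',
  postData: 'a=1',
  headers: { 'User-Agent': 'x' },
};
const intercepted = {
  ...willBeSent,
  headers: {
    'User-Agent': 'x',
    Origin: 'http://example.test',
    'Content-Type': 'application/x-www-form-urlencoded',
  },
};

const strict = new Set(); // ignore nothing
console.log(requestKey(willBeSent, strict) === requestKey(intercepted, strict)); // false
console.log(requestKey(willBeSent) === requestKey(intercepted)); // true
```

With strict matching the two keys differ, so the interception would never be paired with its request; ignoring the two headers makes them match. The trade-off is that a looser key can in principle pair unrelated requests, which is presumably why it needs to be checked "in the wild".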

aslushnikov added a commit that referenced this issue Jan 15, 2019
Drop requirement for matching "origin" and "content-type" headers
in requests and request interceptions. This way javascript redirects
that use form submission start working.

Fix #3684.