userDataDir + headless = lost authorization #921

Closed
vsemozhetbyt opened this issue Sep 30, 2017 · 75 comments · Fixed by #1028

@vsemozhetbyt
Contributor

vsemozhetbyt commented Sep 30, 2017

  • Puppeteer version: v0.12.0-alpha
  • Platform / OS version: Windows 7 x64
  • URLs (if applicable): any that needs authorization
  1. Create a test script and an empty folder test-profile-dir:
'use strict';

const puppeteer = require('puppeteer');

(async function main() {
  try {
    const browser = await puppeteer.launch({
      headless: false,
      userDataDir: 'test-profile-dir',
    });
    const page = await browser.newPage();

    await page.goto('https://twitter.com/');

    console.log(await page.evaluate(() => document.title));
    console.log(await page.evaluate(() => document.cookie));

    // await browser.close();
  } catch (err) {
    console.error(err);
  }
})();

You will see something like this:

Twitter. It's what's happening.

personalization_id=...; guest_id=...; ct0=...

Sign in to Twitter and close the browser.

  2. Run the same script a second time (await browser.close(); can be uncommented). You will see something like this:
Twitter

personalization_id=...; guest_id=...; ct0=...; ads_prefs=...; remember_checked_on=...;
twid=...; tip_nightmode=...; _ga=...; _gid=...; dnt=...; lang=...
  3. Run the same script, but with headless: true. The output is the same as before authorization:
Twitter. It's what's happening.

personalization_id=...; guest_id=...; ct0=...

I have tried various sites, all of them seem to lose authorization in headless mode.

Some more notes:

  • response.request().headers does not contain cookies in either headless: false or headless: true mode.

  • console.log(await page.cookies('https://twitter.com/')); contains many cookies in the headless: false mode. In the headless: true mode it gives an empty array [].
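To pin down which cookies go missing between the two modes, the two page.cookies() results can be compared by name. This is a minimal sketch: missingCookieNames is a hypothetical helper, and the arrays below are placeholder data standing in for real page.cookies() output, which is an array of objects with a name property.

```javascript
'use strict';

// Return the names of cookies present in `expected` but absent from `actual`.
// Both arguments are arrays of objects with a `name` property, the shape
// returned by page.cookies(). missingCookieNames is a hypothetical helper.
function missingCookieNames(expected, actual) {
  const seen = new Set(actual.map((cookie) => cookie.name));
  return expected
    .map((cookie) => cookie.name)
    .filter((name) => !seen.has(name));
}

// Placeholder data standing in for two real page.cookies() results.
const headfulCookies = [{ name: 'guest_id' }, { name: 'ct0' }, { name: 'twid' }];
const headlessCookies = [{ name: 'guest_id' }, { name: 'ct0' }];

console.log(missingCookieNames(headfulCookies, headlessCookies)); // [ 'twid' ]
```

In a real run, the two arrays would come from saving the page.cookies('https://twitter.com/') output of a headful run and of a headless run against the same userDataDir.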

@chuckchamberland

I can reproduce this on macOS 10.10.5 using 0.11.0 and Facebook.

I'm also having trouble with an internal site I'm trying to automate, but there I see it in headful mode as well. I only mention it here because of the note about response.request().headers not containing cookies.

@aslushnikov aslushnikov added the bug label Oct 4, 2017
aslushnikov pushed a commit that referenced this issue Oct 6, 2017
Headless isn't closing gracefully, which sometimes causes data loss when Chrome closes before it finishes writing things to disk.

See https://crbug.com/771830

References #921
@klaus82

klaus82 commented Oct 8, 2017

Hello,
I ran some tests.
In both of the test cases I added (page and page2), the userDataDir logged inside the launch function of Launcher.js is correct: the folder is the same.

@aslushnikov
Contributor

For the record: upstream bug is filed as https://crbug.com/771830

@vsemozhetbyt
Contributor Author

@aslushnikov Is it connected? In the OP's case, the first two runs are with headless: false, so the issue is not in closing with headless: true. It is that cookies successfully saved by previous runs are not sent with headless: true.

Did you mean #918 ?

@fhmd4k

fhmd4k commented Oct 10, 2017

Chromium can't save and restore storage and cookies in headless mode. It's a Chromium bug.

@fhmd4k

fhmd4k commented Oct 13, 2017

It's a Chromium bug, not a Puppeteer bug.
On Windows, you can run Fiddler to capture the cookies sent for a URL and then run Chrome against a site that sets cookies:
chrome --user-data-dir=C:\Users\xxx\Desktop\testud --headless -- "https://www.example.com"

Run it several times and you will find that the cookies differ on every run, even with the same userDataDir.

@aslushnikov
Contributor

This is actually not fixed by #1028.

@aslushnikov aslushnikov reopened this Oct 17, 2017
@vsemozhetbyt
Contributor Author

Refs: #1055 (comment)

@JoelEinbinder
Collaborator

This should be fixed as of #1063. @vsemozhetbyt can you confirm? Thanks

@vsemozhetbyt
Contributor Author

vsemozhetbyt commented Oct 18, 2017

@JoelEinbinder Unfortunately, the exact case from the OP is still not fixed: a login made in headful mode is not restored in headless mode. I did not test whether cookies are stored and restored in completely headless mode. That may well be fixed, but for me it is a rather speculative use case.

@aslushnikov
Contributor

For the record: this is tracked by upstream bug https://bugs.chromium.org/p/chromium/issues/detail?id=775703

@vsemozhetbyt
Contributor Author

Seems to be fixed upstream.

aslushnikov added a commit that referenced this issue Nov 2, 2017
This roll brings in a bunch of important patches:
- crrev.com/512647 Changed headless browser profile dir to use Default profile path
- crrev.com/512760 DevTools: stop idleness detector when pending navigation commits
- crrev.com/512905 DevTools: introduce Page.getFrameTree
- crrev.com/513373 DevTools: report loaderId in the lifecycle events
- crrev.com/513419 DevTools: introduce Page.setLifecycleEventsEnabled
- crrev.com/513422 DevTools: return loaderId from Page.navigate

Fixes #921 

BREAKING CHANGE:

Headless user profile structure is changing. Custom profiles set with --user-data-dir flag will no longer be read in Chrome 63 and will have to be recreated.

Alternatively, you can migrate an old headless profile to the new structure. If you stored your profile in the `<profile>` folder, you would run the following bash commands:

```bash
cd <profile>
mkdir Default
mv * Default
```

Full headless-dev PSA announcement: https://groups.google.com/a/chromium.org/forum/#!msg/headless-dev/asX8WgktXIE/zTUfmHDcAQAJ
@vsemozhetbyt
Contributor Author

@aslushnikov @JoelEinbinder Unfortunately, #1259 does not fix the case in the OP: the outputs of the examples still differ, and console.log(await page.cookies('https://twitter.com/')); still displays many cookies in the headless: false mode and far fewer cookies in the headless: true mode.

@aslushnikov
Contributor

@vsemozhetbyt we're very sorry; it indeed doesn't work! We're on it.

@aslushnikov aslushnikov reopened this Nov 2, 2017
@aslushnikov
Contributor

For the record: upstream patch is up for review https://chromium-review.googlesource.com/c/chromium/src/+/752743

aslushnikov added a commit to aslushnikov/puppeteer that referenced this issue Nov 3, 2017
This test ensures that Chrome Headless can successfully read cookies written
by Chrome Headful.

References puppeteer#921
aslushnikov added a commit that referenced this issue Nov 3, 2017
This test ensures that Chrome Headless can successfully read cookies written
by Chrome Headful.

References #921
@vincentheet

I'm also seeing this issue on Windows 10. Cookies aren't persisted when running in headless mode since v2.1.0. It worked fine in v2.0.0. The issue is not present on macOS, version v2.1.1 and v3.0.2 work as expected on the mac. The fix suggested by @Ron-Burg to set the user data dir as an argument does not work for me either.

One thing I noticed is that when running with headless: true, the Cookies table in the Cookies file (SQLite database) stays empty. When running with headless: false, the Cookies table does get populated.

@bozdoz

bozdoz commented May 7, 2020

Tried every suggestion here. Can't get cookies to persist from headless to headful. Even between headless sessions, the cookies are lost.

@Llorx

Llorx commented May 7, 2020

So the solution is to revert to v2.0.x if headless cookies are critical to us?

@shanesreal

I've downgraded puppeteer from v3.0.4 to v2.1.1 which didn't work and then finally downgraded to v2.0.0 and cache is now working properly in headless mode.

@peixotorms

why is this closed?
Still not working.

@ArtixZ

ArtixZ commented Jul 20, 2020

Is this issue really fixed in Chrome version 64.0.3262.0? I still have this issue in Chrome version 84.

Tried in Windows 10. userDataDir with headless doesn't store my login session.

@rubengmurray

I'm adding this for reference as a few people have mentioned macOS in this thread: I've been able to use cookies for a couple of years without issue in puppeteer using the chrome-cookies-secure npm package.

@mikelpr

mikelpr commented Oct 2, 2020

workaround? (not cookies but whole user profile, including localstorage)
it is definitely not fixed. maybe a regression. really terrible in any case

@marqmarti

Hi,

I had this problem about a year and a half ago and commented in this thread at the time. The same problem came back a month ago, and the solution I found was to pass the full (absolute) path in the userDataDir parameter.

Example:

const browser = await puppeteer.launch({
	userDataDir: '/var/www/html/nodetest/whateverfolder',
	headless: true,
	args: ['--no-sandbox']
});
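If the profile folder is normally given as a relative path, as in the original report, one way to apply this workaround is to resolve it to an absolute path before launching. A minimal sketch: buildLaunchOptions is a hypothetical helper, not part of Puppeteer, and whether an absolute path avoids the problem on every platform is not established in this thread.

```javascript
'use strict';

const path = require('path');

// Build launch options with an absolute userDataDir.
// buildLaunchOptions is a hypothetical helper, not part of Puppeteer.
function buildLaunchOptions(profileDir, headless = true) {
  return {
    headless,
    // path.resolve() turns a relative path into an absolute one,
    // anchored at process.cwd() at the time of the call.
    userDataDir: path.resolve(profileDir),
  };
}

console.log(buildLaunchOptions('test-profile-dir'));

// The result can then be passed to puppeteer.launch(), for example:
//   const browser = await puppeteer.launch(buildLaunchOptions('test-profile-dir'));
```

Because the path is anchored at the current working directory, the script should be started from the same directory on every run, or the folder should be passed as an absolute path to begin with.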

@kajojify

kajojify commented Oct 8, 2020

Hello guys. Still face this on Chrome 85.0.4183.121 (Official Build) (64-bit) for Windows 10.

@Zogoo

Zogoo commented Oct 29, 2020

Is this issue solved or still pending? Because, it's still happening on v5.4.0

@shirshak55

I just use playwright and there is no such issue + playwright is maintained by original authors. This issue will probably never be fixed because I don't see same progress in puppeteer as it used to be.

@anaskasmi

I can't believe this issue has persisted since 2017 and it is almost 2021 :/
puppeteer: "^5.3.1",
version of Chromium : 86.0.4240.0 (Developer Build) (64-bit)
OS : Windows 10 Pro

@coelho-faminto

coelho-faminto commented Nov 4, 2020

This is not a "bug" really.
The developers could have fixed this long ago but they won't. That is because saved sessions can defeat google recaptcha ;)

The only way to fix that is by recompiling chromium, this is not a puppeteer issue, you can easily reproduce this behavior by running the chromium/chrome command line passing --user-data-dir and --headless to confirm that when --headless is present the user-data-dir will be ignored and the sessions will not be retrieved.

This is by design...

@sathio

sathio commented Nov 4, 2020

and this is a very lame design I must say

@coelho-faminto

coelho-faminto commented Nov 5, 2020

Exactly... as a malicious software developer, for me this does not represent a limitation since I can bypass it by performing some runtime tricks on the targeted software. But for those legit users who don't have the ($) time and motivation necessary to get through this, it is just lame design, as you say.

In the end they are blocking the legit users while the malicious developers are still taking full advantage from the debugging tools 👍

If anyone here reading this is using chromium to defeat captcha on botnets: yay!

If you are not but you would like to you just have to think about one thing: The behavior of chromium-like browsers is completely different when --headless is enabled. So think carefully about it, all you really want is to do everything without the (victim) (client) user seeing anything. You don't need headless... Keep your software away from this option and you will be fine.

@H4xX0r1337

So any solution for the C# bindings? Except using firefox instead? :p

@mikelpr

mikelpr commented Mar 10, 2021

@H4xX0r1337 the solution is use playwright. it works and is much better documented

@shirshak55

@mikelpr yea people complaining here are too lazy to even switch package. Just use playwright and be happy? It has even python, c# bindings etc.

@H4xX0r1337

@mikelpr @shirshak55 ayy thanks, that looks very cool as well. It also looks more modern and the async API would be another pro. Although it is much less known than selenium.

@r-ti

r-ti commented Mar 28, 2022

Hello, it's 2022 and I don't know whether this has been resolved some other way, but I solved it using the puppeteer-firefox library:

//npm i puppeteer-firefox

const puppeteerFirefox = require('puppeteer-firefox');

let browser = await puppeteerFirefox.launch({
  headless: true,
  args: ['--no-zygote', '--no-sandbox'],
  defaultViewport: null,
  userDataDir: DATA_PATH,
  //browser: 'chromium',
});

Then I set:
page.setUserAgent("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36");

See you...

@rainb3rry