Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't force query string normalization #1246

Merged
merged 7 commits into from May 12, 2020

Conversation

szmarczak
Copy link
Collaborator

@szmarczak szmarczak commented May 11, 2020

It assumes that the query speicified in the input is not URLSearchParams unless the searchParams option is present.

The search params are correctly normalized via decodeURIComponent (I spent 30 mins reading stackoverflow and exeprimenting by myself, I'm pretty sure that's the correct one). The WHATWG URL handles it, what I need is just decodeURI before parsing the location header.

Fixes #1234

Checklist

  • I have read the documentation.
  • I have included a pull request description of my changes.
  • I have included some tests.
  • If it's a new feature, I have included documentation updates.

source/core/index.ts Outdated Show resolved Hide resolved
test/arguments.ts Outdated Show resolved Hide resolved
@Giotino
Copy link
Collaborator

Giotino commented May 11, 2020

I would suggest to simply remove the if block (if (options.url.search)) which you're modifying.

There's nothing wrong about using decodeURIComponent() (read the next comment), this behavior is even suggested by the RFC 3986 in section 2.3, but I think that it can cause problems with some server.

As the RFC use the word "should" I think you should consider to not apply that policy in order to increase compatibility.

source/core/index.ts Outdated Show resolved Hide resolved
@Giotino
Copy link
Collaborator

Giotino commented May 11, 2020

I don't want to be pedantic (you probably hate me now), but this PR is going to break x-www-form-urlencoded

Example: http://127.0.0.1:8123/?FIRST%3DSECOND=THIRD
Correct: { 'FIRST=SECOND': 'THIRD' }
This PR: { 'FIRST': 'SECOND=THIRD' }
This PR (without decodeURIComponent): { 'FIRST=SECOND': 'THIRD' }

@Giotino
Copy link
Collaborator

Giotino commented May 11, 2020

This is what I'm proposing
https://github.com/Giotino/got/tree/issue-1234

BTW both proposals (this PR and my fork) has a problem with merging
Example:

const client = got.extend({
  searchParams: {a: 'b'}
})

await client('http://example.org/?something');

become: http://example.org/?a=b

Note: apparently that was there even before (if the params are in the URL they're discarded)

@szmarczak

This comment has been minimized.

@szmarczak
Copy link
Collaborator Author

Actually I misunderstood you. This is working as expected. If you provide searchParams, it overrides the input. It's in the docs.

@szmarczak
Copy link
Collaborator Author

#1246 (comment) Well, we gotta sacrifice something! This is an edge case anyway... :P

source/core/index.ts Outdated Show resolved Hide resolved
@Giotino
Copy link
Collaborator

Giotino commented May 12, 2020

#1246 (comment) Well, we gotta sacrifice something! This is an edge case anyway... :P

Removing decodeURIComponent does not cause any side effect and does not break any format.

I can't see any reason to use decodeURIComponent in this case.

@Giotino
Copy link
Collaborator

Giotino commented May 12, 2020

4724a0d#diff-8eef38c70c64f535c3b8046b5ce40d9eR1067

Using decodeURI there have problems.
Example:

Location: http://example.org/?something%25 (valid)

http://example.org/?something%25 => http://example.org/?something%
which is should be invalid (according to WHATWG spec), error RequestError: URI malformed (read next comment)

@Giotino
Copy link
Collaborator

Giotino commented May 12, 2020

error RequestError: URI malformed

That maybe wrong, but the behavior is even stranger.

What's happening on my end is:
First request to: http://example.org/a_url_that_answer_with_redirect
New request following the redirect to: http://example.org/?something%25
If this request fails (eg. socket hang up) a new request is made, but it fails with RequestError: URI malformed

By removing decodeURI I managed to get rid of this strange behavior, and also the url "sended" to the server is correct (http://example.org/?something%25)

@szmarczak szmarczak changed the title Properly normalize query string Don't force query string normalization May 12, 2020
@szmarczak
Copy link
Collaborator Author

@sindresorhus So the solution seems to be just remove the stress on the search normalization. I think it's good to merge now.

@Giotino
Copy link
Collaborator

Giotino commented May 12, 2020

The last thing that's missing is to remove decodeURI from here
4724a0d#diff-8eef38c70c64f535c3b8046b5ce40d9eR1067

@szmarczak
Copy link
Collaborator Author

The last thing that's missing is to remove decodeURI from here

The server may send incorrect URL. If it's already correct, then decodeURI won't break it. But I don't say I'm right. Feel free to prove me wrong :)

@Giotino
Copy link
Collaborator

Giotino commented May 12, 2020

The last thing that's missing is to remove decodeURI from here

The server may send incorrect URL. If it's already correct, then decodeURI won't break it. But I don't say I'm right. Feel free to prove me wrong :)

Let's say I request a page that answer with Location: http://example.org/?some%25thing header (the URL is valid).
Got has to follow the redirect using that url (http://example.org/?some%25thing).
The expected behavior is to request this page http://example.org/?some%25thing, but with decodeURI it request http://example.org/?some%thing which is wrong (and, in this case, invalid).

@szmarczak
Copy link
Collaborator Author

https://tools.ietf.org/html/rfc3986#section-2.4

Under normal circumstances, the only time when octets within a URI
are percent-encoded is during the process of producing the URI from
its component parts. This is when an implementation determines which
of the reserved characters are to be used as subcomponent delimiters
and which can be safely used as data. Once produced, a URI is always
in its percent-encoded form.

When a URI is dereferenced, the components and subcomponents
significant to the scheme-specific dereferencing process (if any)
must be parsed and separated before the percent-encoded octets within
those components can be safely decoded, as otherwise the data may be
mistaken for component delimiters. The only exception is for
percent-encoded octets corresponding to characters in the unreserved
set, which can be decoded at any time. For example, the octet
corresponding to the tilde ("~") character is often encoded as "%7E"
by older URI processing implementations; the "%7E" can be replaced by
"~" without changing its interpretation.

Because the percent ("%") character serves as the indicator for
percent-encoded octets, it must be percent-encoded as "%25" for that
octet to be used as data within a URI. Implementations must not
percent-encode or decode the same string more than once, as decoding
an already decoded string might lead to misinterpreting a percent
data octet as the beginning of a percent-encoding, or vice versa in
the case of percent-encoding an already percent-encoded string.

@szmarczak
Copy link
Collaborator Author

image

@Giotino
Copy link
Collaborator

Giotino commented May 12, 2020

While ~ (%7E) is an edge case (that I suggest to leave as it is for compatibility), % is not.
Try to do the same thing you've done but with %25.

@szmarczak
Copy link
Collaborator Author

Welp. My example is as valid as yours. But decodeURI is indeed potentially breaking. Reverting.

@szmarczak szmarczak merged commit 761b8e0 into sindresorhus:master May 12, 2020
@szmarczak szmarczak deleted the fix-url-query branch May 12, 2020 19:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Don't force query string normalization
3 participants