<div dir="ltr">Hi,<div><br></div><div>As I mentioned to Dridi in a private IRC channel, I like the idea of req.scheme, especially in the context of #1847.<div><br></div><div>req.path is a different story. The meaning of path in HTTP/1.x and HTTP/2 has diverged, AFAIU. </div><div>In the former, the path is just the path component of the URI; in the latter (well, the :path pseudo header) it also includes the query part.</div><div><br>If we were to implement req.path, it would mean different things depending on the protocol, which would be very confusing.</div><div>Also, I don't think either browsers or users will make that distinction.</div><div><br></div><div>Regarding req.authority, I don't see any reason to add req.authority, or to have both req.authority and req.http.host.</div><div>As long as the latter has the correct value, we can use it to express both the :authority pseudo header and the Host header, as we already do for absolute-form requests.</div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Jan 28, 2016 at 11:20 AM, Dridi Boukelmoune <span dir="ltr">&lt;<a href="mailto:dridi@varni.sh" target="_blank">dridi@varni.sh</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi all,<br>

<br>

I'm starting a discussion that will hopefully lead to a VIP. I have<br>

pondered this one for quite a while now and trac #1847 convinced<br>

me that I should share this with the -dev list. I predict it will be<br>

one of those long emails I'm very good at not synthesizing.<br>

<br>

The proposal is to have new request fields:<br>

- req.path (the absolute path)<br>

- req.host (the virtual host)<br>

- req.authority<br>

<br>

This is based on my understanding of HTTP/1.1 and it seems like it'd<br>

play nicely with HTTP/2 but I'm not done studying the latter.<br>

<br>

In HTTP/1.1 a request starts[1] with a request-line:<br>

<br>

  method SP request-target SP HTTP/1.1 CRLF<br>

<br>

The request-target is a URI[2] that can take several forms:<br>

<br>

  request-target = origin-form => the path, /something<br>

               / absolute-form => [scheme]://[authority][path]<br>

               / authority-form => host name for CONNECT<br>

               / asterisk-form => an actual * for OPTIONS<br>

<br>

The problem with the absolute-form is that Varnish is not supposed to<br>

receive this kind of request because they are meant for forward<br>

proxies, but it MUST [5] be handled regardless.<br>

<br>

The good thing with the absolute-form is that it integrates nicely with<br>

HTTP/2 [3] since all its components map to pseudo headers.<br>

<br>

The other thing that Varnish should do with absolute-form URIs is to<br>

ignore[4] host headers and use the absolute-form authority as such.<br>

<br>

One may ask where we should store the asterisk-form, and HTTP/2 says<br>

it belongs in the path[3].<br>

<br>

How would it work?<br>

<br>

For HTTP/1.1 Varnish would dissect the request and populate the<br>

following fields:<br>

- req.url => request-target<br>

- req.path => origin form OR asterisk-form OR path from absolute-form<br>

- req.authority => authority-form OR authority from absolute-form OR host header<br>

- req.scheme => scheme from absolute-form OR "http"<br>

<br>
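The mapping above can be sketched in Python. This is only an illustration, not Varnish code:<br>
the dissect() helper and its dict keys are hypothetical, and keeping the query string in the<br>
path (plus defaulting an empty absolute-form path to "/") is an assumption, not part of the proposal.<br>

```python
from urllib.parse import urlsplit


def dissect(method, target, host_header=None):
    """Map an HTTP/1.1 request-target onto the proposed fields (hypothetical helper)."""
    fields = {
        "url": target,             # req.url: the raw request-target
        "path": None,              # req.path
        "authority": host_header,  # req.authority: falls back to the Host header
        "scheme": "http",          # req.scheme: default when no scheme is given
    }
    if target == "*":
        # asterisk-form, e.g. "OPTIONS * HTTP/1.1"
        fields["path"] = "*"
    elif target.startswith("/"):
        # origin-form: the target is the path (query kept, by assumption)
        fields["path"] = target
    elif method == "CONNECT":
        # authority-form: the target is only host:port
        fields["authority"] = target
    else:
        # absolute-form: scheme://authority/path?query
        parts = urlsplit(target)
        path = parts.path or "/"   # assumption: empty path becomes "/"
        if parts.query:
            path += "?" + parts.query
        fields["scheme"] = parts.scheme
        fields["authority"] = parts.netloc  # takes precedence over the Host header
        fields["path"] = path
    return fields
```

With this sketch, dissect("GET", "http://example.com/foo?x=1", "other.example") takes its<br>
authority from the request-target rather than from the Host header, as [4] requires.<br>

<br>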

For HTTP/2 I suppose we could reconstruct an absolute-form with the<br>

pseudo headers from the new fields, since we'd get them as pseudo<br>

headers[3].<br>

<br>

Changes in the built-in VCL:<br>

sed -e s/req.url/req.path/ -e s/req.http.host/req.authority/<br>

<br>

Security concerns:<br>

<br>

Mainly, how to deal with an "https" scheme? And for that I'd shift the<br>

responsibility to the user/documentation. If you have a trusted TLS or<br>

HTTPS proxy you can always route the decrypted traffic to a different<br>

port and check it when req.scheme == "https".<br>

<br>

Breaking changes:<br>

<br>

On top of the breaking changes (only the built-in VCL, I think) I<br>

wouldn't mind renaming req.url to req.target but not because I like to<br>

break things for the sake of breaking them (but I do like breaking<br>

stuff).<br>

<br>

The rationale is that since the request-target is not necessarily a<br>

URL, this would be an occasion to get better semantics wrt<br>

the RFCs (like req.request that became req.method) and also make<br>

sure that VCL wouldn't compile and give you subtle bugs because of<br>

changes in the built-in VCL.<br>

<br>

Thoughts?<br>

<br>

Best,<br>

Dridi<br>

<br>

[1] <a href="https://tools.ietf.org/html/rfc7230#section-3.1.1" rel="noreferrer" target="_blank">https://tools.ietf.org/html/rfc7230#section-3.1.1</a><br>

[2] <a href="https://tools.ietf.org/html/rfc7230#section-5.3" rel="noreferrer" target="_blank">https://tools.ietf.org/html/rfc7230#section-5.3</a><br>

[3] <a href="https://tools.ietf.org/html/rfc7540#section-8.1.2.3" rel="noreferrer" target="_blank">https://tools.ietf.org/html/rfc7540#section-8.1.2.3</a><br>

[4] <a href="https://tools.ietf.org/html/rfc7230#section-5.4" rel="noreferrer" target="_blank">https://tools.ietf.org/html/rfc7230#section-5.4</a><br>

[5] <a href="https://tools.ietf.org/html/rfc7230#section-5.3.2" rel="noreferrer" target="_blank">https://tools.ietf.org/html/rfc7230#section-5.3.2</a><br>

<br>


</blockquote></div><br></div>