Suggestions for HTTPS caching MITM/proxy

This is an archive of a topic from NESdev BBS, taken in mid-October 2019 before a server upgrade.
Suggestions for HTTPS caching MITM/proxy
by on (#179537)
Now that nesdev.com has switched from legacy cleartext HTTP to HTTPS, it has come to my attention that Rahsennor, a user who lives where the only available Internet access is pay-per-gigabyte with ping times of up to 15,000 ms, would like a caching proxy that supports HTTPS. I assume this means a man-in-the-middle proxy run on localhost that keeps documents cached for as long as possible. Like other TLS MITMs, it would need to act as a private CA: it verifies the certificate presented by each site and locally generates a certificate of its own for that site, and the private CA's root certificate has to be installed into each browser. Rahsennor currently uses Polipo, which can cache legacy cleartext HTTP but appears to act as a non-caching tunnel for HTTPS rather than a MITM. Any suggestions?

Rahsennor: If you can provide enough additional information to formulate a question on Software Recommendations Stack Exchange, I'd appreciate it. Here's what they require.
Re: Suggestions for HTTPS caching MITM/proxy
by on (#179559)
Squid. Hope he has the disk/RAM for it, though.

Zero experience using it with SSL as a MITM proxy. Sounds like nightmare fuel.
Re: Suggestions for HTTPS caching MITM/proxy
by on (#179579)
If anyone else had started this thread it would have creeped me the hell out. Thanks for trying to help, tepples.

I use Linux, with a strong preference for open source software. I'm currently running polipo on localhost as a personal caching proxy, to reduce both bandwidth and latency, using up to 20 GiB of disk space and 1 GiB of RAM. I also make use of the selective censorReferer feature (which removes the Referer header unless it matches Host) and the server blacklist, but I can always move the blacklist to Unbound.
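
For what it's worth, the Referer handling I rely on amounts to roughly this (an illustrative Python sketch of the logic, not polipo's actual implementation):

Code:
from urllib.parse import urlparse

def censor_referer(headers):
    # Drop the Referer header unless it points at the same host as the
    # request's Host header.  Illustrative only; a real proxy also has to
    # cope with ports, IDNs, header-name case, and so on.
    referer = headers.get("Referer")
    host = headers.get("Host", "").split(":")[0]
    if referer and urlparse(referer).hostname != host:
        headers = dict(headers)   # leave the caller's copy alone
        headers.pop("Referer", None)
    return headers

# Cross-site Referer gets stripped, same-site Referer is kept.
print(censor_referer({"Host": "nesdev.com", "Referer": "https://example.com/x"}))
print(censor_referer({"Host": "nesdev.com", "Referer": "https://nesdev.com/bbs/"}))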

I've used Squid before. It's been a while, but I remember the configuration was a nightmare even without HTTPS, it didn't improve user-visible latency, and the inability to cache partial downloads made it worse than useless in poor network conditions, since it prevented recovery from dead connections.

I think that covers the features I'm after:

1) runs on Linux
2) preferably open source
3) can handle very unreliable outgoing network
4) latency more important than throughput
5) can censor headers for privacy reasons, preferably with special handling for Referer to work around dodgy hotlink prevention

Not sure if 5 is still relevant for HTTPS. Or anything else relating to HTTPS for that matter. I wasn't even aware that unencrypted HTTP was considered obsolete until every website I visited started dropping support.

I'm also not familiar with MITM in this context. I usually hear it used for things that are intentionally hidden from the endpoints. I just set the http_proxy environment variable to "http://localhost:PORT/". If that's what you mean then that's what I'm doing.
Re: Suggestions for HTTPS caching MITM/proxy
by on (#179584)
Rahsennor wrote:
If anyone else had started this thread it would have creeped me the hell out. Thanks for trying to help, tepples.

With the recent sentiment against topic splits, I decided to start a new topic rather than replying off-topic.

"Man in the middle" means your browser or other user agent (UA) sends a request to a proxy, then the proxy forwards the request to the origin server on your behalf, receives the response, and then returns the response to the UA. Here, the proxy is in a position of trust because it has the technical ability to modify the response body, such as to skip the trip to the origin server entirely if it already has a cached copy, or to insert advertisements or other malicious data. Comcast, for instance, has been known to insert advertisements into legacy cleartext HTTP sessions, such as to add an in-page pop-up notifying the subscriber that a newer model of wireless gateway is available.

Normally, HTTPS includes measures to prevent a proxy from modifying or even seeing the response body on its way to the UA. One of these measures is public key infrastructure (PKI), which requires connections to include a certificate attesting that the origin server is authorized to speak for a particular hostname. To prevent a malicious proxy from making its own certificates, the browser requires the certificate to have been issued by a trusted third party called a certificate authority (CA), with its own "root certificate" included in the UA's certificate store so that the UA knows that the CA's certificates are legit. But because a man in the middle needs to be able to speak for every hostname, it needs to support acting as a CA, and you'll need to add the root certificate of the proxy's CA to your UA's certificate store.
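
To make that concrete, here is roughly what such a proxy has to do the first time it sees a new hostname: mint a leaf certificate for that host, signed by its own CA. This is only a sketch in Python using the pyca/cryptography package; it assumes the CA key and root certificate already exist on disk as ca.key and ca.crt, and real MITM software also caches these leaf certificates rather than regenerating them on every connection.

Code:
import datetime
from cryptography import x509
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.x509.oid import NameOID

def mint_host_cert(hostname, ca_key, ca_cert):
    # Generate a keypair for this host and sign a short-lived certificate
    # for it with the proxy's private CA.
    key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    now = datetime.datetime.utcnow()
    cert = (
        x509.CertificateBuilder()
        .subject_name(x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, hostname)]))
        .issuer_name(ca_cert.subject)
        .public_key(key.public_key())
        .serial_number(x509.random_serial_number())
        .not_valid_before(now)
        .not_valid_after(now + datetime.timedelta(days=30))
        .add_extension(x509.SubjectAlternativeName([x509.DNSName(hostname)]),
                       critical=False)
        .sign(ca_key, hashes.SHA256())   # signed by the proxy's own CA
    )
    return key, cert

with open("ca.key", "rb") as f:
    ca_key = serialization.load_pem_private_key(f.read(), password=None)
with open("ca.crt", "rb") as f:
    ca_cert = x509.load_pem_x509_certificate(f.read())
leaf_key, leaf_cert = mint_host_cert("nesdev.com", ca_key, ca_cert)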

Rahsennor wrote:
I wasn't even aware that unencrypted HTTP was considered obsolete until every website I visited started dropping support.

Then I guess I must follow different forums from you. I've picked these up from Slashdot:
  • Firesheep extension (Eric Butler, October 2010): sniff and reuse the cookies of users on the same subnet when a site uses HTTPS only for its login and payment forms and redirects back to legacy cleartext HTTP for the rest of the site
  • HTTPS Everywhere (Electronic Frontier Foundation, 4Q 2010): browser extension, in response to Firesheep, that makes Firefox or Chrome prefer HTTPS where available by distributing a list of sites that support it
  • HTTP Strict Transport Security: HTTP header through which a website can opt in to make a UA prefer HTTPS if available
  • PRISM disclosure (Edward Snowden, 2013): revealed that the U.S. NSA was intercepting legacy cleartext HTTP communications
  • AdSense support for HTTPS (Google, September 2013): the lack of HTTPS-compatible web advertising networks is no longer as much of an excuse against supporting HTTPS
  • Deprecating Non-Secure HTTP (Mozilla, April 30, 2015): proposal to disable sensitive JavaScript features on legacy cleartext HTTP
  • Deprecating Powerful Features on Insecure Origins (Google): proposal to disable sensitive JavaScript features on legacy cleartext HTTP
  • LEAN Ads (IAB, October 15, 2015): encourages more web advertising networks to support HTTPS to keep malware from interfering with ad delivery
  • Let's Encrypt (ISRG, cross-signed by IdenTrust, December 2015): automated means to obtain domain-validated certificates for your domain* without charge
  • W3C Secure Contexts (World Wide Web Consortium): proposal to disable sensitive JavaScript features on legacy cleartext HTTP


* Requires you to own a domain whose parent is on the Public Suffix List. Obtaining a certificate for a subdomain of someone else's domain is harshly rate-limited, especially on many dynamic DNS providers. Obtaining a certificate under a non-public TLD, such as .local for your LAN, is not possible.
Re: Suggestions for HTTPS caching MITM/proxy
by on (#179588)
Rahsennor wrote:
5) can censor headers for privacy reasons, preferably with special handling for Referer to work around dodgy hotlink prevention

Not sure if 5 is still relevant for HTTPS.
It is still relevant, though not as much as with insecure connections. The headers can still be seen by the site you are accessing regardless of protocol; HTTPS only prevents anyone else from seeing or tampering with them.

HTTPS does not prevent the site itself from serving malware or anything like that, but it does prevent an unauthorized "man in the middle" from adding such malware in transit.

They want to disable sensitive JavaScript features based on protocol, but I think it makes more sense to enable/disable them by user settings instead. (Although, what Mozilla/Google/W3C proposes could be used as a default setting.)

Let's Encrypt is good though.

My own server will continue supporting unencrypted connections forever, even if the ability to sign and/or encrypt some (or all) files is added later on too.

I too think we should see what proxies are available. I don't know how easily something might be written with Node.js or Perl or something. In any case, this site no longer requires HTTPS, and I am glad of that (nocash also likes this); therefore, if you cannot set up the proxy for HTTPS, you can set it up for HTTP instead.

HSTS is terrible though.
Re: Suggestions for HTTPS caching MITM/proxy
by on (#179589)
There's Paros, a proxy program you can install on your computer to sniff your own HTTPS traffic. You have to install its certificate to allow connections, because of how HTTPS sniffing works. It is technically a MITM proxy.

But it's not a caching proxy or anything like that.
Re: Suggestions for HTTPS caching MITM/proxy
by on (#179613)
tepples wrote:
Comcast, for instance, has been known to insert advertisements into legacy cleartext HTTP sessions, such as to add an in-page pop-up notifying the subscriber that a newer model of wireless gateway is available.
This is what the term "man-in-the-middle proxy" made me think of, because this:
tepples wrote:
...your browser or other user agent (UA) sends a request to a proxy, then the proxy forwards the request to the origin server on your behalf, receives the response, and then returns the response to the UA.
is just the definition of the word "proxy". I guess the qualifier is meaningful when encryption is involved.

tepples wrote:
Then I guess I must follow different forums from you.

Nesdev is the only forum I follow right now. I haven't been keeping up with news of any kind for the last few years due to health issues.

I made a start on writing my own proxy a while back, meaning to include a bunch of latency-reduction and dropout-handling tricks that I don't think anyone has put in an HTTP proxy before (because I'm probably the only person on Earth who wants them), but gave up in frustration when I realized it would be useless by the time I finished it. How many hoops would I have to jump through to implement this HTTPS certificate hoo-hah? Are there any tools that could handle that for me, and pass non-sensitive traffic through a normal caching proxy?
Re: Suggestions for HTTPS caching MITM/proxy
by on (#179988)
Running a private CA is theoretically just a few commands in certtool or openssl. You would need to do three things:

  1. Generate a keypair for your CA, and export the private key and root certificate. Because key generation depends on high-quality random numbers, such as those obtained from the exact timing of keystrokes and mouse movements, this may be easier on a workstation or on a server with a hardware random number generator. The GnuTLS page on the Ubuntu Community Help Wiki describes how to do this using GnuTLS's certtool.
  2. Install this root certificate as a trusted CA certificate in all your browsers.
  3. Give the private key to the HTTPS proxy software so that it can make a certificate for each site that you visit (a rough code sketch of these steps follows below).
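
If you'd rather do step 1 in code than with certtool, it amounts to something like the following sketch (Python with the pyca/cryptography package; the file names are just placeholders). It generates the CA keypair, self-signs a root certificate, and writes out the two files you need for steps 2 and 3:

Code:
import datetime
from cryptography import x509
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import rsa
from cryptography.x509.oid import NameOID

# Step 1: CA keypair plus a self-signed root certificate.
ca_key = rsa.generate_private_key(public_exponent=65537, key_size=4096)
name = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, "Local proxy CA")])
now = datetime.datetime.utcnow()
ca_cert = (
    x509.CertificateBuilder()
    .subject_name(name)
    .issuer_name(name)                      # self-signed: issuer == subject
    .public_key(ca_key.public_key())
    .serial_number(x509.random_serial_number())
    .not_valid_before(now)
    .not_valid_after(now + datetime.timedelta(days=3650))
    .add_extension(x509.BasicConstraints(ca=True, path_length=None), critical=True)
    .sign(ca_key, hashes.SHA256())
)

# Step 2: import ca.crt into each browser's certificate store.
with open("ca.crt", "wb") as f:
    f.write(ca_cert.public_bytes(serialization.Encoding.PEM))

# Step 3: hand ca.key to the proxy so it can sign per-site certificates.
with open("ca.key", "wb") as f:
    f.write(ca_key.private_bytes(serialization.Encoding.PEM,
                                 serialization.PrivateFormat.TraditionalOpenSSL,
                                 serialization.NoEncryption()))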

But I'll step back again to rule out an XY problem, as is traditional when composing a question for any Stack Exchange site:

What's the benefit of running a locally caching proxy over letting your primary web browser cache a resource until it Expires:? I can think of a few reasons:

  1. Your browser is deleting something from cache before it Expires: to save space for another object, but your proxy would still keep it cached because of greater capacity (namely 20 GiB).
  2. You use multiple browsers, either on one computer or on multiple devices in a household, and want them to share a cache. In this case, watch for a header to the effect of Vary: User-Agent, which means a cache shouldn't assume it can serve an identical resource to different kinds of browser (see the sketch after this list).
  3. You often visit sites that fail to follow best practices for Expires: and other cache control headers.
  4. You want to cache a resource even after it Expires:, in what the W3C's HTML5 standard calls "willful violation".
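
To illustrate point 2: before a shared cache serves a stored response, it has to make roughly this kind of check (a heavy simplification of the rules in RFC 7234, with header-name case ignored for brevity):

Code:
import datetime
from email.utils import parsedate_to_datetime

def can_serve_from_cache(cached, request_headers):
    # Honor Expires: ...
    expires = cached["headers"].get("Expires")
    if expires:
        if parsedate_to_datetime(expires) <= datetime.datetime.now(datetime.timezone.utc):
            return False   # stale: revalidate or refetch instead
    # ... and Vary:, e.g. "Vary: User-Agent" means the new request must carry
    # the same User-Agent value as the request that originally filled the cache.
    for field in cached["headers"].get("Vary", "").split(","):
        field = field.strip()
        if field and request_headers.get(field) != cached["request_headers"].get(field):
            return False
    return True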

Which best describes your use case?

And by "1 GiB of RAM", did you mean total on the machine, or did you mean that's how much RAM you have devoted just to the cache?
Re: Suggestions for HTTPS caching MITM/proxy
by on (#180051)
tepples wrote:
Running a private CA is theoretically just a few commands in certtool or openssl. You would need to do three things:

The actual question, which I admit could have been phrased better, was how to implement step 3 in my own proxy software. But your response was probably more useful to me at this point anyway, so thanks.

tepples wrote:
What's the benefit of running a locally caching proxy over letting your primary web browser cache a resource until it Expires:?

All of the above. Plus, running a dedicated proxy lets me configure caching how I like it, not how Google, Mozilla or someone else with an unmetered fibre connection likes it, and I only have to trawl through one poorly-written manual instead of half a dozen before I find out that the feature I'm looking for is not supported.

Offtopic: I fail to see why Expires: should actually cause a deletion. If an object has Last-Modified:/ETag:, those should be used instead; the optimal eviction policy depends on the user's requirements.

tepples wrote:
And by "1 GiB of RAM", did you mean total on the machine, or did you mean that's how much RAM you have devoted just to the cache?

That's the RAM limit for the cache. It never actually uses all of it, since my connection is so slow, but I can easily spare that much, since I have 8 GiB total.
Re: Suggestions for HTTPS caching MITM/proxy
by on (#180062)
Rahsennor wrote:
tepples wrote:
Running a private CA is theoretically just a few commands in certtool or openssl. You would need to do three things:

The actual question, which I admit could have been phrased better, was how to implement step 3 in my own proxy software.

I'll keep that in mind for phase 2 when we figure out how to make the software that the Software Recs users fail to turn up.

Quote:
I fail to see why Expires: should actually cause a deletion. If an object has Last-Modified:/ETag:, those should be used instead; the optimal eviction policy depends on the user's requirements.

Some documents are intended for viewing by only the logged-in user, and they send a short Expires: so that the browser doesn't continue to display a document containing a user's private data even after that person has walked away from the PC and someone else has started to use it. (Think public library.)

Thank you for your clarifications. I've posted this question on Software Recs. If there's anything I failed to capture in my restatement of your requirements, let me know.
Re: Suggestions for HTTPS caching MITM/proxy
by on (#180088)
tepples wrote:
Some documents are intended for viewing by only the logged-in user, and they send a short Expires: so that the browser doesn't continue to display a document containing a user's private data even after that person has walked away from the PC and someone else has started to use it. (Think public library.)

It's been way too long since I last looked at the RFCs, but I thought keeping private data out of caches was the job of Cache-Control: and Vary: + cookies.

The comment about Expires: wasn't intended to be a requirement, by the way; I meant that since stale objects can be re-used after a 304 Not Modified, deleting them immediately is counter-productive, assuming you have enough storage. Polipo only deletes files based on how long ago they were last accessed.
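
Concretely, the re-use I have in mind looks something like this (a sketch using the Python requests package, not how polipo actually does it):

Code:
import requests

def refresh(url, cached):
    # Revalidate a stale cached entry instead of throwing it away: if the
    # origin answers 304 Not Modified, the stored body is still good and
    # nothing but headers crossed the (metered) link.
    headers = {}
    if cached.get("etag"):
        headers["If-None-Match"] = cached["etag"]
    if cached.get("last_modified"):
        headers["If-Modified-Since"] = cached["last_modified"]
    r = requests.get(url, headers=headers, timeout=60)
    if r.status_code == 304:
        return cached["body"]
    cached.update(etag=r.headers.get("ETag"),
                  last_modified=r.headers.get("Last-Modified"),
                  body=r.content)
    return r.content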

But again, I haven't looked at the HTTP spec in months.

tepples wrote:
Thank you for your clarifications. I've posted this question on Software Recs. If there's anything I failed to capture in my restatement of your requirements, let me know.

If anything, you've probably made it too much like what I would have posted. Thanks again for trying to help, though I doubt there's an existing utility out there that would fulfill even most of my wishlist.
Re: Suggestions for HTTPS caching MITM/proxy
by on (#183793)
Related discussions about interference of HTTPS with caching:

Re: Suggestions for HTTPS caching MITM/proxy
by on (#222349)
When Slashdot and SoylentNews discuss the web's shift to HTTPS, I continue to ask for recommendations. Recently, urza9814 suggested switching to SquidGuard.