ISPs and proxies can sometimes alter your site’s HTML without your permission. See how they do it, why, and how you can put a stop to it.
I’m in the fortunate position where I have to commute an hour and fifteen minutes to and from work each day. Some people might see that as a curse, but I really enjoy the opportunity for some quiet time, my laptop, and a table. It also allows me to write articles like this.
One of the issues that I do face is the intermittent access to the internet over the course of the journey.
The train comes with Wi-Fi; however, they only provide 20MB for the entire day, which gives me access to around 10 web pages (if I’m lucky; don’t get me started on the size of web pages these days).
Instead I opt for my mobile phone’s Wi-Fi hotspot and the 4G/3G connection I can pick up for most of the journey. I use this quite a bit when I’m out and about, and as a result I now sit on 4GB a month (which is only £15… not too bad really).
So with my tethered connection to the internet and my laptop on the go, all that is left is to start writing… right?
Well writing yes, but I’m hamstrung when it comes to doing some other work.
Hey ISPs, leave websites alone
…all in all you’re just a-nother brick in the wall… a really annoying wall that shouldn’t be there. Let me tell you what I mean.
One of the things that I work on in the mornings is reviews of existing responsive design sites for the RWD example section of the site. I take a screenshot of the site using Am I Responsive and then I delve into the HTML/CSS/JS to find out how they put the site together: did they employ progressive enhancement, did they use plugins, how were they configured, is the site using RWD images and which parts of the spec, is there compression on the files, do they concatenate files, does the site use critical CSS, and so on.
The issue that I’ve found recently is that when I ‘view-source’ on a site that I’m reviewing I do not get the version that someone worked very hard to publish… instead I get a version that the ISP decides is best for me.
Recently Jeffrey Zeldman released a new site about the new studio he has founded, aptly named Studio.Zeldman. As the grandfather of designing with web standards, any new site from Jeffrey was something that I really wanted to take a look into and see how he went about building something in today’s age of the web.
The first thing I looked at was how the images were being included.
<img src="http://188.8.131.52/bmi/studio.zeldman.com/img/keynotespeaker.jpg" srcset="img/keynotespeaker-md.jpg 1000w, img/keynotespeaker-lg.jpg 2000w" alt="Photo of Jeffrey Zeldman onstage.">
Looks like he’s using srcset and leaving off the sizes attribute, but hold the phone: he’s also using some kind of different CDN for the main image source and his own server for the other srcset versions?
No, he’s actually not. The ISP is looking at the src files and rewriting the URL to go through its own proxy, which is the IP address at the start of the src above.
I saved both of the files and they are exactly the same. Both weigh in at 3,922 bytes (Kudos to Jeffrey for adding the smallest version as the default src).
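A quick way to confirm that two saved copies really are byte-for-byte the same is to compare their sizes and a hash of their contents. This is only a minimal sketch; the function names `sha256_of` and `same_file` are made up for illustration, and the file paths you pass in are whatever you saved locally.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def same_file(a: Path, b: Path) -> bool:
    """True when both files have the same size and the same digest."""
    # Comparing sizes first is cheap; the digest confirms the content matches.
    return a.stat().st_size == b.stat().st_size and sha256_of(a) == sha256_of(b)
```

Calling `same_file(Path("direct.jpg"), Path("proxied.jpg"))` (hypothetical file names) returns `True` only when the proxy passed the image through untouched.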
The next thing I looked at was the CSS and I noticed that Jeffrey had not included a CSS file at all, and instead he had inlined everything with a strange implementation method.
<style style="display:none">/*! * All Styles here*/ </style>
That seemed a little strange to include all of the styles on the page instead of just the “above the fold” styles. I concluded that because it was only a single page site at the moment, there could be a benefit in removing the HTTP request and going all inline, at the expense of having a slightly bigger HTML request each time.
Also the whole style="display:none;" seemed super weird… those elements are hidden by default and I’m not actually sure you could see them if you wanted to… was this a new method that reaped huge benefits that Jeffrey was sharing with us?
No, it wasn’t.
The ISP was taking the actual reference to the CSS
<link rel="stylesheet" href="/css/main.css?version=1.7" />
pulling out the CSS and inlining it into the page. There was no difference in the CSS itself, just in the way it was being delivered. This means that the ISP is overriding the web developer’s intention for me to cache the CSS file and never have to re-request it (depending on the cache expiry), and is instead adding it to the document on every request.
In Jeffrey’s site he has 3 script tags in the footer for
form. In this case these should be concatenated into a single file.
The ISP in this case does something a little strange. It keeps the first <script> tag as it was and then inlines the next 2 scripts, again using the strange <script style="display:none"> method.
My only guess here is that the scripts file was deemed to be too large to inline and stayed as a standard file, but I’m grasping at straws to properly explain it.
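The tell-tale signs described so far (a style attribute of display:none on style and script tags, and an image src pointing at a bare IP address) are easy to check for in markup you have saved from view-source. The sketch below is one rough way to do that with Python's standard library; the names `RewriteDetector`, `is_ip_host` and `find_rewrites` are invented for this example, and the two heuristics are assumptions based only on what this one ISP did here, so other proxies may well leave different fingerprints.

```python
import ipaddress
from html.parser import HTMLParser
from urllib.parse import urlparse


def is_ip_host(url: str) -> bool:
    """True when the URL's host is a literal IP address rather than a name."""
    host = urlparse(url).hostname
    if not host:
        return False  # relative URLs have no host at all
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


class RewriteDetector(HTMLParser):
    """Collects hints that a proxy may have rewritten the markup."""

    def __init__(self) -> None:
        super().__init__()
        self.findings = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # <style> and <script> are never rendered, so display:none is suspicious.
        style = (attributes.get("style") or "").replace(" ", "").lower()
        if tag in ("style", "script") and "display:none" in style:
            self.findings.append(f"<{tag}> carries style=display:none")
        # An image served from a bare IP suggests something sits in the path.
        src = attributes.get("src") or ""
        if tag == "img" and is_ip_host(src):
            self.findings.append(f"<img> src uses an IP host: {src}")


def find_rewrites(html: str) -> list:
    """Return a list of human-readable findings for the given markup."""
    parser = RewriteDetector()
    parser.feed(html)
    parser.close()
    return parser.findings
```

Feeding it the source you see over the tethered connection and the source you see on another network gives a quick first comparison before diffing the two by hand.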
The weigh in
Let’s just assume the ISP was trying to be nice to me by reducing my bandwidth… except that the original HTML file is 30KB and their ‘enhanced’ version is 143KB.
Downloading the whole web page we see a similar result:
- Original: 615,648 bytes (647 KB on disk) for 17 items
- ISP version: 616,132 bytes (639 KB on disk) for 14 items
The ISP was able to save me 3 HTTP requests by combining 3 JS files into 1 file (2 requests saved) and inlining all of the CSS (1 request saved). This is at the expense of a larger page for every additional visit to the site.
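To put those numbers side by side, here is the arithmetic written out as a small sketch. The figures are the ones quoted above; the variable names are just labels chosen for this example.

```python
# Figures quoted in the article
original_bytes, isp_bytes = 615_648, 616_132
original_items, isp_items = 17, 14
original_html_kb, isp_html_kb = 30, 143

# Total transfer went up slightly, not down
extra_bytes = isp_bytes - original_bytes

# Requests saved by inlining and combining
requests_saved = original_items - isp_items

# The HTML document itself grew, and it is re-downloaded on every visit
extra_html_kb = isp_html_kb - original_html_kb

print(f"Extra bytes overall: {extra_bytes}")
print(f"Requests saved: {requests_saved}")
print(f"Extra HTML per visit: {extra_html_kb} KB")
```

So the trade is 3 fewer requests for 484 more bytes on a first visit, plus roughly 113KB of extra HTML on each repeat visit where the CSS and JS could otherwise have come from cache.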
These figures aren’t great in my opinion, but I’m sure they would be a bit better on a site that hasn’t been carefully developed to be as performant as possible.
Yes, I do have a point.
I can see why this can be a good idea, and it links back to my issue with the 20MB on offer with the train’s default Wi-Fi.
Our web pages have become so bloated these days that companies like Google (AMP), Facebook (Instant Articles), and ISPs are taking matters into their own hands. They are providing services to allow web pages to be crunched down and delivered much faster because the majority of the sites out there are too bloated and too slow.
The knock-on effect is that when people who are web standards and web performance minded build their sites, they do so in a way that is best for the users. These sites do not need any configuring or hacking after the fact; they just need to be served as they were built.
You can force ISPs and other proxies to leave your stuff alone, but it requires making a few server level changes. By adding a Cache-Control: no-transform response header you are telling everyone that you don’t want any of your files messed with. This does mean that your users won’t get the benefit if you’re going to be lazy and not optimise your images the way you should be… but then of course you would be doing that anyway, right?
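How you add the header depends entirely on your server, so treat the following as an illustration rather than a recommendation. It is a minimal sketch using Python's built-in http.server that sends Cache-Control: no-transform on every response; the class name `NoTransformHandler`, the helper `make_server` and the default port are assumptions made for this example. On a real site you would normally set the header in your web server or CDN configuration, and combine it with whatever other caching directives you already send.

```python
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer


class NoTransformHandler(SimpleHTTPRequestHandler):
    """Serves files from the current directory with Cache-Control: no-transform."""

    def end_headers(self) -> None:
        # Added to every response just before the header block is closed.
        self.send_header("Cache-Control", "no-transform")
        super().end_headers()


def make_server(port: int = 8000) -> ThreadingHTTPServer:
    """Build (but do not start) a local server that uses the handler above."""
    return ThreadingHTTPServer(("127.0.0.1", port), NoTransformHandler)
```

Running `make_server().serve_forever()` and inspecting the response headers in your browser's network panel should show the directive on every file served.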
A simpler fix: be secure
HTTPS is quickly becoming a default approach to every website build, or at least it should be.
With the prices of SSL certificates dropping, places like Let’s Encrypt providing free SSL certificates, and CDNs like Cloudflare offering free SSL, there’s little reason not to do it.
The added benefit in the case of this article is that ISPs are not able to fiddle with the documents and files being transferred across the world, so no updating image sources or inlining all of your CSS.
Another benefit of HTTPS is that you can switch across to HTTP/2 as well. While HTTPS isn’t a specific requirement in the specification, so far none of the browsers are looking to support HTTP/2 over an unencrypted connection, something with which I wholeheartedly agree.