Today I had a breakthrough on an issue that has been stewing for many months. Every once in a while, I would pick the project back up to see if I could approach it differently and maybe come up with a solution. Today is the day.
The problem was in one of my homebrew personal utility programs, used to download images from the Internet. One day, when I went to run the utility, it suddenly started failing and I could not get it working again. When I requested the same URL in a web browser, the download worked without any problem, but trying to download using code was a no-go.
Clearly, there was a difference between my handmade code and a web browser. The task was to determine what that difference was so that the code could better emulate the browser. I added many HTTP headers, trying to find the missing piece. I tried capturing cookies, but that had no effect either. I even used a network monitor to capture the network traffic and couldn’t see any significant difference between my code and a browser.
Today, starting from scratch in a new project, I tried again. To get the basic browser header that I needed, I visited a web site, like I had done in previous attempts. This particular web site returned a number of headers I had never seen before, which I found very curious. When I duplicated all those headers in my code, my requests worked. Eliminating the headers one by one identified the one that I needed: X-Forwarded-Proto.
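The one-by-one elimination can be sketched in a few lines. This is only an illustration in Python, not the code from my utility (which is .NET): the function name `find_required_headers` and the `request_succeeds` callback are made up for the example, and the callback stands in for whatever actually sends the request and reports success or failure.

```python
from typing import Callable, Dict, List


def find_required_headers(
    headers: Dict[str, str],
    request_succeeds: Callable[[Dict[str, str]], bool],
) -> List[str]:
    """Return the names of headers whose removal makes the request fail.

    `request_succeeds` is a hypothetical callback: it receives a header
    dictionary, performs the request, and returns True if it worked.
    """
    # The full set has to work first, otherwise elimination tells us nothing.
    if not request_succeeds(headers):
        raise ValueError("the full header set must succeed before eliminating")

    required: List[str] = []
    remaining = dict(headers)
    for name in list(headers):
        # Try the request again with this one header left out.
        trial = {k: v for k, v in remaining.items() if k != name}
        if request_succeeds(trial):
            remaining = trial      # not needed, so keep it out from now on
        else:
            required.append(name)  # request broke, so this header matters
    return required


if __name__ == "__main__":
    # Stand-in for the real server: it only accepts requests carrying
    # X-Forwarded-Proto. The header values here are placeholders.
    def fake_server(h: Dict[str, str]) -> bool:
        return "X-Forwarded-Proto" in h

    candidates = {
        "User-Agent": "Mozilla/5.0",
        "Accept": "*/*",
        "X-Forwarded-Proto": "https",
    }
    print(find_required_headers(candidates, fake_server))
```

Removing headers cumulatively, rather than always testing against the full set, keeps the number of requests equal to the number of headers and leaves only the ones that are truly needed.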
This header has been around for nearly a decade, although I had never heard of it, nor seen it implemented anywhere. It seems to be specific to load balancers, which is actually relevant since the image host I am calling runs through a load balancer. I was able to add the header to a standard WebClient without dropping down to the HttpWebRequest level of code, and my requests started working again.
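For anyone not working in .NET, the same idea looks roughly like this with Python's standard `urllib`. Treat it as a sketch under assumptions: the URL is a placeholder, the function names are invented for the example, and the value `https` is my guess at a sensible setting (the header normally carries `http` or `https`), not something the image host documents.

```python
import urllib.request

# Placeholder URL; substitute the real image address.
IMAGE_URL = "https://images.example.com/picture.jpg"


def build_image_request(url: str, forwarded_proto: str = "https") -> urllib.request.Request:
    """Build a GET request that carries the X-Forwarded-Proto header.

    The default value "https" is an assumption for illustration.
    """
    request = urllib.request.Request(url)
    request.add_header("X-Forwarded-Proto", forwarded_proto)
    return request


def download_image(url: str, destination: str) -> None:
    """Fetch the image and write it to `destination` (performs network I/O)."""
    request = build_image_request(url)
    with urllib.request.urlopen(request, timeout=30) as response:
        data = response.read()
    with open(destination, "wb") as out:
        out.write(data)
```

Calling `download_image(IMAGE_URL, "picture.jpg")` would send the request with the extra header attached. Nothing lower-level than the ordinary request object is needed, which matches what I found with WebClient.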
I am not sure whether requiring this obscure header was a browser-sniffing check by the load balancer to weed out bots and automated download tools (which is how my utility would be classified), or whether it was just part of a new, stricter set of W3C standards that got included in a software update. Realistically, it didn’t break anything for their users, since their primary traffic comes from web browsers.