Web Request Always Redirects, While Web Browser Does Not

Today I had a breakthrough on an issue that has been stewing for many months.  Every once in a while, I would pick the project back up and see if I could approach it differently and maybe come up with a solution.  Today is the day.

The problem was in one of my homebrew personal utility programs, which I use to download images from the Internet.  One day, when I went to run the utility, it suddenly started failing, and I could not get it working again.  When I requested the same URL in a web browser, the download worked without any problem, but trying to download through code was a no-go.

Clearly, there was a difference between my handmade code and a web browser; the task was to determine what that difference was so the code could better emulate the browser.  I added many HTTP headers, trying to find the missing piece.  I tried capturing cookies, but that had no effect either.  I even used a network monitor to capture the traffic and couldn't see any significant difference between my code and a browser.

Today, starting from scratch in a new project, I tried again.  To get the basic browser headers I needed, I visited a web site, as I had done in previous attempts.  This particular web site returned a number of headers I had never seen before, which I found very curious.  When I duplicated all of those headers in my code, my requests worked.  Eliminating the headers one by one identified the single header I actually needed: X-Forwarded-Proto.
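The elimination step looked roughly like the sketch below.  Aside from X-Forwarded-Proto, the header names and values are illustrative placeholders rather than the exact set the browser sent, and the URL is hypothetical.

    using System;
    using System.IO;
    using System.Net;

    class HeaderExperiment
    {
        static void Main()
        {
            // Hypothetical URL; the real image host is not shown here.
            var request = (HttpWebRequest)WebRequest.Create("https://images.example.com/photo.jpg");

            // Headers copied from the browser's request, then removed one at a
            // time until the download broke again.  Values are placeholders.
            request.UserAgent = "Mozilla/5.0";
            request.Accept = "image/avif,image/webp,*/*";
            request.Headers.Add("X-Forwarded-Proto", "https"); // the one that mattered

            using (var response = (HttpWebResponse)request.GetResponse())
            using (var stream = response.GetResponseStream())
            using (var file = File.Create("photo.jpg"))
            {
                stream.CopyTo(file);
            }
        }
    }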

This header has been around for nearly a decade, although I had never heard of it, nor seen it implemented anywhere.  It seems to be specific to load balancers, which is actually relevant, since the image host I am calling runs through a load balancer.  I was able to add the header to a standard WebClient, without needing to drop down to the HttpWebRequest level of code, and my requests started working again.
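For reference, here is a minimal sketch of the final fix.  The URL and user-agent value are placeholders; only the X-Forwarded-Proto header reflects the actual change.

    using System;
    using System.Net;

    class ImageDownloader
    {
        static void Main()
        {
            // Hypothetical URL; the real image host is not shown here.
            string url = "https://images.example.com/photo.jpg";

            using (var client = new WebClient())
            {
                // The header that made the difference: tell the load balancer
                // the original request arrived over HTTPS.
                client.Headers.Add("X-Forwarded-Proto", "https");

                // A browser-like user agent (placeholder value).
                client.Headers.Add("User-Agent", "Mozilla/5.0");

                client.DownloadFile(url, "photo.jpg");
            }
        }
    }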

I am not sure whether requiring this obscure header was a browser-sniffing check by the load balancer to weed out bots and automated download tools (a category my utility would fall into), or whether it was just part of a new, stricter set of W3C standards that got included in a software update.  Realistically, it didn't break anything for their users, since their primary traffic is web browsers.