Analysis of the Alexa Top 1M sites (Feb 2018)

Prior to the release of the Mozilla Observatory in June of 2016, I ran a scan of the Alexa Top 1M websites. Despite being available for years, the usage rates of modern defensive security technologies was frustratingly low. A lack of tooling combined with poor and scattered documentation had led to minimal awareness around countermeasures such as Content Security Policy (CSP), HTTP Strict Transport Security (HSTS), and Subresource Integrity (SRI).

Since then, a number of additional assessments have done, including in October 2016 and June 2017. Both of those surveys demonstrated clear and continual improvement in the state of internet security. But now that tools like the Mozilla Observatory, securityheaders.io and Hardenize have become more commonplace, has the excitement for improvement been tempered?

February 2018 Scan

Technology June 2017 February 2018 % Change
(June 2017)
% Change
(All‑Time1)
Content Security Policy (CSP) .018%2
.043%3
.022%2
.112%3
+22%
+161%
+340%
+833%
Cookies (Secure/HttpOnly)4 6.50% 8.97% +38% +139%
Cross-origin Resource Sharing (CORS)5 96.55% 96.89% +.35% +3.3%
HTTPS 45.80% 54.31% +19% +83%
HTTP → HTTPS Redirection 14.38%6
22.88%7
21.46%6
32.82%7
+49%
+43%
+324%
+268%
Public Key Pinning (HPKP) 0.71% 1.07% +51% +148%
  — HPKP Preloaded8 0.43% 0.70% +63% +71%
Strict Transport Security (HSTS)9 4.37% 6.03% +38% +245%
  — HSTS Preloaded8 .337% .631% +87% +299%
Subresource Integrity (SRI) 0.113%10 0.182%11 +61% +1113%
X-Content-Type-Options (XCTO) 9.41% 11.72% +21% +89%
X-Frame-Options (XFO)12 10.98% 12.55% +14% +84%
X-XSS-Protection (XXSSP)13 8.12% 10.36% +28% +106%

Improvement across the web appears to be continuing at a steady rate. Although a 19% increase in the number of sites that support HTTPS might seem small, the absolute numbers are quite large — it represents over 83,000 websites, a slight slowdown from the previous survey's 119,000 jump, but still a great sign of progress in encrypting the web's long tail.

Not only that, but an additional 97,000 of the top websites have chosen to be HTTPS by default, with another 16,000 of them forbidding any HTTP access at all through the use of HTTP Strict Transport Security (HSTS). Also notable is the jump in websites that have chosen to opt into being preloaded in major web browsers, via a process known as HSTS preloading. Until browsers switch to HTTPS by default, HSTS preloading is the best method for solving the trust-on-first-use problem in HSTS.

Content Security Policy (CSP) — one of the most important recent advances due to its ability to prevent cross-site scripting (XSS) attacks — continues to see strong growth. Growth is faster in policies that ignore inline stylesheets (CSS), perhaps reflecting the difficulties that many sites have with separating their presentation from their content. Nevertheless, improvements brought about by specification additions such as 'strict-dynamic' and policy generators such as the Mozilla Laboratory continue to push forward CSP adoption.

Mozilla Observatory Grading

Despite this progress, the vast majority of top websites around the web continue not to use Content Security Policy, Strict Transport Security, or Subresource Integrity. As these technologies — when properly used — can nearly eliminate huge classes of attacks against sites and their users, they are given a significant amount of weight in Observatory scans.

As a result of their low usage rates amongst top websites, they typically receive failing grades from the Observatory. But despite new tests and harsher grading, I continue to see improvements across the board:

Grade April 2016 October 2016 June 2017 February 2018 % Change
  A+ .003% .008% .013% .018% +38%
A .006% .012% .029% .011% -62%
B .202% .347% .622% 1.08% +74%
C .321% .727% 1.38% 2.04% +48%
D 1.87% 2.82% 4.51% 6.12% +36%
F 97.60% 96.09% 93.45% 90.73% -2.9%

As 976,930 scans were successfully completed in the last survey, a decrease in failing grades by 2.9% implies that over 27,000 of the top sites in the world have improved from a failing grade in the last eight months alone. Note that the drop in A grades is due to a recent change where extra credit points can no longer be used to move up to an A grade.

Thus far, over 140,000 websites around the web have directly used the Mozilla Observatory to improve their grades, indicated by making an improvement to their website after an initial scan. Of these 140,000 websites, over 2,800 have improved all the way from a failing grade to an A or A+ grade.

When I first built the Observatory at Mozilla, I had never imagined that it would see such widespread use. 6.6M scans across 2.3M unique domains later, it seems to have made a significant difference across the internet. I couldn't have done it without the support of Mozilla and the security researchers who have helped to improve it.

Please share the Mozilla Observatory so that the web can continue to see improvements over the years to come!



Footnotes:

  1. Since April 2016
  2. Allows 'unsafe-inline' in neither script-src nor style-src
  3. Allows 'unsafe-inline' in style-src only
  4. Amongst sites that set cookies
  5. Disallows foreign origins from reading the domain's contents within user's context
  6. Redirects from HTTP to HTTPS on the same domain, which allows HSTS to be set
  7. Redirects from HTTP to HTTPS, regardless of the final domain
  8. As listed in the Chromium preload list
  9. max-age set to at least six months
  10. Percentage is of sites that load scripts from a foreign origin
  11. Percentage is of sites that load scripts
  12. CSP frame-ancestors directive is allowed in lieu of an XFO header
  13. Strong CSP policy forbidding 'unsafe-inline' is allowed in lieu of an XXSSP header

[Category: Security] [Permalink]


HTTP Status Code Handling

I was recently writing some code for the Mozilla Observatory to store and interact with the HTTP status codes. As part of my code, I wanted to ensure that I would only store these status codes if they were an integer as per the HTTP/1.1 specification:

 The status-code element is a three-digit integer code giving the
 result of the attempt to understand and satisfy the request.

While it is easy to create test cases for conditions that don't satisfy this requirement, it is somewhat more difficult to determine how third-party libraries will handle HTTP requests that fall outside this constraint. I looked around the internet for websites to help me test weird status codes, but most of them only let me test with the known status codes. As such, I decided to add arbitrary HTTP status codes to my naughty httpbin fork, called misbehaving.site.

What I discovered is that the various browser manufacturers have wildly different behavior with how they handle unknown HTTP status codes. Here is what the HTTP specification says that browsers should do:

 HTTP status codes are extensible.  HTTP clients are not required to
 understand the meaning of all registered status codes, though such
 understanding is obviously desirable.  However, a client MUST
 understand the class of any status code, as indicated by the first
 digit, and treat an unrecognized status code as being equivalent to
 the x00 status code of that class, with the exception that a
 recipient MUST NOT cache a response with an unrecognized status code.

…so what happens in reality?

Chrome

Chrome's behavior is strange, but surprisingly not the strangest of the major browsers:

Chrome's HTTP status code behavior

For negative status codes, Chrome always displays HTTP status code 200. For 0, it simply displays Finished instead of the actual status code. It otherwise simply reflects the status code, unless it exceeds 2147483647 (231-1), in which case it displays 2147483647.

Note that when exceeding 2147483647, it displays this error in the console, despite the page otherwise loading normally:

Chrome's HTTP status code behavior

Firefox

It actually took me quite a while to figure out Firefox's behavior. Let's take a look:

Firefox's HTTP status code behavior

Status codes in Firefox are modulo 65536 (216), unless it works out to 0, in which it displays status code 200.

This works up to a certain point, when it starts to display different behavior:

Firefox's HTTP status code behavior

Note how the status icon (blue dot, yellow triangle, etc.) is dependent on the first digit of the status code, once Firefox has finished interpreting it.

Safari

Safari only accepts status codes between 1 and 999. Should the status code fall outside that range, it reflects the entire HTTP request as plaintext, headers and all:

Safari's HTTP status code behavior

It also displays this error in the browser console. I'm not sure why, as the output is just JSON and there isn't any script on the page:

Safari's HTTP status code behavior

Note if you serve from localhost instead of a remote server, it displays a different error:

Safari's HTTP status code behavior

Edge

Not to be left behind, Edge also has some unusual HTTP status code handling:

Edge's HTTP status code behavior

For status code 0, it displays (Pending), although the page otherwise loads normally. For negative status codes, it displays them as the status code modulo 4294967296 (232). This is unless the status code is less than -4294967295, in which case it displays 1.

For positive status codes, it simply reflects them. This is until the status code reaches 4294967296 or higher, in which case it shows (Pending) and the browser displays this error:

Edge's HTTP status code behavior

Final words

Those who have been around in computing for a long time are likely familiar with Postel's Law:

 Be liberal in what you accept, and conservative in what you send.

While it seems like the neighborly thing to do, it is the bane of those of us who enjoy consistent software behavior. If the specification had simply stated that status codes falling outside 100-599 should be treated as an unrecoverable error, then we wouldn't see the unusual behavior that we see today.

Luckily, while all of the browsers have their own idiosyracies, none of them are actually harmful in this case.

If you enjoyed this post and would like to test how browsers handle other quirky HTTP responses, please consider opening an issue or sending a pull request to the misbehaving.site github repository.

[Category: Security] [Permalink]


Analysis of the Alexa Top 1M sites (June 2017)

Prior to the release of the Mozilla Observatory a year ago, I ran a scan of the Alexa Top 1M websites. Despite being available for years, the usage rates of modern defensive security technologies was frustratingly low. A lack of tooling combined with poor and scattered documentation had led to there being little awareness around countermeasures such as Content Security Policy (CSP), HTTP Strict Transport Security (HSTS), and Subresource Integrity (SRI).

A few months after the Observatory's release — and 1.5M Observatory scans later — I reassessed the Top 1M websites. The situation appeared as if it was beginning to improve, with the use of HSTS and CSP up by approximately 50%. But were those improvements simply low-hanging fruit, or has the situation continued to improve over the following months?

Technology April 2016 October 2016 June 2017 % Change
Content Security Policy (CSP) .005%1
.012%2
.008%1
.021%2
.018%1
.043%2
+125%
Cookies (Secure/HttpOnly)3 3.76% 4.88% 6.50% +33%
Cross-origin Resource Sharing (CORS)4 93.78% 96.21% 96.55% +.4%
HTTPS 29.64% 33.57% 45.80% +36%
HTTP → HTTPS Redirection 5.06%5
8.91%6
7.94%5
13.29%6
14.38%5
22.88%6
+57%
Public Key Pinning (HPKP) 0.43% 0.50% 0.71% +42%
  — HPKP Preloaded7 0.41% 0.47% 0.43% -9%
Strict Transport Security (HSTS)8 1.75% 2.59% 4.37% +69%
  — HSTS Preloaded7 .158% .231% .337% +46%
Subresource Integrity (SRI) 0.015%9 0.052%10 0.113%10 +117%
X-Content-Type-Options (XCTO) 6.19% 7.22% 9.41% +30%
X-Frame-Options (XFO)11 6.83% 8.78% 10.98% +25%
X-XSS-Protection (XXSSP)12 5.03% 6.33% 8.12% +28%

The pace of improvement across the web appears to be continuing at an astounding rate. Although a 36% increase in the number of sites that support HTTPS might seem small, the absolute numbers are quite large — it represents over 119,000 websites.

Not only that, but 93,000 of those websites have chosen to be HTTPS by default, with 18,000 of them forbidding any HTTP access at all through the use of HTTP Strict Transport Security.

The sharp jump in the rate of Content Security Policy (CSP) usage is similarly surprising. It can be difficult to implement for a new website, and often requires extensive rearchitecting to retrofit to an existing site, as most of the Alexa Top 1M sites are. Between increasingly improving documentation, advances in CSP3 such as 'strict-dynamic', and CSP policy generators such as the Mozilla Laboratory, it appears that we might be turning a corner on CSP usage around the web.

Observatory Grading

Despite this progress, the vast majority of large websites around the web continue to not use Content Security Policy and Subresource Integrity. As these technologies — when properly used — can nearly eliminate huge classes of attacks against sites and their users, they are given a significant amount of weight in Observatory scans.

As a result of their low usage rates amongst established websites, they typically receive failing grades from the Observatory. Nevertheless, I continue to see improvements across the board:

Grade April 2016 October 2016 June 2017 % Change
  A+ .003% .008% .013% +62%
A .006% .012% .029% +142%
B .202% .347% .622% +79%
C .321% .727% 1.38% +90%
D 1.87% 2.82% 4.51% +60%
F 97.60% 96.09% 93.45% -2.8%

As 969,924 scans were successfully completed in the last survey, a decrease in failing grades by 2.8% implies that over 27,000 of the largest sites in the world have improved from a failing grade in the last eight months alone.

In fact, my research indicates that over 50,000 websites around the web have directly used the Mozilla Observatory to improve their grades, indicated by scanning their website, making an improvement, and then scanning their website again. Of these 50,000 websites, over 2,500 have improved all the way from a failing grade to an A or A+ grade.

When I first built the Observatory a year ago at Mozilla, I had never imagined that it would see such widespread use. 2.8M scans across 1.55M unique domains later, it seems to have made a significant difference across the internet. I feel incredibly lucky to work at a company like Mozilla that has provided me with a unique opportunity to work on a tool designed solely to make internet a better place.

Please share the Mozilla Observatory so that the web can continue to see improvements over the years to come!



Footnotes:

  1. Allows 'unsafe-inline' in neither script-src nor style-src
  2. Allows 'unsafe-inline' in style-src only
  3. Amongst sites that set cookies
  4. Disallows foreign origins from reading the domain's contents within user's context
  5. Redirects from HTTP to HTTPS on the same domain, which allows HSTS to be set
  6. Redirects from HTTP to HTTPS, regardless of the final domain
  7. As listed in the Chromium preload list
  8. max-age set to at least six months
  9. Percentage is of sites that load scripts from a foreign origin
  10. Percentage is of sites that load scripts
  11. CSP frame-ancestors directive is allowed in lieu of an XFO header
  12. Strong CSP policy forbidding 'unsafe-inline' is allowed in lieu of an XXSSP header

[Category: Security] [Permalink]


Recording MTGO in 4K with OBS

One of the perennial complaints about MTGO streams and recordings is how difficult the cards are to read. And it's no surprise — pretty much any program would struggle with the requirements that Magic imposes. It has to combine sometimes obscene amounts of text with an eye-wateringly small rendering area of less than a dozen square centimeters.

This problem is exacerbated by the fact that most MTGO streamers record at the standard resolution of 1080p. There are simply not enough pixels available to legibly render a font in such a small area at such a low resolution. Here is an example of what I mean:

Demonstration of 1080p vs. 4K resolution
Left: 1080p Channel Fireball recording
Middle: 4K recording, smallest hand size
Right: 4K recording, standard hand size

This is not meant to pick on Channel Fireball. Their content is impeccable, but much of it is hard to read unless you know exactly what is going on. To be clear, it's not just Channel Fireball; even professional content put out by Wizards of the Coast suffers from similar issues. Unless you follow Standard, I suspect you'll have a hard time telling me what these cards from the recent Magic Online Championship do:

MTGO Championship Screenshot

Is 4K really that big of a deal?

YouTube displays recordings in a variety of resolutions, from 144p all the way up to 2160p (4K). It may not seem as if there is a big difference between 1080p and 2160p, but remember that the “1080” in “1080p” only refers to the number of vertical pixels. In terms of overall pixels, there is a pretty vast gulf between 1080p and 4K:

YouTube Resolution Pixels % of 1080p
4K 3840x2160 8,294,400 400%
1440p 2560x1440 3,686,400 178%
1080p 1920x1080 2,073,600
720p 1280x720 921,600 44%
480p 854x480 409,920 20%
360p 640x360 230,400 11%
240p 426x240 102,240 5%
144p 256x144 36,864 2%

As you can see, 4K gives you 400% more pixels to render legible text in the exact same amount of screen space. That's why the Deathrite Shaman on the right can clearly display its entire text box despite taking up the exact same amount of space as the Deathrite Shaman on the left.

Demonstration of 1080p vs. 4K resolution

Cranking OBS to 11

Before I began recording, I simply assumed that 1080p recordings were a matter of inertia. Everybody recorded in 1080p, so what was the point in trying to bump the resolution up to 4K? After all, 4K means more CPU usage, larger files, and slower downloads. Why bother when nobody else was doing it?

It turns out my assumptions were way, way wrong. It turns out that it's actually really difficult to record in 4K while playing MTGO. My machine isn't top-of-the-line, but it's nothing to scoff at: a 2013 Macbook Pro with quad 2.6GHz i7 CPUs and 16GB of memory.

I assumed that all I'd have to do was tell OBS to record at the native “Retina” resolution (3360x2100) and I could go on my merry way. What happened when I did that?

OBS CPU Usage

PAIN. Telling OBS to record at 4K in real time took so much CPU that my machine was rendered completely useless. I couldn't actually play MTGO, because each click took over 15 seconds to register.

I then tried telling OBS to use one of its faster CPU settings (ultrafast), but the image quality came out very poor, with lots of noise and other encoding artifacts:

OBS on Ultrafast

Path to 4K

This process left with a whole new respect and understanding for the professionals who do these recordings. It's simply impossible to have:

  1. High quality 4K recordings
  2. Low CPU usage
  3. Managably-sized video files that you can upload directly to YouTube

I realized I'd have to compromise and do my recordings in two steps:

  1. Record with at high quality and low CPU usage, but large video sizes
  2. Re-encode post-recording to generate high-quality videos with low file sizes

This is certainly more work, and takes a lot more time. But it comes with some benefits:

  1. The recording takes very little CPU usage, instead of causing the typical OBS lag
  2. The re-encoding takes 8-12 hours, but the settings used result in YouTube quickly generating all the other resolutions post-upload
  3. My screen's resolution (3360x2100) is actually lower than 4K (3840x2160), but the re-encoding lets me upscale to 4K

The Technical Details

So how did I do it? Here are my OBS video settings:

  1. Recording Format: mp4
  2. Encoder: x264
  3. Rescale Output: unchecked (native resolution)
  4. Rate Control: CRF
  5. CRF: 12
  6. Keyframe Interval: 0 (auto)
  7. CPU Usage Preset: superfast
  8. Tune: stillimage
  9. Variable Framerate (VFR): checked

These settings generate very high quality recordings that average about 1GB for every ten minutes of recording. Lowering the CRF value leads to higher quality files at the cost of increased CPU usage, and 12 was the highest quality my machine could handle. If you find these settings too aggressive, bump CRF to a higher number.

Once I am finished recording, I have an automated job that upscales and re-encodes with ffmpeg, using the optimal YouTube video settings:

$ ffmpeg -i input.mp4 \
  -c:v libx264 \
  -crf 21 \
  -tune stillimage \
  -bf 2 \
  -c:a copy \
  -pix_fmt yuv420p \
  -flags +cgop \
  -sws_flags lanczos \
  -movflags faststart \
  -vf scale=-1:2160 \
  output.mp4

You don't need to know what all those settings do. Suffice to say, the generated files are perfect for YouTube and have no perceptible loss in quality despite being 75% smaller.

How big of a difference does 4K make in practice? All I can say is to take a gander at one of my recent video settings and witness the 4K difference for yourself.

[Category: Magic] [Permalink]


Understanding CORS

RTFM… just kidding! There is no manual for the CORS (Cross-Origin Resource Sharing) specification. I really had you going there, didn't I?

Don't worry, it's not your fault. After all, here is what a Google search provides:

Google Results for searching for CORS documentation

Each of these sites contains a wealth of information about CORS, and each of them is far over the head of your average developer. Given the frequent questions that I receive from confused and frightened developers trying to understand these documents, I thought it might be helpful to boil CORS down into a couple simple examples.

Q. If I have static content that depends neither upon cookies nor user-specific URLs and/or parameters and I want to share my site's content with the web, what should I do?

A.

Access-Control-Allow-Origin: *


Q. Well, that is great and all. But what if I want to let a foreign website interact with my site, as a logged-in user, allowing them to do anything they could as if they were on my site? I swear that I understand the risks that this entails and that I really trust this other site to not make any security mistakes such as falling victim to a cross-site scripting (XSS) attack.

A.

Access-Control-Allow-Credentials: true
Access-Control-Allow-Methods: GET, HEAD, OPTIONS, POST, PUT
Access-Control-Allow-Origin: https://example.com
Access-Control-Expose-Headers: X-Poop-Emoji
Access-Control-Max-Age: 300

Where these headers mean the following:

  • Access-Control-Allow-Credentials means that the user's cookies (such as their session cookies) will be sent with the request
  • Access-Control-Allow-Origin is the whitelisted origin sent in the Origin header by the browser and not * nor blindly reflected

And these optional headers mean the following:

  • Access-Control-Allow-Methods is the list of allowed HTTP methods beyond GET, HEAD, and POST
  • Access-Control-Expose-Headers allows example.com to read the contents of the X-Poop-Emoji header (💩, obviously)
  • Access-Control-Max-Age allows example.com to make these requests without preflights for the next 300 seconds

Again, please be aware that you need to be very careful with Access-Control-Allow-Credentials. Even if you think you're safe by only allowing idempotent methods such as GET, that might be enough to steal an anti-CSRF token and let attackers go to town with CSRF attacks.

If you need additional documentation about other features in CORS, I highly recommend the frustratingly hard to locate CORS for Developers document by Brad Hill.

[Category: Standards] [Permalink]