In addition to HTML documents, the rendering engines of modern web browsers recognize and display about a dozen other file formats, and that list is likely to grow over time.
Because of the powerful scripting capabilities available in some of these formats, and because of the antics of browser content handling, the set of natively supported non-HTML inputs deserves a closer examination at this point, even if a detailed discussion of some of their less obvious security consequences (such as content sniffing) will have to wait until Part II of this book.
Perhaps the most prosaic type of non-HTML document recognized by every single browser is a plaintext file. In this rendering mode, the input is simply displayed as is, typically using a nonproportional typeface, and save for optional character set transcoding, the data is not altered in any way.
All browsers recognize plaintext files served with Content-Type: text/plain in the HTTP headers. In all implementations but Internet Explorer, plaintext is also the fallback display method for headerless HTTP/0.9 responses and HTTP/1.x data with Content-Type missing; in both these cases, plaintext is used when all other content detection heuristics fail. (Internet Explorer unconditionally falls back to HTML rendering, true to the letter of Tim Berners-Lee’s original protocol drafts.)
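The fallback rule described above can be sketched in a few lines of Python. This is only an illustration, not any browser's actual logic: the function name, its parameters, and the returned labels are all hypothetical, and the sketch assumes that content detection heuristics are consulted before the last-resort default in every browser, which the text does not spell out for Internet Explorer.

```python
from typing import Optional


def choose_render_mode(content_type: Optional[str],
                       sniffed_type: Optional[str],
                       is_internet_explorer: bool = False) -> str:
    """Pick a rendering mode label for a response (illustrative only).

    content_type: value of the Content-Type header, or None for a
        headerless HTTP/0.9 response or an HTTP/1.x response without it.
    sniffed_type: result of the browser's content detection heuristics,
        or None if they all failed (hypothetical input).
    """
    if content_type is not None:
        # Drop parameters such as "; charset=utf-8" and normalize case.
        media_type = content_type.split(";", 1)[0].strip().lower()
        if media_type == "text/plain":
            return "plaintext"
        # Other declared types are outside the scope of this sketch.
        return "other"

    # No declared type: assume heuristics get the first say.
    if sniffed_type is not None:
        return sniffed_type

    # Last resort: HTML in Internet Explorer, plaintext everywhere else.
    return "html" if is_internet_explorer else "plaintext"
```

For example, `choose_render_mode(None, None)` yields `"plaintext"`, while the same call with `is_internet_explorer=True` yields `"html"`.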
For the convenience of developers, most browsers also automatically map several other MIME types, including application/javascript and friends[35] or text/css, to plaintext. Interestingly, application/json, the value mandated for JSON responses in RFC 4627, is not on the list (perhaps because it is seldom used in practice).
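As a rough sketch of this mapping, the lookup below treats a handful of MIME types as aliases for plaintext display. The set is illustrative rather than exhaustive: it contains only values named in this section and its footnote, the exact list varies by browser, and the names `PLAINTEXT_ALIASES` and `renders_as_plaintext` are hypothetical.

```python
# Illustrative set of types displayed as plaintext; real browsers differ.
PLAINTEXT_ALIASES = frozenset({
    "text/plain",
    "text/css",
    "application/javascript",
    "text/javascript",
    "application/x-javascript",
    "application/ecmascript",
})


def renders_as_plaintext(content_type: str) -> bool:
    """Return True if the declared type falls in the illustrative alias set."""
    # Ignore parameters such as "; charset=utf-8" and compare case-insensitively.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in PLAINTEXT_ALIASES
```

Under these assumptions, `renders_as_plaintext("text/css")` is true, while `renders_as_plaintext("application/json")` is false, matching the omission noted above.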
Plaintext rendering has no specific security consequences in itself. That said, due to a range of poor design decisions in other browser components and in third-party code, even seemingly harmless non-HTML formats are at risk of being misidentified as, for example, HTML. Attacker-controlled plaintext documents are of special concern because their layout is often fairly unconstrained and therefore particularly conducive to being misidentified. Chapter 13 dissects these threats and provides advice on how to mitigate the risk.
[35] The official MIME type for JavaScript is application/javascript, as per RFC 4329, but about a dozen other values have been used in the past (e.g., text/javascript, application/x-javascript, application/ecmascript).