2009-03-19

XHTML vs. HTML vs. MSIE; or, content-negotiation woes

“See, I like mah HTML in the raw. No sissy ‘editors’ or fancy-ass ‘templates’ for me, no sir. Gimme good old Emacs with an nxml-mode on the rocks like God wanted it. I write proper Appendix C–compliant XHTML 1.0 and the redfaces serve it and all is well.”

“Ah, but you see, friend, Appendix C sucks. Verily I say, if thou sendest them XHTMLs as text/html, then they’ll be tag-souped, so there’s no point. Thou might as well write the plainest HTML.”

Word. And further, I hate composing Appendix C code! I hate having to duplicate id and name and lang and xml:lang and I hate those lame spaces before slashes. It just defeats the purpose of choosing XHTML as the shorter format.

Way back then, when I still thought XML was a good idea, I decided to solve this with content negotiation and XSL. I wrote this stylesheet to convert XHTML 1.1 to HTML 4.01.

sabcmd xhtml2html.xsl eatsquaredonuts.xhtml eatsquaredonuts.html

With some help from make(1), I can write pretty, succinct, non-appendixized XHTML 1.1 and automatically create HTML from it.

Now what? Well, I gotta tell Apache to send XHTML to browsers that support it, and HTML to others. A first attempt could go:

Options +MultiViews
AddType application/xhtml+xml;charset=UTF-8 .xhtml
AddType text/html;charset=UTF-8 .html

Then, instead of making links to /eatsquaredonuts.html or .xhtml, you simply use the URI /eatsquaredonuts. On each request, Apache will have a seat with the browser and they’ll politely decide which version (“representation”) to send.

Looks great, but when I try a stock, run-of-the-mill Firefox, I get the HTML version. That’s not so bad, but you know, the XHTML version is sexier. I’m using this for my resume, for devil’s sake; how am I supposed to impress the guys at Canonical with boring old HTML? I need a job, Firefox. Just what are you asking Apache in those private conversations, don Firefox?

Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8

Ah, see, Firefox says it will accept HTML and XHTML equally well. If neither is available, it will take application/xml, and if even that isn’t on the menu it will make do with anything (*/*). The preference is controlled by the q parameter, which is an implied 1 if omitted, so Firefox is leaving the choice to Apache, and Apache chooses HTML (perhaps because Firefox mentioned it first; I guess old Apache is a lazy man).
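To make the tie visible, here is a minimal Python sketch that splits an Accept header into media ranges and their q values, filling in the implied 1 when q is omitted. The function name `parse_accept` is made up for illustration, and the parsing is deliberately naive (it ignores quoted strings and any parameter other than q).

```python
def parse_accept(header):
    """Return (media_range, q) pairs in the order the browser sent them."""
    accepted = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        if not fields[0]:
            continue  # skip empty entries such as a trailing comma
        q = 1.0  # q is an implied 1 when the browser omits it
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                q = float(value)
        accepted.append((fields[0], q))
    return accepted


FIREFOX = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"

for media_range, q in parse_accept(FIREFOX):
    print(media_range, q)
```

Both text/html and application/xhtml+xml come out with q = 1.0, which is the tie Apache has to break on its own.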

Is there a way to tell Apache to prefer XHTML in case of ties? It’s not clear from the docs, but turns out there is:

AddType application/xhtml+xml;charset=UTF-8;qs=1 .xhtml
AddType text/html;charset=UTF-8;qs=0.99 .html
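A rough way to picture what qs does: Apache's negotiation docs describe multiplying the browser's q for a media type by the server-side qs for that variant and picking the highest product. The Python sketch below models only that one step, under the assumption that nothing else (language, charset, encoding) differs between the variants; `parse_accept`, `browser_q` and `choose` are illustrative names, not anything from Apache itself.

```python
def parse_accept(header):
    """Parse an Accept header into (media_range, q) pairs; q defaults to 1."""
    accepted = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        if not fields[0]:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                q = float(value)
        accepted.append((fields[0], q))
    return accepted


def browser_q(media_type, accepted):
    """q the browser gave media_type: exact match, then type/*, then */*."""
    major = media_type.split("/")[0]
    for candidate in (media_type, major + "/*", "*/*"):
        for media_range, q in accepted:
            if media_range == candidate:
                return q
    return 0.0  # the browser did not list anything that matches


def choose(variants, accept_header):
    """Pick the variant with the highest q * qs, or None if none is acceptable."""
    accepted = parse_accept(accept_header)
    best_type, best_score = None, 0.0
    for media_type, qs in variants:
        score = browser_q(media_type, accepted) * qs
        if score > best_score:
            best_type, best_score = media_type, score
    return best_type


# The qs values from the AddType lines above.
VARIANTS = [("application/xhtml+xml", 1.0), ("text/html", 0.99)]
FIREFOX = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"

print(choose(VARIANTS, FIREFOX))      # application/xhtml+xml
print(choose(VARIANTS, "text/html"))  # text/html
```

With these qs values the XHTML variant scores 1.0 against 0.99 for HTML when both are named with q = 1, while a browser that names only text/html scores the XHTML variant at 0.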

Now Firefox does get my beautiful, precious XHTML, and non-XHTML browsers shouldn’t even see it (since it won’t be in their Accept headers). You know just who I am talking about, right? Let’s test:

Internet Explorer can’t download blahblahblah.

Internet Explorer could not open this site blahblablah not available or could not be found blahblahblah.

wat.

I get it, this message means HUURRR IM A BROWZZARR WHATS THIS XML THINGIE HUUURRR. But why is Apache sending XHTML in the first place? What is the browser asking Apache?

Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, application/x-shockwave-flash, */*

* * *

There’s no text/html.

This is a web browser, and there is no text/html.

It relies on the order to decide content instead of q parameters, and has a */* at the end, and there is no text/html.

* * *

MSIE, the Bane of the Internet. MSIE, the Breaker of Standards. MSIE, the Blight of Developers.

* * *

Because of MSIE, even our first attempt isn’t safe. We’re forced to tell Apache to resolve ties in HTML’s favor:

AddType application/xhtml+xml;charset=UTF-8;qs=0.99 .xhtml
AddType text/html;charset=UTF-8;qs=1 .html

As far as I can tell, the only way to get Firefox to see XHTML without choking MSIE would be some very convoluted new configuration option to Apache. I’d have to tell Apache, “if text/html and application/xhtml+xml tie, then choose XHTML, except if they’re tying in a */* because the browser didn’t even ask for HTML, in which case you should send HTML”. I’m not sure I even want to ask for this feature. I wouldn’t know what to call it.
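For what it's worth, the rule itself is easy to state outside Apache. The Python sketch below is one possible reading of it, not an existing Apache option: send XHTML only when the browser names application/xhtml+xml explicitly and does not rank text/html higher, and fall back to HTML whenever XHTML would only match through a wildcard. All function names here are hypothetical.

```python
def parse_accept(header):
    """Parse an Accept header into (media_range, q) pairs; q defaults to 1."""
    accepted = []
    for part in header.split(","):
        fields = [field.strip() for field in part.split(";")]
        if not fields[0]:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                q = float(value)
        accepted.append((fields[0], q))
    return accepted


def explicit_q(media_type, accepted):
    """q for an exact mention of media_type, or None if it is never named."""
    for media_range, q in accepted:
        if media_range == media_type:
            return q
    return None


def pick(accept_header):
    """Prefer XHTML on ties, but only if the browser asked for it by name."""
    accepted = parse_accept(accept_header)
    xhtml_q = explicit_q("application/xhtml+xml", accepted)
    html_q = explicit_q("text/html", accepted)
    if not xhtml_q:
        # XHTML is absent (None) or refused (q=0); a bare */* does not count.
        return "text/html"
    if html_q is not None and html_q > xhtml_q:
        return "text/html"  # the browser really does prefer HTML
    return "application/xhtml+xml"


FIREFOX = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
MSIE = ("image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, "
        "application/x-shockwave-flash, */*")

print(pick(FIREFOX))  # application/xhtml+xml
print(pick(MSIE))     # text/html
```

A script like this would have to sit in front of the files and do the negotiation itself, which is a different setup from letting MultiViews handle it.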

* * *

The solution, I guess, will be to wait for HTML 5 to turn this dispute academic. See you in 2022…

|comments| = 7

  1. Haha, o/

    Comment by Matías, 2009-03-19 15:53:21

  2. There’s no text/html.

    But there is also no application/xhtml+xml. So, you can for example send it as application/xhtml+xml if application/xhtml+xml is somewhat ok for the user agent in question.

    From my own experience this seems to work fine.

    Comment by Jos Hirth, 2009-03-19 17:56:41

  3. XML has been buried by JSON. Just use HTML, dude.

    Comment by Elvis Pfutzenreuter, 2009-03-19 18:45:55

  4. @Jos: Unfortunately Internet Explorer dies horribly with XHTML 1.1.

    @epx: HTML 4 is by far less æsthetically pleasing (source code–wise) than XHTML 1.1. Also, HTML has no <ruby>. HTML 5 is beautiful and made of win, but won’t be usable in a long time. JSON has nothing to do whatsoever with this discussion.

    Comment by leoboiko, 2009-03-19 19:30:49

  5. Unfortunately Internet Explorer dies horribly with XHTML 1.1.

    Well, IE doesn’t say it would accept it… so… use text/html there. I used PHP for that and it works pretty well. Check the following article for PHP, Python, and htaccess options:

    http://www.xml.com/pub/a/2003/03/19/dive-into-xml.html

    Comment by Jos Hirth, 2009-03-19 21:12:18

    That’s exactly the problem, Jos: how do I tell Apache to negotiate the XHTML when both MIME types are in the Accept header, but negotiate the HTML when neither is?

    Following a reddit suggestion I’m fiddling with mod_rewrite, let’s see if there’s a not-too-ugly solution. It would be neat if mod_negotiation allowed you to create RewriteCond–like conditions to influence negotiation decisions.

    Comment by leoboiko, 2009-03-20 13:04:18

  7. So yep, one could solve this employing mod_rewrite together with mod_negotiation, only they hate each other and break horribly if you try to use both.

    Comment by leoboiko, 2009-03-23 17:20:49