Ben Cantrick (mackys) wrote,

A decades late rant on XML.

The fact that I've been able to avoid XML until now is - I think - generally a positive reflection on my software engineering instincts. When XML was a big new thing, I took one look at it and said to myself: "Uh, yeah... no. I'll just be over here with my C and my micro-controllers and stuff. Y'all have fun with that." As it turns out I called that correctly. It's nice to get one right.

But I'm also lucky that today, in the here-and-now, we actually know that XML isn't the best way to do a lot of the things that its advocates claim(ed) it should be used for. Today we can point at JSON or CSV or whatever and say: "That works (much) better." If you were a programmer in 2003 and thought XML sucked, you didn't have anything well-known to counter it with, and so you quite possibly got forced to use it.

So forgive the "old news" flavor of this rant. Everyone who was forced to use XML (which is probably nearly everybody by now) has already learned these lessons. I've just been lucky enough to not have to use XML in any serious capacity... until recently.

XML comments are weapons-grade fail.

First, let's face it: XML got comments wrong from the very start. The mere idea of opening a comment with one string ("<!--") and then closing it with a string that isn't its mirror image ("-->") was flat out stupid to begin with. What, exactly, was wrong with <-- comment -->? Would that have been too easy to read? Too easy to type?

But of course XML couldn't stop failing there...

You can't put the string "--" inside a comment.

<!-- This is the first line of a comment.
  -- This is the second line of a comment.
  -->

If you do something like that, get ready to see:
ERROR caused by: org.yackity.smackity.WhackIty.JackIty.HackIty.OhGodHelpMe.ICanSeeForever.YouAreInAMazeOfTwistyLittleExeption
*** String '--' not allowed in comment at [row,col {unknown-source}]: [37,6]

Stop and think for a minute about how incredibly stupid this is. The string "--" is not "<!--", nor is it "-->". Thus, "--" should have no significance whatsoever unless it is immediately attached to a "<!" or a ">". There is quite literally no reason at all to disallow "--" inside a comment.
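If you'd like to see the "--" rule bite without digging up a Java stack trace, here is a minimal sketch using Python's standard-library xml.etree.ElementTree parser. The <root> element and the helper name parses() are made up for illustration; the only claim is that this particular parser rejects a comment whose body contains "--" and accepts the same comment with a single dash.

```python
import xml.etree.ElementTree as ET


def parses(doc):
    """Return True if the standard-library parser accepts doc as well-formed XML."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# A two-line comment whose second line starts with "--" (hypothetical <root> wrapper).
double_dash = "<root><!-- first line\n  -- second line\n  --></root>"

# The same comment with the inner "--" replaced by a single "-".
single_dash = "<root><!-- first line\n  - second line\n  --></root>"

print("double dash accepted:", parses(double_dash))
print("single dash accepted:", parses(single_dash))
```

The exact error text differs from parser to parser, so the sketch only reports accepted or rejected.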

Just to drive the point home, let's show an example of what would happen if we used some other string instead of "--". How about "fish"?
<!fish This is a fish. fish>

That would be invalid XML. Because the same word that came after the exclamation point, was also found again in the comment.

The level of brain death required to accept this state of affairs even temporarily, much less advocate this as a global standard to be used for decades... is just staggering.

But did XML cease its parade of comment failure there? Oh no...

You can't comment out an attribute.

Wanting to comment out an individual attribute is a perfectly normal and reasonable thing to do. It's something you'd clearly anticipate someone wanting to do, among other occasions, during the process of debugging. But XML won't let you do it:
<!--  secondattrib="b"  -->

Or, suppose you're trying to explain why an attribute is set the way it is...
  secondattrib="b"      <!-- Reticulates splines optimally. -->

In both cases, your XML parser will shit all over itself at secondattrib. In XML you simply cannot have comments anywhere in a list of attributes. Period. Why? Because XML comments are made of pure fail.
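Both failures are easy to reproduce. The sketch below again leans on Python's standard-library xml.etree.ElementTree; the element name tag, the attributes firstattrib and thirdattrib, and the helper well_formed() are hypothetical stand-ins (only secondattrib comes from the examples above). It also shows the one place the comment is tolerated: outside the tag entirely.

```python
import xml.etree.ElementTree as ET


def well_formed(doc):
    """Return True if the standard-library parser accepts doc as well-formed XML."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# Attempt 1: comment out one attribute in the middle of the attribute list.
commented_out = '<tag firstattrib="a" <!-- secondattrib="b" --> thirdattrib="c"/>'

# Attempt 2: leave the attribute in place and annotate it with a comment.
annotated = '<tag firstattrib="a" secondattrib="b" <!-- Reticulates splines optimally. --> />'

# What the parser does tolerate: the comment moved completely outside the tag.
outside = '<!-- secondattrib: reticulates splines optimally. --><tag firstattrib="a" secondattrib="b"/>'

for name, doc in [("commented_out", commented_out),
                  ("annotated", annotated),
                  ("outside", outside)]:
    print(name, "accepted:", well_formed(doc))
```

So the practical workaround is to park the note before or after the element, which is exactly as far from the attribute as it sounds.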


XML's <xs:sequence> is used all over the place in XML schema definition files. This allows the schema designer to enforce the exact order of sub-tags within a tag. Which in turn allows the XML parser to reject a perfectly valid XML document for no other reason than that the order of the sub-elements inside some tag was different from what it expected. E.g., this works fine:

But this blows right the hell up:

I honestly don't know why <xs:sequence> was even created. I've been thinking about this for a week and I can neither think of, nor find via googling, any example anywhere that shows a valid need to enforce the order of child tags within a tag. The whole point of having a Data Description Language is that doing so allows the computer to take care of the little bullshit things - like what order some arbitrary list of items is given in. But XML schemas far and wide enforce bullshit ordering with <xs:sequence>. Why? Because screw you, that's why!
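To make the ordering complaint concrete, here is a rough sketch of the difference between an order-sensitive check and an order-insensitive one. To be clear about the assumptions: this is NOT a real XSD validator, just a hand-rolled stand-in using Python's standard library, and the <person> element, its children, and every function name are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical rule: a <person> must contain exactly these child elements.
EXPECTED = ["name", "email", "phone"]


def child_tags(doc):
    """Return the tag names of the root element's children, in document order."""
    return [child.tag for child in ET.fromstring(doc)]


def valid_as_sequence(doc):
    """Order-sensitive check, mimicking the kind of rule <xs:sequence> imposes."""
    return child_tags(doc) == EXPECTED


def valid_ignoring_order(doc):
    """Order-insensitive check: the same children, in any order."""
    return sorted(child_tags(doc)) == sorted(EXPECTED)


in_order = ("<person><name>A</name><email>a@example.com</email>"
            "<phone>555</phone></person>")
shuffled = ("<person><phone>555</phone><name>A</name>"
            "<email>a@example.com</email></person>")

print("in order, sequence check:   ", valid_as_sequence(in_order))
print("shuffled, sequence check:   ", valid_as_sequence(shuffled))
print("shuffled, order ignored:    ", valid_ignoring_order(shuffled))
```

Both documents carry identical data; only the sequence-style check cares which child came first.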

Considering how sophisticated a parser XML requires in order to be parsed correctly, the fact that it can't make a single one of the above simple and obvious things work is very impressive. XML has accomplished something that's rare even in the bug-ridden and incompatible realm of software: it has managed to create the worst of all possible worlds.

So, why does this incredible heap of crap survive? Are we, the software engineers, REALLY THAT STUPID?

Yes. Yes we are.

In 2.5 years it will be 2015. And in 2015, hundreds of millions upon hundreds of millions of lines of code will still depend on - or even be written specifically to support - XML. Because we as software engineers are too damn stupid, too damn lazy, and too damn cowardly to put a bullet in this disease-ridden corpse that never should have won out over plain old SGML in the first place.

tl;dr - Screw XML. Screw it forever. XML is the herpes of the software universe.

And to my fellow software "engineers": if you advocate the use of XML in a new project that could just as easily use JSON, CSV, Windows .INI file format, or any of dozens of other far saner options... you are a bad person and you should feel bad. (I understand, however, if you got stuck with a codebase that is already deeply XML-dependent. My condolences. Welcome to the club.)