XML Is Not HTML, Unfortunately
Posted on Tue, 21st February 2006 at 02:06 under Software, Hmmmm..., Coding, Publicity
A response to the
Yes, but XML is not HTML, fortunately.
I’m an engineer, not a philosopher, meaning I’m irritable, impatient and totally uninterested in moral arguments, which eliminates most of yours. I just don’t care.
Universal Access Includes Across Space And Through Time
I write software and content now that is intended to be usable now and into the distant future, unchanged. So I need sound standards that are simple enough for me and my software to understand, and that don’t undergo some radical facelift every 10 years. I want the certainty of paper and ink. I need to know now that what I publish will still be understood by browsers long after I’m dead. I care that the knowledge everyone using the internet produces is available to coming generations. I don’t care which standard-du-jour certain companies happen to think is profitable, nor what certain people happen to think is pure.
More Agreement Than Disagreement
Some fundamental agreements have been reached, and some fundamental disagreements remain, but as I keep saying, I’m hopeful. It’s still
Agreements
We have agreed on transport. That’s not really relevant here, but it is fundamental nonetheless.
We have agreed on a character set. I confidently expect my
We have agreed on location,
We have agreed on media types,
We have agreed on tagging, an approach to combining content and meta-content in text that is agreeable to both humans and computers.
Disagreements?
I thought we had agreed, but now seem to be
Confusion
Now I don’t know what to do. Although I don’t worry about the character set I’ve adopted, people like you make me worry about my content. You make me worry that any use I make of <style> and <script>, whether under my direct control or not, may render my content unviewable not only by my great, great grandchildren after I’m gone, but by me, very much alive and in the not-so-distant future. You make me think my content is rotting.
Compromise
You lament that I do not use <xml:style> and <xml:script>. I am happy to adopt them, but lack the resources to do a full regression test. Suppose I alter my content to that particular expression of the style and script tags. Can I have your assurance that all current and all future browsers will treat the contained text exactly as they now treat the <style> and <script> forms?
I Was Never Into Crosswords
That’s all I really need to know. I just want to make the right choice for my children, my children’s children and so forth, however the standard emotional argument goes. I don’t care what it is. It’s only a sequence of characters, a word for which I do not know the letters, nor can I figure out what they should be, despite having been given many clues, and it is important to me that I know. I was never into crosswords. Sad, huh?
What To Do Next
I’m not participating in an important mailing list to argue over trivia. I’m desperate to find answers that are extremely important to me. If you are not interested in helping me find those answers, or work them out for myself, please leave me alone.
See Also
W3 Schools,
Lachlan Hunt, XHTML is not for Beginners
Lachlan Hunt said: February 21st, 2006 at 04:14
I’m really quite confused about the point you’re trying to make. In fact, I have no idea what your point is.
The use of script and style elements in (X)HTML will remain fully forwards compatible; no one suggested in any way that this wasn’t the case. Your content will not “rot” because of their use. They will continue to work now and into the foreseeable future.
If and when <xml:script> and <xml:style> are ever introduced (they’re just ideas floating around at the moment), you will not need to alter your existing XHTML documents to use them. The script and style elements in the XHTML namespace will continue to work as is.
Libertus said: February 21st, 2006 at 13:05
Hello Lachlan,
People miss my point regularly, I’m no stranger to it. I don’t always have a point, often I am just confused.
However, you have cleared up my confusion. People were just arguing with me instead of trying to find a way to help me. So, what I am doing is perfectly OK, satisfies my desire for posterity and XML will have no impact on it. That is reassuring. I can now confidently use XHTML, as-is, forever, without fear of my content decaying.
Thank you.
Laurens Holst said: February 21st, 2006 at 17:40
(sorry, my previous reply got botched because it stripped anything between < and >)
[Moderator’s note: I know. Very sorry about that. It catches me too. I’m getting it fixed.]
Hi Paul,
HTML is not XML, and none of this discussion applies to HTML. Your website is served as text/html, so it is HTML and not XHTML, and thus all this isn’t very relevant to it.
You were talking about having <style> and <script> elements in the ‘global namespace’ of XML, a term which does not exist. I replied that you must have meant having those elements in a namespace which applied globally (after all, the discussion was about whether something like that would be useful), and you replied negatively. I then asked whether you meant the empty namespace (xmlns="" or XML files without any default namespace declaration), and you replied positively.
In XHTML, the root <html> element carries an xmlns="http://www.w3.org/1999/xhtml" attribute. It declares the default namespace for that element and all its descendants (unless redeclared). The <style> and <script> elements in XHTML files are therefore neither in a ‘global’ namespace nor in the empty namespace; they are in the XHTML namespace. So if your website were served as application/xhtml+xml or application/xml, you could still use those elements exactly as you use them now.
So it’s a misunderstanding on your side. As for “People were just arguing with me instead of trying to find a way to help me”: the W3C mailing lists are not a helpdesk. Based on your misunderstanding, you suggested some things that were really a bad idea, and that was what people were responding to. So there’s no reason to get upset, except perhaps at yourself.
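Here is a minimal sketch of the namespace point above. It uses Python’s standard xml.etree.ElementTree module as a stand-in for any namespace-aware XML parser; that choice of tool is an assumption, not something from the discussion. ElementTree writes a namespaced name as {namespace-URI}local-name, so it shows directly where a <style> element ends up with and without the default namespace declaration:

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Same markup twice: once with the default namespace declared on <html>, once without.
with_ns = f'<html xmlns="{XHTML_NS}"><head><style>p {{ color: red }}</style></head></html>'
without_ns = "<html><head><style>p { color: red }</style></head></html>"


def local_name(tag: str) -> str:
    """Strip a leading '{namespace-URI}' from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def style_tag(doc: str) -> str:
    """Return the fully qualified tag of the first element whose local name is 'style'."""
    root = ET.fromstring(doc)
    for elem in root.iter():
        if local_name(elem.tag) == "style":
            return elem.tag
    raise ValueError("no style element found")


print(style_tag(with_ns))     # {http://www.w3.org/1999/xhtml}style  (XHTML namespace)
print(style_tag(without_ns))  # style  (no namespace, i.e. the empty namespace)
```

The second result is what the empty namespace means in practice. The first is what every element in an ordinary XHTML page inherits from the single xmlns declaration on its root.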
~Grauw
Libertus said: February 21st, 2006 at 18:36
Welcome Laurens,
XML gets mentioned a lot on the style list. I still don’t quite understand the relevance of XML to style, nor why people mention it at all. So it was interesting, albeit more confusing, to witness such a vigorous defence of the purity of XML on the style mailing list in the face of a simple proposition.
Yes, the misunderstanding is completely on my side. Enlighten me, please. Tell me about the relevance of XML to Style. Without that knowledge, I cannot participate effectively on the mailing list.
Libertus said: February 21st, 2006 at 18:58
Remember to read the last paragraph of the post. I have an ulterior motive.
I’m trying to find people to talk with, at my level, about technical matters that are extremely important to me. I mean things I cannot figure out for myself: authoritative knowledge. The very idea of a helpdesk is so far beneath me, Laurens, that you risk insulting my intelligence by even mentioning it, something I’m sure you have no intention of doing. A helpdesk cannot be expected to authoritatively answer the question “how can I ensure my content will still be readable after I’m dead?”, something the Paper Age people took for granted.
Can you answer that question, authoritatively? Do you even care what happens to your work after you’re dead?
Libertus said: February 21st, 2006 at 19:08
Oh, and I have a hidden agenda too!
I intend to teach as many people as possible how to write content for the web and publish it. That means I need the lingua franca to be as simple as possible. <html> is easy to teach. <html xmlns="some-sequence-of-nonsense">, though correct, is not easy to teach, as “why do we have to do that?” is the obvious question and I cannot answer it.
Easiness of teaching isn’t really the issue though. My concern is teaching people the right way. The right way, much as I hate to admit it, is not a matter for me to decide. Someone who knows has to tell me what it is. What is the right way to make web content, accessible to all, forever?
Laurens Holst said: February 22nd, 2006 at 16:34
(oops, my elements got stripped out again)
(Libertus says: bloody annoying, isn’t it? I’ve reconstructed for you.)
Hey Paul,
Like HTML, XML, too, can be styled.
In fact, that is very useful and has many applications. One example is PrinceXML, which prints XML (including XHTML) using CSS.
When you don’t know about XML, participating in discussions about XML isn’t really a good idea. May I recommend that you read a tutorial about XML and namespaces? For example, this one: http://www.w3schools.com/xml/
I think no one can guarantee that. However, if you’re using web standards such as HTML, XHTML and/or CSS, I’d say there’s a good chance it will be the case.
Then you should not be using XHTML 1.0 but HTML 4.01. XHTML is HTML in XML, and for namespaced XML, the xmlns is necessary. Similarly, one might ask why in XHTML the closing / is necessary in <link />, <br /> and <img /> elements, and the answer is the same.
~Grauw
Libertus said: February 22nd, 2006 at 21:18
I know this. I have given thought as to how I would approach teaching someone XSLT/FO. I gave up. XML needs CSS for styling. Ironic, isn’t it?
Precisely Laurens, which is why I said in my prior comment, to which you responded, Your answer is , which I knew, and with CSS, which I knew. That doesn’t make XML relevant to the W3C Style mailing list, does it? The XHTML application sure, but not XML.
Did you check whether I am an author of that tutorial before pointing me to it? Wouldn’t that be embarrassing? Well don’t worry, I wasn’t.
On the other hand, wouldn’t it be just a little embarrassing if you were pointing out a tutorial on XML and XML Namespaces to the author of a page that already used those techniques, and you hadn’t bothered to check?
Oops!
I think I must accept that no guarantee is possible. I already use and intend to continue using the standards you mention. They feel right, although CSS is a bit of a bugger to teach.
I was talking about teaching, not using. I’m perfectly comfortable with XML, HTML, anyMLyoulike. I grok them all, meaning using one is simply a matter of referring to the appropriate specification. Teaching other people how to use them correctly is in no way related to my comfort level with them, but the simplicity, coherence and logic of the language.
The mandatory XML namespace declaration in XHTML requires a priori knowledge in order to use it. Do I have to teach people about XML, XML Namespaces and the relationship between XML, XHTML and the W3C before I can teach them how to write a web page properly? No, of course not. So I have to lie a little, say that <html> is correct, and defer the DOCTYPE and namespace magic until some later stage.
If I teach HTML first, which does not allow self-closing tags, and does not require all tags to be closed, am I teaching bad practice? Yes. Closing and balancing tags is a key skill for web page writing, and I want to teach it from the beginning.
Do you see my dilemma? XHTML requires vast amounts of a priori knowledge just to get the opening tag right, whereas HTML is obsolete.
Because HTML is an SGML application and XHTML is an XML application? One might ask why XML needed self-closing tags in the first place. Why would it be logical for a tag in a data definition language to have no text, and therefore provide no meaningful definition of anything?
Laurens Holst said: February 23rd, 2006 at 01:41
What? I didn’t mention XSL/FO, and I also don’t see how CSS being used for XML is ironic.
I don’t see how XML is not relevant; you’re not making sense to me. Suppose I think up my own XML markup language for documents (better to use a standard, but oh well). It might have, for example, a subset of XHTML’s elements under different names, plus some more. If CSS discussions were held only about XHTML and not about XML generically, none of that would apply to my own document format, would it?
If you talk about styling XHTML, you’re talking about one application of XML. But styling XHTML is really no different from styling any other XML, so talking about XHTML specifically is narrower than necessary; the topic could be much more generic.
The discussion this all originated from showed it perfectly. Some people wished not only to be able to include external stylesheets in XML (by means of <?xml-stylesheet?>), but also to have the stylesheet inline in the same document. That’s how <xml:style> came up as an idea: a generic container for inline style in any XML document, not tied to a specific application of XML the way <style> is.
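For the external case, the usual mechanism is the xml-stylesheet processing instruction, defined in W3C’s “Associating Style Sheets with XML documents”. It works the same way in any XML vocabulary, not just XHTML. Below is a small sketch using Python’s xml.dom.minidom. The recipe vocabulary and the recipe.css file name are made up purely for illustration:

```python
from xml.dom import minidom

# Hypothetical non-XHTML vocabulary; only the processing instruction matters here.
doc = """<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="recipe.css"?>
<recipe><title>Pancakes</title><step>Mix everything.</step></recipe>"""


def stylesheet_pis(xml_text: str) -> list[str]:
    """Return the data of each xml-stylesheet processing instruction at document level."""
    dom = minidom.parseString(xml_text)
    return [
        node.data
        for node in dom.childNodes
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE
        and node.target == "xml-stylesheet"
    ]


print(stylesheet_pis(doc))  # ['type="text/css" href="recipe.css"']
```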
Sorry, I don’t view-source every page I encounter to get a personality profile of the author. Besides, for all I know you could just have pasted something and filled in your own namespace.
Anyways, if you know about XML and namespaces, I don’t see how you could get confused about it.
Why not? Just say this xmlns attribute should be here, and explain why that is later on. In the end, it’s all the same; it’s not as if I do anything else but copy/paste the basic structure of any webpage I create from one that I created earlier. Saves me the bother of finding/remembering/typing the proper attributes for the !DOCTYPE, html xmlns, meta and link elements.
But one can’t author proper XHTML if one doesn’t know the rules of XML. So imho, if you don’t want to teach them XML, then you shouldn’t be teaching them XHTML, but HTML instead.
I’d say it is better to teach everything correctly from the start. If you don’t, you unconsciously also teach them that making mistakes is OK because ‘it will still work anyway’, whereas with XHTML that is simply not the case! You should have people use the browser’s XML parser by naming the files .xhtml or something. Then, if their documents are not well-formed, they won’t work, and that way they won’t get accustomed to the browser being forgiving.
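The sketch below roughly illustrates the difference in error handling, using Python’s standard library as a stand-in. This is an assumption: html.parser is a simple tokenizer, not a browser’s error-correcting HTML parser, but like a browser it does not refuse the input. The strict XML parser rejects a document with an unclosed <br> and <p>, while the HTML tokenizer carries on:

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

XHTML = 'xmlns="http://www.w3.org/1999/xhtml"'
# Unclosed <br> and <p>: not well-formed XML.
sloppy = f"<html {XHTML}><body><p>Hello<br></body></html>"
# The same content, well-formed: every element closed, <br /> self-closed.
tidy = f"<html {XHTML}><body><p>Hello<br /></p></body></html>"


def is_well_formed(doc: str) -> bool:
    """True if a strict XML parser accepts the document, False if it reports an error."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


class TagCollector(HTMLParser):
    """Records start tags; HTMLParser does not reject unbalanced markup."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


collector = TagCollector()
collector.feed(sloppy)

print(is_well_formed(sloppy))  # False
print(is_well_formed(tidy))    # True
print(collector.tags)          # ['html', 'body', 'p', 'br']
```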
If you don’t like that, as I said before, use (or teach, whatever) HTML, not XHTML.
Well, in HTML you can close most tags (e.g. <p>, <td>, <li>), and ‘self-closing’ isn’t really that important.
Besides, when you’re teaching them to close all tags, why not explain in one go what xmlns="http://www.w3.org/1999/xhtml" means? The string inside the xmlns attribute identifies XHTML, and because of this all elements are in the XHTML ‘space’.
I think you’re exaggerating, you can a. just tell people to do it because it belongs there (it is not different from the !DOCTYPE at all), and b. even if you explain some more of the details, the concept isn’t that difficult to grasp.
It’s not, certainly not if you want to believe the WHATWG, which is very busy developing HTML5. And frankly, if XHTML is not taught properly (including XML and its strict error handling, which are intrinsic parts of it), I’d rather have you just teach HTML.
I doubt your students have any clue about SGML. Even I don’t really grasp the full reach of it, I only really know how HTML works and how browsers parse HTML. Anyways, my point was: if you can explain /, you can also explain xmlns.
Anyways, I’m drifting away from the subject, which is not ‘XHTML must be parsed as XML’ (which it should), but (I suppose) ‘CSS is relevant to XML’.
Or perhaps it turned into what ‘the right way’ is. Well then, let’s start with: HTML will be around virtually forever. There are millions of pages using HTML, and that’s not going to go away anytime soon, nor become inaccessible in tomorrow’s browser. So teaching HTML isn’t wrong. (Do teach HTML Strict however.)
Now as for XHTML, I really think it is cool and all, and I’m sure it’ll stick around for a very long time as well. But it should be used as an XML application, and be taught as such. If XHTML is served as HTML (or worse, has errors which make it invalid XML!) then it is useless. All that’s left is some invalid HTML with weird / attributes that an SGML parser can’t make heads or tails of. Now, fortunately, HTML parsing does not really equal SGML parsing, and is pretty forgiving too. Still, by writing code that is neither valid HTML nor valid XHTML, you’re increasing the risk that it won’t work forever.
So to summarize: the best way to write future-proof code is to write valid, standards-compliant code according to a common standard. HTML, XHTML and CSS are examples of that.
~Grauw
p.s. Right now there are many more applications that process HTML than there are applications that process XHTML (Google, for example). I can’t tell the future, but judging from the current situation, if future compatibility is your only concern, you should go for HTML; it is the safest choice.
Laurens Holst said: February 23rd, 2006 at 01:50
One note with regard to my p.s.: actually, I’m not so sure about that. HTML requires a very complicated, dedicated parser, while XHTML can be read by any generic XML parser. Those are easier to make, and more often available as standalone components and as part of other processing tools. So maybe 100 years from now they’ll prefer XML pages, because those will be easier to process and analyse, just as is the case now.
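To illustrate the “any generic XML parser” point, assume a well-formed XHTML page. Python’s general-purpose xml.etree.ElementTree can then pull out every hyperlink with a namespace-aware path query. ElementTree is used here only as an example of a generic XML parser with nothing HTML-specific, and the page content is made up:

```python
import xml.etree.ElementTree as ET

# Prefix "h" is an arbitrary local alias for the XHTML namespace URI.
NS = {"h": "http://www.w3.org/1999/xhtml"}

page = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example</title></head>
  <body>
    <p>See <a href="http://www.w3.org/">the W3C</a> and
       <a href="#notes">the notes</a>.</p>
  </body>
</html>"""


def links(xhtml: str) -> list[tuple[str, str]]:
    """Return (href, link text) pairs for every XHTML <a> element that has an href."""
    root = ET.fromstring(xhtml)
    return [
        (a.get("href"), "".join(a.itertext()))
        for a in root.iterfind(".//h:a", NS)
        if a.get("href") is not None
    ]


print(links(page))  # [('http://www.w3.org/', 'the W3C'), ('#notes', 'the notes')]
```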
Then again, given the huge amount of HTML pages, they’d have to go through the bother of creating an HTML parser anyway.
~Grauw
Lachlan Hunt said: February 23rd, 2006 at 05:13
No, in fact, you should be teaching HTML to beginners, because XHTML is not for beginners. HTML is not yet obsolete, because XHTML is not yet ready to replace it - well, at least browsers aren’t ready for it.
Libertus said: February 23rd, 2006 at 10:47
Hi Guys,
You’ve both given me a lot to think about, all relevant to my current area of study, which is most helpful, so thank you. I’m busy with other things today, so I likely won’t respond in detail until later.
In the meantime, to illustrate my point, and just for fun, here is a question: “How would you teach your parents and grandparents to write web pages?”
Lachlan Hunt said: February 23rd, 2006 at 12:41
I would start with the basics and teach them HTML. I’d give them a standard template comprising the bare minimum markup required: A DOCTYPE, html, head, title and body elements.
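Here is a sketch of such a starter template, assuming HTML 4.01 Strict; the DOCTYPE below is the standard one for that version. The make_page helper and the Outline checker are hypothetical names for illustration. The checker uses Python’s html.parser only to confirm that the skeleton contains the elements listed above:

```python
from html.parser import HTMLParser

# Bare-minimum HTML 4.01 Strict skeleton: DOCTYPE, html, head, title, body.
TEMPLATE = """<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
   "http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>{title}</title>
</head>
<body>
{body}
</body>
</html>
"""


def make_page(title: str, body: str) -> str:
    """Fill in the template; escaping title/body text is left to the caller."""
    return TEMPLATE.format(title=title, body=body)


class Outline(HTMLParser):
    """Collects the DOCTYPE declaration and the start tags, to check the skeleton."""

    def __init__(self) -> None:
        super().__init__()
        self.decl = None
        self.tags: list[str] = []

    def handle_decl(self, decl):
        self.decl = decl

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


page = make_page("My first page", "<p>Hello, world.</p>")
outline = Outline()
outline.feed(page)
print(outline.tags)  # ['html', 'head', 'title', 'body', 'p']
```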
I’d then teach them about basic structures like headings, paragraphs and lists, and how to add emphasis and strong emphasis to their words. That will give them a basic understanding, sufficient for them to write up a basic article. Perhaps have them mark up a short article from a newspaper or magazine.
Once they’ve marked up a couple of sample documents and have a feel for the language, I’d teach them about hyperlinks and have them link their sample documents together. Then I’d show them how to link to specific parts of a page using fragment identifiers and ID attributes.
Tables would probably come next, starting with just simple table, tr and td elements, and possibly th elements. Other elements like thead, tfoot, tbody and col, and attributes like rowspan and colspan, would be left till later.
Throughout the whole process, I’d have them use constant validation to continually verify that they aren’t making any mistakes. Given that they’ll be relatively simple documents, the errors will be equally trivial to fix. That will help in two ways: a) give them more confidence in fixing mistakes, and b) help them recognise and avoid common mistakes very early on.
Libertus said: February 27th, 2006 at 10:36
Lachlan,
Apart from the very first steps, your strategy for teaching markup seems sound. The learning curve of those first steps is very steep, of course, so I don’t believe there is a “right” way. Most people naturally understand both “text” and “style”, but have difficulty with the technical magic that separates them.
Mastering basic tagging is a huge intellectual leap. It is so much so that, as I said before, I am inclined to leave the non-visible syntactic elements such as document types and headers for later introduction. It depends on the student, of course. I focus more on teaching people who can be expected to know nothing, such as 3rd Agers.
Marking up existing text is an excellent idea, especially for teaching intermediate concepts such as layout. Each step in the process can be demonstrated and there is no shortage of example material from which to draw.
It is teaching advanced techniques, especially hyperlinks, that causes me most difficulty. Links are a novel concept arising from the medium. So, whilst there is much technical guidance available, there is little conventional wisdom regarding their use that can be taught, as we are all still learning. Links are important because they form the “mind” out of the web “brain”. They can be structural, navigational and semantic, so their correct use is both a technical and an artistic discipline. It demands a selfless view of the world that doesn’t come easily.
Regarding the whole process, I like the idea of a training wiki containing documents in many states of both disrepair and perfection, with a set of improvement tasks or challenges for the students relating to certain aspects of markup, from bare text which needs proof-reading and styling to old documents full of broken links that need retargeting. Wikis are an excellent teaching tool because they encourage collaboration, reward team effort and are forgiving of mistakes.
What challenges would you set your students in order to prove their readiness to write for the web, both to yourself and them? What skills would you want them to be able to demonstrate? Are there any mistakes that you think should attract an immediate FAIL?