MOSS 2007: Guide to making SharePoint XHTML Compliant

This is intended as a pretty high-level overview of how you can get MOSS to validate against the W3C XHTML 1.0 recommendation. The aim of this article is not to explain every intricate detail of making a MOSS site XHTML compliant. However, it should demonstrate some techniques to get people started and, eventually, to develop better methods of achieving compliance.
Firstly, it should be noted that this really only applies to public-facing publishing sites. In other words, we're talking about using just the WCM features of MOSS. You are not going to get a fully featured intranet to conform in the current releases of SharePoint, because SharePoint generates a lot of markup that does not validate against the W3C standards. This is largely due to the richness of many SharePoint features, such as web parts.
Some of the techniques mentioned in this article apply equally to standard ASP.NET sites, so if you're totally unfamiliar with XHTML validation it may be worth reading the Microsoft technical article Building ASP.NET 2.0 Web Sites Using Web Standards. It is also likely that some of these techniques are unsupported by Microsoft, so use them at your own risk.

Web.Config

The first step is to add the conformance configuration setting to your web.config file. This forces certain ASP.NET controls to output only attributes that comply with the XHTML standards.
<xhtmlConformance mode="Strict" />
This is only really important if you are trying to conform to the strict standard. If the setting is omitted, ASP.NET controls default to XHTML Transitional output. I prefer to add the tag anyway to make it explicit. Unfortunately, many SharePoint controls will still output non-compliant tags.
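In web.config the element belongs inside the <system.web> section. As a minimal sketch, a hypothetical deployment check (ConformanceCheck is an illustrative name, not part of ASP.NET or SharePoint) could parse the file and report the declared mode, falling back to Transitional, the ASP.NET default, when the element is missing:

```csharp
using System.Xml.Linq;

// Hypothetical deployment check, not part of ASP.NET or SharePoint.
public static class ConformanceCheck
{
    // Reads configuration/system.web/xhtmlConformance/@mode from web.config XML.
    // Falls back to "Transitional", the ASP.NET default, when the element
    // or its mode attribute is missing.
    public static string GetXhtmlConformanceMode(string webConfigXml)
    {
        XDocument doc = XDocument.Parse(webConfigXml);
        XElement conformance = doc.Root?.Element("system.web")?.Element("xhtmlConformance");
        return (string)conformance?.Attribute("mode") ?? "Transitional";
    }
}
```

For example, passing `<configuration><system.web><xhtmlConformance mode="Strict" /></system.web></configuration>` returns "Strict", while a file whose system.web section lacks the element returns "Transitional".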

Master Page Basics

The master page is where you are going to do the majority of the validation work. It really pays to start fresh with the Microsoft minimal master page; you are going to cause yourself a serious amount of pain if you try to tweak one of the out-of-the-box master pages. It is also best to start off with a page based on a blank page layout, which helps identify where any validation errors are actually coming from. The first thing to do with the master page is set the doctype to match the xhtmlConformance setting. For example, if I'm using the strict standard my doctype will be:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
You should also add the XHTML namespace (which the XHTML 1.0 recommendation requires on the root element) and the language attributes to the html tag:
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">

At this stage it is worth performing a W3C validation check to get a feel for the types of errors that need to be fixed.
My initial check yielded a horrifying 232 validation errors, with no content or navigation. Did I say this was going to be easy?
If you scroll through the errors you should notice that most of them relate to invalid attributes. The majority come from only a few controls, namely the site actions menu and the publishing console (referred to as the Authoring Controls from here on).

Authoring Controls

These controls aren't as big a problem as they first appear. We only really care about W3C compliance for anonymous (public) users, and because the authoring console is security trimmed, its HTML is not rendered at all when a normal public user accesses the site. You can prove this by enabling anonymous access, logging out and then viewing the page anonymously. Make sure you have checked in and published the master page and page layout, or you won't see the changes. The page will display only the welcome control with its 'Sign In' message. A quick validation check should show only a dozen or so validation errors.
About half of these errors come from the site actions control. Although it has been removed from the page visually, it still renders some HTML. There is an easy solution to this problem: the SPSecurityTrimmedControl. This control renders its contents only when the current user holds a specified set of permissions. By wrapping the site actions control inside the security trimming control and setting a required permission, you prevent the site actions control from rendering any HTML at all for anonymous users.
<SharePoint:SPSecurityTrimmedControl PermissionsString="AddAndCustomizePages" runat="server">
    <PublishingSiteAction:SiteActionMenu runat="server"/>
</SharePoint:SPSecurityTrimmedControl>
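The important point is that trimmed markup is never written to the response, rather than merely being hidden. The following C# is a simplified, hypothetical model of that behaviour, not SharePoint's implementation; the SitePermissions flags and the TrimmedRegion class are made up for illustration:

```csharp
using System;
using System.IO;

// Hypothetical permission flags for illustration only; the real values live in
// SharePoint's SPBasePermissions enumeration.
[Flags]
public enum SitePermissions
{
    None = 0,
    ViewPages = 1,
    AddAndCustomizePages = 2
}

// Simplified model of security trimming, not SharePoint's implementation.
public static class TrimmedRegion
{
    // Calls renderChildren only if the user holds every required permission.
    // Otherwise nothing is written, so the trimmed markup never reaches the page.
    public static void Render(TextWriter output, SitePermissions userPermissions,
                              SitePermissions required, Action<TextWriter> renderChildren)
    {
        if ((userPermissions & required) == required)
        {
            renderChildren(output);
        }
    }
}
```

An anonymous visitor holding only ViewPages gets an empty output for the wrapped site actions menu, whereas an author with AddAndCustomizePages gets the menu markup.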

Remove/clean non-compliant HTML

On closer examination of the remaining validation errors, it becomes obvious that most of them are caused by poorly declared script tags. Unfortunately, these script tags are usually generated by the HtmlForm control, so it's not easy to override the output. One technique that can be applied is to override the master page's Render method and do a bit of tag cleaning. This effectively lets us hijack the HTML rendering process and add, modify or remove parts of the output. It does involve writing a bit of inline code in the master page. Assuming that you have set your master page to allow inline code, we can add code similar to the following:
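<%@ Import Namespace="System.Text.RegularExpressions" %>
<script runat="server" language="C#">
protected override void Render(HtmlTextWriter writer)
{
    // render the master page (including the form and its script tags) into a buffer
    string html;
    using (System.IO.StringWriter buffer = new System.IO.StringWriter())
    using (HtmlTextWriter bufferWriter = new HtmlTextWriter(buffer))
    {
        base.Render(bufferWriter);
        bufferWriter.Flush();
        html = buffer.ToString();
    }

    // find all opening script tags; \b skips names such as <scripts>
    Regex scriptRegex = new Regex(@"<script\b[^>]*", RegexOptions.IgnoreCase);
    MatchCollection scriptMatches = scriptRegex.Matches(html);

    // go through matches in reverse so the indexes of earlier matches
    // are not shifted by the text inserted after them
    for (int i = scriptMatches.Count - 1; i >= 0; i--)
    {
        Match match = scriptMatches[i];

        // identify script tags with no type attribute
        // (a plain IndexOf("type") would also hit values such as "prototype.js")
        if (!Regex.IsMatch(match.Value, @"\stype\s*=", RegexOptions.IgnoreCase))
        {
            // add the type attribute straight after "<script" (7 characters)
            html = html.Insert(match.Index + 7, " type=\"text/javascript\"");
        }
    }

    // write the 'clean' html to the page
    writer.Write(html);
}
</script>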
This code block uses regex matching to find all script tags and add a type attribute to any tag that doesn't already have one. It is only a partial solution to the script tag problem; more code will be needed to clean the tags completely (for example, the language attribute is not allowed on script elements in XHTML 1.0 Strict).
Note that this code is crude and untested, but it should give you an idea of what can be done. It would be prudent to keep this kind of code to a bare minimum, because buffering and scanning the whole page on every request affects performance. In other words, you should only use this method when you have no other option.
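The same cleaning logic can be pulled out of the master page into a plain C# helper so that it can be tested outside SharePoint. This is a sketch, not a drop-in replacement: ScriptTagCleaner is a hypothetical name, and Regex.Replace with a match evaluator is swapped in for the reverse loop, which removes the need to track shifting indexes:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper holding the script tag cleaning logic.
public static class ScriptTagCleaner
{
    // Matches each complete opening <script ...> tag, case-insensitively.
    private static readonly Regex ScriptOpen =
        new Regex(@"<script\b[^>]*>", RegexOptions.IgnoreCase);

    // Detects an existing type attribute inside a matched tag.
    private static readonly Regex TypeAttribute =
        new Regex(@"\stype\s*=", RegexOptions.IgnoreCase);

    // Inserts type="text/javascript" into script tags that lack a type attribute.
    public static string AddMissingScriptTypes(string html)
    {
        return ScriptOpen.Replace(html, match =>
        {
            if (TypeAttribute.IsMatch(match.Value))
            {
                return match.Value; // already declares a type
            }
            // "<script" is 7 characters long: insert right after the tag name.
            return match.Value.Insert(7, " type=\"text/javascript\"");
        });
    }
}
```

Inside the Render override, the loop could then be replaced with a single call such as `html = ScriptTagCleaner.AddMissingScriptTypes(html);`.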

Meta Tags

You could extend the render code sample to remove all of your non-compliant code if you want, but I would suggest that this is not the best way of doing things. Another technique is to replace standard SharePoint controls with your own custom-built controls. A simple example of where this can be done is the RobotsMetaTag control. Using Lutz Roeder's Reflector, I was able to extract the following (simplified) code for it:

public class RobotsMetaTag : SPControl
{
    protected override void Render(HtmlTextWriter output)
    {
        if (!SPControl.GetContextWeb(this.Context).ASPXPageIndexed)
        {
            output.Write("<META NAME=\"ROBOTS\" CONTENT=\"NOHTMLINDEX\"/>");
        }
    }
}

We can see that the META element and its NAME and CONTENT attributes all violate the XHTML rule that element and attribute names must be lower case. The offending line can be rewritten in a custom control as:
output.Write("<meta name=\"ROBOTS\" content=\"NOHTMLINDEX\"/>");
Now deploy the custom control to your SharePoint environment and add it to the master page in place of the RobotsMetaTag. Two more validation errors out of the way. I recommend using this approach wherever possible.
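One way to structure the replacement control is to keep the markup decision in a small helper with no SharePoint dependencies, so it can be unit tested, and have the control's Render method call it. RobotsMetaMarkup is a hypothetical name; the wiring shown in the comment assumes an SPControl subclass like the reflected one above and needs the Microsoft.SharePoint assembly:

```csharp
// Rendering logic for a hypothetical compliant replacement of RobotsMetaTag.
//
// Hypothetical wiring inside an SPControl subclass:
//   protected override void Render(HtmlTextWriter output)
//   {
//       output.Write(RobotsMetaMarkup.Render(
//           SPControl.GetContextWeb(this.Context).ASPXPageIndexed));
//   }
public static class RobotsMetaMarkup
{
    // Returns the lower-case meta tag when the page must not be indexed,
    // or an empty string when indexing is allowed.
    public static string Render(bool aspxPageIndexed)
    {
        return aspxPageIndexed
            ? string.Empty
            : "<meta name=\"ROBOTS\" content=\"NOHTMLINDEX\"/>";
    }
}
```

Only the element and attribute names need to be lower case; the attribute values ROBOTS and NOHTMLINDEX are left as SharePoint writes them.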

Content

Between the security trimming control, the render override and custom controls, it is now possible to fix all the validation errors and eventually get a conforming web page. Great, we have just made a virtually blank web page comply! The next step is to add all the content and navigation back in. To achieve total compliance, it will be necessary to create a custom control for many of the components on each page, so the trick is to make these controls as reusable as possible.
The majority of the content in your site will come from page layouts, and thankfully most of the field controls output XHTML-compliant code out of the box. When you come across a control that doesn't comply, you will again need to make your own.
When it comes to navigation you have a couple of options: either create custom navigation controls from scratch, or extend the SharePoint AspMenu control, whose source code has now been released.
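Whichever route you take, the goal is for the control to write plain, lower-case list markup instead of layout tables. A minimal sketch of the "from scratch" route follows; NavNode and NavRenderer are hypothetical classes, and a real control would read SharePoint's site map data rather than NavNode objects:

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text;

// Hypothetical navigation node used in place of SharePoint's site map data.
public sealed class NavNode
{
    public NavNode(string title, string url)
    {
        Title = title;
        Url = url;
    }

    public string Title { get; }
    public string Url { get; }
    public List<NavNode> Children { get; } = new List<NavNode>();
}

public static class NavRenderer
{
    // Renders nodes as nested <ul>/<li> lists with lower-case tags,
    // HTML-encoded text and no layout tables.
    public static string Render(IList<NavNode> nodes)
    {
        var html = new StringBuilder();
        AppendList(html, nodes);
        return html.ToString();
    }

    private static void AppendList(StringBuilder html, IList<NavNode> nodes)
    {
        // XHTML requires at least one <li> in a <ul>, so skip empty lists.
        if (nodes.Count == 0)
        {
            return;
        }

        html.Append("<ul>");
        foreach (NavNode node in nodes)
        {
            html.Append("<li><a href=\"")
                .Append(WebUtility.HtmlEncode(node.Url))
                .Append("\">")
                .Append(WebUtility.HtmlEncode(node.Title))
                .Append("</a>");
            AppendList(html, node.Children); // child list nests inside its parent <li>
            html.Append("</li>");
        }
        html.Append("</ul>");
    }
}
```

Nesting each child list inside its parent's li element keeps the output valid, and styling the lists with CSS replaces the presentational attributes a table-based menu would need.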

Web parts

At first I thought web parts were going to be no trouble when it came to W3C validation. Most of the web parts I use have an XSL source editor, so all that is involved is making sure that the XML is transformed into valid HTML, right? Wrong. Unfortunately, web parts such as the Data View Web Part wrap their output in layout tables. This is intrinsically bad practice for web development and accessibility, but worse, the tables use all kinds of non-standard attributes. The only solution I have found is to override the Render method as explained earlier. I recommend keeping web part use to a minimum. Web part zones should definitely not be used, as they introduce a large amount of layout tables and non-compliant HTML.
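If you do fall back on the Render override for web part tables, one possible cleaning step is to strip a list of offending attribute names from every opening tag. This is a sketch only: AttributeStripper is a hypothetical helper, it uses regular expressions rather than a real HTML parser, and the attribute names in the example (TOPLEVEL, nowrap) are illustrations, so check the validator output for the attributes your own pages actually emit:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: removes the given attribute names from every opening tag.
public static class AttributeStripper
{
    // An opening tag: name, a run of attributes (quoted values may contain '>'),
    // then an optional self-closing "/".
    private static readonly Regex Tag = new Regex(
        @"<([a-zA-Z][a-zA-Z0-9:]*)((?:\s+[^\s=/>]+(?:\s*=\s*(?:""[^""]*""|'[^']*'|[^\s""'>]+))?)*)(\s*/?)>");

    // A single attribute with its leading whitespace and optional value.
    private static readonly Regex Attribute = new Regex(
        @"\s+([^\s=/>]+)(?:\s*=\s*(?:""[^""]*""|'[^']*'|[^\s""'>]+))?");

    public static string Strip(string html, IEnumerable<string> attributeNames)
    {
        var banned = new HashSet<string>(attributeNames, StringComparer.OrdinalIgnoreCase);
        return Tag.Replace(html, tag =>
        {
            // Rebuild the attribute run, dropping banned names (case-insensitive).
            string attributes = Attribute.Replace(tag.Groups[2].Value,
                attr => banned.Contains(attr.Groups[1].Value) ? string.Empty : attr.Value);
            return "<" + tag.Groups[1].Value + attributes + tag.Groups[3].Value + ">";
        });
    }
}
```

For example, stripping "toplevel" and "nowrap" turns `<table TOPLEVEL border="0"><tr><td nowrap="nowrap" class="ms-vb">` into `<table border="0"><tr><td class="ms-vb">`. Tags without banned attributes are rebuilt unchanged, and in the Render override this would run on the buffered html string before it is written out.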

Summary

Hopefully this gives you a starting point for getting a SharePoint WCM site XHTML compliant. It is definitely not a simple task; however, if you do a lot of SharePoint development you are bound to come across a project that requires it (e.g. government sector work). I believe that Microsoft is well aware of these compliance issues and will hopefully address them in the next release, if not in a service pack. I plan to maintain this document over time as I discover better techniques for dealing with the various validation issues that SharePoint presents. Good luck in getting your site to validate; you can look forward to the W3C validator's success message as a reward for your hard work.

Thursday, December 10, 2009