Inlining a CSS stylesheet with C#

This week I worked on adding confirmation emails to our quiz system and was introduced to the maddeningly fickle world of HTML emails. HTML emails are altogether a different beast from HTML for the web, with most of your favorite features missing, and each email client supporting a different subset of features. Discovering this felt like believing the dark ages were well behind us, only to learn that the bubonic plague has just broken out in town again.

I had hoped that the detailed confirmation emails could simply re-use the HTML that our system already generated, but after a few test emails it became apparent that most of that HTML relied heavily on external stylesheets which no email client would render. With mounting dread I foresaw the likelihood that I would have to write another rendering engine for emails rather than re-using the one we already had. This duplication of work was an affront to both the DRY philosophy and the inherent laziness of any good developer (most developers seem to be willing to do massive amounts of work now to save themselves the trouble of having to do any extra or repetitive work in the future).

Willing to do almost anything to avoid this code duplication, I wondered if I could take the HTML rendered by the existing quiz engine and just tweak it to get it to work in our emails. So how hard would it be to write some code that took these external stylesheets and applied them as inline rules (which are supported to some degree by most email clients)? Not hard at all, as it turns out.

The plan

  1. Parse the CSS document, extracting all of the rules and properties
  2. Loop through the CSS rules, find all matching HTML elements, and apply the properties as inline styles
  3. Profit

Step 1 - Parsing the CSS

A regular expression was all it took to create a rudimentary parser for our CSS file. It may not work with all CSS files, but it worked on all the CSS files I needed for the project, which was good enough for now.

MatchCollection allCssRules = Regex.Matches(stylesheet, "([^{]*){([^}]*)}", RegexOptions.Singleline);
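To see what the two capture groups hold, here is a small standalone sketch built around the same pattern (with the braces escaped for readability). The class name CssRuleParser and its Parse method are made up for this illustration; only Regex comes from the framework.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original code.
public static class CssRuleParser
{
	// Group 1: everything before "{" (the selector list).
	// Group 2: everything between "{" and "}" (the declarations).
	private static readonly Regex RulePattern =
		new Regex(@"([^{]*)\{([^}]*)\}", RegexOptions.Singleline);

	// Returns (selector list, declarations) pairs in stylesheet order.
	public static List<KeyValuePair<string, string>> Parse(string stylesheet)
	{
		var rules = new List<KeyValuePair<string, string>>();
		foreach (Match rule in RulePattern.Matches(stylesheet))
		{
			string selectors = rule.Groups[1].Value.Trim();
			string declarations = rule.Groups[2].Value.Trim();
			rules.Add(new KeyValuePair<string, string>(selectors, declarations));
		}
		return rules;
	}
}
```

Feeding it "h1, h2 { color: red; }" followed by ".note { margin: 0 }" gives two entries: "h1, h2" paired with "color: red;", and ".note" paired with "margin: 0". Nested blocks such as @media rules would confuse a pattern this simple, which is part of why it only counts as rudimentary.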

Step 2 - Inlining the CSS

Now the hard part: Finding all of the HTML elements that match the CSS rules. If, in the back of your mind, there is a voice that says “let’s use regex again”, that little voice is the voice of Beelzebub himself and needs to be excised.

My favorite HTML manipulation library is HtmlAgilityPack. The last release is from May 2010, but even without active development it is a rich and easy-to-use library that will likely be my go-to library for HTML manipulation until something revolutionary replaces it. HtmlAgilityPack supports XPATH selectors, but doesn’t natively support finding elements via CSS selectors.

I discovered a few libraries that offer this functionality, but after taking them out for a test drive they seemed like overkill for this problem. While searching for these libraries, however, I came across a 2005 post by John Resig pointing out that CSS and XPATH selectors are actually fairly similar. So how hard would it be to convert our CSS selectors to XPATH selectors?

There’s a library for that.

css2xpath is a C# port by MostThingsWeb of a JavaScript function written by Andrea Giammarchi that accomplishes this exact feat. Using it is as trivial as:

string xpath = css2xpath.Transform(cssSelector);
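For a feel of what such a translation looks like, here is a hand-rolled sketch that covers only the simplest selector shapes (tag, #id, .class, tag#id, tag.class). It is not how css2xpath works internally, and the XPATH it emits may differ from what css2xpath produces; the class name SimpleSelectorToXPath is invented for this example.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical illustration only; use css2xpath for real selectors.
public static class SimpleSelectorToXPath
{
	// Optional tag name, optionally followed by one "#id" or ".class".
	private static readonly Regex Simple =
		new Regex(@"^([A-Za-z][A-Za-z0-9]*)?(?:([#.])([A-Za-z_][A-Za-z0-9_-]*))?$");

	public static string Convert(string selector)
	{
		Match m = Simple.Match(selector.Trim());
		if (!m.Success || m.Length == 0)
		{
			throw new NotSupportedException("Selector too complex for this sketch: " + selector);
		}

		// No tag name means "any element".
		string tag = m.Groups[1].Success ? m.Groups[1].Value : "*";
		string xpath = "//" + tag;

		if (m.Groups[2].Success)
		{
			string name = m.Groups[3].Value;
			if (m.Groups[2].Value == "#")
			{
				xpath += "[@id='" + name + "']";
			}
			else
			{
				// Pad with spaces so "note" does not match "footnote".
				xpath += "[contains(concat(' ', normalize-space(@class), ' '), ' " + name + " ')]";
			}
		}
		return xpath;
	}
}
```

With this sketch, "div" becomes "//div", "#main" becomes "//*[@id='main']", and "p.note" becomes "//p[contains(concat(' ', normalize-space(@class), ' '), ' note ')]". Anything fancier, such as "a:hover" or "ul > li", throws NotSupportedException, which is exactly the territory a full converter has to cover.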

Bingo. So now we can use that XPATH selector with HtmlAgilityPack to find the HTML tags we need.

string xpath = css2xpath.Transform(cssSelector);
HtmlNodeCollection matchingNodes = rootNode.SelectNodes(xpath);
if (matchingNodes != null) {
	foreach (HtmlNode node in matchingNodes) {
		// detect if a style attribute already exists and create/append as necessary
		if (node.Attributes["style"] != null) {
			node.Attributes["style"].Value += ";" + cssProperties;
		} else {
			node.Attributes.Add("style", cssProperties);
		}
	}
}
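One wrinkle in the snippet above: joining with ";" produces a doubled ";;" whenever the existing style value already ends in a semicolon. A small helper along these lines would tidy that up. The name InlineStyle.Append is hypothetical and not part of HtmlAgilityPack; it is plain string handling.

```csharp
using System;

// Hypothetical helper for joining two style attribute values.
public static class InlineStyle
{
	// Joins two declaration lists with exactly one "; " between them.
	public static string Append(string existing, string added)
	{
		string left = Normalize(existing);
		string right = Normalize(added);
		if (left.Length == 0) return right;
		if (right.Length == 0) return left;
		return left + "; " + right;
	}

	// Trims whitespace and any trailing semicolons; null becomes empty.
	private static string Normalize(string declarations)
	{
		return (declarations ?? string.Empty).Trim().TrimEnd(';', ' ');
	}
}
```

For example, InlineStyle.Append("color: red;", "margin: 0;") returns "color: red; margin: 0". Note that appended declarations come last, so where both sides set the same property the stylesheet value wins over the element's original inline value.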

And putting it all together.

public static string ApplyStylesheetAsInline(string html, string stylesheet)
{
	HtmlDocument htmlDoc = new HtmlDocument();
	htmlDoc.LoadHtml(html);
	HtmlNode rootNode = htmlDoc.DocumentNode;
	
	// clean up the stylesheet
	stylesheet = Regex.Replace(stylesheet, @"[\r\n]+", " "); // replace newlines with spaces so adjacent tokens don't fuse
	stylesheet = Regex.Replace(stylesheet, @"\s*(?<!\"")/\*.*?\*/(?!\"")\s*", " "); // remove comments (unless wrapped in quotes), including ones containing '*'

	// remove excess space
	while (stylesheet.Contains("  ")) {
		stylesheet = stylesheet.Replace("  ", " ");
	}

	// extract and inline all css rules
	MatchCollection allCssRules = Regex.Matches(stylesheet, "([^{]*){([^}]*)}", RegexOptions.Singleline);
	foreach (Match cssRule in allCssRules) {

		string cssProperties = cssRule.Groups[2].Value.Trim();
		string[] cssSelectors = cssRule.Groups[1].Value.Split(',');

		foreach (string selector in cssSelectors) {

			string xpath = css2xpath.Transform(selector.Trim());
			HtmlNodeCollection matchingNodes = rootNode.SelectNodes(xpath);
			if (matchingNodes != null) {
				foreach (HtmlNode node in matchingNodes) {
					// detect if a style attribute already exists and create/append as necessary
					if (node.Attributes["style"] != null) {
						node.Attributes["style"].Value += ";" + cssProperties;
					} else {
						node.Attributes.Add("style", cssProperties);
					}
				}
			}
		}
	}

	return htmlDoc.DocumentNode.OuterHtml;
}

We’re in business! There are some CSS selectors that don’t translate well into XPATH selectors, but a little trial-and-error allowed me to find and tweak these rules so that they worked.
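One class of selectors can never be inlined at all: pseudo-classes like :hover and pseudo-elements like :before describe states or generated content that a style attribute has no way to express. A pre-filter along the following lines could skip them before they reach the XPATH conversion. This is a sketch under that assumption, not a record of the specific rules tweaked here, and SelectorFilter is an invented name.

```csharp
using System;

// Hypothetical pre-filter; the list below is illustrative, not exhaustive.
public static class SelectorFilter
{
	// ":before" and ":after" also catch the "::before" / "::after" spellings.
	private static readonly string[] NotInlinable =
		{ ":hover", ":active", ":focus", ":visited", ":before", ":after" };

	public static bool CanInline(string selector)
	{
		string s = selector.Trim();

		// Empty fragments (e.g. from a trailing comma) and at-rules are skipped.
		if (s.Length == 0 || s[0] == '@') return false;

		foreach (string pseudo in NotInlinable)
		{
			if (s.IndexOf(pseudo, StringComparison.OrdinalIgnoreCase) >= 0) return false;
		}
		return true;
	}
}
```

In the loop above, this would sit just before the call to css2xpath.Transform, with a continue for any selector where CanInline returns false.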

This isn’t a perfect solution, but it’s a beautifully simple one that solves the problem at hand.

Epilogue

After the inline CSS resolved 90% of the HTML formatting problems I still had to tweak the HTML and CSS a little to get everything working smoothly. The two biggest changes I made were:

Switching to tables for some of my layout. Float and display are two CSS properties which don’t play nicely in HTML emails and will cause your beautifully crafted page to melt in the email client. Swallowing some pride as a developer, I switched to tables for some of the essential content.

Using images instead of background images. Some of the icons used throughout the page were initially background images, which also don’t work in email clients. The rendered HTML had been designed so that even if the images didn’t render it would still look okay and be understandable, but after a number of tests I realized that the icons (like a green checkmark on the correct answer) added a lot to the page, so I switched these to actual images on the page (which will also help when people print out the page). The email will still render correctly if people choose not to allow the images, but if they want to print out their results or anything else, the images really make the page a lot prettier.
