Regular Expressions Aren’t the Devil

I love regular expressions. Okay, I love the challenge of crafting regular expressions. I do not enjoy reading regular expressions that I have not created or, really, even the ones I do create. But give me a problem and tell me to make a regular expression to match things and I am all over it.

A co-worker wanted a regular expression to turn unlinked URLs in text into HTML links and to fix linked URLs whose href lacked a protocol. In other words, if “www.google.com” appeared in some text, it needed to be replaced with <a href="http://www.google.com">www.google.com</a>, and <a href="www.google.com">some link text</a> needed to turn into <a href="http://www.google.com">some link text</a>.

My first pass was a monster regular expression that handled both situations, but I couldn’t get the replacement string to account for the link text that already existed in the invalid URL example. I also couldn’t adequately cover the situation where other attributes came before the href attribute. So scrap that one.

This is what I came up with after separating it into two replacement passes. I share it with you both as a testament to my regular expression abilities (good or bad, you decide) and because this situation seems like one that might come up pretty frequently.

Regular expression	Replacement string
`(?<=\s|^)(?<domain>www\.[^\s]+)(?=\s)|(?<=\s)(?<protocol>http[s]?://){1}(?<domain>(www)?\.?[^\s]+)(?=\s)`	`<a href="http://${domain}">${domain}</a>`
`href="(?<domain>www\.[^"]+)"`	`href="http://${domain}"`
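To make the two passes concrete, here is a minimal sketch of them in Python. It is an adaptation rather than a copy of the patterns above: Python's `re` module rejects the variable-width lookbehind `(?<=\s|^)` and does not allow two groups named `domain`, so the first pass uses `(?<!\S)` and a small replacement function instead. The names `fix_links`, `BARE_URL`, `HREF_WITHOUT_PROTOCOL`, and `_to_link` are made up for illustration.

```python
import re

# Pass 1 (adapted): a bare URL standing between whitespace.
# (?<!\S) means "preceded by whitespace or the start of the text".
# Unlike the original, it also lets a protocol-prefixed URL match at the
# very start of the text.
BARE_URL = re.compile(
    r'(?<!\S)'
    r'(?:https?://(?P<with_protocol>\S+)|(?P<bare>www\.\S+))'
    r'(?=\s)'  # must be followed by whitespace, as in the original
)

# Pass 2: an existing href that starts with www. and lacks a protocol.
HREF_WITHOUT_PROTOCOL = re.compile(r'href="(?P<domain>www\.[^"]+)"')


def _to_link(match):
    # Only one of the two named groups takes part in any given match.
    domain = match.group('with_protocol') or match.group('bare')
    # Mirrors the original replacement string, which always emits http://
    return f'<a href="http://{domain}">{domain}</a>'


def fix_links(text):
    """Linkify bare URLs, then add a protocol to href values that lack one."""
    text = BARE_URL.sub(_to_link, text)
    return HREF_WITHOUT_PROTOCOL.sub(r'href="http://\g<domain>"', text)


sample = 'Visit www.google.com today, or <a href="www.google.com">some link text</a> works too.'
print(fix_links(sample))
```

Because both lookarounds insist on whitespace, a URL at the very end of the text is left alone, and any punctuation attached to a URL ends up inside the link. The sketch inherits both behaviors from the patterns as written.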

This entry was posted on March 15, 2007 at 9:02 pm and is filed under .Net Development, Programming.
