Matthew O'Riordan

This is where I record my rants, comments, quotes and thoughts on things. I welcome your input, so please fire away.

Find out more about me at: http://mattheworiordan.com

November 22, 2011 at 10:33pm

URL regular expression for links with or without the protocol

I’ve just come across a pretty common requirement: convert any text that looks like a link into an actual link within some HTML text. Strangely, after searching for a good 15 minutes, all I could find were regular expressions that detect URLs with a protocol, such as http://mattheworiordan.com/, or regular expressions that detect URLs without one, such as www.mattheworiordan.com. Why the hell I could not find one that does both is beyond me, so here I go posting a solution for anyone else to use.

Here is the holy grail:

/((([A-Za-z]{3,9}:(?:\/\/)?)(?:[\-;:&=\+\$,\w]+@)?[A-Za-z0-9\.\-]+|(?:www\.|[\-;:&=\+\$,\w]+@)[A-Za-z0-9\.\-]+)((?:\/[\+~%\/\.\w\-]*)?\??(?:[\-\+=&;%@\.\w]*)#?(?:[\.\!\/\\\w]*))?)/

Here is a nice example of this regular expression in action: http://jsbin.com/eqocuh/5/edit#source

Please feel free to modify this JSBin, add examples, and improve the regular expression, and I will update this blog post accordingly.
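For anyone who wants to see the original requirement end to end, here is a minimal sketch of turning matches into links. The `linkify` helper and its fallback rules (prepend `http://` for matches starting with `www.`, `mailto:` for `something@` matches) are my own assumptions for illustration, not part of the JSBin. It also assumes the input does not already contain anchor tags or URLs inside HTML attributes.

```javascript
// The pattern from this post, with the g flag added so every match is replaced.
const URL_PATTERN = /((([A-Za-z]{3,9}:(?:\/\/)?)(?:[\-;:&=\+\$,\w]+@)?[A-Za-z0-9\.\-]+|(?:www\.|[\-;:&=\+\$,\w]+@)[A-Za-z0-9\.\-]+)((?:\/[\+~%\/\.\w\-]*)?\??(?:[\-\+=&;%@\.\w]*)#?(?:[\.\!\/\\\w]*))?)/g;

// Hypothetical helper: wraps everything the pattern matches in an <a> tag.
function linkify(text) {
  return text.replace(URL_PATTERN, (match, whole, hostPart, protocol) => {
    let href = match;
    if (!protocol) {
      // No protocol was captured, so the match began with "www." or "something@".
      // Assumption: treat the former as a web link and the latter as an email.
      href = match.startsWith('www.') ? 'http://' + match : 'mailto:' + match;
    }
    return '<a href="' + href + '">' + match + '</a>';
  });
}

console.log(linkify('See www.mattheworiordan.com or http://mattheworiordan.com/ for more'));
// See <a href="http://www.mattheworiordan.com">www.mattheworiordan.com</a> or
// <a href="http://mattheworiordan.com/">http://mattheworiordan.com/</a> for more
// (printed on a single line)
```

Bear in mind the pattern is deliberately loose: it will happily swallow a trailing full stop, and anything of the form `word:something` looks like a protocol to it, so check the output against your own content.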

Here is an explanation of the regular expression for those who care:

(
  (                                # brackets covering match for protocol (optional) and domain
    ([A-Za-z]{3,9}:(?:\/\/)?)      # match protocol, allow in format http:// or mailto:
    (?:[\-;:&=\+\$,\w]+@)?         # allow something@ for email addresses
    [A-Za-z0-9\.\-]+               # anything looking at all like a domain, non-unicode domains
  |                                # or instead of above
    (?:www\.|[\-;:&=\+\$,\w]+@)    # starting with something@ or www.
    [A-Za-z0-9\.\-]+               # anything looking at all like a domain
  )
  (                                # brackets covering match for path, query string and anchor
    (?:\/[\+~%\/\.\w\-]*)?         # allow optional /path
    \??(?:[\-\+=&;%@\.\w]*)        # allow optional query string starting with ?
    #?(?:[\.\!\/\\\w]*)            # allow optional anchor #anchor
  )?                               # make URL suffix optional
)
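JavaScript has no free-spacing mode for regular expressions, so the commented layout above cannot be pasted in as is. One way to keep the pieces readable is to assemble the pattern from named fragments. This is only a sketch: the fragment names (`protocol`, `userInfo`, `domain` and so on) are mine, chosen to mirror the comments above.

```javascript
// Each fragment mirrors one line of the explanation; .source gives the
// pattern text of a literal without the surrounding slashes.
const protocol = /([A-Za-z]{3,9}:(?:\/\/)?)/.source;   // http:// or mailto:
const userInfo = /[\-;:&=\+\$,\w]+@/.source;           // something@
const domain   = /[A-Za-z0-9\.\-]+/.source;            // ASCII domain
const path     = /(?:\/[\+~%\/\.\w\-]*)?/.source;      // optional /path
const query    = /\??(?:[\-\+=&;%@\.\w]*)/.source;     // optional ?query
const anchor   = /#?(?:[\.\!\/\\\w]*)/.source;         // optional #anchor

const urlPattern = new RegExp(
  '(' +
    '(' + protocol + '(?:' + userInfo + ')?' + domain +
    '|' + '(?:www\\.|' + userInfo + ')' + domain + ')' +
    '(' + path + query + anchor + ')?' +
  ')'
);

const m = urlPattern.exec('http://mattheworiordan.com/blog?page=2#top');
console.log(m[3]); // http://
console.log(m[2]); // http://mattheworiordan.com
console.log(m[4]); // /blog?page=2#top
```

The capture groups line up with the brackets in the explanation: group 1 is the whole match, group 2 the protocol plus domain, group 3 the protocol alone, and group 4 the path, query string and anchor. For an input without a protocol, such as www.mattheworiordan.com, group 3 is `undefined`.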
