Matthew O'Riordan

This is where I record my rants, comments, quotes and thoughts on things. I welcome your input, so please fire away. Find out more about me at: http://mattheworiordan.com

URL regular expression for links with or without the protocol

I’ve just come across a pretty common requirement: converting any text that looks like a link into an actual link within some HTML text. Strangely, after searching for a good 15 minutes, all I could find was either a regular expression which detects URLs with a protocol, such as http://mattheworiordan.com/, or a regular expression which detects URLs without one, such as www.mattheworiordan.com. Why the hell I could not find one which does both is beyond me, so here is my solution for anyone else to use.

Here is the holy grail:

/((([A-Za-z]{3,9}:(?:\/\/)?)(?:[\-;:&=\+\$,\w]+@)?[A-Za-z0-9\.\-]+|(?:www\.|[\-;:&=\+\$,\w]+@)[A-Za-z0-9\.\-]+)((?:\/[\+~%\/\.\w\-_]*)?\??(?:[\-\+=&;%@\.\w_]*)#?(?:[\.\!\/\\\w]*))?)/

Here is a nice example of this regular expression in action: http://jsbin.com/eqocuh/5/edit#source

Please feel free to modify this JSBin, add examples, and improve this regular expression, and I will update this blog post accordingly.
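As a rough sketch of how the expression might be used for the original goal of turning link-like text into links, the snippet below wraps each match in an anchor tag. The function name linkify and the choice to prepend http:// or mailto: when no protocol is present are assumptions for illustration, not taken from the JSBin. The input is assumed to be plain text with no existing markup, and no HTML escaping is attempted.

```javascript
// The expression from this post, with the global flag added so that
// String.prototype.replace visits every match in the text.
const URL_PATTERN = /((([A-Za-z]{3,9}:(?:\/\/)?)(?:[\-;:&=\+\$,\w]+@)?[A-Za-z0-9\.\-]+|(?:www\.|[\-;:&=\+\$,\w]+@)[A-Za-z0-9\.\-]+)((?:\/[\+~%\/\.\w\-_]*)?\??(?:[\-\+=&;%@\.\w_]*)#?(?:[\.\!\/\\\w]*))?)/g;

// Hypothetical helper: wrap anything the expression matches in an <a> tag.
function linkify(text) {
  return text.replace(URL_PATTERN, (match, whole, hostPart, protocol) => {
    let href = match;
    if (!protocol) {
      // Assumption: something@domain is an email address, anything else is a web link.
      href = (hostPart.includes('@') ? 'mailto:' : 'http://') + match;
    }
    return '<a href="' + href + '">' + match + '</a>';
  });
}

console.log(linkify('Visit www.mattheworiordan.com today'));
console.log(linkify('See http://mattheworiordan.com/ now'));
```

One thing to watch for: the domain part accepts dots, so a full stop directly after a bare domain at the end of a sentence is swallowed into the match.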

Here is an explanation of the regular expression for those who care:

(
  ( // brackets covering match for protocol (optional) and domain
    ([A-Za-z]{3,9}:(?:\/\/)?) // match protocol, allow in format http:// or mailto:
    (?:[\-;:&=\+\$,\w]+@)? // allow something@ for email addresses
    [A-Za-z0-9\.\-]+ // anything looking at all like a domain, non-unicode domains
  | // or instead of above
    (?:www\.|[\-;:&=\+\$,\w]+@) // starting with something@ or www.
    [A-Za-z0-9\.\-]+ // anything looking at all like a domain
  )
  ( // brackets covering match for path, query string and anchor
    (?:\/[\+~%\/\.\w\-]*) // allow optional /path
    ?\??(?:[\-\+=&;%@\.\w]*) // allow optional query string starting with ?
    #?(?:[\.\!\/\\\w]*) // allow optional anchor #anchor
  )? // make URL suffix optional
)
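To see which part of a URL lands in which bracketed group, here is a small sketch using the same expression without the global flag. The helper name splitUrl and the property names on the returned object are hypothetical labels, chosen to mirror the comments in the explanation above.

```javascript
// Same expression as in the post, used once per string (no global flag).
const urlPattern = /((([A-Za-z]{3,9}:(?:\/\/)?)(?:[\-;:&=\+\$,\w]+@)?[A-Za-z0-9\.\-]+|(?:www\.|[\-;:&=\+\$,\w]+@)[A-Za-z0-9\.\-]+)((?:\/[\+~%\/\.\w\-_]*)?\??(?:[\-\+=&;%@\.\w_]*)#?(?:[\.\!\/\\\w]*))?)/;

// Hypothetical helper: name the capture groups described in the explanation.
function splitUrl(candidate) {
  const m = urlPattern.exec(candidate);
  if (m === null) {
    return null; // nothing in the string looks like a link
  }
  return {
    match: m[1],    // outermost brackets: everything that matched
    hostPart: m[2], // protocol (if any) plus domain
    protocol: m[3], // e.g. "http://" or "mailto:"; undefined without a protocol
    suffix: m[4],   // path, query string and anchor
  };
}

console.log(splitUrl('http://jsbin.com/eqocuh/5/edit#source'));
console.log(splitUrl('www.mattheworiordan.com'));
```

Checking whether protocol is undefined is a simple way to tell the two halves of the expression apart, for instance when deciding whether http:// needs to be prepended to build a working href.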