URL regular expression for links with or without the protocol
I’ve just come across a pretty common requirement: converting any text that looks like a link into an actual link within some HTML text. Strangely, after searching for a good 15 minutes, all I could find was either a regular expression which detects URLs with a protocol, such as http://mattheworiordan.com/, or a regular expression which detects URLs without one, such as www.mattheworiordan.com. Why the hell I could not find one which does both is beyond me, so here I go posting a solution for anyone else to use.
Here is the holy grail:
/((([A-Za-z]{3,9}:(?:\/\/)?)(?:[\-;:&=\+\$,\w]+@)?[A-Za-z0-9\.\-]+|(?:www\.|[\-;:&=\+\$,\w]+@)[A-Za-z0-9\.\-]+)((?:\/[\+~%\/\.\w\-_]*)?\??(?:[\-\+=&;%@\.\w_]*)#?(?:[\.\!\/\\\w]*))?)/
Here is a nice example of this regular expression in action: http://jsbin.com/eqocuh/5/edit#source
Please feel free to modify this JSBin, add examples, and improve the regular expression, and I will update this blog post accordingly.
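For anyone who wants to see the original requirement (turning link-like text into real links) end to end, here is a minimal sketch. It uses the pattern from this post with every dot and hyphen escaped explicitly, plus the g flag so that all matches are replaced. The helper name linkify and the choice of prefixes (http:// for www. links, mailto: for bare email addresses) are my own assumptions for illustration. It also assumes the input is plain text, so it does no HTML escaping.

```javascript
// The pattern from this post, with the global flag added so every match is replaced.
const URL_PATTERN = /((([A-Za-z]{3,9}:(?:\/\/)?)(?:[\-;:&=\+\$,\w]+@)?[A-Za-z0-9\.\-]+|(?:www\.|[\-;:&=\+\$,\w]+@)[A-Za-z0-9\.\-]+)((?:\/[\+~%\/\.\w\-_]*)?\??(?:[\-\+=&;%@\.\w_]*)#?(?:[\.\!\/\\\w]*))?)/g;

// Hypothetical helper (not from the original JSBin): wraps each match in an <a> tag.
function linkify(text) {
  // Callback arguments: full match, then capture groups 1, 2 and 3.
  // Group 3 is the protocol and is undefined when the match had none.
  return text.replace(URL_PATTERN, (match, _whole, _hostPart, protocol) => {
    let href;
    if (protocol) {
      href = match;                 // already has http://, mailto:, etc.
    } else if (match.startsWith('www.')) {
      href = 'http://' + match;     // assumed default protocol for www. links
    } else {
      href = 'mailto:' + match;     // the only other way to match is something@domain
    }
    return '<a href="' + href + '">' + match + '</a>';
  });
}

console.log(linkify('see www.example.com and http://example.com/path?x=1#top now'));
// see <a href="http://www.example.com">www.example.com</a> and <a href="http://example.com/path?x=1#top">http://example.com/path?x=1#top</a> now
```

One thing to watch out for: because the domain part accepts dots, a full stop directly after a link (as in "go to www.example.com.") ends up inside the match, so you may want to trim trailing punctuation before building the href.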
Here is an explanation of the regular expression, for those who care:
(
( # brackets covering match for protocol (optional) and domain
([A-Za-z]{3,9}:(?:\/\/)?) # match protocol, allow in format http:// or mailto:
(?:[\-;:&=\+\$,\w]+@)? # allow something@ for email addresses
[A-Za-z0-9\.\-]+ # anything looking at all like a domain (non-Unicode domains only)
| # or instead of above
(?:www\.|[\-;:&=\+\$,\w]+@) # starting with something@ or www.
[A-Za-z0-9\.\-]+ # anything looking at all like a domain
)
( # brackets covering match for path, query string and anchor
(?:\/[\+~%\/\.\w\-]*)? # allow optional /path
\??(?:[\-\+=&;%@\.\w]*) # allow optional query string starting with ?
#?(?:[\.\!\/\\\w]*) # allow optional anchor #anchor
)? # make URL suffix optional
)
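The brackets in the breakdown above map directly onto numbered capture groups, which is handy if you want the pieces rather than just the whole match. Here is a rough sketch of that. The function name parseLink and the property names on the returned object are made up for illustration; only the group numbers come from the regular expression itself.

```javascript
// Same pattern as in this post, without the g flag since we only want the first match.
const URL_PATTERN = /((([A-Za-z]{3,9}:(?:\/\/)?)(?:[\-;:&=\+\$,\w]+@)?[A-Za-z0-9\.\-]+|(?:www\.|[\-;:&=\+\$,\w]+@)[A-Za-z0-9\.\-]+)((?:\/[\+~%\/\.\w\-_]*)?\??(?:[\-\+=&;%@\.\w_]*)#?(?:[\.\!\/\\\w]*))?)/;

// Hypothetical helper: returns the pieces of the first link found, or null.
function parseLink(text) {
  const m = URL_PATTERN.exec(text);
  if (!m) return null;
  return {
    url: m[1],              // outermost brackets: the whole link
    hostPart: m[2],         // protocol (if any), optional something@, and domain
    protocol: m[3] || null, // e.g. "http://" or "mailto:"; missing for www. and bare emails
    suffix: m[4] || ''      // path, query string and anchor; empty when absent
  };
}

console.log(parseLink('http://mattheworiordan.com/path?x=1#top'));
console.log(parseLink('www.mattheworiordan.com'));
console.log(parseLink('mailto:bob@example.com'));
```

The `|| null` and `|| ''` fallbacks are there because an optional group that matched nothing comes back as undefined in JavaScript.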