
Just Enough Zotonic Source Part 3 - Regular Expressions

Learn how to manipulate string data with the re module.

Lloyd R. Prentice August 31, 2011


The Erlang standard library module re provides a powerful suite of functions that execute regular expressions to find, replace, and manipulate substrings within a string or Erlang binary.

Re functions are particularly useful for form validation and for decomposing and modifying URLs, e-mail addresses, and other common web elements represented as strings or binaries.

They are found throughout Zotonic source files.

See: http://www.erlang.org/doc/man/re.html


This guide assumes you have a current version of Erlang installed on your system.

Examples have been tested on Ubuntu 11.04.


Bring up an Erlang terminal:

$ erl
Erlang R14B03 (erts-5.8.4) [source] [64-bit] [smp:3:3] [rq:3] [async-threads:0] [kernel-poll:false]

Eshell V5.8.4  (abort with ^G)

...and work through the examples below. Modify and re-execute each one until you feel comfortable with what's going on.

What is a regular expression?

A regular expression is a pattern that is matched against a subject string from left to right.

The power of regular expressions comes from the ability to include alternatives and repetitions in the pattern. These are encoded in the pattern by the use of metacharacters.

What is a pattern?

Most characters stand for themselves in a pattern.

The pattern "ick", for instance, would match the first occurrence of "ick" in the string "The quick brown fox."

We'll use re:run/2 to illustrate:

run(Subject,RE) -> {match, Captured} | nomatch


6> re:run("The quick brown fox.","ick").
{match,[{6,3}]}

The atom "match" is self-explanatory. The tuple {6,3} in the list gives the zero-based start position and the length of the matched substring in the subject string.
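
Because the offset is zero-based while Erlang list functions count from one, cutting the matched text out of the subject needs a small adjustment. A minimal sketch, using lists:sublist/3 from the standard library (the variable names are only for illustration):

```erlang
Subject = "The quick brown fox.",
{match, [{Start, Len}]} = re:run(Subject, "ick"),
%% re offsets start at 0; lists:sublist/3 positions start at 1.
Matched = lists:sublist(Subject, Start + 1, Len).
%% Matched is "ick"
```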

7> re:run("The brown fox.","ick").
nomatch

re:run/2 will also work with a binary:

8> re:run(<<"The quick brown fox.">>,"ick").
{match,[{6,3}]}

If we wish to find all instances of "ick" in a string, we need to use re:run/3.

run(Subject,RE,Options) -> {match, Captured} | match | nomatch


9> re:run("The sick quick brown fox.", "ick", [global]).
{match,[[{5,3}],[{11,3}]]}
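
If you want the matched text rather than its position, re:run/3 also accepts a capture option. The sketch below reuses the same subject string; {capture, all, list} asks for every captured part as an ordinary string. The variable names are illustrative only.

```erlang
Sick = "The sick quick brown fox.",
%% Default: one list of {Offset, Length} tuples per match.
{match, Positions} = re:run(Sick, "ick", [global]),
%% Same search, but returning the matched text as strings.
{match, Texts} = re:run(Sick, "ick", [global, {capture, all, list}]).
%% Positions is [[{5,3}],[{11,3}]] and Texts is [["ick"],["ick"]]
```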

For documentation of re:run/3 options, see the re reference page linked above.


How can I replace a substring in a string?

Use re:replace/3:

replace(Subject, RE, Replacement) -> iodata() | unicode:charlist()


10> re:replace("The quick brown fox.", "brown", "red").
[<<"The quick ">>,<<"red">>|<<" fox.">>]

Hmmm... re:replace/3 returned something odd-looking: an iolist of binaries rather than a plain string.

Let's use re:replace/4 to provide an option:

replace(Subject, RE, Replacement, Options) -> iodata() | unicode:charlist()


11> re:replace("The quick brown fox.", "brown", "red", [{return, list}]).
"The quick red fox."
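
A few more variations may help here. This is a sketch rather than a full tour of the options: it shows the global option, which replaces every match instead of only the first, the {return, binary} option, and iolist_to_binary/1 as a way to flatten the default result.

```erlang
%% Without global, only the first match is replaced.
First = re:replace("The sick quick fox.", "ick", "ack", [{return, list}]),
%% With global, every match is replaced.
All = re:replace("The sick quick fox.", "ick", "ack", [global, {return, list}]),
%% Ask for a single flat binary instead of a string.
Bin = re:replace("The quick brown fox.", "brown", "red", [{return, binary}]),
%% iolist_to_binary/1 flattens the iodata that re:replace/3 returns by default.
Flat = iolist_to_binary(re:replace("The quick brown fox.", "brown", "red")).
%% First is "The sack quick fox.", All is "The sack quack fox."
%% Bin and Flat are both <<"The quick red fox.">>
```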

Frankly, I had to fiddle to figure out how to use the option. Erlang documentation is generally thorough, but often not that easy to follow. That's why I created this Cookbook item. I wanted to learn this stuff myself.

Regular expressions can deliver much much more, however, with shrewd use of metacharacters.

What is a metacharacter?

Metacharacters are interpreted in special ways.

For instance, the metacharacter . matches any single character except newline. Used alone, it matches the first character of the string.


13> re:run("The quick brown fox.", ".").
{match,[{0,1}]}

You'd usually use . in a more elaborate pattern.

14> re:run("The quick brown fox.", "qu.").
{match,[{4,3}]}

15> re:run("The quack brown fox.", "qu.").
{match,[{4,3}]}

The metacharacter ^ asserts the start of the string.


16> re:run("The quack brown fox.", "^The").
{match,[{0,3}]}

17> re:run("The quack brown fox.", "^qua").
nomatch

Similarly, the metacharacter $ asserts the end of the string (or the position just before a final newline).

18> re:run("The quick brown fox is sick.", "ick.$").
{match,[{24,4}]}
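
Note that the . in "ick.$" is itself a metacharacter, so it matches the final period only because a period happens to be there. To match a literal period, escape it, and remember that the backslash must be doubled inside an Erlang string. A small sketch with made-up subject strings:

```erlang
%% An unescaped . matches any character, so "ick!" at the end still matches.
AnyChar = re:run("sick!", "ick.$"),
%% "\\." in an Erlang string is the regular expression \. (a literal period).
NotPeriod = re:run("sick!", "ick\\.$"),
Period = re:run("sick.", "ick\\.$").
%% AnyChar is {match,[{1,4}]}, NotPeriod is nomatch, Period is {match,[{1,4}]}
```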

Are there other metacharacters?


The metacharacter * matches zero or more repetitions of the preceding character or group.


19> re:run("The quick brown fox.", "i*").
{match,[{0,0}]}

20> re:run("The quick brown fox.", "T*").
{match,[{0,1}]}

21> re:run("TTTTThe quick brown fox.", "T*").
{match,[{0,5}]}

The metacharacter + matches one or more repetitions of the preceding character or group.

22> re:run("TTTTThe quick brown fox.", "z+").
nomatch

23> re:run("TTTTThe quick brown fox.", "T+").
{match,[{0,5}]}
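
The difference between * and + is easiest to see side by side. In this sketch, "i*" succeeds immediately with an empty match at the start of the string, because zero repetitions are allowed, while "i+" has to find a real "i". The last line illustrates that a quantifier applies only to the item directly before it. The subject "fox foxx" is made up for the example.

```erlang
Subject = "The quick brown fox.",
%% i* may match zero characters, so it succeeds with an empty match at offset 0.
Star = re:run(Subject, "i*"),
%% i+ needs at least one "i", so it finds the one in "quick".
Plus = re:run(Subject, "i+"),
%% The * binds to "x" only: "fo" followed by zero or more "x".
Bound = re:run("fox foxx", "fox*", [global, {capture, all, list}]).
%% Star is {match,[{0,0}]}, Plus is {match,[{6,1}]}
%% Bound is {match,[["fox"],["foxx"]]}
```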

The metacharacter | separates alternative patterns. Think of it as "or".


24> re:run("The quick brown fox.", "fox|pig").
{match,[{16,3}]}

25> re:run("The quick brown pig.", "fox|pig").
{match,[{16,3}]}
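
Alternation is often combined with parentheses, as in the Zotonic example later in this article. Parentheses limit how far the | reaches, and each pair also captures what it matched, so re:run/2 returns one extra tuple per group. A small sketch with a made-up subject:

```erlang
Pig = "The quick pig.",
%% First tuple: the whole match "quick pig"; second tuple: the group "pig".
Grouped = re:run(Pig, "quick (fox|pig)"),
%% all_but_first drops the whole match and keeps only the groups.
Animal = re:run(Pig, "quick (fox|pig)", [{capture, all_but_first, list}]).
%% Grouped is {match,[{4,9},{10,3}]} and Animal is {match,["pig"]}
```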

You can also match generic character types.

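\s, for instance, matches any whitespace character. In an Erlang string the backslash must be doubled, so the pattern is written "\\s".


26> re:run("The quick brown fox","\\s",[global]).
{match,[[{3,1}],[{9,1}],[{15,1}]]}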
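
One thing to watch for: inside an Erlang string literal, "\s" is itself an escape sequence for the space character, so a pattern written that way only ever matches spaces. The sketch below uses a tab to show the difference, and applies the same doubling rule to \d (a decimal digit). The subject strings are invented for the example.

```erlang
%% "\s" in an Erlang string is a space character, so a tab is not matched.
SpaceOnly = re:run("a\tb", "\s"),
%% "\\s" reaches re as \s and matches any whitespace, including the tab.
AnySpace = re:run("a\tb", "\\s"),
%% \d (written "\\d") matches a decimal digit.
Digits = re:run("Order 66", "\\d+", [{capture, first, list}]).
%% SpaceOnly is nomatch, AnySpace is {match,[{1,1}]}, Digits is {match,["66"]}
```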

How can I match non-printing characters?


See "Non-printing characters" in http://www.erlang.org/doc/man/re.html

Note that the metacharacters [ and ] have special meaning.

What do they mean?

They enclose "character classes."

What's a character class?

A character class matches a single character in the subject string: any one of the characters listed between the brackets.


24> re:run("The quick brown fox.", "[qui]").
{match,[{4,1}]}

25> re:run("The quick brown fox.", "[ui]").
{match,[{5,1}]}

26> re:run("The quick brown fox.", "[qui]", [global]).
{match,[[{4,1}],[{5,1}],[{6,1}]]}
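
Character classes can also hold ranges such as a-z, and a leading ^ inside the brackets negates the class. A short sketch of both, plus a class combined with +:

```erlang
Fox = "The quick brown fox.",
%% A range: the first uppercase letter.
Upper = re:run(Fox, "[A-Z]"),
%% A leading ^ negates the class: every character that is not a letter.
NonLetters = re:run(Fox, "[^a-zA-Z]", [global]),
%% A class followed by +: the first run of lowercase letters.
Lower = re:run(Fox, "[a-z]+", [{capture, first, list}]).
%% Upper is {match,[{0,1}]}
%% NonLetters is {match,[[{3,1}],[{9,1}],[{15,1}],[{19,1}]]}
%% Lower is {match,["he"]}
```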

You can combine characters, metacharacters, and other regular expression elements into extended patterns that can search, match, and replace nearly any substring you can imagine.


27> re:run("E-mail: xyz@pdq.com", "[a-zA-Z0-9]+@[a-zA-Z0-9]+\\.[a-z]{2,3}").
{match,[{8,11}]}

Note: DO NOT use this pattern in production. It needs more refinement and much more testing.
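
For form validation you would normally want the pattern to cover the whole input rather than find an address somewhere inside it. The sketch below shows one way that might look, using ^ and $ as anchors and a literal dot. LooksLikeEmail is a hypothetical helper written for this example, and the same caution applies: it is far too simple for production use.

```erlang
%% Hypothetical helper for illustration; not part of re or Zotonic.
LooksLikeEmail = fun(Input) ->
    %% ^ and $ make the pattern cover the whole input; "\\." is a literal dot.
    Pattern = "^[a-zA-Z0-9]+@[a-zA-Z0-9]+\\.[a-z]{2,3}$",
    %% {capture, none} makes re:run/3 return the bare atom match.
    re:run(Input, Pattern, [{capture, none}]) =:= match
end,
EmailChecks = [LooksLikeEmail(S) || S <- ["xyz@pdq.com", "E-mail: xyz@pdq.com", "xyz@pdqXcom"]].
%% EmailChecks is [true,false,false]
```

The {capture, none} option is also why the re:run/3 signature lists a bare match among its return values.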

What other goodies does re offer?

split(Subject, RE) -> SplitList


split(Subject, RE, Options) -> SplitList


28> re:split("this/is/my/path","/").
[<<"this">>,<<"is">>,<<"my">>,<<"path">>]
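
re:split/3 takes options much like re:replace/4 does. A brief sketch of two of them, {return, list} and {parts, N}, along with a separator that is a real pattern rather than a single character (the "a, b,c" subject is made up for the example):

```erlang
%% {return, list} gives strings instead of binaries.
Parts = re:split("this/is/my/path", "/", [{return, list}]),
%% The separator is a pattern: a comma followed by optional whitespace.
Fields = re:split("a, b,c", ",\\s*", [{return, list}]),
%% {parts, 2} stops after the first split.
HeadRest = re:split("this/is/my/path", "/", [{return, list}, {parts, 2}]).
%% Parts is ["this","is","my","path"], Fields is ["a","b","c"]
%% HeadRest is ["this","is/my/path"]
```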

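If you wish to use a pattern multiple times and boost performance, you can compile it with re:compile/1. The compiled pattern can then be passed to re:run/2 in place of the pattern string.


29> {ok, P} = re:compile("[a-zA-Z0-9]+@[a-zA-Z0-9]+\\.[a-z]{2,3}").
30> re:run("E-mail: xyz@pdq.com", P).
{match,[{8,11}]}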
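
re:compile/1 returns {ok, MP} on success and {error, {ErrString, Position}} when the pattern is malformed, so matching on the result catches a bad pattern early. A sketch under those assumptions, reusing one compiled pattern across several subjects (the second subject and the broken pattern "[a-z" are invented for the example):

```erlang
{ok, MP} = re:compile("[a-zA-Z0-9]+@[a-zA-Z0-9]+\\.[a-z]{2,3}"),
%% Compile once, then run the compiled pattern against each subject.
Subjects = ["E-mail: xyz@pdq.com", "no address here"],
Results = [re:run(S, MP, [{capture, first, list}]) || S <- Subjects],
%% A malformed pattern (unclosed character class) is reported, not raised.
{error, {Reason, Position}} = re:compile("[a-z").
%% Results is [{match,["xyz@pdq.com"]},nomatch]
```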

How are regular expressions used in Zotonic source?

For one of many examples, look at ../zotonic/src/markdown/get_url/1.

get_url(String) ->
    HTTP_regex = "^(H|h)(T|t)(T|t)(P|p)(S|s)*://",
    case re:run(String, HTTP_regex) of
        nomatch    -> not_url;
        {match, _} -> get_url1(String, [])
    end.
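
As a point of comparison only, the caseless option of re:run/3 can express a similar test more compactly. The sketch below is a hypothetical variant, not the Zotonic implementation, and it is slightly stricter: s? accepts at most one "s", whereas (S|s)* in the original accepts any number. IsHttpUrl and the sample URLs are made up for the example.

```erlang
%% Hypothetical variant for illustration only; not the Zotonic code.
IsHttpUrl = fun(String) ->
    %% caseless replaces the (H|h)(T|t)... alternations.
    re:run(String, "^https?://", [caseless, {capture, none}]) =:= match
end,
Urls = ["http://zotonic.com", "HTTPS://example.com",
        "ftp://example.com", "httpss://example.com"],
UrlChecks = [IsHttpUrl(U) || U <- Urls].
%% UrlChecks is [true,true,false,false]
```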

Where can I go from here?

Study and experiment with all the metacharacters and other regular expression constructs described in the re documentation.


Do further research on the web. Every time you see an interesting regular expression, test it in re:run/2. You may well have to edit it to get it to run with re:run/2. But if you understand the basics, it won't be difficult.


CAUTION: Complex regular expression patterns are hard to read and error prone. Break them down into short segments and test each segment. Then build them back up.

The hard part is confirming that your pattern will match all possible instances of the string segments you're interested in.


re documentation: http://www.erlang.org/doc/man/re.html



This page is part of the Zotonic documentation, which is licensed under the Apache License 2.0.