Regular Expression (Regex) in Python : Python tutorial 26

What is a regular expression :

A regular expression is a sequence of characters that describes a search pattern. Using a regular expression, we can check whether a string contains a substring matching that pattern and, if it does, replace it with another string, and so on. One great thing about regular expressions is that the core syntax is similar across many programming languages and tools such as Python, Java, Perl and sed. Each of them has its own variations, though, so a pattern may need small changes before it works elsewhere.

How to test a regex without running the code each time :

Several websites let us check and verify a regex interactively, e.g. regex101.

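To use regular expressions in Python, we first need to import the ‘re’ module. A pattern can then be compiled into a pattern object using the compile() function; the pattern object has methods such as search() and findall() that can be used once it is created. Compiling is optional, since functions like re.search() also accept the pattern as a plain string.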
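As a rough sketch of that workflow (Python 3 syntax; the sample sentence and variable names are only illustrative), the snippet below compiles a pattern once and then reuses the resulting object for a search and for a replacement:

```python
import re

# Compile the pattern once; the resulting object can be reused many times.
pattern = re.compile(r"fox")

sentence = "The quick brown fox jumps over the lazy dog"

# search() returns a match object if the pattern occurs anywhere, else None.
found = pattern.search(sentence) is not None

# sub() returns a new string with every match replaced.
replaced = pattern.sub("cat", sentence)

print(found)     # True
print(replaced)  # The quick brown cat jumps over the lazy dog
```

Compiling mainly pays off when the same pattern is used many times; for a one-off check, calling re.search() directly is usually enough.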

A simple regex example in Python :

Let’s start with the following simple example :

import re

# Python 3: print is a function, so its argument goes in parentheses
print(re.search("fox", "The quick brown fox jumps over the lazy dog"))

print(re.search("goat", "The quick brown fox jumps over the lazy dog"))

Output :

<re.Match object; span=(16, 19), match='fox'>
None

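The first call found the word “fox” in the line and returned a match object, while “goat” was not found, so the second call returned None. Since a match object counts as true and None counts as false, we can write the same check as a condition :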

import re

# re.search returns a match object (truthy) or None (falsy)
if re.search("fox", "The quick brown fox jumps over the lazy dog"):
    print("fox is in the line")

if re.search("goat", "The quick brown fox jumps over the lazy dog"):
    print("goat is in the line")  # not printed: there is no match
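The match object holds more than a yes/no answer. As a small illustrative sketch (Python 3 syntax, variable names are my own), its group() and span() methods give the matched text and its position in the string:

```python
import re

match = re.search(r"fox", "The quick brown fox jumps over the lazy dog")

if match:
    matched_text = match.group()  # the text that matched the pattern
    start, end = match.span()     # index where the match starts and ends
    print(matched_text, start, end)  # fox 16 19
```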

First we import the “re” module to be able to work with regular expressions, and then we use the “search” function from that module. re.search(ex, s) looks for a substring of the string “s” that matches the regular expression “ex”. Most characters simply match themselves, but a few special characters, called metacharacters, do not match themselves; instead they carry a special meaning in the pattern. The metacharacters are :

. ^ $ * + ? { } [ ] \ | ( )
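To make the difference concrete, here is a minimal sketch using the dot (Python 3 syntax; the sample text is made up, and re.findall simply returns a list of all matches). Unescaped, the dot matches any character; preceded by a backslash, it matches only a literal dot:

```python
import re

text = "a.b axb"

# An unescaped dot is a metacharacter: it matches any character except a newline.
any_char = re.findall(r"a.b", text)

# A backslash removes the special meaning, so only a literal dot matches.
literal_dot = re.findall(r"a\.b", text)

print(any_char)     # ['a.b', 'axb']
print(literal_dot)  # ['a.b']
```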

We are not going to cover the meaning of every metacharacter here, but let’s look at how a few of them are used.
‘[’ and ‘]’ define a set of characters: put all the characters you want to match inside the brackets. To match any one of the characters a, b or c, we can use either [abc] or [a-c]; ‘-’ is the range indicator. ‘^’ as the first character in a set means “not”: [^6] matches any character except 6. The documentation of Python’s ‘re’ module has the complete list of regular expression syntax.
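A short sketch of these sets in action (Python 3 syntax; the sample string is made up for illustration, and re.findall returns a list of every match):

```python
import re

text = "abcd 1662"

explicit_set = re.findall(r"[abc]", text)  # any one of a, b, c
range_set = re.findall(r"[a-c]", text)     # same set written as a range
negated_set = re.findall(r"[^6]", text)    # any character except 6

print(explicit_set)  # ['a', 'b', 'c']
print(range_set)     # ['a', 'b', 'c']
print(negated_set)   # ['a', 'b', 'c', 'd', ' ', '1', '2']
```

Note that the negated set also matches the space, because a space is simply "not 6".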

In addition to sets, there are shorthand sequences for common groups of characters :

\w : Matches alphanumeric characters and the underscore
\W : Matches any character that \w does not match
\d : Matches decimal digits
\D : Matches non-digit characters
\s : Matches white-space characters
\S : Matches non white-space characters

The ‘re’ module documentation covers the full list of such sequences.
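The following sketch tries a few of these sequences on a made-up string (Python 3 syntax; the variable names are illustrative). It also shows that \w treats the underscore as a word character:

```python
import re

text = "id_7 ok!"

word_chars = re.findall(r"\w", text)
non_word_chars = re.findall(r"\W", text)
digits = re.findall(r"\d", text)
whitespace = re.findall(r"\s", text)

print(word_chars)      # ['i', 'd', '_', '7', 'o', 'k']
print(non_word_chars)  # [' ', '!']
print(digits)          # ['7']
print(whitespace)      # [' ']
```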

Regex using a compiled expression :

First of all, let me show you how to check the output of a regex without running the code each time. Open regex101 and test the string 12345abcd678 against the regular expression \d . As explained above, \d matches decimal digits, so only the digits are highlighted.

Check the following program :

import re

text = "12345abcd678"  # not named 'str', which would shadow the built-in type
pattern1 = re.compile(r'\d')      # any decimal digit
pattern2 = re.compile(r'\D')      # any non-digit character
pattern3 = re.compile(r'[1-9]')   # the digits 1 to 9 (0 is not in the set)
pattern4 = re.compile(r'[^1-9]')  # anything except the digits 1 to 9

print(pattern1.findall(text)) # ['1', '2', '3', '4', '5', '6', '7', '8']
print(pattern2.findall(text)) # ['a', 'b', 'c', 'd']
print(pattern3.findall(text)) # ['1', '2', '3', '4', '5', '6', '7', '8']
print(pattern4.findall(text)) # ['a', 'b', 'c', 'd']

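In this example, we first create compiled pattern objects using the ‘compile’ function of the ‘re’ module. Next, we call the ‘findall’ method on each pattern, which returns a list of all non-overlapping matches in the string. Note that [1-9] does not include 0, so it gives the same result as \d here only because the string contains no 0.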
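findall returns one list entry per match, so single-character patterns give single-character results. As a small extension (a sketch in Python 3 syntax, using the ‘+’ metacharacter from the list earlier, which means “one or more of the preceding item”), the same string can be split into whole runs of digits and non-digits:

```python
import re

text = "12345abcd678"

digit_runs = re.findall(r"\d+", text)   # one or more digits in a row
letter_runs = re.findall(r"\D+", text)  # one or more non-digits in a row

print(digit_runs)   # ['12345', '678']
print(letter_runs)  # ['abcd']
```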

Hope you have learnt the basics of regular expressions and how to use them in Python.
