RegEx¶

A RegEx, or Regular Expression, is a sequence of characters that forms a search pattern.

RegEx can be used to check whether a string contains the specified search pattern.

RegEx Module¶

Python has a built-in module called re, which can be used to work with Regular Expressions.

Import the re module:

import re

RegEx in Python¶

Once you have imported the re module, you can start using regular expressions:

# Search the string to see if it starts with "The" and ends with "Spain":

import re

txt = "The rain in Spain"
x = re.search("^The.*Spain$", txt)
print(x is not None) # True

RegEx Functions¶

The re module offers a set of functions that let us search a string for a match:

Function	Description
findall	Returns a list containing all matches
search	Returns a Match object if there is a match anywhere in the string
split	Returns a list where the string has been split at each match
sub	Replaces one or many matches with a string

Metacharacters¶

Metacharacters are characters with a special meaning:

Character	Description	Example
[]	A set of characters	"[a-m]"
\	Signals a special sequence	"\d"
.	Any character (except newline character)	"he..o"
^	Starts with	"^hello"
$	Ends with	"planet$"
*	Zero or more occurrences	"he.*o"
+	One or more occurrences	"he.+o"
?	Zero or one occurrences	"he.?o"
{}	Exactly the specified number of occurrences	"he.{2}o"
|	Either or	"falls|stays"
()	Capture and group
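
To make the table above more concrete, here is a minimal sketch that runs a few of its example patterns. The sample strings and variable names are assumptions chosen only for illustration.

```python
import re

txt = "hello planet"  # sample string, assumed for this sketch

# "." matches any single character, so "he..o" matches "hello".
dot_match = re.findall("he..o", txt)
print(dot_match)  # ['hello']

# "^" anchors the pattern to the start of the string, "$" to the end.
starts = bool(re.search("^hello", txt))
ends = bool(re.search("planet$", txt))
print(starts, ends)  # True True

# "{2}" requires exactly two repetitions of the preceding element.
exact = re.findall("he.{2}o", txt)
print(exact)  # ['hello']

# "|" matches either alternative; only "falls" occurs in this sentence.
either = re.findall("falls|stays", "The rain in Spain falls mainly in the plain")
print(either)  # ['falls']

# "[a-m]" matches any single lower-case character from a to m.
in_set = re.findall("[a-m]", "hello")
print(in_set)  # ['h', 'e', 'l', 'l']
```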

Flags¶

You can add flags to the pattern when using regular expressions.

Flag	Shorthand	Description
re.ASCII	re.A	Makes \w, \W, \b, \B, \d, \D, \s and \S perform ASCII-only matching
re.DEBUG		Displays debug information about the compiled expression
re.DOTALL	re.S	Makes the . character match all characters (including newline character)
re.IGNORECASE	re.I	Case-insensitive matching
re.MULTILINE	re.M	Makes ^ and $ match at the beginning and end of each line, not only of the whole string
re.NOFLAG		Specifies that no flag is set for this pattern
re.UNICODE	re.U	Makes \w, \W, \b, \B, \d, \D, \s and \S follow Unicode rules. This is the default for string patterns in Python 3. In Python 2, set this flag to get Unicode matching
re.VERBOSE	re.X	Allows whitespace and comments inside patterns, which makes the pattern more readable
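
The effect of a flag is easiest to see by running the same pattern with and without it. The following is a small sketch; the two-line sample string and the variable names are assumptions made for illustration.

```python
import re

txt = "The rain\nin Spain"  # two-line sample string, assumed for this sketch

# re.IGNORECASE: "spain" matches "Spain" regardless of case.
ignorecase = re.findall("spain", txt, flags=re.IGNORECASE)
print(ignorecase)  # ['Spain']

# re.MULTILINE: "^" also matches right after each newline.
without_m = re.findall(r"^\w+", txt)
with_m = re.findall(r"^\w+", txt, flags=re.MULTILINE)
print(without_m, with_m)  # ['The'] ['The', 'in']

# re.DOTALL: "." also matches the newline character.
without_s = re.search("rain.in", txt)
with_s = re.search("rain.in", txt, flags=re.DOTALL)
print(without_s)  # None
print(with_s is not None)  # True

# re.VERBOSE: whitespace and comments inside the pattern are ignored.
verbose_pattern = re.compile(r"""
    \b S    # an upper-case S at the start of a word
    \w+     # followed by one or more word characters
""", re.VERBOSE)
verbose_match = verbose_pattern.search(txt)
print(verbose_match.group())  # Spain
```

Flags can be combined with the | operator, for example re.IGNORECASE | re.MULTILINE.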

Special Sequences¶

A special sequence is a \ followed by one of a fixed set of characters (for example \d, \s, \w, or \b), and it has a special meaning.
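
As a brief illustration, the sketch below tries four common special sequences (\d, \s, \w and \b) on a sample sentence. The sentence and the variable names are assumptions for this example. Raw strings (r"...") are used so that Python passes the backslash through to the re module unchanged.

```python
import re

txt = "The rain in Spain lasted 3 days"  # sample string, assumed for this sketch

# \d matches any digit.
digits = re.findall(r"\d", txt)
print(digits)  # ['3']

# \s matches any white-space character.
spaces = re.findall(r"\s", txt)
print(len(spaces))  # 6

# \w matches a word character (letters, digits, underscore).
words = re.findall(r"\w+", txt)
print(words)  # ['The', 'rain', 'in', 'Spain', 'lasted', '3', 'days']

# \b matches the boundary at the start or end of a word.
ain_at_end = re.findall(r"ain\b", txt)
print(ain_at_end)  # ['ain', 'ain']
```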

Sets¶

A set is a group of characters inside a pair of square brackets [] with a special meaning:

Set	Description
[arn]	Returns a match where one of the specified characters (a, r, or n) is present
[a-n]	Returns a match for any lower case character, alphabetically between a and n
[^arn]	Returns a match for any character EXCEPT a, r, and n
[0123]	Returns a match where any of the specified digits (0, 1, 2, or 3) are present
[0-9]	Returns a match for any digit between 0 and 9
[0-5][0-9]	Returns a match for any two-digit number from 00 to 59
[a-zA-Z]	Returns a match for any character alphabetically between a and z, lower case OR upper case
[+]	In sets, +, *, ., |, (), $ and {} have no special meaning, so [+] means: return a match for any + character in the string
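
The sets from the table can be tried out directly with findall(). This is only a sketch; the sample strings and variable names are assumptions for illustration.

```python
import re

txt = "The rain in Spain at 08:59"  # sample string, assumed for this sketch

# [arn] matches any one of the characters a, r, or n.
arn = re.findall("[arn]", txt)
print(arn)  # ['r', 'a', 'n', 'n', 'a', 'n', 'a']

# [0-5][0-9] matches any two-digit number from 00 to 59.
minutes = re.findall("[0-5][0-9]", txt)
print(minutes)  # ['08', '59']

# [^a-zA-Z ] matches any character that is neither a letter nor a space.
not_letters = re.findall("[^a-zA-Z ]", txt)
print(not_letters)  # ['0', '8', ':', '5', '9']

# Inside a set, + is an ordinary character.
plus = re.findall("[+]", "1+1=2")
print(plus)  # ['+']
```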

The findall() Function¶

The findall() function returns a list containing all matches.

#Print a list of all matches:

import re

txt = "The rain in Spain"
x = re.findall("ai", txt)
print(x) # ['ai', 'ai']

The list contains the matches in the order they are found.

If no matches are found, an empty list is returned:

# Return an empty list if no match was found:

import re

txt = "The rain in Spain"
x = re.findall("Portugal", txt)
print(x) # []

The search() Function¶

The search() function searches the string for a match, and returns a Match object if there is a match.

If there is more than one match, only the first occurrence of the match is returned:

# Search for the first white-space character in the string:

import re

txt = "The rain in Spain"
x = re.search(r"\s", txt)

print("The first white-space character is located in position:", x.start())

If no matches are found, the value None is returned:

# Make a search that returns no match:

import re

txt = "The rain in Spain"
x = re.search("Portugal", txt)
print(x) # None

The split() Function¶

The split() function returns a list where the string has been split at each match:

# Split at each white-space character:

import re

txt = "The rain in Spain"
x = re.split(r"\s", txt)
print(x) # ['The', 'rain', 'in', 'Spain']

You can control the number of splits by specifying the maxsplit parameter:

import re

#Split the string at the first white-space character:

txt = "The rain in Spain"
x = re.split(r"\s", txt, maxsplit=1)
print(x) # ['The', 'rain in Spain']

The sub() Function¶

The sub() function replaces the matches with the text of your choice:

import re

#Replace all white-space characters with the digit "9":

txt = "The rain in Spain"
x = re.sub(r"\s", "9", txt)
print(x) # The9rain9in9Spain

You can control the number of replacements by specifying the count parameter:

import re

#Replace the first two occurrences of a white-space character with the digit 9:

txt = "The rain in Spain"
x = re.sub(r"\s", "9", txt, count=2)
print(x) # The9rain9in Spain

Match Object¶

A Match object is an object containing information about the search and the result.

Note

If there is no match, the value None is returned instead of a Match object.

import re

#The search() function returns a Match object:

txt = "The rain in Spain"
x = re.search("ai", txt)
print(x) # <re.Match object; span=(5, 7), match='ai'>
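
Because search() returns None when nothing matches, calling a Match method on the result without checking it first raises an AttributeError. One possible way to guard against this is sketched below; first_match is a hypothetical helper introduced only for this example, not part of the re module.

```python
import re

def first_match(pattern, text):
    """Return the first matched text, or None when the pattern is not found.

    Hypothetical helper for illustration only.
    """
    m = re.search(pattern, text)
    if m is None:
        # No Match object exists, so there is nothing to call .group() on.
        return None
    return m.group()

found = first_match("ai", "The rain in Spain")
missing = first_match("Portugal", "The rain in Spain")
print(found)    # ai
print(missing)  # None
```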

The Match object has properties and methods used to retrieve information about the search and the result:

.span() returns a tuple containing the start and end positions of the match.
.string returns the string passed into the function.
.group() returns the part of the string where there was a match.

import re

txt = "The rain in Spain"

#Search for an upper case "S" character in the beginning of a word, and print its position:

x = re.search(r"\bS\w+", txt)
print(x.span()) # (12, 17)

#The string property returns the search string:

x = re.search(r"\bS\w+", txt)
print(x.string) # The rain in Spain

#Search for an upper case "S" character in the beginning of a word, and print the word:

x = re.search(r"\bS\w+", txt)
print(x.group()) # Spain
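
When a pattern contains parentheses (the () metacharacter listed earlier), group() and span() also accept a group number. The following sketch assumes a pattern with two capture groups; the pattern and variable names are chosen only for illustration.

```python
import re

txt = "The rain in Spain"

# Two capture groups: the word before " in " and the word after it.
m = re.search(r"(\w+) in (\w+)", txt)

whole = m.group()      # the entire match, same as m.group(0)
before = m.group(1)    # text captured by the first pair of parentheses
after = m.group(2)     # text captured by the second pair of parentheses
both = m.groups()      # all captured groups as a tuple
after_span = m.span(2) # start and end positions of the second group

print(whole)       # rain in Spain
print(before)      # rain
print(after)       # Spain
print(both)        # ('rain', 'Spain')
print(after_span)  # (12, 17)
```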