RegEx¶
A RegEx, or Regular Expression, is a sequence of characters that forms a search pattern.
RegEx can be used to check whether a string contains the specified search pattern.
RegEx Module¶
Python has a built-in package called re, which can be used to work with regular expressions.
Import the re module:
import re
RegEx in Python¶
Once you have imported the re module, you can start using regular expressions:
# Search the string to see if it starts with "The" and ends with "Spain":
import re
txt = "The rain in Spain"
x = re.search("^The.*Spain$", txt)
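The example above stores the result in x without inspecting it. A minimal sketch of one way to check the result, relying on the fact that re.search() returns a Match object (which is truthy) or None; the printed messages are only illustrative:

```python
import re

txt = "The rain in Spain"

# ^The anchors the start, .* covers everything in between, Spain$ anchors the end
x = re.search(r"^The.*Spain$", txt)

if x:
    print("YES! We have a match!")
else:
    print("No match")
```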
RegEx Functions¶
The re module offers a set of functions that let us search a string for a match:
| Function | Description |
|---|---|
| findall | Returns a list containing all matches |
| search | Returns a Match object if there is a match anywhere in the string |
| split | Returns a list where the string has been split at each match |
| sub | Replaces one or many matches with a string |
Metacharacters¶
Metacharacters are characters with a special meaning:
| Character | Description | Example |
|---|---|---|
| [] | A set of characters | "[a-m]" |
| \ | Signals a special sequence | "\d" |
| . | Any character (except newline character) | "he..o" |
| ^ | Starts with | "^hello" |
| $ | Ends with | "planet$" |
| * | Zero or more occurrences | "he.*o" |
| + | One or more occurrences | "he.+o" |
| ? | Zero or one occurrences | "he.?o" |
| {} | Exactly the specified number of occurrences | "he.{2}o" |
| \| | Either or | "falls\|stays" |
| () | Capture and group | |
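The example patterns from the table can be tried directly with re.findall(). The sketch below assumes a made-up test string, "hello planet", and a second sample sentence for the alternation case; neither comes from the table itself:

```python
import re

txt = "hello planet"

print(re.findall(r"[a-m]", txt))    # ['h', 'e', 'l', 'l', 'l', 'a', 'e']
print(re.findall(r"he..o", txt))    # ['hello']
print(re.findall(r"^hello", txt))   # ['hello']
print(re.findall(r"planet$", txt))  # ['planet']
print(re.findall(r"he.{2}o", txt))  # ['hello']
print(re.findall(r"he.?o", txt))    # [] (two characters sit between "he" and "o")

# Alternation: match either "falls" or "stays"
sentence = "The rain in Spain falls mainly in the plain"
print(re.findall(r"falls|stays", sentence))  # ['falls']
```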
Flags¶
You can add flags to the pattern when using regular expressions.
| Flag | Shorthand | Description |
|---|---|---|
| re.ASCII | re.A | Makes \w, \W, \b, \B, \d, \D, \s, and \S match ASCII characters only |
| re.DEBUG | | Displays debug information about the compiled pattern |
| re.DOTALL | re.S | Makes the . character match all characters (including the newline character) |
| re.IGNORECASE | re.I | Case-insensitive matching |
| re.MULTILINE | re.M | Makes ^ and $ match at the beginning and end of each line, not only at the beginning and end of the string |
| re.NOFLAG | | Specifies that no flag is set for this pattern |
| re.UNICODE | re.U | Returns Unicode matches. This is the default in Python 3. In Python 2, use this flag to return only Unicode matches |
| re.VERBOSE | re.X | Allows whitespace and comments inside patterns, which makes the pattern more readable |
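As a rough illustration of how flags change matching, the sketch below passes three of them through the flags keyword argument. The two-line sample string is an assumption made for this example:

```python
import re

txt = "The rain\nin Spain"

# re.IGNORECASE: the lower-case pattern still finds "Spain"
ci = re.findall(r"spain", txt, flags=re.IGNORECASE)
print(ci)  # ['Spain']

# Without re.MULTILINE, ^ matches only at the very start of the string
print(re.findall(r"^\w+", txt))  # ['The']

# With re.MULTILINE, ^ also matches right after each newline
each_line = re.findall(r"^\w+", txt, flags=re.MULTILINE)
print(each_line)  # ['The', 'in']

# Without re.DOTALL, the . cannot match the newline between "rain" and "in"
print(re.search(r"rain.in", txt))  # None

# With re.DOTALL, the . matches the newline as well
dotall = re.search(r"rain.in", txt, flags=re.DOTALL)
print(dotall is not None)  # True
```

Several flags can be combined with the | operator, for example flags=re.IGNORECASE | re.MULTILINE.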
Special Sequences¶
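A special sequence is a \ followed by a specific character, and has a special meaning. For example, \d matches a digit, \s matches a white-space character, \w matches a word character, and \b matches a word boundary.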
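A short sketch of four commonly used special sequences (\d, \w, \s, and \b) in action. The sample string, including the trailing number, is made up for this illustration:

```python
import re

txt = "The rain in Spain 2024"

# \d matches a single digit
digits = re.findall(r"\d", txt)
print(digits)  # ['2', '0', '2', '4']

# \w+ matches runs of word characters (letters, digits, underscore)
words = re.findall(r"\w+", txt)
print(words)  # ['The', 'rain', 'in', 'Spain', '2024']

# \s matches a single white-space character
spaces = re.findall(r"\s", txt)
print(len(spaces))  # 4

# \b matches a word boundary, so this finds "ain" only at the end of a word
endings = re.findall(r"ain\b", txt)
print(endings)  # ['ain', 'ain']
```

The patterns are written as raw strings (r"...") so that Python passes the backslash through to the re module unchanged.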
Sets¶
A set is a group of characters inside a pair of square brackets [] with a special meaning:
| Set | Description |
|---|---|
| [arn] | Returns a match where one of the specified characters (a, r, or n) is present |
| [a-n] | Returns a match for any lower case character, alphabetically between a and n |
| [^arn] | Returns a match for any character EXCEPT a, r, and n |
| [0123] | Returns a match where any of the specified digits (0, 1, 2, or 3) is present |
| [0-9] | Returns a match for any digit between 0 and 9 |
| [0-5][0-9] | Returns a match for any two-digit number from 00 to 59 |
| [a-zA-Z] | Returns a match for any character alphabetically between a and z, lower case OR upper case |
| [+] | In sets, +, *, ., \|, (), $, {} have no special meaning, so [+] means: return a match for any + character in the string |
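A few of the sets from the table, tried with re.findall(). The sample strings (a sentence with a time appended, and "1+1=2") are assumptions chosen only to give each set something to match:

```python
import re

txt = "The rain in Spain at 08:59"

# Any single a, r, or n
print(re.findall(r"[arn]", txt))       # ['r', 'a', 'n', 'n', 'a', 'n', 'a']

# Two-digit numbers from 00 to 59
print(re.findall(r"[0-5][0-9]", txt))  # ['08', '59']

# Any upper-case letter
print(re.findall(r"[A-Z]", txt))       # ['T', 'S']

# Inside a set, + is a literal plus sign
print(re.findall(r"[+]", "1+1=2"))     # ['+']
```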
The findall() Function¶
The findall() function returns a list containing all matches.
#Print a list of all matches:
import re
txt = "The rain in Spain"
x = re.findall("ai", txt)
print(x) # ['ai', 'ai']
The list contains the matches in the order they are found.
If no matches are found, an empty list is returned:
# Return an empty list if no match was found:
import re
txt = "The rain in Spain"
x = re.findall("Portugal", txt)
print(x) # []
The search() Function¶
The search() function searches the string for a match, and returns a Match object if there is a match.
If there is more than one match, only the first occurrence is returned:
# Search for the first white-space character in the string:
import re
txt = "The rain in Spain"
x = re.search(r"\s", txt)  # raw string, so "\s" is not treated as an invalid string escape
print("The first white-space character is located in position:", x.start())
If no match is found, the value None is returned:
# Make a search that returns no match:
import re
txt = "The rain in Spain"
x = re.search("Portugal", txt)
print(x) # None
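Because search() can return None, calling a method such as start() on the result without checking it first raises an AttributeError. One possible way to guard against that is sketched below; find_position is a hypothetical helper name, and returning -1 for "not found" is an arbitrary convention chosen for this example:

```python
import re


def find_position(pattern, text):
    """Hypothetical helper: start index of the first match, or -1 if there is none."""
    match = re.search(pattern, text)
    if match is None:
        # match.start() would raise AttributeError here, so return early
        return -1
    return match.start()


txt = "The rain in Spain"
print(find_position(r"\s", txt))       # 3
print(find_position("Portugal", txt))  # -1
```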
The split() Function¶
The split() function returns a list where the string has been split at each match:
# Split at each white-space character:
import re
txt = "The rain in Spain"
x = re.split(r"\s", txt)
print(x) # ['The', 'rain', 'in', 'Spain']
You can control the number of splits by specifying the maxsplit parameter:
import re
#Split the string at the first white-space character:
txt = "The rain in Spain"
x = re.split(r"\s", txt, maxsplit=1)
print(x) # ['The', 'rain in Spain']
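split() is not limited to a single delimiter character. As a sketch, the pattern below uses a set to split on either a comma or a semicolon, followed by optional white space; the sample string is made up for this example:

```python
import re

txt = "rain, Spain;  plain,mainly"

# Split on a comma or semicolon, swallowing any white space that follows it
parts = re.split(r"[,;]\s*", txt)
print(parts)  # ['rain', 'Spain', 'plain', 'mainly']

# maxsplit=1 stops after the first delimiter and leaves the rest untouched
first_only = re.split(r"[,;]\s*", txt, maxsplit=1)
print(first_only)  # ['rain', 'Spain;  plain,mainly']
```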
The sub() Function¶
The sub() function replaces the matches with the text of your choice:
import re
#Replace all white-space characters with the digit "9":
txt = "The rain in Spain"
x = re.sub(r"\s", "9", txt)
print(x) # The9rain9in9Spain
You can control the number of replacements by specifying the count parameter:
import re
#Replace the first two occurrences of a white-space character with the digit 9:
txt = "The rain in Spain"
x = re.sub(r"\s", "9", txt, count=2)
print(x) # The9rain9in Spain
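A common practical use of sub() is normalising white space. The sketch below assumes a sample string with irregular spacing and a tab, and collapses each run of white space into a single space:

```python
import re

txt = "The   rain  in\tSpain"

# \s+ matches one or more white-space characters as a single run
collapsed = re.sub(r"\s+", " ", txt)
print(collapsed)  # The rain in Spain

# count=1 replaces only the first run and leaves the others as they were
limited = re.sub(r"\s+", " ", txt, count=1)
print(repr(limited))  # 'The rain  in\tSpain'
```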
Match Object¶
A Match object is an object containing information about the search and the result.
Note
If there is no match, the value None is returned instead of a Match object.
import re
#The search() function returns a Match object:
txt = "The rain in Spain"
x = re.search("ai", txt)
print(x) # <re.Match object; span=(5, 7), match='ai'>
The Match object has properties and methods used to retrieve information about the search and the result:
.span() returns a tuple containing the start and end positions of the match.
.string returns the string passed into the function.
.group() returns the part of the string where there was a match.
import re
txt = "The rain in Spain"
#Search for an upper case "S" character in the beginning of a word, and print its position:
x = re.search(r"\bS\w+", txt)
print(x.span()) # (12, 17)
#The string property returns the search string:
x = re.search(r"\bS\w+", txt)
print(x.string) # The rain in Spain
#Search for an upper case "S" character in the beginning of a word, and print the word:
x = re.search(r"\bS\w+", txt)
print(x.group()) # Spain
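When the pattern contains capture groups (), the same Match object also exposes each group. A minimal sketch, assuming a pattern invented for this example that captures the words on either side of " in ":

```python
import re

txt = "The rain in Spain"

# Two capture groups: the word before " in " and the word after it
m = re.search(r"(\w+) in (\w+)", txt)

if m:  # guard against None before using the Match object
    print(m.group())   # rain in Spain  (the whole match)
    print(m.group(1))  # rain           (first group)
    print(m.group(2))  # Spain          (second group)
    print(m.groups())  # ('rain', 'Spain')
    print(m.span())    # (4, 17)
```

group() with no argument (or with 0) always refers to the whole match; numbered groups start at 1.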