Guide: regex match word with special characters in Python

Question

For the example string you gave, the following regular expression works OK:

>>> a = '%he#llo, my website is: http://www.url.com/abcdef123'
>>> re.findall('(http://\S+|\S*[^\w\s]\S*)',a)
['%he#llo,', 'is:', 'http://www.url.com/abcdef123']

... or you can remove those words with re.sub:

>>> re.sub('(http://\S+|\S*[^\w\s]\S*)','',a)
' my website  '

`|` denotes alternation and matches the expression on either side of it within the group. The left side matches `http://` followed by one or more non-whitespace characters. The right side matches zero or more non-whitespace characters, followed by one character that is neither a word character nor whitespace, followed by zero or more non-whitespace characters. This ensures that the match contains at least one non-word character and no whitespace.
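To make the structure of that pattern easier to see, here is a minimal sketch that rewrites it with `re.VERBOSE`, so each alternative carries its own comment. The names `pattern`, `sample`, and `matches` are illustrative and not part of the original answer. The sketch also uses a raw string, which avoids the escape-sequence warnings that newer Python versions emit for `'\S'` in a normal string.

```python
import re

# Same pattern as above, spread out with re.VERBOSE so each part can be annotated.
# In verbose mode, whitespace and '#' comments outside character classes are ignored.
pattern = re.compile(r"""
    (
        http://\S+        # left alternative: the http prefix plus one or more non-whitespace chars
      |
        \S*[^\w\s]\S*     # right alternative: a whitespace-free run that contains at least
                          # one character that is neither a word character nor whitespace
    )
""", re.VERBOSE)

sample = '%he#llo, my website is: http://www.url.com/abcdef123'

matches = pattern.findall(sample)
print(matches)  # ['%he#llo,', 'is:', 'http://www.url.com/abcdef123']
```

Because the pattern has exactly one capturing group, `findall` returns the text captured by that group for each match.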

Update: Of course, as the other answers implicitly suggest, since the `http://` prefix contains a non-word character (for example `/`), you don't need the alternation: you can simplify the regular expression to `\S*[^\w\s]\S*`. However, the example above with alternation may still be useful.
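As a quick check of that simplification, the sketch below compares both patterns on the example string. It assumes the simplified pattern is the right-hand alternative on its own, `\S*[^\w\s]\S*`; the variable names are illustrative.

```python
import re

a = '%he#llo, my website is: http://www.url.com/abcdef123'

with_alternation = r'(http://\S+|\S*[^\w\s]\S*)'
simplified = r'\S*[^\w\s]\S*'  # assumed simplification: right-hand alternative only

found_with_alternation = re.findall(with_alternation, a)
found_simplified = re.findall(simplified, a)

print(found_simplified)
# ['%he#llo,', 'is:', 'http://www.url.com/abcdef123']
print(found_with_alternation == found_simplified)  # True
```

The URL is still matched as a whole because it contains non-word characters such as `:` and `/` and no whitespace.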

Regular expressions are a strange animal. Many students find them hard to understand. Do you?

I realized that a major reason for this is simply that they don't understand the special regex characters. To put it differently: understand the special characters, and everything else in the regex space will become much easier for you.

Related article: Python Regex Superpower – The Ultimate Guide

Do you want to master the regex superpower? Check out my new book The Smartest Way to Learn Regular Expressions in Python with the innovative 3-step approach for active learning: (1) study a book chapter, (2) solve a code puzzle, and (3) watch an educational chapter video.

Regular expressions are built from characters. There are two types of characters: literal characters and special characters.
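The difference between the two kinds of characters can be sketched with the dot, which is special when unescaped and literal when escaped. The sample text `'a.b'` and the variable names are assumptions chosen for illustration.

```python
import re

text = 'a.b'

# Unescaped, '.' is a special character that matches any character except a newline.
any_char = re.findall(r'.', text)

# Escaped, '\.' is a literal dot and matches only the '.' in the text.
literal_dot = re.findall(r'\.', text)

print(any_char)     # ['a', '.', 'b']
print(literal_dot)  # ['.']
```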

Literal characters

Let's start with the absolute first thing you need to know about regular expressions: a regular expression (short: regex) searches for a given pattern in a given string.

What's a pattern? In its most basic form, a pattern can be a literal character. So the literal characters 'a', 'b', and 'c' are all valid regex patterns.

For example, you can search for the regex pattern 'a' in the string 'hello world', but it won't find a match. You can also search for the pattern 'a' in the string 'hello woman', and there is a match: the second-to-last character in the string.

Based on the simple insight that a literal character is a valid regex pattern, you'll find that a combination of literal characters is also a valid regex pattern. For example, the regex pattern 'an' matches the last two characters in the string 'hello woman'.
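The literal-character behavior described above can be tried out with `re.search`. This is a small sketch; the strings `'hello world'` and `'hello woman'` and the variable names are illustrative assumptions.

```python
import re

# A literal pattern that does not occur in the string yields no match.
no_match = re.search('a', 'hello world')
print(no_match)  # None

# The same pattern matches the second-to-last character of 'hello woman'.
match = re.search('a', 'hello woman')
print(match.start())  # 9

# A combination of literal characters is also a valid pattern.
pair = re.search('an', 'hello woman')
print(pair.span())  # (9, 11)
```

`re.search` returns `None` when there is no match, so check the result before calling methods such as `start()` on it.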

Summary: Regular expressions are built from characters. An important class of characters are the literal characters. In principle, you can use all Unicode literal characters in your regex pattern.
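As a brief illustration of Unicode literals in patterns, the sketch below searches for a few non-ASCII characters. The sample text and names are made up for this example, and it assumes the characters are stored in the same (precomposed) form in both the pattern and the text.

```python
import re

text = 'Café Zürich costs 5 €'

# Non-ASCII characters can be used as literal patterns just like 'a' or 'b'.
accent = re.findall('é', text)
umlaut = re.findall('ü', text)
euro = re.findall('€', text)

print(accent, umlaut, euro)  # ['é'] ['ü'] ['€']
```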

However, the power of regular expressions comes from their ability to abstract. Instead of writing the character set `[abcdefghijklmnopqrstuvwxyz]`, you'd write `[a-z]` or even `\w`. The latter is a special regex character, and pros know them by heart. In fact, regex experts seldom match literal characters. In most cases, they use more advanced constructs or special characters for reasons such as brevity, expressiveness, or generality.
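To see how a spelled-out character set, a range, and a special character relate, here is a minimal sketch. The sample text and variable names are illustrative. Note that `\w` is broader than `[a-z]`: it also matches uppercase letters, digits, and the underscore (and, for `str` patterns, other Unicode word characters).

```python
import re

text = 'Regex_101 rocks'

# The spelled-out set and the range describe exactly the same characters.
explicit = re.findall('[abcdefghijklmnopqrstuvwxyz]', text)
ranged = re.findall('[a-z]', text)

# \w additionally matches uppercase letters, digits, and '_'.
word_chars = re.findall(r'\w', text)

print(explicit == ranged)   # True
print(''.join(ranged))      # egexrocks
print(''.join(word_chars))  # Regex_101rocks
```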

So what are the special characters you can use in your regex patterns?

Have a look at the following table, which contains all special characters in Python's `re` package for regular expression processing.

Special character	Meaning
`\n`	The newline symbol is not a special symbol particular to regex only; it's actually one of the most widely used standard characters. However, you'll see the newline character so often that I just couldn't write this list without including it. For example, the regex `'hello\nworld'` matches a string where the string `'hello'` is placed in one line and the string `'world'` is placed in the second line.
`\t`	The tabular character is, like the newline character, not a "regex-specific" symbol. It just encodes a tab, which is different from a sequence of spaces (even if it doesn't look different here). For example, the regex `'hello\n\tworld'` matches the string that consists of `'hello'` in the first line and `'\tworld'` in the second line (with a leading tab character).
`\s`	The whitespace character is, in contrast to the newline character, a special symbol of the regex libraries. You'll find it in many other programming languages, too. The problem is that you often don't know which type of whitespace is used: tabular characters, simple spaces, or even newlines. The whitespace character `\s` simply matches any of them. For example, the regex `'\s*hello\s+world'` matches the string `' \t \n hello \n \n \t world'`, as well as `'hello world'`.
`\S`	The whitespace-negation character matches everything that does not match `\s`.
`\w`	The word character simplifies text processing significantly. It represents the class of all characters used in typical words (`A-Z`, `a-z`, `0-9`, and `'_'`). This simplifies the writing of complex regular expressions significantly. For example, the regex `'\w+'` matches the strings `'hello'`, `'bye'`, `'Python'`, and `'Python_is_great'`.
`\W`	The word-character negation. It matches any character that is not a word character.
import re text = '\|^&+-%/=!>' # WITHIN CHARACTER CLASS --> ESCAPE '-' print(re.findall('[\|^&+\-%/=!>]', text)) # ['\|', '^', '&', '+', '-', '%', '', '/', '=', '!', '>'] # WITHOUT CHARACTER CLASS --> ESCAPE ALL SPECIAL CHARS '.?+^$\|' pattern = '\|^&+$-%/=!>' print(re.findall('\\|', text)) print(re.findall('\^', text)) print(re.findall('\$', text)) print(re.findall('\+', text)) print(re.findall('-', text)) print(re.findall('%', text)) print(re.findall('\', text)) print(re.findall('/', text)) print(re.findall('=', text)) print(re.findall('!', text)) ''' ['\|'] ['^'] ['$'] ['+'] ['-'] ['%'] ['*'] ['/'] ['='] ['!'] ''' 2	Ranh giới từ cũng là một biểu tượng đặc biệt được sử dụng trong nhiều công cụ regex. Bạn có thể sử dụng nó để phù hợp, & nbsp; Như tên cho thấy, ranh giới giữa ký tự A ( import re text = ''' Ha! let me see her: out, alas! he's cold: Her blood is settled, and her joints are stiff; Life and these lips have long been separated: Death lies on her like an untimely frost Upon the sweetest flower of all the field. ''' print(re.findall('.a!', text)) ''' Finds all occurrences of an arbitrary character that is followed by the character sequence 'a!'. ['Ha!'] ''' print(re.findall('is.and', text)) ''' Finds all occurrences of the word 'is', followed by an arbitrary number of characters and the word 'and'. ['is settled, and'] ''' print(re.findall('her:?', text)) ''' Finds all occurrences of the word 'her', followed by zero or one occurrences of the colon ':'. ['her:', 'her', 'her'] ''' print(re.findall('her:+', text)) ''' Finds all occurrences of the word 'her', followed by one or more occurrences of the colon ':'. ['her:'] ''' print(re.findall('^Ha.', text)) ''' Finds all occurrences where the string starts with the character sequence 'Ha', followed by an arbitrary number of characters except for the new-line character. Can you figure out why Python doesn't find any? 
[] ''' print(re.findall('\n$', text)) ''' Finds all occurrences where the new-line character '\n' occurs at the end of the string. ['\n'] ''' print(re.findall('(Life\|Death)', text)) ''' Finds all occurrences of either the word 'Life' or the word 'Death'. ['Life', 'Death'] ''' 3) và ký tự không từ ( import re text = '\|^&+-%/=!>' # WITHIN CHARACTER CLASS --> ESCAPE '-' print(re.findall('[\|^&+\-%/=!>]', text)) # ['\|', '^', '&', '+', '-', '%', '', '/', '=', '!', '>'] # WITHOUT CHARACTER CLASS --> ESCAPE ALL SPECIAL CHARS '.?+^$\|' pattern = '\|^&+$-%/=!>' print(re.findall('\\|', text)) print(re.findall('\^', text)) print(re.findall('\$', text)) print(re.findall('\+', text)) print(re.findall('-', text)) print(re.findall('%', text)) print(re.findall('\', text)) print(re.findall('/', text)) print(re.findall('=', text)) print(re.findall('!', text)) ''' ['\|'] ['^'] ['$'] ['+'] ['-'] ['%'] [''] ['/'] ['='] ['!'] ''' 1). Nhưng lưu ý rằng nó chỉ phù hợp với chuỗi trống! Bạn có thể hỏi: Tại sao nó tồn tại nếu nó không phù hợp với bất kỳ nhân vật nào? Lý do là nó không phải là người tiêu thụ nhân vật ngay phía trước hoặc ngay sau một lời nói. Bằng cách này, bạn có thể tìm kiếm toàn bộ từ (hoặc các phần của từ) và chỉ trả về từ nhưng không phải là các ký tự phân định tách biệt từ, ví dụ: & nbsp; từ những từ khác.word boundary is also a special symbol used in many regex tools. You can use it to match, as the name suggests, the boundary between the a word character ( import re text = ''' Ha! let me see her: out, alas! he's cold: Her blood is settled, and her joints are stiff; Life and these lips have long been separated: Death lies on her like an untimely frost Upon the sweetest flower of all the field. ''' print(re.findall('.a!', text)) ''' Finds all occurrences of an arbitrary character that is followed by the character sequence 'a!'. 
['Ha!'] ''' print(re.findall('is.and', text)) ''' Finds all occurrences of the word 'is', followed by an arbitrary number of characters and the word 'and'. ['is settled, and'] ''' print(re.findall('her:?', text)) ''' Finds all occurrences of the word 'her', followed by zero or one occurrences of the colon ':'. ['her:', 'her', 'her'] ''' print(re.findall('her:+', text)) ''' Finds all occurrences of the word 'her', followed by one or more occurrences of the colon ':'. ['her:'] ''' print(re.findall('^Ha.', text)) ''' Finds all occurrences where the string starts with the character sequence 'Ha', followed by an arbitrary number of characters except for the new-line character. Can you figure out why Python doesn't find any? [] ''' print(re.findall('\n$', text)) ''' Finds all occurrences where the new-line character '\n' occurs at the end of the string. ['\n'] ''' print(re.findall('(Life\|Death)', text)) ''' Finds all occurrences of either the word 'Life' or the word 'Death'. ['Life', 'Death'] ''' 3) and a non-word ( import re text = '\|^&+-%/=!>' # WITHIN CHARACTER CLASS --> ESCAPE '-' print(re.findall('[\|^&+\-%/=!>]', text)) # ['\|', '^', '&', '+', '-', '%', '', '/', '=', '!', '>'] # WITHOUT CHARACTER CLASS --> ESCAPE ALL SPECIAL CHARS '.?+^$\|' pattern = '\|^&+$-%/=!>' print(re.findall('\\|', text)) print(re.findall('\^', text)) print(re.findall('\$', text)) print(re.findall('\+', text)) print(re.findall('-', text)) print(re.findall('%', text)) print(re.findall('\', text)) print(re.findall('/', text)) print(re.findall('=', text)) print(re.findall('!', text)) ''' ['\|'] ['^'] ['$'] ['+'] ['-'] ['%'] [''] ['/'] ['='] ['!'] ''' 1) character. But note that it matches only the empty string! You may ask: why does it exist if it doesn’t match any character? The reason is that it doesn’t “consume” the character right in front or right after a word. 
This way, you can search for whole words (or parts of words) and return only the word but not the delimiting characters that separate the word, e.g., from other words.
import re text = '\|^&+-%/=!>' # WITHIN CHARACTER CLASS --> ESCAPE '-' print(re.findall('[\|^&+\-%/=!>]', text)) # ['\|', '^', '&', '+', '-', '%', '', '/', '=', '!', '>'] # WITHOUT CHARACTER CLASS --> ESCAPE ALL SPECIAL CHARS '.?+^$\|' pattern = '\|^&+$-%/=!>' print(re.findall('\\|', text)) print(re.findall('\^', text)) print(re.findall('\$', text)) print(re.findall('\+', text)) print(re.findall('-', text)) print(re.findall('%', text)) print(re.findall('\', text)) print(re.findall('/', text)) print(re.findall('=', text)) print(re.findall('!', text)) ''' ['\|'] ['^'] ['$'] ['+'] ['-'] ['%'] ['*'] ['/'] ['='] ['!'] ''' 5	Ký tự chữ số phù hợp với tất cả các ký hiệu số trong khoảng từ 0 đến 9. Bạn có thể sử dụng nó để khớp các số nguyên với số chữ số tùy ý: regex import re text = '\|^&+-%/=!>' # WITHIN CHARACTER CLASS --> ESCAPE '-' print(re.findall('[\|^&+\-%/=!>]', text)) # ['\|', '^', '&', '+', '-', '%', '', '/', '=', '!', '>'] # WITHOUT CHARACTER CLASS --> ESCAPE ALL SPECIAL CHARS '.?+^$\|' pattern = '\|^&+$-%/=!>' print(re.findall('\\|', text)) print(re.findall('\^', text)) print(re.findall('\$', text)) print(re.findall('\+', text)) print(re.findall('-', text)) print(re.findall('%', text)) print(re.findall('\', text)) print(re.findall('/', text)) print(re.findall('=', text)) print(re.findall('!', text)) ''' ['\|'] ['^'] ['$'] ['+'] ['-'] ['%'] [''] ['/'] ['='] ['!'] ''' 6 khớp với số nguyên import re text = '\|^&+-%/=!>' # WITHIN CHARACTER CLASS --> ESCAPE '-' print(re.findall('[\|^&+\-%/=!>]', text)) # ['\|', '^', '&', '+', '-', '%', '', '/', '=', '!', '>'] # WITHOUT CHARACTER CLASS --> ESCAPE ALL SPECIAL CHARS '.?+^$\|' pattern = '\|^&+$-%/=!>' print(re.findall('\\|', text)) print(re.findall('\^', text)) print(re.findall('\$', text)) print(re.findall('\+', text)) print(re.findall('-', text)) print(re.findall('%', text)) print(re.findall('\', text)) print(re.findall('/', text)) print(re.findall('=', text)) print(re.findall('!', text)) ''' ['\|'] ['^'] ['$'] ['+'] 
['-'] ['%'] [''] ['/'] ['='] ['!'] ''' 7, import re text = '\|^&+-%/=!>' # WITHIN CHARACTER CLASS --> ESCAPE '-' print(re.findall('[\|^&+\-%/=!>]', text)) # ['\|', '^', '&', '+', '-', '%', '', '/', '=', '!', '>'] # WITHOUT CHARACTER CLASS --> ESCAPE ALL SPECIAL CHARS '.?+^$\|' pattern = '\|^&+$-%/=!>' print(re.findall('\\|', text)) print(re.findall('\^', text)) print(re.findall('\$', text)) print(re.findall('\+', text)) print(re.findall('-', text)) print(re.findall('%', text)) print(re.findall('\', text)) print(re.findall('/', text)) print(re.findall('=', text)) print(re.findall('!', text)) ''' ['\|'] ['^'] ['$'] ['+'] ['-'] ['%'] [''] ['/'] ['='] ['!'] ''' 8, import re text = '\|^&+-%/=!>' # WITHIN CHARACTER CLASS --> ESCAPE '-' print(re.findall('[\|^&+\-%/=!>]', text)) # ['\|', '^', '&', '+', '-', '%', '', '/', '=', '!', '>'] # WITHOUT CHARACTER CLASS --> ESCAPE ALL SPECIAL CHARS '.?+^$\|' pattern = '\|^&+$-%/=!>' print(re.findall('\\|', text)) print(re.findall('\^', text)) print(re.findall('\$', text)) print(re.findall('\+', text)) print(re.findall('-', text)) print(re.findall('%', text)) print(re.findall('\', text)) print(re.findall('/', text)) print(re.findall('=', text)) print(re.findall('!', text)) ''' ['\|'] ['^'] ['$'] ['+'] ['-'] ['%'] [''] ['/'] ['='] ['!'] ''' 9 và `re.sub`0.digit character matches all numeric symbols between 0 and 9. 
You can use it to match integers with an arbitrary number of digits: the regex import re text = '\|^&+-%/=!>' # WITHIN CHARACTER CLASS --> ESCAPE '-' print(re.findall('[\|^&+\-%/=!>]', text)) # ['\|', '^', '&', '+', '-', '%', '', '/', '=', '!', '>'] # WITHOUT CHARACTER CLASS --> ESCAPE ALL SPECIAL CHARS '.?+^$\|' pattern = '\|^&+$-%/=!>' print(re.findall('\\|', text)) print(re.findall('\^', text)) print(re.findall('\$', text)) print(re.findall('\+', text)) print(re.findall('-', text)) print(re.findall('%', text)) print(re.findall('\', text)) print(re.findall('/', text)) print(re.findall('=', text)) print(re.findall('!', text)) ''' ['\|'] ['^'] ['$'] ['+'] ['-'] ['%'] [''] ['/'] ['='] ['!'] ''' 6 matches integer numbers import re text = '\|^&+-%/=!>' # WITHIN CHARACTER CLASS --> ESCAPE '-' print(re.findall('[\|^&+\-%/=!>]', text)) # ['\|', '^', '&', '+', '-', '%', '', '/', '=', '!', '>'] # WITHOUT CHARACTER CLASS --> ESCAPE ALL SPECIAL CHARS '.?+^$\|' pattern = '\|^&+$-%/=!>' print(re.findall('\\|', text)) print(re.findall('\^', text)) print(re.findall('\$', text)) print(re.findall('\+', text)) print(re.findall('-', text)) print(re.findall('%', text)) print(re.findall('\', text)) print(re.findall('/', text)) print(re.findall('=', text)) print(re.findall('!', text)) ''' ['\|'] ['^'] ['$'] ['+'] ['-'] ['%'] [''] ['/'] ['='] ['!'] ''' 7, import re text = '\|^&+-%/=!>' # WITHIN CHARACTER CLASS --> ESCAPE '-' print(re.findall('[\|^&+\-%/=!>]', text)) # ['\|', '^', '&', '+', '-', '%', '', '/', '=', '!', '>'] # WITHOUT CHARACTER CLASS --> ESCAPE ALL SPECIAL CHARS '.?+^$\|' pattern = '\|^&+$-%/=!>' print(re.findall('\\|', text)) print(re.findall('\^', text)) print(re.findall('\$', text)) print(re.findall('\+', text)) print(re.findall('-', text)) print(re.findall('%', text)) print(re.findall('\', text)) print(re.findall('/', text)) print(re.findall('=', text)) print(re.findall('!', text)) ''' ['\|'] ['^'] ['$'] ['+'] ['-'] ['%'] [''] ['/'] ['='] ['!'] ''' 8, import re text = 
'\|^&+-%/=!>' # WITHIN CHARACTER CLASS --> ESCAPE '-' print(re.findall('[\|^&+\-%/=!>]', text)) # ['\|', '^', '&', '+', '-', '%', '', '/', '=', '!', '>'] # WITHOUT CHARACTER CLASS --> ESCAPE ALL SPECIAL CHARS '.?+^$\|' pattern = '\|^&+$-%/=!>' print(re.findall('\\|', text)) print(re.findall('\^', text)) print(re.findall('\$', text)) print(re.findall('\+', text)) print(re.findall('-', text)) print(re.findall('%', text)) print(re.findall('\', text)) print(re.findall('/', text)) print(re.findall('=', text)) print(re.findall('!', text)) ''' ['\|'] ['^'] ['$'] ['+'] ['-'] ['%'] [''] ['/'] ['='] ['!'] ''' 9, and `re.sub`0.
`re.sub`1	Khớp với bất kỳ ký tự không chữ số. Đây là nghịch đảo của import re text = '\|^&+-%/=!>' # WITHIN CHARACTER CLASS --> ESCAPE '-' print(re.findall('[\|^&+\-%/=!>]', text)) # ['\|', '^', '&', '+', '-', '%', '', '/', '=', '!', '>'] # WITHOUT CHARACTER CLASS --> ESCAPE ALL SPECIAL CHARS '.?+^$\|' pattern = '\|^&+$-%/=!>' print(re.findall('\\|', text)) print(re.findall('\^', text)) print(re.findall('\$', text)) print(re.findall('\+', text)) print(re.findall('-', text)) print(re.findall('%', text)) print(re.findall('\', text)) print(re.findall('/', text)) print(re.findall('=', text)) print(re.findall('!', text)) ''' ['\|'] ['^'] ['$'] ['+'] ['-'] ['%'] [''] ['/'] ['='] ['!'] ''' 5 và nó tương đương với `re.sub`3.non-digit character. This is the inverse of import re text = '\|^&+-%/=!>' # WITHIN CHARACTER CLASS --> ESCAPE '-' print(re.findall('[\|^&+\-%/=!>]', text)) # ['\|', '^', '&', '+', '-', '%', '', '/', '=', '!', '>'] # WITHOUT CHARACTER CLASS --> ESCAPE ALL SPECIAL CHARS '.?+^$\|' pattern = '\|^&+$-%/=!>' print(re.findall('\\|', text)) print(re.findall('\^', text)) print(re.findall('\$', text)) print(re.findall('\+', text)) print(re.findall('-', text)) print(re.findall('%', text)) print(re.findall('\', text)) print(re.findall('/', text)) print(re.findall('=', text)) print(re.findall('!', text)) ''' ['\|'] ['^'] ['$'] ['+'] ['-'] ['%'] [''] ['/'] ['='] ['!'] ''' 5 and it’s equivalent to `re.sub`3.
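The special sequences in the table above can be tried out with a short sketch like the following. The sample sentence and the variable names are made up for illustration and are not part of the original article.

```python
import re

# Hypothetical sample sentence, chosen only for illustration.
sample = "Order 66 shipped 1000 units to dock_7!"

# \w+ : runs of word characters (letters, digits, underscore)
words = re.findall(r"\w+", sample)

# \d+ : runs of digits, wherever they occur
numbers = re.findall(r"\d+", sample)

# \b\d+\b : digits that form a whole word; the 7 in dock_7 is
# preceded by the word character '_', so there is no boundary there
standalone = re.findall(r"\b\d+\b", sample)

# \D : every non-digit character, removed here with re.sub
digits_only = re.sub(r"\D", "", sample)

print(words)        # ['Order', '66', 'shipped', '1000', 'units', 'to', 'dock_7']
print(numbers)      # ['66', '1000', '7']
print(standalone)   # ['66', '1000']
print(digits_only)  # 6610007
```

Note how `\b` changes which digits are returned without adding any characters to the matches themselves.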

But these are not all the characters you can use in a regular expression.

There are also meta characters for the regex engine that let you do much more powerful things.

A good example is the asterisk operator, which matches zero or more occurrences of the preceding regex. For example, the pattern `.*txt` matches an arbitrary number of arbitrary characters followed by the suffix `txt`. This pattern contains two special regex characters: the dot `.` and the asterisk operator `*`. You'll now learn about those meta characters:
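As a minimal sketch of the dot-plus-asterisk combination described above, the snippet below checks a few strings against a pattern of the form "anything, then a fixed suffix". The file names and the `txt` suffix are assumptions picked for illustration.

```python
import re

# Hypothetical file names, used only for illustration.
names = ["notes.txt", "txt", "report.pdf", "my txt file"]

# re.fullmatch succeeds only if the whole string matches the pattern.
# '.*' may match zero characters, so the bare string 'txt' also matches.
matches = [name for name in names if re.fullmatch(r".*txt", name)]

print(matches)  # ['notes.txt', 'txt']
```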

Regex Meta Characters

Next, you'll get a quick and dirty overview of the most important regex operations and how to use them in Python.

Here are the most important regex operators:

Meta Character	Meaning
`.`	The wild-card operator (dot) matches any character in a string except the newline character `\n`. For example, the regex `...` matches all words with three characters such as `abc`, `cat`, and `dog`.
`*`	The zero-or-more asterisk operator matches an arbitrary number of occurrences (including zero occurrences) of the immediately preceding regex. For example, the regex `cat*` matches the strings `ca`, `cat`, `catt`, `cattt`, and so on.
`?`	The zero-or-one operator matches (as the name suggests) either zero or one occurrence of the immediately preceding regex. For example, the regex `cat?` matches both strings `ca` and `cat`, but not `catt` or `cattt`.
`+`	The at-least-one operator matches one or more occurrences of the immediately preceding regex. For example, the regex `cat+` does not match the string `ca` but matches all strings with at least one trailing character `t`, such as `cat`, `catt`, and `cattt`.
`^`	The start-of-string operator matches the beginning of a string. For example, the regex `^p` matches in the strings `python` and `programming` but not in `lisp` and `spying`, where the character `p` does not occur at the start of the string.
`$`	The end-of-string operator matches the end of a string. For example, the regex `py$` matches in the strings `main.py` and `pypy` but not in the strings `python` and `pypi`.
`A|B`	The OR operator matches either the regex A or the regex B. Note that this differs from the usual logical "or", which is also satisfied when both conditions hold. For example, the regex `hello|hi` matches in the strings `hello world` and `hi python`. It wouldn't make sense to try to match both alternatives at the same time.
`AB`	The AND operator matches first the regex A and then the regex B, in this sequence. We've already seen it trivially in the regex `ca`, which matches first the regex `c` and then the regex `a`.
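The operators in the table above can be compared side by side with a small sketch. The helper `full_matches` and the candidate strings are hypothetical names introduced only for this illustration.

```python
import re

candidates = ["ca", "cat", "catt", "cattt"]

def full_matches(pattern):
    """Return the candidates that the pattern matches in their entirety."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

star = full_matches(r"cat*")      # zero or more 't'
optional = full_matches(r"cat?")  # zero or one 't'
plus = full_matches(r"cat+")      # one or more 't'
dot = full_matches(r"c.t")        # exactly one arbitrary character in the middle

print(star)      # ['ca', 'cat', 'catt', 'cattt']
print(optional)  # ['ca', 'cat']
print(plus)      # ['cat', 'catt', 'cattt']
print(dot)       # ['cat']

# ^ and $ anchor a search to the start and the end of the string.
starts = [bool(re.search(r"^p", s)) for s in ("python", "lisp")]
ends = [bool(re.search(r"py$", s)) for s in ("main.py", "python")]
print(starts)  # [True, False]
print(ends)    # [True, False]

# A|B matches whichever alternative fits at the current position.
either = re.findall(r"hello|hi", "hello world, hi python")
print(either)  # ['hello', 'hi']
```

Using `re.fullmatch` here keeps the comparison strict; with `re.search` or `re.findall`, a pattern such as `cat?` would still find the prefix `cat` inside `catt`.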

Note that I gave the operators above more meaningful names so that you can immediately grasp the purpose of each regex. For example, the `^` operator is usually called the "caret" operator. Names like that are not descriptive, so I came up with more kindergarten-like names such as the "start-of-string" operator.

Let's dive into some examples!

Examples

import re

text = '''
    Ha! let me see her: out, alas! he's cold:
    Her blood is settled, and her joints are stiff;
    Life and these lips have long been separated:
    Death lies on her like an untimely frost
    Upon the sweetest flower of all the field.
'''

print(re.findall('.a!', text))
'''
Finds all occurrences of an arbitrary character that is
followed by the character sequence 'a!'.
['Ha!']
'''

print(re.findall('is.*and', text))
'''
Finds all occurrences of the word 'is',
followed by an arbitrary number of characters
and the word 'and'.
['is settled, and']
'''

print(re.findall('her:?', text))
'''
Finds all occurrences of the word 'her',
followed by zero or one occurrences of the colon ':'.
['her:', 'her', 'her']
'''

print(re.findall('her:+', text))
'''
Finds all occurrences of the word 'her',
followed by one or more occurrences of the colon ':'.
['her:']
'''


print(re.findall('^Ha.*', text))
'''
Finds all occurrences where the string starts with
the character sequence 'Ha', followed by an arbitrary
number of characters except for the new-line character. 
Can you figure out why Python doesn't find any?
[]
'''

print(re.findall('\n$', text))
'''
Finds all occurrences where the new-line character '\n'
occurs at the end of the string.
['\n']
'''

print(re.findall('(Life|Death)', text))
'''
Finds all occurrences of either the word 'Life' or the
word 'Death'.
['Life', 'Death']
'''
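The `'^Ha.*'` search above returns an empty list because the text starts with a newline, and by default `^` matches only at the very start of the string. One possible way to match at the start of each line is the `re.MULTILINE` flag; the sketch below uses a shortened copy of the text (named `poem` here, a hypothetical name) and also allows for the leading spaces on each line.

```python
import re

poem = '''
    Ha! let me see her: out, alas! he's cold:
    Her blood is settled, and her joints are stiff;
'''

# Default behaviour: '^' matches only at position 0, where the string has '\n'.
plain = re.findall(r'^Ha.*', poem)

# With re.MULTILINE, '^' also matches right after each newline,
# but every line starts with four spaces, so 'Ha' still does not follow '^'.
multiline_only = re.findall(r'^Ha.*', poem, flags=re.MULTILINE)

# Allowing optional leading spaces and capturing the rest finds the line.
with_indent = re.findall(r'^ *(Ha.*)', poem, flags=re.MULTILINE)

print(plain)           # []
print(multiline_only)  # []
print(with_indent)     # ["Ha! let me see her: out, alas! he's cold:"]
```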

In these examples, you've already seen the special symbol `\n`, which denotes the newline character in Python (and most other languages). There are many more special characters, designed specifically for regular expressions.

Which Special Python Regex Characters Must Be Escaped?

Short answer: Here's a list of the special characters covered in this article that need to be escaped. Python's `re` module also gives a special meaning to `\`, `(`, `)`, `[`, `]`, `{`, and `}`, so escape those as well when you want to match them literally.

.      -->     \.
*      -->     \*
?      -->     \?
+      -->     \+
^      -->     \^
$      -->     \$
|      -->     \|
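To see why escaping matters, here is a small sketch comparing an unescaped dot with an escaped one. The sample strings are invented for illustration.

```python
import re

versions = ["3.14", "3x14", "3114"]

# Unescaped: '.' is the wild-card and accepts any character in that position.
unescaped = [v for v in versions if re.fullmatch(r"3.14", v)]

# Escaped: '\.' matches only a literal dot.
escaped = [v for v in versions if re.fullmatch(r"3\.14", v)]

print(unescaped)  # ['3.14', '3x14', '3114']
print(escaped)    # ['3.14']
```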

Question: Is there a comprehensive list of which special characters must be escaped in order to remove their special meaning within the regex?

Example: Say you search for the following symbols in a given string and you wonder which of them you must escape:

|^&+-%*/=!>

Answer: Differentiate between using the special symbols within or outside a character class.

Within a character class, you only need to escape the minus symbol, replacing `-` with `\-`, because it has a special meaning there (it denotes a range). A caret `^` is also special when it is the first character of the class, and `]` and `\` must be escaped as well.
Outside a character class, in a normal regex pattern, you only need to escape the regex characters with a special meaning. Among the symbols discussed in this article, these are `.*?+^$|`.

import re

text = '|^&+-%*/=!>'

# WITHIN CHARACTER CLASS --> ESCAPE '-'
print(re.findall(r'[|^&+\-%*/=!>]', text))
# ['|', '^', '&', '+', '-', '%', '*', '/', '=', '!', '>']

# WITHOUT CHARACTER CLASS --> ESCAPE ALL SPECIAL CHARS '.*?+^$|'
# This string adds '$' so that every search below finds a match.
# Raw strings (r'...') keep Python from interpreting the backslashes.
text = '|^&+$-%*/=!>'
print(re.findall(r'\|', text))
print(re.findall(r'\^', text))
print(re.findall(r'\$', text))
print(re.findall(r'\+', text))
print(re.findall('-', text))
print(re.findall('%', text))
print(re.findall(r'\*', text))
print(re.findall('/', text))
print(re.findall('=', text))
print(re.findall('!', text))
'''
['|']
['^']
['$']
['+']
['-']
['%']
['*']
['/']
['=']
['!']
'''

By escaping the special regex symbols, they lose their special meaning and you can find the symbols in the original text.
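Instead of escaping each symbol by hand, the standard library's `re.escape` function can build the escaped pattern for you. The following is a minimal sketch; the `haystack` string is a made-up example, and the exact set of characters that `re.escape` prefixes with a backslash depends on the Python version, so the escaped pattern is printed without an expected value.

```python
import re

symbols = '|^&+-%*/=!>'

# re.escape returns a pattern that matches the given string literally.
escaped = re.escape(symbols)
print(escaped)

# Hypothetical text that contains the symbols somewhere in the middle.
haystack = 'operators: |^&+-%*/=!> end'
match = re.search(escaped, haystack)

print(match.group())  # |^&+-%*/=!>
```

This is handy when the text to search for comes from user input and may contain any of the special characters.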

Where to Go From Here

You've learned the special characters of regular expressions as well as the meta characters. This gives you a strong basis for improving your regex skills.

If you want to accelerate your skills, you need a good foundation. Check out my brand-new Python book "Python One-Liners" (Amazon Link), which boosts your skills from zero to hero, in a single line of Python code!

Regex Humor

Wait, forgot to escape a space. Wheeeeee[taptaptap]eeeeee. (source)

While working as a researcher in distributed systems, Dr. Christian Mayer found his love for teaching computer science students.

To help students reach higher levels of Python success, he founded the programming education website Finxter.com. He is the author of the popular programming book Python One-Liners (NoStarch 2020), coauthor of the Coffee Break Python series of self-published books, a computer science enthusiast, a freelancer, and the owner of one of the top 10 largest Python blogs worldwide.

His passions are writing, reading, and coding. But his greatest passion is to serve aspiring coders through Finxter and help them boost their skills. You can join his free email academy.

What is '\S+' in Python?

`\S+` means "a string of non-whitespace characters", whereas `\s+` means "a string of whitespace characters".
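A short sketch makes the difference between the two sequences concrete. The sample line reuses the URL from the question at the top of this page; the extra spacing and the tab are added only for illustration.

```python
import re

line = "my  website\tis: http://www.url.com/abcdef123"

# \S+ : runs of non-whitespace characters (the "tokens")
tokens = re.findall(r"\S+", line)

# \s+ : runs of whitespace characters (the gaps between tokens)
gaps = re.findall(r"\s+", line)

print(tokens)  # ['my', 'website', 'is:', 'http://www.url.com/abcdef123']
print(gaps)    # ['  ', '\t', ' ']
```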

What does \1 mean in a regex?

`\1` references the first capturing group: it matches the exact same text that was matched by the first capturing group. In a pattern that matches an opening HTML tag and its closing tag, the `/` before `\1` is a literal character.
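The backreference described above can be sketched as follows. The simplified tag pattern and the sample strings are assumptions made for illustration; real HTML is better handled by a proper parser.

```python
import re

# (\w+) captures the tag name; \1 requires the same name in the closing tag.
tag_pattern = r"<(\w+)>(.*?)</\1>"
html = "<b>bold</b> and <i>broken</b>"

pairs = re.findall(tag_pattern, html)
print(pairs)  # [('b', 'bold')]

# The same idea finds an accidentally doubled word.
doubled = re.search(r"\b(\w+) \1\b", "this is is a test")
print(doubled.group())  # is is
```

The `<i>...</b>` part is skipped because `\1` would have to be `i`, and no closing `</i>` follows.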

What is \B in Python regex?

`\B` matches the empty string, but only when it is not at the beginning or end of a word. It is the opposite of `\b`. Inside a character range, `\b` represents the backspace character, for compatibility with Python's string literals.
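The following sketch contrasts the two boundary sequences by listing where each pattern starts to match. The sample text is made up for illustration.

```python
import re

text = "cat concat cats"

# \bcat : 'cat' only where a word begins
at_word_start = [m.start() for m in re.finditer(r"\bcat", text)]

# \Bcat : 'cat' only where no word boundary precedes it (inside 'concat')
inside_word = [m.start() for m in re.finditer(r"\Bcat", text)]

# [\b] : inside a character class, \b is the backspace character
backspaces = re.findall(r"[\b]", "a\bb")

print(at_word_start)  # [0, 11]
print(inside_word)    # [7]
print(backspaces)     # ['\x08']
```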

What is '?' in a regular expression?

`?` means "zero or one" of the preceding element. For example, `[0-9]?` means "zero or one digits, but not two or more", whereas `[0-9]*` means "zero or more digits (no limit, could be 42 of them)". Note that some languages require floats to be written with a leading 0 before the `.` if the number is between 0 and 1 (0.5, not .5).
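The difference between the two quantifiers can be checked with a brief sketch. The sample strings and the simple float pattern are assumptions chosen for illustration, not a complete number parser.

```python
import re

samples = ["", "7", "42", "123"]

# [0-9]? : at most one digit
zero_or_one = [s for s in samples if re.fullmatch(r"[0-9]?", s)]

# [0-9]* : any number of digits, including none
zero_or_more = [s for s in samples if re.fullmatch(r"[0-9]*", s)]

print(zero_or_one)   # ['', '7']
print(zero_or_more)  # ['', '7', '42', '123']

# A float with optional digits before the dot accepts both '0.5' and '.5'.
float_pattern = r"[0-9]*\.[0-9]+"
floats = [s for s in ["0.5", ".5", "5"] if re.fullmatch(float_pattern, s)]
print(floats)  # ['0.5', '.5']
```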
