How to decode an escaped string in Python

I have some escaped strings that need to be unescaped. I'd like to do this in Python.

For example, in Python 2.7 I can do this:

>>> "\\123omething special".decode('string-escape')
'Something special'
>>>

How do I do it in Python 3? This doesn't work:

>>> b"\\123omething special".decode('string-escape')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
LookupError: unknown encoding: string-escape
>>>

My goal is to be able to take a string like this:

s\000u\000p\000p\000o\000r\000t\000@\000p\000s\000i\000l\000o\000c\000.\000c\000o\000m\000

And turn it into:

"support@psiloc.com"

After I do the conversion, I'll probe to see if the string I have is encoded in UTF-8 or UTF-16.

SuperStormer


asked Feb 11, 2013 at 20:37

You'll have to use unicode_escape instead:

>>> b"\\123omething special".decode('unicode_escape')

If you start with a str object instead (equivalent to the Python 2.7 unicode type), you'll need to encode to bytes first, then decode with unicode_escape.

If you need bytes as the end result, you'll have to encode again to a suitable encoding (.encode('latin1') for example, if you need to preserve literal byte values; the first 256 Unicode code points map one-to-one).
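A minimal sketch of that round trip, assuming the input is a str that contains only ASCII characters and backslash escapes. The variable names are illustrative and not part of the original answer.

```python
# Illustrative input: a str holding a literal backslash escape (assumed ASCII-only).
text = "\\123omething special"

# str -> bytes, so that the unicode_escape codec can be applied.
raw = text.encode("ascii")

# Interpret the backslash escapes; the result is a str again.
decoded = raw.decode("unicode_escape")
print(decoded)  # Something special

# If bytes are needed, latin1 maps code points 0-255 back to the same byte values.
as_bytes = decoded.encode("latin1")
print(as_bytes)  # b'Something special'
```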

Your example is actually UTF-16 data with escapes. Decode from unicode_escape, encode back to latin1 to preserve the bytes, then decode from utf-16-le (UTF-16 little endian without BOM):

>>> value = b's\\000u\\000p\\000p\\000o\\000r\\000t\\000@\\000p\\000s\\000i\\000l\\000o\\000c\\000.\\000c\\000o\\000m\\000'
>>> value.decode('unicode_escape').encode('latin1')  # convert to bytes
b's\x00u\x00p\x00p\x00o\x00r\x00t\x00@\x00p\x00s\x00i\x00l\x00o\x00c\x00.\x00c\x00o\x00m\x00'
>>> _.decode('utf-16-le')  # decode from UTF-16-LE
'support@psiloc.com'

answered Feb 11, 2013 at 20:40

Martijn Pieters

The old "string-escape" codec maps bytestrings to bytestrings, and there's been a lot of debate about what to do with such codecs, so it isn't currently available through the standard encode/decode interfaces.

But the code is still there in the C-API (as PyBytes_En/DecodeEscape), and this is still exposed to Python via the undocumented codecs.escape_encode and codecs.escape_decode.

>>> import codecs
>>> codecs.escape_decode(b"ab\\xff")
(b'ab\xff', 6)
>>> codecs.escape_encode(b"ab\xff")
(b'ab\\xff', 3)

These functions return the transformed bytes object, plus a number indicating how many bytes were processed... you can just ignore the latter.

>>> value = b's\\000u\\000p\\000p\\000o\\000r\\000t\\000@\\000p\\000s\\000i\\000l\\000o\\000c\\000.\\000c\\000o\\000m\\000'
>>> codecs.escape_decode(value)[0]
b's\x00u\x00p\x00p\x00o\x00r\x00t\x00@\x00p\x00s\x00i\x00l\x00o\x00c\x00.\x00c\x00o\x00m\x00'
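As a hedged sketch, the tuple handling can be wrapped in a small helper so callers only see bytes. The name `unescape_bytes` is hypothetical, and because `codecs.escape_decode` is undocumented, its behaviour may differ between Python versions.

```python
import codecs

def unescape_bytes(data: bytes) -> bytes:
    """Hypothetical helper: resolve backslash escapes, bytes in, bytes out."""
    # escape_decode returns a (result, length) tuple; only the result is kept.
    result, _length = codecs.escape_decode(data)
    return result

# Shortened version of the example value from the question.
value = b"s\\000u\\000p\\000"
raw = unescape_bytes(value)
print(raw)                      # b's\x00u\x00p\x00'
print(raw.decode("utf-16-le"))  # sup
```

Since the result is still bytes, the final `.decode("utf-16-le")` step is what turns it into text.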

answered Apr 18, 2014 at 9:57

If you want str-to-str decoding of escape sequences, so both input and output are Unicode:

def string_escape(s, encoding='utf-8'):
    return (s.encode('latin1')         # To bytes, required by 'unicode-escape'
             .decode('unicode-escape') # Perform the actual octal-escaping decode
             .encode('latin1')         # 1:1 mapping back to bytes
             .decode(encoding))        # Decode original encoding

Testing:

>>> string_escape('\\123omething special')
'Something special'

>>> string_escape(r's\000u\000p\000p\000o\000r\000t\000@'
                  r'\000p\000s\000i\000l\000o\000c\000.\000c\000o\000m\000',
                  'utf-16-le')
'support@psiloc.com'

answered Nov 13, 2019 at 2:33

MestreLion

You can't use unicode_escape on byte strings (or rather, you can, but it doesn't always return the same thing as string_escape does on Python 2). Beware!
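One concrete difference, shown as a hedged illustration: unicode_escape reads every non-escape byte as Latin-1, so multi-byte UTF-8 sequences in the input get split into separate characters. The comparison below uses `codecs.escape_decode` as a stand-in for the Python 2 codec, which is an assumption based on the description given in this thread.

```python
import codecs

# UTF-8 bytes for 'café', followed by a literal backslash and the letter n.
data = b"caf\xc3\xa9\\n"

# unicode_escape treats each non-escape byte as Latin-1, splitting the UTF-8 pair.
via_unicode_escape = data.decode("unicode_escape")

# escape_decode leaves non-escape bytes alone, so a later UTF-8 decode still works.
via_escape_decode = codecs.escape_decode(data)[0].decode("utf-8")

print(repr(via_unicode_escape))
print(repr(via_escape_decode))
```

The first result contains two characters (U+00C3 and U+00A9) where the second contains a single é.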

This function implements string_escape using a regular expression and custom replacement logic.

import re

def unescape(text):
    regex = re.compile(b'\\\\(\\\\|[0-7]{1,3}|x.[0-9a-f]?|[\'"abfnrt]|.|$)')
    def replace(m):
        b = m.group(1)
        if len(b) == 0:
            raise ValueError("Invalid character escape: '\\'.")
        i = b[0]
        if i == 120:
            v = int(b[1:], 16)
        elif 48
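The listing above is cut off in the source. What follows is not the author's code but a minimal, self-contained sketch of the same idea: a regular expression finds each backslash escape and a replacement callback converts it. The names `unescape_sketch`, `_SIMPLE` and `_ESCAPE` are hypothetical, the pattern is simplified (it requires exactly two hex digits after `\x`), and raising ValueError for every invalid escape is an assumption.

```python
import re

# Single-character escapes: byte value after the backslash -> replacement bytes.
_SIMPLE = {
    ord("\\"): b"\\", ord("'"): b"'", ord('"'): b'"',
    ord("a"): b"\a", ord("b"): b"\b", ord("f"): b"\f",
    ord("n"): b"\n", ord("r"): b"\r", ord("t"): b"\t",
}

# A backslash followed by 1-3 octal digits, x plus two hex digits,
# any single byte, or the end of the input.
_ESCAPE = re.compile(rb"\\([0-7]{1,3}|x[0-9a-fA-F]{2}|.|$)", re.DOTALL)

def unescape_sketch(data: bytes) -> bytes:
    def replace(m):
        body = m.group(1)
        if not body:
            raise ValueError("trailing backslash")
        first = body[0]
        if first == ord("x") and len(body) == 3:
            value = int(body[1:], 16)   # hexadecimal escape
        elif ord("0") <= first <= ord("7"):
            value = int(body, 8)        # octal escape
        elif first in _SIMPLE:
            return _SIMPLE[first]
        else:
            raise ValueError("invalid escape: %r" % m.group(0))
        if value > 255:
            raise ValueError("escape out of byte range: %r" % m.group(0))
        return bytes([value])
    return _ESCAPE.sub(replace, data)

print(unescape_sketch(b"\\123omething special"))  # b'Something special'
```

Working on bytes throughout keeps non-escape bytes untouched, which is the property that unicode_escape does not give you.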


				
					

                 