For Python 3, the way to do this that doesn't add double backslashes and simply preserves \n, \t, etc. is:
a = 'hello\nbobby\nsally\n'
a = a.encode('unicode-escape').decode().replace('\\\\', '\\')
print(a)
Which gives a value that can be written as CSV:
hello\nbobby\nsally\n
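As a minimal sketch, the same encode/decode/replace chain can be wrapped in a small helper. The name `escape_specials` is hypothetical and not part of the original answer.

```python
def escape_specials(text: str) -> str:
    """Return text with control characters shown as escape sequences."""
    # unicode-escape turns a real newline into a backslash followed by 'n',
    # and doubles any backslash that was already present in the text.
    escaped = text.encode('unicode-escape').decode()
    # Collapse the doubled backslashes back into single ones.
    return escaped.replace('\\\\', '\\')


single_line = escape_specials('hello\nbobby\nsally\n')
print(single_line)
```

Two caveats apply to this sketch: the unicode-escape codec also escapes non-ASCII characters (for example, 'é' becomes \xe9), and after the final replace a literal backslash followed by t can no longer be told apart from an escaped tab.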
However, there doesn't seem to be a solution for other special characters that may end up with a single \ before them. It's a bummer, and solving that would be complex.
For example, to serialize a pandas.Series containing lists of strings with special characters to a text file in the format BERT expects, with a newline between each sentence and a blank line between each document:
with open('sentences.csv', 'w') as f:
    current_idx = 0
    for idx, doc in sentences.items():
        # Insert a newline to separate documents
        if idx != current_idx:
            f.write('\n')
        # Write each sentence exactly as it appeared, one per line
        for sentence in doc:
            f.write(sentence.encode('unicode-escape').decode().replace('\\\\', '\\') + '\n')
This outputs (for the GitHub CodeSearchNet docstrings for all languages tokenized into sentences):
Makes sure the fast-path emits in order.
@param value the value to emit or queue up\n@param delayError if true, errors are delayed until the source has terminated\n@param disposable the resource to dispose if the drain terminates
Mirrors the one ObservableSource in an Iterable of several ObservableSources that first either emits an item or sends\na termination notification.
Scheduler:\n{@code amb} does not operate by default on a particular {@link Scheduler}.
@param the common element type\n@param sources\nan Iterable of ObservableSource sources competing to react first.
A subscription to each source will\noccur in the same order as in the Iterable.
@return an Observable that emits the same sequence as whichever of the source ObservableSources first\nemitted an item or sent a termination notification\n@see ReactiveX operators documentation: Amb
...
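The file layout described above (one escaped sentence per line, a blank line between documents) can be tried without pandas. This is only a sketch under stated assumptions: a plain dict stands in for the Series, an in-memory buffer stands in for the file, the names `escape_specials` and `write_documents` are hypothetical, and a simple "first document" flag replaces the index comparison used in the answer.

```python
import io


def escape_specials(text):
    # Same encode/decode/replace chain as in the answer above.
    return text.encode('unicode-escape').decode().replace('\\\\', '\\')


def write_documents(docs, out):
    """Write one escaped sentence per line, with a blank line between documents."""
    first = True
    for _, sentences in docs.items():
        if not first:
            out.write('\n')  # blank line separating documents
        first = False
        for sentence in sentences:
            out.write(escape_specials(sentence) + '\n')


# Hypothetical sample data standing in for the pandas.Series.
docs = {
    0: ['First line\nsecond line.', 'Tab\there.'],
    1: ['Another doc.'],
}

buffer = io.StringIO()
write_documents(docs, buffer)
result = buffer.getvalue()
print(result)
```

Each sentence stays on a single physical line because its embedded newlines and tabs are written as \n and \t.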
Summary: in this tutorial, you will learn about Python raw strings and how to use them to handle strings that treat backslashes as literal characters.
Introduction to the Python raw strings
In Python, when you prefix a string with the letter r or R, such as r'...' and R'...', that string becomes a raw string. Unlike a regular string, a raw string treats the backslashes (\) as literal characters.
Raw strings are useful when you deal with strings that have many backslashes, for example, regular expressions or directory paths on Windows.
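As a small illustration of the regular-expression case, the sketch below compares a pattern written as a regular string with the same pattern written as a raw string. The sample text is made up for the example.

```python
import re

text = 'a cat sat'

# In a regular string, '\b' is the backspace character, so this pattern
# looks for backspace + "cat" + backspace and finds nothing.
regular_match = re.search('\bcat\b', text)

# In a raw string, the backslash reaches the regex engine unchanged,
# where \b means "word boundary".
raw_match = re.search(r'\bcat\b', text)

print(regular_match)
print(raw_match.group())

# \d+ matches runs of digits; the raw prefix keeps the backslash intact.
numbers = re.findall(r'\d+', 'v3.11')
print(numbers)
```

Writing regex patterns as raw strings avoids having to double every backslash.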
To represent special characters such as tabs and newlines, Python uses the backslash (\) to signify the start of an escape sequence. For example:
s = 'lang\tver\nPython\t3'
print(s)
Output:
lang    ver
Python  3
However, raw strings treat the backslash (\) as a literal character. For example:
s = r'lang\tver\nPython\t3'
print(s)
Output:
lang\tver\nPython\t3
A raw string is equivalent to a regular string in which each backslash (\) is written as a double backslash (\\):
s1 = r'lang\tver\nPython\t3'
s2 = 'lang\\tver\\nPython\\t3'
print(s1 == s2) # True
In a regular string, Python counts an escape sequence as a single character:
s = '\n'
print(len(s)) # 1
However, in a raw string, Python counts the backslash (\) as a character of its own:
s = r'\n'
print(len(s)) # 2
Since the backslash (\) still escapes the single quote (') or double quote (") in a raw string literal, a raw string cannot end with an odd number of backslashes.
For example:
s = r'\'
Error:
SyntaxError: EOL while scanning string literal
Or
s = r'\\\'
Error:
SyntaxError: EOL while scanning string literal
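When a string does need to end with a backslash, one common workaround is to append the backslash as a regular string. The following is a sketch of that idea; the path is only an example.

```python
# A raw string cannot end with a single backslash, so append it separately
# as a regular (escaped) string.
base = r'c:\user\tasks'
with_trailing = base + '\\'
print(with_trailing)

# An even number of trailing backslashes is allowed in a raw string,
# and each one is kept as a literal character.
even = r'\\'
print(len(even))
```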
Use raw strings to handle file paths on Windows
Windows uses backslashes to separate the components of a path. For example:
c:\user\tasks\new
If you use this path as a regular string, Python will raise an error:
dir_path = 'c:\user\tasks\new'
Error:
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \uXXXX escape
Python treats \u in the path as the start of a Unicode escape but cannot decode it, because \u must be followed by four hexadecimal digits.
Now, if you escape the first backslash, you’ll have other issues:
dir_path = 'c:\\user\tasks\new'
print(dir_path)
Output:
c:\user asks
ew
In this example, the \t is a tab and the \n is a newline.
To make it easy, you can turn the path into a raw string like this:
dir_path = r'c:\user\tasks\new'
print(dir_path)
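As a complementary sketch (not part of the tutorial's own method), the standard-library pathlib module can parse such a path once it has been written as a raw string. PureWindowsPath is used here because it applies Windows path rules on any operating system.

```python
from pathlib import PureWindowsPath

# The raw string keeps every backslash, so pathlib sees the real separators.
dir_path = PureWindowsPath(r'c:\user\tasks\new')
print(dir_path.parts)
print(dir_path.name)

# Forward slashes avoid the escaping issue entirely and compare equal.
same_path = PureWindowsPath('c:/user/tasks/new')
print(same_path == dir_path)
print(str(same_path))
```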
Convert a regular string into a raw string
To convert a regular string into a raw-style string that shows escape sequences literally, you can use the built-in repr() function. For example:
s = '\n'
raw_string = repr(s)
print(raw_string)
Output:
'\n'
Note that the resulting string has a quote at the beginning and the end. To remove them, you can use slicing:
s = '\n'
raw_string = repr(s)[1:-1]
print(raw_string)
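One caveat of the repr() approach is that repr() may also escape quote characters inside the string. The sketch below compares it with the unicode_escape codec, which leaves quotes alone. The helper names `escape_with_repr` and `escape_with_codec` are hypothetical.

```python
def escape_with_repr(text):
    # repr() adds surrounding quotes; the slice strips them again.
    return repr(text)[1:-1]


def escape_with_codec(text):
    # The unicode_escape codec escapes control characters but not quotes.
    return text.encode('unicode_escape').decode()


plain = 'tab\there\n'
quoted = 'it\'s a "test"'

# Both approaches agree when the string contains no quotes.
print(escape_with_repr(plain))
print(escape_with_codec(plain))

# With both quote types present, repr() adds a backslash before the
# single quote, while the codec leaves the text unchanged.
print(escape_with_repr(quoted))
print(escape_with_codec(quoted))
```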
Summary
- Prefix a literal string with the letter r or R to turn it into a raw string.
- Raw strings treat backslash as a literal character.