Convert a string to a raw string in Python

For Python 3, the way to do this that doesn't add double backslashes and simply preserves \n, \t, etc. is:

a = 'hello\nbobby\nsally\n'
# Strings are immutable, so keep the escaped result in a new variable
b = a.encode('unicode-escape').decode().replace('\\\\', '\\')
print(b)

Which gives a value that can be written as CSV:

hello\nbobby\nsally\n

There doesn't seem to be a simple solution, however, for other special characters that may end up with a single \ in front of them, such as text that already contains a literal backslash. Solving that in general would be complex.
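As a rough illustration of where the one-liner works and where it gets ambiguous, here is a minimal sketch. The helper name escape_controls is hypothetical (it only wraps the expression shown above), and the sample strings are made up for the demonstration.

```python
def escape_controls(text):
    """Hypothetical helper wrapping the unicode-escape one-liner."""
    return text.encode('unicode-escape').decode().replace('\\\\', '\\')

# Control characters become visible two-character sequences.
print(escape_controls('hello\nbobby\n'))  # hello\nbobby\n

# Non-ASCII characters are rewritten as escapes too.
print(escape_controls('caf\u00e9'))       # caf\xe9

# A literal backslash followed by 'n' and a real newline give the same output,
# so the original text cannot always be recovered.
literal_backslash = 'C:\\new'   # C, :, backslash, n, e, w
real_newline = 'C:\new'         # C, :, newline, e, w
print(escape_controls(literal_backslash) == escape_controls(real_newline))  # True
```

The last comparison is the ambiguity described above: once escaped, the two inputs are indistinguishable.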

For example, to serialize a pandas.Series in which each element is a list of strings containing special characters into a text file in the format BERT expects, with a newline between sentences and a blank line between documents:

# `sentences` is a pandas.Series whose values are lists of sentence strings
with open('sentences.csv', 'w') as f:
    current_idx = 0
    for idx, doc in sentences.items():
        # Insert a blank line to separate documents
        if idx != current_idx:
            f.write('\n')
            current_idx = idx
        # Write each sentence exactly as it appeared, one sentence per line
        for sentence in doc:
            f.write(sentence.encode('unicode-escape').decode().replace('\\\\', '\\') + '\n')

This outputs (for the GitHub CodeSearchNet docstrings for all languages, tokenized into sentences):

Makes sure the fast-path emits in order.
@param value the value to emit or queue up\n@param delayError if true, errors are delayed until the source has terminated\n@param disposable the resource to dispose if the drain terminates

Mirrors the one ObservableSource in an Iterable of several ObservableSources that first either emits an item or sends\na termination notification.
Scheduler:\n{@code amb} does not operate by default on a particular {@link Scheduler}.
@param  the common element type\n@param sources\nan Iterable of ObservableSource sources competing to react first.
A subscription to each source will\noccur in the same order as in the Iterable.
@return an Observable that emits the same sequence as whichever of the source ObservableSources first\nemitted an item or sent a termination notification\n@see ReactiveX operators documentation: Amb
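To try the same layout without pandas or a file on disk, the following sketch may help. It assumes a plain list of documents (each a list of sentences) in place of the Series, writes to an in-memory buffer, and uses the hypothetical helper names escape_controls and write_documents; the sample sentences are invented.

```python
import io

def escape_controls(text):
    """Hypothetical helper wrapping the unicode-escape one-liner."""
    return text.encode('unicode-escape').decode().replace('\\\\', '\\')

def write_documents(docs, f):
    """Write one sentence per line, with a blank line between documents."""
    for position, doc in enumerate(docs):
        if position > 0:
            f.write('\n')  # blank line separating documents
        for sentence in doc:
            f.write(escape_controls(sentence) + '\n')

docs = [
    ['First sentence.', 'Second\nsentence.'],  # embedded newline gets escaped
    ['Another document.'],
]
buf = io.StringIO()
write_documents(docs, buf)
print(buf.getvalue())
```

Because every embedded newline is escaped, each sentence is guaranteed to occupy exactly one physical line, which is what the sentence-per-line format relies on.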


...

In Python, strings prefixed with r or R, such as r'...' and r"...", are called raw strings and treat backslashes \ as literal characters. Raw strings are useful when handling strings that use a lot of backslashes, such as Windows paths and regular expression patterns.

This article covers the following topics.

  • Escape sequences
  • Raw strings treat backslashes as literal characters
  • Convert normal strings to raw strings with repr()
  • Raw strings cannot end with an odd number of backslashes

Escape sequences

In Python, characters that cannot be written directly in a normal string (such as tabs, line feeds, etc.) are described using an escape sequence that starts with a backslash \ (such as \t or \n), similar to the C language.

  • 2. Lexical analysis - String and Bytes literals — Python 3.9.7 documentation

s = 'a\tb\nA\tB'
print(s)
# a       b
# A       B
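A few more escape sequences, as a small sketch. The dictionary name samples and its labels are made up for this example; the point is that each escape sequence in a normal string stands for exactly one character.

```python
samples = {
    'tab': '\t',
    'line feed': '\n',
    'backslash': '\\',
    'hex escape': '\x41',         # same as 'A'
    'unicode escape': '\u00e9',   # same as 'é'
}

for name, value in samples.items():
    # Each value is a single character, whatever its source spelling.
    print(f'{name}: length={len(value)}, code point={ord(value)}')

print(samples['hex escape'] == 'A')  # True
```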

Raw strings treat backslashes as literal characters

Strings prefixed with r or R, such as r'...' and r"...", are called raw strings and treat backslashes \ as literal characters. In raw strings, escape sequences are not treated specially.

rs = r'a\tb\nA\tB'
print(rs)
# a\tb\nA\tB

There is no special type for raw strings; it is just a string, which is equivalent to a regular string with backslashes represented by \\.

print(type(rs))
# <class 'str'>

print(rs == 'a\\tb\\nA\\tB')
# True

In a normal string, an escape sequence is considered to be one character, but in a raw string, backslashes are also counted as characters.

  • Get the length of a string (number of characters) in Python

print(len(s))
# 7

print(list(s))
# ['a', '\t', 'b', '\n', 'A', '\t', 'B']

print(len(rs))
# 10

print(list(rs))
# ['a', '\\', 't', 'b', '\\', 'n', 'A', '\\', 't', 'B']

Windows paths

Raw strings are useful when representing a Windows path as a string.

Windows paths are separated by backslashes \, so if you use a normal string, you have to escape each one like \\, but you can write it as is with a raw string.

path = 'C:\\Windows\\system32\\cmd.exe'
rpath = r'C:\Windows\system32\cmd.exe'
print(path == rpath)
# True

Note that a raw string literal ending with an odd number of backslashes raises an error, as described below. In this case, you need to write the whole path as a normal string, or write only the trailing backslash as a normal string and concatenate it.

path2 = 'C:\\Windows\\system32\\'
# rpath2 = r'C:\Windows\system32\'
# SyntaxError: EOL while scanning string literal
rpath2 = r'C:\Windows\system32' + '\\'
print(path2 == rpath2)
# True
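Regular expression patterns, the other use case named in the introduction, benefit in the same way. The sketch below uses the standard re module; the sample text and variable names are invented for illustration.

```python
import re

text = 'price: 42 USD'

# Raw string: the regex engine receives \b and \d exactly as written.
raw_match = re.search(r'\b\d+\b', text)

# Equivalent normal string: every backslash has to be doubled.
escaped_match = re.search('\\b\\d+\\b', text)

# Normal string without doubling: '\b' is a backspace character (U+0008),
# so the pattern looks for backspace, '42', backspace and finds nothing.
backspace_match = re.search('\b42\b', text)

print(raw_match.group())      # 42
print(escaped_match.group())  # 42
print(backspace_match)        # None
```

The third case is the usual reason for writing patterns as raw strings: the mistake does not raise an error, it just silently fails to match.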

Convert normal strings to raw strings with repr()

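Use the built-in function repr() to convert a normal string into a string in which escape sequences are written out as literal backslash sequences, the way they would appear in a raw string.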

  • Built-in Functions - repr() — Python 3.9.7 documentation

s_r = repr(s)
print(s_r)
# 'a\tb\nA\tB'

The string returned by repr() has ' at the beginning and the end.

print(list(s_r))
# ["'", 'a', '\\', 't', 'b', '\\', 'n', 'A', '\\', 't', 'B', "'"]

Using slices, you can get the string equivalent to the raw string.

s_r2 = repr(s)[1:-1]
print(s_r2)
# a\tb\nA\tB

print(s_r2 == rs)
# True

print(r'\t' == repr('\t')[1:-1])
# True
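The repr() slice has a couple of edge cases worth knowing about. The following is a minimal sketch; escape_with_repr is a hypothetical helper name, and ast.literal_eval from the standard library is used only to show that the unsliced repr() output can be turned back into the original string.

```python
import ast

def escape_with_repr(text):
    """Hypothetical helper: repr() without the surrounding quotes."""
    return repr(text)[1:-1]

print(escape_with_repr('a\tb'))        # a\tb

# A backslash already present in the string is doubled.
print(escape_with_repr('a\\b'))        # a\\b

# If the string contains both quote characters, the single quote is escaped.
print(escape_with_repr('it\'s "x"'))   # it\'s "x"

# The full repr() output (quotes included) round-trips back to the original.
original = 'a\tb\nA\tB'
print(ast.literal_eval(repr(original)) == original)  # True
```

So the slice is convenient for display or logging, but the result is not always character-for-character what you would have typed as a raw string.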

Raw strings cannot end with an odd number of backslashes

Because a backslash still escapes the closing ' or " in a raw string literal, a SyntaxError occurs if the literal ends with an odd number of backslashes \.

  • Design and History FAQ - Why can’t raw strings (r-strings) end with a backslash? — Python 3.9.7 documentation

# print(r'\')
# SyntaxError: EOL while scanning string literal

print(r'\\')
# \\

# print(r'\\\')
# SyntaxError: EOL while scanning string literal
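The rule can be checked at runtime without breaking the script, and the trailing backslash can be attached with adjacent-literal concatenation. This is a sketch only: is_valid_literal is a hypothetical helper that passes source text to the built-in compile(), and the exact SyntaxError message differs between Python versions, so it is not printed here.

```python
def is_valid_literal(source):
    """Hypothetical helper: True if `source` compiles as an expression."""
    try:
        compile(source, '<demo>', 'eval')
        return True
    except SyntaxError:
        return False

# The arguments are normal strings holding the source text of raw literals.
print(is_valid_literal("r'\\'"))       # False: r'\'
print(is_valid_literal("r'\\\\'"))     # True:  r'\\'
print(is_valid_literal("r'\\\\\\'"))   # False: r'\\\'

# Workaround: put the final backslash in an adjacent normal literal.
with_sep = r'C:\Windows\system32' '\\'
print(with_sep)                 # C:\Windows\system32\
print(with_sep.endswith('\\'))  # True
```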

What is a raw string?

A raw string is a string literal in which backslashes are kept exactly as written in the source code instead of introducing escape sequences. In Python, raw strings are written with the prefix r or R and look something like this: r'hello\n'

How do you convert a string to an integer in Python?

To convert, or cast, a string to an integer in Python, you use the int() built-in function. The function takes in as a parameter the initial string you want to convert, and returns the integer equivalent of the value you passed.