How do you convert a string to a raw string in Python?

For Python 3, one way to do this that keeps \n, \t, etc. as visible escape sequences without doubling every backslash is:

a = 'hello\nbobby\nsally\n'
escaped = a.encode('unicode-escape').decode().replace('\\\\', '\\')
print(escaped)

Which gives a value that can be written as CSV:

hello\nbobby\nsally\n

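This approach does not cover every special character, however. Non-ASCII characters come out as \xNN or \uNNNN escapes, and because doubled backslashes are collapsed, a literal backslash followed by n can no longer be told apart from a real newline. It's a bummer. Solving that in general would be complex.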
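
To make that limitation concrete, here is a minimal sketch that wraps the one-liner above in a helper. The name escape_controls is hypothetical (it is not part of any library), and the sample strings are made up for illustration.

```python
def escape_controls(text: str) -> str:
    """Hypothetical helper: show control characters as visible escapes."""
    # unicode_escape turns a newline into the two characters \ and n,
    # and doubles every literal backslash.
    escaped = text.encode('unicode_escape').decode('ascii')
    # Collapse the doubled backslashes back to single ones (the lossy step).
    return escaped.replace('\\\\', '\\')


print(escape_controls('hello\nbobby\tsally'))  # hello\nbobby\tsally
print(escape_controls('caf\u00e9'))            # caf\xe9

# A real newline and a literal backslash followed by "n" collide:
print(escape_controls('a\nb') == escape_controls('a\\nb'))  # True
```

The last line shows why the result cannot be decoded back reliably: two different inputs produce the same output.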

For example, to serialize a pandas.Series in which each element is a list of strings containing special characters to a text file in the format BERT expects, with one sentence per line and a blank line between documents:

with open('sentences.csv', 'w') as f:
    current_idx = 0
    for idx, doc in sentences.items():
        # Insert a blank line to separate documents
        if idx != current_idx:
            f.write('\n')
            current_idx = idx
        # Write each sentence exactly as it appeared, one per line
        for sentence in doc:
            f.write(sentence.encode('unicode-escape').decode().replace('\\\\', '\\') + '\n')

This outputs (for the GitHub CodeSearchNet docstrings for all languages, tokenized into sentences):

Makes sure the fast-path emits in order.
@param value the value to emit or queue up\n@param delayError if true, errors are delayed until the source has terminated\n@param disposable the resource to dispose if the drain terminates

Mirrors the one ObservableSource in an Iterable of several ObservableSources that first either emits an item or sends\na termination notification.
Scheduler:\n{@code amb} does not operate by default on a particular {@link Scheduler}.
@param  the common element type\n@param sources\nan Iterable of ObservableSource sources competing to react first.
A subscription to each source will\noccur in the same order as in the Iterable.
@return an Observable that emits the same sequence as whichever of the source ObservableSources first\nemitted an item or sent a termination notification\n@see ReactiveX operators documentation: Amb
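
The same layout can be tried without pandas or a file on disk. The sketch below is only an illustration under assumptions: write_documents and the sample data are hypothetical, a plain list of lists stands in for the Series, and io.StringIO stands in for the output file.

```python
import io


def write_documents(documents, handle):
    """Write one sentence per line, with a blank line between documents."""
    for position, sentences in enumerate(documents):
        if position > 0:
            handle.write('\n')  # blank line separates documents
        for sentence in sentences:
            # Keep embedded newlines visible as \n so each sentence stays on one line.
            escaped = sentence.encode('unicode_escape').decode('ascii')
            handle.write(escaped.replace('\\\\', '\\') + '\n')


# Made-up sample data standing in for the tokenized docstrings.
documents = [
    ['First sentence.', 'Second\nsentence.'],
    ['Another document.'],
]
buffer = io.StringIO()
write_documents(documents, buffer)
print(buffer.getvalue())
```

Using enumerate() to decide where the blank lines go avoids depending on the values of the index.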


...

Summary: in this tutorial, you will learn about Python raw strings and how to use them to write strings in which backslashes are treated as literal characters.

Introduction to Python raw strings

In Python, when you prefix a string literal with the letter r or R, as in r'...' or R'...', that literal becomes a raw string. Unlike a regular string literal, a raw string treats backslashes (\) as literal characters.

Raw strings are useful when you deal with strings that have many backslashes, for example, regular expressions or directory paths on Windows.
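
As a small illustration of the regular expression case, the sketch below uses the standard re module with a made-up sample text. It compares the same pattern written as a raw string and as a regular string.

```python
import re

text = 'cat concat cat'

# Raw string: \b reaches the regex engine as a word-boundary anchor.
print(re.findall(r'\bcat\b', text))  # ['cat', 'cat']

# Regular string: Python turns \b into a backspace character first,
# so the pattern no longer matches anything in the text.
print(re.findall('\bcat\b', text))   # []
```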

To represent special characters such as tabs and newlines, Python uses the backslash (\) to signify the start of an escape sequence. For example:

s = 'lang\tver\nPython\t3'
print(s)

Output:

lang	ver
Python	3

However, raw strings treat the backslash (\) as a literal character. For example:

s = r'lang\tver\nPython\t3'
print(s)

Output:

lang\tver\nPython\t3

A raw string is equivalent to a regular string in which every backslash (\) is written as a double backslash (\\):

s1 = r'lang\tver\nPython\t3'
s2 = 'lang\\tver\\nPython\\t3'
print(s1 == s2) # True

In a regular string, Python counts an escape sequence as a single character:

s = '\n'
print(len(s)) # 1

However, in a raw string, Python counts the backslash (\) and the letter after it as two separate characters:

s = r'\n'
print(len(s)) # 2

Even in a raw string, a backslash (\) placed before a single quote (') or double quote (") keeps that quote from ending the literal, so a raw string cannot end with an odd number of backslashes.

For example:

s = r'\'

Error:

SyntaxError: EOL while scanning string literal

Python 3.10 and later report the same problem as "SyntaxError: unterminated string literal".

Or

s = r'\\\'

Error:

SyntaxError: EOL while scanning string literal

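
If a string really has to end with a single backslash, one possible workaround is to put that final backslash in an adjacent regular string literal. The sketch below is only one option among several, and the folder path in it is made up.

```python
# An even number of trailing backslashes is fine in a raw string.
two = r'\\'
print(len(two))  # 2

# For a single trailing backslash, one option is to concatenate a
# regular string literal that holds the escaped backslash.
folder = r'c:\temp' '\\'
print(folder)       # c:\temp\
print(len(folder))  # 8
```

Python joins adjacent string literals at compile time, so the raw and regular parts end up as one string.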

Use raw strings to handle file paths on Windows

Windows uses backslashes to separate the components of a path. For example:

c:\user\tasks\new

If you use this path as a regular string, Python raises a syntax error:

dir_path = 'c:\user\tasks\new'

Error:

SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 2-3: truncated \uXXXX escape

Python treats \u in the path as the start of a Unicode escape, which must be followed by four hexadecimal digits, so it fails to decode \user.

Now, if you escape the first backslash, you’ll have other issues:

dir_path = 'c:\\user\tasks\new'
print(dir_path)

Output:

c:\user	asks
ew

In this example, \t is interpreted as a tab and \n as a newline.

To make it easy, you can turn the path into a raw string like this:

dir_path = r'c:\user\tasks\new'
print(dir_path)

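
To check that the raw string really holds the intended path, one option is to pass it to the standard pathlib module. This is a sketch for illustration and is not part of the original example.

```python
from pathlib import PureWindowsPath

# PureWindowsPath only manipulates the text of the path, so this
# runs on any operating system and never touches the file system.
path = PureWindowsPath(r'c:\user\tasks\new')
print(path.parts)  # ('c:\\', 'user', 'tasks', 'new')
print(path.name)   # new

# Forward slashes are accepted as separators too, which avoids backslashes entirely.
print(path == PureWindowsPath('c:/user/tasks/new'))  # True
```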

Convert a regular string into a raw string

Raw strings exist only as a literal syntax, so there is no raw string type to convert an existing string to. To get a string in which special characters appear as escape sequences, you can use the built-in repr() function. For example:

s = '\n'
raw_string = repr(s)
print(raw_string)

Output:

'\n'

Note that the result has a quote at the beginning and at the end. To remove them, you can use slicing, bearing in mind that repr() also doubles literal backslashes and may escape quotes inside the string:

s = '\n'
raw_string = repr(s)[1:-1]
print(raw_string)

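
The slicing trick has a few side effects that are worth seeing before you rely on it. The sketch below uses made-up sample strings and prints the original length, the length after repr() and slicing, and the resulting text.

```python
samples = ['\n', 'a\\b', 'it\'s "ok"']

for s in samples:
    shown = repr(s)[1:-1]
    print(len(s), len(shown), shown)

# Printed lines:
# 1 2 \n
# 3 4 a\\b
# 9 10 it\'s "ok"
```

A literal backslash comes out doubled, and a string containing both kinds of quotes gains a backslash before the single quote, so the sliced result is not always the text you would type in a raw string literal.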

Summary

  • Prefix a literal string with the letter r or R to turn it into a raw string.
  • Raw strings treat backslash as a literal character.

How can we make a string a raw string?

A Python raw string is created by prefixing a string literal with 'r' or 'R'. A raw string treats the backslash (\) as a literal character. This is useful when a string contains backslashes that should not be treated as the start of an escape sequence.

What is a raw string?

A raw string in programming keeps every character of a string literal exactly as it is written in the source code, rather than interpreting escape sequences. In Python, raw strings are denoted with the prefix r or R, for example r'hello'. C++ uses the form R"(hello)".

How do you print a raw string in Python?

Raw strings are string literals that treat the backslash (\) as a literal character. For example, if we print a regular string with "\n" inside, the output contains a line break at that point. But if we mark the literal as a raw string, it simply prints "\n" as ordinary characters.

How do you convert a string to a literal in Python?

To convert, or cast, a string to an integer in Python, use the built-in int() function. It takes the string you want to convert as a parameter and returns the equivalent integer.
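
A few calls show how int() behaves. The input strings here are made up for illustration.

```python
print(int('42'))        # 42
print(int('  -7 '))     # -7, surrounding whitespace is ignored
print(int('ff', 16))    # 255, the second argument is the base

# Text that is not a valid integer raises ValueError.
try:
    int('4.2')
except ValueError as error:
    print('not an integer:', error)
```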