How do you decode a file in Python?

You are looking at byte string values, printed as repr() results because they are contained in a dictionary. String representations can be re-used as Python string literals and non-printable and non-ASCII characters are shown using string escape sequences. Container values are always represented with repr() to ease debugging.

Thus, the string 'K\xc3\xa4se' contains two non-ASCII bytes with hex values C3 and A4, which together form the UTF-8 encoding of the U+00E4 code point (ä).
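To make the byte/character distinction concrete, here is a small sketch written to run on both Python 2 and Python 3. The variable names are illustrative only:

```python
raw = b'K\xc3\xa4se'            # five bytes; C3 A4 is the UTF-8 encoding of U+00E4
words = {'cheese': raw}

print(repr(words))              # container contents are shown with repr(), escapes included

text = raw.decode('utf8')       # bytes -> unicode text
print(len(raw))                 # 5 (bytes)
print(len(text))                # 4 (characters)
print(text == u'K\xe4se')       # True
```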

You should decode the values to unicode objects:

words = {}

with open('dictionary.txt') as my_file:
    for line in my_file:   # just loop over the file
        if line.strip(): # ignoring blank lines
            key, value = line.decode('utf8').strip().split(':')
            words[key] = value

or better still, use codecs.open() to decode the file as you read it:

import codecs

words = {}

with codecs.open('dictionary.txt', 'r', 'utf8') as my_file:
    for line in my_file:
        if line.strip(): # ignoring blank lines
            key, value = line.strip().split(':')
            words[key] = value
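On Python 2.6+ and Python 3, io.open() offers the same decode-as-you-read behaviour (on Python 3 it is the built-in open()). A minimal sketch, assuming the same key:value file layout; the helper name load_words and the sample file contents are illustrative, not part of the original code:

```python
import io
import os
import tempfile

def load_words(path, encoding='utf8'):
    """Read key:value lines from a text file into a dict of unicode strings."""
    words = {}
    with io.open(path, 'r', encoding=encoding) as my_file:
        for line in my_file:
            line = line.strip()
            if line:  # ignore blank lines
                key, value = line.split(':', 1)  # split on the first colon only
                words[key] = value
    return words

# Usage: write a small sample file, then load it back.
sample_path = os.path.join(tempfile.mkdtemp(), 'dictionary.txt')
with io.open(sample_path, 'w', encoding='utf8') as f:
    f.write(u'cheese:K\xe4se\n\napple:Apfel\n')

words = load_words(sample_path)
print(words[u'cheese'] == u'K\xe4se')
```

Splitting on the first colon only is a design choice made here so that values containing a colon do not break the unpacking.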

Printing the resulting dictionary will still use repr() results for the contents, so now you'll see u'cheese': u'K\xe4se' instead, because \xe4 is the escape sequence for Unicode code point U+00E4, the ä character. Print individual words if you want the actual characters to be written to the terminal:

print words['cheese']

Now you can compare these values with other data that you have decoded (provided you know its correct encoding), manipulate them, and encode them again to whatever target codec you need. print does this encoding automatically, for example, when printing unicode values to your terminal.
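As a sketch of that workflow, assume two inputs whose encodings are known (UTF-8 and Latin-1 here, chosen only for illustration). The code runs on both Python 2 and Python 3:

```python
utf8_bytes = b'K\xc3\xa4se'      # the word encoded as UTF-8
latin1_bytes = b'K\xe4se'        # the same word encoded as Latin-1

# The raw bytes differ, but the decoded text is identical.
print(utf8_bytes == latin1_bytes)     # False
a = utf8_bytes.decode('utf8')
b = latin1_bytes.decode('latin-1')
print(a == b)                         # True

# Manipulate as text, then encode to whatever codec the target needs.
shouted = a.upper()
print(shouted.encode('latin-1') == b'K\xc4SE')   # True
```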

You may want to read up on Unicode and Python:

  • The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) by Joel Spolsky

  • The Python Unicode HOWTO

  • Pragmatic Unicode by Ned Batchelder

    decode() is a method of str in Python 2 (in Python 3 it is a method of bytes).
    It converts an encoded string back to text: given the name of the encoding that was used to encode the string, it returns the original, decoded string. It is the inverse of encode().

    Syntax : decode(encoding, errors)

    Parameters :
    encoding : The name of the encoding to decode from (for example 'utf8').
    errors : How decoding errors are handled, e.g. 'strict' (the default) raises UnicodeDecodeError on invalid input, and 'ignore' silently drops the bytes that cannot be decoded.

    Returns : The decoded string.
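The effect of the errors argument can be sketched as follows. The sample bytes are illustrative, and 'replace' is a further standard handler not listed above. The code runs on both Python 2 and Python 3:

```python
data = b'K\xc3\xa4se'   # UTF-8 bytes that are not valid ASCII

# 'strict' (the default) raises on bytes the codec cannot decode.
try:
    data.decode('ascii', 'strict')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

ignored = data.decode('ascii', 'ignore')     # undecodable bytes are dropped
replaced = data.decode('ascii', 'replace')   # each undecodable byte becomes U+FFFD

print(strict_failed)   # True
print(ignored)         # Kse
```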

     
    Code #1 : Code to decode the string

    # Python 2 only: the 'base64' codec is not available to str.encode() in Python 3
    text = "geeksforgeeks"

    # encode the string with the base64 codec
    str_enc = text.encode('base64', 'strict')

    print("The encoded string in base64 format is : " + str_enc)

    # decode it back to the original string
    print("The decoded string is : " + str_enc.decode('base64', 'strict'))

    Output:

    The encoded string in base64 format is : Z2Vla3Nmb3JnZWVrcw==

    The decoded string is : geeksforgeeks
    

    Application :
    Encoding and decoding together can be used in simple applications, such as keeping a value in an encoded form in the back end. Note that base64 is a reversible encoding rather than encryption or hashing, so it does not keep information confidential.
    A small toy demonstration of a password check is shown below.

     
    Code #2 : Code to demonstrate application of encode-decode

    # Python 2 only: relies on the str 'base64' codec
    user = "geeksforgeeks"
    passw = "i_lv_coding"

    # keep the password in base64-encoded form
    passw = passw.encode('base64', 'strict')

    user_login = "geeksforgeeks"

    # attempt with the wrong password
    pass_wrong = "geeksforgeeks"
    print("Password entered : " + pass_wrong)
    if pass_wrong == passw.decode('base64', 'strict'):
        print("You are logged in !!")
    else:
        print("Wrong Password !!")

    print('\r')

    # attempt with the right password
    pass_right = "i_lv_coding"
    print("Password entered : " + pass_right)
    if pass_right == passw.decode('base64', 'strict'):
        print("You are logged in !!")
    else:
        print("Wrong Password !!")

    Output:

    Password entered : geeksforgeeks
    Wrong Password !!

    Password entered : i_lv_coding
    You are logged in !!
    

    What does decode () do in Python?

    The bytes.decode() method converts bytes to a str object; str.encode() does the reverse. Both accept an errors argument that selects the error handling scheme. The default is 'strict', meaning that decoding errors raise a UnicodeDecodeError and encoding errors raise a UnicodeEncodeError.

    How do I decode an encoded file?

    To decode an encoded Word document:
    Click the "File" tab and select "Options." Select the "Advanced" tab in the left pane. ...
    Scroll down to the General section. ...
    Close the encoded file and reopen it.

    What is encoding and decoding in Python?

    Representing a unicode string as a string of bytes is known as encoding. Converting a string of bytes back to a unicode string is known as decoding.

    How do I decode a number in Python?

    x := s[i] as integer, y := substring of s from index i - 1 up to (but not including) i + 1 as integer.
    if x >= 1 and x <= 9, then dp[i] := dp[i] + dp[i - 1].
    if y >= 10 and y <= 26: if i - 2 >= 0, then dp[i] := dp[i] + dp[i - 2], otherwise increase dp[i] by 1.
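These steps appear to come from the classic "count the decodings" problem, in which the numbers 1 to 26 stand for the letters A to Z. That reading is an assumption, since the steps themselves do not state the problem. Under that assumption, a minimal dynamic-programming sketch could look like this; the function name count_decodings and the handling of the first digit are illustrative choices:

```python
def count_decodings(s):
    """Count the ways to split a digit string into numbers 1..26 (assumed A..Z mapping)."""
    if not s or s[0] == '0':
        return 0                      # assumption: a leading zero cannot be decoded
    dp = [0] * len(s)                 # dp[i] = number of decodings of s[:i + 1]
    dp[0] = 1
    for i in range(1, len(s)):
        x = int(s[i])                 # the current single digit
        y = int(s[i - 1:i + 1])       # the two-digit number ending at i
        if 1 <= x <= 9:
            dp[i] += dp[i - 1]
        if 10 <= y <= 26:
            dp[i] += dp[i - 2] if i - 2 >= 0 else 1
    return dp[-1]

print(count_decodings("226"))   # 3: (2, 2, 6), (22, 6), (2, 26)
```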