How do you find the encoding of a string in python?

In this tutorial, we will learn about the Python String encode() method with the help of examples.

The encode() method returns an encoded version of the given string.


title = 'Python Programming'

# change encoding to utf-8
print(title.encode())

# Output: b'Python Programming'
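encode() returns a bytes object, not a str, so it can help to see the round trip explicitly. Below is a minimal sketch that assumes the default UTF-8 encoding. decode() is the bytes method that reverses encode().

```python
title = 'Python Programming'

# str -> bytes, using the default 'utf-8' encoding
encoded = title.encode()
print(type(encoded))     # <class 'bytes'>
print(encoded)           # b'Python Programming'

# bytes -> str; decode() also defaults to 'utf-8'
decoded = encoded.decode()
print(decoded == title)  # True
```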

Syntax of String encode()

The syntax of the encode() method is:

string.encode(encoding='utf-8', errors='strict')

String encode() Parameters

By default, the encode() method doesn't require any parameters.

It returns a UTF-8 encoded version of the string as a bytes object. In case of failure, it raises a UnicodeEncodeError exception.

However, it accepts two optional parameters:

  • encoding - the encoding the string is to be encoded to (default 'utf-8')
  • errors - the response when encoding fails. There are six commonly used error responses:
    • strict - default response, which raises a UnicodeEncodeError exception on failure
    • ignore - drops unencodable characters from the result
    • replace - replaces each unencodable character with a question mark (?)
    • xmlcharrefreplace - inserts an XML character reference in place of each unencodable character
    • backslashreplace - inserts a \xNN, \uNNNN or \UNNNNNNNN escape sequence in place of each unencodable character
    • namereplace - inserts a \N{...} escape sequence in place of each unencodable character

Example 1: Encode to Default UTF-8 Encoding

# unicode string
string = 'pythön!'

# print string
print('The string is:', string)

# default encoding to utf-8

string_utf = string.encode()

# print result
print('The encoded version is:', string_utf)


The string is: pythön!
The encoded version is: b'pyth\xc3\xb6n!'

Example 2: Encoding with the errors parameter

# unicode string
string = 'pythön!'

# print string
print('The string is:', string)

# ignore error

print('The encoded version (with ignore) is:', string.encode("ascii", "ignore"))

# replace error

print('The encoded version (with replace) is:', string.encode("ascii", "replace"))


The string is: pythön!
The encoded version (with ignore) is: b'pythn!'
The encoded version (with replace) is: b'pyth?n!'

Note: Try different encoding and error parameters as well.
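To try all of the error handlers listed earlier in one go, this small sketch encodes the same 'pythön!' string to ASCII with each handler in turn. ASCII has no 'ö'. The loop and the results dictionary are illustrative scaffolding only.

```python
string = 'pythön!'

handlers = ['ignore', 'replace', 'xmlcharrefreplace',
            'backslashreplace', 'namereplace']

results = {}
for handler in handlers:
    # ASCII cannot represent 'ö', so the handler decides what to emit instead
    results[handler] = string.encode('ascii', handler)
    print(handler, '->', results[handler])

# 'strict' is the default: it substitutes nothing and raises instead
try:
    string.encode('ascii', 'strict')
except UnicodeEncodeError as exc:
    print('strict ->', exc)
```

With this input, the handlers produce the following results:
  • xmlcharrefreplace gives b'pyth&#246;n!' (246 is the decimal code point of 'ö').
  • backslashreplace gives b'pyth\\xf6n!'. It uses the short \xNN form for code points below 256.
  • namereplace gives b'pyth\\N{LATIN SMALL LETTER O WITH DIAERESIS}n!'.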

String Encoding

Since Python 3.0, strings are stored as Unicode, i.e. each character in the string is represented by a code point. So, each string is just a sequence of Unicode code points.

To store these strings in files or send them over a network, the sequence of code points must be converted into a sequence of bytes. This process is known as encoding.

Many encodings exist, and each one maps code points to bytes differently. Popular encodings include UTF-8 and ASCII.

Using the string encode() method, you can convert Unicode strings into any encoding supported by Python. By default, encode() uses UTF-8.
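The sketch below makes the split between code points and bytes concrete. It views the same text first as code points and then as the bytes produced by three encodings. Latin-1 and UTF-16-LE appear alongside UTF-8 purely for comparison.

```python
text = 'pythön'

# Each character is one code point; ord() gives its integer value
code_points = [ord(ch) for ch in text]
print(code_points)   # [112, 121, 116, 104, 246, 110]

# Different encodings map the same code points to different bytes
utf8 = text.encode('utf-8')        # 'ö' (U+00F6) becomes two bytes
latin1 = text.encode('latin-1')    # 'ö' fits in a single byte
utf16 = text.encode('utf-16-le')   # every character here takes two bytes

print(len(text), len(utf8), len(latin1), len(utf16))   # 6 7 6 12
```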

What do I have to do in Python to figure out which encoding a string has?


asked Feb 13, 2011 at 22:27


In Python 3, all strings are sequences of Unicode characters. There is a bytes type that holds raw bytes.

In Python 2, a string may be of type str or of type unicode. You can tell which using code like this:

def whatisthis(s):
    if isinstance(s, str):
        print "ordinary string"
    elif isinstance(s, unicode):
        print "unicode string"
    else:
        print "not a string"

This does not distinguish "Unicode or ASCII"; it only distinguishes Python types. A Unicode string may consist of purely characters in the ASCII range, and a bytestring may contain ASCII, encoded Unicode, or even non-textual data.
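The same idea carries over to Python 3, where the two types are str and bytes. The following is a minimal sketch of a Python 3 counterpart. It returns a label instead of printing, and it treats bytearray as a byte string, which is an assumption of this sketch. Like the original, it reports the type, not the encoding.

```python
def whatisthis(s):
    """Label the Python 3 type of s; this says nothing about any encoding."""
    if isinstance(s, str):
        return "text string (str)"
    elif isinstance(s, (bytes, bytearray)):
        return "byte string (bytes or bytearray)"
    else:
        return "not a string"

print(whatisthis('abc'))    # text string (str)
print(whatisthis(b'abc'))   # byte string (bytes or bytearray)
print(whatisthis(42))       # not a string
```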



answered Feb 13, 2011 at 22:40

Greg Hewgill



How to tell if an object is a unicode string or a byte string

You can use type or isinstance.

In Python 2:

>>> type(u'abc')  # Python 2 unicode string literal
<type 'unicode'>
>>> type('abc')   # Python 2 byte string literal
<type 'str'>

In Python 2, str is just a sequence of bytes. Python doesn't know what its encoding is. The unicode type is the safer way to store text. If you want to understand this more, I recommend

In Python 3:

>>> type('abc')   # Python 3 unicode string literal
<class 'str'>
>>> type(b'abc')  # Python 3 byte string literal
<class 'bytes'>

In Python 3, str is like Python 2's unicode, and is used to store text. What was called str in Python 2 is called bytes in Python 3.

How to tell if a byte string is valid utf-8 or ascii

You can call decode. If it raises a UnicodeDecodeError exception, it wasn't valid.

>>> u_umlaut = b'\xc3\x9c'   # UTF-8 representation of the letter 'Ü'
>>> u_umlaut.decode('utf-8')
'Ü'
>>> u_umlaut.decode('ascii')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in range(128)
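The decode-and-catch test fits naturally into a small helper. The name is_valid is hypothetical and is used only in this sketch.

```python
def is_valid(data, encoding):
    """Return True if the bytes decode without error in the given encoding."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True

u_umlaut = b'\xc3\x9c'   # UTF-8 representation of the letter 'Ü'
print(is_valid(u_umlaut, 'utf-8'))   # True
print(is_valid(u_umlaut, 'ascii'))   # False
```

A successful decode only shows that the bytes are valid in that encoding. It does not show that this is the encoding the data was written in. Latin-1, for example, accepts every possible byte, so is_valid(u_umlaut, 'latin-1') is also True.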

answered Feb 13, 2011 at 22:33




In Python 3.x, all strings are sequences of Unicode characters, so an isinstance check for str (which means a Unicode string by default) should suffice:

isinstance(x, str)

With regard to Python 2.x, most people seem to be using an if statement with two checks: one for str and one for unicode.

If you want to check whether you have a 'string-like' object with a single statement, though, you can do the following:

isinstance(x, basestring)

answered Sep 9, 2013 at 20:24





Unicode is not an encoding - to quote Kumar McMillan:

If ASCII, UTF-8, and other byte strings are "text" ...

...then Unicode is "text-ness";

it is the abstract form of text

Have a read of McMillan's "Unicode In Python, Completely Demystified" talk from PyCon 2008; it explains things a lot better than most of the related answers on Stack Overflow.

answered May 21, 2012 at 14:12

Alex Dean



If your code needs to be compatible with both Python 2 and Python 3, you can't directly use things like isinstance(s, bytes) or isinstance(s, unicode) without wrapping them in either try/except or a Python version test. The reason is that unicode is undefined in Python 3, while in Python 2 bytes is either undefined (before 2.6) or merely an alias for str.

There are some ugly workarounds. An extremely ugly one is to compare the name of the type, instead of comparing the type itself. Here's an example:

# convert bytes (python 3) or unicode (python 2) to str
if str(type(s)) == "<class 'bytes'>":
    # only possible in Python 3
    s = s.decode('ascii')  # or  s = str(s)[2:-1]
elif str(type(s)) == "<type 'unicode'>":
    # only possible in Python 2
    s = str(s)

An arguably slightly less ugly workaround is to check the Python version number, e.g.:

import sys

if sys.version_info >= (3, 0, 0):
    # for Python 3
    if isinstance(s, bytes):
        s = s.decode('ascii')  # or  s = str(s)[2:-1]
else:
    # for Python 2
    if isinstance(s, unicode):
        s = str(s)

Those are both unpythonic, and most of the time there's probably a better way.
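One possibly cleaner alternative assumes that only Python 2.6+ and Python 3 need to be supported. In those versions, Python 2 has bytes as an alias for str, so ordinary isinstance checks work on both without comparing type names or testing the version. to_native_str is a hypothetical helper name, and only the Python 3 paths are exercised in the usage lines.

```python
def to_native_str(s, encoding='ascii'):
    """Hypothetical helper: convert text or bytes to the native str type.

    Assumes s is either a text string or a byte string.
    """
    if isinstance(s, str):
        # Python 3 text, or a Python 2 byte string (bytes is str there)
        return s
    if isinstance(s, bytes):
        # only reachable on Python 3, where bytes and str differ
        return s.decode(encoding)
    # what remains is Python 2 unicode, which encodes to a native str
    return s.encode(encoding)

print(to_native_str(b'abc'))   # abc
print(to_native_str('abc'))    # abc
```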

answered Aug 14, 2012 at 12:33

Dave Burton




import six
if isinstance(obj, six.text_type):
    pass  # obj is a text (unicode) string

Inside the six library, these types are defined as:

if PY3:
    string_types = str,
    text_type = str
else:
    string_types = basestring,
    text_type = unicode


answered Aug 8, 2016 at 8:50





Note that on Python 3, it's not really fair to say any of:

  • strs are UTFx for any x (e.g. UTF-8)

  • strs are Unicode

  • strs are ordered collections of Unicode characters

Python's str type is (normally) a sequence of Unicode code points, some of which map to characters.

Even on Python 3, it's not as simple to answer this question as you might imagine.

An obvious way to test for ASCII-compatible strings is by an attempted encode:

"Hello there!".encode("ascii")
#>>> b'Hello there!'

"Hello there... ☃!".encode("ascii")
#>>> Traceback (most recent call last):
#>>>   File "", line 4, in 
#>>> UnicodeEncodeError: 'ascii' codec can't encode character '\u2603' in position 15: ordinal not in range(128)

The error distinguishes the cases.
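The try-and-encode check can be packaged as a function. The name is_ascii is hypothetical. On Python 3.7 and later, str.isascii() answers the same question without raising an exception.

```python
def is_ascii(s):
    """True if every code point in s is in the ASCII range (0-127)."""
    try:
        s.encode('ascii')
    except UnicodeEncodeError:
        return False
    return True

print(is_ascii("Hello there!"))        # True
print(is_ascii("Hello there... ☃!"))   # False

# Python 3.7+: the built-in method gives the same answer
print("Hello there... ☃!".isascii())   # False
```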

In Python 3, there are even some strings that contain code points UTF-8 cannot encode at all, such as lone surrogates:

"Hello there!".encode("utf8")
#>>> b'Hello there!'

#>>> Traceback (most recent call last):
#>>>   File "", line 19, in 
#>>> UnicodeEncodeError: 'utf-8' codec can't encode character '\udcc3' in position 0: surrogates not allowed

The same try-and-encode method distinguishes these cases.
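One common way such strings arise is the 'surrogateescape' error handler, which Python uses, for example, when decoding file names on POSIX systems. It maps each undecodable byte 0xNN (0x80 to 0xFF) to the lone surrogate U+DCNN. That is exactly the kind of '\udcc3' character shown in the traceback above. The sketch below assumes an arbitrary invalid input byte.

```python
raw = b'\xc3'   # an incomplete UTF-8 sequence, i.e. invalid UTF-8

# surrogateescape turns the undecodable byte 0xC3 into U+DCC3,
# so decoding succeeds instead of raising
s = raw.decode('utf-8', 'surrogateescape')
print(repr(s))   # '\udcc3'

# strict UTF-8 encoding refuses the lone surrogate ...
try:
    s.encode('utf-8')
except UnicodeEncodeError as exc:
    print(exc)   # 'utf-8' codec can't encode character '\udcc3' in position 0: surrogates not allowed

# ... but the same error handler restores the original bytes
restored = s.encode('utf-8', 'surrogateescape')
print(restored == raw)   # True
```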

answered Jul 9, 2014 at 2:35





This may help someone else. I started out testing for the string type of the variable s, but for my application it made more sense to simply return s as UTF-8. The caller of return_utf then knows what it is dealing with and can handle the string appropriately. The code is not pristine, but I intend it to be Python-version agnostic without a version test or importing six. Please comment with improvements to the sample code below to help other people.

def return_utf(s):
    if isinstance(s, str):
        return s.encode('utf-8')
    if isinstance(s, (int, float, complex)):
        return str(s).encode('utf-8')
    try:
        return s.encode('utf-8')
    except TypeError:
        try:
            return str(s).encode('utf-8')
        except AttributeError:
            return s
    except AttributeError:
        return s
    return s # assume it was already utf-8

answered Dec 23, 2015 at 22:16


You could use Universal Encoding Detector, but be aware that it will only give you a best guess, not the actual encoding, because it's impossible to know the encoding of a string such as "abc". You will need to get the encoding information elsewhere; for example, HTTP uses the Content-Type header for that.
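You can check the "abc" point directly. Pure-ASCII bytes decode to identical text under many encodings, so no detector can tell those encodings apart. The sketch below demonstrates this. It then reads the charset from a Content-Type header using the standard library's email.message.Message, the base class of http.client.HTTPMessage, which is what urllib response headers use. The header value and the byte strings are made-up inputs.

```python
from email.message import Message

data = b'abc'

# The same bytes decode to the same text under every candidate encoding
candidates = ['ascii', 'utf-8', 'latin-1', 'cp1252']
decoded = {enc: data.decode(enc) for enc in candidates}
print(decoded)   # every value is 'abc'

# The reliable route is metadata: parse the charset out of the header
msg = Message()
msg['Content-Type'] = 'text/html; charset=ISO-8859-1'
charset = msg.get_content_charset()   # the method lower-cases the value
print(charset)   # iso-8859-1

body = b'caf\xe9'
text = body.decode(charset)
print(text)      # café
```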

Tom Morris


answered Feb 13, 2011 at 22:34




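In Python 3, I had to tell whether a bytes value looks like b='\x7f\x00\x00\x01' or b=''. My solution is like this:

def get_str(value):
    # Decode first: str(value) on bytes returns its repr, which is always
    # printable. latin-1 is assumed here because it maps each byte to one character.
    str_value = value.decode('latin-1')
    if str_value.isprintable():
        return str_value

    # not printable: show the raw bytes as dotted decimals, e.g. 127.0.0.1
    return '.'.join(['%d' % x for x in value])

This worked for me; I hope it helps someone who needs it.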

answered Apr 7, 2021 at 16:05

Ali Katkar


For py2/py3 compatibility, simply use:

import six
if isinstance(obj, six.text_type):
    pass  # obj is a text (unicode) string

answered May 28, 2018 at 11:56


Vishvajit Pathak


One simple approach is to check whether unicode is a builtin. If it is, you're in Python 2, where a plain string literal is a byte string (str). To ensure everything is unicode, you can do:

try:
    import builtins  # Python 3
except ImportError:
    import __builtin__ as builtins  # Python 2

i = 'cats'
if 'unicode' in dir(builtins):     # True in Python 2, False in 3
    i = unicode(i)

answered Sep 18, 2019 at 14:24





How do you find the encoding in Python?

  • Encoding: which encoding the string uses.
  • encoding.convert(string): converts the string to another encoding.
  • detect(): a wrapper around charade.detect() that encodes the strings and handles UnicodeDecodeError exceptions.


What is the encoding of a string?

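In Java, String objects use UTF-16 encoding internally, and the only way to get a different encoding is to convert the string to a byte[] array. In Python 3, a str has no encoding of its own: it is a sequence of code points, and an encoding only comes into play when it is converted to bytes with encode().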

Why is encode() used in Python?

Definition. encode() is a built-in string method that returns an encoded version of the string, as a bytes object, according to the specified encoding (UTF-8 by default). It is used wherever text has to become bytes, for example before writing it to a binary file or sending it over a network. Encoding is not a security measure: anyone can decode the bytes again.