Python get codec of file

Here is a small snippet to help you guess the encoding. It distinguishes between latin1 and utf8 quite well, and converts a byte string to a unicode string.

# Python 2 code. Attention: the order of encoding_guess_list is important.
# Example: "latin1" always succeeds, so it must come last.
encoding_guess_list = ['utf8', 'latin1']

def try_unicode(string, errors='strict'):
    # Already a unicode string: nothing to decode.
    if isinstance(string, unicode):
        return string
    assert isinstance(string, str), repr(string)
    for enc in encoding_guess_list:
        try:
            return string.decode(enc, errors)
        except UnicodeError:
            continue
    raise UnicodeError('Failed to convert %r' % string)

def test_try_unicode():
    for start, should in [
        ('\xfc', u'\xfc'),      # latin1 bytes for u-umlaut
        ('\xc3\xbc', u'\xfc'),  # utf8 bytes for u-umlaut
        ('\xbb', u'\xbb'), # postgres/psycopg2 latin1: RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
        ]:
        result = try_unicode(start, errors='strict')
        if not result == should:
            raise Exception(u'Error: start=%r should=%r result=%r' % (
                    start, should, result))
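
The snippet above relies on the Python 2 `str` and `unicode` types. A rough Python 3 counterpart might look like the sketch below; the names `ENCODING_GUESSES` and `try_decode` are made up for this illustration and do not come from the original answer.

```python
# Candidate encodings, tried in order. "latin-1" maps every byte to a
# character, so it never fails and has to come last.
ENCODING_GUESSES = ("utf-8", "latin-1")


def try_decode(data, errors="strict"):
    """Return text for `data`, trying each candidate encoding in turn."""
    if isinstance(data, str):
        # Already decoded text: nothing to do.
        return data
    for enc in ENCODING_GUESSES:
        try:
            return data.decode(enc, errors)
        except UnicodeError:
            continue
    raise UnicodeError("Failed to convert %r" % (data,))
```

Because the last candidate accepts any input, the result is a guess, not proof of the real encoding: bytes that are invalid UTF-8 are simply read as latin-1.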

How can I tell what encoding a file is using?

Open the file in the plain Notepad that comes with Windows and click "Save As...". The encoding preselected in the dialog is the one Notepad detected for the file.

How do I check if a file is UTF-8?

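It can be done in a single line: codecs.open("path/to/file", encoding="utf-8", errors="strict"). Note that opening the file does not decode anything yet; reading from the returned file object raises UnicodeDecodeError if the contents are not valid UTF-8.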
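
As a minimal sketch of that check in Python 3, the hypothetical helper `is_utf8` below reads the raw bytes and attempts a strict UTF-8 decode. It assumes the file is small enough to load into memory in one go.

```python
def is_utf8(path):
    """Return True if the whole file decodes as strict UTF-8."""
    with open(path, "rb") as handle:
        data = handle.read()
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True
```

A True result only means the bytes are valid UTF-8. A pure ASCII file passes as well, since ASCII is a subset of UTF-8.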

How do I check the encoding of a CSV file in Python?

In a text editor such as Notepad++ (which the menu path here appears to describe), the detected encoding of the open file is displayed at the far right of the bottom status bar. The supported encodings are listed in the drop-down under Settings -> Preferences -> New Document/Default Directory.
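
If you would rather stay inside Python, one possible approach using only the standard library is sketched below. The function names `guess_csv_encoding` and `read_csv_rows` are invented for this example, and the fallback to latin-1 is an assumption about the data, not a real detection.

```python
import codecs
import csv


def guess_csv_encoding(path):
    """Guess an encoding for `path`: BOM first, then strict UTF-8, then latin-1."""
    with open(path, "rb") as handle:
        data = handle.read()
    if data.startswith(codecs.BOM_UTF8):
        # "utf-8-sig" strips the byte order mark while decoding.
        return "utf-8-sig"
    try:
        data.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # latin-1 accepts any byte sequence, so this is a fallback guess.
        return "latin-1"


def read_csv_rows(path):
    """Read all rows of a CSV file using the guessed encoding."""
    with open(path, newline="", encoding=guess_csv_encoding(path)) as handle:
        return list(csv.reader(handle))
```

Files in other single-byte encodings (for example cp1252) will also decode as latin-1 without an error, so check the resulting text by eye when the source of the file is unknown.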

How do you find the encoding of a string in Python?

You can use type or isinstance to find out whether a value is a str or a unicode object, but not which encoding it uses. In Python 2, str is just a sequence of bytes, and Python doesn't know what its encoding is. The unicode type is the safer way to store text.
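
The paragraph above describes Python 2. In Python 3 the same roles are played by `bytes` (raw data) and `str` (decoded text), and a small illustrative check, with the hypothetical helper `describe`, could look like this:

```python
def describe(value):
    """Say whether `value` is decoded text or raw bytes (Python 3 types)."""
    if isinstance(value, str):
        return "text (str): already decoded, no encoding attached"
    if isinstance(value, bytes):
        return "bytes: encoding unknown until you decode it"
    return "neither text nor bytes"


text = "\u00fc"             # the letter u-umlaut as text
raw = text.encode("utf-8")  # the same letter as two UTF-8 bytes

print(describe(text))
print(describe(raw))
```

Neither type records the encoding, so the type check only tells you whether decoding is still needed, not which codec to use.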
