Guide: printing raw binary data in Python

'rb' mode enables you to read raw binary data from a file in Python:

with open(filename, 'rb') as file:
    raw_binary_data = file.read()

type(raw_binary_data) == bytes. bytes is an immutable sequence of bytes in Python.

Don't confuse bytes with their text representation. print(raw_binary_data) shows a text representation of the data. For example, the byte 127 (base 10, decimal), which can also be written as bin(127) == '0b1111111' (base 2, binary) or hex(127) == '0x7f' (base 16, hexadecimal), is shown as b'\x7f' (seven ASCII characters are printed). Bytes from the printable ASCII range are shown as the corresponding ASCII characters, e.g., b'\x41' is shown as b'A' (65 == 0x41 == 0b1000001).
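
To make the distinction concrete, here is a minimal sketch that builds two bytes and shows several text representations of the same data. The variable names are illustrative only.

```python
# Two bytes: 127 (not printable) and 65 (printable, shown as 'A').
data = bytes([127, 65])

text_form = repr(data)                  # what print(data) displays
values = list(data)                     # the integer value of each byte
binary_digits = [bin(b) for b in data]  # base-2 text for each byte
hex_digits = data.hex()                 # base-16 text for the whole sequence

print(text_form)       # b'\x7fA'
print(values)          # [127, 65]
print(binary_digits)   # ['0b1111111', '0b1000001']
print(hex_digits)      # 7f41
print(len(data))       # 2
```

All four printed forms describe the same two bytes; only len(data) reflects how much data there actually is.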

The 0x7f byte is not stored on disk as seven ASCII binary digits 1111111, nor as two ASCII hex digits 7F, nor as three decimal digits 127. b'\x7f' is a text representation of the byte that may be used to specify it in Python source code (you won't find the seven literal ASCII characters b'\x7f' on disk either). This code writes a single byte to disk:

with open('output.bin', 'wb') as file:
    file.write(b'\x7f')
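
One way to check that claim is to look at the file size afterwards. The sketch below writes to a temporary directory (an assumption made here so that nothing in the working directory is overwritten) and reads the byte back.

```python
import os
import tempfile

# Write one byte to a file in a temporary directory and inspect it.
path = os.path.join(tempfile.mkdtemp(), "output.bin")
with open(path, "wb") as file:
    written = file.write(b"\x7f")   # write() returns the number of bytes written

size_on_disk = os.path.getsize(path)
with open(path, "rb") as file:
    first_byte = file.read()[0]     # indexing a bytes object yields an int

print(written, size_on_disk, first_byte)   # 1 1 127
```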

Some kind of characters must be used to represent the bytes, what are they?

OS interfaces (the way you access hardware such as disks) are defined in terms of bytes, e.g., POSIX read(2). The byte is the fundamental unit here: you can read and write bytes directly, and you don't need any intermediate representation. Watch Richard Feynman. Why.

How bytes are represented physically is between OS drivers and the hardware -- it may be anything -- you don't need to worry about it: it is hidden behind the uniform OS interface. See How is data physically written, read and stored inside hard drives?

You could call os.read() directly in Python, but you don't need to: file.read() does it for you. (Python 3 file objects are implemented directly on top of the POSIX interface. Python 2 I/O uses the C stdio library, which in turn uses OS interfaces to implement its functionality.)
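
For comparison, here is a minimal sketch of both routes, assuming a small sample file created in a temporary directory. The file name and contents are made up for illustration.

```python
import os
import tempfile

# Create a small sample file to read back.
path = os.path.join(tempfile.mkdtemp(), "sample.bin")
with open(path, "wb") as f:
    f.write(b"\x00\x7fA\xff")

# Low-level route: a file descriptor plus os.read().
# O_BINARY only exists on Windows, hence the getattr fallback.
fd = os.open(path, os.O_RDONLY | getattr(os, "O_BINARY", 0))
try:
    low_level = os.read(fd, 1024)   # read up to 1024 bytes
finally:
    os.close(fd)

# High-level route: a file object opened in 'rb' mode.
with open(path, "rb") as f:
    high_level = f.read()

print(low_level == high_level)
print(list(low_level))   # [0, 127, 65, 255]
```

Both calls return the same bytes object; the file object only adds buffering and convenience on top.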

As you point out, it's up to the OS drivers and hardware to establish how bytes are written, but the Python interpreter would then be able to read them. So it's reading something - what is that? It's not reading magnetic orientation of particles on the disk, is it? It's reading something symbolic, and I want access to it.

It's reading bytes. A hard disk is a small computer, and therefore interesting things may happen, but that does not change the fact that it's bytes all the way down (as far as "symbolic" or software is concerned).

The book "CODE The Hidden Language of Computer Hardware and Software" provides a very gentle introduction into how information is represented in computers — the word "byte" is not defined until page 180. To see through abstraction levels used in computers, the course "From NAND to Tetris" can help.

A file that contains binary data is called a binary file. Any formatted or unformatted binary data can be stored in a binary file; such a file is not human-readable and is used by the computer directly. When a binary file has to be read by a person or transferred from one location to another, its content may be converted or encoded into a human-readable format. Binary files often use the .bin extension, although any extension is possible. The content of a binary file can be read by using a built-in function or a module. This tutorial shows different ways to read binary files in Python.

Main contents

  • Pre-requisite:
  • Example-1: Read the binary file of string data into the byte array
  • Example-2: Read the binary file of string data into the array
  • Example-3: Read binary file using NumPy
  • Syntax of tofile():
  • Syntax of fromfile():
  • Conclusion:

Pre-requisite:

Before trying the examples in this tutorial, create the binary files that the example scripts use. The two Python scripts below create them: binary1.py creates a binary file named string.bin that contains string data, and binary2.py creates a binary file named number_list.bin that contains a list of numeric data.

binary1.py

# Create a binary file; the with block closes it automatically
with open("string.bin", "wb") as file_handler:
    # Write two lines of text as bytes
    file_handler.write(b"Welcome to LinuxHint.\nLearn Python Programming.")

binary2.py

# Declare a list of numeric values
numbers = [10, 30, 45, 60, 70, 85, 99]

# Convert the list to a byte array
barray = bytearray(numbers)

# Write the byte array into a binary file
with open("number_list.bin", "wb") as file:
    file.write(barray)
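
Note that bytearray() stores each number as a single byte, so this approach only works for integers from 0 to 255. The sketch below shows what happens with a larger value and one possible alternative, int.to_bytes(); the two-byte width and little-endian order are assumptions chosen for illustration.

```python
numbers = [10, 30, 45, 60, 70, 85, 99]
barray = bytearray(numbers)
print(len(barray))   # 7, one byte per number

# A value outside 0..255 does not fit in one byte.
try:
    bytearray([10, 300])
    error = None
except ValueError as exc:
    error = str(exc)
print(error)

# Larger integers need an explicit width and byte order.
wide = (300).to_bytes(2, "little")
print(list(wide))    # [44, 1] because 300 == 44 + 1 * 256
```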

Example-1: Read the binary file of string data into the byte array

There are many ways to read a binary file in Python. You can read a particular number of bytes or the full content of the file at once. Create a Python file with the following script. The open() function is used to open string.bin for reading. The read() function is used to read 7 bytes from the file in each iteration of the while loop and print them. Next, the read() function is called without any argument to read the full content of the binary file, which is printed at the end.

# Open the binary file for reading
with open("string.bin", "rb") as file_handler:
    # Read the first seven bytes from the binary file
    data_byte = file_handler.read(7)
    print("Print seven bytes in each iteration:")
    # Keep reading until read() returns an empty bytes object
    while data_byte:
        print(data_byte)
        data_byte = file_handler.read(7)

# Read the entire file as a single bytes object
with open("string.bin", "rb") as fh:
    content = fh.read()

print("Print the full content of the binary file:")
print(content)

Output:

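Running the script prints the file in 7-byte chunks, b'Welcome', b' to Lin', b'uxHint.', b'\nLearn ', b'Python ', b'Program' and b'ming.', followed by the full content as a single bytes object: b'Welcome to LinuxHint.\nLearn Python Programming.'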
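
The same chunked read can also be written without an explicit while loop. The following is a sketch of one alternative, not the tutorial's own method: it uses the two-argument form of iter(), which keeps calling a function until it returns a sentinel value. The file is recreated in a temporary directory so the block runs on its own.

```python
import os
import tempfile
from functools import partial

# Recreate string.bin in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "string.bin")
with open(path, "wb") as fh:
    fh.write(b"Welcome to LinuxHint.\nLearn Python Programming.")

# iter(callable, sentinel) keeps calling fh.read(7) until it returns b"".
with open(path, "rb") as fh:
    chunks = list(iter(partial(fh.read, 7), b""))

for chunk in chunks:
    print(chunk)
```

Joining the chunks with b"".join(chunks) gives back the full content of the file.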

Example-2: Read the binary file of string data into the array

Create a Python file with the following script to read the binary file named number_list.bin created previously. This binary file contains a list of numeric data. As in the previous example, the open() function is used to open the binary file for reading. Next, the first 5 numbers are read from the binary file and converted into a list before printing.

# Open the binary file for reading
with open("number_list.bin", "rb") as file:
    # Read the first five bytes and convert them into a list of integers
    number = list(file.read(5))

# Print the list
print(number)

Output:

The binary file contains 7 numbers, and the script prints the first five of them: [10, 30, 45, 60, 70]
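
The script above produces a plain list. If an actual array object is wanted, one option (an assumption about the intent, not something the tutorial itself uses) is the standard library's array module with the 'B' type code for unsigned bytes. The sketch recreates number_list.bin in a temporary directory so it can run on its own.

```python
import os
import tempfile
from array import array

# Recreate number_list.bin in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "number_list.bin")
with open(path, "wb") as f:
    f.write(bytearray([10, 30, 45, 60, 70, 85, 99]))

# 'B' means unsigned char: one item per byte.
values = array("B")
with open(path, "rb") as f:
    values.fromfile(f, 5)   # read exactly five items from the file

print(values.tolist())   # [10, 30, 45, 60, 70]
```

array.fromfile() raises EOFError if fewer items are available than requested, so the count should not exceed the file length.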

Example-3: Read binary file using NumPy

This part of the tutorial shows how to create a binary file from a NumPy array and how to read the content of the binary file back with the NumPy module. Before running the script below, install NumPy, either from the terminal (for example, with pip install numpy) or through the package manager of the Python editor where the script will be executed. The tofile() function writes an array to a text or binary file, and the fromfile() function creates an array by reading a text or binary file.

Syntax of tofile():

ndarray.tofile(file, sep='', format='%s')

The first argument is mandatory and takes an open file object or a filename (a string or a path). If a filename is provided, the file is created. The second argument, sep, is optional and sets the separator between array items for text output; with the default empty string, the array is written as a binary file. The third argument, format, is also optional and sets the format string used for each item in text output.
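
As an illustration of the sep and format arguments, the sketch below writes the same small array once as raw binary and once as comma-separated text. The int16 dtype, the file names and the temporary directory are assumptions made for the example.

```python
import os
import tempfile
import numpy as np

folder = tempfile.mkdtemp()
arr = np.array([34, 89, 30], dtype=np.int16)

# Default sep="" writes the raw bytes of the array.
binary_path = os.path.join(folder, "list.bin")
arr.tofile(binary_path)

# A non-empty sep switches to text output, one formatted item at a time.
text_path = os.path.join(folder, "list.txt")
arr.tofile(text_path, sep=",", format="%d")

binary_size = os.path.getsize(binary_path)   # 3 items * 2 bytes each
with open(text_path) as f:
    text_content = f.read()

print(binary_size)    # 6
print(text_content)   # 34,89,30
```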

Syntax of fromfile():

numpy.fromfile(file, dtype=float, count=-1, sep='', offset=0, *, like=None)

The first argument is mandatory and takes an open file object or a filename (a string or a path). If a filename is provided, the content of that file is read. The dtype argument defines the data type of the returned array and, for a binary file, determines how the bytes are interpreted. The count argument is the number of items to read; -1 means all items. The sep argument is the separator between items when the file is a text file; an empty string means the file is treated as binary. The offset argument is the offset in bytes from the file's current position and applies only to binary files. The last argument, like, is a reference object that allows the creation of arrays that are not NumPy arrays.
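
The count and offset arguments can be sketched as follows, assuming a file of six 64-bit integers written to a temporary directory. The offset argument requires a reasonably recent NumPy release.

```python
import os
import tempfile
import numpy as np

# Write six 64-bit integers (8 bytes each) to a binary file.
path = os.path.join(tempfile.mkdtemp(), "list.bin")
np.array([34, 89, 30, 45, 90, 11], dtype=np.int64).tofile(path)

everything = np.fromfile(path, dtype=np.int64)

# count limits how many items are read.
first_two = np.fromfile(path, dtype=np.int64, count=2)

# offset is given in bytes: skip two 8-byte items.
skipping_two = np.fromfile(path, dtype=np.int64, offset=2 * 8)

print(everything.tolist())     # [34, 89, 30, 45, 90, 11]
print(first_two.tolist())      # [34, 89]
print(skipping_two.tolist())   # [30, 45, 90, 11]
```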

Create a python file with the following script to create a binary file using NumPy array and read and print the content of the binary file.

# Import NumPy module
import numpy as np

# Declare a NumPy array with an explicit 64-bit integer type
nparray = np.array([34, 89, 30, 45, 90, 11], dtype=np.int64)

# Create a binary file from the NumPy array
nparray.tofile("list.bin")

# Read the data back with the same dtype and print it
print(np.fromfile("list.bin", dtype=np.int64))

Output:

On a platform where the array is stored as 64-bit integers, the script prints the original values: [34 89 30 45 90 11]
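
A file written by tofile() carries no type information, so the dtype passed to fromfile() has to match the one used for writing. The sketch below, which assumes a file of four 32-bit integers in a temporary directory, shows how the same bytes are interpreted under different dtypes.

```python
import os
import tempfile
import numpy as np

# Write four 32-bit integers: 16 bytes in total.
path = os.path.join(tempfile.mkdtemp(), "list.bin")
np.array([34, 89, 30, 45], dtype=np.int32).tofile(path)

as_int32 = np.fromfile(path, dtype=np.int32)   # matches the writer
as_int64 = np.fromfile(path, dtype=np.int64)   # pairs of int32 values fused into one item
as_uint8 = np.fromfile(path, dtype=np.uint8)   # one item per byte

print(as_int32.tolist())   # [34, 89, 30, 45]
print(len(as_int64))       # 2
print(len(as_uint8))       # 16
```

Only the int32 read recovers the original numbers; the other two return arrays of a different length with unrelated values.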

Conclusion:

Three different ways to read a binary file have been shown in this tutorial using simple examples. The first example returned the content of the binary file as a bytes object. The second example returned the content of the binary file as a list of integers. The last example returned the content of the binary file as a NumPy array.