
Probability histogram plots

Key focus: Shown with examples, let's estimate and plot the probability density function of a random variable using the histogram function in Python's Matplotlib.

Generating random variables with a required probability distribution is of great importance when simulating a communication system. Let's see how to generate a simple random variable, estimate and plot the probability density function (PDF) from the generated data, and then match it with the intended theoretical PDF. The normal random variable is used here for illustration.

Step 1: Generate random samples

A survey of the basic methods commonly used to generate a given random variable is given in [1]. For this illustration, we consider the normal random variable with the following parameters: μ (mean) and σ (standard deviation). First, generate a vector of normally distributed random numbers of sufficient length (say 100000) with some valid values for μ and σ. There is more than one way to do this; two of them are given below.

● Method 1: Use the built-in numpy.random.normal() function (requires the numpy package to be installed).

import numpy as np

mu = 10; sigma = 2.5  # mean = 10, standard deviation = 2.5
L = 100000  # length of the random vector

# Random samples generated using numpy.random.normal()
samples_normal = np.random.normal(loc=mu, scale=sigma, size=(L, 1))  # generate normally distributed samples

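● Method 2: The Box-Muller transformation [2] generates a pair of normally distributed random numbers (Z1, Z2) by transforming a pair of independent, uniformly distributed random samples (U1, U2). The transformation is given by

$$Z_1 = \sqrt{-2\ln U_1}\,\cos(2\pi U_2), \qquad Z_2 = \sqrt{-2\ln U_1}\,\sin(2\pi U_2)$$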

# Samples generated using the Box-Muller transformation

U1 = np.random.uniform(low=0, high=1, size=(L, 1))  # uniformly distributed random numbers U(0,1)
U2 = np.random.uniform(low=0, high=1, size=(L, 1))  # uniformly distributed random numbers U(0,1)

a = np.sqrt(-2 * np.log(U1))
b = 2 * np.pi * U2

Z = a * np.cos(b)  # standard normal distributed numbers
samples_box_muller = Z * sigma + mu  # normal distribution with mean mu and standard deviation sigma
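The listing above keeps only the cosine branch of the transformation. As a minimal sketch, assuming NumPy's Generator API and a hypothetical helper name box_muller, both outputs of the pair can be produced and sanity-checked against the requested mean and standard deviation:

```python
import numpy as np

def box_muller(n, mu=0.0, sigma=1.0, seed=None):
    """Return two arrays of n normal samples built from uniform samples."""
    rng = np.random.default_rng(seed)
    # 1 - U lies in (0, 1], which avoids log(0) in the radius term.
    u1 = 1.0 - rng.random(n)
    u2 = rng.random(n)
    radius = np.sqrt(-2.0 * np.log(u1))
    angle = 2.0 * np.pi * u2
    z1 = radius * np.cos(angle)  # first standard normal output
    z2 = radius * np.sin(angle)  # second standard normal output
    return mu + sigma * z1, mu + sigma * z2

x1, x2 = box_muller(100_000, mu=10, sigma=2.5, seed=0)
print(x1.mean(), x1.std(), x2.mean(), x2.std())
```

Both printed means should sit close to 10 and both standard deviations close to 2.5; the seed value is arbitrary and only makes the run repeatable.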

Step 2: Plot the estimated histogram

Typically, if we have a vector of random numbers drawn from a distribution, we can estimate its PDF using a histogram. Matplotlib's hist function can compute and plot the histogram. If the density argument is set to True, hist computes a normalized histogram such that the area under the histogram sums to 1. Estimate and plot the normalized histogram using the hist function:

# For plotting
import matplotlib.pyplot as plt
%matplotlib inline
plt.style.use('ggplot')

fig, ax0 = plt.subplots(ncols=1, nrows=1)  # create the plot axes
# Compute and plot the histogram; return the computed values and bin edges
values, bins, _ = ax0.hist(samples_normal, bins=100, density=True, label="Histogram of samples")
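The normalization performed by density=True can be checked numerically. The sketch below uses numpy.histogram, which applies the same density convention, on an illustrative sample (the seed and variable names are assumptions) and sums bar height times bar width:

```python
import numpy as np

rng = np.random.default_rng(1)
samples = rng.normal(loc=10, scale=2.5, size=100_000)

# density=True rescales the counts so that the bars integrate to 1.
values, edges = np.histogram(samples, bins=100, density=True)
widths = np.diff(edges)          # width of each bin
area = np.sum(values * widths)   # total area under the histogram
print(area)
```

The area equals 1 up to floating-point rounding, which is what allows the histogram to be compared directly with a PDF.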

Step 3: Theoretical PDF

For verification, overlay the theoretical PDF of the intended distribution. The theoretical PDF of normally distributed random samples is given by

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

The theoretical PDF of the normal distribution is readily available from the stats.norm.pdf() function in the SciPy package.

from scipy import stats
bin_centers = 0.5 * (bins[1:] + bins[:-1])
pdf = stats.norm.pdf(x=bin_centers, loc=mu, scale=sigma)  # compute the probability density function
ax0.plot(bin_centers, pdf, label="PDF", color='black')  # plot the PDF
ax0.legend()  # legend entries
ax0.set_title('PDF of samples from numpy.random.normal()');

Figure 1: Estimated PDF (histogram) and theoretical PDF for samples generated using the numpy.random.normal() function

The histogram and theoretical PDF of the random samples generated with the Box-Muller transformation can be plotted in the same way.
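As a rough numerical counterpart to that plot, the sketch below regenerates Box-Muller samples, computes the normalized histogram, and measures how far it sits from the theoretical normal PDF at the bin centers. The seed and the use of numpy.histogram instead of a Matplotlib axes are assumptions made to keep the example self-contained:

```python
import numpy as np

mu, sigma, n = 10.0, 2.5, 100_000
rng = np.random.default_rng(2)

# Box-Muller: cosine branch only, as in the listing above.
u1 = 1.0 - rng.random(n)  # in (0, 1], avoids log(0)
u2 = rng.random(n)
samples_box_muller = mu + sigma * np.sqrt(-2.0 * np.log(u1)) * np.cos(2.0 * np.pi * u2)

values, edges = np.histogram(samples_box_muller, bins=100, density=True)
bin_centers = 0.5 * (edges[1:] + edges[:-1])

# Theoretical normal PDF evaluated at the bin centers.
pdf = np.exp(-((bin_centers - mu) ** 2) / (2.0 * sigma**2)) / (sigma * np.sqrt(2.0 * np.pi))
max_abs_error = np.max(np.abs(values - pdf))
print(max_abs_error)
```

A small maximum deviation indicates that the estimated and theoretical curves would overlap closely when plotted.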

In this tutorial, you'll be equipped to make production-quality, presentation-ready Python histogram plots with a range of choices and features.

If you have introductory to intermediate knowledge of Python and statistics, you can use this article as a one-stop shop for building and plotting histograms in Python using libraries from its scientific stack, including NumPy, Matplotlib, Pandas, and Seaborn.

A histogram is a great tool for quickly assessing a probability distribution, and it is intuitively understood by almost any audience. Python offers a handful of different options for building and plotting histograms. Most people know a histogram by its graphical representation, which is similar to a bar graph.

This article will guide you through creating plots like the one above as well as more complex ones. Here's what you'll cover:

Building histograms in pure Python, without using third-party libraries
Constructing histograms with NumPy to summarize the underlying data
Plotting the resulting histogram with Matplotlib, Pandas, and Seaborn


Histograms in Pure Python

When you are preparing to plot a histogram, it is simplest not to think in terms of bins but rather to report how many times each value appears (a frequency table). A Python dictionary is well suited to this task:

>>> # Need not be sorted, necessarily
>>> a = [0, 1, 1, 1, 2, 3, 7, 7, 23]

>>> def count_elements(seq) -> dict:
...     """Tally elements from `seq`."""
...     hist = {}
...     for i in seq:
...         hist[i] = hist.get(i, 0) + 1
...     return hist

>>> counted = count_elements(a)
>>> counted
{0: 1, 1: 3, 2: 1, 3: 1, 7: 2, 23: 1}

count_elements() returns a dictionary with the unique elements of the sequence as keys and their frequencies (counts) as values. Within the loop over the sequence, hist[i] = hist.get(i, 0) + 1 says, "for each element of the sequence, increment its corresponding value in hist by 1."

In fact, this is precisely what the collections.Counter class from Python's standard library does. It subclasses a Python dictionary and overrides its .update() method:

>>> from collections import Counter

>>> recounted = Counter(a)
>>> recounted
Counter({1: 3, 7: 2, 0: 1, 2: 1, 3: 1, 23: 1})

You can confirm that your handmade function does virtually the same thing as collections.Counter by testing for equality between the two:

>>> recounted.items() == counted.items()
True

Technical detail: The counting done by collections.Counter above defaults to a more highly optimized C function if one is available. Within the pure-Python function count_elements(), one micro-optimization you could make is to declare get = hist.get before the for loop. This binds the method to a variable for faster calls within the loop.
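A minimal sketch of that micro-optimization follows. The function name count_elements_fast is hypothetical, and any speed gain is small and depends on the interpreter version:

```python
from collections import Counter

def count_elements_fast(seq) -> dict:
    """Tally elements from `seq`, looking up `hist.get` only once."""
    hist = {}
    get = hist.get  # bind the bound method once, outside the loop
    for i in seq:
        hist[i] = get(i, 0) + 1
    return hist

a = [0, 1, 1, 1, 2, 3, 7, 7, 23]
counted_fast = count_elements_fast(a)
print(counted_fast)
print(counted_fast == dict(Counter(a)))
```

The result is the same frequency table as before; only the attribute lookup inside the loop is avoided.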

It can be helpful to build simplified functions from scratch as a first step toward understanding more complex ones. Let's reinvent the wheel a little with an ASCII histogram that takes advantage of Python's output formatting:
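A minimal sketch of such an ASCII histogram is shown below. The name ascii_histogram, the five-column key width, and the choice to return the printed lines are assumptions made for illustration:

```python
def ascii_histogram(seq) -> list:
    """Print a horizontal frequency plot of `seq` and return its lines."""
    counted = {}
    for item in seq:
        counted[item] = counted.get(item, 0) + 1
    lines = []
    for key in sorted(counted):
        # Right-align the key in 5 columns, then draw one '+' per occurrence.
        lines.append("{0:5d} {1}".format(key, "+" * counted[key]))
    print("\n".join(lines))
    return lines

lines = ascii_histogram([0, 1, 1, 1, 2, 3, 7, 7, 23])
```

Each output row starts with a value and is followed by as many plus signs as the value has occurrences.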

This function creates a sorted frequency plot in which counts are represented as tallies of plus (+) symbols. Calling sorted() on a dictionary returns a sorted list of its keys, and you then access the corresponding value for each key. To see this in action, you can create a slightly larger dataset with Python's random module:
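One way to build such a dataset with the random module is sketched below. The values in vals, the seed, and the helper name make_sample are illustrative assumptions:

```python
import random

def make_sample(seed=1):
    """Repeat each value in `vals` between 5 and 15 times."""
    random.seed(seed)  # seed the PRNG so the run is repeatable
    vals = [1, 3, 4, 6, 8, 9, 10]
    # One random frequency per value, produced lazily by a generator expression.
    freq = (random.randint(5, 15) for _ in vals)
    data = []
    for f, v in zip(freq, vals):
        data.extend([v] * f)
    return data

data = make_sample(seed=1)
print(len(data), data[:10])
```

Calling make_sample() twice with the same seed returns identical lists, which is the reproducibility that seeding provides.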

Here, you are simulating draws from a list of values, with the frequency of each value supplied by a generator expression. The resulting sample repeats each value a certain number of times, between 5 and 15.

Note: random.seed() is used to seed, or initialize, the underlying pseudorandom number generator (PRNG) used by random. It may sound like an oxymoron, but this is a way of making random data reproducible and deterministic. That is, if you copy the code here as is, you should get exactly the same histogram, because the first call to the generator after seeding it will produce identical "random" data using the Mersenne Twister.


Building Up From the Base: Histogram Calculations in NumPy

Thus far, you have been working with what could best be called "frequency tables." But mathematically, a histogram is a mapping of bins (intervals) to frequencies. More technically, it can be used to approximate the probability density function (PDF) of the underlying variable.

Moving on from the "frequency table" above, a true histogram first "bins" the range of values and then counts the number of values that fall into each bin. This is what NumPy's histogram() function does, and it is the basis for other functions you'll see later in Python libraries such as Matplotlib and Pandas.

Consider a sample of floats drawn from the Laplace distribution. This distribution has fatter tails than a normal distribution and has two descriptive parameters (location and scale).

In this case, you are working with a continuous distribution, and it would not be very helpful to tally each float independently, down to the umpteenth decimal place. Instead, you can bin or "bucket" the data and count the observations that fall into each bin. The histogram is the resulting count of values within each bin:
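A minimal sketch of this binning step, assuming an illustrative Laplace sample (the location, scale, size, and seed below are not taken from the text) and NumPy's Generator API:

```python
import numpy as np

rng = np.random.default_rng(444)
d = rng.laplace(loc=15, scale=3, size=500)  # illustrative parameters

# With no `bins` argument, np.histogram uses 10 equal-width bins.
hist, bin_edges = np.histogram(d)
print(hist)
print(bin_edges)
print(hist.size, bin_edges.size)
```

The counts add up to the sample size, and there is one more bin edge than there are counts.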

This result may not be immediately intuitive. np.histogram() by default uses 10 equally sized bins and returns a tuple of the frequency counts and the corresponding bin edges. They are edges in the sense that there will be one more bin edge than there are members of the histogram.

Technical detail: All but the last (rightmost) bin are half-open. That is, all bins but the last are [inclusive, exclusive), and the final bin is [inclusive, inclusive].

A very condensed breakdown of how NumPy constructs the bins looks like this:
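The sketch below illustrates the idea for the small integer sequence used earlier: take the minimum and maximum as the outer edges and space the remaining edges evenly between them. It is a simplified illustration rather than NumPy's actual source, and the variable names are assumptions:

```python
import numpy as np

a = np.array([0, 1, 1, 1, 2, 3, 7, 7, 23])

first_edge, last_edge = a.min(), a.max()  # 0 and 23
n_equal_bins = 10                         # NumPy's default number of bins
# n bins need n + 1 edges, including both end points.
bin_edges = np.linspace(first_edge, last_edge, num=n_equal_bins + 1, endpoint=True)
print(bin_edges)
```

The edges produced this way agree with the second element returned by np.histogram(a).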

The case above makes a lot of sense: 10 equally spaced bins over a peak-to-peak range of 23 means intervals of width 2.3.

From there, the function delegates to either np.bincount() or np.searchsorted(). bincount() itself can be used to effectively construct the "frequency table" that you started off with here, with the distinction that values with zero occurrences are included:
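A short sketch of that use of np.bincount(), compared against collections.Counter on the same integer data (the variable names are illustrative):

```python
from collections import Counter
import numpy as np

a = [0, 1, 1, 1, 2, 3, 7, 7, 23]

hist = np.bincount(a)  # one slot per integer from 0 up to max(a)
print(hist)

# Same counts as a Counter once the zero-count slots are filled in.
counted = Counter(a)
same = all(int(hist[i]) == counted.get(i, 0) for i in range(len(hist)))
print(same)
```

The array has 24 entries, most of them zero, because every integer between 0 and 23 gets its own slot.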

Note: hist here is really using bins of width 1.0 rather than "discrete" counts. Hence, this only works for counting integers, not floats.

Visualizing Histograms with Matplotlib and Pandas

Now that you've seen how to build a histogram in Python from the ground up, let's see how other Python packages can do the job for you. Matplotlib provides the functionality to visualize Python histograms out of the box with a versatile wrapper around NumPy's histogram():
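A minimal sketch of such a call, assuming an illustrative Laplace sample and the non-interactive Agg backend so that it runs without a display; the styling arguments and labels are arbitrary choices:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(444)
d = rng.laplace(loc=15, scale=3, size=500)  # illustrative sample

fig, ax = plt.subplots()
# hist() returns the counts, the bin edges, and the drawn bar patches.
n, bins, patches = ax.hist(d, bins="auto", alpha=0.7, rwidth=0.85)
ax.set_xlabel("Value")
ax.set_ylabel("Frequency")
ax.set_title("Histogram of a Laplace sample")
plt.close(fig)  # use plt.show() instead when working interactively
```

The returned counts and edges are the same arrays that np.histogram() would compute for the same bins argument.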

As defined earlier, a plot of a histogram uses its bin edges on the x-axis and the corresponding frequencies on the y-axis. In the chart above, passing bins='auto' chooses between two algorithms to estimate the "ideal" number of bins. At a high level, the goal of the algorithm is to choose a bin width that generates the most faithful representation of the data. For more on this subject, which can get pretty technical, check out Choosing Histogram Bins from the Astropy docs.

Staying in Python's scientific stack, pandas' Series.plot.hist() uses Matplotlib's hist() to draw a Matplotlib histogram of the input Series.

DataFrame.hist() is similar but produces a histogram for each column of data in the DataFrame.


Plotting a Kernel Density Estimate (KDE)

In this tutorial, you have been working with samples, statistically speaking. Whether the data is discrete or continuous, it is assumed to be derived from a population that has a true, exact distribution described by just a few parameters.

A kernel density estimate (KDE) is a way to estimate the probability density function (PDF) of the random variable that "underlies" our sample. KDE is a means of data smoothing.
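To make the smoothing idea concrete, here is a hand-rolled Gaussian KDE in NumPy. It is a sketch for illustration, not the implementation used by any library; the function name gaussian_kde_sketch and the choice of Scott's rule for the bandwidth are assumptions:

```python
import numpy as np

def gaussian_kde_sketch(samples, grid):
    """Average of Gaussian bumps centered on each sample (hand-rolled KDE)."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    # Scott's rule of thumb for the bandwidth in one dimension.
    bandwidth = samples.std(ddof=1) * n ** (-1.0 / 5.0)
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2.0 * np.pi)
    return kernels.sum(axis=1) / (n * bandwidth)

rng = np.random.default_rng(0)
samples = rng.normal(size=1000)
grid = np.linspace(-6.0, 6.0, 601)
density = gaussian_kde_sketch(samples, grid)
area = np.sum(density) * (grid[1] - grid[0])  # Riemann-sum estimate of the integral
print(area)
```

Each sample contributes one small bell curve, and their average is a smooth curve whose area is approximately 1, just like a PDF.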

Sticking with the pandas library, you can create and overlay density plots using plot.kde(), which is available for both Series and DataFrame objects. But first, generate two distinct data samples for comparison.

Next, plot each histogram on the same Matplotlib axes.

These methods leverage SciPy's gaussian_kde(), which results in a smoother-looking PDF.

If you take a closer look at this function, you can see how well it approximates the "true" PDF for a relatively small sample of 1000 data points. Below, you can first build the "analytical" distribution with scipy.stats.norm(). This is a class instance that encapsulates the statistical standard normal distribution, its moments, and its descriptive functions. Its PDF is "exact" in the sense that it is defined precisely as

$$f(x) = \frac{\exp(-x^2/2)}{\sqrt{2\pi}}$$

Building from there, you can take a random sample of 1000 data points from this distribution, then attempt to back into an estimation of the PDF with scipy.stats.gaussian_kde():
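A sketch of that workflow follows, assuming SciPy and Matplotlib are installed. The variable names, the seed, the evaluation grid, and the plot styling are illustrative choices:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

dist = stats.norm()                         # standard normal, the "analytical" distribution
samp = dist.rvs(size=1000, random_state=0)  # random sample drawn from it
kde = stats.gaussian_kde(samp)              # estimated PDF built from the sample

x = np.linspace(-5.0, 5.0, 301)             # points at which both curves are evaluated
analytical = dist.pdf(x)
estimated = kde(x)
max_gap = np.max(np.abs(analytical - estimated))
print(max_gap)

fig, ax = plt.subplots()
ax.plot(x, analytical, label="Analytical PDF")
ax.plot(x, estimated, label="Gaussian KDE")
ax.legend()
ax.set_title(r"$f(x) = \frac{\exp(-x^2/2)}{\sqrt{2\pi}}$")
```

The printed gap is the largest vertical distance between the two curves on the grid, which stays small even with only 1000 samples.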

This is a bigger chunk of code, so let's take a second to touch on a few key lines:

SciPy's stats subpackage lets you create Python objects that represent analytical distributions, which you can sample from to create actual data. So stats.norm() represents a normal continuous random variable, and you generate random numbers from it with its rvs() method.

To evaluate both the analytical PDF and the Gaussian KDE, you need an array of quantiles (standard deviations above or below the mean, for a normal distribution). The object returned by stats.gaussian_kde() represents an estimated PDF that you need to evaluate on an array to produce something visually meaningful in this case.

The last line contains some LaTeX, which integrates nicely with Matplotlib.

A Fancy Alternative with Seaborn

Let's bring one more Python package into the mix. Seaborn has a distplot() function that plots the histogram and KDE for a univariate distribution in one step, here using the NumPy array of Laplace samples from earlier. Note that distplot() is deprecated in recent Seaborn releases in favor of histplot() and displot().

The call above produces a KDE. There is also the option to fit a specific distribution to the data. This is different from a KDE and consists of parameter estimation for generic data and a specified distribution name.

Again, note the slight difference. In the first case, you are estimating some unknown PDF; in the second, you are taking a named distribution and estimating the parameters that best describe it given the data.
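The second case, parameter estimation for a named distribution, can be sketched without any plotting library. For a Laplace distribution the maximum-likelihood estimates have a closed form: the location is the sample median and the scale is the mean absolute deviation from it. The sample below and its true parameters are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(444)
d = rng.laplace(loc=15, scale=3, size=5000)  # sample with known parameters

# Maximum-likelihood estimates for a Laplace distribution:
# location is the sample median, scale is the mean absolute deviation from it.
loc_hat = np.median(d)
scale_hat = np.mean(np.abs(d - loc_hat))
print(loc_hat, scale_hat)
```

The two estimates land close to the true values of 15 and 3. Unlike a KDE, the result is just two numbers that pin down a curve of a fixed, assumed shape.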


Other Tools in Pandas

In addition to its plotting tools, pandas also offers a convenient value_counts() method that computes a histogram of non-null values for a pandas Series.

Elsewhere, pandas.cut() is a convenient way to bin values into arbitrary intervals. Let's say you have some data on ages of individuals and want to bucket them sensibly.
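A brief sketch of the two tools, assuming pandas is installed. The data, the bin boundaries, and the age-group labels are hypothetical values chosen for illustration:

```python
import numpy as np
import pandas as pd

data = pd.Series([0, 1, 1, 1, 2, 3, 7, 7, 23, np.nan])
counts = data.value_counts().sort_index()  # NaN is dropped by default
print(counts)

ages = pd.Series([1, 1, 3, 5, 8, 10, 12, 15, 18, 18, 19, 20, 25, 30, 40, 51, 52])
bins = [0, 10, 13, 18, 21, np.inf]
labels = ["child", "preteen", "teen", "military_age", "adult"]
# Intervals are closed on the right by default: (0, 10], (10, 13], and so on.
groups = pd.cut(ages, bins=bins, labels=labels)
print(groups.value_counts().sort_index())
```

value_counts() works on the binned result as well, so the two calls combine naturally into a labeled frequency table.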

What's nice is that both of these operations ultimately use Cython code, which makes them competitive on speed while maintaining their flexibility.

Được rồi, vậy tôi nên sử dụng cái nào?

Tại thời điểm này, bạn đã thấy nhiều hàm và phương thức để lựa chọn để vẽ biểu đồ Python. Làm thế nào để họ so sánh? . ” Đây là bản tóm tắt các chức năng và phương thức mà bạn đã đề cập cho đến nay, tất cả đều liên quan đến việc chia nhỏ và biểu diễn các bản phân phối trong Python

Each entry below lists what you have or want to do, what to consider using, and a note.

● You have/want to: clean-cut integer data housed in a data structure such as a list, tuple, or set, and you want to create a Python histogram without importing any third-party libraries.
Consider using: collections.Counter() from the Python standard library, which offers a fast and straightforward way to get frequency counts from a container of data.
Note(s): This is a frequency table, so it doesn't use the concept of binning as a "true" histogram does.

● You have/want to: a large array of data, and you want to compute the "mathematical" histogram that represents bins and the corresponding frequencies.
Consider using: NumPy's np.histogram() and np.bincount(), which are useful for computing the histogram values numerically along with the corresponding bin edges.
Note(s): For more, check out np.digitize().

● You have/want to: tabular data in a Pandas Series or DataFrame object.
Consider using: Pandas methods such as Series.plot.hist(), DataFrame.plot.hist(), Series.value_counts(), and cut(), as well as Series.plot.kde() and DataFrame.plot.kde().
Note(s): Check out the Pandas visualization docs for inspiration.

● You have/want to: create a highly customizable, fine-tuned plot from any data structure.
Consider using: pyplot.hist(), a widely used histogram plotting function that uses np.histogram() and is the basis for Pandas' plotting functions.
Note(s): Matplotlib, and especially its object-oriented framework, is great for fine-tuning the details of a histogram. This interface can take a bit of time to master, but it ultimately lets you be very precise about how any visualization is laid out.

● You have/want to: pre-canned design and integration.
Consider using: Seaborn's distplot(), for combining a histogram and KDE plot or for plotting a fitted distribution.
Note(s): Essentially a "wrapper around a wrapper" that leverages a Matplotlib histogram internally, which in turn uses NumPy.
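To make the contrast between a frequency table and a binned histogram concrete, here is a minimal sketch. The sample data and variable names are hypothetical, chosen only for illustration, and the choice of three bins is an arbitrary assumption.

```python
from collections import Counter

import numpy as np

# Hypothetical sample data, used only for illustration
data = [1, 1, 2, 3, 3, 3, 7, 7, 9]

# Frequency table: one count per distinct value, no binning involved
freq = Counter(data)

# "Mathematical" histogram: counts per bin plus the bin edges
# (bins=3 is an arbitrary choice for this sketch)
counts, bin_edges = np.histogram(data, bins=3)

# Occurrences of each non-negative integer from 0 up to max(data)
int_counts = np.bincount(data)

# Index of the bin each observation falls into (1-based here)
bin_index = np.digitize(data, bin_edges[:-1])

print("Counter:  ", dict(freq))
print("histogram:", counts, bin_edges)
print("bincount: ", int_counts)
print("digitize: ", bin_index)
```

Counter keeps the distinct values themselves as keys, while np.histogram() groups nearby values into equal-width intervals, so the two results only coincide when every bin happens to contain exactly one distinct value.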


You can also find the code snippets from this article together in one script at the Real Python materials page.

With that, good luck creating histograms in the wild. Hopefully one of the tools above will suit your needs. Whatever you do, just don't use a pie chart.


About Brad Solomon

Brad is a software engineer and a member of the Real Python Tutorial Team.

Each tutorial at Real Python is created by a team of developers so that it meets our high quality standards. The team members who worked on this tutorial are Adriana, Dan, and Joanna.


