I am iterating through a dictionary and pulling out a set of values as an array. I am trying to build a DataFrame with each observation as a separate row.
X1 = []
for k, v in DF_grp:
    date = pd.to_datetime(v['Date'])
    usage = v['Usage'].astype(float)
    comm = v['comm'].astype(float)
    mdf = pd.DataFrame({'Id': k[0], 'date': date, 'usage': usage, 'comm': comm})
    # ratio of usage to comm, as a percentage
    mdf['used_ratio'] = ((mdf['usage'] / mdf['comm']).round(2)) * 100
    ts = pd.Series(mdf['usage'].values, index=mdf['date']).sort_index(ascending=True)
    ts2 = pd.Series(mdf['used_ratio'].values, index=mdf['date']).sort_index(ascending=True)
    ts2 = ts2.dropna()
    data = ts2.values.copy()
    if len(data) == 10:
        X1 = np.append(X1, data, axis=0)
        print(X1)
[0,0,0,0,1,0,0,0,1]
[1,2,3,4,5,6,7,8,9]
[0,5,6,7,8,9,1,2,3]
....
and so on. How do I capture all these arrays in a single DataFrame so that it looks like this:
[[0,0,0,0,1,0,0,0,1]] --- #row 1 in dataframe
[[1,2,3,4,5,6,7,8,9]] --- #row 2 in dataframe
Can the same task be split up further? There are more than 500K arrays in the dataset. Thank you.
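One possible approach, sketched under assumptions: collect each array in a plain Python list inside the loop and stack them once at the end, rather than calling np.append on every pass (which copies the whole array each time). The `arrays` list below is a hypothetical stand-in for the per-group `data` arrays, and the length check uses 9 to match the sample rows shown above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for the per-group arrays produced inside the loop.
arrays = [
    np.array([0, 0, 0, 0, 1, 0, 0, 0, 1]),
    np.array([1, 2, 3, 4, 5, 6, 7, 8, 9]),
    np.array([0, 5, 6, 7, 8, 9, 1, 2, 3]),
]

expected_len = 9   # assumption: every kept array has the same length
rows = []          # plain list; appending to it is cheap

for data in arrays:
    if len(data) == expected_len:
        rows.append(data)

# Stack once at the end: each array becomes one row of a 2-D array.
X1 = np.vstack(rows)
df = pd.DataFrame(X1)
print(df.shape)
```

With several hundred thousand arrays, building the list first and stacking once should avoid the repeated copying that np.append does inside a loop.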
Pandas Numpy
We will create a DataFrame from 1-D and 2-D NumPy arrays (numpy ndarray).
A DataFrame can be created from NumPy arrays. A NumPy array holds only one type of data, so we will create separate NumPy arrays for the different types of data and then combine them into one DataFrame with the names of the students (string) and their marks (numbers).
Our final DataFrame will have NAME (string) and the marks in two subjects, MATH and ENGLISH (integer).
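As a small illustration of the one-type-per-array point, the sketch below builds an integer array and a mixed array; the variable names `marks` and `mixed` are chosen here for illustration only. When a string and an integer are placed in the same array, NumPy converts both to strings.

```python
import numpy as np

marks = np.array([30, 40, 50, 45])   # all integers: integer dtype
mixed = np.array(['Alex', 30])       # string and integer together

print(marks.dtype)   # an integer dtype (exact width depends on the platform)
print(mixed.dtype)   # a Unicode string dtype
print(mixed)         # the integer 30 is stored as the string '30'
```

This is why the names and the marks are kept in separate arrays before they are combined in a DataFrame.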
Let us create a 1-D array to store the marks of the students. While creating the DataFrame we will add the column name MATH. This DataFrame holds the marks in MATH only, for four students.
import pandas as pd
import numpy as np
my_np = np.array([30, 40, 50, 45])  # NumPy array
# print(my_np)  # display the array
my_pd = pd.DataFrame(data=my_np, columns=['MATH'])
print(my_pd)
Output
   MATH
0    30
1    40
2    50
3    45
Using 2-D array to create the DataFrame
We will use one 2-D array to create the DataFrame. Here we will not add the column names.
import pandas as pd
import numpy as np
my_np1 = np.array([[30, 40, 50, 45],
                   [50, 60, 50, 55]])
my_pd = pd.DataFrame(data=[my_np1[0], my_np1[1]])
print(my_pd)
Output
    0   1   2   3
0  30  40  50  45
1  50  60  50  55
Adding columns
Before adding the column names we will transpose the DataFrame so that it has two columns, one per subject.
import pandas as pd
import numpy as np
my_np1 = np.array([[30, 40, 50, 45],
                   [50, 60, 50, 55]])
# transpose the DataFrame
my_pd = pd.DataFrame(data=[my_np1[0], my_np1[1]]).T
my_pd.columns = ['MATH', 'ENGLISH']
print(my_pd)
Output
   MATH  ENGLISH
0    30       50
1    40       60
2    50       50
3    45       55
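As an alternative sketch of the same step, the 2-D array itself can be transposed with `.T` and the column names passed to the constructor, so the DataFrame is built in a single call. This is offered as an equivalent variation, not a replacement for the approach above.

```python
import numpy as np
import pandas as pd

my_np1 = np.array([[30, 40, 50, 45],
                   [50, 60, 50, 55]])   # shape (2, 4): one row per subject

# Transpose the array so each subject becomes a column,
# and name the columns while constructing the DataFrame.
my_pd = pd.DataFrame(my_np1.T, columns=['MATH', 'ENGLISH'])
print(my_pd)
```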
Here we got the marks of two subjects in our DataFrame. Let us add one string column to this to include the student names.
import pandas as pd
import numpy as np
my_np1 = np.array([[30, 40, 50, 45],
                   [50, 60, 50, 55]])
my_names = np.array(['Alex', 'Ron', 'Jack', 'King'])
my_pd = pd.DataFrame(data=[my_names, my_np1[0], my_np1[1]]).T
my_pd.columns = ['NAMES', 'MATH', 'ENGLISH']
print(my_pd)
Output
  NAMES MATH ENGLISH
0  Alex   30      50
1   Ron   40      60
2  Jack   50      50
3  King   45      55
Adding new column to DataFrame
In the above code we have two integer columns showing the marks in two subjects. We can add one more column to show the sum of the marks, or total marks. Here we add the two columns with the + operator.
import pandas as pd
import numpy as np
my_np1 = np.array([[30, 40, 50, 45],
                   [50, 60, 50, 55]])
my_names = np.array(['Alex', 'Ron', 'Jack', 'King'])
my_pd = pd.DataFrame(data=[my_names, my_np1[0], my_np1[1]]).T
my_pd.columns = ['NAMES', 'MATH', 'ENGLISH']
# element-wise sum of the two subject columns
my_pd['Total'] = my_pd['MATH'] + my_pd['ENGLISH']
print(my_pd)
Output
  NAMES MATH ENGLISH Total
0  Alex   30      50    80
1   Ron   40      60   100
2  Jack   50      50   100
3  King   45      55   100
We have used one 2-D array for two subjects. However, it is better to use multiple 1-D arrays, one for each subject, so the code can be scaled up to include more subjects.
import pandas as pd
import numpy as np
my_math = np.array([30, 40, 50, 45])
my_english = np.array([50, 60, 50, 55])
my_names = np.array(['Alex', 'Ron', 'Jack', 'King'])
my_pd = pd.DataFrame(data=[my_names, my_math, my_english]).T
my_pd.columns = ['NAMES', 'MATH', 'ENGLISH']
print(my_pd)
Output
  NAMES MATH ENGLISH
0  Alex   30      50
1   Ron   40      60
2  Jack   50      50
3  King   45      55
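A related sketch, offered as one possible variation: passing the 1-D arrays as a dictionary keyed by column name avoids the transpose and lets each column keep its own data type. Stacking a string array with integer arrays and then transposing tends to leave every column with the generic object dtype, which the dictionary form sidesteps.

```python
import numpy as np
import pandas as pd

my_math = np.array([30, 40, 50, 45])
my_english = np.array([50, 60, 50, 55])
my_names = np.array(['Alex', 'Ron', 'Jack', 'King'])

# Each key becomes a column name; each array becomes that column.
my_pd = pd.DataFrame({'NAMES': my_names,
                      'MATH': my_math,
                      'ENGLISH': my_english})

print(my_pd.dtypes)   # MATH and ENGLISH stay integer columns

# Integer columns can be added directly.
my_pd['Total'] = my_pd['MATH'] + my_pd['ENGLISH']
print(my_pd)
```

Adding another subject then only needs one more key in the dictionary.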
Removing index
my_pd = pd.DataFrame(data=[my_names, my_math, my_english]).T
my_pd.columns = ['NAMES', 'MATH', 'ENGLISH']
print(my_pd)
# print without the index column
print(my_pd.to_string(index=False))
Output
  NAMES MATH ENGLISH
0  Alex   30      50
1   Ron   40      60
2  Jack   50      50
3  King   45      55
NAMES MATH ENGLISH
 Alex   30      50
  Ron   40      60
 Jack   50      50
 King   45      55
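Note that to_string(index=False) only changes what is printed. A minimal sketch, using a small two-row DataFrame made up for illustration, shows that the method returns a string and leaves the DataFrame and its index untouched.

```python
import pandas as pd

# Small illustrative DataFrame (not the full student table).
my_pd = pd.DataFrame({'NAMES': ['Alex', 'Ron'], 'MATH': [30, 40]})

text = my_pd.to_string(index=False)   # returns a str
lines = text.splitlines()             # header line plus one line per row

print(text)
print(my_pd.index.tolist())           # the DataFrame still has its index
```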
Using random integers
Create a DataFrame from NumPy arrays of random integers. Here we create a student marks DataFrame with 5 students (rows) and two subjects (columns); you can increase these to include more columns (subjects) and rows (students). The values will differ on each run.
import numpy as np
import pandas as pd
n = 5  # Number of students
my_math = np.random.randint(40, 100, size=n)
my_english = np.random.randint(40, 100, size=n)
my_pd = pd.DataFrame(data=[my_math, my_english]).T
my_pd.columns = ['MATH', 'ENG']
print(my_pd)
Output
   MATH  ENG
0    76   91
1    53   40
2    69   60
3    47   67
4    73   91
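If repeatable values are wanted, one option is NumPy's Generator interface with a fixed seed. The sketch below is an assumption-labelled variation: the seed value 42 and the name `marks` are arbitrary choices for illustration. It also draws all the marks as a single 2-D array of shape (n, 2), so no transpose is needed.

```python
import numpy as np
import pandas as pd

n = 5                                   # number of students
rng = np.random.default_rng(seed=42)    # fixed seed: same values on every run

# One row per student, one column per subject; values from 40 up to 99.
marks = rng.integers(40, 100, size=(n, 2))

my_pd = pd.DataFrame(marks, columns=['MATH', 'ENG'])
print(my_pd)
```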
We can add one more column for the student ID.
import numpy as np
import pandas as pd
n = 5  # Number of students
my_id = np.arange(1, n + 1)
my_math = np.random.randint(40, 100, size=n)
my_english = np.random.randint(40, 100, size=n)
my_pd = pd.DataFrame(data=[my_id, my_math, my_english]).T
my_pd.columns = ['ID', 'MATH', 'ENG']
print(my_pd.to_string(index=False))
Output
 ID  MATH  ENG
  1    65   58
  2    58   97
  3    75   90
  4    42   69
  5    55   51