
Basic pandas data analysis example: the 700 highest-grossing movies of all time

2023-07-26 06:22 Author: 矢來美羽MIUYARAI

Dataset (as of 2019): movie.csv
Link: https://pan.baidu.com/s/10QJJ-UqJJVa1g3xjbelImA?pwd=1234

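Changing the default Jupyter Notebook directory: open Anaconda Prompt, navigate to the desired directory and press Enter, then type jupyter notebook and press Enter. Jupyter Notebook now opens in that directory: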

Create a new .ipynb file:

Reading the CSV file
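A minimal sketch of the read step, assuming movie.csv sits in the notebook's working directory. So that the snippet runs on its own, it parses a small in-memory CSV through io.StringIO instead; the column names (Rank, Title, Studio, Gross, Year) and every row are hypothetical placeholders, not the real contents of movie.csv.

```python
import io

import pandas as pd

# Hypothetical stand-in for movie.csv (columns and rows are placeholders).
# Gross is quoted because the values contain commas.
csv_text = """Rank,Title,Studio,Gross,Year
1,Film A,Studio X,"$2,000.50",2009
2,Film B,Studio Y,"$1,500.25",2015
3,Film C,Studio X,"$900.00",2012
"""

# With the real file, this line would be: movies = pd.read_csv("movie.csv")
movies = pd.read_csv(io.StringIO(csv_text))

print(movies.head())  # first rows of the table
```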

DataFrame data type: the container that holds the entire contents of the CSV file as a table.
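A quick way to inspect what the DataFrame container holds. The rows below are hypothetical placeholders standing in for movie.csv; the attributes used (shape, dtypes, index) are standard DataFrame attributes.

```python
import pandas as pd

# Hypothetical rows standing in for movie.csv.
movies = pd.DataFrame(
    {
        "Title": ["Film A", "Film B", "Film C"],
        "Studio": ["Studio X", "Studio Y", "Studio X"],
        "Year": [2009, 2015, 2012],
    }
)

print(type(movies))   # the container type: pandas DataFrame
print(movies.shape)   # (rows, columns)
print(movies.dtypes)  # one data type per column
print(movies.index)   # row labels (0, 1, 2 by default)
```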

Selecting and slicing DataFrame data (for more, see https://www.runoob.com/pandas/pandas-functions.html)

Position-based indexing (select data by integer position): DataFrame.iloc[row_index, column_index]
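A small sketch of position-based selection on hypothetical rows. iloc counts from 0 and ignores the row labels, and its slices exclude the end position, as with Python lists.

```python
import pandas as pd

# Hypothetical rows standing in for movie.csv.
movies = pd.DataFrame(
    {
        "Title": ["Film A", "Film B", "Film C"],
        "Studio": ["Studio X", "Studio Y", "Studio X"],
        "Year": [2009, 2015, 2012],
    }
)

# iloc takes 0-based integer positions, whatever the row labels are.
first_studio = movies.iloc[0, 1]  # row 0, column 1 ("Studio")
first_two = movies.iloc[0:2]      # rows at positions 0 and 1 (end excluded)
last_col = movies.iloc[:, -1]     # all rows, last column ("Year")
print(first_studio, first_two, last_col, sep="\n\n")
```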

Label-based indexing (select data by label): DataFrame.loc[row_label, column_name]
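A sketch of label-based selection. Setting Title as the index is an assumption made here so that the row labels are readable names; the rows themselves are hypothetical. Note that, unlike iloc, a loc slice includes both endpoints.

```python
import pandas as pd

# Hypothetical rows; Title is used as the index so row labels are meaningful.
movies = pd.DataFrame(
    {
        "Title": ["Film A", "Film B", "Film C"],
        "Studio": ["Studio X", "Studio Y", "Studio X"],
        "Year": [2009, 2015, 2012],
    }
).set_index("Title")

# loc takes labels: a row label and a column name.
studio_b = movies.loc["Film B", "Studio"]
# Label slices include BOTH endpoints, unlike iloc slices.
a_to_b = movies.loc["Film A":"Film B", ["Year"]]
print(studio_b)
print(a_to_b)
```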

Series data type: comparable to a single column of a table, much like a one-dimensional array.
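Selecting one column with square brackets is the usual way to get a Series out of a DataFrame. A minimal sketch on hypothetical rows:

```python
import pandas as pd

# Hypothetical rows standing in for movie.csv.
movies = pd.DataFrame(
    {"Title": ["Film A", "Film B", "Film C"], "Year": [2009, 2015, 2012]}
)

# A single column selected with [] is a Series:
# a 1-D array of values that keeps the DataFrame's row index.
years = movies["Year"]
print(type(years))     # pandas Series
print(years.tolist())  # the values as a plain Python list
print(years.max())     # Series support vectorized statistics
```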

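Sorting: DataFrame.sort_values([column_name1, column_name2], ascending=[True, False]) sorts by column_name1 in ascending order and, where those values tie, by column_name2 in descending order.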
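A sketch of a two-key sort on hypothetical rows: studios A to Z, and within each studio the newest year first. sort_values returns a new DataFrame, so the original order is preserved.

```python
import pandas as pd

# Hypothetical rows standing in for movie.csv.
movies = pd.DataFrame(
    {
        "Title": ["Film A", "Film B", "Film C"],
        "Studio": ["Studio X", "Studio Y", "Studio X"],
        "Year": [2009, 2015, 2012],
    }
)

# Studio ascending first; within the same studio, Year descending.
sorted_movies = movies.sort_values(["Studio", "Year"], ascending=[True, False])
print(sorted_movies)
# `movies` itself keeps its original row order.
```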

Counting: which studio has the most films on the list? DataFrame[column_name].value_counts()
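A minimal sketch of the count, using a hypothetical Studio column. value_counts sorts the result from the most to the least frequent value, so the first entry answers the question directly.

```python
import pandas as pd

# Hypothetical rows standing in for movie.csv.
movies = pd.DataFrame(
    {
        "Title": ["Film A", "Film B", "Film C", "Film D"],
        "Studio": ["Studio X", "Studio Y", "Studio X", "Studio Z"],
    }
)

# Tally each distinct studio, largest count first.
counts = movies["Studio"].value_counts()
print(counts)
print(counts.index[0])  # the studio with the most films on the list
```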

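Multi-condition filtering: put a Boolean condition built from DataFrame[column_name] inside the brackets, e.g. DataFrame[DataFrame[column_name] == value]. The conditions can be written directly inside the brackets: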
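A sketch of inline multi-condition filtering on hypothetical rows. Each comparison produces a Boolean Series; & combines them with AND and | with OR. The parentheses are required because & binds more tightly than ==.

```python
import pandas as pd

# Hypothetical rows standing in for movie.csv.
movies = pd.DataFrame(
    {
        "Title": ["Film A", "Film B", "Film C"],
        "Studio": ["Studio X", "Studio Y", "Studio X"],
        "Year": [2009, 2015, 2012],
    }
)

# Keep only Studio X films released after 2010.
x_after_2010 = movies[(movies["Studio"] == "Studio X") & (movies["Year"] > 2010)]
print(x_after_2010)
```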

Alternatively, assign the conditions to variables first and then use them:
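The same filter with named conditions, sketched on hypothetical rows. Naming each Boolean Series makes it easy to reuse and recombine them, including negation with ~.

```python
import pandas as pd

# Hypothetical rows standing in for movie.csv.
movies = pd.DataFrame(
    {
        "Title": ["Film A", "Film B", "Film C"],
        "Studio": ["Studio X", "Studio Y", "Studio X"],
        "Year": [2009, 2015, 2012],
    }
)

is_studio_x = movies["Studio"] == "Studio X"  # Boolean Series
is_recent = movies["Year"] > 2010             # Boolean Series

both = movies[is_studio_x & is_recent]    # AND
either = movies[is_studio_x | is_recent]  # OR
not_x = movies[~is_studio_x]              # NOT
print(both, either, not_x, sep="\n\n")
```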

Finer-grained numeric filtering:
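The exact filters used at this step are not recoverable from the text, so this is only a sketch of two common numeric filters, Series.between and Series.isin, applied to a hypothetical Year column.

```python
import pandas as pd

# Hypothetical rows standing in for movie.csv.
movies = pd.DataFrame(
    {
        "Title": ["Film A", "Film B", "Film C"],
        "Year": [2009, 2015, 2012],
    }
)

# between is inclusive on both ends by default.
mid_years = movies[movies["Year"].between(2010, 2015)]
# isin keeps rows whose value appears in the given list.
picked = movies[movies["Year"].isin([2009, 2012])]
print(mid_years, picked, sep="\n\n")
```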

Converting the Gross data. Before conversion:

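Data cleaning: DataFrame.replace(old_value, new_value) replaces specified values with new values. Here it is used to replace the dollar signs and commas with an empty string "". Because these symbols are only part of each value, the call needs regex=True (or use Series.str.replace instead); without it, replace only matches cells whose entire value equals old_value.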

Then convert the text to floating-point numbers with astype(data_type), here astype(float):
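A sketch of the cleaning and conversion, assuming Gross values formatted like "$1,234.56" (the three values below are hypothetical). It shows the replace route with regex=True, where the dollar sign must be escaped as r"\$", next to the equivalent .str.replace route with literal patterns.

```python
import pandas as pd

# Hypothetical Gross strings in the "$1,234.56" style.
gross_text = pd.Series(["$2,000.50", "$1,500.25", "$900.00"], name="Gross")

# replace with regex=True edits substrings; r"\$" is an escaped dollar sign.
via_replace = gross_text.replace({r"\$": "", ",": ""}, regex=True).astype(float)

# Equivalent with the .str accessor and literal (non-regex) patterns.
via_str = (
    gross_text.str.replace("$", "", regex=False)
    .str.replace(",", "", regex=False)
    .astype(float)
)

print(via_replace.tolist())  # [2000.5, 1500.25, 900.0]
```

Both routes return new Series and leave gross_text unchanged.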

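The operations above only act on a copy (a temporary result used for display). To make the change permanent, assign the result back to the column. This changes the DataFrame held in the program's memory; the original CSV file itself does not change.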
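A sketch of the difference between computing a cleaned column and keeping it, on hypothetical rows. The to_csv line is left commented out, and its output file name is hypothetical; it is only needed if the cleaned data should also be saved to disk.

```python
import pandas as pd

# Hypothetical rows standing in for movie.csv.
movies = pd.DataFrame(
    {
        "Title": ["Film A", "Film B", "Film C"],
        "Gross": ["$2,000.50", "$1,500.25", "$900.00"],
    }
)

# This expression returns a NEW Series; `movies` is not modified.
cleaned = movies["Gross"].replace({r"\$": "", ",": ""}, regex=True).astype(float)
dtype_before = movies["Gross"].dtype  # still a text dtype at this point

# Assigning back makes the change stick for the rest of the notebook.
movies["Gross"] = cleaned

# movie.csv on disk is unchanged; saving would need an explicit call, e.g.
# movies.to_csv("movie_clean.csv", index=False)  # hypothetical file name
print(dtype_before, movies["Gross"].dtype)
```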

Grouping: group the DataFrame by "Studio" to obtain a new object, studios:
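A sketch of the grouping step on hypothetical rows whose Gross column is already numeric. groupby returns a GroupBy object rather than a finished table; aggregations such as count() or sum() are applied to it afterwards.

```python
import pandas as pd

# Hypothetical rows with Gross already converted to float.
movies = pd.DataFrame(
    {
        "Title": ["Film A", "Film B", "Film C", "Film D"],
        "Studio": ["Studio X", "Studio Y", "Studio X", "Studio Z"],
        "Gross": [2000.5, 1500.25, 900.0, 3000.0],
    }
)

# One group per distinct Studio value.
studios = movies.groupby("Studio")
print(studios.ngroups)                # number of distinct studios
print(studios.get_group("Studio X"))  # the rows belonging to one studio
```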

Sort the "Gross" column of studios by count(), i.e. the number of films per studio:
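A sketch of ranking studios by film count, reusing the same hypothetical rows. count() counts the non-missing Gross values in each group, which here equals the number of films.

```python
import pandas as pd

# Hypothetical rows with Gross already converted to float.
movies = pd.DataFrame(
    {
        "Title": ["Film A", "Film B", "Film C", "Film D"],
        "Studio": ["Studio X", "Studio Y", "Studio X", "Studio Z"],
        "Gross": [2000.5, 1500.25, 900.0, 3000.0],
    }
)
studios = movies.groupby("Studio")

# Films per studio, most first.
film_counts = studios["Gross"].count().sort_values(ascending=False)
print(film_counts)
```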

Sort the "Gross" column of studios by sum(), i.e. the total gross per studio:
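A sketch of ranking studios by total gross on the same hypothetical rows. The only change from the count version is the aggregation method.

```python
import pandas as pd

# Hypothetical rows with Gross already converted to float.
movies = pd.DataFrame(
    {
        "Title": ["Film A", "Film B", "Film C", "Film D"],
        "Studio": ["Studio X", "Studio Y", "Studio X", "Studio Z"],
        "Gross": [2000.5, 1500.25, 900.0, 3000.0],
    }
)
studios = movies.groupby("Studio")

# Total gross per studio, largest first.
gross_totals = studios["Gross"].sum().sort_values(ascending=False)
print(gross_totals)
```

With these placeholder numbers, Studio X leads by film count while Studio Z leads by total gross, which is why the two rankings are worth computing separately.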
