Visualizing an Author Co-occurrence Network with Gephi
1 Download and install
Download Gephi from https://gephi.org/ and install it.

2 Author co-occurrence analysis
https://github.com/SparksFly8/Co-occurrence-Matrix
Author co-occurrence visualization
2.1 Data
https://raw.githubusercontent.com/SparksFly8/Co-occurrence-Matrix/master/data.csv
Data format (each line lists the co-authors of one paper, separated by commas):
Xiaohui Bei,Shengyu Zhang
Pin-Yu Chen,Yash Sharma,Huan Zhang,Jinfeng Yi,Cho-Jui Hsieh
Jonathan Chung,Moshe Eizenman,Uros Rakita,Roger McIntyre,Peter Giacobbe
Bolin Ding,Harsha Nori,Paul Li,Joshua Allen
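To make the format concrete, here is a minimal sketch of how one row expands into unordered author pairs. The helper name `row_to_pairs` is hypothetical and not part of the original script; it only illustrates that a row with n authors contributes n(n-1)/2 co-occurrence pairs.

```python
from itertools import combinations

# Two sample rows copied from the data format above; each row is one paper.
rows = [
    "Xiaohui Bei,Shengyu Zhang",
    "Bolin Ding,Harsha Nori,Paul Li,Joshua Allen",
]

def row_to_pairs(row):
    """Split one comma-separated row into unordered author pairs.

    Hypothetical helper for illustration only. Names are sorted first so
    that (A, B) and (B, A) always map to the same pair.
    """
    authors = sorted(name.strip() for name in row.split(",") if name.strip())
    return list(combinations(authors, 2))

for row in rows:
    pairs = row_to_pairs(row)
    print(len(pairs), pairs)
```

The first row yields a single pair and the second yields six, which is what the edge counts in the next section accumulate across all rows.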
2.2 Computing author frequencies and co-occurrence counts
def get_Co_authors(filePath):
    '''
    Read the CSV file and return its author rows as a list.
    :param filePath: path to the CSV file
    :return co_authors_list: list of strings, one comma-separated author row per paper
    '''
    # utf-8-sig strips a leading BOM (U+FEFF), which some Windows tools prepend to UTF-8 files
    with open(filePath, 'r', encoding='utf-8-sig') as f:
        text = f.read()
    # splitlines() handles both \n and \r\n line endings; drop every blank line
    co_authors_list = [line for line in text.splitlines() if line.strip()]
    return co_authors_list

def str2csv(filePath, s):
    '''
    Write a string to a local CSV file.
    :param filePath: path to the CSV file
    :param s: string to write (already in comma-separated format)
    '''
    with open(filePath, 'w', encoding='utf-8') as f:
        f.write(s)
    print('File written successfully; see ' + filePath)

def sortDictValue(counts, is_reverse):
    '''
    Sort a dictionary by value.
    :param counts: dictionary to sort (key -> count)
    :param is_reverse: whether to sort in descending order
    :return s: string in comma-separated CSV format, one "key,value" line per entry
    '''
    # items() yields (key, value) tuples; item[1] is the value used as the sort key,
    # and reverse=True sorts in descending order
    tups = sorted(counts.items(), key=lambda item: item[1], reverse=is_reverse)
    # join into the comma-separated format the CSV files need
    return ''.join(key + ',' + str(value) + '\n' for key, value in tups)

def build_matrix(co_authors_list, is_reverse):
    '''
    Build the co-occurrence counts (stored in dictionaries) from the list of
    co-author rows, then sort them by weight.
    :param co_authors_list: list of co-author rows
    :param is_reverse: whether to sort in descending order
    :return node_str: node lines in CSV format (author,frequency)
    :return edge_str: edge lines in CSV format (source,target,weight)
    '''
    node_dict = {}  # node dictionary: author name -> node weight (frequency)
    edge_dict = {}  # edge dictionary: "source,target" -> edge weight (frequency)
    # Loop 1: iterate over every row of authors
    for row_authors in co_authors_list:
        # split the row on ',' and drop surrounding whitespace and empty fields
        row_authors_list = [au.strip() for au in row_authors.split(',') if au.strip()]
        # Loop 2: iterate over each author in the current row
        for index, pre_au in enumerate(row_authors_list):
            # count how often each author appears
            if pre_au not in node_dict:
                node_dict[pre_au] = 1
            else:
                node_dict[pre_au] += 1
            # For the last author this slice is empty, so no explicit break is needed.
            # (Comparing names with row_authors_list[-1] would stop early on a repeated name.)
            connect_list = row_authors_list[index+1:]
            # Loop 3: pair this author with every later co-author in the row
            for next_au in connect_list:
                A, B = pre_au, next_au
                # fix the order of each pair so A,B and B,A share one key
                if A > B:
                    A, B = B, A
                key = A+','+B  # comma-separated "A,B" string used as the dictionary key
                # initialize the pair to 1 on first sight, otherwise increment
                if key not in edge_dict:
                    edge_dict[key] = 1
                else:
                    edge_dict[key] += 1
    # sort both dictionaries by value
    node_str = sortDictValue(node_dict, is_reverse)  # nodes
    edge_str = sortDictValue(edge_dict, is_reverse)  # edges
    return node_str, edge_str

if __name__ == '__main__':
    readfilePath = r'data.csv'
    writefilePath1 = r'node.csv'
    writefilePath2 = r'edge.csv'
    # read the CSV file and store the author rows in a list
    co_authors_list = get_Co_authors(readfilePath)
    # build the co-occurrence counts and sort them by weight, largest first
    node_str, edge_str = build_matrix(co_authors_list, is_reverse=True)
    # print(edge_str)
    # write the node and edge strings to local CSV files
    str2csv(writefilePath1, node_str)
    str2csv(writefilePath2, edge_str)

2.3 Filtering the data (by co-occurrence count)
import pandas as pd

def main(edge_weight):
    # node.csv has no header row: column 0 is the author name, column 1 the frequency
    node_df = pd.read_csv('./node.csv', header=None)
    node_df.columns = ['Label', 'Weight']
    node_df['Id'] = node_df.Label  # reuse the author name as the node Id
    node_df = node_df[['Id', 'Label', 'Weight']]
    # edge.csv has no header row either: source, target, co-occurrence count
    edge_df = pd.read_csv('./edge.csv', header=None)
    edge_df.columns = ['Source', 'Target', 'Weight']
    # keep only edges whose co-occurrence count reaches the threshold
    edge_df = edge_df[edge_df.Weight >= edge_weight]
    # keep only nodes that still appear in at least one remaining edge
    node_label = set(edge_df.Source) | set(edge_df.Target)
    node_df = node_df[node_df.Label.isin(node_label)]
    node_df.to_csv('./transform_node.csv', index=False, encoding='utf-8')
    edge_df.to_csv('./transform_edge.csv', index=False, encoding='utf-8')
    return node_df, edge_df

if __name__ == '__main__':
    main(3)  # keep author pairs that co-occur at least 3 times
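
The effect of the threshold is easiest to see on a toy example. The sketch below applies the same filtering idea to small in-memory tables instead of the CSV files; the function name `filter_by_weight` and the sample values are made up for illustration and do not come from the original data.

```python
import pandas as pd

# Toy tables using the same column layout as edge.csv and node.csv (values are invented).
edges = pd.DataFrame(
    [("A", "B", 5), ("A", "C", 1), ("B", "D", 3)],
    columns=["Source", "Target", "Weight"],
)
nodes = pd.DataFrame(
    [("A", 6), ("B", 8), ("C", 1), ("D", 3)],
    columns=["Label", "Weight"],
)

def filter_by_weight(nodes, edges, min_weight):
    """Keep edges with Weight >= min_weight and the nodes they still touch.

    Hypothetical helper mirroring the filtering step in main() above.
    """
    kept_edges = edges[edges["Weight"] >= min_weight]
    kept_labels = set(kept_edges["Source"]) | set(kept_edges["Target"])
    kept_nodes = nodes[nodes["Label"].isin(kept_labels)]
    return kept_nodes, kept_edges

kept_nodes, kept_edges = filter_by_weight(nodes, edges, min_weight=3)
print(kept_nodes["Label"].tolist())
print(len(kept_edges))
```

With a threshold of 3, the A-C edge is dropped, and node C disappears with it because no remaining edge references it.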
3 Visualization in Gephi

4 Exporting to VOSviewer for visualization

5 References
Implementing a co-occurrence matrix in Python - Dragon水魅 - cnblogs
[Drawing relationship network graphs] Getting started with Gephi - 賣山楂啦prss's blog - CSDN
https://github.com/SparksFly8/Co-occurrence-Matrix