Automatically scraping Yangtze River water level and discharge data with a web crawler

First use urlread to read the content of the web page, then locate the strings that contain the data:
wl=regexp(str,'"wptn":"\d","z":"\S{2,10}"\}','match'); % extract the water level, in meters
name=regexp(str,'"stnm":"\S{2,10}","tm":','match'); % extract the observation station names
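To see what these patterns pick up, the sketch below runs similar expressions against a single sample record in the page's format. It is only an illustration: it uses capture groups ('tokens') instead of the fixed character offsets used in the script, the variable names are made up for this example, and reading "tm" as a Unix timestamp in milliseconds is an assumption based on how the value looks, not something the site documents.

```matlab
% One record in the format embedded in the page (taken from the sample in the script).
sample = '{"oq":"0","q":"1710","rvnm":"長江","stcd":"60103400","stnm":"向家壩","tm":1606701600000,"wptn":"5","z":"266.22"}';

% Capture groups return just the value, so no character offsets are needed.
qTok  = regexp(sample, '"q":"(\d*)"',      'tokens', 'once');
zTok  = regexp(sample, '"z":"([^"]*)"',    'tokens', 'once');
nmTok = regexp(sample, '"stnm":"([^"]*)"', 'tokens', 'once');

q  = str2double(qTok{1});   % discharge in m^3/s
z  = str2double(zTok{1});   % water level in m
nm = nmTok{1};              % station name

% Assumption: "tm" is a Unix timestamp in milliseconds (UTC).
tmTok = regexp(sample, '"tm":(\d+)', 'tokens', 'once');
t = datetime(str2double(tmTok{1})/1000, 'ConvertFrom', 'posixtime');

fprintf('%s: q = %g m^3/s, z = %.2f m, t = %s\n', nm, q, z, char(t));
```

The same expressions can be applied to the full page string with 'tokens' but without 'once', which returns one cell per record.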
j=1; % The site publishes hourly data that is refreshed on the hour. To allow for possible delays, fetch the 12:00 data at about 12:15, for example.
while true
time(j,:)=datestr(now); % time of this fetch
str=urlread('http://www.cjh.com.cn/sssqcwww.html');
% Example record: str1='{"oq":"0","q":"1710","rvnm":"長江","stcd":"60103400","stnm":"向家壩","tm":1606701600000,"wptn":"5","z":"266.22"}'
flow=regexp(str,'\{"oq":"0","q":"\d*','match'); % discharge, in m^3/s
wl=regexp(str,'"wptn":"\d","z":"\S{2,10}"\}','match'); % water level, in meters
name=regexp(str,'"stnm":"\S{2,10}","tm":','match'); % observation station names
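for i=1:length(flow)
    d(j,i)=str2double(flow{i}(16:end)); % 0 means no discharge observation; an empty value becomes NaN
end
for i=1:length(wl)
    sl(j,i)=str2double(wl{i}(17:end-2)); % strip the leading '"wptn":"5","z":"' and the trailing '"}'
end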
j=j+1;
save('Yangtze.mat','time','d','sl','name')
pause(3600); % run once every hour
end
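
A fixed pause(3600) keeps whatever minute offset the script happened to start at. One possible way to follow the quarter-past-the-hour suggestion is to compute the wait until the next hh:15 on each pass. The sketch below is an assumption-laden illustration rather than part of the original script: it uses a fixed example time so the result is reproducible, and the variable names (t, target, waitSeconds) are made up here. Inside the loop, t would be datetime('now') and the last line would be pause(waitSeconds).

```matlab
% Example "current" time; in the scraping loop this would be datetime('now').
t = datetime(2020, 11, 30, 12, 20, 0);

% Quarter past the current hour.
target = dateshift(t, 'start', 'hour') + minutes(15);

% If that moment has already passed, aim for quarter past the next hour.
if target <= t
    target = target + hours(1);
end

% Number of seconds to wait before the next fetch.
waitSeconds = seconds(target - t);
fprintf('Next fetch at %s, waiting %g s\n', char(target), waitSeconds);
```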