Atguigu (尚硅谷) Python Web Scraping Tutorial: A Fast Track for Complete Beginners (Python Basics + Scraping Case Studies)

Code for P62, urllib_ajax. I added a random sleep between requests; if you don't want it, delete the sleep call in the main loop.
Be sure to check that base_url is spelled correctly.
import urllib.parse
import urllib.request
import random
import time


# Build the request object for one page
def create_request(page):
    base_url = 'https://movie.douban.com/j/chart/top_list?type=24&interval_id=100:90&action=&'
    headers = {
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/109.0.0.0 Safari/537.36'
    }
    data = {
        'start': (page - 1) * 20,
        'limit': 20,
    }
    data = urllib.parse.urlencode(data)
    url = base_url + data
    request = urllib.request.Request(url=url, headers=headers)
    return request


# Fetch the response body
def get_content(request):
    response = urllib.request.urlopen(request)
    content = response.read().decode('utf-8')
    return content


# Save one page of content to a JSON file
def down_load(page, content):
    with open('douban' + str(page) + '.json', 'w', encoding='utf-8') as fp:
        fp.write(content)


# Program entry point
start_page = int(input('Enter the start page: '))
end_page = int(input('Enter the end page: '))
for page in range(start_page, end_page + 1):
    # Random pause of 5 to 15 seconds before each request; delete this line to disable it
    time.sleep(random.randint(5, 15))
    request = create_request(page)
    content = get_content(request)
    down_load(page, content)
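One way to check that base_url is spelled correctly before sending anything is to build the URL offline and look at its query parameters. The sketch below assumes the same base_url and the same 20-items-per-page rule as the P62 code; the helper names build_url and query_params are made up for illustration and are not part of the course code.

```python
import urllib.parse

# Same base_url as in the P62 code.
BASE_URL = 'https://movie.douban.com/j/chart/top_list?type=24&interval_id=100:90&action=&'


def build_url(page, limit=20):
    """Hypothetical helper: build the paged URL the same way create_request does."""
    data = {'start': (page - 1) * limit, 'limit': limit}
    return BASE_URL + urllib.parse.urlencode(data)


def query_params(url):
    """Parse the query string into a dict so each parameter can be inspected."""
    query = urllib.parse.urlsplit(url).query
    # keep_blank_values=True keeps the empty 'action' parameter visible
    return urllib.parse.parse_qs(query, keep_blank_values=True)


if __name__ == '__main__':
    url = build_url(2)
    print(url)
    print(query_params(url))
```

If a parameter such as type or interval_id is missing from the printed dict, or start is not (page - 1) * 20, the typo is most likely in base_url.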
In P78, if you write the selenium code the way the instructor shows it, you get DeprecationWarning: executable_path has been deprecated, please pass in a Service object
browser = webdriver.Edge(path)
This is only a warning, though, and the script keeps running.

If you want to get rid of the warning, you can change the code like this (shown here with Chrome):
# (1) Import selenium
from selenium import webdriver
from selenium.webdriver.chrome.service import Service  # import the Service class

# (2) Create the browser object
s = Service(executable_path='chromedriver.exe')  # create the Service object
browser = webdriver.Chrome(service=s)  # pass in the Service object

# (3) Visit the website
url = 'https://www.jd.com'
browser.get(url)
content = browser.page_source
print(content)
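The warning quoted earlier came from webdriver.Edge, while the fix is written for Chrome. Below is a minimal sketch of the same change for Edge, assuming Selenium 4 and a driver binary named msedgedriver.exe in the working directory. Both that file name and the helper open_edge are assumptions for illustration, not something from the course.

```python
def open_edge(driver_path='msedgedriver.exe'):
    """Hypothetical helper: start Edge through a Service object (Selenium 4 style)."""
    # Imported inside the function so this file still loads where selenium is not installed.
    from selenium import webdriver
    from selenium.webdriver.edge.service import Service

    service = Service(executable_path=driver_path)  # driver_path is an assumed location
    return webdriver.Edge(service=service)


if __name__ == '__main__':
    browser = open_edge()
    try:
        browser.get('https://www.jd.com')
        print(browser.page_source)
    finally:
        browser.quit()  # close the browser even if the page load fails
```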