Scraping Douban Short Reviews with Python and the requests Library

2023-11-08 15:54 Author: 華科云商小彭

Python is a widely used programming language. Today we will combine Python with the requests library to write a simple program that scrapes short reviews from Douban. Let's walk through it.

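```python
import requests
from bs4 import BeautifulSoup

# Proxy settings (placeholder values; replace them with your own proxy)
proxy_host = '127.0.0.1'
proxy_port = 8080
proxy = f'http://{proxy_host}:{proxy_port}'
# requests expects a dict that maps each URL scheme to a proxy URL
proxies = {'http': proxy, 'https': proxy}

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
}

response = requests.get('https://book.douban.com/top250', headers=headers, proxies=proxies)

# Parse the HTML
soup = BeautifulSoup(response.text, 'html.parser')
# This selector assumes the fetched page contains <span class="short"> elements
reviews = soup.find_all('span', class_='short')

# Print the short reviews
for review in reviews:
    print(review.text)
```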

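Each step is explained below:

1. Import the required libraries (requests and BeautifulSoup).

2. Set up the proxy (proxy_host and proxy_port).

3. Use the get method of the requests library to send a GET request to the Douban Books Top 250 page, passing headers and proxies.

4. Use the BeautifulSoup library to parse the returned HTML.

5. Use the find_all method to find every span tag whose class is 'short'; these tags contain the short review text.

6. Use a for loop to print each short review.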
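
The steps above can be sketched as two small helper functions, which lets the parsing step be tried without touching the network. This is only a sketch under assumptions: `build_proxies`, `extract_short_reviews`, and `SAMPLE_HTML` are hypothetical names introduced here, the proxy host and port are placeholders, and the sample markup is made up for illustration rather than copied from Douban, whose real page structure may differ.

```python
from bs4 import BeautifulSoup


def build_proxies(host, port):
    """Return the scheme-to-URL mapping that requests accepts as `proxies`."""
    proxy_url = f'http://{host}:{port}'
    return {'http': proxy_url, 'https': proxy_url}


def extract_short_reviews(html):
    """Return the stripped text of every <span class="short"> in the HTML."""
    soup = BeautifulSoup(html, 'html.parser')
    return [span.get_text(strip=True)
            for span in soup.find_all('span', class_='short')]


# Made-up markup for illustration only; not copied from Douban.
SAMPLE_HTML = """
<div class="comment"><span class="short"> A moving story. </span></div>
<div class="comment"><span class="short">Too long for my taste.</span></div>
<div class="comment"><span class="other">Not a short review.</span></div>
"""

if __name__ == '__main__':
    print(build_proxies('127.0.0.1', 8080))  # placeholder host and port
    for review in extract_short_reviews(SAMPLE_HTML):
        print(review)
```

In the script above, the `html` argument would be `response.text`, and the result of `build_proxies` would be passed as `proxies=` to `requests.get`.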
