Which Large Model Is a Godsend for Programmers? ChatGPT Isn't the Best: Copilot Pulls Off the "Giant-Slaying"
We all know that ChatGPT 4's overall capability is clearly stronger than ChatGPT 3.5's.
1. But is the official GPT-3.5 API as capable as the official GPT-3.5 itself?
(Friendly reminder: ChatGPT 3.5 and the ChatGPT 3.5 API are not the same thing.)
2. Which works better: GPT-4.0 on its own, or GPT-4.0 combined with web search?
3. And how does GitHub's Copilot Chat measure up?
With these questions in mind, we put each of them through a single, simple programming problem and see how they perform.
The programming problem:
Write me a complete piece of Python code. I want to crawl 10,000 web pages as fast as possible and store them in a local MySQL database.
ChatGPT3.5 API
import requests
from bs4 import BeautifulSoup
import mysql.connector

# Set up the MySQL database connection
db = mysql.connector.connect(
    host="localhost",
    user="your_username",
    password="your_password",
    database="your_database")
cursor = db.cursor()

# Crawl a page and store it in the database
def crawl_and_store(url):
    try:
        response = requests.get(url)
        soup = BeautifulSoup(response.text, 'html.parser')
        # Parse the page and extract the data to store.
        # This is only an example; adjust the parsing to the actual page structure.
        title = soup.title.text.strip()
        content = soup.find('div', class_='content').text.strip()
        # Insert the data into the database
        query = "INSERT INTO webpages (url, title, content) VALUES (%s, %s, %s)"
        values = (url, title, content)
        cursor.execute(query, values)
        db.commit()
        print(f"Successfully crawled and stored page: {url}")
    except Exception as e:
        print(f"Failed to crawl page: {url}")
        print(f"Error: {str(e)}")

# List of pages to crawl
urls = ["https://example.com/page1", "https://example.com/page2"]  # add more URLs...

# Loop over the list, crawling and storing each page
for url in urls:
    crawl_and_store(url)

# Close the database connection
cursor.close()
db.close()
Summary:
The GPT-3.5 API's code is simple and clear, and there is nothing seriously wrong with it. But it ignored my requirement: I asked for maximum speed, yet it still wrote the code in the most basic, sequential way.
ChatGPT3.5
import requests
import mysql.connector
from concurrent.futures import ThreadPoolExecutor

# Create the database connection
db = mysql.connector.connect(
    host="localhost",
    user="your_username",
    password="your_password",
    database="your_database")
cursor = db.cursor()

# Create the table
cursor.execute("CREATE TABLE IF NOT EXISTS web_pages (id INT AUTO_INCREMENT PRIMARY KEY, url VARCHAR(255), content TEXT)")

def fetch_and_store(url):
    try:
        response = requests.get(url)
        content = response.text
        # Store the page in the database
        sql = "INSERT INTO web_pages (url, content) VALUES (%s, %s)"
        val = (url, content)
        cursor.execute(sql, val)
        db.commit()
        print(f"Successfully crawled and stored page: {url}")
    except Exception as e:
        print(f"Failed to crawl page {url}: {str(e)}")

# List of pages to crawl
urls = ["https://example.com/page1", "https://example.com/page2"]  # add more URLs

# Create a thread pool
executor = ThreadPoolExecutor(max_workers=10)  # choose an appropriate number of threads

# Crawl and store the pages concurrently using the thread pool
for url in urls:
    executor.submit(fetch_and_store, url)

# Wait for all tasks to finish
executor.shutdown()

# Close the database connection
cursor.close()
db.close()
Summary:
The web version of ChatGPT 3.5 does better than the API. Two things stand out: first, it worked out that the table has to be created before inserting into MySQL; second, it noticed that I emphasized an efficient implementation and introduced multithreading into the code.
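One caveat if you reuse this pattern: all worker threads share a single mysql.connector connection and cursor, which Connector/Python does not guarantee to be thread-safe. A minimal sketch of a safer variant, assuming the same placeholder credentials and web_pages table as above, keeps the downloads concurrent but serializes the writes with a lock:

import threading
import requests
import mysql.connector
from concurrent.futures import ThreadPoolExecutor

# Assumes the web_pages table from the code above already exists
db = mysql.connector.connect(host="localhost", user="your_username",
                             password="your_password", database="your_database")
cursor = db.cursor()
db_lock = threading.Lock()  # guards the shared connection and cursor

def fetch_and_store(url):
    try:
        content = requests.get(url, timeout=10).text  # download outside the lock
        with db_lock:  # only the insert/commit is serialized
            cursor.execute("INSERT INTO web_pages (url, content) VALUES (%s, %s)",
                           (url, content))
            db.commit()
    except Exception as e:
        print(f"Failed to crawl page {url}: {e}")

urls = ["https://example.com/page1", "https://example.com/page2"]  # add more URLs
with ThreadPoolExecutor(max_workers=10) as executor:  # waits for all tasks on exit
    for url in urls:
        executor.submit(fetch_and_store, url)

cursor.close()
db.close()

Giving each worker its own connection would work just as well; the lock is simply the smallest change to the generated code.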
ChatGPT4
import concurrent.futures
import requests
import pymysql.cursors

# TODO: replace with the actual list of URLs
url_list = ["http://www.example.com/1", "http://www.example.com/2", "..."]

# Fetch a single page
def fetch_url(url):
    response = requests.get(url)
    return url, response.text

# Connect to MySQL
connection = pymysql.connect(host='localhost',
                             user='your_username',
                             password='your_password',
                             database='your_database',
                             charset='utf8mb4',
                             cursorclass=pymysql.cursors.DictCursor)

try:
    # Create a thread pool
    with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
        # Crawl all pages asynchronously with the thread pool
        future_to_url = {executor.submit(fetch_url, url): url for url in url_list}
        for future in concurrent.futures.as_completed(future_to_url):
            url = future_to_url[future]
            try:
                data = future.result()
            except Exception as exc:
                print('%r generated an exception: %s' % (url, exc))
            else:
                print('%r page is %d bytes' % (url, len(data[1])))
                # Store the page in the database
                try:
                    with connection.cursor() as cursor:
                        sql = "INSERT INTO `your_table` (`url`, `content`) VALUES (%s, %s)"
                        cursor.execute(sql, (data[0], data[1]))
                    connection.commit()
                except Exception as error:
                    print('Failed to update database: {}'.format(error))
finally:
    connection.close()
Summary:
ChatGPT 4 is another step up from ChatGPT 3.5. It understood that I wanted high efficiency: it not only used multithreading, it even stressed in a comment that it was "crawling all pages asynchronously with the thread pool". Handling results as they complete lifts the crawler's throughput another notch, and it also anticipated that the table insert might fail and prints out the error message. Very disciplined code.
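If you want to push the efficiency point one step further, the per-page commit is the remaining cost on the database side. A minimal sketch, assuming the same placeholder table and credentials as above, collects the results first and writes them in a single executemany batch:

import concurrent.futures
import requests
import pymysql

def fetch_url(url):
    response = requests.get(url, timeout=10)
    return url, response.text

# TODO: replace with the actual list of URLs
url_list = ["http://www.example.com/1", "http://www.example.com/2"]

# Download concurrently, but collect the rows instead of committing one by one
rows = []
with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
    futures = [executor.submit(fetch_url, url) for url in url_list]
    for future in concurrent.futures.as_completed(futures):
        try:
            rows.append(future.result())
        except Exception as exc:
            print('fetch failed: %s' % exc)

# One batched insert and a single commit instead of one round trip per page
connection = pymysql.connect(host='localhost', user='your_username',
                             password='your_password', database='your_database',
                             charset='utf8mb4')
try:
    with connection.cursor() as cursor:
        cursor.executemany(
            "INSERT INTO `your_table` (`url`, `content`) VALUES (%s, %s)", rows)
    connection.commit()
finally:
    connection.close()

Buffering everything in memory is the trade-off; for 10,000 full pages you would likely flush in smaller batches instead.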
iFlytek Spark
import requests
from bs4 import BeautifulSoup
import pymysql

# Connect to the MySQL database
def connect_db():
    db = pymysql.connect(host='localhost', user='root', password='your_password', database='your_database', charset='utf8')
    return db

# Fetch the page content
def get_html(url):
    try:
        response = requests.get(url)
        response.raise_for_status()
        response.encoding = response.apparent_encoding
        return response.text
    except:
        print("Failed to fetch page")
        return None

# Parse the page content
def parse_html(html):
    soup = BeautifulSoup(html, 'html.parser')
    # Extract the fields you need based on the page structure; adjust to the real pages
    titles = soup.find_all('h1')
    for title in titles:
        yield title.text

# Save the data to the MySQL database
def save_to_db(data):
    db = connect_db()
    cursor = db.cursor()
    sql = "INSERT INTO your_table (title) VALUES (%s)"
    try:
        cursor.executemany(sql, data)
        db.commit()
    except:
        db.rollback()
        print("Failed to save data")
    finally:
        cursor.close()
        db.close()

if __name__ == '__main__':
    urls = ['https://www.example.com/page{}'.format(i) for i in range(1, 10001)]
    data = []
    for url in urls:
        html = get_html(url)
        if html:
            parsed_data = parse_html(html)
            data.extend(parsed_data)
            if len(data) >= 10000:
                save_to_db(data)
                data = []
        else:
            print("Skipping invalid page")
            continue
    if data:
        save_to_db(data)
Summary:
Basically on a par with the ChatGPT 3.5 API: the code is neatly split into functions and batches its inserts with executemany, but the 10,000 pages are still fetched one at a time, so the efficiency requirement is again ignored.
Copilot
import asyncio
import aiohttp
import pymysql
import aiomysql

# MySQL connection settings
DB_SETTINGS = {
    "host": "localhost",
    "database": "your_database",
    "user": "your_username",
    "password": "your_password",
}

# The list of URLs you need to crawl
URLS = ["http://www.example.com"] * 10000

async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()

async def save_to_db(pool, url, content):
    async with pool.acquire() as conn:
        async with conn.cursor() as cur:
            # Write the SQL to match your own table structure
            insert_sql = "INSERT INTO table_name(url, content) VALUES(%s, %s)"
            await cur.execute(insert_sql, (url, content))
            await conn.commit()

async def fetch_and_save(pool, session, url):
    content = await fetch(session, url)
    await save_to_db(pool, url, content)

async def main():
    async with aiohttp.ClientSession() as session:
        pool = await aiomysql.create_pool(**DB_SETTINGS)
        tasks = []
        for url in URLS:
            task = asyncio.create_task(fetch_and_save(pool, session, url))
            tasks.append(task)
        await asyncio.gather(*tasks)

if __name__ == "__main__":
    asyncio.run(main())
Summary:
GitHub's Copilot Chat gave the answer closest to what I had in mind: it uses asyncio and aiohttp, and even aiomysql, keeping the entire pipeline asynchronous and squeezing the most efficiency out of the code. The code is also very elegant. The overall scores are shown in the chart above.
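One practical note if you run this answer as-is: it creates all 10,000 tasks at once, so every request fires at the same time, which can exhaust sockets or trip rate limits. A minimal sketch of a gentler variant, assuming the same placeholder table_name and credentials (aiomysql takes the database name as db), caps concurrency with an asyncio.Semaphore:

import asyncio
import aiohttp
import aiomysql

URLS = ["http://www.example.com"] * 10000
MAX_CONCURRENCY = 100  # tune to what the target site and your machine can tolerate

async def fetch_and_save(pool, session, sem, url):
    async with sem:  # at most MAX_CONCURRENCY downloads in flight
        async with session.get(url) as response:
            content = await response.text()
    async with pool.acquire() as conn:
        async with conn.cursor() as cur:
            await cur.execute("INSERT INTO table_name(url, content) VALUES(%s, %s)",
                              (url, content))
            await conn.commit()

async def main():
    sem = asyncio.Semaphore(MAX_CONCURRENCY)
    pool = await aiomysql.create_pool(host="localhost", user="your_username",
                                      password="your_password", db="your_database")
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(
            *(fetch_and_save(pool, session, sem, url) for url in URLS),
            return_exceptions=True)  # keep one failed page from cancelling the rest
    pool.close()
    await pool.wait_closed()

if __name__ == "__main__":
    asyncio.run(main())

The aiomysql pool (10 connections by default) throttles the database side in the same way, so the semaphore mainly protects the HTTP side.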