
LLM Applications in Practice: Question Answering over Your Own Knowledge with LLaMA 2, FAISS and LangChain

2023-08-21 21:59 Author: 不想打工的程序員

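Over the past few weeks I have been trying out several large language models (LLMs) and exploring their potential with the various methods available on the internet. Now it is time to share what I have learned so far!

I was excited to learn that Meta has released the next generation of its open-source large language model, LLaMA 2 (released on July 18, 2023). The most interesting part is that it is available to the public free of charge for commercial use, so I decided to see how well it performs.

In this article I will show how to use the Llama-2-7b-chat model with the LangChain framework and the FAISS library to perform chatbot-style question answering over documents that I collected online from the Databricks documentation website.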


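Introduction

The LLaMA 2 models were pretrained on 2 trillion tokens and then fine-tuned, with parameter counts ranging from 7 billion to 70 billion, which makes them some of the more capable open-source models. They come in three sizes (7B, 13B and 70B) and improve significantly on Llama 1: they were trained on 40% more tokens, have a longer context length (4k tokens), and the 70B model uses grouped-query attention for fast inference. They outperform other open-source LLMs on many external benchmarks, including reasoning, coding, proficiency and knowledge tests.

LangChain is a powerful open-source framework designed to help you develop applications powered by language models, large language models in particular. The core idea of the library is that different components can be "chained" together to build more advanced use cases around LLMs. LangChain consists of multiple components grouped into several modules.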

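Modules:

Prompts: this module lets you build dynamic prompts from templates. It can adapt to different LLM types depending on the context window size and on the input variables used as context, such as conversation history, search results or previous answers.

Models: this module provides an abstraction layer for connecting to most available third-party LLM APIs. It has API connections to roughly 40 public LLMs, chat models and embedding models.

Memory: this module gives the LLM access to the conversation history.

Indexes: indexes are ways of structuring documents so that an LLM can interact with them effectively. This module contains utility functions for working with documents and integrations with various vector databases.

Agents: some applications need more than a predetermined chain of calls to LLMs or other tools; they may need a chain that is not known in advance and depends on the user's input. In these chains there is an agent with access to a set of tools. Depending on the user's input, the agent decides which tool to call, if any.

Chains: for some simple applications a single LLM call is enough, but many more complex applications require chaining LLMs together, either with each other or with other specialized components. LangChain provides a standard interface for chains and some common chain implementations for convenience.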
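To make the "chaining" idea concrete, here is a minimal plain-Python sketch. It is not LangChain's API: `TEMPLATE`, `build_prompt`, `fake_llm` and `make_chain` are hypothetical names invented for illustration, and `fake_llm` only stands in for a real model call.

```python
from typing import Callable

# Hypothetical template; the variable names are illustrative only.
TEMPLATE = "Context:\n{context}\n\nQuestion: {question}\nAnswer:"

def build_prompt(context: str, question: str) -> str:
    """Fill the template with runtime values (the 'Prompts' idea)."""
    return TEMPLATE.format(context=context, question=question)

def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call (the 'Models' idea)."""
    return f"[model reply to a {len(prompt)}-character prompt]"

def make_chain(*steps: Callable) -> Callable:
    """Compose steps so each one's output feeds the next (the 'Chains' idea)."""
    def run(value):
        for step in steps:
            value = step(value)
        return value
    return run

qa_chain = make_chain(
    lambda d: build_prompt(d["context"], d["question"]),
    fake_llm,
)
answer = qa_chain({
    "context": "FAISS is a similarity search library.",
    "question": "What is FAISS?",
})
print(answer)
```

The real framework adds much more (memory, retrievers, agents), but the underlying pattern is the same: small components with compatible inputs and outputs, composed into a pipeline.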

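FAISS (Facebook AI Similarity Search) is a library for efficient similarity search and clustering of dense vectors. It can search multimedia documents, such as images, in cases where standard (SQL) database engines cannot do so or do so inefficiently. It contains algorithms that search in sets of vectors of any size, including sets that may not fit in RAM. It also contains supporting code for evaluation and parameter tuning.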
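As a rough illustration of what "similarity search over dense vectors" means, the sketch below does a brute-force nearest-neighbour lookup in plain Python. It is only a conceptual stand-in: FAISS uses optimized index structures rather than a Python loop, and cosine similarity is chosen here for illustration (FAISS also supports other metrics, such as L2 distance). The function names and the toy vectors are assumptions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, vectors, k=2):
    """Return the indices of the k stored vectors most similar to query."""
    scored = [(cosine_similarity(query, v), i) for i, v in enumerate(vectors)]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]

# Toy 3-dimensional "embeddings"; real ones have hundreds of dimensions.
stored = [
    [1.0, 0.0, 0.0],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
nearest = top_k([1.0, 0.05, 0.0], stored, k=2)
print(nearest)  # [0, 1]
```

This loop compares the query against every stored vector, which is fine for a handful of items but scales linearly with the collection size; avoiding that cost is the reason to use a dedicated library.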

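Process flow

In this section I briefly describe each part of the process.

Initialize the model pipeline: use Hugging Face's transformers library to initialize a text-generation pipeline for the pretrained Llama-2-7b-chat-hf model.

Ingest data: load text data from any source into a document loader.

Split into chunks: split the loaded text into smaller chunks. This is necessary because a language model can only process a limited amount of text at a time.

Create embeddings: convert the text chunks into numerical representations, also known as embeddings. Because embeddings capture the semantic meaning of the text, they are used to search large collections quickly and retrieve similar or related documents.

Load the embeddings into a vector store: load the embeddings into a vector store (FAISS in this case). Compared with traditional databases, vector stores excel at similarity search based on text embeddings.

Enable memory: combining the chat history with a new question and turning them into a single standalone question is what makes it possible to ask follow-up questions.

Query the data: use the embeddings to search the vector store for relevant stored information.

Generate the answer: pass the standalone question and the relevant information to the question-answering chain, where the language model generates the answer.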
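The "enable memory" step above can be sketched as plain prompt construction. The template text and the helper names below are hypothetical, written for illustration rather than copied from LangChain; in a real chain the resulting prompt would be sent to the LLM, whose reply becomes the standalone question used for retrieval.

```python
# Hypothetical condense-question template, for illustration only.
CONDENSE_TEMPLATE = (
    "Given the conversation below and a follow-up question, "
    "rephrase the follow-up as a standalone question.\n\n"
    "Chat history:\n{history}\n"
    "Follow-up question: {question}\n"
    "Standalone question:"
)

def format_history(chat_history):
    """Render (human, assistant) turns as plain text."""
    return "\n".join(f"Human: {q}\nAssistant: {a}" for q, a in chat_history)

def build_condense_prompt(chat_history, question):
    """Combine earlier turns and the new question into one prompt."""
    return CONDENSE_TEMPLATE.format(
        history=format_history(chat_history),
        question=question,
    )

history = [
    ("What is a data lakehouse?",
     "An architecture that combines data lakes and data warehouses."),
]
prompt = build_condense_prompt(history, "How does Databricks implement it?")
print(prompt)
```

Without this step, a follow-up such as "How does Databricks implement it?" would be embedded and searched on its own, and the retriever would have no way to know what "it" refers to.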

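Writing the code

In this section I walk through each step of the code in detail.

Getting started

You can use the open-source Llama-2-7b-chat model with both Hugging Face transformers and LangChain. First, however, you must request access to the Llama 2 models through the Meta website, and then agree on the Hugging Face website to share your account details with Meta. Access is usually granted within a few minutes to a few hours.

Note that the email address you provide on the Hugging Face website must match the one you provided on the Meta website, otherwise your request will not be approved.

If you are running the code in Google Colab, go to Runtime > Change runtime type > Hardware accelerator > GPU > GPU type > T4 in the notebook. Inference needs about 8 GB of GPU RAM, and running on a CPU is practically impossible.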

Installing dependencies

!pip install -qU transformers accelerate einops langchain xformers bitsandbytes faiss-gpu sentence_transformers

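Initializing the Hugging Face pipeline

You must initialize a text-generation pipeline with Hugging Face transformers. The pipeline requires the following three things:

1. An LLM, in this case meta-llama/Llama-2-7b-chat-hf.

2. The corresponding tokenizer for the model.

3. A stopping criteria object.

You have to initialize the model and move it to a CUDA-enabled GPU. On Colab, downloading and initializing the model can take 5-10 minutes.

You also need to generate an access token so that the code can download the model from Hugging Face. To do so, go to your Hugging Face profile > Settings > Access Tokens > New token > Generate a token. Copy the token and add it to the code below.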

    from torch import cuda, bfloat16
    import transformers

    model_id = 'meta-llama/Llama-2-7b-chat-hf'
    device = f'cuda:{cuda.current_device()}' if cuda.is_available() else 'cpu'

    # set quantization configuration to load large model with less GPU memory
    # this requires the `bitsandbytes` library
    bnb_config = transformers.BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type='nf4',
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=bfloat16
    )

    # begin initializing HF items, you need an access token
    hf_auth = '<add your access token here>'
    model_config = transformers.AutoConfig.from_pretrained(
        model_id,
        use_auth_token=hf_auth
    )
    model = transformers.AutoModelForCausalLM.from_pretrained(
        model_id,
        trust_remote_code=True,
        config=model_config,
        quantization_config=bnb_config,
        device_map='auto',
        use_auth_token=hf_auth
    )

    # enable evaluation mode to allow model inference
    model.eval()
    print(f"Model loaded on {device}")
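A back-of-the-envelope calculation shows why the 4-bit quantization configuration matters. The sketch below estimates the memory taken by the weights alone, assuming a nominal 7 billion parameters; it deliberately ignores activations, the key-value cache, quantization metadata and framework overhead, so actual GPU usage will be higher than these figures. `weight_memory_gib` is a hypothetical helper.

```python
def weight_memory_gib(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 1024**3

params = 7e9  # nominal "7B"; the exact parameter count differs slightly
fp16_gib = weight_memory_gib(params, 16)
nf4_gib = weight_memory_gib(params, 4)

print(f"16-bit weights: {fp16_gib:.1f} GiB")  # 16-bit weights: 13.0 GiB
print(f" 4-bit weights: {nf4_gib:.1f} GiB")   #  4-bit weights: 3.3 GiB
```

At 16 bits per weight the model would leave very little headroom on a 16 GB card, whereas at 4 bits the weights take roughly a quarter of that space.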

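The pipeline needs a tokenizer, which converts human-readable plain text into token IDs that the LLM can read. The Llama 2 7B model was trained with the Llama 2 7B tokenizer, which you can initialize with the following code:

    tokenizer = transformers.AutoTokenizer.from_pretrained(
        model_id,
        use_auth_token=hf_auth
    )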

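Now we need to define the model's stopping criteria. Stopping criteria let us specify when the model should stop generating text. Without them, the model tends to wander off on a tangent after answering the initial question.

    stop_list = ['\nHuman:', '\n```\n']
    stop_token_ids = [tokenizer(x)['input_ids'] for x in stop_list]
    stop_token_ids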

You have to convert these stop token IDs into LongTensor objects.

    import torch

    stop_token_ids = [torch.LongTensor(x).to(device) for x in stop_token_ids]
    stop_token_ids

You can quickly check whether token ID 0 (the unknown token) appears in stop_token_ids. It does not, so we can go on to build the stopping criteria object, which checks whether a stopping condition has been met, that is, whether any of these token ID sequences has been generated.

    from transformers import StoppingCriteria, StoppingCriteriaList

    # define custom stopping criteria object
    class StopOnTokens(StoppingCriteria):
        def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor, **kwargs) -> bool:
            # stop as soon as the generated sequence ends with any stop sequence
            for stop_ids in stop_token_ids:
                if torch.eq(input_ids[0][-len(stop_ids):], stop_ids).all():
                    return True
            return False

    stopping_criteria = StoppingCriteriaList([StopOnTokens()])
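The check performed by the stopping criteria is a suffix comparison. The plain-Python sketch below mirrors that logic on ordinary lists so it can be run without a GPU or a tokenizer; the token IDs are made up for illustration and `ends_with_any` is a hypothetical helper, not part of transformers.

```python
def ends_with_any(generated_ids, stop_sequences):
    """True if the generated ids end with any of the stop sequences."""
    for stop_ids in stop_sequences:
        n = len(stop_ids)
        # Guard against empty stop sequences and outputs shorter than the stop sequence.
        if n and len(generated_ids) >= n and generated_ids[-n:] == stop_ids:
            return True
    return False

# Made-up token ids, for illustration only.
stop_sequences = [[13, 29950, 7889, 29901], [13, 28956, 13]]

print(ends_with_any([1, 450, 1234, 13, 28956, 13], stop_sequences))  # True
print(ends_with_any([1, 450, 1234, 13, 28956], stop_sequences))      # False
```

Because only the tail of the sequence is inspected, the check is cheap enough to run after every generated token.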

You are now ready to initialize the Hugging Face pipeline. A few additional parameters have to be defined here; the comments in the code explain them.

    generate_text = transformers.pipeline(
        model=model,
        tokenizer=tokenizer,
        return_full_text=True,  # langchain expects the full text
        task='text-generation',
        # we pass model parameters here too
        stopping_criteria=stopping_criteria,  # without this model rambles during chat
        temperature=0.1,  # 'randomness' of outputs; lower values are more deterministic
        max_new_tokens=512,  # max number of tokens to generate in the output
        repetition_penalty=1.1  # without this output begins repeating
    )
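To see why a low temperature makes the output less random, the sketch below applies temperature scaling to a softmax over three made-up logits. It assumes the usual formulation in which logits are divided by the temperature before the softmax; `softmax_with_temperature` is a hypothetical helper, not a transformers function.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities, sharpening them when temperature < 1."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens

for t in (1.0, 0.1):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At a temperature of 1.0 the top candidate gets a little under two thirds of the probability mass, while at 0.1 it gets almost all of it, so sampling behaves nearly like always picking the most likely token.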

Run the following code to confirm that everything works.

    res = generate_text("Explain me the difference between Data Lakehouse and Data Warehouse.")
    print(res[0]["generated_text"])

Using the Hugging Face pipeline in LangChain

Next, wrap the Hugging Face pipeline in LangChain. The output is the same as before, since nothing about the model has changed, but this wrapper lets you use LangChain's more advanced features, such as agents, tools and chains, with Llama 2.

    from langchain.llms import HuggingFacePipeline

    llm = HuggingFacePipeline(pipeline=generate_text)

    # checking again that everything is working fine
    llm(prompt="Explain me the difference between Data Lakehouse and Data Warehouse.")

Ingesting data with a document loader

Use the WebBaseLoader document loader to ingest data; it collects data by scraping web pages. In this case you will collect data from the Databricks documentation website.

    from langchain.document_loaders import WebBaseLoader

    web_links = [
        "https://www.databricks.com/",
        "https://help.databricks.com",
        "https://databricks.com/try-databricks",
        "https://help.databricks.com/s/",
        "https://docs.databricks.com",
        "https://kb.databricks.com/",
        "http://docs.databricks.com/getting-started/index.html",
        "http://docs.databricks.com/introduction/index.html",
        "http://docs.databricks.com/getting-started/tutorials/index.html",
        "http://docs.databricks.com/release-notes/index.html",
        "http://docs.databricks.com/ingestion/index.html",
        "http://docs.databricks.com/exploratory-data-analysis/index.html",
        "http://docs.databricks.com/data-preparation/index.html",
        "http://docs.databricks.com/data-sharing/index.html",
        "http://docs.databricks.com/marketplace/index.html",
        "http://docs.databricks.com/workspace-index.html",
        "http://docs.databricks.com/machine-learning/index.html",
        "http://docs.databricks.com/sql/index.html",
        "http://docs.databricks.com/delta/index.html",
        "http://docs.databricks.com/dev-tools/index.html",
        "http://docs.databricks.com/integrations/index.html",
        "http://docs.databricks.com/administration-guide/index.html",
        "http://docs.databricks.com/security/index.html",
        "http://docs.databricks.com/data-governance/index.html",
        "http://docs.databricks.com/lakehouse-architecture/index.html",
        "http://docs.databricks.com/reference/api.html",
        "http://docs.databricks.com/resources/index.html",
        "http://docs.databricks.com/whats-coming.html",
        "http://docs.databricks.com/archive/index.html",
        "http://docs.databricks.com/lakehouse/index.html",
        "http://docs.databricks.com/getting-started/quick-start.html",
        "http://docs.databricks.com/getting-started/etl-quick-start.html",
        "http://docs.databricks.com/getting-started/lakehouse-e2e.html",
        "http://docs.databricks.com/getting-started/free-training.html",
        "http://docs.databricks.com/sql/language-manual/index.html",
        "http://docs.databricks.com/error-messages/index.html",
        "http://www.apache.org/",
        "https://databricks.com/privacy-policy",
        "https://databricks.com/terms-of-use",
    ]

    loader = WebBaseLoader(web_links)
    documents = loader.load()
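Scraping a page for ingestion essentially means reducing HTML to its visible text. The sketch below shows that idea with only the Python standard library. It is not how WebBaseLoader is implemented, and `TextExtractor` and `html_to_text` are hypothetical names; the sample HTML is made up.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())

def html_to_text(html: str) -> str:
    """Return the visible text of an HTML string, one fragment per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)

sample = (
    "<html><head><style>p{color:red}</style></head>"
    "<body><h1>Title</h1><p>Body text.</p><script>track()</script></body></html>"
)
print(html_to_text(sample))
```

A real loader also has to fetch the pages over HTTP, handle errors and attach metadata such as the source URL to each document, which is why using the ready-made loader is the more practical choice.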

Splitting the text into chunks with a text splitter

You have to make sure the text is split into small chunks. To do so, initialize a RecursiveCharacterTextSplitter and call it with the documents.

    from langchain.text_splitter import RecursiveCharacterTextSplitter

    text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
    all_splits = text_splitter.split_documents(documents)
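The roles of `chunk_size` and `chunk_overlap` are easier to see in a simplified splitter. The sketch below cuts text into fixed-size character windows that share a few characters with their neighbours. It is a deliberately simpler technique than RecursiveCharacterTextSplitter, which tries to split on separators such as paragraph and line breaks first; `split_with_overlap` is a hypothetical helper.

```python
def split_with_overlap(text: str, chunk_size: int, chunk_overlap: int):
    """Cut text into fixed-size character windows that overlap by chunk_overlap."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

chunks = split_with_overlap("abcdefghijklmnopqrstuvwxyz", chunk_size=10, chunk_overlap=2)
print(chunks)  # ['abcdefghij', 'ijklmnopqr', 'qrstuvwxyz']
```

The overlap means that a sentence falling on a chunk boundary is at least partly present in both neighbouring chunks, which reduces the chance that retrieval misses it.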

Creating embeddings and storing them in a vector store

You need to create an embedding for each small text chunk and store the embeddings in a vector store (FAISS here). The all-mpnet-base-v2 sentence transformer converts every chunk into a vector, and the vectors are stored in the vector store as they are created.

    from langchain.embeddings import HuggingFaceEmbeddings
    from langchain.vectorstores import FAISS

    model_name = "sentence-transformers/all-mpnet-base-v2"
    model_kwargs = {"device": "cuda"}
    embeddings = HuggingFaceEmbeddings(model_name=model_name, model_kwargs=model_kwargs)

    # storing embeddings in the vector store
    vectorstore = FAISS.from_documents(all_splits, embeddings)

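Initializing the chain

You need to initialize a ConversationalRetrievalChain. This chain gives you a chatbot with memory that relies on the vector store to find relevant information in your documents. You can also pass the optional parameter return_source_documents=True when building the chain, so that it returns the source documents used to answer the question.

    from langchain.chains import ConversationalRetrievalChain

    chain = ConversationalRetrievalChain.from_llm(llm, vectorstore.as_retriever(), return_source_documents=True)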

Now it is time to do some question answering over your own data!

    chat_history = []
    query = "What is Data lakehouse architecture in Databricks?"
    result = chain({"question": query, "chat_history": chat_history})
    print(result['answer'])
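To ask follow-up questions, each completed turn has to be appended to `chat_history` before the next call. The sketch below wraps that bookkeeping in a hypothetical `ask` helper. So that it runs without a GPU, it uses `stub_chain`, a made-up stand-in that only imitates the input and output keys of the chain above; with the real setup you would pass `chain` instead.

```python
def ask(chain, question, chat_history):
    """Query the chain, then record the turn so follow-ups have context."""
    result = chain({"question": question, "chat_history": chat_history})
    chat_history.append((question, result["answer"]))
    return result

# Stand-in for the real chain (assumption: same input and output keys).
def stub_chain(inputs):
    turns = len(inputs["chat_history"])
    return {
        "answer": f"stub answer (saw {turns} earlier turns)",
        "source_documents": [],
    }

chat_history = []
first = ask(stub_chain, "What is Data lakehouse architecture in Databricks?", chat_history)
second = ask(stub_chain, "What are Data Governance and Interoperability in it?", chat_history)
print(second["answer"])  # stub answer (saw 1 earlier turns)
```

Because the chain was built with return_source_documents=True, the real result also carries the retrieved chunks under `result['source_documents']`, which is useful for checking where an answer came from.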


You can now use a powerful language model to answer questions over your own data. From here, you could use Streamlit to develop it further into a chatbot application.
