

Lesson 2.8a: Embeddings, a Code Walkthrough of the Chroma Vector Database -- Large Language Model Application Development

2023-08-16 20:22 Author: 加油_芋頭

Code for using the Chroma vector database


# MAGIC %md
# MAGIC ## Read data


# COMMAND ----------


import pandas as pd


dais_pdf = pd.read_parquet(f"{DA.paths.datasets}/dais/dais23_talks.parquet")

display(dais_pdf)


# COMMAND ----------


dais_pdf["full_text"] = dais_pdf.apply(
    lambda row: f"""Title: {row["Title"]}
        Abstract: {row["Abstract"]}""".strip(),
    axis=1,
)

print(dais_pdf.iloc[0]["full_text"])


# COMMAND ----------


texts = dais_pdf["full_text"].to_list()


# COMMAND ----------


# MAGIC %md

# MAGIC ## Question 1

# MAGIC Set up Chroma and create collection


# COMMAND ----------


import chromadb

from chromadb.config import Settings


chroma_client = chromadb.Client(
    Settings(
        chroma_db_impl="duckdb+parquet",
        persist_directory=DA.paths.user_db,  # this is an optional argument. If you don't supply this, the data will be ephemeral
    )
)


# COMMAND ----------


# MAGIC %md

# MAGIC

# MAGIC Assign the value of `my_talks` to the `collection_name` variable.


# COMMAND ----------


# TODO

collection_name = "<FILL_IN>"


# If you have created the collection before, you need to delete the collection first
if collection_name in [c.name for c in chroma_client.list_collections()]:
    chroma_client.delete_collection(name=collection_name)

print(f"Creating collection: '{collection_name}'")
talks_collection = chroma_client.create_collection(name=collection_name)


# COMMAND ----------


# Test your answer. DO NOT MODIFY THIS CELL.


dbTestQuestion2_1(collection_name)


# COMMAND ----------


# MAGIC %md

# MAGIC ## Question 2

# MAGIC

# MAGIC [Add](https://docs.trychroma.com/reference/Collection#add) data to the collection.


# COMMAND ----------


# TODO

talks_collection.add(
    documents=<FILL_IN>,
    ids=<FILL_IN>
)
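As an illustrative sketch (not the graded answer): `add()` requires one unique string id per document, so a common pattern is to derive the ids from list positions. The `talk_{i}` naming below is an assumption for illustration, not something the exercise prescribes.

```python
# Illustrative sketch: Chroma's add() pairs each document with a unique string id.
sample_texts = [
    "Title: Intro to LLMs\nAbstract: ...",
    "Title: Vector databases\nAbstract: ...",
]

# One id per document, e.g. "talk_0", "talk_1", ... (naming is illustrative)
sample_ids = [f"talk_{i}" for i in range(len(sample_texts))]

print(sample_ids)  # ['talk_0', 'talk_1']

# With the real collection and the `texts` list from earlier, the call would
# then look like:
# talks_collection.add(documents=texts, ids=[f"talk_{i}" for i in range(len(texts))])
```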


# COMMAND ----------


# Test your answer. DO NOT MODIFY THIS CELL.


dbTestQuestion2_2(talks_collection)


# COMMAND ----------


# MAGIC %md

# MAGIC ## Question 3

# MAGIC

# MAGIC [Query](https://docs.trychroma.com/reference/Collection#query) for relevant documents. If you are looking for talks related to language models, your query texts could be `language models`.


# COMMAND ----------


# TODO

import json


results = talks_collection.query(
    query_texts=<FILL_IN>,
    n_results=<FILL_IN>
)


print(json.dumps(results, indent=4))
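As an illustrative sketch (not the graded answer): `query_texts` takes a list of natural-language query strings, and `n_results` caps how many nearest documents come back per query. The specific query string and the choice of 10 results below are assumptions for illustration.

```python
# Illustrative sketch: the arguments Chroma's query() expects.
query_args = {
    "query_texts": ["language models"],  # one or more natural-language queries
    "n_results": 10,                     # top-k most similar documents per query
}

print(query_args["query_texts"], query_args["n_results"])

# With the real collection:
# results = talks_collection.query(**query_args)
```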


# COMMAND ----------


# Test your answer. DO NOT MODIFY THIS CELL.


dbTestQuestion2_3(results)


# COMMAND ----------


# MAGIC %md

# MAGIC ## Question 4

# MAGIC

# MAGIC Load a language model and create a [pipeline](https://huggingface.co/docs/transformers/main/en/main_classes/pipelines).


# COMMAND ----------


# TODO

from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline


# Pick a model from HuggingFace that can generate text

model_id = "<FILL_IN>"

tokenizer = AutoTokenizer.from_pretrained(model_id)

lm_model = AutoModelForCausalLM.from_pretrained(model_id)


pipe = pipeline(
    "<FILL_IN>", model=lm_model, tokenizer=tokenizer, max_new_tokens=512, device_map="auto", handle_long_generation="hole"
)
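As an illustrative sketch (not the graded answer): for a causal LM the pipeline task name is `"text-generation"`, and any small text-generation model from the Hugging Face Hub would do; `"gpt2"` below is an assumption, not a model the exercise prescribes.

```python
# Illustrative sketch: the choices that would fill the blanks above.
pipeline_args = {
    "task": "text-generation",         # pipeline task name for causal LMs
    "model_id": "gpt2",                # illustrative model choice (assumption)
    "max_new_tokens": 512,             # generate at most 512 new tokens
    "handle_long_generation": "hole",  # truncate over-long prompts on the left
}

print(pipeline_args["task"])  # text-generation
```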


# COMMAND ----------


# Test your answer. DO NOT MODIFY THIS CELL.


dbTestQuestion2_4(pipe)


# COMMAND ----------


# MAGIC %md

# MAGIC ## Question 5

# MAGIC

# MAGIC Prompt engineering for question answering


# COMMAND ----------


# TODO

# Come up with a question that you need the LLM assistant to help you with

# A sample question is "Help me find sessions related to XYZ"

# Note: Your "XYZ" should be related to the query you passed in Question 3.

question = "<FILL_IN>"


# Provide the similar documents returned from the query cell above

context = <FILL_IN>


# Feel free to be creative in how you construct the prompt. You can use the demo notebook as a jumpstart reference.

# You can also add requirements in the prompt for how you want the answers to look.

# Example requirement: "Recommend top-5 relevant sessions for me to attend."

prompt_template = <FILL_IN>
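As an illustrative sketch (not the graded answer): one common shape is to inline the retrieved context and the user question into a single instruction string. The sample question, sample documents, and wording below are assumptions for illustration.

```python
# Illustrative sketch: building a prompt from retrieved context plus a question.
sample_question = "Help me find sessions related to language models"
sample_context = [
    "Title: Intro to LLMs\nAbstract: ...",
    "Title: Vector databases\nAbstract: ...",
]

# Join the retrieved documents into one block, then prepend it to the question.
context_block = "\n\n".join(sample_context)
sample_prompt = (
    f"Relevant sessions:\n{context_block}\n\n"
    f"{sample_question}. Recommend top-5 relevant sessions for me to attend."
)

print(sample_prompt)
```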


# COMMAND ----------


# Test your answer. DO NOT MODIFY THIS CELL.


dbTestQuestion2_5(question, context, prompt_template)


# COMMAND ----------


# MAGIC %md

# MAGIC ## Question 6

# MAGIC

# MAGIC Submit the query for the language model to generate a response.

# MAGIC

# MAGIC Hint: If you run into the error `index out of range in self`, make sure to check out this [documentation page](https://huggingface.co/docs/transformers/main_classes/pipelines#transformers.TextGenerationPipeline.__call__.handle_long_generation).


# COMMAND ----------


# TODO

lm_response = pipe(<FILL_IN>)

print(lm_response[0]["generated_text"])


# COMMAND ----------


# Test your answer. DO NOT MODIFY THIS CELL.


dbTestQuestion2_6(lm_response)


# COMMAND ----------


# MAGIC %md

# MAGIC Notice that the output isn't exactly helpful. Head on to using OpenAI to try out GPT-3.5 instead!


# COMMAND ----------


# MAGIC %md

# MAGIC ## OPTIONAL (Non-Graded): Use OpenAI models for Q/A

# MAGIC

# MAGIC For this section to work, you need to generate an OpenAI API key.

# MAGIC

# MAGIC Steps:

# MAGIC 1. You need to [create an account](https://platform.openai.com/signup) on OpenAI.

# MAGIC 2. Generate an OpenAI [API key here](https://platform.openai.com/account/api-keys).

# MAGIC

# MAGIC Note: OpenAI does not have a free option, but it gives you $5 of credit. Once you have exhausted your $5 credit, you will need to add a payment method. You will be [charged per token usage](https://openai.com/pricing). **IMPORTANT**: Keep your OpenAI API key to yourself. If others have access to your OpenAI key, they can charge their usage to your account!


# COMMAND ----------


# TODO

import os


os.environ["OPENAI_API_KEY"] = "<FILL IN>"


# COMMAND ----------


import openai


openai.api_key = os.environ["OPENAI_API_KEY"]


# COMMAND ----------


# MAGIC %md

# MAGIC If you would like to estimate how much it would cost to use OpenAI, you can use the `tiktoken` library from OpenAI to get the number of tokens in your prompt.

# MAGIC

# MAGIC

# MAGIC We will be using `gpt-3.5-turbo` since it's the most economical option, at $0.002/1k tokens as of May 2023. GPT-4 charges $0.04/1k tokens. The following code block is referenced from OpenAI's documentation on ["Managing tokens"](https://platform.openai.com/docs/guides/chat/managing-tokens).


# COMMAND ----------


import tiktoken


price_token = 0.002

encoder = tiktoken.encoding_for_model("gpt-3.5-turbo")

cost_to_run = len(encoder.encode(prompt_template)) / 1000 * price_token

print(f"It would take roughly ${round(cost_to_run, 5)} to run this prompt")
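The per-token arithmetic above can be factored into a small helper that works for any model, given a token count (from `tiktoken`) and a per-1k-token price. The example token count and the $0.002 rate below are taken from the May 2023 figures quoted in this notebook; they are illustrative, not current prices.

```python
# Illustrative sketch: cost = (tokens / 1000) * price-per-1k-tokens.
def estimate_cost(n_tokens: int, price_per_1k_tokens: float) -> float:
    """Return the dollar cost of sending n_tokens at the given rate."""
    return n_tokens / 1000 * price_per_1k_tokens

# e.g. a 1,500-token prompt on gpt-3.5-turbo at $0.002/1k tokens:
print(round(estimate_cost(1500, 0.002), 5))  # 0.003
```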


# COMMAND ----------


# MAGIC %md

# MAGIC We won't have to create a new vector database again. We can just send our `context` from above to OpenAI. We will use their chat completion API to interact with `GPT-3.5-turbo`. You can refer to their [documentation here](https://platform.openai.com/docs/guides/chat).

# MAGIC

# MAGIC Something interesting is that OpenAI models use the system message to help the assistant be more accurate. From OpenAI's [docs](https://platform.openai.com/docs/guides/chat/introduction):

# MAGIC

# MAGIC > Future models will be trained to pay stronger attention to system messages. The system message helps set the behavior of the assistant.

# MAGIC

# MAGIC


# COMMAND ----------


# TODO

gpt35_response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": <FILL_IN>},
    ],
    temperature=0,  # 0 makes outputs deterministic; the closer the value is to 1, the more random the outputs are each time you re-run.
)
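As an illustrative sketch (not the graded answer): the user message usually carries the same prompt built in Question 5, i.e. retrieved context plus the question in one string. The sample prompt string below is an assumption for illustration.

```python
# Illustrative sketch: the messages list the chat completion API expects.
sample_prompt = (
    "Relevant sessions: ...\n\n"
    "Help me find sessions related to language models"
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": sample_prompt},  # <FILL_IN> would be your prompt from Question 5
]

print([m["role"] for m in messages])  # ['system', 'user']
```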


# COMMAND ----------


print(gpt35_response.choices[0]["message"]["content"])


# COMMAND ----------


from IPython.display import Markdown


Markdown(gpt35_response.choices[0]["message"]["content"])


# COMMAND ----------


# MAGIC %md

# MAGIC We can also check how many tokens OpenAI has used.


# COMMAND ----------


gpt35_response["usage"]["total_tokens"]


