ChatGPT Moderation: How can AIGC be aligned with human values? How can parsnips language and ecological/destructive discourse be identified?

2023-07-22 17:02 Author: biggertree-Jing

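ChatGPT API Moderation model

To keep artificial intelligence aligned with healthy human values, ChatGPT has built a moderation model (Moderation Model). Its purpose is to identify malicious language and instructions such as pornography, violence, insults, and vulgarity. This goal appears to coincide with needs in several fields: screening out parsnips language in English teaching (note: parsnips refers to sensitive words related to politics, alcohol, religion, sex, narcotics, -isms, and pork), distinguishing ecological discourse from destructive discourse in ecological discourse analysis, and identifying ideology (values) in critical discourse analysis. It has strong potential for language teaching applications, discourse research, and teaching-materials development. For this reason, I am reposting the following article in the hope that it will be helpful.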


In this article, discover what the ChatGPT API Moderation model is, which 7 categories it uses, and how to call it and interpret its results.

ChatGPT API Moderation model

The OpenAI API can classify any text to check that it complies with OpenAI's usage policies, using a binary classification. This classification is built into the Moderation model, which you can call through the openai package in Python.

7 categories are used in the OpenAI model: Hate, Hate/Threatening, Self-harm, Sexual, Sexual/minors, Violence, Violence/graphic.

You can use them to filter inappropriate content (comments on a website, client inputs in chatbot requests…).

Source: OpenAI documentation – 7 categories in Moderation Model


OpenAI API Moderation method

The method to call for the moderation classification is: openai.Moderation.create
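As a minimal sketch of what such a call involves, the example below posts the input text to the REST moderation endpoint using only the Python standard library. The article itself uses openai.Moderation.create, which belongs to the pre-1.0 openai package; going through the endpoint directly is a substitution made here so the sketch does not depend on the package version. The helper names build_moderation_request and moderate are hypothetical, and the API key is assumed to be supplied by the caller.

```python
import json
import urllib.request

MODERATION_URL = "https://api.openai.com/v1/moderations"


def build_moderation_request(text: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the moderation endpoint."""
    body = json.dumps({"input": text}).encode("utf-8")
    return urllib.request.Request(
        MODERATION_URL,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def moderate(text: str, api_key: str) -> dict:
    """Send the request and return the decoded JSON response as a dict."""
    request = build_moderation_request(text, api_key)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling moderate("I love chocolate", api_key) with a valid key should return a dictionary with the fields described in this section. Splitting request construction from sending makes it possible to inspect the request without network access.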

The answer is a JSON object:

In the JSON object, you have:

  • model: The model used at the time of writing is called “text-moderation-004”.

  • results: in which you have:

    • categories: For each of the 7 categories, you have a binary classification:

      • True: if the input text violates the given category

      • False: if it does not

    • Category scores: for each category, a score is calculated. It is not a probability. The lower the score, the less the content violates the category; the higher the score, the more it violates it.

    • flagged: the final classification of the input:

      • “false” if the input text does not violate OpenAI’s policies.

      • “true” if it does: if at least one category is true, this flag is set to true too.
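To make the fields above concrete, here is a small sketch that reads a response shaped like the one described. SAMPLE_RESPONSE is a hand-written placeholder, not real API output: the id and most scores are invented for illustration, and only the 0.52 "hate" score echoes the violation example in this article. The summarize helper is hypothetical as well.

```python
from typing import Any, Dict

# Hand-written placeholder shaped like a moderation response.
# The id and the small scores are invented, NOT real API output.
SAMPLE_RESPONSE: Dict[str, Any] = {
    "id": "modr-EXAMPLE",
    "model": "text-moderation-004",
    "results": [
        {
            "flagged": True,
            "categories": {
                "hate": True,
                "hate/threatening": False,
                "self-harm": False,
                "sexual": False,
                "sexual/minors": False,
                "violence": False,
                "violence/graphic": False,
            },
            "category_scores": {
                "hate": 0.52,
                "hate/threatening": 0.001,
                "self-harm": 0.001,
                "sexual": 0.001,
                "sexual/minors": 0.001,
                "violence": 0.01,
                "violence/graphic": 0.001,
            },
        }
    ],
}


def summarize(response: Dict[str, Any]) -> Dict[str, Any]:
    """Extract the overall flag, the violated categories, and the top score."""
    result = response["results"][0]
    violated = sorted(name for name, hit in result["categories"].items() if hit)
    top_category, top_score = max(
        result["category_scores"].items(), key=lambda item: item[1]
    )
    return {
        "flagged": result["flagged"],
        "violated": violated,
        "top_category": top_category,
        "top_score": top_score,
    }
```

For the placeholder above, summarize(SAMPLE_RESPONSE) reports a flagged input whose only violated category is "hate".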

Moderation API Call

Standard Call

The classification of the prompt “I love chocolate” is “false”, meaning it does not violate any of the above categories.

Here is the detailed output:

All scores are very low, thus the given categories are all “false”.

Call violation

The prompt given in the following request is just for illustration. It is not a personal opinion.

The output is “true”, meaning there is a violation. This is because the input violates the first category “hate” with a score of 0.52, while the other categories are all showing very low scores.

Some variants

When the input states a personal belief, the model classifies it correctly. However, when the input is phrased as a general opinion, the model does not classify it as violating the policies.

Here is an example where the classification is false even though the input has a negative connotation:

Here is another variant, where a single comma changes the score considerably (the classification is “true” in both cases):

Without the comma, the score is about 0.66.

With the comma, the score is about 0.954.
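Since small edits such as a comma can shift the score, it may help to score several phrasings side by side. The sketch below is one possible way to do that. Both compare_variants and score_spread are hypothetical helpers, and score_fn is an assumed callable that returns the category scores for a text (for example, a thin wrapper around the moderation call).

```python
from typing import Callable, Dict, List, Tuple

# Assumed signature: text in, {category name: score} out.
ScoreFn = Callable[[str], Dict[str, float]]


def compare_variants(
    variants: List[str], score_fn: ScoreFn, category: str
) -> List[Tuple[str, float]]:
    """Score each variant for one category, highest score first."""
    scored = [(text, score_fn(text)[category]) for text in variants]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def score_spread(scored: List[Tuple[str, float]]) -> float:
    """Difference between the highest and lowest score among the variants."""
    scores = [score for _, score in scored]
    return max(scores) - min(scores)
```

A large spread between near-identical phrasings suggests the raw score should be treated with caution for that kind of input.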

Summary

In this article, you have learned how to use the ChatGPT API Moderation model, which you can put in place in your own project or website to filter out inappropriate inputs or comments.

I hope you enjoyed reading the article. Leave me a SanLian :-)


The English part of this article is reposted from: https://machinelearning-basics.com/chatgpt-api-moderation-model/
