
What exactly is Clip skip in stable-diffusion?

2023-03-24 21:20 Author: Gnedl

The CLIP model (the text embedding used in 1.x models) is composed of layers. Each layer is more specific than the last. For example, if layer 1 is "Person", then layer 2 could be "male" and "female"; if you then go down the "male" path, layer 3 could be: man, boy, lad, father, grandpa, etc. Note that this is not exactly how the CLIP model is structured; it is only an illustration.

The 1.5 model, for example, is 12 layers deep, with the 12th layer being the last layer of the text embedding. Loosely speaking, each layer is a matrix of some size, and each layer has additional matrices under it: a 4x4 first layer has four 4x4 matrices under it, and so on. So the text space is dimensionally huge.

Now, why would you want to stop earlier in the CLIP layers? If you want a picture of "a cow", you might not care about the subcategories of "cow" the text model might have, especially since these can vary in quality. So if you want "a cow", you might not want "an Aberdeen Angus bull".
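To make "stopping earlier" concrete, here is a minimal toy sketch, not the real CLIP code. The layers are hypothetical stand-ins (each one just adds its index to the input), and the numbering is an assumption that follows the convention where clip skip 1 means the last layer and clip skip 2 means the second-to-last.

```python
from typing import Callable, List

Vector = List[float]
Layer = Callable[[Vector], Vector]


def make_toy_layers(num_layers: int = 12) -> List[Layer]:
    """Build stand-in layers; a real CLIP layer is a transformer block."""
    def make(i: int) -> Layer:
        # Hypothetical transformation: each layer adds its 1-based index.
        return lambda v: [x + (i + 1) for x in v]
    return [make(i) for i in range(num_layers)]


def encode(tokens: Vector, layers: List[Layer], clip_skip: int = 1) -> Vector:
    """Run the whole stack and return the output selected by clip_skip.

    clip_skip=1 returns the last layer's output, clip_skip=2 the
    second-to-last, and so on (the convention assumed in this sketch).
    """
    if not 1 <= clip_skip <= len(layers):
        raise ValueError("clip_skip must be between 1 and the number of layers")
    hidden_states = []
    state = tokens
    for layer in layers:
        state = layer(state)
        hidden_states.append(state)  # keep every intermediate output
    return hidden_states[-clip_skip]


layers = make_toy_layers(12)
full = encode([0.0], layers, clip_skip=1)     # output of layer 12
skipped = encode([0.0], layers, clip_skip=2)  # output of layer 11
print(full, skipped)
```

The only point of the toy is the last line of `encode`: the text is still processed the same way, you just hand an earlier layer's output to the image model.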

You can think of CLIP skip as basically a setting for "how accurate you want the text model to be". You can test it out, with the XY script for example. You will see that each CLIP stage adds more definition to the description. So if you have a detailed prompt about a young man standing in a field, at the lower CLIP stages you'd get a picture of "a man standing", then deeper "young man standing", then "young man standing in a field", etc.
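If you want to plan such a comparison before running it, a small sketch like the one below lists which layer each clip skip value would stop at while the prompt and seed stay fixed. The helper names `layer_for_clip_skip` and `sweep` are made up for illustration, the seed is arbitrary, and the numbering again assumes clip skip 1 means the last of 12 layers. It does not generate any images.

```python
from typing import List, Tuple


def layer_for_clip_skip(clip_skip: int, num_layers: int = 12) -> int:
    """Return the 1-based layer whose output is used for a clip skip value.

    Assumes clip skip 1 means "use the last layer", 2 means "stop one
    layer earlier", and so on.
    """
    if not 1 <= clip_skip <= num_layers:
        raise ValueError("clip_skip out of range")
    return num_layers - clip_skip + 1


def sweep(prompt: str, seed: int, max_skip: int = 4) -> List[Tuple[str, int, int, int]]:
    """One row per clip skip value, with prompt and seed held fixed."""
    return [(prompt, seed, skip, layer_for_clip_skip(skip))
            for skip in range(1, max_skip + 1)]


rows = sweep("a young man standing in a field", seed=1234)
for prompt, seed, skip, layer in rows:
    print(f"clip skip {skip}: text encoder stops at layer {layer} of 12")
```

Keeping the prompt and seed constant matters here: if anything else changes between runs, you cannot tell whether a difference in the picture comes from clip skip.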

CLIP skip really becomes useful when you use models that are structured in a special way, like Booru models, where the "1girl" tag can break down into many sub-tags that connect to that one major tag. Whether you get any use out of CLIP skip is really just trial and error.

Now keep in mind that CLIP skip only works in models that use CLIP or are based on models that use CLIP, as in 1.x models and their derivatives. 2.0 models and their derivatives do not interact with CLIP because they use OpenCLIP.


It took a while to find a suitable, easy-to-understand explanation, but it was worth it.

This answer is reposted from https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/5674

Take a look if you are interested.
