Interesting AI news everyone can understand, 2023-08-19, updated from time to time
A quick roundup from QQ group 926267297 (CY). Rough machine translation, not carefully verified; for reference only.
NVIDIA has released the source code for Neuralangelo!
The model can turn video from any device into a detailed 3D structure, fully reproducing buildings, sculptures, or other real-world objects and spaces. Here is how it works: the model takes a 2D video that covers the object or scene from multiple angles, and selects frames from different viewpoints to understand depth, size, and shape. The AI then builds an initial 3D representation, much like a sculptor roughing out a subject, and the rendering is optimized to enhance detail, the way a sculptor refines surface texture. The result is a 3D object or scene suitable for virtual reality, digital twins, or robotics.

GitHub: https://github.com/NVlabs/neuralangelo
https://research.nvidia.com/labs/dir/neuralangelo/
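A minimal sketch of the first stage described above, picking frames from a multi-view video with OpenCV; the later stages (camera pose estimation and neural surface optimization) belong to Neuralangelo's own pipeline and are only noted in comments. The paths and sampling stride are illustrative assumptions, not part of the official repo.

```python
# Sketch: sample multi-view frames from a walkaround video as reconstruction input.
# Assumes OpenCV is installed; paths and the sampling stride are illustrative.
import cv2

def extract_frames(video_path: str, out_dir: str, stride: int = 10) -> int:
    """Save every `stride`-th frame so the reconstructor sees many viewpoints."""
    cap = cv2.VideoCapture(video_path)
    saved = index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % stride == 0:
            cv2.imwrite(f"{out_dir}/frame_{saved:05d}.png", frame)
            saved += 1
        index += 1
    cap.release()
    return saved

# extract_frames("walkaround.mp4", "frames")  # hypothetical input video
# Remaining steps of the pipeline described above (handled by the repo itself):
# 1) estimate camera poses for the frames (e.g. with COLMAP),
# 2) optimize a neural surface representation of the scene,
# 3) refine details coarse-to-fine and export the 3D mesh.
```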
SD WebUI image browser: sd-webui-infinite-image-browsing

https://github.com/zanllp/sd-webui-infinite-image-browsing
CoDeF: Content Deformation Fields for Temporally Consistent Video Processing

A similar take on what Gen-2 does; the advantage is that you can swap out part of the picture content or change the style of a video. https://github.com/qiuyu96/CoDeF
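The idea named in the title can be illustrated with a toy sketch (not the repo's code): a video is factored into one canonical content image plus a per-frame deformation field, so an edit made once in canonical space, such as swapping a patch or restyling, propagates consistently to every frame when the deformations are re-applied. Shapes and the warping helper below are illustrative assumptions.

```python
# Toy illustration of a content deformation field (not the CoDeF implementation):
# one canonical image + per-frame deformation -> temporally consistent edits.
import numpy as np
from scipy.ndimage import map_coordinates

H, W = 64, 64
canonical = np.random.rand(H, W)             # stand-in for the canonical content image
ys, xs = np.mgrid[0:H, 0:W].astype(float)

def render_frame(canon, t):
    """Warp the canonical image with a per-frame deformation (a horizontal drift here)."""
    dx = 0.5 * t                              # toy deformation field for frame t
    return map_coordinates(canon, [ys, xs - dx], order=1, mode="nearest")

video = [render_frame(canonical, t) for t in range(8)]

# Edit once in canonical space (e.g. swap part of the picture or change its style) ...
edited = canonical.copy()
edited[16:32, 16:32] = 1.0                    # stand-in for "swap part of the picture"
# ... and the same edit propagates consistently to every frame:
edited_video = [render_frame(edited, t) for t in range(8)]
```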
An SD 1.5 model (a single large file, 2 GB) that imitates the style of old-school Flash animation.

https://huggingface.co/nerijs/coralchar-diffusion
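A hedged usage sketch with the diffusers library, assuming the repo ships a standard SD 1.5 checkpoint; the actual file layout of nerijs/coralchar-diffusion has not been verified, so the loading call may need to be swapped for from_single_file with the real .safetensors filename.

```python
# Sketch: loading an SD 1.5 style checkpoint with diffusers (repo layout unverified).
import torch
from diffusers import StableDiffusionPipeline

# If the repo is in diffusers format this works directly; if it only ships a single
# 2 GB checkpoint file, use StableDiffusionPipeline.from_single_file(<file>) instead.
pipe = StableDiffusionPipeline.from_pretrained(
    "nerijs/coralchar-diffusion", torch_dtype=torch.float16
).to("cuda")

image = pipe("a character in old-school Flash animation style").images[0]
image.save("flash_style.png")
```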
DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory
DragNUWA is a video generation model that uses text, image, and trajectory as three essential control factors to enable highly controllable video generation along the semantic, spatial, and temporal dimensions. Unlike existing work, it lets users directly manipulate backgrounds or objects within an image, and the model seamlessly translates those actions into camera movements or object motions in the generated video. The demo on the project page shows the same image being turned into videos with different desired camera and object motions.

https://www.microsoft.com/en-us/research/project/dragnuwa/
https://huggingface.co/papers/2308.08089
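A small sketch of what "trajectory as a control factor" can look like in practice: a user drag is rasterized into per-frame displacement maps and packaged alongside the text and image conditions. This only illustrates the input format described above, not DragNUWA's actual API; names and shapes are assumptions.

```python
# Illustration of the three control factors (text, image, trajectory); not DragNUWA's API.
import numpy as np

H, W, T = 64, 64, 8
text = "a boat drifting to the right"           # semantic control
image = np.zeros((H, W, 3), dtype=np.float32)   # spatial control (the conditioning frame)

# Temporal control: a drag from (32, 10) to (32, 50), spread over T frames and
# rasterized into sparse per-frame displacement maps (dy, dx at the dragged pixel).
start, end = np.array([32.0, 10.0]), np.array([32.0, 50.0])
points = [start + (end - start) * t / (T - 1) for t in range(T)]

trajectory = np.zeros((T, H, W, 2), dtype=np.float32)
for t in range(1, T):
    y, x = points[t - 1].astype(int)
    trajectory[t, y, x] = points[t] - points[t - 1]   # motion at that pixel for frame t

conditions = {"text": text, "image": image, "trajectory": trajectory}
# A drag over background pixels would be read as camera motion, over an object
# as object motion, as described in the summary above.
```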
Empowering Vision-Language Models to Follow Interleaved Vision-Language Instructions
Implemented through multimodal training and inference: the various elements inside a picture can be recognized in combination and trained on, so the AI can look at images and explain them, or recompose the picture's content according to a text instruction.

https://github.com/DCDmllm/Cheetah
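A minimal sketch of what an "interleaved vision-language instruction" looks like as data: a single instruction mixing several images with text, which is the input format this project targets. The segment structure and the placeholder token below are illustrative assumptions, not the repo's actual schema.

```python
# Sketch of an interleaved vision-language instruction (illustrative schema only).
from dataclasses import dataclass
from typing import List, Tuple, Union

@dataclass
class ImageSegment:
    path: str            # where the image comes from

@dataclass
class TextSegment:
    text: str

Instruction = List[Union[TextSegment, ImageSegment]]

def to_prompt(instruction: Instruction) -> Tuple[str, List[str]]:
    """Flatten the instruction into a prompt string with <image> placeholders,
    plus the ordered list of image paths the model should attend to."""
    parts, images = [], []
    for seg in instruction:
        if isinstance(seg, TextSegment):
            parts.append(seg.text)
        else:
            images.append(seg.path)
            parts.append("<image>")
    return " ".join(parts), images

# Example: ask the model to explain one picture in terms of another.
instruction = [
    TextSegment("Given the scene in"), ImageSegment("scene.jpg"),
    TextSegment("rearrange it in the style of"), ImageSegment("style.jpg"),
    TextSegment("and describe the result."),
]
prompt, image_list = to_prompt(instruction)
```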
ToolBench

https://github.com/OpenBMB/ToolBench/blob/master/README_ZH.md
IP-Adapter

https://huggingface.co/h94/IP-Adapter/tree/main