MACHINE LEARNING IN PYTHON (PART 4):
DIFFUSION MODELS IN PYTORCH
LUKE SHENEMAN

GENERATIVE ARTIFICIAL INTELLIGENCE
- Text to Image (Stable Diffusion)
- Text to Video
Generative AI: learn a latent representation of the distribution of our complex training data, and then sample from it.

Training Data → Deep Learning (Diffusion, Transformers, etc.)
DIFFUSION MODELS

CONDITIONING IMAGE GENERATION
Provide natural-language text prompts to guide the reverse diffusion process.
Text-to-Image diffusion models are both:
- Image generation models
- Language models

Stable Diffusion Architecture
OVERVIEW
Recap from Parts 1-3:
- Machine Learning Basics
- Neural Networks
- Tensors
- Convolutional Neural Networks (CNNs)
- GPUs and CUDA
- PyTorch

Why use PyTorch?
Implementing a Diffusion Model in Python
Train and Test our Diffusion Model
REVIEW OF BASICS
Machine learning is a data-driven method for creating models for prediction, optimization, classification, generation, and more.
- Python and scikit-learn
- MNIST
- Artificial Neural Networks (ANNs)
NEURAL NETWORK BASICS
Weights and Biases

FULLY-CONNECTED NEURAL NETWORKS
Images are tensors!

FEATURE HIERARCHIES
We need image filters to help us extract features.
EXAMPLE: SOBEL FILTER
Sobel kernels:
Gx = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
Gy = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
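The Sobel kernels are small fixed matrices, and convolution is just a sliding dot product. A minimal NumPy sketch (the `convolve2d` helper here is a naive stand-in for library routines such as `scipy.signal.correlate2d`):

```python
import numpy as np

# Standard 3x3 Sobel kernels for horizontal (Gx) and vertical (Gy) edges.
Gx = np.array([[-1, 0, 1],
               [-2, 0, 2],
               [-1, 0, 1]])
Gy = np.array([[-1, -2, -1],
               [ 0,  0,  0],
               [ 1,  2,  1]])

def convolve2d(image, kernel):
    """Naive 'valid' sliding-window product (cross-correlation, as in CNNs)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A toy image with a sharp vertical edge: Gx responds strongly, Gy not at all.
img = np.zeros((5, 5))
img[:, 3:] = 1.0
edges_x = convolve2d(img, Gx)
edges_y = convolve2d(img, Gy)
```

On this toy input, `edges_x` lights up along the vertical edge while `edges_y` is all zeros, which is exactly the feature-extraction behavior a CNN learns for itself.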
CONVOLUTIONAL NEURAL NETWORK

GPU vs. CPU
"Moore's Law for CPUs is Dead"

WHY GPUS, EXACTLY?
CNNs are all about matrix and vector operations (multiplication, addition).
GPUs can perform many parallel multiplication and addition steps per clock cycle.
Frameworks make GPUs easy.
DIFFUSION MODELS

FORWARD DIFFUSION
- Define how many timesteps will be used (it is common to use hundreds or more).
- Establish a noise schedule, which describes the rate at which Gaussian noise is added:
  - Linear
  - Cosine

T=0, T=1, T=2, T=3, …, T=n

I used 100 timesteps. Larger models like Stable Diffusion use thousands of smaller steps. I used a cosine noise schedule.
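The forward process can be sketched directly. This is a minimal NumPy version assuming the cosine schedule of Nichol & Dhariwal (2021) and the standard closed-form noising step; the exact constants in my model may differ:

```python
import numpy as np

def cosine_schedule(timesteps, s=0.008):
    """Cosine noise schedule: returns alpha_bar (the cumulative signal
    fraction) for each timestep, decaying smoothly from 1 toward 0."""
    t = np.linspace(0, timesteps, timesteps + 1)
    f = np.cos((t / timesteps + s) / (1 + s) * np.pi / 2) ** 2
    return f / f[0]

def add_gaussian_noise(x0, alpha_bar_t, rng=np.random.default_rng(0)):
    """Forward diffusion in closed form:
    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1 - alpha_bar_t) * noise

TIMESTEPS = 100
schedule = cosine_schedule(TIMESTEPS)
img = np.ones((64, 64, 3))                     # stand-in for a normalized RGB image
noisy = add_gaussian_noise(img, schedule[30])  # the image at timestep t=30
```

The closed form means any timestep t can be sampled in one shot during training, without stepping through t noising operations.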
TIMESTEP ENCODING
[Diagram: RGB Image + Integer Timestep (e.g. 30) = 4-Channel RGB + Timestep]
I encode the timestep as another band in the image, in pixel space.
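A minimal sketch of this pixel-space encoding. Normalizing the integer timestep to [0, 1] before tiling it into a fourth channel is my assumption; the slide only says the timestep becomes another band:

```python
import numpy as np

def encode_timestep(rgb, t, timesteps=100):
    """Append the (normalized) integer timestep as a constant fourth
    channel, so the denoiser sees RGB + T in pixel space."""
    h, w, _ = rgb.shape
    t_band = np.full((h, w, 1), t / timesteps, dtype=rgb.dtype)
    return np.concatenate([rgb, t_band], axis=-1)

rgb = np.zeros((64, 64, 3))        # stand-in for a 64x64 RGB image
x = encode_timestep(rgb, t=30)     # 64x64x4: RGB plus timestep band
```

Larger diffusion models typically use sinusoidal timestep embeddings injected inside the network instead; the pixel-space band is the simpler choice made here.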
U-Net Architecture
[Diagram: U-Net Denoiser — 4 channels in (RGB + T), 3 channels out (RGB)]
Training our Neural Network
Possible loss functions for our U-Net:
- loss1 = MSE(pred_t, original)
- loss2 = MSE(pred_t, noisy_{t-1})
- loss3 = pred_t - noisy_t
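These candidates look like the following in PyTorch. Note that loss3 as written on the slide is a per-pixel residual, not a scalar, so this sketch reduces it with a mean absolute value (my assumption):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
pred_t    = torch.randn(2, 3, 64, 64)  # U-Net output at timestep t
original  = torch.randn(2, 3, 64, 64)  # clean image x_0
noisy_tm1 = torch.randn(2, 3, 64, 64)  # slightly-less-noisy image x_{t-1}
noisy_t   = torch.randn(2, 3, 64, 64)  # network input x_t

loss1 = F.mse_loss(pred_t, original)    # predict the clean image directly
loss2 = F.mse_loss(pred_t, noisy_tm1)   # predict one denoising step back
loss3 = (pred_t - noisy_t).abs().mean() # raw residual, reduced to a scalar
```

(Stable Diffusion-style models instead train the network to predict the added noise itself, which is yet another valid target.)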
Other hyperparameters:
- Epochs = 100
- Timesteps = 100
- Batch Size = 1250
- Optimizer = Adam
- Learning Rate = 0.001
CoreTrainingLoop
schedule=cosine_schedule(TIMESTEPS)foreachEpoch:
foreachBatchb:
foreachTimestept:
img=add_gaussian_noise(img,schedule(t))predicted=UNet(img)
loss=loss_function(img,predicted)backward_propagationandoptimization
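A runnable miniature of this loop, with a two-layer conv net standing in for the real U-Net and tiny tensors so it executes quickly; the structure (noise, concatenate the timestep channel, predict, MSE against the clean image, Adam step) follows the pseudocode above:

```python
import math
import torch
import torch.nn as nn

torch.manual_seed(0)

# Tiny stand-in for the U-Net denoiser: 4 channels in (RGB + T), 3 out (RGB).
denoiser = nn.Sequential(
    nn.Conv2d(4, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 3, 3, padding=1),
)
optimizer = torch.optim.Adam(denoiser.parameters(), lr=0.001)

TIMESTEPS = 4   # the real model used 100; kept tiny here
EPOCHS = 1      # the real model used 100

def cosine_schedule(T, s=0.008):
    t = torch.linspace(0, T, T + 1)
    f = torch.cos((t / T + s) / (1 + s) * math.pi / 2) ** 2
    return f / f[0]

schedule = cosine_schedule(TIMESTEPS)
batch = torch.rand(2, 3, 8, 8)  # stand-in for a batch of RGB images

for epoch in range(EPOCHS):
    for t in range(1, TIMESTEPS + 1):
        a = schedule[t]
        # Forward diffusion: noise the clean batch to timestep t.
        noisy = torch.sqrt(a) * batch + torch.sqrt(1 - a) * torch.randn_like(batch)
        # Encode t as a constant fourth channel, as on the earlier slide.
        t_band = torch.full_like(noisy[:, :1], t / TIMESTEPS)
        pred = denoiser(torch.cat([noisy, t_band], dim=1))
        loss = nn.functional.mse_loss(pred, batch)  # loss1: predict clean image
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```

The real loop additionally iterates over batches loaded from the pickled dataset; that plumbing is omitted here.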
CELEBFACES ATTRIBUTES (CELEBA) DATASET
- 202,599 face images of various celebrities
- 10,177 unique identities, but names of identities are not given
- 40 binary attribute annotations per image
- 5 landmark locations
- Images "in the wild" or cropped/aligned
SOME PRELIMINARY OUTPUT
Oh no!

Use a separate AI model for upsampling:
[Diagram: SRResNet upscales the 64x64 output to 512x512 — /twtygqyy/pytorch-SRResNet]
My Model
Might not be terrific, but…
It was trained on only 5,000 images for a few hours on a single RTX 4090 GPU.

Stable Diffusion was trained on 600 million captioned images.
It took 256 NVIDIA A100 GPUs on Amazon Web Services a total of 150,000 GPU-hours, at a cost of $600,000.
Stable Diffusion
Conditioning reverse diffusion on text prompts
PRE-PROCESSING THE CELEBA DATASET
- Read the first 5,000 annotations into a pandas DataFrame (easy!)
- For each image, get the heading names for the positive attributes.
- Convert the heading names into a text prompt:
  - e.g. "Photo of person <attribute_x>, <attribute_y>, <attribute_z>, …"
  - e.g. "Photo of person bushy eyebrows, beard, mouth slightly open, wearing hat."
- Crop the largest square from the image, then resize to a 64x64x3 numpy array.
- Use the OpenAI CLIP model to find the image embeddings and text embeddings for every image/prompt pair.
- Create a 5,000-element Python list of 4-tuples:
  (filename, 64x64xRGB image_array, image_embedding, prompt_embedding)
- Pickle the list to a file we can quickly load into memory when we train our model!
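The cropping, resizing, and prompt-building steps might look like the sketch below. The helper names are mine, the nearest-neighbor resize stands in for PIL/OpenCV, and the CLIP-embedding and pickling steps are omitted:

```python
import numpy as np

def largest_square_crop(img):
    """Crop the largest centered square from an H x W x 3 array."""
    h, w, _ = img.shape
    s = min(h, w)
    top, left = (h - s) // 2, (w - s) // 2
    return img[top:top + s, left:left + s]

def resize_nearest(img, size=64):
    """Nearest-neighbor resize of a square image (stand-in for PIL/OpenCV)."""
    s = img.shape[0]
    idx = np.arange(size) * s // size
    return img[idx][:, idx]

def attributes_to_prompt(attr_row):
    """Turn the positive CelebA attribute headings into a text prompt."""
    positive = [name.replace("_", " ").lower()
                for name, v in attr_row.items() if v == 1]
    return "Photo of person " + ", ".join(positive) + "."

img = np.zeros((218, 178, 3), dtype=np.uint8)   # CelebA aligned image size
small = resize_nearest(largest_square_crop(img))
prompt = attributes_to_prompt({"Bushy_Eyebrows": 1, "Wearing_Hat": 1, "Male": -1})
```

CelebA annotations mark absent attributes with -1, which is why the helper keeps only values equal to 1.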
OPENAI CLIP MODEL (CONTRASTIVE LANGUAGE–IMAGE PRE-TRAINING)
- Open-source (open-weights) multi-modal AI model trained on (image, caption) pairs
- Shared embedding space!
- Uses a transformer model (GPT-2) to create token embeddings from text
- Uses a vision transformer (ViT) to create token embeddings from images
CLIP Examples: /research/clip
USING CLIP IS TRIVIAL
/openai/CLIP
- Zero-shot classifications!
- Conditioning generative AI (DALL-E)
- Generating captions for images or video
- Image similarity search
- Content moderation
- Object tracking
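Following the openai/CLIP README, a zero-shot classifier takes only a few lines. This sketch wraps it in a function (the function name and prompt template are mine); it requires the `clip` package and downloads the ViT-B/32 weights on first use:

```python
import torch

def clip_zero_shot(image, labels, device="cpu"):
    """Zero-shot classification with OpenAI CLIP (install from
    github.com/openai/CLIP). `image` is a PIL image, `labels` plain strings."""
    import clip  # imported lazily so the sketch loads without the package
    model, preprocess = clip.load("ViT-B/32", device=device)
    image_input = preprocess(image).unsqueeze(0).to(device)
    text_input = clip.tokenize([f"a photo of a {l}" for l in labels]).to(device)
    with torch.no_grad():
        # Similarity of the one image against every candidate text label.
        logits_per_image, _ = model(image_input, text_input)
        probs = logits_per_image.softmax(dim=-1)
    return labels[int(probs.argmax())]
```

Because image and text land in the same embedding space, "classification" is just picking the caption whose embedding is most similar to the image's.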
Size of Embedding Vector
- CLIP uses vectors with 512 dimensions; GPT-3 (Davinci) uses 12,288 dimensions.
- Vector embeddings capture the deeper semantic context of a word or text chunk… or image… or anything.
- The semantics of an object are defined by its multi-dimensional and multi-scale co-occurrence and relationships with other objects in the training data.
- Semantic vector embeddings are learned from vast amounts of data.
- 400,000,000 (image, text) pairs
- CLIP was trained on 256 large GPUs for 2 weeks.

One way to train a multi-modal embedding layer:
"A cute Welsh Corgi dog."
Learning your semantic embeddings