MACHINE LEARNING IN PYTHON (PART 4): DIFFUSION MODELS IN PYTORCH

LUKE SHENEMAN

GENERATIVE ARTIFICIAL INTELLIGENCE

Text to Image (Stable Diffusion)

Text to Video

Generative AI: Learn a latent representation of the distribution of our complex training data and then sample from it

Training Data

Deep Learning

Diffusion, etc.

Transformers, etc.

DIFFUSION MODELS

CONDITIONING IMAGE GENERATION

Provide natural language text prompts to guide the reverse diffusion process

Text-to-Image Diffusion Models are both:

Image Generation Models

Language Models

Stable Diffusion Architecture

OVERVIEW

Recap from Parts 1-3

Machine Learning Basics

Neural Networks

Tensors

Convolutional Neural Networks (CNNs)

GPUs and CUDA

PyTorch

Why use PyTorch?

Implementing a Diffusion Model in Python

Train and Test our Diffusion Model

REVIEW OF BASICS

Machine learning is a data-driven method for creating models for prediction, optimization, classification, generation, and more

Python and scikit-learn

MNIST

Artificial Neural Networks (ANNs)

MNIST

NEURAL NETWORK BASICS

Weights and Biases

FULLY-CONNECTED NEURAL NETWORKS

Images are tensors!

FEATURE HIERARCHIES

We need image filters to help us extract features

EXAMPLE: SOBEL FILTER

Sobel kernels =
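
As a rough illustration of the Sobel filter named above, the sketch below writes out the standard 3x3 Sobel kernels and slides them over a toy image with NumPy. The helper names `filter2d` and `sobel_edges` and the 6x6 test image are made up for this example; the loop-based filtering is written for readability, not speed.

```python
import numpy as np

# Standard 3x3 Sobel kernels: SOBEL_X responds to horizontal intensity
# changes (vertical edges), SOBEL_Y to vertical changes (horizontal edges).
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T


def filter2d(image, kernel):
    """Slide a kernel over a 2-D image with no padding (cross-correlation,
    which is what CNN 'convolution' layers compute)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def sobel_edges(image):
    """Gradient magnitude from the two Sobel responses."""
    gx = filter2d(image, SOBEL_X)
    gy = filter2d(image, SOBEL_Y)
    return np.hypot(gx, gy)


# Toy image (hypothetical): left half dark, right half bright.
image = np.zeros((6, 6))
image[:, 3:] = 1.0
edges = sobel_edges(image)
print(edges[0])  # strong response only where the window straddles the edge
```

A convolutional layer learns kernels like these from data instead of having them written by hand.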

CONVOLUTIONAL NEURAL NETWORK

GPU vs. CPU

“Moore’s Law for CPUs is Dead”

WHY GPUS EXACTLY?

CNNs are all about matrix and vector operations (multiplication, addition)

GPUs can perform many multiplication and addition steps in parallel on each clock cycle.

Frameworks make GPUs Easy
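
To make the "frameworks make GPUs easy" point concrete, here is a minimal PyTorch sketch that runs the same matrix multiplication on a GPU when one is present and on the CPU otherwise. The matrix size is arbitrary and chosen only for illustration.

```python
import torch

# Pick the GPU if CUDA is available, otherwise fall back to the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create two matrices directly on the chosen device.
a = torch.randn(512, 512, device=device)
b = torch.randn(512, 512, device=device)

# The same line of code runs on either device; on a GPU the many
# multiply-add operations are executed in parallel.
c = a @ b
print(c.shape, c.device)
```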

DIFFUSION MODELS

FORWARD DIFFUSION

Define how many timesteps will be used (hundreds or more is common)

Establish a noise schedule which describes the rate at which Gaussian noise is added

Linear

Cosine
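
The two schedule types listed above could be sketched as follows. The slides do not give formulas, so the specific choices here are assumptions: the linear schedule uses per-step noise variances (betas) rising from 1e-4 to 0.02, and the cosine schedule uses the common squared-cosine form with a small offset `s = 0.008`. Both are expressed as the cumulative fraction of signal remaining after each step (`alpha_bar`).

```python
import math


def linear_betas(timesteps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances that grow linearly (assumed endpoints)."""
    if timesteps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (timesteps - 1)
    return [beta_start + i * step for i in range(timesteps)]


def alpha_bar_from_betas(betas):
    """Cumulative product of (1 - beta): fraction of signal left at each step."""
    alpha_bar, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bar.append(running)
    return alpha_bar


def cosine_alpha_bar(timesteps, s=0.008):
    """Squared-cosine schedule defined directly on the remaining signal."""
    def f(t):
        return math.cos((t / timesteps + s) / (1 + s) * math.pi / 2) ** 2
    return [f(t) / f(0) for t in range(1, timesteps + 1)]


linear = alpha_bar_from_betas(linear_betas(100))
cosine = cosine_alpha_bar(100)
# Compare how much signal is left halfway through and at the final step.
print(linear[49], linear[-1])
print(cosine[49], cosine[-1])
```

With these assumed settings and only 100 steps, the linear schedule still leaves a noticeable fraction of the signal at the final step, while the cosine schedule reaches essentially pure noise.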

T=0 T=1 T=2 T=3 … T=n

I used 100 timesteps. Larger models like Stable Diffusion use thousands of smaller steps. I used a cosine noise schedule.
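
A forward-diffusion step with 100 timesteps and a cosine schedule might look like the NumPy sketch below. It assumes the standard closed-form mix of image and Gaussian noise, which jumps straight to timestep t instead of adding noise step by step. The function names and the random placeholder image are hypothetical and not taken from the author's code.

```python
import numpy as np


def cosine_alpha_bar(timesteps, s=0.008):
    """Remaining-signal fraction for t = 0..timesteps (index 0 is the clean image)."""
    t = np.arange(timesteps + 1)
    f = np.cos((t / timesteps + s) / (1 + s) * np.pi / 2) ** 2
    return f / f[0]


def add_gaussian_noise(x0, t, alpha_bar, rng):
    """Noisy image at timestep t: scaled clean image plus scaled Gaussian noise."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise


TIMESTEPS = 100
alpha_bar = cosine_alpha_bar(TIMESTEPS)
rng = np.random.default_rng(0)

# Placeholder "image" with values in [-1, 1]; a real run would load a photo.
x0 = rng.uniform(-1.0, 1.0, size=(64, 64, 3))
x_early = add_gaussian_noise(x0, 1, alpha_bar, rng)         # almost clean
x_late = add_gaussian_noise(x0, TIMESTEPS, alpha_bar, rng)  # almost pure noise
```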

TIMESTEP ENCODING

[Figure: RGB Image + Integer timestep = 4-Channel RGB+Timestep; figure labels: 30, 30]

I encode the timestep as another band in the image in pixel-space
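
One way to read "timestep as another band" is to fill a constant plane with the timestep value and stack it onto the RGB channels. The sketch below does that with NumPy. Scaling the timestep to the range [0, 1] is an assumption made here to keep the extra band on the same scale as the pixels; the slide does not say whether the raw integer is used instead.

```python
import numpy as np


def encode_timestep(image_rgb, t, total_timesteps):
    """Append a constant timestep plane to an (H, W, 3) image, giving (H, W, 4)."""
    h, w, _ = image_rgb.shape
    # Every pixel of the extra band carries the same (normalized) timestep.
    t_plane = np.full((h, w, 1), t / total_timesteps, dtype=image_rgb.dtype)
    return np.concatenate([image_rgb, t_plane], axis=-1)


image = np.zeros((64, 64, 3), dtype=np.float32)  # placeholder image
encoded = encode_timestep(image, t=30, total_timesteps=100)
print(encoded.shape)  # (64, 64, 4)
```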

U-Net Architecture

[Figure: U-Net Denoiser, 4 input channels (RGB+T) -> 3 output channels (RGB)]
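
For orientation, a deliberately tiny U-Net with the same interface (4 channels in, 3 channels out) could be written in PyTorch as below. This is a sketch with a single downsampling stage and one skip connection, not the author's architecture. The class name `TinyUNet`, the channel widths, and the layer choices are all assumptions.

```python
import torch
import torch.nn as nn


def conv_block(in_ch, out_ch):
    """Two 3x3 convolutions with ReLU, keeping the spatial size unchanged."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(),
        nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(),
    )


class TinyUNet(nn.Module):
    def __init__(self, in_channels=4, out_channels=3, base=16):
        super().__init__()
        self.enc1 = conv_block(in_channels, base)
        self.enc2 = conv_block(base, base * 2)
        self.pool = nn.MaxPool2d(2)
        self.up = nn.ConvTranspose2d(base * 2, base, kernel_size=2, stride=2)
        self.dec1 = conv_block(base * 2, base)
        self.head = nn.Conv2d(base, out_channels, kernel_size=1)

    def forward(self, x):
        e1 = self.enc1(x)                            # (N, base, H, W)
        e2 = self.enc2(self.pool(e1))                # (N, 2*base, H/2, W/2)
        d1 = self.up(e2)                             # back to (N, base, H, W)
        d1 = self.dec1(torch.cat([d1, e1], dim=1))   # skip connection from e1
        return self.head(d1)                         # (N, 3, H, W)


model = TinyUNet()
out = model(torch.randn(2, 4, 64, 64))  # batch of two RGB+T inputs
print(out.shape)
```

The skip connection is what gives the network its "U" shape: fine detail from the encoder is passed directly to the decoder at the same resolution.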

Training our Neural Network

Possible loss functions for our U-Net

loss1 = MSE(pred_t, original)

loss2 = MSE(pred_t, noisy_(t-1))

loss3 = pred_t - noisy_t
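
The first two candidate losses can be sketched with NumPy as below. The arrays are random stand-ins for a clean image, the noisy image at step t-1, and a U-Net prediction, so the numbers only show how each loss is computed. The third candidate is left out because the slide gives it as a raw difference without saying how it is reduced to a single number.

```python
import numpy as np


def mse(a, b):
    """Mean squared error between two arrays of the same shape."""
    return float(np.mean((a - b) ** 2))


rng = np.random.default_rng(0)
original = rng.uniform(-1.0, 1.0, size=(64, 64, 3))                  # clean image
noisy_prev = original + 0.1 * rng.standard_normal(original.shape)    # stand-in for step t-1
pred = original + 0.05 * rng.standard_normal(original.shape)         # stand-in for U-Net output

loss1 = mse(pred, original)    # target: the fully clean image
loss2 = mse(pred, noisy_prev)  # target: the slightly less noisy image
print(loss1, loss2)
```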

Other Hyperparameters:

Epochs = 100

Timesteps = 100

Batch Size = 1250

Optimizer = Adam

Learning Rate = 0.001

Core Training Loop

schedule = cosine_schedule(TIMESTEPS)
for each Epoch:
    for each Batch b:
        for each Timestep t:
            img = add_gaussian_noise(img, schedule(t))
            predicted = UNet(img)
            loss = loss_function(img, predicted)
            backward_propagation and optimization
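
A runnable version of this loop, under several stated assumptions, might look like the PyTorch sketch below. The hyperparameters are shrunk so it finishes in seconds, the dataset is random placeholder data, and a two-layer convolutional network stands in for the U-Net. Noise is applied to the clean batch in closed form at each timestep instead of accumulating on `img`, and the loss compares the prediction with the clean image. None of these choices are confirmed by the slides.

```python
import math
import torch
import torch.nn as nn

# Shrunk for a quick demo; the slides use 100 epochs, 100 timesteps, batch 1250.
TIMESTEPS, EPOCHS, BATCH_SIZE, LR = 10, 2, 8, 1e-3


def cosine_alpha_bar(timesteps, s=0.008):
    """Remaining-signal fraction for t = 0..timesteps (assumed cosine form)."""
    f = [math.cos((t / timesteps + s) / (1 + s) * math.pi / 2) ** 2
         for t in range(timesteps + 1)]
    return torch.tensor([v / f[0] for v in f])


torch.manual_seed(0)
# Stand-in denoiser (4 channels in, 3 out); a real run would use a U-Net.
model = nn.Sequential(
    nn.Conv2d(4, 16, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 3, kernel_size=3, padding=1),
)
optimizer = torch.optim.Adam(model.parameters(), lr=LR)
loss_function = nn.MSELoss()
alpha_bar = cosine_alpha_bar(TIMESTEPS)

dataset = torch.rand(16, 3, 16, 16) * 2 - 1  # random placeholder "images"
losses = []
for epoch in range(EPOCHS):
    for start in range(0, len(dataset), BATCH_SIZE):
        clean = dataset[start:start + BATCH_SIZE]
        for t in range(1, TIMESTEPS + 1):
            # Forward diffusion: mix the clean batch with Gaussian noise.
            noise = torch.randn_like(clean)
            noisy = alpha_bar[t].sqrt() * clean + (1 - alpha_bar[t]).sqrt() * noise
            # Encode the timestep as a fourth, constant channel.
            t_plane = torch.full_like(clean[:, :1], t / TIMESTEPS)
            predicted = model(torch.cat([noisy, t_plane], dim=1))
            loss = loss_function(predicted, clean)
            # Backward propagation and optimization.
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            losses.append(loss.item())

print(f"first loss {losses[0]:.4f}, last loss {losses[-1]:.4f}")
```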

CELEBFACES ATTRIBUTES (CELEBA) DATASET

202,599 face images of various celebrities

10,177 unique identities, but names of identities are not given

40 binary attribute annotations per image

5 landmark locations

Images “in the wild” or Cropped/Aligned

SOME PRELIMINARY OUTPUT

Oh no!

Use separate AI model for upsampling

[Figure: 64 x 64 image -> SRResNet -> 512 x 512 image]

/twtygqyy/pytorch-SRResNet

My Model

Might not be terrific, but…

It was trained on only 5000 images for a few hours on a single RTX 4090 GPU

Stable Diffusion was trained on 600 million captioned images

Training took 256 NVIDIA A100 GPUs on Amazon Web Services for a total of 150,000 GPU-hours, at a cost of $600,000

Stable Diffusion

Conditioning reverse Diffusion on Text prompts

PRE-PROCESSING CELEBA DATASET

Read first 5000 annotations into a pandas DataFrame (easy!)

For each image, get the heading names for positive attributes

Convert heading names into a text prompt:

e.g. “Photo of person <attribute_x>, <attribute_y>, <attribute_z>, …”

e.g. “Photo of person bushy eyebrows, beard, mouth slightly open, wearing hat.”

Crop the largest square from the image, then resize to a 64x64x3 NumPy array

Use the OpenAI CLIP model to find the image embeddings and text embeddings for every image/prompt pair.

Create a 5000-element Python list of 4-tuples:

(filename, 64x64xRGB image_array, image_embedding, prompt_embedding)

Pickle the list to a file we can quickly load into memory when we train our model!
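
Two of the steps above, building the prompt from positive attributes and cropping the largest square, can be sketched as follows. The sketch assumes attribute flags are stored as 1 (present) and -1 (absent) and that the crop is centered; the slides do not state where the square is taken from. The helper names and the example row are hypothetical, and resizing, CLIP embedding, and pickling are left out.

```python
import numpy as np


def attributes_to_prompt(row):
    """Build a prompt from a mapping of attribute heading -> 1 (present) or -1 (absent)."""
    positive = [name.replace("_", " ").lower()
                for name, flag in row.items() if flag == 1]
    return "Photo of person " + ", ".join(positive) + "."


def crop_largest_square(image):
    """Center-crop the largest square from an (H, W, C) array."""
    h, w = image.shape[:2]
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    return image[top:top + side, left:left + side]


# Hypothetical annotation row using attribute-style heading names.
row = {"Bushy_Eyebrows": 1, "Eyeglasses": -1,
       "Mouth_Slightly_Open": 1, "Wearing_Hat": 1}
prompt = attributes_to_prompt(row)

# Placeholder portrait-shaped image (taller than wide).
square = crop_largest_square(np.zeros((218, 178, 3), dtype=np.uint8))
print(prompt)
print(square.shape)
```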

OPENAI CLIP MODEL (CONTRASTIVE LANGUAGE-IMAGE PRE-TRAINING)

Open source/weights multi-modal AI model trained on (image, caption) pairs

Shared embedding space!

Use transformer model (GPT-2) to create token embeddings from text

Use vision transformer (ViT) to create token embeddings from images
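
Because text and images land in the same embedding space, comparing them reduces to a cosine similarity. The NumPy sketch below illustrates this with random placeholder vectors. These are not real CLIP outputs, and the "image" embedding is built to sit near the first caption so the example has a known answer. The function names are made up for this sketch.

```python
import numpy as np


def normalize(v):
    """Scale vectors to unit length along the last axis."""
    return v / np.linalg.norm(v, axis=-1, keepdims=True)


def zero_shot_classify(image_embedding, text_embeddings, labels):
    """Pick the caption whose embedding has the highest cosine similarity."""
    sims = normalize(text_embeddings) @ normalize(image_embedding)
    return labels[int(np.argmax(sims))], sims


rng = np.random.default_rng(0)
labels = ["a photo of a dog", "a photo of a cat", "a photo of a car"]

# Placeholder embeddings; a real pipeline would get these from the encoders.
text_embeddings = rng.standard_normal((3, 512))
image_embedding = text_embeddings[0] + 0.1 * rng.standard_normal(512)

best, sims = zero_shot_classify(image_embedding, text_embeddings, labels)
print(best)
```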

CLIP Examples /research/clip

USING CLIP IS TRIVIAL

/openai/CLIP

Zero-shot classifications!

Conditioning Generative AI (DALL-E)

Generating captions for images or video

Image similarity search

Content Moderation

Object Tracking

CLIP uses vectors with 512 dimensions

GPT-3 (Davinci) uses 12,288 dimensions

Vector embeddings capture the deeper semantic context of a word or text chunk… or image… or anything.

The semantics of an object are defined by its multi-dimensional and multi-scale co-occurrence and relationships with other objects in the training data

Semantic vector embeddings are learned from vast amounts of data.

400,000,000 (image, text) pairs

CLIP was trained on 256 large GPUs for 2 weeks.

Size of Embedding Vector

One way to train a multi-modal embedding layer

“A cute Welsh Corgi dog.”
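
One common way to train such a shared embedding space is a symmetric contrastive loss: matching (image, text) pairs in a batch are pulled together and all mismatched pairs are pushed apart. The NumPy sketch below shows only the loss computation, on random placeholder embeddings. The temperature of 0.07 and the function names are assumptions made for illustration, and the slide's own figure may show a different variant.

```python
import numpy as np


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric cross-entropy over the image-text similarity matrix.
    Row i of image_emb is assumed to match row i of text_emb."""
    image_emb = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = image_emb @ text_emb.T / temperature      # (N, N) similarities
    idx = np.arange(len(logits))
    loss_i2t = -log_softmax(logits, axis=1)[idx, idx].mean()  # image -> text
    loss_t2i = -log_softmax(logits, axis=0)[idx, idx].mean()  # text -> image
    return float((loss_i2t + loss_t2i) / 2)


rng = np.random.default_rng(0)
text = rng.standard_normal((8, 512))  # placeholder caption embeddings

# Well-aligned pairs give a low loss; mismatched pairs give a high one.
aligned = contrastive_loss(text + 0.01 * rng.standard_normal((8, 512)), text)
shuffled = contrastive_loss(np.roll(text, 1, axis=0), text)
print(aligned, shuffled)
```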

Learning your semantic embeddings f
