
线性代数基础 (Linear Algebra Basics)
For machine learning. Very loose, but intuitive (hopefully).

Linear Space / Vector Space
A (linear) vector space is a set of elements, and the elements are called vectors (a function can also be a vector!). But unlike a plain set, two extra operations are defined over it: 1) addition, 2) scalar multiplication. These two must satisfy 7 rules, e.g., commutativity, distributivity, etc.; they are very trivial but general enough, and should not be considered strong constraints.

Examples
1. n-dimensional real coordinate space (Euclidean space): null vector, x+y, ax, etc.
2. The space of real sequences: each element is an infinite sequence of real numbers. A bounded real sequence: there exists a constant M s.t. the absolute value of every number in the sequence is smaller than M.

Bounded sequences form a vector space as well.
The space of functions is another example: f = g means that f(x) = g(x) at every position x, and f + g means a new function h.

Subspace: a subset of V whose elements are closed under linear combination.
Linear combination: a mechanism for generating new samples as linear combinations of existing samples. In particular, if the linear combinations of U = [u1, u2, ..., uk] fill the entire subspace S, we say that U "spans" S, and S is called the column space of the matrix U. Matrix form of a linear combination: Xw.
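The matrix form Xw can be sketched in a few lines (toy samples assumed): the columns of X are the existing samples and w holds the combination coefficients.

```python
import numpy as np

X = np.array([[1.0, 0.0],
              [0.0, 2.0],
              [1.0, 1.0]])        # two samples stored as columns
w = np.array([2.0, 3.0])          # combination coefficients

new_sample = X @ w                # the linear combination 2*x1 + 3*x2
print(new_sample)                 # [2. 6. 5.]
```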

Convex set: a more "practical" kind of subspace: not too big, but with no hole in it.

Linear independence and dimension
How do we mathematically define "a group of people with completely different characteristics"? Each one cannot be expressed by the others; if you force such an expression, the combination coefficients must all be zero. This is linear independence. Why do we care? To remove redundancy: express the largest amount of information with the smallest amount of data.
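The "only the all-zero combination gives zero" test is equivalent to a full-column-rank check, sketched here on assumed toy vectors:

```python
import numpy as np

# Columns are the candidate vectors.
A = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 0.0]])
print(np.linalg.matrix_rank(A))   # 2: third column = col1 + col2, so dependent

B = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])
print(np.linalg.matrix_rank(B))   # 3: full rank, columns are independent
```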

Basis and dimension
Each vector space has a fixed number of key members; they are different from each other and can generate all the other elements of the space. You can change the basis, but you cannot change the dimension of a vector space.

Normed linear space
Simply a vector space equipped with an operator that calculates the energy/size of a vector. Why do we care about this? To judge whether a mathematical object is under control, compute its energy!

Transformation and continuity
A transformation y = T(x) is simply a mapping from X to Y. A transformation from X to a space of real scalars is called a functional of X, denoted f or g, meaning f(x) over X. E.g., f(x) = ||x||, or f(x) = <w, x>. For the latter, over the unit ball ||x|| <= 1 we have ||f|| = max_x |f(x)| = max_x |<w, x>| <= ||w|| ||x|| = ||w||, with equality at x = w/||w||, so ||f|| = ||w||.
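The claim ||f|| = ||w|| can be checked numerically (a sketch with assumed toy data; the known maximizer w/||w|| is included explicitly among the candidate unit vectors):

```python
import numpy as np

rng = np.random.default_rng(0)
w = np.array([3.0, -4.0])                     # ||w|| = 5

# Candidate unit vectors: random directions plus the maximizer w/||w||.
xs = rng.normal(size=(1000, 2))
xs /= np.linalg.norm(xs, axis=1, keepdims=True)
xs = np.vstack([xs, w / np.linalg.norm(w)])

# Operator norm of f(x) = <w, x> over the unit ball: max |<w, x>|.
f_norm = np.abs(xs @ w).max()
print(f_norm)                                 # ~5.0, i.e. ||w||
```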

Smoothness: if x changes slightly, then y changes slightly as well. How do we formulate this?

Banach space
Suppose you follow a sequence (e.g., in optimization) and find that it converges to a point outside your space. A Banach space is a space that guarantees this will never happen (the property is called completeness). C[0,1] is a Banach space.

Hilbert space
A Hilbert space is a Banach space with an inner product defined on it. The inner product can be used to define how the energy of an element is computed (a norm): x'x is a norm, i.e., ||x||^2 = x'x.
Orthogonal set: the inner product of any two elements of H is 0. Orthonormal set: orthogonal, and each element has energy 1. The matrices they form are called orthogonal and orthonormal matrices, respectively; a linear transformation by an orthonormal matrix does not change an object's length, only its direction.

Gram-Schmidt procedure
How do we find a set of basis vectors for a group of points v1, v2, ..., vk in a Hilbert space?
Input: K linearly independent points. Output: K orthonormal points.
To think about: if V is the original matrix and U is the orthonormalized matrix, does U change the column space spanned by V?

Project v onto u
Projection: find the point in a set U that is closest to a point v outside the set; here, find the point on the line through u that is closest to v. Rephrase: fit the data v with the restricted model f(a) = au. Differentiating with respect to a and setting the derivative to zero gives a = <u, v>/<u, u>, so that v = au + (v - au) = projection + residual.
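The projection step, sketched in NumPy on assumed toy vectors: minimizing ||v - a*u||^2 over the scalar a gives a = <u, v>/<u, u>, and the residual is orthogonal to u.

```python
import numpy as np

u = np.array([1.0, 0.0, 1.0])
v = np.array([2.0, 1.0, 0.0])

a = u @ v / (u @ u)            # derivative of ||v - a*u||^2 set to zero
projection = a * u
residual = v - projection      # v = projection + residual

print(u @ residual)            # 0.0: the residual is orthogonal to u
```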

Gram-Schmidt procedure: residual learning
Residual learning: fit iteratively, each round fitting only a small part of the detail of the target function, and the next round fitting what is left over (the residual).
Iterate t = 1...T:
  compute the residual between the current model F_t and the target;
  learn a new model f_{t+1} to fit the residual;
  merge the new model f_{t+1} into F_t;
  good enough? If yes, stop now.
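The residual-learning loop above can be sketched minimally (assumed toy setup: each round's "weak model" simply captures a fraction of the current residual):

```python
import numpy as np

y = np.array([3.0, -1.0, 2.0, 0.5])   # target
F = np.zeros_like(y)                  # current model F_t
lr = 0.5                              # each weak model fits half the residual

for t in range(20):
    r = y - F                         # residual between target and model
    f_new = lr * r                    # weak model fitted to the residual
    F = F + f_new                     # merge it into the ensemble
    if np.abs(r).max() < 1e-3:        # good enough? stop now
        break

print(np.abs(y - F).max())            # residual shrinks geometrically
```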

Gram-Schmidt procedure
Input: K linearly independent points v1...vK. Output: K orthonormal points u1...uK.
Algorithm:
1. Start from an arbitrary direction u1 = v1; set F1 = {u1}.
Repeat for k = 2...K:
2. Fit an arbitrary point v_k with the current model F_{k-1}, compute the residual r_k, and normalize it.
3. Update the model: F_k = F_{k-1} ∪ {r_k}.
Example: fit the current sample v4 with the current model F3 = {u1, u2, u3}; the residual after fitting, added to the set F as u4, is orthogonal to the existing {u1, u2, u3}.
In matrix terms, V (N×d) is the input observation matrix, U (N×d) is an orthonormal matrix, and Γ (d×d) is an upper-triangular matrix with ones on the diagonal, so that V = U Γ.
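The algorithm above as a direct sketch (assumed input: a matrix whose columns are the linearly independent points):

```python
import numpy as np

def gram_schmidt(V):
    """Orthonormalize the columns of V (assumed linearly independent)."""
    U = []
    for k in range(V.shape[1]):
        v = V[:, k]
        # Fit v with the current model {u_1, ..., u_{k-1}}; keep the residual.
        r = v - sum((u @ v) * u for u in U)
        U.append(r / np.linalg.norm(r))   # normalize the residual
    return np.column_stack(U)

V = np.array([[1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
U = gram_schmidt(V)
print(np.allclose(U.T @ U, np.eye(2)))    # True: columns are orthonormal
```

Note, answering the question posed above: stacking V and U together still has rank 2, i.e., orthonormalization does not change the spanned column space.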

Least squares regression
Idea: obtain an orthonormal basis for X, then use it to regress y (N×1).
From the Gram-Schmidt procedure: X = U T, where U (N×d) is an orthonormal basis of the column space of X.
Regress y on U instead of X: U a = y  =>  a = U'y.
If we regress y on X directly, X b = U T b = y, i.e., T b = U'y = a. By the way T is defined, T(m, m) = 1, so the regression coefficient β(j) of x_j is essentially the regression of y on u_m, where u_m is the residual of x_j after regressing x_j on all the other columns.
Problem: as long as some column of X is highly correlated with x_j, u_m becomes very small.
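A sketch of the idea X = U T, a = U'y on assumed toy data (NumPy's QR stands in for hand-rolled Gram-Schmidt; its T does not have a unit diagonal, but the fitted values are the same):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(20, 3))          # N x d observations
y = rng.normal(size=20)

U, T = np.linalg.qr(X)                # X = U T, U orthonormal (N x d)
a = U.T @ y                           # regress y on U: U a = y  =>  a = U'y

# Same fitted values as ordinary least squares on X:
beta = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(U @ a, X @ beta))   # True
```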

Solution 1: forward stepwise regression, which performs feature selection and regression at the same time.
Let X be N×d, and suppose q columns (X1) have already been used in a QR decomposition; regressing y on Q gives q regression variables, and the residual of the current model is r = y - X1 β1. There are d - q variables left. Question: which one is the best to orthogonalize next?

Solution 2: principal component regression.
Problem: how do we find a d×d orthogonal transformation matrix P such that, after it transforms X, the resulting target matrix T = X P has mutually uncorrelated columns? I.e., T = X P, but we only know X; T and P are unknown.
So: do PCA on X, let P = U (the eigenvector matrix), transform X to T, and then perform LS regression over T.
Computing principal components: are there other ways to estimate the principal components?
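Principal component regression as a sketch (toy data assumed; P is taken from the eigendecomposition of the sample covariance of X, and all components are kept here):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(50, 4))
y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + 0.1 * rng.normal(size=50)

Xc = X - X.mean(axis=0)               # center the design matrix
C = Xc.T @ Xc / (len(X) - 1)          # sample covariance
_, P = np.linalg.eigh(C)              # P orthogonal, columns = eigenvectors

T = Xc @ P                            # transformed design: T = X P
cov_T = T.T @ T / (len(X) - 1)
print(np.allclose(cov_T, np.diag(np.diag(cov_T))))  # True: columns uncorrelated

# LS regression over T:
gamma = np.linalg.lstsq(T, y - y.mean(), rcond=None)[0]
```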

What's a covariance matrix?
Variance: the spread of a distribution along one direction.
Covariance: the joint spread along two different directions. For samples in a d-dimensional space there are d(d-1)/2 such joint spreads, which (together with the variances) form a symmetric matrix.
For data X, the covariance matrix is C = E[(X - EX)'(X - EX)]. Its eigenvectors point in the directions carrying the most and the least information.
Intuition: the covariance is the net size of the "red minus blue" area in the scatter plot; it depends on the scales of the x and y axes and is easily affected by outliers. Correlation: when most points line up along the upward diagonal, red dominates and the correlation is large. Note the distinction between two cases: the inverse covariance, and the covariance itself (as used when processing images).
The covariance matrix is sensitive to outliers and boundary points, and insensitive to points in the middle.
Geometric meaning of the covariance matrix: take v = (1, 3)'; compute v' = C v, then C v', and so on (repeatedly applying C pulls the vector toward the direction of largest variance).
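The "repeatedly apply C to v" picture can be checked numerically (a sketch on assumed anisotropic 2-D toy data): iterating v ← C v with renormalization converges to the leading eigenvector of C, i.e., the direction of largest variance.

```python
import numpy as np

rng = np.random.default_rng(3)
# 2-D samples with much more spread along x than along y.
X = rng.normal(size=(500, 2)) * np.array([3.0, 0.5])

Xc = X - X.mean(axis=0)
C = Xc.T @ Xc / (len(X) - 1)          # covariance C = E(X-EX)'(X-EX)

v = np.array([1.0, 3.0])              # start from v = (1, 3)'
for _ in range(50):
    v = C @ v                         # v' = C v
    v /= np.linalg.norm(v)            # renormalize

top = np.linalg.eigh(C)[1][:, -1]     # eigenvector of the largest eigenvalue
print(abs(v @ top))                   # ~1.0: v aligns with the top eigenvector
```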

Application: Harris corner detector
C. Harris, M. Stephens, "A Combined Corner and Edge Detector", 1988.

The basic idea
We should be able to easily localize the point by looking through a small window: shifting the window in any direction should give a large change in intensity.
"Flat" region: no change as we shift the window in all directions.
"Edge": no change as we shift the window along the edge direction.
"Corner": significant change as we shift the window in all directions.
Question: how do we express this test as an optimization problem?

Harris detector: mathematics
Window-averaged change of intensity induced by shifting the image data by [u, v]:
E(u, v) = Σ_{x,y} w(x, y) [I(x + u, y + v) - I(x, y)]²
where I(x, y) is the intensity, I(x + u, y + v) the shifted intensity, and w(x, y) the window function: either a Gaussian, or 1 inside the window and 0 outside.
A Taylor-series approximation to the shifted image gives
E(u, v) ≈ Σ_{x,y} w(x, y) [I(x, y) + u I_x + v I_y - I(x, y)]² = Σ_{x,y} w(x, y) [u I_x + v I_y]²
= [u v] ( Σ_{x,y} w(x, y) [ I_x²  I_xI_y ; I_xI_y  I_y² ] ) [u ; v].

Harris detector: mathematics
Expanding I(x, y) in a Taylor series, we have, for small shifts [u, v], a bilinear approximation:
E(u, v) ≈ [u, v] M [u ; v]
where M is a 2×2 matrix computed from image derivatives:
M = Σ_{x,y} w(x, y) [ I_x²  I_xI_y ; I_xI_y  I_y² ].
M is also called the "structure tensor".

Harris detector: mathematics
E(u, v) ≈ [u, v] M [u ; v]. The intensity change of the shifting window is analyzed through the eigenvalues λ1, λ2 of M. The iso-intensity contour E(u, v) = const is an ellipse, with half-axes of length λ_max^(-1/2) and λ_min^(-1/2) along the directions of the fastest and the slowest change, respectively.
Selecting good features, i.e., classification of image points using the eigenvalues of M:
"Corner": λ1 and λ2 are large and λ1 ~ λ2; E increases in all directions.
"Edge": λ1 >> λ2 (or λ2 >> λ1); E barely changes along the edge direction.
"Flat" region: λ1 and λ2 are small; E is almost constant in all directions.

Harris detector: mathematics
Measure of corner response:
R = det M - k (trace M)²
with det M = λ1 λ2 and trace M = λ1 + λ2 (k is an empirical constant, k = 0.04-0.06).
This expression does not require computing the eigenvalues.

Harris detector: mathematics
R depends only on the eigenvalues of M: R is large for a corner; R is negative with large magnitude for an edge; |R| is small for a flat region.

Harris detector: the algorithm
Find points with a large value of the corner response function R (R > threshold), and take the points of local maxima of R.

Harris detector: workflow
1. Compute the corner response R.
2. Find points with large corner response: R > threshold.
3. Take only the points of local maxima of R.

Harris detector: summary
The average intensity change in direction [u, v] can be expressed as a bilinear form:
E(u, v) ≈ [u, v] M [u ; v]
Describe a point in terms of the eigenvalues of M via the measure of corner response:
R = λ1 λ2 - k (λ1 + λ2)²
A good (corner) point should have a large intensity change in all directions, i.e., R should be a large positive value.

Nonlinear Iterative Partial Least Squares (NIPALS)
Idea: if we knew the projection t of the samples x_j in the feature space, then finding the principal projection direction p would just be a regression problem. But we don't. However, on the one hand we know that p is an eigenvector of the sample covariance matrix, and on the other hand it satisfies the reconstruction constraint; from these we obtain a rule for estimating p from t, which gives the NIPALS algorithm: regress (explain) the projection vector p of X on the projection coordinates t and normalize it, then re-estimate the projection t of X onto p. This yields the first principal component p and its corresponding t; the remaining components are obtained by continuing to fit the residual.
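The NIPALS iteration for the first principal component, as a sketch on assumed centered toy data; it should agree (up to sign) with the leading right singular vector from an SVD:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(100, 5)) * np.array([4.0, 2.0, 1.0, 0.5, 0.25])
X = X - X.mean(axis=0)

t = X[:, 0].copy()                    # initialize the scores with one column
for _ in range(100):
    p = X.T @ t / (t @ t)             # regress X's projection vector p on t
    p /= np.linalg.norm(p)            # normalize
    t = X @ p                         # re-estimate the projection t of X on p

p_svd = np.linalg.svd(X, full_matrices=False)[2][0]  # leading right sing. vec.
print(abs(p @ p_svd))                 # ~1.0: NIPALS found the first PC
```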

Partial Least Squares
How do we relate two modalities? Find projection matrices W and C such that the correlation of the two related modalities in a common feature space is maximized; iterate N times. Example: two different poses of a face.
Idea: to strengthen the correlation in the projected space, use each modality's feature-space projection to regress the other's projection vector:
use the projection u of Y to regress (explain) the projection vector p of X;
use p to project X (feature extraction), obtaining the feature-space coordinates t;
use the projection t of X to reconstruct the projection vector q of Y;
use q to project Y, obtaining the feature-space coordinates u.
When the iteration finishes, the features of X and Y are T and U respectively; then regress U on T: U = T D. Substituting into the generative model of Y: Y = U Q' = T D Q' = X P' D Q'. That is, the goal of PLS is to learn the latent factors that make the two modalities X and Y maximally correlated, revealing the statistical regularity behind two semantically close but different observations.
PLS is in fact a supervised way of estimating a feature space: regression, classification, dimensionality reduction, and feature learning. Compared with traditional least squares regression, PLS emphasizes learning features of both Y and X, whereas traditional LS only considers the feature relations of the X space. Compared with PCA, which is unsupervised, PLS uses the supervision signal to learn a more discriminative feature space.
But why is it called "partial"?
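The PLS inner loop above, sketched on assumed paired toy data (two views generated from one shared latent factor; variable names p, t, q, u follow the slides; a single component only, with no deflation):

```python
import numpy as np

rng = np.random.default_rng(5)
latent = rng.normal(size=(80, 1))               # shared factor behind both views
X = latent @ rng.normal(size=(1, 4)) + 0.1 * rng.normal(size=(80, 4))
Y = latent @ rng.normal(size=(1, 3)) + 0.1 * rng.normal(size=(80, 3))
X, Y = X - X.mean(0), Y - Y.mean(0)

u = Y[:, 0].copy()
for _ in range(100):
    p = X.T @ u / (u @ u); p /= np.linalg.norm(p)  # regress X's p on Y's scores u
    t = X @ p                                      # project X -> scores t
    q = Y.T @ t / (t @ t); q /= np.linalg.norm(q)  # reconstruct Y's q from t
    u = Y @ q                                      # project Y -> scores u

corr = np.corrcoef(t, u)[0, 1]
print(corr)   # the two views' score vectors end up highly correlated
```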
