Examining the Evolution of Identity and Redirection in the LOD Cloud

Shuai Wang, with Idries Nasim, Joe Raad, Peter Bloem, Frank van Harmelen
WAI, 19th September

https:///shuaiwangvu/identity_graph_evolution
https:///shuaiwangvu/redirection

OUTLINE

• Introduction and related work
• Evolution of identity graphs
  • Constructing the new identity graphs
  • Compare the identity graphs
• Analysis of redirection
  • Constructing the redirect graphs
  • A qualitative analysis
  • A quantitative analysis
• Discussion and future work

INTRODUCTION AND RELATED WORK

• Identity crisis [Halpin et al.]
• 17.9% of entities in DBpedia do not exist after 2 years [De Melo, 2013]
• Semantic web evolution [multiple]
• Semantically broken links [Regino et al.]

Figure legend: Black = identity links; Red = redirection; Blue = encoding equivalence

RESEARCH QUESTIONS

RQ1: How has the identity graph in the semantic web changed?
RQ2: Can graphs of redirects provide an indication of the evolution of the identity graphs in the semantic web?
RQ3: Can we approximate the implicit semantics of redirection?
RQ4: What are the properties of the redirection graph?

CONSTRUCTING THE NEW IDENTITY GRAPHS

LOD Laundromat in 2015 shows that:
• 91.2% of entities are in linksets
• 8.1% of entities are in major hubs with more than 10 identity links

Construct the new identity graph with:
• Linksets: DBpedia Databus
• Major linked data hubs: Yago, Pleiades, WordNet, etc.

• HTTP 200: ‘OK’
• 400+ HTTP error (Not Found): ‘NF’
• A literal, or the request fails: ‘ER’
• Timeout: ‘TO’

All redirects of 300+:
• Redirect Until Timeout: ‘RUT’
• Redirect Until Not Found: ‘RUNF’
• Redirect Until Error: ‘RUE’
• Redirect Until Found: ‘RUF’
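The eight labels on this slide can be read as a small decision procedure. The sketch below is one possible reading, not the authors' implementation. It assumes each resolution attempt has already been recorded as a list of per-hop outcomes: integer HTTP status codes, or the hypothetical strings "timeout" and "error" for a request that timed out or failed. The function names `classify` and `_label` are illustrative, and ‘RUT’ is read here as "redirect until timeout".

```python
from typing import Sequence, Union

# One entry per request in the chain: an HTTP status code, or the
# hypothetical markers "timeout" / "error" (assumed for this sketch).
Outcome = Union[int, str]


def _label(outcome: Outcome, redirected: bool) -> str:
    """Label the final outcome, with or without preceding redirects."""
    if outcome == 200:
        return "RUF" if redirected else "OK"
    if outcome == "timeout":
        return "RUT" if redirected else "TO"
    if isinstance(outcome, int) and outcome >= 400:
        return "RUNF" if redirected else "NF"
    # Anything else (failed request, literal, unexpected status) is an error.
    return "RUE" if redirected else "ER"


def classify(outcomes: Sequence[Outcome]) -> str:
    """Map the recorded hops of one resolution attempt to a slide label."""
    redirects = 0
    for outcome in outcomes:
        if isinstance(outcome, int) and 300 <= outcome < 400:
            redirects += 1  # follow the redirect and look at the next hop
            continue
        return _label(outcome, redirected=redirects > 0)
    # Empty trace, or a trace that ends on a redirect: treat as a failure.
    return _label("error", redirected=redirects > 0)


print(classify([200]))            # direct hit
print(classify([301, 302, 200]))  # redirected, then found
print(classify([301, 404]))       # redirected, then not found
```

Treating every status of 400 or above as "not found" follows the wording of the slide. A stricter variant could separate 404 from other client and server errors.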

COMPARING THE GRAPHS

• 57.9M entities are shared = 32.3% of G and 13.4% of H.
• H consists of many more entities than G.
• The triple:entity ratio has dropped from 3.12 in G to 0.94 in H, which indicates that there might be fewer redundant edges in H.
• The HDT file of I is 3.3 times bigger than that of G.

                        # Triples   # Entities   Size (HDT file)
G (old graph)           558M        179M         4.5G
H (new graph)           409M        443M         11G
I (integrated graph)    951M        555M         15G

                        |Biggest CC|   # CC    Size (HDT file)
G (old graph)           178K           49M     4.5G
H (new graph)           219K           137M    11G
I (integrated graph)    1M             164M    15G

For the largest connected component (CC) of I:
• 293,700 (28%) nodes from G, 450,107 (44%) from H, 290,877 (28%) from both
• 37,176 (46.5%) CCs from G, 42,718 (53.4%) CCs from H
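The ratios quoted on the comparison slide can be recomputed from the table figures. The snippet below is only a worked check, using the numbers as printed on the slide (in millions, and apparently rounded) and reading the shared-entity count as 57.9M. The variable and function names are illustrative. Recomputed from the rounded table entries, the values for H come out slightly lower (about 0.92 triples per entity and 13.1% shared) than the 0.94 and 13.4% quoted on the slide, presumably because the slide was computed from unrounded counts.

```python
# Figures copied from the slide tables, in millions (rounded in the source).
TRIPLES = {"G": 558, "H": 409, "I": 951}
ENTITIES = {"G": 179, "H": 443, "I": 555}
HDT_SIZE_GB = {"G": 4.5, "H": 11, "I": 15}
SHARED_ENTITIES = 57.9  # entities present in both G and H (assumed reading)


def triples_per_entity(graph: str) -> float:
    """Triple:entity ratio of one identity graph."""
    return TRIPLES[graph] / ENTITIES[graph]


def shared_fraction(graph: str) -> float:
    """Fraction of a graph's entities that also occur in the other graph."""
    return SHARED_ENTITIES / ENTITIES[graph]


for g in ("G", "H"):
    print(f"{g}: {triples_per_entity(g):.2f} triples per entity, "
          f"{shared_fraction(g):.1%} of entities shared")

print(f"HDT size ratio I/G: {HDT_SIZE_GB['I'] / HDT_SIZE_GB['G']:.1f}")
```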

CONSTRUCTING THE REDIRECT GRAPHS

For H and G:
• Sampling 100K uniformly
• Sampling 20K from CC(2), CC(3-10), CC(>10), i.e. connected components grouped by size

Denoted [notation missing in the source]
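The sampling scheme on this slide (one uniform sample plus one sample per component-size bucket) could be sketched as follows. This is an illustration under several assumptions rather than the authors' procedure: connected components are given as sets of entity IRIs, samples are drawn over entities rather than over components, and the names `bucket_of` and `sample_entities` are hypothetical.

```python
import random
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set

BUCKETS = ("CC(2)", "CC(3-10)", "CC(>10)")


def bucket_of(size: int) -> Optional[str]:
    """Name of the size bucket a connected component falls into."""
    if size == 2:
        return "CC(2)"
    if 3 <= size <= 10:
        return "CC(3-10)"
    if size > 10:
        return "CC(>10)"
    return None  # singletons carry no identity link


def sample_entities(components: Iterable[Set[str]], uniform_n: int,
                    per_bucket: int, seed: int = 0) -> Dict[str, List[str]]:
    """Draw one uniform sample and one sample per size bucket."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    pools: Dict[str, List[str]] = defaultdict(list)
    everything: List[str] = []
    for component in components:
        members = sorted(component)  # sort so the seed fully fixes the draw
        everything.extend(members)
        bucket = bucket_of(len(members))
        if bucket is not None:
            pools[bucket].extend(members)

    def draw(pool: List[str], n: int) -> List[str]:
        return rng.sample(pool, min(n, len(pool)))

    samples = {"uniform": draw(everything, uniform_n)}
    for name in BUCKETS:
        samples[name] = draw(pools[name], per_bucket)
    return samples


# Toy components for illustration only; real sizes would be 100K and 20K.
toy = [{"a", "b"}, {"c", "d", "e"}, {str(i) for i in range(11)}]
print(sample_entities(toy, uniform_n=5, per_bucket=2))
```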

COMPARING THE OLD AND NEW GRAPHS

ANALYSIS OF THE REDIRECTION GRAPH

• Valid = OK + RUF (redirect until found)
• Invalid = the rest

• G has more valid entities than H.
• Only 1-3% return meaningful information directly.
• More than 50% are redirected under uniform sampling.

• #Valid decreases as the size of the CCs increases, especially in H.
• The opposite trend holds for NF, TO, RUNF, and RUE.
• The difference for OK is too small to draw a conclusion.
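The valid/invalid split used in this analysis (valid = OK + RUF) amounts to a simple tally over the labels of a sample. A minimal sketch, assuming each sampled entity already carries one of the eight labels; the function name `valid_share` and the toy label lists are made up for illustration and are not results from the slides.

```python
from collections import Counter
from typing import Iterable

VALID_LABELS = {"OK", "RUF"}  # everything else counts as invalid


def valid_share(labels: Iterable[str]) -> float:
    """Fraction of sampled entities whose label is OK or RUF."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(counts[label] for label in VALID_LABELS) / total


# Toy data only: one list of labels per component-size bucket.
toy_samples = {
    "CC(2)": ["RUF", "RUF", "OK", "NF"],
    "CC(>10)": ["RUF", "NF", "TO", "RUNF"],
}
for bucket, labels in toy_samples.items():
    print(f"{bucket}: {valid_share(labels):.0%} valid")
```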

ANALYSIS OF THE REDIRECTION GRAPH G

From now on, the results are only about the old graph.

LONG REDIRECTION PATHS

['/resource/Mirage_%28pop_group%29',
 '/resource/Mirage_(pop_group)',
 '/resource/Mirage_(pop_group)',
 '/page/Mirage_(pop_group)',
 '/page/Mirage_(pop_group)',
 '/resource/Mirage_(disambiguation)',
 '/resource/Mirage_(disambiguation)',
 '/page/Mirage_(disambiguation)',
 '/page/Mirage_(disambiguation)']

IMPLICIT SEMANTICS OF REDIRECTION (4,000 EDGES)

• 45.1% (encoding, http->https, upper/lower case)
• 16.8% (DBpedia resource to page, etc.)
• Approx. 45.1% - 83.2% can be taken as identity links
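The first category on this slide (encoding, http->https, upper/lower case) covers redirects that only rewrite the IRI's surface form, which is why such edges can be read as identity links. A rough way to detect them is sketched below with Python's standard `urllib.parse`. The function name `trivial_change` and the `example.org` IRIs are hypothetical, and the slides do not say how the 4,000 edges were actually annotated.

```python
from typing import Optional
from urllib.parse import unquote, urlsplit


def trivial_change(source: str, target: str) -> Optional[str]:
    """Name the surface-level rewrite that explains a redirect edge, if any."""
    if source == target:
        return "identical"
    # Percent-encoding only, e.g. %28 and %29 versus ( and ).
    if unquote(source) == unquote(target):
        return "encoding"
    s, t = urlsplit(source), urlsplit(target)
    # Same host, path, query and fragment; only the scheme was upgraded.
    if (s.scheme, t.scheme) == ("http", "https") and s[1:] == t[1:]:
        return "http->https"
    if source.lower() == target.lower():
        return "case"
    return None  # not a purely syntactic rewrite (e.g. resource -> page)


print(trivial_change("http://example.org/resource/Mirage_%28pop_group%29",
                     "http://example.org/resource/Mirage_(pop_group)"))
print(trivial_change("http://example.org/resource/X",
                     "http://example.org/page/X"))
```

An edge that combines two rewrites (for example a scheme upgrade plus a case change) returns `None` in this sketch. Normalising both IRIs first would catch those cases.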

100 CHAINS OF REDIRECTION

• On average 1.7 hops. We examine redirection chains with over 2 hops.
• The number of hops is similar for RUF, RUE, RUNF, and RUT, so we sample uniformly.
• 85% happen within a domain.
• Wikidata (28%) and DBpedia (25%) are among the most observed.
• Chains of DBpedia are often among the longest.
• Some others were observed for (5%) and (1%)
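The two chain statistics reported on this slide, the number of hops and whether a chain stays within one domain, are straightforward to compute once a chain is available as an ordered list of IRIs. The sketch below assumes that representation and approximates "domain" by the hostname. Both helper names and the `example.*` IRIs are illustrative.

```python
from typing import Sequence
from urllib.parse import urlsplit


def hops(chain: Sequence[str]) -> int:
    """Number of redirects in a chain of IRIs (start IRI included)."""
    return max(len(chain) - 1, 0)


def within_one_domain(chain: Sequence[str]) -> bool:
    """True when every IRI in the chain has the same hostname."""
    return len({urlsplit(iri).hostname for iri in chain}) <= 1


chain = [
    "http://example.org/resource/A",
    "https://example.org/resource/A",
    "https://example.org/page/A",
]
print(hops(chain), within_one_domain(chain))
```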

• Few entities can be redirected from G to H.
• Redirection is well observed in identity graphs.
• When only 1-3% can be dereferenced, it hurts accessibility and interoperability.
• We observ
