Examining the Evolution of Identity and Redirection in the LOD Cloud

Shuai Wang
with Idries Nasim, Joe Raad, Peter Bloem, Frank van Harmelen
WAI, 19th September

https:///shuaiwangvu/identity_graph_evolution
https:///shuaiwangvu/redirection

OUTLINE
• Introduction and related work
• Evolution of identity graphs
    • Constructing the new identity graphs
    • Compare the identity graphs
• Analysis of redirection
    • Constructing the redirect graphs
    • A qualitative analysis
    • A quantitative analysis
• Discussion and future work

INTRODUCTION AND RELATED WORK
• Identity crisis [Halpin, et al.]
• 17.9% of entities in DBpedia do not exist after 2 years [De Melo, 2013]
• Semantic web evolution [multiple]
• Semantically broken links [Regino et al.]

Black = identity links
Red = redirection
Blue = encoding equivalence

RESEARCH QUESTIONS
RQ1: How has the identity graph in the semantic web changed?
RQ2: Can graphs of redirects provide an indication of the evolution of the identity graphs in the semantic web?
RQ3: Can we approximate the implicit semantics of redirection?
RQ4: What are the properties of the redirection graph?

CONSTRUCTING THE NEW IDENTITY GRAPHS
LOD Laundromat in 2015 shows that
• 91.2% of entities are in linksets
• 8.1% of entities are in major hubs with more than 10 identity links.
Construct the new identity graph with
• linksets: DBpedia Databus
• Major linked data hubs: Yago, Pleiades, WordNet, etc.

• HTTP 200: 'OK'
• 400+ HTTP error: Not Found: 'NF'
• A literal or the request fails: 'ER'
• Timeout: 'TO'
All redirects of 300+:
• Redirect Until Timeout: 'RUT'
• Redirect Until Not Found: 'RUNF'
• Redirect Until Error: 'RUE'
• Redirect Until Found: 'RUF'

COMPARING THE GRAPHS
• 57.9M entities are shared = 32.3% of G and 13.4% of H.
• H consists of many more entities than G.
• The triple:entity ratio has dropped from 3.12 in G to 0.94 in H, which indicates that redundant edges might be fewer in H.
• The HDT file of I is 3.3 times bigger than that of G.

                       # Triples   # Entities   Size (HDT file)
G (old graph)          558M        179M         4.5G
H (new graph)          409M        443M         11G
I (integrated graph)   951M        555M         15G
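
The triple:entity ratios and the HDT size factor quoted for this table can be re-derived from the table itself. The following is a minimal sketch that uses only the rounded table values; the names `TABLE`, `triple_entity_ratio` and `hdt_size_factor` are illustrative and not taken from the talk.

```python
# Rounded values copied from the table (counts in millions, HDT size in GB).
TABLE = {
    "G": {"triples": 558, "entities": 179, "hdt_gb": 4.5},
    "H": {"triples": 409, "entities": 443, "hdt_gb": 11.0},
    "I": {"triples": 951, "entities": 555, "hdt_gb": 15.0},
}


def triple_entity_ratio(name):
    """Number of triples per entity for one graph."""
    row = TABLE[name]
    return row["triples"] / row["entities"]


def hdt_size_factor(bigger, smaller):
    """How many times larger one HDT file is than another."""
    return TABLE[bigger]["hdt_gb"] / TABLE[smaller]["hdt_gb"]


if __name__ == "__main__":
    for name in TABLE:
        print(name, round(triple_entity_ratio(name), 2))
    print("I vs G:", round(hdt_size_factor("I", "G"), 1))
```

With these rounded inputs the ratio for H comes out near 0.92 rather than the 0.94 on the slide; the slide value was presumably computed from the exact counts.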

COMPARING THE GRAPHS
For the largest CC of I:
• 293700 (28%) nodes from G, 450107 (44%) from H, 290877 (28%) from both
• 37176 (46.5%) CCs from G, 42718 (53.4%) CCs from H

                       |Biggest CC|   # CC    Size (HDT file)
G (old graph)          178K           49M     4.5G
H (new graph)          219K           137M    11G
I (integrated graph)   1M             164M    15G
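
Connected components (CCs) of an identity graph, and the size of the biggest one, can be computed with a union-find structure. This is a sketch under stated assumptions: the talk does not say which algorithm or tooling was used, and the edge list in the usage example is made up.

```python
from collections import Counter


def component_sizes(edges):
    """Return a Counter mapping each component root to its node count.

    `edges` is an iterable of (node, node) pairs, e.g. identity links.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    return Counter(find(node) for node in list(parent))


if __name__ == "__main__":
    # Hypothetical identity links; the prefixes only hint at a source graph.
    links = [("g:a", "h:a"), ("h:a", "h:b"), ("g:c", "g:d")]
    sizes = component_sizes(links)
    print("number of CCs:", len(sizes))
    print("biggest CC:", max(sizes.values()))
```

Union-find keeps memory proportional to the number of nodes, which matters for graphs with hundreds of millions of entities, although at that scale an on-disk or distributed variant would likely be needed.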

CONSTRUCTING THE REDIRECT GRAPHS
For H and G:
• Sampling 100K uniformly
• Sampling 20K from CC(2), CC(3-10), CC(>10)

COMPARING THE OLD AND NEW GRAPHS

ANALYSIS OF THE REDIRECTION GRAPH
• Valid = OK + RUF (redirect until found)
• Invalid = the rest
• G has more valid entities than H.
• Only 1-3% return meaningful information directly.
• >50% have redirection for uniform sampling.
• #Valid decreases as the size of CCs increases, especially in H.
• Opposite trend for NF, TO, RUNF, RUE.
• The difference for OK is too small to draw a conclusion.
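
The status labels used in the talk (OK, NF, ER, TO and their redirect-until variants) and the rule Valid = OK + RUF can be sketched as a small lookup. The outcome names `found`, `not_found`, `error` and `timeout` are hypothetical, and reading RUT as "redirect until timeout" is an assumption based on the other labels.

```python
# Labels counted as valid, following "Valid = OK + RUF".
VALID = {"OK", "RUF"}

# Label for a request that resolved without any redirect.
DIRECT = {"found": "OK", "not_found": "NF", "error": "ER", "timeout": "TO"}

# Label for a request that went through at least one 3xx redirect.
# RUT as "redirect until timeout" is an assumption.
AFTER_REDIRECT = {
    "found": "RUF",
    "not_found": "RUNF",
    "error": "RUE",
    "timeout": "RUT",
}


def label(final_outcome, redirected):
    """Map the final outcome of dereferencing an IRI to a status label."""
    table = AFTER_REDIRECT if redirected else DIRECT
    return table[final_outcome]


def valid_share(labels):
    """Fraction of sampled entities whose label counts as valid."""
    labels = list(labels)
    return sum(lab in VALID for lab in labels) / len(labels)


if __name__ == "__main__":
    sample = [label("found", False), label("found", True),
              label("not_found", True), label("timeout", False)]
    print(sample, valid_share(sample))
```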

ANALYSIS OF THE REDIRECTION GRAPH G
From now on, the results are only about the old graph.

LONG REDIRECTION PATHS
['/resource/Mirage_%28pop_group%29',
 '/resource/Mirage_(pop_group)',
 '/resource/Mirage_(pop_group)',
 '/page/Mirage_(pop_group)',
 '/page/Mirage_(pop_group)',
 '/resource/Mirage_(disambiguation)',
 '/resource/Mirage_(disambiguation)',
 '/page/Mirage_(disambiguation)',
 '/page/Mirage_(disambiguation)']

IMPLICIT SEMANTICS OF REDIRECTION (4,000 EDGES)
• 45.1% (encoding, http->https, upper/lower case)
• 16.8% (DBpedia resource to page, etc.)
• Approx. 45.1% - 83.2% can be taken as identity links
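
The first category (encoding, http->https, upper/lower case) can be approximated by normalizing both ends of a redirect edge and comparing the results. This sketch uses only `urllib.parse` from the Python standard library; the normalization rules are an assumption rather than the authors' definition, and the `dbpedia.org` host in the usage example is assumed because the slide shows paths only.

```python
from urllib.parse import unquote, urlsplit


def normalize(iri):
    """Reduce an IRI to a form that ignores scheme, percent-encoding and case."""
    parts = urlsplit(iri)
    # The scheme is dropped on purpose so that http and https compare equal.
    return (
        parts.netloc.lower(),
        unquote(parts.path).lower(),
        unquote(parts.query).lower(),
    )


def encoding_equivalent(source, target):
    """True if a redirect edge only changes encoding, scheme or letter case."""
    return normalize(source) == normalize(target)


if __name__ == "__main__":
    a = "http://dbpedia.org/resource/Mirage_%28pop_group%29"
    b = "https://dbpedia.org/resource/Mirage_(pop_group)"
    c = "https://dbpedia.org/page/Mirage_(pop_group)"
    print(encoding_equivalent(a, b))  # encoding and scheme differ only
    print(encoding_equivalent(b, c))  # resource -> page is a different path
```

Lowercasing the whole path is a deliberately coarse choice: it captures the upper/lower case category but would also merge IRIs that differ only by case and denote different things.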

100 CHAINS OF REDIRECTION
• On average 1.7 hops. We examine redirection chains with over 2 hops.
• Similar number of hops for RUF, RUE, RUNF, RUT, so we sample uniformly.
• 85% happen within a domain.
• Wikidata (28%) and DBpedia (25%) are among the most observed.
• Chains of DBpedia are often among the longest.
• Some others were observed for (5%) and (1%)

• Few entities can be redirected from G to H.
• Redirection is well observed in identity graphs.
• When only 1-3% can be dereferenced, it hurts accessibility and interoperability.
• We observ
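
The chain statistics reported for the sampled redirection chains (average number of hops, share staying within one domain) can be computed along the following lines. This is a sketch under assumptions: a chain is taken to be the list of IRIs visited, a hop is one step between consecutive IRIs, "within a domain" is measured per chain by comparing hosts, and the example chains are made up.

```python
from urllib.parse import urlsplit


def hops(chain):
    """Number of redirect steps in a chain of visited IRIs."""
    return len(chain) - 1


def within_domain(chain):
    """True if every IRI in the chain has the same host."""
    hosts = {urlsplit(iri).netloc.lower() for iri in chain}
    return len(hosts) == 1


def summarize(chains):
    """Average hop count and share of chains that stay within one domain."""
    total = len(chains)
    return {
        "avg_hops": sum(hops(c) for c in chains) / total,
        "within_domain_share": sum(within_domain(c) for c in chains) / total,
    }


if __name__ == "__main__":
    # Hypothetical chains for illustration only.
    example = [
        ["http://a.example/x", "https://a.example/x"],
        ["http://a.example/y", "http://b.example/y", "http://b.example/z"],
    ]
    print(summarize(example))
```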