Foreign-Language Translation: A Crawler Engine for Web Services

Uploaded by: 漫*** | IP location: Guizhou | Upload date: 2019-01-07 | Format: DOC | Pages: 28 | Size: 567.50 KB


Document Summary

Foreign-language source material

WSCE: A Crawler Engine for Large-Scale Discovery of Web Services

Eyhab Al-Masri and Qusay H. Mahmoud

Abstract

This paper addresses issues relating to the efficient access and discovery of Web services across multiple UDDI Business Registries (UBRs). The ability to explore Web services across multiple UBRs is becoming a challenge, particularly as the size and magnitude of these registries increase. As Web services proliferate, finding an appropriate Web service across one or more service registries using existing registry APIs (i.e. UDDI APIs) raises a number of concerns such as performance, efficiency, end-to-end reliability, and most importantly quality of returned results. Clients should not have to endlessly search accessible UBRs to find appropriate Web services, particularly when operating via mobile devices. Finding relevant Web services should be time effective and highly productive. In an attempt to enhance the efficiency of searching for businesses and Web services across multiple UBRs, we propose a novel exploration engine, the Web Service Crawler Engine (WSCE). WSCE is capable of crawling multiple UBRs, and enables the establishment of a centralized Web services repository which can be used for large-scale discovery of Web services. The paper presents experimental validation, results, and analysis of the presented ideas.

1. Introduction

The continuous growth and propagation of the internet have been some of the main factors for information overload, which in many instances acts as a deterrent to quick and easy discovery of information. Web services are internet-based, modular applications, and the automatic discovery and composition of Web services are an emerging technology of choice for building understandable applications used for business-to-business integration and are of immense interest to governments, businesses, as well as individuals.

As Web services proliferate, the same dilemma perceived in the discovery of Web pages will become tangible, and searching for specific business applications or Web services becomes challenging and time consuming, particularly as the number of UDDI Business Registries (UBRs) begins to multiply. In addition, decentralizing UBRs adds another level of complexity to effectively finding Web services within these distributed registries. Decentralization of UBRs is becoming tangible as new operating systems, applications, and APIs are already equipped with built-in functionalities and tools that enable organizations or businesses to publish their own internal UBRs for intranet and extranet use, such as the Enterprise UDDI Services in Windows Server 2003, WebSphere Application Server, Systinet Business Registry, and jUDDI, to name a few. Enabling businesses or organizations to self-operate and manage their own UBRs will maximize the likelihood of a significant increase in the number of business registries, and therefore clients will soon face the challenge of finding Web services across hundreds, if not thousands, of UBRs.

At the heart of the Service Oriented Architecture (SOA) is a service registry which connects and mediates service providers with clients as shown in Figure 1. Service registries extend the concept of an application-centric Web by allowing clients (or conceivably applications) to access a wide range of Web services that match specific search criteria in an autonomous manner. Without publishing Web services through registries, clients will not be able to locate services in an efficient manner, and service providers will have to devote extra effort to advertising their services through other channels. There are several companies that offer Web-based Web service directories such as WebServiceList [1], RemoteMethods [2], WSIndex [3], and XM [4].
However, because these Web-based service directories fail to adhere to Web services standards such as UDDI, they are likely to become unreliable sources for finding relevant Web services, and may become disconnected from the Web services environment, as in the cases of BindingPoint and SalCentral, which closed their Web-based Web service directories after many years of exposure.

Apart from having Web-based service directories, there have been numerous efforts that attempted to improve the discovery of Web services [5,6,9,21]; however, many of them have failed to address the issue of handling discovery operations across multiple UBRs. Because UBRs are hosted on Web servers, they are dependent on network traffic and performance, and therefore clients that are looking for appropriate Web services are susceptible to performance issues when carrying out multiple UBR search requests.

To address the above-mentioned issues, this work introduces a framework that serves as the heart of our Web Services Repository Builder (WSRB) architecture [7] by enhancing the discovery of Web services without any modifications to existing standards. In this paper, we propose the Web Service Crawler Engine (WSCE), which actively crawls accessible UBRs and collects business and Web service information. Our architecture enables businesses and organizations to maintain autonomous control over their UBRs while allowing clients to perform search queries adapted to large-scale discovery of Web services. Our solution has been tested and results present high performance rates when compared with other existing models.

The remainder of this paper is organized as follows. Section two discusses related work. Section three discusses some of the limitations with existing UBRs. Section four discusses the motivations for WSCE. Section five presents our Web service crawler engine's architecture.
Experiments and results are discussed in Section six, and finally the conclusion and future work are discussed in Section seven.

2. Related Work

Discovery of Web services is a fundamental area of research in ubiquitous computing. Many researchers have focused on discovering Web services through a centralized UDDI registry [8,9,10]. Although centralized registries can provide effective methods for the discovery of Web services, they suffer from problems associated with centralized systems such as a single point of failure and bottlenecks. In addition, other issues relating to the scalability of data replication, providing notifications to all subscribers when performing any system upgrades, and handling versioning of services from the same provider have driven researchers to find other alternatives.

Other approaches focused on having multiple public/private registries grouped into registry federations [6,12], such as METEOR-S, for enhancing the discovery process. METEOR-S provides a discovery mechanism for publishing Web services over federated registries, but this solution does not provide the means for articulating advanced search techniques, which are essential for locating appropriate business applications. In addition, federated registry environments can potentially lead to inconsistent policies being employed, which will have a significant impact on the practicability of conducting inquiries across them. Furthermore, federated registry environments will have increased configuration overhead, additional processing time, and poor performance in terms of execution time when performing service discovery operations.

A desirable solution would be a Web services crawler engine such as WSCE that can facilitate the aggregation of Web service references, resources, and description documents, and can provide clients with a standard, universal access point for discovering Web services distributed across multiple registries.
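To make the idea of a single access point over several registries concrete, the following is a minimal sketch, not the authors' implementation. It assumes each registry has already been queried and reduced to in-memory records; `ServiceRecord`, `crawl_registries`, `search`, the registry names, and the example URLs are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceRecord:
    """One Web service entry as a registry might report it (hypothetical shape)."""
    business: str
    name: str
    wsdl_url: str


def crawl_registries(registries):
    """Collect records from every registry into one store keyed by WSDL URL.

    `registries` maps a registry name to an iterable of ServiceRecord. In a
    real crawler each iterable would come from a registry inquiry call; here
    it is plain in-memory data (an assumption of this sketch).
    """
    storage = {}
    for registry_name, records in registries.items():
        for record in records:
            entry = storage.setdefault(
                record.wsdl_url, {"record": record, "sources": []}
            )
            # Remember every registry that lists the same service.
            if registry_name not in entry["sources"]:
                entry["sources"].append(registry_name)
    return storage


def search(storage, keyword):
    """Return sorted names of services whose name or business contains keyword."""
    needle = keyword.lower()
    return sorted(
        entry["record"].name
        for entry in storage.values()
        if needle in entry["record"].name.lower()
        or needle in entry["record"].business.lower()
    )


# Two made-up registries that overlap on one service.
registries = {
    "ubr-a": [
        ServiceRecord("Acme", "StockQuote", "http://example.com/quote?wsdl"),
        ServiceRecord("Acme", "CurrencyConverter", "http://example.com/fx?wsdl"),
    ],
    "ubr-b": [
        ServiceRecord("Acme", "StockQuote", "http://example.com/quote?wsdl"),
        ServiceRecord("Globex", "WeatherReport", "http://example.org/weather?wsdl"),
    ],
}
storage = crawl_registries(registries)
print(len(storage))              # 3 distinct services
print(search(storage, "quote"))  # ['StockQuote']
```

Keying the store by WSDL URL is one possible deduplication choice; a service published in two registries is stored once, and a client query touches only the local store rather than every registry.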

Several approaches focused on applying traditional Information Retrieval (IR) techniques or using keyword-based matching [13,14], which primarily depend on analyzing the frequency of terms. Other attempts focused on schema matching [15,16], which tries to understand the meanings of the schemas and suggest any trends or patterns. Other approaches studied the use of supervised classification and unsupervised clustering of Web services [17], artificial neural networks [18], or unsupervised matching at the operation level [19]. Other approaches focused on the peer-to-peer framework architecture for service discovery and ranking [20], providing a conceptual model based on Web service reputation [21], and providing a keyword-based search engine for querying Web services [22]. However, many of these approaches provide a very limited set of search methods (i.e. search by business name, business location, etc.) and attempt to apply traditional IR techniques that may not be suitable for service discovery, since Web services often provide only a very brief textual description of what they offer. In addition, the Web services structure is complex and only a small portion of text is often provided.

WSCE enhances the process of discovering Web services by providing advanced search capabilities for locating proper business applications across one or more UDDI registries and any other searchable repositories. In addition, WSCE allows for a high-performance and reliable discovery mechanism, while current approaches are mainly dependent on external resources, which in turn can significantly impact the ability to provide accurate and meaningful results. Furthermore, current techniques do not take into consideration the ability to predict, detect, or recover from failures at the Web service host, or to keep track of any dynamic updates or service changes.

3. UDDI Business Registries (UBRs)

Business registries provide the foundation for the cataloging and classification of Web services and other additional components. A UDDI Business Registry (UBR) serves as a service directory for the publishing of technical information about Web services [23]. The UDDI is an initiative originally backed by several technology companies including Microsoft, IBM, and Ariba [24] and aims at providing a focal point where all businesses, including their Web services, meet together in an open and platform-independent framework. Hundreds of other companies have endorsed the UDDI initiative including HP, Intel, Fujitsu, BEA, Oracle, SAP, Nortel Networks, WebMethods, Andersen Consulting, and Sun Microsystems, to name a few. E-Business XML (ebXML) is another service registry standard that focuses more on the collaboration between businesses [27]. Although commonalities between UDDI and ebXML registries present opportunities for interoperability between them [26], the UDDI remains the de facto industry standard for Web service discovery [21].

Although the UDDI provides ways for locating businesses and how to interface with them electronically, it is limited to a single search criterion. The keyword-based search techniques offered by UDDI make it impractical to assume that it can be very useful for Web services discovery or composition. In addition, a client should not have to endlessly search UBRs to find an appropriate Web service. As Web services proliferate and the number of UBRs increases, limited search capabilities are likely to yield less meaningful search results, which makes the task of performing search queries across one or multiple UBRs very time consuming and less productive.

3.1. Limitations with Current UDDI

Apart from the problems regarding limited search capabilities offered by UDDI, there are other major limitations and shortcomings with the existing UDDI standard.
Some of these limitations include:

(1) UDDI was intended to be used only for Web services discovery;
(2) UDDI registration is voluntary, and therefore, it risks becoming passive;
(3) UDDI does not provide any guarantees to the validity and quality of information it contains;
(4) the disconnection between UDDI and the current Web;
(5) UDDI is incapable of providing Quality of Service (QoS) measurements for registered Web services, which can provide helpful information to clients when choosing appropriate Web services;
(6) UDDI does not clearly define how service providers can advertise pricing models; and
(7) UDDI does not maintain nor provide any Web service life-cycle information (i.e. Web services across stages).

Other limitations with the current UDDI standard [23] are shown in Table 1. Although the UDDI has been the de facto industry standard for Web services discovery, the ability to find a scalable solution for handling significant amounts of data from multiple UBRs at a large scale is becoming a critical issue. Furthermore, the search time when searching one or multiple UDDI registries (i.e. meta-discovery) raises several concerns in terms of performance, efficiency, reliability, and the quality of returned results.

4. Motivations for WSCE

Web services are syntactically described using the Web Service Description Language (WSDL), which concentrates on describing Web services at the functional level. A more elaborate business-centric model for Web services is provided by the UDDI, which allows businesses to create many-to-many partnership relationships and serves as a focal point where businesses of all sizes can meet together in an open and global framework. Although there have been numerous standards that support the description and discovery of Web services, a simple way of combining these sources of information for clients to understand and use is not currently available.
In order for clients to search or invoke services, they first have to manually perform search queries to an existing UBR based on a primitive keyword-based technique, loop through returned results, extract binding information (i.e. through bindingTemplates or via WSDL access points), and manually examine their technical details. In this case, clients have to manually collect Web service information from different types of resources, which may not be a reliable approach for collecting information about Web services.

What is therefore desirable is a Web services crawler engine such as WSCE that facilitates the aggregation of Web service references, resources, and description documents and provides a well-defined usage pattern for discovering Web services. WSCE facilitates the establishment of a Web services search engine in which service providers will have enough visibility for their services, and at the same time clients will have the appropriate tools for performing advanced search queries. The design of WSCE is motivated by several factors including:

(1) the inability to periodically keep track of business and Web service life-cycle using the existing UDDI design, which can provide extremely helpful information serving as the basis for documenting Web services across stages;
(2) the inherent search criterion offered by the UDDI inquiry API, which would not be beneficial for finding services of interest;
(3) the apparent disconnection between UBRs and the existing Web; and
(4) performance issues with real-time search queries across multiple UBRs, which will eventually become very time consuming as the number of UBRs increases, while UDDI clients may not be able to search all accessible UBRs.

Other factors of motivation will become apparent as we introduce WSCE.

5. Web Service Crawler Engine (WSCE)

The Web Service Crawler Engine (WSCE) is part of the Web Services Repository Builder (WSRB) [7,11], in which it actively crawls accessible UBRs and collects information into a centralized repository called Web Service Storage (WSS). The discovery of Web services in principle can be achieved through a number of approaches. Resources that can be used to collect Web service information may vary, but all serve as an aggregate for Web service information. Prior to explaining the details of WSCE, it is important to discuss current Web service data resources that can be used for implementing WSCE.

5.1. Web Service Resources

Finding information about Web services is not strictly tied to UBRs. There are other standards that support the description and discovery of businesses, organizations, service providers, and the Web services they make available, while interfaces that contain technical details are used for allowing proper access to those services. For example:

WSDL describes message operations, network protocols, and access points to addresses used by Web services;
XML Schemas describe the grammatical XML structure sent and received by Web services;
WS-Policy describes general features, requirements, and capabilities of Web services;
UDDI business registries describe a more business-centric model for publishing Web services;
WSDL-Semantics (WSDL-S) uses semantic annotations that define the meaning of inputs, outputs, preconditions, and effects of operations described by an interface.

WSCE uses UBRs and WSDL files as data sources for collecting information into the Web Service Storage (WSS), since they contain the necessary information for describing and discovering Web services. We investigated several methods for obtaining Web service information, and the findings are summarized below:

Web-based: Web-based crawling involved using an existing search engine API, such as the Google SOAP Search API, to discover WSDL files across the Web.
Unfortunately, a considerable number of the WSDL files crawled over the Web did not contain descriptions of what these Web services have to offer. In addition, a large number of crawled Web services contained outdated, passive, or incomplete information. About 340 Web services were collected using this method, and only 23% of the collected WSDL files contained an adequate level of documentation.

File Sharing: File sharing tools such as Kazaa and eMule provide search capability by file type. A test was performed by extracting WSDL files using these file sharing tools, and approximately 56 Web services were collected. Unfortunately, peer
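The documentation finding reported for Web-based crawling can be illustrated with a small check over collected WSDL files. This is a hedged sketch only: the paper does not define what counts as "an adequate level of documentation", so the rule used here (at least one non-empty WSDL 1.1 `documentation` element) is an assumption, and `summarize_wsdl`, `documented_share`, and the two sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"  # WSDL 1.1 namespace


def summarize_wsdl(wsdl_text):
    """List portType operations and report whether any documentation exists."""
    root = ET.fromstring(wsdl_text)
    operations = sorted(
        op.get("name")
        for op in root.findall(f"{{{WSDL_NS}}}portType/{{{WSDL_NS}}}operation")
        if op.get("name")
    )
    # Assumption: "documented" means at least one non-empty <documentation>.
    docs = [
        doc.text.strip()
        for doc in root.iter(f"{{{WSDL_NS}}}documentation")
        if doc.text and doc.text.strip()
    ]
    return {"operations": operations, "documented": bool(docs)}


def documented_share(wsdl_texts):
    """Fraction of WSDL documents that pass the documentation check."""
    summaries = [summarize_wsdl(text) for text in wsdl_texts]
    if not summaries:
        return 0.0
    return sum(s["documented"] for s in summaries) / len(summaries)


# Two made-up WSDL fragments: one documented, one bare.
DOCUMENTED_WSDL = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/" name="Quote">
  <documentation>Returns delayed stock quotes.</documentation>
  <portType name="QuotePort">
    <operation name="GetQuote"/>
  </portType>
</definitions>"""

BARE_WSDL = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/" name="Fx">
  <portType name="FxPort">
    <operation name="Convert"/>
  </portType>
</definitions>"""

print(summarize_wsdl(DOCUMENTED_WSDL))
print(documented_share([DOCUMENTED_WSDL, BARE_WSDL]))  # 0.5
```

A crawler could apply a check of this kind before storing a service, so that entries without any description are flagged rather than silently mixed into search results.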
