生物信息学课件英文原版课件 (117)_第1页
生物信息学课件英文原版课件 (117)_第2页
生物信息学课件英文原版课件 (117)_第3页
生物信息学课件英文原版课件 (117)_第4页
生物信息学课件英文原版课件 (117)_第5页
已阅读5页,还剩36页未读 继续免费阅读

下载本文档

版权说明:本文档由用户提供并上传,收益归属内容提供方,若内容存在侵权,请进行举报或认领

文档简介

An overview of my past and present activitiesPossible collaborations and synergies with research at The University of Newcastle,Pablo Moscato Department of Computing and AutomationandDepartment of Systems Engineering FEEC - UNICAMPwww.densis.fee.unicamp.br/moscato,About myself whereabouts,Dec. 1987 Graduation (Physics, 5 years)Physics (theoretical)Degree thesis work: t-expansion Sep. 1988 Oct. 1989Full-time Visiting Graduate Student in Physics at the California Institute of Technology (CALTECH)Member of the Core Research Group of CALTECH Concurrent Computation Program Worked in: neural networks, parallel combinatorial optimization, mathematical physics, Simulated Annealing, and Memetic Algorithms.,About myself,Nov. 1989 Jun. 1996Center for Analog and Digital Techniques Universidad Nacional de La Plata, Argentina(several research positions)Worked in: combinatorial optimizationparallel computing (transputer-based machines)off-line character recognitionimage segmentation/edge recognitionanalysis of algoritms / Fractal instances approachJuly. 1995 June 1996Visiting Professor (UNPCBA, Tandil, Argentina)Parallel Computing (semestral course, 5th year, undergrad.),About myself,July 1996 Jan. 1997Visiting Professor / ResearcherDepartment of Systems Engineering (UNICAMP, Brazil)Taught a post-graduate courseMetaheuristics for combinatorial optimization problemsMarch 1997 March 2001DENSIS UNICAMP, BrazilWork: Combinatorial OptimizationProduction Planning, Scheduling, Traveling Salesman Problem (asymmetric case), Fractal instances, etc. Approximability and Parameterized ComplexityDesign of OO frameworks, Java, parallelization issues.,About myself,March 2001 PhD thesis defended (EE-Automation) Nov. 2001Our Bioinformatics proposal got accepted in a brazilean nation-wide contest by the CNPqStarted organizing our new group/bioinfo course.Jan. 2002 presentAnalysis of gene expression data,Construction of optimal phylogenetic trees from distance matrices using mtDNA data, Parameterized complexity of feature selection problemsContinued work in other areas (metaheuristics for combinatorial optimization, production planning, etc.)Appointed Professor (MS-3) at the Institute of Computing, UNICAMP (March 5, 2002).,Research Interests,Design and Analysis of Algorithms,Graph Theory, massive datasets and large-scale non-linear and combinatorial optimization problems,Bioinformatics and Systems Biology. Metaheuristics, Parallel and Scientific Computing.Parameterized Complexity and design of metaheuristics.Logic, Modal Logic and multi-agent modal logics of belief.Memetic Algorithms, Theory and Applications.,Research areas of past activities - Highlights,Mathematical Physics Computer Vision Bioinformatics Parallel Computation, etc. Metaheuristics, in particular Memetic Algorithms (I coined the name in 1989)Genetic Algorithms Simulated Annealing (studied the properties of a deterministic update in 1989, later popularized as Threshold Accepting)Tabu Search Analysis of algorithms and heuristicsIntroduced a comparative analysis (1995-99) for recombination algorithms (worst-case based) for the graph coloring problem. Introduced the Evolutionary Analysis of Algorithms (2001), an approach to find a lower bound on the complexity of particular algorithms by searching for worst-case instances of fixed size.,Research areas of past activities,Analysis of algorithms and heuristics (cont)Introduced Fractal Instances of the TSP (1994-98) to analyse the behaviour of heuristics using a family of instances sharing a property. Proposed Fitness Landscape Analysis (1989-93) to help clarify if a particular representation of a problem is suitable to evolutionary and other metaheuristic search techniques.Combinatorial optimizationTraveling salesman problem (both symmetric and asymmetric)Quadratic Assignment ProblemBoolean Perceptron Learning Graph coloring Minimum weight k-cardinality sub-tree problemNumber partitioning,Central objective of my thesis: NP Optimization Problems, Approximability and Evolutionary Computation:From Practice to Theory,To answer the following question: “”Is there a systematic way to design eficient algorithms based on the principles of Evolutionary Search such that they will always offer good approximate solutions for optimization problems for which is known that is hard to find the global optimal solution ?”,Methodological Proposal:,To reach the desired goal through three research directionsFirst direction“Identify NP optimization problems for which the paradigm of Evolutionary Search has been proved not competitive in comparison with the best exact or approximation algorithms and the best known heuristics that use other paradigms”“Min Number Partitioning” - Chapter 5Weakly NP-complete,Methodological Proposal :,Second Research direction“To identify the problems for which the Evolutionary Search strategy has proven to be a good alternative and to try to identify the reasons for the success”Asymmetric Travelling Salesman Problem - Cap. 4Strongly NP-completeSeveral applications of memetic algorithms - Cap. 2Class of Polynomial Merger Algorithms - Cap. 7,Methodological Proposal:,Third research direction“For the problems that have been identified in the second research direction, it is important to find links with the Theory of Computational Complexity, and the complexity classes (regarding approximability) to which these problems belong.” Approximability and Parameterized ComplexityChapters 3 e 6,Approximability,Definition: A maximization problem (Max P), is approximable in (1+) if there exists a polynomial-time algorithm A such that, for any input x of Max P,optMax P(x)/A(x) 1+ Question: “Given an optimization problem Max P, which is the smallest value of for which Max P is (1 + )-approximable in polynomial-time ?”Practical interest ? Approximation algorithms offer a performance guarantee. Unfortunately, the guarantee often is a too high value of .,Approximability, classes PTAS and FPTAS,Definition: a problem belongs to class PTAS if there exists a polynomial-time algorithm that gives an approximate solution for all 0, that is, there exists an infinite sequence of algorithms A() named a “polynomial time approximation scheme”. Definition: a problem belongs to class FPTAS if it belongs to PTAS and if the execution time of each A() is polynomial in relationship to the instance size and 1/. The infinite sequence A() is named “a fully polynomial-time approximation scheme”.Theoretical interest ? Many NP optimization problems belong to PTAS and some of them to FPTAS. The Knapsack problem is one of them. Sometimes the fptas algorithms can be very useful in practice.,Theoretical interest of approximability,The degree of hardness of approximability might be used to classify the NP optimization problems.Very hard NPO PB-completeHardMax Independent SetUp to a constantAPX-complete“Easy”PTAS “Very easy”FPTAS,Dynamic and Evolutionary Programming,Dynamic Programming (Cormen, Leiserson & Rivest defns.) “solves (typycally optimization) problems by combining the solutions to subproblems” “applicable when the subproblems are not independent (ie. when subproblems share subsubproblems)”“Programming refers to a tabular form”,Dynamic and Evolutionary Programming,Evolutionary Programming solves (typycally optimization) problems by combining features of complete solutions to create new populations of solutions applicable when it is “hard” or “unreasonable” to try to completely identify a subproblem hierarchical structure or to approach the problem via an exact approach.Programming does not refer to a tabular form,Recombination,Purpose: to combine features of feasible solutions already visited in order to provide new potential candidate solutions with better objective function value. Mechanism that restarts the search by “exploring the space “between solutions.,offspring,parents,0 0 0 0 0 0 0,1 1 1 1 1 1 1,Components,Purpose: to introduce new characteristics in the population by random modifications.Explores the “neighborhood” of a solution.Random Local search.,Mutation,GAs and MAs,For GAs: population group of “living organismsFor MAs: population group with knowledge of solution methods for the problem. (and the class !) Knowledge means. proper and rational reuse of:Heuristics (constructive and iterative improvement)Approximation Algorithms - PTAS - FPTASExact methods (BnB - truncated BnB - Branch and Cut)Properties of the best or optimal solution(s)Transformations and reductions.Recombination has a more general character than GAs,Basic Memetic Algorithm,Cultural Evolution,Shared characteristics are not inherited due to simple processes of recombination of previous solutionsExamples: Formula 1 (technological evolution)Martial Arts The use of totally random mutation has a significantly less important rle. Heavy use of historical information and an external logic to speed-up the process.,Memetic Algorithms for combinatorial optimization problems,Meme: word introduced by Richard Dawkins in his best-seller book “The Selfish Gene (76). unit of imitation, analogous rle of gene but in the field of cultural evolution.Memetic Algorithms (MAs, Moscato 89): caracterization of evolutionary algorithms that can hardly fit the GAs methaphor with small or no relation with biology hybrid GAs MAs Scatter Search (Glover, 77) MAs,Comparative Glossary,In GAs we generally defineIndividual (genotype, cromossome, structure, string, etc.)chain of genes that codifies for a given configuration/solution of the problem at hand.coding (several ways, i.e. general rules, kind of “art )In MAs we prefer to use the word Agent we code not only for a confg./solution., we also include resolution methods for the problem (and problem class) we wish to solve.Use of multiple representations (cultural evolution).,The traveling salesman problem (from Chvatals page),1800-1900: first descriptions of the problem; 1920-1930: problem becomes well defined; 1940-50: it starts to be recognized as “hard”; 1954: an instance with 42 cities is solved to optimality.,532 cities from the United States (1987): att532,Optimal solution for instance pcb3038,Largest instance solved to optimality (1998) : usa13509,PocketPropagation() Agents use three different heuristics (2-opt, one-city insertion, e two-city insertion ). Solutions get modified when they go up the hierarchy example: agents 10 - 3 - 1.Transputers (constraint and inspiration),solution pocket/fitnesssolution current/fitness,Moscato and Tinetti ( 92, MA for the TSP),Thesis conclusions.,Hybridization in MAs should continue due to the good results in several application areasComplete memetic algorithms - good research direction. The approximability of an NP optimization problem appears to be uncorrelated with the “hardness” of being solved in practice with evolutionary programming techniques. Parameterized complexity appears to be a better paradigm to study the complexity of recombination problems arising in memetic algorithms. The “FPT Toolkit” of Downey & Fellows can be sometimes reused to solve to optimality the “dynastically optimal” problem in recombination and they can be useful in practice.,. And more thesis conclusions.,Possibility of establishing a solid computational complexity theory of recombination problems Classes uPMA and PMA.Computational complexity of multi-parent recombination (. Links to genetic engineering issues ?).Benefits of using methods from Modal Logic and Multi-Agent Belief Logic for the introduction of problem (and instance) dependent knowledge. Possibility of using the generated “on-line” knowledge to guide exact and GRASP-like algorithms by taking advantage of the learning processes in each agent. Use of Logic Programming and belief update methods to update the populations shared knowledge.,Synergies with CS at Newcastle,Possibility of establishing several interactions with many members of the FacultyA Memetic Algorithm Guided by Quicksort for the Error-Correcting Graph Isomorphism Problem R. Torres-Velzques and V. Estivill-Castrofrom the Knowledge Discovery and Data Mining LabSubjects of synergies may be: Evolutionary Computation, ( and of course MAs)Applications of KDD techniques in Bioinformatics (clustering, subgraph isomorphism, exploratory data analysis of massive data sets, data mining, classification and clustering, etc.),Synergies with CS at Newcastle,”System identification is a field where one tries to intelligently determine a mathematical model for the global behaviour of a system on the basis of observing its response around a local operating point for a fixed period of time.”. from a web page at ..au Genetic network inference problems from microarray data are pretty much a system idenfication problem ! Here the synergy may be on establishing a common framework of techniques applied in Control and System Theory and in conjunction with algorithms more akin to CS research (i.e. based on Graph Theory and Graph Optimization methods). AI + KDD + SI techniques + Graph Theory and Graph Optimization - as a systematic integrated approach to treat problems in Bioinformatics,k-Feature Set,Input: A set X of examples, linear binary array of n features and a binary label assigned to each of them, positive integer k 0. Question: Is there a subset of features S such that:S 1, ., n ,| S | = k, No pair of examples in X have the same values for the features in S but have different values for the binary label characteristic.,One-out k-Feature Set,Input: Same as k-FS, but restricted to the case in which there is only one example differently labelled. Exemple: 0,1,1,1,1 | good 1,1,0,0,1 | good0,0,0,0,1, | good 1,0,0,1,0 | good0,0,1,1,0 | good0,0,1,0,0 | good 0,1,0,1,0 | bad Analizar: F1 = 1,3,5, F2 = 2,3,4, e F3 = 2,5,Parameterized Complexity,R. Downey, M. Fellows A new classification of problems is necessary. Reductions that are normally used to prove that a problem is NP-complete generally do not preserve certain structural properties. Importance of parameters present in real-world instance of interest.Definio: A parameterized problem is a pair , where x is an instance and k 0 a constant, is said to be fixed-parameter tractable if there exists and algorithm that solves the problem in time O(f(k) |x|) where |x| is a measure of the size of x, and f(k) an arbitrary function of k only, and a constant independent of k and n.,k-Vertex Cover,Example of a problem in FPT Instance: a graph G(V,E) and an integer k 0.Question: Is there a set V V, such that for any edge (u,v) E, at least u or v is a member of V and |V | k ?Decision problem is NP-complete in the strong sense. Approximability of the optimization problem ? During at least two decades the best approximation algorithm had been one with = 1, (gap of 100 %). Recently is has been proved that there is no approximation algorithm with 1/6 (under the P NP conjecture).,Parameterized Complexity results,Fellows & Langston, 1986: O(f(k) n3).Johnson, 87: O(f(k) n2). Fellows, 88: O(2k n).Buss, 89: O(kn + 2k k2k+2 ).Balasubramanian et al., 92: O(kn + 2k k2 ). Papadimitriou & Yannakakis, 93: O(3k n).Downey, Fellows, & Stege, 99: O(kn + 1.31951k

温馨提示

  • 1. 本站所有资源如无特殊说明,都需要本地电脑安装OFFICE2007和PDF阅读器。图纸软件为CAD,CAXA,PROE,UG,SolidWorks等.压缩文件请下载最新的WinRAR软件解压。
  • 2. 本站的文档不包含任何第三方提供的附件图纸等,如果需要附件,请联系上传者。文件的所有权益归上传用户所有。
  • 3. 本站RAR压缩包中若带图纸,网页内容里面会有图纸预览,若没有图纸预览就没有图纸。
  • 4. 未经权益所有人同意不得将文件中的内容挪作商业或盈利用途。
  • 5. 人人文库网仅提供信息存储空间,仅对用户上传内容的表现方式做保护处理,对用户上传分享的文档内容本身不做任何修改或编辑,并不能对任何下载内容负责。
  • 6. 下载文件中如有侵权或不适当内容,请与我们联系,我们立即纠正。
  • 7. 本站不保证下载资源的准确性、安全性和完整性, 同时也不承担用户因使用这些下载资源对自己和他人造成任何形式的伤害或损失。

评论

0/150

提交评论