Cormorant: Covariant Molecular Neural Networks

Brandon Anderson, Truong-Son Hy, and Risi Kondor
Department of Computer Science and Department of Statistics, The University of Chicago;
Center for Computational Mathematics, Flatiron Institute; Atomwise

33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, Canada.

Abstract

We propose Cormorant, a rotationally covariant neural network architecture for learning the behavior and properties of complex many-body physical systems. We apply these networks to molecular systems with two goals: learning atomic potential energy surfaces for use in Molecular Dynamics simulations, and learning ground state properties of molecules calculated by Density Functional Theory. Some of the key features of our network are that (a) each neuron explicitly corresponds to a subset of atoms, and (b) the activation of each neuron is covariant to rotations, ensuring that overall the network is fully rotationally invariant. Furthermore, the non-linearity in our network is based upon tensor products and the Clebsch–Gordan decomposition, allowing the network to operate entirely in Fourier space. Cormorant significantly outperforms competing algorithms in learning molecular Potential Energy Surfaces from conformational geometries in the MD-17 dataset, and is competitive with other methods at learning geometric, energetic, electronic, and thermodynamic properties of molecules on the GDB-9 dataset.

1 Introduction

In principle, quantum mechanics provides a perfect description of the forces governing the behavior of atoms, molecules, and crystalline materials such as metals. However, for systems larger than a few dozen atoms, solving the Schrödinger equation explicitly at every timestep is not a feasible proposition on present-day computers. Even Density Functional Theory (DFT) [Hohenberg and Kohn, 1964], a widely used approximation to the equations of quantum mechanics, has trouble scaling to more than a few hundred atoms. Consequently, the majority of practical work in molecular dynamics today falls back on fundamentally classical models, where the atoms are essentially treated as solid balls and the forces between them are given by pre-defined formulae called atomic force fields or empirical potentials, such as the CHARMM family of models [Brooks et al., 1983, 2009]. There has been a widespread realization that this approach has inherent limitations, so in recent years a burgeoning community has formed around trying to use machine learning to learn more descriptive force fields directly from DFT computations [Behler and Parrinello, 2007, Bartók et al., 2010, Rupp et al., 2012, Shapeev, 2015, Chmiela et al., 2016, Zhang et al., 2018, Schütt et al., 2017, Hirn et al., 2017]. More broadly, there is considerable interest in using ML methods not just for learning force fields, but also for predicting many other physical/chemical properties of atomic systems across different branches of materials science, chemistry, and pharmacology [Montavon et al., 2013, Gilmer et al., 2017, Smith et al., 2017, Yao et al., 2018].

At the same time, there have been significant advances in our understanding of the equivariance and covariance properties of neural networks, starting with [Cohen and Welling, 2016a,b] in the context of traditional convolutional neural nets (CNNs). Similar ideas underlie generalizations of CNNs to manifolds [Masci et al., 2015, Monti et al., 2016, Bronstein et al., 2017] and graphs [Bruna et al., 2014, Henaff et al., 2015]. In the context of CNNs on the sphere, Cohen et al. [2018] realized the advantage of using "Fourier space" activations, i.e., expressing the activations of neurons in a basis defined by the irreducible representations of the underlying symmetry group (see also [Esteves et al., 2017]), and these ideas were later generalized to the entire SE(3) group [Weiler et al., 2018]. Kondor and Trivedi [2018] gave a complete characterization of what operations are allowable in Fourier space neural networks to preserve covariance, and Cohen et al. generalized the framework even further to arbitrary gauge fields [Cohen et al., 2019].

There have also been some recent works where even the nonlinear part of the neural network's operation is performed in Fourier space: independently of each other, Thomas et al. [2018] and Kondor [2018] were the first to use the Clebsch–Gordan transform inside rotationally covariant neural networks for learning physical systems, while Kondor et al. [2018] showed that in spherical CNNs the Clebsch–Gordan transform is sufficient to serve as the sole source of nonlinearity.

The Cormorant neural network architecture proposed in the present paper combines some of the insights gained from the various force field and potential learning efforts with the emerging theory of Fourier space covariant/equivariant neural networks. The important point that we stress in the following pages is that by setting up the network in such a way that each neuron corresponds to an actual set of physical atoms, and that each activation is covariant to symmetries (rotation and translation), we get a network in which the "laws" that individual neurons learn resemble known physical interactions. Our experiments show that this generality pays off in terms of performance on standard benchmark datasets.

2 The nature of physical interactions in molecules

Ultimately, interactions in molecular systems arise from the quantum structure of electron clouds around constituent atoms. However, from a chemical point of view, effective atom-atom interactions break down into a few simple classes based upon symmetry. Here we review a few of these classes in the context of the multipole expansion, whose structure will inform the design of our neural network.

Scalar interactions. The simplest type of physical interaction is that between two particles that are pointlike and have no internal directional degrees of freedom, such as spin or dipole moments. A classical example is the electrostatic attraction/repulsion between two charges described by the Coulomb energy

    V_C = \frac{1}{4\pi\epsilon_0}\,\frac{q_A\,q_B}{|\mathbf{r}_{AB}|}.    (1)

Here q_A and q_B are the charges of the two particles, r_A and r_B are their position vectors, r_AB = r_A - r_B, and ε₀ is a universal constant. Note that this equation already reflects symmetries: the fact that (1) only depends on the length of r_AB and not its direction or the position vectors individually guarantees that the potential is invariant under both translations and rotations.

Dipole/dipole interactions. One step up from the scalar case is the interaction between two dipoles. In general, the electrostatic dipole moment of a set of N charged particles relative to their center of mass r̄ is just the first moment of their position vectors weighted by their charges:

    \boldsymbol{\mu} = \sum_{i=1}^{N} q_i\,(\mathbf{r}_i - \bar{\mathbf{r}}).

The dipole/dipole contribution to the electrostatic potential energy between two sets of particles A and B separated by a vector r_AB is then given by

    V_{d/d} = \frac{1}{4\pi\epsilon_0}\left[\frac{\boldsymbol{\mu}_A \cdot \boldsymbol{\mu}_B}{|\mathbf{r}_{AB}|^3} - \frac{3\,(\boldsymbol{\mu}_A \cdot \mathbf{r}_{AB})(\boldsymbol{\mu}_B \cdot \mathbf{r}_{AB})}{|\mathbf{r}_{AB}|^5}\right].    (2)

One reason why dipole/dipole interactions are indispensable for capturing the energetics of molecules is that most chemical bonds are polarized. However, dipole/dipole interactions also occur in other contexts, such as the interaction between the magnetic spins of electrons.

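To make the symmetry concrete, the rotational invariance of (2) can be checked numerically. The following NumPy sketch is our illustration, not part of the paper: `dipole_dipole_energy` implements (2) with the prefactor 1/(4πε₀) set to 1, and the dipole moment is taken about the mean position (i.e., unit masses are assumed).

```python
import numpy as np

def dipole_moment(q, r):
    """First moment of the charges about their mean position (unit masses assumed)."""
    rbar = r.mean(axis=0)
    return (q[:, None] * (r - rbar)).sum(axis=0)

def dipole_dipole_energy(mu_a, mu_b, r_ab):
    """Eq. (2) with 1/(4*pi*eps0) set to 1."""
    d = np.linalg.norm(r_ab)
    return mu_a @ mu_b / d**3 - 3 * (mu_a @ r_ab) * (mu_b @ r_ab) / d**5

def random_rotation(rng):
    """A random proper rotation (det = +1) via QR decomposition."""
    Q, R = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(Q) < 0:
        Q[:, 0] *= -1
    return Q

rng = np.random.default_rng(0)
q_a, r_a = rng.normal(size=4), rng.normal(size=(4, 3))
q_b, r_b = rng.normal(size=5), rng.normal(size=(5, 3)) + np.array([6.0, 0.0, 0.0])

mu_a, mu_b = dipole_moment(q_a, r_a), dipole_moment(q_b, r_b)
r_ab = r_a.mean(axis=0) - r_b.mean(axis=0)
v = dipole_dipole_energy(mu_a, mu_b, r_ab)

# Rotating the whole system rotates mu_A, mu_B, and r_AB together,
# and the energy (2) is unchanged.
R = random_rotation(rng)
v_rot = dipole_dipole_energy(R @ mu_a, R @ mu_b, R @ r_ab)
assert np.isclose(v, v_rot)
```

Because (2) is built entirely from dot products of vectors that all rotate together, the invariance holds for every rotation, not just the sampled one.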
Quadrupole/quadrupole interactions. One more step up the multipole hierarchy is the interaction between quadrupole moments. In the electrostatic case, the quadrupole moment is the second moment of the charge density (corrected to remove the trace), described by the matrix

    \Theta = \sum_{i=1}^{N} q_i\left(3\,\mathbf{r}_i \mathbf{r}_i^\top - |\mathbf{r}_i|^2\, I\right).

Quadrupole/quadrupole interactions appear for example when describing the interaction between benzene rings, but the general formula for the corresponding potential is quite complicated. As a simplification, let us only consider the special case when, in some coordinate system aligned with the structure of A, and at polar angles (θ_A, φ_A) relative to the vector r_AB connecting A and B, Θ_A can be transformed into a form that is diagonal, with Θ^A_zz = Θ_A and Θ^A_xx = Θ^A_yy = -Θ_A/2 [Stone, 1997]. We make a similar assumption about the quadrupole moment of B. In this case the interaction energy becomes

    V_{q/q} = \frac{3}{4}\,\frac{\Theta_A \Theta_B}{4\pi\epsilon_0\,|\mathbf{r}_{AB}|^5}\Big[1 - 5\cos^2\theta_A - 5\cos^2\theta_B - 15\cos^2\theta_A\cos^2\theta_B + 2\big(4\cos\theta_A\cos\theta_B - \sin\theta_A\sin\theta_B\cos(\varphi_A - \varphi_B)\big)^2\Big].    (3)

Higher order interactions involve moment tensors of order 3, 4, 5, and so on. One can appreciate that the corresponding formulae, especially when considering not just electrostatics but other types of interactions as well (dispersion, exchange interaction, etc.), quickly become very involved.

3 Spherical tensors and representation theory

Fortunately, there is an alternative formalism for expressing molecular interactions, that of spherical tensors, which makes the general form of physically allowable interactions more transparent. This formalism also forms the basis of our Cormorant networks described in the next section. The key to spherical tensors is understanding how physical quantities transform under rotations. Specifically, in our case, under a rotation R:

    q \mapsto q, \qquad \boldsymbol{\mu} \mapsto R\boldsymbol{\mu}, \qquad \Theta \mapsto R\,\Theta\,R^\top, \qquad \mathbf{r}_{AB} \mapsto R\,\mathbf{r}_{AB}.

Flattening Θ into a vector Θ⃗ ∈ R⁹, its transformation rule can equivalently be written as Θ⃗ ↦ (R⊗R)Θ⃗, showing its similarity to the other three cases. In general, a k'th order Cartesian moment tensor T^(k) ∈ R^{3×3×···×3} (or its flattened equivalent T⃗^(k) ∈ R^{3^k}) transforms as T⃗^(k) ↦ (R⊗R⊗···⊗R) T⃗^(k).

Recall that given a group G, a representation of G is a matrix valued function ρ: G → C^{d×d} obeying ρ(xy) = ρ(x)ρ(y) for any two group elements x, y ∈ G. It is easy to see that R, and consequently R⊗···⊗R, are representations of the three dimensional rotation group SO(3). We also know that because SO(3) is a compact group, it has a countable sequence of unitary so-called irreducible representations (irreps), and, up to a similarity transformation, any representation can be reduced to a direct sum of irreps. In the specific case of SO(3), the irreps are called Wigner D-matrices, and for any non-negative integer ℓ = 0, 1, 2, ... there is a single corresponding irrep D_ℓ(R), which is a (2ℓ+1)-dimensional representation (i.e., as a function, D_ℓ: SO(3) → C^{(2ℓ+1)×(2ℓ+1)}). The ℓ = 0 irrep is the trivial irrep D_0(R) = (1).

The above imply that there is a fixed unitary transformation matrix C^(k) which reduces the k'th order rotation operator into a direct sum of irreducible representations:

    \underbrace{R \otimes R \otimes \cdots \otimes R}_{k} = {C^{(k)}}^{\dagger}\Bigg[\bigoplus_{\ell}\bigoplus_{i=1}^{\tau_\ell} D_\ell(R)\Bigg]\,C^{(k)}.

Note that the transformation R⊗···⊗R contains redundant copies of D_ℓ(R), which we denote as the multiplicities τ_ℓ. For our present purposes knowing the actual values of the τ_ℓ is not that important, except that τ_k = 1 and that for any ℓ > k, τ_ℓ = 0. What is important is that T⃗^(k), the vectorized form of the Cartesian moment tensor, has a corresponding decomposition

    \vec{T}^{(k)} = {C^{(k)}}^{\dagger}\Bigg[\bigoplus_{\ell}\bigoplus_{i=1}^{\tau_\ell} Q_{\ell,i}\Bigg].    (4)

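A concrete instance of this decomposition for k = 2, sketched in NumPy (our illustration; the function names are ours): a 3×3 Cartesian tensor splits into its trace (ℓ = 0, one component), antisymmetric part (ℓ = 1, three components), and symmetric traceless part (ℓ = 2, five components), and each piece transforms within itself under rotation.

```python
import numpy as np

def irrep_parts(T):
    """Split a 3x3 Cartesian tensor into its l = 0, 1, 2 pieces:
    trace (scalar), antisymmetric part, and symmetric traceless part."""
    scalar = np.trace(T)
    antisym = 0.5 * (T - T.T)
    symtraceless = 0.5 * (T + T.T) - (scalar / 3.0) * np.eye(3)
    return scalar, antisym, symtraceless

rng = np.random.default_rng(1)
T = rng.normal(size=(3, 3))
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(R) < 0:         # make it a proper rotation
    R[:, 0] *= -1

s0, a0, m0 = irrep_parts(T)
s1, a1, m1 = irrep_parts(R @ T @ R.T)   # rotate the tensor, then re-split

assert np.isclose(s0, s1)                 # l = 0 part is invariant
assert np.allclose(R @ a0 @ R.T, a1)      # l = 1 part transforms within itself
assert np.allclose(R @ m0 @ R.T, m1)      # l = 2 part transforms within itself
assert np.isclose(np.trace(m1), 0.0)      # and stays traceless
```

This is exactly the τ_0 = τ_1 = τ_2 = 1 pattern that (4) predicts for k = 2: one copy each of ℓ = 0, 1, 2, and no mixing between them under rotation.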
The decomposition (4) is convenient because, by the unitarity of C^(k), it shows that under rotations the individual Q_{ℓ,i} components transform independently, as Q_{ℓ,i} ↦ D_ℓ(R) Q_{ℓ,i}.

What we have just described is a form of generalized Fourier analysis applied to the transformation of Cartesian tensors under rotations. For the electrostatic multipole problem it is particularly relevant because it turns out that in that case, due to symmetries of T^(k), the only nonzero Q_{ℓ,i} component of (4) is the single one with ℓ = k. Furthermore, for a set of N charged particles, Q_ℓ (indexing its components m = -ℓ, ..., ℓ) has the simple form

    Q_{\ell m} = \left(\frac{4\pi}{2\ell+1}\right)^{1/2} \sum_{i=1}^{N} q_i\,(r_i)^{\ell}\, Y_\ell^m(\theta_i, \varphi_i), \qquad m = -\ell, \ldots, \ell,    (5)

where (r_i, θ_i, φ_i) are the coordinates of the i'th particle in spherical polars, and the Y_ℓ^m(θ, φ) are the well known spherical harmonic functions. Q_ℓ is called the ℓ'th spherical moment of the charge distribution. Note that while T^(ℓ) and Q_ℓ convey exactly the same information, T^(ℓ) is a tensor with 3^ℓ components, while Q_ℓ is just a (2ℓ+1)-dimensional vector.

Somewhat confusingly, in physics and chemistry any quantity U that transforms under rotations as U ↦ D_ℓ(R) U is often called an (ℓ'th order) spherical tensor, despite the fact that in terms of its presentation Q_ℓ is just a vector of 2ℓ+1 numbers. Also note that since D_0(R) = (1), a zeroth order spherical tensor is just a scalar. A first order spherical tensor, on the other hand, can be used to represent a spatial vector r = (r, θ, φ) by setting U_{1m} = r Y_1^m(θ, φ).

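Eq. (5) is straightforward to evaluate with off-the-shelf spherical harmonics. The sketch below is our illustration; the small shim covers both SciPy conventions (`sph_harm(m, l, azimuth, polar)` in older versions, `sph_harm_y(l, m, polar, azimuth)` in newer ones). It computes Q_ℓ for a random charge distribution and checks that its norm is rotation invariant, as it must be since Q_ℓ ↦ D_ℓ(R) Q_ℓ with D_ℓ(R) unitary.

```python
import numpy as np

# Compatibility shim: SciPy renamed sph_harm(m, l, azimuth, polar)
# to sph_harm_y(l, m, polar, azimuth) in recent releases.
try:
    from scipy.special import sph_harm_y
    def Y(m, ell, theta, phi):          # theta = polar, phi = azimuth
        return sph_harm_y(ell, m, theta, phi)
except ImportError:
    from scipy.special import sph_harm
    def Y(m, ell, theta, phi):
        return sph_harm(m, ell, phi, theta)

def spherical_moment(q, xyz, ell):
    """Eq. (5): the ell'th spherical moment of a set of point charges."""
    r = np.linalg.norm(xyz, axis=1)
    theta = np.arccos(xyz[:, 2] / r)            # polar angle
    phi = np.arctan2(xyz[:, 1], xyz[:, 0])      # azimuthal angle
    pref = np.sqrt(4 * np.pi / (2 * ell + 1))
    return pref * np.array([(q * r**ell * Y(m, ell, theta, phi)).sum()
                            for m in range(-ell, ell + 1)])

rng = np.random.default_rng(0)
q = rng.normal(size=6)
xyz = rng.normal(size=(6, 3))

R, _ = np.linalg.qr(rng.normal(size=(3, 3)))    # random proper rotation
if np.linalg.det(R) < 0:
    R[:, 0] *= -1

# Q_l -> D_l(R) Q_l with D_l(R) unitary, so ||Q_l|| must not change.
for ell in range(4):
    a = spherical_moment(q, xyz, ell)
    b = spherical_moment(q, xyz @ R.T, ell)     # rotate every particle
    assert np.isclose(np.linalg.norm(a), np.linalg.norm(b))
```

Note that only the norm is invariant; the individual components of Q_ℓ do change under rotation, by exactly the Wigner matrix D_ℓ(R).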
3.1 The general form of interactions

The benefit of the spherical tensor formalism is that it makes it very clear how each part of a given physical equation transforms under rotations. For example, if Q_ℓ and Q̃_ℓ are two ℓ'th order spherical tensors, then Q_ℓ · Q̃_ℓ is a scalar, since under a rotation R, by the unitarity of the Wigner D-matrices,

    Q_\ell \cdot \tilde{Q}_\ell \;\mapsto\; (D_\ell(R)\,Q_\ell) \cdot (D_\ell(R)\,\tilde{Q}_\ell) = Q_\ell^{\dagger}\, D_\ell(R)^{\dagger} D_\ell(R)\, \tilde{Q}_\ell = Q_\ell \cdot \tilde{Q}_\ell.

Even the dipole/dipole interaction (2) requires a more sophisticated way of coupling spherical tensors than this, since it involves non-trivial interactions between not just two, but three different quantities: the two dipole moments μ_A and μ_B and the relative position vector r_AB. Representing interactions of this type requires taking tensor products of the constituent variables. For example, in the dipole/dipole case we need terms of the form Q^A_{ℓ₁} ⊗ Q^B_{ℓ₂}. Naturally, these will transform according to the tensor product of the corresponding irreps: Q^A_{ℓ₁} ⊗ Q^B_{ℓ₂} ↦ (D_{ℓ₁}(R) ⊗ D_{ℓ₂}(R)) (Q^A_{ℓ₁} ⊗ Q^B_{ℓ₂}). In general, D_{ℓ₁}(R) ⊗ D_{ℓ₂}(R) is not an irreducible representation. However, it does have a well studied decomposition into irreducibles, called the Clebsch–Gordan decomposition:

    D_{\ell_1}(R) \otimes D_{\ell_2}(R) = C_{\ell_1,\ell_2}^{\dagger}\Bigg[\bigoplus_{\ell=|\ell_1-\ell_2|}^{\ell_1+\ell_2} D_\ell(R)\Bigg]\,C_{\ell_1,\ell_2}.

Letting C_{ℓ₁,ℓ₂,ℓ} ∈ C^{(2ℓ+1)×(2ℓ₁+1)(2ℓ₂+1)} be the block of 2ℓ+1 rows in C_{ℓ₁,ℓ₂} corresponding to the ℓ component of the direct sum, we see that C_{ℓ₁,ℓ₂,ℓ}(Q^A_{ℓ₁} ⊗ Q^B_{ℓ₂}) is an ℓ'th order spherical tensor. In particular, given some other ℓ'th order spherical tensor quantity U_ℓ, the quantity U_ℓ · C_{ℓ₁,ℓ₂,ℓ}(Q^A_{ℓ₁} ⊗ Q^B_{ℓ₂}) is a scalar, and hence it is a candidate for being a term in the potential energy. Note the similarity of this expression to the bispectrum [Kakarala, 1992, Bendory et al., 2018], which is an already established tool in the force field learning literature [Bartók et al., 2013]. Almost any rotation invariant interaction potential can be expressed in terms of iterated Clebsch–Gordan products between spherical tensors.

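As a minimal worked example of Clebsch–Gordan coupling (our illustration, using SymPy's CG coefficients): coupling two first order spherical tensors down to ℓ = 0 recovers, up to the conventional factor -√3, the ordinary dot product, which is exactly the kind of scalar appearing in (2).

```python
import numpy as np
from sympy.physics.quantum.cg import CG

def to_spherical(v):
    """Cartesian 3-vector -> spherical basis components, ordered m = -1, 0, +1."""
    x, y, z = v
    return np.array([(x - 1j * y) / np.sqrt(2), z, -(x + 1j * y) / np.sqrt(2)])

def couple_to_scalar(u_sph, v_sph):
    """l1 = 1 x l2 = 1 -> l = 0 Clebsch-Gordan coupling of two spherical vectors."""
    s = 0j
    for m in (-1, 0, 1):
        c = float(CG(1, m, 1, -m, 0, 0).doit())   # <1 m; 1 -m | 0 0>
        s += c * u_sph[m + 1] * v_sph[-m + 1]
    return s

rng = np.random.default_rng(3)
u, v = rng.normal(size=3), rng.normal(size=3)
s = couple_to_scalar(to_spherical(u), to_spherical(v))

# The l = 0 coupling of two vectors is proportional to their dot product.
assert np.isclose(-np.sqrt(3) * s, u @ v)
```

The same pattern with ℓ > 0 output blocks is what produces the covariant (rather than invariant) intermediate quantities that Cormorant's neurons pass to one another.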
In particular, the full electrostatic energy between two sets of charges A and B separated by a vector r = (r, θ, φ), expressed in multipole form [Jackson, 1999], is

    V_{AB} = \frac{1}{4\pi\epsilon_0} \sum_{\ell=0}^{\infty}\sum_{\ell'=0}^{\infty} \sqrt{\binom{2\ell+2\ell'}{2\ell}}\,\sqrt{\frac{4\pi}{2\ell+2\ell'+1}}\; r^{-(\ell+\ell'+1)}\; Y_{\ell+\ell'}(\theta,\varphi) \cdot C_{\ell,\ell',\ell+\ell'}\big(Q^A_{\ell} \otimes Q^B_{\ell'}\big).    (6)

Note the generality of this formula: the ℓ = ℓ' = 1 case covers the dipole/dipole interaction (2), the ℓ = ℓ' = 2 case covers the quadrupole/quadrupole interaction (3), while the other terms cover every other possible type of multipole/multipole interaction. Magnetic and other types of interactions, including interactions that involve 3-way or higher order terms, can also be recovered from appropriate combinations of tensor products and Clebsch–Gordan decompositions. We emphasize that our discussion of electrostatics is only intended to illustrate the algebraic structure of interatomic interactions of any type, and is not restricted to electrostatics. In what follows, we will not explicitly specify what interactions the network will learn. Nevertheless, there are physical constraints on the interactions arising from symmetries, which we explicitly impose in our design of Cormorant.

4 Cormorant: COvaRiant MOleculaR Artificial Neural neTworks

The goal of using ML in molecular problems is not to encode known physical laws, but to provide a platform for learning interactions from data that cannot easily be captured in a simple formula. Nonetheless, the mathematical structure of known physical laws, like those discussed in the previous sections, gives strong hints about how to represent physical interactions in algorithms. In particular, when using machine learning to learn molecular potentials or similar rotation and translation invariant physical quantities, it is essential to make sure that the algorithm respects these invariances.

Our Cormorant neural network has invariance to rotations baked into its architecture in a way that is similar to the physical equations of the previous section: the internal activations are all spherical tensors, which are then combined at the top of the network in such a way as to guarantee that the final output is a scalar (i.e., is invariant).

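One elementary way to realize such an invariant readout (our illustration, not necessarily the exact readout used in Cormorant) is to take the norm of each channel of each isotypic part; unitarity of the Wigner matrices makes these norms invariant. In the sketch below, random unitaries stand in for the D_ℓ(R), since only unitarity matters for this check.

```python
import numpy as np

def invariant_readout(parts):
    """Collapse isotypic parts F_l (each of shape (2l+1, tau_l)) into scalars.
    ||F_l[:, c]|| is unchanged by F_l -> D_l(R) F_l since D_l(R) is unitary."""
    return np.concatenate([np.linalg.norm(F, axis=0) for F in parts])

def random_unitary(n, rng):
    """A random n x n unitary matrix via complex QR."""
    Q, R = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))
    return Q * (np.diag(R) / np.abs(np.diag(R))).conj()

rng = np.random.default_rng(5)
parts = [rng.normal(size=(2 * l + 1, 4)) + 1j * rng.normal(size=(2 * l + 1, 4))
         for l in range(3)]

# Stand-ins for the Wigner D-matrices D_0(R), D_1(R), D_2(R).
Ds = [random_unitary(2 * l + 1, rng) for l in range(3)]
rotated = [D @ F for D, F in zip(Ds, parts)]

assert np.allclose(invariant_readout(parts), invariant_readout(rotated))
```

Any invariant readout of this kind discards the directional information in the activations, which is why Cormorant keeps its activations covariant throughout and only collapses to scalars at the output.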
However, to allow the network to learn interactions that are more complicated than classical interatomic forces, we allow each neuron to output not just a single spherical tensor, but a combination of spherical tensors of different orders. We will call an object consisting of τ₀ scalar components, τ₁ components transforming as first order spherical tensors, τ₂ components transforming as second order spherical tensors, and so on, an SO(3)-covariant vector of type (τ₀, τ₁, τ₂, ...). The output of each neuron in Cormorant is an SO(3)-vector of a fixed type.

Definition 1. We say that F is an SO(3)-covariant vector of type τ = (τ₀, τ₁, τ₂, ..., τ_L) if it can be written as a collection of complex matrices F₀, F₁, ..., F_L, called its isotypic parts, where each F_ℓ is a matrix of size (2ℓ+1) × τ_ℓ and transforms under rotations as F_ℓ ↦ D_ℓ(R) F_ℓ.

The second important feature of our architecture is that each neuron corresponds to either a single atom or a set of atoms forming a physically meaningful subset of the system at hand, for example all atoms in a ball of a given radius. This condition helps encourage the network to learn physically meaningful and interpretable interactions. The high level definition of Cormorant nets is as follows.

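In code, an SO(3)-covariant vector in the sense of Definition 1 is naturally represented as a list of complex (2ℓ+1) × τ_ℓ arrays. A minimal sketch (ours; the class name and methods are illustrative and not taken from the Cormorant implementation):

```python
import numpy as np

class SO3Vec:
    """An SO(3)-covariant vector per Definition 1: a list of isotypic parts,
    where part l is a complex (2l+1) x tau_l matrix. Illustrative sketch only."""

    def __init__(self, parts):
        self.parts = parts                      # parts[l] has shape (2l+1, tau_l)
        for ell, F in enumerate(parts):
            assert F.shape[0] == 2 * ell + 1

    @property
    def type(self):
        """The type tau = (tau_0, tau_1, ...): channel counts per order."""
        return tuple(F.shape[1] for F in self.parts)

    def rotate(self, wigner_d):
        """Apply a rotation given its Wigner D-matrices [D_0, D_1, ...];
        each part transforms independently, as F_l -> D_l(R) F_l."""
        return SO3Vec([D @ F for D, F in zip(wigner_d, self.parts)])

# A type-(2, 1, 3) SO(3)-vector: 2 scalar channels, 1 l=1 channel, 3 l=2 channels.
rng = np.random.default_rng(4)
F = SO3Vec([rng.normal(size=(2 * l + 1, t)) + 0j
            for l, t in enumerate((2, 1, 3))])
assert F.type == (2, 1, 3)

# The identity rotation has D_l = I for every l, so nothing changes.
D_identity = [np.eye(2 * l + 1) for l in range(3)]
G = F.rotate(D_identity)
assert all(np.allclose(a, b) for a, b in zip(F.parts, G.parts))
```

Storing activations this way makes conditions C2 and C3 of the definition below easy to enforce: each layer simply declares the type of SO(3)-vector it produces.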
Definition 2. Let S be a molecule or other physical system consisting of N atoms. A "Cormorant" covariant molecular neural network for S is a feed forward neural network consisting of m neurons n₁, ..., n_m, such that:

C1. Every neuron n_i corresponds to some subset S_i of the atoms. In particular, each input neuron corresponds to a single atom. Each output neuron corresponds to the entire system S.
C2. The activation of each n_i is an SO(3)-vector of a fixed type τ_i.
C3. The type of each output neuron is τ_out = (1), i.e., a scalar.
