KAN: Kolmogorov–Arnold Networks

Ziming Liu

Abstract


Inspired by the Kolmogorov–Arnold representation theorem, we propose Kolmogorov–Arnold Networks (KANs) as promising alternatives to Multi-Layer Perceptrons (MLPs). While MLPs have fixed activation functions on nodes ("neurons"), KANs have learnable activation functions on edges ("weights"). KANs have no linear weights at all – every weight parameter is replaced by a univariate function parametrized as a spline. We show that this seemingly simple change makes KANs outperform MLPs in terms of accuracy and interpretability on small-scale AI + Science tasks. For accuracy, smaller KANs can achieve comparable or better accuracy than larger MLPs on function-fitting tasks. Theoretically and empirically, KANs possess faster neural scaling laws than MLPs. For interpretability, KANs can be intuitively visualized and can easily interact with human users. Through two examples in mathematics and physics, KANs are shown to be useful "collaborators" that help scientists (re)discover mathematical and physical laws. In summary, KANs are promising alternatives to MLPs, opening opportunities for further improving today's deep learning models, which rely heavily on MLPs.
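Because every KAN "weight" is a learnable univariate function placed on an edge, a single layer maps inputs x_1, ..., x_n to outputs y_j = sum_i phi_{j,i}(x_i). The sketch below illustrates that forward pass in Python, assuming degree-1 (piecewise-linear) B-spline bases on a fixed grid; the class name KANLayer, the parameter n_basis, and the initialization scheme are illustrative assumptions and do not reproduce the authors' pykan implementation, which uses higher-order splines plus a residual (SiLU) basis term.

    import numpy as np

    class KANLayer:
        """Minimal illustrative KAN layer: one learnable spline per edge (sketch only)."""

        def __init__(self, in_dim, out_dim, n_basis=8, rng=None):
            rng = np.random.default_rng(rng)
            # Fixed knot centers on [-1, 1]; real implementations adapt/extend the grid.
            self.grid = np.linspace(-1.0, 1.0, n_basis)
            # One coefficient vector per edge (input i -> output j): (out_dim, in_dim, n_basis).
            self.coef = 0.1 * rng.standard_normal((out_dim, in_dim, n_basis))

        def _basis(self, x):
            # Piecewise-linear "hat" basis values for each scalar input: (..., n_basis).
            h = self.grid[1] - self.grid[0]
            d = np.abs(x[..., None] - self.grid) / h
            return np.clip(1.0 - d, 0.0, None)

        def forward(self, x):
            # x: (batch, in_dim) -> (batch, out_dim)
            B = self._basis(x)                             # (batch, in_dim, n_basis)
            # Edge activation phi_{j,i}(x_i) = sum_k coef[j, i, k] * B_k(x_i).
            phi = np.einsum('bik,jik->bji', B, self.coef)  # (batch, out_dim, in_dim)
            # Each output node j simply sums its incoming edge activations.
            return phi.sum(axis=-1)

    layer = KANLayer(in_dim=2, out_dim=3)
    y = layer.forward(np.random.uniform(-1, 1, size=(4, 2)))
    print(y.shape)  # (4, 3)

Training such a layer amounts to fitting the spline coefficients (here, coef) by gradient descent; in the paper the spline grids can also be refined during training ("grid extension") to increase accuracy.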



