KAN: Kolmogorov–Arnold Networks
Abstract
Inspired by the Kolmogorov–Arnold representation theorem, we propose Kolmogorov–Arnold Networks (KANs) as promising alternatives to Multi-Layer Perceptrons (MLPs). While MLPs have fixed activation functions on nodes ("neurons"), KANs have learnable activation functions on edges ("weights"). KANs have no linear weights at all: every weight parameter is replaced by a univariate function parametrized as a spline. We show that this seemingly simple change makes KANs outperform MLPs in terms of accuracy and interpretability on small-scale AI + Science tasks. For accuracy, smaller KANs can achieve comparable or better accuracy than larger MLPs in function-fitting tasks. Theoretically and empirically, KANs possess faster neural scaling laws than MLPs. For interpretability, KANs can be intuitively visualized and can easily interact with human users. Through two examples in mathematics and physics, KANs are shown to be useful "collaborators" that help scientists (re)discover mathematical and physical laws. In summary, KANs are promising alternatives to MLPs, opening opportunities for further improving today's deep learning models, which rely heavily on MLPs.
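
To make the architectural difference concrete, the sketch below shows a minimal KAN-style layer in which every input-output edge carries its own learnable univariate function, whose outputs are summed at each output node. This is only an illustrative sketch of the idea described in the abstract, not the paper's reference implementation: the Gaussian radial-basis parametrization, the grid settings, and the class name KANEdgeLayer are assumptions standing in for the B-spline parametrization used by the authors.

# Minimal sketch of a KAN-style layer (assumed Gaussian RBF basis as a
# stand-in for the B-spline parametrization described in the abstract).
import torch
import torch.nn as nn


class KANEdgeLayer(nn.Module):
    """Each input-output edge carries its own learnable univariate function."""

    def __init__(self, in_features: int, out_features: int,
                 num_basis: int = 8, x_min: float = -2.0, x_max: float = 2.0):
        super().__init__()
        # Fixed grid of basis-function centers shared by all edges.
        centers = torch.linspace(x_min, x_max, num_basis)
        self.register_buffer("centers", centers)
        self.width = (x_max - x_min) / (num_basis - 1)
        # One coefficient vector per edge: shape (out, in, num_basis).
        self.coeffs = nn.Parameter(
            torch.randn(out_features, in_features, num_basis) * 0.1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_features)
        # Evaluate all basis functions at every input: (batch, in, num_basis).
        basis = torch.exp(-((x.unsqueeze(-1) - self.centers) / self.width) ** 2)
        # phi_{o,i}(x_i) = sum_k c_{o,i,k} * B_k(x_i); then sum over inputs i,
        # so each output node aggregates its incoming edge functions.
        return torch.einsum("bik,oik->bo", basis, self.coeffs)


if __name__ == "__main__":
    layer = KANEdgeLayer(in_features=4, out_features=3)
    y = layer(torch.randn(16, 4))
    print(y.shape)  # torch.Size([16, 3])

In contrast to an nn.Linear layer, which learns one scalar weight per edge, this layer learns num_basis coefficients per edge, so the per-edge transformation is a flexible curve rather than a fixed multiplication followed by a node-level nonlinearity.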