The Kolmogorov-Arnold Theorem Revisited: Why Averaging Functions Work Better

Kolmogorov-Arnold Networks (KANs) have emerged as a promising alternative to traditional Multi-Layer Perceptrons (MLPs). Inspired by the Kolmogorov-Arnold representation theorem, these networks utilize neurons that perform simple summation operations. However, the current implementation of KANs poses some challenges in practical applications. Currently, researchers are investigating the possibility of identifying alternative multivariate functions for KAN neurons that could offer enhanced practical utility across several benchmarks related to machine-learning tasks.

Research has highlighted the potential of KANs in various fields, like computer vision, time series analysis, and quantum architecture search. Some studies show that KANs can outperform MLPs in data fitting and PDE tasks while using fewer parameters. However, some research has raised concerns about the robustness of KANs to noise and their performance compared to MLPs. Variations and improvements to the standard KAN architecture are also explored, such as graph-based designs, convolutional KANs, and transformer-based KANs to solve the issues. Moreover, alternative activation functions like wavelets, radial basis functions, and sinusoidal functions are investigated to improve KAN efficiency. Despite these works, there is a need for further improvements to enhance KAN performance.

A Researcher from the Center for Applied Intelligent Systems Research at Halmstad University, Sweden, has proposed a novel approach to enhance the performance of Kolmogorov-Arnold Networks (KANs). This method aims to identify the optimal multivariate function for KAN neurons across various machine learning classification tasks. The traditional use of addition as the node-level function is often non-ideal, especially for high-dimensional datasets with multiple features. This can cause the inputs to exceed the effective range of subsequent activation functions, leading to training instability and reduced generalization performance. To solve this problem, the researcher suggests using the mean instead of the sum as the node function.Â

To evaluate the proposed KAN modifications, 10 popular datasets from the UCI Machine Learning Database Repository are utilized, covering multiple domains and varying sizes. These datasets are divided into training (60%), validation (20%), and testing (20%) partitions. A standardized preprocessing method is applied across all datasets, which includes categorical feature encoding, missing value imputation, and instance randomization. Models are trained for 2000 iterations using the Adam optimizer with a learning rate of 0.01 and a batch size of 32. Model accuracy on the testing set serves as the primary evaluation metric. The parameter count is managed by setting the grid to 3 and using default hyperparameters for the KAN models.

The results support the hypothesis that using the mean function in KAN neurons is more effective than the traditional sum function. This enhancement is due to the meanâ€™s ability to keep input values within the optimal range of the spline activation function, which is [-1.0, +1.0]. Standard KANs struggled to keep values within this range in intermediate layers as the number of features increased. However, adopting the mean function in neurons leads to enhanced performance, keeping values within the desired range across datasets with 20 or more features. For datasets with fewer features, values stayed within the range more than 99.0% of the time, except for the â€˜abaloneâ€™ dataset, which had a slightly lower adherence rate of 96.51%.

In this paper, a Researcher from the Center for Applied Intelligent Systems Research at Halmstad University, Sweden, has proposed a method to enhance the performance of KANs. An important modification to KANs is introduced in this paper by replacing the traditional summation in KAN neurons with an averaging function. Experimental results show that this change leads to more stable training processes and keeps inputs within the effective range of spline activations. This adjustment to KAN architecture solves previous challenges related to input range and training stability. In the future, this work offers a promising direction for future KAN implementations, potentially enhancing their performance and applicability in various machine-learning tasks.

Check out the Paper. All credit for this research goes to the researchers of this project. Also,Â donâ€™t forget to follow us onÂ Twitter and join ourÂ Telegram Channel andÂ LinkedIn Group. If you like our work, you will love ourÂ newsletter..

Donâ€™t Forget to join ourÂ 47k+ ML SubReddit

Find Upcoming AI Webinars here

Arcee AI Released DistillKit: An Open Source, Easy-to-Use Tool Transforming Model Distillation for Creating Efficient, High-Performance Small Language Models

The post The Kolmogorov-Arnold Theorem Revisited: Why Averaging Functions Work Better appeared first on MarkTechPost.

Source: Read MoreÂ

Sunshine And March Vibes (2025 Wallpapers Edition)

The Case For Minimal WordPress Setups: A Contrarian View On Theme Frameworks

How To Fix Largest Contentful Paint Issues With Subpart Analysis

How To Prevent WordPress SQL Injection Attacks

Microsoft has closed its “Experience Center” store in Sydney, Australia — as it ramps up a continued digital growth campaign

Bing Search APIs to be “decommissioned completely” as Microsoft urges developers to use its Azure agentic AI alternative

Microsoft might kill the Surface Laptop Studio as production is quietly halted

Minecraft licensing robbed us of this controversial NFL schedule release video

The power of generators

The power of generators

Simplify Factory Associations with Laravel’s UseFactory Attribute

This Week in Laravel: React Native, PhpStorm Junie, and more

Microsoft has closed its “Experience Center” store in Sydney, Australia — as it ramps up a continued digital growth campaign

Microsoft has closed its “Experience Center” store in Sydney, Australia — as it ramps up a continued digital growth campaign

Bing Search APIs to be “decommissioned completely” as Microsoft urges developers to use its Azure agentic AI alternative

Microsoft might kill the Surface Laptop Studio as production is quietly halted

The Kolmogorov-Arnold Theorem Revisited: Why Averaging Functions Work Better

Salesforce AI Releases BLIP3-o: A Fully Open-Source Unified Multimodal Model Built with CLIP Embeddings and Flow Matching for Image Understanding and Generation

Nmap 7.96 Launches with Lightning-Fast DNS and 612 Scripts

New Cybersecurity Concerns Emerge as Kamala Harris Presidential Campaign Targeted by Foreign Hackers

YTSubConverter – create styled YouTube subtitles

CVE-2025-42600 – Meon KYC Brute Force OTP Vulnerability

Electrifying Jackery deals slashes power station prices by over 40%!

This AI Paper Introduces PLAN-AND-ACT: A Modular Framework for Long-Horizon Planning in Web-Based Language Agents

OpenDocument Format (ODF) celebra il suo 20° anniversario!

Prepare for your iOS interview

McAfee’s new AI tool detects email and text scams before you fall for them

The Kolmogorov-Arnold Theorem Revisited: Why Averaging Functions Work Better

Related Posts