CycleFormer: A New Transformer Model for the Traveling Salesman Problem (TSP)

Since the Transformer was introduced in Natural Language Processing (NLP), numerous groundbreaking models, including ChatGPT, Bard, LLaMa, AlphaFold2, and Dall-E 2, have emerged across different domains. Attempts to solve combinatorial optimization problems such as the Traveling Salesman Problem (TSP) with deep learning have progressed from convolutional neural networks (CNNs) to recurrent neural networks (RNNs) and finally to transformer-based models. Given the coordinates of N cities (nodes, vertices, tokens), the TSP asks for the shortest Hamiltonian cycle, a closed tour that visits every node exactly once. The number of candidate tours grows factorially with the number of cities, making the TSP a representative NP-hard problem in computer science.
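To make the problem statement concrete, here is a minimal sketch of an exact solver that simply enumerates every tour. It is not part of CycleFormer; the function names and the four-city instance are illustrative assumptions, and the approach is only feasible for very small N, which is the reason heuristics and learned solvers exist.

```python
import itertools
import math


def tour_length(coords, tour):
    """Length of the closed tour, including the edge back to the start."""
    n = len(tour)
    return sum(
        math.dist(coords[tour[i]], coords[tour[(i + 1) % n]])
        for i in range(n)
    )


def brute_force_tsp(coords):
    """Exact solution by enumeration; the cost explodes as N grows."""
    n = len(coords)
    best_tour, best_len = None, math.inf
    # Fix city 0 as the start so rotations of one cycle are not recounted.
    for perm in itertools.permutations(range(1, n)):
        tour = (0,) + perm
        length = tour_length(coords, tour)
        if length < best_len:
            best_tour, best_len = tour, length
    return best_tour, best_len


def num_distinct_tours(n):
    """Distinct Hamiltonian cycles in a symmetric TSP with n >= 3 cities."""
    return math.factorial(n - 1) // 2


# Hypothetical instance: the four corners of a unit square, listed out of order.
square = [(0.0, 0.0), (1.0, 1.0), (1.0, 0.0), (0.0, 1.0)]
best_tour, best_len = brute_force_tsp(square)
print(best_tour, best_len)
print(num_distinct_tours(10))
```

The shortest tour of the square walks its perimeter, and even ten cities already admit 181,440 distinct tours.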

Several heuristics have been developed to cope with this. Heuristic algorithms fall into two main categories: iterative improvement algorithms and stochastic algorithms. Considerable effort has also gone into deep learning-based solvers, but they still cannot match the best heuristic algorithms. The performance of the Transformer matters nevertheless, because it is the engine of the solving pipeline. The situation is analogous to AlphaGo, which was not powerful enough on its own but beat the world's top professionals once it was combined with post-processing search techniques such as Monte Carlo Tree Search (MCTS). At the heart of the TSP is choosing the next city to visit given the cities already visited, and the Transformer, which uses attention mechanisms to discover relationships between nodes, is a good fit for this task. Because the Transformer was originally designed for language models, however, previous studies that applied it to the TSP domain ran into mismatches between the two settings.

One of the many differences between a language-domain Transformer and a TSP-domain Transformer is the meaning of tokens. In the language domain, words and subwords are the tokens. In the TSP domain, every node usually becomes a token. Unlike a vocabulary of words, the set of real-valued node coordinates is infinite, unpredictable, and has no inherent relationship between its members. Token indices and the spatial relationship between neighboring tokens are therefore useless in this setting. Duplication is another important difference. Unlike in the language domain, a TSP solution cannot decode the same city more than once, because the result would no longer be a Hamiltonian cycle. During TSP decoding, a visited mask is used to prevent repetition.
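The visited mask can be illustrated with a small decoding loop. This is a sketch under stated assumptions, not CycleFormer's decoder: the scores here are simply negative distances (a nearest-neighbor stand-in for the logits a trained model would produce), and the function name is hypothetical.

```python
import math


def greedy_decode_with_mask(coords, start=0):
    """Greedy decoding in which already-visited cities are masked out."""
    n = len(coords)
    visited = [False] * n
    visited[start] = True
    tour = [start]
    while len(tour) < n:
        current = tour[-1]
        # Stand-in scores (assumption): closer cities score higher.
        scores = [-math.dist(coords[current], coords[j]) for j in range(n)]
        # Visited mask: a score of -inf can never win the argmax.
        masked = [-math.inf if visited[j] else scores[j] for j in range(n)]
        nxt = max(range(n), key=lambda j: masked[j])
        tour.append(nxt)
        visited[nxt] = True
    return tour


# Hypothetical instance: five cities on a line.
cities = [(0.0, 0.0), (3.0, 0.0), (1.0, 0.0), (2.0, 0.0), (5.0, 0.0)]
tour = greedy_decode_with_mask(cities)
print(tour)
```

Without the mask, the current city would always score highest (its distance to itself is zero) and the decoder would repeat it forever, which is exactly the duplication the mask rules out.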

Researchers from Seoul National University present CycleFormer, a transformer-based TSP solver. The model combines the strengths of a language-model-style Transformer trained with supervised learning (SL) with the specific properties of the TSP. Current transformer-based TSP solvers are limited because they are trained with reinforcement learning (RL). This prevents them from fully exploiting the advantages of SL, such as faster training thanks to the visited mask and more stable convergence. Because the TSP is NP-hard, the optimal solutions that SL needs as training targets become impractical to compute as problem sizes grow. This limitation can be circumvented, however, if a Transformer trained on reasonably sized problems generalizes and scales to larger ones. For the time being, SL and RL will therefore coexist.

The team focuses exclusively on the symmetric TSP, in which the distance between any two points is the same in both directions. They substantially changed the original design so that the Transformer reflects the properties of the TSP. Because a TSP solution is cyclic, they designed the decoder-side positional encoding (PE) to be insensitive to rotation and flip. As a result, the starting node is strongly related to the nodes at the beginning and end of the tour but only weakly related to the nodes in the middle.
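One way to see how a positional encoding can be insensitive to rotation and flip is to give the sinusoids periods that divide the tour length, so position N wraps around to position 0. The sketch below is an assumption made for illustration and is not claimed to be the exact formula from the paper; `circular_pe`, `num_freqs`, and the choice of N = 10 are hypothetical.

```python
import math


def circular_pe(pos, n, num_freqs=2):
    """Sinusoidal features whose periods divide n, so positions wrap around."""
    feats = []
    for k in range(1, num_freqs + 1):
        angle = 2.0 * math.pi * k * pos / n
        feats.extend((math.sin(angle), math.cos(angle)))
    return feats


def similarity(a, b):
    """Dot product between two encodings."""
    return sum(x * y for x, y in zip(a, b))


n = 10
pe = [circular_pe(t, n) for t in range(n)]

sim_next = similarity(pe[0], pe[1])      # start vs. second position
sim_last = similarity(pe[0], pe[n - 1])  # start vs. last position
sim_mid = similarity(pe[0], pe[n // 2])  # start vs. middle of the tour
print(sim_next, sim_last, sim_mid)
```

In this construction the dot product depends only on the circular distance between two positions, so shifting the starting point or reversing the direction of the tour leaves all pairwise similarities unchanged. With these particular settings, the start is also more similar to its two cyclic neighbors than to the middle of the tour.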

On the encoder side, the researchers use the nodes' 2D coordinates for spatial positional encoding. The positional embeddings used by the encoder and the decoder are therefore completely different. The context embedding (memory) from the encoder's output serves as the input to the decoder. This strategy makes the most of the information the encoder has already learned by exploiting the fact that, in the TSP, the encoder and the decoder operate on the same set of tokens. The researchers also replace the final linear layer of the Transformer with a Dynamic Embedding, which is the graph's context encoding, that is, the encoder's output (memory).
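The idea of replacing a fixed output projection with the encoder's memory can be sketched as follows. This is an assumption-laden illustration, not the paper's implementation: it scores each city by the dot product between a decoder state and that city's context embedding, applies the visited mask, and normalizes with a softmax. The function name, the two-dimensional embeddings, and all values are hypothetical.

```python
import math


def next_city_probs(decoder_state, memory, visited):
    """Score each city against its encoder context embedding, then softmax."""
    if all(visited):
        raise ValueError("every city is already visited")
    logits = []
    for embedding, seen in zip(memory, visited):
        score = sum(h * e for h, e in zip(decoder_state, embedding))
        # Visited cities get -inf so their probability becomes exactly zero.
        logits.append(-math.inf if seen else score)
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]  # exp(-inf) is 0.0
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical encoder memory for three cities and one decoder state.
memory = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_state = [2.0, 1.0]
probs = next_city_probs(decoder_state, memory, visited=[True, False, False])
print(probs)
```

Because the output "vocabulary" is the set of cities in the current instance, the scoring weights change with every problem, which is what distinguishes this head from the fixed linear layer of a language model.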

CycleFormer differs markedly from the original Transformer in two ways: its use of positional and token embeddings, and its changes to the decoder input together with its use of the encoder's context vector in the decoder output. These changes suggest that transformer-based TSP solvers can improve further by adopting the performance strategies used in Large Language Models (LLMs), such as increasing the embedding dimension and the number of attention blocks. They also highlight both the remaining challenges and the room for future progress in this field.

According to extensive experimental results, these design choices allow CycleFormer to outperform transformer-based SOTA models on TSP-50, TSP-100, and TSP-500 while keeping the overall shape of the Transformer. The "optimality gap" measures the relative difference between the length of the tour found by the model and the length of the optimal tour. On TSP-500 with multi-start decoding, CycleFormer reduces the optimality gap from the existing SOTA's 3.09% to 1.10%, a roughly 2.8-fold improvement.
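For reference, the optimality gap and the fold improvement quoted above can be computed as below. The two gap percentages come from the article; the tour lengths passed to `optimality_gap` are hypothetical values chosen only to show the formula.

```python
def optimality_gap(model_length, optimal_length):
    """Relative excess of the model's tour over the optimal tour, in percent."""
    return 100.0 * (model_length - optimal_length) / optimal_length


# Hypothetical tour lengths, used only to demonstrate the calculation.
example_gap = optimality_gap(model_length=10.3, optimal_length=10.0)

# Gaps reported for TSP-500 with multi-start decoding.
previous_sota_gap = 3.09
cycleformer_gap = 1.10
fold_improvement = previous_sota_gap / cycleformer_gap

print(round(example_gap, 2), round(fold_improvement, 1))
```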

The proposed model, CycleFormer, has the potential to surpass SOTA alternatives such as Pointerformer. Because it adheres to the transformer architecture, it can incorporate additional LLM techniques, such as increasing the embedding dimension and stacking more attention blocks, to improve performance. As the problem size increases, inference speed-up methods developed for large language models, such as Retention and DeepSpeed, may prove advantageous. Although the researchers could not experiment on TSP-1000 due to resource constraints, they believe that, given enough optimal TSP-1000 solutions, CycleFormer could outperform existing models. They plan to incorporate MCTS as a post-processing step in future work to further improve CycleFormer's performance.

Check out the Paper. All credit for this research goes to the researchers of this project.

The post CycleFormer: A New Transformer Model for the Traveling Salesman Problem (TSP) appeared first on MarkTechPost.
