AI: post transformers
mcgrof
316 episodes
1 day ago
The transformer architecture revolutionized the world of neural networks. It was a springboard for what we know today as modern artificial intelligence. This podcast focuses on reviews of modern, state-of-the-art research papers, starting from the transformer and moving onward.
Technology
Episodes (20/316)
AI: post transformers
vAttention Vs Strata: advanced GPU memory management


We compare and contrast two advanced 2025 memory management and scheduling techniques for optimizing Large Language Model (LLM) serving throughput and latency:


vAttention vs. Strata


One core innovation discussed is **vAttention**, which improves upon the popular PagedAttention method by leveraging CUDA Virtual Memory Management (VMM) APIs to keep the KV cache virtually contiguous, thereby simplifying **attention kernel portability** and reducing performance overheads associated with non-contiguous memory access. The other major focus is **Strata**, a hierarchical context caching framework that boosts throughput by employing **GPU-assisted I/O and cache-aware scheduling** to efficiently manage and transfer KV cache data between CPU and GPU memory, specifically mitigating the "delay hit phenomenon" and allowing for on-the-fly data layout transformations. Both systems aim to resolve the efficiency challenges inherent in LLM inference, particularly during the resource-intensive prefill and decode phases, with Strata showing substantial throughput gains over existing hierarchical caching solutions. Ultimately, vAttention and Strata represent different, yet potentially complementary, approaches to addressing the **memory fragmentation and I/O bottlenecks** that limit LLM serving performance.
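
As a rough illustration of the contrast (not code from either paper), the sketch below shows the block-table indirection a PagedAttention-style kernel must perform versus indexing a virtually contiguous view of the same KV blocks; the block size, pool size, and shapes are invented.

```python
import numpy as np

# Hypothetical sizes, purely illustrative.
BLOCK, HEAD_DIM, SEQ_LEN = 16, 64, 40

# PagedAttention-style storage: KV lives in non-contiguous physical blocks,
# and the attention kernel must follow a block table to gather keys.
num_blocks = (SEQ_LEN + BLOCK - 1) // BLOCK
physical_pool = np.random.randn(128, BLOCK, HEAD_DIM)                 # shared pool of KV blocks
block_table = np.random.choice(128, size=num_blocks, replace=False)   # per-request mapping

def gather_keys_paged(pos):
    """Kernel-side indirection: logical position -> (block, offset) -> pool."""
    blk, off = divmod(pos, BLOCK)
    return physical_pool[block_table[blk], off]

# vAttention-style view: physical pages are mapped into one contiguous virtual
# range (via CUDA VMM in the real system), so an unmodified kernel can index
# the KV cache directly as a single array.
virtually_contiguous = physical_pool[block_table].reshape(-1, HEAD_DIM)[:SEQ_LEN]

def gather_keys_contiguous(pos):
    return virtually_contiguous[pos]

assert np.allclose(gather_keys_paged(37), gather_keys_contiguous(37))
```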


Sources:

January 29, 2025

vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention

https://arxiv.org/pdf/2405.04437


August 26, 2025

Strata: Hierarchical Context Caching for Long Context Language Model Serving

https://arxiv.org/html/2508.18572v1


1 day ago
35 minutes 4 seconds

AI: post transformers
AMD: Instella: Fully Open Language Models with Stellar Performance

The November 13, 2025 paper from AMD introduces **Instella**, a new family of **fully open-source** three-billion-parameter large language models (LLMs) trained on AMD Instinct MI300X GPUs. The central focus is on advancing transparency and reproducibility in LLMs by releasing not only the model weights but also the **complete training pipeline, datasets, and optimization details**. Instella achieves **state-of-the-art performance** among fully open models of its size, remaining competitive with leading open-weight counterparts despite using fewer pre-training tokens. The family includes specialized variants: **Instella-Long**, which supports a 128K token context length, and **Instella-Math**, a reasoning-centric model enhanced through specialized supervised fine-tuning and reinforcement learning. The document details the two-stage pre-training, post-training, and specific methods used to create the Long and Math versions, demonstrating that **openness does not compromise performance**.


Source:

https://arxiv.org/pdf/2511.10628

3 days ago
10 minutes 15 seconds

AI: post transformers
Mechanistic interpretability: Decoding the AI's Inner Logic: Circuits and Sparse Features

This episode draws on ten sources, excerpts from academic papers and technical reports on mechanistic interpretability and sparse autoencoders in language models (LLMs) and vision-language models (VLMs). It explores the state of the art in **Mechanistic Interpretability** (MI), focusing on how researchers are decomposing large language models and multimodal models (MLLMs) into understandable building blocks. A central theme is the power of **Sparse Autoencoders (SAEs)**, which address the issue of polysemanticity (where a single neuron represents many unrelated concepts) by training overcomplete bases to extract sparse, **monosemantic features**. The episode details the successful scaling of SAEs to production models like Claude 3 Sonnet and Claude 3.5 Haiku, demonstrating that these techniques reveal features that are often abstract, multilingual, and able to generalize across modalities (from text to images). Listeners learn how advanced techniques like **Specialized SAEs (SSAEs)** are developed using dense retrieval to target and interpret rare or domain-specific "dark matter" concepts, such as specialized physics knowledge or toxicity patterns, that are often missed by general methods. The fundamental goal is to establish a linear representation of concepts that facilitates precise understanding and, crucially, manipulation of model internals.
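
A minimal sparse-autoencoder sketch in the spirit of the dictionary-learning work discussed above; the width, ReLU encoder, and L1 sparsity penalty are standard choices used here for illustration, not the exact recipe of any one paper.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Overcomplete autoencoder trained to reconstruct residual-stream
    activations, with an L1 penalty that pushes features toward sparse,
    more monosemantic directions."""
    def __init__(self, d_model=512, n_features=4096):
        super().__init__()
        self.enc = nn.Linear(d_model, n_features)
        self.dec = nn.Linear(n_features, d_model)

    def forward(self, acts):
        feats = torch.relu(self.enc(acts))   # sparse feature activations
        recon = self.dec(feats)              # reconstruction of the activations
        return recon, feats

sae = SparseAutoencoder()
acts = torch.randn(32, 512)                  # stand-in for captured model activations
recon, feats = sae(acts)
loss = ((recon - acts) ** 2).mean() + 1e-3 * feats.abs().mean()  # reconstruction + sparsity
loss.backward()
```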


The second half of the episode dives into the application of these features to trace computational pathways, or **circuits**, using tools like **attribution graphs** and causal interventions. We explore concrete discoveries regarding LLM reasoning, such as identifying the modular circuit components—like queried-rule locating, fact-processing, and decision heads—that execute propositional logic and multi-step reasoning. We review how these mechanistic insights enable **precise control**, such as editing a model's diagnostic hypothesis (e.g., in medical scenarios) or circumventing refusal behaviors (jailbreaks) by overriding harmful request features. We cover cutting-edge intervention methods like **Attenuation via Posterior Probabilities (APP)**, which leverages the improved separation of concepts achieved by SAEs to perform highly effective and minimally disruptive concept erasure.


Sources:

1. 2025, Carnegie Mellon University: https://aclanthology.org/2025.findings-naacl.87.pdf (Source for Specialized Sparse Autoencoders)

2. 2025, OpenAI: (Implicit Source: PDF for the paper titled "Weight-sparse transformers have interpretable circuits," attributed to an OpenAI author)

3. 2024, Anthropic: (Implied Source URL for the work "Scaling monosemanticity: Extracting interpretable features from claude 3 sonnet," published May 21, 2024)

4. 2024, Anthropic: The claude 3 model family: Opus, sonnet, haiku (URL/document cited in circuit analysis work)

5. 2024, Gemma Team: https://arxiv.org/abs/2408.00118 (Gemma 2: Improving open language models at a practical size)

6. 2024, OpenAI: https://openai.com/index/learning-to-reason-with-llms/ (Learning to reason with LLMs)

7. 2023, Transformer Circuits Thread: https://transformer-circuits.pub/2023/monosemantic-features/index.html (Towards Monosemanticity: Decomposing Language Models With Dictionary Learning)

8. 2022, AI Alignment Forum: https://www.alignmentforum.org/posts/JvZhhzycHu2Yd57RN/causal-scrubbing-a-method-for-rigorously-testing (Causal scrubbing)

9. 2022, Transformer Circuits Thread: https://transformer-circuits.pub/2022/solu/index.html (Softmax Linear Units)

10. 2021, Transformer Circuits Thread: https://transformer-circuits.pub/2021/framework/index.html (A mathematical framework for transformer circuits)

4 days ago
29 minutes 19 seconds

AI: post transformers
Spectral Gap: Analysis of Attention Layers and Graph Transformers

We review two papers on the spectral gap, one from 2021 and one from 2025. The first source presents the **Spectral Attention Network (SAN)**, a novel Transformer-based architecture for graph neural networks that addresses the difficulty of defining positional encodings in graphs by leveraging the **full Laplacian spectrum** to learn node positions. This approach, which involves a **Learned Positional Encoding (LPE)**, enables the fully-connected Transformer to overcome limitations of traditional Graph Neural Networks (GNNs) like **over-squashing** and achieves competitive or superior performance on standard benchmarks. The second source analyzes the **stability and signal propagation** in standard softmax-based attention layers of Transformers at initialization, identifying that a **spectral gap** in the attention matrix causes **rank collapse** both in the width and depth of the network, which hinders effective information flow and leads to **exploding gradients**. To remedy this, the authors propose a **simple modification** that removes the dominant outlier eigenvalue, demonstrating that this fix significantly **mitigates rank collapse** and stabilizes gradient growth in deep Transformer models. Both sources focus on **improving the theoretical foundations and performance** of attention mechanisms, with the first applying Transformers to graphs using spectral theory and the second addressing intrinsic instability issues in the core Transformer architecture.
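
A small sketch of the spectral ingredient both papers rely on: the eigenvalues and eigenvectors of the graph Laplacian, from which SAN builds its learned positional encoding and in which the rank-collapse analysis locates the spectral gap. The toy graph and the number of eigenvectors kept are illustrative.

```python
import numpy as np

# Tiny illustrative graph (a 4-cycle) given by its adjacency matrix.
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)

D = np.diag(A.sum(axis=1))
L = D - A                               # combinatorial graph Laplacian

eigvals, eigvecs = np.linalg.eigh(L)    # full spectrum, as SAN uses
spectral_gap = eigvals[1] - eigvals[0]  # gap above the trivial eigenvalue

# Each node's raw positional encoding: its entries in the first k non-trivial
# eigenvectors paired with the corresponding eigenvalues (SAN then learns a
# small transformer over these pairs; here we just stack them).
k = 2
pos_enc = np.stack([eigvecs[:, 1:1 + k],
                    np.tile(eigvals[1:1 + k], (A.shape[0], 1))], axis=-1)
print(spectral_gap, pos_enc.shape)      # -> gap, (4, 2, 2)
```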


Sources:


October 27, 2021:

Rethinking Graph Transformers with Spectral Attention

https://arxiv.org/pdf/2106.03893


June 16, 2025:

Mind the Gap: a Spectral Analysis of Rank Collapse and Signal Propagation in Attention Layers

https://arxiv.org/pdf/2410.07799

1 week ago
14 minutes 59 seconds

AI: post transformers
CARTRIDGE: Efficient In-Context Learning via Distillation

The June 13, 2025 paper, a collaboration between Stanford University, Caltech, and the University at Buffalo, introduces a novel method called **CARTRIDGE** for efficiently handling long text corpora in large language models, addressing the high memory cost associated with standard **In-Context Learning (ICL)** and its required **KV cache**. A CARTRIDGE is a smaller, trained KV cache representation of a corpus, which is created offline using a technique termed **SELF-STUDY**. This training process involves generating synthetic conversational data about the corpus and employing a **context-distillation objective** to ensure the CARTRIDGE maintains the generality and structural awareness of ICL while dramatically reducing memory consumption (up to 38.6x less) and increasing throughput. The research demonstrates that CARTRIDGES can match or exceed ICL performance, enable **context length extrapolation** beyond the model's native window, and even be **composed** together at inference time. The paper also includes detailed ablation studies on the SELF-STUDY components and theoretical analysis contrasting this gradient-descent approach with other memory methods like linear attention on synthetic memory tasks.
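
A stylized sketch of the CARTRIDGE idea under simplifying assumptions: a trainable key/value prefix plays the role of a pre-filled KV cache, and it is optimized with a distillation loss toward a teacher that saw the real corpus in context. The shapes, single attention block, and stand-in teacher logits are invented for illustration.

```python
import torch
import torch.nn.functional as F

# Illustrative sizes; a real CARTRIDGE holds one trained (K, V) prefix per layer.
n_heads, prefix_len, seq_len, head_dim, vocab = 4, 8, 5, 16, 100

# The "cartridge": trainable key/value prefixes standing in for a long corpus.
prefix_k = torch.nn.Parameter(torch.randn(n_heads, prefix_len, head_dim) * 0.02)
prefix_v = torch.nn.Parameter(torch.randn(n_heads, prefix_len, head_dim) * 0.02)
out_head = torch.nn.Linear(n_heads * head_dim, vocab)
out_head.requires_grad_(False)               # the base model stays frozen

def student_forward(q, k, v):
    """Attention with the trained prefix prepended, as if the KV cache had
    already been filled by reading the corpus at inference time."""
    k = torch.cat([prefix_k, k], dim=1)
    v = torch.cat([prefix_v, v], dim=1)
    w = torch.softmax(q @ k.transpose(-1, -2) / head_dim ** 0.5, dim=-1)
    ctx = (w @ v).transpose(0, 1).reshape(seq_len, -1)   # (seq, heads * dim)
    return out_head(ctx)

# One SELF-STUDY-style step: distill toward the teacher that saw the real
# corpus in its context window (teacher logits here are random stand-ins).
q = k = v = torch.randn(n_heads, seq_len, head_dim)
teacher_logits = torch.randn(seq_len, vocab)             # model(corpus + synthetic query)
loss = F.kl_div(F.log_softmax(student_forward(q, k, v), dim=-1),
                F.log_softmax(teacher_logits, dim=-1),
                log_target=True, reduction="batchmean")
loss.backward()                                          # gradients flow only into the cartridge
```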


Source: June 13, 2025

https://arxiv.org/pdf/2506.06266

1 week ago
18 minutes 40 seconds

AI: post transformers
Metacognition and Skill Discovery in LLM Math Reasoning

The May 20, 2024 academic paper explores the **metacognitive capabilities of Large Language Models (LLMs)**, specifically focusing on mathematical problem-solving. The core approach involves developing a method for a powerful LLM, such as GPT-4, to **identify and label mathematical questions with specific skills**, which are then organized into broader, interpretable categories. This process creates a **Skill Exemplar Repository** containing skill names matched with question-answer pairs. Experiments validate that providing an LLM with these skill labels and associated examples as in-context prompts **significantly improves accuracy** on challenging math datasets like MATH and GSM8K, outperforming baseline prompting techniques like Chain-of-Thought. Furthermore, the **skill knowledge transferred effectively** to other, less powerful LLMs and different math datasets, demonstrating the utility of this LLM-generated metacognitive framework.
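
A toy sketch of the prompting pattern described above: retrieve exemplars that share the question's skill label from the repository and prepend them as in-context examples. The repository contents, skill names, and lookup are invented for illustration.

```python
# Toy skill exemplar repository: skill name -> worked (question, answer) pairs.
skill_repository = {
    "multiplication_and_area": [
        ("A rug is 3 m by 4 m. What is its area?", "3 * 4 = 12, so 12 square meters."),
    ],
    "linear_equations": [
        ("Solve 2x + 6 = 10.", "2x = 4, so x = 2."),
    ],
}

def build_skill_prompt(question, skill, k=1):
    """Assemble a prompt from skill-matched exemplars plus the new question."""
    exemplars = skill_repository[skill][:k]
    shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in exemplars)
    return f"Relevant skill: {skill}\n\n{shots}\n\nQ: {question}\nA:"

print(build_skill_prompt("A garden is 5 m by 7 m. What is its area?",
                         skill="multiplication_and_area"))
```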


Source:

https://arxiv.org/pdf/2405.12205

1 week ago
10 minutes 59 seconds

AI: post transformers
Context Distillation for Language Models

These five papers, published between 2022 and 2025, discuss various **knowledge distillation techniques** aimed at transferring the capabilities of large language models (LLMs) to smaller, more efficient models, often without the need for explicit context during inference. One paper introduces **Contextualization Distillation** (CD) for Knowledge Graph Completion (KGC), demonstrating that utilizing LLMs like PaLM2 to generate descriptive context for triplets significantly enhances the performance of smaller, specialized KGC models, often outperforming direct use of LLMs for the task. Another source proposes **Context Distillation** as a general method for language models to internalize abstract instructions, step-by-step reasoning (scratch-pads), and concrete examples, effectively eliminating the need for lengthy prompts and improving inference efficiency. The third document details **In-context Learning Distillation**, a framework that combines in-context learning objectives with traditional language modeling to effectively transfer few-shot learning abilities from large to smaller models under different tuning paradigms. Finally, **Generative Prompt Internalization** (GenPI) is presented as a method to fully embed long, complex prompts into a smaller model by training it to generate the prompt content and the reasoning for its corresponding behavior, greatly increasing efficiency in agent-based applications.
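
A minimal sketch of a context-distillation step as described above, assuming a model that maps token ids to logits: the teacher pass sees the full prompt (instructions, scratchpad, examples) while the student pass sees only the bare input, and a KL term pulls the student's next-token distribution toward the teacher's. The stand-in "model" is a toy embedding lookup, not a real LLM.

```python
import torch
import torch.nn.functional as F

def context_distillation_loss(model, ids_with_context, ids_plain):
    """One context-distillation step: the same model acts as teacher when it
    sees the full context and as student when it sees only the bare input."""
    with torch.no_grad():
        teacher_logits = model(ids_with_context)[:, -1]   # prompt + input
    student_logits = model(ids_plain)[:, -1]              # input only
    return F.kl_div(F.log_softmax(student_logits, dim=-1),
                    F.log_softmax(teacher_logits, dim=-1),
                    log_target=True, reduction="batchmean")

# Illustrative usage with a stand-in "model" (an embedding acting as a logit table).
vocab = 50
emb = torch.nn.Embedding(vocab, vocab)
model = lambda ids: emb(ids)                 # (batch, seq, vocab) logits
ctx_ids = torch.randint(0, vocab, (2, 12))   # instructions + input
plain_ids = ctx_ids[:, -4:]                  # bare input
context_distillation_loss(model, ctx_ids, plain_ids).backward()
```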


2022: Learning by Distilling Context

https://arxiv.org/pdf/2209.15189


2022: In-context Learning Distillation: Transferring Few-shot Learning Ability of Pre-trained Language Models

https://arxiv.org/pdf/2212.10670


2024: Contextualization Distillation from Large Language Model for Knowledge Graph Completion

https://aclanthology.org/2024.findings-eacl.32.pdf


May 12, 2025: Efficient LLM Context Distillation

https://arxiv.org/pdf/2409.01930


March 25, 2025: Generative Prompt Internalization

https://arxiv.org/pdf/2411.15927

1 week ago
33 minutes 23 seconds

AI: post transformers
Tempo: SLO-Aware LLM Serving Maximizing Service Gain

The April 24, 2025 academic paper introduces **Tempo**, a novel scheduling system designed to optimize Large Language Model (LLM) serving by addressing the wide variety of Service Level Objectives (**SLOs**) in modern LLM applications. The authors categorize requests into three types—**latency-sensitive**, **throughput-intensive**, and **collective requests**—each with distinct performance requirements that existing schedulers fail to manage effectively. Tempo maximizes "service gain" by allocating just enough serving bandwidth to meet each request’s SLO, utilizing a **hybrid scheduling strategy** that relies on lightweight prediction models for conservative initial estimates of response length and **dependency-graph matching** for complex workflows. Evaluations demonstrate that Tempo significantly outperforms state-of-the-art systems in terms of both service gain and **SLO goodput** across diverse workloads and models.
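
A toy illustration of the scheduling intuition, not Tempo's actual algorithm: each request gets a conservative service-time estimate from a predicted output length, and the queue is ordered by remaining SLO slack, so latency-sensitive requests are served just in time while throughput-intensive ones absorb the leftover bandwidth. Names, rates, and deadlines are invented.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class Request:
    slack: float                       # time to deadline minus predicted service time
    rid: str = field(compare=False)
    predicted_tokens: int = field(compare=False)

def enqueue(queue, rid, deadline_s, predicted_tokens, tokens_per_s=100.0, now=0.0):
    """Conservative slack estimate from a lightweight response-length prediction."""
    service = predicted_tokens / tokens_per_s
    heapq.heappush(queue, Request(slack=(deadline_s - now) - service,
                                  rid=rid, predicted_tokens=predicted_tokens))

queue = []
enqueue(queue, "chat-latency-sensitive", deadline_s=0.5, predicted_tokens=30)
enqueue(queue, "batch-throughput",       deadline_s=60.0, predicted_tokens=800)
enqueue(queue, "agent-collective-step",  deadline_s=5.0, predicted_tokens=200)

while queue:                           # serve in order of least slack first
    print(heapq.heappop(queue).rid)
```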


Source:

April 24, 2025

Tempo: Application-aware LLM Serving with Mixed SLO Requirements

https://arxiv.org/pdf/2504.20068

1 week ago
14 minutes 52 seconds

AI: post transformers
LLM-AutoDiff: Auto-Differentiate Any LLM Workflow

The January 30, 2025 paper introduces **LLM-AutoDiff**, a novel framework for **Automatic Prompt Engineering (APE)** that allows for the optimization of complex Large Language Model (LLM) workflows. This framework models an entire LLM application—including multiple LLM calls, functional components like retrievers, and cyclical operations—as a **directed, auto-differentiable graph**. By treating textual inputs as trainable parameters, LLM-AutoDiff uses a separate "backward engine" LLM to generate **textual gradients** (feedback) that guide an optimizer LLM to revise prompts, effectively automating the manual and labor-intensive process of prompt engineering. The paper details several technical advances, such as **pass-through gradients for functional nodes** and **time-sequential gradients for cyclic structures**, to ensure accurate error attribution across multi-component pipelines, ultimately demonstrating improved accuracy and efficiency over existing textual gradient and few-shot baselines.
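
A schematic of the optimization loop described above, with a placeholder `call_llm` standing in for the forward model, the backward-engine LLM, and the optimizer LLM; the prompt wording is invented, and this single-node sketch omits the pass-through and time-sequential gradient machinery for multi-component graphs.

```python
def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM call (forward model, backward engine, or
    optimizer model, depending on the prompt). Wire this to an actual API."""
    raise NotImplementedError

def autodiff_step(system_prompt: str, example: dict) -> str:
    # Forward pass through the (single-node) graph.
    output = call_llm(f"{system_prompt}\n\nInput: {example['input']}")
    # "Backward" pass: the backward-engine LLM turns the error signal into a
    # textual gradient attributed to the trainable prompt.
    feedback = call_llm(
        "The model produced:\n" + output +
        "\nThe expected answer was:\n" + example["target"] +
        "\nExplain how the system prompt should change to fix this."
    )
    # Optimizer step: another LLM proposes a revised prompt from the feedback.
    return call_llm(
        "Current system prompt:\n" + system_prompt +
        "\nFeedback (textual gradient):\n" + feedback +
        "\nRewrite the system prompt accordingly."
    )
```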


Source:

January 30, 2025

LLM-AutoDiff: Auto-Differentiate Any LLM Workflow

https://arxiv.org/pdf/2501.16673

1 week ago
14 minutes 26 seconds

AI: post transformers
Confucius: Intent-Driven Network Management with Multi-Agent LLMs

The August 27, 2025 paper introduces **Confucius**, a novel multi-agent Large Language Model (LLM) framework developed by Meta for **intent-driven network management** in hyper-scale environments. The framework models complex management tasks as **directed acyclic graphs (DAGs)** and integrates LLMs with existing tools using domain-specific languages (**DSLs**) to enhance planning and execution. Confucius leverages **Retrieval-Augmented Generation (RAG)** for long-term memory and employs specialized primitives like **Translator**, **Selector**, and **Collector** to improve translation accuracy and systematic validation for critical network operations. Successfully deployed for two years with over 60 onboarded applications, the system aims to significantly reduce manual engineering effort for tasks such as capacity planning and fault diagnosis while maintaining high accuracy.


Source:

https://dl.acm.org/doi/10.1145/3718958.3750537

1 week ago
19 minutes 58 seconds

AI: post transformers
SYMPHONY: Memory Management for LLM Multi-Turn Inference

The 2024 paper introduces **SYMPHONY**, a novel system designed to improve memory management and scheduling for **Large Language Model (LLM) inference workloads**, particularly those involving multi-turn interactions like chatbots and AI agents. The authors, researchers from the University of Texas at Austin and the University of Wisconsin-Madison, explain that existing LLM serving engines either waste computation by **recomputing Key-Value (K,V) caches** or suffer from **load imbalance** by offloading caches to host memory, creating stateful workloads. SYMPHONY addresses these issues by using "advisory requests"—signals indicating the likely arrival of a new request—to **proactively migrate K,V caches** off the critical serving path, thereby enabling fine-grained scheduling and load balancing. Evaluation results demonstrate that SYMPHONY significantly reduces latency and can handle **over eight times the number of requests** compared to state-of-the-art baselines.


Source:

December 21, 2024

SYMPHONY: Improving Memory Management for LLM Inference Workloads

https://arxiv.org/pdf/2412.16434

1 week ago
16 minutes 45 seconds

AI: post transformers
DSPy and TextGrad: Compiling Language Model Systems

These two academic papers introduce novel programming models aimed at systematically optimizing complex AI systems, particularly those built using Large Language Models (LLMs). The first source presents **DSPy**, a framework that abstracts traditional, hard-coded LLM pipelines into parameterized, declarative modules that can be automatically optimized using a compiler and **teleprompters**, demonstrating superior performance compared to hand-crafted prompts on tasks like math word problems. The second source introduces **TEXTGRAD**, a general optimization framework that utilizes LLMs to generate and propagate **natural language gradients**—textual feedback—through computation graphs, applying this "textual differentiation" approach successfully across diverse domains, including prompt optimization, code refinement, and scientific applications like molecular and medical treatment plan design. Both works highlight the shift from relying on expert prompt engineering to employing systematic, programmatic optimization techniques for compound AI systems.


Sources:

October 5, 2023

DSPY: COMPILING DECLARATIVE LANGUAGE MODEL CALLS INTO SELF-IMPROVING PIPELINES

https://arxiv.org/pdf/2310.03714


June 11, 2024

TextGrad: Automatic “Differentiation” via Text

https://arxiv.org/pdf/2406.07496

1 week ago
15 minutes 54 seconds

AI: post transformers
Vidur: Simulation for Efficient LLM Inference Deployment

The May 21, 2024 paper introduces **Vidur**, a new, high-fidelity simulation framework designed to optimize the deployment and performance of Large Language Model (LLM) inference. The authors explain that experimentally optimizing LLM deployment is **prohibitively expensive**, requiring exploration of a vast configuration space of system parameters like parallelization strategies and batching techniques, which can cost hundreds of thousands of dollars and thousands of GPU hours. Vidur addresses this by using **predictive modeling and experimental profiling** of LLM operators to estimate end-to-end performance metrics, achieving less than 9% error in latency estimation. Complementing the simulator is **Vidur-Search**, a configuration search tool that leverages Vidur to automatically identify the most cost-effective deployment settings that meet application performance constraints, reducing optimization time from months of GPU time to approximately one hour on a CPU machine. The research emphasizes that the **optimal configuration depends on both the LLM and the specific workload trace**, justifying the need for a rapid simulation tool like Vidur.
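
To give a flavor of the simulation approach (with made-up numbers and a deliberately crude cost model, not Vidur's predictors), the sketch below estimates request latency from profiled per-operator times and a parallelism factor, which is already enough to rank candidate configurations without running them on GPUs.

```python
# Profiled per-token operator costs in ms (illustrative numbers only).
profiled_ms = {"attention_prefill": 0.020, "mlp_prefill": 0.015,
               "attention_decode": 0.012, "mlp_decode": 0.010}

def predict_request_latency(prompt_tokens, output_tokens, n_layers=32,
                            tensor_parallel=4, comm_overhead=1.10):
    """Estimate latency by summing profiled operator costs across layers,
    scaling compute by tensor parallelism and adding a communication factor."""
    prefill = prompt_tokens * n_layers * (
        profiled_ms["attention_prefill"] + profiled_ms["mlp_prefill"])
    decode = output_tokens * n_layers * (
        profiled_ms["attention_decode"] + profiled_ms["mlp_decode"])
    return (prefill + decode) / tensor_parallel * comm_overhead

# Compare two deployment configurations without touching a GPU.
for tp in (2, 4):
    print(tp, round(predict_request_latency(1024, 256, tensor_parallel=tp), 1), "ms")
```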


Source:

May 21, 2024

VIDUR: A LARGE-SCALE SIMULATION FRAMEWORK FOR LLM INFERENCE

https://arxiv.org/pdf/2405.05465

1 week ago
18 minutes 45 seconds

AI: post transformers
Continuous Autoregressive Language Models: CALM

The October 31, 2025 paper introduces **Continuous Autoregressive Language Models (CALM)**, a new paradigm designed to overcome the efficiency bottleneck of traditional Large Language Models (LLMs) by shifting from discrete token-by-token prediction to **continuous next-vector prediction**. This approach compresses a chunk of multiple tokens into a single continuous vector using a **high-fidelity autoencoder**, thereby reducing the number of generative steps and significantly improving the performance-compute trade-off. To manage the challenges of operating in this continuous, likelihood-free domain, the framework includes a comprehensive toolkit: an **energy loss function** for training, a novel, sample-based evaluation metric called **BrierLM**, and **likelihood-free algorithms for temperature sampling**. Ultimately, the CALM framework establishes **semantic bandwidth** as a powerful new axis for scaling language models, enabling superior efficiency compared to discrete baselines.
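
A rough sketch of next-vector prediction under heavy simplifications: an autoencoder maps each K-token chunk to one continuous vector, an autoregressive model predicts the next vector, and a reconstruction loss keeps the vectors decodable back into tokens. MSE stands in for the paper's likelihood-free energy loss, and BrierLM and temperature sampling are omitted; all sizes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

K, d_model, vocab = 4, 64, 1000          # chunk size and dimensions are illustrative

embed = nn.Embedding(vocab, d_model)
encoder = nn.Linear(K * d_model, d_model)                # chunk of K tokens -> one vector
decoder = nn.Linear(d_model, K * vocab)                  # vector -> K token distributions
predictor = nn.GRU(d_model, d_model, batch_first=True)   # autoregressive over chunk vectors

tokens = torch.randint(0, vocab, (1, 16))                # 16 tokens = 4 chunks
chunks = embed(tokens).reshape(1, -1, K * d_model)       # (1, 4, K*d_model)
vectors = encoder(chunks)                                # (1, 4, d_model)

# Train the predictor to output the next chunk vector (MSE is a stand-in for
# the paper's energy objective).
pred, _ = predictor(vectors[:, :-1])
loss_nextvec = ((pred - vectors[:, 1:]) ** 2).mean()

# Autoencoder reconstruction keeps the vectors decodable back into tokens.
logits = decoder(vectors).reshape(1, -1, vocab)          # (1, 16, vocab)
loss_recon = F.cross_entropy(logits.flatten(0, 1), tokens.flatten())
(loss_nextvec + loss_recon).backward()
```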


Source:

October 31, 2025

CONTINUOUS AUTOREGRESSIVE LANGUAGE MODELS

https://arxiv.org/pdf/2510.27688

1 week ago
18 minutes 10 seconds

AI: post transformers
A Framework for LLM Application Safety Evaluation

The July 13, 2025 paper "Measuring What Matters: A Framework for Evaluating Safety Risks in Real-World LLM Applications" introduces a practical **framework for evaluating safety risks** in real-world Large Language Model (LLM) applications, arguing that current methods focusing only on foundation models are inadequate. This framework consists of two main parts: **principles for developing customized safety risk taxonomies** and **practices for evaluating these risks** within the application itself, which often includes components like system prompts and guardrails. It emphasizes the need for organizations to **contextualize general risks** and create taxonomies that are practical and specific to their operational context, as demonstrated by a case study from a government agency. The document then outlines a **safety testing pipeline** that involves curating meaningful and diverse adversarial prompts, running automated black-box tests, and evaluating model responses, particularly focusing on the use of refusals as a measure of safety.


Source:

July 13, 2025

Measuring What Matters: A Framework for Evaluating Safety Risks in Real-World LLM Applications

https://arxiv.org/pdf/2507.09820

1 week ago
15 minutes 52 seconds

AI: post transformers
Doubly Stochastic Attention for Transformers

The four papers we review, dating from 1967 to 2025 (two of them from 2025), collectively discuss the mathematical properties and deep learning applications of **doubly stochastic matrices**, which are nonnegative matrices whose rows and columns sum to one. One paper, "Concerning Nonnegative Matrices and Doubly Stochastic Matrices," provides the **foundational mathematical theory** regarding the convergence of iterative row and column scaling (known as the Sinkhorn algorithm) to a unique doubly stochastic matrix, contingent on the original matrix having "total support." The other papers focus on **Transformer architecture enhancements**, proposing "Sinkformers" and "Sparse Sinkhorn Attention" as variants that replace the standard row-wise SoftMax attention with the Sinkhorn algorithm to enforce **doubly stochastic attention matrices** for improved performance and theoretical properties, such as a connection to the Wasserstein metric. Furthermore, the "Gradient Multi-Normalization" paper introduces a **stateless optimizer** that uses a multi-normalization procedure, including a "Square-Root Sinkhorn" variant, demonstrating its efficacy and efficiency in training large language models.
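
The Sinkhorn iteration at the heart of these papers, in a few lines: alternately normalize the rows and columns of a positive matrix until it is approximately doubly stochastic. In a Sinkformer this replaces the single row-wise softmax applied to attention scores; the matrix size and iteration count below are illustrative.

```python
import numpy as np

def sinkhorn(scores, n_iters=50):
    """Alternate row/column normalization of exp(scores); converges to a
    doubly stochastic matrix when the matrix has total support."""
    M = np.exp(scores)
    for _ in range(n_iters):
        M = M / M.sum(axis=1, keepdims=True)   # rows sum to 1
        M = M / M.sum(axis=0, keepdims=True)   # columns sum to 1
    return M

attn_scores = np.random.randn(5, 5)            # e.g. Q @ K.T / sqrt(d)
P = sinkhorn(attn_scores)
print(P.sum(axis=1), P.sum(axis=0))            # both approximately all ones
```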


Sources:

1967:

CONCERNING NONNEGATIVE MATRICES AND DOUBLY STOCHASTIC MATRICES

https://projecteuclid.org/journalArticle/Download?urlId=pjm%2F1102992505


June 24, 2022:

Sinkformers: Transformers with Doubly Stochastic Attention

https://arxiv.org/pdf/2110.11773


February 10, 2025:

Gradient Multi-Normalization for Stateless and Scalable LLM Training

https://arxiv.org/pdf/2502.06742


July 12, 2025:

ESPFormer: Doubly-Stochastic Attention with Expected Sliced Transport Plans

https://arxiv.org/pdf/2502.07962


1 week ago
35 minutes 23 seconds

AI: post transformers
Random Walk Methods for Graph Learning and Networks

We review the evolution from PageRank to Random Walk with Restart (RWR) and its application to neural networks, focusing on five research papers dating from the original PageRank paper to 2025. They collectively focus on methods for learning on graphs, particularly through the use of **Random Walk Neural Networks (RWNNs)** and related random walk algorithms. One primary source introduces RWNNs, detailing their architecture, which involves a random walk generating a machine-readable record processed by a deep neural network, demonstrating that these models can achieve **universal approximation of graph functions** and overcome issues like over-smoothing found in Message Passing Neural Networks (MPNNs). This source also explores techniques like **anonymization** and **named neighbors** for walk recording and includes experimental results on graph isomorphism and transductive classification using language models like DeBERTa and Llama 3. The other sources provide brief contextual support, mentioning **Random Walk with Restart (RWR)** parameters and evaluation criteria like **Relative Accuracy** and **Relative Score** for related graph applications and datasets, suggesting connections to established graph algorithms such as PageRank.
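
A compact sketch of the shared core of these papers: the random-walk-with-restart iteration, which reduces to PageRank when the restart distribution is uniform over nodes and to a seed-proximity score when it is one-hot. The toy graph and damping constant are illustrative.

```python
import numpy as np

def random_walk_with_restart(A, restart, c=0.15, n_iters=100):
    """Iterate p <- (1 - c) * W p + c * restart, where W is the column-normalized
    adjacency matrix. A uniform restart vector gives PageRank; a one-hot restart
    scores proximity to a seed node."""
    W = A / A.sum(axis=0, keepdims=True)
    p = np.full(A.shape[0], 1.0 / A.shape[0])
    for _ in range(n_iters):
        p = (1 - c) * W @ p + c * restart
    return p

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

print(random_walk_with_restart(A, restart=np.full(4, 0.25)))   # PageRank-style scores
print(random_walk_with_restart(A, restart=np.eye(4)[0]))       # proximity to node 0
```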


Sources:


2025:

REVISITING RANDOM WALKS FOR LEARNING ON GRAPHS

https://proceedings.iclr.cc/paper_files/paper/2025/file/cd51b67dcb19db4e9f0022f500076b00-Paper-Conference.pdf


October 3, 2022:

Universal Multilayer Network Exploration by Random Walk with Restart

https://arxiv.org/pdf/2107.04565


2020:

Random Walk Graph Neural Networks

https://proceedings.neurips.cc/paper/2020/file/ba95d78a7c942571185308775a97a3a0-Paper.pdf


2006:

Fast Random Walk with Restart and Its Applications

https://www.cs.cmu.edu/~htong/pdf/ICDM06_tong.pdf


January 29, 1998:

The PageRank Citation Ranking: Bringing Order to the Web

https://www.cis.upenn.edu/~mkearns/teaching/NetworkedLife/pagerank.pdf

1 week ago
14 minutes 57 seconds

AI: post transformers
AlphaEvolve: Mathematical Discovery at Scale

The November 3, 2025 paper and its companion repository provide an overview of the **AlphaEvolve** system, an AI-powered evolutionary approach for mathematical exploration and discovery that uses large language models (LLMs) to generate and mutate programs acting as search heuristics for finding extremal mathematical objects. The first source is a **GitHub repository** from Google DeepMind that serves as a **collection of 67 mathematical problems** used to test AlphaEvolve, noting that the code for the system itself is not included. The second, more extensive source is an **academic paper** detailing AlphaEvolve's methodology, contrasting it with its predecessor FunSearch, and showcasing its application to solving a **wide array of complex, open mathematical problems** across combinatorics, geometry, analysis, and number theory, frequently achieving or improving upon state-of-the-art bounds. This system demonstrates proficiency in both a standard search mode for fixed problem sizes and a "generalizer mode" for developing programs that solve problems for arbitrary inputs.


Sources:

https://arxiv.org/pdf/2511.02864

https://github.com/google-deepmind/alphaevolve_repository_of_problems

1 week ago
14 minutes 26 seconds

AI: post transformers
AdaFlow: Variance-Adaptive Flow-Based Imitation Learning

The November 22, 2024 paper from UT Austin introduces **AdaFlow**, a novel imitation learning framework designed to improve both the efficiency and diversity of policy generation, addressing computational bottlenecks found in previous diffusion-based methods. AdaFlow utilizes **flow-based generative modeling** represented by ordinary differential equations (ODEs) and incorporates a **variance-adaptive ODE solver** that dynamically adjusts the number of inference steps based on the complexity of the state. This adaptive approach allows AdaFlow to function as a highly efficient **one-step action generator** for states with deterministic actions while retaining the ability to produce diverse actions for multi-modal scenarios. Empirical results across various benchmarks, including maze navigation and complex robot manipulation tasks, demonstrate that AdaFlow achieves high success rates with significantly **reduced inference time** compared to state-of-the-art models like Diffusion Policy. The research establishes a connection between the conditional variance of the training loss and the discretization error of the ODEs, providing the theoretical basis for AdaFlow’s computational adaptivity.
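
A stylized sketch of the adaptivity described above: integrate a learned flow ODE with a step count chosen from a predicted state-conditional variance, so near-deterministic states get a single Euler step and multi-modal ones get more. The stand-in networks, threshold, and step counts are placeholders, not AdaFlow's trained components.

```python
import torch

def generate_action(state, velocity_net, variance_net, max_steps=10, var_threshold=0.05):
    """Integrate da/dt = v(a, t | state) from noise to an action, spending more
    Euler steps only when the predicted conditional variance suggests the state
    is multi-modal. `velocity_net` and `variance_net` are assumed trained models."""
    var = variance_net(state)                       # scalar per state
    n_steps = 1 if var.item() < var_threshold else max_steps
    a = torch.randn_like(state)                     # start from noise
    dt = 1.0 / n_steps
    t = 0.0
    for _ in range(n_steps):
        a = a + dt * velocity_net(a, torch.tensor(t), state)
        t += dt
    return a

# Stand-in networks for illustration only.
velocity_net = lambda a, t, s: s - a                # pushes the sample toward the state
variance_net = lambda s: s.var()
print(generate_action(torch.randn(4), velocity_net, variance_net))
```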


Source:

https://arxiv.org/pdf/2402.04292

1 week ago
12 minutes 12 seconds

AI: post transformers
zFLoRA: Zero-Latency Fused Low-Rank Adapters

The October 28, 2025 Samsung research paper introduces **zFLoRA (zero-latency fused low-rank adapter)**, a novel parameter-efficient fine-tuning (PEFT) method designed to address the significant inference latency overheads associated with current adapter methods like **LoRA** in large language models (LLMs). The core contribution is a carefully engineered fusion of adapter blocks with the base model to achieve **zero or negligible latency overhead** during inference, leveraging optimized matrix multiplication on hardware like **NVIDIA H100 GPU** and **Samsung Galaxy S25+ NPU**. Experimental results across LLMs ranging from 1B to 7B parameters demonstrate that zFLoRA maintains **performance comparable to LoRA and Full Fine-Tuning (FFT)** across reasoning and generation tasks, while effectively eliminating the latency penalty, as visually confirmed by accompanying bar graphs. The paper details the architectural design of zFLoRA, which avoids costly expansion and merge operations present in naive fused adapter designs, and includes extensive **latency measurements** validating its efficiency on various platforms.
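
To contrast the two inference paths, the sketch below shows why an unfused LoRA adapter adds latency (two extra matmuls chained after the base projection) and how folding the low-rank update into the base weight yields a single matmul. This toy fold is only the simplest zero-overhead formulation; zFLoRA's actual fused design, which avoids the costly expansion and merge steps, is not reproduced here.

```python
import torch

d, r = 512, 16
W = torch.randn(d, d)                  # frozen base projection
A = torch.randn(r, d) * 0.02           # LoRA down-projection
B = torch.randn(d, r) * 0.02           # LoRA up-projection

x = torch.randn(d)

# Unfused LoRA at inference: two extra small matmuls chained after the base
# projection, which is where the adapter latency overhead comes from.
y_lora = W @ x + B @ (A @ x)

# Folding the low-rank update into the base weight gives one matmul and hence
# no extra kernel launches on the critical path.
W_fused = W + B @ A
y_fused = W_fused @ x

print(torch.allclose(y_lora, y_fused, atol=1e-3))   # numerically identical outputs
```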


Source:

https://arxiv.org/pdf/2510.25784

2 weeks ago
14 minutes 57 seconds

AI: post transformers