just posts with papers here
Just posts with papers here. No commentary. I add whatever I find.

Mostly about LLM agents, benchmarking, reasoning, and geometry in DL

the paper descriptions are copy-pasted abstracts or GPT-generated texts

main channel: https://t.iss.one/junkyardmathml
The Relativity of Causal Knowledge

Recent advances in artificial intelligence reveal the limits of purely predictive systems and call for a shift toward causal and collaborative reasoning. Drawing inspiration from Grothendieck's revolution in mathematics, we introduce the relativity of causal knowledge, which posits that structural causal models (SCMs) are inherently imperfect, subjective representations embedded within networks of relationships. By leveraging category theory, we arrange SCMs into a functor category and show that their observational and interventional probability measures naturally form convex structures. This result allows us to encode non-intervened SCMs with convex spaces of probability measures. Next, using sheaf theory, we construct the network sheaf and cosheaf of causal knowledge. These structures enable the transfer of causal knowledge across the network while incorporating interventional consistency and the perspective of the subjects, ultimately leading to the formal, mathematical definition of relative causal knowledge.
Cellular Sheaves of Lattices and the Tarski Laplacian

This paper initiates a discrete Hodge theory for cellular sheaves taking values in a category of lattices and Galois connections. The key development is the Tarski Laplacian, an endomorphism on the cochain complex whose fixed points yield a cohomology that agrees with the global section functor in degree zero. This has immediate applications in consensus and distributed optimization problems over networks and broader potential applications.
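
A minimal sketch of the fixed-point flavour of the construction, under strong simplifying assumptions (every stalk is the same powerset lattice and all restriction maps are identities, so the Galois-connection machinery disappears); the graph and data are invented. Iterating a monotone meet-with-neighbours update converges, Tarski-style, to a fixed point that is constant on connected components, playing the role of degree-zero cohomology / global sections:

from functools import reduce

# Toy graph with powerset-lattice stalks (sets ordered by inclusion) and identity
# restriction maps. The update takes the meet (intersection) of a vertex's value
# with its neighbours' values; its fixed points are the locally constant assignments.
edges = [("a", "b"), ("b", "c")]
x = {"a": {1, 2, 3}, "b": {2, 3, 4}, "c": {2, 5}}

def neighbours(v):
    return [u for e in edges for u in e if v in e and u != v]

def tarski_step(x):
    return {v: reduce(set.intersection, [x[u] for u in neighbours(v)], set(x[v]))
            for v in x}

while True:
    nxt = tarski_step(x)
    if nxt == x:          # reached a fixed point of the monotone update
        break
    x = nxt

print(x)  # every vertex ends up with {2}: the consensus value / global section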
An Algebraic Notion of Conditional Independence, and Its Application to Knowledge Representation

Conditional independence is a crucial concept supporting adequate modelling and efficient reasoning in probabilistic settings. In knowledge representation, the idea of conditional independence has also been introduced for specific formalisms, such as propositional logic and belief revision. In this paper, the notion of conditional independence is studied in the algebraic framework of approximation fixpoint theory. This gives a language-independent account of conditional independence that can be straightforwardly applied to any logic with fixpoint semantics. It is shown how this notion allows global reasoning to be reduced to parallel instances of local reasoning, leading to fixed-parameter tractability results. Furthermore, relations to existing notions of conditional independence are discussed and the framework is applied to normal logic programming.
Text2World: Benchmarking Large Language Models for Symbolic World Model Generation

Recently, there has been growing interest in leveraging large language models (LLMs) to generate symbolic world models from textual descriptions. Although LLMs have been extensively explored in the context of world modeling, prior studies encountered several challenges, including evaluation randomness, dependence on indirect metrics, and a limited domain scope. To address these limitations, we introduce a novel benchmark, Text2World, based on planning domain definition language (PDDL), featuring hundreds of diverse domains and employing multi-criteria, execution-based metrics for a more robust evaluation. We benchmark current LLMs using Text2World and find that reasoning models trained with large-scale reinforcement learning outperform others. However, even the best-performing model still demonstrates limited capabilities in world modeling. Building on these insights, we examine several promising strategies to enhance the world modeling capabilities of LLMs, including test-time scaling, agent training, and more. We hope that Text2World can serve as a crucial resource, laying the groundwork for future research in leveraging LLMs as world models. The project page is available at this https URL.
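
A hedged sketch of what an execution-based check could look like: a generated PDDL model counts only if an off-the-shelf planner parses it and solves the task. The domain/problem strings are hand-written stand-ins for LLM output, and the Fast Downward path and search configuration are assumptions, not part of the Text2World harness.

import subprocess, tempfile, pathlib

# Toy "LLM-generated" PDDL domain and problem (hand-written stand-ins here).
domain = """(define (domain commute)
  (:predicates (at-home) (at-work))
  (:action go-to-work
    :precondition (at-home)
    :effect (and (at-work) (not (at-home)))))"""
problem = """(define (problem morning)
  (:domain commute)
  (:init (at-home))
  (:goal (at-work)))"""

tmp = pathlib.Path(tempfile.mkdtemp())
(tmp / "domain.pddl").write_text(domain)
(tmp / "problem.pddl").write_text(problem)

# Execution-based metric: ask a planner to actually solve the generated model.
# The driver-script path is assumed to be on the current directory.
result = subprocess.run(
    ["./fast-downward.py", str(tmp / "domain.pddl"), str(tmp / "problem.pddl"),
     "--search", "astar(blind())"],
    capture_output=True, text=True)
print("plan found" if result.returncode == 0 else "domain rejected or unsolvable")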
WorldCoder, a Model-Based LLM Agent: Building World Models by Writing Code and Interacting with the Environment

We give a model-based agent that builds a Python program representing its knowledge of the world based on its interactions with the environment. The world model tries to explain its interactions, while also being optimistic about what reward it can achieve. We define this optimism as a logical constraint between a program and a planner. We study our agent on gridworlds, and on task planning, finding our approach is more sample-efficient compared to deep RL, more compute-efficient compared to ReAct-style agents, and that it can transfer its knowledge across environments by editing its code.
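
A toy illustration of the two requirements the abstract names, not the authors' code: the world model is an ordinary Python function, it is accepted only if it reproduces every logged interaction, and "optimism" is approximated by checking that some plan under the model reaches the reward. All states, actions, and the model itself are invented.

from itertools import product

# Logged environment interactions: (state, action, next_state, reward).
transitions = [
    ((0, 0), "right", (1, 0), 0),
    ((1, 0), "right", (2, 0), 1),
]

# A candidate world model written as code (in WorldCoder this would be
# LLM-generated and repaired iteratively).
def model(state, action):
    x, y = state
    if action == "right":
        x += 1
    nxt = (x, y)
    return nxt, (1 if nxt == (2, 0) else 0)

def explains(model, transitions):
    # The model must reproduce every observed transition.
    return all(model(s, a) == (s2, r) for s, a, s2, r in transitions)

def optimistic(model, start, actions, horizon, target_reward=1):
    # Optimism, roughly: some plan under the model achieves the reward.
    for plan in product(actions, repeat=horizon):
        s, total = start, 0
        for a in plan:
            s, r = model(s, a)
            total += r
        if total >= target_reward:
            return True
    return False

print(explains(model, transitions))                      # True
print(optimistic(model, (0, 0), ["right"], horizon=2))   # True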
Autoformalizing Natural Language to First-Order Logic: A Case Study in Logical Fallacy Detection

Translating natural language into formal language such as First-Order Logic (FOL) is a foundational challenge in NLP with wide-ranging applications in automated reasoning, misinformation tracking, and knowledge validation. In this paper, we introduce Natural Language to First-Order Logic (NL2FOL), a framework to autoformalize natural language to FOL step by step using Large Language Models (LLMs). Our approach addresses key challenges in this translation process, including the integration of implicit background knowledge. By leveraging structured representations generated by NL2FOL, we use Satisfiability Modulo Theory (SMT) solvers to reason about the logical validity of natural language statements. We present logical fallacy detection as a case study to evaluate the efficacy of NL2FOL. Being neurosymbolic, our approach also provides interpretable insights into the reasoning process and demonstrates robustness without requiring model fine-tuning or labeled training data. Our framework achieves strong performance on multiple datasets. On the LOGIC dataset, NL2FOL achieves an F1-score of 78%, while generalizing effectively to the LOGICCLIMATE dataset with an F1-score of 80%.
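
A minimal sketch of the final solver step only, assuming the z3-solver Python package; the formulas are hand-written stand-ins for NL2FOL output. Validity of the formalized claim is checked by asserting its negation together with the premises and testing for unsatisfiability.

from z3 import DeclareSort, Function, BoolSort, ForAll, Implies, Const, Not, Solver, unsat

# Toy formalization of: "All experts agree, and X is an expert, therefore X agrees."
Person = DeclareSort("Person")
Expert = Function("Expert", Person, BoolSort())
Agrees = Function("Agrees", Person, BoolSort())
x = Const("x", Person)
alice = Const("alice", Person)

premises = ForAll([x], Implies(Expert(x), Agrees(x)))
claim = Implies(Expert(alice), Agrees(alice))

s = Solver()
s.add(premises)
s.add(Not(claim))          # valid iff premises together with the negated claim are unsatisfiable
print("valid" if s.check() == unsat else "not valid / possible fallacy")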
Divide and Translate: Compositional First-Order Logic Translation and Verification for Complex Logical Reasoning

Complex logical reasoning tasks require long chains of reasoning, on which a large language model (LLM) with chain-of-thought prompting still falls short. To alleviate this issue, neurosymbolic approaches incorporate a symbolic solver. Specifically, an LLM only translates a natural language problem into a satisfiability (SAT) problem that consists of first-order logic formulas, and a sound symbolic solver returns a mathematically correct solution. However, we discover that LLMs have difficulty capturing complex logical semantics hidden in the natural language during translation. To resolve this limitation, we propose a Compositional First-Order Logic Translation. An LLM first parses a natural language sentence into newly defined logical dependency structures that consist of an atomic subsentence and its dependents, then sequentially translates the parsed subsentences. Since multiple logical dependency structures and sequential translations are possible for a single sentence, we also introduce two verification algorithms to ensure more reliable results. We utilize a SAT solver to rigorously compare the semantics of generated first-order logic formulas and select the most probable one. We evaluate the proposed method, dubbed CLOVER, on seven logical reasoning benchmarks and show that it outperforms previous neurosymbolic approaches and achieves new state-of-the-art results.
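
A hedged sketch of the verification idea only, reduced to propositional logic for brevity (the paper works with full first-order formulas): two candidate translations of the same sentence are semantically equivalent exactly when a solver finds their exclusive-or unsatisfiable. The formulas are invented; z3-solver is assumed.

from z3 import Bools, And, Or, Not, Xor, Solver, unsat

rain, wet = Bools("rain wet")

# Two candidate translations of "it is not true that it rains and the street stays dry".
candidate_1 = Not(And(rain, Not(wet)))
candidate_2 = Or(Not(rain), wet)        # logically the same statement

s = Solver()
s.add(Xor(candidate_1, candidate_2))    # satisfiable only if the two can disagree
print("equivalent" if s.check() == unsat else "different semantics")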
A Categorical Representation Language and Computational System for Knowledge-Based Planning

Classical planning representation languages based on first-order logic have preliminarily been used to model and solve robotic task planning problems. Wider adoption of these representation languages, however, is hindered by the limitations present when managing implicit world changes with concise action models. To address this problem, we propose an alternative approach to representing and managing updates to world states during planning. Based on the category-theoretic concepts of C-sets and double-pushout rewriting (DPO), our proposed representation can effectively handle structured knowledge about world states that support domain abstractions at all levels. It formalizes the semantics of predicates according to a user-provided ontology and preserves the semantics when transitioning between world states. This method provides a formal semantics for using knowledge graphs and relational databases to model world states and updates in planning. In this paper, we conceptually compare our category-theoretic representation with the classical planning representation. We show that our proposed representation has advantages over the classical representation in terms of handling implicit preconditions and effects, and provides a more structured framework in which to model and solve planning problems.
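
A toy, set-based illustration of the double-pushout idea, not the paper's C-set/Catlab machinery: a rule L <- K -> R deletes whatever is in L but not in the preserved interface K, and adds whatever is in R but not in K, applying only when L is matched in the current state. The facts and the rule are invented.

# World state as a set of ground facts.
state = {("at", "robot", "kitchen"), ("holding", "robot", "cup")}

# A DPO-style rule L <- K -> R for "put the cup down in the kitchen".
L = {("at", "robot", "kitchen"), ("holding", "robot", "cup")}   # pattern to match
K = {("at", "robot", "kitchen")}                                # preserved interface
R = {("at", "robot", "kitchen"), ("on", "cup", "table")}        # result pattern

def dpo_apply(state, L, K, R):
    if not L <= state:                 # the left-hand side must be matched
        raise ValueError("rule not applicable")
    deleted, added = L - K, R - K
    return (state - deleted) | added

print(dpo_apply(state, L, K, R))
# {('at', 'robot', 'kitchen'), ('on', 'cup', 'table')}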
LogiCity: Advancing Neuro-Symbolic AI with Abstract Urban Simulation (NIPS 2024)

The paper presents LogiCity, a new simulator for studying neuro-symbolic (NeSy) AI systems. The authors note that existing testbeds for evaluating NeSy methods are often limited to simple logical rules and small numbers of entities, and do not cover complex scenarios with many agents interacting over long time horizons, which makes them insufficiently realistic. To address this, they propose LogiCity, the first simulator grounded in customizable first-order logic that models an urban-style environment with many dynamic agents. The simulator uses spatial and semantic concepts (e.g., IsAmbulance(X), IsClose(X,Y)) to specify abstract logical rules that govern agent behavior. Because the rules are abstract, they can be applied to cities with different agent compositions, producing diverse scenarios. An important advantage of LogiCity is that the level of abstraction is user-configurable, which makes it possible to vary the complexity of the simulations. The authors demonstrate LogiCity's potential on two task types: the first tests long-horizon sequential decision making by agents, the second tests single-step visual logical reasoning. The experiments show that neuro-symbolic approaches have an advantage on tasks that require abstract logical reasoning.

code: https://github.com/Jaraxxus-Me/LogiCity
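
A minimal sketch of the rule style described above, grounded by hand; the agents, predicates, and the yielding rule are invented examples rather than LogiCity's actual API:

# Toy agents and spatial/semantic predicates in the IsAmbulance(X) / IsClose(X, Y) style.
agents = {
    "car_1": {"type": "car",       "pos": (0, 0)},
    "amb_1": {"type": "ambulance", "pos": (1, 0)},
}

def IsAmbulance(a):
    return agents[a]["type"] == "ambulance"

def IsClose(a, b):
    (x1, y1), (x2, y2) = agents[a]["pos"], agents[b]["pos"]
    return abs(x1 - x2) + abs(y1 - y2) <= 1

# Abstract rule: if any ambulance is close to X, then X must yield.
def must_yield(x):
    return any(IsAmbulance(y) and y != x and IsClose(x, y) for y in agents)

print(must_yield("car_1"))   # True: the ambulance is adjacent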
Debate on Graph: a Flexible and Reliable Reasoning Framework for Large Language Models (AAAI 2025)

The paper investigates how to combine large language models (LLMs) with knowledge graphs (KGs) for knowledge-graph question answering (KGQA). Knowledge graphs hold vast, richly connected data, but current methods run into two problems: overly long reasoning chains get in the way of producing answers, and spurious relations make those answers hard to refine.

The authors propose a new approach, Debating over Graphs (DoG), an interactive KGQA framework that leverages LLMs' capabilities for learning and reasoning. DoG works in two stages. First, it focuses on subgraphs and verifies the answer after every reasoning step, which helps avoid long reasoning chains. Then, through a "debate" between different roles, complex questions are simplified and the influence of spurious relations is reduced, making the conclusions more reliable. Tests on public datasets confirm the effectiveness of the approach: DoG outperforms the best existing method, ToG, by 23.7% in accuracy on WebQuestions and by 9.1% on GrailQA.

code: https://github.com/reml-group/DoG
The KoLMogorov Test: Compression by Code Generation (ICLR 2025)

Compression is considered a key element of intelligence. In theory, to perfectly compress any data sequence one must find the shortest program that reproduces it and then halts. This is the so-called Kolmogorov compression, which is, alas, uncomputable. Modern code-generating language models only barely approach this ideal, since it requires reasoning, planning, and search capabilities they do not yet fully possess. In this work the authors propose the KoLMogorov Test (KT), a challenge in which compression becomes a measure of intelligence for such models. The idea is simple: the model receives a data sequence and must output the shortest program capable of producing it. KT has advantages both for evaluating models and for training them. First, problem instances of varying difficulty can be generated in essentially unlimited numbers and are easy to obtain. Second, solid reference baselines for comparison already exist. Third, the compression metric is honest and cannot be gamed. Moreover, the test data is unlikely to overlap with what the models were trained on. For evaluation the authors use audio, text, DNA, and even random sequences produced by synthetic programs. The results of top models such as GPT-4o and Llama-3.1-405B turn out to be weak: they stumble on both natural and synthetic data.
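
A minimal sketch of how such a test can be scored (my reading, not the official evaluation code): a proposed program is accepted only if executing it reproduces the data exactly, and its quality is the ratio of program length to data length. The sequence and the "model-proposed" program are hand-written here.

# Target sequence and an LLM-proposed program (hand-written stand-in).
data = bytes(range(256)) * 4

program = "out = bytes(range(256)) * 4"

def kt_score(program: str, data: bytes):
    scope = {}
    exec(program, scope)                       # run the candidate decompressor
    if scope.get("out") != data:
        return None                            # must reproduce the data exactly
    return len(program.encode()) / len(data)   # below 1 means real compression

print(kt_score(program, data))   # well below 1: the program is much shorter than the raw bytes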
PUZZLES: A Benchmark for Neural Algorithmic Reasoning

The paper treats algorithmic reasoning as a key cognitive ability required for problem solving and decision making. The authors point out that RL has already shown substantial success in areas such as motion control, sensory-data processing, and interaction with stochastic environments, made possible by the development of suitable test platforms. This work introduces a new benchmark, PUZZLES, based on Simon Tatham's collection of logic puzzles, aimed at developing the algorithmic and logical skills of RL agents. PUZZLES includes 40 diverse puzzles with adjustable sizes and difficulty levels, many of which offer additional configuration parameters that broaden their variability.
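
A generic interaction loop in the Gymnasium style that such a benchmark targets; the environment id below is a placeholder (PUZZLES' actual registration names may differ), so this is a shape-of-the-API sketch rather than working code for the benchmark itself.

import gymnasium as gym

# Hypothetical environment id: a placeholder standing in for however PUZZLES
# registers its puzzles as Gymnasium environments.
env = gym.make("Puzzles-Sudoku-v0")

obs, info = env.reset(seed=0)
done = False
while not done:
    action = env.action_space.sample()          # random agent as a trivial baseline
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated
env.close()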
Challenging the Boundaries of Reasoning: An Olympiad-Level Math Benchmark for Large Language Models

The paper introduces a new benchmark for evaluating the mathematical abilities of LLMs. OlymMATH is a benchmark of olympiad-level problems designed to test LLM reasoning. It includes 200 problems, split into two difficulty levels: easy (AIME level) for basic evaluation, and hard, meant to probe the limits of current models. OlymMATH covers four areas of mathematics: algebra, geometry, number theory, and combinatorics, and every problem has a numerical answer for objective verification. The results show that even frontier models such as DeepSeek-R1 and OpenAI's o3-mini perform poorly on the hard level.
Topological Blindspots: Understanding and Extending Topological Deep Learning Through the Lens of Expressivity (ICLR 2025)

Topological deep learning (TDL) is a rapidly growing field that seeks to leverage topological structure in data and facilitate learning from data supported on topological objects, ranging from molecules to 3D shapes. Most TDL architectures can be unified under the framework of higher-order message-passing (HOMP), which generalizes graph message-passing to higher-order domains. In the first part of the paper, we explore HOMP's expressive power from a topological perspective, demonstrating the framework's inability to capture fundamental topological and metric invariants such as diameter, orientability, planarity, and homology. In addition, we demonstrate HOMP's limitations in fully leveraging lifting and pooling methods on graphs. To the best of our knowledge, this is the first work to study the expressivity of TDL from a topological perspective. In the second part of the paper, we develop two new classes of architectures -- multi-cellular networks (MCN) and scalable MCN (SMCN) -- which draw inspiration from expressive GNNs. MCN can reach full expressivity, but scaling it to large data objects can be computationally expensive. Designed as a more scalable alternative, SMCN still mitigates many of HOMP's expressivity limitations. Finally, we create new benchmarks for evaluating models based on their ability to learn topological properties of complexes. We then evaluate SMCN on these benchmarks and on real-world graph datasets, demonstrating improvements over both HOMP baselines and expressive graph methods, highlighting the value of expressively leveraging topological information.

code: https://github.com/yoavgelberg/SMCN
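
A toy rendition of the higher-order message-passing scheme the paper analyses, with hand-built cells and an unlearned update (real HOMP layers use learned message and update functions and richer neighbourhood structures): each cell aggregates features from its faces and cofaces.

import numpy as np

# Toy cell complex: two triangles sharing an edge. Cells are keyed by their vertex tuples.
cells = {0: [(0,), (1,), (2,), (3,)],
         1: [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)],
         2: [(0, 1, 2), (1, 2, 3)]}
feat = {c: np.ones(4) for dim in cells for c in cells[dim]}

def boundary(cell):
    return [cell[:i] + cell[i+1:] for i in range(len(cell))] if len(cell) > 1 else []

def homp_layer(feat):
    new = {}
    for dim, cs in cells.items():
        for c in cs:
            below = [feat[b] for b in boundary(c)]                                   # faces
            above = [feat[u] for u in cells.get(dim + 1, []) if set(c) <= set(u)]    # cofaces
            msg = sum(below + above) if (below or above) else 0.0
            new[c] = np.tanh(feat[c] + msg)      # toy update; real HOMP uses learned maps
    return new

feat = homp_layer(feat)
print(feat[(1, 2)])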
Simple Is Effective: The Roles of Graphs and Large Language Models in Knowledge-Graph-Based Retrieval-Augmented Generation (ICLR 2025)

Large Language Models (LLMs) demonstrate strong reasoning abilities but face limitations such as hallucinations and outdated knowledge. Knowledge Graph (KG)-based Retrieval-Augmented Generation (RAG) addresses these issues by grounding LLM outputs in structured external knowledge from KGs. However, current KG-based RAG frameworks still struggle to optimize the trade-off between retrieval effectiveness and efficiency in identifying a suitable amount of relevant graph information for the LLM to digest. We introduce SubgraphRAG, extending the KG-based RAG framework that retrieves subgraphs and leverages LLMs for reasoning and answer prediction. Our approach innovatively integrates a lightweight multilayer perceptron with a parallel triple-scoring mechanism for efficient and flexible subgraph retrieval while encoding directional structural distances to enhance retrieval effectiveness. The size of retrieved subgraphs can be flexibly adjusted to match the query's need and the downstream LLM's capabilities. This design strikes a balance between model complexity and reasoning power, enabling scalable and generalizable retrieval processes. Notably, based on our retrieved subgraphs, smaller LLMs like Llama3.1-8B-Instruct deliver competitive results with explainable reasoning, while larger models like GPT-4o achieve state-of-the-art accuracy compared with previous baselines -- all without fine-tuning. Extensive evaluations on the WebQSP and CWQ benchmarks highlight SubgraphRAG's strengths in efficiency, accuracy, and reliability by reducing hallucinations and improving response grounding.

code: https://github.com/Graph-COM/SubgraphRAG
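
A small sketch of the retrieval shape described above, with random (untrained) weights and made-up triples: each triple is scored independently by a tiny MLP over its embedding, the query embedding, and a structural-distance feature, and the top-k triples form the subgraph handed to the LLM. None of this is the released model; it only illustrates the parallel scoring plus adjustable-k design.

import numpy as np

rng = np.random.default_rng(0)

triples = [("paris", "capital_of", "france"),
           ("france", "borders", "spain"),
           ("spain", "capital", "madrid")]
emb = {t: rng.normal(size=8) for t in triples}               # toy triple embeddings
query_emb = rng.normal(size=8)                               # toy question embedding
struct_dist = {t: float(i) for i, t in enumerate(triples)}   # stand-in for directional distances

W1, b1 = rng.normal(size=(17, 16)), np.zeros(16)             # untrained MLP weights
W2 = rng.normal(size=16)

def score(triple):
    x = np.concatenate([emb[triple], query_emb, [struct_dist[triple]]])
    h = np.maximum(x @ W1 + b1, 0.0)                         # one hidden ReLU layer
    return float(h @ W2)

k = 2   # subgraph size: a knob to match the downstream LLM's context budget
subgraph = sorted(triples, key=score, reverse=True)[:k]
print(subgraph)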
From Tokens to Lattices: Emergent Lattice Structures in Language Models (ICLR 2025)

😱 this is something very important and interesting 🤫

Pretrained masked language models (MLMs) have demonstrated an impressive capability to comprehend and encode conceptual knowledge, revealing a lattice structure among concepts. This raises a critical question: how does this conceptualization emerge from MLM pretraining? In this paper, we explore this problem from the perspective of Formal Concept Analysis (FCA), a mathematical framework that derives concept lattices from the observations of object-attribute relationships. We show that the MLM's objective implicitly learns a formal context that describes objects, attributes, and their dependencies, which enables the reconstruction of a concept lattice through FCA. We propose a novel framework for concept lattice construction from pretrained MLMs and investigate the origin of the inductive biases of MLMs in lattice structure learning. Our framework differs from previous work because it does not rely on human-defined concepts and allows for discovering "latent" concepts that extend beyond human definitions. We create three datasets for evaluation, and the empirical results verify our hypothesis.
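
A tiny, self-contained FCA example over a hand-written context (in the paper the formal context is recovered from a pretrained masked LM rather than written by hand): a formal concept is a closed pair of sets, the objects sharing some attributes together with exactly the attributes those objects share, and enumerating closures yields the concept lattice.

from itertools import combinations

# Toy formal context: objects and the attributes they have.
context = {
    "sparrow": {"bird", "flies"},
    "penguin": {"bird"},
    "bat":     {"mammal", "flies"},
}
attributes = set().union(*context.values())

def extent(attrs):                 # objects having all attributes in attrs
    return {o for o, a in context.items() if attrs <= a}

def intent(objs):                  # attributes shared by all objects in objs
    return set.intersection(*(context[o] for o in objs)) if objs else set(attributes)

# A formal concept is a pair (objects, attributes) closed under the two derivation maps.
concepts = set()
for r in range(len(attributes) + 1):
    for attrs in combinations(sorted(attributes), r):
        objs = extent(set(attrs))
        concepts.add((frozenset(objs), frozenset(intent(objs))))

for objs, attrs in sorted(concepts, key=lambda c: len(c[0])):
    print(set(objs) or "{}", "<->", set(attrs) or "{}")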
Neural Spacetimes for DAG Representation Learning (ICLR 2025)

We propose a class of trainable deep learning-based geometries called Neural SpaceTimes (NSTs), which can universally represent nodes in weighted Directed Acyclic Graphs (DAGs) as events in a spacetime manifold. While most works in the literature focus on undirected graph representation learning or causality embedding separately, our differentiable geometry can encode both graph edge weights in its spatial dimensions and causality in the form of edge directionality in its temporal dimensions. We use a product manifold that combines a quasimetric (for space) and a partial order (for time). NSTs are implemented as three neural networks trained in an end-to-end manner: an embedding network, which learns to optimize the location of nodes as events in the spacetime manifold, and two other networks that optimize the space and time geometries in parallel, which we call a neural (quasi-)metric and a neural partial order, respectively. The latter two networks leverage recent ideas at the intersection of fractal geometry and deep learning to shape the geometry of the representation space in a data-driven fashion, unlike other works in the literature that use fixed spacetime manifolds such as Minkowski space or De Sitter space to embed DAGs. Our main theoretical guarantee is a universal embedding theorem, showing that any k-point DAG can be embedded into an NST with 1 + O(log(k)) distortion while exactly preserving its causal structure. The total number of parameters defining the NST is sub-cubic in k and linear in the width of the DAG. If the DAG has a planar Hasse diagram, this is improved to O(log(k) + 2) spatial and 2 temporal dimensions. We validate our framework computationally with synthetic weighted DAGs and real-world network embeddings; in both cases, the NSTs achieve lower embedding distortions than their counterparts using fixed spacetime geometries.
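
A hand-made toy to unpack the two ingredients named above, not the trained networks: spatial coordinates with an asymmetric (quasimetric) distance, temporal coordinates whose componentwise order stands in for the neural partial order, plus the two quantities one would check, causal consistency on edges and multiplicative distortion of edge weights. All coordinates and the quasimetric are invented.

import numpy as np

# Toy DAG with edge weights, and a hand-made "spacetime" embedding.
edges = {("a", "b"): 1.0, ("b", "c"): 2.0}
space = {"a": np.array([0.0, 0.0]), "b": np.array([1.0, 0.0]), "c": np.array([1.0, 2.0])}
time = {"a": np.array([0.0]), "b": np.array([1.0]), "c": np.array([2.0])}

def quasimetric(u, v):
    # Asymmetric toy distance: moving "forward" is cheaper than moving back.
    d = space[v] - space[u]
    return float(np.sum(np.maximum(d, 0) + 2 * np.maximum(-d, 0)))

def precedes(u, v):
    return bool(np.all(time[u] < time[v]))     # stand-in for the neural partial order

causal_ok = all(precedes(u, v) for (u, v) in edges)
distortion = max(max(quasimetric(u, v) / w, w / quasimetric(u, v)) for (u, v), w in edges.items())
print(causal_ok, distortion)   # True, 1.0: causality preserved, no distortion on this toy DAG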
Beyond the Lazy versus Rich Dichotomy: Geometry Insights in Feature Learning from Task-Relevant Manifold Untangling (ICLR 2025 rejected)

The ability to integrate task-relevant information into neural representations is a fundamental aspect of both human and machine intelligence. Recent studies have explored the transition of neural networks from the lazy training regime (where the trained network is equivalent to a linear model of initial random features) to the rich feature learning regime (where the network learns task-relevant features). However, most approaches focus on weight matrices or neural tangent kernels, limiting their relevance for neuroscience due to the lack of representation-based methods to study feature learning. Furthermore, the simple lazy-versus-rich dichotomy overlooks the potential for richer subtypes of feature learning driven by variations in learning algorithms, network architectures, and data properties.

In this work, we present a framework based on representational geometry to study feature learning. The key idea is to use the untangling of task-relevant neural manifolds as a signature of rich learning. We employ manifold capacity—a representation-based measure—to quantify this untangling, along with geometric metrics to uncover structural differences in feature learning. Our contributions are threefold: First, we show both theoretically and empirically that task-relevant manifolds untangle during rich learning, and that manifold capacity quantifies the degree of richness. Second, we use manifold geometric measures to reveal distinct learning stages and strategies driven by network and data properties, demonstrating that feature learning is richer than the lazy-versus-rich dichotomy. Finally, we apply our method to problems in neuroscience and machine learning, providing geometric insights into structural inductive biases and out-of-distribution generalization. Our work introduces a novel perspective for understanding and quantifying feature learning through the lens of representational geometry.

arxiv: https://arxiv.org/abs/2503.18114
Synthesizing Programmatic Reinforcement Learning Policies with Large Language Model Guided Search (ICLR 2025)

Programmatic reinforcement learning (PRL) has been explored for representing policies through programs as a means to achieve interpretability and generalization. Despite promising outcomes, current state-of-the-art PRL methods are hindered by sample inefficiency, necessitating tens of millions of program-environment interactions. To tackle this challenge, we introduce a novel LLM-guided search framework (LLM-GS). Our key insight is to leverage the programming expertise and common sense reasoning of LLMs to enhance the efficiency of assumption-free, random-guessing search methods. We address the challenge of LLMs' inability to generate precise and grammatically correct programs in domain-specific languages (DSLs) by proposing a Pythonic-DSL strategy - an LLM is instructed to initially generate Python codes and then convert them into DSL programs. To further optimize the LLM-generated programs, we develop a search algorithm named Scheduled Hill Climbing, designed to efficiently explore the programmatic search space to improve the programs consistently. Experimental results in the Karel domain demonstrate our LLM-GS framework's superior effectiveness and efficiency. Extensive ablation studies further verify the critical role of our Pythonic-DSL strategy and Scheduled Hill Climbing algorithm. Moreover, we conduct experiments with two novel tasks, showing that LLM-GS enables users without programming skills and knowledge of the domain or DSL to describe the tasks in natural language to obtain performant programs.
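
A hedged sketch of the search loop only, on a toy Karel-style token DSL: evaluate() and mutate() are invented stand-ins for program rollouts in the environment and DSL edits, the initial program would come from the LLM's Pythonic-DSL output, and the "schedule" here (growing the edit radius over time) is one plausible reading rather than the paper's exact algorithm.

import random

random.seed(0)

TARGET = ["move", "turnLeft", "move", "putMarker"]   # hidden behaviour the toy reward encodes

def evaluate(program):
    # Stand-in for running the program in the environment and measuring return.
    return sum(a == b for a, b in zip(program, TARGET)) / len(TARGET)

def mutate(program, n_edits):
    tokens = ["move", "turnLeft", "turnRight", "putMarker"]
    prog = list(program)
    for _ in range(n_edits):
        prog[random.randrange(len(prog))] = random.choice(tokens)
    return prog

def scheduled_hill_climbing(init, steps=200):
    best, best_score = init, evaluate(init)
    for t in range(steps):
        n_edits = 1 + t // 50                      # schedule: search farther as time goes on
        cand = mutate(best, n_edits)
        score = evaluate(cand)
        if score >= best_score:                    # greedy acceptance
            best, best_score = cand, score
    return best, best_score

# The initial program would be the LLM's output; here it is hard-coded.
print(scheduled_hill_climbing(["move", "move", "move", "move"]))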
Can ChatGPT Learn My Life From a Week of First-Person Video?

The work investigates whether an LLM can learn about its owner's personal life from action-camera data. The author wore a camera headset for 54 hours over the course of a week, then fed the footage into GPT-4o and GPT-4o-mini and asked them to analyze the collected information. Both models learned basic information about the author (approximate age, gender). Moreover, GPT-4o correctly inferred that the author is from Pittsburgh, is a PhD student at Carnegie Mellon University, is right-handed, and owns a cat. However, both models also hallucinated and invented names for the people appearing in the recordings.
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization (ICLR 2025)

Retrieval-augmented generation (RAG) is an approach for improving LLMs on many tasks that rely on external knowledge. Existing RAG methods do not perform well on reasoning tasks, because the useful information needed for such tasks is scattered widely across knowledge stores. The work is motivated by cognitive research showing that, when solving reasoning problems, people convert raw information into various optimally structured forms of knowledge. The paper proposes StructRAG, which identifies the optimal structure type for a given task, converts the retrieved information into that format, and derives answers from the structured knowledge. Experiments show that StructRAG reaches state-of-the-art performance on challenging real-world applications.

imho: this direction (in a far more developed form than today, of course) is where the next hype cycle, and maybe AGI, will come from
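
A skeleton of the inference-time pipeline as I read the abstract; llm() is a stub that would be replaced by a real model call, and the structure types and prompts are invented examples, not StructRAG's actual implementation.

def llm(prompt: str) -> str:
    # Stub so the skeleton runs end to end; swap in a real chat-completion call.
    return "table" if "Pick exactly one" in prompt else "stub output"

STRUCTURES = ["table", "graph", "timeline", "catalogue"]

def struct_rag(question: str, documents: list[str]) -> str:
    # 1. Route: decide which structure type suits this question.
    structure = llm(f"Question: {question}\nPick exactly one of {STRUCTURES} "
                    "that best organises the evidence. Answer with the name only.")
    # 2. Structurize: convert the scattered raw documents into that format.
    structured = llm(f"Reorganise the following documents into a {structure}:\n"
                     + "\n---\n".join(documents))
    # 3. Answer on top of the structured knowledge rather than raw chunks.
    return llm(f"Using this {structure}:\n{structured}\n\nAnswer the question: {question}")

print(struct_rag("Which teams did the player join, in order?", ["doc one ...", "doc two ..."]))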