• maiweb v0.1.0

#preprint

11 sources tagged with this.

  • arXiv - Computer Science: Artificial Intelligence
  • arXiv - Computer Science: Machine Learning
  • arXiv - Subject Class Template
  • arXiv - cs.AI
  • arXiv - cs.CL
  • arXiv - cs.CV
  • arXiv - cs.LG
  • arXiv - hep-th
  • arXiv - math.AP
  • arXiv - math.PR
  • arXiv - quant-ph
  • arXiv - Computer Science: Artificial Intelligence arxiv.org ai arxiv computer-science preprint research science 2026-06-18 04:00
    arXiv:2606.07591v3 Announce Type: replace-cross Abstract: AI coding agents are increasingly used for scientific work, but their end-to-end autonomous research capability remains difficult to verify. We present ResearchClawBench, a benchmark for evaluating autonomous scientific research across 40 tasks from 10 scientific domains. Each task is grounded in a real published paper, provides related literature and raw data, and hides the target paper during evaluation. Expert-curated multimodal rubrics decompose the target scientific artifacts into weighted criteria, enabling evaluation of target-paper-level re-discovery while leaving room for new discovery. We evaluate seven autonomous research (auto-research) agents under a unified protocol and seventeen native LLMs through the lightweight ResearchHarness. Current systems remain far from reliable re-discovery: the strongest autonomous agent, Claude Code, averages 21.5, and the strongest ResearchHarness LLM, Claude-Opus-4.7, averages 20.7, with an LLM frontier mean of only 26.5. Error analysis shows that failures concentrate in experimental protocol mismatch, evidence mismatch, and missing scientific core. ResearchClawBench provides a reproducible evaluation frontier for measuring progress toward autonomous scientific research.
    • ResearchClawBench: A Benchmark for End-to-End Autonomous Scientific Research arXiv - cs.CL
    • Notation Matters: A Benchmark Study of Token-Optimized Formats in Agentic AI Systems arXiv - cs.CL
    • ResearchClawBench: A Benchmark for End-to-End Autonomous Scientific Research arXiv - cs.AI
  • arXiv - cs.CL arxiv.org ai arxiv computer-science preprint repository 2026-06-18 04:00
    arXiv:2606.07591v3 Announce Type: replace-cross Abstract: AI coding agents are increasingly used for scientific work, but their end-to-end autonomous research capability remains difficult to verify. We present ResearchClawBench, a benchmark for evaluating autonomous scientific research across 40 tasks from 10 scientific domains. Each task is grounded in a real published paper, provides related literature and raw data, and hides the target paper during evaluation. Expert-curated multimodal rubrics decompose the target scientific artifacts into weighted criteria, enabling evaluation of target-paper-level re-discovery while leaving room for new discovery. We evaluate seven autonomous research (auto-research) agents under a unified protocol and seventeen native LLMs through the lightweight ResearchHarness. Current systems remain far from reliable re-discovery: the strongest autonomous agent, Claude Code, averages 21.5, and the strongest ResearchHarness LLM, Claude-Opus-4.7, averages 20.7, with an LLM frontier mean of only 26.5. Error analysis shows that failures concentrate in experimental protocol mismatch, evidence mismatch, and missing scientific core. ResearchClawBench provides a reproducible evaluation frontier for measuring progress toward autonomous scientific research.
    • ResearchClawBench: A Benchmark for End-to-End Autonomous Scientific Research arXiv - Computer Science: Artificial Intelligence
    • Notation Matters: A Benchmark Study of Token-Optimized Formats in Agentic AI Systems arXiv - cs.CL
    • ResearchClawBench: A Benchmark for End-to-End Autonomous Scientific Research arXiv - cs.AI
  • arXiv - cs.AI arxiv.org ai arxiv computer-science preprint repository 2026-06-18 04:00
    arXiv:2606.07591v3 Announce Type: replace-cross Abstract: AI coding agents are increasingly used for scientific work, but their end-to-end autonomous research capability remains difficult to verify. We present ResearchClawBench, a benchmark for evaluating autonomous scientific research across 40 tasks from 10 scientific domains. Each task is grounded in a real published paper, provides related literature and raw data, and hides the target paper during evaluation. Expert-curated multimodal rubrics decompose the target scientific artifacts into weighted criteria, enabling evaluation of target-paper-level re-discovery while leaving room for new discovery. We evaluate seven autonomous research (auto-research) agents under a unified protocol and seventeen native LLMs through the lightweight ResearchHarness. Current systems remain far from reliable re-discovery: the strongest autonomous agent, Claude Code, averages 21.5, and the strongest ResearchHarness LLM, Claude-Opus-4.7, averages 20.7, with an LLM frontier mean of only 26.5. Error analysis shows that failures concentrate in experimental protocol mismatch, evidence mismatch, and missing scientific core. ResearchClawBench provides a reproducible evaluation frontier for measuring progress toward autonomous scientific research.
    • ResearchClawBench: A Benchmark for End-to-End Autonomous Scientific Research arXiv - Computer Science: Artificial Intelligence
    • ResearchClawBench: A Benchmark for End-to-End Autonomous Scientific Research arXiv - cs.CL
    • Notation Matters: A Benchmark Study of Token-Optimized Formats in Agentic AI Systems arXiv - cs.CL
  • arXiv - hep-th arxiv.org arxiv physics preprint repository science 2026-06-18 04:00
    arXiv:2601.14288v2 Announce Type: replace-cross Abstract: We present DeepInflation, an AI agent designed for research and model discovery in inflationary cosmology. Built upon a multi-agent architecture, DeepInflation integrates Large Language Models (LLMs) with a symbolic regression (SR) engine and a retrieval-augmented generation (RAG) knowledge base. This framework enables the agent to automatically explore and verify the vast landscape of inflationary potentials while grounding its outputs in established theoretical literature. We demonstrate that DeepInflation can successfully discover simple and viable single-field slow-roll inflationary potentials consistent with the latest observations (with the ACT DR6 results taken as an example) or any given $n_s$ and $r$, and provide accurate theoretical context for obscure inflationary scenarios. DeepInflation serves as a prototype for a new generation of autonomous scientific discovery engines in cosmology, which enables researchers and non-experts alike to explore the inflationary landscape using natural language. This agent is available at https://github.com/pengzy-cosmo/DeepInflation.
    • I Hacked an AI Customer Service Agent in 8 Seconds Siraj Raval
    • I Built an AI That Wrote Me a Country Breakup Song Siraj Raval
    • I Quit Chrome for an AI Browser. It Actually Worked. Siraj Raval
    • Building an AI Interviewer From Scratch in 3 Hours Harkirat Singh
    • How an AI Agent Deleted PocketOS Production in 9 Seconds Kent C. Dodds
    • How to build an AI Agent and MCP Server (step-by-step) Google Cloud Tech
    • How to Add an AI Chatbot to Your Client's Website Code with Ania Kubów #JavaScriptGames
    • I Built an AI Agent That Fixes My Resume Codevolution
    • What Is an AI Agent? LLMs, Tools, and a Loop Real Python
    • How to Become an AI Engineer in 2026 Tech With Tim
    • They Killed an AI Model Overnight (Fable 5 & Mythos 5) Traversy Media
  • arXiv - cs.CL arxiv.org ai arxiv computer-science preprint repository 2026-06-18 04:00
    arXiv:2605.29676v2 Announce Type: replace-cross Abstract: Large language models in Agentic AI systems consume tool schemas and execution results and emit tool invocations as structured data. The default language for that exchange, JSON, was designed for application-to-application interchange rather than token efficiency, so its structural elements impose substantial token overhead. Recent work proposes token-optimized alternatives such as TOON (Token-Oriented Object Notation) and TRON (Token Reduced Object Notation) as more compact replacements, but these formats have been evaluated only on isolated comprehension or generation tasks. Whether their token reductions hold inside end-to-end agentic loops therefore remains an open question. We evaluate TOON and TRON on four agentic benchmarks (BFCL, MCPToolBenchPP, MCP-Universe, StableToolBench) and five open-weight LLMs, decoupling input compression from output compression to measure comprehension and generation independently. TRON reduces tokens by up to 27% with accuracy within 14pp of the JSON baseline. TOON achieves up to 18% reduction at a similar 9pp accuracy cost, but additionally cascades on multi-turn parsing failures and collapses parallel tool-call output for most models. The code is available at: https://github.com/lkutschka/notation-matters
    • Where AI Is Heading In 2026- Generative AI, Agentic AI, LLM Gateways,Guardrails,Evals, LLM Caching Krish Naik
    • 3.0 Agentic AI Bootcamp Announcement Krish Naik
    • 3.0 Agentic AI Specialisation with AgentOps Bootcamp Krish Naik
    • Complete Agentic AI Course In 10 Hours- Langchain, Langgraph, RAG,Vectorless RAG, Guardrails,Evals Krish Naik
    • Thriving in the Agentic AI Era: A Guide for Knowledge Workers and Organizations Mathematical Foundations of ML with Jon Krohn
    • From Dashboards to Decisions: Zoho Analytics on the Agentic AI Revolution The Ravit Show
    • Agentic AI Live Course QnA Telusko
  • arXiv - cs.CL arxiv.org ai arxiv computer-science preprint repository 2026-06-18 04:00
    arXiv:2606.18142v2 Announce Type: replace-cross Abstract: AI agents are moving from advisors to actors, booking travel, planning menus, and running procurement on behalf of users. Existing benchmarks for AI and animal welfare evaluate model text responses to question-answer prompts, leaving open whether the welfare reasoning surfaced in those responses transfers to agentic deployment where the model must take actions with tools. We introduce TAC (Travel Agent Compassion), the first agentic benchmark measuring whether AI agents avoid options involving animal exploitation when acting on behalf of users. TAC presents an AI agent with twelve hand-authored travel booking scenarios across six categories of animal exploitation, augmented to forty-eight samples to control for price, rating, and position confounds. We evaluate seven frontier models from four labs. Every model scores below the chance level of sixty-four percent, with the best performer (Claude Opus 4.7) at fifty-three percent. A single welfare-aware sentence in the system prompt yields gains of forty-seven to sixty-three percentage points in Claude and GPT-5.5, twenty-six points in GPT-5.2, and under twelve points in DeepSeek and Gemini. An auxiliary Inspect Scout audit of 288 base-condition transcripts from the top two performers, using Gemini 2.5 Flash Lite as judge, flags zero transcripts for evaluation awareness, suggesting the below-chance rates do not stem from the models recognising the evaluation. We discuss implications for category-level variation across cultural domains, the limits of text-response welfare benchmarks, and the EU General-Purpose AI Code of Practice systemic risk framework.
    • AI Dev 26 x SF | Manos Koukoumidis & Stefan Webb: VibeML: Build your AI model in hours, not months DeepLearningAI
    • How hidden messages can hijack your AI! Code with Ania Kubów #JavaScriptGames
    • Secure your AI Traffic with AgentGateway! That DevOps Guy
    • Secure your AI traffic using LLM gateways! That DevOps Guy
  • arXiv - hep-th arxiv.org arxiv physics preprint repository science 2026-06-18 04:00
    arXiv:2601.18652v4 Announce Type: replace-cross Abstract: Galaxy clusters are the largest virialized structures in the Universe and are dominated by dark matter. The hydrostatic mass and the mass obtained from gravitational lensing measurements generally differ, a discrepancy known as the hydrostatic mass bias. In this work, we derive the hydrostatic mass of galaxy clusters within the framework of Rastall gravity. We consider two scenarios: (i) the absence of dark matter and (ii) the presence of dark matter. In both cases, we constrain the Rastall parameter at the cluster scale using observational data. In the first scenario, Rastall gravity effectively reduces the hydrostatic mass, bringing it closer to the observed baryonic mass. The best linear fit yields a slope $\mathbf{M}=1.07\pm0.11$, indicating a near one-to-one correspondence between the two masses. In the second scenario, Rastall gravity helps to alleviate the hydrostatic mass bias. The linear fit between the Rastall hydrostatic mass and the observed lensing mass results in a best-fit slope $\mathbf{M}=0.99\pm0.26$, which is very close to unity. We also calculate the goodness of fit for each fit. The statistical evaluations indicate that Rastall gravity provides a viable phenomenological framework that can improve certain aspects of the mass discrepancy problem at the level of scaling relations. However, it does not universally outperform other modified gravity models when evaluated using standard goodness-of-fit criteria.
    • Interest rates held as Bank warns of impact of high energy prices BBC News - Business
    • The Impact Of Humanoid Robots On Humanity Smashing Magazine
  • arXiv - cs.AI arxiv.org ai arxiv computer-science preprint repository 2026-06-18 04:00
    arXiv:2606.10466v2 Announce Type: replace-cross Abstract: In time-series generation, existing approaches typically handcraft or train a separate model for each dataset, which hinders their scalability and fails to leverage shared temporal structures across domains. To address this fragmentation, we propose UPLOTS, a Unified, Prompt-guided Language model framework fOr constrained Time-Series Generation across diverse domains. Instead of building task-specific models, UPLOTS leverages a single pre-trained transformer backbone guided by learned constraint prompts, enabling on-demand generation with precise pattern control. One key innovation is our dynamic multi-dataset loss re-weighting and prompt-to-pattern mapping, which allows UPLOTS to internalize diverse temporal structures during training and conditionally generate them at inference. We evaluate UPLOTS on four real-world benchmarks and multiple constraint settings, including peak-period, calendar, load-level, and volatility patterns. Additional held-out constraint-combination and downstream forecasting experiments further demonstrate that UPLOTS generalizes beyond the original peak-pattern setting and improves data augmentation under scarce real-data regimes. Our code and baselines are available at an anonymous GitHub repo: https://anonymous.4open.science/r/UPLOTS-6C36.
    • Sensory Restoration via Brain-Computer Interfaces: A Unified 2 x 2 Framework and Convergence Roadmap arXiv - Computer Science: Artificial Intelligence
    • UPLOTS: A Unified Pretrained Language Model for Constrained Time-series Generation arXiv - Computer Science: Artificial Intelligence
    • Sensory Restoration via Brain-Computer Interfaces: A Unified 2 x 2 Framework and Convergence Roadmap arXiv - cs.AI
    • UnoSolver.jl a unified SQP/barrier solver for nonlinearly constrained optimization | Charlie Vanaret The Julia Programming Language
  • arXiv - cs.AI arxiv.org ai arxiv computer-science preprint repository 2026-06-18 04:00
    arXiv:2606.15091v2 Announce Type: replace-cross Abstract: Millions of individuals worldwide suffer from sensory and communication deficits caused by neurodegenerative diseases, stroke, or trauma. Brain-computer interfaces (BCIs) offer a promising avenue for sensory and motor restoration. However, the scientific literature remains highly fragmented between invasive neuroprosthetics and non-invasive electrophysiological decoders, with a lack of consistent terminology and comparison metrics. This chapter proposes a unified 2 x 2 framework categorizing BCIs along two axes: degree of invasiveness (invasive vs. non-invasive) and signal direction (afferent sensory-IN vs. efferent sensory-OUT). We define and distinguish the paradigms of restoration, substitution, and augmentation. Furthermore, we outline a structural roadmap for the convergence of these modalities over near-, medium-, and long-term horizons, focusing on physical limits and the integrative role of machine learning foundation models.
    • Sensory Restoration via Brain-Computer Interfaces: A Unified 2 x 2 Framework and Convergence Roadmap arXiv - Computer Science: Artificial Intelligence
    • UPLOTS: A Unified Pretrained Language Model for Constrained Time-series Generation arXiv - Computer Science: Artificial Intelligence
    • UPLOTS: A Unified Pretrained Language Model for Constrained Time-series Generation arXiv - cs.AI
    • UnoSolver.jl a unified SQP/barrier solver for nonlinearly constrained optimization | Charlie Vanaret The Julia Programming Language
  • arXiv - Computer Science: Artificial Intelligence arxiv.org ai arxiv computer-science preprint research science 2026-06-18 04:00
    arXiv:2606.10466v2 Announce Type: replace-cross Abstract: In time-series generation, existing approaches typically handcraft or train a separate model for each dataset, which hinders their scalability and fails to leverage shared temporal structures across domains. To address this fragmentation, we propose UPLOTS, a Unified, Prompt-guided Language model framework fOr constrained Time-Series Generation across diverse domains. Instead of building task-specific models, UPLOTS leverages a single pre-trained transformer backbone guided by learned constraint prompts, enabling on-demand generation with precise pattern control. One key innovation is our dynamic multi-dataset loss re-weighting and prompt-to-pattern mapping, which allows UPLOTS to internalize diverse temporal structures during training and conditionally generate them at inference. We evaluate UPLOTS on four real-world benchmarks and multiple constraint settings, including peak-period, calendar, load-level, and volatility patterns. Additional held-out constraint-combination and downstream forecasting experiments further demonstrate that UPLOTS generalizes beyond the original peak-pattern setting and improves data augmentation under scarce real-data regimes. Our code and baselines are available at an anonymous GitHub repo: https://anonymous.4open.science/r/UPLOTS-6C36.
    • Sensory Restoration via Brain-Computer Interfaces: A Unified 2 x 2 Framework and Convergence Roadmap arXiv - Computer Science: Artificial Intelligence
    • Sensory Restoration via Brain-Computer Interfaces: A Unified 2 x 2 Framework and Convergence Roadmap arXiv - cs.AI
    • UPLOTS: A Unified Pretrained Language Model for Constrained Time-series Generation arXiv - cs.AI
    • UnoSolver.jl a unified SQP/barrier solver for nonlinearly constrained optimization | Charlie Vanaret The Julia Programming Language
  • arXiv - Computer Science: Artificial Intelligence arxiv.org ai arxiv computer-science preprint research science 2026-06-18 04:00
    arXiv:2606.15091v2 Announce Type: replace-cross Abstract: Millions of individuals worldwide suffer from sensory and communication deficits caused by neurodegenerative diseases, stroke, or trauma. Brain-computer interfaces (BCIs) offer a promising avenue for sensory and motor restoration. However, the scientific literature remains highly fragmented between invasive neuroprosthetics and non-invasive electrophysiological decoders, with a lack of consistent terminology and comparison metrics. This chapter proposes a unified 2 x 2 framework categorizing BCIs along two axes: degree of invasiveness (invasive vs. non-invasive) and signal direction (afferent sensory-IN vs. efferent sensory-OUT). We define and distinguish the paradigms of restoration, substitution, and augmentation. Furthermore, we outline a structural roadmap for the convergence of these modalities over near-, medium-, and long-term horizons, focusing on physical limits and the integrative role of machine learning foundation models.
    • UPLOTS: A Unified Pretrained Language Model for Constrained Time-series Generation arXiv - Computer Science: Artificial Intelligence
    • Sensory Restoration via Brain-Computer Interfaces: A Unified 2 x 2 Framework and Convergence Roadmap arXiv - cs.AI
    • UPLOTS: A Unified Pretrained Language Model for Constrained Time-series Generation arXiv - cs.AI
    • UnoSolver.jl a unified SQP/barrier solver for nonlinearly constrained optimization | Charlie Vanaret The Julia Programming Language
  • arXiv - hep-th arxiv.org arxiv physics preprint repository science 2026-06-18 04:00
    arXiv:2504.17533v3 Announce Type: replace-cross Abstract: The standard inflationary theory focuses on the freezing of super-horizon fluctuations, which generate a scale-invariant spectrum, while the sub-horizon modes are expected to remain in thermal equilibrium. Building upon recent developments in the quantum thermodynamics of the de Sitter universe, we investigate the graviton remnant originating from this thermal horizon radiation released at the end of inflation. Unlike the stochastic background from super-horizon fluctuations, this signal represents a snapshot of the thermal dS state, which subsequently decouples and undergoes cosmological redshift. We present a semi-analytical approximate prediction for this relic background, typically peaking near the MHz band, with a characteristic energy density of $\log_{10}(\Omega_{\rm G} h^2) \sim \mathcal{O}(-18)$. These signals occupy a high-frequency band, offering a potential novel probe of the reheating temperature and the thermal history of the early universe.
    • Polls close in historic Makerfield byelection that could see Andy Burnham elected and pave way for end of Starmer – UK politics live The Guardian - World
    • End of the Junior Engineer Era blondiebytes
    • End of Gemini CLI - Welcome to Antigravity 2.0 Telusko
Maibook — your private personalized AI community