Assignee Research: Index of Papers

[400]

Can the domain adaptation capability of EVOR be quantified using cross-domain transfer accuracy on the MBPP an

29 May 2026. Score: 8.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Remarkable progress has been made in image recognition, primarily due to the availability of large-scale annotated datasets and deep convolutional neural networks (CNNs). CNNs enable learning data-driven, highly representative, hierarchical image features from sufficient training data. However, obtaining datasets as…

[399]

What is the impact of query-knowledge base co-evolution on retrieval latency and generation quality as measure

29 May 2026. Score: 8.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20438459

Abstract: Large Language Models (LLMs) showcase impressive capabilities but encounter challenges like hallucination, outdated knowledge, and non-transparent, untraceable reasoning processes. Retrieval-Augmented Generation (RAG) has emerged as a promising solution by incorporating knowledge from external databases. This…

[398]

To what extent does EVOR's diverse knowledge base evolution improve LLM performance on out-of-distribution cod

29 May 2026. Score: 8.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20438444

Abstract: ChatGPT is an artificial intelligence (AI)-based conversational large language model (LLM). The potential applications of LLMs in health care education, research, and practice could be promising if the associated valid concerns are proactively examined and addressed. The current systematic review aimed to investigate…

[397]

What is the inference latency overhead of EVOR's synchronous query-knowledge base evolution compared to tradit

29 May 2026. Score: 7.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20438433

Abstract: Large Language Models (LLMs) showcase impressive capabilities but encounter challenges like hallucination, outdated knowledge, and non-transparent, untraceable reasoning processes. Retrieval-Augmented Generation (RAG) has emerged as a promising solution by incorporating knowledge from external databases. This…

[396]

How does the EVOR pipeline's dynamic knowledge base evolution impact code generation accuracy compared to stat

29 May 2026. Score: 7.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20438344

Abstract: Large Language Models (LLMs) showcase impressive capabilities but encounter challenges like hallucination, outdated knowledge, and non-transparent, untraceable reasoning processes. Retrieval-Augmented Generation (RAG) has emerged as a promising solution by incorporating knowledge from external databases. This…

[395]

What is the impact of network depth on multimodal fusion accuracy when combining visual and textual features f

29 May 2026. Score: 7.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: In the last few years, the deep learning (DL) computing paradigm has been deemed the Gold Standard in the machine learning (ML) community. Moreover, it has gradually become the most widely used computational approach in the field of ML, thus achieving outstanding results on several complex cognitive tasks, matching…

[394]

How does the cross-modal fusion efficiency of multimodal architectures compare to single-modality baselines fo

29 May 2026. Score: 7.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: We present HERO, a novel framework for large-scale video+language omnirepresentation learning. HERO encodes multimodal inputs in a hierarchical structure, where local context of a video frame is captured by a Cross-modal Transformer via multimodal fusion, and global video context is captured by a Temporal…

[393]

How does the inference efficiency of VGG-style very deep convolutional networks compare to shallow baselines w

29 May 2026. Score: 3.83/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: Interest in Artificial Intelligence (AI) and its applications has seen unprecedented growth in the last few years. This success can be partly attributed to the advancements made in the sub-fields of AI such as machine learning, computer vision, and natural language processing. Much of the growth in these fields has…

[392]

What is the impact of domain adaptation fine-tuning on LLaMA-2-7B model performance degradation across held-ou

29 May 2026. Score: 4.83/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Ad-hoc instruction fine-tuning of large language models (LLMs) is widely adopted for domain-specific adaptation. While domain-specific supervised fine-tuning (SFT) is effective and efficient, it often weakens cross-domain generalization and struggles with noisy training data. To address these challenges, we propose…

[391]

How does layer-wise modality specialization in MoE VLMs influence inference throughput and MMBench accuracy tr

29 May 2026. Score: 1.67/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: Real world deployments often expose modern object recognition models to domain shifts that precipitate a severe drop in accuracy. Such shifts encompass (i) variations in low level image statistics, (ii) changes in object pose and viewpoint, (iii) partial occlusion, and (iv) visual confusion across adjacent classes.…

[390]

How does model accuracy and inference efficiency vary across different expert counts in sparse multimodal mode

29 May 2026. Score: 7.83/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20437799

Abstract: The deployment of large language models (LLMs) within the healthcare sector has sparked both enthusiasm and apprehension. These models exhibit the remarkable ability to provide proficient responses to free-text queries, demonstrating a nuanced understanding of professional medical knowledge. This comprehensive survey…

[389]

What is the impact of cross-modal expert routing consistency on SEED-Bench zero-shot accuracy when varying the

29 May 2026. Score: 6.67/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: Multilayer neural networks trained with the back-propagation algorithm constitute the best example of a successful gradient based learning technique. Given an appropriate network architecture, gradient-based learning algorithms can be used to synthesize a complex decision surface that can classify high-dimensional…

[388]

How do different routing strategies (top-k vs. noisy top-k) in SparseMoE vision-language models influence expe

29 May 2026. Score: 5.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Vision-Language Model (VLM) have gained widespread adoption in Open-Vocabulary (OV) object detection and segmentation tasks. Despite they have shown promise on OV-related tasks, their effectiveness in conventional vision tasks has thus far been unevaluated. In this work, we present the systematic review of VLM-based…

[387]

Does the self-invoking code generation task in HumanEval Pro reveal systematic failure modes in MambaFormer's

29 May 2026. Score: 7.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: As Large Language Models (LLMs) become increasingly integrated into secure software development workflows, a critical question remains unanswered: can these models not only detect insecure code but also reliably classify vulnerabilities according to standardized taxonomies? In this work, we conduct a systematic…

[386]

How does varying the number of active experts in a Mixture-of-Experts Transformer affect pass@k accuracy on Hu

29 May 2026. Score: 6.30/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Mixture-of-experts (MoE) models that employ sparse activation have demonstrated effectiveness in significantly increasing the number of parameters while maintaining low computational requirements per token. However, recent studies have established that MoE models are inherently parameter-inefficient as the…

[385]

What is the relationship between the number of active experts per token in SoftMoE and spatial reasoning perfo

29 May 2026. Score: 8.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Abstract Physics-Informed Neural Networks (PINN) are neural networks (NNs) that encode model equations, like Partial Differential Equations (PDE), as a component of the neural network itself. PINNs are nowadays used to solve PDEs, fractional equations, integral-differential equations, and stochastic PDEs. This novel…

[384]

Can Vendi-RAG maintain answer accuracy on the HotpotQA benchmark when the diversity weight hyperparameter is s

29 May 2026. Score: 7.67/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Large Language Models (LLMs) have drawn a lot of attention due to their strong performance on a wide range of natural language tasks, since the release of ChatGPT in November 2022. LLMs' ability of general-purpose language understanding and generation is acquired by training billions of model's parameters on massive…

[383]

How does Vendi-RAG's iterative retrieval process compare to fixed-top-k retrieval on the 2WikiMultihop benchma

29 May 2026. Score: 5.33/10. Verification: L1, Literature synthesis. Gate status: Unverified.

Abstract: In the last few years, the deep learning (DL) computing paradigm has been deemed the Gold Standard in the machine learning (ML) community. Moreover, it has gradually become the most widely used computational approach in the field of ML, thus achieving outstanding results on several complex cognitive tasks, matching…

[382]

How does the accuracy of Tree of Reviews on MuSiQue at 128K context degrade when the number of distractor pass

29 May 2026. Score: 8.00/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20437593

Abstract: Long-context capability is considered one of the most important abilities of LLMs, as a truly long context-capable LLM shall enable its users to effortlessly process many originally exhausting tasks -e.g., digesting a long-form document to find answers v.s., directly asking an LLM about it.However, existing…

[381]

How does the Tree of Reviews framework's F1 score on the MuSiQue benchmark vary when evaluated with Llama-3-8B

29 May 2026. Score: 8.83/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20437582

Abstract: Large Language Models (LLMs) showcase impressive capabilities but encounter challenges like hallucination, outdated knowledge, and non-transparent, untraceable reasoning processes. Retrieval-Augmented Generation (RAG) has emerged as a promising solution by incorporating knowledge from external databases. This…

[380]

What is the token efficiency ratio (total input tokens processed per correct answer) of the Tree of Reviews me

29 May 2026. Score: 8.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20437574

Abstract: Abstract The rapid evolution of large language models (LLMs) has driven a transformative shift in artificial intelligence (AI), reshaping both research paradigms and practical applications. Distinguished from their predecessors by unprecedented scale and advanced capabilities, LLMs necessitate new frameworks for…

[379]

Does LongNav-R1's multi-turn RL approach improve generalization to unseen environments on the RxR-CE benchmark

29 May 2026. Score: 7.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Unlike previous studies on the Metaverse based on Second Life, the current Metaverse is based on the social value of Generation Z that online and offline selves are not different. With the technological development of deep learning-based high-precision recognition models and natural generation models, Metaverse is…

[378]

What is the impact of scaling the VLA model size (e.g., from 7B to 13B parameters) on the average reward and t

29 May 2026. Score: 7.50/10. Verification: L2, Source-grounded claims. Gate status: Unverified. 10.5281/zenodo.20437422

Abstract: Recent advances in the areas of multimodal machine learning and artificial intelligence (AI) have led to the development of challenging tasks at the intersection of Computer Vision, Natural Language Processing, and Embodied AI. Whereas many approaches and previous survey pursuits have characterised one or two of…

[377]

How does the multi-turn RL framework in LongNav-R1 compare to fixed-length memory baselines on the VLN-CE benc

29 May 2026. Score: 7.33/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: This paper surveys the field of reinforcement learning from a computer-science perspective. It is written to be accessible to researchers familiar with machine learning. Both the historical basis of the field and a broad selection of current work are summarized. Reinforcement learning is the problem faced by an agent…

[376]

Does Vendi-RAG's diversity optimization maintain its latency and accuracy benefits when evaluated on the MMLU

29 May 2026. Score: 7.17/10. Verification: L2, Source-grounded claims. Gate status: Unverified.

Abstract: Neural sequence-to-sequence models have provided a viable new approach for abstractive text summarization (meaning they are not restricted to simply selecting and rearranging passages from the original text). However, these models have two shortcomings: they are liable to reproduce factual details inaccurately, and…