HIVEMIND kicks off to advance AI-powered human-centric software development
This press release announces the launch of the HIVEMIND project, highlighting its goals, key technologies, and international partnerships aimed at advancing AI-driven software development.
Author: HIVEMIND
HIVEMIND NEWSLETTER_01
The first edition of the HIVEMIND newsletter provides a concise overview of the project’s initial six months. It highlights key milestones, including the kick-off meeting, engagement in sector events, and interviews with Work Package leaders on building a responsible and collaborative multi-agent AI system.
HIVEMIND project overview presentation
This presentation introduces the HIVEMIND project, outlining its objectives, technical architecture, specialised AI agents, and five industrial use cases. It also highlights the project’s anticipated scientific, industrial, and societal impacts.
HIVEMIND project trifold brochure
This trifold brochure provides an overview of the HIVEMIND project, presenting its vision, consortium, application domains, and key AI-powered agents supporting the software development lifecycle.
HIVEMIND project poster
This poster presents an overview of the HIVEMIND project, highlighting its human-centric, AI-driven multi-agent framework for accelerating the software development lifecycle. It outlines the project’s vision, architecture, specialised agents, data handling approach, fine-tuning methods, and real-world validation use cases.
What About Emotions? Guiding Fine-Grained Emotion Extraction from Mobile App Reviews
This paper explores the underexamined area of fine-grained emotion classification in app reviews, extending beyond the traditional focus on sentiment polarity (positive, negative, neutral). To capture the complexity of users’ affective responses, the study adapts Plutchik’s emotion taxonomy and introduces a structured annotation framework and dataset tailored to app reviews. Through an iterative human annotation process, the authors establish clear guidelines, highlight challenges in interpreting emotions, and assess the feasibility of automation with large language models (LLMs). The results show that LLMs substantially reduce manual annotation effort and achieve notable agreement with human annotators, though full automation remains difficult due to the nuanced nature of emotions. This work provides structured guidelines, an annotated dataset, and insights for building semi-automated pipelines, offering valuable contributions to opinion mining, requirements engineering, and user feedback analysis.
Multi-Agent Debate Strategies to Enhance Requirements Engineering with Large Language Models
This paper investigates the potential of Multi-Agent Debate (MAD) strategies to enhance the performance of Large Language Model (LLM) agents in Requirements Engineering (RE) tasks. While prior research has focused on prompt engineering, fine-tuning, and retrieval-augmented generation, these methods often treat LLMs as isolated black boxes, relying on single-pass outputs with limited robustness and adaptability. Inspired by the way human debates improve accuracy by incorporating diverse perspectives, this study explores whether collaborative interactions among multiple LLM agents can yield similar benefits. We systematically analyze existing MAD strategies across different domains, identifying their key characteristics and developing a taxonomy of core attributes. Building on this foundation, we implement and evaluate a preliminary MAD-based framework for RE classification. The results demonstrate both the feasibility and potential advantages of applying MAD to RE, paving the way for more robust, adaptive, and accurate use of LLMs in engineering contexts.
HIVEMIND NEWSLETTER_02
The second issue of the HIVEMIND project newsletter includes a brief first-year status update from the project coordinator, references to recent scientific publications, an animated introduction to the project’s core concept, and an overview of ongoing clustering and collaboration activities with related initiatives.
AI-Powered Software Testing Tools: Full Autonomy Remains a Distant Goal
This paper examines the current landscape of AI-powered software testing tools by systematically reviewing and classifying 56 commercially available solutions as of 2024. It analyses how these tools support different stages of the software testing process, ranging from test planning and test-case design to execution and maintenance, and highlights their potential to improve efficiency and effectiveness for test engineers. At the same time, the paper identifies key limitations, including false positives and insufficient contextual or domain understanding, which underscores the continued need for human oversight. The study argues that AI-assisted testing tools should be seen as complementary to human testers rather than fully autonomous solutions, with close human–AI collaboration remaining essential in the foreseeable future.
PolyglotQL: A Pipeline for Multilingual Text-to-SPARQL Dataset Generation
Presented at LREC 2026, May 11, 2026.
SciLaD: A Large-Scale, Transparent, Reproducible Dataset for Natural Scientific Language Processing
Presented at LREC 2026, May 11, 2026.
SciLaD: A Large-Scale, Transparent, Reproducible Dataset for Natural Scientific Language Processing
SciLaD is a novel, large-scale dataset of scientific language constructed entirely using open-source frameworks and publicly available data sources. It comprises a curated English split containing over 10 million scientific publications and a multilingual, unfiltered TEI XML split including more than 35 million publications. We also publish the extensible pipeline for generating SciLaD. The dataset construction and processing workflow demonstrates how open-source tools can enable large-scale scientific data curation while maintaining high data quality. Finally, we pre-train a RoBERTa model on our dataset and evaluate it across a comprehensive set of benchmarks, achieving performance comparable to other scientific language models of similar size, which validates the quality and utility of SciLaD. We publish the dataset and evaluation pipeline to promote reproducibility, transparency, and further research in natural scientific language processing and understanding, including scholarly document processing.
PolyglotQL: A Pipeline for Multilingual Text-to-SPARQL Dataset Generation
We present PolyglotQL, an open-source ETL (Extract, Transform, Load) pipeline for systematically creating multilingual text-to-SPARQL datasets, along with an accompanying framework for evaluating text-to-SPARQL generation models. PolyglotQL provides an extensible and modular architecture that aggregates, normalizes, and augments heterogeneous question–SPARQL pairs from established text-to-SPARQL datasets. With this pipeline, we automatically construct a bilingual English–German dataset featuring contextualized entity and relationship mappings as well as automatically translated and aligned question pairs. We also conduct an empirical evaluation using two multilingual open large language models under two distinct contextualization settings. The results show consistent performance improvements when explicit grounding information is provided, highlighting the benefits of structured context in multilingual semantic parsing.
Collaborative Multi-Agent Testing for Emergent Failure Discovery in Autonomous Driving Systems
Presented at an ICRA workshop on 1 June 2026.
HIVEMIND NEWSLETTER_03
The third issue of the HIVEMIND project newsletter includes updates from the General Assembly in Berlin, the mid-term Exploitation and IPR Workshop, and the newly established Alliance for Generative Software Engineering. It also presents recent conference participation by project partners and highlights new scientific publications produced within the project.