The following is the author's opinion:
Three European researchers working in empirical software engineering, software testing, and applied AI carried out a study of how AI is currently used in industrial software testing, combining a systematic mapping study with thematic analysis. The results were published on arXiv in April 2025.
The paper examines the current state of AI adoption in industrial software testing. Based on a systematic mapping study (screening 17 empirical publications from 2020 onward) and thematic analysis, the core findings are as follows:
Gap between expectation and reality: Industry interest in AI-driven testing is rising (75% of surveyed companies listed it as a strategic priority in 2025), yet the actual adoption rate is only 16%, leaving a significant gap between expectations and reality.
Focus of practical applications: Actual AI use concentrates on basic tasks such as test case generation and code generation. The verified benefits are mainly time savings and improved test coverage, while expected benefits such as cost savings and higher job satisfaction have not yet materialized.
Application levels: Use splits into an individual level (personal assistive tools for testers) and a system level (organization-wide deployment). Individual-level use dominates, and adoption as a whole is still in an early, experimental stage.
Need for research: Empirical studies from industry remain scarce, and more targeted research is needed to bridge the gap between academia and practice.
AI has become an everyday tool in software development (e.g., code generation tools such as GitHub Copilot) and has significantly improved developer productivity, but its adoption in software testing lags behind:
Industry interest is disconnected from actual use: Perforce's 2024 survey found that 48% of respondents were interested in AI-based testing but had not started any initiative, and only 11% had actually implemented it. In the 2025 survey, 75% listed AI-driven testing as a core strategy, yet the actual adoption rate was still only 16%.
Academic research is disconnected from industrial practice: existing AI-for-testing research is mostly experimental, conducted outside industrial settings, and therefore struggles to address practical problems.
The core research objective is to clarify the current state of industrial AI adoption in software testing through two research questions:
RQ1: What empirical studies exist on the application of AI to software testing in industrial contexts?
RQ2: How is AI applied to software testing in industry?
Data sources: Scopus, Google Scholar, and other databases were searched for industrial empirical studies (case studies, surveys, interviews, etc.) published from 2020 onward, resulting in 17 included publications (9 peer-reviewed papers, 6 theses, 2 grey literature items).
Screening challenges: Searches related to "artificial intelligence" returned many false positives (e.g., reference lists or institution names containing AI terms but no relevant content), which had to be reduced by restricting keywords to titles/abstracts and combining search phrases.
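As an illustration of this kind of title/abstract restriction, the sketch below keeps a search record only if AI and testing terms co-occur in its title or abstract; the term lists and record fields are hypothetical examples, not the authors' actual search strings.

```python
# Illustrative title/abstract keyword filter used to cut down false positives;
# the term lists and record fields are hypothetical, not the paper's actual query.
AI_TERMS = ("artificial intelligence", "large language model", "llm", "generative ai")
TESTING_TERMS = ("software testing", "test case", "test automation")

def is_candidate(record: dict) -> bool:
    """Keep a record only if AI and testing terms co-occur in its title or abstract."""
    text = f"{record.get('title', '')} {record.get('abstract', '')}".lower()
    return any(t in text for t in AI_TERMS) and any(t in text for t in TESTING_TERMS)

records = [
    {"title": "LLM-based test case generation in industry", "abstract": "..."},
    {"title": "Annual report of the AI Institute", "abstract": "No testing content."},
]
print([r["title"] for r in records if is_candidate(r)])
# -> ['LLM-based test case generation in industry']
```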
Analysis method: Reflexive thematic analysis (reflexive TA) with a data-driven, inductive approach was used to extract core themes from the 17 publications, including "expectation vs. reality", "scope of AI application", and "current adoption status".
Analysis focus: The analysis contrasts actual use cases and benefits with expected use cases and benefits, and distinguishes the different levels at which AI is applied.
Publication profile: Most of the included publications appeared in 2024 (10 items), but only 3 of those are peer-reviewed papers, indicating that industrial empirical research in this area is still in its early stages.
Research methods: The studies are mainly qualitative (13 items), using thematic analysis, case studies, action research, and similar designs; data were collected chiefly through interviews (10 items) and questionnaires (8 items).
The paper groups AI use cases in software testing into six categories and compares actually implemented scenarios with expected ones:
Key finding: Actual use cases cluster in the generation and basic analysis categories, i.e., repetitive, mechanical tasks; expected use cases lean toward more complex scenarios (such as security testing and exploratory testing) and system-level optimization, which current technology struggles to support.
(1) Actually verified benefits
Only 6 publications reported clearly observed benefits, centered on three points:
Time savings: Test case generation, code generation, root cause analysis, and similar use cases shorten testing and troubleshooting time.
Improved coverage: AI-generated test cases and test data cover more scenarios, especially boundary cases that humans easily miss.
Resource optimization: Test case prioritization and intelligent test automation improve the allocation of human and equipment resources.
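As a concrete illustration of the prioritization idea, here is a minimal sketch that ranks tests by a simple history-based score; the scoring rule and fields are hypothetical and not taken from the paper, where an AI-assisted prioritizer would typically derive such signals from execution history.

```python
# A minimal sketch of history-based test case prioritization, one common form of
# AI-assisted resource optimization; the scoring rule and fields are hypothetical.
from dataclasses import dataclass

@dataclass
class TestCase:
    name: str
    recent_failure_rate: float  # fraction of recent runs that failed
    avg_runtime_s: float        # average execution time in seconds

def priority(tc: TestCase) -> float:
    """Rank failure-prone, cheap-to-run tests first so defects surface sooner."""
    return tc.recent_failure_rate / max(tc.avg_runtime_s, 0.1)

suite = [
    TestCase("test_login", recent_failure_rate=0.30, avg_runtime_s=2.0),
    TestCase("test_report_export", recent_failure_rate=0.05, avg_runtime_s=45.0),
    TestCase("test_cart_boundary", recent_failure_rate=0.20, avg_runtime_s=1.5),
]
for tc in sorted(suite, key=priority, reverse=True):
    print(tc.name)  # highest-priority tests run first
```

Running failure-prone, fast tests first is what frees machine time and tester attention for the rest of the suite.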
(2) Expected but unrealized benefits
Cost savings: Early AI deployment requires upfront investment in infrastructure and staff training, and no significant cost reduction has been observed in the short term.
Improved job satisfaction: AI has not fully taken over repetitive tasks, and maintaining AI-generated artifacts adds new work, so tester satisfaction has not improved noticeably.
Improved communication: AI is expected to break down information silos within teams, but no significant improvement in cross-role communication has been observed in practice.
The paper identifies two distinct levels of AI application in software testing, with the individual level dominating:
Individual-level application (mainstream): Testers use AI as a "personal assistant" for their own tasks, e.g., using an LLM to generate test cases for the modules they own or to analyze bugs in code snippets (sketched after this list).
System-level application (minority): AI tools are deployed at the organizational level for system-wide automation, such as large-scale test case generation, intelligent test environment configuration, and cross-module regression test automation.
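The individual-level pattern can be sketched as follows: a tester asks an LLM to draft pytest cases for a module they own and then reviews the output. The OpenAI client, model name, and example function are assumptions made for this sketch, not tools named in the paper; any LLM interface would serve the same purpose.

```python
# Illustrative individual-level workflow: a tester asks an LLM to draft pytest
# cases for a module they own, then reviews the draft before committing it.
# The OpenAI client, model name, and example function are assumptions for this sketch.
import inspect
from openai import OpenAI

def discount(price: float, percent: float) -> float:
    """Hypothetical module-under-test owned by the tester."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

prompt = (
    "Write pytest test cases for the following function, covering boundary "
    "values (0%, 100%) and invalid input:\n\n" + inspect.getsource(discount)
)

client = OpenAI()  # reads OPENAI_API_KEY from the environment
resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": prompt}],
)
print(resp.choices[0].message.content)  # a draft only: review before adding to the suite
```

The manual review step is the crux of this pattern: the LLM drafts, the tester decides what actually enters the suite.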
Shallow adoption: Even implemented use cases (such as AI tools for GUI testing) tend to be "broad but shallow" and do not reach deep into core testing processes.
Experimental stage: Most companies are still running proofs of concept (PoCs), such as AI-based test data generation and defect prediction, and have not yet reached stable, production-grade use.
Cognitive confusion: Testers are unsure about AI's actual capabilities and suitable use cases, and a "waiting for best practices" mindset is common.
Technical level: AI-generated test cases are hard to maintain, generated test data remains basic, results depend heavily on domain knowledge, and hallucination is a risk.
Process level: System-level AI applications require reshaping existing testing processes and coordinating resources across departments, which creates strong resistance to rollout.
Cognitive level: The industry holds unrealistic expectations of AI (e.g., "AI can solve all testing problems"), leading to a wide gap between actual results and expectations.
Limited sample size: Only 17 publications were included, so relevant studies may have been missed.
Limited search scope: The search did not include dedicated queries for sub-fields such as machine learning or deep learning, focusing instead on emerging AI tools such as LLMs.
Topic selection: To keep the paper's scope manageable, topics such as barriers to AI adoption and testers' attitudes were not analyzed in depth and are left to follow-up research.
The application of AI in software testing is still in its early stages and has not yet brought about revolutionary changes.
There is a significant gap between expectations and reality: actual use cases focus on basic generation and analysis tasks, and actual benefits are mainly time savings and coverage improvements, while expectations such as cost savings and system-level optimization have not yet been realized.
AI use is "mainly individual-level, with system-level use as a supplement"; large-scale, organization-wide adoption still has to overcome technical, process, and cognitive barriers.
For enterprises: Set realistic AI testing goals, prioritize lightweight individual-level applications (such as test case generation and code analysis), and move to system-level deployment only after accumulating practical experience; invest more in internal AI tooling to address data privacy and domain adaptation issues.
For testers: AI is an assistive tool, not a replacement. Testers need to strengthen their ability to review AI-generated artifacts, build domain knowledge, and adapt to an "AI assistance + human decision-making" way of working.
For researchers: More empirical research in industry is needed (e.g., field experiments and long-term case studies), focused on practical issues such as maintaining AI-generated test cases and cross-scenario adaptation, to bridge the gap between academia and practice.
Source: TesterHome Community
For more details, see the original paper: https://arxiv.org/html/2504.04921v1