Traditional financial testing—centered on a "text-driven" approach and focused on single-dimensional verification like interface calls and business logic—struggles to adapt to the complex scenarios spawned by financial digital transformation. This article explores how multimodal technology and AI agent collaboration are transcending testing boundaries, propelling financial testing from "single-point verification" toward "system-level intelligent testing."
Key takeaways for technical leaders:
The challenge: Multimodal interaction coverage <50% (China UnionPay data); cross-system coordination inefficiencies
The solution: Multi-agent architectures with specialized financial domain agents
The results: 40-80% efficiency gains, 35-62% risk reduction across case studies
The 2026 imperative: Testing must now verify business value and regulatory compliance, not just technical coverage
The urgency for testing transformation is driven by three converging forces:
Financial services now operate across voice, vision, and text simultaneously, yet traditional testing covers less than 50% of multimodal scenarios. When a customer queries "What were my dining expenses last month?" via voice while viewing charts, that's a multimodal interaction requiring coordinated testing.
Cross-bank payment clearing, domestic IT infrastructure (Xinchuang) compatibility, and fintech-partner integrations create coordination nightmares. Traditional approaches require multiple teams and redundant environments.
The National Financial Regulatory Administration's Implementation Plan for the High-Quality Development of Digital Finance explicitly requires a shift from "process digitization" to "value creation." Testing must now verify business value and regulatory compliance, not just technical coverage.
With 358 large model projects in banking during Q1-Q3 2025 (totaling ¥955 million) and tech investment projected to exceed ¥450 billion by 2028, testing capabilities must evolve at the same velocity as the systems they validate.
Modern multimodal testing has transcended traditional "text + numerical" limitations to achieve full-spectrum coverage across text, images, audio, video, and unstructured documents—with emphasis on contextual awareness.
In 2026, content must be optimized not just for human readers but for "synthesizers," meaning AI models that construct answers. This requires:
Structured data that LLMs can parse (schema.org/Article, schema.org/TechArticle)
Entity-based organization using Wikidata IDs for financial concepts
Bottom Line Up Front (BLUF) formatting for AI Overview extractability
demonstrates production-grade implementation:
Kubernetes-based microservice simulation
Real-time parsing of UI interaction video streams (mobile banking transfer navigation)
Voice command analysis (dialect queries for credit card bills)
Result: Multimodal test coverage increased from 92% → 96%
IoT sensor fusion (branch temperature/humidity, ATM vibration)
Full "test environment ↔ physical scenario" mapping
Result: Hardware failure prediction accuracy improved 85% → 89%
innovatively fuses unstructured data:
OCR: ID card tampering detection
NLP: Risk signal extraction from corporate public opinion texts
Graph learning: Related-party transaction subgraph construction
Result: Identity fraud defect detection increased 40%
Li Lihui, former President of Bank of China, notes: "Current multimodal large models can perceive, understand, and simulate the dynamic physical world; for example, simulating voice customer service under extreme network conditions, filling gaps left by traditional text-based testing."
China UnionPay's "general + specialized" agent matrix has evolved into a composite architecture featuring vertical domain division and cross-scenario collaboration.
2026 Multi-Agent Architecture for Financial Testing
| Layer | Components |
| Orchestration Layer | Cross-agent coordination; priority scheduling; compute resource allocation (lightweight/standard/high-performance) |
| Functional Layer | Test Case Agent; Automation Agent; Document Analysis Agent |
| Professional Domain Layer | Financial Calculation Agent; Compliance Document Agent; Regulatory Intelligence Agent |
| Fitness Layer | Evaluation Agent; Strategy Optimization Agent; Self-Healing Agent |
New capability: Multilingual corpus (English, Japanese, Southeast Asian languages)
Generates: "Ambiguous instruction + cross-border compliance" test cases
Example: "Repay Japanese yen credit card bill with US dollars" (implicit currency conversion)
Results: 12× test case generation increase; 60% cross-border coverage improvement
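The implicit-conversion example above can be written down as a concrete test case with a computed expected value. This is a minimal sketch, not the agent's actual output format: the `RepaymentCase` type, the `expected_debit` helper, and the exchange rate are all hypothetical and chosen only to keep the arithmetic easy to follow.

```python
from dataclasses import dataclass
from decimal import Decimal, ROUND_HALF_UP


@dataclass(frozen=True)
class RepaymentCase:
    """One generated test case: a bill in one currency paid from another."""
    instruction: str
    bill_currency: str
    bill_amount: Decimal
    pay_currency: str


def expected_debit(case: RepaymentCase, rates: dict) -> Decimal:
    """Amount to debit in the paying currency, rounded to two decimals.

    `rates` maps (from_currency, to_currency) to an illustrative rate.
    """
    if case.bill_currency == case.pay_currency:
        return case.bill_amount
    rate = rates[(case.bill_currency, case.pay_currency)]
    return (case.bill_amount * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Hypothetical rate, not market data.
RATES = {("JPY", "USD"): Decimal("0.0065")}

case = RepaymentCase(
    instruction="Repay Japanese yen credit card bill with US dollars",
    bill_currency="JPY",
    bill_amount=Decimal("30000"),
    pay_currency="USD",
)
debit = expected_debit(case, RATES)  # Decimal("195.00")
```

The instruction never mentions conversion, so the test oracle has to derive it from the two currencies; that is the "implicit" part a generated case needs to capture.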
Architecture: FinTeam multi-agent system with conversation memory mechanism
Capability: Simulates cross-platform user journeys (mobile banking account opening → online banking card binding → third-party payment authorization)
Context recognition accuracy: >98% (addresses lost context across multi-turn conversations)
Function: Scans test reports, user agreements for compliance defects
Detects: Vague statements, clause conflicts (e.g., "expected returns" missing risk warnings)
Efficiency: 20× faster than manual review; >92% defect identification rate
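One of the checks described above, flagging "expected returns" statements that lack a risk warning, can be approximated with plain pattern matching. The sketch below is a simplified stand-in for illustration only; the source describes a model-based agent, and the patterns and the function name `find_unwarned_return_claims` are assumptions.

```python
import re

# Illustrative patterns; a production scanner would need a far richer vocabulary.
RETURN_CLAIM = re.compile(r"expected returns?", re.IGNORECASE)
RISK_WARNING = re.compile(r"\brisk\b|not guaranteed", re.IGNORECASE)


def find_unwarned_return_claims(text: str) -> list:
    """Return sentences that mention expected returns without any risk wording."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [s for s in sentences if RETURN_CLAIM.search(s) and not RISK_WARNING.search(s)]


agreement = (
    "The product targets expected returns of 4% per year. "
    "Expected returns are not guaranteed and investment carries risk. "
    "Fees are charged monthly."
)
flagged = find_unwarned_return_claims(agreement)
# flagged contains only the first sentence
```

Checking per sentence is a deliberate simplification: a warning placed in a different clause of the agreement would not clear the flag, which errs toward over-reporting.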
(FinTeam "Accountant Agent" lineage)
Focus: Financial product pricing, fee calculations
Verifies: Floating LPR interest calculations, fund NAV accuracy
Supports: Excel formula parsing, blockchain transaction record comparison
Error rate: <0.1% for calculation-focused tests
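A floating-rate interest check of the kind listed above can be sketched as a reference calculation compared against the system under test. The day-count convention (actual/360), the rounding rule, and the sample inputs are assumptions for illustration; actual loan contracts define their own conventions, and the function names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP


def floating_rate_interest(principal, lpr_percent, spread_bps, days, day_count=360):
    """Simple interest for one period at LPR plus a fixed spread.

    Assumes an actual/360 day count and rounding to two decimals.
    """
    annual_rate = (lpr_percent + spread_bps / Decimal(100)) / Decimal(100)
    interest = principal * annual_rate * Decimal(days) / Decimal(day_count)
    return interest.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def verify(system_value, principal, lpr_percent, spread_bps, days, tolerance=Decimal("0.01")):
    """True if the system under test agrees with the reference within tolerance."""
    reference = floating_rate_interest(principal, lpr_percent, spread_bps, days)
    return abs(system_value - reference) <= tolerance


# Illustrative inputs: 1,000,000 principal, LPR 3.45%, +55 bps spread, 30 days.
reference = floating_rate_interest(Decimal("1000000"), Decimal("3.45"), Decimal("55"), 30)
# 1,000,000 * 4.00% * 30 / 360 = 3333.33
```

`Decimal` is used instead of `float` so that the reference value itself does not introduce binary rounding noise into a test whose tolerance is one hundredth of a currency unit.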
Function: Automated scanning of test reports, user agreements
Identifies: "Vague statements," "clause conflicts" (e.g., missing risk warnings on "expected returns")
Efficiency: 20× faster than manual review
Defect identification rate: >92%
(Tongdun Technology lineage)
Function: Analyzes historical defect patterns (e.g., frequent "UI button response delays")
Action: Automatically increases test case density for high-risk modules
Results: 30% testing efficiency increase; 50% defect recurrence reduction
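The idea of raising test case density where defects cluster can be illustrated with a proportional allocation. This sketch assumes a fixed test-case budget and a per-module minimum, neither of which is stated in the source; `allocate_test_cases` and the module names are hypothetical.

```python
def allocate_test_cases(defect_history: dict, total_cases: int, min_per_module: int = 1) -> dict:
    """Distribute a test-case budget in proportion to historical defect counts."""
    modules = list(defect_history)
    reserved = min_per_module * len(modules)
    if reserved > total_cases:
        raise ValueError("budget too small for the per-module minimum")
    remaining = total_cases - reserved
    total_defects = sum(defect_history.values())
    # With no defect history at all, fall back to equal weights.
    weights = defect_history if total_defects else {m: 1 for m in modules}
    weight_sum = sum(weights.values())
    shares = {m: remaining * weights[m] / weight_sum for m in modules}
    allocation = {m: min_per_module + int(shares[m]) for m in modules}
    # Hand out cases lost to truncation, largest fractional remainder first.
    leftover = total_cases - sum(allocation.values())
    by_remainder = sorted(modules, key=lambda m: shares[m] - int(shares[m]), reverse=True)
    for m in by_remainder[:leftover]:
        allocation[m] += 1
    return allocation


history = {"ui_buttons": 6, "login": 3, "reports": 1}
plan = allocate_test_cases(history, total_cases=100)
# plan == {"ui_buttons": 59, "login": 30, "reports": 11}
```

The per-module minimum keeps low-defect modules from dropping to zero coverage, which matters because a module with no recorded defects may simply be under-tested.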
Innovation: "Compliance quantification index" integration
Metrics: Beyond BLEU/semantic similarity, now includes regulatory compliance scoring
Regulations checked: Anti-money laundering, data security (per Implementation Plan)
Results: 80% evaluation efficiency increase; 35% reduction in the compliance missed-detection rate
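One way to picture a "compliance quantification index" sitting alongside a similarity metric is a weighted blend of the two. The source does not give the formula, so the pass-fraction index, the equal weighting, and the check names below are assumptions made purely for illustration.

```python
def compliance_index(checks: dict) -> float:
    """Fraction of regulatory checks passed, between 0.0 and 1.0."""
    if not checks:
        raise ValueError("no checks supplied")
    return sum(1 for passed in checks.values() if passed) / len(checks)


def evaluation_score(semantic_similarity: float, checks: dict, compliance_weight: float = 0.5) -> float:
    """Blend a similarity metric with the compliance index (hypothetical weighting)."""
    index = compliance_index(checks)
    return round((1 - compliance_weight) * semantic_similarity + compliance_weight * index, 4)


# Hypothetical check results for one generated test artifact.
checks = {
    "anti_money_laundering": True,
    "data_security": True,
    "risk_disclosure": False,
    "customer_consent": True,
}
score = evaluation_score(semantic_similarity=0.9, checks=checks)
# compliance index 0.75, blended score 0.825
```

A blended score makes an artifact that reads well but fails a regulatory check rank below one that is slightly less fluent but fully compliant, which is the behavior the index is meant to encourage.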
Self-healing technologies have expanded from code repair to encompass compliance verification and domestic IT (Xinchuang) adaptation.
Self-Healing Engine
| Module Category | Specific Module | Core Functions | Key Indicators |
| Basic Repair Modules | Code Repair Module | 1. C language vulnerability repair; 2. ESBMC-AI integration | Repair accuracy >90% |
| Basic Repair Modules | Environment Repair Module | 1. Xinchuang compatibility repair (KylinOS + JMeter); 2. Automatic plugin call | Success rate >85% |
| Verification and Report Modules | Compliance Verification Module | 1. Comparison of test reports with requirements; 2. Missing scenario detection | Verification accuracy >90% |
| Verification and Report Modules | Report Generation Module | 1. Automatic generation of risk summaries; 2. Marking of high-risk items | No clear quantitative indicators |
Original: C language vulnerability repair
New (2026): Xinchuang software/hardware compatibility repair
Capability: When compatibility issues arise between domestic OS (KylinOS) and testing tools (JMeter), automatically invokes Xinchuang adaptation plugins
Success rate: >85%
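The "automatically invokes adaptation plugins" behavior can be sketched as a registry keyed by the OS and tool pair. Everything here is hypothetical scaffolding: the registry, the `register` decorator, and the adapter body are illustrative and do not describe any real Xinchuang plugin interface.

```python
from typing import Callable, Dict, Optional, Tuple

PluginKey = Tuple[str, str]  # (operating system, testing tool)
_PLUGINS: Dict[PluginKey, Callable[[str], str]] = {}


def register(os_name: str, tool: str):
    """Decorator that registers an adaptation plugin for an OS/tool pair."""
    def decorator(func: Callable[[str], str]) -> Callable[[str], str]:
        _PLUGINS[(os_name, tool)] = func
        return func
    return decorator


@register("KylinOS", "JMeter")
def kylin_jmeter_adapter(error: str) -> str:
    # Placeholder action; a real plugin would patch configuration or dependencies.
    return f"applied KylinOS/JMeter adaptation for: {error}"


def self_heal(os_name: str, tool: str, error: str) -> Optional[str]:
    """Run the matching plugin, or return None so the issue is escalated."""
    plugin = _PLUGINS.get((os_name, tool))
    return plugin(error) if plugin else None


result = self_heal("KylinOS", "JMeter", "plugin failed to load")
```

Returning `None` for an unknown pair keeps the repair path explicit: anything without a registered plugin falls through to manual handling rather than being silently ignored.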
Integrates "requirements → testing → operations" chain
Test Flow generates Cypress scripts automatically synced to operations platform
Enables: One-click "test script → production monitoring rule" conversion
Results: UAT time reduced 70%; production issue root-cause tracing time reduced 50%
Large model compares test reports against business requirement documents
Identifies: "Omitted test scopes," "contradictory results"
Example detected: "Credit approval test" missing "non-local household registration customers" scenario
Accuracy: >90%; manual review time reduced 60%
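Once requirements and test reports have been reduced to lists of scenarios, detecting an omitted scope is a set difference. The sketch below shows only that final comparison step under the assumption that scenario extraction has already happened; the source attributes the extraction to a large model, and the "local household registration customers" scenario is added here as a hypothetical counterpart.

```python
def find_coverage_gaps(required: dict, tested: dict) -> dict:
    """Map each requirement to the scenarios it lists that no test covered."""
    gaps = {}
    for requirement, scenarios in required.items():
        missing = sorted(set(scenarios) - set(tested.get(requirement, ())))
        if missing:
            gaps[requirement] = missing
    return gaps


# Scenario lists as they might look after extraction from the documents.
required = {
    "credit approval test": [
        "local household registration customers",
        "non-local household registration customers",
    ],
}
tested = {
    "credit approval test": ["local household registration customers"],
}
gaps = find_coverage_gaps(required, tested)
# gaps == {"credit approval test": ["non-local household registration customers"]}
```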
Automatically generates test risk analysis summaries
Flags "high-risk items" (e.g., "cross-border payment test missing SWIFT code verification")
Benefit: Prioritizes critical scenario verification
Industrial Bank's "Agent Swarm Architecture" has evolved into a layered orchestration model addressing resource adaptation across institutions of varying sizes.
(aligned with China Fintech and Digital Finance Development Report trends)
| Tier | Use Case | Technologies | Example |
| Lightweight Compute | Interface functional tests | Small models + edge computing | Basic API validation |
| Standard Compute | Multimodal interaction tests | General LLMs + cloud servers | Voice + UI testing |
| High-Performance Compute | Encrypted link tests | Large models + quantum communication | Encrypted transfers |
Quantum integration: When testing a bank's encrypted transfer functions, the system automatically orchestrates quantum-safe transmission compute (referencing ICBC and Hua Xia Bank quantum technology practices)
Response speed: 40% improvement
Security compliance: Meets Level 3 protection standards
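The tiered compute model above amounts to a routing table from test type to compute tier. The sketch below encodes the three tiers from the table; the test-type keys and the fallback to the standard tier for unknown types are assumptions, not part of the described architecture.

```python
from enum import Enum


class ComputeTier(Enum):
    LIGHTWEIGHT = "small models + edge computing"
    STANDARD = "general LLMs + cloud servers"
    HIGH_PERFORMANCE = "large models + quantum communication"


# Tier assignments follow the table; the keys are hypothetical labels.
TIER_BY_TEST_TYPE = {
    "interface_functional": ComputeTier.LIGHTWEIGHT,
    "multimodal_interaction": ComputeTier.STANDARD,
    "encrypted_link": ComputeTier.HIGH_PERFORMANCE,
}


def select_tier(test_type: str) -> ComputeTier:
    """Pick a compute tier, defaulting to STANDARD for unknown test types."""
    return TIER_BY_TEST_TYPE.get(test_type, ComputeTier.STANDARD)


tier = select_tier("encrypted_link")  # ComputeTier.HIGH_PERFORMANCE
```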
New factor: "Business value weight"
High-priority classification: Digital yuan pilot tests, green credit tests
Example impact: Digital yuan red packet test response time: 5 minutes → 2 minutes; launch cycle accelerated by 3 days
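Scheduling with a "business value weight" can be sketched as a priority queue whose key multiplies a base urgency by that weight. The multiplication rule, the class name, and the numeric values are assumptions for illustration; the source states only that the weight is a new scheduling factor.

```python
import heapq
import itertools


class ValueWeightedScheduler:
    """Pops the task with the highest urgency * business value weight first."""

    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps insertion order

    def submit(self, name: str, urgency: float, business_value_weight: float = 1.0) -> None:
        priority = urgency * business_value_weight
        # heapq is a min-heap, so store the negated priority.
        heapq.heappush(self._heap, (-priority, next(self._counter), name))

    def next_task(self) -> str:
        return heapq.heappop(self._heap)[2]


scheduler = ValueWeightedScheduler()
scheduler.submit("basic API regression", urgency=5, business_value_weight=1.0)
scheduler.submit("digital yuan red packet test", urgency=4, business_value_weight=2.0)
scheduler.submit("green credit test", urgency=3, business_value_weight=2.0)

order = [scheduler.next_task() for _ in range(3)]
# digital yuan (8.0), then green credit (6.0), then basic API regression (5.0)
```

With the weight applied, the two high-priority categories named above overtake a task that has higher raw urgency, which is the effect the new factor is meant to produce.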
"Insufficient compute power, technical capability gaps" (China Internet Information Center)
Leading banks (ICBC) spearhead "cross-institution test orchestration platforms"
Smaller banks access shared LLM compute and test resources via open APIs
Case study: City commercial bank using ICBC's "ICBC Zhiyong" LLM for credit risk testing
Cost reduction: 50%
Coverage increase: 88%
Scenario: "Bank + e-commerce" (installment payment)
Coordination: Bank test agents ↔ e-commerce platform test interfaces
End-to-end coverage: Order generation → payment → reconciliation
Efficiency gain: 70%; eliminates "multi-team offline coordination"
ICBC's financial markets trading (forex, bonds) required comprehensive multimodal testing.
Multimodal data coverage:
Trading instruction text: "Buy $10 million against Euro"
K-line chart images: Support/resistance level annotation accuracy
Trading voice commands: Real-time English trader instructions
Multi-agent collaboration:
"Analyst Agent": Macroeconomic data impact analysis
"Accountant Agent": Transaction fee verification
"Compliance Agent": Foreign exchange control checks
End-to-end coverage: Trading decision → execution → compliance
| Metric | Improvement |
| Forex trading decision response test speed | 80% increase |
| Trading execution efficiency | 3× improvement |
| Related business revenue (H1 2025) | 15% YoY growth |
| Operational risk incidence | 62% decrease |
ICBC's implementation demonstrates strong E-E-A-T signals:
Experience: Real trading scenarios with multimodal interaction
Expertise: Domain-specific agents (Analyst, Accountant, Compliance)
Authoritativeness: Large-scale production deployment by tier-1 bank
Trustworthiness: 62% risk reduction validates reliability
The bank's credit business required comprehensive risk validation.
Multimodal risk signal testing:
OCR: ID card/business license tampering detection
NLP: "Debt default" extraction from public opinion texts
Graph learning: Guarantee chain construction (related-party risk)
Coverage: Identity verification → public opinion risk → related-party transactions
Test results → automatic risk analysis summaries
"High-risk items" flagged (e.g., "company address ≠ actual business location")
Automated comparison with regulatory requirements (Measures for Administration of Personal Loans)
Outcome: "Testing → compliance verification" automation
| Metric | Improvement |
| Identity fraud/document forgery detection | 40% increase |
| Credit testing cycle | 15 days → 7 days |
| NPL rate test prediction accuracy | 91% |
A small/medium bank needed SME loan testing capabilities.
PrivBayes synthetic test data (SME privacy protection)
Hengfeng Bank "Quanshutong" data governance tools for unstructured data (tax certificates)
Data preparation time reduction: 60%
Test Case Agent: SME loan use cases by industry
Automation Agent: Loan application → approval → disbursement simulation
Evaluation Agent: Policy alignment quantification
| Metric | Improvement |
| Inclusive finance test coverage | 65% → 88% |
| Testing costs | 50% reduction |
| Defects identified | 12 (tax data forgery, industry qualification mismatch) |
| Inclusive loan volume | 25% YoY increase |
A state-owned bank's IT localization (Xinchuang) project required comprehensive compatibility testing.
Numerical: Domestic server (Huawei Kunpeng) temperature, bandwidth
Image: Domestic OS (KylinOS) UI interactions
Text: Domestic database (Renmin Jincang) SQL execution
Validation scope: Hardware → OS → database compatibility
ESBMC-AI repairs Xinchuang code compatibility vulnerabilities
C language syntax adaptation for domestic compilers
Repair accuracy: >90%
Output: Automated Xinchuang test report generation
| Metric | Improvement |
| Test environment downtime | 45% reduction |
| Core system Xinchuang coverage | 92% |
| Xinchuang compatibility defects identified/repaired | 8 |
Synthesizing Top 10 Fintech Development Trends 2025 and industry practice, multimodal-AI agent synergy will accelerate toward "General-Purpose Intelligent Testing Entities."
Financial institutions must "establish digital transformation effectiveness evaluation systems"
From "coverage-first" → "coverage + quality/efficiency ratio"
General-purpose testing entities will include "quality/efficiency assessment modules" that automatically calculate:
Input: Compute resources, time
Output: Defect detection rate, business value enhancement
Sample output: "Testing cost reduced 30%, defect rate decreased 25%"
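The arithmetic behind a summary line like the sample output is a pair of percent-reduction calculations. This sketch shows only that calculation with invented before/after figures; the function names are hypothetical and the source does not specify how the assessment module is built.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, as a percentage."""
    if before <= 0:
        raise ValueError("baseline must be positive")
    return (before - after) / before * 100


def summarize(cost_before: float, cost_after: float,
              defect_rate_before: float, defect_rate_after: float) -> str:
    """Format cost and defect-rate changes as a one-line summary."""
    cost = percent_reduction(cost_before, cost_after)
    defects = percent_reduction(defect_rate_before, defect_rate_after)
    return f"Testing cost reduced {cost:.0f}%, defect rate decreased {defects:.0f}%"


# Invented figures: cost in currency units, defect rate in percent.
summary = summarize(cost_before=200_000, cost_after=140_000,
                    defect_rate_before=8.0, defect_rate_after=6.0)
# "Testing cost reduced 30%, defect rate decreased 25%"
```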
(2027 projection):
Quantum technology: Encrypted link testing for 80% of major bank core systems
Blockchain: Test data notarization
Edge computing: ATM offline scenario testing
Static data fusion → dynamic physical simulation
Earthquake scenarios: Mobile banking UI interaction testing
Network outage: System robustness validation
Method: Multimodal LLMs generate "extreme scenario test cases"
(2026):
Build "general testing agent ecosystems"
Open APIs for capability sharing
ICBC target: 100 small/medium banks accessing "ICBC Zhiyong" testing LLM interface by 2026
Focus on niche scenario lightweight testing
Example: Rural commercial banks specializing in "agriculture, rural areas, farmers" loan testing
Strategy: "Small but precise" differentiation vs. "big but comprehensive" resource waste
(2026-2027):
Voice commands: "Test credit card repayment interface performance"
AR interaction: Testers view mobile banking UI results via AR glasses, issue voice commands to adjust strategy
Drag-and-drop test scenario configuration
Business personnel build "credit approval test flow" without coding
Technical barrier reduction: 80%
Outcome: "Business-led testing" industry shift
| Year | Milestone |
| 2026 | Leading banks achieve core business coverage with General-Purpose Intelligent Testing Entities |
| 2027 | Small/medium institutions achieve large-scale adoption via ecosystem sharing |
| 2028 | Industry-wide: 60% testing efficiency increase; 55% defect rate reduction |
| 2028+ | Financial digital transformation progresses from "technology application" to "value cultivation" |