[ > ] WEB3 AI RESEARCH GROUP
AI Security Research
Benchmarks, audit pipelines, and measurable results for smart contract security
[ > ] WHAT WE OBSERVE
The tipping point
AI exploits 72% of known bugs, yet the best AI defender reaches only 55% precision. It is simultaneously powerful and unreliable.
AI audit precision
92% exploited post-audit
92% of exploited contracts in 2022 had already been audited by at least one firm. Audits alone don’t prevent exploits. (AnChain.AI, 2023)
55.3% best AI precision
The highest independently verified precision for a production AI audit tool is 55.3% (Sherlock AI v2.2, 2026). Raw LLMs range from 6% to ~50%.
200+ tools, minimal integration
Over 200 tools catalogued for smart contract security (Iuliano & Di Nucci, 2024). Best F1 among 17 tested scanners: 73%. Most score below 50%.
Hypothesis
We exist because we see this gap not as a verdict but as a research problem. Hybrid architectures that combine LLM reasoning with external verification push past boundaries that neither can cross alone.
[ > ] THREE CHALLENGES
What defines our work
CHALLENGE 01
Capability boundaries
AI catches known patterns at 73.7–100% F1 on lab benchmarks, but real-world recall drops to ~48%. It breaks down on business logic, cross-contract attacks, and novel vulnerabilities.
Definition of Done
Systematically map the boundary. Find specific architectural solutions (hybrid LLM + formal verification, code property graph (CPG) slicing) that extend capability today.
CHALLENGE 02
Reliability
Production tools sit at ~55% precision. Academic systems claim 91% F1, but only on sterile benchmarks. Self-correction without external feedback makes results worse.
Definition of Done
Precision >85% on production code. Every finding is verified by an external oracle (static analysis, formal verification, symbolic execution).
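A minimal sketch of what "verified by an external oracle" could look like in practice, assuming each oracle is a callable that independently confirms or rejects a finding. The `Finding` type, `verify_findings`, and `toy_static_oracle` are hypothetical names for illustration and not part of any existing tool; the data is made up.

```python
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass(frozen=True)
class Finding:
    contract: str
    vuln_type: str
    location: str

# Assumption: an oracle wraps an external checker (static analysis, formal
# verification, symbolic execution) and returns True when it confirms a finding.
Oracle = Callable[[Finding], bool]

def verify_findings(findings: Iterable[Finding], oracles: list[Oracle]) -> list[Finding]:
    """Keep only findings confirmed by at least one external oracle."""
    return [f for f in findings if any(oracle(f) for oracle in oracles)]

def precision(reported: list[Finding], true_bugs: set[Finding]) -> float:
    """Share of reported findings that are real bugs (0.0 if nothing is reported)."""
    if not reported:
        return 0.0
    return sum(f in true_bugs for f in reported) / len(reported)

# Toy data: three raw LLM findings, only the first is a real bug.
raw = [
    Finding("Vault", "reentrancy", "withdraw"),
    Finding("Vault", "overflow", "deposit"),
    Finding("Token", "access-control", "mint"),
]
true_bugs = {raw[0]}

def toy_static_oracle(f: Finding) -> bool:
    # Stand-in for a real static analyzer; here it confirms only reentrancy reports.
    return f.vuln_type == "reentrancy"

verified = verify_findings(raw, [toy_static_oracle])
print(precision(raw, true_bugs), precision(verified, true_bugs))
```

In this toy case the gate discards the two unconfirmed findings, so precision rises from one in three to one in one. The cost of such a gate is recall: any real bug that no oracle can confirm is dropped as well.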
CHALLENGE 03
Fragmentation
200+ tools catalogued, minimal integration. Existing tools could have prevented only 8% of high-impact attacks. 49% of findings resist automated detection entirely.
Definition of Done
A unified pipeline that selects the right tool for the task. For every pairing of vulnerability type and approach, measure precision and recall.
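The measurement described above can be sketched as a small scoring matrix, assuming labelled results are available as (vulnerability type, approach, predicted, actual) tuples. The function names, the approach labels "static" and "llm", and the sample rows are illustrative assumptions, not measured data.

```python
from collections import defaultdict

def score_matrix(results):
    """Map each (vuln_type, approach) pair to its (precision, recall)."""
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for vuln_type, approach, predicted, actual in results:
        c = counts[(vuln_type, approach)]
        if predicted and actual:
            c["tp"] += 1
        elif predicted and not actual:
            c["fp"] += 1
        elif actual:
            c["fn"] += 1
    matrix = {}
    for key, c in counts.items():
        reported = c["tp"] + c["fp"]
        real = c["tp"] + c["fn"]
        p = c["tp"] / reported if reported else 0.0
        r = c["tp"] / real if real else 0.0
        matrix[key] = (p, r)
    return matrix

def best_approach(matrix, vuln_type):
    """Pick the approach with the highest F1 for one vulnerability type."""
    def f1(pr):
        p, r = pr
        return 2 * p * r / (p + r) if p + r else 0.0
    candidates = {a: pr for (v, a), pr in matrix.items() if v == vuln_type}
    return max(candidates, key=lambda a: f1(candidates[a]))

# Toy rows: (vuln_type, approach, predicted, actual)
results = [
    ("reentrancy", "static", True, True),
    ("reentrancy", "static", True, True),
    ("reentrancy", "llm", True, True),
    ("reentrancy", "llm", True, False),
    ("logic", "static", False, True),
    ("logic", "static", False, True),
    ("logic", "llm", True, True),
    ("logic", "llm", False, True),
]
matrix = score_matrix(results)
print(best_approach(matrix, "reentrancy"), best_approach(matrix, "logic"))
```

Selecting by F1 is one possible routing rule; a pipeline that prioritises precision could instead rank on the first element of each pair.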
Latest Publications