[ > ] WEB3 AI RESEARCH GROUP

AI Security Research

Benchmarks, audit pipelines, and measurable results for smart contract security

[ > ] WHAT WE OBSERVE

The tipping point

AI can exploit 72% of known bugs, yet the best AI defender achieves only ~55% precision. Simultaneously powerful and unreliable.

AI audit precision: ~55% current best (Sherlock AI) vs. an 85% target, a 30pp gap to close.

- 92% exploited post-audit
- ~55% best AI precision
- 200+ tools, minimal integration

Hypothesis

We exist because we see this gap not as a verdict but as a research problem. Hybrid architectures that combine LLM reasoning with external verification push boundaries neither can cross alone.

[ > ] THREE CHALLENGES

What defines our work

CHALLENGE 01

Capability boundaries

AI catches patterns at 73.7-100% F1 on lab benchmarks, but real-world recall drops to ~48%. It breaks on business logic, cross-contract attacks, and novel vulnerabilities.

Definition of Done

Systematically explore the boundary. Find specific architectural solutions (hybrid LLM + formal verification, CPG slicing) that push capability now.
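One of the named techniques, CPG slicing, can be sketched as backward reachability over a code property graph: keep only the statements that can influence a given sink. This toy sketch works on a plain dependency dict rather than a real CPG; all names are illustrative, not part of any existing tool.

```python
from collections import deque

def backward_slice(edges: dict, sink: str) -> set:
    """Return every node that can reach `sink` (plus the sink itself).

    `edges` maps each node to the list of nodes it feeds into,
    standing in for data/control-flow edges of a real CPG.
    """
    # Invert the edges so we can walk from the sink backwards.
    parents: dict = {}
    for src, dsts in edges.items():
        for dst in dsts:
            parents.setdefault(dst, []).append(src)

    seen = {sink}
    queue = deque([sink])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen
```

The point of slicing in this context is prompt economy: the LLM reasons only over the slice that reaches a dangerous sink (an external call, a transfer) instead of the whole contract.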

CHALLENGE 02

Reliability

Production tools sit at ~55% precision. Academic systems claim 91% F1, but only on sterile benchmarks. Self-correction without external feedback makes results worse.

Definition of Done

Precision >85% on production code. Every finding verified by external oracle (static analysis, formal verification, symbolic execution).
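The oracle-gating idea amounts to a filter: a finding survives only if at least one external verifier independently confirms it. A minimal sketch, assuming hypothetical `Finding` and oracle interfaces (the real integrations would wrap tools like static analyzers or symbolic executors):

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass(frozen=True)
class Finding:
    contract: str
    vuln_type: str
    description: str

# An oracle takes a finding and returns True only if it can
# independently confirm it. These callables are placeholders
# for real tool integrations (static analysis, formal
# verification, symbolic execution).
Oracle = Callable[[Finding], bool]

def gate_findings(findings: List[Finding], oracles: List[Oracle]) -> List[Finding]:
    """Keep only findings confirmed by at least one external oracle."""
    return [f for f in findings if any(oracle(f) for oracle in oracles)]
```

Unconfirmed LLM output never reaches the report; precision is bought by discarding anything no oracle can reproduce.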

CHALLENGE 03

Fragmentation

200+ tools catalogued, minimal integration. Existing tools could have prevented only 8% of high-impact attacks. 49% of findings resist automated detection entirely.

Definition of Done

A unified pipeline that selects the right tool for each task, with precision and recall measured for every pairing of vulnerability type and approach.
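The measurement itself is standard: for each (tool, vulnerability type) cell, compare reported findings against labeled ground truth. A minimal sketch, with all data-structure shapes assumed for illustration:

```python
def precision_recall(reported: set, ground_truth: set) -> tuple:
    """precision = TP / |reported|, recall = TP / |ground truth|."""
    tp = len(reported & ground_truth)
    precision = tp / len(reported) if reported else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    return precision, recall

def score_matrix(results: dict, truth: dict) -> dict:
    """Build the per-(tool, vuln_type) scoreboard.

    results[(tool, vuln_type)] -> set of finding ids the tool reported
    truth[vuln_type]           -> set of labeled true finding ids
    Returns {(tool, vuln_type): (precision, recall)}.
    """
    return {
        (tool, vt): precision_recall(found, truth.get(vt, set()))
        for (tool, vt), found in results.items()
    }
```

A filled-in matrix is what lets the pipeline route each vulnerability class to whichever approach scores best on it, instead of running every tool on everything.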

Read Manifesto

Interested in our research?

View Research

Want to collaborate?

Get in Touch