EU AI ACT · META · AUTOMATED VS HUMAN AUDIT · DOG-FOODING

I scanned my own audit service with AIR Blackbox: 6/57 passed

Piotr Reder · aiactaudit.pl · 5 May 2026 · ~10 min read

I sell EU AI Act audits. The obvious question: would my own service pass one?

This morning (5 May 2026) I installed AIR Blackbox, an open-source EU AI Act compliance scanner: 51+ automated checks covering Articles 9-15, post-quantum signed evidence, Apache 2.0 licence. It is technically deep, and a few EU SaaS companies run it in production. An ideal tool for this experiment.

I ran it against the aiactaudit.pl source. The result:

6 of 57 checks passing · Static: 6/49 · Runtime: 0/8

Embarrassing? Not necessarily. This result says something important about automated compliance tools, and about WHY every audit needs human interpretation.

TL;DR

Automated EU AI Act tools (such as AIR Blackbox) scan AI applications: projects with LLM API integration, ML pipelines, agent orchestration. My audit service is a landing page plus a service workflow, NOT an AI app. Roughly 80% of the checks target the behaviour of an AI system (prompt injection, automation bias, model drift) and are irrelevant to my use case. The takeaway: a scope-aware audit (the kind we deliver) beats a one-size-fits-all checklist. Automated tools are valuable, BUT they need human interpretation to work out what applies to YOUR use case.

Experiment setup

Step by step, nothing hidden:

# Install (requires Python 3.10+)
brew install python@3.11
python3.11 -m pip install --user air-blackbox

# Run static code scan
air-blackbox comply --scan /path/to/aiactaudit.pl --no-llm

Project scanned: the _AIAct/ directory, containing:

14 HTML files (landing, articles, sample audit, calculator, intake form)
1 Python file (render_audit.py, a template renderer for audit deliverables)
JSON + CSV (lead lists, configs)
Markdown docs (research, internal notes)

Note: aiactaudit.pl does NOT use an LLM API in production. It is a service-business landing page, NOT an AI application. This detail is key to interpreting the results.

What AIR Blackbox checked: 7 articles plus GDPR cross-checks

Category	Checks	My result
Article 9 — Risk Management	~7	1 pass / 4 warn / 2 fail
Article 10 — Data Governance	~5	0 pass / 3 warn / 2 fail
Article 11 — Technical Documentation	~5	1 pass / 2 warn / 2 fail
Article 12 — Logging & Audit Trail	~5	0 pass / 1 warn / 4 fail
Article 13 — Transparency	~4	0 pass / 4 warn / 0 fail
Article 14 — Human Oversight + Agent Boundaries	~10	0 pass / 9 warn / 1 fail
Article 15 — Accuracy / Cybersecurity	~10	3 pass / 5 warn / 2 fail
GDPR cross-checks	~8	0 pass / 8 warn / 0 fail

Top 5 fails: what they ACTUALLY mean

1. ❌ "No risk classification (Article 6)"

What AIR Blackbox says: "Article 6 requires a risk classification of the system. No documentation found."

Reality for me: Article 6 applies to AI systems. My system is a service business with a landing page. It is like scanning a café for "missing aircraft autopilot certification". The check is technically correct BUT not applicable.

What the fix takes: add a RISK_CLASSIFICATION.md to the project root stating explicitly that this is not an AI system within the meaning of Article 3(1), so the Article 6 classification rules do not apply.

2. ❌ "No logging infrastructure detected"

What AIR Blackbox says: "No Python logging framework, no tamper-evident audit chain (Article 12 requires)."

Reality: aiactaudit.pl is static HTML deployed on Vercel, which has its own logging. And I am a service provider: my "audit trail" is emails, intake form submissions and GA4 events. Article 12 requires logging for high-risk AI systems, NOT for landing pages.

The real gap: I would need formal Article 12 logging IF my service used AI internally for audit work (e.g. an LLM helping to classify against Annex III). Currently I use Claude Code to prepare content (NOT for runtime audit decisions). That is limited risk at worst.

3. ❌ "Token expiry / execution bounding" (Article 14)

What AIR Blackbox says: "The agent can run indefinitely without bounds."

Reality: AIR Blackbox checks for autonomous AI agents in production. My product is a landing page plus an email service. I have no autonomous agents at runtime. Check N/A.

4. ❌ "Data governance documentation"

What AIR Blackbox says: "Article 10 requires data governance docs. None found."

Reality: Article 10 covers the training, validation and testing data of AI systems. I don't train models. My "data" is lead-list CSVs and intake form submissions. GDPR data protection applies, NOT Article 10. That is a matter for privacy.html (which I have) and the RoPA (which I have), not Article 10 governance.

5. ❌ "GDPR consent management patterns"

What AIR Blackbox says: "No consent patterns in code."

Reality: I have cookie consent (static, in the HTML), a privacy policy, and an intake form with explicit consent text. AIR Blackbox scans for code patterns (a regex match for consent_ in Python), not for compliance text in HTML. Static analysis has blind spots.
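A minimal sketch of that blind spot, not AIR Blackbox's actual implementation: the regex, the function name and the toy file contents below are all assumptions made for illustration. It shows how a scanner that only looks for `consent_` identifiers in Python files reports nothing when consent is handled in HTML.

```python
import re

# Hypothetical pattern: the scanner's real rule is not shown in this post.
CONSENT_PATTERN = re.compile(r"\bconsent_\w+")

def scan_for_consent(files: dict[str, str]) -> list[str]:
    """Return the names of Python files whose source matches the consent pattern."""
    return [
        name
        for name, source in files.items()
        if name.endswith(".py") and CONSENT_PATTERN.search(source)
    ]

# Toy project: consent lives in the HTML form, not in Python identifiers.
project = {
    "intake.html": '<label><input type="checkbox" required> I consent to the processing of my data</label>',
    "render_audit.py": "def render(template, data):\n    return template.format(**data)\n",
}

hits = scan_for_consent(project)
print(hits or "no consent patterns found")
```

The HTML file carries the consent text, but it is never inspected, so the scan comes back empty even though a human reviewer would mark the requirement as met.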

What all this says about automated tools

AIR Blackbox is a great tool for what it was designed to do: scanning AI applications built on LLM APIs, agent frameworks and ML pipelines. The full feature set:

51+ automated checks for Articles 9-15 (code patterns, config, docs)
Runtime monitoring through a gateway proxy (intercepting LLM calls)
Post-quantum signed evidence using ML-DSA-65 (FIPS 204)
Multi-framework mapping (ISO 42001, NIST AI RMF, Colorado SB 24-205)

For LangChain agent farms, production OpenAI Assistants deployments and custom LLM apps, it is a game-changer: continuous monitoring, audit-ready evidence, integration with 7+ frameworks.

But, and here is the nuance, an automated scanner does not know whether your project is:

An AI application (where all 57 checks apply)
A service business with AI in internal tooling (where ~30% apply, plus extra checks on delivery)
A static landing page (where ~10% apply and most are N/A)
A SaaS using an LLM API (where applicability depends on whether the use case falls under Annex III)

These categories need human classification first. Otherwise you get a report where 80% of the fails are irrelevant, and the 20% that are real gaps get lost in the noise.

Insight: an automated tool that "scans everything" produces both false positives and false negatives. A real audit starts with scope definition: "what exactly are we checking, for which risk, in which use-case context". Without scope, the report is noise.

What AIR Blackbox caught as real (Pricora comparison)

I also scanned pricora_platform/ (my second project, a Next.js SaaS for accounting firms in Poland). The result:

Pricora: 10/57 passing, 4 more than aiactaudit.pl

Pricora has more passing checks because:

Larger codebase (Next.js + TypeScript) → more patterns detected
Structured logging hints in the code (NOT ML logging, but Sentry-style)
Authentication patterns (Supabase Auth), which partially match agent identity binding
API security headers + rate limiting in Next.js middleware

Pricora is NOT an AI app either: it is a pricing calculator. Same scope mismatch as aiactaudit.pl. But more code means more incidental matches.

Conclusion: both scores (6/57 and 10/57) are scope-mismatched. Neither project falls under Annex III, so neither is high-risk. At worst they are limited risk (for Pricora one could argue that the Art. 50 transparency obligations apply because it has a calculator).

Why a scope-aware audit (ours, €799) beats an automated checklist

Step by step, what the audit does on day one:

1. Annex III risk classification: is the project high-risk or not? Decision tree here. Without this classification, 80% of the remaining checks are N/A.
2. Provider vs deployer scope: who is responsible for what? GPAI obligations details.
3. System inventory: what exactly counts as an "AI system" in the project. Often NOT everything (e.g. sentiment analysis for customer support ≠ AI system per the AI Act).
4. Articles 9-15 applicability matrix: which article applies to the identified AI systems. Art. 10 detail, Art. 14 detail.
5. Gap analysis: ACTUAL gaps relative to the checks that really apply, NOT a static checklist.
6. Roadmap: a prioritised action plan with effort estimates.

This takes human judgment. An automated tool can support each of these steps (AIR Blackbox is great for #6 once #1-3 are done), but it does not replace them.

A real use case for AIR Blackbox

If someone ordered an audit from me and showed up with, say, an HR-Tech SaaS (Annex III #4, employment), I would:

Classify it as high-risk under Annex III #4 (manual, 2h of work)
Run AIR Blackbox on their code (15 min)
Filter the results: ~40/57 checks apply to HR-Tech
Interpret the findings by hand: which gaps are real and which are false positives
Map each real gap to an Article + remediation effort
Build the roadmap PDF + Loom walkthrough

This is the hybrid approach: the tool for automation, the human for scope + judgment + delivery. The €799 fee covers the human work; the tool is free Apache 2.0 OSS.
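The filter-and-interpret steps of that workflow could be sketched roughly as follows. Everything here is illustrative: the `Finding` fields, the check identifiers and the sample verdicts are made up for the example and do not reflect AIR Blackbox's real output schema.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    check_id: str  # hypothetical identifier, not the scanner's real schema
    article: str
    status: str    # "pass", "warn" or "fail"

def triage(findings, not_applicable, false_positives):
    """Keep non-passing findings that are in scope and not dismissed by a reviewer."""
    return [
        f
        for f in findings
        if f.status != "pass"
        and f.check_id not in not_applicable
        and f.check_id not in false_positives
    ]

findings = [
    Finding("risk-management-docs", "Art. 9", "fail"),
    Finding("audit-trail", "Art. 12", "fail"),
    Finding("agent-token-expiry", "Art. 14", "fail"),
    Finding("consent-patterns", "GDPR", "warn"),
    Finding("security-headers", "Art. 15", "pass"),
]
not_applicable = {"agent-token-expiry"}  # scoping decision: no autonomous agents
false_positives = {"consent-patterns"}   # reviewer verdict: consent handled in HTML

real_gaps = triage(findings, not_applicable, false_positives)
for gap in real_gaps:
    print(f"{gap.article}: {gap.check_id} ({gap.status})")
```

The two sets are the human part of the process: one comes out of scoping, the other out of reviewing each finding. The scanner only supplies the raw list.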

Practical takeaways for EU SMB SaaS

Don't fear automated tools: AIR Blackbox, VerifyWise and the like are free and valuable. Run them on your own code.
Don't trust raw scores: 6/57 for a service business is fine. 6/57 for a high-risk AI app is a red alert. Interpretation matters.
Map scope first: Annex III classification + system inventory MUST be done before scoring. Otherwise the report is noise.
Use AIR Blackbox for actual AI deployments: if you run a LangChain agent farm in production, this tool is worth €100k in saved consulting. Run it.
Don't pay for "automated audit" services that simply run such tools and print the report: the value is in the interpretation, NOT the scan.

Honesty disclaimer (eat your own dog food)

You might think: "the guy sells audits and has 6/57 passing, what a joke". Fair point. But note:

My product is NOT an AI app, it is a service. Article applicability is different for it.
I WILL still make the real fixes that do apply (there are a few, identifiable among the 31 warnings):
- Add an explicit RISK_CLASSIFICATION.md stating "service business, not an AI system within the meaning of Article 3(1), outside Annex III"
- Document RoPA.md (Records of Processing Activities) for GDPR: even though I have a privacy policy, a formal RoPA in the repository cannot hurt
- Add SECURITY.md describing the data flow (lead CSV, intake form, audit deliverables)
Then run the scan again post-fix → the score should improve to ~15/57. Still ~70% N/A because of the scope mismatch.

This honesty is my value proposition. EU AI Act consultants often sell "100% compliant" claims that are misleading in the FTC sense. I sell "clarity": exactly what applies, what doesn't, what to fix, and in what priority.

Check your real EU AI Act scope

Order an audit for €799 (founding tier, limited to 10 spots). Within 5 days you get: Annex III classification, system inventory, gap analysis for Articles 9-15, prioritised roadmap. PDF + Loom walkthrough. 30-day money-back guarantee.

Order an audit →

Tools mentioned

AIR Blackbox (Apache 2.0) — pip install air-blackbox
VerifyWise (BSL 1.1) — self-hosted compliance platform
EU Compliance Bridge (EUPL-1.2) — AI Act + EAA mapping

All have publicly available source (VerifyWise's BSL 1.1 is source-available rather than an OSI-approved open-source licence). All are valuable for the appropriate use case. None replaces a human-led audit.

What's next

Plan for the rest of the month:

Run AIR Blackbox monthly on client projects (post-audit) to track compliance drift
Contribute back: fix the scope-classification UX issue upstream (PR to airblackbox/airblackbox)
Test the eucompliance AI Act + EAA bridge package
Publish the next blog post: "EU AI Act for US SaaS expanding to EU" (82% of the traffic in our analytics comes from the US, so write content for them)
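The monthly drift tracking from the first item of the plan could look roughly like this. It is a sketch under assumptions: each scan is reduced to a plain mapping of check name to status, and the check names and statuses are invented for the example.

```python
def compliance_drift(
    previous: dict[str, str], current: dict[str, str]
) -> dict[str, tuple[str, str]]:
    """Return checks whose status changed between two scans as {check: (before, after)}.

    Checks that only exist in the previous scan are ignored in this sketch.
    """
    return {
        check: (previous.get(check, "missing"), status)
        for check, status in current.items()
        if previous.get(check, "missing") != status
    }

# Hypothetical snapshots of two monthly scans.
april = {"logging": "fail", "risk-docs": "warn", "security-headers": "pass"}
may = {"logging": "warn", "risk-docs": "warn", "security-headers": "fail"}

drift = compliance_drift(april, may)
regressions = [
    check for check, (before, after) in drift.items()
    if before == "pass" and after != "pass"
]
print(drift)
print("regressions:", regressions)
```

Comparing statuses per check, rather than the headline score, is what surfaces a regression even when the total number of passing checks stays the same.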

Disclaimer: this article is informational, NOT legal advice. The specific implications for your system require a legal opinion from a lawyer specialising in the EU AI Act, combined with a technical audit.