Automated Accessibility Checks for Downloadable PDFs

Learn about the technical architecture, features, and the motivation behind the PDF A11y Auditor tool.

Development and License

Developed by Dr. Harald Hutter.
License: MIT License.
https://a11y-pdf-audit.fly.dev/

Open Source:

View the source code, contribute, or report issues in the GitHub repository.

Purpose and Idea

The a11y PDF Audit is a modular web application that automatically checks the PDF files offered on a website for accessibility. It crawls a given URL, downloads the PDFs it discovers, validates them with VeraPDF, and generates structured HTML and PDF reports.
VeraPDF is a purpose-built, open source, file-format validator covering all PDF/A and PDF/UA parts and conformance levels.
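
The crawl-and-discover step described above can be pictured with a short sketch. This is not the tool's actual crawler: the names `LinkParser`, `find_pdfs`, `fetch_html`, `max_depth` and `max_pages` are hypothetical, and the page fetcher is passed in by the caller so the example does not depend on any particular HTTP library.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def find_pdfs(start_url, fetch_html, max_depth=2, max_pages=50):
    """Breadth-first crawl of one site; returns sorted absolute PDF URLs.

    fetch_html(url) -> str is a hypothetical hook supplied by the caller.
    max_depth and max_pages are assumed names for the depth and limit settings.
    """
    site = urlparse(start_url).netloc
    queue = deque([(start_url, 0)])
    seen = {start_url}
    pdfs = set()
    pages = 0
    while queue and pages < max_pages:
        url, depth = queue.popleft()
        parser = LinkParser()
        parser.feed(fetch_html(url))
        pages += 1
        for href in parser.hrefs:
            # Resolve relative links and drop any #fragment.
            link = urldefrag(urljoin(url, href)).url
            if urlparse(link).path.lower().endswith(".pdf"):
                pdfs.add(link)
            elif (
                depth < max_depth
                and urlparse(link).netloc == site
                and link not in seen
            ):
                # Only follow unseen links on the same host, within the depth budget.
                seen.add(link)
                queue.append((link, depth + 1))
    return sorted(pdfs)
```

In this sketch a PDF is recognised purely by its `.pdf` path suffix; a production crawler would likely also inspect the `Content-Type` header, since some sites serve PDFs from URLs without an extension.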

🔍 VeraPDF - Industry Supported PDF/A Validation

German Federal Monitoring Agency for Accessibility in Information Technology

The Federal Monitoring Agency for Accessibility of Information Technology (BFIT-Bund) began its work in autumn 2019. It was established on the basis of Section 13(3) of the Disability Equality Act (BGG). As the federal monitoring body, BFIT-Bund performs the tasks assigned to Germany by Article 8 of Directive (EU) 2016/2102: the monitoring, review and reporting of digital services provided by public bodies.

"Many people don't know that PDFs actually have to be barrier-free. There are still misunderstandings, e.g. some people say that PDFs are not a website - but it is clear, and PDFs must be just as accessible. I would like to clarify that."

— Michael Wahl, Head of the German Federal Monitoring Agency for Accessibility in Information Technology

Main Features (v1.3.0)

🔍 Dual-Audit System: Validates every PDF against both the strict ISO PDF/UA-1 standard and our custom, pragmatic ScreenReadable profile.
🌐 Recursive Crawler: Searches websites for downloadable PDFs (configurable depth & limit) with smart error handling.
📊 Reporting: Generates detailed reports in PDF format (using WeasyPrint) with side-by-side Strict vs. ScreenReadable results.
💻 Web Interface: Easy-to-use Flask frontend with live server logs and report overview.
🧹 Auto-Cleanup: Automatically deletes reports older than 14 days to preserve server storage.
💯 Perfect Performance: Achieves 100/100 in Google PageSpeed Insights (Performance, Accessibility, Best Practices, SEO).
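
As an illustration of the Auto-Cleanup feature, the following is a minimal sketch of age-based deletion, assuming reports are stored as plain files in a single directory and that file modification time is what determines a report's age. The function name `delete_old_reports` and its parameters are hypothetical; only the 14-day retention period comes from the list above.

```python
import time
from pathlib import Path

MAX_AGE_DAYS = 14  # retention period stated in the feature list


def delete_old_reports(report_dir, max_age_days=MAX_AGE_DAYS, now=None):
    """Delete regular files in report_dir last modified before the cutoff.

    `now` (seconds since the epoch) can be injected for testing.
    Returns the names of the deleted files, sorted.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    deleted = []
    for path in Path(report_dir).iterdir():
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            deleted.append(path.name)
    return sorted(deleted)
```

A function like this could be called from a scheduled job or at application start-up; subdirectories are deliberately left untouched in this sketch.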

Limitations and Issues

VeraPDF vs. axesCheck (PAC)
There is a known discrepancy between VeraPDF (used by this tool) and axesCheck/PAC regarding ISO 14289-1:2014 (PDF/UA-1), specifically rule 7.5 (Tables).

VeraPDF tends to be very strict and may report `FAIL` on tables where the headers cannot be determined algorithmically according to its strict interpretation of the standard.
axesCheck might pass the same file if the logical structure is semantically sufficient for screen readers.
Solution: We introduced the ScreenReadable Profile alongside the strict check to bridge this gap.

Solution: The "ScreenReadable" Profile

To bridge the gap between strict ISO validators and real-world screen reader behavior (for example JAWS or NVDA, and more lenient checkers such as axesCheck), this tool runs a dual audit: it first tests each PDF against the strict PDF/UA-1 standard and then against a custom ScreenReadable profile, which ignores visual font metrics and strict matrix checks.
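
The effect of the dual audit can be modelled with a small sketch. It assumes the validator yields a list of failed rule identifiers per file and treats the ScreenReadable profile as "strict minus an exclusion list". The rule identifiers below are made-up placeholders, not the tool's real exclusions, and the actual mechanism (for instance a custom VeraPDF profile file) may differ from this post-filtering approach.

```python
# Hypothetical rule identifiers, for illustration only.
EXCLUDED_FROM_SCREENREADABLE = frozenset({"table-header-matrix", "font-metrics"})


def dual_verdict(failed_rules, excluded=EXCLUDED_FROM_SCREENREADABLE):
    """Derive a strict and a ScreenReadable verdict from one set of failures."""
    failed = set(failed_rules)
    remaining = failed - excluded  # failures the lenient profile still counts
    return {
        "strict": "FAIL" if failed else "PASS",
        "screenreadable": "FAIL" if remaining else "PASS",
        "ignored": sorted(failed & excluded),
    }


# A file that only trips an excluded rule fails strictly but passes ScreenReadable.
print(dual_verdict(["table-header-matrix"]))
```

Reporting the ignored rules alongside both verdicts keeps the lenient result transparent: a reader can see exactly which strict failures were set aside.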

Quality and Testing

Tool	Purpose	Status / Result
✅ flake8 / djlint	Formatting & Style Checking	No critical issues found.
⭐ pylint	Code Quality / Docstrings Review	Score: > 9.5 / 10 points.
🔒 bandit	Security Analysis	No high severity findings.
🌿 radon cc	Cyclomatic Complexity Tests	Mainly A-level functions.
🚀 PageSpeed Insights	Page Performance & Quality Audit	Performance: 100 (LCP = 0.2 s), Accessibility: 100, Best Practices: 100, SEO: 100
🔐 Screaming Frog SEO Spider	Site Crawl & SEO Audit (up to 500 URLs per crawl)	All reported issues resolved.