Automated Accessibility Checks for Downloadable PDFs
Learn about the technical architecture, features, and the motivation behind the PDF A11y Auditor tool.
Development and License
Developed by Dr. Harald Hutter.
License: MIT License.
https://a11y-pdf-audit.fly.dev/
Open Source:
View the source code, contribute, or report issues in the GitHub repository.
Purpose and Idea
The a11y PDF Audit is a modular web application that automatically checks whether the PDF files offered on a website are accessible.
It crawls a given URL, downloads the PDFs it discovers, validates them with VeraPDF,
and generates structured HTML and PDF reports.
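The discovery step can be illustrated with a minimal sketch. It is not the tool's actual crawler: the names `LinkCollector`, `is_pdf`, and `find_pdfs`, the parameters `max_depth` and `max_pdfs`, and the caller-supplied `fetch_html` function are all assumptions made for illustration. It uses only the Python standard library and stays on the start URL's host.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collects absolute link targets from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page they were found on.
                self.links.append(urljoin(self.base_url, href))


def is_pdf(url):
    """Heuristic: treat URLs whose path ends in .pdf as PDF downloads."""
    return urlparse(url).path.lower().endswith(".pdf")


def find_pdfs(start_url, fetch_html, max_depth=2, max_pdfs=50):
    """Breadth-first crawl; fetch_html(url) -> str is supplied by the caller."""
    host = urlparse(start_url).netloc
    seen, pdfs = {start_url}, []
    queue = deque([(start_url, 0)])
    while queue and len(pdfs) < max_pdfs:
        url, depth = queue.popleft()
        parser = LinkCollector(url)
        parser.feed(fetch_html(url))
        for link in parser.links:
            if link in seen:
                continue
            seen.add(link)
            if is_pdf(link):
                pdfs.append(link)
                if len(pdfs) >= max_pdfs:
                    break
            elif depth < max_depth and urlparse(link).netloc == host:
                # Only follow same-host pages, and only up to max_depth.
                queue.append((link, depth + 1))
    return pdfs
```

Passing the fetch function in as a parameter keeps the sketch free of network code; a real crawler would also need timeouts, content-type checks, and error handling.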
VeraPDF is a purpose-built, open source, file-format validator covering all PDF/A and PDF/UA parts and conformance levels.
VeraPDF - Industry Supported PDF/A Validation
German Federal Monitoring Agency for Accessibility in Information Technology
The Federal Monitoring Agency for Accessibility of Information Technology (BFIT-Bund) began its work in autumn 2019. It was established on the basis of Section 13(3) of the Disability Equality Act (BGG). As the federal monitoring body, BFIT-Bund performs the monitoring, review, and reporting tasks for digital services provided by public bodies that the European Union (EU) directive assigns to Germany (Article 8 of Directive (EU) 2016/2102).
"Many people don't know that PDFs actually have to be barrier-free. There are still misunderstandings; for example, some people say that PDFs are not a website. But the situation is clear: PDFs must be just as accessible. I would like to clarify that."
Main Features (v1.3.0)
- Dual-Audit System: Validates PDFs simultaneously against the strict ISO PDF/UA-1 standard AND our custom, pragmatic ScreenReadable profile.
- Recursive Crawler: Searches websites for downloadable PDFs (configurable depth & limit) with smart error handling.
- Reporting: Generates detailed reports in PDF format (using WeasyPrint) with side-by-side Strict vs. ScreenReadable results.
- Web Interface: Easy-to-use Flask frontend with live server logs and report overview.
- Auto-Cleanup: Automatically deletes reports older than 14 days to preserve server storage.
- Perfect Performance: Achieves 100/100 in Google PageSpeed Insights (Performance, Accessibility, Best Practices, SEO).
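The auto-cleanup feature could look roughly like the following sketch. The function name `cleanup_reports`, the flat report directory, and the use of file modification time as the age criterion are assumptions; only the 14-day retention period comes from the feature list.

```python
import time
from pathlib import Path

MAX_AGE_DAYS = 14  # retention period stated in the feature list


def cleanup_reports(report_dir, max_age_days=MAX_AGE_DAYS, now=None):
    """Delete regular files in report_dir whose mtime is older than the cutoff.

    Returns the sorted names of the deleted files. The `now` parameter
    (seconds since the epoch) exists only to make the function testable.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    removed = []
    for path in Path(report_dir).iterdir():
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return sorted(removed)
```

A function like this could be called at application startup or from a scheduled job; which trigger the tool actually uses is not stated here.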
Limitations and Issues
VeraPDF vs. axesCheck (PAC)
There is a known discrepancy between VeraPDF (used by this tool) and
axesCheck/PAC regarding ISO 14289-1:2014 (PDF/UA-1), specifically rule 7.5 (Tables).
- VeraPDF tends to be very strict and may report `FAIL` on tables where the headers cannot be determined algorithmically according to its strict interpretation of the standard.
- axesCheck might pass the same file if the logical structure is semantically sufficient for screen readers.
- Solution: We introduced the ScreenReadable Profile alongside the strict check to bridge this gap.
Solution: The "ScreenReadable" Profile
To bridge the gap between strict ISO validators and what works in practice with screen readers (such as JAWS or NVDA) and checkers such as axesCheck, this tool runs a dual audit. It first tests each PDF against the strict PDF/UA-1 standard and then against a custom ScreenReadable profile, which ignores visual font metrics and strict matrix checks.
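The relationship between the two verdicts can be modeled with a small sketch. It does not show how the tool invokes VeraPDF or defines its profile; it only assumes that the ScreenReadable result equals the strict result minus a set of excluded rules. The rule identifiers in `EXCLUDED_RULES` are placeholders, and `DualAuditResult` and `dual_audit` are hypothetical names.

```python
from dataclasses import dataclass

# Placeholder identifiers only; the tool's real list of excluded rules
# is not reproduced here.
EXCLUDED_RULES = frozenset({"example-font-metrics-rule", "example-table-matrix-rule"})


@dataclass(frozen=True)
class DualAuditResult:
    strict_pass: bool            # no failed rules at all
    screenreadable_pass: bool    # no failed rules outside the excluded set
    remaining_failures: tuple    # failures that still count under the relaxed profile


def dual_audit(failed_rules, excluded=EXCLUDED_RULES):
    """Derive both verdicts from the rule IDs that failed the strict check."""
    failed = set(failed_rules)
    remaining = failed - excluded
    return DualAuditResult(
        strict_pass=not failed,
        screenreadable_pass=not remaining,
        remaining_failures=tuple(sorted(remaining)),
    )
```

Under this model, a file that fails only excluded rules is reported as failing the strict audit but passing the ScreenReadable audit, which matches the side-by-side presentation described in the feature list.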
Quality and Testing
| Tool | Purpose | Status / Result |
|---|---|---|
| flake8 / djlint | Formatting & Style Checking | No critical issues found. |
| pylint | Code Quality / Docstrings Review | Score: > 9.5 / 10 points. |
| bandit | Security Analysis | No high severity findings. |
| radon cc | Cyclomatic Complexity Tests | Mainly A-level functions. |
| PageSpeed Insights | Suggestions on how a page may be improved | Performance (LCP = 0.2 s): 100, Accessibility: 100, Best Practices: 100, SEO: 100 |
| Screaming Frog SEO Spider | Crawling up to 500 URLs at a time | All reported issues solved. |