Methodology & Dataset Schema
State of Small Business Websites 2026 — Dataset
292 rows. 41 columns. Open data under CC BY 4.0.
Source
DeepAudit AI scan of small business websites sourced from a ZoomInfo prospect list, collected in Q1 2026 through the Cold Call Engine pipeline at Axion Deep Digital. Each site was rendered in a headless Chromium browser (Puppeteer) and evaluated across 100+ technical SEO, performance, accessibility, and security checks.
Anonymization
Personally identifying fields (contact name, email, phone, URL, domain, company name, city) are removed. Each site is identified by site_id, the first 16 hexadecimal characters of the SHA-256 hash of the original domain.
To verify your own site's row:
import hashlib
site_id = hashlib.sha256("yourdomain.com".encode()).hexdigest()[:16]

State is retained at the US-state granularity. The 292 rows are not re-identifiable from state alone.
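If the hash of your domain does not match any row, the spelling that was hashed may differ from the one you typed. The sketch below computes site_id for a few plausible spellings. The variants it tries (lowercasing, stripping a leading "www.") and the helper names are assumptions for illustration; the dataset does not document how domains were normalized before hashing.

```python
import hashlib


def site_id_for(domain: str) -> str:
    """First 16 hex characters of the SHA-256 digest of the domain string."""
    return hashlib.sha256(domain.encode()).hexdigest()[:16]


def candidate_site_ids(domain: str) -> dict:
    """Map a few plausible spellings of a domain to their site_id.

    Assumption: the normalization rules are guesses, not the pipeline's
    documented behavior.
    """
    lowered = domain.strip().lower()
    variants = [domain, lowered, lowered.removeprefix("www.")]
    # dict.fromkeys keeps insertion order and drops duplicate spellings.
    return {v: site_id_for(v) for v in dict.fromkeys(variants)}
```

Each returned value can then be compared against the site_id column.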
Columns
Identity
- site_id: SHA-256 prefix of original domain (16 chars)
- state: US state code
Overall score
- overall_score: DeepAudit composite score (0-100)
- overall_grade: letter grade (A+ through F)
Mobile Lighthouse (PageSpeed Insights, mobile form factor)
- mobile_lh_valid: 1 if PageSpeed returned useful mobile data, 0 if not
- mobile_performance, mobile_seo, mobile_accessibility, mobile_best_practices: Lighthouse category scores (0-100)
- mobile_lcp_seconds: Largest Contentful Paint, seconds
- mobile_fcp_seconds: First Contentful Paint, seconds
- mobile_cls: Cumulative Layout Shift (unitless)
- mobile_lcp_bucket, mobile_fcp_bucket, mobile_cls_bucket: Google's good / needs_improvement / poor thresholds
- mobile_cwv_all_three: pass if LCP ≤ 2.5 s AND FCP ≤ 1.8 s AND CLS ≤ 0.1, else fail
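The bucket and mobile_cwv_all_three columns can be recomputed from the raw metrics. This is a minimal sketch, not the pipeline's own code. The "good" limits come from the column definition above; the "poor" cut-offs (4.0 s, 3.0 s, 0.25) are taken from Google's published thresholds and are assumed to match what the dataset used. The function and constant names are illustrative.

```python
# (good_max, needs_improvement_max); anything above the second value is poor.
THRESHOLDS = {
    "lcp": (2.5, 4.0),   # seconds
    "fcp": (1.8, 3.0),   # seconds
    "cls": (0.1, 0.25),  # unitless
}


def bucket(metric: str, value: float) -> str:
    """Classify one metric value as good / needs_improvement / poor."""
    good_max, needs_improvement_max = THRESHOLDS[metric]
    if value <= good_max:
        return "good"
    if value <= needs_improvement_max:
        return "needs_improvement"
    return "poor"


def cwv_all_three(lcp_seconds: float, fcp_seconds: float, cls: float) -> str:
    """Return "pass" only if all three metrics meet their good limit."""
    ok = lcp_seconds <= 2.5 and fcp_seconds <= 1.8 and cls <= 0.1
    return "pass" if ok else "fail"
```

For example, bucket("lcp", 3.0) gives "needs_improvement", and a site with LCP 2.0 s, FCP 2.0 s, CLS 0.05 fails the combined check because of FCP alone.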
Desktop Lighthouse
Same shape as mobile, with desktop_ prefix.
Authority
open_pagerank— Open PageRank value (0-10 scale, proxy for domain authority)
Specific high-signal checks (pass / warn / fail / empty)
- check_link_labels: axe-core Link Labels accessibility check
- check_focus_indicators: axe-core focus visibility check
- check_form_labels: axe-core form labeling check
- check_html_validation: W3C HTML validator
- check_h1_tag: proper H1 tag presence
- check_json_ld: JSON-LD structured data presence
- check_sitemap_xml: sitemap.xml discoverable
- check_hsts: HTTP Strict Transport Security header
- check_primary_keyword: primary keyword placement in visible content
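A quick way to summarize one of these check columns is to tally its four possible values. The sketch below assumes rows are dictionaries keyed by column name (as csv.DictReader would produce) and that an "empty" result is stored as a blank cell; both are assumptions about the file layout, and status_counts is an illustrative name.

```python
from collections import Counter


def status_counts(rows, column):
    """Count pass / warn / fail / empty values in one check_* column."""
    # Assumption: an "empty" result appears as a blank or missing cell.
    return Counter((row.get(column) or "empty") for row in rows)
```

Calling status_counts(rows, "check_hsts") returns a Counter, so a status that never occurs simply reads as 0.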
Category rollups
- a11y_checks_total / a11y_checks_failed: accessibility
- structured_data_checks_total / structured_data_checks_failed: schema.org markup
- security_checks_total / security_checks_failed: security headers
- technical_checks_total / technical_checks_failed: technical SEO
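Each rollup pair can be turned into a per-site failure rate by dividing failed by total. The sketch below assumes a row is a dictionary whose values parse as integers and returns None when a category has no checks; failure_rate is an illustrative helper, not part of the dataset tooling.

```python
from typing import Optional


def failure_rate(row: dict, category: str) -> Optional[float]:
    """Share of failed checks for a category prefix such as "a11y"."""
    total = int(row[f"{category}_checks_total"])
    failed = int(row[f"{category}_checks_failed"])
    # Avoid dividing by zero when no checks ran for this category.
    return failed / total if total else None
```

The category argument is the column prefix: "a11y", "structured_data", "security", or "technical".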
Caveats
- Sample selection. Sites were sourced from a B2B prospect list, so the sample is skewed toward US small businesses in sales-visible industries. It is not a random sample of the web.
- Single scan per site. Performance numbers are a single snapshot. Page conditions, CDN caching, time of day, and measurement variance can shift LCP/FCP by roughly 10-20%.
- PSI partial failures. Some sites returned zero across all Lighthouse categories (PageSpeed rendering error). These rows have mobile_lh_valid = 0. Filter them out before computing mobile performance statistics.
- Open PageRank is a proxy. It is not Google's PageRank, whose public Toolbar score was retired in 2016. It is useful for rank-ordering domains, not for absolute authority claims.
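The PSI caveat above amounts to one filter before any aggregate. The sketch below shows it on a made-up three-row excerpt. The CSV layout, the sample values, and the string encoding of mobile_lh_valid are assumptions for illustration, not data from the release.

```python
import csv
import io
from statistics import median

# Hypothetical excerpt; real values come from the published dataset file.
SAMPLE = """site_id,mobile_lh_valid,mobile_performance
aaaaaaaaaaaaaaaa,1,62
bbbbbbbbbbbbbbbb,0,0
cccccccccccccccc,1,48
"""


def valid_mobile_rows(rows):
    """Keep only rows where PageSpeed returned usable mobile data."""
    return [r for r in rows if r["mobile_lh_valid"] == "1"]


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
valid = valid_mobile_rows(rows)
# The zero-score row is excluded, so it cannot drag the median down.
median_perf = median(float(r["mobile_performance"]) for r in valid)
```

Without the filter, the all-zero row would be counted as a genuine score of 0 and bias every mobile statistic downward.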
License
Creative Commons Attribution 4.0 International (CC BY 4.0).
You are free to share and adapt this dataset for any purpose, including commercial, provided you attribute:
Axion Deep Digital (2026). State of Small Business Websites 2026. DeepAudit AI scan (n=292). axiondeepdigital.com/research/state-of-small-business-websites-2026
Citation (BibTeX)
@dataset{axiondeepdigital_sbw_2026,
author = {Gutierrez, Joshua R. and Gutierrez, Crystal A.},
title = {State of Small Business Websites 2026},
year = {2026},
publisher = {Axion Deep Digital},
version = {1.0},
url = {https://axiondeepdigital.com/research/state-of-small-business-websites-2026}
}