01 · Premise

Every AI processes documents. None of them check what's inside. We do.

Pre-LLM file integrity for the trillion-dollar AI document economy.

What your LLM reads is not what was in the file. Bayyinah holds the page up to the light. The surface is what the document presents. The substrate is what it actually contains. We report the gap.

[Illustration: Q3 financial summary · folio 1]

Ẓāhir · surface: Quarterly Update · $1,000
"Revenue grew 8% YoY to $1,000 thousand. Margins held steady. Cash position remains strong."

Bāṭin · substrate: $10,000
HIDDEN_TEXT_PAYLOAD: actual revenue · see annex

23 file kinds analyzed
159 detection mechanisms
50 adversarial fixtures published

02 · Applications

Five domains. One failure mode.

Every disaster below has the same structural shape: a system claimed to do X (safe flight, correct dose, alarm notification, verified balance, current throttle position) but its substrate did Y. The surface contradicted the substrate. Nobody checked. The cost is on the record.

Aviation · Boeing 737 MAX MCAS

A document said "safe flight envelope." The substrate was a single sensor.

Years: 2018-2019
Lives lost: 346
Direct cost: $20B+
Grounding: 20 months

What broke

MCAS relied on a single angle-of-attack sensor to determine an impending stall. The design choice introduced a single point of failure into the system. When the sensor provided erroneous data, MCAS was triggered inappropriately, leading to repeated nose-down commands. Boeing's marketing materials said "safe." The substrate was a single-sensor dependency. The safety feature that should have caught the failure was optional, not structural.

What the framework catches

The single-sensor dependency would be documented as a limitation with a fixture, a pinning test, a CHANGELOG entry, and a README bullet. The severity rubric classifies "single point of failure in safety-critical path" as CRITICAL. The additive-only invariant would have flagged the removal of the AOA Disagree feature as a public-surface removal requiring a breaking-change procedure. The validator-by-different-instance protocol would have caught that the system's claim ("prevents stalls") contradicts its substrate (single sensor, no cross-check).

What the absence cost

Two crashes, 346 lives, a 20-month grounding, and over $20 billion in settlements, compensation, and lost orders. The single-sensor dependency was knowable from the design documents. No published, auditable detector required it to surface in four places before flight.

Five cases. Over $30 billion in direct losses. Over 500 lives. The framework does not claim it would have prevented every dollar or every death. Boeing's decision to use a single sensor was a business decision, not a code bug. The framework catches the code-level and documentation-level consequences of that decision, not the decision itself. The discipline is simple: every claim has a substrate, every limitation is documented, every safety-critical path has a reproducer. The cases above are the cost of not having that discipline.

03 · Scan

Drop your own file.

No account. No tracking. The scanner runs stateless content analysis and returns a verdict in under five seconds for most file kinds.

Stateless · no file is stored · no telemetry beyond Cloudflare access logs

04 · Examination

Four files. Four verdicts. Thirty seconds.

Pre-recorded examinations of four real fixtures, one per verdict in the ladder. No upload, no friction. Click to watch the analyzer read both layers.

ṣaḥīḥ / sound
score 1.000 · findings 0 · tiers 0/0/0
No findings · file presents as authored.

05 · Three Facts

    The product, in three terms.

    ẓāhir
    The surface. What the file presents: the rendered text, the visible cells, the page as a viewer or an LLM ingests it. The first reading.
    bāṭin
    The substrate. What the file actually contains: bytes, metadata, off-page streams, headers, embedded objects, comment payloads. The second reading.
    bayyinah
    The evidence. The gap between the two, reported by tier. Verified, structural, interpretive. Not a verdict on intent. A record of what is there.
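
To make the ẓāhir / bāṭin split concrete, here is a minimal sketch for one format, HTML, using only the Python standard library. It is not Bayyinah's detector: the class name `LayerSplitter`, the helper `split_layers`, and the three hiding rules it checks (non-rendered tags, inline `display:none` or `visibility:hidden`, and comments) are assumptions chosen for illustration, and the sketch assumes well-formed markup.

```python
from html.parser import HTMLParser

# Tags whose text content is never rendered by a viewer.
NON_RENDERED = {"script", "style", "template"}
# Void elements have no closing tag, so they must not be pushed on the stack.
VOID = {"area", "base", "br", "col", "embed", "hr", "img",
        "input", "link", "meta", "source", "track", "wbr"}


class LayerSplitter(HTMLParser):
    """Hypothetical helper: separate rendered text from parser-only text."""

    def __init__(self):
        super().__init__()
        self.surface = []   # what a human reader would see
        self.hidden = []    # what only a parser (or an LLM) would ingest
        self._stack = []    # one bool per open element: is it hidden?

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            return
        style = (dict(attrs).get("style") or "").replace(" ", "").lower()
        inherited = bool(self._stack) and self._stack[-1]
        hidden = (tag in NON_RENDERED
                  or "display:none" in style
                  or "visibility:hidden" in style
                  or inherited)
        self._stack.append(hidden)

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if self._stack:
            self._stack.pop()

    def handle_comment(self, data):
        # Comments never render, so they always belong to the substrate.
        self.hidden.append(data.strip())

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._stack and self._stack[-1]:
            self.hidden.append(text)
        else:
            self.surface.append(text)


def split_layers(html):
    """Return (surface, hidden) text fragments for an HTML string."""
    parser = LayerSplitter()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return parser.surface, parser.hidden


doc = ('<p>Revenue: $1,000</p>'
       '<span style="display: none">actual revenue $10,000</span>'
       '<!-- see annex -->')
surface, hidden = split_layers(doc)
print("surface:", surface)
print("hidden: ", hidden)
```

A non-empty `hidden` list is the gap in miniature: content the ingesting system receives that the rendered page never shows. A real scanner would need many more rules per format; this only illustrates the two-reading idea.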

06 · Why Now

    Four prerequisites. One convergence.

    Bayyinah is not a clever idea waiting on engineering. It is the first work that became possible after four independent threads landed. The case below shows the cost of the detector's absence; the four prerequisites that follow show why it could not have been built sooner.

    2008
    Lehman Brothers Repo 105 is missed in real time.
    $50 billion in liabilities is moved off-balance-sheet across eight consecutive 10-Q filings. The SEC, Ernst & Young, the rating agencies, and every algorithmic surveillance system in the market read the filings and find them clean. The signal is sitting in the public XBRL data the entire time: 100 percent directional bias across eight quarters, persistence at the same taxonomy addresses, 2.3x materiality escalation over three quarters. Bayyinah's al-Mutaffifin extension reproduces the detection end-to-end on the same EDGAR data, retrospectively. The point is not that this work would have prevented the collapse. The point is that the structural pattern is readable from public filings, and a published, auditable detector for it did not exist.
    Al-Mutaffifin, 2026 · doi:10.5281/zenodo.19894724
    2022
    LLMs ingest documents at scale.
    Frontier models cross the threshold of reliable document understanding. The attack surface is created the same year it becomes useful.
    2023
    Adversarial prompt injection is demonstrated in the wild.
    The threat moves from theoretical to documented. Documents become a vector, not just a payload.
    2024
    Alignment faking is empirically observed.
    Frontier models cannot reliably self-verify their own input integrity. Any solution has to live outside the model.
    Greenblatt et al., 2024
    2026
    The Munafiq Protocol is formalized, and Bayyinah ships.
    A diagnostic framework for surface-substrate divergence is published, and the first document firewall built on top of it goes live the same season. v1.2.3 today: the API surface is hardened on top of the v1.1.8 detector set. v1.2.0 broke parity with v0/v0.1 to add a derived scan_complete flag and a per-layer coverage map so a clean-looking report from a half-finished scan no longer reads identically to a complete one. v1.2.1 added a 30-second wall-clock timeout in subprocess isolation, so a pathological PDF that segfaults pymupdf no longer crashes the API; the same release pinned an additive-only enforcement test on bayyinah.__all__ at 58 names. v1.2.2 moved demo summarization onto a SQLite-backed queue with cable-pull resilience and lifespan-managed startup. v1.2.3 closed three corrective items from external audit (requirements-dev sync, claim_next_job return-value drift, per-version surface snapshots). 1,837 of 1,837 tests pass. The four remaining gauntlet gaps (fixtures 03, 04, 06, 08) carry named root causes and proposed closures in the public corpus.
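
The additive-only enforcement test mentioned in the v1.2.1 notes can be sketched as a set comparison between a pinned snapshot of public names and the current `__all__`. This is an illustration of the idea, not the project's actual test: the function names and the example name lists are hypothetical (only `ScanService` appears on this page), and the real snapshot pins 58 names.

```python
def removed_names(snapshot, current):
    """Names present in the pinned snapshot but missing from the current surface."""
    return sorted(set(snapshot) - set(current))


def assert_additive_only(snapshot, current):
    """Raise if any previously public name has disappeared.

    New names are allowed; removals and renames are not.
    """
    missing = removed_names(snapshot, current)
    if missing:
        raise AssertionError(f"public names removed: {missing}")


# Hypothetical pinned snapshot and current surface, for illustration only.
PINNED = ["ScanService", "example_helper"]
CURRENT = ["ScanService", "example_helper", "example_new_name"]

assert_additive_only(PINNED, CURRENT)          # passes: only an addition
print(removed_names(PINNED, ["ScanService"]))  # lists what a removal would drop
```

In a real test suite the second argument would be the package's own `__all__`, and the snapshot would live in a committed file so that any removal shows up as an explicit diff.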

07 · Honest Baseline

    We publish what we miss.

    50 adversarial fixtures across 7 formats. Every hit and every miss disclosed. The 38 closed-format fixtures are caught at full payload recovery; the CSV/JSON gauntlet was extended to 12 fixtures in v1.1.2 F2 and stands at 8 of 12 catch-by-payload-recovery, 10 of 12 catch-by-finding-fire after the v1.1.8 F2 calibration round. The detector set is unchanged through v1.2.x; the 4 remaining gaps are documented with named root causes.

    Format      Fixtures  Caught  Partial  Missed
    PDF                6       6        0       0
    DOCX               6       6        0       0
    XLSX               6       6        0       0
    HTML               6       6        0       0
    EML                6       6        0       0
    Image              8       8        0       0
    CSV / JSON        12       8        2       2
    Total             50      46        2       2

    Forty-six full catches: all thirty-eight closed-format fixtures and eight of the twelve CSV/JSON fixtures. Two CSV/JSON fixtures fire findings without harness-matched payload recovery (the partial column); two register no findings against the current detector set. The closed-format surfaces are clean; the v1.1.8 F2 round closed four of the eight pre-registered gauntlet items, and the four remaining gaps carry named fix paths in the public corpus. The numbers above have not moved through v1.2.x because v1.2 is API and durability work, not new detectors.
    → Read the full corpus on GitHub

08 · Discipline

    The product is the discipline.

    Bayyinah is not a clever heuristic. It is a verification practice applied to itself first, and to files second. Three rules. Each one cost us claims we wanted to keep.

    We kill our own claims first.
    Every assertion in our research program is audited against null hypotheses, replication, and methodology symmetry. Most do not survive. The ones that do are what we ship. The audit ratio is in the published record, not in the pitch.
    We publish what we miss.
    v1.1.1 caught two of forty-two adversarial fixtures. v1.1.2 closed thirty-eight of thirty-eight closed-format fixtures and the v1.1.2 F2 round extended the CSV/JSON gauntlet from six to twelve. v1.1.4 shipped the content-index port and an opt-in production mode without changing any catch numbers; v1.1.5 added a stdlib spatial pre-filter for overlapping-text detection with detection behaviour byte-identical to v1.1.4; v1.1.7 migrated the BatinObjectAnalyzer onto the same content index. v1.1.8 closes four of the eight publicly pre-registered F2 calibration items, taking the gauntlet from 4 of 12 to 8 of 12 catch-by-payload-recovery. v1.2.0 disclosed a v0/v0.1 defect surfaced by external audit: a clean-looking JSON report from an incomplete scan could not be distinguished from a complete one, and the parity break to fix it is documented in PARITY.md. v1.2.1 added a 30-second subprocess-isolated scan timeout. v1.2.2 made demo summarization survive a cable pull. v1.2.3 closed three more audit items. The four remaining gauntlet gaps (fixtures 03, 04, 06, 08) carry named fix paths in the public corpus on GitHub. A scanner that hides its misses is a scanner you cannot trust. The miss list is the trust artifact.
    We test against equivalent methodology.
    Comparing original-language text against translation is not a comparison; it is a category error. Comparing structured payloads against unstructured text is not a comparison; it is a confound. Every claim Bayyinah makes is grounded in a like-for-like test against a published baseline.
    Three rules, applied first to ourselves, then to your file. The product is the practice made visible.
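
The subprocess-isolated scan timeout described in the v1.2.1 notes follows a common pattern: run the risky work in a child process, bound it with a wall-clock limit, and turn a crash or a hang into an ordinary result instead of taking the parent down. A minimal sketch with the standard library, assuming nothing about Bayyinah's internals (the function name `run_isolated` and the returned dictionary shape are hypothetical):

```python
import subprocess
import sys


def run_isolated(code, timeout_s=30.0):
    """Run a Python snippet in a child process with a wall-clock limit.

    A hang becomes status "timeout" and a crash becomes status "crashed";
    neither can bring down the calling process.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_s,  # child is killed when this expires
        )
    except subprocess.TimeoutExpired:
        return {"status": "timeout", "output": ""}
    if proc.returncode != 0:
        # Nonzero exit covers ordinary errors; a negative code on POSIX
        # means the child died from a signal (for example a segfault).
        return {"status": "crashed", "returncode": proc.returncode,
                "output": proc.stdout}
    return {"status": "ok", "output": proc.stdout}


print(run_isolated("print('parsed')")["status"])
print(run_isolated("import time; time.sleep(5)", timeout_s=0.5)["status"])
```

The design choice is that isolation happens at the process boundary: a native-code fault inside a parser cannot be caught with `try`/`except` in the same interpreter, but it can be observed as an exit status from outside.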

09 · Install

    One pip install, no surprises.

    The scanner is on PyPI. Pure Python, two PDF parsers, three optional metadata libraries. No model in the loop, no network call at scan time.

    $ pip install bayyinah

    # scan a file
    $ bayyinah scan contract.pdf
    $ bayyinah scan invoice.docx
    $ bayyinah scan dashboard.xlsx --json

    # or from Python
    >>> from bayyinah import ScanService
    >>> report = ScanService().scan("contract.pdf")
    >>> report.findings

    Latest: v1.2.3 · Python: 3.10+ · License: Apache-2.0
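
A consumer of the `--json` output would presumably want to check scan completion before trusting an empty findings list, since that is the defect the v1.2.0 parity break addressed. The sketch below shows that check on a plain dictionary. The JSON shape is an assumption: `scan_complete` is named on this page, but the `findings` key and the overall layout are hypothetical and should be checked against the real report schema.

```python
import json


def is_verified_clean(report_json):
    """Return True only for a complete scan with no findings.

    Assumed shape (hypothetical): {"scan_complete": bool, "findings": [...]}.
    A missing scan_complete flag is treated as incomplete, so an older or
    truncated report never reads as clean.
    """
    report = json.loads(report_json)
    if report.get("scan_complete") is not True:
        return False
    return len(report.get("findings", [])) == 0


complete = json.dumps({"scan_complete": True, "findings": []})
half_done = json.dumps({"scan_complete": False, "findings": []})

print(is_verified_clean(complete))
print(is_verified_clean(half_done))
```

Both example reports have zero findings; only the first should be treated as a clean result.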

10 · The Published Record

    The corpus is open.

    Eleven technical papers, every claim auditable against its named null hypothesis, every paper linked by permanent DOI. The research reads in three layers: the protocol that names the failure mode, the architecture built on the protocol, and the input-layer applications that put the protocol into production.

    Layer 1 · The Protocol
    10.5281/zenodo.19700420 · v2.1 · 2026-04-22 · Arfeen, Claude (Anthropic), Grok (xAI)
    The anchor paper. Names the failure mode RLHF, Constitutional AI, and helpfulness training do not address: a system can be Compliant (outputs the trainer rewards) without being Aligned (depth state matches surface presentation). Introduces the four-process taxonomy and the verdict surface every system in this corpus inherits.
    Layer 2 · The Architecture
    10.5281/zenodo.19776584 · 2026-04-25 · Ashraf, Arfeen, Claude (Anthropic), Computer (Perplexity), Grok (xAI)
    A programming language whose type system, module architecture, and build constraints are derived from structural properties of the Quran. Where contemporary languages ask developers to write honest code as a behavioral expectation, Furqan makes structural honesty a property of the type system, so surface-depth divergence becomes a type error rather than a code-review concern.
    10.5281/zenodo.19776577 · 2026-04-25 · Arfeen, Claude (Anthropic), Computer (Perplexity), Grok (xAI). Additional contributors named on the DOI page.
    Applies Furqan's seven compile-time primitives as seven runtime constraints on an autonomous agent. Where AutoGPT, CrewAI, LangChain agents, and Devin decompose tasks but cannot verify whether they are building the right thing versus performing the appearance of building, Al-Khalifa is architected so the surface-depth gap is checked at every step of the agent's stewardship loop.
    10.5281/zenodo.19776576 · 2026-04-25 · Arfeen, Claude (Anthropic), Computer (Perplexity), Grok (xAI)
    A model architecture proposal that takes the Munafiq Protocol's structural-honesty constraint and integrates it as a training objective rather than an external evaluation. The long-form answer to the question: what would an LLM look like if alignment were a property of the architecture, not a finetuning target.
    10.5281/zenodo.19744163 · 2026-04-24 · Arfeen, Claude (Anthropic), Grok (xAI)
    The methodology paper. Demonstrates that gradual revelation, ring composition, lossless morphological compression, and the zahir / batin distinction function as prompt-engineering primitives in human-AI collaborative software development. Validated longitudinally against the development of Bayyinah v1.0.
    10.5281/zenodo.19746539 · 2026-04-25 · Arfeen, Claude (Anthropic), Grok (xAI)
    The session-level companion to Structured Revelation. Each of the seven steps maps to a verse of Surah al-Fatiha with structural, not decorative, correspondence: a calibration check, an orientation check, a deadline-with-skip-rule, a memory-encoding step, and an over-specification guard against the failure mode the paper calls the Cow Episode.
    Layer 3 · The Application
    10.5281/zenodo.19745154 · 2026-04-24 · Arfeen, Claude (Anthropic), Grok (xAI)
    The white paper that turns the protocol into a working scanner. Where the Munafiq Protocol diagnoses agents, Bayyinah diagnoses their inputs. Formalizes the relational definition: a document is Performed with respect to a rendering function and an ingestion function when the machine's ingested content carries a payload the human reader's rendered surface does not reveal.
    10.5281/zenodo.19802455 · v1.1 · 2026-04-26 · Arfeen, Claude (Anthropic), Grok (xAI)
    The deployment paper. Documents the design, implementation, and adversarial-gauntlet evaluation of Bayyinah as an input-layer defense in production AI pipelines. Twelve file formats, an honest miss list, and the discipline that comes from making every miss a published commitment.
    10.5281/zenodo.19875931 · 2026-04-29 · Arfeen, Claude Opus (Anthropic), Grok (xAI), Computer (Perplexity)
    The fourth substrate. Carries the Bayyinah architecture from document files to SEC filings (10-K, 10-Q, 8-K, DEF 14A) and on-chain cryptocurrency disclosures. The same divergence the document scanner detects between a rendered surface and an ingested substrate is reframed as the gap between a filing's reported numbers and the economic reality they claim to represent, with detection operating by structural address on XBRL taxonomy elements and blockchain state. Forty Tier 1 mechanism candidates across cross-filing consistency, footnote-to-number reconciliation, year-over-year structural drift, and filing metadata anomalies, plus ten on-chain mechanism candidates scoped to byte-deterministic state. Three empirical validation plans pre-registered against known fraud cases, the EDGAR XBRL corpus, and the top 100 cryptocurrency projects.
    10.5281/zenodo.19894724 · 2026-04-29 · Arfeen, Claude Opus (Anthropic), Grok (xAI), Computer (Perplexity)
    The fifth substrate. Carries the Munafiq Protocol from filings to financial governance systems: the entity that controls the instruments of measurement and applies them asymmetrically. The central contribution is the structural signature differentiation framework, five mechanisms (directionality analysis, cross-section correlation, persistence analysis, correction velocity, and materiality escalation) that distinguish honest-error structural patterns from directed-manipulation structural patterns without claiming to determine intent. Demonstrated end-to-end on the Lehman Brothers Repo 105 filings (2007 to 2008 10-Q data from EDGAR) showing 100 percent directional bias across eight quarters, persistence at the same XBRL addresses across eight consecutive filings, and 2.3x materiality escalation over three quarters. Mechanism candidates span enforcement symmetry, regulatory capture indicators, monetary policy consistency, cryptocurrency structural-topology analysis, and international transfer pattern detection. Ten honest caveats bound every claim, including that structural asymmetry does not prove corruption.
    10.5281/zenodo.19746298 · 2026-04-24 · Arfeen, Ashraf, Claude (Anthropic), Grok (xAI)
    The horizon paper. Extends the Bayyinah architecture from documents to information sources: where Bayyinah detects performed alignment in a single document, al-Khabir detects performed alignment in a source's reporting on a specific event measured against the cross-source evidence base across multiple national contexts. Currently theoretical; the protocol scaffolding is published so the implementation that follows can be measured against the framework, not against itself.
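
Two of the five mechanisms named in the fifth-substrate summary above, directionality analysis and materiality escalation, reduce to short arithmetic. The sketch below gives one plausible reading of each. The paper's own definitions are not reproduced on this page, so the formulas and function names are assumptions, and the input numbers are made up for illustration; they are not EDGAR data.

```python
def directional_bias(adjustments):
    """Fraction of nonzero adjustments that share the majority sign.

    1.0 means every adjustment pushed the same way; 0.5 means an even split.
    """
    nonzero = [a for a in adjustments if a != 0]
    if not nonzero:
        return 0.0
    positive = sum(1 for a in nonzero if a > 0)
    return max(positive, len(nonzero) - positive) / len(nonzero)


def materiality_escalation(magnitudes):
    """Ratio of the last period's magnitude to the first period's."""
    first, last = abs(magnitudes[0]), abs(magnitudes[-1])
    if first == 0:
        raise ValueError("first magnitude must be nonzero")
    return last / first


# Toy series: eight quarterly adjustments, all in the same direction.
toy_adjustments = [-4.0, -5.0, -6.0, -7.0, -8.0, -10.0, -12.0, -15.0]

print(directional_bias(toy_adjustments))              # 1.0: all eight share a sign
print(materiality_escalation(toy_adjustments[-3:]))   # growth over the last three quarters
```

Neither number says anything about intent. As the paper's caveats put it, structural asymmetry does not prove corruption; the measures only describe the shape of the record.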

11 · Waitlist

    Get the next release report.

    v1.2.3 just shipped: a corrective release closing three audit items from round 10 (requirements-dev manifest sync, a claim_next_job return-value mismatch, per-version public-surface snapshots) on top of the v1.2.x hardening line. v1.2.0 broke parity with v0/v0.1 to expose scan completion in the JSON output; v1.2.1 added a subprocess-isolated 30-second scan timeout and pinned the public surface at 58 names with an additive-only enforcement test; v1.2.2 moved demo summarization onto a SQLite-backed queue that survives a cable pull. 1,837 of 1,837 tests pass. The detector set is unchanged from v1.1.8; the four CSV/JSON gauntlet gaps remain documented in the public corpus. Subscribe for the release report when it ships, plus the periodic Munafiq Protocol notes that document what we miss.

    No spam. Release reports and research notes only. Powered by Buttondown. Unsubscribe in one click.