Stop Formatting Spreadsheets: Compliance Data Intake Should Be This Easy
The email arrives at 4:47 PM on a Friday. The client sends a spreadsheet. Not the template you sent them three weeks ago – their own spreadsheet. Different column names. Different format. Three tabs, one of which is labeled “misc.” The “System Name” column is called “Tool.” The “AI Risk Level” column doesn’t exist. Instead, there’s a column called “Sensitivity” with values like “pretty important” and “TBD.”
You have two choices. Spend the next three hours reformatting this into your template. Or send the client a polite email explaining that you need the data in the correct format – which means waiting another week for a spreadsheet that will arrive in a different wrong format.
This is the data intake problem. Not a technology problem. A workflow problem that eats 15-20% of a compliance consultant’s billable hours on activities that create zero client value.
The Template Trap
Every compliance tool has a template. A rigid CSV or Excel file with exact column names, exact enum values, exact field order. The tool ingests the template. Nothing else.
The assumption behind the template is reasonable: structured data requires structured input. An AI system inventory needs system names, types, risk levels, descriptions, owners. Those fields have to exist. The data has to be clean.
The assumption that the consultant should be the one cleaning it is where the model breaks.
Your client doesn’t know what “system_type” means. She knows she has a “hiring AI,” a “credit model,” and a “chatbot.” She doesn’t know the difference between “ai_model” and “software” in your enum. She categorized everything as “software” because that was the first option in the dropdown.
The consultant receives this data. Reads the client’s intent. Maps “hiring AI” to system type “ai_model” with purpose “employment decisions.” Maps “credit model” to “ai_model” with purpose “credit scoring.” Maps “chatbot” to “software” with purpose “customer service.” Fixes the risk levels. Adds descriptions from context she gathered during the kickoff call.
That translation – from client language to compliance schema – is judgment work. Reformatting the spreadsheet to match column names is not. One is consulting. The other is data entry. They get billed at the same rate.
What “Bring Me Anything” Actually Means
We built an import system with one design principle: the consultant should never need to reformat a file.
Six file formats accepted: CSV, TSV, Excel (.xlsx and .xls), JSON, and PDF. Max file size: 50 MB. The parser auto-detects delimiters for CSV files – comma, tab, semicolon, pipe. For Excel files with multiple sheets, you pick which sheet to import. For JSON, nested structures get flattened automatically. For PDFs, the system extracts tables using pdfplumber; if no tables exist, it extracts text for the unstructured flow.
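Delimiter detection needs nothing more exotic than the standard library. A minimal sketch of the technique using Python's `csv.Sniffer` – the function names here are illustrative, not our actual API:

```python
import csv
import io

# Candidate delimiters the importer recognizes: comma, tab, semicolon, pipe
CANDIDATES = ",\t;|"

def detect_delimiter(sample: str) -> str:
    """Guess the delimiter of a delimited-text sample."""
    try:
        # csv.Sniffer inspects quoting and character-frequency patterns
        return csv.Sniffer().sniff(sample, delimiters=CANDIDATES).delimiter
    except csv.Error:
        # Fallback: whichever candidate appears most often in the header row
        header = sample.splitlines()[0] if sample else ""
        return max(CANDIDATES, key=header.count)

def parse_rows(raw: str) -> list[dict]:
    """Parse delimited text into row dicts, auto-detecting the delimiter."""
    reader = csv.DictReader(io.StringIO(raw), delimiter=detect_delimiter(raw))
    return list(reader)
```

The sniffer handles the common case; the frequency fallback catches degenerate files the sniffer refuses to guess at.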
That’s the mechanical part. The interesting part is what happens to the columns.
AI Column Mapping
The client’s spreadsheet has columns called “Tool,” “What it does,” “Who owns it,” “Type,” and “Risk?” Your inventory schema expects “name,” “description,” “owner,” “system_type,” and “ai_risk_level.”
Human pattern matching handles this instantly. “Tool” is obviously “name.” “What it does” is “description.” “Who owns it” is “owner.” You’d rename those columns in about 30 seconds – if the only problem were column names.
The harder problem is values. The “Type” column contains “AI,” “App,” “Database,” and “Third-party.” Your schema expects “ai_model,” “software,” “database,” “api,” “hardware,” “manual_process,” or “other.” The “Risk?” column contains “High,” “Medium,” “Dunno,” and blank cells.
Our mapper sends the source columns, their sample values, and the target schema to the model. It returns three things per column: which target field the source column maps to, a confidence level for the mapping, and value transformations for enum fields – “AI” → “ai_model,” “App” → “software,” “Dunno” → “undetermined.”
The frontend renders this as a visual mapper. Source columns on the left. Target fields on the right. Connection lines between them, color-coded by confidence: green for high, yellow for moderate, red for uncertain. The consultant reviews, adjusts if needed, and confirms. A column that doesn’t map to anything goes to the “ignore” zone.
The reformatting work that used to take three hours now takes three minutes. The consultant still applies judgment – she decides whether “Dunno” should map to “undetermined” or whether she has enough context to reclassify it. The AI handles the mechanical translation. The human handles the decisions.
Deduplication
The second import for the same client is where most tools fail. The client sends an updated spreadsheet. Half the systems are the same ones from the first import – updated names, updated descriptions. The other half are new.
A dumb import creates duplicates. A rigid import rejects the file because the system names don’t exactly match the existing records. Neither is acceptable.
Our dedup runs two-tier matching. First pass: exact name comparison. “Credit Scoring AI” matches “credit scoring ai.” Second pass: fuzzy matching using sequence similarity. “Credit Scoring Model v2” matches “Credit Scoring AI” – flagged as a potential duplicate.
For each potential duplicate, the consultant gets three options: skip (don’t import, keep existing), overwrite (update existing with new data), or import as new (create a separate entry). Keyboard shortcuts – s, o, n – because a consultant reviewing 40 potential duplicates across 200 rows needs speed, not modals.
The consultant overrides what she knows should be different. The system handles the pattern matching that would otherwise require her to eyeball two lists side by side.
Text Extraction: The Unstructured Path
Not every client sends a spreadsheet. Some send a PDF. Some paste text from a meeting transcript. Some forward an email chain where the CTO described their AI systems in three paragraphs with no formatting.
The text extraction endpoint accepts plain text and an entity type (system, process, data store, vendor). Claude reads the text, identifies entities, and returns structured items – each with a name, description, confidence score, and the exact source text that supports the extraction.
A paragraph like: “We use Anthropic’s Claude for our customer support chatbot and have an internal ML model that pre-screens insurance claims. Both connect to our Snowflake data warehouse.”
Becomes three extracted items:

- Customer Support Chatbot – system_type: software, purpose: customer service, source: “Anthropic’s Claude for our customer support chatbot”
- Insurance Claims Pre-Screening Model – system_type: ai_model, purpose: insurance underwriting, source: “internal ML model that pre-screens insurance claims”
- Snowflake Data Warehouse – entity_type: data_store, store_type: cloud_storage, source: “Snowflake data warehouse”
Each extracted item includes the exact text snippet that proves the extraction. The consultant sees what the AI inferred and why. She confirms, edits, or discards. The three paragraphs of unstructured email become three structured inventory entries – with provenance.
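That source field also enables a cheap sanity check before anything reaches the consultant: if the model's quoted evidence doesn't literally appear in the input text, the item can be routed to manual review instead of the verified list. A sketch of that idea – one way to use the provenance field, not necessarily how the product implements it:

```python
def verify_provenance(
    source_text: str, items: list[dict]
) -> tuple[list[dict], list[dict]]:
    """Split extracted items into (verified, needs_review) by checking that
    each item's cited source snippet appears verbatim in the input text."""
    verified, needs_review = [], []
    for item in items:
        snippet = item.get("source", "")
        if snippet and snippet in source_text:
            verified.append(item)
        else:
            # Quoted evidence not found verbatim: possible hallucination
            needs_review.append(item)
    return verified, needs_review
```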
Why This Matters for EU AI Act Compliance
Article 9 requires a risk management system. But you can’t manage risk on systems you haven’t inventoried. The inventory is step one of every compliance engagement. It’s also the step where momentum dies.
The consultant sends the template. Waits. Follows up. Receives the wrong format. Reformats. Discovers duplicates from the last engagement. Cross-references against the client’s previous inventory. Asks clarifying questions about three ambiguous entries. Two weeks have passed. The assessment hasn’t started.
With 1,561 AI bills across 50 states and the EU AI Act deadline of August 2, 2026, consultants managing multiple clients can’t afford two-week intake cycles for each one. The assessment work is where value lives. The data collection is overhead.
Smart import compresses the intake from weeks to hours. Not by cutting corners – by eliminating the reformatting labor that the consultant was doing manually and replacing it with AI-assisted mapping that the consultant reviews and approves.
The four-step workflow we designed around: Setup → Assess → Review → Deliver. Smart import is the reason “Setup” doesn’t take longer than “Assess.”
The Security Layer You Don’t See
Accepting arbitrary files from external sources is a security surface. We built accordingly.
Formula injection sanitization prevents CSV injection attacks. Text inputs are stripped of control characters and truncated to prevent abuse. Upload sessions expire automatically – they can’t be replayed. Every import step is audit-logged. Field assignments are validated against a whitelist – you can’t map a column to a field that doesn’t exist in the schema.
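The formula-injection piece is the best known of these: a cell beginning with `=`, `+`, `-`, or `@` can execute as a formula when an export is reopened in Excel or Google Sheets. The standard mitigation – sketched generically here, not our exact code, and with an illustrative length cap – prefixes risky cells with a single quote, which spreadsheet apps treat as literal text:

```python
# Characters that can start a formula when a cell is opened in a spreadsheet app
FORMULA_PREFIXES = ("=", "+", "-", "@", "\t", "\r")

def sanitize_cell(value: str) -> str:
    """Neutralize CSV/formula injection by prefixing risky cells with a quote."""
    if value and value.startswith(FORMULA_PREFIXES):
        return "'" + value
    return value

def clean_text(value: str, max_len: int = 5000) -> str:
    """Strip control characters and cap length (the 5000 cap is illustrative)."""
    printable = "".join(ch for ch in value if ch in "\n\t" or ord(ch) >= 32)
    return printable[:max_len]
```

The known trade-off of quote-prefixing is that legitimate negative numbers pick up a cosmetic leading quote, which is why this kind of sanitization belongs at the boundary rather than in stored data.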
None of this is visible to the consultant. She uploads a file. She reviews the mapping. She confirms the import. The security happens underneath, because security that requires user attention is security that gets skipped.
What This Changes
The template-first approach to compliance data intake is a relic of tools built for internal teams who control their own data format. Consultants don’t control their clients’ data format. They never will.
“Bring me anything” isn’t a technical philosophy. It’s respect for how consultants actually work. The client brings chaos. The AI makes order. The consultant applies judgment. The platform handles the translation layer between messy reality and structured compliance data.
Your clients will never fill out the template correctly.
Build the system that doesn’t need them to.
Smart import supports CSV, TSV, XLSX, XLS, JSON, and PDF formats. All sessions are tenant-isolated and audit-logged.
Map obligations to your AI systems
ReguLume covers 2,964 obligations across 15 regulations. Score your compliance posture in hours, not months.
Get Started