Blog

How to Compare Two CSV Files and Find the Differences

April 11, 2026

You have two CSV files. They should contain the same data, but something changed between them. Maybe it was a migration, a weekly export, or a supplier feed update. The question is always the same: what is different?

This guide covers how to find those differences accurately, whether you are dealing with 50 rows or 500,000.

What “comparing” a CSV file actually means

A CSV file is a table. Each row represents a record. Each column is a field. Comparing two CSV files means answering three questions:

Which rows exist in file A but not in file B?
Which rows exist in file B but not in file A?
For rows that exist in both, which field values changed?

That is the full picture of what changed between two files. A true CSV comparison gives you all three.

The naive approach: opening both in a spreadsheet

For small files, opening both in Excel or Google Sheets and eyeballing them works. Sort both by the same column, scroll through side by side, and look for differences.

This breaks down quickly. Above a few hundred rows it is slow, error-prone, and exhausting. It also only works if both files have exactly the same structure: same columns, same order. If a column was added or removed between the two files, the visual comparison falls apart immediately.

For anything larger or more structured, you need a systematic approach.

Matching rows by key column

The most important decision when comparing two CSV files is how you identify matching rows. The naive approach compares row by row in order: row 1 in file A against row 1 in file B. This only works if both files have exactly the same rows in exactly the same order.

In practice, rows move. Records get reordered, filtered, or sorted differently between exports. The right approach is to match rows by a key column: a field that uniquely identifies each record, like an order ID, product SKU, or user ID.

With key-based matching:

A row with key 1005 in file A is compared to the row with key 1005 in file B, regardless of position
If key 1005 exists in A but not in B, that row is reported as missing
If key 1009 exists in B but not in A, that row is reported as new

This is the only reliable way to compare CSV files where row order may differ.

Composite keys

Some datasets do not have a single unique column. An order might be uniquely identified by both order_id and region together. In those cases you need a composite key, which is multiple columns combined to identify a row.

For example: order_id=1002, region=EU is one record. order_id=1002, region=US is a different record. Comparing on order_id alone would incorrectly treat them as the same row.

A good CSV comparison tool handles composite keys. Sheet2Sheet lets you pass multiple columns: --key order_id,region.

What to do with structural differences

Sometimes the two files do not have the same columns. A column was added to the new export. A deprecated field was removed. These structural changes matter and they need to be reported separately from row-level diffs.

When comparing two CSV files:

Columns present in file A but missing in file B should be flagged as removed
Columns present in file B but missing in file A should be flagged as added
Only columns present in both files should be compared for value changes

Ignoring structural differences means you can miss entire categories of change.

Manual comparison with command-line tools

If you are comfortable on the command line, a few approaches work for smaller files.

diff compares files line by line. It works if both files have identical row order and structure, but gives raw output that is hard to read for CSV data.

awk and sort can be combined to sort both files the same way before diffing. This helps with row order but still produces output that requires interpretation.

For small files with simple structure, these tools are fine. For anything more complex, like composite keys, large files, or detailed field-level diffs, a dedicated tool is faster and more reliable.

Automated CSV reconciliation

A dedicated CSV reconciliation tool handles all of this without scripting:

You pass two files and specify the key column
The tool matches rows by key across both files
It reports missing rows with their full data, new rows with their full data, and changed rows with field-by-field diffs
Structural column changes are reported separately

The output is human-readable and actionable. You see exactly which records changed, which fields differ, and what the old and new values were.

Sheet2Sheet does exactly this. It runs locally, handles large files quickly, and produces clean text output or JSON depending on what you need.

Try Sheet2Sheet on your own CSV files