
What is a CSV File Format? Complete Guide With Examples

Everything you need to know about Comma-Separated Values (CSV) files, the most widely used format for data exchange.

By text2csv.com team

What Does CSV Stand For?

CSV stands for Comma-Separated Values. It's a simple, plain-text file format used to store tabular data, such as the contents of a spreadsheet or a database table. Each line in a CSV file represents a row of data, and each value within that row is separated by a comma (hence the name).

Despite its simplicity, CSV has become the de facto standard for data exchange between different applications, systems, and programming languages. Whether you're exporting data from a database, importing contacts into an email marketing tool, or sharing spreadsheet data, CSV is often the go-to format.

What Does a CSV File Look Like?

A CSV file is simply a text file with a specific structure. Here's an example:

Name,Email,Age,City
John Smith,john@example.com,28,New York
Jane Doe,jane@example.com,34,Los Angeles
Bob Johnson,bob@example.com,45,Chicago

In this example:

  • The first line is the header row, containing column names
  • Each subsequent line is a data row
  • Values are separated by commas
  • Each row has the same number of fields
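To see how this structure maps onto code, here is a minimal Python sketch that reads the example with the standard csv module. The SAMPLE constant and the read_rows helper are illustrative names chosen for this guide, not part of any library.

```python
import csv
import io

# The example file from above, held in memory so the sketch is runnable as-is.
SAMPLE = (
    "Name,Email,Age,City\n"
    "John Smith,john@example.com,28,New York\n"
    "Jane Doe,jane@example.com,34,Los Angeles\n"
    "Bob Johnson,bob@example.com,45,Chicago\n"
)


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return list(reader)


rows = read_rows(SAMPLE)
print(rows[0]["Name"])  # John Smith
print(len(rows))        # 3 data rows; the header is not counted
```

Note that every value, including Age, comes back as a string; the file itself carries no type information.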

A Brief History of CSV

CSV is one of the oldest data formats still in active use. Its origins predate personal computers entirely. In the 1960s and 1970s, comma-separated data was already being used on mainframe systems and punch card programs to represent tabular information. IBM Fortran compilers in the 1970s supported list-directed input/output that used commas to separate values — an early form of what we now call CSV.

By the 1980s, personal computer spreadsheet software such as Lotus 1-2-3 and early versions of Microsoft Excel adopted CSV as an import/export format. However, there was no formal specification — each application implemented its own variation. Some used commas, others used tabs or semicolons. Quoting rules varied. This inconsistency caused data corruption when exchanging files between different programs.

The situation improved in 2005 when Yakov Shafranovich published RFC 4180, the first formal specification for CSV. While technically an "informational" RFC rather than an internet standard, it became the de facto reference that most modern software follows.

The RFC 4180 Standard

RFC 4180 is the closest thing CSV has to an official specification. Understanding its rules is essential for creating files that work reliably across different applications.

Key rules from RFC 4180 include:

  1. Line breaks: Each record should be on a separate line, ending with CRLF.
  2. Header row: The first line may contain a header with field names (optional but recommended).
  3. Consistent fields: Each line should have the same number of fields.
  4. Quoting: Fields containing commas, quotes, or line breaks must be enclosed in double quotes.
  5. Escaping quotes: Double quotes within a field must be escaped by doubling them ("").

Handling Special Characters

One of the trickiest parts of CSV is handling special characters. What happens if your data contains a comma or a quote? The RFC 4180 standard provides clear rules:

Example: Data with Special Characters

If you have data like:

  • Name: John "Johnny" Smith
  • Address: 123 Main St, Apt 4

It would be encoded as:

"John ""Johnny"" Smith","123 Main St, Apt 4"

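As a rough check of the quoting and escaping rules, the sketch below lets Python's csv module encode the same two values. The to_csv_line helper is a hypothetical name used only for this example, and the CRLF line ending is set explicitly to match RFC 4180.

```python
import csv
import io


def to_csv_line(fields):
    """Encode one record as a CSV line terminated by CRLF."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer, lineterminator="\r\n")
    writer.writerow(fields)
    return buffer.getvalue()


original = ['John "Johnny" Smith', "123 Main St, Apt 4"]
line = to_csv_line(original)
print(line, end="")

# Reading the line back should recover the original values unchanged.
round_trip = next(csv.reader(io.StringIO(line, newline="")))
```

The writer only quotes fields that need it, so a plain value such as Chicago would be written without quotes.
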
CSV MIME Type and File Extensions

The official MIME type for CSV files is text/csv, as registered by RFC 4180. When servers deliver CSV files, the HTTP response should include:

Content-Type: text/csv; charset=utf-8
Content-Disposition: attachment; filename="data.csv"
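How you set these headers depends on your web framework. As a framework-neutral sketch, the hypothetical build_csv_response function below returns the headers and the encoded body for a CSV download; wiring it into a real server is left out.

```python
import csv
import io


def build_csv_response(rows, filename="data.csv"):
    """Return (headers, body) for a CSV download. Names here are illustrative."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer, lineterminator="\r\n").writerows(rows)
    body = buffer.getvalue().encode("utf-8")
    headers = {
        "Content-Type": "text/csv; charset=utf-8",
        "Content-Disposition": f'attachment; filename="{filename}"',
        "Content-Length": str(len(body)),
    }
    return headers, body


headers, body = build_csv_response([["Name", "Age"], ["John", 28]])
print(headers["Content-Type"])
```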

The standard file extension is .csv. Related formats include:

  • .tsv — Tab-separated values (MIME: text/tab-separated-values)
  • .txt — Sometimes used for delimited text files
  • .csv with semicolons — Common in European locales (still uses .csv extension)
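Reading these variants usually only requires telling the parser which delimiter to expect. The sketch below assumes Python's csv module; parse_delimited and the two sample strings are made up for illustration.

```python
import csv
import io


def parse_delimited(text, delimiter):
    """Parse delimited text into a list of rows using the given delimiter."""
    return list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))


# Semicolon-separated, as often exported in locales that use a decimal comma.
semicolon_text = "Name;Price\nWidget;1,50\n"
# Tab-separated values.
tab_text = "Name\tPrice\nWidget\t1.50\n"

semicolon_rows = parse_delimited(semicolon_text, ";")
tab_rows = parse_delimited(tab_text, "\t")
print(semicolon_rows)
print(tab_rows)
```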

Character Encoding in CSV

One of the most common sources of CSV problems is character encoding. CSV files are plain text, but "plain text" can be encoded in many ways:

| Encoding | When to Use | Notes |
|---|---|---|
| UTF-8 | Default for modern systems | Recommended. Supports all languages. |
| UTF-8 with BOM | Opening CSV in Excel | Excel uses the BOM bytes (EF BB BF) to detect UTF-8 correctly. |
| Latin-1 (ISO 8859-1) | Legacy Western European systems | Covers accented characters (e.g., é, ü, ñ) but not Asian scripts. |
| Windows-1252 | Old Windows software exports | Similar to Latin-1 with extra characters like curly quotes. |
| ASCII | English-only data | Safe but limited to 128 characters. A subset of UTF-8. |

Common pitfall: If you open a CSV in Excel and see characters like "Ã©" instead of "é", or "â€" where a dash should be, the file was saved as UTF-8 but opened as Latin-1 or Windows-1252. Re-open it with the correct encoding, or save the CSV with a UTF-8 BOM for Excel compatibility.
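The sketch below illustrates both halves of this pitfall in Python: writing a file with a UTF-8 BOM via the "utf-8-sig" codec, and reproducing the garbled output by decoding UTF-8 bytes as Latin-1. The write_csv_for_excel helper and the temporary file path are assumptions made for the example.

```python
import csv
import os
import tempfile


def write_csv_for_excel(path, rows):
    """Write rows as UTF-8 with a leading BOM (bytes EF BB BF)."""
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        csv.writer(f).writerows(rows)


# Write a small file to a temporary directory and inspect its first bytes.
path = os.path.join(tempfile.mkdtemp(), "people.csv")
write_csv_for_excel(path, [["Name", "City"], ["Zoë", "Málaga"]])
with open(path, "rb") as f:
    first_bytes = f.read(3)
print(first_bytes)  # b'\xef\xbb\xbf'

# Decoding UTF-8 bytes with the wrong codec produces the familiar mojibake.
garbled = "é".encode("utf-8").decode("latin-1")
print(garbled)  # Ã©
```

When reading such a file back in Python, opening it with encoding="utf-8-sig" strips the BOM so it does not end up inside the first header name.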

Real-World CSV Failures and How to Avoid Them

CSV's simplicity is both its strength and its weakness. Here are common real-world failures that trip up even experienced developers:

Leading Zeros Stripped

A CSV file contains zip codes like 01234. Excel opens it and interprets the value as the number 1234, silently dropping the leading zero. The same happens with phone numbers, product codes, and account numbers.

Fix: In Excel, use the Text Import Wizard and set the column type to "Text". In Google Sheets, format the column as Plain Text before pasting.
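The same loss can happen in code if an identifier column is converted to a number. As a small illustration, Python's csv module returns every field as a string, so the leading zero survives until something casts it. The DATA sample below is made up.

```python
import csv
import io

DATA = "Name,Zip\nAlice,01234\nBob,90210\n"

rows = list(csv.DictReader(io.StringIO(DATA, newline="")))

zip_as_text = rows[0]["Zip"]         # kept exactly as written in the file
zip_as_number = int(rows[0]["Zip"])  # numeric conversion drops the leading zero

print(zip_as_text)    # 01234
print(zip_as_number)  # 1234
```

A reasonable rule of thumb is to keep zip codes, phone numbers, and account numbers as text, since nobody does arithmetic on them.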

Unquoted Fields with Commas

An address field containing "123 Main St, Apt 4" is written without quotes. The parser splits it into two columns: "123 Main St" and "Apt 4". Every later field in that row shifts one column to the right, so the row no longer lines up with the header.

Fix: Always use a proper CSV library (like Python's csv module) that handles quoting automatically, rather than manual string concatenation.
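To make the difference concrete, this sketch compares manual string concatenation with Python's csv module on a made-up record.

```python
import csv
import io

record = ["Jane Doe", "123 Main St, Apt 4", "Chicago"]

# Naive approach: join and split on commas. The address breaks in two.
naive_line = ",".join(record)
naive_fields = naive_line.split(",")
print(len(naive_fields))  # 4 fields instead of 3

# Library approach: the writer quotes the field that contains a comma.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerow(record)
safe_line = buffer.getvalue()

# Parsing the library output recovers the original three fields.
parsed = next(csv.reader(io.StringIO(safe_line, newline="")))
print(parsed == record)  # True
```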

Line Breaks Inside Fields

A "description" field contains a newline character. A naive parser that splits on newlines treats the second half of the description as a new row, corrupting the entire dataset.

Fix: Fields containing line breaks must be enclosed in double quotes per RFC 4180. Use a standards-compliant parser, not line-by-line string splitting.
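A quick way to see the problem is to count records both ways. In this sketch the TEXT sample is invented, and the csv module stands in for any standards-compliant parser.

```python
import csv
import io

# Three records: a header and two data rows. The first data row has a
# line break inside its quoted description field.
TEXT = 'id,description\r\n1,"First line\nsecond line"\r\n2,Plain\r\n'

# Naive approach: splitting on line breaks yields one "row" too many.
naive_rows = TEXT.splitlines()
print(len(naive_rows))  # 4

# A real CSV parser keeps the quoted line break inside the field.
parsed_rows = list(csv.reader(io.StringIO(TEXT, newline="")))
print(len(parsed_rows))   # 3
print(parsed_rows[1][1])  # the full two-line description
```

Passing newline="" when opening the source matters here: it stops Python from translating line endings before the parser sees them.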

CSV Injection (Formula Injection)

A field starts with =, +, -, or @. When opened in Excel or Google Sheets, the spreadsheet interprets it as a formula. Malicious data could execute =HYPERLINK("malicious-url") or exfiltrate data.

Fix: When generating CSV from user input, prefix dangerous characters with a single quote or tab character. Sanitize data at the application boundary.
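One possible shape for that sanitizing step is sketched below. The neutralize_formula function and the FORMULA_PREFIXES tuple are hypothetical names, and the single-quote prefix is just one of the mitigations mentioned above, so adapt it to your own threat model.

```python
FORMULA_PREFIXES = ("=", "+", "-", "@")


def neutralize_formula(value):
    """Prefix a single quote when a string would be read as a formula."""
    if isinstance(value, str) and value.startswith(FORMULA_PREFIXES):
        return "'" + value
    return value


safe = [neutralize_formula(v) for v in ['=HYPERLINK("malicious-url")', "Alice", 42]]
print(safe)
```

One trade-off to be aware of: a string such as "-5" is also prefixed, so legitimate negative numbers stored as text will be altered. Passing them as real numbers, as with 42 above, avoids that.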

Why is CSV So Popular?

  • Simplicity: Plain text format, easy to create and edit with any text editor.
  • Universal Compatibility: Virtually every spreadsheet, database, and programming language can read and write CSV.
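  • Small File Size: No formatting or styling overhead, so a file holds little beyond the data itself. A 1 million row dataset might be around 50 MB as CSV; the same data as XLSX can come out larger or smaller, because XLSX adds XML structure but is also ZIP-compressed.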
  • Human Readable: Open a CSV and immediately understand its contents — no special software required.
  • Easy to Process: Simple parsing in any programming language. Python, JavaScript, Go, Rust, Java, and R all have built-in, standard, or widely used third-party CSV libraries.
  • Version Control Friendly: Because CSV is plain text, tools like Git can show exactly which rows changed between versions.
  • Streaming-Friendly: CSV can be read and processed one record at a time without loading the entire file into memory, making it suitable for very large datasets.
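The streaming point can be sketched in a few lines of Python. The sum_column helper below is a hypothetical example: it consumes records one by one from any iterable of lines, such as an open file, and never holds more than the current row.

```python
import csv
import io


def sum_column(lines, column):
    """Add up a numeric column while reading one record at a time."""
    total = 0.0
    for row in csv.DictReader(lines):
        total += float(row[column])
    return total


# An in-memory sample stands in for a large file opened with newline="".
sample = io.StringIO("item,amount\nA,10.5\nB,4.5\n", newline="")
total = sum_column(sample, "amount")
print(total)  # 15.0
```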

CSV vs Other Formats

| Feature | CSV | Excel | JSON |
|---|---|---|---|
| Human Readable | Yes | No | Yes |
| Formatting | No | Yes | No |
| Nested Data | No | No | Yes |
| File Size | Small | Medium | Medium |

Common Uses for CSV Files

  • Data Export/Import: Moving data between applications and databases. Most SaaS tools (Stripe, Shopify, Mailchimp) export data as CSV.
  • Spreadsheet Data: Sharing data without proprietary formatting. CSV works in Excel, Google Sheets, LibreOffice, and Numbers.
  • Contact Lists: Importing/exporting email contacts. Gmail, Outlook, and HubSpot all use CSV for contact import.
  • Financial Data: Bank statements, transaction records, and accounting data. Most banks offer CSV downloads for statements.
  • E-commerce: Product catalogs, inventory lists, and order exports. Shopify and WooCommerce use CSV for bulk product management.
  • Analytics: Exporting reports from Google Analytics, Mixpanel, or custom dashboards.
  • Machine Learning: Training datasets are commonly distributed as CSV files. Kaggle, the largest ML competition platform, uses CSV as its default format.
  • ETL Pipelines: Extract-Transform-Load processes frequently use CSV as an intermediate format between systems.

CSV in Programming Languages

Every major programming language offers CSV support, whether built in, in the standard library, or through widely used third-party packages:

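| Language | Library | Type |
|---|---|---|
| Python | csv module; pandas.read_csv() | Built-in (csv); third-party (pandas) |
| JavaScript | Papa Parse, csv-parse | npm packages |
| Go | encoding/csv | Standard library |
| Java | OpenCSV, Apache Commons CSV | Third-party |
| R | read.csv(); readr::read_csv() | Built-in (read.csv); third-party (readr) |
| Rust | csv crate | Third-party |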

CSV Limitations

While CSV is versatile, it has real limitations you should be aware of:

  • No data types: Everything is a string. Numbers, dates, and booleans have no explicit type — the receiving application must interpret them.
  • No nested data: CSV is flat. If you need hierarchical or nested structures, use JSON or XML instead.
  • No metadata: There is no way to specify encoding, delimiter, or column types within the file itself.
  • No multiple tables: A CSV file contains exactly one table. For related datasets, you need multiple files.
  • Ambiguity: Without metadata, parsers must guess the delimiter, encoding, and quoting rules. This leads to interoperability issues.
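Because the file carries no type information, the reading side has to supply it. The sketch below shows one way to do that in Python with an explicit converter per column; the CONVERTERS mapping, the read_typed helper, and the sample columns are all assumptions made for illustration.

```python
import csv
import io
from datetime import date

# Hypothetical schema: the CSV itself cannot express any of this.
CONVERTERS = {
    "id": int,
    "price": float,
    "active": lambda s: s.lower() == "true",
    "created": date.fromisoformat,
}


def read_typed(text, converters):
    """Parse CSV text and convert each column with its converter (default: str)."""
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return [
        {key: converters.get(key, str)(value) for key, value in row.items()}
        for row in reader
    ]


TEXT = "id,price,active,created\n1,9.99,true,2024-01-15\n"
typed = read_typed(TEXT, CONVERTERS)
print(typed[0])
```

Keeping the schema in one place like this makes the guesswork explicit, which helps when the same file is consumed by more than one program.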

Convert Text to CSV

Need to convert text data to CSV format? Our free Text to CSV Converter makes it easy. Simply paste your text, choose your delimiter, and get properly formatted CSV output that follows RFC 4180 standards.

Key Takeaways

  • CSV stands for Comma-Separated Values — a plain text format for tabular data dating back to the 1960s
  • RFC 4180, published in 2005, is the de facto reference specification (an informational RFC, not a formal internet standard)
  • The MIME type is text/csv with .csv file extension
  • Always use UTF-8 encoding (with BOM for Excel compatibility)
  • Special characters — commas, quotes, and line breaks — must be quoted and escaped properly
  • Watch out for leading zeros, formula injection, and encoding mismatches
  • CSV is supported by virtually every spreadsheet, database, and programming environment