I am proud to announce that my CSV Parser for Windows RT is now available on MSDN Code Samples.
Creating a CSV parser sounds like an easy task, but it’s the developer equivalent of quicksand.
CSV stands for Comma Separated Values, but despite the name, files in this "format" often don’t separate their fields with commas.
Often, tabs or pipe characters are used instead.
Additionally, parsing a CSV file is not as straightforward as it seems.
One would think that it would be as simple as splitting the raw text first by line to get the records and then by delimiter to extract the fields.
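Here is a minimal Python sketch of that naive approach. `naive_parse` is a hypothetical helper name, and Python's `str.split` stands in for .NET's `string.Split`.

```python
def naive_parse(text: str, delimiter: str = ",") -> list[list[str]]:
    """Split into records by line, then into fields by delimiter (naive)."""
    records = []
    for line in text.splitlines():
        if line:  # skip blank lines
            records.append(line.split(delimiter))
    return records


print(naive_parse("id,magnitude\n1,4.5\n2,3.1\n"))
# [['id', 'magnitude'], ['1', '4.5'], ['2', '3.1']]
```

It works on tidy input, and passing `"|"` or `"\t"` as the delimiter covers the pipe- and tab-separated variants.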
Simple enough, right?
Like many things in our line of work, it can be that simple, but it rarely is.
The Slippery Slope
Here’s an example from a feed about earthquake data from the USGS.
Notice how that nice simple code you had in mind suddenly gets more complex.
Clearly, a simple string.Split(',') won’t cut it.
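To see why, consider the hypothetical record below. It is made up for illustration and is not taken from the USGS feed. Its location field contains a comma, so the field is wrapped in quotes. Python's `str.split` behaves like `string.Split` here.

```python
# Hypothetical record: timestamp, magnitude, and a quoted location
# field that contains a comma of its own.
line = '2013-01-15T10:02:00Z,4.5,"10km NE of Anchorage, Alaska"'

fields = line.split(",")
print(len(fields))   # 4, although the record has only 3 fields
print(fields[2:])    # ['"10km NE of Anchorage', ' Alaska"']
```

The quoted location is torn in two, and the quote characters leak into the field values.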
But wait, it gets worse.
Descent into Development Hell
The first rule of plain text files is that once you pick a special character, you have to come up with a way to escape that character anywhere else it may appear.
It’s a slippery slope, to be sure. First, we make the comma the delimiter; then, when a comma legitimately appears in our data, we have to make it clear that it’s content, not structure.
Not a big deal, right? Just put the field inside quotes. But what happens when the field itself contains a quote character? Now you have to escape the escape character, too.
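Handling quotes means giving up on splitting altogether and scanning the text one character at a time. Below is a minimal Python sketch of such a state machine. It assumes the conventions described in RFC 4180: a field may be wrapped in double quotes, a quoted field may contain the delimiter and line breaks, and a quote inside a quoted field is written twice (`""`). `parse_csv` and its parameters are hypothetical names for illustration. This is not the code of the Windows RT parser announced above.

```python
def parse_csv(text: str, delimiter: str = ",", quote: str = '"') -> list[list[str]]:
    """Parse delimited text with quoted fields (assumed RFC 4180-style rules)."""
    records: list[list[str]] = []
    record: list[str] = []
    field: list[str] = []
    in_quotes = False
    line_start = True  # True when nothing has been read on the current record
    i, n = 0, len(text)

    while i < n:
        ch = text[i]
        line_start = False
        if in_quotes:
            if ch == quote:
                if i + 1 < n and text[i + 1] == quote:
                    field.append(quote)  # "" inside quotes is one literal quote
                    i += 1
                else:
                    in_quotes = False  # closing quote
            else:
                field.append(ch)  # delimiters and line breaks are content here
        elif ch == quote:
            in_quotes = True  # opening quote (lenient: also accepted mid-field)
        elif ch == delimiter:
            record.append("".join(field))
            field = []
        elif ch in "\r\n":
            if ch == "\r" and i + 1 < n and text[i + 1] == "\n":
                i += 1  # treat \r\n as a single line break
            record.append("".join(field))
            records.append(record)
            record, field = [], []
            line_start = True
        else:
            field.append(ch)
        i += 1

    # Flush the last record when the text does not end with a line break.
    if not line_start:
        record.append("".join(field))
        records.append(record)
    return records


print(parse_csv('1,"Anchorage, Alaska"\n'))
# [['1', 'Anchorage, Alaska']]
```

Because a quoted field may span several lines, this sketch cannot split by line first either. It is deliberately lenient: a stray quote mid-field or an unterminated quote is accepted silently instead of being reported as an error. A production parser would need to decide how to handle malformed input.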
Just Use RegEx!
True, but using RegEx well means understanding RegEx thoroughly. Used properly, RegEx works great.
But few people know RegEx all that well.
The bottom line is summed up in this quote:
“Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems.” - Jamie Zawinski
But This is the 21st Century, Use XML or JSON!
Many organizations make extensive use of the CSV file format for data exchange.
In fact, many government agencies expose their datasets on Data.gov via CSV.
Simply put: JSON or XML may not be an option for some organizations.
It’s hard to beat a file format that’s as easy to create as CSV. (It can be as simple as choosing Save As in Excel.)
The bottom line is that there’s a great deal of CSV content out there in the wild, and developers should spend their time creating great Apps, not parsing files.