Online Since 1995


Building a Better CSV Parser Engine

Tags: CSV, IoC, Dependency Injection, WinRT, Win8

Yesterday, I posted the CSV Parse Engine for WinRT to help developers write Windows 8 apps more quickly. [download | blog post]

If you’ve had the chance to download the bits, you may have noticed a few things about the code and had some questions.

This post should answer questions the intro post didn’t.

Developers on a Plane

The entire library was written on a flight from Baltimore to Salt Lake City. 


Writing code on an airplane works well for me since I know I have limited battery life and I generally don’t have an internet connection.

It also helps to have a clearly defined goal for what you’ll have finished by the time the plane lands or your battery runs out.

Taking Apart the Engine

The heart of the library is the CsvParser class, in CsvParser.cs.

It contains everything needed to parse a CSV file, and my goal was to keep it simple.

Here’s all the code it takes to turn a CSV file into a more consumable format, an IEnumerable<IDictionary<string, string>>:

// earthQuakeData holds the raw CSV text;
// 'await' means this must run inside an async method.
CsvParser csvParser = new CsvParser();
csvParser.RawText = earthQuakeData;

var results = await csvParser.Parse();

Three lines. Not too shabby. ;)
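To make that IEnumerable<IDictionary<string, string>> shape concrete, here is a minimal sketch of how CSV text can be turned into one dictionary per record. It is not the library’s code: the CsvDictionarySketch class and its ToDictionaries method are hypothetical, and the sketch assumes the first record is a header row and that no field contains a quoted comma.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not the library's code. Assumes the first record is a
// header row and that fields never contain quoted commas.
public static class CsvDictionarySketch
{
    public static IEnumerable<IDictionary<string, string>> ToDictionaries(string csvText)
    {
        // Split into records, dropping blank lines and the '\r' left by CRLF endings.
        var records = csvText.Split('\n')
                             .Select(line => line.TrimEnd('\r'))
                             .Where(line => line.Length > 0)
                             .ToList();
        if (records.Count == 0)
            yield break;

        // The header row supplies the dictionary keys.
        string[] headers = records[0].Split(',');

        foreach (string record in records.Skip(1))
        {
            string[] fields = record.Split(',');
            var row = new Dictionary<string, string>();
            for (int i = 0; i < headers.Length; i++)
            {
                // A record with missing trailing fields gets empty strings.
                row[headers[i]] = i < fields.Length ? fields[i] : string.Empty;
            }
            yield return row;
        }
    }
}
```

Called on "Src,Lat\nus,38.5\n", this yields a single dictionary in which row["Lat"] is "38.5".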

You can keep the data as a list (an IEnumerable, actually) of Dictionary objects, or you can map it into a richer view model, like I did in the sample.

From MainPage.xaml.cs:

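// Requires: using System.Globalization;
this._earthQuakeDataList = new List<EarthQuakeData>();

foreach (var item in results)
{
    // Each item is one CSV record, keyed by column name.
    EarthQuakeData earthquakeDataPoint = new EarthQuakeData();

    earthquakeDataPoint.Source = item["Src"];
    earthquakeDataPoint.NST = item["NST"];
    earthquakeDataPoint.Region = item["Region"];

    // Parse with the invariant culture so values like "38.5" still read correctly
    // on machines whose locale uses a comma as the decimal separator.
    earthquakeDataPoint.Latitude = double.Parse(item["Lat"], CultureInfo.InvariantCulture);
    earthquakeDataPoint.Longitude = double.Parse(item["Lon"], CultureInfo.InvariantCulture);
    earthquakeDataPoint.Depth = double.Parse(item["Depth"], CultureInfo.InvariantCulture);
    earthquakeDataPoint.Magnitude = double.Parse(item["Magnitude"], CultureInfo.InvariantCulture);

    this._earthQuakeDataList.Add(earthquakeDataPoint);
}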

 

That’s much easier to work with than the raw data in the CSV file.

 

You’re in Control If You Want to Be

If my parser doesn’t cut it for you, that’s fine. I knew I couldn’t write the “CSV Parser to end all CSV Parsers” in a short amount of time. I also didn’t want to just rip off code I found on the internet: aside from being a bit on the shady side, that could create all sorts of licensing problems. ;)

Somewhere over Oklahoma, it hit me: use Dependency Injection to abstract away the finer details of CSV parsing.

Here’s the interface IParserEngine:

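// Requires: using System.Collections.Generic; using Windows.Foundation;
public interface IParserEngine
{
    // Splits the raw CSV text into individual records.
    IAsyncOperation<IList<string>> ExtractRecords(char lineDelimiter, string csvText);

    // Splits a single record into its fields.
    IAsyncOperation<IList<string>> ExtractFields(char delimiter, char quote, string csvLine);
}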

At its core, any CSV parser has two jobs:

  1. Extract the records.
  2. Extract the fields from each record.

The advantage of this approach is that I could finish the library with a relatively simple engine and improve the parsing logic later.
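Here is a minimal sketch of what a relatively simple engine for those two jobs might look like. Because IAsyncOperation is a WinRT type, the sketch swaps in a hypothetical Task-based interface, ITaskParserEngine, with the same two method shapes. NaiveParserEngine is also hypothetical; it is not the BasicParserEngine from the download.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical Task-based stand-in for IParserEngine; the real interface returns
// WinRT's IAsyncOperation<IList<string>> and is only available to Windows 8 apps.
public interface ITaskParserEngine
{
    Task<IList<string>> ExtractRecords(char lineDelimiter, string csvText);
    Task<IList<string>> ExtractFields(char delimiter, char quote, string csvLine);
}

// A deliberately naive engine: records are split on the line delimiter and
// fields on the field delimiter. The quote character is ignored entirely.
public class NaiveParserEngine : ITaskParserEngine
{
    public Task<IList<string>> ExtractRecords(char lineDelimiter, string csvText)
    {
        var records = new List<string>();
        foreach (string line in csvText.Split(lineDelimiter))
        {
            string trimmed = line.TrimEnd('\r'); // tolerate CRLF line endings
            if (trimmed.Length > 0)
                records.Add(trimmed);
        }
        return Task.FromResult<IList<string>>(records);
    }

    public Task<IList<string>> ExtractFields(char delimiter, char quote, string csvLine)
    {
        // 'quote' is unused, so a quoted field containing the delimiter gets split.
        IList<string> fields = csvLine.Split(delimiter);
        return Task.FromResult(fields);
    }
}
```

The weak spot is the unused quote parameter: a quoted field that contains a comma would be split in two, which is exactly the kind of thing a better engine can fix later without changing the interface.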

The sample code has two parser engines: BasicParserEngine and BetterParserEngine.

Basic is pretty basic: there’s not much logic in there.

Better is, well, better, and it’s currently the default engine.

Or, and here’s the best part: you can write your own!
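Writing your own engine works because the parser only depends on the engine interface. The sketch below shows that shape with constructor injection. Everything in it is hypothetical: ITaskParserEngine is a Task-based stand-in for the WinRT IParserEngine, TrimmingParserEngine is an example custom engine, and SketchCsvParser is a simplified stand-in rather than the library’s CsvParser, whose way of selecting an engine may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical Task-based stand-in for IParserEngine (the real one uses IAsyncOperation).
public interface ITaskParserEngine
{
    Task<IList<string>> ExtractRecords(char lineDelimiter, string csvText);
    Task<IList<string>> ExtractFields(char delimiter, char quote, string csvLine);
}

// Example custom engine: naive splitting, but trims whitespace around each field.
public class TrimmingParserEngine : ITaskParserEngine
{
    public Task<IList<string>> ExtractRecords(char lineDelimiter, string csvText) =>
        Task.FromResult<IList<string>>(
            csvText.Split(lineDelimiter)
                   .Select(r => r.TrimEnd('\r'))
                   .Where(r => r.Length > 0)
                   .ToList());

    public Task<IList<string>> ExtractFields(char delimiter, char quote, string csvLine) =>
        Task.FromResult<IList<string>>(
            csvLine.Split(delimiter).Select(f => f.Trim()).ToList());
}

// Hypothetical parser that knows only the interface; the engine is injected.
public class SketchCsvParser
{
    private readonly ITaskParserEngine _engine;

    public SketchCsvParser(ITaskParserEngine engine) => _engine = engine;

    public string RawText { get; set; } = string.Empty;

    public async Task<IEnumerable<IDictionary<string, string>>> Parse()
    {
        IList<string> records = await _engine.ExtractRecords('\n', RawText);
        var results = new List<IDictionary<string, string>>();
        if (records.Count == 0)
            return results;

        // The first record is treated as the header row.
        IList<string> headers = await _engine.ExtractFields(',', '"', records[0]);
        for (int r = 1; r < records.Count; r++)
        {
            IList<string> fields = await _engine.ExtractFields(',', '"', records[r]);
            var row = new Dictionary<string, string>();
            for (int i = 0; i < headers.Count; i++)
                row[headers[i]] = i < fields.Count ? fields[i] : string.Empty;
            results.Add(row);
        }
        return results;
    }
}
```

Trying a different engine means passing a different ITaskParserEngine to the constructor; SketchCsvParser itself does not change.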

Better is not Best

There’s a lot of room for improvement in the BetterParserEngine, and I do hope to release a more robust version.

The BetterParserEngine, for instance, does not handle escaped quote characters, and I haven’t benchmarked its performance.
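For anyone writing a more robust engine, the common convention (from RFC 4180) is that a field containing the delimiter, a line break, or a quote is wrapped in quotes, and a literal quote inside it is doubled. The hypothetical SplitFields method below is a minimal character-by-character sketch of that rule for a single record; it is not part of the library.

```csharp
using System.Collections.Generic;
using System.Text;

public static class QuotedFieldSketch
{
    // Splits one CSV record into fields. Delimiters inside quotes are kept, and a
    // doubled quote ("") inside a quoted field becomes one literal quote.
    public static IList<string> SplitFields(string line, char delimiter = ',', char quote = '"')
    {
        var fields = new List<string>();
        var current = new StringBuilder();
        bool inQuotes = false;

        for (int i = 0; i < line.Length; i++)
        {
            char c = line[i];
            if (inQuotes)
            {
                if (c == quote)
                {
                    if (i + 1 < line.Length && line[i + 1] == quote)
                    {
                        current.Append(quote); // escaped quote
                        i++;                   // skip its second half
                    }
                    else
                    {
                        inQuotes = false;      // closing quote
                    }
                }
                else
                {
                    current.Append(c);         // delimiters are literal here
                }
            }
            else if (c == quote)
            {
                inQuotes = true;               // opening quote (lenient: allowed anywhere)
            }
            else if (c == delimiter)
            {
                fields.Add(current.ToString());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        fields.Add(current.ToString());        // last field, possibly empty
        return fields;
    }
}
```

This only fixes field extraction. If records are split on the line delimiter before fields are extracted, as the interface suggests, a quoted field containing a line break would also need a quote-aware ExtractRecords.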

I’m also pretty sure that if I sat down with a good regex book, I could craft a very elegant, if unreadable, regular expression to solve the problem.

But I also know that my schedule between now and the launch of Windows 8 is going to be pretty hectic.

I wanted to give you control in case you didn’t want to wait for me. :)

Feedback

If you use this Toolkit in your app, please reach out to me (FrankLa at Microsoft dot com) to let me know.

If you're interested, I'd love to do a write-up about your app on my blog. If there's a Health Care, Education, or Government aspect to your app, we might even be interested in interviewing you and your team about how you used this kit.
