Monday, October 29, 2012

Read CSV with LinqPad

This is a super simple way of reading CSV in linqpad, it won't handle every situation (I'll cover more ways in future posts), but it's pretty handy.

Lets say we have the following CSV:

ID,Name,Email
1,alex,alex@example.org
2,bob,bob@example.org
3,fred,fred@example.org

The first step is to click "my extensions" on the bottom left hand corner of linqpad.

Now copy the following c# code (sourced from: stackoverflow) in the "my extensions" class in linqpad:
The advantage of this rather than simply reading all lines is that it will load the file line by line and using the yield keyword will stop if finds what you need (rather than reading the whole file).

 public static IEnumerable ReadFrom(string file) 
 {
  string line;
  using(var reader = File.OpenText(file))
  {
   while((line = reader.ReadLine()) != null) 
   {
    yield return line;
   }
  }
 }
Remember to press F5 in the my expressions window to compile it.

You can now very easily read the CSV in linqpad using the following linq query (in expression syntax):
Hover over to see descriptions

var csvData =
from row in MyExtensions.ReadFrom(@"c:\sample.csv").Skip(1)
let columns = row.Split(',')
select new
{
 ID = int.Parse(columns[0]),
 Name = columns[1],
 Email = columns[2]
};

csvData.Dump();

Now when you press F5 in LinqPad your CSV will be displayed in a nicely formatted table:

You can then parse the CSV in with normal linq queries, for example this simple linq expression will find people whose name "alex". Please note the intellisense drop down will only in linqpad pro edition (certainly worth the money if you ask me!)

 var alexs = from csv in csvData
         where csv.Name.Contains("alex")
      select csv;

I have found this sort of technique especially useful when comparing on an adhoc basis if CSV data was successfully imported via SSIS etc. This technique won't handle quoted CSV, but let me know and I'll post a follow up with more in-depth CSV handling techniques or perhaps if you want me to cover linqpad further.

8 comments:

Net Developer said...

Nice!

Recently I was looking for something quick to read excel file and found http://code.google.com/p/excellibrary/

Using it in Linqpad was as simple as
var excelQuery = new ExcelQueryFactory(@"D:\Book1.xlsx");

var rows = from x in excelQuery.Worksheet()
select x;

Alex Key said...

Thanks for the comment Net Developer. I'm often having to dive into excel, that library looks good.

Mark Pearson said...

I have problem getting error
System.Collections.IEnumerable' does not contain a definition for 'Skip' and no extension method 'Skip' accepting a first argument of type 'System.Collections.IEnumerable' could be found (press F4 to add a using directive or assembly reference)
What should I do

Daniel Leonard said...

If you work with csv frequently, I would recommend this as an extension instead:
public static IEnumerable<List<String>> ReadCsv(string file)
{
string line;
using(StreamReader reader = File.OpenText(file))
{
while((line = reader.ReadLine()) != null)
{
List<String> currentLine = new List<String>(line.Split(','));
yield return currentLine;
}
}
}

This will return each line of a csv as a List<String>. You can then use linq on each line to create a strongly typed object (with property names Col1, Col2, Col3) like this:
var results = from i in MyExtensions.ReadCsv(@".\FILENAME.csv") select new {Col1 = i[0], Col2 = i[1], Col3 = i[2]};

GZ said...

System.Collections.IEnumerable' does not contain a definition for 'Skip' ...

Use cast
from row in MyExtensions.ReadFrom(@"c:\sample.csv").Cast().Skip(1)

GZ said...
This comment has been removed by the author.
GZ said...
This comment has been removed by the author.
GZ said...


Cast<string>()