Tim Hastings - NonHostile (because there's no need)

Weblog and collection of geeky articles.

  Home :: Who? :: Contact :: Links :: Subscribe subscribe
Congratulations to Retiree Terry!!Abigail's ChristeningA Grand Day Out in Morecambe


This code snippet will decode a CSV line and return a string array.
Imports System.Text.RegularExpressions

Public Class CSV

    Public Shared Sub Test()

        Dim data() As String

        data = DecodeCSV("")
        data = DecodeCSV("1,2,3")
        data = DecodeCSV("1")
        data = DecodeCSV("1 ")
        data = DecodeCSV("1,""2"",3")
        data = DecodeCSV("1,""23"",3")
        data = DecodeCSV("1,""23"",3")
        data = DecodeCSV("12, asd, fsdfsdf, ""inquotes""")
        data = DecodeCSV("12,asd,""inquotes""")
        data = DecodeCSV("12, asd, ""inquotes""")
        data = DecodeCSV("12 , asd , ""inquotes""")
        data = DecodeCSV("12, asd, fsdfsdf, """"""Hello """"world"""" """)
        data = DecodeCSV("pony, and, trap")
        data = DecodeCSV("   yeah! ,,,,,,,,,,,man!!,,,,,,,,,,   ")

    End Sub

    Public Shared Function DecodeCSV(ByVal strLine As String) As String()

        Dim strPattern As String
        Dim objMatch As Match

        ' build a pattern
        strPattern = "^" ' anchor to start of the string
        strPattern += "(?:""(?<value>(?:""""|[^""\f\r])*)""|(?<value>[^,\f\r""]*))"
        strPattern += "(?:,(?:[ \t]*""(?<value>(?:""""|[^""\f\r])*)""|(?<value>[^,\f\r""]*)))*"
        strPattern += "$" ' anchor to the end of the string

        ' get the match
        objMatch = Regex.Match(strLine, strPattern)

        ' if RegEx match was ok
        If objMatch.Success Then
            Dim objGroup As Group = objMatch.Groups("value")
            Dim intCount As Integer = objGroup.Captures.Count
            Dim arrOutput(intCount - 1) As String

            ' transfer data to array
            For i As Integer = 0 To intCount - 1
                Dim objCapture As Capture = objGroup.Captures.Item(i)
                arrOutput(i) = objCapture.Value

                ' replace double-escaped quotes
                arrOutput(i) = arrOutput(i).Replace("""""", """")
            Next

            ' return the array
            Return arrOutput
        Else
            Throw New ApplicationException("Bad CSV line: " & strLine)
        End If

    End Function

End Class


8 comments, VB.Net, Tuesday, April 5, 2005 12:02

Timeline Navigation for VB.Net posts
VB.Net: How To Draw Anti-Aliased Text Over Bitmaps (made 8 weeks later)
VB.Net: How To Decode CSV Data using Regular Expressions (this post, made Tuesday, April 5, 2005 12:02)
VB.Net: How To Creat an Instance of an Object using its Type Name (made 4 weeks earlier)


Comments
This code is freaking great, saved me alot of time for a quick little csv reading/writing program I needed to write!!! The code worked perfectly the first time, and is very masterfully written, following propor coding standards. This is coming from a VB.NET coder who is anal about coding standards and propor coding.

Posted by: Triforce on Friday, November 4, 2005 10:56
....tho obviously not so anal about propEr English Triforce? ;-)

Nice article Tim!

Posted by: Matt Quinn on Monday, February 26, 2007 11:21
What can I say. Awsome.
Just Bloomin Awsome. Well do chap!

Posted by: Johnny B on Tuesday, October 28, 2008 21:49
Hi,
Could you please convert to c# and send me the code, I couldn't convert the string pattern

Posted by: Bhavani Sudha on Friday, October 31, 2008 19:04
Excellent - well useful!

One small thing I noticed is that it doesn't handle a single set of quotes in a string..

An example test case that will raise an error is:
data = DecodeCSV("12" & Chr(34) & " pizza")

Excel will handle csv files with that scenario just fine.

Posted by: IanMcD on Monday, January 26, 2009 14:37
i freaking love u

Posted by: Jr on Tuesday, March 16, 2010 23:00
Thanks, This very usefull. Here is a c# translation.
/// <summary>
/// Decodes CSV string
/// </summary>
public static string[] DecodeCSV(string strLine)
{
// http://www.nonhostile.com/page000029.asp
// http://www.nonhostile.com/howto-decode-csv-regular-expressions-vb-net.asp
string strPattern = "^";
strPattern = (strPattern + "(?:\"(?<value>(?:\"\"|[^\"\\f\\r])*)\"|(?<value>[^,\\f\\r\"]*))") + "(?:,(?:[ \\t]*\"(?<value>(?:\"\"|[^\"\\f\\r])*)\"|(?<value>[^,\\f\\r\"]*)))*" + "$";
Match objMatch = Regex.Match(strLine, strPattern);
if (!objMatch.Success)
{
return new string[] { };
}
Group objGroup = objMatch.Groups["value"];
int intCount = objGroup.Captures.Count;
string[] arrOutput = new string[(intCount - 1) + 1];
for (int i = 0; i < intCount; i++)
{
Capture objCapture = objGroup.Captures[i];
arrOutput[i] = objCapture.Value;
arrOutput[i] = arrOutput[i].Replace("\"\"", "\"").Trim();
}

return arrOutput;
}

Posted by: Karl Tarbet on Thursday, July 22, 2010 18:28

Post a Comment
Name:  Home page and email address are optional.
  Email addresses will not be displayed or spammed!
Remember these details
Email:
Home Page:
Comment:
Comments cannot contain HTML, URLs will be formatted into hyperlinks.
I reserve the right to remove any comments for any reason.