Tim Hastings - NonHostile (because there's no need)

Weblog and collection of geeky articles.



This code sample shows how to convert UTF-8 byte sequences (code page 65001) into Unicode strings and back again in Visual Basic 6, rather than in .NET like most of the examples I could find. What's more, this sample does not use Windows API calls; instead, it relies on the Stream object provided by the ADODB library. This may not be the most efficient way of doing it, but these functions can easily be ported to Classic ASP by dropping all the variable types.

For this code to work, you will need to add a reference to the Microsoft ActiveX Data Objects 2.5 Library; later versions of this library will also work.

The first function converts a unicode string to a byte array:
' accept a unicode string
' and convert it to a byte array containing utf-8 data
Public Function ConvertStringToUtf8Bytes(ByRef strText As String) As Byte()

    Dim objStream As ADODB.Stream
    Dim data() As Byte
    
    ' init stream
    Set objStream = New ADODB.Stream
    objStream.Charset = "utf-8"
    objStream.Mode = adModeReadWrite
    objStream.Type = adTypeText
    objStream.Open
    
    ' write text into stream
    objStream.WriteText strText
    objStream.Flush
    
    ' rewind stream and read bytes
    objStream.Position = 0
    objStream.Type = adTypeBinary
    objStream.Read 3 ' skip the first 3 bytes: the utf-8 byte order mark (EF BB BF) written by the stream
    data = objStream.Read()
    
    ' close up and return
    objStream.Close
    ConvertStringToUtf8Bytes = data

End Function
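The function above skips the first three bytes unconditionally. If you want to check for the marker before discarding anything, for example on byte arrays that arrive from elsewhere, something like the following might do. This is only a sketch: HasUtf8Bom is a name I have made up for illustration, and it assumes the array has already been dimensioned.

```vb
' Hypothetical helper (not part of the original project):
' returns True when the array starts with the utf-8 marker EF BB BF.
' Assumes the array has been dimensioned by the caller.
Public Function HasUtf8Bom(ByRef data() As Byte) As Boolean

    Dim lngFirst As Long

    lngFirst = LBound(data)

    ' need at least three bytes before comparing
    If UBound(data) - lngFirst >= 2 Then
        HasUtf8Bom = (data(lngFirst) = &HEF) And _
                     (data(lngFirst + 1) = &HBB) And _
                     (data(lngFirst + 2) = &HBF)
    End If

End Function
```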
This second function does the opposite, converting a byte array into a unicode string:
' accept a byte array containing utf-8 data
' and convert it to a string
Public Function ConvertUtf8BytesToString(ByRef data() As Byte) As String

    Dim objStream As ADODB.Stream
    Dim strTmp As String
    
    ' init stream
    Set objStream = New ADODB.Stream
    objStream.Charset = "utf-8"
    objStream.Mode = adModeReadWrite
    objStream.Type = adTypeBinary
    objStream.Open
    
    ' write bytes into stream
    objStream.Write data
    objStream.Flush
    
    ' rewind stream and read text
    objStream.Position = 0
    objStream.Type = adTypeText
    strTmp = objStream.ReadText
    
    ' close up and return
    objStream.Close
    ConvertUtf8BytesToString = strTmp

End Function
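As a step towards the Classic ASP port mentioned at the top, here is a rough late-bound sketch of the same function. It uses CreateObject instead of a project reference, so the ADO constants have to be declared by hand. The function name and the constant names are my own inventions for illustration; the numeric values are the ones ADO defines for adTypeBinary, adTypeText and adModeReadWrite.

```vb
' Late-bound variant: no project reference to ADO is required.
' The constant names below are local stand-ins for the ADO enum values.
Private Const AD_TYPE_BINARY As Long = 1      ' adTypeBinary
Private Const AD_TYPE_TEXT As Long = 2        ' adTypeText
Private Const AD_MODE_READ_WRITE As Long = 3  ' adModeReadWrite

Public Function Utf8BytesToStringLateBound(ByRef data() As Byte) As String

    Dim objStream As Object

    ' init stream
    Set objStream = CreateObject("ADODB.Stream")
    objStream.Charset = "utf-8"
    objStream.Mode = AD_MODE_READ_WRITE
    objStream.Type = AD_TYPE_BINARY
    objStream.Open

    ' write the raw bytes, then rewind and re-read them as text
    objStream.Write data
    objStream.Position = 0
    objStream.Type = AD_TYPE_TEXT
    Utf8BytesToStringLateBound = objStream.ReadText

    objStream.Close

End Function
```

In VBScript the "As ..." clauses would have to go, and Server.CreateObject is the usual way to create the stream in ASP.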
This test method uses a function called DecodeBase64 which is defined in this article: Free, Easy and Quick Base64 Encoding and Decoding in Visual Basic.
Public Sub Main()

    Dim strB64 As String
    Dim data() As Byte
    Dim strTmp As String
    
    ' define test data as base64 and decode to array of bytes
    strB64 = "R3JlZXRpbmdzIGFuZCBTYWx1dGF0aW9uISAo4oKsKSBhbmQgc29"
    strB64 = strB64 & "tZSBVcmR1OiDaqdix2KfahtuMINm+2Kfaqdiz2KrYp9mG24w="
    data = DecodeBase64(strB64)

    ' convert from utf-8 to string
    strTmp = ConvertUtf8BytesToString(data)
    
    ' convert back to bytes
    data = ConvertStringToUtf8Bytes(strTmp)
   
End Sub
Please note that the VB6 IDE and the standard VB6 form controls have difficulty displaying Unicode characters and will show characters outside the current ANSI code page as '?????'.
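Because the IDE cannot be trusted to display the result, one way to check a converted string is to print its UTF-16 code units as hex and compare them against a character chart. The helper below is a small sketch; DumpCodeUnits is a hypothetical name and is not part of the download.

```vb
' Hypothetical helper: list the UTF-16 code units of a string in hex,
' separated by spaces.
Public Function DumpCodeUnits(ByRef strText As String) As String

    Dim i As Long
    Dim lngUnit As Long
    Dim strOut As String

    For i = 1 To Len(strText)
        ' AscW returns a signed Integer, so mask it into the range 0..65535
        lngUnit = AscW(Mid$(strText, i, 1)) And &HFFFF&
        If i > 1 Then strOut = strOut & " "
        ' pad each value to four hex digits
        strOut = strOut & Right$("000" & Hex$(lngUnit), 4)
    Next i

    DumpCodeUnits = strOut

End Function
```

For example, Debug.Print DumpCodeUnits("A" & ChrW$(&H20AC)) prints 0041 20AC, the code units for "A" and the Euro sign.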

The code from this article can be downloaded as a VB6 project here: NonHostile_VB6_Convert_UTF8.zip

Hope this helps :-)



22 comments, Visual Basic 6, Thursday, January 26, 2006 21:51

Timeline Navigation for Visual Basic 6 posts
VB6: How To Convert UTF-8 Byte Arrays into Unicode Strings (and vice versa) (this post, made Thursday, January 26, 2006 21:51)
VB6: Variant Stack Class (Code Library) (made 88 weeks earlier)


Comments
i thought it could help so here is a function that will encode a string utf8 from pure VB6 source code:

Private Function UTF8_Encode(ByVal sStr As String)
For l& = 1 To Len(sStr)
lChar& = AscW(Mid(sStr, l&, 1))
If lChar& < 128 Then
sUtf8$ = sUtf8$ + Mid(sStr, l&, 1)
ElseIf ((lChar& > 127) And (lChar& < 2048)) Then
sUtf8$ = sUtf8$ + Chr(((lChar& \ 64) Or 192))
sUtf8$ = sUtf8$ + Chr(((lChar& And 63) Or 128))
Else
sUtf8$ = sUtf8$ + Chr(((lChar& \ 144) Or 234))
sUtf8$ = sUtf8$ + Chr((((lChar& \ 64) And 63) Or 128))
sUtf8$ = sUtf8$ + Chr(((lChar& And 63) Or 128))
End If
Next l&
UTF8_Encode = sUtf8$
End Function

Posted by: jelo on Saturday, February 11, 2006 05:06
Well, I'll test those two functions. If they work, you saved my life!! I had ISO-1 data to convert to UTF-8 for an XML import, and I was unable to guess what to do.
Happily, there is always the option of destroying all the accents and "strange" characters...

Posted by: spiritoo on Thursday, February 16, 2006 15:53
Hiya, I hope it did the trick!

Posted by: Tim on Thursday, February 16, 2006 21:57
Thank you very, very much!!!!

Posted by: Maike on Friday, August 3, 2007 14:34
Thanks for your code, I've used it with vb.net 2005, and worked just fine.

Posted by: samm lee on Thursday, September 13, 2007 12:01
Thanks, you saved me lots of time!!

Posted by: Milan Rajkovic on Wednesday, January 9, 2008 15:05
Perfect!! I used it in MS Access.

Posted by: Timo on Thursday, January 31, 2008 16:56
thank u for yr .........help

Posted by: karthick on Monday, February 4, 2008 14:39
Great code, thanx a lot, perfect for comunicating with flash xml sockets :D

Posted by: joseXR on Wednesday, August 6, 2008 13:10
Hello, sorry but I am finding this same function but in VB .Net, because some of the instances of ADO are not working in .Net. May some body help me, thank you

Posted by: Alejandro Ruiz on Monday, October 13, 2008 16:41
Very, very, very thank you

Posted by: elektron on Wednesday, October 22, 2008 12:38
Hi "Maike " and "joseXR"!
VB.NET / 2005 does not need this trick mentioned here. In .NET, all strings are internally stored as UTF-16 and can be converted to an UTF-8 byte array easily with the following method:

System.Text.Encoding.UTF8.GetBytes( __stringToConvert__ )


Posted by: [Stefan] on Monday, January 12, 2009 10:52
Hi jelo,
Actually, I tried to use your code in VB6.
It returns the same string that I gave as input.
In addition to your code I added the declarations; I changed nothing else.
Is there anything that I need to change?

Posted by: Raja on Thursday, January 22, 2009 14:55
Good info.

A little tip for pasting UTF text into a VB form field: use a RichTextBox instead of a TextBox.

http://tramusos.wordpress.com

Posted by: tramusos on Saturday, March 7, 2009 19:31
Great code, you saved my life :D

Posted by: kos on Tuesday, June 23, 2009 10:44
It's work!!! thank you so much

Posted by: tim on Friday, September 11, 2009 12:07
Thank you very much ...

Posted by: spark on Monday, September 21, 2009 08:05
Function UTF8_Decode(ByVal sStr As String)
Dim l As Long, sUTF8 As String, iChar As Integer, iChar2 As Integer
For l = 1 To Len(sStr)
iChar = Asc(Mid(sStr, l, 1))
If iChar > 127 Then
If Not iChar And 32 Then ' 2 chars
iChar2 = Asc(Mid(sStr, l + 1, 1))
sUTF8 = sUTF8 & ChrW$(((31 And iChar) * 64 + (63 And iChar2)))
l = l + 1
Else
Dim iChar3 As Integer
iChar2 = Asc(Mid(sStr, l + 1, 1))
iChar3 = Asc(Mid(sStr, l + 2, 1))
sUTF8 = sUTF8 & ChrW$(((iChar And 15) * 16 * 256) + ((iChar2 And 63) * 64) + (iChar3 And 63))
l = l + 2
End If
Else
sUTF8 = sUTF8 & Chr$(iChar)
End If
Next l
UTF8_Decode = sUTF8
End Function
' For more information look at http://de.wikipedia.org/wiki/UTF-8.

Posted by: Rainer on Wednesday, March 3, 2010 15:01
Great! Perfect for encoding accented vowels in XML files & generate RSS 2.0 feeds! Thanks a lot to all!

Posted by: Luca on Monday, March 8, 2010 12:37
Beware of bits! Encoding of characters > 2048 does not work!
Try replacing that code with the following lines:

sUTF8 = sUTF8 + Chr((((lChar \ &H1000&) And &HF&) Or &HE0&))
sUTF8 = sUTF8 + Chr((((lChar \ &H40&) And &H3F&) Or &H80&))
sUTF8 = sUTF8 + Chr(((lChar And &H3F&) Or &H80&))

This code will correctly encode the Euro € symbol too.

Posted by: Luca on Thursday, May 20, 2010 22:41
Thank you very much.
But I have problem on Latin character: ô
I think that It is Unicode but it's ascii code is 0x00F4.
When I submit "ô" to server, it is converted to "o". I don't know how to fix.

Posted by: tham tu on Tuesday, May 25, 2010 04:38
Thanks a million for the ConvertStringToUtf8Bytes code!!!
I have been working with website data that I have no control of for months and have not been able to figure out a way to filter out the bad characters out of the XML response that I receive. So, I added a small for-next loop to rebuild the string filtering out characters where data(x)>127 and viola, NO MORE INVALID XML CHARACTERS! Thanks again!

Posted by: Todd on Wednesday, July 28, 2010 02:05
