David J Murray Dev
~ Learn and Do - Excel and Share ~

Decoding CSV

Oct 4, 2007
You're Reading: Decoding CSV

I am attempting to decode a web log file, into a UniVerse record. The weblog record fields are space delimited with text fields within double quotes. This is similar to a standard CSV (comma separated value) file format.

I personally hate massive string extraction work, and this started to look like spaghetti code.

So, while walking around the local park, for a breather, the idea hit me. Why not break the web log string line into groups, based on where the text fields are. Then it hit me again, this would work on CSV files also.

OK, the problem is that field delimiters (comma’s in CSV) can be within a text field (delimited by double quotes). So, if you break up or group the record by the text marks (double quotes) and then process each group separately, it should be easier to handle.

As the text marks must be in pairs, the group before a text mark will be normal field(s) and can have the field delimiters converted to attribute marks, the group between text marks must be processed as is, ignoring the field delimiters (comma’s) and the group after a text mark also must be processed as normal individual fields.

As a more general statement, if there are a mixture of normal fields and text fields, then all odd numbered groups consist of normal field(s) delimited and all even numbered groups are text fields (delimiters ignored).

So, the end result, along with error checking and all the gunk that turns a couple of lines of code into a usable routine, is:

SUBROUTINE DECODE.CSV(RESULT, CSV.TXT, FLD.DELIM, STR.DELIM)
*-------------------------------------------
*
* Decode Character Seperated Value line of text to a record
*
* Input : CSV.TXT - A single line of CSV text
*         FLD.DELIM - Field delimiter, normally a comma
*         STR.DELIM - String delimiter, normally a double quote
*
* Output: RESULT - A record with fields delimited by Field Marks (@FM)
*                        - EMPTY.STR if an error
*            @USER.RETURN.CODE - 0 - No error
*                                          - 1 - An error
*
$COPYRIGHT "David Murray 2007"
*
*-------------------------------------------
*
* Rev Date        User         Notes
* 0   04 Oct 2007 David Murray Created
*
* Setup equates or constants
EQUATE EMPTY.STR TO "" ;* Empty string
EQUATE COMMA TO "," ;* Comma for field delimiters
EQUATE STR.QUOTE TO '"' ;* String quote
*
* MAIN
GOSUB INITIALISE
GOSUB DECODE
GOSUB FINALISE
IF OK NE @TRUE THEN
*
RESULT = EMPTY.STR
@USER.RETURN.CODE = 1
END ELSE
*
@USER.RETURN.CODE = 0
END
*
RETURN
*
* GOSUBs
*
* Initialise variables, open files etc.
INITIALISE:
*---------
*
OK = @TRUE ;* General error flag
RESULT = EMPTY.STR ;* The resultant record goes here
* Quick error check and exit
IF CSV.TXT EQ EMPTY.STR THEN
OK = @FALSE
RETURN
END
* Check for defaults
IF FLD.DELIM EQ EMPTY.STR THEN FLD.DELIM = COMMA
IF STR.DELIM EQ EMPTY.STR THEN STR.DELIM = STR.QUOTE
* Can not have same character for field and string delimiters.
IF FLD.DELIM EQ STR.DELIM THEN
OK = @FALSE
RETURN
END
* Must have an even number of string delimiters else there is a syntax error
IF MOD(COUNT(CSV.TXT, STR.DELIM), 2) NE 0 THEN
OK = @FALSE
RETURN
END
RETURN
* Decode CSV to fields using @AM
DECODE:
*-----
*
IF OK NE @TRUE THEN RETURN ;* General error check
* Count the number of text strings and change the string quotes to attribute marks
* to group data
GROUP.MAX  = DCOUNT(CSV.TXT, STR.DELIM)
CSV.TXT = CHANGE(CSV.TXT, STR.DELIM, @AM)
* Initialise result var. Each field will be appended to the result.
RESULT = EMPTY.STR
* Cycle through each group, checking group position of odd and even.
FOR GROUP.PTR = 1 TO GROUP.MAX
IF MOD(GROUP.PTR,2) NE 0 THEN
* If the group position is odd then it is not a text field, so convert field delimiter(s) to a field mark(s)
RESULT := CHANGE(CSV.TXT, FLD.DELIM, @FM)
END ELSE
* The group position is even, so is a text field, so DO NOT convert field delimiter(s) to field marks,
* just copy whole text field as is.
RESULT := CSV.TXT
END
NEXT GROUP.PTR
RETURN
*
FINALISE:
*-------
*
RETURN

So, using a bit of paper, write down a line of CSV, divide up the line by the text marks (double quotes) and have a go…

Oh, I assume that this method is already documented somewhere. If you find or know of similar methods, add a link or reference as a comment. Thanks.

save.run.done.

About the Author

<a href="https://davidjmurray.dev/losucag/davidjmurray/" target="_self">djm</a>

djm

I have been in the computer/IT industry for over 35 years. After spending half a life-time in highly proprietary database environments, I have succumbed to the WordPress Eco-system and I'm well down the chasm. I am specializing in AWS hosting, Wordpress CMS, Elegant Themes, in particular Divi and Extra themes.

0 Comments

Submit a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Pin It on Pinterest

Share This