Spring Batch is used for a number of processing jobs. It's quite robust, easy to configure and performs fairly well. One of the common tasks it is used for is parsing CSV files. But let's face it, how often do you get CSV files that are well structured and follow RFC4180? CSV files come in all forms and shapes… and some of them are syntactically incorrect. You still have to deal with them. In this little blog post we will look at three samples:
Sample 1 (simple1.csv):

"John", "Doe", "where ever he lives"
"John", "Doe", "where ever she lives around the corner"

A very simple and straightforward CSV file. It should be easy to parse.
Sample 2 (simple2.csv):

"John", "Doe", "where ever he lives"
"John", "Doe", "where ever she lives
around the corner"

Now we have a small issue: a line break inside a quoted value. The reader should continue past the line break and treat the next line as part of the same record.
Sample 3 (simple3.csv):

"John", "Doe", "where ever he lives"
"John", "Doe", "where ever she ""lives around" the corner"

This sample has a serious issue that actually violates RFC4180. The "" before 'lives' is a correctly escaped quote, but the single double quote (") after 'around' is not, and RFC4180 requires every double quote inside a quoted field to be escaped by doubling it. That one character breaks the CSV file. Of course, you can always go to the provider of the data and ask them to fix the file. Well, good luck. I doubt that will happen.
Now let's take a look at some Spring Batch job configurations to read these files.
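One way to see why this line is so troublesome is to count its quote characters. In RFC4180 data every quoted field contributes an even number of " characters (opening, closing and doubled escapes), so a complete record has an even total. The stdlib-only sketch below, using a hypothetical `QuoteCounter` class, shows that the second line of simple3.csv comes out odd.

```java
import java.util.List;

// Illustrative only: counts '"' characters per line to show why simple3.csv is ambiguous.
final class QuoteCounter {

    // Number of occurrences of the quote character in a single physical line.
    static int countQuotes(String line, char quote) {
        int count = 0;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == quote) {
                count++;
            }
        }
        return count;
    }

    // In RFC4180 data every quoted field contributes an even number of quotes,
    // so an odd total means the line is either unfinished or malformed.
    static boolean isBalanced(String line, char quote) {
        return countQuotes(line, quote) % 2 == 0;
    }

    public static void main(String[] args) {
        List<String> simple3 = List.of(
            "\"John\", \"Doe\", \"where ever he lives\"",
            "\"John\", \"Doe\", \"where ever she \"\"lives around\" the corner\"");
        for (String line : simple3) {
            System.out.println(countQuotes(line, '"') + " quotes, balanced=" + isBalanced(line, '"'));
        }
    }
}
```

The first line has 6 quotes, the second has 9. Note that an odd count alone does not prove an error: the second line of simple2.csv has 5 quotes simply because its value continues on the next line. Telling those two cases apart is what makes the third sample hard.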

The Simple FlatFileItemReader

The job definition is very simple. You can pull it straight from any Spring Batch example or book. This configuration works with simple1.csv.

<batch:job id="csv-demo">
  <batch:step id="csv-step">
    <batch:tasklet>
      <batch:chunk reader="csvFileDefaultItemReader" writer="logItemWriter" commit-interval="10"/>
    </batch:tasklet>
  </batch:step>
</batch:job>
<bean id="csvFileDefaultItemReader" class="org.springframework.batch.item.file.FlatFileItemReader">
  <!-- Read a csv file -->
  <property name="resource" value="classpath:csv/simple1.csv" />
  <property name="lineMapper">
    <bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
      <property name="lineTokenizer">
        <bean class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
          <property name="delimiter" value=","/>
          <property name="quoteCharacter" value="&quot;"/>
        </bean>
      </property>
      <property name="fieldSetMapper">
        <bean class="org.springframework.batch.item.file.mapping.PassThroughFieldSetMapper" />
      </property>
    </bean>
  </property>
</bean>

The application parses everything correctly and displays the data.
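Conceptually, a quote-aware tokenizer such as DelimitedLineTokenizer splits each line on the delimiter, ignores delimiters inside quoted values and removes the enclosing quotes. Here is a stdlib-only sketch of that idea applied to a line of simple1.csv. `SimpleQuotedSplitter` is a hypothetical name; it is not Spring Batch's implementation and deliberately skips details such as "" escapes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified tokenizer: NOT Spring Batch's implementation.
// Splits on the delimiter unless inside quotes, then trims and strips enclosing quotes.
final class SimpleQuotedSplitter {

    static List<String> split(String line, char delimiter, char quote) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (char c : line.toCharArray()) {
            if (c == quote) {
                inQuotes = !inQuotes; // toggle quoted state; the quote is stripped later
                current.append(c);
            } else if (c == delimiter && !inQuotes) {
                fields.add(clean(current.toString(), quote));
                current.setLength(0);
            } else {
                current.append(c); // ordinary character, or a delimiter inside quotes
            }
        }
        fields.add(clean(current.toString(), quote)); // last field has no trailing delimiter
        return fields;
    }

    // Trim surrounding whitespace (e.g. the space after each comma) and strip one pair of quotes.
    private static String clean(String raw, char quote) {
        String value = raw.trim();
        if (value.length() >= 2 && value.charAt(0) == quote && value.charAt(value.length() - 1) == quote) {
            return value.substring(1, value.length() - 1);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(split("\"John\", \"Doe\", \"where ever he lives\"", ',', '"'));
    }
}
```

Running it prints `[John, Doe, where ever he lives]`. A comma inside a quoted value, as in `"a, b", c`, stays part of the first field.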

FlatFileItemReader with a Record Separator Policy

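Now we run the same Spring Batch configuration against the second sample, simple2.csv. We run into a nice ArrayIndexOutOfBoundsException when parsing the third line, which causes the job to fail. However, the fix is simple: we only need to add a record separator policy to the reader definition. The DefaultRecordSeparatorPolicy treats a line that leaves a quote open as an incomplete record and keeps appending lines until the quotes are balanced.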

<bean id="csvFileSeparatorItemReader" class="org.springframework.batch.item.file.FlatFileItemReader">
  <!-- Read a csv file -->
  <property name="resource" value="classpath:csv/simple2.csv" />
  <property name="recordSeparatorPolicy">
    <bean class="org.springframework.batch.item.file.separator.DefaultRecordSeparatorPolicy"/>
  </property>
  <property name="lineMapper">
    <bean class="org.springframework.batch.item.file.mapping.DefaultLineMapper">
      <property name="lineTokenizer">
        <bean class="org.springframework.batch.item.file.transform.DelimitedLineTokenizer">
          <property name="delimiter" value=","/>
          <property name="quoteCharacter" value="&quot;"/>
        </bean>
      </property>
      <property name="fieldSetMapper">
        <bean class="org.springframework.batch.item.file.mapping.PassThroughFieldSetMapper" />
      </property>
    </bean>
  </property>
</bean>
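To make the idea behind the record separator policy concrete, here is a stdlib-only sketch that joins physical lines until the quote count is even, restoring the line break inside the value. `QuoteAwareRecordJoiner` is a hypothetical class, not part of Spring Batch, and it models only the quote-balancing part of the policy.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the quote-balancing idea behind a record separator policy
// (not Spring Batch's code): physical lines are joined until the quote count is even.
final class QuoteAwareRecordJoiner {

    static boolean isQuoteUnterminated(CharSequence text, char quote) {
        long quotes = text.chars().filter(ch -> ch == quote).count();
        return quotes % 2 != 0;
    }

    static List<String> joinRecords(List<String> lines, char quote) {
        List<String> records = new ArrayList<>();
        StringBuilder pending = null; // record still waiting for its closing quote
        for (String line : lines) {
            if (pending == null) {
                pending = new StringBuilder(line);
            } else {
                pending.append('\n').append(line); // keep the line break inside the quoted value
            }
            if (!isQuoteUnterminated(pending, quote)) {
                records.add(pending.toString());
                pending = null;
            }
        }
        if (pending != null) {
            throw new IllegalStateException("Input ended inside a quoted value: " + pending);
        }
        return records;
    }

    public static void main(String[] args) {
        List<String> simple2 = List.of(
            "\"John\", \"Doe\", \"where ever he lives\"",
            "\"John\", \"Doe\", \"where ever she lives",
            "around the corner\"");
        List<String> records = joinRecords(simple2, '"');
        System.out.println(records.size() + " records");
        records.forEach(System.out::println);
    }
}
```

For the three lines of simple2.csv it returns two records, the second one containing the embedded line break. Fed the second line of simple3.csv, the sketch throws, because the stray quote leaves the count odd no matter how many lines are appended.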

CSVItemReader Using a Robust Parser

In our third example we try to process an invalid CSV file. You are out of luck with the default Spring Batch configurations, which all rely on the CSV file being valid. After digging around for a while, I found a CSV parser provided by uniVocity (https://github.com/uniVocity/univocity-parsers). The parser is easy to use. The important setting is parseUnescapedQuotes, which needs to be set to true for our example to work. Once you have the parser, it just comes down to implementing an item reader around it, which can be found in the GitHub repository. The Spring Batch configuration for our third example is:

<batch:job id="csv-demo2">
  <batch:step id="csv-step2">
    <batch:tasklet>
      <batch:chunk reader="robustCsvFileItemReader" writer="logItemWriter" commit-interval="10">
      </batch:chunk>
    </batch:tasklet>
  </batch:step>
</batch:job>
<bean id="robustCsvFileItemReader" class="demo.RobustCsvFileItemReader">
  <property name="resource" value="classpath:csv/simple3.csv"/>
  <property name="fieldSetMapper">
    <bean class="org.springframework.batch.item.file.mapping.PassThroughFieldSetMapper" />
  </property>
</bean>
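The reader relies on uniVocity to cope with the stray quote. To illustrate what parsing unescaped quotes can mean, here is a hypothetical stdlib-only rule, not uniVocity's algorithm: inside a quoted value, "" is an escaped quote, and a single " only closes the value if the next non-space character is the delimiter or the end of the record. Any other quote is kept as a literal character. `LenientCsvLine` is an illustrative name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical lenient rule for unescaped quotes (illustration only, not uniVocity's algorithm):
// inside a quoted value, "" is an escaped quote, and a lone " closes the value only when the
// next non-space character is the delimiter or the end of the record.
final class LenientCsvLine {

    static List<String> parse(String record, char delimiter, char quote) {
        List<String> fields = new ArrayList<>();
        int n = record.length();
        int i = 0;
        while (true) {
            i = skipSpaces(record, i); // the samples put a space after each comma
            StringBuilder value = new StringBuilder();
            if (i < n && record.charAt(i) == quote) {
                i++; // consume the opening quote
                while (i < n) {
                    char c = record.charAt(i);
                    if (c == quote && i + 1 < n && record.charAt(i + 1) == quote) {
                        value.append(quote); // "" is an escaped quote
                        i += 2;
                    } else if (c == quote && closesField(record, i + 1, delimiter)) {
                        i++; // closing quote: only spaces follow before the delimiter or end
                        break;
                    } else {
                        value.append(c); // ordinary character, or an unescaped quote kept as-is
                        i++;
                    }
                }
                i = skipSpaces(record, i);
            } else {
                while (i < n && record.charAt(i) != delimiter) {
                    value.append(record.charAt(i++)); // unquoted field runs to the delimiter
                }
            }
            fields.add(value.toString());
            if (i >= n) {
                return fields;
            }
            i++; // consume the delimiter
        }
    }

    private static int skipSpaces(String s, int from) {
        int j = from;
        while (j < s.length() && s.charAt(j) == ' ') {
            j++;
        }
        return j;
    }

    // True if only spaces remain before the next delimiter or the end of the record.
    private static boolean closesField(String record, int from, char delimiter) {
        int j = skipSpaces(record, from);
        return j == record.length() || record.charAt(j) == delimiter;
    }

    public static void main(String[] args) {
        System.out.println(parse("\"John\", \"Doe\", \"where ever she \"\"lives around\" the corner\"", ',', '"'));
    }
}
```

For the second line of simple3.csv this yields `John`, `Doe` and `where ever she "lives around" the corner`. uniVocity's actual output for this line may differ in details, so treat the rule as an illustration of the idea rather than a specification of parseUnescapedQuotes.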

Conclusion

Spring Batch gives you all the tools to parse CSV files. There are a number of configurations available that will solve most of your issues. Once in a while, though, you will run into CSV data that is simply not valid. In these cases you have to find a suitable CSV parser, as shown in this blog post.
All the example code for this blog can be found at https://github.com/thbehlau/blogs-csv-demo