Java Performance: Splitting Strings

By Max
February 13, 2014

I was working recently on a CSV parser needed for one of my personal projects.
The idea was simple: you have one (or more) files with over 1M lines of CSV data.
I only needed the last 3 fields and wanted to keep it simple, so I implemented a method that received a string and invoked its .split method.
Very nice and simple, but the performance was very bad: with all the processing, it took more than 30s to parse one file.

The format was something like this: "2967487985,D,EUR/JPY,2013-09-01 17:00:42.613000000,130.060000,130.085000"

	public void usingSplit() {
		String[] parts = test.split(SEPARATOR);
		// do processing
	}

My first test was on Java 6, so I gave Java 7 a chance. The splitting became about 3.7 times faster (7444ms down to 1992ms for 1M iterations), but I was not satisfied.
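
Part of the gap is probably that String.split takes a regular expression. As far as I know, Java 7 added a fast path for single-character separators that are not regex metacharacters, while older versions go through the regex engine on every call. Below is a minimal sketch of one common workaround: compile the pattern once and reuse it. This variant was not part of my measurements, and the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

class PrecompiledSplit {
	// Hypothetical names; "," matches the separator of the sample line.
	private static final Pattern COMMA = Pattern.compile(",");

	static String[] split(String line) {
		// Reuses the compiled pattern instead of compiling the regex on every call.
		return COMMA.split(line);
	}

	public static void main(String[] args) {
		String line = "2967487985,D,EUR/JPY,2013-09-01 17:00:42.613000000,130.060000,130.085000";
		String[] parts = split(line);
		// prints "6 fields, last = 130.085000"
		System.out.println(parts.length + " fields, last = " + parts[parts.length - 1]);
	}
}
```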

The next step was to implement my own splitting using indexOf and a list.

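	public void usingIndexOf() {
		List<String> parts = new ArrayList<String>(6);
		int lastIndex = 0;
		int newIndex;
		while ((newIndex = test.indexOf(SEPARATOR, lastIndex)) > -1) {
			parts.add(test.substring(lastIndex, newIndex));
			// assumes a single-character separator
			lastIndex = newIndex + 1;
		}
		// the last field has no trailing separator
		parts.add(test.substring(lastIndex));
		// do processing
	}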

The time dropped again, to roughly half of the split time on Java 7 (1992ms down to 1045ms). Nice!
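
Here is a self-contained sketch of the indexOf approach that can be run against the sample line. The class and method names are assumptions, and the separator is passed in as a char. It also adds the text after the last separator as a final field, which is needed to get the ask price.

```java
import java.util.ArrayList;
import java.util.List;

class IndexOfSplit {
	// Splits on a single character; the field after the last separator is kept.
	static List<String> split(String line, char separator) {
		List<String> parts = new ArrayList<String>(6);
		int lastIndex = 0;
		int newIndex;
		while ((newIndex = line.indexOf(separator, lastIndex)) > -1) {
			parts.add(line.substring(lastIndex, newIndex));
			lastIndex = newIndex + 1;
		}
		parts.add(line.substring(lastIndex));
		return parts;
	}

	public static void main(String[] args) {
		String line = "2967487985,D,EUR/JPY,2013-09-01 17:00:42.613000000,130.060000,130.085000";
		List<String> parts = split(line, ',');
		// the last three fields: timestamp, bid, ask
		System.out.println(parts.subList(parts.size() - 3, parts.size()));
	}
}
```

Note that, unlike String.split with its default limit, this sketch keeps trailing empty fields: "a,b," yields three entries, the last one empty.
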
Can we get a little more from this? Knowing exactly the needed data and the format of the file, I chose to get only the required fields:

	// length of the bid field, measured once and reused for every line
	private int amountLength = 0;

	public void usingNeeded() {
		// the separator after the timestamp sits at index 40 or 50
		int endStamp = (test.charAt(40)) == CHAR ? 40 : 50;
		// we don't need the first 21 chars
		String stamp = test.substring(21, endStamp++);
		if (amountLength == 0) {
			amountLength = test.indexOf(CHAR, endStamp) - endStamp;
		}
		String bid = test.substring(endStamp, endStamp + amountLength);
		String ask = test.substring(endStamp + amountLength + 1);
		// do processing
	}

The time improved again, almost 4 times on Java 7 (1045ms down to 275ms), so in total the parsing got about 27 times faster than where I started (7444ms with split on Java 6)!
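
To make the offsets concrete, here is a small runnable sketch of the same idea. It assumes, as in the sample line, that the timestamp always starts at index 21 and that the separator after it is at index 40 or 50. The class and method names are made up. Unlike the method above, it looks up the end of the bid on every line instead of caching its length, so it does not assume every bid has the same width.

```java
class FixedOffsetParse {
	private static final char SEP = ',';

	// Returns { timestamp, bid, ask } without splitting the whole line.
	static String[] lastThree(String line) {
		// Assumption: the separator after the timestamp is at index 40 or 50.
		int endStamp = line.charAt(40) == SEP ? 40 : 50;
		String stamp = line.substring(21, endStamp);
		int bidStart = endStamp + 1;
		int bidEnd = line.indexOf(SEP, bidStart);
		String bid = line.substring(bidStart, bidEnd);
		String ask = line.substring(bidEnd + 1);
		return new String[] { stamp, bid, ask };
	}

	public static void main(String[] args) {
		String line = "2967487985,D,EUR/JPY,2013-09-01 17:00:42.613000000,130.060000,130.085000";
		String[] fields = lastThree(line);
		System.out.println(fields[0] + " | " + fields[1] + " | " + fields[2]);
	}
}
```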

Here’s a little table of the results for 1M iterations:

Java version    usingSplit    usingIndexOf    usingNeeded
Java 6          7444ms        915ms           117ms
Java 7          1992ms        1045ms          275ms

It is a bit strange that the custom variants perform better on Java 6 than on Java 7, but I'll look into that another time.
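
Numbers like these also depend on how the timing is done: the JIT compiler needs some warm-up, and work whose result is never used can be optimized away. Below is a rough sketch of a timing loop with a warm-up pass and a checksum. It is not the harness used for the table above, the names are made up, and a dedicated tool such as JMH would give more reliable numbers.

```java
class SplitTiming {
	static final String LINE =
		"2967487985,D,EUR/JPY,2013-09-01 17:00:42.613000000,130.060000,130.085000";

	// Runs split() the given number of times and returns the elapsed milliseconds.
	static long timeSplit(int iterations) {
		long checksum = 0;
		long start = System.nanoTime();
		for (int i = 0; i < iterations; i++) {
			// Using the result keeps the call from being optimized away.
			checksum += LINE.split(",").length;
		}
		long elapsedMs = (System.nanoTime() - start) / 1000000L;
		if (checksum != 6L * iterations) {
			throw new IllegalStateException("unexpected field count");
		}
		return elapsedMs;
	}

	public static void main(String[] args) {
		timeSplit(100000); // warm-up pass, result discarded
		System.out.println("usingSplit: " + timeSplit(1000000) + "ms");
	}
}
```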
