Java: Count instances of a substring

6 01 2009

Despite my distaste for Java, I work in it every day at e-kiwi (author of software Screen-Scraper).

As usual, I was surprised to find that Java has no easy way to count the number of occurances of a substring within a given String variable.  I Google’d it and was surprised at the first result’s poor coding.  It had a ‘while (true)‘ loop with a goofy break statement tied to a misleadingly-named variable.

Try this:

public int countIndexOf(String text, String search)
{
    int count = 0;
    for (int fromIndex = 0; fromIndex > -1; count++)
        fromIndex = text.indexOf(search, fromIndex + ((count > 0) ? 1 : 0));
    return count - 1;
}

Pardon the ternary operator in there, but Java doesn’t like making boolean-casting-to-int easy.

If there are any errors you encounter, just let me know and I’ll fix it.  My use for such a function doesn’t require the ‘border’ cases where substrings are at the beginning or end of the main string.  If you know that you don’t need to match a substring which will begin at the first character of the main string, you can change the ternary part from:
fromIndex + ((count > 0) ? : 0)
to just
fromIndex + 1

Sorry to the dude that I flamed on his blog, but poorly written code is easy to find these days.  Idioms are the way to go.  Java’s slow enough as an interpretted language that it doesn’t need extraneous if-else statements in ‘while (true)’ loops and incessant variable assignment to immutable values.

You might argue that it doesn’t really matter if you lose a few cycles on today’s machines, but programs should be as efficient as possible when it’s worth the time.  You may as well take a few minutes (or even an hour) and figure out the best way to do something.  Forethought with save you an amazing amount of time in the future when you develope real programs and not just school assignments.

Advertisements

Actions

Information

7 responses

25 02 2009
Hungry

wow, awesome solution. I agreed. We can do however we want but when it is done effectively, it counts.

25 02 2009
tonightslastsong

Honestly, it could be done more effectively with the following, but whatever 😛

import java.util.regex.*;
Matcher m = Pattern.compile(search).matcher(text);
int count;
for (count = 0; m.find(); count++);

7 04 2009
xeriomsodar

Honestly if you are worried about performance, using an algorithm like KMP is a better optimization than the low level stuff 🙂

7 04 2009
tonightslastsong

Sure, but how does one just “throw it together” for quick tasks. And if the problem were number-based, where numbers appear in a string, then we can say goodbye to Knuth, Morris, *and* Pratt. The point is that there’s not an efficient way to do it in Java at all, leaving anybody without an education on the matter completely incapable of producing effective code. I thank Java for that one.

17 08 2009
Ethaniel

Here is my own implementation for this need, I’m glad to see that I’m not the only one to loop on indexOf():

int count = 0 ;
for ( int i = -1 ; ( i = text.indexOf( search, i + 1 ) ) != -1 ; count ++ ) {
}
return count ;

As fromIndex/i starts from -1, the ternary operator is useless, and as the update of fromIndex/i is done before the comparison to -1, there’s no need to decrement the counter at the end.

Note: yes, I know that the empty {} can be replaced by a semicolon, but I don’t like it…

17 08 2009
tonightslastsong

Very nice, actually. Clever use of for-statements is too rare. It’s still sad that there’s no built-in method for simple stuff like this. I’m curious to benchmark a few of these and see just how well they perform…

18 08 2009
Ethaniel

Concerning built-in method, what about substr_count()?
Oh, oops, it’s PHP! 😀

Actually, there is a little imprecision in the task description “count instances of a substring”: how do we manage overlapped substrings?
For instance, if text=”AZAZAZAZ” and search=”AZAZ”, do we expect a count of 2 (because text=search+search) or a count of 3 (because search is found in text at positions 0, 2 and 4)?
The difference is the increment to use when we found a substring.
Here is a more complete method:

public int countSubstrings( String text, String search, boolean countOverlap ) {
int inc = countOverlap ? 1 : search.length() ;
int count = 0 ;
for ( int i = -inc ; ( i = text.indexOf( search, i + inc ) ) != -1 ; count ++ ) {
}
return count ;
}

For an even more complete method, please refer to http://www.php.net/manual/en/function.substr-count.php about offset and length parameters.
Personally, even in PHP I never use them, but it doesn’t mean that nobody need them ;).

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s




%d bloggers like this: