Added in API level 1

BreakIterator

public abstract class BreakIterator
extends Object implements Cloneable

java.lang.Object
   ↳ java.text.BreakIterator


Locates boundaries in text. This class defines a protocol for objects that break up a piece of natural-language text according to a set of criteria. Instances or subclasses of BreakIterator can be provided, for example, to break a piece of text into words, sentences, or logical characters according to the conventions of some language or group of languages. We provide four built-in types of BreakIterator, one each for sentence, word, line, and character boundaries, obtained from getSentenceInstance(), getWordInstance(), getLineInstance(), and getCharacterInstance() respectively.
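As an illustration of the built-in iterators, the sketch below lists the boundary offsets that each factory method reports for the same string. The class name BuiltInIterators, the helper boundaries(), and the sample text are assumptions made for this example; only the BreakIterator calls come from this class.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, not part of the BreakIterator API.
final class BuiltInIterators {
    /** Returns every boundary offset the given iterator reports for the text. */
    static List<Integer> boundaries(BreakIterator iterator, String text) {
        iterator.setText(text);
        List<Integer> offsets = new ArrayList<>();
        // first() returns the initial boundary; next() returns DONE when none remain.
        for (int pos = iterator.first(); pos != BreakIterator.DONE; pos = iterator.next()) {
            offsets.add(pos);
        }
        return offsets;
    }

    public static void main(String[] args) {
        String text = "Hi there. Bye now.";
        Locale locale = Locale.US; // explicit locale, chosen only for the example
        System.out.println("character: " + boundaries(BreakIterator.getCharacterInstance(locale), text));
        System.out.println("word:      " + boundaries(BreakIterator.getWordInstance(locale), text));
        System.out.println("line:      " + boundaries(BreakIterator.getLineInstance(locale), text));
        System.out.println("sentence:  " + boundaries(BreakIterator.getSentenceInstance(locale), text));
    }
}
```

The exact offsets depend on the rules of the platform's implementation, so treat the printed lists as something to inspect rather than as a fixed contract.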

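BreakIterator's interface follows an "iterator" model (hence the name), meaning it has a concept of a "current position" and methods like first(), last(), next(), and previous() that update the current position. All BreakIterators uphold the following invariants: the beginning and end of the text are always treated as boundaries; the current position is always a boundary; and DONE is returned in place of a position when an iteration method moves past the first or last boundary.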

BreakIterator accesses the text it analyzes through a CharacterIterator, which makes it possible to use BreakIterator to analyze text in any text-storage vehicle that provides a CharacterIterator interface.
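A minimal sketch of analyzing text through a CharacterIterator rather than a String follows. It uses java.text.StringCharacterIterator as the storage, and reads characters back through a clone so that the iterator handed to the BreakIterator is not disturbed. The class name CharacterIteratorSource and the method firstSegment() are hypothetical names for this example.

```java
import java.text.BreakIterator;
import java.text.CharacterIterator;
import java.text.StringCharacterIterator;
import java.util.Locale;

// Hypothetical helper class, not part of the BreakIterator API.
final class CharacterIteratorSource {
    /** Returns the text between the first two word boundaries, or "" if there is none. */
    static String firstSegment(CharacterIterator chars) {
        BreakIterator words = BreakIterator.getWordInstance(Locale.US);
        words.setText(chars);
        int start = words.first();
        int end = words.next();
        if (end == BreakIterator.DONE) {
            return ""; // empty text: only one boundary exists
        }
        // Read from a clone so the BreakIterator's own iterator state is left alone.
        CharacterIterator reader = (CharacterIterator) chars.clone();
        StringBuilder segment = new StringBuilder();
        for (char c = reader.setIndex(start); reader.getIndex() < end; c = reader.next()) {
            segment.append(c);
        }
        return segment.toString();
    }

    public static void main(String[] args) {
        System.out.println(firstSegment(new StringCharacterIterator("Hello world")));
    }
}
```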

Note: Some types of BreakIterator can take a long time to create, and instances of BreakIterator are not currently cached by the system. For optimal performance, keep instances of BreakIterator around as long as it makes sense. For example, when word-wrapping a document, don't create and destroy a new BreakIterator for each line. Create one break iterator for the whole document (or whatever stretch of text you're wrapping) and use it to do the whole job of wrapping the text.
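The advice above can be sketched as a simple greedy word-wrap that creates one line-break iterator and uses it for the whole text. The class SimpleWrapper, the method wrap(), and the width rule (counted in chars, including a trailing space) are simplifying assumptions for this example, not a layout algorithm from the platform.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, not part of the BreakIterator API.
final class SimpleWrapper {
    /** Greedily wraps text at line-break opportunities so lines stay within width chars where possible. */
    static List<String> wrap(String text, int width, Locale locale) {
        // One iterator for the whole job, instead of one per output line.
        BreakIterator breaks = BreakIterator.getLineInstance(locale);
        breaks.setText(text);

        List<String> lines = new ArrayList<>();
        int lineStart = breaks.first();
        int lastBreak = lineStart;
        for (int brk = breaks.next(); brk != BreakIterator.DONE; brk = breaks.next()) {
            // If extending to this break overflows, end the line at the previous break.
            if (brk - lineStart > width && lastBreak > lineStart) {
                lines.add(text.substring(lineStart, lastBreak).trim());
                lineStart = lastBreak;
            }
            lastBreak = brk;
        }
        if (lineStart < text.length()) {
            lines.add(text.substring(lineStart).trim());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : wrap("aaa bbb ccc ddd", 8, Locale.US)) {
            System.out.println(line);
        }
    }
}
```

A segment longer than the width is left on its own overlong line, since this sketch never breaks inside a segment.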

Examples:

Creating and using text boundaries:

 public static void main(String[] args) {
     if (args.length == 1) {
         String stringToExamine = args[0];
         // Print each word in order.
         BreakIterator boundary = BreakIterator.getWordInstance();
         boundary.setText(stringToExamine);
         printEachForward(boundary, stringToExamine);
         // Print each sentence in reverse order.
         boundary = BreakIterator.getSentenceInstance(Locale.US);
         boundary.setText(stringToExamine);
         printEachBackward(boundary, stringToExamine);
         printFirst(boundary, stringToExamine);
         printLast(boundary, stringToExamine);
     }
 }
 

Print each element in order:

 public static void printEachForward(BreakIterator boundary, String source) {
     int start = boundary.first();
     for (int end = boundary.next(); end != BreakIterator.DONE; start = end, end = boundary.next()) {
         System.out.println(source.substring(start, end));
     }
 }
 

Print each element in reverse order:

 public static void printEachBackward(BreakIterator boundary, String source) {
     int end = boundary.last();
     for (int start = boundary.previous(); start != BreakIterator.DONE; end = start, start = boundary.previous()) {
         System.out.println(source.substring(start, end));
     }
 }
 

Print the first element:

 public static void printFirst(BreakIterator boundary, String source) {
     int start = boundary.first();
     int end = boundary.next();
     // next() returns DONE when the text is empty.
     if (end != BreakIterator.DONE) {
         System.out.println(source.substring(start, end));
     }
 }
 

Print the last element:

 public static void printLast(BreakIterator boundary, String source) {
     int end = boundary.last();
     int start = boundary.previous();
     // previous() returns DONE when the text is empty.
     if (start != BreakIterator.DONE) {
         System.out.println(source.substring(start, end));
     }
 }
 

Print the element at a specified position:

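 public static void printAt(BreakIterator boundary, int pos, String source) {
     int end = boundary.following(pos);
     // following() returns DONE when no boundary lies after pos.
     if (end == BreakIterator.DONE) {
         return;
     }
     int start = boundary.previous();
     System.out.println(source.substring(start, end));
 }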
 

Find the next word:

 public static int nextWordStartAfter(int pos, String text) {
     BreakIterator wb = BreakIterator.getWordInstance();
     wb.setText(text);
     int last = wb.following(pos);
     int current = wb.next();
     while (current != BreakIterator.DONE) {
         for (int p = last; p < current; p++) {
             if (Character.isLetter(text.charAt(p)))
                 return last;
         }
         last = current;
         current = wb.next();
     }
     return BreakIterator.DONE;
 }
 

The iterator returned by BreakIterator.getWordInstance() is unique in that the break positions it returns don't represent both the start and end of the thing being iterated over. By contrast, a sentence-break iterator returns breaks that each represent the end of one sentence and the beginning of the next. With the word-break iterator, the characters between two boundaries might be a word, or they might be the punctuation or whitespace between two words. The above code uses a simple heuristic to determine which boundary is the beginning of a word: if the characters between this boundary and the next boundary include at least one letter (this can be an alphabetical letter, a CJK ideograph, a Hangul syllable, a Kana character, etc.), then the text between this boundary and the next is a word; otherwise, it's the material between words.
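The same heuristic can be applied to collect the words of a string rather than to find the next word start. This is a sketch under that heuristic only: the class WordExtractor and the method words() are hypothetical names, and a segment with no letter (for example a number) is deliberately treated as material between words.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, not part of the BreakIterator API.
final class WordExtractor {
    /** Returns the segments between word boundaries that contain at least one letter. */
    static List<String> words(String text, Locale locale) {
        BreakIterator boundary = BreakIterator.getWordInstance(locale);
        boundary.setText(text);
        List<String> result = new ArrayList<>();
        int start = boundary.first();
        for (int end = boundary.next(); end != BreakIterator.DONE; start = end, end = boundary.next()) {
            String segment = text.substring(start, end);
            // Heuristic from the text above: a segment with a letter is a word.
            if (segment.codePoints().anyMatch(Character::isLetter)) {
                result.add(segment);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(words("Hello, world! 42", Locale.US));
    }
}
```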

Summary

Constants

int DONE

This constant is returned by iteration methods such as previous() or next() once they have returned all valid boundaries.

Protected constructors

BreakIterator()

Default constructor, for use by subclasses.

Public methods

Object clone()

Returns a copy of this iterator.

abstract int current()

Returns this iterator's current position.

abstract int first()

Sets this iterator's current position to the first boundary and returns that position.

abstract int following(int offset)

Sets this iterator's current position to the first boundary following the given offset and returns that position.

static Locale[] getAvailableLocales()

Returns an array of locales for which custom BreakIterator instances are available.

static BreakIterator getCharacterInstance(Locale locale)

Returns a new instance of BreakIterator to iterate over characters using the given locale.

static BreakIterator getCharacterInstance()

Returns a new instance of BreakIterator to iterate over characters using the user's default locale.

static BreakIterator getLineInstance(Locale locale)

Returns a new instance of BreakIterator to iterate over line breaks using the given locale.

static BreakIterator getLineInstance()

Returns a new instance of BreakIterator to iterate over line breaks using the user's default locale.

static BreakIterator getSentenceInstance(Locale locale)

Returns a new instance of BreakIterator to iterate over sentence-breaks using the given locale.

static BreakIterator getSentenceInstance()

Returns a new instance of BreakIterator to iterate over sentence-breaks using the default locale.

abstract CharacterIterator getText()

Returns a CharacterIterator which represents the text being analyzed.

static BreakIterator getWordInstance()

Returns a new instance of BreakIterator to iterate over word-breaks using the default locale.

static BreakIterator getWordInstance(Locale locale)

Returns a new instance of BreakIterator to iterate over word-breaks using the given locale.

boolean isBoundary(int offset)

Indicates whether the given offset is a boundary position.

abstract int last()

Sets this iterator's current position to the last boundary and returns that position.

abstract int next(int n)

Moves this iterator's current position n boundaries from the current position (backward if n is negative) and returns the new position.

abstract int next()

Sets this iterator's current position to the next boundary after the current position, and returns this position.

int preceding(int offset)

Sets this iterator's current position to the last boundary preceding the given offset and returns that position, or DONE if the given offset is the starting position.

abstract int previous()

Sets this iterator's current position to the previous boundary before the current position and returns that position.

void setText(String newText)

Sets the new text string to be analyzed. The current position is reset to the beginning of the new string, and the old string is discarded.

abstract void setText(CharacterIterator newText)

Sets the new text to be analyzed by the given CharacterIterator.

Inherited methods

From class java.lang.Object

Constants

DONE

Added in API level 1
int DONE

This constant is returned by iteration methods such as previous() or next() once they have returned all valid boundaries.

Constant Value: -1 (0xffffffff)

Protected constructors

BreakIterator

Added in API level 1
BreakIterator ()

Default constructor, for use by subclasses.

Public methods

clone

Added in API level 1
Object clone ()

Returns a copy of this iterator.

Returns
Object a copy of this object.
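A short sketch of what a copy gives you: on a desktop JDK the clone keeps its own position, so advancing it leaves the original where it was. The class CloneDemo and its method are hypothetical, and independence of the copy is an observation about the built-in implementation rather than something this page states.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical helper class, not part of the BreakIterator API.
final class CloneDemo {
    /** Returns {original position, copy position} after advancing only the copy. */
    static int[] positionsAfterAdvancingCopy(String text) {
        BreakIterator original = BreakIterator.getWordInstance(Locale.US);
        original.setText(text);
        original.first();

        // clone() is declared to return Object, so a cast is needed.
        BreakIterator copy = (BreakIterator) original.clone();
        copy.next();

        return new int[] { original.current(), copy.current() };
    }

    public static void main(String[] args) {
        int[] positions = positionsAfterAdvancingCopy("Hello world");
        System.out.println("original=" + positions[0] + ", copy=" + positions[1]);
    }
}
```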

current

Added in API level 1
int current ()

Returns this iterator's current position.

Returns
int this iterator's current position.

first

Added in API level 1
int first ()

Sets this iterator's current position to the first boundary and returns that position.

Returns
int the position of the first boundary.

following

Added in API level 1
int following (int offset)

Sets this iterator's current position to the first boundary following the given offset and returns that position. Returns DONE if there is no boundary after the given offset.

Parameters
offset int: the given position to be searched for.
Returns
int the position of the first boundary following the given offset.
Throws
IllegalArgumentException if the offset is invalid.
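Because following(int) also moves the current position, iteration can continue from its result with next(). The sketch below collects every word boundary after a given offset; the class FollowingDemo and the method boundariesAfter() are hypothetical names for this example.

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper class, not part of the BreakIterator API.
final class FollowingDemo {
    /** Returns all word boundaries located strictly after the given offset. */
    static List<Integer> boundariesAfter(String text, int offset) {
        BreakIterator words = BreakIterator.getWordInstance(Locale.US);
        words.setText(text);
        List<Integer> result = new ArrayList<>();
        // following() positions the iterator, so next() resumes from that boundary.
        for (int pos = words.following(offset); pos != BreakIterator.DONE; pos = words.next()) {
            result.add(pos);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(boundariesAfter("Hello world", 5));
    }
}
```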

getAvailableLocales

Added in API level 1
Locale[] getAvailableLocales ()

Returns an array of locales for which custom BreakIterator instances are available.

Note that Android does not support user-supplied locale service providers.

Returns
Locale[]

getCharacterInstance

Added in API level 1
BreakIterator getCharacterInstance (Locale locale)

Returns a new instance of BreakIterator to iterate over characters using the given locale.

Parameters
locale Locale
Returns
BreakIterator

getCharacterInstance

Added in API level 1
BreakIterator getCharacterInstance ()

Returns a new instance of BreakIterator to iterate over characters using the user's default locale. See "Be wary of the default locale".

Returns
BreakIterator a new instance of BreakIterator using the default locale.

getLineInstance

Added in API level 1
BreakIterator getLineInstance (Locale locale)

Returns a new instance of BreakIterator to iterate over line breaks using the given locale.

Parameters
locale Locale
Returns
BreakIterator

getLineInstance

Added in API level 1
BreakIterator getLineInstance ()

Returns a new instance of BreakIterator to iterate over line breaks using the user's default locale. See "Be wary of the default locale".

Returns
BreakIterator a new instance of BreakIterator using the default locale.

getSentenceInstance

Added in API level 1
BreakIterator getSentenceInstance (Locale locale)

Returns a new instance of BreakIterator to iterate over sentence-breaks using the given locale.

Parameters
locale Locale
Returns
BreakIterator

getSentenceInstance

Added in API level 1
BreakIterator getSentenceInstance ()

Returns a new instance of BreakIterator to iterate over sentence-breaks using the default locale. See "Be wary of the default locale".

Returns
BreakIterator a new instance of BreakIterator using the default locale.

getText

Added in API level 1
CharacterIterator getText ()

Returns a CharacterIterator which represents the text being analyzed. Please note that the returned value is probably the internal iterator used by this object. If the invoker wants to modify the status of the returned iterator, it is recommended to first create a clone of the iterator returned.

Returns
CharacterIterator a CharacterIterator which represents the text being analyzed.

getWordInstance

Added in API level 1
BreakIterator getWordInstance ()

Returns a new instance of BreakIterator to iterate over word-breaks using the default locale. See "Be wary of the default locale".

Returns
BreakIterator a new instance of BreakIterator using the default locale.

getWordInstance

Added in API level 1
BreakIterator getWordInstance (Locale locale)

Returns a new instance of BreakIterator to iterate over word-breaks using the given locale.

Parameters
locale Locale
Returns
BreakIterator

isBoundary

Added in API level 1
boolean isBoundary (int offset)

Indicates whether the given offset is a boundary position. If this method returns true, the current iteration position is set to the given position; if the function returns false, the current iteration position is set as though following(int) had been called.

Parameters
offset int: the given offset to check.
Returns
boolean true if the given offset is a boundary position; false otherwise.
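The side effect described above can be used to snap an arbitrary offset forward to a boundary, for example when adjusting a caret. This is a sketch that relies on that documented behavior; the class BoundarySnapper and the method snapForward() are hypothetical names.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical helper class, not part of the BreakIterator API.
final class BoundarySnapper {
    /** Returns offset if it is a word boundary, otherwise the next boundary after it. */
    static int snapForward(String text, int offset) {
        BreakIterator words = BreakIterator.getWordInstance(Locale.US);
        words.setText(text);
        // When isBoundary() returns false, the current position has been moved
        // as though following(offset) had been called.
        return words.isBoundary(offset) ? offset : words.current();
    }

    public static void main(String[] args) {
        System.out.println(snapForward("Hello world", 2));
    }
}
```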

last

Added in API level 1
int last ()

Sets this iterator's current position to the last boundary and returns that position.

Returns
int the position of the last boundary.

next

Added in API level 1
int next (int n)

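Moves this iterator's current position n boundaries from the current position and returns the new position. Positive values of n move toward the end of the text, negative values move toward the beginning, and 0 leaves the position unchanged. Returns DONE if there are fewer than n boundaries in that direction.

Parameters
n int: the number of boundaries to move; may be negative.
Returns
int the position of the boundary reached, or DONE.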
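As a sketch of how the argument behaves on a desktop JDK, where n acts as a count of boundaries to move rather than as a text offset, the example below starts at the first boundary and moves n boundaries from there. The class NextByCount and the method fromStart() are hypothetical names.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical helper class, not part of the BreakIterator API.
final class NextByCount {
    /** Moves n word boundaries from the first boundary and returns the resulting offset. */
    static int fromStart(String text, int n) {
        BreakIterator words = BreakIterator.getWordInstance(Locale.US);
        words.setText(text);
        words.first();
        // Returns DONE when fewer than n boundaries exist in the requested direction.
        return words.next(n);
    }

    public static void main(String[] args) {
        String text = "one two three"; // word boundaries fall at 0, 3, 4, 7, 8, 13
        System.out.println(fromStart(text, 2));
        System.out.println(fromStart(text, 10) == BreakIterator.DONE);
    }
}
```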

next

Added in API level 1
int next ()

Sets this iterator's current position to the next boundary after the current position, and returns this position. Returns DONE if no boundary was found after the current position.

Returns
int the position of the next boundary, or DONE.

preceding

Added in API level 1
int preceding (int offset)

Sets this iterator's current position to the last boundary preceding the given offset and returns that position. Returns DONE if the given offset is the starting position.

Parameters
offset int: the given start position to be searched for.
Returns
int the position of the last boundary preceding the given offset.
Throws
IllegalArgumentException if the offset is invalid.
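One place preceding(int) fits is a "delete back to the previous boundary" edit action. The sketch below removes the text between the caret and the word boundary before it; the class BackwardDelete and the method deleteBackToBoundary() are hypothetical names, and the caret is assumed to be a valid offset into the text.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical helper class, not part of the BreakIterator API.
final class BackwardDelete {
    /** Removes the characters between the caret and the word boundary that precedes it. */
    static String deleteBackToBoundary(String text, int caret) {
        BreakIterator words = BreakIterator.getWordInstance(Locale.US);
        words.setText(text);
        int start = words.preceding(caret);
        if (start == BreakIterator.DONE) {
            return text; // caret is at the start: nothing to delete
        }
        return text.substring(0, start) + text.substring(caret);
    }

    public static void main(String[] args) {
        System.out.println(deleteBackToBoundary("Hello world", 11));
    }
}
```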

previous

Added in API level 1
int previous ()

Sets this iterator's current position to the previous boundary before the current position and returns that position. Returns DONE if no boundary was found before the current position.

Returns
int the position of the previous boundary, or DONE.

setText

Added in API level 1
void setText (String newText)

Sets the new text string to be analyzed. The current position is reset to the beginning of the new string, and the old string is discarded.

Parameters
newText String: the new text string to be analyzed.

setText

Added in API level 1
void setText (CharacterIterator newText)

Sets the new text to be analyzed by the given CharacterIterator. The position will be reset to the beginning of the new text, and other status information of this iterator will be kept.

Parameters
newText CharacterIterator: the CharacterIterator referring to the text to be analyzed.