Java Streams — a brief overview

Photo by Robert Anasch

In this article I want to introduce you to what is, in my opinion, one of the most significant additions to Java since its inception: the Stream API.

What is this? Why? And what are the benefits?

Very often when we write a program, we need to process data, and we typically reach for loops or recursive functions to do it.

The Stream API was created to help us process data faster and more easily: it gives us a tool for describing a recipe for how objects should be processed.

If we draw parallels to the real world, let’s imagine that we have a kind of factory for the production of custom-made furniture.

Trucks bring logs to the factory. In this factory we have people who we have trained to do something with the wood to make it into furniture: they look at each log for defects and filter bad ones, process the boards, assemble them with nails or glue and protect the finished product with varnish.

The last element in this chain is the customer, who comes to the factory and places an order.

Without a buyer, there is no point in starting the whole production.

In the Java world, such a plant is called the Stream API. This API is a library that helps us describe, briefly but expressively and in a functional style, how to process data.

As in the factory example, each stream must have a source of objects. This source is most often a collection, since that's where we store our data, but it doesn't have to be: there can also be a generator that produces objects according to a given rule. We'll look at examples later.

The Stream API also provides intermediate operations. They act as workers. The operations describe the recipe for processing objects.

At the end of each stream there must be a terminal operation to absorb all processed data.

In the factory example, we saw that the customer becomes the trigger to start production and is the last link in the factory’s operation. He picks up all the products.

Let’s consider a simple stream. Let’s create a log class and put some logs in the collection:

class Log {
    String type;
    int count;

    Log(String type, int count) {
        this.type = type;
        this.count = count;
    }
    // getters and setters
}

List<Log> logs = List.of(new Log("Siberian pine", 10),
        new Log("Mongolian Oak", 30),
        new Log("Giant Sequoia", 5));

Collections have a stream() method that will return a stream for a given dataset.

Stream<Log> stream = logs.stream();

Once we have a reference to the stream, we can start processing our data.

Let’s filter out the logs whose number is less than 7 and leave only the logs that are not oak. This will look like this:

Stream<Log> filteredStream = stream.filter(x -> x.getCount() > 7)
        .filter(x -> !"Mongolian Oak".equalsIgnoreCase(x.getType()));

We added filters and got a stream that describes the processing of all our logs. Now we have to add a terminal operation to it to run the data stream from the collection:

filteredStream.forEach(x -> System.out.println(x.getType()));

In this example, the terminal operation takes the remaining elements after filtering and prints them. It is worth mentioning that you cannot call the terminal operation a second time — the stream is a “one-time” object. This is done by the authors of the library so that it is possible to correctly process data that have a limited lifetime. For example, if you process packets from the Internet, the data may get to the stream only once, so a repeated call makes no sense.
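A minimal sketch of this "one-time" rule (the class and method names are mine): the second terminal operation on the same stream object throws an IllegalStateException.

```java
import java.util.stream.Stream;

public class StreamReuseDemo {
    // returns true if reusing a consumed stream throws, as the spec promises
    static boolean reuseThrows() {
        Stream<String> stream = Stream.of("a", "b");
        stream.count(); // first terminal operation consumes the stream
        try {
            stream.count(); // second terminal operation is illegal
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("Reuse throws: " + reuseThrows());
    }
}
```

If you need to run the same pipeline twice, build a fresh stream from the source each time.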

As mentioned earlier, it is possible to create a data source in different ways. Let’s consider the most popular ones.

Ways to create a data source

Let's start with the factory methods declared in Stream itself. The Stream.of() method accepts a varargs list of objects and creates a stream from them.

Example:

Stream<String> stringStream = Stream.of("asd", "aaa", "bbb");

There is a method for creating an empty stream:

Stream.empty()

The pattern builder is supported by the library, so if we get a Stream.builder() object, we can use it to build a new stream.
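A short sketch of the builder pattern in practice (class name is mine): add() returns the builder itself, so calls can be chained before build() produces the stream.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class BuilderDemo {
    static List<String> build() {
        // Stream.builder() collects elements one by one;
        // build() turns the accumulated elements into a Stream
        return Stream.<String>builder()
                .add("log")
                .add("board")
                .add("chair")
                .build()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```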

If we have two streams, we can combine them into one by calling the method:

Stream.concat()

Example:

Stream.concat(Stream.of("aaa", "bbb", "ccc"), Stream.of("111", "222", "333"))

As a result, we will get a stream with six elements in it.

A stream doesn't have to draw from existing data. Using the generate() method, you can create a generator that supplies elements to the stream:

Stream.generate(() -> Math.random()).forEach(System.out::println);

Since the generator produces values indefinitely, the example above will print random values forever. To bound it, we need to add the intermediate operation limit(100), which truncates the stream. We will get acquainted with these operations later.
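Here is that bounded version as a sketch (class name is mine): limit(100) turns the infinite generated stream into a finite one.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class GenerateDemo {
    static List<Double> hundredRandoms() {
        // generate() would run forever; limit(100) truncates the stream
        return Stream.generate(Math::random)
                .limit(100)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(hundredRandoms().size());
    }
}
```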

The Random class has the same functionality. It already has methods that create streams from random numbers.

new Random().ints()
new Random().doubles()
new Random().longs()

Sometimes a stream consists only of numbers, and using wrapper objects over primitive types can noticeably hurt performance because of boxing.

That's why the creators of the API added special stream types for primitives:

LongStream
DoubleStream
IntStream

These streams operate with only one type of data.

You can also get a stream from an array of primitives using the utility method Arrays.stream(). This overloaded method wraps our array and returns a stream over it. Note that the Arrays class also declares static <T> Stream<T> stream(T[] array), so you can get a stream from an array of objects, not just primitives.
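A small sketch of both overloads (class and method names are mine): the int[] overload yields an IntStream with no boxing, while the generic overload yields an ordinary Stream<T>.

```java
import java.util.Arrays;

public class ArraysStreamDemo {
    static int sum(int[] values) {
        // Arrays.stream(int[]) returns an IntStream, so sum() works directly
        return Arrays.stream(values).sum();
    }

    static long countLongNames(String[] names) {
        // Arrays.stream(T[]) returns a Stream<T> over the object array
        return Arrays.stream(names).filter(n -> n.length() > 3).count();
    }

    public static void main(String[] args) {
        System.out.println(sum(new int[]{1, 2, 3}));
        System.out.println(countLongNames(new String[]{"oak", "sequoia"}));
    }
}
```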

The Collection interface has a default method that returns a stream to us. Any collection has the ability to turn into a stream:

List.of("a", "b", "c").stream().forEach(System.out::println);

This is the most common way to get a stream from a dataset.

Now we know the basic creation methods. We can move on to the intermediate operations. They will allow us to process our data stream.

Intermediate operations

We are already familiar with the filter operation: it lets us write a predicate that is checked against each item. If the predicate is true, the item passes on.

But in our factory we do much more than just filter logs. In order for the wood to become furniture it needs to be transformed.

The most popular function, map(), can help us here. Let's take our example above and try to use it:

logs.stream().map(x -> x.getType()).forEach(System.out::println);

The map function takes an implementation of the Function<T, R> functional interface.

We get an object of type T as input and return an object of type R. Our stream, previously typed as Stream<Log>, becomes typed by whatever x.getType() returns; here we get a set of strings with tree names.

Intermediate operations can be chained together, that is, we can add more transformations:

logs.stream().map(Log::getType).map(x -> x.split(" "))
        .forEach(System.out::println);

In the second map, we split each string into an array of strings. But if we run the application, we see that the strings aren't displayed; instead, we get the default toString() values of the String[] arrays. We want the stream to be flat, and for this purpose the creators of the Stream API provide another intermediate operation: flatMap. Here is how it changes our stream (I replaced the earlier lambdas with method references):

logs.stream().map(Log::getType).map(x -> x.split(" "))
        .flatMap(x -> Arrays.stream(x))
        .forEach(System.out::println);

The input of flatMap() is a function. It produces a stream from each object and concatenates those streams together. Thus, we create streams from the arrays of strings and join them into one. Now let's try to get a list of all letters that occur in our stream. To do this, we will use the chars() method. The chars() method of the String class returns an IntStream, a stream of primitive int values, one for each character in the string.

Stream<IntStream> str = logs.stream()
        .map(Log::getType)
        .map(x -> x.split(" "))
        .flatMap(Arrays::stream)
        .map(String::chars);

OK, we now have a stream of IntStreams. The usual flatMap will not work here, so primitive streams come with special flattening operations:

IntStream chars = logs.stream()
        .map(Log::getType)
        .map(x -> x.split(" "))
        .flatMap(Arrays::stream)
        .map(String::chars)
        .flatMapToInt(x -> x);
chars.forEach(y -> System.out.println((char) y));

The function x -> x tells us that we don't need any additional transformation to merge our streams together. In the terminal operation forEach, we cast the value of y to char.

The previous example displayed each tree-type name letter by letter. What if we want each letter to appear only once, eliminating repeats? For this, we can use the intermediate operation distinct(). The operation itself is very similar to filter; the difference is that it remembers every element that has passed through it and lets through only those that have not been seen yet.
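A standalone sketch of distinct() (class name is mine): duplicates are dropped, and the encounter order of the first occurrences is preserved.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DistinctDemo {
    static List<String> unique() {
        // distinct() tracks seen elements via equals()/hashCode()
        // and passes each one through only once
        return Stream.of("oak", "pine", "oak", "sequoia", "pine")
                .distinct()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(unique());
    }
}
```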

In order to sort the letters, we will use the sorted() operation:

IntStream chars = logs.stream()
        .map(Log::getType)
        .map(x -> x.split(" "))
        .flatMap(Arrays::stream)
        .map(String::chars)
        .flatMapToInt(x -> x)
        .distinct()
        .sorted();

The sorted() operation has a caveat. In order to sort the objects coming from the stream, it must first accumulate all of them and only then start sorting. But what if the stream is infinite, or holds a huge number of items? Calling such an operation can lead to an OutOfMemoryError. So, be careful.

To limit infinite operations, there is the limit() operation. We can pass the number of elements we want to take from the stream as an argument to it:

IntStream chars = logs.stream()
        .map(Log::getType)
        .map(x -> x.split(" "))
        .flatMap(Arrays::stream)
        .map(String::chars)
        .flatMapToInt(x -> x)
        .distinct()
        .limit(3)
        .sorted();

In the example above, we used the limit function to limit our stream to three items, which were later sorted().

The opposite of limit() is skip(). It takes as a parameter the number of elements to skip before processing starts.
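Combining the two gives simple pagination over a stream; a sketch (class and method names are mine):

```java
import java.util.stream.IntStream;

public class SkipLimitDemo {
    static int[] page(int from, int size) {
        // skip() drops the first `from` elements,
        // limit() keeps only the next `size` of what remains
        return IntStream.rangeClosed(1, 100)
                .skip(from)
                .limit(size)
                .toArray();
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(page(10, 3)));
    }
}
```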

Sometimes it is inconvenient, or outright impossible, to specify in advance how many elements to skip or take. That's why two more intermediate operations were added to streams in Java 9: takeWhile(Predicate<T> predicate) and dropWhile(Predicate<T> predicate).

new Random().ints()
.takeWhile(x -> x % 7 != 0)
.forEach(System.out::print);
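dropWhile() is the mirror image: it discards elements while the predicate holds and then passes everything else through unchanged, including later elements that would match the predicate. A sketch (class and method names are mine, Java 9+):

```java
import java.util.stream.IntStream;

public class DropWhileDemo {
    static int[] afterSmall(int[] values) {
        // drops the leading run of values < 10; once one element fails
        // the predicate, all remaining elements pass through
        return IntStream.of(values)
                .dropWhile(x -> x < 10)
                .toArray();
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(afterSmall(new int[]{1, 5, 12, 3})));
    }
}
```

Note that the 3 at the end survives: dropWhile stops dropping at the first element that fails the predicate.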

The last operation I’d like to describe is boxed(). It should be used if we want to turn our stream of primitives into an object stream.
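For example (class name is mine), boxing is what lets a primitive stream work with object-based collectors:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class BoxedDemo {
    static List<Integer> box() {
        // boxed() turns an IntStream into a Stream<Integer>,
        // which is required before calling collect(Collector)
        return IntStream.rangeClosed(1, 3)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(box());
    }
}
```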

There are a few more intermediate operations, but I will mention only the one I consider most important: parallel(). By placing it anywhere in our stream, we magically start a very complex mechanism inside the JVM and get multi-threaded processing of our stream. That is, the Stream API will try to perform all operations as efficiently as possible across the processor's cores. And it really works!
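A small sketch of parallel() (class name is mine): the result is identical to the sequential version, only the work is split across threads.

```java
import java.util.stream.LongStream;

public class ParallelDemo {
    static long sum(long n) {
        // parallel() splits the range across the common ForkJoinPool;
        // sum() combines the partial results, same answer as sequential
        return LongStream.rangeClosed(1, n).parallel().sum();
    }

    public static void main(String[] args) {
        System.out.println(sum(1_000_000));
    }
}
```

Associative operations like sum are safe to parallelize; stateful or order-sensitive logic needs more care.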

Terminal operations

Having gotten acquainted with the main intermediate operations, we arrive at the final piece: terminal operations. These are the operations that "start" our stream. We can create a stream and add any number of intermediate operations to it, but none of them will execute until a terminal operation is added.

We already used one of the most popular operations above — forEach(). It takes all objects that have passed through the stream and processes them according to the algorithm that will be specified in Consumer.

A terminal operation can also return a value. The most common ones are findFirst(), findAny(), anyMatch(), allMatch(), noneMatch().

The findFirst() and findAny() functions return a single value, wrapped in Optional. As you can easily guess, in the first case we get the first element of our stream, and in the second we get an arbitrary element from it, provided, of course, that the element exists, otherwise, we return Optional.empty().

The functions anyMatch(), allMatch(), noneMatch() check the elements of the stream against a condition and return true or false. The first runs through the elements until at least one satisfies the condition; if no such element exists, it returns false. allMatch() works the opposite way: true is returned only if every element fits, and as soon as the predicate returns false for any element, the terminal operation immediately returns false. noneMatch() is allMatch() with the predicate inverted.
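A sketch of all three on the log counts from earlier (class and method names are mine):

```java
import java.util.List;

public class MatchDemo {
    static final List<Integer> COUNTS = List.of(10, 30, 5);

    // true: 30 satisfies the predicate, so the scan stops there
    static boolean anyLarge() { return COUNTS.stream().anyMatch(c -> c > 20); }

    // false: 10 already fails, so the scan stops immediately
    static boolean allLarge() { return COUNTS.stream().allMatch(c -> c > 20); }

    // true: no element exceeds 100
    static boolean noneHuge() { return COUNTS.stream().noneMatch(c -> c > 100); }

    public static void main(String[] args) {
        System.out.println(anyLarge() + " " + allLarge() + " " + noneHuge());
    }
}
```

All three are short-circuiting: they stop reading the stream as soon as the answer is decided.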

Now let's move on to more complex functions. Often we want to gather the new objects produced by the stream's processing, and it is convenient to put them into an array or a collection.

Several methods giving the corresponding functionality have been added to the Stream API.

By calling the terminal operation Object[] toArray(), we get a reference to an array containing all the objects. If you want an array of a specific type, use the overload that takes an IntFunction<A[]> generator: it receives the number of elements and must create an array of the required type.
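The typed overload usually takes an array-constructor reference; a sketch (class and method names are mine):

```java
import java.util.stream.Stream;

public class ToArrayDemo {
    static String[] types() {
        // String[]::new is an IntFunction<String[]>: given the element
        // count, it allocates a String array of that size
        return Stream.of("pine", "oak", "sequoia").toArray(String[]::new);
    }

    public static void main(String[] args) {
        System.out.println(types().length);
    }
}
```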

The next operation worth mentioning is reduce() — it takes the initial value and the binary function that defines the algorithm for combining the two objects.

To get the sum of the first 100 members of a stream from arbitrary values, write:

new Random().ints(100).reduce(0, (x, y) -> x + y);

We pass an initial element for addition, in our case, it is 0, and a binary function that describes how to combine the two values from the stream.

If we want to move this set of numbers to a collection, we need to specify how to create the collection and how to put the elements into it:

List<Integer> ints = new Random().ints(100).boxed()
        .reduce(new ArrayList<>(),
                (x, y) -> { x.add(y); return x; },
                (a, b) -> { a.addAll(b); return a; });

We passed our initial argument, a new empty collection. Then we described the rule by which we will combine the collection and stream elements. Finally, we described how we would merge the two collections.

Note! reduce() is designed for immutable accumulation: mutating the shared identity collection as we do here can break in a parallel stream, and a properly immutable version would have to create a new list for every element, which would hurt application performance.

With the collect() operation, you can write this more succinctly and efficiently:

List<Integer> ints = new Random().ints(100).boxed()
.collect(Collectors.toList());

The entire logic of combining elements is stored in a data structure called a collector. The creators of the Stream API have added a lot of collectors to the library; let's look at some of them.

Above we have already seen the collector that combines elements into a list. If we need to combine elements into a collection of type Set, it’s easy enough to use Collectors.toSet().

There is a more general Collectors.toCollection() method. You can pass a collection into it as an argument, in which the elements of the stream will be placed.
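For example (class and method names are mine), toCollection() takes a Supplier for the target collection, so you can collect into any concrete type, such as a TreeSet that keeps its elements sorted:

```java
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class ToCollectionDemo {
    static TreeSet<String> sortedTypes() {
        // TreeSet::new supplies the container; elements end up sorted
        return Stream.of("pine", "oak", "sequoia")
                .collect(Collectors.toCollection(TreeSet::new));
    }

    public static void main(String[] args) {
        System.out.println(sortedTypes());
    }
}
```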

A more complex collector is toMap(). We have to explain to it how to build the map: how to derive the key and the value from each object, and what to do when two keys collide.

Let's count the occurrences of each character (since chars is an IntStream, we box it first so we can use a Collector):

chars.boxed().collect(Collectors.toMap(x -> x, x -> 1, Integer::sum))

For the key we use the character itself, unchanged (x -> x). Each new character is mapped to the integer 1; when keys collide, their values are summed.

The partitioningBy() operation lets us divide the stream into two sets according to a condition. For example, we can divide our character stream into two groups, lowercase and uppercase letters:

chars.boxed().collect(Collectors.partitioningBy(Character::isLowerCase))

Collectors can be combined with each other, which gives more flexibility.

In the example above we see that some of the letters are repeated. We don't want that, so we add another collector that gathers everything into a Set:

chars.boxed().collect(Collectors.partitioningBy(Character::isLowerCase, Collectors.toSet()))

The groupingBy() method also builds a map, but instead of separate key and value functions it takes a single classifier function, and, just like partitioningBy(), it accepts a downstream collector for the grouped values.
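A sketch of groupingBy() (class and method names are mine): here we classify tree names by length, with the default downstream collector toList() gathering each group.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class GroupingDemo {
    static Map<Integer, List<String>> byLength() {
        // String::length is the classifier; each key maps to the list
        // of elements that produced it, in encounter order
        return Stream.of("oak", "fir", "pine", "teak")
                .collect(Collectors.groupingBy(String::length));
    }

    public static void main(String[] args) {
        System.out.println(byLength());
    }
}
```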

To implement the collector yourself, you can use the static method:

static <T, R> Collector<T, R, R> of(Supplier<R> supplier,
        BiConsumer<R, T> accumulator,
        BinaryOperator<R> combiner,
        Characteristics... characteristics)

This is very similar to the reduce() method we discussed earlier; the difference is that we can also pass collector characteristics. They describe properties of the collector that can be used for optimization, for example whether the collector can accumulate concurrently, which can make parallel combining significantly faster.

Let's build a collector similar to toList().

The Collector.of() method requires three functions (plus the optional characteristics). The supplier returns a lambda expression that creates a container for storing intermediate results:

public static Supplier<List<Integer>> supplier() {
    return ArrayList::new;
}

The accumulator adds the next value to the container of intermediate results:

public static BiConsumer<List<Integer>, Integer> accumulator() {
    return List::add;
}

The combiner merges two containers of intermediate results into one:

public static BinaryOperator<List<Integer>> combiner() {
    return (l, r) -> {
        l.addAll(r);
        return l;
    };
}

Eventually, we can write our expression as follows:

List<Integer> ints = new Random().ints(100)
.boxed()
.collect(Collector.of(supplier(), accumulator(), combiner()));

In this short article, we met what I think is the coolest thing to happen to Java since its inception. Streams let you simplify, and consequently speed up, code development considerably. The ability to make a stream parallel almost for free, and thereby multiply your code's performance, makes streams a number-one tool in the hands of every developer.

Have a nice coding! )
