Universal parser for your pet/prod project in Java

Roman Ivanov
5 min read · Apr 8, 2020

We often need a simple and reliable tool to parse data into POJOs. Let’s write such a utility package for our projects.

First of all, we need a base object — Parser. What is its area of responsibility? It should have a public method that returns the desired entity. We need to parametrize this method because we want to get the final object, and our raw entity supplier also needs to handle generic objects. Afterward, our Parser may look like this:

public class Parser<T, F> {
    public T parseBean() { … }
}

This seems good. Let’s dive deeper into this method. Our aim is to get some raw data and then map it into the desired class. Let’s assume we need to parse a CSV file: first we read a line, then split it by commas and transform this array of data into our bean:

@SneakyThrows
public T parseBean() {
    if (!hasNext) throw new NoSuchElementException();  // the hasNext flag mirrors the reader's state
    List<F> row = rawEntityReader.next();
    T bean = beanType.getDeclaredConstructor().newInstance();
    updateFields(bean, row);
    return bean;
}

There is a rawEntityReader variable in the code above. The RawEntityReader interface has a next() method that returns a prepared array of raw data. In our situation, this is an array with the values of one CSV line:

public interface RawEntityReader<F> extends AutoCloseable {
    List<F> next();
    boolean hasNext();
}

Since RawEntityReader may work with files or other streams, it is important to close those resources. The implementation below opens the CSV file, reads it line by line and closes the stream:

public class CSVEntityReaderImpl implements RawEntityReader<String> {
    private final BufferedReader br;
    private String nextLine;

    @SneakyThrows
    public CSVEntityReaderImpl(File file) {
        this.br = new BufferedReader(new FileReader(file));
        this.nextLine = br.readLine();  // consumes the header line
    }

    @Override
    public List<String> next() {
        return List.of(nextLine.split(","));
    }

    @Override
    @SneakyThrows
    public boolean hasNext() {
        // note: hasNext() advances to the next line, skipping blank ones
        nextLine = br.readLine();
        if (nextLine != null && nextLine.isBlank()) return hasNext();
        return nextLine != null;
    }

    @Override
    public void close() throws Exception {
        br.close();
    }
}
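Because the reader extends AutoCloseable, it fits try-with-resources nicely. A minimal usage sketch, assuming the classes above (the file name is illustrative, and the enclosing method must declare throws Exception because of close()):

try (var reader = new CSVEntityReaderImpl(new File("cars.csv"))) {
    while (reader.hasNext()) {  // advances past the header to the next non-blank line
        System.out.println(reader.next());
    }
}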

So now we can read a CSV file like this:

Year,Make,Model
1997,Ford,E350
2000,Mercury,Cougar

And we want to parse it into this class:

public class Car {
    String company;
    String model;
    Integer year;
}

First, we need to mark the fields for the parser so it knows how to map them. An annotation like this will help:

@Retention(RUNTIME)
@Target(FIELD)
public @interface Parsed {
    int index() default 0;
}

The Car class will then look like this:

public class Car {
    @Parsed(index = 1)
    String company;
    @Parsed(index = 2)
    String model;
    @Parsed(index = 0)
    Integer year;
}

Now we have marked the fields for our parser in the desired order. The parser will use these indices when mapping the list of strings from the CSV reader. Of course, reflection will help us:

getAllFieldsList(Car.class).forEach(field -> {
    field.setAccessible(true);
    Annotation[] annotations = field.getAnnotations();
    for (Annotation annotation : annotations) {
        if (annotation.annotationType() != Parsed.class) continue;
        Parsed parsed = (Parsed) annotation;
        fields.put(parsed.index(), field);  // key: column index, value: the field itself
        break;
    }
});

The getAllFieldsList function from the org.apache.commons.lang3.reflect.FieldUtils class returns a list of all fields of the class, including inherited ones. For each field we can inspect all its metadata, especially its annotations. I use a map to store this information: the index from the @Parsed annotation as the key and the field as the value:

private final Map<Integer, Field> fields;

RawEntityReader gives us a list of values, and we have the fields map. Now the updateFields function can be written:

private void updateFields(T bean, List<F> fieldValues) {
    for (int i = 0; i < fields.size(); i++) {
        Field field = fields.get(i);
        F rawData = fieldValues.get(i);
        processField(field, bean, rawData);
    }
}

The processField function knows how to match raw data with our bean. The Parser class should not know about this matching; the logic is delegated to another object with this interface:

public interface FieldProcessor<T> {
    Object process(Field field, Object target, T value);
}

In this situation, we have a field and its value for the parent object. The value is raw and needs to be transformed before it is assigned:

var transformedField = transform(value);
field.set(target, transformedField);

But how do we transform different types, for example Double and LocalDateTime? At this step we create a dedicated field handler. This class knows how to transform a raw value into the desired field type:

public interface FieldHandler<T, P> {
    T process(P value);
    Class<T> getFieldType();
}

The getFieldType() method returns the Class object for the type this handler can process.

For Double it can look like this:

public class DoubleFieldHandler implements FieldHandler<Double, String> {
    @Override
    public Double process(String value) {
        return parseDouble(value);  // statically imported from java.lang.Double
    }

    @Override
    public Class<Double> getFieldType() {
        return Double.class;
    }
}
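By analogy, the IntegerFieldHandler and StringFieldHandler used in the examples at the end of this article could be as simple as this sketch (illustrative; the versions in the repo may differ):

public class IntegerFieldHandler implements FieldHandler<Integer, String> {
    @Override
    public Integer process(String value) {
        return Integer.valueOf(value);
    }

    @Override
    public Class<Integer> getFieldType() {
        return Integer.class;
    }
}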

Afterward, we can finish the FieldProcessor. All field handlers will be stored in a map:

private final Map<Class<?>, FieldHandler<?, T>> handleFieldMap;

We can initialize this map in the FieldProcessor constructor:

public FieldProcessorImpl(List<FieldHandler<?, T>> fieldHandlers) {
    this.handleFieldMap = fieldHandlers.stream()
        .collect(toMap(FieldHandler::getFieldType, identity()));
}

For each field class we now have an object that can perform the proper transformation.
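To make the lookup concrete, here is a minimal sketch of how process could pick a handler by the field's declared type (assuming the map above; the real implementation in the repo also consults the annotation processing described next):

public Object process(Field field, Object target, T value) {
    var handler = handleFieldMap.get(field.getType());  // handler registered for this field's type
    if (handler == null) throw new IllegalStateException("No handler for " + field.getType());
    var transformedField = handler.process(value);
    try {
        field.set(target, transformedField);  // the field was made accessible earlier
    } catch (IllegalAccessException e) {
        throw new RuntimeException(e);
    }
    return transformedField;
}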

But there can be a more complex situation. For example, our bean may contain two dates in different formats: 2020*01*20 and 2020/01/20. The best way to resolve this is to add the extra information to a custom annotation.
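For illustration, such an annotation might carry the date pattern. The names @ParsedDate and pattern are hypothetical, and the handler below assumes the AnnotationHandler contract implied by its use in AnnotationProcessorImpl further down:

@Retention(RUNTIME)
@Target(FIELD)
public @interface ParsedDate {
    String pattern() default "yyyy-MM-dd";  // e.g. "yyyy*MM*dd" or "yyyy/MM/dd"
}

public class ParsedDateAnnotationHandler implements AnnotationHandler {
    @Override
    public Object process(Annotation annotation, Object value) {
        var pattern = ((ParsedDate) annotation).pattern();
        return LocalDate.parse(value.toString(), DateTimeFormatter.ofPattern(pattern));
    }

    @Override
    public Class<? extends Annotation> getAnnotationClass() {
        return ParsedDate.class;
    }
}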

Using the previous class as a wireframe, we create an AnnotationProcessor:

public interface AnnotationProcessor<T> {
    Object process(Annotation annotation, T value);
    Set<Class<? extends Annotation>> getAnnotation();
}

The getAnnotation method returns the annotation classes this processor supports.

The implementation is almost like the previous one:

public class AnnotationProcessorImpl<T> implements AnnotationProcessor<T> {
    private final Map<Class<? extends Annotation>, AnnotationHandler> handleAnnotationMap;

    public AnnotationProcessorImpl(List<AnnotationHandler> annotationHandlers) {
        this.handleAnnotationMap = annotationHandlers.stream()
            .collect(toMap(AnnotationHandler::getAnnotationClass, identity()));
    }

    public Object process(Annotation annotation, T value) {
        var handler = handleAnnotationMap.get(annotation.annotationType());
        return handler == null ? null : handler.process(annotation, value);
    }
}

Because there are many components in our parser, it is convenient to add a builder pattern to it.

I will omit this code; you can find it on GitHub.

OK, the parser is almost ready. It is also very useful to add support for the standard Java Iterable interface, so the parser can be used in a for-each loop. With that, our parser is finished.
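A minimal sketch of that support, assuming the parseBean() method and the hasNext flag shown earlier (illustrative, not the repo's exact code):

public class Parser<T, F> implements Iterable<T> {
    // fields, parseBean() and the rest as above ...

    @Override
    public Iterator<T> iterator() {
        return new Iterator<>() {
            @Override
            public boolean hasNext() {
                // refresh the flag; the CSV reader also advances to the next line here
                Parser.this.hasNext = rawEntityReader.hasNext();
                return Parser.this.hasNext;
            }

            @Override
            public T next() {
                return parseBean();  // builds a bean from the current row
            }
        };
    }
}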

Let’s parse it! First, we will try to parse the simple CSV from the example above:

Parser<Car, String> parser = Parser.<Car, String>builder()
    .withRawEntityReader(new CSVEntityReaderImpl(file))
    .withAnnotationProcessor(new AnnotationProcessorImpl<>(List.of()))
    .withFieldHandler(List.of(new IntegerFieldHandler(),
                              new StringFieldHandler()))
    .to(Car.class)
    .build();

We use our builder to assemble the parser. The bean has no special fields, so we don’t use any annotation processors in this example, but there are two field handlers in use.
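Since the parser implements Iterable, we can consume it with an ordinary for-each loop (a usage sketch):

for (Car car : parser) {
    System.out.println(car);
}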

Now compare how it looks for an XLSX file:

Parser<Car, XSSFCell> parser = Parser.<Car, XSSFCell>builder()
    .from(new ExcelEntityReaderImpl(file, 0))
    .withAnnotationHandler(List.of())
    .withFieldHandler(List.of(new IntegerFieldHandler(),
                              new StringFieldHandler()))
    .to(Car.class)
    .build();

It looks very similar and simple. Please check the full code in my GitHub repo: https://github.com/r331/uniparser

That is all. Thank you for your attention!
