
# Elasticsearch in Java projects - index and read documents
Nowadays the market places a huge demand on projects for efficient searching and analysis of large volumes of data. One answer to this is Elasticsearch, which can be easily integrated into Java projects. In this series of articles I would like to present how Elasticsearch can be used in practice by developing a simple demo project that adopts Elasticsearch features step by step.
Mariusz Maślanka | 25 Apr 2022
This first article presents how to index and read documents using the Java High Level REST Client. This is the official client, written and supported by Elasticsearch.
## What is Elasticsearch?
Elasticsearch is an open-source search engine developed in Java that is built on top of Apache Lucene, a high-performance full-text search-engine library.
It allows users to store, search and analyze data efficiently and gives the response in JSON format in near real-time.
Over the years Elasticsearch, mainly due to its performance, has become the most popular search engine. It is commonly used not only by big companies like Wikipedia, Netflix or developers' favourite Stack Overflow, but also by small startups.
Elasticsearch organizes data expressed in JSON format, called *documents*, into an *index*, which is a collection of documents of a similar type. Each document is identified by its unique ID, whereas an index is identified by its name.
## Prerequisites
1. Java 8 or higher
2. Installed and running Elasticsearch (refer to [official user guide](https://www.elastic.co/guide/en/elasticsearch/reference/current/install-elasticsearch.html))
## Project setup
### About
The idea is to develop a simple backend-for-frontend service that provides data about Formula 1 drivers. The service exposes a REST API so that the indexed data can be either refreshed or retrieved.
At the beginning the `Driver` objects shown below will be indexed and retrieved.
```java
class Driver {
    String driverId;
    String code;
    String givenName;
    String familyName;
    LocalDate dateOfBirth;
    String nationality;
    boolean active;
    Integer permanentNumber;
}
```
### Maven configuration
Once Elasticsearch is up and running, the dependencies shown below need to be added to `pom.xml`. For this project version `7.15.2` is chosen.
```xml
<properties>
    <elasticsearch.version>7.15.2</elasticsearch.version>
</properties>

<dependencies>
    <dependency>
        <groupId>org.elasticsearch.client</groupId>
        <artifactId>elasticsearch-rest-high-level-client</artifactId>
        <version>${elasticsearch.version}</version>
    </dependency>
    <dependency>
        <groupId>org.elasticsearch</groupId>
        <artifactId>elasticsearch</artifactId>
        <version>${elasticsearch.version}</version>
    </dependency>
</dependencies>
```
### Project properties
For convenience, some Elasticsearch and project related properties were added to `application.yml`. It is declared that the documents will be stored in the `db-drivers` index.
```yaml
elasticsearch:
  host: localhost
  port: 9200
  index:
    name:
      drivers: "db-drivers"
```
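For illustration, the declared index name can be injected into a Spring bean in the same way as any other property; the class and field names below are only an example, not taken from the demo project.
```java
@Component
public class DriverIndexProperties {

    // bound to elasticsearch.index.name.drivers from application.yml
    @Value("${elasticsearch.index.name.drivers}")
    private String driversIndexName;

    public String getDriversIndexName() {
        return driversIndexName;
    }
}
```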
## Java API
There are many ways to talk to Elasticsearch. Next to the REST API, many programming languages have their official clients provided by Elasticsearch; for Java we have the High Level REST Client.
### High Level REST Client
The High Level REST Client accepts request objects and returns response objects for the most important APIs, e.g. info, get, index, delete, update, bulk or search.
Each API can be called synchronously or asynchronously. This article focuses on synchronous calls, which return either a response object or, in case of failure, throw an `IOException`. A short illustration of both calling styles follows the configuration snippet below.
The `RestHighLevelClient` is built on top of the low-level REST client. The following code shows how to initialize the `RestHighLevelClient` with a previously created low-level `RestClientBuilder` in a Spring application.
```java
@Configuration
public class ElasticsearchConfiguration {

    @Value("${elasticsearch.host}")
    private String elasticsearchHost;

    @Value("${elasticsearch.port}")
    private int elasticsearchPort;

    @Bean
    RestHighLevelClient restHighLevelClient() {
        RestClientBuilder restClientBuilder = RestClient.builder(new HttpHost(elasticsearchHost, elasticsearchPort));
        return new RestHighLevelClient(restClientBuilder);
    }
}
```
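To illustrate the two calling styles mentioned above, the snippet below (a minimal sketch, not part of the demo project) calls the `info` API first synchronously and then asynchronously with an `ActionListener`:
```java
// synchronous call: blocks until the response arrives or an IOException is thrown
MainResponse info = client.info(RequestOptions.DEFAULT);
System.out.println("Connected to cluster: " + info.getClusterName());

// asynchronous call: returns immediately, the listener is notified later
client.infoAsync(RequestOptions.DEFAULT, new ActionListener<MainResponse>() {
    @Override
    public void onResponse(MainResponse response) {
        System.out.println("Elasticsearch version: " + response.getVersion().getNumber());
    }

    @Override
    public void onFailure(Exception e) {
        e.printStackTrace();
    }
});
```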
### Indexing documents
Depending on the project requirements, documents can be stored using different approaches. On the one hand, a single index can be created once at the beginning, with new documents added to it or updated once they change. This is useful for cases where not the full scope of data is ingested each time.
On the other hand, when the full data set is always ingested, a new index can be created on each run, which also provides some auditing possibilities. Each approach has its advantages and drawbacks and, as mentioned before, the choice should be driven by the project requirements.
In the demo application the second approach was chosen. Each time a new index is created with a name enriched by a timestamp suffix, e.g. `db-drivers-20211222-122343`, as sketched below.
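A possible way of building such a timestamped name is shown below; the exact pattern is an assumption based on the example above, not code copied from the demo project.
```java
// produces e.g. "db-drivers-20211222-122343"
String indexName = "db-drivers-" + LocalDateTime.now()
        .format(DateTimeFormatter.ofPattern("yyyyMMdd-HHmmss"));
```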
The process of indexing documents is divided into three steps:
1. Create index
Creating a new index requires only building the `CreateIndexRequest` object with some additional configuration, e.g. settings added if needed. The prepared request is then used by the client when calling the `create` method.
Setting the numbers of shards and replicas is shown in the snippet below.
In short, a `shard` is a Lucene index that contains a subset of the documents stored in the Elasticsearch index. The number of shards depends on many factors, such as the amount of data, the queries, etc. There are two types of shards: the primary shard and a `replica`, or copy. Each replica is located on a different node, which ensures the data remains available if another node fails. The default for both the number of shards and the number of replicas is 1.
```java
CreateIndexRequest createIndexRequest = new CreateIndexRequest(indexName)
        .settings(Settings.builder()
                .put("index.number_of_shards", NO_OF_SHARDS)
                .put("index.number_of_replicas", NO_OF_REPLICAS)
                .build());
client.indices().create(createIndexRequest, DEFAULT);
```
2. Index documents
Once the index exists, the documents can be stored. For each data object an `IndexRequest` is created with the given index name and the current data is passed to it as a map.
In order to make this efficient, with only one call to Elasticsearch, a `BulkRequest` is used. First, each `IndexRequest` is added to the `BulkRequest` instance and, once the bulk request is prepared, the `client.bulk()` method is invoked.
```java
try {
    BulkRequest bulkRequest = new BulkRequest().setRefreshPolicy(WAIT_UNTIL);
    data.forEach(it -> {
        // convert the data object to a map, because IndexRequest accepts a map as a source
        Map<String, Object> source = objectMapper.convertValue(it, new TypeReference<>() {});
        bulkRequest.add(new IndexRequest(indexName).source(source));
    });
    BulkResponse bulkResponse = client.bulk(bulkRequest, DEFAULT);
    if (bulkResponse.hasFailures()) {
        throw new ElasticsearchStoreException(bulkResponse.buildFailureMessage());
    }
} catch (IOException e) {
    throw new ElasticsearchStoreException(e);
}
```
3. Assign alias
An *alias* is a kind of secondary name that can be used to refer to an index or multiple indices. It means we always refer to the index currently assigned to the alias, no matter what the real index name is.
This functionality is provided by rolling the alias over from the already assigned indices to the new one. The first step is to find all existing indices under the specific alias using `GetAliasesRequest`.
Next, all previously found indices are detached from the alias. Finally, the alias is assigned to the new index. Each of these actions is defined as an `AliasActions` entry added to the `IndicesAliasesRequest`. At the end the `updateAliases()` method of the `client` is invoked with the prepared alias request.
```java
IndicesAliasesRequest indicesAliasesRequest = new IndicesAliasesRequest();

// find all existing indices under the specified alias
String[] assignedIndicesUnderAlias = client.indices().getAlias(new GetAliasesRequest(indexNameAsAlias), DEFAULT)
        .getAliases().keySet().toArray(String[]::new);

// unassign all previously found indices
if (assignedIndicesUnderAlias.length > 0) {
    AliasActions unassignAction = new AliasActions(REMOVE).indices(assignedIndicesUnderAlias).alias(indexNameAsAlias);
    indicesAliasesRequest.addAliasAction(unassignAction);
}

// assign the newly created index to the specified alias
AliasActions assignAction = new AliasActions(ADD).index(indexName).alias(indexNameAsAlias);
indicesAliasesRequest.addAliasAction(assignAction);

// invoke the client command
client.indices().updateAliases(indicesAliasesRequest, DEFAULT);
```
In the next section the documents will be retrieved from the index so that this activity can be verified.
### Reading documents
When all documents are in the index, it is time to read them all out of it. Firstly, a `SearchSourceBuilder` is created, where it is important to increase the `size` parameter; its default value is 10, so only the first 10 documents would be returned.
Next, the source builder is passed to an instance of `SearchRequest` together with the name of the source index, which defines what is searched for and which index the documents are taken from.
```java
try {
    // prepare the search request
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.size(1000); // by default the query returns only 10 documents
    SearchRequest searchRequest = new SearchRequest();
    searchRequest.indices(indexName);
    searchRequest.source(searchSourceBuilder);
    // invoke the client command to get the search response
    SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
} catch (IOException e) {
    throw new ElasticsearchReadException(e);
}
```
Finally, the hits taken from the search response, which is the result of calling the `search()` method of the `client`, are parsed into output objects.
```java
List fetchedList = stream(searchResponse.getHits().getHits())
        .map(it -> toOutputDocument(it, typeReference))
        .collect(toList());
```
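The `toOutputDocument()` helper is not shown above; a possible implementation (an assumption, the demo project may differ) converts the hit source map back into the target type with the same `ObjectMapper`:
```java
private <T> T toOutputDocument(SearchHit hit, TypeReference<T> typeReference) {
    // the hit source is the original JSON document returned as a map
    return objectMapper.convertValue(hit.getSourceAsMap(), typeReference);
}
```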
The indexed documents can also be verified by calling the Elasticsearch REST API directly at `/db-drivers/_search`.

## Summary
As was shown, integrating Elasticsearch into a project is pretty straightforward. It requires only a small amount of configuration, and with the Java High Level REST Client indexing and retrieving documents is not troublesome either.
## Reference
- [demo application repository](https://github.com/Maniollo/driver-book)
- https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high.html
- https://www.elastic.co/guide/en/elasticsearch/client/java-rest/current/java-rest-high-java-builders.html
Mariusz Maślanka
I am a Senior Software Developer with over 9 years of experience gained in the railway, banking and gaming industries.