2018-10-07 | Technology

Elasticsearch is an open-source full-text search engine that is increasingly used in business feature development. It offers search capabilities more powerful than database queries, and its relevance scoring (weights) and result highlighting make it easy to implement site-wide search.

Elasticsearch vs. Database

When we first encounter Elasticsearch (ES), we often learn its structure by analogy with a database:

  • Indices are similar to databases
  • Types are similar to database tables (mapping types were deprecated in ES 6.x and removed in later versions)
  • Fields are similar to the columns of a table
  • Documents are similar to the rows of a table (that is, individual records)

The search syntax a database provides also has counterparts in ES. Where the database provides the logical operators and and or, ES has must and should, and text matching of the kind the database offers through "like" is more powerful in ES.
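As a rough illustration of the and/or analogy, the sketch below builds ES bool query bodies as plain Python dictionaries. The field names title and status are hypothetical, and the SQL in the comments is only an approximate equivalent, not an exact translation.

```python
import json

def bool_query(must=None, should=None):
    """Build an Elasticsearch bool query body.

    `must` clauses behave like SQL AND; `should` clauses behave like SQL OR.
    """
    bool_clause = {}
    if must:
        bool_clause["must"] = list(must)
    if should:
        bool_clause["should"] = list(should)
        # Require at least one `should` clause to match, so that combining
        # must + should reads as "A AND (B OR C)".
        bool_clause["minimum_should_match"] = 1
    return {"query": {"bool": bool_clause}}

# Roughly: WHERE title LIKE '%elasticsearch%' AND status = 'published'
and_query = bool_query(must=[
    {"match": {"title": "elasticsearch"}},
    {"term": {"status": "published"}},
])

# Roughly: WHERE title LIKE '%kafka%' OR title LIKE '%rabbitmq%'
or_query = bool_query(should=[
    {"match": {"title": "kafka"}},
    {"match": {"title": "rabbitmq"}},
])

print(json.dumps(and_query, indent=2))
```

Either dictionary could be sent as the JSON body of a _search request.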

Still, ES is essentially a search engine. NoSQL stores and ES both have loosely structured data models, and there is some debate about whether ES can replace a non-relational database (regardless of whether ES itself counts as NoSQL). The reality is that ES shares the pros and cons of NoSQL: the transactions and multi-table associations of a traditional relational database are not available in ES. In actual development, therefore, relational databases, NoSQL, and ES remain complementary. We generally choose ES only to provide search for the more complex search scenarios, and its source data still comes from the database, which raises the problem of data synchronization between ES and the database.

Full data import

Importing the data stored in the database into ES for the first time requires a full import. Subsequent data changes are then propagated to ES through message queue notifications.
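A full import is commonly done through the ES _bulk API, which takes newline-delimited JSON: one action line followed by one document line per record. The sketch below only builds that request body from rows already read from the database. The row contents and the id column are assumptions made for illustration, and on ES versions that still use mapping types the action line would also need a _type, which is omitted here.

```python
import json

def build_bulk_body(index, rows, id_field="id"):
    """Turn database rows (dicts) into an NDJSON body for the _bulk API."""
    lines = []
    for row in rows:
        # Action line: index the document under the row's primary key, so
        # re-running the import overwrites documents instead of duplicating them.
        lines.append(json.dumps({"index": {"_index": index, "_id": str(row[id_field])}}))
        # Source line: the document itself.
        lines.append(json.dumps(row))
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)

# Hypothetical rows, as they might come back from a SELECT.
rows = [
    {"id": 1, "title": "Hello ES"},
    {"id": 2, "title": "Message queues"},
]
body = build_bulk_body("test_20181007", rows)
print(body)
```

For a large table the rows would normally be read and sent in batches rather than in one request.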

Using a message queue to implement ES incremental synchronization

"Message queue" is a very common term in software development. At the operating system level, message queues are used for communication between processes. Within a single application, such as an Android app, the MessageQueue class is used to solve the problem of refreshing the interface between the UI thread and time-consuming worker threads. In the Internet of Things, the MQTT protocol, which is based on the publish/subscribe model, is widely used for messaging across large numbers of devices. In distributed systems, and in the microservices architectures that have become popular in recent years, the message queue is a very common component for asynchronous messaging, application decoupling, and eventual consistency. A common message queue pattern is "publish-subscribe", which beginners can roughly think of as the "observer pattern".
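To make the publish-subscribe idea concrete, here is a toy in-process broker. It is only a sketch of the pattern: the Broker class and the topic names are made up for illustration, and a real message queue adds persistence, network transport, and delivery guarantees.

```python
from collections import defaultdict

class Broker:
    """A toy in-process publish-subscribe broker (not a real message queue)."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # topic -> list of callbacks

    def subscribe(self, topic, callback):
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Deliver the message to every subscriber of the topic,
        # in the order in which they subscribed.
        for callback in self._subscribers[topic]:
            callback(message)

received = []
broker = Broker()
broker.subscribe("article.updated", received.append)
broker.publish("article.updated", {"id": 1, "title": "Hello ES"})
broker.publish("article.deleted", {"id": 2})  # no subscriber, so nothing happens
print(received)
```

The publisher never references the subscriber directly, which is the decoupling the pattern is valued for.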

Elasticsearch incremental data synchronization and seamless upgrade

Message queue pattern: publish-subscribe

Popular message queue frameworks currently include Kafka and RabbitMQ. With a message queue, incremental synchronization works as follows: when the primary service creates, deletes, or modifies a record, it publishes a message on a topic to the message queue. The synchronization service subscribes to the relevant topics, so the message queue forwards the updated records to it, and it then updates the records in ES based on the contents of each message. Besides decoupling the primary service from the synchronization service, a message queue also makes synchronization fault tolerant. For example, suppose that when a record is added to the database, the primary service calls the synchronization service directly over HTTP (say, with a POST request) and the connection fails. If nothing else is done, that record is never synchronized. The message queue's redelivery of failed messages solves this problem well, and its FIFO (first in, first out) mechanism also preserves the order in which messages are forwarded.
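The fault tolerance and ordering described above can be sketched with Python's standard queue.Queue standing in for Kafka or RabbitMQ. Everything here is an assumption made for illustration: FlakyIndex is a fake ES index that fails its first write, and the event format with op, id, and doc keys is invented.

```python
import queue

class FlakyIndex:
    """Stand-in for ES: a dict keyed by document id that fails its first writes."""

    def __init__(self, failures=1):
        self.docs = {}
        self._failures = failures

    def apply(self, event):
        if self._failures > 0:
            self._failures -= 1
            raise ConnectionError("simulated network failure")
        if event["op"] == "delete":
            self.docs.pop(event["id"], None)
        else:  # "create" and "update" are both an upsert by id
            self.docs[event["id"]] = event["doc"]

def drain(events, index, max_attempts=3):
    """Consume events in FIFO order, retrying each one before moving on."""
    while not events.empty():
        event = events.get()
        for attempt in range(1, max_attempts + 1):
            try:
                index.apply(event)
                break
            except ConnectionError:
                if attempt == max_attempts:
                    raise  # give up; a real system would alert or dead-letter

events = queue.Queue()  # queue.Queue is first in, first out
events.put({"op": "create", "id": 1, "doc": {"title": "draft"}})
events.put({"op": "update", "id": 1, "doc": {"title": "final"}})
events.put({"op": "create", "id": 2, "doc": {"title": "temp"}})
events.put({"op": "delete", "id": 2})

index = FlakyIndex(failures=1)
drain(events, index)
print(index.docs)
```

Because a failed event is retried before the next one is taken from the queue, the update to document 1 is never applied ahead of its create.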

How to rebuild seamlessly after ES index changes

An ES index change occurs when the structure of an Elasticsearch index changes. This happens when the fields of a type change as the business evolves, or when an ES version upgrade changes the structure. For example, ES 5.0 split the previous string type into the text and keyword types, so the old string type is no longer available once we upgrade.
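As an illustration of the string split, the sketch below rewrites an old-style field mapping for ES 5.0 or later. It assumes the usual correspondence, in which an analyzed string becomes text and a not_analyzed string becomes keyword. The field names are hypothetical, and multi-fields and other index settings are deliberately not handled.

```python
def upgrade_string_field(field):
    """Translate a pre-5.0 `string` field mapping to `text` or `keyword`."""
    if field.get("type") != "string":
        return dict(field)  # other types are copied unchanged
    index_setting = field.get("index", "analyzed")  # analyzed was the default
    # Keep every other setting (for example the analyzer) as it was.
    upgraded = {k: v for k, v in field.items() if k not in ("type", "index")}
    if index_setting == "analyzed":
        upgraded["type"] = "text"
    elif index_setting == "not_analyzed":
        upgraded["type"] = "keyword"
    else:
        raise ValueError(f"unsupported index setting: {index_setting!r}")
    return upgraded

# Hypothetical properties from an old index mapping.
old_properties = {
    "title": {"type": "string", "analyzer": "standard"},
    "tag": {"type": "string", "index": "not_analyzed"},
    "views": {"type": "integer"},
}
new_properties = {name: upgrade_string_field(f) for name, f in old_properties.items()}
print(new_properties)
```

The resulting properties would go into the mapping of a newly created index, since the type of an existing field cannot simply be changed in place.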

A seamless ES upgrade can be implemented by analogy with blue-green deployment, which is commonly used to upgrade web services without downtime. Blue-green deployment works by using a load balancer to switch traffic: the old and new services have different URLs, but only the load balancer's URL is exposed externally. That is:

  • Before the upgrade: the load balancer points to the old service V1
  • During the upgrade: the new service V2 is released while the load balancer still points to the old service, so the old and new services exist side by side
  • Switchover: once the new service V2 has started, the load balancer is switched and traffic is directed to V2
  • Upgrade complete: after the load balancer has switched, the old service V1 is stopped

Blue-Green Deployment

ES index aliases

ES allows an index to be accessed through an index alias. For example, if the alias test is created for the index test_20181007, then both localhost:9200/test/_search and localhost:9200/test_20181007/_search search the contents of that index. Aliases make a seamless ES upgrade and switchover possible: just as the load balancer switches which service it points to, we can make the alias point to the old index before the upgrade and to the new index after it.
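One way to create the alias described above is a POST to the _aliases endpoint with an add action. The sketch below builds that request with the Python standard library without sending it. The base URL assumes a local node on port 9200, as in the search URLs mentioned in the text.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumed local node

def add_alias_request(index, alias, base_url=ES_URL):
    """Build (but do not send) a POST /_aliases request that adds an alias."""
    body = {"actions": [{"add": {"index": index, "alias": alias}}]}
    return urllib.request.Request(
        url=f"{base_url}/_aliases",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

request = add_alias_request("test_20181007", "test")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To send it against a running node:
# urllib.request.urlopen(request)
```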

ES Seamless Upgrade

  • Create the new index, with a version in its name
  • Pause incremental updates: because we do not want subsequent records to be written to the old index during the upgrade, we pause consumption from the message queue until the new index is ready.
  • Perform a full data import into the new index
  • Switch the external alias: an alias can point to multiple indexes, so we must remove it from the old index at the same time as we add it to the new index. This operation needs to be atomic, which means doing both in a single _aliases request.
  • Delete the old index
  • Resume incremental updates: records that were updated in the database during the upgrade are then synchronized to the new index.
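The steps above can be sketched as a small orchestration function. The atomic switch is a single _aliases request body that contains both a remove and an add action. The ops object and its method names are hypothetical placeholders for whatever client code talks to ES and to the message queue, and the new index name test_20181101 is made up for the example.

```python
def switch_alias_actions(alias, old_index, new_index):
    """Body for POST /_aliases that moves `alias` in one atomic request."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }

def seamless_upgrade(alias, old_index, new_index, ops):
    """Run the upgrade steps in order, using a hypothetical `ops` adapter."""
    ops.create_index(new_index)
    ops.pause_consumer()
    try:
        ops.full_import(new_index)
        ops.update_aliases(switch_alias_actions(alias, old_index, new_index))
        ops.delete_index(old_index)
    finally:
        # Design choice of this sketch: always resume, even after a failure,
        # so that updates keep flowing to whichever index the alias points to.
        ops.resume_consumer()

class RecordingOps:
    """Fake adapter that records which operations were requested."""

    def __init__(self):
        self.calls = []

    def __getattr__(self, name):
        # Only reached for attributes that do not exist, i.e. the operations.
        def record(*args):
            self.calls.append((name, args))
        return record

ops = RecordingOps()
seamless_upgrade("test", "test_20181007", "test_20181101", ops)
print([name for name, _ in ops.calls])
```

Clients that search through the alias test keep using the same URL throughout, which is what makes the switch invisible to them.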

This article has been registered for copyright and is protected by copyright law. It may not be reproduced without permission.

When reprinting, please cite the original source: Baiyuan's Blog, https://wangbaiyuan.cn/en/elasticsearch-incremental-data-synchronization-seamless-upgrade-2.html
