Goals

  • What is it? A distributed, scalable, real-time search and analytics engine
  • What problems does it solve? Distribution, full-text search, data analytics
  • Bottlenecks and limits: ???
  • Tuning and configuration: ???
  • Installation and startup:
    • Single node
    • Cluster
  • CRUD via curl, focusing on query_string
  • CRUD via Java
  • Basic operations with the elasticsearch-head plugin
  • Import and export
    • elasticdump
    • elasticdump via docker

Concepts

  • Node and Cluster: Elastic is essentially a distributed database in which multiple servers cooperate, and each server can run several Elastic instances. A single Elastic instance is a node; a group of nodes forms a cluster. List the nodes with curl -XGET "es:9200/_cat/nodes?v"
  • index ==> roughly a database in MySQL. List all indices with curl -XGET "es:9200/_cat/indices?v"
  • type ==> roughly a table in MySQL; holds JSON documents
  • document ==> roughly a row in a MySQL table
  • field ==> roughly a column in MySQL, i.e. a single attribute
  • inverted index ???
  • shard: a primary shard, where the data is stored
  • replica: replicas are used to increase search performance and for fail-over.
    https://stackoverflow.com/questions/15694724/shards-and-replicas-in-elasticsearch
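The inverted index entry above is still marked ???; the core idea can be shown with a toy sketch (illustrative only — the real Lucene structure also stores positions, frequencies, and compressed on-disk segments):

```python
# Toy inverted index: maps each term to the set of document ids containing it.
from collections import defaultdict

docs = {
    0: "trying out Elasticsearch",
    1: "Elasticsearch is a search engine",
}

index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        index[term].add(doc_id)

# A term lookup is now a direct dictionary hit instead of a scan over all docs.
print(sorted(index["elasticsearch"]))  # -> [0, 1]
```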

Shard tuning

How do I view shard distribution? curl -XGET "es:9200/_cat/shards?v"

Keep the number of nodes greater than or equal to the number of shards.
If an index holds 300 GB of data split into 10 (primary) shards, each shard holds 30 GB.

Setting the per-node shard limit

By default an index has 5 primary shards, each with 1 replica (the ES 6 default; ES 7 defaults to 1 primary shard).

Total shards in a cluster =
  index 1 primaries + index 1 replicas +
  index 2 primaries + index 2 replicas +
  index 3 primaries + index 3 replicas + …

Raising the per-node shard limit lets the cluster hold more indices.

Elasticsearch 7 allows only 1,000 shards per node by default; ours are all in use, so no new index can be created.
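The arithmetic above can be sketched quickly (the 1,000 default is Elasticsearch 7's; the node and shard counts below are made-up example numbers):

```python
# Each index costs primaries * (1 + replicas) shards; the cluster-wide ceiling
# is max_shards_per_node * number_of_data_nodes.
def total_shards(indices):
    return sum(pri * (1 + rep) for pri, rep in indices)

def creation_allowed(open_shards, pri, rep, data_nodes, max_per_node=1000):
    return open_shards + pri * (1 + rep) <= max_per_node * data_nodes

# Three indices at the ES 6 defaults (5 primaries, 1 replica) cost 30 shards:
print(total_shards([(5, 1), (5, 1), (5, 1)]))  # -> 30
# A 3-node cluster with 2995 open shards cannot fit one more default index:
print(creation_allowed(2995, 5, 1, 3))  # -> False
```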

View the per-node shard limit

curl es:9200/_cluster/settings?pretty

Set the limit temporarily (transient)

curl -XPUT -H "Content-Type:application/json" es:9200/_cluster/settings -d '{ "transient": { "cluster.max_shards_per_node": 10000 } }'

Set the limit permanently (persistent)

curl -XPUT -H "Content-Type:application/json" es:9200/_cluster/settings -d '{ "persistent": { "cluster.max_shards_per_node": 10000 } }'

The change takes effect immediately; no cluster restart is required.

Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-update-settings.html

Installation

  • Single node
    • Download the tar.gz package
    • Use docker or docker-compose

Operations

  • Creating and deleting an index

    • Create: curl -XPUT "es:9200/index_name"
    • Delete: curl -XDELETE "es:9200/index_name"
  • CRUD on documents

    • Add a document: curl -XPUT 'es:9200/index_name/sweet/0?pretty' -H 'Content-Type: application/json' -d ' { "user" : "kimchy", "post_date" : "2009-11-15T14:12:12", "message" : "trying out Elasticsearch" } '
    • Fetch a single document
      • Fetch: curl -XGET 'es:9200/index_name/sweet/0?pretty'
      • Existence check, no body returned [may not work]: curl -XHEAD 'es:9200/index_name/sweet/0?pretty'
      • Existence check, no body returned [works]: curl -I 'es:9200/index_name/sweet/0?pretty'
      • Update: curl -XPUT 'es:9200/index_name/sweet/0?pretty' -H 'Content-Type: application/json' -d ' { "user" : "kimchy-new", "post_date" : "2009-11-15T14:12:12", "message" : "trying out Elasticsearch new " } '
    • Delete a document: curl -XDELETE 'es:9200/index_name/sweet/0?pretty'
    • Query documents
      • All documents, no body: curl -XGET 'es:9200/index_name/sweet/_search?pretty'
      • All documents, with a query body:
        curl -XGET 'es:9200/index_name/sweet/_search?pretty' -H 'Content-Type: application/json' -d '
        {
          "query": {
            "match_all": {}
          }
        }'
      • Search the message field (method 1):
        curl -XGET 'es:9200/index_name/sweet/_search?pretty' -H 'Content-Type: application/json' -d '
        {
          "query": {
            "match": {
              "message": "new"
            }
          }
        }'
      • Search the message field (method 2): curl -XGET 'es:9200/index_name/sweet/_search?pretty=true&q=message:new'
      • Paged search on the message field ("from" is the starting offset, default 0; "size" is how many hits to return):
        curl -XGET 'es:9200/index_name/sweet/_search?pretty' -H 'Content-Type: application/json' -d '
        {
          "query": {
            "match": {
              "message": "new"
            }
          },
          "from": 1,
          "size": 1
        }'
      • Search with query_string:
        curl -XGET 'es:9200/index_name/sweet/_search?pretty' -H 'Content-Type: application/json' -d '
        {
          "query": {
            "query_string": {
              "query": "((message:new) OR (message:a)) AND (id:(>=0 AND <20))"
            }
          },
          "from": 1,
          "size": 1
        }'
      • Access once a password is configured: curl -XGET --user USER:PASS es:9200/index_name
      • Return only selected fields: curl -H 'Content-Type: application/json' -XGET es:9200/index_name/doc/_search?pretty -d '{ "query": { "query_string": { "query" : "(message:new)" }}, "_source": ["message", "id"] , "from": 0, "size": 10 } '
  • View an index's creation time: https://www.elastic.co/guide/en/elasticsearch/reference/current/cat.html
    List every available column: curl 'h102:9200/_cat/indices?help'
    curl 'h102:9200/_cat/indices?format=json&pretty&h=health,status,index,pri,rep,creation.date.string'
    curl 'nj201:9200/_cat/indices?format=json&pretty&h=health,status,index,pri,rep,creation.date.string&s=creation.date.string:asc'
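Hand-writing these JSON bodies invites quoting mistakes (the smart-quote breakage above is typical); one option is to build the body programmatically and pipe it to curl. A sketch, reusing the query_string example from above:

```python
# Build the _search request body and emit it as clean JSON for curl's -d @-.
import json

body = {
    "query": {
        "query_string": {
            "query": "((message:new) OR (message:a)) AND (id:(>=0 AND <20))"
        }
    },
    "from": 0,   # starting offset, defaults to 0
    "size": 10,  # number of hits to return
}

print(json.dumps(body))
```

Usage: `python build_query.py | curl -XGET es:9200/index_name/_search?pretty -H 'Content-Type: application/json' -d @-` (script name is hypothetical).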

Common commands

View the tokenization (term vectors) of one field of a document in an index:
curl -XGET es:9200/index_a/_doc/100/_termvectors?fields=remark

List indices: curl --user elastic:???? es:9200/_cat/indices
View an index mapping: curl -XGET "es:9200/index_name/_mapping?pretty"

List templates: /_cat/templates
View one template: /_template/tpl-xxx

What is the cluster busy with? /_cat/tasks?v

Analyzers

The default/standard analyzer:

curl -XGET es:9200/_analyze -H 'Content-Type: application/json' -d '{ "analyzer": "standard", "text": "学习java和spring" }'

The IK analyzer plugin (https://www.cnblogs.com/soft2018/p/10203330.html)

Download: https://github.com/medcl/elasticsearch-analysis-ik/releases
Note: the IK plugin version must exactly match the installed ES version.
Step 1: download the IK plugin for your ES and rename it ik.zip (when uploading, upload a folder named ik)
Step 2: upload it to /usr/local/elasticsearch-6.4.3/plugins
Step 3: restart Elasticsearch

curl -XGET es:9200/_analyze -H 'Content-Type: application/json' -d '{ "analyzer": "ik_smart", "text": "学习java和spring" }'

Custom dictionaries are supported.

Using IK when defining a template:
curl -XPUT es:9200/_template/a-template -H 'Content-Type: application/json' -d '
{
  "template": "index-pattern-*",
  "settings": {
    "number_of_shards": 3
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "long"
      },
      "name": {
        "type": "text",
        "store": false,
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_max_word"
      }
    }
  }
}'

ILM (index lifecycle management)

hot: The index is actively being written to.
warm: The index is generally not being written to, but is still queried.
cold: The index is no longer being updated and is seldom queried. The information still needs to be searchable, but it's okay if those queries are slower.
delete: The index is no longer needed and can safely be deleted.

Hot

  • Set Priority
  • Unfollow
  • Rollover

Warm

  • Set Priority
  • Unfollow
  • Read-Only
  • Allocate
  • Shrink
  • Force Merge

Cold

  • Set Priority
  • Unfollow
  • Allocate
  • Freeze

Delete

  • Delete

  • Define a delete-phase policy (delete after 10 days)

curl -X PUT "http://localhost:9200/_ilm/policy/my_policy?pretty" -H 'Content-Type: application/json' -d'
{
  "policy": {                       
    "phases": {
      "delete": {
        "min_age": "10d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
'
  • Show all policies
    curl -X GET "http://localhost:9200/_ilm/policy"
  • Attach the policy to an index template
    curl -X PUT "http://localhost:9200/_template/my_template?pretty" -H 'Content-Type: application/json' -d'
    {
    "index_patterns": ["my-index-*"],
    "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 1,
      "index.lifecycle.name": "my_policy",
      "index.lifecycle.indexing_complete": "true"
    }
    }
    '
  • Apply the policy to multiple existing indices
    curl -X PUT "http://localhost:9200/my-index2-202008*/_settings?pretty" -H 'Content-Type: application/json' -d'
    {
    "index": {
      "lifecycle": {
        "name": "my_policy"
      }
    }
    }
    '
  • Show which lifecycle phase each index is in
    curl -X GET "http://localhost:9200/my-index-*/_ilm/explain?pretty"
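The my_policy above deletes indices 10 days after they enter the lifecycle. For dated index names like those used elsewhere in these notes, the effect can be sanity-checked offline (the helper and name pattern are hypothetical, not any ES API):

```python
# Given index names ending in a YYYYMMDD suffix, list those older than min_age.
from datetime import datetime, timedelta

def expired(index_names, today, min_age_days=10):
    cutoff = today - timedelta(days=min_age_days)
    out = []
    for name in index_names:
        date = datetime.strptime(name.rsplit("-", 1)[-1], "%Y%m%d")
        if date < cutoff:
            out.append(name)
    return out

print(expired(["my-index-20200801", "my-index-20200820"],
              today=datetime(2020, 8, 21)))  # -> ['my-index-20200801']
```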

Chinese word segmentation setup

Caveats

  • ES index names must not contain uppercase letters, e.g. Invalid index name [testHello_20191101_e20191101], must be lowercase (this error appeared when configuring the ES index in logstash)
  • Do not start ES as root (can not run elasticsearch as root: https://blog.csdn.net/showhilllee/article/details/53404042)
  • Map es in the hosts file, e.g. 192.168.0.115 es. It works like a variable: commands such as curl es:9200 can then be copied to any machine unchanged, instead of hard-coding curl 192.168.0.115:9200
  • Documents in the same index are not required to share the same structure (schema), but keeping it uniform helps search efficiency.
  • Query syntax: https://www.elastic.co/guide/en/elasticsearch/reference/6.2/query-dsl.html

Problems

  • Problem: max file descriptors [4096] for elasticsearch process is too low, increase to at least [65536]
  • Fix: su - root, edit /etc/security/limits.conf, add the lines "* soft nofile 65536" and "* hard nofile 131072", then reboot
  • Problem: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
  • Fix: su - root, edit /etc/sysctl.conf, add vm.max_map_count=655360, then run sysctl -p
    • echo vm.max_map_count=655360 >> /etc/sysctl.conf && sysctl -p
  • Problem: curl requests fail with Invalid UTF-8 middle byte 0xc5\n at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@4d3551af;
  • Fix: send the request with Postman instead
  • Problem: inserting data into a dynamically named index via restTemplate
  • Fix:
    @Autowired
    private ElasticsearchTemplate elasticsearchTemplate;

    @Value("${spring.data.elasticsearch.user-and-password}")
    private String userAndPassword;

    @Autowired
    private RestTemplate restTemplate;

    @Test
    public void testQueryByRestTemplate() {
        String indexName = makeIndexExist();
        Map<String, Object> varsRequest = new HashMap<String, Object>();
        Map<String, Object> queryContent = new HashMap<String, Object>();
        Map<String, Object> matchContent = new HashMap<String, Object>();
        queryContent.put("match_all", matchContent);
        varsRequest.put("query", queryContent);
        String[] _userAndPassword = StringUtils.split(userAndPassword, ":");
        String auth = Base64.encodeBase64String((_userAndPassword[0] + ":" + _userAndPassword[1]).getBytes());
//        restTemplate.headForHeaders("http://es:9200","Authorization", "Basic " + auth);
        HttpHeaders httpHeaders = new HttpHeaders();
        httpHeaders.setContentType(MediaType.parseMediaType("application/json; charset=UTF-8"));
        httpHeaders.add("Authorization", "Basic " + auth);
        ResponseEntity<String> exchange = restTemplate.exchange("http://192.168.0.204:9200/"+indexName+"/_search",
                HttpMethod.GET, new HttpEntity<Map<String, Object>>(varsRequest, httpHeaders), String.class);
        System.out.println(exchange.getBody());
    }

    @Test
    public void testSaveEntityByRestTemplate() {
        String mobile = "12x4128xxx";
        String eventName = "pageview";
        for (int i = 0; i < 10; i++) {
            Long eventTime = System.currentTimeMillis();
            testSaveByRestTemplate(mobile + i, eventName, eventTime);
        }
    }

    public void testSaveByRestTemplate(String mobile, String eventName, Long eventTime) {
        String indexName = makeIndexExist();
        String[] _userAndPassword = StringUtils.split(userAndPassword, ":");
        String auth = Base64.encodeBase64String((_userAndPassword[0] + ":" + _userAndPassword[1]).getBytes());
        HttpHeaders httpHeaders = new HttpHeaders();
        httpHeaders.setContentType(MediaType.parseMediaType("application/json; charset=UTF-8"));
        httpHeaders.add("Authorization", "Basic " + auth);

//        String mobile = "12x4128xxxx";
//        String eventName = "pageview";
        long finalEventTime = System.currentTimeMillis();
        if (eventTime != null) {
            finalEventTime = eventTime;
        }

        Map<String, String> map = new HashMap<>();
        map.put("eventName", eventName);
        map.put("eventTime", String.valueOf(finalEventTime));
        map.put("siteId", "dk8vUOhCciXNooRM");
        map.put("sessionId", "1ba6bd55524d87ec");
        map.put("deviceId", "a0603f6d56ebdb10");
        map.put("userId", "");
        map.put("ip", "180.110.126.56");
        map.put("userAgent", "\"Mozilla/5.0 (Linux; Android 8.1.0; PBCT10 Build/OPM1.171019.011; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/66.0.3359.126 MQQBrowser/6.2 TBS/044704 Mobile Safari/537.36--[{\\x22clienttype\\x22:\\x222\\x22,\\x22appName\\x22:\\x22lyf\\x22,\\x22company\\x22:\\x22ody\\x22,\\x22sessionId\\x22:\\x22862352044317453\\x22,\\x22deviceid\\x22:\\x22862352044317453\\x22,\\x22version\\x22:\\x226.0.80\\x22,\\x22ut\\x22:\\x222d259a4c20a2fde9c75f400ebb0fd96485\\x22}]--\"");
        map.put("uaName", "Chrome Mobile WebView");
        map.put("uaMajor", "66");
        map.put("resolution", "360x780");
        map.put("language", "zh-CN");
        map.put("netType", "4G");
        map.put("country", "中国");
        map.put("plugin", "cookie:1");
        map.put("continentCode", "AS");
        map.put("region", "江苏省");
        map.put("city", "南京");
        map.put("lat", "32.0617");
        map.put("lgt", "118.7778");
        map.put("url", "http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs-rbf/HDFSRouterFederation.html");
        map.put("title", "Apache Hadoop 2.9.2 &amp;#x2013; HDFS Router-based Federation");
        map.put("referer", "http://taobao.com");
        map.put("eventBody", "");
        map.put("mobileNo", mobile);

        HttpEntity<String> entityPut = new HttpEntity<>(JsonUtils.toJson(map), httpHeaders);
        ResponseEntity<String> exchange1 = restTemplate.exchange("http://192.168.0.204:9200/"+indexName+"/doc", HttpMethod.POST, entityPut, String.class);
        System.out.println(exchange1.getBody());
    }

    public String makeIndexExist() {
        String indexName = "event_test" + DateUtils.formatDate(DateUtils.getToday(), "yyyyMMdd");
        if (!elasticsearchTemplate.indexExists(indexName)) {
            elasticsearchTemplate.createIndex(indexName);
        }

        return indexName;
    }
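The Java test above authenticates by base64-encoding "user:password" into a Basic auth header. The same header can be reproduced in a few lines when debugging with scripts (credentials below are placeholders):

```python
# Build the HTTP Basic auth header exactly as the Java code does:
# base64("user:password") prefixed with "Basic ".
import base64

def basic_auth_header(user, password):
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return {"Authorization": "Basic " + token}

print(basic_auth_header("elastic", "changeme"))
# -> {'Authorization': 'Basic ZWxhc3RpYzpjaGFuZ2VtZQ=='}
```

Equivalently, curl's --user flag (used earlier in these notes) produces this header automatically.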

Import and export

Using elasticdump

# install
npm install elasticdump -g
# usage
elasticdump

Export examples

# Format: elasticdump --input {protocol}://{host}:{port}/{index} --output ./test_index.json
# Example: export the test_index index from ES
# Export the index's mapping
$ elasticdump --input http://192.168.56.104:9200/test_index --output ./test_index_mapping.json --type=mapping
# Export all documents in the index
$ elasticdump --input http://192.168.56.104:9200/test_index --output ./test_index.json --type=data

Import examples

# Create the index
$ curl -XPUT http://192.168.56.104:9200/test_index
# Importing the mapping, so set --type to mapping
$ elasticdump --input ./test_index_mapping.json --output http://192.168.56.105:9200/ --type=mapping
# Importing the data (the actual documents), so set --type to data
$ elasticdump --input ./test_index.json --output http://192.168.56.105:9200/ --type=data
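The data file elasticdump produces is line-delimited JSON, one document per line, which makes offline inspection easy. A sketch that filters such a dump (the field names and sample content are assumptions for the demo, not taken from a real export):

```python
# Filter a line-delimited JSON dump: keep ids of docs whose message contains "new".
import io
import json

dump = io.StringIO(
    '{"_id": "0", "_source": {"message": "trying out Elasticsearch"}}\n'
    '{"_id": "1", "_source": {"message": "something new"}}\n'
)

hits = [json.loads(line) for line in dump if line.strip()]
matches = [d["_id"] for d in hits if "new" in d["_source"]["message"]]
print(matches)  # -> ['1']
```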

Errors:

CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory

export NODE_OPTIONS="--max-old-space-size=8192"

elasticdump via docker

# Pull the image
$ docker pull taskrabbit/elasticsearch-dump
# Example: export data to the local machine through the image
# Create a directory to hold the exported data
$ mkdir -p /root/data
# Map the path into the container and run the export (mapping)
$ docker run --rm -ti -v /root/data:/tmp taskrabbit/elasticsearch-dump \
  --input=http://production.es.com:9200/my_index \
  --output=/tmp/my_index_mapping.json \
  --type=mapping
# Export the data
$ docker run --rm -ti -v /root/data:/tmp taskrabbit/elasticsearch-dump \
  --input=http://192.168.56.104:9200/test_index \
  --output=/tmp/elasticdump_export.json \
  --type=data
# -----------------------------------------------------------------------------
# ES -> ES migration example
$ docker run --rm -ti taskrabbit/elasticsearch-dump \
  --input=http://production.es.com:9200/my_index \
  --output=http://staging.es.com:9200/my_index \
  --type=mapping
$ docker run --rm -ti taskrabbit/elasticsearch-dump \
  --input=http://production.es.com:9200/my_index \
  --output=http://staging.es.com:9200/my_index \
  --type=data

Problems

  • NoNodeAvailableException[None of the configured nodes are available:
    • Check that cluster.name in config/elasticsearch.yml matches the one used in code, and set
      network.host: 0.0.0.0
    • In the Java config, cluster-nodes must point at es:9300 (the transport port), not 9200
    • Version mismatch: spring-boot 2.1.6 fails against es7.4 but works with es6.7.2
Author: 张三  Created: 2026-03-12 11:56
Last edited: 张三  Updated: 2026-03-12 11:57