Goals

  • What is it? A distributed, scalable, real-time search and analytics engine
  • What problems does it solve? Distribution, full-text search, data analytics
  • Bottlenecks and limits: ???
  • Tuning and configuration: ???
  • Installation and startup:
    • Single node
    • Cluster
  • CRUD via curl, focusing on query_string
  • CRUD via Java
  • Basic operations with the elasticsearch-head plugin
  • Import and export
    • elasticdump
    • elasticdump via docker

Concepts

  • Node and Cluster: Elastic is essentially a distributed database in which multiple servers cooperate, and each server can run several Elastic instances. A single Elastic instance is a node; a group of nodes forms a cluster. List the nodes with curl -XGET "es:9200/_cat/nodes?v"
  • index ==> roughly a database in MySQL. List all indices with curl -XGET "es:9200/_cat/indices?v"
  • type ==> roughly a table in MySQL; holds JSON documents
  • document ==> roughly a row in a MySQL table
  • field ==> roughly a column in MySQL, i.e. a single attribute
  • inverted index ???
  • shard: a primary shard, where the data is stored
  • replica: replicas are used to increase search performance and for fail-over.
    https://stackoverflow.com/questions/15694724/shards-and-replicas-in-elasticsearch
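The inverted index entry above is still marked ???; the core idea can be shown with a toy sketch (illustrative only — the real Lucene structure also stores positions, frequencies, and compressed on-disk segments):

```python
# Toy inverted index: maps each term to the set of document ids containing it.
from collections import defaultdict

docs = {
    0: "trying out Elasticsearch",
    1: "Elasticsearch is a search engine",
}

index = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.lower().split():
        index[term].add(doc_id)

# A term lookup is now a direct dictionary hit instead of a scan over all docs.
print(sorted(index["elasticsearch"]))  # -> [0, 1]
```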

Shard tuning

How do I view shard distribution? curl -XGET "es:9200/_cat/shards?v"

Keep the number of nodes greater than or equal to the number of shards.
If an index holds 300 GB of data split into 10 (primary) shards, each shard holds 30 GB.

Setting the per-node shard limit

By default an index has 5 primary shards, each with 1 replica (the ES 6 default; ES 7 defaults to 1 primary shard).

Total shards in a cluster =
  index 1 primaries + index 1 replicas +
  index 2 primaries + index 2 replicas +
  index 3 primaries + index 3 replicas + …

Raising the per-node shard limit lets the cluster hold more indices.

Elasticsearch 7 allows only 1,000 shards per node by default; ours are all in use, so no new index can be created.
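The arithmetic above can be sketched quickly (the 1,000 default is Elasticsearch 7's; the node and shard counts below are made-up example numbers):

```python
# Each index costs primaries * (1 + replicas) shards; the cluster-wide ceiling
# is max_shards_per_node * number_of_data_nodes.
def total_shards(indices):
    return sum(pri * (1 + rep) for pri, rep in indices)

def creation_allowed(open_shards, pri, rep, data_nodes, max_per_node=1000):
    return open_shards + pri * (1 + rep) <= max_per_node * data_nodes

# Three indices at the ES 6 defaults (5 primaries, 1 replica) cost 30 shards:
print(total_shards([(5, 1), (5, 1), (5, 1)]))  # -> 30
# A 3-node cluster with 2995 open shards cannot fit one more default index:
print(creation_allowed(2995, 5, 1, 3))  # -> False
```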

View the per-node shard limit

curl es:9200/_cluster/settings?pretty

Set the limit temporarily (transient)

curl -XPUT -H "Content-Type:application/json" es:9200/_cluster/settings -d '{ "transient": { "cluster.max_shards_per_node": 10000 } }'

Set the limit permanently (persistent)

curl -XPUT -H "Content-Type:application/json" es:9200/_cluster/settings -d '{ "persistent": { "cluster.max_shards_per_node": 10000 } }'

The change takes effect immediately; no cluster restart is required.

Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-update-settings.html

Installation

  • Single node
    • Download the tar.gz package
    • Use docker or docker-compose

Operations

  • Creating and deleting an index

    • Create: curl -XPUT "es:9200/index_name"
    • Delete: curl -XDELETE "es:9200/index_name"
  • CRUD on documents

    • Add a document: curl -XPUT 'es:9200/index_name/sweet/0?pretty' -H 'Content-Type: application/json' -d ' { "user" : "kimchy", "post_date" : "2009-11-15T14:12:12", "message" : "trying out Elasticsearch" } '
    • Fetch a single document
      • Fetch: curl -XGET 'es:9200/index_name/sweet/0?pretty'
      • Existence check, no body returned [may not work]: curl -XHEAD 'es:9200/index_name/sweet/0?pretty'
      • Existence check, no body returned [works]: curl -I 'es:9200/index_name/sweet/0?pretty'
      • Update: curl -XPUT 'es:9200/index_name/sweet/0?pretty' -H 'Content-Type: application/json' -d ' { "user" : "kimchy-new", "post_date" : "2009-11-15T14:12:12", "message" : "trying out Elasticsearch new " } '
    • Delete a document: curl -XDELETE 'es:9200/index_name/sweet/0?pretty'
    • Query documents
      • All documents, no body: curl -XGET 'es:9200/index_name/sweet/_search?pretty'
      • All documents, with a query body:
        curl -XGET 'es:9200/index_name/sweet/_search?pretty' -H 'Content-Type: application/json' -d '
        {
          "query": {
            "match_all": {}
          }
        }'
      • Search the message field (method 1):
        curl -XGET 'es:9200/index_name/sweet/_search?pretty' -H 'Content-Type: application/json' -d '
        {
          "query": {
            "match": {
              "message": "new"
            }
          }
        }'
      • Search the message field (method 2): curl -XGET 'es:9200/index_name/sweet/_search?pretty=true&q=message:new'
      • Paged search on the message field ("from" is the starting offset, default 0; "size" is how many hits to return):
        curl -XGET 'es:9200/index_name/sweet/_search?pretty' -H 'Content-Type: application/json' -d '
        {
          "query": {
            "match": {
              "message": "new"
            }
          },
          "from": 1,
          "size": 1
        }'
      • Search with query_string:
        curl -XGET 'es:9200/index_name/sweet/_search?pretty' -H 'Content-Type: application/json' -d '
        {
          "query": {
            "query_string": {
              "query": "((message:new) OR (message:a)) AND (id:(>=0 AND <20))"
            }
          },
          "from": 1,
          "size": 1
        }'
      • Access once a password is configured: curl -XGET --user USER:PASS es:9200/index_name
      • Return only selected fields: curl -H 'Content-Type: application/json' -XGET es:9200/index_name/doc/_search?pretty -d '{ "query": { "query_string": { "query" : "(message:new)" }}, "_source": ["message", "id"] , "from": 0, "size": 10 } '
  • View an index's creation time: https://www.elastic.co/guide/en/elasticsearch/reference/current/cat.html
    List every available column: curl 'h102:9200/_cat/indices?help'
    curl 'h102:9200/_cat/indices?format=json&pretty&h=health,status,index,pri,rep,creation.date.string'
    curl 'nj201:9200/_cat/indices?format=json&pretty&h=health,status,index,pri,rep,creation.date.string&s=creation.date.string:asc'
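Hand-writing these JSON bodies invites quoting mistakes (the smart-quote breakage above is typical); one option is to build the body programmatically and pipe it to curl. A sketch, reusing the query_string example from above:

```python
# Build the _search request body and emit it as clean JSON for curl's -d @-.
import json

body = {
    "query": {
        "query_string": {
            "query": "((message:new) OR (message:a)) AND (id:(>=0 AND <20))"
        }
    },
    "from": 0,   # starting offset, defaults to 0
    "size": 10,  # number of hits to return
}

print(json.dumps(body))
```

Usage: `python build_query.py | curl -XGET es:9200/index_name/_search?pretty -H 'Content-Type: application/json' -d @-` (script name is hypothetical).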

Common commands

View the tokenization (term vectors) of one field of a document in an index:
curl -XGET es:9200/index_a/_doc/100/_termvectors?fields=remark

List indices: curl --user elastic:???? es:9200/_cat/indices
View an index mapping: curl -XGET "es:9200/index_name/_mapping?pretty"

List templates: /_cat/templates
View one template: /_template/tpl-xxx

What is the cluster busy with? /_cat/tasks?v

Analyzers

The default/standard analyzer:

curl -XGET es:9200/_analyze -H 'Content-Type: application/json' -d '{ "analyzer": "standard", "text": "学习java和spring" }'

The IK analyzer plugin (https://www.cnblogs.com/soft2018/p/10203330.html)

Download: https://github.com/medcl/elasticsearch-analysis-ik/releases
Note: the IK plugin version must exactly match the installed ES version.
Step 1: download the IK plugin for your ES and rename it ik.zip (when uploading, upload a folder named ik)
Step 2: upload it to /usr/local/elasticsearch-6.4.3/plugins
Step 3: restart Elasticsearch

curl -XGET es:9200/_analyze -H 'Content-Type: application/json' -d '{ "analyzer": "ik_smart", "text": "学习java和spring" }'

Custom dictionaries are supported.

Using IK when defining a template:
curl -XPUT es:9200/_template/a-template -H 'Content-Type: application/json' -d '
{
  "template": "index-pattern-*",
  "settings": {
    "number_of_shards": 3
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "long"
      },
      "name": {
        "type": "text",
        "store": false,
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_max_word"
      }
    }
  }
}'

ILM (index lifecycle management)

hot: The index is actively being written to.
warm: The index is generally not being written to, but is still queried.
cold: The index is no longer being updated and is seldom queried. The information still needs to be searchable, but it's okay if those queries are slower.
delete: The index is no longer needed and can safely be deleted.

Hot

  • Set Priority
  • Unfollow
  • Rollover

Warm

  • Set Priority
  • Unfollow
  • Read-Only
  • Allocate
  • Shrink
  • Force Merge

Cold

  • Set Priority
  • Unfollow
  • Allocate
  • Freeze

Delete

  • Delete

  • Define a delete-phase policy (delete after 10 days)

curl -X PUT "http://localhost:9200/_ilm/policy/my_policy?pretty" -H 'Content-Type: application/json' -d'
{
  "policy": {                       
    "phases": {
      "delete": {
        "min_age": "10d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
'
  • Show all policies
    curl -X GET "http://localhost:9200/_ilm/policy"
  • Attach the policy to an index template
    curl -X PUT "http://localhost:9200/_template/my_template?pretty" -H 'Content-Type: application/json' -d'
    {
    "index_patterns": ["my-index-*"],
    "settings": {
      "number_of_shards": 1,
      "number_of_replicas": 1,
      "index.lifecycle.name": "my_policy",
      "index.lifecycle.indexing_complete": "true"
    }
    }
    '
  • Apply the policy to multiple existing indices
    curl -X PUT "http://localhost:9200/my-index2-202008*/_settings?pretty" -H 'Content-Type: application/json' -d'
    {
    "index": {
      "lifecycle": {
        "name": "my_policy"
      }
    }
    }
    '
  • Show which lifecycle phase each index is in
    curl -X GET "http://localhost:9200/my-index-*/_ilm/explain?pretty"
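The my_policy above deletes indices 10 days after they enter the lifecycle. For dated index names like those used elsewhere in these notes, the effect can be sanity-checked offline (the helper and name pattern are hypothetical, not any ES API):

```python
# Given index names ending in a YYYYMMDD suffix, list those older than min_age.
from datetime import datetime, timedelta

def expired(index_names, today, min_age_days=10):
    cutoff = today - timedelta(days=min_age_days)
    out = []
    for name in index_names:
        date = datetime.strptime(name.rsplit("-", 1)[-1], "%Y%m%d")
        if date < cutoff:
            out.append(name)
    return out

print(expired(["my-index-20200801", "my-index-20200820"],
              today=datetime(2020, 8, 21)))  # -> ['my-index-20200801']
```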

Chinese word segmentation setup

Caveats

  • ES index names must not contain uppercase letters, e.g. Invalid index name [testHello_20191101_e20191101], must be lowercase (this error appeared when configuring the ES index in logstash)
  • Do not start ES as root (can not run elasticsearch as root: https://blog.csdn.net/showhilllee/article/details/53404042)
  • Map es in the hosts file, e.g. 192.168.0.115 es. It works like a variable: commands such as curl es:9200 can then be copied to any machine unchanged, instead of hard-coding curl 192.168.0.115:9200
  • Documents in the same index are not required to share the same structure (schema), but keeping it uniform helps search efficiency.
  • Query syntax: https://www.elastic.co/guide/en/elasticsearch/reference/6.2/query-dsl.html

Problems

  • Problem: max file descriptors [4096] for elasticsearch process is too low, increase to at least [65536]
  • Fix: su - root, edit /etc/security/limits.conf, add the lines "* soft nofile 65536" and "* hard nofile 131072", then reboot
  • Problem: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
  • Fix: su - root, edit /etc/sysctl.conf, add vm.max_map_count=655360, then run sysctl -p
    • echo vm.max_map_count=655360 >> /etc/sysctl.conf && sysctl -p
  • Problem: curl requests fail with Invalid UTF-8 middle byte 0xc5\n at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@4d3551af;
  • Fix: send the request with Postman instead
  • Problem: inserting data into a dynamically named index via restTemplate
  • Fix:
    @Autowired
    private ElasticsearchTemplate elasticsearchTemplate;

    @Value("${spring.data.elasticsearch.user-and-password}")
    private String userAndPassword;

    @Autowired
    private RestTemplate restTemplate;

    @Test
    public void testQueryByRestTemplate() {
        String indexName = makeIndexExist();
        Map<String, Object> varsRequest = new HashMap<String, Object>();
        Map<String, Object> queryContent = new HashMap<String, Object>();
        Map<String, Object> matchContent = new HashMap<String, Object>();
        queryContent.put("match_all", matchContent);
        varsRequest.put("query", queryContent);
        String[] _userAndPassword = StringUtils.split(userAndPassword, ":");
        String auth = Base64.encodeBase64String((_userAndPassword[0] + ":" + _userAndPassword[1]).getBytes());
//        restTemplate.headForHeaders("http://es:9200","Authorization", "Basic " + auth);
        HttpHeaders httpHeaders = new HttpHeaders();
        httpHeaders.setContentType(MediaType.parseMediaType("application/json; charset=UTF-8"));
        httpHeaders.add("Authorization", "Basic " + auth);
        ResponseEntity<String> exchange = restTemplate.exchange("http://192.168.0.204:9200/"+indexName+"/_search",
                HttpMethod.GET, new HttpEntity<Map<String, Object>>(varsRequest, httpHeaders), String.class);
        System.out.println(exchange.getBody());
    }

    @Test
    public void testSaveEntityByRestTemplate() {
        String mobile = "12x4128xxx";
        String eventName = "pageview";
        for (int i = 0; i < 10; i++) {
            Long eventTime = System.currentTimeMillis();
            testSaveByRestTemplate(mobile + i, eventName, eventTime);
        }
    }

    public void testSaveByRestTemplate(String mobile, String eventName, Long eventTime) {
        String indexName = makeIndexExist();
        String[] _userAndPassword = StringUtils.split(userAndPassword, ":");
        String auth = Base64.encodeBase64String((_userAndPassword[0] + ":" + _userAndPassword[1]).getBytes());
        HttpHeaders httpHeaders = new HttpHeaders();
        httpHeaders.setContentType(MediaType.parseMediaType("application/json; charset=UTF-8"));
        httpHeaders.add("Authorization", "Basic " + auth);

//        String mobile = "12x4128xxxx";
//        String eventName = "pageview";
        long finalEventTime = System.currentTimeMillis();
        if (eventTime != null) {
            finalEventTime = eventTime;
        }

        Map<String, String> map = new HashMap<>();
        map.put("eventName", eventName);
        map.put("eventTime", String.valueOf(finalEventTime));
        map.put("siteId", "dk8vUOhCciXNooRM");
        map.put("sessionId", "1ba6bd55524d87ec");
        map.put("deviceId", "a0603f6d56ebdb10");
        map.put("userId", "");
        map.put("ip", "180.110.126.56");
        map.put("userAgent", "\"Mozilla/5.0 (Linux; Android 8.1.0; PBCT10 Build/OPM1.171019.011; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/66.0.3359.126 MQQBrowser/6.2 TBS/044704 Mobile Safari/537.36--[{\\x22clienttype\\x22:\\x222\\x22,\\x22appName\\x22:\\x22lyf\\x22,\\x22company\\x22:\\x22ody\\x22,\\x22sessionId\\x22:\\x22862352044317453\\x22,\\x22deviceid\\x22:\\x22862352044317453\\x22,\\x22version\\x22:\\x226.0.80\\x22,\\x22ut\\x22:\\x222d259a4c20a2fde9c75f400ebb0fd96485\\x22}]--\"");
        map.put("uaName", "Chrome Mobile WebView");
        map.put("uaMajor", "66");
        map.put("resolution", "360x780");
        map.put("language", "zh-CN");
        map.put("netType", "4G");
        map.put("country", "中国");
        map.put("plugin", "cookie:1");
        map.put("continentCode", "AS");
        map.put("region", "江苏省");
        map.put("city", "南京");
        map.put("lat", "32.0617");
        map.put("lgt", "118.7778");
        map.put("url", "http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs-rbf/HDFSRouterFederation.html");
        map.put("title", "Apache Hadoop 2.9.2 &amp;#x2013; HDFS Router-based Federation");
        map.put("referer", "http://taobao.com");
        map.put("eventBody", "");
        map.put("mobileNo", mobile);

        HttpEntity<String> entityPut = new HttpEntity<>(JsonUtils.toJson(map), httpHeaders);
        ResponseEntity<String> exchange1 = restTemplate.exchange("http://192.168.0.204:9200/"+indexName+"/doc", HttpMethod.POST, entityPut, String.class);
        System.out.println(exchange1.getBody());
    }

    public String makeIndexExist() {
        String indexName = "event_test" + DateUtils.formatDate(DateUtils.getToday(), "yyyyMMdd");
        if (!elasticsearchTemplate.indexExists(indexName)) {
            elasticsearchTemplate.createIndex(indexName);
        }

        return indexName;
    }
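The Java test above authenticates by base64-encoding "user:password" into a Basic auth header. The same header can be reproduced in a few lines when debugging with scripts (credentials below are placeholders):

```python
# Build the HTTP Basic auth header exactly as the Java code does:
# base64("user:password") prefixed with "Basic ".
import base64

def basic_auth_header(user, password):
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return {"Authorization": "Basic " + token}

print(basic_auth_header("elastic", "changeme"))
# -> {'Authorization': 'Basic ZWxhc3RpYzpjaGFuZ2VtZQ=='}
```

Equivalently, curl's --user flag (used earlier in these notes) produces this header automatically.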

Import and export

Using elasticdump

# install
npm install elasticdump -g
# usage
elasticdump

Export examples

# Format: elasticdump --input {protocol}://{host}:{port}/{index} --output ./test_index.json
# Example: export the test_index index from ES
# Export the index's mapping
$ elasticdump --input http://192.168.56.104:9200/test_index --output ./test_index_mapping.json --type=mapping
# Export all documents in the index
$ elasticdump --input http://192.168.56.104:9200/test_index --output ./test_index.json --type=data

Import examples

# Create the index
$ curl -XPUT http://192.168.56.104:9200/test_index
# Importing the mapping, so set --type to mapping
$ elasticdump --input ./test_index_mapping.json --output http://192.168.56.105:9200/ --type=mapping
# Importing the data (the actual documents), so set --type to data
$ elasticdump --input ./test_index.json --output http://192.168.56.105:9200/ --type=data
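The data file elasticdump produces is line-delimited JSON, one document per line, which makes offline inspection easy. A sketch that filters such a dump (the field names and sample content are assumptions for the demo, not taken from a real export):

```python
# Filter a line-delimited JSON dump: keep ids of docs whose message contains "new".
import io
import json

dump = io.StringIO(
    '{"_id": "0", "_source": {"message": "trying out Elasticsearch"}}\n'
    '{"_id": "1", "_source": {"message": "something new"}}\n'
)

hits = [json.loads(line) for line in dump if line.strip()]
matches = [d["_id"] for d in hits if "new" in d["_source"]["message"]]
print(matches)  # -> ['1']
```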

Errors:

CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory

export NODE_OPTIONS="--max-old-space-size=8192"

elasticdump via docker

# Pull the image
$ docker pull taskrabbit/elasticsearch-dump
# Example: export data to the local machine through the image
# Create a directory to hold the exported data
$ mkdir -p /root/data
# Map the path into the container and run the export (mapping)
$ docker run --rm -ti -v /root/data:/tmp taskrabbit/elasticsearch-dump \
  --input=http://production.es.com:9200/my_index \
  --output=/tmp/my_index_mapping.json \
  --type=mapping
# Export the data
$ docker run --rm -ti -v /root/data:/tmp taskrabbit/elasticsearch-dump \
  --input=http://192.168.56.104:9200/test_index \
  --output=/tmp/elasticdump_export.json \
  --type=data
# -----------------------------------------------------------------------------
# ES -> ES migration example
$ docker run --rm -ti taskrabbit/elasticsearch-dump \
  --input=http://production.es.com:9200/my_index \
  --output=http://staging.es.com:9200/my_index \
  --type=mapping
$ docker run --rm -ti taskrabbit/elasticsearch-dump \
  --input=http://production.es.com:9200/my_index \
  --output=http://staging.es.com:9200/my_index \
  --type=data

Problems

  • NoNodeAvailableException[None of the configured nodes are available:
    • Check that cluster.name in config/elasticsearch.yml matches the one used in code, and set
      network.host: 0.0.0.0
    • In the Java config, cluster-nodes must point at es:9300 (the transport port), not 9200
    • Version mismatch: spring-boot 2.1.6 fails against es7.4 but works with es6.7.2
Author: 张三  Created: 2026-03-12 11:56
Last edited: 张三  Updated: 2026-03-12 11:57