Goals
- What is it? A distributed, scalable, real-time search and analytics engine.
- What problems does it solve? Distributed operation, full-text search, data analytics
- Bottlenecks and limitations: ???
- Optimization and parameter tuning: ???
- Installation and startup:
- Single node
- Cluster
- CRUD via curl, focusing on query_string
- CRUD via Java
- Basic operations with the elasticsearch-head plugin
- Import and export
- elasticdump
- elasticdump via Docker
Concepts
- Node and Cluster: Elasticsearch is essentially a distributed database. Multiple servers can work together, and each server can run multiple Elasticsearch instances. A single instance is called a node; a group of nodes forms a cluster. List the nodes with
curl -XGET "es:9200/_cat/nodes?v"
- index: comparable to a database in MySQL. List all indices with
curl -XGET "es:9200/_cat/indices?v"
- type: comparable to a table in MySQL; stores JSON documents
- document: comparable to a row in a MySQL table
- field: comparable to a column in MySQL, i.e. a single attribute
- inverted index: ???
- shard: primary shard, where the data is stored
- replica: used to increase search performance and for fail-over.
https://stackoverflow.com/questions/15694724/shards-and-replicas-in-elasticsearch
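The inverted index left as ??? above is the core data structure behind full-text search: instead of mapping each document to its words, it maps each term to the list of documents that contain it. A minimal toy sketch in shell (illustration only, not an Elasticsearch API):

```shell
# Build a toy inverted index with awk.
# Input lines: "<doc-id> <term> <term> ...".
printf '1 hello world\n2 hello es\n' |
awk '{ for (i = 2; i <= NF; i++) postings[$i] = postings[$i] " " $1 }
     END { for (t in postings) print t ":" postings[t] }' | sort
# → es: 2
#   hello: 1 2
#   world: 1
```

Searching for "hello" then only needs the postings list "1 2" instead of scanning every document.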
Optimizing shards
How do you view the shard distribution?
Keep the number of nodes greater than or equal to the number of shards.
An index holding 300 GB of data split into 10 (primary) shards gives 30 GB per shard.
Setting the per-node shard limit
By default an index has 5 primary shards, and each primary shard has 1 replica.
Total shards in a cluster =
primary shards of index 1 + replicas of index 1 +
primary shards of index 2 + replicas of index 2 +
primary shards of index 3 + replicas of index 3 +
…
Raising the per-node shard limit lets the cluster hold more indices.
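Plugging the defaults above into the formula: with 3 indices, each having 5 primary shards and 1 replica per primary, the cluster holds 3 × (5 + 5 × 1) = 30 shards. Checked in shell:

```shell
# per index: primaries + primaries * replicas_per_primary = 5 + 5*1 = 10
# cluster total for 3 such indices:
echo $(( 3 * (5 + 5 * 1) ))   # → 30
```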
Elasticsearch 7 defaults to 1000 shards per node; once they are used up, no new index can be created.
View the current per-node shard limit:
curl es:9200/_cluster/settings?pretty
Set the limit temporarily:
curl -XPUT -H "Content-Type:application/json" es:9200/_cluster/settings -d '{ "transient": { "cluster": { "max_shards_per_node": 10000 } } }'
Set the limit permanently:
curl -XPUT -H "Content-Type:application/json" es:9200/_cluster/settings -d '{ "persistent": { "cluster": { "max_shards_per_node": 10000 } } }'
The change takes effect immediately; no cluster restart is required.
Reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-update-settings.html
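The question above — how to view the shard distribution — can be answered with the _cat/shards API, and counting shards per node shows whether they are spread evenly. A sketch with the live curl commented out and replaced by a captured sample so it runs offline (the `es` host alias and node names are assumptions):

```shell
# Live cluster:
#   curl -s "es:9200/_cat/shards?h=index,shard,prirep,node"
# Count shards per node (column 4 of the sample output below):
printf 'idx1 0 p node1\nidx1 0 r node2\nidx2 0 p node1\n' |
awk '{ print $4 }' | sort | uniq -c | sort -rn
```

The highest count appears first, so an overloaded node is immediately visible.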
Installation
- Single node
- Download the tar.gz package
- Use docker or docker-compose
Operations
Creating and deleting an index
- Create
curl -XPUT "es:9200/index_name"
- Delete
curl -XDELETE "es:9200/index_name"
CRUD on documents
- Add a document
curl -XPUT 'es:9200/index_name/sweet/0?pretty' -H 'Content-Type: application/json' -d ' { "user" : "kimchy", "post_date" : "2009-11-15T14:12:12", "message" : "trying out Elasticsearch" } '
- Get a document
curl -XGET 'es:9200/index_name/sweet/0?pretty'
- Check existence without returning the document [may not work]
curl -XHEAD 'es:9200/index_name/sweet/0?pretty'
- Check existence without returning the document [works]
curl -I 'es:9200/index_name/sweet/0?pretty'
- Update a document
curl -XPUT 'es:9200/index_name/sweet/0?pretty' -H 'Content-Type: application/json' -d ' { "user" : "kimchy-new", "post_date" : "2009-11-15T14:12:12", "message" : "trying out Elasticsearch new " } '
- Delete a document
curl -XDELETE 'es:9200/index_name/sweet/0?pretty'
- Query all documents, without conditions
curl -XGET 'es:9200/index_name/sweet/_search?pretty'
- Query all documents, with a request body
curl -XGET 'es:9200/index_name/sweet/_search?pretty' -H 'Content-Type: application/json' -d '
{
  "query": { "match_all": {} }
}
'
- Search the message field (method one)
curl -XGET 'es:9200/index_name/sweet/_search?pretty' -H 'Content-Type: application/json' -d '
{
  "query": { "match": { "message": "new" } }
}
'
- Search the message field (method two)
curl -XGET 'es:9200/index_name/sweet/_search?pretty=true&q=message:new'
- Paginated search on the message field ("from" is the zero-based start offset, default 0; "size" is how many hits to return)
curl -XGET 'es:9200/index_name/sweet/_search?pretty' -H 'Content-Type: application/json' -d '
{
  "query": { "match": { "message": "new" } },
  "from": 1,
  "size": 1
}
'
- Search with query_string
curl -XGET 'es:9200/index_name/sweet/_search?pretty' -H 'Content-Type: application/json' -d '
{
  "query": { "query_string": { "query" : "((message:new) OR (message:a)) AND (id:(>=0 AND <20))" } },
  "from": 1,
  "size": 1
}
'
- Access when Elasticsearch has a password configured
curl -XGET --user USER:PASS es:9200/index_name
- Return only selected fields
curl -H 'Content-Type: application/json' -XGET es:9200/index_name/doc/_search?pretty -d '{ "query": { "query_string": { "query" : "(message:new)" }}, "_source": ["message", "id"], "from": 0, "size": 10 }'
Viewing an index's creation time: https://www.elastic.co/guide/en/elasticsearch/reference/current/cat.html
List the available columns:
curl 'h102:9200/_cat/indices?help'
curl 'h102:9200/_cat/indices?format=json&pretty&h=health,status,index,pri,rep,creation.date.string'
curl 'nj201:9200/_cat/indices?format=json&pretty&h=health,status,index,pri,rep,creation.date.string&s=creation.date.string:asc'
Common commands
View the analyzed terms of one field of one document in an index:
curl -XGET es:9200/index_a/_doc/100/_termvectors?fields=remark
List indices: curl --user elastic:???? es:9200/_cat/indices
View an index's mapping: curl -X GET "es:9200/
List templates: /_cat/templates
View one template: /_template/tpl-xxx
What is the cluster busy with? /_cat/tasks?v
Analyzers
Default/standard analyzer:
curl -XGET es:9200/_analyze -H 'Content-Type: application/json' -d '{ "analyzer": "standard", "text": "学习java和spring" }'
IK analyzer plugin (https://www.cnblogs.com/soft2018/p/10203330.html)
Download: https://github.com/medcl/elasticsearch-analysis-ik/releases
Note: the IK plugin version must exactly match the installed Elasticsearch version
Step 1: download the IK plugin for ES and rename it ik.zip (what gets uploaded is a folder named ik)
Step 2: upload it to /usr/local/elasticsearch-6.4.3/plugins
Step 3: restart Elasticsearch, then test:
curl -XGET es:9200/_analyze -H 'Content-Type: application/json' -d '{ "analyzer": "ik_smart", "text": "学习java和spring" }'
A custom dictionary can be defined
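The custom-dictionary note above can be sketched as follows. The file names follow the IK plugin's README (IKAnalyzer.cfg.xml, ext_dict entry); the dictionary path and words are hypothetical, so verify against your plugin version:

```shell
# Assumed layout: the plugin is unpacked under plugins/ik of the ES home.
mkdir -p plugins/ik/config/custom
# One word per line in the .dic file:
printf '区块链\n云原生\n' > plugins/ik/config/custom/my_words.dic
# Then reference it in plugins/ik/config/IKAnalyzer.cfg.xml:
#   <entry key="ext_dict">custom/my_words.dic</entry>
# and restart Elasticsearch for the new words to take effect.
```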
Using IK when defining a template:
curl -X PUT es:9200/_template/a-template -H 'Content-Type: application/json' -d '
{
  "template": "index-pattern-*",
  "settings": {
    "number_of_shards": 3
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "long"
      },
      "name": {
        "type": "text",
        "store": false,
        "analyzer": "ik_max_word",
        "search_analyzer": "ik_max_word"
      }
    }
  }
}
'
ilm
- phase, see https://www.elastic.co/guide/en/elasticsearch/reference/7.4.2/ilm-policy-definition.html
- action, see https://www.elastic.co/guide/en/elasticsearch/reference/7.4.2/_actions.html
hot: The index is actively being written to
warm: The index is generally not being written to, but is still queried
cold: The index is no longer being updated and is seldom queried. The information still needs to be searchable, but it’s okay if those queries are slower.
delete: The index is no longer needed and can safely be deleted
Hot: Set Priority, Unfollow, Rollover
Warm: Set Priority, Unfollow, Read-Only, Allocate, Shrink, Force Merge
Cold: Set Priority, Unfollow, Allocate, Freeze
Delete: Delete
Define a policy with only a delete phase (delete after 10 days):
curl -X PUT "http://localhost:9200/_ilm/policy/my_policy?pretty" -H 'Content-Type: application/json' -d'
{
"policy": {
"phases": {
"delete": {
"min_age": "10d",
"actions": {
"delete": {}
}
}
}
}
}
'
Show all policies:
curl -X GET "http://localhost:9200/_ilm/policy"
Apply the policy via an index template:
curl -X PUT "http://localhost:9200/_template/my_template?pretty" -H 'Content-Type: application/json' -d' { "index_patterns": ["my-index-*"], "settings": { "number_of_shards": 1, "number_of_replicas": 1, "index.lifecycle.name": "my_policy", "indexing_complete": "true" } } '
Apply it to multiple existing indices:
curl -X PUT "http://localhost:9200/my-index2-202008*/_settings?pretty" -H 'Content-Type: application/json' -d' { "index": { "lifecycle": { "name": "my_policy" } } } '
Show which lifecycle phase each index is in:
curl -X GET "http://localhost:9200/my-index-*/_ilm/explain?pretty"
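The hot-phase Rollover action listed earlier is commonly combined with a delete phase in one policy; a sketch (policy name, thresholds, and host are assumptions, and the delete phase's min_age is counted from rollover):

```shell
curl -X PUT "http://localhost:9200/_ilm/policy/rollover_and_delete?pretty" -H 'Content-Type: application/json' -d'
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": { "max_size": "50gb", "max_age": "30d" }
        }
      },
      "delete": {
        "min_age": "10d",
        "actions": { "delete": {} }
      }
    }
  }
}
'
```

Rollover requires writing through an alias (or a data stream in newer versions) so that a new backing index is created when a threshold is hit.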
Chinese analyzer settings
Notes
- Index names must not contain uppercase letters, e.g. Invalid index name [testHello_20191101_e20191101], must be lowercase (this error came from the es index setting in a logstash config)
- Do not start es as root (can not run elasticsearch as root: https://blog.csdn.net/showhilllee/article/details/53404042)
- Put es in the hosts file, e.g. 192.168.0.115 es. Like defining a variable, the curl commands then work on any machine without editing IPs: curl es:9200 instead of curl 192.168.0.115:9200.
- Documents within one index are not required to share the same structure (schema), but keeping them consistent improves search efficiency.
- Query syntax: https://www.elastic.co/guide/en/elasticsearch/reference/6.2/query-dsl.html
Problems
- Problem: max file descriptors [4096] for elasticsearch process is too low, increase to at least [65536]
- Fix: su - root, vi /etc/security/limits.conf, add
* soft nofile 65536
* hard nofile 131072
then log in again (or reboot)
- Problem: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]
- Fix: su - root, vi /etc/sysctl.conf, add vm.max_map_count=655360, then run sysctl -p. As one command: echo vm.max_map_count=655360 >> /etc/sysctl.conf && sysctl -p
- Problem: curl requests fail with Invalid UTF-8 middle byte 0xc5\n at [Source: org.elasticsearch.common.bytes.BytesReference$MarkSupportingStreamInputWrapper@4d3551af;
- Fix: send the request with postman instead
- Problem: adding data to a dynamically named index via restTemplate
- Fix:
@Autowired
private ElasticsearchTemplate elasticsearchTemplate;
@Value("${spring.data.elasticsearch.user-and-password}")
private String userAndPassword;
@Autowired
private RestTemplate restTemplate;
@Test
public void testQueryByRestTemplate() {
String indexName = makeIndexExist();
Map<String, Object> varsRequest = new HashMap<String, Object>();
Map<String, Object> queryContent = new HashMap<String, Object>();
Map<String, Object> matchContent = new HashMap<String, Object>();
queryContent.put("match_all", matchContent);
varsRequest.put("query", queryContent);
String[] _userAndPassword = StringUtils.split(userAndPassword, ":");
String auth = Base64.encodeBase64String((_userAndPassword[0] + ":" + _userAndPassword[1]).getBytes());
// restTemplate.headForHeaders("http://es:9200","Authorization", "Basic " + auth);
HttpHeaders httpHeaders = new HttpHeaders();
httpHeaders.setContentType(MediaType.parseMediaType("application/json; charset=UTF-8"));
httpHeaders.add("Authorization", "Basic " + auth);
ResponseEntity<String> exchange = restTemplate.exchange("http://192.168.0.204:9200/"+indexName+"/_search",
HttpMethod.GET, new HttpEntity<Map<String, Object>>(varsRequest, httpHeaders), String.class);
System.out.println(exchange.getBody());
}
@Test
public void testSaveEntityByRestTemplate() {
String mobile = "12x4128xxx";
String eventName = "pageview";
for (int i = 0; i < 10; i++) {
Long eventTime = System.currentTimeMillis();
testSaveByRestTemplate(mobile + i, eventName, eventTime);
}
}
public void testSaveByRestTemplate(String mobile, String eventName, Long eventTime) {
String indexName = makeIndexExist();
String[] _userAndPassword = StringUtils.split(userAndPassword, ":");
String auth = Base64.encodeBase64String((_userAndPassword[0] + ":" + _userAndPassword[1]).getBytes());
HttpHeaders httpHeaders = new HttpHeaders();
httpHeaders.setContentType(MediaType.parseMediaType("application/json; charset=UTF-8"));
httpHeaders.add("Authorization", "Basic " + auth);
// String mobile = "12x4128xxxx";
// String eventName = "pageview";
long finalEventTime = System.currentTimeMillis();
if (eventTime != null) {
finalEventTime = eventTime;
}
Map<String, String> map = new HashMap<>();
map.put("eventName", eventName);
map.put("eventTime", String.valueOf(finalEventTime));
map.put("siteId", "dk8vUOhCciXNooRM");
map.put("sessionId", "1ba6bd55524d87ec");
map.put("deviceId", "a0603f6d56ebdb10");
map.put("userId", "");
map.put("ip", "180.110.126.56");
map.put("userAgent", "\"Mozilla/5.0 (Linux; Android 8.1.0; PBCT10 Build/OPM1.171019.011; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/66.0.3359.126 MQQBrowser/6.2 TBS/044704 Mobile Safari/537.36--[{\\x22clienttype\\x22:\\x222\\x22,\\x22appName\\x22:\\x22lyf\\x22,\\x22company\\x22:\\x22ody\\x22,\\x22sessionId\\x22:\\x22862352044317453\\x22,\\x22deviceid\\x22:\\x22862352044317453\\x22,\\x22version\\x22:\\x226.0.80\\x22,\\x22ut\\x22:\\x222d259a4c20a2fde9c75f400ebb0fd96485\\x22}]--\"");
map.put("uaName", "Chrome Mobile WebView");
map.put("uaMajor", "66");
map.put("resolution", "360x780");
map.put("language", "zh-CN");
map.put("netType", "4G");
map.put("country", "中国");
map.put("plugin", "cookie:1");
map.put("continentCode", "AS");
map.put("region", "江苏省");
map.put("city", "南京");
map.put("lat", "32.0617");
map.put("lgt", "118.7778");
map.put("url", "http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs-rbf/HDFSRouterFederation.html");
map.put("title", "Apache Hadoop 2.9.2 &#x2013; HDFS Router-based Federation");
map.put("referer", "http://taobao.com");
map.put("eventBody", "");
map.put("mobileNo", mobile);
HttpEntity<String> entityPut = new HttpEntity<>(JsonUtils.toJson(map), httpHeaders);
ResponseEntity<String> exchange1 = restTemplate.exchange("http://192.168.0.204:9200/"+indexName+"/doc", HttpMethod.POST, entityPut, String.class);
System.out.println(exchange1.getBody());
}
public String makeIndexExist() {
String indexName = "event_test" + DateUtils.formatDate(DateUtils.getToday(), "yyyyMMdd");
if (!elasticsearchTemplate.indexExists(indexName)) {
elasticsearchTemplate.createIndex(indexName);
}
return indexName;
}
Import and export
Using elasticdump
# install
npm install elasticdump -g
# usage
Export examples
# format: elasticdump --input {protocol}://{host}:{port}/{index} --output ./test_index.json
# example: export the test_index index from ES
# export the current index's mapping
$ elasticdump --input http://192.168.56.104:9200/test_index --output ./test_index_mapping.json --type=mapping
# export all the actual data in the current index
$ elasticdump --input http://192.168.56.104:9200/test_index --output ./test_index.json --type=data
Import examples
# create the index
$ curl -XPUT http://192.168.56.104:9200/test_index
# set type to mapping because a mapping is being imported
$ elasticdump --input ./test_index_mapping.json --output http://192.168.56.105:9200/ --type=mapping
# set type to data because actual data is being imported
$ elasticdump --input ./test_index.json --output http://192.168.56.105:9200/ --type=data
Exception:
CALL_AND_RETRY_LAST Allocation failed - JavaScript heap out of memory
export NODE_OPTIONS="--max-old-space-size=8192"
elasticdump via Docker
# pull the image
$ docker pull taskrabbit/elasticsearch-dump
# example: export data to the local machine through the image
# create a folder to hold the exported data
$ mkdir -p /root/data
# map the path and run (export the mapping)
$ docker run --rm -ti -v /root/data:/tmp taskrabbit/elasticsearch-dump \
  --input=http://production.es.com:9200/my_index \
  --output=/tmp/my_index_mapping.json \
  --type=mapping
# export the data
$ docker run --rm -ti -v /root/data:/tmp taskrabbit/elasticsearch-dump \
  --input=http://192.168.56.104:9200/test_index \
  --output=/tmp/elasticdump_export.json \
  --type=data
-----------------------------------------------------------------------------
# ES -> ES migration example
$ docker run --rm -ti taskrabbit/elasticsearch-dump \
  --input=http://production.es.com:9200/my_index \
  --output=http://staging.es.com:9200/my_index \
  --type=mapping
$ docker run --rm -ti taskrabbit/elasticsearch-dump \
  --input=http://production.es.com:9200/my_index \
  --output=http://staging.es.com:9200/my_index \
  --type=data
Problems
- NoNodeAvailableException[None of the configured nodes are available:
- Check that cluster.name in config/elasticsearch.yml matches the one used in the code
- Set network.host: 0.0.0.0
- cluster-nodes in the Java config is es:9300, not 9200
- Version mismatch: spring-boot 2.1.6 fails with es 7.4 but works with es 6.7.2
Last edited by: 张三  Updated: 2026-03-12 11:57