Elasticsearch reports a "Data too large" exception

We found the following exception in the logs of our production ES cluster. The Elasticsearch version is 7.3.2.


[2021-03-16T21:05:10,338][DEBUG][o.e.a.a.c.n.i.TransportNodesInfoAction] [java-d-service-es-200-56-client-1] failed to execute on node [hsF4JzeAQ6mflJRGnJIKzQ]
org.elasticsearch.transport.RemoteTransportException: [data-es-group-online-200-67-2][10.110.200.67:9301][cluster:monitor/nodes/info[n]]
Caused by: org.elasticsearch.common.breaker.CircuitBreakingException: [parent] Data too large, data for [] would be [33093117638/30.8gb], which is larger than the limit of [31621696716/29.4gb], real usage: [33093114144/30.8gb], new bytes reserved: [3494/3.4kb], usages [request=0/0b, fielddata=0/0b, in_flight_requests=3494/3.4kb, accounting=104564949/99.7mb]
    at org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService.checkParentLimit(HierarchyCircuitBreakerService.java:342) ~[elasticsearch-7.3.2.jar:7.3.2]
    at org.elasticsearch.common.breaker.ChildMemoryCircuitBreaker.addEstimateBytesAndMaybeBreak(ChildMemoryCircuitBreaker.java:128) ~[elasticsearch-7.3.2.jar:7.3.2]
    at org.elasticsearch.transport.InboundHandler.handleRequest(InboundHandler.java:173) [elasticsearch-7.3.2.jar:7.3.2]
    at org.elasticsearch.transport.InboundHandler.messageReceived(InboundHandler.java:121) [elasticsearch-7.3.2.jar:7.3.2]
    at org.elasticsearch.transport.InboundHandler.inboundMessage(InboundHandler.java:105) [elasticsearch-7.3.2.jar:7.3.2]
    at org.elasticsearch.transport.TcpTransport.inboundMessage(TcpTransport.java:660) [elasticsearch-7.3.2.jar:7.3.2]
    at org.elasticsearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:62) [transport-netty4-client-7.3.2.jar:7.3.2]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.fireChannelRead(ByteToMessageDecoder.java:323) [netty-codec-4.1.36.Final.jar:4.1.36.Final]
    at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:297) [netty-codec-4.1.36.Final.jar:4.1.36.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
    at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:241) [netty-handler-4.1.36.Final.jar:4.1.36.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:352) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1408) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:374) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:360) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:930) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:163) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:682) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:582) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:536) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:496) [netty-transport-4.1.36.Final.jar:4.1.36.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:906) [netty-common-4.1.36.Final.jar:4.1.36.Final]
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.36.Final.jar:4.1.36.Final]
    at java.lang.Thread.run(Thread.java:835) [?:?]
[2021-03-16T21:05:11,203][INFO ][o.e.x.s.a.AuthenticationServi

After pulling the ES source, we located the class that raises the error, org.elasticsearch.indices.breaker.HierarchyCircuitBreakerService. The relevant code is:


    public void checkParentLimit(long newBytesReserved, String label) throws CircuitBreakingException {
        final MemoryUsage memoryUsed = memoryUsed(newBytesReserved);
        long parentLimit = this.parentSettings.getLimit();
        if (memoryUsed.totalUsage > parentLimit) {
            this.parentTripCount.incrementAndGet();
            final StringBuilder message = new StringBuilder("[parent] Data too large, data for [" + label + "]" +
                    " would be [" + memoryUsed.totalUsage + "/" + new ByteSizeValue(memoryUsed.totalUsage) + "]" +
                    ", which is larger than the limit of [" +
                    parentLimit + "/" + new ByteSizeValue(parentLimit) + "]");
            if (this.trackRealMemoryUsage) {
                final long realUsage = memoryUsed.baseUsage;
                message.append(", real usage: [");
                message.append(realUsage);
                message.append("/");
                message.append(new ByteSizeValue(realUsage));
                message.append("], new bytes reserved: [");
                message.append(newBytesReserved);
                message.append("/");
                message.append(new ByteSizeValue(newBytesReserved));
                message.append("]");
            } else {
                message.append(", usages [");
                message.append(String.join(", ",
                    this.breakers.entrySet().stream().map(e -> {
                        final CircuitBreaker breaker = e.getValue();
                        final long breakerUsed = (long)(breaker.getUsed() * breaker.getOverhead());
                        return e.getKey() + "=" + breakerUsed + "/" + new ByteSizeValue(breakerUsed);
                    })
                        .collect(Collectors.toList())));
                message.append("]");
            }
            // derive durability of a tripped parent breaker depending on whether the majority of memory tracked by
            // child circuit breakers is categorized as transient or permanent.
            CircuitBreaker.Durability durability = memoryUsed.transientChildUsage >= memoryUsed.permanentChildUsage ?
                CircuitBreaker.Durability.TRANSIENT : CircuitBreaker.Durability.PERMANENT;
            throw new CircuitBreakingException(message.toString(), memoryUsed.totalUsage, parentLimit, durability);
        }
    }

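As the code shows, the breaker trips only when memoryUsed.totalUsage > parentLimit. The value of parentLimit comes from the setting indices.breaker.total.limit, whose default is either 95% or 70% of the JVM heap. Which of the two applies depends on the setting indices.breaker.total.use_real_memory (default true), as the following code shows: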


    public static final Setting<Boolean> USE_REAL_MEMORY_USAGE_SETTING =
        Setting.boolSetting("indices.breaker.total.use_real_memory", true, Property.NodeScope);

    public static final Setting<ByteSizeValue> TOTAL_CIRCUIT_BREAKER_LIMIT_SETTING =
        Setting.memorySizeSetting("indices.breaker.total.limit", settings -> {
            if (USE_REAL_MEMORY_USAGE_SETTING.get(settings)) {
                return "95%";
            } else {
                return "70%";
            }
        }, Property.Dynamic, Property.NodeScope);
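To make the default concrete, here is a minimal sketch of how a percentage limit turns into a byte count. The class and method names are hypothetical, and the 31 GiB heap is an assumption inferred from the limit in the log (the heap size is not stated anywhere above). The arithmetic is meant to illustrate a heap-relative percentage, not to reproduce the Elasticsearch implementation.

```java
// Hypothetical helper, not Elasticsearch code: derives the default parent
// breaker limit from the heap size and the use_real_memory flag.
class ParentLimitSketch {

    // Default of indices.breaker.total.limit: 95% with real-memory tracking, otherwise 70%.
    static long defaultParentLimit(long heapMaxBytes, boolean useRealMemory) {
        double percent = useRealMemory ? 95.0 : 70.0;
        // Truncate the result to whole bytes.
        return (long) ((percent / 100) * heapMaxBytes);
    }

    public static void main(String[] args) {
        long heap = 31L * 1024 * 1024 * 1024; // assumed 31 GiB heap
        System.out.println("use_real_memory=true  -> " + defaultParentLimit(heap, true));
        System.out.println("use_real_memory=false -> " + defaultParentLimit(heap, false));
    }
}
```

Under the assumed 31 GiB heap, the 95% case gives 31621696716 bytes, which matches the limit printed in the exception message.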

Next, let's look at memoryUsed.totalUsage. It is computed by a method of the same class:


    private MemoryUsage memoryUsed(long newBytesReserved) {
        long transientUsage = 0;
        long permanentUsage = 0;

        for (CircuitBreaker breaker : this.breakers.values()) {
            long breakerUsed = (long)(breaker.getUsed() * breaker.getOverhead());
            if (breaker.getDurability() == CircuitBreaker.Durability.TRANSIENT) {
                transientUsage += breakerUsed;
            } else if (breaker.getDurability() == CircuitBreaker.Durability.PERMANENT) {
                permanentUsage += breakerUsed;
            }
        }
        if (this.trackRealMemoryUsage) {
            final long current = currentMemoryUsage();
            return new MemoryUsage(current, current + newBytesReserved, transientUsage, permanentUsage);
        } else {
            long parentEstimated = transientUsage + permanentUsage;
            return new MemoryUsage(parentEstimated, parentEstimated, transientUsage, permanentUsage);
        }
    }

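The value of trackRealMemoryUsage (taken from the setting indices.breaker.total.use_real_memory) determines whether the breaker decision is based on the actual memory usage or on the memory reserved by the child circuit breakers. The official explanation is: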


 Static setting determining whether the parent breaker should take real memory usage into account (true) or only consider the amount that is reserved by child circuit breakers (false). Defaults to true
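The effect of the flag can be checked against the numbers in the exception message. The sketch below is a simplified, hypothetical re-creation of the two branches: the class and method names are made up, and the 70% limit is an assumption derived from an assumed 31 GiB heap (inferred from the logged limit). The usage figures themselves are copied from the log.

```java
// Hypothetical comparison of the two accounting modes, not Elasticsearch code.
class BreakerModeSketch {

    // Real-memory mode: current heap usage plus the bytes about to be reserved.
    static long totalWithRealMemory(long currentHeapUsed, long newBytesReserved) {
        return currentHeapUsed + newBytesReserved;
    }

    // Estimate mode: only what the child breakers have reserved (overhead already applied).
    static long totalFromChildren(long... childUsages) {
        long sum = 0;
        for (long used : childUsages) {
            sum += used;
        }
        return sum;
    }

    // Same comparison as in checkParentLimit: trip only when usage exceeds the limit.
    static boolean trips(long totalUsage, long parentLimit) {
        return totalUsage > parentLimit;
    }

    public static void main(String[] args) {
        long realUsage = 33093114144L; // "real usage" from the log
        long newBytes = 3494L;         // "new bytes reserved" from the log
        long limit95 = 31621696716L;   // limit from the log (95% mode)
        long limit70 = 23300197580L;   // assumption: 70% of an assumed 31 GiB heap

        long real = totalWithRealMemory(realUsage, newBytes);
        // request, fielddata, in_flight_requests, accounting as listed in the log
        long estimated = totalFromChildren(0L, 0L, 3494L, 104564949L);

        System.out.println("real-memory mode: " + real + " trips=" + trips(real, limit95));
        System.out.println("estimate mode:    " + estimated + " trips=" + trips(estimated, limit70));
    }
}
```

In real-memory mode the total is dominated by whatever currently occupies the heap, which can include garbage that has not been collected yet. The child breakers, by contrast, account for only about 100 MB in this log entry.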


 


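Summary: starting at 11:50 on March 17, 2021, we changed the configuration of the production DATA nodes to indices.breaker.total.use_real_memory: false and performed a rolling restart of the production cluster (the setting is static, so it only takes effect after a restart).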


 


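Today is March 18, 2021. The configuration change was finished around noon yesterday, and at 18:30 yesterday evening we ran a business load test against the cluster. The exception did not appear. (Before the change, the load test caused nodes to drop out of the cluster, and the resulting shard relocation turned the cluster yellow.)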
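Besides watching the logs, the trip counter of the parent breaker can be read from the node stats API (GET /_nodes/stats/breaker). The following is only a sketch under several assumptions: the class name, the regex-based extraction, and the sample JSON are made up for illustration, and the HTTP call assumes an endpoint reachable without authentication, which may not hold for a cluster with security enabled.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical checker, not part of Elasticsearch or its client libraries.
class BreakerStatsSketch {

    // Matches the "tripped" counter inside each "parent" breaker object.
    private static final Pattern PARENT_TRIPPED =
        Pattern.compile("\"parent\"\\s*:\\s*\\{[^}]*\"tripped\"\\s*:\\s*(\\d+)");

    // Sums the parent breaker trip counts of all nodes in a node-stats response body.
    static long totalParentTrips(String nodeStatsJson) {
        long total = 0;
        Matcher m = PARENT_TRIPPED.matcher(nodeStatsJson);
        while (m.find()) {
            total += Long.parseLong(m.group(1));
        }
        return total;
    }

    // Fetches breaker statistics; the base URL (e.g. http://localhost:9200) is an assumption.
    static String fetchBreakerStats(String baseUrl) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/_nodes/stats/breaker"))
            .GET()
            .build();
        HttpResponse<String> response =
            HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // Without an argument, fall back to a made-up sample so the sketch runs offline.
        String json = args.length > 0
            ? fetchBreakerStats(args[0])
            : "{\"nodes\":{\"n1\":{\"breakers\":{\"parent\":{\"limit_size_in_bytes\":1,\"tripped\":2}}}}}";
        System.out.println("parent breaker trips: " + totalParentTrips(json));
    }
}
```

A sum that stays at zero during a load test would be consistent with the observation that the exception no longer appears.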

-----------------------------------

Elasticsearch reports a "Data too large" exception

https://blog.51cto.com/u_15162069/2772176

