Bug
Resolution: Done
Medium
Found in OOM branch release-1.0.0 and in master.
K8s 1.7 and 1.5
OOM Sprint 2, OOM Sprint 3, OOM Sprint 4, OOM Sprint 5
AAI's hbase container was late in getting its database mounted externally, a change meant to prevent data loss in the event of an HBase restart or upgrade. The mount does not seem to work in the wild.
The hbase container never transitions to ready, meaning the app inside the container is actually dead.
If I remove the "hbase-opt-data" entries from the hbase yaml, it starts up normally.
onap-aai hbase-2720973979-nwskz 0/1 Running 0 3m 10.42.84.219 kubernetes-6
Logs from inside:
2017-08-23 19:38:02,395 INFO [main] zookeeper.ZooKeeper: Initiating client connection, connectString=hbase:2181 sessionTimeout=90000 watcher=master:160000x0, quorum=hbase:2181, baseZNode=/hbase
2017-08-23 19:38:02,407 INFO [main-SendThread(hbase:2181)] zookeeper.ClientCnxn: Opening socket connection to server hbase/10.42.84.219:2181. Will not attempt to authenticate using SASL (unknown error)
2017-08-23 19:38:02,412 WARN [main-SendThread(hbase:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2017-08-23 19:38:03,516 INFO [main-SendThread(hbase:2181)] zookeeper.ClientCnxn: Opening socket connection to server hbase/10.42.84.219:2181. Will not attempt to authenticate using SASL (unknown error)
2017-08-23 19:38:03,516 WARN [main-SendThread(hbase:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2017-08-23 19:38:04,617 INFO [main-SendThread(hbase:2181)] zookeeper.ClientCnxn: Opening socket connection to server hbase/10.42.84.219:2181. Will not attempt to authenticate using SASL (unknown error)
2017-08-23 19:38:04,617 WARN [main-SendThread(hbase:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2017-08-23 19:38:05,717 INFO [main-SendThread(hbase:2181)] zookeeper.ClientCnxn: Opening socket connection to server hbase/10.42.84.219:2181. Will not attempt to authenticate using SASL (unknown error)
2017-08-23 19:38:05,718 WARN [main-SendThread(hbase:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2017-08-23 19:38:06,818 INFO [main-SendThread(hbase:2181)] zookeeper.ClientCnxn: Opening socket connection to server hbase/10.42.84.219:2181. Will not attempt to authenticate using SASL (unknown error)
2017-08-23 19:38:06,819 WARN [main-SendThread(hbase:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2017-08-23 19:38:07,919 INFO [main-SendThread(hbase:2181)] zookeeper.ClientCnxn: Opening socket connection to server hbase/10.42.84.219:2181. Will not attempt to authenticate using SASL (unknown error)
2017-08-23 19:38:07,919 WARN [main-SendThread(hbase:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2017-08-23 19:38:09,020 INFO [main-SendThread(hbase:2181)] zookeeper.ClientCnxn: Opening socket connection to server hbase/10.42.84.219:2181. Will not attempt to authenticate using SASL (unknown error)
2017-08-23 19:38:09,020 WARN [main-SendThread(hbase:2181)] zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
2017-08-23 19:38:10,121 INFO [main-SendThread(hbase:2181)] zookeeper.ClientCnxn: Opening socket connection to server hbase/10.42.84.219:2181. Will not attempt to authenticate using SASL (unknown error)
2017-08-23 19:38:10,121 INFO [main-SendThread(hbase:2181)] zookeeper.ClientCnxn: Socket connection established to hbase/10.42.84.219:2181, initiating session
2017-08-23 19:38:10,143 INFO [main-SendThread(hbase:2181)] zookeeper.ClientCnxn: Session establishment complete on server hbase/10.42.84.219:2181, sessionid = 0x15e109a20770000, negotiated timeout = 90000
2017-08-23 19:38:10,200 INFO [RpcServer.responder] ipc.RpcServer: RpcServer.responder: starting
2017-08-23 19:38:10,200 INFO [RpcServer.listener,port=16000] ipc.RpcServer: RpcServer.listener,port=16000: starting
2017-08-23 19:38:10,263 INFO [main] mortbay.log: Logging to org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via org.mortbay.log.Slf4jLog
2017-08-23 19:38:10,266 INFO [main] http.HttpRequestLog: Http request log for http.requests.master is not defined
2017-08-23 19:38:10,274 INFO [main] http.HttpServer: Added global filter 'safety' (class=org.apache.hadoop.hbase.http.HttpServer$QuotingInputFilter)
2017-08-23 19:38:10,274 INFO [main] http.HttpServer: Added global filter 'clickjackingprevention' (class=org.apache.hadoop.hbase.http.ClickjackingPreventionFilter)
2017-08-23 19:38:10,276 INFO [main] http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter) to context master
2017-08-23 19:38:10,276 INFO [main] http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter) to context static
2017-08-23 19:38:10,276 INFO [main] http.HttpServer: Added filter static_user_filter (class=org.apache.hadoop.hbase.http.lib.StaticUserWebFilter$StaticUserFilter) to context logs
2017-08-23 19:38:10,288 INFO [main] http.HttpServer: Jetty bound to port 16010
2017-08-23 19:38:10,288 INFO [main] mortbay.log: jetty-6.1.26
2017-08-23 19:38:10,588 INFO [main] mortbay.log: Started SelectChannelConnector@0.0.0.0:16010
2017-08-23 19:38:10,593 INFO [main] master.HMaster: hbase.rootdir=hdfs://hbase:8020/hbase, hbase.cluster.distributed=true
2017-08-23 19:38:10,603 INFO [main] master.HMaster: Adding backup master ZNode /hbase/backup-masters/hbase,16000,1503517081659
2017-08-23 19:38:10,664 INFO [hbase:16000.activeMasterManager] master.ActiveMasterManager: Deleting ZNode for /hbase/backup-masters/hbase,16000,1503517081659 from backup master directory
2017-08-23 19:38:10,670 INFO [hbase:16000.activeMasterManager] master.ActiveMasterManager: Registered Active Master=hbase,16000,1503517081659
2017-08-23 19:38:10,705 INFO [master/hbase/10.42.84.219:16000] zookeeper.RecoverableZooKeeper: Process identifier=hconnection-0x2ef3fd2b connecting to ZooKeeper ensemble=hbase:2181
2017-08-23 19:38:10,705 INFO [master/hbase/10.42.84.219:16000] zookeeper.ZooKeeper: Initiating client connection, connectString=hbase:2181 sessionTimeout=90000 watcher=hconnection-0x2ef3fd2b0x0, quorum=hbase:2181, baseZNode=/hbase
2017-08-23 19:38:10,706 INFO [master/hbase/10.42.84.219:16000-SendThread(hbase:2181)] zookeeper.ClientCnxn: Opening socket connection to server hbase/10.42.84.219:2181. Will not attempt to authenticate using SASL (unknown error)
2017-08-23 19:38:10,706 INFO [master/hbase/10.42.84.219:16000-SendThread(hbase:2181)] zookeeper.ClientCnxn: Socket connection established to hbase/10.42.84.219:2181, initiating session
2017-08-23 19:38:10,716 INFO [master/hbase/10.42.84.219:16000-SendThread(hbase:2181)] zookeeper.ClientCnxn: Session establishment complete on server hbase/10.42.84.219:2181, sessionid = 0x15e109a20770001, negotiated timeout = 90000
2017-08-23 19:38:10,717 INFO [master/hbase/10.42.84.219:16000] client.ZooKeeperRegistry: ClusterId read in ZooKeeper is null
2017-08-23 19:38:10,729 FATAL [hbase:16000.activeMasterManager] master.HMaster: Failed to become active master
java.net.ConnectException: Call From hbase/10.42.84.219 to hbase:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
    at org.apache.hadoop.ipc.Client.call(Client.java:1415)
    at org.apache.hadoop.ipc.Client.call(Client.java:1364)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
    at com.sun.proxy.$Proxy15.setSafeMode(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy15.setSafeMode(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.setSafeMode(ClientNamenodeProtocolTranslatorPB.java:602)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:279)
    at com.sun.proxy.$Proxy16.setSafeMode(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.setSafeMode(DFSClient.java:2264)
    at org.apache.hadoop.hdfs.DistributedFileSystem.setSafeMode(DistributedFileSystem.java:986)
    at org.apache.hadoop.hdfs.DistributedFileSystem.setSafeMode(DistributedFileSystem.java:970)
    at org.apache.hadoop.hbase.util.FSUtils.isInSafeMode(FSUtils.java:525)
    at org.apache.hadoop.hbase.util.FSUtils.waitOnSafeMode(FSUtils.java:971)
    at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:424)
    at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:153)
    at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:128)
    at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:654)
    at org.apache.hadoop.hbase.master.HMaster.access$600(HMaster.java:186)
    at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1762)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:606)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:700)
    at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:367)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1463)
    at org.apache.hadoop.ipc.Client.call(Client.java:1382)
    ... 29 more
2017-08-23 19:38:10,730 FATAL [hbase:16000.activeMasterManager] master.HMaster: Unhandled exception. Starting shutdown.
java.net.ConnectException: Call From hbase/10.42.84.219 to hbase:8020 failed on connection exception: java.net.ConnectException: Connection refused; For more details see: http://wiki.apache.org/hadoop/ConnectionRefused
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:783)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:730)
    at org.apache.hadoop.ipc.Client.call(Client.java:1415)
    at org.apache.hadoop.ipc.Client.call(Client.java:1364)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:206)
    at com.sun.proxy.$Proxy15.setSafeMode(Unknown Source)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy15.setSafeMode(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.setSafeMode(ClientNamenodeProtocolTranslatorPB.java:602)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.hbase.fs.HFileSystem$1.invoke(HFileSystem.java:279)
    at com.sun.proxy.$Proxy16.setSafeMode(Unknown Source)
    at org.apache.hadoop.hdfs.DFSClient.setSafeMode(DFSClient.java:2264)
    at org.apache.hadoop.hdfs.DistributedFileSystem.setSafeMode(DistributedFileSystem.java:986)
    at org.apache.hadoop.hdfs.DistributedFileSystem.setSafeMode(DistributedFileSystem.java:970)
    at org.apache.hadoop.hbase.util.FSUtils.isInSafeMode(FSUtils.java:525)
    at org.apache.hadoop.hbase.util.FSUtils.waitOnSafeMode(FSUtils.java:971)
    at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:424)
    at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:153)
    at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:128)
    at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:654)
    at org.apache.hadoop.hbase.master.HMaster.access$600(HMaster.java:186)
    at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1762)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:744)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:529)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:493)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:606)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:700)
    at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:367)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1463)
    at org.apache.hadoop.ipc.Client.call(Client.java:1382)
    ... 29 more
2017-08-23 19:38:10,731 INFO [hbase:16000.activeMasterManager] regionserver.HRegionServer: STOPPED: Unhandled exception. Starting shutdown.
2017-08-23 19:38:13,734 INFO [master/hbase/10.42.84.219:16000] ipc.RpcServer: Stopping server on 16000
2017-08-23 19:38:13,734 INFO [RpcServer.listener,port=16000] ipc.RpcServer: RpcServer.listener,port=16000: stopping
2017-08-23 19:38:13,734 INFO [RpcServer.responder] ipc.RpcServer: RpcServer.responder: stopped
2017-08-23 19:38:13,734 INFO [RpcServer.responder] ipc.RpcServer: RpcServer.responder: stopping
2017-08-23 19:38:13,735 INFO [master/hbase/10.42.84.219:16000] regionserver.HRegionServer: Stopping infoServer
2017-08-23 19:38:13,744 INFO [master/hbase/10.42.84.219:16000] mortbay.log: Stopped SelectChannelConnector@0.0.0.0:16010
2017-08-23 19:38:13,845 INFO [master/hbase/10.42.84.219:16000] regionserver.HRegionServer: stopping server hbase,16000,1503517081659
2017-08-23 19:38:13,845 INFO [master/hbase/10.42.84.219:16000] client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x15e109a20770001
2017-08-23 19:38:13,848 INFO [master/hbase/10.42.84.219:16000] zookeeper.ZooKeeper: Session: 0x15e109a20770001 closed
2017-08-23 19:38:13,848 INFO [master/hbase/10.42.84.219:16000-EventThread] zookeeper.ClientCnxn: EventThread shut down
2017-08-23 19:38:13,850 INFO [master/hbase/10.42.84.219:16000] regionserver.HRegionServer: stopping server hbase,16000,1503517081659; all regions closed.
2017-08-23 19:38:13,850 INFO [master/hbase/10.42.84.219:16000] hbase.ChoreService: Chore service for: hbase,16000,1503517081659 had [] on shutdown
2017-08-23 19:38:13,856 INFO [master/hbase/10.42.84.219:16000] ipc.RpcServer: Stopping server on 16000
2017-08-23 19:38:13,862 INFO [master/hbase/10.42.84.219:16000] zookeeper.ZooKeeper: Session: 0x15e109a20770000 closed
2017-08-23 19:38:13,862 INFO [master/hbase/10.42.84.219:16000] regionserver.HRegionServer: stopping server hbase,16000,1503517081659; zookeeper connection closed.
2017-08-23 19:38:13,862 INFO [main-EventThread] zookeeper.ClientCnxn: EventThread shut down
2017-08-23 19:38:13,862 INFO [master/hbase/10.42.84.219:16000] regionserver.HRegionServer: master/hbase/10.42.84.219:16000 exiting
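In the log above, ZooKeeper on port 2181 eventually accepts connections, but the call to hbase:8020 (the HDFS address from hbase.rootdir) is refused and the master shuts down. A quick way to narrow this down might be to probe those ports from inside the pod with and without the "hbase-opt-data" entries. The following is only a minimal sketch, not part of OOM: the helper names `port_open` and `probe` and the service labels are assumptions for illustration, and the host name and port numbers are simply the ones that appear in the log.

```python
import socket


def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name-resolution failures.
        return False


def probe(host, ports, timeout=2.0):
    """Map each label in `ports` to whether its port accepts connections."""
    return {name: port_open(host, port, timeout) for name, port in ports.items()}


if __name__ == "__main__":
    # Host and ports as seen in the log; the labels are illustrative only.
    ports = {"zookeeper": 2181, "hdfs": 8020, "hbase-master": 16000}
    for name, ok in probe("hbase", ports).items():
        print("%-12s %s" % (name, "open" if ok else "refused/unreachable"))
```

If 2181 reports open while 8020 stays closed only when the external mount is present, that would point at HDFS failing to start on the mounted directory rather than at HBase itself, though this sketch cannot confirm the cause on its own.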