一次oplog损坏事故-创新互联
问题
成都创新互联网站建设公司一直秉承“诚信做人,踏实做事”的原则,不欺瞒客户,是我们最起码的底线! 以服务为基础,以质量求生存,以技术求发展,成交一个客户多一个朋友!专注中小微企业官网定制,网站设计制作、成都网站建设,塑造企业网络形象打造互联网企业效应。1)程序报警
Execution Timeout Couldn't get a connection within the time limit2)mongod日志
Jun 11 21:48:35 mongod mongod: 2017-06-11T21:48:35.122+0800 I NETWORK [initandlisten] connection accepted from 10.0.0.1:24321 #32 (32 connections now open) Jun 11 21:48:35 mongod mongod: 2017-06-11T21:48:35.136+0800 I ACCESS [conn32] Successfully authenticated as principal __system on local Jun 11 21:48:35 mongod mongod: 2017-06-11T21:48:35.349+0800 I - [rsSync] Assertion: 10334:BSONObj size: 0 (0x0) is invalid. Size must be between 0 and 16793600(16MB) First element: EOO Jun 11 21:48:35 mongod mongod: 2017-06-11T21:48:35.358+0800 I CONTROL [rsSync] Jun 11 21:48:35 mongod mongod: 0x132c032 0x12c9988 0x12b29f8 0x12b2aac 0x9da659 0xae692f 0x106e4dc 0x1066b1e 0x1066d69 0xfd64f5 0xaea0fe 0xaea621 0xebe304 0xf563ae 0xf57c78 0xf4d29b 0x1b5c330 0x7efd30830dc5 0x7efd3055f73d Jun 11 21:48:35 mongod mongod: ----- BEGIN BACKTRACE ----- Jun 11 21:48:35 mongod mongod: {"backtrace":[{"b":"400000","o":"F2C032","s":"_ZN5mongo15printStackTraceERSo"},{"b":"400000","o":"EC9988","s":"_ZN5mongo10logContextEPKc"},{"b":"400000","o":"EB29F8","s":"_ZN5mongo11msgassertedEiPKc"},{"b":"400000","o":"EB2AAC"},{"b":"400000","o":"5DA659","s":"_ZNK5mongo7BSONObj14_assertInvalidEv"},{"b":"400000","o":"6E692F","s":"_ZN5mongo10Collection19aboutToDeleteCappedEPNS_16OperationContextERKNS_8RecordIdENS_10RecordDataE"},{"b":"400000","o":"C6E4DC","s":"_ZN5mongo19CappedRecordStoreV111allocRecordEPNS_16OperationContextEib"},{"b":"400000","o":"C66B1E","s":"_ZN5mongo17RecordStoreV1Base13_insertRecordEPNS_16OperationContextEPKcib"},{"b":"400000","o":"C66D69","s":"_ZN5mongo17RecordStoreV1Base12insertRecordEPNS_16OperationContextEPKcib"},{"b":"400000","o":"BD64F5","s":"_ZN5mongo11RecordStore13insertRecordsEPNS_16OperationContextEPSt6vectorINS_6RecordESaIS4_EEb"},{"b":"400000","o":"6EA0FE","s":"_ZN5mongo10Collection16_insertDocumentsEPNS_16OperationContextEN9__gnu_cxx17__normal_iteratorIPKNS_7BSONObjESt6vectorIS5_SaIS5_EEEESB_b"},{"b":"400000","o":"6EA621","s":"_ZN5mongo10Collection15insertDocumentsEPNS_16OperationContextEN9__gnu_cxx17__normal_iteratorIPKNS_7BSONObjESt6vectorIS5_SaIS5_EEEESB_bb"},{"b":"400000","o":"ABE304","s":"_ZN5mongo4repl15writeOpsToOplogEPNS_16OperationContextERKSt6vectorINS_7BSONObjESaIS4_EE"},{"b":"400000","o":"B563AE","s":"_ZN5mongo4repl8SyncTail10multiApplyEPNS_16OperationContextERKNS1_7OpQueueE"},{"b":"400000","o":"B57C78","s":"_ZN5mongo4repl8SyncTail16oplogApplicationEPNS0_16StorageInterfaceE"},{"b":"400000","o":"B4D29B","s":"_ZN5mongo4repl13runSyncThreadEv"},{"b":"400000","o":"175C330","s":"execute_native_thread_routine"},{"b":"7EFD30829000","o":"7DC5"},{"b":"7EFD30468000","o":"F773D","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.2.12", "gitVersion" : "ef3e1bc78e997f0d9f22f45aeb1d8e3b6ac14a14", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "3.10.0-514.6.2.el7.x86_64", "version" : "#1 SMP Thu Feb 23 03:04:39 UTC 2017", "machine" : "x86_64" }, "somap" : [ { Jun 11 21:48:35 mongod mongod: "elfType" : 2, "b" : "400000", "buildId" : "BAAC1A970F6D0F06B88D0DE75BF06E4C260939EC" }, { "b" : "7FFE4D008000", "elfType" : 3, "buildId" : "7112DB1E073B211AB0CF8DE793F40F3FF996F5B4" }, { "b" : "7EFD31753000", "path" : "/lib64/libssl.so.10", "elfType" : 3, "buildId" : "90EAF65D9B0EEEB1424241281F7F197451D4317D" }, { "b" : "7EFD31369000", "path" : "/lib64/libcrypto.so.10", "elfType" : 3, "buildId" : "7278C69EE161D98DDD0FA00F92B67AD78C7B7F40" }, { "b" : "7EFD31161000", "path" : "/lib64/librt.so.1", "elfType" : 3, "buildId" : "82E77ADE22BC9FFF8D3458BD37331E7EDF174C28" }, { "b" : "7EFD30F5D000", "path" : "/lib64/libdl.so.2", "elfType" : 3, "buildId" : "C5F560504E1AF52E29679C3B52FF11121015D6BB" }, { "b" : "7EFD30C5B000", "path" : "/lib64/libm.so.6", "elfType" : 3, "buildId" : "721C7CC9488EFA25F83B48AF713AB27DBE48EF3E" }, { "b" : "7EFD30A45000", "path" : "/lib64/libgcc_s.so.1", "elfType" : 3, "buildId" : "408B46E291B2D4C9612E27C0509D165D7E186D40" }, { "b" : "7EFD30829000", "path" : "/lib64/libpthread.so.0", "elfType" : 3, "buildId" : "C3DEB1FA27CD0C1C3CC575B944ABACBA0698B0F2" }, { "b" : "7EFD30468000", "path" : "/lib64/libc.so.6", "elfType" : 3, "buildId" : "8B2C421716985B927AA0CAF2A05D0B1F452367F7" }, { "b" : "7EFD319C1000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "8F3E366E2DB73C330A3791DEAE31AE9579099B44" }, { "b" : "7EFD3021A000", "path" : "/lib64/libgssapi_krb5.so.2", "elfType" : 3, "buildId" : "A2499C359AA179EE23324ED949C0E508E4434F10" }, { "b" : "7EFD2FF33000", "path" : "/lib64/libkrb5.so.3", "elfType" : 3, "buildId" : "E09A34D9083DC6FEAF7018C09D55631DEEE2836D" }, { "b" : "7EFD2FD2F000", "path" : "/lib64/libcom_err.so.2", "elfType" : 3, "buildId" : "BF54B7C8932E450769FBBB8B18864D1DD70BBC67" }, { "b" : "7EFD2FAFD000", "path" : "/lib64/libk5crypto.so.3", "elfType" : 3, "buildId" : "BF8F00D7CB849ADB0B7A4703BC7B8D66AEE6A49C" }, { "b" : "7EFD2F8E7000", "path" : "/lib64/libz.so.1", "elfType" : 3, "buildId" : "EA8E45DC8E395CC5E26890470112D97A1F1E0B65" }, { "b" : "7EFD2F6D8000", "path" : "/lib64 Jun 11 21:48:35 mongod mongod: /libkrb5support.so.0", "elfType" : 3, "buildId" : "1E7A92FDD6FB3871DA97F4BCA2E147E72B6B6E1F" }, { "b" : "7EFD2F4D4000", "path" : "/lib64/libkeyutils.so.1", "elfType" : 3, "buildId" : "2E01D5AC08C1280D013AAB96B292AC58BC30A263" }, { "b" : "7EFD2F2BA000", "path" : "/lib64/libresolv.so.2", "elfType" : 3, "buildId" : "FE7AE845A123A3DFC0FDC2408BCBC2BA8B61B158" }, { "b" : "7EFD2F093000", "path" : "/lib64/libselinux.so.1", "elfType" : 3, "buildId" : "76687CA31A406854DF3BCF8D03055656F56E6892" }, { "b" : "7EFD2EE32000", "path" : "/lib64/libpcre.so.1", "elfType" : 3, "buildId" : "AE64AA461A26E01F60408013D361749D56DD0AE1" } ] }} Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo15printStackTraceERSo+0x32) [0x132c032] Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo10logContextEPKc+0x138) [0x12c9988] Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo11msgassertedEiPKc+0x88) [0x12b29f8] Jun 11 21:48:35 mongod mongod: mongod(+0xEB2AAC) [0x12b2aac] Jun 11 21:48:35 mongod mongod: mongod(_ZNK5mongo7BSONObj14_assertInvalidEv+0x3B9) [0x9da659] Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo10Collection19aboutToDeleteCappedEPNS_16OperationContextERKNS_8RecordIdENS_10RecordDataE+0xBF) [0xae692f] Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo19CappedRecordStoreV111allocRecordEPNS_16OperationContextEib+0x46C) [0x106e4dc] Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo17RecordStoreV1Base13_insertRecordEPNS_16OperationContextEPKcib+0x5E) [0x1066b1e] Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo17RecordStoreV1Base12insertRecordEPNS_16OperationContextEPKcib+0xA9) [0x1066d69] Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo11RecordStore13insertRecordsEPNS_16OperationContextEPSt6vectorINS_6RecordESaIS4_EEb+0xB5) [0xfd64f5] Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo10Collection16_insertDocumentsEPNS_16OperationContextEN9__gnu_cxx17__normal_iteratorIPKNS_7BSONObjESt6vectorIS5_SaIS5_EEEESB_b+0x16E) [0xaea0fe] Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo10Collection15insertDocumentsEPNS_16OperationContextEN9__gnu_cxx17__normal_iteratorIPKNS_7BSONObjESt6vectorIS5_SaIS5_EEEESB_bb+0x1B1) [0xaea621] Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo4repl15writeOpsToOplogEPNS_16OperationContextERKSt6vectorINS_7BSONObjESaIS4_EE+0x144) [0xebe304] Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo4repl8SyncTail10multiApplyEPNS_16OperationContextERKNS1_7OpQueueE+0x98E) [0xf563ae] Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo4repl8SyncTail16oplogApplicationEPNS0_16StorageInterfaceE+0xD08) [0xf57c78] Jun 11 21:48:35 mongod mongod: mongod(_ZN5mongo4repl13runSyncThreadEv+0x2BB) [0xf4d29b] Jun 11 21:48:35 mongod mongod: mongod(execute_native_thread_routine+0x20) [0x1b5c330] Jun 11 21:48:35 mongod mongod: libpthread.so.0(+0x7DC5) [0x7efd30830dc5] Jun 11 21:48:35 mongod mongod: libc.so.6(clone+0x6D) [0x7efd3055f73d] Jun 11 21:48:35 mongod mongod: ----- END BACKTRACE ----- Jun 11 21:48:35 mongod mongod: 2017-06-11T21:48:35.389+0800 F - [rsSync] terminate() called. An exception is active; attempting to gather more information Jun 11 21:48:35 mongod mongod: 2017-06-11T21:48:35.393+0800 F - [rsSync] DBException::toString(): 10334 BSONObj size: 0 (0x0) is invalid. Size must be between 0 and 16793600(16MB) First element: EOO Jun 11 21:48:35 mongod mongod: Actual exception type: mongo::MsgAssertionException Jun 11 21:48:35 mongod mongod: 0x132c032 0x132bb82 0x1b14356 0x1b14383 0xf4d3e0 0x1b5c330 0x7efd30830dc5 0x7efd3055f73d Jun 11 21:48:35 mongod mongod: ----- BEGIN BACKTRACE -----一、集群环境
3个mongos、3个mongo config、3个shard(每个分片各三台,一主两从)
二、事故描述
次数1:某次活动,由于并发太多,导致mongod连接数太多,内存消耗太多(每个连接上来会分配一定的内存空间),导致整体query慢,产生大量堆积(应用报超时等错误)。直接重启mongod,重启前未stepDown,其中一个从节点2发现错误。临时剔除节点,打算后续有时间解决(原因未知)
次数2:此时距离上次事故6天,节点2还未修复。同一个分片主节点1意外出现同样的错误(原因未知)此时只有一个主节点,熬夜紧急修复这两个节点
次数3:距离第二次事故两天,同一个分片主节点3又出现了同样的错误(原因未知)此时一阵冷汗,同一个分片轮番坏了一遍,辛亏解决的及时。
三、解决步骤
1、查看报错信息,感觉像是数据块损坏问题。并且此分片不能做更新操作,但是可以正常查询。
2、问题定位,oplog.rs损坏
> db > local db.oplog.rs.find() error: { "$err" : "BSONObj size: 1852142352 (0x1073656E) is invalid. Size must be between 0 and 16793600(16MB) First element: Status: ?type=100", "code" : 10334 }
另外有需要云服务器可以了解下创新互联cdcxhl.cn,海内外云服务器15元起步,三天无理由+7*72小时售后在线,公司持有idc许可证,提供“云服务器、裸金属服务器、高防服务器、香港服务器、美国服务器、虚拟主机、免备案服务器”等云主机租用服务以及企业上云的综合解决方案,具有“安全稳定、简单易用、服务可用性高、性价比高”等特点与优势,专为企业上云打造定制,能够满足用户丰富、多元化的应用场景需求。
文章题目:一次oplog损坏事故-创新互联
网页链接:http://scyanting.com/article/djejde.html