How do I know which memory modules are failing (what motherboard slots they are in)?
Hello!
In syslog, I saw a lot of memory error messages like this:
May 28 06:26:52 ru-tul-dc01-mon02 kernel: [317191.270724] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
May 28 06:26:52 ru-tul-dc01-mon02 kernel: [317191.270726] EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 5: cc1d730000010092
May 28 06:26:52 ru-tul-dc01-mon02 kernel: [317191.271141] EDAC sbridge MC0: TSC 0
May 28 06:26:52 ru-tul-dc01-mon02 kernel: [317191.271142] EDAC sbridge MC0: ADDR 7f80bf80
May 28 06:26:52 ru-tul-dc01-mon02 kernel: [317191.271143] EDAC sbridge MC0: MISC 4078f886
May 28 06:26:52 ru-tul-dc01-mon02 kernel: [317191.271145] EDAC sbridge MC0: PROCESSOR 0:206d7 TIME 1527478012 SOCKET 0 APIC 0
May 28 06:26:52 ru-tul-dc01-mon02 kernel: [317191.271425] EDAC MC0: 30156 CE memory read error on CPU_SrcID#0_Ha#0_Chan#2_DIMM#0 (channel:2 slot:0 page:0x7f80b offset:0xf80 grain:32 syndrome:0x0 - OVERFLOW area:DRAM err_code:0001:0092 socket:0 ha:0 channel_mask:4 rank:1)
May 28 06:26:44 ru-tul-dc01-mon02 kernel: [317183.270202] EDAC sbridge MC0: HANDLING MCE MEMORY ERROR
May 28 06:26:44 ru-tul-dc01-mon02 kernel: [317183.270203] EDAC sbridge MC0: CPU 0: Machine Check Event: 0 Bank 9: 8c000050000800c1
May 28 06:26:44 ru-tul-dc01-mon02 kernel: [317183.270204] EDAC sbridge MC0: TSC 0
May 28 06:26:44 ru-tul-dc01-mon02 kernel: [317183.270206] EDAC sbridge MC0: MISC 90000000000208c
May 28 06:26:44 ru-tul-dc01-mon02 kernel: [317183.270207] EDAC sbridge MC0: PROCESSOR 0:206d7 TIME 1527478004 SOCKET 0 APIC 0
May 28 06:26:44 ru-tul-dc01-mon02 kernel: [317183.270217] EDAC MC0: 1 CE memory scrubbing error on CPU_SrcID#0_Ha#0_Chan#0_DIMM#0 (channel:0 slot:0 page:0x794717 offset:0x0 grain:32 syndrome:0x0 - area:DRAM err_code:0008:00c1 socket:0 ha:0 channel_mask:1 rank:0)
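With many messages like these, it can help to total the corrected (CE) and uncorrected (UE) errors per controller, channel and slot. The following is only a rough sketch: the regular expression is written against the two "EDAC MC0: ..." summary lines quoted above, and the function name `tally_edac_errors` is made up for illustration. Other drivers or kernel versions may format the line differently.

```python
import re
from collections import Counter

# Matches the summary line printed after each event, for example:
# "EDAC MC0: 30156 CE memory read error on CPU_SrcID#0_Ha#0_Chan#2_DIMM#0 (channel:2 slot:0 ..."
# The "EDAC sbridge MC0: ..." detail lines are deliberately not matched.
EDAC_LINE = re.compile(
    r"EDAC MC(?P<mc>\d+): (?P<count>\d+) (?P<kind>CE|UE) .*? on (?P<label>\S+) "
    r"\(channel:(?P<channel>\d+) slot:(?P<slot>\d+)"
)


def tally_edac_errors(lines):
    """Sum the reported error counts per (mc, channel, slot, kind)."""
    totals = Counter()
    for line in lines:
        m = EDAC_LINE.search(line)
        if m:
            key = (int(m["mc"]), int(m["channel"]), int(m["slot"]), m["kind"])
            totals[key] += int(m["count"])
    return totals
```

Feeding it the lines of the syslog file (for example `tally_edac_errors(open("/var/log/syslog"))`, with the path being an assumption about this system) gives one total per location. For the excerpt above, almost all corrected errors land on channel 2, slot 0.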
# dmidecode -t memory | grep 'Locator: P'
Locator: P1-DIMMA1
Bank Locator: P0_Node0_Channel0_Dimm0
Locator: P1-DIMMA2
Bank Locator: P0_Node0_Channel0_Dimm1
Locator: P1-DIMMA3
Bank Locator: P0_Node0_Channel0_Dimm2
Locator: P1-DIMMB1
Bank Locator: P0_Node0_Channel1_Dimm0
Locator: P1-DIMMB2
Bank Locator: P0_Node0_Channel1_Dimm1
Locator: P1-DIMMB3
Bank Locator: P0_Node0_Channel1_Dimm2
Locator: P1-DIMMC1
Bank Locator: P0_Node0_Channel2_Dimm0
Locator: P1-DIMMC2
Bank Locator: P0_Node0_Channel2_Dimm1
Locator: P1-DIMMC3
Bank Locator: P0_Node0_Channel2_Dimm2
Locator: P1-DIMMD1
Bank Locator: P0_Node0_Channel3_Dimm0
Locator: P1-DIMMD2
Bank Locator: P0_Node0_Channel3_Dimm1
Locator: P1-DIMMD3
Bank Locator: P0_Node0_Channel3_Dimm2
Locator: P2-DIMME1
Bank Locator: P1_Node1_Channel0_Dimm0
Locator: P2-DIMME2
Bank Locator: P1_Node1_Channel0_Dimm1
Locator: P2-DIMME3
Bank Locator: P1_Node1_Channel0_Dimm2
Locator: P2-DIMMF1
Bank Locator: P1_Node1_Channel1_Dimm0
Locator: P2-DIMMF2
Bank Locator: P1_Node1_Channel1_Dimm1
Locator: P2-DIMMF3
Bank Locator: P1_Node1_Channel1_Dimm2
Locator: P2-DIMMG1
Bank Locator: P1_Node1_Channel2_Dimm0
Locator: P2-DIMMG2
Bank Locator: P1_Node1_Channel2_Dimm1
Locator: P2-DIMMG3
Bank Locator: P1_Node1_Channel2_Dimm2
Locator: P2-DIMMH1
Bank Locator: P1_Node1_Channel3_Dimm0
Locator: P2-DIMMH2
Bank Locator: P1_Node1_Channel3_Dimm1
Locator: P2-DIMMH3
Bank Locator: P1_Node1_Channel3_Dimm2
Read up on the specifications for the memory controller and the motherboard layout.
If the kernel's own identification is to be believed, the failing modules are the first DIMMs on channels 1 and 3 of the first socket (reported as channel:0 and channel:2, slot:0, socket:0 in the EDAC messages).
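To turn those socket/channel/slot numbers into slot names printed on the board, one option is to look them up in the `dmidecode` output from the question. The sketch below assumes that the `Bank Locator` strings (`P0_Node0_Channel2_Dimm0` and so on) use the same socket, channel and slot numbering as the EDAC messages. That is a property of this particular BIOS, not something guaranteed, so check the result against the motherboard manual. The helper name `parse_dmidecode` is made up for illustration.

```python
import re

# Assumed Bank Locator format, taken from the dmidecode output in the question:
# P<socket>_Node<n>_Channel<channel>_Dimm<slot>
BANK = re.compile(
    r"P(?P<socket>\d+)_Node\d+_Channel(?P<channel>\d+)_Dimm(?P<slot>\d+)"
)


def parse_dmidecode(text):
    """Return {(socket, channel, slot): locator} from `dmidecode -t memory` text."""
    mapping = {}
    locator = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Bank Locator:"):
            m = BANK.search(line)
            if m and locator is not None:
                key = (int(m["socket"]), int(m["channel"]), int(m["slot"]))
                mapping[key] = locator
            locator = None  # each Locator pairs with the Bank Locator after it
        elif line.startswith("Locator:"):
            locator = line.split(":", 1)[1].strip()
    return mapping
```

Under that assumption, `parse_dmidecode(output)[(0, 2, 0)]` would name the slot for the "channel:2 slot:0 socket:0" error and `[(0, 0, 0)]` the slot for the scrubbing error. With the listing in the question these come out as P1-DIMMC1 and P1-DIMMA1.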
First pull out one memory module, as PrAw wrote, and run a memory test. Then pull out the second one and run the test again, and so on.
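Between test runs, the kernel's own per-DIMM counters can show whether errors are still accumulating. This is a sketch only. It assumes the kernel exposes `dimm_label` and `dimm_ce_count` files under `/sys/devices/system/edac/mc/mc*/dimm*/`, which older kernels may not do. The function names are made up for illustration.

```python
from pathlib import Path


def read_dimm_counters(root="/sys/devices/system/edac/mc"):
    """Return {"mcN/dimmM (label)": corrected_error_count} for each DIMM found."""
    counters = {}
    for dimm in sorted(Path(root).glob("mc*/dimm*")):
        label_file = dimm / "dimm_label"
        ce_file = dimm / "dimm_ce_count"
        # Skip entries that lack the files this sketch assumes.
        if not (label_file.is_file() and ce_file.is_file()):
            continue
        label = label_file.read_text().strip()
        name = f"{dimm.parent.name}/{dimm.name} ({label})"
        counters[name] = int(ce_file.read_text().strip())
    return counters


def new_errors(before, after):
    """Return only the DIMMs whose corrected-error count grew, with the increase."""
    return {
        name: count - before.get(name, 0)
        for name, count in after.items()
        if count > before.get(name, 0)
    }
```

Take one snapshot before the test and one after it, then pass both to `new_errors`. A module that still shows growth after its neighbour was removed is the likelier culprit.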