SchmeL, 2016-04-05 12:21:09
Oracle

How to defragment ocfs2?

Greetings. There are two nodes running Oracle Linux 7, each attached over SAS to a LUN on RAID 10; the volume is roughly 8 TB. The disk is formatted as OCFS2 and is used from both nodes simultaneously. After formatting, about 1.5 TB was synced from the old storage with rsync. Everything was fine for a year and a half, but over the last month write performance has dropped badly. In that time the data has grown to 3 TB. When space is freed the situation improves a little, but not for long: as soon as the data exceeds ~40% of the total space, the write speed drops again.

My assumption is that fragmentation is to blame. The partition was originally created with a 4 KB cluster size, but the Oracle documentation ( https://docs.oracle.com/cd/E37670_01/E37355/html/o...) says that for a volume of this size the cluster size should be 64 KB; I don't yet know whether reformatting will help. Are there any utilities to defragment OCFS2? Or maybe it is not fragmentation at all? If there are no such utilities, is it possible to get by without reformatting the partition?
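As a first check, a minimal sketch of how to inspect the current geometry and per-file fragmentation (this assumes the debugfs.ocfs2 utility from ocfs2-tools is installed; the stats and frag commands are standard debugfs.ocfs2 commands, but verify the exact syntax on your version):

# debugfs.ocfs2 -R "stats" /dev/sdb1
# debugfs.ocfs2 -R "frag /tmp.dd" /dev/sdb1

The first shows the block and cluster size the volume was actually formatted with; the second reports clusters vs. extents for a single file (paths are relative to the filesystem root, so /storage/tmp.dd is given as /tmp.dd). As far as I know, the cluster size cannot be changed by tunefs.ocfs2 after the fact, so moving to 64 KB would mean recreating the filesystem.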

The current configuration is:
IBM Storwize V3700, 4x4TB HDD in RAID 10.
Oracle Linux 7, kernel 3.8.13-118.4.2.el7uek.x86_64
OCFS2 1.8.

# dd if=/dev/zero of=/storage/tmp.dd bs=100M count=10
10+0 records in
10+0 records out
1048576000 bytes (1.0 GB) copied, 70.4514 s, 14.9 MB/s
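(The figure above includes page-cache effects; a hedged sketch of a more conservative test with standard GNU dd flags — oflag=direct bypasses the page cache, conv=fdatasync includes the final flush in the timing:

# dd if=/dev/zero of=/storage/tmp.dd bs=100M count=10 oflag=direct
# dd if=/dev/zero of=/storage/tmp.dd bs=100M count=10 conv=fdatasync

If these also land around 15 MB/s, the bottleneck is in the filesystem or the LUN rather than in caching.)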


# df -h
/dev/sdb1             7.3T  2.8T  4.6T  39% /storage
# df -hi
/dev/sdb1             1.9G  709M  1.2G  39% /storage


# modinfo ocfs2
filename:       /lib/modules/3.8.13-118.4.2.el7uek.x86_64/kernel/fs/ocfs2/ocfs2.ko
license:        GPL
author:         Oracle
version:        1.8.0
description:    OCFS2 1.8.0
srcversion:     C2F2928C6340706B57561A8
depends:        jbd2,ocfs2_stackglue,ocfs2_nodemanager
intree:         Y
vermagic:       3.8.13-118.4.2.el7uek.x86_64 SMP mod_unload modversions 
signer:         Oracle CA Server
sig_key:        BC:51:CE:95:28:97:32:9F:78:F8:42:9C:3B:A0:C5:57:2B:7D:FE:AD
sig_hashalgo:   sha512


# cat /proc/mounts

ocfs2_dlmfs /dlm ocfs2_dlmfs rw,relatime 0 0
/dev/sdb1 /storage ocfs2 rw,relatime,_netdev,heartbeat=local,nointr,data=ordered,errors=remount-ro,atime_quantum=60,coherency=buffered,user_xattr,acl 0 0
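(Without reformatting, the mount options are about the only tunable. A hedged example of an /etc/fstab line — noatime and a larger local allocation window are common suggestions; the localalloc value of 32 MB is purely illustrative, the remaining options are kept as in /proc/mounts above, and the ocfs2 man page for your release should be checked before using it:

/dev/sdb1 /storage ocfs2 _netdev,noatime,localalloc=32,coherency=buffered,user_xattr,acl 0 0
)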


Here is the kernel output after restoring a 30 GB PostgreSQL database from an SQL dump.
# less /var/log/messages

Mar 28 04:46:39 node-1 kernel: INFO: task postgres:10708 blocked for more than 120 seconds.
Mar 28 04:46:39 node-1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 28 04:46:39 node-1 kernel: postgres        D ffff880667c73100     0 10708  10696 0x00000180
Mar 28 04:46:39 node-1 kernel: ffff880577511820 0000000000000082 ffff880584244100 ffff880577511fd8
Mar 28 04:46:39 node-1 kernel: ffff880577511fd8 ffff880577511fd8 ffff880584244100 7fffffffffffffff
Mar 28 04:46:39 node-1 kernel: ffff8805775119f8 ffff880577511a00 ffff880584244100 0000000000000000
Mar 28 04:46:39 node-1 kernel: Call Trace:
Mar 28 04:46:39 node-1 kernel: [<ffffffff8157d539>] schedule+0x29/0x70
Mar 28 04:46:39 node-1 kernel: [<ffffffff8157bb79>] schedule_timeout+0x1d9/0x2e0
Mar 28 04:46:39 node-1 kernel: [<ffffffffa0752849>] ? ocfs2_metadata_cache_unlock+0x19/0x20 [ocfs2]
Mar 28 04:46:39 node-1 kernel: [<ffffffffa008f020>] ? o2dlm_lock_ast_wrapper+0x20/0x20 [ocfs2_stack_o2cb]
Mar 28 04:46:39 node-1 kernel: [<ffffffff8157e99e>] ? _raw_spin_lock+0xe/0x20
Mar 28 04:46:39 node-1 kernel: [<ffffffffa07047a2>] ? ocfs2_inode_cache_unlock+0x12/0x20 [ocfs2]
Mar 28 04:46:39 node-1 kernel: [<ffffffffa0752849>] ? ocfs2_metadata_cache_unlock+0x19/0x20 [ocfs2]
Mar 28 04:46:39 node-1 kernel: [<ffffffff8157da0a>] wait_for_common+0x11a/0x170
Mar 28 04:46:39 node-1 kernel: [<ffffffff81093130>] ? wake_up_state+0x20/0x20
Mar 28 04:46:39 node-1 kernel: [<ffffffff8157da7d>] wait_for_completion+0x1d/0x20
Mar 28 04:46:39 node-1 kernel: [<ffffffffa06f86f0>] __ocfs2_cluster_lock.isra.31+0x1a0/0x830 [ocfs2]
Mar 28 04:46:39 node-1 kernel: [<ffffffffa07047a2>] ? ocfs2_inode_cache_unlock+0x12/0x20 [ocfs2]
Mar 28 04:46:39 node-1 kernel: [<ffffffffa0752849>] ? ocfs2_metadata_cache_unlock+0x19/0x20 [ocfs2]
Mar 28 04:46:39 node-1 kernel: [<ffffffff8157e99e>] ? _raw_spin_lock+0xe/0x20
Mar 28 04:46:39 node-1 kernel: [<ffffffffa06f9d31>] ocfs2_inode_lock_full_nested+0x1f1/0x4d0 [ocfs2]
Mar 28 04:46:39 node-1 kernel: [<ffffffffa0712f6f>] ocfs2_lookup_lock_orphan_dir.constprop.25+0x5f/0x1b0 [ocfs2]
Mar 28 04:46:39 node-1 kernel: [<ffffffffa071547d>] ocfs2_prepare_orphan_dir+0x3d/0x2a0 [ocfs2]
Mar 28 04:46:39 node-1 kernel: [<ffffffffa0717889>] ocfs2_rename+0x10d9/0x1ae0 [ocfs2]
Mar 28 04:46:39 node-1 kernel: [<ffffffff81195e88>] vfs_rename+0x358/0x4c0
Mar 28 04:46:39 node-1 kernel: [<ffffffff81198a81>] sys_renameat+0x391/0x430
Mar 28 04:46:39 node-1 kernel: [<ffffffff81020a93>] ? syscall_trace_enter+0x223/0x240
Mar 28 04:46:39 node-1 kernel: [<ffffffff81198b3b>] sys_rename+0x1b/0x20
Mar 28 04:46:39 node-1 kernel: [<ffffffff815871c7>] tracesys+0xdd/0xe2

Mar 30 05:32:39 node-1 kernel: INFO: task postgres:10708 blocked for more than 120 seconds.
Mar 30 05:32:39 node-1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 30 05:32:39 node-1 kernel: postgres        D ffff880667cf3100     0 10708  10696 0x00000180
Mar 30 05:32:39 node-1 kernel: ffff880577511820 0000000000000082 ffff880584244100 ffff880577511fd8
Mar 30 05:32:39 node-1 kernel: ffff880577511fd8 ffff880577511fd8 ffff880584244100 7fffffffffffffff
Mar 30 05:32:39 node-1 kernel: ffff8805775119f8 ffff880577511a00 ffff880584244100 0000000000000000
Mar 30 05:32:39 node-1 kernel: Call Trace:
Mar 30 05:32:39 node-1 kernel: [<ffffffff8157d539>] schedule+0x29/0x70
Mar 30 05:32:39 node-1 kernel: [<ffffffff8157bb79>] schedule_timeout+0x1d9/0x2e0
Mar 30 05:32:39 node-1 kernel: [<ffffffffa0752849>] ? ocfs2_metadata_cache_unlock+0x19/0x20 [ocfs2]
Mar 30 05:32:39 node-1 kernel: [<ffffffffa008f020>] ? o2dlm_lock_ast_wrapper+0x20/0x20 [ocfs2_stack_o2cb]
Mar 30 05:32:39 node-1 kernel: [<ffffffff8157e99e>] ? _raw_spin_lock+0xe/0x20
Mar 30 05:32:39 node-1 kernel: [<ffffffffa07047a2>] ? ocfs2_inode_cache_unlock+0x12/0x20 [ocfs2]
Mar 30 05:32:39 node-1 kernel: [<ffffffffa0752849>] ? ocfs2_metadata_cache_unlock+0x19/0x20 [ocfs2]
Mar 30 05:32:39 node-1 kernel: [<ffffffff8157da0a>] wait_for_common+0x11a/0x170
Mar 30 05:32:39 node-1 kernel: [<ffffffff81093130>] ? wake_up_state+0x20/0x20
Mar 30 05:32:39 node-1 kernel: [<ffffffff8157da7d>] wait_for_completion+0x1d/0x20
Mar 30 05:32:39 node-1 kernel: [<ffffffffa06f86f0>] __ocfs2_cluster_lock.isra.31+0x1a0/0x830 [ocfs2]
Mar 30 05:32:39 node-1 kernel: [<ffffffffa07047a2>] ? ocfs2_inode_cache_unlock+0x12/0x20 [ocfs2]
Mar 30 05:32:39 node-1 kernel: [<ffffffffa0752849>] ? ocfs2_metadata_cache_unlock+0x19/0x20 [ocfs2]
Mar 30 05:32:39 node-1 kernel: [<ffffffff8157e99e>] ? _raw_spin_lock+0xe/0x20
Mar 30 05:32:39 node-1 kernel: [<ffffffffa06f9d31>] ocfs2_inode_lock_full_nested+0x1f1/0x4d0 [ocfs2]
Mar 30 05:32:39 node-1 kernel: [<ffffffffa0712f6f>] ocfs2_lookup_lock_orphan_dir.constprop.25+0x5f/0x1b0 [ocfs2]
Mar 30 05:32:39 node-1 kernel: [<ffffffffa071547d>] ocfs2_prepare_orphan_dir+0x3d/0x2a0 [ocfs2]
Mar 30 05:32:39 node-1 kernel: [<ffffffffa0717889>] ocfs2_rename+0x10d9/0x1ae0 [ocfs2]
Mar 30 05:32:39 node-1 kernel: [<ffffffff81195e88>] vfs_rename+0x358/0x4c0
Mar 30 05:32:39 node-1 kernel: [<ffffffff81198a81>] sys_renameat+0x391/0x430
Mar 30 05:32:39 node-1 kernel: [<ffffffff81020a93>] ? syscall_trace_enter+0x223/0x240
Mar 30 05:32:39 node-1 kernel: [<ffffffff81198b3b>] sys_rename+0x1b/0x20
Mar 30 05:32:39 node-1 kernel: [<ffffffff815871c7>] tracesys+0xdd/0xe2
Mar 30 05:32:39 node-1 kernel: INFO: task java:7710 blocked for more than 120 seconds.
Mar 30 05:32:39 node-1 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Mar 30 05:32:39 node-1 kernel: java            D ffff880667c73100     0  7729      1 0x00000080
Mar 30 05:32:39 node-1 kernel: ffff880463aefc40 0000000000000082 ffff88042a7e2380 ffff880463aeffd8
Mar 30 05:32:39 node-1 kernel: ffff880463aeffd8 ffff880463aeffd8 ffff88042a7e2380 ffff880c5328d5e8
Mar 30 05:32:39 node-1 kernel: ffff880c5328d5ec ffff88042a7e2380 00000000ffffffff ffff880c5328d5f0
Mar 30 05:32:39 node-1 kernel: Call Trace:
Mar 30 05:32:39 node-1 kernel: [<ffffffff8157dbe9>] schedule_preempt_disabled+0x29/0x70
Mar 30 05:32:39 node-1 kernel: [<ffffffff8157c617>] __mutex_lock_slowpath+0x117/0x1d0
Mar 30 05:32:39 node-1 kernel: [<ffffffff8157bf8f>] mutex_lock+0x1f/0x30
Mar 30 05:32:39 node-1 kernel: [<ffffffffa0712f59>] ocfs2_lookup_lock_orphan_dir.constprop.25+0x49/0x1b0 [ocfs2]
Mar 30 05:32:39 node-1 kernel: [<ffffffffa071547d>] ocfs2_prepare_orphan_dir+0x3d/0x2a0 [ocfs2]
Mar 30 05:32:39 node-1 kernel: [<ffffffffa071631c>] ocfs2_unlink+0x73c/0xbd0 [ocfs2]
Mar 30 05:32:39 node-1 kernel: [<ffffffff811955c0>] vfs_rmdir+0xc0/0x120
Mar 30 05:32:39 node-1 kernel: [<ffffffff8119579d>] do_rmdir+0x17d/0x1d0
Mar 30 05:32:39 node-1 kernel: [<ffffffff8107dd8c>] ? task_work_run+0xac/0xe0
Mar 30 05:32:39 node-1 kernel: [<ffffffff81198316>] sys_rmdir+0x16/0x20
Mar 30 05:32:39 node-1 kernel: [<ffffffff81586fb9>] system_call_fastpath+0x16/0x1b
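The traces are all waiting on cluster locks around the orphan directory during rename/rmdir. A hedged way to see which lock resources are busy while a stall is in progress, using the debugfs.ocfs2 shipped with ocfs2-tools (the exact commands and output format vary between versions):

# debugfs.ocfs2 -R "fs_locks" /dev/sdb1
# debugfs.ocfs2 -R "dlm_locks" /dev/sdb1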

SchmeL, 2016-04-15

In short, there is no online defragmenter for OCFS2. And the problem turned out to be not only fragmentation but the filesystem itself: fragmentation did contribute to the drop in write performance, but the raw speed of the FS also leaves much to be desired. I tried reformatting with the cluster size recommended for 8 TB (64 KB), but after mounting the volume and copying the main data back onto it, everything hung at around 200-300 GB. I had to unmount it and split the volume on the storage array into several smaller LUNs.
Two LUNs were allocated in RAID 10, with a dedicated 1 Gb/s network between the servers (OCFS2 runs its lock traffic over it).
Below is the write speed on ext4 and on OCFS2:
# dd if=/dev/zero of=/mnt/ext4/5G bs=1M count=5000
5000+0 records in
5000+0 records out
5242880000 bytes (5.2 GB) copied, 4.92276 s, 1.1 GB/s
# dd if=/dev/zero of=/mnt/ocfs2/5G bs=1M count=5000
5000+0 records in
5000+0 records out
5242880000 bytes (5.2 GB) copied, 57.8145 s, 90.7 MB/s
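For reference, the reformat attempt mentioned above looked roughly like this (standard mkfs.ocfs2 options; -N 2 matches the two node slots here; this destroys all data, so it should only be run on an empty LUN):

# mkfs.ocfs2 -b 4K -C 64K -N 2 -L storage /dev/sdb1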
