Instructions modified from:
sudo apt-get install netatalk
ATALKD_RUN=no PAPD_RUN=no CNID_METAD_RUN=yes AFPD_RUN=yes TIMELORD_RUN=no A2BOOT_RUN=no
- -transall -uamlist uams_randnum.so,uams_dhx.so,uams_dhx2.so -savepassword -advertise_ssh
/media/documents LinuxMediaServer allow:nyeates,shyeates,shyeates513 cnidscheme:cdb options:usedots,upriv
sudo /etc/init.d/netatalk restart
sudo apt-get install avahi-daemon
sudo apt-get install libnss-mdns
hosts: files mdns4_minimal [NOTFOUND=return] dns mdns4 mdns
<?xml version="1.0" standalone='no'?><!--*-nxml-*-->
<!DOCTYPE service-group SYSTEM "avahi-service.dtd">
<service-group>
  <name replace-wildcards="yes">%h</name>
  <service>
    <type>_afpovertcp._tcp</type>
    <port>548</port>
  </service>
  <service>
    <type>_device-info._tcp</type>
    <port>0</port>
    <txt-record>model=Xserve</txt-record>
  </service>
</service-group>
sudo /etc/init.d/avahi-daemon restart
sudo restart avahi-daemon
2-2.5 MBps transfers! Yay
as root:
root@lms:/home/nyeates1# fdisk -l /dev/sdd

Disk /dev/sdd: 10.2 GB, 10245537792 bytes
255 heads, 63 sectors/track, 1245 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x3b14a989

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1   *           1        1186     9526513   83  Linux
/dev/sdd2            1187        1245      473917    5  Extended
/dev/sdd5            1187        1245      473886   82  Linux swap / Solaris

root@lms:/home/nyeates1# fdisk -l /media/documents/Backups/lms_Ubuntu_804_evms_kernel.img
You must set cylinders.
You can do this from the extra functions menu.

Disk /media/documents/Backups/lms_Ubuntu_804_evms_kernel.img: 0 MB, 0 bytes
255 heads, 63 sectors/track, 0 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x3b14a989

   Device Boot      Start         End      Blocks   Id  System
/media/documents/Backups/lms_Ubuntu_804_evms_kernel.img1   *   1   1186   9526513   83  Linux
Partition 1 has different physical/logical endings:
     phys=(1023, 254, 63) logical=(1185, 254, 63)
/media/documents/Backups/lms_Ubuntu_804_evms_kernel.img2       1187   1245   473917   5  Extended
Partition 2 has different physical/logical beginnings (non-Linux?):
     phys=(1023, 254, 63) logical=(1186, 0, 1)
Partition 2 has different physical/logical endings:
     phys=(1023, 254, 63) logical=(1244, 254, 63)
/media/documents/Backups/lms_Ubuntu_804_evms_kernel.img5       1187   1245   473886   82  Linux swap / Solaris
Whenever a partition table is printed out, a consistency check is performed on the partition table entries. This check verifies that the physical and logical start and end points are identical, and that the partition starts and ends on a cylinder boundary (except for the first partition).
Expert command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.

WARNING: Re-reading the partition table failed with error 25: Inappropriate ioctl for device.
The kernel still uses the old table.
The new table will be used at the next reboot.
Syncing disks.
Have decided that I will now go ahead with bitwise copy of 2x 320 GB PATA drives.
ls sd*
sda  sda1  sdb  sdb1  sdc  sdc1  sdd  sdd1  sdd2  sdd5  sde  sdf
Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x860fdec6

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1      121601   976760001   fd  Linux raid autodetect

Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1      121601   976760001   fd  Linux raid autodetect

Disk /dev/sdc: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x0008bc1e

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1      121601   976760001   fd  Linux raid autodetect

Disk /dev/sdd: 10.2 GB, 10245537792 bytes
255 heads, 63 sectors/track, 1245 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x3b14a989

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1   *           1        1186     9526513   83  Linux
/dev/sdd2            1187        1245      473917    5  Extended
/dev/sdd5            1187        1245      473886   82  Linux swap / Solaris

Disk /dev/sde: 320.0 GB, 320072933376 bytes
255 heads, 63 sectors/track, 38913 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x39713970

   Device Boot      Start         End      Blocks   Id  System

Disk /dev/sdf: 320.0 GB, 320072933376 bytes
255 heads, 63 sectors/track, 38913 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x39713971

   Device Boot      Start         End      Blocks   Id  System

Disk /dev/md0: 2000.4 GB, 2000404348928 bytes
2 heads, 4 sectors/track, 488379968 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk identifier: 0x00000000

Disk /dev/md0 doesn't contain a valid partition table
PV /dev/sdf   VG Media   lvm2 [298.06 GB / 0    free]
PV /dev/sde   VG Media   lvm2 [298.06 GB / 0    free]
Total: 2 [596.12 GB] / in use: 2 [596.12 GB] / in no VG: 0 [0   ]
as root:
305245+1 records in
305245+1 records out
320072933376 bytes (320 GB) copied, 6398.41 s, 50.0 MB/s
305245+1 records in
305245+1 records out
320072933376 bytes (320 GB) copied, 6559.71 s, 48.8 MB/s
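As a sanity check on those dd totals: "305245+1 records" means 305245 full blocks plus one short final block. Assuming a block size of 1M (1048576 bytes; the dd invocation itself isn't recorded above, so this is my inference from the numbers), the arithmetic works out:

```shell
# Check that 305245 full 1MiB blocks plus a partial block
# accounts for the 320072933376 bytes dd reported.
full_blocks=305245
block_size=1048576            # 1M, assumed -- the actual bs flag is not shown above
total_bytes=320072933376      # from the dd output
full_bytes=$((full_blocks * block_size))
partial=$((total_bytes - full_bytes))
echo "full blocks cover:    $full_bytes bytes"
echo "final partial record: $partial bytes"
```

The partial record comes out well under one block, which is consistent with the "+1" in dd's record count.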
-rw-r--r-- 1 root nyeates1 233G 2009-08-03 03:38 lms_pata_320GB_sdf.img

nyeates1@lms:~$ ls -lah /media/documents/Backups/
-rw-r--r-- 1 root nyeates1 299G 2009-08-03 04:06 lms_pata_320GB_sdf.img
nyeates1@lms:~$ cd /media/documents/Backups/
nyeates1@lms:/media/documents/Backups$ sudo chmod 444 lms_pata_320GB_sdf.img
[sudo] password for nyeates1:
nyeates1@lms:/media/documents/Backups$ ls -lah /media/documents/Backups/
-r--r--r-- 1 root nyeates1 299G 2009-08-03 04:06 lms_pata_320GB_sdf.img
root@lms:~# pvscan
  PV /dev/sdc   VG Media   lvm2 [298.06 GB / 0    free]
  PV /dev/sdb   VG Media   lvm2 [298.06 GB / 0    free]
  Total: 2 [596.12 GB] / in use: 2 [596.12 GB] / in no VG: 0 [0   ]
root@lms:~# pvdisplay
  --- Physical volume ---
  PV Name               /dev/sdc
  VG Name               Media
  PV Size               298.09 GB / not usable 29.34 MB
  Allocatable           yes (but full)
  PE Size (KByte)       32768
  Total PE              9538
  Free PE               0
  Allocated PE          9538
  PV UUID               L5MYAm-YTkK-WXJn-3Y8N-I04A-sKzc-SoJcUM

  --- Physical volume ---
  PV Name               /dev/sdb
  VG Name               Media
  PV Size               298.09 GB / not usable 29.34 MB
  Allocatable           yes (but full)
  PE Size (KByte)       32768
  Total PE              9538
  Free PE               0
  Allocated PE          9538
  PV UUID               qgw2Hv-ssfb-7g7G-jqXd-1TeQ-erKc-1B2PtT

root@lms:~# vgdisplay
  --- Volume group ---
  VG Name               Media
  System ID
  Format                lvm2
  Metadata Areas        2
  Metadata Sequence No  12
  VG Access             read/write
  VG Status             resizable
  MAX LV                256
  Cur LV                1
  Open LV               0
  Max PV                256
  Cur PV                2
  Act PV                2
  VG Size               596.12 GB
  PE Size               32.00 MB
  Total PE              19076
  Alloc PE / Size       19076 / 596.12 GB
  Free  PE / Size       0 / 0
  VG UUID               MV2Hm7-NZ7O-drJ6-tDfe-TXhS-VHS1-ekqx38

root@lms:~# lvdisplay
  --- Logical volume ---
  LV Name                /dev/Media/CargoPlane
  VG Name                Media
  LV UUID                xwvmre-dhz6-cQ7L-rBLt-knUw-Q1NB-ZqMqVK
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                596.12 GB
  Current LE             19076
  Segments               2
  Allocation             inherit
  Read ahead sectors     0
  Block device           253:0

root@lms:~# vgscan
  Reading all physical volumes.  This may take a while...
  Found volume group "Media" using metadata type lvm2
root@lms:~# vgchange -a y 1 logical volume(s) in volume group "Media" now active
root@lms:/dev/mapper# mkdir /media/olddocuments root@lms:/dev/mapper# mount -t ext3 /dev/Media/CargoPlane /media/olddocuments/
apt-get install smartmontools
# smartctl -i /dev/sdc
smartctl version 5.37 [x86_64-unknown-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar SE family
Device Model:     WDC WD3200JB-00KFA0
Serial Number:    WD-WCAMR2482097
Firmware Version: 08.05J08
User Capacity:    320,072,933,376 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   6
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Wed Aug  5 02:43:48 2009 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
smartctl -d ata -a /dev/sdc
smartctl -d ata -t long /dev/sdc
smartctl -l selftest /dev/sdc
SMART Self-test log structure revision number 1
Num  Test_Description     Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline     Completed without error       00%        10766     -
# 1  Extended offline     Completed without error       00%        11043     -
# 2  Extended captive     Interrupted (host reset)      90%        11040     -
umount /dev/Media/CargoPlane
fsck /dev/Media/CargoPlane
Now that I have all of the old LVM data over on the production RAID (3x 1TB in RAID-5), I can focus my efforts on the production RAID and a new OS setup. First, I need to understand how to test, use, and fix RAID in case anything needs to change or something bad happens. I had also originally wanted to run these tests before implementing RAID, to make sure it was a good decision. I skipped that step though, not having time, and now I will return to it.
I need to test important use cases. If I expand the array with another hard drive, will it do so with ease and keep the data safe? If I switch OS distributions, will I still be able to mount it? And error testing: if one drive goes down, how do I repair it?
See my list of RAID tests at: raid_test
root@lms:/dev# mkdir /media/transit2
root@lms:/dev# mount -t ntfs /dev/sde5 /media/transit2/
diff -qr /media/transit2/ /media/documents/transit2/
nano /etc/ssh/sshd_config
/etc/init.d/ssh reload
scp ~/.ssh/id_rsa.pub nyeates1@yeates.dyndns.org:~/
cat id_rsa.pub >> /home/nyeates1/.ssh/authorized_keys
rm id_rsa.pub
This is for the safety of the data on these drives. Make sure that the fstab won't care if it's missing.
First though, get the drive set up correctly.
fdisk /dev/sdb
Arman was visiting and had some good movies on a USB drive. I turned the machine off, reconnected the SATA drives, turned it back on, mounted, and copied the movies over.
sudo umount /dev/md0
umount: /media/documents: device is busy
cat /proc/mounts
fuser -m /dev/md0
Follow notes from the first creation of RAID array. Very good documentation at: raid_filesystem_lvm
root@lms:~# mdadm --verbose --create /dev/md1 --level=5 --raid-devices=3 /dev/sd[bcd]1
mdadm: layout defaults to left-symmetric
mdadm: chunk size defaults to 64K
mdadm: size set to 312568576K
mdadm: array /dev/md1 started.
root@lms:~# cat /proc/mdstat
Personalities : [linear] [raid6] [raid5] [raid4] [multipath] [faulty]
md1 : active raid5 sdd1[3] sdc1[1] sdb1[0]
      625137152 blocks level 5, 64k chunk, algorithm 2 [3/2] [UU_]
      [==>..................]  recovery = 10.2% (31882484/312568576) finish=108.9min speed=42941K/sec

unused devices: <none>
root@lms:/media/documents# mkfs.ext3 -v -m .1 -b 4096 -E stride=16,stripe-width=32 /dev/md1
mke2fs 1.40.8 (13-Mar-2008)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
39075840 inodes, 156284288 blocks
0 blocks (0.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
4770 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
        102400000

Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 28 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.
#raid5 mdadm TEST filesystem /dev/md1 /media/test ext3 defaults 0 2
Need to mount it at /media/documents so that I can get stuff onto it through the share that is already set up with AFP.
nyeates1@lms:/media$ ls -la
total 24
drwxr-xr-x  6 root root  4096 2009-08-19 01:39 .
drwxr-xr-x 21 root root  4096 2008-09-27 15:43 ..
lrwxrwxrwx  1 root root     6 2008-08-22 20:04 cdrom -> cdrom0
drwxr-xr-x  2 root root  4096 2008-08-22 20:04 cdrom0
drwxr-xr-x  4 root root  4096 2009-08-19 12:37 documents
lrwxrwxrwx  1 root root     7 2008-08-22 20:04 floppy -> floppy0
drwxr-xr-x  2 root root  4096 2008-08-22 20:04 floppy0
-rw-r--r--  1 root root     0 2009-08-19 01:39 .hal-mtab
drwxr-xr-x  2 root root  4096 2009-08-19 00:06 test
nyeates1@lms:/media$ cd documents
nyeates1@lms:/media/documents$ ls -la
total 28
drwxr-xr-x 4 root root  4096 2009-08-19 12:37 .
drwxr-xr-x 6 root root  4096 2009-08-19 01:39 ..
drwxr-xr-x 2 root root  4096 2009-08-19 12:37 .AppleDB
drwx------ 2 root root 16384 2009-08-18 23:41 lost+found
Started reading into ways to verify that the data is the same each time I make changes to the hdd array, etc. MD5 has been found to be insecure for public applications, but I think it is reasonable for me to use in this private, trusted, simple environment.
Basically, you can get the MD5 hash of a disk before and after some change to the underlying structure (or of a copy of the disk) and compare the hashes. If even one bit changed, the hashes will differ.
md5sum /dev/md1 > /tmp/320gb-raid-original.md5
cksum /dev/md1 > /tmp/320gb-raid-original-dev-md1.crc
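That before/after comparison can be sketched end to end on a scratch file (the file and its contents here are made up for illustration; on the real system you would hash the device, e.g. /dev/md1, as in the commands above):

```shell
# Hash a "disk" (here just a scratch file), copy it, and compare hashes.
# A real run would point md5sum at the block device instead.
src=$(mktemp)
printf 'some disk contents' > "$src"
dst=$(mktemp)
cp "$src" "$dst"

before=$(md5sum < "$src" | awk '{print $1}')
after=$(md5sum < "$dst" | awk '{print $1}')

if [ "$before" = "$after" ]; then
    echo "hashes match - copy is bit-identical"
else
    echo "hashes DIFFER - data changed"
fi
```

Hashing a whole device this way reads every block, so on the real array it takes hours (see the timing runs below).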
root@lms:~# time sum /dev/md1
60173 625137152

real    111m46.343s
user    35m6.684s
sys     29m21.198s

root@lms:~# time cksum /dev/md1
735903248 640140443648 /dev/md1

real    114m37.638s
user    44m2.225s
sys     27m30.415s
nyeates1@lms:/tmp/raidTests$ cat cksumScript.sh
find /media/test/ -type f -print | sort | while read FNAME
do
    cksum "${FNAME}"
done
root@lms:/tmp/raidTests# sh cksumScript.sh > 320gb-raid-original-dir.crc
root@lms:/tmp/raidTests# sh cksumScript.sh > 320gb-raid-original-dir-compare.crc
root@lms:/tmp/raidTests# diff 320gb-raid-original.crc 320gb-raid-original-compare.crc
root@lms:/tmp/raidTests#
I had the RAID test data in /tmp, which gets removed on reboot, so I had to move it to my user's home directory.
I ran 2 sets of baseline cksum tests.
Run this over and over after each RAID test
sh /home/nyeates1/raidTests/cksumScript.sh > /home/nyeates1/raidTests/320gb-raid-somename-dir.crc
cksum /dev/md1 > /home/nyeates1/raidTests/320gb-raid-somename-md1.crc
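To cut down on typing between tests, those two commands could be wrapped in a small helper. This is a sketch; the function name and the parameterized label are mine, and the demo below checksums scratch data instead of the real /media/test and /dev/md1:

```shell
# snapshot_checksums LABEL TREE DEVICE OUTDIR
# Writes LABEL-dir.crc (per-file checksums of TREE, sorted for stable diffs)
# and LABEL-dev.crc (checksum of DEVICE read as one stream).
snapshot_checksums() {
    label=$1 tree=$2 device=$3 outdir=$4
    find "$tree" -type f -print | sort | while read -r f; do
        cksum "$f"
    done > "$outdir/$label-dir.crc"
    cksum "$device" > "$outdir/$label-dev.crc"
}

# Demo on scratch data (a temp dir stands in for the mounted array,
# and a regular file stands in for the md device):
work=$(mktemp -d)
mkdir "$work/tree"
echo hello > "$work/tree/a.txt"
snapshot_checksums baseline "$work/tree" "$work/tree/a.txt" "$work"
snapshot_checksums aftertest "$work/tree" "$work/tree/a.txt" "$work"
diff "$work/baseline-dir.crc" "$work/aftertest-dir.crc" && echo "no changes"
```

After each RAID test, run it with a new label and diff the new .crc files against the baseline pair.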
See my list of RAID tests at: raid_test
root@lms:~# mdadm -E /dev/sd[bcd]1
/dev/sdb1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 48c708ae:53236bb8:448213b5:08c3805a (local to host lms)
  Creation Time : Mon Aug 10 01:39:35 2009
     Raid Level : raid5
  Used Dev Size : 312568576 (298.09 GiB 320.07 GB)
     Array Size : 625137152 (596.18 GiB 640.14 GB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 1
    Update Time : Tue Sep  8 23:10:46 2009
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 3a2492f1 - correct
         Events : 0.8
         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     0       8       17        0      active sync   /dev/sdb1
   0     0       8       17        0      active sync   /dev/sdb1
   1     1       8       33        1      active sync   /dev/sdc1
   2     2       8       49        2      active sync   /dev/sdd1
/dev/sdc1: Basically same as above
/dev/sdd1: Basically same as above
mdadm --verbose --add /dev/md1 /dev/sde
mdadm --verbose --grow /dev/md1 --raid-device=4
root@lms:~# cat /proc/mdstat
Personalities : [linear] [raid6] [raid5] [raid4] [multipath] [faulty]
md1 : active raid5 sde[3] sdb1[0] sdd1[2] sdc1[1]
      625137152 blocks super 0.91 level 5, 64k chunk, algorithm 2 [4/4] [UUUU]
      [>....................]  reshape =  1.7% (5586952/312568576) finish=699.6min speed=7310K/sec
e2fsck -fv -C 0 /dev/md1
check the file system (or use fsck.ext3)
resize2fs -p /dev/md1
resize the file system
tune2fs -E stride=16,stripe-width=48 /dev/md1
retune for diff # of disks in raid
Sep 12 13:55:51 lms kernel: [816246.172760] raid5: Disk failure on sde, disabling device.
Sep 12 13:55:51 lms kernel: [816246.172762] raid5: Operation continuing on 3 devices.
Sep 12 13:55:51 lms kernel: [816246.212012] RAID5 conf printout:
Sep 12 13:55:51 lms kernel: [816246.212012]  --- rd:4 wd:3
Sep 12 13:55:51 lms kernel: [816246.212012]  disk 0, o:1, dev:sdb1
Sep 12 13:55:51 lms kernel: [816246.212012]  disk 1, o:1, dev:sdc1
Sep 12 13:55:51 lms kernel: [816246.212012]  disk 2, o:1, dev:sdd1
Sep 12 13:55:51 lms kernel: [816246.212012]  disk 3, o:0, dev:sde
Sep 12 13:55:51 lms kernel: [816246.224013] RAID5 conf printout:
Sep 12 13:55:51 lms kernel: [816246.224016]  --- rd:4 wd:3
Sep 12 13:55:51 lms kernel: [816246.224018]  disk 0, o:1, dev:sdb1
Sep 12 13:55:51 lms kernel: [816246.224019]  disk 1, o:1, dev:sdc1
Sep 12 13:55:51 lms kernel: [816246.224021]  disk 2, o:1, dev:sdd1
Sep 12 13:55:51 lms mdadm: Fail event detected on md device /dev/md1, component device /dev/sde
Sep 12 13:55:51 lms postfix/pickup[22613]: 9DA27332A0: uid=0 from=<root>
Sep 12 13:55:51 lms postfix/cleanup[22666]: 9DA27332A0: message-id=<20090912175551.9DA27332A0@lms>
Sep 12 13:55:51 lms postfix/qmgr[5439]: 9DA27332A0: from=<root@lms>, size=790, nrcpt=1 (queue active)
Sep 12 13:55:54 lms postfix/smtp[22668]: 9DA27332A0: to=<nyeates1@umbc.edu>, relay=mxin.umbc.edu[130.85.12.6]:25, delay=2.5, delays=0.17/0.04/2.2/0.06, dsn=5.1.8, status=bounced (host mxin.umbc.edu[130.85.12.6] said: 553 5.1.8 <nyeates1@umbc.edu>... Domain of sender address root@lms does not exist (in reply to RCPT TO command))
Sep 12 13:55:54 lms postfix/cleanup[22666]: 0C19C332A1: message-id=<20090912175554.0C19C332A1@lms>
Sep 12 13:55:54 lms postfix/qmgr[5439]: 0C19C332A1: from=<>, size=2550, nrcpt=1 (queue active)
Sep 12 13:55:54 lms postfix/bounce[22669]: 9DA27332A0: sender non-delivery notification: 0C19C332A1
Sep 12 13:55:54 lms postfix/qmgr[5439]: 9DA27332A0: removed
Sep 12 13:55:54 lms postfix/local[22670]: 0C19C332A1: to=<nyeates1@lms>, orig_to=<root@lms>, relay=local, delay=0.08, delays=0.01/0.03/0/0.04, dsn=2.0.0, status=sent (delivered to maildir)
Sep 12 13:55:54 lms postfix/qmgr[5439]: 0C19C332A1: removed
Sep 13 01:27:11 lms kernel: [857726.412012] md: unbind<sde>
Sep 13 01:27:11 lms kernel: [857726.412012] md: export_rdev(sde)
Sep 13 01:57:35 lms kernel: [859550.700607] md: bind<sde>
Sep 13 01:57:35 lms kernel: [859550.728010] RAID5 conf printout:
Sep 13 01:57:35 lms kernel: [859550.728010]  --- rd:4 wd:3
Sep 13 01:57:35 lms kernel: [859550.728010]  disk 0, o:1, dev:sdb1
Sep 13 01:57:35 lms kernel: [859550.728010]  disk 1, o:1, dev:sdc1
Sep 13 01:57:35 lms kernel: [859550.728010]  disk 2, o:1, dev:sdd1
Sep 13 01:57:35 lms kernel: [859550.728010]  disk 3, o:1, dev:sde
Sep 13 01:57:35 lms mdadm: RebuildStarted event detected on md device /dev/md1
Sep 13 01:57:35 lms kernel: [859550.733244] md: recovery of RAID array md1
Sep 13 01:57:35 lms kernel: [859550.733248] md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
Sep 13 01:57:35 lms kernel: [859550.733250] md: using maximum available idle IO bandwidth (but not more than 200000 KB/sec) for recovery.
Sep 13 01:57:35 lms kernel: [859550.733254] md: using 128k window, over a total of 312568576 blocks.
Sep 18 23:04:06 lms kernel: [1367541.472016] ata5: exception Emask 0x10 SAct 0x0 SErr 0x1810000 action 0xe frozen
Sep 18 23:04:06 lms kernel: [1367541.472016] ata5: SError: { PHYRdyChg LinkSeq TrStaTrns }
Sep 18 23:04:06 lms kernel: [1367541.472016] ata5: hard resetting link
Sep 18 23:04:07 lms kernel: [1367542.224021] ata5: SATA link down (SStatus 0 SControl 300)
Sep 18 23:04:07 lms kernel: [1367542.224029] ata5: failed to recover some devices, retrying in 5 secs
Sep 18 23:04:12 lms kernel: [1367547.228020] ata5: hard resetting link
Sep 18 23:04:12 lms kernel: [1367547.548019] ata5: SATA link down (SStatus 0 SControl 300)
Sep 18 23:04:12 lms kernel: [1367547.548025] ata5: failed to recover some devices, retrying in 5 secs
Sep 18 23:04:17 lms kernel: [1367552.552022] ata5: hard resetting link
Sep 18 23:04:18 lms kernel: [1367552.872018] ata5: SATA link down (SStatus 0 SControl 300)
Sep 18 23:04:18 lms kernel: [1367552.872024] ata5.00: disabled
Sep 18 23:04:18 lms kernel: [1367553.376023] ata5: EH complete
Sep 18 23:04:18 lms kernel: [1367553.376161] ata5.00: detaching (SCSI 4:0:0:0)
Sep 18 23:04:18 lms kernel: [1367553.377378] sd 4:0:0:0: [sde] Synchronizing SCSI cache
Sep 18 23:04:18 lms kernel: [1367553.377418] sd 4:0:0:0: [sde] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep 18 23:04:18 lms kernel: [1367553.377421] sd 4:0:0:0: [sde] Stopping disk
Sep 18 23:04:18 lms kernel: [1367553.377430] sd 4:0:0:0: [sde] START_STOP FAILED
Sep 18 23:04:18 lms kernel: [1367553.377432] sd 4:0:0:0: [sde] Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK,SUGGEST_OK
Sep 18 23:04:18 lms NetworkManager: <debug> [1253329458.633024] nm_hal_device_removed(): Device removed (hal udi is '/org/freedesktop/Hal/devices/pci_10de_55_scsi_host_scsi_device_lun0_scsi_generic').
Sep 18 23:04:18 lms NetworkManager: <debug> [1253329458.644806] nm_hal_device_removed(): Device removed (hal udi is '/org/freedesktop/Hal/devices/storage_serial_1ATA_WDC_WD3200AAKS_00L9A0_WD_WMAV2A404686').
Sep 18 23:04:18 lms NetworkManager: <debug> [1253329458.649343] nm_hal_device_removed(): Device removed (hal udi is '/org/freedesktop/Hal/devices/pci_10de_55_scsi_host_scsi_device_lun0').
Sep 18 23:04:18 lms NetworkManager: <debug> [1253329458.652641] nm_hal_device_removed(): Device removed (hal udi is '/org/freedesktop/Hal/devices/pci_10de_55_scsi_host').
Sep 18 23:04:18 lms NetworkManager: <debug> [1253329458.660932] nm_hal_device_removed(): Device removed (hal udi is '/org/freedesktop/Hal/devices/volume_part1_size_320070288384').
...
Sep 18 23:07:53 lms kernel: [1367768.736010] raid5: Disk failure on sde, disabling device.
Sep 18 23:07:53 lms kernel: [1367768.736010] raid5: Operation continuing on 3 devices.
Sep 18 23:07:54 lms kernel: [1367768.773247] RAID5 conf printout:
Sep 18 23:07:54 lms kernel: [1367768.773250]  --- rd:4 wd:3
Sep 18 23:07:54 lms kernel: [1367768.773252]  disk 0, o:1, dev:sdb1
Sep 18 23:07:54 lms kernel: [1367768.773253]  disk 1, o:1, dev:sdc1
Sep 18 23:07:54 lms kernel: [1367768.773255]  disk 2, o:1, dev:sdd1
Sep 18 23:07:54 lms kernel: [1367768.773256]  disk 3, o:0, dev:sde
Sep 18 23:07:54 lms kernel: [1367768.784300] RAID5 conf printout:
Sep 18 23:07:54 lms kernel: [1367768.784304]  --- rd:4 wd:3
Sep 18 23:07:54 lms kernel: [1367768.784305]  disk 0, o:1, dev:sdb1
Sep 18 23:07:54 lms kernel: [1367768.784306]  disk 1, o:1, dev:sdc1
Sep 18 23:07:54 lms kernel: [1367768.784308]  disk 2, o:1, dev:sdd1
...
Sep 18 23:07:54 lms mdadm: Fail event detected on md device /dev/md1, component device /dev/sde
Personalities : [linear] [raid6] [raid5] [raid4] [multipath] [faulty]
md1 : active raid5 sde[4](F) sdb1[0] sdd1[2] sdc1[1]
      937705728 blocks level 5, 64k chunk, algorithm 2 [4/3] [UUU_]

unused devices: <none>
I had to return it to Best Buy now, so I turned off the machine and yanked the drive.
Unrelated, the battery died for the UPS power system connected to the server. I ordered a new battery finally and got it up and running in no time. Server is as it was, battery working fine. ~10/15/09
Also of note: it did a disk check (fsck?) on the first boot after the power-off. I was looking for boot logs, and I think I learned that ubuntu has a bug of not outputting boot data to a log (WTF?!). I think the file systems checked out ok.
Question: How do I really know if my file systems or drives are starting to go bad?
I think the machine still believes there are 4 drives in the RAID array, with only 3 operating. This is the test array, which doesn't matter as much now. I think I should next shrink the file system down to size, then bring the array down to 3 drives and the right size, etc. Then I can unplug them.
From the man page for resize2fs: "If you wish to shrink an ext2 partition, first use resize2fs to shrink the size of filesystem. Then you may use fdisk(8) to shrink the size of the partition. When shrinking the size of the partition, make sure you do not make it smaller than the new size of the ext2 filesystem!"
root@lms:~# mdadm -E /dev/sd[bcd]1
/dev/sdb1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 48c708ae:53236bb8:448213b5:08c3805a (local to host lms)
  Creation Time : Mon Aug 10 01:39:35 2009
     Raid Level : raid5
  Used Dev Size : 312568576 (298.09 GiB 320.07 GB)
     Array Size : 937705728 (894.27 GiB 960.21 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 1
    Update Time : Sat Oct 31 07:56:14 2009
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 3a6ff60e - correct
         Events : 0.208122
         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     0       8       17        0      active sync   /dev/sdb1
   0     0       8       17        0      active sync   /dev/sdb1
   1     1       8       33        1      active sync   /dev/sdc1
   2     2       8       49        2      active sync   /dev/sdd1
   3     3       0        0        3      faulty removed
/dev/sdc1: Basically same as above
/dev/sdd1: Basically same as above
root@lms:~# mdadm --verbose --remove detached /dev/md1
mdadm: error opening detached: No such file or directory
root@lms:~# mdadm --verbose --remove failed /dev/md1
mdadm: error opening failed: No such file or directory
root@lms:~# mdadm --verbose --remove /dev/md1
[   19.177553] md: bind<sdc1>
[   19.177721] md: bind<sdd1>
[   19.177882] md: bind<sdb1>
[   19.198859] raid5: device sdb1 operational as raid disk 0
[   19.198862] raid5: device sdd1 operational as raid disk 2
[   19.198865] raid5: device sdc1 operational as raid disk 1
[   19.202247] input: Power Button (FF) as /class/input/input3
[   19.223305] raid5: allocated 4274kB for md1
[   19.223308] raid5: raid level 5 set md1 active with 3 out of 4 devices, algorithm 2
[   19.223311] RAID5 conf printout:
[   19.223313]  --- rd:4 wd:3
[   19.223315]  disk 0, o:1, dev:sdb1
[   19.223316]  disk 1, o:1, dev:sdc1
[   19.223318]  disk 2, o:1, dev:sdd1
[  731.742440] EXT3-fs error (device md1): ext3_check_descriptors: Block bitmap for group 1920 not in group (block 0)!
[  731.770839] EXT3-fs: group descriptors corrupted!
ctrl + alt + F2 AND ctrl + alt + F8
/dev/md0: clean, 122820/122101760 files, 276659314/488379968 blocks
Also, how does RAID handle large deletes? Fine? Does it need a defragment of some kind? Any way to easily check for file system corruption?
Current OS is getting bloated with installing stuff hodge-podge. Need to start from scratch and get just what I need on it. Keep security in mind.
[computername]_[en0 MAC].sparsebundle
You should have mdadm report if any errors happen. This can be done by adding a MAILADDR line in /etc/mdadm.conf
echo "MAILADDR root" >> /etc/mdadm.conf
Or you could use an email address for the notification instead of root.
Start monitoring the raids eg by:
mdadm --monitor --scan --daemonise
Test that email notification is done by
mdadm --monitor --scan --test
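Putting the monitoring pieces together, a minimal /etc/mdadm.conf for this setup might look like the fragment below. This is a sketch: the ARRAY UUID is the one reported by `mdadm -E` above, while the DEVICE line and the choice of recipient are assumptions to adapt.

```
# /etc/mdadm.conf (sketch)
DEVICE /dev/sd[bcd]1
ARRAY /dev/md1 UUID=48c708ae:53236bb8:448213b5:08c3805a
# alerts from "mdadm --monitor" go here; can be a full email address
MAILADDR root
```

With MAILADDR set, the `--monitor --scan --test` command above should generate one test mail per array so you can confirm delivery works.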
I want to back up some of the irreplaceable data to a separate location. In case of disaster (fire, server meltdown), I at least have those most important files. Likely using rsync and cron. Take bandwidth into consideration if necessary.
Possibility to use tahoe fs and GridBackup as a backup medium for a friendnet. Hadoop and HDFS are possibilities too. ZFS is out of the running.
Found out that this service is not cloud-only: it allows backup amongst computers for free, and it runs on linux, mac, and win. I could use this to back up to other locations like Roland's or my parents'.
Installed CrashPlan and got the following post-install info:
CrashPlan has been installed and the Service has been started automatically.
Press Enter to complete installation.

Important directories:
  Installation:             /usr/local/crashplan
  Logs:                     /usr/local/crashplan/log
  Default archive location: /media/documents/Backups/CrashPlanBackups

Start Scripts:
  sudo /usr/local/crashplan/bin/CrashPlanEngine start|stop
  /usr/local/crashplan/bin/CrashPlanDesktop

You can run the CrashPlan Desktop UI locally as your own user or connect a remote Desktop UI to this Service via port-forwarding and manage it remotely. Instructions for remote management are in the readme files placed in your installation directory: /usr/local/crashplan/doc
Sometime in Spring or Summer of 2011, the lms server's hardware died. I tried many things to get it responsive. I think it was the motherboard that died, as it wouldn't even POST or beep. Nothing. Removing the CMOS battery didn't do anything.
I determined that to continue, I would need a replacement motherboard of the same model, or to start on a whole new machine. Replacement mobos were a bit expensive. I am actually thinking that I want to move off of the lms server and onto the mac mini, to make things easier and more integrated. This means I do not want to spend too much money fixing this machine only to immediately move to a different solution.
I ended up getting a new HP mini tower computer from my dad's work. It did not have 3 SATA ports, so I bought a decent Promise SATA card online for 50 or 60 bucks. I installed the card and then didn't touch the machine for months.
Now at the end of Oct, I have finally got the new machine hooked up and running in Bella's room. I had to try a few different hard drives for the main OS disk to make sure it was a quiet one. It was the large 320 GB HDD.
Put Ubuntu 11.10 on a USB stick (they have an awesome site and UX now! and the previews of their OS look nice!). Others have also told me it has improved a lot. They were right! I installed fine from the USB stick onto the 320 GB disk.
I immediately updated all advised updates in the UI. Took maybe 12 mins.
I got openssh running on it with
sudo apt-get install openssh-server
I used the GUI-based disk util to see the various disks. All 3x 1TB disks show. I also looked at their SMART status. All disks good except one. WARNING: One of the 1TB drives has had some bad sectors.
I want to get the raid array up and going asap.
Following notes at http://nickyeates.com/technology/unix/raid_filesystem_lvm I found that I had to install mdadm:
apt-get install mdadm
It went into a curses mode to install Postfix, the mail transport agent. I told it:
Started up existing raid array:
mdadm --verbose --assemble /dev/md0 /dev/sd[bcd]1
mdadm: looking for devices for /dev/md0
mdadm: /dev/sdb1 is identified as a member of /dev/md0, slot 1.
mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 2.
mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot 0.
mdadm: added /dev/sdb1 to /dev/md0 as 1
mdadm: added /dev/sdc1 to /dev/md0 as 2
mdadm: added /dev/sdd1 to /dev/md0 as 0
mdadm: /dev/md0 has been started with 3 drives.
Looks good:
cat /proc/mdstat
md0 : active raid5 sdd1[0] sdc1[2] sdb1[1]
      1953519872 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]
Mounted the raid array:
mkdir /media/documents
mount -t ext3 /dev/md0 /media/documents/
Edited /etc/fstab:
# raid5 mdadm filesystem /dev/md0 /media/documents ext3 defaults 0 2
Went into mounted dir and its all there! Yay!
See above section that I updated on this
In using the AFP shares on the mac mini for movies with Plex media server, I noticed that the directory listing was molasses slow. Also, the entire 'movies' dir wouldn't load at all after some time.
I started digging into log files and people with similar problems.
Error log for Netatalk, the AFP daemon:
tail -f /media/documents/.AppleDB/db_errlog
This file was giving same error over and over:
cnid2.db/cnid2.db: DB_SECONDARY_BAD: Secondary index inconsistent with primary
Finding last valid log LSN: file: 2 offset 7783453
Recovery starting from [2][7782965]
Recovery complete at Tue Mar  6 17:17:04 2012
Maximum transaction ID 80000004
Recovery checkpoint [2][7783453]
Users here had similar issues; they were basically deleting their entire CNID databases (the database that maps IDs → file names) and then restarting. Another set of information in the official Netatalk docs explained this CNID db, how it operates, what its role is, and a new command:
dbd -s .
The -s flag scans; -r rebuilds what the scan finds.
I decided I would move the /media/documents/.AppleDB dir (same as deleting it), and rebuild.
I shut down netatalk, moved .AppleDB, and ran a CNID rebuild:
dbd -r /media/documents/
The above command started checking and writing new CNIDs for every single file. Tons of lines flew by, one for each file it seemed.
After 45 mins of CNID rewrites, it stopped. I then started netatalk back up. The logs and db seem to start clean.
Initial directory loads take a bit of time still, but once they are loaded, they are cached and fast.
I want to get all of the data on this linux server over to a mac server, held on our current mac mini that will come from the living room. Now that we have 2 Apple TV devices, we don't need the mac mini as a TV device in the living room, freeing it for server use. My hope is that 1) I don't have to mess with such low-level stuff, piecing together 10 open source components when setting up hard drives, file shares, bonjour, etc, and 2) Sue can more easily modify, restart, or understand the data on this server.
Purchased a 3 TB HDD, and a USB enclosure that can take 4 drives.
Read an article suggesting I try the UDF file system, so that when I take the drive to the mac it can still be read natively. Ubuntu/linux can also format it. Little did I know, a bit of research was needed to format and mount it on linux.
Turns out UDF does not use a normal MBR (master boot record) at the beginning of the disk. It does its own thing, and you just gobble up the entire disk. So I messed with partitions at first, without knowing I didn't need to. Instead, it is best to first clear the MBR at the start of the disk and then format with udftools on ubuntu. See commands below.
sudo dd if=/dev/zero of=/dev/sde bs=512 count=1
sudo mkudffs --media-type=hd --blocksize=512 /dev/sde
Got the info from here:
When I look at the drive capacity with `df -h`, though, it says the drive's size is 747G, not 3 TB.
I could still test the 3TB UDF drive on the mac mini to see if it mounts and has files on it.