Table of Contents

Testing SATA drives

Test EVMS / LVM2 HDDs setup works

GUI boot problem on Ubuntu 7.04 -> 7.10

Get EVMS SATA HDDs working simultaneously

Installed EVMS on Ubuntu 7.10

Boot doesn't find /root HDD

Fixed GUI boot problem

No SATA

Unsuccessful 8.04 Install

Need to return 1TB Best Buy HDD

Tried to transfer

Installed temp winXP

Wrote plans

Successful 8.04 install

Get SATA to work on 8.04

Get EVMS to work on 8.04

Download packages

Patch, config, build custom kernel

Build and install EVMS tools 2.6.26.3

Activate EVMS volumes

Mount LVM2 drive

No Success

Ideas, regroup, rethink

Tried EVMS on "original" 7.04 clean

No Success

RAID-5 with 3x 1TB drives

Initialize disks

Create raid disk array

setup and create file system

NFS Share

Little success, fuck it

Still want to measure speed

Bittorrent and Large RAID test files downloaded

AFP and Avahi for Apple filesharing

Instructions modified from:

Install Netatalk

Config Netatalk (AFP)

Config Shared Volumes

Install Avahi (Bonjour)

Config Avahi and advertise services

Test on Macs

Worked Perfect!

2-2.5 MBps transfers! Yay

RAID monitor commands

Install email MTA (postfix? mailx?)

Network issue

Mess with lms

Mess with macmini

Gave up for now

Bad Shutdown??

More mess with lms

Solution: turn off static NAS on router

VNC setup

Snort Install

Open SSH install

Zenoss install

Device population

SNMP install

Server slowness

Retry to recover 2x 320 GB HDDs

Review and learning - dd fdisk cmds

Test Bitwise Copy

Test DD run

as root:
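(The command itself wasn't captured here; given the 10.2 GB /dev/sdd and the image name in the fdisk run below, it was presumably something like the following, with bs chosen arbitrarily:)

dd if=/dev/sdd of=/media/documents/Backups/lms_Ubuntu_804_evms_kernel.img bs=1M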

Test fdisk

root@lms:/home/nyeates1# fdisk -l /dev/sdd
Disk /dev/sdd: 10.2 GB, 10245537792 bytes
255 heads, 63 sectors/track, 1245 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x3b14a989
 
   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1   *           1        1186     9526513   83  Linux
/dev/sdd2            1187        1245      473917    5  Extended
/dev/sdd5            1187        1245      473886   82  Linux swap / Solaris
root@lms:/home/nyeates1# 
root@lms:/home/nyeates1# 
root@lms:/home/nyeates1# fdisk -l /media/documents/Backups/lms_Ubuntu_804_evms_kernel.img 
You must set cylinders.
You can do this from the extra functions menu.
 
Disk /media/documents/Backups/lms_Ubuntu_804_evms_kernel.img: 0 MB, 0 bytes
255 heads, 63 sectors/track, 0 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x3b14a989
 
                                                  Device Boot      Start         End      Blocks   Id  System
/media/documents/Backups/lms_Ubuntu_804_evms_kernel.img1   *           1        1186     9526513   83  Linux
Partition 1 has different physical/logical endings:
     phys=(1023, 254, 63) logical=(1185, 254, 63)
/media/documents/Backups/lms_Ubuntu_804_evms_kernel.img2            1187        1245      473917    5  Extended
Partition 2 has different physical/logical beginnings (non-Linux?):
     phys=(1023, 254, 63) logical=(1186, 0, 1)
Partition 2 has different physical/logical endings:
     phys=(1023, 254, 63) logical=(1244, 254, 63)
/media/documents/Backups/lms_Ubuntu_804_evms_kernel.img5            1187        1245      473886   82  Linux swap / Solaris
(From the fdisk(8) man page:) Whenever a partition table is printed out, a consistency check is performed on the partition table entries. This check verifies that the physical and logical start and end points are identical, and that the partition starts and ends on a cylinder boundary (except for the first partition).
Expert command (m for help): w
The partition table has been altered!
 
Calling ioctl() to re-read partition table.
 
WARNING: Re-reading the partition table failed with error 25: Inappropriate ioctl for device.
The kernel still uses the old table.
The new table will be used at the next reboot.
Syncing disks.

Production Bitwise Copy

Have decided that I will now go ahead with a bitwise copy of the 2x 320 GB PATA drives.

Do PATA drives work?

ls sd*
sda  sda1  sdb  sdb1  sdc  sdc1  sdd  sdd1  sdd2  sdd5  sde  sdf
Disk /dev/sda: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x860fdec6
 
   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1      121601   976760001   fd  Linux raid autodetect
 
Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000
 
   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1      121601   976760001   fd  Linux raid autodetect
 
Disk /dev/sdc: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x0008bc1e
 
   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1      121601   976760001   fd  Linux raid autodetect
 
Disk /dev/sdd: 10.2 GB, 10245537792 bytes
255 heads, 63 sectors/track, 1245 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x3b14a989
 
   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1   *           1        1186     9526513   83  Linux
/dev/sdd2            1187        1245      473917    5  Extended
/dev/sdd5            1187        1245      473886   82  Linux swap / Solaris
 
Disk /dev/sde: 320.0 GB, 320072933376 bytes
255 heads, 63 sectors/track, 38913 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x39713970
 
   Device Boot      Start         End      Blocks   Id  System
 
Disk /dev/sdf: 320.0 GB, 320072933376 bytes
255 heads, 63 sectors/track, 38913 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x39713971
 
   Device Boot      Start         End      Blocks   Id  System
 
Disk /dev/md0: 2000.4 GB, 2000404348928 bytes
2 heads, 4 sectors/track, 488379968 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk identifier: 0x00000000
 
Disk /dev/md0 doesn't contain a valid partition table
  PV /dev/sdf   VG Media   lvm2 [298.06 GB / 0    free]
  PV /dev/sde   VG Media   lvm2 [298.06 GB / 0    free]
  Total: 2 [596.12 GB] / in use: 2 [596.12 GB] / in no VG: 0 [0   ]

DD Run

as root:
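(The commands weren't captured; judging from the two 320 GB transfer reports below and the sdf image name, it was likely one dd per PATA drive. The sde image name here is my guess:)

dd if=/dev/sde of=/media/documents/Backups/lms_pata_320GB_sde.img bs=1M
dd if=/dev/sdf of=/media/documents/Backups/lms_pata_320GB_sdf.img bs=1M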

305245+1 records in
305245+1 records out
320072933376 bytes (320 GB) copied, 6398.41 s, 50.0 MB/s
305245+1 records in
305245+1 records out
320072933376 bytes (320 GB) copied, 6559.71 s, 48.8 MB/s
-rw-r--r--  1 root     nyeates1 233G 2009-08-03 03:38 lms_pata_320GB_sdf.img
ls -lah /media/documents/Backups/
-rw-r--r--  1 root     nyeates1 299G 2009-08-03 04:06 lms_pata_320GB_sdf.img
nyeates1@lms:~$ cd /media/documents/Backups/
nyeates1@lms:/media/documents/Backups$ sudo chmod 444 lms_pata_320GB_sdf.img 
[sudo] password for nyeates1: 
nyeates1@lms:/media/documents/Backups$ ls -lah /media/documents/Backups/
-r--r--r--  1 root     nyeates1 299G 2009-08-03 04:06 lms_pata_320GB_sdf.img

Try to mount 320GB disks - from pvscan cmd

root@lms:~# pvscan
  PV /dev/sdc   VG Media   lvm2 [298.06 GB / 0    free]
  PV /dev/sdb   VG Media   lvm2 [298.06 GB / 0    free]
  Total: 2 [596.12 GB] / in use: 2 [596.12 GB] / in no VG: 0 [0   ]
root@lms:~# 
root@lms:~# pvdisplay
  --- Physical volume ---
  PV Name               /dev/sdc
  VG Name               Media
  PV Size               298.09 GB / not usable 29.34 MB
  Allocatable           yes (but full)
  PE Size (KByte)       32768
  Total PE              9538
  Free PE               0
  Allocated PE          9538
  PV UUID               L5MYAm-YTkK-WXJn-3Y8N-I04A-sKzc-SoJcUM
   
  --- Physical volume ---
  PV Name               /dev/sdb
  VG Name               Media
  PV Size               298.09 GB / not usable 29.34 MB
  Allocatable           yes (but full)
  PE Size (KByte)       32768
  Total PE              9538
  Free PE               0
  Allocated PE          9538
  PV UUID               qgw2Hv-ssfb-7g7G-jqXd-1TeQ-erKc-1B2PtT
 
root@lms:~#
root@lms:~# vgdisplay
  --- Volume group ---
  VG Name               Media
  System ID
  Format                lvm2
  Metadata Areas        2
  Metadata Sequence No  12
  VG Access             read/write
  VG Status             resizable
  MAX LV                256
  Cur LV                1
  Open LV               0
  Max PV                256
  Cur PV                2
  Act PV                2
  VG Size               596.12 GB
  PE Size               32.00 MB
  Total PE              19076
  Alloc PE / Size       19076 / 596.12 GB
  Free  PE / Size       0 / 0
  VG UUID               MV2Hm7-NZ7O-drJ6-tDfe-TXhS-VHS1-ekqx38
 
root@lms:~#
root@lms:~# lvdisplay
  --- Logical volume ---
  LV Name                /dev/Media/CargoPlane
  VG Name                Media
  LV UUID                xwvmre-dhz6-cQ7L-rBLt-knUw-Q1NB-ZqMqVK
  LV Write Access        read/write
  LV Status              available
  # open                 0
  LV Size                596.12 GB
  Current LE             19076
  Segments               2
  Allocation             inherit
  Read ahead sectors     0
  Block device           253:0
 
root@lms:~# vgscan
  Reading all physical volumes.  This may take a while...
  Found volume group "Media" using metadata type lvm2
root@lms:~# vgchange -a y
  1 logical volume(s) in volume group "Media" now active
root@lms:/dev/mapper# mkdir /media/olddocuments
root@lms:/dev/mapper# mount -t ext3 /dev/Media/CargoPlane /media/olddocuments/

Copy stuff over

Check LVM 320GB array with smartctl fsck or bonnie

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%     10766         -
# 1  Extended offline    Completed without error       00%     11043         -
# 2  Extended captive    Interrupted (host reset)      90%     11040         -

RAID tests

Now that I have all of the old LVM data over on the production RAID (3x 1TB in RAID-5), I can focus my efforts on the production RAID and a new OS setup. First, I need to understand how to test, use, and fix RAID in case anything needs to change or something bad happens. I had originally wanted to run these tests before implementing RAID, to make sure it was a good decision; I skipped that for lack of time, and now I am returning to it.

I need to test the important use cases. If I want to expand the array with another hard drive, will it do so with ease while keeping the data safe? If I switch OS distributions, will I still be able to mount it? And error handling: if one drive goes down, how do I repair it?

See my list of RAID tests at: raid_test

Prep for tests

Setup 3rd 320GB disk in LMS machine

Get data from remaining 320GB disk to RAID array

root@lms:/dev# mkdir /media/transit2
root@lms:/dev# mount -t ntfs /dev/sde5 /media/transit2/

Get rid of 320GB .img files

Copy data

diff -qr /media/transit2/ /media/documents/transit2/

Side Project: SSH Key authentication
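(No notes survived for this side project; the standard flow, assuming key-based login to lms from another machine, is:)

ssh-keygen -t rsa        # on the client, generate a keypair
ssh-copy-id nyeates1@lms # append the public key to authorized_keys on lms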

Unplug 3x 1TB drives array, prep for reboot

This is for the safety of the data on these drives. Make sure that the fstab won't care if it's missing.

First though, get drive setup to be ready correctly.

Clear data, LVM, partition info on all 320GB disks

Arman came, got movies

Arman was visiting and had some good movies on a USB drive. I turned the machine off, reconnected the SATA drives, turned it on, mounted, and copied the movies over.

sudo umount /dev/md0
umount: /media/documents: device is busy
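Not in the original notes, but the standard way to find what is holding the mount busy:

fuser -vm /media/documents    # lists processes with files open on the mount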

Create new RAID array on 3x 320GB disks

Follow notes from the first creation of the RAID array. Very good documentation at: raid_filesystem_lvm

Which devices?

Put partitions on devices (drives)

See my drives and partitions!

Create array

Notes
Output
root@lms:~# mdadm --verbose --create /dev/md1 --level=5 --raid-devices=3 /dev/sd[bcd]1
mdadm: layout defaults to left-symmetric
mdadm: chunk size defaults to 64K
mdadm: size set to 312568576K
mdadm: array /dev/md1 started.
root@lms:~# cat /proc/mdstat 
Personalities : [linear] [raid6] [raid5] [raid4] [multipath] [faulty] 
md1 : active raid5 sdd1[3] sdc1[1] sdb1[0]
      625137152 blocks level 5, 64k chunk, algorithm 2 [3/2] [UU_]
      [==>..................]  recovery = 10.2% (31882484/312568576) finish=108.9min speed=42941K/sec
      
unused devices: <none>

Create filesystem on RAID device

Output
root@lms:/media/documents# mkfs.ext3 -v -m .1 -b 4096 -E stride=16,stripe-width=32 /dev/md1
mke2fs 1.40.8 (13-Mar-2008)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
39075840 inodes, 156284288 blocks
0 blocks (0.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
4770 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks: 
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
        4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968, 
        102400000
 
Writing inode tables: done                            
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done
 
This filesystem will be automatically checked every 28 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

Mount filesystem
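(The exact commands weren't recorded; presumably the same pattern as md0, with /media/test as the mount point, since that directory shows up in the listings below:)

mkdir /media/test
mount -t ext3 /dev/md1 /media/test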

Add to fstab file
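(Also not recorded; a likely line, mirroring the md0 entry used later:)

# raid5 mdadm filesystem (test array)
/dev/md1        /media/test      ext3   defaults        0       2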

Put some fake data on new RAID, come up with verification schema

Need to mount it at /media/documents so that I can get stuff onto it through the share that is already set up with AFP.

Current ownership

nyeates1@lms:/media$ ls -la
total 24
drwxr-xr-x  6 root root 4096 2009-08-19 01:39 .
drwxr-xr-x 21 root root 4096 2008-09-27 15:43 ..
lrwxrwxrwx  1 root root    6 2008-08-22 20:04 cdrom -> cdrom0
drwxr-xr-x  2 root root 4096 2008-08-22 20:04 cdrom0
drwxr-xr-x  4 root root 4096 2009-08-19 12:37 documents
lrwxrwxrwx  1 root root    7 2008-08-22 20:04 floppy -> floppy0
drwxr-xr-x  2 root root 4096 2008-08-22 20:04 floppy0
-rw-r--r--  1 root root    0 2009-08-19 01:39 .hal-mtab
drwxr-xr-x  2 root root 4096 2009-08-19 00:06 test
nyeates1@lms:/media$ cd documents
nyeates1@lms:/media/documents$ ls -la
total 28
drwxr-xr-x 4 root root  4096 2009-08-19 12:37 .
drwxr-xr-x 6 root root  4096 2009-08-19 01:39 ..
drwxr-xr-x 2 root root  4096 2009-08-19 12:37 .AppleDB
drwx------ 2 root root 16384 2009-08-18 23:41 lost+found

MD5 Checksum / hash verification method

Started reading into ways to verify that the data is the same each time I make changes to the hdd array, etc. MD5 is now considered insecure for public applications, but I think it is reasonable to use in this private, trusted, simple environment.

Basically, you can get the md5 hash of a disk before and after some change to the underlying structure (or of a copy of the disk) and compare the hashes. If even one bit has changed, the hashes will differ.
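At the device level, that check is just (md5sum is the stock coreutils tool; /dev/md1 is the test array):

md5sum /dev/md1    # record this hash before the change
md5sum /dev/md1    # run again after; identical hashes mean identical bits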

root@lms:~# time sum /dev/md1
60173 625137152
real    111m46.343s
user    35m6.684s
sys     29m21.198s
 
root@lms:~# time cksum /dev/md1
735903248 640140443648 /dev/md1
real    114m37.638s
user    44m2.225s
sys     27m30.415s

cksum on data

nyeates1@lms:/tmp/raidTests$ cat cksumScript.sh 
# checksum every file under /media/test/, in stable sorted order
find /media/test/ -type f -print | sort | while read -r FNAME
do
    cksum "${FNAME}"
done

Found that restart clears the /tmp directory

I had the RAID test data in /tmp, which gets cleared on reboot. Had to move it to my user's home directory.

I ran 2 sets of baseline cksum tests.

  1. 00-Original: from the start, no changes; has the set of movie files on the drive
  2. 01-Reboot: I rebooted the machine; the md array was remounted, but the files shouldn't have been touched
    • no difference in the diffs
    • there IS a difference in the cksum of /dev/md1 :-( NO GOOD

ULTIMATE VERIFICATION CODE

Run this over and over after each RAID test

sh /home/nyeates1/raidTests/cksumScript.sh > /home/nyeates1/raidTests/320gb-raid-somename-dir.crc
cksum /dev/md1 > /home/nyeates1/raidTests/320gb-raid-somename-md1.crc

Run tests on 3x 320 GB RAID

See my list of RAID tests at: raid_test

Prep

grow

Add new drive
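(The add/grow commands themselves weren't captured; the usual mdadm sequence for going from 3 to 4 devices would be roughly the following, with /dev/sde1 standing in for the new disk:)

mdadm /dev/md1 --add /dev/sde1           # add the new disk as a spare
mdadm --grow /dev/md1 --raid-devices=4   # reshape across 4 devices
resize2fs /dev/md1                       # then grow the ext3 fs to fill the array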

root@lms:~# mdadm -E /dev/sd[bcd]1
/dev/sdb1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 48c708ae:53236bb8:448213b5:08c3805a (local to host lms)
  Creation Time : Mon Aug 10 01:39:35 2009
     Raid Level : raid5
  Used Dev Size : 312568576 (298.09 GiB 320.07 GB)
     Array Size : 625137152 (596.18 GiB 640.14 GB)
   Raid Devices : 3
  Total Devices : 3
Preferred Minor : 1
 
    Update Time : Tue Sep  8 23:10:46 2009
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0
       Checksum : 3a2492f1 - correct
         Events : 0.8
 
         Layout : left-symmetric
     Chunk Size : 64K
 
      Number   Major   Minor   RaidDevice State
this     0       8       17        0      active sync   /dev/sdb1
 
   0     0       8       17        0      active sync   /dev/sdb1
   1     1       8       33        1      active sync   /dev/sdc1
   2     2       8       49        2      active sync   /dev/sdd1
 
/dev/sdc1:
Basically same as above
 
/dev/sdd1:
Basically same as above

Test Chksum

yank a device

Soft yank

Hard Yank

Returned 4th HDD

Had to return it to Best Buy now, so I turned off the machine and yanked it.

Battery died

Unrelated: the battery died for the UPS power system connected to the server. I finally ordered a new battery and got it up and running in no time. The server is as it was, battery working fine. ~10/15/09

Also of note is that it did a disk check (fsck?) on the first boot up from power off. I was looking for boot logs, and I think I learned that ubuntu has a bug of not outputting boot data to a log (WTF?!). I think the file systems checked out ok.

Question: How do I really know if my file systems or drives are starting to go bad?
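One periodic answer, using the same smartctl tool as in the self-test log above (device name illustrative):

smartctl -H /dev/sda             # overall SMART health verdict
smartctl -l selftest /dev/sda    # history of past self-test results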

State the machine is in now

I think the machine still thinks there are 4 drives in the RAID array, and that only 3 are operating. This is the test array, which doesn't matter as much now. I think I should next shrink the fs down to size, then bring the array down to 3 drives and the right size, etc. Then I can unplug the shits.

Shrink

From man page for resize2fs: "If you wish to shrink an ext2 partition, first use resize2fs to shrink the size of filesystem. Then you may use fdisk(8) to shrink the size of the partition. When shrinking the size of the partition, make sure you do not make it smaller than the new size of the ext2 filesystem!"
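(The actual shrink commands never made it into these notes; a sketch of the usual order per the resize2fs and mdadm man pages. Sizes are illustrative and the backup-file path is a placeholder:)

umount /dev/md1
e2fsck -f /dev/md1                    # resize2fs requires a clean fs
resize2fs /dev/md1 550G               # shrink the fs below the 3-drive array size
mdadm --grow /dev/md1 --array-size=625137152                       # 3-drive usable size, in KB
mdadm --grow /dev/md1 --raid-devices=3 --backup-file=/root/md1.bak # reshape down to 3 devices
resize2fs /dev/md1                    # grow the fs back out to fill the array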

root@lms:~# mdadm -E /dev/sd[bcd]1
/dev/sdb1:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 48c708ae:53236bb8:448213b5:08c3805a (local to host lms)
  Creation Time : Mon Aug 10 01:39:35 2009
     Raid Level : raid5
  Used Dev Size : 312568576 (298.09 GiB 320.07 GB)
     Array Size : 937705728 (894.27 GiB 960.21 GB)
   Raid Devices : 4
  Total Devices : 3
Preferred Minor : 1
 
    Update Time : Sat Oct 31 07:56:14 2009
          State : clean
 Active Devices : 3
Working Devices : 3
 Failed Devices : 1
  Spare Devices : 0
       Checksum : 3a6ff60e - correct
         Events : 0.208122
 
         Layout : left-symmetric
     Chunk Size : 64K
 
      Number   Major   Minor   RaidDevice State
this     0       8       17        0      active sync   /dev/sdb1
 
   0     0       8       17        0      active sync   /dev/sdb1
   1     1       8       33        1      active sync   /dev/sdc1
   2     2       8       49        2      active sync   /dev/sdd1
   3     3       0        0        3      faulty removed
 
/dev/sdc1:
Basically same as above
 
/dev/sdd1:
Basically same as above
root@lms:~# mdadm --verbose --remove detached /dev/md1
mdadm: error opening detached: No such file or directory
root@lms:~# mdadm --verbose --remove failed /dev/md1
mdadm: error opening failed: No such file or directory
root@lms:~# mdadm --verbose --remove /dev/md1
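Those "error opening" failures look like a syntax problem: in manage mode the array comes first and the keyword follows --remove, i.e. (per the mdadm man page):

mdadm /dev/md1 --remove failed
mdadm /dev/md1 --remove detached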

Delete array

[   19.177553] md: bind<sdc1>
[   19.177721] md: bind<sdd1>
[   19.177882] md: bind<sdb1>
[   19.198859] raid5: device sdb1 operational as raid disk 0
[   19.198862] raid5: device sdd1 operational as raid disk 2
[   19.198865] raid5: device sdc1 operational as raid disk 1
[   19.202247] input: Power Button (FF) as /class/input/input3
[   19.223305] raid5: allocated 4274kB for md1
[   19.223308] raid5: raid level 5 set md1 active with 3 out of 4 devices, algorithm 2
[   19.223311] RAID5 conf printout:
[   19.223313]  --- rd:4 wd:3
[   19.223315]  disk 0, o:1, dev:sdb1
[   19.223316]  disk 1, o:1, dev:sdc1
[   19.223318]  disk 2, o:1, dev:sdd1
[  731.742440] EXT3-fs error (device md1): ext3_check_descriptors: Block bitmap for group 1920 not in group (block 0)!
[  731.770839] EXT3-fs: group descriptors corrupted!

RE-PREP md0 original array

ctrl + alt + F2    AND    ctrl + alt + F8    (switching between virtual consoles)
/dev/md0: clean, 122820/122101760 files, 276659314/488379968 blocks

Clean out RAID, get rid of cruft

Delete dd-created .img files from RAID

Also, how does RAID handle large deletes? Fine? Does it need a defragment of some kind? Any way to easily check for file system corruption?
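For the corruption question, one low-risk check (my addition, not from the original notes) is a read-only fsck with the array unmounted:

umount /dev/md0
e2fsck -n /dev/md0    # -n opens read-only and answers no to all fix prompts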

New Stable OS Setup

The current OS is getting bloated from installing stuff hodge-podge. Need to start from scratch and get just what I need on it. Keep security in mind.

List of packages I know I will want/need

Setup Time Machine

Monitor RAID-5

Via built-in mdadm Manage Mode

You should have mdadm report if any errors happen. This can be done by adding a MAILADDR line in /etc/mdadm.conf

echo "MAILADDR root" >> /etc/mdadm.conf

Or you could use an email address for the notification instead of root.

Start monitoring the RAIDs, e.g. by:

mdadm --monitor --scan --daemonise

Test that email notification works by:

mdadm --monitor --scan --test

With Zenoss

Test

Remote Backup

I want to back up some of the irreplaceable data to a separate location. In case of disaster, fire, or server meltdown, I at least have those most important files. Likely use rsync and cron. Take bandwidth into consideration if necessary.
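A sketch of the likely rsync-plus-cron setup (host and paths are placeholders):

# crontab entry: push the irreplaceable files offsite nightly at 3am
0 3 * * * rsync -az --delete /media/documents/Irreplaceable/ user@offsite:/backups/lms/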

Possibility to use tahoe fs and GridBackup as a backup medium for a friendnet. Hadoop and HDFS are possibilities too. ZFS is out of the running.

CrashPlan

Found out that this service is not just cloud-based: it allows free backup among computers, and it runs on Linux, Mac, and Windows. I could use this to back up to other locations like Roland's or my parents'.

Installed CrashPlan and got the following post-install info:

CrashPlan has been installed and the Service has been started automatically.

Press Enter to complete installation. 

Important directories:
  Installation:
    /usr/local/crashplan
  Logs:
    /usr/local/crashplan/log
  Default archive location:
    /media/documents/Backups/CrashPlanBackups

Start Scripts:
  sudo /usr/local/crashplan/bin/CrashPlanEngine start|stop
  /usr/local/crashplan/bin/CrashPlanDesktop

You can run the CrashPlan Desktop UI locally as your own user or connect
a remote Desktop UI to this Service via port-forwarding and manage it
remotely. Instructions for remote management are in the readme files
placed in your installation directory:
  /usr/local/crashplan/doc

RIP: LMS - Server Died

Sometime in spring or summer of 2011, the lms server's hardware died. I tried many things to get it responsive. I think it was the motherboard that died, as it wouldn't even POST or beep. Nothing. CMOS battery removal didn't do anything.

I determined that to continue, I would need a replacement motherboard of the same model, or to start on a whole new machine. Replacement mobos were a bit expensive. I am actually thinking that I want to move off of the lms server and onto the mac mini, to make things easier and more integrated. This means I do not want to spend too much money fixing this machine only to immediately move to a different solution.

New Server

I ended up getting a new HP mini-tower computer from my dad's work. It did not have 3 SATA ports, so I bought a decent Promise SATA card online for 50 or 60 bucks. I installed the card and then didn't touch the machine for months.

Now at the end of October, I have finally got the new machine hooked up and running in Bella's room. I had to try a few different hard drives for the main OS disk to make sure it was a quiet one. It ended up being the large 320 GB HDD.

New OS install

Put Ubuntu 11.10 on a USB stick (they have an awesome site and UX now, and the previews of their OS look nice!). Others have also told me that it has improved a lot. They were right! I installed fine from the USB stick onto the 320 GB disk.

Initial Server Setup

I immediately installed all the advised updates in the UI. Took maybe 12 mins.

I got openssh running on it with

sudo apt-get install openssh-server

I used the GUI-based disk utility to see the various disks. All 3x 1TB disks show. I also looked at their SMART status. All disks are good except one. WARNING: one of the 1TB drives has had some bad sectors.

Raid startup

I want to get the raid array up and going asap.

Following notes at http://nickyeates.com/technology/unix/raid_filesystem_lvm I found that I had to install mdadm:

apt-get install mdadm

It went into a curses mode to install Postfix, the mail transport agent. I told it:

Started up existing raid array:

mdadm --verbose --assemble /dev/md0 /dev/sd[bcd]1

mdadm: looking for devices for /dev/md0
mdadm: /dev/sdb1 is identified as a member of /dev/md0, slot 1.
mdadm: /dev/sdc1 is identified as a member of /dev/md0, slot 2.
mdadm: /dev/sdd1 is identified as a member of /dev/md0, slot 0.
mdadm: added /dev/sdb1 to /dev/md0 as 1
mdadm: added /dev/sdc1 to /dev/md0 as 2
mdadm: added /dev/sdd1 to /dev/md0 as 0
mdadm: /dev/md0 has been started with 3 drives.

Looks good:

cat /proc/mdstat 

md0 : active raid5 sdd1[0] sdc1[2] sdb1[1]
      1953519872 blocks level 5, 64k chunk, algorithm 2 [3/3] [UUU]

Mounted the raid array:

mkdir /media/documents
mount -t ext3 /dev/md0 /media/documents/

Edited /etc/fstab:

# raid5 mdadm filesystem
/dev/md0        /media/documents ext3   defaults        0       2 

Went into the mounted dir and it's all there! Yay!

Got AFP and Bonjour equivalents running

See the above section on AFP and Avahi, which I updated with this.

Debugging dir list slowness

While using the AFP shares on the mac mini for movies with Plex Media Server, I noticed that directory listing was molasses slow. Also, the entire 'movies' dir wouldn't load at all after some time.

I started digging into log files and into reports from people with similar problems.

Error log for Netatalk, the AFP daemon:

tail -f /media/documents/.AppleDB/db_errlog

This file was giving the same error over and over:

cnid2.db/cnid2.db: DB_SECONDARY_BAD: Secondary index inconsistent with primary
Finding last valid log LSN: file: 2 offset 7783453
Recovery starting from [2][7782965]
Recovery complete at Tue Mar  6 17:17:04 2012
Maximum transaction ID 80000004 Recovery checkpoint [2][7783453]

Users here had similar issues; they were basically deleting their entire CNID databases (the database that maps IDs → file names) and then restarting. Another set of information in the official Netatalk docs explained this CNID db: how it operates, what its role is, and a new command:

dbd -s .

The -s scans. -r rebuilds what the scan finds.

I decided I would move the /media/documents/.AppleDB dir (same as deleting it), and rebuild.

I shut down netatalk, moved .AppleDB, and ran a CNID rebuild:

dbd -r /media/documents/

The above command started checking and writing new CNIDs for every single file. Tons of lines flew by, seemingly one per file.

45 mins of CNID rewrites, and it stopped. I then started netatalk back up. Logs and db seem to start fresh.

Initial directory loads still take a bit of time, but once loaded they are cached and fast.

Move data to Mac Mini Server

I want to get all of the data on this linux server over to a mac server: our current mac mini, which will come out of the living room. Now that we have 2 Apple TV devices, we don't need the mac mini as a TV device in the living room, freeing it for server use. My hope is that 1) I don't have to mess with such low-level stuff, piecing together 10 open source components, when setting up hard drives and file shares and bonjour, etc., and 2) Sue can more easily modify, restart, or understand the data on this server.

Plan

Purchased a 3 TB HDD, and a USB enclosure that can take 4 drives.

  1. Plug this enclosure in, with the 3TB drive in it, into the Linux Media Server.
  2. Format and mount the 3 TB drive on LMS, now we have the 2 TB raid and 3 TB hdd
  3. Copy the files over to 3 TB hdd
    1. check that copy is correct
  4. Install mac mini and get server edition running on it (cheap app store purchase)
  5. Bring 3TB hdd over to mac mini (via usb enclosure), mount it,
    1. check that files show
    2. check that streaming is fast enough
  6. Start Backblaze backup immediately
  7. Unplug 3 x 1TB drives from LMS, slide them into USB enclosure on mac mini
  8. Format the 3 x 1TB drives into a mac raid-jbod (just a bunch of disks)
    1. there is no fault tolerance here, but I will set up a backup solution via Backblaze or the like
  9. Copy the files from 3 TB hdd to 3 x 1 TB raid-jbod
  10. Format the 3 TB hdd on mac mini
  11. Add the 3 TB hdd into the raid-jbod array, to create a 6TB JBOD array
  12. Install a menu-bar utility that allows SMART monitoring status and emails

Format and Mount 3 TB as UDF

Read an article saying that I should try the UDF file system, so that when I take the drive to the mac it can still be read natively. Ubuntu/linux can also format it. Little did I know, there was a bit of research needed to format and mount it on linux.

Turns out UDF does not use a normal MBR (master boot record) at the beginning of the disk. It does its own thing, and you just gobble up the entire disk. So I messed with partitions at first, without knowing I didn't need to. Instead, it is best to first clear the MBR at the start of the disk and then format with udftools on ubuntu. See commands below.

sudo dd if=/dev/zero of=/dev/sde bs=512 count=1
sudo mkudffs --media-type=hd --blocksize=512 /dev/sde

Got the info from here:

When I look at the drive capacity with `df -h`, though, it says the drive's size is 747G. Not 3 TB :-( (Suspiciously, 3 TB is about 2794 GiB, and 2794 - 2048 = 746, so this looks like a 2 TiB / 32-bit sector-count limit somewhere in the stack.)

I could still test the 3TB UDF drive on the mac mini to see if it mounts and has files on it.