Removing a Raid Group from a Netapp Filer Aggregate
Well, today at work I ran into a bit of a problem. We purchased four new hard drives for our Netapp FAS3020 storage system. I went into the web interface and added the drives to the appropriate aggregate. Everything worked and the drives were added, but only half the storage I expected was added. I poked around and found that a new raid group had been formed and added to the aggregate. This was obviously not what I was expecting, so I dug into the issue.

I checked the aggregate settings and found it was configured with raidsize=10, which I now know means that a raid group within that aggregate can contain at most ten drives. I had 20 drives prior to the upgrade and 24 after, so what I ended up with was:

Plex /aggr1/plex0: online, normal, active
RAID group /aggr1/plex0/rg0: normal - 10 drives
RAID group /aggr1/plex0/rg1: normal - 10 drives
RAID group /aggr1/plex0/rg2: normal - 4 drives
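The layout above comes from the filer's console. On a Data ONTAP 7-mode filer, the plex and raid group layout of an aggregate can be inspected like this (a sketch, using the aggr1 name from this post; these commands only run on the filer itself):

```shell
# Data ONTAP 7-mode console commands (not runnable on an ordinary host).
aggr status -r aggr1   # shows each plex and raid group with its member disks
aggr options aggr1     # shows the aggregate's options, including raidsize
```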

I think to myself, ok, I must be able to change this (did I mention that this was a production aggregate with 10 volumes on it?). First avenue of exploration: can't I just remove the rg2 raid group, change the maximum raid group size, and re-add the drives? Short answer: no.

Long answer:
You can't do this because of a couple of limitations on the filer (which initially surprised me, but once you think about how WAFL is designed it makes a little more sense):

1. Once you add a raid group to an aggregate you cannot remove it. That's right: you can never remove a raid group from an aggregate short of destroying the aggregate itself.

2. RAID group size cannot be adjusted, except for the last raid group in an aggregate. This means that once rg2 was created I couldn't go back and resize rg0 and rg1. To change the size of rg2 you can run:
aggr options aggr1 raidsize 14
which sets the raidsize to 14 drives for all future raid groups in this aggregate.
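Putting those two commands together, the only forward path from here is to raise the cap and grow rg2 with future disk purchases (a sketch on the 7-mode console, assuming the aggr1 aggregate from this post; the disk count is illustrative):

```shell
# Data ONTAP 7-mode console sketch (not runnable on an ordinary host).
# Raise the cap; per the limitation above, only rg2, the last raid group,
# can grow toward the new size:
aggr options aggr1 raidsize 14
# Disks added later join rg2 until it reaches 14 members, after which a
# new raid group would be created:
aggr add aggr1 4
```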

This is what I learned by reading through Netapp's documentation, and it was confirmed by Netapp tech support, who told me it wouldn't be a problem: simply drop to the command line, take the aggregate offline, destroy it, and recreate it with the proper settings. After I explained that volumes existed on it, they said "oh", put me on hold for about ten minutes, and then came back indicating there was no way out of this situation.

So, at this point you may be wondering why we used a raidsize of ten in the first place. Well, it comes down to the consultants who set up the system not understanding how the aggregates/raid groups worked (or the two of us misunderstanding their in-depth explanation). Before deciding on raidsize=10 for this aggregate, they explained to us that we could fill our seven empty drive bays and add the disks to the existing raid groups. This was in error, and because every raid group reserves its own parity drives, the unnecessary extra raid group ended up costing us almost 1TB of space. Bottom line: don't change the default raidsize on aggregates unless you really know what you are doing.
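To see where that space went: with RAID-DP (which I'm assuming is in use here, as it's the usual default for aggregates on this platform), each raid group gives up two disks to parity. A back-of-the-envelope comparison of our 24 disks under raidsize=10 versus a hypothetical raidsize=12:

```shell
# Assuming RAID-DP: 2 parity disks per raid group.
# raidsize=10 -> raid groups of 10 + 10 + 4 disks
echo $(( (10 - 2) + (10 - 2) + (4 - 2) ))   # 18 data disks
# raidsize=12 -> raid groups of 12 + 12 disks
echo $(( (12 - 2) + (12 - 2) ))             # 20 data disks
```

The two data disks lost to the extra raid group's parity overhead are what add up to the capacity we lost.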

All of this happened on a FAS3020 running Data ONTAP 7.1.1, using FlexVols for all volumes.