Change Details

This is something I had wanted to do for a long time, the goal being a *complete* ctype map for UTF-8; I had simply missed the fact that we already have a definitive source of ctype information.

The change includes a bit of cleanup to make things simpler, but the main change is in utf8-rollup.pl: instead of using manually assembled definitions, it now parses UnicodeData.txt directly. The only issue is that there is no direct mapping between the general categories defined in UnicodeData.txt and the character classes defined by POSIX, so I used my best judgement.

The file format is described at:
http://www.unicode.org/reports/tr44/#UnicodeData.txt

The general category values are described at:
http://www.unicode.org/reports/tr44/#General_Category_Values
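For reviewers unfamiliar with the file, here is a rough sketch of the kind of parsing involved. It is written in Python rather than Perl and is not the code in utf8-rollup.pl. Each line of UnicodeData.txt is a list of semicolon-separated fields, where field 0 is the code point in hex and field 2 is the General_Category. Large blocks such as Hangul syllables are stored as a pair of lines whose names end in ", First>" and ", Last>", denoting an inclusive range. The category-to-class table `CATEGORY_TO_CLASSES` and the helper names `parse_unicode_data` and `build_class_table` are hypothetical and only illustrate the shape of a "best judgement" mapping. They are not the mapping this change actually uses.

```python
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

# HYPOTHETICAL mapping from Unicode General_Category values to POSIX ctype
# classes, for illustration only; it is NOT the mapping used by utf8-rollup.pl.
# It also ignores special cases, e.g. TAB and LF have category Cc in
# UnicodeData.txt but are POSIX "space" characters.
CATEGORY_TO_CLASSES: Dict[str, Tuple[str, ...]] = {
    "Lu": ("alpha", "upper"),
    "Ll": ("alpha", "lower"),
    "Lt": ("alpha",),
    "Lm": ("alpha",),
    "Lo": ("alpha",),
    "Nd": ("digit",),
    "Zs": ("space", "blank"),
    "Zl": ("space",),
    "Zp": ("space",),
    "Cc": ("cntrl",),
    # Punctuation (P*) and symbols (S*) lumped together as "punct".
    **{cat: ("punct",) for cat in ("Pc", "Pd", "Ps", "Pe", "Pi", "Pf", "Po",
                                   "Sm", "Sc", "Sk", "So")},
}


def parse_unicode_data(lines: Iterable[str]) -> Iterator[Tuple[int, int, str]]:
    """Yield (first, last, general_category) inclusive ranges from UnicodeData.txt lines."""
    range_start: Optional[int] = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split(";")
        code = int(fields[0], 16)   # field 0: code point in hex
        name = fields[1]            # field 1: character name
        category = fields[2]        # field 2: General_Category
        if name.endswith(", First>"):
            # Start of a range; the matching "Last" line closes it.
            range_start = code
            continue
        if name.endswith(", Last>"):
            if range_start is None:
                raise ValueError(f"range end without start: {line}")
            yield range_start, code, category
            range_start = None
            continue
        yield code, code, category


def build_class_table(lines: Iterable[str]) -> Dict[str, List[Tuple[int, int]]]:
    """Collect inclusive code point ranges for each (hypothetical) POSIX class."""
    table: Dict[str, List[Tuple[int, int]]] = {}
    for first, last, category in parse_unicode_data(lines):
        for cls in CATEGORY_TO_CLASSES.get(category, ()):
            table.setdefault(cls, []).append((first, last))
    return table


# A few lines in UnicodeData.txt format, including a First/Last range pair.
SAMPLE = """\
0020;SPACE;Zs;0;WS;;;;;N;;;;;
0030;DIGIT ZERO;Nd;0;EN;;0;0;0;N;;;;;
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
0061;LATIN SMALL LETTER A;Ll;0;L;;;;;N;;;0041;;0041
AC00;<Hangul Syllable, First>;Lo;0;L;;;;;N;;;;;
D7A3;<Hangul Syllable, Last>;Lo;0;L;;;;;N;;;;;
"""

table = build_class_table(SAMPLE.splitlines())
for cls in sorted(table):
    print(cls, [f"U+{a:04X}..U+{b:04X}" for a, b in table[cls]])
```

Keeping ranges instead of expanding every code point matters here: the real file encodes tens of thousands of CJK and Hangul code points as just a few First/Last pairs.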