This is something I have wanted to do for a long time, the goal being a *complete* ctype map for UTF-8; I had just overlooked the fact that we already have a definitive source of ctype information.
The only issue is that there is no direct mapping between the General_Category values defined in UnicodeData.txt and the character classes defined by POSIX, so I used my best judgement.
The format is described at: http://www.unicode.org/reports/tr44/#UnicodeData.txt
Categories are described at: http://www.unicode.org/reports/tr44/#General_Category_Values
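
To illustrate the kind of judgement call involved, here is a minimal sketch of turning one UnicodeData.txt record into a set of POSIX class names. The function names (`parse_line`, `posix_classes`) and the category-to-class choices are assumptions made for this example only; they are not the mapping used in this change, and First/Last range records are not handled.

```python
def parse_line(line):
    """Return (code point, General_Category) for one UnicodeData.txt record.

    Fields are semicolon-separated: field 0 is the code point in hex,
    field 2 is the General_Category value.
    """
    fields = line.rstrip("\n").split(";")
    return int(fields[0], 16), fields[2]


def posix_classes(category):
    """Map a General_Category value to POSIX class names.

    Hypothetical mapping for illustration only; several of these choices
    (e.g. treating every Nd as 'digit') are debatable.
    """
    classes = set()
    if category.startswith("L"):            # all letters
        classes |= {"alpha", "graph", "print"}
    if category == "Lu":                    # uppercase letter
        classes.add("upper")
    if category == "Ll":                    # lowercase letter
        classes.add("lower")
    if category == "Nd":                    # decimal digit (assumption)
        classes |= {"digit", "graph", "print"}
    if category.startswith(("P", "S")):     # punctuation and symbols
        classes |= {"punct", "graph", "print"}
    if category == "Zs":                    # space separator
        classes |= {"space", "print"}
    if category == "Cc":                    # control character
        classes.add("cntrl")
    return classes


if __name__ == "__main__":
    record = "0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;"
    codepoint, category = parse_line(record)
    print(hex(codepoint), category, sorted(posix_classes(category)))
```

Categories that fall through every branch (for example Mn or Cf) get an empty set here, which is exactly where a real mapping has to make a decision.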