Parsing a complex dataset with Hadoop