• basically, I think: given a short sound segment, classify which subword / character it corresponds to
  • then you need to assemble these per-segment predictions into actual words / sentences
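
A rough sketch of the two steps above, in Python. The notes don't say how the assembling is done; this assumes one common approach (CTC-style greedy decoding): take the best label per segment, merge consecutive repeats, then drop a special "blank" label. The names `classify_frames`, `collapse`, and the `BLANK` symbol are made up for illustration, and the per-segment scores would in practice come from some acoustic model that is not shown here.

```python
from typing import List, Sequence

BLANK = "_"  # hypothetical "no character here" label (an assumption, not from the notes)


def classify_frames(frame_scores: Sequence[Sequence[float]],
                    labels: Sequence[str]) -> List[str]:
    """Step 1: for each sound segment, pick the highest-scoring label."""
    picked = []
    for scores in frame_scores:
        best = max(range(len(scores)), key=lambda i: scores[i])
        picked.append(labels[best])
    return picked


def collapse(frame_labels: Sequence[str], blank: str = BLANK) -> str:
    """Step 2: merge consecutive repeats, then drop blanks (CTC-style)."""
    out = []
    prev = None
    for label in frame_labels:
        # Keep a label only when it differs from the previous segment's label
        # and is not the blank symbol.
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return "".join(out)
```

The blank label is what lets a genuinely doubled letter survive: `collapse(["h", "h", "_", "e", "l", "_", "l", "o"])` keeps both `l`s because a blank sits between them. Turning the resulting character string into properly spelled words / sentences would usually need something extra on top (e.g. a language model), which this sketch leaves out.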