For a chemical, you may look it up in a database and see the literature on it. To get more features, you can run tf-idf on the papers, or generate word embeddings from them.
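
As a rough sketch of the tf-idf idea, assuming you have already pulled the paper text (say, abstracts) for a chemical as plain strings: the snippet below computes a tf-idf vector per paper and averages them into one feature vector for the chemical. The function names (`tokenize`, `tfidf_features`, `chemical_features`), the smoothed idf formula, and the averaging step are all illustrative choices, not something the approach requires, and the two abstracts are made-up placeholder text.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_features(documents):
    """Return (vocab, rows), where rows[i][j] is the tf-idf weight
    of vocab[j] in documents[i]."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many papers does each term appear?
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    vocab = sorted(df)
    # Smoothed idf (an assumption; other variants exist).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}

    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero on empty text
        rows.append([counts[t] / total * idf[t] for t in vocab])
    return vocab, rows


def chemical_features(documents):
    """Average the per-paper tf-idf vectors into one vector per chemical."""
    vocab, rows = tfidf_features(documents)
    n = len(rows) or 1
    return vocab, [sum(col) / n for col in zip(*rows)]


# Placeholder text standing in for abstracts fetched from a database.
abstracts = [
    "Caffeine is a stimulant that acts on adenosine receptors.",
    "The solubility of caffeine in water increases with temperature.",
]
vocab, features = chemical_features(abstracts)
top_terms = sorted(zip(vocab, features), key=lambda p: p[1], reverse=True)[:5]
print(top_terms)
```

In practice you would probably drop stop words and cap the vocabulary size before using these as features, and word embeddings would replace the tf-idf weights with dense vectors averaged the same way.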