Unfortunately, the "pretrained" model was built using North American data samples, so whilst this approach works for American speakers, it gave my Australian accent a bizarre American twang. I resigned myself to needing to build a large training dataset of my voice.

Tacotron 2 allows text-to-speech models to be created by leveraging a pre-built model and transfer learning to reduce the amount of training data needed. Using a high-quality condenser microphone, I collected 500 data samples (about 30 minutes). This involved recording short, 5-second clips with matching transcripts. The amount of data used was key to the quality of the final result: 50 samples led to an unintelligible model, 200 samples led to a "slightly American" model that was recognizable as myself, and 500 samples led to a model that impressively captured the unique features of my voice.

Initially I trained a model using the open source code; however, with only three days to create this demonstration, I looked for a managed service to speed up this process. Microsoft had a managed service simplifying this process, as did Resemble.ai, which is the service I chose to use. These services allowed various prosodic features to be defined (such as stress, intonation and rhythm) to help make the generated voice more natural.

CrazyTalk is the world's most popular facial animation software that uses voice and text to vividly animate facial images. The brand new CrazyTalk 8 contains all the powerful features people love about CrazyTalk plus a highly anticipated 3D Head Creation tool, a revolutionary Auto Motion engine, and smooth lip-syncing results for any talking.
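As a rough illustration of the data collection step, here is a minimal sketch of how short clips and their matching transcripts could be paired into a training manifest while totalling the recorded audio. It is not the tooling used for this demonstration. The file layout (`clip_0001.wav` next to `clip_0001.txt`), the function names, and the pipe-delimited `id|transcript` output are all assumptions chosen for illustration; check the format your training code or managed service actually expects.

```python
import wave
from pathlib import Path


def clip_seconds(wav_path: Path) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(wav_path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def build_manifest(clip_dir: Path, max_seconds: float = 5.0):
    """Pair each clip with its transcript and total the audio length.

    Hypothetical layout: every <name>.wav has a <name>.txt beside it.
    Returns (rows, total_seconds), where each row is "name|transcript".
    """
    rows, total = [], 0.0
    for wav_path in sorted(clip_dir.glob("*.wav")):
        txt_path = wav_path.with_suffix(".txt")
        if not txt_path.exists():
            continue  # skip clips that have no matching transcript
        seconds = clip_seconds(wav_path)
        if seconds > max_seconds:
            continue  # keep only the short clips
        text = txt_path.read_text(encoding="utf-8").strip()
        rows.append(f"{wav_path.stem}|{text}")
        total += seconds
    return rows, total
```

Calling `build_manifest(Path("recordings"))` on a folder of recordings returns the manifest rows and the total duration in seconds, which makes it easy to see how close a dataset is to a target such as the 500 samples mentioned above.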