This module receives text with markup (currently only emph), synthesises it, and creates a wave file for each chunk it receives. It also analyses the synthesised speech and sends out phoneme timings.
<module name="SpeechPlanner">
    <description>Generates audio and phoneme data with timing from words.</description>
    <trigger from="WBSystem" type="RU.S1.Internal.Module.Status.Ping"/>
    <trigger from="WBAction" type="RU.S1.Output.Plan.Task.Speech.Do"/>
    <trigger from="WBPlan" type="RU.S1.Output.Plan.Task"/>
    <post to="WBPlan" type="RU.S1.Output.Plan.Task.Speech.Ready"/>
    <post to="WBPlan" type="RU.S1.Output.Plan.Task.Answear.Ready"/>
    <post to="WBAction" type="RU.S1.Output.Plan.Act.Speech.Timing"/>
    <triggers from="WBPlanAns">
        <trigger type="RU.S1.Output.Action.Speech.ReplyTo"/>
    </triggers>
</module>
Triggers:
RU.S1.Output.Plan.Task
Description: Processes text and creates a phoneme analysis and a wave file for playback at a later time.
Xml:
<ThoughtUnit id="9">
    <SentenceFragment falsetto="false" fragment_id="9" answer="false">
        <Text type="plain">Hello folks, how are you</Text>
    </SentenceFragment>
</ThoughtUnit>
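The module's own source is not shown here, but a sender of this trigger needs to produce the XML payload above. A minimal sketch of how that payload could be built with Python's standard `xml.etree.ElementTree`; the helper name `build_speech_task` is hypothetical, not part of the module:

```python
import xml.etree.ElementTree as ET

def build_speech_task(fragment_id, text, falsetto=False, answer=False):
    """Build the RU.S1.Output.Plan.Task payload shown above.

    Hypothetical helper: the element and attribute names mirror the
    documented example (ThoughtUnit / SentenceFragment / Text)."""
    unit = ET.Element("ThoughtUnit", id=str(fragment_id))
    frag = ET.SubElement(unit, "SentenceFragment",
                         falsetto=str(falsetto).lower(),
                         fragment_id=str(fragment_id),
                         answer=str(answer).lower())
    txt = ET.SubElement(frag, "Text", type="plain")
    txt.text = text
    return ET.tostring(unit, encoding="unicode")

payload = build_speech_task(9, "Hello folks, how are you")
```

The resulting string matches the example payload, with boolean attributes serialised as lowercase `"true"`/`"false"` as in the sample.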
RU.S1.Output.Plan.Task.Cancel
Description: Cancels all output.
Xml: (empty)
RU.S1.Output.Action.Speech.ReplyTo
RU.S1.Output.Action.Speech.ReplyTo.Do
Posters:
RU.S1.Output.Plan.Task.Speech.Ready
Description: Notifies that processing is done.
Xml:
<ProcessingDone fragment_id="9"/>
RU.S1.Output.Plan.Task.Speech.Done
RU.S1.Output.Action.Speech.ReplyTo.Done
RU.S1.Output.Plan.Task.Answear.Ready
RU.S1.Output.Plan.Act.Speech.Timing
Description: Phoneme timings, sent to other modules during preprocessing for lip synchronization.
Xml:
<phonemecollection id="9">
    <phoneme starttime="" endtime="0.21000">#</phoneme>
    <phoneme starttime="" endtime="0.26268">w</phoneme>
    <phoneme starttime="" endtime="0.36208">ai</phoneme>
    <phoneme starttime="" endtime="0.41757">l</phoneme>
    <phoneme starttime="" endtime="0.45014">dh</phoneme>
    <phoneme starttime="" endtime="0.55284">ei</phoneme>
    <phoneme starttime="" endtime="0.63179">d</phoneme>
    <phoneme starttime="" endtime="0.66442">r</phoneme>
    <phoneme starttime="" endtime="0.79367">e</phoneme>
    <phoneme starttime="" endtime="0.88565">s</phoneme>
    <phoneme starttime="" endtime="0.92611">t</phoneme>
    <phoneme starttime="" endtime="1.00522">uh</phoneme>
    <phoneme starttime="" endtime="1.08368">p</phoneme>
    <phoneme starttime="" endtime="1.13934">i</phoneme>
    <phoneme starttime="" endtime="1.19813">n</phoneme>
    <phoneme starttime="" endtime="1.28759">f</phoneme>
    <phoneme starttime="" endtime="1.41052">ai</phoneme>
    <phoneme starttime="" endtime="1.47001">n</phoneme>
    <phoneme starttime="" endtime="1.58399">k</phoneme>
    <phoneme starttime="" endtime="1.63730">l</phoneme>
    <phoneme starttime="" endtime="1.82570">ou</phoneme>
    <phoneme starttime="" endtime="1.85876">dh</phoneme>
    <phoneme starttime="" endtime="1.94083">z</phoneme>
    <phoneme starttime="" endtime="2.15083">#</phoneme>
</phonemecollection>
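In the example message the `starttime` attributes are empty; since the phonemes are contiguous, a consumer can take each phoneme's start to be the previous phoneme's `endtime` (0.0 for the first). A sketch of such a consumer in Python; the function name `parse_phoneme_timings` is illustrative, not part of the module:

```python
import xml.etree.ElementTree as ET

def parse_phoneme_timings(xml_text):
    """Parse a phonemecollection message into (phoneme, start, end) tuples.

    Assumption: when starttime is empty (as in the example above), the
    start is derived from the previous phoneme's endtime."""
    root = ET.fromstring(xml_text)
    timings = []
    prev_end = 0.0
    for ph in root.findall("phoneme"):
        start_attr = ph.get("starttime", "")
        start = float(start_attr) if start_attr else prev_end
        end = float(ph.get("endtime"))
        timings.append((ph.text, start, end))
        prev_end = end
    return timings
```

A lip-sync consumer can then schedule mouth shapes against the audio clock using these intervals.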
The executables are located in the "roboradio/is/ru/cadia/roboradio/src/modules/executors/speech/SpeechPlanner" folder.
The same executable is used for both the Speech_renderer and the Speech_player, but with different parameters.
To run the Speech_renderer in Linux: ./speechplanner psyclone=localhost
To run the Speech_player: ./speechplanner psyclone=localhost module=Speaker