Description

This module receives text with markup (currently only emph), synthesises it, and creates a wave file for each chunk it receives. It also analyses the synthesised speech and sends out timings for the phonemes.

Functions

Psyspec

    <module name="SpeechPlanner">      
          <description>Generates audio and phoneme data with timing from words.</description>
          <trigger from="WBSystem" type="RU.S1.Internal.Module.Status.Ping"/>
          <trigger from="WBAction" type="RU.S1.Output.Plan.Task.Speech.Do"/>
          <trigger from="WBPlan" type="RU.S1.Output.Plan.Task"/>
          <post to="WBPlan" type="RU.S1.Output.Plan.Task.Speech.Ready"/>
          <post to="WBPlan" type="RU.S1.Output.Plan.Task.Answear.Ready"/>
          <post to="WBAction" type="RU.S1.Output.Plan.Act.Speech.Timing"/>
          <triggers from="WBPlanAns">
             <trigger type="RU.S1.Output.Action.Speech.ReplyTo"/>
          </triggers>
    </module>
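
As a quick way to check which message types a spec like the one above subscribes to and posts, the following is a minimal Python sketch, not part of the module itself. The function name `summarize_spec` and the shortened `SPEC` string are hypothetical, and it assumes that the `from` attribute of a `<triggers>` group applies to every `<trigger>` inside it.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the spec, used only for illustration.
SPEC = """<module name="SpeechPlanner">
  <trigger from="WBSystem" type="RU.S1.Internal.Module.Status.Ping"/>
  <trigger from="WBAction" type="RU.S1.Output.Plan.Task.Speech.Do"/>
  <post to="WBPlan" type="RU.S1.Output.Plan.Task.Speech.Ready"/>
  <triggers from="WBPlanAns">
    <trigger type="RU.S1.Output.Action.Speech.ReplyTo"/>
  </triggers>
</module>"""


def summarize_spec(xml_text):
    """Return (triggers, posts) as lists of (whiteboard, message type) pairs."""
    module = ET.fromstring(xml_text)
    triggers = []
    # Triggers declared directly under <module>.
    for node in module.findall("trigger"):
        triggers.append((node.get("from"), node.get("type", "").strip()))
    # Assumption: grouped triggers inherit "from" from the <triggers> element.
    for group in module.findall("triggers"):
        for node in group.findall("trigger"):
            triggers.append((group.get("from"), node.get("type", "").strip()))
    posts = [(n.get("to"), n.get("type", "").strip())
             for n in module.findall("post")]
    return triggers, posts


if __name__ == "__main__":
    triggers, posts = summarize_spec(SPEC)
    for whiteboard, msg_type in triggers:
        print("trigger", whiteboard, msg_type)
    for whiteboard, msg_type in posts:
        print("post", whiteboard, msg_type)
```

Stripping the `type` value guards against stray whitespace inside the attribute, which would otherwise make a type name fail an exact comparison.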

Messages

Triggers:
RU.S1.Output.Plan.Task
Description: Processes text, producing a phoneme analysis and a wave file for playback at a later time

   <ThoughtUnit id="9">
        <SentenceFragment falsetto="false" fragment_id="9" answer="false">
             <Text type="plain">Hello folks, how are you</Text>
        </SentenceFragment>
   </ThoughtUnit>
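
The payload above can be assembled programmatically. The following is a minimal Python sketch that only builds the XML string; the helper name `build_thought_unit` and its parameters are hypothetical, and how the message is actually posted to the whiteboard is not covered here.

```python
import xml.etree.ElementTree as ET


def build_thought_unit(unit_id, fragment_id, text, answer=False, falsetto=False):
    """Build a ThoughtUnit payload shaped like the sample message."""
    unit = ET.Element("ThoughtUnit", {"id": str(unit_id)})
    fragment = ET.SubElement(unit, "SentenceFragment", {
        # Booleans are written in lowercase, as in the sample.
        "falsetto": str(falsetto).lower(),
        "fragment_id": str(fragment_id),
        "answer": str(answer).lower(),
    })
    body = ET.SubElement(fragment, "Text", {"type": "plain"})
    body.text = text  # ElementTree escapes &, < and > on serialisation
    return ET.tostring(unit, encoding="unicode")


if __name__ == "__main__":
    print(build_thought_unit(9, 9, "Hello folks, how are you"))
```

Building the element tree instead of concatenating strings keeps the payload well-formed even when the text contains characters that need escaping.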

RU.S1.Output.Plan.Task.Cancel
Description: Cancels all output
Xml: (empty)

RU.S1.Output.Action.Speech.ReplyTo
RU.S1.Output.Action.Speech.ReplyTo.Do

Posters:
RU.S1.Output.Plan.Task.Speech.Ready
Description: notifies that processing is done

   <ProcessingDone fragment_id="9"/>

RU.S1.Output.Plan.Task.Speech.Done
RU.S1.Output.Action.Speech.ReplyTo.Done
RU.S1.Output.Plan.Task.Answear.Ready

RU.S1.Output.Plan.Act.Speech.Timing
Description: Phoneme timings sent to other modules during preprocessing, for lip synchronization
Xml:
  <phonemecollection id="9">
       <phoneme starttime="" endtime="0.21000">#</phoneme>
       <phoneme starttime="" endtime="0.26268">w</phoneme>
       <phoneme starttime="" endtime="0.36208">ai</phoneme>
       <phoneme starttime="" endtime="0.41757">l</phoneme>
       <phoneme starttime="" endtime="0.45014">dh</phoneme>
       <phoneme starttime="" endtime="0.55284">ei</phoneme>
       <phoneme starttime="" endtime="0.63179">d</phoneme>
       <phoneme starttime="" endtime="0.66442">r</phoneme>
       <phoneme starttime="" endtime="0.79367">e</phoneme>
       <phoneme starttime="" endtime="0.88565">s</phoneme>
       <phoneme starttime="" endtime="0.92611">t</phoneme>
       <phoneme starttime="" endtime="1.00522">uh</phoneme>
       <phoneme starttime="" endtime="1.08368">p</phoneme>
       <phoneme starttime="" endtime="1.13934">i</phoneme>
       <phoneme starttime="" endtime="1.19813">n</phoneme>
       <phoneme starttime="" endtime="1.28759">f</phoneme>
       <phoneme starttime="" endtime="1.41052">ai</phoneme>
       <phoneme starttime="" endtime="1.47001">n</phoneme>
       <phoneme starttime="" endtime="1.58399">k</phoneme>
       <phoneme starttime="" endtime="1.63730">l</phoneme>
       <phoneme starttime="" endtime="1.82570">ou</phoneme>
       <phoneme starttime="" endtime="1.85876">dh</phoneme>
       <phoneme starttime="" endtime="1.94083">z</phoneme>
       <phoneme starttime="" endtime="2.15083">#</phoneme>
  </phonemecollection>
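
A consumer such as a lip-sync module needs a start and an end for each phoneme, but the sample carries only end times. The following is a minimal Python sketch of one way to read the message; the function name `phoneme_intervals` is hypothetical, and it assumes that an empty `starttime` means the phoneme begins where the previous one ended, with the first starting at 0.

```python
import xml.etree.ElementTree as ET

# First three phonemes of the sample message, for illustration.
SAMPLE = """<phonemecollection id="9">
    <phoneme starttime="" endtime="0.21000">#</phoneme>
    <phoneme starttime="" endtime="0.26268">w</phoneme>
    <phoneme starttime="" endtime="0.36208">ai</phoneme>
</phonemecollection>"""


def phoneme_intervals(xml_text):
    """Return (symbol, start, end) tuples in document order, in seconds."""
    root = ET.fromstring(xml_text)
    intervals = []
    previous_end = 0.0
    for node in root.findall("phoneme"):
        end = float(node.get("endtime"))
        raw_start = node.get("starttime", "")
        # Assumption: an empty starttime means "starts at the previous end".
        start = float(raw_start) if raw_start.strip() else previous_end
        intervals.append(((node.text or "").strip(), start, end))
        previous_end = end
    return intervals


if __name__ == "__main__":
    for symbol, start, end in phoneme_intervals(SAMPLE):
        print(f"{symbol:>3} {start:.5f} {end:.5f}")
```

If the module ever fills in `starttime`, the explicit value is used instead of the derived one.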

Implementation Remarks

The executables are located in the “roboradio/is/ru/cadia/roboradio/src/modules/executors/speech/SpeechPlanner” folder.

The same module is used for both Speech_renderer and Speech_player, but with different parameters.

To run Speech_renderer on Linux: ./speechplanner psyclone=localhost

To run Speech_player: ./speechplanner psyclone=localhost module=Speaker