Ozeki VoIP SDK, Azure Cognitive Speech - Trouble sending the voice onto a phone line

Guest · 12/01/2023, 22:44

Hello,

I'm trying to inject the Azure speech output into a phone conversation handled by Ozeki.

Line registration -> OK
Receiving the external call -> OK
TTS generation -> OK
Connecting the Azure stream to Ozeki -> I don't know how to do this

My constraints:

I must not use files on disk.
Latency must be as low as possible.
As long as the caller has not hung up, TTS playback must continue as new text comes in.
.NET Framework 4.5

Does anyone have a lead?

TTS side:

Code:

public static async Task<byte[]> SynthesisToPushAudioOutputStreamAsync()
        {
            var conf = SpeechConfig.FromSubscription(az_key, az_reg);
            conf.SpeechSynthesisLanguage = az_lang;
            conf.SpeechSynthesisVoiceName = az_voice;
 
            // Prepare ssml from text input
            //var ssml = $@"<speak version='1.0' xml:lang='fr-FR' xmlns='http://www.w3.org/2001/10/synthesis' xmlns:emo='http://www.w3.org/2009/10/emotionml'  xmlns:mstts='http://www.w3.org/2001/mstts'><voice name='{az_voice}'><s /><mstts:express-as style='cheerful'>{text}</mstts:express-as><s /></voice ></speak > ";
 
            // Creates an instance of a custom class derived from PushAudioOutputStreamCallback
            var callback = new PushAudioOutputStreamSampleCallback();
 
            // Creates an audio out stream from the callback.
            using (var stream = AudioOutputStream.CreatePushStream(callback))
            {
                // Creates a speech synthesizer using audio stream output.
                using (var streamConfig = AudioConfig.FromStreamOutput(stream))
                using (var synthesizer = new SpeechSynthesizer(conf, streamConfig))
                {
                    while (true)
                    {
                        // Receives a text from console input and synthesize it to push audio output stream.
                        Console.WriteLine("Enter some text that you want to synthesize, or enter empty text to exit.");
                        Console.Write("> ");
                        string text = Console.ReadLine();
                        if (string.IsNullOrEmpty(text))
                        {
                            break;
                        }
 
                        using (var result = await synthesizer.SpeakTextAsync(text))
                        {
                            if (result.Reason == ResultReason.SynthesizingAudioCompleted)
                            {
                                Console.WriteLine($"Speech synthesized for text [{text}], and the audio was written to output stream. first byte latency: {callback.GetLatency()}");
                            }
                            else if (result.Reason == ResultReason.Canceled)
                            {
                                var cancellation = SpeechSynthesisCancellationDetails.FromResult(result);
                                Console.WriteLine($"CANCELED: Reason={cancellation.Reason}");
 
                                if (cancellation.Reason == CancellationReason.Error)
                                {
                                    Console.WriteLine($"CANCELED: ErrorCode={cancellation.ErrorCode}");
                                    Console.WriteLine($"CANCELED: ErrorDetails=[{cancellation.ErrorDetails}]");
                                    Console.WriteLine($"CANCELED: Did you update the subscription info?");
                                }
                            }
                        }
                    }
                }
 
                Console.WriteLine($"Totally {callback.GetAudioData().Length} bytes received.");
                return callback.GetAudioData();
            }
        }
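
One possible lead for the "no files on disk" and "low latency" constraints, as a minimal sketch only: keep the synthesized audio in an in-memory queue and read it back as fixed-size telephony frames. `PcmFrameBuffer` is a hypothetical helper, not part of Azure or Ozeki. Its `Write` and `Close` methods mirror the shape of the `Write(byte[])` and `Close()` methods of the Azure `PushAudioOutputStreamCallback` class, so a callback subclass could simply forward to it. The 320-byte frame size assumes raw 8 kHz, 16-bit mono PCM in 20 ms frames (8000 × 2 × 0.020); the real format depends on how the synthesizer and the call codec are configured.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical helper (not an Azure or Ozeki type): buffers synthesized PCM
// in memory and hands it back as fixed-size frames. One reader thread only.
public sealed class PcmFrameBuffer
{
    // Assumption: 8 kHz, 16-bit mono PCM, 20 ms per frame -> 320 bytes.
    public const int FrameBytes = 8000 * 2 * 20 / 1000;

    private readonly BlockingCollection<byte[]> _chunks = new BlockingCollection<byte[]>();
    private byte[] _current = new byte[0];
    private int _offset;

    // Called by the producer (e.g. from a push-stream callback) for each audio chunk.
    public uint Write(byte[] dataBuffer)
    {
        // Copy, because the caller may reuse its buffer after this call returns.
        var copy = new byte[dataBuffer.Length];
        Buffer.BlockCopy(dataBuffer, 0, copy, 0, dataBuffer.Length);
        _chunks.Add(copy);
        return (uint)dataBuffer.Length;
    }

    // Signals that no more audio will be written.
    public void Close()
    {
        _chunks.CompleteAdding();
    }

    // Fills 'frame' with buffered audio, waiting up to timeoutMs for each missing
    // chunk. Any unfilled tail is zeroed (silence). Returns the audio bytes copied.
    public int ReadFrame(byte[] frame, int timeoutMs)
    {
        int filled = 0;
        while (filled < frame.Length)
        {
            if (_offset == _current.Length)
            {
                byte[] next;
                if (!_chunks.TryTake(out next, timeoutMs))
                {
                    break; // underrun, or the producer closed the buffer
                }
                _current = next;
                _offset = 0;
            }
            int n = Math.Min(frame.Length - filled, _current.Length - _offset);
            Buffer.BlockCopy(_current, _offset, frame, filled, n);
            _offset += n;
            filled += n;
        }
        if (filled < frame.Length)
        {
            Array.Clear(frame, filled, frame.Length - filled);
        }
        return filled;
    }
}
```

Because the reader pulls frames while synthesis is still writing chunks, playback can start before the whole sentence has been synthesized, instead of waiting for a complete `byte[]` as the method above does.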

Call-handling side, not great:

Code:

 
 
        static MediaConnector connector;
        static PhoneCallAudioSender mediaSender;
        static PhoneCallAudioReceiver mediaReceiver;
        static Speaker speaker;
        static Microphone microphone;
        static RawStreamPlayback rawStreamPlayback;
 
 
 
        public static async void IncomingCall(object sender, VoIPEventArgs<IPhoneCall> e)
        {
            Log.info("Incoming call from: " + e.Item.DialInfo.ToString());
            var call = e.Item;
            call.CallStateChanged += StateChanged;
            call.Answer();
 
 
            // Retrieve the audio stream
            /*var callback = new PushAudioOutputStreamSampleCallback();
            await TTS.SynthesisToPushAudioOutputStreamAsync();
            
            var data = callback.GetAudioData();
            playback = new AudioStreamPlayback(data);*/
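            speaker = Speaker.GetDefaultDevice();
            // The microphone must be assigned before it is connected and started below
            microphone = Microphone.GetDefaultDevice();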
 
            // Create a new RawStreamPlayback object and keep it in the static field
            rawStreamPlayback = new RawStreamPlayback();
 
            // Attach the audio output device (e.g. speakers) to the RawStreamPlayback object
            rawStreamPlayback.AttachToAudioOutputDevice();
 
            // Open a file stream to read the audio data
            FileStream fileStream = new FileStream("audio.pcm", FileMode.Open, FileAccess.Read);
 
            // Start playing the audio data from the file stream
            rawStreamPlayback.Start(fileStream);
 
 
            mediaSender = new PhoneCallAudioSender();
            mediaReceiver = new PhoneCallAudioReceiver();
 
            connector = new MediaConnector();
 
            connector.Connect(microphone, mediaSender);
            connector.Connect(mediaReceiver, speaker);
 
            mediaSender.AttachToCall(call);
            mediaReceiver.AttachToCall(call);
 
            microphone.Start();
            speaker.Start();
 
            while (true) Thread.Sleep(10);
        }
 
 
 
        public static void StateChanged(object sender, CallStateChangedArgs e)
        {
            Log.info("Call state " + e.State.ToString());
        }
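
For the "keep playing until the caller hangs up" constraint, the blocking `while (true) Thread.Sleep(10);` loop could be replaced by a paced loop running on a background thread. The following is a rough sketch under assumptions, not a tested Ozeki integration: `FramePump`, `readFrame`, `sendFrame` and `callEnded` are hypothetical names. In particular, `sendFrame` stands in for whatever mechanism actually hands audio to the call's media sender, which is exactly the missing piece in the question.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical helper: sends one audio frame per tick until the call ends.
public static class FramePump
{
    // readFrame : fills the buffer, returns the number of real audio bytes (0 = nothing yet)
    // sendFrame : assumed hook that pushes one frame into the phone call
    // callEnded : returns true once the call has been hung up
    // Returns the number of frames handed to sendFrame.
    public static int Run(Func<byte[], int> readFrame, Action<byte[]> sendFrame,
                          Func<bool> callEnded, int frameBytes = 320, int frameMs = 20)
    {
        var clock = Stopwatch.StartNew();
        var frame = new byte[frameBytes];
        long tick = 0;
        int sent = 0;

        while (!callEnded())
        {
            if (readFrame(frame) > 0)
            {
                // Clone so the receiver can keep the frame while we reuse the buffer.
                sendFrame((byte[])frame.Clone());
                sent++;
            }

            // Sleep until the next absolute deadline, so timing errors do not accumulate.
            tick++;
            long remaining = tick * frameMs - clock.ElapsedMilliseconds;
            if (remaining > 0)
            {
                Thread.Sleep((int)remaining);
            }
        }
        return sent;
    }
}
```

`Thread.Sleep` has coarse resolution on Windows by default, so a production version would probably need a more precise timer or a small jitter buffer on the sending side.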

Thanks in advance to the adventurous!
