
Closed
Posted:
Paid on delivery
I’m building a web-based voice assistant that can listen, reason and speak back to the user, and I need the server-side logic developed in Node.js using Express.js.

Core requirements
• Accept live or recorded audio, run speech-to-text, pass the transcript through my business logic (I’ll supply the rules/LLM calls), then return synthesized speech.
• Provide an endpoint that can also place or receive phone calls, stream the audio both ways, and run the same STT / TTS pipeline in real time.
• Expose clean JSON APIs plus a WebSocket channel so the front-end can show partial transcripts and intermediate thinking steps.
• All configuration (API keys for Google Cloud, Amazon Polly, Twilio, etc.) must be environment-driven and easy to swap.
• Unit-tested code, concise README and a short screencast that proves the flow from browser to call and back.

Acceptance criteria
1. Docker-compose up builds and starts everything, including any Redis/queue you add.
2. Hitting /health returns 200.
3. Posting an audio file to /voice returns a playable TTS response within 3 s for a 5-second clip.
4. During a live call the latency between spoken word and assistant reply stays under 1 s on average.

Let me know which STT/TTS stack you favour and any past projects where you streamed audio through Express; that context will help me move fast.
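The environment-driven configuration requirement above could be approached roughly as follows. This is only a sketch under stated assumptions: the selector variables (`STT_PROVIDER`, `TTS_PROVIDER`, `TELEPHONY_PROVIDER`), the `loadConfig` helper and the per-provider lists of required variables are illustrative names, not something the listing specifies.

```javascript
// Sketch: resolve provider choices and required credentials from the environment.
// All variable names below are assumptions for illustration; adjust to the real deployment.
const REQUIRED_BY_PROVIDER = {
  google: ['GOOGLE_APPLICATION_CREDENTIALS'],
  polly: ['AWS_ACCESS_KEY_ID', 'AWS_SECRET_ACCESS_KEY', 'AWS_REGION'],
  twilio: ['TWILIO_ACCOUNT_SID', 'TWILIO_AUTH_TOKEN'],
};

function loadConfig(env = process.env) {
  const config = {
    port: Number(env.PORT || 3000),
    sttProvider: env.STT_PROVIDER || 'google',
    ttsProvider: env.TTS_PROVIDER || 'polly',
    telephonyProvider: env.TELEPHONY_PROVIDER || 'twilio',
  };

  // Collect every missing variable so the operator sees the full list at once.
  const selected = new Set([config.sttProvider, config.ttsProvider, config.telephonyProvider]);
  const missing = [];
  for (const provider of selected) {
    const required = REQUIRED_BY_PROVIDER[provider];
    if (!required) throw new Error(`Unknown provider: ${provider}`);
    for (const name of required) {
      if (!env[name]) missing.push(name);
    }
  }
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  return Object.freeze(config);
}
```

Calling `loadConfig()` once at startup and failing fast on a missing key keeps provider swaps to a change in the environment file rather than in code.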
Project ID: 40091172
20 bids
Remote project
Last active 2 months ago
20 freelancers are bidding an average of ₹27,467 INR for this project

Hello Sir I can build your Node.js (Express) voice assistant backend with streaming STT → reasoning → TTS, plus real-time WebSocket updates and clean JSON APIs. I’ll integrate Twilio Media Streams for phone calls, Google Speech for fast transcripts, and Amazon Polly for low-latency replies — all environment-driven and swappable. The system will be containerized (docker-compose), unit-tested, and tuned to keep call latency under 1s. You’ll get /health, /voice, Redis-backed queues, and a screencast proving end-to-end flow. I’ve previously shipped production voice bots handling live calls and streaming audio through Express. Ready to start and move fast. Best Regards Jitendra Sharma
₹55,000 INR in 20 days
5.1

Hi, I have reviewed your requirements. With extensive experience in backend development using Node.js and Express, I'm confident in my ability to create a robust and efficient voice AI system for you. Having developed numerous API-driven systems and real-time applications with similar requirements, I know what it takes to deliver the project successfully. For this voice AI project, I can draw on my proficiency in speech-to-text (STT) and text-to-speech (TTS) technologies. Over the years, I have worked with various AI stacks, including Google Cloud's STT/TTS and Amazon Polly, and feel comfortable integrating these services into your project. My past experience with streaming audio through Express will also help me handle the endpoint and WebSocket channel requirements. Can we discuss your goals in a quick chat? Warm regards Usama
₹25,000 INR in 3 days
4.9

Hello, I’m Rahul Singh, running Team Velora for the past 3+ years, with strong backend experience in Node.js + Express and real-time audio pipelines. We clearly understand your voice assistant flow—STT → business logic/LLM → TTS—with live streaming via WebSockets and phone calls using Twilio. Our team has delivered low-latency, Dockerized, environment-driven systems with clean APIs and tests. Let’s connect in chat and move fast.
₹30,000 INR in 10 days
3.7

Hello, I’m Karthik, a Node.js/Express developer with 10+ years building real-time web applications and voice-powered platforms. I can develop a robust backend for your voice assistant with STT → LLM reasoning → TTS pipeline, fully Dockerized and production-ready.

My approach:
• STT/TTS: I recommend Google Cloud Speech-to-Text or OpenAI Whisper for transcription, and Amazon Polly or ElevenLabs for high-quality, low-latency speech synthesis.
• Real-Time Audio: WebSocket streaming for partial transcripts and intermediate reasoning; Twilio integration for live call handling.
• API & Architecture: Clean JSON endpoints, environment-driven config for API keys, Redis/queue for async processing, unit-tested code.
• Deliverables: /health endpoint, /voice POST handling, real-time call streaming, concise README, and screencast demonstrating the full flow.

I’ve built multiple real-time voice processing systems with Express, Docker, and cloud TTS/STT integrations. The system will meet your <1s latency requirement and be easy to extend with future business logic. Best regards, Karthik
₹55,000 INR in 7 days
4.2

Hi there! I’ve reviewed your project and specialize in server-side logic development with Node.js and Express.js. I’ll create an efficient pipeline that accepts audio, processes it through your business logic, and synthesizes speech, all while ensuring low latency during calls. Let’s set up a quick meeting to discuss your preferred STT/TTS stack and streamline this project. Best Regards, Amjad Iqbal
₹35,000 INR in 84 days
3.3

I’d be happy to help build the server side for your voice assistant using Node.js and Express. I can handle live and recorded audio, connect speech to text, run it through your logic, and return natural sounding speech. I’ll also set up real time phone call streaming with the same flow, plus APIs and WebSockets for live transcripts and updates. Everything will be easy to configure with environment variables. You’ll get clean, tested, Docker-ready code, clear documentation, and a short demo video showing the full flow working end to end.
₹12,500 INR in 10 days
0.0

I’ve already built a very similar web-based voice assistant pipeline, and I can show you a working demo where audio flows from browser → server → reasoning layer → spoken response, including partial transcripts over WebSocket.

How I’d approach your requirements

Architecture (Node.js + Express)
• REST APIs for recorded audio (/voice) and health (/health)
• WebSocket layer for live partial transcripts, intermediate reasoning steps, and real-time updates
• Separate real-time audio streaming service for calls (Twilio Media Streams)
• Queue/worker (Redis + BullMQ) for non-blocking STT/TTS when needed
₹25,000 INR in 1 day
0.0
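The WebSocket layer described in the bid above (partial transcripts, intermediate reasoning steps, real-time updates) implies some agreed message shape between server and front-end. One possible envelope is sketched below; the event type names, the `createEventEncoder` helper and the field names are assumptions for illustration, not part of the listing or of any bid.

```javascript
// Sketch: a per-session encoder that stamps each outgoing WebSocket frame
// with a type, a monotonically increasing sequence number and a timestamp.
// Event names and fields are hypothetical.
const EVENT_TYPES = new Set([
  'partial_transcript',
  'final_transcript',
  'reasoning_step',
  'tts_ready',
]);

function createEventEncoder(sessionId, now = Date.now) {
  let seq = 0;
  return function encode(type, payload) {
    if (!EVENT_TYPES.has(type)) {
      throw new Error(`Unsupported event type: ${type}`);
    }
    seq += 1; // lets the client detect dropped or reordered frames
    return JSON.stringify({ type, sessionId, seq, sentAt: now(), payload });
  };
}

// Example: build a frame for a partial transcript.
const encode = createEventEncoder('session-123');
const frame = encode('partial_transcript', { text: 'what is the wea' });
// `frame` is a JSON string that can be handed to a WebSocket send call.
```

A sequence number per session is a cheap way for the front-end to replace an older partial transcript with a newer one without relying on arrival order.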

Hello, I have reviewed your requirements for building a Voice AI backend using Node.js and Express. I can implement audio recording ingestion, speech-to-text, business logic processing, and text-to-speech response with low latency. I have experience working with REST APIs, WebSockets, JSON pipelines, and audio streaming. I can integrate STT/TTS providers such as Google Cloud, Amazon Polly, or Twilio, manage environment-based configs, and deliver clean, well-documented, unit-tested code with Docker support. The backend will be scalable, secure, and optimized for real-time responses. I am ready to start immediately and discuss the preferred STT/TTS stack. Best regards, Sahu
₹25,000 INR in 7 days
0.0

Getting a perfect fit for your project is as easy as matching expertise with your need for clean, professional, user-friendly, and seamless server-side logic in Node.js and Express.js. I understand your focus on integrated, automated speech-to-text and text-to-speech pipelines with real-time bi-directional streaming and environment-driven configuration. You require concise, unit-tested code and reliable Docker setup that meets strict latency and health-check acceptance criteria. While I am new to Freelancer, I have tons of experience building Express-based audio streaming services with Google Cloud and Twilio, delivering robust JSON APIs and WebSocket support. I would love to chat more about your project! Regards, Nadia Du Preez
₹28,150 INR in 30 days
0.0

Hi, I can build a low-latency Voice AI backend using Node.js + Express that supports both browser audio and real-time phone calls through a single, clean STT → logic → TTS pipeline. The design will prioritize sub-second response time, provider flexibility, and simple deployment. Your business rules or LLM calls will plug into a clearly defined interface, so you can change logic without touching audio or telephony code. Should the assistant interrupt (barge-in) if the user starts speaking during TTS? Do you want partial TTS streaming, or full response audio only?
₹25,000 INR in 7 days
0.0
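The idea in the bid above, that business rules or LLM calls plug into a defined interface separate from the audio and telephony code, might look something like the following. It is a minimal sketch assuming each stage is an async function; `createPipeline`, the stage names and the stub providers are hypothetical and stand in for real STT, logic and TTS clients.

```javascript
// Sketch: a single STT -> logic -> TTS pipeline with swappable stages.
// Each stage is any async function; real provider SDK calls would live inside them.
function createPipeline({ stt, logic, tts }) {
  for (const [name, fn] of Object.entries({ stt, logic, tts })) {
    if (typeof fn !== 'function') {
      throw new TypeError(`${name} must be a function`);
    }
  }
  return async function handleAudio(audio, onEvent = () => {}) {
    const startedAt = Date.now();
    const transcript = await stt(audio);
    onEvent({ type: 'transcript', text: transcript }); // e.g. forward over WebSocket
    const reply = await logic(transcript);
    onEvent({ type: 'reply', text: reply });
    const speech = await tts(reply);
    return { transcript, reply, speech, elapsedMs: Date.now() - startedAt };
  };
}

// Stub stages for illustration only; they do no real recognition or synthesis.
const handleAudio = createPipeline({
  stt: async (audio) => `heard ${audio.length} bytes`,
  logic: async (text) => text.toUpperCase(),
  tts: async (text) => Buffer.from(text, 'utf8'),
});
```

Because the same `handleAudio` function can be called from an HTTP upload handler or from a call-streaming handler, the browser and phone paths share one pipeline, and `elapsedMs` gives a rough per-request figure to compare against the latency targets.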

Kolkata, India
Member since Oct 21, 2025