
Closed
Posted:
We want to train a custom Turkish TTS model with speech quality comparable to ElevenLabs. The model must achieve ultra-low latency suitable for real-time phone call interactions. We have tried open models such as PiperTTS, xTTS-v2, MMS-TTS, Chatterbox, and community fine-tuned Orpheus models, and have found them lacking in quality, speed, or both. We want something that is both high quality and servable with very low latency.
Project ID: 40072117
26 bids
Remote project
Last activity: 2 months ago
26 freelancers are bidding an average of $20 USD/hour for this project

Hello, I trust you're doing well. I am well experienced in machine learning algorithms, with nearly a decade of hands-on practice. My expertise lies in developing various artificial intelligence algorithms, including the one you require, using Matlab, Python, and similar tools. I hold a doctorate from Tohoku University and have a number of publications on the same subject. My portfolio, which showcases my past work, is available for your review. Your project piqued my interest, and I would be delighted to be part of it. Let's connect to discuss in detail. Warm regards. Please check my portfolio: https://www.freelancer.com/u/sajjadtaghvaeifr
$30 USD in 40 days
7.2

Hello Nureddin, This is exactly the problem space I work in. I’ve trained and deployed custom neural TTS systems for real-time voice interactions where quality + latency both matter (IVR, live calls, assistants). I’m very familiar with the limitations of Piper, xTTS-v2, MMS-TTS, Orpheus variants, and why they fall short for Turkish prosody and sub-200ms latency.
How I’ll Build Your Turkish TTS Model
Architecture (Quality + Speed)
- Non-autoregressive acoustic model (VITS-derived / diffusion-free)
- Turkish-specific phoneme & stress modeling
- Custom vocoder (HiFi-GAN / UnivNet-class, aggressively optimized)
Latency Engineering
- Chunked / streaming inference
- TorchScript / ONNX / TensorRT export
- CPU-first path with optional GPU
- Target: real-time phone calls (<150–200ms)
Data & Training
- Turkish phonemisation & normalization pipeline
- Prosody conditioning (duration, pitch, energy)
- Careful speaker balance & noise control
- MOS-driven evaluation vs ElevenLabs-style benchmarks
Serving & Integration
- Low-overhead inference server (REST / gRPC)
- Deterministic output, stable voices
- Ready for SIP / telephony pipelines
Tech Stack
- PyTorch, custom TTS architectures
- HiFi-GAN / optimized neural vocoders
- ONNX / TensorRT / TorchScript
- Real-time audio streaming
Relevant Projects
- Real-Time Neural TTS for Call Centers
- Low-Latency Multilingual Voice Engine
- Production TTS Model (Telephony-Grade)
I can show demo audio, latency benchmarks, and training code from similar systems.
$30 USD in 40 days
4.9

# 8 years of experience in Machine Learning and AI model development
Hello sir! I am happy to tell you that I have already done multiple projects of this kind; you can see my portfolio and check my profile reviews too. I can also help in other domains such as full-stack website development and design, app development, automation, AI/ML work, graphic design, social media marketing, research, and much more. With 8 years of experience in Machine Learning and AI model development, I am ready to start the work right now. Best regards, Prashant Kumar
$15 USD in 60 days
2.9

Hey, I just finished reading the job description and I see you are looking for someone experienced in Deep Learning, AI Text-to-speech, Android, Machine Learning (ML) and AI Model Development. This is something I can do. Please review my profile to confirm that I have great experience working with these tech stacks. I have a few questions: 1. Are these all the requirements? If not, please share more detailed requirements. 2. Do you currently have anything done for the job, or does it have to be done from scratch? 3. What is the timeline to get this done? Why Choose Me? 1. I have done more than 250 major projects. 2. I have not received a single piece of bad feedback in the last 5-6 years. 3. You will find 5-star feedback on the last 100+ major projects, which shows my clients are happy with my work. Timings: 9am - 9pm Eastern Time (I work as a full-time freelancer). I will share my recent work with you in the private chat due to privacy concerns. Please start the chat to discuss it further. Regards, Salik.
$20 USD in 20 days
0.0

⭐️⭐️⭐️⭐️⭐️ Hello nureddin123, I'm excited about the opportunity to help you develop a custom Turkish TTS model that meets your high standards for quality and ultra-low latency. With my background in machine learning and experience in speech synthesis, I understand the challenges of balancing quality and speed, especially in real-time applications. I've successfully worked on similar projects, optimizing TTS models to enhance performance while ensuring top-notch sound quality. My commitment to your satisfaction means I will iterate on the model based on your feedback to achieve the desired results. I'm available to start immediately and would love to discuss your specific requirements further. Let's collaborate to create a TTS solution that surpasses your expectations. Looking forward to your response! Best, Abudulhamid
$25 USD in 40 days
0.0

Hi nureddin123, How are you? I've carefully checked your requirements and am really interested in this job. I'm a full-stack JavaScript developer working on large-scale apps as a lead developer with U.S. and European teams. I'm offering the best quality and highest performance at the lowest price. I can complete your project on time and you will experience great satisfaction with me. I'm well versed in React/Redux, AngularJS, VueJS, Node.js, Python, HTML/CSS as well as JavaScript and jQuery. I also have rich experience in Android, Machine Learning (ML), Deep Learning, AI Text-to-speech, and AI Model Development, as you enumerated. For more information about me, please refer to my portfolio. I'm ready to discuss your project and start immediately. Looking forward to hearing back from you and discussing all the details. Thanks & Regards, Dragan M.
$20 USD in 10 days
0.0

Hey, I just finished reading the job description and I see you are looking for someone experienced in Android, AI Text-to-speech, Deep Learning, AI Model Development and Machine Learning (ML). This is something I can do. Please review my profile to confirm that I have great experience working with these tech stacks. I have a few questions: 1. Are these all the requirements? If not, please share more detailed requirements. 2. Do you currently have anything done for the job, or does it have to be done from scratch? 3. What is the timeline to get this done? Why Choose Me? 1. I have done more than 250 major projects. 2. I have not received a single piece of bad feedback in the last 5-6 years. 3. You will find 5-star feedback on the last 100+ major projects, which shows my clients are happy with my work. Timings: 9am - 9pm Eastern Time (I work as a full-time freelancer). I will share my recent work with you in the private chat due to privacy concerns. Please start the chat to discuss it further. Regards, Syed.
$15 USD in 15 days
0.0

Drawing from my extensive experience as an ML Ops Engineer and Data Scientist with a unique focus on AI model development, I believe that I am well-equipped to tackle the challenge of training a high-quality, low-latency Turkish TTS model for your project. Having spent over a decade designing and scaling ML and data pipelines in production, I have encountered diverse scenarios that demand quality output without any compromise on speed, just like yours. In addition, I understand the frustrations that can arise with existing open models, as they may not cater to niche use cases such as real-time phone call interactions. As a solution-driven professional, I have consistently delivered results by creating bespoke models tailored to specific requirements. My wide domain expertise, coupled with a penchant for cutting-edge technology, will enable me to train a Turkish TTS model that satisfies your need for both ultra-low latency and exceptional speech quality. Lastly, my proficiency in AI Text-to-speech and Machine Learning (ML) will be crucial in developing a model that exceeds your expectations. The objective here is not just to imitate the capabilities of renowned models but to surpass them. By analyzing your previous attempts at deploying models and their limitations, my methodical approach will enable me to improve upon the previously fine-tuned Orpheus models to create the unparalleled sophistication you're after. With me on board, you can rest assured knowing you have an accomplished professional dedicated exclusively to producing precisely what you desire: high speech quality at ultra-low latency.
$25 USD in 90 days
0.0

I saw your project and am confident I can deliver on this. I'm currently working on a similar project and understand the importance of achieving high-quality Turkish TTS with ultra-low latency for real-time phone call interactions. My expertise lies in developing custom models that surpass open-source options like PiperTTS and xTTS-v2. By leveraging advanced techniques, I promise a Turkish TTS model that excels in both quality and speed. This project aligns perfectly with my skills, and I am committed to ensuring the successful completion of your required benefit. I invite you to view my portfolio, which showcases the quality and results of my past work. I look forward to hearing from you. Regards, Travis
$15 USD in 40 days
0.0

Hello, I’m SUSAN, and I’ve led multiple AI TTS routes that balance natural Turkish prosody with streaming latency suitable for real-time calls. To meet your ElevenLabs-grade target, I will architect a compact, streaming-capable TTS stack using a fast non-autoregressive front-end (FastSpeech 2–style) plus a lightweight neural vocoder, optimized for on-device Android inference. The plan includes targeted data curation for Turkish, robust voice variants, latency-focused training, and aggressive quantization/pruning to hit sub-20–25 ms per utterance on mid-range devices. I’ll deliver an MVP within two weeks, with measurable quality benchmarks and a ready Android integration (NNAPI/GPU). We’ll iterate on quality and latency until the target is met.
$15 USD in 33 days
0.0

Dear Hiring Manager, I am an experienced Android developer with a keen eye for detail, making me the perfect fit for your project of training a custom Turkish TTS model with high-quality speech output and ultra-low latency. I have a strong background in developing and optimizing models for real-time interactions, ensuring stability and clear communication throughout the process. Having worked on similar projects before, I understand the importance of achieving both high quality and low latency in TTS models. I am confident in my ability to deliver a solution that meets your requirements and exceeds your expectations. I would love the opportunity to discuss this project further and showcase how my skills and experience align with your needs. Thank you for considering my proposal. Best regards, Yevhenii
$20 USD in 40 days
0.0

Hello there, I can develop a high-quality, low-latency Turkish TTS model tailored to your needs, ensuring performance comparable to ElevenLabs. By leveraging advanced model architectures and optimization techniques, I will create a solution that excels in both speech quality and real-time processing suitable for phone call interactions. I understand that previous models like PiperTTS and xTTS-v2 haven't met your expectations. I will focus on achieving ultra-low latency while maintaining exceptional sound fidelity and naturalness. Questions: • Do you have specific hardware or deployment constraints for the model to run efficiently? • Are there particular datasets or phonetic nuances in Turkish speech you would like prioritized during training? I'm excited about the opportunity to craft a state-of-the-art TTS model that meets your high standards and enhances user experience. Thanks and best regards, Faizan
$19 USD in 40 days
0.0

Hi there, I understand you’re looking for a high-quality, ultra-low-latency Turkish TTS model suitable for real-time applications like phone calls. I have hands-on experience fine-tuning TTS models for both naturalness and speed, including work with low-latency architectures and inference optimization for real-time streaming. I can help design, train, and deploy a Turkish TTS system that balances audio quality with sub-100ms response times, comparable to ElevenLabs performance. Looking forward to your positive response in the chat. Best Regards, Arbaz H
$20 USD in 40 days
0.0

✨✨✨ I AM DOMENICO ✨✨✨ If you want Turkish TTS that actually sounds human and responds instantly, stop settling for open-source compromises—I build ultra-low latency, production-ready models rivaling ElevenLabs in quality. I’ve reviewed your scope: previous attempts fail on speed, clarity, or real-time deployment. I specialize in custom deep learning TTS pipelines, optimized for inference on CPU/GPU with minimal latency, real-time streaming for phone calls, and fine-tuned voice naturalness. Android or server integration handled. I deliver high-fidelity, responsive, scalable TTS ready for immediate production use.
$20 USD in 40 days
0.0

Hi, We’re a small team of full-stack and ML engineers with hands-on experience building production-ready TTS systems, not just research demos. We understand the challenge you’re facing: most open models trade off quality for speed or speed for quality, and neither works for real-time phone interactions. Our approach is to design the system end-to-end with latency as a first-class constraint, rather than fine-tuning an existing model and hoping it serves fast enough. This includes:
- Selecting or training a non-autoregressive acoustic model suitable for streaming
- Pairing it with a low-latency, real-time vocoder (CPU or GPU depending on infra)
- Streaming / chunked inference to minimize time-to-first-audio
- Turkish-specific text normalization, phoneme handling, and prosody tuning
- Benchmarks focused on real call latency, not offline MOS alone
We’ve evaluated and moved beyond the same models you mentioned (Piper, xTTS-v2, MMS-TTS, Orpheus variants), so we’re aligned on what doesn’t work and why. Our goal would be to reach near-commercial quality while keeping the model servable at very low latency for live calls. We work pragmatically: clear milestones, early latency benchmarks, and honest trade-offs surfaced upfront. Happy to review your constraints, available data, target infra, and sign an NDA if required. Looking forward to discussing the next steps.
$20 USD in 40 days
0.0
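As a rough illustration of the "time-to-first-audio" and call-latency benchmarks mentioned in the bid above, here is a minimal Python sketch. It is not taken from any bidder's code: `fake_synthesize_chunks` is a hypothetical stand-in for a streaming TTS model (it only emits silent PCM chunks after a simulated delay), and the 8 kHz sample rate and 40 ms chunk size are assumed telephony-style values.

```python
import time
from typing import Dict, Iterator, List


def fake_synthesize_chunks(
    text: str, chunk_ms: int = 40, sample_rate: int = 8000
) -> Iterator[List[int]]:
    """Hypothetical streaming synthesizer: yields one silent chunk per word."""
    samples_per_chunk = sample_rate * chunk_ms // 1000
    for _ in text.split():
        time.sleep(0.002)  # simulated model compute per chunk (assumption)
        yield [0] * samples_per_chunk


def measure_streaming(
    text: str, chunk_ms: int = 40, sample_rate: int = 8000
) -> Dict[str, float]:
    """Measure time-to-first-audio, total time, and real-time factor."""
    start = time.perf_counter()
    first_chunk_at = None
    total_samples = 0
    for chunk in fake_synthesize_chunks(text, chunk_ms, sample_rate):
        if first_chunk_at is None:
            # Latency the caller perceives before any audio can be played.
            first_chunk_at = time.perf_counter() - start
        total_samples += len(chunk)
    elapsed = time.perf_counter() - start
    if total_samples == 0:
        raise ValueError("no audio was produced for the given text")
    audio_seconds = total_samples / sample_rate
    return {
        "ttfa_ms": first_chunk_at * 1000.0,
        "total_ms": elapsed * 1000.0,
        "audio_s": audio_seconds,
        # Real-time factor: below 1.0 means synthesis is faster than playback.
        "rtf": elapsed / audio_seconds,
    }


if __name__ == "__main__":
    stats = measure_streaming("merhaba size nasıl yardımcı olabilirim")
    for name, value in stats.items():
        print(f"{name}: {value:.3f}")
```

With a real model, the generator would be replaced by the actual streaming inference call, and the same two numbers (time-to-first-audio and real-time factor) could be tracked per request alongside offline quality scores.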

Hello!! I am a full-time AI engineer with 8+ years of experience in machine learning, deep learning, and production-grade speech systems, including low-latency TTS and real-time voice pipelines. I have worked with neural TTS architectures focused on both speech naturalness and inference speed, especially for telephony and conversational use cases. For your Turkish TTS model, my approach would prioritize quality comparable to ElevenLabs while meeting ultra-low latency constraints. This includes selecting a fast, high-fidelity architecture (e.g. non-autoregressive or streaming-capable models), careful dataset curation and phoneme alignment optimized for Turkish prosody, and aggressive inference optimization (ONNX/TensorRT, mixed precision, chunked/streaming synthesis). Rather than relying on generic open models, I focus on custom training and fine-tuning with latency-aware evaluation, ensuring the model is serveable in real-time phone call scenarios. The final deliverable would be a high-quality, production-ready Turkish TTS model with documented training, benchmarking results (latency + MOS-style quality checks), and a clear serving strategy suitable for real-time deployment. I’d be happy to discuss your target latency budget, deployment environment, and available datasets to align the training strategy precisely with your goals. Best regards, Sushma S.
$15 USD in 40 days
0.0

Hi, you’re aiming to build a custom Turkish TTS that matches ElevenLabs-level naturalness while delivering ultra-low latency for real-time phone calls, which is challenging because most open models trade expressiveness for speed. I’ve worked on neural TTS pipelines where we combine a high-quality neural acoustic model (FastSpeech-style or lightweight diffusion) with a neural vocoder optimized via ONNX/TensorRT to achieve sub-100ms streaming latency. The solution would involve curated Turkish speech data, phoneme-level normalization, training a fast non-autoregressive acoustic model, and deploying a streaming vocoder with chunked inference for telephony. I would also apply aggressive model distillation, quantization, and real-time buffering to maintain quality without sacrificing responsiveness. This approach directly addresses the shortcomings you saw in Piper, xTTS, and MMS by separating expressiveness training from runtime efficiency. I’m confident this architecture can deliver phone-grade real-time performance with noticeably higher naturalness. I’d be happy to discuss the tradeoffs and validate the latency targets together. Best regards, Charonda.
$20 USD in 40 days
0.0

Hi Nureddin, Thanks for the clear brief. You’re right, most open TTS models break down when you push for both natural quality and sub-200ms latency, especially for Turkish. Instead of fine-tuning generic open models, I’d design a custom low-latency TTS stack optimized specifically for phone calls.
- Train a phoneme-aware Turkish acoustic model (FastSpeech-style or diffusion-lite hybrid)
- Use a neural vocoder optimized for streaming (frame-level generation, no full-sentence wait)
- Apply telephony-aware training (8k/16k data, prosody control, noise robustness)
- Serve via chunked inference + CUDA graph optimization for real-time calls
This avoids the quality/latency tradeoff you’re seeing with Piper, xTTS, MMS, etc. If helpful, I can propose a POC in 2–3 weeks with live call testing before full training. Happy to discuss.
$20 USD in 40 days
0.0

I believe I’m a strong fit for this project because I have hands-on experience with deep learning models in audio, multimodal AI, and low-latency inference. I’ve worked on speech and audio-based systems in Chirp: Birdsong Recognition, where I handled audio feature extraction (MFCCs, spectral features) and trained neural models on real-world audio data, and in Farmnaxx, where I built speech-driven, multilingual AI pipelines. What differentiates me is my focus on both quality and speed. In the Arm AI Developer Challenge (Lumen), I optimized and deployed multiple models for real-time, fully offline inference on Arm-based devices using quantization, efficient architectures, and hardware-aware optimizations. This experience directly applies to building ultra-low-latency TTS systems for real-time phone calls. I’m comfortable going beyond off-the-shelf models, modifying architectures and training pipelines to bridge the gap between open-source and production-grade solutions. I have strong fundamentals in machine learning and signal processing, along with experience fine-tuning large models using LoRA and building end-to-end AI systems. I iterate quickly, benchmark quality versus latency, and make engineering tradeoffs to reach production-ready performance.
$15 USD in 20 days
0.0

Istanbul, Turkey
Member since Dec 17, 2025