
Closed
Posted:
Paid on delivery
I have a very limited corpus of speech and transcripts (just a few hours), and I need to translate a low-data language to English in real time. The goal is to adapt a large ASR/TTS Transformer (Whisper-style architecture) to this low-data setting, squeezing the most out of the dataset through smart training techniques.

What matters most is the training strategy: curriculum scheduling, learning-rate tricks, mixed precision, and any other techniques you know that keep a Transformer stable when data are scarce. Data augmentation, synthetic data creation, few-shot learning, and self-supervised learning are all on the table; noise injection and pitch or speed variation can be explored if they prove helpful.

I prefer to work within proven open-source stacks (Coqui TTS, Mozilla TTS, and Tacotron), so please build and document the pipeline in those environments. The final model should achieve intelligible, natural-sounding speech in the target language and be reproducible on a single high-end GPU.

Deliverables
• Cleaned, ready-to-train audio/text dataset (scripts included)
• Training code and configuration files for the chosen framework(s)
• Fine-tuned model checkpoints plus inference script
• Short report summarising hyper-parameters, augmentation methods, and evaluation results (WER/MOS)

Acceptance criteria
• Training scripts run end-to-end from raw data to synthesis without errors
• Objective metrics meet or exceed a baseline Whisper fine-tune on the same data
• Synthesised samples judged 4.0 MOS or higher by at least three native speakers

If you enjoy pushing Transformers into low-resource territory, let’s make this language heard.
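The acceptance criteria above rest on two numbers, WER and MOS. The following is a minimal sketch of how they could be computed, not a prescribed evaluation protocol. It assumes whitespace tokenisation for WER and reads the MOS criterion as "mean of all ratings is at least 4.0, with at least three raters contributing"; the function names and that reading are illustrative assumptions, not part of the posting.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j]: edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


def meets_mos_criterion(ratings_by_rater, threshold=4.0, min_raters=3):
    """Assumed reading: pooled mean rating >= threshold with >= min_raters raters."""
    active = {r: s for r, s in ratings_by_rater.items() if s}
    if len(active) < min_raters:
        return False
    all_scores = [score for scores in active.values() for score in scores]
    return sum(all_scores) / len(all_scores) >= threshold
```

In practice the reference and hypothesis would usually be normalised (case, punctuation) before scoring, and the same normalisation should be applied to the baseline Whisper fine-tune so the comparison stays fair.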
Project ID: 40065263
24 bids
Remote project
Last activity: 2 months ago
24 freelancers are bidding an average of $6,013 USD for this project

Hello, As the leader of a talented team at Live Experts, we've accumulated extensive experience and knowledge in the domains of Deep Learning and Machine Learning. Specifically, within the realm of low-resource settings, we've developed quite an expertise. In fact, challenging boundaries and maximizing the potential of limited data is what excites us the most about AI. Your project's complex training strategy requirements align perfectly with our skillset. With hands-on familiarity in Tacotron, Coqui TTS, and Mozilla TTS among other frameworks, we can create a tailor-made pipeline for your unique dataset. We know how to bend low data limitations by employing smart training methods such as curriculum scheduling, learning-rate tricks, mixed-precision training and even self-supervised techniques. Moreover, our ability to navigate with ease through tasks like synthetic data creation and few-shot/zero-shot learning will undoubtedly prove valuable in this project. This combined with our strong understanding of augmentations using techniques like noise injection, pitch or speed variation will help maximize the use of your limited corpus. Our collaborative efforts with you will not end at training though: our deliverables include a comprehensive report summarizing hyper-parameter choices, augmentation methods used along with transparent evaluation results to know just what went into creating an intelligible model. In conclusion, if you'r Thanks!
$5,000 USD in 3 days
6.8

With over 10 years of experience in web and mobile development, specializing in AI/ML, blockchain, and more, I understand the unique challenge your project presents. You have a limited corpus of speech and transcripts and need to train a Transformer in a new language using smart techniques. In previous projects, I have successfully implemented innovative strategies in fintech, healthcare, and blockchain domains, achieving outstanding results. My expertise in training AI models with scarce data aligns perfectly with your project requirements. Let's bring your vision to life by leveraging my experience and skills to deliver a high-quality solution. I am excited to collaborate with you on this project and look forward to discussing the details further. Contact me to discuss how we can achieve exceptional results together.
$4,000 USD in 45 days
5.4

I propose to build a reproducible, low-resource speech-to-English translation system by adapting a Whisper-style Transformer using advanced training strategies designed for scarce data. With only a few hours of labeled speech, the focus will be on how the model is trained rather than model size: curriculum learning (short-to-long utterances), careful learning-rate scheduling with warm-up, mixed-precision training, encoder freezing/unfreezing, and strong regularization to prevent overfitting. The pipeline will be built on proven open-source stacks (Whisper + Coqui TTS/Tacotron). Data efficiency will be maximized through augmentation (speed perturbation, mild noise injection, SpecAugment), pseudo-labeling on unlabeled audio, and self-training. The first phase targets low-resource ASR → English text, followed by high-quality English TTS to ensure natural, intelligible output while keeping MOS high. The system will be optimized for near-real-time inference on a single high-end GPU. Deliverables include fully automated data-cleaning scripts, end-to-end training code and configs, fine-tuned checkpoints, and an inference script. A concise report will document hyperparameters, augmentation choices, and evaluation results (WER and MOS). Success is defined by stable end-to-end training, objective metrics exceeding a baseline Whisper fine-tune, and synthesized speech achieving ≥4.0 MOS from native evaluators.
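The short-to-long curriculum and warm-up schedule mentioned in this bid can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the linear growth of the visible data fraction, the cosine decay after warm-up, and all names and defaults (`curriculum_subset`, `warmup_cosine_lr`, `start_fraction`) are hypothetical choices, not the bidder's actual method.

```python
import math


def curriculum_subset(utterances, epoch, total_curriculum_epochs, start_fraction=0.3):
    """Return the utterances visible at `epoch`, shortest first.

    utterances: list of (utt_id, duration_seconds) pairs.
    The visible fraction grows linearly from start_fraction to 1.0.
    """
    ordered = sorted(utterances, key=lambda u: u[1])
    if total_curriculum_epochs <= 1:
        return ordered
    progress = min(1.0, epoch / (total_curriculum_epochs - 1))
    fraction = start_fraction + (1.0 - start_fraction) * progress
    keep = max(1, math.ceil(fraction * len(ordered)))
    return ordered[:keep]


def warmup_cosine_lr(step, peak_lr, warmup_steps, total_steps, min_lr=0.0):
    """Linear warm-up to peak_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

With only a few hours of audio, a small peak learning rate and a comparatively long warm-up are the usual conservative starting point; the exact values would have to be found empirically.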
$50,000 USD in 55 days
5.3

You’ve already built something powerful — the challenge now is that in a low-data setting, Transformers run out of meaningful signal long before they run out of capacity. That’s why training starts to feel unstable, progress becomes unpredictable, and improvements don’t always translate into better real-world results. I understand that frustration, and I’m here to help you move forward calmly and deliberately. My job is to get you to a model you can trust, retrain, and extend — without wasted runs or evaluation drama. Given your constraints (a few hours of speech, limited phonetic coverage, and a single high-end GPU), this is a strategy problem more than a compute problem. The work is about keeping training stable past the early overfit cliff, using augmentation conservatively, and knowing early whether a run is worth continuing. How I’ll do this: Validate the dataset, check alignment, and run baselines to set realistic expectations. Run disciplined training cycles with controlled augmentation, learning-rate scheduling, and checkpoint evaluation. Refine results, align MOS feedback, and deliver clean configs, checkpoints, and an inference script. I’m comfortable working with Coqui TTS, Mozilla TTS, and Tacotron-based pipelines, and I prioritize reproducibility and documentation so nothing depends on “magic” settings. If this aligns with what you’re aiming for, I’d be happy to walk through the proposed phases and success criteria before we start.
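One common way to decide early whether a run is worth continuing is patience-based early stopping on a held-out validation loss. The sketch below is an assumption about how that check might look; the class name `EarlyStopper` and its parameters are hypothetical and do not come from the bid.

```python
class EarlyStopper:
    """Signal a stop when validation loss has not improved for `patience` evaluations."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_evals = 0

    def update(self, val_loss: float) -> bool:
        """Record one evaluation; return True if training should stop."""
        if val_loss < self.best - self.min_delta:
            # Improvement: remember it and reset the counter.
            self.best = val_loss
            self.bad_evals = 0
        else:
            self.bad_evals += 1
        return self.bad_evals >= self.patience
```

The checkpoint saved at the best validation loss, rather than the last one, is the one that would normally be carried forward to evaluation.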
$3,200 USD in 15 days
4.8

Dear Project Lead, What if you could achieve natural-sounding speech synthesis in your low-resource language without massive datasets? I'd like to build a working demo of the optimized Whisper-style training pipeline—complete with curriculum scheduling and data augmentation—before you commit to the full project, so you can see the approach in action. I specialize in squeezing maximum performance from scarce speech data using proven techniques: mixed-precision training, synthetic data generation, self-supervised learning, and intelligent hyperparameter tuning. Using your preferred open-source stacks (Coqui TTS, Mozilla TTS, Tacotron), I'll deliver a reproducible, GPU-efficient pipeline that achieves 4.0+ MOS scores and exceeds baseline Whisper performance on your exact dataset. Your language deserves to be heard—and I'm confident this approach will get you there. Let's discuss your specific corpus characteristics and audio quality goals so I can refine the strategy and show you a working demo. Regards, Smith
$4,000 USD in 7 days
4.4

As a versatile Full Stack Developer with expertise extending into Machine Learning, I’m passionate about pushing the boundaries of AI with innovative, bespoke solutions. Your project caught my eye because it specifically requires someone who enjoys navigating challenging low-resource language tasks, and that's exactly me! Being adept at both Python and AI frameworks such as TensorFlow prepares me well for the open-source Coqui TTS, Mozilla TTS, and Tacotron stacks you prefer. In the context of scarce data like yours, my ML skills make the difference. I'm practiced at deploying advanced data augmentation, few-shot, and self-supervised learning techniques in conjunction with clever curriculum scheduling, learning-rate tricks, and mixed-precision strategies to optimize results even when data is limited. I leverage my fluency in Python (including Django and FastAPI) and Java (especially Spring Boot), along with my understanding of languages like C++, Lua and C#, to build powerful models that yield natural-sounding real-time translations.
$3,000 USD in 11 days
3.7

As an experienced full stack Engineer with a strong background in Web and Mobile Application Development, I am excited about the opportunity to work on the Low Resource TTS Transformer Training project. With over 8 years of experience in developing innovative solutions, I am confident in my ability to tackle the challenges posed by adapting a large ASR/TTS Transformer to a low-data setting. My expertise in curriculum scheduling, learning-rate optimization, and other smart training techniques will be instrumental in maximizing the potential of the limited corpus provided. I am well-versed in working with open-source stacks such as Coqui TTS, Mozilla TTS, and Tacotron, and I am committed to delivering a high-quality, reproducible model that meets your requirements. I am passionate about pushing the boundaries of technology and am eager to collaborate with you on this project to make this language heard. Let's work together to create intelligible, natural-sounding speech in the target language.
$3,000 USD in 7 days
0.0

Hi, I’m ready to adapt a large ASR/TTS Transformer to translate your low-data language to English in real-time, maximizing performance despite having only a few hours of speech and transcripts.
My approach:
--> Clean and preprocess your audio and transcripts, including normalization, noise handling, and alignment, producing a ready-to-train dataset.
--> Implement curriculum scheduling, adaptive learning rates, mixed-precision training, gradient accumulation, and stability-focused tricks tailored for low-data Transformers.
--> Apply noise injection, pitch/speed variation, and potentially synthetic speech generation to expand the effective dataset. Few-shot and self-supervised learning methods will be explored to boost performance.
--> Build and document the pipeline within Coqui TTS, Mozilla TTS, or Tacotron environments, ensuring training runs end-to-end on a single high-end GPU.
--> Provide fine-tuned model checkpoints, inference scripts, and a short report summarizing hyperparameters, augmentation methods, and evaluation results (WER/MOS).
Quick questions:
What is the total duration and number of speakers in your dataset?
Any specific domain or vocabulary focus for translation that we should prioritize?
Preferred target latency or real-time constraints for inference?
I will focus on your requirements. I worked on similar projects and can share upon request. Let’s discuss timeline and budget based on these details via chat.
Regards, Atta
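The noise-injection and speed-variation steps listed in this bid can be illustrated on raw sample lists with the standard library alone. This is a rough sketch under assumptions: white Gaussian noise at a chosen signal-to-noise ratio, and speed change by linear-interpolation resampling (which shifts pitch along with tempo). The function names are hypothetical; a real pipeline would more likely use an audio library.

```python
import math
import random


def add_noise_at_snr(samples, snr_db, rng=None):
    """Add white Gaussian noise so the result has roughly the requested SNR in dB."""
    if not samples:
        return []
    rng = rng or random.Random()
    signal_power = sum(s * s for s in samples) / len(samples)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [s + rng.gauss(0.0, sigma) for s in samples]


def speed_perturb(samples, factor):
    """Resample by linear interpolation; factor > 1 gives faster, shorter audio."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not samples:
        return []
    out_len = max(1, int(round(len(samples) / factor)))
    out = []
    for i in range(out_len):
        pos = i * factor
        lo = int(pos)
        if lo >= len(samples) - 1:
            out.append(samples[-1])  # clamp at the final sample
            continue
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[lo + 1] * frac)
    return out
```

Speed factors close to 1.0 (for example 0.9 and 1.1) and moderate SNRs are the usual conservative choices; heavier perturbation can hurt more than it helps when the clean corpus is only a few hours long.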
$4,000 USD in 25 days
0.0

Boston, United States
Member since Dec 15, 2025