
Open
Posted
Ends in 5 days
Paid on delivery
I need to turn a Large Language Model into a practical, real-time assistant that runs directly on an NVIDIA Jetson board. The model will reason over image data at the edge, so every millisecond saved and every megabyte spared matters. Memory-optimised execution is therefore the single most important constraint, though I still expect you to keep latency low and power draw sensible.

Here is what I want to achieve. The LLM must accept a pre-processed visual input, apply prompt-engineering tricks that preserve context, and reply fast enough to be useful on-device, with no cloud fallback. Smart caching, selective quantisation, and, when it genuinely pays off, lightweight fine-tuning are all on the table. I will look to you to suggest the right mix of techniques and to implement them.

You should be comfortable with TensorRT, NVIDIA Triton, or any other inference engine that squeezes the most out of Jetson GPUs; hands-on experience with model compression libraries such as bitsandbytes, FasterTransformer, or similar will help convince me you can meet the memory target. If a custom data pipeline is needed to translate raw camera frames into embeddings the LLM can consume, please include that in your plan.

Deliverables
• A working Jetson image or container that boots straight into the optimised LLM service, ready to accept image-derived prompts and respond in real time.
• Source code, build scripts, and concise documentation describing the preprocessing, caching, and memory-saving strategies used, plus steps to reproduce results on a fresh device.
• A short benchmark report demonstrating memory footprint and end-to-end latency under typical input sizes.

Acceptance will be based on the ability to hit the promised memory budget while maintaining interactive response times on the specified Jetson hardware.
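As a rough way to sanity-check a proposed memory budget before any benchmarking, the two largest terms (quantised weights and the KV cache) can be estimated with simple arithmetic. The sketch below is illustrative only: the function names, the model dimensions, and the 4 GiB budget are assumptions and do not come from this brief, and real quantised checkpoints add overhead (scales, zero-points, activations, runtime buffers) that is not counted here.

```python
def weight_bytes(n_params: int, bits_per_weight: float) -> int:
    """Approximate weight storage at a given bit width (ignores quantisation metadata)."""
    return int(n_params * bits_per_weight / 8)


def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2, batch: int = 1) -> int:
    """Approximate KV-cache size for a decoder-only transformer.

    The leading factor of 2 accounts for one key tensor and one value
    tensor per layer. bytes_per_elem=2 corresponds to FP16 cache entries.
    """
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem * batch


def fits_budget(total_bytes: int, budget_bytes: int) -> bool:
    """True when the estimated footprint is within the budget."""
    return total_bytes <= budget_bytes


if __name__ == "__main__":
    # Hypothetical model shape and budget, chosen only for illustration.
    weights = weight_bytes(n_params=3_000_000_000, bits_per_weight=4)
    cache = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, seq_len=2048)
    budget = 4 * 1024**3  # assumed 4 GiB budget

    total = weights + cache
    print(f"weights : {weights / 1024**3:.2f} GiB")
    print(f"KV cache: {cache / 1024**3:.2f} GiB")
    print(f"total   : {total / 1024**3:.2f} GiB, fits: {fits_budget(total, budget)}")
```

An estimate like this only bounds the footprint from below; on a Jetson, where CPU and GPU share memory, the measured figure in the benchmark report is what should be compared against the budget.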
Project ID: 40490529
12 proposals
Open for bidding
Remote project
Active 6 hours ago
12 freelancers are bidding on average ₹359,167 INR for this job

Hi, Krishna here from Delhi. We are a team of 20+ engineers and have completed 300+ projects with a 4.7+ rating, and we would love to connect with you to discuss the project. With over a decade of experience in AI development, deep learning, and machine learning, I am confident that I can handle every aspect of your project, balancing performance with efficiency to meet your needs. I have a thorough understanding of tools such as TensorRT and NVIDIA Triton and of model compression libraries like bitsandbytes and FasterTransformer, which I would use to get the most out of Jetson GPUs while minimizing memory usage. Lastly, collaboration is key during high-stakes projects like this one. As an effective team player and clear communicator, I will keep our lines of communication open and provide frequent updates throughout the project. Trust me with your Jetson edge LLM deployment, and let's redefine innovation together in the digital age.
₹375,000 INR in 7 days
3.5
3.5

As an experienced AI developer, I understand the unique challenges and opportunities that come with deploying a large language model on edge devices like the Jetson board. Having worked extensively with TensorRT and NVIDIA Triton, I know how to optimize models on Jetson GPUs for maximum memory efficiency without compromising latency or power draw. My familiarity with compression libraries like bitsandbytes and FasterTransformer will help me meet your strict memory targets. I also have a solid grasp of building custom data pipelines that preprocess image data and generate embeddings for the LLM. This is vital to ensuring that the model not only operates in real time at the edge but also reasons over image data effectively. I back up my work with thorough documentation, source code, and build scripts so that every strategy employed can be easily understood and reproduced. My aim is not just to deliver what you have asked for, but to add value through concise yet comprehensive benchmark reports demonstrating memory footprint and end-to-end latency under typical input sizes. Your project aligns well with my skill set, which combines an AI development background with a dedication to producing high-quality work. Let's turn your LLM into a blazing-fast edge assistant together!
₹375,000 INR in 7 days
0.0
0.0

I propose to develop a fully optimized on-device AI assistant for NVIDIA Jetson that can understand images and respond in real time without any cloud dependency. The system will use a lightweight LLM (Phi-3 Mini / Llama 3B INT4) combined with a fast vision encoder (CLIP/SigLIP). The pipeline will process camera frames locally, convert them into embeddings, and pass them to the LLM for reasoning. For performance optimization, TensorRT/TensorRT-LLM will be used along with INT4 quantization, KV-cache management, and smart caching to minimize memory usage and reduce latency. The solution will be deployed inside a Docker container with a ready-to-run API service (FastAPI/gRPC) for real-time inference. Final output will include source code, deployment scripts, and a benchmark report showing memory usage and latency on Jetson hardware.
₹375,000 INR in 50 days
0.0
0.0
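The "smart caching" step in the proposal above could, in its simplest form, mean reusing a vision-encoder embedding when the same frame is seen again. The following is a minimal sketch under that assumption: `EmbeddingCache` and `fake_encode` are hypothetical names, the encoder is a stand-in for a real CLIP/SigLIP call, and exact hashing only matches byte-identical frames, so a live camera feed would likely need perceptual hashing or frame differencing instead.

```python
import hashlib
from collections import OrderedDict
from typing import Callable, List


class EmbeddingCache:
    """Small LRU cache mapping a frame's content hash to its embedding."""

    def __init__(self, encode: Callable[[bytes], List[float]], max_entries: int = 64):
        self._encode = encode          # stand-in for the real vision encoder
        self._max = max_entries        # cap on entries to bound memory use
        self._store: "OrderedDict[str, List[float]]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, frame: bytes) -> List[float]:
        key = hashlib.sha256(frame).hexdigest()
        if key in self._store:
            # Mark as most recently used and skip the encoder call.
            self._store.move_to_end(key)
            self.hits += 1
            return self._store[key]
        self.misses += 1
        embedding = self._encode(frame)
        self._store[key] = embedding
        if len(self._store) > self._max:
            # Evict the least recently used entry.
            self._store.popitem(last=False)
        return embedding


def fake_encode(frame: bytes) -> List[float]:
    """Hypothetical encoder used only for illustration."""
    return [float(len(frame)), float(sum(frame) % 256)]


if __name__ == "__main__":
    cache = EmbeddingCache(fake_encode, max_entries=2)
    cache.get(b"frame-a")
    cache.get(b"frame-a")  # served from the cache
    print(f"hits={cache.hits} misses={cache.misses}")
```

Bounding the cache by entry count keeps its memory use predictable, which matters when the embeddings share a tight budget with the model weights and KV cache.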

Chennai, India
Member since Jun 4, 2026