
Open
Ends in 6 hours
I have a folder of MP4 recordings that contain all kinds of on-screen numbers: timestamps, counters, device read-outs, you name it. I need an offline application (source code is fine) that can automatically detect, read, and export every numerical value it finds as the video plays frame by frame. Nothing should leave the machine, so the entire pipeline must run locally without any cloud calls.

Key details you can rely on: the videos are pre-recorded (no live streams) and always in MP4 format; the numbers are generic, not limited to scores or license plates.

I want an AI-driven approach so that, after the first delivery, I can feed the model corrections or new examples and watch its accuracy climb over time. Think of a small feedback interface or a retraining script that lets me tweak performance without rewriting the whole program. I am comfortable with mainstream computer-vision stacks such as OpenCV, TensorFlow, or PyTorch, so feel free to build on those.

The final package should include:
• A runnable app or script that ingests an MP4 file, detects every distinct numerical instance, and outputs the readings (JSON or CSV is fine).
• A simple mechanism (command-line flag, YAML config, or lightweight GUI) that lets me supply corrected readings and trigger incremental training.
• Clear setup instructions plus a brief note on how the model can be extended to new video resolutions or number fonts.

Acceptance criteria: on a short sample video I will provide, the tool should correctly capture at least 90% of the visible numbers on first run, store timecodes alongside each reading, and retrain successfully when I give it a handful of edits.

If that matches your expertise, I’m ready to test your prototype and iterate quickly.
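The brief's core output is a list of readings, each stored with its timecode. The sketch below shows a minimal version of that export step. It rests on several assumptions:
- It uses a hypothetical `Reading` record with one bounding box and one confidence per value.
- It assumes the frame rate is already known. In an OpenCV pipeline that value would typically come from `cv2.VideoCapture(path).get(cv2.CAP_PROP_FPS)`.
- Deriving timecodes from the frame index only works at a constant frame rate. Variable-frame-rate MP4s would need per-frame timestamps such as `CAP_PROP_POS_MSEC` instead.

```python
import csv
import io
from dataclasses import dataclass
from typing import List


@dataclass
class Reading:
    """One numeric value read from one frame (hypothetical record layout)."""
    frame_idx: int
    value: str
    x: int
    y: int
    w: int
    h: int
    confidence: float


def frame_to_timecode(frame_idx: int, fps: float) -> str:
    """Convert a zero-based frame index to HH:MM:SS.mmm (constant fps assumed)."""
    total_ms = round(frame_idx * 1000 / fps)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{millis:03d}"


def readings_to_csv(readings: List[Reading], fps: float) -> str:
    """Serialise readings to CSV text, one row per reading, timecode first."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["timecode", "frame", "value", "x", "y", "w", "h", "confidence"])
    for r in readings:
        writer.writerow([frame_to_timecode(r.frame_idx, fps), r.frame_idx, r.value,
                         r.x, r.y, r.w, r.h, f"{r.confidence:.3f}"])
    return buf.getvalue()
```

To cover the JSON option, swap `csv.writer` for `json.dumps` applied to `dataclasses.asdict(r)`.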
Project ID: 40463718
72 proposals
Open for bidding
Remote project
Active 3 days ago
72 freelancers are bidding on average £16 GBP/hour for this job

As an Electrical Engineer and AI expert dedicated to firmware development and complete IoT product engineering, I am confident in my ability to meet your needs for this project. My experience with microcontrollers and embedded systems, firmware development, PCB design and RF hardware, and AI/ML integration aligns perfectly with the requirements of your project. By choosing me, you are selecting a professional who can design a cohesive offline application that reads numerical values from your videos reliably. Not only will I use mainstream computer-vision stacks such as OpenCV, TensorFlow, and PyTorch, but I will also optimize the code for enhanced performance.
£20 GBP in 40 days
8.2

Hello, Building a local, adaptive OCR pipeline for unstructured video data is exactly where standard tools fail and custom AI shines. To handle varying fonts and formats entirely offline, I will deploy YOLOv8 via ONNX Runtime to detect and track numerical regions frame-by-frame, paired with a lightweight PyTorch CRNN for robust local text extraction into a timecoded JSON/CSV. To ensure accuracy climbs over time, I'll build a local Streamlit GUI where you can review low-confidence frames, log corrections, and trigger an automated fine-tuning script that adapts the model without rewriting code. Question: Are these on-screen numbers generally fixed in place (like a timestamp) or do they move dynamically across the frame? Best, Niral
£10 GBP in 40 days
7.9
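Niral's proposal sends low-confidence frames to a review GUI. The selection step behind that screen can be very small. The sketch below assumes each reading carries a recognizer confidence in [0, 1] and uses a hypothetical threshold of 0.8. The Streamlit front end itself is not shown.

```python
from typing import List, Optional, TypedDict


class Reading(TypedDict):
    frame: int
    value: str
    confidence: float  # recognizer confidence in [0, 1]


def review_queue(readings: List[Reading], threshold: float = 0.8,
                 limit: Optional[int] = None) -> List[Reading]:
    """Return readings below `threshold`, least confident first, for manual review."""
    flagged = [r for r in readings if r["confidence"] < threshold]
    flagged.sort(key=lambda r: r["confidence"])
    return flagged if limit is None else flagged[:limit]
```

Showing the least-confident items first is a simple form of uncertainty sampling. Each correction the user makes is then more likely to fix a real error than to confirm a correct read.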

⭐⭐⭐⭐⭐ Project Understanding: We propose an offline AI solution using OpenCV for video frame extraction, combined with a fine-tunable PyTorch-based OCR model (e.g., CRNN or TrOCR variant) to detect and recognize generic on-screen numbers across timestamps, counters, and readouts in MP4 files. All processing runs locally with no cloud dependency.
Proposed Solution: Develop a Python script/app that processes videos frame-by-frame, extracts bounding boxes for numbers via detection model, reads values with timestamps, and exports to JSON/CSV. Include a YAML config + simple CLI/GUI for uploading corrections and triggering incremental retraining on user feedback data.
Key Features: 90%+ initial accuracy target on sample video; stores timecodes; supports model extension for new resolutions/fonts via transfer learning scripts.
CnELIndia Team Support Steps:
1. Analyze sample video and baseline model training (1 week).
2. Build core pipeline and feedback interface (2 weeks).
3. Test, iterate retraining, and optimize for accuracy (1 week).
4. Deliver full source code, setup docs, and extension guide.
5. Provide post-delivery maintenance for further tuning.
This meets all acceptance criteria using required skills (Python, OpenCV, Computer Vision, Deep Learning). Ready for prototype testing.
£13 GBP in 40 days
7.5
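A YAML config plus a CLI is one way to meet the brief's "command-line flag, YAML config, or lightweight GUI" requirement. The sketch below covers only the command-line half and uses nothing beyond `argparse`. The program name `numread`, both subcommands, and every flag are hypothetical. Loading `config.yaml` (for example with PyYAML) is left out.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical two-command CLI: one to extract readings, one to retrain."""
    parser = argparse.ArgumentParser(
        prog="numread", description="Offline on-screen number reader (sketch)")
    sub = parser.add_subparsers(dest="command", required=True)

    extract = sub.add_parser("extract", help="read numbers from an MP4 file")
    extract.add_argument("video", help="path to the input .mp4")
    extract.add_argument("--out", default="readings.csv", help="CSV or JSON output path")
    extract.add_argument("--config", default="config.yaml", help="pipeline settings file")

    retrain = sub.add_parser("retrain", help="fine-tune on corrected readings")
    retrain.add_argument("corrections", help="CSV of corrected readings")
    retrain.add_argument("--epochs", type=int, default=5, help="fine-tuning passes")
    return parser
```

Once the parser is wired to a `main()` function (not shown), usage would look like `python numread.py extract clip.mp4 --out readings.csv` followed by `python numread.py retrain fixes.csv --epochs 3`.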

Dear client, We have carefully studied the description of your project, and we can confirm that we understand your needs and are interested in your project. Our team has the necessary resources to start your project as soon as possible and complete it in a very short time. We have been in this business for 25 years, and our technical specialists have strong experience in C Programming, Python, Software Architecture, C++ Programming, Image Processing, OpenCV, Computer Vision, Deep Learning, and other technologies relevant to your project. Please review our profile at https://www.freelancer.com/u/tangramua, where you can find detailed information about our company, our portfolio, and recent client reviews. Please contact us via Freelancer Chat to discuss your project in detail. Best regards, Sales department Tangram Canada Inc.
£22 GBP in 5 days
7.9

Hey, I will build an offline Python pipeline — OpenCV for frame extraction, a fine-tuned CRNN model via PyTorch for number detection and recognition, outputting timestamped readings to JSON/CSV. For retraining, I will include a CLI script that accepts your corrected labels in a simple YAML format and runs incremental fine-tuning without touching the core codebase. One key design choice: I will use adaptive frame sampling to skip near-duplicate frames, cutting processing time significantly while preserving accuracy. Questions: 1) What is the typical resolution and frame rate of your MP4 files? 2) Are the numbers always static on screen, or do some scroll or animate? Looking forward to your response. Best regards, Kamran
£12 GBP in 40 days
7.1
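Kamran proposes adaptive frame sampling that skips near-duplicate frames. The sketch below is a minimal version under stated assumptions:
- Frames are stand-in flattened grayscale pixel lists. On real NumPy frames from OpenCV, `cv2.absdiff(a, b).mean()` would be the vectorised equivalent.
- The threshold of 2.0 is hypothetical.
- A whole-frame average can hide a small change, such as a counter ticking from 7 to 8, because only a few pixels differ. Computing the difference per number region guards against skipping real updates.

```python
from typing import Iterable, Iterator, Optional, Sequence

# Stand-in for a frame: flattened grayscale pixel values 0-255.
# A real pipeline would hold NumPy arrays decoded by OpenCV instead.
Frame = Sequence[int]


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Average per-pixel absolute difference between two equal-sized frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same size")
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def sample_changed_frames(frames: Iterable[Frame], threshold: float = 2.0) -> Iterator[int]:
    """Yield indices of frames that differ enough from the last *kept* frame.

    Comparing against the last kept frame, rather than the immediately
    previous one, stops a slow fade from slipping through in small steps.
    """
    last_kept: Optional[Frame] = None
    for idx, frame in enumerate(frames):
        if last_kept is None or mean_abs_diff(frame, last_kept) > threshold:
            last_kept = frame
            yield idx
```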

Hello, I trust you're doing well. I am well experienced in machine learning algorithms, with nearly a decade of hands-on practice. My expertise lies in developing various artificial intelligence algorithms, including the one you require, using MATLAB, Python, and similar tools. I hold a doctorate from Tohoku University and have a number of publications on the same subject. My portfolio, which showcases my past work, is available for your review. Your project piqued my interest, and I would be delighted to be part of it. Let's connect to discuss it in detail. Warm regards. Please check my portfolio link: https://www.freelancer.com/u/sajjadtaghvaeifr
£25 GBP in 40 days
6.7

I've built almost this exact kind of pipeline recently — a fully offline OCR tool that walks MP4 frames, detects on-screen numbers, and exports readings with timecodes, no cloud calls. I'd use OpenCV for frame extraction plus a local OCR engine (PaddleOCR or a fine-tuned model) running in PyTorch, all on-machine. The pipeline detects every distinct numeric instance per frame, deduplicates across consecutive frames so a number held on screen isn't logged hundreds of times, and writes each reading with its timecode to JSON or CSV. For the feedback loop I'd add a lightweight correction interface where you supply fixes, which get stored as a labelled dataset that an incremental retraining script fine-tunes on — so accuracy climbs without rewriting anything. I'll keep it extensible to new resolutions and fonts via config. Delivered with setup instructions and the retraining note. I'll target 90%+ on your sample first run. Happy to talk through the OCR approach on a quick call. Best, Dev S.
£15 GBP in 40 days
6.6
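Dev S. describes deduplicating readings across consecutive frames so that a number held on screen is logged once rather than hundreds of times. The sketch below is minimal and makes these assumptions:
- The readings are already grouped per screen region, for example by a tracker.
- The `max_gap` tolerance of 2 frames is a hypothetical default.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Event:
    value: str
    start_frame: int
    end_frame: int


def merge_readings(per_frame: List[Tuple[int, str]], max_gap: int = 2) -> List[Event]:
    """Collapse (frame, value) readings from ONE screen region into events.

    A reading extends the current event when it has the same value and lies
    within `max_gap` frames of the event's last frame; otherwise it starts a
    new event. The gap tolerance absorbs frames where OCR missed the number.
    """
    events: List[Event] = []
    for frame, value in sorted(per_frame):
        last = events[-1] if events else None
        if last is not None and last.value == value and frame - last.end_frame <= max_gap:
            last.end_frame = frame
        else:
            events.append(Event(value, frame, frame))
    return events
```

Each event then becomes one output row, with `start_frame` and `end_frame` converted to timecodes. A single misread frame in the middle of a run still splits the event. A majority vote over a short window is one possible refinement.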

Hi, this is a computer-vision extraction problem, not just OCR, and the offline requirement makes the system design more important than the model choice. The real engineering risk is separating true on-screen numeric signals from repeated frame noise, persistent overlays, and duplicate reads across adjacent frames while still keeping retraining simple. I've built several production systems around Python-based AI pipelines, OCR-style ingestion, and structured export workflows. For this kind of job, I usually structure the pipeline as video decode, candidate-region detection, numeric recognition, temporal tracking, and export so each layer can be tuned independently. The closest matches in my project history are AI-Driven Marketing Suite Development -- 2 for video-oriented AI processing, and Python Bug Localization Using Transformer Models (CodeBERT + TreeBERT) for reproducible training, confidence scoring, and documented CLI delivery. I would recommend separating detection from reading and then adding a temporal merge layer so the same number is not emitted as a new event on every frame. That tradeoff usually improves precision and makes correction data more useful for incremental retraining. I also typically add confidence thresholds, reviewable correction files, and an evaluation pass against labeled samples so accuracy improvements are measurable rather than subjective. Thanks, Hercules
£25 GBP in 40 days
6.6
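Hercules recommends an evaluation pass against labelled samples so that accuracy improvements are measurable. The sketch below is a minimal scorer under stated assumptions:
- Both predictions and hand labels are reduced to (frame, value) pairs.
- A stricter version would also require the predicted and labelled boxes to overlap.

Recall here corresponds to the brief's "capture at least 90% of the visible numbers" criterion.

```python
from collections import Counter
from typing import List, Tuple

Item = Tuple[int, str]  # (frame index, value string)


def precision_recall(predicted: List[Item], labelled: List[Item]) -> Tuple[float, float]:
    """Score (frame, value) predictions against hand-labelled ground truth.

    A prediction counts as correct only if the same value was labelled on the
    same frame; Counter intersection handles a value appearing twice in a frame.
    """
    pred, truth = Counter(predicted), Counter(labelled)
    true_pos = sum((pred & truth).values())
    precision = true_pos / sum(pred.values()) if pred else 0.0
    recall = true_pos / sum(truth.values()) if truth else 0.0
    return precision, recall
```

Running this after every retraining cycle on the same labelled clip shows whether corrections actually move the numbers.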

Hi there, I understand you need a fully offline system that processes MP4 videos frame by frame, detects all visible numbers, extracts them with timecodes, and improves over time through user corrections. I am confident I can build a robust computer vision pipeline that meets these requirements while staying completely local with no cloud dependency. My approach will be to use Python with OpenCV for video ingestion and frame-level processing, combined with a local OCR engine such as Tesseract or an EasyOCR/PyTorch-based model for numeric recognition. The system will detect number regions, apply OCR per frame, and output structured results in CSV or JSON, including timestamps. I will also design a feedback loop where you can submit corrected outputs, which will be stored and used for periodic retraining or fine-tuning to improve accuracy over time without rebuilding the system. The deliverable will include a runnable offline script or lightweight application for MP4 processing, a numeric detection and export module, a simple correction mechanism for feedback-based retraining, and full setup instructions with guidance for adapting to different resolutions and fonts. Are your videos mostly clean screen recordings with digital numbers, or do they include noisy or real-world footage where numbers may be distorted or partially obscured? I’m ready to start immediately. Warm regards, Aneesa.
£10 GBP in 40 days
6.3
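Aneesa's feedback loop stores corrected outputs for later fine-tuning. The sketch below shows one hypothetical way to turn a corrections file into labels. It assumes:
- Model readings are keyed by (frame, region id).
- Corrections arrive as a CSV with `frame,region,value` columns, where a blank value marks a false positive.

Both formats are assumptions, not anything the proposal specifies.

```python
import csv
import io
from typing import Dict, Tuple

Key = Tuple[int, int]  # (frame index, region id): hypothetical keying scheme


def apply_corrections(readings: Dict[Key, str], corrections_csv: str) -> Dict[Key, str]:
    """Merge user corrections into model readings to build a labelled set.

    A non-empty value fixes a misread (or adds a missed number); a blank value
    marks a false positive and removes that reading. The input is not mutated.
    """
    labelled = dict(readings)
    for row in csv.DictReader(io.StringIO(corrections_csv)):
        key = (int(row["frame"]), int(row["region"]))
        value = row["value"].strip()
        if value:
            labelled[key] = value
        else:
            labelled.pop(key, None)
    return labelled
```

Treating uncorrected readings as implicitly confirmed keeps the labelled set large. It also risks training on errors nobody noticed. Restricting training to rows the user actually reviewed is the conservative alternative.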

I am an experienced AI developer. Your job caught my eye and looks quite interesting to me, as I developed real-time vehicle detection using transfer learning in the recent past. I am well versed in generative AI and have hands-on experience developing AI applications using LangChain and LLMs. I am confident that I can help you by developing a deep learning model for on-screen number detection in video recordings.
Similar work done in the past:
- Real-time vehicle detection
- Facial recognition for biometrics
- Fetal brain abnormality detection
- AI-powered copilot for Text2SQL queries
- Semantic search engine
Relevant skills:
- Python
- Django/Flask
- ResNet50/YOLOv11/Transfer Learning
- Agentic AI
- GPT-4o/Gemini/Llama 3.2
- AWS
- LangChain
- MySQL
- TensorFlow
- Google Colab
- OpenCV
Let's have a chat to understand the project objective and the dataset in detail. I assure you I will deliver the best quality results and ensure customer satisfaction. Looking forward to hearing from you soon. Thanks for the opportunity.
£15 GBP in 40 days
6.4

Hello, I have carefully reviewed your requirements and fully understand the scope of building an offline AI-driven video number extraction system for MP4 recordings. I have 10+ years of experience in computer vision, OCR pipelines, OpenCV, PyTorch/TensorFlow, and AI model training workflows. I can develop a fully local solution that processes videos frame by frame, detects numerical values, extracts readings with timestamps, and exports structured JSON/CSV results. I can also optimize the pipeline for incremental learning so the model's accuracy improves continuously from user-provided corrections without rebuilding the entire system. We will work with an agile methodology and provide complete source code, 2 years of free ongoing support, and full documentation for setup, training, and extension. I am confident I can deliver a reliable prototype matching your 90%+ detection accuracy target and iterate quickly based on your test samples. I eagerly await your positive response. Thanks
£13 GBP in 40 days
6.9

Hello, I understand you need a fully offline system to process MP4 files, extract all on-screen numerical values frame by frame, and export them with timestamps. The core of this system is not just the initial recognition but a crucial feedback loop where you can provide corrected data to incrementally retrain and improve the model's accuracy over time, all without cloud dependency.
Technical approach: I'll use Python with OpenCV for video stream processing. A two-stage model is best: first, a lightweight object detector (like an EAST text detector or a custom YOLO variant) to find number regions, followed by a CRNN or Transformer-based OCR model in PyTorch for high-accuracy recognition. The entire pipeline will be self-contained.
Core modules: Video Frame Processor, Bounding Box Detection Engine (for number localization), OCR Engine (for character recognition), Timestamped Data Logger (JSON/CSV), and a Retraining Script that accepts a corrections file to fine-tune the OCR model.
Implementation strategy: We'll start by building the core detection/OCR pipeline and benchmarking it on your sample video. Once we hit the >90% accuracy baseline, we will build the user-facing retraining script and package the application with clear setup and model extension guides.
I have a few questions to clarify the scope:
1. Are the numbers typically in fixed locations, or do they move freely around the screen?
2. Do the videos feature a wide variety of fonts, sizes, and colors for the numbers, or are they relatively consistent?
3. Is there a requirement for processing speed (e.g., frames per second), or is accuracy the only priority?
Regards, Rohit
£14 GBP in 25 days
6.7
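Rohit's two-stage design (a detector that finds number regions, then a recognizer that reads each one) maps naturally onto two small interfaces. The sketch below shows that structure. All class and function names are hypothetical. A YOLO/EAST detector or a CRNN recognizer would implement the same `detect`/`read` methods. `FixedRegionDetector` is a stand-in that fits the case raised in question 1, where the numbers sit in fixed locations.

```python
from dataclasses import dataclass
from typing import Any, List, Protocol, Sequence, Tuple


@dataclass(frozen=True)
class Box:
    x: int
    y: int
    w: int
    h: int


class Detector(Protocol):
    def detect(self, frame: Any) -> List[Box]: ...


class Recognizer(Protocol):
    def read(self, frame: Any, box: Box) -> Tuple[str, float]: ...


class FixedRegionDetector:
    """Stand-in detector for static overlays: returns the same boxes every frame."""

    def __init__(self, boxes: Sequence[Box]):
        self.boxes = list(boxes)

    def detect(self, frame: Any) -> List[Box]:
        return list(self.boxes)


def process_frame(frame: Any, detector: Detector, recognizer: Recognizer,
                  min_conf: float = 0.5) -> List[Tuple[Box, str, float]]:
    """Stage 1 finds candidate number regions; stage 2 reads each one."""
    results = []
    for box in detector.detect(frame):
        text, conf = recognizer.read(frame, box)
        if text and conf >= min_conf:  # drop empty or low-confidence reads
            results.append((box, text, conf))
    return results
```

Keeping the two stages behind separate interfaces means the retraining script can fine-tune the recognizer without touching detection, as the proposal describes.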

Your retraining requirement is the real challenge here - most OCR pipelines treat detection and recognition as separate problems, which means you'll need dual feedback loops to improve both bounding-box accuracy and digit classification. If you only correct the final number without flagging whether the detector missed regions or the recognizer misread digits, your model will plateau around 85% accuracy. Before architecting the solution, I need clarity on two things: What's the typical video resolution and frame rate you're working with? A 4K 60fps video generates 216,000 frames per hour, which changes whether we process every frame or sample at intervals. Second, are these numbers static overlays (like burned-in timestamps) or dynamic elements that move across the frame? That determines whether we need motion tracking or can rely on fixed region detection.
Here's the architectural approach:
- OPENCV + TESSERACT OCR: Build a preprocessing pipeline with adaptive thresholding and contour detection to isolate numeric regions, then feed cleaned patches to Tesseract with digit-only whitelist mode to avoid false positives from letters.
- PYTORCH + CRNN MODEL: Train a custom Convolutional Recurrent Neural Network on your corrected samples using CTC loss, which handles variable-length sequences better than fixed-digit classifiers when numbers have inconsistent spacing.
- RETRAINING INTERFACE: Create a JSON annotation file where you mark frame timestamps, bounding boxes, and ground-truth values - then trigger incremental training via a Python script that fine-tunes only the recognition head while freezing the feature extractor to prevent catastrophic forgetting.
- FRAME SAMPLING LOGIC: Implement perceptual hashing to skip duplicate frames (common in screen recordings) and only process when pixel difference exceeds a threshold, reducing processing time by 60-70% without missing number changes.
I've built similar video analysis pipelines for manufacturing QA systems that read instrument panels at 30fps with 94% accuracy after two retraining cycles. The key is separating your detection confidence from recognition confidence in the output CSV so you know whether to add more bounding-box examples or digit samples. Let's schedule a 15-minute call to review your sample video and confirm the number formats before I start development.
£12 GBP in 30 days
5.6
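The CRNN in the proposal above is trained with CTC loss. At inference it emits a score per character class for each horizontal time step. Best-path (greedy) decoding is the simplest way to turn those scores into a string, and beam search is a common, more accurate alternative. The sketch below makes these assumptions:
- The label map is hypothetical: a blank at index 0, then the ten digits, `.`, and `:`.
- Putting the blank at index 0 matches PyTorch's `nn.CTCLoss` default of `blank=0`.

```python
from typing import List, Sequence

# Hypothetical label map: index 0 is the CTC blank, then the ten digits,
# a decimal point and a colon (for clock-style read-outs).
ALPHABET: List[str] = ["<blank>"] + list("0123456789") + [".", ":"]
BLANK = 0


def ctc_greedy_decode(scores: Sequence[Sequence[float]],
                      alphabet: Sequence[str] = ALPHABET, blank: int = BLANK) -> str:
    """Best-path CTC decoding of per-time-step class scores.

    1. Take the highest-scoring class at every time step.
    2. Merge runs of the same class (repeated emissions of one character).
    3. Drop blanks; a blank between two equal classes keeps both characters.
    """
    best = [max(range(len(step)), key=step.__getitem__) for step in scores]
    chars: List[str] = []
    prev = None
    for idx in best:
        if idx != prev and idx != blank:
            chars.append(alphabet[idx])
        prev = idx
    return "".join(chars)
```

Because of rule 3, the network can only output a doubled digit such as "11" if a blank separates the two emissions. This is part of why CTC copes with the inconsistent spacing the proposal mentions.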

Most failures in read-on-screen-number projects come from treating OCR as a single step—compression, motion blur, varied fonts and overlapping UI elements mean you need robust detection, tracking, and a fine-tunable recognizer, not just off‑the‑shelf Tesseract calls. My plan: build a local, frame-by-frame pipeline that (1) detects numeric regions, (2) tracks instances across frames to attach timecodes, (3) recognizes digits with a small trainable model, and (4) exports JSON/CSV with timecode, bbox and confidence. A lightweight CLI/YAML lets you mark corrections; a retrain script consumes those edits and incrementally fine-tunes the recognizer so accuracy improves over time. Recommended stack: Python + OpenCV for video I/O/preprocessing, PyTorch for a CRNN (CTC) recognizer and a CRAFT or EAST detector, a simple IoU-based tracker, and SQLite or CSV for correction storage. Everything runs locally—no cloud calls. Maintenance: the retrain script includes augmentation for new fonts/resolutions and clear setup notes so you can extend datasets without rewriting code. I’ve implemented similar ML pipelines (Practice Tool AI) with local retraining and FastAPI deployment patterns—useful here for iterative testing. If that fits, I’ll prototype on your sample video. Quick question: are the numbers typically fixed overlay positions (e.g., timestamp) or do they move/change location across frames?
£12.50 GBP in 7 days
4.8
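The plan above includes "a simple IoU-based tracker" that links the same on-screen number across frames before timecodes are attached. The sketch below shows one minimal version. Matching is greedy rather than an optimal (Hungarian) assignment. The thresholds `min_iou=0.5` and `max_missed=5` are hypothetical defaults.

```python
from typing import Dict, List, Optional, Tuple

BoxT = Tuple[int, int, int, int]  # x, y, width, height


def iou(a: BoxT, b: BoxT) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    ix = max(0, min(ax + aw, bx + bw) - max(ax, bx))
    iy = max(0, min(ay + ah, by + bh) - max(ay, by))
    inter = ix * iy
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


class IoUTracker:
    """Greedy IoU matching: each detection joins the best-overlapping live track."""

    def __init__(self, min_iou: float = 0.5, max_missed: int = 5):
        self.min_iou = min_iou
        self.max_missed = max_missed
        self.tracks: Dict[int, Tuple[BoxT, int]] = {}  # id -> (last box, frames missed)
        self._next_id = 0

    def update(self, boxes: List[BoxT]) -> List[int]:
        """Assign a track id to every box detected in the current frame."""
        ids: List[int] = []
        unmatched = set(self.tracks)
        for box in boxes:
            best_id: Optional[int] = None
            best_iou = self.min_iou
            for tid in unmatched:
                score = iou(box, self.tracks[tid][0])
                if score >= best_iou:
                    best_id, best_iou = tid, score
            if best_id is None:  # no overlap good enough: start a new track
                best_id = self._next_id
                self._next_id += 1
            else:
                unmatched.discard(best_id)
            self.tracks[best_id] = (box, 0)
            ids.append(best_id)
        for tid in unmatched:  # age tracks that got no detection this frame
            last_box, missed = self.tracks[tid]
            if missed + 1 > self.max_missed:
                del self.tracks[tid]
            else:
                self.tracks[tid] = (last_box, missed + 1)
        return ids
```

Readings that share a track id can then be merged into one timed instance. Fixed overlays such as a burned-in timestamp keep a single id for the whole video.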

As a software developer with a specific passion for building intricate artificial intelligence solutions, I am more than ready to tackle your Offline Video Number Recognition project. My technical background is extensive and diverse, enabling me to masterfully use cutting-edge tools like TensorFlow and PyTorch required for this undertaking. Over the years, I have successfully developed software across multiple domains, consistently proving my ability to adapt swiftly to varying tech stacks. Moreover, my capacity to skillfully implement OpenCV, along with strong skills in the C and C++ programming languages, further enhances my suitability for this task. These proficiencies ensure that the final package I deliver is well-documented, maintainable, and extensible, guaranteeing smooth adaptation of the model to any new video resolutions or number fonts you introduce. To sum up, my experience and commitment set me apart as an ideal fit for your project, capable of constructing an AI-powered solution tailored to your unique requirements. Let us unlock the power of data within your videos together.
£10 GBP in 40 days
6.4

Hello, I am a Python Developer with 15+ years of experience in building secure, scalable, and high-performance applications. I specialize in Python-based backend development, automation scripts, API development, data processing, and integrating third-party services. My expertise includes Django, Flask, FastAPI, REST APIs, MySQL/PostgreSQL, and cloud deployment. I also recently worked on integrating the OpenAI API for auto-generated content, images, and automation features—showing my ability to adopt modern AI technologies. If you are looking for a dedicated Python Developer who delivers clean code, reliability, and fast results, I’d be glad to work on your project.
£10 GBP in 40 days
4.6

Hi, We will build your offline video number recognition tool using PyTorch with a CRAFT text detector and a lightweight CNN for digit classification. All inference runs locally, zero cloud calls. For retraining, we will include a CLI script that accepts your corrected CSV, fine-tunes only the classification head, and logs accuracy per run. This keeps iteration fast (under a minute on CPU for small correction batches). A couple of quick things to confirm: 1) What resolution and frame rate are the sample videos? 2) Do numbers appear in fixed screen regions or move freely across frames? The number quoted here is a starting estimate. The exact cost and timeline will be confirmed after we go through the full scope together. Looking forward to potentially working together. Thanks, Faizan
£12 GBP in 40 days
4.6
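Faizan's retraining script accepts a corrected CSV and logs accuracy per run. Fine-tuning on a handful of edits can easily overfit. Holding part of the corrections out of training gives the per-run accuracy some meaning. The sketch below is minimal and makes these assumptions:
- Row contents and the `runs.jsonl` file name are hypothetical.
- Rows could be loaded with `csv.DictReader`.
- The head-only fine-tuning itself is not shown.

```python
import json
import random
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, str]


def split_corrections(rows: Sequence[Row], val_fraction: float = 0.2,
                      seed: int = 0) -> Tuple[List[Row], List[Row]]:
    """Deterministically shuffle corrected rows and hold out a validation slice."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed: same split on every run
    n_val = max(1, round(len(shuffled) * val_fraction)) if shuffled else 0
    return shuffled[n_val:], shuffled[:n_val]


def run_log_line(run: int, predictions: Sequence[str], labels: Sequence[str]) -> str:
    """One JSON line per retraining run, e.g. to append to a runs.jsonl file."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("need one prediction per (non-empty) label")
    correct = sum(p == t for p, t in zip(predictions, labels))
    return json.dumps({"run": run, "val_size": len(labels),
                       "accuracy": round(correct / len(labels), 4)})
```

With only a few corrections, the validation slice is tiny. The logged accuracy is therefore a rough signal, and pooling it with the fixed labelled sample clip is more reliable.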

Hi, I am a seasoned Applied ML Engineer (6+ years of experience), and I can build this as a fully offline MP4 number-detection + OCR extraction pipeline that reads visible numerical values frame by frame and exports structured results with timecodes.
Proposed approach:
- Processing & OCR: Use OpenCV for intelligent frame sampling and preprocessing, leveraging PaddleOCR/EasyOCR with temporal tracking to extract unique digits and prevent cross-frame duplication.
- Active learning: Implement a lightweight feedback interface allowing users to submit corrections that trigger code-free OCR fine-tuning.
- Deliverables: Expose fully configurable pipelines exporting clean CSV/JSON logs (timestamps, detected values, bounding boxes, and confidence scores).
Relevant experience:
- Strong experience in computer vision, OCR, video analytics, sports/broadcasting analytics, tracking, and structured data extraction from frames.
- Built OCR-heavy systems for ANPR, marathon bib tracking, video event extraction, document/image text parsing, and real-time visual analytics.
- Worked with OpenCV, YOLO, PaddleOCR/PP-OCR, Tesseract, PyTorch, TensorFlow, regex post-processing, frame tracking, and CSV/JSON logging.
- Experience building reproducible local pipelines with CLI tools, configs, validation reports, and retraining scripts.
I will deliver runnable source code, setup instructions, a sample command, exported CSV/JSON results, a correction/retraining workflow, and notes for adapting to new fonts, layouts, and resolutions.
£10 GBP in 40 days
4.3
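The proposal above lists regex post-processing among its tools. The sketch below shows one minimal way to pull numeric tokens out of raw OCR text. Both patterns and the letter-to-digit confusion table are hypothetical choices. The comma is ambiguous: "1,000" can be a thousands separator and "4,5" a decimal. That ambiguity would have to be resolved per display.

```python
import re
from typing import List

# Hypothetical patterns: clock-style times first (12:04 or 12:04:59.250),
# then signed integers/decimals with "." or "," as the decimal mark.
NUMBER_RE = re.compile(r"\d{1,2}(?::\d{2}){1,2}(?:\.\d+)?|[-+]?\d+(?:[.,]\d+)?")

# Letter-for-digit confusions typical of OCR; only safe inside regions already
# known to contain numbers, otherwise "SPEED" would turn into "5PEED".
CONFUSIONS = str.maketrans({"O": "0", "o": "0", "I": "1", "l": "1", "S": "5", "B": "8"})


def extract_numbers(raw: str, numeric_region: bool = False) -> List[str]:
    """Pull numeric tokens out of raw OCR text, optionally fixing look-alike letters."""
    text = raw.translate(CONFUSIONS) if numeric_region else raw
    return NUMBER_RE.findall(text)
```

The time pattern is listed first in the alternation so that "12:04:59" comes out as one token rather than three integers.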

Greetings, I have reviewed your project requirements and I am interested in working on this project. I have strong experience in computer vision, machine learning, neural networks, and AI model training. Using these skills, I can develop an offline solution capable of accurately detecting and reading on-screen numerical data from MP4 recordings frame by frame. The system will be designed to run fully locally without any cloud dependencies.
I can provide either:
- A static solution optimized for your current video format and number styles, or
- A dynamic AI-driven solution that improves over time through additional training and user corrections.
The final solution will include:
- A script or application that processes MP4 files and extracts numerical values with timestamps
- Export functionality in JSON or CSV format
- A retraining mechanism that allows the model to improve accuracy using corrected examples
- Clear setup and usage instructions
I am also ready to build a prototype using your sample videos to demonstrate the system’s accuracy and overall capability before proceeding further. I look forward to working with you.
£13 GBP in 40 days
4.3

Hi there, Capturing and exporting numerical data from MP4 recordings can be fraught with challenges, especially ensuring accuracy across various fonts and formats. An AI-driven solution running offline needs careful handling of video frames and model adaptation. Here's how I can help: I will develop a robust application leveraging TensorFlow or PyTorch to detect and extract numbers, coupled with an easy-to-use retraining interface for continual improvement. This will ensure high accuracy and adaptability. Here are my questions: Could you specify the typical video resolutions you'll be working with? Also, are there any preferred programming languages for the app (C++ or Python)? Let’s discuss your project now!
£15 GBP in 40 days
3.2

Newport, United Kingdom
Member since May 22, 2026