Mock Data Gen with Machine Learning Module - 01/12/2023 02:56 EST

  • Status: Closed
  • Prize: $500
  • Entries Received: 1
  • Winner: td7x

Contest Brief

*Summary*

This is a software engineering contest that leverages machine learning to solve developer experience inconveniences in creating mock data for testing and for demos within the JavaScript ecosystem.

Winning submissions will include a GitHub repo of the software, complete with documentation and CI/CD using GitHub Workflows.

The employer reserves all rights to the software created under this contest but will redistribute the software under an Open Source license. All dependencies must have permissive, OSI-approved licenses, and the software must be runnable offline, with no dependence on an external web service, datastore, or specialized hardware.

*Problem*

Simple faker or charade libraries can be used for mock data in software development, but using them can be labor-intensive because they require a developer to select the correct method and to identify the input parameters for each data field. Developers have enough cognitive overhead and need a fake data solution that can use existing data models/schemas with zero configuration to create the fake data.

*Solution*

A NodeJS module that produces semantically accurate fake data from an arbitrary data model or schema with zero configuration. We are primarily a TypeScript/NodeJS shop and describe the requirements from that perspective, but submissions that are Rust-based and compile to WASM are more than welcome. Runtime portability (in-browser, Bun, Cloudflare, etc.) is preferred, but NodeJS is required.

Data model handlers for GraphQL SDL and JSON Schema are required. Extra preference will be given to submissions with additional handlers for TypeScript type definitions and protobufs.

Various fake data handlers should be supported. Required is a handler that accepts a single field name from the data model and returns semantically correct mock data consistent with the larger data model. Extra preference will be given to submissions with additional handlers that accept a GraphQL request shape (returning a GraphQL response shape) and a handler that does not accept an argument and returns an object for the data model (that could be stringified into JSON).
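The required field handler and the optional no-argument object handler could be typed roughly as follows. This is a minimal sketch, not a proposed API: `DataModel`, `FieldHandler`, `ObjectHandler`, `createFieldHandler`, and `createObjectHandler` are hypothetical names, the flat data model is a simplification of what a GraphQL SDL or JSON Schema handler would produce, and the fixed-value generators stand in for calls to a library such as FakerJs.

```typescript
// Hypothetical, simplified data model: field name -> primitive type name.
// A real implementation would derive this from GraphQL SDL or JSON Schema.
type FieldType = "string" | "number" | "boolean";
type DataModel = Readonly<Record<string, FieldType>>;
type MockValue = string | number | boolean;

// Required handler: accepts one field name, returns mock data for it.
type FieldHandler = (fieldName: string) => MockValue | undefined;
// Optional handler: accepts no argument, returns an object for the whole model.
type ObjectHandler = () => Readonly<Record<string, MockValue>>;

// Placeholder generators keyed by type (assumption: fixed values instead of
// FakerJs/ChanceJs calls, so the sketch stays deterministic and offline).
const generators: Readonly<Record<FieldType, () => MockValue>> = {
  string: () => "lorem",
  number: () => 42,
  boolean: () => true,
};

const hasField = (model: DataModel, name: string): boolean =>
  Object.prototype.hasOwnProperty.call(model, name);

// Unknown field names yield undefined rather than throwing.
const createFieldHandler =
  (model: DataModel): FieldHandler =>
  (fieldName) => {
    const fieldType = hasField(model, fieldName) ? model[fieldName] : undefined;
    return fieldType === undefined ? undefined : generators[fieldType]();
  };

// The object handler is composed from the field handler.
const createObjectHandler =
  (model: DataModel): ObjectHandler =>
  () =>
    Object.keys(model).reduce<Record<string, MockValue>>((acc, name) => {
      const value = createFieldHandler(model)(name);
      return value === undefined ? acc : { ...acc, [name]: value };
    }, {});

// Usage
const exampleModel: DataModel = { title: "string", count: "number", active: "boolean" };
console.log(createFieldHandler(exampleModel)("count"));
console.log(JSON.stringify(createObjectHandler(exampleModel)()));
```

The object handler's return value contains only plain values, so it can be passed straight to `JSON.stringify`, as the brief asks.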

It is expected that this software will utilize existing generators such as FakerJs, ChanceJs, CasualJs and RandExpJs just as other higher level tools do:

- https://github.com/json-schema-faker/json-schema-faker
- https://github.com/MedAli5543/graphql-fake-data-generator
- https://github.com/danibram/mocker-data-generator

Unlike these existing tools, this software will not rely on static coding, which limits a tool to individual basic field types and requires significant configuration for non-basic field types. How we overcome this limit is the crux of what makes this software different. Perhaps NLP string or vector comparisons can be used to select the correct generator function from the field name, with only unmatched requests going to an LLM. LangChain seems like quite an attractive pattern and technology for this.
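One way the string-comparison idea might look is sketched below. Everything here is an assumption made for illustration: the Sorensen-Dice coefficient over character bigrams is just one possible similarity measure, the `registry` keywords and the 0.5 threshold are arbitrary, the fixed-string generators stand in for FakerJs or ChanceJs functions, and `llmFallback` is a stub marking where an LLM call would go (no LangChain API is used).

```typescript
type Generator = () => string;

// Hypothetical registry: keyword -> generator. Real entries would wrap
// FakerJs/ChanceJs functions; fixed strings keep the sketch self-contained.
const registry: Readonly<Record<string, Generator>> = {
  email: () => "jane.doe@example.com",
  "first name": () => "Jane",
  "phone number": () => "555-0100",
  city: () => "Springfield",
};

// "userEmail" and "user_email" both become "user email".
const normalize = (name: string): string =>
  name
    .replace(/([a-z0-9])([A-Z])/g, "$1 $2")
    .replace(/[_\-.]+/g, " ")
    .toLowerCase()
    .trim();

// Character bigrams, e.g. "city" -> ["ci", "it", "ty"].
const bigrams = (text: string): readonly string[] =>
  Array.from({ length: Math.max(text.length - 1, 0) }, (_, i) => text.slice(i, i + 2));

const unique = (items: readonly string[]): readonly string[] =>
  items.filter((item, index) => items.indexOf(item) === index);

// Sorensen-Dice coefficient over unique bigrams: 0 (disjoint) to 1 (identical).
const dice = (a: string, b: string): number => {
  const gramsA = unique(bigrams(a));
  const gramsB = unique(bigrams(b));
  const shared = gramsA.filter((gram) => gramsB.indexOf(gram) !== -1).length;
  const total = gramsA.length + gramsB.length;
  return total === 0 ? 0 : (2 * shared) / total;
};

type Match = { readonly keyword: string; readonly score: number };

// Score every registry keyword against the field name and keep the best.
const bestMatch = (fieldName: string): Match =>
  Object.keys(registry)
    .map((keyword) => ({ keyword, score: dice(normalize(fieldName), keyword) }))
    .reduce((best, next) => (next.score > best.score ? next : best), { keyword: "", score: 0 });

// Stub: this is where an unmatched field name would be sent to an LLM.
const llmFallback: Generator = () => "unmatched";

const selectGenerator = (fieldName: string, threshold = 0.5): Generator => {
  const { keyword, score } = bestMatch(fieldName);
  const matched = registry[keyword];
  return score >= threshold && matched !== undefined ? matched : llmFallback;
};

// Usage
console.log(selectGenerator("userEmail")());
console.log(selectGenerator("favoriteColor")());
```

In this sketch `userEmail` scores about 0.62 against the `email` keyword and is matched locally, while `favoriteColor` scores below the threshold for every keyword and is routed to the fallback. A vector-embedding comparison could replace `dice` without changing the surrounding functions.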



*Code Standards*

Code will be written in strict TypeScript with strong typing and be compatible with Bun, Deno, and NodeJS. Code will be "Clean" and robust. OOP patterns are to be avoided in favor of "strategic" use of functional programming. eslint-plugin-functional/recommended is great; using additional FP libraries such as fp-ts or Ramda is not required. In general:
- Small composable functions.
- No nested code.
- Avoid if statements. Branches are only ok in the simplest and unavoidable use cases. Simple clean ternaries are fine.
- Along with avoiding branching, absolutely no try/catch.
- Never throw.
- No control loops.
- No unbounded iterators.
- Use maps rather than a switch or if/else.
- Functions should be small, pure, and composable.
- Separate configuration from code.
- Use arrow function syntax.
- Avoid async/await as one can accidentally block the event loop.
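Several of the rules above can be seen together in a small sketch. The `Result` type, the `formatters` map, and every function name below are hypothetical and chosen only for illustration; the point is the shape: a lookup map instead of a switch, an error value instead of `throw`, a single ternary instead of nested branches, and `map` instead of a loop.

```typescript
// Failure is an ordinary value, so nothing throws and nothing needs try/catch.
type Success<T> = { readonly ok: true; readonly value: T };
type Failure = { readonly ok: false; readonly error: string };
type Result<T> = Success<T> | Failure;

const ok = <T>(value: T): Success<T> => ({ ok: true, value });
const err = (error: string): Failure => ({ ok: false, error });

type Format = "upper" | "lower" | "title";

// A lookup map replaces a switch or an if/else chain.
const formatters: Readonly<Record<Format, (text: string) => string>> = {
  upper: (text) => text.toUpperCase(),
  lower: (text) => text.toLowerCase(),
  title: (text) => text.charAt(0).toUpperCase() + text.slice(1).toLowerCase(),
};

const isFormat = (name: string): name is Format =>
  Object.prototype.hasOwnProperty.call(formatters, name);

// One simple ternary: unknown formats become an error value.
const format = (name: string, text: string): Result<string> =>
  isFormat(name) ? ok(formatters[name](text)) : err(`unknown format: ${name}`);

// map over a finite array instead of a control loop.
const formatAll = (name: string, texts: readonly string[]): readonly Result<string>[] =>
  texts.map((text) => format(name, text));

// Usage
console.log(format("upper", "abc"));
console.log(format("reverse", "abc"));
console.log(formatAll("title", ["hELLO", "wORLD"]));
```

Because `format` returns a `Result`, callers decide how to handle an unknown format by inspecting `ok`, which keeps every function pure and composable.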

*Testing*

Fine-grained testing of LangChain does not seem completely straightforward, but its testability is currently improving, and the LangSmith debugger should probably be used. Code should be decoupled so that mocks can be avoided. Vitest or Jest should be used with fast-check as well as static assertions. Strict TDD is preferred but not required. Writing tests throughout development, and not at the end, is required. The important thing is that testable code is cleaner, simpler, and more robust. Tested code is easier to change.

The test suite should also prove the software works.
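As an illustration of decoupling so that mocks can be avoided, the sketch below passes the source of randomness in as a parameter instead of reading a global. The names `Random`, `pickFrom`, `mockCity`, and the `cities` list are hypothetical. A test can then supply a fixed function, with no mocking library involved; a property-based tool such as fast-check could generate the input values in the same way.

```typescript
// A source of randomness: returns a value in [0, 1), like Math.random.
type Random = () => number;

// The random source is a parameter, so the result is fully determined by
// the inputs. An empty list yields undefined rather than throwing.
const pickFrom =
  (random: Random) =>
  (items: readonly string[]): string | undefined =>
    items[Math.floor(random() * items.length)];

const cities = ["Oslo", "Lima", "Pune"] as const;

const mockCity = (random: Random): string | undefined => pickFrom(random)(cities);

// Production wiring passes Math.random; a test passes a fixed function.
console.log(mockCity(Math.random));
console.log(mockCity(() => 0));
```

With `() => 0` the first city is always selected, so the assertion in a test is exact rather than probabilistic.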

Public Clarification Board

  • farhankha4548
    • 2 months ago

    I have read your code and updated the full functions, but you have awarded someone else.

  • tokibul2
    • 2 months ago

    Hi,
    Do you know freelancer.com? Also, do you know they are scammers?

    I earned 1000 GBP and 200+ USD by providing my services on this platform. But when I requested a payment withdrawal, they closed my account and blocked me, and I couldn't chat or create a ticket.

    So I created this account to help me get my account balance into my bank account.

    What do you think about this scammer (freelancer.com) giving me my earnings in my account?

    [ They will just block this account, because their only way of earning is taking hard-earned payments from poor freelancers. In my words, they are beggars. ]

    Check these screenshots for more: https://drive.google.com/drive/folders/1tKtg5TC4-_6q_uG73rHNmUNhezqkRiaC?usp=sharing

  • farhankha4548
    • 2 months ago

    I am working in Rust to provide you with a better solution, and I will also show you a demo video.

  • farhankha4548
    • 2 months ago

    Hello sir, would you like this in Node.js or Rust?
    Which do you prefer?
    I can also provide it in Rust if you want.

    1. dutco7
      Contest Holder
      • 2 months ago

      A Rust solution would be great. It just needs to be able to run in Bun and on Cloudflare. A WASI direction could be good, and wasm-pack could help.

      https://github.com/scrippt-tech/orca and https://github.com/huggingface/candle are quite interesting.

  • dataexpert18
    • 2 months ago

    Can you explain which data you want to apply machine learning to and what outcome you expect from it?

    1. dutco7
      Contest Holder
      • 2 months ago

      Hello Zafar, I'm not sure how to explain it better than in the description. The generative model may need to use the data model for fine-tuning, or perhaps zero-shot would work. The mock data gen function will accept a field name and return the semantically correct, generated data.
