R
Link preview
Is it allowed to use OpenAI API outputs to create a silver code dataset or benchmark for a specific Python library? [d]
Hello everyone, Is it allowed to use OpenAI API outputs to create a silver code dataset or benchmark for a specific Python library? I am working on a project idea related to library-specific code generation. The concrete case is a specific Python library used in a technical/scientific domain. The goal would be to improve and evaluate how well code-generation models can use this library correctly. I am trying to understand the legal / Terms of Service boundary around using OpenAI API outputs in two different scenarios: Scenario 1: Silver dataset for fine-tuning an OSS model Use the OpenAI API to generate programming tasks, reference solutions, and verification tests for the specific Python library. Then human-review, filter, and validate the generated examples. Then use this silver dataset to fine-tune an open-source code model, with the goal of improving its performance on this specific library. My question: would this violate OpenAI’s terms because the API outputs are being used to train/fine-tune another coding model, even if the scope is narrow and library-specific? Scenario 2: Benchmark only, not training Use the OpenAI API to generate programming tasks, reference solutions, and verification tests. Human-review and validate them. Then use the resulting dataset only as an evaluation benchmark to compare different models. The benchmark would not be used to fine-tune or train any model. My question: is this generally considered allowed under OpenAI’s terms, assuming the benchmark is properly reviewed and documented as AI-assisted? I understand that Reddit is not legal advice, and I would still contact OpenAI or legal counsel for a definitive answer. However, I thought new ideas could come up from people who have already faced similar situations in practice. submitted by /u/ororo88 [link] [Kommentare] reddit.com · reddit.com ↗
Hello everyone, Is it allowed to use OpenAI API outputs to create a silver code dataset or benchmark for a specific Python library? I am working on a project idea related to library-specific code generation. The concrete case is a specific Python library used in a technical/scientific domain. The goal would be to improve and evaluate how well code-generation models can use this library correctly. I am trying to understand the legal / Terms of Service boundary around using OpenAI API outputs in two different scenarios: Scenario 1: Silver dataset for fine-tuning an OSS model Use the OpenAI API to generate programming tasks, reference solutions, and verification tests for the specific Python library. Then human-review, filter, and validate the generated examples. Then use this silver dataset to fine-tune an open-source code model, with the goal of improving its performance on this specific library. My question: would this violate OpenAI’s terms because the API outputs are being used to train/fine-tune another coding model, even if the scope is narrow and library-specific? Scenario 2: Benchmark only, not training Use the OpenAI API to generate programming tasks, reference solutions, and verification tests. Human-review and validate them. Then use the resulting dataset only as an evaluation benchmark to compare different models. The benchmark would not be used to fine-tune or train any model. My question: is this generally considered allowed under OpenAI’s terms, assuming the benchmark is properly reviewed and documented as AI-assisted? I understand that Reddit is not legal advice, and I would still contact OpenAI or legal counsel for a definitive answer. However, I thought new ideas could come up from people who have already faced similar situations in practice. submitted by /u/ororo88 [link] [Kommentare]
Comments