OpenAI Might Have Used Millions Of YouTube Videos To Train Its AI Model: All Details

OpenAI might have used more than a million hours of transcribed YouTube video to train GPT-4, its latest artificial intelligence (AI) model, a report claims. The report further states that the ChatGPT maker turned to YouTube because it had exhausted its supply of text data for training its AI models. The allegation, if true, could create new problems for the AI firm, which is already fighting multiple lawsuits over its use of copyrighted data. Notably, a report last month highlighted that its GPT Store contained mini chatbots that violated the company's guidelines.

In a report, The New York Times claimed that after running out of sources of unique text to train its AI models, the company developed an automatic speech recognition tool called Whisper, used it to transcribe YouTube videos, and trained its models on the resulting data. OpenAI launched Whisper publicly in September 2022, and the AI firm said it was trained on 680,000 hours of “multilingual and multitask supervised data collected from the web”.

The report further alleges, citing unnamed sources familiar with the matter, that OpenAI employees discussed whether using YouTube's data could breach the platform's guidelines and land them in legal trouble. Notably, Google prohibits the use of videos for applications that are independent of the platform.