A few years ago, open-source artificial intelligence (AI) models dominated conversations about AI. The vision of a democratized and collaborative AI ecosystem seemed bright. Fast forward to today, and the contributions of open-source models are overshadowed and undervalued. Big tech players are monopolizing the AI industry, replaying a story the technology industry knows all too well.

Big tech controls AI development by leveraging existing advantages surrounding access to data, compute power, and the talent needed to run large-scale AI models. While open-source AI models make important contributions, the changing regulatory and industry environment threatens and undervalues the critical role open-source models play in the industry. Conversations on AI have shifted from fostering innovation to addressing safety, controlling risk, and regulating AI. As governments around the world grapple with how to regulate AI, open-source models must be re-centred in this discussion to grow a competitive, transparent, and innovative AI industry.

Why open-source AI is critical

Supporting open-source AI is key to creating a thriving AI ecosystem. Open-source models ensure more people have the opportunity to use AI and get involved with its development, without relying on a few big tech companies. This creates a more dynamic and competitive market, with several overall benefits.

Transparency and Accountability: Open code allows others to inspect and audit AI models for biases and errors. For example, BLOOM is a multilingual model collaboratively developed by over 1,000 researchers worldwide. This open approach has allowed several researchers to improve BLOOM over time.

Cost and Flexibility: Open-source models cut costs for organizations looking to apply AI to their business. Businesses can also customize these models to meet specific needs, which is especially beneficial for small and medium-sized enterprises looking for affordable AI integration.

Safety, Security and Privacy: Open-source AI promotes safety and security because more people are reviewing the technology to fix biases, vulnerabilities, and other flaws. Organizations can also maintain control over data by hosting their own version of an open model. Using closed models often requires companies to send sensitive data to third-party servers, which increases the risk of data breaches.

Subtle shifts from open to closed AI

Conversations about AI often use terms like open-source models, open models, and open weights interchangeably, without defining them. Open-source AI refers to models whose code and training data are openly available. AI model code shapes the entire model by processing data, defining the model’s architecture, training and evaluating the model, and facilitating deployment for use. Others can use, modify, and improve the AI model when the code is available.

Open models can vary in their degree of openness. A model’s openness is influenced by factors like access to training data, availability of documentation on data collection and processing, and transparency about model design and testing. The Open Source Initiative recently released a definition of open-source AI. Under this new definition, an open-source AI is a system that allows users to freely use, study, modify, and share the system without permission. An AI system qualifies as “open” if it is made available in the preferred form for making modifications, which covers the training data, source code, and model parameters. While the definition of open-source AI is still debated, this definition provides a solid basis for evaluating a model’s openness.

Meta’s LLaMA and BLOOM show varying degrees of openness. Meta discloses its LLaMA model weights, but LLaMA does not qualify as a truly open-source model under the Open Source Initiative’s definition because Meta imposes restrictions on commercial use. BLOOM is lauded as an example of true open-source AI because every element of the model is freely accessible for inspection and further improvement.

Closed-source models, like OpenAI’s ChatGPT or Google’s PaLM, are proprietary and controlled by their developers. Their source code is not made available for others to use or audit.

The shift from open to closed AI models is already underway. For example, OpenAI previously published information on its earlier GPT models. However, when launching GPT-4, OpenAI announced it would not publish any details about the “architecture, hardware, training compute, dataset construction, training method, or similar,” citing the competitive AI landscape and safety implications.

The future for open-source AI

The rising dominance of large, closed models has coincided with AI regulation, which threatens to curtail the vital contributions open-source models are making to the ecosystem. Governments must balance the unique role of open-source models in driving innovation with regulatory compliance burdens. If compliance becomes too burdensome, especially for small developers, it could stifle AI innovation.

New AI regulations are already struggling to strike this balance. The European Union’s AI Act provides limited exceptions for open-source models, which will make it difficult for individual or small developers to meet the Act’s compliance burden and bring their models to the EU. This time last year, President Biden’s executive order on AI regulation sparked debate among open-source advocates. The order defines “dual-use models” as AI models trained on broad data which “generally use self-supervision.” Organizations like Y Combinator argued these overly broad definitions will capture most of the AI industry and place undue restrictions on small businesses and open-source models. As governments worldwide contemplate AI regulation, finding the right balance will be crucial to preserving an innovative and competitive AI industry.

Open-source models have already made, and continue to make, powerful contributions to the AI ecosystem. In this critical moment for AI development, policymakers must recognize and protect those contributions. The open-source movement is about democratizing the future of technology. We need to keep this vision for open-source AI alive and prevent it from fading in a closed, monopolized AI market.