The ChatGPT API can get expensive, especially if you’re using GPT-4. Did you know there’s an open-source drop-in alternative? We’re talking about LocalAI.
What’s the catch, I hear you say? The catch is that you need to run your own language model. Fear not, it’s easier than it seems: there are literally thousands of open-source models you can download and use directly.
Why Choose LocalAI?
LocalAI might be a good alternative if you’re facing any of these challenges:
- Cost Efficiency: When scaling up operations, using your own local language model with LocalAI can become more cost-effective than relying on cloud services.
- Privacy: Your data doesn’t leave your system, offering a higher level of privacy and security.
- Customization: It allows you to train or fine-tune your own models, giving you the flexibility to tailor the AI to your specific needs.
- Open Source Experimentation: It supports a wide range of open-source models while staying compatible with the OpenAI API, so you can experiment without rewriting existing projects.
Setting Up LocalAI
First, download a model from Hugging Face and copy it into a directory called models. If you are not familiar with how Hugging Face works, check out this post first: 6 Ways For Running A Local LLM.
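As a minimal sketch, the setup boils down to creating the directory and dropping a GGUF file into it. The repository and filename below are placeholders, not a specific recommendation; substitute any model you picked on Hugging Face:
$ mkdir -p models
$ wget -O models/llama2-13-2q-chat.gguf "https://huggingface.co/<org>/<repo>/resolve/main/<model-file>.gguf"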
Then, to run with Docker:
$ docker run -p 8080:8080 -v $PWD/models:/models -ti --rm quay.io/go-skynet/local-ai:latest --models-path /models --context-size 700 --threads 4
If you are on Apple Silicon, it’s best to build your own binary instead of using Docker. Once built, you can start the API server with:
$ ./local-ai --models-path=./models/ --debug=true
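Building the binary itself is roughly a clone-and-make affair. This sketch assumes the project’s standard make build target; check the LocalAI README for the current prerequisites on your platform:
$ git clone https://github.com/go-skynet/LocalAI
$ cd LocalAI
$ make build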
Now you can test the API service by opening a new terminal and using curl:
$ curl http://localhost:8080/v1/completions -H "Content-Type: application/json" -d '{
    "model": "llama2-13-2q-chat.gguf",
    "prompt": "A long time ago in a galaxy far, far away",
    "temperature": 0.7
  }'
Replace the model value with the filename of the model you downloaded from Hugging Face.
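If everything is working, the server replies with an OpenAI-style completion object. The exact text and token counts below are illustrative; yours will differ:
{
  "object": "text_completion",
  "model": "llama2-13-2q-chat.gguf",
  "choices": [{"index": 0, "finish_reason": "stop", "text": ", there lived a young Jedi..."}],
  "usage": {"prompt_tokens": 12, "completion_tokens": 48, "total_tokens": 60}
}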
Integrating a Local LLM into Your Projects
LocalAI shines when it comes to replacing existing OpenAI API calls in your code. You can easily switch the URL endpoint to LocalAI and run various operations, from simple completions to more complex tasks.
For example, the following code sends a completion request to the local API server using the official OpenAI library. We only have to change two things for it to work with LocalAI:
- openai.base_url: replaces the OpenAI endpoint with your own LocalAI instance.
- openai.api_key: should be set to a placeholder value, otherwise the call fails. You don’t need a valid OpenAI API key to use LocalAI.
import openai

# Point the library at the local server instead of api.openai.com
openai.base_url = "http://localhost:8080"
# The API key only needs to be non-empty; LocalAI does not validate it
openai.api_key = "sk-XXXXXXXXXXXXXXXXXXXX"

completion = openai.chat.completions.create(
    model="llama2-13-2q-chat.gguf",
    messages=[
        {
            "role": "user",
            "content": "How do I output all files in a directory using Python?",
        },
    ],
)

print(completion.choices[0].message.content)
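Depending on your version of the openai library, you may prefer the explicit client-object style introduced in openai 1.x. A minimal sketch, assuming the same LocalAI instance and model file as above:
from openai import OpenAI

# Same idea: local endpoint, placeholder key
client = OpenAI(base_url="http://localhost:8080", api_key="sk-XXXXXXXXXXXXXXXXXXXX")

completion = client.chat.completions.create(
    model="llama2-13-2q-chat.gguf",
    messages=[{"role": "user", "content": "Summarize what LocalAI does in one sentence."}],
)
print(completion.choices[0].message.content)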
Model Gallery
LocalAI also supports a feature called the model gallery. You can define the language models you want to serve by setting the PRELOAD_MODELS environment variable. For example, the following export replaces gpt-3.5-turbo with the basic GPT4All-J model:
$ export PRELOAD_MODELS='[{"url": "github:go-skynet/model-gallery/gpt4all-j.yaml", "name": "gpt-3.5-turbo"}]'
LocalAI will advertise the model under that name, letting you replace OpenAI models with any model you want. When we start LocalAI with this variable defined, the API server automatically downloads and caches the model file.
You can browse the model gallery to see all the available models.
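Putting it together, a sketch of starting the Docker container with the gallery preload (the same flags as before, plus the environment variable) and then calling the aliased name might look like this:
$ docker run -p 8080:8080 -v $PWD/models:/models -e PRELOAD_MODELS='[{"url": "github:go-skynet/model-gallery/gpt4all-j.yaml", "name": "gpt-3.5-turbo"}]' -ti --rm quay.io/go-skynet/local-ai:latest --models-path /models --context-size 700 --threads 4
$ curl http://localhost:8080/v1/chat/completions -H "Content-Type: application/json" -d '{
    "model": "gpt-3.5-turbo",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'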
Conclusion
Whether you’re a hobbyist or a developer, LocalAI offers a versatile, cost-effective, and privacy-conscious alternative to cloud-based AI APIs. It’s suitable for experimenting with language models and image generation or for integrating AI into your projects without relying on third-party APIs.
Thank you for reading and happy building!
Originally published at https://semaphoreci.com on December 19, 2023.