Getting into coding with LLMs; Nebius onboarding is lovely

(best-by date notice: given the pace of the field, just about everything below will probably be woefully out of date by 2025-03. Use at your own risk.)

TL;DR: rambling about LLMs, ending up with a very pleasant experience with Nebius.

Background

or, how I ended up looking for model inference hosting

Over the holidays I started getting serious about actually using LLMs for coding. Both the models and the infrastructure have matured to a point where the focus can shift from “doing something” to “making real use”. (I.e. we’re no longer in the “dancing bear” situation: “the wonder is not how well the bear dances, but that he dances at all”).

The easiest way for me to get started was the Zed editor - now that it has vim keybindings, it feels reasonably usable. The AI integration with Claude 3.5 Sonnet, where you construct the context interactively and have the model propose or apply edits inline, is quite something.

Of course, after that I wanted to experiment with using LLMs more programmatically (as part of an editor they are a multiplier, but I still have to be driving the process).

Programming with LLMs

or, too much to fit in this blog post. Maybe there will be another one some day :)

DSPy (interesting idea), SAMMO (looks very interesting, haven't had a chance to try it yet)
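
To give a flavour of what "programmatically" means here, a minimal sketch in the DSPy style (against the DSPy 2.5-era API; the model string and API key are placeholders): you declare a signature instead of hand-writing a prompt.

    import dspy

    # Configure the underlying LM; the model string and key are placeholders.
    lm = dspy.LM("openai/gpt-4o-mini", api_key="YOUR_API_KEY")
    dspy.configure(lm=lm)

    # Declare the task as a signature ("question -> answer") instead of
    # hand-writing a prompt; DSPy builds and manages the prompt for you.
    qa = dspy.Predict("question -> answer")
    print(qa(question="What is a dancing bear?").answer)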

How to use an LLM as a component

or, inference for fun and profit

Architecture and prompting aside, on the nuts-and-bolts level it all comes down to:

  • performance
  • control
  • privacy
  • price (for the expected usage pattern)
  • easy start-up

Performance

To be useful, inference has to be fast.

My M4 MacBook is just not going to handle the larger models I'd like to run, and neither are consumer-grade GPUs.

In practice, this means using some external hosting (I can’t justify the investment for a dedicated system).

Control

Let’s just say I’m allergic to building on top of systems I have no control over. Having an AI model as a black box behind an API is not going to cut it for me - there are reports of model behaviour changing in unpredictable ways over time.

Luckily, open-weights models seem to be catching up, so hosting a specific open-weights model seems like a good bet - and it keeps finetuning and model surgery open for the future. (It's not so much about operating on the model right now; it's about having a path there.)

Privacy

I’m also allergic to allowing my queries to be recorded.

At the very least I want an EU hosting company (thanks to the GDPR, they have strong incentives to keep the data private).

Price

My expected usage is, for the time being, sporadic and unpredictable. I don't want to pay for a full instance that mostly sits idle.

This limits me to either using an existing inference-as-a-service API or some service where I can have a container spring up whenever I start using it.

Easy start-up winner: Nebius

Here we have a clear winner that also does well on most of the criteria above. Nebius, I have to applaud your onboarding UI:

  1. sign in with your Google account

There is no step 2. Doing this:

  • gets you $1 of free credit (so far I've used about one cent of it; $1 goes surprisingly far)

  • gets access to the model playground where you can

    • see which models they host for inference
    • chat with any of those models
  • allows you to create an API key for calling those models from DSPy (see the sketch below)

Now, this is what onboarding should look like. No friction, no hassle, no confirmation emails, no “create password”.
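
To show the wiring, here's a sketch of pointing DSPy at Nebius through its OpenAI-compatible endpoint. The base URL and model name below are examples - copy the exact values from the playground, since they may differ by the time you read this. (The "openai/" prefix tells DSPy, via LiteLLM, to treat the endpoint as OpenAI-compatible.)

    import os
    import dspy

    # Base URL and model name are examples - take the real values
    # from the Nebius playground.
    lm = dspy.LM(
        "openai/meta-llama/Meta-Llama-3.1-70B-Instruct",
        api_base="https://api.studio.nebius.ai/v1/",
        api_key=os.environ["NEBIUS_API_KEY"],
    )
    dspy.configure(lm=lm)

    summarize = dspy.Predict("text -> summary")
    print(summarize(text="Onboarding: sign in with Google. There is no step 2.").summary)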

Where do we go from here?

The only thing this setup doesn't allow is custom models (finetuned or otherwise). For that you need a GPU instance, and for my usage pattern that means looking at services which can boot VMs quickly (RunPod, Fly.io). But that setup is going to be much more of a hassle.

If you know of a better option, please let me know in the comments.



