Claiming "completely" is mapping a boolean to a float.
If you tell an LLM (with tools) to do a web search, it usually does a web search. The biggest issue right now is more at the scale of: if you tell it to create turn-by-turn directions to navigate across a city, it might create a python script that does this perfectly with OpenStreetMap data, or it may attempt to use its own intuition and get lost in a cul-de-sac.
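The deterministic half of that contrast is easy to make concrete. A minimal sketch of the kind of script described above, using Dijkstra's algorithm over a toy graph (hypothetical intersections and travel times, not real OpenStreetMap data):

```python
import heapq

# Toy road graph: hypothetical intersections with travel times in
# minutes. A real script would build this from OpenStreetMap data.
graph = {
    "A": {"B": 4, "C": 2},
    "B": {"D": 5},
    "C": {"B": 1, "D": 8},
    "D": {},
}

def shortest_route(graph, start, goal):
    """Dijkstra's algorithm: deterministic turn-by-turn routing."""
    queue = [(0, start, [start])]
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, weight in graph[node].items():
            if nxt not in seen:
                heapq.heappush(queue, (cost + weight, nxt, path + [nxt]))
    return None

# Same inputs, same route, every single run.
print(shortest_route(graph, "A", "D"))  # → (8, ['A', 'C', 'B', 'D'])
```

The point of the contrast: the script is deterministic and verifiable once written, whereas the model's "intuition" route is neither.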
Wow. It can do a web search. And that is useful in the context of programming how? Or in any context?
The question is about the result of an action. Given the same problem statement in the same codebase, it will produce wildly different results, even when prompted twice in a row.
Even for trivial tasks the output may vary between a simple fix and a rewrite of half the codebase. You can never predict or replicate the output.
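The mechanism behind that variance is plain: hosted models typically sample their output with a temperature above zero, so identical prompts draw different continuations. A toy sketch of the idea (the response distribution here is invented for illustration; a real model derives its probabilities from the prompt context):

```python
import random

# Hypothetical distribution over responses to one fixed prompt.
responses = ["apply a one-line fix", "refactor the module", "rewrite half the codebase"]
weights = [0.6, 0.3, 0.1]

def sample_response(seed=None):
    """Draw one response, as a temperature > 0 sampler would."""
    rng = random.Random(seed)
    return rng.choices(responses, weights=weights, k=1)[0]

# Unseeded: the same "prompt" can yield different outputs run to run.
print({sample_response() for _ in range(20)})

# Reproducibility returns only when the sampler is explicitly seeded,
# which most hosted APIs do not guarantee end to end.
assert sample_response(seed=42) == sample_response(seed=42)
```

This is why "replicate the output" fails in practice: the randomness is a deliberate part of how the text is generated, not a bug you can prompt away.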
To quote Douglas Adams, "The ships hung in the sky in much the same way that bricks don't". Cars and table saws operate in much the same way that LLMs don't.
> Wow. It can do a web search. And that is useful in the context of programming how? Or in any context?
Your own example was turning a steering wheel.
A web search is as relevant to the broader problems LLMs are good at, as steering wheels are to cars.
> Given the same problem statement in the same codebase it will produce wildly different results even if prompted two times in a row.
Do you always drive the same route, every day, without alteration?
Does it matter?
> You can never predict or replicate the output.
Sure you can. It's just less like predicting what a calculator will show and more like predicting if, when playing catch, the other player will catch your throw.
You can learn how to deal with reality even when randomness is present, and in fact this is something we're better at than the machines.
The original example was trying to compare LLMs to cars and table saws.
> Do you always drive the same route, every day, without alteration?
I'm not the one comparing operating machinery (cars, table saws) to LLMs. Again: if I turn a steering wheel in a car, the car turns. If I input the same prompt into an LLM, it will produce different results at different times.
Lol. Even "driving a route" is probably 99% deterministic unlike LLMs. If I follow a sign saying "turn left", I will not end up in a "You are absolutely right, there shouldn't be a cliff at this location" situation.
Edit: and when signs end up pointing to a cliff, or when a child runs onto the road in front of you, these are called emergency situations. For an LLM, though, the emergency situation is the only available modus operandi, and actually following instructions is a lucky happenstance.
> It's just less like predicting what a calculator will show and more like predicting if, when playing catch, the other player will catch your throw
If you think that piling ever more bad comparisons into the conversation somehow proves your point, let me disabuse you of that notion: it doesn't.