
What's the issue with 'delve'? It's hardly Old English


>What's the issue with 'delve'? It's hardly Old English

It's used a ton by LLMs for some reason despite being rarely used by real people. I think it's mostly a byproduct of LLM training data overrepresenting certain published works relative to casual communication.

There does seem to be something else going on with "delve" specifically, though. One of the other comments mentions that "delve" isn't used in the specific training data for this, so it's odd to see it in the output. I wonder if it's because "delve" has the secondary definitions "to make a careful or detailed search for information" and "to examine a subject in detail", which cause the LLM to use it to make its answers seem more thorough.


> It's used a ton by LLMs for some reason despite being rarely used by real people.

The popular theory is that it's due to English-language RLHF tasks being outsourced to Nigeria, where "delve" is used relatively often.

https://simonwillison.net/2024/Apr/18/delve/


I keep seeing that proposed as a theory, but are companies actually outsourcing this work to Nigeria or is that just an assumption someone made because they know delve is more commonly used there? A lot of these posts can't even decide which African country to blame it on. I've heard Kenya and Nigeria and just 'Africa' in general.


It's "lexically overrepresented". https://arxiv.org/abs/2412.11385


The standard narrative is that chatbot training efforts were all outsourced to the poorest regions of Africa with the best English fluency, and it turned out they (both their child LLMs and the poor guys who trained them) use the word "delve" a lot.


I don't know anything about Africa, but I assumed it's because dictionaries define it to mean "to make a careful or detailed search for information" and "to examine a subject in detail" and LLMs seem to latch on to using it for that for some reason.

If anything, I'd assume the chatbots would use Indian English phrases like "do the needful" and those odd phrases that only make sense in Hindi but get carried over into English.



