I am in academia and worked in NLP although I would describe myself as NLP adjacent.

I can confirm LLMs have essentially confined a good chunk of historical research into the bin. I suspect there are probably still a few PhD students working on traditional methods knowing full well a layman can do better using the mobile ChatGPT app.

That said, traditional NLP has its uses.

Using the VADER model for sentiment analysis, while flawed, is vastly cheaper than using LLMs to get a general idea. Traditional NLP is suitable for many tasks that people are now spending a lot of money asking GPT to do just because they know GPT.

I recently did an analysis on a large corpus, and VADER was essentially free while the cloud cost to run a Llama-based sentiment model was about $1000. I ran both because VADER costs nothing but minimal CPU time.

NLP can be wrong but it can’t be jailbroken and it won’t make stuff up.



That's because VADER is just a dictionary mapping each word to a single sentiment weight and adding it up with some basic logic for negations and such. There's an ocean of smaller NLP ML between that naive approach and LLMs. LLMs are trained to do everything. If all you need is a model trained to do sentiment analysis, using VADER over something like DistilBERT is NLP malpractice in 2025.
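
To make the mechanism concrete, here is a minimal sketch of that kind of lexicon scoring. It is not the real VADER implementation: the lexicon entries, the weights, the negation list, and the two-token negation window are all invented for illustration, and `lexicon_score` is a hypothetical name.

```python
import re

# Hypothetical toy lexicon: words and weights are invented for illustration
# and are not taken from the real VADER lexicon.
LEXICON = {"great": 3.0, "good": 2.0, "love": 3.0,
           "bad": -2.5, "terrible": -3.0, "slow": -1.0}
NEGATIONS = {"not", "no", "never", "isn't", "wasn't", "don't"}


def lexicon_score(text: str) -> float:
    """Sum per-word sentiment weights, flipping the sign after a nearby negation."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = 0.0
    for i, tok in enumerate(tokens):
        weight = LEXICON.get(tok)
        if weight is None:
            continue  # words outside the lexicon contribute nothing
        # Assumed rule: a negation within the two preceding tokens flips the sign.
        if any(prev in NEGATIONS for prev in tokens[max(0, i - 2):i]):
            weight = -weight
        total += weight
    return total


print(lexicon_score("The food was great"))
print(lexicon_score("The food was not good"))
print(lexicon_score("Oh great, the app crashed again"))
```

The last call shows the weakness: the sarcastic sentence scores as positive because "great" is the only word the toy lexicon knows.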


> using VADER over something like DistilBERT is NLP malpractice in 2025.

Ouch. Was that necessary?

I used $1000 worth of GPU credits and threw in VADER because it’s basically free both in time and credits.

I usually do this on large datasets out of pure interest in how it correlates with expensive methods on English language text.
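
A comparison like that boils down to scoring the same documents with both methods and computing a correlation. A rough sketch, assuming per-document scores are already available as two lists. The numbers are made-up placeholders, not results from the analysis described here, and `pearson` is a hypothetical helper:

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


# Placeholder scores for five documents (invented for illustration).
cheap_scores = [0.8, -0.4, 0.1, -0.9, 0.5]      # e.g. a lexicon method
expensive_scores = [0.9, -0.2, -0.1, -0.7, 0.6]  # e.g. a large model

print(round(pearson(cheap_scores, expensive_scores), 3))
```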

I am well aware of how VADER works and its limitations, I am also aware of the limitations of all sentiment analysis.


Sorry, I side with GP. Even if you don't want to use Llama/GPT because of cost, the middle ground of DistilBERT etc. (which can run on a single CPU) is a much more sensible cost/benefit tradeoff than VADER's decade-old lexicon-based approach.

I can't really think of many NLP things that are a decade old and don't have a better / faster / cheaper alternative.


I must have explained myself extremely poorly. I spent a fair bit of money (~$1,000 USD) running a near-SOTA fine-tuned Llama model on cloud GPUs for this very particular task.


I think people do understand, but they think that your argument on price/performance uses two data points that are both far from a perceived better third option.

It's like saying I chose barefoot walking to get to the next town and while admittedly it was a painful and unpleasant experience, it was free. I did try a helicopter service but that was very expensive for my use case.

People are pointing out you could have used a bicycle instead.


This was clear both other times you explained it; the other commenters seem to want to nitpick despite it.


Maybe I misinterpreted what he wrote, but sanity checking the shiny new tech against fossilized tech of yesteryear to assure the new tech actually justifies its higher cost doesn't sound like malpractice to me?

I mean, he did use the state of the art for his work; he just checked how much better it actually was in comparison to a much simpler algorithm and found the cost/benefit ratio questionable... At least that's what I read from his comments.


Curious how big your dataset was if you used $1000 of GPU credits on DistilBERT. I've run BERT on CPU on moderate cloud instances no problem for datasets I've worked with, but which admittedly are not huge.


If I'm reading correctly, they used $1000 running a Llama model, not DistilBERT.


You read it correctly. I obviously didn't explain myself well.


Price isn't a real issue in almost every imaginable use case either. Even a small open source model would outperform and you're going to get a lot of tokens per dollar with that.


> dictionary mapping each word to a single sentiment weight

That seems to me like it would flat out fail on sarcasm. How is that still considered a usable method today?


It's not.


It currently costs around $2200 to run Gemini Flash Lite on all of English Wikipedia. It would probably cost around 10x that much to run sentiment analysis on every Yelp review ever posted. It's true that LLMs still cost a lot for some use cases, but for essentially any business case it's not worth using traditional NLP any more.
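
Estimates like these are just token count times price. A back-of-envelope sketch, where `estimate_cost_usd` is a hypothetical helper and the inputs are placeholders, not real corpus sizes or any provider's actual pricing:

```python
def estimate_cost_usd(total_tokens: int, usd_per_million_tokens: float) -> float:
    """Back-of-envelope inference cost: tokens / 1,000,000 * price per million."""
    return total_tokens / 1_000_000 * usd_per_million_tokens


# Placeholder inputs (assumptions for illustration only).
corpus_tokens = 2_000_000_000
price_per_million = 0.50

print(f"${estimate_cost_usd(corpus_tokens, price_per_million):,.2f}")
```

Plug in your own token count and current pricing before drawing conclusions; output tokens are usually billed separately and are ignored here.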


idk why are you changing targets for comparison?

it's like:

"does apple cure cancer in monkeys?" vs "does blueberry cure diabetes in pigs?"


More like "does apple cure cancer in monkeys?" vs "no, but some do cure diabetes in pigs"


I work in a non-profit and continue to use traditional NLP for the same reasons. I have lots of text, and LLMs are expensive. Also, our organization has restrictive policies on AIs, especially LLMs.

I try to get the best of both worlds by using LLMs to generate synthetic data to train NLP classifiers. First, I use LLMs to generate variations of human-labeled data. Second, I use LLMs to label unlabeled data.
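
A very rough sketch of the second step (LLM-labeled data feeding a small classifier). Everything here is an assumption for illustration: `fake_llm_label` is a hypothetical stand-in for a real LLM call, the example texts are invented, and the classifier is a tiny hand-rolled naive Bayes chosen only to keep the sketch self-contained, not the model actually used. The variation-generation step would slot in the same way, as another source of training pairs.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def train_naive_bayes(examples):
    """Train a bag-of-words naive Bayes model from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        label_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return {"word_counts": word_counts, "label_counts": label_counts, "vocab": vocab}


def predict(model, text):
    """Return the label with the highest log-probability (Laplace smoothing)."""
    total_docs = sum(model["label_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_logp = None, -math.inf
    for label, n_docs in model["label_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        logp = math.log(n_docs / total_docs)
        for tok in tokenize(text):
            logp += math.log((counts[tok] + 1) / (total_words + vocab_size))
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


def fake_llm_label(text):
    """Hypothetical stand-in for an LLM labeling call."""
    return "positive" if "thank" in text.lower() else "negative"


# Invented example data.
human_labeled = [("Thanks for the quick help", "positive"),
                 ("The form is broken again", "negative")]
unlabeled = ["Thank you, this worked", "Still broken and nobody answers"]

pseudo_labeled = [(text, fake_llm_label(text)) for text in unlabeled]
model = train_naive_bayes(human_labeled + pseudo_labeled)

print(predict(model, "thank you for the help"))
print(predict(model, "broken form"))
```

The point of the design is that the expensive model runs once, at training time, and the cheap classifier handles the bulk of the text afterwards.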

In a future challenge, I want to train LLMs to generate data to train NER for segmenting documents and extracting information.


So... What were the results? How did the Llama based model compare to VADER?


*consigned a good chunk of historical research into the bin


To be fair it is still stuck there, so


Sentiment analysis using traditional means is really lacking. I can’t talk about the current project I’m working on. But I needed a more nuanced sentiment. Think of something like people commenting on the Uber Eats app versus people commenting on a certain restaurant.


VADER made me sad when it couldn’t do code-mixed analyses in 2020. I’m thinking of dusting off that project, but then I dread the thought of using LLMs to do the same sentiment analysis.


Does it handle the sarcasm and typos that real-world people tend to produce?


How well did VADER correlate with Llama? Did you try any other methods intermediate between them?


I’d love to hear your thoughts on BERTs - I’ve dabbled a fair bit, fairly amateurishly, and have been astonished by their performance.

I’ve also found them surprisingly difficult and non-intuitive to train, e.g. deliberately including bad data and potentially a few false positives has resulted in notable success rate improvements.

Do you consider BERTs to be the upper end of traditional - or, dunno, transformer architecture in general to be a duff? Am sure you have fascinating insight on this!


That is a really good question, I am not sure where to draw the line.

I think it would be safe to say BERT is/was firmly in the non-traditional side of NLP.

A variety of task specific RNN models preceded BERT, and RNN as a concept has been around for quite a long time, with the LSTM being more modern.

Maybe word2vec ushered in the end of traditional NLP and was simultaneously also the beginning of non-traditional NLP? Much like Newton has been said to be both the first scientist and also the last magician.

I find discussing these kinds of questions with NLP academics to be awkward.


a) you can save costs on Llama by running it locally

b) compute costs are plummeting. cloud inference costs have dropped over 80% in 1 year

c) similar to a), spending a little more and having a beefy enough machine is functionally cheaper after just a few projects

d) everyone trying to do sentiment analysis is trying to make waaaay more money anyway

so I don’t see NLP’s even lower costs as being that relevant. it’s like pointing out that I could use assembly instead of 10 layers of abstraction. It doesn’t really matter



