AI and ChatGPT

Scientists are at last putting out papers about the Dunning-Kruger effect of these parrots: any idiot informing themselves through such puppets finds a better version of themselves while reinforcing their own prejudices and substantial ignorance.
 
This is so ridiculous. Might have been true with GPT-2, but it has long been bullshit to call them parrots.
 
They are just encyclopedic calculators, and I am really, really waiting for someone to prove they can do something coherent beyond the already-accepted line.
 
When was the last time you used them? And which version?
 
I am using the latest Claude right now, which is undoubtedly smart for any STEM problem (I am doing materials science right now) until it starts pattern matching your own sh*t while extrapolating beyond the accepted wisdom, as so many 2025-6 "papers" in my field demonstrate. Anyone "talking to bots" is an idiot at this stage, and the deeper they're in, the higher the chance Dunning-Kruger is doing its work.
 
OK.

I have quite a different experience in software engineering. It is very, very good at most things, while making some basic mistakes here and there.

I think your comment is overly extreme. It has already been used to some degree to do novel math and novel quantum field theory. Arguably the world's leading mathematician (Terry Tao) is using LLMs in math research. It is definitely not just pattern matching your own shit.
 
Tao and others are just using the "greater mind power" of the super-power calculator; they are not discovering anything new. Astrophysics seems possibly the best bet because there is no chance experiments will ever be done, so the super-calculator helps. We will see.
 
I can't believe Anthropic have released a new, best ever model, just after having an absolute PR nightmare.
 
If Mythos is anything like Anthropic are claiming then SWEs are so cooked.

It will still make mistakes and a human in the loop is still needed. The reality is also that tokens are still really expensive for those kinds of models. If it was, say, $10 per million and it could work autonomously, then SWEs might be cooked. Currently, at say $100 per million on these models, it'll only be for big orgs to find vulnerabilities.

My take is still that once it can work completely autonomously toward a set goal and token prices drop to the point that it's cheaper than a SWE, then we are cooked.
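
To put rough numbers on that break-even point, here is a minimal Python sketch. Only the $10 and $100 per million token prices come from the post above; the token count per task, the SWE cost, and the function names are made-up assumptions for illustration, not real figures.

```python
def model_cost_usd(tokens: int, usd_per_million: float) -> float:
    """Cost of spending `tokens` tokens at a given price per million tokens."""
    return tokens * usd_per_million / 1_000_000


def breakeven_price_per_million(swe_cost_usd: float, tokens: int) -> float:
    """Token price (per million) at which the model costs the same as the SWE."""
    return swe_cost_usd * 1_000_000 / tokens


# Hypothetical task: an autonomous run that burns 50M tokens,
# versus a SWE spending 8 hours at an assumed $100/hour.
TOKENS_PER_TASK = 50_000_000   # assumption, not a measured figure
SWE_COST_USD = 8 * 100         # assumption, not a measured figure

cheap = model_cost_usd(TOKENS_PER_TASK, 10)     # the "$10 per million" case
pricey = model_cost_usd(TOKENS_PER_TASK, 100)   # the "$100 per million" case
breakeven = breakeven_price_per_million(SWE_COST_USD, TOKENS_PER_TASK)

print(f"At $10/M tokens:  ${cheap:,.0f} vs SWE ${SWE_COST_USD:,}")
print(f"At $100/M tokens: ${pricey:,.0f} vs SWE ${SWE_COST_USD:,}")
print(f"Break-even price: ${breakeven:,.2f} per million tokens")
```

Under these assumed numbers the model only undercuts the SWE once the price falls below the break-even figure, which is the point being made: autonomy alone isn't enough, the token price has to drop too.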

However, the caveat for that is that by that time all other white collar work will be too, and so will blue collar with robotics. I feel that by EoY and the start of 2027 we will have some major saturation across many benchmarks, but costs will still be high. By 2028 I think most cognitive domains will be easy, including many research tasks, but costs will again stay high to offset R&D and compute.

Robotics will be on a similar path, by 2030 at the latest I think, but the cost of robotics will be a deciding factor... The issue is also that China at max capacity can only make 10,000 a year, so there needs to be greater throughput.

We are starting to see the abilities of the models, and now I can say Anthropic's models are faster than I am and better than me with 12 YoE, but they still need me to prompt and check.
 
Very interesting topic and you probably won't be out of work any time soon. It feels like AI will bring a host of new exploits, not just through prompt injection but also if used as a tool to find and exploit already existing vulnerabilities.
About that....
https://www.anthropic.com/glasswing

## Project Glasswing Summary

**Project Glasswing** is a major cybersecurity initiative announced by Anthropic, bringing together 12 major tech and financial organizations (including AWS, Apple, Google, Microsoft, Cisco, and JPMorganChase) to use AI for **defensive cybersecurity purposes**.

### The Core Problem
Anthropic's new unreleased frontier model, **Claude Mythos Preview**, has demonstrated the ability to find and exploit software vulnerabilities at a level surpassing most human experts — including discovering thousands of zero-day vulnerabilities across **every major OS and web browser**. This capability, if it falls into the wrong hands, poses serious risks to critical infrastructure.

### The Response
Rather than wait for bad actors to exploit these capabilities, Project Glasswing aims to use them **defensively**:
- Partner organizations will use Mythos Preview to scan and fix vulnerabilities in critical software
- **$100M in usage credits** committed to participants
- **$4M in donations** to open-source security organizations (Alpha-Omega, OpenSSF, Apache)
- Access extended to 40+ additional organizations maintaining critical open-source infrastructure

### Key Points
- Mythos Preview will **not** be made generally available due to its risk level
- Anthropic is in discussions with **US government officials** about the model's capabilities
- A public report on findings will be released **within 90 days**
- The initiative is intended as a **long-term, industry-wide effort**
 
Interesting, and probably a wise move. That being said, I always thought that if one of these companies ever achieved real ASI, they would probably not make it public but instead use it to further their advantage in the field, to always be way ahead of the competition, and maybe branch into other areas to make more money. I mean, if you have an ASI that can build any digital product within days, sharing it with anyone would be foolish from a business perspective.

That being said, I don't think LLMs alone can actually get to ASI status. They lack self-learning capabilities and too often fall flat on the reasoning end of things, and I think it's impossible to really call something ASI if it can't reason better than humans or learn new things that are not already stored as tokens in their vector DB.
 
Also, it probably explains why Claude turned to shit recently (I think they limited its reasoning tokens to save GPUs so they can give as much compute as possible in Mythos post-training).
Is this sort of confirmed? Because I've been seeing a few posts online in that direction, and while it can be confirmation bias, I've felt the same.
I'm a complete vibecoder, so not a lot of expertise, but I'm experimenting a bit with Claude Code and ChatGPT. While I obviously use Claude for the coding part (essentially letting him run wild with full permission), I've felt that when it comes to logic it's making mistakes that remind me of LLMs from about 2 years ago, basically looking to confirm results with confident answers that are just nonsense. An example: I'm creating an application that roughly compares regional trains with high speed trains in Germany. When it calculated the differences, there was a bug in the dataset that gave zero regional connections from Frankfurt Main station. When giving me the results and his opinion, Claude Code basically said "makes sense, Frankfurt is only a train hub for high speed rail", which is obviously total nonsense. It's also prone to making weird process errors, e.g. messing up that we have already done 3 runs of stage x, claiming we've only done 2. Right now I feel ChatGPT is still better at logic, reasoning and critical thinking, so while I'm doing the coding work with Claude, I'm doing planning, reviewing and discussing things mainly with ChatGPT.
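
For what it's worth, the Frankfurt kind of bug can be caught with a plain sanity check on the data before anyone (human or LLM) starts interpreting results. A minimal sketch in Python, assuming a hypothetical list of `(origin_station, train_kind)` pairs; the station names, the `"regional"` label and the function name are all made up for illustration and are not from the actual application:

```python
from collections import Counter


def stations_missing_regional(connections, stations):
    """Return the stations that have zero regional connections in the data.

    `connections` is an iterable of (origin_station, train_kind) pairs.
    A major station showing up here is more likely a data bug than a real finding.
    """
    regional_counts = Counter(
        origin for origin, kind in connections if kind == "regional"
    )
    return sorted(s for s in stations if regional_counts[s] == 0)


# Hypothetical sample data reproducing the bug described above.
connections = [
    ("Frankfurt (Main) Hbf", "high_speed"),
    ("Mannheim Hbf", "regional"),
    ("Mannheim Hbf", "high_speed"),
]
stations = ["Frankfurt (Main) Hbf", "Mannheim Hbf"]

suspicious = stations_missing_regional(connections, stations)
print("Stations with no regional connections:", suspicious)
```

Running a check like this first turns "makes sense, Frankfurt is only a hub for high speed rail" into "the dataset is missing rows for Frankfurt", which is the conclusion you actually want.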

Also, what's the best tool/LLM for academic research right now? The last time I did something was 2 years ago and (if I remember correctly) the consensus was that Perplexity was best because it invented less stuff. I'm planning to do quite a lot of research soon and am not sure what's best to use.

Edit: I just realized that this thread now probably mainly focusses on the political implications of AI given that it has been moved to the CE forum. Shall we create one back in the General again for discussion about application use?
 
AI has done to the internet what VAR has done to football.

Now whenever I see something interesting, e.g. an old photo/video, a nice landscape or whatever, I can’t truly enjoy it, as I don’t even know for sure that it’s real.
 

In my experience Perplexity is by far the best for this. It seems to have a much better grasp of what is and isn’t a valid reference.

Although still prone to making shit up. As they all are.
 
Nope, only Anthropic can do that and they won't ever confirm such a thing. But everyone I know who has been heavily using it is feeling that way (same for me). Over the last couple of weeks, my team switched back to GPT.

https://fortune.com/2026/04/14/anth...k-of-transparency-accusations-compute-crunch/
 
Anthropic also working on AI design tool and Opus 4.7.

Anthropic is preparing its next flagship model, Claude Opus 4.7, along with a new AI-powered tool for designing websites and presentations, according to a person with knowledge of the products. Those new products could be released as soon as this week, the person said.
News of the upcoming AI design tool sent the share prices of Adobe, Wix and Figma down more than 2% in the hour following this report.
https://www.theinformation.com/briefings/exclusive-anthropic-preps-opus-4-7-model-ai-design-tool
 
Nerfed big time, even worse than ChatGPT right now!
 
Anyway, even if all these new models are so much better, people hit their limit so quickly you wonder how much AI will actually boost productivity.

With the free version of Claude I could do 2 or 3 prompts before hitting a limit!
 
There are no limits in enterprise versions, they just cost a shitload.
 
I have access to 3 pro versions (CoPilot, Gemini and ChatGPT) so technically with some limits, but I've never run into a limitation personally. I think a lot of it is simply being efficient in what you provide and ask it to do, rather than relying on it for 100% of what someone does. Perhaps in a field like computer programming that limit is tougher to deal with, but I do a lot of statistical analysis and the limits are plenty for me. It's clear though many are using AI to do all the thinking for them, and that's stupid for a variety of reasons, token limits aside.
 
I am not talking about the paid pro versions (like Claude 5x or 20x). You can easily run out of tokens in them. I mean the enterprise versions, where you pay per token.