Different organizations use different amounts of the tools the BEAM provides in ...

toolz · on March 5, 2021

Sure, but I'm not talking about hot upgrades; I'm talking about upgrading entire nodes that exist in a cluster. The nodes will not be able to pass functions around, because an upgraded node in the cluster will have a different version of each function. So this is an issue even without considering hot code reloading, which I don't use either.

yawaramin · on March 5, 2021

So we're not talking about hot code reloading here, we're talking exactly about upgrading nodes in a cluster. Many--I'd say most--Erlang/Elixir deployments today don't do that. They use Kubernetes.

toolz · on March 5, 2021

Kubernetes does not solve the issue, it merely has strategies for how to upgrade nodes. The issue is that when you compile new code, even if the function didn't change at all, it will be a new version of the function. If the upgraded node then sends a function to another node to be executed (as is a very common practice in meshed OTP apps) the older node will not be able to execute the function.
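To make the failure mode concrete, here is a toy model in Python (not BEAM code). It assumes closures are identified by a hash of the whole module build plus an index, which is my simplification for illustration. `Node`, `build_id`, and the two source strings are hypothetical names.

```python
import hashlib

# Hypothetical module sources: add_one is identical in both builds,
# only an unrelated constant differs.
SOURCE_V1 = "add_one = lambda x: x + 1\nTIMEOUT = 5\n"
SOURCE_V2 = "add_one = lambda x: x + 1\nTIMEOUT = 10\n"


def build_id(source: str) -> str:
    """Assumed stand-in for a per-build module identifier."""
    return hashlib.md5(source.encode("utf-8")).hexdigest()


class Node:
    def __init__(self, name: str, source: str) -> None:
        self.name = name
        self.build = build_id(source)
        # Closures are looked up by (build id, index), not by their source text.
        self.closures = {(self.build, 0): lambda x: x + 1}

    def export_closure(self, index: int) -> dict:
        # Only a reference crosses the wire, never the code itself.
        return {"build": self.build, "index": index}

    def run_closure(self, ref: dict, arg: int) -> int:
        key = (ref["build"], ref["index"])
        if key not in self.closures:
            raise LookupError(
                f"{self.name} has no closure {ref['index']} for build {ref['build'][:8]}"
            )
        return self.closures[key](arg)


old_node = Node("old", SOURCE_V1)
new_node = Node("new", SOURCE_V2)
ref = new_node.export_closure(0)

print(new_node.run_closure(ref, 41))  # the sender can run its own closure
try:
    old_node.run_closure(ref, 41)     # the older build cannot resolve the reference
except LookupError as err:
    print("old node failed:", err)
```

The body of `add_one` is the same on both nodes, but the reference still fails to resolve on the older build because the identity is tied to the build rather than to the function.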

Now, I understand that most Erlang apps still don't utilize clustering and are instead stateless apps, and that's fine, but OCaml can already do stateless apps just fine, right? So my assumption here is that they're putting OCaml on the BEAM to utilize OTP and clustered apps, correct? If that's the case, I don't see how you could have a real-world app that you can quickly scale up/down (which is something you often want in a clustered app). Instead, it seems like you'd be forced to spin up entirely new clusters and then direct traffic to the new cluster to bypass this problem. This of course isn't a viable tradeoff for very large clusters.

derefr · on March 5, 2021

I think what the parent is saying isn't that most production Erlang apps aren't clustered; it's that most production Erlang apps aren't clustered using the distribution protocol. Instead, they're clustered at the application layer, with explicit wire protocols like gRPC. You can upgrade nodes within such clusters just fine, because the ABI of the wire protocol is an explicit part of the application, something that can be stabilized separately from the app itself, rather than being an implicit property of what ERTS version you're using, or about how the application's processes communicate internally.

Note that even in the context of Ericsson, Erlang's distribution protocol was never designed for the use-case of horizontal N-node clustering. It was designed for static-role clustering — where each node is like an organ in your body, with a name and a specific function relative to the "system" that is your body. For example, "master" and "warm standby" roles. (By analogy to Kubernetes, the Erlang nodes of a distributed-OTP architecture are closest to being like the sibling containers of a single k8s pod. Except that Erlang's "containers" can be running on separate machines while still being part of the same "pod".)

If you want to scale such a system (a "pod" of nodes), you are supposed to spawn N copies of the entire distribution set; and then those N system-copies will coordinate with other such clones not via the distribution protocol, but via explicit coordination protocols. (Sometimes this explicit coordination protocol is designed to use the distribution protocol as a carrier — this is one of the reasons that Erlang supports making manual distribution-protocol connections between nodes, rather than forcing a connected topology on you. But such connections carefully avoid passing arbitrary RPC data across them, instead usually having an explicit "peer server" on both ends, where the servers speak a limited and ABI-stable term protocol to one another.)
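As a rough illustration of the "peer server" idea, here is a minimal Python sketch of a limited, explicitly versioned message protocol. Everything in it is an assumption for the sake of the example: the JSON encoding, the field names `v`/`op`/`key`/`value`, and the three operations. The point is only that both ends agree on a small, stable set of data messages and never ship arbitrary functions.

```python
import json

PROTOCOL_VERSION = 1                 # hypothetical version tag
ALLOWED_OPS = {"ping", "get", "put"}  # the entire surface of the protocol


def encode_request(op, key=None, value=None):
    """Build a request as plain data; no code or closures are ever sent."""
    if op not in ALLOWED_OPS:
        raise ValueError(f"unsupported op: {op}")
    message = {"v": PROTOCOL_VERSION, "op": op, "key": key, "value": value}
    return json.dumps(message).encode("utf-8")


def handle_request(raw, store):
    """Peer-server side: accept only known versions and known operations."""
    message = json.loads(raw.decode("utf-8"))
    if message.get("v") != PROTOCOL_VERSION:
        return {"ok": False, "error": "unsupported_version"}
    op = message.get("op")
    if op == "ping":
        return {"ok": True, "result": "pong"}
    if op == "get":
        return {"ok": True, "result": store.get(message.get("key"))}
    if op == "put":
        store[message.get("key")] = message.get("value")
        return {"ok": True, "result": None}
    return {"ok": False, "error": "unknown_op"}


store = {}
handle_request(encode_request("put", "color", "blue"), store)
print(handle_request(encode_request("get", "color"), store))
```

Because the messages are plain data with an explicit version, a peer running a different build can still parse them, or at least reject them cleanly, which is the property the function-passing approach lacks.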

Yes, the Erlang distribution protocol has been repeatedly taken beyond its design tolerances to accomplish horizontal scaling. CouchDB does this, for example. But many Erlang applications that you might think of as doing this are actually much closer to a static "organs in a body" architecture than you'd think, despite claims of scalability. Riak and Ejabberd, for two examples, both treat their nodes very much like static organs, rather than a fleet.

nivertech · on March 6, 2021

Erlang's built-in distribution works well for cluster management and a low-traffic control plane, but not for the critical path / data plane [1].

In later OTP releases, Erlang distribution has improved considerably and is now usable even for the data plane in less demanding scenarios.

While it is possible to customize Erlang distribution for your specific network/workload, it is much easier to use plain TCP sockets for your data plane (back in the day I used 0MQ for the data plane in the cluster).

Also, Riak and ejabberd (via Mnesia) use Erlang distribution on the critical path, and this was one of the reasons for their poor performance.

I think you mean that ejabberd is not a cloud-native application (i.e. it treats nodes like pets instead of cattle). Modern BEAM apps are implementing cluster node-autodiscovery (I did it for my first Erlang service on AWS in 2010).

I'm not sure, but I think RabbitMQ uses the Erlang distribution for management only.

Phoenix apps using PubSub and Presence/Tracker will also use Erlang distribution by default, but they can be switched to other backends, e.g. a Redis-based PubSub.

[1] https://www.slideshare.net/nivertech/erlang-on-osv-49278675#...

dnautics · on March 6, 2021

> "it's that most production Erlang apps aren't clustered using the distribution protocol"

Phoenix PubSub plugs into pg2 directly and automatically, so if you've got an Elixir cluster with Phoenix you are basically "using Erlang clustering" even if you don't realize it.

yawaramin · on March 5, 2021

> Kubernetes does not solve the issue, it merely has strategies for how to upgrade nodes. The issue is that...

I understand the issue, but Gleam/Caramel/et al. (static typing layers on the BEAM) are not trying to solve it.

> So my assumption here is that they're putting ocaml on the BEAM to utilize OTP and clustered apps, correct?

That's actually not my assumption. I'm assuming that the point of Caramel is to get sound static typing for (some of) the code that we are deploying on the BEAM. It doesn't necessarily have to be for clustered apps. As I said, many deploys of even Erlang and Elixir today are not using clustered nodes. They don't care about those capabilities. They use Kubernetes to manage nodes and state.

> instead it seems like you'd be forced to spin up entirely new clusters and then direct traffic to the new cluster to bypass this problem.

This is exactly what Kubernetes (or Nomad etc.) will let you avoid--they can spin up and spin down nodes within the same cluster.

jolux · on March 5, 2021

As I understand it, there are real questions about the type theory needed to correctly express what the BEAM does, and they have yet to be answered sufficiently. Nonetheless, a lot of people clearly want static types in their BEAM applications, and there are certainly parts of any application that don't cross the network. I'm interested in efforts like Caramel and Gleam mostly because of my interest in BEAM as a general-purpose platform, and because I want to see if they come up with solutions to the type theory problems.

jolux · on March 5, 2021

I don't quite follow how static types change this situation?