• pif 2 minutes ago

    > Although Homa is not API-compatible with TCP,

    IPv6 anyone? People must start to understand that "Because this is the way it is" is a valid, actually extremely valid, answer to any question like "Why don't we just switch technology A with technology B?"

    Despite all the shortcomings of the old technology, and the advantages of the new one, inertia _is_ a factor, and you must accept that most users will simply refuse even to acknowledge the problem you want to describe.

    For your solution to get any traction, it must deliver value right now, in the current ecosystem. Otherwise, it's doomed to fail by being ignored over and over.

    • UltraSane 4 hours ago

      I wonder why Fibre Channel isn't used as a replacement for TCP in the datacenter. It is a very robust L3 protocol. It was designed to connect block storage devices to servers while making the OS think they are directly connected. OSs do NOT tolerate dropped data when reading and writing to block devices, so Fibre Channel has an extremely robust credit-based flow-control scheme (buffer-to-buffer credits). It prevents congestion by allowing receivers to control how much data senders can send. I have worked with a lot of VMware clusters that use FC to connect servers to storage arrays and it has ALWAYS worked perfectly.
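The receiver-controlled flow control described in this comment can be illustrated with a toy credit scheme. This is only a minimal sketch: the class `CreditedLink` and its methods are hypothetical names invented for illustration, and it models the general idea (the sender may transmit only while it holds credits handed out by the receiver), not the actual Fibre Channel wire protocol.

```python
from collections import deque


class CreditedLink:
    """Toy model of credit-based flow control (hypothetical, illustration only)."""

    def __init__(self, receiver_buffers: int):
        # The receiver advertises one credit per free buffer it owns.
        self.credits = receiver_buffers
        self.rx_queue = deque()

    def send(self, frame) -> bool:
        """Send one frame if a credit is available; otherwise the sender waits."""
        if self.credits == 0:
            return False  # nothing is dropped, the sender simply holds the frame
        self.credits -= 1
        self.rx_queue.append(frame)
        return True

    def receiver_consume(self):
        """Receiver frees a buffer and returns one credit to the sender."""
        frame = self.rx_queue.popleft()
        self.credits += 1
        return frame


link = CreditedLink(receiver_buffers=2)
results = [link.send("a"), link.send("b"), link.send("c")]  # third send must wait
first = link.receiver_consume()                             # frees one buffer
retry = link.send("c")                                      # now succeeds
```

Because the sender can never have more frames outstanding than the receiver has buffers, the receive queue cannot overflow, which is the property the comment attributes to FC.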

      • Sebb767 2 hours ago

        > I wonder why Fibre Channel isn't used as a replacement for TCP in the datacenter

        But it is often used for block storage in datacenters. Using it for anything else is going to be hard, as it is incompatible with TCP.

        The problem with not using TCP is the same thing Homa will face - everything already speaks TCP, nearly all potential hires know TCP and most problems you have with TCP have been solved by smart engineers already. Hardware is also easily available. Once you drop all those advantages, either your scale or your gains need to be massive to make that investment worth it, which is why TCP replacements are so rare outside of FAANG.

        • YZF 2 hours ago

          Are you suggesting some protocol layer of Fibre Channel to be used over IP over Ethernet?

          TCP (in practice) runs on top of (mostly) routed IP networks and network architectures. E.g. a spine/leaf network with BGP. Fibre Channel as I understand it is mostly used in more or less point to point connections? I do see some mention of "Switched Fabric" but is that very common?

          • wejick an hour ago

            I'm imagining having shared memory mounted as block storage, then doing the RPC through this block. Some synchronization and polling/notification work would need to be done.

          • Woodi 36 minutes ago

            You want to replace TCP because it is bad? Then provide a better "connected" protocol over raw IP and other raw network topologies. Use it. Done.

            Don't mess with another IP -> UDP -> something

            • akira2501 4 hours ago

              > If Homa becomes widely deployed, I hypothesize that core congestion will cease to exist as a significant networking problem, as long as the core is not systemically overloaded.

              Yep, sure. But what happens when it becomes overloaded?

              > Homa manages congestion from the receiver, not the sender. [...] but the remaining scheduled packets may only be sent in response to grants from the receiver

              I hypothesize it will not be a great day when you do become "systemically" overloaded.
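The receiver-driven scheme in the quoted passage can be sketched as an event trace. This is a rough illustration under stated assumptions, not Homa's actual implementation: the function name `homa_like_transfer` and the parameters `unscheduled_bytes` and `grant_size` are hypothetical, and real Homa also involves priorities and multiple concurrent messages that are ignored here.

```python
def homa_like_transfer(message_len: int, unscheduled_bytes: int, grant_size: int):
    """Return the (kind, offset, length) events for sending one message.

    Assumption for illustration: the sender may transmit an initial
    unscheduled prefix on its own; every later chunk is sent only after
    the receiver issues a grant covering it.
    """
    events = []
    sent = min(message_len, unscheduled_bytes)
    events.append(("unscheduled", 0, sent))
    while sent < message_len:
        chunk = min(grant_size, message_len - sent)
        events.append(("grant", sent, chunk))      # receiver authorizes the chunk
        events.append(("scheduled", sent, chunk))  # sender transmits it
        sent += chunk
    return events


trace = homa_like_transfer(message_len=10, unscheduled_bytes=4, grant_size=3)
short = homa_like_transfer(message_len=3, unscheduled_bytes=4, grant_size=3)
```

In this model an overloaded receiver simply issues grants more slowly, so senders stall rather than flooding the network; whether that is a good day or a bad one is exactly the question raised above.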

              • andrewflnr 3 hours ago

                Will it be a worse day than it would be with TCP? Either way, the only solution is to add more hardware, unless I'm misunderstanding the term "systemically overloaded".

              • parasubvert 2 hours ago

                This has already been done at scale with HTTP/3 (QUIC), it's just not widely distributed beyond the largest sites & most popular web browsers. gRPC for example is still on multiplexed TCP via HTTP/2, which is "good enough" for many.

                Though it doesn't really replace TCP, it's just that the predominant requirements have changed (as Ousterhout points out). Bruce Davie has a series of articles on this: https://systemsapproach.substack.com/p/quic-is-not-a-tcp-rep...

                Also see Ivan Pepelnjak's commentary (he disagrees with Ousterhout): https://blog.ipspace.net/2023/01/data-center-tcp-replacement...

                • wmf 5 hours ago

                  Previous discussions:

                  Homa, a transport protocol to replace TCP for low-latency RPC in data centers https://news.ycombinator.com/item?id=28204808

                  Linux implementation of Homa https://news.ycombinator.com/item?id=28440542

                  • slt2021 3 hours ago

                    The problem with trying to replace TCP only inside the DC is that TCP will still be used outside the DC.

                    Network engineering is already convoluted and troublesome as it is right now, with only the TCP stack.

                    When you start using Homa inside but TCP outside, things will break, because a lot of DC requests are created in response to an inbound request from outside the DC (like a client trying to send an RPC request).

                    I cannot imagine trying to troubleshoot hybrid problems at the intersection of TCP and Homa, it's gonna be a nightmare.

                    Plus I don't understand why create a new L4 transport protocol for a specific L7 application (RPC)? This seems like a suboptimal choice, because RPC of today could be replaced with something completely different, like RDMA over Ethernet for AI workloads or transfer of large streams like training data/AI model state.

                    I think tuning the TCP stack in the kernel, adding more configuration knobs for TCP, and switching from stream (TCP) to packet (UDP) protocols where it is warranted will give more incremental benefits.

                    One major thing the author missed is security. These are considered table stakes:

                    1. encryption in transit: handshake/negotiation
                    2. ability to intercept and inspect traffic for enterprise security purposes
                    3. resistance to attacks like floods
                    4. security of sockets in containerized Linux environments

                    • jayd16 3 hours ago

                      Are you imagining external TCP traffic will be translated at the load balancer or are you actually worried that requests out of an API Gateway need to be identical to what goes in?

                      I could see the former being an issue (if that's even implied by "inside the data center") and I just don't see how it's a problem for the latter.

                      • nicman23 3 hours ago

                        The only place Homa makes sense is where there is no external TCP to the peers, or at least not in the same context, i.e. for RoCE.

                      • unsnap_biceps 5 hours ago

                        The original paper was discussed previously at https://news.ycombinator.com/item?id=33401480

                      • runlaszlorun 3 hours ago

                        For those who might not have noticed, the author is John Ousterhout, best known for Tcl/Tk as well as the Raft consensus protocol, among others.

                        • signa11 3 hours ago

                          And more recently (?) the book "A Philosophy of Software Design", highly recommended!

                        • GoblinSlayer an hour ago

                          > For many years, RDMA NICs could cache the state for only a few hundred connections; if the number of active connections exceeded the cache size, information had to be shuffled between host memory and the NIC, with a considerable loss in performance.

                          A massively parallel task? Sounds like something doable with GPGPU.

                          • indolering 21 minutes ago

                            So token ring?

                            • ksec 5 hours ago

                              Homa: A Receiver-Driven Low-Latency Transport Protocol Using Network Priorities

                              https://people.csail.mit.edu/alizadeh/papers/homa-sigcomm18....

                              • dveeden2 2 hours ago

                                Wasn't something like Homa already tried with SCTP?

                                • iforgotpassword an hour ago

                                  And QUIC. And that thing Tesla presented recently, with custom silicon even.

                                  And as usual, hardware gets faster, better and cheaper over the next years and suddenly the problem isn't a problem anymore - if it even ever was for the vast majority of applications. We only recently got a new fleet of compute nodes with 100 Gbit NICs. The previous one only had 10 Gbit, plus Omni-Path. We're going Ethernet-only this time.

                                  I remember when saturating 10 Gbit/s was a challenge. This time around, reaching line speed with TCP, the server didn't even break a sweat. No jumbo frames, no fiddling with tunables. And that was while testing with 4-year-old Xeon boxes, not even the final hardware.

                                  Again, I can see how there are use cases that benefit from even lower latency, but that's a niche compared to all DC business, and I'd assume you might just want RDMA in that case, instead of optimizing on top of Ethernet or IP.

                                  • silisili an hour ago

                                    This is a solid answer, as someone on the ground. TCP is not the bogeyman people make it out to be. It's the poison apple where some folks are looking for low-hanging fruit.

                                • 7e 5 hours ago

                                  TCP was replaced in the data centers of certain FAANG companies years before this paper.

                                  • wmf 5 hours ago

                                    If they keep it secret they don't get credit for it.

                                    • andrewflnr 3 hours ago

                                      How do you figure? The right decision is the right decision, even if you don't tell people. (granting, for the sake of argument, that it is the right decision)

                                      • wmf 3 hours ago

                                        Yeah, you get the benefit of secret tech (in this case faster networking) but people shouldn't give social credit for it because that creates incentives to lie. And, sadly, tech adoption runs entirely on social proof.

                                    • albert_e 3 hours ago

                                      Curious ... replaced with what, I would like to know.

                                    • bushbaba 5 hours ago

                                      *A minority of the FAANGs.

                                      • avardaro 4 hours ago

                                        A minority? What large tech company has not prioritized this?

                                        • cdchn 4 hours ago

                                          Which have and with what?

                                    • stiray 3 hours ago

                                      How long did we need to support IPv6? Is it supported yet, and more widely in use than IPv4, e.g. in mobile networks, where everything is stashed behind NAT and IPv4 is kept?

                                      Another protocol, something completely new? Good luck with that, I would rather bet on global warming to put us out of our misery (/s)...

                                      https://imgs.xkcd.com/comics/standards.png

                                      • detaro 2 hours ago

                                        Mobile networks especially are widely IPv6, with IPv4 being translated/tunneled where still needed. (End-user connections in general skew IPv6 in many places - it's observable how traffic patterns shift with people being at work vs at home. Corporate networks without IPv6 leading to more IPv4 traffic during the day, in the evening IPv6 from consumer connections takes over)

                                        • stiray 2 hours ago

                                          Android: Settings -> About (just checked mine, 10...*), check your IP. We have 3 providers in our country, and all 3 are using an IPv4 "LAN" for phone connectivity, behind NAT. I am observing this situation around most of the EU (Germany, Austria, Portugal, Italy, Spain, France, various providers).

                                      • yesbut 4 hours ago

                                        Another thing not worth investing time into for the rest of our careers. TCP will be around for decades to come.

                                        • t-writescode 2 hours ago

                                          True! And chances are, if you're developing website software or video game software, you'll never think about these sorts of things, it'll just be a dumb pipe for you, still.

                                          And that's okay!

                                          But there are other sorts of computer people than website writers and business application devs, and they're some of the people this would be interesting for!

                                        • bmitc 3 hours ago

                                          Unrelated to this article, are there any reasons to use TCP/IP over WebSockets? The latter is such a clean, message-based interface that I don't see a reason to use TCP/IP.

                                          • tacitusarc 3 hours ago

                                            Websockets is a layer on top of TCP/IP.

                                            • bmitc 3 hours ago

                                              Yes, I know that WebSockets layer over TCP/IP. But that both misses the point and is part of the point. The reason that I ask is that WebSockets seem to almost always be used in the context of web applications. TCP/IP still seems to dominate control communications between hardware. But why not WebSockets? Almost everyone ends up building a message framing protocol on top of TCP/IP, so why not just use WebSockets, which have bidirectional message framing built in? I'm just not seeing why WebSockets aren't as ubiquitous as TCP/IP and only seem to be relegated to web applications.
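For readers unfamiliar with the "message framing protocol on top of TCP/IP" mentioned in this comment, here is a minimal sketch of the usual hand-rolled version: a 4-byte length prefix in front of each message. The helper names `send_msg`, `recv_exact` and `recv_msg` are made up for illustration, and `socket.socketpair()` stands in for a real TCP connection so the example is self-contained.

```python
import socket
import struct


def send_msg(sock: socket.socket, payload: bytes) -> None:
    """Write one message: 4-byte big-endian length, then the payload."""
    sock.sendall(struct.pack("!I", len(payload)) + payload)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; a stream socket may return fewer per recv()."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection mid-message")
        buf.extend(chunk)
    return bytes(buf)


def recv_msg(sock: socket.socket) -> bytes:
    """Read one length-prefixed message."""
    (length,) = struct.unpack("!I", recv_exact(sock, 4))
    return recv_exact(sock, length)


# A connected pair of stream sockets, used here in place of a TCP connection.
a, b = socket.socketpair()
send_msg(a, b"hello")
send_msg(a, b"")          # empty messages keep their boundaries too
received = [recv_msg(b), recv_msg(b)]
a.close()
b.close()
```

WebSockets provide this message-boundary behavior out of the box, which is the convenience the comment is pointing at; the sketch shows how little code the do-it-yourself alternative needs for the framing part alone.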

                                              • dataviz1000 3 hours ago

                                                There isn't much of a difference between a router between two machines physically next to each other and a router in Kansas connecting a machine in California with a machine in Miami. The packets of data are wrapped with an address of where they are going in the header.

                                                WebSockets are long-lived socket connections designed specifically for use on the 'web'. TCP sends data wrapped in packets, with ordering and delivery guaranteed. These guarantees come with an overhead cost. This is different from UDP, which doesn't guarantee order or delivery. However, a packet sent over UDP might arrive tomorrow after it goes around the world a few times.

                                                With fetch() or XMLHttpRequest, the client has to use energy and time to open a new HTTP connection, while a WebSocket opens a long-lived connection. When sending lots of bidirectional messages it makes sense to have a WebSocket. However, a simple fetch() request is easier to develop. A developer needs a good reason to use the more complicated WebSocket.

                                                Regardless, they both send messages over TCP, which guarantees packet order and delivery; those features have a lot to do with why TCP is the first choice.

                                                There is UDP which is used by WebRTC which is good for information like voice or video which can have missing and unordered packets occasionally.

                                                If two different processes on the same machine want to communicate, they can use a Unix socket. A Unix socket creates a special file (socket file) in the filesystem, which both processes can use to establish a connection and exchange data directly through the socket, not by reading and writing to the file itself. But the Unix Socket doesn't have to deal with routing data packets.

                                                (ChatGPT says "Overall, you have a solid grasp of the concepts, and your statements are largely accurate with some minor clarifications needed.")
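The Unix-socket exchange described in this comment can be sketched as follows. It is a minimal, single-process illustration that assumes a Unix-like OS; the function name `unix_socket_roundtrip` and the file name `demo.sock` are hypothetical, and both "processes" are simulated by two sockets in the same script, which is only safe for small messages.

```python
import os
import socket
import tempfile


def unix_socket_roundtrip(message: bytes) -> bytes:
    """Send a small message through a Unix domain socket and return what arrives."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.sock")  # the special socket file

        server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        server.bind(path)      # creates the socket file in the filesystem
        server.listen(1)

        client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        client.connect(path)   # rendezvous via the path, no IP routing involved
        conn, _ = server.accept()

        client.sendall(message)
        client.shutdown(socket.SHUT_WR)  # signal end of data to the reader

        data = b""
        while chunk := conn.recv(4096):
            data += chunk

        for s in (conn, client, server):
            s.close()
        return data


echoed = unix_socket_roundtrip(b"ping")
```

The file on disk is only a name used to find the peer; the bytes travel through the kernel, not through the file's contents.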

                                                • j16sdiz 3 hours ago

                                                  WebSocket is a fairly inefficient protocol, and it needs to deal with the upgrade from HTTP. You still need to implement your app-specific protocol. This adds complexity without additional benefit.

                                                  It makes sense only if you have a WebSocket-based stack and don't want to maintain a second protocol.

                                                  • wmf 3 hours ago

                                                    Interesting point. For example, Web apps cannot speak BitTorrent (because Web apps are not allowed to use TCP) but they can speak WebTorrent over WebRTC and native apps can also speak WebTorrent. So in some sense a protocol that runs over WebSockets/WebRTC is superior because Web apps and native apps can speak it.

                                                • tkin1980 3 hours ago

                                                  Well, Websocket is over TCP, so you already need it for that.

                                                • freetanga 3 hours ago

                                                  So, back to the mainframe and SNA in the data centers?

                                                  • wmf 2 hours ago

                                                    If Rosenblum can get an award for rediscovering mainframe virtualization, why not give Ousterhout an award for rediscovering SNA?

                                                    (SNA was before my time so I have no idea if Homa is similar or not.)