Jim Handy of Objective Analysis and I recently finished a white paper on The Future of Low-Latency Memory and presented a few slides related to that white paper at the SNIA Persistent Memory and Computational Storage Summit this week. Here are some interesting observations from that white paper that may have some bearing on the future of near and far memory. The figure below, which compares the memory capacity and memory bandwidth available from common near memory solutions (DDR4, DDR5, and HBM) against the OMI interface, can help frame the following discussion.
For Near Memory, the memory that attaches directly to a processor’s pins, the current DDR parallel memory bus has been tweaked and adjusted over the years.
Although its performance has improved impressively for more than two decades, DDR is failing to keep pace with the increasing bandwidth requirements of processor chips. Processor core counts are rising quickly, and clock speeds continue to creep higher, driving a thirst for bandwidth and capacity that runs in direct opposition to the way the DDR bus operates.
To achieve the highest DDR speeds, the bus’s capacitive loading must decrease as the bus speed increases. Because of this, the memory channels that previously managed four DIMM slots have shrunk to three, then two, and now the highest-speed channels can only support a single slot. As a result, the amount of memory per channel is declining.
Some processors, notably GPUs, use HBM (High Bandwidth Memory) to get past this issue. HBM devices are stacks of DRAM that present 1,000-2,000 parallel signal paths to the processor. This can improve performance substantially, but the processor and the HBM must be intimately connected.
Although HBM helps, it’s considerably more expensive than standard DRAM and is limited to stacks of no more than twelve chips, which restricts it to lower-capacity memory arrays. HBM is also complex and inflexible: there’s no way to upgrade an HBM-based memory in the field. As a consequence, HBM is adopted only where no other solution will work.
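The bandwidth gap between a wide HBM stack and a conventional DDR channel comes down to simple arithmetic: peak bandwidth is bus width times transfer rate. The sketch below illustrates this with datasheet-class figures for DDR5-4800 and HBM2E; these specific part grades and rates are illustrative assumptions, not numbers taken from the white paper.

```python
# Back-of-the-envelope peak bandwidth: (bus width in bytes) x (transfer rate).
# The speed grades below (DDR5-4800, HBM2E at 3.2 GT/s) are illustrative
# published figures, chosen only to show why 1,024 signal paths matter.

def peak_bandwidth_gbs(bus_width_bits: int, rate_gtps: float) -> float:
    """Peak bandwidth in GB/s for a bus of the given width and transfer rate (GT/s)."""
    return bus_width_bits / 8 * rate_gtps

# One DDR5-4800 channel: 64 data bits at 4.8 GT/s
ddr5_channel = peak_bandwidth_gbs(64, 4.8)       # 38.4 GB/s

# One HBM2E stack: 1,024 data bits at 3.2 GT/s
hbm2e_stack = peak_bandwidth_gbs(1024, 3.2)      # 409.6 GB/s

print(f"DDR5-4800 channel: {ddr5_channel:.1f} GB/s")
print(f"HBM2E stack:       {hbm2e_stack:.1f} GB/s")
```

Even at a lower per-pin rate, the stack’s thousand-plus data lines deliver roughly an order of magnitude more bandwidth than a single 64-bit DDR channel, which is exactly why GPUs accept HBM’s cost and packaging constraints.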
Today’s and tomorrow’s computing systems need a growing amount of both Near and Far Memory. These memories must provide as much bandwidth as possible, and the processor must devote the smallest possible die area to communicating with them. Various approaches have been proposed for far memory (such as CXL) that enable memory disaggregation, pooling, and composability, but near memory has remained the province of parallel DDR and HBM.
The Open Memory Interface (OMI) is supported by the OpenCAPI Consortium and uses existing high-speed serial signaling PHYs, with a custom protocol, to connect standard low-cost DDR DRAMs to the processor. OMI is a latency-optimized subset of OpenCAPI. This approach allows large arrays of inexpensive DRAM to be connected to a processor at high speed without burdening the processor with a lot of additional I/O pins. As shown in the figure above, OMI provides near-HBM bandwidth at larger capacities than DDR supports.
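The pin-efficiency argument can be sketched with the same kind of arithmetic: a serial lane carries far more bandwidth per pin than a parallel DDR data line. The lane count and signaling rate below are assumptions for illustration; OMI-class links are commonly described in the 25-32 GT/s-per-lane range.

```python
# Serial-link bandwidth sketch: each lane carries its line rate in gigabits/s.
# An 8-lane channel at 25.6 GT/s is an assumed, OMI-class configuration
# used only to illustrate bandwidth per pin; it is not a spec quotation.

def serial_link_gbs(lanes: int, lane_rate_gtps: float) -> float:
    """Peak bandwidth per direction in GB/s for a multi-lane serial link."""
    return lanes * lane_rate_gtps / 8

omi_like_channel = serial_link_gbs(8, 25.6)   # 25.6 GB/s from just 8 lanes
print(f"8-lane channel at 25.6 GT/s: {omi_like_channel:.1f} GB/s per direction")
```

Under these assumed numbers, eight serial lanes deliver bandwidth comparable to a 64-bit DDR5 channel while consuming a small fraction of the processor’s I/O pins and beachfront, which is the core of OMI’s appeal.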
The characteristics of these three near memory solutions are summarized in the table below. OMI is attractive for big-data near memory applications that require significant memory density, low latency, and high performance per unit of chip area.