
What is ROCm, the AMD solution?

ROCm is AMD's framework for running deep learning, a subset of AI, on GPU cards such as:

  • AMD RADEON 7970 / 280X
  • AMD RADEON Vega
  • the infamous AMD RADEON VII, powered with HBM2 memory.
  • The rasterization beasts, the AMD RADEON RX 5700/6800 series, with the latter cutting the price of a 16 GB card.
  • The Pro cards, which are less well known, such as the MI100. AMD is good at cost reduction, so there are plenty of cards to choose from, the latest having more than 16 GB of memory with great efficiency.

Is it that bad?

After reading some posts on Reddit, some may consider it bad. This is a common issue on blogging platforms: when marketing is weak, people tend to spread incorrect information, such as the claim that this AMD framework is broken, or comments that are little more than rants. This is far from accurate. Still, there are reasons why CUDA succeeds where ROCm misses:

  • Most of all, a lot of folks hate doing system administration work. Even rather simple things, like setting up a dual boot on a computer, are over most people's heads. Keeping this in mind puts the bad comments in perspective. CUDA is somewhat simpler to install. This is not a big deal, but it is a deal breaker for some people.
  • Most folks use Windows, and so far I don't think there is a way to install ROCm on Windows. This is a deal breaker for some people.
  • The API is not stable enough yet. It is evolving, but every time a new version comes out, existing code is likely to break.
  • The bugs are specific to ROCm.
  • The community is so small that it is hard to find help.
  • The hardware will not beat Nvidia hardware in the near future. This is annoying because some will see limited potential in investing time in developing for this framework. More importantly, the consumer-grade hardware is not the top dog here.
  • Obviously, deep learning is not that popular yet, but it will gain popularity over time.

These are the pitfalls of ROCm for sure, but there are also some promising things:

  • AMD has market share in the automotive industry, including some versions of Tesla cars. This is a rock-solid position and will only improve.
  • AMD has very competent engineers.
  • AMD has a long history of open-source work, which means that if a workflow is working, it will work for a long time. This is not guaranteed for some workflows with Nvidia.
  • AMD has acquired the IP of Xilinx, so we may see ROCm on IoT FPGAs in the future, not only on GPUs with fixed efficiency.
  • AMD is not spending billions on marketing. It is focusing on product growth and not the hype.
  • AMD is not (yet) selling 100k USD cards with proprietary drivers.
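
Since installation is the main friction point listed above, here is a small sanity check one could run after setting things up. It is only a sketch, and it assumes PyTorch is the deep learning library in use (the post does not name one). The function names classify_backend and detect_backend are made up for this example. It relies on two things PyTorch exposes: torch.version.hip, which is a version string on ROCm builds and None otherwise, and torch.cuda.is_available(), which ROCm builds also use to report a visible GPU.

```python
import importlib


def classify_backend(hip_version, gpu_available):
    """Turn what PyTorch reports into a short label (hypothetical helper)."""
    if not gpu_available:
        return "cpu-only"
    # A non-empty HIP version means this PyTorch build targets ROCm.
    return "rocm" if hip_version else "cuda"


def detect_backend():
    """Return 'no-pytorch', 'cpu-only', 'rocm' or 'cuda' for this machine."""
    try:
        torch = importlib.import_module("torch")
    except ImportError:
        # Assumption of this sketch: PyTorch is the library being checked.
        return "no-pytorch"
    hip_version = getattr(torch.version, "hip", None)
    return classify_backend(hip_version, torch.cuda.is_available())


if __name__ == "__main__":
    print(detect_backend())
```

If this prints "cpu-only" on a machine with a supported AMD card, the problem is most likely on the system administration side (driver, permissions, or a PyTorch build that does not match the installed stack) rather than in the framework itself.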

Ideal AMD system

The ideal system would run on a split, hypervisor-based setup, sharing the CPU and GPU between processes. This is not possible with Nvidia, which adds artificial limitations ("licensing" issues with the driver inside a VM).

Reference

Reddit community