this is expected behavior... [...] it's unavoidable I'd say.
Actually no, this is a real problem, and it's also an easy bug to fix. How do I get this question in front of a dev? Is there a bug report system?
Most people who edit audio files are familiar with this artifact. The problem is that the first sound is not terminated correctly (at a zero crossing) before the next one plays, so the output level jumps abruptly at the cut. When editing by hand, of course, we avoid it by not interrupting a waveform mid-cycle.
But you may be wondering, with Razmo, how a computer can automatically avoid it. It's not hard, and it's nothing new. In video game audio, for example, you have lots of sound files chopped up and combined on the fly (speech and SFX files interrupting each other, etc.). Novice video-game sound programmers often get this very same clipping, and then they learn about it and avoid it.
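For anyone curious, here is a rough Python sketch of the two usual fixes: hold off the cut until the next zero crossing, or cut right away but ramp the tail down to zero. This is only an illustration of the general idea, not how this synth's firmware works; the function names, the 64-sample fade length, and the 440 Hz test tone are all my own assumptions.

```python
import math

def next_zero_crossing(samples, start):
    """Return the first index >= start where the signal touches or crosses zero."""
    for i in range(max(start, 1), len(samples)):
        if samples[i] == 0.0 or (samples[i - 1] < 0.0) != (samples[i] < 0.0):
            return i
    return len(samples)  # no crossing found: let the sound play out

def cut_at_zero_crossing(samples, requested_cut):
    """Stop the sound at the first zero crossing at or after the requested cut."""
    return samples[:next_zero_crossing(samples, requested_cut)]

def cut_with_fade(samples, requested_cut, fade_len=64):
    """Alternative: keep playing for fade_len samples while ramping down to zero."""
    end = min(requested_cut + fade_len, len(samples))
    out = list(samples[:end])
    n = end - requested_cut
    for k in range(n):
        out[requested_cut + k] *= 1.0 - (k + 1) / n  # linear ramp, ends at 0
    return out

# Demo (assumed values): a 440 Hz tone at 44.1 kHz, interrupted near its peak.
tone = [math.sin(2 * math.pi * 440 * i / 44100) for i in range(1000)]
hard = tone[:25]                        # naive cut: ends near full level, so it clicks
clean = cut_at_zero_crossing(tone, 25)  # ends where the wave is close to zero
faded = cut_with_fade(tone, 25)         # ends at exactly zero after a short ramp
```

The zero-crossing version delays the cut by up to half a cycle, which matters more for low bass notes; the fade version has a fixed, short delay, so which one suits better depends on the material.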

Anyhow, this is not a behavior I myself would expect. I hope it is fixed in an update.
edit: Please note that it also happens in monophonic mode. This makes preset F1P2 Prophet-5 Bass (a major headliner on this machine!) sound so bad that it's basically unusable for pro-level recording...