Inlining wouldn’t help with op counts unless the specializer can eliminate the branch-in/branch-out that replaces the call/return. It can usually do that, but the gain should be low except for *very* short functions.
You are right that the inlined function body can often be folded into the width, improving bundle counts. How much improvement you get depends on whether the calculations inside the function are in the critical dataflow path of the caller. But if caller and callee bundles are already pretty full for the target member, then folding may not have much impact on bundle count, or on latency for that matter.
Another consideration with inlining is whether you can apply post-inline optimizations. Often arguments to the call are constants that control callee control flow, and once the body is inlined you can toss a lot of that code as unreachable.
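
To make the post-inline point concrete, here is a minimal sketch in C. The names (`scale`, `caller`, `caller_after_folding`) and the `saturate` flag are made up for illustration and don't come from any real codebase. It shows by hand roughly what a compiler could do once a constant argument is visible inside the inlined body; it is not a claim about what any particular specializer actually emits.

```c
/* Hypothetical callee: the `saturate` flag selects the control path. */
static inline int scale(int x, int factor, int saturate)
{
    long long wide = (long long)x * factor;
    if (saturate) {
        /* Clamp to a signed 8-bit range (illustrative choice). */
        if (wide > 127)  return 127;
        if (wide < -128) return -128;
    }
    return (int)wide;
}

/* The caller passes constants for both `factor` and `saturate`. */
int caller(int x)
{
    return scale(x, 2, 0);
}

/*
 * Hand-written equivalent of the caller after inlining and constant
 * folding: `saturate` is known to be 0, so the whole clamp branch is
 * dead and only the multiply remains.
 */
int caller_after_folding(int x)
{
    return (int)((long long)x * 2);
}
```

Without inlining, the callee has to keep both paths because it can't know what `saturate` will be. With it, the compare-and-clamp code goes away entirely, which can matter more than the saved call/return.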