When I operated a studio, this was a constant problem with novice clients who'd never heard their own voices amplified in high quality.

I'd just have them put headphones on and sing or talk for 15-20 minutes as we rehearsed and interacted. About 10 minutes in, I'd have them uncover one ear, because they sang better in tune while monitoring both the mic and the room. They adjusted very quickly.
It's great that this method worked for you!

Over time I've learned a different way, initially from working with kids. I often don't even bother with headphones, and just set up a single speaker behind the mic and have the artist(s) work with that. There will be some bleed, but in a relatively dead room it's never a deal-breaker. And I've found that the little bit of bleed from the speaker can actually help the vocal blend with the track.

