Possible Approaches

Phase vocoding is often considered one of the best means of audio extraction. It exploits the centered placement of the lead vocals in most stereo recordings: a source panned to the center is identical in the left and right channels, so it can be isolated by comparing the two. Unfortunately, it doesn’t extend well beyond such specific cases.
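One common way to exploit center panning is sketched below. It compares the left and right channels bin by bin in a short-time Fourier transform (the analysis/resynthesis framework a phase vocoder works in) and keeps only the bins where the channels agree in both magnitude and phase. The function names, the 1024-point Hann window with 50% overlap, and the `tol` threshold are illustrative assumptions, not a reference implementation. Real recordings add reverb and stereo effects that this sketch ignores.

```python
import numpy as np

def stft(x, n_fft=1024):
    """Hann-windowed STFT with 50% overlap; rows are frames, columns are rfft bins."""
    hop = n_fft // 2
    # Periodic Hann window: copies shifted by n_fft/2 sum to exactly 1,
    # so plain overlap-add in istft() reconstructs the signal.
    win = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(n_fft) / n_fft)
    # Zero-pad so every original sample is covered by two full frames.
    padded = np.pad(x, (n_fft, n_fft + (-len(x)) % hop))
    n_frames = (len(padded) - n_fft) // hop + 1
    frames = np.stack([padded[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(X, length, n_fft=1024):
    """Overlap-add inverse of stft(); trims the padding and returns `length` samples."""
    hop = n_fft // 2
    frames = np.fft.irfft(X, n=n_fft, axis=1)
    out = np.zeros((X.shape[0] - 1) * hop + n_fft)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + n_fft] += frame
    return out[n_fft:n_fft + length]

def extract_center(left, right, n_fft=1024, tol=0.1):
    """Keep time-frequency bins where both channels agree (center-panned content)."""
    L, R = stft(left, n_fft), stft(right, n_fft)
    # 0 when L == R; reaches 1 when only one channel has energy or they are in antiphase
    mismatch = np.abs(L - R) / (np.abs(L) + np.abs(R) + 1e-12)
    center_mask = mismatch < tol
    return istft(center_mask * (L + R) / 2, len(left), n_fft)

# Toy stereo mix: a "vocal" tone in both channels, a "guitar" tone hard-panned left.
sr = 16000
t = np.arange(sr) / sr
vocal = np.sin(2 * np.pi * 440 * t)
guitar = np.sin(2 * np.pi * 1000 * t)
center = extract_center(vocal + guitar, vocal)
```

Inverting the mask (keeping only the mismatched bins) gives the familiar karaoke effect of removing the lead instead. Either way, anything else mixed to the center, such as bass or kick drum, comes along with the vocal, which is why the approach doesn’t generalize.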

Training on the characteristics of the target sound and then extracting it doesn’t work as well. The target’s frequencies overlap with those of the other instruments, and the instrument as it sounds in the recording may have characteristics quite different from the material it was trained on.
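To put a number on the overlap problem, the sketch below lists which harmonics of a target note land within a tolerance of another note’s harmonics. It assumes ideal harmonic tones and a 16 Hz tolerance (roughly one bin of a 1024-point FFT at 16 kHz); both are illustrative choices, and `overlapping_harmonics` is a hypothetical helper.

```python
import numpy as np

def overlapping_harmonics(f0_target, f0_other, fmax=4000.0, tol_hz=16.0):
    """Harmonics of the target note lying within tol_hz of a harmonic of another note."""
    target = f0_target * np.arange(1, int(fmax // f0_target) + 1)
    other = f0_other * np.arange(1, int(fmax // f0_other) + 1)
    # distance from each target harmonic to the nearest harmonic of the other note
    nearest = np.min(np.abs(target[:, None] - other[None, :]), axis=1)
    return target[nearest <= tol_hz]

# A3 (220 Hz) as the target over an A2 (110 Hz) accompaniment
print(len(overlapping_harmonics(220.0, 110.0)))  # 18 (all 18 harmonics below 4 kHz)
# A2 as the target: only its even harmonics collide with A3's
print(len(overlapping_harmonics(110.0, 220.0)))  # 18 (of 36)
```

When another instrument plays an octave below the target, every one of the target’s harmonics sits on top of one of its own, so a template trained on the target has little to distinguish it in those bins.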

Full source separation, followed by labeling the separated parts and reconstructing the target line from them, is a fine solution, provided you can perform source separation effectively. In general, that is exceedingly hard or expensive.
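Once the parts are separated and labeled, one common way to rebuild the target (an assumption here, not something the approach prescribes) is a ratio mask: each time-frequency bin of the mixture is scaled by the fraction of the estimated magnitude that belongs to target-labeled parts. `reconstruct_target` and its arguments are hypothetical, and the separation and labeling steps are assumed to be done already.

```python
import numpy as np

def reconstruct_target(mix_stft, part_mags, labels, target="lead", eps=1e-12):
    """
    Rebuild the target line from separated, labeled parts via a soft (ratio) mask.

    mix_stft  : complex STFT of the mixture
    part_mags : magnitude estimates of the separated parts, same shape as mix_stft
    labels    : one label per part, e.g. from manual or automatic labeling
    """
    mags = np.stack(part_mags)                        # (parts, *mix_stft.shape)
    is_target = np.array([lab == target for lab in labels])
    # share of the estimated magnitude in each bin that belongs to target-labeled parts
    mask = mags[is_target].sum(axis=0) / (mags.sum(axis=0) + eps)
    # applying the mask to the mixture reuses the mixture's phase
    return mask * mix_stft

# Toy example: "lead" and "drums" occupy disjoint bins of a 4-frame, 8-bin STFT.
rng = np.random.default_rng(0)
lead = np.zeros((4, 8), complex)
lead[:, :4] = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
drums = np.zeros((4, 8), complex)
drums[:, 4:] = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
est = reconstruct_target(lead + drums, [np.abs(lead), np.abs(drums)], ["lead", "drums"])
```

Because the masks for all labels sum to one in every bin, the reconstructed lines add back up to the mixture. The catch is that the mask can only be as good as the separation behind it.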

Alternatively, mimic the target line to be extracted and use PLCA (probabilistic latent component analysis) as described here.
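A minimal sketch of the mimic-then-PLCA idea, assuming magnitude spectrograms shaped (frequency bins, frames): PLCA components P(f|z) learned from the mimic recording stand in for the target, and the mixture is then decomposed with those components held fixed plus a few free ones for everything else. The target’s share of the model in each bin becomes a soft mask. Holding the learned components fixed is a simplification chosen here; the method described at the link may instead use them as priors or initial values. `plca`, `mimic_mask` and the component counts are hypothetical.

```python
import numpy as np

def plca(V, n_comp, n_iter=200, W_fixed=None, seed=0, eps=1e-12):
    """
    Asymmetric PLCA by EM: V[f, t] is modeled with P(f|z) (columns of W)
    and P(z|t) (columns of H). Columns of W_fixed, if given, come first and stay fixed.
    """
    rng = np.random.default_rng(seed)
    n_fixed = 0 if W_fixed is None else W_fixed.shape[1]
    W = rng.random((V.shape[0], n_fixed + n_comp)) + eps
    if W_fixed is not None:
        W[:, :n_fixed] = W_fixed
    W /= W.sum(axis=0, keepdims=True)
    H = rng.random((n_fixed + n_comp, V.shape[1])) + eps
    H /= H.sum(axis=0, keepdims=True)
    for _ in range(n_iter):
        # V / (W @ H), combined with W and H, gives the posterior-weighted counts
        # sum_t V * P(z|f,t) and sum_f V * P(z|f,t) without forming P(z|f,t) explicitly
        ratio = V / (W @ H + eps)
        W_new = W * (ratio @ H.T)
        H = H * (W.T @ ratio)
        H /= H.sum(axis=0, keepdims=True) + eps
        W_new /= W_new.sum(axis=0, keepdims=True) + eps
        W[:, n_fixed:] = W_new[:, n_fixed:]  # only the free components are updated
    return W, H

def mimic_mask(mix_mag, mimic_mag, n_target=8, n_background=8, n_iter=200):
    """Soft mask for the target line, guided by a recording that mimics it."""
    W_target, _ = plca(mimic_mag, n_target, n_iter)
    W, H = plca(mix_mag, n_background, n_iter, W_fixed=W_target)
    target = W[:, :n_target] @ H[:n_target]
    return target / (W @ H + 1e-12)

# Toy check: the target occupies the low bins, the background the high bins.
a = np.r_[np.linspace(1, 2, 8), np.zeros(8)]
b = np.r_[np.zeros(8), np.linspace(2, 1, 8)]
mix = np.outer(a, np.r_[np.ones(10), np.zeros(10)]) + np.outer(b, np.r_[np.zeros(10), np.ones(10)])
mimic = np.outer(a, np.linspace(0.5, 1.5, 12))  # the "mimic" shares the target's spectrum
mask = mimic_mask(mix, mimic, n_target=1, n_background=1)
```

Multiplying the mask with the mixture’s complex STFT (in the same bins-by-frames orientation) and inverting it gives the extracted line. The toy spectra are disjoint, so the mask comes out nearly binary; in real mixtures the spectra overlap and the mask is genuinely soft.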