GitHub shipped an updated version of its good first issues feature, which combines a machine learning (ML) model that identifies easy issues with a hand-curated list of issues that project maintainers have labeled “easy”.
New and seasoned open source contributors can use this feature to find and tackle easy issues in a project.
To avoid the challenging and tedious task of hand-labeling a training set for a supervised ML model, GitHub opted for a weakly supervised approach. The process starts by automatically inferring labels for hundreds of thousands of candidate samples drawn from existing issues across GitHub repositories. Multiple criteria are used to select potentially positive (easy) training samples. These criteria include labels that match a curated list of roughly 300 labels, issues that were closed by a pull request submitted by a new contributor, and issues that were closed by a pull request with a tiny diff in a single file.
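To make the labeling step concrete, here is a minimal Python sketch that treats each of the three criteria as a labeling function over candidate issues. It is an illustration under stated assumptions, not GitHub's implementation. The `Issue` and `PullRequest` fields, the five-entry `EASY_LABELS` set (standing in for the roughly 300-label curated list), the 10-line `MAX_TINY_DIFF_LINES` cutoff, and the rule that any single criterion is enough to mark an issue as positive are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical subset standing in for the curated list of roughly 300 labels.
EASY_LABELS = {"good first issue", "easy", "beginner", "starter", "low hanging fruit"}

# Hypothetical cutoff for what counts as a "tiny" diff.
MAX_TINY_DIFF_LINES = 10


@dataclass
class PullRequest:
    author_is_new_contributor: bool
    files_changed: int
    lines_changed: int


@dataclass
class Issue:
    title: str
    labels: set[str] = field(default_factory=set)
    closed_by: Optional[PullRequest] = None  # pull request that closed the issue, if any


def has_easy_label(issue: Issue) -> bool:
    """Criterion 1: some label matches the curated list (case-insensitive)."""
    return any(label.strip().lower() in EASY_LABELS for label in issue.labels)


def closed_by_new_contributor(issue: Issue) -> bool:
    """Criterion 2: the closing pull request was submitted by a new contributor."""
    return issue.closed_by is not None and issue.closed_by.author_is_new_contributor


def closed_by_tiny_single_file_diff(issue: Issue) -> bool:
    """Criterion 3: the closing pull request changed one file with a tiny diff."""
    pr = issue.closed_by
    return pr is not None and pr.files_changed == 1 and pr.lines_changed <= MAX_TINY_DIFF_LINES


CRITERIA = (has_easy_label, closed_by_new_contributor, closed_by_tiny_single_file_diff)


def infer_weak_label(issue: Issue) -> bool:
    """Assumption: an issue is a candidate positive ("easy") sample if any criterion fires."""
    return any(criterion(issue) for criterion in CRITERIA)


def build_weak_training_set(issues: list[Issue]) -> list[tuple[Issue, bool]]:
    """Pair every candidate issue with its automatically inferred (noisy) label."""
    return [(issue, infer_weak_label(issue)) for issue in issues]


if __name__ == "__main__":
    candidates = [
        Issue("Fix typo in README", labels={"Good First Issue"}),
        Issue("Refactor scheduler", closed_by=PullRequest(False, 12, 480)),
        Issue("Update docs link", closed_by=PullRequest(False, 1, 2)),
    ]
    for issue, label in build_weak_training_set(candidates):
        print(issue.title, label)
```

Keeping each criterion as a separate function means a heuristic can be added or dropped by editing `CRITERIA` alone. The labels these rules produce are noisy by design, which is the trade-off weak supervision accepts in exchange for not labeling issues by hand.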