Quality Scoring¶
The quality score is a deterministic 0-100 metric for every job posting. It runs entirely in Go — no LLM calls, no external APIs. The score is computed in the engine's post-processing pipeline and stored as JobPost.QualityScore.
Formula¶
| Points | Factor | Description |
|---|---|---|
| 20 | Salary | Non-zero minimum or maximum amount listed |
| 15 | Direct apply | Apply method is easy_apply, email, direct_url, or external_url |
| 15 | Email matches domain | At least one email's domain matches the company domain or is a subdomain |
| 15 | Freshness | Scaled: <= 24h = 15, <= 72h = 10, <= 7d = 5, older = 0 |
| 10 | Verified email | At least one MX-verified email address |
| 10 | Description length | Scaled: > 2000 chars = 10, > 500 = 7, > 200 = 5, otherwise 0 |
| 10 | Not an agency | Domain and company name checked against known staffing/recruiting firms |
| 5 | Multiple emails | Two or more distinct email addresses |
| 100 | Total | |
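To make the arithmetic in the table concrete, here is a minimal sketch of how the weights could be summed. It is not the engine's code: the `factors` struct and the `totalScore` function are hypothetical names introduced only for this illustration, and the real pipeline works on `model.JobPost` directly.

```go
package main

import "fmt"

// factors is a hypothetical summary of one job's checks; the engine itself
// derives these values from model.JobPost.
type factors struct {
	HasSalary      bool
	DirectApply    bool
	EmailMatches   bool
	FreshnessPts   int // 0, 5, 10, or 15
	VerifiedEmail  bool
	DescriptionPts int // 0, 5, 7, or 10
	NotAgency      bool
	MultipleEmails bool
}

// totalScore adds up the weight of every factor from the table.
func totalScore(f factors) int {
	score := f.FreshnessPts + f.DescriptionPts
	for _, w := range []struct {
		ok     bool
		points int
	}{
		{f.HasSalary, 20},
		{f.DirectApply, 15},
		{f.EmailMatches, 15},
		{f.VerifiedEmail, 10},
		{f.NotAgency, 10},
		{f.MultipleEmails, 5},
	} {
		if w.ok {
			score += w.points
		}
	}
	return score
}

func main() {
	// Salary, direct apply, posted two days ago, mid-length description, not an agency.
	job := factors{HasSalary: true, DirectApply: true, FreshnessPts: 10, DescriptionPts: 7, NotAgency: true}
	fmt.Println(totalScore(job)) // 20 + 15 + 10 + 7 + 10 = 62
}
```

With every factor satisfied at its maximum the sum is exactly 100, which is why the weights alone cannot push a score past the upper bound.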
Component breakdown¶
Salary (20 pts)¶
func hasSalary(job *model.JobPost) bool {
	if job.Compensation == nil {
		return false
	}
	if job.Compensation.MinAmount != nil && *job.Compensation.MinAmount > 0 {
		return true
	}
	return job.Compensation.MaxAmount != nil && *job.Compensation.MaxAmount > 0
}
Requires a non-zero amount in either min_amount or max_amount. Zero or nil compensation → no points. The interval (yearly/hourly/daily) does not affect scoring.
Direct apply (15 pts)¶
Recognized apply methods:
- easy_apply — LinkedIn/Indeed one-click apply
- email — apply by sending an email
- direct_url — external application URL
- external_url — alternate external URL
Email-domain match (15 pts)¶
Compares the domain portion of each email against JobPost.Domain. Also accepts subdomains (e.g. engineering@sub.acme.com matches acme.com). Requires at least one email address AND a non-empty domain.
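The matching rule can be sketched as follows. This is an illustration under stated assumptions, not the engine's implementation: `emailMatchesDomain` and `anyEmailMatches` are hypothetical helpers, and the case-insensitive comparison is an assumption that the text above does not confirm.

```go
package main

import (
	"fmt"
	"strings"
)

// emailMatchesDomain reports whether the email's domain equals domain or is
// a subdomain of it. Hypothetical helper; lowercasing is an assumption.
func emailMatchesDomain(email, domain string) bool {
	domain = strings.ToLower(strings.TrimSpace(domain))
	at := strings.LastIndex(email, "@")
	if domain == "" || at < 0 || at == len(email)-1 {
		return false
	}
	emailDomain := strings.ToLower(email[at+1:])
	// The leading dot keeps "notacme.com" from matching "acme.com".
	return emailDomain == domain || strings.HasSuffix(emailDomain, "."+domain)
}

// anyEmailMatches reports whether at least one email matches the domain.
func anyEmailMatches(emails []string, domain string) bool {
	for _, e := range emails {
		if emailMatchesDomain(e, domain) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(emailMatchesDomain("engineering@sub.acme.com", "acme.com")) // true
	fmt.Println(emailMatchesDomain("jobs@notacme.com", "acme.com"))         // false
	fmt.Println(anyEmailMatches(nil, "acme.com"))                           // false
}
```

Comparing against `"." + domain` rather than the bare domain is what makes the subdomain check safe against look-alike domains that merely end in the same characters.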
Freshness (15 pts)¶
// freshnessScore awards up to 15 points based on the age of the posting.
func freshnessScore(job *model.JobPost) int {
	if job.DatePosted == nil {
		return 0 // no posting date
	}
	diff := time.Since(*job.DatePosted)
	switch {
	case diff < 0:
		return 0 // future date
	case diff <= 24*time.Hour:
		return 15
	case diff <= 72*time.Hour:
		return 10
	case diff <= 7*24*time.Hour:
		return 5
	default:
		return 0
	}
}
Future dates (clock skew, timezone mismatch) and nil dates score 0.
Verified email (10 pts)¶
Grants points when at least one email in the job has Verified: true, meaning it passed DNS MX record lookup. Emails that fail MX verification, or jobs with no emails at all, score 0.
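For readers unfamiliar with MX verification, the sketch below shows one way such a check could look using Go's standard `net.LookupMX`. It is only an illustration: `mxLookup`, `hasMX`, and `staticLookup` are hypothetical names, and how the engine actually performs and caches the lookup is not described here.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"strings"
)

// mxLookup has the same signature as net.LookupMX, so an offline fake can be
// swapped in for tests and examples.
type mxLookup func(domain string) ([]*net.MX, error)

// hasMX reports whether the email's domain publishes at least one MX record.
func hasMX(email string, lookup mxLookup) bool {
	at := strings.LastIndex(email, "@")
	if at < 0 || at == len(email)-1 {
		return false
	}
	records, err := lookup(email[at+1:])
	return err == nil && len(records) > 0
}

// staticLookup returns an offline mxLookup that knows only the given domains.
func staticLookup(domains ...string) mxLookup {
	known := make(map[string]bool, len(domains))
	for _, d := range domains {
		known[d] = true
	}
	return func(domain string) ([]*net.MX, error) {
		if known[domain] {
			return []*net.MX{{Host: "mail." + domain + ".", Pref: 10}}, nil
		}
		return nil, errors.New("no MX records for " + domain)
	}
}

func main() {
	lookup := staticLookup("acme.com") // pass net.LookupMX here for real DNS
	fmt.Println(hasMX("jobs@acme.com", lookup))        // true
	fmt.Println(hasMX("jobs@example.invalid", lookup)) // false
	fmt.Println(hasMX("not-an-email", lookup))         // false
}
```

Injecting the lookup function keeps the example deterministic; a live check would depend on the network and on the resolver in use.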
Description length (10 pts)¶
- More than 2000 characters → 10 points
- More than 500 characters → 7 points
- More than 200 characters → 5 points
- 200 characters or fewer → 0 points
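The thresholds translate directly into a switch. The sketch below is illustrative: `descriptionScore` is a hypothetical name, and counting Unicode code points (rather than bytes) is an assumption about what "characters" means here.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// descriptionScore maps description length to 0, 5, 7, or 10 points.
// Counting runes instead of bytes is an assumption of this sketch.
func descriptionScore(description string) int {
	n := utf8.RuneCountInString(description)
	switch {
	case n > 2000:
		return 10
	case n > 500:
		return 7
	case n > 200:
		return 5
	default:
		return 0
	}
}

func main() {
	// Boundary values: each threshold is exclusive.
	for _, n := range []int{200, 201, 501, 2001} {
		fmt.Println(n, descriptionScore(strings.Repeat("x", n)))
	}
}
```

Because every comparison is strict, a description of exactly 200, 500, or 2000 characters falls into the lower tier.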
Not an agency (10 pts)¶
scrappy maintains a list of known staffing/recruiting agencies:
aerotek.com, adecco.com, hays.com, kellyservices.com,
kforce.com, manpower.com, randstad.com, roberthalf.com,
robertwalters.com, teksystems.com, ...
In addition, the company name is checked for tokens: staffing, recruiting, recruitment, agency, talent, workforce, placement.
Jobs from these firms do not earn the 10 points; nothing is subtracted from the rest of the score. The check is case-insensitive and token-aware (whole-word match, not substring).
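A whole-word check of this kind might look like the sketch below. The names `agencyDomains`, `agencyTokens`, and `isAgency` are hypothetical, the domain map contains only the entries shown above (the real list is longer), and splitting the company name on non-alphanumeric characters is an assumption about how tokens are formed.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// agencyDomains holds only the domains quoted in the text; the full list is elided there.
var agencyDomains = map[string]bool{
	"aerotek.com": true, "adecco.com": true, "hays.com": true,
	"kellyservices.com": true, "kforce.com": true, "manpower.com": true,
	"randstad.com": true, "roberthalf.com": true, "robertwalters.com": true,
	"teksystems.com": true,
}

// agencyTokens are the company-name tokens listed in the text.
var agencyTokens = map[string]bool{
	"staffing": true, "recruiting": true, "recruitment": true, "agency": true,
	"talent": true, "workforce": true, "placement": true,
}

// isAgency reports whether the domain is a known agency or the company name
// contains an agency token as a whole word.
func isAgency(domain, company string) bool {
	if agencyDomains[strings.ToLower(domain)] {
		return true
	}
	words := strings.FieldsFunc(strings.ToLower(company), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	for _, w := range words {
		if agencyTokens[w] {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(isAgency("hays.com", "Hays"))           // true: known domain
	fmt.Println(isAgency("", "Acme STAFFING, Inc."))    // true: whole-word token
	fmt.Println(isAgency("", "Talentica Software"))     // false: "talentica" is not "talent"
	fmt.Println(isAgency("acme.com", "Acme Robotics"))  // false
}
```

Tokenizing before the lookup is what separates a whole-word match from a substring match: "Talentica" contains "talent" as a substring but never produces it as a token.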
Multiple emails (5 pts)¶
Bonus points for jobs with two or more distinct email addresses. Deduplicated by address string (case-insensitive).
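The deduplication step can be sketched with a set keyed on the lowercased address. `hasMultipleEmails` is a hypothetical helper; trimming whitespace and skipping empty strings are assumptions added for robustness, not behavior stated above.

```go
package main

import (
	"fmt"
	"strings"
)

// hasMultipleEmails reports whether the list holds at least two distinct
// addresses, compared case-insensitively.
func hasMultipleEmails(emails []string) bool {
	seen := make(map[string]struct{})
	for _, e := range emails {
		key := strings.ToLower(strings.TrimSpace(e))
		if key == "" {
			continue // assumption: blank entries do not count
		}
		seen[key] = struct{}{}
	}
	return len(seen) >= 2
}

func main() {
	fmt.Println(hasMultipleEmails([]string{"hr@acme.com", "HR@Acme.com"}))   // false: same address
	fmt.Println(hasMultipleEmails([]string{"hr@acme.com", "jobs@acme.com"})) // true
}
```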
Usage¶
Filter results by minimum score:
scrappy --sites remoteok --search "golang" --min-score 50
Only jobs scoring 50+ pass through. The score is also written to every export format (CSV, JSONL, XLSX, Parquet) under the quality_score column, so post-processing tools can filter independently.
The score always falls within 0-100. If the sum exceeds 100 (not possible with the current weights), it is clamped. Negative scores are clamped to 0.
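The clamping rule and the minimum-score filter can be pictured together as follows. This is a sketch only: `scoredJob`, `clampScore`, and `filterByMinScore` are hypothetical names, with `scoredJob` standing in for `model.JobPost` and its `QualityScore` field.

```go
package main

import "fmt"

// scoredJob is a hypothetical stand-in for model.JobPost.
type scoredJob struct {
	Title        string
	QualityScore int
}

// clampScore forces a raw sum into the 0-100 range.
func clampScore(raw int) int {
	if raw < 0 {
		return 0
	}
	if raw > 100 {
		return 100
	}
	return raw
}

// filterByMinScore keeps jobs whose score is at least minScore.
func filterByMinScore(jobs []scoredJob, minScore int) []scoredJob {
	var kept []scoredJob
	for _, j := range jobs {
		if j.QualityScore >= minScore {
			kept = append(kept, j)
		}
	}
	return kept
}

func main() {
	jobs := []scoredJob{
		{"Backend Engineer", clampScore(62)},
		{"Go Developer", clampScore(45)},
		{"Platform Engineer", clampScore(105)}, // not reachable with the current weights
	}
	for _, j := range filterByMinScore(jobs, 50) {
		fmt.Println(j.Title, j.QualityScore)
	}
	// Prints "Backend Engineer 62" and "Platform Engineer 100".
}
```

The threshold is inclusive, so a job scoring exactly 50 passes a minimum of 50.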