How to set up automatic Artifactory repository cleaning
Your repositories are cluttered, you’re storing irrelevant builds, and your disks are full. Luckily, you’ve come to the right place for advice. Here’s how to set up automatic cleanup.
There are three things you want to clean up on your Artifactory server:
- Artifacts
- Builds
- Binaries
I’ll go through each of them and show you how to clean them up. Note that I won’t be using the JFrog CLI. If you’re stuck on Windows, or not a big fan of curl, consider replacing the REST calls below with the JFrog CLI.
I’ve also prepared a GitHub repository with a number of scripts to help you automate the entire cleanup. You’ll find all you need to set it up in the README, so go check it out at praqma/artifactory-retention.
While a clone & own of the repository should be enough to get you started, it’s usually a good idea to know what it’s doing behind the scenes, so stick around for the rest. Let’s get started!
Stale artifacts
Artifacts are references to binaries plus some metadata. Removing stale artifacts is a great place to start your cleanup as you’ll open up more stale builds and binaries to be cleaned up down the road.
Querying for artifacts using AQL
Query for artifacts through the Artifactory Query Language (AQL). Deciding which artifacts count as stale is up to you. In my example I’m querying a single repository for artifacts that were last updated more than 7 days ago and have never been downloaded, or that have been downloaded before but not within the last 30 days:
items.find({
    "repo": { "$eq": "praqma-libraries-local" },
    "$or": [
        {
            "$and": [
                { "stat.downloads": { "$eq": null } },
                { "updated": { "$before": "7d" } }
            ]
        },
        {
            "$and": [
                { "stat.downloads": { "$gt": 0 } },
                { "stat.downloaded": { "$before": "30d" } }
            ]
        }
    ]
}).include("repo", "name", "path", "updated", "sha256", "stat.downloads", "stat.downloaded")
Next, we’ll call the Artifactory API with our AQL query as a payload:
curl -H content-type:text/plain --data-binary @payload.json https://artifactory.praqma.net/api/search/aql -o result.json
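The repository name and the two age thresholds are the parts most likely to differ between repositories, so it can help to generate the payload instead of editing it by hand. The following is a minimal shell sketch of that idea, not the way the praqma/artifactory-retention scripts do it. The function names `build_artifact_query` and `run_artifact_query` and the variables `ARTIFACTORY_URL`, `ARTIFACTORY_USER` and `ARTIFACTORY_TOKEN` are assumptions made for illustration, as is the use of basic authentication through curl's `-u` option.

```shell
# Hypothetical helper: prints the AQL query from this article with the
# repository and the two age thresholds filled in from the arguments.
#   $1 = repository, $2 = age for never-downloaded artifacts (e.g. 7d),
#   $3 = age since the last download (e.g. 30d)
build_artifact_query() {
  cat <<EOF
items.find({
  "repo": { "\$eq": "$1" },
  "\$or": [
    { "\$and": [
      { "stat.downloads": { "\$eq": null } },
      { "updated": { "\$before": "$2" } }
    ] },
    { "\$and": [
      { "stat.downloads": { "\$gt": 0 } },
      { "stat.downloaded": { "\$before": "$3" } }
    ] }
  ]
}).include("repo", "name", "path", "updated", "sha256", "stat.downloads", "stat.downloaded")
EOF
}

# Hypothetical wrapper: writes the payload and posts it to the AQL endpoint.
# ARTIFACTORY_URL, ARTIFACTORY_USER and ARTIFACTORY_TOKEN are assumed to be
# set by the caller; --fail makes curl exit non-zero on an HTTP error.
run_artifact_query() {
  build_artifact_query "$@" > payload.json
  curl --fail --silent --show-error \
    -u "${ARTIFACTORY_USER}:${ARTIFACTORY_TOKEN}" \
    -H 'content-type: text/plain' \
    --data-binary @payload.json \
    "${ARTIFACTORY_URL}/api/search/aql" -o result.json
}
```

With `ARTIFACTORY_URL=https://artifactory.praqma.net` exported, `run_artifact_query praqma-libraries-local 7d 30d` would send the same query as the hand-written payload.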
The response should be some JSON describing the artifacts:
{
    "results" : [ {
        "repo" : "praqma-libraries-local",
        "path" : "net/praqma/foo/1.0.0-2-g08afc87",
        "name" : "foo-1.0.0-2-g08afc87.pom",
        "updated" : "2018-09-18T15:19:15.057+02:00",
        "sha256" : "1763f2f76dcc1b6423f680dfc72627f80e7ddac542dfe8d94e0909699fcf6862",
        "stats" : [ {
            "downloaded" : "2018-09-18T15:18:39.853+02:00",
            "downloads" : 2
        } ]
    } ],
    "range" : {
        "start_pos" : 0,
        "end_pos" : 1,
        "total" : 1
    }
}
Deleting the artifacts
Now that we have our targets it’s time to clean them up. We’ll be calling the REST API to delete all of these. Below is a simple Groovy script that parses the JSON result and calls a delete on all the matching artifacts.
import groovy.json.JsonSlurper

// Parse the saved AQL response
def input = new File("result.json")
def artifacts = new JsonSlurper().parse(input).results

// Issue one DELETE per artifact and print whatever Artifactory returns
artifacts.each { artifact ->
    println "curl -X DELETE https://artifactory.praqma.net/${artifact.repo}/${artifact.path}/${artifact.name}".execute().text
}
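Deletion is hard to undo, so a dry run that only prints the URLs is a useful safety net before the real thing. Here is a minimal shell sketch of the same loop with such a switch. It assumes `jq` is installed for reading the JSON. The names `artifact_url`, `delete_artifacts`, `DRY_RUN`, `ARTIFACTORY_URL`, `ARTIFACTORY_USER` and `ARTIFACTORY_TOKEN` are hypothetical. It also relies on AQL reporting `.` as the path of items stored in the repository root, which is worth verifying against your own results.

```shell
# Hypothetical helper: builds the URL a DELETE request would be sent to.
#   $1 = base URL, $2 = repo, $3 = path, $4 = file name
# A path of "." (item in the repository root) is dropped from the URL.
artifact_url() (
  base="$1"; repo="$2"; path="$3"; name="$4"
  if [ "$path" = "." ]; then
    printf '%s/%s/%s\n' "$base" "$repo" "$name"
  else
    printf '%s/%s/%s/%s\n' "$base" "$repo" "$path" "$name"
  fi
)

# Hypothetical loop over a saved AQL response ($1). Requires jq.
# DRY_RUN defaults to 1, so nothing is deleted unless DRY_RUN=0 is set.
delete_artifacts() (
  tab="$(printf '\t')"
  jq -r '.results[] | [.repo, .path, .name] | @tsv' "$1" |
  while IFS="$tab" read -r repo path name; do
    url="$(artifact_url "$ARTIFACTORY_URL" "$repo" "$path" "$name")"
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "would delete: $url"
    else
      curl --fail --silent --show-error \
        -u "${ARTIFACTORY_USER}:${ARTIFACTORY_TOKEN}" -X DELETE "$url"
    fi
  done
)
```

Running `delete_artifacts result.json` lists the candidates; `DRY_RUN=0 delete_artifacts result.json` performs the deletes. The sketch does not percent-encode names, so artifacts with spaces or other special characters would need extra handling.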
Stale builds
Cleaning up builds is very similar to cleaning up artifacts. Get a list through AQL and delete them through the REST API.
Querying for builds using AQL
This query is a bit different since we’re looking for “builds” rather than “items”. Again, which builds you want to clean up is up to you. In my example I’m querying for builds that didn’t produce any artifacts or had their artifacts deleted by the earlier artifact cleanup:
builds.find(
{"module.name":{"$nmatch": "*"}}
).include("name", "number")
Next, we’ll call the Artifactory API with our AQL query as a payload:
curl -H content-type:text/plain --data-binary @payload.json https://artifactory.praqma.net/api/search/aql -o result.json
The response should be some JSON describing our builds:
{
    "results" : [ {
        "build.name" : "the-foo-build",
        "build.number" : "1"
    } ],
    "range" : {
        "start_pos" : 0,
        "end_pos" : 1,
        "total" : 1
    }
}
Deleting the builds
Again, I iterate over the builds and call the REST API to delete them using a small Groovy script:
import groovy.json.JsonSlurper

// Parse the saved AQL response
def input = new File("result.json")
def builds = new JsonSlurper().parse(input).results

builds.each { build ->
    // The keys contain a dot, so they have to be quoted
    def name = build."build.name"
    def number = build."build.number"
    // artifacts=0 removes only the build info and leaves artifacts alone
    println "curl -X DELETE https://artifactory.praqma.net/api/build/${name}?buildNumbers=${number}&artifacts=0".execute().text
}
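Build names are free text and may contain spaces or other characters that are not valid in a URL, which would break the request above. A minimal shell sketch of one way to guard against that is to percent-encode the name and number before building the URL. The helpers `urlencode`, `build_delete_url` and `delete_builds` and the `ARTIFACTORY_*` and `DRY_RUN` variables are hypothetical, `jq` is assumed to be installed, and the encoder only handles ASCII input.

```shell
# Hypothetical helper: percent-encodes every character outside the
# RFC 3986 unreserved set (letters, digits, "-", ".", "_", "~").
# ASCII input only; multi-byte characters are not handled.
urlencode() (
  s="$1"; out=""
  while [ -n "$s" ]; do
    rest="${s#?}"          # everything after the first character
    c="${s%"$rest"}"       # the first character itself
    case "$c" in
      [a-zA-Z0-9.~_-]) out="$out$c" ;;
      *) out="$out$(printf '%%%02X' "'$c")" ;;
    esac
    s="$rest"
  done
  printf '%s\n' "$out"
)

# Hypothetical helper: builds the delete URL used in this article.
#   $1 = base URL, $2 = build name, $3 = build number
build_delete_url() {
  printf '%s/api/build/%s?buildNumbers=%s&artifacts=0\n' \
    "$1" "$(urlencode "$2")" "$(urlencode "$3")"
}

# Hypothetical loop over a saved AQL response ($1). Requires jq.
# DRY_RUN defaults to 1, so nothing is deleted unless DRY_RUN=0 is set.
delete_builds() (
  tab="$(printf '\t')"
  jq -r '.results[] | [."build.name", ."build.number"] | @tsv' "$1" |
  while IFS="$tab" read -r name number; do
    url="$(build_delete_url "$ARTIFACTORY_URL" "$name" "$number")"
    if [ "${DRY_RUN:-1}" = "1" ]; then
      echo "would delete: $url"
    else
      curl --fail --silent --show-error \
        -u "${ARTIFACTORY_USER}:${ARTIFACTORY_TOKEN}" -X DELETE "$url"
    fi
  done
)
```

For example, `build_delete_url https://artifactory.praqma.net "my build" 1` yields a URL ending in `/api/build/my%20build?buildNumbers=1&artifacts=0`. Pass the file saved by the builds query to `delete_builds` to list, or with `DRY_RUN=0` to delete, the matching builds.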
Unreferenced binaries
Deleting builds and artifacts cleans up quite a lot of clutter, but it doesn’t help our disk space issue. We’ve cleaned up the references, not the binaries themselves. Removing the unreferenced binaries is extremely easy, but a pain to automate. In the Artifactory Admin panel, under Advanced > Maintenance, you’ll find a small button labeled “Prune unreferenced data”. Click it and you’re done.
I’ve yet to find a REST API call that triggers the same cleanup. If you do, drop me a line.
Automating the cleanup
To automate the cleanup I’ve cobbled together a number of scripts that I run through a Jenkins job. A few config files dictate what gets cleaned up, which makes it trivial to include new repositories. You’ll find the result in the praqma/artifactory-retention repository on GitHub.
You’ll find everything you need to get it up and running in the README file. All you really need to do is point it at the right Artifactory server, tell it which artifacts and builds it should clean up, and off you go.
Unfortunately, there’s no way to automate cleaning up the unreferenced binaries yet. If that shows up I’ll come back and update the project and the blog.
Published: Feb 12, 2019
Updated: Mar 30, 2024