aurweb/scripts/cleanup
Dan McGee 9a79d2105e Segment the upload directory by package name prefix
This implements the following scheme:

* /packages/cower/ --> /packages/co/cower/
* /packages/j/     --> /packages/j/j/
* /packages/zqy/   --> /packages/zq/zqy/

We take up to the first two characters of each package name as an
intermediate subdirectory, and the full package name lives underneath
that. Packages with single-character names live in a single-letter
directory.
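The naming rule above fits in a few lines of PHP. This is a minimal sketch. `pkg_bucket_path()` and its `$root` parameter are hypothetical names, not functions from aurweb.

```php
<?php
// Hypothetical helper: map a package name to its segmented upload
// directory, using up to the first two characters as the bucket.
function pkg_bucket_path(string $root, string $pkgname): string
{
    // substr() returns the whole string when it is shorter than 2 chars,
    // so "j" lands in the single-letter bucket "j".
    $bucket = substr($pkgname, 0, 2);
    return rtrim($root, '/') . '/' . $bucket . '/' . $pkgname . '/';
}

echo pkg_bucket_path('/packages', 'cower'), "\n"; // /packages/co/cower/
echo pkg_bucket_path('/packages', 'j'), "\n";     // /packages/j/j/
echo pkg_bucket_path('/packages', 'zqy'), "\n";   // /packages/zq/zqy/
```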

Why, you ask? Well, because earlier today the AUR hit 32,000 entries in
the unsupported/ directory, the filesystem's limit on subdirectories in
a single directory, making new package uploads impossible. While some
might argue we shouldn't have so many damn packages in the repos, we
should be able to handle this case.

Why two characters instead of one? Our two biggest two-character groups,
'pe' and 'py', both start with 'p' and have nearly 2000 packages each, so
a one-character 'p' bucket alone would already hold close to 4000
entries. Go Python and Perl.
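The bucket-size argument is easy to check against any list of names. This is a minimal sketch. `bucket_sizes()` and the six sample names are made up for illustration and are not real AUR counts. It counts how many names share each prefix of a given length, so the largest value shows how crowded the worst bucket gets.

```php
<?php
// Hypothetical helper: count how many package names fall into each
// bucket when the bucket is the first $len characters of the name.
function bucket_sizes(array $names, int $len): array
{
    $sizes = [];
    foreach ($names as $name) {
        $bucket = substr($name, 0, $len);
        $sizes[$bucket] = ($sizes[$bucket] ?? 0) + 1;
    }
    return $sizes;
}

$names = ['perl-dbi', 'perl-json', 'python-six', 'python-yaml', 'pacman-contrib', 'cower'];
echo max(bucket_sizes($names, 1)), "\n"; // 5 (everything starting with 'p')
echo max(bucket_sizes($names, 2)), "\n"; // 2 ('pe' and 'py' tie)
```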

Still needed are a "move the existing data" script and a set of rewrite
rules for those wishing to keep backward-compatible URLs for any helper
programs doing the wrong thing and relying on the old paths.
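A data migration could look roughly like the following. This is a minimal sketch that assumes the old layout keeps one directory per package directly under a single root, and that both roots sit on the same filesystem. `migrate_layout()` and its parameters are hypothetical names. It writes into a separate, fresh root because an in-place move can collide: a package named `co` already occupies `/packages/co/`, which is exactly the directory that would become the bucket for `cower`.

```php
<?php
// Hypothetical migration sketch: move every <oldRoot>/<pkgname>/ directory
// to <newRoot>/<first two chars>/<pkgname>/ and return how many were moved.
function migrate_layout(string $oldRoot, string $newRoot): int
{
    $entries = scandir($oldRoot);
    if ($entries === false) {
        return 0;
    }
    $moved = 0;
    foreach ($entries as $pkgname) {
        $src = $oldRoot . '/' . $pkgname;
        if ($pkgname === '.' || $pkgname === '..' || !is_dir($src)) {
            continue;
        }
        // Same bucketing rule as the scheme: up to two characters.
        $bucketDir = $newRoot . '/' . substr($pkgname, 0, 2);
        if (!is_dir($bucketDir) && !mkdir($bucketDir, 0755, true)) {
            fwrite(STDERR, "Cannot create $bucketDir\n");
            continue;
        }
        // Assumes both roots are on the same filesystem, so rename() is a
        // cheap move rather than a failed cross-device operation.
        if (rename($src, $bucketDir . '/' . $pkgname)) {
            $moved++;
        } else {
            fwrite(STDERR, "Failed to move $src\n");
        }
    }
    return $moved;
}
```

Once the new tree is verified, the old root can be retired and the new one put in its place.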

Signed-off-by: Dan McGee <dan@archlinux.org>
Signed-off-by: Lukas Fleischer <archlinux@cryptocrack.de>
2011-08-10 14:34:07 +02:00


#!/usr/bin/php
<?php
# Run this script by providing it with the top path of AUR.
# In that path you should see a file lib/aur.inc.php
#
# This removes the upload directories of packages that have been
# deleted from unsupported.
#
# ex: php cleanup dev/aur/web
#
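# Reading $argv[1] without checking $argc raises an undefined-index
# notice, and a usage error should exit non-zero.
if ($argc < 2 || $argv[1] === '') {
	fwrite(STDERR, "Please specify AUR directory.\n");
	exit(1);
}
$dir = $argv[1];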
set_include_path(get_include_path() . PATH_SEPARATOR . "$dir/lib");
# The script cannot work without these, so fail hard if they are missing.
require_once("config.inc.php");
require_once("aur.inc.php");
require_once("pkgfuncs.inc.php");
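$count = 0;
$buckets = scandir(INCOMING_DIR);
if ($buckets === false) {
	fwrite(STDERR, "Cannot read " . INCOMING_DIR . "\n");
	exit(1);
}
# Uploads are stored as INCOMING_DIR/<bucket>/<pkgname>/, where the
# bucket is the first one or two characters of the package name.
foreach ($buckets as $bucket) {
	$bucketpath = INCOMING_DIR . $bucket;
	if ($bucket == '.' || $bucket == '..' || !is_dir($bucketpath)) {
		continue;
	}
	$files = scandir($bucketpath);
	if ($files === false) {
		continue;
	}
	foreach ($files as $pkgname) {
		if ($pkgname == '.' || $pkgname == '..') {
			continue;
		}
		$fullpath = $bucketpath . "/" . $pkgname;
		# Only remove directories whose package is gone from the database.
		if (is_dir($fullpath) && !package_exists($pkgname)) {
			echo 'Removing ' . $fullpath . "\n";
			rm_tree($fullpath);
			$count++;
		}
	}
}
echo "\nRemoved $count directories.\n";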
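The script delegates the actual deletion to `rm_tree()`, which comes from one of the included libraries and is not shown here. Below is a minimal sketch of a recursive delete with the same purpose, assuming a Unix filesystem. `rm_tree_sketch()` is a hypothetical name, not aurweb's implementation. It checks `is_link()` before `is_dir()` because `is_dir()` follows symlinks. Unlinking a link instead of descending into it keeps the delete inside the package directory.

```php
<?php
// Hypothetical stand-in for rm_tree(): recursively delete a directory.
// Symlinks are unlinked, never followed, so a link pointing outside the
// tree cannot cause files elsewhere to be removed.
function rm_tree_sketch(string $path): bool
{
    if (is_link($path) || is_file($path)) {
        return unlink($path);
    }
    if (!is_dir($path)) {
        return false;
    }
    $entries = scandir($path);
    if ($entries === false) {
        return false;
    }
    foreach ($entries as $entry) {
        if ($entry === '.' || $entry === '..') {
            continue;
        }
        // Stop at the first failure so the caller learns about it.
        if (!rm_tree_sketch($path . '/' . $entry)) {
            return false;
        }
    }
    return rmdir($path);
}
```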