Searching
Build Lucene queries, sort and project fields, filter by mediatype and date, and export large result sets.
archive search queries the Internet Archive's Advanced Search (Solr) index.
Queries use Lucene syntax: free text, field-scoped terms, ranges, and boolean operators.
archive search 'collection:nasa' -n 5
archive search 'mediatype:texts AND subject:mathematics' -n 5
archive search 'title:(apollo AND moon)' -n 5
Field filters
Common fields have dedicated flags, so you do not have to hand-write the Lucene query:
archive search apollo --media image -n 5
archive search apollo --collection nasa -n 5
archive search apollo --creator 'NASA' -n 5
archive search apollo --year 1969 -n 5
archive search apollo --year 1965-1972 -n 5
These flags combine with a free-text or field query; all conditions are ANDed together.
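The AND composition can be made concrete with a small sketch that joins a free-text term and a list of field clauses into one Lucene string. The mapping shown is an assumption, not taken from the tool: the field names mediatype, collection, and year, the range form year:[1965 TO 1972], and the helper name build_query are all hypothetical, and the query the CLI actually sends may differ.

```shell
# Hypothetical helper: join a free-text term and field clauses with AND.
# Usage: build_query TERM [FIELD:VALUE ...]
build_query() {
  query="$1"
  shift
  for clause in "$@"; do
    query="$query AND $clause"
  done
  printf '%s\n' "$query"
}

# Assumed equivalent of: archive search apollo --media image --collection nasa --year 1965-1972
build_query apollo 'mediatype:image' 'collection:nasa' 'year:[1965 TO 1972]'
# prints: apollo AND mediatype:image AND collection:nasa AND year:[1965 TO 1972]
```

Writing the query out this way is mostly useful for checking what a combination of flags is expected to match before running a large export.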
Sorting
--sort takes a Solr sort key and is repeatable for tie-breaks:
archive search 'collection:nasa' --sort 'downloads desc' -n 5
archive search 'collection:nasa' --sort 'date desc' --sort 'downloads desc' -n 5
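For orientation, the Advanced Search endpoint accepts one sort[] parameter per sort key, in priority order. The sketch below builds such a URL by hand. It assumes that each --sort flag corresponds to one sort[] parameter, which this page does not state; the helper name search_url is hypothetical, and only spaces are encoded, so it is not a general URL encoder.

```shell
# Sketch: build an Advanced Search URL with one sort[] parameter per key.
# Only spaces are encoded (as '+'); other reserved characters are left as-is.
# Usage: search_url QUERY ROWS SORT_KEY...
search_url() {
  q=$(printf '%s' "$1" | sed 's/ /+/g')
  url="https://archive.org/advancedsearch.php?q=$q&fl[]=identifier&rows=$2&output=json"
  shift 2
  for key in "$@"; do
    url="$url&sort[]=$(printf '%s' "$key" | sed 's/ /+/g')"
  done
  printf '%s\n' "$url"
}

search_url 'collection:nasa' 5 'date desc' 'downloads desc'
# prints: https://archive.org/advancedsearch.php?q=collection:nasa&fl[]=identifier&rows=5&output=json&sort[]=date+desc&sort[]=downloads+desc
```

If you fetch such a URL with curl, pass --globoff so the square brackets are not treated as a glob pattern.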
Choosing the columns
By default each result carries identifier, title, mediatype, downloads, and
date. Ask for specific metadata fields with -f (repeatable), and reshape the
displayed columns with --fields:
archive search 'collection:nasa' -f identifier -f downloads -f publicdate
archive search 'collection:nasa' --fields identifier,downloads -o csv
Just the count
archive search 'collection:nasa AND mediatype:image' --count
Output that pipes
archive search 'collection:nasa' -o url # details URLs
archive search 'collection:nasa' -o jsonl # one JSON object per line
archive search 'collection:nasa' --fields identifier -o url | head
Pipe identifiers straight into another command:
archive search 'collection:nasa AND mediatype:image' --fields identifier -o raw -n 10 \
| xargs -n1 archive files
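When the downstream step needs more than xargs can express, a read loop is one alternative. This is a minimal sketch that assumes -o raw emits one identifier per line, as the xargs example implies. The helper name details_urls and the placeholder identifiers are hypothetical.

```shell
# Read one identifier per line on stdin and print its details URL.
details_urls() {
  while IFS= read -r id; do
    [ -n "$id" ] || continue   # skip blank lines
    printf 'https://archive.org/details/%s\n' "$id"
  done
}

# Placeholder identifiers stand in for real search output.
printf '%s\n' 'example-item-1' 'example-item-2' | details_urls
# prints:
# https://archive.org/details/example-item-1
# https://archive.org/details/example-item-2
```

In practice the input would come from a search, for example archive search 'collection:nasa' --fields identifier -o raw -n 10 | details_urls.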
Large result sets
A normal search pages through the Advanced Search endpoint up to the --limit
you set (-n 0 means unlimited, bounded only by what Solr will return). To
export a very large set reliably, --all switches to the cursor-based Scraping
API, which has no deep-paging ceiling:
archive search 'collection:nasa' --all --fields identifier -o raw > nasa-ids.txt
Tune the per-request page size with --rows. The client rate-limits itself and
retries on HTTP 429 and 5xx responses automatically, so a long export stays polite.
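If you wrap your own commands around an export, a retry loop with exponential backoff follows the same idea. The sketch below is not the client's implementation: the helper name retry, the doubling delay, and the attempt limit are all assumptions, and unlike the client it retries on any non-zero exit status rather than only on 429 or 5xx responses.

```shell
# Hypothetical helper: run a command, retrying with exponential backoff.
# Retries on ANY failure, not only on HTTP 429/5xx.
# Usage: retry MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]
retry() {
  max=$1
  delay=$2
  shift 2
  attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts" >&2
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))        # double the wait before the next attempt
    attempt=$((attempt + 1))
  done
}

# Example (not run here): up to 5 attempts, waiting 2s, 4s, 8s, 16s between them.
# retry 5 2 archive files some-identifier
```

Doubling the delay keeps early retries quick while backing off steadily if the service stays busy.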