What survives of AJA, after the algorithm
Yesterday and this afternoon I sat down and enumerated, via the YouTube Data API, the entire uploads playlist of AJA Arabic's main channel (twenty thousand video IDs) and the entire uploads playlist of @aljazeeradocumentary (five thousand eight hundred and thirty-seven). For each item I fetched duration and view count, and threw out anything shorter than twenty minutes or below two hundred thousand views. What's left is two thousand one hundred and seventy-nine videos from the main channel and three hundred and forty-six from the documentary channel. Two and a half thousand videos out of roughly twenty-six thousand. A bit under ten percent. The rest of those two channels, by mass, is shorts, teasers, three-minute clips of a longer interview re-cut for the feed, and the kind of vertical-format content that exists because the recommendation system rewards it, not because anybody at the desk thought it would be worth keeping in five years.
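For anyone who wants to reproduce the cut, here is a minimal sketch of the filter step in Python. It is not the script that produced the lists; it assumes each item is a `videos.list` resource fetched with `part=contentDetails,statistics`, where the duration arrives as an ISO 8601 string and the view count as a string. The names `duration_seconds` and `keep` are mine, for illustration only.

```python
import re

# Thresholds stated in the text: at least 20 minutes and at least 200,000 views.
MIN_SECONDS = 20 * 60
MIN_VIEWS = 200_000

# ISO 8601 durations as the API returns them, e.g. "PT1H2M3S" or "P1DT2H".
_DURATION = re.compile(
    r"^P(?:(?P<d>\d+)D)?"
    r"(?:T(?:(?P<h>\d+)H)?(?:(?P<m>\d+)M)?(?:(?P<s>\d+)S)?)?$"
)


def duration_seconds(iso):
    """Convert an ISO 8601 duration string into whole seconds."""
    match = _DURATION.match(iso)
    if match is None:
        raise ValueError(f"unrecognised duration: {iso!r}")
    d, h, m, s = (int(match.group(k) or 0) for k in ("d", "h", "m", "s"))
    return ((d * 24 + h) * 60 + m) * 60 + s


def keep(item):
    """Return True if a video resource clears both thresholds."""
    seconds = duration_seconds(item["contentDetails"]["duration"])
    # viewCount can be absent when a channel hides it; treat that as zero.
    views = int(item["statistics"].get("viewCount", 0))
    return seconds >= MIN_SECONDS and views >= MIN_VIEWS
```

Treating a hidden view count as zero is a choice, not a fact about the data: it silently drops such videos, which may or may not be what you want.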
I want to be clear about why I bothered, because the obvious read is that this is some kind of Al Jazeera fandom and it is not. I am, if anything, increasingly hostile to the way that organisation has chosen to publish in the last decade. The complaint is upstream of any politics. A channel that runs /شاهد على العصر/, hour-long interviews with people who watched the second half of the twentieth century happen in places like Algiers and Baghdad and Tripoli, is sitting on what is, in archival terms, oral history that you cannot reconstruct from anywhere else. The interviewees are dead, mostly. The interview is the artifact. And the artifact lives on a third party's playlist endpoint, accessible only as long as a content-moderation policy team in San Bruno decides it should be. The same is true for the long documentary work the documentary channel did from roughly 2014 through 2019, on the Ethiopian highlands, the Atlas mountains, Indonesian island peoples, the Western Desert in Egypt; the kind of slow ethnographic-leaning film work that no streaming service today commissions because the unit economics are bad. Six, seven, eight million views on episodes from 2016. People watched it. The point is that people, plural, watched it, and one company decides whether they can keep watching.
What I have on disk is a list. The list is not the videos. I have not pulled the videos themselves; that's a different problem and a much larger one. The earliest item in the Arabic main results is from June 2007 and the earliest in the documentary results is from December 2008, which is already past the period that I keep going back to mentally, the period I wrote about in that brief diary entry about the old Aljazeera Arabic portal back in November. That portal, in its 2002 to roughly 2008 form, was not a news site in the present sense. It was a directory, with real navigation, real internal links, a working forum, polls, separate sub-portals per show, embedded RealPlayer streams that no longer play, and a sense that you were entering a place that someone had designed for you to spend time in.
So the censorship piece. AJ+ Arabic was suspended from YouTube in 2022. Individual videos from the main channels have been removed under various pretexts over the years, including the very long-running practice of removing speeches by people the State Department has designated, which means a non-trivial slice of late-twentieth-century Arab political history is now harder to study than it was in 2009. I am not interested in arguing the rights and wrongs of any individual takedown. The structural point stands without it: there is no public ledger of what has already been removed from these channels. I have a list of what is up today. I do not, and cannot, have a list of what was up two years ago that is not up now. Nobody does. YouTube does not publish takedown logs at video granularity, and even the Lumen database, which catches a fraction of the legal removals, will not see the policy-based ones. We talk about the open web losing its archive function as if it were a passive process, files quietly bit-rotting on some neglected server, and it is not that. A meaningful portion of the long-form Arabic-language video record of the last twenty years is being actively curated, by a company that has no stake in Arabic-language posterity and every stake in advertiser relations, and the result is invisible attrition. You only notice when you go looking for something specific and find that the link 404s, and by then the question of when it was taken down and why has been engineered to be unanswerable.
What the filtration also showed, less polemically, is the brutal asymmetry between what they make and what survives the recommendation funnel. Of twenty thousand uploads on the main channel, only about two hundred from before 2010 cleared the filter. That is not because they uploaded little back then; the entire archive of سري للغاية and شاهد على العصر and the early الاتجاه المعاكس episodes is in there. It is because the channel was used differently then, as a video dump for an existing TV programme, and the long-form material was published as full episodes with thumbnails that no longer get clicked. The view counts on the 2008 to 2010 cohort are mostly two hundred thousand to a million; the recent cohort, the podcast-format material from 2023 onward, runs to several million per episode. The platform rewards the new format and the old format slowly recedes. That is not censorship in the legal sense. It is the slow form of it, the one that does not need a takedown request because the product manager's quarterly metric is sufficient.
Now the practical part. The filtration outputs are sitting under /aja/ on this site, both as readable markdown tables and as machine-friendly CSV and JSONL. The grand total is well under a megabyte; trivial for anybody to mirror. Direct links:
- /aja/results.md (Arabic main, 2,179 videos, markdown table sorted by views)
- /aja/results.csv (same, CSV)
- /aja/results.jsonl (same, one JSON object per line)
- /aja/results_doc.md (Documentary, 346 videos)
- /aja/results_doc.csv
- /aja/results_doc.jsonl
If you want one file:
curl -LO https://lr0.org/aja/results.md
The columns are the same across all three formats: video ID, title, publish date, duration in seconds, view count at the time of the scan (mid May 2026), and the canonical YouTube URL. If you want to pull the actual videos using the URL list, yt-dlp is what to use, ideally with --write-info-json --write-subs --write-auto-subs so you keep the metadata and subtitles, because the subtitles will outlive the audio in any future archival use case. The Arabic auto-subtitles are not good, but they are searchable, and searchable is most of the value.
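As a rough sketch of that last step, the Python below turns the JSONL file into a plain URL list and assembles a yt-dlp command line with the flags mentioned above. The key name `url` is an assumption about how the rows are labelled (check the first line of the file and adjust), and the function names are hypothetical. The `--batch-file` option is yt-dlp's own and reads one URL per line.

```python
import json

# Flags named in the text: keep metadata, uploaded subtitles, and auto-subtitles.
YT_DLP_FLAGS = ["--write-info-json", "--write-subs", "--write-auto-subs"]


def urls_from_jsonl(lines, url_key="url"):
    """Collect the URL field from an iterable of JSONL lines.

    url_key is an assumed field name, not confirmed by the published files.
    """
    urls = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        urls.append(json.loads(line)[url_key])
    return urls


def write_url_list(jsonl_path, out_path, url_key="url"):
    """Write one URL per line to out_path and return how many were written."""
    with open(jsonl_path, encoding="utf-8") as src:
        urls = urls_from_jsonl(src, url_key)
    with open(out_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(urls) + "\n")
    return len(urls)


def yt_dlp_command(batch_file):
    """Build the argument list for a yt-dlp run over a file of URLs."""
    return ["yt-dlp", *YT_DLP_FLAGS, "--batch-file", batch_file]
```

Calling `write_url_list("results.jsonl", "urls.txt")` and then running the list returned by `yt_dlp_command("urls.txt")`, for instance through `subprocess.run`, would start the download. Two and a half thousand long videos is a lot of disk and a lot of requests, so this is the larger problem the list deliberately stops short of.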
The list itself is not a preservation effort; it is an index of one. But it is useful in the way that an index is useful: if any of these URLs goes dark next year, somebody, somewhere, will have a row that says this title existed, this long, with this many views, on this date, at this ID. That is the smallest non-zero amount of archival metadata. It is the minimum you would want to have if you were going to argue, later, that something was removed. The thing I keep coming back to, and the reason for the slight edge in this entry, is that this minimum should not be something an individual has to assemble on a weekend out of API calls. It should be a property of a healthy public information layer, and it is not, and the people who would have built it twenty years ago when the old portal was still alive have mostly moved on or been laid off, and what we have instead is one company's playlist endpoint and the implicit hope that the company stays interested.
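To make the "row that says this existed" idea concrete, here is a small Python sketch that compares two snapshots of the CSV, taken at different times, and reports the rows present in the older one but absent from the newer. The column name `video_id` is an assumption about the CSV header, and both function names are hypothetical.

```python
import csv
import io


def load_index(csv_text, id_column="video_id"):
    """Map video ID -> row for one snapshot of the list.

    id_column is an assumed header name; adjust it to the actual file.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[id_column]: row for row in reader}


def missing_since(old_index, new_index):
    """Rows that were in the older snapshot but are absent from the newer one."""
    gone = sorted(set(old_index) - set(new_index))
    return [old_index[video_id] for video_id in gone]
```

Absence from a later snapshot shows only that a video is no longer listed. It does not say whether it was removed, made private, or deleted by the uploader, or when. That is the gap described above, but a dated row is at least somewhere to start.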
