A little introduction
Everything started from some unplanned work done on #opsyria. To give you some context, we have a bot there, named ii, that helps us with information management.
Birth and death of a bot
ii’s birth dates back to the second phase of opsyria, the phase where we went wild and tried to get in contact with Syrians. It was first a greetings bot, telling newcomers some safety tips in Arabic (because we still do not speak Arabic).
Then we fired up a Twitter account, and so we added Twitter functions to ii. And status.net too (for our status.net platform). And then we added the ability to repeat interesting stuff ii saw on those platforms (publishing on IRC the things it saw in its following list on both platforms).
Then we had some problems with the microblogging thing. 140 characters is short, especially when you use Arabic and weird Unicode chars. So we built a news functionality, which led us to our news website, where we still publish real-time news from the ground thanks to our contacts’ help.
After that, things went crazy. Lots of videos were posted online and we started indexing them. Here came the videos functionality (and later on the pics one, same thing, but with pictures), and we started building an index of all videos related to Syrian events.
So, this is how, over 6 months, we built our database of information, with dates, places and comments for each video, picture or news item we could find. We built different websites using these and, one day, we realized that, for the preservation of the data, it would be nice to extract them from the websites where they are located, to be sure they will always be online.
We had fears that Syrian officials (or Assad’s supporters) could manage to get YouTube or Facebook accounts closed, and then have the videos unavailable and lost for everyone.
The archiving idea
At 28C3, we already had a somewhat big database, and a script that could download each video and store them on a website as ‘static files’ with a non-friendly user interface (an Apache directory listing), located here: http://syria-videos.ceops.eu/
Some journalists told us that it was nice, but not really usable (no way to easily parse stuff, or to find events related to one particular date, and so on). So we started to think about how we could do better.
Parsing it by hand was out of the question: there were more than 600 videos, that is, more than 4GB of files to watch, and some of them are harsh and crude to watch. Besides, we’re still unable to understand written Arabic, so the only data we could use was what was in the flat files provided by ii.
Let’s compile HTML
And, at the time, I was playing a lot with ikiwiki, which is a markdown compiler that builds static HTML pages. So I started looking at that. After all, it can generate HTML5, so it should be easy to add a \<video> tag inside a template; generating the pages from flat text is easy to do in bash, and then I just have to use git to push it and let the magic of ikiwiki work.
We will have a pure HTML website, with smart URLs, easily mirrorable (hey, no ?static=yes&wtf=ya&unknownparam&yetanotherfrckingstuff URLs, just 2012/02/11 for the page of the events of February 11th, 2012), with a tagging system and full HTML5.
This was the concept. And since ikiwiki provides a local.css system, we could even ask gently (and harass) some designers for a logo and some design around it (I can live with pure HTML, but a lot of people do like fancy and rounded stuff…)
Enough talk, do it
So, first, let’s install what we need. I’m on a Debian squeeze OpenVZ kernel and I’m gonna use nginx to serve it. I need to add the unstable version of ffmpeg to support .ogv.
aptitude install ikiwiki nginx ffmpeg
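To pull only ffmpeg from unstable without dragging the rest of the system along, one way (a sketch, assuming stock Debian sources; adapt the mirror to yours) is to add unstable to sources.list, pin it at a low priority, and install with -t:

# Add the unstable repository (assumption: a standard Debian mirror)
echo 'deb http://ftp.debian.org/debian unstable main' >> /etc/apt/sources.list

# Keep unstable at low priority so only explicit requests use it
cat >> /etc/apt/preferences <<'EOF'
Package: *
Pin: release a=unstable
Pin-Priority: 100
EOF

aptitude update
aptitude -t unstable install ffmpeg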
The setup of ikiwiki is pretty easy to do; I’ll paste all the uncommented lines of TelecomixBroadcastSystem.setup.
So, let’s start with some naming stuff: the name of the wiki, the mail of the admin and the username of the admin.
wikiname => 'Telecomix Broadcast System',
adminemail => 'okhin@bloum.net',
adminuser => [qw{a_user_admin}],
Since there are no user functions available, this should be empty.
banned_users => [],
Where I’ll put the markdown files.
srcdir => '/var/ikiwiki/TelecomixBroadcastSystem',
Where ikiwiki will put the generated HTML.
destdir => '/var/www/tbs',
What will be the URL of the website.
url => 'http://broadcast.telecomix.org',
The plugins I wanna add. Goodstuff is a bundle with a lot of useful plugins for ikiwiki. The goodstuff plugin page on the ikiwiki website will give you more details.
I wanted a sidebar (for hosting the navigation), a calendar (to enable the calendar generation) and a favicon (because they are nice). As I do not want the site to be editable, I deactivate the recentchanges plugin.
add_plugins => [qw{goodstuff sidebar calendar favicon}],
disable_plugins => [qw{recentchanges}],
Some system directories and defaults that I’ve kept.
templatedir => '/usr/share/ikiwiki/templates',
underlaydir => '/usr/share/ikiwiki/basewiki',
indexpages => 0,
discussionpage => 'Discussion',
default_pageext => 'mdwn',
timeformat => '%c',
numbacklinks => 10,
hardlink => 0,
wiki_file_chars => '-[:alnum:]+/.:_',
allow_symlinks_before_srcdir => 0,
HTML5 is nice and fun to play with; we should use it more.
html5 => 1,
A link for the post-update git wrapper (that is, once the repo receives an update, the new wiki is automatically generated).
git_wrapper => '/var/git/TelecomixBroadcastSystem.git/hooks/post-update',
atom => 1,
I want a sidebar for all the pages
global_sidebars => 1,
I want to autogenerate tag pages, and to store them in the tag/ directory.
tagbase => 'tag',
tag_autocreate => 1,
There are a lot more things you can change, but you should have a look at the ikiwiki documentation.
Now we have to create the directories /var/ikiwiki/TelecomixBroadcastSystem and /var/www/tbs, make them writable and owned by the user you’re going to use to generate the wiki, and make /var/www/tbs readable by the nginx user.
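For example (a sketch, assuming the wiki is built by a user named okhin and that nginx only needs world-readable files; adjust to your own users):

mkdir -p /var/ikiwiki/TelecomixBroadcastSystem /var/www/tbs
chown -R okhin:okhin /var/ikiwiki/TelecomixBroadcastSystem /var/www/tbs
# the nginx worker only needs to read and traverse the destination directory
chmod -R o+rX /var/www/tbs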
And let’s set up the wiki:
ikiwiki --setup /path/to/your/Wiki.setup
Let’s tweak some templates
So, now I need some templates to work with the videos repo. One for videos, one for pictures (to add a specific CSS class around them), and one for the ‘regular’ page, because I wanted a logo on top of all of them.
Video template
I added a "template" directory into the wiki root (so, /var/ikiwiki/TelecomixBroadcastSystem/template) and I created the video.tmpl file.
The templates of ikiwiki use the HTML::Template system, and the ones I needed were relatively simple. I think comments are not needed.
<article class="video"> <video controls="controls" type="video/ogg" width="480" src="/videos/<TMPL_VAR file>" poster="/pics/SVGs/tbs_V1.svg"><TMPL_VAR alt></video> <p><TMPL_VAR alt></p> <p><a href="/videos/<TMPL_VAR file>">Direct Link to the file</a> || <a href="<TMPL_VAR original>">Original link</a></p> </article>
So, a fixed-width video, in HTML5; the files must be in a /videos/ web directory, and a poster with a nice logo will be displayed on the video before it plays. Some more links to add context, and we’re set up.
Notice the MIME type used here: video/ogg. I want to use a really free web format, which will need transcoding (but that’s a later problem). The same goes for the pictures template.
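I won’t paste the pictures template, but a minimal sketch of what a picture.tmpl could look like (hypothetical, assuming the same file/alt/original variables as the video one and a /pictures/ web directory) would be:

<article class="picture">
<img src="/pictures/<TMPL_VAR file>" alt="<TMPL_VAR alt>" />
<p><TMPL_VAR alt></p>
<p><a href="/pictures/<TMPL_VAR file>">Direct Link to the file</a> || <a href="<TMPL_VAR original>">Original link</a></p>
</article>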
Page template
The page template is a huge (and complex) one, so here is just a patch:
--- templates/page.tmpl	2012-03-07 15:35:45.000000000 +0000
+++ /usr/share/ikiwiki/templates/page.tmpl	2011-03-28 23:46:08.000000000 +0000
@@ -30,7 +30,6 @@
 </head>
 <body>
-<div id="logo"><a href="/" title="Dirty Bytes of Revolutions Since 1337"><img src="/pics/PNGs/tbs_V2.png" alt="Dirty Bytes of Revolutions Since 1337" /></a></div>
 <TMPL_IF HTML5><article class="page"><TMPL_ELSE><div class="page"></TMPL_IF>
 <TMPL_IF HTML5><section class="pageheader"><TMPL_ELSE><div class="pageheader"></TMPL_IF>
@@ -134,7 +133,6 @@
 </TMPL_UNLESS>
 </div>
-<div class="clearfix"></div>
 <TMPL_IF HTML5><footer id="footer" class="pagefooter"><TMPL_ELSE><div id="footer" class="pagefooter"></TMPL_IF>
 <TMPL_UNLESS DYNAMIC>
The clearfix div is here for the goddamn IE browser (at least, that’s what the CSS integrator guy told me). And above that, there’s the logo picture.
Let’s build special pages
Sidebar.mdwn
So, the sidebar plugin grants me the use of a sidebar.mdwn file in the root folder of the wiki.
First, some useful links (back to home, the pure text news and our webchat)
\# Quick Links
\* \[Back to Home\](/index.html)
\* \[News from the ground\](http://syria.telecomix.org)
\* \[Webchat\](https://new.punkbob.com/chat)
What happened this month.
\# This month events
And all the pages since the start of the year.
\# Events month by month
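The actual calendar directives aren’t reproduced above; assuming the calendar plugin enabled earlier and the same pagespec as the cronjob further down, they could look like this (a month view under the first heading, a year view under the second):

\[\[!calendar type="month" pages="2011/* or 2012/*"\]\]

\[\[!calendar type="year" pages="2011/* or 2012/*"\]\]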
Index.mdwn
The next step is to build a nice index.mdwn page with some text, the tag cloud and a global map of everything. I’ll skip to the interesting parts (map and tag cloud).
The page list uses the map directive to find all the pages under the 2011 and 2012 directories (one per year), which leads to a list of all the daily pages.
# Page list
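The directive itself isn’t shown above; a sketch of what it could look like, assuming ikiwiki’s standard map directive and the per-year directories:

[[!map pages="2011/* or 2012/*"]]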
This will go through all the tags of the pages, and do some computation to generate a nice cloud.
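Again, the directive isn’t reproduced here; assuming ikiwiki’s pagestats plugin is enabled, it could be something like:

[[!pagestats pages="tag/*"]]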
Fanciness
I then added a favicon.ico file along with a local.css to the repository; the local.css needs to be copied manually into the /var/www/tbs directory. And now, the basic setup is done.
Committing
So, now use git to add all those files, then commit and push them. Easy to do; that will generate the site files into /var/www/tbs/.
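In practice, something like this from the source directory (the commit message is arbitrary):

cd /var/ikiwiki/TelecomixBroadcastSystem
git add .
git commit -m "Initial content: templates, sidebar, index"
git push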
Yippee. Now we need to populate this.
Bashing across videos
So, I have a list of videos somewhere, of the form:
2011-12-04 homs/al-meedan http://www.youtube.com/watch?v=-qjNo0uqSM8 Random gunfires during the night
(And yes, sometimes there are Arabic characters all over the place.) So I have a date, a location (that will be used for tags), a URL and some comments to add, thanks to ii’s magic (and the huge work done for months). We already had some Python scripts for downloading the videos, but for this kind of thing I wanted to use something I know: bash. It will be split in two. One half parses YouTube’s hellish pages and downloads the .webm; this part is still in Python, works well, and I was too lazy to rewrite it. The second half gets the video info and adds the necessary information to the wiki.
And then, I’ll need to transcode it.
So, the script. Let’s start with some variables; we’ll need them later.
#!/bin/bash
# We want to download everything.
export VIDEOS_LINK='https://telecomix.ceops.eu/material/ii/videos.txt'
export VIDEOS_RAW_DIR='/var/tbs/tbs/raw/'
export VIDEOS_OGV_DIR='/var/tbs/tbs/videos/'
export VIDEOS_WIKI_ROOT='/var/ikiwiki/TelecomixBroadcastSystem'
export VIDEOS_LIST=${VIDEOS_WIKI_ROOT}/videos.lst
export VIDEOS_NEW=${VIDEOS_WIKI_ROOT}/new_videos.lst
Let’s do some cleaning, and a backup, needed to know what’s new.
[[ -e ${VIDEOS_LIST}.old ]] && rm -rf ${VIDEOS_LIST}.old
[[ -e $VIDEOS_LIST ]] && mv $VIDEOS_LIST ${VIDEOS_LIST}.old
Get the new version of the file list
cd $VIDEOS_WIKI_ROOT
wget $VIDEOS_LINK --no-check-certificate -O $VIDEOS_LIST
Update the git repository (we probably added tags since last time, so there are new pages) and find the new videos (a dirty diff, keeping only the added lines).
git pull 2>&1 > /dev/null
diff -N $VIDEOS_LIST ${VIDEOS_LIST}.old | grep -e '^<' > $VIDEOS_NEW
Loop over all the new videos to add them to the wiki.
while read LINE
do
This is a bash array, in case you did not know how they work.
VIDEO=( $LINE )
DATE=${VIDEO[1]}
TTAGS=${VIDEO[2]}
Let’s split TAGS into different words separated by spaces, not slashes.
TAGS=$(echo $TTAGS | tr '/' ' ')
LINK=${VIDEO[3]}
This is how I get the same thing as [4:] in Python (from the 4th field to the end of the array).
COMMENTS=${VIDEO[@]:4:${#VIDEO[@]}}
The date is YYYY-MM-DD in the file; I want it to be YYYY/MM/DD to create my file in the right place (YYYY/MM/DD.mdwn). That way I have an automagic hierarchy; plus, you can get to the /2012/02/14 URL quite easily.
The filename is the video link stripped down to alphanumeric characters only; that will be good enough for me.
VIDEO_PATH=$(echo ${DATE}.mdwn | tr '-' '/')
VIDEO_FILENAME=$(echo $LINK | tr -dc '[:alnum:]')
So, if the directory (which is YYYY/MM) does not exist, let’s create it. If the file does not exist, it means this is the first time we see something for that day. We must create the page and add some stuff (notably the date of creation must be juked, and we add a nice title). Once the file is created, git add it to the repo.
# We have only updates, which is nice, no need to check if the videos already exist
[[ ! -d $(dirname ${VIDEOS_WIKI_ROOT}/${VIDEO_PATH}) ]] && mkdir -p $(dirname ${VIDEOS_WIKI_ROOT}/${VIDEO_PATH})
if [ ! -e ${VIDEOS_WIKI_ROOT}/${VIDEO_PATH} ]
then
    # First entry of the day: the page gets created here (title, juked creation date), then added to the repo
    git add ${VIDEOS_WIKI_ROOT}/${VIDEO_PATH}
fi
Add some tags to the page, along with the video template (one line, really fun); note the .ogv part added to the filename.
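That one-liner isn’t reproduced here; a sketch of what it could look like, assuming ikiwiki’s tag and template directives (hypothetical, but consistent with the video.tmpl variables above):

echo "[[!tag $TAGS]] [[!template id=video file=\"${VIDEO_FILENAME}.ogv\" alt=\"${COMMENTS}\" original=\"${LINK}\"]]" >> ${VIDEOS_WIKI_ROOT}/${VIDEO_PATH}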
And now, download the file. I need to add a dot at the end of the name, because the download script adds the extension (without the dot) to the file. I download it into a raw dir, from which I’ll later transcode all the videos into the proper format and directory.
# And now, download it
python ${VIDEOS_WIKI_ROOT}/scripts/multiproc_videos_dl.py ${VIDEOS_RAW_DIR} "${VIDEOS_RAW_DIR}/${VIDEO_FILENAME}." "$LINK" 2>&1 > /dev/null &
done < $VIDEOS_NEW
Commit all the changes at once, and push them.
# While we're at it, just publish the file
git commit -a -m "VIDEO updated" 2>&1 > /dev/null
git push 2>&1 > /dev/null
We’re done; just the transcoding now, which is pretty easy and done in another script. Nothing special here: looping over all the files in the raw dir to transcode them into the video dir.
#!/bin/bash
# Transcoding a video into ogv
export ORIG='/var/tbs/tbs/raw'
export DEST='/var/tbs/tbs/videos'

for RAW in $(ls -1 $ORIG)
do
    NAME=${RAW%.*}
    echo "transcoding $NAME"
    [[ -e $DEST/${NAME}.ogv ]] || ffmpeg -i $ORIG/$RAW -acodec libvorbis -ac 2 -ab 96k -b 345k -s 640x360 $DEST/${NAME}.ogv
    rm $ORIG/$RAW
done
Bashing across pictures
Same format as the videos, so the same script, almost. I won’t detail it; just sed VIDEO/PICTURE and you’re almost done. Also, the download is done using wget --no-check-certificate.
Bashing the news
Same kind of thing, except that I add the timestamp to it; besides that, it’s just the same.
Cronjobs everywhere
I now just need to auto-exec the 3 download jobs above, the transcoding, and an ikiwiki-internal command to update the calendars. I’ve got 2 cronjobs for that, executed every 6 hours:
0 */6 * * * /var/ikiwiki/TelecomixBroadcastSystem/scripts/dl_news.bash 2>&1 > /dev/null && /var/ikiwiki/TelecomixBroadcastSystem/scripts/dl_pictures.bash 2>&1 > /dev/null && /var/ikiwiki/TelecomixBroadcastSystem/scripts/dl_video.bash 2>&1 > /dev/null && /var/tbs/transcode.sh > /dev/null 2>/dev/null
0 1/6 * * * ikiwiki-calendar /var/ikiwiki/TelecomixBroadcastSystem.setup "2011/* or 2012/*" 2012
This is the end
Now the wiki auto-builds itself. I then just needed to tweak nginx to suit my needs, but that was really easy to do. I just had to keep in mind that I need two aliases (one for /videos, one for /pictures), because I did not want to commit all the videos into the git directory (that eats a lot of space), and to tell nginx that .ogv files are indeed video files.
server {
    listen 80; ## listen for ipv4
    listen [::]:80 default ipv6only=on; ## listen for ipv6

    server_name broadcast.telecomix.org;
    access_log off;

    location / {
        root /var/www/tbs;
        index index.html index.htm;
    }

    location /pictures {
        alias /var/tbs/pictures;
        autoindex off;
    }

    location /videos {
        alias /var/tbs/videos;
        autoindex off;
    }
}
And I just need to edit the mime.types file to add these lines at the end of the file:
video/ogg ogm;
video/ogg ogv;
video/ogg ogg;
That’s it, everything works fine now. A final thing was needed to spread it easily (and that’s why I wanted static pages): easing the process of mirroring. The best way to do this is to use rsync in daemon mode with three read-only modules.
Installation of rsync is a piece of cake:
aptitude install rsync
You then need to enable it in Debian; for this, editing the file /etc/default/rsync is the way to go. I wanted to throttle it down and keep it nice on the I/O (because I already have too many processes eating my CPU, like the transcoding), so I’ve set these options in the same file:
RSYNC_ENABLE=true
RSYNC_OPTS='--bwlimit 200'
RSYNC_NICE='10'
RSYNC_IONICE='-c3'
And then, in /etc/rsyncd.conf, I’ve added these modules:
max connections = 10
log file = /dev/null
timeout = 200

[tbs]
comment = Telecomix Broadcast System
path = /var/www/tbs
read only = yes
list = yes
uid = nobody
gid = nogroup

[videos]
comment = Telecomix Broadcast System - videos
path = /var/tbs/videos
read only = yes
list = yes
uid = nobody
gid = nogroup

[pictures]
comment = Telecomix Broadcast System - pictures
path = /var/tbs/pictures
read only = yes
list = yes
uid = nobody
gid = nogroup
And that’s it. People can now duplicate the whole thing on a simple web server (they just need space) without anything else running on it than serving web pages.
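For someone setting up a mirror, pulling everything boils down to three rsync calls against the modules defined above (the destination paths are of course up to the mirror):

rsync -av rsync://broadcast.telecomix.org/tbs/ /var/www/tbs/
rsync -av rsync://broadcast.telecomix.org/videos/ /var/tbs/videos/
rsync -av rsync://broadcast.telecomix.org/pictures/ /var/tbs/pictures/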