Broadcasting news

A little introduction

Everything started from some unplanned work done on #opsyria. To give you some context, we have a bot there, named ii, that helps us with information management.

Birth and death of a bot

ii’s birth dates back to the second phase of opsyria, the phase where we went wild and tried to make contact with Syrians. It was at first a greeting bot, telling newcomers some safety tips in Syrian (because we still do not speak Syrian).

Then we fired up a Twitter account, and so we added Twitter functions to ii. And status.net ones too (for our status.net platform). And then we added the ability to repeat interesting stuff ii saw on those platforms (publishing on IRC the things it saw in its following list on both platforms).

Then we had some problems with the microblogging thing. 140 characters is short, especially when you use Arabic and weird Unicode characters. So we built a news functionality, which led us to our news website where we still publish real-time news from the ground, thanks to our contacts’ help.

After that, things went crazy. Lots of videos were posted online and we started indexing them. Here came the videos functionality (and later on the pics one, same thing but with pictures), and we started building an index of all videos related to the Syrian events.

So this is how, over six months, we built our database of information, with dates, places and comments for each video, picture or news item we could find. We built different websites using these and, one day, we realized that, for the preservation of the data, it would be nice to extract them from the websites where they are hosted, to be sure they will always be online.

We feared that Syrian officials (or Assad’s supporters) could manage to get YouTube or Facebook accounts closed, and then the videos would be unavailable and lost for everyone.

The archiving idea

By the 28C3, we already had a somewhat big database, and a script that could download each video and store them on a website as static files, with a user-unfriendly interface (an Apache directory listing) located here: http://syria-videos.ceops.eu/

Some journalists told us that it was nice, but not really usable (no way to easily browse the material, or to find events related to one particular date, and so on). So we started to think about how we could do that.

Sorting it by hand was out of the question: there were more than 600 videos, that is more than 4GB of files to watch, and some of them are harsh and crude to watch. Besides, we are still unable to understand written Arabic, so the only data we could use was what was in the flat files provided by ii.

Let’s compile html

At the time, I was playing a lot with ikiwiki, a markdown compiler that builds static html pages. So I started looking at that. After all, it can generate html5, so it should be easy to add a <video> tag inside a template; generating the pages from flat text is easy to do in bash, and then I just have to use git to push it all and let the magic of ikiwiki work.

We would have a pure html website, with smart URLs, easily mirrorable (hey, no ?static=yes&wtf=ya&unknownparam&yetanotherfrckingstuff URLs, just 2012/02/11 for the page of the events of February 11th, 2012), with a tagging system and full html5.

This was the concept. And since ikiwiki provides a local.css system, we could even gently ask (and harass) some designers for a logo and some design around it (I can live with pure HTML, but a lot of people do like fancy and rounded stuff…)

Enough talk, do it

So, first, installing what we need. I’m on a Debian squeeze OpenVZ kernel and I’m gonna use nginx to serve it. I need to add the unstable version of ffmpeg to support .ogv.

aptitude install ikiwiki nginx ffmpeg

The setup of ikiwiki is pretty easy to do; I’ll paste all the uncommented lines of TelecomixBroadcastSystem.setup.

So, let’s start with some naming stuff: the name of the wiki, the mail of the admin and the username of the admin.

wikiname => 'Telecomix Broadcast System',
adminemail => 'okhin@bloum.net',
adminuser => [qw{a_user_admin}],

Since there’s no user function available, this should be empty.

banned_users => [],

Where I’ll put the markdown files.

srcdir => '/var/ikiwiki/TelecomixBroadcastSystem',

Where ikiwiki will put the generated html.

destdir => '/var/www/tbs',

The url of the website.

url => 'http://broadcast.telecomix.org',

The plugins I wanna add. Goodstuff is a bundle of a lot of useful plugins for ikiwiki; the goodstuff plugin page on the ikiwiki website will give you more details.

I wanted a sidebar (for hosting the navigation), a calendar (to enable the calendar generation) and a favicon (because they are nice). As I do not want the site to be editable, I deactivate the recentchanges plugin.

add_plugins => [qw{goodstuff sidebar calendar favicon}],
disable_plugins => [qw{recentchanges}],

Some system directories and defaults that I’ve kept.

templatedir => '/usr/share/ikiwiki/templates',
underlaydir => '/usr/share/ikiwiki/basewiki',
indexpages => 0,
discussionpage => 'Discussion',
default_pageext => 'mdwn',
timeformat => '%c',
numbacklinks => 10,
hardlink => 0,
wiki_file_chars => '-[:alnum:]+/.:_',
allow_symlinks_before_srcdir => 0,

HTML5 is nice and fun to play with; we should use it more.

html5 => 1,

The path of the post-update git wrapper (that is, once the repo receives an update, the new wiki is automatically generated).

git_wrapper => '/var/git/TelecomixBroadcastSystem.git/hooks/post-update',
atom => 1,

I want a sidebar for all the pages

global_sidebars => 1,

I want to autogenerate tag pages, and to store them in the tag/ directory.

tagbase => 'tag',
tag_autocreate => 1,

There are a lot more things you can change, but for that you should have a look at the ikiwiki documentation.

Now we have to create the two directories /var/ikiwiki/TelecomixBroadcastSystem and /var/www/tbs, make them writable by and owned by the user that will generate the wiki, and give the nginx user permission to read /var/www/tbs.
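In practice, something like this should do it (a sketch: the tbs user is a hypothetical name for whatever user generates the wiki, and nginx is assumed to run as www-data, as on a default Debian install):

mkdir -p /var/ikiwiki/TelecomixBroadcastSystem /var/www/tbs
chown -R tbs:tbs /var/ikiwiki/TelecomixBroadcastSystem /var/www/tbs
# www-data only needs read (and directory-traverse) access to the generated pages
chmod -R o+rX /var/www/tbs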

And let’s set up the wiki:

ikiwiki --setup /path/to/your/Wiki.setup

Let’s tweak some templates

So now I need some templates to work with the videos repo: one for videos, one for pictures (to add a specific CSS class around them), and one for the ‘regular’ page, because I wanted a logo at the top of all of them.

Video template

I added a templates directory in the wiki root (so, /var/ikiwiki/TelecomixBroadcastSystem/templates) and created the video.tmpl file there.

The templates of ikiwiki use the HTML::Template system, and the ones I needed were relatively simple; I think comments are not needed:

<article class="video">
    <video controls="controls" type="video/ogg" width="480" src="/videos/<TMPL_VAR file>" poster="/pics/SVGs/tbs_V1.svg"><TMPL_VAR alt></video>
    <p><TMPL_VAR alt></p>
    <p><a href="/videos/<TMPL_VAR file>">Direct Link to the file</a> ||
    <a href="<TMPL_VAR original>">Original link</a></p>
</article>

So: a fixed-width video, in HTML5; the files must be in a /videos/ web directory, and a poster with one of our nice logos will be displayed on the video before playing it. Some more links to add context, and we’re set up.

Notice the MIME type used here: video/ogg. I want to use a really free web format, which will require transcoding (but that’s a later problem). The same goes for the pictures template.
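That pictures template is not shown in this post; by analogy with video.tmpl above, it could look something like this (the CSS class and the link layout are assumptions on my part):

<article class="picture">
    <a href="/pictures/<TMPL_VAR file>"><img src="/pictures/<TMPL_VAR file>" width="480" alt="<TMPL_VAR alt>" /></a>
    <p><TMPL_VAR alt></p>
    <p><a href="/pictures/<TMPL_VAR file>">Direct link to the file</a> ||
    <a href="<TMPL_VAR original>">Original link</a></p>
</article>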

Page template

The page template is a huge (and complex) one, so here is just a patch:

--- templates/page.tmpl 2012-03-07 15:35:45.000000000 +0000
+++ /usr/share/ikiwiki/templates/page.tmpl      2011-03-28 23:46:08.000000000 +0000
@@ -30,7 +30,6 @@
 </head>
 <body>
 
-<div id="logo"><a href="/" title="Dirty Bytes of Revolutions Since 1337"><img src="/pics/PNGs/tbs_V2.png" alt="Dirty Bytes of Revolutions Since 1337" /></a></div>
 <TMPL_IF HTML5><article class="page"><TMPL_ELSE><div class="page"></TMPL_IF>
 
 <TMPL_IF HTML5><section class="pageheader"><TMPL_ELSE><div class="pageheader"></TMPL_IF>
@@ -134,7 +133,6 @@
 </TMPL_UNLESS>
 
 </div>
-<div class="clearfix"></div>
 
 <TMPL_IF HTML5><footer id="footer" class="pagefooter"><TMPL_ELSE><div id="footer" class="pagefooter"></TMPL_IF>
 <TMPL_UNLESS DYNAMIC>

The clearfix div is there for the goddamn IE browser (at least, that’s what the CSS integrator guy told me). And above it, there’s the logo picture.

Let’s build special pages

Sidebar.mdwn

So, the sidebar plugin grants me the use of a sidebar.mdwn file in the root folder of the wiki.

First, some useful links (back to home, the pure text news and our webchat)

# Quick Links
* [Back to Home](/index.html)
* [News from the ground](http://syria.telecomix.org)
* [Webchat](https://new.punkbob.com/chat)

What happened this month:

# This month events

And all the pages since the start of the year:

# Events month by month
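The calendar directives that go under those two headings got eaten by the blog engine; with the calendar plugin enabled they would look something like this (the pagespec is my guess, based on the ikiwiki-calendar cronjob shown at the end):

# This month events
[[!calendar type="month" pages="2011/* or 2012/*"]]

# Events month by month
[[!calendar type="year" pages="2011/* or 2012/*"]]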

Index.mdwn

The next step is to build a nice index.mdwn page with some words, the tag cloud and a global map of everything. I’ll skip to the interesting parts (map and tag cloud).

The page list uses the map directive to find all the pages under the 2011 and 2012 directories (one per year), which leads to a list of all the daily pages.

# Page list
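The directive itself was lost in the paste; with the map plugin (bundled in goodstuff) it would be something like:

[[!map pages="2011/* or 2012/*"]]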

This goes through all of the tags of the pages, and does some computation to generate a nice cloud.
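Again, the directive is missing from the paste; the pagestats plugin (also part of goodstuff) is the usual way to get a tag cloud in ikiwiki, so presumably something like:

[[!pagestats pages="tag/*"]]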

Fanciness

I then added a favicon.ico file along with a local.css to the repository; the local.css needs to be copied manually into the /var/www/tbs directory. And now the basic setup is done.

Committing

So, now use git to add all those files, then commit and push them. Easy to do, and that will generate the pages into /var/www/tbs/.
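For the record, something in this vein, run from a clone of the wiki repository (file names as created above):

git add sidebar.mdwn index.mdwn templates/video.tmpl favicon.ico local.css
git commit -m 'TBS base pages and templates'
# pushing triggers the post-update wrapper, which rebuilds /var/www/tbs
git push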

Yippee! Now we need to populate this.

Bashing across videos

So, I have a list of videos somewhere around here, of the form:

2011-12-04 homs/al-meedan http://www.youtube.com/watch?v=-qjNo0uqSM8 Random gunfires during the night

(And yes, sometimes there are Arabic characters all over the place.) So I have a date, a location (that will be used for tags), a URL and some comments to add, thanks to ii’s magic (and the huge work done over months). We already had some python scripts for downloading the videos but, for this kind of thing, I wanted to use something I know: bash. It will be split in two. One half parses youtube’s hellish pages and downloads the .webm files; this part is still in python, works well, and I was too lazy to rewrite it. The second half gets the video info and adds the necessary information to the wiki.

And then, I’ll need to transcode it.

So, the script. Let’s start with some variables; we will need them later.

#!/bin/bash
# We want to download everything.
export VIDEOS_LINK='https://telecomix.ceops.eu/material/ii/videos.txt'
export VIDEOS_RAW_DIR='/var/tbs/tbs/raw/'
export VIDEOS_OGV_DIR='/var/tbs/tbs/videos/'
export VIDEOS_WIKI_ROOT='/var/ikiwiki/TelecomixBroadcastSystem'
export VIDEOS_LIST=${VIDEOS_WIKI_ROOT}/videos.lst
export VIDEOS_NEW=${VIDEOS_WIKI_ROOT}/new_videos.lst

Let’s do some cleaning, and keep a backup of the previous list; we need it to know what’s new.

[[ -e ${VIDEOS_LIST}.old ]] && rm -rf ${VIDEOS_LIST}.old
[[ -e $VIDEOS_LIST ]] && mv $VIDEOS_LIST ${VIDEOS_LIST}.old

Get the new version of the file list

cd $VIDEOS_WIKI_ROOT
wget $VIDEOS_LINK --no-check-certificate -O $VIDEOS_LIST

Update the git repository (we have probably added tags since last time, hence new pages) and find the new videos (a dirty diff, keeping only the added lines).

git pull > /dev/null 2>&1
diff -N $VIDEOS_LIST ${VIDEOS_LIST}.old | grep -e '^<' > $VIDEOS_NEW

Loop over all the new videos to add them to the wiki.

while read LINE
do

This is a bash array, if you did not know how they work. (Field 0 is the leading ‘<’ left over by the diff, which is why the real fields start at index 1.)

        VIDEO=( $LINE )
        DATE=${VIDEO[1]}
        TTAGS=${VIDEO[2]}

Let’s split the tags into words separated by spaces, not by slashes.

        TAGS=$(echo $TTAGS | tr '/' ' ')
        LINK=${VIDEO[3]}

This is how I get the same thing as [4:] in python (from the 4th field to the end of the array).

        COMMENTS=${VIDEO[@]:4:${#VIDEO[@]}}

The date is YYYY-MM-DD in the file; I want it to be YYYY/MM/DD so my file is created in the right place (YYYY/MM/DD.mdwn). That way I get an automagic hierarchy and, as a bonus, you can reach the /2012/02/14 URL quite easily.

The filename is the video link reduced to alphanumeric characters only; that will be good enough for me.

        VIDEO_PATH=$(echo ${DATE}.mdwn | tr '-' '/')
        VIDEO_FILENAME=$(echo $LINK | tr -dc '[:alnum:]')

So, if the directory (which is YYYY/MM) does not exist, let’s create it. If the file does not exist, it means this is the first time we see something for that day: we must create the page and add some stuff (notably, the creation date must be forced, and we add a nice title). Once the file is created, git add it to the repo.

        # We have only updates, which is nice, no need to check if the videos already exist
        [[ ! -d $(dirname ${VIDEOS_WIKI_ROOT}/${VIDEO_PATH}) ]] && mkdir -p $(dirname ${VIDEOS_WIKI_ROOT}/${VIDEO_PATH})
        if [ ! -e ${VIDEOS_WIKI_ROOT}/${VIDEO_PATH} ]
        then
                # Create the page with a forced creation date and a title
                # (the exact original lines were lost; this is a plausible reconstruction)
                echo "[[!meta date=\"${DATE}\"]]" > ${VIDEOS_WIKI_ROOT}/${VIDEO_PATH}
                echo "# Events of ${DATE}" >> ${VIDEOS_WIKI_ROOT}/${VIDEO_PATH}
                git add ${VIDEOS_WIKI_ROOT}/${VIDEO_PATH}
        fi

Add some tags to the page, along with the video template inclusion (one line, really fun); note the .ogv extension added to the filename. That line did not survive the copy-paste, so the sketch below is a reconstruction.
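Something like this, using ikiwiki’s tag and template directives with the parameter names from video.tmpl above (the exact field order is my assumption):

        # Hypothetical reconstruction of the lost line(s)
        echo "[[!tag ${TAGS}]]" >> ${VIDEOS_WIKI_ROOT}/${VIDEO_PATH}
        echo "[[!template id=video file=\"${VIDEO_FILENAME}.ogv\" alt=\"${COMMENTS}\" original=\"${LINK}\"]]" >> ${VIDEOS_WIKI_ROOT}/${VIDEO_PATH}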

And now, download the file. I need to add a dot at the end of the filename, because the download script appends the extension (without the dot) to the file. I download it into a raw dir, where I’ll later transcode all the videos into the proper format and directory.

        # And now, download it
        python ${VIDEOS_WIKI_ROOT}/scripts/multiproc_videos_dl.py ${VIDEOS_RAW_DIR} "${VIDEOS_RAW_DIR}/${VIDEO_FILENAME}." "$LINK" > /dev/null 2>&1 &

done < $VIDEOS_NEW

Commit all the changes at once, and push them.

# While we're at it, just publish the file
git commit -a -m "VIDEO updated" > /dev/null 2>&1
git push > /dev/null 2>&1

We’re done; only the transcoding is left, which is pretty easy and done in another script. Nothing special here: loop over all the files in the raw dir to transcode them into the video dir.

#!/bin/bash
# Transcoding a video into ogv
export ORIG='/var/tbs/tbs/raw'
export DEST='/var/tbs/tbs/videos'

for RAW in $(ls -1 $ORIG)
do
        NAME=${RAW%.*}
        echo "transcoding $NAME"
        [[ -e $DEST/${NAME}.ogv ]] || ffmpeg -i $ORIG/$RAW -acodec libvorbis -ac 2 -ab 96k -b 345k -s 640x360 $DEST/${NAME}.ogv
        rm $ORIG/$RAW
done

Bashing across pictures

Same format as the videos, so almost the same script. I won’t detail it: just sed VIDEO into PICTURE and you’re almost done (see the one-liner below). Also, the download is done using wget --no-check-certificate.
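Taking that sed remark literally, something like this would get you most of the way (script names are the ones from the cronjob below; a manual review of the result is still wise):

sed -e 's/VIDEO/PICTURE/g' -e 's/videos/pictures/g' scripts/dl_video.bash > scripts/dl_pictures.bash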

Bashing the news

Same kind of thing, except that I add the timestamp to each item; besides that, it is just the same.

Cronjobs everywhere

Now I just need to auto-run the three jobs above, plus the transcoding and the ikiwiki-internal command that updates the calendars. I’ve got two cronjobs for that, executed every six hours:

0 */6 * * * /var/ikiwiki/TelecomixBroadcastSystem/scripts/dl_news.bash > /dev/null 2>&1 && /var/ikiwiki/TelecomixBroadcastSystem/scripts/dl_pictures.bash > /dev/null 2>&1 && /var/ikiwiki/TelecomixBroadcastSystem/scripts/dl_video.bash > /dev/null 2>&1 && /var/tbs/transcode.sh > /dev/null 2>&1
0 1/6 * * * ikiwiki-calendar /var/ikiwiki/TelecomixBroadcastSystem.setup "2011/* or 2012/*" 2012

This is the end

Now the wiki builds itself automatically. I then just needed to tweak nginx to suit my needs, but that was really easy to do. I just had to keep in mind that I need two aliases (one for /videos, one for /pictures), because I did not want to commit all the videos into the git directory (that eats a lot of space), and that I have to tell nginx that .ogv files are indeed video files.

server {
        listen   80; ## listen for ipv4
        listen   [::]:80 default ipv6only=on; ## listen for ipv6

        server_name  broadcast.telecomix.org;

        access_log off;

        location / {
                root   /var/www/tbs;
                index  index.html index.htm;
        }

        location /pictures {
                alias   /var/tbs/pictures;
                autoindex off;
        }

        location /videos {
                alias   /var/tbs/videos;
                autoindex off;
        }
}

And I just needed to edit the mime.types file to add these lines at the end:

    video/ogg                             ogm;
    video/ogg                             ogv;
    video/ogg                             ogg;

That’s it, everything works fine now. One final thing was needed to spread it easily (and that’s why I wanted static pages): easing the process of mirroring. The best way to do this is to run rsync in daemon mode with three read-only modules.

Installing rsync is a piece of cake:

aptitude install rsync

You then need to enable it on Debian; for this, editing the file /etc/default/rsync is the way to go. I wanted to throttle it down and keep it nice on the I/O (because I already have too many processes eating my cpu, transcoding for instance), so I’ve set these options in the same file:

RSYNC_ENABLE=true
RSYNC_OPTS='--bwlimit 200'
RSYNC_NICE='10'
RSYNC_IONICE='-c3'

And then, in /etc/rsyncd.conf, I’ve added these modules:

max connections = 10
log file = /dev/null
timeout = 200

[tbs]
comment = Telecomix Broadcast System
path = /var/www/tbs
read only = yes
list = yes
uid = nobody
gid = nogroup

[videos]
comment = Telecomix Broadcast System - videos
path = /var/tbs/videos
read only = yes
list = yes
uid = nobody
gid = nogroup

[pictures]
comment = Telecomix Broadcast System - pictures
path = /var/tbs/pictures
read only = yes
list = yes
uid = nobody
gid = nogroup

And that’s it: people can now duplicate the whole thing onto a simple web server (they just need the space) without needing anything on it other than serving web pages.
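Mirroring then boils down to one rsync invocation per module; assuming the host and module names from the configs above, roughly:

rsync -av rsync://broadcast.telecomix.org/tbs /var/www/tbs/
rsync -av rsync://broadcast.telecomix.org/videos /var/tbs/videos/
rsync -av rsync://broadcast.telecomix.org/pictures /var/tbs/pictures/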

