+[[!meta date="Thu, 27 Feb 2014 20:22:00 +0100"]]
+
# Debian: watch your stats!
Over the past few weeks, myself and Matthieu Caneill have worked quite a bit on
[**Debsources**](http://anonscm.debian.org/gitweb/?p=qa/debsources.git). As we
-have now deployed most of them on <http://sources.debian.net>, it's time for
-another "what's new with Debsources?" post. Here is what's new:
+have now deployed most of the new features on <http://sources.debian.net>, it's
+time for another *"What's new with Debsources?"* blog post. Here is what's new:
-* Debsources now knows about **suites**, i.e. it knows which package is in
- which Debian "release" (stable, testing, unstable, ...). This knowledge is
- already used by some of the features below and will be used more in the
- future.
+* Debsources now knows about Debian **suites**, i.e. which package is in which
+ "release" (stable, testing, unstable, ...). This knowledge is already useful
+ for some of the other features below and will be used more in the future.
* [[since last summer|2013/09/sources.debian.net_-_advanced_search_and_other_news]]
- Debsources has been running **sloccount** on all unpacked source packages
- (together with ctags and du), but the resulting information wasn't exposed on
+ Debsources has been running **sloccount** on all unpacked source packages,
+ together with ctags and du, but the resulting information wasn't exposed on
the Web. This is now fixed. Each package now has an **infobox**
([example](http://sources.debian.net/src/linux/3.2.54-2)) which shows: disk
usage, archive area, suites, and sloccount with per-language breakdown. The
new infobox also subsumes the old puny list of package links.
-* we now gather and plot accurate
+ You can easily embed the infobox in other webapps if you need to
+ ([example](http://sources.debian.net/embed/pkginfo/linux/3.2.54-2/)). Check
+ the [URL scheme doc](http://sources.debian.net/doc/url/) for more info.
+
+* Debsources now gathers and plots accurate Debian sources
[**statistics**](http://sources.debian.net/stats/), both overall and
- per-suite, about both the current content of Debsources and its **historical
- trends**. (Yeah, I know, the charts are not particularly good looking ATM,
- but that's easy to change without impacting the rest. If you're a
- [matplotlib](http://matplotlib.org/) artist willing to help, please step
+ per-suite, in both snapshot and **historical trends** flavors.
+
+ (Yeah, I know, the charts are not particularly good looking ATM, but that's
+ easy to change without impacting the rest. So if you're a
+ [matplotlib](http://matplotlib.org/) artist and willing to help, please step
forward!)
-* many changes have been going on also on the **plumbing** layer to make the
- service less resource hungry, in view of a migration to the official Debian
- infrastructure (which I've in the meantime started discussing with DSA), in
- particular:
+* many changes have also been going on at the **plumbing** layer to make the
+ service less resource hungry and more maintainable, in view of a migration to
+ the official Debian infrastructure --- which I've in the meantime started
+ discussing with DSA. Some highlights:
+
+ * Debsources now has a rather comprehensive **test suite**, built using
+ [Nose](https://nose.readthedocs.org/en/latest/). Most notably, we do test
+ full update runs down to source unpacking (of a small subset of a Debian
+ mirror), DB injection, and plugin execution --- which is quite neat.
* the updater is now much faster (about 2x) and might require, in
pathological cases, 10x *less* memory than before. Memory usage now caps at
- around 300MB when injecting ctags for large packages like linux, chromium,
- and libreoffice.
-
- * the DB schema went through various refactoring cycles, and most notably now
- uses a **separate file table** to index all known source file paths. In the
- past path information was duplicated throughout the checksums and ctags
- tables, not only wasting DB space, but also making the presence of file
- information conditional on the enablement of at least one of the two
- corresponding plugins. This is now fixed---and migrating the full DB has
- been quite "fun". Unfortunately, we've also added quite a few large-ish
- indexes, resulting in no significant changes in DB size (currently at
- ~50GB), but at least in much faster queries :-) The next step on this front
- will be the addition of path-based searches, using the excellent Postgres
+ around 300MB, even when injecting ctags for large packages such as linux,
+ chromium, and libreoffice.
+
+ * the DB schema went through several refactoring cycles, and now uses a
+ separate **file table** to index all known source file paths. In the past
+ path information was duplicated across the checksums and ctags tables, not
+ only wasting DB space, but also making the presence of file information
+ conditional on the enablement of at least one of the two corresponding
+ plugins. This is now fixed --- and migrating the full DB has been quite
+ "fun". Unfortunately, we've also added quite a few large-ish indexes,
+ resulting in no significant overall changes in DB size (currently at
+ ~50GB), but at least in much faster queries :-)
+
+ The next step on this front will be the addition of path-based searches,
+ using the excellent Postgres
[trigram indexes](http://www.postgresql.org/docs/9.1/static/pgtrgm.html).
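
To illustrate the separate file table idea, here is a minimal sketch. It uses Python's built-in `sqlite3` module instead of Postgres so that it runs anywhere, and all table and column names (`files`, `checksums`, `ctags`, `file_id`, ...) as well as the sample package are assumptions made up for the example, not the actual Debsources schema.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory DB where paths live in one table only.

    Hypothetical schema: plugin tables point at files by id instead of
    each storing its own copy of the path.
    """
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE files (
            id      INTEGER PRIMARY KEY,
            package TEXT NOT NULL,
            path    TEXT NOT NULL,
            UNIQUE (package, path)
        );
        CREATE TABLE checksums (
            file_id INTEGER NOT NULL REFERENCES files(id),
            sha256  TEXT NOT NULL
        );
        CREATE TABLE ctags (
            file_id INTEGER NOT NULL REFERENCES files(id),
            tag     TEXT NOT NULL,
            line    INTEGER NOT NULL
        );
    """)
    return conn


def add_file(conn, package, path):
    """Register a source file once and return its id."""
    cur = conn.execute(
        "INSERT INTO files (package, path) VALUES (?, ?)", (package, path)
    )
    return cur.lastrowid


conn = build_demo_db()
hello_c = add_file(conn, "hello_2.8-4", "src/hello.c")
readme = add_file(conn, "hello_2.8-4", "README")

# Only one plugin produced data, and only for one of the two files.
conn.execute(
    "INSERT INTO ctags (file_id, tag, line) VALUES (?, ?, ?)",
    (hello_c, "main", 42),
)

# Every file is known, whether or not any plugin has rows for it.
paths = [row[0] for row in conn.execute("SELECT path FROM files ORDER BY path")]

# Plugin data is joined back to its path when needed.
tagged = conn.execute(
    "SELECT f.path, c.tag FROM ctags c JOIN files f ON f.id = c.file_id"
).fetchall()
```

In this layout `README` shows up in `paths` even though neither plugin table mentions it, which is the point made above: file information no longer depends on a plugin being enabled, and each path is stored only once.
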
Want more? Sure, we'll be happy to! But it'll happen faster if you
[announcement](https://lists.debian.org/debian-devel-announce/2014/02/msg00009.html))
and we're looking forward to mentor new contributors.
-[[!tag lang/english planet-debian debian qa debsources draft]]
+[[!tag lang/english planet-debian debian qa debsources]]