EuroPython 2012

Guido van Rossum

Write unit tests - write integration tests.

Keep being clever. Don’t think your tools will solve all the problems. If you want to be naive, then delegate software development to someone else, and go and make money or something!

Almost all functionality on YouTube is written in python (apart from the codecs). python clearly is fast enough for real work.

Functional programming - interesting computer science stuff - not useful very often when integrating third party packages in the real world.

Questions and Answers

assert, why isn’t it used?

It is used by many people. Test your code with -O (the optimisation flag, which strips out assert statements). Even if -O did not exist, Guido would still say that assert is a valid part of the language. Clever because execution only continues past an assert when its condition is true, and under -O the check is not evaluated at all.

It is misguided to use the assert statement for unit tests, because -O would disable all your tests.
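A minimal sketch of the point above, using only the standard library: it runs a failing assert in a fresh interpreter with and without the optimisation flag (a capital -O). The snippet text and the helper name are made up for illustration.

```python
import subprocess
import sys

# A failing assert followed by a print; the print only runs if the assert is skipped.
SNIPPET = "assert 1 == 2, 'boom'\nprint('assert was skipped')"


def run(*flags):
    """Run SNIPPET in a fresh interpreter with the given flags; return (exit code, stdout)."""
    result = subprocess.run(
        [sys.executable, *flags, "-c", SNIPPET],
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout.strip()


if __name__ == "__main__":
    print(run())      # non-zero exit code: the AssertionError goes to stderr
    print(run("-O"))  # the assert is stripped, so the print runs
```

This is why unittest's self.assertEqual and friends are ordinary method calls: they keep working under -O.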

Alex Martelli

Permission or Forgiveness

Sometimes the best answer is neither!

The only way Grace Hopper could innovate so much in the bureaucratic US Navy was by doing it first and asking for forgiveness later. See “Why does it tend to work” in the slides (funny, but true).

if isinstance(x, numbers.Number):

… is an idiomatic, well-supported way to check against an abstract base class (a “typeclass”), but don’t forget: not ALL problems are nails.

Even better, when applicable, use defaults:

d.get(key, 'default')
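The three styles can be compared side by side in a short sketch; the function names are made up for illustration, only numbers.Number and dict.get come from the talk.

```python
import numbers


def double_lbyl(x):
    """Look before you leap: ask permission with an ABC check first."""
    if isinstance(x, numbers.Number):
        return x * 2
    raise TypeError(f"not a number: {x!r}")


def lookup_eafp(d, key):
    """Easier to ask forgiveness than permission: just try, and handle the failure."""
    try:
        return d[key]
    except KeyError:
        return "default"


def lookup_neither(d, key):
    """Neither: the API supplies the fallback, so there is nothing to check or catch."""
    return d.get(key, "default")
```

The last version is the shortest and has no separate failure path to test, which is the point of “even better, use defaults”.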

Concurrency - perform desired operations within a transaction. Databases have been doing this for a while.

Favourite example is source code control: change your local copy, commit, detect conflicts, and be forced to reconcile them.
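The “change a copy, commit, detect conflicts” cycle is optimistic concurrency. A toy sketch of it, assuming a single-process, in-memory store with a version counter (all names are hypothetical):

```python
import copy


class ConflictError(Exception):
    """Raised when someone else committed after we took our copy."""


class VersionedStore:
    """Toy optimistic-concurrency store: work on a copy, commit against the version you started from."""

    def __init__(self, data):
        self._data = copy.deepcopy(data)
        self._version = 0

    def checkout(self):
        # Hand out a private copy plus the version it was taken at.
        return copy.deepcopy(self._data), self._version

    def commit(self, new_data, base_version):
        # Detect conflicts: refuse the commit if the store moved on in the meantime.
        if base_version != self._version:
            raise ConflictError(f"based on v{base_version}, store is at v{self._version}")
        self._data = copy.deepcopy(new_data)
        self._version += 1
        return self._version

    def read(self):
        return copy.deepcopy(self._data)
```

A caller that gets ConflictError reconciles by checking out again, reapplying its change and committing, the same loop as merging in source control. A real store would make the compare-and-commit step atomic, e.g. inside a database transaction.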

Also see “Launching a Product”: very interesting to compare and think about the different strategies.

Mandate code review. Pair programming is not a good alternative: people who were not actively involved in writing the code will spot bits which are unclear. Unpopular with young hotheads. Privacy and security cannot be an afterthought! Hold architectural, privacy and security reviews.

The benefits of paranoia: skills that make good programmers aren’t suitable for human interaction.

Febo Cincotti

Let your brain talk to computers

Brain-computer interfaces (BCIs).

http://www.bcistandards.org/

http://www.brindisys.it/

PyPy - Current status and GIL-less future

You don’t want to program using threads.

Lightning Talks

Transifex

Translate into many different languages.

How I became a bitterman

Joined Canonical to port Ubuntu One (based on Twisted) to Windows. Or, why I hate Windows paths (some funny examples): os.path.normpath doesn’t work, os.path.split doesn’t work, os.listdir doesn’t work! SHFileOperation and ShellLink (from Microsoft) don’t work either!!

eGenix, PyRun

A python runtime environment in a single, very small file. Installs without any side effects: no installer, just copy the file. Independent of the system python installation, 12MB, etc, etc…

Can be used for commercial projects. Runs on Linux and Mac (Windows support might follow).

ScraperWiki

A tool for analysing data online and collaborating on it.

Tell us how you would do it.

SQLAlchemy and Testing

github.com/riffm/testalchemy

The most reliable way is to recreate the database schema for every test, but this is very slow. Wrapping each test in a transaction and rolling it back afterwards is fast.
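The transaction-per-test idea can be sketched with the standard library's sqlite3 and unittest modules. This is not testalchemy's API; with SQLAlchemy the same pattern is usually built by opening a connection, beginning a transaction on it and binding the session to that connection. Table and class names are hypothetical.

```python
import sqlite3
import unittest


class RollbackTestCase(unittest.TestCase):
    """Create the schema once per class; wrap every test in a transaction that is rolled back."""

    @classmethod
    def setUpClass(cls):
        # The slow part, done once: connect and create the schema.
        # isolation_level=None turns off sqlite3's implicit transactions, so we control BEGIN/ROLLBACK.
        cls.conn = sqlite3.connect(":memory:", isolation_level=None)
        cls.conn.execute("CREATE TABLE users (name TEXT)")

    @classmethod
    def tearDownClass(cls):
        cls.conn.close()

    def setUp(self):
        self.conn.execute("BEGIN")

    def tearDown(self):
        # Throw away whatever the test wrote, leaving the schema in place.
        self.conn.execute("ROLLBACK")

    def count_users(self):
        return self.conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]


class UserTests(RollbackTestCase):
    def test_insert(self):
        self.conn.execute("INSERT INTO users VALUES ('alice')")
        self.assertEqual(self.count_users(), 1)

    def test_table_starts_empty(self):
        # Passes in any test order, because test_insert's row was rolled back.
        self.assertEqual(self.count_users(), 0)
```

Run it with python -m unittest; the one caveat is that code under test must not commit on its own, or the rollback has nothing to undo.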

See Restorable context manager in the source code.

My guess would be that this module is not very useful.

PSF Members Meeting

All invited to the meeting at 6:30pm. You can become a member…

OpenTaste.eu

Even python developers must eat. The site is built with Flask :)

RedEddy

Technical computing in the cloud

http://kivy.org/#gallery

Not WIMP, but mobile, tablets, OpenGL, touch, etc, etc.

Cross-platform: Windows, Android, Linux, iOS. Applications have been accepted into the Apple App Store. All rendering is done on the GPU with OpenGL.

Check out the gallery, or search for kivy on Google Play.

Talk on Friday to share experiences with OpenGL and python on Android/iOS.

http://brochure.getpython.info/

Python Software Foundation: the printed brochure is ready!

Sponsorship options (including education).

Psycopg

Psycopg is a wrapper written as an old-fashioned C extension. PyPy has tried to do something different: Alex Gaynor tried to write a subset of Psycopg, and another person tried to port it using ctypes. It is now written in pure python on top of a ctypes wrapper. People started using it, and now we have some problems. Not yet working on python 3. Need somebody to take charge of PyPy integration. Could end up dropping the original Psycopg.

http://readthedocs.org/projects/artichoke/

Web micro framework. One file (very clever presentation). Similar to TurboGears.

DocBook to Sphinx

Sphinx: 30% faster by skipping index generation. Solr/Lucene does it better anyway.

Natural language processing and geocoding

There is not a perfect solution, but that doesn’t matter. The beginning is usually good enough. The first 70-80% is easy and fun. To get started, read the NLTK documentation.

MoinMoin - Whoosh

Fast, pythonic, pure python search library. It is rather nice :) Can do highlighting and has a built-in spell checker. It is a library, not a separate server or process: just import it and use it.

Dynamic fields are a nice feature: you don’t need to know the name of the field in advance, just give a glob pattern as the field name.

Works with Google App Engine.

You can also put a time limit on searches.

PythonAnywhere

PyCon UK

Alex: PyCon UK is not dead. Coventry TechnoCentre, the weekend starting Friday 28th September; early bird tickets from Friday 6th July.

Book tickets using Eventbrite.

Building JavaScript Widgets

Fitocracy.com

Django application. We spend too much time sitting… but we do like playing games.

pyRserve

A network bridge from python to the statistics package R

Social Eating Revolution

Gian Luca Ranno, Gnammo

Technology - Django, already have 2360 users and more than 30 events (within one year).

Django (easy to learn for designers), RabbitMQ, logging, social-auth, django-paypal, Fabric, Supervisor

Thank you for creating these great plugins.

Uploading to PyPI

Pelle

All of us should upload stuff. It is very easy to do.

@peralmq

Large XML with Unicode and namespaces

Need to stream…

Some people use codecs.open() and write the markup by hand, but this feels like desperation.

We wrote our own, loxun: pure python, only writes XML, streams its output and raises an error for some mistakes. Try it!!
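For comparison, the standard library can also stream namespaced, Unicode XML without building a tree: xml.sax.saxutils.XMLGenerator writes each event straight to the output file. A minimal sketch (not loxun's API; the namespace URI and element names are made up):

```python
import io
from xml.sax.saxutils import XMLGenerator

NS = "http://example.com/ns/items"  # hypothetical namespace URI


def write_items(out, items):
    """Stream an <ex:items> document to a binary file object, one element at a time."""
    gen = XMLGenerator(out, encoding="utf-8", short_empty_elements=True)
    gen.startDocument()
    gen.startPrefixMapping("ex", NS)  # declared as xmlns:ex on the next element
    gen.startElementNS((NS, "items"), "ex:items", {})
    for item in items:
        gen.startElementNS((NS, "item"), "ex:item", {})
        gen.characters(item)  # escapes &, < and > for us
        gen.endElementNS((NS, "item"), "ex:item")
    gen.endElementNS((NS, "items"), "ex:items")
    gen.endPrefixMapping("ex")
    gen.endDocument()


if __name__ == "__main__":
    buf = io.BytesIO()
    write_items(buf, ["fish & chips", "caf\u00e9"])
    print(buf.getvalue().decode("utf-8"))
```

Because nothing is held in memory except the current element, items can be any iterable, including a generator reading a huge input file.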

How not to write a micro-framework

Daniel Pope
@lordmauve

Qubes - a secure client OS

Following #euro2012

NHS Hack Day 2012

22-23rd September in Liverpool

Django Bitcoin

django-bitcoin
Open source currency that governments cannot control.

MoinMoin

Rewritten - now version 2 - look at it again!
Jinja2, Flask, https://bitbucket.org/jek/flatland/ for validating forms
HTML5
Supports RST

batou

Christian Theune

OpenStack and OpenShift

http://www.pixelbeat.org/talks/openstack_python/

Operational details of a large python project.

What is OpenStack?

IaaS (like Amazon AWS)
Open Source
2 years old.
Mainly written in python (300k lines of python)

Public or Private

Sensitive logic or data
Amazon have partnered with Eucalyptus to offer a private cloud (March 2012)
OpenStack aims to support public and private.

Who uses OpenStack?

Rackspace
HP
Supports multi-tenancy

Overview

Nova == EC2 (central service)
Swift == S3
Glance, VM image, registration
Keystone, identity and auth
Horizon, Admin UI (Django)
Quantum, networking
Volume, EBS

Compute Service

Postgres or MySQL
Choice of queue e.g. RabbitMQ

python Technologies

SQLAlchemy, Django, eventlet, paste, PasteDeploy, httplib2, webob, routes, python-cloudfiles, sqlalchemy-migrate etc, etc.

Project Packaging

Consume through distros. Difficult to install yourself!!
6 monthly release cycle.

Development

Always on trunk
Releases done to stable branch
git
Gerrit, patch review server (created by Google for Android). Looks nice :)
Jenkins (requires a lot of ongoing maintenance).

Commit process

git branch; git commit
./run_tests, unit tests within a virtualenv, nose used, pep8
git-review python tool, used to submit to gerrit

Related python projects

Oz, ISO - image - glance - nova - libvirt - KVM
Heat API, like AWS CloudFormation: provision apps in the cloud.
cloud-init, package install etc, https://launchpad.net/cloud-init/

Similar non python projects

oVirt, Java, for private data centres
Eucalyptus, C, less general, closed editions, EC2 functionality
CloudStack, Java, parts of this are closed
OpenNebula, C++ datacentres

Try it

fedoraproject.org/wiki/Getting_started_with_OpenStack

You can run this on a VM!!

OpenShift

https://github.com/openshift/

PaaS, you code the application, you want to deploy it. You don’t want to care about the deploy stuff.

A free PaaS by Red Hat, hosted at openshift.com and based on Amazon EC2.

python, java, node.js, php, perl, ruby

MySQL, PostgreSQL, MongoDB

Your app will still use the basic stuff

Open source project, tutorials, live CD, runs in a VM, Apache 2 license

Can be run on OpenStack

Run your own multi-node, multi-tenant PaaS using OpenShift, OpenStack and Fedora on your own hardware.

A cartridge adds resources to your application e.g. PostgreSQL or MongoDB.

Check out the django-example on GitHub

3 free apps on their hosted version.

juju - Service Orchestration and Deployment

james.page@ubuntu.com

jamespage on IRC
#juju

Written in python and Twisted. Coordinates service installation onto Ubuntu servers. Does not replace Puppet or Chef; e.g. juju can deploy a database and an application. Can also scale up and down horizontally. Abstracts you from your underlying infrastructure.

The local provider is for developers. Can deploy onto EC2, OpenStack or bare metal servers.

Charms

Can be written in any language
Have a well defined structure.
Has configuration options to allow the application to be personalized.
Hooks - install, start, relation related (join, change), upgrade
Store - several charms built. By default, charms will come from here.

Demo

Install demo Django application - Summit
Charm written by Mark Mimms, and uses Michael Nelson’s generic Django charm
The majority of configuration is done using Puppet.
Start (after installing local provider): juju bootstrap
juju deploy postgresql
juju deploy memcached
juju deploy --config europython.yaml local:summit europython-summit
(Summit is not in the Charm store, which explains the previous command)
juju add-relation europython-summit postgresql:db
juju add-relation europython-summit memcached
juju expose europython-summit
(service is made public when it is exposed)
juju status europython-summit
juju debug-log (a log aggregation)
juju debug-hooks (uses a tmux session)

PostgreSQL charm cannot currently scale…

OpenStack is probably the most complicated charm set because of the multitude of options. The Ubuntu MAAS (Metal as a Service) project was started to allow juju to install onto bare-metal servers. It takes about 11 minutes to install OpenStack onto 9 servers.

http://tinyurl.com/juju-at-scale: testing on EC2 with 2000 Hadoop nodes. It took 7 or 8 hours to provision!

A couple of third party projects, https://launchpad.net/juju-jitsu/ and https://launchpad.net/charm-tools

Questions

Check out the Summit and Evernote charms to see how to pull code from source control systems.

Switching provider is dead easy… Can use the current version of Ubuntu Server. On the client, it works with OS X and Debian.

Be careful with your data - the current charms might not take care of it

No automation for intelligent scale up and scale down, but juju will replace nodes which disappear.

There is a proposal to support verification (e.g. smoke tests) at some point in the future.

Diving into Flask

A. Mishkovskyi

Switched away from PHP to python with Flask. Second largest social network in the Netherlands.

Considered Django, Pyramid etc.

Started with a simple application. Bus factor of one (everything written by one person). Loads of code behind the simple-looking starting app.

You end up with complex routes and loads of parameters… Flask lets you do things in many different ways, e.g. method-specific parameters. How does this work? Explored class-based views. You can use manual dispatching, but it is much easier to use a decorator… Or use class-based views with method names, e.g. get()… Check out the source code of the View class.
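Flask's real mechanism for this is flask.views.MethodView, registered with app.add_url_rule(..., view_func=SomeView.as_view('name')). The core idea is small enough to sketch without Flask; the classes below are hypothetical and this is not Flask's source.

```python
HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options"}


class MethodDispatchView:
    """Toy class-based view: route a request to the method named after its HTTP verb."""

    def dispatch(self, method, **url_params):
        name = method.lower()
        # Only real HTTP verbs are looked up, so e.g. "DISPATCH" cannot call this method.
        handler = getattr(self, name, None) if name in HTTP_METHODS else None
        if handler is None:
            return 405, "Method Not Allowed"
        return handler(**url_params)


class UserView(MethodDispatchView):
    def get(self, user_id):
        return 200, f"user {user_id}"

    def delete(self, user_id):
        return 204, ""
```

Each verb gets its own small method instead of one view function full of if request.method == ... branches.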

Routing… A Rule creates a regexp, a Map holds all rules, and converters map URL parts to python values. Rules can match the URL and the subdomain. Rule objects are stored in the Map in sorted order.
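Flask delegates routing to Werkzeug. The Rule/Map/converter split can be sketched in pure python; this is a simplified illustration, not Werkzeug's code, and it ignores subdomains, HTTP methods and Werkzeug's real sort key.

```python
import re

# Hypothetical converter table: name -> (regex fragment, function turning the match into a python value)
CONVERTERS = {
    "default": (r"[^/]+", str),
    "int": (r"\d+", int),
}

_VARIABLE = re.compile(r"<(?:(?P<conv>\w+):)?(?P<name>\w+)>")


class Rule:
    """Compile a pattern such as '/user/<int:id>' into a regex plus converters."""

    def __init__(self, pattern, endpoint):
        self.endpoint = endpoint
        self.converters = {}
        regex, pos = "", 0
        for m in _VARIABLE.finditer(pattern):
            regex += re.escape(pattern[pos:m.start()])  # static text is matched literally
            fragment, convert = CONVERTERS[m.group("conv") or "default"]
            regex += f"(?P<{m.group('name')}>{fragment})"
            self.converters[m.group("name")] = convert
            pos = m.end()
        regex += re.escape(pattern[pos:])
        self.regex = re.compile(regex)

    def match(self, path):
        m = self.regex.fullmatch(path)
        if m is None:
            return None
        return {name: self.converters[name](value) for name, value in m.groupdict().items()}


class Map:
    """Hold all rules, trying static rules before rules with variables."""

    def __init__(self, rules):
        # A crude stand-in for Werkzeug's more elaborate rule ordering.
        self.rules = sorted(rules, key=lambda rule: len(rule.converters))

    def match(self, path):
        for rule in self.rules:
            args = rule.match(path)
            if args is not None:
                return rule.endpoint, args
        raise LookupError(f"no rule matches {path!r}")
```

Sorting is what lets /user/me win over /user/<id> regardless of the order in which the rules were declared.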

Modular applications are easier to develop, and pluggable. Blueprints: they needed API versioning (url_prefix) and to split the admin and API endpoints; each blueprint has its own template folder. Blueprints are simple proxy objects. A great example for writing plugins.
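The “simple proxy object” idea is roughly this: a blueprint does not touch the application when routes are declared; it records them and replays them when it is registered. A toy sketch of that idea with hypothetical classes (not Flask's implementation; in Flask you would call app.register_blueprint(bp, url_prefix='/api/v1')):

```python
class ToyApp:
    """Stand-in for an application object: just a table of URL rules."""

    def __init__(self):
        self.routes = {}

    def add_route(self, rule, view):
        self.routes[rule] = view


class ToyBlueprint:
    """Record route registrations now; replay them onto an app later, under a URL prefix."""

    def __init__(self, name):
        self.name = name
        self._deferred = []  # callables to run against the app at registration time

    def route(self, rule):
        def decorator(view):
            self._deferred.append(lambda app, prefix: app.add_route(prefix + rule, view))
            return view
        return decorator

    def register(self, app, url_prefix=""):
        for record in self._deferred:
            record(app, url_prefix)
```

Deferring the work is what makes url_prefix-based API versioning cheap: the same recorded routes can be replayed under a different prefix.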

Wanted to use a proper ORM. There is only one: SQLAlchemy… It is not obvious how Flask-SQLAlchemy actually works; reading its code helps with debugging. A bind is the SQLAlchemy engine or a plain connection object. It is bare metal, so you really have to know what you are doing. If you specify __bind_key__ on a model, it will do the proper thing… See get_bind in Flask-SQLAlchemy. To achieve master-slave support: db.session.using_bind('slave')... (custom code)! You could also use binds for sharding etc… They use SQLAlchemy-migrate, which is very old and not actively maintained, and had to write a wrapper to run migrate. Consider switching to Alembic, written by Mike Bayer, which is very mature now.

Deferring tasks. You can now use Celery with Flask, which removes the hassle of using amqplib/pika. The documentation is confusing and misleading. Flask-Script is a requirement for Flask-Celery. Most of the commands work!! It is over-engineered in many ways. Celery colourizes logs, and they don’t like colours! Solution: connect to the after_setup_logger signal and reassign their own formatters. They also set CELERYD_HIJACK_ROOT_LOGGER to False, but this caused more problems. Solution: do not use the root logger!! The issue is two years old, but nothing has been fixed. To monitor Celery, subclass Polaroid… Celery + SQLAlchemy + MySQL issues: the solution was to drop the whole connection each time the worker starts, which loses all connections (sounds like a complete nightmare)!!
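The “do not use the root logger” fix can be sketched with the standard logging module alone: give the application a named logger with its own handler and formatter, and stop propagation so whatever a library does to the root logger cannot touch your output. The logger name is hypothetical, and Celery's after_setup_logger signal is not shown.

```python
import io
import logging


def make_app_logger(stream, name="myapp"):
    """Configure a named logger with its own plain formatter, independent of the root logger."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))
    logger.handlers = [handler]  # replace rather than append, so repeated calls don't duplicate output
    logger.propagate = False     # records never reach root-logger handlers (or their colourful formatters)
    return logger


if __name__ == "__main__":
    buf = io.StringIO()
    make_app_logger(buf).info("task finished")
    print(buf.getvalue(), end="")  # INFO myapp: task finished
```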

Flask-Cache: all their views are non-cacheable, so it was not very useful, and they wrote their own. libredis was in pretty bad shape; they have improved it.

Flask-DebugToolbar is very good at identifying bottlenecks. It is also a very good example of blueprint-based design.

Flask is no longer an April Fool’s joke. Still micro, but not in terms of features.

The extension ecosystem is not on a par with Flask itself in places, and interoperability is rough in places. There is no BDFL for extensions, so you do not know which ones to use.

How to bootstrap a startup using Django

Gidsy is a marketplace where anyone can explore and book activities.

@gidsynews
@__philw__
@jezdez (Django core developer)

Why we chose Django

Big community
Network
Language
Many problems already solved
The admin
Proven technology by similar use cases
Stable APIs in a well defined release process
Good documentation with focus on prose
Huge community of 3rd party components (2600 apps on chishop)

Haystack

You write python, and it integrates with Django. The city page on the site is based on search technology. Spatial data will be very important in the future.

Customisable search abstraction
Indexing, filtering, faceting, “more like this”
Spatial search and sorting

Tastypie

Hooks easily into Django.

Highly customisable Web API library
Hooks for auth, throttling, caching, custom serialisation etc
Backbone.js compatible

Django has a very strong paradigm.

Celery

Async code execution, cronjobs (a few periodic tasks)
Thumbnails, search index updates, caching etc.
Collect stats without blocking

Very easy to put on a separate server.

Memcache

Periodic cache refreshing for high traffic sites
Fragment caching with dates and cache version
Cache warmup during deployment.

Using Celery to build data for pages. 37signals had a great article on this a few months ago. Special field, refresh_date… if something was changed by the user, then all keys are invalidated. Tried Redis, but were not completely happy with it; have found memcache super simple.
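The refresh_date trick amounts to key-based invalidation: put a value that changes on every user edit into the cache key. A sketch under that assumption; refresh_date comes from the talk, while the other names and the dict-based cache (a stand-in for a real memcache client) are hypothetical.

```python
class FakeCache:
    """Dict-backed stand-in for a memcache client (get/set only, no eviction)."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def set(self, key, value):
        self._store[key] = value


def fragment_key(name, obj_id, refresh_date):
    # The refresh date is part of the key, so bumping it makes every old key unreachable;
    # nothing has to be deleted, and memcache eventually evicts the stale entries.
    return f"{name}:{obj_id}:{refresh_date}"


def cached_fragment(cache, name, obj, render):
    key = fragment_key(name, obj["id"], obj["refresh_date"])
    html = cache.get(key)
    if html is None:
        html = render(obj)
        cache.set(key, html)
    return html
```

A user edit only has to bump refresh_date; the next request misses the cache and re-renders, which is also what a Celery warm-up task would do ahead of time.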

A strong pattern: framework-based solutions from the Django community, so you don’t have to think about all these things. A major benefit of the Django ecosystem.

Workflow

Main branch is always deployable
Development happens in feature branches
Code reviews via pull requests
Shared responsibility

Stopped using gitflow and now use the GitHub workflow, which has been very successful. Code review is a great way to improve quality. Shared responsibility.

Testing

Separation of fast and slow tests
Full test suite via Jenkins, soon Travis CI
Fast tests locally via tox

If you use the github workflow, Travis CI will test the feature branches :) Super important to make sure the product is ready for the customers.

Releasing

virtualenv (wrapper) + pip
localshopi for in-house software releases
django-setcon for Django configuration. Class based settings files.
foreman for process management (written in Ruby).

Using foreman, one command will set up an environment for a developer. Not using virtual machines because they take too long to set up.

Scaling Up

Initial set-up by hand.

Gets more difficult, each server downloads dependencies, external services could be down, which server is in charge?

Built their own deployment server which builds the latest release as a tar file.

Builds are virtual environments
Atomic and orchestrated releases
collectstatic, migrate and other command centralised.
Web interface for deploying and rolling back
Pushes status updates to New Relic and HipChat

Will be released as an open source package soon.

Provisioning

Follows DRY
Chef/Puppet/Salt (decided to use Chef)
Documents infrastructure and change
Place to share and store secure data
Roles can be on one or many servers
Challenge - separating deployment from the application.

Dependencies come from the deployment server. On a new deploy, there is really nothing that Chef needs to do.

Use knife to run commands on the servers e.g. knife ssh "role:web" "sudo chef-client"

Instagram tool, ec2-ssh, has a simple syntax and the name never changes.

pychef to access node data and manipulate it with python. Use it in fabric.

Operations

Log everything you could need for debugging
If you deploy often then you need immediate feedback
Use services if you can: Mixpanel, NewRelic, Librato, Papertrail, Pagerduty

django-app-metrics to push data to the services.

Summary

Only scale when you need to, but be prepared
Be pragmatic
Automate
Continuous integration and continuous deployment.
Make routine tasks really easy

Questions

The Django community is smaller than the Ruby one, but it is not necessarily harder to recruit.

Logging Module

http://lokai.redholm.com/

Two targets for notifications: data related (errors in a file, new data, warnings) and system related (all other errors).

Requirements

Route messages to different people
Accumulate messages relating to a single input
Remember types of messages to decide action
Store messages as actions in the database.

Development Requirements

Simple API
Avoid passing notification objects from place to place
A single process might handle many files in sequence

Logger

Root logger:

basicConfig()

Get the logger and send a message:

getLogger().error(error_message)

Or… use a named logger:

my_logger = getLogger('main.special')

Handler does the actual output…

Might be helpful to think of it as follows:

What went wrong - Message
Where did it go wrong - Logger name
How important is it - Level
Who needs to know - Handler

Filters are given the log record itself (not a copy, so they can also modify it). The message is not processed any further if a filter rejects it.

Multiple handlers can be defined.
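The Lokai requirements map fairly directly onto handlers and filters. A minimal sketch using only the standard logging module; the logger names, the in-memory handler and the routing rule are assumptions for illustration, not Lokai's code. Plain callables as filters need python 3.2 or later.

```python
import logging


class ListHandler(logging.Handler):
    """Keep (logger name, level, message) tuples in memory, e.g. to store later as database actions."""

    def __init__(self, level=logging.NOTSET):
        super().__init__(level)
        self.records = []

    def emit(self, record):
        self.records.append((record.name, record.levelname, record.getMessage()))


def build_routing(root_name="main"):
    """Send '<root>.data...' messages to the data owner and other errors to operations."""
    main = logging.getLogger(root_name)
    main.setLevel(logging.DEBUG)
    main.propagate = False

    data_only = logging.Filter(f"{root_name}.data")  # passes that logger and its children

    data_handler = ListHandler()  # who needs to know: whoever owns the input data
    data_handler.addFilter(data_only)

    system_handler = ListHandler(logging.ERROR)  # who needs to know: operations, errors only
    system_handler.addFilter(lambda record: not data_only.filter(record))

    main.handlers = [data_handler, system_handler]
    return data_handler, system_handler
```

Code that processes a file just logs to getLogger('main.data.<something>'); no notification objects are passed around, and the handlers decide who hears about it.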

Logger hierarchy

Messages sent to X.Y.Z are also passed to the handlers of X.Y and X (subject to those handlers’ levels and filters).
getLogger('X.Y.Z').warning(text)

The logger does the level cut-off test and its own filtering before anything else, so the hierarchy is not checked at all if the level doesn’t match.
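A short sketch with the standard logging module makes the order of checks visible: the level of the logger you call decides whether a record is created at all, and once it propagates, only the ancestors' handlers (with their own levels and filters) are consulted, not the ancestor logger's level. The names X and X.Y.Z follow the text; the rest is illustrative.

```python
import io
import logging


def demo_hierarchy():
    """Show where the level cut-off happens when a record travels up the logger hierarchy."""
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("%(name)s %(levelname)s %(message)s"))

    parent = logging.getLogger("X")
    parent.addHandler(handler)
    parent.setLevel(logging.ERROR)  # applies only to records logged directly on "X"
    parent.propagate = False        # keep the demo away from the root logger

    child = logging.getLogger("X.Y.Z")
    child.setLevel(logging.WARNING)  # the cut-off for X.Y.Z happens here, before any handler runs

    child.info("dropped: below X.Y.Z's level")
    child.warning("reaches X's handler despite X's ERROR level")
    parent.warning("dropped: below X's own level")

    parent.removeHandler(handler)
    return stream.getvalue()
```

Only the middle message comes out, which is why level or filter settings on a parent logger do not restrict what its children send, while handler settings do.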

Filtering is possibly best done by the handler…

Lessons in Testing

David Cramer, DISQUS
twitter.com/zeeg

5 billion page views. Use Django and Flask. Fewer than 20 engineers. Terrible at testing.

Lessons

No one likes writing tests. They are time consuming to write: at least 50% of development time goes on writing tests.

Legacy (untested) code is expensive: it is very expensive to add tests later. Add tests for regressions. Always write tests for new code; it becomes easier and easier to write them.

Slow or inaccurate - you can spend more time writing tests, or much more time running tests. So, moving towards integration tests. Interface contracts yield inaccuracy (i.e. they change).

Higher level tests are slower, but easier to write and understand i.e. unit vs integration tests.

Mocking is great, but very fragile (they use the mock library, mock.readthedocs.org). Very useful for testing services, e.g. Twitter and internal APIs. Record live data for mocking: check out the Ruby VCR library.
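The mock library is also in the standard library as unittest.mock (python 3.3+). A sketch of mocking an external service with recorded data; TwitterClient, follower_count and the recorded payload are all hypothetical.

```python
from unittest import mock


class TwitterClient:
    """Hypothetical wrapper around a third-party HTTP API (not a real library)."""

    def get_user(self, username):
        raise NotImplementedError("would make a network call")


def follower_count(client, username):
    """Code under test: it only depends on the client's interface."""
    return client.get_user(username)["followers_count"]


# A response recorded once from the live service and replayed in tests (VCR-style).
# The values here are made up for the sketch.
RECORDED_USER = {"screen_name": "zeeg", "followers_count": 1234}


def make_fake_client():
    # create_autospec copies the real method signatures, so calling get_user with the
    # wrong arguments fails in the test; it cannot detect the live API itself changing.
    client = mock.create_autospec(TwitterClient, instance=True)
    client.get_user.return_value = RECORDED_USER
    return client
```

The fragility the talk mentions is visible here: if the live service renames followers_count, these tests keep passing until the recording is refreshed.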

Limit what you test.

Assume APIs don’t change (it is mostly true).

Smoke tests… Very high level Selenium test.

Test the life-cycle of requests. Selenium kind of works… Very fragile, swapping some of the tests out to PhantomJS.

Don’t admit defeat!!

Start with a goal - write testable code - things will become much, much better.

Break up your code into functions e.g. abstract out the database calls.
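A minimal sketch of “abstract out the database calls”: the logic takes the data-access function as a parameter, so a test can pass a dict lookup instead of a database. Function names and the users table are hypothetical; sqlite3 stands in for the real database.

```python
import sqlite3


def greeting_for(user_id, load_user):
    """Business logic: the database access is passed in, so tests need no database."""
    user = load_user(user_id)
    if user is None:
        return "Hello, stranger"
    return f"Hello, {user['name']}"


def load_user_from_db(conn, user_id):
    """The real data access, kept in one small function."""
    row = conn.execute("SELECT name FROM users WHERE id = ?", (user_id,)).fetchone()
    return None if row is None else {"name": row[0]}
```

In a unit test, greeting_for(1, {1: {"name": "Ann"}}.get) returns 'Hello, Ann' without touching a database; only load_user_from_db needs an integration test.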

Start writing tests, add helpers wherever it makes sense - tests will become much cleaner.

Create structure in the test suite. Don’t like the Django pattern (they have a much deeper folder structure).

Document best practices, make it obvious how to use your helpers. Help people write tests.

Continuously run tests, and make people fix stuff immediately. Visibility matters: nobody cared about JavaScript tests until they were added to Jenkins.

Drive it into your culture. We don’t like to break production code.

Use code review. Everything goes into code review… Break stuff into smaller chunks so the development life-cycle can go faster.

Tools

If the right tools don’t exist, then build them.

Switched off the standard Django test system. They use nose for test discovery, with standard unittest-style tests. You can drop into pdb on test failures: nosetests --pdb --pdb-failures. Check out nose-quickunit and django-nose.

Record code coverage using coverage.py. Use coverage run in place of python.

Sentry, exception reporting, because tests are not enough! Data usually breaks code. Check out the stack trace - can often avoid having to contact the user.

CI: Jenkins, which at the moment has been mangled into something it is not. Wanted to test every commit (couldn’t do this with Jenkins). Have separated tests into chunks, e.g. JS, integration and unit tests.

Code review: Phabricator (http://phabricator.org/, written in PHP), very well written and very well integrated with git. Makes your commit messages useful. arc is the command line interface; it runs lint and unit tests.

Gargoyle: selectively enable features in code and silently launch them. Helps with performance and load testing. They call this dark launching: they managed to test their real-time module before anyone was using it. It has also failed several times without affecting anything.
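A sketch of the feature-switch and dark-launch ideas, assuming switches keyed by feature name with a per-user rollout percentage; all names are hypothetical and this is not Gargoyle's API.

```python
import hashlib
import logging


class FeatureSwitches:
    """Hypothetical minimal feature-switch registry (not Gargoyle's API)."""

    def __init__(self):
        self._percentages = {}  # feature name -> percentage of users (0-100) who get it

    def set_percentage(self, feature, percent):
        self._percentages[feature] = percent

    def is_active(self, feature, user_id):
        percent = self._percentages.get(feature, 0)  # unknown features are off
        # Hash feature + user so each user gets a stable answer per feature.
        digest = hashlib.sha1(f"{feature}:{user_id}".encode()).hexdigest()
        return int(digest, 16) % 100 < percent


def render_page(switches, user_id, realtime_backend):
    """Dark launch: exercise the new code path for some users, but never show or depend on it."""
    if switches.is_active("realtime", user_id):
        try:
            realtime_backend(user_id)
        except Exception:
            # A failure in the dark-launched code is logged and otherwise ignored.
            logging.getLogger(__name__).exception("dark-launched realtime call failed")
    return "static page"
```

Raising the percentage gradually turns the switch into a load test on real traffic, and setting it to 0 switches the feature off without a deploy.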

Takeaway

Very hard for us to adopt it.

The test suite still has a lot of holes.

The global fixture is a problem: it sped things up, but it is hard to understand and creates problems with test isolation.

A lot of problems with people arriving from environments where they were not used to testing.

Integration testing has been awesome for us. Use Django, so database testing is pretty awesome. Unit testing can work right - if you do it from the start - works perfectly for a library. Ship fast - or a perfect product - you have to find the balance.

The process is evolving. Culture is key, and it is very hard to adopt. Came from PHP and had no idea what tests were at that point. A lot of people come from this hackish world. Wanted to release often, and didn’t want an expensive QA team.

Figure out the value for your own company.

Just do it?

Questions

For lint, use pyflakes and pep8. Check out the modified version of pyflakes on his github page.

git branches - follow the pattern used by the review tool. Developer works on feature branch. rebase so the branch is a single commit (for code review). master has to be stable. Similar to gitflow (but not the same).

Deploy - use fabric, rsync etc.

Making the case - Why are we deploying broken software to production? Could calculate the cost of failure and compare it to the cost of prevention. Must be doing something right, as we have no QA people - and are fairly stable.

Arbitrarily assign code reviewers; they don’t currently have a great solution. Not really sure how to guide reviewers: when to accept, when to reject, and when to spend time on it. Comment from the audience: Review Board is good, but not great.

Has personally only been successful with TDD when fixing a bug/regression. A TDD culture is much harder to adopt, though there have been some very impressive stories from companies with this culture. Wouldn’t stop a developer doing TDD as long as the code is good.

For a while, they only wrote positive tests… The code reviewers and the developer decide if the tests are worth writing. A lot of it is trial and error. They have mentors, peer review and discussions.

Test data: Django fixtures were a problem, so they implemented global fixtures, using a modified version of django-nose. The global fixture is set up before running the tests: it loads data using SQL and sets up Redis. Relies on tests not being too absolute about what is in the fixture. They do not mock the database. Kind of iffy. Released django-mock, but it sounds like they don’t use it. All trade-offs; no perfect solution.

Eventual goal is for Jenkins to test each patch in isolation before merging.

eGenix PyRun

Marc-Andre Lemburg
Core Developer, CEO of eGenix

Open source project.

Motivation

Simple installation without side effects. Often difficult on linux. Disk space is cheap.

Small footprint and download.

Easy to add to installers.

Extensible - load .py, .pyc and C extension modules.

Project

Builds upon old mxCGIPython project

Support for 2.7 (not 3 yet)

Binaries available for Linux and Mac OS X

Use Case

Distribution of scripts and applications. Loads almost twice as fast as regular python. Uses only a fraction of the space - 12MB file.

virtualenv replacement. Simply copy it into a folder; for pip etc. you will want to create a folder structure. No activation script needed. Independent of the python installation, and works without python installed. No symlinks to manage. Fully relocatable. Small enough to have multiple copies: 13MB with pip and setuptools, 36MB if you want to compile extensions (after compiling, you can remove a lot of this).

Testing and scripting

Application private python installations e.g. dedicated python for Trac.

Embedded devices (just a thought - not tested).

How does it work?

Based on python’s Tools/freeze (with a couple of patches), which is applied to the whole standard library.

PyRun searches relative to its executable’s folder, making it easily relocatable.

Uses its own lib/ directory for extensions.

Added tricks to make it compatible with distutils, setuptools and pip. Not yet tested with buildout.

If you find things which don’t work, then please let us know.

Missing

dbm, crypt, readline, parser, tkinter, multiprocessing, test packages. These modules can be loaded as regular external .so modules.

What doesn’t work?

When run with regrtest.py - some of the test suite modules do not work. They do work when run standalone.

Standard library modules that require access to resources.

Alternatives

py2exe
cx_Freeze
bbfreeze
etc

Future

Better documentation.

More flexible configuration.

Windows support

setup.py

Demo

tar
# only need the bin folder
bin/pyrun

tar setuptools
../bin/pyrun ...

tar pip
cd pip

bin/pip install

Continuous introspection

@nicvenegas
Works for Atlassian bitbucket

Cast

@erikvanzijst - author of dogslow and interruptingcow
@brodie - author of geordi

Performance Problems

conq, their SSH shell, was importing Django and Bitbucket code and took nearly 1.5 seconds per request. Switching to direct SQL massively reduced the load: a 60% load decrease on all web servers, and 16 times faster start-up. Lesson: this wouldn’t have been seen in the development environment, but it did cause problems for all of their users.
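Start-up cost like conq's is easy to miss in development, where a long-running dev server imports everything once. A rough way to measure it, as a sketch: time a fresh interpreter with and without the imports in question (the module list below is arbitrary). On python 3.7+, running python -X importtime -c "import yourmodule" also gives a per-module breakdown.

```python
import subprocess
import sys
import time


def startup_seconds(code="pass", runs=5):
    """Best-of-N wall-clock time to start a fresh interpreter and run `code`."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, "-c", code], check=True)
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    baseline = startup_seconds()
    heavy = startup_seconds("import json, sqlite3, email.mime.multipart, http.server")
    print(f"bare interpreter: {baseline:.3f}s, with imports: {heavy:.3f}s")
```

Taking the best of several runs filters out noise from disk caches and other processes; the difference between the two numbers is the price every short-lived process pays.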

Common Causes

Slow SQL queries (or too many).

Lock contention - between threads, database table/row locks, file locks (hg/git).

Excessive IO (disk/network)

Regular expressions: a normally fast regular expression can sometimes take forever on certain inputs.
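One classic way a regex “takes forever” is catastrophic backtracking. A small sketch with the stdlib re module; the patterns are textbook examples, not from the talk. Timings vary by machine, but the nested pattern's time grows exponentially with n while the flat one stays tiny.

```python
import re
import time

# Nested quantifiers: on a near-miss input the engine tries exponentially many ways
# to split the run of 'a's between the inner and outer '+' before giving up.
NESTED = re.compile(r"(a+)+$")
# Matches the same strings (one or more 'a's, then the end) and fails in linear time.
FLAT = re.compile(r"a+$")


def seconds_to_fail(pattern, n):
    text = "a" * n + "b"  # almost matches, then fails on the final character
    start = time.perf_counter()
    if pattern.match(text) is not None:
        raise ValueError("unexpected match")
    return time.perf_counter() - start


if __name__ == "__main__":
    for n in (14, 16, 18, 20):
        print(f"n={n}: nested {seconds_to_fail(NESTED, n):.4f}s, flat {seconds_to_fail(FLAT, n):.6f}s")
```

Such patterns are harmless on typical input, which is why the slowness only shows up in production, on one unlucky request.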

503 - worker pool full. Could be a denial of service attack.

500 - if request times out (Gunicorn SIGKILL). Process does not know that it is going to be killed.

Libraries to help

dogslow is Django middleware which emails tracebacks of slow requests. Has no performance penalty.
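The watchdog idea can be sketched with the standard library: start a timer when a request begins, and if it fires before the request ends, read the request thread's stack with sys._current_frames(). This is a simplified illustration, not dogslow's code; report stands in for sending an email, and sys._current_frames() is a CPython-specific, underscore-prefixed function.

```python
import sys
import threading
import time
import traceback


def watch(func, timeout, report):
    """Call func(); if it is still running after `timeout` seconds, report its current stack."""
    target = threading.get_ident()  # the thread that will run func

    def peek():
        # Runs in the timer thread: grab the other thread's current frame.
        frame = sys._current_frames().get(target)
        if frame is not None:
            report("".join(traceback.format_stack(frame)))

    timer = threading.Timer(timeout, peek)
    timer.start()
    try:
        return func()
    finally:
        timer.cancel()  # fast calls finish first, so the stack is never captured


if __name__ == "__main__":
    watch(lambda: time.sleep(0.5), 0.1, print)  # prints the stack of the sleeping call
```

Fast requests only pay for starting and cancelling a timer, which is why this kind of watchdog is cheap enough to leave on in production.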

django-geordi, selectively profile individual requests. Add ?__geordi__ to any URL to enable the VisorMiddleware. Produces a PDF call graph showing where the process takes the time. It runs outside the worker pool as a Celery task, so shouldn’t cause load problems.
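Selective profiling can be sketched as a tiny WSGI middleware around the standard cProfile and pstats modules. This is not geordi's code: the flag name is hypothetical, the report is plain text instead of a PDF call graph, and it runs inside the request rather than in a Celery task.

```python
import cProfile
import io
import pstats


class ProfileOnDemand:
    """WSGI middleware: profile only requests whose query string contains a magic flag."""

    def __init__(self, app, flag="__profile__", report=print):
        self.app = app
        self.flag = flag
        self.report = report  # stand-in for storing or rendering the profile

    def __call__(self, environ, start_response):
        if self.flag not in environ.get("QUERY_STRING", ""):
            return self.app(environ, start_response)  # normal requests pay nothing extra

        def consume():
            # Build the whole body inside the profiler so lazily generated output is measured too.
            result = self.app(environ, start_response)
            try:
                return list(result)
            finally:
                if hasattr(result, "close"):
                    result.close()

        profiler = cProfile.Profile()
        body = profiler.runcall(consume)
        out = io.StringIO()
        pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(10)
        self.report(out.getvalue())
        return body
```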

interruptingcow allows you to time-box chunks of python code, e.g. allow the process to take up to 20 seconds and raise an exception if it takes longer. Supports nested timeouts, so it can be used to make parts of a request optional.
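A time-boxing context manager can be sketched with the standard signal module (SIGALRM plus setitimer). This illustrates the technique, not interruptingcow's API, and the names are hypothetical. It only works on Unix, in the main thread, and unlike interruptingcow it does not support nesting: an inner time_limit replaces the outer timer.

```python
import signal
import time
from contextlib import contextmanager


class Timeout(Exception):
    """Raised inside the block when its time budget runs out."""


@contextmanager
def time_limit(seconds):
    """Raise Timeout if the with-block runs longer than `seconds` (Unix, main thread only)."""
    def on_alarm(signum, frame):
        raise Timeout(f"exceeded {seconds}s")

    previous = signal.signal(signal.SIGALRM, on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        yield
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)  # cancel the timer
        signal.signal(signal.SIGALRM, previous)


def sidebar(fetch_recommendations):
    """Make an optional part of a request optional: give it a budget, fall back if exceeded."""
    try:
        with time_limit(0.1):
            return fetch_recommendations()
    except Timeout:
        return "recommendations unavailable"


if __name__ == "__main__":
    print(sidebar(lambda: "top picks"))
    print(sidebar(lambda: time.sleep(1)))  # prints "recommendations unavailable"
```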

Becoming a better programmer

Harald Armin Massa
Lightning Talk Man

Shortcut

apt-get --purge remove java

Use mind-maps to help you process information. Not very useful for other people.

Very funny talk… although you probably had to be there!

BDD at BSkyB, Collaboratively coding correctly

@saley89
@russellsherwood

Replace a legacy sales system. python 2.7, REST API. Cannot afford defects when selling. Cannot afford to price incorrectly.

Why we use BDD

Do it right the first time
Deliver what was required with high quality code
Testers and developers write tests
Easy reuse
Refactoring

Testing is everyone’s responsibility

History: developers did the testing, then testing became the role of a dedicated QA team, then agile arrived and testing became everyone’s responsibility (TDD/BDD).

Other tools - Fitnesse, Selenium, nose

Agile - Sprints, planning games, retrospectives, fail fast, adapt quickly

Two-week sprints; only two bugs so far, and neither has ever been deployed.

What is BDD

Focus
Collaboration
Simple
Feedback cycle

Gherkin

Given
When
Then
And
Feature
Background
Scenario
Scenario Outline

Process

Story card
Defuzz (15-20 minute chat with business analyst)
QA - BDD (write the test)
Write code (using TDD)
Review with business analyst
Card complete

Test

Understood by all stakeholders
Simple - plain English
Steps file - regular expressions
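To show how a steps file ties plain-English Gherkin sentences to code through regular expressions, here is a tiny hypothetical runner. It is not Freshen's, Lettuce's or Behave's API (those provide their own decorators and a real Gherkin parser), and the basket scenario is made up.

```python
import re

_STEPS = []  # (compiled regex, step function) pairs


def step(pattern):
    """Register a step implementation for sentences matching `pattern`."""
    def register(func):
        _STEPS.append((re.compile(pattern), func))
        return func
    return register


@step(r"a basket with (\d+) items? at (\d+) pounds each")
def given_basket(ctx, count, price):
    ctx["items"] = [int(price)] * int(count)


@step(r"I check out")
def when_check_out(ctx):
    ctx["total"] = sum(ctx["items"])


@step(r"the total is (\d+) pounds")
def then_total_is(ctx, expected):
    if ctx["total"] != int(expected):
        raise AssertionError(f"expected {expected}, got {ctx['total']}")


SCENARIO = """
Scenario: Simple basket
  Given a basket with 3 items at 10 pounds each
  When I check out
  Then the total is 30 pounds
"""

KEYWORDS = ("Given ", "When ", "Then ", "And ")


def run_scenario(text):
    ctx = {}
    for line in text.strip().splitlines():
        line = line.strip()
        keyword = next((k for k in KEYWORDS if line.startswith(k)), None)
        if keyword is None:
            continue  # Feature/Scenario headers are ignored in this sketch
        sentence = line[len(keyword):]
        for regex, func in _STEPS:
            match = regex.fullmatch(sentence)
            if match:
                func(ctx, *match.groups())
                break
        else:
            raise LookupError(f"no step matches {sentence!r}")
    return ctx
```

The captured groups become step arguments, which is what makes steps reusable: any scenario that says “a basket with N items at P pounds each” runs the same code.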

Tools

Cucumber - Ruby
Lettuce - almost a direct port of Cucumber
Freshen - used by BSkyB - uses nose test runner
Behave - seems to be gathering momentum. Almost identical to Cucumber.

All pretty much the same.

Demo

nosetests --with-freshen -v --nocapture my.feature

Do it right the first time every time.

Questions

Step re-use is done by simple collaboration - team share and help each other.

The testing team work with the business analyst to write useful tests. Need someone in the team who understands how the test will be written.

Do unit testing before writing any code. Acceptance tests are written up front in Freshen.

Work in two-week sprints.

Can the tests serve as documentation? BSkyB have separate documentation which is written by the business analyst.

How much time does it take? We can spend a lot of time writing tests, and BDD tests are verbose, but BDD clarifies exactly what needs to be done.

How do you know how much a feature will change? The application has a roadmap, and the BDD tests define the features.

zc.buildout

http://gocept.com/

Problems to solve

Install and configure software in a reproducible way

python and other packages

Does not build software from source (make etc)

Isolated from other applications on the same machine - and from other buildout environments on the same machine.

What is zc.buildout

Developed by Jim Fulton (Zope) in 2006

Demo

# download
wget bootstrap.py
# create a config file, then bootstrap the environment
python bootstrap.py -d

bin/buildout

The work is done by recipes, which are downloaded when buildout runs. Each recipe comes as an egg, e.g. zc.recipe.egg, which invokes the easy_install API.

When the configuration is changed, everything previously installed by that recipe will be removed and re-installed.
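A rough sketch of the recipe protocol, as I understand it from the zc.buildout documentation (check the docs before relying on it): buildout constructs a recipe with `(buildout, name, options)`, calls `install()` for a new or changed part, and records the paths `install()` returns so it can remove them when the part's configuration changes. `GreetingRecipe` and its `greeting` option are hypothetical; no zc.buildout import is needed for the sketch.

```python
import os

class GreetingRecipe:
    """Hypothetical recipe that writes one text file into its part directory."""

    def __init__(self, buildout, name, options):
        self.name = name
        # Hypothetical option; a real recipe documents its own option names.
        self.greeting = options.get("greeting", "hello")
        self.location = os.path.join(buildout["buildout"]["parts-directory"], name)

    def install(self):
        os.makedirs(self.location, exist_ok=True)
        with open(os.path.join(self.location, "greeting.txt"), "w") as f:
            f.write(self.greeting + "\n")
        # Buildout remembers these paths and deletes them before a reinstall.
        return [self.location]

    def update(self):
        # Called when the part's options are unchanged; nothing to refresh here.
        pass
```

Returning the part directory from `install()` is what lets buildout wipe and reinstall the part cleanly when its section changes.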

Each script sets up its own Python path, so the eggs do not have to be installed into the system Python folders.

The recipe sets up the bin/buildout script.

Versions pinned in the [versions] section are always honoured. It is good practice to set allow-picked-versions = false.

KGS - known good set of pinnings

e.g. nginx can be installed with the zc.recipe.cmmi recipe (configure, make, make install).

Python Web Applications in Multihost, Low Latency Environments

Pavel Schon
diverman on Django snippets

Trading systems. Python on the web server; JavaScript, jQuery and SVG in the browser.

WSGI frameworks work in similar ways:

Create a request object from the environment provided by the web server
Dispatch the URL to an appropriate controller function, which returns a response
Execute the controller function
Return the response to the web server.
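The four steps above can be sketched as a bare WSGI application using only the standard library. The `route` decorator, the `hello` controller and the dict-based request are hypothetical stand-ins for what a framework provides; dispatch uses regular expressions registered by a decorator, two of the dispatch styles listed below.

```python
import re

ROUTES = []  # (compiled regex, controller) pairs, checked in order

def route(pattern):
    """Decorator-based dispatch: register a controller for a URL regex."""
    def register(controller):
        ROUTES.append((re.compile(pattern), controller))
        return controller
    return register

@route(r"^/hello/(?P<name>\w+)$")
def hello(request, name):
    return "200 OK", f"Hello, {name}!"

def application(environ, start_response):
    # 1. Build a (very small) request object from the server's environ.
    request = {"method": environ["REQUEST_METHOD"], "path": environ.get("PATH_INFO", "/")}
    # 2. Dispatch the URL to a controller.
    for regex, controller in ROUTES:
        match = regex.match(request["path"])
        if match:
            # 3. Execute the controller with the named groups as arguments.
            status, body = controller(request, **match.groupdict())
            break
    else:
        status, body = "404 Not Found", "Not found"
    # 4. Hand the response back to the web server.
    data = body.encode("utf-8")
    start_response(status, [("Content-Type", "text/plain; charset=utf-8"),
                            ("Content-Length", str(len(data)))])
    return [data]
```

To try it locally: `from wsgiref.simple_server import make_server; make_server("", 8000, application).serve_forever()`.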

Request Object

Environment - method
Session
etc…

Dispatch URL

Regular expressions
Rewrites (mod_rewrite)
Wildcards
Decorators

Controller Function

Generate the content
Handle ORM, cache, cookies
Return response
Error handling

State

Browser state
Server state
Process state
Session state

Shared State

Get or create a session, modify, store session
Race conditions - need to synchronise

How to synchronise? SQL, lock file, messaging, RPC, DLM (distributed lock manager).
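Of these options, the lock file is the easiest to show. Below is a minimal sketch, assuming sessions are stored as JSON files on one POSIX machine (the file layout and `locked_session` name are hypothetical). It makes the get-or-create, modify, store cycle atomic across processes on that host; it does nothing for several hosts, which is where a DLM comes in.

```python
import fcntl
import json
import os
from contextlib import contextmanager

@contextmanager
def locked_session(path):
    """Read-modify-write a JSON session file under an exclusive lock (POSIX only).

    Another process entering this block for the same path blocks in flock()
    until the current holder leaves, so session updates cannot interleave.
    """
    with open(path + ".lock", "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until we own the lock
        try:
            # Get or create the session.
            session = {}
            if os.path.exists(path):
                with open(path) as f:
                    session = json.load(f)
            yield session  # the caller modifies the session here
            # Store it (via a temporary file) before releasing the lock.
            tmp_path = path + ".tmp"
            with open(tmp_path, "w") as f:
                json.dump(session, f)
            os.replace(tmp_path, path)
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)
```

Usage: `with locked_session("/tmp/session-abc.json") as s: s["hits"] = s.get("hits", 0) + 1`. If the caller raises inside the block, the store is skipped and the lock is still released.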

dlm.py, published on ActiveState.com. If the app crashes, all the other processes will wait: a single point of failure.

Apache mod_wsgi can run background processes. Apache will start, end and restart the process. See WSGIDaemonProcess for details.

Fun with GET or POST data

How do you check equivalency? Encode multiple forms into a single query string. (Not sure about this; I don't know why you would.)

Guidelines to writing an API with python

@peristerakis
George Peristerakis

Reuse existing frameworks and customise them to our customers' needs. In one case this did not work, because the framework did not support an important feature. Patches to the framework were not accepted, so they tried a monkey patch (so they could keep taking updates to the framework). They then moved it into a middleware. So… what should the strategy have been?

How about writing an API to replace the existing framework implementation?

Steps

What is a discount calculator? Start by saying what it is not e.g. it is not reporting… This will allow us to concentrate on the API without contaminating our thought process with other concerns.

Understand the data.

Started by using a dictionary to collect the data - then converted to a class.
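A small sketch of that evolution, using the talk's discount calculator as the theme. The field names and the discount rule are hypothetical; the point is that the dictionary settles which data matters, and the class then gives that data a name and a home for behaviour, while leaving out things it is not (no reporting).

```python
from dataclasses import dataclass
from decimal import Decimal

# First iteration: a plain dictionary collects the data.
order = {"subtotal": Decimal("120.00"), "customer_type": "member"}

# Later iteration: the same data as a class, with the rule next to it.
@dataclass
class Order:
    subtotal: Decimal
    customer_type: str  # hypothetical field, e.g. "member" or "guest"

    def discount(self) -> Decimal:
        """Hypothetical rule: members get 10% off orders over 100."""
        if self.customer_type == "member" and self.subtotal > 100:
            return (self.subtotal * Decimal("0.10")).quantize(Decimal("0.01"))
        return Decimal("0.00")

order_obj = Order(**order)  # the dictionary's keys become constructor arguments
```

Because the dictionary keys match the field names, the move from dict to class does not disturb the code that collects the data.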

Lessons learned the hard way

Evolve - from __init__ to infinity. Don't try to do too much. Don't try to satisfy too many people. Always look for the simplest form of what you are trying to do. If, when you talk about it, you find yourself describing multiple conditions, then simplify.

Know your domain from different angles.

Document your process. The why is more important than the how.

Don’t be afraid to test your hypothesis and then throw it away.

Other Stuff

Downloads for Android gaming: http://thp.io/2012/europython/downloads.html

Keynote about Tor (https://www.torproject.org/). Tor is free software and an open network that helps you defend against traffic analysis, a form of network surveillance that threatens personal freedom and privacy, confidential business activities and relationships, and state security.

Slides for Programming mobile apps with python

Check out the Advanced REST Client application.

To Do

Check out the with statement in connection to unit testing…

How about a python contractors cooperative? If interested, contact rob.collins@pythonpro.co.uk. http://pycontract.com

Use mind-maps to help you process information. Not very useful for other people - but will help your own brain.

Check out the Background keyword in Freshen. What is the equivalent in Lettuce?

Read the following:

Check out https://github.com/inglesp/prescons

Check out django-ztask (should not use Celery apparently)

Check out django-pjax, which integrates pjax, a JavaScript library written by developers at GitHub that manipulates browser history.

Check out http://discorporate.us/projects/flatland/ Form validation etc…

Advanced Flask Patterns

Mysteriously applicable to other things…

Will only work with new version of Flask (released on Sunday).

Apps are entirely independent. There is now app.app_context(); inside it, current_app points to the current application. The request object works the same way.

Request stack and application stack are independent:

with app.request_context(environ) as ctx:  # environ is the WSGI environment dict

Because request contexts are expensive, you can now push just the application context.

Runtime state can be request-bound (short-lived), test-bound or user-controlled. When your view function returns, its context disappears.

State bound data:

request - HTTP request and session data
app     - Database connections and object caching.

Old pattern had issues:

  • Requires an active request for a database connection.

  • Always connects to the database even if it isn’t used.

  • Once you start using g, you expose an implementation detail.

The new pattern seems weird and complicated, and the trivial example will not work with multiple applications. Not so bad in actual use… The slides are incorrect.

Teardown always happens, unless an earlier teardown in the chain failed. You could move the transaction commit or abort into the teardown function.
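A framework-free sketch of the newer pattern, addressing the old pattern's issues above: the connection is opened lazily, so a context that never touches the database never connects, and teardown commits or rolls back depending on whether an exception ended the work. `AppContextDB` is hypothetical and uses sqlite3 only so it runs anywhere; in Flask the `teardown` method would be wired up through a function registered with `app.teardown_appcontext`, which receives the exception or None.

```python
import sqlite3

class AppContextDB:
    """Hypothetical per-context database helper (not Flask's API)."""

    def __init__(self, dsn=":memory:"):
        self.dsn = dsn
        self._conn = None

    @property
    def connection(self):
        # Connect on first use only.
        if self._conn is None:
            self._conn = sqlite3.connect(self.dsn)
        return self._conn

    def teardown(self, exc=None):
        if self._conn is None:
            return  # never connected: nothing to commit or close
        try:
            if exc is None:
                self._conn.commit()    # work finished cleanly
            else:
                self._conn.rollback()  # abort the transaction on error
        finally:
            self._conn.close()
            self._conn = None
```

Keeping the connection on a context object, rather than on `g`, keeps the storage detail out of view code.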

Recommend using an extension for database handling.

Explicit response creation - using make_response. Normally you don't need to do this. You can call make_response on an object that is already a response, which is useful for decorators; one person in the room had made a custom return type which converted objects into JSON data.
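A sketch of the JSON idea as a decorator, kept framework-free so it runs without Flask. `returns_json` and `user_view` are hypothetical. The decorator returns a `(body, status, headers)` tuple, a return shape that current Flask versions accept from views; check the version you use.

```python
import functools
import json

def returns_json(view):
    """Hypothetical decorator: serialise a view's plain return value as JSON."""
    @functools.wraps(view)  # keep the view's name, which Flask uses as the endpoint
    def wrapper(*args, **kwargs):
        result = view(*args, **kwargs)
        return json.dumps(result), 200, {"Content-Type": "application/json"}
    return wrapper

@returns_json
def user_view(user_id):
    # Hypothetical view returning ordinary Python data.
    return {"id": user_id, "name": "example"}
```

The view stays testable as plain Python, and the serialisation lives in one place.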

Deep-copying objects in Python is slow and nearly impossible to get right! It is faster to round-trip through JSON!
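The two approaches side by side. The speed claim comes from the talk and is not measured here. What the sketch does show is the catch: the JSON round trip only works for JSON-compatible data (dicts with string keys, lists, strings, numbers, booleans, None) and silently converts anything else.

```python
import copy
import json

data = {"user": {"name": "ada", "tags": ["admin", "staff"]}, "point": (1, 2), 3: "three"}

deep = copy.deepcopy(data)               # general: keeps tuples and the int key 3
via_json = json.loads(json.dumps(data))  # JSON round trip: a fresh, independent structure

# The round trip is lossy for non-JSON types:
# via_json["point"] is the list [1, 2], and the key 3 comes back as the string "3".
```

If the data is already JSON-shaped (e.g. it came from or is going to a JSON API), the round trip is a reasonable shortcut; otherwise `copy.deepcopy` is the safe choice.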

How to share between applications? Blueprints are similar to applications. Solution seems ugly - Armin would welcome suggestions.

Extension Primer

  • Are very vaguely defined

  • Do not use a plug-in system.

Extensions should no longer use self.app. They should use the application object from the context.

Making DISQUS Realtime

Adam Hitchcock
@NorthIsUp

Back-end Django and Postgres

Real time is an entirely new architecture.

Why do real time (less than 10 seconds)?

Getting new data to the user ASAP
Increased engagement
Looks awesome
We can sell it

Old realtime used polling which used jQuery to poll memcache. Was kinda #failscale!

Real-er Time

Tested dark on 50% of the network, as it is still a work in progress
Have seen 1.5 million concurrently connected users
45 thousand connections per second
165 thousand messages per second
0.2 seconds latency end to end

How do we do it?

nodejs and mongodb (no this is a python conference)
gevent, gunicorn, flask, thoonk (a queue built on redis)
redis (pub-sub), nginx, haproxy

Architecture

Django - new posts onto redis queue - backend gevent server - redis pub/sub - frontend gunicorn and flask - nginx and haproxy

Backend

  • Listens to Thoonk queue

  • Cleans and formats message - this is the final format before http publish - compress data now (gzip)

  • Publish the message to the pub/sub channels forum:id, thread:id, user:id and post:id
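A sketch of the backend steps above: format the message once, gzip it once, then publish the same payload to every channel it belongs on. The post fields and the cleaning step are hypothetical, and `publish(channel, payload)` is a stand-in for a Redis PUBLISH (with redis-py it could be a client's `publish` method).

```python
import gzip
import json

def channels_for(post):
    """Channel names the post is published to (naming follows the notes)."""
    return [
        f"forum:{post['forum_id']}",
        f"thread:{post['thread_id']}",
        f"user:{post['user_id']}",
        f"post:{post['id']}",
    ]

def prepare(post):
    """Clean and format once into the final HTTP format, then compress once."""
    message = {"id": post["id"], "message": post["message"].strip()}  # hypothetical cleaning
    return gzip.compress(json.dumps(message).encode("utf-8"))

def process(post, publish):
    payload = prepare(post)  # the work is done once, however many channels there are
    for channel in channels_for(post):
        publish(channel, payload)
```

Doing the formatting and compression before publishing means the front end only has to pass bytes through.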

Average processing time is 0.2 seconds. Queue maintenance: 5-second timeout. Keep pub/sub and non-pub/sub Redis separate. Quarantine failing messages (decide which jobs to re-queue; get and cancel jobs). Transactions can be picky. Planning on using ZooKeeper(?).

gevent is nice

gevent spawn helpers, https://gist.github.com/3053495

Start, fail, start, fail, start, fail, kill

To yield to other greenlets, call gevent.sleep(0).

Front End

  • Needs to be fast

  • Pools redis connections

  • Routes messages from pubsub to http

New request - create/register a subscription with the pool - sub/queue returns a python queue based on the channel.

Listener receives the message on a pubsub channel. If that channel has a subscriber, pass it on.
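A sketch of that routing: one shared listener receives every pub/sub message, and each HTTP request registers its own queue for the channel it cares about, so connections do not each need a Redis subscription. `SubscriptionPool` is hypothetical and uses the standard library `queue.Queue`; under gevent you would use `gevent.queue.Queue` instead.

```python
import queue
from collections import defaultdict

class SubscriptionPool:
    """Hypothetical router from shared pub/sub channels to per-request queues."""

    def __init__(self):
        self._subscribers = defaultdict(set)  # channel -> set of queues

    def subscribe(self, channel):
        """Called for a new request: returns a queue fed from that channel."""
        q = queue.Queue()
        self._subscribers[channel].add(q)
        return q

    def unsubscribe(self, channel, q):
        subs = self._subscribers.get(channel)
        if subs is not None:
            subs.discard(q)
            if not subs:
                del self._subscribers[channel]

    def on_message(self, channel, payload):
        """Called by the single listener: pass the message on only if someone listens."""
        for q in tuple(self._subscribers.get(channel, ())):
            q.put(payload)
```

Each request handler then blocks on its own queue and streams whatever arrives down its long-held HTTP connection.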

Long-polling-ish: a long-held HTTP connection with JSON streamed over it. Why not WebSockets? Because they don't work reliably yet and are not stable. They will use them, with a fallback to long-polling-ish. It must be JSON, as plain text gets buffered. With millions of connections, they had to pool the Redis pub/sub connections.

Timeouts - needless reclaiming of resources. Maximise usage of cheap things (connection count). Minimise expensive things (requests per second). Getting rid of timeouts and increasing timeouts has increased concurrency.

Testing

Darktime - Use existing network to load test (at the beginning a few user complaints - cannot hide them from the browser console).

Darkesttime - load testing a single thread. Discovered a lot of flaws in the architecture.

Have knobs you can twiddle.

Stats

Measure all the things

Especially hard when numbers don’t line up.

Try to express things as +1 and -1 if you can.

Is hard in distributed things.

Used scales from Greplin (metrics for Python).

Lessons

Do hard work early

Defer work you might never need

End to end ACKs are good, but expensive.

Timeouts are not free.

Greenlets are effectively free

Pub/sub is effectively free.

For real time through nginx, you must set proxy_buffering off.

Questions

Something faster than pywsgi? FapWS

Between WSGI and Web-Sockets, you could use ZeroMQ. Don’t think it works cross language.

Flask just loads the routes. Use Blueprints to load the same endpoints multiple times. No database access.

Do you have to have Nginx and haproxy - can we run Gunicorn straight onto the web? No - you need Nginx and haproxy.

Gargoyle also has JavaScript options to switch things on and off.

Why Redis? Needed pub/sub and a queue. ZeroMQ also provides pub/sub, but it has no broker, so it is hard to measure.

nydus - consistent hashing for Redis.
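Consistent hashing in a nutshell: nodes are placed at many points on a hash ring, and a key goes to the first node clockwise from its own hash. Adding a node only moves the keys that now land on the new node. This is a generic sketch, not nydus's implementation; the node names are hypothetical.

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring (generic sketch)."""

    def __init__(self, nodes, replicas=100):
        self._ring = []  # sorted (hash, node) pairs
        for node in nodes:
            for i in range(replicas):  # virtual nodes smooth out the distribution
                self._ring.append((self._hash(f"{node}#{i}"), node))
        self._ring.sort()
        self._hashes = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def node_for(self, key):
        # First virtual node clockwise from the key's hash, wrapping at the end.
        i = bisect.bisect(self._hashes, self._hash(key)) % len(self._ring)
        return self._ring[i][1]
```

Usage: `HashRing(["redis-a", "redis-b", "redis-c"]).node_for("thread:42")` always returns the same node for the same key, so every client agrees on where a key lives without coordination.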

Discovering Descriptors

https://github.com/inglesp/Discovering-Descriptors

Peter Inglesby
@inglesp
git://github.com/inglesp/Discovering-Descriptors.git

An object whose class defines any of __get__, __set__ or __delete__ is a descriptor.

__slots__ restricts the attributes that instances of a class can have.
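A short illustration. `Point` is a made-up class; note that each name in `__slots__` becomes a descriptor on the class, which is how the restriction ties into the rest of this talk.

```python
class Point:
    __slots__ = ("x", "y")  # instances get exactly these attributes and no __dict__

    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
try:
    p.z = 3  # not declared in __slots__
except AttributeError as exc:
    print("rejected:", exc)

# Point.__dict__["x"] is a member descriptor: it defines __get__ and __set__.
```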

Properties - similar to descriptors in many ways. Good for storing a single value but representing it in different ways.

  • Properties work best when they know about the class

  • Descriptors are more general, can often apply to any class.

  • Use descriptors if behaviour is different for classes and instances

  • Properties are syntactic sugar.
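A small data descriptor illustrating the points above: it works on any class (unlike a property written for one class), and `__get__` can behave differently for class access (`instance is None`) and instance access. `Positive` and `Order` are hypothetical; `__set_name__` needs Python 3.6 or later.

```python
class Positive:
    """Data descriptor: validates on set, returns itself when accessed on the class."""

    def __set_name__(self, owner, name):
        self.name = "_" + name  # per-instance storage attribute

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # class access, e.g. Order.quantity
        return getattr(instance, self.name)

    def __set__(self, instance, value):
        if value <= 0:
            raise ValueError(f"{self.name[1:]} must be positive")
        setattr(instance, self.name, value)

class Order:
    quantity = Positive()  # one descriptor class reused for several attributes
    price = Positive()

    def __init__(self, quantity, price):
        self.quantity = quantity  # goes through Positive.__set__
        self.price = price
```

With a property, the validation would have to be written once per attribute; the descriptor captures it once and is reused.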

Read

  • Data Model Reference

  • Descriptor HowTo Guide

  • Unifying types and classes in Python 2.2

  • Guido’s History of Python blog.

Read Code

  • Lots of good examples in Django such as related objects.

  • Hybrid attributes in SQLAlchemy

  • python source Tools/demo/eiffel.py

  • $ grep -r __get__ site-packages

Play

  • Implement methods, __slots__, properties in pure python.
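One way to do the properties exercise, closely following the pure-Python equivalent in the Descriptor HowTo Guide but simplified (no deleter). `my_property` and `Temperature` are illustrative names; the example also shows a property storing a single value and presenting it another way.

```python
class my_property:
    """Pure-Python sketch of the built-in property (simplified: no deleter)."""

    def __init__(self, fget=None, fset=None, doc=None):
        self.fget = fget
        self.fset = fset
        self.__doc__ = doc if doc is not None else getattr(fget, "__doc__", None)

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class
        if self.fget is None:
            raise AttributeError("unreadable attribute")
        return self.fget(instance)

    def __set__(self, instance, value):
        # Defining __set__ makes this a data descriptor, so it wins over instance __dict__.
        if self.fset is None:
            raise AttributeError("can't set attribute")
        self.fset(instance, value)

    def setter(self, fset):
        return type(self)(self.fget, fset, self.__doc__)

class Temperature:
    def __init__(self, celsius):
        self._celsius = celsius  # the single stored value

    @my_property
    def fahrenheit(self):
        """One stored value, represented another way."""
        return self._celsius * 9 / 5 + 32

    @fahrenheit.setter
    def fahrenheit(self, value):
        self._celsius = (value - 32) * 5 / 9
```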

Cubes - lightweight OLAP

https://github.com/Stiivi/cubes

@stiivi
Stefan Urbanek

Small, lightweight framework. It is one year old and does not yet have permissions, etc.

Aggregation browsing, slicing and dicing.

Two parts - modelling and reporting (aggregating).

Four parts: Model, Aggregation Browser, Backends, http Server

Model

Business analyst view of the data. Different to your normal transactional view of the database. Smallest part of the data is called a fact. A cube is a collection of measurable facts.

Dimensions e.g. time, type - provides context for facts - used to filter - has a hierarchy.

Label attribute describes the data. The key allows for slicing.

Cubes can be localised.

Browser

Displays data

No pre-defined ways to store the data. Denormalised or snowflake.

For the browser to work, you need the model and the data.

The cell provides the data from a filter or selection. Can be multi-dimensional. Cells have a path, which describes the meaning of the key.

Three cut types - point, set, range.
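To make the three cut types concrete, here is a plain-Python sketch of slicing facts with point, set and range cuts and then aggregating a measure by one dimension. It is not the Cubes API (Cubes does this against SQL backends); the facts, dimensions and function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical facts: one row per measurable event.
FACTS = [
    {"year": 2011, "month": 11, "type": "books", "amount": 10},
    {"year": 2011, "month": 12, "type": "music", "amount": 5},
    {"year": 2012, "month": 1, "type": "books", "amount": 7},
    {"year": 2012, "month": 2, "type": "games", "amount": 3},
]

def point_cut(dim, value):
    return lambda fact: fact[dim] == value

def set_cut(dim, values):
    return lambda fact: fact[dim] in values

def range_cut(dim, low, high):
    return lambda fact: low <= fact[dim] <= high  # inclusive bounds

def aggregate(facts, cuts, drilldown, measure="amount"):
    """Sum the measure over the facts in the cell, grouped by one dimension."""
    totals = defaultdict(int)
    for fact in facts:
        if all(cut(fact) for cut in cuts):  # the cell = facts matching every cut
            totals[fact[drilldown]] += fact[measure]
    return dict(totals)
```

For example, `aggregate(FACTS, [point_cut("year", 2012)], "month")` drills into the months within 2012, following the year/month hierarchy.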

Has an implicit hierarchy e.g. months within a year.

Can create cross_table

Slicer is an OLAP server which uses HTTP and JSON: slicer serve slicer.ini. Also a slicer command line tool.

SQL Backend

Supports star or snowflake (extended star) schema. Can also browse a denormalised table.

Future

Would be nice to have some formatters for visualisation libraries.

JavaScript library (check out cubes-js)

More backends.

Open Data

  • Shared repository of models

  • Shared repository of dimensions

  • Public cubes - open slicer HTTP APIs

Simple module for Django i.e. read the Django models and then use the slicer server.

Stay light. Want to keep it simple and lightweight.

Python @ Layar

or, building complex and scalable systems using python and AWS

@jfdsmit
Jens de Smit

Case Study - Mobile augmented reality. A lot of Python in the back end. Mobile clients are native code.

Django back end

Comprehensive feature set
Build web pages and API
Active community
Many good extensions
Can handle high volumes (Christophe Pettus, http://thebuild.com)
Handles user registration, catalogue, web hosting

Files are stored on S3 (slow), database is MySQL on Amazon RDS (not the best choice for Django, but easier on AWS as it is setup for you).

Web facing

Two Django boxes with AWS load balancer
Django instances autoscale when load goes up
Popular data in memcached
Scaling database - bigger machine or read replicas

Logging

Sentry
Group and count on similar messages
One Sentry install for all your services

Visual Search Engine

Image recognition - Catchoom
Tornado with Boost.Python interfacing to C++ binaries
Sharded for scale-out, redundant for HA and read speed
Storage on EBS volumes (more expensive, but much faster)

Analytics

MySQL database collects data
Django app stores SQL queries for aggregation
cron job executes queries and stores results
More SQL queries feed HighCharts for fancy graphics

Note: this does not scale.

Long Running Jobs - (Spencer - home grown Twisted app)

Extracting images from PDFs, analyse images
Multiprocessing rather than multi-threading
Default 1 instance, easily scales to 20
Calling separate programs to do processing lets you use anything
Only 1300 lines of Twisted

Basically simple queuing with background tasks.
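The same idea in a few lines of standard-library Python, as a sketch only (Layar's service was a Twisted app; this is not its code): queue the jobs, run each as a separate OS process so any program can do the work, and cap how many run at once. `run_jobs` is a hypothetical name.

```python
import subprocess
import sys
from collections import deque

def run_jobs(commands, max_parallel=2):
    """Run each command as its own process, at most max_parallel at a time.

    Returns {job index: (return code, stdout)}.
    """
    pending = deque(enumerate(commands))
    running = {}
    results = {}
    while pending or running:
        # Fill the free slots from the queue.
        while pending and len(running) < max_parallel:
            index, cmd = pending.popleft()
            running[index] = subprocess.Popen(cmd, stdout=subprocess.PIPE, text=True)
        # Wait for the oldest running job (simple, though not always the first to finish).
        index = next(iter(running))
        proc = running.pop(index)
        stdout, _ = proc.communicate()
        results[index] = (proc.returncode, stdout)
    return results
```

Usage: `run_jobs([[sys.executable, "-c", "print(6 * 7)"], ["echo", "done"]])`. Because the work happens in child processes, the scheduler itself never competes with the jobs for the GIL.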

AWS

Convenient
Pay for what you use
Basic monitoring
Web interface and command line tools
Not the most bang for your buck
Assume no guarantees
Does not excuse you from having Ops!
Databases are very expensive
Backup outside of Amazon

Tips

Python has a lot to offer
Automate - Fabric and Chef
Deploy early, darktest, waffle, gargoyle
Use django-ztask not Celery
Cache from the beginning. Think about every query as you write it.
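Dark testing with feature switches (as gargoyle and waffle provide for Django) usually rests on stable percentage rollouts, like DISQUS testing on 50% of its network. A minimal sketch of that bucketing, not the API of either library; `in_rollout` and the feature name are hypothetical.

```python
import hashlib

def in_rollout(feature, user_id, percent):
    """Deterministically put a user inside or outside a percentage rollout.

    Hashing feature + user id gives each user a stable bucket from 0 to 99,
    so the same user sees the same behaviour on every request.
    """
    digest = hashlib.sha1(f"{feature}:{user_id}".encode("utf-8")).hexdigest()
    return int(digest, 16) % 100 < percent
```

Including the feature name in the hash means each feature gets an independent slice of users, so one user is not the guinea pig for every experiment.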

Going massive with uWSGI and nginx

Identify your context - trusted or untrusted?

System resources - memory, CPU, disk space, network bandwidth

How many sysadmins do you have?

Try never to reload system services just to update configuration

Let users do hard work. Good docs needed.

nginx - cheap, fast HTTP, SPDY proxy

uWSGI - for app hosting and management

server {
  listen 80;
  server_name $hostname;

  location / {
    include uwsgi_params;
    uwsgi_pass unix:/tmp/$host.socket;
  }
}

Read about vassals: in uWSGI's Emperor mode, each configuration file the Emperor watches spawns and manages one app instance, called a vassal.

# Emperor watching a single folder
uwsgi --emperor /etc/vassals
# or a glob across sub-folders
uwsgi --emperor "/etc/vassals/*/*.ini"

# an example vassal file (.ini)
[uwsgi]
customer = customer001
uid = %(customer)
gid = %(customer)
socket = /tmp/example.com.socket
wsgi-file=/var/apps/yourapp/ap...

Linux Control Groups (cgroups)

For security to limit CPU and memory

Is getting too complicated for me now!! Check out the slides…

uWSGI fastrouter for single Nginx and multiple uWSGI servers.

What is missing

Static files serving
Long running tasks and external daemons
database

ssh give ssh to users… (please)… Why?

Working on

ssh keys for secured subscription
Other event systems (zeromq, redis)
etc, etc

Questions

Is Nginx really needed? Yes
Can I build the next PaaS/ISP/hosting platform? Yes
HA Proxy is really good at something??

In Search of Reduced Loading Times

Apostolis Bessas
@mpessas
Transifex

Uses Django and PostgreSQL

Optimizing SQL

django-debug-toolbar
django-devserver
django.db.backends logger
Database logging: log_min_duration_statement in PostgreSQL

Fewer Queries

select_related - adds a JOIN to get the data for the related table.

prefetch_related for many-to-many and reverse foreign-key relations. It gets all the data in just two queries (one for the objects, one for the related objects) and joins them in Python.
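Django is not needed to see why this matters. The plain sqlite3 sketch below, with hypothetical author/book tables standing in for two models, counts queries for the naive loop (one query per author, N+1 in total) against the prefetch-style approach (one extra query using IN (...), joined in Python), which is roughly the pattern prefetch_related follows.

```python
import sqlite3

# Hypothetical schema standing in for two Django models (Author, Book).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE author (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE book (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO author VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
    INSERT INTO book VALUES (1, 1, 'A1'), (2, 1, 'A2'), (3, 2, 'B1');
""")

queries = []  # every SQL statement sent through run()

def run(sql, params=()):
    queries.append(sql)
    return conn.execute(sql, params).fetchall()

def books_n_plus_one():
    """One query for the authors, then one more per author (N+1)."""
    authors = run("SELECT id, name FROM author ORDER BY id")
    return {name: [title for (title,) in run(
                "SELECT title FROM book WHERE author_id = ? ORDER BY id", (author_id,))]
            for author_id, name in authors}

def books_prefetched():
    """Two queries in total: the authors, then all of their books via IN (...)."""
    authors = run("SELECT id, name FROM author ORDER BY id")
    ids = [author_id for author_id, _ in authors]
    placeholders = ",".join("?" * len(ids))
    rows = run(f"SELECT author_id, title FROM book "
               f"WHERE author_id IN ({placeholders}) ORDER BY id", ids)
    books = {author_id: [] for author_id in ids}
    for author_id, title in rows:
        books[author_id].append(title)  # join the two result sets in Python
    return {name: books[author_id] for author_id, name in authors}
```

Both functions return the same data; only the number of round trips differs, and the gap grows with the number of authors.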

.iterator()

Tells Django to not cache the results from the database. Prevents unnecessary caching of results.

annotate

Always use values() before annotate, so the SQL query only does a GROUP BY on the required columns.

Raw SQL

Don't be afraid to use raw SQL. Two methods: Manager.raw() and django.db.connection.cursor().

RawQuerySet is like a QuerySet, but is not a QuerySet. The objects returned are valid models.

defer() and only()

defer - columns to omit from the SELECT list
only - columns to specify in the SELECT list

Bulk Operations

bulk_create (django-bulk for older versions of Django)
COPY for PostgreSQL

Don’t forget to take advantage of the native features of your database.

De-normalisation

Mostly for read-only data - and only when you see performance issues.

Meta.Options.ordering

Don't use it! It adds an ORDER BY clause to every query.

Caching

Don’t use the database for sessions. Use memcached or signed cookies.

Template Compilation

You can cache compiled templates: django.template.loaders.cached.Loader.

Be very careful not to use any state in your template e.g. different template output for each user.

Entity Tags/Last-Modified

Allows the browser cache to be used (304 Not Modified status code)

Worth doing only if it is easy to calculate the entity tag. No point doing anything expensive. Will save you bandwidth. Often good for home-pages if they don’t change very often. You could store the last updated time in your cache.
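A framework-free sketch of a cheap conditional GET, assuming a stored "last updated" timestamp as suggested above. `etag_for` and `conditional_get` are hypothetical, and the `If-None-Match` handling is simplified to a single tag. In Django, the `django.views.decorators.http.condition` decorator (with `etag_func` and `last_modified_func`) wraps the same idea around a view.

```python
import hashlib

def etag_for(last_modified):
    """Cheap entity tag derived from a stored 'last updated' value."""
    return '"' + hashlib.md5(str(last_modified).encode("utf-8")).hexdigest() + '"'

def conditional_get(request_headers, last_modified, render):
    """Return (status, headers, body); skip rendering if the client copy is current."""
    etag = etag_for(last_modified)
    if request_headers.get("If-None-Match") == etag:
        return 304, {"ETag": etag}, b""  # the browser reuses its cached copy
    return 200, {"ETag": etag}, render()
```

The saving comes from never calling `render()` for a 304, which is why the tag itself must be cheap to compute.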

Will not work with personalised pages. Could think about using JavaScript for personalised sections.

A proxy might answer with the 304 status code itself, so the request might not even reach your server.

Optimising Algorithms

Be careful with regular expressions…

I/O

Threads for I/O
Async I/O

PJAX

django-pjax integrates pjax, a JavaScript library written by developers at GitHub that manipulates browser history.

Kivy

Runs on Android, iOS, Windows, OSX and Linux

Based on OpenGL ES 2.0. API for 2D and 3D graphics. OpenGL managed by Khronos Group.

Available for 90% of Android and 84% of iOS devices. Number is always growing.

Goal

Create a framework
Handle all devices
Code once in python - deploy anywhere
Based on Cython
Rapid prototyping

Community

5 core developers
35 contributors
Over 500 users on mailing list

Performance improving with each release.

New language, kv for widgets (a bit like CSS)

Demo

Multi-touch does not work with Qt: it can only receive one event at a time. Kivy is multi-touch by default.

The Google Play store has a Kivy app showing all the widgets.

Not doing native applications - applications will look the same on all devices.

Next version

SVG graphics
Simple 3D model loader
Better documentation

Future

Grow the community
More widgets
Unified build/packaging
Faster execution
HTML5? :)

Questions

License - LGPL - Will do a blog post
Size of binaries