Friday, December 23, 2011

Delusions of Technofei - 2011 Tech House DJ Set

New DJ set to farewell 2011. 

Deep and tasty tech-house for your holiday boogie needs =) Recorded on NI Traktor S4. 







Track Listing:

01. Deadmau5: Slip
02. Deadmau5: Some Kind Of Blue
03. Mendo: Inocencia - Original Mix
04. Manuel Sahagun: Out Of Ocean (Original Mix)
05. Maya Jane Coles: What They Say - Dyed Soundorom Remix
06. Deetron: Starblazer - Original Mix
07. Fortunato and Montresor: Imagine (Chris Fortier Dub)
08. Pete Moss: Rain Day - The Timewriter Remix
09. Min & Mal: Blotox - Original Mix
10. Samuel L. Session: Preach On feat. Elbee Bad - Blessed Mix
11. Format:B: Piano Man (Original Mix)
12. David Berrie: Underslang - Original Mix
13. Stacey Pullen, Mirko Loko: Formulaic Mode - Original Mix
14. Jon Rundell: Knick Knack - Original Mix
15. Sante: Ce Una (Carlo Lio Remix)
16. Supernova: Beat Me Back feat Kevin Saunderson - Original Mix
17. Lewis Lastella: Sunset - Original Mix
18. Format:B: Country Girl (Original Mix)
19. Jay Lumen: As One (Original Mix)
20. Marco Lys: Kaleido (Original Mix)
21. Arnaud Le Texier, Hamid: The Prince Of Beans Feat. Stacey DeLooze (Original Mix)
22. Joris Voorn, Moby: After The After (Original Mix)

Saturday, November 12, 2011

Why I Like PostgreSQL

Today I gave a short presentation at work about PostgreSQL, and why I much prefer it to MySQL.

PostgreSQL vs MySQL: Eternal Battle


I may be misreading this, but it seems that there is a recent trend within startups to move away from MySQL, probably thanks to folks like Heroku on one side (who use PostgreSQL to the extreme, and help and contribute to it's development), vs folks like Oracle on the other side, tainting the "open source pureness" of MySQL :)

At my work we currently use a mid-sized MySQL 5.1 Percona instance, which is holding up quite well I must admit. Both PostgreSQL and MySQL have definitely converged to cover most features that people want, but my leaning is still towards PostgreSQL. I just agree with it's focus on data integrity, recovery, constraints, extensibility, while some of the early decisions in MySQL's design do not agree with me at all (like truncating long strings, 1/0 instead of booleans, ambiguous group by, etc). I think that data integrity may not have been the top priority in MySQL's early design (and it wasn't, MyISAM was fast, but not great at integrity).

I work for an e-commerce company, where transactions are very, very important. Loosing data is just not acceptable. Creating data that fails validation is a huge pain for our business analysts and accounting, who have to make sense of it. Early MySQL did not have sufficient constraints and foreign keys to encourage data modellers to use it.

I even believe that this lack of support for constraints and validation was the reason early Rails adopters rejected database constraints, and pushed people towards 100% in-application validation, while treating your database sort of like a document store (early Rails books and applications rarely create non-null columns, or even give varchar columns length specification, leaving MySQL to create all string columns with the default 256 character limit).

Among examples of where MySQL data integrity seems like an afterthought, is one where we recently discovered that a MySQL Replica being used for reporting allows writes, and was in fact quite out of sync with the master. How can you trust your reports that run off a replica, which is no longer a true replica?

Anyway, MySQL has plenty of support, fans and still enjoys wide spread usage. But if you are ready to try out PostgreSQL, here's my crash course of PostgreSQL install, features and some recent gotchas I had to deal with.

Note: parts of this post were inspired by a related post on data and PostgreSQL on SquareUp Technical blog.


PostgreSQL Basics

Installers are now available for Mac OS-X and Windows.

But I prefer compiling from sources. Compiling on unixes is very easy. Download the tar ball, unpack it and then run:
/configure --prefix=/usr/local/pgsql-9.1
make
make install
then let's create a database in directory /db on this server:
/usr/local/pgsql-9.1/bin/initdb –D /db
/usr/local/pgsql-9.1/bin/pg_ctl –D /db start
/usr/local/pgsql-9.1/bin/psql –U postgres postgres

Note that if you use an installer, your default user may be not "postgres" but your Mac or Windows username.

Configuration


Two critical files in /db:

postgresql.conf


Most db settings, performance, memory, optimizer, network interface to listen on, go into this file.

You will generally want to change the following (let me know if anyone is interested, and I can make some recommendations about which values I use).
shared_buffers 
temp_buffers 
work_mem 
maintenance_work_mem 
checkpoint_segments 
wal_keep_segments 
effective_cache_size
I also like to enable logger for slow queries in pg_log directory:
logging_collector = on 
log_directory = 'pg_log' 
log_filename = 'postgresql-%Y-%m-%d.log' 
log_rotation_age = 1d 
log_rotation_size = 0 
log_min_error_statement = error 
log_min_duration_statement = 200 
log_lock_waits = on 
log_statement = 'none'

pg_hba.conf


Access, including network, replication, etc. This is the file you want to modify to allow remote replication, remote access, decide which authentication method to use, etc.

Most of the time I use trust on a local system, and md5 for remote authentication.

Some Neat Features


PostgreSQL 9 boast a set of pretty cool features, some of them are listed below:
  • Partial Indexes (Reduce size of the index (say if only 10% of products are active):
create index on products (category_id) where isactive = true;
  • Function Indexes (instead of creating another column with lower case email):
create index on users to_lower(email);
select * from users where to_lower(‘MyEmail@GMAIL.COM’) = ‘myemail@gmail.com’;

More Cool Stuff

  • Create indexes concurrently (without table locks)
  • Schema modifications can be done in a transaction
  • Instant non-locking adding of nullable columns to large table (major issue today with MySQL)
  • Schemas, table spaces!  Can create indexes on a different table space (ie disk partition)
  • Extensible stored procedures: Java, Perl, Python, Ruby, Tcl. C/C++ and its own PL/pgSQL
  • Cost-based optimizer is generally better than rule base optimizer.  Takes into consideration data distribution
  • Full featured text search
  • INTERSECT and EXCEP in addition to UNION
  • Built-in performance statistics: pg_stat_activity

PostgreSQL In Practice: Replication


PostgreSQL 9.1.1 is the latest stable version as of this writing, and since version 9.0 PG supports streaming replication which I have recently setup on several servers.

There are a number of decent guides out there, for example here, and also here.

While setting up replication on PostgreSQL 9.1.1, I had a tiny trouble being able to replicate from the master db. The error was being printed on the master: "FATAL: must be replication role to start walsender"

Somehow, superuser "postgres" was not explicitly given "Replication" role. Weird, considering it's a super user. I think this may be a recent change in PostgreSQL default permissions. But to go around this, either add the REPLICATION role to user "postgres", or create a new role for replication only (must also have LOGIN role):
CREATE ROLE REPLICATOR REPLICATION LOGIN ENCRYPTED PASSWORD '....';
Then in pg_hba.conf:
host replication  replicator  10.0.0.0/32    md5

Using Replica for Queries


Another bump I ran into is the following: I wanted to use the PostgreSQL replica to run reports, pg_dump and other long-running queries. But they immediately failed with an error: "ERROR: canceling statement due to conflict with recovery"

See this thread for more info: http://postgresql.1045698.n5.nabble.com/Hot-Standby-ERROR-canceling-statement-due-to-conflict-with-recovery-td3402417.html

The solution is to increase the following parameter to allow queries longer than 10 minutes to successfully execute:
max_standby_streaming_delay = 600s
This also means your replica may be up to 10 minutes behind the master, but in my case this was an acceptable compromise.

I am guessing if someone wanted to setup a true hot-standby with minimum delay, it would not be very usable for reporting. So perhaps the following setup provides both redundancy and a reporting instance:

[master-db] <- [hot standby] <-- [hot standby, 2 hrs delay, reports]

Where is my Processlist?


MySQL users will no doubt miss infamous "show processlist" command. Don't fret: there is such thing in PostgreSQL too:

select * from pg_stat_activity;

This used to be in a separate contrib module, but now included by default. Which is great, because it's fast and provides locking information (whether queries wait on locks). This module uses a tiny temporary storage (usually in pg_stat_tmp) directory, which if you want to be really fancy, you would mount on a RAM disk partition. It looks like the size of the file inside that directory is constant (does not grow).

Filesystem            Size  Used Avail Use% Mounted on
/dev/ram1             9.7M  226K  9.0M   3% /db/data9/pg_stat_tmp

Locks, Waits, and Deadlocks


Locks and deadlocks are bane of any database application because they suck: at least one process would have to abort, sometimes more, and depending on how good your error handing is, this may have some undesired consequences. 

But how to you find and eliminate deadlocks? Short answer is -- there is no short answer. 

On many applications I worked, whether they were written in C, Perl, Java or Ruby, I've seen deadlocks happen again and again. Debugging deadlocks is a painful exercise and there is no prescribed answer that works in all cases. Debugging distributed deadlocks (that happen when distributed transactions spanning multiple databases lock up), is a lot harder than debugging deadlocks in a single database. 

But in both cases, being able to determine who is locking who, is very very important. PostgreSQL keep lock information in several supporting tables, which can be queried. I found this info invaluable, as you can see which processes are blocking, and which ones are waiting, and eventually figure out how reorder operations in your application or reduce contention on the same database object. The point is that information here is key, and PostgreSQL luckily provides a good deal of it. 

If you experience deadlocks, please see this page for detailed queries on lock contentions and deadlocks: http://wiki.postgresql.org/wiki/Lock_Monitoring

Summary


I wanted to share some of the recent finds, gotchas and also excitement about PostgreSQL 9 database, it's features and capabilities. I hope you found this post informative, and if anything maybe PostgreSQL will peek your interest. Feel free to leave a comment on any of the related topics.

Monday, July 25, 2011

Ruby, Mac Ports and OS-X Lion

Today I upgraded my work machine to OS-X Lion. As I do ruby development on my machine, I quickly stumbled upon a problem with installing "nokogiri" gem. I was getting this error:

Installing nokogiri (1.5.0) with native extensions
[snip] `rescue in block in build_extensions': 
ERROR: Failed to build gem native extension. (Gem::Installer::ExtensionBuildError)
/Users/kig/.rvm/rubies/ruby-1.9.2-p290/bin/ruby extconf.rb
checking for libxml/parser.h... *** extconf.rb failed ***

In addition to installing new XCode from the app store, I also have Mac Ports installed, and so I had to perform the following command to allow nokogiri to build:
sudo port upgrade libxml2 libxslt

After this command completed, Nokogiri happily found libxml and it's dependencies.

UPDATE: Note that I no longer recommend installing libxml2 via ports. Use brew + rbenv instead.

K

Wednesday, June 15, 2011

New iMacs are Fast. Really Fricken Fast.

I finally upgraded my home workstation to one of the new just-released iMacs, which are reported to be the fastest Macs ever by MacRumors.

I concur! They are really really fast! For us rubyists, here are some benchmarks building Ruby VM on this machine using RVM (my home directory is on the SSD drive):
  • Ruby 1.8.7 (14 seconds to download)
  • real 1m20.815s
    user 0m48.298s
    sys 0m11.479s
    
  • Ruby 1.8.7 Enterprise Edition (13 seconds to download)
  • real 2m28.106s
    user 1m54.450s
    sys 0m14.372s
    
  • Ruby 1.9.2 (download time 38 seconds)
  • real 3m0.167s
    user 1m56.891s
    sys 0m19.959s
    
Model Info:
  Model Name: iMac
  Model Identifier: iMac12,2
  Processor Name: Intel Core i7
  Processor Speed: 3.4 GHz
  Number of Processors: 1
  Total Number of Cores: 4
  L2 Cache (per Core): 256 KB
  L3 Cache: 8 MB
  Memory: 8 GB

Tuesday, May 31, 2011

mc-settings: Gem created at ModCloth for managing configuration

We at ModCloth have been for a while focused on sharing some of the code we've written with the community, to give back to the open source.

MC-Settings is a cute little gem that provides smart ways of managing your multi-platform/per-deployment application configuration in a convenient set of YAML files, which are easily managed and activated using Capistrano.

For example, beyond configuration that is specific to Rails environment (such as production, development, etc), you may have configuration that is machine specific, or a cluster specific, etc. For example, if you run on multiple clusters, application servers on one cluster may need to use one set of MemCached servers, while your second cluster could have another set. How do you manage configuration in your deployments in such a way that allows sensible defaults, and per-system, cluster, environment overrides?

This is where MC-Settings gem comes in. Read up below on how to use the gem, and typical patterns that work with Capistrano.

Monday, May 30, 2011

RailsConf 2011 Best of Sessions

Below I put together my personal list of best RailsConf 2011 presentations that have online slides or PDFs. I put my comments next to each. Click on the link, then click on the "slideshow" or "PDF" links for each talk.

And if you are feeling sentimental, and added bonus:

  • 50 in 50 is a really zany presentation on computer science achievements in the last century. Great meditative presentation, and goes way over 1 hour. No relationship to ruby/rails, but everyone will get something out of it

Friday, May 13, 2011

mms-mime: MM7/MMS MIME parsing gem

https://github.com/kigster/mms-mime

This gem was written with a simple aim to parse MM7 wrapped binary and base64 encoded MMS messages received via MM7/XML HTTP post from an MM7 compatible gateway connection (such as OpenWave, OpenMarket, etc).

The gem provides a simple way to parse and access MMS message contents, such as from, to, subject and content parts (including image and text parts).

Wednesday, December 8, 2010

Amazing Map of World of Warcraft World Routes and Travel Paths post Cataclysm

If you are playing Cataclysm, you may be baffled by the new travelling routes between continents. Well, baffle no more! My amazing artistic wifey had created this fantastic map of routes, so you can figure out how to get from anywhere to everywhere without a handy mage :)
Enjoy, and happy playing :)

Saturday, September 18, 2010

GoGaRuCo 2010 - San Francisco Ruby Conference

Had an awesome time here for two days, listening to talks, hacking on some code, learning, networking, even managed to sign up for the UCSF gym and to take a swim. Very nice organization, excellent venue, and fantastic talks. Definitely coming back next year. For the price it's well worth it. Here are some highlights:
  • Super useful resource for lookup up shell commands: http://shellhaters.heroku.com/posix
  • Terminator plugin: start your dev environment as you like it. Gem install terminator
  • http://github.com/rdy/fixture_builder Factory to Fixtures converter to speed up your tests.
  • pprof profile Ruby interpreter. Rack-Profiler project, great profiling tool.
  • Coffee Script - wrapper (ruby-esque) for javascript; rails 3.1 supports coffee script templates
  • Machine Learning - great talk, and O'Reiley book, http://twitter.com/igrigorik
  • minitest fastest testing framework, many mentions, very fast, supports RSpec and Test::Unit syntax
  • Caching: using fresh_when(:last_modified => ...) to enable proper HTTP caching in Rails 3.1
  • Arel: enables fragment caching that does not run SQL if the fragment is cached
  • ruby 1.9: require 'objspace' allows inspection of object counts and memory usage in VM

Tuesday, August 3, 2010

Twitter is the slowest social network, Facebook the fastest

Interesting statistics just came out on AlertSite blog post about performance and response time of various highly trafficked social networks.

Unsurprisingly, Facebook is the fastest, and Twitter is the slowest. Facebook has so much more data to deal with, so much media, and so many more users, that it really is a shame for Twitter with it's 140 character data set to be in this unglorious last place.

After continuous reports about migrating to Cassandra, Twitter still serves their tweets out of MemCaches that sit in front of MySQL. When those caches die, it takes a long time to refill them and the site yet again drops the very familiar Fail Whale.

Perhaps Twitter's infrastructure group could use some new blood to work on scaling and up-time. I am guessing that adding experienced people who scaled Facebook, or other high traffic sites could really help in the long run. The problem is that hiring experienced contributors to work on this pain point may feel threatening to the existing crew who continually tried and ultimately failed to fix Twitter's performance problems. Don't fall into this trap Twitter: people who are having hard time re-engineering current architecture will continue to have a hard time without an influx of fresh energy and new ideas. Consider making the necessary changes, and then perhaps you could pull out of that miserly last place.