
Tektastic
Elaborations on technology, music, art and life in general, by Konstantin Gredeskoul
Friday, December 23, 2011
Delusions of Technofei - 2011 Tech House DJ Set

Saturday, November 12, 2011
Why I Like PostgreSQL
PostgreSQL vs MySQL: Eternal Battle
I may be misreading this, but it seems that there is a recent trend within startups to move away from MySQL, probably thanks to folks like Heroku on one side (who use PostgreSQL to the extreme, and help and contribute to it's development), vs folks like Oracle on the other side, tainting the "open source pureness" of MySQL :)
At my work we currently use a mid-sized MySQL 5.1 Percona instance, which is holding up quite well I must admit. Both PostgreSQL and MySQL have definitely converged to cover most features that people want, but my leaning is still towards PostgreSQL. I just agree with it's focus on data integrity, recovery, constraints, extensibility, while some of the early decisions in MySQL's design do not agree with me at all (like truncating long strings, 1/0 instead of booleans, ambiguous group by, etc). I think that data integrity may not have been the top priority in MySQL's early design (and it wasn't, MyISAM was fast, but not great at integrity).
I work for an e-commerce company, where transactions are very, very important. Loosing data is just not acceptable. Creating data that fails validation is a huge pain for our business analysts and accounting, who have to make sense of it. Early MySQL did not have sufficient constraints and foreign keys to encourage data modellers to use it.
I even believe that this lack of support for constraints and validation was the reason early Rails adopters rejected database constraints, and pushed people towards 100% in-application validation, while treating your database sort of like a document store (early Rails books and applications rarely create non-null columns, or even give varchar columns length specification, leaving MySQL to create all string columns with the default 256 character limit).
Among examples of where MySQL data integrity seems like an afterthought, is one where we recently discovered that a MySQL Replica being used for reporting allows writes, and was in fact quite out of sync with the master. How can you trust your reports that run off a replica, which is no longer a true replica?
Anyway, MySQL has plenty of support, fans and still enjoys wide spread usage. But if you are ready to try out PostgreSQL, here's my crash course of PostgreSQL install, features and some recent gotchas I had to deal with.
Note: parts of this post were inspired by a related post on data and PostgreSQL on SquareUp Technical blog.
PostgreSQL Basics
Installers are now available for Mac OS-X and Windows.But I prefer compiling from sources. Compiling on unixes is very easy. Download the tar ball, unpack it and then run:
/configure --prefix=/usr/local/pgsql-9.1 make make installthen let's create a database in directory /db on this server:
/usr/local/pgsql-9.1/bin/initdb –D /db /usr/local/pgsql-9.1/bin/pg_ctl –D /db start /usr/local/pgsql-9.1/bin/psql –U postgres postgres
Note that if you use an installer, your default user may be not "postgres" but your Mac or Windows username.
Configuration
Two critical files in /db:
postgresql.conf
Most db settings, performance, memory, optimizer, network interface to listen on, go into this file.
You will generally want to change the following (let me know if anyone is interested, and I can make some recommendations about which values I use).
shared_buffers temp_buffers work_mem maintenance_work_mem checkpoint_segments wal_keep_segments effective_cache_sizeI also like to enable logger for slow queries in pg_log directory:
logging_collector = on log_directory = 'pg_log' log_filename = 'postgresql-%Y-%m-%d.log' log_rotation_age = 1d log_rotation_size = 0 log_min_error_statement = error log_min_duration_statement = 200 log_lock_waits = on log_statement = 'none'
pg_hba.conf
Access, including network, replication, etc. This is the file you want to modify to allow remote replication, remote access, decide which authentication method to use, etc.
Most of the time I use trust on a local system, and md5 for remote authentication.
Some Neat Features
PostgreSQL 9 boast a set of pretty cool features, some of them are listed below:
- Partial Indexes (Reduce size of the index (say if only 10% of products are active):
create index on products (category_id) where isactive = true;
- Function Indexes (instead of creating another column with lower case email):
create index on users to_lower(email); select * from users where to_lower(‘MyEmail@GMAIL.COM’) = ‘myemail@gmail.com’;
More Cool Stuff
- Create indexes concurrently (without table locks)
- Schema modifications can be done in a transaction
- Instant non-locking adding of nullable columns to large table (major issue today with MySQL)
- Schemas, table spaces! Can create indexes on a different table space (ie disk partition)
- Extensible stored procedures: Java, Perl, Python, Ruby, Tcl. C/C++ and its own PL/pgSQL
- Cost-based optimizer is generally better than rule base optimizer. Takes into consideration data distribution
- Full featured text search
- INTERSECT and EXCEP in addition to UNION
- Built-in performance statistics: pg_stat_activity
PostgreSQL In Practice: Replication
PostgreSQL 9.1.1 is the latest stable version as of this writing, and since version 9.0 PG supports streaming replication which I have recently setup on several servers.
There are a number of decent guides out there, for example here, and also here.
While setting up replication on PostgreSQL 9.1.1, I had a tiny trouble being able to replicate from the master db. The error was being printed on the master: "FATAL: must be replication role to start walsender"
Somehow, superuser "postgres" was not explicitly given "Replication" role. Weird, considering it's a super user. I think this may be a recent change in PostgreSQL default permissions. But to go around this, either add the REPLICATION role to user "postgres", or create a new role for replication only (must also have LOGIN role):
CREATE ROLE REPLICATOR REPLICATION LOGIN ENCRYPTED PASSWORD '....';Then in pg_hba.conf:
host replication replicator 10.0.0.0/32 md5
Using Replica for Queries
Another bump I ran into is the following: I wanted to use the PostgreSQL replica to run reports, pg_dump and other long-running queries. But they immediately failed with an error: "ERROR: canceling statement due to conflict with recovery"
See this thread for more info: http://postgresql.1045698.n5.nabble.com/Hot-Standby-ERROR-canceling-statement-due-to-conflict-with-recovery-td3402417.html
The solution is to increase the following parameter to allow queries longer than 10 minutes to successfully execute:
max_standby_streaming_delay = 600sThis also means your replica may be up to 10 minutes behind the master, but in my case this was an acceptable compromise.
I am guessing if someone wanted to setup a true hot-standby with minimum delay, it would not be very usable for reporting. So perhaps the following setup provides both redundancy and a reporting instance:
[master-db] <- [hot standby] <-- [hot standby, 2 hrs delay, reports]
Where is my Processlist?
MySQL users will no doubt miss infamous "show processlist" command. Don't fret: there is such thing in PostgreSQL too:
select * from pg_stat_activity;
This used to be in a separate contrib module, but now included by default. Which is great, because it's fast and provides locking information (whether queries wait on locks). This module uses a tiny temporary storage (usually in pg_stat_tmp) directory, which if you want to be really fancy, you would mount on a RAM disk partition. It looks like the size of the file inside that directory is constant (does not grow).
Filesystem Size Used Avail Use% Mounted on /dev/ram1 9.7M 226K 9.0M 3% /db/data9/pg_stat_tmp
Locks, Waits, and Deadlocks
Locks and deadlocks are bane of any database application because they suck: at least one process would have to abort, sometimes more, and depending on how good your error handing is, this may have some undesired consequences.
But how to you find and eliminate deadlocks? Short answer is -- there is no short answer.
On many applications I worked, whether they were written in C, Perl, Java or Ruby, I've seen deadlocks happen again and again. Debugging deadlocks is a painful exercise and there is no prescribed answer that works in all cases. Debugging distributed deadlocks (that happen when distributed transactions spanning multiple databases lock up), is a lot harder than debugging deadlocks in a single database.
But in both cases, being able to determine who is locking who, is very very important. PostgreSQL keep lock information in several supporting tables, which can be queried. I found this info invaluable, as you can see which processes are blocking, and which ones are waiting, and eventually figure out how reorder operations in your application or reduce contention on the same database object. The point is that information here is key, and PostgreSQL luckily provides a good deal of it.
If you experience deadlocks, please see this page for detailed queries on lock contentions and deadlocks: http://wiki.postgresql.org/wiki/Lock_Monitoring
Summary
I wanted to share some of the recent finds, gotchas and also excitement about PostgreSQL 9 database, it's features and capabilities. I hope you found this post informative, and if anything maybe PostgreSQL will peek your interest. Feel free to leave a comment on any of the related topics.
Monday, July 25, 2011
Ruby, Mac Ports and OS-X Lion
Installing nokogiri (1.5.0) with native extensions [snip] `rescue in block in build_extensions': ERROR: Failed to build gem native extension. (Gem::Installer::ExtensionBuildError) /Users/kig/.rvm/rubies/ruby-1.9.2-p290/bin/ruby extconf.rb checking for libxml/parser.h... *** extconf.rb failed ***
In addition to installing new XCode from the app store, I also have Mac Ports installed, and so I had to perform the following command to allow nokogiri to build:
sudo port upgrade libxml2 libxslt
After this command completed, Nokogiri happily found libxml and it's dependencies.
UPDATE: Note that I no longer recommend installing libxml2 via ports. Use brew + rbenv instead.
K
Wednesday, June 15, 2011
New iMacs are Fast. Really Fricken Fast.
- Ruby 1.8.7 (14 seconds to download)
real 1m20.815s user 0m48.298s sys 0m11.479s
real 2m28.106s user 1m54.450s sys 0m14.372s
real 3m0.167s user 1m56.891s sys 0m19.959sModel Info:
Model Name: iMac Model Identifier: iMac12,2 Processor Name: Intel Core i7 Processor Speed: 3.4 GHz Number of Processors: 1 Total Number of Cores: 4 L2 Cache (per Core): 256 KB L3 Cache: 8 MB Memory: 8 GB
Tuesday, May 31, 2011
mc-settings: Gem created at ModCloth for managing configuration
Monday, May 30, 2011
RailsConf 2011 Best of Sessions

Below I put together my personal list of best RailsConf 2011 presentations that have online slides or PDFs. I put my comments next to each. Click on the link, then click on the "slideshow" or "PDF" links for each talk.
- ActiveSupport 3: What We Should Know About What We Don't Know - how to use ActiveSupport in your ruby apps and get lots of behaviors and useful stuff for free
- Building Bulletproof Views really decent presentation on state of views in general, and best practices. Recommended reading.
- Building Pageless Apps with Rails and Backbone JS is a great presentation on a new up and coming JS client side framework for building reach apps on the client using pure MVC.
- David Heinemeier Hansson -- keynote, worth watching
- From 1,000 Transactions a Month to 1 million in a Day: Lessons in Credit Card Processing from LivingSocial - cool presentation on how BrainTree API is used to encrypt card data in JavaScript before submitting to BT for processing. Lots of other useful tidbits on credit card processing.
- How To Handle 1,000,000 Daily Users Without Using A Cache most interesting presentation for performance junkies, on how they outgrew MySQL and explored various other solutions to scale to 1M daily users. Highly recommended.
- KnowSQL: Database Tricks To Make Your Life Easier this is a PostgreSQL heavy presentation on how to get the most out of your database engine. Recommended to anyone who wants to learn more about using a real database =)
- OmniAuth from the Ground Up a really good presentation on OmniAuth - a generic authentication gem that supports a lot of options, including OpenID, LDAP, etc.
- Open Source E-Commerce With Spree VERY interesting presentation on this open source e-commerce platform. Great examples on using engines, customizable CSS, and many more. URL: https://github.com/spree/spree
- Rails Performance Tools - system tools such as lsof, strace, ltrace, gdb, perftools, memprof, etc to identify bottlenecks in your ruby apps that NewRelic can't, by Aman Gupta.
- Sass: The Future of Stylesheets a great overview of SASS and compass, with examples of @extend and @include.
- Stateful, Scalable Servers with EventMachine and Rails a talk by Aman Gupta (again) on EventMachine and writing non-blocking high concurrency but single-threaded TCP/IP servers. Highly recommended.
- The Holy Grail (of Databases) Overview of SQL, key-value stores and other NoSQL storage options. Highly recommended.
And if you are feeling sentimental, and added bonus:
- 50 in 50 is a really zany presentation on computer science achievements in the last century. Great meditative presentation, and goes way over 1 hour. No relationship to ruby/rails, but everyone will get something out of it
Friday, May 13, 2011
mms-mime: MM7/MMS MIME parsing gem
This gem was written with a simple aim to parse MM7 wrapped binary and base64 encoded MMS messages received via MM7/XML HTTP post from an MM7 compatible gateway connection (such as OpenWave, OpenMarket, etc).
The gem provides a simple way to parse and access MMS message contents, such as from, to, subject and content parts (including image and text parts).
Wednesday, December 8, 2010
Amazing Map of World of Warcraft World Routes and Travel Paths post Cataclysm
Saturday, September 18, 2010
GoGaRuCo 2010 - San Francisco Ruby Conference
- Super useful resource for lookup up shell commands: http://shellhaters.heroku.com/posix
- Terminator plugin: start your dev environment as you like it. Gem install terminator
- http://github.com/rdy/fixture_builder Factory to Fixtures converter to speed up your tests.
- pprof profile Ruby interpreter. Rack-Profiler project, great profiling tool.
- Coffee Script - wrapper (ruby-esque) for javascript; rails 3.1 supports coffee script templates
- Machine Learning - great talk, and O'Reiley book, http://twitter.com/igrigorik
- minitest fastest testing framework, many mentions, very fast, supports RSpec and Test::Unit syntax
- Caching: using fresh_when(:last_modified => ...) to enable proper HTTP caching in Rails 3.1
- Arel: enables fragment caching that does not run SQL if the fragment is cached
- ruby 1.9: require 'objspace' allows inspection of object counts and memory usage in VM
Tuesday, August 3, 2010
Twitter is the slowest social network, Facebook the fastest
Interesting statistics just came out on AlertSite blog post about performance and response time of various highly trafficked social networks.

Unsurprisingly, Facebook is the fastest, and Twitter is the slowest. Facebook has so much more data to deal with, so much media, and so many more users, that it really is a shame for Twitter with it's 140 character data set to be in this unglorious last place.
After continuous reports about migrating to Cassandra, Twitter still serves their tweets out of MemCaches that sit in front of MySQL. When those caches die, it takes a long time to refill them and the site yet again drops the very familiar Fail Whale.
Perhaps Twitter's infrastructure group could use some new blood to work on scaling and up-time. I am guessing that adding experienced people who scaled Facebook, or other high traffic sites could really help in the long run. The problem is that hiring experienced contributors to work on this pain point may feel threatening to the existing crew who continually tried and ultimately failed to fix Twitter's performance problems. Don't fall into this trap Twitter: people who are having hard time re-engineering current architecture will continue to have a hard time without an influx of fresh energy and new ideas. Consider making the necessary changes, and then perhaps you could pull out of that miserly last place.
