Friday, September 14, 2012

Wanelo Tech Blog: The Big Switch: How We Rebuilt Wanelo from Scratch and Lived to Tell About It

I just published my first contribution to the Wanelo blog, on the subject of rewriting a technology stack from scratch and moving from Java to Ruby.

Feel free to leave comments here also!

-- Konstantin (at GoGaRuCo 2012).

Wednesday, August 8, 2012

Nagios checks for Sidekiq Queue size monitoring and Joyent RAM size

At Wanelo we are committed to sharing as much of our code as possible with the world, since we ourselves rely heavily on the open source movement (Ruby, Rails, etc.).

Here is a small but growing project with simple Bash shell scripts used by the Nagios monitoring system to ensure things are running smoothly.

https://github.com/wanelo/nagios-checks

Please see the examples in the README for how to use it with Nagios.

Thursday, July 26, 2012

Getting RMagick and friends to work on OS X Mountain Lion

Today I upgraded the machine that hosts my Ruby environment to Mountain Lion.

Here is a quick checklist that I went through to get everything working. The largest change was having to reinstall Xcode and the command line tools, and also download XQuartz in order to reinstall ImageMagick successfully. Without it, I was getting errors like the following when building RMagick:

ld: file not found: /usr/lib/libltdl.7.dylib for architecture x86_64
clang: error: linker command failed with exit code 1 (use -v to see invocation)
make: *** [RMagick2.bundle] Error 1


Quick checklist:
  1. Install Mountain Lion
  2. Install Xcode 4.4
  3. Install the command line tools from the Xcode 4.4 Preferences dialog
  4. Install XQuartz
  5. In terminal run
    1. brew update
    2. brew uninstall imagemagick
    3. brew install --fresh imagemagick
    4. wipe out your ~/.rvm folder
    5. reinstall RVM and install the latest ruby 1.9.3-p194
  6. run "bundle" in the project folder
  7. run "rake" and rejoice

References:

https://github.com/mroth/lolcommits/issues/65

    Wednesday, July 18, 2012

    Activity Feed Design

    I normally don't publish work diagrams, but this one sort of looks like a work of art.

    It represents a collective design we came up with for aggregating activity feeds on wanelo.com. If you have an account, you will soon see the results of this on http://wanelo.com/following


    Thursday, July 12, 2012

    Using CarrierWave with a dynamic fog host and a CDN

    Another simple case of "this took longer than it takes to look up this issue on Google" :)

    CarrierWave is a fantastic and well-supported Ruby gem for managing uploaded images, including support for cloud storage such as Amazon S3.

    Whether or not you use CarrierWave on the cloud, you almost always want to put your user-uploaded images behind a CDN.  There are many CDNs available, including Amazon's CloudFront, Fastly.com, CacheFly, Akamai, etc, and comparing them is outside the scope of this article. 

    Because browsers, especially older ones, limit how many connections they will establish to a single host, it is sometimes beneficial to have images load from several alternating URLs, for example:
    http://cdn-0.wanelo.com
    http://cdn-1.wanelo.com
    http://cdn-2.wanelo.com

    etc...

    CarrierWave supports a fog_host setting, which can be set to either a string (a static hostname) or a lambda (if, for example, a randomized hostname is desired), and the usage is well described here.

    Unfortunately, due to a bug that is not yet fixed (and it is unclear whether it will be), you currently cannot use the file object directly inside the lambda as shown in the examples.

    If you actually want to use the "file" object inside the fog_host proc in any way, then fog_host must be a double lambda, because the first lambda gets called by the accessor created in the Configuration class. The second is then called in the public_url instance method.

    Here is the configuration we ended up using, which works perfectly:
        config.fog_host = lambda do
          lambda do |file|
            host_number = file.path.sum % 4
            "cdn-s3-%d.ourdomain.com" % host_number
          end
        end
    
    The advantage of this approach is that, with several CDN hosts (which are useful for speeding up the browser's parallel download of images), you generate a consistent URL for each file instead of a random one. So a file named "image.jpg" will always generate "cdn-s3-2.ourdomain.com" because of simple arithmetic:
    "image.jpg".sum % 4
    => 2
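
The two-step call described above can be tried outside of CarrierWave. The following is only a sketch: FakeFile is a hypothetical stand-in for the uploaded file object (only its path is used), and the two .call lines imitate what the post says the configuration accessor and public_url do, not CarrierWave's actual internals.

```ruby
# Hypothetical stand-in for the file object passed to the inner lambda;
# only #path is needed for this illustration.
FakeFile = Struct.new(:path)

CDN_HOSTS = 4 # number of cdn-s3-N hostnames, as in the configuration above

fog_host = lambda do
  lambda do |file|
    # String#sum is a simple byte checksum, so the same path always
    # maps to the same host number.
    "cdn-s3-%d.ourdomain.com" % (file.path.sum % CDN_HOSTS)
  end
end

# First call: takes no arguments, as the configuration accessor would do.
resolver = fog_host.call

# Second call: receives the file, as public_url would do.
puts resolver.call(FakeFile.new("image.jpg")) # prints cdn-s3-2.ourdomain.com
```

Because the host number depends only on the path, a given image keeps the same CDN hostname across page loads, which keeps browser and CDN caches warm.
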
    Thanks to Jay Phillips for the tricky solution to this one.

    Sunday, June 10, 2012

    MagmaRails, and Enterprise Architecture with Rails

    This year's MagmaRails, where I was cordially invited to speak alongside Dr Nic, Aaron Patterson, Ron Evans, Blake Mizerany, Les Hill, Daniel Fischer and many more talented speakers from Mexico and the USA, went off without a hitch in beautiful Manzanillo, Colima, Mexico.

    Organized by Crowd Interactive, this was by far one of my favorite conferences based on the amount of stuff learned, quality of personal interactions, general level of enjoyment, and also, of course, the drinking that went on with all the other fellow attendees and the speakers.  What a blast! Can't wait for the next one.

    Meanwhile, I've revised my presentation, which was originally conceived for the SFROR meetup, adding more clarity and a whole new section on how to get things started and move fast, based in part on my ongoing professional experience at Wanelo.com.

    Here is the presentation!  Please leave comments for any feedback or suggestions.

    View more presentations from Konstantin Gredeskoul

    Sunday, June 3, 2012

    Starting a modern agile Rails 3 project for scale, from scratch - Part 0


    As I am entering my fifth month working at Wanelo, I feel very fortunate to be working with an amazing group of creative and compassionate people, building an awesome new application for the new-generation social shopping platform that Wanelo has become.

    Because it is all so new (the team, the infrastructure, the application), we had to quickly make a lot of fundamental decisions: what toolsets to use, what processes to follow, how to develop new code optimally, how to build features as quickly as possible, how to minimize technical debt, and how not to forget about scale (we have plenty of traffic already). Most of all, we wanted to have a ton of fun along the way :)

    The stack

    For our technology stack we chose Rails 3.2.3, Ruby 1.9.3 and PostgreSQL 9.1 as the base of the application. Besides these fundamental choices, there are easily hundreds of tiny everyday decisions that a team on a new Rails project needs to make. Once the basics are figured out, the frequency of these "big" decisions goes down. But it never stops, as the software for an active application is rarely frozen.

    A bit of history

    My very first Rails project was back in 2006, and it was an e-commerce site. There was just one magic book back then, and that was part of the attraction of Rails. Blurb.com launched a custom-built e-commerce site in 4 months with 4 engineers, of whom all but one were new to Ruby and Rails. Having come from a recent Java project, I was blown away by how quickly everyone had become so incredibly productive.

    Since then, attracted by the same promise, thousands of developers have joined, and the Ruby universe has exploded. What's interesting is that early on, and importantly, this technology attracted some of the brightest minds in the tech industry, people eager to escape to a world of software built in a cleverer, more concise way: maintainable, incredibly compact, and following the best practices of software development, such as automated testing (or even TDD). Escaping from Java or PHP to Ruby brought happiness to many people's careers, mine included.

    As a result of the exploding Ruby-verse, the active toolset changes often. Any new team, having chosen its base set of tools, would probably have chosen a different set six months earlier. This is a unique trait of our beloved development environment, that is, building enterprise and web software with Ruby (and often Rails) and navigating its vibrant, ever-changing software ecosystem.

    There are many excellent (and often free) resources of information on the various open source options available to solve common problems. And when building a web app, especially starting it, many problems you encounter are very common and probably already solved. So,

    What tool is appropriate for ___ task?

    Should I use a gem or roll my own code for _____?

    How do you set up continuous integration?

    How should we all communicate?

    Should we be cowboy programmers and work individually "in corners" for 16 hours, or should we use pair programming?

    These are among the many questions a new team must answer, and quickly. Arguing about these things is wasteful for the business. Even without an argument, each choice takes time.

    Some resources

    There is no shortage of most excellent resources (some free, some for a minimal and well-deserved fee), such as:
    And watching/listening to these is most definitely one of the best ways to learn what's out there, and pick up some best practices along the way.

    But that takes time, and time is often scarce. So for those of you who really need to take a shortcut and kick-start a modern Rails 3 application, follow this series and I will go over a pretty reasonable set of tools and patterns for very common problems. Hopefully, this will be valuable for some people.

    Stay tuned for the next part, where we'll talk about some choices around persistence, and serving and storing the data.

    Monday, April 30, 2012

    SFROR Presentation: Rails in the Real World

    This is a slightly belated post, but recently I gave a presentation to 60+ Rails developers in San Francisco about the evolution of a Rails application, and how it all fits together. The presentation, dramatically named "Rails in the Real World", is available on SlideShare.

    EDIT: I've updated this presentation for the MagmaRails conference in June 2012 (presented in Manzanillo, MX), and the updated presentation is the one that should be viewed.

    Please see this post for the actual slides.


    Saturday, November 12, 2011

    Why I Like PostgreSQL

    Today I gave a short presentation at work about PostgreSQL, and why I much prefer it to MySQL.

    PostgreSQL vs MySQL: Eternal Battle


    I may be misreading this, but it seems that there is a recent trend within startups to move away from MySQL, probably thanks to folks like Heroku on one side (who use PostgreSQL to the extreme, and help and contribute to its development), and folks like Oracle on the other side, tainting the "open source pureness" of MySQL :)

    At my work we currently use a mid-sized MySQL 5.1 Percona instance, which is holding up quite well, I must admit. Both PostgreSQL and MySQL have definitely converged to cover most features that people want, but my leaning is still towards PostgreSQL. I just agree with its focus on data integrity, recovery, constraints and extensibility, while some of the early decisions in MySQL's design do not agree with me at all (like truncating long strings, 1/0 instead of booleans, ambiguous group by, etc). I think that data integrity was not the top priority in MySQL's early design (MyISAM was fast, but not great at integrity).

    I work for an e-commerce company, where transactions are very, very important. Losing data is just not acceptable. Creating data that fails validation is a huge pain for our business analysts and accounting, who have to make sense of it. Early MySQL did not have sufficient constraints and foreign keys to encourage data modellers to use them.

    I even believe that this lack of support for constraints and validation is the reason early Rails adopters rejected database constraints and pushed people towards 100% in-application validation, treating the database sort of like a document store (early Rails books and applications rarely create non-null columns, or even give varchar columns a length, leaving all string columns at the default 255-character limit).

    One example of where MySQL data integrity seems like an afterthought: we recently discovered that a MySQL replica being used for reporting allowed writes, and was in fact quite out of sync with the master. How can you trust reports that run off a replica which is no longer a true replica?

    Anyway, MySQL has plenty of support and fans, and still enjoys widespread usage. But if you are ready to try out PostgreSQL, here's my crash course on PostgreSQL installation, features and some recent gotchas I had to deal with.

    Note: parts of this post were inspired by a related post on data and PostgreSQL on SquareUp Technical blog.

    PostgreSQL Basics

    Installers are now available for Mac OS X and Windows.

    But I prefer compiling from source. Compiling on Unix systems is very easy. Download the tarball, unpack it and then run:
    ./configure --prefix=/usr/local/pgsql-9.1
    make
    make install
    

    Then let's create a database in directory /db on this server:
    /usr/local/pgsql-9.1/bin/initdb -D /db
    /usr/local/pgsql-9.1/bin/pg_ctl -D /db start
    /usr/local/pgsql-9.1/bin/psql -U postgres postgres
    

    Note that if you use an installer, your default user may not be "postgres" but your Mac or Windows username.

    Configuration

    Two critical files in /db:

    postgresql.conf

    Most db settings, performance, memory, optimizer, network interface to listen on, go into this file.

    You will generally want to change the following (let me know if anyone is interested, and I can make some recommendations about which values I use).
    shared_buffers 
    temp_buffers 
    work_mem 
    maintenance_work_mem 
    checkpoint_segments 
    wal_keep_segments 
    effective_cache_size
    
    I also like to enable logging of slow queries (here, anything over 200 ms) to the pg_log directory:
    logging_collector = on 
    log_directory = 'pg_log' 
    log_filename = 'postgresql-%Y-%m-%d.log' 
    log_rotation_age = 1d 
    log_rotation_size = 0 
    log_min_error_statement = error 
    log_min_duration_statement = 200 
    log_lock_waits = on 
    log_statement = 'none'
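
Once slow-query logging is on, a few lines of Ruby can pull the worst offenders out of the log. This is a rough sketch rather than a finished tool: it assumes the usual "duration: ... ms  statement: ..." entries, ignores whatever log_line_prefix puts in front of them, and skips multi-line statements and entries logged by the extended query protocol. The sample lines are made up for illustration.

```ruby
# Matches the "duration: ... ms  statement: ..." entries written when a
# statement exceeds log_min_duration_statement. Text before "duration:"
# depends on log_line_prefix, so it is deliberately not matched.
SLOW_QUERY = /duration: (?<ms>\d+(?:\.\d+)?) ms\s+statement: (?<sql>.*)\z/

# Returns [duration_ms, sql] pairs, slowest first.
def slowest_statements(lines, limit: 10)
  lines
    .map { |line| SLOW_QUERY.match(line.chomp) }
    .compact
    .map { |m| [m[:ms].to_f, m[:sql]] }
    .sort_by { |ms, _sql| -ms }
    .first(limit)
end

# Made-up sample log lines, for illustration only.
sample = [
  "2011-11-12 10:00:01 PST LOG:  duration: 245.310 ms  statement: SELECT * FROM orders WHERE user_id = 42",
  "2011-11-12 10:00:02 PST LOG:  checkpoint starting: time",
  "2011-11-12 10:00:05 PST LOG:  duration: 1203.007 ms  statement: SELECT count(*) FROM products",
]

slowest_statements(sample).each do |ms, sql|
  puts format("%10.1f ms  %s", ms, sql)
end
```

To run it against a real log, pass the lines of a file instead of the sample array, for example File.foreach on a file under /db/pg_log (the exact file name follows the log_filename pattern above).
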
    

    pg_hba.conf

    Access, including network, replication, etc. This is the file you want to modify to allow remote replication, remote access, decide which authentication method to use, etc.

    Most of the time I use trust on a local system, and md5 for remote authentication.

    Some Neat Features

    PostgreSQL 9 boasts a set of pretty cool features; some of them are listed below:
    • Partial indexes reduce the size of an index (say, if only 10% of products are active):
    create index on products (category_id) where isactive = true;
    • Function (expression) indexes, instead of creating another column with a lower-case email:
    create index on users (lower(email));
    select * from users where lower(email) = 'myemail@gmail.com';
    

    More Cool Stuff

    • Create indexes concurrently (without table locks)
    • Schema modifications can be done in a transaction
    • Instant non-locking adding of nullable columns to large table (major issue today with MySQL)
    • Schemas, table spaces!  Can create indexes on a different table space (ie disk partition)
    • Extensible stored procedures: Java, Perl, Python, Ruby, Tcl, C/C++ and its own PL/pgSQL
    • Cost-based optimizer is generally better than a rule-based optimizer. It takes data distribution into consideration
    • Full-featured text search
    • INTERSECT and EXCEPT in addition to UNION
    • Built-in performance statistics: pg_stat_activity

    PostgreSQL In Practice: Replication

    PostgreSQL 9.1.1 is the latest stable version as of this writing, and since version 9.0 PG supports streaming replication, which I have recently set up on several servers.

    There are a number of decent guides out there, for example here, and also here.

    While setting up replication on PostgreSQL 9.1.1, I had a bit of trouble replicating from the master db. This error was being printed on the master: "FATAL: must be replication role to start walsender"

    Somehow, the superuser "postgres" had not been explicitly given the REPLICATION attribute. Weird, considering it's a superuser. I think this may be a recent change in PostgreSQL default permissions. To work around this, either grant REPLICATION to the user "postgres", or create a new role for replication only (it must also have LOGIN):
    CREATE ROLE REPLICATOR REPLICATION LOGIN ENCRYPTED PASSWORD '....';

    Then in pg_hba.conf:
    host replication  replicator  10.0.0.0/32    md5

    Using Replica for Queries

    Another bump I ran into: I wanted to use the PostgreSQL replica to run reports, pg_dump and other long-running queries, but they quickly failed with this error: "ERROR: canceling statement due to conflict with recovery"
    See this thread for more info: http://postgresql.1045698.n5.nabble.com/Hot-Standby-ERROR-canceling-statement-due-to-conflict-with-recovery-td3402417.html
    The solution is to raise the following parameter, which lets queries on the standby run for up to 10 minutes before a conflicting replication change cancels them:
    max_standby_streaming_delay = 600s
    This also means your replica may be up to 10 minutes behind the master, but in my case this was an acceptable compromise.

    I am guessing that if someone wanted to set up a true hot standby with minimum delay, it would not be very usable for reporting. So perhaps the following setup provides both redundancy and a reporting instance:
    [master-db] <- [hot standby] <-- [hot standby, 2 hrs delay, reports]
    

    Where is my Processlist?

    MySQL users will no doubt miss the famous "show processlist" command. Don't fret: there is such a thing in PostgreSQL too:
    select * from pg_stat_activity;

    This used to be in a separate contrib module, but is now included by default, which is great, because it's fast and shows whether queries are waiting on locks. It uses a tiny amount of temporary storage (usually the pg_stat_tmp directory), which, if you want to be really fancy, you can mount on a RAM disk partition. It looks like the size of the file inside that directory is constant (it does not grow).

    Filesystem            Size  Used Avail Use% Mounted on
    /dev/ram1             9.7M  226K  9.0M   3% /db/data9/pg_stat_tmp
    

    Locks, Waits, and Deadlocks

    Locks and deadlocks are the bane of any database application because they suck: at least one process has to abort, sometimes more, and depending on how good your error handling is, this may have some undesired consequences.

    But how do you find and eliminate deadlocks? Short answer is -- there is no short answer.

    On many applications I have worked on, whether they were written in C, Perl, Java or Ruby, I've seen deadlocks happen again and again. Debugging deadlocks is a painful exercise and there is no prescribed answer that works in all cases. Debugging distributed deadlocks (which happen when distributed transactions spanning multiple databases lock up) is a lot harder than debugging deadlocks in a single database.

    But in both cases, being able to determine who is locking whom is very, very important. PostgreSQL keeps lock information in several supporting tables, which can be queried. I found this info invaluable, as you can see which processes are blocking and which ones are waiting, and eventually figure out how to reorder operations in your application or reduce contention on the same database object. The point is that information here is key, and PostgreSQL luckily provides a good deal of it.
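
To make "reorder operations" concrete, here is a small in-memory analogy, not database code: Account and its Mutex are hypothetical stand-ins for a row and its row lock. Two threads transfer in opposite directions, which is the classic deadlock recipe, but because both always take the lock with the lower id first, neither can end up holding one lock while waiting for the other.

```ruby
# Hypothetical in-memory stand-in for a database row guarded by a row lock.
Account = Struct.new(:id, :balance, :lock)

def transfer(from, to, amount)
  # Always acquire locks in ascending id order, regardless of the
  # direction of the transfer.
  first, second = [from, to].sort_by(&:id)
  first.lock.synchronize do
    second.lock.synchronize do
      from.balance -= amount
      to.balance += amount
    end
  end
end

a = Account.new(1, 100, Mutex.new)
b = Account.new(2, 100, Mutex.new)

# Opposite-direction transfers running concurrently.
threads = [
  Thread.new { 1_000.times { transfer(a, b, 1) } },
  Thread.new { 1_000.times { transfer(b, a, 1) } },
]
threads.each(&:join)

puts "a=#{a.balance} b=#{b.balance}"
```

In a real application the same idea would mean touching rows (for example via SELECT ... FOR UPDATE) in one consistent order, such as by primary key, in every transaction.
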

    If you experience deadlocks, please see this page for detailed queries on lock contentions and deadlocks: http://wiki.postgresql.org/wiki/Lock_Monitoring

    Summary

    I wanted to share some recent finds and gotchas, and also my excitement about the PostgreSQL 9 database, its features and capabilities. I hope you found this post informative, and if anything, maybe PostgreSQL will pique your interest. Feel free to leave a comment on any of the related topics.