Saturday, February 18, 2017

Car history review


1999-2000  Honda Civic SiR: bought $128,000, sold $95,000, cost per year $33,000; 5MT, 1600cc, 170hp.
           Highlights: 170ps. Comments: Don't trust 2nd hand car dealers. Problems: engine leakage, history.

2004-2011  Toyota Corolla: bought $110,000, sold $34,000, cost per year $10,857; 5MT, 1500cc, 109hp.
           Highlights: Corolla. Comments: Family car, MT, practical. Problems: AC, 3rd gear.

2007-2011  Peugeot 206 RC: bought $71,000, sold $30,000, cost per year $10,250; 5MT, 2000cc, 180hp.
           Highlights: Hatchback, 2000cc, 2 doors. Comments: Good corners, fun to drive, great value for money. Problems: noise, high rpm on highway.

2011-2013  Honda S2000: bought $130,000, sold $140,000, cost per year ($5,000); 6MT, 2000cc, 250hp.
           Highlights: 9000rpm, coupe, 6MT. Comments: RWD is no good… Problems: steering, engine noise, airbag light, history.

2013-      Suzuki Swift Sport: bought $185,000; 6MT, 1600cc, 135hp.
           Highlights: 4 doors, 6MT, NCAP 5 stars. Comments: Nice warm hatch. Problems: none, a bit small.

2017-      Subaru XV 1.6: bought $189,800; CVT, 1600cc, 114hp.
           Highlights: Boxer, AWD, safety, larger.

Friday, March 6, 2015

declarative and imperative programming

Declarative programming is when you write your code in such a way that it describes what you want to do, and not how you want to do it. It is left up to the compiler to figure out the how. Examples of declarative programming languages are SQL and Prolog.

Reference: http://stackoverflow.com/questions/129628/what-is-declarative-programming


A great C# example of declarative vs. imperative programming is LINQ.

With imperative programming, you tell the compiler what you want to happen, step by step.

For example, let's start with this collection, and choose the odd numbers:

List<int> collection = new List<int> { 1, 2, 3, 4, 5 };

With imperative programming, we'd step through this, and decide what we want:

List<int> results = new List<int>();
foreach (var num in collection)
{
    if (num % 2 != 0)
        results.Add(num);
}

Here, we're saying:

  1. Create a result collection
  2. Step through each number in the collection
  3. Check the number, if it's odd, add it to the results

With declarative programming, on the other hand, you write code that describes what you want, but not necessarily how to get it (declare your desired results, but not the step-by-step):

var results = collection.Where(num => num % 2 != 0);

Reference: http://stackoverflow.com/questions/1784664/what-is-the-difference-between-declarative-and-imperative-programming


http://latentflip.com/imperative-vs-declarative/

Thursday, March 5, 2015

Consider using node.js for mobile app

http://blog.langoor.mobi/node-js-mobile-web-apps/
http://www.quora.com/Should-I-use-Go-or-Node-js-for-a-mobile-app-backend

Push technology
http://en.wikipedia.org/wiki/Push_technology

http://www.gianlucaguarini.com/blog/nodejs-and-a-simple-push-notification-server/

What server is suitable for a mobile app?
http://aws.amazon.com/mobile/
...


Related topics:
How does WhatsApp work?
https://www.google.com.hk/search?q=how+whatsapp+do+server+push

Mobile App development tools

Products:

http://xamarin.com/

http://phonegap.com/

http://www.telerik.com/

http://rhomobile.com/products/rhogallery/

http://www.appcelerator.com/titanium/



References:
http://www.tcs.com/SiteCollectionDocuments/White%20Papers/Mobility_Whitepaper_Client-Architecture_1012-1.pdf

Comparison on NoSQL Solutions

https://www.google.com.hk/search?q=nosql+comparison

http://db-engines.com/en/system/Memcached%3BMongoDB%3BRedis
https://www.digitalocean.com/community/tutorials/a-comparison-of-nosql-database-management-systems-and-models
http://kkovacs.eu/cassandra-vs-mongodb-vs-couchdb-vs-redis

Wednesday, February 25, 2015

Top 50 interview questions

reference: http://www.glassdoor.com/blog/common-interview-questions/

What are your strengths?
What are your weaknesses?
Why are you interested in working for [insert company name here]?
Where do you see yourself in 5 years? 10 years?
Why do you want to leave your current company?
Why was there a gap in your employment between [insert date] and [insert
date]?
What can you offer us that someone else can not?
What are three things your former manager would like you to improve on?
Are you willing to relocate?
Are you willing to travel?
Tell me about an accomplishment you are most proud of.
Tell me about a time you made a mistake.
What is your dream job?
How did you hear about this position?
What would you look to accomplish in the first 30 days/60 days/90 days
on the job?
Discuss your resume.
Discuss your educational background.
Describe yourself.
Tell me how you handled a difficult situation.
Why should we hire you?
Why are you looking for a new job?
Would you work holidays/weekends?
How would you deal with an angry or irate customer?
What are your salary requirements? (Hint: if you're not sure what's a
fair salary range and compensation package, research the job title
and/or company on Glassdoor.)
Give a time when you went above and beyond the requirements for a project.
Who are our competitors?
What was your biggest failure?
What motivates you?
What's your availability?
Who's your mentor?
Tell me about a time when you disagreed with your boss.
How do you handle pressure?
What is the name of our CEO?
What are your career goals?
What gets you up in the morning?
What would your direct reports say about you?
What were your bosses' strengths/weaknesses?
If I called your boss right now and asked him what is an area that you
could improve on, what would he say?
Are you a leader or a follower?
What was the last book you've read for fun?
What are your co-worker pet peeves?
What are your hobbies?
What is your favorite website?
What makes you uncomfortable?
What are some of your leadership experiences?
How would you fire someone?
What do you like the most and least about working in this industry?
Would you work 40+ hours a week?
What questions haven't I asked you?
What questions do you have for me?

97 things that every system architect should know

Don't put your resume ahead of the requirements by Nitin Borwankar
Simplify essential complexity; diminish accidental complexity by Neal Ford
Chances are your biggest problem isn't technical by Mark Ramm
Communication is King; Clarity and Leadership its humble servants by
Mark Richards
Architecting is about balancing by Randy Stafford
Seek the value in requested capabilities by Einar Landre
Stand Up! by Udi Dahan
Skyscrapers aren't scalable by Michael Nygard
You're negotiating more often than you think. by Michael Nygard
Quantify by Keith Braithwaite
One line of working code is worth 500 of specification by Allison Randal
There is no one-size-fits-all solution by Randy Stafford
It's never too early to think about performance by Rebecca Parsons
Application architecture determines application performance by Randy
Stafford
Commit-and-run is a crime. by Niclas Nilsson
There Can be More than One by Keith Braithwaite
Business Drives by Dave Muirhead
Simplicity before generality, use before reuse by Kevlin Henney
Architects must be hands on by John Davies
Continuously Integrate by Dave Bartlett
Avoid Scheduling Failures by Norman Carnovale
Architectural Tradeoffs by Mark Richards
Database as a Fortress by Dan Chak
Use uncertainty as a driver by Kevlin Henney
Scope is the enemy of success by Dave Quick
Reuse is about people and education, not just architecture by Jeremy Meyer
There is no 'I' in architecture by Dave Quick
Get the 1000ft view by Erik Doernenburg
Try before choosing by Erik Doernenburg
Understand The Business Domain by Mark Richards
Programming is an act of design by Einar Landre
Time changes everything by Philip Nelson
Give developers autonomy by Philip Nelson
Value stewardship over showmanship by Barry Hawkins
Warning, problems in mirror may be larger than they appear by Dave Quick
The title of software architect has only lower-case 'a's; deal with it
by Barry Hawkins
Software architecture has ethical consequences by Michael Nygard
Everything will ultimately fail by Michael Nygard
Context is King by Edward Garson
It's all about performance by Craig L Russell
Engineer in the white spaces by Michael Nygard
Talk the Talk by Mark Richards
Heterogeneity Wins by Edward Garson
Dwarves, Elves, Wizards, and Kings by Evan Cofsky
Learn from Architects of Buildings by Keith Braithwaite
Fight repetition by Niclas Nilsson
Welcome to the Real World by Gregor Hohpe
Don't Control, but Observe by Gregor Hohpe
Janus the Architect by Dave Bartlett
Architects focus is on the boundaries and interfaces by Einar Landre
Challenge assumptions - especially your own by Timothy High
Record your rationale by Timothy High
Empower developers by Timothy High
It is all about the data by Paul W. Homer
Control the data, not just the code by Chad LaVigne
Don't Stretch The Architecture Metaphors by David Ing
Focus on Application Support and Maintenance by Mncedisi Kasper
Prepare to pick two by Bill de hOra
Prefer principles, axioms and analogies to opinion and taste by Michael
Harmer
Start with a Walking Skeleton by Clint Shank
Share your knowledge and experiences by Paul W. Homer
Make sure the simple stuff is simple by Chad LaVigne
If you design it, you should be able to code it. by Mike Brown
The ROI variable by George Malamidis
Your system is legacy, design for it. by Dave Anderson
If there is only one solution, get a second opinion by Timothy High
Understand the impact of change by Doug Crawford
You have to understand Hardware too by Kamal Wickramanayake
Shortcuts now are paid back with interest later by Scot Mcphee
"Perfect" is the Enemy of "Good Enough" by Greg Nyberg
Avoid "Good Ideas" by Greg Nyberg
Great content creates great systems by Zubin Wadia
The Business Vs. The Angry Architect by Chad LaVigne
Stretch key dimensions to see what breaks by Stephen Jones
Before anything, an architect is a developer by Mike Brown
A rose by any other name will end up as a cabbage by Sam Gardiner
Stable problems get high quality solutions by Sam Gardiner
It Takes Diligence by Brian Hart
Take responsibility for your decisions by Yi Zhou
Don't Be a Problem Solver by Eben Hewitt
Choose your weapons carefully, relinquish them reluctantly by Chad LaVigne
Your Customer is Not Your Customer by Eben Hewitt
It will never look like that by Peter Gillard-Moss
Choose Frameworks that play well with others by Eric Hawthorne
Make a strong business case by Yi Zhou
Pattern Pathology by Chad LaVigne
Learn a new language by Burk Hufnagel
Don't Be Clever by Eben Hewitt
Build Systems to be Zuhanden by Keith Braithwaite
Find and retain passionate problem solvers by Chad LaVigne
Software doesn't really exist by Chad LaVigne
Pay down your technical debt by Burk Hufnagel
You can't future-proof solutions by Richard Monson-Haefel
The User Acceptance Problem by Norman Carnovale
The Importance of Consommé by Eben Hewitt
For the end-user, the interface is the system by Vinayak Hegde
Great software is not built, it is grown by Bill de hOra

97 things that a programmer should know

Act with Prudence
Apply Functional Programming Principles
Ask "What Would the User Do?" (You Are Not the User)
Automate Your Coding Standard
Beauty Is in Simplicity
Before You Refactor
Beware the Share
The Boy Scout Rule
Check Your Code First Before Looking to Blame Others
Choose Your Tools with Care
Code in the Language of the Domain
Code Is Design
Code Layout Matters
Code Reviews
Coding with Reason
A Comment on Comments
Comment Only What the Code Cannot Say
Continuous Learning
Convenience Is Not an –ility
Deploy Early and Often
Distinguish Business Exceptions from Technical
Do Lots of Deliberate Practice
Domain-Specific Languages
Don't Be Afraid to Break Things
Don't Be Cute with Your Test Data
Don't Ignore That Error!
Don't Just Learn the Language, Understand its Culture
Don't Nail Your Program into the Upright Position
Don't Rely on "Magic Happens Here"
Don't Repeat Yourself
Don't Touch That Code!
Encapsulate Behavior, Not Just State
Floating-Point Numbers Aren't Real
Fulfill Your Ambitions with Open Source
The Golden Rule of API Design
The Guru Myth
Hard Work Does Not Pay Off
How to Use a Bug Tracker
Improve Code by Removing It
Install Me
Inter-Process Communication Affects Application Response Time
Keep the Build Clean
Know How to Use Command-line Tools
Know Well More than Two Programming Languages
Know Your IDE
Know Your Limits
Know Your Next Commit
Large Interconnected Data Belongs to a Database
Learn Foreign Languages
Learn to Estimate
Learn to Say "Hello, World"
Let Your Project Speak for Itself
The Linker Is Not a Magical Program
The Longevity of Interim Solutions
Make Interfaces Easy to Use Correctly and Hard to Use Incorrectly
Make the Invisible More Visible
Message Passing Leads to Better Scalability in Parallel Systems
A Message to the Future
Missing Opportunities for Polymorphism
News of the Weird: Testers Are Your Friends
One Binary
Only the Code Tells the Truth
Own (and Refactor) the Build
Pair Program and Feel the Flow
Prefer Domain-Specific Types to Primitive Types
Prevent Errors
The Professional Programmer
Put Everything Under Version Control
Put the Mouse Down and Step Away from the Keyboard
Read Code
Read the Humanities
Reinvent the Wheel Often
Resist the Temptation of the Singleton Pattern
The Road to Performance Is Littered with Dirty Code Bombs
Simplicity Comes from Reduction
The Single Responsibility Principle
Start from Yes
Step Back and Automate, Automate, Automate
Take Advantage of Code Analysis Tools
Test for Required Behavior, Not Incidental Behavior
Test Precisely and Concretely
Test While You Sleep (and over Weekends)
Testing Is the Engineering Rigor of Software Development
Thinking in States
Two Heads Are Often Better than One
Two Wrongs Can Make a Right (and Are Difficult to Fix)
Ubuntu Coding for Your Friends
The Unix Tools Are Your Friends
Use the Right Algorithm and Data Structure
Verbose Logging Will Disturb Your Sleep
WET Dilutes Performance Bottlenecks
When Programmers and Testers Collaborate
Write Code as If You Had to Support It for the Rest of Your Life
Write Small Functions Using Examples
Write Tests for People
You Gotta Care About the Code
Your Customers Do Not Mean What They Say

Sunday, February 22, 2015

Revisit the basics: Encapsulation

Encapsulation is the technique of making fields in a class private and
providing access to them only through public methods.
Benefits:
1. Read/write access control.
2. The internal handling of a field can change without changing the caller's
code (see the sketch below).
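
A quick JavaScript sketch of the idea (the class and names are made up; any OO language works the same way):

class Account {
  #balance = 0;                      // private field: not reachable from outside the class

  get balance() {                    // read access only
    return this.#balance;
  }

  deposit(amount) {                  // write access goes through validation
    if (amount <= 0) throw new Error('deposit must be positive');
    this.#balance += amount;         // internal storage can change without touching callers
  }
}

const account = new Account();
account.deposit(100);
console.log(account.balance); // 100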

Practice - High cohesion, Low coupling

High cohesion: A class has a clear responsibility, and all of the implementation of that responsibility is close together or in one place.  For example, put all the logic for writing to the database in one class (or a few classes in one module), instead of letting every part of the application that needs to write to the database implement that logic in its own class.

Low coupling: A module or class is loosely coupled if it does not have many dependencies on other classes or modules.  Low coupling reduces the impact of a change in one module on other modules, and loosely coupled software is easier to reuse because it doesn't drag many other dependencies along with it.
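
A rough JavaScript sketch of low coupling (the names are made up): the service depends on whatever store object is handed to it, not on a concrete database class, so swapping the storage touches nothing else.

class UserService {
  constructor(store) {                 // dependency is injected, not hard-wired
    this.store = store;
  }

  register(name) {
    return this.store.save({ name });
  }
}

// Any object with a save() method will do: a real database wrapper or this in-memory stub.
const memoryStore = { save: (user) => ({ id: 1, ...user }) };
const service = new UserService(memoryStore);
console.log(service.register('alice')); // { id: 1, name: 'alice' }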

Thursday, February 12, 2015

Tip of the day: Scale up first, then scale out

... Jacques makes the point we often make on this site: scale up first, then scale out. If a machine with 64 cores and 256G of RAM no longer works for you, then scale out. Start small and evolve....

http://highscalability.com/blog/2015/2/11/rescuing-an-outsourced-project-from-collapse-8-problems-foun.html

Wednesday, February 11, 2015

Why node.js 2

Reference: http://www.javaworld.com/article/2104480/java-web-development/why-use-node-js.html

In one sentence: Node.js shines in real-time web applications employing push technology over websockets. What is so revolutionary about that? Well, after over 20 years of stateless-web based on the stateless request-response paradigm, we finally have web applications with real-time, two-way connections, where both the client and server can initiate communication, allowing them to exchange data freely. This is in stark contrast to the typical web response paradigm, where the client always initiates communication. Additionally, it's all based on the open web stack (HTML, CSS and JS) running over the standard port 80.

Still, Node.js is no silver bullet; in fact it has some real weaknesses -- for instance in CPU-intensive operations and other heavy computation. Find out why Node.js is becoming a go-to platform for some kinds of web development, and why in some cases you still might want to choose another option, like Java.

Why node.js 1

Some say:

Reference: http://blog.modulus.io/top-10-reasons-to-use-node

TOP 10 REASONS TO USE NODE.JS
There are many great reasons to use Node.js, regardless of experience level. Take a look into what some of the greatest practical reasons are to use Node and why you should love it.

I get it. You're not a bandwagon developer. You don't use the cool, trendy platform just because everyone else is. That's why you haven't looked seriously at Node.js yet. (Or your boss hasn't let you yet.) Well, it's time to look again. There are many great, practical reasons to use Node. Here are ten of them.



1. You Already Know JavaScript
Let me guess. You're using a rich client framework (Angular, Ember, Backbone) and a REST-ful server-side API that shuttles JSON back and forth. Even if you're not using one of those frameworks, you've written your own in jQuery. So if you're not using Node.js on the server, then you're constantly translating. You're translating two things: 1) the logic in your head from JavaScript to your server-side framework, and 2) the HTTP data from JSON to your server-side objects.

By using JavaScript throughout your app, you not only gain mental energy, you gain practicality as well. By potentially re-using your models, and templates, you reduce the size of your application which reduces complexity and chance for bugs.

JavaScript as a language is eating the world. It is not going away soon. There is a JavaScript runtime on every personal computer in the world, and it looks to stay that way for a while.

2. It's Fast
Node.js is a JavaScript runtime that uses the V8 engine developed by Google for use in Chrome. V8 compiles and executes JavaScript at lightning speeds mainly due to the fact that V8 compiles JavaScript into native machine code.



In addition to lightning fast JavaScript execution, the real magic behind Node.js is the event loop. The event loop is a single thread that performs all I/O operations asynchronously. Traditionally, I/O operations either run synchronously (blocking) or asynchronously by spawning off parallel threads to perform the work. This old approach consumes a lot of memory and is notoriously difficult to program. In contrast, when a Node application needs to perform an I/O operation, it sends an asynchronous task to the event loop, along with a callback function, and then continues to execute the rest of its program. When the async operation completes, the event loop returns to the task to execute its callback.

In other words, reading and writing to network connections, reading/writing to the filesystem, and reading/writing to the database–all very common tasks in web apps–execute very, very fast in Node. Node allows you to build fast, scalable network applications capable of handling a huge number of simultaneous connections with high throughput.
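
As a rough illustration of that callback style (the file name here is made up), the read is handed off to the event loop and the program keeps going; the callback runs later when the result comes back:

const fs = require('fs');

fs.readFile('data.txt', 'utf8', function (err, contents) {
  // Runs later, once the I/O completes.
  if (err) return console.error(err);
  console.log('file length:', contents.length);
});

console.log('read started, still running'); // printed first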

3. Tooling


npm is the Node.js package manager and it... is... excellent. It does, of course, resemble package managers from other ecosystems, but npm is fast, robust, and consistent. It does a great job at specifying and installing project dependencies. It keeps packages isolated from other projects, avoiding version conflicts. But it also handles global installs of shell commands and platform-dependent binaries. I can't remember a time with npm where I've had to ask myself, "Why are those modules conflicting? Where is that module installed? Why is it picking up this version and not that one?"

grunt is the venerable task runner, but new kids on the block gulp, brunch, and broccoli focus on builds that transform your files, and take advantage of JavaScript's strong file streams capabilities.

4. You Already Know JavaScript, Again


So you've decided to use JavaScript on the server, and you're proud of your decision that avoids all that translating from client data to server data. But persisting that data to the database requires even more translations!

There's good news. If you're using an object database like Mongo, then you can extend JavaScript to the persistence layer as well.

Using Node.js allows you to use the same language on the client, on the server, and in the database. You can keep your data in its native JSON format from browser to disk.
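
A minimal sketch using the mongodb driver for Node (the connection string, database, and collection names are assumptions): the same JSON-shaped object goes from the browser, through the server, to disk without translation.

const { MongoClient } = require('mongodb');

async function main() {
  const client = new MongoClient('mongodb://localhost:27017');
  await client.connect();
  const users = client.db('app').collection('users');

  // Store and read back the object exactly as the client sent it.
  await users.insertOne({ name: 'alice', followers: 42 });
  console.log(await users.findOne({ name: 'alice' }));

  await client.close();
}

main().catch(console.error);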

5. Real-time Made Easy
If Node.js excels at many concurrent connections, then it makes sense that it excels at multi-user, real-time web applications like chat and games. Node's event loop takes care of the multi-user requirement. The real-time power comes through use of the websocket protocol. Websockets are simply two-way communications channels between the client and server. So the server can push data to the client just as easily as the client can. Websockets run over TCP, avoiding the overhead of HTTP.

Socket.io is one of the most popular websocket libraries in use, and makes collaborative web applications dead simple. Here's a simple server using socket.io:

var app = require('http').createServer(handler)
var io = require('socket.io')(app);

app.listen(8080);

io.on('connection', function (socket) {
 
  // Send a message to the client
  socket.emit('event to client', { hello: 'world' });

  // Handle a message from the client
  socket.on('event from client', function (data) {
    console.log(data);
  });
});
6. Streaming data
Traditionally, web frameworks treat HTTP requests and responses as whole data objects. In fact, they're actually I/O streams, as you might get if you streamed a file from the filesystem. Since Node.js is very good at handling I/O, we can take advantage and build some cool things. For example, it's possible to transcode audio or video files while they're uploading, cutting down on the overall processing time.

Node can read/write streams to websockets just as well as it can read/write streams to HTTP. For example, we can pipe stdout from a running process on the server to a browser over a websocket, and have the webpage display the output in real-time.
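
A short sketch of that idea (the file name is made up): the HTTP response object is itself a writable stream, so the file is piped out chunk by chunk instead of being buffered in memory first.

const http = require('http');
const fs = require('fs');

http.createServer(function (req, res) {
  res.writeHead(200, { 'Content-Type': 'video/mp4' });
  fs.createReadStream('movie.mp4').pipe(res);   // bytes flow out as they are read
}).listen(8080);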

7. One Codebase And Your Real-time For Free
If you've made it this far, you may ask yourself, "If Node.js allows me to write JavaScript on the client and server, and makes it easy to send data between the client and server, can I write a web app that runs a single codebase on both client and server, and automatically synchronizes data between the two?"



The answer to your question would be yes, and the framework for that app would be Meteor. Meteor is a next-generation web framework built atop Node. It runs the same codebase on both the client and server. This allows you to write client code that saves directly to a database. Then, that data is automatically persisted to the server. It works the other way too! Any data changes on the server are automatically sent to the client. It gets better. Any webpage displaying that data reacts automatically and updates itself!

// Save the value of 'name' upon clicking 'submit' directly in the browser!
'click .submit': function(e, tpl) {
  Users.update(
    { _id: this._id },
    { $set: { name: $('.name').val() }}
  );
}
8. Corporate Caretaker
The inherent risk with any open-source project is abandonment by its volunteer maintainers. This isn't the case with Node.js. Node is currently sponsored by Joyent, who has hired a project lead and other core contributors, so there is a real company backing the future of the project. Not to mention there are a great number of major companies backing the project at every level including Walmart, Microsoft, Yahoo, Paypal, Voxer, and more.

9. Hosting


With rapid adoption, world-class Node.js hosting is also proliferating. In particular, Platform-as-a-Service (PaaS) providers such as Modulus and others reduce deployments to a single command. Even the granddaddy of PaaS, Heroku, now formally supports Node deployments.

10. Every Developer Knows (A Little) JavaScript
This one's for your boss.

Since the dawn of the web, there have been JavaScript onclick's and onmouseover's. Every web developer has coded a little JavaScript, even if that JavaScript was hacking a jQuery plugin. Finding web development talent is terribly difficult these days. So when choosing a web platform, why not choose the platform whose language is known by every web developer in the world?

In Conclusion, A Bonus!
But wait, there's more! As with any platform or product, open-source or otherwise, its community is a huge influencing factor. And Node's is second to none. From meetups to conferences, there are really smart people working on the ecosystem every day. At the same time, the community is welcoming. These same smart people are always willing to offer help to folks new to Node, or even programming in general. You won't feel bad for asking a question on IRC or opening an issue. This community is also very active, with over 91,000 modules on npm. And this community is generous. In 2013, individuals donated over $70,000 to help run the public npm servers.

Yes, Node is trendy at the moment. This is web development, so next week Node may be dead, and the next hot thing will have arrived (will it be Go or Elixir?). But give it a try.

Friday, January 23, 2015

Article on Scaling in Ruby on Rails

Reference: http://mikepackdev.com/blog_posts/40-5-early-lessons-from-rapid-high-availability-scaling-with-rails

Note: This is a nice article that I read on the web; I'm just saving a copy here for my own reference.  Please let me know by email if you are the author of this article and would like me to remove it from my blog.  Thank you.

5 Early Lessons from Rapid, High Availability Scaling with Rails

At Ello, we were blindsided by the amount of traffic we were receiving. Right time, right place, I guess. One week, we're seeing a few thousand daily sessions. The following week, a few million. This insurgence of users meant the software we built was contorted in directions we never thought possible.

Like anything viral, there's a massive influx of interest for a relatively short period of time, followed by a slow decline, leaving a wake of destruction as the subject settles into its new mold. Ello has since settled, so what better time than now to document some of the lessons learned while scaling during those critical weeks of virality. I want to ensure these lessons are not merely light takeaways but rather tangible advice that you can apply if you're ever fortunate/unfortunate enough to be put in a similar situation. As such, parts of this article will be specific to Ello and may not apply in other domains.

Lesson 1: Move the graph

One of our first scaling hurdles involved managing the graph of relationships between users. We didn't just intuitively say, "oh, the graph is slow," but it didn't take much prodding either. We're on a standard Rails stack, using Heroku and Postgres. We have a table called relationships which stores all data about how users are tied together. Have you friended, blocked, or unfriended someone? It's all stored in the relationships table.

We're building a social network. By definition, our relationships table is one of the hottest tables we have. How many people are you following in total? How many in friends? How many in noise? Who should be notified when you create a post? All of these questions rely on the relationships table for answers. Answers to these questions will be cached by Postgres, so only the initial query incurs the cost of calculating the results. Subsequent queries are fast. But Postgres' query cache alone becomes meager at scale. As a user on a new social network, accumulating relationships is a regular activity. Every new relationship formed busts Postgres' cache for queries on that data. This was a high read, high write table.

Since we're on Heroku, we had the phenomenal Heroku Postgres tools at our disposal. When thrown into the fire, one of the best extinguishers was heroku pg:outliers. This command illuminates the top 10 slowest queries. All 10 of ours were associated with the relationships table. We had all the right indexes in place, yet some queries were taking up to 10 seconds to produce results.

Resolving a problem like this is application specific, but in our case the best option was to denormalize the relationship data into a datastore that could more easily answer our pertinent and frequent questions about the social graph. We chose Redis. It was a bit of a knee-jerk reaction at the time but a technique we've had success with in the past. Only after having implemented this, did we stumble upon a reassuring article outlining how Pinterest uses Redis for their graph. To be clear, we didn't move the data entirely, we provided an additional layer of caching. All data is still stored in Postgres for durability and is cached in Redis for speed. In the event of a catastrophe, the Redis data can be rebuilt at any time.

We moved all of our hot queries against the relationships table into Redis. Since "followers" and "followings" are displayed on every profile and a count(*) was our top outlier, our first step was to cache these values in Redis counters. We used Redis Objects to make this simple and elegant. Any time a new relationship is created or destroyed, these counters are incremented or decremented. When looking at another user's profile, to render the UI we needed to answer the question "are you following this user? If so, in the friends or noise bucket?" To answer this and similar questions, we cached the user IDs of all people who you had in your friends bucket, your noise bucket, and the union of both.

With our graph data in Redis, we can now query the graph in ways that would be prohibitively expensive with Postgres. In particular, we use it to influence our recommendation system. "Give me all the users that are being followed by people I'm following, but I'm not yet following." Using Redis set intersections, unions, and diffs, we can begin to derive new and interesting uses of the same data.
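
The article's implementation is Ruby with the Redis Objects gem; this is only a rough Node.js sketch of the same idea, with made-up key names, using the redis client:

const { createClient } = require('redis');

async function demo() {
  const redis = createClient();
  await redis.connect();

  // Counter bumped whenever user 1 gains or loses a follower.
  await redis.incr('user:1:followers:count');

  // Cached buckets of followed user IDs.
  await redis.sAdd('user:1:friends', ['2', '3']);
  await redis.sAdd('user:2:friends', ['3', '4']);

  // "Who do my friends follow that I don't follow yet?" A set difference
  // that would be expensive to answer with Postgres joins.
  console.log(await redis.sDiff(['user:2:friends', 'user:1:friends'])); // ['4']

  await redis.quit();
}

demo().catch(console.error);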

The real lesson here is this: every product has a core pillar that supports the core offering. Ello's is a social graph. When your core pillar begins to buckle under its own weight, it is critical to cache that data (or move it entirely) and continue providing your core offering.

Lesson 2: Create indexes early, or you're screwed

No really, you'll be chasing down these indexes for months. The previous section outlined how we scaled our relationships table. This, and subsequent sections will detail how we scaled our activities table, or the denormalized table that runs everyone's main activity feed. The activity feed contains any posts that people you follow have created, notifications for when someone follows you, notifications for mentions, and the like. Everything that you need to be notified about ends up in this table and we forgot some indexes.

Prior to Ello, I fell into the camp that created indexes only when the data proved they were needed. Sure, you can predict usage patterns, but since indexes can consume a lot of memory, I would have rather created them when I knew they were necessary. Big mistake here.

The first type of index that we forgot was just a plain old btree on a field that was queried regularly. An index like this can be created easily if nobody is writing to the table or downtime is feasible. This is high availability scaling, so downtime is not an option, and everything was writing to this table. Since the activity table was experiencing extremely high writes, concurrently building these indexes would never finish. While an index is being built concurrently (that is, without downtime), new records in the table are also added to the index. If the speed by which new records are added outpaces the speed by which Postgres can index hundreds of millions of existing rows, you're shit out of luck.

The solution? If downtime is not an option, you'll have to build a chokepoint in your application. All writes to a particular table must be funneled through this chokepoint so that if you want to stop writes, you constrict the chokepoint. In our case, we are using Sidekiq. We use Sidekiq jobs as our chokepoint, which means that if we ever want to stop all writes to the activities table, we spin down all Sidekiq workers for the queue that pertains to activity writes. Unworked jobs would get backed up and remain idle until we spun the workers back up, hence preventing writes to the activities table. Doing this for a couple minutes endowed Postgres with enough breathing room to work hard on building the index from existing records. Since Sidekiq jobs run asynchronously, this should have little impact on users. In our case, the worst that would happen is a user creates a post, refreshes the page, and sees that the post is not there because the activity record was not yet created. It's a tradeoff we made to keep the app highly available.
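
Ello's chokepoint is Sidekiq (Ruby); purely to illustrate the shape of the idea in JavaScript (all names here are invented), every write goes through one queue whose worker can be paused:

const writeQueue = [];
let paused = false;

function enqueueActivityWrite(activity) {   // the only code path that writes to activities
  writeQueue.push(activity);
}

setInterval(function () {
  if (paused) return;                       // "spin the workers down": writes pile up harmlessly
  const activity = writeQueue.shift();
  if (activity) writeToActivitiesTable(activity);
}, 10);

function writeToActivitiesTable(activity) { // stand-in for the real database insert
  console.log('insert', activity);
}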

Situations like this are actually not the worst of it. The absolute worst is when you forget a unique index. Now your data is corrupt. We forgot a unique index, oops. When the level of concurrency necessary to run a rapidly scaling app reaches a point where you can't decipher whether a job is good or fallacious, you need to rely on your database's ACID characteristics. This is why Postgres is awesome: something will happen once and only once regardless of concurrency. If two jobs try to accomplish the same thing in parallel, Postgres will ensure only one of them wins. Only if you have a unique index.

An astute reader might ask, "well, why would two jobs try to accomplish the same thing?" Let me quickly explain. It all stems from one bad piece of data, that when used, creates more bad data. For example, we didn't have a unique index on the relationships table. So, I could technically follow another user twice. When the user I follow creates a new post and it becomes time to ask, "who should receive this post in their feed?", if you're relying on the relationships table to answer that question, you're relying on bad data. The system will now create two duplicate activities. This is just one reason for duplicate jobs. Others include computers being stupid, computers failing, and computers trying to fix their own stupidity and failures. Fixing the source of the bad data, the non-unique relationships, was a great launching point towards stability.

So many of our scaling choices were derived from not having a unique index. It was crippling. Firstly, you can't create a unique index with non-unique values in the table. Just won't happen. You need to first remove duplicates, which is terrifying. You're deleting data, and you better hope you're caffeinated enough to do it correctly. I also recommend 48 hours of sleep before attempting. What constitutes a duplicate depends on the data, but this Postgres wiki page on deleting duplicates is an excellent resource for finding them.

So, you delete duplicates, great. What about the time between deleting duplicates and adding a unique index? If any duplicates were added in the meantime, the index won't build. So, you start from square one. Delete duplicates. Did the index build? No? Delete duplicates.

Lesson 3: Sharding is cool, but not that cool

We sharded ourselves five times. Tee hee hee. Laugh it up. Sharding is so cool and webscale, we did it five times. I mentioned earlier that a lot of our scaling choices derived from the lack of a unique index. It took us two months to build a unique index on the activities table. At the point when the index was built, there were about a billion records in the table. Sharding would reduce the write traffic to each database and ease the pain of most tasks, including building a unique index.

For completeness, I want to define sharding. Like most things in software, sharding has conflated definitions, but here's mine. Sharding is the process of taking one large thing and breaking it into smaller pieces. We had one, large 750M record activities table that was becoming unwieldy. Prior to breaking down the activities table, we moved it out of our primary database (with users, posts, etc) into its own database, also a form of sharding. Moving it to a different database is horizontally sharding, breaking up a single table is vertically sharding or partitioning. We received recommendations from highly respected parties to think about vertically sharding when our table reached 100GB of data. We had about 200GB. We don't follow rules well.

I won't detail our sharding setup right now, but will mention that it took a lot of planning and practice to nail down. We used the Octopus gem to manage all of our ActiveRecord connection configurations, but that's certainly not the extent of it. Here are some articles you might find interesting: a general guide with diagrams, Braintree on MySQL, and Instagram on Postgres.

When sharding, say we have database A that is progressively slowing and needs to be broken down. Before sharding, users with IDs modulus 0 and 1 have their data in database A. After sharding, we want to make users with IDs modulus 0 continue going to database A and modulus 1 go to a new database B. That way, we can spread the load between multiple databases and they will each grow at roughly half the speed. The general sharding process is this: setup a new replica/follower database B, stop all writes to A, sever the replica (A and B are now two exact duplicate dbs), update the shard configuration so some data goes to A and some to B, resume writes, prune antiquated data from both A and B.
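
Not the article's actual code (Ello used the Octopus gem); just a tiny sketch of the user-ID-modulus routing described above, with hypothetical shard names:

const shards = [
  { name: 'database_a' },           // users with id % 2 === 0
  { name: 'database_b' },           // users with id % 2 === 1
];

function shardFor(userId) {
  return shards[userId % shards.length];
}

console.log(shardFor(10).name); // database_a
console.log(shardFor(11).name); // database_b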

So cool, I love sharding.

Many highly respected and extremely expensive people told us we needed to shard. We trusted them. We planned out multiple approaches to sharding and converged on the technique outlined here, sharding by user ID. What nobody cared to consider was what would happen after we've sharded. We thought there was a pot of gold. Nope.

We sharded for two reasons: so we didn't hit a ceiling while vertically scaling our Postgres boxes. And so our queries would perform better because we had less data in each shard after the prune step. Let's address the prune step.

In the example above, since data for users with ID modulus 1 are no longer being stored or referenced in database A, we can safely remove all of their data. You're going to need a second pair of underwear. The simplified query for pruning database A is, "delete all records for users with ID modulus 1". The inverse is done on database B. In our case, we ended up removing almost exactly half of the records for each additional shard we created. This was our plan: if each time we shard, the databases store half the data, we need half the Postgres box to serve the same data.

Imagine we have four records in database A before sharding and pruning: [ W | X | Y | Z ]. After sharding and pruning, database A might look like this: [ W |      | Y |      ]. Database B might look like this: [      | X |      | Z ]. Notice the gaps. This equates to hard disk fragmentation. This started biting us in the ass and would have likely made our lives hell if we didn't already have other tricks up our sleeves.

If database A looks like this: [ W |      | Y |      ]. When I ask "give me all records for user ID 0", it should return W and Y. But W and Y are not in contiguous places on disk. So in order to service this query, Postgres must first move the disk to W, then move the disk to Y, skipping over the gaps in between. If W and Y lived next to each other on disk, the disk would not have to work so hard to fetch both records. The more work to be done, the longer the query.

Generally, when new data is added to the table, it's put in contiguous slots at the end of the disk (regardless of the user ID). We then ran a VACUUM ANALYZE on the table. Postgres now says, "oh, there's space between W and Y, I can put new data there!" So when new data is added and then fetched, Postgres needs to spin all the way back to the beginning of the disk to fetch some records, while other records for the same user are at the end of disk. Fragmentation coupled with running a VACUUM ANALYZE put us up shit creek. Users with a lot of activity simply couldn't load their feeds. The only sanctioned way to fix fragmentation is hours of downtime.

Ok, I hope you're still with me. The solution and lesson here are important. Firstly, if our Postgres boxes were on SSDs, maybe fragmentation wouldn't have been such a big deal. We weren't on SSDs. The solution for us was to build a covering index so that we could service index-only scans. Effectively, what this means is that all fields used to filter and fetch data from a table must be stored in an index. If it's all in the index, Postgres does not need to go to disk for the data. So we added a covering index for our hottest query and saw about a 100x improvement on average, up to 7000x improvements for users with a lot of activity.
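
Not the real Ello schema; a hypothetical example of a covering index using the pg client. Every column the hot feed query touches is in the index, so Postgres can answer it with an index-only scan instead of chasing fragmented rows on disk:

const { Pool } = require('pg');
const pool = new Pool();             // connection details come from PG* environment variables

async function addCoveringIndex() {
  // Built CONCURRENTLY so writes to the table keep flowing while it builds.
  await pool.query(`
    CREATE INDEX CONCURRENTLY index_activities_covering
      ON activities (user_id, created_at, subject_id, subject_type)
  `);
  // A query such as:
  //   SELECT subject_id, subject_type FROM activities
  //   WHERE user_id = $1 ORDER BY created_at DESC LIMIT 50
  // can now be served as an index-only scan (verify with EXPLAIN).
  await pool.end();
}

addCoveringIndex().catch(console.error);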

The lesson here is twofold. Serving data from memory is exponentially faster than serving from disk. Be leery of serving data from disk at scale. The second lesson is equally important. We probably should have just scaled vertically as much as possible. Webscale was too sexy to avoid. "Shard all the things" is the meme I'm looking for. Sharding was challenging and a better long-term solution, but had we applied a covering index for the whole entire table before doing any vertical sharding, I believe we could have saved tons of time and stress by simply adding more RAM as our database grew.

Lesson 4: Don't create bottlenecks, or do

Early on, we made a decision that would have a profound effect on how we scaled the platform. You could either see it as a terrible decision or a swift kick in the ass. We chose to create an Ello user that everyone automatically followed when they joined the network. It's pretty much the MySpace Tom of Ello. The intention was good; use the Ello user for announcements and interesting posts curated from the network, by the network. The problem is most of our scaling problems originated from this user.

All of the scaling issues that would have been irrelevant for months or years were staring us right in the face within the first month of having a significant user base. By automatically following the Ello user, it meant that just about all users would receive any posted content from that account. In effect, millions of records would be created every time the Ello user posted. This continues to be both a blessing and a curse. Database contention? Ello user is probably posting. Backed up queues? Ello user is probably posting. Luckily we control this account, and we actually had to disable it until sharding was complete and unique indexes were built.

What seemed like a benign addition at the time ended up having prodigious impacts on how we scale the platform. Posting to the Ello account puts more load on the system than anything else, and we use this to keep tabs on our future scaling plans. Culturally, it's important for us to be able to post from the Ello account. Technically, it's a huge burden. It means that we need to scale the platform in accordance with one user, which is silly. But in retrospect it's a godsend for keeping us on our toes and being proactive about scaling.

It makes me wonder if on future projects, it would be a good idea to implement the equivalent of the Ello user. Through induction of pain, we have a better infrastructure. So the lesson here is: if the platform must stay ahead of impending scaling challenges, it's probably a good idea to self-inflict the problems early and often.

Lesson 5: It always takes 10 times longer

In the above sections, I managed to breeze through some difficult scaling lessons. Caching, sharding and optimizing are non-trivial engineering objectives. Thus far, I've been taken aback by just how difficult these endeavors end up being in practice.

Take caching the graph in Redis as an example. Going into it, it felt like something that could have been accomplished in a few days. The data's there, all we need to do is put it in Redis, and start directing traffic to Redis. Great, so step one is to write the scripts to migrate the data, that's easy. Step two is to populate the data in Redis. Oh, that's right, there are tens of millions of records that we're caching in multiple ways. Well, it'll take at least a couple hours to work our way through that many records. Yeah, but what about capturing the data that was inserted, updated and deleted within those two hours? We have to capture that as well. Better not have a bug in the migration process or there goes a few days of your life you'll never get back.

The sheer amount of practice alone for sharding can't be accounted for with a point-based estimate. Don't mess it up or you'll lose millions of people's data. No pressure. But say you've practiced enough to get comfortable with the process and you're confident it will go well. Things will always arise. We added more shards and realized our pgbouncer pool size was maxed out. Since the system was live and new data was being written to the new shards, we couldn't revert the changes or we'd lose data. We had to figure out on the fly that the non-intuitive errors meant we needed to increase the pool size. We didn't predict that disk fragmentation was going to be a huge problem, either, and it ended up becoming a top priority.

While trying to apply a unique index to the activities table, who would have thought there were so many duplicates? The initial strategy was to attempt to build the index, and when it failed, let the error message tell us where we had to remove duplicates. Building an index is slow, duh, that won't scale if we have to attempt to build the index thousands of times. Ok, so write a query to remove the duplicates first. But wait, you can't just execute a blanket query across a billion records, it will never finish and potentially acquire heavy locks for hours at a time. Ok, so page through all the users, and scope the query so it only removes duplicates for a subset of users. That works, but unfortunately there were a ton of orphaned rows for users that no longer existed. So while paging through all the users that currently exist, the query is not deleting records for users who no longer exist and for some reason have orphaned records. Ok, so write a query to remove all activities for orphaned users. But wait, since the activities table doesn't live in the same database as the users table, you can't join against the users table to determine which activities records are orphaned. Not that that would scale anyway.

Sorry for rambling, but you get the point. The lesson here is for your mental health during a time of rapid scale: plan on everything taking 10 times longer than you anticipate. It just will.

Closing thoughts

You may have noticed a recurring theme within this article: the quantity of data is a big offender. There are other scaling challenges including team size, DNS, bot prevention, responding to users, inappropriate content, and other forms of caching. All of these can be equally laborious, but without a stable and scalable infrastructure, the opportunity for solving them diminishes.

Really, the TL;DR here is pretty short: cache aggressively, be proactive, push data into memory and scale vertically, highlight scaling bottlenecks sooner rather than later, and set aside plenty of man-hours.

Oh, and enjoy the ride.

Posted by Mike Pack on 12/22/2014 at 09:37AM