Tim Bauer’s Running Thoughts

Semi-daily webcast summaries/insights

Jeff Bezos (AMZN): Psst. Buddy. Want a Server? (AWS)

My life, as of late, is Amazon Web Services (AWS).

You see, my current client has me running their migration onto the AWS stack, leveraging the AWS services for FPS, Customer, and Credit Card Management. So I hadn’t been motivated to watch talks about it … but when 2-3 people emailed me AWS articles this week as fodder for the blog … I figured it was a sign.

So, as usual, my raw notes are below, but here are the thoughts that jumped out at me as I ran (i.e. jogged) along.

Details

  • Title/Link: http://omnisio.com/startupschool08/jeff-bezos
  • Duration: ~45m
  • Speakers: Jeff Bezos (AMZN)
  • Recommend to Watch? Yes
  • The best part of this presentation was Jeff’s quick shift to Q&A. He spent 10 minutes giving an overview of AWS and then took questions for 30. The questions were excellent and off the beaten path.

Notable Points
1. Is that “RED” line really red … or more yellow?

  • While I don’t doubt that AWS is seeing a huge uptake in usage, the stats they use to prove it are rather interesting. Bandwidth? Instances running or # of service calls made would be more telling. The reality is that S3 is a large chunk of that growth, as many customers use it for large file storage. It takes A TON of transactions to equal the bandwidth of one big video file or image, and file storage isn’t that eye-opening. Let’s see the stats on instances and/or service calls.
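To put rough numbers on that intuition, here is a back-of-the-envelope calculation. The 500 MB file size and the 2 KB per-call payload are assumptions picked for illustration, not AWS figures.

```python
# Back-of-the-envelope: how many small API calls move as many bytes as one
# large stored file? Both sizes are illustrative assumptions, not AWS data.
MB = 1024 * 1024
KB = 1024

video_bytes = 500 * MB   # assumed size of one uploaded video
call_bytes = 2 * KB      # assumed request + response size of one service call

calls_per_video = video_bytes // call_bytes
print(f"One {video_bytes // MB} MB upload moves as many bytes as {calls_per_video:,} service calls")
```

Under these assumptions a handful of media-heavy S3 customers can bend a bandwidth chart far more than many chatty, transaction-heavy applications, which is why instance counts or call volumes would say more about real adoption.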

2. That jab aside, AWS is the best I see right now for Enterprise Customers …

  • The recent release of Elastic IP Addresses, plus the ability to soon have persistent storage, makes it viable to start thinking about piloting actual production applications on AMZN, maybe in ’09 (granted, low-risk apps to start). And if you don’t want to put 100% of an application out on their gear, you don’t have to: unlike Google’s GAE, you can choose how much or how little of their stack you use.
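The Elastic IP idea is easiest to see as a level of indirection: clients keep one public address while the server behind it changes. The toy model below is a hypothetical sketch (the `AddressMap` class, its methods, and the instance ids are made up, and the IP comes from the documentation range); it is not the EC2 API, which has its own calls for allocating and associating addresses.

```python
class AddressMap:
    """Toy model of a static public address that can be re-pointed.

    Hypothetical illustration only; not the AWS API.
    """

    def __init__(self) -> None:
        self._routes = {}  # public address -> instance id currently serving it

    def associate(self, address: str, instance_id: str) -> None:
        """Point the public address at an instance, replacing any old target."""
        self._routes[address] = instance_id

    def resolve(self, address: str) -> str:
        """Return the instance currently behind the public address."""
        return self._routes[address]


routes = AddressMap()
routes.associate("203.0.113.10", "i-old")   # original server
routes.associate("203.0.113.10", "i-new")   # failover: same address, new server
print(routes.resolve("203.0.113.10"))       # i-new
```

The outside world (DNS records, partners' firewall rules, mail reputation) only ever sees the one address, which is what makes a long-running production app on rented servers practical.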

3. More on the power of choice AMZN gives you

  • You can look at your app and carve out pieces of the infrastructure like storage (S3), queues (SQS), or the servers themselves (EC2). Even more interesting is what they have in pre-release: services that let aspects of your systems be hosted with them. Your customer records, credit card records (no liability from a leaky credit card security model), product, orders, payment processing, warehousing systems. All at a granular service level (not an application). Roll your own. Something to ponder for ’09 or 2010, as I suspect it will stabilize later this year or early next.
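One way to keep that choice open is to put each infrastructure piece behind a small interface, so a piece can move onto (or back off of) a hosted service without touching application code. A minimal sketch, assuming Python 3.8+; `BlobStore`, `InMemoryStore`, and `archive_invoice` are hypothetical names, and a hosted-storage adapter would wrap the provider's SDK behind the same two methods.

```python
from typing import Protocol


class BlobStore(Protocol):
    """The only storage operations the application is allowed to use."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """Stand-in for an in-house file server (hypothetical)."""

    def __init__(self) -> None:
        self._blobs = {}  # key -> bytes

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def archive_invoice(store: BlobStore, order_id: str, pdf: bytes) -> str:
    """Application code depends on the interface, not on who hosts the bytes."""
    key = f"invoices/{order_id}.pdf"
    store.put(key, pdf)
    return key


store = InMemoryStore()  # later: swap in an adapter for a hosted store
key = archive_invoice(store, "1001", b"%PDF-1.4 ...")
print(key, len(store.get(key)))
```

Because `archive_invoice` only sees the two-method interface, moving storage to a hosted service is a change at the construction site, which keeps adoption (and exit) piecemeal.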

So while the AMZN AWS offering has mostly been focused on solving spiky CPU usage problems for clients up till now (like running massive models overnight), I suspect they are close to stabilizing an offering for those 24x7 applications (yes, I know you can kinda do it today). Bezos’s comment about package vendors assessing hosting models with AMZN, where images of servers running their software can be rented, was an example of this (think of a server running MSFT Office or Photoshop when you need it, across all your various PCs).

** START OF RAW SCRIBBLE TAKEN WHILE RUNNING **

• http://omnisio.com/startupschool08/jeff-bezos
• 4/23/2008, 6:12 AM
• Bandwidth charts
    ○ Blue line (AMZN)
    ○ Red line (bandwidth w/ AWS)
        ▪ BAUER COMMENT - S3 could skew that #; service call metrics would be more interesting
• Offerings
    ○ AWS
    ○ S3
    ○ EC2
    ○ SimpleDB
    ○ SQS
    ○ FPS
    ○ Mechanical Turk - get pieces of work done manually (by people) via work queues AMZN enables
• 4/23/2008, 6:14 AM
    ○ Goals - easy, fast, elastic, highly available, pay by the drink
• S3
    ○ 18B objects in S3
    ○ AWS developers doubled in 2 years (370k)
• Why are people excited
    ○ Shows picture of an electric power generator
        ▪ BAUER COMMENT - similar story to the Big Switch book
    ○ How people generated electricity by themselves
    ○ How to … go from idea to product … quickly …
    ○ Costs of running a data center - servers, contracts, bandwidth, purchasing, facilities, scaling, network, heterogeneous, legacy, large teams
    ○ Vision releases occur over and over
    ○ AMZN helps with the above
• 4/23/2008, 6:19 AM
• Examples of who is using it
    ○ NY Times converted archives back to 1851 (4 TB) to PDF
        ▪ AMZN provided virtual servers for a one-time batch
        ▪ AMZN provided S3 to serve articles
        ▪ Search for computer 1892 reference …. Was a job type
    ○ SanDisk
        ▪ USB drive w/ auto backup to S3
    ○ Animoto
        ▪ Royalty-free music + video … program listens to the music and auto-edits photos and tweens to align … then you can regen till you like it
        ▪ Queue, S3, EC2 used
        ▪ Viral on Facebook … over 3 days ramps to 3500 server instances
            □ BAUER COMMENT - KEY FOR VIRAL APPS, ? ON NEED FOR SLOWER ADOPTION MODELS
        ▪ Talked through a graph of growth and how EC2 can spin instances up/down based on load (tailoring cost as you go)
        ▪ BAUER THOUGHT - TITLE - AMZN, MOVERS NEEDED
• 4/23/2008, 6:27 AM
• End point
• AMZN - They make electricity so you don’t have to
• Q&A
    ○ 4/23/2008, 6:28 AM
    ○ How did the AWS idea get started?
        ▪ Idea started 4 yrs ago
        ▪ Launched 1st service 2 yrs ago
        ▪ Internal issue 1st: abstracting app teams from the infra team internally
            □ BAUER THOUGHT - MAKES SENSE FROM WHAT I SEE AT CURRENT GIG (WAY THEY ARE WRITTEN, ETC)
        ▪ Didn’t expect this level of traction this early
        ▪ 4/23/2008, 6:32 AM (restart, omnisio dropped)
        ▪ New customer set … buyers, sellers, and now (new) developers
    ○ 4/23/2008, 6:33 AM
    ○ Concerns around the service interruption 2 months back. How handled? How do people respond? Hard to get off the gear?
        ▪ Low friction to switch off platform, no contracts.
        ▪ APIs simple … easy to switch
            □ BAUER COMMENT - IT CAN BE PIECEMEAL
        ▪ Low latency, cost
            □ BAUER COMMENT - Agree on the 1st point being key in this arch. Working on that now. Roundtrips in system design are the key … not a specific interface latency. Logical unit of work
        ▪ Main change they made for uptime
            □ Improve communication around an outage or brownout
            □ Goal is 100 Up (TP100)
    ○ 4/23/2008, 6:37 AM
    ○ Customer serving hospitals. Can’t use AMZN due to the Patriot Act if data crosses boundaries. Where does data reside?
        ▪ EU has data rules (medical data can’t cross borders)
        ▪ Availability zones are one answer (choose where you host globally)
            □ Fault tolerant
            □ Address data location (today all in US, expand globally)
    ○ 4/23/2008, 6:39 AM
    ○ Runs Rails apps on AWS, spammers are ruining IPs
        ▪ Doesn’t know the answer
        ▪ Static IP solution (Elastic IP) can solve the problem (but requires an additional server); allows the external world to see a consistent IP
    ○ 4/23/2008, 6:40 AM
    ○ Hosting needs are stable; EC2 pricing doesn’t compare to traditional server hosting in a data center (like Rackspace)
        ▪ He doesn’t agree
        ▪ Might be anomalies today … but goal is to be the low-cost provider to customers
        ▪ Instance sizes may get more granular (they price by CPU power, etc. in an instance)
    ○ 4/23/2008, 6:42 AM
    ○ What’s it like to launch a rocket? (personal question to him)
        ▪ He is funding a company for a trip to suborbit … Blue Origin … lands on its tail like Buck Rogers
        ▪ 1st dev vehicle done
        ▪ 2nd in process
        ▪ One more after that
        ▪ Then take public
        ▪ They are EC2 customers … do aerodynamic models on the EC2 cloud …
            □ BAUER COMMENT - Another example of high volume spike processing … keep in mind the Elastic IP offering came just in the last 3-4 weeks … coupled with the move to attach storage drives to an instance …. so the ability to host stable apps over time being viable for enterprise is new (granted, you could hack it together).
    ○ 4/23/2008, 6:45 AM
    ○ Thoughts on Google’s offering
        ▪ They don’t talk about other companies
        ▪ They innovate more by focusing on the customer, not the competitor
        ▪ That is why they have their policy
        ▪ AWS is unique in that it’s deep in the stack. APIs exposed like storage, queues, etc.
            □ BAUER COMMENT - Also services for payment, customer, product, orders, checkout, etc
        ▪ Give people control of the knobs on sophisticated applications
        ▪ Not a winner-take-all space … there will be a lot of winners
        ▪ Rare for notable change to be driven by one company … so customers will find many options.
    ○ 4/23/2008, 6:48 AM
    ○ AWS shared infrastructure: can’t control spikes on ‘shared’ infra … what if 10-100 customers go viral … others get impacted … how to resolve?
        ▪ Key question is how to manage risk
        ▪ Key is being better than the alternative (uptime @ shared is better than internal) … inability to handle spikes .. Turn the question around .. How does your business handle spikes?
        ▪ By aggregation … pool is large enough to handle it … so built for far larger spikes
        ▪ They manage averages of the whole AWS dev pool
        ▪ Insurance company example
        ▪ Onus on them to prove they can consistently scale and spin up/down servers … what is the satisfaction rating
    ○ 4/23/2008, 6:52 AM
    ○ SLA promise from AMZN?
        ▪ Not right now, but they could look into it
    ○ 4/23/2008, 6:53 AM
    ○ Thoughts on spinning up an ERP instance … fully running
        ▪ He is talking about configured servers … license on the go
        ▪ Vendors creating an AMI to run their stuff for a while … servers set up
        ▪ BAUER COMMENT - Note that is somewhat available from the ERP vendors today
        ▪ BAUER COMMENT - Enables an on-demand sell model for vendors
    ○ 4/23/2008, 6:55 AM
    ○ Vendor able to sell a hosted model to companies that ran it internally in the past?
        ▪ Yes, audit is key for the customer
        ▪ Application changes … encrypt data prior to storage
    ○ 4/23/2008, 6:56 AM
    ○ How is AWS going to enable the mobile platform?
        ▪ They can see mobile devices taking up share in their browser market
        ▪ So you will see more AWS tied focused on AWS
        ▪ BAUER COMMENT - They bought TEXT BUY to cover this (payment device like PayPal via mobile phone)
    ○ 4/23/2008, 6:57 AM
    ○ 100 domain limit on SimpleDB. When will it be lifted?
        ▪ Not clear. Validating things.
        ▪ Exceptions could be handled by a call to them
        ▪ Availability is their priority #1
        ▪ Adam Selipsky (contact him)
    ○ End

** END RAW SCRIBBLE TAKEN WHILE RUNNING **
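The Animoto ramp in the notes (up to 3500 instances over three days, with EC2 spinning instances up and down with load) comes down to a sizing rule like the one below. This is a minimal sketch: the per-instance capacity, floor, and ceiling are assumed numbers, and the rule is a generic one, not a description of how Animoto or AWS actually scaled.

```python
import math


def desired_instances(requests_per_sec: float,
                      capacity_per_instance: float,
                      min_instances: int = 1,
                      max_instances: int = 5000) -> int:
    """Size a server fleet to the current load, clamped to a floor and a ceiling.

    capacity_per_instance is an assumed, measured figure: how many requests
    per second one server handles. The floor keeps the app up when traffic is
    quiet; the ceiling caps spend during a spike.
    """
    needed = math.ceil(requests_per_sec / capacity_per_instance)
    return max(min_instances, min(max_instances, needed))


# A quiet start, a viral spike, and the tail-off afterwards (made-up loads).
for load in (40, 4_000, 350_000, 900):
    print(f"{load:>7} req/s -> {desired_instances(load, capacity_per_instance=100)} instances")
```

Paying only for the servers the rule asks for at each moment is the "tailoring cost as you go" point; a real setup would also add headroom and smooth the load signal so the fleet doesn't thrash.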

April 23, 2008 Posted by bauertim | 2-Perhaps (what floats your boat?)

Cal Henderson (Flickr): Automate, Automate, Automate

If you want to hear how Flickr develops and manages its website and client software, this is the webcast for you. That, or if you want to see a picture of a kitty (you get 2). Doesn’t that cover everyone?

As usual, my raw notes are below, but here are the thoughts that jumped out at me as I ran (i.e. jogged) along.

Details

  • Title/Link: Building Big on the Web (Webstock)
  • Duration: ~45m
  • Speakers: Cal Henderson (Flickr)
  • Recommend to Watch? Maybe
  • The material was a bit dry, but at various stages of the talk Cal brought out points that, to me, were hidden gems. If you own similar responsibilities in your patch of the world, I would watch this (and his other talks on scaling).

Notable Points
1. Commit all the time … to PROD ?!?! AMEN!

  • Are your hands sweating yet? Cal talked about how @ Flickr they are constantly pushing to PROD, as in daily, hourly, etc. They do it via a scheme of configuration settings that enable and disable functions/features on specific boxes and regions. In effect, latent code sits in PROD at Flickr, growing until they activate it. Amusing, as I am working with a client right now where I’ve been pushing this approach and getting pushback. At least I don’t feel totally off my rocker now. Cal has my back.
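A minimal sketch of the config-flag approach, assuming flags are keyed by environment name (the talk also mentions per-box control, which would add another key). The flag names, environments, and config layout are hypothetical, not Flickr's.

```python
# New code ships to PROD "dark"; a config setting decides where it is switched on.
# Flag names and environments below are hypothetical examples.
FLAGS = {
    "new_uploader": {"dev": True, "staging": True, "prod": False},
    "geo_tagging": {"dev": True, "staging": True, "prod": True},
}


def feature_enabled(flag: str, environment: str) -> bool:
    """Unknown flags or environments default to off, so latent code stays dark."""
    return FLAGS.get(flag, {}).get(environment, False)


def render_upload_page(environment: str) -> str:
    # Both code paths live in the same trunk; the flag picks one at runtime.
    if feature_enabled("new_uploader", environment):
        return "new uploader"
    return "classic uploader"


print(render_upload_page("staging"))  # new uploader
print(render_upload_page("prod"))     # classic uploader
```

Turning `new_uploader` on in PROD then becomes a config change rather than a branch merge, which is the trade Cal described for avoiding long-lived branches.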

2. Controls Via Yelling, Then IM, Then Something …

  • He talked about how Flickr evolved from just bellowing across their area to greenlight a push to PROD, to IM, then to a tool-based approach they built themselves. I found it to be good counsel. People tend to try to over-automate early and bog themselves down. The key is making pushes to PROD simple … which in small teams often just takes a bit of talking.
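When yelling and IM stop scaling, the "tool" can start as a single script behind one button. The sketch below is hypothetical: the step functions are placeholders and the log format is made up. It shows the two properties described in the talk: one entry point, and a logged success or failure for every step, stopping at the first failure.

```python
from datetime import datetime, timezone
from typing import Callable


def deploy(steps: list[tuple[str, Callable[[], None]]], user: str) -> list[str]:
    """Run every deploy step in order, log each outcome, stop on first failure."""
    log = []
    for name, action in steps:
        stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
        try:
            action()
        except Exception as exc:  # any failing step halts the deploy
            log.append(f"{stamp} {user} {name}: FAILED ({exc})")
            break
        log.append(f"{stamp} {user} {name}: ok")
    return log


# Placeholder steps (hypothetical); real ones would copy files, restart services, etc.
def sync_code() -> None: ...
def flip_symlink() -> None: ...
def purge_cache() -> None:
    raise RuntimeError("cache host unreachable")  # simulated failure


for line in deploy([("sync", sync_code), ("activate", flip_symlink),
                    ("purge", purge_cache)], user="deployer"):
    print(line)
```

The returned lines are what a deploy-log web page would tail, so everyone can see who pushed what and where it broke.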

3. SVG Lover … What, No FLEX?

  • He gave love to SVG as their approach to charting and trending various statistics on their rigs @ Flickr. But I suspect he would take a gander at FLEX if he had to do it over again, as it doesn’t have SVG’s support issues (no Batik needed) and has a far more powerful charting library.
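For a sense of what the SVG route involves, here is a minimal sketch that turns a list of samples into a standalone SVG line chart. Chart size, styling, and the sample values are arbitrary; the rasterizing step Flickr used Batik for is not shown.

```python
def svg_line_chart(values: list[float], width: int = 300, height: int = 100) -> str:
    """Render samples as a standalone SVG polyline scaled to fill the chart."""
    if len(values) < 2:
        raise ValueError("need at least two samples")
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0                 # avoid dividing by zero on flat data
    step = width / (len(values) - 1)        # even horizontal spacing
    points = " ".join(
        # SVG's y axis grows downward, so flip the scaled value
        f"{i * step:.1f},{height - (v - lo) / span * height:.1f}"
        for i, v in enumerate(values)
    )
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}">'
        f'<polyline fill="none" stroke="black" points="{points}"/></svg>'
    )


queries_per_sec = [120, 180, 150, 300, 260]   # made-up samples from a stats table
print(svg_line_chart(queries_per_sec))
```

The output is plain text, so it can be generated server-side straight from database rows, which is the appeal of SVG here; the catch Bauer points out is browser support, hence the extra conversion step.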

4. Admin Everything … God Mode

  • He showed how they have “God Mode” on all pages in Flickr. From what I could tell … an admin can go into a page, see all the system objects supporting it, AND edit them. Very nice. Stellent takes a similar approach with its content management (a few keystrokes and you can edit a page you are viewing). Flickr’s God Mode is just more technically focused (database tables, config tables, localization, etc., all relative to a page).
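One of the God-mode tools in the talk was a cache checker that compares the database with what the cache is serving. A minimal sketch of that comparison, assuming both sides can be fetched as plain dictionaries (the record layout and field names are hypothetical):

```python
from typing import Optional


def cache_diff(db_row: dict, cached: Optional[dict]) -> dict:
    """Return {field: {"db": ..., "cache": ...}} for every mismatched field."""
    if cached is None:
        return {"_status": "not cached"}          # cache miss: nothing to compare
    diffs = {}
    for field in sorted(db_row.keys() | cached.keys()):  # fields present on either side
        db_val, cache_val = db_row.get(field), cached.get(field)
        if db_val != cache_val:
            diffs[field] = {"db": db_val, "cache": cache_val}
    return diffs


photo_in_db = {"id": 42, "title": "Kitty", "views": 1031}
photo_in_cache = {"id": 42, "title": "Kitty", "views": 990}   # stale copy
print(cache_diff(photo_in_db, photo_in_cache))  # {'views': {'db': 1031, 'cache': 990}}
```

Hung off an admin link on the page that displays the object, this is the "see all the system objects supporting it" idea applied to one common class of bug: stale cache entries.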

The talk started slow, but the points above, coupled with his discussion of the toolset in play @ Flickr at the end, made it well worth listening to.

** START OF RAW SCRIBBLE TAKEN WHILE RUNNING **

• Did a talk at Webstock … Cal Henderson – arch for Flickr
    ○ http://www.iamcal.com/talks/
    ○ his site → http://www.iamcal.com
    ○ his twitter → http://twitter.com/iamcal
• Notes
• Building Big on The Web
• 4/22/2008, 6:08 AM
• Flickr
• Usually he talks about scaling
• Not today
• Today talks about how to build
    ○ Interactive systems
• Over-focus on process
    ○ It is important
    ○ XP, Waterfall, Agile, Scrum
    ○ He doesn’t care as much about process (methodology)
• 4/22/2008, 6:11 AM
• Don’t let methods slow down teams
• Today’s talk is about tools … what they use and why
• 4/22/2008, 6:12 AM
• Old ways of tools
    ○ Text editor, vi, emacs
    ○ People still use this … especially for personal sites
    ○ Bigger sites can’t do it that way
• More tools w/ bigger sites
    ○ Release Management
• 4/22/2008, 6:16 AM
• Continuous Integration
    ○ Martin Fowler
    ○ Work, commit immediately, trigger tests
    ○ Update … get changes
    ○ Test constantly
• Tests Are Good, Tests Are Dull
    ○ On average test coverage is very small (dull, hard to keep up)
• 4/22/2008, 6:17 AM
• How to deal w/ that … automate everything
    ○ Automate tests to hit the trunk
• Mozilla’s Tinderbox
    ○ Aggregates automated tests on clients … and shows them in one place
    ○ Shows time new to old by build (y axis) … by machine (column, x)
• Flickr’s Tinderbox
    ○ Runs the tests, about 1000 items, results on a web page
    ○ Wraps test cases around the stuff that is most brittle … so most likely to break, or core
    ○ Runs once an hour
    ○ Email on failure
• Version Control = Blame
    ○ Failure emails name who changed code since the last successful build
    ○ Force of peer pressure to get fixes
• Continuous Production
    ○ Example of glass, how it has to run continuously to work
    ○ Flickr calls it continuous deploy … constantly release their software to PROD
        ▪ BAUER COMMENT - THERE’S A CONTENTIOUS POINT
• Typical process
    ○ Dev -> QA -> stage -> prod
    ○ 4/22/2008, 6:23 AM
    ○ Reality … no QA … dev -> stage -> prod
        ▪ For medium to large sites
• Flickr process
    ○ Dev, Alpha environments
    ○ Version control line
    ○ Staging: beta 1, beta 2
        ▪ Pull from version control
    ○ Prod
        ▪ Comes from staging
• Feature flags, avoid branches
    ○ Weird feature of Flickr
    ○ Avoid branching
    ○ The more differences … the harder to merge
    ○ He is against it based on that
    ○ New features based on config flags … turn features on/off … environment flags control which features work in which environment
• Shrinkwrap-ware
    ○ Process -> alpha, beta, RC, GA
        ▪ RC - close to good enough
        ▪ GA - golden master
    ○ Box process also adds -> RTM (to cover boxing) … comes before GA
    ○ Flickr uploader is of this type
        ▪ Alpha -> beta -> GA -> Push
            □ Push: release forces an upgrade by users in PROD
• 4/22/2008, 6:27 AM
• Release tools
    ○ Agile
        ▪ Releasing to PROD quickly needs the tools to enable it
        ▪ Makes releases to PROD simpler
        ▪ Many times a day / hour
    ○ One tool -> Yelling between people
        ▪ Their first version …
        ▪ 2-3 people
    ○ Another -> Via IM
        ▪ Scales a bit larger
    ○ Deploy Log -> Web page
        ▪ Shows lines of change …
        ▪ Type into
        ▪ Shows tail of a file
        ▪ On deploy tools page
• Public deploy log
    ○ Shows beta site code.flickr
    ○ Follow who breaks what
    ○ Shows what people are up to
    ○ Shows people Flickr is working
    ○ Public?
• 4/22/2008, 6:31 AM
• Staging tool
    ○ Assemble lang pieces
    ○ Put pieces on staging for testing
    ○ A whole bunch of text on page
    ○ Button -> perform staging for the end-to-end process to run … key is one button
        ▪ Should be one script
• 4/22/2008, 6:33 AM
• Compile
    ○ Build web interfaces quickly
    ○ Ajax checks on compile status
    ○ Looks at a file on disk to check on progress
    ○ See on deploy page where they are in the compile
• 4/22/2008, 6:33 AM
• Deploy system
    ○ Single button again
    ○ Press (if allowed)
    ○ Done
    ○ Button does 300 things … logs success / failure
    ○ One-touch deployment
• What changed from last deploy
• Config deploy
    ○ Config files
    ○ Flags change a lot in config files
        ▪ Bauer comment - configuration management
    ○ Process to manage configuration changes
    ○ Form, edit file … hit button … deploys to PROD boxes
    ○ Things they do a lot they turn into scripts
        ▪ Bauer comment - a lot is relative to process … if you are not agile like them, what you do a lot is drastically different (so you might automate the wrong thing and think you are fine-tuned …. but you forgot the re-engineer step)
• 4/22/2008, 6:37 AM
• Mozilla AUS - Auto Update Services
    ○ Pings URL from desktop … learns if a new version is available and downloads it as a plugin
    ○ HTTP check …
    ○ Not dependent on using Mozilla … they hit the servers of Mozilla, not the client
    ○ Update scripts …
• 4/22/2008, 6:39 AM
• Development Process
    ○ Bug Tracking
        ▪ Simple summary
        ▪ Not 25 fields like a Yahoo! bug report … short/long tickets … training .. Egads
        ▪ Flickr … simpler
            □ 2 fields … title / desc
            □ Sits on top of a powerful system but doesn’t expose that to reporters
        ▪ Track projects
    ○ Source control viewer
        ▪ UVC
        ▪ Diffs, Blame, Log
        ▪ Critical
        ▪ Track to track bugs … then use tracks on source browser
        ▪ Link mailing list to source control … mail w/ links to diff viewer
            □ Could also do an RSS feed by dev
        ▪ LXR / Indexers
            □ LXR - Linux Cross Referencer … Theory … indexes the source code and shows where each bit of it is used across the application …. Click on class name … find definition
• 4/22/2008, 6:43 AM
• Maintenance
    ○ Monitoring
        ▪ Nagios. Ugly but awesome. Servers and services up / down
            □ Most big sites use it
        ▪ Ganglia - Gather stats on bits of apps, servers, services
            □ Used a lot @ Flickr
            □ Overview by data center
            □ Server drill down from data center
            □ Color coded
            □ Graphs on box stats - CPU, memory, etc
            □ Zero config (easy setup)
            □ Graphing over time, historical records
                ◦ I.e. hits on a server
            □ Open source software
            □ RRD - Round Robin Database Tool
                ◦ Fixed space, snapshot of data over time … lose resolution as time goes by … auto drop of data long term
            □ Stack stats from RRD and look at trending and relationships of data
            □ They built a custom tool to monitor MySQL
                ◦ They use that
                ◦ Look at threads, logs, select performance … in open source program called DV Stats
                ◦ Used for performance tuning … slowdown troubleshooting
                    ◊ Look at Ganglia for table locks, for example (pulling from RRD and MySQL stuff)
                    ◊ Immediate bug fix
            □ RRD graphs look the same … time based sample data
                ◦ Custom … MySQL, SVG, Batik … views via that
                    ◊ SVG - scalable vector graphics … define via code
                        ► Bauer comment - FLEX knockoff …
                        ► Limited support
                        ► Batik … converts SVG to regular graphics
                    ◊ So they go from MySQL -> SVG -> Batik
• 4/22/2008, 6:50 AM
• God Tools - Admin Tools
    ○ Comes from GNE - Game Neverending (the company that built Flickr built that first, then Flickr)
        ▪ Collected paper game … (bauer comment - hrm)
        ▪ Actions performed by god … flickr.com/god
    ○ Example
        ▪ So you admin from the website … each page has admin pages tied to it
        ▪ In context of site
        ▪ See a product in site .. click link and see / edit relevant files, configs, etc in PROD
    ○ Cache Checker
        ▪ See what is in the database versus what is in cache
        ▪ Dumps and troubleshoots data differences
    ○ Customer care
        ▪ Help requests via customer profile pages
    ○ API data
        ▪ Graphs of various things (SVG of course)
    ○ Localization
        ▪ Multiple languages
        ▪ Every string has tags so it can be localized
        ▪ Looked at 3rd party for localization … all sucked .. so they built their own translation management interface
        ▪ So they translate string by string
    ○ Admin profile
        ▪ Bug tracker
        ▪ Flickr account
        ▪ …
        ▪ Obsessed w/ tools
    ○ Very large system
        ▪ 32k lines of PHP
        ▪ 24k of HTML
        ▪ Update each time they ask or do things … automate as they go
• 4/22/2008, 6:56 AM
• Final points
    ○ Use robots to automate anything you do … single press of a button

** END RAW SCRIBBLE TAKEN WHILE RUNNING **
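The RRD note in the scribble (fixed space, snapshot of data over time, lose resolution as time goes by, auto drop of data long term) can be sketched in a few lines. This is a toy model of the round-robin idea with arbitrary slot counts and averaging factor; it is not RRDtool's file format or API.

```python
from collections import deque


class RoundRobinArchive:
    """Toy round-robin archive: fixed-size storage at two resolutions."""

    def __init__(self, recent_slots: int = 6, old_slots: int = 4, factor: int = 3) -> None:
        self.recent = deque(maxlen=recent_slots)  # raw samples, newest last
        self.old = deque(maxlen=old_slots)        # one averaged point per `factor` samples
        self.factor = factor
        self._pending = []                        # samples awaiting consolidation

    def add(self, value: float) -> None:
        self.recent.append(value)                 # a full deque silently drops its oldest item
        self._pending.append(value)
        if len(self._pending) == self.factor:
            self.old.append(sum(self._pending) / self.factor)  # consolidate by averaging
            self._pending.clear()


archive = RoundRobinArchive()
for sample in range(1, 13):                       # twelve made-up readings: 1.0 .. 12.0
    archive.add(float(sample))

print(list(archive.recent))  # [7.0, 8.0, 9.0, 10.0, 11.0, 12.0]
print(list(archive.old))     # [2.0, 5.0, 8.0, 11.0]
```

Storage never grows no matter how long the collector runs: recent history stays at full resolution, older history survives only as averages, and the oldest averages eventually fall off the end.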

April 22, 2008 Posted by bauertim | 2-Perhaps (what floats your boat?)