Planet Ruby Last Update: Thursday, 21. February 2019 07:30

Plataformatec @ São Paulo › Brazil - Feb 11

Migrations in databases with large amounts of data

There is a discussion that always comes up when dealing with database migrations.

Should I use the migrations to also migrate data? I mean, I’ve already altered the structure, so it would be easy to change the data by including some SQL as well, and this would guarantee that everything is working after the deploy. Right?

It could work, but in most cases, it could also cause a database lock and a major production problem.

In general, the guideline is to move the commands responsible for migrating the data into a separate task and then execute it after the migrations are up to date.

Even so, this can take an eternity when you are dealing with a database with millions of records. UPDATE statements are expensive for the database, and sometimes it is preferable to create a new table with the right info and, after everything is verified, rename it to take the place of the old one. But sometimes we don’t want to, or simply can’t, rename the table, for any number of reasons.

When you are dealing with millions of records and need to migrate data, one thing you can do is create a SQL script that migrates the data in batches. This is faster, and consumes less memory, because you won’t wrap the whole migration in a single database transaction.

One thing to consider when using migration scripts is to disable the table’s indexes during the bulk write. Indexes are meant to improve read performance, but they can slow down writes significantly, because every time you write a record the database also has to update the index structures. Now imagine this in a scenario with millions of records: it can take way longer than it should.
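In PostgreSQL, for example, there is no command to simply “disable” a regular index, so a common approach is to drop it before the bulk write and recreate it once the migration is done. A minimal sketch, assuming a non-critical index named orders_user_id_idx (the name is an assumption, just for illustration):

-- Drop the index before the bulk write (hypothetical index name)
DROP INDEX IF EXISTS orders_user_id_idx;

-- ... run the batched data migration here ...

-- Recreate the index once the data is migrated
CREATE INDEX orders_user_id_idx ON orders (user_id);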

Every database has its own characteristics, but most of the things you can do in one you can also do in another, thanks to the SQL specification that every database implements. So when you are writing these scripts, it is important to always check the documentation. I’ll be using PostgreSQL in this example, but the idea can be applied to most databases.

Let’s take a look at an example.

Suppose we are dealing with an e-commerce application and we notice that the orders page is slow. By analyzing the query, we see that it can be improved by denormalizing one of its tables. Let’s set up some data to show how this could be done.

CREATE TABLE "users" (
id serial PRIMARY KEY,
account_id int not NULL,
name varchar(10)
);

CREATE TABLE orders (
id SERIAL PRIMARY KEY,
data text not NULL,
user_id int not NULL
);

-- Generates 50_000_000 users
INSERT INTO "users" (account_id)
SELECT generate_series(1,50000000);

-- Generates 200_000_000 orders (4 x 50_000_000)
INSERT INTO orders (data, user_id)
SELECT 't-shirt' AS data,
       generate_series(1,50000000);

INSERT INTO orders (data, user_id)
SELECT 'backpack' AS data,
       generate_series(1,50000000);

INSERT INTO orders (data, user_id)
SELECT 'sunglass' AS data,
       generate_series(1,50000000);

INSERT INTO orders (data, user_id)
SELECT 'sketchbook' AS data,
       generate_series(1,50000000);

CREATE index ON "users" ("account_id");
CREATE index ON "orders" ("user_id");
SELECT
  "orders"."id",
  "orders"."data"
FROM "orders"
INNER JOIN "users" ON ("orders"."user_id" = "users"."id")
WHERE "users".account_id = 4500000;

The results from this query take about 45s to return. If we run EXPLAIN ANALYZE on this query, we will see that the join is taking too long, even though it is a simple query.

One of the things we can do to improve this query is to denormalize the orders table and create another column, user_account_id, which will be a copy of the account_id column from the users table. This way we can remove the join and make the information easier to read.

ALTER TABLE "orders" ADD COLUMN "user_account_id" integer;

If we weren’t dealing with such a large dataset, the easiest way of doing it would be to write a simple UPDATE ... FROM and go on with life, but with this much data it could take too long to finish.

UPDATE orders
SET user_account_id = users.account_id
FROM users
WHERE orders.user_id = users.id;

Updating records in batches

One way that we will explore in this blog post is to migrate this amount of data using a script that performs the update in batches.

We will need a way to control which items are to be updated. If your table has a sequential id column, this is easy; otherwise, you will need to find another way to iterate through the records. One way to control this is to create another table, or a temp table, to store the data that needs to be changed, using a ROW_NUMBER window function to generate a sequential ID, or just adding a sequential column. The only limitation with a temp table is that the database hardware needs to be able to handle that many records in memory.

PostgreSQL: Documentation: 9.3: CREATE TABLE
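If the table had no sequential id column, a sketch of the temp table approach mentioned above could look like this (table and column names are illustrative, not part of the original example):

-- Number the rows that still need to be migrated so the batch loop
-- can iterate over a dense, sequential key
CREATE TEMP TABLE orders_to_migrate AS
SELECT ROW_NUMBER() OVER (ORDER BY id) AS row_num,
       id AS order_id
FROM orders
WHERE user_account_id IS NULL;

CREATE INDEX ON orders_to_migrate (row_num);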

Luckily for us, we have a sequential column in our table that we can use to control the items. To iterate through the records in PostgreSQL you can use control structures such as FOR or WHILE.

PostgreSQL: Documentation: 9.2: Control Structures

You can also print some messages during the process to provide some feedback while the queries are running, chances are that it may take a while to finish if you are dealing with a large dataset.

https://www.postgresql.org/docs/9.6/plpgsql-errors-and-messages.html

DO $$
DECLARE
   row_count integer := 1;
   batch_size  integer := 50000; -- HOW MANY ITEMS WILL BE UPDATED AT A TIME
   from_number integer := 0;
   until_number integer := batch_size;
   affected integer;
BEGIN

row_count := (SELECT count(*) FROM orders WHERE user_account_id IS NULL);

RAISE NOTICE '% items to be updated', row_count;

-- ITERATES THROUGH THE ITEMS UNTIL THERE ARE NO MORE RECORDS TO BE UPDATED
WHILE row_count > 0 LOOP
  UPDATE orders
  SET user_account_id = users.account_id
  FROM users
  WHERE orders.user_id = users.id
  AND orders.id BETWEEN from_number AND until_number;

  -- OBTAINING THE RESULT STATUS
  GET DIAGNOSTICS affected = ROW_COUNT;
  RAISE NOTICE '-> % records updated!', affected;

  -- UPDATES THE COUNTER SO IT DOESN'T TURN INTO AN INFINITE LOOP
  from_number := from_number + batch_size;
  until_number := until_number + batch_size;
  row_count := row_count - batch_size;

  RAISE NOTICE '% items to be updated', row_count;
END LOOP;

END $$;

The message output will look something like this until the script finishes:

NOTICE:  200000000 items to be updated
CONTEXT:  PL/pgSQL function inline_code_block line 12 at RAISE
NOTICE:  -> 50000 records updated!
CONTEXT:  PL/pgSQL function inline_code_block line 23 at RAISE
NOTICE:  199950000 items to be updated
CONTEXT:  PL/pgSQL function inline_code_block line 30 at RAISE
NOTICE:  -> 50001 records updated!

After the script finishes, we can create an index on the new column, since it will be used for reading purposes.

CREATE index ON "orders" ("user_account_id");

If we run the EXPLAIN ANALYZE command again, we can see the performance improvements.

We can see that, before, the join alone was taking a little more than 7s of the query, approximately 15% of the loading time. If we look closer, we can also notice that the next three lines were related to the join, and after we denormalized the table they were gone too.

You can follow the EXPLAIN ANALYZE evolution here

Hope it helps!

10 days ago

Plataformatec @ São Paulo › Brazil - Jan 28

Custom authentication methods with Devise

In the past, we have been asked to include other authentication methods in Devise (e.g. token-based and magic email links). Although it might make sense to include those for some applications, there is no plan to support them in Devise.

But don’t be upset, it turns out you might not need to override Devise’s SessionsController or monkey patch some of its internals. In this article, you’ll learn how to create a token-based authentication for a JSON API by relying on Warden’s features.

Disclaimers

Warden? Huh?

This article will focus on how to include custom Warden strategies in a Rails application that uses Devise. If you want to know more about Warden strategies, I gave a talk last year at RailsConf that explains them in more detail.

Show me the code!

The first part of this article will show how to set up a Rails application using Devise. If you want to skip to the token authentication part, click here.

Setup

Create a new Rails application (this example uses Postgres as the database to take advantage of UUIDs to generate the access tokens):

rails new devise-token-based-auth --database=postgresql

Add the devise gem to your Gemfile:

gem 'devise'

Now run the Devise generators:

rails generate devise:install
rails generate devise User

We are going to need a column to store the api_token. For this, we’ll use the pgcrypto extension’s gen_random_uuid() function.

First, create a migration to enable the pgcrypto extension:

rails generate migration enable_pgcrypto_extension

Now edit the migration to call the #enable_extension method:

class EnablePgcryptoExtension < ActiveRecord::Migration[5.2]
  def change
    enable_extension 'pgcrypto'
  end
end

The database is now able to use pgcrypto’s functions. Now we can create a migration to add the api_token column:

rails generate migration add_api_token_to_users

Now edit the migration like the one below. Notice the default is set to gen_random_uuid():

class AddApiTokenToUsers < ActiveRecord::Migration[5.2]
  def change
    add_column :users, :api_token, :string, default: -> { 'gen_random_uuid()' }
    add_index :users, :api_token, unique: true
  end
end

Don’t forget to create the database and run the migrations:

rails db:create db:migrate

The next step is to create a user using rails console and grab its api_token:

rails console
Running via Spring preloader in process 60784
Loading development environment (Rails 5.2.2)
irb(main):001:0> user = User.create!(email: 'bruce@wayne.com', password: '123123')
=> #
irb(main):002:0> user.reload.api_token
=> "a4839b85-4c96-4f22-96f1-c2568e5d6a7f"

It’s time to create the Warden strategy now!

The Api Token Strategy

Create a file app/strategies/api_token_strategy.rb with the following content:

class ApiTokenStrategy < Warden::Strategies::Base
  def valid?
    api_token.present?
  end

  def authenticate!
    user = User.find_by(api_token: api_token)

    if user
      success!(user)
    else
      fail!('Invalid email or password')
    end
  end

  private

  def api_token
    env['HTTP_AUTHORIZATION'].to_s.remove('Bearer ')
  end
end

In short, the strategy tries to find a user for the token sent in the Authorization header. If it finds one, it signs that user in. Otherwise, it returns an error. If you are not familiar with the success! and fail! methods, watch the talk linked at the start of the blog post to get a sense of how Warden works.

Warden needs to know about this strategy. Create a file config/initializers/warden.rb with the following code:

Warden::Strategies.add(:api_token, ApiTokenStrategy)

This allows Warden to recognise that it should call the ApiTokenStrategy when it receives the :api_token symbol.

Authenticating a user

Now it’s time to use the strategy. Create a UsersController that renders the current_user as JSON:

class UsersController < ApplicationController
  def show
    render json: current_user.to_json
  end
end

Don’t forget to add a route for this controller action. Open config/routes.rb in your editor and include the following:

Rails.application.routes.draw do
  devise_for :users
  resource :user, only: :show
end

To require authentication in the controller, the method #authenticate! should be called passing the desired strategy as a parameter:

class UsersController < ApplicationController
  def show
    warden.authenticate!(:api_token)
    render json: current_user.to_json
  end
end

You can see that this works using a simple curl request:

curl http://localhost:3000/user -H 'Authorization: Bearer a4839b85-4c96-4f22-96f1-c2568e5d6a7f'

{"id":3,"email":"bruce@wayne.com","created_at":"2018-12-26T13:45:37.473Z","updated_at":"2018-12-26T13:45:37.473Z","api_token":"a4839b85-4c96-4f22-96f1-c2568e5d6a7f"}

It is also possible to define :api_token as a default strategy so that it’s called when no strategy is passed as a parameter. Add the following code in the config/initializers/devise.rb file:

Devise.setup do |config|
   # The secret key used by Devise. Devise uses this key to generate...
   config.warden do |manager|
     manager.default_strategies(scope: :user).unshift :api_token
   end

# ==> Mountable engine configurations...
end

This will add the :api_token strategy in the first position, followed by Devise’s default strategies (:rememberable and :database_authenticatable).

Now it’s possible to use Devise’s #authenticate_user! helper, and the :api_token will still be used:

class UsersController < ApplicationController
  before_action :authenticate_user!

  def show
    render json: current_user.to_json
  end
end

Summary

And… we’re done! The focus here was to show how to include custom Warden strategies in a Rails application. The example was straightforward but you can follow this structure to create custom authentication logic to suit your application’s needs.

The entire application used in this article can be found on GitHub.

24 days ago

Plataformatec @ São Paulo › Brazil - Jan 14

Working with distributed teams

According to the Harvard Business Review, one of the biggest difficulties for teams today and in the future is the distance factor. With each passing day, it’s becoming more common to work with remote teams and that creates a big communication barrier. Being able to communicate properly is already a very complex subject, so we should try to avoid any obstacles that would make it even harder.

To be able to work well with distributed teams, we need to pay very close attention to our communication, in order to always strive to convey our message as clearly as possible, and in doing so, avoid having different views regarding the same subject.

In addition to effective communication, we also need to efficiently use the tools that are available for this work context. I will be listing below some tips and good practices that we have been using here at Plataformatec to extract the most value from those tools.

COMMUNICATION

The first group of tools, and by far the most used, is video conferencing. These tools add value compared to e-mail, as they make it easier to ask questions and build a shared understanding of a subject. Another very important merit of video conferencing is the ability to use body language to better explain a point. The tools I mostly use are Google Meet and Hangouts. Besides these two, I also recommend Appear.in and Zoom.us.

Some important tips whenever participating in a video conference call:

  • Always remember to turn off the microphone when not talking, because this prevents your background noise from disturbing the meeting;
  • Try using a headphone instead of using the notebook’s microphone and sound, in order to improve the sound quality;
  • When talking using a webcam, try to look into the camera instead of the screen where your image is being shown. This will increase the engagement with your audience and with this, your power of influence.
  • Establish a clear objective for the meeting and try to send the meeting agenda to everyone beforehand.
  • If you have a group in a single room, always keep a webcam on in a way that everyone can be seen. That will help to identify who is talking and avoid parallel conversations from arising between participants in the same room. If you notice parallel conversations, ask the subject to be shared with everyone, this will avoid communication silos and loss of information.

Although less effective than a video conference, some instant message tools are still preferable over e-mail, due to the faster response speed and ease to signal when something is not clear. The tool that I’ve been using very successfully is Slack.

CO-CREATION

One of the hardships of distributed teams is to make sure that every team member is aligned concerning the objectives to be achieved. One way to reduce this problem is to approximate the team members in the moments that the objectives are planned, and for that we need tools that enable the interaction of people who are remote.

A tool I like a lot is Draw.io, a free online tool for creating diagrams. It allows multiple people to work at the same time and, thus, collaboratively. It’s a great tool for running a brainstorming session or a retrospective ceremony, for example.

Another tool I find interesting is GroupMap; it is not free, but it has the advantage of allowing many users to work on the same board at the same time and edit it.

In addition to the two mentioned above, it’s also worth taking a look at the RealTimeBoard, which in turn has a trial and a paid version.

Some tips to run sessions using those tools:

  • Ask the person who is changing something in the board to tell what they are doing and why, this avoids things being moved without the whole team knowing the reason;
  • Elect someone to be the meeting facilitator, who will keep the meeting focused on its objective. Since these tools allow everyone present to change anything in the board, it’s good to have someone to keep the group focused;
  • Always use some video conference tool together with the co-creation tool, in order to facilitate and improve the communication;
  • Ask the participants to use their own computers during the session, even if they work close to other members, in order to improve the collaboration and engagement of everyone.

Another set of tools I use a lot when working remotely is the Google Drive suite of applications. They are great for sharing documents, spreadsheets, and presentations.

WORK ORGANIZATION

Another problem distributed teams face is the matter of prioritizing tasks and who should do what. To solve this problem, we need tools that can, besides showing the activities to be done, also show clearly what is already being done and thus avoid that two people do the same work.

For that, I recommend using Trello, as it’s an easy-to-use, free tool that can be used regardless of how the team is organized.

For teams working on digital products and software, I recommend Jira, for its reports and charts of work progress and for its larger number of integrations with other software, such as the Draw.io integration mentioned earlier.

CONCLUSION

Although working remotely is increasingly becoming commonplace, we need to think about how we can mitigate issues that may arise. With the correct use of the tools we have available today, it’s possible to achieve greater engagement and increase team productivity.

Using the tools and tips mentioned above, you can have more peace of mind when working with a remote team, knowing that you will be minimizing the common communication risks in this type of work.

1 month ago

Plataformatec @ São Paulo › Brazil - Jan 07

Let’s talk about Story Mapping

Are you having a hard time prioritizing backlog and setting releases?

Let’s talk about Story Mapping!

If you’ve ever participated in a software development team, you’ve probably come across the difficulties faced by product professionals to prioritize the backlog and set releases. These tasks can become even harder without the ideal tools and techniques, distancing releases from key project goals and creating unnecessary features for the product.

To help solve these problems, we will detail in this blog post the use of Story Mapping, known to some and a great mystery to others. This tool was created by Jeff Patton and described in detail in the book “User Story Mapping” by the same author.

At first, judging by the name, you might think it is just for mapping work items. What if I tell you that by using Story Mapping you can also optimize the value delivered in each release, align delivery expectations between stakeholders, and deprioritize everything that is not essential to the delivery? Story Mapping will not solve every problem in your project, but it will be extremely useful in two moments in particular: at the project start, when you are creating a product and don’t yet have a direction to follow, or after a goal change that alters the entire planned scope.

STEP 1 – Assembling the Value Flow

To start Story Mapping, we first need to have a workflow or a Big Story that will tell the user’s interaction with the product, considering the path that the user goes through to achieve the goal that this product proposes to solve. This will be the baseline for Story Mapping.
If you don’t have this flow yet, it can be assembled dynamically with the product owners and the team involved in creating it, displaying their visions of the ideal product features. We typically use the value stream or the main artifact (main product object) to guide the discussion.

STEP 2 – Mapping Stories

The second step is the moment to analyze the flow, identifying steps that have the same context or that originate from the same user action, thus generating the main epics, which will serve as the foundation for mapping work items. The epics will serve as a backbone of story mapping.

It is now time to detail what needs to be done in each epic. These will be the user stories (or work items, if you prefer to call them that) that will be developed. It must be clear that these stories will still go through refinement, which is the moment when we will in fact discuss the task details thoroughly and remove the more technical and business uncertainties. The goal here is just to map the work items, not to detail and write them. In the example in the image below we also have themes, which are a logical grouping of epics that share the same context inside the system.

Before listing the next step, a tip: have clear, prioritized and feasible goals for the project so you can set releases with higher value added. When you plan a product, you usually have a goal to achieve and expect to add a reasonable value to the business. If the goals are generic or too broad, I recommend splitting this goal into smaller pieces that could become other product delivery milestones. For example, you can have an MVP (Minimum Viable Product) for your product. This first release can deliver value to the business faster and even if it’s a simpler solution, will enable you to collect product metrics, validate assumptions about what is being done, and increase delivery value for future releases.

STEP 3 – Setting Releases

The third and final step is, after mapping and prioritizing the key goals, to distribute the work items from the previous stage among these goals. A good way to do this is to draw lanes on your Story Mapping board, where each lane represents a goal and thus facilitates distributing cards. See the example below:

If we slice the backlog, in the horizontal slices we will have the goals and stories, while in the vertical slices we will have the epics with their respective stories. During the distribution of stories and goals, or even during product development, other items that previously had not been mapped will surely appear, but the scope will be much more detailed and directed to what needs to be done after the discussions.

With the participation of all stakeholders in the group dynamics to set flow, epics, stories, and releases, we have better alignment of expectations on deliveries, definition of an MVP that makes sense for the business and a direction for the next project stages.

To continue the topic of backlog prioritization and setting valuable releases for the business, here is a recommendation for a post about product metrics, written here at Plataformatec: What would be good metrics for digital products?

And you? Do you already have the vision of backlog and releases for your project? Leave your comments below about your experience with Story Mapping!

1 month ago

Plataformatec @ São Paulo › Brazil - Jan 02

An Agilist’s Guide: Analyzing Process health

Regardless of whether you are an experienced or a new agilist, there are challenges that are part of the daily routine of any team seeking to create quality software with a predictable deadline for delivering a demand (e.g., a new feature, a bug fix, a technical improvement, etc.).

Developing a vision that enables seeing and understanding the whole by analyzing the parts that form it is a great challenge when dealing with uncertain environments.

One of the ways to understand and analyze the work system of an agile team is through process metrics (there is a collection of cool content on the subject here on Ptec’s blog).

Understanding the flow from data is a pragmatic way to incorporate transparently a continuous improvement philosophy without abrupt and chaotic changes for everyone involved in the context (team, stakeholders, product area, managers, etc.). To paraphrase the Kanban community, measuring is a tool that will help the evolution and not revolution of an existing process.

Personally, I like to quantify the process when I am mapping the reality of a team, as I expose facts (data) for discussion and analysis. In my experience with software development teams, I have realized that having this type of behavior helps to reduce a subjective load that may exist on the team and is a way of confronting beliefs like “we don’t need to improve”, “we don’t have problems in the process” and “we work collaboratively”.

In today’s blog post, I’ll share a collection of metrics that will help you diagnose the health of an agile team process.

Assumptions

First of all, I need to list some reservations about using metrics.

Metrics should be used to evolve the process, not to generate destructive demands and comparisons.

Don’t be tempted to compare teams and measure an individual’s performance. If you or the company where you work are interested in rewarding individual results, engineer a method that proposes evaluating the person’s performance from the opinion of managers, peers and customers who have interacted with the results of the individual’s work. In the event of individually rewarding teams for their results, avoid metrics that are not directly related to business results (e.g. what is the value of the team delivering a number of features in the semester if product revenue has not increased?).

I share this assumption so that you and your team don’t behave according to how they are being measured. After all, as Goldratt would say: “Tell me how you measure me, and I will tell you how I will behave“.

Numbers without context are dangerous.

Any analysis done without understanding the variables that form a scenario becomes superficial, skewed and poor. If someone tells you that the number of features delivered in one team is 10 items per week, is it possible to reach any conclusion? Is this metric good or bad? What is the meaning of delivered for that team? Is it software in production or delivery in a QA environment? Well, from the previous questions it is possible to conclude that a loose number is nothing more than a symbol deprived of any meaning.

Look for trends and run away from accuracy. Given the complexity of creating a software product, do not seek to be deterministic in a world that is receptive by nature to a probabilistic reality.

The logic behind the last assumption is: we live in an environment where there is risk, that is, there is a probability of failure due to some uncertain event whose occurrence does not depend exclusively on the team’s will. Therefore, it is unlikely that the team knows for certain the delivery deadline for a demand or project. Rather than fooling ourselves with deadlines that ratify that deliveries will always happen on the same date and in the same way, let’s analyze the team’s data history to project the chance of delivering something.

These assumptions that I shared will help you use metrics effectively. Lucas Colucci wrote a very important article that presents some common mistakes when using process metrics.

Process diagnostics

Given the assumptions shared above, let’s go to what really matters. The metrics listed below will assist you in mapping the health of a team process. As an agilist, I consider that the views below form a workflow cockpit and should be available for teams to use as material for promoting process improvements (e.g., use of metrics in retrospectives, process analysis in daily meetings, etc.).

Work in progress

Monitoring the work in progress will help the team become aware of the work volume that the process has supported over time. I like this kind of view as it shows, for example, if teams are respecting a policy regarding limits to work in progress.

In the example above, we have a situation where the team had close to 10 items in progress a week, until at some point there was a reconfiguration in the number of people, which caused the amount of work in progress to decrease. An insight here is that this view presents an interesting history of the team, helping to identify moments when changes occurred in team structure, changes of direction (the famous pivots), obstacles, etc.

In order to stabilize the process from the productive capacity, it is important that there isn’t a growth trend for the number of items in WIP. If this is happening, the team is likely to need optimizations to reduce WIP. As an agilist, monitoring WIP will support you to reverberate the following mantra with the team: “Let’s stop starting and let’s start finishing”.

Another WIP analysis I have started is related to the lifespan of items that are in WIP within a given week. Essentially, the view considers the items that are in progress that week and accounts for how long they have been in the workflow. For an easier reading of the chart, I group the data by category where I consider the following ranges: (1) 1 day in WIP; (2) up to one week in WIP; (3) from 1 to 2 weeks in WIP; (4) from 2 to 3 weeks in WIP; (5) from 3 to 4 weeks in WIP; (6) over 4 weeks on WIP.

Still on how the view is structured, for the weeks that have passed, items that are in WIP at the end of the period are accounted for and classified. For the current week, items currently in WIP are analyzed.

Applying it to one example, you can see that in the fifth week of WIP lifespan tracking of the team above, there were two items that were over a month in the flow. If the team, by systematically tracking their workflow, realizes that the older categories are growing, it is very likely that a bottleneck is being produced at some stage in the process and an intervention will be needed.

Keep in mind that WIP represents effort and energy that has not yet been validated and the longer the team spends carrying it, the less feedback will be received, the slower the process of validating the assumptions behind the initiatives that originate the work, and the greater the risk of the company missing on a market opportunity.

Lead time

Lead time is an important metric for teams to develop the ability to understand how long they have taken to complete a work item. In addition, teams that develop the skill to analyze such metrics can identify situations that have generated variability in the process (e.g. environment issues, exit of team members, lack of clear acceptance criteria for demands).

The first view that should be available for the team is the lead time scatter plot chart. It will provide an idea if the time for delivering the items is decreasing or not over time.

As shown in the chart above, I like to combine the following information in this view: completed items (represented in the chart by the blue dots) and item in progress (represented in the chart by the red dots); the moving average that considers the last 5 delivered items (this is a completely arbitrary parameter); and the information for 50th percentile (median), 75th percentile and 95th percentile.

Such reference measures are useful because, from the example, they bring to light findings such as:

  • The moving average has varied over time.
  • Based on the history of completed items, 50% of them were completed within 10 days, 75% took up to 16 days to complete and 95% were completed within 34 days.
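For illustration only (the numbers and code below are not from the post), such reference measures can be computed directly from a list of lead times. A minimal Ruby sketch, assuming lead times measured in days:

# Illustrative lead times, in days
lead_times = [3, 5, 7, 8, 10, 10, 12, 14, 16, 21, 34]

# Nearest-rank percentile: the smallest value such that at least
# pct% of the items are less than or equal to it
def percentile(sorted_values, pct)
  index = (pct / 100.0 * sorted_values.size).ceil - 1
  sorted_values[index]
end

sorted = lead_times.sort
puts "p50: #{percentile(sorted, 50)} days" # => 10
puts "p75: #{percentile(sorted, 75)} days" # => 16
puts "p95: #{percentile(sorted, 95)} days" # => 34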

In addition, the scatter plot chart can generate answers to questions such as:

  • What is the team doing to handle items in progress that are becoming extreme lead time cases?
  • What can be improved in the process to reduce lead time?

Another beneficial view to understand lead time is the histogram chart, because, when considering each lead time as a category, it presents data more concisely and allows extracting information about distribution behavior.

As shown in the example above, the histogram enables the team to respond to queries such as:

  • What has been the most frequent lead time?
  • Are extreme lead time cases very common?
  • What is the distribution format? Is there a lead time concentration to the left or right of the distribution? Is there more than one mode? Usually bimodal distributions represent flows of teams that deal with more than one type of demand in their process, because they have similar concentrations in different lead time categories.

Delivery pace

Measuring and visualizing throughput is important for the team to understand what amount of work has been delivered over a period of time (example: week, two weeks, month), as well as helping it identify if there is an upward trend in the number of deliveries. When realizing that throughput is falling, the team can try to understand which factors are affecting the process throughput.

I recommend whenever possible breaking the throughput view by demand type. Thus, it is clear to everyone if the team is able to:

  • Balance the number of valuable deliveries (e.g. features) with failure demands (e.g. bugs).
  • Deal with urgent demands in a sustainable manner. I often find teams that classify every demand that enters the workflow as urgent and work on them, leaving aside items that will be important to the business in the medium term.

In the example above, the team managed to balance during most weeks user story deliveries with bug fixes. In weeks marked with red arrows, the team had to act and deliver critical issues that were affecting the company’s operation, therefore, it delivered only bug fixes.

Analysis of flow input and output rate

In addition to using the CFD chart (I wrote a full blog post to talk about this view), I would like to present an analysis that I have not seen many agilists do: the relationship between workflow input and output rates. Basically this view compares the amount of items that the team has committed to deliver with the number of items delivered over time.

What insight can this type of view provide? Let’s look at an example.

Based on the image above you can make some deductions:

  • Of the 27 weeks analyzed, 10 of them had an input rate (items the team committed to delivering) greater than the number of delivered items.
  • In week 17 the team had a peak of items committed to being delivered. This happened because the team’s PO was about to go on vacation. Speaking of inventories, I suggest reading Guilherme Fré’s blog post on the subject.

Analyzing the process input and output rate will support the team in understanding whether the delivery pace is tracking the number of committed items. I have seen teams making commitments that are over their actual capacity. This type of behavior leads to lack of trust by stakeholders because the team delivers less than requested, as well as frustration in the team for never being able to deliver “everything”.

In a perfectly stable system, the input and output rate should be the same. If you and your team can develop a process where most weeks these rates are equivalent (e.g., for every 4 weeks, 3 have equal input and output rates), this will demonstrate workflow maturity and represent a highly predictable context.

Conclusion

Incorporating a culture that brings data to your team will enable you to monitor a process that has an essentially complex nature (software), providing progress visibility to anyone interested in what is being built or maintained.

Furthermore, proposing data-driven improvements and evolutions is an excellent way for removing subjective and, to some extent, empty analysis. Basically, I hope that this blog post helps you, agilist, on encouraging people to use less feeling and more facts when they are analyzing a team’s workflow behavior.

If you are looking for more advanced material on metrics, I recommend the book I wrote, Agile Metrics – Get better results in your team. Check out the reviews posted by those who have read it 😏

And you? What metrics have you used to track process health? Share your experiences in the comments below!
What challenges do you face to track the health of your process?
Schedule a 30-minute conversation with one of our experts. A consultant will help you map out your biggest challenges in the development process.

2 months ago

Plataformatec @ São Paulo › Brazil - Dec 21

Building a new MySQL adapter for Ecto, Part III: DBConnection Integration

Welcome to the “Building a new MySQL adapter for Ecto” series:

  • Part I: Hello World
  • Part II: Encoding/Decoding
  • Part III: DBConnection Integration (you’re here!)
  • Part IV: Ecto Integration

In the first two articles of the series we have learned the basic building blocks for interacting with a MySQL server using its binary protocol over TCP.

To have a production-quality driver, however, there’s more work to do. Namely, we need to think about:

  • maintaining a connection pool to talk to the DB efficiently from multiple processes
  • not overloading the DB
  • attempting to re-connect to the DB if connection is lost
  • supporting common DB features like prepared statements, transactions, and streaming

In short, we need: reliability, performance, and first-class support for common DB features. This is where DBConnection comes in.

DBConnection

DBConnection is a behaviour module for implementing efficient database connection client processes, pools, and transactions. It was created by Elixir and Ecto Core Team member James Fish and was introduced in Ecto v2.0.

From the DBConnection documentation, we can see how it addresses the concerns mentioned above:

DBConnection handles callbacks differently to most behaviours. Some callbacks will be called in the calling process, with the state copied to and from the calling process. This is useful when the data for a request is large and means that a calling process can interact with a socket directly.

A side effect of this is that query handling can be written in a simple blocking fashion, while the connection process itself will remain responsive to OTP messages and can enqueue and cancel queued requests.

If a request or series of requests takes too long to handle in the client process a timeout will trigger and the socket can be cleanly disconnected by the connection process.

If a calling process waits too long to start its request it will timeout and its request will be cancelled. This prevents requests building up when the database cannot keep up.

If no requests are received for a period of time the connection will trigger an idle timeout and the database can be pinged to keep the connection alive.

Should the connection be lost, attempts will be made to reconnect with (configurable) exponential random backoff to reconnect. All state is lost when a connection disconnects but the process is reused.

The DBConnection.Query protocol provides utility functions so that queries can be prepared or encoded and results decoded without blocking the connection or pool.

Let’s see how we can use it!

DBConnection Integration

We will first create a module responsible for implementing DBConnection callbacks:

defmodule MyXQL.Protocol do
  use DBConnection
end

When we compile it, we’ll get a bunch of warnings about callbacks that we haven’t implemented yet.

Let’s start with the connect/1 callback and while at it, add some supporting code:

defmodule MyXQL.Error do
  defexception [:message]
end

defmodule MyXQL.Protocol do
  @moduledoc false
  use DBConnection
  import MyXQL.Messages
  defstruct [:sock]

  @impl true
  def connect(opts) do
    hostname = Keyword.get(opts, :hostname, "localhost")
    port = Keyword.get(opts, :port, 3306)
    timeout = Keyword.get(opts, :timeout, 5000)
    username = Keyword.get(opts, :username, System.get_env("USER")) || raise "username is missing"
    sock_opts = [:binary, active: false]

    case :gen_tcp.connect(String.to_charlist(hostname), port, sock_opts) do
      {:ok, sock} ->
        handshake(username, timeout, %__MODULE__{sock: sock})

      {:error, reason} ->
        {:error, %MyXQL.Error{message: "error when connecting: #{inspect(reason)}"}}

      err_packet(message: message) ->
        {:error, %MyXQL.Error{message: "error when performing handshake: #{message}"}}
    end
  end

  @impl true
  def checkin(state) do
    {:ok, state}
  end

  @impl true
  def checkout(state) do
    {:ok, state}
  end

  @impl true
  def ping(state) do
    {:ok, state}
  end

  defp handshake(username, timeout, state) do
    with {:ok, data} <- :gen_tcp.recv(state.sock, 0, timeout),
         initial_handshake_packet() = decode_initial_handshake_packet(data),
         data = encode_handshake_response_packet(username),
         :ok <- :gen_tcp.send(state.sock, data),
         {:ok, data} <- :gen_tcp.recv(state.sock, 0, timeout),
         ok_packet() <- decode_handshake_response_packet(data) do
      {:ok, state}
    end
  end
end

defmodule MyXQL do
  @moduledoc "..."

  @doc "..."
  def start_link(opts) do 
    DBConnection.start_link(MyXQL.Protocol, opts)
  end
end

That’s a lot to unpack so let’s break this down:

  • per the documentation, connect/1 must return {:ok, state} on success and {:error, exception} on failure. Our connection state for now will be just the socket. (In a complete driver we’d use the state to manage prepared statement references, transaction status, etc.) On error, we return an exception.
  • we extract configuration from the keyword list opts and provide sane defaults
  • we try to connect to the TCP server and, if successful, perform the handshake.
  • as we’ve learned in Part I, the handshake goes like this: after connecting to the socket, we receive the “Initial Handshake” packet. Then, we send the “Handshake Response” packet. At the end, we receive the response and decode the result, which can be an “OK Packet” or an “ERR Packet”. If we receive any socket errors, we ignore them for now; we’ll talk about handling them better later on.
  • finally, we introduce a public MyXQL.start_link/1 that is an entry point to the driver
  • we also provide minimal implementations for checkin/1, checkout/1 and ping/1 callbacks

It’s worth taking a step back and looking at our overall design:

  • MyXQL module exposes a small public API and calls into an internal module
  • MyXQL.Protocol implements DBConnection behaviour and is the place where all side-effects are being handled
  • MyXQL.Messages implements pure functions for encoding and decoding packets

This separation is really important. By keeping the protocol data separate from the protocol interaction code, we have a codebase that’s much easier to understand and maintain.

Prepared Statements

Let’s take a look at handle_prepare/3 and handle_execute/4 callbacks that are used to
handle prepared statements:

iex> b DBConnection.handle_prepare
@callback handle_prepare(query(), opts :: Keyword.t(), state :: any()) ::
            {:ok, query(), new_state :: any()}
            | {:error | :disconnect, Exception.t(), new_state :: any()}

Prepare a query with the database. Return {:ok, query, state} where query is a
query to pass to execute/4 or close/3, {:error, exception, state} to return an
error and continue or {:disconnect, exception, state} to return an error and
disconnect.

This callback is intended for cases where the state of a connection is needed
to prepare a query and/or the query can be saved in the database to call later.

This callback is called in the client process.
iex> b DBConnection.handle_execute
@callback handle_execute(query(), params(), opts :: Keyword.t(), state :: any()) ::
            {:ok, query(), result(), new_state :: any()}
            | {:error | :disconnect, Exception.t(), new_state :: any()}

Execute a query prepared by c:handle_prepare/3. Return {:ok, query, result,
state} to return altered query query and result result and continue, {:error,
exception, state} to return an error and continue or {:disconnect, exception,
state} to return an error and disconnect.

This callback is called in the client process.

Notice the callbacks reference types like: query(), result() and params().
Let’s take a look at them too:

iex> t DBConnection.result
@type result() :: any()

iex> t DBConnection.params
@type params() :: any()

iex> t DBConnection.query
@type query() :: DBConnection.Query.t()

As far as DBConnection is concerned, result() and params() can be any term (it’s up to us to define these) and the query() must implement the DBConnection.Query protocol.

DBConnection.Query is used for preparing queries, encoding their params, and decoding their results. Let’s define query and result structs as well as a minimal protocol implementation.

defmodule MyXQL.Result do
  defstruct [:columns, :rows]
end

defmodule MyXQL.Query do
  defstruct [:statement, :statement_id]

  defimpl DBConnection.Query do
    def parse(query, _opts), do: query

    def describe(query, _opts), do: query

    def encode(_query, params, _opts), do: params

    def decode(_query, result, _opts), do: result
  end
end

Let’s define the first callback, handle_prepare/3:

defmodule MyXQL.Protocol do
  # ...

  @impl true
  def handle_prepare(%MyXQL.Query{} = query, _opts, state) do
    data = encode_com_stmt_prepare(query.statement)

    with :ok <- sock_send(data, state),
         {:ok, data} <- sock_recv(state),
         com_stmt_prepare_ok(statement_id: statement_id) <- decode_com_stmt_prepare_response(data) do
      query = %{query | statement_id: statement_id}
      {:ok, query, state}
    else
      err_packet(message: message) ->
        {:error, %MyXQL.Error{message: "error when preparing query: #{message}"}, state}

      {:error, reason} ->
        {:disconnect, %MyXQL.Error{message: "error when preparing query: #{inspect(reason)}"}, state}
    end
  end

  defp sock_send(data, state), do: :gen_tcp.send(state.sock, data)

  defp sock_recv(state), do: :gen_tcp.recv(state.sock, 0, :infinity)
end

The callback receives the query, opts (which we ignore), and the state. We encode the query statement into a COM_STMT_PREPARE packet, send it, receive the response, decode the COM_STMT_PREPARE response, and put the retrieved statement_id into our query struct.

If we receive an ERR Packet, we put the error message into our MyXQL.Error exception and return that.

The only places we could get an {:error, reason} tuple from are the :gen_tcp send/recv calls; if we get an error there, it means there might be something wrong with the socket. By returning {:disconnect, _, _}, DBConnection will take care of closing the socket and will attempt to re-connect with a new one.

Note that we set the timeout to :infinity on our socket recv calls. That’s because DBConnection manages the process these calls are executed in and maintains its own timeouts. (And if we hit those timeouts, it cleans up the socket automatically.)

Let’s now define the handle_execute/4 callback:

defmodule MyXQL.Protocol do
  # ...

  @impl true
  def handle_execute(%{statement_id: statement_id} = query, params, _opts, state)
      when is_integer(statement_id) do
    data = encode_com_stmt_execute(statement_id, params)

    with :ok <- sock_send(data, state),
         {:ok, data} <- sock_recv(state),
         resultset(columns: columns, rows: rows) <- decode_com_stmt_execute_response(data) do
      columns = Enum.map(columns, &column_definition(&1, :name))
      result = %MyXQL.Result{columns: columns, rows: rows}
      {:ok, query, result, state}
    else
      err_packet(message: message) ->
        {:error, %MyXQL.Error{message: "error when executing query: #{message}"}, state}

      {:error, reason} ->
        {:disconnect, %MyXQL.Error{message: "error when executing query: #{inspect(reason)}"}, state}
    end
  end
end

defmodule MyXQL.Messages do
  # ...

  # https://dev.mysql.com/doc/internals/en/com-query-response.html#packet-ProtocolText::Resultset
  defrecord :resultset, [:column_count, :columns, :row_count, :rows, :warning_count, :status_flags]

  def decode_com_stmt_execute_response(data) do
    # ...
    resultset(...)
  end

  # https://dev.mysql.com/doc/internals/en/com-query-response.html#packet-Protocol::ColumnDefinition41
  defrecord :column_definition, [:name, :type]
end

Let’s break this down.

handle_execute/4 receives an already prepared query, params to encode, opts, and the state.

Similarly to handle_prepare/3, we encode the COM_STMT_EXECUTE packet, send it, receive the response, decode the COM_STMT_EXECUTE response into a resultset record, and finally build the result struct.

Same as last time, if we get an ERR Packet we return an {:error, _, _} response; on socket problems, we simply disconnect and let DBConnection handle re-connecting at a later time.

We’ve mentioned that the DBConnection.Query protocol is used to prepare queries, and in fact we could perform encoding of params and decoding the result in implementation functions. We’ve left that part out for brevity.

Finally, let’s add a public function that users of the driver will use:

defmodule MyXQL do
  # ...

  def prepare_execute(conn, statement, params, opts) do
    query = %MyXQL.Query{statement: statement}
    DBConnection.prepare_execute(conn, query, params, opts)
  end
end

and see it all working.

iex> {:ok, pid} = MyXQL.start_link([])
iex> MyXQL.prepare_execute(pid, "SELECT ? + ?", [2, 3], [])
{:ok, %MyXQL.Query{statement: "SELECT ? + ?", statement_id: 1},
%MyXQL.Result{columns: ["? + ?"], rows: [[5]]}}

Arguments to MyXQL.start_link are passed down to
DBConnection.start_link/2,
so starting a pool of 2 connections is as simple as:

iex> {:ok, pid} = MyXQL.start_link(pool_size: 2)

Conclusion

In this article, we’ve seen a sneak peek of integration with the DBConnection library. It gave us
many benefits:

  • a battle-tested connection pool without writing a single line of pooling code
  • we can use blocking :gen_tcp functions without worrying about OTP messages and timeouts;
    DBConnection will handle these
  • automatic re-connection, backoff etc
  • a way to structure our code

With this, we’re almost done with our adapter series. In the final article we’ll use our driver as an Ecto adapter. Stay tuned!

2 months ago

Adam Niedzielski @ Lodz › Poland - Nov 22

Boring Ruby Code

In 2017 I delivered a talk with the title “Boring Ruby Code” at two conferences – Brighton Ruby Conf and Southeast Ruby. After that I always wanted to write a blog post that summarizes the approach, but I have never gotten around to doing it. Today is the day, so here we go.

This is just a short summary, not one-to-one transcription of the talk. You can watch the video from Brighton Ruby Conf here.

  1. Boring code is easier to understand. In your team you have (or at least should have) people at different stages of their career. Certain constructs in Ruby do not bring you much value, but make the code harder to understand for entry level programmers.
  2. Boring code is easier to read. Programmers spend most of their time reading the code so it makes sense to optimise for the reading time and not the writing time. Less common constructs increase the reading time even for people who understand how they work.
  3. Boring code is easier to delete. There is no “pride” associated with boring code, so the decision to delete it comes easier.
  4. Iterating over an array of names to assign values or define methods is just lazy and brings no value.
  5. send is a code smell (public_send too).
  6. Dynamically defining methods is a perfect way to prevent other programmers from discovering where the method is defined.
  7. Dynamically defining methods where method name is dynamically constructed is a no-go.
  8. Metaprogramming capabilities of Ruby can steer our attention away from standard refactoring practices. A lot of code duplication can (and should) be solved by extracting a method, not by metaprogramming (see the sketch after this list).
  9. method_missing is a code smell.
  10. “Smart code” can hide a lot of simple mistakes just by looking smart. Programmers assume that a piece of code is right, because it looks smart.
  11. is_a? and respond_to? are code smells.
  12. Once you start using is_a? and respond_to? they tend to leak to multiple places in your codebase.
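To make points 4, 6 and 8 above concrete, here is a small illustrative sketch (the class and method names are made up) contrasting dynamically defined methods with their boring equivalent:

# “Smart” version: methods defined by iterating over an array of names,
# which is hard to grep for and hard for newcomers to locate
class SmartReport
  %w[pdf csv html].each do |format|
    define_method("to_#{format}") do
      "rendering #{format}"
    end
  end
end

# Boring version: three explicit methods, more typing but trivially discoverable
class BoringReport
  def to_pdf
    "rendering pdf"
  end

  def to_csv
    "rendering csv"
  end

  def to_html
    "rendering html"
  end
end

SmartReport.new.to_pdf  # => "rendering pdf"
BoringReport.new.to_pdf # => "rendering pdf"

Both classes behave the same; the boring version is simply easier to read, to search for, and to delete.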
3 months ago

Ryan Davis (Polishing Ruby) @ Seattle, WA › United States - Nov 13

Graphics and Simulations (and Games), Oh My!

I’ve released my slides for my Ruby Conf talk titled Graphics and Simulations (and Games), Oh My! where I gave a talk about my graphics and simulation gem called, creatively, “graphics”.

As said during the talk, I’m going to try to get a release out today that will include the transition to SDL2 and a bunch of enhancements.

Hopefully I’ll also be able to take off the “beta” attribute as well.

3 months ago

Pat Shaughnessy @ Boston, MA › United States - Oct 24

Summer School With The Rust Compiler


(source: Steve Klabnik via Twitter)

A few months ago, I saw this tweet from Steve. I’m not even sure what “derridean” means, but now the image of an insane coach pops into my head every time I try to write Rust code.

Learning Rust is hard for everyone, but it’s even worse for me because I’ve been working with Ruby for the past ten years. Switching to Rust from Ruby is like leaving an anything-goes hippie commune for a summer school for delinquent programmers run by a sadistic and unforgiving teacher.

Why would anyone use a compiler like this? The answer is simple: to learn how to write better code. This past summer I had some free time and decided to convert a simple Ruby script into Rust. As you’ll see, the Rust compiler beat me up a few times; it wasn’t easy. But after some tough love I ended up learning something, not only about Rust but about Ruby too.

Iterating Over an Array in Ruby

Here’s my example program. It’s so short and simple you can read and understand it in just a few seconds:

array = [1, 2, 3]
for i in array
  puts i
end

When I ran it, the output was:

$ ruby int-loop.rb
1
2
3

The Garden of Earthly Delights (detail), by Hieronymus Bosch

Ruby’s syntax and feature set are designed to make my life easier as a developer. Writing Ruby for me is as natural as writing English; it’s like having a pleasant conversation with my computer. I’m living in the Garden of Earthly Delights. If I can imagine a code change, I can write it. Using Ruby, all of my dreams can come true.

Next I decided to increment the values before printing them out. I added just one line of code to my example, i = i+1:

array = [1, 2, 3]
for i in array
  i = i+1
  puts i
end

As I expected, Ruby printed out 2 through 4:

$ ruby int-loop.rb
2
3
4

Of course, there are other ways to produce the same result. I could have used puts i+1, or mapped the original array to a new array [2, 3, 4]. But Ruby doesn’t care. Today I felt like writing i = i+1, and Ruby let me do it without comment. Ruby is the parent of an unruly teenager that gets away with anything.

As I found out later, using i = i+1 might have broken a Computer Science rule or two, but I was blissfully unaware. What you don’t know can’t hurt you. Ruby didn’t tell me anything might be wrong… but as we’ll see Rust certainly did!

Rust: Similar to Ruby At First Glance

I was curious: What would the Rust compiler think of this example? I was able to rewrite it in only a few minutes:

fn main() {
    let array = [1, 2, 3];
    for i in array.iter() {
        println!("{}", i);
    }
}

I had to type semicolons after each line and use a main function. A bit more typing, but really this is exactly the same program. Running this, of course, produced the same result:

$ rustc int-loop.rs && ./int-loop
1
2
3

Then I decided to try using the same i = i+1 line from above:

fn main() {
    let array = [1, 2, 3];
    for i in array.iter() {
        i = i+1;
        println!("{}", i);
    }
}

Lesson One: Passing By Reference vs. Passing By Value

Compiling this, the Rust compiler hit me over the head with Computer Science!

$ rustc int-loop.rs && ./int-loop
error[E0271]: type mismatch resolving `<&i32 as std::ops::Add>::Output == &i32`
  --> int-loop.rs:4:14
   |
 4 |         i = i+1;
   |              ^ expected i32, found &i32
   |
   = note: expected type `i32`
              found type `&i32`

error: aborting due to previous error

For more information about this error, try `rustc --explain E0271`.

What in the world does this mean? I wrote a very simple line of code, and got a message straight out of type theory! The error type mismatch resolving `<&i32 as std::ops::Add>::Output == &i32` makes no sense to me at all.

I decided to take the compiler’s suggestion and run the explain command:

$ rustc --explain E0271
This is because of a type mismatch between the associated type of some
trait (e.g. `T::Bar`, where `T` implements `trait Quux { type Bar; }`)
and another type `U` that is required to be equal to `T::Bar`, but is not.
Examples follow.

The explain output continued for about two more pages, with examples that didn’t resemble my code at all. What is a trait? What is an associated type? I didn’t use any of these more advanced Rust concepts in my simple script. Maybe I needed a PhD in Computer Science just to try to use Rust?

Eventually, I figured it out. The key lines from the error message were:

 4 |         i = i+1;
   |              ^ expected i32, found &i32

Rust is telling me that iter() yielded references to integers, but my code expected an actual integer, not a reference to an integer. But what are references, exactly?

Running my code above, Ruby passed each integer from the array to my code as a simple value:

But Rust passed each integer from the array as a reference, or in other words as a pointer to the value itself:

In Ruby, of course, I didn’t have to worry about references, pointers or even types, so none of this came up. Or at least that’s what I thought at the time.

Lesson Two: Borrowed Values

Ah – according to the Rust compiler’s error message I just had to dereference the reference before using it. I changed i = i+1 to *i = *i+1:

fn main() {
    let array = [1, 2, 3];
    for i in array.iter() {
        *i = *i+1;
        println!("{}", i);
    }
}

Then Rust slapped me in the face again with more Computer Science:

$ rustc int-loop.rs && ./int-loop
error[E0594]: cannot assign to immutable borrowed content `*i`
  --> int-loop.rs:26:9
   |
26 |         *i = *i+1;
   |         ^^^^^^^^^ cannot borrow as mutable

error: aborting due to previous error

For more information about this error, try `rustc --explain E0594`.

Ugh. I guess that was a bad idea. What in the world happened here? I thought I had the dereferencing syntax correct, *i, the same syntax I’m used to from C. Actually Rust didn’t complain about types any more or about using a reference vs. a value. But what does “borrow as mutable” mean? And why doesn’t Rust let me do that?

Again, the problem here is that I don’t know enough Rust even to understand the compiler’s error messages. I need to take a few months off from my day job and read a book, or take a class. I need to understand Rust’s ownership model.

In Rust, every value is “owned” by the spot in my code where I allocate that value. In this example, the integers and the array that contains them are owned by the main function. When the main function goes out of scope, Rust frees the memory for that array automatically. In this diagram, the red arrow shows where Rust allocates the array (at the top), and where Rust frees it (at the bottom):

You can think of the red arrow as the “lifetime” of the array. When I pass a value from one spot to another, when I call a function or a closure, I can either “move” that value to the new function, or the function can “borrow” it. In this example, the call to iter() borrowed the elements inside the array, passing a reference to each element into the closure. The blue arrow in this diagram indicates that each element of the array, i, is a borrowed value inside the closure:

Lesson Three: Immutable vs. Mutable Values

But using borrowed values isn’t the problem here. The problem is that my code tries to change them, or mutate them:

*i = *i+1;

Because the value of i each time around the loop was an element of the array, and because iter() borrowed each element from the original array, the elements are marked as immutable, just as the array was. Or at least that’s how I understood the previous error message.

Back in the main function when I typed:

let array = [1, 2, 3];

…Rust created an immutable array of three integers. All variables in Rust are immutable by default. Because it was immutable, my code can’t change it.

Ah… so the fix is to mark my array as mutable:

fn main() {
    let mut array = [1, 2, 3];
    for i in array.iter() {
        *i = *i+1;
        println!("{}", i);
    }
}

Lesson Four: Declaring Side Effects

Running the Rust compiler again, I got the same error along with a new warning:

$ rustc int-loop.rs && ./int-loop
error[E0594]: cannot assign to immutable borrowed content `*i`
  --> int-loop.rs:14:9
   |
14 |         *i = *i+1;
   |         ^^^^^^^^^ cannot borrow as mutable

warning: variable does not need to be mutable
  --> int-loop.rs:12:9
   |
12 |     let mut array = [1, 2, 3];
   |         ----^^^^^
   |         |
   |         help: remove this `mut`
   |

Wait – so now Rust was telling me I shouldn’t add the mut keyword? That my last change was dead wrong? Why was it wrong? Probably I didn’t understand what “cannot borrow as mutable” really meant.

It took me a while to figure this out but eventually I ran into this great article which explained what I was doing wrong and how to fix it. I needed to use iter_mut instead of iter. iter_mut yields mutable references to the closure, while iter yields normal, immutable references.

That is, by calling iter_mut I’m declaring that the code inside of the closure might mutate the elements of the array. This is known as a side effect: as a side effect of the iteration, the code inside might also change the values of the collection it is iterating over. Rust forced me to declare that my code might change the array.

Running my program with iter_mut finally worked!

fn main() {
    let mut array = [1, 2, 3];
    for i in array.iter_mut() {
        *i = *i+1;
        println!("{}", i);
    }
}
$ rustc int-loop.rs && ./int-loop
2
3
4

What Rust Taught Me

My example today started out as a trivial, 4 line Ruby script. It was so simple, there really wasn’t anything that could possibly go wrong when I ran it. Then I added one simple line of code: i = i+1. When I added this to my Ruby script, it worked just fine.

As we saw, this line of code got the Rust compiler very angry. It slapped me in the face with four Computer Science lessons. I learned:

  • about passing values vs. passing references.
  • about mutable vs. immutable values.
  • about value ownership, lifetimes and borrowing values.
  • about side effects, and declaring them.

As you can see, the Rust compiler is an amazing tool you can use to learn more about Computer Science. The problem is that it’s hard to get along with. Compiling a Rust program will fail over and over again until your code is 100% correct. You need to have tremendous patience to use Rust, especially as a beginner.

Worse than that, the Rust compiler’s error messages are hard to understand, and easy to misinterpret. They can seem to be self-contradictory as we saw above. The Rust compiler assumes you already know what it is trying to teach you. Not only is Rust a violent teacher, it’s a bad one. If I knew that iter() borrowed immutable values, if I knew what “borrowing” and “immutable” even meant, then I likely wouldn’t have run into that compiler error in the first place.

And Rust’s confusing error messages led me in the wrong direction. In this example, I didn’t really want to mutate the array, I just wanted to print out the incremented values. I could have just incremented an intermediate value and left the original array alone. Instead, the complex error messages confused and misled me, and I never discovered this simpler code:

fn main() {
    let array = [1, 2, 3];
    for i in array.iter() {
        println!("{}", i+1);
    }
}

The Rust compiler is an amazing tool for learning; the problem is you need to have a deep understanding of the Rust language before you can use it effectively. Rust needs a --beginner option. Using this option on the command line would instruct the compiler to produce error messages designed for Rust learners, rather than Rust experts.

What Ruby Didn’t Tell Me

I had the opposite experience using Ruby. No confusing compiler errors; in fact, no compiler at all. No types, no need to worry about immutability or whether I’m passing references or values. Everything just worked.

Or did it? Because Ruby passed integers by value, the array in my original example wasn’t modified:

array = [1, 2, 3]
for i in array
  i = i+1
  puts i
end
puts "----"
p array
$ ruby int-loop.rb
2
3
4
----
[1, 2, 3]

This is probably a good thing. Side effects like mutating a collection while iterating over it can easily lead to bugs. Maybe code later in my program needed the original, unchanged values in that array? Maybe another thread was trying to use that collection at the same time?

The problem with using Ruby is that you don’t know what Ruby isn’t telling you. Because Ruby didn’t display any warnings or error messages when I added i = i+1 to my loop, I didn’t even think about any of these issues. Fortunately, Ruby didn’t modify the array so it wasn’t a problem.

But suppose my array contained strings and not integers:

array = ["one", "two", "three"]
for str in array
    str = str << "-mutated"
    puts str
end
puts "----"
p array
$ ruby string-loop.rb
one-mutated
two-mutated
three-mutated
----
["one-mutated", "two-mutated", "three-mutated"]

Now the array was mutated! It turns out Ruby passed integers to the closure by value, but strings by reference. Updating each string inside the loop also updated that string inside the array. Now my program will have bugs, unless the point of running that loop was to mutate the array, and not just to print it out.
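
A small addition of my own to make the difference concrete: the mutation comes from <<, which modifies the string object the array still holds; building a new string with + leaves the array untouched:

array = ["one", "two", "three"]
for str in array
  str = str + "-ok" # a new object assigned to the local variable; the array is untouched
end
p array #=> ["one", "two", "three"]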

4 months ago

Sarah Allen @ San Francisco, CA › United States - Oct 14

the path is made by walking

In 2009, when Sarah Mei and I started teaching free coding workshops for women, we didn’t expect to fix the industry, just our little corner of it.

We’re programmers. We solve problems by focusing on something concrete that can be built with the tools at hand. We focused on increasing diversity in the SF Ruby meetup. By teaching workshops, engaging the local tech companies and all of the people who wanted to help, we moved the needle. Later we expanded to include outreach to other demographics who are underrepresented in tech (which turns out to be most people).

Last week I spoke at a Bridge Foundry event where we announced a new industry partner program. In preparing for this announcement, I spoke to Amanda Cooper (@MandaCoop) on our advisory board. She framed what we do as “you make the road by walking it.”

There was no clear path, but we had ideas that we thought could work. We did the work to implement our ideas. We took a data-driven approach to measuring impact. We open-sourced our process and materials. In doing the work, we created a path that others could follow. Or more accurately, inspired others to help create the path by walking it with us.

Over the years, I’ve watched students become senior software developers. I’ve seen how volunteering at the workshops has caused some ex-programmers to decide to become software engineers again. It’s not all about more diverse software developers — we want everyone to be able to learn these tech skills, if they want to. Coding skills are applicable across many disciplines and can be useful to simply understand the technology that people use every day.

Most students and volunteers are working software developers, and we’re seeing some particular problems in the tech industry where we think we can help.

Lack of good tech jobs

The tech industry has a diversity problem that goes well beyond the “pipeline” problem that can be addressed with skills training. There seem to be few workplaces where there is real opportunity to succeed based on individual skill and potential.

I believe that most companies genuinely want to create workplaces where people with the right skills and capabilities can deliver business impact. This should be very aligned with business goals. Unfortunately systemic bias gets in the way. There are patterns that need to and can be changed. There are bugs in the system that need to be fixed in order to attract and retain diverse talent.

I see some companies where the environment seems to be different. I hear about companies who want to do better. Help create the path by walking it with some folks who have a lot of experience solving these kinds of challenges: join the Bridge Foundry Industry Partner Program.


XXIX

Traveler, there is no path.
The path is made by walking.

Traveller, the path is your tracks
And nothing more.
Traveller, there is no path
The path is made by walking.
By walking you make a path
And turning, you look back
At a way you will never tread again
Traveller, there is no road
Only wakes in the sea.

― Antonio Machado, Border of a Dream: Selected Poems

4 months ago

Maciej Mensfeld (Running with Rails) @ Kraków › Poland - Aug 18

Ubuntu 18.04 – Disable screen on lid close

In order to force your Ubuntu to just disable the screen on lid close, you need to do two things:

  1. Disable sleep (do nothing) on lid close
  2. Disable screen on lid close

Just follow all the steps from both sections and you should be fine.

Disable sleep (do nothing) on lid close

Copy-paste this into the terminal (as root):

# sudo su
echo 'HandleLidSwitch=ignore' | tee --append /etc/systemd/logind.conf
echo 'HandleLidSwitchDocked=ignore' | tee --append /etc/systemd/logind.conf
sudo service systemd-logind restart

Disable screen on lid close

Copy-paste this into the terminal (as root):

# sudo su
echo 'event=button/lid.*' | tee --append /etc/acpi/events/lm_lid
echo 'action=/etc/acpi/lid.sh' | tee --append /etc/acpi/events/lm_lid
touch /etc/acpi/lid.sh
chmod +x /etc/acpi/lid.sh

Edit the /etc/acpi/lid.sh file, paste the following content and replace your_username with your main user name:

#!/bin/bash

USER=your_username

# Lid closed: turn the display off
grep -q close /proc/acpi/button/lid/*/state

if [ $? = 0 ]; then
  su -c  "sleep 1 && xset -display :0.0 dpms force off" - $USER
fi

# Lid opened: turn the display back on
grep -q open /proc/acpi/button/lid/*/state

if [ $? = 0 ]; then
  su -c  "xset -display :0 dpms force on &> /tmp/screen.lid" - $USER
fi

The post Ubuntu 18.04 – Disable screen on lid close appeared first on Running with Ruby.

6 months ago

Tom Copeland (Junior Developer) @ Herndon, VA › United States - Aug 12

Invalid or incomplete POST parameters

It took me [insert large number here] years of Rails and Ruby but I finally saw this in my logs:

Invalid or incomplete POST parameters

But the parameters were fine! It was just an innocuous XML document coming in through the API! After some flailing - basically bisecting a request payload - I reproduced it with:

$ curl -si -X POST -d '%' http://localhost/ | head -1
HTTP/1.1 400 Bad Request

To cut to the chase, adding a content type header solves it. With text/xml, we get a garden-variety 404 error since there's no route for a POST to /:

$ curl -si -H 'content-type: text/xml' -X POST -d '%' http://localhost/ | head -1
HTTP/1.1 404 Not Found

So the root cause is that if there's no explicit content type on a request, Rack attempts to parse it as if it were application/x-www-form-urlencoded and % is invalid input in that case.

There are a couple of interesting elements to this one. As far as the exception is concerned, most blog posts and Stack Overflow questions around this error involve users mistyping URLs and putting in consecutive percent signs or something. So those are GET requests, not POSTs, and thus not immediately relevant. Also, the exception comes from Rack. So the logs won't have a stacktrace and the usual error reporting tools won't get a chance to show a good post-mortem for this.

Poking around in Rack gets us moving though. Here's a comment from lib/rack/request.rb:

# This method support both application/x-www-form-urlencoded and
# multipart/form-data.

So there's a reference to x-www-form-urlencoded, which refers to RFC 1738 for encoding, which explains why a percent sign, or a string like 15% of net would be invalid input.

Supposing you can't add a content-type header to the code that's making the requests? Perhaps the POST is coming from a third party. In that case, Rack middleware to the rescue. Nothing fancy, just:

class Rack::AddTheHeader
  def initialize(app)
    @app = app
  end

  def call(env)
    # Force a content type for this one endpoint so Rack doesn't try to
    # parse the body as application/x-www-form-urlencoded
    if env['PATH_INFO'] == "/some/path"
      env['CONTENT_TYPE'] = 'text/xml'
      Rack::Request.new(env).body.rewind
    end
    @app.call(env)
  end
end

This scopes the header addition to a specific URL space, which seems prudent. That's about it, hope this saves someone a few minutes!
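
For completeness, a sketch of how the middleware might be registered in a Rails app; the application name and file path here are placeholders, and the class above is assumed to be required from somewhere like lib/:

# config/application.rb, a sketch; "MyApp" is a placeholder
require_relative "../lib/rack/add_the_header"

module MyApp
  class Application < Rails::Application
    config.middleware.use Rack::AddTheHeader
  end
end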

6 months ago

Sarah Allen @ San Francisco, CA › United States - Feb 03

firebase auth migration from rails/devise

Migration to Firebase from a Heroku-hosted Rails app appears to work seamlessly. I’ve only tested one user, but I could log in with the same password and Facebook account with no end-user intervention.

It took a little experimentation to come up with the correct format for export from heroku psql command line:

\COPY (select id, email, CASE WHEN confirmed_at IS NULL THEN 'false' ELSE 'true' END as Verified, regexp_replace(encode(encrypted_password::bytea, 'base64'), '\n', '') as hash, password_salt::text, screen_name, '' as photo, '' as google_id, '' as google_email, '' as google_name, '' as google_photo, uid as facebook_id, provider_email as facebook_email, '' as fname, '' as fphoto, '' as twitter_id, '' as twitter_mail, '' as twitter_name, '' as twitter_photo, '' as github_id, '' as github_mail, '' as github_name, '' as github_photo, EXTRACT(EPOCH FROM created_at)::bigint, EXTRACT(EPOCH FROM last_sign_in_at)::bigint, '' as phone FROM speakers ORDER BY id limit 160) TO '~/users.csv' WITH (FORMAT csv, DELIMITER ',');

Which I imported with the following command:

firebase auth:import ~/users.csv     \
    --hash-algo=BCRYPT               \
    --rounds=10

Check your devise config file config/initializers/devise.rb — encryption options are configurable there.
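
For example, the bcrypt cost lives there as config.stretches; a minimal sketch of my own, assuming a cost of 10 to match the --rounds value in the import command above:

# config/initializers/devise.rb (sketch): the bcrypt cost factor;
# the --rounds value passed to `firebase auth:import` should match it
Devise.setup do |config|
  config.stretches = 10
end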

Additional Notes

I found these samples helpful to get a test running very quickly.

Firebase auth import requires exactly 26 columns (for csv import):

UID,Email,Email Verified,Password Hash,Password Salt,Name,Photo URL,Google ID,Google Email,Google Display Name,Google Photo URL,Facebook ID,Facebook Email,Facebook Display Name,Facebook Photo URL,Twitter ID,Twitter Email,Twitter Display Name,Twitter Photo URL,GitHub ID,GitHub Email,GitHub Display Name,GitHub Photo URL,User Creation Time,Last Sign-In Time,Phone Number
18 days ago

Sarah Allen @ San Francisco, CA › United States - Jan 18

golang philosophy

In learning a new programming language, it’s helpful to understand its philosophy. I seek to learn patterns that are idiomatic, and more importantly: why the syntax is the way it is. This allows me to write original code more quickly, gaining an intuition for simple things like when to look for a library and when to just write code.

I rarely find good resources for learning a new language that are targeted at experienced programmers. So I’ve developed a habit of looking for language koans. Inspired by Ruby Koans, these are unit tests which guide a programmer through basic language constructs by presenting a failing test and letting you write simple code to learn the syntax of the language. These tests typically include a bit of text that helps newcomers reflect on what is special and interesting about this particular programming language.

In learning Go, I found cdarwin/go-koans, which helped me to reflect on the philosophy of golang, the Go programming language.

The koans caused me to meditate on the basics, leading me to read more and reflect. While about_basics.go is quick to solve technically, it sparked my curiosity on two points.

1. The uninitialized variable

I really wanted the comments in the go-koans to be a bit more like Zen koans (or Ruby koans), so I wrote these:

// listen to the darkness of an unset variable
// what is the code that is not written?
// consider the emptiness of a string

// create meaning from emptiness
// undefined structure isn't

“Make the zero value useful” —Go Proverbs

It reminds me of the Zen teacup parable. An empty cup has utility, even before it is filled.

2. The implications of a string

One of the most deceptively simple types in modern programming languages is the string. In Go, there is a built-in string type with short, unsatisfying descriptive text.

Strings, bytes, runes and characters in Go explains that strings are a read-only slice of bytes (at runtime). Go source code is UTF-8, so string literals always contain UTF-8 text (except for byte-level escapes).

Strings always cause me to reflect on how memory management works in a language. In my search for basic answers about how and when memory happens in string operations, I read about allocation efficiency in high-performance Go services which includes a nice explanation of heap vs stack memory allocation in Go.

Reflections

At this point, I don’t know what I need to know about this new programming language. I just like to know what the code I’m typing actually does. Learning syntax is boring, so I need to occupy my mind with something more interesting while I practice typing unfamiliar sequences of text. To write good code, I need to know so much more than the syntax, but I need to be careful not to get too attached to certain details. For example, future compiler versions may change how code is transformed into machine operations. However, if I attach just a little deeper meaning to these syntax constructs and get a feel for what my code ends up doing under the hood, I can more quickly understand the implications of the code I write.

When I emerge from these learning meditations and I can finally construct this new syntax without thinking and start to solve actual problems that matter to humans, then I will have created these little trails in my mind that lead to empty spaces, which have shape and meaning, like the Go zero value and the Zen teacup.

1 months ago

Maciej Mensfeld (Running with Rails) @ Kraków › Poland - Jan 09

Exploring a critical Net::Protocol issue in Ruby 2.6.0p0 and how it can lead to a security problem

TL;DR

This bug has been fixed in 2.6.1. Please upgrade and all should be good.

If you do any HTTP communication (HTTP requests, Elasticsearch, etc.), do not upgrade to 2.6.0p0, or apply the patch below as soon as possible.

Ruby is eating up characters when pushed over HTTP

Ruby 2.6.0 was released not long ago. Unfortunately, not many are aware of a major bug that was introduced with this release.

This bug can affect you in many ways, some of which you may not even be aware. All may run well up until you decide to send a particular type of payload and then, things will get interesting.

What am I talking about?

This. What does it even mean? Well, in the best-case scenario it means that you will end up having a critical error like so:


Net::HTTP.post(URI('http://httpbin.org/post'), 'あ'*100_000)
Traceback (most recent call last):
       16: from /net/http.rb:502:in `block in post'
       15: from /net/http.rb:1281:in `post'
       14: from /net/http.rb:1493:in `send_entity'
       13: from /net/http.rb:1479:in `request'
       12: from /net/http.rb:1506:in `transport_request'
       11: from /net/http.rb:1506:in `catch'
       10: from /net/http.rb:1507:in `block in transport_request'
        9: from /net/http/generic_request.rb:123:in `exec'
        8: from /net/http/generic_request.rb:189:in `send_request_with_body'
        7: from /net/protocol.rb:247:in `write'
        6: from /net/protocol.rb:265:in `writing'
        5: from /net/protocol.rb:248:in `block in write'
        4: from /net/protocol.rb:275:in `write0'
        3: from /net/protocol.rb:275:in `each_with_index'
        2: from /net/protocol.rb:275:in `each'
        1: from /net/protocol.rb:280:in `block in write0'

However, there’s a much more interesting case that you can encounter. You can end up sending data that will be trimmed in a way that will make your server receive incomplete yet valid information.

That is not a security issue per se but can be a massive problem if you use your format as a protocol between some internal services.

Sidenote: bjeanes reported on GitHub that this bug can also corrupt JSON in a way that will make it parseable but incorrect with regard to the data it contains.

Set HTTP API as a POC of this bug

To illustrate how this bug can become problematic and hard to debug, let’s build an HTTP based API that implements basic set operations via the web.

Some assumptions for the sake of simplicity:

  • we always send data in the following format: DATA,COMMAND;
  • we have three commands: GET, ADD and DEL;
  • to save a couple of bytes, when no command is provided as a second argument, we run an ADD command;

This is how our abstract API could work:

client = Api.new
client.get #=> []
client.add('12313131') #=> ['12313131']
client.add('msg') #=> ['12313131', 'msg']
client.del('msg') #=> ['12313131']

A set API server implementation

The implementation of such an API server will just take us a couple of lines in Ruby:

require 'webrick'
require 'set'

server = WEBrick::HTTPServer.new(Port: 3000)
set = Set.new

server.mount_proc '/' do |req, res|
  data, action = req.body.split(',')
  action ||= 'ADD'

  # Return set data for any command, no need to handle GET
  case action
  when 'ADD'
    set.add(data)
  when 'DEL'
    set.delete(data)
  end

  res.body = set.to_a.to_s
end

trap('INT') { server.shutdown }

server.start

You can start it by running:

ruby server.rb
[2019-01-09 22:38:58] INFO  WEBrick 1.4.2
[2019-01-09 22:38:58] INFO  ruby 2.6.0 (2018-12-25)
[2019-01-09 22:38:58] INFO  WEBrick::HTTPServer

A set API client implementation

The client is not much more complicated:

require 'net/http'

class Api
  HOST = 'localhost'
  PORT = 3000

  def initialize
    @http = Net::HTTP.new(HOST, PORT)
  end

  def get
    request nil, 'GET'
  end

  def add(data)
    request data, 'ADD'
  end

  def del(data)
    request data, 'DEL'
  end

  private

  def request(data, cmd)
    Net::HTTP::Post
      .new('/', 'Content-Type': 'application/json')
      .tap { |req| req.body = "#{data},#{cmd}" }
      .yield_self(&@http.method(:request))
      .yield_self(&:body)
  end
end

client = Api.new
client.get
client.add('12313131')
client.add('msg')
client.del('msg')

When executed, we end up with exactly what we wanted to achieve:

puts client.get #=> []
puts client.add('12313131') #=> ['12313131']
puts client.add('msg') #=> ['12313131', 'msg']
puts client.del('msg') #=> ['12313131']

Risk of an uncompleted payload

So far so good. We have an excellent API that we can use for storing anything we want. And here the magic starts.

We decide to store some analytics results, that are used by other APIs to grant access to some super essential and expensive business information™.

It doesn’t matter what the results are. All we need to know, from our perspective, is that they will fit into memory. So, we hand out our API client code to other developers; we run our server and… in the middle of the night the phone rings:

Data that is supposed to be deleted is still available. We constantly run the DEL command but nothing disappears! We need to revoke all the access ASAP!

How can it be!? This service has been running for months now, and everything was good. There was a recent update in Ruby, but even after that the specs were passing and the service had been running for at least two weeks.

And this is the moment when this bug presents itself in all its glory. For a big enough payload, Ruby trims the data that is being sent, and unfortunately for us it trims the last three letters, that is, the full DEL command. When we run an ADD and a DEL on a given string, we expect it not to be in the results anymore, however…

Note: the dots from the payload below aren’t usual dots but Unicode middle dots – that is important.

PAYLOAD_SIZE = 8_301_500
data = 'a' * PAYLOAD_SIZE + '···'

client = Api.new
client.get
client.add(data)
client.del(data)
puts client.get #=> ["aaaaaaaaaaaa...\xC2\xB7\xC2\xB7\xC2\xB7"]

The data is still there! Because the data contains multibyte characters, the payload got trimmed, and we’ve ended up with an indirect GET operation (DATA,) instead of a DEL. We had three multibyte characters in the data, and because of that, Ruby removed the last three characters from the string before sending it to the server.
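
A small illustration of my own of the character/byte mismatch at play: each middle dot is two bytes in UTF-8, so the character count and the byte count of the payload diverge by exactly three, and chopping that many bytes off the end of the request body removes the trailing DEL:

payload = 'a' * 5 + '···'   # three Unicode middle dots
payload.length              #=> 8 characters
payload.bytesize            #=> 11 bytes (each middle dot is 2 bytes in UTF-8)

body = "#{payload},DEL"
body.byteslice(0, body.bytesize - 3) #=> "aaaaa···," (the DEL command is gone)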

Patching things up

As a temporary patch, you can use body_stream combined with Ruby’s StringIO instead of body:

Net::HTTP::Post
  .new('/', 'Content-Type': 'application/json')
  .tap { |req| req.body_stream = StringIO.new(operation) }
  .tap { |req| req.content_length = operation.bytesize }
  .yield_self(&@http.method(:request))
  .yield_self(&:body)

or, if you use Faraday, you can just apply the following patch:

module NetHttpFaradayPatch
  def create_request(env)
    super.tap do |request|
      if env[:body].respond_to?(:read)
        request.content_length = env[:body].size
      end
    end
  end
end

Faraday::Adapter::NetHttp.prepend(NetHttpFaradayPatch)

Here’s the proper fix, which has since been released as part of Ruby 2.6.1.

Summary

It’s a rather unpleasant bug. Ruby 2.6.1 has been released and it fixes the issue. If you cannot upgrade yet and my patches work for you, that’s great, but otherwise I would advise you to downgrade to Ruby 2.5.3. It’s hard to be sure that there aren’t other scenarios in which this bug may become even more problematic.


Cover photo by theilr, licensed under Attribution-ShareAlike 2.0 Generic (CC BY-SA 2.0).

The post Exploring a critical Net::Protocol issue in Ruby 2.6.0p0 and how it can lead to a security problem appeared first on Running with Ruby.

1 months ago

Plataformatec @ São Paulo › Brazil - Jan 04

Building a new MySQL adapter for Ecto Part IV: Ecto Integration

Welcome to the “Building a new MySQL adapter for Ecto” series:

  • Part I: Hello World
  • Part II: Encoding/Decoding
  • Part III: DBConnection Integration
  • Part IV: Ecto Integration (you’re here!)

After DBConnection integration we have a driver that should be usable on its own. The next step is to integrate it with Ecto so that we can:

  • leverage Ecto (doh!), meaning, among other things: using changesets to cast and validate data before inserting it into the DB, composing queries instead of concatenating SQL strings, defining schemas that map DB data into Elixir structs, being able to run Mix tasks like mix ecto.create and mix ecto.migrate, and finally using the Ecto SQL Sandbox to manage a clean slate between tests
  • tap into the greater Ecto ecosystem: integration with the Phoenix Web framework, various pagination libraries, custom types, admin builders, etc.

Ecto Adapter

If you ever worked with Ecto, you’ve seen code like:

defmodule MyApp.Repo do
  use Ecto.Repo,
    adapter: Ecto.Adapters.MySQL,
    otp_app: :my_app
end

The adapter is a module that implements Ecto Adapter specifications:

Adapters are required to implement at least Ecto.Adapter behaviour. The remaining behaviours are optional as some data stores don’t support transactions or creating/dropping the storage (e.g. some cloud services).

There’s also a separate Ecto SQL project which ships with its own set of adapter specifications on top of the ones from Ecto. Conveniently, it also includes an Ecto.Adapters.SQL module that we can use, which implements most of the callbacks and lets us worry mostly about generating appropriate SQL.

Ecto SQL Adapter

Let’s try using the Ecto.Adapters.SQL module:

defmodule MyXQL.EctoAdapter do
  use Ecto.Adapters.SQL,
    driver: :myxql,
    migration_lock: "FOR UPDATE"
end

When we compile it, we’ll get a bunch of warnings as we haven’t implemented any of the callbacks yet.

warning: function supports_ddl_transaction?/0 required by behaviour Ecto.Adapter.Migration is not implemented (in module MyXQL.EctoAdapter)
  lib/a.ex:1

warning: function MyXQL.EctoAdapter.Connection.all/1 is undefined (module MyXQL.EctoAdapter.Connection is not available)
  lib/a.ex:2

warning: function MyXQL.EctoAdapter.Connection.delete/4 is undefined (module MyXQL.EctoAdapter.Connection is not available)
  lib/a.ex:2

(...)

Notably, we get a module MyXQL.EctoAdapter.Connection is not available warning. The SQL adapter specification requires us to implement a separate connection module (see Ecto.Adapters.SQL.Connection behaviour) which will leverage, you guessed it, DBConnection. Let’s try that now and implement a couple of callbacks:

defmodule MyXQL.EctoAdapter.Connection do
  @moduledoc false
  @behaviour Ecto.Adapters.SQL.Connection

  @impl true
  def child_spec(opts) do
    MyXQL.child_spec(opts)
  end

  @impl true
  def prepare_execute(conn, name, sql, params, opts) do
    MyXQL.prepare_execute(conn, name, sql, params, opts)
  end
end

Since we’ve leveraged DBConnection in the MyXQL driver, these functions simply delegate to the driver. Let’s implement something a little bit more interesting.

Did you ever wonder how Ecto.Changeset.unique_constraint/3 is able to transform a SQL constraint violation failure into a changeset error? It turns out that unique_constraint/3 keeps a mapping between the unique key constraint name and the fields these errors should be reported on. The code that makes it work is executed in the repo and the adapter when the structs are persisted. In particular, the adapter should implement the Ecto.Adapters.SQL.Connection.to_constraints/1 callback. Let’s take a look:

iex> b Ecto.Adapters.SQL.Connection.to_constraints
@callback to_constraints(exception :: Exception.t()) :: Keyword.t()

Receives the exception returned by c:query/4.

The constraints are in the keyword list and must return the constraint type,
like :unique, and the constraint name as a string, for example:

    [unique: "posts_title_index"]

Must return an empty list if the error does not come from any constraint.

Let’s see how the constraint violation error looks exactly:

$ mysql -u root myxql_test
mysql> CREATE TABLE uniques (x INTEGER UNIQUE);
Query OK, 0 rows affected (0.17 sec)

mysql> INSERT INTO uniques VALUES (1);
Query OK, 1 row affected (0.08 sec)

mysql> INSERT INTO uniques VALUES (1);
ERROR 1062 (23000): Duplicate entry '1' for key 'x'

MySQL responds with error code 1062. We can look further into the error by using the perror command-line utility that ships with the MySQL installation:

% perror 1062
MySQL error code 1062 (ER_DUP_ENTRY): Duplicate entry '%-.192s' for key %d

Ok, let’s finally implement the callback:

defmodule MyXQL.EctoAdapter.Connection do
  # ...

  @impl true
  def to_constraints(%MyXQL.Error{mysql: %{code: 1062}, message: message}) do
    case :binary.split(message, " for key ") do
      [_, quoted] -> [unique: strip_quotes(quoted)]
      _ -> []
    end
  end
end

Let’s break this down. We expect that the driver raises an exception struct on constraint violation; we then match on the particular error code, extract the field name from the error message, and return that as a keyword list.

(To make this more understandable, in the MyXQL project we’ve added error code/name mapping so we pattern match like this instead: mysql: %{code: :ER_DUP_ENTRY}.)

To get a feeling of what other subtle changes we may have between data stores, let’s implement one more callback, back in the MyXQL.EctoAdapter module.

While MySQL has a BOOLEAN type, it turns out it’s simply an alias to TINYINT and its possible values are 1 and 0. These sorts of discrepancies are handled by the dumpers/2 and loaders/2 callbacks; let’s implement the latter:

defmodule MyXQL.EctoAdapter do
  # ...

  @impl true
  def loaders(:boolean, type), do: [&bool_decode/1, type]
  # ...
  def loaders(_, type),        do: [type]

  defp bool_decode(<<0>>), do: {:ok, false}
  defp bool_decode(<<1>>), do: {:ok, true}
  defp bool_decode(0), do: {:ok, false}
  defp bool_decode(1), do: {:ok, true}
  defp bool_decode(other), do: {:ok, other}
end

Integration Tests

As you can see, there can be quite a few discrepancies between adapters and data stores. For this reason, besides providing adapter specifications, Ecto ships with integration tests that can be re-used by adapter libraries.

Here’s a set of basic integration test cases and support files in Ecto, see: ./integration_test/ directory.

And here’s an example of how a separate package might leverage these. It turns out that ecto_sql uses Ecto’s integration tests:

# ecto_sql/integration_test/mysql/all_test.exs
ecto = Mix.Project.deps_paths[:ecto]
Code.require_file "#{ecto}/integration_test/cases/assoc.exs", __DIR__
Code.require_file "#{ecto}/integration_test/cases/interval.exs", __DIR__
# ...

and has a few of its own.

When implementing a 3rd-party SQL adapter for Ecto we already have a lot of integration tests to run against!

Conclusion

In this article we have briefly looked at integrating our driver with Ecto and Ecto SQL.

Ecto helps with the integration by providing:

  • adapter specifications
  • a Ecto.Adapters.SQL module that we can use to build adapters for relational databases even faster
  • integration tests

We’re also concluding our adapter series. Some of the overarching themes were:

  • separation of concerns: we’ve built our protocol packet encoding/decoding layer stateless and separate from the process model, which in turn made DBConnection integration more straightforward and the resulting codebase easier to understand. Ecto also exhibits a separation of concerns: not only do we have separate changeset, repo, adapter etc. modules, within the adapter we have different aspects of talking to data stores like storage, transactions, connection etc.
  • behaviours, behaviours, behaviours! Not only do behaviours provide a well-thought-out way of organizing the code as contracts; as long as we adhere to those contracts, features like DBConnection resilience and access to Ecto tooling and the greater ecosystem become available.

As this article is being published, we’re getting closer to shipping MyXQL’s first release as well as making it the default MySQL adapter in upcoming Ecto v3.1. You can see the progress on elixir-ecto/ecto_sql#66.

Happy coding!

2 months ago

Sam Saffron @ Sydney › Australia - Jan 02

Logster and our error logging strategy at Discourse

I have always been somewhat fascinated with logs. I tend to see the warning and error logs in production as a valuable heartbeat of an application. Proper handling of error logs is a very strong complement to a robust test suite. It shows us what really happens when real world data meets our application.

9 years ago, at Stack Overflow we had a daily ritual where we would open up our fork of ELMAH every morning and fish through our logs for problems. This had a dramatic positive effect on Stack Overflow.

Almost 7 years into our journey building Discourse, every single week we find and fix issues in our application thanks to our error logs and Logster. Error logs are the pulse of our application, they let us know immediately if there are any urgent issues and where. Since we host more than 1500 sites running many different code branches, we needed to evolve a sane and robust set of practices and tools.

Top level structure of logging and monitoring at Discourse

We have lots of logs at Discourse and many systems for dealing with them.

  • We keep raw Docker, Postgres, Redis, NGINX, Rails, HAProxy and other logs in Elasticsearch and use Kibana for business intelligence.

  • We have a monitoring system built on alertmanager and Prometheus, with business intelligence in Grafana and alert escalation in our internal Discourse instance and opsgenie.

  • We have Logster, which we use for web application (aka “Rails / Sidekiq”) warnings and errors.

I would like to focus on Logster and our Rails / Sidekiq portion for this blog post, but I think it is worth mentioning the other mechanisms because I don’t want people to think we are not good data hoarders and only have very limited visibility into our systems.

About Logster

At Discourse we developed a log viewer called logster.

Logster is a free and open source tool you can embed into any Ruby on Rails or Rack application in production and development. It runs as Rack middleware and uses Redis as its backend for log storage and analysis.

It operates in two different modes:

  • In production mode it aggregates similar errors by fingerprinting backtraces while listening for warning, error and fatal messages. The intention is to display a list of open application problems that can somehow be resolved.

  • In development mode it provides a full fire-hose of all logs produced by Rails (debug and up). This has significant advantages over the console, as you have proper access to backtraces for every log line.

Here are a few screenshots from logs on this very blog (accessible to admins at https://discuss.samsaffron.com/logs):

Each error log has a full backtrace

Web requests have extensive environment info, including path, ip address and user agent.

Logster has accumulated a large amount of very useful features over the years, including:

  • The ability to suppress errors from the logs until the application is upgraded. (The solve button)

  • The ability to protect certain log messages so they are not purged when clear all is clicked.

  • Advanced filtering, including regex and reverse regex search

  • Custom environment (ability to tag current thread with arbitrary metadata)

  • JavaScript error and backtrace support

  • Rich API allowing you to suppress patterns, ship errors from other instances, integrate automatically into Rails and so on.

The Logster project is still very much alive; recently our part-time developer Osama added a mobile view and upgraded the Ember frontend to the latest Ember. We have many exciting new features planned for 2019!

Giving up on tail -f logs/development.log

I do not remember the last time I tailed logs in development. There are a few reasons this does not happen anymore.

  • Most of the time when building stuff I use TDD, using our rake autospec tool. I will focus on one broken test. Every time I save a file it automatically triggers the test to re-run; if I need extra diagnostics I sprinkle puts statements.

  • If I am dealing with a specific error on a page I often find working with better_errors far more effective than reading logs.

  • If I need access to logs I will always prefer using logster in development. It allows me to filter using a text pattern or log level which is a huge time saver. It also provides information that is completely absent from the Rails logs on a per-line basis (environment and backtrace).

I sprinkled Rails.logger.warn("someone called featured users, I wonder who?") and filtered on “featured”


Death by 10,000 log messages in production

Logster attempts to provide some shielding against log floods by grouping based on stack traces. That said, we must be very diligent to keep our logs “under control”.

For the purpose of our Logster application logs usage, we like to keep the screens focused on “actionable” errors and warnings. Many errors and warnings that get logged by default have no action we can take to resolve them. We can deal with these elsewhere (offending IPs can be blocked after N requests and so on).

Here is a non-exhaustive list of “errors” that we really have no way of dealing with, so they do not belong in Logster:

  • A rogue IP making a web request with corrupt parameter encoding

  • A 404 to index.php which we really do not care about

  • Rate limiting … for example a user posting too fast or liking too fast

  • Rogue users making requests with unknown HTTP verbs

Another interesting point about our use of Logster is that not all errors that float into our logs mean that we have a broken line of code in our application that needs fixing. In some cases a backup redis or db server can be broken so we will log that fact. In some cases there is data corruption that the application can pick up and log. Sometimes transactions can deadlock.

Keeping our Logster logs useful is extremely important. If we ignore non-actionable errors for long enough, we can end up with a useless error log where all we have is noise.

Proactively logging issues

Given that we have a high-visibility place to look at errors, we will sometimes use our error logs to proactively report problems before a disaster hits.

In this case we are watching our “defer” queue, which is a special thread we have for light-weight jobs that run between requests on our web workers in a background thread. We need this queue to be serviced quickly; if it is taking longer than 30 seconds per job, we have a problem… but not necessarily a disaster. By reporting this early we can correct issues in the job queue early, rather than dealing with the much more complex task of debugging “queue starvation” way down the line (which we also monitor for).
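
A minimal sketch of the idea (the helper name and threshold are illustrative, not Discourse’s actual code): time the work and warn, rather than fail, when it runs long.

def run_deferred_job
  sleep 0.1 # placeholder for the actual job body
end

started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
run_deferred_job
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
Rails.logger.warn("Defer queue job took #{elapsed.round(1)}s") if elapsed > 30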

The logs hot potato game :potato:

Half a year ago or so we introduced a fantastic game within our development team. The idea is very simple: every developer attempts to correct an issue raised in our error logs and then assigns the game to the next person on the list.

We attempted many other patterns in the past, including:

  • Having our internal Discourse instance raise a big warning when too many errors are in the logs (which we still use)

  • Having “log parties” where a single team member triages the logs and assigns issues from the logs to other team members.

  • Having arbitrary triage and assign.

The “logs game” has proven the most effective at resolving a significant number of issues while keeping the entire team engaged.

We structure the game by having a dedicated Discourse topic in our internal instance with a list of names.

When we resolve issues based on log messages we share the resolution with the team. That way as the game progresses more people learn how to play it and more people learn about our application.

Once resolved, the team member hands the torch to the next person on the list. And so it goes.

This helps all of us get a holistic picture of our system, if logs are complaining that our backup redis instance can not be contacted, this may be a provisioning bug that needed fixing. For the purpose of the “logs game” fixing system issues is also completely legitimate, even though no line of code was committed to Discourse to fix it.

Should my Ruby web app be using Logster?

There are many other products for dealing with errors in production. When we started at Discourse we used Errbit; these days you have many other options such as Sentry, Airbrake or Raygun.

One big advantage Logster has is that it can be embedded so you get to use the same tool in development and production with a very simple setup. Once you add it to your Gemfile you are seconds away from accessing logs at /logs.
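
A minimal sketch of that setup (Logster keeps its log data in Redis, so both gems are needed):

# Gemfile
gem 'redis'
gem 'logster'

In a Rails app the log viewer then becomes available at /logs; see the Logster README for the plain Rack setup.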

On the other hand, the for-pay dedicated tools out there have full-time development teams building them, with hundreds of amazing features.

Logster is designed so it can work side-by-side with other tools, if you find you need other features you could always add an additional error reporter (or submit a PR to Logster).

Regardless of what you end up choosing, I recommend you choose something; there is enormous value in regular audits of errors and better visibility of the real world problems your customers are facing.

2 months ago

François Lamontagne (Ruby Fleebie) @ Trois-Rivières, QC › Canada - Dec 14

Deploy your Rails applications like a pro with Dokku and DigitalOcean

UPDATE December 14th, 2018.

This tutorial has been updated to target Dokku version 0.12.13.

After reading this tutorial, you will be able to:

  • Create your own server (droplet) on the cloud using the DigitalOcean cloud architecture. (I will also share with you a link that will give you $10 credit at DigitalOcean).
  • Install your first DOKKU plugin. In this case, a Postgresql database plugin
  • Automate your database migrations using the app.json manifest file
  • Create a swap file to prevent memory issues when using the cheapest droplet type (1 GB)
  • Setup zero downtime deployments using the CHECKS feature
  • NEW: Remove unused containers to make sure there is always enough space on your droplet

I’ve tested each step of this tutorial multiple times so you should not run into any issues. If you do however, please leave me a comment at the end of this post and we will sort it out together!


Heroku has become the standard to host Ruby On Rails web applications. It is understandable because Heroku has such a great infrastructure. Deploying is a matter of typing “git push heroku master” and you’re pretty much done!

The thing is, if you are part of a small development team or you are a freelancer, the cost of using Heroku for all your clients / projects might become a real issue for you. This is where Dokku comes in! But what is Dokku?

The description on the Dokku home page is pretty self-explanatory:

The smallest PaaS implementation you’ve ever seen. Docker powered mini-Heroku in around 200 lines of Bash

So, there you have it. A “mini-heroku” that you can self-host or, better perhaps, use on an affordable cloud infrastructure such as DigitalOcean (use that previous link to get a $10 credit). Small teams and freelancers can now deploy like the pros at a fraction of the cost. Follow this tutorial and soon, you too, will be able to deploy your Rails apps simply by typing: git push dokku master. How neat is that? Sure you will have some configuring to do, but the overall process is not that complicated. This tutorial will show you how to get there.

Are you ready for the tutorial…?

DigitalOcean

First, create the droplet on DigitalOcean.

Then you have to choose the size of the droplet. Let’s choose the cheapest option (Small teams and freelancers love cheap options. We’re broke!)

Choose your image! Don’t miss this step, it’s very important. Don’t choose a Rails preset or a Ubuntu image. Remember, we want Dokku!

Add your ssh key(s) for more secure access to your droplet.
SSH Keys
Then select the number of droplets to create and choose a hostname
Choose an hostname
Finally, click on the “Create” button and wait until your droplet is fully created!
Waiting, I hate waiting...
The DigitalOcean part is done. Now we have to make sure we can log in to our droplet.

Connect to our droplet via SSH

Open a terminal window and connect to your droplet, like this:

ssh root@your-droplet-ip

Make sure the Dokku user can connect using your SSH key as well

When you deploy your app with git, the “dokku” user will be used instead of root, so you need to make sure that this user can connect to your droplet. I’m not sure if this is supposed to be configured automatically when you create your droplet, but it didn’t work for me. Have a look at the file located at /home/dokku/.ssh/authorized_keys (on your droplet). If it’s empty like it was for me, run this command:

cat /root/.ssh/authorized_keys | sshcommand acl-add dokku dokku

Add a swap file!

Since we chose the cheapest option (1 GB), we might run into memory problems when we deploy our Rails application: Rails asset compilation will make your deploy fail. Don’t worry though, your web application will still be running smoothly. What’s the solution if we are determined to use our cheap 1 GB droplet? Simple, we just add a swap file, as explained in this StackOverflow answer. What follows is (almost) an exact copy of that answer.
To see if you already have a swap file:

sudo swapon -s

No swap file shown? Check how much disk space you have:

df

To create a swap file:
Step 1: Allocate a file for swap

sudo fallocate -l 2048m /mnt/swap_file.swap

Step 2: Change permission

sudo chmod 600 /mnt/swap_file.swap

Step 3: Format the file for swapping device

sudo mkswap /mnt/swap_file.swap

Step 4: Enable the swap

sudo swapon /mnt/swap_file.swap

Step 5: Make sure the swap is mounted when you reboot. First, open fstab:

sudo nano /etc/fstab

Finally, add an entry to fstab (only if it wasn’t added automatically):

/mnt/swap_file.swap none swap sw 0 0

Great, now we have our swap file. What’s next?

Create our application in Dokku

If you type the dokku command, the list of commands for dokku will be displayed on the screen. You should study it as it is very instructive, but for now we will simply use the dokku apps:create command to create our application.

dokku apps:create myapp

This will create a container for your new app.

Database? Sure, let’s use Postgres

To interact with a Postgres database on Dokku, you need to use a plugin. Update, December 2018: I’ve now changed the Postgres plugin I use, since the old one no longer appears to be in active development.

dokku plugin:install https://github.com/dokku/dokku-postgres.git postgres

Once it’s installed, feel free to type dokku postgres to see all available commands. Let’s create our database:

dokku postgres:create myapp

A new service called myapp has now been created. The next step is to link it to our application which happens to have the same name.

dokku postgres:link myapp myapp

Done! If you look at the output of this command, you will notice that an environment variable called DATABASE_URL has been configured. This will be your connection string to access your postgres database from your Rails app.

Speaking of environment variables…

Thanks to Ariff in the comments for asking questions about environment variables. The following section is a recap of what was discussed in the comments.
To configure a new environment variable for a given application, you do the following:

dokku config:set myapp SOME_SECRET_VAR='hello'

Note that you don’t have to manually set the SECRET_KEY_BASE environment variable which is used in the secrets.yml file of your Rails application. This is because the ruby buildpack already does this for you. As you can see in the source code, SECRET_KEY_BASE is set to a randomly generated key (have a look at the setup_profiled and app_secret methods).
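To illustrate how that variable then shows up on the application side, here is a minimal sketch (the initializer file and constant name are hypothetical, reusing the SOME_SECRET_VAR example above):

# config/initializers/some_secret.rb (hypothetical file name)
# Reads the variable we set with: dokku config:set myapp SOME_SECRET_VAR='hello'
# ENV.fetch raises a KeyError at boot if the variable is missing,
# which surfaces configuration mistakes early.
SOME_SECRET = ENV.fetch('SOME_SECRET_VAR')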

Create our Rails app locally

Switch to your local workstation and create a new rails app.

  rails new myapp
  cd myapp
  git init .

Add a git remote to your Dokku application

   git remote add dokku dokku@your-droplet-ip:myapp

 

Open your database.yml and add your Dokku environment variable:

#...
production:
  adapter: postgresql
  url: <%= ENV['DATABASE_URL'] %> #This is the environment variable created by our Dokku command earlier
  encoding: unicode
  pool: 5

Off topic: Why not take this opportunity to use environment variables for all your secrets?

As for the Gemfile, make sure it has the following lines:

ruby '2.5.1' #or any other ruby version
gem 'rails_12factor', group: :production #rails library tuned to run smoothly on Heroku/Dokku cloud infrastructures
gem 'pg' #postgres gem
#...

We will also create a default controller to have somewhat of a functioning application. On your local workstation, run:

./bin/rails g controller static_pages

Create a new file named home.html.erb in app/views/static_pages and add the following:

<p>Hello world!</p>

In routes.rb, add:

root 'static_pages#home'

Are you ready? Run bundle install, commit everything then type:

git push dokku master

If you did everything correctly, you should see something like this after you push to dokku (I edited the output to keep it brief).

-----> Discovering process types
 Default types for -> worker, rake, console, web
-----> Releasing myapp (dokku/myapp:latest)...
-----> Deploying myapp (dokku/myapp:latest)...
-----> Attempting to run scripts.dokku.predeploy from app.json (if defined)
-----> App Procfile file found (/home/dokku/myapp/DOKKU_PROCFILE)
-----> DOKKU_SCALE file found (/home/dokku/myapp/DOKKU_SCALE)
=====> console=0
=====> rake=0
=====> web=1
=====> worker=0
-----> Attempting pre-flight checks
 For more efficient zero downtime deployments, create a file CHECKS.
 See http://dokku.viewdocs.io/dokku/deployment/zero-downtime-deploys/ for examples
 CHECKS file not found in container: Running simple container check...
-----> Waiting for 10 seconds ...
-----> Default container check successful!
-----> Running post-deploy
-----> Configuring myapp.myapp...(using built-in template)
-----> Creating http nginx.conf
-----> Running nginx-pre-reload
 Reloading nginx
-----> Setting config vars
 DOKKU_APP_RESTORE: 1
-----> Found previous container(s) (3594ff49f81c) named myapp.web.1
=====> Renaming container (3594ff49f81c) myapp.web.1 to myapp.web.1.1544803301
=====> Renaming container (40f628df49af) quizzical_raman to myapp.web.1
-----> Attempting to run scripts.dokku.postdeploy from app.json (if defined)
-----> Shutting down old containers in 60 seconds
=====> 3594ff49f81c171fefe56bca68742d98cde2cd18d5111b28d4ea32ed5e59afe9
=====> Application deployed:
 http://myapp.myapp

Obviously, if you type myapp.myapp in the browser, it will not work. What you have to do now is point a domain to your new droplet.

Configuring a domain

If you don’t have a spare top-level domain, the fastest way is to add a subdomain record to a domain that you already own, then have it point to your droplet’s IP.

Once you’ve done that, run the following command on your dokku droplet

dokku domains:add myapp example.yourdomain.com

Open a browser and type example.yourdomain.com. You should see an ugly “Hello World!”, congratulations!

Configure pre-flight checks

Something might have caught your attention when we deployed our application:

-----> Running pre-flight checks
       For more efficient zero downtime deployments, create a file CHECKS.
       See http://progrium.viewdocs.io/dokku/checks-examples.md for examples
       CHECKS file not found in container: Running simple container check...

Checks in Dokku are a way to set up zero-downtime deployments. You don’t want your users to get an error page while your server is restarting. Since we have not created any custom check, Dokku runs a default check that simply makes sure the new container is up and running before pointing to the new app. The problem is that it will not check whether Puma has fully loaded. Let’s create a super simple check to make sure our Rails application is available.

At the root of your app, create a file named CHECKS and add the following:

WAIT=8 #Wait 8 seconds before each attempt
ATTEMPTS=6 #Try 6 times, if it still doesn't work, the deploy has failed and the old container (app) will be kept
/check_deploy deploy_successful

Important: Leave an empty line at the end of this file, otherwise Dokku might not detect your check. Is this a bug? I don’t know… but it took me a while to figure this one out!

Now create a file called check_deploy in your rails public directory and add the text:

deploy_successful

In other words, dokku will try 6 times to obtain the “deploy_successful” string after calling “/check_deploy”.
Push everything to dokku and verify the output. You will probably see something like that:

-----> Running pre-flight checks
-----> Attempt 1/6 Waiting for 8 seconds ...
       CHECKS expected result:
       http://localhost/check_deploy => "deploy_successful"
-----> All checks successful!

Database migrations

Before Dokku 0.5, it was not really possible to have your database migrations run automatically on deploy. You had to do it in two steps: first you deploy, then you migrate by typing: ssh root@your-domain dokku run myapp rake db:migrate

Fortunately, we can automate the process now that Dokku supports the app.json manifest file. Create an app.json file in the root of your repository and add this:

{
  "name": "myapp",
  "description": "Dummy app to go along the dokku tutorial found on rubyfleebie.com",
  "keywords": [
    "dokku",
    "rails",
    "rubyfleebie.com"
  ],
  "scripts": {
    "dokku": {
      "postdeploy": "bundle exec rake db:migrate"
    }
  }
}

Let’s create a dummy model to see if the migrations will be run.

./bin/rails g model Book

You can then migrate your database in development if you want. Once it’s done, commit and push to dokku. The output should look like this:
-----> Running post-deploy
-----> Attempting to run scripts.dokku.postdeploy from app.json (if defined)
-----> Running 'rake db:migrate' in app container
       restoring installation cache...
       Migrating to CreateBooks (20160405194531)
       == 20160405194531 CreateBooks: migrating ======================================
       -- create_table(:books)
          -> 0.0139s
       == 20160405194531 CreateBooks: migrated (0.0142s) ==========

How cool is that? I hope you enjoyed this tutorial. Your comments are appreciated!

 

Ready to use in production? Make sure to clear old and unused containers from time to time!

If you want to use Dokku in production, make sure to remove containers that are no longer in use, because the underlying Docker platform WILL NOT automatically delete them for you. If you don’t, the disk usage on your droplet will keep growing and will ultimately crash your app! Fortunately, in newer Docker versions, pruning old containers is very easy; simply run the following command once in a while:

docker container prune

If you are using an older version of Docker and the prune command above does not exist, there is another way to clear unused containers. Have a look at this SO answer.

Troubleshooting

Dushyant in the comments had some errors on deploy. He found out that his problem was related to the number of containers configured when using the DigitalOcean $5 plan. I didn’t run into this problem myself, so here is what Dushyant says about it:
« Finally I found the solution. My previous solution got me working but ultimately that wasn’t the true solution.
It is happening because of containers and because of 5 dollar plan.
You can get the list of containers by this command
docker ps
Then remove the unwanted containers
docker rm -f docker_id
»

 

What’s next?

How about automating your database backups and storing them on a zero-knowledge cloud architecture?

2 months ago

Ryan Davis (Polishing Ruby) @ Seattle, WA › United States - Nov 19

Speaker Pro-Tips

I just gave my talk and I think it went swimmingly. One thing that really worked for me was putting it together in Keynote. Despite having a ton of movies and images, most of the tedium was automatically handled by using keynote to do almost all of the work.

My process:
  1. Come up with a talk proposal. Six, actually. Submit them.
  2. Hope one gets accepted.
  3. Assumi 3 months ago

I just gave my talk and I think it went swimmingly. One thing that really worked for me was putting it together in Keynote. Despite having a ton of movies and images, most of the tedium was automatically handled by using keynote to do almost all of the work.

My process:

  1. Come up with a talk proposal. Six, actually. Submit them.
  2. Hope one gets accepted.
  3. Assuming one gets accepted, accept the talk with the organizer.
  4. Procrastinate, but hope that you subconsciously chew on it.
  5. Work on an outline in omnioutliner. This is important.
  6. Print it out, scribble all over it, put edits back in document.
  7. Once I’m happy with it, export to powerpoint.
  8. Open in keynote.
  9. Select my theme, switch all slides to my normal slide master, color all of them bright pink. This is also important.
  10. Switch to outline view.
  11. From the bottom, on a section by section basis, un-indent each outline item to the top level. This makes a new slide.
  12. For each section, switch masters to my section master (but pink sticks).
  13. Go to navigator view, indent each normal slide under the section/subsection slide. This lets you collapse-all and work on a section-by-section basis easily.
  14. Go to either navigator view or light table view and search out pink slides to work on.
  15. Once a slide is “done”, Use “Reapply Master to Slide” to turn off pink.
  16. Work on notes and start reading it out loud a lot. Wherever something doesn’t make sense, sounds clunky, or can’t be explained in < 5 seconds, add a slide. Yes, you heard that right, add a slide, or three. These areas need more slides to make connecting the conceptual dots easier for the audience. Do not add more words to the slide.

Usually, steps 5-6 take me the longest amount of time and effort… Getting the outline right is super important. I would much rather fix things at this stage than in the slide stage.

Steps 10-13 get me from 0 slides to 200+ slides in a matter of minutes. It feels fantastic.

Steps 14-15 also take a fair amount of time.

My Optimizations:

All of these steps benefit greatly from adding extra keyboard shortcuts via System Preferences -> Keyboard -> Shortcuts -> App Shortcuts. I have:

  • Collapse All: Cmd-Opt-0
  • Expand All: Cmd-Opt-9
  • Fit in Window: Cmd-Opt-=
  • Light Table: Cmd-Opt-L
  • Navigator View: Cmd-Opt-N
  • Outline View: Cmd-Opt-U
  • Play Slideshow: Cmd-P (why would I EVER print a keynote?)
  • Reapply Master: Cmd-Control-M (doesn’t always work? Apple? Help?)
  • Rehearse Slideshow: Cmd-R

I use these a lot and they help me keep moving.

My Automation:

Figuring stuff out:

Applescript can be a real PITA, but most of the stuff I do is pretty straightforward.

While I don’t have a REPL for applescript, I can introspect on things fairly easily. I almost always have a script on the side with something like the following:

tell application "Keynote"
  tell front document
    tell current slide
      tell movie 1
        properties
      end tell
    end tell
  end tell
end tell

Which outputs something like:

{opacity:100, parent:slide 40 of document id
"1B0AA7F1-A88F-4C42-A6BE-FC67CCC70D1A" of application "Keynote", movie
volume:100, class:movie, file name:"lines1.mp4", reflection
showing:false, rotation:0, position:{260, 80}, width:1400, reflection
value:0, height:919, repetition method:loop, locked:false}

That quickly lets me know what I can mess with, or lets me do the math I need to figure out positioning, etc.

Resize Movie

I made this in the midst of working on this latest talk. The 10-15 minutes it took saved me a TON of time. I had 43 movies and 46 images in this talk (e.g. count of movies of every slide). It was easily worth the investment many times over.

tell application "Keynote"
  tell front document
    set myWidth to width
    set myHeight to height
    tell current slide
      tell movie 1
        -- since "constrain proportions" is on, only set one:
        set height to 919
        -- set width to 1400

        set position to {(myWidth / 2) - (width / 2), (myHeight / 2) - (height / 2)}

        set repetition method to loop
      end tell
    end tell
  end tell
end tell

It takes the (first) movie in the current slide and does the following:

  • Set the height (or the width, but not both) to fit for that master. If I have different masters with different spacing, then I use multiple scripts. I don’t want applescript to have to figure that out (tho, thinking about it, that should be easy based on the master name).
  • Calculate its position to be centered by using the dimensions of the slide vs the dimensions of the image.
  • Turn on looping.

I’d like it to turn on auto-start as well, but that’s not a property of the movie for some reason and I didn’t want to dive into UI scripting. This script already saved me a ton of time and I felt like I was ahead of the ball for once.

In another case, I wasn’t centering so the maths weren’t as easy. I wound up manually positioning it, figuring out the X/Y, and injecting some of those numbers straight in:

tell movie 1
  -- since "constrain proportions" is on, only set one:
  set height to 800
  -- set width to 1400

  set position to {myWidth - width - 84, 193}

  set repetition method to loop
end tell

Copy Body To Speaker Note

I also have some scripts to transfer content into speaker notes. This one does the body:

tell application "Keynote"
  tell front document
    set the base slide of the current slide to master slide "Takahashi Blue"
    tell current slide
      -- set myTitle to object text of default title item
      set myBody to object text of default body item

      set presenter notes to myBody
      set object text of default body item to ""

      set body showing to false
    end tell
  end tell
end tell

Copy Title To Speaker Note

And this one does the title.

tell application "Keynote"
  tell front document
    set the base slide of the current slide to master slide "Takahashi Blue"
    tell current slide
      set myTitle to object text of default title item
      -- set myBody to object text of default body item
      
      set presenter notes to myTitle
      set object text of default title item to ""

      -- set title showing to false
    end tell
  end tell
end tell

Miscellaneous Tips:

  • cmd-F1 toggles mirroring. Works even after you “play” your keynote. Do this if your speaker notes don’t show up on your laptop or projector.
  • “x” will toggle your speaker notes and slides. Do this if your speaker notes go up on the projector.
  • “w” and “b” will blank the screen either white or black, respectively.

And very important:

YOU are being recorded. The audience is not. During Q&A, if you do not repeat the question, nobody watching this later will know what was asked.

If I’m in your talk, I’ll remind you, ONCE. After that it is on you.


So… I hope that helps you work on your slides. It certainly has helped me.

3 months ago

Maciej Mensfeld (Running with Rails) @ Kraków › Poland - Nov 10

Extracting the device token from Xiaomi Air Purifier 2S EU for Domoticz usage

Xiaomi Air Purifier is one of the best on the market in the price/value category. Like many other Xiaomi devices, it can be controlled using a great home automation system called Domoticz. The only problem that I had is that for the 2S version, there is no way to obtain the device token needed for […]

The post Extracting the device token from Xiaomi Air Purifier 2S EU for Domoticz u 3 months ago

Xiaomi Air Purifier is one of the best on the market in the price/value category. Like many other Xiaomi devices, it can be controlled using a great home automation system called Domoticz.

The only problem that I had is that for the 2S version, there is no way to obtain the device token needed for controlling the device using the miIO library.

Here are the steps needed to obtain this token using a Linux machine and a non-rooted Android phone.

Getting the Mi Home data backup

  1. Download and install the Mi Home application. You need version 5.0.19 of this app; the newer versions don’t persist the token locally in the SQLite database. You can get it from the APKMirror page. Note that you will have to allow Unknown sources to do that. Here’s a description of how to do that.
  2. Enable developer mode and USB debugging on your phone (instruction).
  3. Download and extract the Android Platform tools for Linux from here.
  4. Install the most recent Java version if you don’t have it.
  5. Connect your phone to your computer.
  6. Unlock the phone and allow data transfers to the computer (not USB charging only).
  7. Authorize the Linux machine on your Android phone.
  8. Run the following command from your Linux machine:
    ./adb backup -noapk com.xiaomi.smarthome -f backup.ab
    
  9. Unlock your device and confirm the backup operation. If your phone is encrypted, you will also have to provide the backup password.
  10. If everything went smoothly, you should have a backup.ab file in the platform-tools directory for further processing. From this moment on, you won’t need the phone to be connected to your laptop.

Getting the device token out of the Mi Home backup

  1. Download and unpack the ADB extractor.
  2. Navigate to the android-backup-toolkit/android-backup-extractor/android-backup-extractor-20180521-bin directory.
  3. Copy the abe.jar file to where your backup.ab is.
  4. Run the following command and follow the instructions (if any):
    java -jar abe.jar unpack backup.ab backup.tar
    
  5. Unpack the extracted backup:
    tar -xvf backup.tar
    
  6. Go to the apps/com.xiaomi.smarthome/db directory.
  7. Install and run the DB Browser for SQLite:
    sudo apt-get install sqlitebrowser
    
  8. Open the miio2.db file using the DB Browser for SQLite.
  9. Click on the Execute SQL tab and run the following query:
    SELECT localIP, token FROM devicerecord
    

    As a result, you will get a list of the tokens along with the IPs of the devices in your network that they belong to (I’ve blurred the tokens in the picture just in case).

Now you can take the appropriate token and use it within your Domoticz setup.
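If you prefer to stay in the terminal instead of using the GUI, a rough Ruby equivalent of steps 7-9 could look like this (just a sketch; it assumes the sqlite3 gem is installed and that you run it from the directory where you unpacked the backup):

require 'sqlite3' # gem install sqlite3

# Same query as above, run against the extracted Mi Home database
db = SQLite3::Database.new('apps/com.xiaomi.smarthome/db/miio2.db')
db.execute('SELECT localIP, token FROM devicerecord') do |ip, token|
  puts "#{ip} #{token}"
end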

The post Extracting the device token from Xiaomi Air Purifier 2S EU for Domoticz usage appeared first on Running with Ruby.

3 months ago

Maciej Mensfeld (Running with Rails) @ Kraków › Poland - Oct 03

Simplifying internal validations using Dry-Validation

When building APIs for other developers, it’s often important to draw the line between other programmers input data and the internal world of your library. This process is called data validation and you’re probably familiar with this name. What you may not know, is the fact that it can be achieved in many ways. One […]

The post Simplifying internal validations using Dry- 5 months ago

When building APIs for other developers, it’s often important to draw the line between other programmers’ input data and the internal world of your library. This process is called data validation and you’re probably familiar with this name.

What you may not know is that it can be achieved in many ways.

One that I particularly like is using the dry-validation library. Here’s an example of how you can separate the validation from the actual business logic without actually changing the API of your library.

The inline way

The easiest way to provide validations is to embed the checks in a place where you receive the data.

This approach is great for super simple cases like the one below:

def sum(arg1, arg2)
  raise ArgumentError unless arg1
  raise ArgumentError unless arg2

  arg1.to_i + arg2.to_i
end

sum(2, nil) #=> ArgumentError (ArgumentError)

However, if you decide to follow this road, you will quickly end up with something really unreadable.

This code sample is taken from the ruby-kafka library and it’s used to validate the method input. I’ve removed the business logic parts as they aren’t relevant to the context of this article:

def build(
  ca_cert_file_path: nil,
  ca_cert: nil,
  client_cert: nil,
  client_cert_key: nil,
  client_cert_chain: nil,
  ca_certs_from_system: nil
)
  return nil unless ca_cert_file_path ||
                    ca_cert ||
                    client_cert ||
                    client_cert_key ||
                    client_cert_chain ||
                    ca_certs_from_system

  if client_cert && client_cert_key
    # business irrelevant to the checks
    if client_cert_chain
      # business irrelevant to the checks
    end
  elsif client_cert && !client_cert_key
    raise ArgumentError, "initialized with ssl_client_cert` but no ssl_client_cert_key"
  elsif !client_cert && client_cert_key
    raise ArgumentError, "initialized with ssl_client_cert_key, but no ssl_client_cert"
  elsif client_cert_chain && !client_cert
    raise ArgumentError, "initialized with ssl_client_cert_chain, but no ssl_client_cert"
  elsif client_cert_chain && !client_cert_key
    raise ArgumentError, "initialized with ssl_client_cert_chain, but no ssl_client_cert_key"
  end

  # business
end

Despite looking simple, the if-elsif validation is really complex and it brings many things to the table:

  • it checks several variables,
  • mixes the checks together due to the if-flow,
  • in the end it actually only checks the presence of the variables,
  • despite expecting string values, it will work with anything that is provided,
  • it forces us to spec out the validation cases with the business logic as they are coupled together.

Luckily for us, there’s a better way to do that.

The private schema way

We can achieve the same functionality and much more just by extracting the validations into a separate internal class. Let’s build up only the interface for now.

Note that I’m leaving the ArgumentError and the external API intact, as I don’t want this change to impact anything outside of this class:

require 'dry-validation'

# Empty schema for now, we will get there
SCHEMA = Dry::Validation.Schema {}

def build(
  ca_cert_file_path: nil,
  ca_cert: nil,
  client_cert: nil,
  client_cert_key: nil,
  client_cert_chain: nil,
  ca_certs_from_system: nil
)
  input = {
    ca_cert_file_path: ca_cert_file_path,
    ca_cert: ca_cert,
    client_cert: client_cert,
    client_cert_key: client_cert_key,
    client_cert_chain: client_cert_chain,
    ca_certs_from_system: ca_certs_from_system
  }

  # Do nothing if there's nothing to do
  return nil if input.values.all?(&:nil?)

  results = SCHEMA.call(input)
  raise ArgumentError, results.errors unless results.success?

  # Business logic
end

We’ve managed to extract the validation logic outside.

Thanks to that, now we have:

  • separation of responsibilities,
  • business applying method that we can test against only valid cases,
  • validation object that we can test in isolation,
  • much cleaner API that can be more easily expanded (new arguments, new data types supported, etc.) and/or replaced,
  • way to handle more complex validations (types, formats, etc),
  • support for reporting multiple issues with the input at the same time.

We can now perform all the checks and run the business logic only when everything is good. But what about the validation itself?

Actually all the validations below are copy-pasted from the karafka repository. Here’s the dry-validation documentation.

require 'dry-validation'

SCHEMA = Dry::Validation.Schema do
  %i[
    ca_cert
    ca_cert_file_path
    client_cert
    client_cert_key
    client_cert_chain
  ].each do |encryption_attribute|
    optional(encryption_attribute).maybe(:str?)
  end

  optional(:ca_certs_from_system).maybe(:bool?)

  rule(
    client_cert_with_client_cert_key: %i[
      client_cert
      client_cert_key
    ]
  ) do |client_cert, client_cert_key|
    client_cert.filled? > client_cert_key.filled?
  end

  rule(
    client_cert_key_with_client_cert: %i[
      client_cert
      client_cert_key
    ]
  ) do |client_cert, client_cert_key|
    client_cert_key.filled? > client_cert.filled?
  end

  rule(
    client_cert_chain_with_client_cert: %i[
      client_cert
      client_cert_chain
    ]
  ) do |client_cert, client_cert_chain|
    client_cert_chain.filled? > client_cert.filled?
  end

  rule(
    client_cert_chain_with_client_cert_key: %i[
      client_cert_chain
      client_cert_key
    ]
  ) do |client_cert_chain, client_cert_key|
    client_cert_chain.filled? > client_cert_key.filled?
  end
end

The execution effect is also really good:

build(ca_cert: 2) #=> {:ca_cert=>["must be a string"]} (ArgumentError)
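And since the schema validates the whole input at once, several problems get reported together instead of one at a time. A hypothetical call (error wording approximate) would behave along these lines:

# Both violations end up in the same ArgumentError: an errors hash
# mentioning :ca_cert ("must be a string") and :ca_certs_from_system
# (which must be a boolean).
build(ca_cert: 2, ca_certs_from_system: 'yes')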

Summary

Whenever you find yourself adding some inline validations, stop and think twice: there’s probably a better and more extensible way to do it.


Originally published at The Castle blog.

The post Simplifying internal validations using Dry-Validation appeared first on Running with Ruby.

5 months ago

Aaron Lasseigne @ Dallas, TX › United States - Aug 15

My Best Career Move

When I first started going to the Dallas Ruby Brigade (DRB) I was silent. I might as well have been invisible. A fly on the wall. I didn’t want to embarrass myself.

At the time, I was working in Perl and PHP and I wanted out. I tried a few languages and found myself really enjoying Ruby. I wanted to learn more about it and this group could teach me. I wanted to teach and presen 6 months ago

6 months ago

Tom Copeland (Junior Developer) @ Herndon, VA › United States - Aug 09

Safer JSON munging

At work we have an ETL process where we get a CSV from a partner and put it in a database as JSON. Sometimes the column names are a little off and we have to rename things. Also, sometimes we need to derive new values from the data we've imported, and sometimes we delete some of the unnecessary key/value pairs. So basically there's a good bit of munging going on there. 7 months ago

At work we have an ETL process where we get a CSV from a partner and put it in a database as JSON. Sometimes the column names are a little off and we have to rename things. Also, sometimes we need to derive new values from the data we've imported, and sometimes we delete some of the unnecessary key/value pairs. So basically there's a good bit of munging going on there.

Initially I was writing little one-off scripts to do this stuff. For each record, grab it, munge the JSON, save it. But that was kind of stressful since I was always one typo away from messing up the data and having to re-import or restore from the audit records. Unit tests help, of course, but having to write a new script each time was tedious too.

Thinking more about the problem, it felt less like an imperative "if key x is present then do y" and more like a series of instructions, or transformations, that we were running on the data. This felt nicer, more functional; send in a hash, get back another hash with a slight modification. Also, why spend 15 minutes writing a script when you can automate it in a day? This line of thinking resulted in a little flurry of classes like this:

  # h = {a: 2, b: 3}
  # CopyInst.new(:a, :z).call(h)
  # {:a => 2, :b=>3, :z=>2} # new h
  class CopyInst
    attr_reader :from, :to
    def initialize(from, to)
      @from = from
      @to = to
    end
    def call(hash)
      hash[to] = hash[from]
    end
  end

It's not quite functional because it modifies the hash argument rather than dup'ing. But each little instruction (CopyInst, RenameInst, and a couple of other business-specific ones) was easy to understand and test. And after instantiating a series of instructions you just iterate over them and call each one in turn:

def munge(hash)
  instructions.each do |inst|
    inst.call(hash)
  end
  hash
end

And out the other side comes a cleaned up hash suitable for to_json-ing.
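
For example, a RenameInst - mentioned above but not shown - might look something like this sketch, following the same call(hash) convention:

  # h = {a: 2, b: 3}
  # RenameInst.new(:a, :z).call(h)
  # {:b=>3, :z=>2} # new h
  class RenameInst
    attr_reader :from, :to
    def initialize(from, to)
      @from = from
      @to = to
    end
    def call(hash)
      # like CopyInst, this mutates the hash in place
      hash[to] = hash.delete(from)
    end
  end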

Expressing these transformations as a series of instructions - a ‘program’ - suggests some interesting possibilities for the thing that is executing them - the ‘virtual machine’. We could validate those instructions so that each one verifies it can be run on the target hash: check for missing keys for RenameInst, check for already existing keys for CopyInst, and so on. We could coalesce instructions to reduce the number of method calls. We could detect no-op sequences (copy ‘a’ to ‘b’ followed by copy ‘b’ to ‘a’) or common subexpressions. If there were non-technical users, you could imagine defining a little user interface to let end users express whatever changes they wanted in terms of these instructions, and then save off that ‘program’ and run it. And there’s probably an undo stack in there somewhere as well. All kinds of nifty compiler-ish things!

This experiment turned what was drudgery into an interesting and fun exercise. Always a win!

7 months ago
pluto.models/1.4.0, feed.parser/1.0.0, feed.filter/1.1.1 - Ruby/2.0.0 (2014-11-13/x86_64-linux) on Rails/4.2.0 (production)