
#155: Can “The Rule Of Age 10” Really Predict Your Future?

Today we’re exploring an expert’s theory that your childhood passions (specifically around 5th grade) could hold the key to what you’ll love doing most as an adult. It was fascinating to take a look back and see if the idea really held up for us. We’re also confessing to a mistake we made when hiring a pro for a recent house project and sharing what we’ll do differently from here on out. Plus: it turns out that plants aren’t the powerful air purifiers we were promised, and we’ve got the nerdiest decorating project you’ve ever heard of in your life.
You can download this episode from Apple Podcasts, Google Podcasts, Stitcher, TuneIn Radio, and Spotify – or listen to it below! Note: If you’re reading in a feed reader, you may have to click through to the post to see the player.
Continue reading #155: Can “The Rule Of Age 10” Really Predict Your Future? at Young House Love.

What We’ve Changed Since Painting Our Brick House White

This is a smorgasbord of an update, since the exterior of our house has changed in a bunch of different ways since last year when we painted it white with masonry paint that lets the brick breathe (you can read all about that project & the cost right here).
It’s my very favorite makeover we’ve ever done to date, but as I mentioned in that post (probably 10 times if I know myself), the exterior was still very much a work in progress after the house got painted. So without further ado, let’s talk about the new path we added, the awning, the new porch lights (we not only switched them out, we lowered them when we hung the awning) and a bunch of other landscaping-related things that have happened over the last 12 months. And the few remaining things that we’re still working on… because that’s how it goes 😉
First let’s take a second to look back at the before shot because it blows my mind every time.
Continue reading What We’ve Changed Since Painting Our Brick House White at Young House Love.

#156: Renovations Are Stressful. Here’s What Helps Us.

It’s no secret that house projects can be stressful. Sometimes REALLY stressful. So today we’re sharing three ways we’re keeping a handle on it during our master bathroom renovation, especially around common stress points like broken budgets, creeping timelines, and decision fatigue. Plus, how you can stop second-guessing yourself! We’re also spilling the details about a part of this renovation we haven’t mentioned much, which oddly enough happens to be one of the things we’re looking forward to the most. We also have some holiday life hacks for you that were submitted by our listeners and why part of Sherry’s childhood has come back to haunt us (in the best way possible).
You can download this episode from Apple Podcasts, Google Podcasts, Stitcher, TuneIn Radio, and Spotify – or listen to it below! Note: If you’re reading in a feed reader, you may have to click through to the post to see the player.
Continue reading #156: Renovations Are Stressful. Here’s What Helps Us. at Young House Love.

Many Roads to the Algorithms Team at Stitch Fix

Academic composition of the Stitch Fix Algorithms team. Circle size is proportional to the number of team members with that background.

Data and algorithms are at the heart of Stitch Fix. The work of our Algorithms team (= data science + algorithms platform) spans nearly the entire breadth of the business, from how we market and how we manage inventory to how we help clients find what they love through personal styling and recommendations. Over the years, much of our success has been the result of building a team with the diverse skills, training, and experience to match the range of these applications.

The diversity of the problems we work on, and the data-rich environment of our business, make it more than possible, even essential, to bring the tools of multiple disciplines to bear on our hardest problems. Many of our biggest wins come from reframing problems and taking new approaches, often inspired by thinking about old problems in new ways across the boundaries of academic disciplines.

Data scientists and algorithms platform engineers at Stitch Fix have studied math, computer science, statistics, economics, physics, psychology, biology, chemistry and even epidemiology. We have folks who got their start in the social sciences like public policy and sociology, and team members who have delved into studying everything from architecture to marketing on their journey to the Stitch Fix Algorithms team. Taken together, these complementary backgrounds make the team greater than the sum of its parts.

The algorithmic transformation of business has just begun. We’re always looking for new team members to help us reinvent the retail experience. If you’re looking for a team that is achieving great things, we’d love for you to help us build the future!

2019 Summer Intern Projects

Thank you to all the 2019 summer interns who worked with the Stitch Fix Algorithms team. This post is a summary of some of the projects they worked on. We appreciate all your contributions and insights! We will soon be recruiting for summer 2020 interns.

Improving Estimation and Testing of Selection Models
Tiffany Cheng, graduate student in Statistics at Stanford University

This summer, I worked on the Inventory Forecasting and Health team, which focuses on the flow of inventory through Stitch Fix’s logistics systems, algorithmically allocating the right amount of inventory to warehouses, and quantifying the health of inventory. My work focused on estimating how many units of each item will be selected for clients on a given day.

The first part of my work focused on improving selection probability estimates for new items (i.e., the cold start problem). For example, suppose Stitch Fix adds a new women’s sweater to its inventory. How do we estimate its probability of being selected for a client when we don’t have any historical data for the item? Existing models used the average selection probability for a women’s sweater in general as an initial estimate. However, we hypothesized that feature-based models, which consider attributes that are often predictive of selection (e.g., silhouette, size, color, etc.), could provide a more informed initial estimate. Upon building and evaluating feature-based models, such as ridge, lasso, support vector machine, and random forest regressions, we found that their selection estimates were closer to true selection data than those from existing models, suggesting the value of the approach.
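To make the comparison concrete, here is a minimal, hypothetical sketch in Python of the baseline-vs-feature-based setup described above. The data is synthetic and the column names are illustrative, so it shows the mechanics rather than the actual Stitch Fix models or results.

```python
# Sketch: category-average baseline vs. a ridge regression on item features.
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
items = pd.DataFrame({
    "category": rng.choice(["sweater", "dress", "jeans"], size=500),
    "color": rng.choice(["black", "blue", "red"], size=500),
    "silhouette": rng.choice(["fitted", "relaxed"], size=500),
    "selection_rate": rng.beta(2, 8, size=500),   # observed historical selection rate
})
train, test = items.iloc[:400], items.iloc[400:]

# Baseline: every new item gets its category's average selection rate.
category_means = train.groupby("category")["selection_rate"].mean()
baseline_pred = test["category"].map(category_means)

# Feature-based model: ridge regression on one-hot encoded attributes.
X_train = pd.get_dummies(train[["category", "color", "silhouette"]])
X_test = pd.get_dummies(test[["category", "color", "silhouette"]]).reindex(
    columns=X_train.columns, fill_value=0)
model = Ridge(alpha=1.0).fit(X_train, train["selection_rate"])
ridge_pred = model.predict(X_test)

print("baseline MAE:", mean_absolute_error(test["selection_rate"], baseline_pred))
print("ridge MAE:   ", mean_absolute_error(test["selection_rate"], ridge_pred))
```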

The second area of focus was designing and implementing a backtesting workflow which evaluates the performance of selection models in isolation. Historically, if we wanted to analyze changes to selection models, we ran a full backtest of the inventory forecasting system. However, since the forecasting system contains many dynamic modules, this captured errors from other sources. The new workflow allows users to identify and measure the errors introduced by just the selection model. It can be used to evaluate current selection models as well as future ones built for the forecasting system. In addition, it can inform the development of backtesting workflows for other modules within the forecasting system.
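A hedged sketch of what such an isolated backtest loop might look like: replay historical days, score the selection model’s predictions against what was actually selected, and never run the rest of the forecasting system. Here `selection_model`, the `history` data frame, and its column names are hypothetical stand-ins, not the production workflow.

```python
# Sketch: backtest a selection model in isolation from the forecasting system.
import pandas as pd

def backtest_selection_model(selection_model, history: pd.DataFrame) -> pd.DataFrame:
    """`history` has one row per (date, item) with the realized `units_selected`."""
    records = []
    for date, day in history.groupby("date"):
        predicted = selection_model.predict(day)           # expected units per item
        error = predicted - day["units_selected"].values   # selection-model error only
        records.append({
            "date": date,
            "mae": abs(error).mean(),
            "bias": error.mean(),
        })
    return pd.DataFrame(records)
```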

Simulated data illustrating the cold start problem for items new to inventory. Recall that existing models estimate a new item’s selection probability by averaging the selection probabilities of items currently in its category. This means that all new items within a category receive the same estimate (indicated by the green bin above). However, if we look at the distribution of current items’ selection probabilities (plotted in blue), we see a wide range of values, suggesting that a feature-based model could provide more informed and nuanced estimates.

Legend: Geo-randomized Analytics
Phuong Pham, graduate student in the Department of Computer Science and Biology – MIT

When we run marketing campaigns here at Stitch Fix, we always aim to measure as much of the campaigns’ impact as we can. For some marketing channels, we can run user-randomized campaigns, which are generally considered the gold standard of experimentation. However, for other channels like radio, this isn’t possible because we can’t control who hears the ad and who doesn’t. In these scenarios, we sometimes run so-called “geo-randomized” campaigns. In these campaigns, we slice up the US into regions and then randomly (or sometimes not so randomly) choose individual regions in which we run the campaign, and other regions where we don’t. By doing so, we can estimate how effective the campaign was by comparing the two sets of regions. If we see a substantially higher rate of conversion in the regions where the campaign was run relative to the other regions, then we gain confidence that this increase in conversion is due to the campaign. Of course, this is an oversimplification. The goal of this project is to apply more sophisticated techniques to analyze these geo-randomized campaigns in an automated and reproducible way.

Over the course of the summer, I built and productionized a system to compute metrics for geo-randomized campaigns using the Generalized Synthetic Control Method (Gsynth [1]). In simple terms, we impute the counterfactual outcome for each treated unit using control group information based on a linear interactive fixed effects model that incorporates unit-specific intercepts interacted with time-varying coefficients [1] (simple, right?). In other words, we look at the regions in the control group (where the campaign isn’t running) and use that as a baseline to estimate how the treatment regions (where the campaign is running) would have behaved if we hadn’t actually run the campaign. We can then compare these counterfactuals with the observed outcomes in the treatment regions to estimate the incremental impact of the campaign. So simple!
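For intuition, here is a heavily simplified sketch of the counterfactual-imputation idea: learn latent time factors from control regions, fit the treated region’s loadings on pre-period data only, and impute its post-period outcomes. This is not the gsynth package or the production system, and it omits covariates, uncertainty estimates, and regularization.

```python
# Sketch: factor-model counterfactual imputation for one treated region.
import numpy as np

def impute_counterfactual(y_control, y_treated, n_pre, n_factors=2):
    """
    y_control: (T, n_control) outcomes for control regions
    y_treated: (T,) outcomes for one treated region
    n_pre:     number of pre-treatment periods
    """
    # Latent time factors estimated from control regions (via SVD).
    U, S, Vt = np.linalg.svd(y_control, full_matrices=False)
    factors = U[:, :n_factors] * S[:n_factors]            # (T, n_factors)

    # Estimate the treated region's factor loadings on pre-period data only.
    loadings, *_ = np.linalg.lstsq(factors[:n_pre], y_treated[:n_pre], rcond=None)

    # Counterfactual = what the factors predict for all periods.
    counterfactual = factors @ loadings
    effect = y_treated[n_pre:] - counterfactual[n_pre:]   # estimated lift per period
    return counterfactual, effect
```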

Example map of regions included in a geo-randomized marketing campaign (regions for illustrative purposes only – not from a real campaign). Grey represents regions in the control group, where the campaign is not running. Teal represents those regions where the campaign is actively being run.

Multidimensional Latent Size
Hanjiang ‘Hans’ Li, graduate student in Statistics at UCLA.

An important part of recommending items for a client to our stylists is choosing items that will fit. This can be challenging; sizes vary quite a bit across brands, and no two clients of the same size share an exact shape.

This figure shows learned client sizes grouped by self-reported size.

We need to match clients and items in a multidimensional space using a limited number of measurements. Stitch Fix has developed size models that can predict when a client will report an item as “too big” or “too small”. We extended these models to also predict more specific feedback, like “these sleeves are too long”, or “the waist is too tight”.

In addition to height, weight, and other standard measurements we have from all clients, we ask detailed questions about the fit of each item the client has tried. The example below is from the Men’s checkout survey and shows the type of structured data we have available to model a more multidimensional notion of size.

The checkout survey where clients can provide feedback on fit and size.

Our earlier sizing models (inspired in part by item response theory, IRT) were fit question by question: one model predicted the client’s response to the sleeve length question, another model predicted the response to the shoulder fit question, and so on. In the new framework, we fit a single joint model that makes predictions for all of these responses. The model also allows us to directly predict size-related information in our clients’ profiles, like numeric or generic top size, or whether a client prefers “Loose” or “Fitted” tops. This structural change allows us to iterate on our client onboarding experience, enabling experimentation with various ways of asking about size and fit without tightly coupling our recommendation models to the specific questions asked.
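A minimal, hypothetical sketch of the joint-model idea in PyTorch: clients and items share low-dimensional latent sizes, and every fit question is predicted from the same latent gap through its own small head (in practice the heads would be trained jointly, e.g. with a summed cross-entropy loss). The dimensions, question names, and architecture here are illustrative, not the actual Stitch Fix model.

```python
# Sketch: one latent size space, many fit questions predicted jointly.
import torch
import torch.nn as nn

class JointFitModel(nn.Module):
    def __init__(self, n_clients, n_items, dim=4,
                 questions=("overall", "sleeve", "shoulder", "waist")):
        super().__init__()
        self.client_size = nn.Embedding(n_clients, dim)   # latent client size
        self.item_size = nn.Embedding(n_items, dim)       # latent item size
        # One small head per question; each predicts too small / fits / too big.
        self.heads = nn.ModuleDict({q: nn.Linear(dim, 3) for q in questions})

    def forward(self, client_ids, item_ids):
        gap = self.item_size(item_ids) - self.client_size(client_ids)
        return {q: head(gap) for q, head in self.heads.items()}

model = JointFitModel(n_clients=1000, n_items=500)
logits = model(torch.tensor([0, 1]), torch.tensor([10, 11]))
print({q: l.shape for q, l in logits.items()})  # each question: (2, 3) logits
```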

Automated Fitting for the Demand Model
Zhenyu Wei, PhD student in Statistics at the University of California at Davis.

This summer at Stitch Fix, I worked with the Forecasting and Estimation of Demand (FED) team. The FED team builds models of client behavior as a way to forecast demand and inform strategic and operational decisions. My project focused on developing an automated model fitting pipeline as well as outlier detection.

To identify outlier data points, we differenced each time series to obtain a distribution of changes between the current time period and the previous one, then chose an appropriate quantile of that distribution as a cutoff. Data points whose change exceeds the cutoff are treated as potential outliers. This method is flexible enough to work for the variety of data series used in demand forecasting.

Flagging outliers (red dots) in the time series. These may have been caused by engineering bugs, marketing campaigns, or other factors. The data shown is not real and for illustrative purposes only.
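A minimal sketch of the differencing-plus-quantile rule described above; the quantile level and series name are illustrative.

```python
# Sketch: flag points whose period-over-period change is unusually large.
import pandas as pd

def flag_outliers(series: pd.Series, quantile: float = 0.99) -> pd.Series:
    diffs = series.diff().abs()
    cutoff = diffs.quantile(quantile)
    return diffs > cutoff

# Usage: daily_shipments is a pd.Series indexed by date.
# outlier_mask = flag_outliers(daily_shipments)
```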

For the automated model fitting pipeline, I built a configuration-driven architecture and kept a human in the loop in the process. We will use this config-driven framework to generalize the process to new models and new features, saving manual labor hours for the team. The pipeline also generates an email of the evaluation results to help people decide whether to promote the new models to production.
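A hedged, minimal sketch of what a config-driven fitting step with a human-in-the-loop email might look like; the config fields, the model registry, and the `send_summary_email` helper are hypothetical, not the team’s actual pipeline.

```python
# Sketch: configuration drives the fit, a human reviews the emailed results.
from dataclasses import dataclass

@dataclass
class FitConfig:
    model_name: str
    features: list
    train_start: str
    train_end: str

def run_pipeline(config: FitConfig, model_registry, data, send_summary_email):
    model = model_registry[config.model_name](features=config.features)
    train = data.loc[config.train_start:config.train_end]
    model.fit(train)
    metrics = model.evaluate(train)   # e.g., holdout error, residual summaries
    # A human reviews the emailed results before the model is promoted.
    send_summary_email(subject=f"Fit results: {config.model_name}", body=str(metrics))
    return model, metrics
```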

Assessing accuracy of model fit by flagging data points with large residual values. There are likely features that are missing from the model that can explain recent data trends such as the green dots. The data shown is not real and for illustrative purposes only.

Choice Set Influence on Purchasing Decisions
Arjun Seshadri, PhD Student in Electrical Engineering at Stanford University.

An illustration of various choice sets and their corresponding purchasing decisions, the latter highlighted in red.

At the very core of Stitch Fix’s business is understanding the nuances of how clients decide to keep items in a shipment. Understanding client purchasing behavior helps our stylists and algorithms put together relevant shipments and allows us to better manage inventory. A natural way to model this behavior is to assume that the merchandise can be described by various attributes and that our clients express differing preferences for those attributes. The client decision to keep an item can then be modeled by the sum total of how well that item satisfies that client’s preferences over all the attributes. This approach, sometimes known as a factorization model, is powerful because it is very general and is fairly straightforward to infer from a limited amount of purchasing data. At the same time, such models ignore interactions between the items.

A big part of my summer internship was to explore and understand how the decision to buy an item is influenced by the assortment it is presented in. There are myriad reasons why such interactions could take place; I’ll list a few examples. Clothing items are often purchased with consideration of how they will fit in with an outfit, and having two pieces of that outfit appear in the same assortment makes buying either one more compelling. Consumers are rarely looking to buy two similar pairs of jeans in one sitting, even though they may want a few pairs to pick one from, which means multiple jeans in the same shipment are typically in competition with each other (though in some cases our clients specifically request multiple pairs of jeans in order to compare them). More subtle reasons also exist: there’s a vast amount of literature in behavioral economics suggesting that some purchasing decisions are made purely as a result of the frame in which they are presented. Could a shipment be such a frame?

Over the summer, I developed a modeling approach that accounts for item-collection interactions while retaining the power and efficiency of factorization models. I built a system and workflow around this new approach, which allowed me to leverage the sophisticated feature engineering of existing client purchasing models while exploring and understanding these effects.
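As a schematic illustration of the general idea (not the approach actually developed), here is a toy model where each item’s keep probability combines a standard factorization utility with a penalty for similarity to the rest of the shipment, capturing the “multiple similar jeans compete with each other” effect.

```python
# Sketch: factorization utility plus a simple choice-set interaction term.
import numpy as np

def keep_probabilities(client_vec, item_vecs, interaction_weight=0.1):
    """
    client_vec: (d,) latent client preferences
    item_vecs:  (k, d) latent attributes of the k items in one shipment
    """
    base_utility = item_vecs @ client_vec                  # standard factorization term
    # Set effect: similarity of each item to the rest of its shipment.
    # Similar items compete; a complementarity effect would need its own term.
    similarity = item_vecs @ item_vecs.T
    np.fill_diagonal(similarity, 0.0)
    set_effect = -interaction_weight * similarity.sum(axis=1)
    return 1.0 / (1.0 + np.exp(-(base_utility + set_effect)))
```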

References and Footnotes

[1]↩ Xu, Yiqing, Generalized Synthetic Control Method: Causal Inference with Interactive Fixed Effects Models (August 23, 2016). Political Analysis, Forthcoming; MIT Political Science Department Research Paper No. 2015-1. Available at SSRN: https://ssrn.com/abstract=2584200 or http://dx.doi.org/10.2139/ssrn.2584200

Newsvendor Problem – The Tale of the First Formula in the Textbook

Introduction
Near our offices here at Stitch Fix, there’s a weekly farmers market.

We’ve noticed an interesting pattern. When the market opens, seasonal produce, bread, and nuts fill tables and food stands. By the end of the day, however, the table with the strawberries is almost empty, while you can still select from a range of nuts at another table. Why is this the case?

Is it possible that the berry vendor constantly underestimates demand? And why hasn’t the vendor picked up on this pattern and adjusted supply to better meet demand?

A typical morning and afternoon for the strawberry vendor.

A typical day for another vendor.

We may not know the particulars for these vendors for sure, but we can use the newsvendor framing to try to better understand the tradeoffs they face every week.

The Newsvendor Model
In their blog post, Doug and Anahita [1] covered detailed formulations of stochastic optimization problems. In this post, we will present a broad framework.

The newsvendor model is a standard problem formulation in Operations Management for making optimal capacity/inventory decisions under uncertainty [2]. You might say it’s “the first formula in the textbook.” Here’s how it goes:

An entrepreneurial newsvendor buys the daily newspaper from the distributor early in the morning.
They try to sell as many papers as they can during the day.
At the end of their shift, they salvage what they couldn’t sell.

How many papers should the newsvendor buy?

Making this decision involves finding a good balance between having too many papers vs. not having enough. Let’s introduce some notation and frame the problem.

$c_u$: Cost of being one unit short (the underage cost; typically the margin per unit)

$c_o$: Cost of an unsold unit of inventory (the overage cost; typically characterized by Original Cost – Salvage Value, or by the estimated cost of carrying inventory for another period)

$f(x)$: Probability density function of demand

$F(x)$: Cumulative distribution function of demand

$Q$: Number of units stocked (the decision quantity, with the optimal quantity $Q^*$).

To find $Q^*$, we can compare the expected utility of each unit to its expected cost. Suppose we plan for having $Q$ units, and are thinking about adding one more unit. We will then keep planning for additional inventory as long as the expected benefit of the next unit added is greater than the expected overage cost.

The $(Q+1)$st unit will be sold if demand is greater than $Q$, with probability $1 - F(Q)$.

Expected benefit of the $(Q+1)$st item: $c_u \, (1 - F(Q))$

Expected cost of the same item: $c_o \, F(Q)$

Finding the point at which the expected benefit of the last unit is equal to the expected cost gives us the optimal $Q^*$:

$$F(Q^*) = \frac{c_u}{c_u + c_o}$$

The right-hand side of the above equation is sometimes referred to as the critical ratio. This ratio is between 0 and 1, and it corresponds to how much of the total uncertainty to cover with the stocking decision.

Critical ratio corresponds to the area covered by the optimal stocking quantity.

Products that are not perishable will have a high critical ratio, especially if the margins are high (e.g., toner cartridges, canned food, nuts, etc.). The other extreme is highly perishable products such as airline seats and fashion goods. In these cases the critical ratio may well be below 0.5. We can think of these as cases where the decision maker should plan to sell out all inventory (airline seats and, yes, strawberries).

Credit: Eric Colson

As it turns out, the risk profiles (demand uncertainty, overage, and underage costs) of the nut vendor and the strawberry vendor are pretty different based on perishability (overage cost) alone. The strawberry vendor likely wants to run out of inventory before the end of the day: what is unsold is probably not going to be worth bringing back to the next market. The nut vendor, by contrast, can easily sell unsold inventory on a future day.
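To make the contrast concrete, here is a small worked example with made-up costs, assuming normally distributed demand (mean 100 units, standard deviation 20); the numbers are purely illustrative.

```python
# Sketch: optimal stocking quantity from the critical ratio, normal demand assumed.
from scipy.stats import norm

def optimal_stock(underage_cost, overage_cost, demand_mean=100, demand_std=20):
    critical_ratio = underage_cost / (underage_cost + overage_cost)
    return critical_ratio, norm.ppf(critical_ratio, loc=demand_mean, scale=demand_std)

# Strawberry vendor: high overage cost (unsold fruit is lost) -> low critical ratio.
print(optimal_stock(underage_cost=2.0, overage_cost=3.0))   # ratio 0.4, Q* below mean

# Nut vendor: low overage cost (leftovers sell next week) -> high critical ratio.
print(optimal_stock(underage_cost=2.0, overage_cost=0.2))   # ratio ~0.91, Q* above mean
```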

Applying the Newsvendor Formulation
Estimating the cost of underage may be relatively straightforward for most companies — missed profit opportunity is a good first estimate. Complexity increases if the effects on client retention need to be quantified (customers who switch to another brand tend not to come back, so underage costs may be greater than the profit margin). Additional complexity arises when the effects of substitution need to be included (the vendor runs out of inventory but some customers switch to an equivalent product that is in stock, resulting in underage costs that may be less than the profit margin).

Estimating the impacts of overage may end up being pretty involved as well. Cost of capital (the opportunity cost of not using the capital elsewhere) and the cost of carrying products in inventory for another cycle are good starting points, but this approach may miss additional costs of overage (see Footnote 1).

For example, product returns and obsolescence costs may be hard to quantify, especially when obsolescence costs only occur when the product is cleared. It may be hard to convert this to a figure used for weekly decision-making.

Estimating the range of demand over the decision period almost deserves its own blog post. For this post, let’s assume that we can derive a range of demand scenarios from historical data that enables a rough understanding of the demand distribution. Note that it is very important for companies to operate on a range of values rather than a point estimate (see Footnote 2).

Overage and Underage Tradeoffs Are Everywhere
At Stitch Fix, we face the manifestations of the newsvendor problem all the time, both for short- and longer-term decisions. Here are some examples.

CAPACITY PLANNING: For each day, we plan our styling and warehouse labor and shipping capacity. Overage is having idle human capacity; underage may mean delays, expediting, and unmet client expectations.

Running out of capacity today means a need to expedite tomorrow

MERCHANDISE PLANNING: For each season, we plan for and buy seasonal merchandise. We strive to find the sweet spot between potentially having too much of something vs. running out. Underage is running out of a very popular style; overage is capital and warehouse capacity tied to inventory that ended up being unproductive.

Do we have too many, or not enough?

The first formula in the textbook is sometimes the hardest to implement
In his excellent book, High Output Management [3], Andrew Grove introduces indicators, or measurements, as key tools for managing toward operational goals:

“If, for example, you start measuring your inventory levels carefully, you are likely to take action to drive your inventory levels down, which is good up to a point. But your inventories could become so lean that you can’t react to changes without creating shortages.”

We can think of indicators as the factors companies care about and optimize for: they direct attention to what they monitor. Grove highlights the importance of pairing indicators so that both the effect and the counter-effect are measured (e.g., monitoring both demand shortages and inventory levels).

The newsvendor formulation is a great approach to find a good balance between two factors that move counter to each other: Revenue vs. Cost, Customer Service vs. Efficiency, Inventory vs. Shortage.

Things get interesting if the competing indicators are the key metrics for different departments. In traditional companies that source, make, and sell products, it’s pretty typical that departments are not closely coupled. Demand-facing departments are primarily measured against revenue, conversion, and client growth targets, while Procurement, Manufacturing, and Supply Chain organizations primarily focus on higher efficiency and cost reduction.

It’s challenging enough to improve a single business metric as a data scientist. Things get even more interesting when one aims to find the sweet spot between competing factors such as service vs. efficiency. In addition to the right models, algorithms (simple as they may be), and infrastructure, you will need to ensure that the right conversations are happening between the stakeholders and that the tradeoffs between competing metrics are made clear across the company.

The following engineering “law” [4] is still very much on point when interfaces are between organizational silos:

“The ability to improve a design occurs primarily at the interfaces.
This is also the prime location for screwing it up.” Shea’s Law

At Stitch Fix, the Algorithms team is a standalone organization. This enables us to not only work on a range of interesting problems (marketing spend optimization, conversion optimization, warehouse capacity, and freight cost optimization), but also to identify the cases in which the key business tradeoffs may span multiple departments. More and more, we act as the interface.

If you are a practitioner, here are a few questions that may help you better understand your surroundings:

What is the primary goal of your group? (e.g., what do you optimize for?)
What are the counter effects of your optimization? (i.e., what are the counter indicators?)
Where does your group report to? To a business function, or to a centralized function?

Footnotes

[1]↩ Marking down product to sell through the remaining inventory is typical. However, when this becomes too typical, consumers start expecting the sales and this has implications for the brand image.

[2]↩ Especially because point estimates may change periodically, incorporating a range of scenarios in decision-making enables highlighting when and where there may be demand and supply mismatches. (We call this scenario-based planning.)

References

[1]↩ Moving Beyond Deterministic Optimization by Anahita Hassanzadeh and Douglas Roud

[2]↩ Production and Operations Analysis by S. Nahmias

[3]↩ High Output Management by Andrew Grove

[4]↩ Shea’s Law: https://spacecraft.ssl.umd.edu/akins_laws.html

Further reading, with some examples of overage costs from the personal computer business:
Inventory-driven Costs

A Framework for Responsible Innovation

Stitch Fix’s engineering efforts are guided by the pragmatic question, “What problem are you trying to solve?” We want to encourage experimentation and innovation within the constraints of our significant investments in our existing tech stack by iterating, improving, and integrating new tools and techniques. Our core languages and platforms are robust and well-supported both internally and externally, leaving plenty of room for creative and effective adaptation to our evolving business needs. But as new needs arise and new technologies emerge, we need to consider how to innovate in a responsible way.

A successful engineering culture encourages autonomy and problem-solving. But as we explored different tools that might have value to our organization, we realized that we needed a clear process for ensuring that we stay current with the industry, make good decisions, and understand the impact of adopting new technologies.

Our Responsible Innovation Framework breaks this down into three primary activities:

Evaluating the need and potential of a new technology,
Testing the technology through responsible experimentation, and
Consistently following a scope-based decision-making matrix for the change that we want to make.

Evaluating the Need and Potential for New Technologies

We consider exploring alternatives to our typical toolset when we encounter significant friction or come across a new technology or approach that may serve us better in the long term. The level of urgency for finding an alternative technology solution is determined by weighing the relative costs of doing nothing, waiting for an opportune time to explore and adopt something new, or making the necessary investment in the near term to better solve the specific problem we are facing.

Understanding the Problem Space
We begin by clearly articulating the problem that we’re trying to solve, and evaluating how our current technologies can be leveraged to solve that problem. This involves asking the following questions:

Why is this new problem a priority?
What business needs or decisions make this problem relevant?
Is the utility or flexibility of our current technologies lacking in response to this problem?
Have we used our current technology to its full benefit?
Did we adopt our current technology solution under significantly different constraints or operating conditions?
How costly would it be to apply our current technology versus adopting a new technology to solve the problem?

Answering these questions often involves discussions with our teammates to get agreement that the problem warrants attention, and that we have a good idea of how much of a priority the problem really is.

Strengthening Efforts vs Growth-Focused Efforts

Considering the answers to these questions, we move on to determining a general approach to meeting our technology need. The decision comes down to committing to a strengthening effort or a growth-focused effort.

A strengthening effort requires a careful analysis of the potential of our current tools to solve the problem and finding ways to adapt them to meet the new need. This may involve some degree of re-engineering, including process changes or extending the existing functionality of tools that we have already built.

A growth-focused effort requires evaluating new potential options, carefully weighing the adoption costs and overall benefit of each alternative against one another.

Examining Alternatives

If it is determined that our current tools are not well-suited to the emerging problem, and that a growth-focused effort is the appropriate way forward, we move on to evaluating alternative technologies that have the potential to solve it and choose the option that delivers maximum value for minimal effort.

For this, we can turn to a technology adoption model that combines an assessment of the ease of adoption and the utility of a new technology:

Ease of adoption encompasses a number of considerations:

Does the new technology enhance and fit well with our existing processes?
Will it be easy for our existing staff to learn to use effectively?
Is there good documentation and prior art that we can learn from?
Is it flexible in comparison to our existing tools?
Does the technology have a strong security story?
Have we considered the overall cost to other teams that need to support the new technology?

Usefulness can be determined with this series of questions:

Does the new technology enhance the performance of our teams?
Does it positively impact the performance, stability, and reliability of our application infrastructure?
Is it well-maintained, with an active user base and readily available support resources?
Does it solve the problem more effectively than our current toolset?

This matrix guides our decision-making process. Technologies that are difficult to adopt and provide limited utility can be rejected outright. Easy-to-adopt solutions that provide limited value are not likely to be adopted but still may be worth exploring through ‘play’ (e.g., reading documentation and making simple proofs of concept), as they can create learning opportunities and help us gain a new perspective on the problem space. Technologies that provide utility but are difficult to adopt fall into the wait category; they may be candidates for future adoption, once the technology matures to the point that tooling and training exist to make onboarding easier. And solutions that are easy to adopt and accommodate a wide range of use cases are the most appropriate to explore, and can move on to the next phase: experimentation.
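For concreteness, the quadrants above can be read as a tiny decision table; a hypothetical sketch:

```python
# Sketch: the ease-of-adoption vs. utility matrix, mapped to the actions above.
def adoption_decision(easy_to_adopt: bool, high_utility: bool) -> str:
    if not easy_to_adopt and not high_utility:
        return "reject"   # hard to adopt, limited value
    if easy_to_adopt and not high_utility:
        return "play"     # low-cost learning: read docs, small proofs of concept
    if not easy_to_adopt and high_utility:
        return "wait"     # revisit once tooling and training mature
    return "explore"      # easy and useful: move on to experimentation
```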

Responsible Experimentation
In our daily work we are often faced with novel situations to which existing patterns of solutions don’t always perfectly apply. As creative professionals, we need to strike a balance between consistency and standardization, and creative problem-solving. We operate according to our shared values of trust and being motivated by a challenge, and as engineers we feel empowered to explore solutions from a practical, pragmatic, and customer-centered perspective.

To inform and guide the evolution of our solutions within an environment of pragmatic innovation and creativity, we sometimes need to experiment with new ways of doing things. So how do we safely experiment?

Characteristics of a Responsible Exploratory Experiment
How we approach exploring a new technology has a huge bearing on its likelihood of being adopted as a standard. So what makes a safe and responsible technology experiment?

A responsible experiment has:

A clearly defined problem statement;
A well-understood and well-documented use case;
Minimal impact on our own and other engineering teams’ priorities;
Minimal impact on other applications and systems;
Well-defined boundaries;
Buy-in from the team and its manager;
Success and failure criteria against which it can be measured;
Clear and concise documentation;
Evidence of thinking about alternatives and tradeoffs;
Analysis of the impact on operations, security, scalability, and observability;
A constrained time and cost; and
The ability to be undone or rolled back with a minimum of cost and effort; the artifacts of such an experiment are likely to be short-lived and replaced with more maintainable and production-ready code.

Every technology decision that we make has ripple effects beyond the code in which it manifests. New frameworks have an operational cost and associated security concerns. Third-party solutions must be analyzed for their long-term sustainability and likelihood of being properly updated and maintained. Esoteric or highly specialized technologies may have an associated learning curve (and a limited number of engineers capable of maintaining and extending the solution, which impacts hiring and staffing.) In short, we must recognize that decisions made by an individual developer or small group of developers may have an impact beyond that team’s domain.

After carefully experimenting with a new technology, and seeing the benefits of its adoption, it’s time to move into the decision-making phase.

Responsible Decision-Making

Our engineering culture is designed to empower our engineers and grant individuals and teams great latitude in their approach to creating technology solutions to meet our business needs. But understanding that some of these decisions have a wider impact, we must be careful and thoughtful about their widespread adoption.

We need to strike a balance between our desire to be consensus-driven and our need to be thoughtful, pragmatic, and deliberate about the decisions we make. We must rely on a clear and efficient decision-making process that both rewards creativity and innovation, and draws on the hard-won experience and insight of senior engineers and managers.

Decision-Making Framework

Who is responsible for making technology decisions? It comes down to a question of scope: asking ourselves, who is impacted by this decision? This is where an emphasis on partnership becomes critical.

Team Scope

When the impact is limited to a single application or team, individual developers are trusted to make the right decisions about their approach to solving a problem.

As an example, a developer may explore the use of an architectural pattern that would be relevant to solving the problem at hand, even if its application is novel in the context of the team’s prior work.

In cases like these, we discuss our idea with our teammates to ensure that our change is acceptable and will be understood and documented well enough that it can be effectively supported by our team.

Responsible: An individual engineer (or small group of engineers) leads the effort. They are doing the work on the implementation and making sure that it adheres to the guidelines of a responsible experiment.

Accountable: The team’s manager, in collaboration with senior engineers, is ultimately accountable and has the power to decide if the experiment will be put into production.

Consulted: The entire team should be kept informed as to the setup, execution, and evaluation of the experiment. Feedback may be collected by reaching out to members of the team and through code reviews.

Informed: The responsible engineer should take responsibility for socializing the experiment to the broader engineering organization in case it applies to problems that other teams are facing.

Cross-Team Scope

When a change we want to make impacts one or more other teams or applications, we document our proposed solution and share that documentation with owners of those applications, including managers and representative engineers from those teams.

An example of cross-team scope is the extraction and integration of a new service that has potential use by multiple teams. We collaborate and iterate until we have agreed on an approach that meets the needs of all of the people who are affected by the decision.

Responsible: A small group of engineers representing multiple teams leads the experiment. The experiment should be well-documented, and the documentation should be thorough enough for anyone to read and understand what is being tested, why it is being tested, and what the impact of the technology will be if it is successfully adopted.

Accountable: The managers and senior engineers of the affected teams have the ultimate decision-making authority.

Consulted: Senior engineers from all affected teams should be consulted in the design, execution, and evaluation of the experiment. Note that the new pattern, tool, or technology may have unexpected impact on infrastructure and security concerns, so the people responsible for these areas should be consulted as well. The team leading the experiment should also solicit input from other engineers who have an interest in the topic, and regularly report back to them on the status of the experiment.

Informed: The engineers leading the experiment have the responsibility to socialize it to the broader engineering organization.

Engineering-Wide Scope

When a change impacts multiple teams, requires operational support in terms of infrastructure or security, significantly changes our approach to solving a certain set of problems, is costly in terms of time, effort, or price, or has broad business implications, we need to draw on a larger group to leverage their relevant experience and insight into the problem and proposed solution.

For example, if we were interested in adopting a new testing framework or GraphQL API interfaces, this effort would succeed only by securing buy-in from our most senior engineers and managers well before we explore integrating these technologies.

Responsible: The working group of engineers leading the experiment.

Accountable: One or more senior engineers and managers (and potentially product partners or even executives, in the case of a large-scale adoption proposal) are ultimately accountable and have the authority over the final decision to adopt or not adopt the new tool, framework, or language.

Consulted: The experiment team should be in regular communication with senior engineering staff.

Informed: The team should take responsibility for socializing the experiment in the broader engineering organization. They should schedule presentations and Q&A sessions to fully inform other engineers and managers about the change and how it will impact them.

Striking a Balance

To preserve a culture of broad participation in decision-making, we must do our best to communicate clearly about the problems we face and how we can solve those problems in new ways. Partnering with our peers improves our chances of success, not only during the experiment but also as we move into the adoption phase for a new tool or technology.

We must also recognize the limits of driving toward consensus. Ultimately, we rely on our managers, senior engineers, architects, or other project sponsors to evaluate and approve significant changes to our tools, processes, patterns, and infrastructure. We trust that the decisions that they make will be informed by the results of our responsible research and experiments, the comprehensive documentation we have prepared, and the input and buy-in we have secured from across the engineering organization.

Regardless of the scope of the change we want to introduce, if we have done our jobs with due diligence, we can trust that the accountable individuals are well-equipped and empowered to make an informed decision that we can all believe in and support moving forward.

Final Thoughts

We understand that changes to our tools (and the resulting infrastructure implications) can have significant impact outside of our own individual or team domains. In order to stay most focused on delivering value to our clients, rather than getting lost in the details of complex technical problems, we want to keep the number of different technologies we use to a manageable and effective minimum.

Success is predicated on being deliberate and methodical in the way that we experiment with new technologies and how we make decisions about which technologies and techniques we will adopt.

We want to preserve the balance between pragmatism, creativity, and autonomy. In order to do this, we need to be deliberate about how we explore, experiment with, and choose to adopt new tools and technologies. We have to balance our attraction to the new and novel with realism and deliberation. And we need to trust our technical leaders to make informed decisions.

In short, we need to recognize that real and sustainable innovation is a team effort.

Moving from Data-Driven to AI-Driven: The Next Step in the Evolution of Business Workflows

This post is an adaptation of the article that originally appeared in HBR.

Many companies have adopted a “data-driven” approach for operational decision-making. While data can improve decisions, it requires the right processor to fully leverage it. Many people assume that processor is human. The term “data-driven” implies that the data is to be curated by — and summarized for — humans to process. However, in order to fully leverage the value contained in the data, companies need to bring Artificial Intelligence (AI) into the workflows and sometimes this means getting us humans out of the way, shifting our focus to where we can best contribute. We need to evolve from data-driven to AI-driven.

Discerning “data-driven” from “AI-driven” isn’t just semantics; it’s distinguishing between two different assets: data and processing ability. Data holds the insights that can enable better decisions; processing is the way to extract those insights and take actions. Humans and AI are both processors, yet they have very different abilities. To understand how to best leverage each it’s helpful to review our own biological evolution as well as how decision-making has evolved in industry.

Just fifty years ago human judgment was the central processor of business decision-making. Professionals relied on their highly-tuned intuitions, developed from years of experience in their domain, to pick the right creative for an ad campaign, determine the right inventory levels to stock, or approve the right financial investments. Experience and gut instinct were all that were available to discern good from bad, high from low, and risky vs. safe.

It was, perhaps, all too human. Our intuitions are far from ideal for use in decision-making. Our brains are afflicted with many cognitive biases that impair our judgment in predictable ways. This is the result of hundreds of thousands of years of evolution where, as early hunter-gatherers, we developed a system of reasoning that relies on simple heuristics — shortcuts or rules-of-thumb that circumvent the high cost of processing a lot of information. This enabled quick, almost unconscious decisions to get us out of potentially perilous situations. However, ‘quick and almost unconscious’ didn’t always mean optimal or even accurate. Imagine a group of our hunter-gatherer ancestors huddled around a campfire when a nearby bush suddenly rustles. A decision of the ‘quick and almost unconscious’ type needs to be made: conclude that the rustling is a dangerous predator and flee, or inquire to gather more information to see if it is potential prey – say, a rabbit that could provide rich nutrients. Our more impulsive ancestors – those who decided to flee – survived at a higher rate than their more inquisitive peers. This is because the cost of wrongly concluding it was a predator when it was only a rabbit is relatively low – some forgone nutrition for the evening. However, the cost of inquiring to gather more information when the rustling was actually a predator can be devastating – the cost of life! With such asymmetry in outcomes, evolution favors the trait that leads to less costly consequences, even at the sacrifice of accuracy [1]. Therefore, the trait for more impulsive decision-making and less information processing becomes prevalent in the descendant population.

The result of this selection process is the myriad of cognitive biases that come pre-loaded in our inherited brains. These biases influence our judgment and decision-making in ways that depart from rational objectivity. We give more weight than we should to vivid or recent events. We coarsely classify subjects into broad stereotypes that don’t sufficiently explain their differences. We anchor on prior experience even when it is completely irrelevant. We tend to conjure up specious explanations for events when it’s really just random noise (see “You can’t make this stuff up … or can you?”). These are just a few of the dozens of ways cognitive bias plagues human judgment – the very thing we had once placed as the central processor of business decision-making. Relying solely on human intuition is inefficient, capricious, and fallible, and it limits the ability of the organization.

Data-Driven Workflows

Thank goodness for the digital revolution. Connected devices now capture unthinkable volumes of data: every transaction, every customer gesture, every micro- and macroeconomic indicator, all the information that can inform better decisions. In response to this new data-rich environment we’ve adapted our workflows. IT departments support the flow of information using machines (databases, distributed file systems, and the like) to reduce the unmanageable volumes of data down to digestible summaries for human consumption. The summaries are then further processed by humans using tools like spreadsheets, dashboards, and analytics applications. Eventually, the highly processed, and now manageably small, data is presented for decision-making. This is the “data-driven” workflow. Human judgment is still in the role of central processor, yet now with summarized data as a new input.

While it’s undoubtedly better than relying solely on intuition, humans playing the role of central processor still creates several limitations.

We don’t leverage all the data. Data summarization is necessary to accommodate the throughput of human processors. For as much as we are adept at digesting our surroundings, effortlessly processing vast amounts of ambient information, we are remarkably limited when it comes to processing structured data. Processing millions or billions of records of structured data is unfathomable; we can only process small summaries – say, total sales and average selling price rolled up to a region level. Yet, summarized data can obscure many of the insights, relationships, and patterns contained in the original (big) data set. Aggregate statistics like sums and averages don’t provide the whole picture needed for decisions [2]. Often a decision requires understanding the full distribution of data values or important relationships between data elements. This information is lost when data is aggregated. In other cases, summarized data can be outright misleading. Confounding factors can give the appearance of a positive relationship when it is actually the opposite (see Simpson’s and other paradoxes). Yet, once data is aggregated it may be impossible to recover the factors in order to properly control for them [3]. In short, by using humans as central processors of data, we are still trading off accuracy to circumvent the high cost of human data processing.
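As a tiny illustration of the aggregation pitfall mentioned above, here is a made-up example of Simpson’s paradox: within each region a discount is associated with a lower conversion rate, but the pooled summary suggests the opposite.

```python
# Sketch: aggregation reverses the within-region conclusion (Simpson's paradox).
import pandas as pd

df = pd.DataFrame({
    "region":      ["east"] * 2 + ["west"] * 2,
    "discounted":  [True, False, True, False],
    "visitors":    [900, 100, 100, 900],
    "conversions": [ 90,  12,   4,  45],
})
df["rate"] = df["conversions"] / df["visitors"]
print(df)  # within each region, the discounted rate is lower

pooled = df.groupby("discounted")[["visitors", "conversions"]].sum()
print(pooled["conversions"] / pooled["visitors"])  # pooled: discounted looks higher
```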

Data is not enough to insulate us from cognitive bias. With humans in the role of central processors, the data summaries are directed by humans in a way that is prone to all the same biases mentioned earlier. We direct the summarization in a manner that is intuitive to us. We ask that the data be aggregated to segments that we feel are representative archetypes. Yet, we have that tendency to coarsely classify subjects into broad stereotypes that don’t sufficiently explain their differences. For example, we may roll up the data to attributes such as geography even when there is no discernible difference in behavior between regions. There’s also the matter of grain – the level to which the data is summarized. Since we can only handle so much data we prefer a very coarse grain in order to make it digestible. For example, an attribute like geography needs to be kept at a region level where there are relatively few values (i.e., “east” vs. “west”). Dropping down to a city or zipcode level just won’t work for us as it is too much data for our human brains to process. We also prefer simple relationships between elements – we’ll approximate just about everything as linear because it’s easier for us to process. The relationship between price and sales, market penetration and conversion rate, credit risk and income — all are assumed linear even when the data suggests otherwise.

Alas, we are accommodating our biases when we drive the data.

AI-Driven Workflows

We need to evolve further to bring AI into the workflow. For routine decisions that rely only on structured data, we are better off delegating decisions to AI. AI does not suffer from cognitive bias [4]. AI can be trained to find segments in the population that best explain variance at a fine-grained level, even if they are unintuitive to our human perceptions or result in thousands or even millions of groupings. And AI is more than comfortable working with nonlinear relationships, be they exponential, power laws, geometric series, binomial distributions, or otherwise.

This workflow better leverages the information contained in the data and is more consistent and objective in its decisions. It can better determine which ad creative is most effective, the optimal inventory levels to set, or which financial investments to make.

While humans are removed from this workflow, it’s important to note that mere automation is not the goal of an AI-driven workflow. Sure, it may reduce costs, but that’s only an incremental benefit. The value of AI is making better decisions than what humans alone can do. This creates step-change improvement in efficiency and enables new capabilities. This is evolution of the punctuated type.

Leveraging both AI and Human processors in the workflow

Removing humans from workflows that only involve the processing of structured data does not mean that humans are obsolete. There are many business decisions that depend on more than just structured data. Vision statements, company strategies, corporate values, market dynamics – all are examples of information that is only available in our minds and transmitted through culture and other forms of non-digital communication. This information is inaccessible to AI, yet can be extremely relevant to business decisions.

For example, AI may objectively determine the right inventory levels in order to maximize profits. However, in a competitive environment a company may opt for higher inventory levels in order to provide a better customer experience, even at the expense of profits. In other cases, AI may determine that investing more dollars in marketing will have the highest ROI among the options available to the company. However, a company may choose to temper growth in order to uphold quality standards. In other cases still, the selection of the best marketing creative for an ad may require considerations that AI can’t make (see “Hollywood vs. The Algorithm”). The additional information available to humans in the form of strategy, values, and market conditions can merit a departure from the objective rationality of AI. In such cases, AI can be used to generate possibilities from which humans can pick the best alternative given the additional information they have access to [5].

The key is that humans are not interfacing directly with the data but rather with the possibilities produced by AI’s processing of the data. Values, strategy, and culture are our way of reconciling our decisions with objective rationality. This is best done explicitly and fully informed. By leveraging both AI and humans, we can make better decisions than by using either one alone.

The Next Phase in our Evolution

Moving from data-driven to AI-driven is the next phase in our evolution. Embracing AI in our workflows affords better processing of structured data and allows for humans to contribute in ways that are complementary.

This evolution is unlikely to occur within the individual organization, just as evolution by natural selection does not take place within individuals. Rather, it’s a selection process that operates on a population. The more efficient organizations will survive at a higher rate. Since it’s hard for mature companies to adapt to changes in the environment, I suspect we’ll see the emergence of new companies that embrace both AI and human contributions from the beginning and build them natively into their workflows.

References and Footnotes

[1]↩ Shermer, Michael. “Patternicity: Finding Meaningful Patterns in Meaningless Noise.” Scientific American. N.p., 1 Dec. 2008.

[2]↩ This is not to suggest that data summaries are not useful. To be sure, they are invaluable in providing basic visibility into the business. But they will provide little value for use in decision-making. Too much is lost in the preparation for humans.

[3]↩ The best practice is to use randomized controlled trials (i.e., A/B testing). Without this practice, even AI may not be able to properly control for confounding factors.

[4]↩ It should be acknowledged that there is a very real risk of using biased data that may cause AI to find specious relationships that are unfair. Be sure to understand how the data is generated in addition to how it is used.

[5]↩ The order of execution for such workflows is case-specific. Sometimes AI is first to reduce the workload on humans. In other cases, human judgment can be used as inputs to AI processing. In other cases still, there may be iteration between AI and human processing.

Twitter’s New Two-Factor Authentication No Longer Needs Phone Numbers

Twitter users, perhaps gullibly, added their phone numbers and email addresses to their Twitter accounts believing it would keep them safe, only for it to be revealed last month that some of that information was unintentionally used for advertising. It wasn’t so easy just to remove all that information, however, as the phone numbers were needed for two-factor authentication with Twitter. But Twitter users will no longer need their phone numbers to authenticate with 2FA. Now there will be a choice of three ways to authenticate, one of those being with the phone number. But it will no… Read more

Papaya Lassi

I’ve never come across a papaya lassi on a menu anywhere. I wonder why, because it is so darned delicious! And it takes literally a few minutes to make. A papaya lassi is absolutely one of the best ways to ensure you eat more of this amazing fruit with all its glorious health benefits.

Why a papaya lassi works!
As a general rule of thumb, any stone fruit or fruit with gorgeous soft flesh would be delicious when blitzed with some yoghurt. The lassi made famous by mango is a traditional Indian drink of blended fruit, yoghurt and spices – most likely cardamom. If you look at cardamom’s flavour notes, it is sweet and floral with delicate eucalyptus notes. A perfect match for mango, coconut, peach, pear, strawberry, persimmon and papaya.

Papaya, with its glorious orange buttery-soft flesh and exotic musky flavour, works really well with yoghurt (especially coconut yoghurt!) and cardamom. And that is why a papaya lassi is a no-brainer.

Why papaya might be the healthiest fruit on the planet
A small papaya (that I have come to realize I can devour in a single sitting!) has almost 200% of your daily recommended Vitamin C intake. And a good amount of potassium and magnesium. An incredibly good source of fiber and folic acid, papaya can lower bad cholesterol and regulate digestion. Being super-rich in lycopene, papaya is known to reduce the risk of colon and prostate cancer.

A bowl full of papaya every other day might be excellent for building up your immunity while providing your gut with all the good bacteria it needs to thrive and flourish. Cut papaya keeps in the fridge for up to 2 days and can be frozen for up to 2 months. When they are in season, choose the ripest papayas you can find, then peel, deseed, and cube them to freeze for the offseason. This papaya lassi is a great way to use up fresh or frozen papaya.

How do you eat your daily papaya? I usually just drizzle the flesh with lime juice and scoop and eat it with a spoon. I have recently started making this super delicious Papaya Ceviche. And now this papaya lassi is keeping things interesting in the daily-papaya-eating-department. If you have a fun way of having papayas, please do share! x


PAPAYA LASSI

Author: Sneh
Prep Time: 10 minutes
Total Time: 10 minutes
Yield: 2
Category: Breakfast, Beverage
Cuisine: Gluten Free, Vegetarian


Ingredients

1 small ripe red papaya (approx 500g)
1 cup Greek yoghurt
1 tablespoon maple syrup
1 teaspoon ground cardamom + extra for sprinkling
1/2 cup crushed ice

Instructions
Cut the papaya in half (lengthways). Using a spoon, scoop out the seeds and discard. Using a sharp paring knife, carefully peel the papaya. Place the papaya cut side down on a chopping board and cut into small pieces.
Place papaya pieces, yoghurt, cardamom, maple syrup and ice in the jug of a blender. Blend until smooth.
Pour into two 500ml capacity tall glasses. Sprinkle with extra cardamom and serve.

Notes
Ensure papaya is ripe. I usually buy a firm bright yellow papaya with smooth skin and then allow it to fully ripen in my fruit bowl for 1-3 days. A nicely ripened papaya will be very fragrant, a deep coral orange colour and starting to form a couple of soft spots on the surface. You will be able to press your thumb into the flesh.
Make It Vegan – Use coconut yoghurt instead of regular yoghurt.

Did you make this recipe?
Share your creations by tagging @cookrepublic on Instagram with the hashtag #cookrepublic

The post Papaya Lassi appeared first on Cook Republic.