Posted Nov 24, 2016 by James Kyle
Yarn is a new package manager that we built to be consistent and reliable. When installing hundreds or even thousands of third-party packages from the internet you want to be sure that you’re executing the same code across every system.
Yarn maintains consistency across machines in two key ways:
- Yarn uses a deterministic algorithm that builds up the entire dependency tree before placing files where they need to be.
- Important info from the install process is stored in the
yarn.locklockfile so that it can be shared between every system installing the dependencies.
This lockfile contains information about the exact versions of every single dependency that was installed as well as checksums of the code to make sure the code is identical.
Yarn needs this info about every dependency because packages are constantly
changing. New versions of packages are published all the time and since
package.json specifies version ranges you need to lock them down to a single
These version ranges exist because Yarn follows Semantic Versioning, or “SemVer”. SemVer is a versioning system designed around “breaking” or “non-breaking” changes.
When you have a version such as
v1.2.3, it’s broken into three parts:
- Major (1.x.x) – Changes that may cause user code to break
- Minor (x.2.x) – Changes that add new features (but should not break user code)
- Patch (x.x.3) – Changes that are fixing bugs in previous versions (but do not add new features and should not break user code)
When a package publishes a new version, the author bumps major, minor, or patch based on the changes that they have made as a way to communicate them.
Users of packages should typically welcome minor and patch versions but should be wary of major versions as they could break your code.
Version ranges are a way of specifying which types of changes you want to accept and which versions you want to prevent. These version ranges then resolve down to a single version which is either the version you have installed or the latest published version that matches your version range.
If you don’t store which version you ended up installing, someone could be installing the same set of dependencies and end up with different versions depending on when they installed. This can lead to “Works On My Machine” problems and should be avoided.
Also, since package authors are people and they can make mistake, it’s possible for them to publish an accidental breaking change in a minor or patch version. If you install this breaking change when you don’t intend to it could have bad consequences like breaking your app in production.
Lockfiles lock the versions for every single dependency you have installed. This prevents “Works On My Machine” problems, and ensures that you don’t accidentally get a bad dependency.
It also works as a security precaution: If the package author is either malicious or is attacked by someone malicious and a bad version is published, you do not want that code to end up running without you knowing about it.
Libraries vs Applications
There are two primary types of projects that use Yarn:
- Libraries – Projects that get published as packages to the registry and installed by users. (e.g. React or Babel)
- Applications – Projects that only consume other packages, typically building some kind of product. (e.g. Your company’s app)
For applications, most developers agree that lockfiles are A Good Idea™. But there has been some question about using them when building libraries.
When you publish a package that contains a
yarn.lock, any user of that
library will not be affected by it. When you install dependencies in your
application or library, only your own
yarn.lock file is respected. Lockfiles
within your dependencies will be ignored.
It is important that Yarn behaves this way for two reasons:
- You would never be able to update the versions of sub-dependencies because
they would be locked by other
- Yarn would never be able to fold (de-duplicate) dependencies so that compatible version ranges only install a single version.
Some have wondered why libraries should use lockfiles at all if they do not and should not affect users. Even further, some have said that using lockfiles when developing libraries creates a false sense of security since your users could be installing different versions than you.
This seems to logically makes sense, but let’s dive deeper into the problem.
So far we’ve been talking about dependencies as if there were only one type of dependency when in fact there are several different types. These are broken down into two categories:
- Runtime – Dependencies that are used by the project’s code and needed when the code is run.
- Development – Dependencies that are only needed to work directly on the project
When a library is installed by a user, only the runtime dependencies are installed. The development dependencies are only ever installed when working directly on the project that specifies them.
Each type of dependency creates a whole tree of dependencies which are needed for them to be used.
It turns out that most projects (libraries or applications) have far more dependencies for development than for runtime. The tree of dependencies created by a projects development dependencies is almost always the largest part of the total.
When working on a library, it is far more likely that a development dependency breaks just because there are more of them that could break.
The Breaking Change Race
When a package accidentally publishes a breaking change it starts the clock on who will be the first person to catch it. Whoever installs that breaking change first will (most likely) be the first to discover it.
Let’s imagine we have a package called
left-pad which takes a string and
adds a specified amount of padding in front of it. This small and seemingly
harmless package is used by a really big project, which we’ll just call…
One day, the maintainer of
left-pad decides they are going to do something
unspeakably evil and make it pad the right side instead. They publish it as a
patch version which quickly spreads to everyone using it.
Remember, Babel is a really huge project with tens of thousands of users and dozens of contributors. Let’s try and figure out who will catch the padding fiasco first.
- Looking at build history, Babel was installed in CI (with the
left-paddependency) exactly 103 times in the last 30 days.
- Looking at statistics from the registry, Babel was installed (with
left-pad) by users over 5.5 Million times in the same 30 days.
To put that in a different measurement:
- Contributors install Babel on average every 7 hours
- Users install Babel about on average every 0.5 seconds
left-pad incident happened, users discovered it almost immediately.
Notifying the Babel contributors via GitHub issues, tweets, Facebook messages,
emails, phone calls, smoke signals, and by carrier pigeon.
Babel contributors couldn’t possibly prevent users from being affected by every error. In order to do so, every 0.5 seconds they would need to be able to run CI, notice the error, find a fix, publish a new version, and get every user to upgrade. There’s just no way to make this happen.
This might seem like a bit too extreme of an example to apply broadly, but the reasoning stays the same just with different numbers. Libraries should always be installed more frequently by users than by contributors. If that isn’t true, it’s because no one uses that library anyways– and we should not be optimizing our workflow around code that no one uses.
Many library developers work very hard to maintain a test suite that has 100% code coverage. This coverage ensures that every single line of code is run at least once during the tests.
But it still does not catch everything. If a library has even a small number of users, it will be far better tested by the users than it will be by any test suite the contributors can come up with.
Users just write more code, and the code they write is not always what a library author will expect. The more popular the library the more edge cases users will find. Users will do things that you didn’t even think was possible– it will horrify you. It will make you question the goodness of humanity.
Even if a contributor happened to beat users to finding a breaking change, there’s a pretty decent chance they won’t catch it anyways. Untested edge cases are the most likely things to break.
The contributor’s burden
Now let’s talk a bit about the social side of not using lockfiles in libraries.
As mentioned previously, the majority of breaking changes that occur in library dependencies will be development dependencies.
Some percentage of these breaking changes will be caught and (hopefully) fixed by regular contributors. However, the remaining breaking changes will be caught by new or infrequent contributors.
Contributing to a project for the first time is a very intimidating experience for anyone who is not an open source veteran. If the first thing a potential new contributor runs into is a broken build or test suite, they could be so intimidated that they decide to forget about contributing back.
Even as someone who contributes to lots of open source projects, it’s a terrible thing to have to debug a build system for a few hours when you just want to fix a bug or add a small feature to a library.
The user’s burden
Of course not every breaking change is a development dependency, and theoretically contributors could catch breaking changes before users notice them. In that (narrow) scenario, aren’t we shifting the burden from contributors to users?
First, it’s not going to be every user that is affected by this. It’s well agreed upon that applications should be using lockfiles, and if they are then they won’t be affected by sudden breaking changes.
We’re only talking about users that are either installing a package for the first time, or are upgrading the versions of their dependencies.
For users upgrading their dependencies, they should be trained to look out for breaking changes and if they encounter one, to roll back the version to a working state and open an issue in the library.
The only really negative experience that we are adding is for new users, which is admittedly terrible. You want new users to have the best possible experience.
But remember that they already face this burden in the majority of cases. It’s only when contributors could have caught breaking changes before users experienced them that we are now placing an additional burden on new users.
It’s also important to note that new users make up a minority of the total installations. The majority of the time it will be existing users that are the first to catch breaking changes because they are the ones installing a library the most.
There is a simple universal rule that everyone should follow with Yarn: If you are installing a new dependency or upgrading an existing one you should check to make sure the package works as intended and is not breaking your code.
You should follow that rule regardless of what your libraries are doing.
Without lockfiles it gets even more complicated: In applications or libraries, if there is no lockfile, you will have to check the dependencies every time you install or re-install them and make sure that everything still works. Otherwise the build might be broken or the tests might fail. You could break something without even realizing it. You could run into situations where code works on your machine but no one else’s.
The idea that not using lockfiles in libraries somehow saves users from encountering breaking changes is at best extremely rare and at worst is never true.
By not using lockfiles in libraries, the only tangible thing you accomplish is making the project harder to contribute to.
We should work as a community to keep all of our libraries up to date. We should go out and build tooling for automatically upgrading dependencies that makes it painless to do. There has been attempts at such tooling before, but we can do better.
Please commit your