Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add basic mix test partitioning #9422

Merged
merged 5 commits into from Oct 17, 2019
Merged

Add basic mix test partitioning #9422

merged 5 commits into from Oct 17, 2019

Conversation

josevalim
Copy link
Member

@josevalim josevalim commented Oct 16, 2019

This PR implements the functionality described on step #3 here: https://medium.com/@cblavier/how-we-improved-our-elixir-build-speed-by-5x-d45393c6700f

The rationale is: not all tests can run async (think about tests that change the current directory, for example) so having OS based parallelism is a great help. This can also be used to automate CIs. With this PR, you would be able to do:

MIX_TEST_PARTITION=1 mix test --partitions 4
MIX_TEST_PARTITION=2 mix test --partitions 4
MIX_TEST_PARTITION=3 mix test --partitions 4
MIX_TEST_PARTITION=4 mix test --partitions 4

Note the partition itself is given as an environment variable so it can, for example, be used to configure a database and other things in config files.

The current implementation simply partitions files but @wojtekmach suggested we can provide better heuristics in the future. For example, we can partition based on slowest tests, in case we want to use the manifest.

Thanks for sharing @cblavier!

{partition, ""} <- Integer.parse(partition),
true <- partition in 1..total do
partition = partition - 1
Enum.filter(files, &(:erlang.phash2(&1, total) == partition))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there are reason to use hash based partitioning instead of something like:

Enum.filter(Enum.with_index(files), fn {_file, index} -> rem(index, total) == partition end)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reason, your approach is definitely fairer.

Copy link

@cblavier cblavier Oct 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rationale behind hash based partitioning (vs linear & alphabetical ordering) was to get all my slow feature tests (which are located in the same feature folder) randomly distributed over test runners.

Very excited to see this feature integrated in Elixir, thanks guys!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cblavier in this case, sort+round-robin is actually fairer than hashing, because we will guarantee that, if the feature folder has four files, each file goes to a different partition (assuming at least four partitions). With hashing, they would be distributed unfairly.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it, you're right 👍

@josevalim
Copy link
Member Author

This is ready for review and merge.

José Valim and others added 2 commits October 17, 2019 11:59
Co-Authored-By: Fernando Tapia Rico <fertapric@gmail.com>
Co-Authored-By: Tonći Galić <tuxified@gmail.com>
@devonestes
Copy link
Contributor

Is anyone worried that by adding this feature that we're making it easier to make poor system design choices? If enough of your tests involve exclusive access to global resources to make a substantial difference in test runtime, those slow tests might be valuable feedback that you've made a design decision that's potentially going to lead to issues in a highly concurrent system like the BEAM.

Also, this essentially makes it impossible to run any test in their suite in series if they depend on exclusive access to a resource outside of the BEAM (like the file system or a database). How would one run their tests if they have async: false tests that rely on exclusive access to a folder on the file system and also have tests that require exclusive access to something like a global ETS process? It seems like this might result in unintended behavior for folks.

@josevalim
Copy link
Member Author

josevalim commented Oct 17, 2019

@devonestes I don't think the cause and consequence relationship as you propose exists here. I.e. I don't think people will say "we will rely on global behaviour because this feature exists". And if they were to conciously make said decision, that's unlikely to be the only poor decision they would do. :D

If the concern is poor design, the most likely scenario is that people will have a poor system design, which often reflects in poor tests, but at least they will be provided with an option to reduce their feedback cycle time at some point (which we can hope will motivate further improvements to the system). But of course, this is speculation, and your speculation is as good as mine.

However, I can confidently say they are also cases where we literally have no option. Look at Mix. It is the slowest test suite in Elixir by far, we do our best to organize and structure the code, but because a lot of things on Mix are controlled by the current working directory, which is global, there is nothing we can do but disable async.

Another example is anyone who is running MySQL. MySQL users cannot use the SQL Sandbox concurrently due to deadlocks. Shall we tell them their only option is to change the database? So I would say this is a very welcome addition in said cases.

# ordering, so different OSes could return a different order,
# meaning run across OSes on different partitions could run
# duplicate files.
for {file, index} <- Enum.with_index(Enum.sort(files)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to be sure, Enum.sort/1 isn't using system locale when comparing strings.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it isn't, but I think we can assume we are running on similar file systems, so that the sorting is consistent.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It uses <= or equivalent semantics.

@devonestes
Copy link
Contributor

Yeah, I don't see many people developing tools like Mix, but the MySQL thing is a good point. They still need an equal number of databases to the number of partitions they have, but that's not so bad and would at least make things nicer for them.

I do think the feedback we get from tests is important - if something is hard to test, then it's a good sign that the design of the code under test could be improved - but to me the MySQL thing overrules by far. That's still used a ton...

Ok, it's a good idea 🎉 😉

@josevalim
Copy link
Member Author

They still need an equal number of databases to the number of partitions they have, but that's not so bad and would at least make things nicer for them.

And FWIW, it just works. All you need to do is this in their config/test.exs:

database: "demo_test#{System.get_env("MIX_TEST_PARTITION")}",

@josevalim josevalim merged commit 058b224 into master Oct 17, 2019
@josevalim
Copy link
Member Author

❤️ 💚 💙 💛 💜

@josevalim josevalim deleted the jv-partition branch October 17, 2019 11:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants