How to write to Partial sealed directory #1295
Comments
Hi Vladimir,
Thanks, |
Hi Serge, thank you for such a detailed explanation. May I ask a slightly different question? I have a solution in some directory that contains multiple C++ projects located in sub-directories, and during compilation each project directory gets an x64 directory with intermediate output. I don't really care about that output because the actual results are produced in a different directory. The structure looks like this:
What is the best thing to use in this case? I tried using sealed source directories for Prj1 ... Prj100 and regular output directories for Prj1\x64 to Prj100\x64, but it is quite painful, especially considering that in reality each project has a unique name. I was planning to declare Root as a partial sealed directory, because x64 is cleaned after checkout from VCS, but I don't know how to specify the intermediate output. Thank you, Vlad |
If you have multiple projects generating intermediate outputs under each project root, an easy way to deal with that, one that doesn't require specifying each project root or its individual outputs, is to declare a catch-all shared opaque directory at the root of your source tree. E.g. if you have projects src/prj1, src/prj2, etc., you can declare a shared opaque at 'src'. That allows any output to be produced under that folder (and recursively below). This shared opaque can be declared for each pip (tool execution), which means each pip will have an output directory that can be consumed by downstream tools. Each output directory (even though they all share the common 'src' root) will contain only the outputs generated by the corresponding pip. So if you have a downstream tool that then creates a final deployment (presumably outside of the 'src' structure), this tool needs to take a dependency on all the output directories representing intermediate outputs. Thanks, |
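The per-pip attribution described above can be pictured with a small toy model. This is only a sketch in plain TypeScript, not BuildXL or DScript code: the `Write` type, the `outputsPerPip` function, and the pip names are all hypothetical and exist only to show that several pips can write under one shared 'src' root while each pip's output directory still lists only its own files.

```typescript
// Toy model only: this is NOT BuildXL's API, just an illustration of the idea.
type Write = { pip: string; path: string };

// True when `path` is `root` itself or lies anywhere below it.
function isUnder(root: string, path: string): boolean {
  const prefix = root + "/";
  return path === root || path.slice(0, prefix.length) === prefix;
}

// Groups the files written under a shared root by the pip that produced them.
function outputsPerPip(root: string, writes: Write[]): Record<string, string[]> {
  const result: Record<string, string[]> = {};
  for (const w of writes) {
    if (!isUnder(root, w.path)) continue; // outside the shared root: not attributed here
    const files = result[w.pip] || [];
    files.push(w.path);
    result[w.pip] = files;
  }
  return result;
}

// Hypothetical pips and paths, chosen to mirror the src/prj1, src/prj2 example.
const writes: Write[] = [
  { pip: "compilePrj1", path: "src/prj1/x64/a.obj" },
  { pip: "compilePrj2", path: "src/prj2/x64/b.obj" },
  { pip: "compilePrj1", path: "src/prj1/x64/c.obj" },
];

const perPip = outputsPerPip("src", writes);

// A downstream deployment step would have to depend on every entry here.
const deploymentDependencies = Object.keys(perPip);
console.log(deploymentDependencies);
```

In this model the deployment step's dependency list grows with the number of pips, which matches the point that the downstream tool must depend on all of the intermediate output directories.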
How should I declare the src directory? I tried declaring it as a source sealed directory and as a partial sealed directory, and every time I got an error. Vlad |
The problem with a source sealed directory is that it doesn't allow outputs to occur under it. A better candidate for your scenario may be a partial sealed directory, where you can glob for all sources and which can be declared as an input for your pips. Thanks, |
It does not work, and I got this error message:
Code is here:
Perhaps I'm doing something wrong? |
Would you mind sharing the full code? I suspect 'outDir' in this case is an exclusive opaque directory, which wouldn't allow any inputs underneath. In order to specify that output as a shared opaque directory, you can follow the examples here. |
I got the following error:
Thank you, |
Test3-commented.zip Thanks, |
Hi Serge, thank you for your help. I found that I accidentally used the wrong type of directory. It looks like I clicked on the wrong link in the Wiki. I agree with the comments about untrackedDirectories. I just need something working first and will then go into the details. The same applies to qualifiers. I have 2 more questions:
Vlad |
If 'outDir' will contain some files upfront, besides the ones produced by the tool, I suggest you turn that directory into a shared opaque one as well. Exclusive opaque directories are wiped out because they are not allowed to contain sources or outputs from other tools (that's the 'exclusive' part), so wiping them out guarantees build determinism. Making it a shared opaque will allow sources and other outputs to be there. As for having a shared cache across machines: yes, that's an existing cache feature. Unfortunately, we don't have public documentation for it yet. In essence, it's a service that has to be configured and has some tricky deployment steps. I opened an internal work item to do some minimal documentation around this scenario and will let you know when we have that available. Thanks, |
Thank you Serge. It works great with outDir. I will do more testing, but it looks like I have everything I need. I will wait for the documentation. Thank you again for your help. |
Hi Serge I have another question about Test3-commented.zip:
Every build solution produces 63 Mb files into Vlad |
Hi Vladimir,
Thanks, |
Hi Serge.
Thank you, |
If you build with a filter pointing to outDir (look at path-based filters), as long as you get cache hits you won't need to transfer data from the cache. E.g. if you have a chain of projects like A -> B -> C (the arrow meaning a dependent), and both A and B are cache hits and C is a miss, only the outputs produced by B will be brought from the cache for C to consume (assuming A does not produce files that go into outDir). This is called lazy materialization, and it is the default behavior of BuildXL. As for untracking directories, that's something that can be specified at the pip level when calling Transformer.execute(...). You can take a look at that here. The option is unsafe for the reasons I mentioned before. Even if the outputs are intermediate (meaning that they are not part of outDir), you'd have to be sure they are not consumed as part of the build (e.g. a project produces an obj in some temp directory that is later consumed by some other project to produce the final executable). If that were the case, you'd need those files in the cache so they can get replayed properly. Thanks, |
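The A -> B -> C example can be sketched as a tiny model of lazy materialization. This is an illustration under simplifying assumptions, not BuildXL's scheduler: the `Pip` type and `pipsToMaterialize` function are hypothetical, and the model deliberately ignores the filter, i.e. any cache-hit outputs that land in outDir and would be materialized for that reason.

```typescript
// Toy model only, not BuildXL code.
type Pip = { name: string; dependsOn: string[]; cacheHit: boolean };

// Returns the cache-hit pips whose outputs must be brought from the cache
// because a pip that has to run (a cache miss) consumes them directly.
function pipsToMaterialize(pips: Pip[]): string[] {
  const hit: Record<string, boolean> = {};
  for (const p of pips) hit[p.name] = p.cacheHit;

  const needed: string[] = [];
  for (const p of pips) {
    if (p.cacheHit) continue; // a hit is not executed, so its inputs are not needed on disk
    for (const dep of p.dependsOn) {
      if (hit[dep] === true && needed.indexOf(dep) === -1) needed.push(dep);
    }
  }
  return needed.sort();
}

// The chain from the comment: C depends on B, and B depends on A.
const chain: Pip[] = [
  { name: "A", dependsOn: [], cacheHit: true },
  { name: "B", dependsOn: ["A"], cacheHit: true },
  { name: "C", dependsOn: ["B"], cacheHit: false },
];

console.log(pipsToMaterialize(chain)); // only B's outputs are needed, for C to consume
```

With a single pip building the whole solution there is no chain to exploit, which is why the question of untracking intermediates comes up in the comments that follow.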
Hi Serge. None of these intermediate files will be consumed later, so I can untrack them. But as I mentioned in my 3rd post, there are quite a few projects, and anybody can add a new project (it has already happened twice), and the intermediate data for that project will go to the cache. Just in case, I will explain the full picture. We have a build process that works fine right now, but it rebuilds everything every single time. I would like to improve it by using BuildXL for one of the big solutions. I will have only one pip for now, which will build that solution. If no inputs change, BuildXL will restore the outputs from the cache and speed up the build. So, if I understand correctly, because it is a single pip, BuildXL has to materialize all outputs, including intermediate files. So in my case, if somebody will add But on next run, even nothing changed, BuildXL has to materialize everything to As result there will be extra time to store intermediate data whenever anything changes, and extra time to materialize unnecessary data when nothing changes. I could write code that scans the solution and all its projects to find where they write intermediate files and generates the BuildXL script on the fly, but that is not an easy task and would probably have its own issues. That is why I asked whether we can ignore that intermediate directory. If we could, there would be only a little bit of code, and everything would stay reliable through future changes. Thank you, |
Hey Vladimir, coming back to the untracked files question: today this is statically provided data in the form of individual files or directory cones. This means you need to know upfront the location of the artifacts you want to untrack. I understand that the dynamic nature of projects being added and removed can make this hard to keep in sync, considering you want to untrack all intermediates of each project. Maybe if you can make all projects put their intermediates under a common folder (e.g. /out/obj), that would be easier to maintain. And one last point: Bxl uses hardlinks by default, so there is actually no extra space taken by files in the cache. The outputs that you see (whether final or intermediate) are actually hardlinks from the local cache. There is of course some natural overhead when tracking and materializing more files, but I'd be curious to see what the real impact is of untracking all intermediates vs. not. Thanks, |
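The maintenance difference between per-project cones and one common folder can be shown with a short sketch. It is only a model over plain string paths, assuming forward-slash separators; `isUnderCone`, `isUntracked`, and the example paths are hypothetical and do not reproduce how BuildXL evaluates untracked scopes.

```typescript
// Toy model only: plain string paths, not BuildXL's untracking implementation.

// True when `path` is `cone` itself or lies anywhere below it.
function isUnderCone(cone: string, path: string): boolean {
  const prefix = cone + "/";
  return path === cone || path.slice(0, prefix.length) === prefix;
}

// True when at least one statically declared cone covers `path`.
function isUntracked(path: string, untrackedCones: string[]): boolean {
  return untrackedCones.some((cone) => isUnderCone(cone, path));
}

// Hypothetical layout 1: every project keeps its own x64 folder,
// so the static list has to name each project explicitly.
const perProjectCones = ["/src/prj1/x64", "/src/prj2/x64"];

// Hypothetical layout 2: all intermediates go under one common folder.
const commonCone = ["/out/obj"];

// A newly added prj3 is missed by the per-project list...
console.log(isUntracked("/src/prj3/x64/a.obj", perProjectCones));
// ...but is covered automatically by the single common cone.
console.log(isUntracked("/out/obj/prj3/a.obj", commonCone));
```

The first list has to be edited every time a project is added, while the single common cone does not, which is the trade-off against the one-time cost of changing where each project writes its intermediates.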
Hi Serge, I'm not using a shared cache yet. You said there will eventually be instructions on how to set it up. And as you said, without it there is not much use for it. My project is in the research phase right now, and I'm trying to see what I can do as a first step to improve build time. I tried the msbuild resolver for some time, got a bunch of internal errors, and gave up. I can change the projects to output all intermediate files to one common directory, but we have quite a few such projects, and I was trying to figure out whether it is possible to avoid that, because it is a lot of work. As for the last paragraph: yes, I saw that BuildXL uses hard links. But if the data is not in the local cache, it has to be materialized from the network, and as a result a lot of data gets transferred. Thank you, |
Hi, I was wondering if there was ever any documentation or information provided about setting up a shared cache? |
Hi
I'm trying to run a tool from BuildXL that consumes files from some directory and writes new files to the same directory.
From what I have read, it looks like I have to use a partial sealed directory, but I can't figure out how to specify that the output of that tool should go to that partial sealed directory.
For example:
I cannot specify 'sealedDir' in 'outputs' because it is not compatible. If I try to use 'sealedDir.root' there, BuildXL hits some internal error. If I try to use 'dir1' in 'outputs', I get an error that 'dir1' coincides with the sealed directory. What is the correct way to use it?
Thank you,
Vlad