It might solve it for git, but this looks like something the Review Board team came up with, and they have to integrate with many other version control systems like SVN, CVS, Perforce, etc. Seems like this is meant to address supporting many different version control systems with a single format.
I've worked at a place that used Review Board, and SVN as their primary vcs, but many devs used a local git-svn mirror for their work. Sometimes this caused problems with uploading diffs, especially if svn and git-svn were being mixed in one review. Having the Review Board cli generate a common diff format for both would have helped with that.
> Seems like this is meant to address supporting many different version control systems with a single format.
I'm sorry, this is simply wrong on so many levels. You're lauding a solution in search of a problem. As OP pointed out, this is already a solved problem as proven by Git. Git is not using a proprietary format. The problem of "integrate with many other version control systems" depends on whether those version control systems want to work on adding support for this feature. I guarantee you there isn't a single SVN or Mercurial maintainer complaining that they would love to share patches with Git but they are blocked because they cannot implement, let alone design, a format to exchange patches. That is not the hard part. That doesn't even register as a concern.
Git is using a proprietary variant on top of Unified Diffs. Unified Diffs themselves convey very little information about the file being modified, focusing solely on the line-based contents of text files and allowing vendors to provide their own "garbage" lines containing anything else. Every SCM that tracks information beyond line changes in a diff fills out the garbage data differently.
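To make the "garbage lines" point concrete, here is a minimal Python sketch. The function name `split_unified_diff` and the sample diff are made up for illustration and are not taken from any SCM's code. It keeps only the lines Unified Diff itself defines (`---`, `+++`, `@@` headers and hunk bodies) and collects everything else, which is where Git's `diff --git` and `index` lines end up:

```python
import re

# Matches a hunk header such as "@@ -1,3 +1,4 @@"; a missing count means 1.
HUNK_RE = re.compile(r"^@@ -\d+(?:,(\d+))? \+\d+(?:,(\d+))? @@")


def split_unified_diff(text):
    """Split diff text into (standard, extras) line lists.

    "standard" holds the lines Unified Diff itself defines: file headers,
    hunk headers, and hunk bodies. "extras" holds everything else, which is
    where each SCM puts its own metadata.
    """
    standard, extras = [], []
    old_left = new_left = 0  # lines still expected in the current hunk

    for line in text.splitlines():
        if old_left > 0 or new_left > 0:
            # Inside a hunk body: count lines by their prefix.
            standard.append(line)
            if line.startswith("-"):
                old_left -= 1
            elif line.startswith("+"):
                new_left -= 1
            elif not line.startswith("\\"):
                # Context line: counts against both sides.
                old_left -= 1
                new_left -= 1
            continue

        match = HUNK_RE.match(line)
        if match:
            old_left = int(match.group(1) or 1)
            new_left = int(match.group(2) or 1)
            standard.append(line)
        elif line.startswith(("--- ", "+++ ", "\\ ")):
            standard.append(line)
        else:
            extras.append(line)

    return standard, extras


# Hypothetical Git-style diff used only as sample input.
SAMPLE = """\
diff --git a/hello.txt b/hello.txt
index 3b18e51..a042389 100644
--- a/hello.txt
+++ b/hello.txt
@@ -1 +1 @@
-hello world
+hello, world!
"""

standard, extras = split_unified_diff(SAMPLE)
for line in extras:
    print("extra:", line)
```

A tool that only understands plain Unified Diff would see the five `standard` lines and have to guess at, or ignore, the two `extras` lines.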
The intent here isn't to let you copy changes from one type of repository to another, but to have a format that can be generated from many SCMs that a tool could parse in a consistent way.
Right now, tools working with diffs from multiple types of SCMs need at least one diff parser per SCM (some provide multiple formats, or have significantly changed compatibility between releases).
For SCMs that lack a diff format (there are several) or lack one that contains enough information to identify a file or its changes (there are several), tools also need to choose a method to represent that information. That often means yet another custom diff format that is specific to the tool consuming the diff.
We've spent over 20 years dealing with the headaches and pain points here, giving it a lot of thought. DiffX (which is now a few years old itself) has worked out very well as a solution for us. This wasn't done in a vacuum, but rather has gone through many rounds of discussion with developers at a few different SCM vendors who have given thought to these issues and supplied much valuable feedback and improvements for the spec.
Proprietary in the sense that it was created by the Git team for Git's purposes, rather than something documented or proposed for wider adoption.
Other SCMs can and do use a Git-style diff format, but as there's no defined grammar, there are sometimes important differences. For example, Mercurial's Git-style diffs represent revisions in a different format than Git's do, and with different meanings; they reuse Git "index" lines for binary files but include SHAs of the file contents instead of any sort of revision; and they have a header block that should be stripped out before sending to a Git-style diff parser.
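As a rough illustration of the header-stripping step, here is a minimal Python sketch. The helper name `strip_leading_header` is hypothetical, and the sample header is an assumption modeled on what `hg export` typically emits rather than something specified in this thread. It drops everything before the first `diff --git ` line:

```python
def strip_leading_header(patch_text):
    """Drop everything before the first "diff --git " line.

    Assumes no header or commit-message line starts with "diff --git ".
    Returns the input unchanged if no such line is found.
    """
    lines = patch_text.splitlines(keepends=True)
    for index, line in enumerate(lines):
        if line.startswith("diff --git "):
            return "".join(lines[index:])
    return patch_text


# Made-up sample; the IDs and user are placeholders, not real data.
HG_PATCH = """\
# HG changeset patch
# User Example Dev <dev@example.com>
# Date 1700000000 0
# Node ID 1111111111111111111111111111111111111111
# Parent  2222222222222222222222222222222222222222
Fix greeting

diff --git a/hello.txt b/hello.txt
--- a/hello.txt
+++ b/hello.txt
@@ -1,1 +1,1 @@
-hello world
+hello, world!
"""

cleaned = strip_leading_header(HG_PATCH)
print(cleaned.splitlines()[0])
```

This only covers the header block. The differing revision formats and the reused "index" lines would still need their own per-SCM handling, which is the maintenance burden being described.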
Yep! We spent 20 years dealing with these problems and in those 20 years nobody really solved these pain points. So we talked to some SCM vendors, bounced ideas around, built a spec, got feedback from them, repeated off-and-on for a couple years until we got the current draft, and implemented it for our needs.
It's been a few years now, and so far so good for the purposes we built it for. And it's there for any other tool or SCM authors to use if it also happens to be useful to them.
Feels more like in 20 years nobody else really has those pain points.
1. For most people using multiple SCMs is just a huge and easily-avoidable mistake. Most people can just mandate a single SCM for a project and then all these problems are moot.
2. For the things listed in TFA:
> A single diff can’t represent a list of commits
That's what "patch" and "patch format" is for. It works great.
> There’s no standard way to represent binary patches
Very unclear why anyone needs this. There's no standard way to code-review a binary diff (it depends what the blob is that you're diffing) so how would it help if you had this standard way to represent the diff?
> Diffs don’t know about text encodings (which is more of a problem than you might think)
This goes away if people on a project agree a particular encoding (which is going to be utf-8 let's face it). If someone sends a diff in an incorrect file encoding via diffx it will still apply wrong if someone uses a non-diffx aware (aka standard) tool to apply it. So diffx doesn't really fix this problem.
> Diffs don’t have any standard format for arbitrary metadata, so everyone implements it their own way.
This goes away if you just use one SCM for a project which you should anyway for everyone's sanity.
> 1. For most people using multiple SCMs is just a huge and easily-avoidable mistake. Most people can just mandate a single SCM for a project and then all these problems are moot.
You talk about SCMs, we're talking about VCSs. Where it's not just source code under control, or even source code with a handful of binary assets. Imagine dealing with a VCS that has to handle 15 years and a few petabytes of binary assets. Or individual files that were multiple gigabytes and had changes made to them several times per day. Can git do that gracefully just by itself? Or SVN? Even Perforce struggled with something like that back in the day.
> Very unclear why anyone needs this. There's no standard way to code-review a binary diff (it depends what the blob is that you're diffing) so how would it help if you had this standard way to represent the diff?
A standard way of handling the binary data doesn't mean understanding the binary data. You can leave that up to specific tools. What you need though is a way to somehow package up and describe those binary diffs enough that you can transport the diff data and pick the right tool to show you the actual differences.
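A minimal Python sketch of that idea, with every name in it (`BinaryChange`, `VIEWERS`, `review`) made up for illustration; this is not DiffX's format or any tool's API. The envelope carries just enough description (a path and a media type) for the consumer to choose a tool, without the transport layer understanding the bytes:

```python
from dataclasses import dataclass


@dataclass
class BinaryChange:
    """Hypothetical envelope for one binary file change."""
    path: str
    media_type: str  # e.g. "image/png"; tells the consumer which tool to use
    old: bytes
    new: bytes


def describe_generic(change):
    # Fallback when no specialized tool is registered: report sizes only.
    return f"{change.path}: {len(change.old)} -> {len(change.new)} bytes"


def describe_image(change):
    # Stand-in for handing both versions to an image comparison tool.
    return f"{change.path}: open in an image comparison viewer"


# Registry mapping media types to the tool that knows how to show them.
VIEWERS = {"image/png": describe_image}


def review(change):
    viewer = VIEWERS.get(change.media_type, describe_generic)
    return viewer(change)


print(review(BinaryChange("logo.png", "image/png", b"a", b"bc")))
print(review(BinaryChange("model.bin", "application/octet-stream", b"abc", b"abcde")))
```

The point of the sketch is the separation: the envelope is uniform, and only the registered viewer needs to know anything about the content.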
> This goes away if you just use one SCM for a project which you should anyway for everyone's sanity.
And if wishes were fishes, I'd never be hungry again. If you have a lot of history, a lot of data, a lot of workflows and tools built up around multiple VCSs, then changing that to just one VCS is going to be a massive undertaking. And not every VCS can handle all of the kinds of data that might get input into it. Some are going to be good at text data, some might handle binary assets better. Some might have a commit model that makes sense for one type of workflow but not for another. For example, you might be dealing with binary assets where you can only have one person working on a specific file at a time because there's no real way to merge changes from multiple people, so they need to lock it. For text assets though, you might be able to handle having multiple people work on a file. To afford both workflows, your VCS now needs to not only support both locking modes, but be hyper-aware of the specific content to know which kind of locking to permit for specific files.
The world doesn't always fit into the nice little models that the most popular VCSs provide. So if you're trying not to limit your product to supporting just that handful of popular VCSs, you can't just assume everything will fit into one of those models.
> Imagine dealing with a VCS that has to handle 15 years and a few petabytes of binary assets.
Part of the problem is that you're fabricating imaginary problems that no one is actually experiencing, only to argue that the solution for these imaginary problems is a file format.
That's a very strong statement. A less aggressive approach to discussion might involve asking for a concrete example of a problem rather than assuming a bad-faith argument.
Off the top of my head, and just spitballing, I would be more surprised if mature game devs or animation studios didn't want to version control pretty massive asset libraries.
> Off the top of my head, and just spitballing, I would be more surprised if mature game devs or animation studios didn't want to version control pretty massive asset libraries.
I once closed a bug with a comment that it was old enough to drink. Also that the lines mentioned in the bug no longer existed, although the file did. Couldn't even satisfy my curiosity about what it had originally looked like, since the change history didn't go back that far.
> I guarantee you there isn't a single SVN or Mercurial maintainer complaining that they would love to share patches with Git
I was one of those maintainers. So you're already wrong there. As I described in my parent comment, I've worked somewhere this was an actual problem I encountered. I was responsible for both maintaining our SVN repository, and our Review Board instance, so I have had to actually deal with this.
DiffX is a bit younger than the tooling I've written, but I have added custom diffing tools to the SVN client for one team I worked with. I've also written plenty of tools that used the information provided by a VCS (sometimes even poking around in the server-side data), but external to it. So given a few days to refresh my memory on the interfaces, I could probably whip something up for SVN pretty quickly.
Exactly that. They all do things so differently that you end up creating and maintaining a separate parser for every SCM's diff format, and sometimes doing a lot of normalization of content or modification to include information the format lacks that's needed to apply the patches. And those are just for the ones that actually have a diff format -- many don't.
We needed something for ourselves at the very least. Much of DiffX came from thinking about these pain points and from talking to other SCM vendors whose engineers have also given some thought to these problems.
DiffX would have been nice to have available way back when I was trying to add support for our custom in-house vcs to Review Board. We had to either contort the diffs from our vcs to some format already understood by Review Board, which was sometimes difficult due to how the vcs structured the data it stored, or add a whole new parser to our Review Board instance, which would have been a major maintenance pain.
As an aside, I applaud you for creating Review Board. I've introduced its usage with several teams that I've worked with, and it really helped change how those teams operated, from a fly-by-night sort of development to actually having a process. The reduction in bugs and improvement in code quality were quite useful too.
I'm really glad it was useful for you and your teams! :) Hearing that kind of thing always brightens my day. I've felt very lucky getting to work on this as my job all these years.
It'd have been amazing having something like DiffX when we started building some of these SCM integrations too. It's really saved us a lot of trouble with some of the recent ones we've built (PlasticSCM / Unity Version Control, Keysight SOS, and ClearCase), which didn't have a format to work with and needed a lot of extra metadata for lookups and some other stuff.
They are using it. Review Board is a successful project that's been around for a long time, and it's solving a problem they had. One of the most common workflows with Review Board for source code reviews is to use the RBTools command line tools to post or update reviews. The cli would be the one generating the diff (although it supports uploading diffs that you generate, iirc). I haven't looked into the details, but I assume RBTools can generate DiffX diffs, which are probably easier for the backend to process. (Edit: from what chipx86 has said in some of his posts here, they have been using it for several years now.)
I don't really see this as pushing anything, more as documentation of something they did for themselves, but are also willing to provide to anyone else if they want to use it. Same as how the source code for the core Review Board product is available for anyone.
If you're happy with the diff format you're using in your workflow, keep using that. No one's twisting your arm to switch to DiffX.
You're just being obtuse. It's been explained multiple times that there are tools external to the version control systems that can generate and consume this format. Just because there's no 'svn-diffx' or 'hg-diffx' command/tool built into the vcs itself, doesn't mean that this format can't be generated and used by other tools.
So to answer your question, any vcs that has had tools written for it to generate this format. And it sounds like it's most of the major ones as far as Review Board is concerned.