Discussion:
[Revctrl] DARCS correctness question
William Uther
2008-01-17 02:57:23 UTC
Permalink
Hi,
I have brief question about DARCS correctness. I must admit to
being more familiar with Operational Transformation theory better
than Darcs theory of patches, but I believe the two are very
similar. There is a brief intro to OT theory here: http://
revctrl.org/OperationalTransformation . In OT theory there are two
correctness constraints on any system, which I believe are related to
the two constraints in DARCS that patch commute is its own inverse,
and that:

ABC <=> B'A'C <=> B'C'A'' <=> C''B''A'' <=> C''A'''B''' <=>
A''''C'''B''' <=> A''''B''''C'''' where A'''' = A, B'''' = B and
C'''' = C

Basically, if you start off with a set of patches, you may need to
transform them to apply them, but if you do it right then the order
in which you apply them should not affect the final state of the system.

My question is: does that requirement apply to conflicts (merger
patches) in the DARCS theory?

The reason I ask that question is because my understanding of the
DARCS commute operation for textual patches is that it is very
similar to Ressel's Transformation Functions in OT theory. These are
known NOT to be valid. However, there are no conflicts in OT theory
- any result is acceptable as long as you get the same result
everywhere. The particular counter example I have for Ressel's
transformations, when translated to work with DARCS, causes
conflicts. Re-stating the question above in a slightly different form:

If I have two Darcs workspaces, each of which have the same set of
patches, but in different orders, should these workspaces be
identical (even in the presence of conflicts)?

If the answer is 'yes', then I have a counter-example. If the answer
is 'no', is there a write-up on how this affects correctness?

Here is the (possible) counter example (converted to Darcs from the
revctrl wiki page mentioned above):

#! /bin/sh

mkdir darcsTest
cd darcsTest
mkdir site1
cd site1
darcs init
cat > myfile <<EOF
Line A
Line B
Line C
EOF
darcs add myfile
darcs record -am "initial record" --author=EMAIL
cd ..
darcs get site1 site2
darcs get site1 site3
cd site1
cat > myfile <<EOF
Line A
Line B
Line X
Line C
EOF
darcs record -am "op 1" --author=EMAIL
cd ../site2
cat > myfile <<EOF
Line A
Line C
EOF
darcs record -am "op 2" --author=EMAIL
cd ../site3
cat > myfile <<EOF
Line A
Line Y
Line B
Line C
EOF
darcs record -am "op 3" --author=EMAIL
#having set up the operations, lets do some pulls to sync everything
#first each of site2 and site3 pulls the other
darcs pull -a ../site2
cd ../site2
darcs pull -a ../site3
# next they both pull site1
darcs pull -a ../site1
cd ../site3
darcs pull -a ../site1
# Now we'll check for differences in the result
cd ..
diff site2/myfile site3/myfile


If the final diff outputs anything then the workspaces are in
different states after pulling all the patches... and I get output
under Darcs 1.0.9rc1 which is the version installed on one of the
servers here.

Thanks in advance for your thoughts,

Will :-}
Ian Lynagh
2008-01-17 12:45:56 UTC
Permalink
Post by William Uther
If I have two Darcs workspaces, each of which have the same set of
patches, but in different orders, should these workspaces be
identical (even in the presence of conflicts)?
The pristine trees should be, but the way the conflicts are marked in
the working directory need not be.
Post by William Uther
# Now we'll check for differences in the result
cd ..
diff site2/myfile site3/myfile
$ diff -q site2/myfile site3/myfile && echo OK
Files site2/myfile and site3/myfile differ
$ diff -q site2/_darcs/pristine/myfile site3/_darcs/pristine/myfile && echo OK
OK


Thanks
Ian
William Uther
2008-01-17 21:39:43 UTC
Permalink
Post by Ian Lynagh
Post by William Uther
If I have two Darcs workspaces, each of which have the same set of
patches, but in different orders, should these workspaces be
identical (even in the presence of conflicts)?
The pristine trees should be, but the way the conflicts are marked in
the working directory need not be.
In OT theory, a set of patches that are all transformed to apply should
result in _identical_ output, regardless of order. If the output really
is identical, then shouldn't the conflicts be marked up in the same way?
My understanding from this that the merger patches are NOT identical.

You claim that this isn't a problem - i.e. you seem to be claiming that
merger patches do not need to obey the same correctness requirements
as normal patches. Can you point me to a formal justification for this?

Thanks,

Will :-}
David Roundy
2008-01-17 21:57:18 UTC
Permalink
Post by William Uther
Post by Ian Lynagh
Post by William Uther
If I have two Darcs workspaces, each of which have the same set of
patches, but in different orders, should these workspaces be
identical (even in the presence of conflicts)?
The pristine trees should be, but the way the conflicts are marked in
the working directory need not be.
In OT theory, a set of patches that are all transformed to apply should
result in _identical_ output, regardless of order. If the output really
is identical, then shouldn't the conflicts be marked up in the same way?
My understanding from this that the merger patches are NOT identical.
You claim that this isn't a problem - i.e. you seem to be claiming that
merger patches do not need to obey the same correctness requirements
as normal patches. Can you point me to a formal justification for this?
The patches aren't identical because they're in a different order. It's
perfectly normal in darcs that changes in different order are described by
different patches--in fact it's the basis of the whole system. If you
reorder the changes into the same order, then the patches are the same.
Merger patches do obey precisely the same correctness requirements as any
normal patches.

The conflict marking does depend on the order of changes in the repository,
but this doesn't really matter, since conflict-marking is not fundamental
to how darcs works. It's something that's done to the working directory
for the convenience of the user. We could remove this feature and darcs
would be just as correct (although rather more awkward to use).
--
David Roundy
Department of Physics
Oregon State University
William Uther
2008-01-18 00:05:22 UTC
Permalink
Post by David Roundy
Post by William Uther
Post by Ian Lynagh
Post by William Uther
If I have two Darcs workspaces, each of which have the same set of
patches, but in different orders, should these workspaces be
identical (even in the presence of conflicts)?
The pristine trees should be, but the way the conflicts are
marked in
the working directory need not be.
In OT theory, a set of patches that are all transformed to apply should
result in _identical_ output, regardless of order. If the output really
is identical, then shouldn't the conflicts be marked up in the same way?
My understanding from this that the merger patches are NOT identical.
You claim that this isn't a problem - i.e. you seem to be claiming that
merger patches do not need to obey the same correctness requirements
as normal patches. Can you point me to a formal justification for this?
The patches aren't identical because they're in a different order.
It's
perfectly normal in darcs that changes in different order are
described by
different patches--in fact it's the basis of the whole system. If you
reorder the changes into the same order, then the patches are the same.
Yes. But the resulting state after applying a series of patches
should be
the same, regardless of the order in which you've commuted the patches.
Post by David Roundy
Merger patches do obey precisely the same correctness requirements as any
normal patches.
Ok.
Post by David Roundy
The conflict marking does depend on the order of changes in the repository,
but this doesn't really matter, since conflict-marking is not
fundamental
to how darcs works. It's something that's done to the working
directory
for the convenience of the user. We could remove this feature and darcs
would be just as correct (although rather more awkward to use).
Okie. I think I understand this better now after Stephen's response.
Darcs' state is a partial order (represented using 'merger patches') and
the particular way you serialise that for display is irrelevant.

Thanks for the help,

Will :-}
Simon Marlow
2008-01-18 13:43:30 UTC
Permalink
Post by David Roundy
The conflict marking does depend on the order of changes in the repository,
but this doesn't really matter, since conflict-marking is not fundamental
to how darcs works. It's something that's done to the working directory
for the convenience of the user. We could remove this feature and darcs
would be just as correct (although rather more awkward to use).
Which reminds me - conflict markers are rather hard to use, because it's
not clear which change comes from which patch, and unless I'm mistaken the
order can vary from one file to another (perhaps even between hunks?).
Also, you don't get to see the original version of the file before either
patch. I'd like to see something like

v v v v v v (original)
foo
=========== (patch A)
bar
=========== (patch B)
baz
=========== (patch C)
wibble
^ ^ ^ ^ ^ ^

Cheers,
Simon

Stephen J. Turnbull
2008-01-17 23:09:32 UTC
Permalink
Post by William Uther
In OT theory, a set of patches that are all transformed to apply should
result in _identical_ output, regardless of order. If the output really
is identical, then shouldn't the conflicts be marked up in the same way?
Why would you expect that? Merger markup expresses a superposition
(hello, physicists!) of two texts; there's always a choice of order
when serializing.
Post by William Uther
My understanding from this that the merger patches are NOT identical.
The patches are the same, the notation is changed. That is, like most
systems, darcs marks the conflicts like this:

<<<<<<< status quo post patch applied first temporally
atext
atext
=======
btext
btext
btext
Post by William Uther
conflicting change due to patch applied second temporally
not

<<<<<< changes proposed by patch earlier in some arbitrary but global order
atext
atext
=======
btext
btext
btext
Post by William Uther
changes proposed by patch later in some arbitrary but global order
The latter will give consistent results, I think, but the former
definitely will be nondeterministic.

Doing this the way you expect *and* in a way that is easy for users to
understand is hard, I suspect. For example, in git, Mercurial, or
Darcs, you could just sort on some handy SHA1, but this would probably
be confusing in the best of all possible world for the case of cherry-
picking exactly one patch, because sometimes "mine" would appear on
top and sometimes "yours" would appear on top, and the SHA1s are not
human-meaningful. In the world we live in where most systems give
status quo status to the patch applied to the repo earlier in temporal
order, I'm sure it would disconcert most users.
William Uther
2008-01-17 23:59:50 UTC
Permalink
Post by Stephen J. Turnbull
Post by William Uther
In OT theory, a set of patches that are all transformed to apply should
result in _identical_ output, regardless of order. If the output really
is identical, then shouldn't the conflicts be marked up in the same way?
Why would you expect that? Merger markup expresses a superposition
(hello, physicists!) of two texts; there's always a choice of order
when serializing.
Well, in OT theory a large effort is made to make sure that the same
serialization is made in all cases. It is a bug if different orders
appear.
Post by Stephen J. Turnbull
Post by William Uther
My understanding from this that the merger patches are NOT identical.
The patches are the same, the notation is changed.
Ahh. I think that might have cleared up my confusion. Would it be
correct to say that DARCS is versioning partially ordered text, and it
is just choosing which total order to display to the user differently?
Post by Stephen J. Turnbull
Doing this the way you expect *and* in a way that is easy for users to
understand is hard, I suspect.
Good point.

Cheers,

Will :-}
Stephen J. Turnbull
2008-01-18 03:20:10 UTC
Permalink
Post by William Uther
Post by Stephen J. Turnbull
Why would you expect that? Merger markup expresses a superposition
(hello, physicists!) of two texts; there's always a choice of order
when serializing.
Well, in OT theory a large effort is made to make sure that the same
serialization is made in all cases. It is a bug if different orders
appear.
That's one way of proving that the result is correct (you'll get an
empty diff). But there are other ways of doing it, and if you choose
a particular serialization, you have to show that which serialization
is chosen doesn't matter.
Post by William Uther
Ahh. I think that might have cleared up my confusion. Would it be
correct to say that DARCS is versioning partially ordered text, and it
is just choosing which total order to display to the user differently?
I don't have a good feeling for that. It's a reasonable analogy, I
suppose.
Loading...