Skip to content

Clarify the definition of undefined behavior #202

Open
@gnzlbg

Description

@gnzlbg

I think we should clarify in the definition of undefined behavior that:

  • While this document (and the reference) often call out the behavior that is undefined, the behavior that is not explicitly defined in these documents is also undefined.

  • Undefined behavior cannot be assumed to remain undefined.

To expand on this second point, we currently say that creating a &mut T with value 0 is undefined behavior, but we guarantee that this will always be undefined behavior (because the type has a niche), and therefore, this code is sound:

fn foo(x: &mut T) { unsafe {
   if transmute(x) == 0_usize {
      *std::ptr::null() // never invokes UB
   }
}}

Now follow me in this thought experiment, and imagine that, instead of guaranteeing that &mut T are never 0, we were just to say that creating a &mut T with value 0 is UB.

Would the example above be sound? I don't think so, because we could, in a future release of Rust, make this undefined behavior defined in such a way that would make that code unsound. For example, we could say that creating a &mut T with value 0 is ok, but dereferencing it panics, and that change in the language from undefined to defined behavior would break the code above.

That is, while the compiler can assume that undefined behavior never happens when optimizing Rust code, users cannot assume that undefined behavior will remain undefined when writing Rust code. The only thing users can assume are things that we guarantee.

If we want to say that some behavior is undefined, and that behavior will remain undefined forever, such that users can assume that it never happens, we need to word that as a guarantee, and breaking that guarantee would be a backward incompatible change.

So maybe we need to split our lists of undefined behavior into behaviors that are just undefined, and behaviors that are guaranteed to be undefined.

Activity

RalfJung

RalfJung commented on Sep 11, 2019

@RalfJung
Member

While this document (and the reference) often call out the behavior that is undefined, the behavior that is not explicitly defined in these documents is also undefined.

I think that is one of the worst possible definitions of UB.
IMO, it is a spec bug if the spec does not cover a case. That's just like having a non-exhaustive match. A mathematical definition has to cover all cases, otherwise it is not a valid definition. The same goes for a proper spec.

gnzlbg

gnzlbg commented on Sep 11, 2019

@gnzlbg
ContributorAuthor

So what do we call behavior not defined in the spec then ?

gnzlbg

gnzlbg commented on Sep 11, 2019

@gnzlbg
ContributorAuthor

Or are you suggesting that the spec is not valid until it becomes complete ?

RalfJung

RalfJung commented on Sep 11, 2019

@RalfJung
Member

"Not yet specified behavior"? It's basically a white spot on the map. That's very different from UB, for which there should be a precise (and conscious) definition. UB is where we know there are dragons; a white spot is where there might be dragons or not.

RalfJung

RalfJung commented on Sep 11, 2019

@RalfJung
Member

Or are you suggesting that the spec is not valid until it becomes complete ?

Not sure what you mean by "valid", but is certainly not complete until there are no more missing cases.

gnzlbg

gnzlbg commented on Sep 11, 2019

@gnzlbg
ContributorAuthor

"Not yet specified behavior"?

This sounds good to me.

Not sure what you mean by "valid", but is certainly not complete until there are no more missing cases.

You mentioned that:

A mathematical definition has to cover all cases, otherwise it is not a valid definition. The same goes for a proper spec.

So I asked whether this means that you consider any incomplete spec invalid. As in, if the spec is not complete, it is not a valid Rust spec, and nothing in it applies to Rust.

I don't think doing that would be very useful, because AFAIK there might never be a complete spec for Rust (this is a practical / engineering problem). Even if at some point the spec would be believed to be complete, any tiny detail that would make it incomplete would render it invalid (e.g. new nightly feature is added to Rust without spec wording, now nothing in the spec applies to nightly Rust anymore).

I think it would be more useful to say that the spec does not defines the behavior of all well-formed Rust programs because there is some behavior that has not yet been specified, but if you have a program for which the spec defines all of its behavior, then you can use the spec to reason about it.

gnzlbg

gnzlbg commented on Sep 11, 2019

@gnzlbg
ContributorAuthor

As in, the spec can tell you what some well-formed programs do or whether some well-formed programs have UB, but at least right now there definitely are well-formed programs for which the spec can't tell you what they do or whether they have UB or not because the behavior of these programs is not specified.


I worry a bit that "not-specified-behavior" is too similar to "unspecified behavior". There is a difference between unspecified behavior that's required not to be undefined, and unspecified behavior that could be undefined.

Diggsey

Diggsey commented on Sep 11, 2019

@Diggsey

To expand on this second point, we currently say that creating a &mut T with value 0 is undefined behavior, but we guarantee that this will always be undefined behavior (because the type has a niche), and therefore, this code is sound:

It's the other way around: the spec says that a &mut T must refer to a valid T and as a result creating one with value 0 is UB.

It doesn't make sense for the spec to say "this is UB now but might not be UB in future" or for the spec to say "this is UB and will stay UB forever" - the spec describes rust at a point in time.

We can then take two versions of the spec and ask whether a newer one is backwards compatible with an older one: if something is UB in a previous version then it can continue to be UB or its behaviour may be defined in the newer version.

However, if the old spec says "&mut T must refer to a valid T" and the new spec relaxes that, then the new spec cannot be backwards compatible.

TLDR. "UB" already means "we are saying nothing about the behaviour in this case", and there's no reason to introduce a term to say "we will never say anything about the behaviour in this case" because that's not a useful thing to say - it restricts what we can do for no reason.

hanna-kruppe

hanna-kruppe commented on Sep 11, 2019

@hanna-kruppe

@RalfJung

I think that is one of the worst possible definitions of UB.
IMO, it is a spec bug if the spec does not cover a case. That's just like having a non-exhaustive match. A mathematical definition has to cover all cases, otherwise it is not a valid definition. The same goes for a proper spec.

I agree that it is generally better to explicitly state when something is UB than risk declaring something reasonable UB by mistakenly not covering it. However, a spec is only a model of reality, and neither can nor should exhaustively list everything that could possibly happen in reality that might be relevant to the operation of a Rust program. So there's always going to be vast stretches of things that people might reasonably want to do, but which the spec (and implementation documentation) doesn't give any guarantees or even guidance about. Since these things often interact with the parts that are covered by the spec in non-trivial ways, it's not hard get something indistingushable from de jure UB (assumed not to happen by implementations, non-local and time-travelling effects, has cascading and arbitrarily bad consequences, etc.).

Clearly, it's overly conservative to declare everything not explicitly covered by the spec to be UB: many reasonable things that need to happen in real systems and work perfectly fine fall under that. But "it's probably UB" is a safer default answer and just calling these things "not specified" does not properly convey the distinction between "we'll get back to you" and "you're on your own and probably stuff will break". For example, if someone has a question about a tricky interplay of raw pointers and &mut's, we can say we don't yet have a finalized memory model so it's not settled yet. If someone does very funky stuff with page tables and linker scripts, I wouldn't call it a spec bug if the spec doesn't tell them if they're in the clear.

I'd also like to note that "we don't specify what this does" is not really any more restricted than "it's UB" because the thing it might do is causing UB. So it's not like people can look at the spec, see that the thing they want is "not specified", and then can feel justified in doing it. It's really just UB by a less scary-sounding name.

gnzlbg

gnzlbg commented on Sep 11, 2019

@gnzlbg
ContributorAuthor

We can then take two versions of the spec and ask whether a newer one is backwards compatible with an older one: if something is UB in a previous version then it can continue to be UB or its behaviour may be defined in the newer version.

@Diggsey If undefined behavior becomes defined, the safe and sound Rust API of the thought experiment becomes unsound.

There are dozens of places in the reference, nomicon, and the UCGs where we say that something is undefined.

For a concrete example, we used to say that the behavior of unwinding out of an extern "C" functions is undefined. That is, the following safe and stable Rust API is sound (it never invokes UB):

pub fn bar(x: extern "C" fn()) {
    if catch_unwind(|| x()).is_err() { unsafe { /* UB */ } }
}

If we were to "fix" that UB in the reference, by saying that "unwinding out of an extern "C" function is ok" (which feels like a perfectly reasonable thing that we should be able to do if we wanted), then that stable, safe, and sound Rust API becomes unsound. That would be a serious backward incompatible change.

I think we should be able to remove "X is UB" from the reference, and for that, we need to forbid users from relying on things being UB to write code like the above.

Diggsey

Diggsey commented on Sep 11, 2019

@Diggsey

If undefined behavior becomes defined, the safe and sound Rust API of the thought experiment becomes unsound.

Undefined behaviour becoming defined is not what causes that example to become unsound. It becomes unsound because the spec previously said that &mut T referred to a valid T`, and the new version does not guarantee that, which is a breaking change.

gnzlbg

gnzlbg commented on Sep 11, 2019

@gnzlbg
ContributorAuthor

It becomes unsound because the spec previously said that &mut T referred to a valid T`, and the new version does not guarantee that, which is a breaking change.

I think you are focusing too much on the &mut T example instead of trying to understand the issue I'm trying to raise.

Diggsey

Diggsey commented on Sep 11, 2019

@Diggsey

OK, to take your unwinding example, I don't believe that code is sound to begin with.

pub fn bar(x: extern "C" fn()) {
    if catch_unwind(|| x()).is_err() { unsafe { /* UB */ } }
}

You can't negatively reason about UB in this way. Just because the spec doesn't define what the behaviour of unwinding is, a specific compiler implementation on specific hardware may define that behaviour to be something sensible, or a future version of the spec may define that behaviour.

This is different from the &mut example, because in that case you are not relying on negative reasoning, you are relying on an explicit guarantee made by the spec.

edit:
If the spec explicitly said "extern C functions will never unwind" then that of course would be a different story.

Lokathor

Lokathor commented on Sep 11, 2019

@Lokathor
Contributor

It doesn't make sense for the spec to say "this is UB now but might not be UB in future" or for the spec to say "this is UB and will stay UB forever" - the spec describes rust at a point in time.

Actually a lot of people doing unsafe code really want to know the difference between "this is UB now and we're resolving to keep it always UB (at least for this edition)" compared to "this us UB now because we haven't gotten there yet, but it might change later".

gnzlbg

gnzlbg commented on Sep 11, 2019

@gnzlbg
ContributorAuthor

@Diggsey

You can't negatively reason about UB in this way.

Where do we say so?

35 remaining items

Loading
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-abstract-machineTopic: concerning the abstract machine in general (as opposed to any specific part of it)C-terminologyCategory: Discussing terminology -- which term to use, how to define it, adding it to the glossaryS-pending-documentationStatus: The issue is "resolved," but this resolution needs documentationTriagedVisited during a backlog bonanza meeting

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @comex@RalfJung@Diggsey@Centril@gnzlbg

        Issue actions

          Clarify the definition of undefined behavior · Issue #202 · rust-lang/unsafe-code-guidelines