With LLMs arguably having passed the Turing Test, discussions about whether AI systems could be associated with (phenomenal) consciousness are more prevalent than ever. At this stage, most scientists and philosophers would probably hesitate to assign a high probability to current generative models being conscious, but the idea that similar systems might pass the ‘threshold’ of subjective experience in the near future is not far-fetched. Understandably, this has sparked further discussion about the ethical concerns of creating systems with the potential for experience in general, and suffering in particular.

If we accept consciousness as a necessary condition for moral status, then we are morally obliged to care about the well-being of artificial entities once we have reasonable evidence that they have subjective experience. Failing to do so would put us at risk of committing what Nick Bostrom termed “mind crime”: creating a conscious being and subjecting it to suffering.

One way to deal with this problem would be to prohibit the creation of conscious AI. This is tricky for several reasons.

  1. If the capabilities of an AI system turn out to be correlated with consciousness, people will be incentivized to build conscious systems. Monitoring what runs on every computer in a country is arguably unrealistic.
  2. Implementing such regulation in only some countries would give countries that do not care about the well-being of conscious artificial agents a competitive advantage, which, needless to say, would lead to undesirable outcomes.
  3. Today there is huge disagreement about what kinds of physical systems would instantiate consciousness. While we will hopefully make progress on this, some level of uncertainty might persist, which raises the question of how certain we need to be that a given system is not conscious before we can build it with a clear conscience.
  4. Finally, depending on which theory of consciousness is correct, the whole endeavor might be doomed from the start, for instance if some form of panpsychism turns out to be true.

A potential way to get around some of these hurdles might be to focus on what conscious AI systems are conscious of, rather than on whether they are conscious at all. In particular, I think the most crucial aspect of consciousness with regard to its ethical implications is valence, i.e. whether a given conscious experience is desirable or aversive, good or bad. (Whether such a quantity fundamentally exists at all is a topic for another discussion, but I suspect it does.)

For instance, let’s assume that Stable Diffusion is conscious, and let’s further assume that its phenomenology is very much like our visual field: it experiences the images it creates in all their richness of colors, edges, shapes, and objects, but it doesn’t associate them with any valence. No picture is better than another. There is no conscious content that could be considered pleasant or unpleasant; there are just pictures appearing. If we knew this, I believe one could make the case that deploying such a model is ethically acceptable.

If we understood the neural/computational/physical mechanism behind valence, we could shift our goal from preventing the creation of conscious AI to preventing the creation of suffering AI. While certain capabilities might correlate with consciousness, it is possible that they do not correlate with valence. In fact, it seems reasonable to assume that the same behavior of an AI system could be associated with different degrees of valence, because the same is true for humans: human A might love activity X, while human B hates it. Along these lines, we should build AI systems that love what they do, or at least are neutral about it. This might help us address the above concerns in the following ways:

1, 2: If higher capabilities in AI systems are feasible without an increased risk of negative valence, research can focus on creating efficient and powerful positive- (or neutral-) valence systems, so that there is no incentive to use negative-valence systems.

3: Agreement on which types of processes (if conscious) might give rise to which valence could conceivably be easier to reach than consensus on the presence of consciousness. At the very least, the category of systems that are either not conscious or do not experience negative valence should be larger under every theory, which increases the chance of a non-empty intersection across theories of systems that are ethically acceptable to build.

4: Issues around panpsychism are sidestepped. We don’t have to care about a proton, because a proton has no capacity to experience pain, even if there is something it is like to be a proton.

I believe these considerations should motivate consciousness researchers to focus on the problem of valence. Even in the absence of conscious AI systems, valence is arguably the most ethically relevant dimension of consciousness, and yet major theories of consciousness do not seem to place any emphasis on it.

(Please note that none of this addresses the risks to humans. Existential risks are not mitigated by making happy AI.)