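Series Contents
Chapter 22: Analysis of the Main Loop Exit in QEMU System Emulation
Article Contents
- Series Contents
- Preface
- I. What Is QEMU?
- II. Startup Analysis of QEMU System Emulation
- Summary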
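Preface
This series uses QEMU 8.2.2 as an example to analyze how QEMU starts up as a system emulator, and shows readers a variety of example startup configurations for QEMU system emulation. This chapter covers the cleanup QEMU performs after its main loop exits.
Readers should have some experience using QEMU for system emulation and a working knowledge of C programming.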
I. What Is QEMU?
QEMU is a generic, open-source machine emulator and virtualizer.
Its official home page is: https://www.qemu.org/
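II. Startup Analysis of QEMU System Emulation
1. Initialization code for system emulation
As a system emulator, QEMU's entry code is in system/main.c, and the initialization function qemu_init() is implemented in system/vl.c. After the virtual machine enters the main loop, it stays there until the run ends (for example, because a shutdown was requested). When the main loop exits, QEMU has to do some cleanup before the process ends; this article analyzes that code.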
2. Main loop
This code is in system/main.c, implemented as follows:
int (*qemu_main)(void) = qemu_default_main;
int main(int argc, char **argv)
{
qemu_init(argc, argv);
return qemu_main();
}
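Because qemu_main is a function pointer initialized to qemu_default_main, code that runs before main() calls it can substitute a different body, for example a UI front end that must own the process's main thread. A minimal sketch of the pattern; default_main(), alt_main(), main_hook and run_main() are hypothetical names, not QEMU code:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for qemu_default_main(): run the loop, then clean up. */
static int default_main(void)
{
    puts("main loop, then cleanup");
    return 0;
}

/* Hypothetical replacement that a front end might install during init. */
static int alt_main(void)
{
    puts("front end owns the main thread");
    return 1;
}

/* Global hook with a default value, mirroring `int (*qemu_main)(void)`. */
static int (*main_hook)(void) = default_main;

/* What main() does after initialization: call whatever the hook points to. */
static int run_main(void)
{
    assert(main_hook != NULL);
    return main_hook();
}
```

Reassigning main_hook before run_main() is called changes which body runs, without touching main() itself.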
3. qemu_default_main()
Function qemu_default_main() is defined in /system/main.c as follows:
int qemu_default_main(void)
{
int status;
status = qemu_main_loop();
qemu_cleanup(status);
return status;
}
We analyzed qemu_main_loop() in an earlier chapter; this article analyzes qemu_cleanup().
4. qemu_cleanup()
Function qemu_cleanup() is defined in /system/runstate.c as follows:
void qemu_cleanup(int status)
{
gdb_exit(status);
/*
* cleaning up the migration object cancels any existing migration
* try to do this early so that it also stops using devices.
*/
migration_shutdown();
/*
* Close the exports before draining the block layer. The export
* drivers may have coroutines yielding on it, so we need to clean
* them up before the drain, as otherwise they may be get stuck in
* blk_wait_while_drained().
*/
blk_exp_close_all();
/* No more vcpu or device emulation activity beyond this point */
vm_shutdown();
replay_finish();
/*
* We must cancel all block jobs while the block layer is drained,
* or cancelling will be affected by throttling and thus may block
* for an extended period of time.
* Begin the drained section after vm_shutdown() to avoid requests being
* stuck in the BlockBackend's request queue.
* We do not need to end this section, because we do not want any
* requests happening from here on anyway.
*/
bdrv_drain_all_begin();
job_cancel_sync_all();
bdrv_close_all();
/* vhost-user must be cleaned up before chardevs. */
tpm_cleanup();
net_cleanup();
audio_cleanup();
monitor_cleanup();
qemu_chr_cleanup();
user_creatable_cleanup();
/* TODO: unref root container, check all devices are ok */
}
gdb_exit()
Function gdb_exit() is defined in /gdbstub/system.c as follows:
/* Tell the remote gdb that the process has exited. */
void gdb_exit(int code)
{
char buf[4];
if (!gdbserver_state.init) {
return;
}
trace_gdbstub_op_exiting((uint8_t)code);
if (gdbserver_state.allow_stop_reply) {
snprintf(buf, sizeof(buf), "W%02x", (uint8_t)code);
gdb_put_packet(buf);
gdbserver_state.allow_stop_reply = false;
}
qemu_chr_fe_deinit(&gdbserver_system_state.chr, true);
}
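gdb_exit() reports the exit status to gdb as a `W` stop-reply packet whose payload is the status truncated to one byte and printed as two hex digits, which is why a 4-byte buffer ('W', two digits, NUL) is enough. A small sketch of just the payload formatting; format_exit_reply() is a hypothetical helper:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Build the "Wxx" payload the way gdb_exit() does: only the low byte of
 * the exit code is reported, as two lowercase hex digits. */
static void format_exit_reply(int code, char buf[4])
{
    int n = snprintf(buf, 4, "W%02x", (unsigned int)(uint8_t)code);
    assert(n == 3); /* 'W' + two hex digits always fits in 4 bytes */
}
```

Because of the uint8_t cast, an exit status of 258 is reported as `W02` and -1 as `Wff`.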
gdb_put_packet()
Function gdb_put_packet() and the gdb_put_packet_binary() function it calls are defined in /gdbstub/gdbstub.c as follows:
/* return -1 if error, 0 if OK */
int gdb_put_packet_binary(const char *buf, int len, bool dump)
{
int csum, i;
uint8_t footer[3];
if (dump && trace_event_get_state_backends(TRACE_GDBSTUB_IO_BINARYREPLY)) {
hexdump(buf, len, trace_gdbstub_io_binaryreply);
}
for(;;) {
g_byte_array_set_size(gdbserver_state.last_packet, 0);
g_byte_array_append(gdbserver_state.last_packet,
(const uint8_t *) "$", 1);
g_byte_array_append(gdbserver_state.last_packet,
(const uint8_t *) buf, len);
csum = 0;
for(i = 0; i < len; i++) {
csum += buf[i];
}
footer[0] = '#';
footer[1] = tohex((csum >> 4) & 0xf);
footer[2] = tohex((csum) & 0xf);
g_byte_array_append(gdbserver_state.last_packet, footer, 3);
gdb_put_buffer(gdbserver_state.last_packet->data,
gdbserver_state.last_packet->len);
if (gdb_got_immediate_ack()) {
break;
}
}
return 0;
}
/* return -1 if error, 0 if OK */
int gdb_put_packet(const char *buf)
{
trace_gdbstub_io_reply(buf);
return gdb_put_packet_binary(buf, strlen(buf), false);
}
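gdb_put_packet_binary() frames the payload as `$<payload>#<checksum>`, where the checksum is the sum of the payload bytes modulo 256 written as two hex digits, and resends the packet until gdb acknowledges it. The sketch below covers only the framing step (no transport, no retransmission); frame_rsp_packet() is a hypothetical helper, not a QEMU function:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Frame a payload as "$<payload>#<cs>", where <cs> is the modulo-256 sum
 * of the payload bytes in two hex digits. Returns the framed length
 * (without the NUL), or -1 if `out` is too small. */
static int frame_rsp_packet(const char *payload, size_t len,
                            char *out, size_t out_size)
{
    unsigned int csum = 0;

    assert(payload != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        csum += (unsigned char)payload[i];
    }
    if (out_size < len + 5) { /* '$' + payload + '#' + 2 digits + NUL */
        return -1;
    }
    out[0] = '$';
    memcpy(out + 1, payload, len);
    snprintf(out + 1 + len, 4, "#%02x", csum & 0xff);
    return (int)(len + 4);
}
```

For example, the exit reply for status 0 is framed as `$W00#b7`, since 0x57 ('W') + 0x30 + 0x30 = 0xb7.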
migration_shutdown()
Function migration_shutdown() is defined in /migration/migration.c as follows:
void migration_shutdown(void)
{
/*
* When the QEMU main thread exit, the COLO thread
* may wait a semaphore. So, we should wakeup the
* COLO thread before migration shutdown.
*/
colo_shutdown();
/*
* Cancel the current migration - that will (eventually)
* stop the migration using this structure
*/
migration_cancel(NULL);
object_unref(OBJECT(current_migration));
/*
* Cancel outgoing migration of dirty bitmaps. It should
* at least unref used block nodes.
*/
dirty_bitmap_mig_cancel_outgoing();
/*
* Cancel incoming migration of dirty bitmaps. Dirty bitmaps
* are non-critical data, and their loss never considered as
* something serious.
*/
dirty_bitmap_mig_cancel_incoming();
}
colo_shutdown()
Function colo_shutdown() is defined in /migration/colo.c as follows:
void colo_shutdown(void)
{
MigrationIncomingState *mis = NULL;
MigrationState *s = NULL;
switch (get_colo_mode()) {
case COLO_MODE_PRIMARY:
s = migrate_get_current();
qemu_event_set(&s->colo_checkpoint_event);
qemu_sem_post(&s->colo_exit_sem);
break;
case COLO_MODE_SECONDARY:
mis = migration_incoming_get_current();
qemu_sem_post(&mis->colo_incoming_sem);
break;
default:
break;
}
}
migration_cancel()
Function migration_cancel() is defined in /migration/migration.c as follows:
void migration_cancel(const Error *error)
{
if (error) {
migrate_set_error(current_migration, error);
}
if (migrate_dirty_limit()) {
qmp_cancel_vcpu_dirty_limit(false, -1, NULL);
}
migrate_fd_cancel(current_migration);
}
Function migrate_dirty_limit() is defined in /migration/options.c as follows:
bool migrate_dirty_limit(void)
{
MigrationState *s = migrate_get_current();
return s->capabilities[MIGRATION_CAPABILITY_DIRTY_LIMIT];
}
Function qmp_cancel_vcpu_dirty_limit() is defined in /system/dirtylimit.c as follows:
void qmp_cancel_vcpu_dirty_limit(bool has_cpu_index,
int64_t cpu_index,
Error **errp)
{
if (!kvm_enabled() || !kvm_dirty_ring_enabled()) {
return;
}
if (has_cpu_index && !dirtylimit_vcpu_index_valid(cpu_index)) {
error_setg(errp, "incorrect cpu index specified");
return;
}
if (!dirtylimit_is_allowed()) {
error_setg(errp, "can't cancel dirty page rate limit while"
" migration is running");
return;
}
if (!dirtylimit_in_service()) {
return;
}
dirtylimit_state_lock();
if (has_cpu_index) {
dirtylimit_set_vcpu(cpu_index, 0, false);
} else {
dirtylimit_set_all(0, false);
}
if (!dirtylimit_state->limited_nvcpu) {
dirtylimit_cleanup();
}
dirtylimit_state_unlock();
}
Function migrate_fd_cancel() is defined in /migration/migration.c as follows:
static void migrate_fd_cancel(MigrationState *s)
{
int old_state ;
trace_migrate_fd_cancel();
WITH_QEMU_LOCK_GUARD(&s->qemu_file_lock) {
if (s->rp_state.from_dst_file) {
/* shutdown the rp socket, so causing the rp thread to shutdown */
qemu_file_shutdown(s->rp_state.from_dst_file);
}
}
do {
old_state = s->state;
if (!migration_is_running(old_state)) {
break;
}
/* If the migration is paused, kick it out of the pause */
if (old_state == MIGRATION_STATUS_PRE_SWITCHOVER) {
qemu_sem_post(&s->pause_sem);
}
migrate_set_state(&s->state, old_state, MIGRATION_STATUS_CANCELLING);
} while (s->state != MIGRATION_STATUS_CANCELLING);
/*
* If we're unlucky the migration code might be stuck somewhere in a
* send/write while the network has failed and is waiting to timeout;
* if we've got shutdown(2) available then we can force it to quit.
*/
if (s->state == MIGRATION_STATUS_CANCELLING) {
WITH_QEMU_LOCK_GUARD(&s->qemu_file_lock) {
if (s->to_dst_file) {
qemu_file_shutdown(s->to_dst_file);
}
}
}
if (s->state == MIGRATION_STATUS_CANCELLING && s->block_inactive) {
Error *local_err = NULL;
bdrv_activate_all(&local_err);
if (local_err) {
error_report_err(local_err);
} else {
s->block_inactive = false;
}
}
}
blk_exp_close_all()
Function blk_exp_close_all() and the blk_exp_close_all_type() function it calls are defined in /block/export/export.c as follows:
/* type == BLOCK_EXPORT_TYPE__MAX for all types */
void blk_exp_close_all_type(BlockExportType type)
{
BlockExport *exp, *next;
assert(in_aio_context_home_thread(qemu_get_aio_context()));
QLIST_FOREACH_SAFE(exp, &block_exports, next, next) {
if (type != BLOCK_EXPORT_TYPE__MAX && exp->drv->type != type) {
continue;
}
blk_exp_request_shutdown(exp);
}
AIO_WAIT_WHILE_UNLOCKED(NULL, blk_exp_has_type(type));
}
void blk_exp_close_all(void)
{
blk_exp_close_all_type(BLOCK_EXPORT_TYPE__MAX);
}
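blk_exp_close_all_type() treats the enum's BLOCK_EXPORT_TYPE__MAX value as a wildcard meaning "every type", so blk_exp_close_all() is simply the wildcard call. A sketch of that filtering convention; the export types and structure are hypothetical, and unlike the real function this sketch does not wait (AIO_WAIT_WHILE_UNLOCKED) for the exports to disappear:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical export types; the trailing __MAX value doubles as
 * "all types", the convention blk_exp_close_all_type() uses. */
typedef enum { EXPORT_NBD, EXPORT_FUSE, EXPORT_VHOST_USER, EXPORT__MAX } ExportType;

typedef struct {
    ExportType type;
    int shutdown_requested;
} Export;

/* Request shutdown for every export of `type`, or for all of them when
 * type == EXPORT__MAX. Returns how many exports were affected. */
static size_t close_all_type(Export *exps, size_t n, ExportType type)
{
    size_t count = 0;

    assert(exps != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (type != EXPORT__MAX && exps[i].type != type) {
            continue; /* not the requested type */
        }
        exps[i].shutdown_requested = 1;
        count++;
    }
    return count;
}
```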
Function blk_exp_request_shutdown() is defined in /block/export/export.c as follows:
/*
* Drops the user reference to the export and requests that all client
* connections and other internally held references start to shut down. When
* the function returns, there may still be active references while the export
* is in the process of shutting down.
*
* Acquires exp->ctx internally. Callers must *not* hold the lock.
*/
void blk_exp_request_shutdown(BlockExport *exp)
{
AioContext *aio_context = exp->ctx;
aio_context_acquire(aio_context);
/*
* If the user doesn't own the export any more, it is already shutting
* down. We must not call .request_shutdown and decrease the refcount a
* second time.
*/
if (!exp->user_owned) {
goto out;
}
exp->drv->request_shutdown(exp);
assert(exp->user_owned);
exp->user_owned = false;
blk_exp_unref(exp);
out:
aio_context_release(aio_context);
}
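The user_owned flag makes blk_exp_request_shutdown() safe to call more than once: only the first call invokes .request_shutdown and drops the user's reference, and later calls return early. A hypothetical single-threaded sketch, without the AioContext locking:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical export object: `user_owned` records whether the user's
 * reference is still held, like BlockExport.user_owned. */
typedef struct {
    int refcount;
    bool user_owned;
    int shutdown_calls;
} Exp;

static void exp_unref(Exp *e)
{
    assert(e->refcount > 0);
    e->refcount--;
}

/* Idempotent shutdown request: only the first call drops the user reference. */
static void exp_request_shutdown(Exp *e)
{
    if (!e->user_owned) {
        return; /* already shutting down */
    }
    e->shutdown_calls++;   /* stands in for exp->drv->request_shutdown(exp) */
    e->user_owned = false;
    exp_unref(e);
}
```

Without the flag, a second call would decrement the reference count again and could free an export that other holders still use.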
vm_shutdown()
Function vm_shutdown() and the do_vm_stop() function it calls are defined in /system/cpus.c as follows:
static int do_vm_stop(RunState state, bool send_stop)
{
int ret = 0;
if (runstate_is_running()) {
runstate_set(state);
cpu_disable_ticks();
pause_all_vcpus();
vm_state_notify(0, state);
if (send_stop) {
qapi_event_send_stop();
}
}
bdrv_drain_all();
ret = bdrv_flush_all();
trace_vm_stop_flush_all(ret);
return ret;
}
/* Special vm_stop() variant for terminating the process. Historically clients
* did not expect a QMP STOP event and so we need to retain compatibility.
*/
int vm_shutdown(void)
{
return do_vm_stop(RUN_STATE_SHUTDOWN, false);
}
cpu_disable_ticks()
Function cpu_disable_ticks() is defined in /system/cpu-timers.c as follows:
/*
* disable cpu_get_ticks() : the clock is stopped. You must not call
* cpu_get_ticks() after that.
* Caller must hold BQL which serves as mutex for vm_clock_seqlock.
*/
void cpu_disable_ticks(void)
{
seqlock_write_lock(&timers_state.vm_clock_seqlock,
&timers_state.vm_clock_lock);
if (timers_state.cpu_ticks_enabled) {
timers_state.cpu_ticks_offset += cpu_get_host_ticks();
timers_state.cpu_clock_offset = cpu_get_clock_locked();
timers_state.cpu_ticks_enabled = 0;
}
seqlock_write_unlock(&timers_state.vm_clock_seqlock,
&timers_state.vm_clock_lock);
}
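cpu_disable_ticks() folds the current host tick count into cpu_ticks_offset, so the guest-visible tick count stays frozen while ticks are disabled. A minimal model of that arithmetic, assuming (as in QEMU's cpu-timers.c) that enabling subtracts the current host ticks and that reading adds them back only while enabled; the Ticks type and functions are hypothetical, and `host` replaces cpu_get_host_ticks() to keep the example deterministic:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the tick bookkeeping. */
typedef struct {
    int64_t offset;
    bool enabled;
} Ticks;

static int64_t ticks_get(const Ticks *t, int64_t host)
{
    return t->enabled ? t->offset + host : t->offset;
}

static void ticks_enable(Ticks *t, int64_t host)
{
    assert(host >= 0);
    if (!t->enabled) {
        t->offset -= host;   /* so that offset + host == previous value */
        t->enabled = true;
    }
}

static void ticks_disable(Ticks *t, int64_t host)
{
    assert(host >= 0);
    if (t->enabled) {
        t->offset += host;   /* freeze at the value read right now */
        t->enabled = false;
    }
}
```

If ticks are enabled at host time 100 and disabled at 150, reads return 50 no matter how far host time advances; re-enabling at 1000 resumes counting from 50.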
pause_all_vcpus()
Function pause_all_vcpus() is defined in /system/cpus.c as follows:
void pause_all_vcpus(void)
{
CPUState *cpu;
qemu_clock_enable(QEMU_CLOCK_VIRTUAL, false);
CPU_FOREACH(cpu) {
if (qemu_cpu_is_self(cpu)) {
qemu_cpu_stop(cpu, true);
} else {
cpu->stop = true;
qemu_cpu_kick(cpu);
}
}
/* We need to drop the replay_lock so any vCPU threads woken up
* can finish their replay tasks
*/
replay_mutex_unlock();
while (!all_vcpus_paused()) {
qemu_cond_wait(&qemu_pause_cond, &qemu_global_mutex);
CPU_FOREACH(cpu) {
qemu_cpu_kick(cpu);
}
}
qemu_mutex_unlock_iothread();
replay_mutex_lock();
qemu_mutex_lock_iothread();
}
vm_state_notify()
Function vm_state_notify() is defined in /system/runstate.c as follows:
void vm_state_notify(bool running, RunState state)
{
VMChangeStateEntry *e, *next;
trace_vm_state_notify(running, state, RunState_str(state));
if (running) {
QTAILQ_FOREACH_SAFE(e, &vm_change_state_head, entries, next) {
if (e->prepare_cb) {
e->prepare_cb(e->opaque, running, state);
}
}
QTAILQ_FOREACH_SAFE(e, &vm_change_state_head, entries, next) {
e->cb(e->opaque, running, state);
}
} else {
QTAILQ_FOREACH_REVERSE_SAFE(e, &vm_change_state_head, entries, next) {
if (e->prepare_cb) {
e->prepare_cb(e->opaque, running, state);
}
}
QTAILQ_FOREACH_REVERSE_SAFE(e, &vm_change_state_head, entries, next) {
e->cb(e->opaque, running, state);
}
}
}
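vm_state_notify() walks the handler list front to back when the VM starts running and back to front when it stops, so teardown runs in the reverse order of start-up, much like a stack. A hypothetical sketch that uses an array instead of a QTAILQ and omits the prepare_cb pass:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_HANDLERS 8

/* Hypothetical log: each handler call appends its id as a digit. */
static char log_buf[2 * MAX_HANDLERS + 1];
static size_t log_len;

typedef void (*ChangeCb)(int id, bool running);

static void record(int id, bool running)
{
    assert(log_len + 1 < sizeof(log_buf));
    log_buf[log_len++] = (char)('0' + id);
    log_buf[log_len] = '\0';
    (void)running;
}

/* Registration order when starting, reverse order when stopping. */
static void notify_all(const int *ids, size_t n, ChangeCb cb, bool running)
{
    if (running) {
        for (size_t i = 0; i < n; i++) {
            cb(ids[i], running);
        }
    } else {
        for (size_t i = n; i-- > 0;) {
            cb(ids[i], running);
        }
    }
}
```

Starting and then stopping with handlers 1, 2, 3 produces the call sequence 1 2 3 3 2 1.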
qapi_event_send_stop()
Function qapi_event_send_stop() is generated at build time by the QAPI code generator into /build/qapi/qapi-event-run-state.c, as follows:
void qapi_event_send_stop(void)
{
QDict *qmp;
qmp = qmp_event_build_dict("STOP");
qapi_event_emit(QAPI_EVENT_STOP, qmp);
qobject_unref(qmp);
}
bdrv_drain_all()
Function bdrv_drain_all() and the functions it calls are defined in /block/io.c as follows:
void coroutine_mixed_fn bdrv_drain_all_begin(void)
{
BlockDriverState *bs = NULL;
if (qemu_in_coroutine()) {
bdrv_co_yield_to_drain(NULL, true, NULL, true);
return;
}
/*
* bdrv queue is managed by record/replay,
* waiting for finishing the I/O requests may
* be infinite
*/
if (replay_events_enabled()) {
return;
}
bdrv_drain_all_begin_nopoll();
/* Now poll the in-flight requests */
AIO_WAIT_WHILE_UNLOCKED(NULL, bdrv_drain_all_poll());
while ((bs = bdrv_next_all_states(bs))) {
bdrv_drain_assert_idle(bs);
}
}
void bdrv_drain_all_end_quiesce(BlockDriverState *bs)
{
GLOBAL_STATE_CODE();
g_assert(bs->quiesce_counter > 0);
g_assert(!bs->refcnt);
while (bs->quiesce_counter) {
bdrv_do_drained_end(bs, NULL);
}
}
void bdrv_drain_all_end(void)
{
BlockDriverState *bs = NULL;
GLOBAL_STATE_CODE();
/*
* bdrv queue is managed by record/replay,
* waiting for finishing the I/O requests may
* be endless
*/
if (replay_events_enabled()) {
return;
}
while ((bs = bdrv_next_all_states(bs))) {
AioContext *aio_context = bdrv_get_aio_context(bs);
aio_context_acquire(aio_context);
bdrv_do_drained_end(bs, NULL);
aio_context_release(aio_context);
}
assert(qemu_get_current_aio_context() == qemu_get_aio_context());
assert(bdrv_drain_all_count > 0);
bdrv_drain_all_count--;
}
void bdrv_drain_all(void)
{
GLOBAL_STATE_CODE();
bdrv_drain_all_begin();
bdrv_drain_all_end();
}
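bdrv_drain_all() is just a begin/end pair. A drained section nests: begin increments a counter and waits for in-flight requests to finish, end decrements it (compare bdrv_drain_all_count and bs->quiesce_counter above). A minimal single-threaded model; the Node type and functions are hypothetical, and where this sketch rejects new requests, the real block layer holds them back until the section ends (see blk_wait_while_drained() mentioned in qemu_cleanup()):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a nestable drained section. */
typedef struct {
    int quiesce_counter;  /* > 0 while inside at least one drained section */
    int in_flight;        /* requests still being processed */
} Node;

static void drained_begin(Node *n)
{
    n->quiesce_counter++;
    /* The real code polls the event loop until in-flight I/O completes;
     * this model simply pretends it all finished. */
    n->in_flight = 0;
}

static void drained_end(Node *n)
{
    assert(n->quiesce_counter > 0);
    n->quiesce_counter--;
}

/* New requests are only accepted outside every drained section. */
static bool submit_request(Node *n)
{
    if (n->quiesce_counter > 0) {
        return false;
    }
    n->in_flight++;
    return true;
}
```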
bdrv_flush_all()
Function bdrv_flush_all() is defined in /block/io.c as follows:
/*
* Flush ALL BDSes regardless of if they are reachable via a BlkBackend or not.
*/
int bdrv_flush_all(void)
{
BdrvNextIterator it;
BlockDriverState *bs = NULL;
int result = 0;
GLOBAL_STATE_CODE();
GRAPH_RDLOCK_GUARD_MAINLOOP();
/*
* bdrv queue is managed by record/replay,
* creating new flush request for stopping
* the VM may break the determinism
*/
if (replay_events_enabled()) {
return result;
}
for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
AioContext *aio_context = bdrv_get_aio_context(bs);
int ret;
aio_context_acquire(aio_context);
ret = bdrv_flush(bs);
if (ret < 0 && !result) {
result = ret;
}
aio_context_release(aio_context);
}
return result;
}
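bdrv_flush_all() keeps flushing the remaining nodes after a failure and reports only the first error it saw. A sketch of that "first error wins" loop; flush_all() is hypothetical, and an array of return codes stands in for the bdrv_flush() calls:

```c
#include <assert.h>
#include <stddef.h>

/* Flush every device even after a failure; report only the first error.
 * `flush_results[i]` stands in for the return value of bdrv_flush(bs). */
static int flush_all(const int *flush_results, size_t n, size_t *flushed)
{
    int result = 0;

    assert(flushed != NULL);
    *flushed = 0;
    for (size_t i = 0; i < n; i++) {
        int ret = flush_results[i];
        (*flushed)++;
        if (ret < 0 && !result) {
            result = ret;
        }
    }
    return result;
}
```

With results {0, -5, 0, -22}, all four devices are flushed and -5 is returned.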
replay_finish()
Function replay_finish() is defined in /replay/replay.c as follows:
void replay_finish(void)
{
if (replay_mode == REPLAY_MODE_NONE) {
return;
}
replay_save_instructions();
/* finalize the file */
if (replay_file) {
if (replay_mode == REPLAY_MODE_RECORD) {
/*
* Can't do it in the signal handler, therefore
* add shutdown event here for the case of Ctrl-C.
*/
replay_shutdown_request(SHUTDOWN_CAUSE_HOST_SIGNAL);
/* write end event */
replay_put_event(EVENT_END);
/* write header */
fseek(replay_file, 0, SEEK_SET);
replay_put_dword(REPLAY_VERSION);
}
fclose(replay_file);
replay_file = NULL;
}
g_free(replay_filename);
replay_filename = NULL;
g_free(replay_snapshot);
replay_snapshot = NULL;
replay_finish_events();
replay_mode = REPLAY_MODE_NONE;
}
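In record mode, replay_finish() writes the end event and only then seeks back to offset 0 to write the version header, so the header is filled in last, once the event stream is complete. A sketch of that "reserve the header, patch it at the end" pattern with standard stdio; it assumes the header slot was skipped when recording started, HDR_SIZE and write_log_with_header() are hypothetical, and this sketch writes the version in host byte order:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define HDR_SIZE 4  /* hypothetical: one 32-bit version word */

/* Write events after a reserved header slot, then seek back and fill in
 * the header last. Returns the version read back, or 0 on I/O error. */
static uint32_t write_log_with_header(uint32_t version,
                                      const uint8_t *events, size_t n)
{
    uint32_t readback = 0;
    FILE *f;

    assert(HDR_SIZE == sizeof(uint32_t));
    f = tmpfile();
    if (!f) {
        return 0;
    }
    fseek(f, HDR_SIZE, SEEK_SET);             /* skip the header for now */
    fwrite(events, 1, n, f);                  /* event stream, end marker last */
    fseek(f, 0, SEEK_SET);                    /* go back and write the header */
    fwrite(&version, sizeof(version), 1, f);
    fseek(f, 0, SEEK_SET);
    if (fread(&readback, sizeof(readback), 1, f) != 1) {
        readback = 0;
    }
    fclose(f);
    return readback;
}
```

One consequence of this ordering is that a recording interrupted before replay_finish() runs never gets a valid header.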
bdrv_drain_all_begin()
Function bdrv_drain_all_begin() is defined in /block/io.c; its code is identical to the bdrv_drain_all_begin() listing shown above in the bdrv_drain_all() section.
job_cancel_sync_all()
Function job_cancel_sync_all() is defined in /job.c as follows:
void job_cancel_sync_all(void)
{
Job *job;
JOB_LOCK_GUARD();
while ((job = job_next_locked(NULL))) {
job_cancel_sync_locked(job, true);
}
}
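job_cancel_sync_all() never advances a cursor: it keeps asking for the first job (job_next_locked(NULL)). This terminates only because each synchronous cancel ends with the job removed from the global list, which bdrv_close_all() later checks with assert(job_next(NULL) == NULL). A sketch of the "always take the head" loop on a hypothetical singly linked list:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical job list. Cancelling a job also unlinks it, so the loop
 * re-reads the head instead of following a possibly dangling next. */
typedef struct Job {
    struct Job *next;
    int cancelled;
} Job;

static Job *job_list;

static void cancel_and_unlink(Job *job)
{
    assert(job_list == job);   /* we always cancel the head */
    job->cancelled = 1;
    job_list = job->next;
    job->next = NULL;
}

static int cancel_all(void)
{
    int n = 0;
    Job *job;

    while ((job = job_list) != NULL) {
        cancel_and_unlink(job);
        n++;
    }
    return n;
}
```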
bdrv_close_all()
Function bdrv_close_all() is defined in /block.c as follows:
void bdrv_close_all(void)
{
GLOBAL_STATE_CODE();
assert(job_next(NULL) == NULL);
/* Drop references from requests still in flight, such as canceled block
* jobs whose AIO context has not been polled yet */
bdrv_drain_all();
blk_remove_all_bs();
blockdev_close_all_bdrv_states();
assert(QTAILQ_EMPTY(&all_bdrv_states));
}
tpm_cleanup()
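If CONFIG_TPM is defined, the real implementation in QEMU's TPM support code (system/tpm.c) is used; otherwise tpm_cleanup() is defined as an empty macro: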
#define tpm_cleanup()
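The tpm_cleanup() stub is a common C pattern: when a feature is compiled out, its function becomes an empty function-like macro, so call sites such as qemu_cleanup() need no #ifdef of their own. A minimal sketch; CONFIG_DEMO_FEATURE, demo_cleanup() and run_shutdown() are hypothetical names:

```c
#include <assert.h>

/* Compile with -DCONFIG_DEMO_FEATURE to get the real function; otherwise
 * calls to demo_cleanup() expand to nothing. */
static int cleanup_calls;

#ifdef CONFIG_DEMO_FEATURE
static void demo_cleanup(void)
{
    cleanup_calls++;
}
#else
#define demo_cleanup()
#endif

static int run_shutdown(void)
{
    demo_cleanup();   /* an empty statement when the feature is compiled out */
    return cleanup_calls;
}
```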
net_cleanup()
Function net_cleanup() is defined in /net/net.c as follows:
void net_cleanup(void)
{
NetClientState *nc, **p = &QTAILQ_FIRST(&net_clients);
/*cleanup colo compare module for COLO*/
colo_compare_cleanup();
/*
* Walk the net_clients list and remove the netdevs but *not* any
* NET_CLIENT_DRIVER_NIC entries. The latter are owned by the device
* model which created them, and in some cases (e.g. xen-net-device)
* the device itself may do cleanup at exit and will be upset if we
* just delete its NIC from underneath it.
*
* Since qemu_del_net_client() may delete multiple entries, using
* QTAILQ_FOREACH_SAFE() is not safe here. The only safe pointer
* to keep as a bookmark is a NET_CLIENT_DRIVER_NIC entry, so keep
* 'p' pointing to either the head of the list, or the 'next' field
* of the latest NET_CLIENT_DRIVER_NIC, and operate on *p as we walk
* the list.
*
* The 'nc' variable isn't part of the list traversal; it's purely
* for convenience as too much '(*p)->' has a tendency to make the
* readers' eyes bleed.
*/
while (*p) {
nc = *p;
if (nc->info->type == NET_CLIENT_DRIVER_NIC) {
/* Skip NET_CLIENT_DRIVER_NIC entries */
p = &QTAILQ_NEXT(nc, next);
} else {
qemu_del_net_client(nc);
}
}
qemu_del_vm_change_state_handler(net_change_state_entry);
}
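The comment in net_cleanup() describes walking the list through a pointer to the link that refers to the current node (`p`): after a deletion, `*p` already names the next node, so `p` only advances past entries that are kept. A sketch of that walk on a hypothetical singly linked list; unlike qemu_del_net_client(), which may remove several entries from a QTAILQ, each deletion here removes exactly one node:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical client list: NIC entries are owned elsewhere and must
 * survive, everything else is unlinked. */
typedef struct Client {
    struct Client *next;
    bool is_nic;
} Client;

static size_t remove_non_nics(Client **head)
{
    size_t removed = 0;
    Client **p = head;

    assert(head != NULL);
    while (*p) {
        Client *nc = *p;
        if (nc->is_nic) {
            p = &nc->next;      /* keep it; bookmark its next field */
        } else {
            *p = nc->next;      /* unlink; *p is now the following node */
            nc->next = NULL;
            removed++;
        }
    }
    return removed;
}
```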
audio_cleanup()
Function audio_cleanup() is defined in /audio/audio.c as follows:
void audio_cleanup(void)
{
default_audio_state = NULL;
while (!QTAILQ_EMPTY(&audio_states)) {
AudioState *s = QTAILQ_FIRST(&audio_states);
QTAILQ_REMOVE(&audio_states, s, list);
free_audio_state(s);
}
}
monitor_cleanup()
Function monitor_cleanup() is defined in /monitor/monitor.c as follows:
void monitor_cleanup(void)
{
/*
* The dispatcher needs to stop before destroying the monitor and
* the I/O thread.
*
* We need to poll both qemu_aio_context and iohandler_ctx to make
* sure that the dispatcher coroutine keeps making progress and
* eventually terminates. qemu_aio_context is automatically
* polled by calling AIO_WAIT_WHILE_UNLOCKED on it, but we must poll
* iohandler_ctx manually.
*
* Letting the iothread continue while shutting down the dispatcher
* means that new requests may still be coming in. This is okay,
* we'll just leave them in the queue without sending a response
* and monitor_data_destroy() will free them.
*/
WITH_QEMU_LOCK_GUARD(&monitor_lock) {
qmp_dispatcher_co_shutdown = true;
}
qmp_dispatcher_co_wake();
AIO_WAIT_WHILE_UNLOCKED(NULL,
(aio_poll(iohandler_get_aio_context(), false),
qatomic_read(&qmp_dispatcher_co)));
/*
* We need to explicitly stop the I/O thread (but not destroy it),
* clean up the monitor resources, then destroy the I/O thread since
* we need to unregister from chardev below in
* monitor_data_destroy(), and chardev is not thread-safe yet
*/
if (mon_iothread) {
iothread_stop(mon_iothread);
}
/* Flush output buffers and destroy monitors */
qemu_mutex_lock(&monitor_lock);
monitor_destroyed = true;
while (!QTAILQ_EMPTY(&mon_list)) {
Monitor *mon = QTAILQ_FIRST(&mon_list);
QTAILQ_REMOVE(&mon_list, mon, entry);
/* Permit QAPI event emission from character frontend release */
qemu_mutex_unlock(&monitor_lock);
monitor_flush(mon);
monitor_data_destroy(mon);
qemu_mutex_lock(&monitor_lock);
g_free(mon);
}
qemu_mutex_unlock(&monitor_lock);
if (mon_iothread) {
iothread_destroy(mon_iothread);
mon_iothread = NULL;
}
}
qemu_chr_cleanup()
Function qemu_chr_cleanup() is defined in /chardev/char.c as follows:
void qemu_chr_cleanup(void)
{
object_unparent(get_chardevs_root());
}
user_creatable_cleanup()
Function user_creatable_cleanup() is defined in /qom/object_interfaces.c as follows:
void user_creatable_cleanup(void)
{
object_unparent(object_get_objects_root());
}
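Summary
The sections above analyzed the code that runs when QEMU system emulation, having entered its main loop, leaves it again: the teardown of the gdb stub, migration, block exports and block devices, vCPUs, record/replay, TPM, networking, audio, monitors, character devices, and user-creatable objects.
Once this is done, the QEMU process exits, and the life cycle of the whole QEMU system comes to an end.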