sys/vm/vm_page.h
 *
 * An ordered list of pages due for pageout.
 *
 * In addition, the structure contains the object
 * and offset to which this page belongs (for pageout),
 * and sundry status bits.
 *
 * In general, operations on this structure's mutable fields are
 * synchronized using one of, or a combination of, the locks listed
 * below.  If a field is annotated with two of these locks then holding
 * either is sufficient for read access but both are required for write
 * access.  The physical address of a page is used to select its page
 * lock from a pool.  The queue lock for a page depends on the value of
 * its queue field and is described in detail below.
 *
 * The following annotations are possible:
 *	(A) the field is atomic and may require additional synchronization.
 *	(B) the page busy lock.
 *	(C) the field is immutable.
 *	(F) the per-domain lock for the free queues.
 *	(M) machine dependent, defined by the pmap layer.
 *	(O) the object that the page belongs to.
 *	(P) the page lock.
 *	(Q) the page's queue lock.
 *
 * The busy lock is an embedded reader-writer lock that protects the
 * page's contents and identity (i.e., its <object, pindex> tuple) as
 * well as certain valid/dirty modifications.  To avoid bloating the
 * page structure, the busy lock lacks some of the features available
 * to the kernel's general-purpose synchronization primitives.  As a
 * result, busy lock ordering rules are not verified, lock recursion is
 * not detected, and an attempt to xbusy a busy page or sbusy an xbusy
 * page will trigger a panic rather than causing the thread to block.
 * vm_page_sleep_if_busy() can be used to sleep until the page's busy
 * state changes, after which the caller must re-lookup the page and
 * re-evaluate its state.  vm_page_busy_acquire() will block until
 * the lock is acquired.
 *
 * The valid field is protected by the page busy lock (B) and object
 * lock (O).  Transitions from invalid to valid are generally done
 * via I/O or zero filling and do not require the object lock.  These
 * must be protected with the busy lock to prevent page-in or creation
 * races.  Page invalidation generally happens as a result of truncate
 * or msync.  When invalidated, a page must not be present in pmap,
 * and the object lock must be held to prevent concurrent speculative
 * read-only mappings that do not require busy.  I/O routines may
 * check for validity without a lock if they are prepared to handle
 * invalidation races with higher level locks (vnode) or are
 * unconcerned with races so long as they hold a reference to prevent
 * recycling.  When a valid bit is set while holding a shared busy
 * lock, (A) atomic operations are used to protect against concurrent
 * modification.
 *
 * In contrast, the synchronization of accesses to the page's
 * dirty field is a mix of machine dependent (M) and busy (B).  In
 * the machine-independent layer, the page busy lock must be held to
 * operate on the field.  However, the pmap layer is permitted to
 * set all bits within the field without holding that lock.  If the
 * underlying architecture does not support atomic read-modify-write
 * operations on the field's type, then the machine-independent
 * layer uses a 32-bit atomic on the aligned 32-bit word that
 * contains the dirty field.  In the machine-independent layer,
 * the implementation of read-modify-write operations on the
 * field is encapsulated in vm_page_clear_dirty_mask().  An
 * exclusive busy lock combined with pmap_remove_{write/all}() is the

markj: Missing parens after pmap_remove_all().
Isn't pmap_remove_write() also sufficient?

 * only way to ensure a page cannot become dirty.  I/O generally
 * removes the page from pmap to ensure exclusive access and atomic
 * writes.
 *
 * The ref_count field tracks references to the page.  References that
 * prevent the page from being reclaimable are called wirings and are
 * counted in the low bits of ref_count.  The containing object's
 * reference, if one exists, is counted using the VPRC_OBJREF bit in the
 * ref_count field.  Additionally, the VPRC_BLOCKED bit is used to
 * atomically check for wirings and prevent new wirings via
 * pmap_extract_and_hold().  When a page belongs to an object, it may be
 * wired only when the object is locked, or the page is busy, or by
 * pmap_extract_and_hold().  As a result, if the object is locked and the
 * page is not busy (or is exclusively busied by the current thread), and
 * the page is unmapped, its wire count will not increase.  The ref_count
 * field is updated using atomic operations in most cases, except when it
 * is known that no other references to the page exist, such as in the page
 * allocator.  A page may be present in the page queues, or even actively
 * scanned by the page daemon, without an explicitly counted reference.
 * The page daemon must therefore handle the possibility of a concurrent
 * free of the page.
 *
 * The queue field is the index of the page queue containing the page,
 * or PQ_NONE if the page is not enqueued.  The queue lock of a page is
 * the page queue lock corresponding to the page queue index, or the
 * page lock (P) for the page if it is not enqueued.  To modify the
 * queue field, the queue lock for the old value of the field must be
 * held.  There is one exception to this rule: the page daemon may
 * transition the queue field from PQ_INACTIVE to PQ_NONE immediately
 * prior to freeing a page during an inactive queue scan.  At that
 * point the page has already been physically dequeued and no other
 * references to that vm_page structure exist.
 *
 * To avoid contention on page queue locks, page queue operations
 * (enqueue, dequeue, requeue) are batched using per-CPU queues.  A
 * deferred operation is requested by inserting an entry into a batch
struct vm_page {
	vm_pindex_t pindex;		/* offset into object (O,P) */
	vm_paddr_t phys_addr;		/* physical address of page (C) */
	struct md_page md;		/* machine dependent stuff */
	u_int ref_count;		/* page references (A) */
	volatile u_int busy_lock;	/* busy owners lock */
	uint16_t flags;			/* page PG_* flags (P) */
	uint8_t order;			/* index of the buddy queue (F) */
	uint8_t pool;			/* vm_phys freepool index (F) */
	uint8_t aflags;			/* atomic flags (A) */
	uint8_t oflags;			/* page VPO_* flags (O) */
	uint8_t queue;			/* page queue index (Q) */
	int8_t psind;			/* pagesizes[] index (O) */
	int8_t segind;			/* vm_phys segment index (C) */
	u_char act_count;		/* page usage count (P) */
	/* NOTE that these must support one bit per DEV_BSIZE in a page */
	/* so, on normal X86 kernels, they must be at least 8 bits wide */
	vm_page_bits_t valid;		/* valid DEV_BSIZE chunk map (O,B) */
	vm_page_bits_t dirty;		/* dirty DEV_BSIZE chunk map (M,B) */
};
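As the comment above explains, busy_lock packs a reader-writer lock into a single word. The following is a minimal userspace sketch of that idea using C11 atomics; the names and bit layout are illustrative assumptions for this sketch, not the kernel's actual busy_lock encoding (which additionally tracks the exclusive owner and waiters):

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical layout: high bit marks an exclusive holder, low bits count
 * shared ("sbusy") holders.  Not the kernel's VPB_* encoding. */
#define BUSY_UNLOCKED	0u
#define BUSY_XBUSY	0x80000000u

static _Atomic uint32_t busy_word = BUSY_UNLOCKED;

/* Try to shared-busy: fails if an exclusive holder exists. */
static bool
try_sbusy(void)
{
	uint32_t old = atomic_load(&busy_word);

	do {
		if (old & BUSY_XBUSY)
			return (false);
	} while (!atomic_compare_exchange_weak(&busy_word, &old, old + 1));
	return (true);
}

/* Try to exclusive-busy: fails if any holder, shared or exclusive, exists. */
static bool
try_xbusy(void)
{
	uint32_t expected = BUSY_UNLOCKED;

	return (atomic_compare_exchange_strong(&busy_word, &expected,
	    BUSY_XBUSY));
}

static void
sunbusy(void)
{
	atomic_fetch_sub(&busy_word, 1);
}

static void
xunbusy(void)
{
	atomic_store(&busy_word, BUSY_UNLOCKED);
}
```

Note that, as with the real busy lock, there is no blocking or ordering verification here; a caller that fails to acquire must retry or sleep by other means.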
/*
 * Special bits used in the ref_count field.
 *
 * ref_count is normally used to count wirings that prevent the page from being
 * reclaimed, but also supports several special types of references that do not
 * prevent reclamation.  Accesses to the ref_count field must be atomic unless
void vm_page_deactivate_noreuse(vm_page_t);
void vm_page_dequeue(vm_page_t m);
void vm_page_dequeue_deferred(vm_page_t m);
vm_page_t vm_page_find_least(vm_object_t, vm_pindex_t);
bool vm_page_free_prep(vm_page_t m);
vm_page_t vm_page_getfake(vm_paddr_t paddr, vm_memattr_t memattr);
void vm_page_initfake(vm_page_t m, vm_paddr_t paddr, vm_memattr_t memattr);
int vm_page_insert(vm_page_t, vm_object_t, vm_pindex_t);
void vm_page_invalid(vm_page_t m);
void vm_page_launder(vm_page_t m);
vm_page_t vm_page_lookup(vm_object_t, vm_pindex_t);
vm_page_t vm_page_next(vm_page_t m);
int vm_page_pa_tryrelock(pmap_t, vm_paddr_t, vm_paddr_t *);
void vm_page_pqbatch_drain(void);
void vm_page_pqbatch_submit(vm_page_t m, uint8_t queue);
vm_page_t vm_page_prev(vm_page_t m);
bool vm_page_ps_test(vm_page_t m, int flags, vm_page_t skip_m);
void vm_page_unswappable(vm_page_t m);
void vm_page_unwire(vm_page_t m, uint8_t queue);
bool vm_page_unwire_noq(vm_page_t m);
void vm_page_updatefake(vm_page_t m, vm_paddr_t paddr, vm_memattr_t memattr);
void vm_page_wire(vm_page_t);
bool vm_page_wire_mapped(vm_page_t m);
void vm_page_xunbusy_hard(vm_page_t m);
void vm_page_set_validclean(vm_page_t, int, int);
void vm_page_clear_dirty(vm_page_t, int, int);
void vm_page_set_invalid(vm_page_t, int, int);
void vm_page_valid(vm_page_t m);
int vm_page_is_valid(vm_page_t, int, int);
void vm_page_test_dirty(vm_page_t);
vm_page_bits_t vm_page_bits(int base, int size);
void vm_page_zero_invalid(vm_page_t m, boolean_t setvalid);
void vm_page_free_toq(vm_page_t m);
void vm_page_free_pages_toq(struct spglist *free, bool update_wire_count);
void vm_page_dirty_KBI(vm_page_t m);
void vm_page_lock_KBI(vm_page_t m, const char *file, int line);
void vm_page_unlock_KBI(vm_page_t m, const char *file, int line);
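Per the NOTE in the structure definition, the valid and dirty fields hold one bit per DEV_BSIZE chunk of the page. Below is a hypothetical userspace sketch of how a (base, size) byte range within a page maps to such a bitmask, assuming a 4K page and 512-byte chunks; the kernel's vm_page_bits() may differ in detail:

```c
#include <stdint.h>

#define DEV_BSIZE	512
#define DEV_BSHIFT	9		/* log2(DEV_BSIZE) */

/* 8 chunks per 4K page, so 8 bits suffice (as the NOTE says). */
typedef uint8_t vm_page_bits_t;

/* Bitmask covering every DEV_BSIZE chunk touched by [base, base + size). */
static vm_page_bits_t
page_bits(int base, int size)
{
	int first_bit, last_bit;

	if (size == 0)
		return (0);
	first_bit = base >> DEV_BSHIFT;		/* first chunk touched */
	last_bit = (base + size - 1) >> DEV_BSHIFT; /* last chunk touched */
	return ((vm_page_bits_t)((2u << last_bit) - (1u << first_bit)));
}
```

For example, a write of 1024 bytes at offset 512 touches chunks 1 and 2, so the corresponding dirty mask would be 0x06 under these assumptions.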
 * unmapped and unbusied or exclusively busied by the current thread, no
 * new wirings may be created.
 */
static inline bool
vm_page_wired(vm_page_t m)
{

	return (VPRC_WIRE_COUNT(m->ref_count) > 0);
}

static inline bool
vm_page_all_valid(vm_page_t m)
{

	return (m->valid == VM_PAGE_BITS_ALL);
}

static inline bool
vm_page_none_valid(vm_page_t m)
{

	return (m->valid == 0);
}

markj: [Bikeshedding] vm_page_any_valid(), with the sense inverted, reads better to me.
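The wiring rules discussed earlier (wirings counted in the low bits of ref_count, with VPRC_OBJREF and VPRC_BLOCKED in the high bits) can likewise be modeled in userspace. The bit values and helpers below are illustrative stand-ins for this sketch, not the kernel's exact definitions:

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-ins for the special ref_count bits. */
#define VPRC_BLOCKED	0x40000000u	/* mapping removal in progress */
#define VPRC_OBJREF	0x80000000u	/* object reference, not a wiring */
#define VPRC_WIRE_COUNT(c)	((c) & ~(VPRC_BLOCKED | VPRC_OBJREF))

static _Atomic uint32_t ref_count;

/* Wire the page: one more reference preventing reclamation. */
static void
page_wire(void)
{
	atomic_fetch_add(&ref_count, 1);
}

/*
 * Model of the pmap_extract_and_hold() path: refuse to add a wiring
 * while VPRC_BLOCKED is set, mirroring the atomic check described in
 * the big comment above.
 */
static bool
page_wire_mapped(void)
{
	uint32_t old = atomic_load(&ref_count);

	do {
		if (old & VPRC_BLOCKED)
			return (false);
	} while (!atomic_compare_exchange_weak(&ref_count, &old, old + 1));
	return (true);
}

/* True if any wiring exists; the flag bits alone do not count. */
static bool
page_wired(void)
{
	return (VPRC_WIRE_COUNT(atomic_load(&ref_count)) > 0);
}
```

This shows why an object reference alone leaves the page reclaimable: only the masked low-bit count constitutes wirings.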
#endif /* _KERNEL */
#endif /* !_VM_PAGE_ */