Index: head/share/man/man9/random.9 =================================================================== --- head/share/man/man9/random.9 (revision 356096) +++ head/share/man/man9/random.9 (revision 356097) @@ -1,227 +1,202 @@ .\" .\" Copyright (c) 2015 .\" Mark R V Murray .\" Copyright (c) 2000 .\" The Regents of the University of California. All rights reserved. .\" .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" .\" THIS SOFTWARE IS PROVIDED BY THE DEVELOPERS ``AS IS'' AND ANY EXPRESS OR .\" IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES .\" OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. .\" IN NO EVENT SHALL THE DEVELOPERS BE LIABLE FOR ANY DIRECT, INDIRECT, .\" INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT .\" NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, .\" DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY .\" THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT .\" (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF .\" THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. .\" .\" $FreeBSD$ .\" " -.Dd April 16, 2019 +.Dd December 26, 2019 .Dt RANDOM 9 .Os .Sh NAME .Nm arc4rand , .Nm arc4random , .Nm arc4random_buf , .Nm is_random_seeded , .Nm random , .Nm read_random , -.Nm read_random_uio , -.Nm srandom +.Nm read_random_uio .Nd supply pseudo-random numbers .Sh SYNOPSIS .In sys/libkern.h .Ft uint32_t .Fn arc4random "void" .Ft void .Fn arc4random_buf "void *ptr" "size_t len" .Ft void .Fn arc4rand "void *ptr" "u_int length" "int reseed" .Pp .In sys/random.h .Ft bool .Fn is_random_seeded "void" .Ft void .Fn read_random "void *buffer" "int count" .Ft int .Fn read_random_uio "struct uio *uio" "bool nonblock" .Ss LEGACY ROUTINES .In sys/libkern.h -.Ft void -.Fn srandom "u_long seed" .Ft u_long .Fn random "void" .Sh DESCRIPTION The .Fn arc4random and .Fn arc4random_buf functions will return very good quality random numbers, suited for security-related purposes. Both are wrappers around the underlying .Fn arc4rand interface. .Fn arc4random returns a 32-bit random value, while .Fn arc4random_buf fills .Fa ptr with .Fa len bytes of random data. .Pp The .Fn arc4rand CSPRNG is seeded from the .Xr random 4 kernel abstract entropy device. Automatic reseeding happens at unspecified time and bytes (of output) intervals. A reseed can be forced by passing a non-zero .Fa reseed value. .Pp The .Fn read_random function is used to read entropy directly from the kernel abstract entropy device. .Fn read_random blocks if and until the entropy device is seeded. The provided .Fa buffer is filled with no more than .Fa count bytes. It is strongly advised that .Fn read_random is not used directly; instead, use the .Fn arc4rand family of functions. .Pp The .Fn is_random_seeded function can be used to check in advance if .Fn read_random will block. (If random is seeded, it will not block.) .Pp The .Fn read_random_uio function behaves identically to .Xr read 2 on .Pa /dev/random . 
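The DESCRIPTION above is the interface contract for the arc4rand(9) family and the raw entropy readers. The following is a minimal sketch, not part of this revision, of how a kernel consumer might call those interfaces; the function and buffer names (example_keygen, key, cookie) are hypothetical, and only the calls documented in the SYNOPSIS above are used.

/*
 * Hypothetical kernel consumer of the interfaces documented in
 * random(9).  Illustrative only; not from this commit.
 */
#include <sys/param.h>
#include <sys/systm.h>
#include <sys/libkern.h>	/* arc4random(), arc4random_buf(), arc4rand() */
#include <sys/random.h>		/* is_random_seeded(), read_random() */

static void
example_keygen(void)
{
	uint8_t key[32];
	uint32_t cookie;

	/* Preferred interface: the arc4rand(9) family, seeded from random(4). */
	cookie = arc4random();			/* one 32-bit random value */
	arc4random_buf(key, sizeof(key));	/* fill a buffer with random bytes */

	/* A non-zero third argument forces a reseed before generating. */
	arc4rand(key, sizeof(key), 1);

	/*
	 * read_random(9) reads directly from the entropy device and blocks
	 * until it is seeded; check is_random_seeded(9) first if blocking
	 * here would be unacceptable.  (The man page advises using the
	 * arc4rand(9) family instead of read_random(9) directly.)
	 */
	if (is_random_seeded())
		read_random(key, sizeof(key));

	explicit_bzero(key, sizeof(key));
	(void)cookie;
}

The read_random_uio() path, its nonblocking behaviour, and the EFAULT/EWOULDBLOCK error cases are described in the man page text that follows.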
The .Fa uio argument points to a buffer where random data should be stored. If .Fa nonblock is true and the random device is not seeded, this function does not return any data. Otherwise, this function may block interruptibly until the random device is seeded. If the function is interrupted before the random device is seeded, no data is returned. .Pp -The legacy -.Fn random -function will produce a sequence of numbers that can be duplicated by calling -.Fn srandom -with some constant as the -.Fa seed . -The legacy -.Fn srandom -function may be called with any -.Fa seed -value. +The deprecated +.Xr random 9 +function will produce a sequence of pseudorandom numbers using a similar weak +linear congruential generator as +.Xr rand 3 +(the 1988 Park-Miller LCG). +It is obsolete and scheduled to be removed in +.Fx 13.0 . It is strongly advised that the -.Fn random +.Xr random 9 function not be used to generate random numbers. See .Sx SECURITY CONSIDERATIONS . .Sh RETURN VALUES The .Fn arc4rand function uses the Chacha20 algorithm to generate a pseudo-random sequence of bytes. The .Fn arc4random function uses .Fn arc4rand to generate pseudo-random numbers in the range from 0 to .if t 2\u\s732\s10\d\(mi1. .if n (2**32)\(mi1. .Pp The .Fn read_random function returns the number of bytes placed in .Fa buffer . .Pp .Fn read_random_uio returns zero when successful, otherwise an error code is returned. -.Pp -The legacy -.Fn random -function uses -a non-linear additive feedback random number generator -employing a default table -of size 31 -containing long integers -to return successive pseudo-random -numbers in the range from 0 to -.if t 2\u\s731\s10\d\(mi1. -.if n (2**31)\(mi1. -The period of this random number generator -is very large, -approximately -.if t 16\(mu(2\u\s731\s10\d\(mi1). -.if n 16*((2**31)\(mi1). .Sh ERRORS .Fn read_random_uio may fail if: .Bl -tag -width Er .It Bq Er EFAULT .Fa uio points to an invalid memory region. .It Bq Er EWOULDBLOCK The random device is unseeded and .Fa nonblock is true. .El .Sh AUTHORS .An Dan Moschuk wrote .Fn arc4random . .An Mark R V Murray wrote .Fn read_random . .Sh SECURITY CONSIDERATIONS Do not use .Fn random -or -.Fn srandom in new code. .Pp It is important to remember that the .Fn random function is entirely predictable. It is easy for attackers to predict future output of .Fn random by recording some generated values. We cannot emphasize strongly enough that .Fn random must not be used to generate values that are intended to be unpredictable. Index: head/sys/compat/ndis/subr_ntoskrnl.c =================================================================== --- head/sys/compat/ndis/subr_ntoskrnl.c (revision 356096) +++ head/sys/compat/ndis/subr_ntoskrnl.c (revision 356097) @@ -1,4456 +1,4454 @@ /*- * SPDX-License-Identifier: BSD-4-Clause * * Copyright (c) 2003 * Bill Paul . All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. All advertising materials mentioning features or use of this software * must display the following acknowledgement: * This product includes software developed by Bill Paul. 
* 4. Neither the name of the author nor the names of any co-contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY Bill Paul AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL Bill Paul OR THE VOICES IN HIS HEAD * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF * THE POSSIBILITY OF SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #ifdef NTOSKRNL_DEBUG_TIMERS static int sysctl_show_timers(SYSCTL_HANDLER_ARGS); SYSCTL_PROC(_debug, OID_AUTO, ntoskrnl_timers, CTLTYPE_INT | CTLFLAG_RW, NULL, 0, sysctl_show_timers, "I", "Show ntoskrnl timer stats"); #endif struct kdpc_queue { list_entry kq_disp; struct thread *kq_td; int kq_cpu; int kq_exit; int kq_running; kspin_lock kq_lock; nt_kevent kq_proc; nt_kevent kq_done; }; typedef struct kdpc_queue kdpc_queue; struct wb_ext { struct cv we_cv; struct thread *we_td; }; typedef struct wb_ext wb_ext; #define NTOSKRNL_TIMEOUTS 256 #ifdef NTOSKRNL_DEBUG_TIMERS static uint64_t ntoskrnl_timer_fires; static uint64_t ntoskrnl_timer_sets; static uint64_t ntoskrnl_timer_reloads; static uint64_t ntoskrnl_timer_cancels; #endif struct callout_entry { struct callout ce_callout; list_entry ce_list; }; typedef struct callout_entry callout_entry; static struct list_entry ntoskrnl_calllist; static struct mtx ntoskrnl_calllock; struct kuser_shared_data kuser_shared_data; static struct list_entry ntoskrnl_intlist; static kspin_lock ntoskrnl_intlock; static uint8_t RtlEqualUnicodeString(unicode_string *, unicode_string *, uint8_t); static void RtlCopyString(ansi_string *, const ansi_string *); static void RtlCopyUnicodeString(unicode_string *, unicode_string *); static irp *IoBuildSynchronousFsdRequest(uint32_t, device_object *, void *, uint32_t, uint64_t *, nt_kevent *, io_status_block *); static irp *IoBuildAsynchronousFsdRequest(uint32_t, device_object *, void *, uint32_t, uint64_t *, io_status_block *); static irp *IoBuildDeviceIoControlRequest(uint32_t, device_object *, void *, uint32_t, void *, uint32_t, uint8_t, nt_kevent *, io_status_block *); static irp *IoAllocateIrp(uint8_t, uint8_t); static void IoReuseIrp(irp *, uint32_t); static void IoFreeIrp(irp *); static void IoInitializeIrp(irp *, uint16_t, uint8_t); static irp *IoMakeAssociatedIrp(irp *, uint8_t); static uint32_t KeWaitForMultipleObjects(uint32_t, nt_dispatch_header **, uint32_t, uint32_t, uint32_t, uint8_t, int64_t *, wait_block *); static void ntoskrnl_waittest(nt_dispatch_header *, uint32_t); static void ntoskrnl_satisfy_wait(nt_dispatch_header *, struct thread *); static void 
ntoskrnl_satisfy_multiple_waits(wait_block *); static int ntoskrnl_is_signalled(nt_dispatch_header *, struct thread *); static void ntoskrnl_insert_timer(ktimer *, int); static void ntoskrnl_remove_timer(ktimer *); #ifdef NTOSKRNL_DEBUG_TIMERS static void ntoskrnl_show_timers(void); #endif static void ntoskrnl_timercall(void *); static void ntoskrnl_dpc_thread(void *); static void ntoskrnl_destroy_dpc_threads(void); static void ntoskrnl_destroy_workitem_threads(void); static void ntoskrnl_workitem_thread(void *); static void ntoskrnl_workitem(device_object *, void *); static void ntoskrnl_unicode_to_ascii(uint16_t *, char *, int); static void ntoskrnl_ascii_to_unicode(char *, uint16_t *, int); static uint8_t ntoskrnl_insert_dpc(list_entry *, kdpc *); static void WRITE_REGISTER_USHORT(uint16_t *, uint16_t); static uint16_t READ_REGISTER_USHORT(uint16_t *); static void WRITE_REGISTER_ULONG(uint32_t *, uint32_t); static uint32_t READ_REGISTER_ULONG(uint32_t *); static void WRITE_REGISTER_UCHAR(uint8_t *, uint8_t); static uint8_t READ_REGISTER_UCHAR(uint8_t *); static int64_t _allmul(int64_t, int64_t); static int64_t _alldiv(int64_t, int64_t); static int64_t _allrem(int64_t, int64_t); static int64_t _allshr(int64_t, uint8_t); static int64_t _allshl(int64_t, uint8_t); static uint64_t _aullmul(uint64_t, uint64_t); static uint64_t _aulldiv(uint64_t, uint64_t); static uint64_t _aullrem(uint64_t, uint64_t); static uint64_t _aullshr(uint64_t, uint8_t); static uint64_t _aullshl(uint64_t, uint8_t); static slist_entry *ntoskrnl_pushsl(slist_header *, slist_entry *); static void InitializeSListHead(slist_header *); static slist_entry *ntoskrnl_popsl(slist_header *); static void ExFreePoolWithTag(void *, uint32_t); static void ExInitializePagedLookasideList(paged_lookaside_list *, lookaside_alloc_func *, lookaside_free_func *, uint32_t, size_t, uint32_t, uint16_t); static void ExDeletePagedLookasideList(paged_lookaside_list *); static void ExInitializeNPagedLookasideList(npaged_lookaside_list *, lookaside_alloc_func *, lookaside_free_func *, uint32_t, size_t, uint32_t, uint16_t); static void ExDeleteNPagedLookasideList(npaged_lookaside_list *); static slist_entry *ExInterlockedPushEntrySList(slist_header *, slist_entry *, kspin_lock *); static slist_entry *ExInterlockedPopEntrySList(slist_header *, kspin_lock *); static uint32_t InterlockedIncrement(volatile uint32_t *); static uint32_t InterlockedDecrement(volatile uint32_t *); static void ExInterlockedAddLargeStatistic(uint64_t *, uint32_t); static void *MmAllocateContiguousMemory(uint32_t, uint64_t); static void *MmAllocateContiguousMemorySpecifyCache(uint32_t, uint64_t, uint64_t, uint64_t, enum nt_caching_type); static void MmFreeContiguousMemory(void *); static void MmFreeContiguousMemorySpecifyCache(void *, uint32_t, enum nt_caching_type); static uint32_t MmSizeOfMdl(void *, size_t); static void *MmMapLockedPages(mdl *, uint8_t); static void *MmMapLockedPagesSpecifyCache(mdl *, uint8_t, uint32_t, void *, uint32_t, uint32_t); static void MmUnmapLockedPages(void *, mdl *); static device_t ntoskrnl_finddev(device_t, uint64_t, struct resource **); static void RtlZeroMemory(void *, size_t); static void RtlSecureZeroMemory(void *, size_t); static void RtlFillMemory(void *, size_t, uint8_t); static void RtlMoveMemory(void *, const void *, size_t); static ndis_status RtlCharToInteger(const char *, uint32_t, uint32_t *); static void RtlCopyMemory(void *, const void *, size_t); static size_t RtlCompareMemory(const void *, const void *, size_t); static 
ndis_status RtlUnicodeStringToInteger(unicode_string *, uint32_t, uint32_t *); static int atoi (const char *); static long atol (const char *); static int rand(void); static void srand(unsigned int); static void KeQuerySystemTime(uint64_t *); static uint32_t KeTickCount(void); static uint8_t IoIsWdmVersionAvailable(uint8_t, uint8_t); static int32_t IoOpenDeviceRegistryKey(struct device_object *, uint32_t, uint32_t, void **); static void ntoskrnl_thrfunc(void *); static ndis_status PsCreateSystemThread(ndis_handle *, uint32_t, void *, ndis_handle, void *, void *, void *); static ndis_status PsTerminateSystemThread(ndis_status); static ndis_status IoGetDeviceObjectPointer(unicode_string *, uint32_t, void *, device_object *); static ndis_status IoGetDeviceProperty(device_object *, uint32_t, uint32_t, void *, uint32_t *); static void KeInitializeMutex(kmutant *, uint32_t); static uint32_t KeReleaseMutex(kmutant *, uint8_t); static uint32_t KeReadStateMutex(kmutant *); static ndis_status ObReferenceObjectByHandle(ndis_handle, uint32_t, void *, uint8_t, void **, void **); static void ObfDereferenceObject(void *); static uint32_t ZwClose(ndis_handle); static uint32_t WmiQueryTraceInformation(uint32_t, void *, uint32_t, uint32_t, void *); static uint32_t WmiTraceMessage(uint64_t, uint32_t, void *, uint16_t, ...); static uint32_t IoWMIRegistrationControl(device_object *, uint32_t); static void *ntoskrnl_memset(void *, int, size_t); static void *ntoskrnl_memmove(void *, void *, size_t); static void *ntoskrnl_memchr(void *, unsigned char, size_t); static char *ntoskrnl_strstr(char *, char *); static char *ntoskrnl_strncat(char *, char *, size_t); static int ntoskrnl_toupper(int); static int ntoskrnl_tolower(int); static funcptr ntoskrnl_findwrap(funcptr); static uint32_t DbgPrint(char *, ...); static void DbgBreakPoint(void); static void KeBugCheckEx(uint32_t, u_long, u_long, u_long, u_long); static int32_t KeDelayExecutionThread(uint8_t, uint8_t, int64_t *); static int32_t KeSetPriorityThread(struct thread *, int32_t); static void dummy(void); static struct mtx ntoskrnl_dispatchlock; static struct mtx ntoskrnl_interlock; static kspin_lock ntoskrnl_cancellock; static int ntoskrnl_kth = 0; static struct nt_objref_head ntoskrnl_reflist; static uma_zone_t mdl_zone; static uma_zone_t iw_zone; static struct kdpc_queue *kq_queues; static struct kdpc_queue *wq_queues; static int wq_idx = 0; int ntoskrnl_libinit() { image_patch_table *patch; int error; struct proc *p; kdpc_queue *kq; callout_entry *e; int i; mtx_init(&ntoskrnl_dispatchlock, "ntoskrnl dispatch lock", MTX_NDIS_LOCK, MTX_DEF|MTX_RECURSE); mtx_init(&ntoskrnl_interlock, MTX_NTOSKRNL_SPIN_LOCK, NULL, MTX_SPIN); KeInitializeSpinLock(&ntoskrnl_cancellock); KeInitializeSpinLock(&ntoskrnl_intlock); TAILQ_INIT(&ntoskrnl_reflist); InitializeListHead(&ntoskrnl_calllist); InitializeListHead(&ntoskrnl_intlist); mtx_init(&ntoskrnl_calllock, MTX_NTOSKRNL_SPIN_LOCK, NULL, MTX_SPIN); kq_queues = ExAllocatePoolWithTag(NonPagedPool, #ifdef NTOSKRNL_MULTIPLE_DPCS sizeof(kdpc_queue) * mp_ncpus, 0); #else sizeof(kdpc_queue), 0); #endif if (kq_queues == NULL) return (ENOMEM); wq_queues = ExAllocatePoolWithTag(NonPagedPool, sizeof(kdpc_queue) * WORKITEM_THREADS, 0); if (wq_queues == NULL) return (ENOMEM); #ifdef NTOSKRNL_MULTIPLE_DPCS bzero((char *)kq_queues, sizeof(kdpc_queue) * mp_ncpus); #else bzero((char *)kq_queues, sizeof(kdpc_queue)); #endif bzero((char *)wq_queues, sizeof(kdpc_queue) * WORKITEM_THREADS); /* * Launch the DPC threads. 
*/ #ifdef NTOSKRNL_MULTIPLE_DPCS for (i = 0; i < mp_ncpus; i++) { #else for (i = 0; i < 1; i++) { #endif kq = kq_queues + i; kq->kq_cpu = i; error = kproc_create(ntoskrnl_dpc_thread, kq, &p, RFHIGHPID, NDIS_KSTACK_PAGES, "Windows DPC %d", i); if (error) panic("failed to launch DPC thread"); } /* * Launch the workitem threads. */ for (i = 0; i < WORKITEM_THREADS; i++) { kq = wq_queues + i; error = kproc_create(ntoskrnl_workitem_thread, kq, &p, RFHIGHPID, NDIS_KSTACK_PAGES, "Windows Workitem %d", i); if (error) panic("failed to launch workitem thread"); } patch = ntoskrnl_functbl; while (patch->ipt_func != NULL) { windrv_wrap((funcptr)patch->ipt_func, (funcptr *)&patch->ipt_wrap, patch->ipt_argcnt, patch->ipt_ftype); patch++; } for (i = 0; i < NTOSKRNL_TIMEOUTS; i++) { e = ExAllocatePoolWithTag(NonPagedPool, sizeof(callout_entry), 0); if (e == NULL) panic("failed to allocate timeouts"); mtx_lock_spin(&ntoskrnl_calllock); InsertHeadList((&ntoskrnl_calllist), (&e->ce_list)); mtx_unlock_spin(&ntoskrnl_calllock); } /* * MDLs are supposed to be variable size (they describe * buffers containing some number of pages, but we don't * know ahead of time how many pages that will be). But * always allocating them off the heap is very slow. As * a compromise, we create an MDL UMA zone big enough to * handle any buffer requiring up to 16 pages, and we * use those for any MDLs for buffers of 16 pages or less * in size. For buffers larger than that (which we assume * will be few and far between, we allocate the MDLs off * the heap. */ mdl_zone = uma_zcreate("Windows MDL", MDL_ZONE_SIZE, NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, 0); iw_zone = uma_zcreate("Windows WorkItem", sizeof(io_workitem), NULL, NULL, NULL, NULL, UMA_ALIGN_PTR, 0); return (0); } int ntoskrnl_libfini() { image_patch_table *patch; callout_entry *e; list_entry *l; patch = ntoskrnl_functbl; while (patch->ipt_func != NULL) { windrv_unwrap(patch->ipt_wrap); patch++; } /* Stop the workitem queues. */ ntoskrnl_destroy_workitem_threads(); /* Stop the DPC queues. */ ntoskrnl_destroy_dpc_threads(); ExFreePool(kq_queues); ExFreePool(wq_queues); uma_zdestroy(mdl_zone); uma_zdestroy(iw_zone); mtx_lock_spin(&ntoskrnl_calllock); while(!IsListEmpty(&ntoskrnl_calllist)) { l = RemoveHeadList(&ntoskrnl_calllist); e = CONTAINING_RECORD(l, callout_entry, ce_list); mtx_unlock_spin(&ntoskrnl_calllock); ExFreePool(e); mtx_lock_spin(&ntoskrnl_calllock); } mtx_unlock_spin(&ntoskrnl_calllock); mtx_destroy(&ntoskrnl_dispatchlock); mtx_destroy(&ntoskrnl_interlock); mtx_destroy(&ntoskrnl_calllock); return (0); } /* * We need to be able to reference this externally from the wrapper; * GCC only generates a local implementation of memset. 
*/ static void * ntoskrnl_memset(buf, ch, size) void *buf; int ch; size_t size; { return (memset(buf, ch, size)); } static void * ntoskrnl_memmove(dst, src, size) void *src; void *dst; size_t size; { bcopy(src, dst, size); return (dst); } static void * ntoskrnl_memchr(void *buf, unsigned char ch, size_t len) { if (len != 0) { unsigned char *p = buf; do { if (*p++ == ch) return (p - 1); } while (--len != 0); } return (NULL); } static char * ntoskrnl_strstr(s, find) char *s, *find; { char c, sc; size_t len; if ((c = *find++) != 0) { len = strlen(find); do { do { if ((sc = *s++) == 0) return (NULL); } while (sc != c); } while (strncmp(s, find, len) != 0); s--; } return ((char *)s); } /* Taken from libc */ static char * ntoskrnl_strncat(dst, src, n) char *dst; char *src; size_t n; { if (n != 0) { char *d = dst; const char *s = src; while (*d != 0) d++; do { if ((*d = *s++) == 0) break; d++; } while (--n != 0); *d = 0; } return (dst); } static int ntoskrnl_toupper(c) int c; { return (toupper(c)); } static int ntoskrnl_tolower(c) int c; { return (tolower(c)); } static uint8_t RtlEqualUnicodeString(unicode_string *str1, unicode_string *str2, uint8_t caseinsensitive) { int i; if (str1->us_len != str2->us_len) return (FALSE); for (i = 0; i < str1->us_len; i++) { if (caseinsensitive == TRUE) { if (toupper((char)(str1->us_buf[i] & 0xFF)) != toupper((char)(str2->us_buf[i] & 0xFF))) return (FALSE); } else { if (str1->us_buf[i] != str2->us_buf[i]) return (FALSE); } } return (TRUE); } static void RtlCopyString(dst, src) ansi_string *dst; const ansi_string *src; { if (src != NULL && src->as_buf != NULL && dst->as_buf != NULL) { dst->as_len = min(src->as_len, dst->as_maxlen); memcpy(dst->as_buf, src->as_buf, dst->as_len); if (dst->as_len < dst->as_maxlen) dst->as_buf[dst->as_len] = 0; } else dst->as_len = 0; } static void RtlCopyUnicodeString(dest, src) unicode_string *dest; unicode_string *src; { if (dest->us_maxlen >= src->us_len) dest->us_len = src->us_len; else dest->us_len = dest->us_maxlen; memcpy(dest->us_buf, src->us_buf, dest->us_len); } static void ntoskrnl_ascii_to_unicode(ascii, unicode, len) char *ascii; uint16_t *unicode; int len; { int i; uint16_t *ustr; ustr = unicode; for (i = 0; i < len; i++) { *ustr = (uint16_t)ascii[i]; ustr++; } } static void ntoskrnl_unicode_to_ascii(unicode, ascii, len) uint16_t *unicode; char *ascii; int len; { int i; uint8_t *astr; astr = ascii; for (i = 0; i < len / 2; i++) { *astr = (uint8_t)unicode[i]; astr++; } } uint32_t RtlUnicodeStringToAnsiString(ansi_string *dest, unicode_string *src, uint8_t allocate) { if (dest == NULL || src == NULL) return (STATUS_INVALID_PARAMETER); dest->as_len = src->us_len / 2; if (dest->as_maxlen < dest->as_len) dest->as_len = dest->as_maxlen; if (allocate == TRUE) { dest->as_buf = ExAllocatePoolWithTag(NonPagedPool, (src->us_len / 2) + 1, 0); if (dest->as_buf == NULL) return (STATUS_INSUFFICIENT_RESOURCES); dest->as_len = dest->as_maxlen = src->us_len / 2; } else { dest->as_len = src->us_len / 2; /* XXX */ if (dest->as_maxlen < dest->as_len) dest->as_len = dest->as_maxlen; } ntoskrnl_unicode_to_ascii(src->us_buf, dest->as_buf, dest->as_len * 2); return (STATUS_SUCCESS); } uint32_t RtlAnsiStringToUnicodeString(unicode_string *dest, ansi_string *src, uint8_t allocate) { if (dest == NULL || src == NULL) return (STATUS_INVALID_PARAMETER); if (allocate == TRUE) { dest->us_buf = ExAllocatePoolWithTag(NonPagedPool, src->as_len * 2, 0); if (dest->us_buf == NULL) return (STATUS_INSUFFICIENT_RESOURCES); dest->us_len = dest->us_maxlen = 
strlen(src->as_buf) * 2; } else { dest->us_len = src->as_len * 2; /* XXX */ if (dest->us_maxlen < dest->us_len) dest->us_len = dest->us_maxlen; } ntoskrnl_ascii_to_unicode(src->as_buf, dest->us_buf, dest->us_len / 2); return (STATUS_SUCCESS); } void * ExAllocatePoolWithTag(pooltype, len, tag) uint32_t pooltype; size_t len; uint32_t tag; { void *buf; buf = malloc(len, M_DEVBUF, M_NOWAIT|M_ZERO); if (buf == NULL) return (NULL); return (buf); } static void ExFreePoolWithTag(buf, tag) void *buf; uint32_t tag; { ExFreePool(buf); } void ExFreePool(buf) void *buf; { free(buf, M_DEVBUF); } uint32_t IoAllocateDriverObjectExtension(drv, clid, extlen, ext) driver_object *drv; void *clid; uint32_t extlen; void **ext; { custom_extension *ce; ce = ExAllocatePoolWithTag(NonPagedPool, sizeof(custom_extension) + extlen, 0); if (ce == NULL) return (STATUS_INSUFFICIENT_RESOURCES); ce->ce_clid = clid; InsertTailList((&drv->dro_driverext->dre_usrext), (&ce->ce_list)); *ext = (void *)(ce + 1); return (STATUS_SUCCESS); } void * IoGetDriverObjectExtension(drv, clid) driver_object *drv; void *clid; { list_entry *e; custom_extension *ce; /* * Sanity check. Our dummy bus drivers don't have * any driver extensions. */ if (drv->dro_driverext == NULL) return (NULL); e = drv->dro_driverext->dre_usrext.nle_flink; while (e != &drv->dro_driverext->dre_usrext) { ce = (custom_extension *)e; if (ce->ce_clid == clid) return ((void *)(ce + 1)); e = e->nle_flink; } return (NULL); } uint32_t IoCreateDevice(driver_object *drv, uint32_t devextlen, unicode_string *devname, uint32_t devtype, uint32_t devchars, uint8_t exclusive, device_object **newdev) { device_object *dev; dev = ExAllocatePoolWithTag(NonPagedPool, sizeof(device_object), 0); if (dev == NULL) return (STATUS_INSUFFICIENT_RESOURCES); dev->do_type = devtype; dev->do_drvobj = drv; dev->do_currirp = NULL; dev->do_flags = 0; if (devextlen) { dev->do_devext = ExAllocatePoolWithTag(NonPagedPool, devextlen, 0); if (dev->do_devext == NULL) { ExFreePool(dev); return (STATUS_INSUFFICIENT_RESOURCES); } bzero(dev->do_devext, devextlen); } else dev->do_devext = NULL; dev->do_size = sizeof(device_object) + devextlen; dev->do_refcnt = 1; dev->do_attacheddev = NULL; dev->do_nextdev = NULL; dev->do_devtype = devtype; dev->do_stacksize = 1; dev->do_alignreq = 1; dev->do_characteristics = devchars; dev->do_iotimer = NULL; KeInitializeEvent(&dev->do_devlock, EVENT_TYPE_SYNC, TRUE); /* * Vpd is used for disk/tape devices, * but we don't support those. (Yet.) */ dev->do_vpb = NULL; dev->do_devobj_ext = ExAllocatePoolWithTag(NonPagedPool, sizeof(devobj_extension), 0); if (dev->do_devobj_ext == NULL) { if (dev->do_devext != NULL) ExFreePool(dev->do_devext); ExFreePool(dev); return (STATUS_INSUFFICIENT_RESOURCES); } dev->do_devobj_ext->dve_type = 0; dev->do_devobj_ext->dve_size = sizeof(devobj_extension); dev->do_devobj_ext->dve_devobj = dev; /* * Attach this device to the driver object's list * of devices. Note: this is not the same as attaching * the device to the device stack. The driver's AddDevice * routine must explicitly call IoAddDeviceToDeviceStack() * to do that. 
*/ if (drv->dro_devobj == NULL) { drv->dro_devobj = dev; dev->do_nextdev = NULL; } else { dev->do_nextdev = drv->dro_devobj; drv->dro_devobj = dev; } *newdev = dev; return (STATUS_SUCCESS); } void IoDeleteDevice(dev) device_object *dev; { device_object *prev; if (dev == NULL) return; if (dev->do_devobj_ext != NULL) ExFreePool(dev->do_devobj_ext); if (dev->do_devext != NULL) ExFreePool(dev->do_devext); /* Unlink the device from the driver's device list. */ prev = dev->do_drvobj->dro_devobj; if (prev == dev) dev->do_drvobj->dro_devobj = dev->do_nextdev; else { while (prev->do_nextdev != dev) prev = prev->do_nextdev; prev->do_nextdev = dev->do_nextdev; } ExFreePool(dev); } device_object * IoGetAttachedDevice(dev) device_object *dev; { device_object *d; if (dev == NULL) return (NULL); d = dev; while (d->do_attacheddev != NULL) d = d->do_attacheddev; return (d); } static irp * IoBuildSynchronousFsdRequest(func, dobj, buf, len, off, event, status) uint32_t func; device_object *dobj; void *buf; uint32_t len; uint64_t *off; nt_kevent *event; io_status_block *status; { irp *ip; ip = IoBuildAsynchronousFsdRequest(func, dobj, buf, len, off, status); if (ip == NULL) return (NULL); ip->irp_usrevent = event; return (ip); } static irp * IoBuildAsynchronousFsdRequest(func, dobj, buf, len, off, status) uint32_t func; device_object *dobj; void *buf; uint32_t len; uint64_t *off; io_status_block *status; { irp *ip; io_stack_location *sl; ip = IoAllocateIrp(dobj->do_stacksize, TRUE); if (ip == NULL) return (NULL); ip->irp_usriostat = status; ip->irp_tail.irp_overlay.irp_thread = NULL; sl = IoGetNextIrpStackLocation(ip); sl->isl_major = func; sl->isl_minor = 0; sl->isl_flags = 0; sl->isl_ctl = 0; sl->isl_devobj = dobj; sl->isl_fileobj = NULL; sl->isl_completionfunc = NULL; ip->irp_userbuf = buf; if (dobj->do_flags & DO_BUFFERED_IO) { ip->irp_assoc.irp_sysbuf = ExAllocatePoolWithTag(NonPagedPool, len, 0); if (ip->irp_assoc.irp_sysbuf == NULL) { IoFreeIrp(ip); return (NULL); } bcopy(buf, ip->irp_assoc.irp_sysbuf, len); } if (dobj->do_flags & DO_DIRECT_IO) { ip->irp_mdl = IoAllocateMdl(buf, len, FALSE, FALSE, ip); if (ip->irp_mdl == NULL) { if (ip->irp_assoc.irp_sysbuf != NULL) ExFreePool(ip->irp_assoc.irp_sysbuf); IoFreeIrp(ip); return (NULL); } ip->irp_userbuf = NULL; ip->irp_assoc.irp_sysbuf = NULL; } if (func == IRP_MJ_READ) { sl->isl_parameters.isl_read.isl_len = len; if (off != NULL) sl->isl_parameters.isl_read.isl_byteoff = *off; else sl->isl_parameters.isl_read.isl_byteoff = 0; } if (func == IRP_MJ_WRITE) { sl->isl_parameters.isl_write.isl_len = len; if (off != NULL) sl->isl_parameters.isl_write.isl_byteoff = *off; else sl->isl_parameters.isl_write.isl_byteoff = 0; } return (ip); } static irp * IoBuildDeviceIoControlRequest(uint32_t iocode, device_object *dobj, void *ibuf, uint32_t ilen, void *obuf, uint32_t olen, uint8_t isinternal, nt_kevent *event, io_status_block *status) { irp *ip; io_stack_location *sl; uint32_t buflen; ip = IoAllocateIrp(dobj->do_stacksize, TRUE); if (ip == NULL) return (NULL); ip->irp_usrevent = event; ip->irp_usriostat = status; ip->irp_tail.irp_overlay.irp_thread = NULL; sl = IoGetNextIrpStackLocation(ip); sl->isl_major = isinternal == TRUE ? 
IRP_MJ_INTERNAL_DEVICE_CONTROL : IRP_MJ_DEVICE_CONTROL; sl->isl_minor = 0; sl->isl_flags = 0; sl->isl_ctl = 0; sl->isl_devobj = dobj; sl->isl_fileobj = NULL; sl->isl_completionfunc = NULL; sl->isl_parameters.isl_ioctl.isl_iocode = iocode; sl->isl_parameters.isl_ioctl.isl_ibuflen = ilen; sl->isl_parameters.isl_ioctl.isl_obuflen = olen; switch(IO_METHOD(iocode)) { case METHOD_BUFFERED: if (ilen > olen) buflen = ilen; else buflen = olen; if (buflen) { ip->irp_assoc.irp_sysbuf = ExAllocatePoolWithTag(NonPagedPool, buflen, 0); if (ip->irp_assoc.irp_sysbuf == NULL) { IoFreeIrp(ip); return (NULL); } } if (ilen && ibuf != NULL) { bcopy(ibuf, ip->irp_assoc.irp_sysbuf, ilen); bzero((char *)ip->irp_assoc.irp_sysbuf + ilen, buflen - ilen); } else bzero(ip->irp_assoc.irp_sysbuf, ilen); ip->irp_userbuf = obuf; break; case METHOD_IN_DIRECT: case METHOD_OUT_DIRECT: if (ilen && ibuf != NULL) { ip->irp_assoc.irp_sysbuf = ExAllocatePoolWithTag(NonPagedPool, ilen, 0); if (ip->irp_assoc.irp_sysbuf == NULL) { IoFreeIrp(ip); return (NULL); } bcopy(ibuf, ip->irp_assoc.irp_sysbuf, ilen); } if (olen && obuf != NULL) { ip->irp_mdl = IoAllocateMdl(obuf, olen, FALSE, FALSE, ip); /* * Normally we would MmProbeAndLockPages() * here, but we don't have to in our * imlementation. */ } break; case METHOD_NEITHER: ip->irp_userbuf = obuf; sl->isl_parameters.isl_ioctl.isl_type3ibuf = ibuf; break; default: break; } /* * Ideally, we should associate this IRP with the calling * thread here. */ return (ip); } static irp * IoAllocateIrp(uint8_t stsize, uint8_t chargequota) { irp *i; i = ExAllocatePoolWithTag(NonPagedPool, IoSizeOfIrp(stsize), 0); if (i == NULL) return (NULL); IoInitializeIrp(i, IoSizeOfIrp(stsize), stsize); return (i); } static irp * IoMakeAssociatedIrp(irp *ip, uint8_t stsize) { irp *associrp; associrp = IoAllocateIrp(stsize, FALSE); if (associrp == NULL) return (NULL); mtx_lock(&ntoskrnl_dispatchlock); associrp->irp_flags |= IRP_ASSOCIATED_IRP; associrp->irp_tail.irp_overlay.irp_thread = ip->irp_tail.irp_overlay.irp_thread; associrp->irp_assoc.irp_master = ip; mtx_unlock(&ntoskrnl_dispatchlock); return (associrp); } static void IoFreeIrp(ip) irp *ip; { ExFreePool(ip); } static void IoInitializeIrp(irp *io, uint16_t psize, uint8_t ssize) { bzero((char *)io, IoSizeOfIrp(ssize)); io->irp_size = psize; io->irp_stackcnt = ssize; io->irp_currentstackloc = ssize; InitializeListHead(&io->irp_thlist); io->irp_tail.irp_overlay.irp_csl = (io_stack_location *)(io + 1) + ssize; } static void IoReuseIrp(ip, status) irp *ip; uint32_t status; { uint8_t allocflags; allocflags = ip->irp_allocflags; IoInitializeIrp(ip, ip->irp_size, ip->irp_stackcnt); ip->irp_iostat.isb_status = status; ip->irp_allocflags = allocflags; } void IoAcquireCancelSpinLock(uint8_t *irql) { KeAcquireSpinLock(&ntoskrnl_cancellock, irql); } void IoReleaseCancelSpinLock(uint8_t irql) { KeReleaseSpinLock(&ntoskrnl_cancellock, irql); } uint8_t IoCancelIrp(irp *ip) { cancel_func cfunc; uint8_t cancelirql; IoAcquireCancelSpinLock(&cancelirql); cfunc = IoSetCancelRoutine(ip, NULL); ip->irp_cancel = TRUE; if (cfunc == NULL) { IoReleaseCancelSpinLock(cancelirql); return (FALSE); } ip->irp_cancelirql = cancelirql; MSCALL2(cfunc, IoGetCurrentIrpStackLocation(ip)->isl_devobj, ip); return (uint8_t)IoSetCancelValue(ip, TRUE); } uint32_t IofCallDriver(dobj, ip) device_object *dobj; irp *ip; { driver_object *drvobj; io_stack_location *sl; uint32_t status; driver_dispatch disp; drvobj = dobj->do_drvobj; if (ip->irp_currentstackloc <= 0) panic("IoCallDriver(): out of stack 
locations"); IoSetNextIrpStackLocation(ip); sl = IoGetCurrentIrpStackLocation(ip); sl->isl_devobj = dobj; disp = drvobj->dro_dispatch[sl->isl_major]; status = MSCALL2(disp, dobj, ip); return (status); } void IofCompleteRequest(irp *ip, uint8_t prioboost) { uint32_t status; device_object *dobj; io_stack_location *sl; completion_func cf; KASSERT(ip->irp_iostat.isb_status != STATUS_PENDING, ("incorrect IRP(%p) status (STATUS_PENDING)", ip)); sl = IoGetCurrentIrpStackLocation(ip); IoSkipCurrentIrpStackLocation(ip); do { if (sl->isl_ctl & SL_PENDING_RETURNED) ip->irp_pendingreturned = TRUE; if (ip->irp_currentstackloc != (ip->irp_stackcnt + 1)) dobj = IoGetCurrentIrpStackLocation(ip)->isl_devobj; else dobj = NULL; if (sl->isl_completionfunc != NULL && ((ip->irp_iostat.isb_status == STATUS_SUCCESS && sl->isl_ctl & SL_INVOKE_ON_SUCCESS) || (ip->irp_iostat.isb_status != STATUS_SUCCESS && sl->isl_ctl & SL_INVOKE_ON_ERROR) || (ip->irp_cancel == TRUE && sl->isl_ctl & SL_INVOKE_ON_CANCEL))) { cf = sl->isl_completionfunc; status = MSCALL3(cf, dobj, ip, sl->isl_completionctx); if (status == STATUS_MORE_PROCESSING_REQUIRED) return; } else { if ((ip->irp_currentstackloc <= ip->irp_stackcnt) && (ip->irp_pendingreturned == TRUE)) IoMarkIrpPending(ip); } /* move to the next. */ IoSkipCurrentIrpStackLocation(ip); sl++; } while (ip->irp_currentstackloc <= (ip->irp_stackcnt + 1)); if (ip->irp_usriostat != NULL) *ip->irp_usriostat = ip->irp_iostat; if (ip->irp_usrevent != NULL) KeSetEvent(ip->irp_usrevent, prioboost, FALSE); /* Handle any associated IRPs. */ if (ip->irp_flags & IRP_ASSOCIATED_IRP) { uint32_t masterirpcnt; irp *masterirp; mdl *m; masterirp = ip->irp_assoc.irp_master; masterirpcnt = InterlockedDecrement(&masterirp->irp_assoc.irp_irpcnt); while ((m = ip->irp_mdl) != NULL) { ip->irp_mdl = m->mdl_next; IoFreeMdl(m); } IoFreeIrp(ip); if (masterirpcnt == 0) IoCompleteRequest(masterirp, IO_NO_INCREMENT); return; } /* With any luck, these conditions will never arise. */ if (ip->irp_flags & IRP_PAGING_IO) { if (ip->irp_mdl != NULL) IoFreeMdl(ip->irp_mdl); IoFreeIrp(ip); } } void ntoskrnl_intr(arg) void *arg; { kinterrupt *iobj; uint8_t irql; uint8_t claimed; list_entry *l; KeAcquireSpinLock(&ntoskrnl_intlock, &irql); l = ntoskrnl_intlist.nle_flink; while (l != &ntoskrnl_intlist) { iobj = CONTAINING_RECORD(l, kinterrupt, ki_list); claimed = MSCALL2(iobj->ki_svcfunc, iobj, iobj->ki_svcctx); if (claimed == TRUE) break; l = l->nle_flink; } KeReleaseSpinLock(&ntoskrnl_intlock, irql); } uint8_t KeAcquireInterruptSpinLock(iobj) kinterrupt *iobj; { uint8_t irql; KeAcquireSpinLock(&ntoskrnl_intlock, &irql); return (irql); } void KeReleaseInterruptSpinLock(kinterrupt *iobj, uint8_t irql) { KeReleaseSpinLock(&ntoskrnl_intlock, irql); } uint8_t KeSynchronizeExecution(iobj, syncfunc, syncctx) kinterrupt *iobj; void *syncfunc; void *syncctx; { uint8_t irql; KeAcquireSpinLock(&ntoskrnl_intlock, &irql); MSCALL1(syncfunc, syncctx); KeReleaseSpinLock(&ntoskrnl_intlock, irql); return (TRUE); } /* * IoConnectInterrupt() is passed only the interrupt vector and * irql that a device wants to use, but no device-specific tag * of any kind. This conflicts rather badly with FreeBSD's * bus_setup_intr(), which needs the device_t for the device * requesting interrupt delivery. In order to bypass this * inconsistency, we implement a second level of interrupt * dispatching on top of bus_setup_intr(). 
All devices use * ntoskrnl_intr() as their ISR, and any device requesting * interrupts will be registered with ntoskrnl_intr()'s interrupt * dispatch list. When an interrupt arrives, we walk the list * and invoke all the registered ISRs. This effectively makes all * interrupts shared, but it's the only way to duplicate the * semantics of IoConnectInterrupt() and IoDisconnectInterrupt() properly. */ uint32_t IoConnectInterrupt(kinterrupt **iobj, void *svcfunc, void *svcctx, kspin_lock *lock, uint32_t vector, uint8_t irql, uint8_t syncirql, uint8_t imode, uint8_t shared, uint32_t affinity, uint8_t savefloat) { uint8_t curirql; *iobj = ExAllocatePoolWithTag(NonPagedPool, sizeof(kinterrupt), 0); if (*iobj == NULL) return (STATUS_INSUFFICIENT_RESOURCES); (*iobj)->ki_svcfunc = svcfunc; (*iobj)->ki_svcctx = svcctx; if (lock == NULL) { KeInitializeSpinLock(&(*iobj)->ki_lock_priv); (*iobj)->ki_lock = &(*iobj)->ki_lock_priv; } else (*iobj)->ki_lock = lock; KeAcquireSpinLock(&ntoskrnl_intlock, &curirql); InsertHeadList((&ntoskrnl_intlist), (&(*iobj)->ki_list)); KeReleaseSpinLock(&ntoskrnl_intlock, curirql); return (STATUS_SUCCESS); } void IoDisconnectInterrupt(iobj) kinterrupt *iobj; { uint8_t irql; if (iobj == NULL) return; KeAcquireSpinLock(&ntoskrnl_intlock, &irql); RemoveEntryList((&iobj->ki_list)); KeReleaseSpinLock(&ntoskrnl_intlock, irql); ExFreePool(iobj); } device_object * IoAttachDeviceToDeviceStack(src, dst) device_object *src; device_object *dst; { device_object *attached; mtx_lock(&ntoskrnl_dispatchlock); attached = IoGetAttachedDevice(dst); attached->do_attacheddev = src; src->do_attacheddev = NULL; src->do_stacksize = attached->do_stacksize + 1; mtx_unlock(&ntoskrnl_dispatchlock); return (attached); } void IoDetachDevice(topdev) device_object *topdev; { device_object *tail; mtx_lock(&ntoskrnl_dispatchlock); /* First, break the chain. */ tail = topdev->do_attacheddev; if (tail == NULL) { mtx_unlock(&ntoskrnl_dispatchlock); return; } topdev->do_attacheddev = tail->do_attacheddev; topdev->do_refcnt--; /* Now reduce the stacksize count for the takm_il objects. */ tail = topdev->do_attacheddev; while (tail != NULL) { tail->do_stacksize--; tail = tail->do_attacheddev; } mtx_unlock(&ntoskrnl_dispatchlock); } /* * For the most part, an object is considered signalled if * dh_sigstate == TRUE. The exception is for mutant objects * (mutexes), where the logic works like this: * * - If the thread already owns the object and sigstate is * less than or equal to 0, then the object is considered * signalled (recursive acquisition). * - If dh_sigstate == 1, the object is also considered * signalled. */ static int ntoskrnl_is_signalled(obj, td) nt_dispatch_header *obj; struct thread *td; { kmutant *km; if (obj->dh_type == DISP_TYPE_MUTANT) { km = (kmutant *)obj; if ((obj->dh_sigstate <= 0 && km->km_ownerthread == td) || obj->dh_sigstate == 1) return (TRUE); return (FALSE); } if (obj->dh_sigstate > 0) return (TRUE); return (FALSE); } static void ntoskrnl_satisfy_wait(obj, td) nt_dispatch_header *obj; struct thread *td; { kmutant *km; switch (obj->dh_type) { case DISP_TYPE_MUTANT: km = (struct kmutant *)obj; obj->dh_sigstate--; /* * If sigstate reaches 0, the mutex is now * non-signalled (the new thread owns it). */ if (obj->dh_sigstate == 0) { km->km_ownerthread = td; if (km->km_abandoned == TRUE) km->km_abandoned = FALSE; } break; /* Synchronization objects get reset to unsignalled. 
*/ case DISP_TYPE_SYNCHRONIZATION_EVENT: case DISP_TYPE_SYNCHRONIZATION_TIMER: obj->dh_sigstate = 0; break; case DISP_TYPE_SEMAPHORE: obj->dh_sigstate--; break; default: break; } } static void ntoskrnl_satisfy_multiple_waits(wb) wait_block *wb; { wait_block *cur; struct thread *td; cur = wb; td = wb->wb_kthread; do { ntoskrnl_satisfy_wait(wb->wb_object, td); cur->wb_awakened = TRUE; cur = cur->wb_next; } while (cur != wb); } /* Always called with dispatcher lock held. */ static void ntoskrnl_waittest(obj, increment) nt_dispatch_header *obj; uint32_t increment; { wait_block *w, *next; list_entry *e; struct thread *td; wb_ext *we; int satisfied; /* * Once an object has been signalled, we walk its list of * wait blocks. If a wait block can be awakened, then satisfy * waits as necessary and wake the thread. * * The rules work like this: * * If a wait block is marked as WAITTYPE_ANY, then * we can satisfy the wait conditions on the current * object and wake the thread right away. Satisfying * the wait also has the effect of breaking us out * of the search loop. * * If the object is marked as WAITTYLE_ALL, then the * wait block will be part of a circularly linked * list of wait blocks belonging to a waiting thread * that's sleeping in KeWaitForMultipleObjects(). In * order to wake the thread, all the objects in the * wait list must be in the signalled state. If they * are, we then satisfy all of them and wake the * thread. * */ e = obj->dh_waitlisthead.nle_flink; while (e != &obj->dh_waitlisthead && obj->dh_sigstate > 0) { w = CONTAINING_RECORD(e, wait_block, wb_waitlist); we = w->wb_ext; td = we->we_td; satisfied = FALSE; if (w->wb_waittype == WAITTYPE_ANY) { /* * Thread can be awakened if * any wait is satisfied. */ ntoskrnl_satisfy_wait(obj, td); satisfied = TRUE; w->wb_awakened = TRUE; } else { /* * Thread can only be woken up * if all waits are satisfied. * If the thread is waiting on multiple * objects, they should all be linked * through the wb_next pointers in the * wait blocks. */ satisfied = TRUE; next = w->wb_next; while (next != w) { if (ntoskrnl_is_signalled(obj, td) == FALSE) { satisfied = FALSE; break; } next = next->wb_next; } ntoskrnl_satisfy_multiple_waits(w); } if (satisfied == TRUE) cv_broadcastpri(&we->we_cv, (w->wb_oldpri - (increment * 4)) > PRI_MIN_KERN ? w->wb_oldpri - (increment * 4) : PRI_MIN_KERN); e = e->nle_flink; } } /* * Return the number of 100 nanosecond intervals since * January 1, 1601. (?!?!) */ void ntoskrnl_time(tval) uint64_t *tval; { struct timespec ts; nanotime(&ts); *tval = (uint64_t)ts.tv_nsec / 100 + (uint64_t)ts.tv_sec * 10000000 + 11644473600 * 10000000; /* 100ns ticks from 1601 to 1970 */ } static void KeQuerySystemTime(current_time) uint64_t *current_time; { ntoskrnl_time(current_time); } static uint32_t KeTickCount(void) { struct timeval tv; getmicrouptime(&tv); return tvtohz(&tv); } /* * KeWaitForSingleObject() is a tricky beast, because it can be used * with several different object types: semaphores, timers, events, * mutexes and threads. Semaphores don't appear very often, but the * other object types are quite common. KeWaitForSingleObject() is * what's normally used to acquire a mutex, and it can be used to * wait for a thread termination. * * The Windows NDIS API is implemented in terms of Windows kernel * primitives, and some of the object manipulation is duplicated in * NDIS. For example, NDIS has timers and events, which are actually * Windows kevents and ktimers. 
Now, you're supposed to only use the * NDIS variants of these objects within the confines of the NDIS API, * but there are some naughty developers out there who will use * KeWaitForSingleObject() on NDIS timer and event objects, so we * have to support that as well. Conseqently, our NDIS timer and event * code has to be closely tied into our ntoskrnl timer and event code, * just as it is in Windows. * * KeWaitForSingleObject() may do different things for different kinds * of objects: * * - For events, we check if the event has been signalled. If the * event is already in the signalled state, we just return immediately, * otherwise we wait for it to be set to the signalled state by someone * else calling KeSetEvent(). Events can be either synchronization or * notification events. * * - For timers, if the timer has already fired and the timer is in * the signalled state, we just return, otherwise we wait on the * timer. Unlike an event, timers get signalled automatically when * they expire rather than someone having to trip them manually. * Timers initialized with KeInitializeTimer() are always notification * events: KeInitializeTimerEx() lets you initialize a timer as * either a notification or synchronization event. * * - For mutexes, we try to acquire the mutex and if we can't, we wait * on the mutex until it's available and then grab it. When a mutex is * released, it enters the signalled state, which wakes up one of the * threads waiting to acquire it. Mutexes are always synchronization * events. * * - For threads, the only thing we do is wait until the thread object * enters a signalled state, which occurs when the thread terminates. * Threads are always notification events. * * A notification event wakes up all threads waiting on an object. A * synchronization event wakes up just one. Also, a synchronization event * is auto-clearing, which means we automatically set the event back to * the non-signalled state once the wakeup is done. */ uint32_t KeWaitForSingleObject(void *arg, uint32_t reason, uint32_t mode, uint8_t alertable, int64_t *duetime) { wait_block w; struct thread *td = curthread; struct timeval tv; int error = 0; uint64_t curtime; wb_ext we; nt_dispatch_header *obj; obj = arg; if (obj == NULL) return (STATUS_INVALID_PARAMETER); mtx_lock(&ntoskrnl_dispatchlock); cv_init(&we.we_cv, "KeWFS"); we.we_td = td; /* * Check to see if this object is already signalled, * and just return without waiting if it is. */ if (ntoskrnl_is_signalled(obj, td) == TRUE) { /* Sanity check the signal state value. */ if (obj->dh_sigstate != INT32_MIN) { ntoskrnl_satisfy_wait(obj, curthread); mtx_unlock(&ntoskrnl_dispatchlock); return (STATUS_SUCCESS); } else { /* * There's a limit to how many times we can * recursively acquire a mutant. If we hit * the limit, something is very wrong. */ if (obj->dh_type == DISP_TYPE_MUTANT) { mtx_unlock(&ntoskrnl_dispatchlock); panic("mutant limit exceeded"); } } } bzero((char *)&w, sizeof(wait_block)); w.wb_object = obj; w.wb_ext = &we; w.wb_waittype = WAITTYPE_ANY; w.wb_next = &w; w.wb_waitkey = 0; w.wb_awakened = FALSE; w.wb_oldpri = td->td_priority; InsertTailList((&obj->dh_waitlisthead), (&w.wb_waitlist)); /* * The timeout value is specified in 100 nanosecond units * and can be a positive or negative number. If it's positive, * then the duetime is absolute, and we need to convert it * to an absolute offset relative to now in order to use it. * If it's negative, then the duetime is relative and we * just have to convert the units. 
*/ if (duetime != NULL) { if (*duetime < 0) { tv.tv_sec = - (*duetime) / 10000000; tv.tv_usec = (- (*duetime) / 10) - (tv.tv_sec * 1000000); } else { ntoskrnl_time(&curtime); if (*duetime < curtime) tv.tv_sec = tv.tv_usec = 0; else { tv.tv_sec = ((*duetime) - curtime) / 10000000; tv.tv_usec = ((*duetime) - curtime) / 10 - (tv.tv_sec * 1000000); } } } if (duetime == NULL) cv_wait(&we.we_cv, &ntoskrnl_dispatchlock); else error = cv_timedwait(&we.we_cv, &ntoskrnl_dispatchlock, tvtohz(&tv)); RemoveEntryList(&w.wb_waitlist); cv_destroy(&we.we_cv); /* We timed out. Leave the object alone and return status. */ if (error == EWOULDBLOCK) { mtx_unlock(&ntoskrnl_dispatchlock); return (STATUS_TIMEOUT); } mtx_unlock(&ntoskrnl_dispatchlock); return (STATUS_SUCCESS); /* return (KeWaitForMultipleObjects(1, &obj, WAITTYPE_ALL, reason, mode, alertable, duetime, &w)); */ } static uint32_t KeWaitForMultipleObjects(uint32_t cnt, nt_dispatch_header *obj[], uint32_t wtype, uint32_t reason, uint32_t mode, uint8_t alertable, int64_t *duetime, wait_block *wb_array) { struct thread *td = curthread; wait_block *whead, *w; wait_block _wb_array[MAX_WAIT_OBJECTS]; nt_dispatch_header *cur; struct timeval tv; int i, wcnt = 0, error = 0; uint64_t curtime; struct timespec t1, t2; uint32_t status = STATUS_SUCCESS; wb_ext we; if (cnt > MAX_WAIT_OBJECTS) return (STATUS_INVALID_PARAMETER); if (cnt > THREAD_WAIT_OBJECTS && wb_array == NULL) return (STATUS_INVALID_PARAMETER); mtx_lock(&ntoskrnl_dispatchlock); cv_init(&we.we_cv, "KeWFM"); we.we_td = td; if (wb_array == NULL) whead = _wb_array; else whead = wb_array; bzero((char *)whead, sizeof(wait_block) * cnt); /* First pass: see if we can satisfy any waits immediately. */ wcnt = 0; w = whead; for (i = 0; i < cnt; i++) { InsertTailList((&obj[i]->dh_waitlisthead), (&w->wb_waitlist)); w->wb_ext = &we; w->wb_object = obj[i]; w->wb_waittype = wtype; w->wb_waitkey = i; w->wb_awakened = FALSE; w->wb_oldpri = td->td_priority; w->wb_next = w + 1; w++; wcnt++; if (ntoskrnl_is_signalled(obj[i], td)) { /* * There's a limit to how many times * we can recursively acquire a mutant. * If we hit the limit, something * is very wrong. */ if (obj[i]->dh_sigstate == INT32_MIN && obj[i]->dh_type == DISP_TYPE_MUTANT) { mtx_unlock(&ntoskrnl_dispatchlock); panic("mutant limit exceeded"); } /* * If this is a WAITTYPE_ANY wait, then * satisfy the waited object and exit * right now. */ if (wtype == WAITTYPE_ANY) { ntoskrnl_satisfy_wait(obj[i], td); status = STATUS_WAIT_0 + i; goto wait_done; } else { w--; wcnt--; w->wb_object = NULL; RemoveEntryList(&w->wb_waitlist); } } } /* * If this is a WAITTYPE_ALL wait and all objects are * already signalled, satisfy the waits and exit now. */ if (wtype == WAITTYPE_ALL && wcnt == 0) { for (i = 0; i < cnt; i++) ntoskrnl_satisfy_wait(obj[i], td); status = STATUS_SUCCESS; goto wait_done; } /* * Create a circular waitblock list. The waitcount * must always be non-zero when we get here. */ (w - 1)->wb_next = whead; /* Wait on any objects that aren't yet signalled. */ /* Calculate timeout, if any. 
*/ if (duetime != NULL) { if (*duetime < 0) { tv.tv_sec = - (*duetime) / 10000000; tv.tv_usec = (- (*duetime) / 10) - (tv.tv_sec * 1000000); } else { ntoskrnl_time(&curtime); if (*duetime < curtime) tv.tv_sec = tv.tv_usec = 0; else { tv.tv_sec = ((*duetime) - curtime) / 10000000; tv.tv_usec = ((*duetime) - curtime) / 10 - (tv.tv_sec * 1000000); } } } while (wcnt) { nanotime(&t1); if (duetime == NULL) cv_wait(&we.we_cv, &ntoskrnl_dispatchlock); else error = cv_timedwait(&we.we_cv, &ntoskrnl_dispatchlock, tvtohz(&tv)); /* Wait with timeout expired. */ if (error) { status = STATUS_TIMEOUT; goto wait_done; } nanotime(&t2); /* See what's been signalled. */ w = whead; do { cur = w->wb_object; if (ntoskrnl_is_signalled(cur, td) == TRUE || w->wb_awakened == TRUE) { /* Sanity check the signal state value. */ if (cur->dh_sigstate == INT32_MIN && cur->dh_type == DISP_TYPE_MUTANT) { mtx_unlock(&ntoskrnl_dispatchlock); panic("mutant limit exceeded"); } wcnt--; if (wtype == WAITTYPE_ANY) { status = w->wb_waitkey & STATUS_WAIT_0; goto wait_done; } } w = w->wb_next; } while (w != whead); /* * If all objects have been signalled, or if this * is a WAITTYPE_ANY wait and we were woke up by * someone, we can bail. */ if (wcnt == 0) { status = STATUS_SUCCESS; goto wait_done; } /* * If this is WAITTYPE_ALL wait, and there's still * objects that haven't been signalled, deduct the * time that's elapsed so far from the timeout and * wait again (or continue waiting indefinitely if * there's no timeout). */ if (duetime != NULL) { tv.tv_sec -= (t2.tv_sec - t1.tv_sec); tv.tv_usec -= (t2.tv_nsec - t1.tv_nsec) / 1000; } } wait_done: cv_destroy(&we.we_cv); for (i = 0; i < cnt; i++) { if (whead[i].wb_object != NULL) RemoveEntryList(&whead[i].wb_waitlist); } mtx_unlock(&ntoskrnl_dispatchlock); return (status); } static void WRITE_REGISTER_USHORT(uint16_t *reg, uint16_t val) { bus_space_write_2(NDIS_BUS_SPACE_MEM, 0x0, (bus_size_t)reg, val); } static uint16_t READ_REGISTER_USHORT(reg) uint16_t *reg; { return (bus_space_read_2(NDIS_BUS_SPACE_MEM, 0x0, (bus_size_t)reg)); } static void WRITE_REGISTER_ULONG(reg, val) uint32_t *reg; uint32_t val; { bus_space_write_4(NDIS_BUS_SPACE_MEM, 0x0, (bus_size_t)reg, val); } static uint32_t READ_REGISTER_ULONG(reg) uint32_t *reg; { return (bus_space_read_4(NDIS_BUS_SPACE_MEM, 0x0, (bus_size_t)reg)); } static uint8_t READ_REGISTER_UCHAR(uint8_t *reg) { return (bus_space_read_1(NDIS_BUS_SPACE_MEM, 0x0, (bus_size_t)reg)); } static void WRITE_REGISTER_UCHAR(uint8_t *reg, uint8_t val) { bus_space_write_1(NDIS_BUS_SPACE_MEM, 0x0, (bus_size_t)reg, val); } static int64_t _allmul(a, b) int64_t a; int64_t b; { return (a * b); } static int64_t _alldiv(a, b) int64_t a; int64_t b; { return (a / b); } static int64_t _allrem(a, b) int64_t a; int64_t b; { return (a % b); } static uint64_t _aullmul(a, b) uint64_t a; uint64_t b; { return (a * b); } static uint64_t _aulldiv(a, b) uint64_t a; uint64_t b; { return (a / b); } static uint64_t _aullrem(a, b) uint64_t a; uint64_t b; { return (a % b); } static int64_t _allshl(int64_t a, uint8_t b) { return (a << b); } static uint64_t _aullshl(uint64_t a, uint8_t b) { return (a << b); } static int64_t _allshr(int64_t a, uint8_t b) { return (a >> b); } static uint64_t _aullshr(uint64_t a, uint8_t b) { return (a >> b); } static slist_entry * ntoskrnl_pushsl(head, entry) slist_header *head; slist_entry *entry; { slist_entry *oldhead; oldhead = head->slh_list.slh_next; entry->sl_next = head->slh_list.slh_next; head->slh_list.slh_next = entry; head->slh_list.slh_depth++; 
head->slh_list.slh_seq++; return (oldhead); } static void InitializeSListHead(head) slist_header *head; { memset(head, 0, sizeof(*head)); } static slist_entry * ntoskrnl_popsl(head) slist_header *head; { slist_entry *first; first = head->slh_list.slh_next; if (first != NULL) { head->slh_list.slh_next = first->sl_next; head->slh_list.slh_depth--; head->slh_list.slh_seq++; } return (first); } /* * We need this to make lookaside lists work for amd64. * We pass a pointer to ExAllocatePoolWithTag() the lookaside * list structure. For amd64 to work right, this has to be a * pointer to the wrapped version of the routine, not the * original. Letting the Windows driver invoke the original * function directly will result in a convention calling * mismatch and a pretty crash. On x86, this effectively * becomes a no-op since ipt_func and ipt_wrap are the same. */ static funcptr ntoskrnl_findwrap(func) funcptr func; { image_patch_table *patch; patch = ntoskrnl_functbl; while (patch->ipt_func != NULL) { if ((funcptr)patch->ipt_func == func) return ((funcptr)patch->ipt_wrap); patch++; } return (NULL); } static void ExInitializePagedLookasideList(paged_lookaside_list *lookaside, lookaside_alloc_func *allocfunc, lookaside_free_func *freefunc, uint32_t flags, size_t size, uint32_t tag, uint16_t depth) { bzero((char *)lookaside, sizeof(paged_lookaside_list)); if (size < sizeof(slist_entry)) lookaside->nll_l.gl_size = sizeof(slist_entry); else lookaside->nll_l.gl_size = size; lookaside->nll_l.gl_tag = tag; if (allocfunc == NULL) lookaside->nll_l.gl_allocfunc = ntoskrnl_findwrap((funcptr)ExAllocatePoolWithTag); else lookaside->nll_l.gl_allocfunc = allocfunc; if (freefunc == NULL) lookaside->nll_l.gl_freefunc = ntoskrnl_findwrap((funcptr)ExFreePool); else lookaside->nll_l.gl_freefunc = freefunc; #ifdef __i386__ KeInitializeSpinLock(&lookaside->nll_obsoletelock); #endif lookaside->nll_l.gl_type = NonPagedPool; lookaside->nll_l.gl_depth = depth; lookaside->nll_l.gl_maxdepth = LOOKASIDE_DEPTH; } static void ExDeletePagedLookasideList(lookaside) paged_lookaside_list *lookaside; { void *buf; void (*freefunc)(void *); freefunc = lookaside->nll_l.gl_freefunc; while((buf = ntoskrnl_popsl(&lookaside->nll_l.gl_listhead)) != NULL) MSCALL1(freefunc, buf); } static void ExInitializeNPagedLookasideList(npaged_lookaside_list *lookaside, lookaside_alloc_func *allocfunc, lookaside_free_func *freefunc, uint32_t flags, size_t size, uint32_t tag, uint16_t depth) { bzero((char *)lookaside, sizeof(npaged_lookaside_list)); if (size < sizeof(slist_entry)) lookaside->nll_l.gl_size = sizeof(slist_entry); else lookaside->nll_l.gl_size = size; lookaside->nll_l.gl_tag = tag; if (allocfunc == NULL) lookaside->nll_l.gl_allocfunc = ntoskrnl_findwrap((funcptr)ExAllocatePoolWithTag); else lookaside->nll_l.gl_allocfunc = allocfunc; if (freefunc == NULL) lookaside->nll_l.gl_freefunc = ntoskrnl_findwrap((funcptr)ExFreePool); else lookaside->nll_l.gl_freefunc = freefunc; #ifdef __i386__ KeInitializeSpinLock(&lookaside->nll_obsoletelock); #endif lookaside->nll_l.gl_type = NonPagedPool; lookaside->nll_l.gl_depth = depth; lookaside->nll_l.gl_maxdepth = LOOKASIDE_DEPTH; } static void ExDeleteNPagedLookasideList(lookaside) npaged_lookaside_list *lookaside; { void *buf; void (*freefunc)(void *); freefunc = lookaside->nll_l.gl_freefunc; while((buf = ntoskrnl_popsl(&lookaside->nll_l.gl_listhead)) != NULL) MSCALL1(freefunc, buf); } slist_entry * InterlockedPushEntrySList(head, entry) slist_header *head; slist_entry *entry; { slist_entry *oldhead; 
mtx_lock_spin(&ntoskrnl_interlock); oldhead = ntoskrnl_pushsl(head, entry); mtx_unlock_spin(&ntoskrnl_interlock); return (oldhead); } slist_entry * InterlockedPopEntrySList(head) slist_header *head; { slist_entry *first; mtx_lock_spin(&ntoskrnl_interlock); first = ntoskrnl_popsl(head); mtx_unlock_spin(&ntoskrnl_interlock); return (first); } static slist_entry * ExInterlockedPushEntrySList(head, entry, lock) slist_header *head; slist_entry *entry; kspin_lock *lock; { return (InterlockedPushEntrySList(head, entry)); } static slist_entry * ExInterlockedPopEntrySList(head, lock) slist_header *head; kspin_lock *lock; { return (InterlockedPopEntrySList(head)); } uint16_t ExQueryDepthSList(head) slist_header *head; { uint16_t depth; mtx_lock_spin(&ntoskrnl_interlock); depth = head->slh_list.slh_depth; mtx_unlock_spin(&ntoskrnl_interlock); return (depth); } void KeInitializeSpinLock(lock) kspin_lock *lock; { *lock = 0; } #ifdef __i386__ void KefAcquireSpinLockAtDpcLevel(lock) kspin_lock *lock; { #ifdef NTOSKRNL_DEBUG_SPINLOCKS int i = 0; #endif while (atomic_cmpset_acq_int((volatile u_int *)lock, 0, 1) == 0) { /* sit and spin */; #ifdef NTOSKRNL_DEBUG_SPINLOCKS i++; if (i > 200000000) panic("DEADLOCK!"); #endif } } void KefReleaseSpinLockFromDpcLevel(lock) kspin_lock *lock; { atomic_store_rel_int((volatile u_int *)lock, 0); } uint8_t KeAcquireSpinLockRaiseToDpc(kspin_lock *lock) { uint8_t oldirql; if (KeGetCurrentIrql() > DISPATCH_LEVEL) panic("IRQL_NOT_LESS_THAN_OR_EQUAL"); KeRaiseIrql(DISPATCH_LEVEL, &oldirql); KeAcquireSpinLockAtDpcLevel(lock); return (oldirql); } #else void KeAcquireSpinLockAtDpcLevel(kspin_lock *lock) { while (atomic_cmpset_acq_int((volatile u_int *)lock, 0, 1) == 0) /* sit and spin */; } void KeReleaseSpinLockFromDpcLevel(kspin_lock *lock) { atomic_store_rel_int((volatile u_int *)lock, 0); } #endif /* __i386__ */ uintptr_t InterlockedExchange(dst, val) volatile uint32_t *dst; uintptr_t val; { uintptr_t r; mtx_lock_spin(&ntoskrnl_interlock); r = *dst; *dst = val; mtx_unlock_spin(&ntoskrnl_interlock); return (r); } static uint32_t InterlockedIncrement(addend) volatile uint32_t *addend; { atomic_add_long((volatile u_long *)addend, 1); return (*addend); } static uint32_t InterlockedDecrement(addend) volatile uint32_t *addend; { atomic_subtract_long((volatile u_long *)addend, 1); return (*addend); } static void ExInterlockedAddLargeStatistic(addend, inc) uint64_t *addend; uint32_t inc; { mtx_lock_spin(&ntoskrnl_interlock); *addend += inc; mtx_unlock_spin(&ntoskrnl_interlock); }; mdl * IoAllocateMdl(void *vaddr, uint32_t len, uint8_t secondarybuf, uint8_t chargequota, irp *iopkt) { mdl *m; int zone = 0; if (MmSizeOfMdl(vaddr, len) > MDL_ZONE_SIZE) m = ExAllocatePoolWithTag(NonPagedPool, MmSizeOfMdl(vaddr, len), 0); else { m = uma_zalloc(mdl_zone, M_NOWAIT | M_ZERO); zone++; } if (m == NULL) return (NULL); MmInitializeMdl(m, vaddr, len); /* * MmInitializMdl() clears the flags field, so we * have to set this here. If the MDL came from the * MDL UMA zone, tag it so we can release it to * the right place later. 
*/ if (zone) m->mdl_flags = MDL_ZONE_ALLOCED; if (iopkt != NULL) { if (secondarybuf == TRUE) { mdl *last; last = iopkt->irp_mdl; while (last->mdl_next != NULL) last = last->mdl_next; last->mdl_next = m; } else { if (iopkt->irp_mdl != NULL) panic("leaking an MDL in IoAllocateMdl()"); iopkt->irp_mdl = m; } } return (m); } void IoFreeMdl(m) mdl *m; { if (m == NULL) return; if (m->mdl_flags & MDL_ZONE_ALLOCED) uma_zfree(mdl_zone, m); else ExFreePool(m); } static void * MmAllocateContiguousMemory(size, highest) uint32_t size; uint64_t highest; { void *addr; size_t pagelength = roundup(size, PAGE_SIZE); addr = ExAllocatePoolWithTag(NonPagedPool, pagelength, 0); return (addr); } static void * MmAllocateContiguousMemorySpecifyCache(size, lowest, highest, boundary, cachetype) uint32_t size; uint64_t lowest; uint64_t highest; uint64_t boundary; enum nt_caching_type cachetype; { vm_memattr_t memattr; void *ret; switch (cachetype) { case MmNonCached: memattr = VM_MEMATTR_UNCACHEABLE; break; case MmWriteCombined: memattr = VM_MEMATTR_WRITE_COMBINING; break; case MmNonCachedUnordered: memattr = VM_MEMATTR_UNCACHEABLE; break; case MmCached: case MmHardwareCoherentCached: case MmUSWCCached: default: memattr = VM_MEMATTR_DEFAULT; break; } ret = (void *)kmem_alloc_contig(size, M_ZERO | M_NOWAIT, lowest, highest, PAGE_SIZE, boundary, memattr); if (ret != NULL) malloc_type_allocated(M_DEVBUF, round_page(size)); return (ret); } static void MmFreeContiguousMemory(base) void *base; { ExFreePool(base); } static void MmFreeContiguousMemorySpecifyCache(base, size, cachetype) void *base; uint32_t size; enum nt_caching_type cachetype; { contigfree(base, size, M_DEVBUF); } static uint32_t MmSizeOfMdl(vaddr, len) void *vaddr; size_t len; { uint32_t l; l = sizeof(struct mdl) + (sizeof(vm_offset_t *) * SPAN_PAGES(vaddr, len)); return (l); } /* * The Microsoft documentation says this routine fills in the * page array of an MDL with the _physical_ page addresses that * comprise the buffer, but we don't really want to do that here. * Instead, we just fill in the page array with the kernel virtual * addresses of the buffers. */ void MmBuildMdlForNonPagedPool(m) mdl *m; { vm_offset_t *mdl_pages; int pagecnt, i; pagecnt = SPAN_PAGES(m->mdl_byteoffset, m->mdl_bytecount); if (pagecnt > (m->mdl_size - sizeof(mdl)) / sizeof(vm_offset_t *)) panic("not enough pages in MDL to describe buffer"); mdl_pages = MmGetMdlPfnArray(m); for (i = 0; i < pagecnt; i++) *mdl_pages = (vm_offset_t)m->mdl_startva + (i * PAGE_SIZE); m->mdl_flags |= MDL_SOURCE_IS_NONPAGED_POOL; m->mdl_mappedsystemva = MmGetMdlVirtualAddress(m); } static void * MmMapLockedPages(mdl *buf, uint8_t accessmode) { buf->mdl_flags |= MDL_MAPPED_TO_SYSTEM_VA; return (MmGetMdlVirtualAddress(buf)); } static void * MmMapLockedPagesSpecifyCache(mdl *buf, uint8_t accessmode, uint32_t cachetype, void *vaddr, uint32_t bugcheck, uint32_t prio) { return (MmMapLockedPages(buf, accessmode)); } static void MmUnmapLockedPages(vaddr, buf) void *vaddr; mdl *buf; { buf->mdl_flags &= ~MDL_MAPPED_TO_SYSTEM_VA; } /* * This function has a problem in that it will break if you * compile this module without PAE and try to use it on a PAE * kernel. Unfortunately, there's no way around this at the * moment. It's slightly less broken than using pmap_kextract(). * You'd think the virtual memory subsystem would help us out * here, but it doesn't.
*/ static uint64_t MmGetPhysicalAddress(void *base) { return (pmap_extract(kernel_map->pmap, (vm_offset_t)base)); } void * MmGetSystemRoutineAddress(ustr) unicode_string *ustr; { ansi_string astr; if (RtlUnicodeStringToAnsiString(&astr, ustr, TRUE)) return (NULL); return (ndis_get_routine_address(ntoskrnl_functbl, astr.as_buf)); } uint8_t MmIsAddressValid(vaddr) void *vaddr; { if (pmap_extract(kernel_map->pmap, (vm_offset_t)vaddr)) return (TRUE); return (FALSE); } void * MmMapIoSpace(paddr, len, cachetype) uint64_t paddr; uint32_t len; uint32_t cachetype; { devclass_t nexus_class; device_t *nexus_devs, devp; int nexus_count = 0; device_t matching_dev = NULL; struct resource *res; int i; vm_offset_t v; /* There will always be at least one nexus. */ nexus_class = devclass_find("nexus"); devclass_get_devices(nexus_class, &nexus_devs, &nexus_count); for (i = 0; i < nexus_count; i++) { devp = nexus_devs[i]; matching_dev = ntoskrnl_finddev(devp, paddr, &res); if (matching_dev) break; } free(nexus_devs, M_TEMP); if (matching_dev == NULL) return (NULL); v = (vm_offset_t)rman_get_virtual(res); if (paddr > rman_get_start(res)) v += paddr - rman_get_start(res); return ((void *)v); } void MmUnmapIoSpace(vaddr, len) void *vaddr; size_t len; { } static device_t ntoskrnl_finddev(dev, paddr, res) device_t dev; uint64_t paddr; struct resource **res; { device_t *children = NULL; device_t matching_dev; int childcnt; struct resource *r; struct resource_list *rl; struct resource_list_entry *rle; uint32_t flags; int i; /* We only want devices that have been successfully probed. */ if (device_is_alive(dev) == FALSE) return (NULL); rl = BUS_GET_RESOURCE_LIST(device_get_parent(dev), dev); if (rl != NULL) { STAILQ_FOREACH(rle, rl, link) { r = rle->res; if (r == NULL) continue; flags = rman_get_flags(r); if (rle->type == SYS_RES_MEMORY && paddr >= rman_get_start(r) && paddr <= rman_get_end(r)) { if (!(flags & RF_ACTIVE)) bus_activate_resource(dev, SYS_RES_MEMORY, 0, r); *res = r; return (dev); } } } /* * If this device has children, do another * level of recursion to inspect them. */ device_get_children(dev, &children, &childcnt); for (i = 0; i < childcnt; i++) { matching_dev = ntoskrnl_finddev(children[i], paddr, res); if (matching_dev != NULL) { free(children, M_TEMP); return (matching_dev); } } /* Won't somebody please think of the children! */ if (children != NULL) free(children, M_TEMP); return (NULL); } /* * Workitems are unlike DPCs, in that they run in a user-mode thread * context rather than at DISPATCH_LEVEL in kernel context. In our * case we run them in kernel context anyway. 
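 *
 * As a rough, hypothetical illustration (dobj, my_func and my_ctx are a
 * driver's own names, not part of this module), a Windows driver would use
 * the API implemented below roughly as:
 *
 *	iw = IoAllocateWorkItem(dobj);
 *	if (iw != NULL)
 *		IoQueueWorkItem(iw, my_func, DelayedWorkQueue, my_ctx);
 *	...
 *	IoFreeWorkItem(iw);
 *
 * where DelayedWorkQueue stands in for the Windows work queue type a
 * driver would pass as the qtype argument.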
*/ static void ntoskrnl_workitem_thread(arg) void *arg; { kdpc_queue *kq; list_entry *l; io_workitem *iw; uint8_t irql; kq = arg; InitializeListHead(&kq->kq_disp); kq->kq_td = curthread; kq->kq_exit = 0; KeInitializeSpinLock(&kq->kq_lock); KeInitializeEvent(&kq->kq_proc, EVENT_TYPE_SYNC, FALSE); while (1) { KeWaitForSingleObject(&kq->kq_proc, 0, 0, TRUE, NULL); KeAcquireSpinLock(&kq->kq_lock, &irql); if (kq->kq_exit) { kq->kq_exit = 0; KeReleaseSpinLock(&kq->kq_lock, irql); break; } while (!IsListEmpty(&kq->kq_disp)) { l = RemoveHeadList(&kq->kq_disp); iw = CONTAINING_RECORD(l, io_workitem, iw_listentry); InitializeListHead((&iw->iw_listentry)); if (iw->iw_func == NULL) continue; KeReleaseSpinLock(&kq->kq_lock, irql); MSCALL2(iw->iw_func, iw->iw_dobj, iw->iw_ctx); KeAcquireSpinLock(&kq->kq_lock, &irql); } KeReleaseSpinLock(&kq->kq_lock, irql); } kproc_exit(0); return; /* notreached */ } static ndis_status RtlCharToInteger(src, base, val) const char *src; uint32_t base; uint32_t *val; { int negative = 0; uint32_t res; if (!src || !val) return (STATUS_ACCESS_VIOLATION); while (*src != '\0' && *src <= ' ') src++; if (*src == '+') src++; else if (*src == '-') { src++; negative = 1; } if (base == 0) { base = 10; if (*src == '0') { src++; if (*src == 'b') { base = 2; src++; } else if (*src == 'o') { base = 8; src++; } else if (*src == 'x') { base = 16; src++; } } } else if (!(base == 2 || base == 8 || base == 10 || base == 16)) return (STATUS_INVALID_PARAMETER); for (res = 0; *src; src++) { int v; if (isdigit(*src)) v = *src - '0'; else if (isxdigit(*src)) v = tolower(*src) - 'a' + 10; else v = base; if (v >= base) return (STATUS_INVALID_PARAMETER); res = res * base + v; } *val = negative ? -res : res; return (STATUS_SUCCESS); } static void ntoskrnl_destroy_workitem_threads(void) { kdpc_queue *kq; int i; for (i = 0; i < WORKITEM_THREADS; i++) { kq = wq_queues + i; kq->kq_exit = 1; KeSetEvent(&kq->kq_proc, IO_NO_INCREMENT, FALSE); while (kq->kq_exit) tsleep(kq->kq_td->td_proc, PWAIT, "waitiw", hz/10); } } io_workitem * IoAllocateWorkItem(dobj) device_object *dobj; { io_workitem *iw; iw = uma_zalloc(iw_zone, M_NOWAIT); if (iw == NULL) return (NULL); InitializeListHead(&iw->iw_listentry); iw->iw_dobj = dobj; mtx_lock(&ntoskrnl_dispatchlock); iw->iw_idx = wq_idx; WORKIDX_INC(wq_idx); mtx_unlock(&ntoskrnl_dispatchlock); return (iw); } void IoFreeWorkItem(iw) io_workitem *iw; { uma_zfree(iw_zone, iw); } void IoQueueWorkItem(iw, iw_func, qtype, ctx) io_workitem *iw; io_workitem_func iw_func; uint32_t qtype; void *ctx; { kdpc_queue *kq; list_entry *l; io_workitem *cur; uint8_t irql; kq = wq_queues + iw->iw_idx; KeAcquireSpinLock(&kq->kq_lock, &irql); /* * Traverse the list and make sure this workitem hasn't * already been inserted. Queuing the same workitem * twice will hose the list but good. */ l = kq->kq_disp.nle_flink; while (l != &kq->kq_disp) { cur = CONTAINING_RECORD(l, io_workitem, iw_listentry); if (cur == iw) { /* Already queued -- do nothing. 
*/ KeReleaseSpinLock(&kq->kq_lock, irql); return; } l = l->nle_flink; } iw->iw_func = iw_func; iw->iw_ctx = ctx; InsertTailList((&kq->kq_disp), (&iw->iw_listentry)); KeReleaseSpinLock(&kq->kq_lock, irql); KeSetEvent(&kq->kq_proc, IO_NO_INCREMENT, FALSE); } static void ntoskrnl_workitem(dobj, arg) device_object *dobj; void *arg; { io_workitem *iw; work_queue_item *w; work_item_func f; iw = arg; w = (work_queue_item *)dobj; f = (work_item_func)w->wqi_func; uma_zfree(iw_zone, iw); MSCALL2(f, w, w->wqi_ctx); } /* * The ExQueueWorkItem() API is deprecated in Windows XP. Microsoft * warns that it's unsafe and to use IoQueueWorkItem() instead. The * problem with ExQueueWorkItem() is that it can't guard against * the condition where a driver submits a job to the work queue and * is then unloaded before the job is able to run. IoQueueWorkItem() * acquires a reference to the device's device_object via the * object manager and retains it until after the job has completed, * which prevents the driver from being unloaded before the job * runs. (We don't currently support this behavior, though hopefully * that will change once the object manager API is fleshed out a bit.) * * Having said all that, the ExQueueWorkItem() API remains, because * there are still other parts of Windows that use it, including * NDIS itself: NdisScheduleWorkItem() calls ExQueueWorkItem(). * We fake up the ExQueueWorkItem() API on top of our implementation * of IoQueueWorkItem(). Workitem thread #3 is reserved exclusively * for ExQueueWorkItem() jobs, and we pass a pointer to the work * queue item (provided by the caller) in to IoAllocateWorkItem() * instead of the device_object. We need to save this pointer so * we can apply a sanity check: as with the DPC queue and other * workitem queues, we can't allow the same work queue item to * be queued twice. If it's already pending, we silently return */ void ExQueueWorkItem(w, qtype) work_queue_item *w; uint32_t qtype; { io_workitem *iw; io_workitem_func iwf; kdpc_queue *kq; list_entry *l; io_workitem *cur; uint8_t irql; /* * We need to do a special sanity test to make sure * the ExQueueWorkItem() API isn't used to queue * the same workitem twice. Rather than checking the * io_workitem pointer itself, we test the attached * device object, which is really a pointer to the * legacy work queue item structure. */ kq = wq_queues + WORKITEM_LEGACY_THREAD; KeAcquireSpinLock(&kq->kq_lock, &irql); l = kq->kq_disp.nle_flink; while (l != &kq->kq_disp) { cur = CONTAINING_RECORD(l, io_workitem, iw_listentry); if (cur->iw_dobj == (device_object *)w) { /* Already queued -- do nothing. 
*/ KeReleaseSpinLock(&kq->kq_lock, irql); return; } l = l->nle_flink; } KeReleaseSpinLock(&kq->kq_lock, irql); iw = IoAllocateWorkItem((device_object *)w); if (iw == NULL) return; iw->iw_idx = WORKITEM_LEGACY_THREAD; iwf = (io_workitem_func)ntoskrnl_findwrap((funcptr)ntoskrnl_workitem); IoQueueWorkItem(iw, iwf, qtype, iw); } static void RtlZeroMemory(dst, len) void *dst; size_t len; { bzero(dst, len); } static void RtlSecureZeroMemory(dst, len) void *dst; size_t len; { memset(dst, 0, len); } static void RtlFillMemory(void *dst, size_t len, uint8_t c) { memset(dst, c, len); } static void RtlMoveMemory(dst, src, len) void *dst; const void *src; size_t len; { memmove(dst, src, len); } static void RtlCopyMemory(dst, src, len) void *dst; const void *src; size_t len; { bcopy(src, dst, len); } static size_t RtlCompareMemory(s1, s2, len) const void *s1; const void *s2; size_t len; { size_t i; uint8_t *m1, *m2; m1 = __DECONST(char *, s1); m2 = __DECONST(char *, s2); for (i = 0; i < len && m1[i] == m2[i]; i++); return (i); } void RtlInitAnsiString(dst, src) ansi_string *dst; char *src; { ansi_string *a; a = dst; if (a == NULL) return; if (src == NULL) { a->as_len = a->as_maxlen = 0; a->as_buf = NULL; } else { a->as_buf = src; a->as_len = a->as_maxlen = strlen(src); } } void RtlInitUnicodeString(dst, src) unicode_string *dst; uint16_t *src; { unicode_string *u; int i; u = dst; if (u == NULL) return; if (src == NULL) { u->us_len = u->us_maxlen = 0; u->us_buf = NULL; } else { i = 0; while(src[i] != 0) i++; u->us_buf = src; u->us_len = u->us_maxlen = i * 2; } } ndis_status RtlUnicodeStringToInteger(ustr, base, val) unicode_string *ustr; uint32_t base; uint32_t *val; { uint16_t *uchr; int len, neg = 0; char abuf[64]; char *astr; uchr = ustr->us_buf; len = ustr->us_len; bzero(abuf, sizeof(abuf)); if ((char)((*uchr) & 0xFF) == '-') { neg = 1; uchr++; len -= 2; } else if ((char)((*uchr) & 0xFF) == '+') { neg = 0; uchr++; len -= 2; } if (base == 0) { if ((char)((*uchr) & 0xFF) == 'b') { base = 2; uchr++; len -= 2; } else if ((char)((*uchr) & 0xFF) == 'o') { base = 8; uchr++; len -= 2; } else if ((char)((*uchr) & 0xFF) == 'x') { base = 16; uchr++; len -= 2; } else base = 10; } astr = abuf; if (neg) { strcpy(astr, "-"); astr++; } ntoskrnl_unicode_to_ascii(uchr, astr, len); *val = strtoul(abuf, NULL, base); return (STATUS_SUCCESS); } void RtlFreeUnicodeString(ustr) unicode_string *ustr; { if (ustr->us_buf == NULL) return; ExFreePool(ustr->us_buf); ustr->us_buf = NULL; } void RtlFreeAnsiString(astr) ansi_string *astr; { if (astr->as_buf == NULL) return; ExFreePool(astr->as_buf); astr->as_buf = NULL; } static int atoi(str) const char *str; { return (int)strtol(str, (char **)NULL, 10); } static long atol(str) const char *str; { return strtol(str, (char **)NULL, 10); } static int rand(void) { return (random()); } static void -srand(unsigned int seed) +srand(unsigned int seed __unused) { - - srandom(seed); } static uint8_t IoIsWdmVersionAvailable(uint8_t major, uint8_t minor) { if (major == WDM_MAJOR && minor == WDM_MINOR_WINXP) return (TRUE); return (FALSE); } static int32_t IoOpenDeviceRegistryKey(struct device_object *devobj, uint32_t type, uint32_t mask, void **key) { return (NDIS_STATUS_INVALID_DEVICE_REQUEST); } static ndis_status IoGetDeviceObjectPointer(name, reqaccess, fileobj, devobj) unicode_string *name; uint32_t reqaccess; void *fileobj; device_object *devobj; { return (STATUS_SUCCESS); } static ndis_status IoGetDeviceProperty(devobj, regprop, buflen, prop, reslen) device_object *devobj; uint32_t 
regprop; uint32_t buflen; void *prop; uint32_t *reslen; { driver_object *drv; uint16_t **name; drv = devobj->do_drvobj; switch (regprop) { case DEVPROP_DRIVER_KEYNAME: name = prop; *name = drv->dro_drivername.us_buf; *reslen = drv->dro_drivername.us_len; break; default: return (STATUS_INVALID_PARAMETER_2); break; } return (STATUS_SUCCESS); } static void KeInitializeMutex(kmutex, level) kmutant *kmutex; uint32_t level; { InitializeListHead((&kmutex->km_header.dh_waitlisthead)); kmutex->km_abandoned = FALSE; kmutex->km_apcdisable = 1; kmutex->km_header.dh_sigstate = 1; kmutex->km_header.dh_type = DISP_TYPE_MUTANT; kmutex->km_header.dh_size = sizeof(kmutant) / sizeof(uint32_t); kmutex->km_ownerthread = NULL; } static uint32_t KeReleaseMutex(kmutant *kmutex, uint8_t kwait) { uint32_t prevstate; mtx_lock(&ntoskrnl_dispatchlock); prevstate = kmutex->km_header.dh_sigstate; if (kmutex->km_ownerthread != curthread) { mtx_unlock(&ntoskrnl_dispatchlock); return (STATUS_MUTANT_NOT_OWNED); } kmutex->km_header.dh_sigstate++; kmutex->km_abandoned = FALSE; if (kmutex->km_header.dh_sigstate == 1) { kmutex->km_ownerthread = NULL; ntoskrnl_waittest(&kmutex->km_header, IO_NO_INCREMENT); } mtx_unlock(&ntoskrnl_dispatchlock); return (prevstate); } static uint32_t KeReadStateMutex(kmutex) kmutant *kmutex; { return (kmutex->km_header.dh_sigstate); } void KeInitializeEvent(nt_kevent *kevent, uint32_t type, uint8_t state) { InitializeListHead((&kevent->k_header.dh_waitlisthead)); kevent->k_header.dh_sigstate = state; if (type == EVENT_TYPE_NOTIFY) kevent->k_header.dh_type = DISP_TYPE_NOTIFICATION_EVENT; else kevent->k_header.dh_type = DISP_TYPE_SYNCHRONIZATION_EVENT; kevent->k_header.dh_size = sizeof(nt_kevent) / sizeof(uint32_t); } uint32_t KeResetEvent(kevent) nt_kevent *kevent; { uint32_t prevstate; mtx_lock(&ntoskrnl_dispatchlock); prevstate = kevent->k_header.dh_sigstate; kevent->k_header.dh_sigstate = FALSE; mtx_unlock(&ntoskrnl_dispatchlock); return (prevstate); } uint32_t KeSetEvent(nt_kevent *kevent, uint32_t increment, uint8_t kwait) { uint32_t prevstate; wait_block *w; nt_dispatch_header *dh; struct thread *td; wb_ext *we; mtx_lock(&ntoskrnl_dispatchlock); prevstate = kevent->k_header.dh_sigstate; dh = &kevent->k_header; if (IsListEmpty(&dh->dh_waitlisthead)) /* * If there's nobody in the waitlist, just set * the state to signalled. */ dh->dh_sigstate = 1; else { /* * Get the first waiter. If this is a synchronization * event, just wake up that one thread (don't bother * setting the state to signalled since we're supposed * to automatically clear synchronization events anyway). * * If it's a notification event, or the first * waiter is doing a WAITTYPE_ALL wait, go through * the full wait satisfaction process. */ w = CONTAINING_RECORD(dh->dh_waitlisthead.nle_flink, wait_block, wb_waitlist); we = w->wb_ext; td = we->we_td; if (kevent->k_header.dh_type == DISP_TYPE_NOTIFICATION_EVENT || w->wb_waittype == WAITTYPE_ALL) { if (prevstate == 0) { dh->dh_sigstate = 1; ntoskrnl_waittest(dh, increment); } } else { w->wb_awakened |= TRUE; cv_broadcastpri(&we->we_cv, (w->wb_oldpri - (increment * 4)) > PRI_MIN_KERN ? 
w->wb_oldpri - (increment * 4) : PRI_MIN_KERN); } } mtx_unlock(&ntoskrnl_dispatchlock); return (prevstate); } void KeClearEvent(kevent) nt_kevent *kevent; { kevent->k_header.dh_sigstate = FALSE; } uint32_t KeReadStateEvent(kevent) nt_kevent *kevent; { return (kevent->k_header.dh_sigstate); } /* * The object manager in Windows is responsible for managing * references and access to various types of objects, including * device_objects, events, threads, timers and so on. However, * there's a difference in the way objects are handled in user * mode versus kernel mode. * * In user mode (i.e. Win32 applications), all objects are * managed by the object manager. For example, when you create * a timer or event object, you actually end up with an * object_header (for the object manager's bookkeeping * purposes) and an object body (which contains the actual object * structure, e.g. ktimer, kevent, etc...). This allows Windows * to manage resource quotas and to enforce access restrictions * on basically every kind of system object handled by the kernel. * * However, in kernel mode, you only end up using the object * manager some of the time. For example, in a driver, you create * a timer object by simply allocating the memory for a ktimer * structure and initializing it with KeInitializeTimer(). Hence, * the timer has no object_header and no reference counting or * security/resource checks are done on it. The assumption in * this case is that if you're running in kernel mode, you know * what you're doing, and you're already at an elevated privilege * anyway. * * There are some exceptions to this. The two most important ones * for our purposes are device_objects and threads. We need to use * the object manager to do reference counting on device_objects, * and for threads, you can only get a pointer to a thread's * dispatch header by using ObReferenceObjectByHandle() on the * handle returned by PsCreateSystemThread(). */ static ndis_status ObReferenceObjectByHandle(ndis_handle handle, uint32_t reqaccess, void *otype, uint8_t accessmode, void **object, void **handleinfo) { nt_objref *nr; nr = malloc(sizeof(nt_objref), M_DEVBUF, M_NOWAIT|M_ZERO); if (nr == NULL) return (STATUS_INSUFFICIENT_RESOURCES); InitializeListHead((&nr->no_dh.dh_waitlisthead)); nr->no_obj = handle; nr->no_dh.dh_type = DISP_TYPE_THREAD; nr->no_dh.dh_sigstate = 0; nr->no_dh.dh_size = (uint8_t)(sizeof(struct thread) / sizeof(uint32_t)); TAILQ_INSERT_TAIL(&ntoskrnl_reflist, nr, link); *object = nr; return (STATUS_SUCCESS); } static void ObfDereferenceObject(object) void *object; { nt_objref *nr; nr = object; TAILQ_REMOVE(&ntoskrnl_reflist, nr, link); free(nr, M_DEVBUF); } static uint32_t ZwClose(handle) ndis_handle handle; { return (STATUS_SUCCESS); } static uint32_t WmiQueryTraceInformation(traceclass, traceinfo, infolen, reqlen, buf) uint32_t traceclass; void *traceinfo; uint32_t infolen; uint32_t reqlen; void *buf; { return (STATUS_NOT_FOUND); } static uint32_t WmiTraceMessage(uint64_t loghandle, uint32_t messageflags, void *guid, uint16_t messagenum, ...) { return (STATUS_SUCCESS); } static uint32_t IoWMIRegistrationControl(dobj, action) device_object *dobj; uint32_t action; { return (STATUS_SUCCESS); } /* * This is here just in case the thread returns without calling * PsTerminateSystemThread(). 
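 * A well-behaved thread routine normally ends with an explicit call such as
 * PsTerminateSystemThread(STATUS_SUCCESS) (purely illustrative); the wrapper
 * below makes that call itself with the routine's return value, so a driver
 * that forgets is still cleaned up.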
*/ static void ntoskrnl_thrfunc(arg) void *arg; { thread_context *thrctx; uint32_t (*tfunc)(void *); void *tctx; uint32_t rval; thrctx = arg; tfunc = thrctx->tc_thrfunc; tctx = thrctx->tc_thrctx; free(thrctx, M_TEMP); rval = MSCALL1(tfunc, tctx); PsTerminateSystemThread(rval); return; /* notreached */ } static ndis_status PsCreateSystemThread(handle, reqaccess, objattrs, phandle, clientid, thrfunc, thrctx) ndis_handle *handle; uint32_t reqaccess; void *objattrs; ndis_handle phandle; void *clientid; void *thrfunc; void *thrctx; { int error; thread_context *tc; struct proc *p; tc = malloc(sizeof(thread_context), M_TEMP, M_NOWAIT); if (tc == NULL) return (STATUS_INSUFFICIENT_RESOURCES); tc->tc_thrctx = thrctx; tc->tc_thrfunc = thrfunc; error = kproc_create(ntoskrnl_thrfunc, tc, &p, RFHIGHPID, NDIS_KSTACK_PAGES, "Windows Kthread %d", ntoskrnl_kth); if (error) { free(tc, M_TEMP); return (STATUS_INSUFFICIENT_RESOURCES); } *handle = p; ntoskrnl_kth++; return (STATUS_SUCCESS); } /* * In Windows, the exit of a thread is an event that you're allowed * to wait on, assuming you've obtained a reference to the thread using * ObReferenceObjectByHandle(). Unfortunately, the only way we can * simulate this behavior is to register each thread we create in a * reference list, and if someone holds a reference to us, we poke * them. */ static ndis_status PsTerminateSystemThread(status) ndis_status status; { struct nt_objref *nr; mtx_lock(&ntoskrnl_dispatchlock); TAILQ_FOREACH(nr, &ntoskrnl_reflist, link) { if (nr->no_obj != curthread->td_proc) continue; nr->no_dh.dh_sigstate = 1; ntoskrnl_waittest(&nr->no_dh, IO_NO_INCREMENT); break; } mtx_unlock(&ntoskrnl_dispatchlock); ntoskrnl_kth--; kproc_exit(0); return (0); /* notreached */ } static uint32_t DbgPrint(char *fmt, ...) { va_list ap; if (bootverbose) { va_start(ap, fmt); vprintf(fmt, ap); va_end(ap); } return (STATUS_SUCCESS); } static void DbgBreakPoint(void) { kdb_enter(KDB_WHY_NDIS, "DbgBreakPoint(): breakpoint"); } static void KeBugCheckEx(code, param1, param2, param3, param4) uint32_t code; u_long param1; u_long param2; u_long param3; u_long param4; { panic("KeBugCheckEx: STOP 0x%X", code); } static void ntoskrnl_timercall(arg) void *arg; { ktimer *timer; struct timeval tv; kdpc *dpc; mtx_lock(&ntoskrnl_dispatchlock); timer = arg; #ifdef NTOSKRNL_DEBUG_TIMERS ntoskrnl_timer_fires++; #endif ntoskrnl_remove_timer(timer); /* * This should never happen, but complain * if it does. */ if (timer->k_header.dh_inserted == FALSE) { mtx_unlock(&ntoskrnl_dispatchlock); printf("NTOS: timer %p fired even though " "it was canceled\n", timer); return; } /* Mark the timer as no longer being on the timer queue. */ timer->k_header.dh_inserted = FALSE; /* Now signal the object and satisfy any waits on it. */ timer->k_header.dh_sigstate = 1; ntoskrnl_waittest(&timer->k_header, IO_NO_INCREMENT); /* * If this is a periodic timer, re-arm it * so it will fire again. We do this before * calling any deferred procedure calls because * it's possible the DPC might cancel the timer, * in which case it would be wrong for us to * re-arm it again afterwards. */ if (timer->k_period) { tv.tv_sec = 0; tv.tv_usec = timer->k_period * 1000; timer->k_header.dh_inserted = TRUE; ntoskrnl_insert_timer(timer, tvtohz(&tv)); #ifdef NTOSKRNL_DEBUG_TIMERS ntoskrnl_timer_reloads++; #endif } dpc = timer->k_dpc; mtx_unlock(&ntoskrnl_dispatchlock); /* If there's a DPC associated with the timer, queue it up. 
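 * (The k_dpc pointer was saved above, before the dispatch lock was dropped.)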
*/ if (dpc != NULL) KeInsertQueueDpc(dpc, NULL, NULL); } #ifdef NTOSKRNL_DEBUG_TIMERS static int sysctl_show_timers(SYSCTL_HANDLER_ARGS) { int ret; ret = 0; ntoskrnl_show_timers(); return (sysctl_handle_int(oidp, &ret, 0, req)); } static void ntoskrnl_show_timers() { int i = 0; list_entry *l; mtx_lock_spin(&ntoskrnl_calllock); l = ntoskrnl_calllist.nle_flink; while(l != &ntoskrnl_calllist) { i++; l = l->nle_flink; } mtx_unlock_spin(&ntoskrnl_calllock); printf("\n"); printf("%d timers available (out of %d)\n", i, NTOSKRNL_TIMEOUTS); printf("timer sets: %qu\n", ntoskrnl_timer_sets); printf("timer reloads: %qu\n", ntoskrnl_timer_reloads); printf("timer cancels: %qu\n", ntoskrnl_timer_cancels); printf("timer fires: %qu\n", ntoskrnl_timer_fires); printf("\n"); } #endif /* * Must be called with dispatcher lock held. */ static void ntoskrnl_insert_timer(timer, ticks) ktimer *timer; int ticks; { callout_entry *e; list_entry *l; struct callout *c; /* * Try and allocate a timer. */ mtx_lock_spin(&ntoskrnl_calllock); if (IsListEmpty(&ntoskrnl_calllist)) { mtx_unlock_spin(&ntoskrnl_calllock); #ifdef NTOSKRNL_DEBUG_TIMERS ntoskrnl_show_timers(); #endif panic("out of timers!"); } l = RemoveHeadList(&ntoskrnl_calllist); mtx_unlock_spin(&ntoskrnl_calllock); e = CONTAINING_RECORD(l, callout_entry, ce_list); c = &e->ce_callout; timer->k_callout = c; callout_init(c, 1); callout_reset(c, ticks, ntoskrnl_timercall, timer); } static void ntoskrnl_remove_timer(timer) ktimer *timer; { callout_entry *e; e = (callout_entry *)timer->k_callout; callout_stop(timer->k_callout); mtx_lock_spin(&ntoskrnl_calllock); InsertHeadList((&ntoskrnl_calllist), (&e->ce_list)); mtx_unlock_spin(&ntoskrnl_calllock); } void KeInitializeTimer(timer) ktimer *timer; { if (timer == NULL) return; KeInitializeTimerEx(timer, EVENT_TYPE_NOTIFY); } void KeInitializeTimerEx(timer, type) ktimer *timer; uint32_t type; { if (timer == NULL) return; bzero((char *)timer, sizeof(ktimer)); InitializeListHead((&timer->k_header.dh_waitlisthead)); timer->k_header.dh_sigstate = FALSE; timer->k_header.dh_inserted = FALSE; if (type == EVENT_TYPE_NOTIFY) timer->k_header.dh_type = DISP_TYPE_NOTIFICATION_TIMER; else timer->k_header.dh_type = DISP_TYPE_SYNCHRONIZATION_TIMER; timer->k_header.dh_size = sizeof(ktimer) / sizeof(uint32_t); } /* * DPC subsystem. A Windows Deferred Procedure Call has the following * properties: * - It runs at DISPATCH_LEVEL. * - It can have one of 3 importance values that control when it * runs relative to other DPCs in the queue. * - On SMP systems, it can be set to run on a specific processor. * In order to satisfy the last property, we create a DPC thread for * each CPU in the system and bind it to that CPU. Each thread * maintains three queues with different importance levels, which * will be processed in order from lowest to highest. * * In Windows, interrupt handlers run as DPCs. (Not to be confused * with ISRs, which run in interrupt context and can preempt DPCs.) * ISRs are given the highest importance so that they'll take * precedence over timers and other things. */ static void ntoskrnl_dpc_thread(arg) void *arg; { kdpc_queue *kq; kdpc *d; list_entry *l; uint8_t irql; kq = arg; InitializeListHead(&kq->kq_disp); kq->kq_td = curthread; kq->kq_exit = 0; kq->kq_running = FALSE; KeInitializeSpinLock(&kq->kq_lock); KeInitializeEvent(&kq->kq_proc, EVENT_TYPE_SYNC, FALSE); KeInitializeEvent(&kq->kq_done, EVENT_TYPE_SYNC, FALSE); /* * Elevate our priority.
DPCs are used to run interrupt * handlers, and they should trigger as soon as possible * once scheduled by an ISR. */ thread_lock(curthread); #ifdef NTOSKRNL_MULTIPLE_DPCS sched_bind(curthread, kq->kq_cpu); #endif sched_prio(curthread, PRI_MIN_KERN); thread_unlock(curthread); while (1) { KeWaitForSingleObject(&kq->kq_proc, 0, 0, TRUE, NULL); KeAcquireSpinLock(&kq->kq_lock, &irql); if (kq->kq_exit) { kq->kq_exit = 0; KeReleaseSpinLock(&kq->kq_lock, irql); break; } kq->kq_running = TRUE; while (!IsListEmpty(&kq->kq_disp)) { l = RemoveHeadList((&kq->kq_disp)); d = CONTAINING_RECORD(l, kdpc, k_dpclistentry); InitializeListHead((&d->k_dpclistentry)); KeReleaseSpinLockFromDpcLevel(&kq->kq_lock); MSCALL4(d->k_deferedfunc, d, d->k_deferredctx, d->k_sysarg1, d->k_sysarg2); KeAcquireSpinLockAtDpcLevel(&kq->kq_lock); } kq->kq_running = FALSE; KeReleaseSpinLock(&kq->kq_lock, irql); KeSetEvent(&kq->kq_done, IO_NO_INCREMENT, FALSE); } kproc_exit(0); return; /* notreached */ } static void ntoskrnl_destroy_dpc_threads(void) { kdpc_queue *kq; kdpc dpc; int i; kq = kq_queues; #ifdef NTOSKRNL_MULTIPLE_DPCS for (i = 0; i < mp_ncpus; i++) { #else for (i = 0; i < 1; i++) { #endif kq += i; kq->kq_exit = 1; KeInitializeDpc(&dpc, NULL, NULL); KeSetTargetProcessorDpc(&dpc, i); KeInsertQueueDpc(&dpc, NULL, NULL); while (kq->kq_exit) tsleep(kq->kq_td->td_proc, PWAIT, "dpcw", hz/10); } } static uint8_t ntoskrnl_insert_dpc(head, dpc) list_entry *head; kdpc *dpc; { list_entry *l; kdpc *d; l = head->nle_flink; while (l != head) { d = CONTAINING_RECORD(l, kdpc, k_dpclistentry); if (d == dpc) return (FALSE); l = l->nle_flink; } if (dpc->k_importance == KDPC_IMPORTANCE_LOW) InsertTailList((head), (&dpc->k_dpclistentry)); else InsertHeadList((head), (&dpc->k_dpclistentry)); return (TRUE); } void KeInitializeDpc(dpc, dpcfunc, dpcctx) kdpc *dpc; void *dpcfunc; void *dpcctx; { if (dpc == NULL) return; dpc->k_deferedfunc = dpcfunc; dpc->k_deferredctx = dpcctx; dpc->k_num = KDPC_CPU_DEFAULT; dpc->k_importance = KDPC_IMPORTANCE_MEDIUM; InitializeListHead((&dpc->k_dpclistentry)); } uint8_t KeInsertQueueDpc(dpc, sysarg1, sysarg2) kdpc *dpc; void *sysarg1; void *sysarg2; { kdpc_queue *kq; uint8_t r; uint8_t irql; if (dpc == NULL) return (FALSE); kq = kq_queues; #ifdef NTOSKRNL_MULTIPLE_DPCS KeRaiseIrql(DISPATCH_LEVEL, &irql); /* * By default, the DPC is queued to run on the same CPU * that scheduled it. 
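 * In this emulation that is the CPU reported by curthread->td_oncpu at the
 * time of the call.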
*/ if (dpc->k_num == KDPC_CPU_DEFAULT) kq += curthread->td_oncpu; else kq += dpc->k_num; KeAcquireSpinLockAtDpcLevel(&kq->kq_lock); #else KeAcquireSpinLock(&kq->kq_lock, &irql); #endif r = ntoskrnl_insert_dpc(&kq->kq_disp, dpc); if (r == TRUE) { dpc->k_sysarg1 = sysarg1; dpc->k_sysarg2 = sysarg2; } KeReleaseSpinLock(&kq->kq_lock, irql); if (r == FALSE) return (r); KeSetEvent(&kq->kq_proc, IO_NO_INCREMENT, FALSE); return (r); } uint8_t KeRemoveQueueDpc(dpc) kdpc *dpc; { kdpc_queue *kq; uint8_t irql; if (dpc == NULL) return (FALSE); #ifdef NTOSKRNL_MULTIPLE_DPCS KeRaiseIrql(DISPATCH_LEVEL, &irql); kq = kq_queues + dpc->k_num; KeAcquireSpinLockAtDpcLevel(&kq->kq_lock); #else kq = kq_queues; KeAcquireSpinLock(&kq->kq_lock, &irql); #endif if (dpc->k_dpclistentry.nle_flink == &dpc->k_dpclistentry) { KeReleaseSpinLockFromDpcLevel(&kq->kq_lock); KeLowerIrql(irql); return (FALSE); } RemoveEntryList((&dpc->k_dpclistentry)); InitializeListHead((&dpc->k_dpclistentry)); KeReleaseSpinLock(&kq->kq_lock, irql); return (TRUE); } void KeSetImportanceDpc(dpc, imp) kdpc *dpc; uint32_t imp; { if (imp != KDPC_IMPORTANCE_LOW && imp != KDPC_IMPORTANCE_MEDIUM && imp != KDPC_IMPORTANCE_HIGH) return; dpc->k_importance = (uint8_t)imp; } void KeSetTargetProcessorDpc(kdpc *dpc, uint8_t cpu) { if (cpu > mp_ncpus) return; dpc->k_num = cpu; } void KeFlushQueuedDpcs(void) { kdpc_queue *kq; int i; /* * Poke each DPC queue and wait * for them to drain. */ #ifdef NTOSKRNL_MULTIPLE_DPCS for (i = 0; i < mp_ncpus; i++) { #else for (i = 0; i < 1; i++) { #endif kq = kq_queues + i; KeSetEvent(&kq->kq_proc, IO_NO_INCREMENT, FALSE); KeWaitForSingleObject(&kq->kq_done, 0, 0, TRUE, NULL); } } uint32_t KeGetCurrentProcessorNumber(void) { return ((uint32_t)curthread->td_oncpu); } uint8_t KeSetTimerEx(timer, duetime, period, dpc) ktimer *timer; int64_t duetime; uint32_t period; kdpc *dpc; { struct timeval tv; uint64_t curtime; uint8_t pending; if (timer == NULL) return (FALSE); mtx_lock(&ntoskrnl_dispatchlock); if (timer->k_header.dh_inserted == TRUE) { ntoskrnl_remove_timer(timer); #ifdef NTOSKRNL_DEBUG_TIMERS ntoskrnl_timer_cancels++; #endif timer->k_header.dh_inserted = FALSE; pending = TRUE; } else pending = FALSE; timer->k_duetime = duetime; timer->k_period = period; timer->k_header.dh_sigstate = FALSE; timer->k_dpc = dpc; if (duetime < 0) { tv.tv_sec = - (duetime) / 10000000; tv.tv_usec = (- (duetime) / 10) - (tv.tv_sec * 1000000); } else { ntoskrnl_time(&curtime); if (duetime < curtime) tv.tv_sec = tv.tv_usec = 0; else { tv.tv_sec = ((duetime) - curtime) / 10000000; tv.tv_usec = ((duetime) - curtime) / 10 - (tv.tv_sec * 1000000); } } timer->k_header.dh_inserted = TRUE; ntoskrnl_insert_timer(timer, tvtohz(&tv)); #ifdef NTOSKRNL_DEBUG_TIMERS ntoskrnl_timer_sets++; #endif mtx_unlock(&ntoskrnl_dispatchlock); return (pending); } uint8_t KeSetTimer(timer, duetime, dpc) ktimer *timer; int64_t duetime; kdpc *dpc; { return (KeSetTimerEx(timer, duetime, 0, dpc)); } /* * The Windows DDK documentation seems to say that cancelling * a timer that has a DPC will result in the DPC also being * cancelled, but this isn't really the case. 
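 * A caller that also wants the DPC gone presumably has to remove it itself,
 * e.g. with KeRemoveQueueDpc() above.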
*/ uint8_t KeCancelTimer(timer) ktimer *timer; { uint8_t pending; if (timer == NULL) return (FALSE); mtx_lock(&ntoskrnl_dispatchlock); pending = timer->k_header.dh_inserted; if (timer->k_header.dh_inserted == TRUE) { timer->k_header.dh_inserted = FALSE; ntoskrnl_remove_timer(timer); #ifdef NTOSKRNL_DEBUG_TIMERS ntoskrnl_timer_cancels++; #endif } mtx_unlock(&ntoskrnl_dispatchlock); return (pending); } uint8_t KeReadStateTimer(timer) ktimer *timer; { return (timer->k_header.dh_sigstate); } static int32_t KeDelayExecutionThread(uint8_t wait_mode, uint8_t alertable, int64_t *interval) { ktimer timer; if (wait_mode != 0) panic("invalid wait_mode %d", wait_mode); KeInitializeTimer(&timer); KeSetTimer(&timer, *interval, NULL); KeWaitForSingleObject(&timer, 0, 0, alertable, NULL); return STATUS_SUCCESS; } static uint64_t KeQueryInterruptTime(void) { int ticks; struct timeval tv; getmicrouptime(&tv); ticks = tvtohz(&tv); return ticks * howmany(10000000, hz); } static struct thread * KeGetCurrentThread(void) { return curthread; } static int32_t KeSetPriorityThread(td, pri) struct thread *td; int32_t pri; { int32_t old; if (td == NULL) return LOW_REALTIME_PRIORITY; if (td->td_priority <= PRI_MIN_KERN) old = HIGH_PRIORITY; else if (td->td_priority >= PRI_MAX_KERN) old = LOW_PRIORITY; else old = LOW_REALTIME_PRIORITY; thread_lock(td); if (pri == HIGH_PRIORITY) sched_prio(td, PRI_MIN_KERN); if (pri == LOW_REALTIME_PRIORITY) sched_prio(td, PRI_MIN_KERN + (PRI_MAX_KERN - PRI_MIN_KERN) / 2); if (pri == LOW_PRIORITY) sched_prio(td, PRI_MAX_KERN); thread_unlock(td); return old; } static void dummy() { printf("ntoskrnl dummy called...\n"); } image_patch_table ntoskrnl_functbl[] = { IMPORT_SFUNC(RtlZeroMemory, 2), IMPORT_SFUNC(RtlSecureZeroMemory, 2), IMPORT_SFUNC(RtlFillMemory, 3), IMPORT_SFUNC(RtlMoveMemory, 3), IMPORT_SFUNC(RtlCharToInteger, 3), IMPORT_SFUNC(RtlCopyMemory, 3), IMPORT_SFUNC(RtlCopyString, 2), IMPORT_SFUNC(RtlCompareMemory, 3), IMPORT_SFUNC(RtlEqualUnicodeString, 3), IMPORT_SFUNC(RtlCopyUnicodeString, 2), IMPORT_SFUNC(RtlUnicodeStringToAnsiString, 3), IMPORT_SFUNC(RtlAnsiStringToUnicodeString, 3), IMPORT_SFUNC(RtlInitAnsiString, 2), IMPORT_SFUNC_MAP(RtlInitString, RtlInitAnsiString, 2), IMPORT_SFUNC(RtlInitUnicodeString, 2), IMPORT_SFUNC(RtlFreeAnsiString, 1), IMPORT_SFUNC(RtlFreeUnicodeString, 1), IMPORT_SFUNC(RtlUnicodeStringToInteger, 3), IMPORT_CFUNC(sprintf, 0), IMPORT_CFUNC(vsprintf, 0), IMPORT_CFUNC_MAP(_snprintf, snprintf, 0), IMPORT_CFUNC_MAP(_vsnprintf, vsnprintf, 0), IMPORT_CFUNC(DbgPrint, 0), IMPORT_SFUNC(DbgBreakPoint, 0), IMPORT_SFUNC(KeBugCheckEx, 5), IMPORT_CFUNC(strncmp, 0), IMPORT_CFUNC(strcmp, 0), IMPORT_CFUNC_MAP(stricmp, strcasecmp, 0), IMPORT_CFUNC(strncpy, 0), IMPORT_CFUNC(strcpy, 0), IMPORT_CFUNC(strlen, 0), IMPORT_CFUNC_MAP(toupper, ntoskrnl_toupper, 0), IMPORT_CFUNC_MAP(tolower, ntoskrnl_tolower, 0), IMPORT_CFUNC_MAP(strstr, ntoskrnl_strstr, 0), IMPORT_CFUNC_MAP(strncat, ntoskrnl_strncat, 0), IMPORT_CFUNC_MAP(strchr, index, 0), IMPORT_CFUNC_MAP(strrchr, rindex, 0), IMPORT_CFUNC(memcpy, 0), IMPORT_CFUNC_MAP(memmove, ntoskrnl_memmove, 0), IMPORT_CFUNC_MAP(memset, ntoskrnl_memset, 0), IMPORT_CFUNC_MAP(memchr, ntoskrnl_memchr, 0), IMPORT_SFUNC(IoAllocateDriverObjectExtension, 4), IMPORT_SFUNC(IoGetDriverObjectExtension, 2), IMPORT_FFUNC(IofCallDriver, 2), IMPORT_FFUNC(IofCompleteRequest, 2), IMPORT_SFUNC(IoAcquireCancelSpinLock, 1), IMPORT_SFUNC(IoReleaseCancelSpinLock, 1), IMPORT_SFUNC(IoCancelIrp, 1), IMPORT_SFUNC(IoConnectInterrupt, 11), 
IMPORT_SFUNC(IoDisconnectInterrupt, 1), IMPORT_SFUNC(IoCreateDevice, 7), IMPORT_SFUNC(IoDeleteDevice, 1), IMPORT_SFUNC(IoGetAttachedDevice, 1), IMPORT_SFUNC(IoAttachDeviceToDeviceStack, 2), IMPORT_SFUNC(IoDetachDevice, 1), IMPORT_SFUNC(IoBuildSynchronousFsdRequest, 7), IMPORT_SFUNC(IoBuildAsynchronousFsdRequest, 6), IMPORT_SFUNC(IoBuildDeviceIoControlRequest, 9), IMPORT_SFUNC(IoAllocateIrp, 2), IMPORT_SFUNC(IoReuseIrp, 2), IMPORT_SFUNC(IoMakeAssociatedIrp, 2), IMPORT_SFUNC(IoFreeIrp, 1), IMPORT_SFUNC(IoInitializeIrp, 3), IMPORT_SFUNC(KeAcquireInterruptSpinLock, 1), IMPORT_SFUNC(KeReleaseInterruptSpinLock, 2), IMPORT_SFUNC(KeSynchronizeExecution, 3), IMPORT_SFUNC(KeWaitForSingleObject, 5), IMPORT_SFUNC(KeWaitForMultipleObjects, 8), IMPORT_SFUNC(_allmul, 4), IMPORT_SFUNC(_alldiv, 4), IMPORT_SFUNC(_allrem, 4), IMPORT_RFUNC(_allshr, 0), IMPORT_RFUNC(_allshl, 0), IMPORT_SFUNC(_aullmul, 4), IMPORT_SFUNC(_aulldiv, 4), IMPORT_SFUNC(_aullrem, 4), IMPORT_RFUNC(_aullshr, 0), IMPORT_RFUNC(_aullshl, 0), IMPORT_CFUNC(atoi, 0), IMPORT_CFUNC(atol, 0), IMPORT_CFUNC(rand, 0), IMPORT_CFUNC(srand, 0), IMPORT_SFUNC(WRITE_REGISTER_USHORT, 2), IMPORT_SFUNC(READ_REGISTER_USHORT, 1), IMPORT_SFUNC(WRITE_REGISTER_ULONG, 2), IMPORT_SFUNC(READ_REGISTER_ULONG, 1), IMPORT_SFUNC(READ_REGISTER_UCHAR, 1), IMPORT_SFUNC(WRITE_REGISTER_UCHAR, 2), IMPORT_SFUNC(ExInitializePagedLookasideList, 7), IMPORT_SFUNC(ExDeletePagedLookasideList, 1), IMPORT_SFUNC(ExInitializeNPagedLookasideList, 7), IMPORT_SFUNC(ExDeleteNPagedLookasideList, 1), IMPORT_FFUNC(InterlockedPopEntrySList, 1), IMPORT_FFUNC(InitializeSListHead, 1), IMPORT_FFUNC(InterlockedPushEntrySList, 2), IMPORT_SFUNC(ExQueryDepthSList, 1), IMPORT_FFUNC_MAP(ExpInterlockedPopEntrySList, InterlockedPopEntrySList, 1), IMPORT_FFUNC_MAP(ExpInterlockedPushEntrySList, InterlockedPushEntrySList, 2), IMPORT_FFUNC(ExInterlockedPopEntrySList, 2), IMPORT_FFUNC(ExInterlockedPushEntrySList, 3), IMPORT_SFUNC(ExAllocatePoolWithTag, 3), IMPORT_SFUNC(ExFreePoolWithTag, 2), IMPORT_SFUNC(ExFreePool, 1), #ifdef __i386__ IMPORT_FFUNC(KefAcquireSpinLockAtDpcLevel, 1), IMPORT_FFUNC(KefReleaseSpinLockFromDpcLevel,1), IMPORT_FFUNC(KeAcquireSpinLockRaiseToDpc, 1), #else /* * For AMD64, we can get away with just mapping * KeAcquireSpinLockRaiseToDpc() directly to KfAcquireSpinLock() * because the calling conventions end up being the same. * On i386, we have to be careful because KfAcquireSpinLock() * is _fastcall but KeAcquireSpinLockRaiseToDpc() isn't. 
*/ IMPORT_SFUNC(KeAcquireSpinLockAtDpcLevel, 1), IMPORT_SFUNC(KeReleaseSpinLockFromDpcLevel, 1), IMPORT_SFUNC_MAP(KeAcquireSpinLockRaiseToDpc, KfAcquireSpinLock, 1), #endif IMPORT_SFUNC_MAP(KeReleaseSpinLock, KfReleaseSpinLock, 1), IMPORT_FFUNC(InterlockedIncrement, 1), IMPORT_FFUNC(InterlockedDecrement, 1), IMPORT_FFUNC(InterlockedExchange, 2), IMPORT_FFUNC(ExInterlockedAddLargeStatistic, 2), IMPORT_SFUNC(IoAllocateMdl, 5), IMPORT_SFUNC(IoFreeMdl, 1), IMPORT_SFUNC(MmAllocateContiguousMemory, 2 + 1), IMPORT_SFUNC(MmAllocateContiguousMemorySpecifyCache, 5 + 3), IMPORT_SFUNC(MmFreeContiguousMemory, 1), IMPORT_SFUNC(MmFreeContiguousMemorySpecifyCache, 3), IMPORT_SFUNC(MmSizeOfMdl, 1), IMPORT_SFUNC(MmMapLockedPages, 2), IMPORT_SFUNC(MmMapLockedPagesSpecifyCache, 6), IMPORT_SFUNC(MmUnmapLockedPages, 2), IMPORT_SFUNC(MmBuildMdlForNonPagedPool, 1), IMPORT_SFUNC(MmGetPhysicalAddress, 1), IMPORT_SFUNC(MmGetSystemRoutineAddress, 1), IMPORT_SFUNC(MmIsAddressValid, 1), IMPORT_SFUNC(MmMapIoSpace, 3 + 1), IMPORT_SFUNC(MmUnmapIoSpace, 2), IMPORT_SFUNC(KeInitializeSpinLock, 1), IMPORT_SFUNC(IoIsWdmVersionAvailable, 2), IMPORT_SFUNC(IoOpenDeviceRegistryKey, 4), IMPORT_SFUNC(IoGetDeviceObjectPointer, 4), IMPORT_SFUNC(IoGetDeviceProperty, 5), IMPORT_SFUNC(IoAllocateWorkItem, 1), IMPORT_SFUNC(IoFreeWorkItem, 1), IMPORT_SFUNC(IoQueueWorkItem, 4), IMPORT_SFUNC(ExQueueWorkItem, 2), IMPORT_SFUNC(ntoskrnl_workitem, 2), IMPORT_SFUNC(KeInitializeMutex, 2), IMPORT_SFUNC(KeReleaseMutex, 2), IMPORT_SFUNC(KeReadStateMutex, 1), IMPORT_SFUNC(KeInitializeEvent, 3), IMPORT_SFUNC(KeSetEvent, 3), IMPORT_SFUNC(KeResetEvent, 1), IMPORT_SFUNC(KeClearEvent, 1), IMPORT_SFUNC(KeReadStateEvent, 1), IMPORT_SFUNC(KeInitializeTimer, 1), IMPORT_SFUNC(KeInitializeTimerEx, 2), IMPORT_SFUNC(KeSetTimer, 3), IMPORT_SFUNC(KeSetTimerEx, 4), IMPORT_SFUNC(KeCancelTimer, 1), IMPORT_SFUNC(KeReadStateTimer, 1), IMPORT_SFUNC(KeInitializeDpc, 3), IMPORT_SFUNC(KeInsertQueueDpc, 3), IMPORT_SFUNC(KeRemoveQueueDpc, 1), IMPORT_SFUNC(KeSetImportanceDpc, 2), IMPORT_SFUNC(KeSetTargetProcessorDpc, 2), IMPORT_SFUNC(KeFlushQueuedDpcs, 0), IMPORT_SFUNC(KeGetCurrentProcessorNumber, 1), IMPORT_SFUNC(ObReferenceObjectByHandle, 6), IMPORT_FFUNC(ObfDereferenceObject, 1), IMPORT_SFUNC(ZwClose, 1), IMPORT_SFUNC(PsCreateSystemThread, 7), IMPORT_SFUNC(PsTerminateSystemThread, 1), IMPORT_SFUNC(IoWMIRegistrationControl, 2), IMPORT_SFUNC(WmiQueryTraceInformation, 5), IMPORT_CFUNC(WmiTraceMessage, 0), IMPORT_SFUNC(KeQuerySystemTime, 1), IMPORT_CFUNC(KeTickCount, 0), IMPORT_SFUNC(KeDelayExecutionThread, 3), IMPORT_SFUNC(KeQueryInterruptTime, 0), IMPORT_SFUNC(KeGetCurrentThread, 0), IMPORT_SFUNC(KeSetPriorityThread, 2), /* * This last entry is a catch-all for any function we haven't * implemented yet. The PE import list patching routine will * use it for any function that doesn't have an explicit match * in this table. */ { NULL, (FUNC)dummy, NULL, 0, WINDRV_WRAP_STDCALL }, /* End of list. */ { NULL, NULL, NULL } }; Index: head/sys/dev/oce/oce_mbox.c =================================================================== --- head/sys/dev/oce/oce_mbox.c (revision 356096) +++ head/sys/dev/oce/oce_mbox.c (revision 356097) @@ -1,2371 +1,2370 @@ /*- * SPDX-License-Identifier: BSD-3-Clause * * Copyright (C) 2013 Emulex * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions are met: * * 1. 
Redistributions of source code must retain the above copyright notice, * this list of conditions and the following disclaimer. * * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * 3. Neither the name of the Emulex Corporation nor the names of its * contributors may be used to endorse or promote products derived from * this software without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" * AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE * LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. * * Contact Information: * freebsd-drivers@emulex.com * * Emulex * 3333 Susan Street * Costa Mesa, CA 92626 */ /* $FreeBSD$ */ #include "oce_if.h" int oce_wait_ready(POCE_SOFTC sc) { #define SLIPORT_READY_TIMEOUT 30000 uint32_t sliport_status, i; if (!IS_XE201(sc)) return (-1); for (i = 0; i < SLIPORT_READY_TIMEOUT; i++) { sliport_status = OCE_READ_REG32(sc, db, SLIPORT_STATUS_OFFSET); if (sliport_status & SLIPORT_STATUS_RDY_MASK) return 0; if (sliport_status & SLIPORT_STATUS_ERR_MASK && !(sliport_status & SLIPORT_STATUS_RN_MASK)) { device_printf(sc->dev, "Error detected in the card\n"); return EIO; } DELAY(1000); } device_printf(sc->dev, "Firmware wait timed out\n"); return (-1); } /** * @brief Reset (firmware) common function * @param sc software handle to the device * @returns 0 on success, ETIMEDOUT on failure */ int oce_reset_fun(POCE_SOFTC sc) { struct oce_mbx *mbx; struct oce_bmbx *mb; struct ioctl_common_function_reset *fwcmd; int rc = 0; if (IS_XE201(sc)) { OCE_WRITE_REG32(sc, db, SLIPORT_CONTROL_OFFSET, SLI_PORT_CONTROL_IP_MASK); rc = oce_wait_ready(sc); if (rc) { device_printf(sc->dev, "Firmware reset Failed\n"); } return rc; } mb = OCE_DMAPTR(&sc->bsmbx, struct oce_bmbx); mbx = &mb->mbx; bzero(mbx, sizeof(struct oce_mbx)); fwcmd = (struct ioctl_common_function_reset *)&mbx->payload; mbx_common_req_hdr_init(&fwcmd->hdr, 0, 0, MBX_SUBSYSTEM_COMMON, OPCODE_COMMON_FUNCTION_RESET, 10, /* MBX_TIMEOUT_SEC */ sizeof(struct ioctl_common_function_reset), OCE_MBX_VER_V0); mbx->u0.s.embedded = 1; mbx->payload_length = sizeof(struct ioctl_common_function_reset); rc = oce_mbox_dispatch(sc, 2); return rc; } /** * @brief This function tells the firmware we are * done with commands.
* @param sc software handle to the device * @returns 0 on success, ETIMEDOUT on failure */ int oce_fw_clean(POCE_SOFTC sc) { struct oce_bmbx *mbx; uint8_t *ptr; int ret = 0; mbx = OCE_DMAPTR(&sc->bsmbx, struct oce_bmbx); ptr = (uint8_t *) &mbx->mbx; /* Endian Signature */ *ptr++ = 0xff; *ptr++ = 0xaa; *ptr++ = 0xbb; *ptr++ = 0xff; *ptr++ = 0xff; *ptr++ = 0xcc; *ptr++ = 0xdd; *ptr = 0xff; ret = oce_mbox_dispatch(sc, 2); return ret; } /** * @brief Mailbox wait * @param sc software handle to the device * @param tmo_sec timeout in seconds */ static int oce_mbox_wait(POCE_SOFTC sc, uint32_t tmo_sec) { tmo_sec *= 10000; pd_mpu_mbox_db_t mbox_db; for (;;) { if (tmo_sec != 0) { if (--tmo_sec == 0) break; } mbox_db.dw0 = OCE_READ_REG32(sc, db, PD_MPU_MBOX_DB); if (mbox_db.bits.ready) return 0; DELAY(100); } device_printf(sc->dev, "Mailbox timed out\n"); return ETIMEDOUT; } /** * @brief Mailbox dispatch * @param sc software handle to the device * @param tmo_sec timeout in seconds */ int oce_mbox_dispatch(POCE_SOFTC sc, uint32_t tmo_sec) { pd_mpu_mbox_db_t mbox_db; uint32_t pa; int rc; oce_dma_sync(&sc->bsmbx, BUS_DMASYNC_PREWRITE); pa = (uint32_t) ((uint64_t) sc->bsmbx.paddr >> 34); bzero(&mbox_db, sizeof(pd_mpu_mbox_db_t)); mbox_db.bits.ready = 0; mbox_db.bits.hi = 1; mbox_db.bits.address = pa; rc = oce_mbox_wait(sc, tmo_sec); if (rc == 0) { OCE_WRITE_REG32(sc, db, PD_MPU_MBOX_DB, mbox_db.dw0); pa = (uint32_t) ((uint64_t) sc->bsmbx.paddr >> 4) & 0x3fffffff; mbox_db.bits.ready = 0; mbox_db.bits.hi = 0; mbox_db.bits.address = pa; rc = oce_mbox_wait(sc, tmo_sec); if (rc == 0) { OCE_WRITE_REG32(sc, db, PD_MPU_MBOX_DB, mbox_db.dw0); rc = oce_mbox_wait(sc, tmo_sec); oce_dma_sync(&sc->bsmbx, BUS_DMASYNC_POSTWRITE); } } return rc; } /** * @brief Mailbox common request header initialization * @param hdr mailbox header * @param dom domain * @param port port * @param subsys subsystem * @param opcode opcode * @param timeout timeout * @param pyld_len payload length */ void mbx_common_req_hdr_init(struct mbx_hdr *hdr, uint8_t dom, uint8_t port, uint8_t subsys, uint8_t opcode, uint32_t timeout, uint32_t pyld_len, uint8_t version) { hdr->u0.req.opcode = opcode; hdr->u0.req.subsystem = subsys; hdr->u0.req.port_number = port; hdr->u0.req.domain = dom; hdr->u0.req.timeout = timeout; hdr->u0.req.request_length = pyld_len - sizeof(struct mbx_hdr); hdr->u0.req.version = version; } /** * @brief Function to initialize the hw with host endian information * @param sc software handle to the device * @returns 0 on success, ETIMEDOUT on failure */ int oce_mbox_init(POCE_SOFTC sc) { struct oce_bmbx *mbx; uint8_t *ptr; int ret = 0; if (sc->flags & OCE_FLAGS_MBOX_ENDIAN_RQD) { mbx = OCE_DMAPTR(&sc->bsmbx, struct oce_bmbx); ptr = (uint8_t *) &mbx->mbx; /* Endian Signature */ *ptr++ = 0xff; *ptr++ = 0x12; *ptr++ = 0x34; *ptr++ = 0xff; *ptr++ = 0xff; *ptr++ = 0x56; *ptr++ = 0x78; *ptr = 0xff; ret = oce_mbox_dispatch(sc, 0); } return ret; } /** * @brief Function to get the firmware version * @param sc software handle to the device * @returns 0 on success, EIO on failure */ int oce_get_fw_version(POCE_SOFTC sc) { struct oce_mbx mbx; struct mbx_get_common_fw_version *fwcmd; int ret = 0; bzero(&mbx, sizeof(struct oce_mbx)); fwcmd = (struct mbx_get_common_fw_version *)&mbx.payload; mbx_common_req_hdr_init(&fwcmd->hdr, 0, 0, MBX_SUBSYSTEM_COMMON, OPCODE_COMMON_GET_FW_VERSION, MBX_TIMEOUT_SEC, sizeof(struct mbx_get_common_fw_version), OCE_MBX_VER_V0); mbx.u0.s.embedded = 1; mbx.payload_length = sizeof(struct mbx_get_common_fw_version); 
DW_SWAP(u32ptr(&mbx), mbx.payload_length + OCE_BMBX_RHDR_SZ); ret = oce_mbox_post(sc, &mbx, NULL); if (!ret) ret = fwcmd->hdr.u0.rsp.status; if (ret) { device_printf(sc->dev, "%s failed - cmd status: %d addi status: %d\n", __FUNCTION__, ret, fwcmd->hdr.u0.rsp.additional_status); goto error; } bcopy(fwcmd->params.rsp.fw_ver_str, sc->fw_version, 32); error: return ret; } /** * @brief Firmware will send gracious notifications during * attach only after sending the first MCC command. We * use the MCC queue only for getting async events and the mailbox * for sending commands. So to get gracious notifications * at least send one dummy command on the MCC. */ int oce_first_mcc_cmd(POCE_SOFTC sc) { struct oce_mbx *mbx; struct oce_mq *mq = sc->mq; struct mbx_get_common_fw_version *fwcmd; uint32_t reg_value; mbx = RING_GET_PRODUCER_ITEM_VA(mq->ring, struct oce_mbx); bzero(mbx, sizeof(struct oce_mbx)); fwcmd = (struct mbx_get_common_fw_version *)&mbx->payload; mbx_common_req_hdr_init(&fwcmd->hdr, 0, 0, MBX_SUBSYSTEM_COMMON, OPCODE_COMMON_GET_FW_VERSION, MBX_TIMEOUT_SEC, sizeof(struct mbx_get_common_fw_version), OCE_MBX_VER_V0); mbx->u0.s.embedded = 1; mbx->payload_length = sizeof(struct mbx_get_common_fw_version); bus_dmamap_sync(mq->ring->dma.tag, mq->ring->dma.map, BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE); RING_PUT(mq->ring, 1); reg_value = (1 << 16) | mq->mq_id; OCE_WRITE_REG32(sc, db, PD_MQ_DB, reg_value); return 0; } /** * @brief Function to post an MBX to the mbox * @param sc software handle to the device * @param mbx pointer to the MBX to send * @param mbxctx pointer to the mbx context structure * @returns 0 on success, error on failure */ int oce_mbox_post(POCE_SOFTC sc, struct oce_mbx *mbx, struct oce_mbx_ctx *mbxctx) { struct oce_mbx *mb_mbx = NULL; struct oce_mq_cqe *mb_cqe = NULL; struct oce_bmbx *mb = NULL; int rc = 0; uint32_t tmo = 0; uint32_t cstatus = 0; uint32_t xstatus = 0; LOCK(&sc->bmbx_lock); mb = OCE_DMAPTR(&sc->bsmbx, struct oce_bmbx); mb_mbx = &mb->mbx; /* get the tmo */ tmo = mbx->tag[0]; mbx->tag[0] = 0; /* copy mbx into mbox */ bcopy(mbx, mb_mbx, sizeof(struct oce_mbx)); /* now dispatch */ rc = oce_mbox_dispatch(sc, tmo); if (rc == 0) { /* * the command completed successfully. Now get the * completion queue entry */ mb_cqe = &mb->cqe; DW_SWAP(u32ptr(&mb_cqe->u0.dw[0]), sizeof(struct oce_mq_cqe)); /* copy mbox mbx back */ bcopy(mb_mbx, mbx, sizeof(struct oce_mbx)); /* pick up the mailbox status */ cstatus = mb_cqe->u0.s.completion_status; xstatus = mb_cqe->u0.s.extended_status; /* * store the mbx context in the cqe tag section so that * the upper layer handling the cqe can associate the mbx * with the response */ if (cstatus == 0 && mbxctx) { /* save context */ mbxctx->mbx = mb_mbx; bcopy(&mbxctx, mb_cqe->u0.s.mq_tag, sizeof(struct oce_mbx_ctx *)); } } UNLOCK(&sc->bmbx_lock); return rc; } /** * @brief Function to read the mac address associated with an interface * @param sc software handle to the device * @param if_id interface id to read the address from * @param perm set to 1 if reading the factory mac address. * In this case if_id is ignored * @param type type of the mac address, whether network or storage * @param[out] mac [OUTPUT] pointer to a buffer containing the * mac address when the command succeeds.
* @returns 0 on success, EIO on failure */ int oce_read_mac_addr(POCE_SOFTC sc, uint32_t if_id, uint8_t perm, uint8_t type, struct mac_address_format *mac) { struct oce_mbx mbx; struct mbx_query_common_iface_mac *fwcmd; int ret = 0; bzero(&mbx, sizeof(struct oce_mbx)); fwcmd = (struct mbx_query_common_iface_mac *)&mbx.payload; mbx_common_req_hdr_init(&fwcmd->hdr, 0, 0, MBX_SUBSYSTEM_COMMON, OPCODE_COMMON_QUERY_IFACE_MAC, MBX_TIMEOUT_SEC, sizeof(struct mbx_query_common_iface_mac), OCE_MBX_VER_V0); fwcmd->params.req.permanent = perm; if (!perm) fwcmd->params.req.if_id = (uint16_t) if_id; else fwcmd->params.req.if_id = 0; fwcmd->params.req.type = type; mbx.u0.s.embedded = 1; mbx.payload_length = sizeof(struct mbx_query_common_iface_mac); DW_SWAP(u32ptr(&mbx), mbx.payload_length + OCE_BMBX_RHDR_SZ); ret = oce_mbox_post(sc, &mbx, NULL); if (!ret) ret = fwcmd->hdr.u0.rsp.status; if (ret) { device_printf(sc->dev, "%s failed - cmd status: %d addi status: %d\n", __FUNCTION__, ret, fwcmd->hdr.u0.rsp.additional_status); goto error; } /* copy the mac address to the output parameter */ mac->size_of_struct = fwcmd->params.rsp.mac.size_of_struct; bcopy(&fwcmd->params.rsp.mac.mac_addr[0], &mac->mac_addr[0], mac->size_of_struct); error: return ret; } /** * @brief Function to query the fw attributes from the hw * @param sc software handle to the device * @returns 0 on success, EIO on failure */ int oce_get_fw_config(POCE_SOFTC sc) { struct oce_mbx mbx; struct mbx_common_query_fw_config *fwcmd; int ret = 0; bzero(&mbx, sizeof(struct oce_mbx)); fwcmd = (struct mbx_common_query_fw_config *)&mbx.payload; mbx_common_req_hdr_init(&fwcmd->hdr, 0, 0, MBX_SUBSYSTEM_COMMON, OPCODE_COMMON_QUERY_FIRMWARE_CONFIG, MBX_TIMEOUT_SEC, sizeof(struct mbx_common_query_fw_config), OCE_MBX_VER_V0); mbx.u0.s.embedded = 1; mbx.payload_length = sizeof(struct mbx_common_query_fw_config); DW_SWAP(u32ptr(&mbx), mbx.payload_length + OCE_BMBX_RHDR_SZ); ret = oce_mbox_post(sc, &mbx, NULL); if (!ret) ret = fwcmd->hdr.u0.rsp.status; if (ret) { device_printf(sc->dev, "%s failed - cmd status: %d addi status: %d\n", __FUNCTION__, ret, fwcmd->hdr.u0.rsp.additional_status); goto error; } DW_SWAP(u32ptr(fwcmd), sizeof(struct mbx_common_query_fw_config)); sc->config_number = HOST_32(fwcmd->params.rsp.config_number); sc->asic_revision = HOST_32(fwcmd->params.rsp.asic_revision); sc->port_id = HOST_32(fwcmd->params.rsp.port_id); sc->function_mode = HOST_32(fwcmd->params.rsp.function_mode); if ((sc->function_mode & (ULP_NIC_MODE | ULP_RDMA_MODE)) == (ULP_NIC_MODE | ULP_RDMA_MODE)) { sc->rdma_flags = OCE_RDMA_FLAG_SUPPORTED; } sc->function_caps = HOST_32(fwcmd->params.rsp.function_caps); if (fwcmd->params.rsp.ulp[0].ulp_mode & ULP_NIC_MODE) { sc->max_tx_rings = HOST_32(fwcmd->params.rsp.ulp[0].nic_wq_tot); sc->max_rx_rings = HOST_32(fwcmd->params.rsp.ulp[0].lro_rqid_tot); } else { sc->max_tx_rings = HOST_32(fwcmd->params.rsp.ulp[1].nic_wq_tot); sc->max_rx_rings = HOST_32(fwcmd->params.rsp.ulp[1].lro_rqid_tot); } error: return ret; } /** * * @brief Function to create a device interface * @param sc software handle to the device * @param cap_flags capability flags * @param en_flags enable capability flags * @param vlan_tag optional vlan tag to associate with the if * @param mac_addr pointer to a buffer containing the mac address * @param[out] if_id [OUTPUT] pointer to an integer to hold the ID of the interface created * @returns 0 on success, EIO on failure */ int oce_if_create(POCE_SOFTC sc, uint32_t cap_flags, uint32_t en_flags, uint16_t vlan_tag,
uint8_t *mac_addr, uint32_t *if_id) { struct oce_mbx mbx; struct mbx_create_common_iface *fwcmd; int rc = 0; bzero(&mbx, sizeof(struct oce_mbx)); fwcmd = (struct mbx_create_common_iface *)&mbx.payload; mbx_common_req_hdr_init(&fwcmd->hdr, 0, 0, MBX_SUBSYSTEM_COMMON, OPCODE_COMMON_CREATE_IFACE, MBX_TIMEOUT_SEC, sizeof(struct mbx_create_common_iface), OCE_MBX_VER_V0); DW_SWAP(u32ptr(&fwcmd->hdr), sizeof(struct mbx_hdr)); fwcmd->params.req.version = 0; fwcmd->params.req.cap_flags = LE_32(cap_flags); fwcmd->params.req.enable_flags = LE_32(en_flags); if (mac_addr != NULL) { bcopy(mac_addr, &fwcmd->params.req.mac_addr[0], 6); fwcmd->params.req.vlan_tag.u0.normal.vtag = LE_16(vlan_tag); fwcmd->params.req.mac_invalid = 0; } else { fwcmd->params.req.mac_invalid = 1; } mbx.u0.s.embedded = 1; mbx.payload_length = sizeof(struct mbx_create_common_iface); DW_SWAP(u32ptr(&mbx), OCE_BMBX_RHDR_SZ); rc = oce_mbox_post(sc, &mbx, NULL); if (!rc) rc = fwcmd->hdr.u0.rsp.status; if (rc) { device_printf(sc->dev, "%s failed - cmd status: %d addi status: %d\n", __FUNCTION__, rc, fwcmd->hdr.u0.rsp.additional_status); goto error; } *if_id = HOST_32(fwcmd->params.rsp.if_id); if (mac_addr != NULL) sc->pmac_id = HOST_32(fwcmd->params.rsp.pmac_id); error: return rc; } /** * @brief Function to delete an interface * @param sc software handle to the device * @param if_id ID of the interface to delete * @returns 0 on success, EIO on failure */ int oce_if_del(POCE_SOFTC sc, uint32_t if_id) { struct oce_mbx mbx; struct mbx_destroy_common_iface *fwcmd; int rc = 0; bzero(&mbx, sizeof(struct oce_mbx)); fwcmd = (struct mbx_destroy_common_iface *)&mbx.payload; mbx_common_req_hdr_init(&fwcmd->hdr, 0, 0, MBX_SUBSYSTEM_COMMON, OPCODE_COMMON_DESTROY_IFACE, MBX_TIMEOUT_SEC, sizeof(struct mbx_destroy_common_iface), OCE_MBX_VER_V0); fwcmd->params.req.if_id = if_id; mbx.u0.s.embedded = 1; mbx.payload_length = sizeof(struct mbx_destroy_common_iface); DW_SWAP(u32ptr(&mbx), mbx.payload_length + OCE_BMBX_RHDR_SZ); rc = oce_mbox_post(sc, &mbx, NULL); if (!rc) rc = fwcmd->hdr.u0.rsp.status; if (rc) device_printf(sc->dev, "%s failed - cmd status: %d addi status: %d\n", __FUNCTION__, rc, fwcmd->hdr.u0.rsp.additional_status); return rc; } /** * @brief Function to send the mbx command to configure vlan * @param sc software handle to the device * @param if_id interface identifier index * @param vtag_arr array of vlan tags * @param vtag_cnt number of elements in array * @param untagged boolean TRUE/FLASE * @param enable_promisc flag to enable/disable VLAN promiscuous mode * @returns 0 on success, EIO on failure */ int oce_config_vlan(POCE_SOFTC sc, uint32_t if_id, struct normal_vlan *vtag_arr, uint8_t vtag_cnt, uint32_t untagged, uint32_t enable_promisc) { struct oce_mbx mbx; struct mbx_common_config_vlan *fwcmd; int rc = 0; if (sc->vlans_added > sc->max_vlans) goto vlan_promisc; bzero(&mbx, sizeof(struct oce_mbx)); fwcmd = (struct mbx_common_config_vlan *)&mbx.payload; mbx_common_req_hdr_init(&fwcmd->hdr, 0, 0, MBX_SUBSYSTEM_COMMON, OPCODE_COMMON_CONFIG_IFACE_VLAN, MBX_TIMEOUT_SEC, sizeof(struct mbx_common_config_vlan), OCE_MBX_VER_V0); fwcmd->params.req.if_id = (uint8_t) if_id; fwcmd->params.req.promisc = (uint8_t) enable_promisc; fwcmd->params.req.untagged = (uint8_t) untagged; fwcmd->params.req.num_vlans = vtag_cnt; if (!enable_promisc) { bcopy(vtag_arr, fwcmd->params.req.tags.normal_vlans, vtag_cnt * sizeof(struct normal_vlan)); } mbx.u0.s.embedded = 1; mbx.payload_length = sizeof(struct mbx_common_config_vlan); DW_SWAP(u32ptr(&mbx), 
(OCE_BMBX_RHDR_SZ + mbx.payload_length)); rc = oce_mbox_post(sc, &mbx, NULL); if (!rc) rc = fwcmd->hdr.u0.rsp.status; if (rc) device_printf(sc->dev, "%s failed - cmd status: %d addi status: %d\n", __FUNCTION__, rc, fwcmd->hdr.u0.rsp.additional_status); goto done; vlan_promisc: /* Enable Vlan Promisc */ oce_rxf_set_promiscuous(sc, (1 << 1)); device_printf(sc->dev,"Enabling Vlan Promisc Mode\n"); done: return rc; } /** * @brief Function to set flow control capability in the hardware * @param sc software handle to the device * @param flow_control flow control flags to set * @returns 0 on success, EIO on failure */ int oce_set_flow_control(POCE_SOFTC sc, uint32_t flow_control) { struct oce_mbx mbx; struct mbx_common_get_set_flow_control *fwcmd = (struct mbx_common_get_set_flow_control *)&mbx.payload; int rc; bzero(&mbx, sizeof(struct oce_mbx)); mbx_common_req_hdr_init(&fwcmd->hdr, 0, 0, MBX_SUBSYSTEM_COMMON, OPCODE_COMMON_SET_FLOW_CONTROL, MBX_TIMEOUT_SEC, sizeof(struct mbx_common_get_set_flow_control), OCE_MBX_VER_V0); if (flow_control & OCE_FC_TX) fwcmd->tx_flow_control = 1; if (flow_control & OCE_FC_RX) fwcmd->rx_flow_control = 1; mbx.u0.s.embedded = 1; mbx.payload_length = sizeof(struct mbx_common_get_set_flow_control); DW_SWAP(u32ptr(&mbx), mbx.payload_length + OCE_BMBX_RHDR_SZ); rc = oce_mbox_post(sc, &mbx, NULL); if (!rc) rc = fwcmd->hdr.u0.rsp.status; if (rc) device_printf(sc->dev, "%s failed - cmd status: %d addi status: %d\n", __FUNCTION__, rc, fwcmd->hdr.u0.rsp.additional_status); return rc; } /** * @brief Initialize the RSS CPU indirection table * * The table is used to choose the queue to place the incoming packets. * Incoming packets are hashed. The lowest bits in the hash result * are used as the index into the CPU indirection table. * Each entry in the table contains the RSS CPU-ID returned by the NIC * create. Based on the CPU ID, the receive completion is routed to * the corresponding RSS CQs. (Non-RSS packets are always completed * on the default (0) CQ).
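 *
 * For illustration (hypothetical values): with three RSS queues whose
 * rss_cpuid values are 0, 1 and 2, the table is filled round-robin as
 * 0, 1, 2, 0, 1, 2, ... for all INDIRECTION_TABLE_ENTRIES slots, so a
 * packet whose hash bits select slot 4 is completed on the CQ of the
 * queue with RSS CPU-ID 1.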
* * @param sc software handle to the device * @param *fwcmd pointer to the rss mbox command * @returns none */ static int oce_rss_itbl_init(POCE_SOFTC sc, struct mbx_config_nic_rss *fwcmd) { int i = 0, j = 0, rc = 0; uint8_t *tbl = fwcmd->params.req.cputable; struct oce_rq *rq = NULL; for (j = 0; j < INDIRECTION_TABLE_ENTRIES ; j += (sc->nrqs - 1)) { for_all_rss_queues(sc, rq, i) { if ((j + i) >= INDIRECTION_TABLE_ENTRIES) break; tbl[j + i] = rq->rss_cpuid; } } if (i == 0) { device_printf(sc->dev, "error: Invalid number of RSS RQ's\n"); rc = ENXIO; } /* fill log2 value indicating the size of the CPU table */ if (rc == 0) fwcmd->params.req.cpu_tbl_sz_log2 = LE_16(OCE_LOG2(INDIRECTION_TABLE_ENTRIES)); return rc; } /** * @brief Function to set flow control capability in the hardware * @param sc software handle to the device * @param if_id interface id to read the address from * @param enable_rss 0=disable, RSS_ENABLE_xxx flags otherwise * @returns 0 on success, EIO on failure */ int oce_config_nic_rss(POCE_SOFTC sc, uint32_t if_id, uint16_t enable_rss) { int rc; struct oce_mbx mbx; struct mbx_config_nic_rss *fwcmd = (struct mbx_config_nic_rss *)&mbx.payload; int version; bzero(&mbx, sizeof(struct oce_mbx)); if (IS_XE201(sc) || IS_SH(sc)) { version = OCE_MBX_VER_V1; fwcmd->params.req.enable_rss = RSS_ENABLE_UDP_IPV4 | RSS_ENABLE_UDP_IPV6; } else version = OCE_MBX_VER_V0; mbx_common_req_hdr_init(&fwcmd->hdr, 0, 0, MBX_SUBSYSTEM_NIC, NIC_CONFIG_RSS, MBX_TIMEOUT_SEC, sizeof(struct mbx_config_nic_rss), version); if (enable_rss) fwcmd->params.req.enable_rss |= (RSS_ENABLE_IPV4 | RSS_ENABLE_TCP_IPV4 | RSS_ENABLE_IPV6 | RSS_ENABLE_TCP_IPV6); if(!sc->enable_hwlro) fwcmd->params.req.flush = OCE_FLUSH; else fwcmd->params.req.flush = 0; fwcmd->params.req.if_id = LE_32(if_id); - srandom(arc4random()); /* random entropy seed */ read_random(fwcmd->params.req.hash, sizeof(fwcmd->params.req.hash)); rc = oce_rss_itbl_init(sc, fwcmd); if (rc == 0) { mbx.u0.s.embedded = 1; mbx.payload_length = sizeof(struct mbx_config_nic_rss); DW_SWAP(u32ptr(&mbx), mbx.payload_length + OCE_BMBX_RHDR_SZ); rc = oce_mbox_post(sc, &mbx, NULL); if (!rc) rc = fwcmd->hdr.u0.rsp.status; if (rc) device_printf(sc->dev, "%s failed - cmd status: %d addi status: %d\n", __FUNCTION__, rc, fwcmd->hdr.u0.rsp.additional_status); } return rc; } /** * @brief RXF function to enable/disable device promiscuous mode * @param sc software handle to the device * @param enable enable/disable flag * @returns 0 on success, EIO on failure * @note * The NIC_CONFIG_PROMISCUOUS command deprecated for Lancer. * This function uses the COMMON_SET_IFACE_RX_FILTER command instead. 
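 *
 * A minimal usage sketch (illustrative only, not part of this file):
 * bit 0 of "enable" requests MAC promiscuous mode and bit 1 requests
 * VLAN promiscuous mode, so
 *
 *	(void)oce_rxf_set_promiscuous(sc, 0x01 | 0x02);
 *
 * asks the firmware to disable both the MAC and the VLAN receive
 * filters on the interface.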
*/ int oce_rxf_set_promiscuous(POCE_SOFTC sc, uint8_t enable) { struct mbx_set_common_iface_rx_filter *fwcmd; int sz = sizeof(struct mbx_set_common_iface_rx_filter); iface_rx_filter_ctx_t *req; OCE_DMA_MEM sgl; int rc; /* allocate mbx payload's dma scatter/gather memory */ rc = oce_dma_alloc(sc, sz, &sgl, 0); if (rc) return rc; fwcmd = OCE_DMAPTR(&sgl, struct mbx_set_common_iface_rx_filter); req = &fwcmd->params.req; req->iface_flags_mask = MBX_RX_IFACE_FLAGS_PROMISCUOUS | MBX_RX_IFACE_FLAGS_VLAN_PROMISCUOUS; /* Bit 0 Mac promisc, Bit 1 Vlan promisc */ if (enable & 0x01) req->iface_flags = MBX_RX_IFACE_FLAGS_PROMISCUOUS; if (enable & 0x02) req->iface_flags |= MBX_RX_IFACE_FLAGS_VLAN_PROMISCUOUS; req->if_id = sc->if_id; rc = oce_set_common_iface_rx_filter(sc, &sgl); oce_dma_free(sc, &sgl); return rc; } /** * @brief Function modify and select rx filter options * @param sc software handle to the device * @param sgl scatter/gather request/response * @returns 0 on success, error code on failure */ int oce_set_common_iface_rx_filter(POCE_SOFTC sc, POCE_DMA_MEM sgl) { struct oce_mbx mbx; int mbx_sz = sizeof(struct mbx_set_common_iface_rx_filter); struct mbx_set_common_iface_rx_filter *fwcmd; int rc; bzero(&mbx, sizeof(struct oce_mbx)); fwcmd = OCE_DMAPTR(sgl, struct mbx_set_common_iface_rx_filter); mbx_common_req_hdr_init(&fwcmd->hdr, 0, 0, MBX_SUBSYSTEM_COMMON, OPCODE_COMMON_SET_IFACE_RX_FILTER, MBX_TIMEOUT_SEC, mbx_sz, OCE_MBX_VER_V0); oce_dma_sync(sgl, BUS_DMASYNC_PREWRITE); mbx.u0.s.embedded = 0; mbx.u0.s.sge_count = 1; mbx.payload.u0.u1.sgl[0].pa_lo = ADDR_LO(sgl->paddr); mbx.payload.u0.u1.sgl[0].pa_hi = ADDR_HI(sgl->paddr); mbx.payload.u0.u1.sgl[0].length = mbx_sz; mbx.payload_length = mbx_sz; DW_SWAP(u32ptr(&mbx), mbx.payload_length + OCE_BMBX_RHDR_SZ); rc = oce_mbox_post(sc, &mbx, NULL); if (!rc) rc = fwcmd->hdr.u0.rsp.status; if (rc) device_printf(sc->dev, "%s failed - cmd status: %d addi status: %d\n", __FUNCTION__, rc, fwcmd->hdr.u0.rsp.additional_status); return rc; } /** * @brief Function to query the link status from the hardware * @param sc software handle to the device * @param[out] link pointer to the structure returning link attributes * @returns 0 on success, EIO on failure */ int oce_get_link_status(POCE_SOFTC sc, struct link_status *link) { struct oce_mbx mbx; struct mbx_query_common_link_config *fwcmd; int rc = 0, version; bzero(&mbx, sizeof(struct oce_mbx)); IS_BE2(sc) ? 
(version = OCE_MBX_VER_V0) : (version = OCE_MBX_VER_V1); fwcmd = (struct mbx_query_common_link_config *)&mbx.payload; mbx_common_req_hdr_init(&fwcmd->hdr, 0, 0, MBX_SUBSYSTEM_COMMON, OPCODE_COMMON_QUERY_LINK_CONFIG, MBX_TIMEOUT_SEC, sizeof(struct mbx_query_common_link_config), version); mbx.u0.s.embedded = 1; mbx.payload_length = sizeof(struct mbx_query_common_link_config); DW_SWAP(u32ptr(&mbx), mbx.payload_length + OCE_BMBX_RHDR_SZ); rc = oce_mbox_post(sc, &mbx, NULL); if (!rc) rc = fwcmd->hdr.u0.rsp.status; if (rc) { device_printf(sc->dev, "%s failed - cmd status: %d addi status: %d\n", __FUNCTION__, rc, fwcmd->hdr.u0.rsp.additional_status); goto error; } /* interpret response */ link->qos_link_speed = HOST_16(fwcmd->params.rsp.qos_link_speed); link->phys_port_speed = fwcmd->params.rsp.physical_port_speed; link->logical_link_status = fwcmd->params.rsp.logical_link_status; error: return rc; } /** * @brief Function to get NIC statistics * @param sc software handle to the device * @param *stats pointer to where to store statistics * @param reset_stats resets statistics of set * @returns 0 on success, EIO on failure * @note command depricated in Lancer */ #define OCE_MBOX_GET_NIC_STATS(sc, pstats_dma_mem, version) \ int \ oce_mbox_get_nic_stats_v##version(POCE_SOFTC sc, POCE_DMA_MEM pstats_dma_mem) \ { \ struct oce_mbx mbx; \ struct mbx_get_nic_stats_v##version *fwcmd; \ int rc = 0; \ \ bzero(&mbx, sizeof(struct oce_mbx)); \ fwcmd = OCE_DMAPTR(pstats_dma_mem, struct mbx_get_nic_stats_v##version); \ bzero(fwcmd, sizeof(*fwcmd)); \ \ mbx_common_req_hdr_init(&fwcmd->hdr, 0, 0, \ MBX_SUBSYSTEM_NIC, \ NIC_GET_STATS, \ MBX_TIMEOUT_SEC, \ sizeof(*fwcmd), \ OCE_MBX_VER_V##version); \ \ mbx.u0.s.embedded = 0; /* stats too large for embedded mbx rsp */ \ mbx.u0.s.sge_count = 1; /* using scatter gather instead */ \ \ oce_dma_sync(pstats_dma_mem, BUS_DMASYNC_PREWRITE); \ mbx.payload.u0.u1.sgl[0].pa_lo = ADDR_LO(pstats_dma_mem->paddr); \ mbx.payload.u0.u1.sgl[0].pa_hi = ADDR_HI(pstats_dma_mem->paddr); \ mbx.payload.u0.u1.sgl[0].length = sizeof(*fwcmd); \ mbx.payload_length = sizeof(*fwcmd); \ DW_SWAP(u32ptr(&mbx), mbx.payload_length + OCE_BMBX_RHDR_SZ); \ \ rc = oce_mbox_post(sc, &mbx, NULL); \ oce_dma_sync(pstats_dma_mem, BUS_DMASYNC_POSTWRITE); \ if (!rc) \ rc = fwcmd->hdr.u0.rsp.status; \ if (rc) \ device_printf(sc->dev, \ "%s failed - cmd status: %d addi status: %d\n", \ __FUNCTION__, rc, \ fwcmd->hdr.u0.rsp.additional_status); \ return rc; \ } OCE_MBOX_GET_NIC_STATS(sc, pstats_dma_mem, 0); OCE_MBOX_GET_NIC_STATS(sc, pstats_dma_mem, 1); OCE_MBOX_GET_NIC_STATS(sc, pstats_dma_mem, 2); /** * @brief Function to get pport (physical port) statistics * @param sc software handle to the device * @param *stats pointer to where to store statistics * @param reset_stats resets statistics of set * @returns 0 on success, EIO on failure */ int oce_mbox_get_pport_stats(POCE_SOFTC sc, POCE_DMA_MEM pstats_dma_mem, uint32_t reset_stats) { struct oce_mbx mbx; struct mbx_get_pport_stats *fwcmd; int rc = 0; bzero(&mbx, sizeof(struct oce_mbx)); fwcmd = OCE_DMAPTR(pstats_dma_mem, struct mbx_get_pport_stats); bzero(fwcmd, sizeof(struct mbx_get_pport_stats)); mbx_common_req_hdr_init(&fwcmd->hdr, 0, 0, MBX_SUBSYSTEM_NIC, NIC_GET_PPORT_STATS, MBX_TIMEOUT_SEC, sizeof(struct mbx_get_pport_stats), OCE_MBX_VER_V0); fwcmd->params.req.reset_stats = reset_stats; fwcmd->params.req.port_number = sc->port_id; mbx.u0.s.embedded = 0; /* stats too large for embedded mbx rsp */ mbx.u0.s.sge_count = 1; /* using scatter gather instead */ 
oce_dma_sync(pstats_dma_mem, BUS_DMASYNC_PREWRITE); mbx.payload.u0.u1.sgl[0].pa_lo = ADDR_LO(pstats_dma_mem->paddr); mbx.payload.u0.u1.sgl[0].pa_hi = ADDR_HI(pstats_dma_mem->paddr); mbx.payload.u0.u1.sgl[0].length = sizeof(struct mbx_get_pport_stats); mbx.payload_length = sizeof(struct mbx_get_pport_stats); DW_SWAP(u32ptr(&mbx), mbx.payload_length + OCE_BMBX_RHDR_SZ); rc = oce_mbox_post(sc, &mbx, NULL); oce_dma_sync(pstats_dma_mem, BUS_DMASYNC_POSTWRITE); if (!rc) rc = fwcmd->hdr.u0.rsp.status; if (rc) device_printf(sc->dev, "%s failed - cmd status: %d addi status: %d\n", __FUNCTION__, rc, fwcmd->hdr.u0.rsp.additional_status); return rc; } /** * @brief Function to get vport (virtual port) statistics * @param sc software handle to the device * @param *stats pointer to where to store statistics * @param reset_stats resets statistics of set * @returns 0 on success, EIO on failure */ int oce_mbox_get_vport_stats(POCE_SOFTC sc, POCE_DMA_MEM pstats_dma_mem, uint32_t req_size, uint32_t reset_stats) { struct oce_mbx mbx; struct mbx_get_vport_stats *fwcmd; int rc = 0; bzero(&mbx, sizeof(struct oce_mbx)); fwcmd = OCE_DMAPTR(pstats_dma_mem, struct mbx_get_vport_stats); bzero(fwcmd, sizeof(struct mbx_get_vport_stats)); mbx_common_req_hdr_init(&fwcmd->hdr, 0, 0, MBX_SUBSYSTEM_NIC, NIC_GET_VPORT_STATS, MBX_TIMEOUT_SEC, sizeof(struct mbx_get_vport_stats), OCE_MBX_VER_V0); fwcmd->params.req.reset_stats = reset_stats; fwcmd->params.req.vport_number = sc->if_id; mbx.u0.s.embedded = 0; /* stats too large for embedded mbx rsp */ mbx.u0.s.sge_count = 1; /* using scatter gather instead */ oce_dma_sync(pstats_dma_mem, BUS_DMASYNC_PREWRITE); mbx.payload.u0.u1.sgl[0].pa_lo = ADDR_LO(pstats_dma_mem->paddr); mbx.payload.u0.u1.sgl[0].pa_hi = ADDR_HI(pstats_dma_mem->paddr); mbx.payload.u0.u1.sgl[0].length = sizeof(struct mbx_get_vport_stats); mbx.payload_length = sizeof(struct mbx_get_vport_stats); DW_SWAP(u32ptr(&mbx), mbx.payload_length + OCE_BMBX_RHDR_SZ); rc = oce_mbox_post(sc, &mbx, NULL); oce_dma_sync(pstats_dma_mem, BUS_DMASYNC_POSTWRITE); if (!rc) rc = fwcmd->hdr.u0.rsp.status; if (rc) device_printf(sc->dev, "%s failed - cmd status: %d addi status: %d\n", __FUNCTION__, rc, fwcmd->hdr.u0.rsp.additional_status); return rc; } /** * @brief Function to update the muticast filter with * values in dma_mem * @param sc software handle to the device * @param dma_mem pointer to dma memory region * @returns 0 on success, EIO on failure */ int oce_update_multicast(POCE_SOFTC sc, POCE_DMA_MEM pdma_mem) { struct oce_mbx mbx; struct oce_mq_sge *sgl; struct mbx_set_common_iface_multicast *req = NULL; int rc = 0; req = OCE_DMAPTR(pdma_mem, struct mbx_set_common_iface_multicast); mbx_common_req_hdr_init(&req->hdr, 0, 0, MBX_SUBSYSTEM_COMMON, OPCODE_COMMON_SET_IFACE_MULTICAST, MBX_TIMEOUT_SEC, sizeof(struct mbx_set_common_iface_multicast), OCE_MBX_VER_V0); bzero(&mbx, sizeof(struct oce_mbx)); mbx.u0.s.embedded = 0; /*Non embeded*/ mbx.payload_length = sizeof(struct mbx_set_common_iface_multicast); mbx.u0.s.sge_count = 1; sgl = &mbx.payload.u0.u1.sgl[0]; sgl->pa_hi = htole32(upper_32_bits(pdma_mem->paddr)); sgl->pa_lo = htole32((pdma_mem->paddr) & 0xFFFFFFFF); sgl->length = htole32(mbx.payload_length); DW_SWAP(u32ptr(&mbx), mbx.payload_length + OCE_BMBX_RHDR_SZ); rc = oce_mbox_post(sc, &mbx, NULL); if (!rc) rc = req->hdr.u0.rsp.status; if (rc) device_printf(sc->dev, "%s failed - cmd status: %d addi status: %d\n", __FUNCTION__, rc, req->hdr.u0.rsp.additional_status); return rc; } /** * @brief Function to send passthrough Ioctls * 
@param sc software handle to the device * @param dma_mem pointer to dma memory region * @param req_size size of dma_mem * @returns 0 on success, EIO on failure */ int oce_pass_through_mbox(POCE_SOFTC sc, POCE_DMA_MEM dma_mem, uint32_t req_size) { struct oce_mbx mbx; struct oce_mq_sge *sgl; int rc = 0; bzero(&mbx, sizeof(struct oce_mbx)); mbx.u0.s.embedded = 0; /*Non embeded*/ mbx.payload_length = req_size; mbx.u0.s.sge_count = 1; sgl = &mbx.payload.u0.u1.sgl[0]; sgl->pa_hi = htole32(upper_32_bits(dma_mem->paddr)); sgl->pa_lo = htole32((dma_mem->paddr) & 0xFFFFFFFF); sgl->length = htole32(req_size); DW_SWAP(u32ptr(&mbx), mbx.payload_length + OCE_BMBX_RHDR_SZ); rc = oce_mbox_post(sc, &mbx, NULL); return rc; } int oce_mbox_macaddr_add(POCE_SOFTC sc, uint8_t *mac_addr, uint32_t if_id, uint32_t *pmac_id) { struct oce_mbx mbx; struct mbx_add_common_iface_mac *fwcmd; int rc = 0; bzero(&mbx, sizeof(struct oce_mbx)); fwcmd = (struct mbx_add_common_iface_mac *)&mbx.payload; mbx_common_req_hdr_init(&fwcmd->hdr, 0, 0, MBX_SUBSYSTEM_COMMON, OPCODE_COMMON_ADD_IFACE_MAC, MBX_TIMEOUT_SEC, sizeof(struct mbx_add_common_iface_mac), OCE_MBX_VER_V0); fwcmd->params.req.if_id = (uint16_t) if_id; bcopy(mac_addr, fwcmd->params.req.mac_address, 6); mbx.u0.s.embedded = 1; mbx.payload_length = sizeof(struct mbx_add_common_iface_mac); DW_SWAP(u32ptr(&mbx), mbx.payload_length + OCE_BMBX_RHDR_SZ); rc = oce_mbox_post(sc, &mbx, NULL); if (!rc) rc = fwcmd->hdr.u0.rsp.status; if (rc) { device_printf(sc->dev, "%s failed - cmd status: %d addi status: %d\n", __FUNCTION__, rc, fwcmd->hdr.u0.rsp.additional_status); goto error; } *pmac_id = fwcmd->params.rsp.pmac_id; error: return rc; } int oce_mbox_macaddr_del(POCE_SOFTC sc, uint32_t if_id, uint32_t pmac_id) { struct oce_mbx mbx; struct mbx_del_common_iface_mac *fwcmd; int rc = 0; bzero(&mbx, sizeof(struct oce_mbx)); fwcmd = (struct mbx_del_common_iface_mac *)&mbx.payload; mbx_common_req_hdr_init(&fwcmd->hdr, 0, 0, MBX_SUBSYSTEM_COMMON, OPCODE_COMMON_DEL_IFACE_MAC, MBX_TIMEOUT_SEC, sizeof(struct mbx_del_common_iface_mac), OCE_MBX_VER_V0); fwcmd->params.req.if_id = (uint16_t)if_id; fwcmd->params.req.pmac_id = pmac_id; mbx.u0.s.embedded = 1; mbx.payload_length = sizeof(struct mbx_del_common_iface_mac); DW_SWAP(u32ptr(&mbx), mbx.payload_length + OCE_BMBX_RHDR_SZ); rc = oce_mbox_post(sc, &mbx, NULL); if (!rc) rc = fwcmd->hdr.u0.rsp.status; if (rc) device_printf(sc->dev, "%s failed - cmd status: %d addi status: %d\n", __FUNCTION__, rc, fwcmd->hdr.u0.rsp.additional_status); return rc; } int oce_mbox_check_native_mode(POCE_SOFTC sc) { struct oce_mbx mbx; struct mbx_common_set_function_cap *fwcmd; int rc = 0; bzero(&mbx, sizeof(struct oce_mbx)); fwcmd = (struct mbx_common_set_function_cap *)&mbx.payload; mbx_common_req_hdr_init(&fwcmd->hdr, 0, 0, MBX_SUBSYSTEM_COMMON, OPCODE_COMMON_SET_FUNCTIONAL_CAPS, MBX_TIMEOUT_SEC, sizeof(struct mbx_common_set_function_cap), OCE_MBX_VER_V0); fwcmd->params.req.valid_capability_flags = CAP_SW_TIMESTAMPS | CAP_BE3_NATIVE_ERX_API; fwcmd->params.req.capability_flags = CAP_BE3_NATIVE_ERX_API; mbx.u0.s.embedded = 1; mbx.payload_length = sizeof(struct mbx_common_set_function_cap); DW_SWAP(u32ptr(&mbx), mbx.payload_length + OCE_BMBX_RHDR_SZ); rc = oce_mbox_post(sc, &mbx, NULL); if (!rc) rc = fwcmd->hdr.u0.rsp.status; if (rc) { device_printf(sc->dev, "%s failed - cmd status: %d addi status: %d\n", __FUNCTION__, rc, fwcmd->hdr.u0.rsp.additional_status); goto error; } sc->be3_native = HOST_32(fwcmd->params.rsp.capability_flags) & CAP_BE3_NATIVE_ERX_API; error: 
return 0; } int oce_mbox_cmd_set_loopback(POCE_SOFTC sc, uint8_t port_num, uint8_t loopback_type, uint8_t enable) { struct oce_mbx mbx; struct mbx_lowlevel_set_loopback_mode *fwcmd; int rc = 0; bzero(&mbx, sizeof(struct oce_mbx)); fwcmd = (struct mbx_lowlevel_set_loopback_mode *)&mbx.payload; mbx_common_req_hdr_init(&fwcmd->hdr, 0, 0, MBX_SUBSYSTEM_LOWLEVEL, OPCODE_LOWLEVEL_SET_LOOPBACK_MODE, MBX_TIMEOUT_SEC, sizeof(struct mbx_lowlevel_set_loopback_mode), OCE_MBX_VER_V0); fwcmd->params.req.src_port = port_num; fwcmd->params.req.dest_port = port_num; fwcmd->params.req.loopback_type = loopback_type; fwcmd->params.req.loopback_state = enable; mbx.u0.s.embedded = 1; mbx.payload_length = sizeof(struct mbx_lowlevel_set_loopback_mode); DW_SWAP(u32ptr(&mbx), mbx.payload_length + OCE_BMBX_RHDR_SZ); rc = oce_mbox_post(sc, &mbx, NULL); if (!rc) rc = fwcmd->hdr.u0.rsp.status; if (rc) device_printf(sc->dev, "%s failed - cmd status: %d addi status: %d\n", __FUNCTION__, rc, fwcmd->hdr.u0.rsp.additional_status); return rc; } int oce_mbox_cmd_test_loopback(POCE_SOFTC sc, uint32_t port_num, uint32_t loopback_type, uint32_t pkt_size, uint32_t num_pkts, uint64_t pattern) { struct oce_mbx mbx; struct mbx_lowlevel_test_loopback_mode *fwcmd; int rc = 0; bzero(&mbx, sizeof(struct oce_mbx)); fwcmd = (struct mbx_lowlevel_test_loopback_mode *)&mbx.payload; mbx_common_req_hdr_init(&fwcmd->hdr, 0, 0, MBX_SUBSYSTEM_LOWLEVEL, OPCODE_LOWLEVEL_TEST_LOOPBACK, MBX_TIMEOUT_SEC, sizeof(struct mbx_lowlevel_test_loopback_mode), OCE_MBX_VER_V0); fwcmd->params.req.pattern = pattern; fwcmd->params.req.src_port = port_num; fwcmd->params.req.dest_port = port_num; fwcmd->params.req.pkt_size = pkt_size; fwcmd->params.req.num_pkts = num_pkts; fwcmd->params.req.loopback_type = loopback_type; mbx.u0.s.embedded = 1; mbx.payload_length = sizeof(struct mbx_lowlevel_test_loopback_mode); DW_SWAP(u32ptr(&mbx), mbx.payload_length + OCE_BMBX_RHDR_SZ); rc = oce_mbox_post(sc, &mbx, NULL); if (!rc) rc = fwcmd->hdr.u0.rsp.status; if (rc) device_printf(sc->dev, "%s failed - cmd status: %d addi status: %d\n", __FUNCTION__, rc, fwcmd->hdr.u0.rsp.additional_status); return rc; } int oce_mbox_write_flashrom(POCE_SOFTC sc, uint32_t optype,uint32_t opcode, POCE_DMA_MEM pdma_mem, uint32_t num_bytes) { struct oce_mbx mbx; struct oce_mq_sge *sgl = NULL; struct mbx_common_read_write_flashrom *fwcmd = NULL; int rc = 0, payload_len = 0; bzero(&mbx, sizeof(struct oce_mbx)); fwcmd = OCE_DMAPTR(pdma_mem, struct mbx_common_read_write_flashrom); payload_len = sizeof(struct mbx_common_read_write_flashrom) + 32*1024; mbx_common_req_hdr_init(&fwcmd->hdr, 0, 0, MBX_SUBSYSTEM_COMMON, OPCODE_COMMON_WRITE_FLASHROM, LONG_TIMEOUT, payload_len, OCE_MBX_VER_V0); fwcmd->flash_op_type = LE_32(optype); fwcmd->flash_op_code = LE_32(opcode); fwcmd->data_buffer_size = LE_32(num_bytes); mbx.u0.s.embedded = 0; /*Non embeded*/ mbx.payload_length = payload_len; mbx.u0.s.sge_count = 1; sgl = &mbx.payload.u0.u1.sgl[0]; sgl->pa_hi = upper_32_bits(pdma_mem->paddr); sgl->pa_lo = pdma_mem->paddr & 0xFFFFFFFF; sgl->length = payload_len; /* post the command */ rc = oce_mbox_post(sc, &mbx, NULL); if (!rc) rc = fwcmd->hdr.u0.rsp.status; if (rc) device_printf(sc->dev, "%s failed - cmd status: %d addi status: %d\n", __FUNCTION__, rc, fwcmd->hdr.u0.rsp.additional_status); return rc; } int oce_mbox_get_flashrom_crc(POCE_SOFTC sc, uint8_t *flash_crc, uint32_t offset, uint32_t optype) { int rc = 0, payload_len = 0; struct oce_mbx mbx; struct mbx_common_read_write_flashrom *fwcmd; bzero(&mbx, 
sizeof(struct oce_mbx)); fwcmd = (struct mbx_common_read_write_flashrom *)&mbx.payload; /* Firmware requires extra 4 bytes with this ioctl. Since there is enough room in the mbx payload it should be good enough Reference: Bug 14853 */ payload_len = sizeof(struct mbx_common_read_write_flashrom) + 4; mbx_common_req_hdr_init(&fwcmd->hdr, 0, 0, MBX_SUBSYSTEM_COMMON, OPCODE_COMMON_READ_FLASHROM, MBX_TIMEOUT_SEC, payload_len, OCE_MBX_VER_V0); fwcmd->flash_op_type = optype; fwcmd->flash_op_code = FLASHROM_OPER_REPORT; fwcmd->data_offset = offset; fwcmd->data_buffer_size = 0x4; mbx.u0.s.embedded = 1; mbx.payload_length = payload_len; /* post the command */ rc = oce_mbox_post(sc, &mbx, NULL); if (!rc) rc = fwcmd->hdr.u0.rsp.status; if (rc) { device_printf(sc->dev, "%s failed - cmd status: %d addi status: %d\n", __FUNCTION__, rc, fwcmd->hdr.u0.rsp.additional_status); goto error; } bcopy(fwcmd->data_buffer, flash_crc, 4); error: return rc; } int oce_mbox_get_phy_info(POCE_SOFTC sc, struct oce_phy_info *phy_info) { struct oce_mbx mbx; struct mbx_common_phy_info *fwcmd; int rc = 0; bzero(&mbx, sizeof(struct oce_mbx)); fwcmd = (struct mbx_common_phy_info *)&mbx.payload; mbx_common_req_hdr_init(&fwcmd->hdr, 0, 0, MBX_SUBSYSTEM_COMMON, OPCODE_COMMON_GET_PHY_CONFIG, MBX_TIMEOUT_SEC, sizeof(struct mbx_common_phy_info), OCE_MBX_VER_V0); mbx.u0.s.embedded = 1; mbx.payload_length = sizeof(struct mbx_common_phy_info); /* now post the command */ rc = oce_mbox_post(sc, &mbx, NULL); if (!rc) rc = fwcmd->hdr.u0.rsp.status; if (rc) { device_printf(sc->dev, "%s failed - cmd status: %d addi status: %d\n", __FUNCTION__, rc, fwcmd->hdr.u0.rsp.additional_status); goto error; } phy_info->phy_type = HOST_16(fwcmd->params.rsp.phy_info.phy_type); phy_info->interface_type = HOST_16(fwcmd->params.rsp.phy_info.interface_type); phy_info->auto_speeds_supported = HOST_16(fwcmd->params.rsp.phy_info.auto_speeds_supported); phy_info->fixed_speeds_supported = HOST_16(fwcmd->params.rsp.phy_info.fixed_speeds_supported); phy_info->misc_params = HOST_32(fwcmd->params.rsp.phy_info.misc_params); error: return rc; } int oce_mbox_lancer_write_flashrom(POCE_SOFTC sc, uint32_t data_size, uint32_t data_offset, POCE_DMA_MEM pdma_mem, uint32_t *written_data, uint32_t *additional_status) { struct oce_mbx mbx; struct mbx_lancer_common_write_object *fwcmd = NULL; int rc = 0, payload_len = 0; bzero(&mbx, sizeof(struct oce_mbx)); payload_len = sizeof(struct mbx_lancer_common_write_object); mbx.u0.s.embedded = 1;/* Embedded */ mbx.payload_length = payload_len; fwcmd = (struct mbx_lancer_common_write_object *)&mbx.payload; /* initialize the ioctl header */ mbx_common_req_hdr_init(&fwcmd->params.req.hdr, 0, 0, MBX_SUBSYSTEM_COMMON, OPCODE_COMMON_WRITE_OBJECT, LONG_TIMEOUT, payload_len, OCE_MBX_VER_V0); fwcmd->params.req.write_length = data_size; if (data_size == 0) fwcmd->params.req.eof = 1; else fwcmd->params.req.eof = 0; strcpy(fwcmd->params.req.object_name, "/prg"); fwcmd->params.req.descriptor_count = 1; fwcmd->params.req.write_offset = data_offset; fwcmd->params.req.buffer_length = data_size; fwcmd->params.req.address_lower = pdma_mem->paddr & 0xFFFFFFFF; fwcmd->params.req.address_upper = upper_32_bits(pdma_mem->paddr); /* post the command */ rc = oce_mbox_post(sc, &mbx, NULL); if (!rc) rc = fwcmd->params.rsp.status; if (rc) { device_printf(sc->dev, "%s failed - cmd status: %d addi status: %d\n", __FUNCTION__, rc, fwcmd->params.rsp.additional_status); goto error; } *written_data = HOST_32(fwcmd->params.rsp.actual_write_length); *additional_status = 
fwcmd->params.rsp.additional_status; error: return rc; } int oce_mbox_create_rq(struct oce_rq *rq) { struct oce_mbx mbx; struct mbx_create_nic_rq *fwcmd; POCE_SOFTC sc = rq->parent; int rc, num_pages = 0; if (rq->qstate == QCREATED) return 0; bzero(&mbx, sizeof(struct oce_mbx)); fwcmd = (struct mbx_create_nic_rq *)&mbx.payload; mbx_common_req_hdr_init(&fwcmd->hdr, 0, 0, MBX_SUBSYSTEM_NIC, NIC_CREATE_RQ, MBX_TIMEOUT_SEC, sizeof(struct mbx_create_nic_rq), OCE_MBX_VER_V0); /* oce_page_list will also prepare pages */ num_pages = oce_page_list(rq->ring, &fwcmd->params.req.pages[0]); if (IS_XE201(sc)) { fwcmd->params.req.frag_size = rq->cfg.frag_size/2048; fwcmd->params.req.page_size = 1; fwcmd->hdr.u0.req.version = OCE_MBX_VER_V1; } else fwcmd->params.req.frag_size = OCE_LOG2(rq->cfg.frag_size); fwcmd->params.req.num_pages = num_pages; fwcmd->params.req.cq_id = rq->cq->cq_id; fwcmd->params.req.if_id = sc->if_id; fwcmd->params.req.max_frame_size = rq->cfg.mtu; fwcmd->params.req.is_rss_queue = rq->cfg.is_rss_queue; mbx.u0.s.embedded = 1; mbx.payload_length = sizeof(struct mbx_create_nic_rq); rc = oce_mbox_post(sc, &mbx, NULL); if (!rc) rc = fwcmd->hdr.u0.rsp.status; if (rc) { device_printf(sc->dev, "%s failed - cmd status: %d addi status: %d\n", __FUNCTION__, rc, fwcmd->hdr.u0.rsp.additional_status); goto error; } rq->rq_id = HOST_16(fwcmd->params.rsp.rq_id); rq->rss_cpuid = fwcmd->params.rsp.rss_cpuid; error: return rc; } int oce_mbox_create_wq(struct oce_wq *wq) { struct oce_mbx mbx; struct mbx_create_nic_wq *fwcmd; POCE_SOFTC sc = wq->parent; int rc = 0, version, num_pages; bzero(&mbx, sizeof(struct oce_mbx)); fwcmd = (struct mbx_create_nic_wq *)&mbx.payload; if (IS_XE201(sc)) version = OCE_MBX_VER_V1; else if(IS_BE(sc)) IS_PROFILE_SUPER_NIC(sc) ? (version = OCE_MBX_VER_V2) : (version = OCE_MBX_VER_V0); else version = OCE_MBX_VER_V2; if (version > OCE_MBX_VER_V0) fwcmd->params.req.if_id = sc->if_id; mbx_common_req_hdr_init(&fwcmd->hdr, 0, 0, MBX_SUBSYSTEM_NIC, NIC_CREATE_WQ, MBX_TIMEOUT_SEC, sizeof(struct mbx_create_nic_wq), version); num_pages = oce_page_list(wq->ring, &fwcmd->params.req.pages[0]); fwcmd->params.req.nic_wq_type = wq->cfg.wq_type; fwcmd->params.req.num_pages = num_pages; fwcmd->params.req.wq_size = OCE_LOG2(wq->cfg.q_len) + 1; fwcmd->params.req.cq_id = wq->cq->cq_id; fwcmd->params.req.ulp_num = 1; mbx.u0.s.embedded = 1; mbx.payload_length = sizeof(struct mbx_create_nic_wq); rc = oce_mbox_post(sc, &mbx, NULL); if (!rc) rc = fwcmd->hdr.u0.rsp.status; if (rc) { device_printf(sc->dev, "%s failed - cmd status: %d addi status: %d\n", __FUNCTION__, rc, fwcmd->hdr.u0.rsp.additional_status); goto error; } wq->wq_id = HOST_16(fwcmd->params.rsp.wq_id); if (version == OCE_MBX_VER_V2) wq->db_offset = HOST_32(fwcmd->params.rsp.db_offset); else wq->db_offset = PD_TXULP_DB; error: return rc; } int oce_mbox_create_eq(struct oce_eq *eq) { struct oce_mbx mbx; struct mbx_create_common_eq *fwcmd; POCE_SOFTC sc = eq->parent; int rc = 0; uint32_t num_pages; bzero(&mbx, sizeof(struct oce_mbx)); fwcmd = (struct mbx_create_common_eq *)&mbx.payload; mbx_common_req_hdr_init(&fwcmd->hdr, 0, 0, MBX_SUBSYSTEM_COMMON, OPCODE_COMMON_CREATE_EQ, MBX_TIMEOUT_SEC, sizeof(struct mbx_create_common_eq), OCE_MBX_VER_V0); num_pages = oce_page_list(eq->ring, &fwcmd->params.req.pages[0]); fwcmd->params.req.ctx.num_pages = num_pages; fwcmd->params.req.ctx.valid = 1; fwcmd->params.req.ctx.size = (eq->eq_cfg.item_size == 4) ? 
0 : 1; fwcmd->params.req.ctx.count = OCE_LOG2(eq->eq_cfg.q_len / 256); fwcmd->params.req.ctx.armed = 0; fwcmd->params.req.ctx.delay_mult = eq->eq_cfg.cur_eqd; mbx.u0.s.embedded = 1; mbx.payload_length = sizeof(struct mbx_create_common_eq); rc = oce_mbox_post(sc, &mbx, NULL); if (!rc) rc = fwcmd->hdr.u0.rsp.status; if (rc) { device_printf(sc->dev, "%s failed - cmd status: %d addi status: %d\n", __FUNCTION__, rc, fwcmd->hdr.u0.rsp.additional_status); goto error; } eq->eq_id = HOST_16(fwcmd->params.rsp.eq_id); error: return rc; } int oce_mbox_cq_create(struct oce_cq *cq, uint32_t ncoalesce, uint32_t is_eventable) { struct oce_mbx mbx; struct mbx_create_common_cq *fwcmd; POCE_SOFTC sc = cq->parent; uint8_t version; oce_cq_ctx_t *ctx; uint32_t num_pages, page_size; int rc = 0; bzero(&mbx, sizeof(struct oce_mbx)); fwcmd = (struct mbx_create_common_cq *)&mbx.payload; if (IS_XE201(sc)) version = OCE_MBX_VER_V2; else version = OCE_MBX_VER_V0; mbx_common_req_hdr_init(&fwcmd->hdr, 0, 0, MBX_SUBSYSTEM_COMMON, OPCODE_COMMON_CREATE_CQ, MBX_TIMEOUT_SEC, sizeof(struct mbx_create_common_cq), version); ctx = &fwcmd->params.req.cq_ctx; num_pages = oce_page_list(cq->ring, &fwcmd->params.req.pages[0]); page_size = 1; /* 1 for 4K */ if (version == OCE_MBX_VER_V2) { ctx->v2.num_pages = LE_16(num_pages); ctx->v2.page_size = page_size; ctx->v2.eventable = is_eventable; ctx->v2.valid = 1; ctx->v2.count = OCE_LOG2(cq->cq_cfg.q_len / 256); ctx->v2.nodelay = cq->cq_cfg.nodelay; ctx->v2.coalesce_wm = ncoalesce; ctx->v2.armed = 0; ctx->v2.eq_id = cq->eq->eq_id; if (ctx->v2.count == 3) { if ((u_int)cq->cq_cfg.q_len > (4*1024)-1) ctx->v2.cqe_count = (4*1024)-1; else ctx->v2.cqe_count = cq->cq_cfg.q_len; } } else { ctx->v0.num_pages = LE_16(num_pages); ctx->v0.eventable = is_eventable; ctx->v0.valid = 1; ctx->v0.count = OCE_LOG2(cq->cq_cfg.q_len / 256); ctx->v0.nodelay = cq->cq_cfg.nodelay; ctx->v0.coalesce_wm = ncoalesce; ctx->v0.armed = 0; ctx->v0.eq_id = cq->eq->eq_id; } mbx.u0.s.embedded = 1; mbx.payload_length = sizeof(struct mbx_create_common_cq); rc = oce_mbox_post(sc, &mbx, NULL); if (!rc) rc = fwcmd->hdr.u0.rsp.status; if (rc) { device_printf(sc->dev, "%s failed - cmd status: %d addi status: %d\n", __FUNCTION__, rc, fwcmd->hdr.u0.rsp.additional_status); goto error; } cq->cq_id = HOST_16(fwcmd->params.rsp.cq_id); error: return rc; } int oce_mbox_read_transrecv_data(POCE_SOFTC sc, uint32_t page_num) { int rc = 0; struct oce_mbx mbx; struct mbx_read_common_transrecv_data *fwcmd; struct oce_mq_sge *sgl; OCE_DMA_MEM dma; /* Allocate DMA mem*/ if (oce_dma_alloc(sc, sizeof(struct mbx_read_common_transrecv_data), &dma, 0)) return ENOMEM; fwcmd = OCE_DMAPTR(&dma, struct mbx_read_common_transrecv_data); bzero(fwcmd, sizeof(struct mbx_read_common_transrecv_data)); bzero(&mbx, sizeof(struct oce_mbx)); mbx_common_req_hdr_init(&fwcmd->hdr, 0, 0, MBX_SUBSYSTEM_COMMON, OPCODE_COMMON_READ_TRANSRECEIVER_DATA, MBX_TIMEOUT_SEC, sizeof(struct mbx_read_common_transrecv_data), OCE_MBX_VER_V0); /* fill rest of mbx */ mbx.u0.s.embedded = 0; mbx.payload_length = sizeof(struct mbx_read_common_transrecv_data); mbx.u0.s.sge_count = 1; sgl = &mbx.payload.u0.u1.sgl[0]; sgl->pa_hi = htole32(upper_32_bits(dma.paddr)); sgl->pa_lo = htole32((dma.paddr) & 0xFFFFFFFF); sgl->length = htole32(mbx.payload_length); DW_SWAP(u32ptr(&mbx), mbx.payload_length + OCE_BMBX_RHDR_SZ); fwcmd->params.req.port = LE_32(sc->port_id); fwcmd->params.req.page_num = LE_32(page_num); /* command post */ rc = oce_mbox_post(sc, &mbx, NULL); if (!rc) rc = 
fwcmd->hdr.u0.rsp.status; if (rc) { device_printf(sc->dev, "%s failed - cmd status: %d addi status: %d\n", __FUNCTION__, rc, fwcmd->hdr.u0.rsp.additional_status); goto error; } if(fwcmd->params.rsp.page_num == PAGE_NUM_A0) { bcopy((char *)fwcmd->params.rsp.page_data, &sfp_vpd_dump_buffer[0], TRANSCEIVER_A0_SIZE); } if(fwcmd->params.rsp.page_num == PAGE_NUM_A2) { bcopy((char *)fwcmd->params.rsp.page_data, &sfp_vpd_dump_buffer[TRANSCEIVER_A0_SIZE], TRANSCEIVER_A2_SIZE); } error: oce_dma_free(sc, &dma); return rc; } void oce_mbox_eqd_modify_periodic(POCE_SOFTC sc, struct oce_set_eqd *set_eqd, int num) { struct oce_mbx mbx; struct mbx_modify_common_eq_delay *fwcmd; int rc = 0; int i = 0; bzero(&mbx, sizeof(struct oce_mbx)); /* Initialize MODIFY_EQ_DELAY ioctl header */ fwcmd = (struct mbx_modify_common_eq_delay *)&mbx.payload; mbx_common_req_hdr_init(&fwcmd->hdr, 0, 0, MBX_SUBSYSTEM_COMMON, OPCODE_COMMON_MODIFY_EQ_DELAY, MBX_TIMEOUT_SEC, sizeof(struct mbx_modify_common_eq_delay), OCE_MBX_VER_V0); /* fill rest of mbx */ mbx.u0.s.embedded = 1; mbx.payload_length = sizeof(struct mbx_modify_common_eq_delay); DW_SWAP(u32ptr(&mbx), mbx.payload_length + OCE_BMBX_RHDR_SZ); fwcmd->params.req.num_eq = num; for (i = 0; i < num; i++) { fwcmd->params.req.delay[i].eq_id = htole32(set_eqd[i].eq_id); fwcmd->params.req.delay[i].phase = 0; fwcmd->params.req.delay[i].dm = htole32(set_eqd[i].delay_multiplier); } /* command post */ rc = oce_mbox_post(sc, &mbx, NULL); if (!rc) rc = fwcmd->hdr.u0.rsp.status; if (rc) device_printf(sc->dev, "%s failed - cmd status: %d addi status: %d\n", __FUNCTION__, rc, fwcmd->hdr.u0.rsp.additional_status); } int oce_get_profile_config(POCE_SOFTC sc, uint32_t max_rss) { struct oce_mbx mbx; struct mbx_common_get_profile_config *fwcmd; int rc = 0; int version = 0; struct oce_mq_sge *sgl; OCE_DMA_MEM dma; uint32_t desc_count = 0; struct oce_nic_resc_desc *nic_desc = NULL; int i; boolean_t nic_desc_valid = FALSE; if (IS_BE2(sc)) return -1; /* Allocate DMA mem*/ if (oce_dma_alloc(sc, sizeof(struct mbx_common_get_profile_config), &dma, 0)) return ENOMEM; /* Initialize MODIFY_EQ_DELAY ioctl header */ fwcmd = OCE_DMAPTR(&dma, struct mbx_common_get_profile_config); bzero(fwcmd, sizeof(struct mbx_common_get_profile_config)); if (!IS_XE201(sc)) version = OCE_MBX_VER_V1; else version = OCE_MBX_VER_V0; bzero(&mbx, sizeof(struct oce_mbx)); mbx_common_req_hdr_init(&fwcmd->hdr, 0, 0, MBX_SUBSYSTEM_COMMON, OPCODE_COMMON_GET_PROFILE_CONFIG, MBX_TIMEOUT_SEC, sizeof(struct mbx_common_get_profile_config), version); /* fill rest of mbx */ mbx.u0.s.embedded = 0; mbx.payload_length = sizeof(struct mbx_common_get_profile_config); mbx.u0.s.sge_count = 1; sgl = &mbx.payload.u0.u1.sgl[0]; sgl->pa_hi = htole32(upper_32_bits(dma.paddr)); sgl->pa_lo = htole32((dma.paddr) & 0xFFFFFFFF); sgl->length = htole32(mbx.payload_length); DW_SWAP(u32ptr(&mbx), mbx.payload_length + OCE_BMBX_RHDR_SZ); fwcmd->params.req.type = ACTIVE_PROFILE; /* command post */ rc = oce_mbox_post(sc, &mbx, NULL); if (!rc) rc = fwcmd->hdr.u0.rsp.status; if (rc) { device_printf(sc->dev, "%s failed - cmd status: %d addi status: %d\n", __FUNCTION__, rc, fwcmd->hdr.u0.rsp.additional_status); goto error; } nic_desc = (struct oce_nic_resc_desc *) fwcmd->params.rsp.resources; desc_count = HOST_32(fwcmd->params.rsp.desc_count); for (i = 0; i < desc_count; i++) { if ((nic_desc->desc_type == NIC_RESC_DESC_TYPE_V0) || (nic_desc->desc_type == NIC_RESC_DESC_TYPE_V1)) { nic_desc_valid = TRUE; break; } nic_desc = (struct oce_nic_resc_desc *) \ ((char 
*)nic_desc + nic_desc->desc_len); } if (!nic_desc_valid) { rc = -1; goto error; } else { sc->max_vlans = HOST_16(nic_desc->vlan_count); sc->nwqs = HOST_16(nic_desc->txq_count); if (sc->nwqs) sc->nwqs = MIN(sc->nwqs, OCE_MAX_WQ); else sc->nwqs = OCE_MAX_WQ; sc->nrssqs = HOST_16(nic_desc->rssq_count); if (sc->nrssqs) sc->nrssqs = MIN(sc->nrssqs, max_rss); else sc->nrssqs = max_rss; sc->nrqs = sc->nrssqs + 1; /* 1 for def RX */ } error: oce_dma_free(sc, &dma); return rc; } int oce_get_func_config(POCE_SOFTC sc) { struct oce_mbx mbx; struct mbx_common_get_func_config *fwcmd; int rc = 0; int version = 0; struct oce_mq_sge *sgl; OCE_DMA_MEM dma; uint32_t desc_count = 0; struct oce_nic_resc_desc *nic_desc = NULL; int i; boolean_t nic_desc_valid = FALSE; uint32_t max_rss = 0; if ((IS_BE(sc) || IS_SH(sc)) && (!sc->be3_native)) max_rss = OCE_LEGACY_MODE_RSS; else max_rss = OCE_MAX_RSS; /* Allocate DMA mem*/ if (oce_dma_alloc(sc, sizeof(struct mbx_common_get_func_config), &dma, 0)) return ENOMEM; /* Initialize MODIFY_EQ_DELAY ioctl header */ fwcmd = OCE_DMAPTR(&dma, struct mbx_common_get_func_config); bzero(fwcmd, sizeof(struct mbx_common_get_func_config)); if (IS_SH(sc)) version = OCE_MBX_VER_V1; else version = OCE_MBX_VER_V0; bzero(&mbx, sizeof(struct oce_mbx)); mbx_common_req_hdr_init(&fwcmd->hdr, 0, 0, MBX_SUBSYSTEM_COMMON, OPCODE_COMMON_GET_FUNCTION_CONFIG, MBX_TIMEOUT_SEC, sizeof(struct mbx_common_get_func_config), version); /* fill rest of mbx */ mbx.u0.s.embedded = 0; mbx.payload_length = sizeof(struct mbx_common_get_func_config); mbx.u0.s.sge_count = 1; sgl = &mbx.payload.u0.u1.sgl[0]; sgl->pa_hi = htole32(upper_32_bits(dma.paddr)); sgl->pa_lo = htole32((dma.paddr) & 0xFFFFFFFF); sgl->length = htole32(mbx.payload_length); DW_SWAP(u32ptr(&mbx), mbx.payload_length + OCE_BMBX_RHDR_SZ); /* command post */ rc = oce_mbox_post(sc, &mbx, NULL); if (!rc) rc = fwcmd->hdr.u0.rsp.status; if (rc) { device_printf(sc->dev, "%s failed - cmd status: %d addi status: %d\n", __FUNCTION__, rc, fwcmd->hdr.u0.rsp.additional_status); goto error; } nic_desc = (struct oce_nic_resc_desc *) fwcmd->params.rsp.resources; desc_count = HOST_32(fwcmd->params.rsp.desc_count); for (i = 0; i < desc_count; i++) { if ((nic_desc->desc_type == NIC_RESC_DESC_TYPE_V0) || (nic_desc->desc_type == NIC_RESC_DESC_TYPE_V1)) { nic_desc_valid = TRUE; break; } nic_desc = (struct oce_nic_resc_desc *) \ ((char *)nic_desc + nic_desc->desc_len); } if (!nic_desc_valid) { rc = -1; goto error; } else { sc->max_vlans = nic_desc->vlan_count; sc->nwqs = HOST_32(nic_desc->txq_count); if (sc->nwqs) sc->nwqs = MIN(sc->nwqs, OCE_MAX_WQ); else sc->nwqs = OCE_MAX_WQ; sc->nrssqs = HOST_32(nic_desc->rssq_count); if (sc->nrssqs) sc->nrssqs = MIN(sc->nrssqs, max_rss); else sc->nrssqs = max_rss; sc->nrqs = sc->nrssqs + 1; /* 1 for def RX */ } error: oce_dma_free(sc, &dma); return rc; } /* hw lro functions */ int oce_mbox_nic_query_lro_capabilities(POCE_SOFTC sc, uint32_t *lro_rq_cnt, uint32_t *lro_flags) { struct oce_mbx mbx; struct mbx_nic_query_lro_capabilities *fwcmd; int rc = 0; bzero(&mbx, sizeof(struct oce_mbx)); fwcmd = (struct mbx_nic_query_lro_capabilities *)&mbx.payload; mbx_common_req_hdr_init(&fwcmd->hdr, 0, 0, MBX_SUBSYSTEM_NIC, 0x20,MBX_TIMEOUT_SEC, sizeof(struct mbx_nic_query_lro_capabilities), OCE_MBX_VER_V0); mbx.u0.s.embedded = 1; mbx.payload_length = sizeof(struct mbx_nic_query_lro_capabilities); rc = oce_mbox_post(sc, &mbx, NULL); if (!rc) rc = fwcmd->hdr.u0.rsp.status; if (rc) { device_printf(sc->dev, "%s failed - cmd status: %d addi status: 
%d\n", __FUNCTION__, rc, fwcmd->hdr.u0.rsp.additional_status); return rc; } if(lro_flags) *lro_flags = HOST_32(fwcmd->params.rsp.lro_flags); if(lro_rq_cnt) *lro_rq_cnt = HOST_16(fwcmd->params.rsp.lro_rq_cnt); return rc; } int oce_mbox_nic_set_iface_lro_config(POCE_SOFTC sc, int enable) { struct oce_mbx mbx; struct mbx_nic_set_iface_lro_config *fwcmd; int rc = 0; bzero(&mbx, sizeof(struct oce_mbx)); fwcmd = (struct mbx_nic_set_iface_lro_config *)&mbx.payload; mbx_common_req_hdr_init(&fwcmd->hdr, 0, 0, MBX_SUBSYSTEM_NIC, 0x26,MBX_TIMEOUT_SEC, sizeof(struct mbx_nic_set_iface_lro_config), OCE_MBX_VER_V0); mbx.u0.s.embedded = 1; mbx.payload_length = sizeof(struct mbx_nic_set_iface_lro_config); fwcmd->params.req.iface_id = sc->if_id; fwcmd->params.req.lro_flags = 0; if(enable) { fwcmd->params.req.lro_flags = LRO_FLAGS_HASH_MODE | LRO_FLAGS_RSS_MODE; fwcmd->params.req.lro_flags |= LRO_FLAGS_CLSC_IPV4 | LRO_FLAGS_CLSC_IPV6; fwcmd->params.req.max_clsc_byte_cnt = 64*1024; /* min = 2974, max = 0xfa59 */ fwcmd->params.req.max_clsc_seg_cnt = 43; /* min = 2, max = 64 */ fwcmd->params.req.max_clsc_usec_delay = 18; /* min = 1, max = 256 */ fwcmd->params.req.min_clsc_frame_byte_cnt = 0; /* min = 1, max = 9014 */ } rc = oce_mbox_post(sc, &mbx, NULL); if (!rc) rc = fwcmd->hdr.u0.rsp.status; if (rc) { device_printf(sc->dev, "%s failed - cmd status: %d addi status: %d\n", __FUNCTION__, rc, fwcmd->hdr.u0.rsp.additional_status); return rc; } return rc; } int oce_mbox_create_rq_v2(struct oce_rq *rq) { struct oce_mbx mbx; struct mbx_create_nic_rq_v2 *fwcmd; POCE_SOFTC sc = rq->parent; int rc = 0, num_pages = 0; if (rq->qstate == QCREATED) return 0; bzero(&mbx, sizeof(struct oce_mbx)); fwcmd = (struct mbx_create_nic_rq_v2 *)&mbx.payload; mbx_common_req_hdr_init(&fwcmd->hdr, 0, 0, MBX_SUBSYSTEM_NIC, 0x08, MBX_TIMEOUT_SEC, sizeof(struct mbx_create_nic_rq_v2), OCE_MBX_VER_V2); /* oce_page_list will also prepare pages */ num_pages = oce_page_list(rq->ring, &fwcmd->params.req.pages[0]); fwcmd->params.req.cq_id = rq->cq->cq_id; fwcmd->params.req.frag_size = rq->cfg.frag_size/2048; fwcmd->params.req.num_pages = num_pages; fwcmd->params.req.if_id = sc->if_id; fwcmd->params.req.max_frame_size = rq->cfg.mtu; fwcmd->params.req.page_size = 1; if(rq->cfg.is_rss_queue) { fwcmd->params.req.rq_flags = (NIC_RQ_FLAGS_RSS | NIC_RQ_FLAGS_LRO); }else { device_printf(sc->dev, "non rss lro queue should not be created \n"); goto error; } mbx.u0.s.embedded = 1; mbx.payload_length = sizeof(struct mbx_create_nic_rq_v2); rc = oce_mbox_post(sc, &mbx, NULL); if (!rc) rc = fwcmd->hdr.u0.rsp.status; if (rc) { device_printf(sc->dev, "%s failed - cmd status: %d addi status: %d\n", __FUNCTION__, rc, fwcmd->hdr.u0.rsp.additional_status); goto error; } rq->rq_id = HOST_16(fwcmd->params.rsp.rq_id); rq->rss_cpuid = fwcmd->params.rsp.rss_cpuid; error: return rc; } Index: head/sys/kern/init_main.c =================================================================== --- head/sys/kern/init_main.c (revision 356096) +++ head/sys/kern/init_main.c (revision 356097) @@ -1,857 +1,837 @@ /*- * SPDX-License-Identifier: BSD-4-Clause * * Copyright (c) 1995 Terrence R. Lambert * All rights reserved. * * Copyright (c) 1982, 1986, 1989, 1991, 1992, 1993 * The Regents of the University of California. All rights reserved. * (c) UNIX System Laboratories, Inc. * All or some portions of this file are derived from material licensed * to the University of California by American Telephone and Telegraph * Co. or Unix System Laboratories, Inc. 
and are reproduced herein with * the permission of UNIX System Laboratories, Inc. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. All advertising materials mentioning features or use of this software * must display the following acknowledgement: * This product includes software developed by the University of * California, Berkeley and its contributors. * 4. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * @(#)init_main.c 8.9 (Berkeley) 1/21/94 */ #include __FBSDID("$FreeBSD$"); #include "opt_ddb.h" #include "opt_init_path.h" #include "opt_verbose_sysinit.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include void mi_startup(void); /* Should be elsewhere */ /* Components of the first process -- never freed. */ static struct session session0; static struct pgrp pgrp0; struct proc proc0; struct thread0_storage thread0_st __aligned(32); struct vmspace vmspace0; struct proc *initproc; int linux_alloc_current_noop(struct thread *td __unused, int flags __unused) { return (0); } int (*lkpi_alloc_current)(struct thread *, int) = linux_alloc_current_noop; #ifndef BOOTHOWTO #define BOOTHOWTO 0 #endif int boothowto = BOOTHOWTO; /* initialized so that it can be patched */ SYSCTL_INT(_debug, OID_AUTO, boothowto, CTLFLAG_RD, &boothowto, 0, "Boot control flags, passed from loader"); #ifndef BOOTVERBOSE #define BOOTVERBOSE 0 #endif int bootverbose = BOOTVERBOSE; SYSCTL_INT(_debug, OID_AUTO, bootverbose, CTLFLAG_RW, &bootverbose, 0, "Control the output of verbose kernel messages"); #ifdef VERBOSE_SYSINIT /* * We'll use the defined value of VERBOSE_SYSINIT from the kernel config to * dictate the default VERBOSE_SYSINIT behavior. 
Significant values for this * option and associated tunable are: * - 0, 'compiled in but silent by default' * - 1, 'compiled in but verbose by default' (default) */ int verbose_sysinit = VERBOSE_SYSINIT; TUNABLE_INT("debug.verbose_sysinit", &verbose_sysinit); #endif #ifdef INVARIANTS FEATURE(invariants, "Kernel compiled with INVARIANTS, may affect performance"); #endif /* * This ensures that there is at least one entry so that the sysinit_set * symbol is not undefined. A subsystem ID of SI_SUB_DUMMY is never * executed. */ SYSINIT(placeholder, SI_SUB_DUMMY, SI_ORDER_ANY, NULL, NULL); /* * The sysinit table itself. Items are checked off as they are run. * If we want to register new sysinit types, add them to newsysinit. */ SET_DECLARE(sysinit_set, struct sysinit); struct sysinit **sysinit, **sysinit_end; struct sysinit **newsysinit, **newsysinit_end; /* * Merge a new sysinit set into the current set, reallocating it if * necessary. This can only be called after malloc is running. */ void sysinit_add(struct sysinit **set, struct sysinit **set_end) { struct sysinit **newset; struct sysinit **sipp; struct sysinit **xipp; int count; count = set_end - set; if (newsysinit) count += newsysinit_end - newsysinit; else count += sysinit_end - sysinit; newset = malloc(count * sizeof(*sipp), M_TEMP, M_NOWAIT); if (newset == NULL) panic("cannot malloc for sysinit"); xipp = newset; if (newsysinit) for (sipp = newsysinit; sipp < newsysinit_end; sipp++) *xipp++ = *sipp; else for (sipp = sysinit; sipp < sysinit_end; sipp++) *xipp++ = *sipp; for (sipp = set; sipp < set_end; sipp++) *xipp++ = *sipp; if (newsysinit) free(newsysinit, M_TEMP); newsysinit = newset; newsysinit_end = newset + count; } #if defined (DDB) && defined(VERBOSE_SYSINIT) static const char * symbol_name(vm_offset_t va, db_strategy_t strategy) { const char *name; c_db_sym_t sym; db_expr_t offset; if (va == 0) return (NULL); sym = db_search_symbol(va, strategy, &offset); if (offset != 0) return (NULL); db_symbol_values(sym, &name, NULL); return (name); } #endif /* * System startup; initialize the world, create process 0, mount root * filesystem, and fork to create init and pagedaemon. Most of the * hard work is done in the lower-level initialization routines including * startup(), which does memory initialization and autoconfiguration. * * This allows simple addition of new kernel subsystems that require * boot time initialization. It also allows substitution of subsystem * (for instance, a scheduler, kernel profiler, or VM system) by object * module. Finally, it allows for optional "kernel threads". */ void mi_startup(void) { struct sysinit **sipp; /* system initialization*/ struct sysinit **xipp; /* interior loop of sort*/ struct sysinit *save; /* bubble*/ #if defined(VERBOSE_SYSINIT) int last; int verbose; #endif TSENTER(); if (boothowto & RB_VERBOSE) bootverbose++; if (sysinit == NULL) { sysinit = SET_BEGIN(sysinit_set); sysinit_end = SET_LIMIT(sysinit_set); } restart: /* * Perform a bubble sort of the system initialization objects by * their subsystem (primary key) and order (secondary key).
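	 *
	 * For example (hypothetical entries): an item registered as
	 * SYSINIT(foo, SI_SUB_DRIVERS, SI_ORDER_FIRST, ...) sorts ahead of
	 * one registered as SYSINIT(bar, SI_SUB_DRIVERS, SI_ORDER_ANY, ...),
	 * and both sort ahead of any SI_SUB_CONFIGURE entry, because the
	 * subsystem is compared first and the order only breaks ties within
	 * a subsystem.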
*/ for (sipp = sysinit; sipp < sysinit_end; sipp++) { for (xipp = sipp + 1; xipp < sysinit_end; xipp++) { if ((*sipp)->subsystem < (*xipp)->subsystem || ((*sipp)->subsystem == (*xipp)->subsystem && (*sipp)->order <= (*xipp)->order)) continue; /* skip*/ save = *sipp; *sipp = *xipp; *xipp = save; } } #if defined(VERBOSE_SYSINIT) last = SI_SUB_COPYRIGHT; verbose = 0; #if !defined(DDB) printf("VERBOSE_SYSINIT: DDB not enabled, symbol lookups disabled.\n"); #endif #endif /* * Traverse the (now) ordered list of system initialization tasks. * Perform each task, and continue on to the next task. */ for (sipp = sysinit; sipp < sysinit_end; sipp++) { if ((*sipp)->subsystem == SI_SUB_DUMMY) continue; /* skip dummy task(s)*/ if ((*sipp)->subsystem == SI_SUB_DONE) continue; #if defined(VERBOSE_SYSINIT) if ((*sipp)->subsystem > last && verbose_sysinit != 0) { verbose = 1; last = (*sipp)->subsystem; printf("subsystem %x\n", last); } if (verbose) { #if defined(DDB) const char *func, *data; func = symbol_name((vm_offset_t)(*sipp)->func, DB_STGY_PROC); data = symbol_name((vm_offset_t)(*sipp)->udata, DB_STGY_ANY); if (func != NULL && data != NULL) printf(" %s(&%s)... ", func, data); else if (func != NULL) printf(" %s(%p)... ", func, (*sipp)->udata); else #endif printf(" %p(%p)... ", (*sipp)->func, (*sipp)->udata); } #endif /* Call function */ (*((*sipp)->func))((*sipp)->udata); #if defined(VERBOSE_SYSINIT) if (verbose) printf("done.\n"); #endif /* Check off the one we're just done */ (*sipp)->subsystem = SI_SUB_DONE; /* Check if we've installed more sysinit items via KLD */ if (newsysinit != NULL) { if (sysinit != SET_BEGIN(sysinit_set)) free(sysinit, M_TEMP); sysinit = newsysinit; sysinit_end = newsysinit_end; newsysinit = NULL; newsysinit_end = NULL; goto restart; } } TSEXIT(); /* Here so we don't overlap with start_init. */ mtx_assert(&Giant, MA_OWNED | MA_NOTRECURSED); mtx_unlock(&Giant); /* * Now hand over this thread to swapper. */ swapper(); /* NOTREACHED*/ } static void print_caddr_t(void *data) { printf("%s", (char *)data); } static void print_version(void *data __unused) { int len; /* Strip a trailing newline from version. 
*/ len = strlen(version); while (len > 0 && version[len - 1] == '\n') len--; printf("%.*s %s\n", len, version, machine); printf("%s\n", compiler_version); } SYSINIT(announce, SI_SUB_COPYRIGHT, SI_ORDER_FIRST, print_caddr_t, copyright); SYSINIT(trademark, SI_SUB_COPYRIGHT, SI_ORDER_SECOND, print_caddr_t, trademark); SYSINIT(version, SI_SUB_COPYRIGHT, SI_ORDER_THIRD, print_version, NULL); #ifdef WITNESS static char wit_warn[] = "WARNING: WITNESS option enabled, expect reduced performance.\n"; SYSINIT(witwarn, SI_SUB_COPYRIGHT, SI_ORDER_THIRD + 1, print_caddr_t, wit_warn); SYSINIT(witwarn2, SI_SUB_LAST, SI_ORDER_THIRD + 1, print_caddr_t, wit_warn); #endif #ifdef DIAGNOSTIC static char diag_warn[] = "WARNING: DIAGNOSTIC option enabled, expect reduced performance.\n"; SYSINIT(diagwarn, SI_SUB_COPYRIGHT, SI_ORDER_THIRD + 2, print_caddr_t, diag_warn); SYSINIT(diagwarn2, SI_SUB_LAST, SI_ORDER_THIRD + 2, print_caddr_t, diag_warn); #endif static int null_fetch_syscall_args(struct thread *td __unused) { panic("null_fetch_syscall_args"); } static void null_set_syscall_retval(struct thread *td __unused, int error __unused) { panic("null_set_syscall_retval"); } struct sysentvec null_sysvec = { .sv_size = 0, .sv_table = NULL, .sv_errsize = 0, .sv_errtbl = NULL, .sv_transtrap = NULL, .sv_fixup = NULL, .sv_sendsig = NULL, .sv_sigcode = NULL, .sv_szsigcode = NULL, .sv_name = "null", .sv_coredump = NULL, .sv_imgact_try = NULL, .sv_minsigstksz = 0, .sv_minuser = VM_MIN_ADDRESS, .sv_maxuser = VM_MAXUSER_ADDRESS, .sv_usrstack = USRSTACK, .sv_psstrings = PS_STRINGS, .sv_stackprot = VM_PROT_ALL, .sv_copyout_strings = NULL, .sv_setregs = NULL, .sv_fixlimit = NULL, .sv_maxssiz = NULL, .sv_flags = 0, .sv_set_syscall_retval = null_set_syscall_retval, .sv_fetch_syscall_args = null_fetch_syscall_args, .sv_syscallnames = NULL, .sv_schedtail = NULL, .sv_thread_detach = NULL, .sv_trap = NULL, }; /* * The two following SYSINIT's are proc0 specific glue code. I am not * convinced that they can not be safely combined, but their order of * operation has been maintained as the same as the original init_main.c * for right now. */ /* ARGSUSED*/ static void proc0_init(void *dummy __unused) { struct proc *p; struct thread *td; struct ucred *newcred; struct uidinfo tmpuinfo; struct loginclass tmplc = { .lc_name = "", }; vm_paddr_t pageablemem; int i; GIANT_REQUIRED; p = &proc0; td = &thread0; /* * Initialize magic number and osrel. */ p->p_magic = P_MAGIC; p->p_osrel = osreldate; /* * Initialize thread and process structures. */ procinit(); /* set up proc zone */ threadinit(); /* set up UMA zones */ /* * Initialise scheduler resources. * Add scheduler specific parts to proc, thread as needed. */ schedinit(); /* scheduler gets its house in order */ /* * Create process 0 (the swapper). 
*/ LIST_INSERT_HEAD(&allproc, p, p_list); LIST_INSERT_HEAD(PIDHASH(0), p, p_hash); mtx_init(&pgrp0.pg_mtx, "process group", NULL, MTX_DEF | MTX_DUPOK); p->p_pgrp = &pgrp0; LIST_INSERT_HEAD(PGRPHASH(0), &pgrp0, pg_hash); LIST_INIT(&pgrp0.pg_members); LIST_INSERT_HEAD(&pgrp0.pg_members, p, p_pglist); pgrp0.pg_session = &session0; mtx_init(&session0.s_mtx, "session", NULL, MTX_DEF); refcount_init(&session0.s_count, 1); session0.s_leader = p; p->p_sysent = &null_sysvec; p->p_flag = P_SYSTEM | P_INMEM | P_KPROC; p->p_flag2 = 0; p->p_state = PRS_NORMAL; p->p_klist = knlist_alloc(&p->p_mtx); STAILQ_INIT(&p->p_ktr); p->p_nice = NZERO; /* pid_max cannot be greater than PID_MAX */ td->td_tid = PID_MAX + 1; LIST_INSERT_HEAD(TIDHASH(td->td_tid), td, td_hash); td->td_state = TDS_RUNNING; td->td_pri_class = PRI_TIMESHARE; td->td_user_pri = PUSER; td->td_base_user_pri = PUSER; td->td_lend_user_pri = PRI_MAX; td->td_priority = PVM; td->td_base_pri = PVM; td->td_oncpu = curcpu; td->td_flags = TDF_INMEM; td->td_pflags = TDP_KTHREAD; td->td_cpuset = cpuset_thread0(); td->td_domain.dr_policy = td->td_cpuset->cs_domain; prison0_init(); p->p_peers = 0; p->p_leader = p; p->p_reaper = p; p->p_treeflag |= P_TREE_REAPER; LIST_INIT(&p->p_reaplist); strncpy(p->p_comm, "kernel", sizeof (p->p_comm)); strncpy(td->td_name, "swapper", sizeof (td->td_name)); callout_init_mtx(&p->p_itcallout, &p->p_mtx, 0); callout_init_mtx(&p->p_limco, &p->p_mtx, 0); callout_init(&td->td_slpcallout, 1); /* Create credentials. */ newcred = crget(); newcred->cr_ngroups = 1; /* group 0 */ /* A hack to prevent uifind from tripping over NULL pointers. */ curthread->td_ucred = newcred; tmpuinfo.ui_uid = 1; newcred->cr_uidinfo = newcred->cr_ruidinfo = &tmpuinfo; newcred->cr_uidinfo = uifind(0); newcred->cr_ruidinfo = uifind(0); newcred->cr_loginclass = &tmplc; newcred->cr_loginclass = loginclass_find("default"); /* End hack. creds get properly set later with thread_cow_get_proc */ curthread->td_ucred = NULL; newcred->cr_prison = &prison0; proc_set_cred_init(p, newcred); #ifdef AUDIT audit_cred_kproc0(newcred); #endif #ifdef MAC mac_cred_create_swapper(newcred); #endif /* Create sigacts. */ p->p_sigacts = sigacts_alloc(); /* Initialize signal state for process 0. */ siginit(&proc0); /* Create the file descriptor table. */ p->p_fd = fdinit(NULL, false); p->p_fdtol = NULL; /* Create the limits structures. */ p->p_limit = lim_alloc(); for (i = 0; i < RLIM_NLIMITS; i++) p->p_limit->pl_rlimit[i].rlim_cur = p->p_limit->pl_rlimit[i].rlim_max = RLIM_INFINITY; p->p_limit->pl_rlimit[RLIMIT_NOFILE].rlim_cur = p->p_limit->pl_rlimit[RLIMIT_NOFILE].rlim_max = maxfiles; p->p_limit->pl_rlimit[RLIMIT_NPROC].rlim_cur = p->p_limit->pl_rlimit[RLIMIT_NPROC].rlim_max = maxproc; p->p_limit->pl_rlimit[RLIMIT_DATA].rlim_cur = dfldsiz; p->p_limit->pl_rlimit[RLIMIT_DATA].rlim_max = maxdsiz; p->p_limit->pl_rlimit[RLIMIT_STACK].rlim_cur = dflssiz; p->p_limit->pl_rlimit[RLIMIT_STACK].rlim_max = maxssiz; /* Cast to avoid overflow on i386/PAE. */ pageablemem = ptoa((vm_paddr_t)vm_free_count()); p->p_limit->pl_rlimit[RLIMIT_RSS].rlim_cur = p->p_limit->pl_rlimit[RLIMIT_RSS].rlim_max = pageablemem; p->p_limit->pl_rlimit[RLIMIT_MEMLOCK].rlim_cur = pageablemem / 3; p->p_limit->pl_rlimit[RLIMIT_MEMLOCK].rlim_max = pageablemem; p->p_cpulimit = RLIM_INFINITY; PROC_LOCK(p); thread_cow_get_proc(td, p); PROC_UNLOCK(p); /* Initialize resource accounting structures. */ racct_create(&p->p_racct); p->p_stats = pstats_alloc(); /* Allocate a prototype map so we have something to fork. 
*/ p->p_vmspace = &vmspace0; vmspace0.vm_refcnt = 1; pmap_pinit0(vmspace_pmap(&vmspace0)); /* * proc0 is not expected to enter usermode, so there is no special * handling for sv_minuser here, like is done for exec_new_vmspace(). */ vm_map_init(&vmspace0.vm_map, vmspace_pmap(&vmspace0), p->p_sysent->sv_minuser, p->p_sysent->sv_maxuser); /* * Call the init and ctor for the new thread and proc. We wait * to do this until all other structures are fairly sane. */ EVENTHANDLER_DIRECT_INVOKE(process_init, p); EVENTHANDLER_DIRECT_INVOKE(thread_init, td); EVENTHANDLER_DIRECT_INVOKE(process_ctor, p); EVENTHANDLER_DIRECT_INVOKE(thread_ctor, td); /* * Charge root for one process. */ (void)chgproccnt(p->p_ucred->cr_ruidinfo, 1, 0); PROC_LOCK(p); racct_add_force(p, RACCT_NPROC, 1); PROC_UNLOCK(p); } SYSINIT(p0init, SI_SUB_INTRINSIC, SI_ORDER_FIRST, proc0_init, NULL); /* ARGSUSED*/ static void proc0_post(void *dummy __unused) { - struct timespec ts; struct proc *p; struct rusage ru; struct thread *td; /* * Now we can look at the time, having had a chance to verify the * time from the filesystem. Pretend that proc0 started now. */ sx_slock(&allproc_lock); FOREACH_PROC_IN_SYSTEM(p) { PROC_LOCK(p); if (p->p_state == PRS_NEW) { PROC_UNLOCK(p); continue; } microuptime(&p->p_stats->p_start); PROC_STATLOCK(p); rufetch(p, &ru); /* Clears thread stats */ p->p_rux.rux_runtime = 0; p->p_rux.rux_uticks = 0; p->p_rux.rux_sticks = 0; p->p_rux.rux_iticks = 0; PROC_STATUNLOCK(p); FOREACH_THREAD_IN_PROC(p, td) { td->td_runtime = 0; } PROC_UNLOCK(p); } sx_sunlock(&allproc_lock); PCPU_SET(switchtime, cpu_ticks()); PCPU_SET(switchticks, ticks); - - /* - * Give the ``random'' number generator a thump. - */ - nanotime(&ts); - srandom(ts.tv_sec ^ ts.tv_nsec); } SYSINIT(p0post, SI_SUB_INTRINSIC_POST, SI_ORDER_FIRST, proc0_post, NULL); - -static void -random_init(void *dummy __unused) -{ - - /* - * After CPU has been started we have some randomness on most - * platforms via get_cyclecount(). For platforms that don't - * we will reseed random(9) in proc0_post() as well. - */ - srandom(get_cyclecount()); -} -SYSINIT(random, SI_SUB_RANDOM, SI_ORDER_FIRST, random_init, NULL); /* *************************************************************************** **** **** The following SYSINIT's and glue code should be moved to the **** respective files on a per subsystem basis. **** *************************************************************************** */ /* * List of paths to try when searching for "init". */ static char init_path[MAXPATHLEN] = #ifdef INIT_PATH __XSTRING(INIT_PATH); #else "/sbin/init:/sbin/oinit:/sbin/init.bak:/rescue/init"; #endif SYSCTL_STRING(_kern, OID_AUTO, init_path, CTLFLAG_RD, init_path, 0, "Path used to search the init process"); /* * Shutdown timeout of init(8). * Unused within kernel, but used to control init(8), hence do not remove. */ #ifndef INIT_SHUTDOWN_TIMEOUT #define INIT_SHUTDOWN_TIMEOUT 120 #endif static int init_shutdown_timeout = INIT_SHUTDOWN_TIMEOUT; SYSCTL_INT(_kern, OID_AUTO, init_shutdown_timeout, CTLFLAG_RW, &init_shutdown_timeout, 0, "Shutdown timeout of init(8). " "Unused within kernel, but used to control init(8)"); /* * Start the initial user process; try exec'ing each pathname in init_path. * The program is invoked with one argument containing the boot flags. 
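 *
 * The list searched below can be overridden through the "init_path" kernel
 * environment variable, which is consulted before the walk begins; e.g. an
 * illustrative loader(8) prompt setting (assumed usage, shown only as a
 * sketch):
 *
 *	set init_path="/rescue/init"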
*/ static void start_init(void *dummy) { struct image_args args; int error; char *var, *path; char *free_init_path, *tmp_init_path; struct thread *td; struct proc *p; struct vmspace *oldvmspace; TSENTER(); /* Here so we don't overlap with mi_startup. */ td = curthread; p = td->td_proc; vfs_mountroot(); /* Wipe GELI passphrase from the environment. */ kern_unsetenv("kern.geom.eli.passphrase"); if ((var = kern_getenv("init_path")) != NULL) { strlcpy(init_path, var, sizeof(init_path)); freeenv(var); } free_init_path = tmp_init_path = strdup(init_path, M_TEMP); while ((path = strsep(&tmp_init_path, ":")) != NULL) { if (bootverbose) printf("start_init: trying %s\n", path); memset(&args, 0, sizeof(args)); error = exec_alloc_args(&args); if (error != 0) panic("%s: Can't allocate space for init arguments %d", __func__, error); error = exec_args_add_fname(&args, path, UIO_SYSSPACE); if (error != 0) panic("%s: Can't add fname %d", __func__, error); error = exec_args_add_arg(&args, path, UIO_SYSSPACE); if (error != 0) panic("%s: Can't add argv[0] %d", __func__, error); if (boothowto & RB_SINGLE) error = exec_args_add_arg(&args, "-s", UIO_SYSSPACE); if (error != 0) panic("%s: Can't add argv[0] %d", __func__, error); /* * Now try to exec the program. If can't for any reason * other than it doesn't exist, complain. * * Otherwise, return via fork_trampoline() all the way * to user mode as init! */ KASSERT((td->td_pflags & TDP_EXECVMSPC) == 0, ("nested execve")); oldvmspace = td->td_proc->p_vmspace; error = kern_execve(td, &args, NULL); KASSERT(error != 0, ("kern_execve returned success, not EJUSTRETURN")); if (error == EJUSTRETURN) { if ((td->td_pflags & TDP_EXECVMSPC) != 0) { KASSERT(p->p_vmspace != oldvmspace, ("oldvmspace still used")); vmspace_free(oldvmspace); td->td_pflags &= ~TDP_EXECVMSPC; } free(free_init_path, M_TEMP); TSEXIT(); return; } if (error != ENOENT) printf("exec %s: error %d\n", path, error); } free(free_init_path, M_TEMP); printf("init: not found in path %s\n", init_path); panic("no init"); } /* * Like kproc_create(), but runs in its own address space. We do this * early to reserve pid 1. Note special case - do not make it * runnable yet, init execution is started when userspace can be served. */ static void create_init(const void *udata __unused) { struct fork_req fr; struct ucred *newcred, *oldcred; struct thread *td; int error; bzero(&fr, sizeof(fr)); fr.fr_flags = RFFDG | RFPROC | RFSTOPPED; fr.fr_procp = &initproc; error = fork1(&thread0, &fr); if (error) panic("cannot fork init: %d\n", error); KASSERT(initproc->p_pid == 1, ("create_init: initproc->p_pid != 1")); /* divorce init's credentials from the kernel's */ newcred = crget(); sx_xlock(&proctree_lock); PROC_LOCK(initproc); initproc->p_flag |= P_SYSTEM | P_INMEM; initproc->p_treeflag |= P_TREE_REAPER; oldcred = initproc->p_ucred; crcopy(newcred, oldcred); #ifdef MAC mac_cred_create_init(newcred); #endif #ifdef AUDIT audit_cred_proc1(newcred); #endif proc_set_cred(initproc, newcred); td = FIRST_THREAD_IN_PROC(initproc); crfree(td->td_ucred); td->td_ucred = crhold(initproc->p_ucred); PROC_UNLOCK(initproc); sx_xunlock(&proctree_lock); crfree(oldcred); cpu_fork_kthread_handler(FIRST_THREAD_IN_PROC(initproc), start_init, NULL); } SYSINIT(init, SI_SUB_CREATE_INIT, SI_ORDER_FIRST, create_init, NULL); /* * Make it runnable now. 
*/ static void kick_init(const void *udata __unused) { struct thread *td; td = FIRST_THREAD_IN_PROC(initproc); thread_lock(td); TD_SET_CAN_RUN(td); sched_add(td, SRQ_BORING); } SYSINIT(kickinit, SI_SUB_KTHREAD_INIT, SI_ORDER_MIDDLE, kick_init, NULL); Index: head/sys/kern/subr_stats.c =================================================================== --- head/sys/kern/subr_stats.c (revision 356096) +++ head/sys/kern/subr_stats.c (revision 356097) @@ -1,3913 +1,3919 @@ /*- * Copyright (c) 2014-2018 Netflix, Inc. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * $FreeBSD$ */ /* * Author: Lawrence Stewart */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #if defined(DIAGNOSTIC) #include #endif #include /* Must come after qmath.h and arb.h */ #include #include #include #ifdef _KERNEL #include #include #include #include #include #else /* ! _KERNEL */ #include #include #include #include #include #endif /* _KERNEL */ struct voistatdata_voistate { /* Previous VOI value for diff calculation. */ struct voistatdata_numeric prev; }; #define VS_VSDVALID 0x0001 /* Stat's voistatdata updated at least once. */ struct voistat { int8_t stype; /* Type of stat e.g. VS_STYPE_SUM. */ enum vsd_dtype dtype : 8; /* Data type of this stat's data. */ uint16_t data_off; /* Blob offset for this stat's data. */ uint16_t dsz; /* Size of stat's data. */ #define VS_EBITS 8 uint16_t errs : VS_EBITS;/* Non-wrapping error count. */ uint16_t flags : 16 - VS_EBITS; }; /* The voistat error count is capped to avoid wrapping. */ #define VS_INCERRS(vs) do { \ if ((vs)->errs < (1U << VS_EBITS) - 1) \ (vs)->errs++; \ } while (0) /* * Ideas for flags: * - Global or entity specific (global would imply use of counter(9)?) * - Whether to reset stats on read or not * - Signal an overflow? * - Compressed voistat array */ #define VOI_REQSTATE 0x0001 /* VOI requires VS_STYPE_VOISTATE. */ struct voi { int16_t id; /* VOI id. */ enum vsd_dtype dtype : 8; /* Data type of the VOI itself. */ int8_t voistatmaxid; /* Largest allocated voistat index. */ uint16_t stats_off; /* Blob offset for this VOIs stats. 
*/ uint16_t flags; }; /* * Memory for the entire blob is allocated as a slab and then offsets are * maintained to carve up the slab into sections holding different data types. * * Ideas for flags: * - Compressed voi array (trade off memory usage vs search time) * - Units of offsets (default bytes, flag for e.g. vm_page/KiB/Mib) */ struct statsblobv1 { uint8_t abi; uint8_t endian; uint16_t flags; uint16_t maxsz; uint16_t cursz; /* Fields from here down are opaque to consumers. */ uint32_t tplhash; /* Base template hash ID. */ uint16_t stats_off; /* voistat array blob offset. */ uint16_t statsdata_off; /* voistatdata array blob offset. */ sbintime_t created; /* Blob creation time. */ sbintime_t lastrst; /* Time of last reset. */ struct voi vois[]; /* Array indexed by [voi_id]. */ } __aligned(sizeof(void *)); _Static_assert(offsetof(struct statsblobv1, cursz) + SIZEOF_MEMBER(struct statsblobv1, cursz) == offsetof(struct statsblob, opaque), "statsblobv1 ABI mismatch"); struct statsblobv1_tpl { struct metablob *mb; struct statsblobv1 *sb; }; /* Context passed to iterator callbacks. */ struct sb_iter_ctx { void *usrctx; /* Caller supplied context. */ uint32_t flags; /* Flags for current iteration. */ int16_t vslot; /* struct voi slot index. */ int8_t vsslot; /* struct voistat slot index. */ }; struct sb_tostrcb_ctx { struct sbuf *buf; struct statsblob_tpl *tpl; enum sb_str_fmt fmt; uint32_t flags; }; struct sb_visitcb_ctx { stats_blob_visitcb_t cb; void *usrctx; }; /* Stats blob iterator callback. */ typedef int (*stats_v1_blob_itercb_t)(struct statsblobv1 *sb, struct voi *v, struct voistat *vs, struct sb_iter_ctx *ctx); #ifdef _KERNEL static struct rwlock tpllistlock; RW_SYSINIT(stats_tpl_list, &tpllistlock, "Stat template list lock"); #define TPL_LIST_RLOCK() rw_rlock(&tpllistlock) #define TPL_LIST_RUNLOCK() rw_runlock(&tpllistlock) #define TPL_LIST_WLOCK() rw_wlock(&tpllistlock) #define TPL_LIST_WUNLOCK() rw_wunlock(&tpllistlock) #define TPL_LIST_LOCK_ASSERT() rw_assert(&tpllistlock, RA_LOCKED) #define TPL_LIST_RLOCK_ASSERT() rw_assert(&tpllistlock, RA_RLOCKED) #define TPL_LIST_WLOCK_ASSERT() rw_assert(&tpllistlock, RA_WLOCKED) MALLOC_DEFINE(M_STATS, "stats(9) related memory", "stats(9) related memory"); #define stats_free(ptr) free((ptr), M_STATS) #else /* ! _KERNEL */ static void stats_constructor(void); static void stats_destructor(void); static pthread_rwlock_t tpllistlock; #define TPL_LIST_UNLOCK() pthread_rwlock_unlock(&tpllistlock) #define TPL_LIST_RLOCK() pthread_rwlock_rdlock(&tpllistlock) #define TPL_LIST_RUNLOCK() TPL_LIST_UNLOCK() #define TPL_LIST_WLOCK() pthread_rwlock_wrlock(&tpllistlock) #define TPL_LIST_WUNLOCK() TPL_LIST_UNLOCK() #define TPL_LIST_LOCK_ASSERT() do { } while (0) #define TPL_LIST_RLOCK_ASSERT() do { } while (0) #define TPL_LIST_WLOCK_ASSERT() do { } while (0) #ifdef NDEBUG #define KASSERT(cond, msg) do {} while (0) #define stats_abort() do {} while (0) #else /* ! NDEBUG */ #define KASSERT(cond, msg) do { \ if (!(cond)) { \ panic msg; \ } \ } while (0) #define stats_abort() abort() #endif /* NDEBUG */ #define stats_free(ptr) free(ptr) #define panic(fmt, ...) do { \ fprintf(stderr, (fmt), ##__VA_ARGS__); \ stats_abort(); \ } while (0) #endif /* _KERNEL */ #define SB_V1_MAXSZ 65535 /* Obtain a blob offset pointer. */ #define BLOB_OFFSET(sb, off) ((void *)(((uint8_t *)(sb)) + (off))) /* * Number of VOIs in the blob's vois[] array. By virtue of struct voi being a * power of 2 size, we can shift instead of divide. 
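 * For example, with sizeof(struct voi) == 8 a blob whose stats_off equals
 * sizeof(struct statsblobv1) + 32 holds (32 >> 3) == 4 VOIs.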
The shift amount must be * updated if sizeof(struct voi) ever changes, which the assert should catch. */ #define NVOIS(sb) ((int32_t)((((struct statsblobv1 *)(sb))->stats_off - \ sizeof(struct statsblobv1)) >> 3)) _Static_assert(sizeof(struct voi) == 8, "statsblobv1 voi ABI mismatch"); /* Try restrict names to alphanumeric and underscore to simplify JSON compat. */ const char *vs_stype2name[VS_NUM_STYPES] = { [VS_STYPE_VOISTATE] = "VOISTATE", [VS_STYPE_SUM] = "SUM", [VS_STYPE_MAX] = "MAX", [VS_STYPE_MIN] = "MIN", [VS_STYPE_HIST] = "HIST", [VS_STYPE_TDGST] = "TDGST", }; const char *vs_stype2desc[VS_NUM_STYPES] = { [VS_STYPE_VOISTATE] = "VOI related state data (not a real stat)", [VS_STYPE_SUM] = "Simple arithmetic accumulator", [VS_STYPE_MAX] = "Maximum observed VOI value", [VS_STYPE_MIN] = "Minimum observed VOI value", [VS_STYPE_HIST] = "Histogram of observed VOI values", [VS_STYPE_TDGST] = "t-digest of observed VOI values", }; const char *vsd_dtype2name[VSD_NUM_DTYPES] = { [VSD_DTYPE_VOISTATE] = "VOISTATE", [VSD_DTYPE_INT_S32] = "INT_S32", [VSD_DTYPE_INT_U32] = "INT_U32", [VSD_DTYPE_INT_S64] = "INT_S64", [VSD_DTYPE_INT_U64] = "INT_U64", [VSD_DTYPE_INT_SLONG] = "INT_SLONG", [VSD_DTYPE_INT_ULONG] = "INT_ULONG", [VSD_DTYPE_Q_S32] = "Q_S32", [VSD_DTYPE_Q_U32] = "Q_U32", [VSD_DTYPE_Q_S64] = "Q_S64", [VSD_DTYPE_Q_U64] = "Q_U64", [VSD_DTYPE_CRHIST32] = "CRHIST32", [VSD_DTYPE_DRHIST32] = "DRHIST32", [VSD_DTYPE_DVHIST32] = "DVHIST32", [VSD_DTYPE_CRHIST64] = "CRHIST64", [VSD_DTYPE_DRHIST64] = "DRHIST64", [VSD_DTYPE_DVHIST64] = "DVHIST64", [VSD_DTYPE_TDGSTCLUST32] = "TDGSTCLUST32", [VSD_DTYPE_TDGSTCLUST64] = "TDGSTCLUST64", }; const size_t vsd_dtype2size[VSD_NUM_DTYPES] = { [VSD_DTYPE_VOISTATE] = sizeof(struct voistatdata_voistate), [VSD_DTYPE_INT_S32] = sizeof(struct voistatdata_int32), [VSD_DTYPE_INT_U32] = sizeof(struct voistatdata_int32), [VSD_DTYPE_INT_S64] = sizeof(struct voistatdata_int64), [VSD_DTYPE_INT_U64] = sizeof(struct voistatdata_int64), [VSD_DTYPE_INT_SLONG] = sizeof(struct voistatdata_intlong), [VSD_DTYPE_INT_ULONG] = sizeof(struct voistatdata_intlong), [VSD_DTYPE_Q_S32] = sizeof(struct voistatdata_q32), [VSD_DTYPE_Q_U32] = sizeof(struct voistatdata_q32), [VSD_DTYPE_Q_S64] = sizeof(struct voistatdata_q64), [VSD_DTYPE_Q_U64] = sizeof(struct voistatdata_q64), [VSD_DTYPE_CRHIST32] = sizeof(struct voistatdata_crhist32), [VSD_DTYPE_DRHIST32] = sizeof(struct voistatdata_drhist32), [VSD_DTYPE_DVHIST32] = sizeof(struct voistatdata_dvhist32), [VSD_DTYPE_CRHIST64] = sizeof(struct voistatdata_crhist64), [VSD_DTYPE_DRHIST64] = sizeof(struct voistatdata_drhist64), [VSD_DTYPE_DVHIST64] = sizeof(struct voistatdata_dvhist64), [VSD_DTYPE_TDGSTCLUST32] = sizeof(struct voistatdata_tdgstclust32), [VSD_DTYPE_TDGSTCLUST64] = sizeof(struct voistatdata_tdgstclust64), }; static const bool vsd_compoundtype[VSD_NUM_DTYPES] = { [VSD_DTYPE_VOISTATE] = true, [VSD_DTYPE_INT_S32] = false, [VSD_DTYPE_INT_U32] = false, [VSD_DTYPE_INT_S64] = false, [VSD_DTYPE_INT_U64] = false, [VSD_DTYPE_INT_SLONG] = false, [VSD_DTYPE_INT_ULONG] = false, [VSD_DTYPE_Q_S32] = false, [VSD_DTYPE_Q_U32] = false, [VSD_DTYPE_Q_S64] = false, [VSD_DTYPE_Q_U64] = false, [VSD_DTYPE_CRHIST32] = true, [VSD_DTYPE_DRHIST32] = true, [VSD_DTYPE_DVHIST32] = true, [VSD_DTYPE_CRHIST64] = true, [VSD_DTYPE_DRHIST64] = true, [VSD_DTYPE_DVHIST64] = true, [VSD_DTYPE_TDGSTCLUST32] = true, [VSD_DTYPE_TDGSTCLUST64] = true, }; const struct voistatdata_numeric numeric_limits[2][VSD_DTYPE_Q_U64 + 1] = { [LIM_MIN] = { [VSD_DTYPE_VOISTATE] = {0}, [VSD_DTYPE_INT_S32] 
= {.int32 = {.s32 = INT32_MIN}}, [VSD_DTYPE_INT_U32] = {.int32 = {.u32 = 0}}, [VSD_DTYPE_INT_S64] = {.int64 = {.s64 = INT64_MIN}}, [VSD_DTYPE_INT_U64] = {.int64 = {.u64 = 0}}, [VSD_DTYPE_INT_SLONG] = {.intlong = {.slong = LONG_MIN}}, [VSD_DTYPE_INT_ULONG] = {.intlong = {.ulong = 0}}, [VSD_DTYPE_Q_S32] = {.q32 = {.sq32 = Q_IFMINVAL(INT32_MIN)}}, [VSD_DTYPE_Q_U32] = {.q32 = {.uq32 = 0}}, [VSD_DTYPE_Q_S64] = {.q64 = {.sq64 = Q_IFMINVAL(INT64_MIN)}}, [VSD_DTYPE_Q_U64] = {.q64 = {.uq64 = 0}}, }, [LIM_MAX] = { [VSD_DTYPE_VOISTATE] = {0}, [VSD_DTYPE_INT_S32] = {.int32 = {.s32 = INT32_MAX}}, [VSD_DTYPE_INT_U32] = {.int32 = {.u32 = UINT32_MAX}}, [VSD_DTYPE_INT_S64] = {.int64 = {.s64 = INT64_MAX}}, [VSD_DTYPE_INT_U64] = {.int64 = {.u64 = UINT64_MAX}}, [VSD_DTYPE_INT_SLONG] = {.intlong = {.slong = LONG_MAX}}, [VSD_DTYPE_INT_ULONG] = {.intlong = {.ulong = ULONG_MAX}}, [VSD_DTYPE_Q_S32] = {.q32 = {.sq32 = Q_IFMAXVAL(INT32_MAX)}}, [VSD_DTYPE_Q_U32] = {.q32 = {.uq32 = Q_IFMAXVAL(UINT32_MAX)}}, [VSD_DTYPE_Q_S64] = {.q64 = {.sq64 = Q_IFMAXVAL(INT64_MAX)}}, [VSD_DTYPE_Q_U64] = {.q64 = {.uq64 = Q_IFMAXVAL(UINT64_MAX)}}, } }; /* tpllistlock protects tpllist and ntpl */ static uint32_t ntpl; static struct statsblob_tpl **tpllist; static inline void * stats_realloc(void *ptr, size_t oldsz, size_t newsz, int flags); //static void stats_v1_blob_finalise(struct statsblobv1 *sb); static int stats_v1_blob_init_locked(struct statsblobv1 *sb, uint32_t tpl_id, uint32_t flags); static int stats_v1_blob_expand(struct statsblobv1 **sbpp, int newvoibytes, int newvoistatbytes, int newvoistatdatabytes); static void stats_v1_blob_iter(struct statsblobv1 *sb, stats_v1_blob_itercb_t icb, void *usrctx, uint32_t flags); static inline int stats_v1_vsd_tdgst_add(enum vsd_dtype vs_dtype, struct voistatdata_tdgst *tdgst, s64q_t x, uint64_t weight, int attempt); static inline int ctd32cmp(const struct voistatdata_tdgstctd32 *c1, const struct voistatdata_tdgstctd32 *c2) { KASSERT(Q_PRECEQ(c1->mu, c2->mu), ("%s: Q_RELPREC(c1->mu,c2->mu)=%d", __func__, Q_RELPREC(c1->mu, c2->mu))); return (Q_QLTQ(c1->mu, c2->mu) ? -1 : 1); } ARB_GENERATE_STATIC(ctdth32, voistatdata_tdgstctd32, ctdlnk, ctd32cmp); static inline int ctd64cmp(const struct voistatdata_tdgstctd64 *c1, const struct voistatdata_tdgstctd64 *c2) { KASSERT(Q_PRECEQ(c1->mu, c2->mu), ("%s: Q_RELPREC(c1->mu,c2->mu)=%d", __func__, Q_RELPREC(c1->mu, c2->mu))); return (Q_QLTQ(c1->mu, c2->mu) ? -1 : 1); } ARB_GENERATE_STATIC(ctdth64, voistatdata_tdgstctd64, ctdlnk, ctd64cmp); #ifdef DIAGNOSTIC RB_GENERATE_STATIC(rbctdth32, voistatdata_tdgstctd32, rblnk, ctd32cmp); RB_GENERATE_STATIC(rbctdth64, voistatdata_tdgstctd64, rblnk, ctd64cmp); #endif static inline sbintime_t stats_sbinuptime(void) { sbintime_t sbt; #ifdef _KERNEL sbt = sbinuptime(); #else /* ! _KERNEL */ struct timespec tp; clock_gettime(CLOCK_MONOTONIC_FAST, &tp); sbt = tstosbt(tp); #endif /* _KERNEL */ return (sbt); } static inline void * stats_realloc(void *ptr, size_t oldsz, size_t newsz, int flags) { #ifdef _KERNEL /* Default to M_NOWAIT if neither M_NOWAIT or M_WAITOK are set. */ if (!(flags & (M_WAITOK | M_NOWAIT))) flags |= M_NOWAIT; ptr = realloc(ptr, newsz, M_STATS, flags); #else /* ! 
_KERNEL */ ptr = realloc(ptr, newsz); if ((flags & M_ZERO) && ptr != NULL) { if (oldsz == 0) memset(ptr, '\0', newsz); else if (newsz > oldsz) memset(BLOB_OFFSET(ptr, oldsz), '\0', newsz - oldsz); } #endif /* _KERNEL */ return (ptr); } static inline char * stats_strdup(const char *s, #ifdef _KERNEL int flags) { char *copy; size_t len; if (!(flags & (M_WAITOK | M_NOWAIT))) flags |= M_NOWAIT; len = strlen(s) + 1; if ((copy = malloc(len, M_STATS, flags)) != NULL) bcopy(s, copy, len); return (copy); #else int flags __unused) { return (strdup(s)); #endif } static inline void stats_tpl_update_hash(struct statsblob_tpl *tpl) { TPL_LIST_WLOCK_ASSERT(); tpl->mb->tplhash = hash32_str(tpl->mb->tplname, 0); for (int voi_id = 0; voi_id < NVOIS(tpl->sb); voi_id++) { if (tpl->mb->voi_meta[voi_id].name != NULL) tpl->mb->tplhash = hash32_str( tpl->mb->voi_meta[voi_id].name, tpl->mb->tplhash); } tpl->mb->tplhash = hash32_buf(tpl->sb, tpl->sb->cursz, tpl->mb->tplhash); } static inline uint64_t stats_pow_u64(uint64_t base, uint64_t exp) { uint64_t result = 1; while (exp) { if (exp & 1) result *= base; exp >>= 1; base *= base; } return (result); } static inline int stats_vss_hist_bkt_hlpr(struct vss_hist_hlpr_info *info, uint32_t curbkt, struct voistatdata_numeric *bkt_lb, struct voistatdata_numeric *bkt_ub) { uint64_t step = 0; int error = 0; switch (info->scheme) { case BKT_LIN: step = info->lin.stepinc; break; case BKT_EXP: step = stats_pow_u64(info->exp.stepbase, info->exp.stepexp + curbkt); break; case BKT_LINEXP: { uint64_t curstepexp = 1; switch (info->voi_dtype) { case VSD_DTYPE_INT_S32: while ((int32_t)stats_pow_u64(info->linexp.stepbase, curstepexp) <= bkt_lb->int32.s32) curstepexp++; break; case VSD_DTYPE_INT_U32: while ((uint32_t)stats_pow_u64(info->linexp.stepbase, curstepexp) <= bkt_lb->int32.u32) curstepexp++; break; case VSD_DTYPE_INT_S64: while ((int64_t)stats_pow_u64(info->linexp.stepbase, curstepexp) <= bkt_lb->int64.s64) curstepexp++; break; case VSD_DTYPE_INT_U64: while ((uint64_t)stats_pow_u64(info->linexp.stepbase, curstepexp) <= bkt_lb->int64.u64) curstepexp++; break; case VSD_DTYPE_INT_SLONG: while ((long)stats_pow_u64(info->linexp.stepbase, curstepexp) <= bkt_lb->intlong.slong) curstepexp++; break; case VSD_DTYPE_INT_ULONG: while ((unsigned long)stats_pow_u64(info->linexp.stepbase, curstepexp) <= bkt_lb->intlong.ulong) curstepexp++; break; case VSD_DTYPE_Q_S32: while ((s32q_t)stats_pow_u64(info->linexp.stepbase, curstepexp) <= Q_GIVAL(bkt_lb->q32.sq32)) break; case VSD_DTYPE_Q_U32: while ((u32q_t)stats_pow_u64(info->linexp.stepbase, curstepexp) <= Q_GIVAL(bkt_lb->q32.uq32)) break; case VSD_DTYPE_Q_S64: while ((s64q_t)stats_pow_u64(info->linexp.stepbase, curstepexp) <= Q_GIVAL(bkt_lb->q64.sq64)) curstepexp++; break; case VSD_DTYPE_Q_U64: while ((u64q_t)stats_pow_u64(info->linexp.stepbase, curstepexp) <= Q_GIVAL(bkt_lb->q64.uq64)) curstepexp++; break; default: break; } step = stats_pow_u64(info->linexp.stepbase, curstepexp) / info->linexp.linstepdiv; if (step == 0) step = 1; break; } default: break; } if (info->scheme == BKT_USR) { *bkt_lb = info->usr.bkts[curbkt].lb; *bkt_ub = info->usr.bkts[curbkt].ub; } else if (step != 0) { switch (info->voi_dtype) { case VSD_DTYPE_INT_S32: bkt_ub->int32.s32 += (int32_t)step; break; case VSD_DTYPE_INT_U32: bkt_ub->int32.u32 += (uint32_t)step; break; case VSD_DTYPE_INT_S64: bkt_ub->int64.s64 += (int64_t)step; break; case VSD_DTYPE_INT_U64: bkt_ub->int64.u64 += (uint64_t)step; break; case VSD_DTYPE_INT_SLONG: bkt_ub->intlong.slong += (long)step; 
break; case VSD_DTYPE_INT_ULONG: bkt_ub->intlong.ulong += (unsigned long)step; break; case VSD_DTYPE_Q_S32: error = Q_QADDI(&bkt_ub->q32.sq32, step); break; case VSD_DTYPE_Q_U32: error = Q_QADDI(&bkt_ub->q32.uq32, step); break; case VSD_DTYPE_Q_S64: error = Q_QADDI(&bkt_ub->q64.sq64, step); break; case VSD_DTYPE_Q_U64: error = Q_QADDI(&bkt_ub->q64.uq64, step); break; default: break; } } else { /* info->scheme != BKT_USR && step == 0 */ return (EINVAL); } return (error); } static uint32_t stats_vss_hist_nbkts_hlpr(struct vss_hist_hlpr_info *info) { struct voistatdata_numeric bkt_lb, bkt_ub; uint32_t nbkts; int done; if (info->scheme == BKT_USR) { /* XXXLAS: Setting info->{lb,ub} from macro is tricky. */ info->lb = info->usr.bkts[0].lb; info->ub = info->usr.bkts[info->usr.nbkts - 1].lb; } nbkts = 0; done = 0; bkt_ub = info->lb; do { bkt_lb = bkt_ub; if (stats_vss_hist_bkt_hlpr(info, nbkts++, &bkt_lb, &bkt_ub)) return (0); if (info->scheme == BKT_USR) done = (nbkts == info->usr.nbkts); else { switch (info->voi_dtype) { case VSD_DTYPE_INT_S32: done = (bkt_ub.int32.s32 > info->ub.int32.s32); break; case VSD_DTYPE_INT_U32: done = (bkt_ub.int32.u32 > info->ub.int32.u32); break; case VSD_DTYPE_INT_S64: done = (bkt_ub.int64.s64 > info->ub.int64.s64); break; case VSD_DTYPE_INT_U64: done = (bkt_ub.int64.u64 > info->ub.int64.u64); break; case VSD_DTYPE_INT_SLONG: done = (bkt_ub.intlong.slong > info->ub.intlong.slong); break; case VSD_DTYPE_INT_ULONG: done = (bkt_ub.intlong.ulong > info->ub.intlong.ulong); break; case VSD_DTYPE_Q_S32: done = Q_QGTQ(bkt_ub.q32.sq32, info->ub.q32.sq32); break; case VSD_DTYPE_Q_U32: done = Q_QGTQ(bkt_ub.q32.uq32, info->ub.q32.uq32); break; case VSD_DTYPE_Q_S64: done = Q_QGTQ(bkt_ub.q64.sq64, info->ub.q64.sq64); break; case VSD_DTYPE_Q_U64: done = Q_QGTQ(bkt_ub.q64.uq64, info->ub.q64.uq64); break; default: return (0); } } } while (!done); if (info->flags & VSD_HIST_LBOUND_INF) nbkts++; if (info->flags & VSD_HIST_UBOUND_INF) nbkts++; return (nbkts); } int stats_vss_hist_hlpr(enum vsd_dtype voi_dtype, struct voistatspec *vss, struct vss_hist_hlpr_info *info) { struct voistatdata_hist *hist; struct voistatdata_numeric bkt_lb, bkt_ub, *lbinfbktlb, *lbinfbktub, *ubinfbktlb, *ubinfbktub; uint32_t bkt, nbkts, nloop; if (vss == NULL || info == NULL || (info->flags & (VSD_HIST_LBOUND_INF|VSD_HIST_UBOUND_INF) && (info->hist_dtype == VSD_DTYPE_DVHIST32 || info->hist_dtype == VSD_DTYPE_DVHIST64))) return (EINVAL); info->voi_dtype = voi_dtype; if ((nbkts = stats_vss_hist_nbkts_hlpr(info)) == 0) return (EINVAL); switch (info->hist_dtype) { case VSD_DTYPE_CRHIST32: vss->vsdsz = HIST_NBKTS2VSDSZ(crhist32, nbkts); break; case VSD_DTYPE_DRHIST32: vss->vsdsz = HIST_NBKTS2VSDSZ(drhist32, nbkts); break; case VSD_DTYPE_DVHIST32: vss->vsdsz = HIST_NBKTS2VSDSZ(dvhist32, nbkts); break; case VSD_DTYPE_CRHIST64: vss->vsdsz = HIST_NBKTS2VSDSZ(crhist64, nbkts); break; case VSD_DTYPE_DRHIST64: vss->vsdsz = HIST_NBKTS2VSDSZ(drhist64, nbkts); break; case VSD_DTYPE_DVHIST64: vss->vsdsz = HIST_NBKTS2VSDSZ(dvhist64, nbkts); break; default: return (EINVAL); } vss->iv = stats_realloc(NULL, 0, vss->vsdsz, M_ZERO); if (vss->iv == NULL) return (ENOMEM); hist = (struct voistatdata_hist *)vss->iv; bkt_ub = info->lb; for (bkt = (info->flags & VSD_HIST_LBOUND_INF), nloop = 0; bkt < nbkts; bkt++, nloop++) { bkt_lb = bkt_ub; if (stats_vss_hist_bkt_hlpr(info, nloop, &bkt_lb, &bkt_ub)) return (EINVAL); switch (info->hist_dtype) { case VSD_DTYPE_CRHIST32: VSD(crhist32, hist)->bkts[bkt].lb = bkt_lb; break; case 
VSD_DTYPE_DRHIST32: VSD(drhist32, hist)->bkts[bkt].lb = bkt_lb; VSD(drhist32, hist)->bkts[bkt].ub = bkt_ub; break; case VSD_DTYPE_DVHIST32: VSD(dvhist32, hist)->bkts[bkt].val = bkt_lb; break; case VSD_DTYPE_CRHIST64: VSD(crhist64, hist)->bkts[bkt].lb = bkt_lb; break; case VSD_DTYPE_DRHIST64: VSD(drhist64, hist)->bkts[bkt].lb = bkt_lb; VSD(drhist64, hist)->bkts[bkt].ub = bkt_ub; break; case VSD_DTYPE_DVHIST64: VSD(dvhist64, hist)->bkts[bkt].val = bkt_lb; break; default: return (EINVAL); } } lbinfbktlb = lbinfbktub = ubinfbktlb = ubinfbktub = NULL; switch (info->hist_dtype) { case VSD_DTYPE_CRHIST32: lbinfbktlb = &VSD(crhist32, hist)->bkts[0].lb; ubinfbktlb = &VSD(crhist32, hist)->bkts[nbkts - 1].lb; break; case VSD_DTYPE_DRHIST32: lbinfbktlb = &VSD(drhist32, hist)->bkts[0].lb; lbinfbktub = &VSD(drhist32, hist)->bkts[0].ub; ubinfbktlb = &VSD(drhist32, hist)->bkts[nbkts - 1].lb; ubinfbktub = &VSD(drhist32, hist)->bkts[nbkts - 1].ub; break; case VSD_DTYPE_CRHIST64: lbinfbktlb = &VSD(crhist64, hist)->bkts[0].lb; ubinfbktlb = &VSD(crhist64, hist)->bkts[nbkts - 1].lb; break; case VSD_DTYPE_DRHIST64: lbinfbktlb = &VSD(drhist64, hist)->bkts[0].lb; lbinfbktub = &VSD(drhist64, hist)->bkts[0].ub; ubinfbktlb = &VSD(drhist64, hist)->bkts[nbkts - 1].lb; ubinfbktub = &VSD(drhist64, hist)->bkts[nbkts - 1].ub; break; case VSD_DTYPE_DVHIST32: case VSD_DTYPE_DVHIST64: break; default: return (EINVAL); } if ((info->flags & VSD_HIST_LBOUND_INF) && lbinfbktlb) { *lbinfbktlb = numeric_limits[LIM_MIN][info->voi_dtype]; /* * Assignment from numeric_limit array for Q types assigns max * possible integral/fractional value for underlying data type, * but we must set control bits for this specific histogram per * the user's choice of fractional bits, which we extract from * info->lb. */ if (info->voi_dtype == VSD_DTYPE_Q_S32 || info->voi_dtype == VSD_DTYPE_Q_U32) { /* Signedness doesn't matter for setting control bits. */ Q_SCVAL(lbinfbktlb->q32.sq32, Q_GCVAL(info->lb.q32.sq32)); } else if (info->voi_dtype == VSD_DTYPE_Q_S64 || info->voi_dtype == VSD_DTYPE_Q_U64) { /* Signedness doesn't matter for setting control bits. 
*/ Q_SCVAL(lbinfbktlb->q64.sq64, Q_GCVAL(info->lb.q64.sq64)); } if (lbinfbktub) *lbinfbktub = info->lb; } if ((info->flags & VSD_HIST_UBOUND_INF) && ubinfbktlb) { *ubinfbktlb = bkt_lb; if (ubinfbktub) { *ubinfbktub = numeric_limits[LIM_MAX][info->voi_dtype]; if (info->voi_dtype == VSD_DTYPE_Q_S32 || info->voi_dtype == VSD_DTYPE_Q_U32) { Q_SCVAL(ubinfbktub->q32.sq32, Q_GCVAL(info->lb.q32.sq32)); } else if (info->voi_dtype == VSD_DTYPE_Q_S64 || info->voi_dtype == VSD_DTYPE_Q_U64) { Q_SCVAL(ubinfbktub->q64.sq64, Q_GCVAL(info->lb.q64.sq64)); } } } return (0); } int stats_vss_tdgst_hlpr(enum vsd_dtype voi_dtype, struct voistatspec *vss, struct vss_tdgst_hlpr_info *info) { struct voistatdata_tdgst *tdgst; struct ctdth32 *ctd32tree; struct ctdth64 *ctd64tree; struct voistatdata_tdgstctd32 *ctd32; struct voistatdata_tdgstctd64 *ctd64; info->voi_dtype = voi_dtype; switch (info->tdgst_dtype) { case VSD_DTYPE_TDGSTCLUST32: vss->vsdsz = TDGST_NCTRS2VSDSZ(tdgstclust32, info->nctds); break; case VSD_DTYPE_TDGSTCLUST64: vss->vsdsz = TDGST_NCTRS2VSDSZ(tdgstclust64, info->nctds); break; default: return (EINVAL); } vss->iv = stats_realloc(NULL, 0, vss->vsdsz, M_ZERO); if (vss->iv == NULL) return (ENOMEM); tdgst = (struct voistatdata_tdgst *)vss->iv; switch (info->tdgst_dtype) { case VSD_DTYPE_TDGSTCLUST32: ctd32tree = &VSD(tdgstclust32, tdgst)->ctdtree; ARB_INIT(ctd32, ctdlnk, ctd32tree, info->nctds) { Q_INI(&ctd32->mu, 0, 0, info->prec); } break; case VSD_DTYPE_TDGSTCLUST64: ctd64tree = &VSD(tdgstclust64, tdgst)->ctdtree; ARB_INIT(ctd64, ctdlnk, ctd64tree, info->nctds) { Q_INI(&ctd64->mu, 0, 0, info->prec); } break; default: return (EINVAL); } return (0); } int stats_vss_numeric_hlpr(enum vsd_dtype voi_dtype, struct voistatspec *vss, struct vss_numeric_hlpr_info *info) { struct voistatdata_numeric iv; switch (vss->stype) { case VS_STYPE_SUM: iv = stats_ctor_vsd_numeric(0); break; case VS_STYPE_MIN: iv = numeric_limits[LIM_MAX][voi_dtype]; break; case VS_STYPE_MAX: iv = numeric_limits[LIM_MIN][voi_dtype]; break; default: return (EINVAL); } vss->iv = stats_realloc(NULL, 0, vsd_dtype2size[voi_dtype], 0); if (vss->iv == NULL) return (ENOMEM); vss->vs_dtype = voi_dtype; vss->vsdsz = vsd_dtype2size[voi_dtype]; switch (voi_dtype) { case VSD_DTYPE_INT_S32: *((int32_t *)vss->iv) = iv.int32.s32; break; case VSD_DTYPE_INT_U32: *((uint32_t *)vss->iv) = iv.int32.u32; break; case VSD_DTYPE_INT_S64: *((int64_t *)vss->iv) = iv.int64.s64; break; case VSD_DTYPE_INT_U64: *((uint64_t *)vss->iv) = iv.int64.u64; break; case VSD_DTYPE_INT_SLONG: *((long *)vss->iv) = iv.intlong.slong; break; case VSD_DTYPE_INT_ULONG: *((unsigned long *)vss->iv) = iv.intlong.ulong; break; case VSD_DTYPE_Q_S32: *((s32q_t *)vss->iv) = Q_SCVAL(iv.q32.sq32, Q_CTRLINI(info->prec)); break; case VSD_DTYPE_Q_U32: *((u32q_t *)vss->iv) = Q_SCVAL(iv.q32.uq32, Q_CTRLINI(info->prec)); break; case VSD_DTYPE_Q_S64: *((s64q_t *)vss->iv) = Q_SCVAL(iv.q64.sq64, Q_CTRLINI(info->prec)); break; case VSD_DTYPE_Q_U64: *((u64q_t *)vss->iv) = Q_SCVAL(iv.q64.uq64, Q_CTRLINI(info->prec)); break; default: break; } return (0); } int stats_vss_hlpr_init(enum vsd_dtype voi_dtype, uint32_t nvss, struct voistatspec *vss) { int i, ret; for (i = nvss - 1; i >= 0; i--) { if (vss[i].hlpr && (ret = vss[i].hlpr(voi_dtype, &vss[i], vss[i].hlprinfo)) != 0) return (ret); } return (0); } void stats_vss_hlpr_cleanup(uint32_t nvss, struct voistatspec *vss) { int i; for (i = nvss - 1; i >= 0; i--) { if (vss[i].hlpr) { stats_free((void *)vss[i].iv); vss[i].iv = NULL; } } } int 
stats_tpl_fetch(int tpl_id, struct statsblob_tpl **tpl) { int error; error = 0; TPL_LIST_WLOCK(); if (tpl_id < 0 || tpl_id >= (int)ntpl) { error = ENOENT; } else { *tpl = tpllist[tpl_id]; /* XXXLAS: Acquire refcount on tpl. */ } TPL_LIST_WUNLOCK(); return (error); } int stats_tpl_fetch_allocid(const char *name, uint32_t hash) { int i, tpl_id; tpl_id = -ESRCH; TPL_LIST_RLOCK(); for (i = ntpl - 1; i >= 0; i--) { if (name != NULL) { if (strlen(name) == strlen(tpllist[i]->mb->tplname) && strncmp(name, tpllist[i]->mb->tplname, TPL_MAX_NAME_LEN) == 0 && (!hash || hash == tpllist[i]->mb->tplhash)) { tpl_id = i; break; } } else if (hash == tpllist[i]->mb->tplhash) { tpl_id = i; break; } } TPL_LIST_RUNLOCK(); return (tpl_id); } int stats_tpl_id2name(uint32_t tpl_id, char *buf, size_t len) { int error; error = 0; TPL_LIST_RLOCK(); if (tpl_id < ntpl) { if (buf != NULL && len > strlen(tpllist[tpl_id]->mb->tplname)) strlcpy(buf, tpllist[tpl_id]->mb->tplname, len); else error = EOVERFLOW; } else error = ENOENT; TPL_LIST_RUNLOCK(); return (error); } int stats_tpl_sample_rollthedice(struct stats_tpl_sample_rate *rates, int nrates, void *seed_bytes, size_t seed_len) { uint32_t cum_pct, rnd_pct; int i; cum_pct = 0; /* * Choose a pseudorandom or seeded number in range [0,100] and use * it to make a sampling decision and template selection where required. * If no seed is supplied, a PRNG is used to generate a pseudorandom * number so that every selection is independent. If a seed is supplied, * the caller desires random selection across different seeds, but * deterministic selection given the same seed. This is achieved by * hashing the seed and using the hash as the random number source. * * XXXLAS: Characterise hash function output distribution. */ if (seed_bytes == NULL) rnd_pct = random() / (INT32_MAX / 100); else rnd_pct = hash32_buf(seed_bytes, seed_len, 0) / (UINT32_MAX / 100U); /* * We map the randomly selected percentage on to the interval [0,100] * consisting of the cumulatively summed template sampling percentages. * The difference between the cumulative sum of all template sampling * percentages and 100 is treated as a NULL assignment i.e. no stats * template will be assigned, and -1 returned instead. */ for (i = 0; i < nrates; i++) { cum_pct += rates[i].tpl_sample_pct; KASSERT(cum_pct <= 100, ("%s cum_pct %u > 100", __func__, cum_pct)); if (rnd_pct > cum_pct || rates[i].tpl_sample_pct == 0) continue; return (rates[i].tpl_slot_id); } return (-1); } int stats_v1_blob_clone(struct statsblobv1 **dst, size_t dstmaxsz, struct statsblobv1 *src, uint32_t flags) { int error; error = 0; if (src == NULL || dst == NULL || src->cursz < sizeof(struct statsblob) || ((flags & SB_CLONE_ALLOCDST) && (flags & (SB_CLONE_USRDSTNOFAULT | SB_CLONE_USRDST)))) { error = EINVAL; } else if (flags & SB_CLONE_ALLOCDST) { *dst = stats_realloc(NULL, 0, src->cursz, 0); if (*dst) (*dst)->maxsz = dstmaxsz = src->cursz; else error = ENOMEM; } else if (*dst == NULL || dstmaxsz < sizeof(struct statsblob)) { error = EINVAL; } if (!error) { size_t postcurszlen; /* * Clone src into dst except for the maxsz field. If dst is too * small to hold all of src, only copy src's header and return * EOVERFLOW. 
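 * The destination's own maxsz is preserved by copying the header only up
 * to offsetof(struct statsblob, maxsz) and then resuming the copy from
 * the cursz field onwards, as done below.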
*/ #ifdef _KERNEL if (flags & SB_CLONE_USRDSTNOFAULT) copyout_nofault(src, *dst, offsetof(struct statsblob, maxsz)); else if (flags & SB_CLONE_USRDST) copyout(src, *dst, offsetof(struct statsblob, maxsz)); else #endif memcpy(*dst, src, offsetof(struct statsblob, maxsz)); if (dstmaxsz >= src->cursz) { postcurszlen = src->cursz - offsetof(struct statsblob, cursz); } else { error = EOVERFLOW; postcurszlen = sizeof(struct statsblob) - offsetof(struct statsblob, cursz); } #ifdef _KERNEL if (flags & SB_CLONE_USRDSTNOFAULT) copyout_nofault(&(src->cursz), &((*dst)->cursz), postcurszlen); else if (flags & SB_CLONE_USRDST) copyout(&(src->cursz), &((*dst)->cursz), postcurszlen); else #endif memcpy(&((*dst)->cursz), &(src->cursz), postcurszlen); } return (error); } int stats_v1_tpl_alloc(const char *name, uint32_t flags __unused) { struct statsblobv1_tpl *tpl, **newtpllist; struct statsblobv1 *tpl_sb; struct metablob *tpl_mb; int tpl_id; if (name != NULL && strlen(name) > TPL_MAX_NAME_LEN) return (-EINVAL); if (name != NULL && stats_tpl_fetch_allocid(name, 0) >= 0) return (-EEXIST); tpl = stats_realloc(NULL, 0, sizeof(struct statsblobv1_tpl), M_ZERO); tpl_mb = stats_realloc(NULL, 0, sizeof(struct metablob), M_ZERO); tpl_sb = stats_realloc(NULL, 0, sizeof(struct statsblobv1), M_ZERO); if (tpl_mb != NULL && name != NULL) tpl_mb->tplname = stats_strdup(name, 0); if (tpl == NULL || tpl_sb == NULL || tpl_mb == NULL || tpl_mb->tplname == NULL) { stats_free(tpl); stats_free(tpl_sb); if (tpl_mb != NULL) { stats_free(tpl_mb->tplname); stats_free(tpl_mb); } return (-ENOMEM); } tpl->mb = tpl_mb; tpl->sb = tpl_sb; tpl_sb->abi = STATS_ABI_V1; tpl_sb->endian = #if BYTE_ORDER == LITTLE_ENDIAN SB_LE; #elif BYTE_ORDER == BIG_ENDIAN SB_BE; #else SB_UE; #endif tpl_sb->cursz = tpl_sb->maxsz = sizeof(struct statsblobv1); tpl_sb->stats_off = tpl_sb->statsdata_off = sizeof(struct statsblobv1); TPL_LIST_WLOCK(); newtpllist = stats_realloc(tpllist, ntpl * sizeof(void *), (ntpl + 1) * sizeof(void *), 0); if (newtpllist != NULL) { tpl_id = ntpl++; tpllist = (struct statsblob_tpl **)newtpllist; tpllist[tpl_id] = (struct statsblob_tpl *)tpl; stats_tpl_update_hash(tpllist[tpl_id]); } else { stats_free(tpl); stats_free(tpl_sb); if (tpl_mb != NULL) { stats_free(tpl_mb->tplname); stats_free(tpl_mb); } tpl_id = -ENOMEM; } TPL_LIST_WUNLOCK(); return (tpl_id); } int stats_v1_tpl_add_voistats(uint32_t tpl_id, int32_t voi_id, const char *voi_name, enum vsd_dtype voi_dtype, uint32_t nvss, struct voistatspec *vss, uint32_t flags) { struct voi *voi; struct voistat *tmpstat; struct statsblobv1 *tpl_sb; struct metablob *tpl_mb; int error, i, newstatdataidx, newvoibytes, newvoistatbytes, newvoistatdatabytes, newvoistatmaxid; uint32_t nbytes; if (voi_id < 0 || voi_dtype == 0 || voi_dtype >= VSD_NUM_DTYPES || nvss == 0 || vss == NULL) return (EINVAL); error = nbytes = newvoibytes = newvoistatbytes = newvoistatdatabytes = 0; newvoistatmaxid = -1; /* Calculate the number of bytes required for the new voistats. */ for (i = nvss - 1; i >= 0; i--) { if (vss[i].stype == 0 || vss[i].stype >= VS_NUM_STYPES || vss[i].vs_dtype == 0 || vss[i].vs_dtype >= VSD_NUM_DTYPES || vss[i].iv == NULL || vss[i].vsdsz == 0) return (EINVAL); if ((int)vss[i].stype > newvoistatmaxid) newvoistatmaxid = vss[i].stype; newvoistatdatabytes += vss[i].vsdsz; } if (flags & SB_VOI_RELUPDATE) { /* XXXLAS: VOI state bytes may need to vary based on stat types. 
*/ newvoistatdatabytes += sizeof(struct voistatdata_voistate); } nbytes += newvoistatdatabytes; TPL_LIST_WLOCK(); if (tpl_id < ntpl) { tpl_sb = (struct statsblobv1 *)tpllist[tpl_id]->sb; tpl_mb = tpllist[tpl_id]->mb; if (voi_id >= NVOIS(tpl_sb) || tpl_sb->vois[voi_id].id == -1) { /* Adding a new VOI and associated stats. */ if (voi_id >= NVOIS(tpl_sb)) { /* We need to grow the tpl_sb->vois array. */ newvoibytes = (voi_id - (NVOIS(tpl_sb) - 1)) * sizeof(struct voi); nbytes += newvoibytes; } newvoistatbytes = (newvoistatmaxid + 1) * sizeof(struct voistat); } else { /* Adding stats to an existing VOI. */ if (newvoistatmaxid > tpl_sb->vois[voi_id].voistatmaxid) { newvoistatbytes = (newvoistatmaxid - tpl_sb->vois[voi_id].voistatmaxid) * sizeof(struct voistat); } /* XXXLAS: KPI does not yet support expanding VOIs. */ error = EOPNOTSUPP; } nbytes += newvoistatbytes; if (!error && newvoibytes > 0) { struct voi_meta *voi_meta = tpl_mb->voi_meta; voi_meta = stats_realloc(voi_meta, voi_meta == NULL ? 0 : NVOIS(tpl_sb) * sizeof(struct voi_meta), (1 + voi_id) * sizeof(struct voi_meta), M_ZERO); if (voi_meta == NULL) error = ENOMEM; else tpl_mb->voi_meta = voi_meta; } if (!error) { /* NB: Resizing can change where tpl_sb points. */ error = stats_v1_blob_expand(&tpl_sb, newvoibytes, newvoistatbytes, newvoistatdatabytes); } if (!error) { tpl_mb->voi_meta[voi_id].name = stats_strdup(voi_name, 0); if (tpl_mb->voi_meta[voi_id].name == NULL) error = ENOMEM; } if (!error) { /* Update the template list with the resized pointer. */ tpllist[tpl_id]->sb = (struct statsblob *)tpl_sb; /* Update the template. */ voi = &tpl_sb->vois[voi_id]; if (voi->id < 0) { /* VOI is new and needs to be initialised. */ voi->id = voi_id; voi->dtype = voi_dtype; voi->stats_off = tpl_sb->stats_off; if (flags & SB_VOI_RELUPDATE) voi->flags |= VOI_REQSTATE; } else { /* * XXXLAS: When this else block is written, the * "KPI does not yet support expanding VOIs" * error earlier in this function can be * removed. What is required here is to shuffle * the voistat array such that the new stats for * the voi are contiguous, which will displace * stats for other vois that reside after the * voi being updated. The other vois then need * to have their stats_off adjusted post * shuffle. */ } voi->voistatmaxid = newvoistatmaxid; newstatdataidx = 0; if (voi->flags & VOI_REQSTATE) { /* Initialise the voistate stat in slot 0. */ tmpstat = BLOB_OFFSET(tpl_sb, voi->stats_off); tmpstat->stype = VS_STYPE_VOISTATE; tmpstat->flags = 0; tmpstat->dtype = VSD_DTYPE_VOISTATE; newstatdataidx = tmpstat->dsz = sizeof(struct voistatdata_numeric); tmpstat->data_off = tpl_sb->statsdata_off; } for (i = 0; (uint32_t)i < nvss; i++) { tmpstat = BLOB_OFFSET(tpl_sb, voi->stats_off + (vss[i].stype * sizeof(struct voistat))); KASSERT(tmpstat->stype < 0, ("voistat %p " "already initialised", tmpstat)); tmpstat->stype = vss[i].stype; tmpstat->flags = vss[i].flags; tmpstat->dtype = vss[i].vs_dtype; tmpstat->dsz = vss[i].vsdsz; tmpstat->data_off = tpl_sb->statsdata_off + newstatdataidx; memcpy(BLOB_OFFSET(tpl_sb, tmpstat->data_off), vss[i].iv, vss[i].vsdsz); newstatdataidx += vss[i].vsdsz; } /* Update the template version hash. */ stats_tpl_update_hash(tpllist[tpl_id]); /* XXXLAS: Confirm tpl name/hash pair remains unique. 
*/ } } else error = EINVAL; TPL_LIST_WUNLOCK(); return (error); } struct statsblobv1 * stats_v1_blob_alloc(uint32_t tpl_id, uint32_t flags __unused) { struct statsblobv1 *sb; int error; sb = NULL; TPL_LIST_RLOCK(); if (tpl_id < ntpl) { sb = stats_realloc(NULL, 0, tpllist[tpl_id]->sb->maxsz, 0); if (sb != NULL) { sb->maxsz = tpllist[tpl_id]->sb->maxsz; error = stats_v1_blob_init_locked(sb, tpl_id, 0); } else error = ENOMEM; if (error) { stats_free(sb); sb = NULL; } } TPL_LIST_RUNLOCK(); return (sb); } void stats_v1_blob_destroy(struct statsblobv1 *sb) { stats_free(sb); } int stats_v1_voistat_fetch_dptr(struct statsblobv1 *sb, int32_t voi_id, enum voi_stype stype, enum vsd_dtype *retdtype, struct voistatdata **retvsd, size_t *retvsdsz) { struct voi *v; struct voistat *vs; if (retvsd == NULL || sb == NULL || sb->abi != STATS_ABI_V1 || voi_id >= NVOIS(sb)) return (EINVAL); v = &sb->vois[voi_id]; if ((__typeof(v->voistatmaxid))stype > v->voistatmaxid) return (EINVAL); vs = BLOB_OFFSET(sb, v->stats_off + (stype * sizeof(struct voistat))); *retvsd = BLOB_OFFSET(sb, vs->data_off); if (retdtype != NULL) *retdtype = vs->dtype; if (retvsdsz != NULL) *retvsdsz = vs->dsz; return (0); } int stats_v1_blob_init(struct statsblobv1 *sb, uint32_t tpl_id, uint32_t flags) { int error; error = 0; TPL_LIST_RLOCK(); if (sb == NULL || tpl_id >= ntpl) { error = EINVAL; } else { error = stats_v1_blob_init_locked(sb, tpl_id, flags); } TPL_LIST_RUNLOCK(); return (error); } static inline int stats_v1_blob_init_locked(struct statsblobv1 *sb, uint32_t tpl_id, uint32_t flags __unused) { int error; TPL_LIST_RLOCK_ASSERT(); error = (sb->maxsz >= tpllist[tpl_id]->sb->cursz) ? 0 : EOVERFLOW; KASSERT(!error, ("sb %d instead of %d bytes", sb->maxsz, tpllist[tpl_id]->sb->cursz)); if (!error) { memcpy(sb, tpllist[tpl_id]->sb, tpllist[tpl_id]->sb->cursz); sb->created = sb->lastrst = stats_sbinuptime(); sb->tplhash = tpllist[tpl_id]->mb->tplhash; } return (error); } static int stats_v1_blob_expand(struct statsblobv1 **sbpp, int newvoibytes, int newvoistatbytes, int newvoistatdatabytes) { struct statsblobv1 *sb; struct voi *tmpvoi; struct voistat *tmpvoistat, *voistat_array; int error, i, idxnewvois, idxnewvoistats, nbytes, nvoistats; KASSERT(newvoibytes % sizeof(struct voi) == 0, ("Bad newvoibytes %d", newvoibytes)); KASSERT(newvoistatbytes % sizeof(struct voistat) == 0, ("Bad newvoistatbytes %d", newvoistatbytes)); error = ((newvoibytes % sizeof(struct voi) == 0) && (newvoistatbytes % sizeof(struct voistat) == 0)) ? 0 : EINVAL; sb = *sbpp; nbytes = newvoibytes + newvoistatbytes + newvoistatdatabytes; /* * XXXLAS: Required until we gain support for flags which alter the * units of size/offset fields in key structs. */ if (!error && ((((int)sb->cursz) + nbytes) > SB_V1_MAXSZ)) error = EFBIG; if (!error && (sb->cursz + nbytes > sb->maxsz)) { /* Need to expand our blob. */ sb = stats_realloc(sb, sb->maxsz, sb->cursz + nbytes, M_ZERO); if (sb != NULL) { sb->maxsz = sb->cursz + nbytes; *sbpp = sb; } else error = ENOMEM; } if (!error) { /* * Shuffle memory within the expanded blob working from the end * backwards, leaving gaps for the new voistat and voistatdata * structs at the beginning of their respective blob regions, * and for the new voi structs at the end of their blob region. 
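 *
 * Illustratively, the post-expansion layout is:
 *
 *	[hdr][old vois][new vois][new voistats][old voistats]
 *	     [new voistatdata][old voistatdata]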
*/ memmove(BLOB_OFFSET(sb, sb->statsdata_off + nbytes), BLOB_OFFSET(sb, sb->statsdata_off), sb->cursz - sb->statsdata_off); memmove(BLOB_OFFSET(sb, sb->stats_off + newvoibytes + newvoistatbytes), BLOB_OFFSET(sb, sb->stats_off), sb->statsdata_off - sb->stats_off); /* First index of new voi/voistat structs to be initialised. */ idxnewvois = NVOIS(sb); idxnewvoistats = (newvoistatbytes / sizeof(struct voistat)) - 1; /* Update housekeeping variables and offsets. */ sb->cursz += nbytes; sb->stats_off += newvoibytes; sb->statsdata_off += newvoibytes + newvoistatbytes; /* XXXLAS: Zeroing not strictly needed but aids debugging. */ memset(&sb->vois[idxnewvois], '\0', newvoibytes); memset(BLOB_OFFSET(sb, sb->stats_off), '\0', newvoistatbytes); memset(BLOB_OFFSET(sb, sb->statsdata_off), '\0', newvoistatdatabytes); /* Initialise new voi array members and update offsets. */ for (i = 0; i < NVOIS(sb); i++) { tmpvoi = &sb->vois[i]; if (i >= idxnewvois) { tmpvoi->id = tmpvoi->voistatmaxid = -1; } else if (tmpvoi->id > -1) { tmpvoi->stats_off += newvoibytes + newvoistatbytes; } } /* Initialise new voistat array members and update offsets. */ nvoistats = (sb->statsdata_off - sb->stats_off) / sizeof(struct voistat); voistat_array = BLOB_OFFSET(sb, sb->stats_off); for (i = 0; i < nvoistats; i++) { tmpvoistat = &voistat_array[i]; if (i <= idxnewvoistats) { tmpvoistat->stype = -1; } else if (tmpvoistat->stype > -1) { tmpvoistat->data_off += nbytes; } } } return (error); } static void stats_v1_blob_finalise(struct statsblobv1 *sb __unused) { /* XXXLAS: Fill this in. */ } static void stats_v1_blob_iter(struct statsblobv1 *sb, stats_v1_blob_itercb_t icb, void *usrctx, uint32_t flags) { struct voi *v; struct voistat *vs; struct sb_iter_ctx ctx; int i, j, firstvoi; ctx.usrctx = usrctx; ctx.flags |= SB_IT_FIRST_CB; ctx.flags &= ~(SB_IT_FIRST_VOI | SB_IT_LAST_VOI | SB_IT_FIRST_VOISTAT | SB_IT_LAST_VOISTAT); firstvoi = 1; for (i = 0; i < NVOIS(sb); i++) { v = &sb->vois[i]; ctx.vslot = i; ctx.vsslot = -1; ctx.flags |= SB_IT_FIRST_VOISTAT; if (firstvoi) ctx.flags |= SB_IT_FIRST_VOI; else if (i == (NVOIS(sb) - 1)) ctx.flags |= SB_IT_LAST_VOI | SB_IT_LAST_CB; if (v->id < 0 && (flags & SB_IT_NULLVOI)) { if (icb(sb, v, NULL, &ctx)) return; firstvoi = 0; ctx.flags &= ~SB_IT_FIRST_CB; } /* If NULL voi, v->voistatmaxid == -1 */ for (j = 0; j <= v->voistatmaxid; j++) { vs = &((struct voistat *)BLOB_OFFSET(sb, v->stats_off))[j]; if (vs->stype < 0 && !(flags & SB_IT_NULLVOISTAT)) continue; if (j == v->voistatmaxid) { ctx.flags |= SB_IT_LAST_VOISTAT; if (i == (NVOIS(sb) - 1)) ctx.flags |= SB_IT_LAST_CB; } else ctx.flags &= ~SB_IT_LAST_CB; ctx.vsslot = j; if (icb(sb, v, vs, &ctx)) return; ctx.flags &= ~(SB_IT_FIRST_CB | SB_IT_FIRST_VOISTAT | SB_IT_LAST_VOISTAT); } ctx.flags &= ~(SB_IT_FIRST_VOI | SB_IT_LAST_VOI); } } static inline void stats_voistatdata_tdgst_tostr(enum vsd_dtype voi_dtype __unused, const struct voistatdata_tdgst *tdgst, enum vsd_dtype tdgst_dtype, size_t tdgst_dsz __unused, enum sb_str_fmt fmt, struct sbuf *buf, int objdump) { const struct ctdth32 *ctd32tree; const struct ctdth64 *ctd64tree; const struct voistatdata_tdgstctd32 *ctd32; const struct voistatdata_tdgstctd64 *ctd64; const char *fmtstr; uint64_t smplcnt, compcnt; int is32bit, qmaxstrlen; uint16_t maxctds, curctds; switch (tdgst_dtype) { case VSD_DTYPE_TDGSTCLUST32: smplcnt = CONSTVSD(tdgstclust32, tdgst)->smplcnt; compcnt = CONSTVSD(tdgstclust32, tdgst)->compcnt; maxctds = ARB_MAXNODES(&CONSTVSD(tdgstclust32, tdgst)->ctdtree); curctds = 
ARB_CURNODES(&CONSTVSD(tdgstclust32, tdgst)->ctdtree); ctd32tree = &CONSTVSD(tdgstclust32, tdgst)->ctdtree; ctd32 = (objdump ? ARB_CNODE(ctd32tree, 0) : ARB_CMIN(ctdth32, ctd32tree)); qmaxstrlen = (ctd32 == NULL) ? 1 : Q_MAXSTRLEN(ctd32->mu, 10); is32bit = 1; ctd64tree = NULL; ctd64 = NULL; break; case VSD_DTYPE_TDGSTCLUST64: smplcnt = CONSTVSD(tdgstclust64, tdgst)->smplcnt; compcnt = CONSTVSD(tdgstclust64, tdgst)->compcnt; maxctds = ARB_MAXNODES(&CONSTVSD(tdgstclust64, tdgst)->ctdtree); curctds = ARB_CURNODES(&CONSTVSD(tdgstclust64, tdgst)->ctdtree); ctd64tree = &CONSTVSD(tdgstclust64, tdgst)->ctdtree; ctd64 = (objdump ? ARB_CNODE(ctd64tree, 0) : ARB_CMIN(ctdth64, ctd64tree)); qmaxstrlen = (ctd64 == NULL) ? 1 : Q_MAXSTRLEN(ctd64->mu, 10); is32bit = 0; ctd32tree = NULL; ctd32 = NULL; break; default: return; } switch (fmt) { case SB_STRFMT_FREEFORM: fmtstr = "smplcnt=%ju, compcnt=%ju, maxctds=%hu, nctds=%hu"; break; case SB_STRFMT_JSON: default: fmtstr = "\"smplcnt\":%ju,\"compcnt\":%ju,\"maxctds\":%hu," "\"nctds\":%hu,\"ctds\":["; break; } sbuf_printf(buf, fmtstr, (uintmax_t)smplcnt, (uintmax_t)compcnt, maxctds, curctds); while ((is32bit ? NULL != ctd32 : NULL != ctd64)) { char qstr[qmaxstrlen]; switch (fmt) { case SB_STRFMT_FREEFORM: fmtstr = "\n\t\t\t\t"; break; case SB_STRFMT_JSON: default: fmtstr = "{"; break; } sbuf_cat(buf, fmtstr); if (objdump) { switch (fmt) { case SB_STRFMT_FREEFORM: fmtstr = "ctd[%hu]."; break; case SB_STRFMT_JSON: default: fmtstr = "\"ctd\":%hu,"; break; } sbuf_printf(buf, fmtstr, is32bit ? ARB_SELFIDX(ctd32tree, ctd32) : ARB_SELFIDX(ctd64tree, ctd64)); } switch (fmt) { case SB_STRFMT_FREEFORM: fmtstr = "{mu="; break; case SB_STRFMT_JSON: default: fmtstr = "\"mu\":"; break; } sbuf_cat(buf, fmtstr); Q_TOSTR((is32bit ? ctd32->mu : ctd64->mu), -1, 10, qstr, sizeof(qstr)); sbuf_cat(buf, qstr); switch (fmt) { case SB_STRFMT_FREEFORM: fmtstr = is32bit ? ",cnt=%u}" : ",cnt=%ju}"; break; case SB_STRFMT_JSON: default: fmtstr = is32bit ? ",\"cnt\":%u}" : ",\"cnt\":%ju}"; break; } sbuf_printf(buf, fmtstr, is32bit ? ctd32->cnt : (uintmax_t)ctd64->cnt); if (is32bit) ctd32 = (objdump ? ARB_CNODE(ctd32tree, ARB_SELFIDX(ctd32tree, ctd32) + 1) : ARB_CNEXT(ctdth32, ctd32tree, ctd32)); else ctd64 = (objdump ? ARB_CNODE(ctd64tree, ARB_SELFIDX(ctd64tree, ctd64) + 1) : ARB_CNEXT(ctdth64, ctd64tree, ctd64)); if (fmt == SB_STRFMT_JSON && (is32bit ? 
NULL != ctd32 : NULL != ctd64)) sbuf_putc(buf, ','); } if (fmt == SB_STRFMT_JSON) sbuf_cat(buf, "]"); } static inline void stats_voistatdata_hist_tostr(enum vsd_dtype voi_dtype, const struct voistatdata_hist *hist, enum vsd_dtype hist_dtype, size_t hist_dsz, enum sb_str_fmt fmt, struct sbuf *buf, int objdump) { const struct voistatdata_numeric *bkt_lb, *bkt_ub; const char *fmtstr; int is32bit; uint16_t i, nbkts; switch (hist_dtype) { case VSD_DTYPE_CRHIST32: nbkts = HIST_VSDSZ2NBKTS(crhist32, hist_dsz); is32bit = 1; break; case VSD_DTYPE_DRHIST32: nbkts = HIST_VSDSZ2NBKTS(drhist32, hist_dsz); is32bit = 1; break; case VSD_DTYPE_DVHIST32: nbkts = HIST_VSDSZ2NBKTS(dvhist32, hist_dsz); is32bit = 1; break; case VSD_DTYPE_CRHIST64: nbkts = HIST_VSDSZ2NBKTS(crhist64, hist_dsz); is32bit = 0; break; case VSD_DTYPE_DRHIST64: nbkts = HIST_VSDSZ2NBKTS(drhist64, hist_dsz); is32bit = 0; break; case VSD_DTYPE_DVHIST64: nbkts = HIST_VSDSZ2NBKTS(dvhist64, hist_dsz); is32bit = 0; break; default: return; } switch (fmt) { case SB_STRFMT_FREEFORM: fmtstr = "nbkts=%hu, "; break; case SB_STRFMT_JSON: default: fmtstr = "\"nbkts\":%hu,"; break; } sbuf_printf(buf, fmtstr, nbkts); switch (fmt) { case SB_STRFMT_FREEFORM: fmtstr = (is32bit ? "oob=%u" : "oob=%ju"); break; case SB_STRFMT_JSON: default: fmtstr = (is32bit ? "\"oob\":%u,\"bkts\":[" : "\"oob\":%ju,\"bkts\":["); break; } sbuf_printf(buf, fmtstr, is32bit ? VSD_CONSTHIST_FIELDVAL(hist, hist_dtype, oob) : (uintmax_t)VSD_CONSTHIST_FIELDVAL(hist, hist_dtype, oob)); for (i = 0; i < nbkts; i++) { switch (hist_dtype) { case VSD_DTYPE_CRHIST32: case VSD_DTYPE_CRHIST64: bkt_lb = VSD_CONSTCRHIST_FIELDPTR(hist, hist_dtype, bkts[i].lb); if (i < nbkts - 1) bkt_ub = VSD_CONSTCRHIST_FIELDPTR(hist, hist_dtype, bkts[i + 1].lb); else bkt_ub = &numeric_limits[LIM_MAX][voi_dtype]; break; case VSD_DTYPE_DRHIST32: case VSD_DTYPE_DRHIST64: bkt_lb = VSD_CONSTDRHIST_FIELDPTR(hist, hist_dtype, bkts[i].lb); bkt_ub = VSD_CONSTDRHIST_FIELDPTR(hist, hist_dtype, bkts[i].ub); break; case VSD_DTYPE_DVHIST32: case VSD_DTYPE_DVHIST64: bkt_lb = bkt_ub = VSD_CONSTDVHIST_FIELDPTR(hist, hist_dtype, bkts[i].val); break; default: break; } switch (fmt) { case SB_STRFMT_FREEFORM: fmtstr = "\n\t\t\t\t"; break; case SB_STRFMT_JSON: default: fmtstr = "{"; break; } sbuf_cat(buf, fmtstr); if (objdump) { switch (fmt) { case SB_STRFMT_FREEFORM: fmtstr = "bkt[%hu]."; break; case SB_STRFMT_JSON: default: fmtstr = "\"bkt\":%hu,"; break; } sbuf_printf(buf, fmtstr, i); } switch (fmt) { case SB_STRFMT_FREEFORM: fmtstr = "{lb="; break; case SB_STRFMT_JSON: default: fmtstr = "\"lb\":"; break; } sbuf_cat(buf, fmtstr); stats_voistatdata_tostr((const struct voistatdata *)bkt_lb, voi_dtype, voi_dtype, sizeof(struct voistatdata_numeric), fmt, buf, objdump); switch (fmt) { case SB_STRFMT_FREEFORM: fmtstr = ",ub="; break; case SB_STRFMT_JSON: default: fmtstr = ",\"ub\":"; break; } sbuf_cat(buf, fmtstr); stats_voistatdata_tostr((const struct voistatdata *)bkt_ub, voi_dtype, voi_dtype, sizeof(struct voistatdata_numeric), fmt, buf, objdump); switch (fmt) { case SB_STRFMT_FREEFORM: fmtstr = is32bit ? ",cnt=%u}" : ",cnt=%ju}"; break; case SB_STRFMT_JSON: default: fmtstr = is32bit ? ",\"cnt\":%u}" : ",\"cnt\":%ju}"; break; } sbuf_printf(buf, fmtstr, is32bit ? 
VSD_CONSTHIST_FIELDVAL(hist, hist_dtype, bkts[i].cnt) : (uintmax_t)VSD_CONSTHIST_FIELDVAL(hist, hist_dtype, bkts[i].cnt)); if (fmt == SB_STRFMT_JSON && i < nbkts - 1) sbuf_putc(buf, ','); } if (fmt == SB_STRFMT_JSON) sbuf_cat(buf, "]"); } int stats_voistatdata_tostr(const struct voistatdata *vsd, enum vsd_dtype voi_dtype, enum vsd_dtype vsd_dtype, size_t vsd_sz, enum sb_str_fmt fmt, struct sbuf *buf, int objdump) { const char *fmtstr; if (vsd == NULL || buf == NULL || voi_dtype >= VSD_NUM_DTYPES || vsd_dtype >= VSD_NUM_DTYPES || fmt >= SB_STRFMT_NUM_FMTS) return (EINVAL); switch (vsd_dtype) { case VSD_DTYPE_VOISTATE: switch (fmt) { case SB_STRFMT_FREEFORM: fmtstr = "prev="; break; case SB_STRFMT_JSON: default: fmtstr = "\"prev\":"; break; } sbuf_cat(buf, fmtstr); /* * Render prev by passing it as *vsd and voi_dtype as vsd_dtype. */ stats_voistatdata_tostr( (const struct voistatdata *)&CONSTVSD(voistate, vsd)->prev, voi_dtype, voi_dtype, vsd_sz, fmt, buf, objdump); break; case VSD_DTYPE_INT_S32: sbuf_printf(buf, "%d", vsd->int32.s32); break; case VSD_DTYPE_INT_U32: sbuf_printf(buf, "%u", vsd->int32.u32); break; case VSD_DTYPE_INT_S64: sbuf_printf(buf, "%jd", (intmax_t)vsd->int64.s64); break; case VSD_DTYPE_INT_U64: sbuf_printf(buf, "%ju", (uintmax_t)vsd->int64.u64); break; case VSD_DTYPE_INT_SLONG: sbuf_printf(buf, "%ld", vsd->intlong.slong); break; case VSD_DTYPE_INT_ULONG: sbuf_printf(buf, "%lu", vsd->intlong.ulong); break; case VSD_DTYPE_Q_S32: { char qstr[Q_MAXSTRLEN(vsd->q32.sq32, 10)]; Q_TOSTR((s32q_t)vsd->q32.sq32, -1, 10, qstr, sizeof(qstr)); sbuf_cat(buf, qstr); } break; case VSD_DTYPE_Q_U32: { char qstr[Q_MAXSTRLEN(vsd->q32.uq32, 10)]; Q_TOSTR((u32q_t)vsd->q32.uq32, -1, 10, qstr, sizeof(qstr)); sbuf_cat(buf, qstr); } break; case VSD_DTYPE_Q_S64: { char qstr[Q_MAXSTRLEN(vsd->q64.sq64, 10)]; Q_TOSTR((s64q_t)vsd->q64.sq64, -1, 10, qstr, sizeof(qstr)); sbuf_cat(buf, qstr); } break; case VSD_DTYPE_Q_U64: { char qstr[Q_MAXSTRLEN(vsd->q64.uq64, 10)]; Q_TOSTR((u64q_t)vsd->q64.uq64, -1, 10, qstr, sizeof(qstr)); sbuf_cat(buf, qstr); } break; case VSD_DTYPE_CRHIST32: case VSD_DTYPE_DRHIST32: case VSD_DTYPE_DVHIST32: case VSD_DTYPE_CRHIST64: case VSD_DTYPE_DRHIST64: case VSD_DTYPE_DVHIST64: stats_voistatdata_hist_tostr(voi_dtype, CONSTVSD(hist, vsd), vsd_dtype, vsd_sz, fmt, buf, objdump); break; case VSD_DTYPE_TDGSTCLUST32: case VSD_DTYPE_TDGSTCLUST64: stats_voistatdata_tdgst_tostr(voi_dtype, CONSTVSD(tdgst, vsd), vsd_dtype, vsd_sz, fmt, buf, objdump); break; default: break; } return (sbuf_error(buf)); } static void stats_v1_itercb_tostr_freeform(struct statsblobv1 *sb, struct voi *v, struct voistat *vs, struct sb_iter_ctx *ctx) { struct sb_tostrcb_ctx *sctx; struct metablob *tpl_mb; struct sbuf *buf; void *vsd; uint8_t dump; sctx = ctx->usrctx; buf = sctx->buf; tpl_mb = sctx->tpl ? sctx->tpl->mb : NULL; dump = ((sctx->flags & SB_TOSTR_OBJDUMP) != 0); if (ctx->flags & SB_IT_FIRST_CB) { sbuf_printf(buf, "struct statsblobv1@%p", sb); if (dump) { sbuf_printf(buf, ", abi=%hhu, endian=%hhu, maxsz=%hu, " "cursz=%hu, created=%jd, lastrst=%jd, flags=0x%04hx, " "stats_off=%hu, statsdata_off=%hu", sb->abi, sb->endian, sb->maxsz, sb->cursz, sb->created, sb->lastrst, sb->flags, sb->stats_off, sb->statsdata_off); } sbuf_printf(buf, ", tplhash=%u", sb->tplhash); } if (ctx->flags & SB_IT_FIRST_VOISTAT) { sbuf_printf(buf, "\n\tvois[%hd]: id=%hd", ctx->vslot, v->id); if (v->id < 0) return; sbuf_printf(buf, ", name=\"%s\"", (tpl_mb == NULL) ? 
"" : tpl_mb->voi_meta[v->id].name); if (dump) sbuf_printf(buf, ", flags=0x%04hx, dtype=%s, " "voistatmaxid=%hhd, stats_off=%hu", v->flags, vsd_dtype2name[v->dtype], v->voistatmaxid, v->stats_off); } if (!dump && vs->stype <= 0) return; sbuf_printf(buf, "\n\t\tvois[%hd]stat[%hhd]: stype=", v->id, ctx->vsslot); if (vs->stype < 0) { sbuf_printf(buf, "%hhd", vs->stype); return; } else sbuf_printf(buf, "%s, errs=%hu", vs_stype2name[vs->stype], vs->errs); vsd = BLOB_OFFSET(sb, vs->data_off); if (dump) sbuf_printf(buf, ", flags=0x%04x, dtype=%s, dsz=%hu, " "data_off=%hu", vs->flags, vsd_dtype2name[vs->dtype], vs->dsz, vs->data_off); sbuf_printf(buf, "\n\t\t\tvoistatdata: "); stats_voistatdata_tostr(vsd, v->dtype, vs->dtype, vs->dsz, sctx->fmt, buf, dump); } static void stats_v1_itercb_tostr_json(struct statsblobv1 *sb, struct voi *v, struct voistat *vs, struct sb_iter_ctx *ctx) { struct sb_tostrcb_ctx *sctx; struct metablob *tpl_mb; struct sbuf *buf; const char *fmtstr; void *vsd; uint8_t dump; sctx = ctx->usrctx; buf = sctx->buf; tpl_mb = sctx->tpl ? sctx->tpl->mb : NULL; dump = ((sctx->flags & SB_TOSTR_OBJDUMP) != 0); if (ctx->flags & SB_IT_FIRST_CB) { sbuf_putc(buf, '{'); if (dump) { sbuf_printf(buf, "\"abi\":%hhu,\"endian\":%hhu," "\"maxsz\":%hu,\"cursz\":%hu,\"created\":%jd," "\"lastrst\":%jd,\"flags\":%hu,\"stats_off\":%hu," "\"statsdata_off\":%hu,", sb->abi, sb->endian, sb->maxsz, sb->cursz, sb->created, sb->lastrst, sb->flags, sb->stats_off, sb->statsdata_off); } if (tpl_mb == NULL) fmtstr = "\"tplname\":%s,\"tplhash\":%u,\"vois\":{"; else fmtstr = "\"tplname\":\"%s\",\"tplhash\":%u,\"vois\":{"; sbuf_printf(buf, fmtstr, tpl_mb ? tpl_mb->tplname : "null", sb->tplhash); } if (ctx->flags & SB_IT_FIRST_VOISTAT) { if (dump) { sbuf_printf(buf, "\"[%d]\":{\"id\":%d", ctx->vslot, v->id); if (v->id < 0) { sbuf_printf(buf, "},"); return; } if (tpl_mb == NULL) fmtstr = ",\"name\":%s,\"flags\":%hu," "\"dtype\":\"%s\",\"voistatmaxid\":%hhd," "\"stats_off\":%hu,"; else fmtstr = ",\"name\":\"%s\",\"flags\":%hu," "\"dtype\":\"%s\",\"voistatmaxid\":%hhd," "\"stats_off\":%hu,"; sbuf_printf(buf, fmtstr, tpl_mb ? tpl_mb->voi_meta[v->id].name : "null", v->flags, vsd_dtype2name[v->dtype], v->voistatmaxid, v->stats_off); } else { if (tpl_mb == NULL) { sbuf_printf(buf, "\"[%hd]\":{", v->id); } else { sbuf_printf(buf, "\"%s\":{", tpl_mb->voi_meta[v->id].name); } } sbuf_cat(buf, "\"stats\":{"); } vsd = BLOB_OFFSET(sb, vs->data_off); if (dump) { sbuf_printf(buf, "\"[%hhd]\":", ctx->vsslot); if (vs->stype < 0) { sbuf_printf(buf, "{\"stype\":-1},"); return; } sbuf_printf(buf, "{\"stype\":\"%s\",\"errs\":%hu,\"flags\":%hu," "\"dtype\":\"%s\",\"data_off\":%hu,\"voistatdata\":{", vs_stype2name[vs->stype], vs->errs, vs->flags, vsd_dtype2name[vs->dtype], vs->data_off); } else if (vs->stype > 0) { if (tpl_mb == NULL) sbuf_printf(buf, "\"[%hhd]\":", vs->stype); else sbuf_printf(buf, "\"%s\":", vs_stype2name[vs->stype]); } else return; if ((vs->flags & VS_VSDVALID) || dump) { if (!dump) sbuf_printf(buf, "{\"errs\":%hu,", vs->errs); /* Simple non-compound VSD types need a key. */ if (!vsd_compoundtype[vs->dtype]) sbuf_cat(buf, "\"val\":"); stats_voistatdata_tostr(vsd, v->dtype, vs->dtype, vs->dsz, sctx->fmt, buf, dump); sbuf_cat(buf, dump ? "}}" : "}"); } else sbuf_cat(buf, dump ? 
"null}" : "null"); if (ctx->flags & SB_IT_LAST_VOISTAT) sbuf_cat(buf, "}}"); if (ctx->flags & SB_IT_LAST_CB) sbuf_cat(buf, "}}"); else sbuf_putc(buf, ','); } static int stats_v1_itercb_tostr(struct statsblobv1 *sb, struct voi *v, struct voistat *vs, struct sb_iter_ctx *ctx) { struct sb_tostrcb_ctx *sctx; sctx = ctx->usrctx; switch (sctx->fmt) { case SB_STRFMT_FREEFORM: stats_v1_itercb_tostr_freeform(sb, v, vs, ctx); break; case SB_STRFMT_JSON: stats_v1_itercb_tostr_json(sb, v, vs, ctx); break; default: break; } return (sbuf_error(sctx->buf)); } int stats_v1_blob_tostr(struct statsblobv1 *sb, struct sbuf *buf, enum sb_str_fmt fmt, uint32_t flags) { struct sb_tostrcb_ctx sctx; uint32_t iflags; if (sb == NULL || sb->abi != STATS_ABI_V1 || buf == NULL || fmt >= SB_STRFMT_NUM_FMTS) return (EINVAL); sctx.buf = buf; sctx.fmt = fmt; sctx.flags = flags; if (flags & SB_TOSTR_META) { if (stats_tpl_fetch(stats_tpl_fetch_allocid(NULL, sb->tplhash), &sctx.tpl)) return (EINVAL); } else sctx.tpl = NULL; iflags = 0; if (flags & SB_TOSTR_OBJDUMP) iflags |= (SB_IT_NULLVOI | SB_IT_NULLVOISTAT); stats_v1_blob_iter(sb, stats_v1_itercb_tostr, &sctx, iflags); return (sbuf_error(buf)); } static int stats_v1_itercb_visit(struct statsblobv1 *sb, struct voi *v, struct voistat *vs, struct sb_iter_ctx *ctx) { struct sb_visitcb_ctx *vctx; struct sb_visit sbv; vctx = ctx->usrctx; sbv.tplhash = sb->tplhash; sbv.voi_id = v->id; sbv.voi_dtype = v->dtype; sbv.vs_stype = vs->stype; sbv.vs_dtype = vs->dtype; sbv.vs_dsz = vs->dsz; sbv.vs_data = BLOB_OFFSET(sb, vs->data_off); sbv.vs_errs = vs->errs; sbv.flags = ctx->flags & (SB_IT_FIRST_CB | SB_IT_LAST_CB | SB_IT_FIRST_VOI | SB_IT_LAST_VOI | SB_IT_FIRST_VOISTAT | SB_IT_LAST_VOISTAT); return (vctx->cb(&sbv, vctx->usrctx)); } int stats_v1_blob_visit(struct statsblobv1 *sb, stats_blob_visitcb_t func, void *usrctx) { struct sb_visitcb_ctx vctx; if (sb == NULL || sb->abi != STATS_ABI_V1 || func == NULL) return (EINVAL); vctx.cb = func; vctx.usrctx = usrctx; stats_v1_blob_iter(sb, stats_v1_itercb_visit, &vctx, 0); return (0); } static int stats_v1_icb_reset_voistat(struct statsblobv1 *sb, struct voi *v __unused, struct voistat *vs, struct sb_iter_ctx *ctx __unused) { void *vsd; if (vs->stype == VS_STYPE_VOISTATE) return (0); vsd = BLOB_OFFSET(sb, vs->data_off); /* Perform the stat type's default reset action. 
*/ switch (vs->stype) { case VS_STYPE_SUM: switch (vs->dtype) { case VSD_DTYPE_Q_S32: Q_SIFVAL(VSD(q32, vsd)->sq32, 0); break; case VSD_DTYPE_Q_U32: Q_SIFVAL(VSD(q32, vsd)->uq32, 0); break; case VSD_DTYPE_Q_S64: Q_SIFVAL(VSD(q64, vsd)->sq64, 0); break; case VSD_DTYPE_Q_U64: Q_SIFVAL(VSD(q64, vsd)->uq64, 0); break; default: bzero(vsd, vs->dsz); break; } break; case VS_STYPE_MAX: switch (vs->dtype) { case VSD_DTYPE_Q_S32: Q_SIFVAL(VSD(q32, vsd)->sq32, Q_IFMINVAL(VSD(q32, vsd)->sq32)); break; case VSD_DTYPE_Q_U32: Q_SIFVAL(VSD(q32, vsd)->uq32, Q_IFMINVAL(VSD(q32, vsd)->uq32)); break; case VSD_DTYPE_Q_S64: Q_SIFVAL(VSD(q64, vsd)->sq64, Q_IFMINVAL(VSD(q64, vsd)->sq64)); break; case VSD_DTYPE_Q_U64: Q_SIFVAL(VSD(q64, vsd)->uq64, Q_IFMINVAL(VSD(q64, vsd)->uq64)); break; default: memcpy(vsd, &numeric_limits[LIM_MIN][vs->dtype], vs->dsz); break; } break; case VS_STYPE_MIN: switch (vs->dtype) { case VSD_DTYPE_Q_S32: Q_SIFVAL(VSD(q32, vsd)->sq32, Q_IFMAXVAL(VSD(q32, vsd)->sq32)); break; case VSD_DTYPE_Q_U32: Q_SIFVAL(VSD(q32, vsd)->uq32, Q_IFMAXVAL(VSD(q32, vsd)->uq32)); break; case VSD_DTYPE_Q_S64: Q_SIFVAL(VSD(q64, vsd)->sq64, Q_IFMAXVAL(VSD(q64, vsd)->sq64)); break; case VSD_DTYPE_Q_U64: Q_SIFVAL(VSD(q64, vsd)->uq64, Q_IFMAXVAL(VSD(q64, vsd)->uq64)); break; default: memcpy(vsd, &numeric_limits[LIM_MAX][vs->dtype], vs->dsz); break; } break; case VS_STYPE_HIST: { /* Reset bucket counts. */ struct voistatdata_hist *hist; int i, is32bit; uint16_t nbkts; hist = VSD(hist, vsd); switch (vs->dtype) { case VSD_DTYPE_CRHIST32: nbkts = HIST_VSDSZ2NBKTS(crhist32, vs->dsz); is32bit = 1; break; case VSD_DTYPE_DRHIST32: nbkts = HIST_VSDSZ2NBKTS(drhist32, vs->dsz); is32bit = 1; break; case VSD_DTYPE_DVHIST32: nbkts = HIST_VSDSZ2NBKTS(dvhist32, vs->dsz); is32bit = 1; break; case VSD_DTYPE_CRHIST64: nbkts = HIST_VSDSZ2NBKTS(crhist64, vs->dsz); is32bit = 0; break; case VSD_DTYPE_DRHIST64: nbkts = HIST_VSDSZ2NBKTS(drhist64, vs->dsz); is32bit = 0; break; case VSD_DTYPE_DVHIST64: nbkts = HIST_VSDSZ2NBKTS(dvhist64, vs->dsz); is32bit = 0; break; default: return (0); } bzero(VSD_HIST_FIELDPTR(hist, vs->dtype, oob), is32bit ? sizeof(uint32_t) : sizeof(uint64_t)); for (i = nbkts - 1; i >= 0; i--) { bzero(VSD_HIST_FIELDPTR(hist, vs->dtype, bkts[i].cnt), is32bit ? sizeof(uint32_t) : sizeof(uint64_t)); } break; } case VS_STYPE_TDGST: { /* Reset sample count centroids array/tree. 
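 * The sample and compression counters are zeroed and every centroid's
 * cnt/mu is cleared; under DIAGNOSTIC the shadow red-black tree is
 * reinitialised as well.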
*/ struct voistatdata_tdgst *tdgst; struct ctdth32 *ctd32tree; struct ctdth64 *ctd64tree; struct voistatdata_tdgstctd32 *ctd32; struct voistatdata_tdgstctd64 *ctd64; tdgst = VSD(tdgst, vsd); switch (vs->dtype) { case VSD_DTYPE_TDGSTCLUST32: VSD(tdgstclust32, tdgst)->smplcnt = 0; VSD(tdgstclust32, tdgst)->compcnt = 0; ctd32tree = &VSD(tdgstclust32, tdgst)->ctdtree; ARB_INIT(ctd32, ctdlnk, ctd32tree, ARB_MAXNODES(ctd32tree)) { ctd32->cnt = 0; Q_SIFVAL(ctd32->mu, 0); } #ifdef DIAGNOSTIC RB_INIT(&VSD(tdgstclust32, tdgst)->rbctdtree); #endif break; case VSD_DTYPE_TDGSTCLUST64: VSD(tdgstclust64, tdgst)->smplcnt = 0; VSD(tdgstclust64, tdgst)->compcnt = 0; ctd64tree = &VSD(tdgstclust64, tdgst)->ctdtree; ARB_INIT(ctd64, ctdlnk, ctd64tree, ARB_MAXNODES(ctd64tree)) { ctd64->cnt = 0; Q_SIFVAL(ctd64->mu, 0); } #ifdef DIAGNOSTIC RB_INIT(&VSD(tdgstclust64, tdgst)->rbctdtree); #endif break; default: return (0); } break; } default: KASSERT(0, ("Unknown VOI stat type %d", vs->stype)); break; } vs->errs = 0; vs->flags &= ~VS_VSDVALID; return (0); } int stats_v1_blob_snapshot(struct statsblobv1 **dst, size_t dstmaxsz, struct statsblobv1 *src, uint32_t flags) { int error; if (src != NULL && src->abi == STATS_ABI_V1) { error = stats_v1_blob_clone(dst, dstmaxsz, src, flags); if (!error) { if (flags & SB_CLONE_RSTSRC) { stats_v1_blob_iter(src, stats_v1_icb_reset_voistat, NULL, 0); src->lastrst = stats_sbinuptime(); } stats_v1_blob_finalise(*dst); } } else error = EINVAL; return (error); } static inline int stats_v1_voi_update_max(enum vsd_dtype voi_dtype __unused, struct voistatdata *voival, struct voistat *vs, void *vsd) { int error; KASSERT(vs->dtype < VSD_NUM_DTYPES, ("Unknown VSD dtype %d", vs->dtype)); error = 0; switch (vs->dtype) { case VSD_DTYPE_INT_S32: if (VSD(int32, vsd)->s32 < voival->int32.s32) { VSD(int32, vsd)->s32 = voival->int32.s32; vs->flags |= VS_VSDVALID; } break; case VSD_DTYPE_INT_U32: if (VSD(int32, vsd)->u32 < voival->int32.u32) { VSD(int32, vsd)->u32 = voival->int32.u32; vs->flags |= VS_VSDVALID; } break; case VSD_DTYPE_INT_S64: if (VSD(int64, vsd)->s64 < voival->int64.s64) { VSD(int64, vsd)->s64 = voival->int64.s64; vs->flags |= VS_VSDVALID; } break; case VSD_DTYPE_INT_U64: if (VSD(int64, vsd)->u64 < voival->int64.u64) { VSD(int64, vsd)->u64 = voival->int64.u64; vs->flags |= VS_VSDVALID; } break; case VSD_DTYPE_INT_SLONG: if (VSD(intlong, vsd)->slong < voival->intlong.slong) { VSD(intlong, vsd)->slong = voival->intlong.slong; vs->flags |= VS_VSDVALID; } break; case VSD_DTYPE_INT_ULONG: if (VSD(intlong, vsd)->ulong < voival->intlong.ulong) { VSD(intlong, vsd)->ulong = voival->intlong.ulong; vs->flags |= VS_VSDVALID; } break; case VSD_DTYPE_Q_S32: if (Q_QLTQ(VSD(q32, vsd)->sq32, voival->q32.sq32) && (0 == (error = Q_QCPYVALQ(&VSD(q32, vsd)->sq32, voival->q32.sq32)))) { vs->flags |= VS_VSDVALID; } break; case VSD_DTYPE_Q_U32: if (Q_QLTQ(VSD(q32, vsd)->uq32, voival->q32.uq32) && (0 == (error = Q_QCPYVALQ(&VSD(q32, vsd)->uq32, voival->q32.uq32)))) { vs->flags |= VS_VSDVALID; } break; case VSD_DTYPE_Q_S64: if (Q_QLTQ(VSD(q64, vsd)->sq64, voival->q64.sq64) && (0 == (error = Q_QCPYVALQ(&VSD(q64, vsd)->sq64, voival->q64.sq64)))) { vs->flags |= VS_VSDVALID; } break; case VSD_DTYPE_Q_U64: if (Q_QLTQ(VSD(q64, vsd)->uq64, voival->q64.uq64) && (0 == (error = Q_QCPYVALQ(&VSD(q64, vsd)->uq64, voival->q64.uq64)))) { vs->flags |= VS_VSDVALID; } break; default: error = EINVAL; break; } return (error); } static inline int stats_v1_voi_update_min(enum vsd_dtype voi_dtype __unused, struct voistatdata 
*voival, struct voistat *vs, void *vsd) { int error; KASSERT(vs->dtype < VSD_NUM_DTYPES, ("Unknown VSD dtype %d", vs->dtype)); error = 0; switch (vs->dtype) { case VSD_DTYPE_INT_S32: if (VSD(int32, vsd)->s32 > voival->int32.s32) { VSD(int32, vsd)->s32 = voival->int32.s32; vs->flags |= VS_VSDVALID; } break; case VSD_DTYPE_INT_U32: if (VSD(int32, vsd)->u32 > voival->int32.u32) { VSD(int32, vsd)->u32 = voival->int32.u32; vs->flags |= VS_VSDVALID; } break; case VSD_DTYPE_INT_S64: if (VSD(int64, vsd)->s64 > voival->int64.s64) { VSD(int64, vsd)->s64 = voival->int64.s64; vs->flags |= VS_VSDVALID; } break; case VSD_DTYPE_INT_U64: if (VSD(int64, vsd)->u64 > voival->int64.u64) { VSD(int64, vsd)->u64 = voival->int64.u64; vs->flags |= VS_VSDVALID; } break; case VSD_DTYPE_INT_SLONG: if (VSD(intlong, vsd)->slong > voival->intlong.slong) { VSD(intlong, vsd)->slong = voival->intlong.slong; vs->flags |= VS_VSDVALID; } break; case VSD_DTYPE_INT_ULONG: if (VSD(intlong, vsd)->ulong > voival->intlong.ulong) { VSD(intlong, vsd)->ulong = voival->intlong.ulong; vs->flags |= VS_VSDVALID; } break; case VSD_DTYPE_Q_S32: if (Q_QGTQ(VSD(q32, vsd)->sq32, voival->q32.sq32) && (0 == (error = Q_QCPYVALQ(&VSD(q32, vsd)->sq32, voival->q32.sq32)))) { vs->flags |= VS_VSDVALID; } break; case VSD_DTYPE_Q_U32: if (Q_QGTQ(VSD(q32, vsd)->uq32, voival->q32.uq32) && (0 == (error = Q_QCPYVALQ(&VSD(q32, vsd)->uq32, voival->q32.uq32)))) { vs->flags |= VS_VSDVALID; } break; case VSD_DTYPE_Q_S64: if (Q_QGTQ(VSD(q64, vsd)->sq64, voival->q64.sq64) && (0 == (error = Q_QCPYVALQ(&VSD(q64, vsd)->sq64, voival->q64.sq64)))) { vs->flags |= VS_VSDVALID; } break; case VSD_DTYPE_Q_U64: if (Q_QGTQ(VSD(q64, vsd)->uq64, voival->q64.uq64) && (0 == (error = Q_QCPYVALQ(&VSD(q64, vsd)->uq64, voival->q64.uq64)))) { vs->flags |= VS_VSDVALID; } break; default: error = EINVAL; break; } return (error); } static inline int stats_v1_voi_update_sum(enum vsd_dtype voi_dtype __unused, struct voistatdata *voival, struct voistat *vs, void *vsd) { int error; KASSERT(vs->dtype < VSD_NUM_DTYPES, ("Unknown VSD dtype %d", vs->dtype)); error = 0; switch (vs->dtype) { case VSD_DTYPE_INT_S32: VSD(int32, vsd)->s32 += voival->int32.s32; break; case VSD_DTYPE_INT_U32: VSD(int32, vsd)->u32 += voival->int32.u32; break; case VSD_DTYPE_INT_S64: VSD(int64, vsd)->s64 += voival->int64.s64; break; case VSD_DTYPE_INT_U64: VSD(int64, vsd)->u64 += voival->int64.u64; break; case VSD_DTYPE_INT_SLONG: VSD(intlong, vsd)->slong += voival->intlong.slong; break; case VSD_DTYPE_INT_ULONG: VSD(intlong, vsd)->ulong += voival->intlong.ulong; break; case VSD_DTYPE_Q_S32: error = Q_QADDQ(&VSD(q32, vsd)->sq32, voival->q32.sq32); break; case VSD_DTYPE_Q_U32: error = Q_QADDQ(&VSD(q32, vsd)->uq32, voival->q32.uq32); break; case VSD_DTYPE_Q_S64: error = Q_QADDQ(&VSD(q64, vsd)->sq64, voival->q64.sq64); break; case VSD_DTYPE_Q_U64: error = Q_QADDQ(&VSD(q64, vsd)->uq64, voival->q64.uq64); break; default: error = EINVAL; break; } if (!error) vs->flags |= VS_VSDVALID; return (error); } static inline int stats_v1_voi_update_hist(enum vsd_dtype voi_dtype, struct voistatdata *voival, struct voistat *vs, struct voistatdata_hist *hist) { struct voistatdata_numeric *bkt_lb, *bkt_ub; uint64_t *oob64, *cnt64; uint32_t *oob32, *cnt32; int error, i, found, is32bit, has_ub, eq_only; error = 0; switch (vs->dtype) { case VSD_DTYPE_CRHIST32: i = HIST_VSDSZ2NBKTS(crhist32, vs->dsz); is32bit = 1; has_ub = eq_only = 0; oob32 = &VSD(crhist32, hist)->oob; break; case VSD_DTYPE_DRHIST32: i = HIST_VSDSZ2NBKTS(drhist32, vs->dsz); 
is32bit = has_ub = 1; eq_only = 0; oob32 = &VSD(drhist32, hist)->oob; break; case VSD_DTYPE_DVHIST32: i = HIST_VSDSZ2NBKTS(dvhist32, vs->dsz); is32bit = eq_only = 1; has_ub = 0; oob32 = &VSD(dvhist32, hist)->oob; break; case VSD_DTYPE_CRHIST64: i = HIST_VSDSZ2NBKTS(crhist64, vs->dsz); is32bit = has_ub = eq_only = 0; oob64 = &VSD(crhist64, hist)->oob; break; case VSD_DTYPE_DRHIST64: i = HIST_VSDSZ2NBKTS(drhist64, vs->dsz); is32bit = eq_only = 0; has_ub = 1; oob64 = &VSD(drhist64, hist)->oob; break; case VSD_DTYPE_DVHIST64: i = HIST_VSDSZ2NBKTS(dvhist64, vs->dsz); is32bit = has_ub = 0; eq_only = 1; oob64 = &VSD(dvhist64, hist)->oob; break; default: return (EINVAL); } i--; /* Adjust for 0-based array index. */ /* XXXLAS: Should probably use a better bucket search algorithm. ARB? */ for (found = 0; i >= 0 && !found; i--) { switch (vs->dtype) { case VSD_DTYPE_CRHIST32: bkt_lb = &VSD(crhist32, hist)->bkts[i].lb; cnt32 = &VSD(crhist32, hist)->bkts[i].cnt; break; case VSD_DTYPE_DRHIST32: bkt_lb = &VSD(drhist32, hist)->bkts[i].lb; bkt_ub = &VSD(drhist32, hist)->bkts[i].ub; cnt32 = &VSD(drhist32, hist)->bkts[i].cnt; break; case VSD_DTYPE_DVHIST32: bkt_lb = &VSD(dvhist32, hist)->bkts[i].val; cnt32 = &VSD(dvhist32, hist)->bkts[i].cnt; break; case VSD_DTYPE_CRHIST64: bkt_lb = &VSD(crhist64, hist)->bkts[i].lb; cnt64 = &VSD(crhist64, hist)->bkts[i].cnt; break; case VSD_DTYPE_DRHIST64: bkt_lb = &VSD(drhist64, hist)->bkts[i].lb; bkt_ub = &VSD(drhist64, hist)->bkts[i].ub; cnt64 = &VSD(drhist64, hist)->bkts[i].cnt; break; case VSD_DTYPE_DVHIST64: bkt_lb = &VSD(dvhist64, hist)->bkts[i].val; cnt64 = &VSD(dvhist64, hist)->bkts[i].cnt; break; default: return (EINVAL); } switch (voi_dtype) { case VSD_DTYPE_INT_S32: if (voival->int32.s32 >= bkt_lb->int32.s32) { if ((eq_only && voival->int32.s32 == bkt_lb->int32.s32) || (!eq_only && (!has_ub || voival->int32.s32 < bkt_ub->int32.s32))) found = 1; } break; case VSD_DTYPE_INT_U32: if (voival->int32.u32 >= bkt_lb->int32.u32) { if ((eq_only && voival->int32.u32 == bkt_lb->int32.u32) || (!eq_only && (!has_ub || voival->int32.u32 < bkt_ub->int32.u32))) found = 1; } break; case VSD_DTYPE_INT_S64: if (voival->int64.s64 >= bkt_lb->int64.s64) if ((eq_only && voival->int64.s64 == bkt_lb->int64.s64) || (!eq_only && (!has_ub || voival->int64.s64 < bkt_ub->int64.s64))) found = 1; break; case VSD_DTYPE_INT_U64: if (voival->int64.u64 >= bkt_lb->int64.u64) if ((eq_only && voival->int64.u64 == bkt_lb->int64.u64) || (!eq_only && (!has_ub || voival->int64.u64 < bkt_ub->int64.u64))) found = 1; break; case VSD_DTYPE_INT_SLONG: if (voival->intlong.slong >= bkt_lb->intlong.slong) if ((eq_only && voival->intlong.slong == bkt_lb->intlong.slong) || (!eq_only && (!has_ub || voival->intlong.slong < bkt_ub->intlong.slong))) found = 1; break; case VSD_DTYPE_INT_ULONG: if (voival->intlong.ulong >= bkt_lb->intlong.ulong) if ((eq_only && voival->intlong.ulong == bkt_lb->intlong.ulong) || (!eq_only && (!has_ub || voival->intlong.ulong < bkt_ub->intlong.ulong))) found = 1; break; case VSD_DTYPE_Q_S32: if (Q_QGEQ(voival->q32.sq32, bkt_lb->q32.sq32)) if ((eq_only && Q_QEQ(voival->q32.sq32, bkt_lb->q32.sq32)) || (!eq_only && (!has_ub || Q_QLTQ(voival->q32.sq32, bkt_ub->q32.sq32)))) found = 1; break; case VSD_DTYPE_Q_U32: if (Q_QGEQ(voival->q32.uq32, bkt_lb->q32.uq32)) if ((eq_only && Q_QEQ(voival->q32.uq32, bkt_lb->q32.uq32)) || (!eq_only && (!has_ub || Q_QLTQ(voival->q32.uq32, bkt_ub->q32.uq32)))) found = 1; break; case VSD_DTYPE_Q_S64: if (Q_QGEQ(voival->q64.sq64, bkt_lb->q64.sq64)) if ((eq_only && 
Q_QEQ(voival->q64.sq64, bkt_lb->q64.sq64)) || (!eq_only && (!has_ub || Q_QLTQ(voival->q64.sq64, bkt_ub->q64.sq64)))) found = 1; break; case VSD_DTYPE_Q_U64: if (Q_QGEQ(voival->q64.uq64, bkt_lb->q64.uq64)) if ((eq_only && Q_QEQ(voival->q64.uq64, bkt_lb->q64.uq64)) || (!eq_only && (!has_ub || Q_QLTQ(voival->q64.uq64, bkt_ub->q64.uq64)))) found = 1; break; default: break; } } if (found) { if (is32bit) *cnt32 += 1; else *cnt64 += 1; } else { if (is32bit) *oob32 += 1; else *oob64 += 1; } vs->flags |= VS_VSDVALID; return (error); } static inline int stats_v1_vsd_tdgst_compress(enum vsd_dtype vs_dtype, struct voistatdata_tdgst *tdgst, int attempt) { struct ctdth32 *ctd32tree; struct ctdth64 *ctd64tree; struct voistatdata_tdgstctd32 *ctd32; struct voistatdata_tdgstctd64 *ctd64; uint64_t ebits, idxmask; uint32_t bitsperidx, nebits; int error, idx, is32bit, maxctds, remctds, tmperr; error = 0; switch (vs_dtype) { case VSD_DTYPE_TDGSTCLUST32: ctd32tree = &VSD(tdgstclust32, tdgst)->ctdtree; if (!ARB_FULL(ctd32tree)) return (0); VSD(tdgstclust32, tdgst)->compcnt++; maxctds = remctds = ARB_MAXNODES(ctd32tree); ARB_RESET_TREE(ctd32tree, ctdth32, maxctds); VSD(tdgstclust32, tdgst)->smplcnt = 0; is32bit = 1; ctd64tree = NULL; ctd64 = NULL; #ifdef DIAGNOSTIC RB_INIT(&VSD(tdgstclust32, tdgst)->rbctdtree); #endif break; case VSD_DTYPE_TDGSTCLUST64: ctd64tree = &VSD(tdgstclust64, tdgst)->ctdtree; if (!ARB_FULL(ctd64tree)) return (0); VSD(tdgstclust64, tdgst)->compcnt++; maxctds = remctds = ARB_MAXNODES(ctd64tree); ARB_RESET_TREE(ctd64tree, ctdth64, maxctds); VSD(tdgstclust64, tdgst)->smplcnt = 0; is32bit = 0; ctd32tree = NULL; ctd32 = NULL; #ifdef DIAGNOSTIC RB_INIT(&VSD(tdgstclust64, tdgst)->rbctdtree); #endif break; default: return (EINVAL); } /* * Rebuild the t-digest ARB by pseudorandomly selecting centroids and * re-inserting the mu/cnt of each as a value and corresponding weight. */ -#define bitsperrand 31 /* Per random(3). */ + /* + * XXXCEM: random(9) is currently rand(3), not random(3). rand(3) + * RAND_MAX happens to be approximately 31 bits (range [0, + * 0x7ffffffd]), so the math kinda works out. When/if this portion of + * the code is compiled in userspace, it gets the random(3) behavior, + * which has expected range [0, 0x7fffffff]. + */ +#define bitsperrand 31 ebits = 0; nebits = 0; bitsperidx = fls(maxctds); KASSERT(bitsperidx <= sizeof(ebits) << 3, ("%s: bitsperidx=%d, ebits=%d", __func__, bitsperidx, (int)(sizeof(ebits) << 3))); idxmask = (UINT64_C(1) << bitsperidx) - 1; - srandom(stats_sbinuptime()); /* Initialise the free list with randomised centroid indices. */ for (; remctds > 0; remctds--) { while (nebits < bitsperidx) { ebits |= ((uint64_t)random()) << nebits; nebits += bitsperrand; if (nebits > (sizeof(ebits) << 3)) nebits = sizeof(ebits) << 3; } idx = ebits & idxmask; nebits -= bitsperidx; ebits >>= bitsperidx; /* * Select the next centroid to put on the ARB free list. We * start with the centroid at our randomly selected array index, * and work our way forwards until finding one (the latter * aspect reduces re-insertion randomness, but is good enough). */ do { if (idx >= maxctds) idx %= maxctds; if (is32bit) ctd32 = ARB_NODE(ctd32tree, idx); else ctd64 = ARB_NODE(ctd64tree, idx); } while ((is32bit ? ARB_ISFREE(ctd32, ctdlnk) : ARB_ISFREE(ctd64, ctdlnk)) && ++idx); /* Put the centroid on the ARB free list. */ if (is32bit) ARB_RETURNFREE(ctd32tree, ctd32, ctdlnk); else ARB_RETURNFREE(ctd64tree, ctd64, ctdlnk); } /* * The free list now contains the randomised indices of every centroid. 
* Walk the free list from start to end, re-inserting each centroid's * mu/cnt. The tdgst_add() call may or may not consume the free centroid * we re-insert values from during each loop iteration, so we must latch * the index of the next free list centroid before the re-insertion * call. The previous loop above should have left the centroid pointer * pointing to the element at the head of the free list. */ KASSERT((is32bit ? ARB_FREEIDX(ctd32tree) == ARB_SELFIDX(ctd32tree, ctd32) : ARB_FREEIDX(ctd64tree) == ARB_SELFIDX(ctd64tree, ctd64)), ("%s: t-digest ARB@%p free list bug", __func__, (is32bit ? (void *)ctd32tree : (void *)ctd64tree))); remctds = maxctds; while ((is32bit ? ctd32 != NULL : ctd64 != NULL)) { tmperr = 0; if (is32bit) { s64q_t x; idx = ARB_NEXTFREEIDX(ctd32, ctdlnk); /* Cloning a s32q_t into a s64q_t should never fail. */ tmperr = Q_QCLONEQ(&x, ctd32->mu); tmperr = tmperr ? tmperr : stats_v1_vsd_tdgst_add( vs_dtype, tdgst, x, ctd32->cnt, attempt); ctd32 = ARB_NODE(ctd32tree, idx); KASSERT(ctd32 == NULL || ARB_ISFREE(ctd32, ctdlnk), ("%s: t-digest ARB@%p free list bug", __func__, ctd32tree)); } else { idx = ARB_NEXTFREEIDX(ctd64, ctdlnk); tmperr = stats_v1_vsd_tdgst_add(vs_dtype, tdgst, ctd64->mu, ctd64->cnt, attempt); ctd64 = ARB_NODE(ctd64tree, idx); KASSERT(ctd64 == NULL || ARB_ISFREE(ctd64, ctdlnk), ("%s: t-digest ARB@%p free list bug", __func__, ctd64tree)); } /* * This process should not produce errors, bugs notwithstanding. * Just in case, latch any errors and attempt all re-insertions. */ error = tmperr ? tmperr : error; remctds--; } KASSERT(remctds == 0, ("%s: t-digest ARB@%p free list bug", __func__, (is32bit ? (void *)ctd32tree : (void *)ctd64tree))); return (error); } static inline int stats_v1_vsd_tdgst_add(enum vsd_dtype vs_dtype, struct voistatdata_tdgst *tdgst, s64q_t x, uint64_t weight, int attempt) { #ifdef DIAGNOSTIC char qstr[Q_MAXSTRLEN(x, 10)]; #endif struct ctdth32 *ctd32tree; struct ctdth64 *ctd64tree; void *closest, *cur, *lb, *ub; struct voistatdata_tdgstctd32 *ctd32; struct voistatdata_tdgstctd64 *ctd64; uint64_t cnt, smplcnt, sum, tmpsum; s64q_t k, minz, q, z; int error, is32bit, n; error = 0; minz = Q_INI(&z, 0, 0, Q_NFBITS(x)); switch (vs_dtype) { case VSD_DTYPE_TDGSTCLUST32: if ((UINT32_MAX - weight) < VSD(tdgstclust32, tdgst)->smplcnt) error = EOVERFLOW; smplcnt = VSD(tdgstclust32, tdgst)->smplcnt; ctd32tree = &VSD(tdgstclust32, tdgst)->ctdtree; is32bit = 1; ctd64tree = NULL; ctd64 = NULL; break; case VSD_DTYPE_TDGSTCLUST64: if ((UINT64_MAX - weight) < VSD(tdgstclust64, tdgst)->smplcnt) error = EOVERFLOW; smplcnt = VSD(tdgstclust64, tdgst)->smplcnt; ctd64tree = &VSD(tdgstclust64, tdgst)->ctdtree; is32bit = 0; ctd32tree = NULL; ctd32 = NULL; break; default: error = EINVAL; break; } if (error) return (error); /* * Inspired by Ted Dunning's AVLTreeDigest.java */ do { #if defined(DIAGNOSTIC) KASSERT(attempt < 5, ("%s: Too many attempts", __func__)); #endif if (attempt >= 5) return (EAGAIN); Q_SIFVAL(minz, Q_IFMAXVAL(minz)); closest = ub = NULL; sum = tmpsum = 0; if (is32bit) lb = cur = (void *)(ctd32 = ARB_MIN(ctdth32, ctd32tree)); else lb = cur = (void *)(ctd64 = ARB_MIN(ctdth64, ctd64tree)); if (lb == NULL) /* Empty tree. */ lb = (is32bit ? (void *)ARB_ROOT(ctd32tree) : (void *)ARB_ROOT(ctd64tree)); /* * Find the set of centroids with minimum distance to x and * compute the sum of counts for all centroids with mean less * than the first centroid in the set. */ for (; cur != NULL; cur = (is32bit ? 
(void *)(ctd32 = ARB_NEXT(ctdth32, ctd32tree, ctd32)) : (void *)(ctd64 = ARB_NEXT(ctdth64, ctd64tree, ctd64)))) { if (is32bit) { cnt = ctd32->cnt; KASSERT(Q_PRECEQ(ctd32->mu, x), ("%s: Q_RELPREC(mu,x)=%d", __func__, Q_RELPREC(ctd32->mu, x))); /* Ok to assign as both have same precision. */ z = ctd32->mu; } else { cnt = ctd64->cnt; KASSERT(Q_PRECEQ(ctd64->mu, x), ("%s: Q_RELPREC(mu,x)=%d", __func__, Q_RELPREC(ctd64->mu, x))); /* Ok to assign as both have same precision. */ z = ctd64->mu; } error = Q_QSUBQ(&z, x); #if defined(DIAGNOSTIC) KASSERT(!error, ("%s: unexpected error %d", __func__, error)); #endif if (error) return (error); z = Q_QABS(z); if (Q_QLTQ(z, minz)) { minz = z; lb = cur; sum = tmpsum; tmpsum += cnt; } else if (Q_QGTQ(z, minz)) { ub = cur; break; } } cur = (is32bit ? (void *)(ctd32 = (struct voistatdata_tdgstctd32 *)lb) : (void *)(ctd64 = (struct voistatdata_tdgstctd64 *)lb)); for (n = 0; cur != ub; cur = (is32bit ? (void *)(ctd32 = ARB_NEXT(ctdth32, ctd32tree, ctd32)) : (void *)(ctd64 = ARB_NEXT(ctdth64, ctd64tree, ctd64)))) { if (is32bit) cnt = ctd32->cnt; else cnt = ctd64->cnt; q = Q_CTRLINI(16); if (smplcnt == 1) error = Q_QFRACI(&q, 1, 2); else /* [ sum + ((cnt - 1) / 2) ] / (smplcnt - 1) */ error = Q_QFRACI(&q, (sum << 1) + cnt - 1, (smplcnt - 1) << 1); k = q; /* k = q x 4 x samplcnt x attempt */ error |= Q_QMULI(&k, 4 * smplcnt * attempt); /* k = k x (1 - q) */ error |= Q_QSUBI(&q, 1); q = Q_QABS(q); error |= Q_QMULQ(&k, q); #if defined(DIAGNOSTIC) #if !defined(_KERNEL) double q_dbl, k_dbl, q2d, k2d; q2d = Q_Q2D(q); k2d = Q_Q2D(k); q_dbl = smplcnt == 1 ? 0.5 : (sum + ((cnt - 1) / 2.0)) / (double)(smplcnt - 1); k_dbl = 4 * smplcnt * q_dbl * (1.0 - q_dbl) * attempt; /* * If the difference between q and q_dbl is greater than * the fractional precision of q, something is off. * NB: q is holding the value of 1 - q */ q_dbl = 1.0 - q_dbl; KASSERT((q_dbl > q2d ? q_dbl - q2d : q2d - q_dbl) < (1.05 * ((double)1 / (double)(1ULL << Q_NFBITS(q)))), ("Q-type q bad precision")); KASSERT((k_dbl > k2d ? k_dbl - k2d : k2d - k_dbl) < 1.0 + (0.01 * smplcnt), ("Q-type k bad precision")); #endif /* !_KERNEL */ KASSERT(!error, ("%s: unexpected error %d", __func__, error)); #endif /* DIAGNOSTIC */ if (error) return (error); if ((is32bit && ((ctd32->cnt + weight) <= (uint64_t)Q_GIVAL(k))) || (!is32bit && ((ctd64->cnt + weight) <= (uint64_t)Q_GIVAL(k)))) { n++; /* random() produces 31 bits. */ if (random() < (INT32_MAX / n)) closest = cur; } sum += cnt; } } while (closest == NULL && (is32bit ? ARB_FULL(ctd32tree) : ARB_FULL(ctd64tree)) && (error = stats_v1_vsd_tdgst_compress(vs_dtype, tdgst, attempt++)) == 0); if (error) return (error); if (closest != NULL) { /* Merge with an existing centroid. */ if (is32bit) { ctd32 = (struct voistatdata_tdgstctd32 *)closest; error = Q_QSUBQ(&x, ctd32->mu); error = error ? error : Q_QDIVI(&x, ctd32->cnt + weight); if (error || (error = Q_QADDQ(&ctd32->mu, x))) { #ifdef DIAGNOSTIC KASSERT(!error, ("%s: unexpected error %d", __func__, error)); #endif return (error); } ctd32->cnt += weight; error = ARB_REINSERT(ctdth32, ctd32tree, ctd32) == NULL ? 0 : EALREADY; #ifdef DIAGNOSTIC RB_REINSERT(rbctdth32, &VSD(tdgstclust32, tdgst)->rbctdtree, ctd32); #endif } else { ctd64 = (struct voistatdata_tdgstctd64 *)closest; error = Q_QSUBQ(&x, ctd64->mu); error = error ? 
error : Q_QDIVI(&x, ctd64->cnt + weight); if (error || (error = Q_QADDQ(&ctd64->mu, x))) { KASSERT(!error, ("%s: unexpected error %d", __func__, error)); return (error); } ctd64->cnt += weight; error = ARB_REINSERT(ctdth64, ctd64tree, ctd64) == NULL ? 0 : EALREADY; #ifdef DIAGNOSTIC RB_REINSERT(rbctdth64, &VSD(tdgstclust64, tdgst)->rbctdtree, ctd64); #endif } } else { /* * Add a new centroid. If digest compression is working * correctly, there should always be at least one free. */ if (is32bit) { ctd32 = ARB_GETFREE(ctd32tree, ctdlnk); #ifdef DIAGNOSTIC KASSERT(ctd32 != NULL, ("%s: t-digest@%p has no free centroids", __func__, tdgst)); #endif if (ctd32 == NULL) return (EAGAIN); if ((error = Q_QCPYVALQ(&ctd32->mu, x))) return (error); ctd32->cnt = weight; error = ARB_INSERT(ctdth32, ctd32tree, ctd32) == NULL ? 0 : EALREADY; #ifdef DIAGNOSTIC RB_INSERT(rbctdth32, &VSD(tdgstclust32, tdgst)->rbctdtree, ctd32); #endif } else { ctd64 = ARB_GETFREE(ctd64tree, ctdlnk); #ifdef DIAGNOSTIC KASSERT(ctd64 != NULL, ("%s: t-digest@%p has no free centroids", __func__, tdgst)); #endif if (ctd64 == NULL) /* Should not happen. */ return (EAGAIN); /* Direct assignment ok as both have same type/prec. */ ctd64->mu = x; ctd64->cnt = weight; error = ARB_INSERT(ctdth64, ctd64tree, ctd64) == NULL ? 0 : EALREADY; #ifdef DIAGNOSTIC RB_INSERT(rbctdth64, &VSD(tdgstclust64, tdgst)->rbctdtree, ctd64); #endif } } if (is32bit) VSD(tdgstclust32, tdgst)->smplcnt += weight; else { VSD(tdgstclust64, tdgst)->smplcnt += weight; #ifdef DIAGNOSTIC struct rbctdth64 *rbctdtree = &VSD(tdgstclust64, tdgst)->rbctdtree; struct voistatdata_tdgstctd64 *rbctd64; int i = 0; ARB_FOREACH(ctd64, ctdth64, ctd64tree) { rbctd64 = (i == 0 ? RB_MIN(rbctdth64, rbctdtree) : RB_NEXT(rbctdth64, rbctdtree, rbctd64)); if (i >= ARB_CURNODES(ctd64tree) || ctd64 != rbctd64 || ARB_MIN(ctdth64, ctd64tree) != RB_MIN(rbctdth64, rbctdtree) || ARB_MAX(ctdth64, ctd64tree) != RB_MAX(rbctdth64, rbctdtree) || ARB_LEFTIDX(ctd64, ctdlnk) != ARB_SELFIDX(ctd64tree, RB_LEFT(rbctd64, rblnk)) || ARB_RIGHTIDX(ctd64, ctdlnk) != ARB_SELFIDX(ctd64tree, RB_RIGHT(rbctd64, rblnk)) || ARB_PARENTIDX(ctd64, ctdlnk) != ARB_SELFIDX(ctd64tree, RB_PARENT(rbctd64, rblnk))) { Q_TOSTR(ctd64->mu, -1, 10, qstr, sizeof(qstr)); printf("ARB ctd=%3d p=%3d l=%3d r=%3d c=%2d " "mu=%s\n", (int)ARB_SELFIDX(ctd64tree, ctd64), ARB_PARENTIDX(ctd64, ctdlnk), ARB_LEFTIDX(ctd64, ctdlnk), ARB_RIGHTIDX(ctd64, ctdlnk), ARB_COLOR(ctd64, ctdlnk), qstr); Q_TOSTR(rbctd64->mu, -1, 10, qstr, sizeof(qstr)); printf(" RB ctd=%3d p=%3d l=%3d r=%3d c=%2d " "mu=%s\n", (int)ARB_SELFIDX(ctd64tree, rbctd64), (int)ARB_SELFIDX(ctd64tree, RB_PARENT(rbctd64, rblnk)), (int)ARB_SELFIDX(ctd64tree, RB_LEFT(rbctd64, rblnk)), (int)ARB_SELFIDX(ctd64tree, RB_RIGHT(rbctd64, rblnk)), RB_COLOR(rbctd64, rblnk), qstr); panic("RB@%p and ARB@%p trees differ\n", rbctdtree, ctd64tree); } i++; } #endif /* DIAGNOSTIC */ } return (error); } static inline int stats_v1_voi_update_tdgst(enum vsd_dtype voi_dtype, struct voistatdata *voival, struct voistat *vs, struct voistatdata_tdgst *tdgst) { s64q_t x; int error; error = 0; switch (vs->dtype) { case VSD_DTYPE_TDGSTCLUST32: /* Use same precision as the user's centroids. */ Q_INI(&x, 0, 0, Q_NFBITS( ARB_CNODE(&VSD(tdgstclust32, tdgst)->ctdtree, 0)->mu)); break; case VSD_DTYPE_TDGSTCLUST64: /* Use same precision as the user's centroids. 
*/ Q_INI(&x, 0, 0, Q_NFBITS( ARB_CNODE(&VSD(tdgstclust64, tdgst)->ctdtree, 0)->mu)); break; default: KASSERT(vs->dtype == VSD_DTYPE_TDGSTCLUST32 || vs->dtype == VSD_DTYPE_TDGSTCLUST64, ("%s: vs->dtype(%d) != VSD_DTYPE_TDGSTCLUST<32|64>", __func__, vs->dtype)); return (EINVAL); } /* * XXXLAS: Should have both a signed and unsigned 'x' variable to avoid * returning EOVERFLOW if the voival would have fit in a u64q_t. */ switch (voi_dtype) { case VSD_DTYPE_INT_S32: error = Q_QCPYVALI(&x, voival->int32.s32); break; case VSD_DTYPE_INT_U32: error = Q_QCPYVALI(&x, voival->int32.u32); break; case VSD_DTYPE_INT_S64: error = Q_QCPYVALI(&x, voival->int64.s64); break; case VSD_DTYPE_INT_U64: error = Q_QCPYVALI(&x, voival->int64.u64); break; case VSD_DTYPE_INT_SLONG: error = Q_QCPYVALI(&x, voival->intlong.slong); break; case VSD_DTYPE_INT_ULONG: error = Q_QCPYVALI(&x, voival->intlong.ulong); break; case VSD_DTYPE_Q_S32: error = Q_QCPYVALQ(&x, voival->q32.sq32); break; case VSD_DTYPE_Q_U32: error = Q_QCPYVALQ(&x, voival->q32.uq32); break; case VSD_DTYPE_Q_S64: error = Q_QCPYVALQ(&x, voival->q64.sq64); break; case VSD_DTYPE_Q_U64: error = Q_QCPYVALQ(&x, voival->q64.uq64); break; default: error = EINVAL; break; } if (error || (error = stats_v1_vsd_tdgst_add(vs->dtype, tdgst, x, 1, 1))) return (error); vs->flags |= VS_VSDVALID; return (0); } int stats_v1_voi_update(struct statsblobv1 *sb, int32_t voi_id, enum vsd_dtype voi_dtype, struct voistatdata *voival, uint32_t flags) { struct voi *v; struct voistat *vs; void *statevsd, *vsd; int error, i, tmperr; error = 0; if (sb == NULL || sb->abi != STATS_ABI_V1 || voi_id >= NVOIS(sb) || voi_dtype == 0 || voi_dtype >= VSD_NUM_DTYPES || voival == NULL) return (EINVAL); v = &sb->vois[voi_id]; if (voi_dtype != v->dtype || v->id < 0 || ((flags & SB_VOI_RELUPDATE) && !(v->flags & VOI_REQSTATE))) return (EINVAL); vs = BLOB_OFFSET(sb, v->stats_off); if (v->flags & VOI_REQSTATE) statevsd = BLOB_OFFSET(sb, vs->data_off); else statevsd = NULL; if (flags & SB_VOI_RELUPDATE) { switch (voi_dtype) { case VSD_DTYPE_INT_S32: voival->int32.s32 += VSD(voistate, statevsd)->prev.int32.s32; break; case VSD_DTYPE_INT_U32: voival->int32.u32 += VSD(voistate, statevsd)->prev.int32.u32; break; case VSD_DTYPE_INT_S64: voival->int64.s64 += VSD(voistate, statevsd)->prev.int64.s64; break; case VSD_DTYPE_INT_U64: voival->int64.u64 += VSD(voistate, statevsd)->prev.int64.u64; break; case VSD_DTYPE_INT_SLONG: voival->intlong.slong += VSD(voistate, statevsd)->prev.intlong.slong; break; case VSD_DTYPE_INT_ULONG: voival->intlong.ulong += VSD(voistate, statevsd)->prev.intlong.ulong; break; case VSD_DTYPE_Q_S32: error = Q_QADDQ(&voival->q32.sq32, VSD(voistate, statevsd)->prev.q32.sq32); break; case VSD_DTYPE_Q_U32: error = Q_QADDQ(&voival->q32.uq32, VSD(voistate, statevsd)->prev.q32.uq32); break; case VSD_DTYPE_Q_S64: error = Q_QADDQ(&voival->q64.sq64, VSD(voistate, statevsd)->prev.q64.sq64); break; case VSD_DTYPE_Q_U64: error = Q_QADDQ(&voival->q64.uq64, VSD(voistate, statevsd)->prev.q64.uq64); break; default: KASSERT(0, ("Unknown VOI data type %d", voi_dtype)); break; } } if (error) return (error); for (i = v->voistatmaxid; i > 0; i--) { vs = &((struct voistat *)BLOB_OFFSET(sb, v->stats_off))[i]; if (vs->stype < 0) continue; vsd = BLOB_OFFSET(sb, vs->data_off); switch (vs->stype) { case VS_STYPE_MAX: tmperr = stats_v1_voi_update_max(voi_dtype, voival, vs, vsd); break; case VS_STYPE_MIN: tmperr = stats_v1_voi_update_min(voi_dtype, voival, vs, vsd); break; case VS_STYPE_SUM: tmperr = 
stats_v1_voi_update_sum(voi_dtype, voival, vs, vsd); break; case VS_STYPE_HIST: tmperr = stats_v1_voi_update_hist(voi_dtype, voival, vs, vsd); break; case VS_STYPE_TDGST: tmperr = stats_v1_voi_update_tdgst(voi_dtype, voival, vs, vsd); break; default: KASSERT(0, ("Unknown VOI stat type %d", vs->stype)); break; } if (tmperr) { error = tmperr; VS_INCERRS(vs); } } if (statevsd) { switch (voi_dtype) { case VSD_DTYPE_INT_S32: VSD(voistate, statevsd)->prev.int32.s32 = voival->int32.s32; break; case VSD_DTYPE_INT_U32: VSD(voistate, statevsd)->prev.int32.u32 = voival->int32.u32; break; case VSD_DTYPE_INT_S64: VSD(voistate, statevsd)->prev.int64.s64 = voival->int64.s64; break; case VSD_DTYPE_INT_U64: VSD(voistate, statevsd)->prev.int64.u64 = voival->int64.u64; break; case VSD_DTYPE_INT_SLONG: VSD(voistate, statevsd)->prev.intlong.slong = voival->intlong.slong; break; case VSD_DTYPE_INT_ULONG: VSD(voistate, statevsd)->prev.intlong.ulong = voival->intlong.ulong; break; case VSD_DTYPE_Q_S32: error = Q_QCPYVALQ( &VSD(voistate, statevsd)->prev.q32.sq32, voival->q32.sq32); break; case VSD_DTYPE_Q_U32: error = Q_QCPYVALQ( &VSD(voistate, statevsd)->prev.q32.uq32, voival->q32.uq32); break; case VSD_DTYPE_Q_S64: error = Q_QCPYVALQ( &VSD(voistate, statevsd)->prev.q64.sq64, voival->q64.sq64); break; case VSD_DTYPE_Q_U64: error = Q_QCPYVALQ( &VSD(voistate, statevsd)->prev.q64.uq64, voival->q64.uq64); break; default: KASSERT(0, ("Unknown VOI data type %d", voi_dtype)); break; } } return (error); } #ifdef _KERNEL static void stats_init(void *arg) { } SYSINIT(stats, SI_SUB_KDTRACE, SI_ORDER_FIRST, stats_init, NULL); /* * Sysctl handler to display the list of available stats templates. */ static int stats_tpl_list_available(SYSCTL_HANDLER_ARGS) { struct sbuf *s; int err, i; err = 0; /* We can tolerate ntpl being stale, so do not take the lock. */ s = sbuf_new(NULL, NULL, /* +1 per tpl for , */ ntpl * (STATS_TPL_MAX_STR_SPEC_LEN + 1), SBUF_FIXEDLEN); if (s == NULL) return (ENOMEM); TPL_LIST_RLOCK(); for (i = 0; i < ntpl; i++) { err = sbuf_printf(s, "%s\"%s\":%u", i ? "," : "", tpllist[i]->mb->tplname, tpllist[i]->mb->tplhash); if (err) { /* Sbuf overflow condition. */ err = EOVERFLOW; break; } } TPL_LIST_RUNLOCK(); if (!err) { sbuf_finish(s); err = sysctl_handle_string(oidp, sbuf_data(s), 0, req); } sbuf_delete(s); return (err); } /* * Called by subsystem-specific sysctls to report and/or parse the list of * templates being sampled and their sampling rates. A stats_tpl_sr_cb_t * conformant function pointer must be passed in as arg1, which is used to * interact with the subsystem's stats template sample rates list. If arg2 > 0, * a zero-initialised allocation of arg2-sized contextual memory is * heap-allocated and passed in to all subsystem callbacks made during the * operation of stats_tpl_sample_rates(). * * XXXLAS: Assumes templates are never removed, which is currently true but may * need to be reworked in future if dynamic template management becomes a * requirement e.g. to support kernel module based templates. 
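 *
 * Purely as an illustration (the template name and hash below are
 * hypothetical), a subsystem sysctl wired to this handler accepts a value
 * such as "FOO_TPL"=25,:123456789=10 to request 25% sampling of FOO_TPL and
 * 10% of the template with hash 123456789; the cumulative percentage across
 * all comma-separated pairs must not exceed 100.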
*/ int stats_tpl_sample_rates(SYSCTL_HANDLER_ARGS) { char kvpair_fmt[16], tplspec_fmt[16]; char tpl_spec[STATS_TPL_MAX_STR_SPEC_LEN]; char tpl_name[TPL_MAX_NAME_LEN + 2]; /* +2 for "" */ stats_tpl_sr_cb_t subsys_cb; void *subsys_ctx; char *buf, *new_rates_usr_str, *tpl_name_p; struct stats_tpl_sample_rate *rates; struct sbuf *s, _s; uint32_t cum_pct, pct, tpl_hash; int err, i, off, len, newlen, nrates; buf = NULL; rates = NULL; err = nrates = 0; subsys_cb = (stats_tpl_sr_cb_t)arg1; KASSERT(subsys_cb != NULL, ("%s: subsys_cb == arg1 == NULL", __func__)); if (arg2 > 0) subsys_ctx = malloc(arg2, M_TEMP, M_WAITOK | M_ZERO); else subsys_ctx = NULL; /* Grab current count of subsystem rates. */ err = subsys_cb(TPL_SR_UNLOCKED_GET, NULL, &nrates, subsys_ctx); if (err) goto done; /* +1 to ensure we can append '\0' post copyin, +5 per rate for =nnn, */ len = max(req->newlen + 1, nrates * (STATS_TPL_MAX_STR_SPEC_LEN + 5)); if (req->oldptr != NULL || req->newptr != NULL) buf = malloc(len, M_TEMP, M_WAITOK); if (req->oldptr != NULL) { if (nrates == 0) { /* No rates, so return an empty string via oldptr. */ err = SYSCTL_OUT(req, "", 1); if (err) goto done; goto process_new; } s = sbuf_new(&_s, buf, len, SBUF_FIXEDLEN | SBUF_INCLUDENUL); /* Grab locked count of, and ptr to, subsystem rates. */ err = subsys_cb(TPL_SR_RLOCKED_GET, &rates, &nrates, subsys_ctx); if (err) goto done; TPL_LIST_RLOCK(); for (i = 0; i < nrates && !err; i++) { err = sbuf_printf(s, "%s\"%s\":%u=%u", i ? "," : "", tpllist[rates[i].tpl_slot_id]->mb->tplname, tpllist[rates[i].tpl_slot_id]->mb->tplhash, rates[i].tpl_sample_pct); } TPL_LIST_RUNLOCK(); /* Tell subsystem that we're done with its rates list. */ err = subsys_cb(TPL_SR_RUNLOCK, &rates, &nrates, subsys_ctx); if (err) goto done; err = sbuf_finish(s); if (err) goto done; /* We lost a race for buf to be too small. */ /* Return the rendered string data via oldptr. */ err = SYSCTL_OUT(req, sbuf_data(s), sbuf_len(s)); } else { /* Return the upper bound size for buffer sizing requests. */ err = SYSCTL_OUT(req, NULL, len); } process_new: if (err || req->newptr == NULL) goto done; newlen = req->newlen - req->newidx; err = SYSCTL_IN(req, buf, newlen); if (err) goto done; /* * Initialise format strings at run time. * * Write the max template spec string length into the * template_spec=percent key-value pair parsing format string as: * " %[^=]=%u %n" * * Write the max template name string length into the tplname:tplhash * parsing format string as: * "%[^:]:%u" * * Subtract 1 for \0 appended by sscanf(). */ sprintf(kvpair_fmt, " %%%zu[^=]=%%u %%n", sizeof(tpl_spec) - 1); sprintf(tplspec_fmt, "%%%zu[^:]:%%u", sizeof(tpl_name) - 1); /* * Parse each CSV key-value pair specifying a template and its sample * percentage. Whitespace either side of a key-value pair is ignored. * Templates can be specified by name, hash, or name and hash per the * following formats (chars in [] are optional): * ["]["]= * :hash=pct * ["]["]:hash= */ cum_pct = nrates = 0; rates = NULL; buf[newlen] = '\0'; /* buf is at least newlen+1 in size. */ new_rates_usr_str = buf; while (isspace(*new_rates_usr_str)) new_rates_usr_str++; /* Skip leading whitespace. */ while (*new_rates_usr_str != '\0') { tpl_name_p = tpl_name; tpl_name[0] = '\0'; tpl_hash = 0; off = 0; /* * Parse key-value pair which must perform 2 conversions, then * parse the template spec to extract either name, hash, or name * and hash depending on the three possible spec formats. 
The * tplspec_fmt format specifier parses name or name and hash * template specs, while the ":%u" format specifier parses * hash-only template specs. If parsing is successfull, ensure * the cumulative sampling percentage does not exceed 100. */ err = EINVAL; if (2 != sscanf(new_rates_usr_str, kvpair_fmt, tpl_spec, &pct, &off)) break; if ((1 > sscanf(tpl_spec, tplspec_fmt, tpl_name, &tpl_hash)) && (1 != sscanf(tpl_spec, ":%u", &tpl_hash))) break; if ((cum_pct += pct) > 100) break; err = 0; /* Strip surrounding "" from template name if present. */ len = strlen(tpl_name); if (len > 0) { if (tpl_name[len - 1] == '"') tpl_name[--len] = '\0'; if (tpl_name[0] == '"') { tpl_name_p++; len--; } } rates = stats_realloc(rates, 0, /* oldsz is unused in kernel. */ (nrates + 1) * sizeof(*rates), M_WAITOK); rates[nrates].tpl_slot_id = stats_tpl_fetch_allocid(len ? tpl_name_p : NULL, tpl_hash); if (rates[nrates].tpl_slot_id < 0) { err = -rates[nrates].tpl_slot_id; break; } rates[nrates].tpl_sample_pct = pct; nrates++; new_rates_usr_str += off; if (*new_rates_usr_str != ',') break; /* End-of-input or malformed. */ new_rates_usr_str++; /* Move past comma to next pair. */ } if (!err) { if ((new_rates_usr_str - buf) < newlen) { /* Entire input has not been consumed. */ err = EINVAL; } else { /* * Give subsystem the new rates. They'll return the * appropriate rates pointer for us to garbage collect. */ err = subsys_cb(TPL_SR_PUT, &rates, &nrates, subsys_ctx); } } stats_free(rates); done: free(buf, M_TEMP); free(subsys_ctx, M_TEMP); return (err); } SYSCTL_NODE(_kern, OID_AUTO, stats, CTLFLAG_RW, NULL, "stats(9) MIB"); SYSCTL_PROC(_kern_stats, OID_AUTO, templates, CTLTYPE_STRING|CTLFLAG_RD, NULL, 0, stats_tpl_list_available, "A", "list the name/hash of all available stats(9) templates"); #else /* ! _KERNEL */ static void __attribute__ ((constructor)) stats_constructor(void) { pthread_rwlock_init(&tpllistlock, NULL); } static void __attribute__ ((destructor)) stats_destructor(void) { pthread_rwlock_destroy(&tpllistlock); } #endif /* _KERNEL */ Index: head/sys/libkern/random.c =================================================================== --- head/sys/libkern/random.c (revision 356096) +++ head/sys/libkern/random.c (revision 356097) @@ -1,79 +1,78 @@ /*- * SPDX-License-Identifier: BSD-3-Clause * * Copyright (c) 1992, 1993 * The Regents of the University of California. All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. 
IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * @(#)random.c 8.1 (Berkeley) 6/10/93 */ #include __FBSDID("$FreeBSD$"); +#include #include +#include -#define NSHUFF 50 /* to drop some "seed -> 1st value" linearity */ - static u_long randseed = 937186357; /* after srandom(1), NSHUFF counted */ -void -srandom(u_long seed) -{ - int i; - - randseed = seed; - for (i = 0; i < NSHUFF; i++) - (void)random(); -} - /* * Pseudo-random number generator for perturbing the profiling clock, * and whatever else we might use it for. The result is uniform on * [0, 2^31 - 1]. */ u_long -random() +random(void) { + static bool warned = false; + long x, hi, lo, t; + + /* Warn only once, or it gets very spammy. */ + if (!warned) { + gone_in(13, + "random(9) is the obsolete Park-Miller LCG from 1988"); + warned = true; + } /* * Compute x[n + 1] = (7^5 * x[n]) mod (2^31 - 1). * From "Random number generators: good ones are hard to find", * Park and Miller, Communications of the ACM, vol. 31, no. 10, * October 1988, p. 1195. */ /* Can't be initialized with 0, so use another value. */ if ((x = randseed) == 0) x = 123459876; hi = x / 127773; lo = x % 127773; t = 16807 * lo - 2836 * hi; if (t < 0) t += 0x7fffffff; randseed = t; return (t); } Index: head/sys/sys/libkern.h =================================================================== --- head/sys/sys/libkern.h (revision 356096) +++ head/sys/sys/libkern.h (revision 356097) @@ -1,229 +1,228 @@ /*- * SPDX-License-Identifier: BSD-3-Clause * * Copyright (c) 1992, 1993 * The Regents of the University of California. All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. 
* * @(#)libkern.h 8.1 (Berkeley) 6/10/93 * $FreeBSD$ */ #ifndef _SYS_LIBKERN_H_ #define _SYS_LIBKERN_H_ #include #include #ifdef _KERNEL #include #endif #ifndef LIBKERN_INLINE #define LIBKERN_INLINE static __inline #define LIBKERN_BODY #endif /* BCD conversions. */ extern u_char const bcd2bin_data[]; extern u_char const bin2bcd_data[]; extern char const hex2ascii_data[]; #define LIBKERN_LEN_BCD2BIN 154 #define LIBKERN_LEN_BIN2BCD 100 #define LIBKERN_LEN_HEX2ASCII 36 static inline u_char bcd2bin(int bcd) { KASSERT(bcd >= 0 && bcd < LIBKERN_LEN_BCD2BIN, ("invalid bcd %d", bcd)); return (bcd2bin_data[bcd]); } static inline u_char bin2bcd(int bin) { KASSERT(bin >= 0 && bin < LIBKERN_LEN_BIN2BCD, ("invalid bin %d", bin)); return (bin2bcd_data[bin]); } static inline char hex2ascii(int hex) { KASSERT(hex >= 0 && hex < LIBKERN_LEN_HEX2ASCII, ("invalid hex %d", hex)); return (hex2ascii_data[hex]); } static inline bool validbcd(int bcd) { return (bcd == 0 || (bcd > 0 && bcd <= 0x99 && bcd2bin_data[bcd] != 0)); } static __inline int imax(int a, int b) { return (a > b ? a : b); } static __inline int imin(int a, int b) { return (a < b ? a : b); } static __inline long lmax(long a, long b) { return (a > b ? a : b); } static __inline long lmin(long a, long b) { return (a < b ? a : b); } static __inline u_int max(u_int a, u_int b) { return (a > b ? a : b); } static __inline u_int min(u_int a, u_int b) { return (a < b ? a : b); } static __inline quad_t qmax(quad_t a, quad_t b) { return (a > b ? a : b); } static __inline quad_t qmin(quad_t a, quad_t b) { return (a < b ? a : b); } static __inline u_quad_t uqmax(u_quad_t a, u_quad_t b) { return (a > b ? a : b); } static __inline u_quad_t uqmin(u_quad_t a, u_quad_t b) { return (a < b ? a : b); } static __inline u_long ulmax(u_long a, u_long b) { return (a > b ? a : b); } static __inline u_long ulmin(u_long a, u_long b) { return (a < b ? a : b); } static __inline __uintmax_t ummax(__uintmax_t a, __uintmax_t b) { return (a > b ? a : b); } static __inline __uintmax_t ummin(__uintmax_t a, __uintmax_t b) { return (a < b ? a : b); } static __inline off_t omax(off_t a, off_t b) { return (a > b ? a : b); } static __inline off_t omin(off_t a, off_t b) { return (a < b ? a : b); } static __inline int abs(int a) { return (a < 0 ? -a : a); } static __inline long labs(long a) { return (a < 0 ? -a : a); } static __inline quad_t qabs(quad_t a) { return (a < 0 ? -a : a); } #define ARC4_ENTR_NONE 0 /* Don't have entropy yet. */ #define ARC4_ENTR_HAVE 1 /* Have entropy. */ #define ARC4_ENTR_SEED 2 /* Reseeding. */ extern int arc4rand_iniseed_state; /* Prototypes for non-quad routines. 
*/ struct malloc_type; uint32_t arc4random(void); void arc4random_buf(void *, size_t); void arc4rand(void *, u_int, int); int timingsafe_bcmp(const void *, const void *, size_t); void *bsearch(const void *, const void *, size_t, size_t, int (*)(const void *, const void *)); #ifndef HAVE_INLINE_FFS int ffs(int); #endif #ifndef HAVE_INLINE_FFSL int ffsl(long); #endif #ifndef HAVE_INLINE_FFSLL int ffsll(long long); #endif #ifndef HAVE_INLINE_FLS int fls(int); #endif #ifndef HAVE_INLINE_FLSL int flsl(long); #endif #ifndef HAVE_INLINE_FLSLL int flsll(long long); #endif #define bitcount64(x) __bitcount64((uint64_t)(x)) #define bitcount32(x) __bitcount32((uint32_t)(x)) #define bitcount16(x) __bitcount16((uint16_t)(x)) #define bitcountl(x) __bitcountl((u_long)(x)) #define bitcount(x) __bitcount((u_int)(x)) int fnmatch(const char *, const char *, int); int locc(int, char *, u_int); void *memchr(const void *s, int c, size_t n); void *memcchr(const void *s, int c, size_t n); void *memmem(const void *l, size_t l_len, const void *s, size_t s_len); void qsort(void *base, size_t nmemb, size_t size, int (*compar)(const void *, const void *)); void qsort_r(void *base, size_t nmemb, size_t size, void *thunk, int (*compar)(void *, const void *, const void *)); u_long random(void); int scanc(u_int, const u_char *, const u_char *, int); -void srandom(u_long); int strcasecmp(const char *, const char *); char *strcat(char * __restrict, const char * __restrict); char *strchr(const char *, int); char *strchrnul(const char *, int); int strcmp(const char *, const char *); char *strcpy(char * __restrict, const char * __restrict); size_t strcspn(const char * __restrict, const char * __restrict) __pure; char *strdup_flags(const char *__restrict, struct malloc_type *, int); char *strdup(const char *__restrict, struct malloc_type *); char *strncat(char *, const char *, size_t); char *strndup(const char *__restrict, size_t, struct malloc_type *); size_t strlcat(char *, const char *, size_t); size_t strlcpy(char *, const char *, size_t); size_t strlen(const char *); int strncasecmp(const char *, const char *, size_t); int strncmp(const char *, const char *, size_t); char *strncpy(char * __restrict, const char * __restrict, size_t); size_t strnlen(const char *, size_t); char *strrchr(const char *, int); char *strsep(char **, const char *delim); size_t strspn(const char *, const char *); char *strstr(const char *, const char *); int strvalid(const char *, size_t); #ifdef KCSAN char *kcsan_strcpy(char *, const char *); int kcsan_strcmp(const char *, const char *); size_t kcsan_strlen(const char *); #define strcpy(d, s) kcsan_strcpy((d), (s)) #define strcmp(s1, s2) kcsan_strcmp((s1), (s2)) #define strlen(s) kcsan_strlen((s)) #endif static __inline char * index(const char *p, int ch) { return (strchr(p, ch)); } static __inline char * rindex(const char *p, int ch) { return (strrchr(p, ch)); } /* fnmatch() return values. */ #define FNM_NOMATCH 1 /* Match failed. */ /* fnmatch() flags. */ #define FNM_NOESCAPE 0x01 /* Disable backslash escaping. */ #define FNM_PATHNAME 0x02 /* Slash must be matched by slash. */ #define FNM_PERIOD 0x04 /* Period must be matched by period. */ #define FNM_LEADING_DIR 0x08 /* Ignore / after Imatch. */ #define FNM_CASEFOLD 0x10 /* Case insensitive search. */ #define FNM_IGNORECASE FNM_CASEFOLD #define FNM_FILE_NAME FNM_PATHNAME #endif /* !_SYS_LIBKERN_H_ */
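As a purely illustrative aside (not part of the change; the file name, seed
handling and main() driver below are assumptions made for the sketch), the
Park-Miller recurrence and Schrage decomposition that sys/libkern/random.c
implements can be reproduced in a few lines of standalone C, which also shows
why a single observed output determines the entire stream:

/*
 * lcg_demo.c -- standalone sketch of the 1988 Park-Miller "minimal
 * standard" generator used by random(9): x[n+1] = 16807 * x[n] mod
 * (2^31 - 1).  Schrage's decomposition (127773 = (2^31 - 1) / 16807,
 * 2836 = (2^31 - 1) % 16807) keeps every intermediate within 31 bits,
 * which is why the kernel code can get away with plain longs.
 */
#include <stdint.h>
#include <stdio.h>

static uint32_t state = 937186357;	/* Same initial seed as libkern. */

static uint32_t
parkmiller_next(void)
{
	int64_t hi, lo, t;

	hi = state / 127773;
	lo = state % 127773;
	t = 16807 * lo - 2836 * hi;
	if (t < 0)
		t += 0x7fffffff;
	state = (uint32_t)t;
	return (state);
}

int
main(void)
{
	uint32_t observed, predicted;

	/*
	 * The return value *is* the new internal state, so one observed
	 * output determines every subsequent one.
	 */
	observed = parkmiller_next();
	predicted = (uint32_t)(((uint64_t)16807 * observed) % 0x7fffffffU);
	printf("observed=%u predicted=%u actual=%u\n",
	    observed, predicted, parkmiller_next());
	return (0);
}

Built with any C compiler, the predicted and actual values printed are
identical, which is the predictability that motivates deprecating random(9)
for anything that must be unguessable.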