Index: projects/ifnet/share/man/man7/crypto.7 =================================================================== --- projects/ifnet/share/man/man7/crypto.7 (revision 277106) +++ projects/ifnet/share/man/man7/crypto.7 (revision 277107) @@ -1,141 +1,141 @@ .\" Copyright (c) 2014 The FreeBSD Foundation .\" All rights reserved. .\" .\" This documentation was written by John-Mark Gurney under .\" the sponsorship of the FreeBSD Foundation and .\" Rubicon Communications, LLC (Netgate). .\" Redistribution and use in source and binary forms, with or without .\" modification, are permitted provided that the following conditions .\" are met: .\" 1. Redistributions of source code must retain the above copyright .\" notice, this list of conditions and the following disclaimer. .\" 2. Redistributions in binary form must reproduce the above copyright .\" notice, this list of conditions and the following disclaimer in the .\" documentation and/or other materials provided with the distribution. .\" .\" THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE .\" ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF .\" SUCH DAMAGE. .\" .\" $FreeBSD$ .\" -.Dd December 12, 2014 +.Dd January 2, 2015 .Dt CRYPTO 7 .Os .Sh NAME .Nm crypto .Nd OpenCrypto algorithms .Sh SYNOPSIS In the kernel configuration file: .Cd "device crypto" .Pp Or load the crypto.ko module. .Sh DESCRIPTION The cryptographic algorithms that are part of the OpenCrypto framework have the following requirements. .Pp Cipher algorithms: .Bl -tag -width ".Dv CRYPTO_AES_CBC" .It Dv CRYPTO_AES_CBC .Bl -tag -width "Block size :" -compact -offset indent .It IV size : 16 .It Block size : 16 .It Key size : 16, 24 or 32 .El .Pp This algorithm implements Cipher-block chaining. .It Dv CRYPTO_AES_NIST_GCM_16 .Bl -tag -width "Block size :" -compact -offset indent .It IV size : 12 .It Block size : 1 .It Key size : 16, 24 or 32 .It Digest size : 16 .El .Pp This algorithm implements Galois/Counter Mode. This is the cipher part of an AEAD .Pq Authenticated Encryption with Associated Data mode. This requires the use of a proper authentication mode, one of .Dv CRYPTO_AES_128_NIST_GMAC , .Dv CRYPTO_AES_192_NIST_GMAC or .Dv CRYPTO_AES_256_NIST_GMAC , that corresponds with the number of bits in the key that you are using. .Pp The associated data (if any) must be provided by the authentication mode op. The authentication tag will be read/written from/to the offset crd_inject specified in the descriptor for the authentication mode. .Pp Note: You must provide an IV on every call. .It Dv CRYPTO_AES_ICM .Bl -tag -width "Block size :" -compact -offset indent .It IV size : 16 .It Block size : 1 (aesni), 16 (software) .It Key size : 16, 24 or 32 .El .Pp This algorithm implements Integer Counter Mode.
This is similar to what most people call counter mode, but instead of the counter being split into a nonce and a counter part, the entire nonce is used as the initial counter. This does mean that if a counter that rolls over at 32 bits is required, the transaction needs to be split into two parts where the counter rolls over. The counter is incremented as a 128-bit big-endian number. .Pp Note: You must provide an IV on every call. .It Dv CRYPTO_AES_XTS .Bl -tag -width "Block size :" -compact -offset indent .It IV size : -16 +8 .It Block size : 16 .It Key size : 32 or 64 .El .Pp This algorithm implements XEX Tweakable Block Cipher with Ciphertext Stealing as defined in NIST SP 800-38E. .Pp NOTE: The ciphertext stealing part is not implemented, which is why this cipher is listed as having a block size of 16 instead of 1. .El .Pp Authentication algorithms: .Bl -tag -width ".Dv CRYPTO_AES_256_NIST_GMAC" .It CRYPTO_AES_128_NIST_GMAC See .Dv CRYPTO_AES_NIST_GCM_16 in the cipher mode section. .It CRYPTO_AES_192_NIST_GMAC See .Dv CRYPTO_AES_NIST_GCM_16 in the cipher mode section. .It CRYPTO_AES_256_NIST_GMAC See .Dv CRYPTO_AES_NIST_GCM_16 in the cipher mode section. .El .Sh SEE ALSO .Xr crypto 4 , .Xr crypto 9 .Sh BUGS Not all the implemented algorithms are listed. Index: projects/ifnet/share =================================================================== --- projects/ifnet/share (revision 277106) +++ projects/ifnet/share (revision 277107) Property changes on: projects/ifnet/share ___________________________________________________________________ Modified: svn:mergeinfo ## -0,0 +0,1 ## Merged /head/share:r277078-277106 Index: projects/ifnet/sys/cam/ata/ata_da.c =================================================================== --- projects/ifnet/sys/cam/ata/ata_da.c (revision 277106) +++ projects/ifnet/sys/cam/ata/ata_da.c (revision 277107) @@ -1,2118 +1,2128 @@ /*- * Copyright (c) 2009 Alexander Motin * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer, * without modification, immediately at the beginning of the file. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
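To make the crypto(7) requirements above concrete, the following is a minimal userland sketch (not part of this change) that drives CRYPTO_AES_CBC through the cryptodev interface described in crypto(4); the ioctl and structure names come from <crypto/cryptodev.h>, and the key, IV, and buffer values are placeholders. A CRYPTO_AES_NIST_GCM_16 session would additionally set the mac field to the CRYPTO_AES_*_NIST_GMAC constant matching the key length, as the page above requires.

/*
 * Illustrative only: one-shot AES-CBC encryption via /dev/crypto.
 * Error handling is minimal; key, IV and data are placeholder values.
 */
#include <sys/ioctl.h>
#include <crypto/cryptodev.h>
#include <fcntl.h>
#include <string.h>
#include <err.h>

int
main(void)
{
	struct session_op sess;
	struct crypt_op cop;
	char key[16] = { 0 };	/* 16, 24 or 32 bytes, per crypto(7) */
	char iv[16] = { 0 };	/* AES-CBC takes a 16-byte IV */
	char buf[16] = { 0 };	/* length must be a multiple of the block size */
	int fd, cfd;

	if ((fd = open("/dev/crypto", O_RDWR)) == -1)
		err(1, "open(/dev/crypto)");
	if (ioctl(fd, CRIOGET, &cfd) == -1)	/* clone a private crypto fd */
		err(1, "CRIOGET");

	memset(&sess, 0, sizeof(sess));
	sess.cipher = CRYPTO_AES_CBC;
	sess.keylen = sizeof(key);
	sess.key = key;
	if (ioctl(cfd, CIOCGSESSION, &sess) == -1)
		err(1, "CIOCGSESSION");

	memset(&cop, 0, sizeof(cop));
	cop.ses = sess.ses;
	cop.op = COP_ENCRYPT;
	cop.len = sizeof(buf);
	cop.src = buf;
	cop.dst = buf;		/* encrypt in place */
	cop.iv = iv;		/* an IV must be supplied on every call */
	if (ioctl(cfd, CIOCCRYPT, &cop) == -1)
		err(1, "CIOCCRYPT");

	ioctl(cfd, CIOCFSESSION, &sess.ses);
	return (0);
}

Kernel consumers use the same algorithm constants when building session descriptors through the crypto(9) interface.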
*/ #include __FBSDID("$FreeBSD$"); #include "opt_ada.h" #include #ifdef _KERNEL #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #endif /* _KERNEL */ #ifndef _KERNEL #include #include #endif /* _KERNEL */ #include #include #include #include #include #include #include /* geometry translation */ #ifdef _KERNEL #define ATA_MAX_28BIT_LBA 268435455UL typedef enum { ADA_STATE_RAHEAD, ADA_STATE_WCACHE, ADA_STATE_NORMAL } ada_state; typedef enum { ADA_FLAG_CAN_48BIT = 0x0002, ADA_FLAG_CAN_FLUSHCACHE = 0x0004, ADA_FLAG_CAN_NCQ = 0x0008, ADA_FLAG_CAN_DMA = 0x0010, ADA_FLAG_NEED_OTAG = 0x0020, ADA_FLAG_WAS_OTAG = 0x0040, ADA_FLAG_CAN_TRIM = 0x0080, ADA_FLAG_OPEN = 0x0100, ADA_FLAG_SCTX_INIT = 0x0200, ADA_FLAG_CAN_CFA = 0x0400, ADA_FLAG_CAN_POWERMGT = 0x0800, ADA_FLAG_CAN_DMA48 = 0x1000, ADA_FLAG_DIRTY = 0x2000 } ada_flags; typedef enum { ADA_Q_NONE = 0x00, ADA_Q_4K = 0x01, } ada_quirks; #define ADA_Q_BIT_STRING \ "\020" \ "\0014K" typedef enum { ADA_CCB_RAHEAD = 0x01, ADA_CCB_WCACHE = 0x02, ADA_CCB_BUFFER_IO = 0x03, ADA_CCB_DUMP = 0x05, ADA_CCB_TRIM = 0x06, ADA_CCB_TYPE_MASK = 0x0F, } ada_ccb_state; /* Offsets into our private area for storing information */ #define ccb_state ppriv_field0 #define ccb_bp ppriv_ptr1 struct disk_params { u_int8_t heads; u_int8_t secs_per_track; u_int32_t cylinders; u_int32_t secsize; /* Number of bytes/logical sector */ u_int64_t sectors; /* Total number sectors */ }; #define TRIM_MAX_BLOCKS 8 #define TRIM_MAX_RANGES (TRIM_MAX_BLOCKS * ATA_DSM_BLK_RANGES) struct trim_request { uint8_t data[TRIM_MAX_RANGES * ATA_DSM_RANGE_SIZE]; TAILQ_HEAD(, bio) bps; }; struct ada_softc { struct bio_queue_head bio_queue; struct bio_queue_head trim_queue; int outstanding_cmds; /* Number of active commands */ int refcount; /* Active xpt_action() calls */ ada_state state; ada_flags flags; ada_quirks quirks; int sort_io_queue; int trim_max_ranges; int trim_running; int read_ahead; int write_cache; #ifdef ADA_TEST_FAILURE int force_read_error; int force_write_error; int periodic_read_error; int periodic_read_count; #endif struct disk_params params; struct disk *disk; struct task sysctl_task; struct sysctl_ctx_list sysctl_ctx; struct sysctl_oid *sysctl_tree; struct callout sendordered_c; struct trim_request trim_req; }; struct ada_quirk_entry { struct scsi_inquiry_pattern inq_pat; ada_quirks quirks; }; static struct ada_quirk_entry ada_quirk_table[] = { { /* Hitachi Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "*", "Hitachi H??????????E3*", "*" }, /*quirks*/ADA_Q_4K }, { /* Samsung Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "*", "SAMSUNG HD155UI*", "*" }, /*quirks*/ADA_Q_4K }, { /* Samsung Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "*", "SAMSUNG HD204UI*", "*" }, /*quirks*/ADA_Q_4K }, { /* Seagate Barracuda Green Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "*", "ST????DL*", "*" }, /*quirks*/ADA_Q_4K }, { /* Seagate Barracuda Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "*", "ST???DM*", "*" }, /*quirks*/ADA_Q_4K }, { /* Seagate Barracuda Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "*", "ST????DM*", "*" }, /*quirks*/ADA_Q_4K }, { /* Seagate Momentus Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "*", "ST9500423AS*", "*" }, /*quirks*/ADA_Q_4K }, { /* Seagate Momentus Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "*", "ST9500424AS*", "*" }, /*quirks*/ADA_Q_4K }, { /* Seagate Momentus 
Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "*", "ST9640423AS*", "*" }, /*quirks*/ADA_Q_4K }, { /* Seagate Momentus Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "*", "ST9640424AS*", "*" }, /*quirks*/ADA_Q_4K }, { /* Seagate Momentus Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "*", "ST9750420AS*", "*" }, /*quirks*/ADA_Q_4K }, { /* Seagate Momentus Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "*", "ST9750422AS*", "*" }, /*quirks*/ADA_Q_4K }, { /* Seagate Momentus Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "*", "ST9750423AS*", "*" }, /*quirks*/ADA_Q_4K }, { /* Seagate Momentus Thin Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "*", "ST???LT*", "*" }, /*quirks*/ADA_Q_4K }, { /* WDC Caviar Green Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "*", "WDC WD????RS*", "*" }, /*quirks*/ADA_Q_4K }, { /* WDC Caviar Green Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "*", "WDC WD????RX*", "*" }, /*quirks*/ADA_Q_4K }, { /* WDC Caviar Green Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "*", "WDC WD??????RS*", "*" }, /*quirks*/ADA_Q_4K }, { /* WDC Caviar Green Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "*", "WDC WD??????RX*", "*" }, /*quirks*/ADA_Q_4K }, { /* WDC Scorpio Black Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "*", "WDC WD???PKT*", "*" }, /*quirks*/ADA_Q_4K }, { /* WDC Scorpio Black Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "*", "WDC WD?????PKT*", "*" }, /*quirks*/ADA_Q_4K }, { /* WDC Scorpio Blue Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "*", "WDC WD???PVT*", "*" }, /*quirks*/ADA_Q_4K }, { /* WDC Scorpio Blue Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "*", "WDC WD?????PVT*", "*" }, /*quirks*/ADA_Q_4K }, /* SSDs */ { /* * Corsair Force 2 SSDs * 4k optimised & trim only works in 4k requests + 4k aligned */ { T_DIRECT, SIP_MEDIA_FIXED, "*", "Corsair CSSD-F*", "*" }, /*quirks*/ADA_Q_4K }, { /* * Corsair Force 3 SSDs * 4k optimised & trim only works in 4k requests + 4k aligned */ { T_DIRECT, SIP_MEDIA_FIXED, "*", "Corsair Force 3*", "*" }, /*quirks*/ADA_Q_4K }, { /* * Corsair Neutron GTX SSDs * 4k optimised & trim only works in 4k requests + 4k aligned */ { T_DIRECT, SIP_MEDIA_FIXED, "*", "Corsair Neutron GTX*", "*" }, /*quirks*/ADA_Q_4K }, { /* * Corsair Force GT & GS SSDs * 4k optimised & trim only works in 4k requests + 4k aligned */ { T_DIRECT, SIP_MEDIA_FIXED, "*", "Corsair Force G*", "*" }, /*quirks*/ADA_Q_4K }, { /* * Crucial M4 SSDs * 4k optimised & trim only works in 4k requests + 4k aligned */ { T_DIRECT, SIP_MEDIA_FIXED, "*", "M4-CT???M4SSD2*", "*" }, /*quirks*/ADA_Q_4K }, { /* * Crucial RealSSD C300 SSDs * 4k optimised */ { T_DIRECT, SIP_MEDIA_FIXED, "*", "C300-CTFDDAC???MAG*", "*" }, /*quirks*/ADA_Q_4K }, { /* * Intel 320 Series SSDs * 4k optimised & trim only works in 4k requests + 4k aligned */ { T_DIRECT, SIP_MEDIA_FIXED, "*", "INTEL SSDSA2CW*", "*" }, /*quirks*/ADA_Q_4K }, { /* * Intel 330 Series SSDs * 4k optimised & trim only works in 4k requests + 4k aligned */ { T_DIRECT, SIP_MEDIA_FIXED, "*", "INTEL SSDSC2CT*", "*" }, /*quirks*/ADA_Q_4K }, { /* * Intel 510 Series SSDs * 4k optimised & trim only works in 4k requests + 4k aligned */ { T_DIRECT, SIP_MEDIA_FIXED, "*", "INTEL SSDSC2MH*", "*" }, /*quirks*/ADA_Q_4K }, { /* * Intel 520 Series SSDs * 4k optimised & trim only works in 4k requests + 4k aligned */ { T_DIRECT, SIP_MEDIA_FIXED, "*", "INTEL SSDSC2BW*", "*" }, 
/*quirks*/ADA_Q_4K }, { /* * Intel X25-M Series SSDs * 4k optimised & trim only works in 4k requests + 4k aligned */ { T_DIRECT, SIP_MEDIA_FIXED, "*", "INTEL SSDSA2M*", "*" }, /*quirks*/ADA_Q_4K }, { /* * Kingston E100 Series SSDs * 4k optimised & trim only works in 4k requests + 4k aligned */ { T_DIRECT, SIP_MEDIA_FIXED, "*", "KINGSTON SE100S3*", "*" }, /*quirks*/ADA_Q_4K }, { /* * Kingston HyperX 3k SSDs * 4k optimised & trim only works in 4k requests + 4k aligned */ { T_DIRECT, SIP_MEDIA_FIXED, "*", "KINGSTON SH103S3*", "*" }, /*quirks*/ADA_Q_4K }, { /* * Marvell SSDs (entry taken from OpenSolaris) * 4k optimised & trim only works in 4k requests + 4k aligned */ { T_DIRECT, SIP_MEDIA_FIXED, "*", "MARVELL SD88SA02*", "*" }, /*quirks*/ADA_Q_4K }, { /* * OCZ Agility 2 SSDs * 4k optimised & trim only works in 4k requests + 4k aligned */ { T_DIRECT, SIP_MEDIA_FIXED, "*", "OCZ-AGILITY2*", "*" }, /*quirks*/ADA_Q_4K }, { /* * OCZ Agility 3 SSDs * 4k optimised & trim only works in 4k requests + 4k aligned */ { T_DIRECT, SIP_MEDIA_FIXED, "*", "OCZ-AGILITY3*", "*" }, /*quirks*/ADA_Q_4K }, { /* * OCZ Deneva R Series SSDs * 4k optimised & trim only works in 4k requests + 4k aligned */ { T_DIRECT, SIP_MEDIA_FIXED, "*", "DENRSTE251M45*", "*" }, /*quirks*/ADA_Q_4K }, { /* * OCZ Vertex 2 SSDs (inc pro series) * 4k optimised & trim only works in 4k requests + 4k aligned */ { T_DIRECT, SIP_MEDIA_FIXED, "*", "OCZ?VERTEX2*", "*" }, /*quirks*/ADA_Q_4K }, { /* * OCZ Vertex 3 SSDs * 4k optimised & trim only works in 4k requests + 4k aligned */ { T_DIRECT, SIP_MEDIA_FIXED, "*", "OCZ-VERTEX3*", "*" }, /*quirks*/ADA_Q_4K }, { /* * OCZ Vertex 4 SSDs * 4k optimised & trim only works in 4k requests + 4k aligned */ { T_DIRECT, SIP_MEDIA_FIXED, "*", "OCZ-VERTEX4*", "*" }, /*quirks*/ADA_Q_4K }, { /* * Samsung 830 Series SSDs * 4k optimised */ { T_DIRECT, SIP_MEDIA_FIXED, "*", "SAMSUNG SSD 830 Series*", "*" }, /*quirks*/ADA_Q_4K }, { /* * Samsung 840 SSDs * 4k optimised */ { T_DIRECT, SIP_MEDIA_FIXED, "*", "Samsung SSD 840*", "*" }, /*quirks*/ADA_Q_4K }, { /* * Samsung 843T Series SSDs * 4k optimised */ { T_DIRECT, SIP_MEDIA_FIXED, "*", "SAMSUNG MZ7WD*", "*" }, /*quirks*/ADA_Q_4K }, { /* * Samsung 850 SSDs * 4k optimised */ { T_DIRECT, SIP_MEDIA_FIXED, "*", "Samsung SSD 850*", "*" }, /*quirks*/ADA_Q_4K }, { /* * Samsung PM853T Series SSDs * 4k optimised */ { T_DIRECT, SIP_MEDIA_FIXED, "*", "SAMSUNG MZ7GE*", "*" }, /*quirks*/ADA_Q_4K }, { /* * SuperTalent TeraDrive CT SSDs * 4k optimised & trim only works in 4k requests + 4k aligned */ { T_DIRECT, SIP_MEDIA_FIXED, "*", "FTM??CT25H*", "*" }, /*quirks*/ADA_Q_4K }, { /* * XceedIOPS SATA SSDs * 4k optimised */ { T_DIRECT, SIP_MEDIA_FIXED, "*", "SG9XCS2D*", "*" }, /*quirks*/ADA_Q_4K }, { /* Default */ { T_ANY, SIP_MEDIA_REMOVABLE|SIP_MEDIA_FIXED, /*vendor*/"*", /*product*/"*", /*revision*/"*" }, /*quirks*/0 }, }; static disk_strategy_t adastrategy; static dumper_t adadump; static periph_init_t adainit; static void adaasync(void *callback_arg, u_int32_t code, struct cam_path *path, void *arg); static void adasysctlinit(void *context, int pending); static periph_ctor_t adaregister; static periph_dtor_t adacleanup; static periph_start_t adastart; static periph_oninv_t adaoninvalidate; static void adadone(struct cam_periph *periph, union ccb *done_ccb); static int adaerror(union ccb *ccb, u_int32_t cam_flags, u_int32_t sense_flags); static void adagetparams(struct cam_periph *periph, struct ccb_getdev *cgd); static timeout_t adasendorderedtag; static void adashutdown(void *arg, 
int howto); static void adasuspend(void *arg); static void adaresume(void *arg); #ifndef ADA_DEFAULT_LEGACY_ALIASES #define ADA_DEFAULT_LEGACY_ALIASES 1 #endif #ifndef ADA_DEFAULT_TIMEOUT #define ADA_DEFAULT_TIMEOUT 30 /* Timeout in seconds */ #endif #ifndef ADA_DEFAULT_RETRY #define ADA_DEFAULT_RETRY 4 #endif #ifndef ADA_DEFAULT_SEND_ORDERED #define ADA_DEFAULT_SEND_ORDERED 1 #endif #ifndef ADA_DEFAULT_SPINDOWN_SHUTDOWN #define ADA_DEFAULT_SPINDOWN_SHUTDOWN 1 #endif #ifndef ADA_DEFAULT_SPINDOWN_SUSPEND #define ADA_DEFAULT_SPINDOWN_SUSPEND 1 #endif #ifndef ADA_DEFAULT_READ_AHEAD #define ADA_DEFAULT_READ_AHEAD 1 #endif #ifndef ADA_DEFAULT_WRITE_CACHE #define ADA_DEFAULT_WRITE_CACHE 1 #endif #define ADA_RA (softc->read_ahead >= 0 ? \ softc->read_ahead : ada_read_ahead) #define ADA_WC (softc->write_cache >= 0 ? \ softc->write_cache : ada_write_cache) #define ADA_SIO (softc->sort_io_queue >= 0 ? \ softc->sort_io_queue : cam_sort_io_queues) /* * Most platforms map firmware geometry to actual, but some don't. If * not overridden, default to nothing. */ #ifndef ata_disk_firmware_geom_adjust #define ata_disk_firmware_geom_adjust(disk) #endif static int ada_legacy_aliases = ADA_DEFAULT_LEGACY_ALIASES; static int ada_retry_count = ADA_DEFAULT_RETRY; static int ada_default_timeout = ADA_DEFAULT_TIMEOUT; static int ada_send_ordered = ADA_DEFAULT_SEND_ORDERED; static int ada_spindown_shutdown = ADA_DEFAULT_SPINDOWN_SHUTDOWN; static int ada_spindown_suspend = ADA_DEFAULT_SPINDOWN_SUSPEND; static int ada_read_ahead = ADA_DEFAULT_READ_AHEAD; static int ada_write_cache = ADA_DEFAULT_WRITE_CACHE; static SYSCTL_NODE(_kern_cam, OID_AUTO, ada, CTLFLAG_RD, 0, "CAM Direct Access Disk driver"); SYSCTL_INT(_kern_cam_ada, OID_AUTO, legacy_aliases, CTLFLAG_RWTUN, &ada_legacy_aliases, 0, "Create legacy-like device aliases"); SYSCTL_INT(_kern_cam_ada, OID_AUTO, retry_count, CTLFLAG_RWTUN, &ada_retry_count, 0, "Normal I/O retry count"); SYSCTL_INT(_kern_cam_ada, OID_AUTO, default_timeout, CTLFLAG_RWTUN, &ada_default_timeout, 0, "Normal I/O timeout (in seconds)"); SYSCTL_INT(_kern_cam_ada, OID_AUTO, send_ordered, CTLFLAG_RWTUN, &ada_send_ordered, 0, "Send Ordered Tags"); SYSCTL_INT(_kern_cam_ada, OID_AUTO, spindown_shutdown, CTLFLAG_RWTUN, &ada_spindown_shutdown, 0, "Spin down upon shutdown"); SYSCTL_INT(_kern_cam_ada, OID_AUTO, spindown_suspend, CTLFLAG_RWTUN, &ada_spindown_suspend, 0, "Spin down upon suspend"); SYSCTL_INT(_kern_cam_ada, OID_AUTO, read_ahead, CTLFLAG_RWTUN, &ada_read_ahead, 0, "Enable disk read-ahead"); SYSCTL_INT(_kern_cam_ada, OID_AUTO, write_cache, CTLFLAG_RWTUN, &ada_write_cache, 0, "Enable disk write cache"); /* * ADA_ORDEREDTAG_INTERVAL determines how often, relative * to the default timeout, we check to see whether an ordered * tagged transaction is appropriate to prevent simple tag * starvation. Since we'd like to ensure that there is at least * 1/2 of the timeout length left for a starved transaction to * complete after we've sent an ordered tag, we must poll at least * four times in every timeout period. This takes care of the worst * case where a starved transaction starts during an interval that * meets the requirement "don't send an ordered tag" test so it takes * us two intervals to determine that a tag must be sent. 
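* For example, with the default 30 second timeout (ADA_DEFAULT_TIMEOUT) and an ADA_ORDEREDTAG_INTERVAL of 4, the callout fires roughly every 7.5 seconds; in the worst case an ordered tag therefore goes out two intervals (15 seconds) after a transaction became starved, leaving it the remaining half of the timeout to complete.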
*/ #ifndef ADA_ORDEREDTAG_INTERVAL #define ADA_ORDEREDTAG_INTERVAL 4 #endif static struct periph_driver adadriver = { adainit, "ada", TAILQ_HEAD_INITIALIZER(adadriver.units), /* generation */ 0 }; PERIPHDRIVER_DECLARE(ada, adadriver); static MALLOC_DEFINE(M_ATADA, "ata_da", "ata_da buffers"); static int adaopen(struct disk *dp) { struct cam_periph *periph; struct ada_softc *softc; int error; periph = (struct cam_periph *)dp->d_drv1; if (cam_periph_acquire(periph) != CAM_REQ_CMP) { return(ENXIO); } cam_periph_lock(periph); if ((error = cam_periph_hold(periph, PRIBIO|PCATCH)) != 0) { cam_periph_unlock(periph); cam_periph_release(periph); return (error); } CAM_DEBUG(periph->path, CAM_DEBUG_TRACE | CAM_DEBUG_PERIPH, ("adaopen\n")); softc = (struct ada_softc *)periph->softc; softc->flags |= ADA_FLAG_OPEN; cam_periph_unhold(periph); cam_periph_unlock(periph); return (0); } static int adaclose(struct disk *dp) { struct cam_periph *periph; struct ada_softc *softc; union ccb *ccb; int error; periph = (struct cam_periph *)dp->d_drv1; softc = (struct ada_softc *)periph->softc; cam_periph_lock(periph); CAM_DEBUG(periph->path, CAM_DEBUG_TRACE | CAM_DEBUG_PERIPH, ("adaclose\n")); /* We only sync the cache if the drive is capable of it. */ if ((softc->flags & ADA_FLAG_DIRTY) != 0 && (softc->flags & ADA_FLAG_CAN_FLUSHCACHE) != 0 && (periph->flags & CAM_PERIPH_INVALID) == 0 && cam_periph_hold(periph, PRIBIO) == 0) { ccb = cam_periph_getccb(periph, CAM_PRIORITY_NORMAL); cam_fill_ataio(&ccb->ataio, 1, adadone, CAM_DIR_NONE, 0, NULL, 0, ada_default_timeout*1000); if (softc->flags & ADA_FLAG_CAN_48BIT) ata_48bit_cmd(&ccb->ataio, ATA_FLUSHCACHE48, 0, 0, 0); else ata_28bit_cmd(&ccb->ataio, ATA_FLUSHCACHE, 0, 0, 0); error = cam_periph_runccb(ccb, adaerror, /*cam_flags*/0, /*sense_flags*/0, softc->disk->d_devstat); if (error != 0) xpt_print(periph->path, "Synchronize cache failed\n"); else softc->flags &= ~ADA_FLAG_DIRTY; xpt_release_ccb(ccb); cam_periph_unhold(periph); } softc->flags &= ~ADA_FLAG_OPEN; while (softc->refcount != 0) cam_periph_sleep(periph, &softc->refcount, PRIBIO, "adaclose", 1); cam_periph_unlock(periph); cam_periph_release(periph); return (0); } static void adaschedule(struct cam_periph *periph) { struct ada_softc *softc = (struct ada_softc *)periph->softc; if (softc->state != ADA_STATE_NORMAL) return; /* Check if we have more work to do. */ if (bioq_first(&softc->bio_queue) || (!softc->trim_running && bioq_first(&softc->trim_queue))) { xpt_schedule(periph, CAM_PRIORITY_NORMAL); } } /* * Actually translate the requested transfer into one the physical driver * can understand. The transfer is described by a buf and will include * only one physical transfer. 
*/ static void adastrategy(struct bio *bp) { struct cam_periph *periph; struct ada_softc *softc; periph = (struct cam_periph *)bp->bio_disk->d_drv1; softc = (struct ada_softc *)periph->softc; cam_periph_lock(periph); CAM_DEBUG(periph->path, CAM_DEBUG_TRACE, ("adastrategy(%p)\n", bp)); /* * If the device has been made invalid, error out */ if ((periph->flags & CAM_PERIPH_INVALID) != 0) { cam_periph_unlock(periph); biofinish(bp, NULL, ENXIO); return; } /* * Place it in the queue of disk activities for this disk */ if (bp->bio_cmd == BIO_DELETE) { KASSERT((softc->flags & ADA_FLAG_CAN_TRIM) || ((softc->flags & ADA_FLAG_CAN_CFA) && !(softc->flags & ADA_FLAG_CAN_48BIT)), ("BIO_DELETE but no supported TRIM method.")); bioq_disksort(&softc->trim_queue, bp); } else { if (ADA_SIO) bioq_disksort(&softc->bio_queue, bp); else bioq_insert_tail(&softc->bio_queue, bp); } /* * Schedule ourselves for performing the work. */ adaschedule(periph); cam_periph_unlock(periph); return; } static int adadump(void *arg, void *virtual, vm_offset_t physical, off_t offset, size_t length) { struct cam_periph *periph; struct ada_softc *softc; u_int secsize; union ccb ccb; struct disk *dp; uint64_t lba; uint16_t count; int error = 0; dp = arg; periph = dp->d_drv1; softc = (struct ada_softc *)periph->softc; cam_periph_lock(periph); secsize = softc->params.secsize; lba = offset / secsize; count = length / secsize; if ((periph->flags & CAM_PERIPH_INVALID) != 0) { cam_periph_unlock(periph); return (ENXIO); } if (length > 0) { xpt_setup_ccb(&ccb.ccb_h, periph->path, CAM_PRIORITY_NORMAL); ccb.ccb_h.ccb_state = ADA_CCB_DUMP; cam_fill_ataio(&ccb.ataio, 0, adadone, CAM_DIR_OUT, 0, (u_int8_t *) virtual, length, ada_default_timeout*1000); if ((softc->flags & ADA_FLAG_CAN_48BIT) && (lba + count >= ATA_MAX_28BIT_LBA || count >= 256)) { ata_48bit_cmd(&ccb.ataio, ATA_WRITE_DMA48, 0, lba, count); } else { ata_28bit_cmd(&ccb.ataio, ATA_WRITE_DMA, 0, lba, count); } xpt_polled_action(&ccb); error = cam_periph_error(&ccb, 0, SF_NO_RECOVERY | SF_NO_RETRY, NULL); if ((ccb.ccb_h.status & CAM_DEV_QFRZN) != 0) cam_release_devq(ccb.ccb_h.path, /*relsim_flags*/0, /*reduction*/0, /*timeout*/0, /*getcount_only*/0); if (error != 0) printf("Aborting dump due to I/O error.\n"); cam_periph_unlock(periph); return (error); } if (softc->flags & ADA_FLAG_CAN_FLUSHCACHE) { xpt_setup_ccb(&ccb.ccb_h, periph->path, CAM_PRIORITY_NORMAL); ccb.ccb_h.ccb_state = ADA_CCB_DUMP; cam_fill_ataio(&ccb.ataio, 0, adadone, CAM_DIR_NONE, 0, NULL, 0, ada_default_timeout*1000); if (softc->flags & ADA_FLAG_CAN_48BIT) ata_48bit_cmd(&ccb.ataio, ATA_FLUSHCACHE48, 0, 0, 0); else ata_28bit_cmd(&ccb.ataio, ATA_FLUSHCACHE, 0, 0, 0); xpt_polled_action(&ccb); error = cam_periph_error(&ccb, 0, SF_NO_RECOVERY | SF_NO_RETRY, NULL); if ((ccb.ccb_h.status & CAM_DEV_QFRZN) != 0) cam_release_devq(ccb.ccb_h.path, /*relsim_flags*/0, /*reduction*/0, /*timeout*/0, /*getcount_only*/0); if (error != 0) xpt_print(periph->path, "Synchronize cache failed\n"); } cam_periph_unlock(periph); return (error); } static void adainit(void) { cam_status status; /* * Install a global async callback. This callback will * receive async callbacks like "new device found". 
*/ status = xpt_register_async(AC_FOUND_DEVICE, adaasync, NULL, NULL); if (status != CAM_REQ_CMP) { printf("ada: Failed to attach master async callback " "due to status 0x%x!\n", status); } else if (ada_send_ordered) { /* Register our event handlers */ if ((EVENTHANDLER_REGISTER(power_suspend, adasuspend, NULL, EVENTHANDLER_PRI_LAST)) == NULL) printf("adainit: power event registration failed!\n"); if ((EVENTHANDLER_REGISTER(power_resume, adaresume, NULL, EVENTHANDLER_PRI_LAST)) == NULL) printf("adainit: power event registration failed!\n"); if ((EVENTHANDLER_REGISTER(shutdown_post_sync, adashutdown, NULL, SHUTDOWN_PRI_DEFAULT)) == NULL) printf("adainit: shutdown event registration failed!\n"); } } /* * Callback from GEOM, called when it has finished cleaning up its * resources. */ static void adadiskgonecb(struct disk *dp) { struct cam_periph *periph; periph = (struct cam_periph *)dp->d_drv1; cam_periph_release(periph); } static void adaoninvalidate(struct cam_periph *periph) { struct ada_softc *softc; softc = (struct ada_softc *)periph->softc; /* * De-register any async callbacks. */ xpt_register_async(0, adaasync, periph, periph->path); /* * Return all queued I/O with ENXIO. * XXX Handle any transactions queued to the card * with XPT_ABORT_CCB. */ bioq_flush(&softc->bio_queue, NULL, ENXIO); bioq_flush(&softc->trim_queue, NULL, ENXIO); disk_gone(softc->disk); } static void adacleanup(struct cam_periph *periph) { struct ada_softc *softc; softc = (struct ada_softc *)periph->softc; cam_periph_unlock(periph); /* * If we can't free the sysctl tree, oh well... */ if ((softc->flags & ADA_FLAG_SCTX_INIT) != 0 && sysctl_ctx_free(&softc->sysctl_ctx) != 0) { xpt_print(periph->path, "can't remove sysctl context\n"); } disk_destroy(softc->disk); callout_drain(&softc->sendordered_c); free(softc, M_DEVBUF); cam_periph_lock(periph); } static void adaasync(void *callback_arg, u_int32_t code, struct cam_path *path, void *arg) { struct ccb_getdev cgd; struct cam_periph *periph; struct ada_softc *softc; periph = (struct cam_periph *)callback_arg; switch (code) { case AC_FOUND_DEVICE: { struct ccb_getdev *cgd; cam_status status; cgd = (struct ccb_getdev *)arg; if (cgd == NULL) break; if (cgd->protocol != PROTO_ATA) break; /* * Allocate a peripheral instance for * this device and start the probe * process. 
*/ status = cam_periph_alloc(adaregister, adaoninvalidate, adacleanup, adastart, "ada", CAM_PERIPH_BIO, path, adaasync, AC_FOUND_DEVICE, cgd); if (status != CAM_REQ_CMP && status != CAM_REQ_INPROG) printf("adaasync: Unable to attach to new device " "due to status 0x%x\n", status); break; } case AC_GETDEV_CHANGED: { softc = (struct ada_softc *)periph->softc; xpt_setup_ccb(&cgd.ccb_h, periph->path, CAM_PRIORITY_NORMAL); cgd.ccb_h.func_code = XPT_GDEV_TYPE; xpt_action((union ccb *)&cgd); if ((cgd.ident_data.capabilities1 & ATA_SUPPORT_DMA) && (cgd.inq_flags & SID_DMA)) softc->flags |= ADA_FLAG_CAN_DMA; else softc->flags &= ~ADA_FLAG_CAN_DMA; if (cgd.ident_data.support.command2 & ATA_SUPPORT_ADDRESS48) { softc->flags |= ADA_FLAG_CAN_48BIT; if (cgd.inq_flags & SID_DMA48) softc->flags |= ADA_FLAG_CAN_DMA48; else softc->flags &= ~ADA_FLAG_CAN_DMA48; } else softc->flags &= ~(ADA_FLAG_CAN_48BIT | ADA_FLAG_CAN_DMA48); if ((cgd.ident_data.satacapabilities & ATA_SUPPORT_NCQ) && (cgd.inq_flags & SID_DMA) && (cgd.inq_flags & SID_CmdQue)) softc->flags |= ADA_FLAG_CAN_NCQ; else softc->flags &= ~ADA_FLAG_CAN_NCQ; if ((cgd.ident_data.support_dsm & ATA_SUPPORT_DSM_TRIM) && (cgd.inq_flags & SID_DMA)) softc->flags |= ADA_FLAG_CAN_TRIM; else softc->flags &= ~ADA_FLAG_CAN_TRIM; cam_periph_async(periph, code, path, arg); break; } case AC_ADVINFO_CHANGED: { uintptr_t buftype; buftype = (uintptr_t)arg; if (buftype == CDAI_TYPE_PHYS_PATH) { struct ada_softc *softc; softc = periph->softc; disk_attr_changed(softc->disk, "GEOM::physpath", M_NOWAIT); } break; } case AC_SENT_BDR: case AC_BUS_RESET: { softc = (struct ada_softc *)periph->softc; cam_periph_async(periph, code, path, arg); if (softc->state != ADA_STATE_NORMAL) break; xpt_setup_ccb(&cgd.ccb_h, periph->path, CAM_PRIORITY_NORMAL); cgd.ccb_h.func_code = XPT_GDEV_TYPE; xpt_action((union ccb *)&cgd); if (ADA_RA >= 0 && cgd.ident_data.support.command1 & ATA_SUPPORT_LOOKAHEAD) softc->state = ADA_STATE_RAHEAD; else if (ADA_WC >= 0 && cgd.ident_data.support.command1 & ATA_SUPPORT_WRITECACHE) softc->state = ADA_STATE_WCACHE; else break; if (cam_periph_acquire(periph) != CAM_REQ_CMP) softc->state = ADA_STATE_NORMAL; else xpt_schedule(periph, CAM_PRIORITY_DEV); } default: cam_periph_async(periph, code, path, arg); break; } } static void adasysctlinit(void *context, int pending) { struct cam_periph *periph; struct ada_softc *softc; char tmpstr[80], tmpstr2[80]; periph = (struct cam_periph *)context; /* periph was held for us when this task was enqueued */ if ((periph->flags & CAM_PERIPH_INVALID) != 0) { cam_periph_release(periph); return; } softc = (struct ada_softc *)periph->softc; snprintf(tmpstr, sizeof(tmpstr), "CAM ADA unit %d", periph->unit_number); snprintf(tmpstr2, sizeof(tmpstr2), "%d", periph->unit_number); sysctl_ctx_init(&softc->sysctl_ctx); softc->flags |= ADA_FLAG_SCTX_INIT; softc->sysctl_tree = SYSCTL_ADD_NODE(&softc->sysctl_ctx, SYSCTL_STATIC_CHILDREN(_kern_cam_ada), OID_AUTO, tmpstr2, CTLFLAG_RD, 0, tmpstr); if (softc->sysctl_tree == NULL) { printf("adasysctlinit: unable to allocate sysctl tree\n"); cam_periph_release(periph); return; } SYSCTL_ADD_INT(&softc->sysctl_ctx, SYSCTL_CHILDREN(softc->sysctl_tree), OID_AUTO, "read_ahead", CTLFLAG_RW | CTLFLAG_MPSAFE, &softc->read_ahead, 0, "Enable disk read ahead."); SYSCTL_ADD_INT(&softc->sysctl_ctx, SYSCTL_CHILDREN(softc->sysctl_tree), OID_AUTO, "write_cache", CTLFLAG_RW | CTLFLAG_MPSAFE, &softc->write_cache, 0, "Enable disk write cache."); SYSCTL_ADD_INT(&softc->sysctl_ctx, SYSCTL_CHILDREN(softc->sysctl_tree), 
OID_AUTO, "sort_io_queue", CTLFLAG_RW | CTLFLAG_MPSAFE, &softc->sort_io_queue, 0, "Sort IO queue to try and optimise disk access patterns"); #ifdef ADA_TEST_FAILURE /* * Add a 'door bell' sysctl which allows one to set it from userland * and cause something bad to happen. For the moment, we only allow * whacking the next read or write. */ SYSCTL_ADD_INT(&softc->sysctl_ctx, SYSCTL_CHILDREN(softc->sysctl_tree), OID_AUTO, "force_read_error", CTLFLAG_RW | CTLFLAG_MPSAFE, &softc->force_read_error, 0, "Force a read error for the next N reads."); SYSCTL_ADD_INT(&softc->sysctl_ctx, SYSCTL_CHILDREN(softc->sysctl_tree), OID_AUTO, "force_write_error", CTLFLAG_RW | CTLFLAG_MPSAFE, &softc->force_write_error, 0, "Force a write error for the next N writes."); SYSCTL_ADD_INT(&softc->sysctl_ctx, SYSCTL_CHILDREN(softc->sysctl_tree), OID_AUTO, "periodic_read_error", CTLFLAG_RW | CTLFLAG_MPSAFE, &softc->periodic_read_error, 0, "Force a read error every N reads (don't set too low)."); #endif cam_periph_release(periph); } static int adagetattr(struct bio *bp) { int ret; struct cam_periph *periph; periph = (struct cam_periph *)bp->bio_disk->d_drv1; cam_periph_lock(periph); ret = xpt_getattr(bp->bio_data, bp->bio_length, bp->bio_attribute, periph->path); cam_periph_unlock(periph); if (ret == 0) bp->bio_completed = bp->bio_length; return ret; } static cam_status adaregister(struct cam_periph *periph, void *arg) { struct ada_softc *softc; struct ccb_pathinq cpi; struct ccb_getdev *cgd; char announce_buf[80], buf1[32]; struct disk_params *dp; caddr_t match; u_int maxio; int legacy_id, quirks; cgd = (struct ccb_getdev *)arg; if (cgd == NULL) { printf("adaregister: no getdev CCB, can't register device\n"); return(CAM_REQ_CMP_ERR); } softc = (struct ada_softc *)malloc(sizeof(*softc), M_DEVBUF, M_NOWAIT|M_ZERO); if (softc == NULL) { printf("adaregister: Unable to probe new device. " "Unable to allocate softc\n"); return(CAM_REQ_CMP_ERR); } bioq_init(&softc->bio_queue); bioq_init(&softc->trim_queue); if ((cgd->ident_data.capabilities1 & ATA_SUPPORT_DMA) && (cgd->inq_flags & SID_DMA)) softc->flags |= ADA_FLAG_CAN_DMA; if (cgd->ident_data.support.command2 & ATA_SUPPORT_ADDRESS48) { softc->flags |= ADA_FLAG_CAN_48BIT; if (cgd->inq_flags & SID_DMA48) softc->flags |= ADA_FLAG_CAN_DMA48; } if (cgd->ident_data.support.command2 & ATA_SUPPORT_FLUSHCACHE) softc->flags |= ADA_FLAG_CAN_FLUSHCACHE; if (cgd->ident_data.support.command1 & ATA_SUPPORT_POWERMGT) softc->flags |= ADA_FLAG_CAN_POWERMGT; if ((cgd->ident_data.satacapabilities & ATA_SUPPORT_NCQ) && (cgd->inq_flags & SID_DMA) && (cgd->inq_flags & SID_CmdQue)) softc->flags |= ADA_FLAG_CAN_NCQ; if ((cgd->ident_data.support_dsm & ATA_SUPPORT_DSM_TRIM) && (cgd->inq_flags & SID_DMA)) { softc->flags |= ADA_FLAG_CAN_TRIM; softc->trim_max_ranges = TRIM_MAX_RANGES; if (cgd->ident_data.max_dsm_blocks != 0) { softc->trim_max_ranges = min(cgd->ident_data.max_dsm_blocks * ATA_DSM_BLK_RANGES, softc->trim_max_ranges); } } if (cgd->ident_data.support.command2 & ATA_SUPPORT_CFA) softc->flags |= ADA_FLAG_CAN_CFA; periph->softc = softc; /* * See if this device has any quirks. 
*/ match = cam_quirkmatch((caddr_t)&cgd->ident_data, (caddr_t)ada_quirk_table, sizeof(ada_quirk_table)/sizeof(*ada_quirk_table), sizeof(*ada_quirk_table), ata_identify_match); if (match != NULL) softc->quirks = ((struct ada_quirk_entry *)match)->quirks; else softc->quirks = ADA_Q_NONE; bzero(&cpi, sizeof(cpi)); xpt_setup_ccb(&cpi.ccb_h, periph->path, CAM_PRIORITY_NONE); cpi.ccb_h.func_code = XPT_PATH_INQ; xpt_action((union ccb *)&cpi); TASK_INIT(&softc->sysctl_task, 0, adasysctlinit, periph); /* * Register this media as a disk */ (void)cam_periph_hold(periph, PRIBIO); cam_periph_unlock(periph); snprintf(announce_buf, sizeof(announce_buf), "kern.cam.ada.%d.quirks", periph->unit_number); quirks = softc->quirks; TUNABLE_INT_FETCH(announce_buf, &quirks); softc->quirks = quirks; softc->read_ahead = -1; snprintf(announce_buf, sizeof(announce_buf), "kern.cam.ada.%d.read_ahead", periph->unit_number); TUNABLE_INT_FETCH(announce_buf, &softc->read_ahead); softc->write_cache = -1; snprintf(announce_buf, sizeof(announce_buf), "kern.cam.ada.%d.write_cache", periph->unit_number); TUNABLE_INT_FETCH(announce_buf, &softc->write_cache); /* Disable queue sorting for non-rotational media by default. */ if (cgd->ident_data.media_rotation_rate == ATA_RATE_NON_ROTATING) softc->sort_io_queue = 0; else softc->sort_io_queue = -1; adagetparams(periph, cgd); softc->disk = disk_alloc(); softc->disk->d_rotation_rate = cgd->ident_data.media_rotation_rate; softc->disk->d_devstat = devstat_new_entry(periph->periph_name, periph->unit_number, softc->params.secsize, DEVSTAT_ALL_SUPPORTED, DEVSTAT_TYPE_DIRECT | XPORT_DEVSTAT_TYPE(cpi.transport), DEVSTAT_PRIORITY_DISK); softc->disk->d_open = adaopen; softc->disk->d_close = adaclose; softc->disk->d_strategy = adastrategy; softc->disk->d_getattr = adagetattr; softc->disk->d_dump = adadump; softc->disk->d_gone = adadiskgonecb; softc->disk->d_name = "ada"; softc->disk->d_drv1 = periph; maxio = cpi.maxio; /* Honor max I/O size of SIM */ if (maxio == 0) maxio = DFLTPHYS; /* traditional default */ else if (maxio > MAXPHYS) maxio = MAXPHYS; /* for safety */ if (softc->flags & ADA_FLAG_CAN_48BIT) maxio = min(maxio, 65536 * softc->params.secsize); else /* 28bit ATA command limit */ maxio = min(maxio, 256 * softc->params.secsize); softc->disk->d_maxsize = maxio; softc->disk->d_unit = periph->unit_number; softc->disk->d_flags = DISKFLAG_DIRECT_COMPLETION; if (softc->flags & ADA_FLAG_CAN_FLUSHCACHE) softc->disk->d_flags |= DISKFLAG_CANFLUSHCACHE; if (softc->flags & ADA_FLAG_CAN_TRIM) { softc->disk->d_flags |= DISKFLAG_CANDELETE; softc->disk->d_delmaxsize = softc->params.secsize * ATA_DSM_RANGE_MAX * softc->trim_max_ranges; } else if ((softc->flags & ADA_FLAG_CAN_CFA) && !(softc->flags & ADA_FLAG_CAN_48BIT)) { softc->disk->d_flags |= DISKFLAG_CANDELETE; softc->disk->d_delmaxsize = 256 * softc->params.secsize; } else softc->disk->d_delmaxsize = maxio; if ((cpi.hba_misc & PIM_UNMAPPED) != 0) softc->disk->d_flags |= DISKFLAG_UNMAPPED_BIO; strlcpy(softc->disk->d_descr, cgd->ident_data.model, MIN(sizeof(softc->disk->d_descr), sizeof(cgd->ident_data.model))); strlcpy(softc->disk->d_ident, cgd->ident_data.serial, MIN(sizeof(softc->disk->d_ident), sizeof(cgd->ident_data.serial))); softc->disk->d_hba_vendor = cpi.hba_vendor; softc->disk->d_hba_device = cpi.hba_device; softc->disk->d_hba_subvendor = cpi.hba_subvendor; softc->disk->d_hba_subdevice = cpi.hba_subdevice; softc->disk->d_sectorsize = softc->params.secsize; softc->disk->d_mediasize = (off_t)softc->params.sectors * softc->params.secsize; if 
(ata_physical_sector_size(&cgd->ident_data) != softc->params.secsize) { softc->disk->d_stripesize = ata_physical_sector_size(&cgd->ident_data); softc->disk->d_stripeoffset = (softc->disk->d_stripesize - ata_logical_sector_offset(&cgd->ident_data)) % softc->disk->d_stripesize; } else if (softc->quirks & ADA_Q_4K) { softc->disk->d_stripesize = 4096; softc->disk->d_stripeoffset = 0; } softc->disk->d_fwsectors = softc->params.secs_per_track; softc->disk->d_fwheads = softc->params.heads; ata_disk_firmware_geom_adjust(softc->disk); if (ada_legacy_aliases) { #ifdef ATA_STATIC_ID legacy_id = xpt_path_legacy_ata_id(periph->path); #else legacy_id = softc->disk->d_unit; #endif if (legacy_id >= 0) { snprintf(announce_buf, sizeof(announce_buf), "kern.devalias.%s%d", softc->disk->d_name, softc->disk->d_unit); snprintf(buf1, sizeof(buf1), "ad%d", legacy_id); kern_setenv(announce_buf, buf1); } } else legacy_id = -1; /* * Acquire a reference to the periph before we register with GEOM. * We'll release this reference once GEOM calls us back (via * adadiskgonecb()) telling us that our provider has been freed. */ if (cam_periph_acquire(periph) != CAM_REQ_CMP) { xpt_print(periph->path, "%s: lost periph during " "registration!\n", __func__); cam_periph_lock(periph); return (CAM_REQ_CMP_ERR); } disk_create(softc->disk, DISK_VERSION); cam_periph_lock(periph); cam_periph_unhold(periph); dp = &softc->params; snprintf(announce_buf, sizeof(announce_buf), "%juMB (%ju %u byte sectors: %dH %dS/T %dC)", (uintmax_t)(((uintmax_t)dp->secsize * dp->sectors) / (1024*1024)), (uintmax_t)dp->sectors, dp->secsize, dp->heads, dp->secs_per_track, dp->cylinders); xpt_announce_periph(periph, announce_buf); xpt_announce_quirks(periph, softc->quirks, ADA_Q_BIT_STRING); if (legacy_id >= 0) printf("%s%d: Previously was known as ad%d\n", periph->periph_name, periph->unit_number, legacy_id); /* * Create our sysctl variables, now that we know * we have successfully attached. */ if (cam_periph_acquire(periph) == CAM_REQ_CMP) taskqueue_enqueue(taskqueue_thread, &softc->sysctl_task); /* * Add async callbacks for bus reset and * bus device reset calls. I don't bother * checking if this fails as, in most cases, * the system will function just fine without * them and the only alternative would be to * not attach the device on failure. */ xpt_register_async(AC_SENT_BDR | AC_BUS_RESET | AC_LOST_DEVICE | AC_GETDEV_CHANGED | AC_ADVINFO_CHANGED, adaasync, periph, periph->path); /* * Schedule a periodic event to occasionally send an * ordered tag to a device. 
*/ callout_init_mtx(&softc->sendordered_c, cam_periph_mtx(periph), 0); callout_reset(&softc->sendordered_c, (ada_default_timeout * hz) / ADA_ORDEREDTAG_INTERVAL, adasendorderedtag, softc); if (ADA_RA >= 0 && cgd->ident_data.support.command1 & ATA_SUPPORT_LOOKAHEAD) { softc->state = ADA_STATE_RAHEAD; } else if (ADA_WC >= 0 && cgd->ident_data.support.command1 & ATA_SUPPORT_WRITECACHE) { softc->state = ADA_STATE_WCACHE; } else { softc->state = ADA_STATE_NORMAL; return(CAM_REQ_CMP); } if (cam_periph_acquire(periph) != CAM_REQ_CMP) softc->state = ADA_STATE_NORMAL; else xpt_schedule(periph, CAM_PRIORITY_DEV); return(CAM_REQ_CMP); } static void ada_dsmtrim(struct ada_softc *softc, struct bio *bp, struct ccb_ataio *ataio) { struct trim_request *req = &softc->trim_req; uint64_t lastlba = (uint64_t)-1; int c, lastcount = 0, off, ranges = 0; bzero(req, sizeof(*req)); TAILQ_INIT(&req->bps); do { uint64_t lba = bp->bio_pblkno; int count = bp->bio_bcount / softc->params.secsize; bioq_remove(&softc->trim_queue, bp); /* Try to extend the previous range. */ if (lba == lastlba) { c = min(count, ATA_DSM_RANGE_MAX - lastcount); lastcount += c; off = (ranges - 1) * ATA_DSM_RANGE_SIZE; req->data[off + 6] = lastcount & 0xff; req->data[off + 7] = (lastcount >> 8) & 0xff; count -= c; lba += c; } while (count > 0) { c = min(count, ATA_DSM_RANGE_MAX); off = ranges * ATA_DSM_RANGE_SIZE; req->data[off + 0] = lba & 0xff; req->data[off + 1] = (lba >> 8) & 0xff; req->data[off + 2] = (lba >> 16) & 0xff; req->data[off + 3] = (lba >> 24) & 0xff; req->data[off + 4] = (lba >> 32) & 0xff; req->data[off + 5] = (lba >> 40) & 0xff; req->data[off + 6] = c & 0xff; req->data[off + 7] = (c >> 8) & 0xff; lba += c; count -= c; lastcount = c; ranges++; /* * Its the caller's responsibility to ensure the * request will fit so we don't need to check for * overrun here */ } lastlba = lba; TAILQ_INSERT_TAIL(&req->bps, bp, bio_queue); bp = bioq_first(&softc->trim_queue); if (bp == NULL || bp->bio_bcount / softc->params.secsize > (softc->trim_max_ranges - ranges) * ATA_DSM_RANGE_MAX) break; } while (1); cam_fill_ataio(ataio, ada_retry_count, adadone, CAM_DIR_OUT, 0, req->data, ((ranges + ATA_DSM_BLK_RANGES - 1) / ATA_DSM_BLK_RANGES) * ATA_DSM_BLK_SIZE, ada_default_timeout * 1000); ata_48bit_cmd(ataio, ATA_DATA_SET_MANAGEMENT, ATA_DSM_TRIM, 0, (ranges + ATA_DSM_BLK_RANGES - 1) / ATA_DSM_BLK_RANGES); } static void ada_cfaerase(struct ada_softc *softc, struct bio *bp, struct ccb_ataio *ataio) { struct trim_request *req = &softc->trim_req; uint64_t lba = bp->bio_pblkno; uint16_t count = bp->bio_bcount / softc->params.secsize; bzero(req, sizeof(*req)); TAILQ_INIT(&req->bps); bioq_remove(&softc->trim_queue, bp); TAILQ_INSERT_TAIL(&req->bps, bp, bio_queue); cam_fill_ataio(ataio, ada_retry_count, adadone, CAM_DIR_NONE, 0, NULL, 0, ada_default_timeout*1000); if (count >= 256) count = 0; ata_28bit_cmd(ataio, ATA_CFA_ERASE, 0, lba, count); } static void adastart(struct cam_periph *periph, union ccb *start_ccb) { struct ada_softc *softc = (struct ada_softc *)periph->softc; struct ccb_ataio *ataio = &start_ccb->ataio; CAM_DEBUG(periph->path, CAM_DEBUG_TRACE, ("adastart\n")); switch (softc->state) { case ADA_STATE_NORMAL: { struct bio *bp; u_int8_t tag_code; /* Run TRIM if not running yet. 
*/ if (!softc->trim_running && (bp = bioq_first(&softc->trim_queue)) != 0) { if (softc->flags & ADA_FLAG_CAN_TRIM) { ada_dsmtrim(softc, bp, ataio); } else if ((softc->flags & ADA_FLAG_CAN_CFA) && !(softc->flags & ADA_FLAG_CAN_48BIT)) { ada_cfaerase(softc, bp, ataio); } else { panic("adastart: BIO_DELETE without method, not possible."); } softc->trim_running = 1; start_ccb->ccb_h.ccb_state = ADA_CCB_TRIM; start_ccb->ccb_h.flags |= CAM_UNLOCKED; goto out; } /* Run regular command. */ bp = bioq_first(&softc->bio_queue); if (bp == NULL) { xpt_release_ccb(start_ccb); break; } bioq_remove(&softc->bio_queue, bp); if ((bp->bio_flags & BIO_ORDERED) != 0 || (softc->flags & ADA_FLAG_NEED_OTAG) != 0) { softc->flags &= ~ADA_FLAG_NEED_OTAG; softc->flags |= ADA_FLAG_WAS_OTAG; tag_code = 0; } else { tag_code = 1; } switch (bp->bio_cmd) { case BIO_WRITE: softc->flags |= ADA_FLAG_DIRTY; /* FALLTHROUGH */ case BIO_READ: { uint64_t lba = bp->bio_pblkno; uint16_t count = bp->bio_bcount / softc->params.secsize; #ifdef ADA_TEST_FAILURE int fail = 0; /* * Support the failure ioctls. If the command is a * read, and there are pending forced read errors, or * if a write and pending write errors, then fail this * operation with EIO. This is useful for testing * purposes. Also, support having every Nth read fail. * * This is a rather blunt tool. */ if (bp->bio_cmd == BIO_READ) { if (softc->force_read_error) { softc->force_read_error--; fail = 1; } if (softc->periodic_read_error > 0) { if (++softc->periodic_read_count >= softc->periodic_read_error) { softc->periodic_read_count = 0; fail = 1; } } } else { if (softc->force_write_error) { softc->force_write_error--; fail = 1; } } if (fail) { bp->bio_error = EIO; bp->bio_flags |= BIO_ERROR; biodone(bp); xpt_release_ccb(start_ccb); adaschedule(periph); return; } #endif KASSERT((bp->bio_flags & BIO_UNMAPPED) == 0 || round_page(bp->bio_bcount + bp->bio_ma_offset) / PAGE_SIZE == bp->bio_ma_n, ("Short bio %p", bp)); cam_fill_ataio(ataio, ada_retry_count, adadone, (bp->bio_cmd == BIO_READ ? CAM_DIR_IN : CAM_DIR_OUT) | ((bp->bio_flags & BIO_UNMAPPED) != 0 ? CAM_DATA_BIO : 0), tag_code, ((bp->bio_flags & BIO_UNMAPPED) != 0) ? 
(void *)bp : bp->bio_data, bp->bio_bcount, ada_default_timeout*1000); if ((softc->flags & ADA_FLAG_CAN_NCQ) && tag_code) { if (bp->bio_cmd == BIO_READ) { ata_ncq_cmd(ataio, ATA_READ_FPDMA_QUEUED, lba, count); } else { ata_ncq_cmd(ataio, ATA_WRITE_FPDMA_QUEUED, lba, count); } } else if ((softc->flags & ADA_FLAG_CAN_48BIT) && (lba + count >= ATA_MAX_28BIT_LBA || count > 256)) { if (softc->flags & ADA_FLAG_CAN_DMA48) { if (bp->bio_cmd == BIO_READ) { ata_48bit_cmd(ataio, ATA_READ_DMA48, 0, lba, count); } else { ata_48bit_cmd(ataio, ATA_WRITE_DMA48, 0, lba, count); } } else { if (bp->bio_cmd == BIO_READ) { ata_48bit_cmd(ataio, ATA_READ_MUL48, 0, lba, count); } else { ata_48bit_cmd(ataio, ATA_WRITE_MUL48, 0, lba, count); } } } else { if (count == 256) count = 0; if (softc->flags & ADA_FLAG_CAN_DMA) { if (bp->bio_cmd == BIO_READ) { ata_28bit_cmd(ataio, ATA_READ_DMA, 0, lba, count); } else { ata_28bit_cmd(ataio, ATA_WRITE_DMA, 0, lba, count); } } else { if (bp->bio_cmd == BIO_READ) { ata_28bit_cmd(ataio, ATA_READ_MUL, 0, lba, count); } else { ata_28bit_cmd(ataio, ATA_WRITE_MUL, 0, lba, count); } } } break; } case BIO_FLUSH: cam_fill_ataio(ataio, 1, adadone, CAM_DIR_NONE, 0, NULL, 0, ada_default_timeout*1000); if (softc->flags & ADA_FLAG_CAN_48BIT) ata_48bit_cmd(ataio, ATA_FLUSHCACHE48, 0, 0, 0); else ata_28bit_cmd(ataio, ATA_FLUSHCACHE, 0, 0, 0); break; } start_ccb->ccb_h.ccb_state = ADA_CCB_BUFFER_IO; start_ccb->ccb_h.flags |= CAM_UNLOCKED; out: start_ccb->ccb_h.ccb_bp = bp; softc->outstanding_cmds++; softc->refcount++; cam_periph_unlock(periph); xpt_action(start_ccb); cam_periph_lock(periph); softc->refcount--; /* May have more work to do, so ensure we stay scheduled */ adaschedule(periph); break; } case ADA_STATE_RAHEAD: case ADA_STATE_WCACHE: { cam_fill_ataio(ataio, 1, adadone, CAM_DIR_NONE, 0, NULL, 0, ada_default_timeout*1000); if (softc->state == ADA_STATE_RAHEAD) { ata_28bit_cmd(ataio, ATA_SETFEATURES, ADA_RA ? ATA_SF_ENAB_RCACHE : ATA_SF_DIS_RCACHE, 0, 0); start_ccb->ccb_h.ccb_state = ADA_CCB_RAHEAD; } else { ata_28bit_cmd(ataio, ATA_SETFEATURES, ADA_WC ? ATA_SF_ENAB_WCACHE : ATA_SF_DIS_WCACHE, 0, 0); start_ccb->ccb_h.ccb_state = ADA_CCB_WCACHE; } start_ccb->ccb_h.flags |= CAM_DEV_QFREEZE; xpt_action(start_ccb); break; } } } static void adadone(struct cam_periph *periph, union ccb *done_ccb) { struct ada_softc *softc; struct ccb_ataio *ataio; struct ccb_getdev *cgd; struct cam_path *path; int state; softc = (struct ada_softc *)periph->softc; ataio = &done_ccb->ataio; path = done_ccb->ccb_h.path; CAM_DEBUG(path, CAM_DEBUG_TRACE, ("adadone\n")); state = ataio->ccb_h.ccb_state & ADA_CCB_TYPE_MASK; switch (state) { case ADA_CCB_BUFFER_IO: case ADA_CCB_TRIM: { struct bio *bp; int error; cam_periph_lock(periph); if ((done_ccb->ccb_h.status & CAM_STATUS_MASK) != CAM_REQ_CMP) { error = adaerror(done_ccb, 0, 0); if (error == ERESTART) { /* A retry was scheduled, so just return. 
*/ cam_periph_unlock(periph); return; } if ((done_ccb->ccb_h.status & CAM_DEV_QFRZN) != 0) cam_release_devq(path, /*relsim_flags*/0, /*reduction*/0, /*timeout*/0, /*getcount_only*/0); } else { if ((done_ccb->ccb_h.status & CAM_DEV_QFRZN) != 0) panic("REQ_CMP with QFRZN"); error = 0; } bp = (struct bio *)done_ccb->ccb_h.ccb_bp; bp->bio_error = error; if (error != 0) { bp->bio_resid = bp->bio_bcount; bp->bio_flags |= BIO_ERROR; } else { if (state == ADA_CCB_TRIM) bp->bio_resid = 0; else bp->bio_resid = ataio->resid; if (bp->bio_resid > 0) bp->bio_flags |= BIO_ERROR; } softc->outstanding_cmds--; if (softc->outstanding_cmds == 0) softc->flags |= ADA_FLAG_WAS_OTAG; xpt_release_ccb(done_ccb); if (state == ADA_CCB_TRIM) { TAILQ_HEAD(, bio) queue; struct bio *bp1; TAILQ_INIT(&queue); TAILQ_CONCAT(&queue, &softc->trim_req.bps, bio_queue); + /* + * Normally, the xpt_release_ccb() above would make sure + * that when we have more work to do, that work would + * get kicked off. However, we specifically keep + * trim_running set to 0 before the call above to allow + * other I/O to progress when many BIO_DELETE requests + * are pushed down. We set trim_running to 0 and call + * daschedule again so that we don't stall if there are + * no other I/Os pending apart from BIO_DELETEs. + */ softc->trim_running = 0; adaschedule(periph); cam_periph_unlock(periph); while ((bp1 = TAILQ_FIRST(&queue)) != NULL) { TAILQ_REMOVE(&queue, bp1, bio_queue); bp1->bio_error = error; if (error != 0) { bp1->bio_flags |= BIO_ERROR; bp1->bio_resid = bp1->bio_bcount; } else bp1->bio_resid = 0; biodone(bp1); } } else { cam_periph_unlock(periph); biodone(bp); } return; } case ADA_CCB_RAHEAD: { if ((done_ccb->ccb_h.status & CAM_STATUS_MASK) != CAM_REQ_CMP) { if (adaerror(done_ccb, 0, 0) == ERESTART) { out: /* Drop freeze taken due to CAM_DEV_QFREEZE */ cam_release_devq(path, 0, 0, 0, FALSE); return; } else if ((done_ccb->ccb_h.status & CAM_DEV_QFRZN) != 0) { cam_release_devq(path, /*relsim_flags*/0, /*reduction*/0, /*timeout*/0, /*getcount_only*/0); } } /* * Since our peripheral may be invalidated by an error * above or an external event, we must release our CCB * before releasing the reference on the peripheral. * The peripheral will only go away once the last reference * is removed, and we need it around for the CCB release * operation. */ cgd = (struct ccb_getdev *)done_ccb; xpt_setup_ccb(&cgd->ccb_h, path, CAM_PRIORITY_NORMAL); cgd->ccb_h.func_code = XPT_GDEV_TYPE; xpt_action((union ccb *)cgd); if (ADA_WC >= 0 && cgd->ident_data.support.command1 & ATA_SUPPORT_WRITECACHE) { softc->state = ADA_STATE_WCACHE; xpt_release_ccb(done_ccb); xpt_schedule(periph, CAM_PRIORITY_DEV); goto out; } softc->state = ADA_STATE_NORMAL; xpt_release_ccb(done_ccb); /* Drop freeze taken due to CAM_DEV_QFREEZE */ cam_release_devq(path, 0, 0, 0, FALSE); adaschedule(periph); cam_periph_release_locked(periph); return; } case ADA_CCB_WCACHE: { if ((done_ccb->ccb_h.status & CAM_STATUS_MASK) != CAM_REQ_CMP) { if (adaerror(done_ccb, 0, 0) == ERESTART) { goto out; } else if ((done_ccb->ccb_h.status & CAM_DEV_QFRZN) != 0) { cam_release_devq(path, /*relsim_flags*/0, /*reduction*/0, /*timeout*/0, /*getcount_only*/0); } } softc->state = ADA_STATE_NORMAL; /* * Since our peripheral may be invalidated by an error * above or an external event, we must release our CCB * before releasing the reference on the peripheral. * The peripheral will only go away once the last reference * is removed, and we need it around for the CCB release * operation. 
*/ xpt_release_ccb(done_ccb); /* Drop freeze taken due to CAM_DEV_QFREEZE */ cam_release_devq(path, 0, 0, 0, FALSE); adaschedule(periph); cam_periph_release_locked(periph); return; } case ADA_CCB_DUMP: /* No-op. We're polling */ return; default: break; } xpt_release_ccb(done_ccb); } static int adaerror(union ccb *ccb, u_int32_t cam_flags, u_int32_t sense_flags) { return(cam_periph_error(ccb, cam_flags, sense_flags, NULL)); } static void adagetparams(struct cam_periph *periph, struct ccb_getdev *cgd) { struct ada_softc *softc = (struct ada_softc *)periph->softc; struct disk_params *dp = &softc->params; u_int64_t lbasize48; u_int32_t lbasize; dp->secsize = ata_logical_sector_size(&cgd->ident_data); if ((cgd->ident_data.atavalid & ATA_FLAG_54_58) && cgd->ident_data.current_heads && cgd->ident_data.current_sectors) { dp->heads = cgd->ident_data.current_heads; dp->secs_per_track = cgd->ident_data.current_sectors; dp->cylinders = cgd->ident_data.cylinders; dp->sectors = (u_int32_t)cgd->ident_data.current_size_1 | ((u_int32_t)cgd->ident_data.current_size_2 << 16); } else { dp->heads = cgd->ident_data.heads; dp->secs_per_track = cgd->ident_data.sectors; dp->cylinders = cgd->ident_data.cylinders; dp->sectors = cgd->ident_data.cylinders * dp->heads * dp->secs_per_track; } lbasize = (u_int32_t)cgd->ident_data.lba_size_1 | ((u_int32_t)cgd->ident_data.lba_size_2 << 16); /* use the 28bit LBA size if valid or bigger than the CHS mapping */ if (cgd->ident_data.cylinders == 16383 || dp->sectors < lbasize) dp->sectors = lbasize; /* use the 48bit LBA size if valid */ lbasize48 = ((u_int64_t)cgd->ident_data.lba_size48_1) | ((u_int64_t)cgd->ident_data.lba_size48_2 << 16) | ((u_int64_t)cgd->ident_data.lba_size48_3 << 32) | ((u_int64_t)cgd->ident_data.lba_size48_4 << 48); if ((cgd->ident_data.support.command2 & ATA_SUPPORT_ADDRESS48) && lbasize48 > ATA_MAX_28BIT_LBA) dp->sectors = lbasize48; } static void adasendorderedtag(void *arg) { struct ada_softc *softc = arg; if (ada_send_ordered) { if (softc->outstanding_cmds > 0) { if ((softc->flags & ADA_FLAG_WAS_OTAG) == 0) softc->flags |= ADA_FLAG_NEED_OTAG; softc->flags &= ~ADA_FLAG_WAS_OTAG; } } /* Queue us up again */ callout_reset(&softc->sendordered_c, (ada_default_timeout * hz) / ADA_ORDEREDTAG_INTERVAL, adasendorderedtag, softc); } /* * Step through all ADA peripheral drivers, and if the device is still open, * sync the disk cache to physical media. */ static void adaflush(void) { struct cam_periph *periph; struct ada_softc *softc; union ccb *ccb; int error; CAM_PERIPH_FOREACH(periph, &adadriver) { softc = (struct ada_softc *)periph->softc; if (SCHEDULER_STOPPED()) { /* If we paniced with the lock held, do not recurse. */ if (!cam_periph_owned(periph) && (softc->flags & ADA_FLAG_OPEN)) { adadump(softc->disk, NULL, 0, 0, 0); } continue; } cam_periph_lock(periph); /* * We only sync the cache if the drive is still open, and * if the drive is capable of it.. 
*/ if (((softc->flags & ADA_FLAG_OPEN) == 0) || (softc->flags & ADA_FLAG_CAN_FLUSHCACHE) == 0) { cam_periph_unlock(periph); continue; } ccb = cam_periph_getccb(periph, CAM_PRIORITY_NORMAL); cam_fill_ataio(&ccb->ataio, 0, adadone, CAM_DIR_NONE, 0, NULL, 0, ada_default_timeout*1000); if (softc->flags & ADA_FLAG_CAN_48BIT) ata_48bit_cmd(&ccb->ataio, ATA_FLUSHCACHE48, 0, 0, 0); else ata_28bit_cmd(&ccb->ataio, ATA_FLUSHCACHE, 0, 0, 0); error = cam_periph_runccb(ccb, adaerror, /*cam_flags*/0, /*sense_flags*/ SF_NO_RECOVERY | SF_NO_RETRY, softc->disk->d_devstat); if (error != 0) xpt_print(periph->path, "Synchronize cache failed\n"); xpt_release_ccb(ccb); cam_periph_unlock(periph); } } static void adaspindown(uint8_t cmd, int flags) { struct cam_periph *periph; struct ada_softc *softc; union ccb *ccb; int error; CAM_PERIPH_FOREACH(periph, &adadriver) { /* If we paniced with lock held - not recurse here. */ if (cam_periph_owned(periph)) continue; cam_periph_lock(periph); softc = (struct ada_softc *)periph->softc; /* * We only spin-down the drive if it is capable of it.. */ if ((softc->flags & ADA_FLAG_CAN_POWERMGT) == 0) { cam_periph_unlock(periph); continue; } if (bootverbose) xpt_print(periph->path, "spin-down\n"); ccb = cam_periph_getccb(periph, CAM_PRIORITY_NORMAL); cam_fill_ataio(&ccb->ataio, 0, adadone, CAM_DIR_NONE | flags, 0, NULL, 0, ada_default_timeout*1000); ata_28bit_cmd(&ccb->ataio, cmd, 0, 0, 0); error = cam_periph_runccb(ccb, adaerror, /*cam_flags*/0, /*sense_flags*/ SF_NO_RECOVERY | SF_NO_RETRY, softc->disk->d_devstat); if (error != 0) xpt_print(periph->path, "Spin-down disk failed\n"); xpt_release_ccb(ccb); cam_periph_unlock(periph); } } static void adashutdown(void *arg, int howto) { adaflush(); if (ada_spindown_shutdown != 0 && (howto & (RB_HALT | RB_POWEROFF)) != 0) adaspindown(ATA_STANDBY_IMMEDIATE, 0); } static void adasuspend(void *arg) { adaflush(); if (ada_spindown_suspend != 0) adaspindown(ATA_SLEEP, CAM_DEV_QFREEZE); } static void adaresume(void *arg) { struct cam_periph *periph; struct ada_softc *softc; if (ada_spindown_suspend == 0) return; CAM_PERIPH_FOREACH(periph, &adadriver) { cam_periph_lock(periph); softc = (struct ada_softc *)periph->softc; /* * We only spin-down the drive if it is capable of it.. */ if ((softc->flags & ADA_FLAG_CAN_POWERMGT) == 0) { cam_periph_unlock(periph); continue; } if (bootverbose) xpt_print(periph->path, "resume\n"); /* * Drop freeze taken due to CAM_DEV_QFREEZE flag set on * sleep request. */ cam_release_devq(periph->path, /*relsim_flags*/0, /*openings*/0, /*timeout*/0, /*getcount_only*/0); cam_periph_unlock(periph); } } #endif /* _KERNEL */ Index: projects/ifnet/sys/cam/scsi/scsi_da.c =================================================================== --- projects/ifnet/sys/cam/scsi/scsi_da.c (revision 277106) +++ projects/ifnet/sys/cam/scsi/scsi_da.c (revision 277107) @@ -1,3985 +1,3995 @@ /*- * Implementation of SCSI Direct Access Peripheral driver for CAM. * * Copyright (c) 1997 Justin T. Gibbs. * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions, and the following disclaimer, * without modification, immediately at the beginning of the file. * 2. The name of the author may not be used to endorse or promote products * derived from this software without specific prior written permission. 
* * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR * ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include #ifdef _KERNEL #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #endif /* _KERNEL */ #ifndef _KERNEL #include #include #endif /* _KERNEL */ #include #include #include #include #include #include #ifndef _KERNEL #include #endif /* !_KERNEL */ #ifdef _KERNEL typedef enum { DA_STATE_PROBE_RC, DA_STATE_PROBE_RC16, DA_STATE_PROBE_LBP, DA_STATE_PROBE_BLK_LIMITS, DA_STATE_PROBE_BDC, DA_STATE_PROBE_ATA, DA_STATE_NORMAL } da_state; typedef enum { DA_FLAG_PACK_INVALID = 0x001, DA_FLAG_NEW_PACK = 0x002, DA_FLAG_PACK_LOCKED = 0x004, DA_FLAG_PACK_REMOVABLE = 0x008, DA_FLAG_NEED_OTAG = 0x020, DA_FLAG_WAS_OTAG = 0x040, DA_FLAG_RETRY_UA = 0x080, DA_FLAG_OPEN = 0x100, DA_FLAG_SCTX_INIT = 0x200, DA_FLAG_CAN_RC16 = 0x400, DA_FLAG_PROBED = 0x800, DA_FLAG_DIRTY = 0x1000, DA_FLAG_ANNOUNCED = 0x2000 } da_flags; typedef enum { DA_Q_NONE = 0x00, DA_Q_NO_SYNC_CACHE = 0x01, DA_Q_NO_6_BYTE = 0x02, DA_Q_NO_PREVENT = 0x04, DA_Q_4K = 0x08, DA_Q_NO_RC16 = 0x10, DA_Q_NO_UNMAP = 0x20 } da_quirks; #define DA_Q_BIT_STRING \ "\020" \ "\001NO_SYNC_CACHE" \ "\002NO_6_BYTE" \ "\003NO_PREVENT" \ "\0044K" \ "\005NO_RC16" typedef enum { DA_CCB_PROBE_RC = 0x01, DA_CCB_PROBE_RC16 = 0x02, DA_CCB_PROBE_LBP = 0x03, DA_CCB_PROBE_BLK_LIMITS = 0x04, DA_CCB_PROBE_BDC = 0x05, DA_CCB_PROBE_ATA = 0x06, DA_CCB_BUFFER_IO = 0x07, DA_CCB_DUMP = 0x0A, DA_CCB_DELETE = 0x0B, DA_CCB_TUR = 0x0C, DA_CCB_TYPE_MASK = 0x0F, DA_CCB_RETRY_UA = 0x10 } da_ccb_state; /* * Order here is important for method choice * * We prefer ATA_TRIM as tests run against a Sandforce 2281 SSD attached to * LSI 2008 (mps) controller (FW: v12, Drv: v14) resulted 20% quicker deletes * using ATA_TRIM than the corresponding UNMAP results for a real world mysql * import taking 5mins. 
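 *
 * The preference order is captured in the da_delete_methods enum just
 * below: dadeletemethodchoose() walks it from DA_DELETE_MIN upward and
 * selects the first method the device reports as available.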
* */ typedef enum { DA_DELETE_NONE, DA_DELETE_DISABLE, DA_DELETE_ATA_TRIM, DA_DELETE_UNMAP, DA_DELETE_WS16, DA_DELETE_WS10, DA_DELETE_ZERO, DA_DELETE_MIN = DA_DELETE_ATA_TRIM, DA_DELETE_MAX = DA_DELETE_ZERO } da_delete_methods; typedef void da_delete_func_t (struct cam_periph *periph, union ccb *ccb, struct bio *bp); static da_delete_func_t da_delete_trim; static da_delete_func_t da_delete_unmap; static da_delete_func_t da_delete_ws; static const void * da_delete_functions[] = { NULL, NULL, da_delete_trim, da_delete_unmap, da_delete_ws, da_delete_ws, da_delete_ws }; static const char *da_delete_method_names[] = { "NONE", "DISABLE", "ATA_TRIM", "UNMAP", "WS16", "WS10", "ZERO" }; static const char *da_delete_method_desc[] = { "NONE", "DISABLED", "ATA TRIM", "UNMAP", "WRITE SAME(16) with UNMAP", "WRITE SAME(10) with UNMAP", "ZERO" }; /* Offsets into our private area for storing information */ #define ccb_state ppriv_field0 #define ccb_bp ppriv_ptr1 struct disk_params { u_int8_t heads; u_int32_t cylinders; u_int8_t secs_per_track; u_int32_t secsize; /* Number of bytes/sector */ u_int64_t sectors; /* total number sectors */ u_int stripesize; u_int stripeoffset; }; #define UNMAP_RANGE_MAX 0xffffffff #define UNMAP_HEAD_SIZE 8 #define UNMAP_RANGE_SIZE 16 #define UNMAP_MAX_RANGES 2048 /* Protocol Max is 4095 */ #define UNMAP_BUF_SIZE ((UNMAP_MAX_RANGES * UNMAP_RANGE_SIZE) + \ UNMAP_HEAD_SIZE) #define WS10_MAX_BLKS 0xffff #define WS16_MAX_BLKS 0xffffffff #define ATA_TRIM_MAX_RANGES ((UNMAP_BUF_SIZE / \ (ATA_DSM_RANGE_SIZE * ATA_DSM_BLK_SIZE)) * ATA_DSM_BLK_SIZE) struct da_softc { struct bio_queue_head bio_queue; struct bio_queue_head delete_queue; struct bio_queue_head delete_run_queue; LIST_HEAD(, ccb_hdr) pending_ccbs; int tur; /* TEST UNIT READY should be sent */ int refcount; /* Active xpt_action() calls */ da_state state; da_flags flags; da_quirks quirks; int sort_io_queue; int minimum_cmd_size; int error_inject; int trim_max_ranges; int delete_running; int delete_available; /* Delete methods possibly available */ u_int maxio; uint32_t unmap_max_ranges; uint32_t unmap_max_lba; /* Max LBAs in UNMAP req */ uint64_t ws_max_blks; da_delete_methods delete_method; da_delete_func_t *delete_func; struct disk_params params; struct disk *disk; union ccb saved_ccb; struct task sysctl_task; struct sysctl_ctx_list sysctl_ctx; struct sysctl_oid *sysctl_tree; struct callout sendordered_c; uint64_t wwpn; uint8_t unmap_buf[UNMAP_BUF_SIZE]; struct scsi_read_capacity_data_long rcaplong; struct callout mediapoll_c; }; #define dadeleteflag(softc, delete_method, enable) \ if (enable) { \ softc->delete_available |= (1 << delete_method); \ } else { \ softc->delete_available &= ~(1 << delete_method); \ } struct da_quirk_entry { struct scsi_inquiry_pattern inq_pat; da_quirks quirks; }; static const char quantum[] = "QUANTUM"; static const char microp[] = "MICROP"; static struct da_quirk_entry da_quirk_table[] = { /* SPI, FC devices */ { /* * Fujitsu M2513A MO drives. * Tested devices: M2513A2 firmware versions 1200 & 1300. * (dip switch selects whether T_DIRECT or T_OPTICAL device) * Reported by: W.Scholten */ {T_DIRECT, SIP_MEDIA_REMOVABLE, "FUJITSU", "M2513A", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE }, { /* See above. */ {T_OPTICAL, SIP_MEDIA_REMOVABLE, "FUJITSU", "M2513A", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE }, { /* * This particular Fujitsu drive doesn't like the * synchronize cache command. 
* Reported by: Tom Jackson */ {T_DIRECT, SIP_MEDIA_FIXED, "FUJITSU", "M2954*", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE }, { /* * This drive doesn't like the synchronize cache command * either. Reported by: Matthew Jacob * in NetBSD PR kern/6027, August 24, 1998. */ {T_DIRECT, SIP_MEDIA_FIXED, microp, "2217*", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE }, { /* * This drive doesn't like the synchronize cache command * either. Reported by: Hellmuth Michaelis (hm@kts.org) * (PR 8882). */ {T_DIRECT, SIP_MEDIA_FIXED, microp, "2112*", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE }, { /* * Doesn't like the synchronize cache command. * Reported by: Blaz Zupan */ {T_DIRECT, SIP_MEDIA_FIXED, "NEC", "D3847*", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE }, { /* * Doesn't like the synchronize cache command. * Reported by: Blaz Zupan */ {T_DIRECT, SIP_MEDIA_FIXED, quantum, "MAVERICK 540S", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE }, { /* * Doesn't like the synchronize cache command. */ {T_DIRECT, SIP_MEDIA_FIXED, quantum, "LPS525S", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE }, { /* * Doesn't like the synchronize cache command. * Reported by: walter@pelissero.de */ {T_DIRECT, SIP_MEDIA_FIXED, quantum, "LPS540S", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE }, { /* * Doesn't work correctly with 6 byte reads/writes. * Returns illegal request, and points to byte 9 of the * 6-byte CDB. * Reported by: Adam McDougall */ {T_DIRECT, SIP_MEDIA_FIXED, quantum, "VIKING 4*", "*"}, /*quirks*/ DA_Q_NO_6_BYTE }, { /* See above. */ {T_DIRECT, SIP_MEDIA_FIXED, quantum, "VIKING 2*", "*"}, /*quirks*/ DA_Q_NO_6_BYTE }, { /* * Doesn't like the synchronize cache command. * Reported by: walter@pelissero.de */ {T_DIRECT, SIP_MEDIA_FIXED, "CONNER", "CP3500*", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE }, { /* * The CISS RAID controllers do not support SYNC_CACHE */ {T_DIRECT, SIP_MEDIA_FIXED, "COMPAQ", "RAID*", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE }, { /* * The STEC SSDs sometimes hang on UNMAP. */ {T_DIRECT, SIP_MEDIA_FIXED, "STEC", "*", "*"}, /*quirks*/ DA_Q_NO_UNMAP }, /* USB mass storage devices supported by umass(4) */ { /* * EXATELECOM (Sigmatel) i-Bead 100/105 USB Flash MP3 Player * PR: kern/51675 */ {T_DIRECT, SIP_MEDIA_REMOVABLE, "EXATEL", "i-BEAD10*", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE }, { /* * Power Quotient Int. (PQI) USB flash key * PR: kern/53067 */ {T_DIRECT, SIP_MEDIA_REMOVABLE, "Generic*", "USB Flash Disk*", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE }, { /* * Creative Nomad MUVO mp3 player (USB) * PR: kern/53094 */ {T_DIRECT, SIP_MEDIA_REMOVABLE, "CREATIVE", "NOMAD_MUVO", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE|DA_Q_NO_PREVENT }, { /* * Jungsoft NEXDISK USB flash key * PR: kern/54737 */ {T_DIRECT, SIP_MEDIA_REMOVABLE, "JUNGSOFT", "NEXDISK*", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE }, { /* * FreeDik USB Mini Data Drive * PR: kern/54786 */ {T_DIRECT, SIP_MEDIA_REMOVABLE, "FreeDik*", "Mini Data Drive", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE }, { /* * Sigmatel USB Flash MP3 Player * PR: kern/57046 */ {T_DIRECT, SIP_MEDIA_REMOVABLE, "SigmaTel", "MSCN", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE|DA_Q_NO_PREVENT }, { /* * Neuros USB Digital Audio Computer * PR: kern/63645 */ {T_DIRECT, SIP_MEDIA_REMOVABLE, "NEUROS", "dig. 
audio comp.", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE }, { /* * SEAGRAND NP-900 MP3 Player * PR: kern/64563 */ {T_DIRECT, SIP_MEDIA_REMOVABLE, "SEAGRAND", "NP-900*", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE|DA_Q_NO_PREVENT }, { /* * iRiver iFP MP3 player (with UMS Firmware) * PR: kern/54881, i386/63941, kern/66124 */ {T_DIRECT, SIP_MEDIA_REMOVABLE, "iRiver", "iFP*", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE }, { /* * Frontier Labs NEX IA+ Digital Audio Player, rev 1.10/0.01 * PR: kern/70158 */ {T_DIRECT, SIP_MEDIA_REMOVABLE, "FL" , "Nex*", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE }, { /* * ZICPlay USB MP3 Player with FM * PR: kern/75057 */ {T_DIRECT, SIP_MEDIA_REMOVABLE, "ACTIONS*" , "USB DISK*", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE }, { /* * TEAC USB floppy mechanisms */ {T_DIRECT, SIP_MEDIA_REMOVABLE, "TEAC" , "FD-05*", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE }, { /* * Kingston DataTraveler II+ USB Pen-Drive. * Reported by: Pawel Jakub Dawidek */ {T_DIRECT, SIP_MEDIA_REMOVABLE, "Kingston" , "DataTraveler II+", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE }, { /* * USB DISK Pro PMAP * Reported by: jhs * PR: usb/96381 */ {T_DIRECT, SIP_MEDIA_REMOVABLE, " ", "USB DISK Pro", "PMAP"}, /*quirks*/ DA_Q_NO_SYNC_CACHE }, { /* * Motorola E398 Mobile Phone (TransFlash memory card). * Reported by: Wojciech A. Koszek * PR: usb/89889 */ {T_DIRECT, SIP_MEDIA_REMOVABLE, "Motorola" , "Motorola Phone", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE }, { /* * Qware BeatZkey! Pro * PR: usb/79164 */ {T_DIRECT, SIP_MEDIA_REMOVABLE, "GENERIC", "USB DISK DEVICE", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE }, { /* * Time DPA20B 1GB MP3 Player * PR: usb/81846 */ {T_DIRECT, SIP_MEDIA_REMOVABLE, "USB2.0*", "(FS) FLASH DISK*", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE }, { /* * Samsung USB key 128Mb * PR: usb/90081 */ {T_DIRECT, SIP_MEDIA_REMOVABLE, "USB-DISK", "FreeDik-FlashUsb", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE }, { /* * Kingston DataTraveler 2.0 USB Flash memory. 
* PR: usb/89196 */ {T_DIRECT, SIP_MEDIA_REMOVABLE, "Kingston", "DataTraveler 2.0", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE }, { /* * Creative MUVO Slim mp3 player (USB) * PR: usb/86131 */ {T_DIRECT, SIP_MEDIA_REMOVABLE, "CREATIVE", "MuVo Slim", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE|DA_Q_NO_PREVENT }, { /* * United MP5512 Portable MP3 Player (2-in-1 USB DISK/MP3) * PR: usb/80487 */ {T_DIRECT, SIP_MEDIA_REMOVABLE, "Generic*", "MUSIC DISK", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE }, { /* * SanDisk Micro Cruzer 128MB * PR: usb/75970 */ {T_DIRECT, SIP_MEDIA_REMOVABLE, "SanDisk" , "Micro Cruzer", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE }, { /* * TOSHIBA TransMemory USB sticks * PR: kern/94660 */ {T_DIRECT, SIP_MEDIA_REMOVABLE, "TOSHIBA", "TransMemory", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE }, { /* * PNY USB 3.0 Flash Drives */ {T_DIRECT, SIP_MEDIA_REMOVABLE, "PNY", "USB 3.0 FD*", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE | DA_Q_NO_RC16 }, { /* * PNY USB Flash keys * PR: usb/75578, usb/72344, usb/65436 */ {T_DIRECT, SIP_MEDIA_REMOVABLE, "*" , "USB DISK*", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE }, { /* * Genesys 6-in-1 Card Reader * PR: usb/94647 */ {T_DIRECT, SIP_MEDIA_REMOVABLE, "Generic*", "STORAGE DEVICE*", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE }, { /* * Rekam Digital CAMERA * PR: usb/98713 */ {T_DIRECT, SIP_MEDIA_REMOVABLE, "CAMERA*", "4MP-9J6*", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE }, { /* * iRiver H10 MP3 player * PR: usb/102547 */ {T_DIRECT, SIP_MEDIA_REMOVABLE, "iriver", "H10*", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE }, { /* * iRiver U10 MP3 player * PR: usb/92306 */ {T_DIRECT, SIP_MEDIA_REMOVABLE, "iriver", "U10*", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE }, { /* * X-Micro Flash Disk * PR: usb/96901 */ {T_DIRECT, SIP_MEDIA_REMOVABLE, "X-Micro", "Flash Disk", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE }, { /* * EasyMP3 EM732X USB 2.0 Flash MP3 Player * PR: usb/96546 */ {T_DIRECT, SIP_MEDIA_REMOVABLE, "EM732X", "MP3 Player*", "1.00"}, /*quirks*/ DA_Q_NO_SYNC_CACHE }, { /* * Denver MP3 player * PR: usb/107101 */ {T_DIRECT, SIP_MEDIA_REMOVABLE, "DENVER", "MP3 PLAYER", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE }, { /* * Philips USB Key Audio KEY013 * PR: usb/68412 */ {T_DIRECT, SIP_MEDIA_REMOVABLE, "PHILIPS", "Key*", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE | DA_Q_NO_PREVENT }, { /* * JNC MP3 Player * PR: usb/94439 */ {T_DIRECT, SIP_MEDIA_REMOVABLE, "JNC*" , "MP3 Player*", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE }, { /* * SAMSUNG MP0402H * PR: usb/108427 */ {T_DIRECT, SIP_MEDIA_FIXED, "SAMSUNG", "MP0402H", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE }, { /* * I/O Magic USB flash - Giga Bank * PR: usb/108810 */ {T_DIRECT, SIP_MEDIA_FIXED, "GS-Magic", "stor*", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE }, { /* * JoyFly 128mb USB Flash Drive * PR: 96133 */ {T_DIRECT, SIP_MEDIA_REMOVABLE, "USB 2.0", "Flash Disk*", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE }, { /* * ChipsBnk usb stick * PR: 103702 */ {T_DIRECT, SIP_MEDIA_REMOVABLE, "ChipsBnk", "USB*", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE }, { /* * Storcase (Kingston) InfoStation IFS FC2/SATA-R 201A * PR: 129858 */ {T_DIRECT, SIP_MEDIA_FIXED, "IFS", "FC2/SATA-R*", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE }, { /* * Samsung YP-U3 mp3-player * PR: 125398 */ {T_DIRECT, SIP_MEDIA_REMOVABLE, "Samsung", "YP-U3", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE }, { {T_DIRECT, SIP_MEDIA_REMOVABLE, "Netac", "OnlyDisk*", "2000"}, /*quirks*/ DA_Q_NO_SYNC_CACHE }, { /* * Sony Cyber-Shot DSC cameras * PR: usb/137035 */ {T_DIRECT, SIP_MEDIA_REMOVABLE, "Sony", "Sony DSC", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE | DA_Q_NO_PREVENT }, { {T_DIRECT, 
SIP_MEDIA_REMOVABLE, "Kingston", "DataTraveler G3", "1.00"}, /*quirks*/ DA_Q_NO_PREVENT }, { /* At least several Transcent USB sticks lie on RC16. */ {T_DIRECT, SIP_MEDIA_REMOVABLE, "JetFlash", "Transcend*", "*"}, /*quirks*/ DA_Q_NO_RC16 }, /* ATA/SATA devices over SAS/USB/... */ { /* Hitachi Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "Hitachi", "H??????????E3*", "*" }, /*quirks*/DA_Q_4K }, { /* Samsung Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "ATA", "SAMSUNG HD155UI*", "*" }, /*quirks*/DA_Q_4K }, { /* Samsung Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "SAMSUNG", "HD155UI*", "*" }, /*quirks*/DA_Q_4K }, { /* Samsung Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "ATA", "SAMSUNG HD204UI*", "*" }, /*quirks*/DA_Q_4K }, { /* Samsung Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "SAMSUNG", "HD204UI*", "*" }, /*quirks*/DA_Q_4K }, { /* Seagate Barracuda Green Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "ATA", "ST????DL*", "*" }, /*quirks*/DA_Q_4K }, { /* Seagate Barracuda Green Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "ST????DL", "*", "*" }, /*quirks*/DA_Q_4K }, { /* Seagate Barracuda Green Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "ATA", "ST???DM*", "*" }, /*quirks*/DA_Q_4K }, { /* Seagate Barracuda Green Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "ST???DM*", "*", "*" }, /*quirks*/DA_Q_4K }, { /* Seagate Barracuda Green Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "ATA", "ST????DM*", "*" }, /*quirks*/DA_Q_4K }, { /* Seagate Barracuda Green Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "ST????DM", "*", "*" }, /*quirks*/DA_Q_4K }, { /* Seagate Momentus Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "ATA", "ST9500423AS*", "*" }, /*quirks*/DA_Q_4K }, { /* Seagate Momentus Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "ST950042", "3AS*", "*" }, /*quirks*/DA_Q_4K }, { /* Seagate Momentus Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "ATA", "ST9500424AS*", "*" }, /*quirks*/DA_Q_4K }, { /* Seagate Momentus Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "ST950042", "4AS*", "*" }, /*quirks*/DA_Q_4K }, { /* Seagate Momentus Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "ATA", "ST9640423AS*", "*" }, /*quirks*/DA_Q_4K }, { /* Seagate Momentus Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "ST964042", "3AS*", "*" }, /*quirks*/DA_Q_4K }, { /* Seagate Momentus Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "ATA", "ST9640424AS*", "*" }, /*quirks*/DA_Q_4K }, { /* Seagate Momentus Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "ST964042", "4AS*", "*" }, /*quirks*/DA_Q_4K }, { /* Seagate Momentus Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "ATA", "ST9750420AS*", "*" }, /*quirks*/DA_Q_4K }, { /* Seagate Momentus Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "ST975042", "0AS*", "*" }, /*quirks*/DA_Q_4K }, { /* Seagate Momentus Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "ATA", "ST9750422AS*", "*" }, /*quirks*/DA_Q_4K }, { /* Seagate Momentus Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "ST975042", "2AS*", "*" }, /*quirks*/DA_Q_4K }, { /* Seagate Momentus Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "ATA", "ST9750423AS*", "*" }, /*quirks*/DA_Q_4K }, { /* Seagate Momentus Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "ST975042", "3AS*", "*" }, 
/*quirks*/DA_Q_4K }, { /* Seagate Momentus Thin Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "ATA", "ST???LT*", "*" }, /*quirks*/DA_Q_4K }, { /* Seagate Momentus Thin Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "ST???LT*", "*", "*" }, /*quirks*/DA_Q_4K }, { /* WDC Caviar Green Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "ATA", "WDC WD????RS*", "*" }, /*quirks*/DA_Q_4K }, { /* WDC Caviar Green Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "WDC WD??", "??RS*", "*" }, /*quirks*/DA_Q_4K }, { /* WDC Caviar Green Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "ATA", "WDC WD????RX*", "*" }, /*quirks*/DA_Q_4K }, { /* WDC Caviar Green Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "WDC WD??", "??RX*", "*" }, /*quirks*/DA_Q_4K }, { /* WDC Caviar Green Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "ATA", "WDC WD??????RS*", "*" }, /*quirks*/DA_Q_4K }, { /* WDC Caviar Green Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "WDC WD??", "????RS*", "*" }, /*quirks*/DA_Q_4K }, { /* WDC Caviar Green Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "ATA", "WDC WD??????RX*", "*" }, /*quirks*/DA_Q_4K }, { /* WDC Caviar Green Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "WDC WD??", "????RX*", "*" }, /*quirks*/DA_Q_4K }, { /* WDC Scorpio Black Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "ATA", "WDC WD???PKT*", "*" }, /*quirks*/DA_Q_4K }, { /* WDC Scorpio Black Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "WDC WD??", "?PKT*", "*" }, /*quirks*/DA_Q_4K }, { /* WDC Scorpio Black Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "ATA", "WDC WD?????PKT*", "*" }, /*quirks*/DA_Q_4K }, { /* WDC Scorpio Black Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "WDC WD??", "???PKT*", "*" }, /*quirks*/DA_Q_4K }, { /* WDC Scorpio Blue Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "ATA", "WDC WD???PVT*", "*" }, /*quirks*/DA_Q_4K }, { /* WDC Scorpio Blue Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "WDC WD??", "?PVT*", "*" }, /*quirks*/DA_Q_4K }, { /* WDC Scorpio Blue Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "ATA", "WDC WD?????PVT*", "*" }, /*quirks*/DA_Q_4K }, { /* WDC Scorpio Blue Advanced Format (4k) drives */ { T_DIRECT, SIP_MEDIA_FIXED, "WDC WD??", "???PVT*", "*" }, /*quirks*/DA_Q_4K }, { /* * Olympus FE-210 camera */ {T_DIRECT, SIP_MEDIA_REMOVABLE, "OLYMPUS", "FE210*", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE }, { /* * LG UP3S MP3 player */ {T_DIRECT, SIP_MEDIA_REMOVABLE, "LG", "UP3S", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE }, { /* * Laser MP3-2GA13 MP3 player */ {T_DIRECT, SIP_MEDIA_REMOVABLE, "USB 2.0", "(HS) Flash Disk", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE }, { /* * LaCie external 250GB Hard drive des by Porsche * Submitted by: Ben Stuyts * PR: 121474 */ {T_DIRECT, SIP_MEDIA_FIXED, "SAMSUNG", "HM250JI", "*"}, /*quirks*/ DA_Q_NO_SYNC_CACHE }, /* SATA SSDs */ { /* * Corsair Force 2 SSDs * 4k optimised & trim only works in 4k requests + 4k aligned */ { T_DIRECT, SIP_MEDIA_FIXED, "ATA", "Corsair CSSD-F*", "*" }, /*quirks*/DA_Q_4K }, { /* * Corsair Force 3 SSDs * 4k optimised & trim only works in 4k requests + 4k aligned */ { T_DIRECT, SIP_MEDIA_FIXED, "ATA", "Corsair Force 3*", "*" }, /*quirks*/DA_Q_4K }, { /* * Corsair Neutron GTX SSDs * 4k optimised & trim only works in 4k requests + 4k aligned */ { T_DIRECT, SIP_MEDIA_FIXED, "*", "Corsair Neutron GTX*", "*" }, /*quirks*/DA_Q_4K }, { /* * Corsair 
Force GT & GS SSDs * 4k optimised & trim only works in 4k requests + 4k aligned */ { T_DIRECT, SIP_MEDIA_FIXED, "ATA", "Corsair Force G*", "*" }, /*quirks*/DA_Q_4K }, { /* * Crucial M4 SSDs * 4k optimised & trim only works in 4k requests + 4k aligned */ { T_DIRECT, SIP_MEDIA_FIXED, "ATA", "M4-CT???M4SSD2*", "*" }, /*quirks*/DA_Q_4K }, { /* * Crucial RealSSD C300 SSDs * 4k optimised */ { T_DIRECT, SIP_MEDIA_FIXED, "ATA", "C300-CTFDDAC???MAG*", "*" }, /*quirks*/DA_Q_4K }, { /* * Intel 320 Series SSDs * 4k optimised & trim only works in 4k requests + 4k aligned */ { T_DIRECT, SIP_MEDIA_FIXED, "ATA", "INTEL SSDSA2CW*", "*" }, /*quirks*/DA_Q_4K }, { /* * Intel 330 Series SSDs * 4k optimised & trim only works in 4k requests + 4k aligned */ { T_DIRECT, SIP_MEDIA_FIXED, "ATA", "INTEL SSDSC2CT*", "*" }, /*quirks*/DA_Q_4K }, { /* * Intel 510 Series SSDs * 4k optimised & trim only works in 4k requests + 4k aligned */ { T_DIRECT, SIP_MEDIA_FIXED, "ATA", "INTEL SSDSC2MH*", "*" }, /*quirks*/DA_Q_4K }, { /* * Intel 520 Series SSDs * 4k optimised & trim only works in 4k requests + 4k aligned */ { T_DIRECT, SIP_MEDIA_FIXED, "ATA", "INTEL SSDSC2BW*", "*" }, /*quirks*/DA_Q_4K }, { /* * Intel X25-M Series SSDs * 4k optimised & trim only works in 4k requests + 4k aligned */ { T_DIRECT, SIP_MEDIA_FIXED, "ATA", "INTEL SSDSA2M*", "*" }, /*quirks*/DA_Q_4K }, { /* * Kingston E100 Series SSDs * 4k optimised & trim only works in 4k requests + 4k aligned */ { T_DIRECT, SIP_MEDIA_FIXED, "ATA", "KINGSTON SE100S3*", "*" }, /*quirks*/DA_Q_4K }, { /* * Kingston HyperX 3k SSDs * 4k optimised & trim only works in 4k requests + 4k aligned */ { T_DIRECT, SIP_MEDIA_FIXED, "ATA", "KINGSTON SH103S3*", "*" }, /*quirks*/DA_Q_4K }, { /* * Marvell SSDs (entry taken from OpenSolaris) * 4k optimised & trim only works in 4k requests + 4k aligned */ { T_DIRECT, SIP_MEDIA_FIXED, "ATA", "MARVELL SD88SA02*", "*" }, /*quirks*/DA_Q_4K }, { /* * OCZ Agility 2 SSDs * 4k optimised & trim only works in 4k requests + 4k aligned */ { T_DIRECT, SIP_MEDIA_FIXED, "*", "OCZ-AGILITY2*", "*" }, /*quirks*/DA_Q_4K }, { /* * OCZ Agility 3 SSDs * 4k optimised & trim only works in 4k requests + 4k aligned */ { T_DIRECT, SIP_MEDIA_FIXED, "ATA", "OCZ-AGILITY3*", "*" }, /*quirks*/DA_Q_4K }, { /* * OCZ Deneva R Series SSDs * 4k optimised & trim only works in 4k requests + 4k aligned */ { T_DIRECT, SIP_MEDIA_FIXED, "ATA", "DENRSTE251M45*", "*" }, /*quirks*/DA_Q_4K }, { /* * OCZ Vertex 2 SSDs (inc pro series) * 4k optimised & trim only works in 4k requests + 4k aligned */ { T_DIRECT, SIP_MEDIA_FIXED, "ATA", "OCZ?VERTEX2*", "*" }, /*quirks*/DA_Q_4K }, { /* * OCZ Vertex 3 SSDs * 4k optimised & trim only works in 4k requests + 4k aligned */ { T_DIRECT, SIP_MEDIA_FIXED, "ATA", "OCZ-VERTEX3*", "*" }, /*quirks*/DA_Q_4K }, { /* * OCZ Vertex 4 SSDs * 4k optimised & trim only works in 4k requests + 4k aligned */ { T_DIRECT, SIP_MEDIA_FIXED, "ATA", "OCZ-VERTEX4*", "*" }, /*quirks*/DA_Q_4K }, { /* * Samsung 830 Series SSDs * 4k optimised & trim only works in 4k requests + 4k aligned */ { T_DIRECT, SIP_MEDIA_FIXED, "ATA", "SAMSUNG SSD 830 Series*", "*" }, /*quirks*/DA_Q_4K }, { /* * Samsung 840 SSDs * 4k optimised & trim only works in 4k requests + 4k aligned */ { T_DIRECT, SIP_MEDIA_FIXED, "ATA", "Samsung SSD 840*", "*" }, /*quirks*/DA_Q_4K }, { /* * Samsung 843T Series SSDs * 4k optimised */ { T_DIRECT, SIP_MEDIA_FIXED, "ATA", "SAMSUNG MZ7WD*", "*" }, /*quirks*/DA_Q_4K }, { /* * Samsung 850 SSDs * 4k optimised & trim only works in 4k requests + 4k aligned */ { T_DIRECT, 
SIP_MEDIA_FIXED, "ATA", "Samsung SSD 850*", "*" }, /*quirks*/DA_Q_4K }, { /* * Samsung PM853T Series SSDs * 4k optimised */ { T_DIRECT, SIP_MEDIA_FIXED, "ATA", "SAMSUNG MZ7GE*", "*" }, /*quirks*/DA_Q_4K }, { /* * SuperTalent TeraDrive CT SSDs * 4k optimised & trim only works in 4k requests + 4k aligned */ { T_DIRECT, SIP_MEDIA_FIXED, "ATA", "FTM??CT25H*", "*" }, /*quirks*/DA_Q_4K }, { /* * XceedIOPS SATA SSDs * 4k optimised */ { T_DIRECT, SIP_MEDIA_FIXED, "ATA", "SG9XCS2D*", "*" }, /*quirks*/DA_Q_4K }, }; static disk_strategy_t dastrategy; static dumper_t dadump; static periph_init_t dainit; static void daasync(void *callback_arg, u_int32_t code, struct cam_path *path, void *arg); static void dasysctlinit(void *context, int pending); static int dacmdsizesysctl(SYSCTL_HANDLER_ARGS); static int dadeletemethodsysctl(SYSCTL_HANDLER_ARGS); static int dadeletemaxsysctl(SYSCTL_HANDLER_ARGS); static void dadeletemethodset(struct da_softc *softc, da_delete_methods delete_method); static off_t dadeletemaxsize(struct da_softc *softc, da_delete_methods delete_method); static void dadeletemethodchoose(struct da_softc *softc, da_delete_methods default_method); static void daprobedone(struct cam_periph *periph, union ccb *ccb); static periph_ctor_t daregister; static periph_dtor_t dacleanup; static periph_start_t dastart; static periph_oninv_t daoninvalidate; static void dadone(struct cam_periph *periph, union ccb *done_ccb); static int daerror(union ccb *ccb, u_int32_t cam_flags, u_int32_t sense_flags); static void daprevent(struct cam_periph *periph, int action); static void dareprobe(struct cam_periph *periph); static void dasetgeom(struct cam_periph *periph, uint32_t block_len, uint64_t maxsector, struct scsi_read_capacity_data_long *rcaplong, size_t rcap_size); static timeout_t dasendorderedtag; static void dashutdown(void *arg, int howto); static timeout_t damediapoll; #ifndef DA_DEFAULT_POLL_PERIOD #define DA_DEFAULT_POLL_PERIOD 3 #endif #ifndef DA_DEFAULT_TIMEOUT #define DA_DEFAULT_TIMEOUT 60 /* Timeout in seconds */ #endif #ifndef DA_DEFAULT_RETRY #define DA_DEFAULT_RETRY 4 #endif #ifndef DA_DEFAULT_SEND_ORDERED #define DA_DEFAULT_SEND_ORDERED 1 #endif #define DA_SIO (softc->sort_io_queue >= 0 ? \ softc->sort_io_queue : cam_sort_io_queues) static int da_poll_period = DA_DEFAULT_POLL_PERIOD; static int da_retry_count = DA_DEFAULT_RETRY; static int da_default_timeout = DA_DEFAULT_TIMEOUT; static int da_send_ordered = DA_DEFAULT_SEND_ORDERED; static SYSCTL_NODE(_kern_cam, OID_AUTO, da, CTLFLAG_RD, 0, "CAM Direct Access Disk driver"); SYSCTL_INT(_kern_cam_da, OID_AUTO, poll_period, CTLFLAG_RWTUN, &da_poll_period, 0, "Media polling period in seconds"); SYSCTL_INT(_kern_cam_da, OID_AUTO, retry_count, CTLFLAG_RWTUN, &da_retry_count, 0, "Normal I/O retry count"); SYSCTL_INT(_kern_cam_da, OID_AUTO, default_timeout, CTLFLAG_RWTUN, &da_default_timeout, 0, "Normal I/O timeout (in seconds)"); SYSCTL_INT(_kern_cam_da, OID_AUTO, send_ordered, CTLFLAG_RWTUN, &da_send_ordered, 0, "Send Ordered Tags"); /* * DA_ORDEREDTAG_INTERVAL determines how often, relative * to the default timeout, we check to see whether an ordered * tagged transaction is appropriate to prevent simple tag * starvation. Since we'd like to ensure that there is at least * 1/2 of the timeout length left for a starved transaction to * complete after we've sent an ordered tag, we must poll at least * four times in every timeout period. 
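 * (With the defaults in this driver, DA_DEFAULT_TIMEOUT of 60 seconds
 * and DA_ORDEREDTAG_INTERVAL of 4, the ordered-tag callout armed in
 * daregister() fires roughly every 15 seconds.)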
This takes care of the worst * case where a starved transaction starts during an interval that * meets the requirement "don't send an ordered tag" test so it takes * us two intervals to determine that a tag must be sent. */ #ifndef DA_ORDEREDTAG_INTERVAL #define DA_ORDEREDTAG_INTERVAL 4 #endif static struct periph_driver dadriver = { dainit, "da", TAILQ_HEAD_INITIALIZER(dadriver.units), /* generation */ 0 }; PERIPHDRIVER_DECLARE(da, dadriver); static MALLOC_DEFINE(M_SCSIDA, "scsi_da", "scsi_da buffers"); static int daopen(struct disk *dp) { struct cam_periph *periph; struct da_softc *softc; int error; periph = (struct cam_periph *)dp->d_drv1; if (cam_periph_acquire(periph) != CAM_REQ_CMP) { return (ENXIO); } cam_periph_lock(periph); if ((error = cam_periph_hold(periph, PRIBIO|PCATCH)) != 0) { cam_periph_unlock(periph); cam_periph_release(periph); return (error); } CAM_DEBUG(periph->path, CAM_DEBUG_TRACE | CAM_DEBUG_PERIPH, ("daopen\n")); softc = (struct da_softc *)periph->softc; dareprobe(periph); /* Wait for the disk size update. */ error = cam_periph_sleep(periph, &softc->disk->d_mediasize, PRIBIO, "dareprobe", 0); if (error != 0) xpt_print(periph->path, "unable to retrieve capacity data\n"); if (periph->flags & CAM_PERIPH_INVALID) error = ENXIO; if (error == 0 && (softc->flags & DA_FLAG_PACK_REMOVABLE) != 0 && (softc->quirks & DA_Q_NO_PREVENT) == 0) daprevent(periph, PR_PREVENT); if (error == 0) { softc->flags &= ~DA_FLAG_PACK_INVALID; softc->flags |= DA_FLAG_OPEN; } cam_periph_unhold(periph); cam_periph_unlock(periph); if (error != 0) cam_periph_release(periph); return (error); } static int daclose(struct disk *dp) { struct cam_periph *periph; struct da_softc *softc; union ccb *ccb; int error; periph = (struct cam_periph *)dp->d_drv1; softc = (struct da_softc *)periph->softc; cam_periph_lock(periph); CAM_DEBUG(periph->path, CAM_DEBUG_TRACE | CAM_DEBUG_PERIPH, ("daclose\n")); if (cam_periph_hold(periph, PRIBIO) == 0) { /* Flush disk cache. */ if ((softc->flags & DA_FLAG_DIRTY) != 0 && (softc->quirks & DA_Q_NO_SYNC_CACHE) == 0 && (softc->flags & DA_FLAG_PACK_INVALID) == 0) { ccb = cam_periph_getccb(periph, CAM_PRIORITY_NORMAL); scsi_synchronize_cache(&ccb->csio, /*retries*/1, /*cbfcnp*/dadone, MSG_SIMPLE_Q_TAG, /*begin_lba*/0, /*lb_count*/0, SSD_FULL_SIZE, 5 * 60 * 1000); error = cam_periph_runccb(ccb, daerror, /*cam_flags*/0, /*sense_flags*/SF_RETRY_UA | SF_QUIET_IR, softc->disk->d_devstat); if (error == 0) softc->flags &= ~DA_FLAG_DIRTY; xpt_release_ccb(ccb); } /* Allow medium removal. */ if ((softc->flags & DA_FLAG_PACK_REMOVABLE) != 0 && (softc->quirks & DA_Q_NO_PREVENT) == 0) daprevent(periph, PR_ALLOW); cam_periph_unhold(periph); } /* * If we've got removeable media, mark the blocksize as * unavailable, since it could change when new media is * inserted. */ if ((softc->flags & DA_FLAG_PACK_REMOVABLE) != 0) softc->disk->d_devstat->flags |= DEVSTAT_BS_UNAVAILABLE; softc->flags &= ~DA_FLAG_OPEN; while (softc->refcount != 0) cam_periph_sleep(periph, &softc->refcount, PRIBIO, "daclose", 1); cam_periph_unlock(periph); cam_periph_release(periph); return (0); } static void daschedule(struct cam_periph *periph) { struct da_softc *softc = (struct da_softc *)periph->softc; if (softc->state != DA_STATE_NORMAL) return; /* Check if we have more work to do. 
*/ if (bioq_first(&softc->bio_queue) || (!softc->delete_running && bioq_first(&softc->delete_queue)) || softc->tur) { xpt_schedule(periph, CAM_PRIORITY_NORMAL); } } /* * Actually translate the requested transfer into one the physical driver * can understand. The transfer is described by a buf and will include * only one physical transfer. */ static void dastrategy(struct bio *bp) { struct cam_periph *periph; struct da_softc *softc; periph = (struct cam_periph *)bp->bio_disk->d_drv1; softc = (struct da_softc *)periph->softc; cam_periph_lock(periph); /* * If the device has been made invalid, error out */ if ((softc->flags & DA_FLAG_PACK_INVALID)) { cam_periph_unlock(periph); biofinish(bp, NULL, ENXIO); return; } CAM_DEBUG(periph->path, CAM_DEBUG_TRACE, ("dastrategy(%p)\n", bp)); /* * Place it in the queue of disk activities for this disk */ if (bp->bio_cmd == BIO_DELETE) { bioq_disksort(&softc->delete_queue, bp); } else if (DA_SIO) { bioq_disksort(&softc->bio_queue, bp); } else { bioq_insert_tail(&softc->bio_queue, bp); } /* * Schedule ourselves for performing the work. */ daschedule(periph); cam_periph_unlock(periph); return; } static int dadump(void *arg, void *virtual, vm_offset_t physical, off_t offset, size_t length) { struct cam_periph *periph; struct da_softc *softc; u_int secsize; struct ccb_scsiio csio; struct disk *dp; int error = 0; dp = arg; periph = dp->d_drv1; softc = (struct da_softc *)periph->softc; cam_periph_lock(periph); secsize = softc->params.secsize; if ((softc->flags & DA_FLAG_PACK_INVALID) != 0) { cam_periph_unlock(periph); return (ENXIO); } if (length > 0) { xpt_setup_ccb(&csio.ccb_h, periph->path, CAM_PRIORITY_NORMAL); csio.ccb_h.ccb_state = DA_CCB_DUMP; scsi_read_write(&csio, /*retries*/0, dadone, MSG_ORDERED_Q_TAG, /*read*/SCSI_RW_WRITE, /*byte2*/0, /*minimum_cmd_size*/ softc->minimum_cmd_size, offset / secsize, length / secsize, /*data_ptr*/(u_int8_t *) virtual, /*dxfer_len*/length, /*sense_len*/SSD_FULL_SIZE, da_default_timeout * 1000); xpt_polled_action((union ccb *)&csio); error = cam_periph_error((union ccb *)&csio, 0, SF_NO_RECOVERY | SF_NO_RETRY, NULL); if ((csio.ccb_h.status & CAM_DEV_QFRZN) != 0) cam_release_devq(csio.ccb_h.path, /*relsim_flags*/0, /*reduction*/0, /*timeout*/0, /*getcount_only*/0); if (error != 0) printf("Aborting dump due to I/O error.\n"); cam_periph_unlock(periph); return (error); } /* * Sync the disk cache contents to the physical media. 
*/ if ((softc->quirks & DA_Q_NO_SYNC_CACHE) == 0) { xpt_setup_ccb(&csio.ccb_h, periph->path, CAM_PRIORITY_NORMAL); csio.ccb_h.ccb_state = DA_CCB_DUMP; scsi_synchronize_cache(&csio, /*retries*/0, /*cbfcnp*/dadone, MSG_SIMPLE_Q_TAG, /*begin_lba*/0,/* Cover the whole disk */ /*lb_count*/0, SSD_FULL_SIZE, 5 * 60 * 1000); xpt_polled_action((union ccb *)&csio); error = cam_periph_error((union ccb *)&csio, 0, SF_NO_RECOVERY | SF_NO_RETRY | SF_QUIET_IR, NULL); if ((csio.ccb_h.status & CAM_DEV_QFRZN) != 0) cam_release_devq(csio.ccb_h.path, /*relsim_flags*/0, /*reduction*/0, /*timeout*/0, /*getcount_only*/0); if (error != 0) xpt_print(periph->path, "Synchronize cache failed\n"); } cam_periph_unlock(periph); return (error); } static int dagetattr(struct bio *bp) { int ret; struct cam_periph *periph; periph = (struct cam_periph *)bp->bio_disk->d_drv1; cam_periph_lock(periph); ret = xpt_getattr(bp->bio_data, bp->bio_length, bp->bio_attribute, periph->path); cam_periph_unlock(periph); if (ret == 0) bp->bio_completed = bp->bio_length; return ret; } static void dainit(void) { cam_status status; /* * Install a global async callback. This callback will * receive async callbacks like "new device found". */ status = xpt_register_async(AC_FOUND_DEVICE, daasync, NULL, NULL); if (status != CAM_REQ_CMP) { printf("da: Failed to attach master async callback " "due to status 0x%x!\n", status); } else if (da_send_ordered) { /* Register our shutdown event handler */ if ((EVENTHANDLER_REGISTER(shutdown_post_sync, dashutdown, NULL, SHUTDOWN_PRI_DEFAULT)) == NULL) printf("dainit: shutdown event registration failed!\n"); } } /* * Callback from GEOM, called when it has finished cleaning up its * resources. */ static void dadiskgonecb(struct disk *dp) { struct cam_periph *periph; periph = (struct cam_periph *)dp->d_drv1; cam_periph_release(periph); } static void daoninvalidate(struct cam_periph *periph) { struct da_softc *softc; softc = (struct da_softc *)periph->softc; /* * De-register any async callbacks. */ xpt_register_async(0, daasync, periph, periph->path); softc->flags |= DA_FLAG_PACK_INVALID; /* * Return all queued I/O with ENXIO. * XXX Handle any transactions queued to the card * with XPT_ABORT_CCB. */ bioq_flush(&softc->bio_queue, NULL, ENXIO); bioq_flush(&softc->delete_queue, NULL, ENXIO); /* * Tell GEOM that we've gone away, we'll get a callback when it is * done cleaning up its resources. */ disk_gone(softc->disk); } static void dacleanup(struct cam_periph *periph) { struct da_softc *softc; softc = (struct da_softc *)periph->softc; cam_periph_unlock(periph); /* * If we can't free the sysctl tree, oh well... */ if ((softc->flags & DA_FLAG_SCTX_INIT) != 0 && sysctl_ctx_free(&softc->sysctl_ctx) != 0) { xpt_print(periph->path, "can't remove sysctl context\n"); } callout_drain(&softc->mediapoll_c); disk_destroy(softc->disk); callout_drain(&softc->sendordered_c); free(softc, M_DEVBUF); cam_periph_lock(periph); } static void daasync(void *callback_arg, u_int32_t code, struct cam_path *path, void *arg) { struct cam_periph *periph; struct da_softc *softc; periph = (struct cam_periph *)callback_arg; switch (code) { case AC_FOUND_DEVICE: { struct ccb_getdev *cgd; cam_status status; cgd = (struct ccb_getdev *)arg; if (cgd == NULL) break; if (cgd->protocol != PROTO_SCSI) break; if (SID_TYPE(&cgd->inq_data) != T_DIRECT && SID_TYPE(&cgd->inq_data) != T_RBC && SID_TYPE(&cgd->inq_data) != T_OPTICAL) break; /* * Allocate a peripheral instance for * this device and start the probe * process. 
*/ status = cam_periph_alloc(daregister, daoninvalidate, dacleanup, dastart, "da", CAM_PERIPH_BIO, path, daasync, AC_FOUND_DEVICE, cgd); if (status != CAM_REQ_CMP && status != CAM_REQ_INPROG) printf("daasync: Unable to attach to new device " "due to status 0x%x\n", status); return; } case AC_ADVINFO_CHANGED: { uintptr_t buftype; buftype = (uintptr_t)arg; if (buftype == CDAI_TYPE_PHYS_PATH) { struct da_softc *softc; softc = periph->softc; disk_attr_changed(softc->disk, "GEOM::physpath", M_NOWAIT); } break; } case AC_UNIT_ATTENTION: { union ccb *ccb; int error_code, sense_key, asc, ascq; softc = (struct da_softc *)periph->softc; ccb = (union ccb *)arg; /* * Handle all UNIT ATTENTIONs except our own, * as they will be handled by daerror(). */ if (xpt_path_periph(ccb->ccb_h.path) != periph && scsi_extract_sense_ccb(ccb, &error_code, &sense_key, &asc, &ascq)) { if (asc == 0x2A && ascq == 0x09) { xpt_print(ccb->ccb_h.path, "Capacity data has changed\n"); softc->flags &= ~DA_FLAG_PROBED; dareprobe(periph); } else if (asc == 0x28 && ascq == 0x00) { softc->flags &= ~DA_FLAG_PROBED; disk_media_changed(softc->disk, M_NOWAIT); } else if (asc == 0x3F && ascq == 0x03) { xpt_print(ccb->ccb_h.path, "INQUIRY data has changed\n"); softc->flags &= ~DA_FLAG_PROBED; dareprobe(periph); } } cam_periph_async(periph, code, path, arg); break; } case AC_SCSI_AEN: softc = (struct da_softc *)periph->softc; if (!softc->tur) { if (cam_periph_acquire(periph) == CAM_REQ_CMP) { softc->tur = 1; daschedule(periph); } } /* FALLTHROUGH */ case AC_SENT_BDR: case AC_BUS_RESET: { struct ccb_hdr *ccbh; softc = (struct da_softc *)periph->softc; /* * Don't fail on the expected unit attention * that will occur. */ softc->flags |= DA_FLAG_RETRY_UA; LIST_FOREACH(ccbh, &softc->pending_ccbs, periph_links.le) ccbh->ccb_state |= DA_CCB_RETRY_UA; break; } default: break; } cam_periph_async(periph, code, path, arg); } static void dasysctlinit(void *context, int pending) { struct cam_periph *periph; struct da_softc *softc; char tmpstr[80], tmpstr2[80]; struct ccb_trans_settings cts; periph = (struct cam_periph *)context; /* * periph was held for us when this task was enqueued */ if (periph->flags & CAM_PERIPH_INVALID) { cam_periph_release(periph); return; } softc = (struct da_softc *)periph->softc; snprintf(tmpstr, sizeof(tmpstr), "CAM DA unit %d", periph->unit_number); snprintf(tmpstr2, sizeof(tmpstr2), "%d", periph->unit_number); sysctl_ctx_init(&softc->sysctl_ctx); softc->flags |= DA_FLAG_SCTX_INIT; softc->sysctl_tree = SYSCTL_ADD_NODE(&softc->sysctl_ctx, SYSCTL_STATIC_CHILDREN(_kern_cam_da), OID_AUTO, tmpstr2, CTLFLAG_RD, 0, tmpstr); if (softc->sysctl_tree == NULL) { printf("dasysctlinit: unable to allocate sysctl tree\n"); cam_periph_release(periph); return; } /* * Now register the sysctl handler, so the user can change the value on * the fly. 
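 * For example (device unit 0 chosen purely for illustration), the
 * delete method can be switched at runtime with:
 *
 *	sysctl kern.cam.da.0.delete_method=UNMAP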
*/ SYSCTL_ADD_PROC(&softc->sysctl_ctx, SYSCTL_CHILDREN(softc->sysctl_tree), OID_AUTO, "delete_method", CTLTYPE_STRING | CTLFLAG_RW, softc, 0, dadeletemethodsysctl, "A", "BIO_DELETE execution method"); SYSCTL_ADD_PROC(&softc->sysctl_ctx, SYSCTL_CHILDREN(softc->sysctl_tree), OID_AUTO, "delete_max", CTLTYPE_U64 | CTLFLAG_RW, softc, 0, dadeletemaxsysctl, "Q", "Maximum BIO_DELETE size"); SYSCTL_ADD_PROC(&softc->sysctl_ctx, SYSCTL_CHILDREN(softc->sysctl_tree), OID_AUTO, "minimum_cmd_size", CTLTYPE_INT | CTLFLAG_RW, &softc->minimum_cmd_size, 0, dacmdsizesysctl, "I", "Minimum CDB size"); SYSCTL_ADD_INT(&softc->sysctl_ctx, SYSCTL_CHILDREN(softc->sysctl_tree), OID_AUTO, "sort_io_queue", CTLFLAG_RW, &softc->sort_io_queue, 0, "Sort IO queue to try and optimise disk access patterns"); SYSCTL_ADD_INT(&softc->sysctl_ctx, SYSCTL_CHILDREN(softc->sysctl_tree), OID_AUTO, "error_inject", CTLFLAG_RW, &softc->error_inject, 0, "error_inject leaf"); /* * Add some addressing info. */ memset(&cts, 0, sizeof (cts)); xpt_setup_ccb(&cts.ccb_h, periph->path, CAM_PRIORITY_NONE); cts.ccb_h.func_code = XPT_GET_TRAN_SETTINGS; cts.type = CTS_TYPE_CURRENT_SETTINGS; cam_periph_lock(periph); xpt_action((union ccb *)&cts); cam_periph_unlock(periph); if (cts.ccb_h.status != CAM_REQ_CMP) { cam_periph_release(periph); return; } if (cts.protocol == PROTO_SCSI && cts.transport == XPORT_FC) { struct ccb_trans_settings_fc *fc = &cts.xport_specific.fc; if (fc->valid & CTS_FC_VALID_WWPN) { softc->wwpn = fc->wwpn; SYSCTL_ADD_UQUAD(&softc->sysctl_ctx, SYSCTL_CHILDREN(softc->sysctl_tree), OID_AUTO, "wwpn", CTLFLAG_RD, &softc->wwpn, "World Wide Port Name"); } } cam_periph_release(periph); } static int dadeletemaxsysctl(SYSCTL_HANDLER_ARGS) { int error; uint64_t value; struct da_softc *softc; softc = (struct da_softc *)arg1; value = softc->disk->d_delmaxsize; error = sysctl_handle_64(oidp, &value, 0, req); if ((error != 0) || (req->newptr == NULL)) return (error); /* only accept values smaller than the calculated value */ if (value > dadeletemaxsize(softc, softc->delete_method)) { return (EINVAL); } softc->disk->d_delmaxsize = value; return (0); } static int dacmdsizesysctl(SYSCTL_HANDLER_ARGS) { int error, value; value = *(int *)arg1; error = sysctl_handle_int(oidp, &value, 0, req); if ((error != 0) || (req->newptr == NULL)) return (error); /* * Acceptable values here are 6, 10, 12 or 16. 
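 * Anything else is rounded up to the next value in that set, i.e.
 *
 *	value = value <= 6 ? 6 : value <= 10 ? 10 : value <= 12 ? 12 : 16;
 *
 * so a request of 8 becomes 10 and a request of 14 becomes 16.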
*/ if (value < 6) value = 6; else if ((value > 6) && (value <= 10)) value = 10; else if ((value > 10) && (value <= 12)) value = 12; else if (value > 12) value = 16; *(int *)arg1 = value; return (0); } static void dadeletemethodset(struct da_softc *softc, da_delete_methods delete_method) { softc->delete_method = delete_method; softc->disk->d_delmaxsize = dadeletemaxsize(softc, delete_method); softc->delete_func = da_delete_functions[delete_method]; if (softc->delete_method > DA_DELETE_DISABLE) softc->disk->d_flags |= DISKFLAG_CANDELETE; else softc->disk->d_flags &= ~DISKFLAG_CANDELETE; } static off_t dadeletemaxsize(struct da_softc *softc, da_delete_methods delete_method) { off_t sectors; switch(delete_method) { case DA_DELETE_UNMAP: sectors = (off_t)softc->unmap_max_lba; break; case DA_DELETE_ATA_TRIM: sectors = (off_t)ATA_DSM_RANGE_MAX * softc->trim_max_ranges; break; case DA_DELETE_WS16: sectors = (off_t)min(softc->ws_max_blks, WS16_MAX_BLKS); break; case DA_DELETE_ZERO: case DA_DELETE_WS10: sectors = (off_t)min(softc->ws_max_blks, WS10_MAX_BLKS); break; default: return 0; } return (off_t)softc->params.secsize * min(sectors, (off_t)softc->params.sectors); } static void daprobedone(struct cam_periph *periph, union ccb *ccb) { struct da_softc *softc; softc = (struct da_softc *)periph->softc; dadeletemethodchoose(softc, DA_DELETE_NONE); if (bootverbose && (softc->flags & DA_FLAG_ANNOUNCED) == 0) { char buf[80]; int i, sep; snprintf(buf, sizeof(buf), "Delete methods: <"); sep = 0; for (i = DA_DELETE_MIN; i <= DA_DELETE_MAX; i++) { if (softc->delete_available & (1 << i)) { if (sep) { strlcat(buf, ",", sizeof(buf)); } else { sep = 1; } strlcat(buf, da_delete_method_names[i], sizeof(buf)); if (i == softc->delete_method) { strlcat(buf, "(*)", sizeof(buf)); } } } if (sep == 0) { if (softc->delete_method == DA_DELETE_NONE) strlcat(buf, "NONE(*)", sizeof(buf)); else strlcat(buf, "DISABLED(*)", sizeof(buf)); } strlcat(buf, ">", sizeof(buf)); printf("%s%d: %s\n", periph->periph_name, periph->unit_number, buf); } /* * Since our peripheral may be invalidated by an error * above or an external event, we must release our CCB * before releasing the probe lock on the peripheral. * The peripheral will only go away once the last lock * is removed, and we need it around for the CCB release * operation. */ xpt_release_ccb(ccb); softc->state = DA_STATE_NORMAL; softc->flags |= DA_FLAG_PROBED; daschedule(periph); wakeup(&softc->disk->d_mediasize); if ((softc->flags & DA_FLAG_ANNOUNCED) == 0) { softc->flags |= DA_FLAG_ANNOUNCED; cam_periph_unhold(periph); } else cam_periph_release_locked(periph); } static void dadeletemethodchoose(struct da_softc *softc, da_delete_methods default_method) { int i, delete_method; delete_method = default_method; /* * Use the pre-defined order to choose the best * performing delete. 
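 * If none of the probed methods is marked available, fall back to the
 * caller-supplied default; daprobedone() passes DA_DELETE_NONE.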
*/ for (i = DA_DELETE_MIN; i <= DA_DELETE_MAX; i++) { if (softc->delete_available & (1 << i)) { dadeletemethodset(softc, i); return; } } dadeletemethodset(softc, delete_method); } static int dadeletemethodsysctl(SYSCTL_HANDLER_ARGS) { char buf[16]; const char *p; struct da_softc *softc; int i, error, methods, value; softc = (struct da_softc *)arg1; value = softc->delete_method; if (value < 0 || value > DA_DELETE_MAX) p = "UNKNOWN"; else p = da_delete_method_names[value]; strncpy(buf, p, sizeof(buf)); error = sysctl_handle_string(oidp, buf, sizeof(buf), req); if (error != 0 || req->newptr == NULL) return (error); methods = softc->delete_available | (1 << DA_DELETE_DISABLE); for (i = 0; i <= DA_DELETE_MAX; i++) { if (!(methods & (1 << i)) || strcmp(buf, da_delete_method_names[i]) != 0) continue; dadeletemethodset(softc, i); return (0); } return (EINVAL); } static cam_status daregister(struct cam_periph *periph, void *arg) { struct da_softc *softc; struct ccb_pathinq cpi; struct ccb_getdev *cgd; char tmpstr[80]; caddr_t match; cgd = (struct ccb_getdev *)arg; if (cgd == NULL) { printf("daregister: no getdev CCB, can't register device\n"); return(CAM_REQ_CMP_ERR); } softc = (struct da_softc *)malloc(sizeof(*softc), M_DEVBUF, M_NOWAIT|M_ZERO); if (softc == NULL) { printf("daregister: Unable to probe new device. " "Unable to allocate softc\n"); return(CAM_REQ_CMP_ERR); } LIST_INIT(&softc->pending_ccbs); softc->state = DA_STATE_PROBE_RC; bioq_init(&softc->bio_queue); bioq_init(&softc->delete_queue); bioq_init(&softc->delete_run_queue); if (SID_IS_REMOVABLE(&cgd->inq_data)) softc->flags |= DA_FLAG_PACK_REMOVABLE; softc->unmap_max_ranges = UNMAP_MAX_RANGES; softc->unmap_max_lba = UNMAP_RANGE_MAX; softc->ws_max_blks = WS16_MAX_BLKS; softc->trim_max_ranges = ATA_TRIM_MAX_RANGES; softc->sort_io_queue = -1; periph->softc = softc; /* * See if this device has any quirks. */ match = cam_quirkmatch((caddr_t)&cgd->inq_data, (caddr_t)da_quirk_table, sizeof(da_quirk_table)/sizeof(*da_quirk_table), sizeof(*da_quirk_table), scsi_inquiry_match); if (match != NULL) softc->quirks = ((struct da_quirk_entry *)match)->quirks; else softc->quirks = DA_Q_NONE; /* Check if the SIM does not want 6 byte commands */ bzero(&cpi, sizeof(cpi)); xpt_setup_ccb(&cpi.ccb_h, periph->path, CAM_PRIORITY_NORMAL); cpi.ccb_h.func_code = XPT_PATH_INQ; xpt_action((union ccb *)&cpi); if (cpi.ccb_h.status == CAM_REQ_CMP && (cpi.hba_misc & PIM_NO_6_BYTE)) softc->quirks |= DA_Q_NO_6_BYTE; TASK_INIT(&softc->sysctl_task, 0, dasysctlinit, periph); /* * Take an exclusive refcount on the periph while dastart is called * to finish the probe. The reference will be dropped in dadone at * the end of probe. */ (void)cam_periph_hold(periph, PRIBIO); /* * Schedule a periodic event to occasionally send an * ordered tag to a device. */ callout_init_mtx(&softc->sendordered_c, cam_periph_mtx(periph), 0); callout_reset(&softc->sendordered_c, (da_default_timeout * hz) / DA_ORDEREDTAG_INTERVAL, dasendorderedtag, softc); cam_periph_unlock(periph); /* * RBC devices don't have to support READ(6), only READ(10). */ if (softc->quirks & DA_Q_NO_6_BYTE || SID_TYPE(&cgd->inq_data) == T_RBC) softc->minimum_cmd_size = 10; else softc->minimum_cmd_size = 6; /* * Load the user's default, if any. */ snprintf(tmpstr, sizeof(tmpstr), "kern.cam.da.%d.minimum_cmd_size", periph->unit_number); TUNABLE_INT_FETCH(tmpstr, &softc->minimum_cmd_size); /* * 6, 10, 12 and 16 are the currently permissible values. 
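 * The per-unit tunable fetched above can be set from loader.conf, for
 * example (unit 0 chosen for illustration):
 *
 *	kern.cam.da.0.minimum_cmd_size="10"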
*/ if (softc->minimum_cmd_size < 6) softc->minimum_cmd_size = 6; else if ((softc->minimum_cmd_size > 6) && (softc->minimum_cmd_size <= 10)) softc->minimum_cmd_size = 10; else if ((softc->minimum_cmd_size > 10) && (softc->minimum_cmd_size <= 12)) softc->minimum_cmd_size = 12; else if (softc->minimum_cmd_size > 12) softc->minimum_cmd_size = 16; /* Predict whether device may support READ CAPACITY(16). */ if (SID_ANSI_REV(&cgd->inq_data) >= SCSI_REV_SPC3 && (softc->quirks & DA_Q_NO_RC16) == 0) { softc->flags |= DA_FLAG_CAN_RC16; softc->state = DA_STATE_PROBE_RC16; } /* * Register this media as a disk. */ softc->disk = disk_alloc(); softc->disk->d_devstat = devstat_new_entry(periph->periph_name, periph->unit_number, 0, DEVSTAT_BS_UNAVAILABLE, SID_TYPE(&cgd->inq_data) | XPORT_DEVSTAT_TYPE(cpi.transport), DEVSTAT_PRIORITY_DISK); softc->disk->d_open = daopen; softc->disk->d_close = daclose; softc->disk->d_strategy = dastrategy; softc->disk->d_dump = dadump; softc->disk->d_getattr = dagetattr; softc->disk->d_gone = dadiskgonecb; softc->disk->d_name = "da"; softc->disk->d_drv1 = periph; if (cpi.maxio == 0) softc->maxio = DFLTPHYS; /* traditional default */ else if (cpi.maxio > MAXPHYS) softc->maxio = MAXPHYS; /* for safety */ else softc->maxio = cpi.maxio; softc->disk->d_maxsize = softc->maxio; softc->disk->d_unit = periph->unit_number; softc->disk->d_flags = DISKFLAG_DIRECT_COMPLETION; if ((softc->quirks & DA_Q_NO_SYNC_CACHE) == 0) softc->disk->d_flags |= DISKFLAG_CANFLUSHCACHE; if ((cpi.hba_misc & PIM_UNMAPPED) != 0) softc->disk->d_flags |= DISKFLAG_UNMAPPED_BIO; cam_strvis(softc->disk->d_descr, cgd->inq_data.vendor, sizeof(cgd->inq_data.vendor), sizeof(softc->disk->d_descr)); strlcat(softc->disk->d_descr, " ", sizeof(softc->disk->d_descr)); cam_strvis(&softc->disk->d_descr[strlen(softc->disk->d_descr)], cgd->inq_data.product, sizeof(cgd->inq_data.product), sizeof(softc->disk->d_descr) - strlen(softc->disk->d_descr)); softc->disk->d_hba_vendor = cpi.hba_vendor; softc->disk->d_hba_device = cpi.hba_device; softc->disk->d_hba_subvendor = cpi.hba_subvendor; softc->disk->d_hba_subdevice = cpi.hba_subdevice; /* * Acquire a reference to the periph before we register with GEOM. * We'll release this reference once GEOM calls us back (via * dadiskgonecb()) telling us that our provider has been freed. */ if (cam_periph_acquire(periph) != CAM_REQ_CMP) { xpt_print(periph->path, "%s: lost periph during " "registration!\n", __func__); cam_periph_lock(periph); return (CAM_REQ_CMP_ERR); } disk_create(softc->disk, DISK_VERSION); cam_periph_lock(periph); /* * Add async callbacks for events of interest. * I don't bother checking if this fails as, * in most cases, the system will function just * fine without them and the only alternative * would be to not attach the device on failure. */ xpt_register_async(AC_SENT_BDR | AC_BUS_RESET | AC_LOST_DEVICE | AC_ADVINFO_CHANGED | AC_SCSI_AEN | AC_UNIT_ATTENTION, daasync, periph, periph->path); /* * Emit an attribute changed notification just in case * physical path information arrived before our async * event handler was registered, but after anyone attaching * to our disk device polled it. */ disk_attr_changed(softc->disk, "GEOM::physpath", M_NOWAIT); /* * Schedule a periodic media polling events. 
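 * Polling is armed below only for removable media on devices that do
 * not advertise asynchronous event notification (SID_AEN), and only
 * when da_poll_period is non-zero (default DA_DEFAULT_POLL_PERIOD,
 * 3 seconds).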
*/ callout_init_mtx(&softc->mediapoll_c, cam_periph_mtx(periph), 0); if ((softc->flags & DA_FLAG_PACK_REMOVABLE) && (cgd->inq_flags & SID_AEN) == 0 && da_poll_period != 0) callout_reset(&softc->mediapoll_c, da_poll_period * hz, damediapoll, periph); xpt_schedule(periph, CAM_PRIORITY_DEV); return(CAM_REQ_CMP); } static void dastart(struct cam_periph *periph, union ccb *start_ccb) { struct da_softc *softc; softc = (struct da_softc *)periph->softc; CAM_DEBUG(periph->path, CAM_DEBUG_TRACE, ("dastart\n")); skipstate: switch (softc->state) { case DA_STATE_NORMAL: { struct bio *bp; uint8_t tag_code; /* Run BIO_DELETE if not running yet. */ if (!softc->delete_running && (bp = bioq_first(&softc->delete_queue)) != NULL) { if (softc->delete_func != NULL) { softc->delete_func(periph, start_ccb, bp); goto out; } else { bioq_flush(&softc->delete_queue, NULL, 0); /* FALLTHROUGH */ } } /* Run regular command. */ bp = bioq_takefirst(&softc->bio_queue); if (bp == NULL) { if (softc->tur) { softc->tur = 0; scsi_test_unit_ready(&start_ccb->csio, /*retries*/ da_retry_count, dadone, MSG_SIMPLE_Q_TAG, SSD_FULL_SIZE, da_default_timeout * 1000); start_ccb->ccb_h.ccb_bp = NULL; start_ccb->ccb_h.ccb_state = DA_CCB_TUR; xpt_action(start_ccb); } else xpt_release_ccb(start_ccb); break; } if (softc->tur) { softc->tur = 0; cam_periph_release_locked(periph); } if ((bp->bio_flags & BIO_ORDERED) != 0 || (softc->flags & DA_FLAG_NEED_OTAG) != 0) { softc->flags &= ~DA_FLAG_NEED_OTAG; softc->flags |= DA_FLAG_WAS_OTAG; tag_code = MSG_ORDERED_Q_TAG; } else { tag_code = MSG_SIMPLE_Q_TAG; } switch (bp->bio_cmd) { case BIO_WRITE: softc->flags |= DA_FLAG_DIRTY; /* FALLTHROUGH */ case BIO_READ: scsi_read_write(&start_ccb->csio, /*retries*/da_retry_count, /*cbfcnp*/dadone, /*tag_action*/tag_code, /*read_op*/(bp->bio_cmd == BIO_READ ? SCSI_RW_READ : SCSI_RW_WRITE) | ((bp->bio_flags & BIO_UNMAPPED) != 0 ? SCSI_RW_BIO : 0), /*byte2*/0, softc->minimum_cmd_size, /*lba*/bp->bio_pblkno, /*block_count*/bp->bio_bcount / softc->params.secsize, /*data_ptr*/ (bp->bio_flags & BIO_UNMAPPED) != 0 ? (void *)bp : bp->bio_data, /*dxfer_len*/ bp->bio_bcount, /*sense_len*/SSD_FULL_SIZE, da_default_timeout * 1000); break; case BIO_FLUSH: /* * BIO_FLUSH doesn't currently communicate * range data, so we synchronize the cache * over the whole disk. We also force * ordered tag semantics the flush applies * to all previously queued I/O. */ scsi_synchronize_cache(&start_ccb->csio, /*retries*/1, /*cbfcnp*/dadone, MSG_ORDERED_Q_TAG, /*begin_lba*/0, /*lb_count*/0, SSD_FULL_SIZE, da_default_timeout*1000); break; } start_ccb->ccb_h.ccb_state = DA_CCB_BUFFER_IO; start_ccb->ccb_h.flags |= CAM_UNLOCKED; out: LIST_INSERT_HEAD(&softc->pending_ccbs, &start_ccb->ccb_h, periph_links.le); /* We expect a unit attention from this device */ if ((softc->flags & DA_FLAG_RETRY_UA) != 0) { start_ccb->ccb_h.ccb_state |= DA_CCB_RETRY_UA; softc->flags &= ~DA_FLAG_RETRY_UA; } start_ccb->ccb_h.ccb_bp = bp; softc->refcount++; cam_periph_unlock(periph); xpt_action(start_ccb); cam_periph_lock(periph); softc->refcount--; /* May have more work to do, so ensure we stay scheduled */ daschedule(periph); break; } case DA_STATE_PROBE_RC: { struct scsi_read_capacity_data *rcap; rcap = (struct scsi_read_capacity_data *) malloc(sizeof(*rcap), M_SCSIDA, M_NOWAIT|M_ZERO); if (rcap == NULL) { printf("dastart: Couldn't malloc read_capacity data\n"); /* da_free_periph??? 
*/ break; } scsi_read_capacity(&start_ccb->csio, /*retries*/da_retry_count, dadone, MSG_SIMPLE_Q_TAG, rcap, SSD_FULL_SIZE, /*timeout*/5000); start_ccb->ccb_h.ccb_bp = NULL; start_ccb->ccb_h.ccb_state = DA_CCB_PROBE_RC; xpt_action(start_ccb); break; } case DA_STATE_PROBE_RC16: { struct scsi_read_capacity_data_long *rcaplong; rcaplong = (struct scsi_read_capacity_data_long *) malloc(sizeof(*rcaplong), M_SCSIDA, M_NOWAIT|M_ZERO); if (rcaplong == NULL) { printf("dastart: Couldn't malloc read_capacity data\n"); /* da_free_periph??? */ break; } scsi_read_capacity_16(&start_ccb->csio, /*retries*/ da_retry_count, /*cbfcnp*/ dadone, /*tag_action*/ MSG_SIMPLE_Q_TAG, /*lba*/ 0, /*reladr*/ 0, /*pmi*/ 0, /*rcap_buf*/ (uint8_t *)rcaplong, /*rcap_buf_len*/ sizeof(*rcaplong), /*sense_len*/ SSD_FULL_SIZE, /*timeout*/ da_default_timeout * 1000); start_ccb->ccb_h.ccb_bp = NULL; start_ccb->ccb_h.ccb_state = DA_CCB_PROBE_RC16; xpt_action(start_ccb); break; } case DA_STATE_PROBE_LBP: { struct scsi_vpd_logical_block_prov *lbp; if (!scsi_vpd_supported_page(periph, SVPD_LBP)) { /* * If we get here we don't support any SBC-3 delete * methods with UNMAP as the Logical Block Provisioning * VPD page support is required for devices which * support it according to T10/1799-D Revision 31 * however older revisions of the spec don't mandate * this so we currently don't remove these methods * from the available set. */ softc->state = DA_STATE_PROBE_BLK_LIMITS; goto skipstate; } lbp = (struct scsi_vpd_logical_block_prov *) malloc(sizeof(*lbp), M_SCSIDA, M_NOWAIT|M_ZERO); if (lbp == NULL) { printf("dastart: Couldn't malloc lbp data\n"); /* da_free_periph??? */ break; } scsi_inquiry(&start_ccb->csio, /*retries*/da_retry_count, /*cbfcnp*/dadone, /*tag_action*/MSG_SIMPLE_Q_TAG, /*inq_buf*/(u_int8_t *)lbp, /*inq_len*/sizeof(*lbp), /*evpd*/TRUE, /*page_code*/SVPD_LBP, /*sense_len*/SSD_MIN_SIZE, /*timeout*/da_default_timeout * 1000); start_ccb->ccb_h.ccb_bp = NULL; start_ccb->ccb_h.ccb_state = DA_CCB_PROBE_LBP; xpt_action(start_ccb); break; } case DA_STATE_PROBE_BLK_LIMITS: { struct scsi_vpd_block_limits *block_limits; if (!scsi_vpd_supported_page(periph, SVPD_BLOCK_LIMITS)) { /* Not supported skip to next probe */ softc->state = DA_STATE_PROBE_BDC; goto skipstate; } block_limits = (struct scsi_vpd_block_limits *) malloc(sizeof(*block_limits), M_SCSIDA, M_NOWAIT|M_ZERO); if (block_limits == NULL) { printf("dastart: Couldn't malloc block_limits data\n"); /* da_free_periph??? */ break; } scsi_inquiry(&start_ccb->csio, /*retries*/da_retry_count, /*cbfcnp*/dadone, /*tag_action*/MSG_SIMPLE_Q_TAG, /*inq_buf*/(u_int8_t *)block_limits, /*inq_len*/sizeof(*block_limits), /*evpd*/TRUE, /*page_code*/SVPD_BLOCK_LIMITS, /*sense_len*/SSD_MIN_SIZE, /*timeout*/da_default_timeout * 1000); start_ccb->ccb_h.ccb_bp = NULL; start_ccb->ccb_h.ccb_state = DA_CCB_PROBE_BLK_LIMITS; xpt_action(start_ccb); break; } case DA_STATE_PROBE_BDC: { struct scsi_vpd_block_characteristics *bdc; if (!scsi_vpd_supported_page(periph, SVPD_BDC)) { softc->state = DA_STATE_PROBE_ATA; goto skipstate; } bdc = (struct scsi_vpd_block_characteristics *) malloc(sizeof(*bdc), M_SCSIDA, M_NOWAIT|M_ZERO); if (bdc == NULL) { printf("dastart: Couldn't malloc bdc data\n"); /* da_free_periph??? 
*/ break; } scsi_inquiry(&start_ccb->csio, /*retries*/da_retry_count, /*cbfcnp*/dadone, /*tag_action*/MSG_SIMPLE_Q_TAG, /*inq_buf*/(u_int8_t *)bdc, /*inq_len*/sizeof(*bdc), /*evpd*/TRUE, /*page_code*/SVPD_BDC, /*sense_len*/SSD_MIN_SIZE, /*timeout*/da_default_timeout * 1000); start_ccb->ccb_h.ccb_bp = NULL; start_ccb->ccb_h.ccb_state = DA_CCB_PROBE_BDC; xpt_action(start_ccb); break; } case DA_STATE_PROBE_ATA: { struct ata_params *ata_params; if (!scsi_vpd_supported_page(periph, SVPD_ATA_INFORMATION)) { daprobedone(periph, start_ccb); break; } ata_params = (struct ata_params*) malloc(sizeof(*ata_params), M_SCSIDA, M_NOWAIT|M_ZERO); if (ata_params == NULL) { printf("dastart: Couldn't malloc ata_params data\n"); /* da_free_periph??? */ break; } scsi_ata_identify(&start_ccb->csio, /*retries*/da_retry_count, /*cbfcnp*/dadone, /*tag_action*/MSG_SIMPLE_Q_TAG, /*data_ptr*/(u_int8_t *)ata_params, /*dxfer_len*/sizeof(*ata_params), /*sense_len*/SSD_FULL_SIZE, /*timeout*/da_default_timeout * 1000); start_ccb->ccb_h.ccb_bp = NULL; start_ccb->ccb_h.ccb_state = DA_CCB_PROBE_ATA; xpt_action(start_ccb); break; } } } /* * In each of the methods below, while it is the caller's * responsibility to ensure the request will fit into a * single device request, we might have changed the delete * method due to the device incorrectly advertising either * its supported methods or limits. * * To prevent this causing further issues we validate the request * against the method's limits and warn, which would * otherwise be unnecessary. */ static void da_delete_unmap(struct cam_periph *periph, union ccb *ccb, struct bio *bp) { struct da_softc *softc = (struct da_softc *)periph->softc; struct bio *bp1; uint8_t *buf = softc->unmap_buf; uint64_t lba, lastlba = (uint64_t)-1; uint64_t totalcount = 0; uint64_t count; uint32_t lastcount = 0, c; uint32_t off, ranges = 0; /* * Currently this doesn't take the UNMAP * Granularity and Granularity Alignment * fields into account. * * This could result in both suboptimal unmap * requests as well as UNMAP calls unmapping * fewer LBAs than requested. */ softc->delete_running = 1; bzero(softc->unmap_buf, sizeof(softc->unmap_buf)); bp1 = bp; do { bioq_remove(&softc->delete_queue, bp1); if (bp1 != bp) bioq_insert_tail(&softc->delete_run_queue, bp1); lba = bp1->bio_pblkno; count = bp1->bio_bcount / softc->params.secsize; /* Try to extend the previous range.
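* A request whose starting LBA is exactly where the previous one ended is merged into the previous UNMAP block descriptor, up to UNMAP_RANGE_MAX blocks per descriptor.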
*/ if (lba == lastlba) { c = omin(count, UNMAP_RANGE_MAX - lastcount); lastcount += c; off = ((ranges - 1) * UNMAP_RANGE_SIZE) + UNMAP_HEAD_SIZE; scsi_ulto4b(lastcount, &buf[off + 8]); count -= c; lba +=c; totalcount += c; } while (count > 0) { c = omin(count, UNMAP_RANGE_MAX); if (totalcount + c > softc->unmap_max_lba || ranges >= softc->unmap_max_ranges) { xpt_print(periph->path, "%s issuing short delete %ld > %ld" "|| %d >= %d", da_delete_method_desc[softc->delete_method], totalcount + c, softc->unmap_max_lba, ranges, softc->unmap_max_ranges); break; } off = (ranges * UNMAP_RANGE_SIZE) + UNMAP_HEAD_SIZE; scsi_u64to8b(lba, &buf[off + 0]); scsi_ulto4b(c, &buf[off + 8]); lba += c; totalcount += c; ranges++; count -= c; lastcount = c; } lastlba = lba; bp1 = bioq_first(&softc->delete_queue); if (bp1 == NULL || ranges >= softc->unmap_max_ranges || totalcount + bp1->bio_bcount / softc->params.secsize > softc->unmap_max_lba) break; } while (1); scsi_ulto2b(ranges * 16 + 6, &buf[0]); scsi_ulto2b(ranges * 16, &buf[2]); scsi_unmap(&ccb->csio, /*retries*/da_retry_count, /*cbfcnp*/dadone, /*tag_action*/MSG_SIMPLE_Q_TAG, /*byte2*/0, /*data_ptr*/ buf, /*dxfer_len*/ ranges * 16 + 8, /*sense_len*/SSD_FULL_SIZE, da_default_timeout * 1000); ccb->ccb_h.ccb_state = DA_CCB_DELETE; ccb->ccb_h.flags |= CAM_UNLOCKED; } static void da_delete_trim(struct cam_periph *periph, union ccb *ccb, struct bio *bp) { struct da_softc *softc = (struct da_softc *)periph->softc; struct bio *bp1; uint8_t *buf = softc->unmap_buf; uint64_t lastlba = (uint64_t)-1; uint64_t count; uint64_t lba; uint32_t lastcount = 0, c, requestcount; int ranges = 0, off, block_count; softc->delete_running = 1; bzero(softc->unmap_buf, sizeof(softc->unmap_buf)); bp1 = bp; do { bioq_remove(&softc->delete_queue, bp1); if (bp1 != bp) bioq_insert_tail(&softc->delete_run_queue, bp1); lba = bp1->bio_pblkno; count = bp1->bio_bcount / softc->params.secsize; requestcount = count; /* Try to extend the previous range. 
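* As with UNMAP above, a request that begins exactly where the previous one ended is folded into the previous DSM range entry, up to ATA_DSM_RANGE_MAX blocks per entry.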
*/ if (lba == lastlba) { c = min(count, ATA_DSM_RANGE_MAX - lastcount); lastcount += c; off = (ranges - 1) * 8; buf[off + 6] = lastcount & 0xff; buf[off + 7] = (lastcount >> 8) & 0xff; count -= c; lba += c; } while (count > 0) { c = min(count, ATA_DSM_RANGE_MAX); off = ranges * 8; buf[off + 0] = lba & 0xff; buf[off + 1] = (lba >> 8) & 0xff; buf[off + 2] = (lba >> 16) & 0xff; buf[off + 3] = (lba >> 24) & 0xff; buf[off + 4] = (lba >> 32) & 0xff; buf[off + 5] = (lba >> 40) & 0xff; buf[off + 6] = c & 0xff; buf[off + 7] = (c >> 8) & 0xff; lba += c; ranges++; count -= c; lastcount = c; if (count != 0 && ranges == softc->trim_max_ranges) { xpt_print(periph->path, "%s issuing short delete %ld > %ld\n", da_delete_method_desc[softc->delete_method], requestcount, (softc->trim_max_ranges - ranges) * ATA_DSM_RANGE_MAX); break; } } lastlba = lba; bp1 = bioq_first(&softc->delete_queue); if (bp1 == NULL || bp1->bio_bcount / softc->params.secsize > (softc->trim_max_ranges - ranges) * ATA_DSM_RANGE_MAX) break; } while (1); block_count = (ranges + ATA_DSM_BLK_RANGES - 1) / ATA_DSM_BLK_RANGES; scsi_ata_trim(&ccb->csio, /*retries*/da_retry_count, /*cbfcnp*/dadone, /*tag_action*/MSG_SIMPLE_Q_TAG, block_count, /*data_ptr*/buf, /*dxfer_len*/block_count * ATA_DSM_BLK_SIZE, /*sense_len*/SSD_FULL_SIZE, da_default_timeout * 1000); ccb->ccb_h.ccb_state = DA_CCB_DELETE; ccb->ccb_h.flags |= CAM_UNLOCKED; } /* * We calculate ws_max_blks here based off d_delmaxsize instead * of using softc->ws_max_blks as it is absolute max for the * device not the protocol max which may well be lower. */ static void da_delete_ws(struct cam_periph *periph, union ccb *ccb, struct bio *bp) { struct da_softc *softc; struct bio *bp1; uint64_t ws_max_blks; uint64_t lba; uint64_t count; /* forward compat with WS32 */ softc = (struct da_softc *)periph->softc; ws_max_blks = softc->disk->d_delmaxsize / softc->params.secsize; softc->delete_running = 1; lba = bp->bio_pblkno; count = 0; bp1 = bp; do { bioq_remove(&softc->delete_queue, bp1); if (bp1 != bp) bioq_insert_tail(&softc->delete_run_queue, bp1); count += bp1->bio_bcount / softc->params.secsize; if (count > ws_max_blks) { xpt_print(periph->path, "%s issuing short delete %ld > %ld\n", da_delete_method_desc[softc->delete_method], count, ws_max_blks); count = min(count, ws_max_blks); break; } bp1 = bioq_first(&softc->delete_queue); if (bp1 == NULL || lba + count != bp1->bio_pblkno || count + bp1->bio_bcount / softc->params.secsize > ws_max_blks) break; } while (1); scsi_write_same(&ccb->csio, /*retries*/da_retry_count, /*cbfcnp*/dadone, /*tag_action*/MSG_SIMPLE_Q_TAG, /*byte2*/softc->delete_method == DA_DELETE_ZERO ? 0 : SWS_UNMAP, softc->delete_method == DA_DELETE_WS16 ? 16 : 10, /*lba*/lba, /*block_count*/count, /*data_ptr*/ __DECONST(void *, zero_region), /*dxfer_len*/ softc->params.secsize, /*sense_len*/SSD_FULL_SIZE, da_default_timeout * 1000); ccb->ccb_h.ccb_state = DA_CCB_DELETE; ccb->ccb_h.flags |= CAM_UNLOCKED; } static int cmd6workaround(union ccb *ccb) { struct scsi_rw_6 cmd6; struct scsi_rw_10 *cmd10; struct da_softc *softc; u_int8_t *cdb; struct bio *bp; int frozen; cdb = ccb->csio.cdb_io.cdb_bytes; softc = (struct da_softc *)xpt_path_periph(ccb->ccb_h.path)->softc; if (ccb->ccb_h.ccb_state == DA_CCB_DELETE) { da_delete_methods old_method = softc->delete_method; /* * Typically there are two reasons for failure here * 1. Delete method was detected as supported but isn't * 2. Delete failed due to invalid params e.g. 
too big * * While we will attempt to choose an alternative delete method * this may result in short deletes if the existing delete * requests from geom are big for the new method choosen. * * This method assumes that the error which triggered this * will not retry the io otherwise a panic will occur */ dadeleteflag(softc, old_method, 0); dadeletemethodchoose(softc, DA_DELETE_DISABLE); if (softc->delete_method == DA_DELETE_DISABLE) xpt_print(ccb->ccb_h.path, "%s failed, disabling BIO_DELETE\n", da_delete_method_desc[old_method]); else xpt_print(ccb->ccb_h.path, "%s failed, switching to %s BIO_DELETE\n", da_delete_method_desc[old_method], da_delete_method_desc[softc->delete_method]); while ((bp = bioq_takefirst(&softc->delete_run_queue)) != NULL) bioq_disksort(&softc->delete_queue, bp); bioq_disksort(&softc->delete_queue, (struct bio *)ccb->ccb_h.ccb_bp); ccb->ccb_h.ccb_bp = NULL; return (0); } /* Detect unsupported PREVENT ALLOW MEDIUM REMOVAL. */ if ((ccb->ccb_h.flags & CAM_CDB_POINTER) == 0 && (*cdb == PREVENT_ALLOW) && (softc->quirks & DA_Q_NO_PREVENT) == 0) { if (bootverbose) xpt_print(ccb->ccb_h.path, "PREVENT ALLOW MEDIUM REMOVAL not supported.\n"); softc->quirks |= DA_Q_NO_PREVENT; return (0); } /* Detect unsupported SYNCHRONIZE CACHE(10). */ if ((ccb->ccb_h.flags & CAM_CDB_POINTER) == 0 && (*cdb == SYNCHRONIZE_CACHE) && (softc->quirks & DA_Q_NO_SYNC_CACHE) == 0) { if (bootverbose) xpt_print(ccb->ccb_h.path, "SYNCHRONIZE CACHE(10) not supported.\n"); softc->quirks |= DA_Q_NO_SYNC_CACHE; softc->disk->d_flags &= ~DISKFLAG_CANFLUSHCACHE; return (0); } /* Translation only possible if CDB is an array and cmd is R/W6 */ if ((ccb->ccb_h.flags & CAM_CDB_POINTER) != 0 || (*cdb != READ_6 && *cdb != WRITE_6)) return 0; xpt_print(ccb->ccb_h.path, "READ(6)/WRITE(6) not supported, " "increasing minimum_cmd_size to 10.\n"); softc->minimum_cmd_size = 10; bcopy(cdb, &cmd6, sizeof(struct scsi_rw_6)); cmd10 = (struct scsi_rw_10 *)cdb; cmd10->opcode = (cmd6.opcode == READ_6) ? READ_10 : WRITE_10; cmd10->byte2 = 0; scsi_ulto4b(scsi_3btoul(cmd6.addr), cmd10->addr); cmd10->reserved = 0; scsi_ulto2b(cmd6.length, cmd10->length); cmd10->control = cmd6.control; ccb->csio.cdb_len = sizeof(*cmd10); /* Requeue request, unfreezing queue if necessary */ frozen = (ccb->ccb_h.status & CAM_DEV_QFRZN) != 0; ccb->ccb_h.status = CAM_REQUEUE_REQ; xpt_action(ccb); if (frozen) { cam_release_devq(ccb->ccb_h.path, /*relsim_flags*/0, /*reduction*/0, /*timeout*/0, /*getcount_only*/0); } return (ERESTART); } static void dadone(struct cam_periph *periph, union ccb *done_ccb) { struct da_softc *softc; struct ccb_scsiio *csio; u_int32_t priority; da_ccb_state state; softc = (struct da_softc *)periph->softc; priority = done_ccb->ccb_h.pinfo.priority; CAM_DEBUG(periph->path, CAM_DEBUG_TRACE, ("dadone\n")); csio = &done_ccb->csio; state = csio->ccb_h.ccb_state & DA_CCB_TYPE_MASK; switch (state) { case DA_CCB_BUFFER_IO: case DA_CCB_DELETE: { struct bio *bp, *bp1; cam_periph_lock(periph); bp = (struct bio *)done_ccb->ccb_h.ccb_bp; if ((done_ccb->ccb_h.status & CAM_STATUS_MASK) != CAM_REQ_CMP) { int error; int sf; if ((csio->ccb_h.ccb_state & DA_CCB_RETRY_UA) != 0) sf = SF_RETRY_UA; else sf = 0; error = daerror(done_ccb, CAM_RETRY_SELTO, sf); if (error == ERESTART) { /* * A retry was scheduled, so * just return. 
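* (ERESTART here indicates daerror() has already arranged for the command to be retried, so no completion handling is needed.)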
*/ cam_periph_unlock(periph); return; } bp = (struct bio *)done_ccb->ccb_h.ccb_bp; if (error != 0) { int queued_error; /* * return all queued I/O with EIO, so that * the client can retry these I/Os in the * proper order should it attempt to recover. */ queued_error = EIO; if (error == ENXIO && (softc->flags & DA_FLAG_PACK_INVALID)== 0) { /* * Catastrophic error. Mark our pack as * invalid. */ /* * XXX See if this is really a media * XXX change first? */ xpt_print(periph->path, "Invalidating pack\n"); softc->flags |= DA_FLAG_PACK_INVALID; queued_error = ENXIO; } bioq_flush(&softc->bio_queue, NULL, queued_error); if (bp != NULL) { bp->bio_error = error; bp->bio_resid = bp->bio_bcount; bp->bio_flags |= BIO_ERROR; } } else if (bp != NULL) { if (state == DA_CCB_DELETE) bp->bio_resid = 0; else bp->bio_resid = csio->resid; bp->bio_error = 0; if (bp->bio_resid != 0) bp->bio_flags |= BIO_ERROR; } if ((done_ccb->ccb_h.status & CAM_DEV_QFRZN) != 0) cam_release_devq(done_ccb->ccb_h.path, /*relsim_flags*/0, /*reduction*/0, /*timeout*/0, /*getcount_only*/0); } else if (bp != NULL) { if ((done_ccb->ccb_h.status & CAM_DEV_QFRZN) != 0) panic("REQ_CMP with QFRZN"); if (state == DA_CCB_DELETE) bp->bio_resid = 0; else bp->bio_resid = csio->resid; if (csio->resid > 0) bp->bio_flags |= BIO_ERROR; if (softc->error_inject != 0) { bp->bio_error = softc->error_inject; bp->bio_resid = bp->bio_bcount; bp->bio_flags |= BIO_ERROR; softc->error_inject = 0; } } LIST_REMOVE(&done_ccb->ccb_h, periph_links.le); if (LIST_EMPTY(&softc->pending_ccbs)) softc->flags |= DA_FLAG_WAS_OTAG; xpt_release_ccb(done_ccb); if (state == DA_CCB_DELETE) { TAILQ_HEAD(, bio) queue; TAILQ_INIT(&queue); TAILQ_CONCAT(&queue, &softc->delete_run_queue.queue, bio_queue); softc->delete_run_queue.insert_point = NULL; + /* + * Normally, the xpt_release_ccb() above would make sure + * that when we have more work to do, that work would + * get kicked off. However, we specifically keep + * delete_running set to 0 before the call above to + * allow other I/O to progress when many BIO_DELETE + * requests are pushed down. We set delete_running to 0 + * and call daschedule again so that we don't stall if + * there are no other I/Os pending apart from BIO_DELETEs. + */ softc->delete_running = 0; daschedule(periph); cam_periph_unlock(periph); while ((bp1 = TAILQ_FIRST(&queue)) != NULL) { TAILQ_REMOVE(&queue, bp1, bio_queue); bp1->bio_error = bp->bio_error; if (bp->bio_flags & BIO_ERROR) { bp1->bio_flags |= BIO_ERROR; bp1->bio_resid = bp1->bio_bcount; } else bp1->bio_resid = 0; biodone(bp1); } } else cam_periph_unlock(periph); if (bp != NULL) biodone(bp); return; } case DA_CCB_PROBE_RC: case DA_CCB_PROBE_RC16: { struct scsi_read_capacity_data *rdcap; struct scsi_read_capacity_data_long *rcaplong; char announce_buf[80]; int lbp; lbp = 0; rdcap = NULL; rcaplong = NULL; if (state == DA_CCB_PROBE_RC) rdcap =(struct scsi_read_capacity_data *)csio->data_ptr; else rcaplong = (struct scsi_read_capacity_data_long *) csio->data_ptr; if ((csio->ccb_h.status & CAM_STATUS_MASK) == CAM_REQ_CMP) { struct disk_params *dp; uint32_t block_size; uint64_t maxsector; u_int lalba; /* Lowest aligned LBA. */ if (state == DA_CCB_PROBE_RC) { block_size = scsi_4btoul(rdcap->length); maxsector = scsi_4btoul(rdcap->addr); lalba = 0; /* * According to SBC-2, if the standard 10 * byte READ CAPACITY command returns 2^32, * we should issue the 16 byte version of * the command, since the device in question * has more sectors than can be represented * with the short version of the command. 
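* READ CAPACITY(16) returns a 64-bit LBA and also carries the logical block provisioning and alignment fields that are used below.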
*/ if (maxsector == 0xffffffff) { free(rdcap, M_SCSIDA); xpt_release_ccb(done_ccb); softc->state = DA_STATE_PROBE_RC16; xpt_schedule(periph, priority); return; } } else { block_size = scsi_4btoul(rcaplong->length); maxsector = scsi_8btou64(rcaplong->addr); lalba = scsi_2btoul(rcaplong->lalba_lbp); } /* * Because GEOM code just will panic us if we * give them an 'illegal' value we'll avoid that * here. */ if (block_size == 0 && maxsector == 0) { block_size = 512; maxsector = -1; } else if (block_size == 0) { block_size = 512; } if (block_size >= MAXPHYS) { xpt_print(periph->path, "unsupportable block size %ju\n", (uintmax_t) block_size); announce_buf[0] = '\0'; cam_periph_invalidate(periph); } else { /* * We pass rcaplong into dasetgeom(), * because it will only use it if it is * non-NULL. */ dasetgeom(periph, block_size, maxsector, rcaplong, sizeof(*rcaplong)); lbp = (lalba & SRC16_LBPME_A); dp = &softc->params; snprintf(announce_buf, sizeof(announce_buf), "%juMB (%ju %u byte sectors: %dH %dS/T " "%dC)", (uintmax_t) (((uintmax_t)dp->secsize * dp->sectors) / (1024*1024)), (uintmax_t)dp->sectors, dp->secsize, dp->heads, dp->secs_per_track, dp->cylinders); } } else { int error; announce_buf[0] = '\0'; /* * Retry any UNIT ATTENTION type errors. They * are expected at boot. */ error = daerror(done_ccb, CAM_RETRY_SELTO, SF_RETRY_UA|SF_NO_PRINT); if (error == ERESTART) { /* * A retry was scheuled, so * just return. */ return; } else if (error != 0) { int asc, ascq; int sense_key, error_code; int have_sense; cam_status status; struct ccb_getdev cgd; /* Don't wedge this device's queue */ status = done_ccb->ccb_h.status; if ((status & CAM_DEV_QFRZN) != 0) cam_release_devq(done_ccb->ccb_h.path, /*relsim_flags*/0, /*reduction*/0, /*timeout*/0, /*getcount_only*/0); xpt_setup_ccb(&cgd.ccb_h, done_ccb->ccb_h.path, CAM_PRIORITY_NORMAL); cgd.ccb_h.func_code = XPT_GDEV_TYPE; xpt_action((union ccb *)&cgd); if (scsi_extract_sense_ccb(done_ccb, &error_code, &sense_key, &asc, &ascq)) have_sense = TRUE; else have_sense = FALSE; /* * If we tried READ CAPACITY(16) and failed, * fallback to READ CAPACITY(10). */ if ((state == DA_CCB_PROBE_RC16) && (softc->flags & DA_FLAG_CAN_RC16) && (((csio->ccb_h.status & CAM_STATUS_MASK) == CAM_REQ_INVALID) || ((have_sense) && (error_code == SSD_CURRENT_ERROR) && (sense_key == SSD_KEY_ILLEGAL_REQUEST)))) { softc->flags &= ~DA_FLAG_CAN_RC16; free(rdcap, M_SCSIDA); xpt_release_ccb(done_ccb); softc->state = DA_STATE_PROBE_RC; xpt_schedule(periph, priority); return; } else /* * Attach to anything that claims to be a * direct access or optical disk device, * as long as it doesn't return a "Logical * unit not supported" (0x25) error. */ if ((have_sense) && (asc != 0x25) && (error_code == SSD_CURRENT_ERROR)) { const char *sense_key_desc; const char *asc_desc; dasetgeom(periph, 512, -1, NULL, 0); scsi_sense_desc(sense_key, asc, ascq, &cgd.inq_data, &sense_key_desc, &asc_desc); snprintf(announce_buf, sizeof(announce_buf), "Attempt to query device " "size failed: %s, %s", sense_key_desc, asc_desc); } else { if (have_sense) scsi_sense_print( &done_ccb->csio); else { xpt_print(periph->path, "got CAM status %#x\n", done_ccb->ccb_h.status); } xpt_print(periph->path, "fatal error, " "failed to attach to device\n"); /* * Free up resources. */ cam_periph_invalidate(periph); } } } free(csio->data_ptr, M_SCSIDA); if (announce_buf[0] != '\0' && ((softc->flags & DA_FLAG_ANNOUNCED) == 0)) { /* * Create our sysctl variables, now that we know * we have successfully attached. 
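* (The sysctl setup is handed off to a taskqueue below, presumably because creating sysctl nodes may sleep, which this completion context cannot do.)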
*/ /* increase the refcount */ if (cam_periph_acquire(periph) == CAM_REQ_CMP) { taskqueue_enqueue(taskqueue_thread, &softc->sysctl_task); xpt_announce_periph(periph, announce_buf); xpt_announce_quirks(periph, softc->quirks, DA_Q_BIT_STRING); } else { xpt_print(periph->path, "fatal error, " "could not acquire reference count\n"); } } /* We already probed the device. */ if (softc->flags & DA_FLAG_PROBED) { daprobedone(periph, done_ccb); return; } /* Ensure re-probe doesn't see old delete. */ softc->delete_available = 0; if (lbp && (softc->quirks & DA_Q_NO_UNMAP) == 0) { /* * Based on older SBC-3 spec revisions * any of the UNMAP methods "may" be * available via LBP given this flag so * we flag all of them as availble and * then remove those which further * probes confirm aren't available * later. * * We could also check readcap(16) p_type * flag to exclude one or more invalid * write same (X) types here */ dadeleteflag(softc, DA_DELETE_WS16, 1); dadeleteflag(softc, DA_DELETE_WS10, 1); dadeleteflag(softc, DA_DELETE_ZERO, 1); dadeleteflag(softc, DA_DELETE_UNMAP, 1); xpt_release_ccb(done_ccb); softc->state = DA_STATE_PROBE_LBP; xpt_schedule(periph, priority); return; } xpt_release_ccb(done_ccb); softc->state = DA_STATE_PROBE_BDC; xpt_schedule(periph, priority); return; } case DA_CCB_PROBE_LBP: { struct scsi_vpd_logical_block_prov *lbp; lbp = (struct scsi_vpd_logical_block_prov *)csio->data_ptr; if ((csio->ccb_h.status & CAM_STATUS_MASK) == CAM_REQ_CMP) { /* * T10/1799-D Revision 31 states at least one of these * must be supported but we don't currently enforce this. */ dadeleteflag(softc, DA_DELETE_WS16, (lbp->flags & SVPD_LBP_WS16)); dadeleteflag(softc, DA_DELETE_WS10, (lbp->flags & SVPD_LBP_WS10)); dadeleteflag(softc, DA_DELETE_ZERO, (lbp->flags & SVPD_LBP_WS10)); dadeleteflag(softc, DA_DELETE_UNMAP, (lbp->flags & SVPD_LBP_UNMAP)); } else { int error; error = daerror(done_ccb, CAM_RETRY_SELTO, SF_RETRY_UA|SF_NO_PRINT); if (error == ERESTART) return; else if (error != 0) { if ((done_ccb->ccb_h.status & CAM_DEV_QFRZN) != 0) { /* Don't wedge this device's queue */ cam_release_devq(done_ccb->ccb_h.path, /*relsim_flags*/0, /*reduction*/0, /*timeout*/0, /*getcount_only*/0); } /* * Failure indicates we don't support any SBC-3 * delete methods with UNMAP */ } } free(lbp, M_SCSIDA); xpt_release_ccb(done_ccb); softc->state = DA_STATE_PROBE_BLK_LIMITS; xpt_schedule(periph, priority); return; } case DA_CCB_PROBE_BLK_LIMITS: { struct scsi_vpd_block_limits *block_limits; block_limits = (struct scsi_vpd_block_limits *)csio->data_ptr; if ((csio->ccb_h.status & CAM_STATUS_MASK) == CAM_REQ_CMP) { uint32_t max_txfer_len = scsi_4btoul( block_limits->max_txfer_len); uint32_t max_unmap_lba_cnt = scsi_4btoul( block_limits->max_unmap_lba_cnt); uint32_t max_unmap_blk_cnt = scsi_4btoul( block_limits->max_unmap_blk_cnt); uint64_t ws_max_blks = scsi_8btou64( block_limits->max_write_same_length); if (max_txfer_len != 0) { softc->disk->d_maxsize = MIN(softc->maxio, (off_t)max_txfer_len * softc->params.secsize); } /* * We should already support UNMAP but we check lba * and block count to be sure */ if (max_unmap_lba_cnt != 0x00L && max_unmap_blk_cnt != 0x00L) { softc->unmap_max_lba = max_unmap_lba_cnt; softc->unmap_max_ranges = min(max_unmap_blk_cnt, UNMAP_MAX_RANGES); } else { /* * Unexpected UNMAP limits which means the * device doesn't actually support UNMAP */ dadeleteflag(softc, DA_DELETE_UNMAP, 0); } if (ws_max_blks != 0x00L) softc->ws_max_blks = ws_max_blks; } else { int error; error = daerror(done_ccb, 
CAM_RETRY_SELTO, SF_RETRY_UA|SF_NO_PRINT); if (error == ERESTART) return; else if (error != 0) { if ((done_ccb->ccb_h.status & CAM_DEV_QFRZN) != 0) { /* Don't wedge this device's queue */ cam_release_devq(done_ccb->ccb_h.path, /*relsim_flags*/0, /*reduction*/0, /*timeout*/0, /*getcount_only*/0); } /* * Failure here doesn't mean UNMAP is not * supported as this is an optional page. */ softc->unmap_max_lba = 1; softc->unmap_max_ranges = 1; } } free(block_limits, M_SCSIDA); xpt_release_ccb(done_ccb); softc->state = DA_STATE_PROBE_BDC; xpt_schedule(periph, priority); return; } case DA_CCB_PROBE_BDC: { struct scsi_vpd_block_characteristics *bdc; bdc = (struct scsi_vpd_block_characteristics *)csio->data_ptr; if ((csio->ccb_h.status & CAM_STATUS_MASK) == CAM_REQ_CMP) { /* * Disable queue sorting for non-rotational media * by default. */ u_int16_t old_rate = softc->disk->d_rotation_rate; softc->disk->d_rotation_rate = scsi_2btoul(bdc->medium_rotation_rate); if (softc->disk->d_rotation_rate == SVPD_BDC_RATE_NON_ROTATING) { softc->sort_io_queue = 0; } if (softc->disk->d_rotation_rate != old_rate) { disk_attr_changed(softc->disk, "GEOM::rotation_rate", M_NOWAIT); } } else { int error; error = daerror(done_ccb, CAM_RETRY_SELTO, SF_RETRY_UA|SF_NO_PRINT); if (error == ERESTART) return; else if (error != 0) { if ((done_ccb->ccb_h.status & CAM_DEV_QFRZN) != 0) { /* Don't wedge this device's queue */ cam_release_devq(done_ccb->ccb_h.path, /*relsim_flags*/0, /*reduction*/0, /*timeout*/0, /*getcount_only*/0); } } } free(bdc, M_SCSIDA); xpt_release_ccb(done_ccb); softc->state = DA_STATE_PROBE_ATA; xpt_schedule(periph, priority); return; } case DA_CCB_PROBE_ATA: { int i; struct ata_params *ata_params; int16_t *ptr; ata_params = (struct ata_params *)csio->data_ptr; ptr = (uint16_t *)ata_params; if ((csio->ccb_h.status & CAM_STATUS_MASK) == CAM_REQ_CMP) { uint16_t old_rate; for (i = 0; i < sizeof(*ata_params) / 2; i++) ptr[i] = le16toh(ptr[i]); if (ata_params->support_dsm & ATA_SUPPORT_DSM_TRIM && (softc->quirks & DA_Q_NO_UNMAP) == 0) { dadeleteflag(softc, DA_DELETE_ATA_TRIM, 1); if (ata_params->max_dsm_blocks != 0) softc->trim_max_ranges = min( softc->trim_max_ranges, ata_params->max_dsm_blocks * ATA_DSM_BLK_RANGES); } /* * Disable queue sorting for non-rotational media * by default. */ old_rate = softc->disk->d_rotation_rate; softc->disk->d_rotation_rate = ata_params->media_rotation_rate; if (softc->disk->d_rotation_rate == ATA_RATE_NON_ROTATING) { softc->sort_io_queue = 0; } if (softc->disk->d_rotation_rate != old_rate) { disk_attr_changed(softc->disk, "GEOM::rotation_rate", M_NOWAIT); } } else { int error; error = daerror(done_ccb, CAM_RETRY_SELTO, SF_RETRY_UA|SF_NO_PRINT); if (error == ERESTART) return; else if (error != 0) { if ((done_ccb->ccb_h.status & CAM_DEV_QFRZN) != 0) { /* Don't wedge this device's queue */ cam_release_devq(done_ccb->ccb_h.path, /*relsim_flags*/0, /*reduction*/0, /*timeout*/0, /*getcount_only*/0); } } } free(ata_params, M_SCSIDA); daprobedone(periph, done_ccb); return; } case DA_CCB_DUMP: /* No-op. 
We're polling */ return; case DA_CCB_TUR: { if ((done_ccb->ccb_h.status & CAM_STATUS_MASK) != CAM_REQ_CMP) { if (daerror(done_ccb, CAM_RETRY_SELTO, SF_RETRY_UA | SF_NO_RECOVERY | SF_NO_PRINT) == ERESTART) return; if ((done_ccb->ccb_h.status & CAM_DEV_QFRZN) != 0) cam_release_devq(done_ccb->ccb_h.path, /*relsim_flags*/0, /*reduction*/0, /*timeout*/0, /*getcount_only*/0); } xpt_release_ccb(done_ccb); cam_periph_release_locked(periph); return; } default: break; } xpt_release_ccb(done_ccb); } static void dareprobe(struct cam_periph *periph) { struct da_softc *softc; cam_status status; softc = (struct da_softc *)periph->softc; /* Probe in progress; don't interfere. */ if (softc->state != DA_STATE_NORMAL) return; status = cam_periph_acquire(periph); KASSERT(status == CAM_REQ_CMP, ("dareprobe: cam_periph_acquire failed")); if (softc->flags & DA_FLAG_CAN_RC16) softc->state = DA_STATE_PROBE_RC16; else softc->state = DA_STATE_PROBE_RC; xpt_schedule(periph, CAM_PRIORITY_DEV); } static int daerror(union ccb *ccb, u_int32_t cam_flags, u_int32_t sense_flags) { struct da_softc *softc; struct cam_periph *periph; int error, error_code, sense_key, asc, ascq; periph = xpt_path_periph(ccb->ccb_h.path); softc = (struct da_softc *)periph->softc; /* * Automatically detect devices that do not support * READ(6)/WRITE(6) and upgrade to using 10 byte cdbs. */ error = 0; if ((ccb->ccb_h.status & CAM_STATUS_MASK) == CAM_REQ_INVALID) { error = cmd6workaround(ccb); } else if (scsi_extract_sense_ccb(ccb, &error_code, &sense_key, &asc, &ascq)) { if (sense_key == SSD_KEY_ILLEGAL_REQUEST) error = cmd6workaround(ccb); /* * If the target replied with CAPACITY DATA HAS CHANGED UA, * query the capacity and notify upper layers. */ else if (sense_key == SSD_KEY_UNIT_ATTENTION && asc == 0x2A && ascq == 0x09) { xpt_print(periph->path, "Capacity data has changed\n"); softc->flags &= ~DA_FLAG_PROBED; dareprobe(periph); sense_flags |= SF_NO_PRINT; } else if (sense_key == SSD_KEY_UNIT_ATTENTION && asc == 0x28 && ascq == 0x00) { softc->flags &= ~DA_FLAG_PROBED; disk_media_changed(softc->disk, M_NOWAIT); } else if (sense_key == SSD_KEY_UNIT_ATTENTION && asc == 0x3F && ascq == 0x03) { xpt_print(periph->path, "INQUIRY data has changed\n"); softc->flags &= ~DA_FLAG_PROBED; dareprobe(periph); sense_flags |= SF_NO_PRINT; } else if (sense_key == SSD_KEY_NOT_READY && asc == 0x3a && (softc->flags & DA_FLAG_PACK_INVALID) == 0) { softc->flags |= DA_FLAG_PACK_INVALID; disk_media_gone(softc->disk, M_NOWAIT); } } if (error == ERESTART) return (ERESTART); /* * XXX * Until we have a better way of doing pack validation, * don't treat UAs as errors. 
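* Setting SF_RETRY_UA below asks cam_periph_error() to retry commands that fail with a unit attention rather than reporting them as errors.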
*/ sense_flags |= SF_RETRY_UA; return(cam_periph_error(ccb, cam_flags, sense_flags, &softc->saved_ccb)); } static void damediapoll(void *arg) { struct cam_periph *periph = arg; struct da_softc *softc = periph->softc; if (!softc->tur && LIST_EMPTY(&softc->pending_ccbs)) { if (cam_periph_acquire(periph) == CAM_REQ_CMP) { softc->tur = 1; daschedule(periph); } } /* Queue us up again */ if (da_poll_period != 0) callout_schedule(&softc->mediapoll_c, da_poll_period * hz); } static void daprevent(struct cam_periph *periph, int action) { struct da_softc *softc; union ccb *ccb; int error; softc = (struct da_softc *)periph->softc; if (((action == PR_ALLOW) && (softc->flags & DA_FLAG_PACK_LOCKED) == 0) || ((action == PR_PREVENT) && (softc->flags & DA_FLAG_PACK_LOCKED) != 0)) { return; } ccb = cam_periph_getccb(periph, CAM_PRIORITY_NORMAL); scsi_prevent(&ccb->csio, /*retries*/1, /*cbcfp*/dadone, MSG_SIMPLE_Q_TAG, action, SSD_FULL_SIZE, 5000); error = cam_periph_runccb(ccb, daerror, CAM_RETRY_SELTO, SF_RETRY_UA | SF_NO_PRINT, softc->disk->d_devstat); if (error == 0) { if (action == PR_ALLOW) softc->flags &= ~DA_FLAG_PACK_LOCKED; else softc->flags |= DA_FLAG_PACK_LOCKED; } xpt_release_ccb(ccb); } static void dasetgeom(struct cam_periph *periph, uint32_t block_len, uint64_t maxsector, struct scsi_read_capacity_data_long *rcaplong, size_t rcap_len) { struct ccb_calc_geometry ccg; struct da_softc *softc; struct disk_params *dp; u_int lbppbe, lalba; int error; softc = (struct da_softc *)periph->softc; dp = &softc->params; dp->secsize = block_len; dp->sectors = maxsector + 1; if (rcaplong != NULL) { lbppbe = rcaplong->prot_lbppbe & SRC16_LBPPBE; lalba = scsi_2btoul(rcaplong->lalba_lbp); lalba &= SRC16_LALBA_A; } else { lbppbe = 0; lalba = 0; } if (lbppbe > 0) { dp->stripesize = block_len << lbppbe; dp->stripeoffset = (dp->stripesize - block_len * lalba) % dp->stripesize; } else if (softc->quirks & DA_Q_4K) { dp->stripesize = 4096; dp->stripeoffset = 0; } else { dp->stripesize = 0; dp->stripeoffset = 0; } /* * Have the controller provide us with a geometry * for this disk. The only time the geometry * matters is when we boot and the controller * is the only one knowledgeable enough to come * up with something that will make this a bootable * device. */ xpt_setup_ccb(&ccg.ccb_h, periph->path, CAM_PRIORITY_NORMAL); ccg.ccb_h.func_code = XPT_CALC_GEOMETRY; ccg.block_size = dp->secsize; ccg.volume_size = dp->sectors; ccg.heads = 0; ccg.secs_per_track = 0; ccg.cylinders = 0; xpt_action((union ccb*)&ccg); if ((ccg.ccb_h.status & CAM_STATUS_MASK) != CAM_REQ_CMP) { /* * We don't know what went wrong here- but just pick * a geometry so we don't have nasty things like divide * by zero. */ dp->heads = 255; dp->secs_per_track = 255; dp->cylinders = dp->sectors / (255 * 255); if (dp->cylinders == 0) { dp->cylinders = 1; } } else { dp->heads = ccg.heads; dp->secs_per_track = ccg.secs_per_track; dp->cylinders = ccg.cylinders; } /* * If the user supplied a read capacity buffer, and if it is * different than the previous buffer, update the data in the EDT. * If it's the same, we don't bother. This avoids sending an * update every time someone opens this device. 
*/ if ((rcaplong != NULL) && (bcmp(rcaplong, &softc->rcaplong, min(sizeof(softc->rcaplong), rcap_len)) != 0)) { struct ccb_dev_advinfo cdai; xpt_setup_ccb(&cdai.ccb_h, periph->path, CAM_PRIORITY_NORMAL); cdai.ccb_h.func_code = XPT_DEV_ADVINFO; cdai.buftype = CDAI_TYPE_RCAPLONG; cdai.flags |= CDAI_FLAG_STORE; cdai.bufsiz = rcap_len; cdai.buf = (uint8_t *)rcaplong; xpt_action((union ccb *)&cdai); if ((cdai.ccb_h.status & CAM_DEV_QFRZN) != 0) cam_release_devq(cdai.ccb_h.path, 0, 0, 0, FALSE); if (cdai.ccb_h.status != CAM_REQ_CMP) { xpt_print(periph->path, "%s: failed to set read " "capacity advinfo\n", __func__); /* Use cam_error_print() to decode the status */ cam_error_print((union ccb *)&cdai, CAM_ESF_CAM_STATUS, CAM_EPF_ALL); } else { bcopy(rcaplong, &softc->rcaplong, min(sizeof(softc->rcaplong), rcap_len)); } } softc->disk->d_sectorsize = softc->params.secsize; softc->disk->d_mediasize = softc->params.secsize * (off_t)softc->params.sectors; softc->disk->d_stripesize = softc->params.stripesize; softc->disk->d_stripeoffset = softc->params.stripeoffset; /* XXX: these are not actually "firmware" values, so they may be wrong */ softc->disk->d_fwsectors = softc->params.secs_per_track; softc->disk->d_fwheads = softc->params.heads; softc->disk->d_devstat->block_size = softc->params.secsize; softc->disk->d_devstat->flags &= ~DEVSTAT_BS_UNAVAILABLE; error = disk_resize(softc->disk, M_NOWAIT); if (error != 0) xpt_print(periph->path, "disk_resize(9) failed, error = %d\n", error); } static void dasendorderedtag(void *arg) { struct da_softc *softc = arg; if (da_send_ordered) { if (!LIST_EMPTY(&softc->pending_ccbs)) { if ((softc->flags & DA_FLAG_WAS_OTAG) == 0) softc->flags |= DA_FLAG_NEED_OTAG; softc->flags &= ~DA_FLAG_WAS_OTAG; } } /* Queue us up again */ callout_reset(&softc->sendordered_c, (da_default_timeout * hz) / DA_ORDEREDTAG_INTERVAL, dasendorderedtag, softc); } /* * Step through all DA peripheral drivers, and if the device is still open, * sync the disk cache to physical media. */ static void dashutdown(void * arg, int howto) { struct cam_periph *periph; struct da_softc *softc; union ccb *ccb; int error; CAM_PERIPH_FOREACH(periph, &dadriver) { softc = (struct da_softc *)periph->softc; if (SCHEDULER_STOPPED()) { /* If we paniced with the lock held, do not recurse. */ if (!cam_periph_owned(periph) && (softc->flags & DA_FLAG_OPEN)) { dadump(softc->disk, NULL, 0, 0, 0); } continue; } cam_periph_lock(periph); /* * We only sync the cache if the drive is still open, and * if the drive is capable of it.. */ if (((softc->flags & DA_FLAG_OPEN) == 0) || (softc->quirks & DA_Q_NO_SYNC_CACHE)) { cam_periph_unlock(periph); continue; } ccb = cam_periph_getccb(periph, CAM_PRIORITY_NORMAL); scsi_synchronize_cache(&ccb->csio, /*retries*/0, /*cbfcnp*/dadone, MSG_SIMPLE_Q_TAG, /*begin_lba*/0, /* whole disk */ /*lb_count*/0, SSD_FULL_SIZE, 60 * 60 * 1000); error = cam_periph_runccb(ccb, daerror, /*cam_flags*/0, /*sense_flags*/ SF_NO_RECOVERY | SF_NO_RETRY | SF_QUIET_IR, softc->disk->d_devstat); if (error != 0) xpt_print(periph->path, "Synchronize cache failed\n"); xpt_release_ccb(ccb); cam_periph_unlock(periph); } } #else /* !_KERNEL */ /* * XXX These are only left out of the kernel build to silence warnings. If, * for some reason these functions are used in the kernel, the ifdefs should * be moved so they are included both in the kernel and userland. 
*/ void scsi_format_unit(struct ccb_scsiio *csio, u_int32_t retries, void (*cbfcnp)(struct cam_periph *, union ccb *), u_int8_t tag_action, u_int8_t byte2, u_int16_t ileave, u_int8_t *data_ptr, u_int32_t dxfer_len, u_int8_t sense_len, u_int32_t timeout) { struct scsi_format_unit *scsi_cmd; scsi_cmd = (struct scsi_format_unit *)&csio->cdb_io.cdb_bytes; scsi_cmd->opcode = FORMAT_UNIT; scsi_cmd->byte2 = byte2; scsi_ulto2b(ileave, scsi_cmd->interleave); cam_fill_csio(csio, retries, cbfcnp, /*flags*/ (dxfer_len > 0) ? CAM_DIR_OUT : CAM_DIR_NONE, tag_action, data_ptr, dxfer_len, sense_len, sizeof(*scsi_cmd), timeout); } void scsi_read_defects(struct ccb_scsiio *csio, uint32_t retries, void (*cbfcnp)(struct cam_periph *, union ccb *), uint8_t tag_action, uint8_t list_format, uint32_t addr_desc_index, uint8_t *data_ptr, uint32_t dxfer_len, int minimum_cmd_size, uint8_t sense_len, uint32_t timeout) { uint8_t cdb_len; /* * These conditions allow using the 10 byte command. Otherwise we * need to use the 12 byte command. */ if ((minimum_cmd_size <= 10) && (addr_desc_index == 0) && (dxfer_len <= SRDD10_MAX_LENGTH)) { struct scsi_read_defect_data_10 *cdb10; cdb10 = (struct scsi_read_defect_data_10 *) &csio->cdb_io.cdb_bytes; cdb_len = sizeof(*cdb10); bzero(cdb10, cdb_len); cdb10->opcode = READ_DEFECT_DATA_10; cdb10->format = list_format; scsi_ulto2b(dxfer_len, cdb10->alloc_length); } else { struct scsi_read_defect_data_12 *cdb12; cdb12 = (struct scsi_read_defect_data_12 *) &csio->cdb_io.cdb_bytes; cdb_len = sizeof(*cdb12); bzero(cdb12, cdb_len); cdb12->opcode = READ_DEFECT_DATA_12; cdb12->format = list_format; scsi_ulto4b(dxfer_len, cdb12->alloc_length); scsi_ulto4b(addr_desc_index, cdb12->address_descriptor_index); } cam_fill_csio(csio, retries, cbfcnp, /*flags*/ CAM_DIR_IN, tag_action, data_ptr, dxfer_len, sense_len, cdb_len, timeout); } void scsi_sanitize(struct ccb_scsiio *csio, u_int32_t retries, void (*cbfcnp)(struct cam_periph *, union ccb *), u_int8_t tag_action, u_int8_t byte2, u_int16_t control, u_int8_t *data_ptr, u_int32_t dxfer_len, u_int8_t sense_len, u_int32_t timeout) { struct scsi_sanitize *scsi_cmd; scsi_cmd = (struct scsi_sanitize *)&csio->cdb_io.cdb_bytes; scsi_cmd->opcode = SANITIZE; scsi_cmd->byte2 = byte2; scsi_cmd->control = control; scsi_ulto2b(dxfer_len, scsi_cmd->length); cam_fill_csio(csio, retries, cbfcnp, /*flags*/ (dxfer_len > 0) ? CAM_DIR_OUT : CAM_DIR_NONE, tag_action, data_ptr, dxfer_len, sense_len, sizeof(*scsi_cmd), timeout); } #endif /* _KERNEL */ Index: projects/ifnet/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_mirror.c =================================================================== --- projects/ifnet/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_mirror.c (revision 277106) +++ projects/ifnet/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/vdev_mirror.c (revision 277107) @@ -1,660 +1,661 @@ /* * CDDL HEADER START * * The contents of this file are subject to the terms of the * Common Development and Distribution License (the "License"). * You may not use this file except in compliance with the License. * * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE * or http://www.opensolaris.org/os/licensing. * See the License for the specific language governing permissions * and limitations under the License. * * When distributing Covered Code, include this CDDL HEADER in each * file and include the License file at usr/src/OPENSOLARIS.LICENSE. 
* If applicable, add the following below this CDDL HEADER, with the * fields enclosed by brackets "[]" replaced with your own identifying * information: Portions Copyright [yyyy] [name of copyright owner] * * CDDL HEADER END */ /* * Copyright 2010 Sun Microsystems, Inc. All rights reserved. * Use is subject to license terms. */ /* * Copyright (c) 2012, 2014 by Delphix. All rights reserved. */ #include #include #include #include #include /* * Virtual device vector for mirroring. */ typedef struct mirror_child { vdev_t *mc_vd; uint64_t mc_offset; int mc_error; int mc_load; uint8_t mc_tried; uint8_t mc_skipped; uint8_t mc_speculative; } mirror_child_t; typedef struct mirror_map { int *mm_preferred; int mm_preferred_cnt; int mm_children; boolean_t mm_replacing; boolean_t mm_root; mirror_child_t mm_child[]; } mirror_map_t; static int vdev_mirror_shift = 21; SYSCTL_DECL(_vfs_zfs_vdev); static SYSCTL_NODE(_vfs_zfs_vdev, OID_AUTO, mirror, CTLFLAG_RD, 0, "ZFS VDEV Mirror"); /* * The load configuration settings below are tuned by default for * the case where all devices are of the same rotational type. * * If there is a mixture of rotating and non-rotating media, setting * non_rotating_seek_inc to 0 may well provide better results as it * will direct more reads to the non-rotating vdevs which are more * likely to have a higher performance. */ /* Rotating media load calculation configuration. */ static int rotating_inc = 0; SYSCTL_INT(_vfs_zfs_vdev_mirror, OID_AUTO, rotating_inc, CTLFLAG_RWTUN, &rotating_inc, 0, "Rotating media load increment for non-seeking I/O's"); static int rotating_seek_inc = 5; SYSCTL_INT(_vfs_zfs_vdev_mirror, OID_AUTO, rotating_seek_inc, CTLFLAG_RWTUN, &rotating_seek_inc, 0, "Rotating media load increment for seeking I/O's"); static int rotating_seek_offset = 1 * 1024 * 1024; SYSCTL_INT(_vfs_zfs_vdev_mirror, OID_AUTO, rotating_seek_offset, CTLFLAG_RWTUN, &rotating_seek_offset, 0, "Offset in bytes from the last I/O which " "triggers a reduced rotating media seek increment"); /* Non-rotating media load calculation configuration. */ static int non_rotating_inc = 0; SYSCTL_INT(_vfs_zfs_vdev_mirror, OID_AUTO, non_rotating_inc, CTLFLAG_RWTUN, &non_rotating_inc, 0, "Non-rotating media load increment for non-seeking I/O's"); static int non_rotating_seek_inc = 1; SYSCTL_INT(_vfs_zfs_vdev_mirror, OID_AUTO, non_rotating_seek_inc, CTLFLAG_RWTUN, &non_rotating_seek_inc, 0, "Non-rotating media load increment for seeking I/O's"); static inline size_t vdev_mirror_map_size(int children) { return (offsetof(mirror_map_t, mm_child[children]) + sizeof(int) * children); } static inline mirror_map_t * vdev_mirror_map_alloc(int children, boolean_t replacing, boolean_t root) { mirror_map_t *mm; mm = kmem_zalloc(vdev_mirror_map_size(children), KM_SLEEP); mm->mm_children = children; mm->mm_replacing = replacing; mm->mm_root = root; mm->mm_preferred = (int *)((uintptr_t)mm + offsetof(mirror_map_t, mm_child[children])); return mm; } static void vdev_mirror_map_free(zio_t *zio) { mirror_map_t *mm = zio->io_vsd; kmem_free(mm, vdev_mirror_map_size(mm->mm_children)); } static const zio_vsd_ops_t vdev_mirror_vsd_ops = { vdev_mirror_map_free, zio_vsd_default_cksum_report }; static int vdev_mirror_load(mirror_map_t *mm, vdev_t *vd, uint64_t zio_offset) { uint64_t lastoffset; int load; /* All DVAs have equal weight at the root. */ if (mm->mm_root) return (INT_MAX); /* * We don't return INT_MAX if the device is resilvering i.e. 
* vdev_resilver_txg != 0 as when tested performance was slightly * worse overall when resilvering with compared to without. */ /* Standard load based on pending queue length. */ load = vdev_queue_length(vd); lastoffset = vdev_queue_lastoffset(vd); if (vd->vdev_rotation_rate == VDEV_RATE_NON_ROTATING) { /* Non-rotating media. */ if (lastoffset == zio_offset) return (load + non_rotating_inc); /* * Apply a seek penalty even for non-rotating devices as * sequential I/O'a can be aggregated into fewer operations * on the device, thus avoiding unnecessary per-command * overhead and boosting performance. */ return (load + non_rotating_seek_inc); } /* Rotating media I/O's which directly follow the last I/O. */ if (lastoffset == zio_offset) return (load + rotating_inc); /* * Apply half the seek increment to I/O's within seek offset * of the last I/O queued to this vdev as they should incure less * of a seek increment. */ if (ABS(lastoffset - zio_offset) < rotating_seek_offset) return (load + (rotating_seek_inc / 2)); /* Apply the full seek increment to all other I/O's. */ return (load + rotating_seek_inc); } static mirror_map_t * vdev_mirror_map_init(zio_t *zio) { mirror_map_t *mm = NULL; mirror_child_t *mc; vdev_t *vd = zio->io_vd; int c; if (vd == NULL) { dva_t *dva = zio->io_bp->blk_dva; spa_t *spa = zio->io_spa; mm = vdev_mirror_map_alloc(BP_GET_NDVAS(zio->io_bp), B_FALSE, B_TRUE); for (c = 0; c < mm->mm_children; c++) { mc = &mm->mm_child[c]; mc->mc_vd = vdev_lookup_top(spa, DVA_GET_VDEV(&dva[c])); mc->mc_offset = DVA_GET_OFFSET(&dva[c]); } } else { mm = vdev_mirror_map_alloc(vd->vdev_children, (vd->vdev_ops == &vdev_replacing_ops || vd->vdev_ops == &vdev_spare_ops), B_FALSE); for (c = 0; c < mm->mm_children; c++) { mc = &mm->mm_child[c]; mc->mc_vd = vd->vdev_child[c]; mc->mc_offset = zio->io_offset; } } zio->io_vsd = mm; zio->io_vsd_ops = &vdev_mirror_vsd_ops; return (mm); } static int vdev_mirror_open(vdev_t *vd, uint64_t *asize, uint64_t *max_asize, uint64_t *logical_ashift, uint64_t *physical_ashift) { int numerrors = 0; int lasterror = 0; if (vd->vdev_children == 0) { vd->vdev_stat.vs_aux = VDEV_AUX_BAD_LABEL; return (SET_ERROR(EINVAL)); } vdev_open_children(vd); for (int c = 0; c < vd->vdev_children; c++) { vdev_t *cvd = vd->vdev_child[c]; if (cvd->vdev_open_error) { lasterror = cvd->vdev_open_error; numerrors++; continue; } *asize = MIN(*asize - 1, cvd->vdev_asize - 1) + 1; *max_asize = MIN(*max_asize - 1, cvd->vdev_max_asize - 1) + 1; *logical_ashift = MAX(*logical_ashift, cvd->vdev_ashift); *physical_ashift = MAX(*physical_ashift, cvd->vdev_physical_ashift); } if (numerrors == vd->vdev_children) { vd->vdev_stat.vs_aux = VDEV_AUX_NO_REPLICAS; return (lasterror); } return (0); } static void vdev_mirror_close(vdev_t *vd) { for (int c = 0; c < vd->vdev_children; c++) vdev_close(vd->vdev_child[c]); } static void vdev_mirror_child_done(zio_t *zio) { mirror_child_t *mc = zio->io_private; mc->mc_error = zio->io_error; mc->mc_tried = 1; mc->mc_skipped = 0; } static void vdev_mirror_scrub_done(zio_t *zio) { mirror_child_t *mc = zio->io_private; if (zio->io_error == 0) { zio_t *pio; mutex_enter(&zio->io_lock); while ((pio = zio_walk_parents(zio)) != NULL) { mutex_enter(&pio->io_lock); ASSERT3U(zio->io_size, >=, pio->io_size); bcopy(zio->io_data, pio->io_data, pio->io_size); mutex_exit(&pio->io_lock); } mutex_exit(&zio->io_lock); } zio_buf_free(zio->io_data, zio->io_size); mc->mc_error = zio->io_error; mc->mc_tried = 1; mc->mc_skipped = 0; } /* * Check the other, lower-index DVAs to see if they're 
on the same * vdev as the child we picked. If they are, use them since they * are likely to have been allocated from the primary metaslab in * use at the time, and hence are more likely to have locality with * single-copy data. */ static int vdev_mirror_dva_select(zio_t *zio, int p) { dva_t *dva = zio->io_bp->blk_dva; mirror_map_t *mm = zio->io_vsd; int preferred; int c; preferred = mm->mm_preferred[p]; for (p-- ; p >= 0; p--) { c = mm->mm_preferred[p]; if (DVA_GET_VDEV(&dva[c]) == DVA_GET_VDEV(&dva[preferred])) preferred = c; } return (preferred); } static int vdev_mirror_preferred_child_randomize(zio_t *zio) { mirror_map_t *mm = zio->io_vsd; int p; if (mm->mm_root) { p = spa_get_random(mm->mm_preferred_cnt); return (vdev_mirror_dva_select(zio, p)); } /* * To ensure we don't always favour the first matching vdev, * which could lead to wear leveling issues on SSD's, we * use the I/O offset as a pseudo random seed into the vdevs * which have the lowest load. */ p = (zio->io_offset >> vdev_mirror_shift) % mm->mm_preferred_cnt; return (mm->mm_preferred[p]); } /* * Try to find a vdev whose DTL doesn't contain the block we want to read * prefering vdevs based on determined load. * * If we can't, try the read on any vdev we haven't already tried. */ static int vdev_mirror_child_select(zio_t *zio) { mirror_map_t *mm = zio->io_vsd; uint64_t txg = zio->io_txg; int c, lowest_load; ASSERT(zio->io_bp == NULL || BP_PHYSICAL_BIRTH(zio->io_bp) == txg); lowest_load = INT_MAX; mm->mm_preferred_cnt = 0; for (c = 0; c < mm->mm_children; c++) { mirror_child_t *mc; mc = &mm->mm_child[c]; if (mc->mc_tried || mc->mc_skipped) continue; if (!vdev_readable(mc->mc_vd)) { mc->mc_error = SET_ERROR(ENXIO); mc->mc_tried = 1; /* don't even try */ mc->mc_skipped = 1; continue; } if (vdev_dtl_contains(mc->mc_vd, DTL_MISSING, txg, 1)) { mc->mc_error = SET_ERROR(ESTALE); mc->mc_skipped = 1; mc->mc_speculative = 1; continue; } mc->mc_load = vdev_mirror_load(mm, mc->mc_vd, mc->mc_offset); if (mc->mc_load > lowest_load) continue; if (mc->mc_load < lowest_load) { lowest_load = mc->mc_load; mm->mm_preferred_cnt = 0; } mm->mm_preferred[mm->mm_preferred_cnt] = c; mm->mm_preferred_cnt++; } if (mm->mm_preferred_cnt == 1) { vdev_queue_register_lastoffset( mm->mm_child[mm->mm_preferred[0]].mc_vd, zio); return (mm->mm_preferred[0]); } if (mm->mm_preferred_cnt > 1) { int c = vdev_mirror_preferred_child_randomize(zio); vdev_queue_register_lastoffset(mm->mm_child[c].mc_vd, zio); return (c); } /* * Every device is either missing or has this txg in its DTL. * Look for any child we haven't already tried before giving up. */ for (c = 0; c < mm->mm_children; c++) { if (!mm->mm_child[c].mc_tried) { vdev_queue_register_lastoffset(mm->mm_child[c].mc_vd, zio); return (c); } } /* * Every child failed. There's no place left to look. */ return (-1); } static void vdev_mirror_io_start(zio_t *zio) { mirror_map_t *mm; mirror_child_t *mc; int c, children; mm = vdev_mirror_map_init(zio); if (zio->io_type == ZIO_TYPE_READ) { - if ((zio->io_flags & ZIO_FLAG_SCRUB) && !mm->mm_replacing) { + if ((zio->io_flags & ZIO_FLAG_SCRUB) && !mm->mm_replacing && + mm->mm_children > 1) { /* * For scrubbing reads we need to allocate a read * buffer for each child and issue reads to all * children. If any child succeeds, it will copy its * data into zio->io_data in vdev_mirror_scrub_done. 
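* Note that this path is only taken when the map has more than one child; with a single child there is no alternate copy to read, so a normal read is issued instead.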
*/ for (c = 0; c < mm->mm_children; c++) { mc = &mm->mm_child[c]; zio_nowait(zio_vdev_child_io(zio, zio->io_bp, mc->mc_vd, mc->mc_offset, zio_buf_alloc(zio->io_size), zio->io_size, zio->io_type, zio->io_priority, 0, vdev_mirror_scrub_done, mc)); } zio_execute(zio); return; } /* * For normal reads just pick one child. */ c = vdev_mirror_child_select(zio); children = (c >= 0); } else { ASSERT(zio->io_type == ZIO_TYPE_WRITE || zio->io_type == ZIO_TYPE_FREE); /* * Writes and frees go to all children. */ c = 0; children = mm->mm_children; } while (children--) { mc = &mm->mm_child[c]; zio_nowait(zio_vdev_child_io(zio, zio->io_bp, mc->mc_vd, mc->mc_offset, zio->io_data, zio->io_size, zio->io_type, zio->io_priority, 0, vdev_mirror_child_done, mc)); c++; } zio_execute(zio); } static int vdev_mirror_worst_error(mirror_map_t *mm) { int error[2] = { 0, 0 }; for (int c = 0; c < mm->mm_children; c++) { mirror_child_t *mc = &mm->mm_child[c]; int s = mc->mc_speculative; error[s] = zio_worst_error(error[s], mc->mc_error); } return (error[0] ? error[0] : error[1]); } static void vdev_mirror_io_done(zio_t *zio) { mirror_map_t *mm = zio->io_vsd; mirror_child_t *mc; int c; int good_copies = 0; int unexpected_errors = 0; for (c = 0; c < mm->mm_children; c++) { mc = &mm->mm_child[c]; if (mc->mc_error) { if (!mc->mc_skipped) unexpected_errors++; } else if (mc->mc_tried) { good_copies++; } } if (zio->io_type == ZIO_TYPE_WRITE) { /* * XXX -- for now, treat partial writes as success. * * Now that we support write reallocation, it would be better * to treat partial failure as real failure unless there are * no non-degraded top-level vdevs left, and not update DTLs * if we intend to reallocate. */ /* XXPOLICY */ if (good_copies != mm->mm_children) { /* * Always require at least one good copy. * * For ditto blocks (io_vd == NULL), require * all copies to be good. * * XXX -- for replacing vdevs, there's no great answer. * If the old device is really dead, we may not even * be able to access it -- so we only want to * require good writes to the new device. But if * the new device turns out to be flaky, we want * to be able to detach it -- which requires all * writes to the old device to have succeeded. */ if (good_copies == 0 || zio->io_vd == NULL) zio->io_error = vdev_mirror_worst_error(mm); } return; } else if (zio->io_type == ZIO_TYPE_FREE) { return; } ASSERT(zio->io_type == ZIO_TYPE_READ); /* * If we don't have a good copy yet, keep trying other children. */ /* XXPOLICY */ if (good_copies == 0 && (c = vdev_mirror_child_select(zio)) != -1) { ASSERT(c >= 0 && c < mm->mm_children); mc = &mm->mm_child[c]; zio_vdev_io_redone(zio); zio_nowait(zio_vdev_child_io(zio, zio->io_bp, mc->mc_vd, mc->mc_offset, zio->io_data, zio->io_size, ZIO_TYPE_READ, zio->io_priority, 0, vdev_mirror_child_done, mc)); return; } /* XXPOLICY */ if (good_copies == 0) { zio->io_error = vdev_mirror_worst_error(mm); ASSERT(zio->io_error != 0); } if (good_copies && spa_writeable(zio->io_spa) && (unexpected_errors || (zio->io_flags & ZIO_FLAG_RESILVER) || ((zio->io_flags & ZIO_FLAG_SCRUB) && mm->mm_replacing))) { /* * Use the good data we have in hand to repair damaged children. */ for (c = 0; c < mm->mm_children; c++) { /* * Don't rewrite known good children. * Not only is it unnecessary, it could * actually be harmful: if the system lost * power while rewriting the only good copy, * there would be no good copies left! 
*/ mc = &mm->mm_child[c]; if (mc->mc_error == 0) { if (mc->mc_tried) continue; if (!(zio->io_flags & ZIO_FLAG_SCRUB) && !vdev_dtl_contains(mc->mc_vd, DTL_PARTIAL, zio->io_txg, 1)) continue; mc->mc_error = SET_ERROR(ESTALE); } zio_nowait(zio_vdev_child_io(zio, zio->io_bp, mc->mc_vd, mc->mc_offset, zio->io_data, zio->io_size, ZIO_TYPE_WRITE, ZIO_PRIORITY_ASYNC_WRITE, ZIO_FLAG_IO_REPAIR | (unexpected_errors ? ZIO_FLAG_SELF_HEAL : 0), NULL, NULL)); } } } static void vdev_mirror_state_change(vdev_t *vd, int faulted, int degraded) { if (faulted == vd->vdev_children) vdev_set_state(vd, B_FALSE, VDEV_STATE_CANT_OPEN, VDEV_AUX_NO_REPLICAS); else if (degraded + faulted != 0) vdev_set_state(vd, B_FALSE, VDEV_STATE_DEGRADED, VDEV_AUX_NONE); else vdev_set_state(vd, B_FALSE, VDEV_STATE_HEALTHY, VDEV_AUX_NONE); } vdev_ops_t vdev_mirror_ops = { vdev_mirror_open, vdev_mirror_close, vdev_default_asize, vdev_mirror_io_start, vdev_mirror_io_done, vdev_mirror_state_change, NULL, NULL, VDEV_TYPE_MIRROR, /* name of this vdev type */ B_FALSE /* not a leaf vdev */ }; vdev_ops_t vdev_replacing_ops = { vdev_mirror_open, vdev_mirror_close, vdev_default_asize, vdev_mirror_io_start, vdev_mirror_io_done, vdev_mirror_state_change, NULL, NULL, VDEV_TYPE_REPLACING, /* name of this vdev type */ B_FALSE /* not a leaf vdev */ }; vdev_ops_t vdev_spare_ops = { vdev_mirror_open, vdev_mirror_close, vdev_default_asize, vdev_mirror_io_start, vdev_mirror_io_done, vdev_mirror_state_change, NULL, NULL, VDEV_TYPE_SPARE, /* name of this vdev type */ B_FALSE /* not a leaf vdev */ }; Index: projects/ifnet/sys/cddl/contrib/opensolaris =================================================================== --- projects/ifnet/sys/cddl/contrib/opensolaris (revision 277106) +++ projects/ifnet/sys/cddl/contrib/opensolaris (revision 277107) Property changes on: projects/ifnet/sys/cddl/contrib/opensolaris ___________________________________________________________________ Modified: svn:mergeinfo ## -0,0 +0,1 ## Merged /head/sys/cddl/contrib/opensolaris:r277061-277106 Index: projects/ifnet/sys/dev/ahci/ahci.h =================================================================== --- projects/ifnet/sys/dev/ahci/ahci.h (revision 277106) +++ projects/ifnet/sys/dev/ahci/ahci.h (revision 277107) @@ -1,612 +1,613 @@ /*- * Copyright (c) 1998 - 2008 Søren Schmidt * Copyright (c) 2009-2012 Alexander Motin * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer, * without modification, immediately at the beginning of the file. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. 
* IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. * * $FreeBSD$ */ /* ATA register defines */ #define ATA_DATA 0 /* (RW) data */ #define ATA_FEATURE 1 /* (W) feature */ #define ATA_F_DMA 0x01 /* enable DMA */ #define ATA_F_OVL 0x02 /* enable overlap */ #define ATA_COUNT 2 /* (W) sector count */ #define ATA_SECTOR 3 /* (RW) sector # */ #define ATA_CYL_LSB 4 /* (RW) cylinder# LSB */ #define ATA_CYL_MSB 5 /* (RW) cylinder# MSB */ #define ATA_DRIVE 6 /* (W) Sector/Drive/Head */ #define ATA_D_LBA 0x40 /* use LBA addressing */ #define ATA_D_IBM 0xa0 /* 512 byte sectors, ECC */ #define ATA_COMMAND 7 /* (W) command */ #define ATA_ERROR 8 /* (R) error */ #define ATA_E_ILI 0x01 /* illegal length */ #define ATA_E_NM 0x02 /* no media */ #define ATA_E_ABORT 0x04 /* command aborted */ #define ATA_E_MCR 0x08 /* media change request */ #define ATA_E_IDNF 0x10 /* ID not found */ #define ATA_E_MC 0x20 /* media changed */ #define ATA_E_UNC 0x40 /* uncorrectable data */ #define ATA_E_ICRC 0x80 /* UDMA crc error */ #define ATA_E_ATAPI_SENSE_MASK 0xf0 /* ATAPI sense key mask */ #define ATA_IREASON 9 /* (R) interrupt reason */ #define ATA_I_CMD 0x01 /* cmd (1) | data (0) */ #define ATA_I_IN 0x02 /* read (1) | write (0) */ #define ATA_I_RELEASE 0x04 /* released bus (1) */ #define ATA_I_TAGMASK 0xf8 /* tag mask */ #define ATA_STATUS 10 /* (R) status */ #define ATA_ALTSTAT 11 /* (R) alternate status */ #define ATA_S_ERROR 0x01 /* error */ #define ATA_S_INDEX 0x02 /* index */ #define ATA_S_CORR 0x04 /* data corrected */ #define ATA_S_DRQ 0x08 /* data request */ #define ATA_S_DSC 0x10 /* drive seek completed */ #define ATA_S_SERVICE 0x10 /* drive needs service */ #define ATA_S_DWF 0x20 /* drive write fault */ #define ATA_S_DMA 0x20 /* DMA ready */ #define ATA_S_READY 0x40 /* drive ready */ #define ATA_S_BUSY 0x80 /* busy */ #define ATA_CONTROL 12 /* (W) control */ #define ATA_A_IDS 0x02 /* disable interrupts */ #define ATA_A_RESET 0x04 /* RESET controller */ #define ATA_A_4BIT 0x08 /* 4 head bits */ #define ATA_A_HOB 0x80 /* High Order Byte enable */ /* SATA register defines */ #define ATA_SSTATUS 13 #define ATA_SS_DET_MASK 0x0000000f #define ATA_SS_DET_NO_DEVICE 0x00000000 #define ATA_SS_DET_DEV_PRESENT 0x00000001 #define ATA_SS_DET_PHY_ONLINE 0x00000003 #define ATA_SS_DET_PHY_OFFLINE 0x00000004 #define ATA_SS_SPD_MASK 0x000000f0 #define ATA_SS_SPD_NO_SPEED 0x00000000 #define ATA_SS_SPD_GEN1 0x00000010 #define ATA_SS_SPD_GEN2 0x00000020 #define ATA_SS_SPD_GEN3 0x00000040 #define ATA_SS_IPM_MASK 0x00000f00 #define ATA_SS_IPM_NO_DEVICE 0x00000000 #define ATA_SS_IPM_ACTIVE 0x00000100 #define ATA_SS_IPM_PARTIAL 0x00000200 #define ATA_SS_IPM_SLUMBER 0x00000600 #define ATA_SS_IPM_DEVSLEEP 0x00000800 #define ATA_SERROR 14 #define ATA_SE_DATA_CORRECTED 0x00000001 #define ATA_SE_COMM_CORRECTED 0x00000002 #define ATA_SE_DATA_ERR 0x00000100 #define ATA_SE_COMM_ERR 0x00000200 #define ATA_SE_PROT_ERR 0x00000400 #define ATA_SE_HOST_ERR 0x00000800 #define ATA_SE_PHY_CHANGED 0x00010000 #define ATA_SE_PHY_IERROR 0x00020000 #define ATA_SE_COMM_WAKE 0x00040000 #define ATA_SE_DECODE_ERR 
0x00080000 #define ATA_SE_PARITY_ERR 0x00100000 #define ATA_SE_CRC_ERR 0x00200000 #define ATA_SE_HANDSHAKE_ERR 0x00400000 #define ATA_SE_LINKSEQ_ERR 0x00800000 #define ATA_SE_TRANSPORT_ERR 0x01000000 #define ATA_SE_UNKNOWN_FIS 0x02000000 #define ATA_SE_EXCHANGED 0x04000000 #define ATA_SCONTROL 15 #define ATA_SC_DET_MASK 0x0000000f #define ATA_SC_DET_IDLE 0x00000000 #define ATA_SC_DET_RESET 0x00000001 #define ATA_SC_DET_DISABLE 0x00000004 #define ATA_SC_SPD_MASK 0x000000f0 #define ATA_SC_SPD_NO_SPEED 0x00000000 #define ATA_SC_SPD_SPEED_GEN1 0x00000010 #define ATA_SC_SPD_SPEED_GEN2 0x00000020 #define ATA_SC_SPD_SPEED_GEN3 0x00000040 #define ATA_SC_IPM_MASK 0x00000f00 #define ATA_SC_IPM_NONE 0x00000000 #define ATA_SC_IPM_DIS_PARTIAL 0x00000100 #define ATA_SC_IPM_DIS_SLUMBER 0x00000200 #define ATA_SC_IPM_DIS_DEVSLEEP 0x00000400 #define ATA_SACTIVE 16 #define AHCI_MAX_PORTS 32 #define AHCI_MAX_SLOTS 32 #define AHCI_MAX_IRQS 16 /* SATA AHCI v1.0 register defines */ #define AHCI_CAP 0x00 #define AHCI_CAP_NPMASK 0x0000001f #define AHCI_CAP_SXS 0x00000020 #define AHCI_CAP_EMS 0x00000040 #define AHCI_CAP_CCCS 0x00000080 #define AHCI_CAP_NCS 0x00001F00 #define AHCI_CAP_NCS_SHIFT 8 #define AHCI_CAP_PSC 0x00002000 #define AHCI_CAP_SSC 0x00004000 #define AHCI_CAP_PMD 0x00008000 #define AHCI_CAP_FBSS 0x00010000 #define AHCI_CAP_SPM 0x00020000 #define AHCI_CAP_SAM 0x00080000 #define AHCI_CAP_ISS 0x00F00000 #define AHCI_CAP_ISS_SHIFT 20 #define AHCI_CAP_SCLO 0x01000000 #define AHCI_CAP_SAL 0x02000000 #define AHCI_CAP_SALP 0x04000000 #define AHCI_CAP_SSS 0x08000000 #define AHCI_CAP_SMPS 0x10000000 #define AHCI_CAP_SSNTF 0x20000000 #define AHCI_CAP_SNCQ 0x40000000 #define AHCI_CAP_64BIT 0x80000000 #define AHCI_GHC 0x04 #define AHCI_GHC_AE 0x80000000 #define AHCI_GHC_MRSM 0x00000004 #define AHCI_GHC_IE 0x00000002 #define AHCI_GHC_HR 0x00000001 #define AHCI_IS 0x08 #define AHCI_PI 0x0c #define AHCI_VS 0x10 #define AHCI_CCCC 0x14 #define AHCI_CCCC_TV_MASK 0xffff0000 #define AHCI_CCCC_TV_SHIFT 16 #define AHCI_CCCC_CC_MASK 0x0000ff00 #define AHCI_CCCC_CC_SHIFT 8 #define AHCI_CCCC_INT_MASK 0x000000f8 #define AHCI_CCCC_INT_SHIFT 3 #define AHCI_CCCC_EN 0x00000001 #define AHCI_CCCP 0x18 #define AHCI_EM_LOC 0x1C #define AHCI_EM_CTL 0x20 #define AHCI_EM_MR 0x00000001 #define AHCI_EM_TM 0x00000100 #define AHCI_EM_RST 0x00000200 #define AHCI_EM_LED 0x00010000 #define AHCI_EM_SAFTE 0x00020000 #define AHCI_EM_SES2 0x00040000 #define AHCI_EM_SGPIO 0x00080000 #define AHCI_EM_SMB 0x01000000 #define AHCI_EM_XMT 0x02000000 #define AHCI_EM_ALHD 0x04000000 #define AHCI_EM_PM 0x08000000 #define AHCI_CAP2 0x24 #define AHCI_CAP2_BOH 0x00000001 #define AHCI_CAP2_NVMP 0x00000002 #define AHCI_CAP2_APST 0x00000004 #define AHCI_CAP2_SDS 0x00000008 #define AHCI_CAP2_SADM 0x00000010 #define AHCI_CAP2_DESO 0x00000020 #define AHCI_OFFSET 0x100 #define AHCI_STEP 0x80 #define AHCI_P_CLB 0x00 #define AHCI_P_CLBU 0x04 #define AHCI_P_FB 0x08 #define AHCI_P_FBU 0x0c #define AHCI_P_IS 0x10 #define AHCI_P_IE 0x14 #define AHCI_P_IX_DHR 0x00000001 #define AHCI_P_IX_PS 0x00000002 #define AHCI_P_IX_DS 0x00000004 #define AHCI_P_IX_SDB 0x00000008 #define AHCI_P_IX_UF 0x00000010 #define AHCI_P_IX_DP 0x00000020 #define AHCI_P_IX_PC 0x00000040 #define AHCI_P_IX_MP 0x00000080 #define AHCI_P_IX_PRC 0x00400000 #define AHCI_P_IX_IPM 0x00800000 #define AHCI_P_IX_OF 0x01000000 #define AHCI_P_IX_INF 0x04000000 #define AHCI_P_IX_IF 0x08000000 #define AHCI_P_IX_HBD 0x10000000 #define AHCI_P_IX_HBF 0x20000000 #define AHCI_P_IX_TFE 0x40000000 #define AHCI_P_IX_CPD 
0x80000000 #define AHCI_P_CMD 0x18 #define AHCI_P_CMD_ST 0x00000001 #define AHCI_P_CMD_SUD 0x00000002 #define AHCI_P_CMD_POD 0x00000004 #define AHCI_P_CMD_CLO 0x00000008 #define AHCI_P_CMD_FRE 0x00000010 #define AHCI_P_CMD_CCS_MASK 0x00001f00 #define AHCI_P_CMD_CCS_SHIFT 8 #define AHCI_P_CMD_ISS 0x00002000 #define AHCI_P_CMD_FR 0x00004000 #define AHCI_P_CMD_CR 0x00008000 #define AHCI_P_CMD_CPS 0x00010000 #define AHCI_P_CMD_PMA 0x00020000 #define AHCI_P_CMD_HPCP 0x00040000 #define AHCI_P_CMD_MPSP 0x00080000 #define AHCI_P_CMD_CPD 0x00100000 #define AHCI_P_CMD_ESP 0x00200000 #define AHCI_P_CMD_FBSCP 0x00400000 #define AHCI_P_CMD_APSTE 0x00800000 #define AHCI_P_CMD_ATAPI 0x01000000 #define AHCI_P_CMD_DLAE 0x02000000 #define AHCI_P_CMD_ALPE 0x04000000 #define AHCI_P_CMD_ASP 0x08000000 #define AHCI_P_CMD_ICC_MASK 0xf0000000 #define AHCI_P_CMD_NOOP 0x00000000 #define AHCI_P_CMD_ACTIVE 0x10000000 #define AHCI_P_CMD_PARTIAL 0x20000000 #define AHCI_P_CMD_SLUMBER 0x60000000 #define AHCI_P_CMD_DEVSLEEP 0x80000000 #define AHCI_P_TFD 0x20 #define AHCI_P_SIG 0x24 #define AHCI_P_SSTS 0x28 #define AHCI_P_SCTL 0x2c #define AHCI_P_SERR 0x30 #define AHCI_P_SACT 0x34 #define AHCI_P_CI 0x38 #define AHCI_P_SNTF 0x3C #define AHCI_P_FBS 0x40 #define AHCI_P_FBS_EN 0x00000001 #define AHCI_P_FBS_DEC 0x00000002 #define AHCI_P_FBS_SDE 0x00000004 #define AHCI_P_FBS_DEV 0x00000f00 #define AHCI_P_FBS_DEV_SHIFT 8 #define AHCI_P_FBS_ADO 0x0000f000 #define AHCI_P_FBS_ADO_SHIFT 12 #define AHCI_P_FBS_DWE 0x000f0000 #define AHCI_P_FBS_DWE_SHIFT 16 #define AHCI_P_DEVSLP 0x44 #define AHCI_P_DEVSLP_ADSE 0x00000001 #define AHCI_P_DEVSLP_DSP 0x00000002 #define AHCI_P_DEVSLP_DETO 0x000003fc #define AHCI_P_DEVSLP_DETO_SHIFT 2 #define AHCI_P_DEVSLP_MDAT 0x00007c00 #define AHCI_P_DEVSLP_MDAT_SHIFT 10 #define AHCI_P_DEVSLP_DITO 0x01ff8000 #define AHCI_P_DEVSLP_DITO_SHIFT 15 #define AHCI_P_DEVSLP_DM 0x0e000000 #define AHCI_P_DEVSLP_DM_SHIFT 25 /* Just to be sure, if building as module. */ #if MAXPHYS < 512 * 1024 #undef MAXPHYS #define MAXPHYS 512 * 1024 #endif /* Pessimistic prognosis on number of required S/G entries */ #define AHCI_SG_ENTRIES (roundup(btoc(MAXPHYS) + 1, 8)) /* Command list. 32 commands. First, 1Kbyte aligned. */ #define AHCI_CL_OFFSET 0 #define AHCI_CL_SIZE 32 /* Command tables. Up to 32 commands, Each, 128byte aligned. */ #define AHCI_CT_OFFSET (AHCI_CL_OFFSET + AHCI_CL_SIZE * AHCI_MAX_SLOTS) #define AHCI_CT_SIZE (128 + AHCI_SG_ENTRIES * 16) /* Total main work area. 
*/ #define AHCI_WORK_SIZE (AHCI_CT_OFFSET + AHCI_CT_SIZE * ch->numslots) struct ahci_dma_prd { u_int64_t dba; u_int32_t reserved; u_int32_t dbc; /* 0 based */ #define AHCI_PRD_MASK 0x003fffff /* max 4MB */ #define AHCI_PRD_MAX (AHCI_PRD_MASK + 1) #define AHCI_PRD_IPC (1U << 31) } __packed; struct ahci_cmd_tab { u_int8_t cfis[64]; u_int8_t acmd[32]; u_int8_t reserved[32]; struct ahci_dma_prd prd_tab[AHCI_SG_ENTRIES]; } __packed; struct ahci_cmd_list { u_int16_t cmd_flags; #define AHCI_CMD_ATAPI 0x0020 #define AHCI_CMD_WRITE 0x0040 #define AHCI_CMD_PREFETCH 0x0080 #define AHCI_CMD_RESET 0x0100 #define AHCI_CMD_BIST 0x0200 #define AHCI_CMD_CLR_BUSY 0x0400 u_int16_t prd_length; /* PRD entries */ u_int32_t bytecount; u_int64_t cmd_table_phys; /* 128byte aligned */ } __packed; /* misc defines */ #define ATA_IRQ_RID 0 #define ATA_INTR_FLAGS (INTR_MPSAFE|INTR_TYPE_BIO|INTR_ENTROPY) struct ata_dmaslot { bus_dmamap_t data_map; /* data DMA map */ int nsegs; /* Number of segs loaded */ }; /* structure holding DMA related information */ struct ata_dma { bus_dma_tag_t work_tag; /* workspace DMA tag */ bus_dmamap_t work_map; /* workspace DMA map */ uint8_t *work; /* workspace */ bus_addr_t work_bus; /* bus address of work */ bus_dma_tag_t rfis_tag; /* RFIS list DMA tag */ bus_dmamap_t rfis_map; /* RFIS list DMA map */ uint8_t *rfis; /* FIS receive area */ bus_addr_t rfis_bus; /* bus address of rfis */ bus_dma_tag_t data_tag; /* data DMA tag */ }; enum ahci_slot_states { AHCI_SLOT_EMPTY, AHCI_SLOT_LOADING, AHCI_SLOT_RUNNING, AHCI_SLOT_EXECUTING }; struct ahci_slot { struct ahci_channel *ch; /* Channel */ u_int8_t slot; /* Number of this slot */ enum ahci_slot_states state; /* Slot state */ union ccb *ccb; /* CCB occupying slot */ struct ata_dmaslot dma; /* DMA data of this slot */ struct callout timeout; /* Execution timeout */ }; struct ahci_device { int revision; int mode; u_int bytecount; u_int atapi; u_int tags; u_int caps; }; struct ahci_led { device_t dev; /* Device handle */ struct cdev *led; uint8_t num; /* Number of this led */ uint8_t state; /* State of this led */ }; #define AHCI_NUM_LEDS 3 /* structure describing an ATA channel */ struct ahci_channel { device_t dev; /* Device handle */ int unit; /* Physical channel */ struct resource *r_mem; /* Memory of this channel */ struct resource *r_irq; /* Interrupt of this channel */ void *ih; /* Interrupt handle */ struct ata_dma dma; /* DMA data */ struct cam_sim *sim; struct cam_path *path; uint32_t caps; /* Controller capabilities */ uint32_t caps2; /* Controller capabilities */ uint32_t chcaps; /* Channel capabilities */ uint32_t chscaps; /* Channel sleep capabilities */ uint16_t vendorid; /* Vendor ID from the bus */ uint16_t deviceid; /* Device ID from the bus */ uint16_t subvendorid; /* Subvendor ID from the bus */ uint16_t subdeviceid; /* Subdevice ID from the bus */ int quirks; int numslots; /* Number of present slots */ int pm_level; /* power management level */ int devices; /* What is present */ int pm_present; /* PM presence reported */ int fbs_enabled; /* FIS-based switching enabled */ union ccb *hold[AHCI_MAX_SLOTS]; struct ahci_slot slot[AHCI_MAX_SLOTS]; uint32_t oslots; /* Occupied slots */ uint32_t rslots; /* Running slots */ uint32_t aslots; /* Slots with atomic commands */ uint32_t eslots; /* Slots in error */ uint32_t toslots; /* Slots in timeout */ int lastslot; /* Last used slot */ int taggedtarget; /* Last tagged target */ int numrslots; /* Number of running slots */ int numrslotspd[16];/* Number of running slots per dev */ int 
numtslots; /* Number of tagged slots */ int numtslotspd[16];/* Number of tagged slots per dev */ int numhslots; /* Number of held slots */ int recoverycmd; /* Our READ LOG active */ int fatalerr; /* Fatal error happened */ int resetting; /* Hard-reset in progress. */ int resetpolldiv; /* Hard-reset poll divider. */ int listening; /* SUD bit is cleared. */ int wrongccs; /* CCS field in CMD was wrong */ union ccb *frozen; /* Frozen command */ struct callout pm_timer; /* Power management events */ struct callout reset_timer; /* Hard-reset timeout */ struct ahci_device user[16]; /* User-specified settings */ struct ahci_device curr[16]; /* Current settings */ struct mtx_padalign mtx; /* state lock */ STAILQ_HEAD(, ccb_hdr) doneq; /* queue of completed CCBs */ int batch; /* doneq is in use */ }; struct ahci_enclosure { device_t dev; /* Device handle */ struct resource *r_memc; /* Control register */ struct resource *r_memt; /* Transmit buffer */ struct resource *r_memr; /* Receive buffer */ struct cam_sim *sim; struct cam_path *path; struct mtx mtx; /* state lock */ struct ahci_led leds[AHCI_MAX_PORTS * 3]; uint32_t capsem; /* Controller capabilities */ uint8_t status[AHCI_MAX_PORTS][4]; /* ArrayDev statuses */ int quirks; int channels; int ichannels; }; /* structure describing an AHCI controller */ struct ahci_controller { device_t dev; bus_dma_tag_t dma_tag; int r_rid; uint16_t vendorid; /* Vendor ID from the bus */ uint16_t deviceid; /* Device ID from the bus */ uint16_t subvendorid; /* Subvendor ID from the bus */ uint16_t subdeviceid; /* Subdevice ID from the bus */ struct resource *r_mem; struct rman sc_iomem; struct ahci_controller_irq { struct ahci_controller *ctlr; struct resource *r_irq; void *handle; int r_irq_rid; int mode; #define AHCI_IRQ_MODE_ALL 0 #define AHCI_IRQ_MODE_AFTER 1 #define AHCI_IRQ_MODE_ONE 2 } irqs[AHCI_MAX_IRQS]; uint32_t caps; /* Controller capabilities */ uint32_t caps2; /* Controller capabilities */ uint32_t capsem; /* Controller capabilities */ uint32_t emloc; /* EM buffer location */ int quirks; int numirqs; int channels; int ichannels; int ccc; /* CCC timeout */ int cccv; /* CCC vector */ int direct; /* Direct command completion */ int msi; /* MSI interrupts */ struct { void (*function)(void *); void *argument; } interrupt[AHCI_MAX_PORTS]; }; enum ahci_err_type { AHCI_ERR_NONE, /* No error */ AHCI_ERR_INVALID, /* Error detected by us before submitting. */ AHCI_ERR_INNOCENT, /* Innocent victim. */ AHCI_ERR_TFE, /* Task File Error. */ AHCI_ERR_SATA, /* SATA error. */ AHCI_ERR_TIMEOUT, /* Command execution timeout. */ AHCI_ERR_NCQ, /* NCQ command error. CCB should be put on hold * until READ LOG executed to reveal error.
*/ }; /* macros to hide bus space ugliness */ #define ATA_INB(res, offset) \ bus_read_1((res), (offset)) #define ATA_INW(res, offset) \ bus_read_2((res), (offset)) #define ATA_INL(res, offset) \ bus_read_4((res), (offset)) #define ATA_INSW(res, offset, addr, count) \ bus_read_multi_2((res), (offset), (addr), (count)) #define ATA_INSW_STRM(res, offset, addr, count) \ bus_read_multi_stream_2((res), (offset), (addr), (count)) #define ATA_INSL(res, offset, addr, count) \ bus_read_multi_4((res), (offset), (addr), (count)) #define ATA_INSL_STRM(res, offset, addr, count) \ bus_read_multi_stream_4((res), (offset), (addr), (count)) #define ATA_OUTB(res, offset, value) \ bus_write_1((res), (offset), (value)) #define ATA_OUTW(res, offset, value) \ bus_write_2((res), (offset), (value)) #define ATA_OUTL(res, offset, value) \ bus_write_4((res), (offset), (value)) #define ATA_OUTSW(res, offset, addr, count) \ bus_write_multi_2((res), (offset), (addr), (count)) #define ATA_OUTSW_STRM(res, offset, addr, count) \ bus_write_multi_stream_2((res), (offset), (addr), (count)) #define ATA_OUTSL(res, offset, addr, count) \ bus_write_multi_4((res), (offset), (addr), (count)) #define ATA_OUTSL_STRM(res, offset, addr, count) \ bus_write_multi_stream_4((res), (offset), (addr), (count)) #define AHCI_Q_NOFORCE 1 #define AHCI_Q_NOPMP 2 #define AHCI_Q_NONCQ 4 #define AHCI_Q_1CH 8 #define AHCI_Q_2CH 0x10 #define AHCI_Q_4CH 0x20 #define AHCI_Q_EDGEIS 0x40 #define AHCI_Q_SATA2 0x80 #define AHCI_Q_NOBSYRES 0x100 #define AHCI_Q_NOAA 0x200 #define AHCI_Q_NOCOUNT 0x400 #define AHCI_Q_ALTSIG 0x800 #define AHCI_Q_NOMSI 0x1000 #define AHCI_Q_ATI_PMP_BUG 0x2000 #define AHCI_Q_MAXIO_64K 0x4000 #define AHCI_Q_SATA1_UNIT0 0x8000 /* need better method for this */ +#define AHCI_Q_ABAR0 0x10000 #define AHCI_Q_BIT_STRING \ "\020" \ "\001NOFORCE" \ "\002NOPMP" \ "\003NONCQ" \ "\0041CH" \ "\0052CH" \ "\0064CH" \ "\007EDGEIS" \ "\010SATA2" \ "\011NOBSYRES" \ "\012NOAA" \ "\013NOCOUNT" \ "\014ALTSIG" \ "\015NOMSI" \ "\016ATI_PMP_BUG" \ "\017MAXIO_64K" \ "\020SATA1_UNIT0" int ahci_attach(device_t dev); int ahci_detach(device_t dev); int ahci_setup_interrupt(device_t dev); int ahci_print_child(device_t dev, device_t child); struct resource *ahci_alloc_resource(device_t dev, device_t child, int type, int *rid, u_long start, u_long end, u_long count, u_int flags); int ahci_release_resource(device_t dev, device_t child, int type, int rid, struct resource *r); int ahci_setup_intr(device_t dev, device_t child, struct resource *irq, int flags, driver_filter_t *filter, driver_intr_t *function, void *argument, void **cookiep); int ahci_teardown_intr(device_t dev, device_t child, struct resource *irq, void *cookie); int ahci_child_location_str(device_t dev, device_t child, char *buf, size_t buflen); bus_dma_tag_t ahci_get_dma_tag(device_t dev, device_t child); int ahci_ctlr_reset(device_t dev); int ahci_ctlr_setup(device_t dev); Index: projects/ifnet/sys/dev/ahci/ahci_pci.c =================================================================== --- projects/ifnet/sys/dev/ahci/ahci_pci.c (revision 277106) +++ projects/ifnet/sys/dev/ahci/ahci_pci.c (revision 277107) @@ -1,509 +1,514 @@ /*- * Copyright (c) 2009-2012 Alexander Motin * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1.
Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer, * without modification, immediately at the beginning of the file. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR ``AS IS'' AND ANY EXPRESS OR * IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES * OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. * IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY DIRECT, INDIRECT, * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "ahci.h" static int force_ahci = 1; TUNABLE_INT("hw.ahci.force", &force_ahci); static const struct { uint32_t id; uint8_t rev; const char *name; int quirks; } ahci_ids[] = { {0x43801002, 0x00, "AMD SB600", AHCI_Q_NOMSI | AHCI_Q_ATI_PMP_BUG | AHCI_Q_MAXIO_64K}, {0x43901002, 0x00, "AMD SB7x0/SB8x0/SB9x0", AHCI_Q_ATI_PMP_BUG}, {0x43911002, 0x00, "AMD SB7x0/SB8x0/SB9x0", AHCI_Q_ATI_PMP_BUG}, {0x43921002, 0x00, "AMD SB7x0/SB8x0/SB9x0", AHCI_Q_ATI_PMP_BUG}, {0x43931002, 0x00, "AMD SB7x0/SB8x0/SB9x0", AHCI_Q_ATI_PMP_BUG}, {0x43941002, 0x00, "AMD SB7x0/SB8x0/SB9x0", AHCI_Q_ATI_PMP_BUG}, /* Not sure SB8x0/SB9x0 needs this quirk. 
Be conservative though */ {0x43951002, 0x00, "AMD SB8x0/SB9x0", AHCI_Q_ATI_PMP_BUG}, {0x78001022, 0x00, "AMD Hudson-2", 0}, {0x78011022, 0x00, "AMD Hudson-2", 0}, {0x78021022, 0x00, "AMD Hudson-2", 0}, {0x78031022, 0x00, "AMD Hudson-2", 0}, {0x78041022, 0x00, "AMD Hudson-2", 0}, {0x06111b21, 0x00, "ASMedia ASM2106", 0}, {0x06121b21, 0x00, "ASMedia ASM1061", 0}, {0x26528086, 0x00, "Intel ICH6", AHCI_Q_NOFORCE}, {0x26538086, 0x00, "Intel ICH6M", AHCI_Q_NOFORCE}, {0x26818086, 0x00, "Intel ESB2", 0}, {0x26828086, 0x00, "Intel ESB2", 0}, {0x26838086, 0x00, "Intel ESB2", 0}, {0x27c18086, 0x00, "Intel ICH7", 0}, {0x27c38086, 0x00, "Intel ICH7", 0}, {0x27c58086, 0x00, "Intel ICH7M", 0}, {0x27c68086, 0x00, "Intel ICH7M", 0}, {0x28218086, 0x00, "Intel ICH8", 0}, {0x28228086, 0x00, "Intel ICH8", 0}, {0x28248086, 0x00, "Intel ICH8", 0}, {0x28298086, 0x00, "Intel ICH8M", 0}, {0x282a8086, 0x00, "Intel ICH8M", 0}, {0x29228086, 0x00, "Intel ICH9", 0}, {0x29238086, 0x00, "Intel ICH9", 0}, {0x29248086, 0x00, "Intel ICH9", 0}, {0x29258086, 0x00, "Intel ICH9", 0}, {0x29278086, 0x00, "Intel ICH9", 0}, {0x29298086, 0x00, "Intel ICH9M", 0}, {0x292a8086, 0x00, "Intel ICH9M", 0}, {0x292b8086, 0x00, "Intel ICH9M", 0}, {0x292c8086, 0x00, "Intel ICH9M", 0}, {0x292f8086, 0x00, "Intel ICH9M", 0}, {0x294d8086, 0x00, "Intel ICH9", 0}, {0x294e8086, 0x00, "Intel ICH9M", 0}, {0x3a058086, 0x00, "Intel ICH10", 0}, {0x3a228086, 0x00, "Intel ICH10", 0}, {0x3a258086, 0x00, "Intel ICH10", 0}, {0x3b228086, 0x00, "Intel 5 Series/3400 Series", 0}, {0x3b238086, 0x00, "Intel 5 Series/3400 Series", 0}, {0x3b258086, 0x00, "Intel 5 Series/3400 Series", 0}, {0x3b298086, 0x00, "Intel 5 Series/3400 Series", 0}, {0x3b2c8086, 0x00, "Intel 5 Series/3400 Series", 0}, {0x3b2f8086, 0x00, "Intel 5 Series/3400 Series", 0}, {0x1c028086, 0x00, "Intel Cougar Point", 0}, {0x1c038086, 0x00, "Intel Cougar Point", 0}, {0x1c048086, 0x00, "Intel Cougar Point", 0}, {0x1c058086, 0x00, "Intel Cougar Point", 0}, {0x1d028086, 0x00, "Intel Patsburg", 0}, {0x1d048086, 0x00, "Intel Patsburg", 0}, {0x1d068086, 0x00, "Intel Patsburg", 0}, {0x28268086, 0x00, "Intel Patsburg (RAID)", 0}, {0x1e028086, 0x00, "Intel Panther Point", 0}, {0x1e038086, 0x00, "Intel Panther Point", 0}, {0x1e048086, 0x00, "Intel Panther Point (RAID)", 0}, {0x1e058086, 0x00, "Intel Panther Point (RAID)", 0}, {0x1e068086, 0x00, "Intel Panther Point (RAID)", 0}, {0x1e078086, 0x00, "Intel Panther Point (RAID)", 0}, {0x1e0e8086, 0x00, "Intel Panther Point (RAID)", 0}, {0x1e0f8086, 0x00, "Intel Panther Point (RAID)", 0}, {0x1f228086, 0x00, "Intel Avoton", 0}, {0x1f238086, 0x00, "Intel Avoton", 0}, {0x1f248086, 0x00, "Intel Avoton (RAID)", 0}, {0x1f258086, 0x00, "Intel Avoton (RAID)", 0}, {0x1f268086, 0x00, "Intel Avoton (RAID)", 0}, {0x1f278086, 0x00, "Intel Avoton (RAID)", 0}, {0x1f2e8086, 0x00, "Intel Avoton (RAID)", 0}, {0x1f2f8086, 0x00, "Intel Avoton (RAID)", 0}, {0x1f328086, 0x00, "Intel Avoton", 0}, {0x1f338086, 0x00, "Intel Avoton", 0}, {0x1f348086, 0x00, "Intel Avoton (RAID)", 0}, {0x1f358086, 0x00, "Intel Avoton (RAID)", 0}, {0x1f368086, 0x00, "Intel Avoton (RAID)", 0}, {0x1f378086, 0x00, "Intel Avoton (RAID)", 0}, {0x1f3e8086, 0x00, "Intel Avoton (RAID)", 0}, {0x1f3f8086, 0x00, "Intel Avoton (RAID)", 0}, {0x23a38086, 0x00, "Intel Coleto Creek", 0}, {0x28238086, 0x00, "Intel Wellsburg (RAID)", 0}, {0x28278086, 0x00, "Intel Wellsburg (RAID)", 0}, {0x8c028086, 0x00, "Intel Lynx Point", 0}, {0x8c038086, 0x00, "Intel Lynx Point", 0}, {0x8c048086, 0x00, "Intel Lynx Point (RAID)", 0}, 
{0x8c058086, 0x00, "Intel Lynx Point (RAID)", 0}, {0x8c068086, 0x00, "Intel Lynx Point (RAID)", 0}, {0x8c078086, 0x00, "Intel Lynx Point (RAID)", 0}, {0x8c0e8086, 0x00, "Intel Lynx Point (RAID)", 0}, {0x8c0f8086, 0x00, "Intel Lynx Point (RAID)", 0}, {0x8c828086, 0x00, "Intel Wildcat Point", 0}, {0x8c838086, 0x00, "Intel Wildcat Point", 0}, {0x8c848086, 0x00, "Intel Wildcat Point (RAID)", 0}, {0x8c858086, 0x00, "Intel Wildcat Point (RAID)", 0}, {0x8c868086, 0x00, "Intel Wildcat Point (RAID)", 0}, {0x8c878086, 0x00, "Intel Wildcat Point (RAID)", 0}, {0x8c8e8086, 0x00, "Intel Wildcat Point (RAID)", 0}, {0x8c8f8086, 0x00, "Intel Wildcat Point (RAID)", 0}, {0x8d028086, 0x00, "Intel Wellsburg", 0}, {0x8d048086, 0x00, "Intel Wellsburg (RAID)", 0}, {0x8d068086, 0x00, "Intel Wellsburg (RAID)", 0}, {0x8d628086, 0x00, "Intel Wellsburg", 0}, {0x8d648086, 0x00, "Intel Wellsburg (RAID)", 0}, {0x8d668086, 0x00, "Intel Wellsburg (RAID)", 0}, {0x8d6e8086, 0x00, "Intel Wellsburg (RAID)", 0}, {0x9c028086, 0x00, "Intel Lynx Point-LP", 0}, {0x9c038086, 0x00, "Intel Lynx Point-LP", 0}, {0x9c048086, 0x00, "Intel Lynx Point-LP (RAID)", 0}, {0x9c058086, 0x00, "Intel Lynx Point-LP (RAID)", 0}, {0x9c068086, 0x00, "Intel Lynx Point-LP (RAID)", 0}, {0x9c078086, 0x00, "Intel Lynx Point-LP (RAID)", 0}, {0x9c0e8086, 0x00, "Intel Lynx Point-LP (RAID)", 0}, {0x9c0f8086, 0x00, "Intel Lynx Point-LP (RAID)", 0}, {0x23238086, 0x00, "Intel DH89xxCC", 0}, {0x2360197b, 0x00, "JMicron JMB360", 0}, {0x2361197b, 0x00, "JMicron JMB361", AHCI_Q_NOFORCE}, {0x2362197b, 0x00, "JMicron JMB362", 0}, {0x2363197b, 0x00, "JMicron JMB363", AHCI_Q_NOFORCE}, {0x2365197b, 0x00, "JMicron JMB365", AHCI_Q_NOFORCE}, {0x2366197b, 0x00, "JMicron JMB366", AHCI_Q_NOFORCE}, {0x2368197b, 0x00, "JMicron JMB368", AHCI_Q_NOFORCE}, {0x611111ab, 0x00, "Marvell 88SE6111", AHCI_Q_NOFORCE | AHCI_Q_1CH | AHCI_Q_EDGEIS}, {0x612111ab, 0x00, "Marvell 88SE6121", AHCI_Q_NOFORCE | AHCI_Q_2CH | AHCI_Q_EDGEIS | AHCI_Q_NONCQ | AHCI_Q_NOCOUNT}, {0x614111ab, 0x00, "Marvell 88SE6141", AHCI_Q_NOFORCE | AHCI_Q_4CH | AHCI_Q_EDGEIS | AHCI_Q_NONCQ | AHCI_Q_NOCOUNT}, {0x614511ab, 0x00, "Marvell 88SE6145", AHCI_Q_NOFORCE | AHCI_Q_4CH | AHCI_Q_EDGEIS | AHCI_Q_NONCQ | AHCI_Q_NOCOUNT}, {0x91201b4b, 0x00, "Marvell 88SE912x", AHCI_Q_EDGEIS}, {0x91231b4b, 0x11, "Marvell 88SE912x", AHCI_Q_ALTSIG}, {0x91231b4b, 0x00, "Marvell 88SE912x", AHCI_Q_EDGEIS|AHCI_Q_SATA2}, {0x91251b4b, 0x00, "Marvell 88SE9125", 0}, {0x91281b4b, 0x00, "Marvell 88SE9128", AHCI_Q_ALTSIG}, {0x91301b4b, 0x00, "Marvell 88SE9130", AHCI_Q_ALTSIG}, {0x91721b4b, 0x00, "Marvell 88SE9172", 0}, {0x91821b4b, 0x00, "Marvell 88SE9182", 0}, {0x91831b4b, 0x00, "Marvell 88SS9183", 0}, {0x91a01b4b, 0x00, "Marvell 88SE91Ax", 0}, {0x92151b4b, 0x00, "Marvell 88SE9215", 0}, {0x92201b4b, 0x00, "Marvell 88SE9220", AHCI_Q_ALTSIG}, {0x92301b4b, 0x00, "Marvell 88SE9230", AHCI_Q_ALTSIG}, {0x92351b4b, 0x00, "Marvell 88SE9235", 0}, {0x06201103, 0x00, "HighPoint RocketRAID 620", 0}, {0x06201b4b, 0x00, "HighPoint RocketRAID 620", 0}, {0x06221103, 0x00, "HighPoint RocketRAID 622", 0}, {0x06221b4b, 0x00, "HighPoint RocketRAID 622", 0}, {0x06401103, 0x00, "HighPoint RocketRAID 640", 0}, {0x06401b4b, 0x00, "HighPoint RocketRAID 640", 0}, {0x06441103, 0x00, "HighPoint RocketRAID 644", 0}, {0x06441b4b, 0x00, "HighPoint RocketRAID 644", 0}, {0x06411103, 0x00, "HighPoint RocketRAID 640L", 0}, {0x06421103, 0x00, "HighPoint RocketRAID 642L", 0}, {0x06451103, 0x00, "HighPoint RocketRAID 644L", 0}, {0x044c10de, 0x00, "NVIDIA MCP65", AHCI_Q_NOAA}, {0x044d10de, 
0x00, "NVIDIA MCP65", AHCI_Q_NOAA}, {0x044e10de, 0x00, "NVIDIA MCP65", AHCI_Q_NOAA}, {0x044f10de, 0x00, "NVIDIA MCP65", AHCI_Q_NOAA}, {0x045c10de, 0x00, "NVIDIA MCP65", AHCI_Q_NOAA}, {0x045d10de, 0x00, "NVIDIA MCP65", AHCI_Q_NOAA}, {0x045e10de, 0x00, "NVIDIA MCP65", AHCI_Q_NOAA}, {0x045f10de, 0x00, "NVIDIA MCP65", AHCI_Q_NOAA}, {0x055010de, 0x00, "NVIDIA MCP67", AHCI_Q_NOAA}, {0x055110de, 0x00, "NVIDIA MCP67", AHCI_Q_NOAA}, {0x055210de, 0x00, "NVIDIA MCP67", AHCI_Q_NOAA}, {0x055310de, 0x00, "NVIDIA MCP67", AHCI_Q_NOAA}, {0x055410de, 0x00, "NVIDIA MCP67", AHCI_Q_NOAA}, {0x055510de, 0x00, "NVIDIA MCP67", AHCI_Q_NOAA}, {0x055610de, 0x00, "NVIDIA MCP67", AHCI_Q_NOAA}, {0x055710de, 0x00, "NVIDIA MCP67", AHCI_Q_NOAA}, {0x055810de, 0x00, "NVIDIA MCP67", AHCI_Q_NOAA}, {0x055910de, 0x00, "NVIDIA MCP67", AHCI_Q_NOAA}, {0x055A10de, 0x00, "NVIDIA MCP67", AHCI_Q_NOAA}, {0x055B10de, 0x00, "NVIDIA MCP67", AHCI_Q_NOAA}, {0x058410de, 0x00, "NVIDIA MCP67", AHCI_Q_NOAA}, {0x07f010de, 0x00, "NVIDIA MCP73", AHCI_Q_NOAA}, {0x07f110de, 0x00, "NVIDIA MCP73", AHCI_Q_NOAA}, {0x07f210de, 0x00, "NVIDIA MCP73", AHCI_Q_NOAA}, {0x07f310de, 0x00, "NVIDIA MCP73", AHCI_Q_NOAA}, {0x07f410de, 0x00, "NVIDIA MCP73", AHCI_Q_NOAA}, {0x07f510de, 0x00, "NVIDIA MCP73", AHCI_Q_NOAA}, {0x07f610de, 0x00, "NVIDIA MCP73", AHCI_Q_NOAA}, {0x07f710de, 0x00, "NVIDIA MCP73", AHCI_Q_NOAA}, {0x07f810de, 0x00, "NVIDIA MCP73", AHCI_Q_NOAA}, {0x07f910de, 0x00, "NVIDIA MCP73", AHCI_Q_NOAA}, {0x07fa10de, 0x00, "NVIDIA MCP73", AHCI_Q_NOAA}, {0x07fb10de, 0x00, "NVIDIA MCP73", AHCI_Q_NOAA}, {0x0ad010de, 0x00, "NVIDIA MCP77", AHCI_Q_NOAA}, {0x0ad110de, 0x00, "NVIDIA MCP77", AHCI_Q_NOAA}, {0x0ad210de, 0x00, "NVIDIA MCP77", AHCI_Q_NOAA}, {0x0ad310de, 0x00, "NVIDIA MCP77", AHCI_Q_NOAA}, {0x0ad410de, 0x00, "NVIDIA MCP77", AHCI_Q_NOAA}, {0x0ad510de, 0x00, "NVIDIA MCP77", AHCI_Q_NOAA}, {0x0ad610de, 0x00, "NVIDIA MCP77", AHCI_Q_NOAA}, {0x0ad710de, 0x00, "NVIDIA MCP77", AHCI_Q_NOAA}, {0x0ad810de, 0x00, "NVIDIA MCP77", AHCI_Q_NOAA}, {0x0ad910de, 0x00, "NVIDIA MCP77", AHCI_Q_NOAA}, {0x0ada10de, 0x00, "NVIDIA MCP77", AHCI_Q_NOAA}, {0x0adb10de, 0x00, "NVIDIA MCP77", AHCI_Q_NOAA}, {0x0ab410de, 0x00, "NVIDIA MCP79", AHCI_Q_NOAA}, {0x0ab510de, 0x00, "NVIDIA MCP79", AHCI_Q_NOAA}, {0x0ab610de, 0x00, "NVIDIA MCP79", AHCI_Q_NOAA}, {0x0ab710de, 0x00, "NVIDIA MCP79", AHCI_Q_NOAA}, {0x0ab810de, 0x00, "NVIDIA MCP79", AHCI_Q_NOAA}, {0x0ab910de, 0x00, "NVIDIA MCP79", AHCI_Q_NOAA}, {0x0aba10de, 0x00, "NVIDIA MCP79", AHCI_Q_NOAA}, {0x0abb10de, 0x00, "NVIDIA MCP79", AHCI_Q_NOAA}, {0x0abc10de, 0x00, "NVIDIA MCP79", AHCI_Q_NOAA}, {0x0abd10de, 0x00, "NVIDIA MCP79", AHCI_Q_NOAA}, {0x0abe10de, 0x00, "NVIDIA MCP79", AHCI_Q_NOAA}, {0x0abf10de, 0x00, "NVIDIA MCP79", AHCI_Q_NOAA}, {0x0d8410de, 0x00, "NVIDIA MCP89", AHCI_Q_NOAA}, {0x0d8510de, 0x00, "NVIDIA MCP89", AHCI_Q_NOFORCE|AHCI_Q_NOAA}, {0x0d8610de, 0x00, "NVIDIA MCP89", AHCI_Q_NOAA}, {0x0d8710de, 0x00, "NVIDIA MCP89", AHCI_Q_NOAA}, {0x0d8810de, 0x00, "NVIDIA MCP89", AHCI_Q_NOAA}, {0x0d8910de, 0x00, "NVIDIA MCP89", AHCI_Q_NOAA}, {0x0d8a10de, 0x00, "NVIDIA MCP89", AHCI_Q_NOAA}, {0x0d8b10de, 0x00, "NVIDIA MCP89", AHCI_Q_NOAA}, {0x0d8c10de, 0x00, "NVIDIA MCP89", AHCI_Q_NOAA}, {0x0d8d10de, 0x00, "NVIDIA MCP89", AHCI_Q_NOAA}, {0x0d8e10de, 0x00, "NVIDIA MCP89", AHCI_Q_NOAA}, {0x0d8f10de, 0x00, "NVIDIA MCP89", AHCI_Q_NOAA}, {0x3781105a, 0x00, "Promise TX8660", 0}, {0x33491106, 0x00, "VIA VT8251", AHCI_Q_NOPMP|AHCI_Q_NONCQ}, {0x62871106, 0x00, "VIA VT8251", AHCI_Q_NOPMP|AHCI_Q_NONCQ}, {0x11841039, 0x00, "SiS 966", 0}, {0x11851039, 0x00, "SiS 
968", 0}, {0x01861039, 0x00, "SiS 968", 0}, + {0xa01c177d, 0x00, "ThunderX SATA", AHCI_Q_ABAR0}, {0x00000000, 0x00, NULL, 0} }; static int ahci_pci_ctlr_reset(device_t dev) { if (pci_read_config(dev, PCIR_DEVVENDOR, 4) == 0x28298086 && (pci_read_config(dev, 0x92, 1) & 0xfe) == 0x04) pci_write_config(dev, 0x92, 0x01, 1); return ahci_ctlr_reset(dev); } static int ahci_probe(device_t dev) { char buf[64]; int i, valid = 0; uint32_t devid = pci_get_devid(dev); uint8_t revid = pci_get_revid(dev); /* * Ensure it is not a PCI bridge (some vendors use * the same PID and VID in PCI bridge and AHCI cards). */ if (pci_get_class(dev) == PCIC_BRIDGE) return (ENXIO); /* Is this a possible AHCI candidate? */ if (pci_get_class(dev) == PCIC_STORAGE && pci_get_subclass(dev) == PCIS_STORAGE_SATA && pci_get_progif(dev) == PCIP_STORAGE_SATA_AHCI_1_0) valid = 1; /* Is this a known AHCI chip? */ for (i = 0; ahci_ids[i].id != 0; i++) { if (ahci_ids[i].id == devid && ahci_ids[i].rev <= revid && (valid || (force_ahci == 1 && !(ahci_ids[i].quirks & AHCI_Q_NOFORCE)))) { /* Do not attach JMicrons with single PCI function. */ if (pci_get_vendor(dev) == 0x197b && (pci_read_config(dev, 0xdf, 1) & 0x40) == 0) return (ENXIO); snprintf(buf, sizeof(buf), "%s AHCI SATA controller", ahci_ids[i].name); device_set_desc_copy(dev, buf); return (BUS_PROBE_VENDOR); } } if (!valid) return (ENXIO); device_set_desc_copy(dev, "AHCI SATA controller"); return (BUS_PROBE_VENDOR); } static int ahci_ata_probe(device_t dev) { char buf[64]; int i; uint32_t devid = pci_get_devid(dev); uint8_t revid = pci_get_revid(dev); if ((intptr_t)device_get_ivars(dev) >= 0) return (ENXIO); /* Is this a known AHCI chip? */ for (i = 0; ahci_ids[i].id != 0; i++) { if (ahci_ids[i].id == devid && ahci_ids[i].rev <= revid) { snprintf(buf, sizeof(buf), "%s AHCI SATA controller", ahci_ids[i].name); device_set_desc_copy(dev, buf); return (BUS_PROBE_VENDOR); } } device_set_desc_copy(dev, "AHCI SATA controller"); return (BUS_PROBE_VENDOR); } static int ahci_pci_attach(device_t dev) { struct ahci_controller *ctlr = device_get_softc(dev); int error, i; uint32_t devid = pci_get_devid(dev); uint8_t revid = pci_get_revid(dev); i = 0; while (ahci_ids[i].id != 0 && (ahci_ids[i].id != devid || ahci_ids[i].rev > revid)) i++; ctlr->quirks = ahci_ids[i].quirks; /* Limit speed for my onboard JMicron external port. * It is not eSATA really, limit to SATA 1 */ if (pci_get_devid(dev) == 0x2363197b && pci_get_subvendor(dev) == 0x1043 && pci_get_subdevice(dev) == 0x81e4) ctlr->quirks |= AHCI_Q_SATA1_UNIT0; - /* if we have a memory BAR(5) we are likely on an AHCI part */ ctlr->vendorid = pci_get_vendor(dev); ctlr->deviceid = pci_get_device(dev); ctlr->subvendorid = pci_get_subvendor(dev); ctlr->subdeviceid = pci_get_subdevice(dev); - ctlr->r_rid = PCIR_BAR(5); + + /* Default AHCI Base Address is BAR(5), Cavium uses BAR(0) */ + if (ctlr->quirks & AHCI_Q_ABAR0) + ctlr->r_rid = PCIR_BAR(0); + else + ctlr->r_rid = PCIR_BAR(5); if (!(ctlr->r_mem = bus_alloc_resource_any(dev, SYS_RES_MEMORY, &ctlr->r_rid, RF_ACTIVE))) return ENXIO; pci_enable_busmaster(dev); /* Reset controller */ if ((error = ahci_pci_ctlr_reset(dev)) != 0) { bus_release_resource(dev, SYS_RES_MEMORY, ctlr->r_rid, ctlr->r_mem); return (error); }; /* Setup interrupts. */ /* Setup MSI register parameters */ ctlr->msi = 2; /* Process hints. 
*/ if (ctlr->quirks & AHCI_Q_NOMSI) ctlr->msi = 0; resource_int_value(device_get_name(dev), device_get_unit(dev), "msi", &ctlr->msi); ctlr->numirqs = 1; if (ctlr->msi < 0) ctlr->msi = 0; else if (ctlr->msi == 1) ctlr->msi = min(1, pci_msi_count(dev)); else if (ctlr->msi > 1) { ctlr->msi = 2; ctlr->numirqs = pci_msi_count(dev); } /* Allocate MSI if needed/present. */ if (ctlr->msi && pci_alloc_msi(dev, &ctlr->numirqs) != 0) { ctlr->msi = 0; ctlr->numirqs = 1; } error = ahci_attach(dev); if (error != 0) if (ctlr->msi) pci_release_msi(dev); return error; } static int ahci_pci_detach(device_t dev) { ahci_detach(dev); pci_release_msi(dev); return (0); } static int ahci_pci_suspend(device_t dev) { struct ahci_controller *ctlr = device_get_softc(dev); bus_generic_suspend(dev); /* Disable interrupts, so the state change(s) don't trigger */ ATA_OUTL(ctlr->r_mem, AHCI_GHC, ATA_INL(ctlr->r_mem, AHCI_GHC) & (~AHCI_GHC_IE)); return 0; } static int ahci_pci_resume(device_t dev) { int res; if ((res = ahci_pci_ctlr_reset(dev)) != 0) return (res); ahci_ctlr_setup(dev); return (bus_generic_resume(dev)); } devclass_t ahci_devclass; static device_method_t ahci_methods[] = { DEVMETHOD(device_probe, ahci_probe), DEVMETHOD(device_attach, ahci_pci_attach), DEVMETHOD(device_detach, ahci_pci_detach), DEVMETHOD(device_suspend, ahci_pci_suspend), DEVMETHOD(device_resume, ahci_pci_resume), DEVMETHOD(bus_print_child, ahci_print_child), DEVMETHOD(bus_alloc_resource, ahci_alloc_resource), DEVMETHOD(bus_release_resource, ahci_release_resource), DEVMETHOD(bus_setup_intr, ahci_setup_intr), DEVMETHOD(bus_teardown_intr,ahci_teardown_intr), DEVMETHOD(bus_child_location_str, ahci_child_location_str), DEVMETHOD(bus_get_dma_tag, ahci_get_dma_tag), DEVMETHOD_END }; static driver_t ahci_driver = { "ahci", ahci_methods, sizeof(struct ahci_controller) }; DRIVER_MODULE(ahci, pci, ahci_driver, ahci_devclass, NULL, NULL); static device_method_t ahci_ata_methods[] = { DEVMETHOD(device_probe, ahci_ata_probe), DEVMETHOD(device_attach, ahci_pci_attach), DEVMETHOD(device_detach, ahci_pci_detach), DEVMETHOD(device_suspend, ahci_pci_suspend), DEVMETHOD(device_resume, ahci_pci_resume), DEVMETHOD(bus_print_child, ahci_print_child), DEVMETHOD(bus_alloc_resource, ahci_alloc_resource), DEVMETHOD(bus_release_resource, ahci_release_resource), DEVMETHOD(bus_setup_intr, ahci_setup_intr), DEVMETHOD(bus_teardown_intr,ahci_teardown_intr), DEVMETHOD(bus_child_location_str, ahci_child_location_str), DEVMETHOD_END }; static driver_t ahci_ata_driver = { "ahci", ahci_ata_methods, sizeof(struct ahci_controller) }; DRIVER_MODULE(ahci, atapci, ahci_ata_driver, ahci_devclass, NULL, NULL); Index: projects/ifnet/sys/dev/cxgbe/iw_cxgbe/cm.c =================================================================== --- projects/ifnet/sys/dev/cxgbe/iw_cxgbe/cm.c (revision 277106) +++ projects/ifnet/sys/dev/cxgbe/iw_cxgbe/cm.c (revision 277107) @@ -1,2445 +1,2443 @@ /* * Copyright (c) 2009-2013 Chelsio, Inc. All rights reserved. * * This software is available to you under a choice of one of two * licenses.
You may choose to be licensed under the terms of the GNU * General Public License (GPL) Version 2, available from the file * COPYING in the main directory of this source tree, or the * OpenIB.org BSD license below: * * Redistribution and use in source and binary forms, with or * without modification, are permitted provided that the following * conditions are met: * * - Redistributions of source code must retain the above * copyright notice, this list of conditions and the following * disclaimer. * * - Redistributions in binary form must reproduce the above * copyright notice, this list of conditions and the following * disclaimer in the documentation and/or other materials * provided with the distribution. * * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, * EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND * NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS * BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN * ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN * CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE * SOFTWARE. */ #include __FBSDID("$FreeBSD$"); #include "opt_inet.h" #ifdef TCP_OFFLOAD #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include struct sge_iq; struct rss_header; #include #include "offload.h" #include "tom/t4_tom.h" #define TOEPCB(so) ((struct toepcb *)(so_sototcpcb((so))->t_toe)) #include "iw_cxgbe.h" #include #include #include #include #include #include static spinlock_t req_lock; static TAILQ_HEAD(c4iw_ep_list, c4iw_ep_common) req_list; static struct work_struct c4iw_task; static struct workqueue_struct *c4iw_taskq; static LIST_HEAD(timeout_list); static spinlock_t timeout_lock; static void process_req(struct work_struct *ctx); static void start_ep_timer(struct c4iw_ep *ep); static void stop_ep_timer(struct c4iw_ep *ep); static int set_tcpinfo(struct c4iw_ep *ep); static enum c4iw_ep_state state_read(struct c4iw_ep_common *epc); static void __state_set(struct c4iw_ep_common *epc, enum c4iw_ep_state tostate); static void state_set(struct c4iw_ep_common *epc, enum c4iw_ep_state tostate); static void *alloc_ep(int size, gfp_t flags); void __free_ep(struct c4iw_ep_common *epc); static struct rtentry * find_route(__be32 local_ip, __be32 peer_ip, __be16 local_port, __be16 peer_port, u8 tos); static int close_socket(struct c4iw_ep_common *epc, int close); static int shutdown_socket(struct c4iw_ep_common *epc); static void abort_socket(struct c4iw_ep *ep); static void send_mpa_req(struct c4iw_ep *ep); static int send_mpa_reject(struct c4iw_ep *ep, const void *pdata, u8 plen); static int send_mpa_reply(struct c4iw_ep *ep, const void *pdata, u8 plen); static void close_complete_upcall(struct c4iw_ep *ep, int status); static int abort_connection(struct c4iw_ep *ep); static void peer_close_upcall(struct c4iw_ep *ep); static void peer_abort_upcall(struct c4iw_ep *ep); static void connect_reply_upcall(struct c4iw_ep *ep, int status); static void connect_request_upcall(struct c4iw_ep *ep); static void established_upcall(struct c4iw_ep *ep); static void process_mpa_reply(struct c4iw_ep *ep); static void process_mpa_request(struct c4iw_ep *ep); static void process_peer_close(struct c4iw_ep *ep); static void process_conn_error(struct c4iw_ep *ep); static void process_close_complete(struct c4iw_ep *ep); static void ep_timeout(unsigned long 
arg); static void init_sock(struct c4iw_ep_common *epc); static void process_data(struct c4iw_ep *ep); static void process_connected(struct c4iw_ep *ep); static struct socket * dequeue_socket(struct socket *head, struct sockaddr_in **remote, struct c4iw_ep *child_ep); static void process_newconn(struct c4iw_ep *parent_ep); static int c4iw_so_upcall(struct socket *so, void *arg, int waitflag); static void process_socket_event(struct c4iw_ep *ep); static void release_ep_resources(struct c4iw_ep *ep); #define START_EP_TIMER(ep) \ do { \ CTR3(KTR_IW_CXGBE, "start_ep_timer (%s:%d) ep %p", \ __func__, __LINE__, (ep)); \ start_ep_timer(ep); \ } while (0) #define STOP_EP_TIMER(ep) \ do { \ CTR3(KTR_IW_CXGBE, "stop_ep_timer (%s:%d) ep %p", \ __func__, __LINE__, (ep)); \ stop_ep_timer(ep); \ } while (0) #ifdef KTR static char *states[] = { "idle", "listen", "connecting", "mpa_wait_req", "mpa_req_sent", "mpa_req_rcvd", "mpa_rep_sent", "fpdu_mode", "aborting", "closing", "moribund", "dead", NULL, }; #endif static void process_req(struct work_struct *ctx) { struct c4iw_ep_common *epc; spin_lock(&req_lock); while (!TAILQ_EMPTY(&req_list)) { epc = TAILQ_FIRST(&req_list); TAILQ_REMOVE(&req_list, epc, entry); epc->entry.tqe_prev = NULL; spin_unlock(&req_lock); if (epc->so) process_socket_event((struct c4iw_ep *)epc); c4iw_put_ep(epc); spin_lock(&req_lock); } spin_unlock(&req_lock); } /* * XXX: doesn't belong here in the iWARP driver. * XXX: assumes that the connection was offloaded by cxgbe/t4_tom if TF_TOE is * set. Is this a valid assumption for active open? */ static int set_tcpinfo(struct c4iw_ep *ep) { struct socket *so = ep->com.so; struct inpcb *inp = sotoinpcb(so); struct tcpcb *tp; struct toepcb *toep; int rc = 0; INP_WLOCK(inp); tp = intotcpcb(inp); if ((tp->t_flags & TF_TOE) == 0) { rc = EINVAL; log(LOG_ERR, "%s: connection not offloaded (so %p, ep %p)\n", __func__, so, ep); goto done; } toep = TOEPCB(so); ep->hwtid = toep->tid; ep->snd_seq = tp->snd_nxt; ep->rcv_seq = tp->rcv_nxt; ep->emss = max(tp->t_maxseg, 128); done: INP_WUNLOCK(inp); return (rc); } static struct rtentry * find_route(__be32 local_ip, __be32 peer_ip, __be16 local_port, __be16 peer_port, u8 tos) { struct route iproute; struct sockaddr_in *dst = (struct sockaddr_in *)&iproute.ro_dst; CTR5(KTR_IW_CXGBE, "%s:frtB %x, %x, %d, %d", __func__, local_ip, peer_ip, ntohs(local_port), ntohs(peer_port)); bzero(&iproute, sizeof iproute); dst->sin_family = AF_INET; dst->sin_len = sizeof *dst; dst->sin_addr.s_addr = peer_ip; rtalloc(&iproute); CTR2(KTR_IW_CXGBE, "%s:frtE %p", __func__, (uint64_t)iproute.ro_rt); return iproute.ro_rt; } static int close_socket(struct c4iw_ep_common *epc, int close) { struct socket *so = epc->so; int rc; CTR4(KTR_IW_CXGBE, "%s: so %p, ep %p, state %s", __func__, epc, so, states[epc->state]); SOCK_LOCK(so); soupcall_clear(so, SO_RCV); SOCK_UNLOCK(so); if (close) rc = soclose(so); else rc = soshutdown(so, SHUT_WR | SHUT_RD); epc->so = NULL; return (rc); } static int shutdown_socket(struct c4iw_ep_common *epc) { CTR4(KTR_IW_CXGBE, "%s: so %p, ep %p, state %s", __func__, epc->so, epc, states[epc->state]); return (soshutdown(epc->so, SHUT_WR)); } static void abort_socket(struct c4iw_ep *ep) { struct sockopt sopt; int rc; struct linger l; CTR4(KTR_IW_CXGBE, "%s ep %p so %p state %s", __func__, ep, ep->com.so, states[ep->com.state]); l.l_onoff = 1; l.l_linger = 0; /* linger_time of 0 forces RST to be sent */ sopt.sopt_dir = SOPT_SET; sopt.sopt_level = SOL_SOCKET; sopt.sopt_name = SO_LINGER; sopt.sopt_val = 
(caddr_t)&l; sopt.sopt_valsize = sizeof l; sopt.sopt_td = NULL; rc = sosetopt(ep->com.so, &sopt); if (rc) { log(LOG_ERR, "%s: can't set linger to 0, no RST! err %d\n", __func__, rc); } } static void process_peer_close(struct c4iw_ep *ep) { struct c4iw_qp_attributes attrs; int disconnect = 1; int release = 0; CTR4(KTR_IW_CXGBE, "%s:ppcB ep %p so %p state %s", __func__, ep, ep->com.so, states[ep->com.state]); mutex_lock(&ep->com.mutex); switch (ep->com.state) { case MPA_REQ_WAIT: CTR2(KTR_IW_CXGBE, "%s:ppc1 %p MPA_REQ_WAIT CLOSING", __func__, ep); __state_set(&ep->com, CLOSING); break; case MPA_REQ_SENT: CTR2(KTR_IW_CXGBE, "%s:ppc2 %p MPA_REQ_SENT CLOSING", __func__, ep); __state_set(&ep->com, DEAD); connect_reply_upcall(ep, -ECONNABORTED); disconnect = 0; STOP_EP_TIMER(ep); close_socket(&ep->com, 0); ep->com.cm_id->rem_ref(ep->com.cm_id); ep->com.cm_id = NULL; ep->com.qp = NULL; release = 1; break; case MPA_REQ_RCVD: /* * We're gonna mark this puppy DEAD, but keep * the reference on it until the ULP accepts or * rejects the CR. */ CTR2(KTR_IW_CXGBE, "%s:ppc3 %p MPA_REQ_RCVD CLOSING", __func__, ep); __state_set(&ep->com, CLOSING); c4iw_get_ep(&ep->com); break; case MPA_REP_SENT: CTR2(KTR_IW_CXGBE, "%s:ppc4 %p MPA_REP_SENT CLOSING", __func__, ep); __state_set(&ep->com, CLOSING); break; case FPDU_MODE: CTR2(KTR_IW_CXGBE, "%s:ppc5 %p FPDU_MODE CLOSING", __func__, ep); START_EP_TIMER(ep); __state_set(&ep->com, CLOSING); attrs.next_state = C4IW_QP_STATE_CLOSING; c4iw_modify_qp(ep->com.dev, ep->com.qp, C4IW_QP_ATTR_NEXT_STATE, &attrs, 1); peer_close_upcall(ep); break; case ABORTING: CTR2(KTR_IW_CXGBE, "%s:ppc6 %p ABORTING (disconn)", __func__, ep); disconnect = 0; break; case CLOSING: CTR2(KTR_IW_CXGBE, "%s:ppc7 %p CLOSING MORIBUND", __func__, ep); __state_set(&ep->com, MORIBUND); disconnect = 0; break; case MORIBUND: CTR2(KTR_IW_CXGBE, "%s:ppc8 %p MORIBUND DEAD", __func__, ep); STOP_EP_TIMER(ep); if (ep->com.cm_id && ep->com.qp) { attrs.next_state = C4IW_QP_STATE_IDLE; c4iw_modify_qp(ep->com.qp->rhp, ep->com.qp, C4IW_QP_ATTR_NEXT_STATE, &attrs, 1); } close_socket(&ep->com, 0); close_complete_upcall(ep, 0); __state_set(&ep->com, DEAD); release = 1; disconnect = 0; break; case DEAD: CTR2(KTR_IW_CXGBE, "%s:ppc9 %p DEAD (disconn)", __func__, ep); disconnect = 0; break; default: panic("%s: ep %p state %d", __func__, ep, ep->com.state); break; } mutex_unlock(&ep->com.mutex); if (disconnect) { CTR2(KTR_IW_CXGBE, "%s:ppca %p", __func__, ep); c4iw_ep_disconnect(ep, 0, M_NOWAIT); } if (release) { CTR2(KTR_IW_CXGBE, "%s:ppcb %p", __func__, ep); c4iw_put_ep(&ep->com); } CTR2(KTR_IW_CXGBE, "%s:ppcE %p", __func__, ep); return; } static void process_conn_error(struct c4iw_ep *ep) { struct c4iw_qp_attributes attrs; int ret; int state; state = state_read(&ep->com); CTR5(KTR_IW_CXGBE, "%s:pceB ep %p so %p so->so_error %u state %s", __func__, ep, ep->com.so, ep->com.so->so_error, states[ep->com.state]); switch (state) { case MPA_REQ_WAIT: STOP_EP_TIMER(ep); break; case MPA_REQ_SENT: STOP_EP_TIMER(ep); connect_reply_upcall(ep, -ECONNRESET); break; case MPA_REP_SENT: ep->com.rpl_err = ECONNRESET; CTR1(KTR_IW_CXGBE, "waking up ep %p", ep); break; case MPA_REQ_RCVD: /* * We're gonna mark this puppy DEAD, but keep * the reference on it until the ULP accepts or * rejects the CR. 
*/ c4iw_get_ep(&ep->com); break; case MORIBUND: case CLOSING: STOP_EP_TIMER(ep); /*FALLTHROUGH*/ case FPDU_MODE: if (ep->com.cm_id && ep->com.qp) { attrs.next_state = C4IW_QP_STATE_ERROR; ret = c4iw_modify_qp(ep->com.qp->rhp, ep->com.qp, C4IW_QP_ATTR_NEXT_STATE, &attrs, 1); if (ret) log(LOG_ERR, "%s - qp <- error failed!\n", __func__); } peer_abort_upcall(ep); break; case ABORTING: break; case DEAD: CTR2(KTR_IW_CXGBE, "%s so_error %d IN DEAD STATE!!!!", __func__, ep->com.so->so_error); return; default: panic("%s: ep %p state %d", __func__, ep, state); break; } if (state != ABORTING) { CTR2(KTR_IW_CXGBE, "%s:pce1 %p", __func__, ep); close_socket(&ep->com, 1); state_set(&ep->com, DEAD); c4iw_put_ep(&ep->com); } CTR2(KTR_IW_CXGBE, "%s:pceE %p", __func__, ep); return; } static void process_close_complete(struct c4iw_ep *ep) { struct c4iw_qp_attributes attrs; int release = 0; CTR4(KTR_IW_CXGBE, "%s:pccB ep %p so %p state %s", __func__, ep, ep->com.so, states[ep->com.state]); /* The cm_id may be null if we failed to connect */ mutex_lock(&ep->com.mutex); switch (ep->com.state) { case CLOSING: CTR2(KTR_IW_CXGBE, "%s:pcc1 %p CLOSING MORIBUND", __func__, ep); __state_set(&ep->com, MORIBUND); break; case MORIBUND: CTR2(KTR_IW_CXGBE, "%s:pcc1 %p MORIBUND DEAD", __func__, ep); STOP_EP_TIMER(ep); if ((ep->com.cm_id) && (ep->com.qp)) { CTR2(KTR_IW_CXGBE, "%s:pcc2 %p QP_STATE_IDLE", __func__, ep); attrs.next_state = C4IW_QP_STATE_IDLE; c4iw_modify_qp(ep->com.dev, ep->com.qp, C4IW_QP_ATTR_NEXT_STATE, &attrs, 1); } if (ep->parent_ep) { CTR2(KTR_IW_CXGBE, "%s:pcc3 %p", __func__, ep); close_socket(&ep->com, 1); } else { CTR2(KTR_IW_CXGBE, "%s:pcc4 %p", __func__, ep); close_socket(&ep->com, 0); } close_complete_upcall(ep, 0); __state_set(&ep->com, DEAD); release = 1; break; case ABORTING: CTR2(KTR_IW_CXGBE, "%s:pcc5 %p ABORTING", __func__, ep); break; case DEAD: default: CTR2(KTR_IW_CXGBE, "%s:pcc6 %p DEAD", __func__, ep); panic("%s:pcc6 %p DEAD", __func__, ep); break; } mutex_unlock(&ep->com.mutex); if (release) { CTR2(KTR_IW_CXGBE, "%s:pcc7 %p", __func__, ep); c4iw_put_ep(&ep->com); } CTR2(KTR_IW_CXGBE, "%s:pccE %p", __func__, ep); return; } static void init_sock(struct c4iw_ep_common *epc) { int rc; struct sockopt sopt; struct socket *so = epc->so; int on = 1; SOCK_LOCK(so); soupcall_set(so, SO_RCV, c4iw_so_upcall, epc); so->so_state |= SS_NBIO; SOCK_UNLOCK(so); sopt.sopt_dir = SOPT_SET; sopt.sopt_level = IPPROTO_TCP; sopt.sopt_name = TCP_NODELAY; sopt.sopt_val = (caddr_t)&on; sopt.sopt_valsize = sizeof on; sopt.sopt_td = NULL; rc = sosetopt(so, &sopt); if (rc) { log(LOG_ERR, "%s: can't set TCP_NODELAY on so %p (%d)\n", __func__, so, rc); } } static void process_data(struct c4iw_ep *ep) { struct sockaddr_in *local, *remote; CTR5(KTR_IW_CXGBE, "%s: so %p, ep %p, state %s, sbused %d", __func__, ep->com.so, ep, states[ep->com.state], sbused(&ep->com.so->so_rcv)); switch (state_read(&ep->com)) { case MPA_REQ_SENT: process_mpa_reply(ep); break; case MPA_REQ_WAIT: in_getsockaddr(ep->com.so, (struct sockaddr **)&local); in_getpeeraddr(ep->com.so, (struct sockaddr **)&remote); ep->com.local_addr = *local; ep->com.remote_addr = *remote; free(local, M_SONAME); free(remote, M_SONAME); process_mpa_request(ep); break; default: if (sbused(&ep->com.so->so_rcv)) log(LOG_ERR, "%s: Unexpected streaming data. 
ep %p, " "state %d, so %p, so_state 0x%x, sbused %u\n", __func__, ep, state_read(&ep->com), ep->com.so, ep->com.so->so_state, sbused(&ep->com.so->so_rcv)); break; } } static void process_connected(struct c4iw_ep *ep) { if ((ep->com.so->so_state & SS_ISCONNECTED) && !ep->com.so->so_error) send_mpa_req(ep); else { connect_reply_upcall(ep, -ep->com.so->so_error); close_socket(&ep->com, 0); state_set(&ep->com, DEAD); c4iw_put_ep(&ep->com); } } static struct socket * dequeue_socket(struct socket *head, struct sockaddr_in **remote, struct c4iw_ep *child_ep) { struct socket *so; ACCEPT_LOCK(); so = TAILQ_FIRST(&head->so_comp); if (!so) { ACCEPT_UNLOCK(); return (NULL); } TAILQ_REMOVE(&head->so_comp, so, so_list); head->so_qlen--; SOCK_LOCK(so); so->so_qstate &= ~SQ_COMP; so->so_head = NULL; soref(so); soupcall_set(so, SO_RCV, c4iw_so_upcall, child_ep); so->so_state |= SS_NBIO; SOCK_UNLOCK(so); ACCEPT_UNLOCK(); soaccept(so, (struct sockaddr **)remote); return (so); } static void process_newconn(struct c4iw_ep *parent_ep) { struct socket *child_so; struct c4iw_ep *child_ep; struct sockaddr_in *remote; child_ep = alloc_ep(sizeof(*child_ep), M_NOWAIT); if (!child_ep) { CTR3(KTR_IW_CXGBE, "%s: parent so %p, parent ep %p, ENOMEM", __func__, parent_ep->com.so, parent_ep); log(LOG_ERR, "%s: failed to allocate ep entry\n", __func__); return; } child_so = dequeue_socket(parent_ep->com.so, &remote, child_ep); if (!child_so) { CTR4(KTR_IW_CXGBE, "%s: parent so %p, parent ep %p, child ep %p, dequeue err", __func__, parent_ep->com.so, parent_ep, child_ep); log(LOG_ERR, "%s: failed to dequeue child socket\n", __func__); __free_ep(&child_ep->com); return; } CTR5(KTR_IW_CXGBE, "%s: parent so %p, parent ep %p, child so %p, child ep %p", __func__, parent_ep->com.so, parent_ep, child_so, child_ep); child_ep->com.local_addr = parent_ep->com.local_addr; child_ep->com.remote_addr = *remote; child_ep->com.dev = parent_ep->com.dev; child_ep->com.so = child_so; child_ep->com.cm_id = NULL; child_ep->com.thread = parent_ep->com.thread; child_ep->parent_ep = parent_ep; free(remote, M_SONAME); c4iw_get_ep(&parent_ep->com); child_ep->parent_ep = parent_ep; init_timer(&child_ep->timer); state_set(&child_ep->com, MPA_REQ_WAIT); START_EP_TIMER(child_ep); /* maybe the request has already been queued up on the socket... 
*/ process_mpa_request(child_ep); } static int c4iw_so_upcall(struct socket *so, void *arg, int waitflag) { struct c4iw_ep *ep = arg; spin_lock(&req_lock); CTR6(KTR_IW_CXGBE, "%s: so %p, so_state 0x%x, ep %p, ep_state %s, tqe_prev %p", __func__, so, so->so_state, ep, states[ep->com.state], ep->com.entry.tqe_prev); if (ep && ep->com.so && !ep->com.entry.tqe_prev) { KASSERT(ep->com.so == so, ("%s: XXX review.", __func__)); c4iw_get_ep(&ep->com); TAILQ_INSERT_TAIL(&req_list, &ep->com, entry); queue_work(c4iw_taskq, &c4iw_task); } spin_unlock(&req_lock); return (SU_OK); } static void process_socket_event(struct c4iw_ep *ep) { int state = state_read(&ep->com); struct socket *so = ep->com.so; CTR6(KTR_IW_CXGBE, "process_socket_event: so %p, so_state 0x%x, " "so_err %d, sb_state 0x%x, ep %p, ep_state %s", so, so->so_state, so->so_error, so->so_rcv.sb_state, ep, states[state]); if (state == CONNECTING) { process_connected(ep); return; } if (state == LISTEN) { process_newconn(ep); return; } /* connection error */ if (so->so_error) { process_conn_error(ep); return; } /* peer close */ if ((so->so_rcv.sb_state & SBS_CANTRCVMORE) && state < CLOSING) { process_peer_close(ep); return; } /* close complete */ if (so->so_state & SS_ISDISCONNECTED) { process_close_complete(ep); return; } /* rx data */ process_data(ep); } SYSCTL_NODE(_hw, OID_AUTO, iw_cxgbe, CTLFLAG_RD, 0, "iw_cxgbe driver parameters"); int db_delay_usecs = 1; SYSCTL_INT(_hw_iw_cxgbe, OID_AUTO, db_delay_usecs, CTLFLAG_RWTUN, &db_delay_usecs, 0, "Usecs to delay awaiting db fifo to drain"); static int dack_mode = 1; SYSCTL_INT(_hw_iw_cxgbe, OID_AUTO, dack_mode, CTLFLAG_RWTUN, &dack_mode, 0, "Delayed ack mode (default = 1)"); int c4iw_max_read_depth = 8; SYSCTL_INT(_hw_iw_cxgbe, OID_AUTO, c4iw_max_read_depth, CTLFLAG_RWTUN, &c4iw_max_read_depth, 0, "Per-connection max ORD/IRD (default = 8)"); static int enable_tcp_timestamps; SYSCTL_INT(_hw_iw_cxgbe, OID_AUTO, enable_tcp_timestamps, CTLFLAG_RWTUN, &enable_tcp_timestamps, 0, "Enable tcp timestamps (default = 0)"); static int enable_tcp_sack; SYSCTL_INT(_hw_iw_cxgbe, OID_AUTO, enable_tcp_sack, CTLFLAG_RWTUN, &enable_tcp_sack, 0, "Enable tcp SACK (default = 0)"); static int enable_tcp_window_scaling = 1; SYSCTL_INT(_hw_iw_cxgbe, OID_AUTO, enable_tcp_window_scaling, CTLFLAG_RWTUN, &enable_tcp_window_scaling, 0, "Enable tcp window scaling (default = 1)"); int c4iw_debug = 1; SYSCTL_INT(_hw_iw_cxgbe, OID_AUTO, c4iw_debug, CTLFLAG_RWTUN, &c4iw_debug, 0, "Enable debug logging (default = 0)"); static int peer2peer; SYSCTL_INT(_hw_iw_cxgbe, OID_AUTO, peer2peer, CTLFLAG_RWTUN, &peer2peer, 0, "Support peer2peer ULPs (default = 0)"); static int p2p_type = FW_RI_INIT_P2PTYPE_READ_REQ; SYSCTL_INT(_hw_iw_cxgbe, OID_AUTO, p2p_type, CTLFLAG_RWTUN, &p2p_type, 0, "RDMAP opcode to use for the RTR message: 1 = RDMA_READ 0 = RDMA_WRITE (default 1)"); static int ep_timeout_secs = 60; SYSCTL_INT(_hw_iw_cxgbe, OID_AUTO, ep_timeout_secs, CTLFLAG_RWTUN, &ep_timeout_secs, 0, "CM Endpoint operation timeout in seconds (default = 60)"); static int mpa_rev = 1; #ifdef IW_CM_MPAV2 SYSCTL_INT(_hw_iw_cxgbe, OID_AUTO, mpa_rev, CTLFLAG_RWTUN, &mpa_rev, 0, "MPA Revision, 0 supports amso1100, 1 is RFC0544 spec compliant, 2 is IETF MPA Peer Connect Draft compliant (default = 1)"); #else SYSCTL_INT(_hw_iw_cxgbe, OID_AUTO, mpa_rev, CTLFLAG_RWTUN, &mpa_rev, 0, "MPA Revision, 0 supports amso1100, 1 is RFC0544 spec compliant (default = 1)"); #endif static int markers_enabled; SYSCTL_INT(_hw_iw_cxgbe, OID_AUTO, markers_enabled, 
CTLFLAG_RWTUN, &markers_enabled, 0, "Enable MPA MARKERS (default(0) = disabled)"); static int crc_enabled = 1; SYSCTL_INT(_hw_iw_cxgbe, OID_AUTO, crc_enabled, CTLFLAG_RWTUN, &crc_enabled, 0, "Enable MPA CRC (default(1) = enabled)"); static int rcv_win = 256 * 1024; SYSCTL_INT(_hw_iw_cxgbe, OID_AUTO, rcv_win, CTLFLAG_RWTUN, &rcv_win, 0, "TCP receive window in bytes (default = 256KB)"); static int snd_win = 128 * 1024; SYSCTL_INT(_hw_iw_cxgbe, OID_AUTO, snd_win, CTLFLAG_RWTUN, &snd_win, 0, "TCP send window in bytes (default = 128KB)"); int db_fc_threshold = 2000; SYSCTL_INT(_hw_iw_cxgbe, OID_AUTO, db_fc_threshold, CTLFLAG_RWTUN, &db_fc_threshold, 0, "QP count/threshold that triggers automatic"); static void start_ep_timer(struct c4iw_ep *ep) { if (timer_pending(&ep->timer)) { CTR2(KTR_IW_CXGBE, "%s: ep %p, already started", __func__, ep); printk(KERN_ERR "%s timer already started! ep %p\n", __func__, ep); return; } clear_bit(TIMEOUT, &ep->com.flags); c4iw_get_ep(&ep->com); ep->timer.expires = jiffies + ep_timeout_secs * HZ; ep->timer.data = (unsigned long)ep; ep->timer.function = ep_timeout; add_timer(&ep->timer); } static void stop_ep_timer(struct c4iw_ep *ep) { del_timer_sync(&ep->timer); if (!test_and_set_bit(TIMEOUT, &ep->com.flags)) { c4iw_put_ep(&ep->com); } } static enum c4iw_ep_state state_read(struct c4iw_ep_common *epc) { enum c4iw_ep_state state; mutex_lock(&epc->mutex); state = epc->state; mutex_unlock(&epc->mutex); return (state); } static void __state_set(struct c4iw_ep_common *epc, enum c4iw_ep_state new) { epc->state = new; } static void state_set(struct c4iw_ep_common *epc, enum c4iw_ep_state new) { mutex_lock(&epc->mutex); __state_set(epc, new); mutex_unlock(&epc->mutex); } static void * alloc_ep(int size, gfp_t gfp) { struct c4iw_ep_common *epc; epc = kzalloc(size, gfp); if (epc == NULL) return (NULL); kref_init(&epc->kref); mutex_init(&epc->mutex); c4iw_init_wr_wait(&epc->wr_wait); return (epc); } void __free_ep(struct c4iw_ep_common *epc) { CTR2(KTR_IW_CXGBE, "%s:feB %p", __func__, epc); KASSERT(!epc->so, ("%s warning ep->so %p \n", __func__, epc->so)); KASSERT(!epc->entry.tqe_prev, ("%s epc %p still on req list!\n", __func__, epc)); free(epc, M_DEVBUF); CTR2(KTR_IW_CXGBE, "%s:feE %p", __func__, epc); } void _c4iw_free_ep(struct kref *kref) { struct c4iw_ep *ep; struct c4iw_ep_common *epc; ep = container_of(kref, struct c4iw_ep, com.kref); epc = &ep->com; KASSERT(!epc->so, ("%s ep->so %p", __func__, epc->so)); KASSERT(!epc->entry.tqe_prev, ("%s epc %p still on req list", __func__, epc)); kfree(ep); } static void release_ep_resources(struct c4iw_ep *ep) { CTR2(KTR_IW_CXGBE, "%s:rerB %p", __func__, ep); set_bit(RELEASE_RESOURCES, &ep->com.flags); c4iw_put_ep(&ep->com); CTR2(KTR_IW_CXGBE, "%s:rerE %p", __func__, ep); } static void send_mpa_req(struct c4iw_ep *ep) { int mpalen; struct mpa_message *mpa; struct mpa_v2_conn_params mpa_v2_params; struct mbuf *m; char mpa_rev_to_use = mpa_rev; int err; if (ep->retry_with_mpa_v1) mpa_rev_to_use = 1; mpalen = sizeof(*mpa) + ep->plen; if (mpa_rev_to_use == 2) mpalen += sizeof(struct mpa_v2_conn_params); - if (mpalen > MHLEN) - CXGBE_UNIMPLEMENTED(__func__); - - m = m_gethdr(M_NOWAIT, MT_DATA); - if (m == NULL) { + mpa = malloc(mpalen, M_CXGBE, M_NOWAIT); + if (mpa == NULL) { +failed: connect_reply_upcall(ep, -ENOMEM); return; } - mpa = mtod(m, struct mpa_message *); - m->m_len = mpalen; - m->m_pkthdr.len = mpalen; + memset(mpa, 0, mpalen); memcpy(mpa->key, MPA_KEY_REQ, sizeof(mpa->key)); mpa->flags = (crc_enabled ? 
MPA_CRC : 0) | (markers_enabled ? MPA_MARKERS : 0) | (mpa_rev_to_use == 2 ? MPA_ENHANCED_RDMA_CONN : 0); mpa->private_data_size = htons(ep->plen); mpa->revision = mpa_rev_to_use; if (mpa_rev_to_use == 1) { ep->tried_with_mpa_v1 = 1; ep->retry_with_mpa_v1 = 0; } if (mpa_rev_to_use == 2) { mpa->private_data_size += htons(sizeof(struct mpa_v2_conn_params)); mpa_v2_params.ird = htons((u16)ep->ird); mpa_v2_params.ord = htons((u16)ep->ord); if (peer2peer) { mpa_v2_params.ird |= htons(MPA_V2_PEER2PEER_MODEL); if (p2p_type == FW_RI_INIT_P2PTYPE_RDMA_WRITE) { mpa_v2_params.ord |= htons(MPA_V2_RDMA_WRITE_RTR); } else if (p2p_type == FW_RI_INIT_P2PTYPE_READ_REQ) { mpa_v2_params.ord |= htons(MPA_V2_RDMA_READ_RTR); } } memcpy(mpa->private_data, &mpa_v2_params, sizeof(struct mpa_v2_conn_params)); if (ep->plen) { memcpy(mpa->private_data + sizeof(struct mpa_v2_conn_params), ep->mpa_pkt + sizeof(*mpa), ep->plen); } } else { if (ep->plen) memcpy(mpa->private_data, ep->mpa_pkt + sizeof(*mpa), ep->plen); CTR2(KTR_IW_CXGBE, "%s:smr7 %p", __func__, ep); } - err = sosend(ep->com.so, NULL, NULL, m, NULL, MSG_DONTWAIT, ep->com.thread); - if (err) { - connect_reply_upcall(ep, -ENOMEM); - return; + m = m_getm(NULL, mpalen, M_NOWAIT, MT_DATA); + if (m == NULL) { + free(mpa, M_CXGBE); + goto failed; } + m_copyback(m, 0, mpalen, (void *)mpa); + free(mpa, M_CXGBE); + err = sosend(ep->com.so, NULL, NULL, m, NULL, MSG_DONTWAIT, + ep->com.thread); + if (err) + goto failed; + START_EP_TIMER(ep); state_set(&ep->com, MPA_REQ_SENT); ep->mpa_attr.initiator = 1; } static int send_mpa_reject(struct c4iw_ep *ep, const void *pdata, u8 plen) { int mpalen ; struct mpa_message *mpa; struct mpa_v2_conn_params mpa_v2_params; struct mbuf *m; int err; CTR4(KTR_IW_CXGBE, "%s:smrejB %p %u %d", __func__, ep, ep->hwtid, ep->plen); mpalen = sizeof(*mpa) + plen; if (ep->mpa_attr.version == 2 && ep->mpa_attr.enhanced_rdma_conn) { mpalen += sizeof(struct mpa_v2_conn_params); CTR4(KTR_IW_CXGBE, "%s:smrej1 %p %u %d", __func__, ep, ep->mpa_attr.version, mpalen); } - if (mpalen > MHLEN) - CXGBE_UNIMPLEMENTED(__func__); - - m = m_gethdr(M_NOWAIT, MT_DATA); - if (m == NULL) { - - printf("%s - cannot alloc mbuf!\n", __func__); - CTR2(KTR_IW_CXGBE, "%s:smrej2 %p", __func__, ep); + mpa = malloc(mpalen, M_CXGBE, M_NOWAIT); + if (mpa == NULL) return (-ENOMEM); - } - - mpa = mtod(m, struct mpa_message *); - m->m_len = mpalen; - m->m_pkthdr.len = mpalen; - memset(mpa, 0, sizeof(*mpa)); + memset(mpa, 0, mpalen); memcpy(mpa->key, MPA_KEY_REP, sizeof(mpa->key)); mpa->flags = MPA_REJECT; mpa->revision = mpa_rev; mpa->private_data_size = htons(plen); if (ep->mpa_attr.version == 2 && ep->mpa_attr.enhanced_rdma_conn) { mpa->flags |= MPA_ENHANCED_RDMA_CONN; mpa->private_data_size += htons(sizeof(struct mpa_v2_conn_params)); mpa_v2_params.ird = htons(((u16)ep->ird) | (peer2peer ? MPA_V2_PEER2PEER_MODEL : 0)); mpa_v2_params.ord = htons(((u16)ep->ord) | (peer2peer ? (p2p_type == FW_RI_INIT_P2PTYPE_RDMA_WRITE ? MPA_V2_RDMA_WRITE_RTR : p2p_type == FW_RI_INIT_P2PTYPE_READ_REQ ? 
MPA_V2_RDMA_READ_RTR : 0) : 0)); memcpy(mpa->private_data, &mpa_v2_params, sizeof(struct mpa_v2_conn_params)); if (ep->plen) memcpy(mpa->private_data + sizeof(struct mpa_v2_conn_params), pdata, plen); CTR5(KTR_IW_CXGBE, "%s:smrej3 %p %d %d %d", __func__, ep, mpa_v2_params.ird, mpa_v2_params.ord, ep->plen); } else if (plen) memcpy(mpa->private_data, pdata, plen); - err = sosend(ep->com.so, NULL, NULL, m, NULL, MSG_DONTWAIT, ep->com.thread); + m = m_getm(NULL, mpalen, M_NOWAIT, MT_DATA); + if (m == NULL) { + free(mpa, M_CXGBE); + return (-ENOMEM); + } + m_copyback(m, 0, mpalen, (void *)mpa); + free(mpa, M_CXGBE); + + err = -sosend(ep->com.so, NULL, NULL, m, NULL, MSG_DONTWAIT, ep->com.thread); if (!err) ep->snd_seq += mpalen; CTR4(KTR_IW_CXGBE, "%s:smrejE %p %u %d", __func__, ep, ep->hwtid, err); return err; } static int send_mpa_reply(struct c4iw_ep *ep, const void *pdata, u8 plen) { int mpalen; struct mpa_message *mpa; struct mbuf *m; struct mpa_v2_conn_params mpa_v2_params; int err; CTR2(KTR_IW_CXGBE, "%s:smrepB %p", __func__, ep); mpalen = sizeof(*mpa) + plen; if (ep->mpa_attr.version == 2 && ep->mpa_attr.enhanced_rdma_conn) { CTR3(KTR_IW_CXGBE, "%s:smrep1 %p %d", __func__, ep, ep->mpa_attr.version); mpalen += sizeof(struct mpa_v2_conn_params); } - if (mpalen > MHLEN) - CXGBE_UNIMPLEMENTED(__func__); - - m = m_gethdr(M_NOWAIT, MT_DATA); - if (m == NULL) { - - CTR2(KTR_IW_CXGBE, "%s:smrep2 %p", __func__, ep); - printf("%s - cannot alloc mbuf!\n", __func__); + mpa = malloc(mpalen, M_CXGBE, M_NOWAIT); + if (mpa == NULL) return (-ENOMEM); - } - - mpa = mtod(m, struct mpa_message *); - m->m_len = mpalen; - m->m_pkthdr.len = mpalen; memset(mpa, 0, sizeof(*mpa)); memcpy(mpa->key, MPA_KEY_REP, sizeof(mpa->key)); mpa->flags = (ep->mpa_attr.crc_enabled ? MPA_CRC : 0) | (markers_enabled ? 
MPA_MARKERS : 0); mpa->revision = ep->mpa_attr.version; mpa->private_data_size = htons(plen); if (ep->mpa_attr.version == 2 && ep->mpa_attr.enhanced_rdma_conn) { mpa->flags |= MPA_ENHANCED_RDMA_CONN; mpa->private_data_size += htons(sizeof(struct mpa_v2_conn_params)); mpa_v2_params.ird = htons((u16)ep->ird); mpa_v2_params.ord = htons((u16)ep->ord); CTR5(KTR_IW_CXGBE, "%s:smrep3 %p %d %d %d", __func__, ep, ep->mpa_attr.version, mpa_v2_params.ird, mpa_v2_params.ord); if (peer2peer && (ep->mpa_attr.p2p_type != FW_RI_INIT_P2PTYPE_DISABLED)) { mpa_v2_params.ird |= htons(MPA_V2_PEER2PEER_MODEL); if (p2p_type == FW_RI_INIT_P2PTYPE_RDMA_WRITE) { mpa_v2_params.ord |= htons(MPA_V2_RDMA_WRITE_RTR); CTR5(KTR_IW_CXGBE, "%s:smrep4 %p %d %d %d", __func__, ep, p2p_type, mpa_v2_params.ird, mpa_v2_params.ord); } else if (p2p_type == FW_RI_INIT_P2PTYPE_READ_REQ) { mpa_v2_params.ord |= htons(MPA_V2_RDMA_READ_RTR); CTR5(KTR_IW_CXGBE, "%s:smrep5 %p %d %d %d", __func__, ep, p2p_type, mpa_v2_params.ird, mpa_v2_params.ord); } } memcpy(mpa->private_data, &mpa_v2_params, sizeof(struct mpa_v2_conn_params)); if (ep->plen) memcpy(mpa->private_data + sizeof(struct mpa_v2_conn_params), pdata, plen); } else if (plen) memcpy(mpa->private_data, pdata, plen); + m = m_getm(NULL, mpalen, M_NOWAIT, MT_DATA); + if (m == NULL) { + free(mpa, M_CXGBE); + return (-ENOMEM); + } + m_copyback(m, 0, mpalen, (void *)mpa); + free(mpa, M_CXGBE); + + state_set(&ep->com, MPA_REP_SENT); ep->snd_seq += mpalen; - err = sosend(ep->com.so, NULL, NULL, m, NULL, MSG_DONTWAIT, + err = -sosend(ep->com.so, NULL, NULL, m, NULL, MSG_DONTWAIT, ep->com.thread); CTR3(KTR_IW_CXGBE, "%s:smrepE %p %d", __func__, ep, err); return err; } static void close_complete_upcall(struct c4iw_ep *ep, int status) { struct iw_cm_event event; CTR2(KTR_IW_CXGBE, "%s:ccuB %p", __func__, ep); memset(&event, 0, sizeof(event)); event.event = IW_CM_EVENT_CLOSE; event.status = status; if (ep->com.cm_id) { CTR2(KTR_IW_CXGBE, "%s:ccu1 %1", __func__, ep); ep->com.cm_id->event_handler(ep->com.cm_id, &event); ep->com.cm_id->rem_ref(ep->com.cm_id); ep->com.cm_id = NULL; ep->com.qp = NULL; set_bit(CLOSE_UPCALL, &ep->com.history); } CTR2(KTR_IW_CXGBE, "%s:ccuE %p", __func__, ep); } static int abort_connection(struct c4iw_ep *ep) { int err; CTR2(KTR_IW_CXGBE, "%s:abB %p", __func__, ep); close_complete_upcall(ep, -ECONNRESET); state_set(&ep->com, ABORTING); abort_socket(ep); err = close_socket(&ep->com, 0); set_bit(ABORT_CONN, &ep->com.history); CTR2(KTR_IW_CXGBE, "%s:abE %p", __func__, ep); return err; } static void peer_close_upcall(struct c4iw_ep *ep) { struct iw_cm_event event; CTR2(KTR_IW_CXGBE, "%s:pcuB %p", __func__, ep); memset(&event, 0, sizeof(event)); event.event = IW_CM_EVENT_DISCONNECT; if (ep->com.cm_id) { CTR2(KTR_IW_CXGBE, "%s:pcu1 %p", __func__, ep); ep->com.cm_id->event_handler(ep->com.cm_id, &event); set_bit(DISCONN_UPCALL, &ep->com.history); } CTR2(KTR_IW_CXGBE, "%s:pcuE %p", __func__, ep); } static void peer_abort_upcall(struct c4iw_ep *ep) { struct iw_cm_event event; CTR2(KTR_IW_CXGBE, "%s:pauB %p", __func__, ep); memset(&event, 0, sizeof(event)); event.event = IW_CM_EVENT_CLOSE; event.status = -ECONNRESET; if (ep->com.cm_id) { CTR2(KTR_IW_CXGBE, "%s:pau1 %p", __func__, ep); ep->com.cm_id->event_handler(ep->com.cm_id, &event); ep->com.cm_id->rem_ref(ep->com.cm_id); ep->com.cm_id = NULL; ep->com.qp = NULL; set_bit(ABORT_UPCALL, &ep->com.history); } CTR2(KTR_IW_CXGBE, "%s:pauE %p", __func__, ep); } static void connect_reply_upcall(struct c4iw_ep *ep, int status) { struct 
iw_cm_event event; CTR3(KTR_IW_CXGBE, "%s:cruB %p", __func__, ep, status); memset(&event, 0, sizeof(event)); event.event = IW_CM_EVENT_CONNECT_REPLY; event.status = (status ==-ECONNABORTED)?-ECONNRESET: status; event.local_addr = ep->com.local_addr; event.remote_addr = ep->com.remote_addr; if ((status == 0) || (status == -ECONNREFUSED)) { if (!ep->tried_with_mpa_v1) { CTR2(KTR_IW_CXGBE, "%s:cru1 %p", __func__, ep); /* this means MPA_v2 is used */ event.private_data_len = ep->plen - sizeof(struct mpa_v2_conn_params); event.private_data = ep->mpa_pkt + sizeof(struct mpa_message) + sizeof(struct mpa_v2_conn_params); } else { CTR2(KTR_IW_CXGBE, "%s:cru2 %p", __func__, ep); /* this means MPA_v1 is used */ event.private_data_len = ep->plen; event.private_data = ep->mpa_pkt + sizeof(struct mpa_message); } } if (ep->com.cm_id) { CTR2(KTR_IW_CXGBE, "%s:cru3 %p", __func__, ep); set_bit(CONN_RPL_UPCALL, &ep->com.history); ep->com.cm_id->event_handler(ep->com.cm_id, &event); } if(status == -ECONNABORTED) { CTR3(KTR_IW_CXGBE, "%s:cruE %p %d", __func__, ep, status); return; } if (status < 0) { CTR3(KTR_IW_CXGBE, "%s:cru4 %p %d", __func__, ep, status); ep->com.cm_id->rem_ref(ep->com.cm_id); ep->com.cm_id = NULL; ep->com.qp = NULL; } CTR2(KTR_IW_CXGBE, "%s:cruE %p", __func__, ep); } static void connect_request_upcall(struct c4iw_ep *ep) { struct iw_cm_event event; CTR3(KTR_IW_CXGBE, "%s: ep %p, mpa_v1 %d", __func__, ep, ep->tried_with_mpa_v1); memset(&event, 0, sizeof(event)); event.event = IW_CM_EVENT_CONNECT_REQUEST; event.local_addr = ep->com.local_addr; event.remote_addr = ep->com.remote_addr; event.provider_data = ep; event.so = ep->com.so; if (!ep->tried_with_mpa_v1) { /* this means MPA_v2 is used */ #ifdef IW_CM_MPAV2 event.ord = ep->ord; event.ird = ep->ird; #endif event.private_data_len = ep->plen - sizeof(struct mpa_v2_conn_params); event.private_data = ep->mpa_pkt + sizeof(struct mpa_message) + sizeof(struct mpa_v2_conn_params); } else { /* this means MPA_v1 is used. Send max supported */ #ifdef IW_CM_MPAV2 event.ord = c4iw_max_read_depth; event.ird = c4iw_max_read_depth; #endif event.private_data_len = ep->plen; event.private_data = ep->mpa_pkt + sizeof(struct mpa_message); } c4iw_get_ep(&ep->com); ep->parent_ep->com.cm_id->event_handler(ep->parent_ep->com.cm_id, &event); set_bit(CONNREQ_UPCALL, &ep->com.history); c4iw_put_ep(&ep->parent_ep->com); } static void established_upcall(struct c4iw_ep *ep) { struct iw_cm_event event; CTR2(KTR_IW_CXGBE, "%s:euB %p", __func__, ep); memset(&event, 0, sizeof(event)); event.event = IW_CM_EVENT_ESTABLISHED; #ifdef IW_CM_MPAV2 event.ird = ep->ird; event.ord = ep->ord; #endif if (ep->com.cm_id) { CTR2(KTR_IW_CXGBE, "%s:eu1 %p", __func__, ep); ep->com.cm_id->event_handler(ep->com.cm_id, &event); set_bit(ESTAB_UPCALL, &ep->com.history); } CTR2(KTR_IW_CXGBE, "%s:euE %p", __func__, ep); } static void process_mpa_reply(struct c4iw_ep *ep) { struct mpa_message *mpa; struct mpa_v2_conn_params *mpa_v2_params; u16 plen; u16 resp_ird, resp_ord; u8 rtr_mismatch = 0, insuff_ird = 0; struct c4iw_qp_attributes attrs; enum c4iw_qp_attr_mask mask; int err; struct mbuf *top, *m; int flags = MSG_DONTWAIT; struct uio uio; CTR2(KTR_IW_CXGBE, "%s:pmrB %p", __func__, ep); /* * Stop mpa timer. If it expired, then the state has * changed and we bail since ep_timeout already aborted * the connection. 
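	 * If soreceive() below returns EWOULDBLOCK, the timer is simply
	 * re-armed and we wait for more data to arrive.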
*/ STOP_EP_TIMER(ep); if (state_read(&ep->com) != MPA_REQ_SENT) return; uio.uio_resid = 1000000; uio.uio_td = ep->com.thread; err = soreceive(ep->com.so, NULL, &uio, &top, NULL, &flags); if (err) { if (err == EWOULDBLOCK) { CTR2(KTR_IW_CXGBE, "%s:pmr1 %p", __func__, ep); START_EP_TIMER(ep); return; } err = -err; CTR2(KTR_IW_CXGBE, "%s:pmr2 %p", __func__, ep); goto err; } if (ep->com.so->so_rcv.sb_mb) { CTR2(KTR_IW_CXGBE, "%s:pmr3 %p", __func__, ep); printf("%s data after soreceive called! so %p sb_mb %p top %p\n", __func__, ep->com.so, ep->com.so->so_rcv.sb_mb, top); } m = top; do { CTR2(KTR_IW_CXGBE, "%s:pmr4 %p", __func__, ep); /* * If we get more than the supported amount of private data * then we must fail this connection. */ if (ep->mpa_pkt_len + m->m_len > sizeof(ep->mpa_pkt)) { CTR3(KTR_IW_CXGBE, "%s:pmr5 %p %d", __func__, ep, ep->mpa_pkt_len + m->m_len); err = (-EINVAL); goto err; } /* * copy the new data into our accumulation buffer. */ m_copydata(m, 0, m->m_len, &(ep->mpa_pkt[ep->mpa_pkt_len])); ep->mpa_pkt_len += m->m_len; if (!m->m_next) m = m->m_nextpkt; else m = m->m_next; } while (m); m_freem(top); /* * if we don't even have the mpa message, then bail. */ if (ep->mpa_pkt_len < sizeof(*mpa)) return; mpa = (struct mpa_message *) ep->mpa_pkt; /* Validate MPA header. */ if (mpa->revision > mpa_rev) { CTR4(KTR_IW_CXGBE, "%s:pmr6 %p %d %d", __func__, ep, mpa->revision, mpa_rev); printk(KERN_ERR MOD "%s MPA version mismatch. Local = %d, " " Received = %d\n", __func__, mpa_rev, mpa->revision); err = -EPROTO; goto err; } if (memcmp(mpa->key, MPA_KEY_REP, sizeof(mpa->key))) { CTR2(KTR_IW_CXGBE, "%s:pmr7 %p", __func__, ep); err = -EPROTO; goto err; } plen = ntohs(mpa->private_data_size); /* * Fail if there's too much private data. */ if (plen > MPA_MAX_PRIVATE_DATA) { CTR2(KTR_IW_CXGBE, "%s:pmr8 %p", __func__, ep); err = -EPROTO; goto err; } /* * If plen does not account for pkt size */ if (ep->mpa_pkt_len > (sizeof(*mpa) + plen)) { CTR2(KTR_IW_CXGBE, "%s:pmr9 %p", __func__, ep); err = -EPROTO; goto err; } ep->plen = (u8) plen; /* * If we don't have all the pdata yet, then bail. * We'll continue process when more data arrives. */ if (ep->mpa_pkt_len < (sizeof(*mpa) + plen)) { CTR2(KTR_IW_CXGBE, "%s:pmra %p", __func__, ep); return; } if (mpa->flags & MPA_REJECT) { CTR2(KTR_IW_CXGBE, "%s:pmrb %p", __func__, ep); err = -ECONNREFUSED; goto err; } /* * If we get here we have accumulated the entire mpa * start reply message including private data. And * the MPA header is valid. */ state_set(&ep->com, FPDU_MODE); ep->mpa_attr.crc_enabled = (mpa->flags & MPA_CRC) | crc_enabled ? 1 : 0; ep->mpa_attr.recv_marker_enabled = markers_enabled; ep->mpa_attr.xmit_marker_enabled = mpa->flags & MPA_MARKERS ? 1 : 0; ep->mpa_attr.version = mpa->revision; ep->mpa_attr.p2p_type = FW_RI_INIT_P2PTYPE_DISABLED; if (mpa->revision == 2) { CTR2(KTR_IW_CXGBE, "%s:pmrc %p", __func__, ep); ep->mpa_attr.enhanced_rdma_conn = mpa->flags & MPA_ENHANCED_RDMA_CONN ? 1 : 0; if (ep->mpa_attr.enhanced_rdma_conn) { CTR2(KTR_IW_CXGBE, "%s:pmrd %p", __func__, ep); mpa_v2_params = (struct mpa_v2_conn_params *) (ep->mpa_pkt + sizeof(*mpa)); resp_ird = ntohs(mpa_v2_params->ird) & MPA_V2_IRD_ORD_MASK; resp_ord = ntohs(mpa_v2_params->ord) & MPA_V2_IRD_ORD_MASK; /* * This is a double-check. 
Ideally, below checks are * not required since ird/ord stuff has been taken * care of in c4iw_accept_cr */ if ((ep->ird < resp_ord) || (ep->ord > resp_ird)) { CTR2(KTR_IW_CXGBE, "%s:pmre %p", __func__, ep); err = -ENOMEM; ep->ird = resp_ord; ep->ord = resp_ird; insuff_ird = 1; } if (ntohs(mpa_v2_params->ird) & MPA_V2_PEER2PEER_MODEL) { CTR2(KTR_IW_CXGBE, "%s:pmrf %p", __func__, ep); if (ntohs(mpa_v2_params->ord) & MPA_V2_RDMA_WRITE_RTR) { CTR2(KTR_IW_CXGBE, "%s:pmrg %p", __func__, ep); ep->mpa_attr.p2p_type = FW_RI_INIT_P2PTYPE_RDMA_WRITE; } else if (ntohs(mpa_v2_params->ord) & MPA_V2_RDMA_READ_RTR) { CTR2(KTR_IW_CXGBE, "%s:pmrh %p", __func__, ep); ep->mpa_attr.p2p_type = FW_RI_INIT_P2PTYPE_READ_REQ; } } } } else { CTR2(KTR_IW_CXGBE, "%s:pmri %p", __func__, ep); if (mpa->revision == 1) { CTR2(KTR_IW_CXGBE, "%s:pmrj %p", __func__, ep); if (peer2peer) { CTR2(KTR_IW_CXGBE, "%s:pmrk %p", __func__, ep); ep->mpa_attr.p2p_type = p2p_type; } } } if (set_tcpinfo(ep)) { CTR2(KTR_IW_CXGBE, "%s:pmrl %p", __func__, ep); printf("%s set_tcpinfo error\n", __func__); goto err; } CTR6(KTR_IW_CXGBE, "%s - crc_enabled = %d, recv_marker_enabled = %d, " "xmit_marker_enabled = %d, version = %d p2p_type = %d", __func__, ep->mpa_attr.crc_enabled, ep->mpa_attr.recv_marker_enabled, ep->mpa_attr.xmit_marker_enabled, ep->mpa_attr.version, ep->mpa_attr.p2p_type); /* * If responder's RTR does not match with that of initiator, assign * FW_RI_INIT_P2PTYPE_DISABLED in mpa attributes so that RTR is not * generated when moving QP to RTS state. * A TERM message will be sent after QP has moved to RTS state */ if ((ep->mpa_attr.version == 2) && peer2peer && (ep->mpa_attr.p2p_type != p2p_type)) { CTR2(KTR_IW_CXGBE, "%s:pmrm %p", __func__, ep); ep->mpa_attr.p2p_type = FW_RI_INIT_P2PTYPE_DISABLED; rtr_mismatch = 1; } //ep->ofld_txq = TOEPCB(ep->com.so)->ofld_txq; attrs.mpa_attr = ep->mpa_attr; attrs.max_ird = ep->ird; attrs.max_ord = ep->ord; attrs.llp_stream_handle = ep; attrs.next_state = C4IW_QP_STATE_RTS; mask = C4IW_QP_ATTR_NEXT_STATE | C4IW_QP_ATTR_LLP_STREAM_HANDLE | C4IW_QP_ATTR_MPA_ATTR | C4IW_QP_ATTR_MAX_IRD | C4IW_QP_ATTR_MAX_ORD; /* bind QP and TID with INIT_WR */ err = c4iw_modify_qp(ep->com.qp->rhp, ep->com.qp, mask, &attrs, 1); if (err) { CTR2(KTR_IW_CXGBE, "%s:pmrn %p", __func__, ep); goto err; } /* * If responder's RTR requirement did not match with what initiator * supports, generate TERM message */ if (rtr_mismatch) { CTR2(KTR_IW_CXGBE, "%s:pmro %p", __func__, ep); printk(KERN_ERR "%s: RTR mismatch, sending TERM\n", __func__); attrs.layer_etype = LAYER_MPA | DDP_LLP; attrs.ecode = MPA_NOMATCH_RTR; attrs.next_state = C4IW_QP_STATE_TERMINATE; err = c4iw_modify_qp(ep->com.qp->rhp, ep->com.qp, C4IW_QP_ATTR_NEXT_STATE, &attrs, 0); err = -ENOMEM; goto out; } /* * Generate TERM if initiator IRD is not sufficient for responder * provided ORD. Currently, we do the same behaviour even when * responder provided IRD is also not sufficient as regards to * initiator ORD. 
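	 * (insuff_ird was set above while parsing the MPA v2 reply.)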
*/ if (insuff_ird) { CTR2(KTR_IW_CXGBE, "%s:pmrp %p", __func__, ep); printk(KERN_ERR "%s: Insufficient IRD, sending TERM\n", __func__); attrs.layer_etype = LAYER_MPA | DDP_LLP; attrs.ecode = MPA_INSUFF_IRD; attrs.next_state = C4IW_QP_STATE_TERMINATE; err = c4iw_modify_qp(ep->com.qp->rhp, ep->com.qp, C4IW_QP_ATTR_NEXT_STATE, &attrs, 0); err = -ENOMEM; goto out; } goto out; err: state_set(&ep->com, ABORTING); abort_connection(ep); out: connect_reply_upcall(ep, err); CTR2(KTR_IW_CXGBE, "%s:pmrE %p", __func__, ep); return; } static void process_mpa_request(struct c4iw_ep *ep) { struct mpa_message *mpa; u16 plen; int flags = MSG_DONTWAIT; int rc; struct iovec iov; struct uio uio; enum c4iw_ep_state state = state_read(&ep->com); CTR3(KTR_IW_CXGBE, "%s: ep %p, state %s", __func__, ep, states[state]); if (state != MPA_REQ_WAIT) return; iov.iov_base = &ep->mpa_pkt[ep->mpa_pkt_len]; iov.iov_len = sizeof(ep->mpa_pkt) - ep->mpa_pkt_len; uio.uio_iov = &iov; uio.uio_iovcnt = 1; uio.uio_offset = 0; uio.uio_resid = sizeof(ep->mpa_pkt) - ep->mpa_pkt_len; uio.uio_segflg = UIO_SYSSPACE; uio.uio_rw = UIO_READ; uio.uio_td = NULL; /* uio.uio_td = ep->com.thread; */ rc = soreceive(ep->com.so, NULL, &uio, NULL, NULL, &flags); if (rc == EAGAIN) return; else if (rc) { abort: STOP_EP_TIMER(ep); abort_connection(ep); return; } KASSERT(uio.uio_offset > 0, ("%s: sorecieve on so %p read no data", __func__, ep->com.so)); ep->mpa_pkt_len += uio.uio_offset; /* * If we get more than the supported amount of private data then we must * fail this connection. XXX: check so_rcv->sb_cc, or peek with another * soreceive, or increase the size of mpa_pkt by 1 and abort if the last * byte is filled by the soreceive above. */ /* Don't even have the MPA message. Wait for more data to arrive. */ if (ep->mpa_pkt_len < sizeof(*mpa)) return; mpa = (struct mpa_message *) ep->mpa_pkt; /* * Validate MPA Header. */ if (mpa->revision > mpa_rev) { log(LOG_ERR, "%s: MPA version mismatch. Local = %d," " Received = %d\n", __func__, mpa_rev, mpa->revision); goto abort; } if (memcmp(mpa->key, MPA_KEY_REQ, sizeof(mpa->key))) goto abort; /* * Fail if there's too much private data. */ plen = ntohs(mpa->private_data_size); if (plen > MPA_MAX_PRIVATE_DATA) goto abort; /* * If plen does not account for pkt size */ if (ep->mpa_pkt_len > (sizeof(*mpa) + plen)) goto abort; ep->plen = (u8) plen; /* * If we don't have all the pdata yet, then bail. */ if (ep->mpa_pkt_len < (sizeof(*mpa) + plen)) return; /* * If we get here we have accumulated the entire mpa * start reply message including private data. */ ep->mpa_attr.initiator = 0; ep->mpa_attr.crc_enabled = (mpa->flags & MPA_CRC) | crc_enabled ? 1 : 0; ep->mpa_attr.recv_marker_enabled = markers_enabled; ep->mpa_attr.xmit_marker_enabled = mpa->flags & MPA_MARKERS ? 1 : 0; ep->mpa_attr.version = mpa->revision; if (mpa->revision == 1) ep->tried_with_mpa_v1 = 1; ep->mpa_attr.p2p_type = FW_RI_INIT_P2PTYPE_DISABLED; if (mpa->revision == 2) { ep->mpa_attr.enhanced_rdma_conn = mpa->flags & MPA_ENHANCED_RDMA_CONN ? 
1 : 0; if (ep->mpa_attr.enhanced_rdma_conn) { struct mpa_v2_conn_params *mpa_v2_params; u16 ird, ord; mpa_v2_params = (void *)&ep->mpa_pkt[sizeof(*mpa)]; ird = ntohs(mpa_v2_params->ird); ord = ntohs(mpa_v2_params->ord); ep->ird = ird & MPA_V2_IRD_ORD_MASK; ep->ord = ord & MPA_V2_IRD_ORD_MASK; if (ird & MPA_V2_PEER2PEER_MODEL && peer2peer) { if (ord & MPA_V2_RDMA_WRITE_RTR) { ep->mpa_attr.p2p_type = FW_RI_INIT_P2PTYPE_RDMA_WRITE; } else if (ord & MPA_V2_RDMA_READ_RTR) { ep->mpa_attr.p2p_type = FW_RI_INIT_P2PTYPE_READ_REQ; } } } } else if (mpa->revision == 1 && peer2peer) ep->mpa_attr.p2p_type = p2p_type; if (set_tcpinfo(ep)) goto abort; CTR5(KTR_IW_CXGBE, "%s: crc_enabled = %d, recv_marker_enabled = %d, " "xmit_marker_enabled = %d, version = %d", __func__, ep->mpa_attr.crc_enabled, ep->mpa_attr.recv_marker_enabled, ep->mpa_attr.xmit_marker_enabled, ep->mpa_attr.version); state_set(&ep->com, MPA_REQ_RCVD); STOP_EP_TIMER(ep); /* drive upcall */ mutex_lock(&ep->parent_ep->com.mutex); if (ep->parent_ep->com.state != DEAD) connect_request_upcall(ep); else abort_connection(ep); mutex_unlock(&ep->parent_ep->com.mutex); } /* * Upcall from the adapter indicating data has been transmitted. * For us its just the single MPA request or reply. We can now free * the skb holding the mpa message. */ int c4iw_reject_cr(struct iw_cm_id *cm_id, const void *pdata, u8 pdata_len) { int err; struct c4iw_ep *ep = to_ep(cm_id); CTR2(KTR_IW_CXGBE, "%s:crcB %p", __func__, ep); if (state_read(&ep->com) == DEAD) { CTR2(KTR_IW_CXGBE, "%s:crc1 %p", __func__, ep); c4iw_put_ep(&ep->com); return -ECONNRESET; } set_bit(ULP_REJECT, &ep->com.history); BUG_ON(state_read(&ep->com) != MPA_REQ_RCVD); if (mpa_rev == 0) { CTR2(KTR_IW_CXGBE, "%s:crc2 %p", __func__, ep); abort_connection(ep); } else { CTR2(KTR_IW_CXGBE, "%s:crc3 %p", __func__, ep); err = send_mpa_reject(ep, pdata, pdata_len); err = soshutdown(ep->com.so, 3); } c4iw_put_ep(&ep->com); CTR2(KTR_IW_CXGBE, "%s:crc4 %p", __func__, ep); return 0; } int c4iw_accept_cr(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param) { int err; struct c4iw_qp_attributes attrs; enum c4iw_qp_attr_mask mask; struct c4iw_ep *ep = to_ep(cm_id); struct c4iw_dev *h = to_c4iw_dev(cm_id->device); struct c4iw_qp *qp = get_qhp(h, conn_param->qpn); CTR2(KTR_IW_CXGBE, "%s:cacB %p", __func__, ep); if (state_read(&ep->com) == DEAD) { CTR2(KTR_IW_CXGBE, "%s:cac1 %p", __func__, ep); err = -ECONNRESET; goto err; } BUG_ON(state_read(&ep->com) != MPA_REQ_RCVD); BUG_ON(!qp); set_bit(ULP_ACCEPT, &ep->com.history); if ((conn_param->ord > c4iw_max_read_depth) || (conn_param->ird > c4iw_max_read_depth)) { CTR2(KTR_IW_CXGBE, "%s:cac2 %p", __func__, ep); abort_connection(ep); err = -EINVAL; goto err; } if (ep->mpa_attr.version == 2 && ep->mpa_attr.enhanced_rdma_conn) { CTR2(KTR_IW_CXGBE, "%s:cac3 %p", __func__, ep); if (conn_param->ord > ep->ird) { CTR2(KTR_IW_CXGBE, "%s:cac4 %p", __func__, ep); ep->ird = conn_param->ird; ep->ord = conn_param->ord; send_mpa_reject(ep, conn_param->private_data, conn_param->private_data_len); abort_connection(ep); err = -ENOMEM; goto err; } if (conn_param->ird > ep->ord) { CTR2(KTR_IW_CXGBE, "%s:cac5 %p", __func__, ep); if (!ep->ord) { CTR2(KTR_IW_CXGBE, "%s:cac6 %p", __func__, ep); conn_param->ird = 1; } else { CTR2(KTR_IW_CXGBE, "%s:cac7 %p", __func__, ep); abort_connection(ep); err = -ENOMEM; goto err; } } } ep->ird = conn_param->ird; ep->ord = conn_param->ord; if (ep->mpa_attr.version != 2) { CTR2(KTR_IW_CXGBE, "%s:cac8 %p", __func__, ep); if (peer2peer && ep->ird == 0) { 
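			/*
			 * MPA v1 carries no IRD/ORD negotiation; with
			 * peer2peer enabled the RTR message may be an RDMA
			 * read (see the p2p_type sysctl), so make sure at
			 * least one inbound read resource is available.
			 */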
CTR2(KTR_IW_CXGBE, "%s:cac9 %p", __func__, ep); ep->ird = 1; } } cm_id->add_ref(cm_id); ep->com.cm_id = cm_id; ep->com.qp = qp; //ep->ofld_txq = TOEPCB(ep->com.so)->ofld_txq; /* bind QP to EP and move to RTS */ attrs.mpa_attr = ep->mpa_attr; attrs.max_ird = ep->ird; attrs.max_ord = ep->ord; attrs.llp_stream_handle = ep; attrs.next_state = C4IW_QP_STATE_RTS; /* bind QP and TID with INIT_WR */ mask = C4IW_QP_ATTR_NEXT_STATE | C4IW_QP_ATTR_LLP_STREAM_HANDLE | C4IW_QP_ATTR_MPA_ATTR | C4IW_QP_ATTR_MAX_IRD | C4IW_QP_ATTR_MAX_ORD; err = c4iw_modify_qp(ep->com.qp->rhp, ep->com.qp, mask, &attrs, 1); if (err) { CTR2(KTR_IW_CXGBE, "%s:caca %p", __func__, ep); goto err1; } err = send_mpa_reply(ep, conn_param->private_data, conn_param->private_data_len); if (err) { CTR2(KTR_IW_CXGBE, "%s:caca %p", __func__, ep); goto err1; } state_set(&ep->com, FPDU_MODE); established_upcall(ep); c4iw_put_ep(&ep->com); CTR2(KTR_IW_CXGBE, "%s:cacE %p", __func__, ep); return 0; err1: ep->com.cm_id = NULL; ep->com.qp = NULL; cm_id->rem_ref(cm_id); err: c4iw_put_ep(&ep->com); CTR2(KTR_IW_CXGBE, "%s:cacE err %p", __func__, ep); return err; } int c4iw_connect(struct iw_cm_id *cm_id, struct iw_cm_conn_param *conn_param) { int err = 0; struct c4iw_dev *dev = to_c4iw_dev(cm_id->device); struct c4iw_ep *ep = NULL; struct rtentry *rt; struct toedev *tdev; CTR2(KTR_IW_CXGBE, "%s:ccB %p", __func__, cm_id); if ((conn_param->ord > c4iw_max_read_depth) || (conn_param->ird > c4iw_max_read_depth)) { CTR2(KTR_IW_CXGBE, "%s:cc1 %p", __func__, cm_id); err = -EINVAL; goto out; } ep = alloc_ep(sizeof(*ep), M_NOWAIT); if (!ep) { CTR2(KTR_IW_CXGBE, "%s:cc2 %p", __func__, cm_id); printk(KERN_ERR MOD "%s - cannot alloc ep.\n", __func__); err = -ENOMEM; goto out; } init_timer(&ep->timer); ep->plen = conn_param->private_data_len; if (ep->plen) { CTR2(KTR_IW_CXGBE, "%s:cc3 %p", __func__, ep); memcpy(ep->mpa_pkt + sizeof(struct mpa_message), conn_param->private_data, ep->plen); } ep->ird = conn_param->ird; ep->ord = conn_param->ord; if (peer2peer && ep->ord == 0) { CTR2(KTR_IW_CXGBE, "%s:cc4 %p", __func__, ep); ep->ord = 1; } cm_id->add_ref(cm_id); ep->com.dev = dev; ep->com.cm_id = cm_id; ep->com.qp = get_qhp(dev, conn_param->qpn); if (!ep->com.qp) { CTR2(KTR_IW_CXGBE, "%s:cc5 %p", __func__, ep); err = -EINVAL; goto fail2; } ep->com.thread = curthread; ep->com.so = cm_id->so; init_sock(&ep->com); /* find a route */ rt = find_route( cm_id->local_addr.sin_addr.s_addr, cm_id->remote_addr.sin_addr.s_addr, cm_id->local_addr.sin_port, cm_id->remote_addr.sin_port, 0); if (!rt) { CTR2(KTR_IW_CXGBE, "%s:cc7 %p", __func__, ep); printk(KERN_ERR MOD "%s - cannot find route.\n", __func__); err = -EHOSTUNREACH; goto fail2; } if (!(rt->rt_ifp->if_capenable & IFCAP_TOE)) { CTR2(KTR_IW_CXGBE, "%s:cc8 %p", __func__, ep); printf("%s - interface not TOE capable.\n", __func__); close_socket(&ep->com, 0); err = -ENOPROTOOPT; goto fail3; } tdev = TOEDEV(rt->rt_ifp); if (tdev == NULL) { CTR2(KTR_IW_CXGBE, "%s:cc9 %p", __func__, ep); printf("%s - No toedev for interface.\n", __func__); goto fail3; } RTFREE(rt); state_set(&ep->com, CONNECTING); ep->tos = 0; ep->com.local_addr = cm_id->local_addr; ep->com.remote_addr = cm_id->remote_addr; err = soconnect(ep->com.so, (struct sockaddr *)&ep->com.remote_addr, ep->com.thread); if (!err) { CTR2(KTR_IW_CXGBE, "%s:cca %p", __func__, ep); goto out; } else { close_socket(&ep->com, 0); goto fail2; } fail3: CTR2(KTR_IW_CXGBE, "%s:ccb %p", __func__, ep); RTFREE(rt); fail2: cm_id->rem_ref(cm_id); c4iw_put_ep(&ep->com); out: 
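	/*
	 * Both the success and the error paths return through here; on
	 * success the connection completes asynchronously and is reported
	 * via the socket upcall (process_connected()).
	 */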
CTR2(KTR_IW_CXGBE, "%s:ccE %p", __func__, ep); return err; } /* * iwcm->create_listen. Returns -errno on failure. */ int c4iw_create_listen(struct iw_cm_id *cm_id, int backlog) { int rc; struct c4iw_dev *dev = to_c4iw_dev(cm_id->device); struct c4iw_listen_ep *ep; struct socket *so = cm_id->so; ep = alloc_ep(sizeof(*ep), GFP_KERNEL); CTR5(KTR_IW_CXGBE, "%s: cm_id %p, lso %p, ep %p, inp %p", __func__, cm_id, so, ep, so->so_pcb); if (ep == NULL) { log(LOG_ERR, "%s: failed to alloc memory for endpoint\n", __func__); rc = ENOMEM; goto failed; } cm_id->add_ref(cm_id); ep->com.cm_id = cm_id; ep->com.dev = dev; ep->backlog = backlog; ep->com.local_addr = cm_id->local_addr; ep->com.thread = curthread; state_set(&ep->com, LISTEN); ep->com.so = so; init_sock(&ep->com); rc = solisten(so, ep->backlog, ep->com.thread); if (rc != 0) { log(LOG_ERR, "%s: failed to start listener: %d\n", __func__, rc); close_socket(&ep->com, 0); cm_id->rem_ref(cm_id); c4iw_put_ep(&ep->com); goto failed; } cm_id->provider_data = ep; return (0); failed: CTR3(KTR_IW_CXGBE, "%s: cm_id %p, FAILED (%d)", __func__, cm_id, rc); return (-rc); } int c4iw_destroy_listen(struct iw_cm_id *cm_id) { int rc; struct c4iw_listen_ep *ep = to_listen_ep(cm_id); CTR4(KTR_IW_CXGBE, "%s: cm_id %p, so %p, inp %p", __func__, cm_id, cm_id->so, cm_id->so->so_pcb); state_set(&ep->com, DEAD); rc = close_socket(&ep->com, 0); cm_id->rem_ref(cm_id); c4iw_put_ep(&ep->com); return (rc); } int c4iw_ep_disconnect(struct c4iw_ep *ep, int abrupt, gfp_t gfp) { int ret = 0; int close = 0; int fatal = 0; struct c4iw_rdev *rdev; mutex_lock(&ep->com.mutex); CTR2(KTR_IW_CXGBE, "%s:cedB %p", __func__, ep); rdev = &ep->com.dev->rdev; if (c4iw_fatal_error(rdev)) { CTR2(KTR_IW_CXGBE, "%s:ced1 %p", __func__, ep); fatal = 1; close_complete_upcall(ep, -EIO); ep->com.state = DEAD; } CTR3(KTR_IW_CXGBE, "%s:ced2 %p %s", __func__, ep, states[ep->com.state]); switch (ep->com.state) { case MPA_REQ_WAIT: case MPA_REQ_SENT: case MPA_REQ_RCVD: case MPA_REP_SENT: case FPDU_MODE: close = 1; if (abrupt) ep->com.state = ABORTING; else { ep->com.state = CLOSING; START_EP_TIMER(ep); } set_bit(CLOSE_SENT, &ep->com.flags); break; case CLOSING: if (!test_and_set_bit(CLOSE_SENT, &ep->com.flags)) { close = 1; if (abrupt) { STOP_EP_TIMER(ep); ep->com.state = ABORTING; } else ep->com.state = MORIBUND; } break; case MORIBUND: case ABORTING: case DEAD: CTR3(KTR_IW_CXGBE, "%s ignoring disconnect ep %p state %u", __func__, ep, ep->com.state); break; default: BUG(); break; } mutex_unlock(&ep->com.mutex); if (close) { CTR2(KTR_IW_CXGBE, "%s:ced3 %p", __func__, ep); if (abrupt) { CTR2(KTR_IW_CXGBE, "%s:ced4 %p", __func__, ep); set_bit(EP_DISC_ABORT, &ep->com.history); ret = abort_connection(ep); } else { CTR2(KTR_IW_CXGBE, "%s:ced5 %p", __func__, ep); set_bit(EP_DISC_CLOSE, &ep->com.history); if (!ep->parent_ep) __state_set(&ep->com, MORIBUND); ret = shutdown_socket(&ep->com); } if (ret) { fatal = 1; } } if (fatal) { release_ep_resources(ep); CTR2(KTR_IW_CXGBE, "%s:ced6 %p", __func__, ep); } CTR2(KTR_IW_CXGBE, "%s:cedE %p", __func__, ep); return ret; } #ifdef C4IW_EP_REDIRECT int c4iw_ep_redirect(void *ctx, struct dst_entry *old, struct dst_entry *new, struct l2t_entry *l2t) { struct c4iw_ep *ep = ctx; if (ep->dst != old) return 0; PDBG("%s ep %p redirect to dst %p l2t %p\n", __func__, ep, new, l2t); dst_hold(new); cxgb4_l2t_release(ep->l2t); ep->l2t = l2t; dst_release(old); ep->dst = new; return 1; } #endif static void ep_timeout(unsigned long arg) { struct c4iw_ep *ep = (struct c4iw_ep *)arg; int 
kickit = 0; CTR2(KTR_IW_CXGBE, "%s:etB %p", __func__, ep); spin_lock(&timeout_lock); if (!test_and_set_bit(TIMEOUT, &ep->com.flags)) { list_add_tail(&ep->entry, &timeout_list); kickit = 1; } spin_unlock(&timeout_lock); if (kickit) { CTR2(KTR_IW_CXGBE, "%s:et1 %p", __func__, ep); queue_work(c4iw_taskq, &c4iw_task); } CTR2(KTR_IW_CXGBE, "%s:etE %p", __func__, ep); } static int fw6_wr_rpl(struct adapter *sc, const __be64 *rpl) { uint64_t val = be64toh(*rpl); int ret; struct c4iw_wr_wait *wr_waitp; ret = (int)((val >> 8) & 0xff); wr_waitp = (struct c4iw_wr_wait *)rpl[1]; CTR3(KTR_IW_CXGBE, "%s wr_waitp %p ret %u", __func__, wr_waitp, ret); if (wr_waitp) c4iw_wake_up(wr_waitp, ret ? -ret : 0); return (0); } static int fw6_cqe_handler(struct adapter *sc, const __be64 *rpl) { struct t4_cqe cqe =*(const struct t4_cqe *)(&rpl[0]); CTR2(KTR_IW_CXGBE, "%s rpl %p", __func__, rpl); c4iw_ev_dispatch(sc->iwarp_softc, &cqe); return (0); } static int terminate(struct sge_iq *iq, const struct rss_header *rss, struct mbuf *m) { struct adapter *sc = iq->adapter; const struct cpl_rdma_terminate *rpl = (const void *)(rss + 1); unsigned int tid = GET_TID(rpl); struct c4iw_qp_attributes attrs; struct toepcb *toep = lookup_tid(sc, tid); struct socket *so = inp_inpcbtosocket(toep->inp); struct c4iw_ep *ep = so->so_rcv.sb_upcallarg; CTR2(KTR_IW_CXGBE, "%s:tB %p %d", __func__, ep); if (ep && ep->com.qp) { printk(KERN_WARNING MOD "TERM received tid %u qpid %u\n", tid, ep->com.qp->wq.sq.qid); attrs.next_state = C4IW_QP_STATE_TERMINATE; c4iw_modify_qp(ep->com.dev, ep->com.qp, C4IW_QP_ATTR_NEXT_STATE, &attrs, 1); } else printk(KERN_WARNING MOD "TERM received tid %u no ep/qp\n", tid); CTR2(KTR_IW_CXGBE, "%s:tE %p %d", __func__, ep); return 0; } void c4iw_cm_init_cpl(struct adapter *sc) { t4_register_cpl_handler(sc, CPL_RDMA_TERMINATE, terminate); t4_register_fw_msg_handler(sc, FW6_TYPE_WR_RPL, fw6_wr_rpl); t4_register_fw_msg_handler(sc, FW6_TYPE_CQE, fw6_cqe_handler); t4_register_an_handler(sc, c4iw_ev_handler); } void c4iw_cm_term_cpl(struct adapter *sc) { t4_register_cpl_handler(sc, CPL_RDMA_TERMINATE, NULL); t4_register_fw_msg_handler(sc, FW6_TYPE_WR_RPL, NULL); t4_register_fw_msg_handler(sc, FW6_TYPE_CQE, NULL); } int __init c4iw_cm_init(void) { TAILQ_INIT(&req_list); spin_lock_init(&req_lock); INIT_LIST_HEAD(&timeout_list); spin_lock_init(&timeout_lock); INIT_WORK(&c4iw_task, process_req); c4iw_taskq = create_singlethread_workqueue("iw_cxgbe"); if (!c4iw_taskq) return -ENOMEM; return 0; } void __exit c4iw_cm_term(void) { WARN_ON(!TAILQ_EMPTY(&req_list)); WARN_ON(!list_empty(&timeout_list)); flush_workqueue(c4iw_taskq); destroy_workqueue(c4iw_taskq); } #endif Index: projects/ifnet/sys/dev/fdt/simplebus.c =================================================================== --- projects/ifnet/sys/dev/fdt/simplebus.c (revision 277106) +++ projects/ifnet/sys/dev/fdt/simplebus.c (revision 277107) @@ -1,414 +1,385 @@ /*- * Copyright (c) 2013 Nathan Whitehorn * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. 
* * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include struct simplebus_range { uint64_t bus; uint64_t host; uint64_t size; }; struct simplebus_softc { device_t dev; phandle_t node; struct simplebus_range *ranges; int nranges; pcell_t acells, scells; }; struct simplebus_devinfo { struct ofw_bus_devinfo obdinfo; struct resource_list rl; }; /* * Bus interface. */ static int simplebus_probe(device_t dev); static int simplebus_attach(device_t dev); static struct resource *simplebus_alloc_resource(device_t, device_t, int, int *, u_long, u_long, u_long, u_int); static void simplebus_probe_nomatch(device_t bus, device_t child); static int simplebus_print_child(device_t bus, device_t child); /* * ofw_bus interface */ static const struct ofw_bus_devinfo *simplebus_get_devinfo(device_t bus, device_t child); /* * local methods */ static int simplebus_fill_ranges(phandle_t node, struct simplebus_softc *sc); static struct simplebus_devinfo *simplebus_setup_dinfo(device_t dev, phandle_t node); /* * Driver methods. 
*/ static device_method_t simplebus_methods[] = { /* Device interface */ DEVMETHOD(device_probe, simplebus_probe), DEVMETHOD(device_attach, simplebus_attach), /* Bus interface */ DEVMETHOD(bus_print_child, simplebus_print_child), DEVMETHOD(bus_probe_nomatch, simplebus_probe_nomatch), DEVMETHOD(bus_setup_intr, bus_generic_setup_intr), DEVMETHOD(bus_teardown_intr, bus_generic_teardown_intr), DEVMETHOD(bus_alloc_resource, simplebus_alloc_resource), DEVMETHOD(bus_release_resource, bus_generic_release_resource), DEVMETHOD(bus_activate_resource, bus_generic_activate_resource), DEVMETHOD(bus_deactivate_resource, bus_generic_deactivate_resource), DEVMETHOD(bus_adjust_resource, bus_generic_adjust_resource), DEVMETHOD(bus_child_pnpinfo_str, ofw_bus_gen_child_pnpinfo_str), /* ofw_bus interface */ DEVMETHOD(ofw_bus_get_devinfo, simplebus_get_devinfo), DEVMETHOD(ofw_bus_get_compat, ofw_bus_gen_get_compat), DEVMETHOD(ofw_bus_get_model, ofw_bus_gen_get_model), DEVMETHOD(ofw_bus_get_name, ofw_bus_gen_get_name), DEVMETHOD(ofw_bus_get_node, ofw_bus_gen_get_node), DEVMETHOD(ofw_bus_get_type, ofw_bus_gen_get_type), DEVMETHOD_END }; static driver_t simplebus_driver = { "simplebus", simplebus_methods, sizeof(struct simplebus_softc) }; static devclass_t simplebus_devclass; EARLY_DRIVER_MODULE(simplebus, ofwbus, simplebus_driver, simplebus_devclass, 0, 0, BUS_PASS_BUS); EARLY_DRIVER_MODULE(simplebus, simplebus, simplebus_driver, simplebus_devclass, 0, 0, BUS_PASS_BUS + BUS_PASS_ORDER_MIDDLE); static int simplebus_probe(device_t dev) { if (!ofw_bus_status_okay(dev)) return (ENXIO); /* * FDT data puts a "simple-bus" compatible string on many things that * have children but aren't really busses in our world. Without a * ranges property we will fail to attach, so just fail to probe too. 
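	 * Nodes with a device_type of "soc" are still probed below even
	 * without a "ranges" property.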
*/ if (!(ofw_bus_is_compatible(dev, "simple-bus") && ofw_bus_has_prop(dev, "ranges")) && (ofw_bus_get_type(dev) == NULL || strcmp(ofw_bus_get_type(dev), "soc") != 0)) return (ENXIO); device_set_desc(dev, "Flattened device tree simple bus"); return (BUS_PROBE_GENERIC); } static int simplebus_attach(device_t dev) { struct simplebus_softc *sc; struct simplebus_devinfo *di; phandle_t node; device_t cdev; node = ofw_bus_get_node(dev); sc = device_get_softc(dev); sc->dev = dev; sc->node = node; /* * Some important numbers */ sc->acells = 2; OF_getencprop(node, "#address-cells", &sc->acells, sizeof(sc->acells)); sc->scells = 1; OF_getencprop(node, "#size-cells", &sc->scells, sizeof(sc->scells)); if (simplebus_fill_ranges(node, sc) < 0) { device_printf(dev, "could not get ranges\n"); return (ENXIO); } /* * In principle, simplebus could have an interrupt map, but ignore that * for now */ for (node = OF_child(node); node > 0; node = OF_peer(node)) { if ((di = simplebus_setup_dinfo(dev, node)) == NULL) continue; cdev = device_add_child(dev, NULL, -1); if (cdev == NULL) { device_printf(dev, "<%s>: device_add_child failed\n", di->obdinfo.obd_name); resource_list_free(&di->rl); ofw_bus_gen_destroy_devinfo(&di->obdinfo); free(di, M_DEVBUF); continue; } device_set_ivars(cdev, di); } return (bus_generic_attach(dev)); } static int simplebus_fill_ranges(phandle_t node, struct simplebus_softc *sc) { int host_address_cells; cell_t *base_ranges; ssize_t nbase_ranges; int err; int i, j, k; err = OF_searchencprop(OF_parent(node), "#address-cells", &host_address_cells, sizeof(host_address_cells)); if (err <= 0) return (-1); nbase_ranges = OF_getproplen(node, "ranges"); if (nbase_ranges < 0) return (-1); sc->nranges = nbase_ranges / sizeof(cell_t) / (sc->acells + host_address_cells + sc->scells); if (sc->nranges == 0) return (0); sc->ranges = malloc(sc->nranges * sizeof(sc->ranges[0]), M_DEVBUF, M_WAITOK); base_ranges = malloc(nbase_ranges, M_DEVBUF, M_WAITOK); OF_getencprop(node, "ranges", base_ranges, nbase_ranges); for (i = 0, j = 0; i < sc->nranges; i++) { sc->ranges[i].bus = 0; for (k = 0; k < sc->acells; k++) { sc->ranges[i].bus <<= 32; sc->ranges[i].bus |= base_ranges[j++]; } sc->ranges[i].host = 0; for (k = 0; k < host_address_cells; k++) { sc->ranges[i].host <<= 32; sc->ranges[i].host |= base_ranges[j++]; } sc->ranges[i].size = 0; for (k = 0; k < sc->scells; k++) { sc->ranges[i].size <<= 32; sc->ranges[i].size |= base_ranges[j++]; } } free(base_ranges, M_DEVBUF); return (sc->nranges); } static struct simplebus_devinfo * simplebus_setup_dinfo(device_t dev, phandle_t node) { struct simplebus_softc *sc; struct simplebus_devinfo *ndi; - uint32_t *reg; - uint64_t phys, size; - int i, j, k; - int nreg; sc = device_get_softc(dev); ndi = malloc(sizeof(*ndi), M_DEVBUF, M_WAITOK | M_ZERO); if (ofw_bus_gen_setup_devinfo(&ndi->obdinfo, node) != 0) { free(ndi, M_DEVBUF); return (NULL); } resource_list_init(&ndi->rl); - nreg = OF_getencprop_alloc(node, "reg", sizeof(*reg), (void **)®); - if (nreg == -1) - nreg = 0; - if (nreg % (sc->acells + sc->scells) != 0) { - if (bootverbose) - device_printf(dev, "Malformed reg property on <%s>\n", - ndi->obdinfo.obd_name); - nreg = 0; - } - - for (i = 0, k = 0; i < nreg; i += sc->acells + sc->scells, k++) { - phys = size = 0; - for (j = 0; j < sc->acells; j++) { - phys <<= 32; - phys |= reg[i + j]; - } - for (j = 0; j < sc->scells; j++) { - size <<= 32; - size |= reg[i + sc->acells + j]; - } - - resource_list_add(&ndi->rl, SYS_RES_MEMORY, k, - phys, phys + size - 1, size); - } - 
free(reg, M_OFWPROP); - + ofw_bus_reg_to_rl(dev, node, sc->acells, sc->scells, &ndi->rl); ofw_bus_intr_to_rl(dev, node, &ndi->rl); return (ndi); } static const struct ofw_bus_devinfo * simplebus_get_devinfo(device_t bus __unused, device_t child) { struct simplebus_devinfo *ndi; ndi = device_get_ivars(child); return (&ndi->obdinfo); } static struct resource * simplebus_alloc_resource(device_t bus, device_t child, int type, int *rid, u_long start, u_long end, u_long count, u_int flags) { struct simplebus_softc *sc; struct simplebus_devinfo *di; struct resource_list_entry *rle; int j; sc = device_get_softc(bus); /* * Request for the default allocation with a given rid: use resource * list stored in the local device info. */ if ((start == 0UL) && (end == ~0UL)) { if ((di = device_get_ivars(child)) == NULL) return (NULL); if (type == SYS_RES_IOPORT) type = SYS_RES_MEMORY; rle = resource_list_find(&di->rl, type, *rid); if (rle == NULL) { if (bootverbose) device_printf(bus, "no default resources for " "rid = %d, type = %d\n", *rid, type); return (NULL); } start = rle->start; end = rle->end; count = rle->count; } if (type == SYS_RES_MEMORY) { /* Remap through ranges property */ for (j = 0; j < sc->nranges; j++) { if (start >= sc->ranges[j].bus && end < sc->ranges[j].bus + sc->ranges[j].size) { start -= sc->ranges[j].bus; start += sc->ranges[j].host; end -= sc->ranges[j].bus; end += sc->ranges[j].host; break; } } if (j == sc->nranges && sc->nranges != 0) { if (bootverbose) device_printf(bus, "Could not map resource " "%#lx-%#lx\n", start, end); return (NULL); } } return (bus_generic_alloc_resource(bus, child, type, rid, start, end, count, flags)); } static int simplebus_print_res(struct simplebus_devinfo *di) { int rv; rv = 0; rv += resource_list_print_type(&di->rl, "mem", SYS_RES_MEMORY, "%#lx"); rv += resource_list_print_type(&di->rl, "irq", SYS_RES_IRQ, "%ld"); return (rv); } static void simplebus_probe_nomatch(device_t bus, device_t child) { const char *name, *type, *compat; if (!bootverbose) return; name = ofw_bus_get_name(child); type = ofw_bus_get_type(child); compat = ofw_bus_get_compat(child); device_printf(bus, "<%s>", name != NULL ? name : "unknown"); simplebus_print_res(device_get_ivars(child)); if (!ofw_bus_status_okay(child)) printf(" disabled"); if (type) printf(" type %s", type); if (compat) printf(" compat %s", compat); printf(" (no driver attached)\n"); } static int simplebus_print_child(device_t bus, device_t child) { int rv; rv = bus_print_child_header(bus, child); rv += simplebus_print_res(device_get_ivars(child)); if (!ofw_bus_status_okay(child)) rv += printf(" disabled"); rv += bus_print_child_footer(bus, child); return (rv); } Index: projects/ifnet/sys/dev/mii/e1000phy.c =================================================================== --- projects/ifnet/sys/dev/mii/e1000phy.c (revision 277106) +++ projects/ifnet/sys/dev/mii/e1000phy.c (revision 277107) @@ -1,500 +1,499 @@ /*- * Principal Author: Parag Patel * Copyright (c) 2001 * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice unmodified, this list of conditions, and the following * disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. 
* * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * Additional Copyright (c) 2001 by Traakan Software under same licence. * Secondary Author: Matthew Jacob */ #include __FBSDID("$FreeBSD$"); /* * driver for the Marvell 88E1000 series external 1000/100/10-BT PHY. */ /* * Support added for the Marvell 88E1011 (Alaska) 1000/100/10baseTX and * 1000baseSX PHY. * Nathan Binkert * Jung-uk Kim */ #include #include #include #include #include #include #include #include #include #include #include "miidevs.h" #include #include "miibus_if.h" static int e1000phy_probe(device_t); static int e1000phy_attach(device_t); static device_method_t e1000phy_methods[] = { /* device interface */ DEVMETHOD(device_probe, e1000phy_probe), DEVMETHOD(device_attach, e1000phy_attach), DEVMETHOD(device_detach, mii_phy_detach), DEVMETHOD(device_shutdown, bus_generic_shutdown), DEVMETHOD_END }; static devclass_t e1000phy_devclass; static driver_t e1000phy_driver = { "e1000phy", e1000phy_methods, sizeof(struct mii_softc) }; DRIVER_MODULE(e1000phy, miibus, e1000phy_driver, e1000phy_devclass, 0, 0); static int e1000phy_service(struct mii_softc *, struct mii_data *, int); static void e1000phy_status(struct mii_softc *); static void e1000phy_reset(struct mii_softc *); static int e1000phy_mii_phy_auto(struct mii_softc *, int); static const struct mii_phydesc e1000phys[] = { MII_PHY_DESC(MARVELL, E1000), MII_PHY_DESC(MARVELL, E1011), MII_PHY_DESC(MARVELL, E1000_3), MII_PHY_DESC(MARVELL, E1000_5), MII_PHY_DESC(MARVELL, E1111), MII_PHY_DESC(xxMARVELL, E1000), MII_PHY_DESC(xxMARVELL, E1011), MII_PHY_DESC(xxMARVELL, E1000_3), MII_PHY_DESC(xxMARVELL, E1000S), MII_PHY_DESC(xxMARVELL, E1000_5), MII_PHY_DESC(xxMARVELL, E1101), MII_PHY_DESC(xxMARVELL, E3082), MII_PHY_DESC(xxMARVELL, E1112), MII_PHY_DESC(xxMARVELL, E1149), MII_PHY_DESC(xxMARVELL, E1111), MII_PHY_DESC(xxMARVELL, E1116), MII_PHY_DESC(xxMARVELL, E1116R), MII_PHY_DESC(xxMARVELL, E1116R_29), MII_PHY_DESC(xxMARVELL, E1118), MII_PHY_DESC(xxMARVELL, E1145), MII_PHY_DESC(xxMARVELL, E1149R), MII_PHY_DESC(xxMARVELL, E3016), MII_PHY_DESC(xxMARVELL, PHYG65G), MII_PHY_END }; static const struct mii_phy_funcs e1000phy_funcs = { e1000phy_service, e1000phy_status, e1000phy_reset }; static int e1000phy_probe(device_t dev) { return (mii_phy_dev_probe(dev, e1000phys, BUS_PROBE_DEFAULT)); } static int e1000phy_attach(device_t dev) { struct mii_softc *sc; - if_t ifp; sc = device_get_softc(dev); mii_phy_dev_attach(dev, MIIF_NOMANPAUSE, &e1000phy_funcs, 0); - ifp = sc->mii_pdata->mii_ifp; - if (mii_dev_mac_match(dev, "msk") && (sc->mii_flags & MIIF_MACPRIV0) != 0) + if (mii_dev_mac_match(dev, "msk") && + (sc->mii_flags & MIIF_MACPRIV0) != 0) sc->mii_flags |= MIIF_PHYPRIV0; switch (sc->mii_mpd_model) { case MII_MODEL_xxMARVELL_E1011: case MII_MODEL_xxMARVELL_E1112: if (PHY_READ(sc, E1000_ESSR) & 
E1000_ESSR_FIBER_LINK) sc->mii_flags |= MIIF_HAVEFIBER; break; case MII_MODEL_xxMARVELL_E1149: case MII_MODEL_xxMARVELL_E1149R: /* * Some 88E1149 PHY's page select is initialized to * point to other bank instead of copper/fiber bank * which in turn resulted in wrong registers were * accessed during PHY operation. It is believed that * page 0 should be used for copper PHY so reinitialize * E1000_EADR to select default copper PHY. If parent * device know the type of PHY(either copper or fiber), * that information should be used to select default * type of PHY. */ PHY_WRITE(sc, E1000_EADR, 0); break; } PHY_RESET(sc); sc->mii_capabilities = PHY_READ(sc, MII_BMSR) & sc->mii_capmask; if (sc->mii_capabilities & BMSR_EXTSTAT) { sc->mii_extcapabilities = PHY_READ(sc, MII_EXTSR); if ((sc->mii_extcapabilities & (EXTSR_1000TFDX | EXTSR_1000THDX)) != 0) sc->mii_flags |= MIIF_HAVE_GTCR; } device_printf(dev, " "); mii_phy_add_media(sc); printf("\n"); MIIBUS_MEDIAINIT(sc->mii_dev); return (0); } static void e1000phy_reset(struct mii_softc *sc) { uint16_t reg, page; reg = PHY_READ(sc, E1000_SCR); if ((sc->mii_flags & MIIF_HAVEFIBER) != 0) { reg &= ~E1000_SCR_AUTO_X_MODE; PHY_WRITE(sc, E1000_SCR, reg); if (sc->mii_mpd_model == MII_MODEL_xxMARVELL_E1112) { /* Select 1000BASE-X only mode. */ page = PHY_READ(sc, E1000_EADR); PHY_WRITE(sc, E1000_EADR, 2); reg = PHY_READ(sc, E1000_SCR); reg &= ~E1000_SCR_MODE_MASK; reg |= E1000_SCR_MODE_1000BX; PHY_WRITE(sc, E1000_SCR, reg); if ((sc->mii_flags & MIIF_PHYPRIV0) != 0) { /* Set SIGDET polarity low for SFP module. */ PHY_WRITE(sc, E1000_EADR, 1); reg = PHY_READ(sc, E1000_SCR); reg |= E1000_SCR_FIB_SIGDET_POLARITY; PHY_WRITE(sc, E1000_SCR, reg); } PHY_WRITE(sc, E1000_EADR, page); } } else { switch (sc->mii_mpd_model) { case MII_MODEL_xxMARVELL_E1111: case MII_MODEL_xxMARVELL_E1112: case MII_MODEL_xxMARVELL_E1116: case MII_MODEL_xxMARVELL_E1116R_29: case MII_MODEL_xxMARVELL_E1118: case MII_MODEL_xxMARVELL_E1149: case MII_MODEL_xxMARVELL_E1149R: case MII_MODEL_xxMARVELL_PHYG65G: /* Disable energy detect mode. */ reg &= ~E1000_SCR_EN_DETECT_MASK; reg |= E1000_SCR_AUTO_X_MODE; if (sc->mii_mpd_model == MII_MODEL_xxMARVELL_E1116 || sc->mii_mpd_model == MII_MODEL_xxMARVELL_E1116R_29) reg &= ~E1000_SCR_POWER_DOWN; reg |= E1000_SCR_ASSERT_CRS_ON_TX; break; case MII_MODEL_xxMARVELL_E3082: reg |= (E1000_SCR_AUTO_X_MODE >> 1); reg |= E1000_SCR_ASSERT_CRS_ON_TX; break; case MII_MODEL_xxMARVELL_E3016: reg |= E1000_SCR_AUTO_MDIX; reg &= ~(E1000_SCR_EN_DETECT | E1000_SCR_SCRAMBLER_DISABLE); reg |= E1000_SCR_LPNP; /* XXX Enable class A driver for Yukon FE+ A0. */ PHY_WRITE(sc, 0x1C, PHY_READ(sc, 0x1C) | 0x0001); break; default: reg &= ~E1000_SCR_AUTO_X_MODE; reg |= E1000_SCR_ASSERT_CRS_ON_TX; break; } if (sc->mii_mpd_model != MII_MODEL_xxMARVELL_E3016) { /* Auto correction for reversed cable polarity. 
*/ reg &= ~E1000_SCR_POLARITY_REVERSAL; } PHY_WRITE(sc, E1000_SCR, reg); if (sc->mii_mpd_model == MII_MODEL_xxMARVELL_E1116 || sc->mii_mpd_model == MII_MODEL_xxMARVELL_E1116R_29 || sc->mii_mpd_model == MII_MODEL_xxMARVELL_E1149 || sc->mii_mpd_model == MII_MODEL_xxMARVELL_E1149R) { PHY_WRITE(sc, E1000_EADR, 2); reg = PHY_READ(sc, E1000_SCR); reg |= E1000_SCR_RGMII_POWER_UP; PHY_WRITE(sc, E1000_SCR, reg); PHY_WRITE(sc, E1000_EADR, 0); } } switch (sc->mii_mpd_model) { case MII_MODEL_xxMARVELL_E3082: case MII_MODEL_xxMARVELL_E1112: case MII_MODEL_xxMARVELL_E1118: break; case MII_MODEL_xxMARVELL_E1116: case MII_MODEL_xxMARVELL_E1116R_29: page = PHY_READ(sc, E1000_EADR); /* Select page 3, LED control register. */ PHY_WRITE(sc, E1000_EADR, 3); PHY_WRITE(sc, E1000_SCR, E1000_SCR_LED_LOS(1) | /* Link/Act */ E1000_SCR_LED_INIT(8) | /* 10Mbps */ E1000_SCR_LED_STAT1(7) | /* 100Mbps */ E1000_SCR_LED_STAT0(7)); /* 1000Mbps */ /* Set blink rate. */ PHY_WRITE(sc, E1000_IER, E1000_PULSE_DUR(E1000_PULSE_170MS) | E1000_BLINK_RATE(E1000_BLINK_84MS)); PHY_WRITE(sc, E1000_EADR, page); break; case MII_MODEL_xxMARVELL_E3016: /* LED2 -> ACT, LED1 -> LINK, LED0 -> SPEED. */ PHY_WRITE(sc, 0x16, 0x0B << 8 | 0x05 << 4 | 0x04); /* Integrated register calibration workaround. */ PHY_WRITE(sc, 0x1D, 17); PHY_WRITE(sc, 0x1E, 0x3F60); break; default: /* Force TX_CLK to 25MHz clock. */ reg = PHY_READ(sc, E1000_ESCR); reg |= E1000_ESCR_TX_CLK_25; PHY_WRITE(sc, E1000_ESCR, reg); break; } /* Reset the PHY so all changes take effect. */ reg = PHY_READ(sc, E1000_CR); reg |= E1000_CR_RESET; PHY_WRITE(sc, E1000_CR, reg); } static int e1000phy_service(struct mii_softc *sc, struct mii_data *mii, int cmd) { struct ifmedia_entry *ife = mii->mii_media.ifm_cur; uint16_t speed, gig; int reg; switch (cmd) { case MII_POLLSTAT: break; case MII_MEDIACHG: if (IFM_SUBTYPE(ife->ifm_media) == IFM_AUTO) { e1000phy_mii_phy_auto(sc, ife->ifm_media); break; } speed = 0; switch (IFM_SUBTYPE(ife->ifm_media)) { case IFM_1000_T: if ((sc->mii_flags & MIIF_HAVE_GTCR) == 0) return (EINVAL); speed = E1000_CR_SPEED_1000; break; case IFM_1000_SX: if ((sc->mii_extcapabilities & (EXTSR_1000XFDX | EXTSR_1000XHDX)) == 0) return (EINVAL); speed = E1000_CR_SPEED_1000; break; case IFM_100_TX: speed = E1000_CR_SPEED_100; break; case IFM_10_T: speed = E1000_CR_SPEED_10; break; case IFM_NONE: reg = PHY_READ(sc, E1000_CR); PHY_WRITE(sc, E1000_CR, reg | E1000_CR_ISOLATE | E1000_CR_POWER_DOWN); goto done; default: return (EINVAL); } if ((ife->ifm_media & IFM_FDX) != 0) { speed |= E1000_CR_FULL_DUPLEX; gig = E1000_1GCR_1000T_FD; } else gig = E1000_1GCR_1000T; reg = PHY_READ(sc, E1000_CR); reg &= ~E1000_CR_AUTO_NEG_ENABLE; PHY_WRITE(sc, E1000_CR, reg | E1000_CR_RESET); if (IFM_SUBTYPE(ife->ifm_media) == IFM_1000_T) { gig |= E1000_1GCR_MS_ENABLE; if ((ife->ifm_media & IFM_ETH_MASTER) != 0) gig |= E1000_1GCR_MS_VALUE; } else if ((sc->mii_flags & MIIF_HAVE_GTCR) != 0) gig = 0; PHY_WRITE(sc, E1000_1GCR, gig); PHY_WRITE(sc, E1000_AR, E1000_AR_SELECTOR_FIELD); PHY_WRITE(sc, E1000_CR, speed | E1000_CR_RESET); done: break; case MII_TICK: /* * Only used for autonegotiation. */ if (IFM_SUBTYPE(ife->ifm_media) != IFM_AUTO) { sc->mii_ticks = 0; break; } /* * check for link. * Read the status register twice; BMSR_LINK is latch-low. */ reg = PHY_READ(sc, MII_BMSR) | PHY_READ(sc, MII_BMSR); if (reg & BMSR_LINK) { sc->mii_ticks = 0; break; } /* Announce link loss right after it happens. 
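		 * mii_ticks was zeroed while the link was up, so the first
		 * tick without link breaks out to the status update below;
		 * the PHY is reset and autonegotiation restarted only after
		 * mii_anegticks have elapsed.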
*/ if (sc->mii_ticks++ == 0) break; if (sc->mii_ticks <= sc->mii_anegticks) break; sc->mii_ticks = 0; PHY_RESET(sc); e1000phy_mii_phy_auto(sc, ife->ifm_media); break; } /* Update the media status. */ PHY_STATUS(sc); /* Callback if something changed. */ mii_phy_update(sc, cmd); return (0); } static void e1000phy_status(struct mii_softc *sc) { struct mii_data *mii = sc->mii_pdata; int bmcr, bmsr, ssr; mii->mii_media_status = IFM_AVALID; mii->mii_media_active = IFM_ETHER; bmsr = PHY_READ(sc, E1000_SR) | PHY_READ(sc, E1000_SR); bmcr = PHY_READ(sc, E1000_CR); ssr = PHY_READ(sc, E1000_SSR); if (bmsr & E1000_SR_LINK_STATUS) mii->mii_media_status |= IFM_ACTIVE; if (bmcr & E1000_CR_LOOPBACK) mii->mii_media_active |= IFM_LOOP; if ((bmcr & E1000_CR_AUTO_NEG_ENABLE) != 0 && (ssr & E1000_SSR_SPD_DPLX_RESOLVED) == 0) { /* Erg, still trying, I guess... */ mii->mii_media_active |= IFM_NONE; return; } if ((sc->mii_flags & MIIF_HAVEFIBER) == 0) { switch (ssr & E1000_SSR_SPEED) { case E1000_SSR_1000MBS: mii->mii_media_active |= IFM_1000_T; break; case E1000_SSR_100MBS: mii->mii_media_active |= IFM_100_TX; break; case E1000_SSR_10MBS: mii->mii_media_active |= IFM_10_T; break; default: mii->mii_media_active |= IFM_NONE; return; } } else { /* * Some fiber PHY(88E1112) does not seem to set resolved * speed so always assume we've got IFM_1000_SX. */ mii->mii_media_active |= IFM_1000_SX; } if (ssr & E1000_SSR_DUPLEX) { mii->mii_media_active |= IFM_FDX; if ((sc->mii_flags & MIIF_HAVEFIBER) == 0) mii->mii_media_active |= mii_phy_flowstatus(sc); } else mii->mii_media_active |= IFM_HDX; if (IFM_SUBTYPE(mii->mii_media_active) == IFM_1000_T) { if (((PHY_READ(sc, E1000_1GSR) | PHY_READ(sc, E1000_1GSR)) & E1000_1GSR_MS_CONFIG_RES) != 0) mii->mii_media_active |= IFM_ETH_MASTER; } } static int e1000phy_mii_phy_auto(struct mii_softc *sc, int media) { uint16_t reg; if ((sc->mii_flags & MIIF_HAVEFIBER) == 0) { reg = PHY_READ(sc, E1000_AR); reg &= ~(E1000_AR_PAUSE | E1000_AR_ASM_DIR); reg |= E1000_AR_10T | E1000_AR_10T_FD | E1000_AR_100TX | E1000_AR_100TX_FD; if ((media & IFM_FLOW) != 0 || (sc->mii_flags & MIIF_FORCEPAUSE) != 0) reg |= E1000_AR_PAUSE | E1000_AR_ASM_DIR; PHY_WRITE(sc, E1000_AR, reg | E1000_AR_SELECTOR_FIELD); } else PHY_WRITE(sc, E1000_AR, E1000_FA_1000X_FD | E1000_FA_1000X); if ((sc->mii_flags & MIIF_HAVE_GTCR) != 0) { reg = 0; if ((sc->mii_extcapabilities & EXTSR_1000TFDX) != 0) reg |= E1000_1GCR_1000T_FD; if ((sc->mii_extcapabilities & EXTSR_1000THDX) != 0) reg |= E1000_1GCR_1000T; PHY_WRITE(sc, E1000_1GCR, reg); } PHY_WRITE(sc, E1000_CR, E1000_CR_AUTO_NEG_ENABLE | E1000_CR_RESTART_AUTO_NEG); return (EJUSTRETURN); } Index: projects/ifnet/sys/dev/mii/miivar.h =================================================================== --- projects/ifnet/sys/dev/mii/miivar.h (revision 277106) +++ projects/ifnet/sys/dev/mii/miivar.h (revision 277107) @@ -1,288 +1,274 @@ /* $NetBSD: miivar.h,v 1.8 1999/04/23 04:24:32 thorpej Exp $ */ /*- * Copyright (c) 1998, 1999 The NetBSD Foundation, Inc. * All rights reserved. * * This code is derived from software contributed to The NetBSD Foundation * by Jason R. Thorpe of the Numerical Aerospace Simulation Facility, * NASA Ames Research Center. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. 
Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * * THIS SOFTWARE IS PROVIDED BY THE NETBSD FOUNDATION, INC. AND CONTRIBUTORS * ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED * TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE FOUNDATION OR CONTRIBUTORS * BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR * CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF * SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS * INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN * CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGE. * * $FreeBSD$ */ #ifndef _DEV_MII_MIIVAR_H_ #define _DEV_MII_MIIVAR_H_ #include #include /* XXX driver API temporary */ /* * Media Independent Interface data structure defintions */ struct mii_softc; /* - * Callbacks from MII layer into network interface device driver. - */ -typedef int (*mii_readreg_t)(struct device *, int, int); -typedef void (*mii_writereg_t)(struct device *, int, int, int); -typedef void (*mii_statchg_t)(struct device *); - -/* * A network interface driver has one of these structures in its softc. * It is the interface from the network interface driver to the MII * layer. */ struct mii_data { struct ifmedia mii_media; /* media information */ if_t mii_ifp; /* pointer back to network interface */ /* * For network interfaces with multiple PHYs, a list of all * PHYs is required so they can all be notified when a media * request is made. */ LIST_HEAD(mii_listhead, mii_softc) mii_phys; u_int mii_instance; /* * PHY driver fills this in with active media status. */ u_int mii_media_status; u_int mii_media_active; - - /* - * Calls from MII layer into network interface driver. - */ - mii_readreg_t mii_readreg; - mii_writereg_t mii_writereg; - mii_statchg_t mii_statchg; }; typedef struct mii_data mii_data_t; /* * Functions provided by the PHY to perform various functions. */ struct mii_phy_funcs { int (*pf_service)(struct mii_softc *, struct mii_data *, int); void (*pf_status)(struct mii_softc *); void (*pf_reset)(struct mii_softc *); }; /* * Requests that can be made to the downcall. */ #define MII_TICK 1 /* once-per-second tick */ #define MII_MEDIACHG 2 /* user changed media; perform the switch */ #define MII_POLLSTAT 3 /* user requested media status; fill it in */ /* * Each PHY driver's softc has one of these as the first member. * XXX This would be better named "phy_softc", but this is the name * XXX BSDI used, and we would like to have the same interface. */ struct mii_softc { device_t mii_dev; /* generic device glue */ LIST_ENTRY(mii_softc) mii_list; /* entry on parent's PHY list */ uint32_t mii_mpd_oui; /* the PHY's OUI (MII_OUI())*/ uint32_t mii_mpd_model; /* the PHY's model (MII_MODEL())*/ uint32_t mii_mpd_rev; /* the PHY's revision (MII_REV())*/ u_int mii_capmask; /* capability mask for BMSR */ u_int mii_phy; /* our MII address */ u_int mii_offset; /* first PHY, second PHY, etc. */ u_int mii_inst; /* instance for ifmedia */ /* Our PHY functions. */ const struct mii_phy_funcs *mii_funcs; struct mii_data *mii_pdata; /* pointer to parent's mii_data */ u_int mii_flags; /* misc. 
flags; see below */ u_int mii_capabilities; /* capabilities from BMSR */ u_int mii_extcapabilities; /* extended capabilities */ u_int mii_ticks; /* MII_TICK counter */ u_int mii_anegticks; /* ticks before retrying aneg */ u_int mii_media_active; /* last active media */ u_int mii_media_status; /* last active status */ }; typedef struct mii_softc mii_softc_t; /* mii_flags */ #define MIIF_INITDONE 0x00000001 /* has been initialized (mii_data) */ #define MIIF_NOISOLATE 0x00000002 /* do not isolate the PHY */ #if 0 #define MIIF_NOLOOP 0x00000004 /* no loopback capability */ #endif #define MIIF_DOINGAUTO 0x00000008 /* doing autonegotiation (mii_softc) */ #define MIIF_AUTOTSLEEP 0x00000010 /* use tsleep(), not callout() */ #define MIIF_HAVEFIBER 0x00000020 /* from parent: has fiber interface */ #define MIIF_HAVE_GTCR 0x00000040 /* has 100base-T2/1000base-T CR */ #define MIIF_IS_1000X 0x00000080 /* is a 1000BASE-X device */ #define MIIF_DOPAUSE 0x00000100 /* advertise PAUSE capability */ #define MIIF_IS_HPNA 0x00000200 /* is a HomePNA device */ #define MIIF_FORCEANEG 0x00000400 /* force auto-negotiation */ #define MIIF_NOMANPAUSE 0x00100000 /* no manual PAUSE selection */ #define MIIF_FORCEPAUSE 0x00200000 /* force PAUSE advertisement */ #define MIIF_MACPRIV0 0x01000000 /* private to the MAC driver */ #define MIIF_MACPRIV1 0x02000000 /* private to the MAC driver */ #define MIIF_MACPRIV2 0x04000000 /* private to the MAC driver */ #define MIIF_PHYPRIV0 0x10000000 /* private to the PHY driver */ #define MIIF_PHYPRIV1 0x20000000 /* private to the PHY driver */ #define MIIF_PHYPRIV2 0x40000000 /* private to the PHY driver */ /* Default mii_anegticks values */ #define MII_ANEGTICKS 5 #define MII_ANEGTICKS_GIGE 17 #define MIIF_INHERIT_MASK (MIIF_NOISOLATE|MIIF_NOLOOP|MIIF_AUTOTSLEEP) /* * Special `locators' passed to mii_attach(). If one of these is not * an `any' value, we look for *that* PHY and configure it. If both * are not `any', that is an error, and mii_attach() will fail. */ #define MII_OFFSET_ANY -1 #define MII_PHY_ANY -1 /* * Used to attach a PHY to a parent. */ struct mii_attach_args { struct mii_data *mii_data; /* pointer to parent data */ u_int mii_phyno; /* MII address */ u_int mii_offset; /* first PHY, second PHY, etc. */ uint32_t mii_id1; /* PHY ID register 1 */ uint32_t mii_id2; /* PHY ID register 2 */ u_int mii_capmask; /* capability mask for BMSR */ }; typedef struct mii_attach_args mii_attach_args_t; /* * Used to match a PHY. */ struct mii_phydesc { uint32_t mpd_oui; /* the PHY's OUI */ uint32_t mpd_model; /* the PHY's model */ const char *mpd_name; /* the PHY's name */ }; #define MII_PHY_DESC(a, b) { MII_OUI_ ## a, MII_MODEL_ ## a ## _ ## b, \ MII_STR_ ## a ## _ ## b } #define MII_PHY_END { 0, 0, NULL } /* * An array of these structures map MII media types to BMCR/ANAR settings. 
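 * It is indexed by the MII_MEDIA_* constants defined below;
 * mii_phy_setmedia() uses the entry for the currently selected
 * medium to program the BMCR, ANAR and (for 1000base-T media) GTCR
 * registers.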
*/ struct mii_media { u_int mm_bmcr; /* BMCR settings for this media */ u_int mm_anar; /* ANAR settings for this media */ u_int mm_gtcr; /* 100base-T2 or 1000base-T CR */ }; #define MII_MEDIA_NONE 0 #define MII_MEDIA_10_T 1 #define MII_MEDIA_10_T_FDX 2 #define MII_MEDIA_100_T4 3 #define MII_MEDIA_100_TX 4 #define MII_MEDIA_100_TX_FDX 5 #define MII_MEDIA_1000_X 6 #define MII_MEDIA_1000_X_FDX 7 #define MII_MEDIA_1000_T 8 #define MII_MEDIA_1000_T_FDX 9 #define MII_NMEDIA 10 #ifdef _KERNEL #define PHY_READ(p, r) \ MIIBUS_READREG((p)->mii_dev, (p)->mii_phy, (r)) #define PHY_WRITE(p, r, v) \ MIIBUS_WRITEREG((p)->mii_dev, (p)->mii_phy, (r), (v)) #define PHY_SERVICE(p, d, o) \ (*(p)->mii_funcs->pf_service)((p), (d), (o)) #define PHY_STATUS(p) \ (*(p)->mii_funcs->pf_status)(p) #define PHY_RESET(p) \ (*(p)->mii_funcs->pf_reset)(p) enum miibus_device_ivars { MIIBUS_IVAR_FLAGS }; /* * Simplified accessors for miibus */ #define MIIBUS_ACCESSOR(var, ivar, type) \ __BUS_ACCESSOR(miibus, var, MIIBUS, ivar, type) MIIBUS_ACCESSOR(flags, FLAGS, u_int) extern devclass_t miibus_devclass; extern driver_t miibus_driver; int mii_attach(device_t, device_t *, if_t, ifm_change_cb_t, ifm_stat_cb_t, int, int, int, int); void mii_down(struct mii_data *); int mii_mediachg(struct mii_data *); void mii_tick(struct mii_data *); void mii_pollstat(struct mii_data *); void mii_phy_add_media(struct mii_softc *); int mii_phy_auto(struct mii_softc *); int mii_phy_detach(device_t dev); void mii_phy_down(struct mii_softc *); u_int mii_phy_flowstatus(struct mii_softc *); void mii_phy_reset(struct mii_softc *); void mii_phy_setmedia(struct mii_softc *sc); void mii_phy_update(struct mii_softc *, int); int mii_phy_tick(struct mii_softc *); int mii_phy_mac_match(struct mii_softc *, const char *); int mii_dev_mac_match(device_t, const char *); void *mii_phy_mac_softc(struct mii_softc *); void *mii_dev_mac_softc(device_t); const struct mii_phydesc * mii_phy_match(const struct mii_attach_args *ma, const struct mii_phydesc *mpd); const struct mii_phydesc * mii_phy_match_gen(const struct mii_attach_args *ma, const struct mii_phydesc *mpd, size_t endlen); int mii_phy_dev_probe(device_t dev, const struct mii_phydesc *mpd, int mrv); void mii_phy_dev_attach(device_t dev, u_int flags, const struct mii_phy_funcs *mpf, int add_media); void ukphy_status(struct mii_softc *); u_int mii_oui(u_int, u_int); #define MII_OUI(id1, id2) mii_oui(id1, id2) #define MII_MODEL(id2) (((id2) & IDR2_MODEL) >> 4) #define MII_REV(id2) ((id2) & IDR2_REV) #endif /* _KERNEL */ #endif /* _DEV_MII_MIIVAR_H_ */ Index: projects/ifnet/sys/dev/ofw/ofw_bus_subr.c =================================================================== --- projects/ifnet/sys/dev/ofw/ofw_bus_subr.c (revision 277106) +++ projects/ifnet/sys/dev/ofw/ofw_bus_subr.c (revision 277107) @@ -1,441 +1,489 @@ /*- * Copyright (c) 2001 - 2003 by Thomas Moestl . * Copyright (c) 2005 Marius Strobl * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions, and the following disclaimer, * without modification, immediately at the beginning of the file. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in * the documentation and/or other materials provided with the * distribution. 
* * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR * ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include "opt_platform.h" #include #include #include #include #include #include #include #include #include #include "ofw_bus_if.h" int ofw_bus_gen_setup_devinfo(struct ofw_bus_devinfo *obd, phandle_t node) { if (obd == NULL) return (ENOMEM); /* The 'name' property is considered mandatory. */ if ((OF_getprop_alloc(node, "name", 1, (void **)&obd->obd_name)) == -1) return (EINVAL); OF_getprop_alloc(node, "compatible", 1, (void **)&obd->obd_compat); OF_getprop_alloc(node, "device_type", 1, (void **)&obd->obd_type); OF_getprop_alloc(node, "model", 1, (void **)&obd->obd_model); OF_getprop_alloc(node, "status", 1, (void **)&obd->obd_status); obd->obd_node = node; return (0); } void ofw_bus_gen_destroy_devinfo(struct ofw_bus_devinfo *obd) { if (obd == NULL) return; if (obd->obd_compat != NULL) free(obd->obd_compat, M_OFWPROP); if (obd->obd_model != NULL) free(obd->obd_model, M_OFWPROP); if (obd->obd_name != NULL) free(obd->obd_name, M_OFWPROP); if (obd->obd_type != NULL) free(obd->obd_type, M_OFWPROP); if (obd->obd_status != NULL) free(obd->obd_status, M_OFWPROP); } int ofw_bus_gen_child_pnpinfo_str(device_t cbdev, device_t child, char *buf, size_t buflen) { if (ofw_bus_get_name(child) != NULL) { strlcat(buf, "name=", buflen); strlcat(buf, ofw_bus_get_name(child), buflen); } if (ofw_bus_get_compat(child) != NULL) { strlcat(buf, " compat=", buflen); strlcat(buf, ofw_bus_get_compat(child), buflen); } return (0); }; const char * ofw_bus_gen_get_compat(device_t bus, device_t dev) { const struct ofw_bus_devinfo *obd; obd = OFW_BUS_GET_DEVINFO(bus, dev); if (obd == NULL) return (NULL); return (obd->obd_compat); } const char * ofw_bus_gen_get_model(device_t bus, device_t dev) { const struct ofw_bus_devinfo *obd; obd = OFW_BUS_GET_DEVINFO(bus, dev); if (obd == NULL) return (NULL); return (obd->obd_model); } const char * ofw_bus_gen_get_name(device_t bus, device_t dev) { const struct ofw_bus_devinfo *obd; obd = OFW_BUS_GET_DEVINFO(bus, dev); if (obd == NULL) return (NULL); return (obd->obd_name); } phandle_t ofw_bus_gen_get_node(device_t bus, device_t dev) { const struct ofw_bus_devinfo *obd; obd = OFW_BUS_GET_DEVINFO(bus, dev); if (obd == NULL) return (0); return (obd->obd_node); } const char * ofw_bus_gen_get_type(device_t bus, device_t dev) { const struct ofw_bus_devinfo *obd; obd = OFW_BUS_GET_DEVINFO(bus, dev); if (obd == NULL) return (NULL); return (obd->obd_type); } const char * ofw_bus_get_status(device_t dev) { const struct ofw_bus_devinfo *obd; obd = OFW_BUS_GET_DEVINFO(device_get_parent(dev), dev); if (obd == NULL) return (NULL); return (obd->obd_status); } int ofw_bus_status_okay(device_t dev) { const char *status; status = ofw_bus_get_status(dev); if (status == NULL || strcmp(status, "okay") == 0) return (1); 
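	/*
	 * Any other value (typically "disabled") means the firmware has
	 * marked the node as unusable; a driver probe routine would
	 * normally guard on this, e.g. (a sketch, not part of this
	 * change):
	 *	if (!ofw_bus_status_okay(dev))
	 *		return (ENXIO);
	 */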
return (0); } int ofw_bus_is_compatible(device_t dev, const char *onecompat) { phandle_t node; const char *compat; int len, onelen, l; if ((compat = ofw_bus_get_compat(dev)) == NULL) return (0); if ((node = ofw_bus_get_node(dev)) == -1) return (0); /* Get total 'compatible' prop len */ if ((len = OF_getproplen(node, "compatible")) <= 0) return (0); onelen = strlen(onecompat); while (len > 0) { if (strlen(compat) == onelen && strncasecmp(compat, onecompat, onelen) == 0) /* Found it. */ return (1); /* Slide to the next sub-string. */ l = strlen(compat) + 1; compat += l; len -= l; } return (0); } int ofw_bus_is_compatible_strict(device_t dev, const char *compatible) { const char *compat; size_t len; if ((compat = ofw_bus_get_compat(dev)) == NULL) return (0); len = strlen(compatible); if (strlen(compat) == len && strncasecmp(compat, compatible, len) == 0) return (1); return (0); } const struct ofw_compat_data * ofw_bus_search_compatible(device_t dev, const struct ofw_compat_data *compat) { if (compat == NULL) return NULL; for (; compat->ocd_str != NULL; ++compat) { if (ofw_bus_is_compatible(dev, compat->ocd_str)) break; } return (compat); } int ofw_bus_has_prop(device_t dev, const char *propname) { phandle_t node; if ((node = ofw_bus_get_node(dev)) == -1) return (0); return (OF_hasprop(node, propname)); } void ofw_bus_setup_iinfo(phandle_t node, struct ofw_bus_iinfo *ii, int intrsz) { pcell_t addrc; int msksz; if (OF_getencprop(node, "#address-cells", &addrc, sizeof(addrc)) == -1) addrc = 2; ii->opi_addrc = addrc * sizeof(pcell_t); ii->opi_imapsz = OF_getencprop_alloc(node, "interrupt-map", 1, (void **)&ii->opi_imap); if (ii->opi_imapsz > 0) { msksz = OF_getencprop_alloc(node, "interrupt-map-mask", 1, (void **)&ii->opi_imapmsk); /* * Failure to get the mask is ignored; a full mask is used * then. We barf on bad mask sizes, however. */ if (msksz != -1 && msksz != ii->opi_addrc + intrsz) panic("ofw_bus_setup_iinfo: bad interrupt-map-mask " "property!"); } } int ofw_bus_lookup_imap(phandle_t node, struct ofw_bus_iinfo *ii, void *reg, int regsz, void *pintr, int pintrsz, void *mintr, int mintrsz, phandle_t *iparent) { uint8_t maskbuf[regsz + pintrsz]; int rv; if (ii->opi_imapsz <= 0) return (0); KASSERT(regsz >= ii->opi_addrc, ("ofw_bus_lookup_imap: register size too small: %d < %d", regsz, ii->opi_addrc)); if (node != -1) { rv = OF_getencprop(node, "reg", reg, regsz); if (rv < regsz) panic("ofw_bus_lookup_imap: cannot get reg property"); } return (ofw_bus_search_intrmap(pintr, pintrsz, reg, ii->opi_addrc, ii->opi_imap, ii->opi_imapsz, ii->opi_imapmsk, maskbuf, mintr, mintrsz, iparent)); } /* * Map an interrupt using the firmware reg, interrupt-map and * interrupt-map-mask properties. * The interrupt property to be mapped must be of size intrsz, and pointed to * by intr. The regs property of the node for which the mapping is done must * be passed as regs. This property is an array of register specifications; * the size of the address part of such a specification must be passed as * physsz. Only the first element of the property is used. * imap and imapsz hold the interrupt mask and it's size. * imapmsk is a pointer to the interrupt-map-mask property, which must have * a size of physsz + intrsz; it may be NULL, in which case a full mask is * assumed. * maskbuf must point to a buffer of length physsz + intrsz. * The interrupt is returned in result, which must point to a buffer of length * rintrsz (which gives the expected size of the mapped interrupt). 
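 * For example, with two address cells (physsz = 8) and one interrupt
 * cell (intrsz = 4), each interrupt-map entry occupies 8 + 4 +
 * sizeof(phandle_t) bytes plus 4 bytes per cell of the parent's
 * #interrupt-cells value, and the search below steps through the map
 * in strides of that size.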
* Returns number of cells in the interrupt if a mapping was found, 0 otherwise. */ int ofw_bus_search_intrmap(void *intr, int intrsz, void *regs, int physsz, void *imap, int imapsz, void *imapmsk, void *maskbuf, void *result, int rintrsz, phandle_t *iparent) { phandle_t parent; uint8_t *ref = maskbuf; uint8_t *uiintr = intr; uint8_t *uiregs = regs; uint8_t *uiimapmsk = imapmsk; uint8_t *mptr; pcell_t pintrsz; int i, rsz, tsz; rsz = -1; if (imapmsk != NULL) { for (i = 0; i < physsz; i++) ref[i] = uiregs[i] & uiimapmsk[i]; for (i = 0; i < intrsz; i++) ref[physsz + i] = uiintr[i] & uiimapmsk[physsz + i]; } else { bcopy(regs, ref, physsz); bcopy(intr, ref + physsz, intrsz); } mptr = imap; i = imapsz; while (i > 0) { bcopy(mptr + physsz + intrsz, &parent, sizeof(parent)); if (OF_searchencprop(OF_node_from_xref(parent), "#interrupt-cells", &pintrsz, sizeof(pintrsz)) == -1) pintrsz = 1; /* default */ pintrsz *= sizeof(pcell_t); /* Compute the map stride size. */ tsz = physsz + intrsz + sizeof(phandle_t) + pintrsz; KASSERT(i >= tsz, ("ofw_bus_search_intrmap: truncated map")); if (bcmp(ref, mptr, physsz + intrsz) == 0) { bcopy(mptr + physsz + intrsz + sizeof(parent), result, MIN(rintrsz, pintrsz)); if (iparent != NULL) *iparent = parent; return (pintrsz/sizeof(pcell_t)); } mptr += tsz; i -= tsz; } return (0); } int +ofw_bus_reg_to_rl(device_t dev, phandle_t node, pcell_t acells, pcell_t scells, + struct resource_list *rl) +{ + uint64_t phys, size; + ssize_t i, j, rid, nreg, ret; + uint32_t *reg; + char *name; + + /* + * This may be just redundant when having ofw_bus_devinfo + * but makes this routine independent of it. + */ + ret = OF_getencprop_alloc(node, "name", sizeof(*name), (void **)&name); + if (ret == -1) + name = NULL; + + ret = OF_getencprop_alloc(node, "reg", sizeof(*reg), (void **)®); + nreg = (ret == -1) ? 0 : ret; + + if (nreg % (acells + scells) != 0) { + if (bootverbose) + device_printf(dev, "Malformed reg property on <%s>\n", + (name == NULL) ? "unknown" : name); + nreg = 0; + } + + for (i = 0, rid = 0; i < nreg; i += acells + scells, rid++) { + phys = size = 0; + for (j = 0; j < acells; j++) { + phys <<= 32; + phys |= reg[i + j]; + } + for (j = 0; j < scells; j++) { + size <<= 32; + size |= reg[i + acells + j]; + } + /* Skip the dummy reg property of glue devices like ssm(4). 
*/ + if (size != 0) + resource_list_add(rl, SYS_RES_MEMORY, rid, + phys, phys + size - 1, size); + } + free(name, M_OFWPROP); + free(reg, M_OFWPROP); + + return (0); +} + +int ofw_bus_intr_to_rl(device_t dev, phandle_t node, struct resource_list *rl) { phandle_t iparent; uint32_t icells, *intr; int err, i, irqnum, nintr, rid; boolean_t extended; nintr = OF_getencprop_alloc(node, "interrupts", sizeof(*intr), (void **)&intr); if (nintr > 0) { if (OF_searchencprop(node, "interrupt-parent", &iparent, sizeof(iparent)) == -1) { for (iparent = node; iparent != 0; iparent = OF_parent(node)) { if (OF_hasprop(iparent, "interrupt-controller")) break; } if (iparent == 0) { device_printf(dev, "No interrupt-parent found, " "assuming direct parent\n"); iparent = OF_parent(node); } iparent = OF_xref_from_node(iparent); } if (OF_searchencprop(OF_node_from_xref(iparent), "#interrupt-cells", &icells, sizeof(icells)) == -1) { device_printf(dev, "Missing #interrupt-cells " "property, assuming <1>\n"); icells = 1; } if (icells < 1 || icells > nintr) { device_printf(dev, "Invalid #interrupt-cells property " "value <%d>, assuming <1>\n", icells); icells = 1; } extended = false; } else { nintr = OF_getencprop_alloc(node, "interrupts-extended", sizeof(*intr), (void **)&intr); if (nintr <= 0) return (0); extended = true; } err = 0; rid = 0; for (i = 0; i < nintr; i += icells) { if (extended) { iparent = intr[i++]; if (OF_searchencprop(OF_node_from_xref(iparent), "#interrupt-cells", &icells, sizeof(icells)) == -1) { device_printf(dev, "Missing #interrupt-cells " "property\n"); err = ENOENT; break; } if (icells < 1 || (i + icells) > nintr) { device_printf(dev, "Invalid #interrupt-cells " "property value <%d>\n", icells); err = ERANGE; break; } } irqnum = ofw_bus_map_intr(dev, iparent, icells, &intr[i]); resource_list_add(rl, SYS_RES_IRQ, rid++, irqnum, irqnum, 1); } free(intr, M_OFWPROP); return (err); } Index: projects/ifnet/sys/dev/ofw/ofw_bus_subr.h =================================================================== --- projects/ifnet/sys/dev/ofw/ofw_bus_subr.h (revision 277106) +++ projects/ifnet/sys/dev/ofw/ofw_bus_subr.h (revision 277107) @@ -1,102 +1,104 @@ /*- * Copyright (c) 2005 Marius Strobl * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions, and the following disclaimer, * without modification, immediately at the beginning of the file. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in * the documentation and/or other materials provided with the * distribution. * * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. 
IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR * ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * $FreeBSD$ */ #ifndef _DEV_OFW_OFW_BUS_SUBR_H_ #define _DEV_OFW_OFW_BUS_SUBR_H_ #include #include #include "ofw_bus_if.h" #define ORIP_NOINT -1 #define ORIR_NOTFOUND 0xffffffff struct ofw_bus_iinfo { uint8_t *opi_imap; uint8_t *opi_imapmsk; int opi_imapsz; pcell_t opi_addrc; }; struct ofw_compat_data { const char *ocd_str; uintptr_t ocd_data; }; /* Generic implementation of ofw_bus_if.m methods and helper routines */ int ofw_bus_gen_setup_devinfo(struct ofw_bus_devinfo *, phandle_t); void ofw_bus_gen_destroy_devinfo(struct ofw_bus_devinfo *); ofw_bus_get_compat_t ofw_bus_gen_get_compat; ofw_bus_get_model_t ofw_bus_gen_get_model; ofw_bus_get_name_t ofw_bus_gen_get_name; ofw_bus_get_node_t ofw_bus_gen_get_node; ofw_bus_get_type_t ofw_bus_gen_get_type; /* Helper method to report interesting OF properties in pnpinfo */ bus_child_pnpinfo_str_t ofw_bus_gen_child_pnpinfo_str; /* Routines for processing firmware interrupt maps */ void ofw_bus_setup_iinfo(phandle_t, struct ofw_bus_iinfo *, int); int ofw_bus_lookup_imap(phandle_t, struct ofw_bus_iinfo *, void *, int, void *, int, void *, int, phandle_t *); int ofw_bus_search_intrmap(void *, int, void *, int, void *, int, void *, void *, void *, int, phandle_t *); /* Routines for parsing device-tree data into resource lists. */ +int ofw_bus_reg_to_rl(device_t, phandle_t, pcell_t, pcell_t, + struct resource_list *); int ofw_bus_intr_to_rl(device_t, phandle_t, struct resource_list *); /* Helper to get device status property */ const char *ofw_bus_get_status(device_t dev); int ofw_bus_status_okay(device_t dev); /* Helper to get node's interrupt parent */ void ofw_bus_find_iparent(phandle_t); /* Helper routine for checking compat prop */ int ofw_bus_is_compatible(device_t, const char *); int ofw_bus_is_compatible_strict(device_t, const char *); /* * Helper routine to search a list of compat properties. The table is * terminated by an entry with a NULL compat-string pointer; a pointer to that * table entry is returned if none of the compat strings match for the device, * giving you control over the not-found value. Will not return NULL unless the * provided table pointer is NULL. */ const struct ofw_compat_data * ofw_bus_search_compatible(device_t, const struct ofw_compat_data *); /* Helper routine for checking existence of a prop */ int ofw_bus_has_prop(device_t, const char *); #endif /* !_DEV_OFW_OFW_BUS_SUBR_H_ */ Index: projects/ifnet/sys/dev/ofw/ofwbus.c =================================================================== --- projects/ifnet/sys/dev/ofw/ofwbus.c (revision 277106) +++ projects/ifnet/sys/dev/ofw/ofwbus.c (revision 277107) @@ -1,518 +1,488 @@ /*- * Copyright 1998 Massachusetts Institute of Technology * Copyright 2001 by Thomas Moestl . * Copyright 2006 by Marius Strobl . * All rights reserved. 
* * Permission to use, copy, modify, and distribute this software and * its documentation for any purpose and without fee is hereby * granted, provided that both the above copyright notice and this * permission notice appear in all copies, that both the above * copyright notice and this permission notice appear in all * supporting documentation, and that the name of M.I.T. not be used * in advertising or publicity pertaining to distribution of the * software without specific, written prior permission. M.I.T. makes * no representations about the suitability of this software for any * purpose. It is provided "as is" without express or implied * warranty. * * THIS SOFTWARE IS PROVIDED BY M.I.T. ``AS IS''. M.I.T. DISCLAIMS * ALL EXPRESS OR IMPLIED WARRANTIES WITH REGARD TO THIS SOFTWARE, * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. IN NO EVENT * SHALL M.I.T. BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF * USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND * ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, * OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT * OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * from: FreeBSD: src/sys/i386/i386/nexus.c,v 1.43 2001/02/09 */ #include __FBSDID("$FreeBSD$"); #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include /* * The ofwbus (which is a pseudo-bus actually) iterates over the nodes that * hang from the Open Firmware root node and adds them as devices to this bus * (except some special nodes which are excluded) so that drivers can be * attached to them. 
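 * At attach time the root node's #address-cells and #size-cells
 * properties are read and used, together with each child's "reg" and
 * "interrupts" properties, to build the child's resource list via
 * ofw_bus_reg_to_rl() and ofw_bus_intr_to_rl().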
* */ struct ofwbus_devinfo { struct ofw_bus_devinfo ndi_obdinfo; struct resource_list ndi_rl; }; struct ofwbus_softc { uint32_t acells, scells; struct rman sc_intr_rman; struct rman sc_mem_rman; }; static device_identify_t ofwbus_identify; static device_probe_t ofwbus_probe; static device_attach_t ofwbus_attach; static bus_print_child_t ofwbus_print_child; static bus_add_child_t ofwbus_add_child; static bus_probe_nomatch_t ofwbus_probe_nomatch; static bus_alloc_resource_t ofwbus_alloc_resource; static bus_adjust_resource_t ofwbus_adjust_resource; static bus_release_resource_t ofwbus_release_resource; static bus_get_resource_list_t ofwbus_get_resource_list; static ofw_bus_get_devinfo_t ofwbus_get_devinfo; static int ofwbus_inlist(const char *, const char *const *); static struct ofwbus_devinfo * ofwbus_setup_dinfo(device_t, phandle_t); static void ofwbus_destroy_dinfo(struct ofwbus_devinfo *); static int ofwbus_print_res(struct ofwbus_devinfo *); static device_method_t ofwbus_methods[] = { /* Device interface */ DEVMETHOD(device_identify, ofwbus_identify), DEVMETHOD(device_probe, ofwbus_probe), DEVMETHOD(device_attach, ofwbus_attach), DEVMETHOD(device_detach, bus_generic_detach), DEVMETHOD(device_shutdown, bus_generic_shutdown), DEVMETHOD(device_suspend, bus_generic_suspend), DEVMETHOD(device_resume, bus_generic_resume), /* Bus interface */ DEVMETHOD(bus_print_child, ofwbus_print_child), DEVMETHOD(bus_probe_nomatch, ofwbus_probe_nomatch), DEVMETHOD(bus_read_ivar, bus_generic_read_ivar), DEVMETHOD(bus_write_ivar, bus_generic_write_ivar), DEVMETHOD(bus_add_child, ofwbus_add_child), DEVMETHOD(bus_child_pnpinfo_str, ofw_bus_gen_child_pnpinfo_str), DEVMETHOD(bus_alloc_resource, ofwbus_alloc_resource), DEVMETHOD(bus_adjust_resource, ofwbus_adjust_resource), DEVMETHOD(bus_release_resource, ofwbus_release_resource), DEVMETHOD(bus_set_resource, bus_generic_rl_set_resource), DEVMETHOD(bus_get_resource, bus_generic_rl_get_resource), DEVMETHOD(bus_get_resource_list, ofwbus_get_resource_list), DEVMETHOD(bus_activate_resource, bus_generic_activate_resource), DEVMETHOD(bus_deactivate_resource, bus_generic_deactivate_resource), DEVMETHOD(bus_config_intr, bus_generic_config_intr), DEVMETHOD(bus_setup_intr, bus_generic_setup_intr), DEVMETHOD(bus_teardown_intr, bus_generic_teardown_intr), /* ofw_bus interface */ DEVMETHOD(ofw_bus_get_devinfo, ofwbus_get_devinfo), DEVMETHOD(ofw_bus_get_compat, ofw_bus_gen_get_compat), DEVMETHOD(ofw_bus_get_model, ofw_bus_gen_get_model), DEVMETHOD(ofw_bus_get_name, ofw_bus_gen_get_name), DEVMETHOD(ofw_bus_get_node, ofw_bus_gen_get_node), DEVMETHOD(ofw_bus_get_type, ofw_bus_gen_get_type), DEVMETHOD_END }; static driver_t ofwbus_driver = { "ofwbus", ofwbus_methods, sizeof(struct ofwbus_softc) }; static devclass_t ofwbus_devclass; EARLY_DRIVER_MODULE(ofwbus, nexus, ofwbus_driver, ofwbus_devclass, 0, 0, BUS_PASS_BUS + BUS_PASS_ORDER_MIDDLE); MODULE_VERSION(ofwbus, 1); static const char *const ofwbus_excl_name[] = { "FJSV,system", "aliases", "associations", "chosen", "cmp", "counter-timer", /* No separate device; handled by psycho/sbus */ "failsafe", "memory", "openprom", "options", "packages", "physical-memory", "rsc", "sgcn", "todsg", "virtual-memory", NULL }; static const char *const ofwbus_excl_type[] = { "core", "cpu", NULL }; static int ofwbus_inlist(const char *name, const char *const *list) { int i; if (name == NULL) return (0); for (i = 0; list[i] != NULL; i++) if (strcmp(name, list[i]) == 0) return (1); return (0); } #define OFWBUS_EXCLUDED(name, type) \ 
(ofwbus_inlist((name), ofwbus_excl_name) || \ ((type) != NULL && ofwbus_inlist((type), ofwbus_excl_type))) static void ofwbus_identify(driver_t *driver, device_t parent) { /* Check if Open Firmware has been instantiated */ if (OF_peer(0) == 0) return; if (device_find_child(parent, "ofwbus", -1) == NULL) BUS_ADD_CHILD(parent, 0, "ofwbus", -1); } static int ofwbus_probe(device_t dev) { device_set_desc(dev, "Open Firmware Device Tree"); return (BUS_PROBE_NOWILDCARD); } static int ofwbus_attach(device_t dev) { struct ofwbus_devinfo *ndi; struct ofwbus_softc *sc; device_t cdev; phandle_t node; sc = device_get_softc(dev); node = OF_peer(0); /* * If no Open Firmware, bail early */ if (node == -1) return (ENXIO); sc->sc_intr_rman.rm_type = RMAN_ARRAY; sc->sc_intr_rman.rm_descr = "Interrupts"; sc->sc_mem_rman.rm_type = RMAN_ARRAY; sc->sc_mem_rman.rm_descr = "Device Memory"; if (rman_init(&sc->sc_intr_rman) != 0 || rman_init(&sc->sc_mem_rman) != 0 || rman_manage_region(&sc->sc_intr_rman, 0, ~0) != 0 || rman_manage_region(&sc->sc_mem_rman, 0, BUS_SPACE_MAXADDR) != 0) panic("%s: failed to set up rmans.", __func__); /* * Allow devices to identify. */ bus_generic_probe(dev); /* * Some important numbers */ sc->acells = 2; OF_getencprop(node, "#address-cells", &sc->acells, sizeof(sc->acells)); sc->scells = 1; OF_getencprop(node, "#size-cells", &sc->scells, sizeof(sc->scells)); /* * Now walk the OFW tree and attach top-level devices. */ for (node = OF_child(node); node > 0; node = OF_peer(node)) { if ((ndi = ofwbus_setup_dinfo(dev, node)) == NULL) continue; cdev = device_add_child(dev, NULL, -1); if (cdev == NULL) { device_printf(dev, "<%s>: device_add_child failed\n", ndi->ndi_obdinfo.obd_name); ofwbus_destroy_dinfo(ndi); continue; } device_set_ivars(cdev, ndi); } return (bus_generic_attach(dev)); } static device_t ofwbus_add_child(device_t dev, u_int order, const char *name, int unit) { device_t cdev; struct ofwbus_devinfo *ndi; cdev = device_add_child_ordered(dev, order, name, unit); if (cdev == NULL) return (NULL); ndi = malloc(sizeof(*ndi), M_DEVBUF, M_WAITOK | M_ZERO); ndi->ndi_obdinfo.obd_node = -1; resource_list_init(&ndi->ndi_rl); device_set_ivars(cdev, ndi); return (cdev); } static int ofwbus_print_child(device_t bus, device_t child) { int rv; rv = bus_print_child_header(bus, child); rv += ofwbus_print_res(device_get_ivars(child)); rv += bus_print_child_footer(bus, child); return (rv); } static void ofwbus_probe_nomatch(device_t bus, device_t child) { const char *name, *type; if (!bootverbose) return; name = ofw_bus_get_name(child); type = ofw_bus_get_type(child); device_printf(bus, "<%s>", name != NULL ? name : "unknown"); ofwbus_print_res(device_get_ivars(child)); printf(" type %s (no driver attached)\n", type != NULL ? 
type : "unknown"); } static struct resource * ofwbus_alloc_resource(device_t bus, device_t child, int type, int *rid, u_long start, u_long end, u_long count, u_int flags) { struct ofwbus_softc *sc; struct rman *rm; struct resource *rv; struct resource_list_entry *rle; int isdefault, passthrough; isdefault = (start == 0UL && end == ~0UL); passthrough = (device_get_parent(child) != bus); sc = device_get_softc(bus); rle = NULL; if (!passthrough && isdefault) { rle = resource_list_find(BUS_GET_RESOURCE_LIST(bus, child), type, *rid); if (rle == NULL) return (NULL); if (rle->res != NULL) panic("%s: resource entry is busy", __func__); start = rle->start; count = ulmax(count, rle->count); end = ulmax(rle->end, start + count - 1); } switch (type) { case SYS_RES_IRQ: rm = &sc->sc_intr_rman; break; case SYS_RES_MEMORY: rm = &sc->sc_mem_rman; break; default: return (NULL); } rv = rman_reserve_resource(rm, start, end, count, flags & ~RF_ACTIVE, child); if (rv == NULL) return (NULL); rman_set_rid(rv, *rid); if ((flags & RF_ACTIVE) != 0 && bus_activate_resource(child, type, *rid, rv) != 0) { rman_release_resource(rv); return (NULL); } if (!passthrough && rle != NULL) { rle->res = rv; rle->start = rman_get_start(rv); rle->end = rman_get_end(rv); rle->count = rle->end - rle->start + 1; } return (rv); } static int ofwbus_adjust_resource(device_t bus, device_t child __unused, int type, struct resource *r, u_long start, u_long end) { struct ofwbus_softc *sc; struct rman *rm; device_t ofwbus; ofwbus = bus; while (strcmp(device_get_name(device_get_parent(ofwbus)), "root") != 0) ofwbus = device_get_parent(ofwbus); sc = device_get_softc(ofwbus); switch (type) { case SYS_RES_IRQ: rm = &sc->sc_intr_rman; break; case SYS_RES_MEMORY: rm = &sc->sc_mem_rman; break; default: return (EINVAL); } if (rm == NULL) return (ENXIO); if (rman_is_region_manager(r, rm) == 0) return (EINVAL); return (rman_adjust_resource(r, start, end)); } static int ofwbus_release_resource(device_t bus, device_t child, int type, int rid, struct resource *r) { struct resource_list_entry *rle; int error; /* Clean resource list entry */ rle = resource_list_find(BUS_GET_RESOURCE_LIST(bus, child), type, rid); if (rle != NULL) rle->res = NULL; if ((rman_get_flags(r) & RF_ACTIVE) != 0) { error = bus_deactivate_resource(child, type, rid, r); if (error) return (error); } return (rman_release_resource(r)); } static struct resource_list * ofwbus_get_resource_list(device_t bus __unused, device_t child) { struct ofwbus_devinfo *ndi; ndi = device_get_ivars(child); return (&ndi->ndi_rl); } static const struct ofw_bus_devinfo * ofwbus_get_devinfo(device_t bus __unused, device_t child) { struct ofwbus_devinfo *ndi; ndi = device_get_ivars(child); return (&ndi->ndi_obdinfo); } static struct ofwbus_devinfo * ofwbus_setup_dinfo(device_t dev, phandle_t node) { struct ofwbus_softc *sc; struct ofwbus_devinfo *ndi; const char *nodename; - uint32_t *reg; - uint64_t phys, size; - int i, j, rid; - int nreg; sc = device_get_softc(dev); ndi = malloc(sizeof(*ndi), M_DEVBUF, M_WAITOK | M_ZERO); if (ofw_bus_gen_setup_devinfo(&ndi->ndi_obdinfo, node) != 0) { free(ndi, M_DEVBUF); return (NULL); } nodename = ndi->ndi_obdinfo.obd_name; if (OFWBUS_EXCLUDED(nodename, ndi->ndi_obdinfo.obd_type)) { ofw_bus_gen_destroy_devinfo(&ndi->ndi_obdinfo); free(ndi, M_DEVBUF); return (NULL); } resource_list_init(&ndi->ndi_rl); - nreg = OF_getencprop_alloc(node, "reg", sizeof(*reg), (void **)®); - if (nreg == -1) - nreg = 0; - if (nreg % (sc->acells + sc->scells) != 0) { - if (bootverbose) - 
device_printf(dev, "Malformed reg property on <%s>\n", - nodename); - nreg = 0; - } - - for (i = 0, rid = 0; i < nreg; i += sc->acells + sc->scells, rid++) { - phys = size = 0; - for (j = 0; j < sc->acells; j++) { - phys <<= 32; - phys |= reg[i + j]; - } - for (j = 0; j < sc->scells; j++) { - size <<= 32; - size |= reg[i + sc->acells + j]; - } - /* Skip the dummy reg property of glue devices like ssm(4). */ - if (size != 0) - resource_list_add(&ndi->ndi_rl, SYS_RES_MEMORY, rid, - phys, phys + size - 1, size); - } - free(reg, M_OFWPROP); - + ofw_bus_reg_to_rl(dev, node, sc->acells, sc->scells, &ndi->ndi_rl); ofw_bus_intr_to_rl(dev, node, &ndi->ndi_rl); return (ndi); } static void ofwbus_destroy_dinfo(struct ofwbus_devinfo *ndi) { resource_list_free(&ndi->ndi_rl); ofw_bus_gen_destroy_devinfo(&ndi->ndi_obdinfo); free(ndi, M_DEVBUF); } static int ofwbus_print_res(struct ofwbus_devinfo *ndi) { int rv; rv = 0; rv += resource_list_print_type(&ndi->ndi_rl, "mem", SYS_RES_MEMORY, "%#lx"); rv += resource_list_print_type(&ndi->ndi_rl, "irq", SYS_RES_IRQ, "%ld"); return (rv); } Index: projects/ifnet/sys/dev/xen/netback/netback.c =================================================================== --- projects/ifnet/sys/dev/xen/netback/netback.c (revision 277106) +++ projects/ifnet/sys/dev/xen/netback/netback.c (revision 277107) @@ -1,2536 +1,2535 @@ /*- * Copyright (c) 2009-2011 Spectra Logic Corporation * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions, and the following disclaimer, * without modification. * 2. Redistributions in binary form must reproduce at minimum a disclaimer * substantially similar to the "NO WARRANTY" disclaimer below * ("Disclaimer") and any redistribution must be conditioned upon * including a substantially similar Disclaimer requirement for further * binary redistribution. * * NO WARRANTY * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT * HOLDERS OR CONTRIBUTORS BE LIABLE FOR SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING * IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE * POSSIBILITY OF SUCH DAMAGES. * * Authors: Justin T. Gibbs (Spectra Logic Corporation) * Alan Somers (Spectra Logic Corporation) * John Suykerbuyk (Spectra Logic Corporation) */ #include __FBSDID("$FreeBSD$"); /** * \file netback.c * * \brief Device driver supporting the vending of network access * from this FreeBSD domain to other domains. 
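 * The backend pairs with a netfront instance in another domain: the
 * frontend grants access to the shared TX and RX rings, and this
 * driver moves packet data with grant-table copy operations (only
 * the rx-copy mode is supported).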
*/ #include "opt_inet.h" #include "opt_inet6.h" #include "opt_sctp.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #if __FreeBSD_version >= 700000 #include #endif #include #include #include #include #include #include #include #include #include #include #include #include #include #include /*--------------------------- Compile-time Tunables --------------------------*/ /*---------------------------------- Macros ----------------------------------*/ /** * Custom malloc type for all driver allocations. */ static MALLOC_DEFINE(M_XENNETBACK, "xnb", "Xen Net Back Driver Data"); #define XNB_SG 1 /* netback driver supports feature-sg */ #define XNB_GSO_TCPV4 1 /* netback driver supports feature-gso-tcpv4 */ #define XNB_RX_COPY 1 /* netback driver supports feature-rx-copy */ #define XNB_RX_FLIP 0 /* netback driver does not support feature-rx-flip */ #undef XNB_DEBUG #define XNB_DEBUG /* hardcode on during development */ #ifdef XNB_DEBUG #define DPRINTF(fmt, args...) \ printf("xnb(%s:%d): " fmt, __FUNCTION__, __LINE__, ##args) #else #define DPRINTF(fmt, args...) do {} while (0) #endif /* Default length for stack-allocated grant tables */ #define GNTTAB_LEN (64) /* Features supported by all backends. TSO and LRO can be negotiated */ #define XNB_CSUM_FEATURES (CSUM_TCP | CSUM_UDP) #define NET_TX_RING_SIZE __RING_SIZE((netif_tx_sring_t *)0, PAGE_SIZE) #define NET_RX_RING_SIZE __RING_SIZE((netif_rx_sring_t *)0, PAGE_SIZE) /** * Two argument version of the standard macro. Second argument is a tentative * value of req_cons */ #define RING_HAS_UNCONSUMED_REQUESTS_2(_r, cons) ({ \ unsigned int req = (_r)->sring->req_prod - cons; \ unsigned int rsp = RING_SIZE(_r) - \ (cons - (_r)->rsp_prod_pvt); \ req < rsp ? req : rsp; \ }) #define virt_to_mfn(x) (vtomach(x) >> PAGE_SHIFT) #define virt_to_offset(x) ((x) & (PAGE_SIZE - 1)) /** * Predefined array type of grant table copy descriptors. Used to pass around * statically allocated memory structures. */ typedef struct gnttab_copy gnttab_copy_table[GNTTAB_LEN]; /*--------------------------- Forward Declarations ---------------------------*/ struct xnb_softc; struct xnb_pkt; static void xnb_attach_failed(struct xnb_softc *xnb, int err, const char *fmt, ...) 
__printflike(3,4); static int xnb_shutdown(struct xnb_softc *xnb); static int create_netdev(device_t dev); static int xnb_detach(device_t dev); static int xnb_ifmedia_upd(struct ifnet *ifp); static void xnb_ifmedia_sts(struct ifnet *ifp, struct ifmediareq *ifmr); static void xnb_intr(void *arg); static int xnb_send(netif_rx_back_ring_t *rxb, domid_t otherend, const struct mbuf *mbufc, gnttab_copy_table gnttab); static int xnb_recv(netif_tx_back_ring_t *txb, domid_t otherend, struct mbuf **mbufc, struct ifnet *ifnet, gnttab_copy_table gnttab); static int xnb_ring2pkt(struct xnb_pkt *pkt, const netif_tx_back_ring_t *tx_ring, RING_IDX start); static void xnb_txpkt2rsp(const struct xnb_pkt *pkt, netif_tx_back_ring_t *ring, int error); static struct mbuf *xnb_pkt2mbufc(const struct xnb_pkt *pkt, struct ifnet *ifp); static int xnb_txpkt2gnttab(const struct xnb_pkt *pkt, const struct mbuf *mbufc, gnttab_copy_table gnttab, const netif_tx_back_ring_t *txb, domid_t otherend_id); static void xnb_update_mbufc(struct mbuf *mbufc, const gnttab_copy_table gnttab, int n_entries); static int xnb_mbufc2pkt(const struct mbuf *mbufc, struct xnb_pkt *pkt, RING_IDX start, int space); static int xnb_rxpkt2gnttab(const struct xnb_pkt *pkt, const struct mbuf *mbufc, gnttab_copy_table gnttab, const netif_rx_back_ring_t *rxb, domid_t otherend_id); static int xnb_rxpkt2rsp(const struct xnb_pkt *pkt, const gnttab_copy_table gnttab, int n_entries, netif_rx_back_ring_t *ring); static void xnb_stop(struct xnb_softc*); static int xnb_ioctl(struct ifnet*, u_long, caddr_t); static void xnb_start_locked(struct ifnet*); static void xnb_start(struct ifnet*); static void xnb_ifinit_locked(struct xnb_softc*); static void xnb_ifinit(void*); #ifdef XNB_DEBUG static int xnb_unit_test_main(SYSCTL_HANDLER_ARGS); static int xnb_dump_rings(SYSCTL_HANDLER_ARGS); #endif #if defined(INET) || defined(INET6) static void xnb_add_mbuf_cksum(struct mbuf *mbufc); #endif /*------------------------------ Data Structures -----------------------------*/ /** * Representation of a xennet packet. Simplified version of a packet as * stored in the Xen tx ring. Applicable to both RX and TX packets */ struct xnb_pkt{ /** * Array index of the first data-bearing (eg, not extra info) entry * for this packet */ RING_IDX car; /** * Array index of the second data-bearing entry for this packet. * Invalid if the packet has only one data-bearing entry. If the * packet has more than two data-bearing entries, then the second * through the last will be sequential modulo the ring size */ RING_IDX cdr; /** * Optional extra info. Only valid if flags contains * NETTXF_extra_info. Note that extra.type will always be * XEN_NETIF_EXTRA_TYPE_GSO. Currently, no known netfront or netback * driver will ever set XEN_NETIF_EXTRA_TYPE_MCAST_* */ netif_extra_info_t extra; /** Size of entire packet in bytes. */ uint16_t size; /** The size of the first entry's data in bytes */ uint16_t car_size; /** * Either NETTXF_ or NETRXF_ flags. Note that the flag values are * not the same for TX and RX packets */ uint16_t flags; /** * The number of valid data-bearing entries (either netif_tx_request's * or netif_rx_response's) in the packet. If this is 0, it means the * entire packet is invalid. 
*/ uint16_t list_len; /** There was an error processing the packet */ uint8_t error; }; /** xnb_pkt method: initialize it */ static inline void xnb_pkt_initialize(struct xnb_pkt *pxnb) { bzero(pxnb, sizeof(*pxnb)); } /** xnb_pkt method: mark the packet as valid */ static inline void xnb_pkt_validate(struct xnb_pkt *pxnb) { pxnb->error = 0; }; /** xnb_pkt method: mark the packet as invalid */ static inline void xnb_pkt_invalidate(struct xnb_pkt *pxnb) { pxnb->error = 1; }; /** xnb_pkt method: Check whether the packet is valid */ static inline int xnb_pkt_is_valid(const struct xnb_pkt *pxnb) { return (! pxnb->error); } #ifdef XNB_DEBUG /** xnb_pkt method: print the packet's contents in human-readable format*/ static void __unused xnb_dump_pkt(const struct xnb_pkt *pkt) { if (pkt == NULL) { DPRINTF("Was passed a null pointer.\n"); return; } DPRINTF("pkt address= %p\n", pkt); DPRINTF("pkt->size=%d\n", pkt->size); DPRINTF("pkt->car_size=%d\n", pkt->car_size); DPRINTF("pkt->flags=0x%04x\n", pkt->flags); DPRINTF("pkt->list_len=%d\n", pkt->list_len); /* DPRINTF("pkt->extra"); TODO */ DPRINTF("pkt->car=%d\n", pkt->car); DPRINTF("pkt->cdr=%d\n", pkt->cdr); DPRINTF("pkt->error=%d\n", pkt->error); } #endif /* XNB_DEBUG */ static void xnb_dump_txreq(RING_IDX idx, const struct netif_tx_request *txreq) { if (txreq != NULL) { DPRINTF("netif_tx_request index =%u\n", idx); DPRINTF("netif_tx_request.gref =%u\n", txreq->gref); DPRINTF("netif_tx_request.offset=%hu\n", txreq->offset); DPRINTF("netif_tx_request.flags =%hu\n", txreq->flags); DPRINTF("netif_tx_request.id =%hu\n", txreq->id); DPRINTF("netif_tx_request.size =%hu\n", txreq->size); } } /** * \brief Configuration data for a shared memory request ring * used to communicate with the front-end client of this * this driver. */ struct xnb_ring_config { /** * Runtime structures for ring access. Unfortunately, TX and RX rings * use different data structures, and that cannot be changed since it * is part of the interdomain protocol. */ union{ netif_rx_back_ring_t rx_ring; netif_tx_back_ring_t tx_ring; } back_ring; /** * The device bus address returned by the hypervisor when * mapping the ring and required to unmap it when a connection * is torn down. */ uint64_t bus_addr; /** The pseudo-physical address where ring memory is mapped.*/ uint64_t gnt_addr; /** KVA address where ring memory is mapped. */ vm_offset_t va; /** * Grant table handles, one per-ring page, returned by the * hyperpervisor upon mapping of the ring and required to * unmap it when a connection is torn down. */ grant_handle_t handle; /** The number of ring pages mapped for the current connection. */ unsigned ring_pages; /** * The grant references, one per-ring page, supplied by the * front-end, allowing us to reference the ring pages in the * front-end's domain and to map these pages into our own domain. */ grant_ref_t ring_ref; }; /** * Per-instance connection state flags. */ typedef enum { /** Communication with the front-end has been established. */ XNBF_RING_CONNECTED = 0x01, /** * Front-end requests exist in the ring and are waiting for * xnb_xen_req objects to free up. */ XNBF_RESOURCE_SHORTAGE = 0x02, /** Connection teardown has started. */ XNBF_SHUTDOWN = 0x04, /** A thread is already performing shutdown processing. */ XNBF_IN_SHUTDOWN = 0x08 } xnb_flag_t; /** * Types of rings. 
Used for array indices and to identify a ring's control * data structure type */ typedef enum{ XNB_RING_TYPE_TX = 0, /* ID of TX rings, used for array indices */ XNB_RING_TYPE_RX = 1, /* ID of RX rings, used for array indices */ XNB_NUM_RING_TYPES } xnb_ring_type_t; /** * Per-instance configuration data. */ struct xnb_softc { /** NewBus device corresponding to this instance. */ device_t dev; /* Media related fields */ /** Generic network media state */ struct ifmedia sc_media; /** Media carrier info */ struct ifnet *xnb_ifp; /** Our own private carrier state */ unsigned carrier; /** Device MAC Address */ uint8_t mac[ETHER_ADDR_LEN]; /* Xen related fields */ /** * \brief The netif protocol abi in effect. * * There are situations where the back and front ends can * have a different, native abi (e.g. intel x86_64 and * 32bit x86 domains on the same machine). The back-end * always accomodates the front-end's native abi. That * value is pulled from the XenStore and recorded here. */ int abi; /** * Name of the bridge to which this VIF is connected, if any * This field is dynamically allocated by xenbus and must be free()ed * when no longer needed */ char *bridge; /** The interrupt driven even channel used to signal ring events. */ evtchn_port_t evtchn; /** Xen device handle.*/ long handle; /** Handle to the communication ring event channel. */ xen_intr_handle_t xen_intr_handle; /** * \brief Cached value of the front-end's domain id. * * This value is used at once for each mapped page in * a transaction. We cache it to avoid incuring the * cost of an ivar access every time this is needed. */ domid_t otherend_id; /** * Undocumented frontend feature. Has something to do with * scatter/gather IO */ uint8_t can_sg; /** Undocumented frontend feature */ uint8_t gso; /** Undocumented frontend feature */ uint8_t gso_prefix; /** Can checksum TCP/UDP over IPv4 */ uint8_t ip_csum; /* Implementation related fields */ /** * Preallocated grant table copy descriptor for RX operations. * Access must be protected by rx_lock */ gnttab_copy_table rx_gnttab; /** * Preallocated grant table copy descriptor for TX operations. * Access must be protected by tx_lock */ gnttab_copy_table tx_gnttab; #ifdef XENHVM /** * Resource representing allocated physical address space * associated with our per-instance kva region. */ struct resource *pseudo_phys_res; /** Resource id for allocated physical address space. */ int pseudo_phys_res_id; #endif /** Ring mapping and interrupt configuration data. */ struct xnb_ring_config ring_configs[XNB_NUM_RING_TYPES]; /** * Global pool of kva used for mapping remote domain ring * and I/O transaction data. */ vm_offset_t kva; /** Psuedo-physical address corresponding to kva. */ uint64_t gnt_base_addr; /** Various configuration and state bit flags. */ xnb_flag_t flags; /** Mutex protecting per-instance data in the receive path. */ struct mtx rx_lock; /** Mutex protecting per-instance data in the softc structure. */ struct mtx sc_lock; /** Mutex protecting per-instance data in the transmit path. */ struct mtx tx_lock; /** The size of the global kva pool. 
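 * (in bytes; the sum of the per-ring page allocations made by
 * xnb_alloc_communication_mem())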
*/ int kva_size; /** Name of the interface */ char if_name[IFNAMSIZ]; }; /*---------------------------- Debugging functions ---------------------------*/ #ifdef XNB_DEBUG static void __unused xnb_dump_gnttab_copy(const struct gnttab_copy *entry) { if (entry == NULL) { printf("NULL grant table pointer\n"); return; } if (entry->flags & GNTCOPY_dest_gref) printf("gnttab dest ref=\t%u\n", entry->dest.u.ref); else printf("gnttab dest gmfn=\t%lu\n", entry->dest.u.gmfn); printf("gnttab dest offset=\t%hu\n", entry->dest.offset); printf("gnttab dest domid=\t%hu\n", entry->dest.domid); if (entry->flags & GNTCOPY_source_gref) printf("gnttab source ref=\t%u\n", entry->source.u.ref); else printf("gnttab source gmfn=\t%lu\n", entry->source.u.gmfn); printf("gnttab source offset=\t%hu\n", entry->source.offset); printf("gnttab source domid=\t%hu\n", entry->source.domid); printf("gnttab len=\t%hu\n", entry->len); printf("gnttab flags=\t%hu\n", entry->flags); printf("gnttab status=\t%hd\n", entry->status); } static int xnb_dump_rings(SYSCTL_HANDLER_ARGS) { static char results[720]; struct xnb_softc const* xnb = (struct xnb_softc*)arg1; netif_rx_back_ring_t const* rxb = &xnb->ring_configs[XNB_RING_TYPE_RX].back_ring.rx_ring; netif_tx_back_ring_t const* txb = &xnb->ring_configs[XNB_RING_TYPE_TX].back_ring.tx_ring; /* empty the result strings */ results[0] = 0; if ( !txb || !txb->sring || !rxb || !rxb->sring ) return (SYSCTL_OUT(req, results, strnlen(results, 720))); snprintf(results, 720, "\n\t%35s %18s\n" /* TX, RX */ "\t%16s %18d %18d\n" /* req_cons */ "\t%16s %18d %18d\n" /* nr_ents */ "\t%16s %18d %18d\n" /* rsp_prod_pvt */ "\t%16s %18p %18p\n" /* sring */ "\t%16s %18d %18d\n" /* req_prod */ "\t%16s %18d %18d\n" /* req_event */ "\t%16s %18d %18d\n" /* rsp_prod */ "\t%16s %18d %18d\n", /* rsp_event */ "TX", "RX", "req_cons", txb->req_cons, rxb->req_cons, "nr_ents", txb->nr_ents, rxb->nr_ents, "rsp_prod_pvt", txb->rsp_prod_pvt, rxb->rsp_prod_pvt, "sring", txb->sring, rxb->sring, "sring->req_prod", txb->sring->req_prod, rxb->sring->req_prod, "sring->req_event", txb->sring->req_event, rxb->sring->req_event, "sring->rsp_prod", txb->sring->rsp_prod, rxb->sring->rsp_prod, "sring->rsp_event", txb->sring->rsp_event, rxb->sring->rsp_event); return (SYSCTL_OUT(req, results, strnlen(results, 720))); } static void __unused xnb_dump_mbuf(const struct mbuf *m) { int len; uint8_t *d; if (m == NULL) return; printf("xnb_dump_mbuf:\n"); if (m->m_flags & M_PKTHDR) { printf(" flowid=%10d, csum_flags=%#8x, csum_data=%#8x, " "tso_segsz=%5hd\n", m->m_pkthdr.flowid, (int)m->m_pkthdr.csum_flags, m->m_pkthdr.csum_data, m->m_pkthdr.tso_segsz); printf(" rcvif=%16p, len=%19d\n", m->m_pkthdr.rcvif, m->m_pkthdr.len); } printf(" m_next=%16p, m_nextpk=%16p, m_data=%16p\n", m->m_next, m->m_nextpkt, m->m_data); printf(" m_len=%17d, m_flags=%#15x, m_type=%18u\n", m->m_len, m->m_flags, m->m_type); len = m->m_len; d = mtod(m, uint8_t*); while (len > 0) { int i; printf(" "); for (i = 0; (i < 16) && (len > 0); i++, len--) { printf("%02hhx ", *(d++)); } printf("\n"); } } #endif /* XNB_DEBUG */ /*------------------------ Inter-Domain Communication ------------------------*/ /** * Free dynamically allocated KVA or pseudo-physical address allocations. * * \param xnb Per-instance xnb configuration structure. 
*/ static void xnb_free_communication_mem(struct xnb_softc *xnb) { if (xnb->kva != 0) { #ifndef XENHVM kva_free(xnb->kva, xnb->kva_size); #else if (xnb->pseudo_phys_res != NULL) { bus_release_resource(xnb->dev, SYS_RES_MEMORY, xnb->pseudo_phys_res_id, xnb->pseudo_phys_res); xnb->pseudo_phys_res = NULL; } #endif /* XENHVM */ } xnb->kva = 0; xnb->gnt_base_addr = 0; } /** * Cleanup all inter-domain communication mechanisms. * * \param xnb Per-instance xnb configuration structure. */ static int xnb_disconnect(struct xnb_softc *xnb) { struct gnttab_unmap_grant_ref gnts[XNB_NUM_RING_TYPES]; int error; int i; if (xnb->xen_intr_handle != NULL) xen_intr_unbind(&xnb->xen_intr_handle); /* * We may still have another thread currently processing requests. We * must acquire the rx and tx locks to make sure those threads are done, * but we can release those locks as soon as we acquire them, because no * more interrupts will be arriving. */ mtx_lock(&xnb->tx_lock); mtx_unlock(&xnb->tx_lock); mtx_lock(&xnb->rx_lock); mtx_unlock(&xnb->rx_lock); /* Free malloc'd softc member variables */ if (xnb->bridge != NULL) { free(xnb->bridge, M_XENSTORE); xnb->bridge = NULL; } /* All request processing has stopped, so unmap the rings */ for (i=0; i < XNB_NUM_RING_TYPES; i++) { gnts[i].host_addr = xnb->ring_configs[i].gnt_addr; gnts[i].dev_bus_addr = xnb->ring_configs[i].bus_addr; gnts[i].handle = xnb->ring_configs[i].handle; } error = HYPERVISOR_grant_table_op(GNTTABOP_unmap_grant_ref, gnts, XNB_NUM_RING_TYPES); KASSERT(error == 0, ("Grant table unmap op failed (%d)", error)); xnb_free_communication_mem(xnb); /* * Zero the ring config structs because the pointers, handles, and * grant refs contained therein are no longer valid. */ bzero(&xnb->ring_configs[XNB_RING_TYPE_TX], sizeof(struct xnb_ring_config)); bzero(&xnb->ring_configs[XNB_RING_TYPE_RX], sizeof(struct xnb_ring_config)); xnb->flags &= ~XNBF_RING_CONNECTED; return (0); } /** * Map a single shared memory ring into domain local address space and * initialize its control structure * * \param xnb Per-instance xnb configuration structure * \param ring_type Array index of this ring in the xnb's array of rings * \return An errno */ static int xnb_connect_ring(struct xnb_softc *xnb, xnb_ring_type_t ring_type) { struct gnttab_map_grant_ref gnt; struct xnb_ring_config *ring = &xnb->ring_configs[ring_type]; int error; /* TX ring type = 0, RX =1 */ ring->va = xnb->kva + ring_type * PAGE_SIZE; ring->gnt_addr = xnb->gnt_base_addr + ring_type * PAGE_SIZE; gnt.host_addr = ring->gnt_addr; gnt.flags = GNTMAP_host_map; gnt.ref = ring->ring_ref; gnt.dom = xnb->otherend_id; error = HYPERVISOR_grant_table_op(GNTTABOP_map_grant_ref, &gnt, 1); if (error != 0) panic("netback: Ring page grant table op failed (%d)", error); if (gnt.status != 0) { ring->va = 0; error = EACCES; xenbus_dev_fatal(xnb->dev, error, "Ring shared page mapping failed. " "Status %d.", gnt.status); } else { ring->handle = gnt.handle; ring->bus_addr = gnt.dev_bus_addr; if (ring_type == XNB_RING_TYPE_TX) { BACK_RING_INIT(&ring->back_ring.tx_ring, (netif_tx_sring_t*)ring->va, ring->ring_pages * PAGE_SIZE); } else if (ring_type == XNB_RING_TYPE_RX) { BACK_RING_INIT(&ring->back_ring.rx_ring, (netif_rx_sring_t*)ring->va, ring->ring_pages * PAGE_SIZE); } else { xenbus_dev_fatal(xnb->dev, error, "Unknown ring type %d", ring_type); } } return error; } /** * Setup the shared memory rings and bind an interrupt to the event channel * used to notify us of ring changes. * * \param xnb Per-instance xnb configuration structure. 
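 * \return 0 on success, or an errno if mapping a ring page or binding
 *         the event-channel interrupt fails (in the latter case the
 *         rings are disconnected again before returning).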
*/ static int xnb_connect_comms(struct xnb_softc *xnb) { int error; xnb_ring_type_t i; if ((xnb->flags & XNBF_RING_CONNECTED) != 0) return (0); /* * Kva for our rings are at the tail of the region of kva allocated * by xnb_alloc_communication_mem(). */ for (i=0; i < XNB_NUM_RING_TYPES; i++) { error = xnb_connect_ring(xnb, i); if (error != 0) return error; } xnb->flags |= XNBF_RING_CONNECTED; error = xen_intr_bind_remote_port(xnb->dev, xnb->otherend_id, xnb->evtchn, /*filter*/NULL, xnb_intr, /*arg*/xnb, INTR_TYPE_BIO | INTR_MPSAFE, &xnb->xen_intr_handle); if (error != 0) { (void)xnb_disconnect(xnb); xenbus_dev_fatal(xnb->dev, error, "binding event channel"); return (error); } DPRINTF("rings connected!\n"); return (0); } /** * Size KVA and pseudo-physical address allocations based on negotiated * values for the size and number of I/O requests, and the size of our * communication ring. * * \param xnb Per-instance xnb configuration structure. * * These address spaces are used to dynamically map pages in the * front-end's domain into our own. */ static int xnb_alloc_communication_mem(struct xnb_softc *xnb) { xnb_ring_type_t i; xnb->kva_size = 0; for (i=0; i < XNB_NUM_RING_TYPES; i++) { xnb->kva_size += xnb->ring_configs[i].ring_pages * PAGE_SIZE; } #ifndef XENHVM xnb->kva = kva_alloc(xnb->kva_size); if (xnb->kva == 0) return (ENOMEM); xnb->gnt_base_addr = xnb->kva; #else /* defined XENHVM */ /* * Reserve a range of pseudo physical memory that we can map * into kva. These pages will only be backed by machine * pages ("real memory") during the lifetime of front-end requests * via grant table operations. We will map the netif tx and rx rings * into this space. */ xnb->pseudo_phys_res_id = 0; xnb->pseudo_phys_res = bus_alloc_resource(xnb->dev, SYS_RES_MEMORY, &xnb->pseudo_phys_res_id, 0, ~0, xnb->kva_size, RF_ACTIVE); if (xnb->pseudo_phys_res == NULL) { xnb->kva = 0; return (ENOMEM); } xnb->kva = (vm_offset_t)rman_get_virtual(xnb->pseudo_phys_res); xnb->gnt_base_addr = rman_get_start(xnb->pseudo_phys_res); #endif /* !defined XENHVM */ return (0); } /** * Collect information from the XenStore related to our device and its frontend * * \param xnb Per-instance xnb configuration structure. */ static int xnb_collect_xenstore_info(struct xnb_softc *xnb) { /** * \todo Linux collects the following info. We should collect most * of this, too: * "feature-rx-notify" */ const char *otherend_path; const char *our_path; int err; unsigned int rx_copy, bridge_len; uint8_t no_csum_offload; otherend_path = xenbus_get_otherend_path(xnb->dev); our_path = xenbus_get_node(xnb->dev); /* Collect the critical communication parameters */ err = xs_gather(XST_NIL, otherend_path, "tx-ring-ref", "%l" PRIu32, &xnb->ring_configs[XNB_RING_TYPE_TX].ring_ref, "rx-ring-ref", "%l" PRIu32, &xnb->ring_configs[XNB_RING_TYPE_RX].ring_ref, "event-channel", "%" PRIu32, &xnb->evtchn, NULL); if (err != 0) { xenbus_dev_fatal(xnb->dev, err, "Unable to retrieve ring information from " "frontend %s. Unable to connect.", otherend_path); return (err); } /* Collect the handle from xenstore */ err = xs_scanf(XST_NIL, our_path, "handle", NULL, "%li", &xnb->handle); if (err != 0) { xenbus_dev_fatal(xnb->dev, err, "Error reading handle from frontend %s. " "Unable to connect.", otherend_path); } /* * Collect the bridgename, if any. We do not need bridge_len; we just * throw it away */ err = xs_read(XST_NIL, our_path, "bridge", &bridge_len, (void**)&xnb->bridge); if (err != 0) xnb->bridge = NULL; /* * Does the frontend request that we use rx copy? 
If not, return an * error because this driver only supports rx copy. */ err = xs_scanf(XST_NIL, otherend_path, "request-rx-copy", NULL, "%" PRIu32, &rx_copy); if (err == ENOENT) { err = 0; rx_copy = 0; } if (err < 0) { xenbus_dev_fatal(xnb->dev, err, "reading %s/request-rx-copy", otherend_path); return err; } /** * \todo: figure out the exact meaning of this feature, and when * the frontend will set it to true. It should be set to true * at some point */ /* if (!rx_copy)*/ /* return EOPNOTSUPP;*/ /** \todo Collect the rx notify feature */ /* Collect the feature-sg. */ if (xs_scanf(XST_NIL, otherend_path, "feature-sg", NULL, "%hhu", &xnb->can_sg) < 0) xnb->can_sg = 0; /* Collect remaining frontend features */ if (xs_scanf(XST_NIL, otherend_path, "feature-gso-tcpv4", NULL, "%hhu", &xnb->gso) < 0) xnb->gso = 0; if (xs_scanf(XST_NIL, otherend_path, "feature-gso-tcpv4-prefix", NULL, "%hhu", &xnb->gso_prefix) < 0) xnb->gso_prefix = 0; if (xs_scanf(XST_NIL, otherend_path, "feature-no-csum-offload", NULL, "%hhu", &no_csum_offload) < 0) no_csum_offload = 0; xnb->ip_csum = (no_csum_offload == 0); return (0); } /** * Supply information about the physical device to the frontend * via XenBus. * * \param xnb Per-instance xnb configuration structure. */ static int xnb_publish_backend_info(struct xnb_softc *xnb) { struct xs_transaction xst; const char *our_path; int error; our_path = xenbus_get_node(xnb->dev); do { error = xs_transaction_start(&xst); if (error != 0) { xenbus_dev_fatal(xnb->dev, error, "Error publishing backend info " "(start transaction)"); break; } error = xs_printf(xst, our_path, "feature-sg", "%d", XNB_SG); if (error != 0) break; error = xs_printf(xst, our_path, "feature-gso-tcpv4", "%d", XNB_GSO_TCPV4); if (error != 0) break; error = xs_printf(xst, our_path, "feature-rx-copy", "%d", XNB_RX_COPY); if (error != 0) break; error = xs_printf(xst, our_path, "feature-rx-flip", "%d", XNB_RX_FLIP); if (error != 0) break; error = xs_transaction_end(xst, 0); if (error != 0 && error != EAGAIN) { xenbus_dev_fatal(xnb->dev, error, "ending transaction"); break; } } while (error == EAGAIN); return (error); } /** * Connect to our netfront peer now that it has completed publishing * its configuration into the XenStore. * * \param xnb Per-instance xnb configuration structure. */ static void xnb_connect(struct xnb_softc *xnb) { int error; if (xenbus_get_state(xnb->dev) == XenbusStateConnected) return; if (xnb_collect_xenstore_info(xnb) != 0) return; xnb->flags &= ~XNBF_SHUTDOWN; /* Read front end configuration. */ /* Allocate resources whose size depends on front-end configuration. */ error = xnb_alloc_communication_mem(xnb); if (error != 0) { xenbus_dev_fatal(xnb->dev, error, "Unable to allocate communication memory"); return; } /* * Connect communication channel. */ error = xnb_connect_comms(xnb); if (error != 0) { /* Specific errors are reported by xnb_connect_comms(). */ return; } xnb->carrier = 1; /* Ready for I/O. */ xenbus_set_state(xnb->dev, XenbusStateConnected); } /*-------------------------- Device Teardown Support -------------------------*/ /** * Perform device shutdown functions. * * \param xnb Per-instance xnb configuration structure. * * Mark this instance as shutting down, wait for any active requests * to drain, disconnect from the front-end, and notify any waiters (e.g. * a thread invoking our detach method) that detach can now proceed. 
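*
* \retval 0       Shutdown completed.
* \retval EAGAIN  Another thread is already shutting this instance down;
*                 the caller should retry later, as xnb_detach() does by
*                 sleeping on the softc and calling xnb_shutdown() again.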
*/ static int xnb_shutdown(struct xnb_softc *xnb) { /* * Due to the need to drop our mutex during some * xenbus operations, it is possible for two threads * to attempt to close out shutdown processing at * the same time. Tell the caller that hits this * race to try back later. */ if ((xnb->flags & XNBF_IN_SHUTDOWN) != 0) return (EAGAIN); xnb->flags |= XNBF_SHUTDOWN; xnb->flags |= XNBF_IN_SHUTDOWN; mtx_unlock(&xnb->sc_lock); /* Free the network interface */ xnb->carrier = 0; if (xnb->xnb_ifp != NULL) { ether_ifdetach(xnb->xnb_ifp); if_free(xnb->xnb_ifp); xnb->xnb_ifp = NULL; } mtx_lock(&xnb->sc_lock); xnb_disconnect(xnb); mtx_unlock(&xnb->sc_lock); if (xenbus_get_state(xnb->dev) < XenbusStateClosing) xenbus_set_state(xnb->dev, XenbusStateClosing); mtx_lock(&xnb->sc_lock); xnb->flags &= ~XNBF_IN_SHUTDOWN; /* Indicate to xnb_detach() that is it safe to proceed. */ wakeup(xnb); return (0); } /** * Report an attach time error to the console and Xen, and cleanup * this instance by forcing immediate detach processing. * * \param xnb Per-instance xnb configuration structure. * \param err Errno describing the error. * \param fmt Printf style format and arguments */ static void xnb_attach_failed(struct xnb_softc *xnb, int err, const char *fmt, ...) { va_list ap; va_list ap_hotplug; va_start(ap, fmt); va_copy(ap_hotplug, ap); xs_vprintf(XST_NIL, xenbus_get_node(xnb->dev), "hotplug-error", fmt, ap_hotplug); va_end(ap_hotplug); xs_printf(XST_NIL, xenbus_get_node(xnb->dev), "hotplug-status", "error"); xenbus_dev_vfatal(xnb->dev, err, fmt, ap); va_end(ap); xs_printf(XST_NIL, xenbus_get_node(xnb->dev), "online", "0"); xnb_detach(xnb->dev); } /*---------------------------- NewBus Entrypoints ----------------------------*/ /** * Inspect a XenBus device and claim it if is of the appropriate type. * * \param dev NewBus device object representing a candidate XenBus device. * * \return 0 for success, errno codes for failure. */ static int xnb_probe(device_t dev) { if (!strcmp(xenbus_get_type(dev), "vif")) { DPRINTF("Claiming device %d, %s\n", device_get_unit(dev), devclass_get_name(device_get_devclass(dev))); device_set_desc(dev, "Backend Virtual Network Device"); device_quiet(dev); return (0); } return (ENXIO); } /** * Setup sysctl variables to control various Network Back parameters. * * \param xnb Xen Net Back softc. * */ static void xnb_setup_sysctl(struct xnb_softc *xnb) { struct sysctl_ctx_list *sysctl_ctx = NULL; struct sysctl_oid *sysctl_tree = NULL; sysctl_ctx = device_get_sysctl_ctx(xnb->dev); if (sysctl_ctx == NULL) return; sysctl_tree = device_get_sysctl_tree(xnb->dev); if (sysctl_tree == NULL) return; #ifdef XNB_DEBUG SYSCTL_ADD_PROC(sysctl_ctx, SYSCTL_CHILDREN(sysctl_tree), OID_AUTO, "unit_test_results", CTLTYPE_STRING | CTLFLAG_RD, xnb, 0, xnb_unit_test_main, "A", "Results of builtin unit tests"); SYSCTL_ADD_PROC(sysctl_ctx, SYSCTL_CHILDREN(sysctl_tree), OID_AUTO, "dump_rings", CTLTYPE_STRING | CTLFLAG_RD, xnb, 0, xnb_dump_rings, "A", "Xennet Back Rings"); #endif /* XNB_DEBUG */ } /** * Create a network device. 
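*
* Initializes the softc locks and ifmedia state, reads this interface's
* "handle" from the XenStore, and attaches an Ethernet ifnet named
* xnb<domid>.<handle> with a zeroed (dummy) MAC address.
*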
* \param dev NewBus device object representing this Xen Net Back instance.
*/
int
create_netdev(device_t dev)
{
	struct ifnet *ifp;
	struct xnb_softc *xnb;
	int err = 0;
	uint32_t handle;

	xnb = device_get_softc(dev);
	mtx_init(&xnb->sc_lock, "xnb_softc", "xen netback softc lock", MTX_DEF);
	mtx_init(&xnb->tx_lock, "xnb_tx", "xen netback tx lock", MTX_DEF);
	mtx_init(&xnb->rx_lock, "xnb_rx", "xen netback rx lock", MTX_DEF);

	xnb->dev = dev;

	ifmedia_init(&xnb->sc_media, 0, xnb_ifmedia_upd, xnb_ifmedia_sts);
	ifmedia_add(&xnb->sc_media, IFM_ETHER|IFM_MANUAL, 0, NULL);
	ifmedia_set(&xnb->sc_media, IFM_ETHER|IFM_MANUAL);

	/*
	 * Set the MAC address to a dummy value (00:00:00:00:00:00).  If the
	 * MAC address of the host-facing interface were set to the same as
	 * the guest-facing one (the value found in the XenStore), the bridge
	 * would stop delivering packets to us because it would see that the
	 * destination address of the packet is the same as the interface's,
	 * and so it would expect the packet to have already been delivered
	 * locally (and just drop it).
	 */
	bzero(&xnb->mac[0], sizeof(xnb->mac));

	/* The interface will be named using the following nomenclature:
	 *
	 * xnb<domid>.<handle>
	 *
	 * where handle is the order of the interface as referred to by the
	 * guest.
	 */
	err = xs_scanf(XST_NIL, xenbus_get_node(xnb->dev), "handle", NULL,
		       "%" PRIu32, &handle);
	if (err != 0)
		return (err);
	snprintf(xnb->if_name, IFNAMSIZ, "xnb%" PRIu16 ".%" PRIu32,
	    xenbus_get_otherend_id(dev), handle);

	if (err == 0) {
		/* Set up ifnet structure */
		ifp = xnb->xnb_ifp = if_alloc(IFT_ETHER);
		ifp->if_softc = xnb;
		if_initname(ifp, xnb->if_name, IF_DUNIT_NONE);
		ifp->if_flags = IFF_BROADCAST | IFF_SIMPLEX | IFF_MULTICAST;
		ifp->if_ioctl = xnb_ioctl;
		ifp->if_output = ether_output;
		ifp->if_start = xnb_start;
#ifdef notyet
		ifp->if_watchdog = xnb_watchdog;
#endif
		ifp->if_init = xnb_ifinit;
		ifp->if_mtu = ETHERMTU;
		ifp->if_snd.ifq_maxlen = NET_RX_RING_SIZE - 1;

		ifp->if_hwassist = XNB_CSUM_FEATURES;
		ifp->if_capabilities = IFCAP_HWCSUM;
		ifp->if_capenable = IFCAP_HWCSUM;

		ether_ifattach(ifp, xnb->mac);
		xnb->carrier = 0;
	}

	return err;
}

/**
 * Attach to a XenBus device that has been claimed by our probe routine.
 *
 * \param dev  NewBus device object representing this Xen Net Back instance.
 *
 * \return  0 for success, errno codes for failure.
 */
static int
xnb_attach(device_t dev)
{
	struct xnb_softc *xnb;
	int	error;
	xnb_ring_type_t	i;

	error = create_netdev(dev);
	if (error != 0) {
		xenbus_dev_fatal(dev, error, "creating netdev");
		return (error);
	}

	DPRINTF("Attaching to %s\n", xenbus_get_node(dev));

	/*
	 * Basic initialization.
	 * After this block it is safe to call xnb_detach()
	 * to clean up any allocated data for this instance.
	 */
	xnb = device_get_softc(dev);
	xnb->otherend_id = xenbus_get_otherend_id(dev);
	for (i=0; i < XNB_NUM_RING_TYPES; i++) {
		xnb->ring_configs[i].ring_pages = 1;
	}

	/*
	 * Setup sysctl variables.
	 */
	xnb_setup_sysctl(xnb);

	/* Update hot-plug status to satisfy xend. */
	error = xs_printf(XST_NIL, xenbus_get_node(xnb->dev),
			  "hotplug-status", "connected");
	if (error != 0) {
		xnb_attach_failed(xnb, error, "writing %s/hotplug-status",
				  xenbus_get_node(xnb->dev));
		return (error);
	}

	if ((error = xnb_publish_backend_info(xnb)) != 0) {
		/*
		 * If we can't publish our data, we cannot participate
		 * in this connection, and waiting for a front-end state
		 * change will not help the situation.
		 */
		xnb_attach_failed(xnb, error,
		    "Publishing backend status for %s",
		    xenbus_get_node(xnb->dev));
		return error;
	}

	/* Tell the front end that we are ready to connect.
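* Once the frontend has published its ring references and event channel
* and moves to the Initialised or Connected state, xnb_frontend_changed()
* below will call xnb_connect() to finish bringing the connection up.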
*/ xenbus_set_state(dev, XenbusStateInitWait); return (0); } /** * Detach from a net back device instance. * * \param dev NewBus device object representing this Xen Net Back instance. * * \return 0 for success, errno codes for failure. * * \note A net back device may be detached at any time in its life-cycle, * including part way through the attach process. For this reason, * initialization order and the intialization state checks in this * routine must be carefully coupled so that attach time failures * are gracefully handled. */ static int xnb_detach(device_t dev) { struct xnb_softc *xnb; DPRINTF("\n"); xnb = device_get_softc(dev); mtx_lock(&xnb->sc_lock); while (xnb_shutdown(xnb) == EAGAIN) { msleep(xnb, &xnb->sc_lock, /*wakeup prio unchanged*/0, "xnb_shutdown", 0); } mtx_unlock(&xnb->sc_lock); DPRINTF("\n"); mtx_destroy(&xnb->tx_lock); mtx_destroy(&xnb->rx_lock); mtx_destroy(&xnb->sc_lock); return (0); } /** * Prepare this net back device for suspension of this VM. * * \param dev NewBus device object representing this Xen net Back instance. * * \return 0 for success, errno codes for failure. */ static int xnb_suspend(device_t dev) { return (0); } /** * Perform any processing required to recover from a suspended state. * * \param dev NewBus device object representing this Xen Net Back instance. * * \return 0 for success, errno codes for failure. */ static int xnb_resume(device_t dev) { return (0); } /** * Handle state changes expressed via the XenStore by our front-end peer. * * \param dev NewBus device object representing this Xen * Net Back instance. * \param frontend_state The new state of the front-end. * * \return 0 for success, errno codes for failure. */ static void xnb_frontend_changed(device_t dev, XenbusState frontend_state) { struct xnb_softc *xnb; xnb = device_get_softc(dev); DPRINTF("frontend_state=%s, xnb_state=%s\n", xenbus_strstate(frontend_state), xenbus_strstate(xenbus_get_state(xnb->dev))); switch (frontend_state) { case XenbusStateInitialising: break; case XenbusStateInitialised: case XenbusStateConnected: xnb_connect(xnb); break; case XenbusStateClosing: case XenbusStateClosed: mtx_lock(&xnb->sc_lock); xnb_shutdown(xnb); mtx_unlock(&xnb->sc_lock); if (frontend_state == XenbusStateClosed) xenbus_set_state(xnb->dev, XenbusStateClosed); break; default: xenbus_dev_fatal(xnb->dev, EINVAL, "saw state %d at frontend", frontend_state); break; } } /*---------------------------- Request Processing ----------------------------*/ /** * Interrupt handler bound to the shared ring's event channel. * Entry point for the xennet transmit path in netback * Transfers packets from the Xen ring to the host's generic networking stack * * \param arg Callback argument registerd during event channel * binding - the xnb_softc for this instance. 
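*
* In outline (a condensed restatement of the loop below, not additional
* behavior):
*
*   do {
*           while (xnb_recv() consumes a packet and yields an mbuf chain)
*                   ifp->if_input(ifp, mbufc);    /* hand it to the stack */
*           RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(txb, notify);
*   } while (new requests arrived while we were processing);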
*/ static void xnb_intr(void *arg) { struct xnb_softc *xnb; struct ifnet *ifp; netif_tx_back_ring_t *txb; RING_IDX req_prod_local; xnb = (struct xnb_softc *)arg; ifp = xnb->xnb_ifp; txb = &xnb->ring_configs[XNB_RING_TYPE_TX].back_ring.tx_ring; mtx_lock(&xnb->tx_lock); do { int notify; req_prod_local = txb->sring->req_prod; xen_rmb(); for (;;) { struct mbuf *mbufc; int err; err = xnb_recv(txb, xnb->otherend_id, &mbufc, ifp, xnb->tx_gnttab); if (err || (mbufc == NULL)) break; /* Send the packet to the generic network stack */ (*xnb->xnb_ifp->if_input)(xnb->xnb_ifp, mbufc); } RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(txb, notify); if (notify != 0) xen_intr_signal(xnb->xen_intr_handle); txb->sring->req_event = txb->req_cons + 1; xen_mb(); } while (txb->sring->req_prod != req_prod_local) ; mtx_unlock(&xnb->tx_lock); xnb_start(ifp); } /** * Build a struct xnb_pkt based on netif_tx_request's from a netif tx ring. * Will read exactly 0 or 1 packets from the ring; never a partial packet. * \param[out] pkt The returned packet. If there is an error building * the packet, pkt.list_len will be set to 0. * \param[in] tx_ring Pointer to the Ring that is the input to this function * \param[in] start The ring index of the first potential request * \return The number of requests consumed to build this packet */ static int xnb_ring2pkt(struct xnb_pkt *pkt, const netif_tx_back_ring_t *tx_ring, RING_IDX start) { /* * Outline: * 1) Initialize pkt * 2) Read the first request of the packet * 3) Read the extras * 4) Set cdr * 5) Loop on the remainder of the packet * 6) Finalize pkt (stuff like car_size and list_len) */ int idx = start; int discard = 0; /* whether to discard the packet */ int more_data = 0; /* there are more request past the last one */ uint16_t cdr_size = 0; /* accumulated size of requests 2 through n */ xnb_pkt_initialize(pkt); /* Read the first request */ if (RING_HAS_UNCONSUMED_REQUESTS_2(tx_ring, idx)) { netif_tx_request_t *tx = RING_GET_REQUEST(tx_ring, idx); pkt->size = tx->size; pkt->flags = tx->flags & ~NETTXF_more_data; more_data = tx->flags & NETTXF_more_data; pkt->list_len++; pkt->car = idx; idx++; } /* Read the extra info */ if ((pkt->flags & NETTXF_extra_info) && RING_HAS_UNCONSUMED_REQUESTS_2(tx_ring, idx)) { netif_extra_info_t *ext = (netif_extra_info_t*) RING_GET_REQUEST(tx_ring, idx); pkt->extra.type = ext->type; switch (pkt->extra.type) { case XEN_NETIF_EXTRA_TYPE_GSO: pkt->extra.u.gso = ext->u.gso; break; default: /* * The reference Linux netfront driver will * never set any other extra.type. So we don't * know what to do with it. Let's print an * error, then consume and discard the packet */ printf("xnb(%s:%d): Unknown extra info type %d." " Discarding packet\n", __func__, __LINE__, pkt->extra.type); xnb_dump_txreq(start, RING_GET_REQUEST(tx_ring, start)); xnb_dump_txreq(idx, RING_GET_REQUEST(tx_ring, idx)); discard = 1; break; } pkt->extra.flags = ext->flags; if (ext->flags & XEN_NETIF_EXTRA_FLAG_MORE) { /* * The reference linux netfront driver never sets this * flag (nor does any other known netfront). So we * will discard the packet. */ printf("xnb(%s:%d): Request sets " "XEN_NETIF_EXTRA_FLAG_MORE, but we can't handle " "that\n", __func__, __LINE__); xnb_dump_txreq(start, RING_GET_REQUEST(tx_ring, start)); xnb_dump_txreq(idx, RING_GET_REQUEST(tx_ring, idx)); discard = 1; } idx++; } /* Set cdr. 
If there is not more data, cdr is invalid */ pkt->cdr = idx; /* Loop on remainder of packet */ while (more_data && RING_HAS_UNCONSUMED_REQUESTS_2(tx_ring, idx)) { netif_tx_request_t *tx = RING_GET_REQUEST(tx_ring, idx); pkt->list_len++; cdr_size += tx->size; if (tx->flags & ~NETTXF_more_data) { /* There should be no other flags set at this point */ printf("xnb(%s:%d): Request sets unknown flags %d " "after the 1st request in the packet.\n", __func__, __LINE__, tx->flags); xnb_dump_txreq(start, RING_GET_REQUEST(tx_ring, start)); xnb_dump_txreq(idx, RING_GET_REQUEST(tx_ring, idx)); } more_data = tx->flags & NETTXF_more_data; idx++; } /* Finalize packet */ if (more_data != 0) { /* The ring ran out of requests before finishing the packet */ xnb_pkt_invalidate(pkt); idx = start; /* tell caller that we consumed no requests */ } else { /* Calculate car_size */ pkt->car_size = pkt->size - cdr_size; } if (discard != 0) { xnb_pkt_invalidate(pkt); } return idx - start; } /** * Respond to all the requests that constituted pkt. Builds the responses and * writes them to the ring, but doesn't push them to the shared ring. * \param[in] pkt the packet that needs a response * \param[in] error true if there was an error handling the packet, such * as in the hypervisor copy op or mbuf allocation * \param[out] ring Responses go here */ static void xnb_txpkt2rsp(const struct xnb_pkt *pkt, netif_tx_back_ring_t *ring, int error) { /* * Outline: * 1) Respond to the first request * 2) Respond to the extra info reques * Loop through every remaining request in the packet, generating * responses that copy those requests' ids and sets the status * appropriately. */ netif_tx_request_t *tx; netif_tx_response_t *rsp; int i; uint16_t status; status = (xnb_pkt_is_valid(pkt) == 0) || error ? NETIF_RSP_ERROR : NETIF_RSP_OKAY; KASSERT((pkt->list_len == 0) || (ring->rsp_prod_pvt == pkt->car), ("Cannot respond to ring requests out of order")); if (pkt->list_len >= 1) { uint16_t id; tx = RING_GET_REQUEST(ring, ring->rsp_prod_pvt); id = tx->id; rsp = RING_GET_RESPONSE(ring, ring->rsp_prod_pvt); rsp->id = id; rsp->status = status; ring->rsp_prod_pvt++; if (pkt->flags & NETRXF_extra_info) { rsp = RING_GET_RESPONSE(ring, ring->rsp_prod_pvt); rsp->status = NETIF_RSP_NULL; ring->rsp_prod_pvt++; } } for (i=0; i < pkt->list_len - 1; i++) { uint16_t id; tx = RING_GET_REQUEST(ring, ring->rsp_prod_pvt); id = tx->id; rsp = RING_GET_RESPONSE(ring, ring->rsp_prod_pvt); rsp->id = id; rsp->status = status; ring->rsp_prod_pvt++; } } /** * Create an mbuf chain to represent a packet. Initializes all of the headers * in the mbuf chain, but does not copy the data. The returned chain must be * free()'d when no longer needed * \param[in] pkt A packet to model the mbuf chain after * \return A newly allocated mbuf chain, possibly with clusters attached. * NULL on failure */ static struct mbuf* xnb_pkt2mbufc(const struct xnb_pkt *pkt, struct ifnet *ifp) { /** * \todo consider using a memory pool for mbufs instead of * reallocating them for every packet */ /** \todo handle extra data */ struct mbuf *m; m = m_getm(NULL, pkt->size, M_NOWAIT, MT_DATA); if (m != NULL) { m->m_pkthdr.rcvif = ifp; if (pkt->flags & NETTXF_data_validated) { /* * We lie to the host OS and always tell it that the * checksums are ok, because the packet is unlikely to * get corrupted going across domains. 
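* Setting CSUM_IP_CHECKED | CSUM_IP_VALID together with CSUM_DATA_VALID |
* CSUM_PSEUDO_HDR (and csum_data = 0xffff) tells the stack that both the
* IP header checksum and the TCP/UDP checksum have already been verified,
* so it will not re-verify them for this mbuf chain.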
*/ m->m_pkthdr.csum_flags = ( CSUM_IP_CHECKED | CSUM_IP_VALID | CSUM_DATA_VALID | CSUM_PSEUDO_HDR ); m->m_pkthdr.csum_data = 0xffff; } } return m; } /** * Build a gnttab_copy table that can be used to copy data from a pkt * to an mbufc. Does not actually perform the copy. Always uses gref's on * the packet side. * \param[in] pkt pkt's associated requests form the src for * the copy operation * \param[in] mbufc mbufc's storage forms the dest for the copy operation * \param[out] gnttab Storage for the returned grant table * \param[in] txb Pointer to the backend ring structure * \param[in] otherend_id The domain ID of the other end of the copy * \return The number of gnttab entries filled */ static int xnb_txpkt2gnttab(const struct xnb_pkt *pkt, const struct mbuf *mbufc, gnttab_copy_table gnttab, const netif_tx_back_ring_t *txb, domid_t otherend_id) { const struct mbuf *mbuf = mbufc;/* current mbuf within the chain */ int gnt_idx = 0; /* index into grant table */ RING_IDX r_idx = pkt->car; /* index into tx ring buffer */ int r_ofs = 0; /* offset of next data within tx request's data area */ int m_ofs = 0; /* offset of next data within mbuf's data area */ /* size in bytes that still needs to be represented in the table */ uint16_t size_remaining = pkt->size; while (size_remaining > 0) { const netif_tx_request_t *txq = RING_GET_REQUEST(txb, r_idx); const size_t mbuf_space = M_TRAILINGSPACE(mbuf) - m_ofs; const size_t req_size = r_idx == pkt->car ? pkt->car_size : txq->size; const size_t pkt_space = req_size - r_ofs; /* * space is the largest amount of data that can be copied in the * grant table's next entry */ const size_t space = MIN(pkt_space, mbuf_space); /* TODO: handle this error condition without panicking */ KASSERT(gnt_idx < GNTTAB_LEN, ("Grant table is too short")); gnttab[gnt_idx].source.u.ref = txq->gref; gnttab[gnt_idx].source.domid = otherend_id; gnttab[gnt_idx].source.offset = txq->offset + r_ofs; gnttab[gnt_idx].dest.u.gmfn = virt_to_mfn( mtod(mbuf, vm_offset_t) + m_ofs); gnttab[gnt_idx].dest.offset = virt_to_offset( mtod(mbuf, vm_offset_t) + m_ofs); gnttab[gnt_idx].dest.domid = DOMID_SELF; gnttab[gnt_idx].len = space; gnttab[gnt_idx].flags = GNTCOPY_source_gref; gnt_idx++; r_ofs += space; m_ofs += space; size_remaining -= space; if (req_size - r_ofs <= 0) { /* Must move to the next tx request */ r_ofs = 0; r_idx = (r_idx == pkt->car) ? pkt->cdr : r_idx + 1; } if (M_TRAILINGSPACE(mbuf) - m_ofs <= 0) { /* Must move to the next mbuf */ m_ofs = 0; mbuf = mbuf->m_next; } } return gnt_idx; } /** * Check the status of the grant copy operations, and update mbufs various * non-data fields to reflect the data present. * \param[in,out] mbufc mbuf chain to update. The chain must be valid and of * the correct length, and data should already be present * \param[in] gnttab A grant table for a just completed copy op * \param[in] n_entries The number of valid entries in the grant table */ static void xnb_update_mbufc(struct mbuf *mbufc, const gnttab_copy_table gnttab, int n_entries) { struct mbuf *mbuf = mbufc; int i; size_t total_size = 0; for (i = 0; i < n_entries; i++) { KASSERT(gnttab[i].status == GNTST_okay, ("Some gnttab_copy entry had error status %hd\n", gnttab[i].status)); mbuf->m_len += gnttab[i].len; total_size += gnttab[i].len; if (M_TRAILINGSPACE(mbuf) <= 0) { mbuf = mbuf->m_next; } } mbufc->m_pkthdr.len = total_size; #if defined(INET) || defined(INET6) xnb_add_mbuf_cksum(mbufc); #endif } /** * Dequeue at most one packet from the shared ring * \param[in,out] txb Netif tx ring. 
A packet will be removed from it, and * its private indices will be updated. But the indices * will not be pushed to the shared ring. * \param[in] ifnet Interface to which the packet will be sent * \param[in] otherend Domain ID of the other end of the ring * \param[out] mbufc The assembled mbuf chain, ready to send to the generic * networking stack * \param[in,out] gnttab Pointer to enough memory for a grant table. We make * this a function parameter so that we will take less * stack space. * \return An error code */ static int xnb_recv(netif_tx_back_ring_t *txb, domid_t otherend, struct mbuf **mbufc, struct ifnet *ifnet, gnttab_copy_table gnttab) { struct xnb_pkt pkt; /* number of tx requests consumed to build the last packet */ int num_consumed; int nr_ents; *mbufc = NULL; num_consumed = xnb_ring2pkt(&pkt, txb, txb->req_cons); if (num_consumed == 0) return 0; /* Nothing to receive */ /* update statistics independent of errors */ if_inc_counter(ifnet, IFCOUNTER_IPACKETS, 1); /* * if we got here, then 1 or more requests was consumed, but the packet * is not necessarily valid. */ if (xnb_pkt_is_valid(&pkt) == 0) { /* got a garbage packet, respond and drop it */ xnb_txpkt2rsp(&pkt, txb, 1); txb->req_cons += num_consumed; DPRINTF("xnb_intr: garbage packet, num_consumed=%d\n", num_consumed); if_inc_counter(ifnet, IFCOUNTER_IERRORS, 1); return EINVAL; } *mbufc = xnb_pkt2mbufc(&pkt, ifnet); if (*mbufc == NULL) { /* * Couldn't allocate mbufs. Respond and drop the packet. Do * not consume the requests */ xnb_txpkt2rsp(&pkt, txb, 1); DPRINTF("xnb_intr: Couldn't allocate mbufs, num_consumed=%d\n", num_consumed); if_inc_counter(ifnet, IFCOUNTER_IQDROPS, 1); return ENOMEM; } nr_ents = xnb_txpkt2gnttab(&pkt, *mbufc, gnttab, txb, otherend); if (nr_ents > 0) { int __unused hv_ret = HYPERVISOR_grant_table_op(GNTTABOP_copy, gnttab, nr_ents); KASSERT(hv_ret == 0, ("HYPERVISOR_grant_table_op returned %d\n", hv_ret)); xnb_update_mbufc(*mbufc, gnttab, nr_ents); } xnb_txpkt2rsp(&pkt, txb, 0); txb->req_cons += num_consumed; return 0; } /** * Create an xnb_pkt based on the contents of an mbuf chain. * \param[in] mbufc mbuf chain to transform into a packet * \param[out] pkt Storage for the newly generated xnb_pkt * \param[in] start The ring index of the first available slot in the rx * ring * \param[in] space The number of free slots in the rx ring * \retval 0 Success * \retval EINVAL mbufc was corrupt or not convertible into a pkt * \retval EAGAIN There was not enough space in the ring to queue the * packet */ static int xnb_mbufc2pkt(const struct mbuf *mbufc, struct xnb_pkt *pkt, RING_IDX start, int space) { int retval = 0; if ((mbufc == NULL) || ( (mbufc->m_flags & M_PKTHDR) == 0) || (mbufc->m_pkthdr.len == 0)) { xnb_pkt_invalidate(pkt); retval = EINVAL; } else { int slots_required; xnb_pkt_validate(pkt); pkt->flags = 0; pkt->size = mbufc->m_pkthdr.len; pkt->car = start; pkt->car_size = mbufc->m_len; if (mbufc->m_pkthdr.csum_flags & CSUM_TSO) { pkt->flags |= NETRXF_extra_info; pkt->extra.u.gso.size = mbufc->m_pkthdr.tso_segsz; pkt->extra.u.gso.type = XEN_NETIF_GSO_TYPE_TCPV4; pkt->extra.u.gso.pad = 0; pkt->extra.u.gso.features = 0; pkt->extra.type = XEN_NETIF_EXTRA_TYPE_GSO; pkt->extra.flags = 0; pkt->cdr = start + 2; } else { pkt->cdr = start + 1; } if (mbufc->m_pkthdr.csum_flags & (CSUM_TSO | CSUM_DELAY_DATA)) { pkt->flags |= (NETRXF_csum_blank | NETRXF_data_validated); } /* * Each ring response can have up to PAGE_SIZE of data. 
* Assume that we can defragment the mbuf chain efficiently * into responses so that each response but the last uses all * PAGE_SIZE bytes. */ pkt->list_len = (pkt->size + PAGE_SIZE - 1) / PAGE_SIZE; if (pkt->list_len > 1) { pkt->flags |= NETRXF_more_data; } slots_required = pkt->list_len + (pkt->flags & NETRXF_extra_info ? 1 : 0); if (slots_required > space) { xnb_pkt_invalidate(pkt); retval = EAGAIN; } } return retval; } /** * Build a gnttab_copy table that can be used to copy data from an mbuf chain * to the frontend's shared buffers. Does not actually perform the copy. * Always uses gref's on the other end's side. * \param[in] pkt pkt's associated responses form the dest for the copy * operatoin * \param[in] mbufc The source for the copy operation * \param[out] gnttab Storage for the returned grant table * \param[in] rxb Pointer to the backend ring structure * \param[in] otherend_id The domain ID of the other end of the copy * \return The number of gnttab entries filled */ static int xnb_rxpkt2gnttab(const struct xnb_pkt *pkt, const struct mbuf *mbufc, gnttab_copy_table gnttab, const netif_rx_back_ring_t *rxb, domid_t otherend_id) { const struct mbuf *mbuf = mbufc;/* current mbuf within the chain */ int gnt_idx = 0; /* index into grant table */ RING_IDX r_idx = pkt->car; /* index into rx ring buffer */ int r_ofs = 0; /* offset of next data within rx request's data area */ int m_ofs = 0; /* offset of next data within mbuf's data area */ /* size in bytes that still needs to be represented in the table */ uint16_t size_remaining; size_remaining = (xnb_pkt_is_valid(pkt) != 0) ? pkt->size : 0; while (size_remaining > 0) { const netif_rx_request_t *rxq = RING_GET_REQUEST(rxb, r_idx); const size_t mbuf_space = mbuf->m_len - m_ofs; /* Xen shared pages have an implied size of PAGE_SIZE */ const size_t req_size = PAGE_SIZE; const size_t pkt_space = req_size - r_ofs; /* * space is the largest amount of data that can be copied in the * grant table's next entry */ const size_t space = MIN(pkt_space, mbuf_space); /* TODO: handle this error condition without panicing */ KASSERT(gnt_idx < GNTTAB_LEN, ("Grant table is too short")); gnttab[gnt_idx].dest.u.ref = rxq->gref; gnttab[gnt_idx].dest.domid = otherend_id; gnttab[gnt_idx].dest.offset = r_ofs; gnttab[gnt_idx].source.u.gmfn = virt_to_mfn( mtod(mbuf, vm_offset_t) + m_ofs); gnttab[gnt_idx].source.offset = virt_to_offset( mtod(mbuf, vm_offset_t) + m_ofs); gnttab[gnt_idx].source.domid = DOMID_SELF; gnttab[gnt_idx].len = space; gnttab[gnt_idx].flags = GNTCOPY_dest_gref; gnt_idx++; r_ofs += space; m_ofs += space; size_remaining -= space; if (req_size - r_ofs <= 0) { /* Must move to the next rx request */ r_ofs = 0; r_idx = (r_idx == pkt->car) ? pkt->cdr : r_idx + 1; } if (mbuf->m_len - m_ofs <= 0) { /* Must move to the next mbuf */ m_ofs = 0; mbuf = mbuf->m_next; } } return gnt_idx; } /** * Generates responses for all the requests that constituted pkt. Builds * responses and writes them to the ring, but doesn't push the shared ring * indices. * \param[in] pkt the packet that needs a response * \param[in] gnttab The grant copy table corresponding to this packet. * Used to determine how many rsp->netif_rx_response_t's to * generate. 
* \param[in] n_entries Number of relevant entries in the grant table * \param[out] ring Responses go here * \return The number of RX requests that were consumed to generate * the responses */ static int xnb_rxpkt2rsp(const struct xnb_pkt *pkt, const gnttab_copy_table gnttab, int n_entries, netif_rx_back_ring_t *ring) { /* * This code makes the following assumptions: * * All entries in gnttab set GNTCOPY_dest_gref * * The entries in gnttab are grouped by their grefs: any two * entries with the same gref must be adjacent */ int error = 0; int gnt_idx, i; int n_responses = 0; grant_ref_t last_gref = GRANT_REF_INVALID; RING_IDX r_idx; KASSERT(gnttab != NULL, ("Received a null granttable copy")); /* * In the event of an error, we only need to send one response to the * netfront. In that case, we musn't write any data to the responses * after the one we send. So we must loop all the way through gnttab * looking for errors before we generate any responses * * Since we're looping through the grant table anyway, we'll count the * number of different gref's in it, which will tell us how many * responses to generate */ for (gnt_idx = 0; gnt_idx < n_entries; gnt_idx++) { int16_t status = gnttab[gnt_idx].status; if (status != GNTST_okay) { DPRINTF( "Got error %d for hypervisor gnttab_copy status\n", status); error = 1; break; } if (gnttab[gnt_idx].dest.u.ref != last_gref) { n_responses++; last_gref = gnttab[gnt_idx].dest.u.ref; } } if (error != 0) { uint16_t id; netif_rx_response_t *rsp; id = RING_GET_REQUEST(ring, ring->rsp_prod_pvt)->id; rsp = RING_GET_RESPONSE(ring, ring->rsp_prod_pvt); rsp->id = id; rsp->status = NETIF_RSP_ERROR; n_responses = 1; } else { gnt_idx = 0; const int has_extra = pkt->flags & NETRXF_extra_info; if (has_extra != 0) n_responses++; for (i = 0; i < n_responses; i++) { netif_rx_request_t rxq; netif_rx_response_t *rsp; r_idx = ring->rsp_prod_pvt + i; /* * We copy the structure of rxq instead of making a * pointer because it shares the same memory as rsp. */ rxq = *(RING_GET_REQUEST(ring, r_idx)); rsp = RING_GET_RESPONSE(ring, r_idx); if (has_extra && (i == 1)) { netif_extra_info_t *ext = (netif_extra_info_t*)rsp; ext->type = XEN_NETIF_EXTRA_TYPE_GSO; ext->flags = 0; ext->u.gso.size = pkt->extra.u.gso.size; ext->u.gso.type = XEN_NETIF_GSO_TYPE_TCPV4; ext->u.gso.pad = 0; ext->u.gso.features = 0; } else { rsp->id = rxq.id; rsp->status = GNTST_okay; rsp->offset = 0; rsp->flags = 0; if (i < pkt->list_len - 1) rsp->flags |= NETRXF_more_data; if ((i == 0) && has_extra) rsp->flags |= NETRXF_extra_info; if ((i == 0) && (pkt->flags & NETRXF_data_validated)) { rsp->flags |= NETRXF_data_validated; rsp->flags |= NETRXF_csum_blank; } rsp->status = 0; for (; gnttab[gnt_idx].dest.u.ref == rxq.gref; gnt_idx++) { rsp->status += gnttab[gnt_idx].len; } } } } ring->req_cons += n_responses; ring->rsp_prod_pvt += n_responses; return n_responses; } #if defined(INET) || defined(INET6) /** * Add IP, TCP, and/or UDP checksums to every mbuf in a chain. The first mbuf * in the chain must start with a struct ether_header. * * XXX This function will perform incorrectly on UDP packets that are split up * into multiple ethernet frames. 
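*
* The recomputation below follows the usual two-step pattern: seed the
* transport checksum field with the pseudo-header sum, for example (with
* ip_src/ip_dst standing for the addresses taken from the IP header)
*
*   th->th_sum = in_pseudo(ip_src, ip_dst, htons(IPPROTO_TCP + tcplen));
*
* and then fold in the rest of the segment with in_cksum_skip(), skipping
* the Ethernet and IP headers.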
*/ static void xnb_add_mbuf_cksum(struct mbuf *mbufc) { struct ether_header *eh; struct ip *iph; uint16_t ether_type; eh = mtod(mbufc, struct ether_header*); ether_type = ntohs(eh->ether_type); if (ether_type != ETHERTYPE_IP) { /* Nothing to calculate */ return; } iph = (struct ip*)(eh + 1); if (mbufc->m_pkthdr.csum_flags & CSUM_IP_VALID) { iph->ip_sum = 0; iph->ip_sum = in_cksum_hdr(iph); } switch (iph->ip_p) { case IPPROTO_TCP: if (mbufc->m_pkthdr.csum_flags & CSUM_IP_VALID) { size_t tcplen = ntohs(iph->ip_len) - sizeof(struct ip); struct tcphdr *th = (struct tcphdr*)(iph + 1); th->th_sum = in_pseudo(iph->ip_src.s_addr, iph->ip_dst.s_addr, htons(IPPROTO_TCP + tcplen)); th->th_sum = in_cksum_skip(mbufc, sizeof(struct ether_header) + ntohs(iph->ip_len), sizeof(struct ether_header) + (iph->ip_hl << 2)); } break; case IPPROTO_UDP: if (mbufc->m_pkthdr.csum_flags & CSUM_IP_VALID) { size_t udplen = ntohs(iph->ip_len) - sizeof(struct ip); struct udphdr *uh = (struct udphdr*)(iph + 1); uh->uh_sum = in_pseudo(iph->ip_src.s_addr, iph->ip_dst.s_addr, htons(IPPROTO_UDP + udplen)); uh->uh_sum = in_cksum_skip(mbufc, sizeof(struct ether_header) + ntohs(iph->ip_len), sizeof(struct ether_header) + (iph->ip_hl << 2)); } break; default: break; } } #endif /* INET || INET6 */ static void xnb_stop(struct xnb_softc *xnb) { struct ifnet *ifp; mtx_assert(&xnb->sc_lock, MA_OWNED); ifp = xnb->xnb_ifp; ifp->if_drv_flags &= ~(IFF_DRV_RUNNING | IFF_DRV_OACTIVE); if_link_state_change(ifp, LINK_STATE_DOWN); } static int xnb_ioctl(struct ifnet *ifp, u_long cmd, caddr_t data) { struct xnb_softc *xnb = ifp->if_softc; struct ifreq *ifr = (struct ifreq*) data; #ifdef INET struct ifaddr *ifa = (struct ifaddr*)data; #endif int error = 0; switch (cmd) { case SIOCSIFFLAGS: mtx_lock(&xnb->sc_lock); if (ifp->if_flags & IFF_UP) { xnb_ifinit_locked(xnb); } else { if (ifp->if_drv_flags & IFF_DRV_RUNNING) { xnb_stop(xnb); } } /* * Note: netfront sets a variable named xn_if_flags * here, but that variable is never read */ mtx_unlock(&xnb->sc_lock); break; case SIOCSIFADDR: - case SIOCGIFADDR: #ifdef INET mtx_lock(&xnb->sc_lock); if (ifa->ifa_addr->sa_family == AF_INET) { ifp->if_flags |= IFF_UP; if (!(ifp->if_drv_flags & IFF_DRV_RUNNING)) { ifp->if_drv_flags &= ~(IFF_DRV_RUNNING | IFF_DRV_OACTIVE); if_link_state_change(ifp, LINK_STATE_DOWN); ifp->if_drv_flags |= IFF_DRV_RUNNING; ifp->if_drv_flags &= ~IFF_DRV_OACTIVE; if_link_state_change(ifp, LINK_STATE_UP); } arp_ifinit(ifp, ifa); mtx_unlock(&xnb->sc_lock); } else { mtx_unlock(&xnb->sc_lock); #endif error = ether_ioctl(ifp, cmd, data); #ifdef INET } #endif break; case SIOCSIFCAP: mtx_lock(&xnb->sc_lock); if (ifr->ifr_reqcap & IFCAP_TXCSUM) { ifp->if_capenable |= IFCAP_TXCSUM; ifp->if_hwassist |= XNB_CSUM_FEATURES; } else { ifp->if_capenable &= ~(IFCAP_TXCSUM); ifp->if_hwassist &= ~(XNB_CSUM_FEATURES); } if ((ifr->ifr_reqcap & IFCAP_RXCSUM)) { ifp->if_capenable |= IFCAP_RXCSUM; } else { ifp->if_capenable &= ~(IFCAP_RXCSUM); } /* * TODO enable TSO4 and LRO once we no longer need * to calculate checksums in software */ #if 0 if (ifr->if_reqcap |= IFCAP_TSO4) { if (IFCAP_TXCSUM & ifp->if_capenable) { printf("xnb: Xen netif requires that " "TXCSUM be enabled in order " "to use TSO4\n"); error = EINVAL; } else { ifp->if_capenable |= IFCAP_TSO4; ifp->if_hwassist |= CSUM_TSO; } } else { ifp->if_capenable &= ~(IFCAP_TSO4); ifp->if_hwassist &= ~(CSUM_TSO); } if (ifr->ifreqcap |= IFCAP_LRO) { ifp->if_capenable |= IFCAP_LRO; } else { ifp->if_capenable &= ~(IFCAP_LRO); } #endif 
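	/*
	 * The disabled block above sketches TSO4/LRO capability negotiation.
	 * Note that it tests the requested capabilities with "|=" where "&"
	 * is almost certainly intended, so it needs fixing before it can be
	 * enabled.
	 */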
mtx_unlock(&xnb->sc_lock); break; case SIOCSIFMTU: ifp->if_mtu = ifr->ifr_mtu; ifp->if_drv_flags &= ~IFF_DRV_RUNNING; xnb_ifinit(xnb); break; case SIOCADDMULTI: case SIOCDELMULTI: case SIOCSIFMEDIA: case SIOCGIFMEDIA: error = ifmedia_ioctl(ifp, ifr, &xnb->sc_media, cmd); break; default: error = ether_ioctl(ifp, cmd, data); break; } return (error); } static void xnb_start_locked(struct ifnet *ifp) { netif_rx_back_ring_t *rxb; struct xnb_softc *xnb; struct mbuf *mbufc; RING_IDX req_prod_local; xnb = ifp->if_softc; rxb = &xnb->ring_configs[XNB_RING_TYPE_RX].back_ring.rx_ring; if (!xnb->carrier) return; do { int out_of_space = 0; int notify; req_prod_local = rxb->sring->req_prod; xen_rmb(); for (;;) { int error; IF_DEQUEUE(&ifp->if_snd, mbufc); if (mbufc == NULL) break; error = xnb_send(rxb, xnb->otherend_id, mbufc, xnb->rx_gnttab); switch (error) { case EAGAIN: /* * Insufficient space in the ring. * Requeue pkt and send when space is * available. */ IF_PREPEND(&ifp->if_snd, mbufc); /* * Perhaps the frontend missed an IRQ * and went to sleep. Notify it to wake * it up. */ out_of_space = 1; break; case EINVAL: /* OS gave a corrupt packet. Drop it.*/ if_inc_counter(ifp, IFCOUNTER_OERRORS, 1); /* FALLTHROUGH */ default: /* Send succeeded, or packet had error. * Free the packet */ if_inc_counter(ifp, IFCOUNTER_OPACKETS, 1); if (mbufc) m_freem(mbufc); break; } if (out_of_space != 0) break; } RING_PUSH_RESPONSES_AND_CHECK_NOTIFY(rxb, notify); if ((notify != 0) || (out_of_space != 0)) xen_intr_signal(xnb->xen_intr_handle); rxb->sring->req_event = req_prod_local + 1; xen_mb(); } while (rxb->sring->req_prod != req_prod_local) ; } /** * Sends one packet to the ring. Blocks until the packet is on the ring * \param[in] mbufc Contains one packet to send. Caller must free * \param[in,out] rxb The packet will be pushed onto this ring, but the * otherend will not be notified. * \param[in] otherend The domain ID of the other end of the connection * \retval EAGAIN The ring did not have enough space for the packet. * The ring has not been modified * \param[in,out] gnttab Pointer to enough memory for a grant table. We make * this a function parameter so that we will take less * stack space. 
* \retval EINVAL mbufc was corrupt or not convertible into a pkt */ static int xnb_send(netif_rx_back_ring_t *ring, domid_t otherend, const struct mbuf *mbufc, gnttab_copy_table gnttab) { struct xnb_pkt pkt; int error, n_entries, n_reqs; RING_IDX space; space = ring->sring->req_prod - ring->req_cons; error = xnb_mbufc2pkt(mbufc, &pkt, ring->rsp_prod_pvt, space); if (error != 0) return error; n_entries = xnb_rxpkt2gnttab(&pkt, mbufc, gnttab, ring, otherend); if (n_entries != 0) { int __unused hv_ret = HYPERVISOR_grant_table_op(GNTTABOP_copy, gnttab, n_entries); KASSERT(hv_ret == 0, ("HYPERVISOR_grant_table_op returned %d\n", hv_ret)); } n_reqs = xnb_rxpkt2rsp(&pkt, gnttab, n_entries, ring); return 0; } static void xnb_start(struct ifnet *ifp) { struct xnb_softc *xnb; xnb = ifp->if_softc; mtx_lock(&xnb->rx_lock); xnb_start_locked(ifp); mtx_unlock(&xnb->rx_lock); } /* equivalent of network_open() in Linux */ static void xnb_ifinit_locked(struct xnb_softc *xnb) { struct ifnet *ifp; ifp = xnb->xnb_ifp; mtx_assert(&xnb->sc_lock, MA_OWNED); if (ifp->if_drv_flags & IFF_DRV_RUNNING) return; xnb_stop(xnb); ifp->if_drv_flags |= IFF_DRV_RUNNING; ifp->if_drv_flags &= ~IFF_DRV_OACTIVE; if_link_state_change(ifp, LINK_STATE_UP); } static void xnb_ifinit(void *xsc) { struct xnb_softc *xnb = xsc; mtx_lock(&xnb->sc_lock); xnb_ifinit_locked(xnb); mtx_unlock(&xnb->sc_lock); } /** * Callback used by the generic networking code to tell us when our carrier * state has changed. Since we don't have a physical carrier, we don't care */ static int xnb_ifmedia_upd(struct ifnet *ifp) { return (0); } /** * Callback used by the generic networking code to ask us what our carrier * state is. Since we don't have a physical carrier, this is very simple */ static void xnb_ifmedia_sts(struct ifnet *ifp, struct ifmediareq *ifmr) { ifmr->ifm_status = IFM_AVALID|IFM_ACTIVE; ifmr->ifm_active = IFM_ETHER|IFM_MANUAL; } /*---------------------------- NewBus Registration ---------------------------*/ static device_method_t xnb_methods[] = { /* Device interface */ DEVMETHOD(device_probe, xnb_probe), DEVMETHOD(device_attach, xnb_attach), DEVMETHOD(device_detach, xnb_detach), DEVMETHOD(device_shutdown, bus_generic_shutdown), DEVMETHOD(device_suspend, xnb_suspend), DEVMETHOD(device_resume, xnb_resume), /* Xenbus interface */ DEVMETHOD(xenbus_otherend_changed, xnb_frontend_changed), { 0, 0 } }; static driver_t xnb_driver = { "xnb", xnb_methods, sizeof(struct xnb_softc), }; devclass_t xnb_devclass; DRIVER_MODULE(xnb, xenbusb_back, xnb_driver, xnb_devclass, 0, 0); /*-------------------------- Unit Tests -------------------------------------*/ #ifdef XNB_DEBUG #include "netback_unit_tests.c" #endif Index: projects/ifnet/sys/dev/xen/netfront/netfront.c =================================================================== --- projects/ifnet/sys/dev/xen/netfront/netfront.c (revision 277106) +++ projects/ifnet/sys/dev/xen/netfront/netfront.c (revision 277107) @@ -1,2233 +1,2232 @@ /*- * Copyright (c) 2004-2006 Kip Macy * All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. 
* * THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. */ #include __FBSDID("$FreeBSD$"); #include "opt_inet.h" #include "opt_inet6.h" #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #if __FreeBSD_version >= 700000 #include #include #endif #include #include #include /* for DELAY */ #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "xenbus_if.h" /* Features supported by all backends. TSO and LRO can be negotiated */ #define XN_CSUM_FEATURES (CSUM_TCP | CSUM_UDP) #define NET_TX_RING_SIZE __RING_SIZE((netif_tx_sring_t *)0, PAGE_SIZE) #define NET_RX_RING_SIZE __RING_SIZE((netif_rx_sring_t *)0, PAGE_SIZE) #if __FreeBSD_version >= 700000 /* * Should the driver do LRO on the RX end * this can be toggled on the fly, but the * interface must be reset (down/up) for it * to take effect. */ static int xn_enable_lro = 1; TUNABLE_INT("hw.xn.enable_lro", &xn_enable_lro); #else #define IFCAP_TSO4 0 #define CSUM_TSO 0 #endif #ifdef CONFIG_XEN static int MODPARM_rx_copy = 0; module_param_named(rx_copy, MODPARM_rx_copy, bool, 0); MODULE_PARM_DESC(rx_copy, "Copy packets from network card (rather than flip)"); static int MODPARM_rx_flip = 0; module_param_named(rx_flip, MODPARM_rx_flip, bool, 0); MODULE_PARM_DESC(rx_flip, "Flip packets from network card (rather than copy)"); #else static const int MODPARM_rx_copy = 1; static const int MODPARM_rx_flip = 0; #endif /** * \brief The maximum allowed data fragments in a single transmit * request. * * This limit is imposed by the backend driver. We assume here that * we are dealing with a Linux driver domain and have set our limit * to mirror the Linux MAX_SKB_FRAGS constant. 
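*
* With the usual 4 KB PAGE_SIZE, the definition below evaluates to
* 65536 / 4096 + 2 = 18 transmit request slots per packet.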
*/ #define MAX_TX_REQ_FRAGS (65536 / PAGE_SIZE + 2) #define RX_COPY_THRESHOLD 256 #define net_ratelimit() 0 struct netfront_info; struct netfront_rx_info; static void xn_txeof(struct netfront_info *); static void xn_rxeof(struct netfront_info *); static void network_alloc_rx_buffers(struct netfront_info *); static void xn_tick_locked(struct netfront_info *); static void xn_tick(void *); static void xn_intr(void *); static inline int xn_count_frags(struct mbuf *m); static int xn_assemble_tx_request(struct netfront_info *sc, struct mbuf *m_head); static void xn_start_locked(struct ifnet *); static void xn_start(struct ifnet *); static int xn_ioctl(struct ifnet *, u_long, caddr_t); static void xn_ifinit_locked(struct netfront_info *); static void xn_ifinit(void *); static void xn_stop(struct netfront_info *); static void xn_query_features(struct netfront_info *np); static int xn_configure_features(struct netfront_info *np); #ifdef notyet static void xn_watchdog(struct ifnet *); #endif #ifdef notyet static void netfront_closing(device_t dev); #endif static void netif_free(struct netfront_info *info); static int netfront_detach(device_t dev); static int talk_to_backend(device_t dev, struct netfront_info *info); static int create_netdev(device_t dev); static void netif_disconnect_backend(struct netfront_info *info); static int setup_device(device_t dev, struct netfront_info *info); static void free_ring(int *ref, void *ring_ptr_ref); static int xn_ifmedia_upd(struct ifnet *ifp); static void xn_ifmedia_sts(struct ifnet *ifp, struct ifmediareq *ifmr); /* Xenolinux helper functions */ int network_connect(struct netfront_info *); static void xn_free_rx_ring(struct netfront_info *); static void xn_free_tx_ring(struct netfront_info *); static int xennet_get_responses(struct netfront_info *np, struct netfront_rx_info *rinfo, RING_IDX rp, RING_IDX *cons, struct mbuf **list, int *pages_flipped_p); #define virt_to_mfn(x) (vtomach(x) >> PAGE_SHIFT) #define INVALID_P2M_ENTRY (~0UL) /* * Mbuf pointers. We need these to keep track of the virtual addresses * of our mbuf chains since we can only convert from virtual to physical, * not the other way around. The size must track the free index arrays. 
*/ struct xn_chain_data { struct mbuf *xn_tx_chain[NET_TX_RING_SIZE+1]; int xn_tx_chain_cnt; struct mbuf *xn_rx_chain[NET_RX_RING_SIZE+1]; }; struct net_device_stats { u_long rx_packets; /* total packets received */ u_long tx_packets; /* total packets transmitted */ u_long rx_bytes; /* total bytes received */ u_long tx_bytes; /* total bytes transmitted */ u_long rx_errors; /* bad packets received */ u_long tx_errors; /* packet transmit problems */ u_long rx_dropped; /* no space in linux buffers */ u_long tx_dropped; /* no space available in linux */ u_long multicast; /* multicast packets received */ u_long collisions; /* detailed rx_errors: */ u_long rx_length_errors; u_long rx_over_errors; /* receiver ring buff overflow */ u_long rx_crc_errors; /* recved pkt with crc error */ u_long rx_frame_errors; /* recv'd frame alignment error */ u_long rx_fifo_errors; /* recv'r fifo overrun */ u_long rx_missed_errors; /* receiver missed packet */ /* detailed tx_errors */ u_long tx_aborted_errors; u_long tx_carrier_errors; u_long tx_fifo_errors; u_long tx_heartbeat_errors; u_long tx_window_errors; /* for cslip etc */ u_long rx_compressed; u_long tx_compressed; }; struct netfront_info { struct ifnet *xn_ifp; #if __FreeBSD_version >= 700000 struct lro_ctrl xn_lro; #endif struct net_device_stats stats; u_int tx_full; netif_tx_front_ring_t tx; netif_rx_front_ring_t rx; struct mtx tx_lock; struct mtx rx_lock; struct mtx sc_lock; xen_intr_handle_t xen_intr_handle; u_int copying_receiver; u_int carrier; u_int maxfrags; /* Receive-ring batched refills. */ #define RX_MIN_TARGET 32 #define RX_MAX_TARGET NET_RX_RING_SIZE int rx_min_target; int rx_max_target; int rx_target; grant_ref_t gref_tx_head; grant_ref_t grant_tx_ref[NET_TX_RING_SIZE + 1]; grant_ref_t gref_rx_head; grant_ref_t grant_rx_ref[NET_TX_RING_SIZE + 1]; device_t xbdev; int tx_ring_ref; int rx_ring_ref; uint8_t mac[ETHER_ADDR_LEN]; struct xn_chain_data xn_cdata; /* mbufs */ struct mbuf_head xn_rx_batch; /* head of the batch queue */ int xn_if_flags; struct callout xn_stat_ch; u_long rx_pfn_array[NET_RX_RING_SIZE]; multicall_entry_t rx_mcl[NET_RX_RING_SIZE+1]; mmu_update_t rx_mmu[NET_RX_RING_SIZE]; struct ifmedia sc_media; }; #define rx_mbufs xn_cdata.xn_rx_chain #define tx_mbufs xn_cdata.xn_tx_chain #define XN_LOCK_INIT(_sc, _name) \ mtx_init(&(_sc)->tx_lock, #_name"_tx", "network transmit lock", MTX_DEF); \ mtx_init(&(_sc)->rx_lock, #_name"_rx", "network receive lock", MTX_DEF); \ mtx_init(&(_sc)->sc_lock, #_name"_sc", "netfront softc lock", MTX_DEF) #define XN_RX_LOCK(_sc) mtx_lock(&(_sc)->rx_lock) #define XN_RX_UNLOCK(_sc) mtx_unlock(&(_sc)->rx_lock) #define XN_TX_LOCK(_sc) mtx_lock(&(_sc)->tx_lock) #define XN_TX_UNLOCK(_sc) mtx_unlock(&(_sc)->tx_lock) #define XN_LOCK(_sc) mtx_lock(&(_sc)->sc_lock); #define XN_UNLOCK(_sc) mtx_unlock(&(_sc)->sc_lock); #define XN_LOCK_ASSERT(_sc) mtx_assert(&(_sc)->sc_lock, MA_OWNED); #define XN_RX_LOCK_ASSERT(_sc) mtx_assert(&(_sc)->rx_lock, MA_OWNED); #define XN_TX_LOCK_ASSERT(_sc) mtx_assert(&(_sc)->tx_lock, MA_OWNED); #define XN_LOCK_DESTROY(_sc) mtx_destroy(&(_sc)->rx_lock); \ mtx_destroy(&(_sc)->tx_lock); \ mtx_destroy(&(_sc)->sc_lock); struct netfront_rx_info { struct netif_rx_response rx; struct netif_extra_info extras[XEN_NETIF_EXTRA_TYPE_MAX - 1]; }; #define netfront_carrier_on(netif) ((netif)->carrier = 1) #define netfront_carrier_off(netif) ((netif)->carrier = 0) #define netfront_carrier_ok(netif) ((netif)->carrier) /* Access macros for acquiring freeing slots in xn_free_{tx,rx}_idxs[]. 
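*
* The scheme, as implemented just below: entry 0 of each array is the
* free-list head, and a free slot stores the index of the next free slot
* cast to a struct mbuf pointer.  This is safe because no valid kernel
* address is small enough to be mistaken for a ring index.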
*/ static inline void add_id_to_freelist(struct mbuf **list, uintptr_t id) { KASSERT(id != 0, ("%s: the head item (0) must always be free.", __func__)); list[id] = list[0]; list[0] = (struct mbuf *)id; } static inline unsigned short get_id_from_freelist(struct mbuf **list) { uintptr_t id; id = (uintptr_t)list[0]; KASSERT(id != 0, ("%s: the head item (0) must always remain free.", __func__)); list[0] = list[id]; return (id); } static inline int xennet_rxidx(RING_IDX idx) { return idx & (NET_RX_RING_SIZE - 1); } static inline struct mbuf * xennet_get_rx_mbuf(struct netfront_info *np, RING_IDX ri) { int i = xennet_rxidx(ri); struct mbuf *m; m = np->rx_mbufs[i]; np->rx_mbufs[i] = NULL; return (m); } static inline grant_ref_t xennet_get_rx_ref(struct netfront_info *np, RING_IDX ri) { int i = xennet_rxidx(ri); grant_ref_t ref = np->grant_rx_ref[i]; KASSERT(ref != GRANT_REF_INVALID, ("Invalid grant reference!\n")); np->grant_rx_ref[i] = GRANT_REF_INVALID; return ref; } #define IPRINTK(fmt, args...) \ printf("[XEN] " fmt, ##args) #ifdef INVARIANTS #define WPRINTK(fmt, args...) \ printf("[XEN] " fmt, ##args) #else #define WPRINTK(fmt, args...) #endif #ifdef DEBUG #define DPRINTK(fmt, args...) \ printf("[XEN] %s: " fmt, __func__, ##args) #else #define DPRINTK(fmt, args...) #endif /** * Read the 'mac' node at the given device's node in the store, and parse that * as colon-separated octets, placing result the given mac array. mac must be * a preallocated array of length ETH_ALEN (as declared in linux/if_ether.h). * Return 0 on success, or errno on error. */ static int xen_net_read_mac(device_t dev, uint8_t mac[]) { int error, i; char *s, *e, *macstr; const char *path; path = xenbus_get_node(dev); error = xs_read(XST_NIL, path, "mac", NULL, (void **) &macstr); if (error == ENOENT) { /* * Deal with missing mac XenStore nodes on devices with * HVM emulation (the 'ioemu' configuration attribute) * enabled. * * The HVM emulator may execute in a stub device model * domain which lacks the permission, only given to Dom0, * to update the guest's XenStore tree. For this reason, * the HVM emulator doesn't even attempt to write the * front-side mac node, even when operating in Dom0. * However, there should always be a mac listed in the * backend tree. Fallback to this version if our query * of the front side XenStore location doesn't find * anything. */ path = xenbus_get_otherend_path(dev); error = xs_read(XST_NIL, path, "mac", NULL, (void **) &macstr); } if (error != 0) { xenbus_dev_fatal(dev, error, "parsing %s/mac", path); return (error); } s = macstr; for (i = 0; i < ETHER_ADDR_LEN; i++) { mac[i] = strtoul(s, &e, 16); if (s == e || (e[0] != ':' && e[0] != 0)) { free(macstr, M_XENBUS); return (ENOENT); } s = &e[1]; } free(macstr, M_XENBUS); return (0); } /** * Entry point to this code when a new device is created. Allocate the basic * structures and the ring buffers for communication with the backend, and * inform the backend of the appropriate details for those. Switch to * Connected state. 
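*
* The details the backend needs are written from talk_to_backend() below:
* tx-ring-ref, rx-ring-ref, event-channel, request-rx-copy and the
* feature-* nodes are all published to this device's XenStore node in a
* single transaction.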
*/ static int netfront_probe(device_t dev) { if (!strcmp(xenbus_get_type(dev), "vif")) { device_set_desc(dev, "Virtual Network Interface"); return (0); } return (ENXIO); } static int netfront_attach(device_t dev) { int err; err = create_netdev(dev); if (err) { xenbus_dev_fatal(dev, err, "creating netdev"); return (err); } #if __FreeBSD_version >= 700000 SYSCTL_ADD_INT(device_get_sysctl_ctx(dev), SYSCTL_CHILDREN(device_get_sysctl_tree(dev)), OID_AUTO, "enable_lro", CTLFLAG_RW, &xn_enable_lro, 0, "Large Receive Offload"); #endif return (0); } static int netfront_suspend(device_t dev) { struct netfront_info *info = device_get_softc(dev); XN_RX_LOCK(info); XN_TX_LOCK(info); netfront_carrier_off(info); XN_TX_UNLOCK(info); XN_RX_UNLOCK(info); return (0); } /** * We are reconnecting to the backend, due to a suspend/resume, or a backend * driver restart. We tear down our netif structure and recreate it, but * leave the device-layer structures intact so that this is transparent to the * rest of the kernel. */ static int netfront_resume(device_t dev) { struct netfront_info *info = device_get_softc(dev); netif_disconnect_backend(info); return (0); } /* Common code used when first setting up, and when resuming. */ static int talk_to_backend(device_t dev, struct netfront_info *info) { const char *message; struct xs_transaction xst; const char *node = xenbus_get_node(dev); int err; err = xen_net_read_mac(dev, info->mac); if (err) { xenbus_dev_fatal(dev, err, "parsing %s/mac", node); goto out; } /* Create shared ring, alloc event channel. */ err = setup_device(dev, info); if (err) goto out; again: err = xs_transaction_start(&xst); if (err) { xenbus_dev_fatal(dev, err, "starting transaction"); goto destroy_ring; } err = xs_printf(xst, node, "tx-ring-ref","%u", info->tx_ring_ref); if (err) { message = "writing tx ring-ref"; goto abort_transaction; } err = xs_printf(xst, node, "rx-ring-ref","%u", info->rx_ring_ref); if (err) { message = "writing rx ring-ref"; goto abort_transaction; } err = xs_printf(xst, node, "event-channel", "%u", xen_intr_port(info->xen_intr_handle)); if (err) { message = "writing event-channel"; goto abort_transaction; } err = xs_printf(xst, node, "request-rx-copy", "%u", info->copying_receiver); if (err) { message = "writing request-rx-copy"; goto abort_transaction; } err = xs_printf(xst, node, "feature-rx-notify", "%d", 1); if (err) { message = "writing feature-rx-notify"; goto abort_transaction; } err = xs_printf(xst, node, "feature-sg", "%d", 1); if (err) { message = "writing feature-sg"; goto abort_transaction; } #if __FreeBSD_version >= 700000 err = xs_printf(xst, node, "feature-gso-tcpv4", "%d", 1); if (err) { message = "writing feature-gso-tcpv4"; goto abort_transaction; } #endif err = xs_transaction_end(xst, 0); if (err) { if (err == EAGAIN) goto again; xenbus_dev_fatal(dev, err, "completing transaction"); goto destroy_ring; } return 0; abort_transaction: xs_transaction_end(xst, 1); xenbus_dev_fatal(dev, err, "%s", message); destroy_ring: netif_free(info); out: return err; } static int setup_device(device_t dev, struct netfront_info *info) { netif_tx_sring_t *txs; netif_rx_sring_t *rxs; int error; struct ifnet *ifp; ifp = info->xn_ifp; info->tx_ring_ref = GRANT_REF_INVALID; info->rx_ring_ref = GRANT_REF_INVALID; info->rx.sring = NULL; info->tx.sring = NULL; txs = (netif_tx_sring_t *)malloc(PAGE_SIZE, M_DEVBUF, M_NOWAIT|M_ZERO); if (!txs) { error = ENOMEM; xenbus_dev_fatal(dev, error, "allocating tx ring page"); goto fail; } SHARED_RING_INIT(txs); FRONT_RING_INIT(&info->tx, 
txs, PAGE_SIZE); error = xenbus_grant_ring(dev, virt_to_mfn(txs), &info->tx_ring_ref); if (error) goto fail; rxs = (netif_rx_sring_t *)malloc(PAGE_SIZE, M_DEVBUF, M_NOWAIT|M_ZERO); if (!rxs) { error = ENOMEM; xenbus_dev_fatal(dev, error, "allocating rx ring page"); goto fail; } SHARED_RING_INIT(rxs); FRONT_RING_INIT(&info->rx, rxs, PAGE_SIZE); error = xenbus_grant_ring(dev, virt_to_mfn(rxs), &info->rx_ring_ref); if (error) goto fail; error = xen_intr_alloc_and_bind_local_port(dev, xenbus_get_otherend_id(dev), /*filter*/NULL, xn_intr, info, INTR_TYPE_NET | INTR_MPSAFE | INTR_ENTROPY, &info->xen_intr_handle); if (error) { xenbus_dev_fatal(dev, error, "xen_intr_alloc_and_bind_local_port failed"); goto fail; } return (0); fail: netif_free(info); return (error); } #ifdef INET /** * If this interface has an ipv4 address, send an arp for it. This * helps to get the network going again after migrating hosts. */ static void netfront_send_fake_arp(device_t dev, struct netfront_info *info) { struct ifnet *ifp; struct ifaddr *ifa; ifp = info->xn_ifp; TAILQ_FOREACH(ifa, &ifp->if_addrhead, ifa_link) { if (ifa->ifa_addr->sa_family == AF_INET) { arp_ifinit(ifp, ifa); } } } #endif /** * Callback received when the backend's state changes. */ static void netfront_backend_changed(device_t dev, XenbusState newstate) { struct netfront_info *sc = device_get_softc(dev); DPRINTK("newstate=%d\n", newstate); switch (newstate) { case XenbusStateInitialising: case XenbusStateInitialised: case XenbusStateConnected: case XenbusStateUnknown: case XenbusStateClosed: case XenbusStateReconfigured: case XenbusStateReconfiguring: break; case XenbusStateInitWait: if (xenbus_get_state(dev) != XenbusStateInitialising) break; if (network_connect(sc) != 0) break; xenbus_set_state(dev, XenbusStateConnected); #ifdef INET netfront_send_fake_arp(dev, sc); #endif break; case XenbusStateClosing: xenbus_set_state(dev, XenbusStateClosed); break; } } static void xn_free_rx_ring(struct netfront_info *sc) { #if 0 int i; for (i = 0; i < NET_RX_RING_SIZE; i++) { if (sc->xn_cdata.rx_mbufs[i] != NULL) { m_freem(sc->rx_mbufs[i]); sc->rx_mbufs[i] = NULL; } } sc->rx.rsp_cons = 0; sc->xn_rx_if->req_prod = 0; sc->xn_rx_if->event = sc->rx.rsp_cons ; #endif } static void xn_free_tx_ring(struct netfront_info *sc) { #if 0 int i; for (i = 0; i < NET_TX_RING_SIZE; i++) { if (sc->tx_mbufs[i] != NULL) { m_freem(sc->tx_mbufs[i]); sc->xn_cdata.xn_tx_chain[i] = NULL; } } return; #endif } /** * \brief Verify that there is sufficient space in the Tx ring * buffer for a maximally sized request to be enqueued. * * A transmit request requires a transmit descriptor for each packet * fragment, plus up to 2 entries for "options" (e.g. TSO). */ static inline int xn_tx_slot_available(struct netfront_info *np) { return (RING_FREE_REQUESTS(&np->tx) > (MAX_TX_REQ_FRAGS + 2)); } static void netif_release_tx_bufs(struct netfront_info *np) { int i; for (i = 1; i <= NET_TX_RING_SIZE; i++) { struct mbuf *m; m = np->tx_mbufs[i]; /* * We assume that no kernel addresses are * less than NET_TX_RING_SIZE. Any entry * in the table that is below this number * must be an index from free-list tracking. 
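 *
 * Added note (not in the original source): free-list links are stored as
 * the integers 0..NET_TX_RING_SIZE (see add_id_to_freelist() above and the
 * initialisation in create_netdev()), while real mbuf pointers are kernel
 * virtual addresses and therefore far larger, so this comparison is what
 * tells a free slot apart from a slot holding an outstanding mbuf.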
*/ if (((uintptr_t)m) <= NET_TX_RING_SIZE) continue; gnttab_end_foreign_access_ref(np->grant_tx_ref[i]); gnttab_release_grant_reference(&np->gref_tx_head, np->grant_tx_ref[i]); np->grant_tx_ref[i] = GRANT_REF_INVALID; add_id_to_freelist(np->tx_mbufs, i); np->xn_cdata.xn_tx_chain_cnt--; if (np->xn_cdata.xn_tx_chain_cnt < 0) { panic("%s: tx_chain_cnt must be >= 0", __func__); } m_free(m); } } static void network_alloc_rx_buffers(struct netfront_info *sc) { int otherend_id = xenbus_get_otherend_id(sc->xbdev); unsigned short id; struct mbuf *m_new; int i, batch_target, notify; RING_IDX req_prod; struct xen_memory_reservation reservation; grant_ref_t ref; int nr_flips; netif_rx_request_t *req; vm_offset_t vaddr; u_long pfn; req_prod = sc->rx.req_prod_pvt; if (__predict_false(sc->carrier == 0)) return; /* * Allocate mbufs greedily, even though we batch updates to the * receive ring. This creates a less bursty demand on the memory * allocator, and so should reduce the chance of failed allocation * requests both for ourself and for other kernel subsystems. * * Here we attempt to maintain rx_target buffers in flight, counting * buffers that we have yet to process in the receive ring. */ batch_target = sc->rx_target - (req_prod - sc->rx.rsp_cons); for (i = mbufq_len(&sc->xn_rx_batch); i < batch_target; i++) { MGETHDR(m_new, M_NOWAIT, MT_DATA); if (m_new == NULL) { printf("%s: MGETHDR failed\n", __func__); goto no_mbuf; } if (m_cljget(m_new, M_NOWAIT, MJUMPAGESIZE) == NULL) { printf("%s: m_cljget failed\n", __func__); m_freem(m_new); no_mbuf: if (i != 0) goto refill; /* * XXX set timer */ break; } m_new->m_len = m_new->m_pkthdr.len = MJUMPAGESIZE; /* queue the mbufs allocated */ mbufq_tail(&sc->xn_rx_batch, m_new); } /* * If we've allocated at least half of our target number of entries, * submit them to the backend - we have enough to make the overhead * of submission worthwhile. Otherwise wait for more mbufs and * request entries to become available. */ if (i < (sc->rx_target/2)) { if (req_prod >sc->rx.sring->req_prod) goto push; return; } /* * Double floating fill target if we risked having the backend * run out of empty buffers for receive traffic. We define "running * low" as having less than a fourth of our target buffers free * at the time we refilled the queue. */ if ((req_prod - sc->rx.sring->rsp_prod) < (sc->rx_target / 4)) { sc->rx_target *= 2; if (sc->rx_target > sc->rx_max_target) sc->rx_target = sc->rx_max_target; } refill: for (nr_flips = i = 0; ; i++) { if ((m_new = mbufq_dequeue(&sc->xn_rx_batch)) == NULL) break; m_new->m_ext.ext_arg1 = (vm_paddr_t *)(uintptr_t)( vtophys(m_new->m_ext.ext_buf) >> PAGE_SHIFT); id = xennet_rxidx(req_prod + i); KASSERT(sc->rx_mbufs[id] == NULL, ("non-NULL xm_rx_chain")); sc->rx_mbufs[id] = m_new; ref = gnttab_claim_grant_reference(&sc->gref_rx_head); KASSERT(ref != GNTTAB_LIST_END, ("reserved grant references exhuasted")); sc->grant_rx_ref[id] = ref; vaddr = mtod(m_new, vm_offset_t); pfn = vtophys(vaddr) >> PAGE_SHIFT; req = RING_GET_REQUEST(&sc->rx, req_prod + i); if (sc->copying_receiver == 0) { gnttab_grant_foreign_transfer_ref(ref, otherend_id, pfn); sc->rx_pfn_array[nr_flips] = PFNTOMFN(pfn); if (!xen_feature(XENFEAT_auto_translated_physmap)) { /* Remove this page before passing * back to Xen. 
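 *
 * Added note (not in the original source): this branch is the page-flipping
 * (grant transfer) receive path.  The page backing the mbuf is about to be
 * given away to the backend, so its pseudo-physical-to-machine translation
 * is invalidated and the kernel virtual mapping is zapped here; the
 * XENMEM_decrease_reservation call that actually hands the pages back is
 * batched further down into a single multicall together with the PTE
 * updates.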
*/ set_phys_to_machine(pfn, INVALID_P2M_ENTRY); MULTI_update_va_mapping(&sc->rx_mcl[i], vaddr, 0, 0); } nr_flips++; } else { gnttab_grant_foreign_access_ref(ref, otherend_id, PFNTOMFN(pfn), 0); } req->id = id; req->gref = ref; sc->rx_pfn_array[i] = vtomach(mtod(m_new,vm_offset_t)) >> PAGE_SHIFT; } KASSERT(i, ("no mbufs processed")); /* should have returned earlier */ KASSERT(mbufq_len(&sc->xn_rx_batch) == 0, ("not all mbufs processed")); /* * We may have allocated buffers which have entries outstanding * in the page * update queue -- make sure we flush those first! */ PT_UPDATES_FLUSH(); if (nr_flips != 0) { #ifdef notyet /* Tell the ballon driver what is going on. */ balloon_update_driver_allowance(i); #endif set_xen_guest_handle(reservation.extent_start, sc->rx_pfn_array); reservation.nr_extents = i; reservation.extent_order = 0; reservation.address_bits = 0; reservation.domid = DOMID_SELF; if (!xen_feature(XENFEAT_auto_translated_physmap)) { /* After all PTEs have been zapped, flush the TLB. */ sc->rx_mcl[i-1].args[MULTI_UVMFLAGS_INDEX] = UVMF_TLB_FLUSH|UVMF_ALL; /* Give away a batch of pages. */ sc->rx_mcl[i].op = __HYPERVISOR_memory_op; sc->rx_mcl[i].args[0] = XENMEM_decrease_reservation; sc->rx_mcl[i].args[1] = (u_long)&reservation; /* Zap PTEs and give away pages in one big multicall. */ (void)HYPERVISOR_multicall(sc->rx_mcl, i+1); if (__predict_false(sc->rx_mcl[i].result != i || HYPERVISOR_memory_op(XENMEM_decrease_reservation, &reservation) != i)) panic("%s: unable to reduce memory " "reservation\n", __func__); } } else { wmb(); } /* Above is a suitable barrier to ensure backend will see requests. */ sc->rx.req_prod_pvt = req_prod + i; push: RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&sc->rx, notify); if (notify) xen_intr_signal(sc->xen_intr_handle); } static void xn_rxeof(struct netfront_info *np) { struct ifnet *ifp; #if __FreeBSD_version >= 700000 && (defined(INET) || defined(INET6)) struct lro_ctrl *lro = &np->xn_lro; struct lro_entry *queued; #endif struct netfront_rx_info rinfo; struct netif_rx_response *rx = &rinfo.rx; struct netif_extra_info *extras = rinfo.extras; RING_IDX i, rp; multicall_entry_t *mcl; struct mbuf *m; struct mbuf_head rxq, errq; int err, pages_flipped = 0, work_to_do; do { XN_RX_LOCK_ASSERT(np); if (!netfront_carrier_ok(np)) return; mbufq_init(&errq); mbufq_init(&rxq); ifp = np->xn_ifp; rp = np->rx.sring->rsp_prod; rmb(); /* Ensure we see queued responses up to 'rp'. */ i = np->rx.rsp_cons; while ((i != rp)) { memcpy(rx, RING_GET_RESPONSE(&np->rx, i), sizeof(*rx)); memset(extras, 0, sizeof(rinfo.extras)); m = NULL; err = xennet_get_responses(np, &rinfo, rp, &i, &m, &pages_flipped); if (__predict_false(err)) { if (m) mbufq_tail(&errq, m); np->stats.rx_errors++; continue; } m->m_pkthdr.rcvif = ifp; if ( rx->flags & NETRXF_data_validated ) { /* Tell the stack the checksums are okay */ /* * XXX this isn't necessarily the case - need to add * check */ m->m_pkthdr.csum_flags |= (CSUM_IP_CHECKED | CSUM_IP_VALID | CSUM_DATA_VALID | CSUM_PSEUDO_HDR); m->m_pkthdr.csum_data = 0xffff; } np->stats.rx_packets++; np->stats.rx_bytes += m->m_pkthdr.len; mbufq_tail(&rxq, m); np->rx.rsp_cons = i; } if (pages_flipped) { /* Some pages are no longer absent... */ #ifdef notyet balloon_update_driver_allowance(-pages_flipped); #endif /* Do all the remapping work, and M->P updates, in one big * hypercall. 
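 *
 * Added note (not in the original source): xennet_get_responses() queues
 * one MULTI_update_va_mapping entry and one mmu_update record per flipped
 * page in np->rx_mcl[]/np->rx_mmu[]; the code below appends a single
 * __HYPERVISOR_mmu_update element and submits the whole batch with one
 * HYPERVISOR_multicall(), so the hypercall cost is amortised over every
 * page received in this pass.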
*/ if (!!xen_feature(XENFEAT_auto_translated_physmap)) { mcl = np->rx_mcl + pages_flipped; mcl->op = __HYPERVISOR_mmu_update; mcl->args[0] = (u_long)np->rx_mmu; mcl->args[1] = pages_flipped; mcl->args[2] = 0; mcl->args[3] = DOMID_SELF; (void)HYPERVISOR_multicall(np->rx_mcl, pages_flipped + 1); } } while ((m = mbufq_dequeue(&errq))) m_freem(m); /* * Process all the mbufs after the remapping is complete. * Break the mbuf chain first though. */ while ((m = mbufq_dequeue(&rxq)) != NULL) { if_inc_counter(ifp, IFCOUNTER_IPACKETS, 1); /* * Do we really need to drop the rx lock? */ XN_RX_UNLOCK(np); #if __FreeBSD_version >= 700000 && (defined(INET) || defined(INET6)) /* Use LRO if possible */ if ((ifp->if_capenable & IFCAP_LRO) == 0 || lro->lro_cnt == 0 || tcp_lro_rx(lro, m, 0)) { /* * If LRO fails, pass up to the stack * directly. */ (*ifp->if_input)(ifp, m); } #else (*ifp->if_input)(ifp, m); #endif XN_RX_LOCK(np); } np->rx.rsp_cons = i; #if __FreeBSD_version >= 700000 && (defined(INET) || defined(INET6)) /* * Flush any outstanding LRO work */ while (!SLIST_EMPTY(&lro->lro_active)) { queued = SLIST_FIRST(&lro->lro_active); SLIST_REMOVE_HEAD(&lro->lro_active, next); tcp_lro_flush(lro, queued); } #endif #if 0 /* If we get a callback with very few responses, reduce fill target. */ /* NB. Note exponential increase, linear decrease. */ if (((np->rx.req_prod_pvt - np->rx.sring->rsp_prod) > ((3*np->rx_target) / 4)) && (--np->rx_target < np->rx_min_target)) np->rx_target = np->rx_min_target; #endif network_alloc_rx_buffers(np); RING_FINAL_CHECK_FOR_RESPONSES(&np->rx, work_to_do); } while (work_to_do); } static void xn_txeof(struct netfront_info *np) { RING_IDX i, prod; unsigned short id; struct ifnet *ifp; netif_tx_response_t *txr; struct mbuf *m; XN_TX_LOCK_ASSERT(np); if (!netfront_carrier_ok(np)) return; ifp = np->xn_ifp; do { prod = np->tx.sring->rsp_prod; rmb(); /* Ensure we see responses up to 'rp'. */ for (i = np->tx.rsp_cons; i != prod; i++) { txr = RING_GET_RESPONSE(&np->tx, i); if (txr->status == NETIF_RSP_NULL) continue; if (txr->status != NETIF_RSP_OKAY) { printf("%s: WARNING: response is %d!\n", __func__, txr->status); } id = txr->id; m = np->tx_mbufs[id]; KASSERT(m != NULL, ("mbuf not found in xn_tx_chain")); KASSERT((uintptr_t)m > NET_TX_RING_SIZE, ("mbuf already on the free list, but we're " "trying to free it again!")); M_ASSERTVALID(m); /* * Increment packet count if this is the last * mbuf of the chain. */ if (!m->m_next) if_inc_counter(ifp, IFCOUNTER_OPACKETS, 1); if (__predict_false(gnttab_query_foreign_access( np->grant_tx_ref[id]) != 0)) { panic("%s: grant id %u still in use by the " "backend", __func__, id); } gnttab_end_foreign_access_ref( np->grant_tx_ref[id]); gnttab_release_grant_reference( &np->gref_tx_head, np->grant_tx_ref[id]); np->grant_tx_ref[id] = GRANT_REF_INVALID; np->tx_mbufs[id] = NULL; add_id_to_freelist(np->tx_mbufs, id); np->xn_cdata.xn_tx_chain_cnt--; m_free(m); /* Only mark the queue active if we've freed up at least one slot to try */ ifp->if_drv_flags &= ~IFF_DRV_OACTIVE; } np->tx.rsp_cons = prod; /* * Set a new event, then check for race with update of * tx_cons. Note that it is essential to schedule a * callback, no matter how few buffers are pending. Even if * there is space in the transmit ring, higher layers may * be blocked because too much data is outstanding: in such * cases notification from Xen is likely to be the only kick * that we'll get. 
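 *
 * Added note (not in the original source): the threshold below asks the
 * backend to raise the next event once roughly half of the currently
 * outstanding requests have completed, and never later than the next
 * response.  Illustrative numbers: with rsp_prod == 10 and req_prod == 18,
 * rsp_event = 10 + ((18 - 10) >> 1) + 1 = 15, so we are interrupted again
 * after five more responses; with nothing outstanding it degenerates to
 * prod + 1, i.e. "wake me for the very next response".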
*/ np->tx.sring->rsp_event = prod + ((np->tx.sring->req_prod - prod) >> 1) + 1; mb(); } while (prod != np->tx.sring->rsp_prod); if (np->tx_full && ((np->tx.sring->req_prod - prod) < NET_TX_RING_SIZE)) { np->tx_full = 0; #if 0 if (np->user_state == UST_OPEN) netif_wake_queue(dev); #endif } } static void xn_intr(void *xsc) { struct netfront_info *np = xsc; struct ifnet *ifp = np->xn_ifp; #if 0 if (!(np->rx.rsp_cons != np->rx.sring->rsp_prod && likely(netfront_carrier_ok(np)) && ifp->if_drv_flags & IFF_DRV_RUNNING)) return; #endif if (RING_HAS_UNCONSUMED_RESPONSES(&np->tx)) { XN_TX_LOCK(np); xn_txeof(np); XN_TX_UNLOCK(np); } XN_RX_LOCK(np); xn_rxeof(np); XN_RX_UNLOCK(np); if (ifp->if_drv_flags & IFF_DRV_RUNNING && !IFQ_DRV_IS_EMPTY(&ifp->if_snd)) xn_start(ifp); } static void xennet_move_rx_slot(struct netfront_info *np, struct mbuf *m, grant_ref_t ref) { int new = xennet_rxidx(np->rx.req_prod_pvt); KASSERT(np->rx_mbufs[new] == NULL, ("rx_mbufs != NULL")); np->rx_mbufs[new] = m; np->grant_rx_ref[new] = ref; RING_GET_REQUEST(&np->rx, np->rx.req_prod_pvt)->id = new; RING_GET_REQUEST(&np->rx, np->rx.req_prod_pvt)->gref = ref; np->rx.req_prod_pvt++; } static int xennet_get_extras(struct netfront_info *np, struct netif_extra_info *extras, RING_IDX rp, RING_IDX *cons) { struct netif_extra_info *extra; int err = 0; do { struct mbuf *m; grant_ref_t ref; if (__predict_false(*cons + 1 == rp)) { #if 0 if (net_ratelimit()) WPRINTK("Missing extra info\n"); #endif err = EINVAL; break; } extra = (struct netif_extra_info *) RING_GET_RESPONSE(&np->rx, ++(*cons)); if (__predict_false(!extra->type || extra->type >= XEN_NETIF_EXTRA_TYPE_MAX)) { #if 0 if (net_ratelimit()) WPRINTK("Invalid extra type: %d\n", extra->type); #endif err = EINVAL; } else { memcpy(&extras[extra->type - 1], extra, sizeof(*extra)); } m = xennet_get_rx_mbuf(np, *cons); ref = xennet_get_rx_ref(np, *cons); xennet_move_rx_slot(np, m, ref); } while (extra->flags & XEN_NETIF_EXTRA_FLAG_MORE); return err; } static int xennet_get_responses(struct netfront_info *np, struct netfront_rx_info *rinfo, RING_IDX rp, RING_IDX *cons, struct mbuf **list, int *pages_flipped_p) { int pages_flipped = *pages_flipped_p; struct mmu_update *mmu; struct multicall_entry *mcl; struct netif_rx_response *rx = &rinfo->rx; struct netif_extra_info *extras = rinfo->extras; struct mbuf *m, *m0, *m_prev; grant_ref_t ref = xennet_get_rx_ref(np, *cons); RING_IDX ref_cons = *cons; int frags = 1; int err = 0; u_long ret; m0 = m = m_prev = xennet_get_rx_mbuf(np, *cons); if (rx->flags & NETRXF_extra_info) { err = xennet_get_extras(np, extras, rp, cons); } if (m0 != NULL) { m0->m_pkthdr.len = 0; m0->m_next = NULL; } for (;;) { u_long mfn; #if 0 DPRINTK("rx->status=%hd rx->offset=%hu frags=%u\n", rx->status, rx->offset, frags); #endif if (__predict_false(rx->status < 0 || rx->offset + rx->status > PAGE_SIZE)) { #if 0 if (net_ratelimit()) WPRINTK("rx->offset: %x, size: %u\n", rx->offset, rx->status); #endif xennet_move_rx_slot(np, m, ref); if (m0 == m) m0 = NULL; m = NULL; err = EINVAL; goto next_skip_queue; } /* * This definitely indicates a bug, either in this driver or in * the backend driver. In future this should flag the bad * situation to the system controller to reboot the backed. */ if (ref == GRANT_REF_INVALID) { #if 0 if (net_ratelimit()) WPRINTK("Bad rx response id %d.\n", rx->id); #endif printf("%s: Bad rx response id %d.\n", __func__,rx->id); err = EINVAL; goto next; } if (!np->copying_receiver) { /* Memory pressure, insufficient buffer * headroom, ... 
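 *
 * Added note (not in the original source): in the transfer (page-flip)
 * receive path the backend may leave a posted slot unfilled, in which case
 * gnttab_end_foreign_transfer_ref() returns 0 below.  The mbuf and its
 * grant reference are then simply recycled onto the ring again via
 * xennet_move_rx_slot() and ENOMEM is returned to the caller, which
 * accounts the response as a receive error.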
*/ if (!(mfn = gnttab_end_foreign_transfer_ref(ref))) { WPRINTK("Unfulfilled rx req (id=%d, st=%d).\n", rx->id, rx->status); xennet_move_rx_slot(np, m, ref); err = ENOMEM; goto next; } if (!xen_feature( XENFEAT_auto_translated_physmap)) { /* Remap the page. */ void *vaddr = mtod(m, void *); uint32_t pfn; mcl = np->rx_mcl + pages_flipped; mmu = np->rx_mmu + pages_flipped; MULTI_update_va_mapping(mcl, (u_long)vaddr, (((vm_paddr_t)mfn) << PAGE_SHIFT) | PG_RW | PG_V | PG_M | PG_A, 0); pfn = (uintptr_t)m->m_ext.ext_arg1; mmu->ptr = ((vm_paddr_t)mfn << PAGE_SHIFT) | MMU_MACHPHYS_UPDATE; mmu->val = pfn; set_phys_to_machine(pfn, mfn); } pages_flipped++; } else { ret = gnttab_end_foreign_access_ref(ref); KASSERT(ret, ("ret != 0")); } gnttab_release_grant_reference(&np->gref_rx_head, ref); next: if (m == NULL) break; m->m_len = rx->status; m->m_data += rx->offset; m0->m_pkthdr.len += rx->status; next_skip_queue: if (!(rx->flags & NETRXF_more_data)) break; if (*cons + frags == rp) { if (net_ratelimit()) WPRINTK("Need more frags\n"); err = ENOENT; printf("%s: cons %u frags %u rp %u, not enough frags\n", __func__, *cons, frags, rp); break; } /* * Note that m can be NULL, if rx->status < 0 or if * rx->offset + rx->status > PAGE_SIZE above. */ m_prev = m; rx = RING_GET_RESPONSE(&np->rx, *cons + frags); m = xennet_get_rx_mbuf(np, *cons + frags); /* * m_prev == NULL can happen if rx->status < 0 or if * rx->offset + * rx->status > PAGE_SIZE above. */ if (m_prev != NULL) m_prev->m_next = m; /* * m0 can be NULL if rx->status < 0 or if * rx->offset + * rx->status > PAGE_SIZE above. */ if (m0 == NULL) m0 = m; m->m_next = NULL; ref = xennet_get_rx_ref(np, *cons + frags); ref_cons = *cons + frags; frags++; } *list = m0; *cons += frags; *pages_flipped_p = pages_flipped; return (err); } static void xn_tick_locked(struct netfront_info *sc) { XN_RX_LOCK_ASSERT(sc); callout_reset(&sc->xn_stat_ch, hz, xn_tick, sc); /* XXX placeholder for printing debug information */ } static void xn_tick(void *xsc) { struct netfront_info *sc; sc = xsc; XN_RX_LOCK(sc); xn_tick_locked(sc); XN_RX_UNLOCK(sc); } /** * \brief Count the number of fragments in an mbuf chain. * * Surprisingly, there isn't an M* macro for this. */ static inline int xn_count_frags(struct mbuf *m) { int nfrags; for (nfrags = 0; m != NULL; m = m->m_next) nfrags++; return (nfrags); } /** * Given an mbuf chain, make sure we have enough room and then push * it onto the transmit ring. */ static int xn_assemble_tx_request(struct netfront_info *sc, struct mbuf *m_head) { struct ifnet *ifp; struct mbuf *m; u_int nfrags; netif_extra_info_t *extra; int otherend_id; ifp = sc->xn_ifp; /** * Defragment the mbuf if necessary. */ nfrags = xn_count_frags(m_head); /* * Check to see whether this request is longer than netback * can handle, and try to defrag it. */ /** * It is a bit lame, but the netback driver in Linux can't * deal with nfrags > MAX_TX_REQ_FRAGS, which is a quirk of * the Linux network stack. */ if (nfrags > sc->maxfrags) { m = m_defrag(m_head, M_NOWAIT); if (!m) { /* * Defrag failed, so free the mbuf and * therefore drop the packet. */ m_freem(m_head); return (EMSGSIZE); } m_head = m; } /* Determine how many fragments now exist */ nfrags = xn_count_frags(m_head); /* * Check to see whether the defragmented packet has too many * segments for the Linux netback driver. */ /** * The FreeBSD TCP stack, with TSO enabled, can produce a chain * of mbufs longer than Linux can handle. 
Make sure we don't * pass a too-long chain over to the other side by dropping the * packet. It doesn't look like there is currently a way to * tell the TCP stack to generate a shorter chain of packets. */ if (nfrags > MAX_TX_REQ_FRAGS) { #ifdef DEBUG printf("%s: nfrags %d > MAX_TX_REQ_FRAGS %d, netback " "won't be able to handle it, dropping\n", __func__, nfrags, MAX_TX_REQ_FRAGS); #endif m_freem(m_head); return (EMSGSIZE); } /* * This check should be redundant. We've already verified that we * have enough slots in the ring to handle a packet of maximum * size, and that our packet is less than the maximum size. Keep * it in here as an assert for now just to make certain that * xn_tx_chain_cnt is accurate. */ KASSERT((sc->xn_cdata.xn_tx_chain_cnt + nfrags) <= NET_TX_RING_SIZE, ("%s: xn_tx_chain_cnt (%d) + nfrags (%d) > NET_TX_RING_SIZE " "(%d)!", __func__, (int) sc->xn_cdata.xn_tx_chain_cnt, (int) nfrags, (int) NET_TX_RING_SIZE)); /* * Start packing the mbufs in this chain into * the fragment pointers. Stop when we run out * of fragments or hit the end of the mbuf chain. */ m = m_head; extra = NULL; otherend_id = xenbus_get_otherend_id(sc->xbdev); for (m = m_head; m; m = m->m_next) { netif_tx_request_t *tx; uintptr_t id; grant_ref_t ref; u_long mfn; /* XXX Wrong type? */ tx = RING_GET_REQUEST(&sc->tx, sc->tx.req_prod_pvt); id = get_id_from_freelist(sc->tx_mbufs); if (id == 0) panic("%s: was allocated the freelist head!\n", __func__); sc->xn_cdata.xn_tx_chain_cnt++; if (sc->xn_cdata.xn_tx_chain_cnt > NET_TX_RING_SIZE) panic("%s: tx_chain_cnt must be <= NET_TX_RING_SIZE\n", __func__); sc->tx_mbufs[id] = m; tx->id = id; ref = gnttab_claim_grant_reference(&sc->gref_tx_head); KASSERT((short)ref >= 0, ("Negative ref")); mfn = virt_to_mfn(mtod(m, vm_offset_t)); gnttab_grant_foreign_access_ref(ref, otherend_id, mfn, GNTMAP_readonly); tx->gref = sc->grant_tx_ref[id] = ref; tx->offset = mtod(m, vm_offset_t) & (PAGE_SIZE - 1); tx->flags = 0; if (m == m_head) { /* * The first fragment has the entire packet * size, subsequent fragments have just the * fragment size. The backend works out the * true size of the first fragment by * subtracting the sizes of the other * fragments. */ tx->size = m->m_pkthdr.len; /* * The first fragment contains the checksum flags * and is optionally followed by extra data for * TSO etc. */ /** * CSUM_TSO requires checksum offloading. * Some versions of FreeBSD fail to * set CSUM_TCP in the CSUM_TSO case, * so we have to test for CSUM_TSO * explicitly. 
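 *
 * Added note (not in the original source): NETTXF_csum_blank tells the
 * backend that the checksum field still has to be filled in, and
 * NETTXF_data_validated that the payload itself can be trusted.  For a TSO
 * packet an additional ring slot is consumed right after this request for
 * a XEN_NETIF_EXTRA_TYPE_GSO descriptor carrying tso_segsz, which is why
 * xn_tx_slot_available() keeps MAX_TX_REQ_FRAGS + 2 slots in reserve.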
*/ if (m->m_pkthdr.csum_flags & (CSUM_DELAY_DATA | CSUM_TSO)) { tx->flags |= (NETTXF_csum_blank | NETTXF_data_validated); } #if __FreeBSD_version >= 700000 if (m->m_pkthdr.csum_flags & CSUM_TSO) { struct netif_extra_info *gso = (struct netif_extra_info *) RING_GET_REQUEST(&sc->tx, ++sc->tx.req_prod_pvt); tx->flags |= NETTXF_extra_info; gso->u.gso.size = m->m_pkthdr.tso_segsz; gso->u.gso.type = XEN_NETIF_GSO_TYPE_TCPV4; gso->u.gso.pad = 0; gso->u.gso.features = 0; gso->type = XEN_NETIF_EXTRA_TYPE_GSO; gso->flags = 0; } #endif } else { tx->size = m->m_len; } if (m->m_next) tx->flags |= NETTXF_more_data; sc->tx.req_prod_pvt++; } BPF_MTAP(ifp, m_head); sc->stats.tx_bytes += m_head->m_pkthdr.len; sc->stats.tx_packets++; return (0); } static void xn_start_locked(struct ifnet *ifp) { struct netfront_info *sc; struct mbuf *m_head; int notify; sc = ifp->if_softc; if (!netfront_carrier_ok(sc)) return; /* * While we have enough transmit slots available for at least one * maximum-sized packet, pull mbufs off the queue and put them on * the transmit ring. */ while (xn_tx_slot_available(sc)) { IF_DEQUEUE(&ifp->if_snd, m_head); if (m_head == NULL) break; if (xn_assemble_tx_request(sc, m_head) != 0) break; } RING_PUSH_REQUESTS_AND_CHECK_NOTIFY(&sc->tx, notify); if (notify) xen_intr_signal(sc->xen_intr_handle); if (RING_FULL(&sc->tx)) { sc->tx_full = 1; #if 0 netif_stop_queue(dev); #endif } } static void xn_start(struct ifnet *ifp) { struct netfront_info *sc; sc = ifp->if_softc; XN_TX_LOCK(sc); xn_start_locked(ifp); XN_TX_UNLOCK(sc); } /* equivalent of network_open() in Linux */ static void xn_ifinit_locked(struct netfront_info *sc) { struct ifnet *ifp; XN_LOCK_ASSERT(sc); ifp = sc->xn_ifp; if (ifp->if_drv_flags & IFF_DRV_RUNNING) return; xn_stop(sc); network_alloc_rx_buffers(sc); sc->rx.sring->rsp_event = sc->rx.rsp_cons + 1; ifp->if_drv_flags |= IFF_DRV_RUNNING; ifp->if_drv_flags &= ~IFF_DRV_OACTIVE; if_link_state_change(ifp, LINK_STATE_UP); callout_reset(&sc->xn_stat_ch, hz, xn_tick, sc); } static void xn_ifinit(void *xsc) { struct netfront_info *sc = xsc; XN_LOCK(sc); xn_ifinit_locked(sc); XN_UNLOCK(sc); } static int xn_ioctl(struct ifnet *ifp, u_long cmd, caddr_t data) { struct netfront_info *sc = ifp->if_softc; struct ifreq *ifr = (struct ifreq *) data; #ifdef INET struct ifaddr *ifa = (struct ifaddr *)data; #endif int mask, error = 0; switch(cmd) { case SIOCSIFADDR: - case SIOCGIFADDR: #ifdef INET XN_LOCK(sc); if (ifa->ifa_addr->sa_family == AF_INET) { ifp->if_flags |= IFF_UP; if (!(ifp->if_drv_flags & IFF_DRV_RUNNING)) xn_ifinit_locked(sc); arp_ifinit(ifp, ifa); XN_UNLOCK(sc); } else { XN_UNLOCK(sc); #endif error = ether_ioctl(ifp, cmd, data); #ifdef INET } #endif break; case SIOCSIFMTU: /* XXX can we alter the MTU on a VN ?*/ #ifdef notyet if (ifr->ifr_mtu > XN_JUMBO_MTU) error = EINVAL; else #endif { ifp->if_mtu = ifr->ifr_mtu; ifp->if_drv_flags &= ~IFF_DRV_RUNNING; xn_ifinit(sc); } break; case SIOCSIFFLAGS: XN_LOCK(sc); if (ifp->if_flags & IFF_UP) { /* * If only the state of the PROMISC flag changed, * then just use the 'set promisc mode' command * instead of reinitializing the entire NIC. Doing * a full re-init means reloading the firmware and * waiting for it to start up, which may take a * second or two. 
*/ #ifdef notyet /* No promiscuous mode with Xen */ if (ifp->if_drv_flags & IFF_DRV_RUNNING && ifp->if_flags & IFF_PROMISC && !(sc->xn_if_flags & IFF_PROMISC)) { XN_SETBIT(sc, XN_RX_MODE, XN_RXMODE_RX_PROMISC); } else if (ifp->if_drv_flags & IFF_DRV_RUNNING && !(ifp->if_flags & IFF_PROMISC) && sc->xn_if_flags & IFF_PROMISC) { XN_CLRBIT(sc, XN_RX_MODE, XN_RXMODE_RX_PROMISC); } else #endif xn_ifinit_locked(sc); } else { if (ifp->if_drv_flags & IFF_DRV_RUNNING) { xn_stop(sc); } } sc->xn_if_flags = ifp->if_flags; XN_UNLOCK(sc); error = 0; break; case SIOCSIFCAP: mask = ifr->ifr_reqcap ^ ifp->if_capenable; if (mask & IFCAP_TXCSUM) { if (IFCAP_TXCSUM & ifp->if_capenable) { ifp->if_capenable &= ~(IFCAP_TXCSUM|IFCAP_TSO4); ifp->if_hwassist &= ~(CSUM_TCP | CSUM_UDP | CSUM_IP | CSUM_TSO); } else { ifp->if_capenable |= IFCAP_TXCSUM; ifp->if_hwassist |= (CSUM_TCP | CSUM_UDP | CSUM_IP); } } if (mask & IFCAP_RXCSUM) { ifp->if_capenable ^= IFCAP_RXCSUM; } #if __FreeBSD_version >= 700000 if (mask & IFCAP_TSO4) { if (IFCAP_TSO4 & ifp->if_capenable) { ifp->if_capenable &= ~IFCAP_TSO4; ifp->if_hwassist &= ~CSUM_TSO; } else if (IFCAP_TXCSUM & ifp->if_capenable) { ifp->if_capenable |= IFCAP_TSO4; ifp->if_hwassist |= CSUM_TSO; } else { IPRINTK("Xen requires tx checksum offload" " be enabled to use TSO\n"); error = EINVAL; } } if (mask & IFCAP_LRO) { ifp->if_capenable ^= IFCAP_LRO; } #endif error = 0; break; case SIOCADDMULTI: case SIOCDELMULTI: #ifdef notyet if (ifp->if_drv_flags & IFF_DRV_RUNNING) { XN_LOCK(sc); xn_setmulti(sc); XN_UNLOCK(sc); error = 0; } #endif /* FALLTHROUGH */ case SIOCSIFMEDIA: case SIOCGIFMEDIA: error = ifmedia_ioctl(ifp, ifr, &sc->sc_media, cmd); break; default: error = ether_ioctl(ifp, cmd, data); } return (error); } static void xn_stop(struct netfront_info *sc) { struct ifnet *ifp; XN_LOCK_ASSERT(sc); ifp = sc->xn_ifp; callout_stop(&sc->xn_stat_ch); xn_free_rx_ring(sc); xn_free_tx_ring(sc); ifp->if_drv_flags &= ~(IFF_DRV_RUNNING | IFF_DRV_OACTIVE); if_link_state_change(ifp, LINK_STATE_DOWN); } /* START of Xenolinux helper functions adapted to FreeBSD */ int network_connect(struct netfront_info *np) { int i, requeue_idx, error; grant_ref_t ref; netif_rx_request_t *req; u_int feature_rx_copy, feature_rx_flip; error = xs_scanf(XST_NIL, xenbus_get_otherend_path(np->xbdev), "feature-rx-copy", NULL, "%u", &feature_rx_copy); if (error) feature_rx_copy = 0; error = xs_scanf(XST_NIL, xenbus_get_otherend_path(np->xbdev), "feature-rx-flip", NULL, "%u", &feature_rx_flip); if (error) feature_rx_flip = 1; /* * Copy packets on receive path if: * (a) This was requested by user, and the backend supports it; or * (b) Flipping was requested, but this is unsupported by the backend. */ np->copying_receiver = ((MODPARM_rx_copy && feature_rx_copy) || (MODPARM_rx_flip && !feature_rx_flip)); /* Recovery procedure: */ error = talk_to_backend(np->xbdev, np); if (error) return (error); /* Step 1: Reinitialise variables. */ xn_query_features(np); xn_configure_features(np); netif_release_tx_bufs(np); /* Step 2: Rebuild the RX buffer freelist and the RX ring itself. 
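 *
 * Added note (not in the original source): after a suspend/resume the
 * receive mbufs that were still posted on the old ring are compacted into
 * slots 0..requeue_idx-1, each one is re-granted to the (possibly new)
 * backend as either a transfer or an access grant depending on
 * np->copying_receiver, and rx.req_prod_pvt is set to the number of
 * requeued buffers so the next ring push re-posts them along with any
 * freshly allocated buffers.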
*/ for (requeue_idx = 0, i = 0; i < NET_RX_RING_SIZE; i++) { struct mbuf *m; u_long pfn; if (np->rx_mbufs[i] == NULL) continue; m = np->rx_mbufs[requeue_idx] = xennet_get_rx_mbuf(np, i); ref = np->grant_rx_ref[requeue_idx] = xennet_get_rx_ref(np, i); req = RING_GET_REQUEST(&np->rx, requeue_idx); pfn = vtophys(mtod(m, vm_offset_t)) >> PAGE_SHIFT; if (!np->copying_receiver) { gnttab_grant_foreign_transfer_ref(ref, xenbus_get_otherend_id(np->xbdev), pfn); } else { gnttab_grant_foreign_access_ref(ref, xenbus_get_otherend_id(np->xbdev), PFNTOMFN(pfn), 0); } req->gref = ref; req->id = requeue_idx; requeue_idx++; } np->rx.req_prod_pvt = requeue_idx; /* Step 3: All public and private state should now be sane. Get * ready to start sending and receiving packets and give the driver * domain a kick because we've probably just requeued some * packets. */ netfront_carrier_on(np); xen_intr_signal(np->xen_intr_handle); XN_TX_LOCK(np); xn_txeof(np); XN_TX_UNLOCK(np); network_alloc_rx_buffers(np); return (0); } static void xn_query_features(struct netfront_info *np) { int val; device_printf(np->xbdev, "backend features:"); if (xs_scanf(XST_NIL, xenbus_get_otherend_path(np->xbdev), "feature-sg", NULL, "%d", &val) < 0) val = 0; np->maxfrags = 1; if (val) { np->maxfrags = MAX_TX_REQ_FRAGS; printf(" feature-sg"); } if (xs_scanf(XST_NIL, xenbus_get_otherend_path(np->xbdev), "feature-gso-tcpv4", NULL, "%d", &val) < 0) val = 0; np->xn_ifp->if_capabilities &= ~(IFCAP_TSO4|IFCAP_LRO); if (val) { np->xn_ifp->if_capabilities |= IFCAP_TSO4|IFCAP_LRO; printf(" feature-gso-tcp4"); } printf("\n"); } static int xn_configure_features(struct netfront_info *np) { int err; err = 0; #if __FreeBSD_version >= 700000 && (defined(INET) || defined(INET6)) if ((np->xn_ifp->if_capenable & IFCAP_LRO) != 0) tcp_lro_free(&np->xn_lro); #endif np->xn_ifp->if_capenable = np->xn_ifp->if_capabilities & ~(IFCAP_LRO|IFCAP_TSO4); np->xn_ifp->if_hwassist &= ~CSUM_TSO; #if __FreeBSD_version >= 700000 && (defined(INET) || defined(INET6)) if (xn_enable_lro && (np->xn_ifp->if_capabilities & IFCAP_LRO) != 0) { err = tcp_lro_init(&np->xn_lro); if (err) { device_printf(np->xbdev, "LRO initialization failed\n"); } else { np->xn_lro.ifp = np->xn_ifp; np->xn_ifp->if_capenable |= IFCAP_LRO; } } if ((np->xn_ifp->if_capabilities & IFCAP_TSO4) != 0) { np->xn_ifp->if_capenable |= IFCAP_TSO4; np->xn_ifp->if_hwassist |= CSUM_TSO; } #endif return (err); } /** * Create a network device. * @param dev Newbus device representing this virtual NIC. */ int create_netdev(device_t dev) { int i; struct netfront_info *np; int err; struct ifnet *ifp; np = device_get_softc(dev); np->xbdev = dev; XN_LOCK_INIT(np, xennetif); ifmedia_init(&np->sc_media, 0, xn_ifmedia_upd, xn_ifmedia_sts); ifmedia_add(&np->sc_media, IFM_ETHER|IFM_MANUAL, 0, NULL); ifmedia_set(&np->sc_media, IFM_ETHER|IFM_MANUAL); np->rx_target = RX_MIN_TARGET; np->rx_min_target = RX_MIN_TARGET; np->rx_max_target = RX_MAX_TARGET; /* Initialise {tx,rx}_skbs to be a free chain containing every entry. 
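 *
 * Added note (not in the original source): the loop below leaves
 * tx_mbufs[0] = 1, tx_mbufs[1] = 2, ... , tx_mbufs[NET_TX_RING_SIZE] = 0,
 * i.e. slot 0 heads a chain that threads through every usable id
 * 1..NET_TX_RING_SIZE and terminates at 0, which is exactly the layout the
 * add_id_to_freelist()/get_id_from_freelist() helpers near the top of the
 * file expect.  ("skbs" is a leftover from the Linux original; the arrays
 * hold mbufs here.)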
*/ for (i = 0; i <= NET_TX_RING_SIZE; i++) { np->tx_mbufs[i] = (void *) ((u_long) i+1); np->grant_tx_ref[i] = GRANT_REF_INVALID; } np->tx_mbufs[NET_TX_RING_SIZE] = (void *)0; for (i = 0; i <= NET_RX_RING_SIZE; i++) { np->rx_mbufs[i] = NULL; np->grant_rx_ref[i] = GRANT_REF_INVALID; } /* A grant for every tx ring slot */ if (gnttab_alloc_grant_references(NET_TX_RING_SIZE, &np->gref_tx_head) != 0) { IPRINTK("#### netfront can't alloc tx grant refs\n"); err = ENOMEM; goto exit; } /* A grant for every rx ring slot */ if (gnttab_alloc_grant_references(RX_MAX_TARGET, &np->gref_rx_head) != 0) { WPRINTK("#### netfront can't alloc rx grant refs\n"); gnttab_free_grant_references(np->gref_tx_head); err = ENOMEM; goto exit; } err = xen_net_read_mac(dev, np->mac); if (err) goto out; /* Set up ifnet structure */ ifp = np->xn_ifp = if_alloc(IFT_ETHER); ifp->if_softc = np; if_initname(ifp, "xn", device_get_unit(dev)); ifp->if_flags = IFF_BROADCAST | IFF_SIMPLEX | IFF_MULTICAST; ifp->if_ioctl = xn_ioctl; ifp->if_output = ether_output; ifp->if_start = xn_start; #ifdef notyet ifp->if_watchdog = xn_watchdog; #endif ifp->if_init = xn_ifinit; ifp->if_snd.ifq_maxlen = NET_TX_RING_SIZE - 1; ifp->if_hwassist = XN_CSUM_FEATURES; ifp->if_capabilities = IFCAP_HWCSUM; ifp->if_hw_tsomax = 65536 - (ETHER_HDR_LEN + ETHER_VLAN_ENCAP_LEN); ifp->if_hw_tsomaxsegcount = MAX_TX_REQ_FRAGS; ifp->if_hw_tsomaxsegsize = PAGE_SIZE; ether_ifattach(ifp, np->mac); callout_init(&np->xn_stat_ch, CALLOUT_MPSAFE); netfront_carrier_off(np); return (0); exit: gnttab_free_grant_references(np->gref_tx_head); out: return (err); } /** * Handle the change of state of the backend to Closing. We must delete our * device-layer structures now, to ensure that writes are flushed through to * the backend. Once is this done, we can switch to Closed in * acknowledgement. */ #if 0 static void netfront_closing(device_t dev) { #if 0 struct netfront_info *info = dev->dev_driver_data; DPRINTK("netfront_closing: %s removed\n", dev->nodename); close_netdev(info); #endif xenbus_switch_state(dev, XenbusStateClosed); } #endif static int netfront_detach(device_t dev) { struct netfront_info *info = device_get_softc(dev); DPRINTK("%s\n", xenbus_get_node(dev)); netif_free(info); return 0; } static void netif_free(struct netfront_info *info) { XN_LOCK(info); xn_stop(info); XN_UNLOCK(info); callout_drain(&info->xn_stat_ch); netif_disconnect_backend(info); if (info->xn_ifp != NULL) { ether_ifdetach(info->xn_ifp); if_free(info->xn_ifp); info->xn_ifp = NULL; } ifmedia_removeall(&info->sc_media); } static void netif_disconnect_backend(struct netfront_info *info) { XN_RX_LOCK(info); XN_TX_LOCK(info); netfront_carrier_off(info); XN_TX_UNLOCK(info); XN_RX_UNLOCK(info); free_ring(&info->tx_ring_ref, &info->tx.sring); free_ring(&info->rx_ring_ref, &info->rx.sring); xen_intr_unbind(&info->xen_intr_handle); } static void free_ring(int *ref, void *ring_ptr_ref) { void **ring_ptr_ptr = ring_ptr_ref; if (*ref != GRANT_REF_INVALID) { /* This API frees the associated storage. 
*/ gnttab_end_foreign_access(*ref, *ring_ptr_ptr); *ref = GRANT_REF_INVALID; } *ring_ptr_ptr = NULL; } static int xn_ifmedia_upd(struct ifnet *ifp) { return (0); } static void xn_ifmedia_sts(struct ifnet *ifp, struct ifmediareq *ifmr) { ifmr->ifm_status = IFM_AVALID|IFM_ACTIVE; ifmr->ifm_active = IFM_ETHER|IFM_MANUAL; } /* ** Driver registration ** */ static device_method_t netfront_methods[] = { /* Device interface */ DEVMETHOD(device_probe, netfront_probe), DEVMETHOD(device_attach, netfront_attach), DEVMETHOD(device_detach, netfront_detach), DEVMETHOD(device_shutdown, bus_generic_shutdown), DEVMETHOD(device_suspend, netfront_suspend), DEVMETHOD(device_resume, netfront_resume), /* Xenbus interface */ DEVMETHOD(xenbus_otherend_changed, netfront_backend_changed), DEVMETHOD_END }; static driver_t netfront_driver = { "xn", netfront_methods, sizeof(struct netfront_info), }; devclass_t netfront_devclass; DRIVER_MODULE(xe, xenbusb_front, netfront_driver, netfront_devclass, NULL, NULL); Index: projects/ifnet/sys =================================================================== --- projects/ifnet/sys (revision 277106) +++ projects/ifnet/sys (revision 277107) Property changes on: projects/ifnet/sys ___________________________________________________________________ Modified: svn:mergeinfo ## -0,0 +0,1 ## Merged /head/sys:r277094-277106 Index: projects/ifnet/usr.bin/sed/process.c =================================================================== --- projects/ifnet/usr.bin/sed/process.c (revision 277106) +++ projects/ifnet/usr.bin/sed/process.c (revision 277107) @@ -1,783 +1,783 @@ /*- * Copyright (c) 1992 Diomidis Spinellis. * Copyright (c) 1992, 1993, 1994 * The Regents of the University of California. All rights reserved. * * This code is derived from software contributed to Berkeley by * Diomidis Spinellis of Imperial College, University of London. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 4. Neither the name of the University nor the names of its contributors * may be used to endorse or promote products derived from this software * without specific prior written permission. * * THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. 
*/ #include __FBSDID("$FreeBSD$"); #ifndef lint static const char sccsid[] = "@(#)process.c 8.6 (Berkeley) 4/20/94"; #endif #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include #include "defs.h" #include "extern.h" static SPACE HS, PS, SS, YS; #define pd PS.deleted #define ps PS.space #define psl PS.len #define psanl PS.append_newline #define hs HS.space #define hsl HS.len -static __inline int applies(struct s_command *); +static inline int applies(struct s_command *); static void do_tr(struct s_tr *); static void flush_appends(void); static void lputs(char *, size_t); static int regexec_e(regex_t *, const char *, int, int, size_t); static void regsub(SPACE *, char *, char *); static int substitute(struct s_command *); struct s_appends *appends; /* Array of pointers to strings to append. */ static int appendx; /* Index into appends array. */ int appendnum; /* Size of appends array. */ static int lastaddr; /* Set by applies if last address of a range. */ static int sdone; /* If any substitutes since last line input. */ /* Iov structure for 'w' commands. */ static regex_t *defpreg; size_t maxnsub; regmatch_t *match; #define OUT() do { \ fwrite(ps, 1, psl, outfile); \ if (psanl) fputc('\n', outfile); \ } while (0) void process(void) { struct s_command *cp; SPACE tspace; size_t oldpsl = 0; char *p; int oldpsanl; p = NULL; for (linenum = 0; mf_fgets(&PS, REPLACE);) { pd = 0; top: cp = prog; redirect: while (cp != NULL) { if (!applies(cp)) { cp = cp->next; continue; } switch (cp->code) { case '{': cp = cp->u.c; goto redirect; case 'a': if (appendx >= appendnum) if ((appends = realloc(appends, sizeof(struct s_appends) * (appendnum *= 2))) == NULL) err(1, "realloc"); appends[appendx].type = AP_STRING; appends[appendx].s = cp->t; appends[appendx].len = strlen(cp->t); appendx++; break; case 'b': cp = cp->u.c; goto redirect; case 'c': pd = 1; psl = 0; if (cp->a2 == NULL || lastaddr || lastline()) (void)fprintf(outfile, "%s", cp->t); break; case 'd': pd = 1; goto new; case 'D': if (pd) goto new; if (psl == 0 || (p = memchr(ps, '\n', psl)) == NULL) { pd = 1; goto new; } else { psl -= (p + 1) - ps; memmove(ps, p + 1, psl); goto top; } case 'g': cspace(&PS, hs, hsl, REPLACE); break; case 'G': cspace(&PS, "\n", 1, APPEND); cspace(&PS, hs, hsl, APPEND); break; case 'h': cspace(&HS, ps, psl, REPLACE); break; case 'H': cspace(&HS, "\n", 1, APPEND); cspace(&HS, ps, psl, APPEND); break; case 'i': (void)fprintf(outfile, "%s", cp->t); break; case 'l': lputs(ps, psl); break; case 'n': if (!nflag && !pd) OUT(); flush_appends(); if (!mf_fgets(&PS, REPLACE)) exit(0); pd = 0; break; case 'N': flush_appends(); cspace(&PS, "\n", 1, APPEND); if (!mf_fgets(&PS, APPEND)) exit(0); break; case 'p': if (pd) break; OUT(); break; case 'P': if (pd) break; if ((p = memchr(ps, '\n', psl)) != NULL) { oldpsl = psl; oldpsanl = psanl; psl = p - ps; psanl = 1; } OUT(); if (p != NULL) { psl = oldpsl; psanl = oldpsanl; } break; case 'q': if (!nflag && !pd) OUT(); flush_appends(); exit(0); case 'r': if (appendx >= appendnum) if ((appends = realloc(appends, sizeof(struct s_appends) * (appendnum *= 2))) == NULL) err(1, "realloc"); appends[appendx].type = AP_FILE; appends[appendx].s = cp->t; appends[appendx].len = strlen(cp->t); appendx++; break; case 's': sdone |= substitute(cp); break; case 't': if (sdone) { sdone = 0; cp = cp->u.c; goto redirect; } break; case 'w': if (pd) break; if (cp->u.fd == -1 && (cp->u.fd = open(cp->t, 
O_WRONLY|O_APPEND|O_CREAT|O_TRUNC, DEFFILEMODE)) == -1) err(1, "%s", cp->t); if (write(cp->u.fd, ps, psl) != (ssize_t)psl || write(cp->u.fd, "\n", 1) != 1) err(1, "%s", cp->t); break; case 'x': /* * If the hold space is null, make it empty * but not null. Otherwise the pattern space * will become null after the swap, which is * an abnormal condition. */ if (hs == NULL) cspace(&HS, "", 0, REPLACE); tspace = PS; PS = HS; psanl = tspace.append_newline; HS = tspace; break; case 'y': if (pd || psl == 0) break; do_tr(cp->u.y); break; case ':': case '}': break; case '=': (void)fprintf(outfile, "%lu\n", linenum); } cp = cp->next; } /* for all cp */ new: if (!nflag && !pd) OUT(); flush_appends(); } /* for all lines */ } /* * TRUE if the address passed matches the current program state * (lastline, linenumber, ps). */ #define MATCH(a) \ ((a)->type == AT_RE ? regexec_e((a)->u.r, ps, 0, 1, psl) : \ (a)->type == AT_LINE ? linenum == (a)->u.l : lastline()) /* * Return TRUE if the command applies to the current line. Sets the start * line for process ranges. Interprets the non-select (``!'') flag. */ -static __inline int +static inline int applies(struct s_command *cp) { int r; lastaddr = 0; if (cp->a1 == NULL && cp->a2 == NULL) r = 1; else if (cp->a2) if (cp->startline > 0) { switch (cp->a2->type) { case AT_RELLINE: if (linenum - cp->startline <= cp->a2->u.l) r = 1; else { cp->startline = 0; r = 0; } break; default: if (MATCH(cp->a2)) { cp->startline = 0; lastaddr = 1; r = 1; } else if (cp->a2->type == AT_LINE && linenum > cp->a2->u.l) { /* * We missed the 2nd address due to a * branch, so just close the range and * return false. */ cp->startline = 0; r = 0; } else r = 1; } } else if (MATCH(cp->a1)) { /* * If the second address is a number less than or * equal to the line number first selected, only * one line shall be selected. * -- POSIX 1003.2 * Likewise if the relative second line address is zero. */ if ((cp->a2->type == AT_LINE && linenum >= cp->a2->u.l) || (cp->a2->type == AT_RELLINE && cp->a2->u.l == 0)) lastaddr = 1; else { cp->startline = linenum; } r = 1; } else r = 0; else r = MATCH(cp->a1); return (cp->nonsel ? ! r : r); } /* * Reset the sed processor to its initial state. */ void resetstate(void) { struct s_command *cp; /* * Reset all in-range markers. */ for (cp = prog; cp; cp = cp->code == '{' ? cp->u.c : cp->next) if (cp->a2) cp->startline = 0; /* * Clear out the hold space. */ cspace(&HS, "", 0, REPLACE); } /* * substitute -- * Do substitutions in the pattern space. Currently, we build a * copy of the new pattern space in the substitute space structure * and then swap them. */ static int substitute(struct s_command *cp) { SPACE tspace; regex_t *re; regoff_t re_off, slen; int lastempty, n; char *s; s = ps; re = cp->u.s->re; if (re == NULL) { if (defpreg != NULL && cp->u.s->maxbref > defpreg->re_nsub) { linenum = cp->u.s->linenum; errx(1, "%lu: %s: \\%u not defined in the RE", linenum, fname, cp->u.s->maxbref); } } if (!regexec_e(re, s, 0, 0, psl)) return (0); SS.len = 0; /* Clean substitute space. */ slen = psl; n = cp->u.s->n; lastempty = 1; switch (n) { case 0: /* Global */ do { if (lastempty || match[0].rm_so != match[0].rm_eo) { /* Locate start of replaced string. */ re_off = match[0].rm_so; /* Copy leading retained string. */ cspace(&SS, s, re_off, APPEND); /* Add in regular expression. */ regsub(&SS, s, cp->u.s->new); } /* Move past this match. 
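 *
 * Added note (not in the original source): an empty match would otherwise
 * make this loop spin forever, so when rm_so == rm_eo one literal
 * character is copied through and the scan advances by a single byte,
 * while the lastempty flag keeps an empty match that immediately follows
 * a non-empty one from producing a second substitution at the same spot.
 * Illustrative behaviour: echo abc | sed 's/x*/-/g' prints "-a-b-c-".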
*/ if (match[0].rm_so != match[0].rm_eo) { s += match[0].rm_eo; slen -= match[0].rm_eo; lastempty = 0; } else { if (match[0].rm_so < slen) cspace(&SS, s + match[0].rm_so, 1, APPEND); s += match[0].rm_so + 1; slen -= match[0].rm_so + 1; lastempty = 1; } } while (slen >= 0 && regexec_e(re, s, REG_NOTBOL, 0, slen)); /* Copy trailing retained string. */ if (slen > 0) cspace(&SS, s, slen, APPEND); break; default: /* Nth occurrence */ while (--n) { if (match[0].rm_eo == match[0].rm_so) match[0].rm_eo = match[0].rm_so + 1; s += match[0].rm_eo; slen -= match[0].rm_eo; if (slen < 0) return (0); if (!regexec_e(re, s, REG_NOTBOL, 0, slen)) return (0); } /* FALLTHROUGH */ case 1: /* 1st occurrence */ /* Locate start of replaced string. */ re_off = match[0].rm_so + (s - ps); /* Copy leading retained string. */ cspace(&SS, ps, re_off, APPEND); /* Add in regular expression. */ regsub(&SS, s, cp->u.s->new); /* Copy trailing retained string. */ s += match[0].rm_eo; slen -= match[0].rm_eo; cspace(&SS, s, slen, APPEND); break; } /* * Swap the substitute space and the pattern space, and make sure * that any leftover pointers into stdio memory get lost. */ tspace = PS; PS = SS; psanl = tspace.append_newline; SS = tspace; SS.space = SS.back; /* Handle the 'p' flag. */ if (cp->u.s->p) OUT(); /* Handle the 'w' flag. */ if (cp->u.s->wfile && !pd) { if (cp->u.s->wfd == -1 && (cp->u.s->wfd = open(cp->u.s->wfile, O_WRONLY|O_APPEND|O_CREAT|O_TRUNC, DEFFILEMODE)) == -1) err(1, "%s", cp->u.s->wfile); if (write(cp->u.s->wfd, ps, psl) != (ssize_t)psl || write(cp->u.s->wfd, "\n", 1) != 1) err(1, "%s", cp->u.s->wfile); } return (1); } /* * do_tr -- * Perform translation ('y' command) in the pattern space. */ static void do_tr(struct s_tr *y) { SPACE tmp; char c, *p; size_t clen, left; int i; if (MB_CUR_MAX == 1) { /* * Single-byte encoding: perform in-place translation * of the pattern space. */ for (p = ps; p < &ps[psl]; p++) *p = y->bytetab[(u_char)*p]; } else { /* * Multi-byte encoding: perform translation into the * translation space, then swap the translation and * pattern spaces. */ /* Clean translation space. */ YS.len = 0; for (p = ps, left = psl; left > 0; p += clen, left -= clen) { if ((c = y->bytetab[(u_char)*p]) != '\0') { cspace(&YS, &c, 1, APPEND); clen = 1; continue; } for (i = 0; i < y->nmultis; i++) if (left >= y->multis[i].fromlen && memcmp(p, y->multis[i].from, y->multis[i].fromlen) == 0) break; if (i < y->nmultis) { cspace(&YS, y->multis[i].to, y->multis[i].tolen, APPEND); clen = y->multis[i].fromlen; } else { cspace(&YS, p, 1, APPEND); clen = 1; } } /* Swap the translation space and the pattern space. */ tmp = PS; PS = YS; psanl = tmp.append_newline; YS = tmp; YS.space = YS.back; } } /* * Flush append requests. Always called before reading a line, * therefore it also resets the substitution done (sdone) flag. */ static void flush_appends(void) { FILE *f; int count, i; char buf[8 * 1024]; for (i = 0; i < appendx; i++) switch (appends[i].type) { case AP_STRING: fwrite(appends[i].s, sizeof(char), appends[i].len, outfile); break; case AP_FILE: /* * Read files probably shouldn't be cached. Since * it's not an error to read a non-existent file, * it's possible that another program is interacting * with the sed script through the filesystem. It * would be truly bizarre, but possible. It's probably * not that big a performance win, anyhow. 
*/ if ((f = fopen(appends[i].s, "r")) == NULL) break; while ((count = fread(buf, sizeof(char), sizeof(buf), f))) (void)fwrite(buf, sizeof(char), count, outfile); (void)fclose(f); break; } if (ferror(outfile)) errx(1, "%s: %s", outfname, strerror(errno ? errno : EIO)); appendx = sdone = 0; } static void lputs(char *s, size_t len) { static const char escapes[] = "\\\a\b\f\r\t\v"; int c, col, width; const char *p; struct winsize win; static int termwidth = -1; size_t clen, i; wchar_t wc; mbstate_t mbs; if (outfile != stdout) termwidth = 60; if (termwidth == -1) { if ((p = getenv("COLUMNS")) && *p != '\0') termwidth = atoi(p); else if (ioctl(STDOUT_FILENO, TIOCGWINSZ, &win) == 0 && win.ws_col > 0) termwidth = win.ws_col; else termwidth = 60; } if (termwidth <= 0) termwidth = 1; memset(&mbs, 0, sizeof(mbs)); col = 0; while (len != 0) { clen = mbrtowc(&wc, s, len, &mbs); if (clen == 0) clen = 1; if (clen == (size_t)-1 || clen == (size_t)-2) { wc = (unsigned char)*s; clen = 1; memset(&mbs, 0, sizeof(mbs)); } if (wc == '\n') { if (col + 1 >= termwidth) fprintf(outfile, "\\\n"); fputc('$', outfile); fputc('\n', outfile); col = 0; } else if (iswprint(wc)) { width = wcwidth(wc); if (col + width >= termwidth) { fprintf(outfile, "\\\n"); col = 0; } fwrite(s, 1, clen, outfile); col += width; } else if (wc != L'\0' && (c = wctob(wc)) != EOF && (p = strchr(escapes, c)) != NULL) { if (col + 2 >= termwidth) { fprintf(outfile, "\\\n"); col = 0; } fprintf(outfile, "\\%c", "\\abfrtv"[p - escapes]); col += 2; } else { if (col + 4 * clen >= (unsigned)termwidth) { fprintf(outfile, "\\\n"); col = 0; } for (i = 0; i < clen; i++) fprintf(outfile, "\\%03o", (int)(unsigned char)s[i]); col += 4 * clen; } s += clen; len -= clen; } if (col + 1 >= termwidth) fprintf(outfile, "\\\n"); (void)fputc('$', outfile); (void)fputc('\n', outfile); if (ferror(outfile)) errx(1, "%s: %s", outfname, strerror(errno ? errno : EIO)); } static int regexec_e(regex_t *preg, const char *string, int eflags, int nomatch, size_t slen) { int eval; if (preg == NULL) { if (defpreg == NULL) errx(1, "first RE may not be empty"); } else defpreg = preg; /* Set anchors */ match[0].rm_so = 0; match[0].rm_eo = slen; eval = regexec(defpreg, string, nomatch ? 0 : maxnsub + 1, match, eflags | REG_STARTEND); switch(eval) { case 0: return (1); case REG_NOMATCH: return (0); } errx(1, "RE error: %s", strregerror(eval, defpreg)); /* NOTREACHED */ } /* * regsub - perform substitutions after a regexp match * Based on a routine by Henry Spencer */ static void regsub(SPACE *sp, char *string, char *src) { int len, no; char c, *dst; #define NEEDSP(reqlen) \ /* XXX What is the +1 for? */ \ if (sp->len + (reqlen) + 1 >= sp->blen) { \ sp->blen += (reqlen) + 1024; \ if ((sp->space = sp->back = realloc(sp->back, sp->blen)) \ == NULL) \ err(1, "realloc"); \ dst = sp->space + sp->len; \ } dst = sp->space + sp->len; while ((c = *src++) != '\0') { if (c == '&') no = 0; else if (c == '\\' && isdigit((unsigned char)*src)) no = *src++ - '0'; else no = -1; if (no < 0) { /* Ordinary character. */ if (c == '\\' && (*src == '\\' || *src == '&')) c = *src++; NEEDSP(1); *dst++ = c; ++sp->len; } else if (match[no].rm_so != -1 && match[no].rm_eo != -1) { len = match[no].rm_eo - match[no].rm_so; NEEDSP(len); memmove(dst, string + match[no].rm_so, len); dst += len; sp->len += len; } } NEEDSP(1); *dst = '\0'; } /* * cspace -- * Concatenate space: append the source space to the destination space, * allocating new space as necessary. 
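 *
 * Added note (not in the original source): with REPLACE the destination
 * length is reset to zero first, so the copy overwrites the space, while
 * APPEND concatenates.  The buffer is grown to the required size plus a
 * 1024-byte slack, and both sp->space and sp->back are pointed at the
 * reallocated storage so that the PS/HS/SS swaps done elsewhere in this
 * file keep referring to valid memory.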
*/ void cspace(SPACE *sp, const char *p, size_t len, enum e_spflag spflag) { size_t tlen; /* Make sure SPACE has enough memory and ramp up quickly. */ tlen = sp->len + len + 1; if (tlen > sp->blen) { sp->blen = tlen + 1024; if ((sp->space = sp->back = realloc(sp->back, sp->blen)) == NULL) err(1, "realloc"); } if (spflag == REPLACE) sp->len = 0; memmove(sp->space + sp->len, p, len); sp->space[sp->len += len] = '\0'; } /* * Close all cached opened files and report any errors */ void cfclose(struct s_command *cp, struct s_command *end) { for (; cp != end; cp = cp->next) switch(cp->code) { case 's': if (cp->u.s->wfd != -1 && close(cp->u.s->wfd)) err(1, "%s", cp->u.s->wfile); cp->u.s->wfd = -1; break; case 'w': if (cp->u.fd != -1 && close(cp->u.fd)) err(1, "%s", cp->t); cp->u.fd = -1; break; case '{': cfclose(cp->u.c, cp->next); break; } } Index: projects/ifnet =================================================================== --- projects/ifnet (revision 277106) +++ projects/ifnet (revision 277107) Property changes on: projects/ifnet ___________________________________________________________________ Modified: svn:mergeinfo ## -0,0 +0,1 ## Merged /head:r277094-277106