Service Fabric


Overview of Azure Service Fabric

Azure Service Fabric is a distributed systems platform that makes it easy to package, deploy, and manage scalable and reliable microservices. Service Fabric also addresses the significant challenges in developing and managing cloud applications. Developers and administrators can avoid complex infrastructure problems and focus on implementing mission-critical, demanding workloads that are scalable, reliable, and manageable. Service Fabric represents the next-generation middleware platform for building and managing these enterprise-class, tier-1, cloud-scale applications.

This short Channel9 video introduces Service Fabric and microservices. This longer Microsoft Virtual Academy video describes the Service Fabric core concepts.

Applications composed of microservices

Service Fabric enables you to build and manage scalable and reliable applications composed of microservices that run at very high density on a shared pool of machines, which is referred to as a cluster.
It provides a sophisticated runtime to build distributed, scalable, stateless and stateful microservices. It also provides comprehensive application management capabilities to provision, deploy, monitor, upgrade/patch, and delete deployed applications.

Why is a microservices approach important? The two main reasons are:

- You can scale different parts of your application depending on the needs of the application.
- Development teams can be more agile as they roll out changes and thus provide features to customers faster and more frequently.

Service Fabric powers many Microsoft services today, including Azure SQL Database, Azure DocumentDB, Cortana, Microsoft Power BI, Microsoft Intune, Azure Event Hubs, Azure IoT Hub, Skype for Business, and many core Azure services.

Service Fabric is tailored to create cloud-native services that can start small and, as needed, grow to massive scale with hundreds or thousands of machines. Today's Internet-scale services are built of microservices. Examples of microservices include protocol gateways, user profiles, shopping carts, inventory processing, queues, and caches. Service Fabric is a microservices platform that gives every microservice, whether stateless or stateful, a unique name. It provides comprehensive runtime and lifecycle management capabilities to applications that are composed of these microservices, and it hosts microservices inside containers that are deployed and activated across the Service Fabric cluster.

A move from virtual machines to containers makes possible an order-of-magnitude increase in density. Similarly, another order of magnitude in density becomes possible when you move from containers to microservices. For example, a single cluster for Azure SQL Database comprises hundreds of machines running tens of thousands of containers that host a total of hundreds of thousands of databases. Each database is a Service Fabric stateful microservice. The same is true of the other previously mentioned services, which is why the term hyperscale is used to describe Service Fabric capabilities. If containers give you high density, then microservices give you hyperscale. For more on the microservices approach, read Why a microservices approach to building applications?.

Container deployment and orchestration

Service Fabric is an orchestrator of microservices across a cluster of machines. Microservices can be developed in many ways, from using the Service Fabric programming models to deploying guest executables. Service Fabric can also deploy services in container images. Importantly, you can mix services in processes and services in containers in the same application. If you just want to deploy and manage container images across a cluster of machines, Service Fabric is a perfect choice.

Create clusters for Service Fabric anywhere

You can create clusters for Service Fabric in many environments, including Azure or on premises, on Windows Server, or on Linux. In addition, the development environment in the SDK is identical to the production environment, and no emulators are involved. In other words, what runs on your local development cluster deploys to the same cluster in other environments. For more information on creating clusters on-premises, read creating a cluster on Windows Server or Linux, or for Azure, creating a cluster via the Azure portal.

Stateless and stateful microservices for Service Fabric

Service Fabric enables you to build applications that consist of microservices.
Stateless microservices (such as protocol gateways and web proxies) do not maintain a mutable state outside a request and its response from the service. Azure Cloud Services worker roles are an example of a stateless service. Stateful microservices (such as user accounts, databases, devices, shopping carts, and queues) maintain a mutable, authoritative state beyond the request and its response. Today's Internet-scale applications consist of a combination of stateless and stateful microservices.

Why have stateful microservices along with stateless ones? The two main reasons are:

- You can build high-throughput, low-latency, failure-tolerant online transaction processing (OLTP) services by keeping code and data close on the same machine. Some examples are interactive storefronts, search, Internet of Things (IoT) systems, trading systems, credit card processing and fraud detection systems, and personal record management.
- You can simplify application design. Stateful microservices remove the need for additional queues and caches, which are traditionally required to address the availability and latency requirements of a purely stateless application. Stateful services are naturally high-availability and low-latency, which reduces the number of moving parts to manage in your application as a whole.

For more information about application patterns with Service Fabric, read Application scenarios and Choosing a programming model framework for your service. You can also watch this Microsoft Virtual Academy video for an overview of stateless and stateful services.

Application lifecycle management

Service Fabric provides support for the full application lifecycle management of cloud applications. This lifecycle includes development through deployment, daily management, and maintenance to eventual decommissioning. Service Fabric application lifecycle management capabilities enable application administrators and IT operators to use simple, low-touch workflows to provision, deploy, patch, and monitor applications. These built-in workflows greatly reduce the burden on IT operators to keep applications continuously available.

Most applications consist of a combination of stateless and stateful microservices and other executables/runtimes that are deployed together. By having strong types on the applications and packaged microservices, Service Fabric enables the deployment of multiple application instances. Each instance is managed and upgraded independently. Importantly, Service Fabric can deploy any executables or runtimes and make them reliable. For example, Service Fabric deploys .NET, ASP.NET Core, Node.js, Java virtual machines, scripts, Angular, or anything else that makes up your application. For more information about application lifecycle management, read Application lifecycle. For more about how to deploy any code, see Deploy a guest executable. You can also watch this Microsoft Virtual Academy video for an overview of app lifecycle management.

Key capabilities

By using Service Fabric, you can:

- Develop massively scalable applications that are self-healing.
- Develop applications that are composed of microservices by using the Service Fabric programming model. Or, you can simply host guest executables and other application frameworks of your choice, such as ASP.NET Core or Node.js.
- Develop highly reliable stateless and stateful microservices.
- Deploy and orchestrate containers, including Windows containers and Docker containers, across a cluster. These containers can contain guest executables or reliable stateless and stateful microservices. In either case, you get mapping from container port to host port, container discoverability, and automated failover.
- Simplify the design of your application by using stateful microservices in place of caches and queues.
- Deploy to Azure or to on-premises datacenters that run Windows or Linux with zero code changes. Write once, and then deploy anywhere to any Service Fabric cluster.
- Develop with a "datacenter on your machine" approach. The local development environment is the same code that runs in the Azure datacenters.
- Deploy applications in seconds.
- Deploy applications at higher density than virtual machines, deploying hundreds or thousands of applications per machine.
- Deploy different versions of the same application side by side, and upgrade each application independently.
- Manage the lifecycle of your stateful applications without any downtime, including breaking and nonbreaking upgrades.
- Manage applications by using .NET APIs, Java (Linux), PowerShell, the Azure command-line interface (Linux), or the REST interface.
- Upgrade and patch microservices within applications independently.
- Monitor and diagnose the health of your applications and set policies for performing automatic repairs.
- Scale out or scale in the number of nodes in a cluster, and scale up or scale down the size of each node. As you scale nodes, your applications automatically scale and are distributed according to the available resources.
- Watch the self-healing resource balancer orchestrate the redistribution of applications across the cluster. Service Fabric recovers from failures and optimizes the distribution of load based on available resources.
- Use the fault analysis service to perform chaos testing on your service to find issues and failures before running in production.

Next steps

For more information: Why a microservices approach to building applications?, Terminology overview, Setting up your Service Fabric development environment, Choosing a programming model framework for your service, and Learn about Service Fabric support options.

Why a microservices approach to building applications?

As software developers, there is nothing new in how we think about factoring an application into component parts. It is the central paradigm of object orientation, software abstractions, and componentization. Today, this factorization tends to take the form of classes and interfaces between shared libraries and technology layers. Typically, a tiered approach is taken with a back-end store, middle-tier business logic, and a front-end user interface (UI). What has changed over the last few years is that we, as developers, are building distributed applications that are for the cloud and driven by the business.

The changing business needs are:

- A service that's built and operates at scale to reach customers in new geographical regions (for example).
- Faster delivery of features and capabilities to be able to respond to customer demands in an agile way.
- Improved resource utilization to reduce costs.

These business needs are affecting how we build applications. For more information about the approach of Azure to microservices, read Microservices: An application revolution powered by the cloud.

Monolithic vs. microservice design approach

All applications evolve over time. Successful applications evolve by being useful to people. Unsuccessful applications do not evolve and eventually are deprecated.
The question becomes: How much do you know about your requirements today, and what will they be in the future? For example, let's say that you are building a reporting application for a department. You are sure that the application will remain within the scope of your company and that the reports will be short-lived. Your choice of approach is different from, say, building a service that delivers video content to tens of millions of customers. Sometimes, getting something out the door as proof of concept is the driving factor, while you know that the application can be redesigned later. There is little point in over-engineering something that never gets used. It’s the usual engineering trade-off. On the other hand, when companies talk about building for the cloud, the expectation is growth and usage. The issue is that growth and scale are unpredictable. We would like to be able to prototype quickly while also knowing that we are on a path to deal with future success. This is the lean startup approach: build, measure, learn, and iterate. During the client-server era, we tended to focus on building tiered applications by using specific technologies in each tier. The term monolithic application has emerged for these approaches. The interfaces tended to be between the tiers, and a more tightly coupled design was used between components within each tier. Developers designed and factored classes that were compiled into libraries and linked together into a few executables and DLLs. There are benefits to such a monolithic design approach. It's often simpler to design, and it has faster calls between components, because these calls are often over interprocess communication (IPC). Also, everyone tests a single product, which tends to be more people-resource efficient. The downside is that there's a tight coupling between tiered layers, and you cannot scale individual components. If you need to perform fixes or upgrades, you have to wait for others to finish their testing. It is more difficult to be agile. Microservices address these downsides and more closely align with the preceding business requirements, but they also have both benefits and liabilities. The benefits of microservices are that each one typically encapsulates simpler business functionality, which you scale up or down, test, deploy, and manage independently. One important benefit of a microservice approach is that teams are driven more by business scenarios than by technology, which the tiered approach encourages. In practice, smaller teams develop a microservice based on a customer scenario and use any technologies they choose. In other words, the organization doesn’t need to standardize tech to maintain monolithic applications. Individual teams that own services can do what makes sense for them based on team expertise or what’s most appropriate to solve the problem. In practice, a set of recommended technologies, such as a particular NoSQL store or web application framework, is preferable. The downside of microservices comes in managing the increased number of separate entities and dealing with more complex deployments and versioning. Network traffic between the microservices increases as well as the corresponding network latencies. Lots of chatty, granular services are a recipe for a performance nightmare. Without tools to help view these dependencies, it is hard to “see” the whole system. 
Standards make the microservice approach work by agreeing on how to communicate and being tolerant of only the things you need from a service, rather than rigid contracts. It is important to define these contracts up front in the design, because services update independently of each other. Another description coined for designing with a microservices approach is “fine-grained service-oriented architecture (SOA).” At its simplest, the microservices design approach is about a decoupled federation of services, with independent changes to each, and agreed-upon standards for communication. As more cloud apps are produced, people discover that this decomposition of the overall app into independent, scenario-focused services is a better long-term approach. Comparison between application development approaches A monolithic app contains domain-specific functionality and is normally divided by functional layers, such as web, business, and data. You scale a monolithic app by cloning it on multiple servers/virtual machines/containers. A microservice application separates functionality into separate smaller services. The microservices approach scales out by deploying each service independently, creating instances of these services across servers/virtual machines/containers. Designing with a microservice approach is not a panacea for all projects, but it does align more closely with the business objectives described earlier. Starting with a monolithic approach might be acceptable if you know that you will not have the opportunity to rework the code later into a microservice design if necessary. More commonly, you begin with a monolithic app and slowly break it up in stages, starting with the functional areas that need to be more scalable or agile. To summarize, the microservice approach is to compose your application of many small services. The services run in containers that are deployed across a cluster of machines. Smaller teams develop a service that focuses on a scenario and independently test, version, deploy, and scale each service so that the entire application can evolve. What is a microservice? There are different definitions of microservices. If you search the Internet, you'll find many useful resources that provide their own viewpoints and definitions. However, most of the following characteristics of microservices are widely agreed upon: Encapsulate a customer or business scenario. What is the problem you are solving? Developed by a small engineering team. Written in any programming language and use any framework. Consist of code and (optionally) state, both of which are independently versioned, deployed, and scaled. Interact with other microservices over well-defined interfaces and protocols. Have unique names (URLs) used to resolve their location. Remain consistent and available in the presence of failures. You can summarize these characteristics into: Microservice applications are composed of small, independently versioned, and scalable customer-focused services that communicate with each other over standard protocols with well-defined interfaces. We covered the first two points in the preceding section, and now we expand on and clarify the others. Written in any programming language and use any framework As developers, we should be free to choose a language or framework that we want, depending on our skills or the needs of the service. In some services, you might value the performance benefits of C++ above all else. 
In other services, the ease of managed development in C# or Java might be most important. In some cases, you may need to use a specific partner library, data storage technology, or means of exposing the service to clients. After you have chosen a technology, you come to the operational or lifecycle management and scaling of the service. Allows code and state to be independently versioned, deployed, and scaled However you choose to write your microservices, the code and optionally the state should independently deploy, upgrade, and scale. This is actually one of the harder problems to solve, because it comes down to your choice of technologies. For scaling, understanding how to partition (or shard) both the code and state is challenging. When the code and state use separate technologies, which is common today, the deployment scripts for your microservice need to be able to cope with scaling them both. This is also about agility and flexibility, so you can upgrade some of the microservices without having to upgrade all of them at once. Returning to the monolithic versus microservice approach for a moment, the following diagram shows the differences in the approach to storing state. State storage between application styles The monolithic approach on the left has a single database and tiers of specific technologies. The microservices approach on the right has a graph of interconnected microservices where state is typically scoped to the microservice and various technologies are used. In a monolithic approach, typically the application uses a single database. The advantage is that it is a single location, which makes it easy to deploy. Each component can have a single table to store its state. Teams need to strictly separate state, which is a challenge. Inevitably there are temptations to add a new column to an existing customer table, do a join between tables, and create dependencies at the storage layer. After this happens, you can't scale individual components. In the microservices approach, each service manages and stores its own state. Each service is responsible for scaling both code and state together to meet the demands of the service. A downside is that when there is a need to create views, or queries, of your application’s data, you will need to query across disparate state stores. Typically, this is solved by having a separate microservice that builds a view across a collection of microservices. If you need to perform multiple impromptu queries on the data, each microservice should consider writing its data to a data warehousing service for offline analytics. Versioning is specific to the deployed version of a microservice so that multiple, different versions deploy and run side by side. Versioning addresses the scenarios where a newer version of a microservice fails during upgrade and needs to roll back to an earlier version. The other scenario for versioning is performing A/B-style testing, where different users experience different versions of the service. For example, it is common to upgrade a microservice for a specific set of customers to test new functionality before rolling it out more widely. After lifecycle management of microservices, this now brings us to communication between them. Interacts with other microservices over well-defined interfaces and protocols This topic needs little attention here, because extensive literature about service-oriented architecture that has been published over the past 10 years describes communication patterns. 
Generally, service communication uses a REST approach with HTTP and TCP protocols and XML or JSON as the serialization format. From an interface perspective, it is about embracing the web design approach. But nothing stops you from using binary protocols or your own data formats. Be prepared for people to have a harder time using your microservices if these are openly available. Has a unique name (URL ) used to resolve its location Remember how we keep saying that the microservice approach is like the web? Like the web, your microservice needs to be addressable wherever it is running. If you are thinking about machines and which one is running a particular microservice, things will go bad quickly. In the same way that DNS resolves a particular URL to a particular machine, your microservice needs to have a unique name so that its current location is discoverable. Microservices need addressable names that make them independent from the infrastructure that they are running on. This implies that there is an interaction between how your service is deployed and how it is discovered, because there needs to be a service registry. Equally, when a machine fails, the registry service must tell you where the service is now running. This brings us to the next topic: resilience and consistency. Remains consistent and available in the presence of failures Dealing with unexpected failures is one of the hardest problems to solve, especially in a distributed system. Much of the code that we write as developers is handling exceptions, and this is also where the most time is spent in testing. The problem is more involved than writing code to handle failures. What happens when the machine where the microservice is running fails? Not only do you need to detect this microservice failure (a hard problem on its own), but you also need something to restart your microservice. A microservice needs to be resilient to failures and restart often on another machine for availability reasons. This also comes down to the state that was saved on behalf of the microservice, where the microservice can recover this state from, and whether the microservice is able to restart successfully. In other words, there needs to be resilience in the compute (the process restarts) as well as resilience in the state or data (no data loss and the data remains consistent). The problems of resiliency are compounded during other scenarios, such as when failures happen during an application upgrade. The microservice, working with the deployment system, doesn't need to recover. It also needs to then decide whether it can continue to move forward to the newer version or instead roll back to a previous version to maintain a consistent state. Questions such as whether enough machines are available to keep moving forward and how to recover previous versions of the microservice need to be considered. This requires the microservice to emit health information to be able to make these decisions. Reports health and diagnostics It may seem obvious, and it is often overlooked, but a microservice must report its health and diagnostics. Otherwise, there is little insight from an operations perspective. Correlating diagnostic events across a set of independent services and dealing with machine clock skews to make sense of the event order is challenging. 
In the same way that you interact with a microservice over agreed-upon protocols and data formats, there emerges a need for standardization in how to log health and diagnostic events that ultimately end up in an event store for querying and viewing. In a microservices approach, it is key that different teams agree on a single logging format. There needs to be a consistent approach to viewing diagnostic events in the application as a whole. Health is different from diagnostics. Health is about the microservice reporting its current state to take appropriate actions. A good example is working with upgrade and deployment mechanisms to maintain availability. Although a service may be currently unhealthy due to a process crash or machine reboot, the service might still be operational. The last thing you need is to make this worse by performing an upgrade. The best approach is to do an investigation first or allow time for the microservice to recover. Health events from a microservice help us make informed decisions and, in effect, help create self-healing services. Service Fabric as a microservices platform Azure Service Fabric emerged from a transition by Microsoft from delivering box products, which were typically monolithic in style, to delivering services. The experience of building and operating large services, such as Azure SQL Database and Azure DocumentDB, shaped Service Fabric. The platform evolved over time as more and more services adopted it. Importantly, Service Fabric had to run not only in Azure but also in standalone Windows Server deployments. The aim of Service Fabric is to solve the hard problems of building and running a service and utilize infrastructure resources efficiently, so that teams can solve business problems using a microservices approach. Service Fabric provides two broad areas to help you build applications that use a microservices approach: A platform that provides system services to deploy, upgrade, detect, and restart failed services, discover service location, manage state, and monitor health. These system services in effect enable many of the characteristics of microservices previously described. Programming APIs, or frameworks, to help you build applications as microservices: reliable actors and reliable services. Of course, you can choose any code to build your microservice. But these APIs make the job more straightforward, and they integrate with the platform at a deeper level. This way, for example, you can get health and diagnostics information, or you can take advantage of built-in high availability. Service Fabric is agnostic on how you build your service, and you can use any technology.However, it does provide built-in programming APIs that make it easier to build microservices. Are microservices right for my application? Maybe. What we experienced was that as more and more teams in Microsoft began to build for the cloud for business reasons, many of them realized the benefits of taking a microservice-like approach. Bing, for example, has been developing microservices in search for years. For other teams, the microservices approach was new. Teams found that there were hard problems to solve outside of their core areas of strength. This is why Service Fabric gained traction as the technology of choice for building services. The objective of Service Fabric is to reduce the complexities of building applications with a microservice approach, so that you do not have to go through as many costly redesigns. 
Start small, scale when needed, deprecate services, add new ones, and evolve with customer usage is the approach. We also know that there are many other problems yet to be solved to make microservices more approachable for most developers. Containers and the actor programming model are examples of small steps in that direction, and we are sure that more innovations will emerge to make this easier.   Next steps Service Fabric terminology overview Microservices: An application revolution powered by the cloud Service Fabric application scenarios 2/21/2017 • 4 min to read • Edit Online Azure Service Fabric offers a reliable and flexible platform that enables you to write and run many types of business applications and services. These applications and microservices can be stateless or stateful, and they are resource-balanced across virtual machines to maximize efficiency. The unique architecture of Service Fabric enables you to perform near real-time data analysis, in-memory computation, parallel transactions, and event processing in your applications. You can easily scale your applications up or down (really in or out), depending on your changing resource requirements. The Service Fabric platform in Azure is ideal for the following categories of applications: Highly available services: Service Fabric services provide fast failover by creating multiple secondary service replicas. If a node, process, or individual service goes down due to hardware or other failure, one of the secondary replicas is promoted to a primary replica with minimal loss of service. Scalable services: Individual services can be partitioned, allowing for state to be scaled out across the cluster. In addition, individual services can be created and removed on the fly. Services can be quickly and easily scaled out from a few instances on a few nodes to thousands of instances on many nodes, and then scaled in again, depending on your resource needs. You can use Service Fabric to build these services and manage their complete lifecycles. Computation on nonstatic data: Service Fabric enables you to build data, input/output, and compute intensive stateful applications. Service Fabric allows the collocation of processing (computation) and data in applications. Normally, when your application requires access to data, there is network latency associated with an external data cache or storage tier. With stateful Service Fabric services, that latency is eliminated, enabling more performant reads and writes. Say for example that you have an application that performs near real-time recommendation selections for customers, with a round-trip time requirement of less than 100 milliseconds. The latency and performance characteristics of Service Fabric services (where the computation of recommendation selection is collocated with the data and rules) provides a responsive experience to the user compared with the standard implementation model of having to fetch the necessary data from remote storage. Session-based interactive applications: Service Fabric is useful if your applications, such as online gaming or instant messaging, require low latency reads and writes. Service Fabric enables you to build these interactive, stateful applications without having to create a separate store or cache, as required for stateless apps. (This increases latency and potentially introduces consistency issues.) Distributed graph processing: The growth of social networks has greatly increased the need to analyze large- scale graphs in parallel. 
Fast scaling and parallel load processing make Service Fabric a natural platform for processing large-scale graphs. Service Fabric enables you to build highly scalable services for groups such as social networking, business intelligence, and scientific research.

Data analytics and workflows: The fast reads and writes of Service Fabric enable applications that must reliably process events or streams of data. Service Fabric also enables applications that describe processing pipelines, where results must be reliable and passed on to the next processing stage without loss. These include transactional and financial systems, where data consistency and computation guarantees are essential.

Data gathering, processing, and IoT: Since Service Fabric handles large scale and has low latency through its stateful services, it is ideal for data processing on millions of devices where the data for the device and the computation are co-located. We have seen several customers who have built IoT systems using Service Fabric, including BMW, Schneider Electric, and Mesh Systems.

Application design case studies

A number of case studies showing how Service Fabric is used to design applications are published on the Service Fabric team blog and the microservices solutions site.

Design applications composed of stateless and stateful microservices

Building applications with Azure Cloud Services worker roles is an example of a stateless service. In contrast, stateful microservices maintain their authoritative state beyond the request and its response. This provides high availability and consistency of the state through simple APIs that provide transactional guarantees backed by replication. Service Fabric's stateful services democratize high availability, bringing it to all types of applications, not just databases and other data stores. This is a natural progression: applications have already moved from using purely relational databases for high availability to NoSQL databases. Now the applications themselves can have their "hot" state and data managed within them for additional performance gains without sacrificing reliability, consistency, or availability.

When building applications consisting of microservices, you typically have a combination of stateless web apps (ASP.NET, Node.js, etc.) calling onto stateless and stateful business middle-tier services, all deployed into the same Service Fabric cluster using the Service Fabric deployment commands. Each of these services is independent with regard to scale, reliability, and resource usage, greatly improving agility in development and lifecycle management.

Stateful microservices simplify application designs because they remove the need for the additional queues and caches that have traditionally been required to address the availability and latency requirements of purely stateless applications. Since stateful services are naturally highly available and low latency, this means that there are fewer moving parts to manage in your application as a whole. By taking advantage of the reliable services and reliable actors programming models, stateful services reduce application complexity while achieving high throughput and low latency. The diagrams below illustrate the differences between designing an application that is stateless and one that is stateful.

(Diagram: An application built using stateless services)
(Diagram: An application built using stateful services)
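As a rough illustration of this combination, the following C# sketch shows a stateless web front end calling into a stateful middle-tier service through Service Fabric's service remoting client. The service name, interface, and partitioning choice are hypothetical; a real application would define its own contract and partition key scheme.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Services.Client;
using Microsoft.ServiceFabric.Services.Remoting;
using Microsoft.ServiceFabric.Services.Remoting.Client;

// Hypothetical contract shared between the stateless front end and the
// stateful shopping-cart service. IService marks it for Service Fabric remoting.
public interface IShoppingCartService : IService
{
    Task AddItemAsync(string customerId, string itemId, int quantity);
}

public static class CartClient
{
    public static Task AddToCartAsync(string customerId, string itemId, int quantity)
    {
        // Resolve the stateful service by its fabric:/ name; the partition key
        // routes the call to the partition that owns this customer's state.
        var proxy = ServiceProxy.Create<IShoppingCartService>(
            new Uri("fabric:/MyShop/ShoppingCartService"),
            new ServicePartitionKey(customerId.GetHashCode()));

        return proxy.AddItemAsync(customerId, itemId, quantity);
    }
}
```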
Next steps

Get started building stateless and stateful services with the Service Fabric reliable services and reliable actors programming models. Learn about customer case studies. Learn more about Patterns and scenarios. Also see the following topics: Tell me about microservices, Define and manage service state, Availability of Service Fabric services, Scale Service Fabric services, and Partition Service Fabric services.

Service Fabric architecture

Service Fabric is built with layered subsystems. These subsystems enable you to write applications that are:

- Highly available
- Scalable
- Manageable
- Testable

The following diagram shows the major subsystems of Service Fabric.

In a distributed system, the ability to securely communicate between nodes in a cluster is crucial. At the base of the stack is the transport subsystem, which provides secure communication between nodes. Above the transport subsystem lies the federation subsystem, which clusters the different nodes into a single entity (named clusters) so that Service Fabric can detect failures, perform leader election, and provide consistent routing. The reliability subsystem, layered on top of the federation subsystem, is responsible for the reliability of Service Fabric services through mechanisms such as replication, resource management, and failover. The federation subsystem also underlies the hosting and activation subsystem, which manages the lifecycle of an application on a single node. The management subsystem manages the lifecycle of applications and services. Service Fabric provides the ability to resolve service locations through its communication subsystem. The application programming models exposed to developers are layered on top of these subsystems along with the application model to enable tooling. The testability subsystem helps application developers test their services through simulated faults before and after deploying applications and services to production environments.

Transport subsystem

The transport subsystem implements a point-to-point datagram communication channel. This channel is used for communication within Service Fabric clusters and communication between the Service Fabric cluster and clients. It supports one-way and request-reply communication patterns, which provides the basis for implementing broadcast and multicast in the federation layer. The transport subsystem secures communication by using X509 certificates or Windows security. This subsystem is used internally by Service Fabric and is not directly accessible to developers for application programming.
Federation subsystem

In order to reason about a set of nodes in a distributed system, you need to have a consistent view of the system. The federation subsystem uses the communication primitives provided by the transport subsystem and stitches the various nodes into a single unified cluster that it can reason about. It provides the distributed systems primitives needed by the other subsystems: failure detection, leader election, and consistent routing. The federation subsystem is built on top of distributed hash tables with a 128-bit token space. The subsystem creates a ring topology over the nodes, with each node in the ring being allocated a subset of the token space for ownership. For failure detection, the layer uses a leasing mechanism based on heart beating and arbitration. The federation subsystem also guarantees through intricate join and departure protocols that only a single owner of a token exists at any time. This provides leader election and consistent routing guarantees.

Reliability subsystem

The reliability subsystem provides the mechanism to make the state of a Service Fabric service highly available through the use of the Replicator, Failover Manager, and Resource Balancer.

- The Replicator ensures that state changes in the primary service replica are automatically replicated to secondary replicas, maintaining consistency between the primary and secondary replicas in a service replica set. The Replicator is responsible for quorum management among the replicas in the replica set. It interacts with the failover unit to get the list of operations to replicate, and the reconfiguration agent provides it with the configuration of the replica set. That configuration indicates which replicas the operations need to be replicated to. Service Fabric provides a default replicator called Fabric Replicator, which can be used by the programming model API to make the service state highly available and reliable.
- The Failover Manager ensures that when nodes are added to or removed from the cluster, the load is automatically redistributed across the available nodes. If a node in the cluster fails, the cluster will automatically reconfigure the service replicas to maintain availability.
- The Resource Manager places service replicas across failure domains in the cluster and ensures that all failover units are operational. The Resource Manager also balances service resources across the underlying shared pool of cluster nodes to achieve optimal uniform load distribution.

Management subsystem

The management subsystem provides end-to-end service and application lifecycle management. PowerShell cmdlets and administrative APIs enable you to provision, deploy, patch, upgrade, and de-provision applications without loss of availability. The management subsystem performs this through the following services:

- Cluster Manager: This is the primary service that interacts with the Failover Manager from the reliability subsystem to place applications on nodes based on the service placement constraints. The Resource Manager in the failover subsystem ensures that the constraints are never broken. The Cluster Manager manages the lifecycle of the applications from provision to de-provision. It integrates with the Health Manager to ensure that application availability is not lost from a semantic health perspective during upgrades.
- Health Manager: This service enables health monitoring of applications, services, and cluster entities. Cluster entities (such as nodes, service partitions, and replicas) can report health information, which is then aggregated into the centralized health store. This health information provides an overall point-in-time health snapshot of the services and nodes distributed across multiple nodes in the cluster, enabling you to take any needed corrective actions. Health query APIs enable you to query the health events reported to the health subsystem. The health query APIs return the raw health data stored in the health store or the aggregated, interpreted health data for a specific cluster entity.
- Image Store: This service provides storage and distribution of the application binaries. It provides a simple distributed file store where the applications are uploaded to and downloaded from.
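To make the Health Manager's role concrete, here is a small C# sketch, assuming the System.Fabric client libraries and a reachable cluster, that queries the aggregated cluster health and reports a custom health event. The watchdog name, property, and service URI are illustrative only, not part of the platform.

```csharp
using System;
using System.Fabric;
using System.Fabric.Health;
using System.Threading.Tasks;

class HealthQuerySample
{
    static async Task Main()
    {
        // Connects to the local cluster; pass endpoints/credentials for a remote one.
        var fabricClient = new FabricClient();

        // Query the aggregated health of the whole cluster through the Health Manager.
        ClusterHealth clusterHealth = await fabricClient.HealthManager.GetClusterHealthAsync();
        Console.WriteLine($"Cluster health: {clusterHealth.AggregatedHealthState}");

        // Report a custom health event against a (hypothetical) named service.
        var healthInfo = new HealthInformation(
            sourceId: "MyWatchdog",
            property: "QueueDepth",
            healthState: HealthState.Warning)
        {
            Description = "Queue depth is above the expected threshold."
        };
        fabricClient.HealthManager.ReportHealth(
            new ServiceHealthReport(new Uri("fabric:/MyApp/MyService"), healthInfo));
    }
}
```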
Hosting subsystem

The cluster manager informs the hosting subsystem (running on each node) which services it needs to manage for a particular node. The hosting subsystem then manages the lifecycle of the application on that node. It interacts with the reliability and health components to ensure that the replicas are properly placed and are healthy.

Testability subsystem

Testability is a suite of tools specifically designed for testing services built on Service Fabric. The tools let a developer easily induce meaningful faults and run test scenarios to exercise and validate the numerous states and transitions that a service will experience throughout its lifetime, all in a controlled and safe manner. Testability also provides a mechanism to run longer tests that can iterate through various possible failures without losing availability. This provides you with a test-in-production environment.

Communication subsystem

This subsystem provides reliable messaging within the cluster and service discovery through the Naming service. The Naming service resolves service names to a location in the cluster and enables users to manage service names and properties. Using the Naming service, clients can securely communicate with any node in the cluster to resolve a service name and retrieve service metadata. Using a simple Naming client API, users of Service Fabric can develop services and clients capable of resolving the current network location despite node dynamism or the re-sizing of the cluster.
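As a rough illustration of how a client uses the Naming service, the following C# sketch, assuming the Microsoft.ServiceFabric.Services client package and a hypothetical singleton-partition service name, resolves a service's current endpoint address.

```csharp
using System;
using System.Fabric;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Services.Client;

class ResolveEndpointSample
{
    static async Task Main()
    {
        // The default resolver talks to the Naming service of the connected cluster.
        ServicePartitionResolver resolver = ServicePartitionResolver.GetDefault();

        // Resolve the current location of a (hypothetical) singleton-partition service.
        ResolvedServicePartition partition = await resolver.ResolveAsync(
            new Uri("fabric:/MyApp/MyStatelessService"),
            ServicePartitionKey.Singleton,
            CancellationToken.None);

        // Each instance or replica exposes one or more endpoints; pick one to call.
        Console.WriteLine(partition.GetEndpoint().Address);
    }
}
```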
Service Fabric terminology overview

Service Fabric is a distributed systems platform that makes it easy to package, deploy, and manage scalable and reliable microservices. This topic details the terminology used by Service Fabric in order to understand the terms used in the documentation. The concepts listed in this section are also discussed in the following Microsoft Virtual Academy videos: Core concepts, Design-time concepts, and Run-time concepts.

Infrastructure concepts

Cluster: A network-connected set of virtual or physical machines into which your microservices are deployed and managed. Clusters can scale to thousands of machines.

Node: A machine or VM that is part of a cluster is called a node. Each node is assigned a node name (a string). Nodes have characteristics such as placement properties. Each machine or VM has an auto-start Windows service, FabricHost.exe, which starts running upon boot and then starts two executables: Fabric.exe and FabricGateway.exe. These two executables make up the node. For testing scenarios, you can host multiple nodes on a single machine or VM by running multiple instances of Fabric.exe and FabricGateway.exe.

Application concepts

Application Type: The name/version assigned to a collection of service types. Defined in an ApplicationManifest.xml file, embedded in an application package directory, which is then copied to the Service Fabric cluster's image store. You can then create a named application from this application type within the cluster. Read the Application Model article for more information.

Application Package: A disk directory containing the application type's ApplicationManifest.xml file, which references the service packages for each service type that makes up the application type. For example, an application package for an email application type could contain references to a queue service package, a frontend service package, and a database service package. The files in the application package directory are copied to the Service Fabric cluster's image store.

Named Application: After an application package is copied to the image store, you create an instance of the application within the cluster by specifying the application package's application type (using its name/version). Each application type instance is assigned a URI name that looks like this: "fabric:/MyNamedApp". Within a cluster, you can create multiple named applications from a single application type. You can also create named applications from different application types. Each named application is managed and versioned independently.
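The following C# sketch shows that flow through the System.Fabric FabricClient APIs: copy an application package to the image store, register (provision) the application type it contains, and create a named application from it. The package path, image store connection string, application type name, and version are placeholders and depend on the cluster configuration; equivalent PowerShell cmdlets exist for the same steps.

```csharp
using System;
using System.Fabric;
using System.Fabric.Description;
using System.Threading.Tasks;

class CreateNamedApplicationSample
{
    static async Task Main()
    {
        var fabricClient = new FabricClient(); // connects to the local cluster by default

        // Copy the application package into the cluster's image store
        // (connection string and paths here are illustrative).
        fabricClient.ApplicationManager.CopyApplicationPackage(
            "fabric:ImageStore", @"C:\MyAppPkg", "MyAppPkg");

        // Register (provision) the application type contained in the package.
        await fabricClient.ApplicationManager.ProvisionApplicationAsync("MyAppPkg");

        // Create a named application instance from the provisioned type.
        var description = new ApplicationDescription(
            new Uri("fabric:/MyNamedApp"), "MyAppType", "1.0.0");
        await fabricClient.ApplicationManager.CreateApplicationAsync(description);
    }
}
```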
Service Type: The name/version assigned to a service's code packages, data packages, and configuration packages. Defined in a ServiceManifest.xml file, embedded in a service package directory; the service package directory is then referenced by an application package's ApplicationManifest.xml file. The service type's ServiceManifest.xml file describes the service. Within the cluster, after creating a named application, you can create a named service from one of the application type's service types. Read the Application Model article for more information.

There are two types of services:

- Stateless: Use a stateless service when the service's persistent state is stored in an external storage service, such as Azure Storage, Azure SQL Database, or Azure DocumentDB. Also use a stateless service when the service has no persistent storage at all. For example, a calculator service where values are passed to the service, a computation is performed using these values, and a result is returned.
- Stateful: Use a stateful service when you want Service Fabric to manage your service's state via its Reliable Collections or Reliable Actors programming models. Specify how many partitions you want to spread your state over (for scalability) when creating a named service. Also specify how many times to replicate your state across nodes (for reliability). Each named service has a single primary replica and multiple secondary replicas. You modify your named service's state by writing to the primary replica. Service Fabric then replicates this state to all the secondary replicas, keeping your state in sync. Service Fabric automatically detects when a primary replica fails and promotes an existing secondary replica to a primary replica. Service Fabric then creates a new secondary replica.

Service Package: A disk directory containing the service type's ServiceManifest.xml file, which references the code, data, and configuration packages for the service type. For example, a service package could refer to the code, static data, and configuration packages that make up a database service. The files in the service package directory are referenced by the application type's ApplicationManifest.xml file.

Named Service: After creating a named application, you can create an instance of one of its service types within the cluster by specifying the service type (using its name/version). Each service type instance is assigned a URI name scoped under its named application's URI. For example, if you create a "MyDatabase" named service within a "MyNamedApp" named application, the URI looks like: "fabric:/MyNamedApp/MyDatabase". Within a named application, you can create several named services. Each named service can have its own partition scheme and instance/replica counts.

Code Package: A disk directory containing the service type's executable files (typically EXE/DLL files). The files in the code package directory are referenced by the service type's ServiceManifest.xml file. When a named service is created, the code package is copied to the one or more nodes selected to run the named service. The code starts running.

Configuration Package: A disk directory containing the service type's static, read-only configuration files (typically text files). The files in the configuration package directory are referenced by the service type's ServiceManifest.xml file. When a named service is created, the files in the configuration package are copied to the one or more nodes selected to run the named service. Then the code starts running and can now access the configuration files.

Data Package: A disk directory containing the service type's static, read-only data files (typically photo, sound, and video files). The files in the data package directory are referenced by the service type's ServiceManifest.xml file. When a named service is created, the data package is copied to the one or more nodes selected to run the named service. Then the code starts running and can now access the data files.

Containers: By default, Service Fabric deploys and activates services as processes. Service Fabric can also deploy services in container images. Containers are a virtualization technology that virtualizes the underlying operating system from applications. An application and its runtime, dependencies, and system libraries run inside a container with full, private access to the container's own isolated view of operating system constructs. Service Fabric supports Docker containers on Linux and Windows Server containers. For more information, read Service Fabric and containers.

There are two types of code package executables:

- Guest executables: Executables that run as-is on the host operating system (Windows or Linux). That is, these executables do not link to or reference any Service Fabric runtime files and therefore do not use any Service Fabric programming models. These executables are unable to use some Service Fabric features, such as the Naming service for endpoint discovery. Guest executables cannot report load metrics specific to each service instance.
- Service Host Executables: Executables that use Service Fabric programming models by linking to Service Fabric runtime files, enabling Service Fabric features. For example, a named service instance can register endpoints with Service Fabric's Naming Service and can also report load metrics.
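To ground the stateful service definition, here is a minimal C# sketch of a Reliable Services stateful service that keeps its state in a reliable dictionary. The service and dictionary names are placeholders; a complete project would also include a ServiceManifest.xml and register the service type in its host process.

```csharp
using System;
using System.Fabric;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Data.Collections;
using Microsoft.ServiceFabric.Services.Runtime;

// A minimal stateful service that keeps a replicated counter in a reliable dictionary.
internal sealed class CounterService : StatefulService
{
    public CounterService(StatefulServiceContext context)
        : base(context) { }

    protected override async Task RunAsync(CancellationToken cancellationToken)
    {
        var counters = await this.StateManager
            .GetOrAddAsync<IReliableDictionary<string, long>>("counters");

        while (true)
        {
            cancellationToken.ThrowIfCancellationRequested();

            // All writes go through a transaction and are replicated to the
            // secondary replicas before the commit completes.
            using (var tx = this.StateManager.CreateTransaction())
            {
                await counters.AddOrUpdateAsync(tx, "heartbeat", 1, (key, value) => value + 1);
                await tx.CommitAsync();
            }

            await Task.Delay(TimeSpan.FromSeconds(5), cancellationToken);
        }
    }
}
```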
Partition Scheme: When creating a named service, you specify a partition scheme. Services with large amounts of state split the data across partitions, which spreads the state across the cluster's nodes. This allows your named service's state to scale. Within a partition, stateless named services have instances while stateful named services have replicas. Usually, stateless named services only ever have one partition since they have no internal state. The partition instances provide for availability; if one instance fails, other instances continue to operate normally and then Service Fabric creates a new instance. Stateful named services maintain their state within replicas, and each partition has its own replica set with all the state being kept in sync. Should a replica fail, Service Fabric builds a new replica from the existing replicas. Read the Partition Service Fabric reliable services article for more information.

System services

There are system services that are created in every cluster that provide the platform capabilities of Service Fabric.

Naming Service: Each Service Fabric cluster has a Naming service, which resolves service names to a location in the cluster, similar to an internet Domain Name Service (DNS) for the cluster. You manage the service names and properties. Clients securely communicate with any node in the cluster using the Naming Service to resolve a service name and its location, and obtain the actual machine IP address and port where it is currently running. You can develop services and clients capable of resolving the current network location despite applications being moved within the cluster, for example due to failures, resource balancing, or the resizing of the cluster. Read Communicate with services for more information on the client and service communication APIs that work with the Naming service.

Image Store Service: Each Service Fabric cluster has an Image Store service where deployed, versioned application packages are kept. Copy an application package to the Image Store and then register the application type contained within that application package. After the application type is provisioned, you create a named application from it. You can unregister an application type from the Image Store service after all its named applications have been deleted. Read Understand the ImageStoreConnectionString setting for more information about the Image Store service, and read the Deploy an application article for more information on deploying applications to the Image Store service.

Built-in programming models

There are .NET Framework programming models available for you to build Service Fabric services:

- Reliable Services: An API to build stateless and stateful services. Stateful services store their state in Reliable Collections (such as a dictionary or a queue). You also get to plug in a variety of communication stacks, such as Web API and Windows Communication Foundation (WCF).
- Reliable Actors: An API to build stateless and stateful objects through the virtual Actor programming model. This model can be useful when you have lots of independent units of computation/state. Because this model uses a turn-based threading model, it is best to avoid code that calls out to other actors or services, since an individual actor cannot process other incoming requests until all its outbound requests have completed.

Read the Choose a Programming Model for your service article for more information.
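As a rough sketch of the actor model, a per-device actor might look like the following C# code. The interface, actor name, and state key are hypothetical, and a real project would also register the actor type in its host process.

```csharp
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Actors;
using Microsoft.ServiceFabric.Actors.Runtime;

// Hypothetical actor contract: one actor instance per device, keyed by ActorId.
public interface IDeviceActor : IActor
{
    Task RecordReadingAsync(double value);
    Task<double> GetLastReadingAsync();
}

internal class DeviceActor : Actor, IDeviceActor
{
    public DeviceActor(ActorService actorService, ActorId actorId)
        : base(actorService, actorId) { }

    // State set through the actor state manager is persisted and replicated
    // by the platform according to the actor service's configuration.
    public Task RecordReadingAsync(double value) =>
        this.StateManager.SetStateAsync("lastReading", value);

    public Task<double> GetLastReadingAsync() =>
        this.StateManager.GetOrAddStateAsync("lastReading", 0.0);
}
```

A client would typically reach such an actor through an ActorProxy created from an ActorId and the actor service's fabric:/ name.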
Service Fabric represents the next-generation middleware platform for building and managing these enterprise-class. and delete deployed applications. getting a simple app out the door as proof of concept is the driving factor (knowing that the application can be redesigned at a later time). Tier-1 cloud-scale applications. scalable. and manageable. This short Channel9 video introduces Service Fabric and microservices: The detailed overview Service Fabric enables you to build and manage scalable and reliable applications composed of microservices. Developers want to prototype quickly while knowing that the app can scale to react to unpredictable growth and usage. upgrade/patch. Instead. demanding workloads knowing that they are scalable. and manage independently. a getting started guide. developers and administrators can avoid solving complex infrastructure problems. Successful applications evolve by being useful to people. By using Service Fabric. and manage scalable and reliable microservices. which you can scale up or down. What are microservices? describes how the microservice design approach meets these challenges and how you can create microservices. The unique architecture of Service Fabric enables you to perform near real-time data analysis. How much do you know about your requirements today. Why a microservices design approach? All applications evolve over time. stateless and stateful microservices. test. These applications and microservices can be stateless or stateful. and what will they be in the future? Sometimes. Read . This primer does not contain a comprehensive content list. and they are resource-balanced across virtual machines to maximize efficiency. deploy. You can easily scale your applications up or down (really in or out). depending on your changing resource requirements. monitor. It provides a sophisticated runtime to build distributed. These microservices run at very high density on a shared pool of machines. but does link to overview and getting started articles for every area of Service Fabric. Service Fabric offers a reliable and flexible platform that enables you to write and run many types of business applications and services. and Create your first app (C#). After you've run your first application. or MacOS environments and deploy those apps to clusters running on Windows or Linux. The following Microsoft Virtual Academy video walks you through the process of creating a Java app on Linux: . This longer Microsoft Virtual Academy video describes the Service Fabric core concepts: Get started and create your first app Using the Service Fabric SDKs and tools. which takes you through stateful services. These topics walk you through the process of creating your first application in Visual Studio and running it on your development computer. The following guides will have you deploying an app within minutes.NET Core and Java. Linux. configure monitoring and health reports. These topics walk you through the process of creating your first Java or C# application on Linux and running it on your development computer: Set up your dev environment. deploying. Set up your dev environment Create your first app (C#) Practical hands on labs Try this extensive hands-on-lab Part 1 to get familiar with the end-to-end development flow for Service Fabric. download and run some of our sample apps. The following Channel9 video walks you through the process of creating a C# app in Visual Studio: On Linux Service Fabric provides SDKs for building services on Linux in both . 
and perform an application upgrade. In particular start with the Getting Started Samples On Windows The Service Fabric SDK includes an add-in for Visual Studio that provides templates and tools for creating. Create your first app (Java). After Part 1.Application scenarios and patterns and scenarios to learn about the categories of applications and services you can create as well as customer case studies. you can develop apps in Windows. and debugging Service Fabric applications. go through hands-on-lab Part 2. Learn to create a stateless service. Core concepts Service Fabric terminology. after creating a named application. embedded in a service package directory and the service package directory is then referenced by an application package's ApplicationManifest. which is then copied to the Service Fabric cluster's image store. which then runs within the cluster. For example. you can create a named service from one of the application type's service types. The application package is a disk directory containing the application type's ApplicationManifest. data packages. app package and manifest. Defined in a ServiceManifest.xml file.xml file. and configuration packages. which are loaded at run time. and Supported programming models provide more concepts and descriptions. A service type is described by it's ServiceManifest. CORE CONCEPTS DESIGN TIME RUN TIME Design time: app type.xml file. A service type is the name/version assigned to a service's code packages. but here are the basics. service package and manifest An application type is the name/version assigned to a collection of service types. an application package for an email application type could contain references to a queue service package. It's composed of executable code service configuration settings. Defined in an ApplicationManifest. You can then create a named application from this application type. which references the service packages for each service type that makes up the application type.On MacOS You can build Service Fabric applications on MacOS X to run on Linux clusters. a frontend . embedded in an application package directory.xml file. These articles describe how to set up your Mac for development and walk you through creating a Java application on MacOS and running it on an Ubuntu virtual machine: Set up your dev environment and Create your first app (Java).xml file. and static data that is consumed by the service. Within the cluster. Application model. service type. you can host multiple nodes on a single machine or VM by running multiple instances of Fabric.exe and FabricGateway. The files in the service package directory are referenced by the application type's ApplicationManifest. or Azure DocumentDB. the URI looks like: fabric:/MyNamedApp/MyDatabase. Within a named application. partitions. you can create multiple named applications from a single application type. Each service type instance is assigned a URI name scoped under its named application's URI. Usually. A stateful service uses Service Fabric to manage your service's state via its Reliable Collections or Reliable Actors programming models. A service performs a complete and standalone function (it can start and run independently of other services) and is composed of code. stateless named services have instances while stateful named services have replicas. You can also create named applications from different application types. Stateful named services maintain their state within replicas and each partition has its own replica set. 
and configuration packages for the service type. and replicas. There are two types of services: stateless and stateful. FabricHost. The following diagram shows the relationship between applications and service instances. Nodes have characteristics such as placement properties. Use a stateless service when the service has no persistent storage at all. Changes to state from write operations are replicated to multiple other replicas (called Active Secondaries). which references the code. Azure SQL Database. Run time: clusters and nodes. if you create a "MyDatabase" named service within a "MyNamedApp" named application. and configuration packages that make up a database service.exe. you can create an instance of one of its service types (a named service) within the cluster by specifying the service type (using its name/version). Each application type instance is assigned a URI name that looks like fabric:/MyNamedApp. After creating a named application. After an application package is copied to the image store. stateless named services only ever have one partition since they have no internal state. which is spread across the cluster's nodes. Within a cluster. A machine or VM that is part of a cluster is called a node. named services.xml file. and data. Services with large amounts of state split the data across partitions. Each named service can have its own partition scheme and instance/replica counts. Stateless services store persistent state in an external storage service such as Azure Storage. which starts running upon boot and then starts two executables: Fabric.exe. Read and write operations are performed at one replica (called the Primary). you specify a partition scheme. configuration. Clusters can scale to thousands of machines. and a database service package. a service package could refer to the code. you can create one or more named services. For example. Each named application is managed and versioned independently. The files in the application package directory are copied to the Service Fabric cluster's image store. you create an instance of the application within the cluster by specifying the application package's application type (using its name/version). When creating a named service. For development or testing scenarios. and replicas A Service Fabric cluster is a network-connected set of virtual or physical machines into which your microservices are deployed and managed.exe and FabricGateway. static data.exe. Within a partition. Each machine or VM has an auto-start Windows service. static data. named apps. For example.xml file. These two executables make up the node. partitions.service package. A named application is a collection of named services that performs a certain function or functions. . A service package is a disk directory containing the service type's ServiceManifest. Each partition is responsible for a portion of the complete state of the service. Each node is assigned a node name (a string). If one instance fails. such as custom health and load reporting. or sharding. see Supported programming models. Each partition is responsible for a portion of the complete state of the service. As the data needs grow. Service Fabric builds a new replica from the existing replicas. Importantly. stateless services. A well known form of partitioning is data partitioning. you can decrease the number of nodes in the cluster. The partition instances provide for availability. services can be any compiled executable program written in any language and hosted on a Service Fabric cluster. 
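To make the naming scheme concrete, the following hedged sketch uses the C# FabricClient query APIs to list the nodes in a cluster along with the named applications and services (and their fabric:/ URIs) running on it. It assumes the cluster's client connection endpoint is reachable; for a local development cluster that endpoint is typically localhost:19000.

```csharp
using System;
using System.Fabric;
using System.Threading.Tasks;

public static class ClusterInventory
{
    public static async Task PrintAsync()
    {
        // For a local development cluster; pass your own cluster's client endpoint otherwise.
        var client = new FabricClient("localhost:19000");

        foreach (var node in await client.QueryManager.GetNodeListAsync())
        {
            Console.WriteLine($"Node {node.NodeName}: {node.NodeStatus}");
        }

        foreach (var app in await client.QueryManager.GetApplicationListAsync())
        {
            Console.WriteLine($"Application {app.ApplicationName}");   // e.g. fabric:/MyNamedApp

            foreach (var service in await client.QueryManager.GetServiceListAsync(app.ApplicationName))
            {
                Console.WriteLine($"  Service {service.ServiceName}"); // e.g. fabric:/MyNamedApp/MyDatabase
            }
        }
    }
}
```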
Supported programming models Service Fabric offers multiple ways to write and manage your services. arbitrary executable (written in any language) hosted on a Service Fabric cluster alongside other services. You can deploy existing applications. Or. For more information. Usually. Changes to state from write operations are replicated to multiple other replicas (called Active Secondaries). If the nodes in the cluster are not being used efficiently.Partitioning. partitions grow. Service Fabric currently supports deployment of Docker containers on Linux and Windows Server containers on Windows Server 2016. scaling. however. Containers By default. which allows your named service's state to scale. Should a replica fail. other instances continue to operate normally and then Service Fabric creates a new instance. If you add new nodes to the cluster. Service Fabric will rebalance the partition replicas across the increased number of nodes. stateless named services have instances while stateful named services have replicas. Service Fabric deploys and activates services as processes. Services can choose to use the Service Fabric APIs to take full advantage of the platform's features and application frameworks. Guest executables do not integrate directly with Service Fabric APIs. you can mix services in processes and services in containers in the same application. Overall application performance improves and contention for access to memory decreases. service endpoint registration. Stateful services with large amounts of state split the data across partitions. Service Fabric can also deploy services in container images. Service Fabric again rebalances the partition replicas across the decreased number of nodes to make better use of the hardware on each node. Stateful named services maintain their state within replicas and each partition has its own replica set. Guest executables do not benefit from the full set of features the platform offers. Within a partition. and Service Fabric rebalances partitions across nodes to make efficient use of hardware resources. or stateful services in a . and availability Partitioning is not unique to Service Fabric. The replicas of each partition are spread across the cluster's nodes. and stateful compute. Read and write operations are performed at one replica (called the Primary). Guest executables A guest executable is an existing. a container represents an application host in which multiple service replicas are placed. In the Service Fabric application model. stateless named services only ever have one partition since they have no internal state. The service model enables several different roles to participate independently in the application lifecycle. Reliable Actors Built on top of Reliable Services. A service developer can use these as building blocks to write complicated scenarios. and REST APIs. it is critical to verify that your apps aps and services can withstand real-world failures. Service Fabric application lifecycle provides an overview of the APIs and how they are used by the different roles throughout the phases of the Service Fabric application lifecycle. safe. from development through deployment. daily management. where state is persisted directly in the service itself using Reliable Collections. State is persisted in an external solution. all managed automatically by Service Fabric. Reliable Services can also be stateful. based on the actor design pattern. Actions target a service for testing using individual faults. 
Examples of simulated faults are: Restart a node to simulate any number of situations where a machine or VM is rebooted. Service Fabric provides first-class support for the full application lifecycle of cloud applications. The Reliable Actor framework uses independent units of compute and state with single-threaded execution called actors. upgrade. Move a replica of your stateful service to simulate load balancing. Java APIs. State is made highly available through replication and distributed through partitioning. or application upgrade. deployment. maintenance. The Reliable Actor framework provides built in communication for actors and pre-set state persistence and scale-out configurations. an application on Service Fabric usually goes through the following phases: design. all in a controlled. failover. The entire app lifecycle can be managed using PowerShell cmdlets. The Fault Analysis Service is designed for testing services that are built on Service Fabric. such as Azure DB or Azure Table Storage.container using Service Fabric. C# APIs. such as web servers or Worker Roles in Azure Cloud Services). testing. App lifecycle As with other platforms. and maintenance to eventual decommissioning. Invoke quorum loss on a stateful service to create a situation where write operations can't proceed because . the Reliable Actor framework is an application framework that implements the Virtual Actor pattern. With the Fault Analysis Service you can induce meaningful faults and run complete test scenarios against your applications. and consistent manner. development. These faults and scenarios exercise and validate the numerous states and transitions that a service will experience throughout its lifetime. You can also set up continuous integration/continuous deployment pipelines using tools such as Visual Studio Team Services or Jenkins The following Microsoft Virtual Academy video describes how to manage your application lifecycle: Test apps and services To create truly cloud-scale services. Reliable Services Reliable Services is a light-weight framework for writing services that integrate with the Service Fabric platform and benefit from the full set of platform features. and removal. Reliable Services can be stateless (similar to most service platforms. NET Core). there are some features that are supported on Windows. read Differences between Service Fabric on Linux and Windows. but not on Linux.a version of the chaos test scenario that targets a specific service partition while leaving other services unaffected.exe and FabricGateway. interleaved faults (both graceful and ungraceful) throughout the cluster over extended periods of time. Service Fabric clusters can be created on any virtual machines or computers running Windows Server or Linux.exe. Resource Manager also provides easy management of all resources used by the cluster as a single unit. on Microsoft Azure. Each node is assigned a node name (a string). Failover scenario. Clusters A Service Fabric cluster is a network-connected set of virtual or physical machines into which your microservices are deployed and managed. Docker containers can run guest executables or native Service Fabric services. the preview also supports orchestrating Docker containers. Cluster nodetypes are virtual machine scale sets. or on any cloud provider.exe. Nodes have characteristics such as placement properties. . These two executables make up the node. or from Visual Studio. The Fault Analysis Service provides two built- in complete scenarios: Chaos scenario. 
which starts running upon boot and then starts two executables: Fabric. and manage highly available. You are able to deploy and run Service Fabric applications in any environment where you have a set of Windows Server or Linux computers that are interconnected: on-premises. Since Service Fabric on Linux is a preview. A machine or VM that is part of a cluster is called a cluster node. The following Microsoft Virtual Academy video describes Service Fabric clusters: Clusters on Azure Running Service Fabric clusters on Azure provide integration with other Azure features and services. Clusters on Azure are integrated with Azure diagnostics and Log Analytics.simulates continuous. Invoke data loss on a stateful service to create a situation where all in-memory state is completely wiped out. read Service Fabric on Linux. so you can model clusters like any other resources in Azure. from a template. Each machine or VM has an auto-start service. Clusters can scale to thousands of machines. which makes operations and management of the cluster easier and more reliable. In addition. The Service Fabric frameworks (Reliable Services and Reliable Actors) are available in Java on Linux in addition to C# (. which use the Service Fabric frameworks. highly scalable applications on Linux just as you would on Windows. FabricHost. so autoscaling functionality is built- in.exe. You can create a cluster on Azure through the Azure portal. there aren't enough "back-up" or "secondary" replicas to accept new data. The preview of Service Fabric on Linux enables you to build. For more information. Scenarios are complex operations composed of one or more actions. A cluster is an Azure Resource Manager resource. To learn more. For testing scenarios. You can also build guest executable services with any language or framework.exe and FabricGateway. deploy. you can host multiple nodes on a single machine or VM by running multiple instances of Fabric. read Upgrade a Service Fabric cluster. You would then initiate the upgrade. Cluster security Clusters must be secured to prevent unauthorized users from connecting to your cluster.com/download. The cluster security scenarios are: Node-to-node security Client-to-node security Role-based access control (RBAC) For more information. Microsoft is responsible for patching the underlying OS and performing fabric upgrades on your cluster. upgrades on your cluster so that you are always running a supported version. For more information. A Service Fabric cluster is a resource that you own. doing so allows anonymous users to connect to it if management endpoints are exposed to the public internet. If your cluster can connect to https://www. read Secure a cluster. You can set your cluster to receive automatic fabric upgrades. you can host your own cluster and apps. If the nodes in the cluster are not being used efficiently. Standalone clusters can be scaled manually. For more information. you can decrease the number of nodes in the cluster.com/download. Service Fabric again rebalances the partition replicas and instances across the decreased number of nodes to make better use of the hardware on each node. It is not possible to later enable security on an unsecured cluster: cluster security is enabled at cluster creation time. Fabric and configuration upgrades can be set through the Azure portal or through Resource Manager. In addition to fabric upgrades. Standalone clusters give you the freedom to host a cluster wherever you want. 
you can manually download the new runtime package from an internet connected machine and then initiate the upgrade. but is partly managed by Microsoft.Standalone clusters Service Fabric provides an install package for you to create standalone Service Fabric clusters on-premises or on any cloud provider. Health monitoring Service Fabric introduces a health model designed to flag unhealthy cluster and application conditions on specific entities (such as cluster nodes and service replicas). A standalone cluster is a resource that you entirely own. If your cluster can't access https://www. Although it is possible to create an unsecured cluster. Cluster upgrades Periodically. The health model uses health reporters (system components . Service Fabric rebalances the partition replicas and instances across the increased number of nodes. If your data is subject to compliance or regulatory constraints.microsoft. Service Fabric apps can run in multiple hosting environments with no changes. Perform runtime. read Upgrade a standalone Service Fabric cluster. You can scale clusters on Azure either manually or programmatically. or you want to keep your data local. or fabric. you can set your cluster to automatically download and provision the new Service Fabric runtime package. or choose to select a supported fabric version that you want. new versions of the Service Fabric runtime are released. You are responsible for patching the underlying OS and initiating fabric upgrades. especially when it has production workloads running on it. Create your first Service Fabric standalone cluster Linux standalone clusters are not yet supported.microsoft. so your knowledge of building apps carries over from one hosting environment to another. when Microsoft releases a new version. Overall application performance improves and contention for access to memory decreases. Scaling If you add new nodes to the cluster. you can also update cluster configuration such as certificates or application ports. The health information can save time and effort on debugging and investigation once the service is up and running at scale in production. Examples of such services are repair services and alerting mechanisms. Internal watchdogs deployed as a Service Fabric service (for example. False positives that wrongly show unhealthy issues can negatively impact upgrades or other services that use health data. implement custom health reporting in your services. Reporting can be done from: The monitored Service Fabric service replica or instance. The following Microsoft Virtual Academy video describes the Service Fabric health model and how it's used: Next steps Learn how to create a cluster in Azure or a standalone cluster on Windows. . flexible. a Service Fabric stateless service that monitors conditions and issues reports). monitoring service like Gomez). For applications and services.and watchdogs). Any condition that can impact health should be reported on. some thought is needed to provide reports that capture conditions of interest in the best possible way. The model is intended to be rich. To add health information specific to your service's logic. especially if it can help flag problems close to the root. Therefore. Learn how to migrate from Cloud Services. The Service Fabric reporters monitor identified conditions of interest. General queries that return a list of entities that have health as one of the properties (through PowerShell. The quality of the health reports determines the accuracy of the health view of the cluster. 
the C# FabricClient APIs and Java FabricClient APIs. External watchdogs that probe the resource from outside the Service Fabric cluster (for example. or REST APIs). system health reports verify that entities are implemented and are behaving correctly from the perspective of the Service Fabric runtime. Service writers need to think upfront about health and how to design health reporting. They report on those conditions based on their local view. The reports do not provide any health monitoring of the business logic of the service or detect hung processes. Try creating a service using the Reliable Services or Reliable Actors programming models. the API. or REST). and easy to use. Internal watchdogs that run on the Service Fabric nodes but are not implemented as Service Fabric services. System health reports provide visibility into cluster and application functionality and flag issues through health. Learn to monitor and diagnose services. Service Fabric provides multiple ways to view health reports aggregated in the health store: Service Fabric Explorer or other visualization tools/ Health queries (through PowerShell. The goal is easy and fast diagnosis and repair. The health store aggregates health data sent by all reporters to determine whether entities are globally healthy. Out of the box. Service Fabric components report health on all entities in the cluster. The watchdogs can be deployed an all nodes or can be affinitized to the monitored service. Learn to test your apps and services. Look through the Service Fabric samples. . Learn to manage and orchestrate cluster resources. Read the team blog for articles and announcements. Learn about Service Fabric support options. 0 by default.1 Windows Server 2012 R2 Windows Server 2016 Windows 10 NOTE Windows 7 only includes Windows PowerShell 2. install the runtime. SDK. Service Fabric PowerShell cmdlets requires PowerShell 3. using Web Platform Installer. Prepare your development environment 3/22/2017 • 2 min to read • Edit Online To build and run Azure Service Fabric applications on your development machine. and tools.0 from the Microsoft Download Center. Install the SDK and tools To use Visual Studio 2017 Service Fabric Tools are part of the Azure Development and Management workload in Visual Studio 2017.0 or higher. Enable this workload as part of your Visual Studio installation. you need to install the Microsoft Azure Service Fabric SDK. You also need to enable execution of the Windows PowerShell scripts included in the SDK. using the Web Platform Installer: Install the Microsoft Azure Service Fabric SDK and Tools SDK installation only If you only need the SDK. In addition. Prerequisites Supported operating system versions The following operating system versions are supported for development: Windows 7 Windows 8/Windows 8. You can download Windows PowerShell 5. Install the Microsoft Azure Service Fabric SDK To use Visual Studio 2015 (requires Visual Studio 2015 Update 2 or later) For Visual Studio 2015. you can install this package: Install the Microsoft Azure Service Fabric SDK . Service Fabric tools are installed together with the SDK. 5.1 For a list of supported versions. 
Create your first Service Fabric application in Visual Studio Learn how to deploy and manage applications on your local cluster Learn about the programming models: Reliable Services and Reliable Actors Check out the Service Fabric code samples on GitHub Visualize your cluster by using Service Fabric Explorer Follow the Service Fabric learning path to get a broad introduction to the platform Learn about Service Fabric support options .216 Visual Studio 2015 tools 1. Try the following workarounds: Launch the preceding links in Internet Explorer or Edge browsers. see Service Fabric support Enable PowerShell script execution Service Fabric uses Windows PowerShell scripts for creating a local development cluster and for deploying applications from Visual Studio.5. search for "Service Fabric". Windows blocks these scripts from running. WARNING Customers have reported errors during installation when using these launch links. or when these links were used in Chrome browser.216 Service Fabric runtime 5. you must modify your PowerShell execution policy. and install the SDK We apologize for the inconvenience. or Launch Web Platform Installer from the Start menu. To enable them. By default. Open PowerShell as an administrator and enter the following command: Set-ExecutionPolicy -ExecutionPolicy Unrestricted -Force -Scope CurrentUser Next steps Now that you've finished setting up your development environment.50311. start building and running apps.5. These errors are known issues in Web Platform Installer which are being addressed. The current versions are: Service Fabric SDK 2. 6.net/repos/servicefabric/ trusty main" > /etc/apt/sources. you can install the SDK. sudo apt-get update Install and set up the SDK for containers and guest executables Once your sources are updated. sudo apt-get install servicefabricsdkcommon .list. you must first update your apt sources. sudo sh -c 'echo "deb [arch=amd64] https://apt-mo. sudo apt-key adv --keyserver apt-mo.list.net/repos/dotnet-release/ xenial main" > /etc/apt/sources.net --recv-keys 417A0893 sudo apt-key adv --keyserver hkp://keyserver. Open a terminal. 1. install the runtime and common SDK.ubuntu.list' 4. 2.trafficmanager.com:80 --recv-keys 417A0893 5.list' 3. Add the new GPG key to your apt keyring.04 (i"Xenial Xerus") Update your apt sources To install the SDK and the associated runtime package via apt-get.trafficmanager. Add the Service Fabric repo to your sources list.trafficmanager. Prerequisites Supported operating system versions The following operating system versions are supported for development: Ubuntu 16. sudo sh -c 'echo "deb [arch=amd64] http://apt-mo. You can also install optional SDKs for Java and . You are asked to confirm the installation and to agree to a license agreement. Add the dotnet repo to your sources list.NET Core.d/dotnetdev. Install the Service Fabric SDK package. Refresh your package lists based on the newly added repositories.d/servicefabric. Prepare your development environment on Linux 3/29/2017 • 4 min to read • Edit Online To deploy and run Azure Service Fabric applications on your Linux development machine. you may need to set the variable with the following command: ```bash export NODE_PATH=$NODE_PATH:/root/. creation of apps with guest executable or container services should be possible by running yo azuresfguest . Create a symlink from the bin/azure folder of the cloned repo to /usr/bin/azure so that it's added to your path and commands are available from any directory. azure --completion >> ~/azure.sh .sh' >> ~/. 
You may need to set your $NODE_PATH environment variable to where the node modules are located. Switch into the cloned repo and install the CLI's dependencies using the Node Package Manager (npm).node/lib/node_modules ``` If you are using the environment as root.completion.com/Azure/azure-xplat-cli. sudo /opt/microsoft/sdk/servicefabric/common/sdkcommonsetup. sudo ln -s $(pwd)/bin/azure /usr/bin/azure 4. Run the SDK setup script. Finally.git 2. ```bash export NODE_PATH=$NODE_PATH:$HOME/.completion. It is based on Node.7. Set up the Azure cross-platform CLI The Azure cross-platform CLI includes commands for interacting with Service Fabric entities.bashrc file so that you don't have to set the environment variable at every login. enable auto-completion Service Fabric commands.js so ensure that you have installed Node before proceeding with the following instructions: 1. including clusters and applications.sh echo 'source ~/azure.completion. Clone the github repo to your development machine.node/lib/node_modules ``` TIP You may want to add these commands into your ~/.bash_profile source ~/azure.sh Once you have run the steps to install the Common SDK package. git clone https://github. cd azure-xplat-cli npm install 3. you are able to deploy pre-built Service Fabric application packages or new ones based on guest containers or guest executables. NOTE Service Fabric commands are not yet available in Azure CLI 2. if you wish to use the Java programming models) The Java SDK provides the libraries and templates required to build Service Fabric services using Java. . you should see the Service Fabric Explorer dashboard. Set up a local cluster If everything has installed successfully. Run the SDK setup script. Install the Java SDK (optional. If the cluster has started. Install the Java SDK package. NOTE Stand alone clusters aren't supported in Linux . 1.only one box and Azure Linux multi-machine clusters are supported in the preview. sudo apt-get install servicefabricsdkjava 2. Run the cluster setup script.sh 2.NET Core SDKs. sudo /opt/microsoft/sdk/servicefabric/common/clustersetup/devclustersetup. you should be able to start a local cluster.0. Open a web browser and navigate to http://localhost:19080/Explorer. follow the optional setup steps provided in subsequent sections. 1. At this point. To build new services using the Java or . You can update Buildship using the instructions here.NET Core programming models) . 5. if you wish to use the . In the "Work with" textbox. NOTE Installing the Java SDK is a prerequisite to using the Eclipse plugin. choose Help > Install New Software. enter: http://dl. ensure that you have latest eclipse Neon and latest Buildship version (1. 6. 3.. Choose the Service Fabric plugin and click next. You can check the versions of installed components by choosing Help > Installation Details. Click Add.17 or later) installed. Select update if a newer version is available. 2. You can check by selecting Help => Installation Details and searching for Service Fabric in the list of installed plugins. For more information. To install the Service Fabric plugin.sh Install the Eclipse Neon plugin (optional) You can install the Eclipse plugin for Service Fabric from within the Eclipse IDE for Java Developers.com/eclipse/servicefabric 4. You can use Eclipse to create Service Fabric guest executable applications and container applications in addition to Service Fabric Java applications. If you already have the Service Fabric Eclipse plugin installed. 
Proceed through the installation and accept the end-user license agreement. sudo /opt/microsoft/sdk/servicefabric/java/sdkjavasetup.. even if you only use it to create and deploy guest executables and container applications. make sure you are on the latest version.windowsazure. Install the .NET Core SDK (optional. see Service fabric getting started with eclipse. 1.0. In Eclipse. the release notes will specify those steps.The . sudo apt-get install servicefabricsdkcsharp 2. run the following steps (remove SDKs from the list that you don't want to update or install): sudo apt-get update sudo apt-get install servicefabric servicefabricsdkcommon servicefabricsdkcsharp servicefabricsdkjava For updating the CLI. 1. navigate to the directory where you cloned the CLI and run git pull for updating.NET Core SDK provides the libraries and templates required to build Service Fabric services using cross- platform . sudo /opt/microsoft/sdk/servicefabric/csharp/sdkcsharpsetup. If additional steps are needed for updating. Next steps Create and deploy your first Service Fabric Java application on Linux using Yeoman Create and deploy your first Service Fabric Java application on Linux using Service Fabric Plugin for Eclipse Create your first CSharp application on Linux Prepare your development environment on OSX Use the Azure CLI to manage your Service Fabric applications Service Fabric Windows/Linux differences .NET Core SDK package.sh Updating the SDK and Runtime To update to the latest version of the SDK and runtime. Install the . Run the SDK setup script.NET Core. Create the VM vagrant up This step downloads the preconfigured VM image. Create the local VM To create the local VM containing a 5-node Service Fabric cluster.4 or later) VirtualBox NOTE You need to use mutually supported versions of Vagrant and VirtualBox. To run a local Service Fabric cluster.168. boot it locally.com/azure/service-fabric-linux-vagrant-onebox. Before you get started. you need: Vagrant (v1.50. Clone the Vagrantfile repo git clone https://github. perform the following steps: 1. and then set up a local Service Fabric .8. See the Vagrant documentation for the full list of configuration options. Vagrant might behave erratically on an unsupported VirtualBox version.git This steps bring downs the file Vagrantfile containing the VM configuration along with the location the VM is downloaded from.50 enabling passthrough of traffic from the Mac host You can change either of these settings or add other configuration to the VM in the Vagrantfile . (Optional) Modify the default VM settings By default. 4. Prerequisites Service Fabric does not run natively on OS X. the local VM is configured as follows: 3 GB of memory allocated Private host network configured at IP 192. This article covers how to set up your Mac for development. Navigate to the local clone of the repo cd service-fabric-linux-vagrant-onebox 3. we provide a pre-configured Ubuntu virtual machine using Vagrant and VirtualBox. Set up your development environment on Mac OS X 4/10/2017 • 2 min to read • Edit Online You can build Service Fabric applications to run on Linux clusters using Mac OS X. 2. you can download it using wget or curl or through a browser by navigating to the link specified by config. building. Test that the cluster has been set up correctly by navigating to Service Fabric Explorer at http://192.50. you see a message in the output indicating that the cluster is starting up. edit Vagrantfile to point to the local path where you downloaded the image.box.168.vm. 
1.box_url to that path. For example if you downloaded the image to /home/users/test/azureservicefabric. and deploying Java services. You should expect it to take a few minutes. You can follow the installation steps mentioned in this general documentation about installing or updating Service Fabric Eclipse plugin.vm. After downloading it locally.50:19080/Explorer (assuming you kept the default private network IP). TIP If the VM download is taking a long time.tp8. If setup completes successfully.box_url in the file Vagrantfile . then set config. . cluster in it. Install the Service Fabric plugin for Eclipse Neon Service Fabric provides a plugin for the Eclipse Neon for Java IDE that can simplify the process of creating. is mostly same as the general documentation. the eclipse project needs to be created in a shared path. the contents at the path on your host where the Vagrantfile exists. and deploying Service Fabric Java application using vagrant-guest container on a Mac host. By default. Next steps Create and deploy your first Service Fabric Java application on Linux using Yeoman Create and deploy your first Service Fabric Java application on Linux using Service Fabric Plugin for Eclipse Create a Service Fabric cluster in the Azure portal Create a Service Fabric cluster using the Azure Resource Manager Understand the Service Fabric application model . ~/home/john/allprojects/ . say. apart from the following items: Since the Service Fabric libraries are required by your Service Fabric Java application. building. then you need to create your Service Fabric project MyActor in location ~/home/john/allprojects/MyActor and the path to your eclipse workspace would be ~/home/john/allprojects . The steps for creating. If you have the Vagrantfile in a path. is shared with the /vagrant path on the guest.Using Service Fabric Eclipse plugin on Mac Ensure you have gone through the steps mentioned in the Service Fabric Eclipse plugin documentation. 2. Launch Visual Studio as an administrator. Create an application project. Name the application and click OK. using the New Project wizard. deploying. Click File > New Project > Cloud > Service Fabric Application. Video walkthrough The following video walks through the steps in this tutorial: Create the application A Service Fabric application can contain one or more services. and debugging Service Fabric applications. along with your first service project. make sure that you have set up your development environment. . 3. 1. each with a specific role in delivering the application's functionality. This topic walks you through the process of creating your first application in Visual Studio 2017 or Visual Studio 2015. Prerequisites Before you get started. You can also add more services later if you want. Create your first Azure Service Fabric application 3/7/2017 • 5 min to read • Edit Online The Service Fabric SDK includes an add-in for Visual Studio that provides templates and tools for creating. Name it and click OK.4. Visual Studio creates the application project and the stateful service project and displays them in Solution . see Service Fabric programming model overview. choose Stateful as the first service type to include in your application. On the next page. NOTE For more information about the options. The script can also be invoked directly at the command line. In addition. Once the application starts. it references a set of service projects. try running it. Visual Studio uses the script behind-the-scenes. 
The application project does not contain any code directly. Explorer. Deploy and debug the application Now that you have an application. as Visual Studio is creating a local cluster for development. it contains three other types of content: Publish profiles: Used to manage tooling preferences for different environments. Application definition: Includes the application manifest under ApplicationPackageRoot. see Getting started with Reliable Services. A local cluster runs the same platform code that you build on in a multi-machine cluster. where you . Press F5 in Visual Studio to deploy the application for debugging. 1. you get a notification from the local cluster system tray manager application included with the SDK. When the cluster is ready. Associated application parameter files are under ApplicationParameters. which define the application and allow you to configure it specifically for a given environment. Instead. NOTE Deploying takes a while the first time. Scripts: Includes a PowerShell script for deploying/upgrading your application. For an overview of the contents of the service project. 2. just on a single machine. The cluster creation status displays in the Visual Studio output window. Visual Studio automatically brings up the Diagnostics Event Viewer. To simulate the loss of a machine while exercising the Visual Studio debugger at the same time. It mimics a five-node cluster. Expand one of the events to see more details. it is _Node_2. MyStatefulService) and set a breakpoint on the first line of the RunAsync method. the messages simply show the counter value incrementing in the RunAsync method of MyStatefulService. including the node where the code is running. For more information. NOTE The application diagnostic events emitted by the project template use the included ServiceEventSource class. can see trace output from the service. Find the class in your service project that derives from StatefulService (for example. In this case. let's take down one of the nodes on the local cluster. see How to monitor and diagnose services locally. 3. where nodes are on distinct machines. The local cluster contains five nodes hosted on a single machine. . In the case of the stateful service template. though it may differ on your machine.cs. 4. you should see your breakpoint hit in Visual Studio as the computation you were doing on one node seamlessly fails over to another. see Visualizing your cluster. Service Fabric Explorer offers a visual representation of a cluster--including the set of applications deployed to it and the set of physical nodes that make it up. Or. The counter has continued incrementing. expand Cluster > Nodes and find the node where your code is running.5. 6. 7. Click Actions > Deactivate (Restart) to simulate a machine restarting. Return to the Diagnostic Events Viewer and observe the messages. To learn more about Service Fabric Explorer. deactivate the node from the node list view in the left pane. In the left pane. right-click the Local Cluster Manager system tray app and choose Manage Local Cluster. even though the events are actually coming from a different node. To launch Service Fabric Explorer.) Momentarily. . 8. Deploying an application to the five-node development cluster can take some time. Run the cluster setup script from the SDK folder: & "$ENV:ProgramFiles\Microsoft SDKs\Service Fabric\ClusterSetup\DevClusterSetup. 2. You can also change the cluster mode using PowerShell: 1. 
right-click on the Local Cluster Manager in the system tray and select Switch Cluster Mode -> 1 Node.ps1" - CreateOneNodeCluster Cluster setup takes a few moments. which is useful for debugging services deployed across multiple nodes. If you want to iterate code changes quickly. without running your app on five nodes. you should see output similar to: . the local development cluster is configured to run as a five-node cluster. To run your code on a cluster with one node. Launch a new PowerShell window as an administrator.Switch cluster mode By default. After setup is finished. The development cluster resets when you change cluster mode and all applications provisioned or running on the cluster are removed. switch the development cluster to one-node mode. however. Next steps Learn how to create a cluster in Azure or a standalone cluster on Windows. You have several options to manage the cluster: 1. click Stop Local Cluster in the system tray app. Try deploying a Windows container or an existing app as a guest executable. configure monitoring and health reports. Walk through a hands-on-lab and create a stateless service. Learn how to expose your services to the Internet with a web service front end. Try creating a service using the Reliable Services or Reliable Actors programming models. The cluster continues to run in the background. it's important to remember that the local cluster is real. Delete the cluster only if you don't intend to use the local cluster for some time or if you need to reclaim resources. 2. To delete the cluster entirely. and perform an application upgrade. however.Cleaning up Before wrapping up. Learn about Service Fabric support options . Stopping the debugger removes your application instance and unregisters the application type. To shut down the cluster but keep the application data and traces. This option will result in another slow deployment the next time you press F5 in Visual Studio. click Remove Local Cluster in the system tray app. Let's use Yeoman to create an application with a single service. Name your application. we create an application for Linux and build a service using Java.NET Core and Java. see Deploy an existing executable to Azure Service Fabric and Deploy containers to Service Fabric. any applications including Java applications can be run as guest executables or inside containers on Windows or Linux. In a terminal. However. For more information. The Service Fabric SDK for Linux includes a Yeoman generator that makes it easy to create your first service and to add more later. If you are using Mac OS X. Video tutorial The following Microsoft Virtual Academy video walks you through the process of creating a Java app on Linux: Prerequisites Before you get started. For the purposes of this tutorial. Create your first Azure Service Fabric application 3/23/2017 • 3 min to read • Edit Online Service Fabric provides SDKs for building services on Linux in both . 3. we choose a Reliable Actor Service. Choose the type of your first service and name it. Create the application A Service Fabric application can contain one or more services. NOTE Java as a first class built-in programming language is supported for the Linux preview only (Windows support is planned). In this tutorial. you can set up a Linux one-box environment in a virtual machine using Vagrant. make sure that you have set up your Linux development environment. . each with a specific role in delivering the application's functionality. type yo azuresfjava . 2. 1. . 
you can deploy it to the local cluster using the Azure CLI. and create an instance of the application. Connect to the local Service Fabric cluster. cd myapp gradle Deploy the application Once the application is built. register the application type. azure servicefabric cluster connect 2. NOTE For more information about the options. see Service Fabric programming model overview. Build the application The Service Fabric Yeoman templates include a build script for Gradle. 1. which you can use to build the app from the terminal. Use the install script provided in the template to copy the application package to the cluster's image store. Also./testclient. 1. In Service Fabric Explorer. Run the script using the watch utility to see the output of the actor service. Please refer to the detailed documentation . This action restarts one of the five nodes in your local cluster and force a failover to one of the secondary replicas running on another node. Open a browser and navigate to Service Fabric Explorer at http://localhost:19080/Explorer (replace localhost with the private IP of the VM if using Vagrant on Mac OS X). Click the node you found in the previous step. Start the test client and perform a failover Actor projects do not do anything on their own./install. Expand the Applications node and note that there is now an entry for your application type and another for the first instance of that type. They require another service or client to send them messages. 3. locate node hosting the primary replica for the actor service. it is node 3. 4.sh 3. choose the Eclipse IDE for Java developers. build and deploy Service Fabric Java application using Eclipse. then select Deactivate (restart) from the Actions menu. Service Fabric currently supports the plugin for Eclipse Neon. The actor template includes a simple test script that you can use to interact with the actor service. When installing Eclipse.sh 2. As you perform this action. . Create and deploy an application with the Eclipse Neon plugin Service Fabric also gives you the provision to create. cd myactorsvcTestClient watch -n 1 . In the screenshot below. pay attention to the output from the test client and note that the counter continues to increment despite the failover.Create and deploy your first Service Fabric Java application using Service Fabric Plugin for Eclipse on Linux Adding more services to an existing application Using command line utility . 2. if MyApplication is the application created by Yeoman. cd ~/YeomanSamples/MyApplication . For example. Next steps Create and deploy your first Service Fabric Java application using Service Fabric Plugin for Eclipse on Linux Learn more about Reliable Actors Interacting with Service Fabric clusters using the Azure CLI Troubleshooting deployment Learn about Service Fabric support options . Run yo azuresfjava:AddService Using Service Fabric Eclipse plugin for Java on Linux To add service to an existing application created using Eclipse plugin for Service Fabric refer to documentation here.To add another service to an application already created using yo . perform the following steps: 1. Change directory to the root of the existing application. Create the application A Service Fabric application can contain one or more services. Prerequisites Before you get started. NOTE For more information about the options. . type the following command to start building the scaffolding: yo azuresfcsharp 2. see Service Fabric programming model overview. Choose the type of your first service and name it. 
Name your application. 3. make sure that you have set up your Linux development environment. you can set up a Linux one-box environment in a virtual machine using Vagrant. we choose a Reliable Actor Service. For the purposes of this tutorial. 1. we look at how to create an application for Linux and build a service using C# (. In a terminal. Create your first Azure Service Fabric application 3/3/2017 • 2 min to read • Edit Online Service Fabric provides SDKs for building services on Linux in both . The Service Fabric SDK for Linux includes a Yeoman generator that makes it easy to create your first service and to add more later.NET Core and Java. each with a specific role in delivering the application's functionality. If you are using Mac OS X. In this tutorial.NET Core). Let's use Yeoman to create an application with a single service. cd myapp . 1. . it is node 3. Connect to the local Service Fabric cluster. locate node hosting the primary replica for the actor service./testclient. 1. cd myactorsvcTestClient watch -n 1 . They require another service or client to send them messages. register the application type. Open a browser and navigate to Service Fabric Explorer at http://localhost:19080/Explorer (replace localhost with the private IP of the VM if using Vagrant on Mac OS X)./build. and create an instance of the application.sh Deploy the application Once the application is built. Start the test client and perform a failover Actor projects do not do anything on their own. Expand the Applications node and note that there is now an entry for your application type and another for the first instance of that type. The actor template includes a simple test script that you can use to interact with the actor service. In Service Fabric Explorer.sh 3. Run the script using the watch utility to see the output of the actor service. 4. In the screenshot below. you can deploy it to the local cluster using the Azure CLI./install.sh 2. Use the install script provided in the template to copy the application package to the cluster's image store. .Build the application The Service Fabric Yeoman templates include a build script that you can use to build the app from the terminal (after navigating to the application folder). azure servicefabric cluster connect 2. As you perform this action. perform the following steps: 1. Adding more services to an existing application To add another service to an application already created using yo . cd ~/YeomanSamples/MyApplication . 2. if MyApplication is the application created by Yeoman. then select Deactivate (restart) from the Actions menu. This action restarts one node in your local cluster forcing a failover to a secondary replica running on another node. For example. Run yo azuresfcsharp:AddService Next steps Learn more about Reliable Actors Interacting with Service Fabric clusters using the Azure CLI Learn about Service Fabric support options .3. Change directory to the root of the existing application. pay attention to the output from the test client and note that the counter continues to increment despite the failover. Click the node you found in the previous step. Run the cluster setup script from the SDK folder: & "$ENV:ProgramFiles\Microsoft SDKs\Service Fabric\ClusterSetup\DevClusterSetup. NOTE This article assumes that you already set up your development environment. you can skip this section. Typically. deploy an existing application to it. and then upgrade that application to a new version. all from Windows PowerShell. you create a local cluster. 
It runs the same platform code that is found on multi-machine clusters. NOTE If you have already created a local cluster by deploying an application from Visual Studio. 2. In this tutorial. Create a local cluster A Service Fabric cluster represents a set of hardware resources that you can deploy applications to. we use the PowerShell script. It is important to understand that the Service Fabric local cluster is not an emulator or simulator. Launch a new PowerShell window as an administrator. In this article. However. The only difference is that it runs the platform processes that are normally spread across five machines on one machine.ps1" Cluster setup takes a few moments. a cluster is made up of anywhere from five to many thousands of machines. The SDK provides two ways to set up a local cluster: a Windows PowerShell script and the Local Cluster Manager system tray app. Get started with deploying and upgrading applications on your local cluster 4/7/2017 • 7 min to read • Edit Online The Azure Service Fabric SDK includes a full local development environment that you can use to quickly get started with deploying and managing applications on a local cluster. you should see output similar to: . the Service Fabric SDK includes a cluster configuration that can run on a single machine. 1. After setup is finished. and upgrade. You are now ready to try deploying an application to your cluster. If you are interested in learning how to create applications in Visual Studio. Import the Service Fabric SDK PowerShell module. 2. such as C:\ServiceFabric.zip extension. Connect to the local cluster: Connect-ServiceFabricCluster localhost:19000 6. Create a directory to store the application that you download and deploy. Change the file extension to .sfpkg. Note: the Microsoft Edge browser saves the file with a . 5. Import-Module "$ENV:ProgramFiles\Microsoft SDKs\Service Fabric\Tools\PSModule\ServiceFabricSDK\ServiceFabricSDK.psm1" 3. Create a new application using the SDK's deployment command with a name and a path to the application package. 1. monitoring. see Create your first Service Fabric application in Visual Studio. In this tutorial. Deploy an application The Service Fabric SDK includes a rich set of frameworks and developer tooling for creating applications. . mkdir c:\ServiceFabric\ cd c:\ServiceFabric\ 4. you use an existing sample application (called WordCount) so that you can focus on the management aspects of the platform: deployment. Launch a new PowerShell window as an administrator. Download the WordCount application to the location you created. It includes client-side JavaScript code to generate random five- character "words". You should see: The WordCount application is simple.NET Web API. They are partitioned based on the first character of the word. which are then relayed to the application via ASP. So words beginning with A through G are stored in the first partition.sfpkg - ApplicationName "fabric:/WordCount" If all goes well. 1.html. Publish-NewServiceFabricApplication -ApplicationPackagePath c:\ServiceFabric\WordCountV1. The application that we deployed contains four partitions. Query all deployed applications on the cluster: . You can find the source code for the WordCount app in the classic getting started samples. words beginning with H through N are stored in the second partition. you should see the following output: 7. launch the browser and navigate to http://localhost:8081/wordcount/index. To see the application in action. 
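The exact partition scheme of the WordCount sample is defined in its service manifest and is not reproduced here. Purely as an illustration of how a client can target one of the four partitions based on a word's first character, the following sketch computes a hypothetical Int64 partition key and resolves that partition's current endpoint through the Naming Service; the key mapping and service URI are assumptions, not the sample's actual logic.

```csharp
using System;
using System.Fabric;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Services.Client;

public static class PartitionLookupExample
{
    // Illustrative only: maps a word to one of four ranged (Int64) partitions by its
    // first character, then resolves that partition's current endpoint.
    public static async Task ResolveWordPartitionAsync(string word)
    {
        long partitionKey = char.ToUpperInvariant(word[0]) % 4;   // hypothetical key scheme
        ServicePartitionResolver resolver = ServicePartitionResolver.GetDefault();

        ResolvedServicePartition partition = await resolver.ResolveAsync(
            new Uri("fabric:/WordCount/WordCountService"),
            new ServicePartitionKey(partitionKey),
            CancellationToken.None);

        Console.WriteLine($"'{word}' -> partition key {partitionKey}, endpoint: {partition.GetEndpoint().Address}");
    }
}
```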
View application details and status Now that we have deployed the application. let's look at some of the app details in PowerShell. and so on. A stateful service tracks the number of words counted. Finally. look at the list of partitions for WordCountService: Get-ServiceFabricPartition 'fabric:/WordCount/WordCountService' . Get-ServiceFabricService -ApplicationName 'fabric:/WordCount' The application is made up of two services--the web front end and the stateful service that manages the words. 3. Go to the next level by querying the set of services that are included in the WordCount application. Get-ServiceFabricApplication Assuming that you have only deployed the WordCount app. you see something similar to: 2. The set of commands that you used. . you can use the web-based Service Fabric Explorer tool by navigating to http://localhost:19080/Explorer in the browser. like all Service Fabric PowerShell commands. are available for any cluster that you might connect to. local or remote. For a more visual way to interact with the cluster. Then begin upgrading the fabric:/WordCount application. 1. notice that the WordCountService version changed but the WordCountWebService version did not: . While the upgrade is proceeding. Launch a browser window and navigate to http://localhost:19080/Explorer. you see the status of the upgrade as it proceeds through the cluster's upgrade domains. we see two changes in the application's behavior. If you rerun the earlier query for the set of services in the fabric:/WordCount application.sfpkg - ApplicationName "fabric:/WordCount" -UpgradeParameters @{"FailureAction"="Rollback". "Force"=$true} You should see the following output in PowerShell as the upgrade begins. the rate at which the count grows should slow. Download the WordCount version 2 package to the same location where you downloaded the version 1 package. Perform an upgrade of the WordCount application. then choose WordCount. you may find it easier to monitor its status from Service Fabric Explorer. since fewer words are being counted. Second. health checks are performed to ensure that the application is behaving properly. "Monitored"=$true. 4. First. its count should eventually start to outpace the others. since the first partition has two vowels (A and E) and all other partitions contain only one each. As the upgrade proceeds through each domain. Upgrade an application Service Fabric provides no-downtime upgrades by monitoring the health of the application as it rolls out across the cluster. Publish-UpgradedServiceFabricApplication -ApplicationPackagePath C:\ServiceFabric\WordCountV2. 3. and finally fabric:/WordCount. In the essentials tab. The new version of the application now counts only words that begin with a vowel. 2. Return to your PowerShell window and use the SDK's upgrade command to register the new version in the cluster. see Visualizing your cluster with Service Fabric Explorer. As the upgrade rolls out. Expand Applications in the tree on the left. "UpgradeReplicaSetCheckTimeout"=1. NOTE To learn more about Service Fabric Explorer. You have several options to manage applications and the cluster: 1. Finally. the count progresses more slowly. It touches only the set of services (or code/configuration packages within those services) that have changed. return to the browser to observe the behavior of the new application version. Depending on the nature of your apps. which makes the process of upgrading faster and more reliable. 
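If you prefer to follow the rollout from PowerShell instead of (or in addition to) Service Fabric Explorer, the Get-ServiceFabricApplicationUpgrade cmdlet reports the current upgrade domain and the overall upgrade state. For example:

# Show the progress of the in-flight upgrade for the WordCount application.
Get-ServiceFabricApplicationUpgrade -ApplicationName 'fabric:/WordCount'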
Applications continue to run in the background until you remove them. Get-ServiceFabricService -ApplicationName 'fabric:/WordCount' This example highlights how Service Fabric manages application upgrades. Cleaning up Before wrapping up. As expected. it's important to remember that the local cluster is real. run the following command: . and the first partition ends up with slightly more of the volume. To remove an individual application and all it's data. a running app can take up significant resources on your machine. 5. 2. from the cluster's image store. One-node and five-node cluster mode When developing applications.0. To shut down the cluster but keep the application data and traces. Deletion removes the application packages. select Switch Cluster Mode in the Service Fabric Local Cluster Manager.0 and 2.0. To help optimize this process.0. . delete the application from the Service Fabric Explorer ACTIONS menu or the context menu in the left- hand application list view. in Service Fabric Explorer. and debugging.0 Remove-ServiceFabricApplicationType -ApplicationTypeName WordCount -ApplicationTypeVersion 1. The local development cluster runs the same platform code that is found on multi-machine clusters. To delete the cluster entirely. you often find yourself doing quick iterations of writing code. One-node cluster mode is optimized to do quick deployment and registration of services. including the code and configuration. 4. The data stored in the cluster is deleted when you change cluster mode. click Remove Local Cluster in the system tray app. Remove-ServiceFabricApplicationType -ApplicationTypeName WordCount -ApplicationTypeVersion 2. to help you quickly validate code using the Service Fabric runtime. choose Unprovision Type for the application. WARNING When you change the cluster mode. unregister versions 1.0 Or. changing code. 3.0 of the WordCount application type. Both cluster modes have their benefits. click Stop Local Cluster in the system tray app. Unpublish-ServiceFabricApplication -ApplicationName "fabric:/WordCount" Or. work with more instances and replicas of your services. the local cluster can run in two modes: one-node or five-node. This option will result in another slow deployment the next time you press F5 in Visual Studio. You can test failover scenarios. Five-node cluster mode enables you to work with a real cluster. Remove the local cluster only if you don't intend to use it for some time or if you need to reclaim resources. Neither one-node cluster or five-node cluster modes are an emulator or simulator.0. After deleting the application from the cluster. the current cluster is removed from your system and a new cluster is created. debugging. To change the mode to one-node cluster. . See the upgrade documentation to learn more about the power and flexibility of Service Fabric upgrades. Launch a new PowerShell window as an administrator. After setup is finished. The upgrade that we performed in this article was basic. 2. Run the cluster setup script from the SDK folder: & "$ENV:ProgramFiles\Microsoft SDKs\Service Fabric\ClusterSetup\DevClusterSetup.ps1" - CreateOneNodeCluster Cluster setup takes a few moments. you can try building your own in Visual Studio. All the actions performed on the local cluster in this article can be performed on an Azure cluster as well.Or. change the cluster mode using PowerShell: 1. you should see output similar to: Next steps Now that you have deployed and upgraded some pre-built applications. Fill out the Service Fabric Basics form.azure. 
Click the New button found on the upper left-hand corner of the Azure portal. through the Azure portal in just a few minutes. create a free account before you begin. . A resource group is a logical container into which Azure resources are created and collectively managed. The user name and password entered here is used to log in to the virtual machine. This quickstart helps you to create a five-node cluster. select the version of Windows or Linux you want the cluster nodes to run. Log in to Azure Log in to the Azure portal at http://portal. When complete. For Operating system. Create your first Service Fabric cluster on Azure 4/18/2017 • 5 min to read • Edit Online A Service Fabric cluster is a network-connected set of virtual or physical machines into which your microservices are deployed and managed. click OK. Create the cluster 1. For Resource group.com. If you don't have an Azure subscription. 2. create a new one. running on either Windows or Linux. Select Compute from the New blade and then select Service Fabric Cluster from the Compute blade. 3. 6. Node types define the VM size. For this quickstart. and capacity properties. however. Custom endpoints open up ports in the Azure load balancer so that you can connect with applications running on the cluster. set Diagnostics to On. have different sets of ports open. In the Cluster configuration form. which is used for customizing TCP/HTTP management endpoints. or primary. Enter "80. Select OK. For Node type count. number of VMs. custom endpoints. you do not need to enter any . Each node type can be scaled up or down independently.4. Do not check the Configure advanced settings box. and other settings for the VMs of that type. and can have different capacity metrics. For any production deployment. Fill out the Cluster configuration form. Each node type defined is set up as a separate virtual machine scale set. application port ranges. node type is where Service Fabric system services are hosted and must have five or more VMs. placement constraints." 5. The first. enter "1" and the Durability tier to "Bronze. which is used to deploy and managed virtual machines as a set. 8172" to open up ports 80 and 8172. Select "Silver" for the reliability tier and an initial virtual machine scale set capacity of 5. you aren't running applications so select a DS1_v2 Standard VM size. Select Configure each node type and fill out the Node type configuration form. For this quick start. capacity planning is an important step. Select Create to create the cluster. select Download template and parameters. To enable user authentication using Azure Active Directory or to set up certificates for application security. In Fabric version. you see Deploying Service Fabric Cluster pinned to the Start board. however. View cluster status . 7. You can see the creation progress in the notifications. since anyone can anonymously connect to an unsecure cluster and perform management operations. It is highly recommended to create a secure cluster for production workloads. Fill out the Security form. Review the summary. 8. see Service Fabric cluster security scenarios. create a cluster from a Resource Manager template. (Click the "Bell" icon near the status bar at the upper right of your screen. Set the mode to Manual if you want to choose a supported version to upgrade to. For this quick start select Unsecure. Select OK. select Automatic upgrade mode so that Microsoft automatically updates the version of the fabric code running the cluster.) 
If you clicked Pin to Startboard while creating the cluster. Certificates are used in Service Fabric to provide authentication and encryption to secure various aspects of a cluster and its applications. If you'd like to download a Resource Manager template built from the settings you entered. fabric setting properties. Select OK. For more information on how certificates are used in Service Fabric. including a summary of application and node health. You can also enter the address directly into the browser: http://quickstartcluster. You can now see the details of your cluster in the dashboard. The ServiceFabric PowerShell module is installed .westus. Access it using a web browser by clicking the Service Fabric Explorer link of the cluster Overview page in the portal. you can inspect your cluster in the Overview blade in the portal. including the cluster's public endpoint and a link to Service Fabric Explorer. Connect to the cluster using PowerShell Verify that the cluster is running by connecting using PowerShell. The node view shows the physical layout of the cluster.Once your cluster is created. For a given node.azure. Service Fabric Explorer is a service that runs in the cluster. Visualize the cluster using Service Fabric explorer Service Fabric Explorer is a good tool for visualizing your cluster and managing applications. you can inspect which applications have code deployed on that node.cloudapp.com:19080/Explorer The cluster dashboard provides an overview of your cluster. ----------.216.0 1 Up 00:59:04 00:00:00 Ok _nodetype1_4 10.216.with the Service Fabric SDK.5.216.0 1 Up 00:59:04 00:00:00 Ok _nodetype1_0 10.5.5.8 nodetype1 5. In the Resource Group Essentials page.0.0.0 1 Up 00:59:04 00:00:00 Ok _nodetype1_1 10.0.5.---------.6 nodetype1 5.---------. 3.--- --------. Navigate to the Service Fabric cluster you want to delete. Connect-ServiceFabricCluster -ConnectionEndpoint localhost:19000 See Connect to a secure cluster for other examples of connecting to a cluster. PS C:\> Get-ServiceFabricNode |Format-Table NodeDeactivationInfo NodeName IpAddressOrFQDN NodeType CodeVersion ConfigVersion NodeStatus NodeUpTime NodeDownTime HealthState -------------------.7 nodetype1 5. For other ways to delete a cluster or to delete some (but not all) the resources in a resource group.0. So to completely delete a Service Fabric cluster you also need to delete all the resources it is made of.0 1 Up 00:59:04 00:00:00 Ok _nodetype1_3 10. The Connect-ServiceFabricCluster cmdlet establishes a connection to the cluster. HealthState should be OK for each node. use the Get-ServiceFabricNode cmdlet to display a list of nodes in the cluster and status information for each node. click Delete and follow the instructions on that page to complete the deletion of the resource group.4 nodetype1 5.216.-------.0.------------.216.0.5. Click the Resource Group name on the cluster essentials page.0. After connecting to the cluster. 2.-------. see Delete a cluster Delete a resource group in the Azure portal: 1.----------- _nodetype1_2 10. --------------.0 1 Up 00:59:04 00:00:00 Ok Remove the cluster A Service Fabric cluster is made up of other Azure resources in addition to the cluster resource itself. The simplest way to delete the cluster and all it's resources is to delete the resource group. .0.0.0.5 nodetype1 5. Next steps Now that you have set up a development standalone cluster. try the following: Create a secure cluster in the portal Create a cluster from a template Deploy apps using PowerShell . 
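If you would rather script the cleanup than click through the portal, the resource group (and everything in it, including the cluster) can be removed with a single command. This sketch assumes the AzureRM PowerShell module is installed and that you are signed in; the resource group name below is a placeholder to replace with your own.

# Delete the resource group that contains the cluster and all of its related resources.
# -Force skips the confirmation prompt.
Login-AzureRmAccount
Remove-AzureRmResourceGroup -Name "myResourceGroup" -Force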
The properties section defines the security. on-premises or in the cloud. Validate the environment The TestConfiguration. Unzip the setup package to a folder on the computer or virtual machine where you are setting up the development cluster. Anyone can connect anonymously and perform management operations.Unsecure. reliability level. fault domain.DevCluster. and upgrade domain. The contents of the setup package are described in detail here.ps1 script in the standalone package is used as a best practices analyzer to validate whether a cluster can be deployed on a given environment.509 certificates or Windows security. but look through the config file and get familiar with the settings.ps1 script from an administrator PowerShell session: .\ClusterConfig.\CreateServiceFabricCluster. run the CreateServiceFabricCluster. This cluster is unsecure. Other config files describe single or multi-machine clusters secured with X. Download the setup package. Read Secure a cluster to learn more about Service Fabric cluster security.509 certificates or Windows security.ps1 -ClusterConfigFilePath . This quickstart helps you to create a development standalone cluster in just a few minutes.DevCluster. and types of nodes for the cluster. IP address.DevCluster. you have a three-node cluster running on a single computer that you can deploy apps to.\ClusterConfig. Deployment preparation lists the pre-requisites and environment requirements. Run the script to verify if you can create the development cluster: .ps1 -ClusterConfigFilePath . Before you begin Service Fabric provides a setup package to create Service Fabric standalone clusters. You cannot install Service Fabric on a domain controller. Create your first Service Fabric standalone cluster 4/17/2017 • 3 min to read • Edit Online You can create a Service Fabric standalone cluster on any virtual machines or computers running Windows Server 2012 R2 or Windows Server 2016. When you're finished. node type.Unsecure. The nodes section describes the three nodes in the cluster: name. Security is only configured at cluster creation time and it is not possible to enable security after the cluster is created.\TestConfiguration. You don't need to modify any of the default config settings for this tutorial.Unsecure.json Create the cluster Several sample cluster configuration files are installed with the setup package. The cluster administrator deploying and configuring the cluster must have administrator privileges on the computer. ClusterConfig. three-node cluster running on a single computer. so production clusters should always be secured using X. Connect to the cluster .json is the simplest cluster configuration: an unsecure. To create the three-node development cluster. diagnostics collection.json -AcceptEULA The Service Fabric runtime package is automatically downloaded and installed at time of cluster creation. -------.216.0 0 Up 03:00:07 00:00:00 Ok vm1 localhost NodeType1 5.Your three-node development cluster is now running.----------.Azure. --------------.5.---------. which you access using a browser by navigating to http://localhost:19080/Explorer. including a summary of application and node health. The Connect-ServiceFabricCluster cmdlet establishes a connection to the cluster. The node view shows the physical layout of the cluster.5.-------. you can inspect which applications have code deployed on that node.216.------------. run the RemoveServiceFabricCluster. PS C:\temp\Microsoft.ServiceFabric. HealthState should be OK for each node. 
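A brief aside on connecting: the examples that follow use localhost, but the same cmdlet works from any machine that has the Service Fabric runtime installed, as long as you point it at one of the cluster's nodes. The IP address below is an assumed placeholder, and 19000 is the default client connection port.

# Connect to an unsecured standalone cluster from a remote machine (placeholder IP).
Connect-ServiceFabricCluster -ConnectionEndpoint "10.0.0.4:19000"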
The ServiceFabric PowerShell module is installed with the runtime.WindowsServer> Get-ServiceFabricNode |Format-Table NodeDeactivationInfo NodeName IpAddressOrFQDN NodeType CodeVersion ConfigVersion NodeStatus NodeUpTime NodeDownTime HealthState -------------------.---------.----------- vm2 localhost NodeType2 5.0 0 Up 03:00:01 00:00:00 Ok Visualize the cluster using Service Fabric explorer Service Fabric Explorer is a good tool for visualizing your cluster and managing applications. Remove the cluster To remove a cluster. After connecting to the cluster.ps1 PowerShell script from the package folder and pass in the path to the JSON configuration file.5. You can optionally specify a location for the log of the deletion.------ -----. use the Get-ServiceFabricNode cmdlet to display a list of nodes in the cluster and status information for each node. Connect-ServiceFabricCluster -ConnectionEndpoint localhost:19000 See Connect to a secure cluster for other examples of connecting to a cluster. . You can verify that the cluster is running from the same computer or from a remote computer with the Service Fabric runtime. For a given node. Service Fabric Explorer is a service that runs in the cluster.216. The cluster dashboard provides an overview of your cluster.0 0 Up 03:00:02 00:00:00 Ok vm0 localhost NodeType0 5. DevCluster.ps1 -ClusterConfigFilePath . .\ClusterConfig. . try the following articles: Set up a multi-machine standalone cluster and enable security.ps1 Next steps Now that you have set up a development standalone cluster. run the following PowerShell script from the package folder. # Removes Service Fabric from the current computer. # Removes Service Fabric cluster nodes from each computer in the configuration file.json -Force To remove the Service Fabric runtime from the computer. Deploy apps using PowerShell .\RemoveServiceFabricCluster.Unsecure.\CleanFabric. Look at continuous delivery for various workloads. Find out how to design. VIDEO POWERPOINT DECK Cluster Planning and Management Hyper-scale web Review concepts around hyper-scale web. and more. IoT at scale. and then learn how to optimize resources for your application. hyper-scale. reusable patterns. Get an overview of Service Fabric and then dive deep into topics that cover cluster optimization and security. and operate your microservices on Service Fabric using best practices and proven. and state management. and cluster security. including availability and reliability. and learn about choosing platform as a service (PaaS) over infrastructure as a service (IaaS). VIDEO POWERPOINT DECK Introduction to Service Fabric Cluster planning and management Learn about capacity planning. hosting game engines. Service Fabric patterns and scenarios 3/8/2017 • 1 min to read • Edit Online If you’re looking at building large-scale microservices using Azure Service Fabric. develop. in this look at Azure Service Fabric. migrating legacy apps. Get started with proper architecture. and even get the details on Linux support and containers. Introduction Explore best practices. learn from the experts who designed and built this platform as a service (PaaS). cluster optimization. Get the details on following proven application design principles. . The Service Fabric Patterns and Practices course answers the questions most often asked by real-world customers about Service Fabric scenarios and application areas. interactive games. and service package/share. multi-environment setup. VIDEO POWERPOINT DECK IoT Gaming Look at turn-based games. 
VIDEO POWERPOINT DECK Hyper-scale web IoT Explore the Internet of Things (IoT) in the context of Azure Service Fabric. and IoT at scale. and hosting existing game engines. VIDEO POWERPOINT DECK . build/package/publish workflow. multi- tenancy. including the Azure IoT pipeline. including continuous integration/continuous delivery with Visual Studio Team Services. VIDEO POWERPOINT DECK Gaming Continuous delivery Explore concepts. Plus. and deploy containers. in addition to migration of legacy apps. "Why containers?" Learn about the preview for Windows containers. read more about how to create and manage clusters. set up continuous delivery. . and Linux containers orchestration. Continuous delivery Migration Learn about migrating from a cloud service. migrate Cloud Services apps to Service Fabric. Linux supports. find out how to migrate . VIDEO POWERPOINT DECK Containers and Linux support Next steps Now that you've learned about Service Fabric patterns and scenarios. VIDEO POWERPOINT DECK Migration Containers and Linux support Get the answer to the question.NET Core apps to Linux. Each component in this hierarchical application model can be versioned and upgraded independently. Classes (or "types") of applications and services are described through XML files (application manifests and service manifests) that are the templates against which applications can be instantiated from the cluster's image store. The code for different application instances will run as separate processes even when hosted by the same Service Fabric node. To simplify the diagram. the lifecycle of each application instance can be managed (i.xml file is installed with the Service Fabric SDK and tools to C:\Program Files\Microsoft SDKs\Service Fabric\schemas\ServiceFabricServiceModel. and packages.e. but the core functionality remains the same. though each service type would include some or all of those package types. configuration. configuration. The schema definition for the ServiceManifest. Model an application in Service Fabric 3/27/2017 • 7 min to read • Edit Online This article provides an overview of the Azure Service Fabric application model and how to define an application and service via manifest files. The categorization can have different settings and configurations. which in turn are composed of code. A service performs a complete and standalone function (it can start and run independently of other services) and is composed of code. and data. configuration consists of service settings that can be loaded at run time. . code consists of the executable binaries. A service type is a categorization of a service. Furthermore. An application type is a categorization of an application and consists of a bundle of service types. only the code/config/data packages for ServiceType4 are shown.xsd. and data consists of arbitrary static data to be consumed by the service. For each service. The instances of a service are the different service configuration variations of the same service type.xml and ApplicationManifest. Understand the application model An application is a collection of constituent services that perform a certain function or functions. The following diagram shows how application types are composed of service types. upgraded) independently. stateful service instances. This replication essentially provides redundancy for the service to be available even if one node in a cluster fails. health properties. it describes the code. configuration. see Visualizing your cluster with Service Fabric Explorer. 
These are covered in detail in the ensuing sections. For more details.Two different manifest files are used to describe applications and services: the service manifest and application manifest. It specifies service metadata such as service type. load-balancing metrics. Describe a service The service manifest declaratively defines the service type and version. partitions. achieve high reliability by replicating state between replicas located on different nodes in the cluster. and data packages that compose a service package to support one or more service types. service binaries. or replicas. TIP You can view the layout of applications in a cluster using the Service Fabric Explorer tool available at http://<yourclusteraddress>:19080/Explorer. A partitioned service further divides its state (and access patterns to that state) across nodes in the cluster. and replicas. There can be one or more instances of a service type active in the cluster. For example. and configuration files. Here is a simple example service manifest: . The following diagram shows the relationship between applications and service instances. Put another way. The presence of a separate setup entry point avoids having to run the service host with high privileges for extended periods of time. they are all activated whenever the system looks for any one of the declared service types.0" encoding="utf-8" ?> <ServiceManifest Name="MyServiceManifest" Version="SvcManifestVersion1" xmlns="http://schemas. <?xml version="1. Note that service types are declared at the manifest level and not the code package level.com/2011/01/fabric" xmlns:xsi="http://www. DataPackage declares a folder.exe needs some environment variables configured for deploying a node.exe</Program> </ExeHost> </EntryPoint> <EnvironmentVariables> <EnvironmentVariable Name="MyEnvVariable" Value=""/> <EnvironmentVariable Name="HttpGatewayPort" Value="19080"/> </EnvironmentVariables> </CodePackage> <ConfigPackage Name="MyConfig" Version="ConfigVersion1" /> <DataPackage Name="MyData" Version="DataVersion1" /> </ServiceManifest> Version attributes are unstructured strings and not parsed by the system. The executable specified by EntryPoint is typically the long- running service host. that contains arbitrary static data to be consumed by the process at run time.js application.org/2001/XMLSchema-instance"> <Description>An example service manifest</Description> <ServiceTypes> <StatelessServiceType ServiceTypeName="MyServiceType" /> </ServiceTypes> <CodePackage Name="MyCode" Version="CodeVersion1"> <SetupEntryPoint> <ExeHost> <Program>MySetup.xml to provide different values for different service instances. npm. named by the Name attribute. ServiceTypes declares what service types are supported by CodePackages in this manifest. The executable specified by EntryPoint is run after SetupEntryPoint exits successfully. So when there are multiple code packages. Typical scenarios for using SetupEntryPoint are when you need to run an executable before the service starts or you need to perform an operation with elevated privileges. These can be overridden in the ApplicationManifest. .bat</Program> </ExeHost> </SetupEntryPoint> <EntryPoint> <ExeHost> <Program>MyServiceHost. For example. The resulting process is monitored and restarted (beginning again with SetupEntryPoint) if it ever terminates or crashes.w3. 
For more details on how to configure the SetupEntryPoint see Configure the policy for a service setup entry point EnvironmentVariables provides a list of environment variables that are set for this code package. These are used to version each component for upgrades. This is not limited to only executables written via the Service Fabric programming models. Setting up access control by installing security certificates. The resulting processes are expected to register the supported service types at run time. all code packages declared in this manifest are activated by running their entry points. When a service is instantiated against one of these service types. For example: Setting up and initializing environment variables that the service executable needs.microsoft. SetupEntryPoint is a privileged entry point that runs with the same credentials as Service Fabric (typically the LocalSystem account) before any other entry point. w3. that contains a Settings. key-value pair settings that the process can read back at run time. instance count/replication factor.xml file: <Settings xmlns:xsd="http://www. then the running process is not restarted. These are also used to version each component for upgrades. if only the ConfigPackage version has changed. Each of those can be versioned independently. Thus. security/isolation policy. Here is an example Settings. configuration. an application manifest describes elements at the application level and references one or more service manifests to compose an application type.0" encoding="utf-8" ?> <ApplicationManifest ApplicationTypeName="MyApplicationType" ApplicationTypeVersion="AppManifestVersion1" xmlns="http://schemas. It specifies service composition metadata such as stable names. Instead. a callback notifies the process that configuration settings have changed so they can be reloaded dynamically. named by the Name attribute. . This file contains sections of user-defined. Imported service manifests determine what service types are valid within this application type.microsoft. During an upgrade. configuration overrides. Within the ServiceManifestImport you can override configuration values in Settings. and data packages. Here is a simple example application manifest: <?xml version="1. and constituent service types.org/2001/XMLSchema-instance"> <Description>An example application manifest</Description> <ServiceManifestImport> <ServiceManifestRef ServiceManifestName="MyServiceManifest" ServiceManifestVersion="SvcManifestVersion1"/> <ConfigOverrides/> <EnvironmentOverrides CodePackageRef="MyCode"/> </ServiceManifestImport> <DefaultServices> <Service Name="MyService"> <StatelessService ServiceTypeName="MyServiceType" InstanceCount="1"> <SingletonPartition/> </StatelessService> </Service> </DefaultServices> </ApplicationManifest> Like service manifests. partitioning scheme.com/2011/01/fabric" xmlns:xsi="http://www.w3.microsoft.xml and environment variables in ServiceManifest.ConfigPackage declares a folder.org/2001/XMLSchema- instance" xmlns="http://schemas. placement constraints. The load- balancing domains into which the application is placed are also described. Describe an application The application manifest declaratively describes the application type and version.com/2011/01/fabric"> <Section Name="MyConfigurationSection"> <Parameter Name="MySettingA" Value="Example1" /> <Parameter Name="MySettingB" Value="Example2" /> </Section> </Settings> NOTE A service manifest can contain multiple code.xml files.w3. 
ServiceManifestImport contains references to service manifests that compose this application type.xml file. Version attributes are unstructured strings and are not parsed by the system.org/2001/XMLSchema" xmlns:xsi="http://www. Configure security policies for your application describes how to run services under security policies to restrict access. They are upgraded along with any other services in the application instance and can be removed as well. see Managing application parameters for multiple environments. Default services are just a convenience and behave like normal services in every respect after they have been created.DefaultServices declares service instances that are automatically created whenever an application is instantiated against this application type. . NOTE An application manifest can contain multiple service manifest imports and default services. Managing application parameters for multiple environments describes how to configure parameters and environment variables for different application instances. Each service manifest import can be versioned independently. To learn how to maintain different application and service parameters for individual environments. Next steps Package an application and get it ready to deploy. Deploy and remove applications describes how to use PowerShell to manage application instances. To create a package. For example: Setting up and initializing environment variables that the service executable needs. Use SetupEntryPoint Typical scenarios for using SetupEntryPoint are when you need to run an executable before the service starts or you need to perform an operation with elevated privileges.xml │ └───MyData init. right-click the application project in Solution Explorer and choose the Package command. npm.xml │ ├───MyCode │ MyServiceHost. Setting up access control by installing security certificates. The example manifests in this article would need to be organized in the following directory structure: PS D:\temp> tree /f . For more details on how to configure the SetupEntryPoint.exe needs some environment variables configured for deploying a node.dat The folders are named to match the Name attributes of each corresponding element. This is not limited to only executables written via the Service Fabric programming models.js application. then two folders with the same names would contain the necessary binaries for each code package. Package an application 4/13/2017 • 5 min to read • Edit Online This article describes how to package a Service Fabric application and make it ready for deployment. For example. For example.exe │ ├───MyConfig │ Settings. as shown below: . you can use the Package command to automatically create a package that matches the layout described above. see Configure the policy for a service setup entry point Configure Build a package by using Visual Studio If you use Visual Studio 2015 to create your application.\MyApplicationType D:\TEMP\MYAPPLICATIONTYPE │ ApplicationManifest.xml │ └───MyServiceManifest │ ServiceManifest. Package layout The application manifest. service manifest(s). if the service manifest contained two code packages with the names MyCodeA and MyCodeB. and other necessary package files must be organized in a specific layout for deployment into a Service Fabric cluster. exe .bat is not found. PS D:\temp> Test-ServiceFabricApplicationPackage .bat file referenced in the service manifest SetupEntryPoint is missing from the code package.When packaging is complete. 
The packaging step occurs automatically when you deploy or debug your application in Visual Studio. this is what Visual Studio is running so the output is the same. This command checks for manifest parsing issues and verify all references.xml This error shows that the MySetup. After the missing file is added.sfproj /t:Package Test the package You can verify the package structure locally through PowerShell by using the Test-ServiceFabricApplicationPackage command. Under the hood. you can find the location of the package in the Output window. This command only verifies the structural correctness of the directories and files in the package. D:\Temp> msbuild HelloWorld.\MyApplicationType False Test-ServiceFabricApplicationPackage : The EntryPoint MySetup. Build a package by command line It is also possible to programmatically package up your application using msbuild.e2h\MyApplicationType\MySe rviceManifest\ServiceManifest. FileName: C:\Users\servicefabric\AppData\Local\Temp\TestApplicationPackage_7195781181\nrri205a. It doesn't verify any of the code or data package contents beyond checking that all necessary files are present. the application verification passes: . but registering and un-registering the application type is faster. You can compress a package by running the Powershell command Copy-ServiceFabricApplicationPackage with CompressPackage switch. Compress a package When a package is large or has many files.\MyApplicationType D:\TEMP\MYAPPLICATIONTYPE │ ApplicationManifest. using UncompressPackage switch. PS D:\temp> tree /f . Zipping the manifests would make these operations inefficient. as needed. The compression replaces the valid Service Fabric package with the compressed version. For example. the package is also validated against previous versions of the application that are already running in the cluster. Running compression on an already compressed package yields no changes. If you know the cluster where the application will be deployed. Compression reduces the number of files and the package size. The application manifest and the service manifests are not zipped. Uploading the application package may take longer than uploading the uncompressed package.xml │ └───MyServiceManifest │ ServiceManifest. evaluate based on the size and the number of files if compression is needed.xml │ ├───MyCode │ MyServiceHost.bat │ ├───MyConfig │ Settings.xml │ └───MyData init. because they are needed for many internal operations (like package sharing. application type name and version extraction for certain validations). Once the application is packaged correctly and passes validation. In this case. you can pass them in Test-ServiceFabricApplicationPackage for proper validation. the validation can detect whether a package with the same version but different content was already deployed. . You can uncompress the package with the same command. The deploy mechanism is same for compressed and uncompressed packages. you can compress it for faster deployment. The folder must allow write permissions. it is recommended you pass in the image store connection string.dat PS D:\temp> Test-ServiceFabricApplicationPackage . If the package is compressed.\MyApplicationType True PS D:\temp> If your application has application parameters defined.exe │ MySetup. The following command compresses the package without copying it to the image store. it is stored as such in the cluster image store and it's uncompressed on the node before the application is run. 
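Beyond the basic structural check, when you already know the target cluster you can make validation stricter by passing the cluster's image store connection string, and you can supply application parameters as well. The invocation below is a hedged example; the ServiceInstanceCount parameter name is illustrative and not defined by this sample.

# Validate the package against a specific cluster's image store, with application parameters.
Test-ServiceFabricApplicationPackage -ApplicationPackagePath .\MyApplicationType `
    -ApplicationParameter @{ "ServiceInstanceCount" = "3" } `
    -ImageStoreConnectionString "fabric:ImageStore"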
You can copy a compressed package to one or more Service Fabric clusters. using Copy- ServiceFabricApplicationPackage without the SkipCopy flag. The package now includes zipped files for the code . config and data packages. exe │ MySetup.\MyApplicationType - ApplicationPackagePathInImageStore MyApplicationType -ImageStoreConnectionString fabric:ImageStore - CompressPackage -TimeoutSec 5400 Internally.\MyApplicationType D:\TEMP\MYAPPLICATIONTYPE │ ApplicationManifest. you must update the versions to avoid the checksum mismatch. and compressed (if needed). just reference it from the service manifest. config and data packages to avoid checksum mismatch. Similarly. Service Fabric computes checksums for the application packages for validation. PS D:\temp> tree /f . so it is ready for deployment to one or more Service Fabric clusters. If the package is large. When using compression.zip Alternatively.zip MyData. If you copied an uncompressed version of your application package.dat PS D:\temp> Copy-ServiceFabricApplicationPackage -ApplicationPackagePath . and you want to use compression for the same package. validated. do not include the unchanged package. you must change the versions of the code .\MyApplicationType -CompressPackage - SkipCopy PS D:\temp> tree /f . the checksums are computed on the zipped versions of each package.xml │ └───MyServiceManifest │ ServiceManifest. Next steps Deploy and remove applications describes how to use PowerShell to manage application instances Managing application parameters for multiple environments describes how to configure parameters and environment variables for different application instances.xml │ └───MyData init.\MyApplicationType D:\TEMP\MYAPPLICATIONTYPE │ ApplicationManifest. If the packages are unchanged.bat │ ├───MyConfig │ Settings. instead of changing the version. The package is now packaged correctly. you can use diff provisioning. PS D:\temp> Copy-ServiceFabricApplicationPackage -ApplicationPackagePath .xml │ ├───MyCode │ MyServiceHost. provide a high enough timeout to allow time for both the package compression and the upload to the cluster. Configure security policies for your application describes how to run services under security policies to restrict . if you uploaded a compressed version of the package and you want to use an uncompressed package.xml MyCode. you can compress and copy the package with Copy-ServiceFabricApplicationPackage in one step.xml │ └───MyServiceManifest ServiceManifest. With this option.zip MyConfig. access. . all managed automatically by Service Fabric. so you can take your existing applications and host them on a Service Fabric cluster. such as ASP. such as Azure DB or Azure Table Storage. However. they do not benefit from the full set of features the platform offers. exclusive to Service Fabric. similar to most service platforms. Reliable Services can be stateless.NET MVC or Web API. The Reliable Actor framework provides built-in communication for actors and pre-set state persistence and scale-out configurations. Reliable Services Reliable Services is a light-weight framework for writing services that integrate with the Service Fabric platform and benefit from the full set of platform features. and can be used to host any other application framework. Learn more about Reliable Services or get started by writing your first Reliable Service. The application framework is minimal. because guest executables do not integrate directly with Service Fabric APIs. written in any language. based on the actor design pattern. 
in which each instance of the service is created equal and state is persisted in an external solution. ensuring it stays up and running according to the service description. or services can simply be any compiled executable program written in any language and simply hosted on a Service Fabric cluster. Services can choose to use the Service Fabric APIs to take full advantage of the platform's features and application frameworks. service endpoint registration. where state is persisted directly in the service itself using Reliable Collections. Guest Executable A guest executable is an arbitrary executable. Reliable Services provide a minimal set of APIs that allow the Service Fabric runtime to manage the lifecycle of your services and that allow your services to interact with the runtime. Reliable Services can also be stateful. A guest executable can be packaged in an application and hosted alongside other services. Get started with guest executables by deploying your first guest executable application. the Reliable Actor framework is an application framework that implements the Virtual Actor pattern. Reliable Actors Built on top of Reliable Services. such as web servers or Worker Roles in Azure Cloud Services. Service Fabric handles orchestration and simple execution management of the executable. and stateful compute. Service Fabric programming model overview 3/30/2017 • 2 min to read • Edit Online Service Fabric offers multiple ways to write and manage your services. State is made highly-available through replication and distributed through partitioning. such as custom health and load reporting. giving you full control over design and implementation choices. The Reliable Actor framework uses independent units of compute and state with single-threaded execution called actors. As Reliable Actors itself is an application framework built on Reliable Services. Next steps Learn more about Reliable Actors or get started by writing your first Reliable Actor service Learn more about Containerizing your services in Windows or Linux . it is fully integrated with the Service Fabric platform and benefits from the full set of features offered by the platform. Externalization of state is typically done by using an external database or store. Consider a calculator service. it also has a method for returning the last sum it has computed. Service state 1/24/2017 • 1 min to read • Edit Online Service state refers to the data that the service requires to function. In Azure Service Fabric. It includes the data structures and variables that the service reads and writes to do work. for example. This service takes two numbers and returns their sum. The second service is called a stateful service. This is a purely stateless service that has no data associated with it. State can also be co-located with the code that manipulates this code. see the following articles: Availability of Service Fabric services Scalability of Service Fabric services Partitioning Service Fabric services Service Fabric Reliable Services . Next steps For more information on Service Fabric concepts. Storing service state State can be either externalized or co-located with the code that is manipulating the state.it contains some state that it writes to (when it computes a new sum) and reads from (when it returns the last computed sum). Every request to compute the sum performs an update on this row. In our calculator example. This service is now stateful . but in addition to computing sum. Now consider the same calculator. 
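To make the Reliable Actor model concrete, here is a minimal hedged sketch of an actor contract and its implementation in C#. The ICounterActor and CounterActor names and the "count" state key are illustrative, not part of any sample referenced above.

using System.Threading.Tasks;
using Microsoft.ServiceFabric.Actors;
using Microsoft.ServiceFabric.Actors.Runtime;

// Actor interfaces derive from IActor; clients call them through an actor proxy.
public interface ICounterActor : IActor
{
    Task IncrementAsync();
    Task<int> GetCountAsync();
}

// Persisted state is replicated for high availability by the platform.
[StatePersistence(StatePersistence.Persisted)]
internal class CounterActor : Actor, ICounterActor
{
    public CounterActor(ActorService actorService, ActorId actorId)
        : base(actorService, actorId) { }

    public Task IncrementAsync() =>
        StateManager.AddOrUpdateStateAsync("count", 1, (key, value) => value + 1);

    public Task<int> GetCountAsync() =>
        StateManager.GetOrAddStateAsync("count", 0);
}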
the first service is called a stateless service. Stateful services in Service Fabric can be built using this model. this could be a SQL database in which the current result is stored in a table. Service Fabric provides the infrastructure to ensure that this state is highly available and fault tolerant in the event of a failure. sys kernel driver in Windows. This can happen for various . Your service will listen on a normal IP:port address using any addressing scheme. When your service is opened by Service Fabric. Connect and communicate with services in Service Fabric 4/19/2017 • 7 min to read • Edit Online In Service Fabric. Service discovery and resolution In a distributed system. using whatever protocol or communication stack you want. This document discusses how to set up communication with and between your services in Service Fabric. A Service Fabric application is generally composed of many different services. Multiple service instances or replicas may share a host process. This includes communication. such as a URI. This Microsoft Virtual Academy video also discusses service communication: Bring your own protocol Service Fabric helps manage the lifecycle of your services but it does not make decisions about what your services do. such as rendering different parts of a web application. a service runs somewhere in a Service Fabric cluster. These services may communicate with each other to form a complete function. services may move from one machine to another over time. such as the http. either by the service owner. typically distributed across multiple VMs. It can be moved from one place to another. that's your service's opportunity to set up an endpoint for incoming requests. Services are not statically tied to a particular machine or address. where each service performs a specialized task. In either case. in which case they will either need to use different ports or use a port-sharing mechanism. There are also client applications that connect to and communicate with services. or automatically by Service Fabric. each service instance or replica in a host process must be uniquely addressable. Resolving and connecting to services involves the following steps run in a loop: Resolve: Get the endpoint that a service has published from the Naming Service. however. and this cycle is repeated until the connection succeeds. This means service endpoint addresses change as the service moves to nodes with different IP addresses. In some environments. In these cases. "fabric:/MyApplication/MyService" . the preceding resolve and connect steps need to be retried. In that case. failovers. The name of the service does not change over the lifetime of the service. Retry: A connection attempt may fail for any number of reasons. but extra steps must be taken to allow external clients to connect to services. Service Fabric in Azure A Service Fabric cluster in Azure is placed behind an Azure Load Balancer. Service Fabric provides a discovery and resolution service called the Naming Service. or scale-out. for example if the service has moved since the last time the endpoint address was resolved. The Naming Service maintains a table that maps named service instances to the endpoint addresses they listen on. upgrades. Connections from external clients Services connecting to each other inside a cluster generally can directly access the endpoints of other services because the nodes in a cluster are usually on the same local network. 
services can still communicate with each other and resolve addresses using the Naming Service. All named service instances in Service Fabric have unique names represented as URIs. This is analogous to websites that have constant URLs but where the IP address may change. it's only the endpoint addresses that can change when services move. Connect: Connect to the service over whatever protocol it uses on that endpoint. including resource balancing. Service Fabric has a registrar that maps service names to their endpoint address. a cluster may be behind a load balancer that routes external ingress traffic through a limited set of ports.reasons. and may open on different ports if the service uses a dynamically selected port. for example. All external traffic to the cluster must . And similar to DNS on the web. which resolves website URLs to IP addresses. For example. Write a service that listens on port 80. for example. Configure port 80 in the service's ServiceManifest. The Azure Load Balancer only knows about ports open on the nodes. a self-hosted web server. the following things must be configured: 1. <Resources> <Endpoints> <Endpoint Name="WebEndpoint" Protocol="http" Port="80" /> </Endpoints> </Resources> . in order to accept external traffic on port 80.pass through the load balancer. it does not know about ports open by individual services.xml and open a listener in the service. The load balancer will automatically forward traffic inbound on a given port to a random node that has the same port open. . public Task<string> OpenAsync(CancellationToken cancellationToken) { EndpointResourceDescription endpoint = serviceContext. string publishUri = uriPrefix.GetEndpoint("WebEndpoint"). } . } class WebService : StatelessService { .Start().Protocol}://+:{endpoint..IPAddressOrFQDN). protected override IEnumerable<ServiceInstanceListener> CreateServiceInstanceListeners() { return new[] { new ServiceInstanceListener(context => new HttpCommunicationListener(context))}..Replace("+". this.Prefixes. FabricRuntime.Port}/myapp/". this. string uriPrefix = $"{endpoint.CodePackageActivationContext..FromResult(publishUri).. return Task..httpListener. this.Add(uriPrefix).httpListener..httpListener = new HttpListener(). } .GetNodeContext(). } .. class HttpCommunicationListener : ICommunicationListener { . } . class HttpCommunicationlistener implements CommunicationListener { . @Override public CompletableFuture<String> openAsync(CancellationToken arg0) { EndpointResourceDescription endpoint = this. this. If you have more than one node type.completedFuture(publishUri). 0).format("http://%s:%d/".getNodeContext()... @Override protected List<ServiceInstanceListener> createServiceInstanceListeners() { <ServiceInstanceListener> listeners = new ArrayList<ServiceInstanceListener>().serviceContext.HttpServer. Create a Service Fabric Cluster in Azure and specify port 80 as a custom endpoint port for the node type that will host the service. try { HttpServer server = com.create(new InetSocketAddress(endpoint.httpserver. server.start().. return CompletableFuture. String publishUri = String. return listeners.net.serviceContext.getPort()). } } .. } catch (IOException e) { throw new RuntimeException(e). listeners.getIpAddressOrFQDN().getEndpoint("WebEndpoint").... } 2.add(new ServiceInstanceListener((context) -> new HttpCommunicationlistener(context))).getPort()). . 
you can set up a placement constraint on the service to ensure it only runs on the node type that has the custom endpoint port opened.getCodePackageActivationContext(). } class WebService extends StatelessService { .. endpoint.sun. If the probe fails to receive a response after a configured number of times. When creating a cluster through the Azure portal. . a probe is automatically set up for each custom endpoint port that was configured. Once the cluster has been created. this is set up automatically for each custom endpoint port that was configured. the load balancer stops sending traffic to that node. The Azure Load Balancer uses a probe to determine whether or not to send traffic to a particular node. 4. The probe periodically checks an endpoint on each node to determine whether or not the node is responding.3. When creating a cluster through the Azure portal. configure the Azure Load Balancer in the cluster's Resource Group to forward traffic on port 80. whether its a custom binary protocol over TCP sockets. and the programming language that your services are written in.NET Web API for C# applications. The Azure Load Balancer will always send traffic to nodes that respond to the probe. use the CommunicationClient and FabricServicePartitionClient classes. for service resolution. the communication framework. see this article about WCF-based implementation of the communication stack. and error handling. This is available for both C# and Java applications. not the services running on the nodes.It's important to remember that the Azure Load Balancer and the probe only know about the nodes. including ASP. For more details. Services can use any HTTP stack available. so care must be taken to ensure services are available on the nodes that are able to respond to the probe. HTTP connections. Using custom protocols and other communication frameworks Services can use any protocol or framework for communication. This is the easiest and fastest way to get started with service communication. WCF: If you have existing code that uses WCF as your communication framework. while all the work to discover and connect is abstracted from you. Service Fabric provides communication APIs that you can plug your communication stack into. Next steps . which allows strongly- typed remote procedure calls for Reliable Services and Reliable Actors. or streaming events through Azure Event Hubs or Azure IoT Hub. and retry loops. No specific protocol: If you don't have a particular choice of communication framework. but you want to get something up and running quickly. then the ideal option for you is service remoting. Built-in communication API options The Reliable Services framework ships with several pre-built communication options. HTTP: For language-agnostic communication. HTTP provides an industry-standard choice with tools and HTTP servers available in many different languages. Service remoting handles resolution of service addresses. all supported by Service Fabric. then you can use the WcfCommunicationListener for the server side and WcfCommunicationClient and ServicePartitionClient classes for the client. See this article about the Reliable Service communication model for more details. whereas for Java. This however is only available for C# applications on Windows based clusters. connection. Clients written in C# can leverage the ICommunicationClient and ServicePartitionClient classes. retry. The decision about which one will work best for you depends on the choice of the programming model. 
then get started quickly with service remoting or go in-depth to learn how to write a communication listener using Web API with OWIN self-host. .Learn more about the concepts and APIs available in the Reliable Services communication model. choose ASP.NET Core service to your application ASP.NET Core 3/31/2017 • 8 min to read • Edit Online By default. we will pick up where we left off in the Creating your first application in Visual Studio tutorial and add a web service in front of the stateful counter service.NET Core is a lightweight.NET Core tools for Visual Studio 2017. In Solution Explorer. If you have not already done so.NET Core tools for Visual Studio 2015 are no longer being updated. Build a web service front end for your application using ASP. you should go back and step through that tutorial first. 1. Let's add an ASP. On the Create a Service page. To expose your application's functionality to HTTP clients. In this tutorial. you will need to create a web project to act as an entry point and then communicate from there to your individual services. Add an ASP. The . Azure Service Fabric services do not provide a public interface to the web. . 2.NET Web API project to our existing application.NET Core and give it a name. right-click Services within the application project and choose Add > New Service Fabric Service. NOTE This tutorial is based on the ASP. cross-platform web development framework that you can use to create modern web UI and web APIs. we will choose Web API.NET Core project templates. Once your Web API project is created. For this tutorial.NET Core project outside of a Service Fabric application. you can apply the same concepts to building a full web application.3. Each can be independently versioned and upgraded. you will add more services in exactly the same way. The next page provides a set of ASP. with a small amount of additional code to register the service with the Service Fabric runtime. you will have two services in your application. As you continue to build your application. However. Note that these are the same choices that you would see if you created an ASP. . Add /api/values to the location in the browser. . including the ASP. For background on the options available and the tradeoffs involved.NET Core Web API template doesn't provide default behavior for the root. 2. we will have replaced these default values with the most recent counter value from our stateful service. you define an interface to act as the public contract for the service. Ensure that the . which by default is listening on port 8966. This will invoke the Get method on the ValuesController in the Web API template. In the ServiceProxy approach (modeled on remote procedure calls or RPCs).2. you use that interface to generate a proxy class for interacting with the service. Within a single application. right-click your solution and choose Add > New Project. other services that are accessible via an HTTP REST API. we will follow one of the simpler approaches and use the ServiceProxy / ServiceRemotingListener classes that are provided in the SDK. and still other services that are accessible via web sockets. so you will get an error in the browser. 1. It will return the default response that is provided by the template--a JSON array that contains two strings: By the end of the tutorial. Choose the Visual C# entry in the left navigation pane and then select the Class Library template. Then. 2.NET Web API service. 
Create the interface We will start by creating the interface to act as the contract between the stateful service and its clients.5. let's deploy the new application and take a look at the default behavior that the ASP. Visual Studio will launch the browser to the root of the ASP. you might have services that are accessible via TCP. TIP To learn more about building ASP. The ASP.NET Core services.NET Core Web API template provides. Connect the services Service Fabric provides complete flexibility in how you communicate with reliable services. When deployment is complete.NET Framework version is set to 4. see the ASP. 3. 1. Press F5 in Visual Studio to debug the app. see Communicating with services. In this tutorial.NET Core Documentation. Run the application To get a sense of what we've done.NET Core project. In Solution Explorer. Services package and install it. In the class library. Search for the Microsoft. 1.Interface { using Microsoft. add a reference to the class library project that contains the interface. namespace MyStatefulService. GetCountAsync .ServiceFabric.Services. and extend the interface from IService.ServiceFabric. it must derive from the IService interface. we need to implement it in the stateful service.Remoting. 4. This interface is included in one of the Service Fabric NuGet packages. . public interface ICounter: IService { Task<long> GetCountAsync(). To add the package. In your stateful service.3. right-click your new class library project and choose Manage NuGet Packages. } } Implement the interface in your stateful service Now that we have defined the interface. create an interface with a single method. In order for an interface to be usable by ServiceProxy . 5. ..StateManager.Interface.HasValue ? result. Locate the class that inherits from StatefulService .GetOrAddAsync<IReliableDictionary<string. using MyStatefulService. GetCountAsync . Service Fabric provides an overridable method called CreateServiceReplicaListeners . Now implement the single method that is defined in the ICounter interface. .. using (var tx = this.Value : 0. public class MyStatefulService : StatefulService. the final step in enabling the stateful service to be callable from other services is to open a communication channel. "Counter"). } } Expose the stateful service using a service remoting listener With the ICounter interface implemented..StateManager. return result. based on the type of communication that you want to enable to your service.2. .TryGetValueAsync(tx. and extend it to implement the ICounter interface.CreateTransaction()) { var result = await myDictionary. With this method. public async Task<long> GetCountAsync() { var myDictionary = await this. you can specify one or more communication listeners. For stateful services. } 3. long>>("myDictionary"). ICounter { // . such as MyStatefulService . using Microsoft. .Remoting. This will provide the ServiceProxy class.NET web service. You can use partitioning to scale stateful services by breaking up their state into different buckets. Any key that you provide will lead to the same partition. see How to partition Service Fabric Reliable Services. public async Task<IEnumerable<string>> Get() { ICounter counter = ServiceProxy. which creates an RPC endpoint that is callable from clients through ServiceProxy .ServiceFabric. you need to provide two pieces of information: a partition ID and the name of the service. add a reference to the class library that contains the ICounter interface.CreateServiceRemotingListener(context)) }.ServiceFabric. 
So all that remains is adding the code to communicate with it from the ASP.Services. just as you did for the class library project earlier. 3. . new ServicePartitionKey(0)). so the key doesn't matter. long count = await counter.Client. To create the ICounter proxy to the stateful service.Services package to the ASP.. Add the Microsoft. 2.Interface. NOTE The equivalent method for opening a communication channel to stateless services is called CreateServiceInstanceListeners . Replace this implementation with the following code: using MyStatefulService. To learn more about partitioning services. Note that the Get method currently just returns a hard-coded string array of "value1" and "value2"--which matches what we saw earlier in the browser. . In our trivial application. In the Controllers folder. open the ValuesController class.ServiceFabric. } Use the ServiceProxy class to interact with the service Our stateful service is now ready to receive traffic from other services..Remoting. such as a customer ID or postal code.Create<ICounter>(new Uri("fabric:/MyApplication/MyStatefulService").Runtime. return new string[] { count. In your ASP. } The first line of code is the key one.NET project. protected override IEnumerable<ServiceReplicaListener> CreateServiceReplicaListeners() { return new List<ServiceReplicaListener>() { new ServiceReplicaListener( (context) => this. the stateful service only has one partition.Services.. based on a key that you define. 1.. In this case.ToString() }.GetCountAsync(). using Microsoft. we will replace the existing CreateServiceReplicaListeners method and provide an instance of ServiceRemotingListener .NET project. Add the "api/values" path. However. You can use that interface to generate an actor proxy in the web project to communicate with the actor. you can change it in your service listener configuration. Press F5 again to run the modified application. you need to ensure that only one instance of the service is running.NET Core web server.UseWebListener() with return new WebHostBuilder(). Simply replace return new WebHostBuilder(). you can deploy exactly the same Service Fabric application to a multi-machine cluster that you deployed on your local cluster and be highly confident that it will work as you expect. Kestrel and WebListener The default ASP. however. Refresh the browser periodically to see the counter value update. Visual Studio automatically generates an interface project for you. In fact. All other configurations on the web host can remain the same. When it comes to web services. and you should see the current counter value returned. When your cluster sits behind a load balancer. Visual Studio will automatically launch the browser to the root of the web project. 4. when you run a web service locally. as it does in Azure. When you create an actor project. you can follow a very similar model to talk to actors. You can do this by setting the InstanceCount for the service to the special value of "-1".UseKestrel() . The ServiceProxy class also seamlessly handles the case where the machine that hosts the stateful service partition fails and another machine must be promoted to take its place. known as Kestrel. If you will not be serving direct internet traffic and would prefer to use Kestrel as your web server. you will run into conflicts from multiple processes that are listening on the same path and port. As before. is not currently supported for handling direct internet traffic. As a . it is somewhat simpler. 
you must ensure that your web services are deployed on every machine since the load balancer will simply round-robin traffic across the machines. How web services work on your local cluster In general. Service Fabric can uniquely identify the machine that requests should be sent to. This abstraction makes writing the client code to deal with other services significantly simpler. The service name is a URI of the form fabric:/<application_name>/<service_name>. So you do not need to do anything that is equivalent to establishing a ServiceRemotingListener like you did for the stateful service in this tutorial. Otherwise. Once we have the proxy. What about actors? This tutorial focused on adding a web front end that communicated with a stateful service. The communication channel is provided automatically. As a result. With these two pieces of information. the ASP. we simply invoke the GetCountAsync method and return its result. By contrast. there is one key nuance.NET templates for Service Fabric use WebListener by default. This is because your local cluster is simply a five-node configuration that is collapsed to a single machine. Next steps Create a cluster in Azure for deploying your application to the cloud Learn more about communicating with services Learn more about partitioning stateful services . the web service instance count should be set to "1" for local deployments. To learn how to configure different values for different environment.result. see Managing application parameters for multiple environments. Service replicas running on different cluster nodes can be assigned different port numbers. . <Resources> <Endpoints> <Endpoint Name="ServiceEndpoint1" Protocol="http"/> <Endpoint Name="ServiceEndpoint2" Protocol="http" Port="80"/> <Endpoint Name="ServiceEndpoint3" Protocol="https"/> </Endpoints> </Resources> Refer to Configuring stateful Reliable Services to read more about referencing endpoints from the config package settings file (settings. Specify resources in a service manifest 3/28/2017 • 3 min to read • Edit Online Overview The service manifest allows resources that are used by the service to be declared/changed without changing the compiled code. look at the endpoint ServiceEndpoint1 specified in the manifest snippet provided after this paragraph.xsd. services can also request a specific port in a resource. HTTP endpoints are automatically ACL'd by Service Fabric. while replicas of a service running on the same node share the port. The declaration of resources allows these resources to be changed at deployment time. The access to the resources that are specified in the service manifest can be controlled via the SecurityGroup in the application manifest. Example: specifying an HTTP endpoint for your service The following service manifest defines one TCP endpoint resource and two HTTP endpoint resources in the <Resources> element. meaning the service doesn't need to introduce a new configuration mechanism. Endpoints When an endpoint resource is defined in the service manifest. Service Fabric assigns ports from the reserved application port range when a port isn't specified explicitly. The service replicas can then use these ports as needed for replication and listening for client requests. For example.xml).xml file is installed with the Service Fabric SDK and tools to C:\Program Files\Microsoft SDKs\Service Fabric\schemas\ServiceFabricServiceModel. The schema definition for the ServiceManifest. Azure Service Fabric supports configuration of endpoint resources for the service. 
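To make the local-versus-cloud instance count concrete, here is a minimal sketch of how the value can be parameterized rather than hard-coded; the parameter, service, and type names (WebService_InstanceCount, WebService, WebServiceType) are assumptions for illustration only.

<!-- ApplicationManifest.xml: declare a parameter that defaults to -1 (one instance on every node). -->
<Parameters>
  <Parameter Name="WebService_InstanceCount" DefaultValue="-1" />
</Parameters>
<DefaultServices>
  <Service Name="WebService">
    <StatelessService ServiceTypeName="WebServiceType" InstanceCount="[WebService_InstanceCount]">
      <SingletonPartition />
    </StatelessService>
  </Service>
</DefaultServices>

<!-- A local application parameter file can then override the value to 1 for the local cluster. -->
<Parameters>
  <Parameter Name="WebService_InstanceCount" Value="1" />
</Parameters>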
Additionally. 0.This endpoint is used by the replicator for replicating the state of your service. --> <Endpoint Name="ServiceEndpoint1" Protocol="http"/> <Endpoint Name="ServiceEndpoint2" Protocol="http" Port="80"/> <Endpoint Name="ServiceEndpoint3" Protocol="https"/> <!-.0" /> <Resources> <Endpoints> <!-.0. This name must match the string used in the RegisterServiceType call in Program. --> <Endpoint Name="ReplicatorEndpoint" /> </Endpoints> </Resources> </ServiceManifest> Example: specifying an HTTPS endpoint for your service The HTTPS protocol provides server authentication and is also used for encrypting client-server communication. --> <CodePackage Name="Code" Version="1. --> <ConfigPackage Name="Config" Version="1.0" xmlns="http://schemas.com/2011/01/fabric" xmlns:xsd="http://www. Note that if your service is partitioned.exe</Program> </ExeHost> </EntryPoint> </CodePackage> <!-. If it is changed during upgrade.0. Here is an example ApplicationManifest that you need to set for HTTPS.xml file under the ConfigPackage. <?xml version="1. You can add more than one EndpointCertificate.This is the name of your ServiceType. this port is shared with replicas of different partitions that are placed in your code.org/2001/XMLSchema-instance"> <ServiceTypes> <!-.0"> <EntryPoint> <ExeHost> <Program>Stateful1. .This endpoint is used by the communication listener to obtain the port number on which to listen. The EndpointRef is a reference to EndpointResource in ServiceManifest.Config package is the contents of the Config directoy under PackageRoot that contains an independently updateable and versioned set of custom configuration settings for your service. The thumbprint for your certificate must be provided.cs.microsoft.w3. --> <StatefulServiceType ServiceTypeName="Stateful1Type" HasPersistedState="true" /> </ServiceTypes> <!-. specify the protocol in the Resources -> Endpoints -> Endpoint section of the service manifest. NOTE A service’s protocol cannot be changed during application upgrade. for which you set the HTTPS protocol.org/2001/XMLSchema" xmlns:xsi="http://www. To enable HTTPS on your Service Fabric service. it is a breaking change.w3. as shown earlier for the endpoint ServiceEndpoint3. This endpoint is configured through the ReplicatorSettings config section in the Settings.0" encoding="utf-8"?> <ServiceManifest Name="Stateful1Pkg" Version="1.Code package is your service executable. xml file.microsoft.The section below creates instances of service types when an instance of this application type is created. You can also create one or more instances of service type by using the Service Fabric PowerShell module.w3.0" /> <ConfigOverrides /> <Policies> <EndpointBindingPolicy CertificateRef="TestCert1" EndpointRef="ServiceEndpoint3"/> </Policies> </ServiceManifestImport> <DefaultServices> <!-.<?xml version="1. The attribute ServiceTypeName below must match the name defined in the imported ServiceManifest.com/2011/01/fabric" xmlns:xsd="http://www.0" encoding="utf-8"?> <ApplicationManifest ApplicationTypeName="Application1Type" ApplicationTypeVersion="1.org/2001/XMLSchema-instance"> <Parameters> <Parameter Name="Stateful1_MinReplicaSetSize" DefaultValue="2" /> <Parameter Name="Stateful1_PartitionCount" DefaultValue="1" /> <Parameter Name="Stateful1_TargetReplicaSetSize" DefaultValue="3" /> </Parameters> <!-.0" xmlns="http://schemas.org/2001/XMLSchema" xmlns:xsi="http://www. 
The ServiceManifestName and ServiceManifestVersion should match the Name and Version attributes of the ServiceManifest element defined in the ServiceManifest.Import the ServiceManifest from the ServicePackage.0.0. --> <Service Name="Stateful1"> <StatefulService ServiceTypeName="Stateful1Type" TargetReplicaSetSize="[Stateful1_TargetReplicaSetSize]" MinReplicaSetSize="[Stateful1_MinReplicaSetSize]"> <UniformInt64Partition PartitionCount="[Stateful1_PartitionCount]" LowKey="-9223372036854775808" HighKey="9223372036854775807" /> </StatefulService> </Service> </DefaultServices> <Certificates> <EndpointCertificate Name="TestCert1" X509FindValue="FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF FF F0" X509StoreName="MY" /> </Certificates> </ApplicationManifest> . --> <ServiceManifestImport> <ServiceManifestRef ServiceManifestName="Stateful1Pkg" ServiceManifestVersion="1.xml file.w3. b. Select the Service Fabric plug-in. we describe how to set up your Eclipse development environment to work with Azure Service Fabric. go to Help > Check for Updates. The plug-in can help simplify the process of building and deploying Java services.17 or a later version) installed: To check the versions of installed components. Ensure that you have the latest version of Eclipse Neon and the latest version of Buildship (1. in Eclipse Neon.0. Click Add. Service Fabric plug-in for Eclipse Java application development 4/10/2017 • 5 min to read • Edit Online Eclipse is one of the most widely used integrated development environments (IDEs) for Java developers. go to Help > Install New Software. and then click Next. In this article. . Learn how to install the Service Fabric plug-in. To check for and install updates for Eclipse Neon. create a Service Fabric application. and deploy your Service Fabric application to a local or remote Service Fabric cluster in Eclipse Neon. 1.windowsazure. a. Complete the installation steps. d. go to Help > Installation Details. in Eclipse Neon. and then accept the Microsoft Software License Terms. see Eclipse Buildship: Eclipse Plug-ins for Gradle. To update Buildship. enter http://dl. In the Work with box. 2. To install the Service Fabric plug-in. c.com/eclipse/servicefabric. Install or update the Service Fabric plug-in in Eclipse Neon You can install a Service Fabric plug-in in Eclipse. In Eclipse Neon. select Service Fabric. To speed up the process of checking for and installing a Service Fabric plug-in update. Clear the check boxes for all sites except for the one that points to the Service Fabric plug-in location (http://dl.If you already have the Service Fabric plug-in installed.windowsazure. it might be due to an Eclipse setting. . make sure that you have the latest version. and then click Next. go to File > New > Other. go to Available Software Sites. NOTE If installing or updating the Service Fabric plug-in is slow. Available updates will be installed. Select Service Fabric Project.com/eclipse/servicefabric). and then click Update. Create a Service Fabric application in Eclipse 1. 2. Eclipse collects metadata on all changes to update sites that are registered with your Eclipse instance. and then click Next. go to Help > Installation Details. In the list of installed plug-ins. Enter a name for your project. To check for available updates. Enter the service name and service details.​ 4. Container. select Service Template. . and then click Next. Stateless. and then click Finish. Select your service template type (Actor. In the list of templates.3. or Guest Binary). click Yes. 6. 
in the Open Associated Perspective dialog box.5. When you create your first Service Fabric project. Your new project looks like this: . Build and deploy a Service Fabric application in Eclipse 1. . Right-click your new Service Fabric application. and then select Service Fabric. and then select the application you want. edit Local. and publish your application: To deploy to your local cluster. In the submenu. The default is local.json Cloud. In the right pane. From this menu. Go to Run > Run Configurations. An alternate way to deploy your Service Fabric application is by using Eclipse run configurations. 5. 2. undeploy. on the Arguments tab. 4. Add a Service Fabric service to your Service Fabric application To add a Service Fabric service to an existing Service Fabric application. Click Apply.json as needed. for publishProfile. To do a clean build of the application. select local or cloud. To change the application. you also can deploy. select the ServiceFabricDeployer run configuration. and then click Service Fabric. click Rebuild Application. click Deploy Application. 6. click Build Application. In the Publish Application dialog box. To ensure that the proper information is populated in the publish profiles.json These JavaScript Object Notation (JSON) files store information (such as connection endpoints and security information) that is required to connect to your local or cloud (Azure) cluster. To clean the application of built artifacts. and then click Run. do the following steps: 1. Ensure that Working Directory points to the application you want to deploy. select cloud. 3. select a publish profile: Local. 1.json or Cloud. Under Gradle Project. To deploy to a remote or cloud cluster.2. 3. You can add or update endpoint details and security credentials. Right-click the project you want to add a service to. select the option you want: To build the application without cleaning. Your application builds and deploys within a few moments. click Clean Application. click the Workspace button. You can monitor the deployment status in Service Fabric Explorer. . Click Add Service Fabric Service.2. . Select the service template you want to add to your project. 3. and then click Next. and complete the set of steps to add a service to the project. and then click the Add Service button. your overall project structure looks similar to the following project: . as needed).4. Enter the service name (and other details. After the service is added. 5. The application upgrade takes a few minutes. Config. make any changes to your application. 3. use it to upgrade your application as needed. and then select Duplicate. Enter a new name for this configuration. You can monitor the application upgrade in Service Fabric Explorer. To upgrade your application by using Eclipse Neon. It also gets the latest updated application type version from the application manifest file. or Data. Now. 2. Right-click ServiceFabricDeployer. This process creates and saves a run configuration profile you can use at any time to upgrade your application. for example. you want to upgrade your application without interrupting availability. In the right panel. Also. say you created the App1 project by using the Service Fabric plug-in in Eclipse.0. click the small arrow to the left of Gradle Project. The application type is App1AppicationType.xml) with the updated version number for the application and the modified service. First. Update the modified service’s manifest file (ServiceManifest. Go to Run > Run Configurations. 
ServiceFabricUpgrader. you can create a duplicate run configuration profile. and the application version is 1.xml) with the updated versions for the service (and Code. Then. You deployed it by using the plug-in to create an application named fabric:/App1Application. modify the application’s manifest (ApplicationManifest. as relevant). 1. . In the left pane. and then rebuild the modified service. and then click Apply. change -Pconfig='deploy' to -Pconfig='upgrade'. on the Arguments tab.Upgrade your Service Fabric Java application For an upgrade scenario. or package. Application Debug Mode By default. Use Visual Studio to simplify writing and managing your Service Fabric applications 1/17/2017 • 4 min to read • Edit Online You can manage your Azure Service Fabric applications and services through Visual Studio. The upgrade process preserves any data that you entered in a previous . Removing any running application instances 5. choose Properties (or press the F4 key). Once you've set up your development environment. Uploading the application package to the image store 3. In that case. In the Properties window. On the application project's shortcut menu. Deploy your Service Fabric application By default. Creating the application package 2. all the application's data is removed. 1. or you can publish to a local or remote cluster by using the publish profile. set the Application Debug Mode property. While debugging locally. see Publish an application to a remote cluster by using Visual Studio. Registering the application type 4. The next F5 will treat the deployment as an upgrade by using unmonitored auto mode to quickly upgrade the application to a newer version with a date string appended. you can use Visual Studio to create Service Fabric applications. Visual Studio Service Fabric Tools provide a property called Application Debug Mode. pressing F5 will also deploy your application and attach the debugger to all application instances. and deploy applications in your local development cluster. 2. To set the Application Debug Mode property 1. which controls whether the F5 should uninstall the application. register. Visual Studio removes existing instances of your application type when you stop debugging or (if you deployed the app without attaching the debugger). you may want to keep data that you've already created when testing a new version of the application. deploying an application combines the following steps into one simple operation: 1. Creating a new application instance In Visual Studio. Auto Upgrade: The application continues to run when the debug session ends. For more information. rather than removed and redeployed. add services. You can use Ctrl+F5 to deploy an application without debugging. These are the Application Debug Mode options available. when you redeploy the application. keep the application running after a debug session ends or enable the application to be upgraded on subsequent debugging sessions. you want to keep the application running or you want subsequent debug sessions to upgrade the application. . please use the Preserve Data On Start property to achieve the same behavior. debug session. see Service Fabric application upgrade. . Add a service to your Service Fabric application You can add new services to your application to extend its functionality.1 of the Service Fabric Tools for Visual Studio. 3. add the service through the New Fabric Service.2 of the Service Fabric Tools for Visual Studio. 2. 
Remove Application causes the application to be removed when the debug session ends. For more information about upgrading applications and how you might perform an upgrade in a real environment. but it is tuned to optimize for performance rather than safety. Keep Application: The application keeps running in the cluster when the debug session ends. NOTE This property doesn't exist prior to version 1. Prior to 1. To ensure that the service is included in your application package. On the next F5 the application will be removed and the newly built application will be deployed to the cluster. menu item. The "Keep Application" option was introduced in version 1..1. For Auto Upgrade data is preserved by applying the application upgrade capabilities of Service Fabric. The service will be created and started the next time you deploy the application. .Select a Service Fabric project type to add to your application. See Choosing a framework for your service to help you decide which service type to use. The service references and a default service instance will be added to the application manifest. The new service will be added to your solution and existing application package. and specify a name for the service. For instance. you can delete applications and unprovision application types on local or remote clusters. Clicking Package from the Application context menu creates or updates the application package. Visual Studio sets up and manages the package in the application project's folder. You may want to do this if you deploy the application by using custom PowerShell scripts. you need to create an application package. and other necessary files in a specific layout. The package organizes the application manifest.Package your Service Fabric application To deploy the application and its services to a cluster. Remove applications and application types using Cloud Explorer You can perform basic cluster management operations from within Visual Studio using Cloud Explorer. service manifest(s). . which you can launch from the View menu. in the 'pkg' directory. TIP For richer cluster management functionality. see Visualizing your cluster with Service Fabric Explorer. Next steps Service Fabric application model Service Fabric application deployment Managing application parameters for multiple environments Debugging your Service Fabric application Visualizing your cluster by using Service Fabric Explorer . You can sign in to your Azure account and then select an existing cluster under your subscriptions. The Visual Studio Service Fabric tools support all authentication types for connecting to a cluster for publishing. Cluster connection types Two types of connections are supported by the Azure Service Fabric cluster: non-secure connections and x509 certificate-based secure connections. the connection type can’t be changed. The Select Service Fabric Cluster dialog box automatically validates the cluster connection. use the Publish Service Fabric Application dialog box to choose an Azure Service Fabric cluster by clicking the Select button in Connection endpoint section. See Setting up a Service Fabric cluster from the Azure portal for instructions on how to set up a secure Service Fabric cluster. (For Service Fabric clusters hosted on-premises. . Configure secure connections to a Service Fabric cluster from Visual Studio 1/24/2017 • 2 min to read • Edit Online Learn how to use Visual Studio to securely access an Azure Service Fabric cluster with access control policies configured. If validation passes. 
Once it's created. or your cluster is non-secure. Windows and dSTS authentications are also supported. Validation failures can be caused by network issues or by not having your system correctly configured to connect to a secure cluster.) You have to configure the cluster connection type when the cluster is being created. Configure cluster connections in publish profiles If you publish a Service Fabric project from Visual Studio. it means that your system has the correct certificates installed to connect to the cluster securely. Valid parameters are any that are accepted by the Connect- ServiceFabricCluster cmdlet. Next steps For more information about accessing Service Fabric clusters. The following is an example of connecting to a non-secure cluster: <ClusterConnectionParameters ConnectionEndpoint="mycluster. To do this. store location.cloudapp. double-click the .westus. be sure to note the certificate store name. You'll need to provide these values for the certificate's store name and store location. Choose the Publish.azure. 2. such as upgrade parameters and Application Parameter file location.To connect to a secure cluster 1.azure. See Setting up a Service Fabric cluster from the Azure portal for how to configure the server to grant access to a client.cloudapp. or use the PowerShell script Import- PfxCertificate to import the certificates. 3. [Optional]: You can edit the publish profile to specify a secure cluster connection. Since you're manually editing the Publish Profile XML file to specify the certificate information. If you’re publishing to a remote cluster. See How to: Retrieve the Thumbprint of a Certificate for more information. 4. Install the trusted certificate. command on the shortcut menu of the project to open the Publish Azure Application dialog box and then select the target cluster.pfx file.. See Connect-ServiceFabricCluster for a list of available parameters. and then publish your application from the Publish Service Fabric Application dialog box in Visual Studio.com:19000" /> Here’s an example for connecting to an x509 certificate-based secure cluster: <ClusterConnectionParameters ConnectionEndpoint="mycluster. The tool automatically resolves the connection and saves the secure connection parameters in the publish profile. you need to specify the appropriate parameters for that specific cluster. The certificate is usually shared as a Personal Information Exchange (.. Install the certificate to Cert:\LocalMachine\My. Make sure you can access one of the client certificates that the destination cluster trusts. It's OK to accept all default settings while importing the certificate. see Visualizing your cluster by using Service Fabric .pfx) file.westus.com:19000" X509Credential="true" ServerCertThumbprint="0123456789012345678901234567890123456789" FindType="FindByThumbprint" FindValue="9876543210987654321098765432109876543210" StoreLocation="CurrentUser" StoreName="My" /> 5. You can use the ClusterConnectionParameters parameters to specify the PowerShell parameters to use when connecting to the Service Fabric cluster. Edit any other necessary settings. and certificate thumbprint. .Explorer. Debug your Service Fabric application by using Visual Studio 3/8/2017 • 4 min to read • Edit Online Debug a local Service Fabric application You can save time and money by deploying and debugging your Azure Service Fabric application in a local computer development cluster. NOTE Visual Studio attaches to all instances of your application. 4. 
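If you prefer to script the certificate installation described above instead of double-clicking the .pfx file, a PowerShell sketch along the following lines can be used with the Import-PfxCertificate cmdlet mentioned earlier; the file path and password are placeholders, not values from this article.

# Import the client certificate into the local machine's personal store (run from an elevated prompt).
$password = ConvertTo-SecureString -String "<pfx-password>" -AsPlainText -Force
Import-PfxCertificate -FilePath .\mycluster-client.pfx -CertStoreLocation Cert:\LocalMachine\My -Password $password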
The Diagnostic Events window will automatically open so you can view diagnostic events in real-time. Visual Studio 2017 or Visual Studio 2015 can deploy the application to the local cluster and automatically connect the debugger to all instances of your application. breakpoints may get hit by multiple processes resulting in concurrent sessions. Start a local development cluster by following the steps in Setting up your Service Fabric development environment. Press F5 or click Debug > Start Debugging. by making each breakpoint conditional on the thread ID or by using diagnostic events. 1. 3. Try disabling the breakpoints after they're hit. 2. While you're stepping through code. . Set breakpoints in your code and step through the application by clicking commands in the Debug menu. pausing. 7. The Diagnostic Events window supports filtering. ServiceEventSource.Value. Under Service Fabric. You can also open the Diagnostic Events window in Cloud Explorer. The diagnostic events can be seen in the automatically generated ServiceEventSource.Current. 6. result.cs file and are called from application code.5. .ServiceMessage(this. and inspecting events in real-time. simply enable streaming traces on that specific service or application. If you want to filter your traces to a specific service or application. The filter is a simple string search of the event message. right-click any node and choose View Streaming Traces. including its contents.ToString()). "My ServiceMessage with a parameter {0}". Debug a remote Service Fabric application If your Service Fabric applications are running on a Service Fabric cluster in Azure. they still implement IEnumerable.NET 2. . This means that you can use the Results View in Visual Studio while debugging to see what you've stored inside. Even though Reliable Collections replicate across multiple nodes.0 and Azure SDK for .9. Debugging services is like debugging any other application. NOTE The feature requires Service Fabric SDK 2.8. Simply set a breakpoint anywhere in your code. directly from Visual Studio. you are able to remotely debug these. You will normally set Breakpoints through Visual Studio for easy debugging. as well as required network configurations. choose the process you want to debug. Right-click the cluster node in Cloud Explorer. because of the impact on the running applications. and click Attach . right-click and choose Enable Debugging This will kick-off the process of enabling the remote debugging extension on your cluster nodes. WARNING Remote debugging is meant for dev/test scenarios and not to be used in production environments. Navigate to your cluster in Cloud Explorer. 2. and choose Attach Debugger 3. In the Attach to process dialog. 1. the processing of that replica will still be part of the debug session. you can disable the remote debugging extension by right- clicking the cluster in Cloud Explorer and choose Disable Debugging . The debugger will attach to all nodes running the process. 4. In order to only catch relevant partitions or instances of a given service. Once you finish debugging your application. If the primary replica moves during the debug session. The name of the process you want to attach to. you can use conditional breakpoints to only break a specific partition or instance. NOTE Currently we do not support debugging a Service Fabric cluster with multiple instances of the same service executable name. all instances of the service on all nodes are part of the debug session. 
In the case where you are debugging a stateless service. equals the name of your service project assembly name. If you are debugging a stateful service. only the primary replica of any partition will be active and therefore caught by the debugger. This feature allows you to stream ETW trace events. This feature only supports clusters running in Azure. Navigate to your cluster in Cloud Explorer. because of the impact on the running applications. right-click and choose Enable Streaming Traces . you should rely on forwarding events using Azure Diagnostics.NET 2. NOTE This feature requires Service Fabric SDK 2.0 and Azure SDK for . In a production scenario. produced on a Service Fabric cluster node.Streaming traces from a remote cluster node You are also able to stream traces directly from a remote cluster node to Visual Studio. 1. WARNING Streaming traces is meant for dev/test scenarios and not to be used in production environments.9. as well as required network configurations. 2. simply type in the name of the application in the filter. Each nodes stream will show up in a dedicated window. This will kick-off the process of enabling the streaming traces extension on your cluster nodes. right-click the node you want to stream traces from and choose View Streaming Traces Repeat step 2 for as many nodes as you want to see traces from. Expand the Nodes element in Cloud Explorer. and your services. If you want to filter the events to only show a specific application. . You are now able to see the traces emitted by Service Fabric. . by right- clicking the cluster in Cloud Explorer and choose Disable Streaming Traces Next steps Test a Service Fabric service. you can disable remote streaming traces. Manage your Service Fabric applications in Visual Studio.3. Once you are done streaming traces from your cluster. Update the Application Manifest by setting the instance count or the replica count for the service that is being debugged to 1. In the Eclipse IDE. execute echo '/tmp/core_%e. If the application is crashing. select Run -> Debug Configurations -> Remote Java Application and input connection properties and set the properties as follows: Host: ipaddress Port: 8001 6. This setting avoids conflicts for the port that is used for debugging.address=8001.jar 3. you may also want to enable coredumps. For example. Debug your Java Service Fabric application using Eclipse 2/13/2017 • 1 min to read • Edit Online 1.%p' | sudo tee /proc/sys/kernel/core_pattern . Start a local development cluster by following the steps in Setting up your Service Fabric development environment. 4. Deploy the application. To enable unlimited coredumps.sh . 2. This file can be found at the following location: ApplicationName\ServiceNamePkg\Code\entrypoint. for stateless services. Monitor and diagnose services locally. Update entryPoint. Next steps Collect logs using Linux Azure Diagnostics.suspend=y - Djava.library. java -Xdebug -Xrunjdwp:transport=dt_socket. then coredumps are not enabled. so that it starts the java process with remote debug parameters.server=y. . set InstanceCount="1" and for stateful services set the target and min replica set sizes to 1 as follows: TargetReplicaSetSize="1" MinReplicaSetSize="1" . If you wanted to update the coredump generation path. You can also verify the status using the command ulimit -a .path=$LD_LIBRARY_PATH -jar myapp. execute the following command: ulimit -c unlimited . Set breakpoints at desired points and debug the application.sh of the service you wish to debug. 
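For reference, the instance-count and replica-set settings described in the application manifest step above might look like the following in ApplicationManifest.xml. This is only a sketch; the service and type names are assumed for illustration.

<!-- Stateless service under debug: a single instance avoids conflicts on the debug port. -->
<Service Name="MyStatelessService">
  <StatelessService ServiceTypeName="MyStatelessServiceType" InstanceCount="1">
    <SingletonPartition />
  </StatelessService>
</Service>
<!-- Stateful service under debug: a single replica. -->
<Service Name="MyStatefulService">
  <StatefulService ServiceTypeName="MyStatefulServiceType" TargetReplicaSetSize="1" MinReplicaSetSize="1">
    <UniformInt64Partition PartitionCount="1" LowKey="-9223372036854775808" HighKey="9223372036854775807" />
  </StatefulService>
</Service>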
Port 8001 is set for debugging in this example. 5. Execute ulimit -c in a shell and if it returns 0. It also helps you to more easily understand the sequences and interrelationships between your application code and events in the underlying system. This means you don't have to rewrite your tracing code when you are ready to deploy your code to a real cluster. . You can also filter the list of events by using the Filter events box at the top of the events window. choose Other Windows and then Diagnostic Events Viewer. View Service Fabric system events in Visual Studio Service Fabric emits ETW events to help application developers understand what's happening in the platform. Service Fabric makes it easy for service developers to implement diagnostics that can seamlessly work across both single-machine local development setups and real-world production cluster setups. you can filter on Node Name or Service Name. and troubleshooting allow for services to continue with minimal disruption to the user experience. Each event has standard metadata information that tells you the node. It was built as a tracing technology that has minimal impact on code execution times. the efficiency will depend on adopting a similar model during development of services to ensure they work when you move to a real-world setup. 2. And when you're looking at event details. While monitoring and diagnostics are critical in an actual deployed production environment. go ahead and follow the steps in Creating your first application in Visual Studio. This information will help you get an application up and running with the Diagnostics Events Viewer showing the trace messages. The benefits of Event Tracing for Windows Event Tracing for Windows (ETW) is the recommended technology for tracing messages in Service Fabric. This allows you to view your application traces interleaved with Service Fabric system traces. If you haven't already done so. If the diagnostics events window does not automatically show. There is built-in support in Service Fabric Visual Studio tools to view ETW events. you can also pause by using the Pause button at the top of the events window and resume later without any loss of events. diagnosing. Service Fabric system code also uses ETW for internal tracing. Reasons for this are: ETW is fast. 1. detecting. For example. Monitor and diagnose services in a local machine development setup 4/5/2017 • 3 min to read • Edit Online Monitoring. Go to the View tab in Visual Studio. application and service the event is coming from. ETW tracing works seamlessly across local development environments and also real-world cluster setups. you will find an overload for the ServiceEventSource. In the ServiceEventSource. The call to ServiceEventSource. The code shows how to add custom application code ETW traces that show up in the Visual Studio ETW viewer alongside system traces from Service Fabric. For projects created from the service templates (stateless or stateful) just search for the RunAsync implementation: 1. deploy.ActorMessage method that should be used for high-frequency events due to performance reasons.Current. The advantage of this method is that metadata is automatically added to traces. Next steps The same tracing code that you added to your application above for local diagnostics will work with tools that you can use to view these events when running your application on an Azure cluster. "Doing Work"). This is an example of a custom ETW trace written from application code.Current. 2.cs file. 
Open the "ProjectName". the Diagnostic Events Viewer will open automatically.ServiceMessage in the RunAsync method shows an example of a custom ETW trace from the application code. 3. If you debug the application with F5. After adding custom ETW tracing to your service code. For projects created from the actor templates (stateless or stateful): 1.ActorMessage(this. How to collect logs with Azure Diagnostics Collect logs directly from service process . and the Visual Studio Diagnostic Events Viewer is already configured to display them. in the DoWorkAsync method.cs file where ProjectName is the name you chose for your Visual Studio project. and run the application again to see your event(s) in the Diagnostic Events Viewer. you can build. you will find an overload for the ActorEventSource. Check out these articles that discuss the different options for the tools and describe how you can set them up. 2.ServiceMessage method that should be used for high-frequency events due to performance reasons.Add your own custom traces to the application code The Service Fabric Visual Studio project templates contain sample code. Find the code ActorEventSource. In file ActorEventSource.cs. logging.logging.properties file must exist. Debugging Service Fabric C# applications .config.util.properties file is created. multiple logging frameworks are available. The log file in this case is named mysfapp%u. One can view the logs in syslog under /var/log/syslog.util. entrypoint. and troubleshooting allow for services to continue with minimal disruption to the user experience.FileHandler.logging. Since java.config. The following discussion explains how to configure the java.FileHandler.util. For more information. %g is the generation number to distinguish between rotating logs.FileHandler. output streams.%g.logging is the default option with the JRE.FileHandler.util.properties file to configure the file handler for your application to redirect all logs to a local file.level = ALL java.util. You can create a app.logging.sh in the <applicationfolder>/<servicePkg>/Code/ folder to set the property java.util. detecting. console files.log where: %u is a unique number to resolve conflicts between simultaneous Java processes.formatter = java. The following code snippet contains an example configuration: handlers = java.util. Debugging Service Fabric Java applications For Java applications.path=$LD_LIBRARY_PATH -Djava. Using java.logging framework. Monitoring and diagnostics are critical in an actual deployed production environment. see the code examples in github. The entry should look like the following snippet: java -Djava. it is also used for the code examples in github.%g.log The folder pointed to by the app.logging you can redirect your application logs to memory. Adopting a similar model during development of services ensures that the diagnostic pipeline works when you move to a production environment. Service Fabric makes it easy for service developers to implement diagnostics that can seamlessly work across both single-machine local development setups and real-world production cluster setups. By default if no handler is explicitly configured.count = 10 java. diagnosing.logging. Monitor and diagnose services in a local machine development setup 3/7/2017 • 3 min to read • Edit Online Monitoring.util.pattern = /tmp/servicefabric/logs/mysfapp%u.file=<path to app.limit = 1024000 java. or sockets.util.library. After the app.propertes file.logging.logging.properties> -jar <service name>.FileHandler.file to app. 
For each of these options.util.util.util.FileHandler java.logging.logging. you need to also modify your entry point script.SimpleFormatter java.jar This configuration results in logs being collected in a rotating fashion at /tmp/servicefabric/logs/ . the console handler is registered. there are default handlers already provided in the framework. EventLevel. sargs) : ""). } protected override void OnEventWritten(EventWrittenEventArgs eventData) { using (StreamWriter Out = new StreamWriter( new FileStream("/tmp/MyServiceLog.Message. This file name needs to be appropriately updated.Payload. add the following project to your project.ToString().`this article uses EventSource for tracing in CoreCLR samples on Linux. Out. In case you want to redirect the logs to console. sargs != null ? string.0. params object[] args) { if (this.Level.Payload != null ? eventData.Select(o => o.txt". For logging using EventSource. EventKeywords. Since EventSource is familiar to C# developers.Join(".Diagnostics. eventData.Tracing so that you can write your logs to memory.IsEnabled()) { var finalMessage = string. For more information.Diagnostics.Write(" {0} ".WriteLine(eventData."")). ". [NonEvent] public void Message(string message. this.ToArray() : null. Write(eventData.ToString()). The following code snippet shows a sample implementation of logging using EventSource and a custom EventListener: public class ServiceEventSource : EventSource { public static ServiceEventSource Current = new ServiceEventSource(). The first step is to include System. or console files. eventData. use the following snippet in your customized . } } // TBD: Need to add method for sample event.". if (eventData. output streams.EventId. else { string[] sargs = eventData.txt .1" You can use a custom EventListener to listen for the service event and then appropriately redirect them to trace files.Format(message.Append))) { // report all event information Out.All).Message(finalMessage). } } } } The preceding snippet outputs the logs to a file in /tmp/MyServiceLog. args).Payload.Message != null) Out.LogAlways.EventName.WriteLine("({0}). see GitHub: logging. } internal class ServiceEventListener : EventListener { protected override void OnEventSourceCreated(EventSource eventSource) { EnableEvents(eventSource.ToString(). eventData.ToArray()).json: "System.StackTrace": "4. FileMode.Task.Multiple frameworks are available for tracing CoreCLR applications on Linux. eventData. Next steps The same tracing code added to your application also works with the diagnostics of your application on an Azure cluster. How to collect logs with Azure Diagnostics . Check out these articles that discuss the different options for the tools and describe how to set them up.Out.EventListener class: public static TextWriter Out = Console. The samples at C# Samples use EventSource and a custom EventListener to log events to a file. Configuration packages are versioned and updatable through managed rolling upgrades with health-validation and auto rollback. or other values that should not be handled in plain text. Overview The recommended way to manage service configuration settings is through service configuration packages. Encrypted secrets are no exception. Install the certificate in your cluster.xml file using certificate encryption. Service Fabric has built-in features for encrypting and decrypting values in a configuration package Settings. This is preferred to global configuration as it reduces the chances of a global service outage. 
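To show how the pieces above fit together, here is a minimal sketch of wiring up the custom listener and emitting an event from application code. It assumes the ServiceEventSource and ServiceEventListener classes shown above are part of the project; the harness class name is only illustrative.

using System;

internal static class TracingExample
{
    private static void Main()
    {
        // Constructing an EventListener subscribes it to existing event sources via
        // OnEventSourceCreated, so events written afterwards land in /tmp/MyServiceLog.txt.
        var listener = new ServiceEventListener();

        // Emit a custom event through the shared event source instance.
        ServiceEventSource.Current.Message("Service started at {0}", DateTime.UtcNow);
    }
}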
Managing secrets in Service Fabric applications 2/13/2017 • 5 min to read • Edit Online
This guide walks you through the steps of managing secrets in a Service Fabric application. Secrets can be any sensitive information, such as storage connection strings, passwords, or other values that should not be handled in plain text. This guide uses Azure Key Vault to manage keys and secrets. However, using secrets in an application is cloud platform-agnostic to allow applications to be deployed to a cluster hosted anywhere.

Overview
The recommended way to manage service configuration settings is through service configuration packages. Configuration packages are versioned and updatable through managed rolling upgrades with health-validation and auto rollback. This is preferred to global configuration as it reduces the chances of a global service outage. Encrypted secrets are no exception: Service Fabric has built-in features for encrypting and decrypting values in a configuration package Settings.xml file using certificate encryption.

The following diagram illustrates the basic flow for secret management in a Service Fabric application. There are four main steps in this flow: 1. Obtain a data encipherment certificate. 2. Install the certificate in your cluster. 3. Encrypt secret values when deploying an application with the certificate and inject them into a service's Settings.xml configuration file. 4. Read encrypted values out of Settings.xml by decrypting with the same encipherment certificate.

Azure Key Vault is used here as a safe storage location for certificates and as a way to get certificates installed on Service Fabric clusters in Azure. If you are not deploying to Azure, you do not need to use Key Vault to manage secrets in Service Fabric applications.

Data encipherment certificate
A data encipherment certificate is used strictly for encryption and decryption of configuration values in a service's Settings.xml and is not used for authentication or signing of cipher text. The certificate must meet the following requirements: The certificate must contain a private key. The certificate must be created for key exchange, exportable to a Personal Information Exchange (.pfx) file. The certificate key usage must include Data Encipherment (10),
and should not include Server Authentication or Client Authentication.0' Install the certificate in your cluster This certificate must be installed on each node in the cluster. This can be accomplished by performing secret encryption in a build environment and providing the encrypted secrets as parameters when creating application instances. gNBRyeWFXl2VydmjZNwJIM=".0. await fabricClient.0".org/2001/XMLSchema" xmlns:xsi="http://www. . gNBRyeWFXl2VydmjZNwJIM="} Using C#.0. or written in C#. applicationParameters: applicationParameters) ).0" encoding="utf-8" ?> <Settings xmlns:xsd="http://www. applicationParameters["MySecret"] = "I6jCCAeYCAxgFhBXABFxzAt .Ideally..0. declare an override parameter for the service in ApplicationManifest. the parameter is supplied to the New-ServiceFabricApplication command as a hash table: PS C:\Users\vturecek> New-ServiceFabricApplication -ApplicationName fabric:/MyApp -ApplicationTypeName MyAppType -ApplicationTypeVersion 1. Using PowerShell.. for easy integration in a build process. Creating an application instance can be scripted using PowerShell.w3.0" /> <ConfigOverrides> <ConfigOverride Name="Config"> <Settings> <Section Name="MySettings"> <Parameter Name="MySecret" Value="[MySecret]" IsEncrypted="true" /> </Section> </Settings> </ConfigOverride> </ConfigOverrides> </ServiceManifestImport> Now the value can be specified as an application parameter when creating an instance of the application.org/2001/XMLSchema-instance" xmlns="http://schemas.0 -ApplicationParameter @{"MySecret" = "I6jCCAeYCAxgFhBXABFxzAt . Use the MustOverride attribute instead of providing a value for a parameter: <?xml version="1. > <Parameters> <Parameter Name="MySecret" DefaultValue="" /> </Parameters> <ServiceManifestImport> <ServiceManifestRef ServiceManifestName="Stateful1Pkg" ServiceManifestVersion="1. deployment to different environments should be as automated as possible.ApplicationManager.xml: <ApplicationManifest .xml The Settings.. application parameters are specified in an ApplicationDescription as a NameValueCollection : FabricClient fabricClient = new FabricClient().CreateApplicationAsync(applicationDescription).w3... NameValueCollection applicationParameters = new NameValueCollection(). ApplicationDescription applicationDescription = new ApplicationDescription( applicationName: new Uri("fabric:/MyApp").xml. applicationTypeVersion: "1.com/2011/01/fabric"> <Section Name="MySettings"> <Parameter Name="MySecret" IsEncrypted="true" Value="" MustOverride="true" /> </Section> </Settings> To override values in Settings.microsoft.xml configuration file allows overridable parameters that can be provided at application creation time. Use overridable parameters in Settings. applicationTypeName: "MyAppType".. you need to make sure NETWORK SERVICE or whatever user account the service is running under has access to the certificate's private key.Context. This invisible character can cause an error when trying to locate a certificate by thumbprint. When using a data encipherment certificate. SecureString mySecretValue = configPackage.Parameters["MySecret"].xml in a configuration package allows for easy decrypting of values that have the IsEncrypted attribute set to true . an invisible character is placed at the beginning of the thumbprint string. In the following example.xml by defining users and security policies for certificates. you do not need to manually find the certificate. Use application secrets in service code The API for accessing configuration values from Settings. 
Simply call the DecryptValue() method to retrieve the original secret value: ConfigurationPackage configPackage = this.CodePackageActivationContext.GetConfigurationPackageObject("Config").Sections["MySettings"]. so be sure to delete this extra character. This configuration can be done in ApplicationManifest.Decrypt secrets from service code Services in Service Fabric run under NETWORK SERVICE by default on Windows and don't have access to certificates installed on the node without some extra setup. The certificate just needs to be installed on the node that the service is running on. Service Fabric will handle granting access for your service automatically if you configure it to do so.DecryptValue() Next Steps Learn more about running applications with different security permissions . the NETWORK SERVICE account is given read access to a certificate defined by its thumbprint: <ApplicationManifest … > <Principals> <Users> <User Name="Service1" AccountType="NetworkService" /> </Users> </Principals> <Policies> <SecurityAccessPolicies> <SecurityAccessPolicy GrantRights=”Read” PrincipalRef="Service1" ResourceRef="MyCert" ResourceType="Certificate"/> </SecurityAccessPolicies> </Policies> <Certificates> <SecretsCertificate Name="MyCert" X509FindType="FindByThumbprint" X509FindValue="[YourCertThumbrint]"/> </Certificates> </ApplicationManifest> NOTE When copying a certificate thumbprint from the certificate store snap-in on Windows. Since the encrypted text contains information about the certificate used for encryption.Settings. LocalService. and LocalSystem. the setup entry point. This makes running applications. SetupEntryPoint. The executable that is specified by EntryPoint is typically the long-running service host. Service Fabric also helps secure the resources that are used by applications at the time of deployment under the user accounts--for example. The following is a simple service manifest example that shows the SetupEntryPoint and the main EntryPoint for the service. Supported local system account types are LocalUser. . Configure security policies for your application 3/28/2017 • 11 min to read • Edit Online By using Azure Service Fabric. files.exe process runs under. By default. So having a separate setup entry point avoids having to run the service host executable with high privileges for extended periods of time. you can use Active Directory domain accounts. Service Fabric applications run under the account that the Fabric. Service Fabric also provides the capability to run applications under a local user account or local system account. and begins again with SetupEntryPoint if it ever terminates or crashes. you can help secure applications that are running in the cluster under different user accounts. directories. Configure the policy for a service setup entry point As described in the application model. and certificates. The executable that EntryPoint specifies is run after SetupEntryPoint exits successfully. The resulting process is monitored and restarted. NetworkService. is a privileged entry point that runs with the same credentials as Service Fabric (typically the NetworkService account) before any other entry point. This is useful when there are multiple users for different service entry points and they need to have certain common privileges that are available at the group level. more secure from one another. which is specified within the application manifest. When you're running Service Fabric on Windows Server in your datacenter by using the standalone installer. 
You can define and create user groups so that one or more users can be added to each group to be managed together. even in a shared hosted environment. right-click the service project and add a new file called MySetup.w3.bat.0"> <SetupEntryPoint> <ExeHost> <Program>MySetup.bat file is included in the service package. In Visual Studio.bat to the Visual Studio project to test the administrator privileges. Given that you have not applied a policy to the main entry point.org/2001/XMLSchema-instance"> <Description>An example service manifest</Description> <ServiceTypes> <StatelessServiceType ServiceTypeName="MyServiceType" /> </ServiceTypes> <CodePackage Name="Code" Version="1. This is the default account that all service entry points are run as. By default. such as SetupAdminUser. The following example shows how to configure the service to run under user administrator account privileges. Select the file.0.bat file is run.0" encoding="utf-8"?> <ApplicationManifest xmlns:xsd="http://www. it is not. <?xml version="1. Next.0" /> <ConfigOverrides /> <Policies> <RunAsPolicy CodePackageRef="Code" UserRef="SetupAdminUser" EntryPointType="Setup" /> </Policies> </ServiceManifestImport> <Principals> <Users> <User Name="SetupAdminUser"> <MemberOf> <SystemGroup Name="Administrators" /> </MemberOf> </User> </Users> </Principals> </ApplicationManifest> First.exe</Program> </ExeHost> </EntryPoint> </CodePackage> <ConfigPackage Name="Config" Version="1.microsoft. This tells Service Fabric that when the MySetup. ensure that the MySetup. Let's now add the file MySetup.org/2001/XMLSchema" xmlns:xsi="http://www.com/2011/01/fabric"> <ServiceManifestImport> <ServiceManifestRef ServiceManifestName="MyServiceTypePkg" ServiceManifestVersion="1. the code in MyServiceHost.com/2011/01/fabric" xmlns:xsi="http://www. <?xml version="1. This indicates that the user is a member of the Administrators system group. create a Principals section with a user name.0.bat</Program> <WorkingFolder>CodePackage</WorkingFolder> </ExeHost> </SetupEntryPoint> <EntryPoint> <ExeHost> <Program>MyServiceHost. right- . Next.w3.0.org/2001/XMLSchema-instance" ApplicationTypeName="MyApplicationType" ApplicationTypeVersion="1. you can change the security permissions that it runs under in the application manifest.microsoft.0" encoding="utf-8" ?> <ServiceManifest Name="MyServiceManifest" Version="SvcManifestVersion1" xmlns="http://schemas. configure a policy to apply this principal to SetupEntryPoint. under the ServiceManifestImport section.0" /> </ServiceManifest> Configure the policy by using a local account After you configure the service to have a setup entry point.w3. it should be RunAs with administrator privileges.0" xmlns="http://schemas.exe runs under the system NetworkService account.0. then you can go to this path for the MyApplicationType: C:\SfDevCluster\Data\_App\Node. See the following screenshot.2\MyApplicationType_App\work\out. it's preferable to run the startup script by using a local system account rather than an administrator account. For example."Machine") MyValue Then. In such cases.bat file and add the following commands: REM Set a system environment variable. After the service has started.txt echo %TestVariable% >> out. as shown in Service Fabric Explorer. This requires administrator privilege setx -m TestVariable "MyValue" echo System TestVariable set to > out.txt REM To delete this system variable us REM REG delete "HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\Environment" /v TestVariable /f Next. 
build and deploy the solution to a local development cluster. you can see that the MySetup. the recommendation is to run the SetupEntryPoint as LocalSystem. instead of as a local user added to Administrators group.txt Configure the policy by using local system accounts Often. navigate to the application instance work folder to find the out. ensure that Copy to Output Directory is set to Copy if newer. In the Properties dialog box. Node 2. and choose Properties. Open a PowerShell command prompt and type: PS C:\ [Environment]::GetEnvironmentVariable("TestVariable".bat file was successful in a two ways. note the name of the node where the service was deployed and started in Service Fabric Explorer--for example. Next. if this service was deployed to Node 2. The following example shows setting the SetupEntryPoint to run as LocalSystem: .click to get the context menu.txt file that shows the value of TestVariable. Now open the MySetup. Running the RunAs policy as a member of the Administrators group typically doesn’t work well because machines have User Access Control (UAC) enabled by default. MySetup. Remember to set the Copy if newer property so that the file is also included in the service package.exe -ExecutionPolicy Bypass -Command ".0" encoding="utf-8"?> <ApplicationManifest xmlns:xsd="http://www. <?xml version="1. set the working folder: <SetupEntryPoint> <ExeHost> <Program>MySetup.microsoft.bat to start a PowerShell file: powershell."Machine") > out.com/2011/01/fabric"> <ServiceManifestImport> <ServiceManifestRef ServiceManifestName="MyServiceTypePkg" ServiceManifestVersion="1.w3. To do this. The following example shows a sample batch file that starts a PowerShell file called MySetup. you can run PowerShell. it's useful to see the console output from running a script for debugging purposes.ps1. MySetup.w3.0" /> <ConfigOverrides /> <Policies> <RunAsPolicy CodePackageRef="Code" UserRef="SetupLocalSystem" EntryPointType="Setup" /> </Policies> </ServiceManifestImport> <Principals> <Users> <User Name="SetupLocalSystem" AccountType="LocalSystem" /> </Users> </Principals> </ApplicationManifest> Start PowerShell commands from a setup entry point To run PowerShell from the SetupEntryPoint point. when the batch file runs.bat</Program> <WorkingFolder>CodePackage</WorkingFolder> </ExeHost> </SetupEntryPoint> Use console redirection for local debugging Occasionally. To change this folder.\MySetup. add the following to set a system environment variable: [Environment]::SetEnvironmentVariable("TestVariable".ps1. (See where to find this in the preceding example.org/2001/XMLSchema" xmlns:xsi="http://www.0" xmlns="http://schemas.ps1" In the PowerShell file. it looks at the application folder called work for files.bat runs.txt NOTE By default. when MySetup.ps1 file in the same folder.0.exe in a batch file that points to a PowerShell file. which writes the output to a file.0. we want this to find the MySetup. you can set a console redirection policy.org/2001/XMLSchema-instance" ApplicationTypeName="MyApplicationType" ApplicationTypeVersion="1. which is the application code package folder. "Machine") [Environment]::GetEnvironmentVariable("TestVariable". In this case. "MyValue".) . add a PowerShell file to the service project--for example. First. which sets a system environment variable called TestVariable. The file output is written to the application folder called log on the node where the application is deployed and run. ps1 file to write an Echo command. 
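Once a setup entry point such as MySetup.bat has run setx -m, the main service code can read the machine-level value back at run time. This short sketch is not part of the original walkthrough; it only assumes the TestVariable name used above:

using System;

class ReadSetupVariable
{
    static void Main()
    {
        // setx -m writes the variable at Machine scope, so read it back from there.
        string value = Environment.GetEnvironmentVariable("TestVariable", EnvironmentVariableTarget.Machine);
        Console.WriteLine("TestVariable = " + (value ?? "<not set>"));
    }
}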
Let's look a little deeper into how to create different principals that can be applied as service policies. The following example shows setting the console redirection with a FileRetentionCount value: <SetupEntryPoint> <ExeHost> <Program>MySetup. you saw how to apply a RunAs policy to SetupEntryPoint.bat</Program> <WorkingFolder>CodePackage</WorkingFolder> <ConsoleRedirection FileRetentionCount="10"/> </ExeHost> </SetupEntryPoint> If you now change the MySetup. Customer1 and Customer2. Configure a policy for service code packages In the preceding steps. immediately remove this console redirection policy. Only use this for local development and debugging purposes. are made members of this local group. This is particularly useful if there are multiple users for different service entry points and they need to have certain common privileges that are available at the group level. this will write to the output file for debugging purposes: Echo "Test console redirection which writes to the application log folder on the node that the application is deployed to" After you debug your script. The following example shows a local group called LocalAdminGroup that has administrator privileges. WARNING Never use the console redirection policy in an application that is deployed in production because this can affect the application failover. <Principals> <Groups> <Group Name="LocalAdminGroup"> <Membership> <SystemGroup Name="Administrators"/> </Membership> </Group> </Groups> <Users> <User Name="Customer1"> <MemberOf> <Group NameRef="LocalAdminGroup" /> </MemberOf> </User> <User Name="Customer2"> <MemberOf> <Group NameRef="LocalAdminGroup" /> </MemberOf> </User> </Users> </Principals> . Two users. Create local user groups You can define and create user groups that allow one or more users to be added to a group. Customer3 in the following sample). If most of the code packages that are specified in the service manifest used by an application need to run under the same user. When a LocalUser account type is specified in the principals section of the application manifest. Specifying SetupEntryPoint is especially useful when you want to run certain high-privilege setup operations under a system account. You can specify this for the setup or main entry points. <Principals> <Users> <User Name="Customer3" AccountType="LocalUser" /> </Users> </Principals> If an application requires that the user account and password be same on all machines (for example. <Section Name="Hosting"> <Parameter Name="EndpointProviderEnabled" Value="true"/> <Parameter Name="NTLMAuthenticationEnabled" Value="true"/> <Parameter Name="NTLMAuthenticationPassworkSecret" Value="******" IsEncrypted="true"/> </Section> Assign policies to the service code packages The RunAsPolicy section for a ServiceManifestImport specifies the account from the principals section that should be used to run a code package. or you can specify All to apply it to both. these accounts do not have the same names as those specified in the application manifest (for example. they are dynamically generated and have random passwords. Service Fabric creates local user accounts on machines where the application is deployed. you can run the service under the credentials for an Active Directory user or group account. the application can just define a default RunAs policy with that user account. to enable NTLM authentication). Note that this is Active . 
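When you are experimenting with principals and RunAs policies like the ones above, it can be handy for the service to log which Windows account it actually ended up running under. The following snippet is not from the original article; it is just a small diagnostic sketch:

using System;
using System.Security.Principal;

class WhoAmI
{
    static void Main()
    {
        // Should print the account mapped by the RunAs policy
        // (for example, one of the users placed in LocalAdminGroup above).
        using (WindowsIdentity identity = WindowsIdentity.GetCurrent())
        {
            Console.WriteLine("Running as: " + identity.Name);
        }
    }
}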
<Policies> <DefaultRunAsPolicy UserRef="MyDefaultAccount"/> </Policies> Use an Active Directory domain group or user For an instance of Service Fabric that was installed on Windows Server by using the standalone installer. Instead.Create local users You can create a local user that can be used to help secure a service within the application. The actual service code can run under a lower-privilege account. the cluster manifest must set NTLMAuthenticationEnabled to true. The cluster manifest must also specify an NTLMAuthenticationPasswordSecret that will be used to generate the same password across all machines. By default. It also associates code packages from the service manifest with user accounts in the principals section. The following example specifies that if a code package does not have a RunAsPolicy specified. The following example shows different policies being applied: <Policies> <RunAsPolicy CodePackageRef="Code" UserRef="LocalAdmin" EntryPointType="Setup"/> <RunAsPolicy CodePackageRef="Code" UserRef="Customer3" EntryPointType="Main"/> </Policies> If EntryPointType is not specified. Apply a default policy to all service code packages You use the DefaultRunAsPolicy section to specify a default user account for all code packages that don’t have a specific RunAsPolicy defined. the default is set to EntryPointType=”Main”. the code package should run under the MyDefaultAccount specified in the principals section. <Principals> <Users> <User Name="TestUser" AccountType="DomainUser" AccountName="Domain\User" Password="[Put encrypted password here using MyCert certificate]" PasswordEncrypted="true" /> </Users> </Principals> <Policies> <DefaultRunAsPolicy UserRef="TestUser" /> <SecurityAccessPolicies> <SecurityAccessPolicy ResourceRef="MyCert" PrincipalRef="TestUser" GrantRights="Full" ResourceType="Certificate" /> </SecurityAccessPolicies> </Policies> <Certificates> Assign a security access policy for HTTP and HTTPS endpoints If you apply a RunAs policy to a service and the service manifest declares endpoint resources with the HTTP protocol. and you get failures with calls from the client. Otherwise. Then. You must deploy the private key of the certificate to decrypt the password to the local machine by using an out- of-band method (in Azure. You can use the Invoke-ServiceFabricEncryptText PowerShell command to create the secret cipher text. you can then access other resources in the domain (for example. <Policies> <RunAsPolicy CodePackageRef="Code" UserRef="Customer1" /> <!--SecurityAccessPolicy is needed if RunAsPolicy is defined and the Endpoint is http --> <SecurityAccessPolicy ResourceRef="EndpointName" PrincipalRef="Customer1" /> <!--EndpointBindingPolicy is needed if the EndpointName is secured with https --> <EndpointBindingPolicy EndpointRef="EndpointName" CertificateRef="Cert1" /> </Policies A complete application manifest example The following application manifest shows many of the different settings: . The following example shows an Active Directory user called TestUser with their domain password encrypted by using a certificate called MyCert. this is via Azure Resource Manager). <Policies> <RunAsPolicy CodePackageRef="Code" UserRef="Customer1" /> <!--SecurityAccessPolicy is needed if RunAsPolicy is defined and the Endpoint is http --> <SecurityAccessPolicy ResourceRef="EndpointName" PrincipalRef="Customer1" /> </Policies> For the HTTPS endpoint. 
you must specify a SecurityAccessPolicy to ensure that ports allocated to these endpoints are correctly access-control listed for the RunAs user account that the service runs under. which gives it full access rights.Directory on-premises within your domain and is not with Azure Active Directory (Azure AD). when Service Fabric deploys the service package to the machine. See Managing secrets in Service Fabric applications for details. file shares) that have been granted permissions. with the certificate defined in a certificates section in the application manifest. it is able to decrypt the secret and (along with the user name) authenticate with Active Directory to run under those credentials. By using a domain user or group. You can do this by using EndpointBindingPolicy.sys does not have access to the service. http. you also have to indicate the name of the certificate to return to the client. The following example applies the Customer3 account to an endpoint called ServiceEndpointName. microsoft.org/2001/XMLSchema" xmlns:xsi="http://www.com/2011/01/fabric"> <Parameters> <Parameter Name="Stateless1_InstanceCount" DefaultValue="-1" /> </Parameters> <ServiceManifestImport> <ServiceManifestRef ServiceManifestName="Stateless1Pkg" ServiceManifestVersion="1.w3.0.org/2001/XMLSchema-instance" ApplicationTypeName="Application3Type" ApplicationTypeVersion="1. <?xml version="1.0" xmlns="http://schemas.w3.0" encoding="utf-8"?> <ApplicationManifest xmlns:xsd="http://www.0" /> <ConfigOverrides /> <Policies> <RunAsPolicy CodePackageRef="Code" UserRef="Customer1" /> <RunAsPolicy CodePackageRef="Code" UserRef="LocalAdmin" EntryPointType="Setup" /> <!--SecurityAccessPolicy is needed if RunAsPolicy is defined and the Endpoint is http --> <SecurityAccessPolicy ResourceRef="EndpointName" PrincipalRef="Customer1" /> <!--EndpointBindingPolicy is needed the EndpointName is secured with https --> <EndpointBindingPolicy EndpointRef="EndpointName" CertificateRef="Cert1" /> </Policies> </ServiceManifestImport> <DefaultServices> <Service Name="Stateless1"> <StatelessService ServiceTypeName="Stateless1Type" InstanceCount="[Stateless1_InstanceCount]"> <SingletonPartition /> </StatelessService> </Service> </DefaultServices> <Principals> <Groups> <Group Name="LocalAdminGroup"> <Membership> <SystemGroup Name="Administrators" /> </Membership> </Group> </Groups> <Users> <User Name="LocalAdmin"> <MemberOf> <Group NameRef="LocalAdminGroup" /> </MemberOf> </User> <!--Customer1 below create a local account that this service runs under --> <User Name="Customer1" /> <User Name="MyDefaultAccount" AccountType="NetworkService" /> </Users> </Principals> <Policies> <DefaultRunAsPolicy UserRef="LocalAdmin" /> </Policies> <Certificates> <EndpointCertificate Name="Cert1" X509FindValue="FF EE E0 TT JJ DD JJ EE EE XX 23 4T 66 "/> </Certificates> </ApplicationManifest> Next steps Understand the application model Specify resources in a service manifest Deploy an application .0. xml files is installed with the Service Fabric SDK and tools to C:\Program Files\Microsoft SDKs\Service Fabric\schemas\ServiceFabricServiceModel. Manage application parameters for multiple environments 2/7/2017 • 5 min to read • Edit Online You can create Azure Service Fabric clusters by using anywhere from one to many thousands of machines. this configuration is not suitable for a single-machine cluster since you can't have multiple processes listening on the same endpoint on a single machine. The schema definition for the ServiceManifest. 
Default services and application parameters are configured in the application and service manifests. While application binaries can run without modification across this wide spectrum of environments. They are specified in the application manifest. Default services Service Fabric applications are made up of a collection of service instances. This ensures that your service is running on every node in the cluster (or every node in the node type if you have set a placement constraint).xml and ApplicationManifest. most applications have a set of core services that should always be created when the application is instantiated. Specifying environment-specific parameters The solution to this configuration issue is a set of parameterized default services and application parameter files that fill in those parameter values for a given environment. As a simple example. you will often want to configure the application differently. Instead. While it is possible for you to create an empty application and then create all service instances dynamically. consider InstanceCount for a stateless service. you will typically set InstanceCount to "1". These are referred to as "default services". depending on the number of machines you're deploying to. you will generally want to set this parameter to the special value of "-1". When you are running applications in Azure. However.xsd. with placeholders for per-environment configuration included in square brackets: <DefaultServices> <Service Name="Stateful1"> <StatefulService ServiceTypeName="Stateful1Type" TargetReplicaSetSize="[Stateful1_TargetReplicaSetSize]" MinReplicaSetSize="[Stateful1_MinReplicaSetSize]"> <UniformInt64Partition PartitionCount="[Stateful1_PartitionCount]" LowKey="-9223372036854775808" HighKey="9223372036854775807" /> </StatefulService> </Service> </DefaultServices> Each of the named parameters must be defined within the Parameters element of the application manifest: . Per-environment service configuration settings The Service Fabric application model enables services to include configuration packages that contain custom key- value pairs that are readable at run time. . <ConfigOverrides> <ConfigOverride Name="Config"> <Settings> <Section Name="MyConfigSection"> <Parameter Name="MaxQueueSize" Value="[Stateful1_MaxQueueSize]" /> </Section> </Settings> </ConfigOverride> </ConfigOverrides> This parameter can then be configured by environment as shown above. In the example above.xml file and then override these in the ApplicationManifest. one with a value set and the other will be overridden. the LowKey and HighKey values for the service's partitioning scheme are explicitly defined for all instances of the service since the partition range is a function of the data domain. and the application parameter file. Service Fabric will always choose from the application parameter file first (if specified).xml file for the Stateful1 service: <Section Name="MyConfigSection"> <Parameter Name="MaxQueueSize" Value="25" /> </Section> To override this value for a specific application/environment pair. NOTE Not all service instance parameters are suitable for per-environment configuration.xml file on a per instance basis. You can use application parameters to set environment variables values in the same way that these were used for config overrides. 
<Parameters> <Parameter Name="Stateful1_MinReplicaSetSize" DefaultValue="2" /> <Parameter Name="Stateful1_PartitionCount" DefaultValue="1" /> <Parameter Name="Stateful1_TargetReplicaSetSize" DefaultValue="3" /> </Parameters> The DefaultValue attribute specifies the value to be used in the absence of a more-specific parameter for a given environment. Setting and using environment variables You can specify and set environment variables in the ServiceManifest. there are three places where the value of a key can be set: the service configuration package. and finally the configuration package. create a ConfigOverride when you import the service manifest in the application manifest. then the application manifest. not the environment. The example below shows two environment variables. Suppose that you have the following setting in the Config\Settings. NOTE In the case of service configuration settings. You can do this by declaring it in the parameters section of the application manifest and specifying environment-specific values in the application parameter files. the application manifest. The values of these settings can also be differentiated by environment by specifying a ConfigOverride in the application manifest. 0" /> <EnvironmentOverrides CodePackageRef="MyCode"> <EnvironmentVariable Name="MyEnvVariable" Value="mydata"/> </EnvironmentOverrides> </ServiceManifestImport> Once the named service instance is created you can access the environment variables from code. where the ones in bold are the ones that you will use in your service.w3. reference the code package in the ServiceManifest with the EnvironmentOverrides element.GetEnvironmentVariable("MyEnvVariable"). The full list of environment variables is below.org/2001/XMLSchema-instance"> <Description>An example service manifest</Description> <ServiceTypes> <StatelessServiceType ServiceTypeName="MyServiceType" /> </ServiceTypes> <CodePackage Name="MyCode" Version="CodeVersion1"> <SetupEntryPoint> <ExeHost> <Program>MySetup. e. <?xml version="1.xml.0" encoding="utf-8" ?> <ServiceManifest Name="MyServiceManifest" Version="SvcManifestVersion1" xmlns="http://schemas.com/2011/01/fabric" xmlns:xsi="http://www. the other being used by Service Fabric runtime.g.microsoft. Service Fabric environment variables Service Fabric has built in environment variables set for each service instance. Fabric_ApplicationHostId Fabric_ApplicationHostType Fabric_ApplicationId Fabric_ApplicationName Fabric_CodePackageInstanceId Fabric_CodePackageName Fabric_Endpoint_[YourServiceName]TypeEndpoint Fabric_Folder_App_Log Fabric_Folder_App_Temp Fabric_Folder_App_Work . In C# you can do the following string EnvVariable = Environment.bat</Program> </ExeHost> </SetupEntryPoint> <EntryPoint> <ExeHost> <Program>MyServiceHost.0.exe</Program> </ExeHost> </EntryPoint> <EnvironmentVariables> <EnvironmentVariable Name="MyEnvVariable" Value=""/> <EnvironmentVariable Name="HttpGatewayPort" Value="19080"/> </EnvironmentVariables> </CodePackage> <ConfigPackage Name="MyConfig" Version="ConfigVersion1" /> <DataPackage Name="MyData" Version="DataVersion1" /> </ServiceManifest> To override the environment variables in the ApplicationManifest. <ServiceManifestImport> <ServiceManifestRef ServiceManifestName="FrontEndServicePkg" ServiceManifestVersion="1. 
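At run time, a service reads whichever value won that override chain through its configuration package. The following sketch is not part of the original article; it assumes the Config package name and the MyConfigSection/MaxQueueSize setting shown above:

using System;
using System.Fabric;

class ReadConfigSetting
{
    static void Main()
    {
        // Works inside any Service Fabric-hosted process; "Config" is the configuration
        // package name declared in ServiceManifest.xml.
        CodePackageActivationContext context = FabricRuntime.GetActivationContext();
        ConfigurationPackage configPackage = context.GetConfigurationPackageObject("Config");

        // The resolved value: application parameter file -> ApplicationManifest ConfigOverride -> Settings.xml default.
        string maxQueueSize =
            configPackage.Settings.Sections["MyConfigSection"].Parameters["MaxQueueSize"].Value;

        Console.WriteLine("MaxQueueSize = " + maxQueueSize);
    }
}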
and Cloud.WriteLine(" Environment variable {0} = {1}".Application with a service type called FrontEndService when run on your local dev machine.xml --> <Application Name="fabric:/Application1" xmlns="http://schemas.ApplicationParameters\Local.5Node. de. a new application includes three application parameter files. Each of them defines the specific values for the parameters that are defined in the application manifest: <!-.Value).xml. } } Below are example environment variables for an application type called GuestExe.xml.GetEnvironmentVariables()) { if (de.StartsWith("Fabric")) { Console.Key.com/2011/01/fabric"> <Parameters> <Parameter Name ="Stateful1_MinReplicaSetSize" Value="2" /> <Parameter Name="Stateful1_PartitionCount" Value="1" /> <Parameter Name="Stateful1_TargetReplicaSetSize" Value="3" /> </Parameters> </Application> By default. named Local.Key.Application Fabric_CodePackageName = Code Fabric_Endpoint_FrontEndServiceTypeEndpoint = 80 Fabric_NodeIPOrFQDN = localhost Fabric_NodeName = _Node_2 Application parameter files The Service Fabric application project can include one or more application parameter files. Fabric_Folder_Application Fabric_NodeId Fabric_NodeIPOrFQDN Fabric_NodeName Fabric_RuntimeConnectionAddress Fabric_ServicePackageInstanceId Fabric_ServicePackageName Fabric_ServicePackageVersionInstance FabricPackageFileName The code belows shows how to list the Service Fabric environment variables foreach (DictionaryEntry de in Environment.ToString(). Local.microsoft.1Node. de.xml: . Fabric_ApplicationName = fabric:/GuestExe. You can do this through the Publish dialog in Visual Studio or through PowerShell. you need to choose the appropriate parameter file to apply with your application. see the Service Fabric technical overview. Deploy from Visual Studio You can choose from the list of available parameter files when you publish your application in Visual Studio./Deploy-FabricApplication -ApplicationPackagePath <app_package_path> -PublishProfileFile <publishprofile_path> Next steps To learn more about some of the core concepts that are discussed in this topic. For information about other app management capabilities that are available in Visual Studio. simply copy and paste an existing one and give it a new name.To create a new parameter file. . see Manage your Service Fabric applications in Visual Studio. . Identifying environment-specific parameters during deployment At deployment time. Deploy from PowerShell The Deploy-FabricApplication.ps1 PowerShell script included in the application project template accepts a publish profile as a parameter and the PublishProfile contains a reference to the application parameters file. FabricTransientException The operation failed due to a transient error condition of some kind. InvalidX509FindType The X509FindType is invalid. which can be thrown by many different FabricClient APIs. E_ACCESSDENIED is returned. or testing a service. the ErrorCode property indicates the exact cause of the exception. See the API reference documentation to find which exceptions are thrown by a specific method. System. For example. . OperationTimedOut is returned when the operation takes more than MaxOperationTimeout to complete. or transient infrastructure issues. Transient exceptions correspond to failed operations that can be retried. upgrade. System. an operation may fail because a quorum of replicas is temporarily not reachable. Each method can throw exceptions for errors due to incorrect input.Fabric. 
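As a footnote to the built-in variables listed above, the article calls out Fabric_Endpoint_[YourServiceName]TypeEndpoint and Fabric_Folder_App_Work among the ones a service typically consumes. The following is only a sketch, not code from the article; the FrontEndService endpoint name matches the earlier example output:

using System;
using System.IO;

class UseFabricVariables
{
    static void Main()
    {
        // Port that Service Fabric assigned to the endpoint declared in ServiceManifest.xml.
        string port = Environment.GetEnvironmentVariable("Fabric_Endpoint_FrontEndServiceTypeEndpoint");

        // Scratch folder that this service instance is free to write to.
        string workFolder = Environment.GetEnvironmentVariable("Fabric_Folder_App_Work");

        Console.WriteLine("Listening on port " + port + ", writing scratch files to " + workFolder);
        if (workFolder != null)
        {
            File.WriteAllText(Path.Combine(workFolder, "started.txt"), DateTime.UtcNow.ToString("o"));
        }
    }
}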
Common exceptions and errors when working with the FabricClient APIs 2/17/2017 • 1 min to read • Edit Online The FabricClient APIs enable cluster and application administrators to perform administrative tasks on a Service Fabric application.Fabric. Dispose of the FabricClient object you are using and instantiate a new FabricClient object. Error codes are defined in the FabricErrorCode enumeration. Any of the FabricClient methods can potentially throw FabricException. and removal. service. retry the operation. The following table lists the exceptions that are common across the FabricClient APIs.Fabric. For example. checking the health a cluster. System. Application developers and cluster administrators can use the FabricClient APIs to develop tools for managing the Service Fabric cluster and applications. InvalidCredentialType The credential type is invalid. runtime errors. InvalidX509StoreLocation The X509 store location is invalid.UnauthorizedAccessException The access check for the operation failed. Some common FabricErrorCode errors that can be returned in a FabricException: ERROR CONDITION CommunicationError A communication error caused the operation to fail. There are some exceptions. EXCEPTION THROWN WHEN System. System. or cluster. There are many different types of operations which can be performed using FabricClient.TimeoutException The operation timed out.FabricException A runtime error occurred while performing the operation. however. application deployment.FabricObjectClosedException The FabricClient object is in a closed state. InvalidProtectionLevel The protection level is invalid. It should be a comma-separated list.ERROR CONDITION InvalidX509StoreName The X509 store name is invalid. InvalidAllowedCommonNameList The format of common name list string is invalid. InvalidSubjectName The subject name is invalid. InvalidX509Store The X509 certificate store cannot be opened. . InvalidX509Thumbprint The X509 certificate thumbprint string is invalid. As a result. Service Fabric guarantees that the upgrade process is either successful. Service Fabric refers to these types of applications as guest executables. by using Visual Studio or a command-line utility. Guest executables are treated by Service Fabric like stateless services.xsd. The service manifest also includes some additional parameters that can be used to configure the service once it is deployed. Application lifecycle management. Service manifest The service manifest describes the components of a service. Java. It includes data. it is useful to understand the Service Fabric packaging and deployment model as described in application model. does not leave the application in an unknown or unstable state. and provides diagnostic information if there is a failure. In Service Fabric. such as the name and type of service. an application is a unit of deployment and upgrade. Density. You can run multiple applications in a cluster. An application can be upgraded as a single unit where potential failures and potential rollbacks are managed. It lists the services that compose it. The Service Fabric packaging model relies on two XML files: the application and service manifests. The schema definition for the ApplicationManifest. we cover the steps to package a guest executable and deploy it to Service Fabric. Application manifest The application manifest is used to describe the application. Service Fabric ensures that instances of an application are running. 
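Returning briefly to the FabricClient guidance above: because FabricTransientException (and timeouts) mark operations that are safe to retry, callers usually wrap management calls in a small retry loop. The following is only a sketch of that pattern, not code from the article; the service name, delays, and attempt count are illustrative:

using System;
using System.Fabric;
using System.Threading.Tasks;

class RetryExample
{
    static async Task Main()
    {
        var fabricClient = new FabricClient();
        var serviceName = new Uri("fabric:/MyApp/MyService");   // hypothetical service name

        const int maxAttempts = 5;
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                // Any FabricClient call can be substituted here.
                var description = await fabricClient.ServiceManager.GetServiceDescriptionAsync(serviceName);
                Console.WriteLine("Service kind: " + description.Kind);
                break;
            }
            catch (FabricTransientException) when (attempt < maxAttempts)
            {
                // Transient failures (for example, a quorum of replicas temporarily unreachable)
                // can be retried after a short back-off.
                await Task.Delay(TimeSpan.FromSeconds(2 * attempt));
            }
            catch (FabricObjectClosedException)
            {
                // A closed FabricClient cannot be reused; dispose it and create a new one.
                fabricClient.Dispose();
                fabricClient = new FabricClient();
            }
        }
    }
}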
Service Fabric provides automatic rollback to the previous version if there is a bad health event reported during an upgrade. such as Node. based on availability and other metrics.xml and ServiceManifest. Service Fabric health monitoring detects if an application is running. such as the number of instances. This article describes how to package and deploy a guest executable to a Service Fabric cluster. if the upgrade fails. Applications that run in Service Fabric are made highly available. which eliminates the need for each application to run on its own hardware. they are placed on nodes in a cluster. or native applications in Azure Service Fabric. Samples Sample for packaging and deploying a guest executable Sample of two guest exectuables (C# and nodejs) communicating via the Naming service using REST Overview of application and service manifest files As part of deploying a guest executable. and other parameters that are used to define how one or more services should be deployed. . Deploy a guest executable to Service Fabric 4/7/2017 • 14 min to read • Edit Online You can run any type of application. or. and its code and configuration.xml files is installed with the Service Fabric SDK into C:\Program Files\Microsoft SDKs\Service Fabric\schemas\ServiceFabricServiceModel.js. Health monitoring. Besides providing upgrades with no downtime. In this article. Benefits of running a guest executable in Service Fabric There are several advantages to running a guest executable in a Service Fabric cluster: High availability. 1.ServiceManifest. Choose Guest Executable as the service template. the following: Code. If you expect the executable to change and want the ability to pick up new builds dynamically. during failover). the application should follow a predefined directory structure. you can choose either to use a Visual Studio project template or to create the application package manually. |-. 2.xml file (and other files if necessary) that the service can access at runtime to retrieve specific configuration settings.existingapp.Application package file structure To deploy an application to Service Fabric. This links to the source .exe |-.xml |-. Can be set to copy all the content of your folder to the Visual Studio Project. which is useful if the executable does not change.ApplicationManifest. Note that you can use linked folders when creating the application project in Visual Studio. you can choose to link to the folder instead. Data.xml |-. Using Visual Studio.xml and.Data |-. 3. This directory contains the service code. Code Package Behavior. TIP The easiest way to package an existing Windows executable into a service is to use Visual Studio.ApplicationPackageRoot |-. This directory contains a Settings. Package an existing executable When packaging a guest executable. Use Visual Studio to package an existing executable Visual Studio provides a Service Fabric service template to help you deploy a guest executable to a Service Fabric cluster.Code |-. Click Browse to select the folder with your executable and fill in the rest of the parameters to create the service. These subdirectories are the ServiceManifest. This is an additional directory to store additional local data that the service may need. Choose File > New Project. the application package structure and manifest files are created by the new project template for you. A subdirectory for each service included in the application is used to contain all the artifacts that the service requires. 
Service Fabric does not copy or replicate changes to the data directory if the service needs to be relocated (for example.Config |-. typically.xml The ApplicationPackageRoot contains the ApplicationManifest. Data should be used to store only ephemeral data. Config. The following is an example of that structure.Settings.GuestService1Pkg |-.xml file that defines the application. and create a Service Fabric application. NOTE You don't have to create the config and data directories if you don't need them. making it possible for you to update the guest executable in its source destination. If your service needs an endpoint for communication. The package should contain all the code that the application needs to run. When ready. you can publish the application to a remote cluster or check in the solution to source control.) NOTE Make sure that you include all the files and dependencies that the application needs. You can specify three values: CodeBase specifies that the working directory is going to be set to the code directory in the application package ( Code directory shown in the preceding file structure). 4. 3. Program specifies the executable that should be run to start the service. Add the application's code and configuration files. 2. and click OK. Do not assume that the dependencies are already installed. Give your service a name. 6. you can add the application's code and configuration files under the code and config directories. 7. Create the package directory structure You can start by creating the directory structure. You can also create additional directories or subdirectories under the code or config directories. location from within the project. you can now add the protocol. You can now use the package and publish action against your local cluster by debugging the solution in Visual Studio." Add the application's code and configuration files After you have created the directory structure. More details are in the next section. "Application package file structure. Those updates become part of the application package on build.xml file. as described in the preceding section. and type to the ServiceManifest. code and settings. WorkingFolder specifies the working directory for the process that is going to be started. Edit the application manifest file. (You can pick different names if you want. It can be a list of parameters with arguments. Create the package directory structure. 4. CodePackage specifies that the working directory is going to be set to the root of the application package ( GuestService1Pkg shown in the preceding file structure). . Arguments specifies the arguments that should be passed to the executable. Service Fabric does an xcopy of the content of the application root directory. Work specifies that the files are placed in a subdirectory called work. Go to the end of this article to see how to view your guest executable service running in Service Fabric Explorer. so there is no predefined structure to use other than creating two top directories. Service Fabric copies the content of the application package on all nodes in the cluster where the application's services are going to be deployed. port. Edit the service manifest file. Manually package and deploy an existing executable The process of manually packaging a guest executable is based on the following general steps: 1. 5. For example: <Endpoint Name="NodeAppTypeEndpoint" Protocol="http" Port="3000" UriScheme="http" PathSuffix="myapp/" Type="Input" /> . 
and can also potentially be used to upgrade the service's code by using the application lifecycle management infrastructure in Service Fabric.com/2011/01/fabric"> <ServiceTypes> <StatelessServiceType ServiceTypeName="NodeApp" UseImplicitHost="true"/> </ServiceTypes> <CodePackage Name="code" Version="1.exe</Program> <Arguments>bin/www</Arguments> <WorkingFolder>CodePackage</WorkingFolder> </ExeHost> </EntryPoint> </CodePackage> <Resources> <Endpoints> <Endpoint Name="NodeAppTypeEndpoint" Protocol="http" Port="3000" Type="Input" /> </Endpoints> </Resources> </ServiceManifest> The following sections go over the different parts of the file that you need to update.0" encoding="utf-8"?> <ServiceManifest xmlns:xsd="http://www. Update CodePackage The CodePackage element specifies the location (and version) of the service's code. <CodePackage Name="Code" Version="1.0.w3. Specify UseImplicitHost="true" .w3.org/2001/XMLSchema- instance" Name="NodeApp" Version="1. .0.Edit the service manifest file The next step is to edit the service manifest file to include the following information: The name of the service type.0. CodePackage also has the version attribute.0"> <SetupEntryPoint> <ExeHost> <Program>scripts\launchConfig. This attribute tells Service Fabric that the service is based on a self-contained app. Update ServiceTypes <ServiceTypes> <StatelessServiceType ServiceTypeName="NodeApp" UseImplicitHost="true" /> </ServiceTypes> You can pick any name that you want for ServiceTypeName . The following is an example of a ServiceManifest.org/2001/XMLSchema" xmlns:xsi="http://www.xml file: <?xml version="1.xml file to identify the service. This can be used to specify the version of the code.0. The value is used in the ApplicationManifest.0.cmd</Program> </ExeHost> </SetupEntryPoint> <EntryPoint> <ExeHost> <Program>node. The command to use to launch the application (ExeHost).0. so all Service Fabric needs to do is to launch it as a process and monitor its health.0"> The Name element is used to specify the name of the directory in the application package that contains the service's code. This is an ID that Service Fabric uses to identify a service.microsoft. Any script that needs to be run to set up the application (SetupEntrypoint).0" xmlns="http://schemas. cmd that is located in the scripts subdirectory of the code directory (assuming the WorkingFolder element is set to CodeBase).js application listens on http on port 3000. Work specifies that the files are placed in a subdirectory called work. the Node. In this example.Optional: Update SetupEntrypoint <SetupEntryPoint> <ExeHost> <Program>scripts\launchConfig. The SetupEntryPoint can execute any type of file: executable files. and PowerShell cmdlets. see Configure SetupEntryPoint. For more details. The SetupEntryPoint is executed every time the service is restarted. the SetupEntryPoint runs a batch file called LaunchConfig. Program specifies the name of the executable that should start the service. It can be a list of parameters with arguments. so it does not need to be included if there is no initialization required. You can specify three values: CodeBase specifies that the working directory is going to be set to the code directory in the application package ( Code directory in the preceding file structure). It is an optional step.cmd</Program> </ExeHost> </SetupEntryPoint> The SetupEntryPoint element is used to specify any executable or batch file that should be executed before the service's code is launched. 
the Endpoint element specifies the endpoints that the application can listen on. so setup scripts need to be grouped in a single batch file if the application's setup requires multiple scripts. batch files. Update EntryPoint <EntryPoint> <ExeHost> <Program>node. The ExeHost element specifies the executable (and arguments) that should be used to launch the service. WorkingFolder specifies the working directory for the process that is going to be started. Furthermore you can ask Service Fabric to publish this endpoint to the Naming Service so other services can discover the endpoint address to this service. There is only one SetupEntryPoint. CodePackage specifies that the working directory is going to be set to the root of the application package ( GuestService1Pkg in the preceding file structure).exe</Program> <Arguments>bin/www</Arguments> <WorkingFolder>CodeBase</WorkingFolder> </ExeHost> </EntryPoint> The EntryPoint element in the service manifest file is used to specify how to launch the service. Update Endpoints and register with Naming Service for communication <Endpoints> <Endpoint Name="NodeAppTypeEndpoint" Protocol="http" Port="3000" Type="Input" /> </Endpoints> In the preceding example. In the preceding example. Arguments specifies the arguments that should be passed to the executable. This enables you to be able to communicate between services that . The WorkingFolder is useful to set the correct working directory so that relative paths can be used by either the application or initialization scripts. <Endpoints> <Endpoint Name="NodeAppTypeEndpoint" Protocol="http" Port="3000" UriScheme="http" PathSuffix="myapp/" Type="Input" /> </Endpoints> You can use these addresses with reverse proxy to communicate between services. you see http://localhost:3000/myapp/ . you need to make some changes to the ApplicationManifest. and it is calculated for you.xml file to ensure that the correct service type and name are used. which specifies the name of the directory where the ServiceManifest. Only use this for local development and debugging purposes.0. you can specify one or more services that you want to include in the app. <ServiceManifestImport> <ServiceManifestRef ServiceManifestName="NodeApp" ServiceManifestVersion="1.1.xml file is located.xml file using the ConsoleRedirection element. . in Service Fabric Explorer you see an endpoint similar to http://10.0. Edit the application manifest file Once you have configured the Servicemanifest.0.org/2001/XMLSchema-instance" ApplicationTypeName="NodeAppType" ApplicationTypeVersion="1. once the service is deployed. <?xml version="1.0" xmlns="http://schemas. Console redirection can be configured in the ServiceManifest. Services are referenced with ServiceManifestName .microsoft.0" /> </ServiceManifestImport> </ApplicationManifest> ServiceManifestImport In the ServiceManifestImport element. The published endpoint address is of the form UriScheme://IPAddressOrFQDN:Port/PathSuffix . WARNING Never use the console redirection policy in an application that is deployed in production because this can affect the application failover. it is useful to be able to see console logs to find out if the application and configuration scripts show any errors. UriScheme and PathSuffix are optional attributes.w3. In the following example.0.0" /> </ServiceManifestImport> Set up logging For guest executables.xml file.0" encoding="utf-8"?> <ApplicationManifest xmlns:xsd="http://www.w3.4.92:3000/myapp/ published for the service instance. 
Or if this is a local machine.org/2001/XMLSchema" xmlns:xsi="http://www. IPAddressOrFQDN is the IP address or fully qualified domain name of the node this executable gets placed on.are guest executables.com/2011/01/fabric"> <ServiceManifestImport> <ServiceManifestRef ServiceManifestName="NodeApp" ServiceManifestVersion="1. exe</Program> <Arguments>bin/www</Arguments> <WorkingFolder>CodeBase</WorkingFolder> <ConsoleRedirection FileRetentionCount="5" FileMaxSizeInKb="2048"/> </ExeHost> </EntryPoint> ConsoleRedirection can be used to redirect console output (both stdout and stderr) to a working directory. depending on the type of application that you are deploying. Connect-ServiceFabricCluster localhost:19000 Write-Host 'Copying application package. FileMaxSizeInKb specifies the maximum size of the log files.. The following PowerShell script shows how to deploy your application to the local development cluster. means that the log files for the previous five executions are stored in the working directory. it can be deployed as single or multiple instances. This provides the ability to verify that there are no errors during the setup or execution of the application in the Service Fabric cluster.' Copy-ServiceFabricApplicationPackage -ApplicationPackagePath 'C:\Dev\MultipleApplications' - ImageStoreConnectionString 'file:C:\SfDevCluster\Data\ImageStoreShare' -ApplicationPackagePathInImageStore 'nodeapp' Write-Host 'Registering application type. . In this case. use Service Fabric Explorer to determine which node the service is running on. or it can be deployed in such a way that there is one instance of the service on each node of the Service Fabric cluster. The InstanceCount parameter of the New-ServiceFabricService cmdlet is used to specify how many instances of the service should be launched in the Service Fabric cluster. To determine where the files are located. A value of 5. and which working directory is being used..0 New-ServiceFabricService -ApplicationName 'fabric:/nodeapp' -ServiceName 'fabric:/nodeapp/nodeappservice' - ServiceTypeName 'NodeApp' -Stateless -PartitionSchemeSingleton -InstanceCount 1 TIP Compress the package before copying to the image store if the package is large or has many files. You can set the InstanceCount value. The two most common scenarios are: . A Service Fabric service can be deployed in various "configurations.. and start a new Service Fabric service.' Register-ServiceFabricApplicationType -ApplicationPathInImageStore 'nodeapp' New-ServiceFabricApplication -ApplicationName 'fabric:/nodeapp' -ApplicationTypeName 'NodeAppType' - ApplicationTypeVersion 1. Service Fabric's InstanceCount = "1" scheduler determines which node the service is going to be deployed on.. Read more here. for example. <EntryPoint> <ExeHost> <Program>node. Deployment The last step is to deploy your application. Log files are saved in one of the service's working directories. This process is covered later in this article." For example. FileRetentionCount determines how many files are saved in the working directory. only one instance of the service is deployed in the cluster. one instance of the service is deployed on every node in the Service Fabric cluster. you can find the working directory and the service's log folder. Client traffic can then be distributed across the service that is running on all nodes in the cluster. because client applications need to "connect" to any of the nodes in the cluster to use the endpoint. you see the essential node information. 
InstanceCount = "-1". In this case, one instance of the service is deployed on every node in the Service Fabric cluster. The result is having one (and only one) instance of the service for each node in the cluster. This is a useful configuration for front-end applications (for example, a REST endpoint), because client applications need to "connect" to any of the nodes in the cluster to use the endpoint. This configuration can also be used when, for example, all nodes of the Service Fabric cluster are connected to a load balancer. Client traffic can then be distributed across the service that is running on all nodes in the cluster.
Check your running application
In Service Fabric Explorer, identify the node where the service is running. In this example, it runs on Node1, as shown in the following screenshot. If you navigate to the node and browse to the application, you see the essential node information, including its location on disk, and you can browse to that directory by using Server Explorer.
Creating a guest executable using Yeoman for Service Fabric on Linux
The procedure for creating and deploying a guest executable on Linux is the same as deploying a C# or Java application.
1. In a terminal, type yo azuresfguest.
2. Name your application.
3. Choose the type of your first service and name it. Choose Guest Binary for a guest executable (and Guest Container for a container), and provide the details, including the path of the executable and the parameters it must be invoked with.
Yeoman creates an application package with the appropriate application and manifest files, along with install and uninstall scripts.
Next steps
In this article, you have learned how to package a guest executable and deploy it to Service Fabric. See the following articles for related information and tasks.
Sample for packaging and deploying a guest executable, including a link to the prerelease of the packaging tool
Sample of two guest executables (C# and nodejs) communicating via the Naming service using REST
Deploy multiple guest executables
Create your first Service Fabric application using Visual Studio
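One last note before moving on to multiple guest executables: because the endpoint declared in the service manifest is published to the Naming Service, other services or clients can look the address up at run time. This sketch is not part of the original walkthrough; it assumes the fabric:/nodeapp/nodeappservice name used in the deployment script and simply prints whatever address string was published:

using System;
using System.Fabric;
using System.Threading.Tasks;

class ResolveGuestEndpoint
{
    static async Task Main()
    {
        FabricClient fabricClient = new FabricClient();

        // Resolve the singleton partition of the guest executable service.
        ResolvedServicePartition partition =
            await fabricClient.ServiceManager.ResolveServicePartitionAsync(new Uri("fabric:/nodeapp/nodeappservice"));

        // GetEndpoint picks one of the published endpoints; Address is the raw string that was published.
        ResolvedServiceEndpoint endpoint = partition.GetEndpoint();
        Console.WriteLine("Published address: " + endpoint.Address);
    }
}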
Deploy multiple guest executables 4/7/2017 • 6 min to read • Edit Online This article shows how to package and deploy multiple guest executables to Azure Service Fabric.js application.public |-.. right click on the application project and select the Add->New Service Fabric service to add the second guest executable project to the solution.js application (using Express web framework and Jade template engine) should look similar to the one below: |-. After you have added the first guest executable. exe |-. as shown in the code snippet below: <CodePackage Name="C" Version="1. the Node.js web server by executing node.js |-.NodeApplication |-.\ServiceFabricAppPackageUtil.exe .exe</Program> <Arguments>'bin/www'</Arguments> <WorkingFolder>CodePackage</WorkingFolder> </ExeHost> </EntryPoint> </CodePackage> In this sample.exe. /exe defines the executable that Service Fabric is supposed to launch. If you browse to the directory that was specified in the /target parameter.xml now has a section that describes how the Node. and not to the Service Fabric application name.exe' /ma:'bin/www' /AppType:NodeAppType Below is a description of the parameters that are being used: /source points to the directory of the application that should be packaged.xml The generated ServiceManifest.xml |-. in this case node.routes |-.ServiceManifest.js web server should be launched.data |-.node_modules |-.views |-.xml |-.js web server listens to port 3000. /ma defines the argument that is being used to launch the executable. It's important to understand that this translates to the service name in the manifest.C |-. /target defines the directory in which the package should be created.package. .ApplicationManifest. .0"> <EntryPoint> <ExeHost> <Program>node.app.exe /source:'[yourdirectory]\MyNodeApplication' /target:'[yourtargetdirectory] /appname:NodeService /exe:'node.xml file as shown below.node.js is not installed. Service Fabric needs to launch the Node.public |-. /AppType defines the Service Fabric application type name. /ma:'bin/www' tells the packaging tool to use bin/ma as the argument for node. As Node. so you need to update the endpoint information in the ServiceManifest. /appname defines the application name of the existing application.bin |-.exe bin/www . This directory has to be different from the source directory.config |--Settings. you can see that the tool has created a fully functioning Service Fabric package as shown below: |--[yourtargetdirectory] |-.json |-. js application.mongod. |-.bin |-. <Resources> <Endpoints> <Endpoint Name="NodeServiceEndpoint" Protocol="http" Port="3000" Type="Input" /> </Endpoints> </Resources> Packaging the MongoDB application Now that you have packaged the Node. To package MongoDB. You also need to make sure that you are using the same ApplicationType name.xml . |-.\ServiceFabricAppPackageUtil. Both binaries are located in the bin directory of your MongoDB installation directory. In fact.MongoDB |-.xml |-. You should either use durable storage or implement a MongoDB replica set in order to prevent data loss.ServiceManifest. you need to make sure that the /target parameter points to the same directory that already contains the application manifest along with the Node.C |--bin |-.mongo.exe --dbpath [path to data] NOTE The data is not being preserved in the case of a node failure if you put the MongoDB data directory on the local directory of the node.mongo.etc.exe. you want to make sure you package Mongod. In PowerShell or the command shell. 
the steps that you go through now are not specific to Node. The directory structure looks similar to the one below.exe and Mongo.xml |-.exe |-.js application. you can go ahead and package MongoDB. so you need to use the /ma parameter when packaging MongoDB.js and MongoDB.config |--Settings.exe /source: [yourdirectory]\MongoDB' /target:'[yourtargetdirectory]' /appname:MongoDB /exe:'bin\mongod.anybinary.ApplicationManifest.mongod.exe |-. mongod. As mentioned before.exe Service Fabric needs to start MongoDB with a command similar to the one below. they apply to all applications that are meant to be packaged together as one Service Fabric application.exe |-.MyNodeApplication |-.MongoDB |-.exe |-. we run the packaging tool with the following parameters: .exe' /ma:'--dbpath [path to data]' /AppType:NodeAppType In order to add MongoDB to your Service Fabric application package. Let's browse to the directory and examine what the tool has created. |--[yourtargetdirectory] |-. perform the following steps: 1.js application--for example http://localhost:3000. In this tutorial.0" /> </ServiceManifestImport> <ServiceManifestImport> <ServiceManifestRef ServiceManifestName="NodeService" ServiceManifestVersion="1.0 Once the application is successfully published to the local cluster. the tool added a new folder.js application on the port that we have entered in the service manifest of the Node. you have seen how to easily package two existing applications as one Service Fabric application. to the directory that contains the MongoDB binaries.org/2001/XMLSchema" xmlns:xsi="http://www. you can access the Node.w3. .' Copy-ServiceFabricApplicationPackage -ApplicationPackagePath '[yourtargetdirectory]' - ImageStoreConnectionString 'file:C:\SfDevCluster\Data\ImageStoreShare' -ApplicationPackagePathInImageStore 'NodeAppType' Write-Host 'Registering application type. If you open the ApplicationManifest. <ApplicationManifest xmlns:xsd="http://www. Adding more guest executables to an existing application using Yeoman on Linux To add another service to an application already created using yo . MongoDB.w3.xml file.org/2001/XMLSchema- instance" ApplicationTypeName="MyNodeApp" ApplicationTypeVersion="1. Change directory to the root of the existing application. if MyApplication is the application created by Yeoman.0" xmlns="http://schemas.microsoft. For example. such as high availability and health system integration.. The code below shows the content of the application manifest. you can see that the package now contains both the Node..com/2011/01/fabric"> <ServiceManifestImport> <ServiceManifestRef ServiceManifestName="MongoDB" ServiceManifestVersion="1.0" /> </ServiceManifestImport> <DefaultServices> <Service Name="MongoDBService"> <StatelessService ServiceTypeName="MongoDB"> <SingletonPartition /> </StatelessService> </Service> <Service Name="NodeServiceService"> <StatelessService ServiceTypeName="NodeService"> <SingletonPartition /> </StatelessService> </Service> </DefaultServices> </ApplicationManifest> Publishing the application The last step is to publish the application to the local Service Fabric cluster by using the PowerShell scripts below: Connect-ServiceFabricCluster localhost:19000 Write-Host 'Copying application package. cd ~/YeomanSamples/MyApplication .. Run yo azuresfguest:AddService and provide the necessary details.As you can see. 2.' 
Register-ServiceFabricApplicationType -ApplicationPathInImageStore 'NodeAppType'

New-ServiceFabricApplication -ApplicationName 'fabric:/NodeApp' -ApplicationTypeName 'NodeAppType' -ApplicationTypeVersion 1.0

Once the application is successfully published to the local cluster, you can access the Node.js application on the port that we have entered in the service manifest of the Node.js application--for example http://localhost:3000.

In this tutorial, you have seen how to easily package two existing applications as one Service Fabric application. You have also learned how to deploy it to Service Fabric so that it can benefit from some of the Service Fabric features, such as high availability and health system integration.

Adding more guest executables to an existing application using Yeoman on Linux

To add another service to an application already created using yo, perform the following steps:
1. Change directory to the root of the existing application. For example, cd ~/YeomanSamples/MyApplication, if MyApplication is the application created by Yeoman.
2. Run yo azuresfguest:AddService and provide the necessary details.

Next steps
Learn about deploying containers with Service Fabric and containers overview
Sample for packaging and deploying a guest executable
Sample of two guest executables (C# and Node.js) communicating via the Naming service using REST

Preview: Service Fabric and containers 2/7/2017 • 5 min to read • Edit Online

NOTE
This feature is in preview for Linux and Windows Server 2016.

Introduction

Azure Service Fabric is an orchestrator of services across a cluster of machines, with years of usage and optimization in massive scale services at Microsoft. Services can be developed in many ways, from using the Service Fabric programming models to deploying guest executables. Service Fabric deploys and activates these services as processes. Processes provide the fastest activation and highest density usage of the resources in a cluster. Service Fabric can also deploy services in container images. This provides you with the choice of using either containers to package existing code (for example, IIS MVC apps) or the Service Fabric programming models, and you can mix services in processes and services in containers in the same application. You get the best of both worlds.

Containers and Service Fabric roadmap

Over the next few releases, Service Fabric will continue to add extensive support for containers on both Windows and Linux, including improvements to networking, volume drivers, resource constraints, security, diagnostics, and tooling support (especially in Visual Studio), so that the experience of using container images to deploy services is first class.

What are containers?

Containers are a virtualization technology that virtualizes the underlying operating system from applications. Containers provide an immutable environment for applications to run with varying degrees of isolation. Containers are encapsulated, individually deployable components that run as isolated instances on the same kernel to take advantage of the virtualization that an operating system provides.
Containers run directly on top of the kernel and have an isolated view of the file system and other resources. Along with portability. . What are containers? Containers are encapsulated. Windows Server containers Windows Server 2016 provides two different types of containers that differ in the level of provided isolation. Windows Server containers and Docker containers are similar because both have namespace and file system isolation but share the kernel with the host they are running on. Docker Hub is a central repository to store and retrieve container images. Mix containers and Service Fabric microservices: Use an existing container image for part of your application. multitenant scenarios. For a walkthrough about how to do this. Windows Hyper-V containers provide more isolation and security because each container does not share the operating system kernel with other containers or with the host. this isolation has traditionally been provided by cgroups and namespaces. read Deploy a Docker container to Service Fabric.NET Core. With this higher level of security isolation. For a walkthrough about how to do this. Hyper- V containers are targeted at hostile. and Windows Server containers behave similarly. you might use the NGINX container for the web front end of your application and stateful services for the more intensive back-end computation. For example. These ASP. You can package these into container images from the precreated IIS image and deploy them with Service Fabric. read Deploy a Windows container to Service Fabric. Scenarios for using containers Here are typical examples where a container is a good choice: IIS lift and shift: If you have existing ASP. put them in a container instead of migrating them to ASP.NET MVC apps are dependent on Internet Information Services (IIS). On Linux.Docker containers on Linux Docker provides high-level APIs to create and manage containers on top of Linux kernel containers. See Container Images on Windows Server for information about how to create IIS images. The following figure shows the different types of virtualization and isolation levels available in the operating system. .NET MVC apps that you want to continue to use. Container port to host port mapping. Resource governance. Reliable Services are not currently supported on Linux. The capabilities include: Container image deployment and activation. Support for stateless services in Windows containers will be added in future release. These are called containerized services. Examples include Node. In the Service Fabric application model. If services might consume many resources and affect the performance of others (such as a long-running. Stateless services inside containers: These are stateless services that use the Reliable Services and the Reliable Actors programming models. Stateful services inside containers: These are stateful services that use the Reliable Actors programming model on Linux. Deploy a Windows container to Service Fabric on Windows Server 2016 Deploy a Docker container to Service Fabric on Linux . and that Service Fabric has features that support containers. Ability to configure and set environment variables. This is currently only supported on Linux. Service Fabric support for containers Service Fabric currently supports deployment of Docker containers on Linux and Windows Server containers on Windows Server 2016. JavaScript. a container represents an application host in which multiple service replicas are placed. or any code (executables). 
that Service Fabric is a container orchestrator. Reduce impact of "noisy neighbors" services: You can use the resource governance ability of containers to restrict the resources that a service uses on a host. Container-to-container discovery and communication. Support for Hyper-V containers will be added in a future release. Repository authentication. query-like operation). As a next step. Next steps In this article. you learned about containers. Support for stateful services in Windows containers will be added in future release. Service Fabric has several container capabilities that help you build applications that are composed of microservices that are containerized. The following scenarios are supported for containers: Guest containers: This scenario is identical to the guest executable scenario where you can deploy existing applications in a container. we will go over examples of each of the features to show you how to use them. consider putting these services into containers that have resource governance.js. If your containerized service needs an endpoint for communication.com/ for example myrepo/myimage:v1 4. Choose Guest Container as the service template. When you use Visual Studio. The capabilities include: Container image deployment and activation Resource governance Repository authentication Container port-to-host port mapping Container-to-container discovery and communication Ability to configure and set environment variables Let's look at how each of capabilities works when you're packaging a containerized service to be included in your application. Use Visual Studio to package an existing container image Visual Studio provides a Service Fabric service template to help you deploy a container to a Service Fabric cluster.docker. 5. and type to the ServiceManifest. and click OK. you can now add the protocol. TIP The easiest way to package an existing container image into a service is to use Visual Studio. and create a Service Fabric application. Service Fabric has several container capabilities that help you with building applications that are composed of microservices that are containerized. For example: <Endpoint Name="MyContainerServiceEndpoint" Protocol="http" Port="80" UriScheme="http" PathSuffix="myapp/" Type="Input" /> . port. the application package structure and manifest files are created by the New Project template for you. Preview: Deploy a Windows container to Service Fabric 2/21/2017 • 8 min to read • Edit Online This article walks you through the process of building containerized services in Windows containers. Choose Image Name and provide the path to the image in your container repository such as at https://hub. 1. 3. Choose File > New Project. Package a Windows container When you package a container. NOTE This feature is in preview for Windows Server 2016.xml file. you can choose to use either a Visual Studio project template or create the application package manually. 2. Give your service a name. Edit the service manifest file. For an example application checkout the Service Fabric container code samples on GitHub Creating a Windows Server 2016 cluster To deploy your containerized application. You can now use the package and publish action against your local cluster if this is Windows Server 2016 with container support activated. 2. In the service manifest. a container represents an application host in which multiple service replicas are placed. Deploy and activate a container image In the Service Fabric application model. Edit the application manifest file. 9. 
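Because the Endpoint element above specifies a UriScheme, Service Fabric publishes the container's address to the Naming service, so other services in the cluster can resolve it before calling the container. The following C# sketch shows one way a caller might do that resolution; it assumes the Microsoft.ServiceFabric.Services package and a hypothetical service name fabric:/MyApp/MyContainerService, neither of which comes from the template above.

using System;
using System.Fabric;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Services.Client;

internal static class ContainerEndpointResolver
{
    // Resolves the address that the containerized service registered with the Naming service.
    public static async Task<string> GetContainerAddressAsync(CancellationToken cancellationToken)
    {
        ServicePartitionResolver resolver = ServicePartitionResolver.GetDefault();

        // "fabric:/MyApp/MyContainerService" is a placeholder for your own application and service names.
        ResolvedServicePartition partition = await resolver.ResolveAsync(
            new Uri("fabric:/MyApp/MyContainerService"),
            ServicePartitionKey.Singleton,
            cancellationToken);

        // The returned address lists the endpoints registered by the service (including the
        // MyContainerServiceEndpoint shown above); parse it, or use the reverse proxy instead.
        return partition.GetEndpoint().Address;
    }
}

Callers can also go through the reverse proxy, which is covered later in this article.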
"vmImageSku": { "defaultValue": "2016-Datacenter-with-Containers". 7."defaultValue": "WindowsServer" }. 3. 8. put the name of the container image into a ContainerHost element in the service manifest. If your container needs resource governance then add a ResourceGovernancePolicy . 4. By providing the UriScheme this automatically registers the container endpoint with the Service Fabric Naming service for discoverability. choose the Windows Server 2016 with Containers image option in Azure. Alternatively read Leok's blog post here on using Service Fabric and Windows containers. you can publish the application to a remote cluster or check in the solution to source control. you need to create a cluster running Windows Server 2016 with container support enabled. Create the package directory structure. You also need to configure the container port-to-host port mapping by specifying a PortBinding policy in the application manifest as described below."type": "string" }. The following partial manifest shows an example of how to deploy the container called myimage:v1 from a repository called myrepo : . 6."type": "string" }. See the article Create a Service Fabric cluster by using Azure Resource Manager. When ready. To deploy a cluster using ARM. Publish the containers to your repository. Then set the ImageName to be the name of the container repository and image. To deploy and activate a container. This can either be on your local development machine or deployed via Azure Resource Manager (ARM) in Azure. The port can either be fixed (as shown in the preceding example) or dynamically allocated (left blank and a port is allocated from the designated application port range) just as you would with any service. add a ContainerHost for the entry point. "vmImageVersion": { "defaultValue": "latest". Manually package and deploy a container image The process of manually packaging a containerized service is based on the following steps: 1. Ensure that you use the following ARM settings: "vmImageOffer": { "type": "string". If your container needs to authenticate with a private repository then add RepositoryCredentials . You can also use the 5 Node ARM template here to create a cluster. <ServiceManifestImport> <ServiceManifestRef ServiceManifestName="FrontendServicePackage" ServiceManifestVersion="1. support for specifying specific block IO limits such as IOPs. where the password was encrypted by using a certificate . <CodePackage Name="Code" Version="1.0"/> <Policies> <ResourceGovernancePolicy CodePackageRef="FrontendService.Code"> <RepositoryCredentials AccountName="TestUser" Password="12345" PasswordEncrypted="false"/> </ContainerHostPolicies> </Policies> </ServiceManifestImport> We recommend that you encrypt the password by using a certificate that's deployed to the machine. and others will be included. or SSH key. Understand resource governance Resource governance is a capability of the container that restricts the resources that the container can use on the host. read/write BPS. are used to specify the sign-in information. specified in the application manifest. for downloading the container image from the image repository.0"> <EntryPoint> <ContainerHost> <ImageName>myrepo/myimage:v1</ImageName> <Commands></Commands> </ContainerHost> </EntryPoint> </CodePackage> You can provide input commands by specifying the optional Commands element with a comma-delimited set of commands to run inside the container. The sign-in credentials. 
which is specified in the application manifest is used to declare resource limits for a service code package.0"/> <Policies> <ContainerHostPolicies CodePackageRef="FrontendService. The ResourceGovernancePolicy .Code" CpuShares="500" MemoryInMB="1024" MemorySwapInMB="4084" MemoryReservationInMB="1024" /> </Policies> </ServiceManifestImport> Authenticate a repository To download a container. Resource limits can be set for the following resources: Memory MemorySwap CpuShares (CPU relative weight) MemoryReservationInMB BlkioWeight (BlockIO relative weight). NOTE In a future release. The following example shows an account called TestUser along with the password in clear text (not recommended): <ServiceManifestImport> <ServiceManifestRef ServiceManifestName="FrontendServicePackage" ServiceManifestVersion="1. you might have to provide sign-in credentials to the container repository. The following example shows an account called TestUser. Communication is performed by providing the reverse proxy http listening port and the name of the services that you want to communicate with as environment variables. You can use the Invoke-ServiceFabricEncryptText PowerShell command to create the secret cipher text for the password. when Service Fabric deploys the service package to the machine. For more information. By using the secret along with the account name. It can also specify no port at all.Code"> <PortBinding ContainerPort="8905" EndpointRef="Endpoint1"/> </ContainerHostPolicies> </Policies> </ServiceManifestImport> By registering with the Naming service. port 80).Code"> <RepositoryCredentials AccountName="TestUser" Password="[Put encrypted password here using MyCert certificate ]" PasswordEncrypted="true"/> </ContainerHostPolicies> </Policies> </ServiceManifestImport> Configure container port-to-host port mapping You can configure a host port used to communicate with the container by specifying a PortBinding in the application manifest.Code"> <PortBinding ContainerPort="8905"/> </ContainerHostPolicies> </Policies> </ServiceManifestImport> Configure container-to-container discovery and communication By using the PortBinding policy. see the next section. The private key of the certificate that's used to decrypt the password must be deployed to the local machine in an out-of-band method. (In Azure. it can then authenticate with the container repository. <ServiceManifestImport> <ServiceManifestRef ServiceManifestName="FrontendServicePackage" ServiceManifestVersion="1. For more information.0"/> <Policies> <ContainerHostPolicies CodePackageRef="FrontendService. Service Fabric can automatically publish this endpoint to the Naming service.) Then. using the Endpoint tag in the service manifest of a guest container. Other services that are running in the cluster can thus discover this container using the REST queries for resolving. . you can map a container port to an Endpoint in the service manifest as shown in the following example. in which case a random port from the cluster's application port range is chosen for you. The port binding maps the port to which the service is listening inside the container to a port on the host. The endpoint Endpoint1 can specify a fixed port (for example. see the article Managing secrets in Service Fabric applications. it can decrypt the secret. you can easily do container-to-container communication in the code within your container by using the reverse proxy.called MyCert. If you specify an endpoint.0"/> <Policies> <ContainerHostPolicies CodePackageRef="FrontendService. 
this method is Azure Resource Manager. <ServiceManifestImport> <ServiceManifestRef ServiceManifestName="FrontendServicePackage" ServiceManifestVersion="1. <ServiceManifestImport> <ServiceManifestRef ServiceManifestName="FrontendServicePackage" ServiceManifestVersion="1.0"/> <Policies> <ContainerHostPolicies CodePackageRef="FrontendService. These environment variable values can be overridden specifically in the application manifest or specified during deployment as application parameters.microsoft. The following service manifest XML snippet shows an example of how to specify environment variables for a code package: <ServiceManifest Name="FrontendServicePackage" Version="1. Complete examples for application and service manifest An example application manifest follows: .0"> <EntryPoint> <ContainerHost> <ImageName>myrepo/myimage:v1</ImageName> <Commands></Commands> </ContainerHost> </EntryPoint> <EnvironmentVariables> <EnvironmentVariable Name="HttpGatewayPort" Value=""/> <EnvironmentVariable Name="BackendServiceName" Value=""/> </EnvironmentVariables> </CodePackage> </ServiceManifest> These environment variables can be overridden at the application manifest level: <ServiceManifestImport> <ServiceManifestRef ServiceManifestName="FrontendServicePackage" ServiceManifestVersion="1.Code"> <EnvironmentVariable Name="BackendServiceName" Value="[BackendSvc]"/> <EnvironmentVariable Name="HttpGatewayPort" Value="19080"/> </EnvironmentOverrides> </ServiceManifestImport> In the previous example. These settings enable you to specify the value for BackendServiceName value when you deploy the application and not have a fixed value in the manifest. both for services that are deployed in containers or for services that are deployed as processes/guest executables.0" xmlns="http://schemas.w3.Configure and set environment variables Environment variables can be specified for each code package in the service manifest.com/2011/01/fabric" xmlns:xsi="http://www. we specified an explicit value for the HttpGateway environment variable (19000).0"/> <EnvironmentOverrides CodePackageRef="FrontendService.Code" Version="1. while we set the value for BackendServiceName parameter via the [BackendSvc] application parameter.org/2001/XMLSchema-instance"> <Description>a guest executable service in a container</Description> <ServiceTypes> <StatelessServiceType ServiceTypeName="StatelessFrontendService" UseImplicitHost="true"/> </ServiceTypes> <CodePackage Name="FrontendService. 0" xmlns="http://schemas.org/2001/XMLSchema-instance"> <Description> A service that implements a stateless front end in a container</Description> <ServiceTypes> <StatelessServiceType ServiceTypeName="StatelessFrontendService" UseImplicitHost="true"/> </ServiceTypes> <CodePackage Name="FrontendService.0"/> <EnvironmentOverrides CodePackageRef="FrontendService. 
<ApplicationManifest ApplicationTypeName="SimpleContainerApp" ApplicationTypeVersion="1.Code"> <RepositoryCredentials AccountName="username" Password="****" PasswordEncrypted="true"/> <PortBinding ContainerPort="8905" EndpointRef="Endpoint1"/> </ContainerHostPolicies> </Policies> </ServiceManifestImport> </ApplicationManifest> An example service manifest (specified in the preceding application manifest) follows: <ServiceManifest Name="FrontendServicePackage" Version="1.org/2001/XMLSchema-instance"> <Description>A simple service container application</Description> <Parameters> <Parameter Name="ServiceInstanceCount" DefaultValue="3"></Parameter> <Parameter Name="BackEndSvcName" DefaultValue="bkend"></Parameter> </Parameters> <ServiceManifestImport> <ServiceManifestRef ServiceManifestName="FrontendServicePackage" ServiceManifestVersion="1.Code"> <EnvironmentVariable Name="BackendServiceName" Value="[BackendSvcName]"/> <EnvironmentVariable Name="HttpGatewayPort" Value="19080"/> </EnvironmentOverrides> <Policies> <ResourceGovernancePolicy CodePackageRef="Code" CpuShares="500" MemoryInMB="1024" MemorySwapInMB="4084" MemoryReservationInMB="1024" /> <ContainerHostPolicies CodePackageRef="FrontendService.Data" Version="1.w3.0" /> <Resources> <Endpoints> <Endpoint Name="Endpoint1" UriScheme="http" Port="80" Protocol="http"/> </Endpoints> </Resources> </ServiceManifest> Next steps Now that you have deployed a containerized service.0" /> <DataPackage Name="FrontendService. learn how to manage its lifecycle by reading Service Fabric application lifecycle.microsoft.Code" Version="1.Config" Version="1.w3.com/2011/01/fabric" xmlns:xsi="http://www.0"> <EntryPoint> <ContainerHost> <ImageName>myrepo/myimage:v1</ImageName> <Commands></Commands> </ContainerHost> </EntryPoint> <EnvironmentVariables> <EnvironmentVariable Name="HttpGatewayPort" Value=""/> <EnvironmentVariable Name="BackendServiceName" Value=""/> </EnvironmentVariables> </CodePackage> <ConfigPackage Name="FrontendService.0" xmlns="http://schemas.microsoft. Overview of Service Fabric and containers For an example application checkout the Service Fabric container code samples on GitHub .com/2011/01/fabric" xmlns:xsi="http://www. type yo azuresfguest . Deploy a Docker container to Service Fabric 3/29/2017 • 7 min to read • Edit Online This article walks you through building containerized services in Docker containers on Linux. The image parameter takes the form [repo]/[image name] . These services are called containerized services. You can add more services later by editing the generated manifest files. 3. Provide the URL for the container image from a DockerHub repo.https://get.for example. you can choose either to use a yeoman template or create the application package manually. Name your application . each with a specific role in delivering the application's functionality. A Service Fabric application can contain one or more containers. choose Container. docker is already installed): sudo apt-get install wget wget -qO. 2. For the framework. The capabilities include. Container image deployment and activation Resource governance Repository authentication Container port to host port mapping Container-to-container discovery and communication Ability to configure and set environment variables Packaging a docker container with yeoman When packaging a container on Linux. Service Fabric has several container capabilities that help you with building applications that are composed of microservices that are containerized.io/ | sh Create the application 1. 
The Service Fabric SDK for Linux includes a Yeoman generator that makes it easy to create your application and add a container image. Install Docker on your development box Run the following commands to install docker on your Linux development box (if you are using the vagrant image on OSX.docker. SimpleContainerApp 4. In a terminal. Let's use Yeoman to create an application with a single Docker container called SimpleContainerApp. 3. Connect to the local Service Fabric cluster./install. you can deploy it to the local cluster using the Azure CLI. Expand the Applications node and note that there is now an entry for your application type and another for the first instance of that type. . Use the install script provided in the template to copy the application package to the cluster's image store. . 1. checkout the Service Fabric container code samples on GitHub Adding more services to an existing application To add another container service to an application already created using yo .sh 1.Deploy the application Once the application is built. Open a browser and navigate to Service Fabric Explorer at http://localhost:19080/Explorer (replace localhost with the private IP of the VM if using Vagrant on Mac OS X). 2. Use the uninstall script provided in the template to delete the application instance and unregister the application type. and create an instance of the application. register the application type./uninstall. perform the following steps: .sh For an example application. azure servicefabric cluster connect 1. The following partial manifest shows an example of how to deploy the container called myimage:v1 from a repository called myrepo : <CodePackage Name="Code" Version="1.1. put the name of the container image into a ContainerHost element in the service manifest. Create the package directory structure. NOTE In a future release. Run yo azuresfguest:AddService Manually package and deploy a container image The process of manually packaging a containerized service is based on the following steps: 1. a container represents an application host in which multiple service replicas are placed. read/write BPS. Change directory to the root of the existing application. Resource limits can be set for the following resources: Memory MemorySwap CpuShares (CPU relative weight) MemoryReservationInMB BlkioWeight (BlockIO relative weight). which is specified in the application manifest is used to declare resource limits for a service code package. if MyApplication is the application created by Yeoman. support for specifying specific block IO limits such as IOPs. The ResourceGovernancePolicy . and others will be included. 4. add a ContainerHost for the entry point. Publish the containers to your repository. cd ~/YeomanSamples/MyApplication . . For example. Edit the application manifest file. Understand resource governance Resource governance is a capability of the container that restricts the resources that the container can use on the host. Edit the service manifest file. Deploy and activate a container image In the Service Fabric application model.0"> <EntryPoint> <ContainerHost> <ImageName>myrepo/myimage:v1</ImageName> <Commands></Commands> </ContainerHost> </EntryPoint> </CodePackage> You can provide input commands by specifying the optional Commands element with a comma-delimited set of commands to run inside the container. 2. To deploy and activate a container. 2. 3. In the service manifest. Then set the ImageName to be the name of the container repository and image. ) Then. For more information. 
specified in the application manifest.0"/> <Policies> <ContainerHostPolicies CodePackageRef="FrontendService. for downloading the container image from the image repository. By using the secret along with the account name. <ServiceManifestImport> <ServiceManifestRef ServiceManifestName="FrontendServicePackage" ServiceManifestVersion="1. . or SSH key.Code"> <RepositoryCredentials AccountName="TestUser" Password="12345" PasswordEncrypted="false"/> </ContainerHostPolicies> </Policies> </ServiceManifestImport> We recommend that you encrypt the password by using a certificate that's deployed to the machine. it can decrypt the secret.Code" CpuShares="500" MemoryInMB="1024" MemorySwapInMB="4084" MemoryReservationInMB="1024" /> </Policies> </ServiceManifestImport> Authenticate a repository To download a container. The sign-in credentials. it can then authenticate with the container repository. The port binding maps the port to which the service is listening inside the container to a port on the host. The private key of the certificate that's used to decrypt the password must be deployed to the local machine in an out-of-band method. where the password was encrypted by using a certificate called MyCert.Code"> <RepositoryCredentials AccountName="TestUser" Password="[Put encrypted password here using MyCert certificate ]" PasswordEncrypted="true"/> </ContainerHostPolicies> </Policies> </ServiceManifestImport> Configure container port-to-host port mapping You can configure a host port used to communicate with the container by specifying a PortBinding in the application manifest. The following example shows an account called TestUser. this method is Azure Resource Manager. you might have to provide sign-in credentials to the container repository.0"/> <Policies> <ContainerHostPolicies CodePackageRef="FrontendService. when Service Fabric deploys the service package to the machine. You can use the Invoke-ServiceFabricEncryptText PowerShell command to create the secret cipher text for the password. see the article Managing secrets in Service Fabric applications. are used to specify the sign-in information.0"/> <Policies> <ResourceGovernancePolicy CodePackageRef="FrontendService. <ServiceManifestImport> <ServiceManifestRef ServiceManifestName="FrontendServicePackage" ServiceManifestVersion="1. (In Azure. The following example shows an account called TestUser along with the password in clear text (not recommended): <ServiceManifestImport> <ServiceManifestRef ServiceManifestName="FrontendServicePackage" ServiceManifestVersion="1. Code"> <PortBinding ContainerPort="8905"/> </ContainerHostPolicies> </Policies> </ServiceManifestImport> Configure container-to-container discovery and communication By using the PortBinding policy. you can easily do container-to-container communication in the code within your container by using the reverse proxy. For more information.0"/> <Policies> <ContainerHostPolicies CodePackageRef="FrontendService. both for services that are deployed in containers or for services that are deployed as processes/guest executables. These environment variable values can be overridden specifically in the application manifest or specified during deployment as application parameters. The endpoint Endpoint1 can specify a fixed port (for example. Other services that are running in the cluster can thus discover this container using the REST queries for resolving. It can also specify no port at all. 
The following service manifest XML snippet shows an example of how to specify environment variables for a code package: . see the next section.Code"> <PortBinding ContainerPort="8905" EndpointRef="Endpoint1"/> </ContainerHostPolicies> </Policies> </ServiceManifestImport> By registering with the Naming service.0"/> <Policies> <ContainerHostPolicies CodePackageRef="FrontendService. port 80). in which case a random port from the cluster's application port range is chosen for you. you can map a container port to an Endpoint in the service manifest. using the Endpoint tag in the service manifest of a guest container. Communication is performed by providing the reverse proxy http listening port and the name of the services that you want to communicate with as environment variables. Configure and set environment variables Environment variables can be specified for each code package in the service manifest. If you specify an endpoint. <ServiceManifestImport> <ServiceManifestRef ServiceManifestName="FrontendServicePackage" ServiceManifestVersion="1. Service Fabric can automatically publish this endpoint to the Naming service. <ServiceManifestImport> <ServiceManifestRef ServiceManifestName="FrontendServicePackage" ServiceManifestVersion="1. Code"> <EnvironmentVariable Name="BackendServiceName" Value="[BackendSvc]"/> <EnvironmentVariable Name="HttpGatewayPort" Value="19080"/> </EnvironmentOverrides> </ServiceManifestImport> In the previous example. Complete examples for application and service manifest An example application manifest follows: . we specified an explicit value for the HttpGateway environment variable (19000).org/2001/XMLSchema-instance"> <Description>a guest executable service in a container</Description> <ServiceTypes> <StatelessServiceType ServiceTypeName="StatelessFrontendService" UseImplicitHost="true"/> </ServiceTypes> <CodePackage Name="FrontendService.com/2011/01/fabric" xmlns:xsi="http://www. <ServiceManifest Name="FrontendServicePackage" Version="1.0" xmlns="http://schemas.0"> <EntryPoint> <ContainerHost> <ImageName>myrepo/myimage:v1</ImageName> <Commands></Commands> </ContainerHost> </EntryPoint> <EnvironmentVariables> <EnvironmentVariable Name="HttpGatewayPort" Value=""/> <EnvironmentVariable Name="BackendServiceName" Value=""/> </EnvironmentVariables> </CodePackage> </ServiceManifest> These environment variables can be overridden at the application manifest level: <ServiceManifestImport> <ServiceManifestRef ServiceManifestName="FrontendServicePackage" ServiceManifestVersion="1.w3.Code" Version="1.microsoft. These settings enable you to specify the value for BackendServiceName value when you deploy the application and not have a fixed value in the manifest. while we set the value for BackendServiceName parameter via the [BackendSvc] application parameter.0"/> <EnvironmentOverrides CodePackageRef="FrontendService. w3.Data" Version="1.org/2001/XMLSchema-instance"> <Description>A simple service container application</Description> <Parameters> <Parameter Name="ServiceInstanceCount" DefaultValue="3"></Parameter> <Parameter Name="BackEndSvcName" DefaultValue="bkend"></Parameter> </Parameters> <ServiceManifestImport> <ServiceManifestRef ServiceManifestName="FrontendServicePackage" ServiceManifestVersion="1.org/2001/XMLSchema-instance"> <Description> A service that implements a stateless front end in a container</Description> <ServiceTypes> <StatelessServiceType ServiceTypeName="StatelessFrontendService" UseImplicitHost="true"/> </ServiceTypes> <CodePackage Name="FrontendService. 
<ApplicationManifest ApplicationTypeName="SimpleContainerApp" ApplicationTypeVersion="1.Code" Version="1. Overview of Service Fabric and containers Interacting with Service Fabric clusters using the Azure CLI .com/2011/01/fabric" xmlns:xsi="http://www.Code"> <EnvironmentVariable Name="BackendServiceName" Value="[BackendSvcName]"/> <EnvironmentVariable Name="HttpGatewayPort" Value="19080"/> </EnvironmentOverrides> <Policies> <ResourceGovernancePolicy CodePackageRef="Code" CpuShares="500" MemoryInMB="1024" MemorySwapInMB="4084" MemoryReservationInMB="1024" /> <ContainerHostPolicies CodePackageRef="FrontendService.0" /> <DataPackage Name="FrontendService.microsoft.0" xmlns="http://schemas.0"> <EntryPoint> <ContainerHost> <ImageName>myrepo/myimage:v1</ImageName> <Commands></Commands> </ContainerHost> </EntryPoint> <EnvironmentVariables> <EnvironmentVariable Name="HttpGatewayPort" Value=""/> <EnvironmentVariable Name="BackendServiceName" Value=""/> </EnvironmentVariables> </CodePackage> <ConfigPackage Name="FrontendService.microsoft. learn how to manage its lifecycle by reading Service Fabric application lifecycle.w3.com/2011/01/fabric" xmlns:xsi="http://www.0"/> <EnvironmentOverrides CodePackageRef="FrontendService.0" xmlns="http://schemas.0" /> <Resources> <Endpoints> <Endpoint Name="Endpoint1" UriScheme="http" Port="80" Protocol="http"/> </Endpoints> </Resources> </ServiceManifest> Next steps Now that you have deployed a containerized service.Config" Version="1.Code"> <RepositoryCredentials AccountName="username" Password="****" PasswordEncrypted="true"/> <PortBinding ContainerPort="8905" EndpointRef="Endpoint1"/> </ContainerHostPolicies> </Policies> </ServiceManifestImport> </ApplicationManifest> An example service manifest (specified in the preceding application manifest) follows: <ServiceManifest Name="FrontendServicePackage" Version="1. For more information on the Reliable Actors programming model. Watch this Microsoft Virtual Academy video for an overview of Reliable services: .and giving them access to many other capabilities. A pluggable communication model. Reliable Services provide some great out-of-the-box options you can use. For stateful services. Reliable Services is one of the programming models available on Service Fabric. services needed external systems for Reliable state management. Traditionally. Use the transport of your choice. custom TCP protocols. via Service Fabric application management. Your code has a well-defined entry point and easily managed lifecycle. see Introduction to Service Fabric Reliable Actors. all from a first class programming model in several programming languages. What are Reliable Services? Reliable Services gives you a simple. A simple model for running your own code that looks like programming models you are used to. (optionally) use the Reliable Collections ... the Reliable Services programming model allows you to consistently and reliably store your state right inside your service by using Reliable Collections. from provisioning and deployment through upgrade and deletion. This topic covers: The Reliable Services programming model for stateless and stateful services. or anything else. This model also improves latency because you are co-locating the compute and state it needs to function. With the Reliable Services programming model. The other is the Reliable Actor programming model. you get: Access to the rest of the Service Fabric programming APIs. Some scenarios and examples of when to use Reliable Services and how they are written. 
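To round out the two container walkthroughs above, here is a hedged C# sketch of the container-to-container communication pattern they describe: code inside a container reads the HttpGatewayPort and BackendServiceName environment variables declared in the manifests and calls the sibling service through the reverse proxy. The application name SimpleContainerApp matches the manifest examples, but the request path and the assumption that the container runs .NET code are illustrative only.

using System;
using System.Net.Http;
using System.Threading.Tasks;

internal static class BackendClient
{
    // Calls a sibling service through the Service Fabric reverse proxy, using the
    // environment variables that the service manifest declares for this code package.
    public static async Task<string> GetFromBackendAsync()
    {
        // Values come from the EnvironmentVariables/EnvironmentOverrides sections shown above.
        string gatewayPort = Environment.GetEnvironmentVariable("HttpGatewayPort");
        string backendName = Environment.GetEnvironmentVariable("BackendServiceName");

        // Reverse proxy address format: http://localhost:<port>/<ApplicationName>/<ServiceName>/<path>
        // "api/values" is a placeholder path for this sketch.
        string address = $"http://localhost:{gatewayPort}/SimpleContainerApp/{backendName}/api/values";

        using (HttpClient client = new HttpClient())
        {
            return await client.GetStringAsync(address);
        }
    }
}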
Reliable Services overview 4/7/2017 • 9 min to read • Edit Online

Azure Service Fabric simplifies writing and managing stateless and stateful Reliable Services. Reliable Services is one of the programming models available on Service Fabric. The other is the Reliable Actor programming model, which provides a virtual Actor programming model on top of the Reliable Services model. For more information on the Reliable Actors programming model, see Introduction to Service Fabric Reliable Actors.

Service Fabric manages the lifetime of services, from provisioning and deployment through upgrade and deletion, via Service Fabric application management.

This topic covers:
The Reliable Services programming model for stateless and stateful services.
The choices you have to make when writing a Reliable Service.
Some scenarios and examples of when to use Reliable Services and how they are written.

What are Reliable Services?

Reliable Services gives you a simple, powerful, top-level programming model to help you express what is important to your application. With the Reliable Services programming model, you get:

Access to the rest of the Service Fabric programming APIs. Unlike Service Fabric services modeled as guest executables, Reliable Services get to use the rest of the Service Fabric APIs directly. This allows services to: query the system, report health about entities in the cluster, receive notifications about configuration and code changes, find and communicate with other services, and (optionally) use the Reliable Collections, giving them access to many other capabilities, all from a first class programming model in several programming languages. Services can also be created and deleted dynamically via code.

A simple model for running your own code that looks like programming models you are used to. Your code has a well-defined entry point and easily managed lifecycle.

A pluggable communication model. Use the transport of your choice, such as HTTP with Web API, WebSockets, custom TCP protocols, or anything else. Reliable Services provide some great out-of-the-box options you can use, or you can provide your own.

For stateful services, the Reliable Services programming model allows you to consistently and reliably store your state right inside your service by using Reliable Collections. Reliable Collections are a simple set of highly available and reliable collection classes that will be familiar to anyone who has used C# collections. Traditionally, services needed external systems for reliable state management. With Reliable Collections, you can store your state next to your compute with the same high availability and reliability you've come to expect from highly available external stores, and your state is preserved even in the presence of network or other failures. This model also improves latency because you are co-locating the compute and the state it needs to function.

Watch this Microsoft Virtual Academy video for an overview of Reliable services:
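As a concrete illustration of the "well-defined entry point and easily managed lifecycle" described above, here is a minimal stateless Reliable Service sketch in C#. The class name is a placeholder, and registration of the service type (covered in the getting-started articles that follow) is omitted.

using System;
using System.Fabric;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Services.Runtime;

// A minimal stateless Reliable Service: the platform calls RunAsync once the instance
// is opened, and signals the cancellation token when the instance must close.
internal sealed class MyStatelessService : StatelessService
{
    public MyStatelessService(StatelessServiceContext context)
        : base(context)
    {
    }

    protected override async Task RunAsync(CancellationToken cancellationToken)
    {
        while (!cancellationToken.IsCancellationRequested)
        {
            // Do one unit of background work per iteration, then yield.
            await Task.Delay(TimeSpan.FromSeconds(1), cancellationToken);
        }
    }
}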
Service Fabric provides reliability. read on! If you're looking for a detailed walkthrough of the lifecycle of reliable services. If you're learning about reliable services for the first time. Reliable Services provide a simple lifecycle that lets you quickly plug in your code and get started.Your service stays up even in unreliable environments where your machines fail or hit network issues. say in response to customer requests. and they can grow or shrink as necessary through the addition or removal of hardware or other resources.Services are decoupled from specific hardware. To ensure that even if something fails the image isn't lost. Most services today store their state externally. they externalize their state to some other store. To do this. In this case. In this case. n2)") that define the calculator's public API. NOTE Support for Stateful Reliable Services is not available on Linux yet (for C# or Java). Then. Any state that is present is entirely disposable and doesn't require synchronization. the service stores it in a IReliableQueue . This reduces the amount of state the service has to manage. since there is no background task-processing that the service needs to do. In this case. and the calculator service performs the operations on the data provided and returns the result. replication. but increases complexity since the service has to keep the necessary metadata . Let's say we want to write a service that processes images. When the stateless service receives a response. This stateless service receives the call and determines whether the call is from a trusted party and which service it's destined for. Not storing any internal state makes this example calculator simple. and returns some id to the client so it can track the request. The front-end service then talks to stateful services to complete a user request. consider a calculator that has no memory and receives all terms and operations to perform at once. When a call is made from a client.) A common example of how stateless services are used in Service Fabric is as a front-end that exposes the public- facing API for a web application. such as 80. RunAsync() could be more complex. the service takes in an image and the series of conversions to perform on that image. Service Fabric takes care of these requirements for both the service code and the service state. The results get stored in an IReliableDictionary so that when the client comes back they can get their converted images. An example of such a service is in our samples C# / Java. the appropriate method is invoked. where the stateless service is listening. When it receives a request. and store the result all in a single transaction. persistence. (For example. any web app that relies on keeping session state in a backing store or cache is not stateless. In this service. it must have the current set of incoming requests it needs to process and the current average. This listening endpoint hooks up to the different calculation methods (example: "Add(n1. scalability. it replies to the original client. To do this. Stateful Reliable Services A stateful service is one that must have some portion of state kept consistent and present in order for the service to function. it returns an ICommunicationListener (C#) or CommunicationListener (Java) (for example Web API) that opens up a listening endpoint on some port. Any service that retrieves. this Reliable Service would pull out of the queue. 
Consider a service that constantly computes a rolling average of some value based on updates it receives. This service returns a communication listener (let's suppose it's a WebAPI) that exposes an API like ConvertImage(Image i. In Service Fabric. IList<Conversion> conversions) . It just keeps its state in the external state store. and consistency for that state. the stateless service forwards the call to the correct partition of the stateful service and waits for a response. When the calculator service is created. It doesn't store any state. This is only one example of this pattern in the samples. services aren't required to store their state externally. perform the conversions. the RunAsync() (C#) or runAsync() (Java) of the service can be empty. the message is removed from the queue and the results are stored in the result dictionary only when the conversions are complete. availability.A stateless service is one where there is no state maintained within the service across calls. the service could pull the image out of the queue and immediately store it in a remote store. since the external store is what provides reliability. For example. or high availability. Alternatively. But most services aren't truly stateless. and stores information in an external store (such as an Azure blob or table store today) is stateful. calls from clients are directed to a known port. processes. The service has a loop inside its RunAsync() that pulls requests out of IReliableQueue and performs the conversions requested. Instead. there are others in other samples as well. available. Your application’s state can be naturally modeled as Reliable Dictionaries and Queues. When to use Reliable Services APIs If any of the following characterize your application service needs. Your application needs to dynamically create or destroy Reliable Dictionaries or Queues or whole Services at runtime. and are highly reliable. then you should consider Reliable Services APIs: You want your service's code (and optionally state) to be highly available and reliable You need transactional guarantees across multiple units of state (for example. Next steps Reliable Services quick start Reliable Services advanced usage The Reliable Actors programming model . Your application needs to maintain change history for its units of state.to manage the remote store. Your code needs a free-threaded runtime environment. Your application needs to control the concurrency or granularity of transacted operations across one or more Reliable Collections. custom state providers.NET service! The only difference is that the data structures being used ( IReliableQueue and IReliableDictionary ) are provided by Service Fabric. Your applications code or state needs to be highly available with low latency reads and writes. and consistent. You want to manage the communications or control the partitioning scheme for your service. You want to develop or consume third-party-developed. if something failed in the middle the request remains in the queue waiting to be processed. You need to programmatically control Service Fabric-provided backup and restore features for your service’s state. orders and order line items). One thing to note about this service is that it sounds like a normal . With either approach. state must be persisted to an external store. A service instance has a name in the form of a URI using the "fabric:/" scheme. Named service instance: To run your service. In this type of service. 
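Before moving on to the getting-started walkthroughs, here is a hedged C# sketch of the image-conversion pattern described in the stateful example above: a request is dequeued from a Reliable Queue, processed, and its result written to a Reliable Dictionary, all in one transaction so that a failure leaves the request in the queue. The ImageRequest type, the collection names, and the ConvertAsync helper are placeholders, not the actual sample code.

using System;
using System.Fabric;
using System.Runtime.Serialization;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Data;
using Microsoft.ServiceFabric.Data.Collections;
using Microsoft.ServiceFabric.Services.Runtime;

[DataContract]
public class ImageRequest
{
    [DataMember] public string Id { get; set; }
    [DataMember] public byte[] Image { get; set; }
}

internal sealed class ImageConversionService : StatefulService
{
    public ImageConversionService(StatefulServiceContext context)
        : base(context)
    {
    }

    protected override async Task RunAsync(CancellationToken cancellationToken)
    {
        var requests = await this.StateManager.GetOrAddAsync<IReliableQueue<ImageRequest>>("requests");
        var results = await this.StateManager.GetOrAddAsync<IReliableDictionary<string, byte[]>>("results");

        while (!cancellationToken.IsCancellationRequested)
        {
            using (ITransaction tx = this.StateManager.CreateTransaction())
            {
                ConditionalValue<ImageRequest> item = await requests.TryDequeueAsync(tx);
                if (item.HasValue)
                {
                    byte[] converted = await ConvertAsync(item.Value);
                    await results.AddOrUpdateAsync(tx, item.Value.Id, converted, (key, old) => converted);
                }

                // The dequeue and the result write commit together; if anything fails before
                // CommitAsync, the request stays in the queue and no partial result is stored.
                await tx.CommitAsync();
            }

            await Task.Delay(TimeSpan.FromMilliseconds(100), cancellationToken);
        }
    }

    private static Task<byte[]> ConvertAsync(ImageRequest request)
    {
        // Placeholder for the real conversion work described in the article.
        return Task.FromResult(request.Image);
    }
}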
Launch Visual Studio 2015 or Visual Studio 2017 as an administrator. Create a stateless service A stateless service is a type of service that is currently the norm in cloud applications. Get started with Reliable Services 3/9/2017 • 9 min to read • Edit Online An Azure Service Fabric application contains one or more services that run your code. The service host is just a process where instances of your service can run. such as Azure Tables or a SQL database. This Microsoft Virtual Academy video also shows you how to create a stateless Reliable service: Basic concepts To get started with Reliable Services. all of its internal state is lost. you only need to understand a few basic concepts: Service type: This is your service implementation. Service host: The named service instances you create need to run inside a host process. such as "fabric:/MyApp/MyService". for it to be made highly available and reliable. Service registration: Registration brings everything together. and create a new Service Fabric application project named HelloWorld: . It is considered stateless because the service itself does not contain data that needs to be stored reliably or made highly available. This guide shows you how to create both stateless and stateful Service Fabric applications with Reliable Services. you create named instances of your service type. along with a name and a version number. The service type must be registered with the Service Fabric runtime in a service host to allow Service Fabric to create instances of it to run. It is defined by the class you write that extends StatelessService and any other code or dependencies used therein. If an instance of a stateless service shuts down. much like you create object instances of a class type. It contains the stateless service implementation. as well as a number of PowerShell scripts that help you to deploy your application. It also contains the application manifest that describes the application. This is the application project that contains your services. HelloWorldStateless.Then create a stateless service project named HelloWorldStateless: Your solution now contains two projects: HelloWorld. . This is the service project. ServiceMessage(this. } A communication entry point where you can plug in your communication stack of choice. The service API provides two entry points for your code: An open-ended entry point method.Delay(TimeSpan. In Service Fabric. called RunAsync.FromSeconds(1). a service can run any business logic. The project template includes a sample implementation of RunAsync() that increments a rolling count. await Task. This is where you can start receiving requests from users and other services. This can happen for various reasons. long iterations = 0. where you can begin executing any workloads. protected override IEnumerable<ServiceInstanceListener> CreateServiceInstanceListeners() { ... see Service Fabric Web API services with OWIN self-hosting RunAsync protected override async Task RunAsync(CancellationToken cancellationToken) { // TODO: Replace the following sample code with your own logic // or remove this RunAsync override if it's not needed in your service. } } The platform calls this method when an instance of a service is placed and ready to execute. } In this tutorial.ThrowIfCancellationRequested(). A cancellation token is provided to coordinate when your service instance needs to be closed. cancellationToken). This is where you can immediately start running your code. 
For a stateless service.Implement the service Open the HelloWorldStateless. In Service Fabric. including long-running compute workloads. that simply means when the service instance is opened. Faults occur in your code.. "Working-{0}".Current. NOTE For details about how to work with a communication stack. such as ASP. ++iterations).. . we will focus on the RunAsync() entry point method.NET Core. this open/close cycle of a service instance can occur many times over the lifetime of the service as a whole. protected override async Task RunAsync(CancellationToken cancellationToken) { . ServiceEventSource. including: The system moves your service instances for resource balancing.cs file in the service project. while (true) { cancellationToken. The application or system is upgraded. RunAsync() should not block synchronously. the count is stored in a local variable.Delay() is used. a Task-returning await Task. In this stateless service example. finish any work. If your workload must block synchronously. In the same HelloWorld application.Run() in your RunAsync implementation. cancellation. you should schedule a new Task with Task. The system will wait for your task to end (by successful completion. or fault) before it moves on. you need a stateful service. State is made highly available by Service Fabric without the need to persist state to an external store. even when the service moves or restarts. To convert a counter value from stateless to highly available and persistent. . Note in the while(true) loop in the previous example. the value that's stored exists only for the current lifecycle of its service instance. The underlying hardware experiences an outage. But because this is a stateless service. When the service moves or restarts. Your implementation of RunAsync should return a Task or await on any long-running or blocking operations to allow the runtime to continue. co-located with the code that's using it. It is important to honor the cancellation token. and exit RunAsync() as quickly as possible when the system requests cancellation. Create a stateful service Service Fabric introduces a new kind of service that is stateful. you can add a new service by right-clicking on the Services references in the application project and selecting Add -> New Service Fabric Service. Cancellation of your workload is a cooperative effort orchestrated by the provided cancellation token. the value is lost. This orchestration is managed by the system to keep your service highly available and properly balanced. A stateful service can maintain state reliably within the service itself. which contains the following RunAsync method: . The main difference is the availability of a state provider that can store state reliably.Select Stateful Service and name it HelloWorldStateful. Service Fabric comes with a state provider implementation called Reliable Collections. which lets you create replicated data structures through the Reliable State Manager. A stateful Reliable Service uses this state provider by default. A stateful service has the same entry points as a stateless service. Your application should now have two services: the stateless service HelloWorldStateless and the stateful service HelloWorldStateful. Click OK.cs in HelloWorldStateful. Open HelloWorldStateful. the platform performs additional work on your behalf before it executes RunAsync() . Objects are replicated for high availability when you commit transactions on Reliable Collections. // If an exception is thrown before calling CommitAsync. 
It is important that you do not mutate local instances of those objects without performing an update operation on the reliable collection in a transaction. This is because changes to local instances of objects will not be replicated automatically. Objects stored in Reliable Collections are kept in local memory in your service. } await Task. cancellationToken).ThrowIfCancellationRequested().StateManager. However. 0. "Current Counter Value: {0}". await myDictionary. await tx. Reliable Collections use DataContract for serialization.FromSeconds(1). This work can include ensuring that the Reliable State Manager and Reliable Collections are ready to use. IReliableDictionary is a dictionary implementation that you can use to reliably store state in the service. long>> ("myDictionary"). This means that you have a local reference to the object. using (var tx = this. while (true) { cancellationToken.AddOrUpdateAsync(tx. This means that everything that is stored in Reliable Collections must be serializable. "Counter".HasValue ? result. It also provides an API that abstracts away the complexities of managing those replicas and their state transitions. you can store data directly in your service without the need for an external persistent store. long>>("myDictionary")."). and nothing is saved to the secondary replicas. value) => ++value). the transaction aborts. with a couple of caveats: Service Fabric makes your state highly available by replicating state across nodes.TryGetValueAsync(tx.ServiceMessage(this.CommitAsync().StateManager. protected override async Task RunAsync(CancellationToken cancellationToken) { // TODO: Replace the following sample code with your own logic // or remove this RunAsync override if it's not needed in your service. Reliable Collections make your data highly available. Service Fabric accomplishes this by creating and managing multiple replicas of your service for you. With Service Fabric and Reliable Collections.Delay(TimeSpan.StateManager. Reliable Collections can store any . } RunAsync RunAsync() operates similarly in stateful and stateless services. so it's important to make sure that your types are supported by the Data Contract Serializer when you use the default serializer. You must re-insert the object back into the dictionary or use one of the update methods on the dictionary. var myDictionary = await this. Reliable Collections and the Reliable State Manager var myDictionary = await this. all changes are // discarded.GetOrAddAsync<IReliableDictionary<string. ServiceEventSource. .Value.ToString() : "Value does not exist. "Counter").CreateTransaction()) { var result = await myDictionary. and Reliable Collections store your data to local disk on each replica. in a stateful service.Current.NET type. (key. result.GetOrAddAsync<IReliableDictionary<string. By default. including your custom types. CommitAsync(). and it guarantees that either the entire operation will succeed or the entire operation will roll back. and it's optimized for repeat visits. You can pause the stream by clicking the Pause button. except LINQ. await tx. so that you can keep state consistent across multiple Reliable Collections and operations. all within a single transaction. You can now build and deploy your services. and save the result in a Reliable Dictionary. NOTE Before you run the application. "Counter-1"). Operations on Reliable Collections are asynchronous. If an error occurs after you dequeue the item but before you save the result. . 
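To make the Reliable Collections guidance above concrete, the following hedged C# sketch shows the update pattern it describes: read the stored value, build a new DataContract-serializable value, and write it back through the dictionary inside the same transaction rather than mutating the local copy. The CounterItem type and the dictionary name are placeholders.

using System.Runtime.Serialization;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Data;
using Microsoft.ServiceFabric.Data.Collections;

[DataContract]
public class CounterItem
{
    [DataMember]
    public int Count { get; set; }
}

public static class ReliableUpdateExample
{
    public static async Task IncrementAsync(IReliableStateManager stateManager)
    {
        IReliableDictionary<string, CounterItem> items =
            await stateManager.GetOrAddAsync<IReliableDictionary<string, CounterItem>>("items");

        using (ITransaction tx = stateManager.CreateTransaction())
        {
            ConditionalValue<CounterItem> current = await items.TryGetValueAsync(tx, "key1");

            // Mutating current.Value directly would change only the local, in-memory copy
            // and would not be replicated. Build a new value and write it back instead.
            CounterItem updated = new CounterItem
            {
                Count = current.HasValue ? current.Value.Count + 1 : 1
            };

            await items.SetAsync(tx, "key1", updated);
            await tx.CommitAsync();
        }
    }
}

The same pattern applies to AddOrUpdateAsync and the other update methods on the dictionary.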
Run the application We now return to the HelloWorld application. We don't recommended that you save references to reliable collection instances in class member variables or properties. The Reliable State Manager ensures that you get a reference back. The Reliable State Manager handles this work for you. Special care must be taken to ensure that the reference is set to an instance at all times in the service lifecycle. Reliable Collection operations are transactional. Check out the getting started guide for information on setting up your local environment. make sure that you have a local development cluster running. you may dequeue a work item from a Reliable Queue. "Counter-1".The Reliable State Manager manages Reliable Collections for you. } Reliable Collections have many of the same operations that their System.TryGetValueAsync(tx. For example.StateManager.Collections.Generic and System. You can simply ask the Reliable State Manager for a reliable collection by name at any time and at any place in your service. (k. Transactional and asynchronous operations using (ITransaction tx = this. you can view the generated Event Tracing for Windows (ETW) events in a Diagnostic Events window. You can then examine the details of a message by expanding that message. When you press F5. perform an operation on it. After the services start running. This is because write operations with Reliable Collections perform I/O operations to replicate and persist data to disk. Note that the events displayed are from both the stateless service and the stateful service in the application. await myDictionary.Collections. This is treated as an atomic operation.AddOrUpdateAsync(tx. your application will be built and deployed to your local cluster.CreateTransaction()) { var result = await myDictionary. v) => ++v).Concurrent counterparts do. 0. the entire transaction is rolled back and the item remains in the queue for processing. Next steps Debug your Service Fabric application in Visual Studio Get started: Service Fabric Web API services with OWIN self-hosting Learn more about Reliable Collections Deploy an application Application upgrade Developer reference for Reliable Services . Service host: The named service instances you create need to run inside a host. Named service instance: To run your service. name the application "HelloWorldApplication" and the service "HelloWorld". Service registration: Registration brings everything together. The service type must be registered with the Service Fabric runtime in a service host to allow Service Fabric to create instances of it to run. . you only need to understand a few basic concepts: Service type: This is your service implementation. Get started with Reliable Services 4/3/2017 • 4 min to read • Edit Online This article explains the basics of Azure Service Fabric Reliable Services and walks you through creating and deploying a simple Reliable Service application written in Java. Service instances are in fact object instantiations of your service class that you write. This Microsoft Virtual Academy video also shows you how to create a stateless Reliable service: Installation and setup Before you start. you create named instances of your service type. along with a name and a version number. The result includes directories for the HelloWorldApplication and HelloWorld . make sure you have the Service Fabric development environment set up on your machine. 
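Returning to the transactional pattern described earlier in this section, where a work item is dequeued, processed, and its result saved as one atomic operation, here is a minimal sketch. The queue and dictionary names, the WorkItem type, and the Process helper are hypothetical:

IReliableQueue<WorkItem> workQueue =
    await this.StateManager.GetOrAddAsync<IReliableQueue<WorkItem>>("workQueue");
IReliableDictionary<Guid, string> results =
    await this.StateManager.GetOrAddAsync<IReliableDictionary<Guid, string>>("results");

using (ITransaction tx = this.StateManager.CreateTransaction())
{
    ConditionalValue<WorkItem> dequeued = await workQueue.TryDequeueAsync(tx);
    if (dequeued.HasValue)
    {
        // Process the item and store the result. Because both operations share one
        // transaction, a failure before CommitAsync rolls everything back and the
        // item remains in the queue for reprocessing.
        string result = Process(dequeued.Value);           // hypothetical helper
        await results.SetAsync(tx, dequeued.Value.Id, result);
    }
    await tx.CommitAsync();
}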
The Service Fabric SDK for Linux includes a Yeoman generator to provide the scaffolding for a Service Fabric application with a stateless service. It is defined by the class you write that extends StatelessService and any other code or dependencies used therein. For this tutorial. much like you create object instances of a class type. Start by running the following Yeoman command: $ yo azuresfjava Follow the instructions to create a Reliable Stateless Service. Basic concepts To get started with Reliable Services. Create a stateless service Start by creating a Service Fabric application. If you need to set it up. go to getting started on Mac or getting started on Linux. The service host is just a process where instances of your service can run. txt │ └── ServiceManifest. Faults occur in your code. } In this tutorial.sh Implement the service Open HelloWorldApplication/HelloWorld/src/statelessservice/HelloWorldService.xml ├── install. The application or system is upgraded. we focus on the runAsync() entry point method. A cancellation token is provided to coordinate when your service instance needs to be closed.txt │ ├── Data │ │ └── _readme. @Override protected List<ServiceInstanceListener> createServiceInstanceListeners() { . } A communication entry point where you can plug in your communication stack of choice. HelloWorldApplication/ ├── build.java.sh ├── settings. where you can begin executing any workloads. . this open/close cycle of a service instance can occur many times over the lifetime of the service as a whole. RunAsync The platform calls this method when an instance of a service is placed and ready to execute. This can happen for various reasons. The underlying hardware experiences an outage.gradle ├── HelloWorld │ ├── build...sh │ │ └── _readme. @Override protected CompletableFuture<?> runAsync(CancellationToken cancellationToken) { . This is where you can immediately start running your code.xml │ └── HelloWorldPkg │ ├── Code │ │ ├── entryPoint.java ├── HelloWorldApplication │ ├── ApplicationManifest..txt │ ├── Config │ │ └── _readme. In Service Fabric. This is where you can start receiving requests from users and other services. including: The system moves your service instances for resource balancing.gradle └── uninstall.gradle │ └── src │ └── statelessservice │ ├── HelloWorldServiceHost.. called runAsync() . that simply means when the service instance is opened. and can run any code. The service API provides two entry points for your code: An open-ended entry point method. For a stateless service. including long-running compute workloads.java │ └── HelloWorldService. This class defines the service type. "). or fault) before it moves on. finish any work. Your implementation of runAsync should return a CompletableFuture to allow the runtime to continue. logger.ofSeconds(10)).registerStatelessServiceAsync("HelloWorldType". Cancellation Cancellation of your workload is a cooperative effort orchestrated by the provided cancellation token. } Service registration Service types must be registered with the Service Fabric runtime. (context) -> new HelloWorldService().java : public static void main(String[] args) throws Exception { try { ServiceRuntime.log(Level. "Exception in registration:".throwIfCancellationRequested().log(Level. Thread. "Registered stateless service type HelloWorldType. Service registration is performed in the process main entry point.log(Level.MAX_VALUE). throw ex. 
The following example demonstrates how to handle a cancellation event: @Override protected CompletableFuture<?> runAsync(CancellationToken cancellationToken) { // TODO: Replace the following sample code with your own logic // or remove this runAsync override if it's not needed in your service. } } Run the application The Yeoman scaffolding includes a gradle script to build the application and bash scripts to deploy and undeploy the application. CompletableFuture. while(true) { cancellationToken. To run the application. ex). In this example.INFO. runAsync() should not block synchronously. It is important to honor the cancellation token. try { Thread.sleep(1000). the process main entry point is HelloWorldServiceHost. ++iterations).xml and your service class that implements StatelessService . } catch (IOException ex) {} } }). Duration. and exit runAsync() as quickly as possible when the system requests cancellation.This orchestration is managed by Service Fabric to keep your service highly available and properly balanced. } catch (Exception ex) { logger. "Working-{0}".SEVERE. first build the application with gradle: $ gradle . If your workload needs to implement a long running task that should be done inside the CompletableFuture. The system waits for your task to end (by successful completion. logger. cancellation. The service type is defined in the ServiceManifest.runAsync(() -> { long iterations = 0.INFO.sleep(Long. The install. Run the install.sh ./install.sh script contains the necessary Azure CLI commands to deploy the application package.This produces a Service Fabric application package that can be deployed using Service Fabric Azure CLI.sh script to deploy: $ . which is checked in RunAsync before continuing to actual work. Stateless service shutdown When shutting down a stateless service. The listeners may open before RunAsync is started. In this case additional coordination is necessary. Reliable services lifecycle overview 4/3/2017 • 7 min to read • Edit Online When thinking about the lifecycles of Reliable Services. allowing the service to do long running or background work During shutdown The cancellation token passed to RunAsync is canceled. Similarly. the basics of the lifecycle are the most important. the same pattern is followed. and the listeners are closed Once that is complete.CreateServiceInstanceListeners() is invoked and any returned listeners are Opened ( ICommunicationListener. One common solution is some flag within the listeners indicating when they have completed. it is left as an exercise to the implementer. Common solutions: Sometimes listeners can't function until some other information is created or work done. Sometimes the code in RunAsync does not want to start until the listeners are open.OnOpenAsync() is called. in parallel two things happen: StatelessService. we have to think about error or failure conditions.OpenAsync() is called on each listener) The service's RunAsync method ( StatelessService. or as a part of the construction of the listener itself. StatelessService. Then. The Service is constructed 2. During this sequence. In particular. for stateful services. It is important to note that there is no ordering between the calls to create and open the listeners and RunAsync. In parallel . This is an uncommon override but it is available). For stateless services that work can usually be done in the service's constructor. Stateless service startup The lifecycle of a stateless service is fairly straightforward. 
RunAsync may end up invoked before the communication listeners are open or have even been constructed. just in reverse: 1. Here's the order of events: 1. the service object itself is destructed There are details around the exact ordering of these events.RunAsync() ) is called 3. allowing communication with the service The Service's RunAsync method is called. the role of Primary is transferred to another replica (or comes back) without the service shutting down. If any synchronization is required. In general: During Startup Services are constructed They have an opportunity to construct and return zero or more listeners Any returned listeners are opened. If present. we have to deal with the Primary swap scenario. Finally. the service's own OnOpenAsync method is called (Specifically. In addition. the order of events may change slightly depending on whether the Reliable Service is Stateless or Stateful. during the CreateServiceInstanceListeners() call. The simplest and most common solution is for the communication listeners to return some error code that the client uses to know to retry the request. When starting up a stateful service. otherwise the service skips to step 4 StatefulServiceBase. (This is uncommonly overridden in the service.) 3. the service's StatefulServiceBase.CloseAsync() is called on each listener) The cancellation token passed to RunAsync() is canceled (checking the cancellation token's IsCancellationRequested property returns true. then the following things happen in parallel. if present (again this is an uncommon override). Stateful service Shutdown Similarly to Stateless services. with one additional case: say that the calls arriving at the communication listeners require information kept inside some Reliable Collections to work. After StatefulServiceBase.OnCloseAsync() method is called.OnChangeRoleAsync() method completes. the StatefulServiceBase. (This is uncommonly overridden in the service. Stateful service primary swaps While a stateful service is running.OnCloseAsync() completes.OpenAsync() is called on each listener) The service's RunAsync method ( StatefulServiceBase. If the service replica in question is the Primary.OnCloseAsync() method is called (again this is an uncommon override but it is available).) Similarly to stateless services. some additional coordination is necessary.RunAsync() ) is called 4. When a stateful service is being shut down. only the Primary replicas of that stateful services have their communication . and if called the token's ThrowIfCancellationRequested method returns an OperationCanceledException ) 2. The solutions are much the same. Any open listeners are Closed ( ICommunicationListener. and if called the token's ThrowIfCancellationRequested method returns an OperationCanceledException ) 2.OnOpenAsync() is called. the following events occur: 1. After StatelessService. and before RunAsync could start. the order of events is as follows: 1. Once CloseAsync() completes on each listener and RunAsync() also completes. there's no coordination between the order in which the listeners are created and opened and RunAsync being called.CreateServiceReplicaListeners() is invoked and any returned listeners are Opened ( ICommunicationListener. the service object is destructed Stateful service Startup Stateful services have a similar pattern to stateless services. 
the lifecycle events during shutdown are the same as during startup.CloseAsync() is called on each listener) The cancellation token passed to RunAsync() is canceled (checking the cancellation token's IsCancellationRequested property returns true. In parallel Any open listeners are Closed ( ICommunicationListener. Once all the replica listener's OpenAsync() calls complete and RunAsync() has been started (or these steps were skipped because this replica is currently a secondary). StatefulServiceBase. Once the StatefulServiceBase. with a few changes. but reversed. the service object is destructed. 4. (This is uncommonly overridden in the service. StatefulServiceBase. The Service is constructed 2.OnChangeRoleAsync() is called.OnCloseAsync() completes. the service's StatelessService.) 3. 3.OnChangeRoleAsync() is called. Because the communication listeners could open before the reliable collections are readable or writeable. Once CloseAsync() completes on each listener and RunAsync() also completes (which should only have been necessary if this service replica was a Primary). Service Fabric needs this replica to start listening for messages on the wire (if it does that) and start any background tasks it cares about. In parallel Any open listeners are Closed ( ICommunicationListener. but the only change in lifecycles is that CreateServiceReplicaListeners() is called (and the resulting listeners Opened) even if the replica is a Secondary.OpenAsync() is called on each listener) The service's RunAsync method ( StatefulServiceBase.OnChangeRoleAsync() is called.) Notes on service lifecycle Both the RunAsync() method and the CreateServiceReplicaListeners/CreateServiceInstanceListeners calls are optional. The following APIs are called: 1. As a result. and if called the token's ThrowIfCancellationRequested method returns an OperationCanceledException ) 2. there's an additional option on ServiceReplicaListeners that allows them to start on secondary replicas. This is uncommon.CreateServiceReplicaListeners() is invoked and any returned listeners are Opened ( ICommunicationListener. StatefulServiceBase. For stateful services. A service may have one of them. this process looks similar to when the service is created.RunAsync() ) is called 2. While there is no time limit on returning from these methods. Secondary are constructed but see no further calls. if the service does all its work in response to user calls. This timeout is 15 minutes by default. It is recommended that you return as quickly as possible upon receiving the cancellation request. both. If a service exits from RunAsync() by throwing some unexpected exception. the service's StatefulServiceBase. While a stateful service is running however. In parallel StatefulServiceBase.CloseAsync() is called on each listener) The cancellation token passed to RunAsync() is canceled (checking the cancellation token's IsCancellationRequested property returns true. The following APIs are called: 1. As a result. (This is uncommonly overridden in the service.OnChangeRoleAsync() is called. Once CloseAsync() completes on each listener and RunAsync() also completes.) For the secondary being promoted Similarly. this step looks similar to when the service is being shut down. For example. Usually this only happens during application upgrades or when a service is being deleted. Similarly. Similarly. (This is uncommonly overridden in the service. 
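One possible shape for the coordination described above, where a communication listener must not serve requests until RunAsync (or the Reliable Collections it prepares) is ready, is to gate the request path on a completion signal. This is only a sketch under that assumption; EnsureStateReadyAsync is a hypothetical helper called from your listener's request path:

public sealed class MyStatefulService : StatefulService
{
    // Completed by RunAsync once the state the listeners depend on is usable.
    private readonly TaskCompletionSource<bool> stateReady =
        new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);

    public MyStatefulService(StatefulServiceContext context) : base(context) { }

    protected override async Task RunAsync(CancellationToken cancellationToken)
    {
        var dictionary = await this.StateManager
            .GetOrAddAsync<IReliableDictionary<string, long>>("myDictionary");

        this.stateReady.TrySetResult(true);   // signal the listeners that state is ready

        // ... remaining background work, honoring cancellationToken ...
    }

    // Called from the communication listener's request path (hypothetical helper).
    // A caller can translate a delay here into a retryable error code for the client,
    // as suggested above.
    private Task EnsureStateReadyAsync() => this.stateReady.Task;
}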
For stateful reliable services RunAsync() would be called again if the service were demoted from primary and then promoted back to primary. If your service does not respond to these API calls in a reasonable amount of time Service Fabric may forcibly terminate your service. this is a failure and the service object is shut down and a health error reported.listeners opened and their RunAsync method called. creating and returning communication listeners is optional. What does this mean in terms of the lifecycle events that a replica can see? The behavior the stateful replica sees depends on whether it is the replica being demoted or promoted during the swap. except that the replica itself already exists. which replica is currently the Primary can change. you immediately lose the ability to write to Reliable Collections and therefore cannot complete any real work. or neither. Once all the replica listener's OpenAsync() calls complete and RunAsync() has been started (or these steps were skipped because this is a secondary). if the replica is later . One difference is that the Service isn't destructed or closed since it remains as a Secondary. This is not considered a failure condition and would represent the background work of the service completing. and so only needs to implement RunAsync() It is valid for a service to complete RunAsync() successfully and return from it. as the service may have only background work to do. For the primary being demoted Service Fabric needs this replica to stop processing messages and quit any background work it is doing. Only the communication listeners and their associated code are necessary. there is no need for it to implement RunAsync() . and new ones created and Opened as a part of the change to Primary. destructed. the listeners are closed. Failures in the OnCloseAsync() path result in OnAbort() being called which is a last-chance best-effort opportunity for the service to clean up and release any resources that they have claimed. converted into a primary. Next steps Introduction to Reliable Services Reliable Services quick start Reliable Services advanced usage . For stateless services that work can usually be done in the service's constructor. and the listeners are closed Once that is complete. In particular. in parallel two things happen: StatelessService. just in reverse: 1. allowing the service to do long running or background work During shutdown The cancellation token passed to runAsync is canceled. we have to think about error or failure conditions. the same pattern is followed. Here's the order of events: 1. In general: During Startup Services are constructed They have an opportunity to construct and return zero or more listeners Any returned listeners are opened. the service object itself is destructed There are details around the exact ordering of these events. StatelessService. Stateless service shutdown When shutting down a stateless service. If present. for stateful services. In parallel . Sometimes the code in runAsync does not want to start until the listeners are open. This is an uncommon override but it is available). The listeners may open before runAsync is started. In this case additional coordination is necessary. it is left as an exercise to the implementer. Similarly. we have to deal with the Primary swap scenario. If any synchronization is required. the service's own onOpenAsync method is called (Specifically. During this sequence. 
the role of Primary is transferred to another replica (or comes back) without the service shutting down. In addition. It is important to note that there is no ordering between the calls to create and open the listeners and runAsync. The Service is constructed 2.createServiceInstanceListeners() is invoked and any returned listeners are Opened ( CommunicationListener.runAsync() ) is called 3. or as a part of the construction of the listener itself. runAsync may end up invoked before the communication listeners are open or have even been constructed. Common solutions: Sometimes listeners can't function until some other information is created or work done. Finally. Then. the order of events may change slightly depending on whether the Reliable Service is Stateless or Stateful. Reliable services lifecycle overview 4/3/2017 • 3 min to read • Edit Online When thinking about the lifecycles of Reliable Services. Stateless service startup The lifecycle of a stateless service is fairly straightforward. allowing communication with the service The Service's runAsync method is called. the basics of the lifecycle are the most important. during the createServiceInstanceListeners() call.openAsync() is called on each listener) The service's runAsync method ( StatelessService. which is checked in runAsync before continuing to actual work. One common solution is some flag within the listeners indicating when they have completed.onOpenAsync() is called. This is not considered a failure condition and would represent the background work of the service completing. A service may have one of them. 3. For example. you immediately lose the ability to write and therefore cannot complete any real work. It is recommended that you return as quickly as possible upon receiving the cancellation request. and so only needs to implement runAsync() It is valid for a service to complete runAsync() successfully and return from it. the service's StatelessService.onCloseAsync() completes. Failures in the onCloseAsync() path result in onAbort() being called which is a last-chance best-effort opportunity for the service to clean up and release any resources that they have claimed. or neither. If a service exits from runAsync() by throwing some unexpected exception. as the service may have only background work to do. After StatelessService. Usually this only happens during application upgrades or when a service is being deleted. if the service does all its work in response to user calls. This timeout is 15 minutes by default. For stateful reliable services runAsync() would be called again if the service were demoted from primary and then promoted back to primary. the service object is destructed Notes on service lifecycle Both the runAsync() method and the createServiceInstanceListeners calls are optional. if present (again this is an uncommon override). If your service does not respond to these API calls in a reasonable amount of time Service Fabric may forcibly terminate your service.onCloseAsync() method is called.closeAsync() is called on each listener) The cancellation token passed to runAsync() is canceled (checking the cancellation token's isCancelled property returns true. Once closeAsync() completes on each listener and runAsync() also completes. there is no need for it to implement runAsync() . and if called the token's throwIfCancellationRequested method returns an CancellationException ) 2. Next steps Introduction to Reliable Services Reliable Services quick start Reliable Services advanced usage . 
While there is no time limit on returning from these methods. Similarly. both. Only the communication listeners and their associated code are necessary. this is a failure and the service object is shut down and a health error reported. NOTE Stateful reliable services are not supported in java yet. Any open listeners are Closed ( CommunicationListener. creating and returning communication listeners is optional. a datacenter power outage). which results in low latency and high-throughput writes.Data. As such. Developers need to program only to the Reliable Collection APIs and let Reliable Collections manage the replicated and local state. Reliable Collections provide strong consistency guarantees out of the box in order to make reasoning about application state easier. The classes in the Microsoft. The key difference between Reliable Collections and other high-availability technologies (such as Redis. which results in low latency and high-throughput reads. scalable.Concurrent namespace): . This means that: All reads are local.Collections namespace provide a set of out-of-the-box collections that automatically make your state highly available. and low-latency cloud applications as though you were writing single computer applications. Strong consistency is achieved by ensuring transaction commits finish only after the entire transaction has been logged on a majority quorum of replicas. including the primary. All writes incur the minimum number of network IOs. Asynchronous: APIs are asynchronous to ensure that threads are not blocked when incurring IO.ServiceFabric. To achieve weaker consistency.Collections. Transactional: APIs utilize the abstraction of transactions so you can manage multiple Reliable Collections within a service easily. Introduction to Reliable Collections in Azure Service Fabric stateful services 3/27/2017 • 9 min to read • Edit Online Reliable Collections enable you to write highly available. Persisted: Data is persisted to disk for durability against large-scale outages (for example.Collections classes: a new set of collections that are designed for the cloud and multi-computer applications without increasing complexity for the developer. Azure Table service. and Azure Queue service) is that the state is kept locally in the service instance while also being made highly available. The Reliable Collections APIs are an evolution of concurrent collections APIs (found in the System. applications can acknowledge back to the client/requester before the asynchronous commit returns. Reliable Collections are: Replicated: State changes are replicated for high availability. Reliable Collections can be thought of as the natural evolution of the System. Similar to ConcurrentDictionary. For more details.aspx. Data modifications made by other transactions after the start of the current transaction are not visible to statements executing in the current transaction. Microsoft. Isolation levels Isolation level defines the degree to which the transaction must be isolated from modifications made by other transactions. any write within a transaction will be visible to a following read that belongs to the same transaction. and asynchronous strict first-in. In other words. Reliable Queue: Represents a replicated. the value can be of any type. Snapshot: Specifies that data read by any statement in a transaction will be the transactionally consistent version of the data that existed at the start of the transaction. Today. 
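To make that evolution concrete, the following sketch contrasts a ConcurrentDictionary lookup with the equivalent Reliable Dictionary call, showing the transaction, the asynchronous API, and ConditionalValue<T> in place of an out parameter. The variable names are illustrative:

// System.Collections.Concurrent: synchronous, out parameter, no transaction.
long plainValue;
bool found = concurrentDictionary.TryGetValue("Counter", out plainValue);

// Reliable Dictionary: asynchronous, transactional, ConditionalValue<long> result.
using (ITransaction tx = this.StateManager.CreateTransaction())
{
    ConditionalValue<long> result = await reliableDictionary.TryGetValueAsync(tx, "Counter");
    if (result.HasValue)
    {
        // result.Value is the committed value visible to this transaction.
    }
    await tx.CommitAsync();
}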
The effect is as if the statements in a transaction get a snapshot of the committed data as it existed at the start of the transaction.TryPeekAsync . IReliableQueue. first-out (FIFO) queue. transactional.Collections contains two collections: Reliable Dictionary: Represents a replicated.microsoft.TryGetValueAsync . transactional.com/library/ms173763. Asynchronous: Returns a task since. Snapshots are consistent across Reliable Collections. Reliable Collections automatically choose the isolation level to use for a given read operation depending on the operation and the role of the replica at the time of transaction's creation. OPERATION \ ROLE PRIMARY SECONDARY Single Entity Read Repeatable Read Snapshot Enumeration \ Count Snapshot Snapshot NOTE Common examples for Single Entity Operations are IReliableDictionary. There are two isolation levels that are supported in Reliable Collections: Repeatable Read: Specifies that statements cannot read data that has been modified but not yet committed by other transactions and that no other transactions can modify data that has been read by the current transaction until the current transaction finishes.Data. ConditionalValue<T> is like Nullable<T> but does not require T to be a struct. all transactions are two-phased: a transaction does not release the locks it has acquired until the transaction terminates with either an abort or a commit.aspx.ServiceFabric. Similar to ConcurrentQueue. Both the Reliable Dictionary and the Reliable Queue support Read Your Writes.com/library/ms173763. Following is the table that depicts isolation level defaults for Reliable Dictionary and Queue operations. the operations are replicated and persisted. see https://msdn. The transaction can recognize only data modifications that were committed before the start of the transaction.microsoft. and asynchronous collection of key/value pairs. unlike concurrent collections. No out parameters: Uses ConditionalValue<T> to return a bool and a value instead of out parameters. see https://msdn. Locking In Reliable Collections. For more details. both the key and the value can be of any type. Transactions: Uses a transaction object to enable the user to group actions on multiple Reliable Collections in a transaction. . As log records accumulate. because they both end up having the Shared lock. Since logs are persisted. and the Reliable State Manager will recover and play back all the state changes that occurred since the checkpoint. Now let’s look at the finite disk scenario. This is a model where each state change is logged on disk and applied only in memory. The lock compatibility matrix can be found below: REQUEST \ GRANTED NONE SHARED UPDATE EXCLUSIVE Shared No conflict No conflict Conflict Conflict Update No conflict No conflict Conflict Conflict Exclusive No conflict Conflict Conflict Conflict Note that a time-out argument in the Reliable Collections APIs is used for deadlock detection. .a. Any read operation done using Snapshot isolation is lock free. the Reliable State Manager needs to truncate its log to make room for the newer records. This is because checkpoints contain only the latest versions. Write operations always take Exclusive locks. Before that happens. if a TryPeekAsync or TryDequeueAsync ever observes that the Reliable Queue is empty. two transactions (T1 and T2) are trying to read and update K1. To better understand the Log and Checkpoint model.Reliable Dictionary uses row level locking for all single entity operations. 
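Before moving on to locking, here is a short sketch of the Read Your Writes guarantee mentioned above: a write made inside a transaction is visible to a later read in that same transaction, even before CommitAsync is called. The dictionary and key names are illustrative:

using (ITransaction tx = this.StateManager.CreateTransaction())
{
    await myDictionary.SetAsync(tx, "Counter", 42);

    // Read Your Writes: this read runs in the same transaction and therefore sees 42,
    // even though the transaction has not committed yet.
    ConditionalValue<long> result = await myDictionary.TryGetValueAsync(tx, "Counter");
    // result.HasValue == true, result.Value == 42

    await tx.CommitAsync();
}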
Any Repeatable Read operation by default takes Shared locks. Note that the above deadlock scenario is a great example of how an Update lock can prevent deadlocks. This way. log records never need to be removed and the Reliable Collection needs to manage only the in-memory state. let’s first look at the infinite disk scenario. Once the Reliable Collections complete their checkpoints. Reliable Queue trades off concurrency for strict transactional FIFO property. they will also lock EnqueueAsync . one or both of the operations will time out. even when the replica fails and needs to be restarted. It will request the Reliable Collections to checkpoint their in-memory state to disk. This allows the Reliable Collection to apply only the operation in memory. Reliable Queue uses operation level locks allowing one transaction with TryPeekAsync and/or TryDequeueAsync and one transaction with EnqueueAsync at a time.k. It is the Reliable Collections' responsibility to persist its state up to that point. Note that to preserve FIFO. As the disk is infinite. An Update lock is an asymmetric lock used to prevent a common form of deadlock that occurs when multiple transactions lock resources for potential updates at a later time. It is possible for them to deadlock. The benefit is that deltas are turned into sequential append-only writes on disk for improved performance. However. for any read operation that supports Repeatable Read. The complete state itself is persisted only occasionally (a. the user can ask for an Update lock instead of the Shared lock. NOTE Another value add of checkpointing is that it improves recovery performance in common cases. the locking depends on a couple of factors. For example. when the replica needs to be restarted. Persistence model The Reliable State Manager and Reliable Collections follow a persistence model that is called Log and Checkpoint. the Reliable State Manager has enough information in its logs to replay all the operations the replica has lost. Checkpoint). In this case. For read operations. Reliable Collections will recover their checkpointed state. the Reliable State Manager can truncate the log to free up disk space. the Reliable State Manager will run out of disk space. The Reliable State Manager logs every operation before it is replicated. The key type parameter (TKey) for a Reliable Dictionary must correctly implement GetHashCode() and Equals() . TKey + TValue for Reliable Dictionary) below 80 KBytes: smaller the better. Do handle InvalidOperationException. each service should have at least a target and minimum replica set size of 3.. The system takes dependency on this for merging checkpoints. Do deep copy the returned object of a custom type before modifying it. Keys must be immutable. the termination of the transaction was not requested by the user. Do not create a transaction within another transaction’s using statement because it can cause deadlocks. Do not use a transaction after it has been committed. Of course.g GetCountAsync . Do not use an enumeration outside of the transaction scope it was created in. Time-outs should be used to detect deadlocks. Here are some things to keep in mind: The default time-out is 4 seconds for all the Reliable Collection APIs. 
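The Update lock described above can be requested explicitly when you read a key that you intend to modify later in the same transaction. A minimal sketch, with an illustrative key name:

using (ITransaction tx = this.StateManager.CreateTransaction())
{
    // Request an Update lock instead of the default Shared lock because we intend to
    // write this key. Two transactions that each take a Shared lock and then try to
    // upgrade to Exclusive on the same key can deadlock; Update locks avoid that pattern.
    ConditionalValue<long> current =
        await myDictionary.TryGetValueAsync(tx, "K1", LockMode.Update);

    long newValue = current.HasValue ? current.Value + 1 : 1;
    await myDictionary.SetAsync(tx, "K1", newValue);

    await tx.CommitAsync();
}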
Next steps Reliable Services quick start Working with Reliable Collections Reliable Services notifications Reliable Services backup and restore (disaster recovery) Reliable State Manager configuration Getting started with Service Fabric Web API services Advanced usage of the Reliable Services programming model Developer reference for Reliable Collections . In such cases. Assuming.Recommendations Do not modify an object of custom type returned by read operations (e.g. Common way to achieve this in Reliable Dictionary. it will also reduce replicating duplicate data when only one small part of the value is being updated. is to break your rows in to multiple rows. Do not use TimeSpan. and if not create a new transaction and retry. To achieve high availability for the Reliable Collections. check if the cancellation token has been signaled (or the role of the replica has been changed). CreateEnumerableAsync ) in the same transaction due to the different isolation levels. Reliable Collections. when the Reliable State Manager is changing its role out of Primary or when a long-running transaction is blocking truncation of the transactional log. Consider using backup and restore functionality to have disaster recovery. user may receve InvalidOperationException indicating that their transaction has already been terminated.g. Avoid mixing single entity operations and multi-entity operations (e. reads from Primary are always stable: can never be false progressed. you do not need to do a deep copy on them. best way to handle this exception is to dispose the transaction. User transactions can be aborted by the system for variety of reasons. aborted. The default cancellation token is CancellationToken. return a reference to the objects and not a copy. Do use Update lock when reading an item with an intention to update it to prevent a certain class of deadlocks. This will reduce the amount of Large Object Heap usage as well as disk and network IO requirements. Consider keeping your items (e. Do ensure that your IComparable<TKey> implementation is correct. Most users should not override this.MaxValue for time-outs. This means that a version of data that is read from a single secondary might be false progressed. Read operations on the secondary may read versions that are not quorum committed. just like Concurrent Collections. Since structs and built-in types are pass-by-value.None in all Reliable Collections APIs. In many cases. TryPeekAsync or TryGetValueAsync ). For example. or disposed. . Since AddAsync modifies the key’s value to the new. TimeoutException // Key & value put in temp dictionary (read your own writes). Once the lock is acquired.CreateTransaction()) { // AddAsync takes key's write lock. key.Delay passing 100 milliseconds each time. AddAsync serializes your key and value objects to byte arrays and appends these byte arrays to a log file on the local node. Internally. one thread will acquire the write lock and the other threads will block. replicated (for availability). the key’s write lock is taken.StateManager. value. cancellationToken). Working with Reliable Collections 4/19/2017 • 10 min to read • Edit Online Service Fabric offers a stateful programming model available to . after 4 seconds. } All operations on reliable dictionary objects (except for ClearAsync which is not undoable). If the method modifies the key’s value.Delay(100. Service Fabric provides reliable dictionary and reliable queue classes. In the code above. require an ITransaction object.CommitAsync(). 
a later call to TryGetValueAsync (using the same ITransaction object) will return the value even if you have not yet committed the transaction. the method takes a write lock on the key and if the method only reads from the key’s value. if >4 secs. I’m just calling Task. Finally. When you use these classes. in reality. // serialized. Specifically. This object has associated with it any and all changes you’re attempting to make to any reliable dictionary and/or reliable queue objects within a single partition. cancellationToken). the methods throw a TimeoutException. the ITransaction object is passed to a reliable dictionary’s AddAsync method. This is done to provide you with read-your-own-writes semantics. goto retry. AddAsync sends the byte arrays to all . redo/undo record is logged & sent to // secondary replicas await m_dic. Next. passed-in value. your state is partitioned (for scalability). That is. } // If CommitAsync not called. You acquire an ITransaction object by calling the partition’s StateManager’s CreateTransaction method. // CommitAsync sends Commit record to log & secondary replicas // After quorum responds. and transacted within a partition (for ACID semantics).NET developers via Reliable Collections. if 2 (or more) threads attempt to add values with the same key at the same time. Usually. you write your code to react to a TimeoutException by catching it and retrying the entire operation (as shown in the code above).AddAsync(tx. Dispose sends Abort // record to log & all locks released } catch (TimeoutException) { await Task. But. Let’s look at a typical usage of a reliable dictionary object and see what its actually doing. Method overloads exist allowing you to pass an explicit timeout value if you’d prefer. AddAsync adds the key and value object references to an internal temporary dictionary associated with the ITransaction object. ///retry: try { // Create a new Transaction object for this partition using (ITransaction tx = base. So. In my simple code. then a read lock is taken on the key. all locks released await tx. you might be better off using some kind of exponential back-off delay instead. after you call AddAsync. By default. methods block for up to 4 seconds to acquire the lock. dictionary methods that accepts a key take a reader/writer lock associated with the key. If CommitAsync is not called (usually due to an exception being thrown). all data changes are considered permanent and any locks associated with keys that were manipulated via the ITransaction object are released so other threads/transactions can manipulate the same keys and their values.CreateTransaction()) { // AddAsync serializes the name/user. The correct way to write the code is simply to reverse the two lines: using (ITransaction tx = StateManager.CommitAsync(). However. See the code below: using (ITransaction tx = StateManager.AddAsync(tx. When disposing an uncommitted ITransaction object. And. await tx. // Corruption! await tx. the // new value is NOT serialized. what’s in memory is thrown away. this code will not work correctly with a reliable dictionary. // & sends the bytes to the secondary replicas.LastLogin = DateTime. name. When a new process starts or if another replica becomes primary.CreateTransaction()) { user. it appends commit information to the log file on the local node and also sends the commit record to all the secondary replicas. Once a quorum (majority) of the replicas has replied. logged. user). In the code above. 
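As suggested above, an exponential back-off usually serves better than a fixed 100-millisecond delay when retrying after a TimeoutException. Here is a sketch along those lines, reusing the m_dic, key, and value names from the example; the retry limit is an arbitrary choice:

TimeSpan delay = TimeSpan.FromMilliseconds(100);
const int maxAttempts = 5;

for (int attempt = 1; ; attempt++)
{
    try
    {
        using (ITransaction tx = this.StateManager.CreateTransaction())
        {
            await m_dic.AddAsync(tx, key, value, TimeSpan.FromSeconds(4), cancellationToken);
            await tx.CommitAsync();
        }
        break;   // success
    }
    catch (TimeoutException)
    {
        if (attempt == maxAttempts) throw;               // give up after a few tries
        await Task.Delay(delay, cancellationToken);
        delay = TimeSpan.FromTicks(delay.Ticks * 2);     // exponential back-off
    }
}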
Service Fabric appends abort information to the local node’s log file and nothing needs to be sent to any of the secondary replicas. Even though the key/value information has been written to a log file. then the ITransaction object gets disposed. } When working with a regular . it does not impact the local file or the data sent to the replicas.UtcNow. name. let’s take a look at some common misuses of them.CommitAsync(). user. // Do this BEFORE calling AddAsync await m_dic. Remember from the earlier discussion. logs the bytes. you will only learn about the mistake if/when the process goes down. you can add a key/value to the dictionary and then change the value of a property (such as LastLogin). the call to AddAsync serializes the key/value objects to byte arrays and then saves the arrays to a local file and also sends them to the secondary replicas.the secondary replicas so they have the same key/value information.AddAsync(tx. then the old property value is what is available. the information is not considered part of the dictionary until the transaction that they are associated with has been committed. And then.UtcNow.NET dictionary. Common pitfalls and how to avoid them Now that you understand how the reliable collections work internally. this changes the property’s value in memory only. user). } Here is another example showing a common mistake: . If you later change a property.LastLogin = DateTime. // The line below updates the property’s value in memory only. If the process crashes. & sent to secondary replicas. await m_dic. I cannot stress enough how easy it is to make the kind of mistake shown above. the call to CommitAsync commits all of the transaction’s operations. any locks associated with keys that were manipulated via the transaction are released. Specifically. Then.CommitAsync(). The correct way to update a value in a reliable collection. DateTime.SetValue(tx.UtcNow. & sent to secondary replicas. } } Define immutable data types to prevent programmer error Ideally.LastLogin = DateTime. It is best to avoid collection properties as serializing and deserializing them can frequently can hurt performance. However. you can modify the state of this new object and write the new object into the collection so that it gets serialized to byte arrays.TryGetValueAsync(tx. using (ITransaction tx = StateManager. to avoid potential programmer bugs. not a shallow copy. All is good! The code below shows the correct way to update a value in a reliable collection: using (ITransaction tx = StateManager. Now. User updatedUser = new User(currentUser). you can also use String. if you want to use . If the value exists. name. only // immutable state can be shared by currentUser & updatedUser object graphs. we’d like the compiler to report errors when you accidentally produce code that mutates state of an object that you are supposed to consider immutable. we highly recommend that you define the types you use with reliable collections to be immutable types. name).CreateTransaction()) { // Use the user’s name to look up their data ConditionalValue<User> currentUser = await m_dic. await tx. updatedUser). But. Specifically.UtcNow. of course.HasValue) { // Create new user object with the same state as the current user object. Guid. the C# compiler does not have the ability to do this.TryGetValueAsync(tx. } } Again. modify any properties you desire updatedUser. the // new value is NOT serialized. the code above works fine and is a common pattern: the developer uses a key to look up a value. if (currentUser. 
logged. this code exhibits the same problem as already discussed: you MUST not modify an object once you have given it to a reliable collection. update one of their properties. // Corruption! await tx. user. update one of their properties.HasValue) { // The line below updates the property’s value in memory only. Specifically. etc.]. appended to the local file and sent to the replicas. // The user exists in the dictionary. the developer changes a property’s value. TimeSpan. // In the new object. // NOTE: This must be a deep copy. So.CommitAsync(). with regular . if (user. UInt64. the local file. // Update the key’s value to the updateUser info await m_dic.CreateTransaction()) { // Use the user’s name to look up their data ConditionalValue<User> user = await m_dic. this means that you stick to core value types (such as numbers [Int32. // The user exists in the dictionary.LastLogin = DateTime. with reliable collections.Value. And. the in- memory objects. and all the replicas have the same exact state.NET dictionaries. name). After committing the change(s). create a new object which is an exact copy of the original object. is to get a reference to the existing value and consider the object referred to by this reference immutable. However. and the like). public UserInfo(String email. You must approach versioning of your data with great care. ItemsBidding = (itemsBidding == null) ? NoBids : itemsBidding. we add a new ItemId to the ItemsBidding // collection by creating a new immutable UserInfo object with the added ItemId.Add(itemId)). you must always be able to deserialize old data. private set. public ItemId(String seller. the getter is public and the setter is private. } [DataMember] public readonly String Email. Specifically.ToImmutableList().NET’s immutable collections library (System. [DataMember] public IEnumerable<ItemId> ItemsBidding { get. As your service matures. String itemName) { Seller = seller. Reliable Collections serialize your objects using . we highly recommend the use of . [DataMember] public readonly String ItemName. ItemName = itemName. [DataContract] // If you don’t seal.Empty.ToImmutableList(). We also recommend sealing your classes and making fields read-only whenever possible.org. } } The ItemId type is also an immutable type as shown here: [DataContract] public struct ItemId { [DataMember] public readonly String Seller. this means your deserialization code must be infinitely backward compatible: Version 333 of your service code must be able to operate on data placed in a reliable collection by version 1 of your service code 5 . IEnumerable<ItemId> itemsBidding = null) { Email = email. The serialized objects are persisted to the primary replica’s local disk and are also transmitted to the secondary replicas. this would be a readonly field but it can't be because OnDeserialized // has to set it. } // Since each UserInfo object is immutable. So instead. } [OnDeserialized] private void OnDeserialized(StreamingContext context) { // Convert the deserialized collection to an immutable collection ItemsBidding = ItemsBidding.Immutable). ((ImmutableList<ItemId>)ItemsBidding). } } Schema versioning (upgrades) Internally. you must ensure that any derived classes are also immutable public sealed class UserInfo { private static readonly IEnumerable<ItemId> NoBids = ImmutableList<ItemId>. The UserInfo type below demonstrates how to define an immutable type taking advantage of aforementioned recommendations.collection properties. 
This library is available for download from http://nuget. it’s likely you’ll want to change the kind of data (schema) your service requires.NET’s DataContractSerializer. // Ideally. First and foremost.Collections. public UserInfo AddItemBidding(ItemId itemId) { return new UserInfo(Email. you upgrade your service from V1 to V2: V2 contains the code that knows how to deal with the new schema change but this code doesn’t execute. service code is upgraded one upgrade domain at a time. you have two different versions of your service code running simultaneously. With a 2-phase upgrade. To learn best practices on versioning data contracts. you will not be able to look up the key within the reliable dictionary ever again.) Now. You must avoid having the new version of your service code use the new schema as old versions of your service code might not be able to handle the new schema. see Data Contract Versioning. see Version-Tolerant Serialization Callbacks. you should design each version of your service to be forward compatible by 1 version.years ago. convert it to V2 data. When other instances read V2 data. . the V2 instances can read V1 data. it must be able to save any data it doesn’t explicitly know about and simply write it back out when updating a dictionary key or value. To learn how to provide a data structure that can interoperate across multiple versions. operate on it. Next Steps To learn about creating forward compatible data contracts. they just operate on it. this is what makes this a 2- phase upgrade. during an upgrade. it operates on it and writes V1 data. you can perform what is typically referred to as a 2-phase upgrade. WARNING While you can modify the schema of a key. Then. However. Furthermore. Alternatively. and write out V2 data. see Forward-Compatible Data Contracts. this means that V1 of your service code should be able to simply ignore any schema elements it does not explicitly handle. Specifically. To learn how to implement version tolerant data contracts. after the upgrade is complete across all upgrade domains. (One way to signal this is to roll out a configuration upgrade. you must ensure that your key’s hash code and equals algorithms are stable. they do not need to convert it. When the V2 code reads V1 data. When possible. So. you can somehow signal to the running V2 instances that the upgrade is complete. If you change how either of these algorithms operate. and write it out as V2 data. see IExtensibleDataObject. if SharedLogPath is specified. Global Configuration The global reliable service configuration is specified in the cluster manifest for the cluster under the KtlLogger section. It allows configuration of the shared log location and size plus the global memory limits used by the logger. You can see the cluster manifest for your cluster using the Get-ServiceFabricClusterManifest powershell command. Configuration names NAME UNIT DEFAULT VALUE REMARKS WriteBufferMemoryPoolMini Kilobytes 8388608 Minimum number of KB to mumInKB allocate in kernel mode for the logger write buffer memory pool. then SharedLogId must also be specified. This memory pool is used for caching state information before writing to disk. The cluster manifest is a single XML file that holds settings and configurations that apply to all nodes and services in the cluster. WriteBufferMemoryPoolMax Kilobytes No Limit Maximum size to which the imumInKB logger write buffer memory pool can grow. 
One set is global for all reliable services in the cluster while the other set is specific to a particular reliable service. If SharedLogId is specified. Configure stateful reliable services 2/21/2017 • 9 min to read • Edit Online There are two sets of configuration settings for reliable services. SharedLogId GUID "" Specifies a unique GUID to use for identifying the default shared log file used by all reliable services on all nodes in the cluster that do not specify the SharedLogId in their service specific configuration. . then SharedLogPath must also be specified. However. The file is typically called ClusterManifest.xml. SharedLogPath Fully qualified path name "" Specifies the fully qualified path where the shared log file used by all reliable services on all nodes in the cluster that do not specify the SharedLogPath in their service specific configuration. The SharedLogId and SharedLogPath settings are always used together to define the GUID and location for the default shared log for all nodes in the cluster. The pool size is controlled by the WriteBufferMemoryPoolMinimumInKB and WriteBufferMemoryPoolMaximumInKB settings. If there is more demand for memory from the memory pool than is available. "fabricSettings": [{ "name": "KtlLogger".xml for the specific service. the example below shows how to change the the shared transaction log that gets created to back any reliable collections for stateful services. shared log files should be placed on disks that are used solely for the shared log file to reduce contention. WriteBufferMemoryPoolMaximumInKB is the highest size to which the memory pool may grow. NAME UNIT DEFAULT VALUE REMARKS SharedLogSizeInMB Megabytes 8192 Specifies the number of MB of disk space to statically allocate for the shared log. . SharedLogId and SharedLogPath do not need to be specified in order for SharedLogSizeInMB to be specified. requests for memory will be delayed until memory is available. Therefore if the write buffer memory pool is too small for a particular configuration then performance may suffer. In Azure ARM or on premise JSON template. SharedLogSizeInMB specifies the amount of disk space to preallocate for the default shared log on all nodes. For best performance. The default shared log is used for all reliable services that do not specify the settings in the settings.Log"/> </Section> Remarks The logger has a global pool of memory allocated from non paged kernel memory that is available to all reliable services on a node for caching state data before being written to the dedicated log associated with the reliable service replica. "parameters": [{ "name": "SharedLogSizeInMB". "value": "4096" }] }] Sample local developer cluster manifest section If you want to change this on your local development environment. Each reliable service replica that is opened may increase the size of the memory pool by a system determined amount up to WriteBufferMemoryPoolMaximumInKB. <Section Name="KtlLogger"> <Parameter Name="SharedLogSizeInMB" Value="4096"/> <Parameter Name="WriteBufferMemoryPoolMinimumInKB" Value="8192" /> <Parameter Name="WriteBufferMemoryPoolMaximumInKB" Value="8192" /> <Parameter Name="SharedLogId" Value="{7668BB54-FE9C-48ed-81AC-FF89E60ED2EF}"/> <Parameter Name="SharedLogPath" Value="f:\SharedLog. WriteBufferMemoryPoolMinimumInKB specifies both the initial size of this memory pool and the lowest size to which the memory pool may shrink. you need to edit the local clustermanifest. The value must be 2048 or larger. 
Service Specific Configuration You can modify stateful Reliable Services' default configurations by using the configuration package (Config) or the service implementation (code).xml file. override the replicatorSettingsSectionName parameter to the ReliableStateManagerConfiguration constructor when creating the ReliableStateManager for this service. Code . By default. Replicator configuration Replicator configurations configure the replicator that is responsible for making the stateful Reliable Service's state highly reliable by replicating and persisting the state locally. Default section name ReplicatorSecurityConfig NOTE To change this section name. ensuring that the data that is made highly available is also secure. This means that services will not be able to see each other's replication traffic. By default. Configuration names NAME UNIT DEFAULT VALUE REMARKS . Replicator security configuration Replicator security configurations are used to secure the communication channel that is used during replication. Renaming the config package or section names will require a code change when configuring the ReliableStateManager. Config . Default section name ReplicatorConfig NOTE To change this section name.Configuration via code is accomplished by creating a ReliableStateManager using a ReliableStateManagerConfiguration object with the appropriate options set. the Azure Service Fabric runtime looks for predefined section names in the Settings. The default configuration is generated by the Visual Studio template and should suffice. override the replicatorSecuritySectionName parameter to the ReliableStateManagerConfiguration constructor when creating the ReliableStateManager for this service.xml file that is generated in the Microsoft Visual Studio package root under the Config folder for each service in the application.xml file that is generated in the Visual Studio solution unless you plan to configure your service via code. NOTE Do not delete the section names of the following configurations in the Settings. an empty security configuration section prevents replication security.xml file and consumes the configuration values while creating the underlying runtime components.Configuration via the config package is accomplished by changing the Settings. This section talks about additional configurations that are available to tune the replicator. This value must be a multiple of 4 and greater than 16. MaxSecondaryReplicationQu Number of operations 16384 Maximum number of eueSize operations in the secondary queue. Refer to Service manifest resources to read more about defining endpoint resources in a service manifest. This should reference a TCP resource endpoint in the service manifest. . This value must be greater than 64 and a power of 2.015 Time period for which the val replicator at the secondary waits after receiving an operation before sending back an acknowledgement to the primary.NAME UNIT DEFAULT VALUE REMARKS BatchAcknowledgementInter Seconds 0. ReplicatorEndpoint N/A No default--required IP address and port that the parameter primary/secondary replicator will use to communicate with other replicators in the replica set. An operation is freed up after making its state highly available through persistence. An operation is freed up after the primary replicator receives an acknowledgement from all the secondary replicators. Any other acknowledgements to be sent for operations processed within this interval are sent as one response. 
MaxRecordSizeInKB KB 1024 Largest record size that the replicator may write in the log. This value must be greater than 64 and a power of 2. MaxPrimaryReplicationQueu Number of operations 8192 Maximum number of eSize operations in the primary queue. CheckpointThresholdInMB MB 50 Amount of log file space after which the state is checkpointed. In such cases. MinLogSizeInMB * TruncationThresholdFactor must be less than MaxStreamSizeInMB. user is required to take a full backup. Increasing this value increases the possibility of doing partial copies and incremental backups since chances of relevant log records being truncated is lowered. ThrottlingThresholdFactor Factor 4 Determines at what size of the log. 0 indicates that the replicator will determine the minimum log size.NAME UNIT DEFAULT VALUE REMARKS MinLogSizeInMB MB 0 (system determined) Minimum size of the transactional log. The log will not be allowed to truncate to a size below this setting. Truncation threshold is determined by MinLogSizeInMB multiplied by TruncationThresholdFactor. . (CheckpointThresholdInMB * ThrottlingThresholdFactor)). truncation will be triggered. the replica will start being throttled. Throttling threshold (in MB) must be greater than truncation threshold (in MB). An incremental backup requests will fail if the incremental backup would generate a backup log that would cause the accumulated backup logs since the relevant full backup to be larger than this size. MaxAccumulatedBackupLog MB 800 Max accumulated size (in SizeInMB MB) of backup logs in a given backup log chain. Throttling threshold (in MB) is determined by Max((MinLogSizeInMB * ThrottlingThresholdFactor). TruncationThresholdFactor Factor 2 Determines at what size of the log. TruncationThresholdFactor must be greater than 1. Truncation threshold (in MB) must be less than MaxStreamSizeInMB. SlowApiMonitoringDuration Seconds 300 Sets the monitoring interval for managed API calls. /// </summary> static void Main() { ServiceRuntime. However. if SharedLogPath is specified. then SharedLogPath must also be specified. } } class MyStatefulService : StatefulService { public MyStatefulService(StatefulServiceContext context.GetAwaiter(). SharedLogPath Fully qualified path name "" Specifies the fully qualified path where the shared log file for this replica will be created. if SharedLogId is specified. a warning health report will be sent to the Health Manager. context => new HelloWorldStateful(context.. Typically.. new ReliableStateManagerConfiguration( new ReliableStateManagerReplicatorSettings() { RetryInterval = TimeSpan. services should not use this setting. new ReliableStateManager(context. } Sample configuration file .FromSeconds(3) } )))). stateManager) { } . NAME UNIT DEFAULT VALUE REMARKS SharedLogId GUID "" Specifies a unique GUID to use for identifying the shared log file used with this replica.GetResult(). Typically. services should not use this setting. Example: user provided backup callback function.RegisterServiceAsync("HelloWorldStatefulType". IReliableStateManagerReplica stateManager) : base(context. Sample configuration via code class Program { /// <summary> /// This is the entry point of the service host process. However. then SharedLogId must also be specified. After the interval has passed. the higher the overall replication throughput. 
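As a focused illustration of the backup-related settings above (separate from the full sample configuration file shown below), a service that wants to make incremental backups more likely could raise its minimum log size and truncation threshold in its Settings.xml. The values here are only an example and must respect the constraints in the table, such as keeping the truncation threshold below the throttling threshold and below MaxStreamSizeInMB:

<Section Name="ReplicatorConfig">
  <Parameter Name="ReplicatorEndpoint" Value="ReplicatorEndpoint" />
  <Parameter Name="MinLogSizeInMB" Value="256" />
  <Parameter Name="TruncationThresholdFactor" Value="3" />
</Section>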
The value for CheckpointThresholdInMB controls the amount of disk space that the replicator can use to store state information in the replica's dedicated log file.My-Test-SAN1-Bob" /> </Section> </Settings> Remarks BatchAcknowledgementInterval controls replication latency. Shared log files should be placed on disks that are used solely for the shared log file to reduce head movement contention. This directly translates to the latency of transaction commits. A value of '0' results in the lowest possible latency.com/2011/01/fabric"> <Section Name="ReplicatorConfig"> <Parameter Name="ReplicatorEndpoint" Value="ReplicatorEndpoint" /> <Parameter Name="BatchAcknowledgementInterval" Value="0. as smaller records use only the space needed for the smaller record. The larger the value for BatchAcknowledgementInterval. This can potentially increase the recovery time of a replica after a crash. if the service is causing larger data items to be part of the state information.org/2001/XMLSchema" xmlns:xsi="http://www. This is due to the partial state transfer that takes place due to the availability of more history of operations in the log. at the cost of throughput (as more acknowledgement messages must be sent and processed. <?xml version="1.w3. For best efficiency. Increasing this to a higher value than the default could result in faster reconfiguration times when a new replica is added to the set. the default 1024-KB record size is optimal.0" encoding="utf-8"?> <Settings xmlns:xsd="http://www.w3. The MaxRecordSizeInKB setting defines the maximum size of a record that can be written by the replicator into the log file. There is little benefit in making MaxRecordSizeInKB smaller than 1024. as many services as possible should specify the same shared log. then this value might need to be increased. In most cases. The SharedLogId and SharedLogPath settings are always used together to make a service use a separate shared log from the default shared log for the node. each containing fewer acknowledgements). We expect that this value would need to be changed in only rare cases.org/2001/XMLSchema-instance" xmlns="http://schemas. at the cost of higher operation latency.05"/> <Parameter Name="CheckpointThresholdInMB" Value="512" /> </Section> <Section Name="ReplicatorSecurityConfig"> <Parameter Name="CredentialType" Value="X509" /> <Parameter Name="FindType" Value="FindByThumbprint" /> <Parameter Name="FindValue" Value="9d c9 06 b1 69 dc 4f af fd 16 97 ac 78 1e 80 67 90 74 9d 2f" /> <Parameter Name="StoreLocation" Value="LocalMachine" /> <Parameter Name="StoreName" Value="My" /> <Parameter Name="ProtectionLevel" Value="EncryptAndSign" /> <Parameter Name="AllowedCommonNames" Value="My-Test-SAN1-Alice. We expect that this value would need to be changed in only rare cases. Next steps Debug your Service Fabric application in Visual Studio Developer reference for Reliable Services .microsoft. However. An example is a sorted index of all keys in Reliable Dictionary. . the replica's state can be restored from a backup via RestoreAsync. and synchronous events shouldn't include any expensive operations. notifications should be handled as fast as possible. At the end of recovery. you need to register with the TransactionChanged or StateManagerChanged events on Reliable State Manager. A common place to register with these event handlers is the constructor of your stateful service. Because of that. you won't miss any notification that's caused by a change during the lifetime of IReliableStateManager. 
Reliable Services notifications 3/1/2017 • 6 min to read • Edit Online

Notifications allow clients to track the changes that are being made to an object that they're interested in. Two types of objects support notifications: Reliable State Manager and Reliable Dictionary.

Common reasons for using notifications are:
Building materialized views, such as secondary indexes or aggregated filtered views of the replica's state. An example is a sorted index of all keys in Reliable Dictionary.
Sending monitoring data, such as the number of users added in the last hour.

Reliable State Manager notifications
Reliable State Manager provides notifications for the following events:
Transaction: Commit
State manager: Rebuild, addition of a reliable state, removal of a reliable state

Reliable State Manager tracks the current inflight transactions. The only change in transaction state that causes a notification to be fired is a transaction being committed.

Reliable State Manager maintains a collection of reliable states like Reliable Dictionary and Reliable Queue. Reliable State Manager fires notifications when this collection changes: a reliable state is added or removed, or the entire collection is rebuilt. The Reliable State Manager collection is rebuilt in three cases:
Recovery: When a replica starts, it recovers its previous state from the disk. At the end of recovery, it uses NotifyStateManagerChangedEventArgs to fire an event that contains the set of recovered reliable states.
Full copy: Before a replica can join the configuration set, it has to be built. Sometimes, this requires a full copy of Reliable State Manager's state from the primary replica to be applied to the idle secondary replica. Reliable State Manager on the secondary replica uses NotifyStateManagerChangedEventArgs to fire an event that contains the set of reliable states that it acquired from the primary replica.
Restore: In disaster recovery scenarios, the replica's state can be restored from a backup via RestoreAsync. In such cases, Reliable State Manager on the primary replica uses NotifyStateManagerChangedEventArgs to fire an event that contains the set of reliable states that it restored from the backup.

To register for transaction notifications and/or state manager notifications, you need to register with the TransactionChanged or StateManagerChanged events on Reliable State Manager. A common place to register with these event handlers is the constructor of your stateful service. When you register on the constructor, you won't miss any notification that's caused by a change during the lifetime of IReliableStateManager.

Notifications are fired as part of applying operations. Because of that, notifications should be handled as fast as possible, and synchronous events shouldn't include any expensive operations.
NotifyStateManagerChangedEventArgs has two subclasses: NotifyStateManagerRebuildEventArgs and NotifyStateManagerSingleEntityChangedEventArgs.Transaction. } The TransactionChanged event handler uses NotifyTransactionChangedEventArgs to provide details about the event. NotifyTransactionChangedEventArgs e) { if (e. private void OnTransactionChangedHandler(object sender.Transaction.TransactionChanged += this.ProcessStataManagerRebuildNotification(e).Add and NotifyStateManagerChangedAction. public void OnStateManagerChangedHandler(object sender. CreateReliableStateManager(context)) { this. TransactionChanged events are raised only if the transaction is committed. We recommend checking the action and processing the event only if it's one that you expect.lastCommittedTransactionList.OnTransactionChangedHandler.CommitSequenceNumber. NOTE Today. NotifyTransactionChangedAction.OnStateManagerChangedHandler.ProcessStateManagerSingleEntityNotification(e).StateManager. var enumerator = e.Reliable Dictionary provides notifications for the following events: Rebuild: Called when ReliableDictionary has recovered its state from a recovered or copied local state or backup.Value).GetAsyncEnumerator(). } } } } NOTE ProcessStateManagerSingleEntityNotification is the sample method that the preceding OnStateManagerChangedHandler example calls.Add) { if (operation.OnDictionaryRebuildNotificationHandlerAsync. private void ProcessStateManagerSingleEntityNotification(NotifyStateManagerChangedEventArgs e) { var operation = e as NotifyStateManagerSingleEntityChangedEventArgs. along with DictionaryChanged. To get Reliable Dictionary notifications.ReliableState is IReliableDictionary<TKey.ReliableState. Add: Called when an item has been added to ReliableDictionary.Add(enumerator. Registering when IReliableDictionary is added to IReliableStateManager ensures that you won't miss any notifications. TValue>) { var dictionary = (IReliableDictionary<TKey.MoveNextAsync(CancellationToken. Update: Called when an item in IReliableDictionary has been updated. TValue> rebuildNotification) { this.secondaryIndex.OnDictionaryChangedHandler.Action == NotifyStateManagerChangedAction.StateManagerChanged add notification. A common place to register with these event handlers is in the ReliableStateManager. TValue> origin.RebuildNotificationAsyncCallback = this. if (operation. you need to register with the DictionaryChanged event handler on IReliableDictionary.Current. TValue>)operation.DictionaryChanged += this.secondaryIndex. The preceding code sets the IReliableNotificationAsyncCallback interface. } } . NotifyDictionaryRebuildEventArgs<TKey.Clear(). enumerator. Remove: Called when an item in IReliableDictionary has been deleted. dictionary. public async Task OnDictionaryRebuildNotificationHandlerAsync( IReliableDictionary<TKey. Because NotifyDictionaryRebuildEventArgs contains an IAsyncEnumerable interface--which needs to be enumerated asynchronously--rebuild notifications are fired through RebuildNotificationAsyncCallback instead of OnDictionaryChangedHandler. while (await enumerator. dictionary.Key.State.Current.None)) { this. Clear: Called when the state of ReliableDictionary has been cleared through the ClearAsync method. TValue>. case NotifyDictionaryChangedAction. all previous notifications are irrelevant. TValue>.ProcessAddNotification(addEvent).Update: NotifyDictionaryItemUpdatedEventArgs NotifyDictionaryChangedAction.Clear: var clearEvent = e as NotifyDictionaryClearEventArgs<TKey. TValue>. return. TValue> e) { switch (e. 
NOTE In the preceding code.Remove: var deleteEvent = e as NotifyDictionaryItemRemovedEventArgs<TKey. case NotifyDictionaryChangedAction.Rebuild: NotifyDictionaryRebuildEventArgs NotifyDictionaryChangedAction. New action types might be added in the future. first the maintained aggregated state is cleared. return.Add: var addEvent = e as NotifyDictionaryItemAddedEventArgs<TKey.ProcessUpdateNotification(updateEvent).Remove: NotifyDictionaryItemRemovedEventArgs public void OnDictionaryChangedHandler(object sender.ProcessClearNotification(clearEvent).Update: var updateEvent = e as NotifyDictionaryItemUpdatedEventArgs<TKey. return.Action) { case NotifyDictionaryChangedAction. And because operations are guaranteed only to be locally committed (in other words. The DictionaryChanged event handler uses NotifyDictionaryChangedEventArgs to provide details about the event. a restore notification is fired as the last step of a restore operation. case NotifyDictionaryChangedAction. NotifyDictionaryChangedEventArgs has five subclasses.Clear: NotifyDictionaryClearEventArgs NotifyDictionaryChangedAction. clients see only notifications for locally committed operations. Here are some things to keep in mind: Notifications are fired as part of the execution of an operation. For example. . Do not execute any expensive operations (for example. this. this. Use the action property in NotifyDictionaryChangedEventArgs to cast NotifyDictionaryChangedEventArgs to the correct subclass: NotifyDictionaryChangedAction. return. Do check the action type before you process the event. this. as part of processing the rebuild notification.Remove: NotifyDictionaryItemAddedEventArgs NotifyDictionaryChangedAction. NotifyDictionaryChangedEventArgs<TKey. default: break. I/O operations) as part of synchronous events.ProcessRemoveNotification(deleteEvent). A restore will not finish until the notification event is processed.Add and NotifyDictionaryChangedAction. } } Recommendations Do complete notification events as fast as possible. TValue>. Because the reliable collection is being rebuilt with a new state. this. Because notifications are fired as part of the applying operations. This means that if transaction T1 includes Create(X). a single notification is fired for each applied operation. One important difference of undo notifications is that events that have duplicate keys are aggregated. For transactions that contain multiple operations. in that order. one for the deletion. they might or might not be undone in the future. Notifications are raised for such undo operations. you'll see a single notification to Delete(X). logged). you'll get one notification for the creation of X. if transaction T1 is being undone. Delete(X). and one for the creation again. On the redo path. and Create(X). operations are applied in the order in which they were received on the primary replica from the user. some operations might be undone. For example. As part of processing false progress. Next steps Reliable Collections Reliable Services quick start Reliable Services backup and restore (disaster recovery) Developer reference for Reliable Collections . rolling the state of the replica back to a stable point. Offline data processing. Thus. Each time it backs up it will need to copy 16 GB of checkpoints in addition to 50 MB (configurable using CheckpointThresholdInMB) worth of logs. . The backup APIs provided by the platform allow to take backup(s) of a service partition's state. For example. even if one node in the cluster fails. For example. 
Back up and restore Reliable Services and Reliable Actors 3/28/2017 • 14 min to read • Edit Online

Azure Service Fabric is a high-availability platform that replicates the state across multiple nodes to maintain this high availability. Thus, even if one node in the cluster fails, the services continue to be available. While this in-built redundancy provided by the platform may be sufficient for some, in certain cases it is desirable for the service to back up data (to an external store).

NOTE It is critical to back up and restore your data (and test that it works as expected) so you can recover from data loss scenarios.

For example, a service may want to back up data in the following scenarios:
In the event of the permanent loss of an entire Service Fabric cluster or all nodes that are running a given partition.
Administrative errors whereby the state accidentally gets deleted or corrupted. For example, this may happen if an administrator with sufficient privilege erroneously deletes the service.
Bugs in the service that cause data corruption. For example, this may happen when a service code upgrade starts writing faulty data to a Reliable Collection. In such a case, both the code and the data may have to be reverted to an earlier state.
Offline data processing. It might be convenient to have offline processing of data for business intelligence that happens separately from the service that generates the data.

The Backup/Restore feature allows services built on the Reliable Services API to create and restore backups without blocking read or write operations. The backup APIs provided by the platform allow you to take backups of a service partition's state, and the restore APIs allow a service partition's state to be restored from a chosen backup.

Types of Backup
There are two backup options: Full and Incremental. A full backup is a backup that contains all the data required to recreate the state of the replica: checkpoints and all log records. Since it has the checkpoints and the log, a full backup can be restored by itself.

The problem with full backups arises when the checkpoints are large. For example, a replica that has 16 GB of state will have checkpoints that add up to approximately 16 GB. If we have a Recovery Point Objective of 5 minutes, the replica needs to be backed up every 5 minutes. Each time it backs up, it needs to copy 16 GB of checkpoints in addition to 50 MB (configurable using CheckpointThresholdInMB) worth of logs. The solution to this problem is incremental backups, where only the log records since the last backup are backed up.
Since incremental backups contain only the changes since the last backup (they do not include the checkpoints), they tend to be faster, but they cannot be restored on their own. To restore an incremental backup, the entire backup chain is required. A backup chain is a chain of backups starting with a full backup and followed by a number of contiguous incremental backups.

A request to take an incremental backup can fail with FabricMissingFullBackupException, which indicates that the replica has never taken a full backup since it became primary, that some of the log records since the last backup have been truncated, or that the replica passed the MaxAccumulatedBackupLogSizeInMB limit. Users can increase the likelihood of being able to do incremental backups by configuring MinLogSizeInMB or TruncationThresholdFactor. Note that increasing these values will increase the per-replica disk usage. For more information, see Reliable Services Configuration.

Backup Reliable Services
The service author has full control of when to make backups and where backups will be stored. To start a backup, the service needs to invoke the inherited member function BackupAsync. Backups can be made only from primary replicas, and they require write status to be granted.

As shown below, BackupAsync takes in a BackupDescription object, where one can specify a full or incremental backup, as well as a callback function of type Func<BackupInfo, CancellationToken, Task<bool>> that is invoked when the backup folder has been created locally and is ready to be moved out to some external storage.

BackupDescription myBackupDescription = new BackupDescription(backupOption, this.BackupCallbackAsync);

await this.BackupAsync(myBackupDescription);

BackupInfo provides information regarding the backup, including the location of the folder where the runtime saved the backup (BackupInfo.Directory). The callback function can move the BackupInfo.Directory to an external store or another location. This function also returns a bool that indicates whether it was able to successfully move the backup folder to its target location.

The following code demonstrates how the BackupCallbackAsync method can be used to upload the backup to Azure Storage:

private async Task<bool> BackupCallbackAsync(BackupInfo backupInfo, CancellationToken cancellationToken)
{
    var backupId = Guid.NewGuid();

    await externalBackupStore.UploadBackupFolderAsync(backupInfo.Directory, backupId, cancellationToken);

    return true;
}
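For example, a service with the 5-minute Recovery Point Objective mentioned earlier could drive backups from RunAsync. The following is only a sketch: it tries an incremental backup first and falls back to a full backup when the runtime reports that a usable full backup is missing. The schedule and error handling are placeholders that each service must choose for itself.

protected override async Task RunAsync(CancellationToken cancellationToken)
{
    while (true)
    {
        cancellationToken.ThrowIfCancellationRequested();

        try
        {
            // Incremental backups are smaller, so try one first.
            await this.BackupAsync(
                new BackupDescription(BackupOption.Incremental, this.BackupCallbackAsync));
        }
        catch (FabricMissingFullBackupException)
        {
            // No usable full backup exists yet (or too much log has been truncated),
            // so take a full backup to start a new backup chain.
            await this.BackupAsync(
                new BackupDescription(BackupOption.Full, this.BackupCallbackAsync));
        }

        // Back up every 5 minutes, matching the example Recovery Point Objective above.
        await Task.Delay(TimeSpan.FromMinutes(5), cancellationToken);
    }
}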
we offer some examples on using RestoreAsync to recover from the above scenarios. this BackupFolderPath should be set to the local path of the folder that contains your full backup. ExternalBackupStore is the sample class that is used to interface with Azure Blob storage. the runtime would automatically detect the data loss and invoke the OnDataLossAsync API.In the above example. Restore Reliable Services In general. Partition data loss in Reliable Services In this case. Following is an example implementation of the OnDataLossAsync method: protected override async Task<bool> OnDataLossAsync(RestoreContext restoreCtx. In this case. the cases when you might need to perform a restore operation fall into one of these categories: The service partition lost data. Once the service is up. For example. an application upgrade may start to update every phone number record in a Reliable Dictionary with an invalid area code. Thus. For each partition. that may cause corruption of data.BackupFolderPath provided does not contain a full backup. since the latest backups may also be corrupt.InvokeDataLossAsync on each partition to restore the entire service. Thus.. This is specified as part of RestoreDescription. when OnDataLossAsync is called on the partition in the production cluster. It can also throw ArgumentException if BackupFolderPath has a broken chain of incremental backups.StartPartitionDataLossAsync on every partition. Repeat this process for each partition. Each partition needs to restore the latest relevant backup from the external store. It is important to create the service with the same configuration. Note that: When you restore. steps in the "Deleted or lost service" section can be used to restore the state of the service to the state before the buggy code corrupted the state. move/delete all backups of this partition that were more recent (than that backup). the data may still be corrupt and thus data may need to be restored. the service needs to store the appropriate partition information and service name to identify the correct latest backup to restore from for each partition. it may not be sufficient to restore the latest backup. Replication of corrupt application data If the newly deployed application upgrade has a bug. Now. the API to restore data (OnDataLossAsync above) has to be invoked on every partition of this service. the last backup found in the external store will be the one picked by the above process. implementation is the same as the above scenario.TestManagementClient. From this point. you have to find the last backup that was made before the data got corrupted. start restoring the backups from the most recent to the least. you could deploy a new Service Fabric cluster and restore the backups of affected partitions just like the above "Deleted or lost service" scenario. One way of achieving this is by using FabricClient. Now.Force can be used to skip this safety check. Once you find a backup that does not have the corruption. NOTE The RestorePolicy is set to Safe by default. so that the data can be restored seamlessly. If you are not sure which backups are corrupt. One caveat is that the partition ID may have now changed. upgrade to the version of the application code that does not have the bug. partitioning scheme. For example. However. if it contains the full backup. if possible. This means that the RestoreAsync API will fail with ArgumentException if it detects that the backup folder contains a state that is older than or equal to the state contained in this replica. 
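When the external store holds a backup chain (a full backup plus its incremental backups), the data-loss handler must download the whole chain into a single local folder before calling RestoreAsync. The following variant of the handler above is only a sketch: DownloadLatestBackupChainAsync is a hypothetical method on the same ExternalBackupStore sample class, and System.IO is used for the local folder handling.

protected override async Task<bool> OnDataLossAsync(RestoreContext restoreCtx, CancellationToken cancellationToken)
{
    // Local working folder that will receive the full backup and all of its incremental backups.
    string backupFolder = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N"));
    Directory.CreateDirectory(backupFolder);

    // Hypothetical helper: downloads the latest full backup and every incremental backup taken
    // after it into backupFolder (for example, one subdirectory per backup).
    await this.externalBackupStore.DownloadLatestBackupChainAsync(backupFolder, cancellationToken);

    // Hand RestoreAsync the folder that contains the entire chain, not just the latest backup.
    await restoreCtx.RestoreAsync(new RestoreDescription(backupFolder));

    return true;
}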
NOTE It is not recommended to use FabricClient.g. In such cases. the invalid phone numbers will be replicated since Service Fabric is not aware of the nature of the data that is being stored. RestorePolicy. e. In this case. the first incremental and the third incremental backup but no the second incremental backup. since the runtime creates partition IDs dynamically. Deleted or lost service If a service is removed.ServiceManager. you must first re-create the service before the data can be restored. there is a chance that the backup being restored is older than the state of the partition . even after the service code is fixed. The first thing to do after you detect such an egregious bug that causes data corruption is to freeze the service at the application level and. since that may corrupt your cluster state. ActorRuntime. Since backups will be taken on a per- partition basis. you should restore only as a last resort to recover as much data as possible.GetResult(). // } When you create a custom actor service class. null. Hence. actorTypeInfo) { } // // Method overrides and other code. you need to register that as well when registering the actor. The ActorService which hosts the actor(s) is a stateful reliable service. states for all actors in that partition will be backed up (and restoration is similar and will happen on a per-partition basis). The default state provider for Reliable Actors is KvsActorStateProvider. Incremental backup is not enabled by default for KvsActorStateProvider. depending on the FabricDataRoot path and Application Type name's length. // } After incremental backup has been enabled. class MyCustomActorService : ActorService { public MyCustomActorService(StatefulServiceContext context. like Directory. You can enable incremental backup by creating KvsActorStateProvider with the appropriate setting in its constructor and then passing it to ActorService constructor as shown in following code snippet: class MyCustomActorService : ActorService { public MyCustomActorService(StatefulServiceContext context. like CopyFile. new KvsActorStateProvider(true)) // Enable incremental backup { } // // Method overrides and other code. null. .GetAwaiter(). before the data was lost. To perform backup/restore. taking an incremental backup can fail with FabricMissingFullBackupException for one of following reasons and you will need to take a full backup before taking incremental backup(s): The replica has never taken a full backup since it became primary. Backup and restore Reliable Actors Reliable Actors Framework is built on top of Reliable Services. typeInfo)). This can cause some . Because of this. One workaround is to directly call kernel32 APIs.NET methods. actorTypeInfo.Move. ActorTypeInformation actorTypeInfo) : base(context. all the backup and restore functionality available in Reliable Services is also available to Reliable Actors (except behaviors that are state provider specific). the service owner should create a custom actor service class that derives from ActorService class and then do backup/restore similar to Reliable Services as described above in previous sections. ActorTypeInformation actorTypeInfo) : base(context.RegisterActorAsync<MyActor>( (context. typeInfo) => new MyCustomActorService(context. The string that represents the backup folder path and the paths of files inside the backup folder can be greater than 255 characters. to throw the PathTooLongException exception. Any transaction that commits after BackupAsync has been called may or may not be in the backup. . 
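To invoke OnDataLossAsync on every partition of a re-created service, as described above, a management client can enumerate the partitions and trigger data loss on each one with the StartPartitionDataLossAsync API. This is only a sketch; the cluster endpoint and service name are placeholders, and a secured cluster would also need credentials supplied to the FabricClient.

var fabricClient = new FabricClient("mycluster.cloudapp.azure.com:19000");
var serviceUri = new Uri("fabric:/MyApp/MyStatefulService");

var partitions = await fabricClient.QueryManager.GetPartitionListAsync(serviceUri);

foreach (var partition in partitions)
{
    // Triggering full data loss causes the runtime to call OnDataLossAsync on that partition,
    // which is where the service restores its state from the external store.
    await fabricClient.TestManager.StartPartitionDataLossAsync(
        Guid.NewGuid(),
        PartitionSelector.PartitionIdOf(serviceUri, partition.PartitionInformation.Id),
        DataLossMode.FullDataLoss);
}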
When incremental backup is enabled. NOTE KvsActorStateProvider currently ignores the option RestorePolicy. Some of the log records were truncated since last backup was taken. If no backup is taken by user for a period of 45 minutes. the service's backup callback is invoked. The log records may also get truncated if primary replica need to build another replica by sending all its data. The restore API will throw FabricException with appropriate error message if the backup chain validation fails. Then. This interval can be configured by specifying logTrunctationIntervalInMinutes in KvsActorStateProvider constructor (similar to when enabling incremental backup). the system automatically truncates the log records. Testing Backup and Restore It is important to ensure that critical data is being backed up. it utilizes a checkpoint and log persistence mechanism. This can be done by invoking the Invoke-ServiceFabricPartitionDataLoss cmdlet in PowerShell that can induce data loss in a particular partition to test whether the data backup and restore functionality for your service is working as expected. the Reliable State Manager copies all log records. When BackupAsync is called. When doing restore from a backup chain. similar to Reliable Services. To do so. the BackupFolderPath should contain subdirectories with one subdirectory containing full backup and others subdirectories containing incremental backup(s). KvsActorStateProvider does not use circular buffer to manage its log records and periodically truncates it.Service service for more details.Safe. NOTE You can find a sample implementation of backup and restore functionality in the Web Reference App on GitHub.. Since all the log records up to the latest log record are included in the backup and the Reliable State Manager preserves write-ahead logging. Support for this feature is planned in an upcoming release. It is also possible to programmatically invoke data loss and restore from that event as well. starting from the "start pointer" to the latest log record into the backup folder. Once the local backup folder has been populated by the platform (i. and can be restored from. Restore The Reliable State Manager provides the ability to restore from a backup by using the RestoreAsync API. Please look at the Inventory. This callback is responsible for moving the backup folder to an external location such as Azure Storage. the Reliable State Manager guarantees that all transactions that are committed (CommitAsync has returned successfully) are included in the backup. Under the hood: more details on backup and restore Here's some more details on backup and restore.e. the Reliable State Manager instructs all Reliable objects to copy their latest checkpoint files to a local backup folder. Backup The Reliable State Manager provides the ability to create consistent backups without blocking any read or write operations. The Reliable State Manager takes fuzzy (lightweight) checkpoints at certain points to relieve pressure from the transactional log and improve recovery times. local backup is completed by the runtime). RestoreAsync first drops all existing state in the primary replica that it was called on. Next steps Reliable Collections Reliable Services quick start Reliable Services notifications Reliable Services configuration Developer reference for Reliable Collections . Until a service completes this API successfully (by returning true or false) and finishes the relevant reconfiguration. If the OnDataLossAsync returns true. 
This implies that for StatefulService implementers. Service Fabric will rebuild all other replicas from this primary.The RestoreAsync method on RestoreContext can be called only inside the OnDataLossAsync method. the Reliable State Manager recovers its own state from the log records in the backup folder and performs recovery. Then. Finally. As part of the recovery process. OnDataLossAsync will be invoked on the new primary. Service Fabric ensures that replicas that will receive OnDataLossAsync call first transition to the primary role but are not granted read status or write status. The bool returned by OnDataLossAsync indicates whether the service restored its state from an external source. the API will keep being called one at a time. This step ensures that the recovered state is consistent. the Reliable objects are instructed to restore from their checkpoints in the backup folder. Then the Reliable State Manager creates all the Reliable objects that exist in the backup folder. RunAsync will not be called until OnDataLossAsync finishes successfully. operations starting from the "starting point" that have commit log records in the backup folder are replayed to the Reliable objects. Next. Task CloseAsync(CancellationToken cancellationToken). CompletableFuture<?> closeAsync(CancellationToken cancellationToken). For stateless services: class MyStatelessService : StatelessService { protected override IEnumerable<ServiceInstanceListener> CreateServiceInstanceListeners() { . How to use the Reliable Services communication APIs 4/7/2017 • 10 min to read • Edit Online Azure Service Fabric as a platform is completely agnostic about communication between services. void abort(). } You can then add your communication listener implementation by returning it in a service-based class method override. The Reliable Services application framework provides built-in communication stacks as well as APIs that you can use to build your custom communication components. from UDP to HTTP.. } public interface CommunicationListener { CompletableFuture<String> openAsync(CancellationToken cancellationToken). } .. Set up service communication The Reliable Services API uses a simple interface for service communication.. void Abort(). simply implement this interface: public interface ICommunicationListener { Task<string> OpenAsync(CancellationToken cancellationToken). To open an endpoint for your service. It's up to the service developer to choose how services should communicate.. } . All protocols and stacks are acceptable. Not only can you use multiple communication listeners in a service. false) }. For example. Each listener gets a name. This allows your service to listen on multiple endpoints. by using multiple listeners.CreateServiceRemotingListener(context). and the resulting collection of name : address pairs are represented as a JSON object when a client requests the listening addresses for a service instance or a partition. the override returns a collection of ServiceInstanceListeners... For stateful services. because a ServiceReplicaListener has an option to open an ICommunicationListener on secondary replicas. } . } In both cases.. class MyStatefulService : StatefulService { protected override IEnumerable<ServiceReplicaListener> CreateServiceReplicaListeners() { . "HTTPReadonlyEndpoint". This is slightly different from its stateless counterpart. } . public class MyStatelessService extends StatelessService { @Override protected List<ServiceInstanceListener> createServiceInstanceListeners() { . For example. 
custom listener that takes read requests on secondary replicas over HTTP: protected override IEnumerable<ServiceReplicaListener> CreateServiceReplicaListeners() { return new[] { new ServiceReplicaListener(context => new MyCustomHttpListener(context).... but you can also specify which listeners accept requests on secondary replicas and which ones listen only on primary replicas.. the override returns a collection of ServiceReplicaListeners. true). you may have an HTTP listener and a separate WebSocket listener. you return a collection of listeners. } . } For stateful services: NOTE Stateful reliable services are not supported in Java yet.. In a stateless service. you can have a ServiceRemotingListener that takes RPC calls only on primary replicas. new ServiceReplicaListener(context => this. potentially using different protocols. "rpcPrimaryEndpoint". and a second. A ServiceInstanceListener contains a function to create an ICommunicationListener(C#) / CommunicationListener(Java) and gives it a name. GetEndpoint("ServiceEndpoint").CodePackageActivationContext. The recommended way of doing this is for the communication listener to use the partition ID and replica/instance ID when it generates the listen address. This return value that gets published in the Naming Service is a string whose value can be anything at all. The listener can then start listening for requests when it is opened.getPort(). int port = codePackageActivationContext. <Resources> <Endpoints> <Endpoint Name="WebServiceEndpoint" Protocol="http" Port="80" /> <Endpoint Name="OtherServiceEndpoint" Protocol="tcp" Port="8505" /> <Endpoints> </Resources> The communication listener can access the endpoint resources allocated to it from the CodePackageActivationContext in the ServiceContext . NOTE Endpoint resources are common to the entire service package. each listener must be given a unique name. var codePackageActivationContext = serviceContext.getCodePackageActivationContext(). Multiple service replicas hosted in the same ServiceHost may share the same port.getEndpoint("ServiceEndpoint"). The Naming Service is a registrar for services and their addresses that each instance or replica of the service is listening on. When the OpenAsync(C#) / openAsync(Java) method of an ICommunicationListener(C#) / CommunicationListener(Java) completes. CodePackageActivationContext codePackageActivationContext = serviceContext. This string value is what clients see when they ask for an address for the service from the Naming Service. Finally. Service address registration A system service called the Naming Service runs on Service Fabric clusters. . describe the endpoints that are required for the service in the service manifest under the section on endpoints. and they are allocated by Service Fabric when the service package is activated. its return value gets registered in the Naming Service.Port. This means that the communication listener should support port sharing. NOTE When creating multiple listeners for a service. var port = codePackageActivationContext. publishAddress = this. Services are moved around in the cluster for resource balancing and availability purposes.getPort().Format( CultureInfo.getNodeContext(). port). this.com/Azure-Samples/service-fabric-java-getting-started.listeningAddress. this. see EchoServer application example at https://github. port). Communicating with a service The Reliable Services API provides the following libraries to write clients that communicate with services. return Task. 
int port = serviceEndpoint.publishAddress).completedFuture(this. This is the .listeningAddress = string.getIpAddressOrFQDN(). this. */ return CompletableFuture. In Service Fabric terminology.listeningAddress.Port. /* the string returned here will be published in the Naming Service.Start(this.getEndpoint("ServiceEndpoint").Replace("+".webApp = WebApp.webApp = new WebApp(port). FabricRuntime.Invoke(appBuilder)). } Service Fabric provides APIs that allow clients and other services to then ask for this address by service name. NOTE For a complete walk-through of how to write a communication listener. whereas for Java you can write your own HTTP server implementation. this. To connect to services within a cluster.InvariantCulture.webApp.publishAddress = String. public Task<string> OpenAsync(CancellationToken cancellationToken) { EndpointResourceDescription serviceEndpoint = serviceContext. see Service Fabric Web API services with OWIN self- hosting for C#.IPAddressOrFQDN). } public CompletableFuture<String> openAsync(CancellationToken cancellationToken) { EndpointResourceDescription serviceEndpoint = serviceContext.publishAddress).GetNodeContext().FromResult(this. The ServicePartitionResolver(C#) / FabricServicePartitionResolver(Java) utility class is a basic primitive that helps clients determine the endpoint of a service at runtime. "http://+:{0}/".CodePackageActivationContext.getCodePackageActivationContext. this. This is the mechanism that allow clients to resolve the listening address for a service.format("http://%s:%d/".startup. int port = serviceEndpoint. // the string returned here will be published in the Naming Service. appBuilder => this. FabricRuntime.GetEndpoint("ServiceEndpoint"). ServicePartitionResolver can be created using default settings. Service endpoint resolution The first step to communication with a service is to resolve an endpoint address of the partition or instance of the service you want to talk to. this. the process of determining the endpoint of a service is referred to as the service endpoint resolution.start(). This is important because the service address is not static. public FabricServicePartitionResolver(CreateFabricClient createFabricClient) { .com:19001").azure.com:19001").com:19000". "mycluster. FabricServicePartitionResolver resolver = new FabricServicePartitionResolver("mycluster.recommended usage for most situations: ServicePartitionResolver resolver = ServicePartitionResolver. FabricServicePartitionResolver resolver = new FabricServicePartitionResolver(() -> new CreateFabricClientImpl()). Note that gateway endpoints are just different endpoints for connecting to the same cluster. FabricServicePartitionResolver resolver = FabricServicePartitionResolver. cancellationToken). ServicePartitionResolver resolver = ServicePartitionResolver. so it is important to reuse FabricClient instances as much as possible.GetDefault().GetDefault(). Alternatively. ResolvedServicePartition partition = await resolver.azure. } FabricClient is the object that is used to communicate with the Service Fabric cluster for various management operations on the cluster.azure.. For example: ServicePartitionResolver resolver = new ServicePartitionResolver("mycluster.cloudapp. To connect to services in a different cluster. ServicePartitionResolver resolver = new ServicePartitionResolver(() => CreateMyFabricClient()). .cloudapp.com:19000". a ServicePartitionResolver can be created with a set of cluster gateway endpoints. "mycluster. 
new ServicePartitionKey().ResolveAsync(new Uri("fabric:/MyApp/MyService"). This is useful when you want more control over how a service partition resolver interacts with your cluster.getDefault().azure. FabricClient performs caching internally and is generally expensive to create.cloudapp. A resolve method is then used to retrieve the address of a service or a service partition for partitioned services.. } public interface CreateFabricClient { public FabricClient getFabricClient(). ServicePartitionResolver can be given a function for creating a FabricClient to use internally: public delegate FabricClient CreateFabricClientDelegate().cloudapp. CompletableFuture<ResolvedServicePartition> partition = resolver.resolveAsync(new URI("fabric:/MyApp/MyService"). /* * Getters and Setters */ } .g. service was deleted or the requested resource no longer exists). Communication clients and factories The communication factory library implements a typical fault-handling retry pattern that makes retrying connections to resolved service endpoints easier. The service address resolved through ServicePartitionResolver may be stale by the time your client code attempts to connect. class MyCommunicationClient : ICommunicationClient { public ResolvedServiceEndpoint Endpoint { get. Clients usually implement the abstract CommunicationClientFactoryBase class to handle logic that is specific to the communication stack. new ServicePartitionKey()). service moved or is temporarily unavailable).. The factories use the resolver internally to generate a client object that can be used to communicate with services. set. A service address can be resolved easily using a ServicePartitionResolver. Typically.getDefault(). This provides a base implementation of the CommunicationClientFactory interface and performs tasks that are common to all the communication stacks. Service instances or replicas can move around from node to node at any time for multiple reasons. set. The Reliable Services API provides a CommunicationClientFactoryBase<TCommunicationClient> . It is created and passed on to communication client factories in the Reliable Services API. In that case again the client needs to re-resolve the address. but more work is required to ensure the resolved address can be used correctly. the client code need not work with the ServicePartitionResolver directly. set. private ResolvedServiceEndpoint endPoint. } } public class MyCommunicationClient implements CommunicationClient { private ResolvedServicePartition resolvedServicePartition. The implementation of the CommunicationClientFactory depends on the communication stack used by the Service Fabric service where the client wants to communicate. (These tasks include using a ServicePartitionResolver to determine the service endpoint). The client can use whatever protocol it wants. } public ResolvedServicePartition ResolvedServicePartition { get. The communication client just receives an address and uses it to connect to a service. Providing the previous ResolvedServicePartition indicates that the resolver needs to try again rather than simply retrieve a cached address. FabricServicePartitionResolver resolver = FabricServicePartitionResolver. Your client needs to detect whether the connection attempt failed because of a transient error and can be retried (e. The factory library provides the retry mechanism while you provide the error handlers. or a permanent error (e. private String listenerName. 
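Because service replicas can move at any time, a client should not hold on to a resolved address indefinitely. The resolver can be asked to refresh a result by passing the previously returned ResolvedServicePartition back into ResolveAsync, which makes it query the naming service again instead of returning its cached entry. A small sketch follows; the service URI and the point at which you retry are placeholders:

ServicePartitionResolver resolver = ServicePartitionResolver.GetDefault();

// Initial resolution, as shown above.
ResolvedServicePartition partition = await resolver.ResolveAsync(
    new Uri("fabric:/MyApp/MyService"), new ServicePartitionKey(), cancellationToken);

// ... try to connect to the address published in 'partition' ...

// If the connection attempt fails because the service has moved, resolve again by handing the
// stale result back to the resolver so it refreshes its cached entry.
partition = await resolver.ResolveAsync(partition, cancellationToken);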
ICommunicationClientFactory(C#) / CommunicationClientFactory(Java) defines the base interface implemented by a communication client factory that produces clients that can talk to a Service Fabric service..g. } public string ListenerName { get. If it does not know what decisions to make about an exception. such as some binary protocols. retryable exceptions are further categorized into transient and non-transient. MyCommunicationClient client) { } @Override protected CompletableFuture<MyCommunicationClient> createClientAsync(String endpoint) { } @Override protected void abortClient(MyCommunicationClient client) { } } Finally. These include exceptions that indicate the service endpoint could not be reached. For clients that don't maintain a persistent connection. it should return false. MyCommunicationClient client) { } } public class MyCommunicationClientFactory extends CommunicationClientFactoryBase<MyCommunicationClient> { @Override protected boolean validateClient(MyCommunicationClient clientChannel) { } @Override protected boolean validateClient(String endpoint. Transient exceptions are those that can simply be retried without re-resolving the service endpoint address. If it does know what decision to make. it should set the result accordingly and return true. Non-transient exceptions are those that require the service endpoint address to be re-resolved.The client factory is primarily responsible for creating communication clients. CancellationToken cancellationToken) { } protected override bool ValidateClient(MyCommunicationClient clientChannel) { } protected override bool ValidateClient(string endpoint. Other protocols that maintain a persistent connection. Exceptions are categorized into retryable and non retryable. the factory only needs to create and return the client. indicating the service has moved to a different node. should also be validated by the factory to determine whether the connection needs to be re-created. such as an HTTP client. public class MyCommunicationClientFactory : CommunicationClientFactoryBase<MyCommunicationClient> { protected override void AbortClient(MyCommunicationClient client) { } protected override Task<MyCommunicationClient> CreateClientAsync(string endpoint. Non retryable exceptions simply get rethrown back to the caller. The TryHandleException makes a decision about a given exception. These will include transient network problems or service error responses other than those that indicate the service endpoint address does not exist. . an exception handler is responsible for determining what action to take when an exception occurs. /* if exceptionInformation. // if exceptionInformation.getException(). retrySettings. retrySettings. /* if exceptionInformation.getException() is known and is transient (can be retried without re- resolving) */ result = new ExceptionHandlingRetryResult(exceptionInformation. ICommunicationClientFactory(C#) / CommunicationClientFactory(Java) .Exception. return false.getDefaultMaxRetryCount()). return true.Exception is unknown (let the next IExceptionHandler attempt to handle it) result = null.Exception is known and is transient (can be retried without re-resolving) result = new ExceptionHandlingRetryResult(exceptionInformation. retrySettings.DefaultMaxRetryCount). return true. out ExceptionHandlingResult result) { // if exceptionInformation. } } Putting it all together With an ICommunicationClient(C#) / CommunicationClient(Java) .Exception. return false. 
and IExceptionHandler(C#) / ExceptionHandler(Java) built around a communication protocol. return true. } } public class MyExceptionHandler implements ExceptionHandler { @Override public ExceptionHandlingResult handleException(ExceptionInformation exceptionInformation. return true. OperationRetrySettings retrySettings) { /* if exceptionInformation.getException().DefaultMaxRetryCount). class MyExceptionHandler : IExceptionHandler { public bool TryHandleException(ExceptionInformation exceptionInformation. retrySettings. true. retrySettings. retrySettings. false. a ServicePartitionClient(C#) / FabricServicePartitionClient(Java) wraps it all together and provides the fault- handling and service partition address resolution loop around these components. false. OperationRetrySettings retrySettings. retrySettings.getDefaultMaxRetryCount()). // if exceptionInformation.getException() is known and is not transient (indicates a new service endpoint address must be resolved) */ result = new ExceptionHandlingRetryResult(exceptionInformation.Exception is known and is not transient (indicates a new service endpoint address must be resolved) result = new ExceptionHandlingRetryResult(exceptionInformation. retrySettings.getException() is unknown (let the next ExceptionHandler attempt to handle it) */ result = null. true. . CancellationToken.myServiceUri.invokeWithRetryAsync(client -> { /* Communicate with the service using the client. myPartitionKey). private MyCommunicationClientFactory myCommunicationClientFactory.None). private URI myServiceUri.myCommunicationClientFactory. FabricServicePartitionClient myServicePartitionClient = new FabricServicePartitionClient<MyCommunicationClient>( this. */ }). private Uri myServiceUri. myPartitionKey). CompletableFuture<?> result = myServicePartitionClient. this.myServiceUri. Next steps See an example of HTTP communication between services in a C# sample project on GitHUb or Java sample project on GitHUb.myCommunicationClientFactory. private MyCommunicationClientFactory myCommunicationClientFactory. var result = await myServicePartitionClient.InvokeWithRetryAsync(async (client) => { // Communicate with the service using the client. this. var myServicePartitionClient = new ServicePartitionClient<MyCommunicationClient>( this. }. Remote procedure calls with Reliable Services remoting Web API that uses OWIN in Reliable Services WCF communication by using Reliable Services . There are two ways that you can provide listener settings and security credentials: a. } } 2.FabricTransport. Create an interface. Make sure that the certificate that you want to use to help secure your service communication is installed on all the nodes in the cluster. public interface IHelloWorldStateful : IService { Task<string> GetHelloWorld(). IHelloWorldStateful { protected override IEnumerable<ServiceReplicaListener> CreateServiceReplicaListeners() { return new[]{ new ServiceReplicaListener( (context) => new FabricTransportServiceRemotingListener(context. IHelloWorldStateful . Help secure a service when you're using service remoting We'll be using an existing example that explains how to set up remoting for reliable services. This article talks about how to improve security when you're using service remoting and the Windows Communication Foundation (WCF) communication stack. Your service will use FabricTransportServiceRemotingListener .Runtime namespace. } internal class HelloWorldStateful : StatefulService. 
The Reliable Services application framework provides a few prebuilt communication stacks and tools that you can use to improve security. To help secure a service when you're using service remoting. that defines the methods that will be available for a remote procedure call on your service.Remoting. This is an ICommunicationListener implementation that provides remoting capabilities. Provide them directly in the service code: . follow these steps: 1. Add listener settings and security credentials.ServiceFabric. } public Task<string> GetHelloWorld() { return Task.FromResult("Hello World!").this))}. Help secure communication for services in Azure Service Fabric 4/3/2017 • 5 min to read • Edit Online Security is one of the most important aspects of communication. which is declared in the Microsoft.Services. FabricTransportListenerSettings. <!--Section name should always end with "TransportSettings". Provide them by using a config package: Add a TransportSettings section in the settings. var x509Credentials = new X509Credentials { FindType = X509FindType.this. FindValue = "4FEF3950642138446CC364A396E1E881DB76B48C". } b. FabricTransportListenerSettings will load all the settings from this section by default. the CreateServiceReplicaListeners method will look like this: protected override IEnumerable<ServiceReplicaListener> CreateServiceReplicaListeners() { return new[] { new ServiceReplicaListener( (context) => new FabricTransportServiceRemotingListener( context. protected override IEnumerable<ServiceReplicaListener> CreateServiceReplicaListeners() { FabricTransportListenerSettings listenerSettings = new FabricTransportListenerSettings { MaxMessageSize = 10000000. StoreName = "My".this. .EncryptAndSign }. ProtectionLevel = ProtectionLevel. x509Credentials.RemoteCommonNames.xml file. return new[] { new ServiceReplicaListener( (context) => new FabricTransportServiceRemotingListener(context. SecurityCredentials = GetSecurityCredentials() }.FindByThumbprint.LoadFrom("HelloWorldStateful"))) }.--> <Section Name="HelloWorldStatefulTransportSettings"> <Parameter Name="MaxMessageSize" Value="10000000" /> <Parameter Name="SecurityCredentialsType" Value="X509" /> <Parameter Name="CertificateFindType" Value="FindByThumbprint" /> <Parameter Name="CertificateFindValue" Value="4FEF3950642138446CC364A396E1E881DB76B48C" /> <Parameter Name="CertificateStoreLocation" Value="LocalMachine" /> <Parameter Name="CertificateStoreName" Value="My" /> <Parameter Name="CertificateProtectionLevel" Value="EncryptAndSign" /> <Parameter Name="CertificateRemoteCommonNames" Value="ServiceFabric-Test-Cert" /> </Section> In this case. } private static SecurityCredentials GetSecurityCredentials() { // Provide certificate details.Add("ServiceFabric-Test-Cert").LocalMachine.--> <!--Here we are using a prefix "HelloWorldStateful". StoreLocation = StoreLocation. } If you add a TransportSettingssection in the settings. return x509Credentials.xml file without any prefix.listenerSettings)) }. as shown earlier.Client. string message = await client. Create a TransportSettings section that is similar to the service code. FabricTransportSettings transportSettings = new FabricTransportSettings { SecurityCredentials = x509Credentials. instead of using the Microsoft.ServiceProxyFactory .Add("ServiceFabric-Test-Cert"). you can load FabricTransportSettings from the settings.LocalMachine.EncryptAndSign }. When you call methods on a secured service by using the remoting stack.GetHelloWorld(). } 3.Remoting. 
If the client code is running as part of a service.ServiceProxy class to create a service proxy.ServiceFabric. use Microsoft. StoreLocation = StoreLocation.RemoteCommonNames. Make the following changes to the client code: .--> <Section Name="TransportSettings"> . IHelloWorldStateful client = serviceProxyFactory. var x509Credentials = new X509Credentials { FindType = X509FindType. the CreateServiceReplicaListeners method will look like this: protected override IEnumerable<ServiceReplicaListener> CreateServiceReplicaListeners() { return new[] { return new[]{ new ServiceReplicaListener( (context) => new FabricTransportServiceRemotingListener(context.Client.Remoting.xml file. which contains SecurityCredentials . <!--"TransportSettings" section without any prefix. ServiceProxyFactory serviceProxyFactory = new ServiceProxyFactory( (c) => new FabricTransportServiceRemotingClientFactory(transportSettings)). </Section> In this case. ProtectionLevel = ProtectionLevel..Services.this))}. StoreName = "My".CreateServiceProxy<IHelloWorldStateful>( new Uri("fabric:/MyApplication/MyHelloWorldService")).. FindValue = "4FEF3950642138446CC364A396E1E881DB76B48C". Pass in FabricTransportSettings . }.Services.FindByThumbprint. x509Credentials.ServiceFabric. }. xml. FabricTransportSettings will load all the settings from this section by default.Create<IHelloWorldStateful>( new Uri("fabric:/MyApplication/MyHelloWorldService")).GetHelloWorld().CreateServiceProxy<IHelloWorldStateful>( new Uri("fabric:/MyApplication/MyHelloWorldService")). Help secure a service when you're using a WCF-based communication stack We'll be using an existing example that explains how to set up a WCF-based communication stack for reliable services.settings. string message = await client. Similar to the service.settings. you can create a client_name. the earlier code is even further simplified: IHelloWorldStateful client = ServiceProxy.LoadFrom("TransportSettingsPrefix")) ).xml file in the same location where the client_name. For the service. . To do this. follow these steps: 1.xml/client_name. To help secure a service when you're using a WCF-based communication stack. ServiceProxyFactory serviceProxyFactory = new ServiceProxyFactory( (c) => new FabricTransportServiceRemotingClientFactory(FabricTransportSettings. In that case. Then create a TransportSettings section in that file. you need to help secure the WCF communication listener ( WcfCommunicationListener ) that you create. if you add a TransportSettings section without any prefix in client settings.exe is.GetHelloWorld(). modify the CreateServiceReplicaListeners method. If the client is not running as part of a service. IHelloWorldStateful client = serviceProxyFactory. string message = await client. TransportWithMessageCredential). } private static NetTcpBinding GetNetTcpBinding() { NetTcpBinding b = new NetTcpBinding(SecurityMode.SetCertificate( StoreLocation.ClientCredentialType = MessageCredentialType. StoreName. // Add certificate details in the ServiceHost credentials. "9DC906B169DC4FAFFD1697AC781E806790749D2F"). endpointResourceName:"WcfServiceEndpoint"). wcfServiceObject:this. b. } 2.CreateWcfCommunicationListener) }.Credentials.My.Security.ServiceCertificate. return wcfCommunicationListener.FindByThumbprint.ServiceHost. X509FindType. } private WcfCommunicationListener<ICalculator> CreateWcfCommunicationListener(StatefulServiceContext context) { var wcfCommunicationListener = new WcfCommunicationListener<ICalculator>( serviceContext:context. 
listenerBinding: GetNetTcpBinding().LocalMachine. // For this example. we will be using NetTcpBinding.Certificate. protected override IEnumerable<ServiceReplicaListener> CreateServiceReplicaListeners() { return new[] { new ServiceReplicaListener( this. wcfCommunicationListener. But you need to override the CreateClientAsync method of WcfCommunicationClientFactory : . return b. the WcfCommunicationClient class that was created in the previous example remains unchanged.Message. In the client. clientBinding. clientChannel. this. var wcfClientFactory = new SecureWcfCommunicationClientFactory<ICalculator>(clientBinding: GetNetTcpBinding(). public class SecureWcfCommunicationClientFactory<TServiceContract> : WcfCommunicationClientFactory<TServiceContract> where TServiceContract : class { private readonly Binding clientBinding.OperationTimeout = this.GetDefault().clientBinding.callbackObject != null) { channelFactory = new DuplexChannelFactory<TServiceContract>( this. } protected override Task<WcfCommunicationClient<TServiceContract>> CreateClientAsync(string endpoint.Channel. IEnumerable<IExceptionHandler> exceptionHandlers = null. var clientChannel = ((IClientChannel)channel).clientBinding = clientBinding. Use the client to invoke service methods. CancellationToken cancellationToken) { var endpointAddress = new EndpointAddress(new Uri(endpoint)). 3)).FindByThumbprint.traceId. X509FindType. servicePartitionResolver: partitionResolver). var calculatorServiceCommunicationClient = new WcfCommunicationClient( wcfClientFactory. StoreName. ChannelFactory<TServiceContract> channelFactory. var result = calculatorServiceCommunicationClient. // These credentials will be used by the clients created by // SecureWcfCommunicationClientFactory. ServiceUri.Result. endpointAddress).callbackObject = callback.ReceiveTimeout.CreateWcfCommunicationClient(channel)).Add(2. servicePartitionResolver. endpointAddress). exceptionHandlers.LocalMachine.Singleton).callbackObject.CreateChannel().FromResult(this. channelFactory. private readonly object callbackObject. string traceId = null. public SecureWcfCommunicationClientFactory( Binding clientBinding. } } Use SecureWcfCommunicationClientFactory to create a WCF communication client ( WcfCommunicationClient ). var channel = channelFactory.My. . "9DC906B169DC4FAFFD1697AC781E806790749D2F").InvokeWithRetryAsync( client => client. } else { channelFactory = new ChannelFactory<TServiceContract>(this. return Task. this. object callback = null) : base(clientBinding. ServicePartitionKey.SetCertificate( StoreLocation. IServicePartitionResolver servicePartitionResolver = null. IServicePartitionResolver partitionResolver = ServicePartitionResolver.Credentials.ClientCertificate. } // Add certificate details to the ChannelFactory credentials. if (this.clientBinding.callback) { this. Next steps Web API with OWIN in Reliable Services . that defines the methods that will be available for a remote procedure call on your service.remoting.runtime package. <!--Section name should always end with "TransportSettings". Help secure communication for services in Azure Service Fabric 4/3/2017 • 2 min to read • Edit Online Help secure a service when you're using service remoting We'll be using an existing example that explains how to set up remoting for reliable services. To help secure a service when you're using service remoting. Make sure that the certificate that you want to use to help secure your service communication is installed on all the nodes in the cluster. 
There are two ways that you can provide listener settings and security credentials: a.--> <Section Name="HelloWorldStatelessTransportSettings"> <Parameter Name="MaxMessageSize" Value="10000000" /> <Parameter Name="SecurityCredentialsType" Value="X509_2" /> <Parameter Name="CertificatePath" Value="/path/to/cert/BD1C71E248B8C6834C151174DECDBDC02DE1D954.--> <!--Here we are using a prefix "HelloWorldStateless". } public CompletableFuture<String> getHelloWorld() { return CompletableFuture.serviceFabric. follow these steps: 1. return listeners.this). which is declared in the microsoft. Add listener settings and security credentials. Create an interface. })). } } 2. This is an CommunicationListener implementation that provides remoting capabilities.crt" /> <Parameter Name="CertificateProtectionLevel" Value="EncryptandSign" /> <Parameter Name="CertificateRemoteThumbprints" Value="BD1C71E248B8C6834C151174DECDBDC02DE1D954" /> </Section> . Your service will use FabricTransportServiceRemotingListener .fabricTransport.completedFuture("Hello World!").add(new ServiceInstanceListener((context) -> { return new FabricTransportServiceRemotingListener(context. } class HelloWorldStatelessImpl extends StatelessService implements HelloWorldStateless { @Override protected List<ServiceInstanceListener> createServiceInstanceListeners() { ArrayList<ServiceInstanceListener> listeners = new ArrayList<>(). Provide them by using a config package: Add a TransportSettings section in the settings.xml file.services. HelloWorldStateless . public interface HelloWorldStateless extends Service { CompletableFuture<String> getHelloWorld(). listeners. xml file without any prefix. FabricTransportListenerSettings will load all the settings from this section by default.serviceFabric..add(new ServiceInstanceListener((context) -> { return new FabricTransportServiceRemotingListener(context. use microsoft. <!--"TransportSettings" section without any prefix. </Section> In this case. FabricTransportRemotingListenerSettings. When you call methods on a secured service by using the remoting stack.loadFrom(HelloWorldStatelessTransportSettings)). listeners. Make the following changes to the client code: FabricServiceProxyFactory serviceProxyFactory = new FabricServiceProxyFactory(c -> { return new FabricTransportServiceRemotingClientFactory(FabricTransportRemotingSettings. If the client code is running as part of a service.this).client.class. Create a TransportSettings section that is similar to the service code. return listeners. return listeners. the CreateServiceReplicaListeners method will look like this: protected List<ServiceInstanceListener> createServiceInstanceListeners() { ArrayList<ServiceInstanceListener> listeners = new ArrayList<>(). . new URI("fabric:/MyApplication/MyHelloWorldService")). null. }. })). null.xml file.createServiceProxy(HelloWorldStateless.remoting. CompletableFuture<String> message = client.getHelloWorld().add(new ServiceInstanceListener((context) -> { return new FabricTransportServiceRemotingListener(context.--> <Section Name="TransportSettings"> .loadFrom("TransportPrefixTra nsportSettings"). instead of using the microsoft.services..ServiceProxyBase class to create a service proxy. as shown earlier. } If you add a TransportSettingssection in the settings. null.this. the createServiceInstanceListeners method will look like this: protected List<ServiceInstanceListener> createServiceInstanceListeners() { ArrayList<ServiceInstanceListener> listeners = new ArrayList<>().serviceFabric. null). })). 
In this case.FabricServiceProxyFactory . you can load FabricTransportSettings from the settings.client. listeners. null) HelloWorldStateless client = serviceProxyFactory.services. } 3.remoting. Traditional ASP. see: Introduction to hosting in ASP.NET Framework. such as the static void Main() method in Program. The rest of this article explains how to use ASP. You. own the service host process and Service Fabric activates and monitors it for you. as opposed to a process that is owned by dedicated web server software such as IIS. NOTE To develop Reliable Services with ASP. either as a guest executable or in a Reliable Service.dll. In order to combine a Service Fabric service and ASP.NET Core.NET. IoT apps.NET (up to MVC 5) is tightly coupled to IIS through System.NET Framework. . the lifecycle of the WebHost is bound to the lifecycle of the process.NET Core provides a separation between the web server and your web application.NET Core in Visual Studio 2015.NET Core Service Fabric service. one or more instances and/or replicas of your service run in a service host process.NET Core applications on Service Fabric with no code changes. NOTE The rest of this article assumes you are familiar with hosting in ASP. This allows web applications to be portable between different web servers and also allows web servers to be self-hosted. ASP.NET Core applications create a WebHost in an application's entry point.NET Core is a new open-source and cross-platform framework for building modern cloud-based Internet- connected applications.NET Core in Service Fabric Reliable Services 3/23/2017 • 15 min to read • Edit Online ASP. as a service author.NET inside your service host process. you must still target the full . Service Fabric service hosting In Service Fabric.NET Core self-hosting allows you to do this.NET Core. self-hosted ASP. such as web apps.NET Core in a Reliable Service Typically.NET Core VS 2015 Tooling Preview 2 installed.NET Framework.NET Core inside a Reliable Service using the ASP. This means when you build an ASP. To learn more about hosting in ASP.NET Core or on the full .NET Core apps can run on . ASP. you must be able to start ASP.NET Core can be used in two different ways in Service Fabric: Hosted as a guest executable. This allows better integration with the Service Fabric runtime and allows stateful ASP. ASP. an executable file that runs your service code. Run inside a Reliable Service.NET Core. Hosting ASP.NET Core services. This is primarily used to run existing ASP. which means you can start a web server in your own process. and mobile backends. While ASP. ASP.Web. Service Fabric services currently can only run on the full .NET Core integration components that ship with the Service Fabric SDK.cs . In this case. you will need to have . Both communication listeners provide a constructor that takes the following arguments: ServiceContext serviceContext: The ServiceContext object that contains information about the running service. This allows you to configure IWebHost the way you normally would in an ASP. This is primarily where the two communication listeners differ: WebListener requires an Endpoint configuration. The Microsoft. service instances and/or replicas can go through multiple lifecycles. Within the service host process.xml. IWebHost> build : a lambda that you implement in which you create and return an IWebHost . AspNetCoreCommunicationListener. 
This is necessary in a .NET Core ICommunicationListeners The ICommunicationListener implementations for Kestrel and WebListener in the Microsoft. so that it may create instances of that service type.ServiceFabric.However.NET Core WebHost for either Kestrel or WebListener in a Reliable Service.NET Core application. This middleware configures the Kestrel or WebListener ICommunicationListener to register a unique service URL with the Service Fabric Naming Service and then validates client requests to ensure clients are connecting to the right service. while Kestrel does not.AspNetCore. string endpointName : the name of an Endpoint configuration in ServiceManifest. A Reliable Service instance is represented by your service class deriving from StatelessService or StatefulService . The lambda provides a URL which is generated for you depending on the Service Fabric integration options you use and the Endpoint configuration you provide. The communication stack for a service is contained in an ICommunicationListener implementation in your service class.Services. Func<string.Services.Services. That URL can then be modified or used as-is to start the web server. the application entry point is not the right place to create a WebHost in a Reliable Service. ASP. Service Fabric integration middleware The Microsoft.ServiceFabric.AspNetCore NuGet package includes the UseServiceFabricIntegration extension method on IWebHostBuilder that adds Service Fabric-aware middleware.AspNetCore.ServiceFabric. because the application entry point is only used to register a service type with the Service Fabric runtime.* NuGet packages have similar use patterns but perform slightly different actions specific to each web server. The WebHost should be created in a Reliable Service itself.* NuGet packages contain implementations of ICommunicationListener that start and manage the ASP. Using unique service URLs To prevent this. regardless of protocol. Service A listens on 10. If services use dynamically-assigned application ports. The following diagram shows the request flow with the middleware enabled: .1:30000 over HTTP.0.0. to prevent clients from mistakenly connecting to the wrong service. listen on a unique IP:port combination. If the identifier does not match.0. Service A moves to a different node. This is a cooperative action between services in a non-hostile-tenant trusted environment. the unique identifier should not be enabled.0.0.1:30000 3. 4. A fixed unique port is typically used for externally-facing services that need a well-known port for client applications to connect to. This can cause bugs at random times that can be difficult to diagnose. Client resolves Service A and gets address 10. Client attempts to connect to service A with cached address 10. it reports that endpoint address to the Service Fabric Naming Service where it can be discovered by clients or other services. Once a service replica has started listening on an IP:port endpoint. the middleware immediately returns an HTTP 410 Gone response. For example. A case of mistaken identity Service replicas. most Internet-facing web applications will use port 80 or 443 for web browser connections. This scenario is described in more detail in the next section. In a trusted environment.0. 5. where multiple web applications can run on the same physical or virtual machine but do not use unique host names. Services that use a dynamically-assigned port should make use of this middleware. 6. 
services can post an endpoint to the Naming Service with a unique identifier. Services that use a fixed unique port do not have this problem in a cooperative environment.1:30000. Client is now successfully connected to service B not realizing it is connected to the wrong service. the middleware that's added by the UseServiceFabricIntegration method automatically appends a unique identifier to the address that is posted to the Naming Service and validates that identifier on each request. This can happen if the following sequence of events occur: 1.1 and coincidentally uses the same port 30000.0. Service B is placed on 10. a service replica may coincidentally use the same IP:port endpoint of another service that was previously on the same physical or virtual machine. This can cause a client to mistakely connect to the wrong service. and then validate that unique identifier during client requests.0. This does not provide secure service authentication in a hostile-tenant environment. In this case.shared-host environment such as Service Fabric. 2. This package contains WebListenerCommunicationListener . that functionality is not used by the WebListener ICommunicationListener implementation because that will result in HTTP 503 and HTTP 404 error status codes in the scenario described earlier. These features are useful in Service Fabric for hosting multiple websites in the same cluster.WebListener NuGet package. This uses the http.sys port sharing feature. WebListener in Reliable Services WebListener can be used in a Reliable Service by importing the Microsoft.AspNetCore. disambiguated by either a unique URL path or hostname. both Kestrel and WebListener ICommunicationListener implementations standardize on the middleware provided by the UseServiceFabricIntegration extension method so that clients only need to perform a service endpoint re-resolve action on HTTP 410 responses. That in turn makes it very difficult for clients to determine the intent of the error.ServiceFabric. This allows multiple processes on the same physical or virtual machine to host web applications on the same port.NET Core WebHost inside a Reliable Service using WebListener as the web server.sys kernel driver used by IIS to process HTTP requests and route them to processes running web applications.sys kernel driver on Windows for port sharing: . that allows you to create an ASP. Although WebListener can internally differentiate requests based on unique URL paths using the underlying http.Both Kestrel and WebListener ICommunicationListener implementations use this mechanism in exactly the same way. WebListener is built on the Windows HTTP Server API. an implementation of ICommunicationListener . The following diagram illustrates how WebListener uses the http. as HTTP 503 and HTTP 404 are already commonly used to indicate other errors. Thus. } WebListener in a stateful service WebListenerCommunicationListener is currently not designed for use in stateful services due to complications with the underlying http.WebListener in a stateless service To use WebListener in a stateless service.ConfigureServices( services => services .UseContentRoot(Directory. (url.GetCurrentDirectory()) .AddSingleton<StatelessServiceContext>(serviceContext)) .xml are used specifically to instruct the Service Fabric runtime to register a URL with http. Kestrel is the recommended web server. to reserve http://+:80 for a service. ServiceFabricIntegrationOptions. 
override the CreateServiceInstanceListeners method and return a WebListenerCommunicationListener instance: protected override IEnumerable<ServiceInstanceListener> CreateServiceInstanceListeners() { return new ServiceInstanceListener[] { new ServiceInstanceListener(serviceContext => new WebListenerCommunicationListener(serviceContext.UseWebListener() . "ServiceEndpoint". including WebListener. This action requires elevated privileges that your services by default do not have.sys on your behalf using the strong wildcard URL prefix. see the following section on dynamic port allocation with WebListener.UseUniqueServiceUrl) . Web servers that use the Windows HTTP Server API must first reserve their URL with http.UseStartup<Startup>() . For example. For more information. For stateful services.Build())) }. Endpoint configuration An Endpoint configuration is required for web servers that use the Windows HTTP Server API.sys (this is normally accomplished with the netsh tool). listener) => new WebHostBuilder() . the following configuration should be used in .UseUrls(url) .UseServiceFabricIntegration(listener.sys port sharing feature. The "http" or "https" options for the Protocol property of the Endpoint configuration in ServiceManifest. meaning that each one will share the same port when allocated through the Endpoint configuration.xml: <ServiceManifest . ServiceFabricIntegrationOptions. <Resources> <Endpoints> <Endpoint Name="ServiceEndpoint" Protocol="http" Port="80" /> </Endpoints> </Resources> </ServiceManifest> And the endpoint name must be passed to the WebListenerCommunicationListener constructor: new WebListenerCommunicationListener(serviceContext.Build().. a cross-platform asynchronous I/O library.sys. but that is not supported by WebListenerCommunicationListener due to the complications it introduces for client requests.ServiceFabric..UseWebListener() . Kestrel is a cross-platform web server for ASP.UseUrls(url) . provide the port number in the Endpoint configuration: <Resources> <Endpoints> <Endpoint Protocol="http" Name="ServiceEndpoint" Port="80" /> </Endpoints> </Resources> Use WebListener with a dynamic port To use a dynamically assigned port with WebListener. Unlike WebListener. (url. }) Use WebListener with a static port To use a static port with WebListener. .UseServiceFabricIntegration(listener.NET Core based on libuv.None) . Kestrel in Reliable Services Kestrel can be used in a Reliable Service by importing the Microsoft.AspNetCore. Each instance of Kestrel must use a unique port. Multiple WebListener instances can share a port using the underlying http. omit the Port property in the Endpoint configuration: <Resources> <Endpoints> <Endpoint Protocol="http" Name="ServiceEndpoint" /> </Endpoints> </Resources> Note that a dynamic port allocated by an Endpoint configuration only provides one port per host process. "ServiceEndpoint".NET Core WebHost inside a Reliable Service using Kestrel as the web server. Kestrel does not use a centralized endpoint manager such as http. And unlike WebListener. For dynamic port usage. that allows you to create an ASP. listener) => { return new WebHostBuilder() .ServiceManifest. This package contains KestrelCommunicationListener .Kestrel NuGet package. The current Service Fabric hosting model allows multiple service instances and/or replicas to be hosted in the same process.sys port sharing feature. Kestrel is the recommended web server. Kestrel does not support port sharing between multiple processes. an implementation of ICommunicationListener . 
> ... UseUrls(url) .UseUniqueServiceUrl) . )) }.Build(). override the CreateServiceReplicaListeners method and return a KestrelCommunicationListener instance: . } Kestrel in a stateful service To use Kestrel in a stateful service.UseServiceFabricIntegration(listener.UseKestrel() . "ServiceEndpoint". ServiceFabricIntegrationOptions. listener) => new WebHostBuilder() .UseStartup<Startup>() . override the CreateServiceInstanceListeners method and return a KestrelCommunicationListener instance: protected override IEnumerable<ServiceInstanceListener> CreateServiceInstanceListeners() { return new ServiceInstanceListener[] { new ServiceInstanceListener(serviceContext => new KestrelCommunicationListener(serviceContext.AddSingleton<StatelessServiceContext>(serviceContext)) .UseContentRoot(Directory.Kestrel in a stateless service To use Kestrel in a stateless service.GetCurrentDirectory()) .ConfigureServices( services => services . (url. xml because it does not require URL registration prior to starting.xml for use with Kestrel.AddSingleton<StatefulServiceContext>(serviceContext) . this does not work . ServiceFabricIntegrationOptions. If the port does not fall in the application port range. In this case. } In this example.StateManager)) .UseContentRoot(Directory. a singleton instance of IReliableStateManager is provided to the WebHost dependency injection container. If an Endpoint configuration is not used. Endpoint configuration An Endpoint configuration is not required to use Kestrel.xml. and a single host process can contain multiple Kestrel instances.AddSingleton<IReliableStateManager>(this. This is explained in more detail in the following section. it does not need an Endpoint configuration in ServiceManifest. Although this is not strictly necessary. This is not strictly necessary.ConfigureServices( services => services . protected override IEnumerable<ServiceReplicaListener> CreateServiceReplicaListeners() { return new ServiceReplicaListener[] { new ServiceReplicaListener(serviceContext => new KestrelCommunicationListener(serviceContext.GetCurrentDirectory()) . its name must be passed into the KestrelCommunicationListener constructor: new KestrelCommunicationListener(serviceContext. it is opened through the OS firewall by Service Fabric. The URL provided to you through KestrelCommunicationListener will use this port.UseStartup<Startup>() . )) }. (url.UseUniqueServiceUrl) . listener) => . listener) => new WebHostBuilder() .. Since Kestrel does not support port sharing. 2. because automatic port assignment from an Endpoint configuration assigns a unique port per host process. (url. "ServiceEndpoint". but it allows you to use IReliableStateManager and Reliable Collections in your MVC controller action methods.Build(). a dynamic port will be used. omit the name in the KestrelCommunicationListener constructor.UseUrls(url) .UseServiceFabricIntegration(listener. unlike WebListener (or HttpListener). See the next section for more information. <Resources> <Endpoints> <Endpoint Protocol="http" Name="ServiceEndpoint" Port="80" /> </Endpoints> </Resources> If an Endpoint is configured. Note that an Endpoint configuration name is not provided to KestrelCommunicationListener in a stateful service.. Kestrel is a simple stand-alone web server.UseKestrel() . it provides two potential benefits: 1. Use Kestrel with a static port A static port can be configured in the Endpoint configuration of ServiceManifest. 
Use Kestrel with a dynamic port Kestrel cannot use the automatic port assignment from the Endpoint configuration in ServiceManifest. listener) => . KestrelCommunicationListener will automatically select an unused port from the application port range. A reverse proxy server such as IIS or Nginx must be used to handle traffic from the public Internet. such as Windows Authentication and port sharing. such an intranet.as each Kestrel instance must be opened on a unique port. Internet-facing HTTP endpoints on Windows. and miscellaneous settings to achieve a properly functioning service: Externally exposed ASP.NET Core stateful service An externally exposed service is one that exposes an endpoint reachable from outside the cluster. (url. An internal-only service is one whose endpoint is only reachable from within the cluster.NET Core stateless service Internal-only ASP. Externally exposed ASP. Kestrel is not supported as an edge (Internet-facing) server at this time. This is the URL you will provide to users of your application.xml. port configuration. a stateless service should use a well-known and stable endpoint that is reachable through a load balancer.NET Core stateless service Internal-only ASP. To use dynamic port assignment with Kestrel. .. Kestrel may be used.. Service Fabric integration options. In this configuration. Otherwise. Scenarios and configurations This section describes the following scenarios and provides the recommended combination of web server. It provides better protection against attacks and supports features that Kestrel does not. such as 80 for HTTP or 443 for HTTPS.xml entirely. The following configuration is recommended: NOTES Web server WebListener If the service is only exposed to a trusted network. such as the Azure Load Balancer. and do not pass an endpoint name to the KestrelCommunicationListener constructor: new KestrelCommunicationListener(serviceContext. Port configuration static A well-known static port should be configured in the Endpoints configuration of ServiceManifest. will be unable to expose stateful services because the load balancer will not be able to locate and route traffic to the appropriate stateful service replica.NET Core stateless services WebListener is the recommended web server for front-end services that expose external. When exposed to the Internet. NOTE Stateful service endpoints generally should not be exposed to the Internet. usually through a load balancer. simply omit the Endpoint configuration in ServiceManifest. Clusters that are behind load balancers that are unaware of Service Fabric service resolution. WebListener is the preferred option. UseWebListener() . Internal-only stateful ASP. new WebListenerCommunicationListener(serviceContext. Port configuration dynamically assigned Multiple replicas of a stateful service may share a host process or host operating system and thus will need unique ports. Note this applies to WebListener only. Instance Count -1 In typical use cases. If multiple externally exposed services share the same set of nodes..NET Core service Stateless services that are only called from within the cluster should use unique URLs and dynamically assigned ports to ensure cooperation between multiple services. a unique but stable URL path should be used.Build().. return new WebHostBuilder() . NOTES ServiceFabricIntegrationOptions None The ServiceFabricIntegrationOptions.UseUrls(url) . This can be accomplished by modifying the URL provided when configuring IWebHost. 
InstanceCount any The instance count setting can be set to any value necessary to operate the service. the instance count setting should be set to "-1" so that an instance is available on all nodes that receive traffic from a load balancer. }) Internal-only stateless ASP. ServiceFabricIntegrationOptions UseUniqueServiceUrl With dynamic port assignment. listener) => { url += "/MyUniqueServicePath".NET Core service . External users of your application will not know the unique identifying information used by the middleware. "ServiceEndpoint". . (url. The following configuration is recommended: NOTES Web server Kestrel Although WebListener may be used for internal stateless services. Kestrel is the recommended server to allow multiple service instances to share a host. this setting prevents the mistaken identity issue described earlier.None option should be used when configuring Service Fabric integration middleware so that the service does not attempt to validate incoming requests for a unique identifier. The following configuration is recommended: NOTES Web server Kestrel The WebListenerCommunicationListener is not designed for use by stateful services in which replicas share a host process. ServiceFabricIntegrationOptions UseUniqueServiceUrl With dynamic port assignment. Port configuration dynamically assigned Multiple replicas of a stateful service may share a host process or host operating system and thus will need unique ports. this setting prevents the mistaken identity issue described earlier. Next steps Debug your Service Fabric application by using Visual Studio .Stateful services that are only called from within the cluster should use dynamically assigned ports to ensure cooperation between multiple services. } class MyService : StatelessService.IService to signal that the service has a remoting interface.Runtime. the following stateless service exposes a single method to get "Hello World" over a remote procedure call. Set up remoting on a service Setting up remoting for a service is done in two simple steps: 1.Services. } protected override IEnumerable<ServiceInstanceListener> CreateServiceInstanceListeners() { return new[] { new ServiceInstanceListener(context => this. Service remoting with Reliable Services 4/3/2017 • 1 min to read • Edit Online For services that are not tied to a particular communication protocol or stack.ServiceFabric. This is an ICommunicationListener implementation that provides remoting capabilities. public interface IMyService : IService { Task<string> HelloWorldAsync().Runtime. using Microsoft.Services. Windows Communication Foundation (WCF).CreateServiceRemotingListener(context)) }.Runtime.ServiceFabric. 2.Services. using Microsoft. IMyService { public MyService(StatelessServiceContext context) : base (context) { } public Task HelloWorldAsync() { return Task.ServiceFabric. such as WebAPI.ServiceFabric. For example.Communication.Remoting. The methods must be task-returning asynchronous methods. The interface must implement Microsoft.FromResult("Hello!"). the Reliable Services framework provides a remoting mechanism to quickly and easily set up remote procedure call for services. } } .ServiceFabric. or others.Services.Runtime namespace contains an extension method.Remoting.Remoting. Create an interface for your service to implement.Services. using Microsoft.Services. This interface defines the methods that are available for a remote procedure call on your service.Remoting. 
CreateServiceRemotingListener for both stateless and stateful services that can be used to create a remoting listener using the default remoting transport protocol.ServiceFabric. using Microsoft. Use a remoting listener in your service. The Microsoft. NOTE The arguments and the return types in the service interface can be any simple. complex.ServiceProxy class.Client.HelloWorldAsync().Create<IMyService>(new Uri("fabric:/MyApplication/MyHelloWorldService")). With that proxy. you can simply call methods on the interface remotely.Services.NET DataContractSerializer. but they must be serializable by the . So exception-handling logic at the client by using ServiceProxy can directly handle exceptions that the service throws.ServiceFabric.Remoting. The remoting framework propagates exceptions thrown at the service to the client. or custom types. IMyService helloWorldClient = ServiceProxy. The ServiceProxy method creates a local proxy by using the same interface that the service implements. Next steps Web API with OWIN in Reliable Services WCF communication with Reliable Services Securing communication for Reliable Services . string message = await helloWorldClient. Call remote service methods Calling methods on a service by using the remoting stack is done by using a local proxy to the service through the Microsoft. This interface defines the methods that are available for a remote procedure call on your service. listeners. This is an CommunicationListener implementation that provides remoting capabilities. } @Override protected List<ServiceInstanceListener> createServiceInstanceListeners() { ArrayList<ServiceInstanceListener> listeners = new ArrayList<>(). The interface must implement microsoft.ArrayList. the following stateless service exposes a single method to get "Hello World" over a remote procedure call. } public CompletableFuture<String> helloWorldAsync() { return CompletableFuture.util.List. } class MyServiceImpl extends StatelessService implements MyService { public MyServiceImpl(StatelessServiceContext context) { super(context). Use a remoting listener in your service.runtime.StatelessService. })). import java.communication. Create an interface for your service to implement.remoting. } } NOTE The arguments and the return types in the service interface can be any simple. import microsoft.util. Set up remoting on a service Setting up remoting for a service is done in two simple steps: 1. return listeners. import java.serviceFabric.concurrent.services. FabricTransportServiceRemotingListener can be used to create a remoting listener using the default remoting transport protocol.services. 2.CompletableFuture. Service remoting with Reliable Services 4/3/2017 • 1 min to read • Edit Online The Reliable Services framework provides a remoting mechanism to quickly and easily set up remote procedure call for services.Service to signal that the service has a remoting interface.services. .this). or custom types.servicefabric. For example. public interface MyService extends Service { CompletableFuture<String> helloWorldAsync().services.servicefabric. import microsoft.remoting. import microsoft.util.add(new ServiceInstanceListener((context) -> { return new FabricTransportServiceRemotingListener(context. import java.completedFuture("Hello!").Service.servicefabric. complex. but they must be serializable. The methods must be task-returning asynchronous methods.runtime.ServiceInstanceListener. client. Next steps Securing communication for Reliable Services . 
new URI("fabric:/MyApplication/MyHelloWorldService")). you can simply call methods on the interface remotely.ServiceProxyBase class. So exception-handling logic at the client by using ServiceProxyBase can directly handle exceptions that the service throws. CompletableFuture<String> message = helloWorldClient. With that proxy. The remoting framework propagates exceptions thrown at the service to the client.helloWorldAsync().create(MyService.services. MyService helloWorldClient = ServiceProxyBase.Call remote service methods Calling methods on a service by using the remoting stack is done by using a local proxy to the service through the microsoft.serviceFabric.class.remoting. The ServiceProxyBase method creates a local proxy by using the same interface that the service implements. WCF Communication Listener The WCF-specific implementation of ICommunicationListener is provided by the Microsoft. // // The name of the endpoint configured in the ServiceManifest under the Endpoints section // that identifies the endpoint that the WCF ServiceHost should listen on. // listenerBinding: WcfUtility. Lest say we have a service contract of type ICalculator [ServiceContract] public interface ICalculator { [OperationContract] Task<int> Add(int value1. serviceContext:context. . protected override IEnumerable<ServiceReplicaListener> CreateServiceReplicaListeners() { return new[] { new ServiceReplicaListener((context) => new WcfCommunicationListener<ICalculator>( wcfServiceObject:this. // endpointResourceName: "WcfServiceEndpoint". int value2). They can plug in the communication stack of their choice via the ICommunicationListener returned from the CreateServiceReplicaListeners or CreateServiceInstanceListeners methods.CreateTcpListenerBinding() ) )}. the framework provides WcfClientCommunicationFactory.WcfCommunicationListener class. } We can create a WCF communication listener in the service the following manner. // // Populate the binding information that you want the service to use. which is the WCF-specific implementation of ClientCommunicationFactoryBase.Services. WCF-based communication stack for Reliable Services 1/25/2017 • 2 min to read • Edit Online The Reliable Services framework allows service authors to choose the communication stack that they want to use for their service.Runtime. } Writing clients for the WCF communication stack For writing clients to communicate with services by using WCF. The framework provides an implementation of the communication stack based on the Windows Communication Foundation (WCF) for service authors who want to use WCF-based communication.ServiceFabric.Wcf.Communication. Next steps Remote procedure call with Reliable Services remoting Web API with OWIN in Reliable Services . listenerName. If that is not the case. public class WcfCommunicationClient : ServicePartitionClient<WcfCommunicationClient<ICalculator>> { public WcfCommunicationClient(ICommunicationClientFactory<WcfCommunicationClient<ICalculator>> communicationClientFactory. servicePartitionResolver: partitionResolver). string listenerName = null.CreateTcpClientBinding(). retrySettings) { } } Client code can use the WcfCommunicationClientFactory along with the WcfCommunicationClient which implements ServicePartitionClient to determine the service endpoint and communicate with the service. NOTE The default ServicePartitionResolver assumes that the client is running in same cluster as the service. targetReplicaSelector. 
The WCF communication channel can be accessed from the WcfCommunicationClient created by the WcfCommunicationClientFactory.GetDefault(). TargetReplicaSelector targetReplicaSelector = TargetReplicaSelector.Result. // Create binding Binding binding = WcfUtility. 3)). ServicePartitionKey partitionKey = null. Uri serviceUri. object callback = null). IServicePartitionResolver servicePartitionResolver = null. OperationRetrySettings retrySettings = null) : base(communicationClientFactory. // // Call the service to perform the operation.InvokeWithRetryAsync( client => client. // // Create a client for communicating with the ICalculator service that has been created with the // Singleton partition scheme. string traceId = null.Add(2. ServiceUri. // Create a partition resolver IServicePartitionResolver partitionResolver = ServicePartitionResolver. var wcfClientFactory = new WcfCommunicationClientFactory<ICalculator> (clientBinding: binding. IEnumerable<IExceptionHandler> exceptionHandlers = null. public WcfCommunicationClientFactory( Binding clientBinding = null. // create a WcfCommunicationClientFactory object. // var calculatorServiceCommunicationClient = new WcfCommunicationClient( wcfClientFactory. serviceUri. // var result = calculatorServiceCommunicationClient.Singleton).Channel. partitionKey. create a ServicePartitionResolver object and pass in the cluster connection endpoints. ServicePartitionKey.Default. Securing communication for Reliable Services . performs network address translation and forwards external requests to internal IP:port . So. Determine the cause of connection failures. Resolve the service location initially through the naming service. Communicating by using the reverse proxy The reverse proxy in Service Fabric runs on all the nodes in the cluster. Microservices communication model Microservices in Service Fabric typically run on a subset of virtual machines in the cluster and can move from one virtual machine to another for various reasons. This process generally involves wrapping client-side communication libraries in a retry loop that implements the service resolution and retry policies. So. which is a network boundary between microservices and external clients. the endpoints for microservices can change dynamically. The typical pattern to communicate to the microservice is the following resolve loop: 1. Connect to the service. Azure Load Balancer. 2. see Connect and communicate with services. It performs the entire service resolution process on a client's behalf and then forwards the client request. For more information. and resolve the service location again when necessary. clients that run on the cluster can use any client-side HTTP communication libraries to talk to the target service by using the reverse proxy that runs locally on the same node. Reverse proxy in Azure Service Fabric 4/7/2017 • 9 min to read • Edit Online The reverse proxy that's built into Azure Service Fabric addresses microservices in the Service Fabric cluster that exposes HTTP endpoints. 3. Reaching microservices from outside the cluster The default external communication model for microservices is an opt-in model where each service cannot be accessed directly from external clients. In such cases. Secure Sockets Layer (SSL) termination occurs at the reverse proxy. For HTTPS traffic. Reaching microservices via the reverse proxy from outside the cluster Instead of configuring the port of an individual service in Load Balancer. 
all microservices in the cluster that expose an HTTP endpoint are addressable from outside the cluster. you must first configure Load Balancer to forward traffic to each port that the service uses in the cluster. Furthermore. most microservices. To make a microservice's endpoint directly accessible to external clients. you can configure the . WARNING When you configure the reverse proxy's port in Load Balancer. The microservices can move between nodes on failover. The reverse proxy uses HTTP to forward requests to services in the cluster. you can configure just the port of the reverse proxy in Load Balancer. Cluster fully qualified domain name (FQDN) | internal IP: For external clients. This configuration lets clients outside the cluster reach services inside the cluster by using the reverse proxy without additional configuration. URI format for addressing services by using the reverse proxy The reverse proxy uses a specific uniform resource identifier (URI) format to identify the service partition to which the incoming request should be forwarded: http(s)://<Cluster FQDN | internal IP>:Port/<ServiceInstanceName>/<Suffix path>?PartitionKey= <key>&PartitionKind=<partitionkind>&ListenerName=<listenerName>&TargetReplicaSelector= <targetReplicaSelector>&Timeout=<timeout_in_seconds> http(s): The reverse proxy can be configured to accept HTTP or HTTPS traffic.endpoints. especially stateful microservices. Note that HTTPS services are not currently supported. don't live on all nodes of the cluster. Load Balancer cannot effectively determine the location of the target node of the replicas to which it should forward traffic. Timeout: This specifies the timeout for the HTTP request created by the reverse proxy to the service on behalf of the client request..}}.com:19008/MyApp/MyService?PartitionKey=3&PartitionKind=Int64Range Internally: http://localhost:19008/MyApp/MyService?PartitionKey=3&PartitionKind=Int64Range To reach the resources that the service exposes. This is an optional parameter. Example usage As an example.cloudapp. This parameter is not required for services that use the singleton partition scheme. the reverse proxy can be reached on localhost or on any internal node IP.eastus. this is the computed partition key of the partition that you want to reach. or 'RandomReplica'. the PartitionKey and PartitionKind query string parameters must be used to reach a partition of the service: Externally: http://mycluster. This can be omitted if the service has only one listener.com.azure. to reach the fabric:/myapp/myservice/ service. for the service that you want to connect to.cloudapp.0.0. For internal traffic."Listener2":"Endpoint2" . PartitionKind: This is the service partition scheme. When the target service is stateless. Suffix path: This is the actual URL path. When the service exposes multiple endpoints.1. simply place the resource path after the service name in the URL: Externally: . ServiceInstanceName: This is the fully-qualified name of the deployed service instance that you are trying to reach without the "fabric:/" scheme. The default value is 60 seconds. For example. such as myapi/values/add/3. the reverse proxy runs on every node. that has been specified for the reverse proxy. and the service can be reached by using the gateway as: Externally: http://mycluster. 
the default is 'PrimaryReplica'.5:10592/3f0d39ad-924b-4233-b4a7-02617c6308a6-130834621071472715/ Following are the resources for the service: /index.html /api/users/<userId> If the service uses the singleton partitioning scheme. let's take the fabric:/MyApp/MyService service that opens an HTTP listener on the following URL: http://10. reverse proxy so that it is reachable through the cluster domain. the PartitionKey and PartitionKind query string parameters are not required. TargetReplicaSelector This specifies how the target replica or instance should be selected.azure.eastus. This can be 'Int64Range' or 'Named'. Port: This is the port. When the target service is stateful. This parameter is not required for services that use the singleton partition scheme. By default.. PartitionKey: For a partitioned service.0. such as 10. this identifies the endpoint that the client request should be forwarded to. you would use myapp/myservice.azure.cloudapp.0.eastus. such as 19008. 'RandomSecondaryReplica'. reverse proxy picks a random instance of the service partition to forward the request to. Note that this is not the partition ID GUID. ListenerName The endpoints from the service are of the form {"Endpoints": {"Listener1":"Endpoint1".com:19008/MyApp/MyService Internally: http://localhost:19008/MyApp/MyService If the service uses the Uniform Int64 partitioning scheme. such as mycluster. the TargetReplicaSelector can be one of the following: 'PrimaryReplica'. When this parameter is not specified. In this case. The first case is a normal HTTP 404. the service should return the following HTTP response header: X-ServiceFabric : ResourceNotFound This HTTP response header indicates a normal HTTP 404 situation in which the requested resource does not exist.5:10592/3f0d39ad-924b-4233-b4a7-02617c6308a6-130834621071472715/api/users/6 Special handling for port-sharing services Azure Application Gateway attempts to resolve a service address again and retry the request when a service cannot be reached. replicas or service instances can share a host process and might also share a port when hosted by an http. a hint from the server is required. but the resolved service instance or replica is no longer available on the host. and Application Gateway will not attempt to resolve the service address again.5:10592/3f0d39ad-924b-4233-b4a7-02617c6308a6-130834621071472715/index. To indicate case #1 to Application Gateway.HttpListener ASP. it is likely that the web server is available in the host process and responding to requests.0.azure. the user has requested a resource that does exist.Net. Refer to Configure HTTPS Reverse Proxy in a secure cluster for Azure Resource Manager template samples to configure secure reverse proxy with a certificate and handling certificate rollover. the service instance or replica has moved to a different node as part of its normal lifecycle. This is a major benefit of Application Gateway because client code does not need to implement its own service resolution and resolve loop.eastus. First. By default. Thus.cloudapp.0. when a service cannot be reached. including: System. Generally. Application Gateway was unable to locate it because the service itself has moved. You can either use the sample templates or create a custom Resource Manager template. the gateway will receive an HTTP 404 response from the web server. an HTTP 404 has two distinct meanings: Case #1: The service address is correct. However. 
Setup and configuration You can use the Azure Resource Manager template to enable the reverse proxy in Service Fabric for the cluster.html? PartitionKey=3&PartitionKind=Int64Range Internally: http://localhost:19008/MyApp/MyService/api/users/6?PartitionKey=3&PartitionKind=Int64Range The gateway will then forward these requests to the service's URL: http://10. To make that distinction. When this happens. you can enable the reverse proxy by using the following steps: .sys-based web server. Application Gateway assumes case #2 and attempts to resolve and issue the request again.NET Core WebListener Katana In this situation. Application Gateway might receive a network connection error indicating that an endpoint is no longer open on the originally resolved address. which is considered a user error. Application Gateway needs to resolve the address again and retry the request. Case #2: The service address is incorrect. Application Gateway thus needs a way to distinguish between these two cases. and the resource that the user requested might exist on a different node.com:19008/MyApp/MyService/index.0. but the resource that the user requested does not exist. However. in the second case.0.html http://10. http://mycluster. Then. you get the template for the cluster that you want to deploy. }. ].. .ServiceFabric/clusters". "SFReverseProxyPort": { "type": "int".. "reverseProxyEndpointPort": "[parameters('SFReverseProxyPort')]". The port is identified by the parameter name. "metadata": { "description": "Endpoint for Service Fabric Reverse proxy" } }. To address the reverse proxy from outside the Azure cluster.... .. } 3.. Define a port for the reverse proxy in the Parameters section of the template. . "type": "Microsoft. set up the Azure Load Balancer rules for the port that you specified in step 1..1. "location": "[parameters('clusterLocation')]". "name": "[parameters('clusterName')]". . . "defaultValue": 19008. "nodeTypes": [ { . Specify the port for each of the nodetype objects in the Cluster Resource type section. reverseProxyEndpointPort. { "apiVersion": "2016-09-01".. 2.. { "apiVersion": "[variables('lbApiVersion')]".. "probe": { "id": "[concat(variables('lbID0'). . "frontendIPConfiguration": { "id": "[variables('lbIPConfig0')]" }... "frontendPort": "[parameters('SFReverseProxyPort')]". "port": "[parameters('SFReverseProxyPort')]". "protocol": "tcp" } } ] } 4. "reverseProxyCertificate": { "thumbprint": "[parameters('sfReverseProxyCertificateThumbprint')]". "idleTimeoutInMinutes": "5". "loadBalancingRules": [ .. parameters('supportLogStorageAccountName'))]" ]..Storage/storageAccounts/'. "probes": [ .. . "properties": { . "x509StoreName": "[parameters('sfReverseProxyCertificateStoreName')]" }. { "name": "LBSFReverseProxyRule". } } Supporting a reverse proxy certificate that's different from the cluster certificate . { "apiVersion": "2016-09-01". "properties": { "backendAddressPool": { "id": "[variables('lbPoolID0')]" }.ServiceFabric/clusters". "type": "Microsoft. . "properties": { "intervalInSeconds": 5. "dependsOn": [ "[concat('Microsoft. { "name": "SFReverseProxyProbe"..'/probes/SFReverseProxyProbe')]" }.. "backendPort": "[parameters('SFReverseProxyPort')]". "numberOfProbes": 2.. add the certificate to the reverseProxyCertificate property in the Cluster Resource type section. "location": "[parameters('clusterLocation')]". "enableFloatingIP": "false". "name": "[parameters('clusterName')]". "clusterState": "Default". "type": "Microsoft.Network/loadBalancers". 
To configure SSL certificates on the port for the reverse proxy. "protocol": "tcp" } } ].... { "apiVersion": "[variables('vmssApiVersion')]".ServiceFabric". add that certificate to the osProfile. "reverseProxyCertificate": { "thumbprint": "[parameters('sfReverseProxyCertificateThumbprint')]".Compute/virtualMachineScaleSets". "vaultCertificates": [ { "certificateStore": "[parameters('sfReverseProxyCertificateStoreValue')]".. . "computernamePrefix": "[parameters('vmNodeType0Name')]".. }. then the previously specified certificate should be installed on the virtual machine and added to the access control list (ACL) so that Service Fabric can access it. "durabilityLevel": "Bronze". "osProfile": { "adminPassword": "[parameters('adminPassword')]". This can be done in the virtualMachineScaleSets Resource type section.. "extensions": [ { "name": "[concat(parameters('vmNodeType0Name'). "settings": { "clusterEndpoint": "[reference(parameters('clusterName')). Next steps . "type": "Microsoft. "typeHandlerVersion": "1. "dataPath": "D:\\\\SvcFab". "x509StoreName": "[parameters('sfReverseProxyCertificateStoreValue')]" }. ] } NOTE When you use certificates that are different from the cluster certificate to enable the reverse proxy on an existing cluster.. "properties": { "type": "ServiceFabricNode". "certificateUrl": "[parameters('sfReverseProxyCertificateUrlValue')]" } ] } ] } .'_ServiceFabricNode')]". Complete the Azure Resource Manager template deployment by using the settings mentioned previously before you start a deployment to enable the reverse proxy in steps 1-4..If the reverse proxy certificate is different from the certificate that secures the cluster.clusterEndpoint]". "autoUpgradeMinorVersion": false. "testExtension": true. "publisher": "Microsoft. The extension section of the template can update the certificate in the ACL.. ..Azure.. "adminUsername": "[parameters('adminUsername')]". "secrets": [ { "sourceVault": { "id": "[parameters('sfReverseProxySourceVaultValue')]" }. For installation.0" } }. "nodeTypeRef": "[parameters('vmNodeType0Name')]". install the reverse proxy certificate and update the ACL on the cluster before you enable the reverse proxy. See an example of HTTP communication between services in a sample project on GitHub. Remote procedure calls with Reliable Services remoting Web API that uses OWIN in Reliable Services WCF communication by using Reliable Services . OnCloseAsync can be used to safely close any resources. the OnChangeRoleAsync event is triggered: . This guide talks about advanced usages of Reliable Services to gain more control and flexibility over your services.Java OnAbort is called when the stateless service instance is being forcefully shut down. Both stateful and stateless services have two primary entry points for user code: RunAsync(C#) / runAsync(Java) is a general-purpose entry point for your service code. In addition to open. This is generally called when a permanent fault is detected on the node. or aborted. Prior to reading this guide.C# / void onAbort() . A stateless service can only be opened. This can occur when the service's code is being upgraded. void OnAbort() . familiarize yourself with the Reliable Services programming model. Stateful service replica lifecycle NOTE Stateful reliable services are not supported in Java yet. these two entry points are sufficient. 
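To illustrate what calling a service through the reverse proxy looks like from code, here is a small, hedged C# example. It assumes the reverse proxy is listening on port 19008 and reuses the fabric:/MyApp/MyService, Int64Range partition, and api/users resource examples from the URI-format discussion above; all of these values are placeholders for your own cluster and service.

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

class ReverseProxyClientExample
{
    // Calls fabric:/MyApp/MyService through the reverse proxy running locally on the node.
    // The port (19008), partition key, and "api/users/6" suffix path mirror the examples
    // above and should be replaced with real values for your service.
    static async Task<string> GetUserAsync()
    {
        using (var client = new HttpClient())
        {
            string uri =
                "http://localhost:19008/MyApp/MyService/api/users/6" +
                "?PartitionKey=3&PartitionKind=Int64Range";

            HttpResponseMessage response = await client.GetAsync(uri);
            response.EnsureSuccessStatusCode();
            return await response.Content.ReadAsStringAsync();
        }
    }
}
```

Because the reverse proxy performs service resolution and retries on the caller's behalf, this client code needs no service-resolution loop of its own.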
Advanced usage of the Reliable Services programming model
4/3/2017 • 3 min to read • Edit Online

Azure Service Fabric simplifies writing and managing reliable stateless and stateful services. This guide talks about advanced usages of Reliable Services to gain more control and flexibility over your services. Prior to reading this guide, familiarize yourself with the Reliable Services programming model.

Both stateful and stateless services have two primary entry points for user code:
RunAsync (C#) / runAsync (Java) is a general-purpose entry point for your service code.
CreateServiceReplicaListeners (C#) and CreateServiceInstanceListeners (C#) / createServiceInstanceListeners (Java) is for opening communication listeners for client requests.

For most services, these two entry points are sufficient. In rare cases when more control over a service's lifecycle is required, additional lifecycle events are available.

Stateless service instance lifecycle
A stateless service's lifecycle is very simple. A stateless service can only be opened, closed, or aborted. RunAsync in a stateless service is executed when a service instance is opened, and canceled when a service instance is closed or aborted. Although RunAsync should be sufficient in almost all cases, the open, close, and abort events in a stateless service are also available:

Task OnOpenAsync(IStatelessServicePartition, CancellationToken) - C# / CompletableFuture<String> onOpenAsync(CancellationToken) - Java
OnOpenAsync is called when the stateless service instance is about to be used. Extended service initialization tasks can be started at this time.

Task OnCloseAsync(CancellationToken) - C# / CompletableFuture onCloseAsync(CancellationToken) - Java
OnCloseAsync is called when the stateless service instance is going to be gracefully shut down. This can occur when the service's code is being upgraded, the service instance is being moved due to load balancing, or a transient fault is detected. OnCloseAsync can be used to safely close any resources, stop any background processing, finish saving external state, or close down existing connections.

void OnAbort() - C# / void onAbort() - Java
OnAbort is called when the stateless service instance is being forcefully shut down. This is generally called when a permanent fault is detected on the node, or when Service Fabric cannot reliably manage the service instance's lifecycle due to internal failures.

Stateful service replica lifecycle
NOTE: Stateful reliable services are not supported in Java yet.

A stateful service replica's lifecycle is much more complex than a stateless service instance's. In addition to the open, close, and abort events, a stateful service replica undergoes role changes during its lifetime. When a stateful service replica changes role, the OnChangeRoleAsync event is triggered:

Task OnChangeRoleAsync(ReplicaRole, CancellationToken)
OnChangeRoleAsync is called when the stateful service replica is changing role, for example to primary or secondary. Primary replicas are given write status (are allowed to create and write to Reliable Collections). Secondary replicas are given read status (can only read from existing Reliable Collections). Most work in a stateful service is performed at the primary replica. Secondary replicas can perform read-only validation, report generation, data mining, or other read-only jobs.

In a stateful service, only the primary replica has write access to state and thus is generally where the service is performing actual work. The RunAsync method in a stateful service is executed only when the stateful service replica is primary. The RunAsync method is canceled when a primary replica's role changes away from primary, as well as during the close and abort events. Using the OnChangeRoleAsync event allows you to perform work depending on replica role as well as in response to role change.

A stateful service also provides the same four lifecycle events as a stateless service, with the same semantics and use cases:
* Task OnOpenAsync(IStatefulServicePartition, CancellationToken)
* Task OnCloseAsync(CancellationToken)
* void OnAbort()

Next steps
For more advanced topics related to Service Fabric, see the following articles:
Configuring stateful Reliable Services
Service Fabric health introduction
Using system health reports for troubleshooting
Configuring Services with the Service Fabric Cluster Resource Manager
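To make the stateless lifecycle events above concrete, the following is a minimal sketch (not taken from the article) of a stateless service class that overrides them alongside RunAsync. It assumes the StatelessService base class from the C# Reliable Services API, where the OnOpenAsync override takes only a CancellationToken; the class name and the work inside each method are placeholders.

using System;
using System.Fabric;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Services.Runtime;

internal sealed class MyStatelessService : StatelessService
{
    public MyStatelessService(StatelessServiceContext context)
        : base(context)
    {
    }

    protected override Task OnOpenAsync(CancellationToken cancellationToken)
    {
        // Extended initialization, such as warming caches or opening outbound connections.
        return Task.CompletedTask;
    }

    protected override async Task RunAsync(CancellationToken cancellationToken)
    {
        // General-purpose background work; the token is signaled on close or abort.
        while (!cancellationToken.IsCancellationRequested)
        {
            await Task.Delay(TimeSpan.FromSeconds(1), cancellationToken);
        }
    }

    protected override Task OnCloseAsync(CancellationToken cancellationToken)
    {
        // Graceful shutdown: flush external state and close existing connections here.
        return Task.CompletedTask;
    }

    protected override void OnAbort()
    {
        // Forceful shutdown: only quick, best-effort cleanup is appropriate here.
    }
}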
consider the actor pattern to model your problem or scenario if: Your problem space involves a large number (thousands or more) of small. When to use Reliable Actors Service Fabric Reliable Actors is an implementation of the actor design pattern. they do not need to be explicitly created or destroyed. Introduction to Service Fabric Reliable Actors 3/7/2017 • 11 min to read • Edit Online Reliable Actors is a Service Fabric application framework based on the Virtual Actor pattern. It will also maintain knowledge of the actor's existence should it need to be reactivated later. What are Actors? An actor is an isolated.NET object is an instance of a . Actor Lifetime Service Fabric actors are virtual. meaning that their lifetime is not tied to their in-memory representation. Your actor instances won't block callers with unpredictable delays by issuing I/O operations. The actor pattern is a computational model for concurrent or distributed systems in which a large number of these actors can execute simultaneously and independently of each other. independent. including querying state across a set of actors. the decision whether to use a specific pattern is made based on whether or not a software design problem fits the pattern. careful consideration of the constraints of the pattern and the framework implementing it must be made. see Actor lifecycle and garbage collection. and in fact the Reliable Actors implementation deviates at times from this model. there may be an actor type that implements the functionality of a calculator and there could be many actors of that type that are distributed on various nodes across a cluster. identical to the way a . For example. The Reliable Actors runtime automatically activates an actor the first time it receives a request for that actor ID. An actor is automatically activated (causing an actor object to be constructed) the first time a message is sent . and isolated units of state and logic. Each such actor is uniquely identified by an actor ID. Although the actor design pattern can be a good fit to a number of distributed systems problems and scenarios. the Reliable Actors runtime garbage-collects the in-memory object. stateful Reliable Service. This virtual actor lifetime abstraction carries some caveats as a result of the virtual actor model. The Reliable Actors API provides a single-threaded programming model built on the scalability and reliability guarantees provided by Service Fabric. As general guidance. Each Reliable Actor service you write is actually a partitioned. For more details. Actors in Service Fabric In Service Fabric. If an actor is not used for a period of time. Every actor is defined as an instance of an actor type. As a result. independent unit of compute and state with single-threaded execution. As with any software design pattern. actors are implemented in the Reliable Actors framework: An actor-pattern-based application framework built on top of Service Fabric Reliable Services. You want to work with single-threaded objects that do not require significant interaction from external components.NET type. Actors can communicate with each other and they can create more actors. and automatic failover are all provided by virtue of the fact that actors are running inside a stateful Reliable Service called the Actor Service. the actor object is garbage collected. refer to partitioning concepts for actors. causes a new actor object to be constructed. Distribution and failover To provide scalability and reliability. 
an actor service with nine partitions deployed to three nodes using the default actor partition placement would be distributed thusly: The Actor Framework manages partition scheme and key range settings for you. This is an abstraction over a partitioned. using the actor ID again. An actor's state outlives the object's lifetime when stored in the state manager. it should be expected that actor operations will always require network communication. Service Fabric manages distribution and failover of the service partitions. In advanced scenarios. After some period of time. actor types have their constructor called implicitly by the runtime. key range (when using a range partitioning scheme). and partition count. every method on the interface must be Task-returning. . This simplifies some choices but also carries some consideration: Reliable Services allows you to choose a partitioning scheme. In the future. scalability. doing so can result in an unbalanced distribution of actors across partitions. Each service partition contains a set of actors. For example. if the actor requires initialization parameters from the client. you do have the ability to explicitly delete an actor and its state. However. stateful Reliable Service. it is possible to control actor partition placement by using Int64 actor IDs that map to specific partitions. and those partitions are distributed across the nodes in a Service Fabric cluster. Because this interface is used to invoke actor methods asynchronously. Reliable Actors is restricted to the range partitioning scheme (the uniform Int64 scheme) and requires you use the full Int64 key range. including serialization and deserialization of method call data. Therefore. Although Reliable Actors implicitly create actor objects. reliability. Service Fabric distributes actors throughout the cluster and automatically migrates them from failed nodes to healthy ones as required. Actor communication Actor interactions are defined in an interface that is shared by the actor that implements the interface. There is no single entry point for the activation of an actor from the client. incurring latency and overhead. By default. Because actors are randomly placed. and the client that gets a proxy to an actor via the same interface. For this reason. although parameters may be passed to the actor's constructor by the service itself. client code cannot pass parameters to the actor type's constructor. actors are randomly placed into partitions resulting in uniform distribution. to its actor ID. Distribution. The result is that actors may be constructed in a partially-initialized state by the time other methods are called on it. Calling any actor method for an actor ID activates that actor. Actors are distributed across the partitions of the Actor Service. For more information on how actor services are partitioned. In particular. so the arguments and the result types of the tasks that they return must be serializable by the platform. IMyActor myActor = ActorProxy. Turn-based access greatly simplifies concurrent systems as there is no need for synchronization mechanisms for data access.get(). It also means systems must be designed with special considerations for the single-threaded access nature of each actor instance. it will be activated by this method call.CreateRandom(). Note that the two pieces of information used to create the actor proxy object are the actor ID and the application name. As a result.create(actorId. The actor ID uniquely identifies the actor. 
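As an illustration of these interface rules, a hypothetical actor interface might look like the following sketch. IInventoryActor and its methods are example names introduced here for illustration, not anything defined by the SDK.

using System.Threading.Tasks;
using Microsoft.ServiceFabric.Actors;

// Every method returns a Task, and the argument and result types (here string and int)
// must be serializable by the platform.
public interface IInventoryActor : IActor
{
    Task AddItemAsync(string itemName, int quantity);

    Task<int> GetItemCountAsync(string itemName);
}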
The actor proxy can be used for client-to-actor and actor-to-actor communication. new Uri("fabric:/MyApp/MyActorService")). . // Create actor ID with some name ActorId actorId = new ActorId("Actor1"). This means that no more than one thread can be active inside an actor object's code at any time. a client creates an actor proxy object that implements the actor interface. while the application name identifies the Service Fabric application where the actor is deployed. MyActor. The client interacts with the actor by invoking methods on the proxy object. If an actor with the given ID does not exist. new URI("fabric:/MyApp/MyActorService"). // This only creates a proxy object. await myActor.class). The actor runtime will automatically time out on actor calls and throw an exception to the caller to interrupt possible deadlock situations. message delivery has the following characteristics: Message delivery is best effort. Concurrency The Reliable Actors runtime provides a simple turn-based access model for accessing actor methods. it will be activated by this method call. The actor proxy The Reliable Actors client API provides communication between an actor instance and an actor client. it does not activate an actor or invoke any methods yet. // This will invoke a method on the actor. // Create a randomly distributed actor ID ActorId actorId = ActorId. A single actor instance cannot process more than one request at a time. myActor.Method invocations and their responses ultimately result in network requests across the cluster.Create<IMyActor>(actorId. An actor instance can cause a throughput bottleneck if it is expected to handle concurrent requests.DoWorkAsync().DoWorkAsync(). To communicate with an actor. it does not activate an actor or invoke any methods yet. Actors can deadlock on each other if there is a circular request between two actors while an external request is made to one of the actors simultaneously. MyActor myActor = ActorProxyBase. they must be data contract serializable. If an actor with the given ID does not exist. The ActorProxy (C#) / ActorProxyBase (Java) class on the client side performs the necessary resolution to locate the actor by ID and open a communication channel with it. Actors may receive duplicate messages from the same client. // This only creates a proxy object. It also retries to locate the actor in the cases of communication failures and failovers. // This will invoke a method on the actor. Turn-based access A turn consists of the complete execution of an actor method in response to a request from other actors or clients. The diagram below shows an example of a timeline for the execution of these methods and callbacks on behalf of two actors (ActorId1 and ActorId2) that belong to this actor type. Thus. It is worth emphasizing that turn-based concurrency is respected even across different methods. the Actors runtime does not interleave them. an actor method or timer/reminder callback that is currently executing must be fully finished before a new call to a method or callback is allowed. A method or callback is considered to have finished if the execution has returned from the method or callback and the task returned by the method or callback has finished. A turn must be fully finished before a new turn is allowed. The Actors runtime enforces turn-based concurrency by acquiring a per-actor lock at the beginning of a turn and releasing the lock at the end of the turn. turn-based concurrency is enforced on a per-actor basis and not across actors. 
Method1 and Method2). and callbacks. Even though these methods and callbacks are asynchronous. Consider an actor type that implements two asynchronous methods (say. timers. and a reminder. The following example illustrates the above concepts. In other words. Actor methods and timer/reminder callbacks can execute simultaneously on behalf of different actors. or the complete execution of a timer/reminder callback. . a timer. the second execution of Method1 does not begin until the prior execution has finished. a reminder registered by ActorId2 fires while Method1 is being executed in response to client request xyz789. Similarly. with newer events occurring below older ones. Method2. the per-actor lock is released only after both the method/callback returns and the asynchronous operation finishes. Similarly. This means that if an actor method of Actor A calls a method on Actor B. Reentrancy The Actors runtime allows reentrancy by default. another client request (abc123) arrives that also requires Method1 to be executed by ActorId2. and the timer callback on behalf of ActorId1 happening in a serial fashion. The reminder callback is executed only after both executions of Method1 are complete. In some others. which in turn calls another method on Actor A. that method is allowed to run. Different colors are used for timelines corresponding to different actors. In some of the method/callback executions. This is because turn-based concurrency is enforced only within an actor and not across actors. Execution of Method1 on behalf of ActorId1 overlaps with its execution on behalf of ActorId2. turn-based concurrency is also enforced for ActorId1. However. The events marked on each vertical line occur in chronological order. Highlighting is used to indicate the duration for which the per-actor lock is held on behalf of a method or callback. Some important points to consider: While Method1 is executing on behalf of ActorId2 in response to client request xyz789.This diagram follows these conventions: Each vertical line shows the logical flow of execution of a method or a callback on behalf of a particular actor. All of this is due to turn-based concurrency being enforced for ActorId2. In both cases. the asynchronous operation has already finished by the time the method/callback returns. the Task (C#) / CompletableFuture (Java) returned by the method/callback finishes after the method returns. This is because it is part of . as demonstrated by the execution of Method1. it provides these guarantees for the method invocations that are done in response to a client request. However. For example. if the actor code directly invokes these methods outside of the mechanisms provided by the Actors runtime. then the runtime cannot provide any concurrency guarantees. Next steps Getting started with Reliable Actors How Reliable Actors use the Service Fabric platform Actor state management Actor lifecycle and garbage collection Actor timers and reminders Actor events Actor reentrancy Actor polymorphism and object-oriented design patterns Actor diagnostics and performance monitoring .the same logical call-chain context. then the runtime cannot provide concurrency guarantees. For example. actors should use actor timers and actor reminders that respect turn-based concurrency. then the runtime also cannot provide concurrency guarantees. to perform background operations. if the method is invoked in the context of some task that is not associated with the task returned by the actor methods. 
See the Reliable Actors reentrancy for more details. Therefore. If the method is invoked from a thread that the actor creates on its own. as well as for timer and reminder callbacks. All timer and reminder calls start with the new logical call context. Scope of concurrency guarantees The Actors runtime provides these concurrency guarantees in situations where it controls the invocation of these methods. The ActorProxy class is used by client applications to invoke the methods exposed through the actor interface. Installation and setup Before you start. and deploying a simple Reliable Actor application in Visual Studio. a Reliable Actor service needs to be registered with the Service Fabric runtime. ActorProxy class. ref. The following rules that pertain to actor interfaces are worth mentioning: Actor interface methods cannot be overloaded. As with Reliable Services. In addition. The actor interface is used to define a strongly typed public interface of an actor. the actor interface defines the types of messages that the actor can understand and process. and create a new Service Fabric application project: . If you need to set it up. Basic concepts To get started with Reliable Actors. Actor interface methods must not have out. Generic interfaces are not supported. Failure handling: It can retry method invocations and re-resolve the actor location after. ensure that you have the Service Fabric development environment set up on your machine. Actor instances are activated in a named service instance. or optional parameters. In the Reliable Actor model terminology. for example. debugging. see detailed instructions on how to set up the development environment. you only need to understand a few basic concepts: Actor service. Reliable Actors can implement multiple interfaces. The ActorProxy class provides two important functionalities: Name resolution: It is able to locate the actor in the cluster (find the node of the cluster where it is hosted). Getting started with Reliable Actors 3/9/2017 • 4 min to read • Edit Online This article explains the basics of Azure Service Fabric Reliable Actors and walks you through creating. Reliable Actors are packaged in Reliable Services that can be deployed in the Service Fabric infrastructure. Create a new project in Visual Studio Launch Visual Studio 2015 or Visual Studio 2017 as an administrator. Actor registration. a failure that requires the actor to be relocated to another node in the cluster. the actor type needs to be registered with the Actor runtime. The actor interface is used by other actors and client applications to "send" (asynchronously) messages to the actor. Actor interface. For the HelloWorld project. After you have created the solution. you should see the following structure: . let's use the Service Fabric Reliable Actors service. you can choose the type of project that you want to create.In the next dialog box. Your actor interfaces can be defined in any project with any name. An actor implementation is a class that derives from the base type Actor and implements the interface(s) that are defined in the MyActor. This is the project that contains the interface definition for the actor. It contains the implementation of the actor. In the MyActor. you can define the interfaces that will be used by the actors in the solution.Interfaces project.Interfaces project. An actor class must also implement a constructor that accepts an ActorService instance and an ActorId and passes them to the base Actor class. 
public interface IMyActor : IActor { Task<string> HelloWorld(). so it typically makes sense to define it in an assembly that is separate from the actor implementation and can be shared by multiple other projects. This allows for constructor dependency injection of platform dependencies. This is the project that packages all of the services together for deployment.Reliable Actors basic building blocks A typical Reliable Actors solution is composed of three projects: The application project (MyActorApplication). . } The actor service project (MyActor). This is the project used to define the Service Fabric service that is going to host the actor. It contains the ApplicationManifest.xml and PowerShell scripts for managing the application. The interface project (MyActor. however the interface defines the actor contract that is shared by the actor implementation and the clients calling the actor.Interfaces). your actor type must also be registered with the Actor Service. actorType) => new ActorService(context.GetResult(). actorId) { } public Task<string> HelloWorld() { return Task. the registration is included by default in the code that Visual Studio generates. [StatePersistence(StatePersistence.FromResult("Hello world!"). Thread.ActorHostInitializationFailed(e. IMyActor { public MyActor(ActorService actorService. } } } If you start from a new project in Visual Studio and you have only one actor definition. actorType. } } The actor service must be registered with a service type in the Service Fabric runtime. you need to add the actor registration by using: ActorRuntime.RegisterActorAsync<MyActor>( (context. If you define other actors in the service.Sleep(Timeout. The ActorRuntime registration method performs this work for actors. you can see the progress in the Output window.RegisterActorAsync<MyOtherActor>(). TIP The Service Fabric Actors runtime emits some events and performance counters related to actor methods. It also deploys the application on the local Service Fabric cluster and attaches the debugger.Infinite). internal static class Program { private static void Main() { try { ActorRuntime.GetAwaiter(). throw. } catch (Exception e) { ActorEventSource. Visual Studio builds (if necessary) packages. ActorId actorId) : base(actorService. They are useful in diagnostics and performance monitoring.Current. During the deployment process. () => new MyActor())). .Persisted)] class MyActor : Actor. Debugging The Service Fabric tools for Visual Studio support debugging on your local machine.ToString()). In order for the Actor Service to run your actor instances. You can start a debugging session by hitting the F5 key. Next steps How Reliable Actors use the Service Fabric platform Actor state management Actor lifecycle and garbage collection Actor API reference documentation Sample code . the actor interface defines the types of messages that the actor can understand and process. name the application "HelloWorldActorApplication" and the actor "HelloWorldActor. Generic interfaces are not supported. Actor instances are activated in a named service instance. Actor registration. The Service Fabric SDK for Linux includes a Yeoman generator to provide the scaffolding for a Service Fabric application with a stateless service. Actor interface. make sure you have the Service Fabric development environment set up on your machine." The following scaffolding will be created: . Installation and setup Before you start. In the Reliable Actor model terminology. 
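Once the application is running, a simple client can exercise the actor through the IMyActor interface defined above. The sketch below is illustrative: the application and service names in the URI are assumptions based on this walkthrough's project names, so adjust them to match your deployment.

using System;
using Microsoft.ServiceFabric.Actors;
using Microsoft.ServiceFabric.Actors.Client;

internal static class TestClient
{
    private static void Main()
    {
        // Create a proxy for a randomly distributed actor ID and invoke the actor method.
        IMyActor actor = ActorProxy.Create<IMyActor>(
            ActorId.CreateRandom(),
            new Uri("fabric:/MyActorApplication/MyActorService"));

        string greeting = actor.HelloWorld().GetAwaiter().GetResult();
        Console.WriteLine(greeting);
    }
}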
Getting started with Reliable Actors
1/23/2017 • 4 min to read • Edit Online

This article explains the basics of Azure Service Fabric Reliable Actors and walks you through creating and deploying a simple Reliable Actor application in Java.

Installation and setup
Before you start, make sure you have the Service Fabric development environment set up on your machine. If you need to set it up, go to getting started on Mac or getting started on Linux.

Basic concepts
To get started with Reliable Actors, you only need to understand a few basic concepts:

Actor service. Reliable Actors are packaged in Reliable Services that can be deployed in the Service Fabric infrastructure. Actor instances are activated in a named service instance.

Actor registration. As with Reliable Services, a Reliable Actor service needs to be registered with the Service Fabric runtime. In addition, the actor type needs to be registered with the Actor runtime.

Actor interface. The actor interface is used to define a strongly typed public interface of an actor. In the Reliable Actor model terminology, the actor interface defines the types of messages that the actor can understand and process. The actor interface is used by other actors and client applications to "send" (asynchronously) messages to the actor. Reliable Actors can implement multiple interfaces.

The following rules that pertain to actor interfaces are worth mentioning:
Actor interface methods cannot be overloaded.
Actor interface methods must not have out, ref, or optional parameters.
Generic interfaces are not supported.

ActorProxy class. The ActorProxy class is used by client applications to invoke the methods exposed through the actor interface. The ActorProxy class provides two important functionalities:
Name resolution: It is able to locate the actor in the cluster (find the node of the cluster where it is hosted).
Failure handling: It can retry method invocations and re-resolve the actor location after, for example, a failure that requires the actor to be relocated to another node in the cluster.

Create an actor service
Start by creating a new Service Fabric application. The Service Fabric SDK for Linux includes a Yeoman generator to provide the scaffolding for a Service Fabric application with a stateless service. Start by running the following Yeoman command:

$ yo azuresfjava

Follow the instructions to create a Reliable Actor Service. For this tutorial, name the application "HelloWorldActorApplication" and the actor "HelloWorldActor." The following scaffolding will be created:
This interface defines the actor contract that is shared by the actor implementation and the clients calling the actor.txt │ ├── Config │ │ ├── _readme. INFO.stateManager().stateManager().registerActorAsync(HelloWorldActorImpl.sleep(Long. value) -> count > value ? count : value).getLogger(this. actorType) -> new ActorServiceImpl(context.stateManager(). (key.MAX_VALUE).log(Level.printStackTrace(). The application Finally. count. } catch (Exception e) { e.INFO. } } Actor registration The actor service must be registered with a service type in the Service Fabric runtime.log(Level. actorType.HelloWorldActorService") @StatePersistenceAttribute(statePersistence = StatePersistence. It does not get deployed with your service. Thread. count).log(Level. It contains the ApplicationManifest.getStateAsync("count").xml and place holders for the actor service package. "onActivateAsync"). The ActorRuntime registration method performs this work for actors. } @Override public CompletableFuture<?> setCountAsync(int count) { logger. @ActorServiceAttribute(name = "HelloWorldActor. "Setting current count value {0}".ofSeconds(10)). return this.getName()). Run the application . (context. } } } Test client This is a simple test client application you can run separately from the Service Fabric application to test your actor service. Duration. throw e.Persisted) public class HelloWorldActorImpl extends ReliableActor implements HelloWorldActor { Logger logger = Logger. the application packages the actor service and any other services you might add in the future together for deployment. your actor type must also be registered with the Actor Service. ()-> new HelloWorldActorImpl()).INFO. return this. In order for the Actor Service to run your actor instances.getClass(). return this. This is an example of where the ActorProxy can be used to activate and communicate with actor instances. 0). } @Override public CompletableFuture<Integer> getCountAsync() { logger. HelloWorldActor/src/reliableactor/HelloWorldActorHost : public class HelloWorldActorHost { public static void main(String[] args) throws Exception { try { ActorRuntime.tryAddStateAsync("count".class. protected CompletableFuture<?> onActivateAsync() { logger. "Getting current count value").addOrUpdateStateAsync("count". /install. To run the application.sh .sh script to deploy: $ . The install. Simply run the install.The Yeoman scaffolding includes a gradle script to build the application and bash scripts to deploy and un-deploy the application. first build the application with gradle: $ gradle This will produce a Service Fabric application package that can be deployed using Service Fabric Azure CLI.sh script contains the necessary Azure CLI commands to deploy the application package. Reliable Actors run in a framework that is hosted in an implementation of a stateful reliable service called the actor service. Service layering Because the actor service itself is a reliable service. An actor service remoting listener accepts remote access calls to actors and sends them to a dispatcher to route to the appropriate actor instance. all the application model. deployment. lifecycle. packaging. . These components together form the Reliable Actor framework. orange represents the Reliable Actor framework. Blue elements represent the Reliable Services application framework. How Reliable Actors use the Service Fabric platform 4/13/2017 • 8 min to read • Edit Online This article explains how Reliable Actors work on the Azure Service Fabric platform. 
The Actor State Provider wraps state providers (such as the Reliable Collections state provider) and provides an adapter for actor state management. The actor service contains all the components necessary to manage the lifecycle and message dispatching for your actors: The Actor Runtime manages lifecycle. and green represents user code. garbage collection. The preceding diagram shows the relationship between the Service Fabric application frameworks and user code. and enforces single-threaded access. upgrade. and scaling concepts of Reliable Services apply the same way to actor services. your service inherits the StatefulService class. This class is itself derived from StatefulServiceBase (or StatelessService for stateless services). string applicationInstanceName = this. Thread. a circuit breaker.CodePackageActivationContext.PartitionId. URI serviceInstanceName = this. actor instances can programmatically obtain the service context.ServiceName. Using the actor service Actor instances have access to the actor service in which they are running.Context.ApplicationName. NOTE Stateful services are not currently supported in Java/Linux. you can write your own service that derives from ActorService and implement service-level features the same way you would when inheriting StatefulService .ActorService. } } Alternatively.getPartitionId(). Through the actor service. and other Service Fabric platform-specific information: Task MyActorMethod() { Guid partitionId = this. You can then configure the actor service and explicitly construct your actor instances.ActorService.getApplicationName().Infinite). service name.getActorService().GetResult().getCodePackageActivationContext(). Uri serviceInstanceName = this.Context. The actor service is a different implementation of the StatefulServiceBase class that implements the actor pattern where your actors run.Context.getServiceContext().getServiceContext(). you can use a lambda provided by the registration method to construct the actor service yourself.GetAwaiter(). string serviceTypeName = this. application name.getServiceContext(). } Like all Reliable Services. Shared functionality for all actors.getServiceName(). String applicationInstanceName = this.RegisterActorAsync<MyActor>().getActorService(). Because the actor service itself is just an implementation of StatefulServiceBase . you use the actor service. you can just register your actor type.ServiceTypeName. for example. where you can inject dependencies to your actor through its constructor: .ActorService.getServiceContext(). String serviceTypeName = this. the actor service must be registered with a service type in the Service Fabric runtime. and the actor service with default settings will implicitly be used: static class Program { private static void Main() { ActorRuntime.Context.getActorService().getServiceTypeName().ActorService. Remote procedure calls on the actor service itself and on each individual actor. For the actor service to run your actor instances.In Reliable Services. such as: Service backup and restore.getActorService().Sleep(Timeout. The ActorRuntime registration method performs this work for actors. your actor type must also be registered with the actor service. In the simplest case. In Reliable Actors. The service context has the partition ID. } CompletableFuture<?> MyActorMethod() { UUID partitionId = this. Items.Where(x => x. Enumerating actors The actor service allows a client to enumerate metadata about the actors that the service is hosting. 
actorTypeInfo) -> new FabricActorService(context. The following example shows how to create a list of all active actors in one partition of an actor service: IActorService actorServiceProxy = ActorServiceProxy. (context. the enumeration is returned as a set of paged results. () => new MyActor())) . continuationToken = page. actorType) => new ActorService(context. ContinuationToken continuationToken = null. cancellationToken).AddRange(page. } while (continuationToken != null). actorTypeInfo).GetAwaiter(). Thread. do { PagedResult<ActorInformation> page = await actorServiceProxy. List<ActorInformation> activeActors = new List<ActorInformation>(). Because each partition might contain many actors. static class Program { private static void Main() { ActorRuntime.Sleep(Timeout.IsActive)).class.RegisterActorAsync<MyActor>( (context. } } static class Program { private static void Main() { ActorRuntime. which in turn implements IService (C#) or Service (Java). Because the actor service is a partitioned stateful service.sleep(Long.GetActorsAsync(continuationToken. It contains service-level methods that can be called remotely via service remoting.ContinuationToken.GetResult(). This is the interface used by Reliable Services remoting. partitionKey). The pages are looped over until all pages are read. Thread.MAX_VALUE). } } Actor service methods The Actor service implements IActorService (C#) or ActorService (Java). activeActors. enumeration is performed per partition.Infinite). which allows remote procedure calls on service methods.Create( new Uri("fabric:/MyApp/MyService"). actorType.registerActorAsync( MyActor. . timeout). while(ActorInformation x: page. In this custom actor service. For more information on deleting actors and their state.getItems()) { if(x. cancellationToken) ActorId actorToDelete = new ActorId(id). see the actor lifecycle documentation.create( new URI("fabric:/MyApp/MyService"). IActorService myActorServiceProxy = ActorServiceProxy.getActorsAsync(continuationToken). A custom actor service inherits all the actor runtime functionality from ActorService (C#) or FabricActorService (Java) and can be used to implement your own service methods. partitionKey).getContinuationToken(). ActorTypeInformation typeInfo. Deleting actors The actor service also provides a function for deleting actors: ActorId actorToDelete = new ActorId(id).add(x). you can implement your own service-level functionality by writing a service class that inherits ActorService (C#) or FabricActorService (Java). class MyActorService : ActorService { public MyActorService(StatefulServiceContext context. } } continuationToken = page. you can register your own custom actor service that derives from ActorService (C#) and FabricActorService (Java). } while (continuationToken != null). Custom actor service By using the actor registration lambda.isActive()){ activeActors. newActor) { } } .deleteActorAsync(actorToDelete). myActorServiceProxy. ActorService actorServiceProxy = ActorServiceProxy.Create( new Uri("fabric:/MyApp/MyService"). actorToDelete).create( new URI("fabric:/MyApp/MyService"). actorToDelete). do { PagedResult<ActorInformation> page = actorServiceProxy. List<ActorInformation> activeActors = new ArrayList<ActorInformation>(). typeInfo. ContinuationToken continuationToken = null.DeleteActorAsync(actorToDelete. ActorService myActorServiceProxy = ActorServiceProxy. Func<ActorBase> newActor) : base(context. await myActorServiceProxy. actorType) => new MyActorService(context.GetAwaiter().MAX_VALUE).GetResult(). 
ActorTypeInformation typeInfo. actorType.sleep(Long. typeInfo. timeout). newActor).class. the custom actor service exposes a method to back up actor data by taking advantage of the remoting listener already present in ActorService : .Sleep(Timeout. } } static class Program { private static void Main() { ActorRuntime. Thread.registerActorAsync( MyActor. actorTypeInfo). () => new MyActor())) .RegisterActorAsync<MyActor>( (context. } } public class Program { public static void main(String[] args) { ActorRuntime. } } Implementing actor backup and restore In the following example. (context. class MyActorService extends FabricActorService { public MyActorService(StatefulServiceContext context.Infinite). ActorBase> newActor) { super(context. Thread. BiFunction<FabricActorService. ActorId. actorTypeInfo) -> new FabricActorService(context. typeInfo. Func<ActorBase> newActor) : base(context.Directory return true. CancellationToken cancellationToken) { try { // store the contents of backupInfo. IMyActorService { public MyActorService(StatefulServiceContext context.Directory. } finally { Directory. ActorTypeInformation typeInfo.BackupAsync(new BackupDescription(PerformBackupAsync)).public interface IMyActorService : IService { Task BackupActorsAsync(). } } } . } private async Task<bool> PerformBackupAsync(BackupInfo backupInfo.Delete(backupInfo. newActor) { } public Task BackupActorsAsync() { return this. recursive: true). } class MyActorService : ActorService. } private CompletableFuture<Boolean> performBackupAsync(BackupInfo backupInfo. so the application model is the same. actorId). IMyActorService is a remoting contract that implements IService (C#) and Service (Java).BackupActorsAsync().backupAsync(new BackupDescription((backupInfo. MyActorService myActorServiceProxy = ActorServiceProxy. By adding this remoting contract.listFiles().Directory) } } void deleteDirectory(File file) { File[] contents = file.CreateRandom()). ActorTypeInformation typeInfo.create(MyActorService. the actor framework build . Application model Actor services are Reliable Services. } finally { deleteDirectory(backupInfo. myActorServiceProxy. } public CompletableFuture backupActorsAsync() { return this. ActorId.backupActorsAsync(). } class MyActorServiceImpl extends ActorService implements MyActorService { public MyActorService(StatefulServiceContext context. and is then implemented by MyActorService . CancellationToken cancellationToken) { try { // store the contents of backupInfo. typeInfo. } } file. if (contents != null) { for (File f : contents) { deleteDirectory(f). public interface MyActorService extends Service { CompletableFuture<?> backupActorsAsync(). newActor).class.Create<IMyActorService>( new Uri("fabric:/MyApp/MyService"). methods on IMyActorService are now also available to a client by creating a remoting proxy via ActorServiceProxy : IMyActorService myActorServiceProxy = ActorServiceProxy. ActorId. However. await myActorServiceProxy. Func<FabricActorService.delete(). cancellationToken))).Directory return true. cancellationToken) -> performBackupAsync(backupInfo. ActorBase> newActor) { super(context. } } In this example. new URI("fabric:/MyApp/MyService"). Actor ID Each actor that's created in the service has a unique ID associated with it. Every ActorId is hashed to an Int64. the replica set count in the default service definition is reset accordingly. However. Service Fabric partition concepts for actors Actor services are partitioned stateful services. 
including GUIDs/UUIDs.tools generate some of the application model files for you.newId()). Service partitions are automatically distributed over multiple nodes in Service Fabric. The actor service uses the Int64 partitioning scheme with the full Int64 key range to map actors to partitions. represented by the ActorId class. The type name is generated based on your actor's project name.xml file.create<MyActor>(MyActor. Service manifest The actor framework build tools automatically generate the contents of your actor service's ServiceManifest. ActorId. ActorId is an opaque ID value that can be used for uniform distribution of actors across the service partitions by generating random IDs: ActorProxy. Each partition of an actor service contains a set of actors. custom ID values can be used for an ActorID . This file includes: Actor service type. and Int64s. Reliable Services can be created with different partition schemes and partition key ranges. Resources and endpoints. Partition scheme and range are set to Uniform Int64 with the full Int64 key range. Based on the persistence attribute on your actor. strings. the HasPersistedState flag is also set accordingly. Actor instances are distributed as a result. Code package. ActorProxyBase.class.CreateRandom()).Create<IMyActor>(ActorId. This is why the actor service must use an Int64 partitioning scheme with the full Int64 key range. The build tools populate the default service properties: Replica set count is determined by the persistence attribute on your actor. Config package. Application manifest The actor framework build tools automatically create a default service definition for your actor service. Each time the persistence attribute on your actor is changed. . ActorProxyBase. new ActorId("myActorId")).class. the values are hashed to an Int64. new ActorId(1234)). ActorProxy. new ActorId(UUID.create(MyActor. Next steps Actor state management Actor lifecycle and garbage collection Actors API reference documentation .Create<IMyActor>(new ActorId(1234)). when you're explicitly providing an Int64 to an ActorId .create(MyActor. You can use this technique to control which partition the actors are placed in.NewGuid())). ActorProxy.randomUUID())).Create<IMyActor>(new ActorId(Guid.Create<IMyActor>(new ActorId("myActorId")).class. When you're using GUIDs/UUIDs and strings.NET sample code Java sample code .create(MyActor. ActorProxy. ActorProxyBase. ActorProxyBase. However. the Int64 will map directly to a partition without further hashing.class. The actor's state is loaded if it's maintaining state. Actor operations like state changes should not be called from this method. They are useful in diagnostics and performance monitoring. it is removed from the Active Actors table. it does not remove state stored in the actor's State Manager. This clears all the timers for the actor. Garbage collection only cleans up the actor object. Before we go into the details of deactivation. An actor and its state can also be deleted manually at any time. it does not count as "being used". Actor activation When an actor is activated. the following occurs: When an actor is not used for some period of time.ReceiveReminderAsync method being invoked (applicable only if the actor uses reminders) NOTE if the actor uses timers and its timer callback is invoked. automatic garbage collection. Actor garbage collection When an actor is deactivated. 
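As a small illustration of the placement technique described above, the sketch below creates actor proxies with explicit Int64 IDs. Because an explicit Int64 ID maps directly onto the partition whose key range contains it, IDs chosen from the same contiguous range are served by the same partition. IMyActor and the service URI are placeholders for your own actor interface and deployed service.

using System;
using Microsoft.ServiceFabric.Actors;
using Microsoft.ServiceFabric.Actors.Client;

internal static class PlacementExample
{
    private static void Main()
    {
        Uri serviceUri = new Uri("fabric:/MyApp/MyActorService");

        // Explicit Int64 IDs are not hashed further, so these two actors land in the
        // partition that owns the key range containing 1000 and 1001 (typically the
        // same partition, unless a partition boundary happens to fall between them).
        IMyActor first = ActorProxy.Create<IMyActor>(new ActorId(1000), serviceUri);
        IMyActor second = ActorProxy.Create<IMyActor>(new ActorId(1001), serviceUri);

        // A GUID or string ID, by contrast, is hashed to an Int64 before the same
        // mapping is applied, so its partition is effectively random.
        IMyActor hashed = ActorProxy.Create<IMyActor>(new ActorId("order-42"), serviceUri);
    }
}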
references to the actor object are released and it can be garbage collected normally by the common language runtime (CLR) or java virtual machine (JVM) garbage collector. Actor deactivation When an actor is deactivated. a new actor is created. The actor is now considered active. and manual delete 3/31/2017 • 5 min to read • Edit Online An actor is activated the first time a call is made to any of its methods. An actor is deactivated (garbage collected by the Actors runtime) if it is not used for a configurable period of time. Actor lifecycle. the following occurs: When a call comes for an actor and one is not already active. The OnDeactivateAsync (C#) or onDeactivateAsync (Java) method (which can be overridden in the actor implementation) is called. The next time the actor is activated. The OnActivateAsync (C#) or onActivateAsync (Java) method (which can be overridden in the actor implementation) is called. What counts as “being used” for the purpose of deactivation and garbage collection? Receiving a call IRemindable. it is important to define the following terms: . a new actor object is created and its state is restored. TIP The Fabric Actors runtime emits some events related to actor activation and deactivation. registerActorAsync( MyActor. actorTypeInfo).GetResult(). This is the interval at which the Actors runtime scans its Active Actors table for actors that can be deactivated and garbage collected. not used). After this. The following diagram shows the lifecycle of a single actor to illustrate these concepts. Typically. the actor runtime keeps track of the amount of time that it has been idle (i. An actor is not considered to have been used if its timer callback is executed. The default value for this is 60 minutes. actorType) => new ActorService(context. Anytime an actor is used. timeout). settings: new ActorServiceSettings() { ActorGarbageCollectionSettings = new ActorGarbageCollectionSettings(10. 2) })) . actorType. its idle time is reset to 0. the actor can be garbage collected only if it again remains idle for IdleTimeoutInSeconds . . Recall that an actor is considered to have been used if either an actor interface method or an actor reminder callback is executed. This is the amount of time that an actor needs to remain unused (idle) before it can be deactivated and garbage collected. if necessary. (context. } } For each active actor. The actor runtime checks each of the actors every ScanIntervalInSeconds to see if it can be garbage collected and collects it if it has been idle for IdleTimeoutInSeconds .e. Idle timeout. } } public class Program { public static void main(String[] args) { ActorRuntime.class. you do not need to change these defaults. actorTypeInfo) -> new FabricActorService(context. these intervals can be changed through ActorServiceSettings when registering your Actor Service: public class Program { public static void Main(string[] args) { ActorRuntime. Scan interval.GetAwaiter() .RegisterActorAsync<MyActor>((context. However. The default value for this is 1 minute. the execution of actor interface methods and reminder callbacks prevents garbage collection by resetting the actor's idle time to 0. The following points about the example are worth mentioning: ScanInterval and IdleTimeout are set to 5 and 10 respectively. The execution of timer callbacks does not reset the idle time to 0. and the actor is garbage collected.Create( new Uri("fabric:/MyApp/MyService").create( new Uri("fabric:/MyApp/MyService"). actorToDelete). 
In cases where actors store data in State Manager and are deactivated but never re-activated.10.8. the actor's idle time finally exceeds the idle timeout of 10. actorToDelete). its data is again made available to it through the State Manager. During the garbage collection scan at T=25. but it does not remove data that is stored in an actor's State Manager.25. await myActorServiceProxy. and its callback executes.DeleteActorAsync(actorToDelete.The example shows the impact of actor method calls. as defined by the scan interval of 5. (Units do not matter here. A periodic timer fires at T=4.15.16. However. The Actor Service provides a function for deleting actors from a remote caller: ActorId actorToDelete = new ActorId(id). it may be necessary to clean up their data. no matter how much time is spent in executing that method.20. An actor reminder callback executes at T=14 and further delays the garbage collection of the actor. It does not impact the idle time of the actor.24. When an actor is re-activated. Deleting actors and their state Garbage collection of deactivated actors only cleans up the actor object. cancellationToken) ActorId actorToDelete = new ActorId(id).12. the garbage collection of the actor is deferred until the timer callback has completed execution.deleteActorAsync(actorToDelete). and timers on the lifetime of this actor. myActorServiceProxy.20. ActorService myActorServiceProxy = ActorServiceProxy. since our purpose is only to illustrate the concept. reminders. As mentioned earlier. An actor method call at T=7 resets the idle time to 0 and delays the garbage collection of the actor.5.) The scan for actors to be garbage collected happens at T=0. IActorService myActorServiceProxy = ActorServiceProxy. An actor will never be garbage collected while it is executing one of its methods. . Note that an actor cannot call delete on itself from one of its actor methods because the actor cannot be deleted while executing within an actor call context. Its state is deleted permanently. Inactive Actor Its state is deleted permanently. in which the runtime has obtained a lock around the actor call to enforce single-threaded access.Deleting an actor has the following effects depending on whether or not the actor is currently active: Active Actor Actor is removed from active actors list and is deactivated. Next steps Actor timers and reminders Actor events Actor reentrancy Actor diagnostics and performance monitoring Actor API reference documentation C# Sample code Java Sample code . } . which are derived from the base Actor class that is provided by the platform. IActor(C#) and Actor(Java) are the platform-defined base interfaces for actors in the frameworks . This interface is used to generate a proxy class that can be used by clients to communicate with your actors. Inheritance in the Reliable Actors framework generally follows the . Polymorphism in the Reliable Actors framework 3/31/2017 • 2 min to read • Edit Online The Reliable Actors framework allows you to build actors using many of the same techniques that you would use in object-oriented design. IShape { public abstract Task<int> GetVerticeCount(). In the case of shapes. which allows types and interfaces to inherit from more generalized parents. the classic polymorphism example using shapes might look something like this: Types You can also create a hierarchy of actor types. In case of Java/Linux. it follows the Java model. 
} public abstract class ShapeImpl extends FabricActor implements Shape { public abstract CompletableFuture<int> getVerticeCount(). Interfaces can inherit from other interfaces as long as every interface that is implemented by an actor type and all of its parents ultimately derive from IActor(C#) or Actor(Java) . public abstract Task<double> GetAreaAsync().NET model with a few additional constraints. public abstract CompletableFuture<double> getAreaAsync().NET and Java respectively. you might have a base Shape (C#) or ShapeImpl (Java) type: public abstract class Shape : Actor. One of those techniques is polymorphism. Interfaces The Reliable Actors framework requires you to define at least one interface to be implemented by your actor type. Thus. } } Note the ActorService attribute on the actor type. scalability. and consistent state.FromResult(0). } } @ActorServiceAttribute(name = "Circle") @StatePersistenceAttribute(statePersistence = StatePersistence. Learn about the actor lifecycle.GetStateAsync<CircleState>("circle"). you may wish to create a base type that is solely intended for sharing functionality with subtypes and will never be used to instantiate concrete actors.Persisted) public class Circle extends ShapeImpl implements Circle { @Override public CompletableFuture<Integer> getVerticeCount() { return CompletableFuture.completedFuture(0).Subtypes of Shape (C#) or ShapeImpl (Java) can override methods from the base.Persisted)] public class Circle : Shape.radius * state. In some cases. } public override async Task<double> GetAreaAsync() { CircleState state = await this. Next steps See how the Reliable Actors framework leverages the Service Fabric platform to provide reliability. [ActorService(Name = "Circle")] [StatePersistence(StatePersistence.stateManager(). you should use the abstract keyword to indicate that you will never create an actor based on that type.Radius * state. In those cases.thenApply(state->{ return Math. })).radius.PI * state. This attribute tells the Reliable Actor framework that it should automatically create a service for hosting actors of this type.getStateAsync<CircleState>("circle"). } @Override public CompletableFuture<Double> getAreaAsync() { return (this. ICircle { public override Task<int> GetVerticeCount() { return Task.Radius. return Math.StateManager.PI * state. . an exception of type FabricException will be thrown. the message is reentrant. if Actor C calls Actor A. if an actor sends a reentrant message to another actor. allows logical call context-based reentrancy. by default. For example. There are two options available for actor reentrancy defined in the ActorReentrancyMode enum: LogicalCallContext (default behavior) Disallowed . Disallowed(2) } Reentrancy can be configured in an ActorService 's settings during registration. In this case.Disallowed . Reliable Actors reentrancy 3/31/2017 • 1 min to read • Edit Online The Reliable Actors runtime. The following example shows an actor service that sets the reentrancy mode to ActorReentrancyMode. so it will be allowed. As part of the message processing. The setting applies to all actor instances created in the actor service. Any other messages that are part of a different call context will be blocked on Actor A until it finishes processing. Disallowed = 2 } public enum ActorReentrancyMode { LogicalCallContext(1).disables reentrancy public enum ActorReentrancyMode { LogicalCallContext = 1. This allows for actors to be reentrant if they are in the same call context chain. . 
Actor A sends a message to Actor B. who sends a message to Actor C. Current. actorType) -> new FabricActorService( context. Thread. actorType.registerActorAsync( Actor1. actorServiceSettings. actorType) => new ActorService( context. ActorServiceSettings actorServiceSettings = new ActorServiceSettings(). static class Program { static void Main() { try { ActorRuntime.setReentrancyMode(ActorReentrancyMode.GetResult(). Thread.RegisterActorAsync<Actor1>( (context. } } } Next steps Actor diagnostics and performance monitoring Actor API reference documentation . ActorRuntime. () -> new Actor1().GetAwaiter(). actorServiceSettings. } catch (Exception e) { ActorEventSource. stateProvider. null. (context. () => new Actor1(). } } } static class Program { static void Main() { try { ActorConcurrencySettings actorConcurrencySettings = new ActorConcurrencySettings().sleep(Long.Sleep(Timeout.ActorHostInitializationFailed(e. settings: new ActorServiceSettings() { ActorConcurrencySettings = new ActorConcurrencySettings() { ReentrancyMode = ActorReentrancyMode. actorType. timeout).Infinite).Disallowed).setActorConcurrencySettings(actorConcurrencySettings). actorConcurrencySettings.ToString()). throw.getClass().MAX_VALUE).Disallowed } })) . } catch (Exception e) { throw e. C# Sample code Java Sample code . public interface IGameEvents : IActorEvents { void GameScoreUpdated(Guid gameId. Task<string> GetGameScore(). This interface must be derived from the IActorEvents interface. string currentScore). class GameEventsHandler : IGameEvents { public void GameScoreUpdated(Guid gameId. } } . } Declare the events published by the actor in the actor interface. gameId. } public interface GameEvents implements ActorEvents { void gameScoreUpdated(UUID gameId. Define an interface that describes the events published by the actor.WriteLine(@"Updates: Game: {0}. public interface IGameActor : IActor. The arguments of the methods must be data contract serializable. } On the client side. String currentScore). The methods must return void. ActorEventPublisherE<GameEvents> { CompletableFuture<?> updateGameStatus(GameStatus status). as event notifications are one way and best effort. Actor events 4/19/2017 • 1 min to read • Edit Online Actor events provide a way to send best-effort notifications from the actor to the clients. IActorEventPublisher<IGameEvents> { Task UpdateGameStatus(GameStatus status). implement the event handler. Score: {1}". string currentScore) { Console. The following code snippets show how to use actor events in your application. } public interface GameActor extends Actor. currentScore). CompletableFuture<String> getGameScore(). Actor events are designed for actor-to-client communication and should not be used for actor-to-actor communication. Parse(arg)). GameActor actorProxy = ActorProxyBase. new ActorId(UUID. GameEvents event = getEvent<GameEvents>(GameEvents. On the actor. ev.SubscribeAsync<TEvent> API.create<GameActor>(GameActor. } } On the client.GameScoreUpdated(Id. score). await proxy.out.fromString(args))).println("Updates: Game: "+gameId+" . score). use the ActorProxyEventExtensions. ApplicationName).gameScoreUpdated(Id. var proxy = ActorProxy.getUUIDId().GetGuidId(). If there are subscribers to the event.Create<IGameActor>( new ActorId(Guid. The actor proxy manages the active subscriptions and automatically re-subscribes them.SubscribeAsync<IGameEvents>(new GameEventsHandler()). event. 
Next steps Actor reentrancy Actor diagnostics and performance monitoring Actor API reference documentation C# Sample code C# . new GameEventsHandler()). var ev = GetEvent<IGameEvents>(). simply publish the events as they happen.subscribeAsync(actorProxy. create a proxy to the actor that publishes the event and subscribe to its events.Score: "+currentScore). To unsubscribe. In the event of failovers.class.class).UnsubscribeAsync<TEvent> API. the actor may fail over to a different process or node. class GameEventsHandler implements GameEvents { public void gameScoreUpdated(UUID gameId. the Actors runtime will send them the notification.NET Core Sample code Java Sample code . You can control the re-subscription interval through the ActorProxyEventExtensions. String currentScore) { System. return ActorProxyEventUtility. In this example. // Parameter to pass to the callback method TimeSpan. The APIs are very similar to the .. Actors can use the RegisterTimer (C#) or registerTimer (Java) and UnregisterTimer (C#) or unregisterTimer (Java) methods on their base class to register and unregister their timers.FromMilliseconds(15)). actorId) { } protected override Task OnActivateAsync() { . } protected override Task OnDeactivateAsync() { if (_updateTimer != null) { UnregisterTimer(_updateTimer). IVisualObject { private IActorTimer _updateTimer.OnDeactivateAsync(). the Actors runtime will call the MoveObject (C#) or moveObject (Java) method. The example below shows the use of timer APIs. } } . // Amount of time to delay before the callback is invoked TimeSpan. } private Task MoveObject(object state) { . when the timer is due.. // Time interval between invocations of the callback method return base.NET or Java timer to ensure that the callback methods respect the turn-based concurrency guarantees that the Actors runtime provides. } return base. return Task. ActorId actorId) : base(actorService.. _updateTimer = RegisterTimer( MoveObject. Actor timers Actor timers provide a simple wrapper around a .. This means that no other actor methods or timer/reminder callbacks will be in progress until this callback completes execution.FromMilliseconds(15). This article shows how to use timers and reminders and explains the differences between them.NET timer or Java timer. The method is guaranteed to respect the turn- based concurrency. // Callback method null. Actor timers and reminders 3/31/2017 • 5 min to read • Edit Online Actors can schedule periodic work on themselves by registering either timers or reminders. class VisualObjectActor : Actor.OnActivateAsync(). public VisualObjectActor(ActorService actorService.FromResult(true). stateManager(). v1).}). return null. }).setStateAsync(stateName. This implies that the timer is stopped while the callback is executing and is started when the callback finishes. actorId)..getId(). thenApply(r -> { . No timer callbacks are invoked . } } The next period of the timer starts after the callback completes execution. // Callback method "moveObject". } @Override protected CompletableFuture onActivateAsync() { .registerTimer( (o) -> this. public VisualObjectActorImpl(FabricActorService actorService.move(). All timers are stopped when the actor is deactivated as part of garbage collection.toString().getOrAddStateAsync( stateName.stateManager() . that actor object will be deactivated and a new instance will be activated. VisualObject. } @Override protected CompletableFuture onDeactivateAsync() { if (updateTimer != null) { unregisterTimer(updateTimer). 
The Actors runtime saves changes made to the actor's State Manager when the callback finishes.thenApply((r) -> { this.toString().stateName). return this. // Amount of time to delay before the callback is invoked Duration.thenCompose(v -> { VisualObject v1 = (VisualObject)v. })... new Random(this. } private CompletableFuture moveObject(Object state) { ..getId(). If an error occurs in saving the state. return this.hashCode()))) . ActorId actorId) { super(actorService.. // Time interval between invocations of the callback method return null.createRandom( this.getStateAsync(this.stateManager().moveObject(o). return (CompletableFuture<?>)this..onDeactivateAsync().ofMillis(10). v1. public class VisualObjectActorImpl extends FabricActor implements VisualObjectActor { private ActorTimer updateTimer. // Parameter to pass to the callback method Duration.ofMillis(timerIntervalInMilliSeconds)). null. } return super. after that. //The time interval between firing of reminders } In this example.GetBytes(amountInDollars) (C#) is the context that is associated with the reminder. It is up to the actor to register any timers that it needs when it is reactivated in the future. int amountInDollars = 100. But unlike timers. int amountInDollars = 100. Also. } @Override protected CompletableFuture onActivateAsync() { String reminderName = "Pay cell phone bill". For more information. IRemindable. //The amount of time to delay before firing the reminder period). BitConverter.FromDays(3). as shown in the example below.GetBytes(amountInDollars).e. "Pay cell phone bill" is the reminder name. It will be passed back to the actor as an argument to the reminder callback. reminders are triggered across actor deactivations and failovers because the Actors runtime persists information about the actor's reminders. ActorReminder reminderRegistration = this. an actor calls the RegisterReminderAsync method provided on the base class. as shown in the following example: protected override async Task OnActivateAsync() { string reminderName = "Pay cell phone bill". IActorReminder reminderRegistration = await this. To register a reminder. dueTime.registerReminderAsync( reminderName. .RegisterReminderAsync( reminderName. state. see the section on actor garbage collection. BitConverter. Specifically. Their functionality is similar to timers.ReceiveReminderAsync (C#) or Remindable. This is a string that the actor uses to uniquely identify a reminder. i.FromDays(1)). the Actors runtime does not retain any information about the timers that were running before deactivation. Actors that use reminders must implement the IRemindable interface. TimeSpan.receiveReminderAsync (Java). Actor reminders Reminders are a mechanism to trigger persistent callbacks on an actor at specified times. TimeSpan. reminders are triggered under all circumstances until the actor explicitly unregisters them or the actor is explicitly deleted. IToDoListActor. IActorReminder reminder = GetReminder("Pay cell phone bill").WriteLine("Please pay your cell phone bill of ${0}!". as shown in the examples below.completedFuture(true). byte[] context. an actor calls the UnregisterReminderAsync (C#) or unregisterReminderAsync (Java) method. Duration dueTime. } } public class ToDoListActorImpl extends FabricActor implements ToDoListActor. System. amountToPay).println("Please pay your cell phone bill of " + amountToPay). IRemindable { public ToDoListActor(ActorService actorService. public class ToDoListActor : Actor. An actor can register multiple reminders.wrap(context). 
ActorId actorId) : base(actorService. TimeSpan dueTime. and the ReceiveReminderAsync (C#) or receiveReminderAsync (Java) method is invoked when any of those reminders is triggered. the UnregisterReminderAsync (C#) or unregisterReminderAsync (Java) method accepts an IActorReminder (C#) or ActorReminder (Java) interface. } return CompletableFuture.getInt().Console. actorId) { } public Task ReceiveReminderAsync(string reminderName. The Actors runtime saves the actor's state when the ReceiveReminderAsync (C#) or receiveReminderAsync (Java) call finishes. Remindable { public ToDoListActor(FabricActorService actorService. To unregister a reminder.out. the Reliable Actors runtime will invoke the ReceiveReminderAsync (C#) or receiveReminderAsync (Java) method on the Actor.Equals("Pay cell phone bill")) { int amountToPay = BitConverter. CompletableFuture reminderUnregistration = unregisterReminderAsync(reminder). Duration period) { if (reminderName.equals("Pay cell phone bill")) { int amountToPay = ByteBuffer. } return Task. If an error occurs in saving the state. As shown above.FromResult(true). The actor can use the reminder name that is passed in to the ReceiveReminderAsync (C#) or receiveReminderAsync (Java) method to figure out which reminder was triggered. TimeSpan period) { if (reminderName. actorId). ActorId actorId) { super(actorService. } public CompletableFuture receiveReminderAsync(String reminderName. 0). System. ActorReminder reminder = getReminder("Pay cell phone bill"). byte[] context. that actor object will be deactivated and a new instance will be activated. } When a reminder is triggered. Task reminderUnregistration = UnregisterReminderAsync(reminder). The actor base class supports a GetReminder (C#) or getReminder (Java) method that can be used to retrieve the IActorReminder (C#) or ActorReminder (Java) interface .ToInt32(context. This is convenient because the actor does not need to persist the IActorReminder (C#) or ActorReminder (Java) interface that was returned from the RegisterReminder (C#) or registerReminder (Java) method call. Next Steps Actor events Actor reentrancy Actor diagnostics and performance monitoring Actor API reference documentation C# Sample code Java Sample code .by passing in the reminder name. State persistence and replication All Reliable Actors are considered stateful because each actor instance maps to a unique ID. Even though actors are considered stateful. state is not persisted to disk. actor services are always stateful services. when used on an actor. automatically selects a default state provider and automatically generates settings for replica count to achieve one of these three persistence settings. This is the most durable state storage option. Each level of persistence is simply a different state provider and replication configuration of your service. So if all replicas are lost at once. IMyActor { } @StatePersistenceAttribute(statePersistence = StatePersistence. This level is for actors that simply don't need to maintain state reliably. No persisted state: State is not replicated or written to disk. or when they are moved around between nodes in a cluster due to resource balancing or upgrades. and during upgrades and resource balancing. However. client calls are not guaranteed to be routed to the same server every time. upon reactivation after garbage collection. 
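As a brief addendum to the reminder registration shown above, a reminder can be made to fire only once by disabling its period. The C#-only sketch below assumes that, as with timers, a period of -1 milliseconds disables periodic invocation; the reminder name and payload are illustrative:

// Register a reminder that fires once, 10 minutes from now, and does not repeat.
IActorReminder reminder = await this.RegisterReminderAsync(
    "ProcessOrderOnce",                  // illustrative reminder name
    BitConverter.GetBytes(100),          // state handed back to ReceiveReminderAsync
    TimeSpan.FromMinutes(10),            // due time before the first (and only) invocation
    TimeSpan.FromMilliseconds(-1));      // period of -1 ms: do not repeat

// Inside ReceiveReminderAsync, clean up once the work is done:
await this.UnregisterReminderAsync(this.GetReminder("ProcessOrderOnce"));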
Reliable Actors state management 4/13/2017 • 8 min to read • Edit Online Reliable Actors are single-threaded objects that can encapsulate both logic and state.Persisted) class MyActorImpl extends FabricActor implements MyActor { } This setting uses a state provider that stores data on disk and automatically sets the service replica count to 3. Whether or not state is written to disk depends on the state provider--the component in a reliable service that stores state. the state is lost as well. This way. by contrast. Replication depends on how many replicas a service is deployed with. Actors can choose the level of state persistence and replication based on their data storage requirements: Persisted state: State is persisted to disk and is replicated to 3 or more replicas. Because actors run on Reliable Services. where state can persist through complete cluster outage. they can maintain state reliably by using the same persistence and replication mechanisms that Reliable Services uses. Volatile state .Persisted)] class MyActor : Actor. For this reason. The actor framework provides an attribute that. Volatile state: State is replicated to 3 or more replicas and only kept in memory. that does not mean they must store state reliably. Persisted state [StatePersistence(StatePersistence. This means that repeated calls to the same actor ID are routed to the same actor instance. both the state provider and replica count can easily be set manually. This provides resilience against node failure and actor failure. actors don't lose their state after failures. In a stateless system. As with Reliable Services. [StatePersistence(StatePersistence.Volatile)] class MyActor : Actor, IMyActor { } @StatePersistenceAttribute(statePersistence = StatePersistence.Volatile) class MyActorImpl extends FabricActor implements MyActor { } This setting uses an in-memory-only state provider and sets the replica count to 3. No persisted state [StatePersistence(StatePersistence.None)] class MyActor : Actor, IMyActor { } @StatePersistenceAttribute(statePersistence = StatePersistence.None) class MyActorImpl extends FabricActor implements MyActor { } This setting uses an in-memory-only state provider and sets the replica count to 1. Defaults and generated settings When you're using the StatePersistence attribute, a state provider is automatically selected for you at runtime when the actor service starts. The replica count, however, is set at compile time by the Visual Studio actor build tools. The build tools automatically generate a default service for the actor service in ApplicationManifest.xml. Parameters are created for min replica set size and target replica set size. You can change these parameters manually. But each time the StatePersistence attribute is changed, the parameters are set to the default replica set size values for the selected StatePersistence attribute, overriding any previous values. In other words, the values that you set in ServiceManifest.xml are only overridden at build time when you change the StatePersistence attribute value. 
<ApplicationManifest xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema- instance" ApplicationTypeName="Application12Type" ApplicationTypeVersion="1.0.0" xmlns="http://schemas.microsoft.com/2011/01/fabric"> <Parameters> <Parameter Name="MyActorService_PartitionCount" DefaultValue="10" /> <Parameter Name="MyActorService_MinReplicaSetSize" DefaultValue="3" /> <Parameter Name="MyActorService_TargetReplicaSetSize" DefaultValue="3" /> </Parameters> <ServiceManifestImport> <ServiceManifestRef ServiceManifestName="MyActorPkg" ServiceManifestVersion="1.0.0" /> </ServiceManifestImport> <DefaultServices> <Service Name="MyActorService" GeneratedIdRef="77d965dc-85fb-488c-bd06-c6c1fe29d593|Persisted"> <StatefulService ServiceTypeName="MyActorServiceType" TargetReplicaSetSize=" [MyActorService_TargetReplicaSetSize]" MinReplicaSetSize="[MyActorService_MinReplicaSetSize]"> <UniformInt64Partition PartitionCount="[MyActorService_PartitionCount]" LowKey="- 9223372036854775808" HighKey="9223372036854775807" /> </StatefulService> </Service> </DefaultServices> </ApplicationManifest> State manager Every actor instance has its own state manager: a dictionary-like data structure that reliably stores key/value pairs. The state manager is a wrapper around a state provider. You can use it to store data regardless of which persistence setting is used. It does not provide any guarantees that a running actor service can be changed from a volatile (in-memory-only) state setting to a persisted state setting through a rolling upgrade while preserving data. However, it is possible to change replica count for a running service. State manager keys must be strings. Values are generic and can be any type, including custom types. Values stored in the state manager must be data contract serializable because they might be transmitted over the network to other nodes during replication and might be written to disk, depending on an actor's state persistence setting. The state manager exposes common dictionary methods for managing state, similar to those found in Reliable Dictionary. Accessing state State can be accessed through the state manager by key. State manager methods are all asynchronous because they might require disk I/O when actors have persisted state. Upon first access, state objects are cached in memory. Repeat access operations access objects directly from memory and return synchronously without incurring disk I/O or asynchronous context-switching overhead. A state object is removed from the cache in the following cases: An actor method throws an unhandled exception after it retrieves an object from the state manager. An actor is reactivated, either after being deactivated or after failure. The state provider pages state to disk. This behavior depends on the state provider implementation. The default state provider for the Persisted setting has this behavior. 
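Besides the get, set, add, and remove operations described in the following sections, the state manager can also test for the existence of a key and enumerate all stored keys. A minimal C#-only sketch using the ContainsStateAsync and GetStateNamesAsync methods of the state manager:

public async Task<bool> HasStateAsync()
{
    // Check whether a named entry exists without loading its value.
    return await this.StateManager.ContainsStateAsync("MyState");
}

public async Task<IEnumerable<string>> ListStateNamesAsync()
{
    // Enumerate the names of all entries stored for this actor instance.
    return await this.StateManager.GetStateNamesAsync();
}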
You can retrieve state by using a standard Get operation that throws KeyNotFoundException (C#) or NoSuchElementException (Java) if an entry does not exist for the key: [StatePersistence(StatePersistence.Persisted)] class MyActor : Actor, IMyActor { public MyActor(ActorService actorService, ActorId actorId) : base(actorService, actorId) { } public Task<int> GetCountAsync() { return this.StateManager.GetStateAsync<int>("MyState"); } } @StatePersistenceAttribute(statePersistence = StatePersistence.Persisted) class MyActorImpl extends FabricActor implements MyActor { public MyActorImpl(ActorService actorService, ActorId actorId) { super(actorService, actorId); } public CompletableFuture<Integer> getCountAsync() { return this.stateManager().getStateAsync("MyState"); } } You can also retrieve state by using a TryGet method that does not throw if an entry does not exist for a key: class MyActor : Actor, IMyActor { public MyActor(ActorService actorService, ActorId actorId) : base(actorService, actorId) { } public async Task<int> GetCountAsync() { ConditionalValue<int> result = await this.StateManager.TryGetStateAsync<int>("MyState"); if (result.HasValue) { return result.Value; } return 0; } } class MyActorImpl extends FabricActor implements MyActor { public MyActorImpl(ActorService actorService, ActorId actorId) { super(actorService, actorId); } public CompletableFuture<Integer> getCountAsync() { return this.stateManager().<Integer>tryGetStateAsync("MyState").thenApply(result -> { if (result.hasValue()) { return result.getValue(); } else { return 0; }); } } Saving state The state manager retrieval methods return a reference to an object in local memory. Modifying this object in local memory alone does not cause it to be saved durably. When an object is retrieved from the state manager and modified, it must be reinserted into the state manager to be saved durably. You can insert state by using an unconditional Set, which is the equivalent of the dictionary["key"] = value syntax: [StatePersistence(StatePersistence.Persisted)] class MyActor : Actor, IMyActor { public MyActor(ActorService actorService, ActorId actorId) : base(actorService, actorId) { } public Task SetCountAsync(int value) { return this.StateManager.SetStateAsync<int>("MyState", value); } } @StatePersistenceAttribute(statePersistence = StatePersistence.Persisted) class MyActorImpl extends FabricActor implements MyActor { public MyActorImpl(ActorService actorService, ActorId actorId) { super(actorService, actorId); } public CompletableFuture setCountAsync(int value) { return this.stateManager().setStateAsync("MyState", value); } } You can add state by using an Add method. This method throws InvalidOperationException (C#) or IllegalStateException (Java) when it tries to add a key that already exists. [StatePersistence(StatePersistence.Persisted)] class MyActor : Actor, IMyActor { public MyActor(ActorService actorService, ActorId actorId) : base(actorService, actorId) { } public Task AddCountAsync(int value) { return this.StateManager.AddStateAsync<int>("MyState", value); } } @StatePersistenceAttribute(statePersistence = StatePersistence.Persisted) class MyActorImpl extends FabricActor implements MyActor { public MyActorImpl(ActorService actorService, ActorId actorId) { super(actorService, actorId); } public CompletableFuture addCountAsync(int value) { return this.stateManager().addOrUpdateStateAsync("MyState", value, (key, old_value) -> old_value + value); } } You can also add state by using a TryAdd method. 
This method does not throw when it tries to add a key that already exists. [StatePersistence(StatePersistence.Persisted)] class MyActor : Actor, IMyActor { public MyActor(ActorService actorService, ActorId actorId) : base(actorService, actorId) { } public async Task AddCountAsync(int value) { bool result = await this.StateManager.TryAddStateAsync<int>("MyState", value); if (result) { // Added successfully! } } } @StatePersistenceAttribute(statePersistence = StatePersistence.Persisted) class MyActorImpl extends FabricActor implements MyActor { public MyActorImpl(ActorService actorService, ActorId actorId) { super(actorService, actorId); } public CompletableFuture addCountAsync(int value) { return this.stateManager().tryAddStateAsync("MyState", value).thenApply((result)->{ if(result) { // Added successfully! } }); } } At the end of an actor method, the state manager automatically saves any values that have been added or modified by an insert or update operation. A "save" can include persisting to disk and replication, depending on the settings used. Values that have not been modified are not persisted or replicated. If no values have been modified, the save operation does nothing. If saving fails, the modified state is discarded and the original state is reloaded. You can also save state manually by calling the SaveStateAsync method on the actor base: async Task IMyActor.SetCountAsync(int count) { await this.StateManager.AddOrUpdateStateAsync("count", count, (key, value) => count > value ? count : value); await this.SaveStateAsync(); } interface MyActor { CompletableFuture setCountAsync(int count) { this.stateManager().addOrUpdateStateAsync("count", count, (key, value) -> count > value ? count : value).thenApply(); this.stateManager().saveStateAsync().thenApply(); } } Removing state You can remove state permanently from an actor's state manager by calling the Remove method. This method throws KeyNotFoundException (C#) or NoSuchElementException (Java) when it tries to remove a key that doesn't exist. [StatePersistence(StatePersistence.Persisted)] class MyActor : Actor, IMyActor { public MyActor(ActorService actorService, ActorId actorId) : base(actorService, actorId) { } public Task RemoveCountAsync() { return this.StateManager.RemoveStateAsync("MyState"); } } @StatePersistenceAttribute(statePersistence = StatePersistence.Persisted) class MyActorImpl extends FabricActor implements MyActor { public MyActorImpl(ActorService actorService, ActorId actorId) { super(actorService, actorId); } public CompletableFuture removeCountAsync() { return this.stateManager().removeStateAsync("MyState"); } } You can also remove state permanently by using the TryRemove method. This method does not throw when it tries to remove a key that does not exist. [StatePersistence(StatePersistence.Persisted)] class MyActor : Actor, IMyActor { public MyActor(ActorService actorService, ActorId actorId) : base(actorService, actorId) { } public async Task RemoveCountAsync() { bool result = await this.StateManager.TryRemoveStateAsync("MyState"); if (result) { // State removed! } } } @StatePersistenceAttribute(statePersistence = StatePersistence.Persisted) class MyActorImpl extends FabricActor implements MyActor { public MyActorImpl(ActorService actorService, ActorId actorId) { super(actorService, actorId); } public CompletableFuture removeCountAsync() { return this.stateManager().tryRemoveStateAsync("MyState").thenApply((result)->{ if(result) { // State removed! 
} }); } } Next steps Actor type serialization Actor polymorphism and object-oriented design patterns Actor diagnostics and performance monitoring Actor API reference documentation C# sample code Java sample code Configuring Reliable Actors--KVSActorStateProvider 2/9/2017 • 3 min to read • Edit Online You can modify the default configuration of KVSActorStateProvider by changing the settings.xml file that is generated in the Microsoft Visual Studio package root under the Config folder for the specified actor. The Azure Service Fabric runtime looks for predefined section names in the settings.xml file and consumes the configuration values while creating the underlying runtime components. NOTE Do not delete or modify the section names of the following configurations in the settings.xml file that is generated in the Visual Studio solution. Replicator security configuration Replicator security configurations are used to secure the communication channel that is used during replication. This means that services cannot see each other's replication traffic, ensuring that the data that is made highly available is also secure. By default, an empty security configuration section prevents replication security. Section name <ActorName>ServiceReplicatorSecurityConfig Replicator configuration Replicator configurations configure the replicator that is responsible for making the Actor State Provider state highly reliable. The default configuration is generated by the Visual Studio template and should suffice. This section talks about additional configurations that are available to tune the replicator. Section name <ActorName>ServiceReplicatorConfig Configuration names NAME UNIT DEFAULT VALUE REMARKS BatchAcknowledgementInter Seconds 0.015 Time period for which the val replicator at the secondary waits after receiving an operation before sending back an acknowledgement to the primary. Any other acknowledgements to be sent for operations processed within this interval are sent as one response. NAME UNIT DEFAULT VALUE REMARKS ReplicatorEndpoint N/A No default--required IP address and port that the parameter primary/secondary replicator will use to communicate with other replicators in the replica set. This should reference a TCP resource endpoint in the service manifest. Refer to Service manifest resources to read more about defining endpoint resources in the service manifest. RetryInterval Seconds 5 Time period after which the replicator re-transmits a message if it does not receive an acknowledgement for an operation. MaxReplicationMessageSize Bytes 50 MB Maximum size of replication data that can be transmitted in a single message. MaxPrimaryReplicationQueu Number of operations 1024 Maximum number of eSize operations in the primary queue. An operation is freed up after the primary replicator receives an acknowledgement from all the secondary replicators. This value must be greater than 64 and a power of 2. MaxSecondaryReplicationQu Number of operations 2048 Maximum number of eueSize operations in the secondary queue. An operation is freed up after making its state highly available through persistence. This value must be greater than 64 and a power of 2. Store configuration Store configurations are used to configure the local store that is used to persist the state that is being replicated. The default configuration is generated by the Visual Studio template and should suffice. This section talks about additional configurations that are available to tune the local store. 
Section name <ActorName>ServiceLocalStoreConfig Configuration names NAME UNIT DEFAULT VALUE REMARKS NAME UNIT DEFAULT VALUE REMARKS MaxAsyncCommitDelayInMil Milliseconds 200 Sets the maximum batching liseconds interval for durable local store commits. MaxVerPages Number of pages 16384 The maximum number of version pages in the local store database. It determines the maximum number of outstanding transactions. Sample configuration file <?xml version="1.0" encoding="utf-8"?> <Settings xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.microsoft.com/2011/01/fabric"> <Section Name="MyActorServiceReplicatorConfig"> <Parameter Name="ReplicatorEndpoint" Value="MyActorServiceReplicatorEndpoint" /> <Parameter Name="BatchAcknowledgementInterval" Value="0.05"/> </Section> <Section Name="MyActorServiceLocalStoreConfig"> <Parameter Name="MaxVerPages" Value="8192" /> </Section> <Section Name="MyActorServiceReplicatorSecurityConfig"> <Parameter Name="CredentialType" Value="X509" /> <Parameter Name="FindType" Value="FindByThumbprint" /> <Parameter Name="FindValue" Value="9d c9 06 b1 69 dc 4f af fd 16 97 ac 78 1e 80 67 90 74 9d 2f" /> <Parameter Name="StoreLocation" Value="LocalMachine" /> <Parameter Name="StoreName" Value="My" /> <Parameter Name="ProtectionLevel" Value="EncryptAndSign" /> <Parameter Name="AllowedCommonNames" Value="My-Test-SAN1-Alice,My-Test-SAN1-Bob" /> </Section> </Settings> Remarks The BatchAcknowledgementInterval parameter controls replication latency. A value of '0' results in the lowest possible latency, at the cost of throughput (as more acknowledgement messages must be sent and processed, each containing fewer acknowledgements). The larger the value for BatchAcknowledgementInterval, the higher the overall replication throughput, at the cost of higher operation latency. This directly translates to the latency of transaction commits. Notes on Service Fabric Reliable Actors type serialization 4/13/2017 • 1 min to read • Edit Online The arguments of all methods, result types of the tasks returned by each method in an actor interface, and objects stored in an actor's state manager must be data contract serializable. This also applies to the arguments of the methods defined in actor event interfaces. (Actor event interface methods always return void.) 
Custom data types In this example, the following actor interface defines a method that returns a custom data type called VoicemailBox : public interface IVoiceMailBoxActor : IActor { Task<VoicemailBox> GetMailBoxAsync(); } public interface VoiceMailBoxActor extends Actor { CompletableFuture<VoicemailBox> getMailBoxAsync(); } The interface is implemented by an actor that uses the state manager to store a VoicemailBox object: [StatePersistence(StatePersistence.Persisted)] public class VoiceMailBoxActor : Actor, IVoicemailBoxActor { public VoiceMailBoxActor(ActorService actorService, ActorId actorId) : base(actorService, actorId) { } public Task<VoicemailBox> GetMailboxAsync() { return this.StateManager.GetStateAsync<VoicemailBox>("Mailbox"); } } @StatePersistenceAttribute(statePersistence = StatePersistence.Persisted) public class VoiceMailBoxActorImpl extends FabricActor implements VoicemailBoxActor { public VoiceMailBoxActorImpl(ActorService actorService, ActorId actorId) { super(actorService, actorId); } public CompletableFuture<VoicemailBox> getMailBoxAsync() { return this.stateManager().getStateAsync("Mailbox"); } } In this example, the VoicemailBox object is serialized when: The object is transmitted between an actor instance and a caller. The object is saved in the state manager where it is persisted to disk and replicated to other nodes. The Reliable Actor framework uses DataContract serialization. Therefore, the custom data objects and their members must be annotated with the DataContract and DataMember attributes, respectively. [DataContract] public class Voicemail { [DataMember] public Guid Id { get; set; } [DataMember] public string Message { get; set; } [DataMember] public DateTime ReceivedAt { get; set; } } public class Voicemail implements Serializable { private static final long serialVersionUID = 42L; private UUID id; //getUUID() and setUUID() private String message; //getMessage() and setMessage() private GregorianCalendar receivedAt; //getReceivedAt() and setReceivedAt() } [DataContract] public class VoicemailBox { public VoicemailBox() { this.MessageList = new List<Voicemail>(); } [DataMember] public List<Voicemail> MessageList { get; set; } [DataMember] public string Greeting { get; set; } } public class VoicemailBox implements Serializable { static final long serialVersionUID = 42L; public VoicemailBox() { this.messageList = new ArrayList<Voicemail>(); } private List<Voicemail> messageList; //getMessageList() and setMessageList() private String greeting; //getGreeting() and setGreeting() } Next steps Actor lifecycle and garbage collection Actor timers and reminders Actor events Actor reentrancy Actor polymorphism and object-oriented design patterns Actor diagnostics and performance monitoring Configure FabricTransport settings for Reliable Actors 4/13/2017 • 1 min to read • Edit Online Here are the settings that you can configure: C#: FabricTansportSettings Java: FabricTransportRemotingSettings You can modify the default configuration of FabricTransport in following ways. Assembly attribute The FabricTransportActorRemotingProvider attribute needs to be applied on the actor client and actor service assemblies. 
The following example shows how to change the default value of FabricTransport OperationTimeout settings: using Microsoft.ServiceFabric.Actors.Remoting.FabricTransport; [assembly:FabricTransportActorRemotingProvider(OperationTimeoutInSeconds = 600)] The following example shows how to change the default values of FabricTransport MaxMessageSize and OperationTimeoutInSeconds: using Microsoft.ServiceFabric.Actors.Remoting.FabricTransport; [assembly:FabricTransportActorRemotingProvider(OperationTimeoutInSeconds = 600,MaxMessageSize = 134217728)] Config package You can use a config package to modify the default configuration. Configure FabricTransport settings for the actor service Add a TransportSettings section in the settings.xml file. By default, actor code looks for SectionName as "<ActorName>TransportSettings". If that's not found, it checks for SectionName as "TransportSettings". <Section Name="MyActorServiceTransportSettings"> <Parameter Name="MaxMessageSize" Value="10000000" /> <Parameter Name="OperationTimeoutInSeconds" Value="300" /> <Parameter Name="SecurityCredentialsType" Value="X509" /> <Parameter Name="CertificateFindType" Value="FindByThumbprint" /> <Parameter Name="CertificateFindValue" Value="4FEF3950642138446CC364A396E1E881DB76B48C" /> <Parameter Name="CertificateStoreLocation" Value="LocalMachine" /> <Parameter Name="CertificateStoreName" Value="My" /> <Parameter Name="CertificateProtectionLevel" Value="EncryptAndSign" /> <Parameter Name="CertificateRemoteCommonNames" Value="ServiceFabric-Test-Cert" /> </Section> Configure FabricTransport settings for the actor client assembly If the client is not running as part of a service, you can create a "<Client Exe Name>.settings.xml" file in the same location as the client .exe file. Then add a TransportSettings section in that file. SectionName should be "TransportSettings". <?xml version="1.0" encoding="utf-8"?> <Settings xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.microsoft.com/2011/01/fabric"> <Section Name="TransportSettings"> <Parameter Name="SecurityCredentialsType" Value="X509" /> <Parameter Name="OperationTimeoutInSeconds" Value="300" /> <Parameter Name="CertificateFindType" Value="FindByThumbprint" /> <Parameter Name="CertificateFindValue" Value="78 12 20 5a 39 d2 23 76 da a0 37 f0 5a ed e3 60 1a 7e 64 bf" /> <Parameter Name="OperationTimeoutInSeconds" Value="300" /> <Parameter Name="CertificateStoreLocation" Value="LocalMachine" /> <Parameter Name="CertificateStoreName" Value="My" /> <Parameter Name="CertificateProtectionLevel" Value="EncryptAndSign" /> <Parameter Name="CertificateRemoteCommonNames" Value="WinFabric-Test-SAN1-Alice" /> </Section> </Settings> xml file generated in the Visual Studio package root under the Config folder for the specified actor. .xml file that is generated in the Visual Studio solution. The cluster manifest is a single XML file that holds settings and configurations that apply to all nodes and services in the cluster.xml file and consumes the configuration values while creating the underlying runtime components. Note that changes in the cluster manifest affect all services that use ReliableDictionaryActorStateProvider and reliable stateful services. Configuration names NAME UNIT DEFAULT VALUE REMARKS WriteBufferMemoryPoolMini Kilobytes 8388608 Minimum number of KB to mumInKB allocate in kernel mode for the logger write buffer memory pool. 
NOTE Do not delete or modify the section names of the following configurations in the settings.xml. The Azure Service Fabric runtime looks for predefined section names in the settings. The file is typically called ClusterManifest. This memory pool is used for caching state information before writing to disk. Configuring Reliable Actors-- ReliableDictionaryActorStateProvider 2/9/2017 • 7 min to read • Edit Online You can modify the default configuration of ReliableDictionaryActorStateProvider by changing the settings. WriteBufferMemoryPoolMaxi Kilobytes No Limit Maximum size to which the mumInKB logger write buffer memory pool can grow. You can see the cluster manifest for your cluster using the Get-ServiceFabricClusterManifest powershell command. Global Configuration The global configuration is specified in the cluster manifest for the cluster under the KtlLogger section. There are also global settings that affect the configuration of ReliableDictionaryActorStateProvider. It allows configuration of the shared log location and size plus the global memory limits used by the logger. NAME UNIT DEFAULT VALUE REMARKS SharedLogId GUID "" Specifies a unique GUID to use for identifying the default shared log file used by all reliable services on all nodes in the cluster that do not specify the SharedLogId in their service specific configuration. If SharedLogId is specified, then SharedLogPath must also be specified. SharedLogPath Fully qualified path name "" Specifies the fully qualified path where the shared log file used by all reliable services on all nodes in the cluster that do not specify the SharedLogPath in their service specific configuration. However, if SharedLogPath is specified, then SharedLogId must also be specified. SharedLogSizeInMB Megabytes 8192 Specifies the number of MB of disk space to statically allocate for the shared log. The value must be 2048 or larger. Sample cluster manifest section <Section Name="KtlLogger"> <Parameter Name="WriteBufferMemoryPoolMinimumInKB" Value="8192" /> <Parameter Name="WriteBufferMemoryPoolMaximumInKB" Value="8192" /> <Parameter Name="SharedLogId" Value="{7668BB54-FE9C-48ed-81AC-FF89E60ED2EF}"/> <Parameter Name="SharedLogPath" Value="f:\SharedLog.Log"/> <Parameter Name="SharedLogSizeInMB" Value="16383"/> </Section> Remarks The logger has a global pool of memory allocated from non paged kernel memory that is available to all reliable services on a node for caching state data before being written to the dedicated log associated with the reliable service replica. The pool size is controlled by the WriteBufferMemoryPoolMinimumInKB and WriteBufferMemoryPoolMaximumInKB settings. WriteBufferMemoryPoolMinimumInKB specifies both the initial size of this memory pool and the lowest size to which the memory pool may shrink. WriteBufferMemoryPoolMaximumInKB is the highest size to which the memory pool may grow. Each reliable service replica that is opened may increase the size of the memory pool by a system determined amount up to WriteBufferMemoryPoolMaximumInKB. If there is more demand for memory from the memory pool than is available, requests for memory will be delayed until memory is available. Therefore if the write buffer memory pool is too small for a particular configuration then performance may suffer. The SharedLogId and SharedLogPath settings are always used together to define the GUID and location for the default shared log for all nodes in the cluster. 
The default shared log is used for all reliable services that do not specify the settings in the settings.xml for the specific service. For best performance, shared log files should be placed on disks that are used solely for the shared log file to reduce contention. SharedLogSizeInMB specifies the amount of disk space to preallocate for the default shared log on all nodes. SharedLogId and SharedLogPath do not need to be specified in order for SharedLogSizeInMB to be specified. Replicator security configuration Replicator security configurations are used to secure the communication channel that is used during replication. This means that services cannot see each other's replication traffic, ensuring the data that is made highly available is also secure. By default, an empty security configuration section prevents replication security. Section name <ActorName>ServiceReplicatorSecurityConfig Replicator configuration Replicator configurations are used to configure the replicator that is responsible for making the Actor State Provider state highly reliable by replicating and persisting the state locally. The default configuration is generated by the Visual Studio template and should suffice. This section talks about additional configurations that are available to tune the replicator. Section name <ActorName>ServiceReplicatorConfig Configuration names NAME UNIT DEFAULT VALUE REMARKS BatchAcknowledgementInter Seconds 0.015 Time period for which the val replicator at the secondary waits after receiving an operation before sending back an acknowledgement to the primary. Any other acknowledgements to be sent for operations processed within this interval are sent as one response. ReplicatorEndpoint N/A No default--required IP address and port that the parameter primary/secondary replicator will use to communicate with other replicators in the replica set. This should reference a TCP resource endpoint in the service manifest. Refer to Service manifest resources to read more about defining endpoint resources in service manifest. MaxReplicationMessageSize Bytes 50 MB Maximum size of replication data that can be transmitted in a single message. NAME UNIT DEFAULT VALUE REMARKS MaxPrimaryReplicationQueu Number of operations 8192 Maximum number of eSize operations in the primary queue. An operation is freed up after the primary replicator receives an acknowledgement from all the secondary replicators. This value must be greater than 64 and a power of 2. MaxSecondaryReplicationQu Number of operations 16384 Maximum number of eueSize operations in the secondary queue. An operation is freed up after making its state highly available through persistence. This value must be greater than 64 and a power of 2. CheckpointThresholdInMB MB 200 Amount of log file space after which the state is checkpointed. MaxRecordSizeInKB KB 1024 Largest record size that the replicator may write in the log. This value must be a multiple of 4 and greater than 16. OptimizeLogForLowerDiskUs Boolean true When true, the log is age configured so that the replica's dedicated log file is created by using an NTFS sparse file. This lowers the actual disk space usage for the file. When false, the file is created with fixed allocations, which provide the best write performance. SharedLogId guid "" Specifies a unique guid to use for identifying the shared log file used with this replica. Typically, services should not use this setting. However, if SharedLogId is specified, then SharedLogPath must also be specified. 
SharedLogPath Fully qualified path name "" Specifies the fully qualified path where the shared log file for this replica will be created. Typically, services should not use this setting. However, if SharedLogPath is specified, then SharedLogId must also be specified. Sample configuration file <?xml version="1.0" encoding="utf-8"?> <Settings xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://schemas.microsoft.com/2011/01/fabric"> <Section Name="MyActorServiceReplicatorConfig"> <Parameter Name="ReplicatorEndpoint" Value="MyActorServiceReplicatorEndpoint" /> <Parameter Name="BatchAcknowledgementInterval" Value="0.05"/> <Parameter Name="CheckpointThresholdInMB" Value="180" /> </Section> <Section Name="MyActorServiceReplicatorSecurityConfig"> <Parameter Name="CredentialType" Value="X509" /> <Parameter Name="FindType" Value="FindByThumbprint" /> <Parameter Name="FindValue" Value="9d c9 06 b1 69 dc 4f af fd 16 97 ac 78 1e 80 67 90 74 9d 2f" /> <Parameter Name="StoreLocation" Value="LocalMachine" /> <Parameter Name="StoreName" Value="My" /> <Parameter Name="ProtectionLevel" Value="EncryptAndSign" /> <Parameter Name="AllowedCommonNames" Value="My-Test-SAN1-Alice,My-Test-SAN1-Bob" /> </Section> </Settings> Remarks The BatchAcknowledgementInterval parameter controls replication latency. A value of '0' results in the lowest possible latency, at the cost of throughput (as more acknowledgement messages must be sent and processed, each containing fewer acknowledgements). The larger the value for BatchAcknowledgementInterval, the higher the overall replication throughput, at the cost of higher operation latency. This directly translates to the latency of transaction commits. The CheckpointThresholdInMB parameter controls the amount of disk space that the replicator can use to store state information in the replica's dedicated log file. Increasing this to a higher value than the default could result in faster reconfiguration times when a new replica is added to the set. This is due to the partial state transfer that takes place due to the availability of more history of operations in the log. This can potentially increase the recovery time of a replica after a crash. If you set OptimizeForLowerDiskUsage to true, log file space will be over-provisioned so that active replicas can store more state information in their log files, while inactive replicas will use less disk space. This makes it possible to host more replicas on a node. If you set OptimizeForLowerDiskUsage to false, the state information is written to the log files more quickly. The MaxRecordSizeInKB setting defines the maximum size of a record that can be written by the replicator into the log file. In most cases, the default 1024-KB record size is optimal. However, if the service is causing larger data items to be part of the state information, then this value might need to be increased. There is little benefit in making MaxRecordSizeInKB smaller than 1024, as smaller records use only the space needed for the smaller record. We expect that this value would need to be changed only in rare cases. The SharedLogId and SharedLogPath settings are always used together to make a service use a separate shared log from the default shared log for the node. For best efficiency, as many services as possible should specify the same shared log. Shared log files should be placed on disks that are used solely for the shared log file, to reduce head movement contention. 
We expect that these values would need to be changed only in rare cases. Learn about the differences between Cloud Services and Service Fabric before migrating applications. 2/13/2017 • 5 min to read • Edit Online Microsoft Azure Service Fabric is the next-generation cloud application platform for highly scalable, highly reliable distributed applications. It introduces many new features for packaging, deploying, upgrading, and managing distributed cloud applications. This is an introductory guide to migrating applications from Cloud Services to Service Fabric. It focuses primarily on architectural and design differences between Cloud Services and Service Fabric. Applications and infrastructure A fundamental difference between Cloud Services and Service Fabric is the relationship between VMs, workloads, and applications. A workload here is defined as the code you write to perform a specific task or provide a service. Cloud Services is about deploying applications as VMs. The code you write is tightly coupled to a VM instance, such as a Web or Worker Role. To deploy a workload in Cloud Services is to deploy one or more VM instances that run the workload. There is no separation of applications and VMs, and so there is no formal definition of an application. An application can be thought of as a set of Web or Worker Role instances within a Cloud Services deployment or as an entire Cloud Services deployment. In this example, an application is shown as a set of role instances. Service Fabric is about deploying applications to existing VMs or machines running Service Fabric on Windows or Linux. The services you write are completely decoupled from the underlying infrastructure, which is abstracted away by the Service Fabric application platform, so an application can be deployed to multiple environments. A workload in Service Fabric is called a "service," and one or more services are grouped in a formally-defined application that runs on the Service Fabric application platform. Multiple applications can be deployed to a single Service Fabric cluster. Service Fabric itself is an application platform layer that runs on Windows or Linux, whereas Cloud Services is a system for deploying Azure-managed VMs with workloads attached. The Service Fabric application model has a number of advantages: Fast deployment times. Creating VM instances can be time consuming. In Service Fabric, VMs are only deployed once to form a cluster that hosts the Service Fabric application platform. From that point on, application packages can be deployed to the cluster very quickly. High-density hosting. In Cloud Services, a Worker Role VM hosts one workload. In Service Fabric, applications are separate from the VMs that run them, meaning you can deploy a large number of applications to a small number of VMs, which can lower overall cost for larger deployments. The Service Fabric platform can run anywhere that has Windows Server or Linux machines, whether it's Azure or on-premises. The platform provides an abstraction layer over the underlying infrastructure so your application can run on different environments. Distributed application management. Service Fabric is a platform that not only hosts distributed applications, but also helps manage their lifecycle independently of the hosting VM or machine lifecycle. 
Application architecture The architecture of a Cloud Services application usually includes numerous external service dependencies, such as Service Bus, Azure Table and Blob Storage, SQL, Redis, and others to manage the state and data of an application and communication between Web and Worker Roles in a Cloud Services deployment. An example of a complete Cloud Services application might look like this: Service Fabric applications can also choose to use the same external services in a complete application. Using this example Cloud Services architecture, the simplest migration path from Cloud Services to Service Fabric is to replace only the Cloud Services deployment with a Service Fabric application, keeping the overall architecture the same. The Web and Worker Roles can be ported to Service Fabric stateless services with minimal code changes. At this stage, the system should continue to work the same as before. Taking advantage of Service Fabric's stateful features, external state stores can be internalized as stateful services where applicable. This is more involved than a simple migration of Web and Worker Roles to Service Fabric stateless services, as it requires writing custom services that provide equivalent functionality to your application as the external services did before. The benefits of doing so include: Removing external dependencies Unifying the deployment, management, and upgrade models. An example resulting architecture of internalizing these services could look like this: Communication and workflow Most Cloud Service applications consist of more than one tier. Similarly, a Service Fabric application consists of more than one service (typically many services). Two common communication models are direct communication and via an external durable storage. Direct communication With direct communication, tiers can communicate directly through endpoint exposed by each tier. In stateless environments such as Cloud Services, this means selecting an instance of a VM role, either randomly or round- robin to balance load, and connecting to its endpoint directly. Direct communication is a common communication model in Service Fabric. The key difference between Service Fabric and Cloud Services is that in Cloud Services you connect to a VM, whereas in Service Fabric you connect to a service. This is an important distinction for a couple reasons: Services in Service Fabric are not bound to the VMs that host them; services may move around in the cluster, and in fact, are expected to move around for various reasons: Resource balancing, failover, application and infrastructure upgrades, and placement or load constraints. This means a service instance's address can change at any time. A VM in Service Fabric can host multiple services, each with unique endpoints. Service Fabric provides a service discovery mechanism, called the Naming Service, which can be used to resolve endpoint addresses of services. Queues A common communication mechanism between tiers in stateless environments such as Cloud Services is to use an external storage queue to durably store work tasks from one tier to another. A common scenario is a web tier that sends jobs to an Azure Queue or Service Bus where Worker Role instances can dequeue and process the jobs. The same communication model can be used in Service Fabric. This can be useful when migrating an existing Cloud Services application to Service Fabric. 
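For the queue-based model, a Worker Role's dequeue loop maps naturally onto a stateless service's RunAsync method. The sketch below is C#-only and assumes the WindowsAzure.Storage SDK; the queue name and connection string are placeholders that would normally come from the service's Config package:

using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Queue;

protected override async Task RunAsync(CancellationToken cancellationToken)
{
    // Placeholder connection string; read it from Settings.xml in a real service.
    CloudStorageAccount account = CloudStorageAccount.Parse("<storage-connection-string>");
    CloudQueue queue = account.CreateCloudQueueClient().GetQueueReference("jobs");

    while (!cancellationToken.IsCancellationRequested)
    {
        CloudQueueMessage message = await queue.GetMessageAsync();
        if (message != null)
        {
            // Process the job here, then delete the message so it is not redelivered.
            await queue.DeleteMessageAsync(message);
        }
        else
        {
            // Back off briefly when the queue is empty.
            await Task.Delay(TimeSpan.FromSeconds(5), cancellationToken);
        }
    }
}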
Next Steps The simplest migration path from Cloud Services to Service Fabric is to replace only the Cloud Services deployment with a Service Fabric application, keeping the overall architecture of your application roughly the same. The following article provides a guide to help convert a Web or Worker Role to a Service Fabric stateless service. Simple migration: convert a Web or Worker Role to a Service Fabric stateless service Guide to converting Web and Worker Roles to Service Fabric stateless services 2/13/2017 • 8 min to read • Edit Online This article describes how to migrate your Cloud Services Web and Worker Roles to Service Fabric stateless services. This is the simplest migration path from Cloud Services to Service Fabric for applications whose overall architecture is going to stay roughly the same. Cloud Service project to Service Fabric application project A Cloud Service project and a Service Fabric Application project have a similar structure and both represent the deployment unit for your application - that is, they each define the complete package that is deployed to run your application. A Cloud Service project contains one or more Web or Worker Roles. Similarly, a Service Fabric Application project contains one or more services. The difference is that the Cloud Service project couples the application deployment with a VM deployment and thus contains VM configuration settings in it, whereas the Service Fabric Application project only defines an application that will be deployed to a set of existing VMs in a Service Fabric cluster. The Service Fabric cluster itself is only deployed once, either through an ARM template or through the Azure Portal, and multiple Service Fabric applications can be deployed to it. Worker Role to stateless service Conceptually, a Worker Role represents a stateless workload, meaning every instance of the workload is identical and requests can be routed to any instance at any time. Each instance is not expected to remember the previous request. State that the workload operates on is managed by an external state store, such as Azure Table Storage or Azure Document DB. In Service Fabric, this type of workload is represented by a Stateless Service. The simplest approach to migrating a Worker Role to Service Fabric can be done by converting Worker Role code to a Stateless Service. Web Role to stateless service Similar to Worker Role, a Web Role also represents a stateless workload, and so conceptually it too can be mapped to a Service Fabric stateless service. However, unlike Web Roles, Service Fabric does not support IIS. To migrate a web application from a Web Role to a stateless service requires first moving to a web framework that can be self- hosted and does not depend on IIS or System.Web, such as ASP.NET Core 1. 
APPLICATION          SUPPORTED         MIGRATION PATH
ASP.NET Web Forms    No                Convert to ASP.NET Core 1 MVC
ASP.NET MVC          With migration    Upgrade to ASP.NET Core 1 MVC
ASP.NET Web API      With migration    Use a self-hosted server or ASP.NET Core 1
ASP.NET Core 1       Yes               N/A

Entry point API and lifecycle
Worker Role and Service Fabric service APIs offer similar entry points:

ENTRY POINT                          WORKER ROLE    SERVICE FABRIC SERVICE
Processing                           Run()          RunAsync()
VM start                             OnStart()      N/A
VM stop                              OnStop()       N/A
Open listener for client requests    N/A            CreateServiceInstanceListeners() for stateless services, CreateServiceReplicaListeners() for stateful services

Worker Role

using Microsoft.WindowsAzure.ServiceRuntime;

namespace WorkerRole1
{
    public class WorkerRole : RoleEntryPoint
    {
        public override void Run() { }
        public override bool OnStart() { }
        public override void OnStop() { }
    }
}

Service Fabric stateless service

using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Services.Communication.Runtime;
using Microsoft.ServiceFabric.Services.Runtime;

namespace Stateless1
{
    public class Stateless1 : StatelessService
    {
        protected override IEnumerable<ServiceInstanceListener> CreateServiceInstanceListeners() { }
        protected override Task RunAsync(CancellationToken cancelServiceInstance) { }
    }
}

Both have a primary "Run" override in which to begin processing. Service Fabric services combine Run, Start, and Stop into a single entry point, RunAsync. Your service should begin working when RunAsync starts and should stop working when the RunAsync method's CancellationToken is signaled.

There are several key differences between the lifecycle and lifetime of Worker Roles and Service Fabric services:

Lifecycle: The biggest difference is that a Worker Role is a VM, so its lifecycle is tied to the VM, including events for when the VM starts and stops. A Service Fabric service has a lifecycle that is separate from the VM lifecycle, so it does not include events for when the host VM or machine starts and stops, as the two are not related.

Lifetime: A Worker Role instance is recycled if the Run method exits. The RunAsync method in a Service Fabric service, however, can run to completion, and the service instance stays up. Service Fabric provides an optional communication setup entry point for services that listen for client requests. Both the RunAsync and communication entry points are optional overrides in Service Fabric services: a service may choose to only listen for client requests, only run a processing loop, or both. This is why RunAsync is allowed to exit without restarting the service instance; the instance may still be listening for client requests.
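To make the RunAsync lifetime concrete, here is a minimal C#-only sketch of a processing loop that honors the cancellation token; DoWorkAsync is a placeholder for whatever the Worker Role's Run loop used to do:

protected override async Task RunAsync(CancellationToken cancellationToken)
{
    // Keep working until Service Fabric signals that this instance should stop.
    while (!cancellationToken.IsCancellationRequested)
    {
        await DoWorkAsync(cancellationToken);   // placeholder for the service's actual work

        // Task.Delay throws promptly if cancellation is signaled while waiting.
        await Task.Delay(TimeSpan.FromSeconds(1), cancellationToken);
    }
}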
Application API and environment
The Cloud Services environment API provides information and functionality for the current VM instance as well as information about other VM role instances. Service Fabric provides information related to its runtime and some information about the node a service is currently running on.

ENVIRONMENT TASK                                  CLOUD SERVICES                                                                                        SERVICE FABRIC
Configuration settings and change notification    RoleEnvironment                                                                                       CodePackageActivationContext
Local storage                                     RoleEnvironment                                                                                       CodePackageActivationContext
Endpoint information                              Current instance: RoleEnvironment.CurrentRoleInstance; other roles and instances: RoleEnvironment.Roles    NodeContext for the current node; FabricClient and ServicePartitionResolver for service endpoint discovery
Environment emulation                             RoleEnvironment.IsEmulated                                                                            N/A
Simultaneous change event                         RoleEnvironment                                                                                       N/A

Configuration settings
Configuration settings in Cloud Services are set for a VM role and apply to all instances of that VM role. These settings are key-value pairs set in ServiceConfiguration.*.cscfg files and can be accessed directly through RoleEnvironment. In Service Fabric, settings apply individually to each service and to each application, rather than to a VM, because a VM can host multiple services and applications. A service is composed of three packages:

Code: contains the service executables, binaries, DLLs, and any other files a service needs to run.
Config: all configuration files and settings for a service.
Data: static data files associated with the service.

Each of these packages can be independently versioned and upgraded. Similar to Cloud Services, a config package can be accessed programmatically through an API, and events are available to notify the service of a config package change. A Settings.xml file can be used for key-value configuration and programmatic access similar to the app settings section of an App.config file. However, unlike Cloud Services, a Service Fabric config package can contain any configuration files in any format, whether it's XML, JSON, YAML, or a custom binary format.

Accessing configuration
Cloud Services
Configuration settings from ServiceConfiguration.*.cscfg can be accessed through RoleEnvironment. These settings are globally available to all role instances in the same Cloud Service deployment.

string value = RoleEnvironment.GetConfigurationSettingValue("Key");

Service Fabric
Each service has its own individual configuration package. There is no built-in mechanism for global configuration settings accessible by all applications in a cluster. When using Service Fabric's special Settings.xml configuration file within a configuration package, values in Settings.xml can be overwritten at the application level, making application-level configuration settings possible. Configuration settings are accessed within each service instance through the service's CodePackageActivationContext.

ConfigurationPackage configPackage =
    this.Context.CodePackageActivationContext.GetConfigurationPackageObject("Config");

// Access Settings.xml
KeyedCollection<string, ConfigurationProperty> parameters =
    configPackage.Settings.Sections["MyConfigSection"].Parameters;
string value = parameters["Key"]?.Value;

// Access a custom configuration file:
using (StreamReader reader = new StreamReader(Path.Combine(configPackage.Path, "CustomConfig.json")))
{
    MySettings settings = JsonConvert.DeserializeObject<MySettings>(reader.ReadToEnd());
}

Configuration update events
Cloud Services
The RoleEnvironment.Changed event is used to notify all role instances when a change occurs in the environment, such as a configuration change. It lets you consume configuration updates without recycling role instances or restarting a worker process.
RoleEnvironment.Changed += RoleEnvironmentChanged; private void RoleEnvironmentChanged(object sender, RoleEnvironmentChangedEventArgs e) { // Get the list of configuration changes var settingChanges = e.Changes.OfType<RoleEnvironmentConfigurationSettingChange>(); foreach (var settingChange in settingChanges) { Trace.WriteLine("Setting: " + settingChange.ConfigurationSettingName, "Information"); } } Service Fabric Each of the three package types in a service - Code, Config, and Data - have events that notify a service instance when a package is updated, added, or removed. A service can contain multiple packages of each type. For example, a service may have multiple config packages, each individually versioned and upgradeable. These events are available to consume changes in service packages without restarting the service instance. this.Context.CodePackageActivationContext.ConfigurationPackageModifiedEvent += this.CodePackageActivationContext_ConfigurationPackageModifiedEvent; private void CodePackageActivationContext_ConfigurationPackageModifiedEvent(object sender, PackageModifiedEventArgs<ConfigurationPackage> e) { this.UpdateCustomConfig(e.NewPackage.Path); this.UpdateSettings(e.NewPackage.Settings); } Startup tasks Startup tasks are actions that are taken before an application starts. A startup task is typically used to run setup scripts using elevated privileges. Both Cloud Services and Service Fabric support start-up tasks. The main difference is that in Cloud Services, a startup task is tied to a VM because it is part of a role instance, whereas in Service Fabric a startup task is tied to a service, which is not tied to any particular VM. CLOUD SERVICES SERVICE FABRIC Configuration location ServiceDefinition.csdef Privileges "limited" or "elevated" Sequencing "simple", "background", "foreground" Cloud Services In Cloud Services a startup entry point is configured per role in ServiceDefinition.csdef. <ServiceDefinition> <Startup> <Task commandLine="Startup.cmd" executionContext="limited" taskType="simple" > <Environment> <Variable name="MyVersionNumber" value="1.0.0.0" /> </Environment> </Task> </Startup> ... </ServiceDefinition> Service Fabric In Service Fabric a startup entry point is configured per service in ServiceManifest.xml: <ServiceManifest> <CodePackage Name="Code" Version="1.0.0"> <SetupEntryPoint> <ExeHost> <Program>Startup.bat</Program> </ExeHost> </SetupEntryPoint> ... </ServiceManifest> A note about development environment Both Cloud Services and Service Fabric are integrated with Visual Studio with project templates and support for debugging, configuring, and deploying both locally and to Azure. Both Cloud Services and Service Fabric also provide a local development runtime environment. The difference is that while the Cloud Service development runtime emulates the Azure environment on which it runs, Service Fabric does not use an emulator - it uses the complete Service Fabric runtime. The Service Fabric environment you run on your local development machine is the same environment that runs in production. Next steps Read more about Service Fabric Reliable Services and the fundamental differences between Cloud Services and Service Fabric application architecture to understand how to take advantage of the full set of Service Fabric features. Getting started with Service Fabric Reliable Services Conceptual guide to the differences between Cloud Services and Service Fabric . 
Create Service Fabric clusters on Windows Server or Linux 3/8/2017 • 3 min to read • Edit Online

Azure Service Fabric allows for the creation of Service Fabric clusters on any VMs or computers running Windows Server or Linux. This means you are able to deploy and run Service Fabric applications in any environment where you have a set of Windows Server or Linux computers that are interconnected, be it on-premises, in Microsoft Azure, or with any cloud provider.

Create Service Fabric clusters on Azure
Creating a cluster on Azure is done either via a Resource Manager template or the Azure portal. Read Create a Service Fabric cluster by using a Resource Manager template or Create a Service Fabric cluster from the Azure portal for more information.

Supported operating systems for clusters on Azure
You are able to create clusters on VMs running these operating systems: Windows Server 2012 R2, Windows Server 2016, and Linux Ubuntu 16.04 (in public preview).

Create Service Fabric standalone clusters on-premises or with any cloud provider
Service Fabric provides an install package for you to create standalone Service Fabric clusters on-premises or on any cloud provider. For more information on setting up standalone Service Fabric clusters on Windows Server, read Service Fabric cluster creation for Windows Server.

Any cloud deployments vs. on-premises deployments
The process for creating a Service Fabric cluster on-premises is similar to the process of creating a cluster on any cloud of your choice with a set of VMs. The initial steps to provision the VMs are governed by the cloud provider or on-premises environment that you are using. Once you have a set of VMs with network connectivity enabled between them, the steps to set up the Service Fabric package, edit the cluster settings, and run the cluster creation and management scripts are identical. This ensures that your knowledge and experience of operating and managing Service Fabric clusters is transferable when you choose to target new hosting environments.

Benefits of creating standalone Service Fabric clusters
You are free to choose any cloud provider to host your cluster.
Service Fabric applications, once written, can be run in multiple hosting environments with minimal to no changes.
Knowledge of building Service Fabric applications carries over from one hosting environment to another.
Operational experience of running and managing Service Fabric clusters carries over from one environment to another.
Broad customer reach is unbounded by hosting environment constraints.
An extra layer of reliability and protection against widespread outages exists because you can move the services over to another deployment environment if a data center or cloud provider has a blackout.

On Azure, we provide integration with other Azure features and services, which brings some additional conveniences:
Azure portal: The Azure portal makes it easy to create and manage clusters.
Azure Resource Manager: Use of Azure Resource Manager allows easy management of all resources used by the cluster as a unit and simplifies cost tracking and billing.
Service Fabric cluster as an Azure resource: A Service Fabric cluster is an ARM resource, so you can model it like you do other ARM resources in Azure.
Diagnostics: On Azure, we provide integration with Azure diagnostics and Log Analytics.
Auto-scaling: For clusters on Azure, we provide built-in auto-scaling functionality due to Virtual Machine scale sets. In on-premises and other cloud environments, you have to build your own auto-scaling feature or scale manually using the APIs that Service Fabric exposes for scaling clusters.
Integration with Azure Infrastructure Service Fabric coordinates with the underlying Azure infrastructure for OS. so if you don't have specific needs for where you run your clusters. network. Next steps Create a cluster on VMs or computers running Windows Server: Service Fabric cluster creation for Windows Server Create a cluster on VMs or computers running Linux: Service Fabric on Linux Learn about Service Fabric support options . we provide integration with Azure diagnostics and Log Analytics. and other upgrades to improve availability and reliability of your applications. Supported operating systems for standalone clusters You are able to create clusters on VMs or computers running these operating systems: Windows Server 2012 R2 Windows Server 2016 Linux ( coming soon) Advantages of Service Fabric clusters on Azure over standalone Service Fabric clusters created on-premises Running Service Fabric clusters on Azure provides advantages over the on-premises option. then we suggest that you run them on Azure. which makes operations and management of the cluster easier and more reliable. In the Azure environment Service Fabric uses the Fault Domain information provided by the environment to correctly configure the nodes in the cluster on your behalf. as are those machines sharing a single source of power. When you set up your own cluster. you need to think through these different areas of failure. Since it's natural for hardware faults to overlap. Describing a service fabric cluster 1/17/2017 • 22 min to read • Edit Online The Service Fabric Cluster Resource Manager provides several mechanisms for describing a cluster. In this example. Fault Domains are inherently hierarchal and are represented as URIs in Service Fabric. if each blade holds more than one virtual machine. racks ("R"). A single machine is a Fault Domain (since it can fail on its own for various reasons. While enforcing these important rules. the Cluster Resource Manager uses this information to ensure high availability of the services running in the cluster. During runtime. we have datacenters ("DC"). Conceivably. it also attempts to optimize the cluster's resource consumption. Service Fabric doesn't want to place services such that the loss of a Fault Domain (caused by the failure of some component) causes services to go down. from power supply failures to drive failures to bad NIC firmware). Machines connected to the same Ethernet switch are in the same Fault Domain. It is important that Fault Domains are set up correctly since Service Fabric uses this information to safely place services. and blades ("B"). there could be another layer in the Fault Domain hierarchy. the Service Fabric Cluster Resource Manager considers the Fault Domains in the cluster and plans a . In the graphic below we color all the entities that contribute to Fault Domains and list all the different Fault Domains that result. Key concepts The Cluster Resource Manager supports several features that describe a cluster: Fault Domains Upgrade Domains Node Properties Node Capacities Fault domains A Fault Domain is any area of coordinated failure. During runtime. it is best if there are the same number of nodes at each level of depth in the Fault Domain hierarchy. Keeping the levels balanced prevents one portion of the hierarchy from containing more services than others. In Azure the choice of which Fault Domain contains a node is managed for you. 
depending on the number of nodes that you provision you can still end up with Fault Domains with more nodes in them than others. Upgrade Domains define sets of nodes that are upgraded at the same time. Upgrade Domains are defined by policy. The following diagram shows three Upgrade Domains are striped across three Fault Domains. If you ever stand up your own cluster on-premise or in another environment. However. but with a couple key differences. If the “tree” of Fault Domains is unbalanced in your cluster. it’s something you have to think about. In this case the first two Fault Domains end up with more nodes. It also shows one possible placement for three different replicas of a stateful service. where each ends up in different Fault and Upgrade Domains. say you have five Fault Domains but provision seven nodes for a given NodeType. it tries to ensure that the loss of any one portion of the hierarchy doesn’t impact the services running on top of it. First.layout. Service Fabric’s Cluster Resource Manager doesn’t care how many layers there are in the Fault Domain hierarchy. the problem gets worse. It attempts to spread out the stateful replicas or stateless instances for a given service so that they are in separate Fault Domains. it makes it harder for the Cluster Resource Manager to figure the best allocation of services. For example. while Fault Domains are rigorously defined by the areas of coordinated hardware failures. the Cluster Resource Manager is torn between its two goals: It was to use the machines in that “heavy” domain by placing services on them. In the other one Fault Domain ends up with many more nodes. you get to decide how many you want rather than it being dictated by the environment. As a result. Imbalanced Fault Domains layouts mean that the loss of a particular domain can impact the availability of the cluster more than others. Upgrade domains Upgrade Domains are another feature that helps the Service Fabric Cluster Resource Manager understand the layout of the cluster so that it can plan ahead for failures. Doing otherwise would contribute to imbalances in the load of individual nodes and make the failure of certain domains more critical than others. In the first example the nodes are distributed evenly across the Fault Domains. Another difference is that (today at least) Upgrade Domains are not hierarchical – they are more like a simple tag. we show two different example cluster layouts. . If you continue to deploy more NodeTypes with only a couple instances. However. and it wants to place services so that the loss of a domain doesn’t cause problems. that the availability of that service is not compromised. This process helps ensure that if there is a failure of any one Fault Domain (at any level in the hierarchy). Upgrade Domains are a lot like Fault Domains. This placement allows us to lose a Fault Domain while in the middle of a service upgrade and still have one copy of the code and data. What does this look like? In the diagram above. With Upgrade Domains. Because of this. introducing less churn into the system. The tradeoff is acceptable because it prevents bad changes from affecting too much of the service at a time. More Upgrade Domains means less overhead that must be maintained on the other nodes in the cluster. the nodes in each are handling roughly 20% of your traffic. that load needs to go somewhere.There are pros and cons to having large numbers of Upgrade Domains. 
Common structures that we’ve seen are: Fault Domains and Upgrade Domains mapped 1:1 One Upgrade Domain per Node (physical or virtual OS instance) A “striped” or “matrix” model where the Fault Domains and Upgrade Domains form a matrix with machines usually running down the diagonals . If you need to take down that Upgrade Domain for an upgrade. or constraints on how they overlap. Having so much of your service down at once isn’t desirable since you have to have enough capacity in the rest of your cluster to handle the workload. With more Upgrade Domains each step of the upgrade is more granular and therefore affects a smaller number of nodes or services. For example. The downside of having many Upgrade Domains is that upgrades tend to take longer. There’s no real limit to the total number of fault or Upgrade Domains in an environment. This delay is so that issues introduced by the upgrade have a chance to show up and be detected. if you only have three Upgrade Domains you are taking down about 1/3 of your overall service or cluster capacity at a time. This tends to also improve reliability (since less of the service is impacted by any issue introduced during the upgrade). For example. if you have five Upgrade Domains. Too few Upgrade Domains has its own side effects – while each individual Upgrade Domain is down and being upgraded a large portion of your overall capacity is unavailable. This results in fewer services having to move at a time. Maintaining that buffer means that in the normal case those nodes are less-loaded than they would otherwise be. increasing the cost of running your service. More Upgrade Domains also means that you need less available overhead on other nodes to handle the impact of the upgrade. Service Fabric waits a short period of time after an Upgrade Domain is completed before proceeding. N6 will never be used no matter how many services you create. shown in the bottom right option of the image above. Let's say that we have a cluster with six nodes. But why? Let's look at the difference between the current layout and what would happen if N6 is chosen. each has some pros and cons. Put differently. almost everything ends up looking like the dense matrix pattern. the 1FD:1UD model is fairly simple to set up. Whether this ends up sparse or packed depends on the total number of nodes compared to the number of FDs and UDs. The replicas land on N1-N5. Here's the layout we got and the total number of replicas per Fault and Upgrade Domain: . configured with five Fault Domains and five Upgrade Domains. The 1 UD per Node model is most like what people are used to from managing small sets of machines in the past where each would be taken down independently. where the FDs and UDs form a table and nodes are placed starting along the diagonal." Practically what this means is that for a given service certain moves or arrangements might not be valid. The Fault and Upgrade Domain constraints state: "For a given service partition there should never be a difference greater than one in the number of service objects (stateless service instances or stateful service replicas) between two domains. because they would violate the Fault or Upgrade Domain constraints. You can find out more about constraints in this article. The most common model (and the one used in Azure) is the FD/UD matrix. for sufficiently large clusters.There’s no best answer which layout to choose. For example. 
Fault and Upgrade Domain constraints and resulting behavior The Cluster Resource Manager treats the desire to keep a service balanced across fault and Upgrade Domains as a constraint. FD0 FD1 FD2 FD3 FD4 UD0 N1 UD1 N6 N2 UD2 N3 UD3 N4 UD4 N5 Now let's say that we create a service with a TargetReplicaSetSize of five. In fact. Let's look at one example. while FD1 has zero. The Cluster Resource Manager does not allow this arrangement. let's look at what would happen if we'd used N6 instead of N2. It is also balanced in terms of the number of replicas per Fault and Upgrade Domain. Now. How would the replicas be distributed then? FD0 FD1 FD2 FD3 FD4 UDTOTAL UD0 R1 1 UD1 R5 1 UD2 R2 1 UD3 R3 1 UD4 R4 1 FDTotal 2 0 1 1 1 - Notice anything? This layout violates our definition for the Fault Domain constraint. making the difference between FD0 and FD1 a total of two. FD0 has two replicas. it is now violating the Upgrade Domain constraint (since . Each domain has the same number of nodes and the same number of replicas. Similarly if we picked N2 and N6 (instead of N1 and N2) we'd get: FD0 FD1 FD2 FD3 FD4 UDTOTAL UD0 0 UD1 R5 R1 2 UD2 R2 1 UD3 R3 1 UD4 R4 1 FDTotal 1 1 1 1 1 - While this layout is balanced in terms of Fault Domains. FD0 FD1 FD2 FD3 FD4 UDTOTAL UD0 R1 1 UD1 R2 1 UD2 R3 1 UD3 R4 1 UD4 R5 1 FDTotal 1 1 1 1 1 - This layout is balanced in terms of nodes per Fault Domain and Upgrade Domain. UD0 has zero replicas while UD1 has two). In this example. it looks something like this: ClusterManifest. In the cluster manifest template. you provide the Fault Domain and Upgrade Domain information yourself. Service Fabric picks up and uses the environment information from Azure.xml <Infrastructure> <!-. If you’re creating your own cluster (or want to run a particular topology in development).IsScaleMin indicates that this cluster runs on one-box /one single server --> <WindowsServer IsScaleMin="true"> <NodeList> <Node NodeName="Node01" IPAddressOrFQDN="localhost" NodeTypeRef="NodeType01" FaultDomain="fd:/DC01/Rack01" UpgradeDomain="UpgradeDomain1" IsSeedNode="true" /> <Node NodeName="Node02" IPAddressOrFQDN="localhost" NodeTypeRef="NodeType02" FaultDomain="fd:/DC01/Rack02" UpgradeDomain="UpgradeDomain2" IsSeedNode="true" /> <Node NodeName="Node03" IPAddressOrFQDN="localhost" NodeTypeRef="NodeType03" FaultDomain="fd:/DC01/Rack03" UpgradeDomain="UpgradeDomain3" IsSeedNode="true" /> <Node NodeName="Node04" IPAddressOrFQDN="localhost" NodeTypeRef="NodeType04" FaultDomain="fd:/DC02/Rack01" UpgradeDomain="UpgradeDomain1" IsSeedNode="true" /> <Node NodeName="Node05" IPAddressOrFQDN="localhost" NodeTypeRef="NodeType05" FaultDomain="fd:/DC02/Rack02" UpgradeDomain="UpgradeDomain2" IsSeedNode="true" /> <Node NodeName="Node06" IPAddressOrFQDN="localhost" NodeTypeRef="NodeType06" FaultDomain="fd:/DC02/Rack03" UpgradeDomain="UpgradeDomain3" IsSeedNode="true" /> <Node NodeName="Node07" IPAddressOrFQDN="localhost" NodeTypeRef="NodeType07" FaultDomain="fd:/DC03/Rack01" UpgradeDomain="UpgradeDomain1" IsSeedNode="true" /> <Node NodeName="Node08" IPAddressOrFQDN="localhost" NodeTypeRef="NodeType08" FaultDomain="fd:/DC03/Rack02" UpgradeDomain="UpgradeDomain2" IsSeedNode="true" /> <Node NodeName="Node09" IPAddressOrFQDN="localhost" NodeTypeRef="NodeType09" FaultDomain="fd:/DC03/Rack03" UpgradeDomain="UpgradeDomain3" IsSeedNode="true" /> </NodeList> </WindowsServer> </Infrastructure> via ClusterConfig. This cluster also has three Upgrade Domains striped across those three datacenters.json for Standalone deployments . 
This layout is also invalid Configuring fault and Upgrade Domains Defining Fault Domains and Upgrade Domains is done automatically in Azure hosted Service Fabric deployments. we define a nine node local development cluster that spans three “datacenters” (each with three racks). . { "nodeName": "vm9". "faultDomain": "fd:/dc2/r0". "nodeTypeRef": "NodeType0". "nodeTypeRef": "NodeType0". "iPAddress": "localhost". "iPAddress": "localhost". { "nodeName": "vm2". "iPAddress": "localhost". "upgradeDomain": "UD1" }. "faultDomain": "fd:/dc1/r0". { "nodeName": "vm3". "faultDomain": "fd:/dc1/r0". "faultDomain": "fd:/dc2/r0". "faultDomain": "fd:/dc3/r0". { "nodeName": "vm5". "iPAddress": "localhost". "upgradeDomain": "UD2" }. "faultDomain": "fd:/dc2/r0". "upgradeDomain": "UD3" } ]. "upgradeDomain": "UD2" }. { "nodeName": "vm7". "iPAddress": "localhost". { "nodeName": "vm4". "upgradeDomain": "UD3" }. "nodeTypeRef": "NodeType0". "iPAddress": "localhost". "faultDomain": "fd:/dc3/r0". "iPAddress": "localhost". "nodeTypeRef": "NodeType0". "upgradeDomain": "UD3" }. "faultDomain": "fd:/dc3/r0". { "nodeName": "vm8". "nodeTypeRef": "NodeType0". "upgradeDomain": "UD1" }. "iPAddress": "localhost". "nodeTypeRef": "NodeType0". "upgradeDomain": "UD1" }. "faultDomain": "fd:/dc1/r0". "nodeTypeRef": "NodeType0". "iPAddress": "localhost". "upgradeDomain": "UD2" }. "nodeTypeRef": "NodeType0". "nodeTypeRef": "NodeType0"."nodes": [ { "nodeName": "vm1". { "nodeName": "vm6". A great example of targeting hardware to particular workloads is almost every n-tier architecture out there. The statement at the service is called a placement constraint since it constrains where the service can run in the cluster. some workload may require GPUs or SSDs while others may not. or signed long. Placement constraints can be used to indicate where certain services should run. The set of constraints is extensible . In these architectures certain machines serve as the front end/interface serving side of the application (and hence are probably exposed to the internet). The valid selectors in these boolean statements are: 1) conditional checks for creating particular statements STATEMENT SYNTAX "equal to" "==" "not equal to" "!=" "greater than" ">" . Service Fabric has a first class notion of tags that can be applied to nodes. Service Fabric expects that even in a microservices world there are cases where particular workloads need to run on particular hardware configurations. The value specified in the node property can be a string.any key/value pair can work. Therefore. The constraint can be any Boolean statement that operates on the different node properties in the cluster. The different key/value tags on nodes are known as node placement properties (or just node properties). for example: an existing n-tier application has been “lifted and shifted” into a Service Fabric environment a workload wants to run on specific hardware for performance. or security isolation reasons A workload should be isolated from other workloads for policy or resource consumption reasons To support these sorts of configurations. scale. For example. Different sets of machines (often with different hardware resources) handle the work of the compute or storage layers (and usually are not exposed to the internet). bool. These are called placement constraints. NOTE In Azure deployments. the definition of your nodes and roles within the infrastructure option for Azure does not include Fault Domain or Upgrade Domain information. 
Fault Domains and Upgrade Domains are assigned by Azure. most of the time) you’re going to want to ensure that certain workloads run only on certain nodes or certain sets of nodes in the cluster. Placement constraints and node properties Sometimes (in fact. Generally we have found NodeType to be one of the most commonly used properties. STATEMENT SYNTAX "greater than or equal to" ">=" "less than" "<" "less than or equal to" "<=" 2) boolean statements for grouping and logical operations STATEMENT SYNTAX "and" "&&" "or" "||" "not" "!" "group as single statement" "()" Here are some examples of basic constraint statements. "Value >= 5" "NodeColor != green" "((OneProperty < 100) || ((AnotherProperty == false) && (OneProperty >= 100)))" Only nodes where the overall statement evaluates to “True” can have the service placed on it. Nodes that do not have a property defined do not match any placement constraint containing that property. As of this writing the default properties defined at each node are the NodeType and the NodeName. Let’s say that the following node properties were defined for a given node type: ClusterManifest. It is useful since it corresponds 1:1 with a type of a machine.xml . which in turn corresponds to a type of workload in a traditional n-tier application architecture. Service Fabric defines some default node properties that can be used automatically without the user having to define them. So for example you could write a placement constraint as "(NodeType == NodeType03)" . PlacementConstraints = "(HasSSD == true && SomeProperty >= 4)". One of the cool things about a service’s placement constraints is that they can be updated dynamically during runtime. you could also select that node type.ServiceManager. serviceDescription. you can move a service around in the cluster.. Powershell: New-ServiceFabricService -ApplicationName $applicationName -ServiceName $serviceName -ServiceTypeName $serviceType -Stateful -MinReplicaSetSize 2 -TargetReplicaSetSize 3 -PartitionSchemeSingleton - PlacementConstraint "HasSSD == true && SomeProperty >= 4" If you are sure that all nodes of NodeType01 are valid.UpdateServiceAsync(new Uri("fabric:/app/service"). add and remove requirements. StatefulServiceDescription serviceDescription = new StatefulServiceDescription(). updateDescription. Powershell: Update-ServiceFabricService -Stateful -ServiceName $serviceName -PlacementConstraints "NodeType == NodeType01" .ServiceManager.json for Azure hosted clusters. "nodeTypes": [ { "name": "NodeType01". <NodeType Name="NodeType01"> <PlacementProperties> <Property Name="HasSSD" Value="true"/> <Property Name="NodeColor" Value="green"/> <Property Name="SomeProperty" Value="5"/> </PlacementProperties> </NodeType> via ClusterConfig. So if you need to. "NodeColor": "green". In your Azure Resource Manager template for a cluster things like the node type name are likely parameterized. You can create service placement constraints for a service like as follows: C# FabricClient fabricClient = new FabricClient(). "SomeProperty": "5" }.json for Standalone deployments or Template. C#: StatefulServiceUpdateDescription updateDescription = new StatefulServiceUpdateDescription(). // add other required servicedescription fields //. "placementProperties": { "HasSSD": "true"..PlacementConstraints = "NodeType == NodeType01".CreateServiceAsync(serviceDescription). etc. and would look something like "[parameters('vmNodeType1Name')]" rather than "NodeType01". updateDescription). 
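The two evaluation rules above - a node is eligible only when the whole expression evaluates to true, and a node that does not define a referenced property never matches - can be illustrated with a small helper. This is purely illustrative and is not Service Fabric's internal evaluator; it models only the "Value >= 5" example from the constraint statements above.

using System.Collections.Generic;

static class PlacementConstraintExample
{
    // Models "Value >= 5" against one node's placement properties.
    // A node that never defined a "Value" property does not match at all.
    public static bool MatchesValueAtLeast5(IReadOnlyDictionary<string, string> nodeProperties)
    {
        return nodeProperties.TryGetValue("Value", out string raw)
               && long.TryParse(raw, out long value)
               && value >= 5;
    }
}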
Service Fabric takes care of ensuring that the service stays up and available even when these types of changes are ongoing. await fabricClient. await fabricClient. } ]. It does this by subtracting any declared usage of each service running on that node from the node's capacity. Powershell: . the Service Fabric Cluster Resource Manager can figure out where to place or move replicas so that nodes don’t go over capacity.Weight = ServiceLoadMetricWeight. Capacity One of the most important jobs of any orchestrator is to help manage resource consumption in the cluster. metric.Placement constraints (along with many other orchestrator controls that we’re going to talk about) are specified for every different named service instance. the metric might be "MemoryInMb" and a given Node may have a capacity for "MemoryInMb" of 2048. If you turned off all resource balancing.Add(metric). Updates always take the place of (overwrite) what was previously specified.High. During runtime. Generally this is possible unless the cluster as a whole is too full. Capacity is another constraint that the Cluster Resource Manager uses to understand how much of a resource a node has. Examples of metrics are things like “WorkQueueDepth” or “MemoryInMb”. Both the capacity and the consumption at the service level are expressed in terms of metrics. Before we talk about balancing.Metrics. ServiceLoadMetricDescription metric = new ServiceLoadMetricDescription(). The amount of space available on that SSD (and consumed by services) would be a metric like “DriveSpaceInMb”. For information configuring metrics and their uses.PrimaryDefaultLoad = 64. A node property could be "HasSSD" and could be set to true or false. So for example. The properties on a node are defined via the cluster definition and hence cannot be updated without an upgrade to the cluster. Service Fabric’s Cluster Resource Manager would still try to ensure that no node ended up over its capacity.Name = "MemoryInMb". With this information. what about just ensuring that nodes don’t run out of resources in the first place? Service Fabric represents resources as Metrics . the Service Fabric Cluster Resource Manager doesn't understand what the names of the metrics mean. Remaining capacity is also tracked for the cluster as a whole. Hot nodes lead to resource contention and poor performance. await fabricClient. whereas metrics are about resources that nodes have and that services consume when they are running on a node.CreateServiceAsync(serviceDescription). Metric names are just strings. Some service running on that node can say it is currently consuming 64 of "MemoryInMb". The upgrade of a node's properties and requires each affected node to go down and then come back up. The last thing you want if you’re trying to run services efficiently is a bunch of nodes that are hot while others are cold. It is a good practice to declare units as a part of the metric names that you create when it could be ambiguous. C#: StatefulServiceDescription serviceDescription = new StatefulServiceDescription(). The node would have its capacity for “DriveSpaceInMb” to the amount of total non-reserved space on the drive. serviceDescription. Metrics are any logical or physical resource that you want to describe to Service Fabric. metric.SecondaryDefaultLoad = 64. metric. Node properties are static descriptors of the nodes themselves. metric. and cold nodes represent wasted resources/increased cost.ServiceManager. 
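If you want to confirm that a runtime update like the one above took effect, one option is to read the service description back through FabricClient. The following is a sketch only; it assumes the same fabric:/app/service name used in the earlier examples.

C#:
// Read the current service description to confirm the placement constraint.
FabricClient fabricClient = new FabricClient();

System.Fabric.Description.ServiceDescription description =
    await fabricClient.ServiceManager.GetServiceDescriptionAsync(new Uri("fabric:/app/service"));

Console.WriteLine("Current placement constraints: " + description.PlacementConstraints);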
see this article Metrics are different from placements constraints and node properties. Services would report how much of the metric they used during runtime. the Cluster Resource Manager tracks how much of each resource is present on each node and how much is remaining. It is important to note that just like for placement constraints and node properties. Cluster capacity So how do we keep the overall cluster from being too full? Well. but the node it was running on then only had 512 (of the "MemoryInMb" metric) remaining. When moving replicas.xml <NodeType Name="NodeType02"> <Capacities> <Capacity Name="MemoryInMb" Value="2048"/> <Capacity Name="DiskInMb" Value="512000"/> </Capacities> </NodeType> via ClusterConfig. Services can have their load spike independently of actions taken by the Cluster Resource Manager. Now that replica or instance's placement is invalid. It is also possible (and in fact common) that a service’s load changes dynamically. That said. New-ServiceFabricService -ApplicationName $applicationName -ServiceName $serviceName -ServiceTypeName $serviceTypeName –Stateful -MinReplicaSetSize 2 -TargetReplicaSetSize 3 -PartitionSchemeSingleton –Metric @("Memory. This can also happen if the combined usage of the replicas and instances on that node exceeds that node’s capacity. Movement cost is discussed in this article. It does this by moving one or more of the replicas or instances on that node to different nodes. "capacities": { "MemoryInMb": "2048". In either case the Cluster Resource Manager has to kick in and get the node back below capacity. As a result. and would look something like "[parameters('vmNodeType2Name')]" rather than "NodeType02". The first . there are some controls that are baked in to prevent basic problems.json for Azure hosted clusters. with dynamic load there’s not a lot the Cluster Resource Manager can do.High.64) You can see capacities defined in the cluster manifest: ClusterManifest. "nodeTypes": [ { "name": "NodeType02". In your Azure Resource Manager template for a cluster things like the node type name are likely parameterized. "DiskInMb": "512000" } } ].64.json for Standalone deployments or Template. the Cluster Resource Manager tries to minimize the cost of those movements. your cluster with plenty of headroom today may be underpowered when you become famous tomorrow. since there's not enough room on that node. Say that a replica's load changed from 64 to 1024. Since the requirement is only that there be 15 units available. The value you pick for the reserved capacity is a function of the number of Fault and Upgrade Domains you have in the cluster and how much overhead you want. or both.20" /> </Section> via ClusterConfig. For example.10" }. Buffered Capacity Another feature the Cluster Resource Manager has that helps manage overall cluster capacity is the notion of some reserved buffer to the capacity specified at each node. there could be one remaining unit of capacity on 15 different nodes. "value": "0. If you have more domains.20" } ] } ] . Say that you go to create a stateless service and it has some load associated with it (more on default and dynamic load reporting later).15" /> <Parameter Name="SomeOtherMetric" Value="0. Here's an example of how to specify buffered capacity: ClusterManifest. "parameters": [ { "name": "DiskSpace".json for Standalone deployments or Template. { "name": "Memory".json for Azure hosted clusters: "fabricSettings": [ { "name": "NodeBufferPercentage". 
The Cluster Resource Manager is continually calculating the overall capacity and consumption of each metric. Such rearrangement is almost always possible unless the cluster as a whole is almost entirely full.thing we can do is prevent the creation of new workloads that would cause the cluster to become full. it will eventually place the service. Specifying the buffer percentage only makes sense if you have also specified the node capacity for a metric. Today buffer is specified globally per metric for all nodes via the cluster definition. so it can easily determine if there's sufficient space in the cluster. Let's also say that it is going to consume five units of "DiskSpaceInMb" for every instance of the service. or three remaining units of capacity on five different nodes. { "name": "SomeOtherMetric".10" /> <Parameter Name="Memory" Value="0. this space could be allocated many different ways. As long as the Cluster Resource Manager can rearrange things so there's five units available on three nodes. More Fault and Upgrade Domains means that you can pick a lower number for your buffered capacity. Great! So that means that we need 15 units of "DiskSpaceInMb" to be present in the cluster in order for us to even be able to create these service instances. Let’s say that the service cares about the "DiskSpaceInMb" metric. Buffered Capacity allows reservation of some portion of the overall node capacity so that it is only used to place services during upgrades and node failures.xml <Section Name="NodeBufferPercentage"> <Parameter Name="DiskSpace" Value="0. you can expect smaller amounts of your cluster to be unavailable during upgrades and failures. If there isn't sufficient space. "value": "0. the services are all very "bulky". You want to create three instances of the service. "value": "0. the Cluster Resource Manager rejects the create service call.15" }. Here we see an example of that output: PS C:\Users\user> Get-ServiceFabricClusterLoadInformation LastBalancingStartTimeUtc : 9/1/2016 12:54:59 AM LastBalancingEndTimeUtc : 9/1/2016 12:54:59 AM LoadMetricInformation : LoadMetricName : Metric1 IsBalancedBefore : False IsBalancedAfter : False DeviationBefore : 0.192450089729875 DeviationAfter : 0. check out this article Defining Defragmentation Metrics is one way to consolidate load on nodes instead of spreading it out. check out the article on balancing load .192450089729875 BalancingThreshold : 1 Action : NoActionNeeded ActivityThreshold : 0 ClusterCapacity : 189 ClusterLoad : 45 ClusterRemainingCapacity : 144 NodeBufferPercentage : 10 ClusterBufferedCapacity : 170 ClusterRemainingBufferedCapacity : 125 ClusterCapacityViolation : False MinNodeLoadValue : 0 MinNodeLoadNodeId : 3ea71e8e01f4b0999b121abcbf27d74d MaxNodeLoadValue : 15 MaxNodeLoadNodeId : 2cc648b6770be1bc9824fa995d5b68b1 Next steps For information on the architecture and information flow within the Cluster Resource Manager. This lets you see the buffered capacity settings. and the current consumption for every metric in use in the cluster. refer to this article Start from the beginning and get an Introduction to the Service Fabric Cluster Resource Manager To find out about how the Cluster Resource Manager manages and balances load in the cluster. Buffered capacity is optional but is recommended in any cluster that defines a capacity for a metric.The creation of new services fails when the cluster is out of buffered capacity for a metric. The Cluster Resource Manager exposes this information via PowerShell and the Query APIs. 
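As mentioned earlier, the load a service places on a metric can also be reported dynamically from inside the service at runtime, rather than relying only on the default loads declared at creation. The following is a rough sketch: it reuses the "MemoryInMb" metric name from the earlier examples, uses the managed heap size purely as a stand-in for a real measurement, and would live in a Reliable Service's RunAsync (LoadMetric comes from System.Fabric).

C#:
protected override async Task RunAsync(CancellationToken cancellationToken)
{
    while (!cancellationToken.IsCancellationRequested)
    {
        // Stand-in measurement; a real service would report whatever the metric actually tracks.
        int usedMb = (int)(GC.GetTotalMemory(forceFullCollection: false) / (1024 * 1024));

        // Report current consumption of the "MemoryInMb" metric for this replica or instance.
        this.Partition.ReportLoad(new[] { new LoadMetric("MemoryInMb", usedMb) });

        await Task.Delay(TimeSpan.FromSeconds(30), cancellationToken);
    }
}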
To learn how to configure defragmentation. This ensures that the cluster retains enough spare overhead so that upgrades and failures don’t cause nodes to go over capacity. the total capacity. The number of VMs can be scaled independently. each node type can then be scaled up or down independently. Here are some of the items that you have to consider as a part of that process. primary. have different sets of ports open. have different sets of ports open. internet facing. Since you cannot predict the future. Service Fabric cluster capacity planning considerations 3/24/2017 • 10 min to read • Edit Online For any production deployment. A Service Fabric cluster must have at least one node type. is computation intensive and needs to run on larger VMs (with VM sizes like D4. and do any of them need to be public or internet facing? Typical applications contain a front-end gateway service that receives input from a client. D15) that are not internet facing. So the decision of the number of node types essentially comes down to the following considerations: Does your application have multiple services. Each node type is mapped to a Virtual Machine Scale Set. and their properties. Do your services (that make up your application) have different infrastructure needs such as greater RAM or higher CPU cycles? For example. let us assume that the application that you want to deploy contains a front-end service and a back-end service. go with facts you know of and decide on the number of node types that your applications need to start with. however. Every node type that is defined in a Service Fabric cluster is set up as a separate Virtual Machine Scale Set (VMSS). you end up having at least two node types. and one or more back-end services that communicate with the front-end services. we recommended that you place them in a cluster with two node types. The front-end service can run on smaller VMs (VM sizes like D2) that have ports open to the internet. VMSS is an Azure compute resource you can use to deploy and manage a collection of virtual machines as a set. and can have different capacity metrics. The back-end service. the number of VMs. Read this document for more details on the relationship of Nodetypes to VMSS. capacity planning is an important step. although you can decide to put all the services on one node type. So in this case. D6. how to RDP into one of the . In this example. This allows for each node type to have distinct properties such as internet connectivity or VM size.) The reliability and durability characteristics of the cluster Let us briefly review each of these items. You can always add or remove node types later. you are most likely not yet ready to enter the capacity planning process. and can have different capacity metrics. The properties of each node type The node type can be seen as equivalent to roles in Cloud Services. etc. Being defined as distinct VMSS. number of VMs. Establish the number of node types your cluster needs to start out with. If you are not clear on the purpose of the cluster. as well. Node types define the VM sizes. The number of node types your cluster needs to start out with The properties of each of node type (size. The number of node types your cluster needs to start out with First. Each node type can then be scaled up or down independently. you need to figure out what the cluster you are creating is going to be used for and what kinds of applications you are planning to deploy into this cluster. 
The primary node type is the node type where Service Fabric system services are placed. . Here are the characteristics of a primary node type: The minimum size of VMs for the primary node type is determined by the durability tier you choose. The default for the reliability tier is Silver. Your cluster can have more than one node type. you will need to choose one of them to be primary. The default for the durability tier is Bronze. The default for the durability tier is Bronze. Here are the characteristics of a non-primary node type: The minimum size of VMs for this node type is determined by the durability tier you choose. The Service Fabric system services (for example. The number of VMs in a node type can be increased after you have deployed the cluster.instances. but the primary node type (the first one that you define on the portal) must have at least five VMs for clusters used for production workloads (or at least three VMs for test clusters). then you will find a is Primary attribute under the node type definition. Non-primary node type For a cluster with multiple node types. Scroll down for details on what the durability tier is and the values it can take. However you should choose this number based on the number of replicas of the application/services that you would like to run in this node type. the Cluster Manager service or Image Store service) are placed on the primary node type and so the reliability and durability of the cluster is determined by the reliability tier value and durability tier value you select for the primary node type. Scroll down for details on what the reliability tier is and the values it can take. Scroll down for details on what the durability tier is and the values it can take. open new ports etc. If you are creating the cluster using a Resource Manager template. The minimum number of VMs for the primary node type is determined by the reliability tier you choose. The minimum number of VMs for this node type can be one. there is one primary node type and the rest of them are non-primary. Primary node type For a cluster with multiple node types. Once enabled this will be available on all standard VMs of single core and above). your body suffers. must take into account the overall peak load you plan to place into the cluster. In the primary node type. G5 etc. you must specify a reliability tier of Silver or higher. which then translates to a minimum Primary Node type size of 5. Bronze . You can choose to update the reliability of your cluster from one tier to another. So we cannot provide you with a qualitative guidance for your specific workload.Think of the primary node type as your "Lungs". The more the number of replicas. Here is an analogy to illustrate what I mean here . Gold durability can be enabled only on full node VM skus like D15_V2. Doing this will trigger the cluster upgrades needed to change the system services replica set count. that is running at Bronze reliability. and so if the brain does not get enough oxygen.The infrastructure Jobs can be paused for a duration of 30 minutes per UD (This is currently not enabled for use. This is the default.. so the VM SKU you choose for it. VM SKU: Primary node type is where the system services run. VM reimage.The infrastructure Jobs can be paused for a duration of 2 hours per UD.Run the System services with a target replica set count of 9 Gold . is absolutely determined by workload you plan to run in the cluster. 2. The capacity needs of a cluster. 
Silver .Run the System services with a target replica set count of 5 Bronze . So you can have a 20 node cluster.The durability characteristics of the cluster The durability tier is used to indicate to the system the privileges that your VMs have with the underlying Azure infrastructure. Wait for the upgrade in progress to complete before making any other changes to the cluster. that impact the quorum requirements for your stateful services running in it.No privileges.Run the System services with a target replica set count of 7 Silver . this privilege allows Service Fabric to pause any VM level infrastructure request (such as a VM reboot. VM migration etc. In the non-primary node types.Run the System services with a target replica set count of 3 NOTE The reliability tier you choose determines the minimum number of nodes your primary node type must have. VM reimage. however here is the broad guidance to help you get started . this privilege allows Service Fabric to pause any VM level infrastructure request like VM reboot. or VM migration) that impact the quorum requirements for the system services and your stateful services. You can monitor the progress of the upgrade on Service Fabric Explorer or by running Get-ServiceFabricClusterUpgrade Primary node type . like adding nodes etc. The reliability tier has no bearing on the max size of the cluster.Capacity Guidance Here is the guidance for planning the primary node type capacity 1. Number of VM instances: To run any production workload in Azure. The reliability tier can take the following values. it is what provides oxygen to your brain. Platinum . This privilege is expressed in the following values: Gold . the more reliable the system services are in your cluster. The reliability characteristics of the cluster The reliability tier is used to set the number of replicas of the system services that you want to run in this cluster on the primary node type. Read perform Service Fabric cluster in or out for more details VM SKU: This is the node type where your application services are running. if you are running stateful workloads in it.For production workloads The recommended VM SKU is Standard D3 or Standard D3_V2 or equivalent with a minimum of 14 GB of local SSD. This will mean that in steady state you will end up with a replica (from a replica set) in each fault domain and upgrade domain. So we cannot provide you with a qualitative guidance for your specific workload. The minimum supported use VM SKU is Standard D1 or Standard D1_V2 or equivalent with a minimum of 14 GB of local SSD. So we cannot provide you with a qualitative guidance for your specific workload. The minimum supported use VM SKU is Standard D1 or Standard D1_V2 or equivalent with a minimum of 14 GB of local SSD. is absolutely determined by workload you plan to run in the cluster. scaling down a non-primary node type to less than 5. So for production workloads. Partial core VM SKUs like Standard A0 are not supported for production workloads. the minimum recommended non-Primary Node type size is 5. must take into account the peak load you plan to place into each Node.Capacity Guidance for stateful workloads Read the following for Workloads using Service fabric reliable collections or reliable Actors. Standard A1 SKU is specifically not supported for production workloads for performance reasons. Partial core VM SKUs like Standard A0 are not supported for production workloads. The capacity needs of the nodetype. 
must take into account the peak load you plan to place into each Node.6. however here is the broad guidance to help you get started . 1. VM SKU: This is the node type where your application services are running. due to a defect in the runtime (which is planned to be fixed in 5.6). Non-Primary node type . is absolutely determined by workload you plan to run in the cluster. Number of VM instances: For production workloads that are stateful. however here is the broad guidance to help you get started For production workloads The recommended VM SKU is Standard D3 or Standard D3_V2 or equivalent with a minimum of 14 GB of local SSD. till you call Remove-ServiceFabricNodeState cmd with the appropriate node name.the minimum supported non-Primary Node type size is 2. it is recommended that you run them with a minimum and target replica count of 5. Read more here programming models here. 1. Standard A1 SKU is specifically not supported for production workloads for performance reasons. so the VM SKU you choose for it. NOTE If your cluster is running on a service fabric version less than 5. will result in cluster health turning unhealthy. The capacity needs of the nodetype.Capacity Guidance for stateless workloads Read the following for stateless Workloads Number of VM instances: For production workloads that are stateless. This allows you to run you two stateless instances of your application and allowing your service to survive the loss of a VM instance. so the VM SKU you choose for it. Non-Primary node type . The whole reliability tier concept for system services is actually just a way to specify this setting for system services. Next steps Once you finish your capacity planning and set up a cluster. Partial core VM SKUs like Standard A0 are not supported for production workloads. The minimum supported use VM SKU is Standard D1 or Standard D1_V2 or equivalent. please read the following: Service Fabric cluster security Service Fabric health model introduction Relationship of Nodetypes to VMSS .For production workloads The recommended VM SKU is Standard D3 or Standard D3_V2 or equivalent. Standard A1 SKU is specifically not supported for production workloads for performance reasons. At the root of the tree. Understand the Service Fabric Explorer layout You can navigate through Service Fabric Explorer by using the tree on the left. the cluster dashboard provides an overview of your cluster. To ensure that all information loads correctly. regardless of where your cluster is running. including a summary of application and node health. NOTE If you are using Internet Explorer with Service Fabric Explorer to manage a remote cluster. watch the following Microsoft Virtual Academy video: Connect to Service Fabric Explorer If you have followed the instructions to prepare your development environment. . Visualize your cluster with Service Fabric Explorer 3/9/2017 • 4 min to read • Edit Online Service Fabric Explorer is a web-based tool for inspecting and managing applications and nodes in an Azure Service Fabric cluster. so it is always available. you can launch Service Fabric Explorer on your local cluster by navigating to http://localhost:19080/Explorer. Video tutorial To learn how to use Service Fabric Explorer. Service Fabric Explorer is hosted directly within the cluster. go to Tools > Compatibility View Settings and uncheck Display intranet sites in Compatibility View. you need to configure some Internet Explorer settings. 
You can use the application view to navigate through Service Fabric's logical hierarchy: applications, services, partitions, and replicas.

View applications and services
The cluster contains two subtrees: one for applications and another for nodes. At each level of the tree, the main pane shows pertinent information about the item. For example, you can see the health status and version for a particular service. In the example below, the application MyApp consists of two services, MyStatefulService and WebSvcService. Since MyStatefulService is stateful, it includes a partition with one primary and two secondary replicas. By contrast, WebSvcService is stateless and contains a single instance.

View the cluster's layout
Nodes in a Service Fabric cluster are placed across a two-dimensional grid of fault domains and upgrade domains. This placement ensures that your applications remain available in the presence of hardware failures and application upgrades. You can view how the current cluster is laid out by using the cluster map.

View the cluster's nodes
The node view shows the physical layout of the cluster. For a given node, you can inspect which applications have code deployed on that node. More specifically, you can see which replicas are currently running there.

Actions
Service Fabric Explorer offers a quick way to invoke actions on nodes, applications, and services within your cluster. For example, to delete an application instance, choose the application from the tree on the left, and then choose Actions > Delete Application.

TIP: You can perform the same actions by clicking the ellipsis next to each element.

The following list shows the actions available for each entity:

Application type - Unprovision type: Removes the application package from the cluster's image store. Requires all applications of that type to be removed first.
Application - Delete Application: Delete the application, including all its services and their state (if any).
Service - Delete Service: Delete the service and its state (if any).
Node - Activate: Activate the node.
Node - Deactivate (pause): Pause the node in its current state. Services continue to run, but Service Fabric does not proactively move anything onto or off the node unless it is required to prevent an outage or data inconsistency. This action is typically used to enable debugging services on a specific node to ensure that they do not move during inspection.
Node - Deactivate (restart): Safely move all in-memory services off a node and close persistent services. Typically used when the host processes or machine need to be restarted.
Node - Deactivate (remove data): Safely close all services running on the node after building sufficient spare replicas. Typically used when a node (or at least its storage) is being permanently taken out of commission.
Node - Remove node state: Remove knowledge of a node's replicas from the cluster. Typically used when an already failed node is deemed unrecoverable.
Node - Restart: Simulate a node failure by restarting the node.

Since many actions are destructive, you may be asked to confirm your intent before the action is completed.

TIP: Every action that can be performed through Service Fabric Explorer can also be performed through PowerShell or a REST API, to enable automation. More information here.

You can also use Service Fabric Explorer to create application instances for a given application type and version. Choose the application type in the tree view, then click the Create app instance link next to the version you'd like in the right pane.

NOTE: Application instances created through Service Fabric Explorer cannot currently be parameterized. They are created using default parameter values.

Connect to a remote Service Fabric cluster
If you know the cluster's endpoint and have sufficient permissions, you can access Service Fabric Explorer from any browser. This is because Service Fabric Explorer is just another service that runs in the cluster.

Discover the Service Fabric Explorer endpoint for a remote cluster
To reach Service Fabric Explorer for a given cluster, point your browser to:
http://<your-cluster-endpoint>:19080/Explorer
For Azure clusters, the full URL is also available in the cluster essentials pane of the Azure portal.

Connect to a secure cluster
You can control client access to your Service Fabric cluster either with certificates or using Azure Active Directory (AAD). If you attempt to connect to Service Fabric Explorer on a secure cluster, then depending on the cluster's configuration you'll be required to present a client certificate or log in using AAD.

Next steps
Testability overview
Managing your Service Fabric applications in Visual Studio
Service Fabric application deployment using PowerShell

Connect to a secure cluster 2/15/2017 • 7 min to read
When a client connects to a Service Fabric cluster node,
the client can be authenticated and secure communication established using certificate security or Azure Active Directory (AAD). first establish a connection to the cluster. The cluster connection is used for all subsequent commands in the given PowerShell session. you should be able to run other CLI commands to interact with the cluster. Connect to a secure cluster using a client certificate The certificate details must match a certificate on the cluster nodes. provide the cluster certificate thumbprint and use the AzureActiveDirectory flag. the command looks like the following example: Connect-ServiceFabricCluster -ConnectionEndpoint clustername. FindValue is the thumbprint of the admin client certificate. connect to the cluster by appending the switch "WindowsCredential". Connect-ServiceFabricCluster -ConnectionEndpoint <Cluster FQDN>:19000 ` -ServerCertThumbprint <Server Certificate Thumbprint> ` -AzureActiveDirectory Connect to a secure cluster using a client certificate Run the following PowerShell command to connect to a secure cluster that uses client certificates to authorize administrator access.azure.westus. provide the cluster endpoint address to the Connect-ServiceFabricCluster command: Connect-ServiceFabricCluster -ConnectionEndpoint <Cluster FQDN>:19000 Connect to a secure cluster using Azure Active Directory To connect to a secure cluster that uses Azure Active Directory to authorize cluster administrator access. Connect to an unsecure cluster To connect to a remote unsecured cluster.ServiceFabric NuGet package. get the Microsoft.cloudapp. To use the FabricClient APIs.Connect to an unsecure cluster To connect to an unsecure cluster.com:19000 ` -KeepAliveIntervalInSec 10 ` -X509Credential -ServerCertThumbprint A8136758F4AB8962AF2BF3F27921BE1DF67F4326 ` -FindType FindByThumbprint -FindValue 71DE04467C9ED0544D021098BCD44C71E183414E ` -StoreLocation CurrentUser -StoreName My Connect to a secure cluster using Windows Active Directory If your standalone cluster is deployed using AD security. Connect-ServiceFabricCluster -ConnectionEndpoint <Cluster FQDN>:19000 ` -WindowsCredential Connect to a cluster using the FabricClient APIs The Service Fabric SDK provides the FabricClient class for cluster management. Provide the cluster certificate thumbprint and the thumbprint of the client certificate that has been granted permissions for cluster management. Connect-ServiceFabricCluster -ConnectionEndpoint <Cluster FQDN>:19000 ` -KeepAliveIntervalInSec 10 ` -X509Credential -ServerCertThumbprint <Certificate Thumbprint> ` -FindType FindByThumbprint -FindValue <Certificate Thumbprint> ` -StoreLocation CurrentUser -StoreName My ServerCertThumbprint is the thumbprint of the server certificate installed on the cluster nodes. When the parameters are filled in. create a FabricClient instance and provide the cluster address: . The certificate details must match a certificate on the cluster nodes. EncryptAndSign. } Connect to a secure cluster interactively using Azure Active Directory The following example uses Azure Active Directory for client identity and server certificate for server identity.FindType = X509FindType.com:19000".Security. using System.cloudapp. e.StoreLocation = StoreLocation.clustername. FabricClient fabricClient = new FabricClient("clustername. A dialog window automatically pops up for interactive sign-in upon connecting to the cluster. var xc = GetCredentials(clientCertThumb. string serverCertThumb. xc.Add(serverCertThumb). CommonName). .westus. 
string clientCertThumb = "71DE04467C9ED0544D021098BCD44C71E183414E".Cryptography.azure.Result. xc.RemoteCertThumbprints. connection). For code that is running from within a cluster.cloudapp.Add(name).com:19000").westus.StoreName = "My". using System.ClusterManager. Console. FabricClient connects to the local management gateway on the node the code is currently running on.GetClusterManifestAsync(). for example. xc.ToString()). } catch (Exception e) { Console.westus.com". FabricClient fabricClient = new FabricClient(). xc.WriteLine("Connect failed: {0}". xc.WriteLine(ret.azure.ProtectionLevel = ProtectionLevel. xc.X509Certificates. serverCertThumb.FindValue = clientCertThumb. string CommonName = "www.FindByThumbprint. string serverCertThumb = "A8136758F4AB8962AF2BF3F27921BE1DF67F4326". Connect to a secure cluster using a client certificate The nodes in the cluster must have valid certificates whose common name or DNS name in SAN appears in the RemoteCommonNames property set on FabricClient.Message). return xc.RemoteCommonNames. Following this process enables mutual authentication between the client and the cluster nodes. string name) { X509Credentials xc = new X509Credentials().Fabric.CurrentUser. string connection = "clustername. try { var ret = fc.azure. xc. } static X509Credentials GetCredentials(string clientCertThumb. var fc = new FabricClient(xc. avoiding an extra network hop. in a Reliable Service. create a FabricClient without specifying the cluster address. string connection = "clustername. } catch (Exception e) { Console.Clients. var claimsCredentials = new ClaimsCredentials(). string serverCertThumb = "A8136758F4AB8962AF2BF3F27921BE1DF67F4326". claimsCredentials.WriteLine(ret.19. Console. For more information on AAD token acquisition. Version: 2.Message).GetClusterManifestAsync(). see Microsoft.ServerThumbprints.Result. try { var ret = fc.IdentityModel.ActiveDirectory.westus.azure. e.ActiveDirectory.ClusterManager.IdentityModel.ToString()).com:19000".Add(serverCertThumb).WriteLine("Connect failed: {0}". var fc = new FabricClient(claimsCredentials.Clients. } Connect to a secure cluster non-interactively using Azure Active Directory The following example relies on Microsoft. connection).208020213.cloudapp. . new UserCredential("TestAdmin@clustenametenant. string tenantId = "C15CFCEA-02C1-40DC-8466-FBD0EE0B05D2". string connection = "clustername.ServerThumbprints. var fc = new FabricClient(claimsCredentials.com/{0}". clientId.0:oob"). return authResult. claimsCredentials.Add(serverCertThumb).AcquireToken( resource. string redirectUri) { string authorityFormat = @"https://login. string resource. Console.InvariantCulture.LocalClaims = token.AccessToken. var claimsCredentials = new ClaimsCredentials(). static string GetAccessToken( string tenantId.westus. string clientApplicationId = "118473C2-7619-46E3-A8E4-6DA8D5F56E12". e. string authority = string.com:19000". string clientId.microsoftonline. connection). string serverCertThumb = "A8136758F4AB8962AF2BF3F27921BE1DF67F4326". webApplicationId. claimsCredentials.ClusterManager.ToString()). var authResult = authContext.WriteLine("Connect failed: {0}". but the same approach can be used to build a custom interactive token acquisition experience.Result. var authContext = new AuthenticationContext(authority).. "urn:ietf:wg:oauth:2. "TestPassword")). tenantId)..onmicrosoft.com".Message). } catch (Exception e) { Console. 
} Connect to a secure cluster without prior metadata knowledge using Azure Active Directory The following example uses non-interactive token acquisition.GetClusterManifestAsync().Format(CultureInfo.cloudapp. string webApplicationId = "53E6948C-0897-4DA6-B26A-EE2A38A690B4". } . string token = GetAccessToken( tenantId. The Azure Active Directory metadata needed for token acquisition is read from cluster configuration. . try { var ret = fc.WriteLine(ret. clientApplicationId. authorityFormat.azure. azure.WriteLine(ret. point your browser to: https://<your-cluster-endpoint>:19080/Explorer You are automatically be prompted to log in with AAD.onmicrosoft. "TestPassword")). Set up a client certificate on the remote computer At least two certificates should be used for securing the cluster. fc.Add(serverCertThumb). aad. return authResult. Connect to a secure cluster using Azure Active Directory To connect to a cluster that is secured with AAD. try { var ret = fc. } catch (Exception e) { Console. static string GetAccessToken(AzureActiveDirectoryMetadata aad) { var authContext = new AuthenticationContext(aad. point your browser to: http://<your-cluster-endpoint>:19080/Explorer The full URL is also available in the cluster essentials pane of the Azure portal. one for the cluster and server certificate and another for client access. point your browser to: https://<your-cluster-endpoint>:19080/Explorer You are automatically be prompted to select a client certificate..com:19000". claimsCredentials.AzureActiveDirectoryMetadata).AcquireToken( aad.Authority). Connect to a secure cluster using a client certificate To connect to a cluster that is secured with certificates.cloudapp. string serverCertThumb = "A8136758F4AB8962AF2BF3F27921BE1DF67F4326". }. var fc = new FabricClient(claimsCredentials. string connection = "clustername.ClusterManager. } Connect to a secure cluster using Service Fabric Explorer To reach Service Fabric Explorer for a given cluster.ClientApplication. var claimsCredentials = new ClaimsCredentials()..ServerThumbprints. We recommend that you also use additional secondary certificates and client access . e) => { return GetAccessToken(e.ToString()).Result.ClaimsRetrieval += (o. connection). new UserCredential("[email protected]).com". Console. } .ClusterApplication. var authResult = authContext.GetClusterManifestAsync().AccessToken. e.westus.WriteLine("Connect failed: {0}". Run the following PowerShell cmdlet to set up the client certificate on the computer from which you access the cluster.certificates. The certificate can be installed into the Personal (My) store of the local computer or the current user. To secure the communication between a client and a cluster node using certificate security. Service Fabric Health model introduction Application Security and RunAs .pfx ` -Password (ConvertTo-SecureString -String test -AsPlainText -Force) If it is a self-signed certificate. you first need to obtain and install the client certificate. Import-PfxCertificate -Exportable -CertStoreLocation Cert:\CurrentUser\My ` -FilePath C:\docDemo\certs\DocDemoClusterCert. you need to import it to your machine's "trusted people" store before you can use this certificate to connect to a secure cluster. Import-PfxCertificate -Exportable -CertStoreLocation Cert:\CurrentUser\TrustedPeople ` -FilePath C:\docDemo\certs\DocDemoClusterCert. 
You also need the thumbprint of the server certificate so that the client can authenticate the cluster.pfx ` -Password (ConvertTo-SecureString -String test -AsPlainText -Force) Next steps Service Fabric Cluster upgrade process and expectations from you Managing your Service Fabric applications in Visual Studio. and find what they are. For example. use the following command: azure servicefabric cluster connect http://PublicIPorFQDN:19080 Replace the PublicIPorFQDN tag with the real IP or FQDN as appropriate. it supports.git cd azure-xplat-cli npm install sudo ln -s \$(pwd)/bin/azure /usr/bin/azure azure servicefabric For each command. you can type the name of the command to obtain the help for that command. Using the Azure CLI to interact with a Service Fabric Cluster 3/3/2017 • 4 min to read • Edit Online You can interact with Service Fabric cluster from Linux machines using the Azure CLI on Linux. you can type --help after a command. azure servicefabric application You can further filter the help to a specific command.sh' >> ~/. the following command gives you help for all the application commands.com/Azure/azure-xplat-cli.completion.completion. For example: azure servicefabric node show --help azure servicefabric application create --help When connecting to a multi-machine cluster from a machine that is not part of the cluster. When connecting to a multi-machine . run the following commands: azure --completion >> ~/azure.sh The following commands connect to the cluster and show you the nodes in the cluster: azure servicefabric cluster connect http://localhost:19080 azure servicefabric node show To use named parameters. Auto-completion is supported for the commands.sh echo 'source ~/azure. as the following example shows: azure servicefabric application create To enable auto-completion in the CLI. The first step is get the latest version of the CLI from the git rep and set it up in your path using the following commands: git clone https://github.sh\_profile source ~/azure.completion. The certificate details must match a certificate on the cluster nodes. WARNING These clusters aren’t secure. you may be opening up your one-box by adding the public IP address in the cluster manifest. use a comma as the delimiter. azure servicefabric cluster connect --connection-endpoint http://ip:19080 --client-key-path /tmp/key --client- cert-path /tmp/cert If your certificate has Certificate Authorities (CAs)./tmp/ca2 If you have multiple CAs. and start the service fabric application: azure servicefabric application package copy [applicationPackagePath] [imageStoreConnectionString] [applicationPathInImageStore] azure servicefabric application type register [applicationPathinImageStore] azure servicefabric application create [applicationName] [applicationTypeName] [applicationTypeVersion] . Deploying your Service Fabric application Execute the following commands to copy. register. you should be able to run other CLI commands to interact with the cluster. you could add the --reject-unauthorized-false parameter as shown in the following command: azure servicefabric cluster connect --connection-endpoint http://ip:19080 --client-key-path /tmp/key --client- cert-path /tmp/cert --reject-unauthorized-false After you connect. thus. you need to add the --ca-cert-path parameter like the following example: azure servicefabric cluster connect --connection-endpoint http://ip:19080 --client-key-path /tmp/key -- client-cert-path /tmp/cert --ca-cert-path /tmp/ca1. 
you could use the parameter --strict-ssl-false to bypass the verification as shown in the following command: azure servicefabric cluster connect --connection-endpoint http://ip:19080 --client-key-path /tmp/key --client- cert-path /tmp/cert --strict-ssl-false If you would like to skip the CA verification.cluster from a machine that is part of the cluster. If your Common Name in the certificate does not match the connection endpoint. Using the Azure CLI to connect to a Service Fabric Cluster The following Azure CLI commands describe how to connect to a secure cluster. use the following command: azure servicefabric cluster connect --connection-endpoint http://localhost:19080 --client-connection-endpoint PublicIPorFQDN:19000 You can use PowerShell or CLI to interact with your Linux Service Fabric Cluster created through the Azure portal. pem -nodes Refer to OpenSSL documentation for details. register. Ubuntu Desktop doesn't have it installed. and the type is MySFApp. and create your application from project root directory. You can also try an updated app with an error and check the auto rollback functionality in service fabric.pem To convert from a PFX file to a PEM file. you can start the application upgrade with the following command: azure servicefabric application upgrade start -–application-name fabric:/MySFApp -–target-application-type- version 2. the commands would be as follows: azure servicefabric cluster connect http://localhost:19080 azure servicefabric application package copy MySFApp fabric:ImageStore azure servicefabric application type register MySFApp azure servicefabric application create fabric:/MySFApp MySFApp 1. By default. copy.pfx -out mycert.Upgrading your application The process is similar to the process in Windows).pem -certfile mycert.xml) with the updated version number for the application.0 Make the change to your application and rebuild the modified service. Build. For example.0 --rolling-upgrade-mode UnmonitoredAuto You can now monitor the application upgrade using SFX. use the following command: openssl pkcs12 -in certificate.pfx -inkey mycert. Install it using the following command: . To convert from a PEM file to a PFX file.xml) with the updated versions for the Service (and Code or Config or Data as appropriate).pem -in mycert. copy and register your updated application using the following commands: azure servicefabric cluster connect http://localhost:19080> azure servicefabric application package copy MySFApp fabric:ImageStore azure servicefabric application type register MySFApp Now. Now. while accessing a secured Linux cluster from a Windows machine and vice versa you may need to convert your certificate from PFX to PEM and vice versa. Update the modified service’s manifest file (ServiceManifest. the application would have been updated. In a few minutes. Also modify the application’s manifest (ApplicationManifest. use the following command: openssl pkcs12 -export -out certificate. If your application instance is named fabric:/MySFApp. Converting from PFX to PEM and vice versa You might need to install a certificate in your local machine (with Windows or Linux) to access secure clusters that may be in a different environment. and the modified service. Troubleshooting Copying of the application package does not succeed Check if openssh is installed. 
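The same deployment can also be driven from a Windows machine with the Service Fabric PowerShell module, which can be convenient when you manage a Linux cluster from Windows. The following is a minimal sketch only; the package path, application name, type name, and version are placeholders you would replace with the values from your own ApplicationManifest.xml, and a secure cluster additionally needs the certificate parameters shown earlier.

Connect-ServiceFabricCluster -ConnectionEndpoint <PublicIPorFQDN>:19000

# Copy the package to the image store, register the application type, then create an instance.
Copy-ServiceFabricApplicationPackage -ApplicationPackagePath .\MySFApp `
    -ImageStoreConnectionString fabric:ImageStore -ApplicationPackagePathInImageStore MySFApp
Register-ServiceFabricApplicationType -ApplicationPathInImageStore MySFApp
New-ServiceFabricApplication -ApplicationName 'fabric:/MySFApp' `
    -ApplicationTypeName MySFApp -ApplicationTypeVersion 1.0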
try disabling PAM for ssh by changing the sshd_config file using the following commands: sudo vi /etc/ssh/sshd_config #Change the line with UsePAM to the following: UsePAM no sudo service sshd reload If the problem still persists. Next steps Set up the development environment and deploy a Service Fabric application to a Linux cluster. so use password authentication instead. sudo apt-get install openssh-server openssh-client** If the problem persists. try increasing the number of ssh sessions by executing the following commands: sudo vi /etc/ssh/sshd\_config # Add the following to lines: # MaxSessions 500 # MaxStartups 300:30:500 sudo service sshd reload Using keys for ssh authentication (as opposed to passwords) isn't yet supported (since the platform uses ssh to copy packages). . Service Fabric cluster security scenarios 3/8/2017 • 5 min to read • Edit Online A Service Fabric cluster is a resource that you own.509 server certificates that you specify as a part of the node-type configurations when you create a cluster. You can specify a primary certificate and an optional secondary certificate that is used for certificate rollovers. if it exposes management endpoints to the public Internet. especially when it has production workloads running on it. This article provides an overview of the security scenarios for clusters running on Azure or standalone and the various technologies used to implement those scenarios. Clusters must be secured to prevent unauthorized users from connecting to your cluster. Azure Resource Manager templates.509 certificates Node -to -node windows security For standalone Windows Server read Secure a standalone cluster on Windows using Windows security Client-to-node security . or a standalone JSON template. The cluster security scenarios are: Node-to-node security Client-to-node security Role-based access control (RBAC) Node-to-node security Secures communication between the VMs or machines in the cluster. Clusters running on Azure or standalone clusters running on Windows can use either Certificate Security or Windows Security for Windows Server machines. For standalone Windows Server read Secure a standalone cluster on Windows using X. The primary and secondary certificates you specify should be different than the admin client and read-only client certificates you specify for Client-to-node security. Node -to -node certificate security Service Fabric uses X. This ensures that only computers that are authorized to join the cluster can participate in hosting applications and services in the cluster. A quick overview of what these certificates are and how you can acquire or create them is provided at the end of this article. For Azure read Set up a cluster by using an Azure Resource Manager template to learn how to configure certificate security in a cluster. Certificate security is configured while creating the cluster either through the Azure portal. doing so allows anonymous users to connect to it. Although it is possible to create an unsecured cluster. by default. Clusters running on Azure or standalone clusters running on Windows can use either Certificate Security or Windows Security. Client-to -node certificate security Client-to-node certificate security is configured while creating the cluster either through the Azure portal. Clients connecting to the cluster using the read-only user client certificate have only read access to management capabilities. 
Resource Manager templates or a standalone JSON template by specifying an admin client certificate and/or a user client certificate. and how to connect to those clusters afterwards. The admin client and user client certificates you specify should be different than the primary and secondary certificates you specify for Node-to-node security. Clients connecting to the cluster using the admin certificate have full access to management capabilities. For Azure read Set up a cluster by using an Azure Resource Manager template to learn how to configure certificate security in a cluster. For standalone Windows Server read Secure a standalone cluster on Windows using X. See Set up a cluster by using an Azure Resource Manager template for information on how to create the necessary AAD artifacts. which ensures that only authorized users can access the cluster and the applications deployed on the cluster. Two different access control types are supported for clients connecting to a cluster: Administrator role and User role. For standalone Windows Server clusters it is recommended that you use Windows security with group managed accounts (GMA) if you have Windows Server 2012 R2 and Active Directory.Authenticates clients and secures communication between a client and individual nodes in the cluster. making the cluster more secure. Role based access control (RBAC) Access control allows the cluster administrator to limit access to certain cluster operations for different groups of users.509 certificates Client-to -node Azure Active Directory (AAD) security on Azure Clusters running on Azure can also secure access to the management endpoints using Azure Active Directory (AAD). Administrators have full access to management capabilities (including read/write capabilities). Users. In other words these certificates are used for the role bases access control (RBAC) described later in this article. This type of security authenticates and secures client communications. Clients are uniquely identified through either their Windows Security credentials or their certificate security credentials. how to populate them during cluster creation. Otherwise still use Windows security with Windows accounts. . Security Recommendations For Azure clusters. it is recommended that you use AAD security to authenticate clients and certificates for node- to-node security. . for example. Client X. The client can use such a certificate when mutual authentication is required. it must contain both the common name of the certificate and one entry per subject alternative name. such as "Server Authentication" or "Client Authentication". These are entered as DNS Name values. Server X. Never use any temporary or test certificates in production that are created with tools such as MakeCert. X. query capabilities).) for each. If the optional Subject Alternative Name field is populated.509 certificates Server certificates have the primary task of authenticating a server (node) to clients. Client certificates cannot be used for management. One of the initial checks when a client or node authenticates a node is to check the value of the common name in the Subject field. For more details on these certificates. Next steps This article provides conceptual information about cluster security. see Role based access control for Service Fabric clients. "CN = www. and the ability to resolve applications and services. Azure portal. Instead. go to Working with certificates. 
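For clusters secured with certificates, the admin and read-only user roles are ultimately just entries in the cluster's Resource Manager template. The fragment below is a trimmed, illustrative sketch of a Microsoft.ServiceFabric/clusters resource; the apiVersion is illustrative, the thumbprints are placeholders, and a real template contains many more required properties.

{
  "apiVersion": "2016-09-01",
  "type": "Microsoft.ServiceFabric/clusters",
  "properties": {
    "certificate": {
      "thumbprint": "<cluster and server certificate thumbprint>",
      "x509StoreName": "My"
    },
    "clientCertificateThumbprints": [
      { "certificateThumbprint": "<admin client certificate thumbprint>", "isAdmin": true },
      { "certificateThumbprint": "<read-only user client certificate thumbprint>", "isAdmin": false }
    ]
  }
}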
You specify the administrator and user client roles at the time of cluster creation by providing separate identities (certificates. or authenticating a server (node) to a server (node). You can use a self-signed certificate. AAD etc. For more information on the default access control settings and how to change the default settings.509 digital certificates are commonly used to authenticate clients and servers and to encrypt and digitally sign messages.509 certificates and Service Fabric X. Next. the initialization is "CN" for common name. each prefixed with an initialization to indicate the value type. 1. The following article describes how to generate certificates with subject alternative names (SAN): How to add a subject alternative name to a secure LDAP certificate. The Subject field can contain several values. with an intended purpose of "Client Authentication".have only read access to management capabilities (for example. the Personal store of the current user location typically contains client certificates placed there by a root authority. create a cluster in Azure using a Resource Manager template 2. Most commonly.com".contoso. Either this common name or one of the certificates' subject alternative names must be present in the list of allowed common names. It is also possible for the Subject field to be blank. but should only do so for test clusters and not in production. Some important things to consider: Certificates used in clusters running production workloads should be created by using a correctly configured Windows Server certificate service or obtained from an approved Certificate Authority (CA). NOTE All management operations on a Service Fabric cluster require server certificates.exe. The value of the Intended Purposes field of the certificate should include an appropriate value.509 certificates Client certificates are not typically issued by a third-party certificate authority. When you create a Service Fabric cluster in Azure. Let's look more closely at how and where those VMs are provisioned. a local network or power failure affecting one VM will not affect the other. Service Fabric clusters in Azure are always laid out across five upgrade domains. you are required to choose a region where it will be hosted. allowing Service Fabric to rebalance the work load of the unresponsive machine within the cluster. most notably the number of virtual machines (VMs) requested. Fault domains By default. the VMs in the cluster are evenly spread across logical groups known as fault domains (FDs). if two VMs reside in two distinct FDs. which logically group nodes based on planned maintenance activities. including those that are outside of your control. it is useful to know how clusters are physically laid out in Azure. Disaster recovery in Azure Service Fabric 3/2/2017 • 6 min to read • Edit Online A critical part of delivering a high-availability cloud application is ensuring that it can survive all different types of failures. This article describes the physical layout of an Azure Service Fabric cluster in the context of potential disasters and provides guidance on how to deal with such disasters to limit or eliminate the risk of downtime or data loss. You can visualize the layout of your cluster across fault domains using the cluster map provided in Service Fabric Explorer: NOTE The other axis in the cluster map shows upgrade domains. As a result. Specifically. you can be sure that they do not share the same power source or network switch. Geographic distribution . 
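For a test cluster, you can create a suitable self-signed certificate directly from PowerShell instead of using MakeCert.exe. This is a sketch only; the DNS name is a placeholder for your cluster's address, and certificates created this way should never be used in production.

# Create a self-signed certificate in the current user's Personal store (test clusters only).
$cert = New-SelfSignedCertificate -DnsName mytestcluster.westus.cloudapp.azure.com `
    -CertStoreLocation Cert:\CurrentUser\My -KeyExportPolicy Exportable

# Export it to a password-protected PFX file so it can be uploaded to a key vault later.
$password = ConvertTo-SecureString -String "<password>" -AsPlainText -Force
Export-PfxCertificate -Cert $cert -FilePath C:\certs\mytestcluster.pfx -Password $password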
Physical layout of Service Fabric clusters in Azure To understand the risk posed by different types of failures. which segment the machines based on potential failures in the host hardware. The Azure infrastructure then provisions the resources for that cluster within the region. There are currently 30 Azure regions throughout the world. Indeed. it is preferable to simply wait until the down replicas return. with the impact being specific to the service in question. in addition to hardware failures. there is no guarantee that your cluster's VMs will be evenly spread across those physical locations. Quorum loss If a majority of the replicas for a stateful service's partition go down. you would be able to withstand two . we are choosing to accept a period of unavailability to ensure that clients are not told that their data was saved when in fact it was not. a new primary is elected from the partition's secondary replicas. all VMs for a given cluster are provisioned within a single physical site. currently. Alternatively. it will rejoin the cluster automatically and once again take on its share of the workload. whereas quorum loss in the failover manager service will block new service creation and failovers. When Azure brings the failed machine back up. attempting to repair system services is not recommended. It is helpful to think of the number of replicas you need in terms of the number of unavailable nodes you can tolerate at once while remaining available for writes. among other factors. With a TargetReplicaSetSize of three (one primary and two secondaries). In effect. that even in regions that contain multiple physical data centers. however. An individual region can contain one or more physical data centers depending on demand and the availability of suitable locations. the cluster will continue to operate. with several more announced. For instance. a hardware failure during an upgrade (two replicas down) will result in quorum loss and your service will become read-only. For instance. Note that if you have opted in to allowing reads from secondary replicas for that stateful service. if you have five replicas. Individual machine failures As mentioned. Dealing with failures There are several types of failures that can impact your cluster. if the node was hosting the primary replicas for a partition. Instead. In general." At this point. albeit at lower capacity as stateful replicas get packed into a smaller set of machines and fewer stateless instances are available to spread load. Service Fabric will typically detect the failure within seconds and respond accordingly based on the state of the cluster. there is always the potential for multiple random failures to bring down several machines in a cluster simultaneously. Minimizing the risk of quorum loss You can minimize your risk of quorum loss by increasing the target replica set size for your service. Consider the following examples assuming that you've configured your services to have a MinReplicaSetSize of three. WARNING Performing a repair action while the primary replica is down will result in data loss. A partition remains in quorum loss until a sufficient number of replicas come back or until the cluster administrator forces the system to move on using the Repair-ServiceFabricPartition API. Multiple concurrent machine failures While fault domains significantly reduce the risk of concurrent machine failures. you can continue to perform those read operations while in this state. individual machine failures. 
as long as a majority of the nodes remain available. Note that unlike for your own services. pose no risk on their own. We will look at them in order of likelihood to occur. either within the VM or in the hardware or software hosting it within a fault domain. System services can also suffer quorum loss. that partition enters a state known as "quorum loss. each with its own mitigation. the smallest number recommended for production services. keeping in mind that application or cluster upgrades can make nodes temporarily unavailable. Note. quorum loss in the naming service will impact name resolution. Service Fabric stops allowing writes to that partition to ensure that its state remains consistent and reliable. In the highly unlikely event that an entire physical data center is destroyed. Data center outages or destruction In rare cases.failures during upgrade (three replicas down) as the remaining two replicas can still form a quorum within the minimum replica set. in all cases.FromResult(false). In these cases.ServiceMessage(this. you can view updates on outages on the Azure status page. your Service Fabric clusters and applications will likewise be unavailable but your data will be preserved. For clusters running in Azure."). it is critically important to periodically backup your state to a geo-redundant store and ensure that you have validated the ability to restore it. they contain many general best practices you can apply in the Service Fabric context as well: Availability checklist Performing a disaster recovery drill Disaster recovery and high availability for Azure applications Learn about Service Fabric support options . any Service Fabric clusters hosted there will be lost. However. Next Steps Learn how to simulate various failures using the testability framework Read other disaster-recovery and high-availability resources.Current. "OnDataLoss event received. code defects in services. To protect against this possibility. the recovery strategy is the same: take regular backups of all stateful services and exercise your ability to restore that state. Even if you have not fully implemented backup and restore yet. and security breaches are more common than widespread data center failures. physical data centers can become temporarily unavailable due to loss of power or network connectivity. Microsoft has published a large amount of guidance on these topics. human operational errors. While some of these documents refer to specific techniques for use in other products. } Software failures and other sources of data loss As a cause of data loss. you should implement a handler for the OnDataLoss event so that you can log when it occurs as follows: protected virtual Task<bool> OnDataLoss(CancellationToken cancellationToken) { ServiceEventSource. return Task. How often you perform a backup will be dependent on your recovery point objective (RPO). along with their state. but not on Linux. Standalone installer isn't available on Linux. Reliable Collections (and Reliable Stateful Services) are not supported on Linux. Jenkins. the feature sets will be at parity when Service Fabric on Linux becomes generally available. Powershell. Eclipse. and LTTng used on Linux. even on Windows. XML schema validation for manifest files is not performed on Linux. there are some features that are supported on Windows. Differences between Service Fabric on Linux (preview) and Windows (generally available) 3/28/2017 • 1 min to read • Edit Online Since Service Fabric on Linux is a preview. 
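Beyond handling OnDataLoss, the simplest protection is to take periodic backups from within the service itself. The following is a minimal sketch only, assuming a C# Reliable Services stateful service; UploadToExternalStoreAsync is a placeholder for whatever geo-redundant store you use, and the one-hour cadence should be driven by your RPO.

using System;
using System.Fabric;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Data;
using Microsoft.ServiceFabric.Services.Runtime;

internal sealed class MyStatefulService : StatefulService
{
    public MyStatefulService(StatefulServiceContext context) : base(context) { }

    protected override async Task RunAsync(CancellationToken cancellationToken)
    {
        while (true)
        {
            cancellationToken.ThrowIfCancellationRequested();

            // Take a full backup of this replica's reliable state.
            var description = new BackupDescription(BackupOption.Full, BackupCallbackAsync);
            await this.BackupAsync(description);

            // Backup frequency is dictated by your recovery point objective (RPO).
            await Task.Delay(TimeSpan.FromHours(1), cancellationToken);
        }
    }

    private async Task<bool> BackupCallbackAsync(BackupInfo backupInfo, CancellationToken cancellationToken)
    {
        // backupInfo.Directory is a local folder produced by the runtime;
        // copy it to geo-redundant storage (for example, Azure Blob storage).
        await UploadToExternalStoreAsync(backupInfo.Directory, cancellationToken);
        return true; // true tells the runtime this backup was safely persisted
    }

    private Task UploadToExternalStoreAsync(string backupDirectory, CancellationToken cancellationToken)
    {
        // Placeholder: upload logic for your chosen store goes here.
        return Task.CompletedTask;
    }
}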
Console redirection isn't supported on Linux. Only a subset of Powershell commands can be run against a Linux cluster (as expanded in the next section). The development tooling is different with VisualStudio. Eventually. ReverseProxy isn't available on Linux. Powershell cmdlets that do not work against a Linux Service Fabric cluster Invoke-ServiceFabricChaosTestScenario Invoke-ServiceFabricFailoverTestScenario Invoke-ServiceFabricPartitionDataLoss Invoke-ServiceFabricPartitionQuorumLoss Restart-ServiceFabricPartition Start-ServiceFabricNode Stop-ServiceFabricNode Get-ServiceFabricImageStoreContent Get-ServiceFabricChaosReport Get-ServiceFabricNodeTransitionProgress Get-ServiceFabricPartitionDataLossProgress Get-ServiceFabricPartitionQuorumLossProgress Get-ServiceFabricPartitionRestartProgress Get-ServiceFabricTestCommandStatusList Remove-ServiceFabricTestState Start-ServiceFabricChaos Start-ServiceFabricNodeTransition Start-ServiceFabricPartitionDataLoss . and ETW being used on Windows and Yeoman. Azure Active Directory support isn't available on Linux. NOTE Console redirection isn't supported in production clusters. VSTS. The Fault Analysis Service (FAS) isn't available on Linux. Some CLI command equivalents of Powershell commands aren't available. Start-ServiceFabricPartitionQuorumLoss Start-ServiceFabricPartitionRestart Stop-ServiceFabricChaos Stop-ServiceFabricTestCommand Cmd Get-ServiceFabricNodeConfiguration Get-ServiceFabricClusterConfiguration Get-ServiceFabricClusterConfigurationUpgradeStatus Get-ServiceFabricPackageDebugParameters New-ServiceFabricPackageDebugParameter New-ServiceFabricPackageSharingPolicy Add-ServiceFabricNode Copy-ServiceFabricClusterPackage Get-ServiceFabricRuntimeSupportedVersion Get-ServiceFabricRuntimeUpgradeVersion New-ServiceFabricCluster New-ServiceFabricNodeConfiguration Remove-ServiceFabricCluster Remove-ServiceFabricClusterPackage Remove-ServiceFabricNodeConfiguration Test-ServiceFabricClusterManifest Test-ServiceFabricConfiguration Update-ServiceFabricNodeConfiguration Approve-ServiceFabricRepairTask Complete-ServiceFabricRepairTask Get-ServiceFabricRepairTask Invoke-ServiceFabricDecryptText Invoke-ServiceFabricEncryptSecret Invoke-ServiceFabricEncryptText Invoke-ServiceFabricInfrastructureCommand Invoke-ServiceFabricInfrastructureQuery Remove-ServiceFabricRepairTask Start-ServiceFabricRepairTask Stop-ServiceFabricRepairTask Update-ServiceFabricRepairTaskHealthPolicy Next steps Prepare your development environment on Linux Prepare your development environment on OSX Create and deploy your first Service Fabric Java application on Linux using Yeoman Create and deploy your first Service Fabric Java application on Linux using Service Fabric Plugin for Eclipse Create your first CSharp application on Linux Use the Azure CLI to manage your Service Fabric applications . NOTE For more advanced security options. Authenticate administrators using certificates. please see Creating secure clusters on Linux. This guide walks you through the following steps: Set up Key Vault to manage keys for cluster security. Service Fabric uses X. it is highly recommended to create a secure cluster. The parameters obtained by the helper script provided can be input directly into the portal as described in the section Create a cluster in the Azure portal. Although it is possible to create an unsecure cluster. such as user authentication with Azure Active Directory and setting up certificates for application security. upgrading. 
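If you want to see exactly which cmdlets your installed SDK provides, and cross-check them against the list above, you can enumerate the ServiceFabric PowerShell module from any PowerShell prompt; a small sketch:

# List every cmdlet in the Service Fabric PowerShell module that ships with the SDK.
Get-Command -Module ServiceFabric | Sort-Object Name

# Narrow the list to the testability-related cmdlets called out above.
Get-Command -Module ServiceFabric -Name *Chaos*, *Transition*, *TestCommand*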
The concepts are the same for creating secure clusters. the Azure resource provider responsible . For a complete guide on Key Vault. see the Key Vault getting started guide. When a cluster is deployed in Azure. and deleting applications.a new cluster must be created. Create a secured cluster in Azure through the Azure portal. An unsecure cluster cannot be secured later . An unsecure cluster is a cluster that anyone can connect to at any time and perform management operations. log in to your Azure account and select your subscription before executing Azure commands. services. which includes deploying. For more information and helper scripts for creating secure Linux clusters. whether the clusters are Linux clusters or Windows clusters. and the data they contain. Create a Service Fabric cluster in Azure using the Azure portal 3/8/2017 • 13 min to read • Edit Online This is a step-by-step guide that walks you through the steps of setting up a secure Service Fabric cluster in Azure using the Azure portal.509 certificates to secure a cluster. create your cluster using Azure Resource Manager. Log in to Azure This guide uses Azure PowerShell. Azure Key Vault is used to manage certificates for Service Fabric clusters in Azure. Log in to your azure account: Login-AzureRmAccount Select your subscription: Get-AzureRmSubscription Set-AzureRmContext -SubscriptionId <guid> Set up Key Vault This part of the guide walks you through creating a Key Vault for a Service Fabric cluster in Azure and for Service Fabric applications. When starting a new PowerShell session. A secure cluster is a cluster that prevents unauthorized access to management operations. Putting Key Vault into its own resource group is recommended so that you can remove compute and storage resource groups . ResourceGroupName : mycluster-keyvault Location : westus ProvisioningState : Succeeded Tags : ResourceId : /subscriptions/<guid>/resourceGroups/mycluster-keyvault Create Key Vault Create a Key Vault in the new resource group. and the Azure resource provider that uses certificates stored in Key Vault when it creates a cluster: Create a Resource Group The first step is to create a new resource group specifically for Key Vault.for creating Service Fabric clusters pulls certificates from Key Vault and installs them on the cluster VMs. The Key Vault must be enabled for deployment to allow the Service Fabric resource provider to get certificates from it and install on cluster nodes: .without losing your keys and secrets. a Service Fabric cluster.such as the resource group that has your Service Fabric cluster . PS C:\Users\vturecek> New-AzureRmResourceGroup -Name mycluster-keyvault -Location 'West US' WARNING: The output object type of this cmdlet will be modified in a future release. The following diagram illustrates the relationship between Key Vault. The resource group that has your Key Vault must be in the same region as the cluster that is using it. list. exportable to a Personal Information Exchange (. This is required to provide SSL for the cluster's HTTPS management endpoints and Service Fabric Explorer. You cannot obtain an SSL certificate from a certificate authority (CA) for the . . The certificate must be created for key exchange. backup. delete. 
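One way to satisfy this requirement from Azure PowerShell is shown below as a sketch, using the myvault and mycluster-keyvault example names from this walkthrough; the CLI alternative follows.

# Check whether the vault is already enabled for deployment.
Get-AzureRmKeyVault -VaultName 'myvault' -ResourceGroupName 'mycluster-keyvault' |
    Select-Object VaultName, EnabledForDeployment

# Enable it so the Service Fabric resource provider can install certificates on the cluster VMs.
Set-AzureRmKeyVaultAccessPolicy -VaultName 'myvault' -ResourceGroupName 'mycluster-keyvault' -EnabledForDeployment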
you can enable it for deployment using Azure CLI: > azure login > azure account set "your account" > azure config mode arm > azure keyvault list > azure keyvault set-policy --vault-name "your vault name" --enabled-for-deployment true Add certificates to Key Vault Certificates are used in Service Fabric to provide authentication and encryption to secure various aspects of a cluster and its applications.KeyVault/vaults/myvault Vault URI : https://myvault.vault. It provides cluster security in a couple ways: Cluster authentication: Authenticates node-to-node communication for cluster federation. To serve these purposes. The certificate's subject name must match the domain used to access the Service Fabric cluster. This certificate also provides SSL for the HTTPS management API and for Service Fabric Explorer over HTTPS. the certificate must meet the following requirements: The certificate must contain a private key. PS C:\Users\vturecek> New-AzureRmKeyVault -VaultName 'myvault' -ResourceGroupName 'mycluster-keyvault' -Location 'West US' -EnabledForDeployment Vault Name : myvault Resource Group Name : mycluster-keyvault Location : West US Resource ID : /subscriptions/<guid>/resourceGroups/mycluster- keyvault/providers/Microsoft. update.net Tenant ID : <guid> SKU : Standard Enabled For Deployment? : False Enabled For Template Deployment? : False Enabled For Disk Encryption? : False Access Policies : Tenant ID : <guid> Object ID : <guid> Application ID : Display Name : Permissions to Keys : get. restore Permissions to Secrets : all Tags : If you have an existing Key Vault.pfx) file. Only nodes that can prove their identity with this certificate can join the cluster. import. create.azure.azure.com domain. so that the management client knows it is talking to the real cluster. For more information on how certificates are used in Service Fabric. Cluster and server certificate (required) This certificate is required to secure a cluster and prevent unauthorized access to it. see Service Fabric cluster security scenarios. Server authentication: Authenticates the cluster management endpoints to a management client.cloudapp. However. These certificates only need to be provided to users who are authorized for cluster management. Download the entire contents of the repo into a local directory. For additional user-level access. To use Azure Active Directory. consider the application security scenarios that require a certificate to be installed on the nodes. a PowerShell module is available on GitHub. the Azure resource provider requires keys to be stored in a special JSON format that includes the . Use it to add the cluster certificate and any additional application certificates to Key Vault. Service Fabric has two access levels: admin and read-only user.psm1" The Invoke-AddCertToKeyVault command in this PowerShell module automatically formats a certificate private key into a JSON string and uploads it to Key Vault. such as: Encryption and decryption of application configuration values Encryption of data across nodes during replication Application certificates cannot be configured when creating a cluster through the Azure portal. When you request a certificate from a CA the certificate's subject name must match the custom domain name used for your cluster. Client authentication certificates Additional client certificates authenticate administrators for cluster management tasks. To accommodate these requirements. To configure application certificates at cluster setup time. 
you must create a cluster using Azure Resource Manager. Acquire a custom domain name for your cluster. Before creating your cluster.pfx as a base-64 encoded string and the private key password. a separate certificate must be provided. 2. Formatting certificates for Azure resource provider use Private key files (. To make this process easier. You can also add application certificates to the cluster after it has been created. NOTE Azure Active Directory is the recommended way to authenticate clients for cluster management operations. Follow these steps to use the module: 1. Application certificates (optional) Any number of additional certificates can be installed on a cluster for application security purposes. You do not need to upload Client authentication certificates to Key Vault to work with Service Fabric. keys must be placed in a JSON string and then stored as secrets in Key Vault. At minimum. a single certificate for administrative access should be used. see role-based access control for Service Fabric clients. For more information on access roles. . you must create a cluster using Azure Resource Manager.pfx) can be added and used directly through Key Vault. Repeat this step for any additional certificates you want to install in your cluster. Import the module in your PowerShell window: PS C:\Users\vturecek> Import-Module "C:\users\vturecek\Documents\ServiceFabricRPHelpers\ServiceFabricRPHelpers. Sign in to the Azure portal.KeyVault/vaults/myvault Name : CertificateURL Value : https://myvault. Click New to add a new resource template. Search for the Service Fabric Cluster template in the Marketplace under Everything. Select Service Fabric Cluster from the list.509 certificates. At this point.azure.vault.pfx Writing secret to myvault in vault myvault Name : CertificateThumbprint Value : <value> Name : SourceVault Value : /subscriptions/<guid>/resourceGroups/mycluster- keyvault/providers/Microsoft.net:443/secrets/mycert/4d087088df974e869f1c0978cb100e47 These are all the Key Vault prerequisites for configuring a Service Fabric cluster Resource Manager template that installs certificates for node authentication. and any additional application security features that use X. management endpoint security and authentication. PS C:\Users\vturecek> Invoke-AddCertToKeyVault -SubscriptionId <guid> -ResourceGroupName mycluster- keyvault -Location "West US" -VaultName myvault -CertificateName mycert -Password "<password>" - UseExistingCertificate -ExistingPfxFilePath "C:\path\to\mycertkey. 3. you should now have the following setup in Azure: Key Vault resource group Key Vault Cluster server authentication certificate Create cluster in the Azure portal Search for the Service Fabric cluster resource 1. Using existing valut myvault in West US Reading pfx file from C:\path\to\key. .pfx" Switching context to SubscriptionId <guid> Ensuring ResourceGroup mycluster-keyvault in West US WARNING: The output object type of this cmdlet will be modified in a future release. 2. click Create. 4. Make sure to select the Subscription that you want your cluster to be deployed to. Enter the name of your cluster. 2.4. since it helps in finding them later. 3. Create a new resource group. NOTE Although you can decide to use an existing resource group. 2. especially when you are trying to make changes to your deployment or delete your cluster. Cluster configuration . especially if you have multiple subscriptions. it is a good practice to create a new resource group. 
Select the region in which you want to create the cluster. 5. This makes it easy to delete clusters that you do not need. 1. The Create Service Fabric cluster blade has the following four steps. Enter a user name and password for Remote Desktop for the VMs. It is best to give it the same name as the cluster. Basics In the Basics blade you need to provide the basic details for your cluster. You must use the same region that your Key Vault is in. 5. Navigate to the Service Fabric Cluster blade. 1. D-series VMs have SSD drives and are highly recommended for stateful applications.Configure your cluster nodes. as this is the node type where Service Fabric system services are placed. 1. For more information on durability. You want to put the front-end service on smaller VMs (VM sizes like D2) with ports open to the Internet. 2. 3. but on the primary node type. 4. . NOTE A common scenario for multiple node types is an application that contains a front-end service and a back-end service. You can scale up or down the number of VMs in a node type later on. but the primary node type (the first one that you define on the portal) must have at least five VMs. see how to choose the Service Fabric cluster reliability and durability. Select the VM size and pricing tier. The minimum size of VMs for the primary node type is driven by the durability tier you choose for the cluster. Do not use any VM SKU that has partial cores or have less than 7 GB of available disk capacity. the number of VMs. and their properties. Choose a name for your node type (1 to 12 characters containing only letters and numbers). Choose the number of VMs for the node type. 5. and so on) with no Internet-facing ports open. Do not configure Placement Properties because a default placement property of "NodeTypeName" is added automatically. The default for the durability tier is bronze. see how to choose the Service Fabric cluster reliability and durability. Your cluster can have more than one node type. D15. For more information on reliability. The default for the reliability tier is Silver. D6. but you want to put the back-end service on larger VMs (with VM sizes like D4. The minimum number of VMs for the primary node type is driven by the reliability tier you choose. the minimum is driven by the reliability level that you have chosen. Node types define the VM sizes. 6. By default. For example. Select the Fabric upgrade mode you want set your cluster to. you are taking on the responsibility to upgrade your cluster to a supported version. NOTE We support only clusters that are running supported versions of service Fabric. Set the mode to Manual. if you plan to deploy a web application to your cluster. Populate the primary certificate fields with the output obtained from uploading the cluster certificate to . Configure custom endpoints. if you want to choose a supported version. 8. diagnostics are enabled on your cluster to assist with troubleshooting issues. This field allows you to enter a comma-separated list of ports that you want to expose through the Azure Load Balancer to the public Internet for your applications. enter "80" here to allow traffic on port 80 into your cluster. By selecting the Manual mode. Select Automatic. Turning off diagnostics is not recommended. Other node types can have a minimum of 1 VM. 3. For more details on the Fabric upgrade mode see the service-fabric-cluster-upgrade document. if you want the system to automatically pick up the latest available version and try to upgrade your cluster to it. 
For more information on endpoints. see communicating with applications 7. Security The final step is to provide certificate information to secure the cluster using the Key Vault and certificate information created earlier. Configure cluster diagnostics. If you want to disable diagnostics change the Status toggle to Off. enter the thumbprint of your admin client certificate and the thumbprint of your read- only user client certificate. You can see the creation progress in the notifications. View your cluster status . you will see Deploying Service Fabric Cluster pinned to the Start board. Name : CertificateThumbprint Value : <value> Name : SourceVault Value : /subscriptions/<guid>/resourceGroups/mycluster- keyvault/providers/Microsoft. the OK button becomes green and you can start the cluster creation process by clicking it. if applicable. they are granted access only if they have a certificate with a thumbprint that matches the thumbprint values entered here.) If you clicked Pin to Startboard while creating the cluster. When administrators attempt to connect to the cluster. Key Vault using the Invoke-AddCertToKeyVault PowerShell command. click Summary to see the configurations that you have provided.vault. 4. (Click the "Bell" icon near the status bar at the upper right of your screen. Summary To complete the cluster creation.azure. or download the Azure Resource Manager template that that used to deploy your cluster. In these fields.net:443/secrets/mycert/4d087088df974e869f1c0978cb100e47 Check the Configure advanced settings box to enter client certificates for admin client and read-only client.KeyVault/vaults/myvault Name : CertificateURL Value : https://myvault. After you have provided the mandatory settings. you have a secure cluster using certificates for management authentication. 2. You can now see the details of your cluster in the dashboard. Go to Browse and click Service Fabric Clusters. The Node Monitor section on the cluster's dashboard blade indicates the number of VMs that are healthy and not healthy. connect to your cluster and learn how to manage application secrets. you can inspect your cluster in the portal: 1. Remote connect to a Virtual Machine Scale Set instance or a cluster node Each of the NodeTypes you specify in your cluster results in a VM Scale Set getting set-up. . You can find more details about the cluster's health at Service Fabric health model introduction. 3. See Remote connect to a VM Scale Set instance for details. it is typically not safe to shut down all machines in the cluster unless you have first performed a full backup of your state. learn about Service Fabric support options. Therfore. Also. including the cluster's public endpoint and a link to Service Fabric Explorer.Once your cluster is created. Locate your cluster and click it. NOTE Service Fabric clusters require a certain number of nodes to be up always to maintain availability and preserve state - referred to as "maintaining quorum". Next. Next steps At this point. Service Fabric uses X. Sign in to your Azure account This guide uses Azure PowerShell. whether they are Linux or Windows clusters. upgrading. When you start a new PowerShell session. Nevertheless. and deleting applications. The following diagram illustrates the relationship between Azure Key Vault.509 certificates to secure a cluster and provide application security features. the Azure resource provider that's responsible for creating Service Fabric clusters pulls certificates from Key Vault and installs them on the cluster VMs. 
Although it is possible to create an unsecure cluster. and the data they contain. see Creating secure clusters on Linux. services. a Service Fabric cluster. sign in to your Azure account and select your subscription before you execute Azure commands. we highly recommend that you create a secure cluster from the outset. The guide covers the following procedures: Setting up an Azure key vault to upload certificates for cluster and application security Creating a secured cluster in Azure by using Azure Resource Manager Authenticating users by using Azure Active Directory (Azure AD) for cluster management A secure cluster is a cluster that prevents unauthorized access to management operations. You use Key Vault to manage certificates for Service Fabric clusters in Azure. For a complete guide to Azure Key Vault. An unsecure cluster is a cluster that anyone can connect to at any time and perform management operations. refer to the Key Vault getting started guide. We acknowledge that the article is long. a new cluster must be created. Sign in to your Azure account: Login-AzureRmAccount Select your subscription: Get-AzureRmSubscription Set-AzureRmContext -SubscriptionId <guid> Set up a key vault This section discusses creating a key vault for a Service Fabric cluster in Azure and for Service Fabric applications. and the Azure resource provider that uses certificates stored in a key vault when it creates a cluster: . be sure to follow each step carefully. For more information and helper scripts for creating secure Linux clusters. When a cluster is deployed in Azure. Because an unsecure cluster cannot be secured later. The concept of creating secure clusters is the same. Create a Service Fabric cluster by using Azure Resource Manager 3/2/2017 • 20 min to read • Edit Online This step-by-step guide walks you through setting up a secure Azure Service Fabric cluster in Azure by using Azure Resource Manager. This includes deploying. unless you are already thoroughly familiar with the content. This action lets you remove the compute and storage resource groups. ResourceGroupName : westus-mykeyvault Location : westus ProvisioningState : Succeeded Tags : ResourceId : /subscriptions/<guid>/resourceGroups/westus-mykeyvault Create a key vault in the new resource group The key vault must be enabled for deployment to allow the compute resource provider to get certificates from it and install it on virtual machine instances: . New-AzureRmResourceGroup -Name westus-mykeyvault -Location 'West US' The output should look like this: WARNING: The output object type of this cmdlet is going to be modified in a future release. The resource group that contains your key vault must be in the same region as the cluster that is using it. including the resource group that contains your Service Fabric cluster.Create a resource group The first step is to create a resource group specifically for your key vault. we suggest that you name the resource group and the key vault in a way that indicates which region it belongs to. If you plan to deploy clusters in multiple regions. without losing your keys and secrets. We recommend that you put the key vault into its own resource group. net Tenant ID : <guid> SKU : Standard Enabled For Deployment? : False Enabled For Template Deployment? : False Enabled For Disk Encryption? : False Access Policies : Tenant ID : <guid> Object ID : <guid> Application ID : Display Name : Permissions to Keys : get. . Only nodes that can prove their identity with this certificate can join the cluster.azure. 
you must enable it for deployment to allow the compute resource provider to get certificates from it and install it on cluster nodes: Set-AzureRmKeyVaultAccessPolicy -VaultName 'ContosoKeyVault' -EnabledForDeployment Add certificates to your key vault Certificates are used in Service Fabric to provide authentication and encryption to secure various aspects of a cluster and its applications. see Service Fabric cluster security scenarios.KeyVault/vaults/mywestusvault Vault URI : https://mywestusvault. update. backup. Cluster and server certificate (required) This certificate is required to secure a cluster and prevent unauthorized access to it. the certificate must meet the following requirements: The certificate must contain a private key. Server authentication: Authenticates the cluster management endpoints to a management client. delete. so that the management client knows it is talking to the real cluster. For more information on how certificates are used in Service Fabric. create. import. It provides cluster security in two ways: Cluster authentication: Authenticates node-to-node communication for cluster federation. This certificate also provides an SSL for the HTTPS management API and for Service Fabric Explorer over HTTPS.vault. To serve these purposes. New-AzureRmKeyVault -VaultName 'mywestusvault' -ResourceGroupName 'westus-mykeyvault' -Location 'West US' -EnabledForDeployment The output should look like this: Vault Name : mywestusvault Resource Group Name : westus-mykeyvault Location : West US Resource ID : /subscriptions/<guid>/resourceGroups/westus- mykeyvault/providers/Microsoft. restore Permissions to Secrets : all Tags : Use an existing key vault To use an existing key vault. list. 2.. Use the command to add the cluster certificate and any additional application certificates to the key vault. it usually means that you have a resource URL conflict. However.\ServiceFabricRPHelpers\ServiceFabricRPHelpers.pfx" If you get an error. do the following: 1.psm1" The Invoke-AddCertToKeyVault command in this PowerShell module automatically formats a certificate private key into a JSON string and uploads it to the key vault. the output should look like this: .cloudapp..pfx file as a base 64-encoded string and the private key password. Import the ServiceFabricRPHelpers module in your PowerShell window: Import-Module "C:\. To resolve the conflict. The certificate's subject name must match the domain that you use to access the Service Fabric cluster. You must obtain a custom domain name for your cluster. Before creating your cluster. 3. Encryption of data across nodes during replication.pfx) directly through your key vault.com domain. a PowerShell module is available on GitHub.. such as the one shown here. + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + CategoryInfo : CloseError: (:) [Set-AzureKeyVaultSecret].vault.SetAzureKeyVaultSecret After the conflict is resolved. Uploading an existing certificate Invoke-AddCertToKeyVault -SubscriptionId <guid> -ResourceGroupName westus-mykeyvault -Location "West US" -VaultName mywestusvault -CertificateName mycert -Password "<password>" -UseExistingCertificate - ExistingPfxFilePath "C:\path\to\mycertkey.Commands.pfx) file. Application certificates (optional) Any number of additional certificates can be installed on a cluster for application security purposes. This matching is required to provide an SSL for the cluster's HTTPS management endpoints and Service Fabric Explorer. Go to the local directory.Azure. 
such as: Encryption and decryption of application configuration values. You cannot obtain an SSL certificate from a certificate authority (CA) for the .psm1:440 char:11 + $secret = Set-AzureKeyVaultSecret -VaultName $VaultName -Name $Certif . consider the application security scenarios that require a certificate to be installed on the nodes. change the key vault name. which is exportable to a Personal Information Exchange (. Repeat this step for any additional certificates you want to install in your cluster. The format includes the . When you request a certificate from a CA.azure. To accommodate these requirements. the Azure compute resource provider requires keys to be stored in a special JavaScript Object Notation (JSON) format. Set-AzureKeyVaultSecret : The remote name could not be resolved: 'westuskv. The certificate must be created for key exchange.net' At C:\Users\chackdan\Documents\GitHub\Service- Fabric\Scripts\ServiceFabricRPHelpers\ServiceFabricRPHelpers. the certificate's subject name must match the custom domain name that you use for your cluster. WebException + FullyQualifiedErrorId : Microsoft. To use the module. the keys must be placed in a JSON string and then stored as "secrets" in the key vault.KeyVault. Download the entire contents of the repo into a local directory. Formatting certificates for Azure resource provider use You can add and use private key files (.azure. To make this process easier. Azure.pfx Writing secret to mywestusvault in vault mywestusvault Name : CertificateThumbprint Value : E21DBC64B183B5BF355C34C46E03409FEEAEF58D Name : SourceVault Value : /subscriptions/<guid>/resourceGroups/westus- mykeyvault/providers/Microsoft. it usually means that you have a resource URL conflict. skip this step.PFX to be stored Invoke-AddCertToKeyVault -SubscriptionId $SubID -ResourceGroupName $ResouceGroup -Location $locationRegion -VaultName $VName -CertificateName $newCertName -CreateSelfSignedCertificate -DnsName $dnsName -OutputPath $localCertPath If you get an error.azure. WebException + FullyQualifiedErrorId : Microsoft. CertificateThumbprint. SourceVault. To resolve the conflict.SetAzureKeyVaultSecret After the conflict is resolved.mycluster.net' At C:\Users\chackdan\Documents\GitHub\Service- Fabric\Scripts\ServiceFabricRPHelpers\ServiceFabricRPHelpers. and CertificateURL. and so forth. Set-AzureKeyVaultSecret : The remote name could not be resolved: 'westuskv.mydomain.westus. you should be prompted for a certificate password.vault.KeyVault/vaults/mywestusvault Name : CertificateURL Value : https://mywestusvault.vault. + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + CategoryInfo : CloseError: (:) [Set-AzureKeyVaultSecret]..psm1:440 char:11 + $secret = Set-AzureKeyVaultSecret -VaultName $VaultName -Name $Certif . the output should look like this: .net:443/secrets/mycert/4d087088df974e869f1c0978cb100e47 NOTE You need the three preceding strings. After you change the parameters in the following script and then run it. If you do not save the strings. such as the one shown here. Creating a self-signed certificate and uploading it to the key vault If you have already uploaded your certificates to the key vault.. it can be difficult to retrieve them by querying the key vault later. $ResouceGroup = "chackowestuskv" $VName = "chackokv2" $SubID = "6c653126-e4ba-42cd-a1dd-f7bf96ae7a47" $locationRegion = "westus" $newCertName = "chackotestcertificate1" $dnsName = "www. 
Switching context to SubscriptionId <guid> Ensuring ResourceGroup westus-mykeyvault in West US WARNING: The output object type of this cmdlet is going to be modified in a future release. RG name. $localCertPath = "C:\MyCertificates" # location where you want the . change the key vault name. Using existing value mywestusvault in West US Reading pfx file from C:\path\to\key. This step is for generating a new self-signed certificate and uploading it to your key vault.Commands.azure. to set up a secure Service Fabric cluster and to obtain any application certificates that you might be using for application security.KeyVault.com" #The certificate's subject name must match the domain used to access the Service Fabric cluster. including the web-based Service Fabric Explorer and Visual Studio.azure. you should have the following elements in place: The key vault resource group. the values should be planned and not values that you have already created.pfx Reading pfx file from C:\MyCertificates\chackonewcertificate1. If you have not. As a result. it can be difficult to retrieve them by querying the key vault later. The cluster server authentication certificate and its URL in the key vault. start by reading How to get an Azure Active Directory tenant. 1.KeyVault/vaults/westuskv1 Name : CertificateURL Value : https://westuskv1. to set up a secure Service Fabric cluster and to obtain any application certificates that you might be using for application security. Because the scripts expect cluster names and endpoints. . To simplify some of the steps involved in configuring Azure AD with a Service Fabric cluster. PS C:\Users\chackdan\Documents\GitHub\Service-Fabric\Scripts\ServiceFabricRPHelpers> Invoke- AddCertToKeyVault -SubscriptionId $SubID -ResourceGroupName $ResouceGroup -Location $locationRegion - VaultName $VName -CertificateName $newCertName -Password $certPassword -CreateSelfSignedCertificate - DnsName $dnsName -OutputPath $localCertPath Switching context to SubscriptionId 6c343126-e4ba-52cd-a1dd-f8bf96ae7a47 Ensuring ResourceGroup chackowestuskv in westus WARNING: The output object type of this cmdlet will be modified in a future release. we assume that you have already created a tenant. we have created a set of Windows PowerShell scripts.net:443/secrets/chackonewcertificate1/ee247291e45d405b8c8bbf81782d12bd NOTE You need the three preceding strings. one web application and one native application. SourceVault.vault. and CertificateURL. you create two Azure AD applications to control access to the cluster.pfx Writing secret to chackonewcertificate1 in vault westuskv1 Name : CertificateThumbprint Value : 96BB3CC234F9D43C25D4B547sd8DE7B569F413EE Name : SourceVault Value : /subscriptions/6c653126-e4ba-52cd-a1dd- f8bf96ae7a47/resourceGroups/chackowestuskv/providers/Microsoft. In this article. A Service Fabric cluster offers several entry points to its management functionality. If you do not save the strings. Applications are divided into those with a web-based sign-in UI and those with a native client experience. The application certificates and their URLs in the key vault. NOTE You must complete the following steps before you create the cluster. CertificateThumbprint. At this point. Set up Azure Active Directory for client authentication Azure AD enables organizations (known as tenants) to manage user access to applications. The key vault and its URL (called SourceVault in the preceding PowerShell output). Download the scripts to your computer. 
Creating new vault westuskv1 in westus Creating new self signed certificate at C:\MyCertificates\chackonewcertificate1. ClusterName is used to prefix the Azure AD applications that are created by the script.ps1 . Extract the zip file. select the Unblock check box. These templates can be used as a starting point for your cluster template. 4. ClusterName.com:19080/Explorer/index. Create a Service Fabric cluster Resource Manager template In this section. Add certificates You add certificates to a cluster Resource Manager template by referencing the key vault that contains the certificate keys. Doing so keeps the Resource Manager template file reusable and free of values specific to a deployment.ps1 -TenantId '690ec069-8200-4068-9d01-5aaf188e557a' -ClusterName 'mycluster' -WebApplicationReplyUrl 'https://mycluster. the script creates the web and native applications to represent your Service Fabric cluster. Right-click the zip file. Create the Resource Manager template This guide uses the 5-node secure cluster example template and template parameters.azure. 3. which by default is: https://<cluster_domain>:19080/Explorer You are prompted to sign in to an account that has administrative privileges for the Azure AD tenant. "clusterApplication":"<guid>". It is intended only to make it easier to map Azure AD artifacts to the Service Fabric cluster that they're being used with. "azureActiveDirectory": { "tenantId":"<guid>".json to your computer and open both files in your favorite text editor. and then click Apply.html' You can find your TenantId by executing the PowerShell command Get-AzureSubscription . so it's a good idea to keep the PowerShell window open. We recommend that you place the key-vault values in a Resource Manager template parameters file. . If you look at the tenant's applications in the Azure classic portal. For example: . the outputs of the preceding PowerShell commands are used in a Service Fabric cluster Resource Manager template.parameters. select Properties.\SetupApplications.2. you should see two new entries: ClusterName_Cluster ClusterName_Client The script prints the JSON required by the Azure Resource Manager template when you create the cluster in the next section. Sample Resource Manager templates are available in the Azure quick-start template gallery on GitHub. WebApplicationReplyUrl is the default endpoint that Azure AD returns to your users after they finish signing in. Run SetupApplications.cloudapp. and provide the TenantId. After you sign in. Download azuredeploy. Executing this command displays the TenantId for every subscription. and WebApplicationReplyUrl as parameters. It does not need to match the actual cluster name exactly. Set this endpoint as the Service Fabric Explorer endpoint for your cluster.json and azuredeploy. "clientApplication":"<guid>" }.westus. . "osProfile": { . "certificateUrl": "[parameters('applicationCertificateUrlValue')]" }.. This action instructs the resource provider to install the certificate on the VMs.ServiceFabric/clusters) and the Service Fabric extension for virtual machine scale sets in the virtual machine scale set resource.... . This arrangement allows the Service Fabric resource provider to configure it for use for cluster authentication and server authentication for management endpoints. . ] } ] } } } Configure the Service Fabric cluster certificate The cluster authentication certificate must be configured in both the Service Fabric cluster resource (Microsoft. 
"secrets": [ { "sourceVault": { "id": "[parameters('sourceVaultValue')]" }. "vaultCertificates": [ { "certificateStore": "[parameters('clusterCertificateStorevalue')]".. This installation includes both the cluster certificate and any application security certificates that you plan to use for your applications: { "apiVersion": "2016-03-30".Compute/virtualMachineScaleSets". "type": "Microsoft.Add all certificates to the virtual machine scale set osProfile Every certificate that's installed in the cluster must be configured in the osProfile section of the scale set resource (Microsoft. "certificateUrl": "[parameters('clusterCertificateUrlValue')]" }... Vi r t u a l m a c h i n e sc a l e se t r e so u r c e : .Compute/virtualMachineScaleSets). "properties": { . { "certificateStore": "[parameters('applicationCertificateStorevalue')". . { "apiVersion": "2016-03-30". } } } ] } } } } Se r v i c e F a b r i c r e so u r c e : { "apiVersion": "2016-03-01". "x509StoreName": "[parameters('clusterCertificateStoreValue')]" }...Compute/virtualMachineScaleSets". . } } Insert Azure AD configuration The Azure AD configuration that you created earlier can be inserted directly into your Resource Manager template. . . "x509StoreName": "[parameters('clusterCertificateStoreValue')]" }. "type": "Microsoft. "name": "[parameters('clusterName')]". we recommended that you first extract the values into a parameters file to keep the Resource Manager template reusable and free of values specific to a deployment. "settings": { ... However.. "virtualMachineProfile": { "extensionProfile": { "extensions": [ { "name": "[concat('ServiceFabricNodeVmExt'. "location": "[parameters('clusterLocation')]". variables('supportLogStorageAccountName'))]" ].. "type": "Microsoft. "properties": { .. "dependsOn": [ "[concat('Microsoft. "properties": { "certificate": { "thumbprint": "[parameters('clusterCertificateThumbprint')]"... ..ServiceFabric/clusters"..Storage/storageAccounts/'. "certificate": { "thumbprint": "[parameters('clusterCertificateThumbprint')]".'_vmNodeType0Name')]". "properties": { . ..0.com/schemas/2015-01-01/deploymentParameters.azure.. "properties": { "certificate": { "thumbprint": "[parameters('clusterCertificateThumbprint')]". { "apiVersion": "2016-03-01". "contentVersion": "1. "clusterCertificateStoreValue": { "value": "My" }.. .. "type": "Microsoft..KeyVault/vaults/myvault" }. } } At this point. "clusterApplication": "[parameters('aadClusterApplicationId')]".net:443/secrets/myapplicationcert/2e035058ae274f869c4d0348ca100f08" }.. "clusterCertificateThumbprint": { "value": "<thumbprint>" }.vault.ServiceFabric/clusters". "x509StoreName": "[parameters('clusterCertificateStorevalue')]" }. "parameters": { .azure. "aadTenantId": { "value": "<guid>" }. . "name": "[parameters('clusterName')]". "clientApplication": "[parameters('aadClientApplicationId')]" }. "applicationCertificateUrlValue": { "value": "https://myvault..net:443/secrets/myclustercert/4d087088df974e869f1c0978cb100e47" }.vault. . "aadClusterApplicationId": { "value": "<guid>" }. use the output values from the key vault and Azure AD PowerShell commands to populate the parameters file: { "$schema": "http://schema. "clusterCertificateUrlValue": { "value": "https://myvault... "aadClientApplicationId": { "value": "<guid>" }..azure.management.json#". } } Configure Resource Manager template parameters Finally.0. "azureActiveDirectory": { "tenantId": "[parameters('aadTenantId')]".0". 
"sourceVaultvalue": { "value": "/subscriptions/<guid>/resourceGroups/mycluster- keyvault/providers/Microsoft. you should have the following elements in place: . "applicationCertificateStorevalue": { "value": "My" }. Key vault resource group Key vault Cluster server authentication certificate Data encipherment certificate Azure Active Directory tenant Azure AD application for web-based management and Service Fabric Explorer Azure AD application for native client management Users and their assigned roles Service Fabric cluster Resource Manager template Certificates configured through key vault Azure Active Directory configured The following diagram illustrates where your key vault and Azure AD configuration fit into your Resource Manager template. Create the cluster You are now ready to create the cluster by using Azure resource template deployment. Test it Use the following PowerShell command to test your Resource Manager template with a parameters file: . The first step is to sign in to .json - TemplateParameterFile . assign your users to the roles supported by Service Fabric: read-only and admin. NOTE For more information about roles in Service Fabric. Test-AzureRmResourceGroupDeployment -ResourceGroupName "myresourcegroup" -TemplateFile .\azuredeploy.py after downloading it. Click the Users tab. 4. use the following PowerShell command to deploy your Resource Manager template with a parameters file: New-AzureRmResourceGroupDeployment -ResourceGroupName "myresourcegroup" -TemplateFile . You can assign the roles by using the Azure classic portal.\azuredeploy. In the Azure portal. Make sure that the script has permissions to execute by running chmod +x cert_helper.json Assign users to roles After you have created the applications to represent your cluster. Before you use this helper script. 1. 5. and it is in your path. 3. which has a name like myTestCluster_Cluster . and then select Applications. ensure that you already have Azure command-line interface (CLI) installed. see Role-based access control for Service Fabric clients.\azuredeploy.parameters. Select the web application.json -TemplateParameterFile . we have provided a helper script. Select the role to assign to the user.\azuredeploy.parameters. go to your tenant. Select a user to assign. 2. and then click the Assign button at the bottom of the screen. Create secure clusters on Linux To make the process easier.json Deploy it If the Resource Manager template test passes. You can assign admin and client roles as described in the Assign roles to users section.eastus. because no chain validation or revocation is being performed in this preview release. as described at Configure Resource Manager template parameters. You need the self-signed certificate to connect to the cluster. you have to provide certificate thumbprints for authentication.your Azure account by using CLI with the azure login command.azure. This match is required to provide an SSL for the cluster's HTTPS management endpoints and Service Fabric Explorer. After signing in to your Azure account. and CertificateThumbprint. You cannot obtain an SSL certificate from a CA for the . You can connect to the secure cluster by following the instructions for authenticating client access to a cluster. When you request a certificate from a CA. .net" This command returns the same three strings: SourceVault. (You do not provide the subject name. as the following command shows: . 
CertificateUrl.py pfx -sub "fffffff-ffff-ffff-ffff-ffffffffffff" -rgname "mykvrg" -kv "mykevname" -ifile "/home/test/cert. Linux preview clusters do not support Azure AD authentication. which is the ID for the new KeyVault ResourceGroup it created for you CertificateUrl for accessing the certificate CertificateThumbprint.cloudapp. see the following command for creating and uploading a self-signed certificate: .pfx" -sname "mycert" -l "East US" -p "pfxtest" Executing the preceding command gives you the three strings as follows: SourceVault: /subscriptions/fffffff-ffff-ffff-ffff- ffffffffffff/resourceGroups/mykvrg/providers/Microsoft. For example. When you specify admin and client roles for a Linux preview cluster. which is used for authentication The following example shows how to use the command: .py [-h] CERT_TYPE [-ifile INPUT_CERT_FILE] [-sub SUBSCRIPTION_ID] [-rgname RESOURCE_GROUP_NAME] [-kv KEY_VAULT_NAME] [-sname CERTIFICATE_NAME] [-l LOCATION] [-p PASSWORD] The -ifile parameter can take a .com domain. You can then use the strings to create both a secure Linux cluster and a location where the self-signed certificate is placed. the certificate's subject name must match the custom domain name that you use for your cluster.azure. You can connect to the secure cluster by following the instructions for authenticating client access to a cluster.pfx file or a . with the certificate type (pfx or pem.vault. use the helper script with your CA signed certificate. You must obtain a custom domain name for your cluster.pem file as input./cert_helper.) If you want to use a self-signed certificate for testing.py ss -rgname "mykvrg" -sub "fffffff-ffff-ffff-ffff-ffffffffffff" -kv "mykevname" -sname "mycert" -l "East US" -p "selftest" -subj "mytest. The parameter -h prints out the help text.KeyVault/vaults/mykvname CertificateUrl: https://myvault. you can use the same script to generate one.net/secrets/mycert/00000000000000000000000000000000 CertificateThumbprint: 0xfffffffffffffffffffffffffffffffffffffffff The certificate's subject name must match the domain that you use to access the Service Fabric cluster. This command returns the following three strings as the output: SourceVaultID. You can then upload the certificate to your key vault by providing the flag ss instead of providing the certificate path and certificate name./cert_helper. These subject names are the entries you need to create a secure Service Fabric cluster (without Azure AD). or ss if it is a self-signed certificate).cloudapp./cert_helper. .cloudapp. Troubleshoot setting up Azure Active Directory for client authentication If you run into an issue while you're setting up Azure AD for client authentication. Service Fabric Explorer prompts you to select a certificate Problem After you sign in successfully to Azure AD in Service Fabric Explorer.The certificate's subject name must match the domain that you use to access the Service Fabric cluster. This match is required to provide an SSL for the cluster's HTTPS management endpoints and Service Fabric Explorer.com domain. the certificate's subject name must match the custom domain name that you use for your cluster. You cannot obtain an SSL certificate from a CA for the . Next. Next steps At this point. connect to your cluster and learn how to manage application secrets.azure. You must obtain a custom domain name for your cluster. the browser returns to the home page but a message prompts you to select a certificate. You can fill the parameters from the helper script in the Azure portal. 
as described in the Create a cluster in the Azure portal section. When you request a certificate from a CA. you have a secure cluster with Azure Active Directory providing management authentication. review the potential solutions in this section. and as part of the request it provides the redirect return URL. But the URL is not listed in the Azure AD application REPLY URL list. after you sign in successfully to Azure AD. the page returns a failure: "AADSTS50011: The reply address <url> does not match the reply addresses configured for the application: <guid>. . we recommend that you turn on “User assignment required to access app. Service Fabric Explorer returns a failure when you sign in: "AADSTS50011" Problem When you try to sign in to Azure AD in Service Fabric Explorer." Solution This solution is the same as the preceding one. save your change. Thus. Solution Follow the instructions for setting up Azure AD. add the URL of Service Fabric Explorer to the REPLY URL list or replace one of the items in the list. Connection with PowerShell fails with an error: "The specified credentials are invalid" Problem When you use PowerShell to connect to the cluster by using “AzureActiveDirectory” security mode. Solution On the Configure tab of the cluster (web) application. Azure AD authentication fails on Service Fabric cluster. the connection fails with an error: "The specified credentials are invalid." Reason The cluster (web) application that represents Service Fabric Explorer attempts to authenticate against Azure AD.Reason The user isn’t assigned a role in the Azure AD cluster application. When you have finished. Service Fabric Explorer falls back to certificate authentication. and assign user roles.” as SetupApplications.ps1 does. Also. Can I reuse the same Azure AD tenant in multiple clusters? Yes. Why do I still need a server certificate while Azure AD is enabled? FabricClient and FabricGateway perform a mutual authentication.509 certificates and Service Fabric.Connect the cluster by using Azure AD authentication via PowerShell To connect the Service Fabric cluster. use the following PowerShell command example: Connect-ServiceFabricCluster -ConnectionEndpoint <endpoint> -KeepAliveIntervalInSec 10 - AzureActiveDirectory -ServerCertThumbprint <thumbprint> To learn about the Connect-ServiceFabricCluster cmdlet. and the server certificate is used to verify the server identity. Otherwise. During Azure AD authentication. Azure AD integration provides a client identity to the server. For more information about Service Fabric certificates. see Connect-ServiceFabricCluster. see X. . Service Fabric Explorer doesn’t work. But remember to add the URL of Service Fabric Explorer to your cluster (web) application. It can also be used from a script. Set up a Service Fabric cluster by using Visual Studio 2/21/2017 • 6 min to read • Edit Online This article describes how to set up an Azure Service Fabric cluster by using Visual Studio and an Azure Resource Manager template. Visual Studio will ask you to select the Resource Manager template you want to create: . and then search for "Azure SDK for . After the template has been created. or as part of continuous integration (CI) facility. NOTE If you do not see the Azure resource group project under the Cloud node. you do not have the Azure SDK installed. We will use a Visual Studio Azure resource group project to create the template. it can be deployed directly to Azure from Visual Studio. 
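Because the generated template and parameters file are plain JSON, the same deployment can be driven from a script or a CI agent instead of the Visual Studio UI. The following is only a rough sketch with Azure PowerShell: the resource group name, location, file paths, and password value are placeholders, and adminPassword is assumed to be a parameter defined by the template.

# Sketch: script the template deployment (for example, from a CI build agent).
New-AzureRmResourceGroup -Name "mycluster-rg" -Location "West US"
New-AzureRmResourceGroupDeployment -ResourceGroupName "mycluster-rg" `
    -TemplateFile ".\Templates\ServiceFabricCluster.json" `
    -TemplateParameterFile ".\Templates\ServiceFabricCluster.parameters.json" `
    -adminPassword (ConvertTo-SecureString "<password>" -AsPlainText -Force)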
Create a Service Fabric cluster template by using an Azure resource group project To get started. Launch Web Platform Installer (install it now if you have not already). open Visual Studio and create an Azure resource group project (it is available in the Cloud folder): You can create a new Visual Studio solution for this project or add it to an existing solution. After you hit the OK button.NET" and install the version that is compatible with your version of Visual Studio. sourceVaultValue . These parameter values are read from the ServiceFabricCluster. certificateThumbprint The thumbprint of the certificate that secures the cluster.parameters. This certificate is identified by the last three template parameters ( certificateThumbprint . you must provide values for the required template parameters. sourceVaultResourceId The resource ID of the key vault where the certificate that secures the cluster is stored. Optional: change the cluster name . Prepare the template for deployment Before the template is deployed to create the cluster. certificateUrlValue The URL of the cluster security certificate. and certificateUrlValue ). Open the file and provide the following values: PARAMETER NAME DESCRIPTION adminUserName The name of the administrator account for Service Fabric machines (nodes). The project and the Resource Manager template have now been created. which is in the Templates folder of the resource group project. For more information on how to create the cluster security certificate.Select the Service Fabric Cluster template and hit the OK button again. The Visual Studio Service Fabric Resource Manager template creates a secure cluster that is protected by a certificate. see Service Fabric cluster security scenarios article.json file. and it must exist in an Azure Key Vault. For example.json ) to your chosen name. "protocol": "Tcp" } } 3.azure. A template variable that defines the TCP port value for the port: "loadBalancedAppPort1": "80" 2. the template opens up just two public TCP ports (80 and 8081). When a Fabric cluster is created in Azure. This makes it very easy to use the template as part of a continuous integration (CI) system.cloudapp.com . Open that file and search for loadBalancedAppPort .Every Service Fabric cluster has a name. if you name your cluster myBigCluster . Each port is associated with three artifacts: 1. and the location (Azure region) of the resource group that will host the new cluster is East US. Optional: add public application ports You may also want to change the public application ports for the cluster before you deploy it. the DNS name of the cluster will be myBigCluster. If you want to use a specific name for your cluster. modify the Azure Load Balancer definition in the template. If you need more for your applications. The definition is stored in the main template file ( ServiceFabricCluster. A probe that defines how frequently and for how long the Azure load balancer attempts to use a specific Service Fabric node before failing over to another one. "properties": { "intervalInSeconds": 5. The probes are part of the Load Balancer resource. set the value of the clusterName variable in the Resource Manager template file ( ServiceFabricCluster. A load-balancing rule that ties together the port and the probe. one that is meaningful to you. By default the cluster name is generated automatically and made unique by attaching a random suffix to a "cluster" prefix.json ). 
Here is the probe definition for the first default application port: { "name": "AppPortProbe1". By default. It is the first variable defined in that file. cluster name determines (together with the Azure region) the Domain Name System (DNS) name for the cluster. "port": "[variables('loadBalancedAppPort1')]".eastus. "numberOfProbes": 2. which enables load balancing across a set of Service Fabric cluster nodes: . . Deploy the template by using Visual Studio After you have saved all the required parameter values in the ServiceFabricCluster.. { "name": "AppPortLBRule1". see Get started creating an internal load balancer using a template.dev. After you hit the Deploy button. "idleTimeoutInMinutes": 5. "protocol": "Tcp" } } If the applications that you plan to deploy to the cluster need more ports. Hit the Save button. If necessary. you are ready to deploy the template and create your Service Fabric cluster. It normally makes sense to use a separate resource group for a Service Fabric cluster. asking you to authenticate to Azure: The dialog box lets you choose an existing Resource Manager resource group for the cluster and gives you the option to create a new one.param. Visual Studio will prompt you to confirm the template parameter values. Visual Studio will show the Deploy to Resource Group dialog box.'/probes/AppPortProbe1')]" }. "probe": { "id": "[concat(variables('lbID0'). you can add them by creating additional probe and load-balancing rule definitions. One parameter does not have a persisted value: the administrative account password for the cluster. You need to provide a password value when Visual Studio prompts you for one.. "properties": { "backendAddressPool": { "id": "[variables('lbPoolID0')]" }. "frontendPort": "[variables('loadBalancedAppPort1')]". Right-click the resource group project in Visual Studio Solution Explorer and choose Deploy | New Deployment. "frontendIPConfiguration": { "id": "[variables('lbIPConfig0')]" }.json file. "backendPort": "[variables('loadBalancedAppPort1')]". . For more information on how to work with Azure Load Balancer through Resource Manager templates. "enableFloatingIP": false. " Therefore. In the template parameters dialog notice that the adminPassword parameter text box has a little "key" icon on the right. NOTE Starting with Azure SDK 2. "unrestricted" policy is usually acceptable. then click Deployments on the settings blade. your new cluster is ready to use! NOTE If PowerShell was never used to administer Azure from the machine that you are using now. Enable PowerShell scripting by running the Set-ExecutionPolicy command. A failed resource-group deployment leaves detailed diagnostic information there. go to the Azure portal and open the resource group that you deployed to.referred to as "maintaining quorum. Visual Studio supports reading passwords from Azure Key Vault during deployment. This icon allows you to select an existing key vault secret as the administrative password for the cluster. Click All settings. Next steps Learn about setting up Service Fabric cluster using the Azure portal Learn how to manage and deploy Service Fabric applications using Visual Studio .9. Decide whether to allow diagnostic data collection from Azure PowerShell commands. 1. Just make sure to first enable Azure Resource Manager access for template deployment in the Advanced Access Policies of your key vault. If there are any errors. it is not safe to shut down all of the machines in the cluster unless you have first performed a full backup of your state. 2. 
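If you store the administrative password as a key vault secret, the key vault itself must allow Azure Resource Manager to read secrets during template deployment (the Advanced Access Policies setting). As a hedged sketch, the same setting can also be turned on with PowerShell; the vault name here is a placeholder:

# Sketch: allow Azure Resource Manager to retrieve secrets from this vault during template deployment.
Set-AzureRmKeyVaultAccessPolicy -VaultName "mywestusvault" -EnabledForTemplateDeployment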
You can monitor the progress of the deployment process in the Visual Studio output window. For development machines. you need to do a little housekeeping. Once the template deployment is completed. NOTE Service Fabric clusters require a certain number of nodes to be up to maintain availability and preserve state . and run Enable-AzureRmDataCollection or Disable-AzureRmDataCollection as necessary. This will avoid unnecessary prompts during template deployment. If you are deploying the existing Azure Virtual Network template or the static public IP template. The Service Fabric resource provider requires publicly accessible inbound access to the HTTP gateway port (port 19080. Static public IP address A static public IP address generally is a dedicated resource that's managed separately from the VM or VMs it's assigned to. you can use with a Service Fabric cluster. Azure VPN Gateway. Service Fabric is unique from other networking features in one aspect. If your setup does not meet these requirements. the Azure portal does not display the status of your cluster. a network security group. and virtual network peering. we show you how to create clusters that use the following features: Existing virtual network or subnet Static public IP address Internal-only load balancer Internal and external load balancer Service Fabric runs in a standard virtual machine scale set. in the ExistingRG resource group. Templates All Service Fabric templates are in one download file. like Azure ExpressRoute. The Azure portal internally uses the Service Fabric resource provider to call to a cluster to get information about nodes and applications. but the main goal of adding a cluster to an existing virtual network is to provide network connectivity to other VMs. The subnet is named default. and your node and application list appears empty. first read the Initial setup section of this article. You should be able to deploy the templates as-is by using the following PowerShell commands. In this article. If you want to see your cluster in the Azure portal. you can use the VM and its public IP as a secure jump box. Creating the VM gives a good example of how an existing virtual network typically is used. and your network security group must allow incoming port 19080 traffic. If port 19080 is not accessible from the Service Fabric resource provider. The networking sections of the Azure Resource Manager templates for virtual machine scale sets and Service Fabric are identical. These default resources are created when you use the Azure portal to create a standard virtual machine (VM). You could create the virtual network and subnet without creating the VM. by default) on the management endpoint. Service Fabric Explorer uses the management endpoint to manage your cluster. we start with an existing virtual network named ExistingRG-vnet. it's easy to incorporate other networking features. a message like Nodes Not Found appears in the portal. After you deploy to an existing virtual network. The Service Fabric resource provider also uses this port to query information about your cluster. If your Service Fabric cluster uses only an internal load balancer. without a public IP address. It's provisioned in a dedicated networking resource group (as opposed to in the Service Fabric cluster . Initial setup Existing virtual network In the following example. to display in the Azure portal. 
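For reference, the one-time housekeeping steps called out in the deployment notes above can also be run as a short script. This is a sketch only; the execution policy scope and the data collection choice depend on your environment and organizational policy.

# Sketch: one-time setup on a machine that has not administered Azure with PowerShell before.
Set-ExecutionPolicy -ExecutionPolicy Unrestricted -Scope CurrentUser
# Opt in (or out) of diagnostic data collection to avoid prompts during template deployment.
Enable-AzureRmDataCollection    # or: Disable-AzureRmDataCollection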
Service Fabric networking patterns 3/22/2017 • 11 min to read • Edit Online You can integrate your Azure Service Fabric cluster with other Azure networking features. Any functionality that you can use in a virtual machine scale set. your load balancer must expose a public IP address. Change the vnetID variable to point to the existing virtual network: . "existingVNetRGName": { "type": "string".*/ 2. "subnet0Prefix": { "type": "string".0/24" }.110 PublicIpAddressVersion : IPv4 IdleTimeoutInMinutes : 4 IpConfiguration : null DnsSettings : { "DomainNameLabel": "sfnetworking".resource group itself).83. "defaultValue": "10.cloudapp. "defaultValue": "ExistingRG" }. either in the Azure portal or by using PowerShell: PS C:\Users\user> New-AzureRmPublicIpAddress -Name staticIP1 -ResourceGroupName ExistingRG -Location westus - AllocationMethod Static -DomainNameLabel sfnetworking Name : staticIP1 ResourceGroupName : ExistingRG Location : westus Id : /subscriptions/1237f4d2-3dce-1236-ad95- 123f764e7123/resourceGroups/ExistingRG/providers/Microsoft. like the five-node Service Fabric cluster.azure. "defaultValue": "ExistingRG-vnet" }.0.json. we use the Service Fabric template. You can use the standard portal wizard to download the template from the portal before you create a cluster. /* "subnet0Name": { "type": "string". and then add two new parameters to reference the existing virtual network: "subnet0Name": { "type": "string".0. "defaultValue": "Subnet-0" }.Network/publicIPAddresses/staticIP1 Etag : W/"fc8b0c77-1f84-455d-9930-0404ebba1b64" ResourceGuid : 77c26c06-c0ae-496c-9231-b1a114e08824 ProvisioningState : Succeeded Tags : PublicIpAllocationMethod : Static IpAddress : 40. Change the subnet parameter to the name of the existing subnet. "defaultValue": "default" }. "existingVNetName": { "type": "string".182. You also can use one of the templates in the template gallery. "Fqdn": "sfnetworking.westus.com" } Service Fabric template In the examples in this article. Create a static public IP address named staticIP1 in the same ExistingRG resource group. Existing virtual network or subnet 1. '/resourceGroups/'.Computer/virtualMachineScaleSets". "name": "[parameters('virtualNetworkName')]". The virtual machine scale set node type should show the existing virtual network and subnet. parameters('virtualNetworkName'))]".Network/virtualNetworks from your resources. Deploy the template: New-AzureRmResourceGroup -Name sfnetworkingexistingvnet -Location westus New-AzureRmResourceGroupDeployment -Name deployment -ResourceGroupName sfnetworkingexistingvnet - TemplateFile C:\SFSamples\Final\template\_existingvnet. Remove Microsoft. 3. "type": "Microsoft. '/providers/Microsoft. variables('uniqueStringArray0')[0])]". /*old "vnetID": " [resourceId('Microsoft. "name": "[parameters('vmNodeType0Name')]". 5.Compute/virtualMachineScaleSets .Network/virtualNetworks".Network/virtualNetworks/'. so Azure does not create a new virtual network: /*{ "apiVersion": "[variables('vNetApiVersion')]". "properties": { "addressPrefix": "[parameters('subnet0Prefix')]" } } ] }.subscriptionId. "type": "Microsoft. parameters('existingVNetName'))]". subscription().*/ 4. "properities": { "addressSpace": { "addressPrefixes": [ "[parameters('addressPrefix')]" ] }.Network/virtualNetworks'.json After deployment. You also can use Remote Desktop Protocol (RDP) to access the VM that was already in the virtual network.parameters('virtualNetworkName'))]". your virtual network should include the new scale set VMs. 
"location": "[parameters('computeLocation')]". and to ping the new scale set VMs: . parameters('existingVNetRGName'). "location": "[parameters('computeLocation')]".Network/virtualNetworks/'. "dependsOn": [ /*"[concat('Microsoft. */ "[Concat('Microsoft. "subnets": [ { "name": "[parameters('subnet0Name')]".Storage/storageAccounts/'. "tags": { "resourceType": "Service Fabric". "clusterName": "[parameters('clusterName')]" } }. so you don't depend on creating a new virtual network: "apiVersion": "[variables('vmssApiVersion')]".*/ "vnetID": "[concat('/subscriptions/'. Comment out the virtual network from the attribute of dependsOn Microsoft. subscriptionId. "publicIPAllocationMethod": "Dynamic" }.Network/loadBalancers . so Azure does not create a new IP address: /* { "apiVersion": "[variables('publicIPApiVersion')]". '0')]". parameters('existingStaticIPName'))]". */ 3. Add parameters for the name of the existing static IP resource group. (The static IP address already has one.5 -n 1 C:>\Users\users>ping NOde1000000 -n 1 For another example. "existingStaticIPDnsFQDN": { "type": "string" } 2. Add a variable to reference the existing static IP address: "existingStaticIP": "[concat('/subscriptions/'. name. parameters('existingStaticIPResourceGroup'). "clusterName": "[parameters('clusterName')]" } }. '/resourceGroups/'. */ 5.)'-'. 4.Network/publicIPAddresses/'.Network/publicIPAddresses from your resources. Remove Microsoft. "location": "[parameters('computeLocation')]". Static public IP address 1. C:>\Users\users>ping 10. Comment out the IP address from the dependsOn attribute of Microsoft. "properties": { "dnsSettings": { "domainNameLabel": "[parameters('dnsName')]" }.0. so you don't depend on creating a new IP address: . '/providers/Microsoft. "name": "[concat(parameters('lbIPName'). and fully qualified domain name (FQDN): "existingStaticIPResourceGroup": { "type": "string" }. "type": "Microsoft. Remove the dnsName parameter.0.Network/publicIPAddresses". subscription(). "tags": { "resourceType": "Service Fabric". see one that is not specific to Service Fabric. "existingStaticIPName": { "type": "string" }.) /* "dnsName": { "type": "string" }. '-'. /*"managementEndpoint": "[concat('http://'.json -existingStaticIPResourceGroup $staticip.*/ "managementEndpoint": " [concat('http://'.':'. you can see that your load balancer is bound to the public static IP address from the other resource group.'0')).dnsSettings.reference(concat(parameters('lbIPName'). 8. '-'.'0'))]"*/ "id": "[variables('existingStaticIP')]" } } } ]. Internal-only load balancer This scenario replaces the external load balancer in the default Service Fabric template with an internal-only load balancer.Name -existingStaticIPDnsFQDN $staticip.parameters('nt0fabricHttpGatewayPort'))]". see the preceding . (Note that this step applies only to Service Fabric clusters. "type": "Microsoft. change the publicIPAddress element of frontendIPConfigurations to reference the existing static IP address instead of a newly created one: "frontendIPConfigurations": [ { "name": "LoadBalancerIPConfig".ResourceGroupName -existingStaticIPName $staticip. In the Microsoft. /* "dependsOn": [ "[concat('Microsoft. parameters('vmNodeType0Name'))]".Network/publicIPAddresses/'.'- '. For implications for the Azure portal and for the Service Fabric resource provider.parameters('existingStaticIPDnsFQDN').ServiceFabric/clusters resource. "properties": { "publicIPAddress": { /*"id": " [resourceId('Microsoft. 
If you are using a virtual machine scale set.parameters('nt0fabricHttpGatewayPort'))]".fqdn. skip this step. Deploy the template: New-AzureRmResourceGroup -Name sfnetworkingstaticip -Location westus $staticip = Get-AzureRmPublicIpAddress -Name staticIP1 -ResourceGroupName ExistingRG $staticip New-AzureRmResourceGroupDeployment -Name deployment -ResourceGroupName sfnetworkingstaticip - TemplateFile C:\SFSamples\Final\template\_staticip. 7. The Service Fabric client connection endpoint and Service Fabric Explorer endpoint point to the DNS FQDN of the static IP address.Network/publicIPAddresses'. concat(parameters('lbIPName').DnsSettings. "name": "[concat('LB'. In the Microsoft. change managementEndpoint to the DNS FQDN of the static IP address. If you are using a secure cluster.Fqdn After deployment. "location": "[parameters('computeLocation')]". '-'. */ "properties": { 6. '0'))]" ].':'. make sure you change http:// to https://.Network/loadBalancers".Network/loadBalancers resource. parameters('clusterName').concat(parameters('lbIPName').) "fabricSettings": []. "apiVersion": "[variables('lbIPApiVersion')]". '-'. Change the load balancer's frontendIPConfigurations setting from using a publicIPAddress .Network/loadBalancers .'-'. remove the privateIPAddress element. "type": "Microsoft. 5. "internalLBAddress": { "type": "string". you do not need to do this step. (It's not needed.0. if you use a static allocation method. "location": "[parameters('computeLocation')]".section. "defaultValue": "10. "name": "[concat(parameters('lbIPName'). "properties": { "dnsSettings": { "domainNameLabel": "[parameters('dnsName')]" }. "tags": { "resourceType": "Service Fabric". so you don't depend on creating a new IP address.250" } 3.Network/publicIPAddresses from your resources.Network/publicIPAddresses/'. so Azure does not create a new IP address: /* { "apiVersion": "[variables('publicIPApiVersion')]". Remove Microsoft. "publicIPAllocationMethod": "Dynamic" }.'- '.'-'. . "type": "Microsoft. 1. */ 2. Remove the dnsName parameter.)'-'. "dependsOn": [ /*"[concat('Microsoft.Network/publicIPAddresses". and then change privateIPAllocationMethod to Dynamic.concat(parameters('lbIPName'). To use a dynamic IP address. "location": "[parameters('computeLocation')]". "clusterName": "[parameters('clusterName')]" } }. "name": "[concat('LB'.) /* "dnsName": { "type": "string" }. If you use a dynamic allocation method.Network/virtualNetworks/'.parameters('vmNodeType0Name'))]".parameters('virtualNetworkName'))]" ]. parameters('clusterName'). privateIPAddress uses a predefined static internal IP address.'0'))]"*/ "[concat('Microsoft. */ 4. to using a subnet and privateIPAddress .0. you can add a static IP address parameter. Add the virtual network dependsOn attribute because the load balancer now depends on the subnet from the virtual network: "apiVersion": "[variables('lbApiVersion')]".Network/loadBalancers". '0')]". Remove the IP address dependsOn attribute of Microsoft. Optionally. in the portal-created two-node-type template (which comes with two load balancers).) .frontEndIPConfigurations[0]. ':'.0. you can go to the internal Service Fabric Explorer endpoint. and place it on the internal load balancer. Internal and external load balancer In this scenario. If you are using a virtual machine scale set. (For notes related to using a dynamic IP address. In the Microsoft.privateIPAddress.250 IP address. 1. and add an internal load balancer for the same node type. 
If you put the management endpoints on the internal load balancer.'0'))]" } */ "subnet" :{ "id": "[variables('subnet0Ref')]" }.reference(concat(parameters('lbIPName').*/ "managementEndpoint": " [concat('http://'.parameters('nt0fabricHttpGatewayPort'))]".reference(variables('lbID0')). 6. To use a two-node-type cluster. your load balancer uses the private static 10.':'.properties.) "fabricSettings": []. You also add a port 80 application port. see the Internal-only load balancer section. In a two-node-type cluster. A back-end port attached to a back-end address pool can be assigned only to a single load balancer. see earlier sections of this article.'-'. "privateIPAddress": "[parameters('internalLBAddress')]". For more information. Note that it connects to one of the nodes behind the load balancer. make sure you change http:// to https://. skip this step. (Note that this step applies only to Service Fabric clusters.json After deployment.0.fqdn. The other node type is on the internal load balancer. "frontendIPConfigurations": [ { "name": "LoadBalancerIPConfig". change managementEndpoint to point to the internal load balancer address. If you use a secure cluster.'- '.concat(parameters('lbIPName'). 7.Network/publicIPAddresses'. one node type is on the external load balancer. and which load balancer will have your management endpoints (ports 19000 and 19080). In the example we use. switch the second load balancer to an internal load balancer. Choose which load balancer will have your application ports. Add the static internal load balancer IP address parameter.parameters('nt0fabricHttpGatewayPort'))]". Deploy the template: New-AzureRmResourceGroup -Name sfnetworkinginternallb -Location westus New-AzureRmResourceGroupDeployment -Name deployment -ResourceGroupName sfnetworkinginternallb - TemplateFile C:\SFSamples\Final\template\_internalonlyLB. /*"managementEndpoint": "[concat('http://'. the management endpoints remain on the external load balancer. "privateIPAllocationMethod": "Static" } } ]. "properties": { /* "publicIPAddress": { "id": " [resourceId('Microsoft.dnsSettings.'0')). keep in mind the Service Fabric resource provider restrictions discussed earlier in the article. you start with the existing single-node type external load balancer.ServiceFabric/clusters resource. If you have another machine in that same virtual network. "enableFloatingIP": "false".parameters('vmNodeType0Name'). "internalLBAddress": { "type": "string".0. "lbHttpProbeID0-Int": "[concat(variables('lbID0-Int').'/inboundNatPools/LoadBalancerBEAddressNatPool')]". "properties": { "backendAddressPool": { "id": "[variables('lbPoolID0')]" }.'-'. "idleTimeoutInMinutes": "5". so you can add it to the internal load balancer: "loadBalancingRules": [ { "name": "LBHttpRule".'/probes/FabricHttpGatewayProbe')]". "frontendPort": "[parameters('nt0fabricHttpGatewayPort')]". If you start with the portal-generated template that uses application port 80. "properties":{ "backendAddressPool": { "id": "[variables('lbPoolID0')]" }.'-'.0. '-Internal'))]". "lbIPConfig0-Int": "[concat(variables('lbID0- Int'). '/probes/AppPortProbe1')]" }. /* Internal load balancer networking variables end */ 4. "backendPort": "[parameters('nt0fabricHttpGatewayPort')]".'/backendAddressPools/LoadBalancerBEAddressPool')]". parameters('clusterName'). "frontendIPConfiguration": { "id": "[variables('lbIPConfig0')]" }.'/frontendIPConfigurations/LoadBalancerIPConfig')]". { "name": "AppPortLBRule1". "backendPort": "[parameters('loadBalancedAppPort1')]". 
copy and paste them. concat('LB'. 3. and add "-Int" to the name: /* Add internal load balancer networking variables */ "lbID0-Int": "[resourceId('Microsoft. To add internal versions of the existing networking variables.250" } 2. "protocol": "tcp" } } /* Remove AppPort1 from the external load balancer. "probe": { "id": "[concate(variables('lbID0'). "enableFloatingIP": "false". remove AppPort1 from the external load balancer loadBalancingRules and probes. Add an application port 80 parameter. "defaultValue": "10.Network/loadBalancers'. the default portal template adds AppPort1 (port 80) on the external load balancer. "protocol": "tcp" .'/probes/FabricGatewayProbe')]". "lbNatPoolID0-Int": "[concat(variables('lbID0- Int'). "probe": { "id": "[variables('lbHttpProbeID0')]" }. "frontendIPConfiguration": { "id": "[variables('lbIPConfig0')]" }. "frontendPort": "[parameters('loadBalancedAppPort1')]". "idleTimeoutInMinutes": "5". "lbProbeID0-Int": "[concat(variables('lbID0-Int'). "lbPoolID0-Int": "[concat(variables('lbID0- Int'). In this case. Network/loadBalancers". */ "name": "[concat('LB'. "properties": { "frontendIPConfigurations": [ { "name": "LoadBalancerIPConfig". Add a second Microsoft. "properties": { /* Switch from Public to Private IP address */ .'0'))]" */ "[concat('Microsoft. "probes": [ { "name": "FabricGatewayProbe". "protocol": "tcp" } } /* Remove AppPort1 from the external load balancer. and implements only the application port 80. "location": "[parameters('computeLocation')]". { "name": "FabricHttpGatewayProbe". "protocol": "tcp" } }*/ ]. "numberOfProbes": 2. "protocol": "tcp" } }. "numberOfProbes": 2. "properties": { "intervalInSeconds": 5. It looks similar to the internal load balancer created in the Internal-only load balancer section. */ { "apiVersion": "[variables('lbApiVersion')]". /* Add "-Internal" to the name.concat(parameters('lbIPName'). to keep RDP endpoints on the public load balancer.parameters('virtualNetworkName'))]" ]. "numberOfProbes": 2. "properties": { "intervalInSeconds": 5. '- Internal')]". { "name": "AppPortProbe1".Network/loadBalancers resource.'-'. This also removes inboundNatPools . "port": "[parameters('nt0fabricHttpGatewayPort')]". "properties": { "intervalInSeconds": 5. but it uses the "-Int" load balancer variables. add vnet dependsOn "[concat('Microsoft.Network/virtualNetworks/'.Network/publicIPAddresses/'. If you want RDP on the internal load balancer. "port": "[parameters('nt0fabricTcpGatewayPort')]". parameters('clusterName').parameters('vmNodeType0Name').'-'. "dependsOn": [ /* Remove public IP dependsOn. configured with a static privateIPAddress and the "-Int" load balancer variables. "type": "Microsoft. move inboundNatPools from the external load balancer to this internal load balancer: /* Add a second load balancer. "protocol": "tcp" } } */ ].'- '. "inboundNatPools": [ 5. "port": "[parameters('loadBalancedAppPort1')]". */ { "name": "AppPortLBRule1". "backendAddressPools": [ { "name": "LoadBalancerBEAddressPool". */ "publicIPAddress": { "id": " [resourceId('Microsoft. and the probe variables.'0'))]" } */ "subnet" :{ "id": "[variables('subnet0Ref')]" }. "port": "[parameters('loadBalancedAppPort1')]".Compute/virtualMachineScaleSets resource. "properties": { "intervalInSeconds": 5. "probe": { "id": "[concat(variables('lbID0-Int'). frontendIPConfiguration. "protocol": "tcp" } } ]. "enableFloatingIP": "false". Be sure to reference the "-Int" versions of backendAddressPool. 6. "backendPort": "[parameters('loadBalancedAppPort1')]". 
"tags": { "resourceType": "Service Fabric". "frontendIPConfiguration": { "id": "[variables('lbIPConfig0-Int')]" }. "clusterName": "[parameters('clusterName')]" } }. "idleTimeoutInMinutes": "5". "privateIPAddress": "[parameters('internalLBAddress')]". "protocol": "tcp" } } ]. "properties": {} } ]. "loadBalancingRules": [ /* Add the AppPort rule.concat(parameters('lbIPName'). "privateIPAllocationMethod": "Static" } } ]. "probes": [ /* Add the probe for the app port.Network/publicIPAddresses'. "frontendPort": "[parameters('loadBalancedAppPort1')]". "properties": { "backendAddressPool": { "id": "[variables('lbPoolID0-Int')]" }. In networkProfile for the Microsoft. add the internal back-end address pool: . */ { "name": "AppPortProbe1". "inboundNatPools": [ ] }.'-'.'/probes/AppPortProbe1')]" }. "numberOfProbes": 2. If you browse the load balancers. Both load balancers use the same virtual machine scale set back-end pool. Next steps Create a cluster . You also can see the static internal IP address and application endpoint (port 80) assigned to the internal load balancer.json After deployment. "loadBalancerBackendAddressPools": [ { "id": "[variables('lbPoolID0')]" }. 7. you can see two load balancers in the resource group. { /* Add internal BE pool */ "id": "[variables('lbPoolID0-Int')]" } ]. you can see the public IP address and management endpoints (ports 19000 and 19080) assigned to the public IP address. Deploy the template: New-AzureRmResourceGroup -Name sfnetworkinginternalexternallb -Location westus New-AzureRmResourceGroupDeployment -Name deployment -ResourceGroupName sfnetworkinginternalexternallb - TemplateFile C:\SFSamples\Final\template\_internalexternalLB. BackEnd_2. The numbering is reflected in the names. LB-sfcluster4doc-0. For example. The following screen shot shows a cluster that has two node types: FrontEnd and BackEnd. and can have different capacity metrics. The new VM Scale Set instance name will typically be the VM Scale Set name + the next instance number. Each node type can then be scaled up or down independently. When you scale up a VM Scale Set a new instance is created. The relationship between Service Fabric node types and Virtual Machine Scale Sets 1/17/2017 • 4 min to read • Edit Online Virtual Machine Scale Sets are an Azure Compute resource you can use to deploy and manage a collection of virtual machines as a set. BackEnd_1. For example. In our example. The name would something like: LB-<NodeType name>. the VM Scale Set instances start from instance 0 and then goes up. Every node type that is defined in a Service Fabric cluster is set up as a separate VM Scale Set. have different sets of ports open. Mapping VM Scale Set instances to nodes As you can see above. named BackEnd_0. BackEnd_3 and BackEnd_4. it is BackEnd_5. Mapping VM scale set load balancers to each node type/VM Scale Set If you have deployed your cluster from the portal or have used the sample Resource Manager template that we provided. as shown in this screenshot: . Each node type has five nodes each. This particular VM Scale Set has five instances. then when you get a list of all resources under a Resource Group then you will see the load balancers for each VM Scale Set or node type. BackEnd_0 is instance 0 of the BackEnd VM Scale Set. In the portal. navigate to the Load balancer blade and then Settings. Here are the steps you can follow to discover them. the VM Scale Set instances do not get a virtual IP address of their own.Network/loadBalancers. 
Step 1: Find out the virtual IP address for the node type and then Inbound NAT rules for RDP In order to get that. Unlike single instance VMs. That means the node types can be scaled up or down independently and can be made of different VM SKUs. you need to get the inbound NAT rules values that were defined as a part of the resource definition for Microsoft. So it can be a bit challenging when you are looking for an IP address and port that you can use to remote connect to a specific instance.Remote connect to a VM Scale Set instance or a cluster node Every Node type that is defined in a cluster is set up as a separate VM Scale Set. . VM SCALE SET INSTANCE PORT FrontEnd_0 3389 FrontEnd_1 3390 FrontEnd_2 3391 .106. The ports are allocated in ascending order of the VM Scale Set instance.156 and 3389 Step 2: Find out the port that you can use to remote connect to the specific VM Scale Set instance/node Earlier in this document.42. We will use that to figure out the exact port.In Settings. click on Inbound NAT rules. the ports for each of the five instances are the following. it is 104. This now gives you the IP address and port that you can use to remote connect to the first VM Scale Set instance. so in my example for the FrontEnd node type. In the screenshot below. you now need to do the same mapping for your VM Scale Set instance. I talked about how the VM Scale Set instances map to the nodes. If this PowerShell command fails for some reason. After cluster deployment This is a bit more involved and may result in the VMs getting recycled. Sign in to your Azure account. Login-AzureRmAccount . you can specify the range in the inboundNatPools.Network/loadBalancers. I strongly suggest that you follow the steps outlined in How to install and configure Azure PowerShell. Make sure that Azure PowerShell 1. If you have not done this before. Go to the resource definition for Microsoft. VM SCALE SET INSTANCE PORT FrontEnd_3 3392 FrontEnd_4 3393 FrontEnd_5 3394 Step 3: Remote connect to the specific VM Scale Set instance In the screenshot below I use Remote Desktop Connection to connect to the FrontEnd_1: How to change the RDP port range values Before cluster deployment When you are setting up the cluster using an Resource Manager template. You will now have to set new values using Azure PowerShell. Under that you find the description for inboundNatPools.0 or later is installed on your machine. you should check whether you have Azure PowerShell installed correctly. Replace the frontendPortRangeStart and frontendPortRangeEnd values. Network/loadBalancers -ResourceName <load Balancer name> -ApiVersion <use the API version that get returned> -Force Next steps Overview of the "Deploy anywhere" feature and a comparison with Azure-managed clusters Cluster security Service Fabric SDK and getting started . $PropertiesObject = @{ #Property = value.Run the following to get details on your load balancer and you see the values for the description for inboundNatPools: Get-AzureRmResource -ResourceGroupName <RGname> -ResourceType Microsoft.Network/loadBalancers -ResourceName <load balancer name> Now set frontendPortRangeEnd and frontendPortRangeStart to the values you want. } Set-AzureRmResource -PropertyObject $PropertiesObject -ResourceGroupName <RG name> -ResourceType Microsoft. NOTE Scaling down the primary node type to less than the minimum number make the cluster unstable or bring it down. 
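The port mapping shown in Step 2 above follows simple arithmetic: the NAT pool hands out ports in ascending order of VM scale set instance ID, starting at the pool's first front-end port (3389 in this example). As a minimal sketch, assuming the AzureRM.Compute module and placeholder resource names, you can list the instances behind a node type and compute the likely RDP port for each; if instance IDs have been reallocated, read the actual inbound NAT rules as described in Step 1 instead.

# List the VM scale set instances behind the FrontEnd node type (placeholder names).
$instances = Get-AzureRmVmssVM -ResourceGroupName "<resource group>" -VMScaleSetName "FrontEnd"

# NAT ports are allocated in ascending order of instance ID, starting at the pool's first port.
$frontendPortRangeStart = 3389   # read the real value from the load balancer's inboundNatPools

foreach ($instance in $instances) {
    $rdpPort = $frontendPortRangeStart + [int]$instance.InstanceId
    Write-Output ("{0} -> <node type public IP>:{1}" -f $instance.Name, $rdpPort)
}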
Scale a Service Fabric cluster in or out using auto-scale rules 3/2/2017 • 6 min to read • Edit Online

Virtual machine scale sets are an Azure compute resource that you can use to deploy and manage a collection of virtual machines as a set. Every node type that is defined in a Service Fabric cluster is set up as a separate VM scale set. Each node type can then be scaled in or out independently, have different sets of ports open, and can have different capacity metrics. Read more about it in the Service Fabric nodetypes document. Since the Service Fabric node types in your cluster are made of VM scale sets at the back end, you need to set up auto-scale rules for each node type/VM scale set.

NOTE Your subscription must have enough cores to add the new VMs that make up this cluster. There is no model validation currently, so you get a deployment-time failure if any of the quota limits are hit.

Choose the node type/VM scale set to scale
Currently, you are not able to specify auto-scale rules for VM scale sets using the portal, so let us use Azure PowerShell (1.0+) to list the node types and then add auto-scale rules to them. To get the list of VM scale sets that make up your cluster, run the following cmdlets:

Get-AzureRmResource -ResourceGroupName <RGname> -ResourceType Microsoft.Compute/VirtualMachineScaleSets
Get-AzureRmVmss -ResourceGroupName <RGname> -VMScaleSetName <VM Scale Set name>

Set auto-scale rules for the node type/VM scale set
If your cluster has multiple node types, repeat this for each node type/VM scale set that you want to scale (in or out). Take into account the number of nodes that you must have before you set up auto-scaling. The minimum number of nodes that you must have for the primary node type is driven by the reliability level you have chosen. Read more about reliability levels.

NOTE Scaling down the primary node type to less than the minimum number makes the cluster unstable or brings it down. This could result in data loss for your applications and for the system services. Refer to the details on reliability tiers here.

Follow these instructions to set up auto-scale for each VM scale set. Currently the auto-scale feature is not driven by the loads that your applications may be reporting to Service Fabric, so at this time the auto-scale you get is purely driven by the performance counters that are emitted by each of the VM scale set instances.
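For reference, a rough sketch of attaching an auto-scale rule to the VM scale set behind a node type with the AzureRM.Insights cmdlets is shown below. The resource names, thresholds, and capacities are placeholders, and parameter names can differ slightly between module versions, so treat this as an outline rather than a ready-made script.

# Build the resource ID of the VM scale set that backs the node type (placeholder names).
$subId    = "<subscription id>"
$rg       = "<resource group>"
$vmssName = "<node type / VM scale set name>"
$vmssId   = "/subscriptions/$subId/resourceGroups/$rg/providers/Microsoft.Compute/virtualMachineScaleSets/$vmssName"

# Scale out by one instance when average CPU across the scale set stays above 70% for five minutes.
$rule = New-AzureRmAutoscaleRule -MetricName "Percentage CPU" -MetricResourceId $vmssId `
    -Operator GreaterThan -MetricStatistic Average -Threshold 70 `
    -TimeGrain 00:01:00 -TimeWindow 00:05:00 `
    -ScaleActionCooldown 00:10:00 -ScaleActionDirection Increase -ScaleActionValue 1

# Keep the node type between 5 and 10 instances; never set the minimum below what the
# reliability tier of the primary node type requires.
$profile = New-AzureRmAutoscaleProfile -Name "SFNodeTypeProfile" `
    -DefaultCapacity 5 -MinimumCapacity 5 -MaximumCapacity 10 -Rule $rule

Add-AzureRmAutoscaleSetting -Name "SFNodeTypeAutoscale" -ResourceGroup $rg `
    -Location "<region>" -TargetResourceId $vmssId -AutoscaleProfile $profile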
Manually add VMs to a node type/VM scale set Follow the sample/instructions in the quick start template gallery to change the number of VMs in each Nodetype. You need the execute the following steps one VM instance at a time. . but never scale down the number of instances in the primary node types less than what the reliability tier warrants. Run Disable-ServiceFabricNode with intent ‘RemoveNode’ to disable the node you’re going to remove (the highest instance in that node type). This will now remove the highest VM instance. Repeat steps 1 through 3 as needed. Manually remove VMs from the non-primary node type/VM scale set NOTE For a stateful service. Run Get-ServiceFabricNode to make sure that the node has indeed transitioned to disabled. Follow the sample/instructions in the quick start template gallery to change the number of VMs by one in that Nodetype. Run Disable-ServiceFabricNode with intent ‘RemoveNode’ to disable the node you’re going to remove (the highest instance in that node type). NOTE Adding of VMs takes time. to allow for over 10 minutes before the VM capacity is available for the replicas/ service instances to get placed. NOTE In a scale down scenario. 2. Follow the sample/instructions in the quick start template gallery to change the number of VMs by one in that Nodetype. So plan to add capacity well in time. unless your node type has a durability level of Gold or Silver you need to call the Remove- ServiceFabricNodeState cmdlet with the appropriate node name. Run Get-ServiceFabricNode to make sure that the node has indeed transitioned to disabled. you have two options: 1) Choose a durability level of Gold or Silver (available soon) for the node types in your cluster. the VM was deleted but FM system service still thinks that the node (that was mapped to the VM that was deleted) will come back. it is typically unsafe to shut down all the machines in the cluster unless you have first performed a full backup of your state.referred to as "maintaining quorum. Which will then automatically remove the nodes from our system services (FM) state when you scale down." So. you need to call the Remove-ServiceFabricNodeState cmdlet. However. which gives you the infrastructure integration. and partitioning services: Plan your cluster capacity Cluster upgrades Partition stateful services for maximum scale . upgrading a cluster. but never scale down the number of instances in the primary node types less than what the reliability tier warrants. Repeat steps 1 through 3 as needed. Refer to the details on reliability tiers here. NOTE Service Fabric clusters require a certain number of nodes to be up at all the time in order to maintain availability and preserve state . Next steps Read the following to also learn about planning cluster capacity. Behaviors you may observe in Service Fabric Explorer When you scale up a cluster the Service Fabric Explorer will reflect the number of nodes (VM scale set instances) that are part of the cluster.4. Refer to the details on durability levels here 2) Once the VM instance has been scaled down. Here is the explanation for this behavior. When you scale the VM scale set down. In order to make sure that a node is removed when a VM is removed. So Service Fabric Explorer continues to display that node (though the health state may be error or unknown). The nodes listed in Service Fabric Explorer are a reflection of what the Service Fabric system services (FM specifically) knows about the number of nodes the cluster had/has. 
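As a companion to the manual scale-in steps above, here is a minimal sketch of the node-removal sequence using the Service Fabric cmdlets the article refers to. The cluster endpoint and node name are placeholders; add the appropriate security parameters when connecting to a secure cluster.

# Connect to the cluster (add certificate/AAD parameters for a secure cluster).
Connect-ServiceFabricCluster -ConnectionEndpoint "<cluster endpoint>:19000"

# The highest instance in the node type you are scaling in (placeholder name).
$nodeName = "_BackEnd_4"

# Disable the node with intent RemoveNode and wait for it to finish draining.
Disable-ServiceFabricNode -NodeName $nodeName -Intent RemoveNode -Force
while ((Get-ServiceFabricNode -NodeName $nodeName).NodeStatus -ne "Disabled") {
    Start-Sleep -Seconds 30
}

# Reduce the VM scale set capacity by one (template redeployment as described above),
# then, for Bronze durability node types, clean up the node state.
Remove-ServiceFabricNodeState -NodeName $nodeName -Force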
when you scale a cluster down you will see the removed node/VM instance displayed in an unhealthy state unless you call Remove-ServiceFabricNodeState cmd with the appropriate node name. so the scaling service could easily be a console application or Windows service running from outside the Service Fabric application. this approach may not be a good solution. use System. they do not automatically remove knowledge of that node from the associated Service Fabric cluster unless the node type has a durability level of Silver or Gold.FabricClient. you may wish to implement more customized automatic scaling models. To interact with the Service Fabric cluster itself. Both IAzure and FabricClient can connect to their associated Azure resources remotely. This document looks at programmatic methods of coordinating Azure scaling operations for more advanced scenarios. though. Because auto-scale rules work at the scale set level (rather than at the Service Fabric level). That article covers how Service Fabric clusters are built on top of virtual machine scale sets and can be scaled either manually or with auto-scale rules. scaling manually or via auto-scale rules are good solutions. One approach to implementing this 'home-made' auto-scaling functionality is to add a new stateless service to the Service Fabric application to manage scaling operations. the scaling code doesn't need to run as a service in the cluster to be scaled. they may not be the right fit. Although there are many metrics supported by auto-scale rules. This rude node removal will leave 'ghost' Service Fabric node state behind after scale-in operations. Note that a node type with a durability level of Gold or Silver will automatically clean up removed nodes. If your scenario calls for scaling based on some metric not covered in that set. Of course. An individual (or a service) would need to periodically clean up removed node state in the Service Fabric cluster. a set of triggers can determine if scaling is required (including checking parameters such as maximum cluster size and scaling cooldowns). it is still a limited set. Based on these limitations. If existing auto-scale options don't work for your scenario. auto- scale rules can remove Service Fabric nodes without shutting them down gracefully. If scaling operations are required frequently or at unpredictable times. When auto-scale rules remove an instance from a virtual machine scale set. these APIs make it possible to implement custom scaling logic. Potential drawbacks to these approaches include: Manually scaling requires you to log in and explicitly request scaling operations. Within the service's RunAsync method. In other scenarios. Scale a Service Fabric cluster programmatically 3/16/2017 • 6 min to read • Edit Online Fundamentals of scaling a Service Fabric cluster in Azure are covered in documentation on cluster scaling. then auto-scale rules may not be a good option. The API used for virtual machine scale set interactions (both to check the current number of virtual machine instances and to modify it) is the fluent Azure Management Compute library. Credential management One challenge of writing a service to handle scaling is that the service must be able to access virtual machine scale . The fluent compute library provides an easy-to-use API for interacting with virtual machine scale sets.Fabric. Reasons for programmatic scaling In many scenarios. 
Scaling APIs Azure APIs exist which allow applications to programmatically work with virtual machine scale sets and Service Fabric clusters. SubscriptionId == AzureSubscriptionId) { ServiceEventSource.Fluent soon.VirtualMachineScaleSets. As a temporary workaround.Update(). AzureClientKey.Apply().Azure. Accessing the Service Fabric cluster is easy if the scaling service is modifying its own Service Fabric application.Compute. Log in to the Azure CLI ( az login ) as a user with access to the virtual machine scale set 2. Scaling out Using the fluent Azure compute SDK. and tenant for later use.Capacity . NodeCount.FromServicePrincipal(AzureClientId. but a fix has been merged. which can be viewed with az account list The fluent compute library can log in using these credentials as follows: var credentials = AzureCredentials.Management. var newCapacity = Math. PowerShell cmdlets can be invoked from the scaling service to enact the same change (though this route means that PowerShell tools must be present): . password. instances can be added to the virtual machine scale set with just a few calls - var scaleSet = AzureClient?. protected settings from the scale set's Resource Manager template are lost.Authenticate(credentials). so the issue should be resolved in published versions of Microsoft.Current.0.ServiceMessage(Context. AzureEnvironment.WithCapacity(newCapacity). The bug is that when changing virtual machine scale set properties (like capacity) with the fluent compute API.ServiceMessage(Context.GetById(ScaleSetId).GetById(ScaleSetId). you can use a service principal created with the Azure CLI 2.AzureGlobalCloud). b. To log in. A service principal can be created with the following steps: 1. name.VirtualMachineScaleSets. There is currently a bug that keeps this code from working. if (AzureClient?.set resources without an interactive login.WithSubscription(AzureSubscriptionId).Value + 1). You will also need your subscription ID. Create the service principal with az ad sp create-for-rbac a. scale set instance count can be queried via AzureClient. scaleSet. Make note of the appId (called 'client ID' elsewhere). } Once logged in. } else { ServiceEventSource. "ERROR: Failed to login to Azure").Current. These missing settings cause (among other things) Service Fabric services to not set up properly on new virtual machine instances. but credentials are needed to access the scale set.Min(MaximumNodeCount. "Successfully logged into Azure"). IAzure AzureClient = Azure. AzureTenantId. psInstance. StringComparison.Invoke().NodeType.HadErrors) { foreach (var error in psInstance.NodeStatus == System.Equals(NodeTypeToScale.Automation. But.AddScript($@" $clientId = ""{AzureClientId}"" $clientKey = ConvertTo-SecureString -String ""{AzureClientKey}"" -AsPlainText -Force $Credential = New-Object -TypeName ""System. For non-seed nodes. newer nodes can be found by comparing NodeInstanceId . . adding a scale set instance should be all that's needed to start a new Service Fabric node since the scale set template includes extensions to automatically join new instances to the Service Fabric cluster.Where(n => n. as was discussed previously. Scaling in Scaling in is similar to scaling out. using (var client = new FabricClient()) { var mostRecentLiveNode = (await client.Streams.PSCredential"" -ArgumentList $clientId.Current. it's necessary to interact with the Service Fabric cluster to shut down the node to be removed and then to remove its state. So.Where(n => n. in the Bronze-durability scale-in case.Query. 
} } } As when adding a node manually. Once the node to be removed is found.Up) .QueryManager. using (var psInstance = PowerShell.sku.NodeStatus.Create()) { psInstance.OrderByDescending(n => n. if (psInstance. Preparing the node for shutdown involves finding the node to be removed (the most recently added node) and deactivating it.NodeInstanceId) .ServiceMessage(Context.capacity = {newCapacity} Update-AzureRmVmss -ResourceGroupName {ResourceGroup} -Name {NodeTypeToScale} -VirtualMachineScaleSet $vmss ").OrdinalIgnoreCase)) .FirstOrDefault(). $clientKey Login-AzureRmAccount -Credential $Credential -ServicePrincipal -TenantId {AzureTenantId} $vmss = Get-AzureRmVmss -ResourceGroupName {ResourceGroup} -VMScaleSetName {NodeTypeToScale} $vmss.Management.Fabric. Service Fabric only automatically cleans up removed nodes with a durability of Gold or Silver.GetNodeListAsync()) .Error) { ServiceEventSource. Be aware that seed nodes don't seem to always follow the convention that greater instance IDs are removed first. it can be deactivated and removed using the same FabricClient instance and the IAzure instance from earlier. The actual virtual machine scale set changes are practically the same. $"ERROR adding node: {error.ToString()}"). Delay(10 * 1000).Fabric.VirtualMachineScaleSets. var waitStart = DateTime.Apply().waitStart < timeout) { mostRecentLiveNode = (await client.NodeName}"). the ability to add or remove nodes manually is probably sufficient.RemoveNodeStateAsync(mostRecentLiveNode.Query. How you should approach Service Fabric scaling depends on your scenario.NodeName). For more complex scenarios.WithCapacity(newCapacity).GetNodeListAsync()).NodeStatus.DeactivateNodeAsync(mostRecentLiveNode. Service Fabric node state can be removed. auto-scale rules and SDKs exposing the ability to scale programmatically offer powerful alternatives.ClusterManager. familiarize yourself with the following concepts and useful APIs: Scaling manually or with auto-scale rules Fluent Azure Management Libraries for .Query.Fabric. Potential drawbacks As demonstrated in the preceding code snippets.NET (useful for interacting with a Service Fabric cluster's underlying virtual machine scale sets) System.1).NodeStatus.QueryManager. Once the virtual machine instance is removed.RemoveNode). NodeCount.Now.Update().Current.GetById(ScaleSetId).Update() not working until Azure/azure-sdk-for- net#2716 is addressed. you need to work around IVirtualMachineScaleSet. Next steps To get started implementing your own auto-scaling logic. while ((mostRecentLiveNode. $"Disabling node {mostRecentLiveNode.NodeName == mostRecentLiveNode. await client. This can be useful for scenarios requiring precise control over when or how an application scales in or out.NodeStatus == System.Value . // Remove the node from the Service Fabric cluster ServiceEventSource.FabricClient (useful for interacting with a Service Fabric cluster and its nodes) . await client.ClusterManager.ServiceMessage(Context. // Check min count scaleSet. // Wait (up to a timeout) for the node to gracefully shutdown var timeout = TimeSpan.Max(MinimumNodeCount. NodeDeactivationIntent. } // Decrement VMSS capacity var newCapacity = Math. this control comes with a tradeoff of code complexity. Using this approach means that you need to own scaling code.NodeName). creating your own scaling service provides the highest degree of control and customizability over your application's scaling behavior.NodeStatus == System. await Task.Up || mostRecentLiveNode.FirstOrDefault(n => n.Now . 
which is non-trivial. var scaleSet = AzureClient?.Fabric.NodeName. If scaling is uncommon.FromMinutes(5). However.Disabling) && DateTime. As before. An Azure Service Fabric cluster is a resource that you own. Setting the upgrade mode via portal You can set the cluster to automatic or manual when you are creating the cluster. Upgrade an Azure Service Fabric cluster 3/2/2017 • 10 min to read • Edit Online For any modern system. the The new releases are announced on the service fabric team blog. You do this by setting the "upgradeMode" cluster configuration on the portal or using Resource Manager at the time of creation or later on a live cluster NOTE Make sure to keep your cluster running a supported fabric version always. when Microsoft releases a new version or choose to select a supported fabric version you want your cluster to be on. the previous version is marked for end of support after a minimum of 60 days from that date. Controlling the fabric version that runs on your Cluster You can set your cluster to receive automatic fabric upgrades. The cluster remains in a warning state until you upgrade to a supported fabric version. a health event is generated that puts your cluster into a warning health state. designing for upgradability is key to achieving long-term success of your product. . but is partly managed by Microsoft. As and when we announce the release of a new version of service fabric. 14 days prior to the expiry of the release your cluster is running. The new release is available to choose then. This article describes what is managed automatically and what you can configure yourself. Scroll down this document to read more on how to set those custom health policies. If the cluster health policies are not met. Once you have fixed the issues that resulted in the rollback. Upgrading to a new version on a cluster that is set to Manual mode via portal.You can set the cluster to automatic or manual when on a live cluster. . by following the same steps as before. the upgrade is rolled back. The Fabric upgrade gets kicked off automatically. The cluster health policies (a combination of node health and the health all the applications running in the cluster) are adhered to during the upgrade. all you need to do is select the available version from the dropdown and save. using the manage experience. To upgrade to a new version. you need to initiate the upgrade again. ServiceFabric/clusters resource definition and set the "clusterCodeVersion" to one of the supported fabric versions as shown below and then deploy the template.Setting the upgrade mode via a Resource Manager template Add the "upgradeMode" configuration to the Microsoft. The valid values for "upgradeMode" are "Manual" or "Automatic" . the upgrade is rolled back.9999999". If the cluster health policies are not met. and you should get an output similar to this. Scroll down this document to read more on how to set those custom health policies. by following the same steps as before. to upgrade to a new version.Upgrading to a new version on a cluster that is set to Manual mode via a Resource Manager template. The latest release does not have a valid date . change the "clusterCodeVersion" to a supported version and deploy it. "supportExpiryUtc" tells your when a given release is expiring or has expired. which just means that the expiry date is not yet set. you need to initiate the upgrade again. The deployment of the template. Get list of all available version for all environments for a given subscription Run the following command. 
The cluster health policies (a combination of node health and the health all the applications running in the cluster) are adhered to during the upgrade.it has a value of "9999-12-31T23:59:59. kicks of the Fabric upgrade gets kicked off automatically. . Once you have fixed the issues that resulted in the rollback. When the cluster is in Manual mode. ServiceFabric/locations/eastus/clusterVersions?api-version=2016-09-01 Output: { "value": [ { "id": "subscriptions/35349203-a0b3-405e-8a23- 9f1450984307/providers/Microsoft.9490". "name": "5.ServiceFabric/environments/clusterVersions".1427.1427.4.9490".9999999". "name": "4. "supportExpiryUtc": "9999-12-31T23:59:59.9999999". "supportExpiryUtc": "9999-12-31T23:59:59. or both. we perform the upgrades in the following phases: Phase 1: An upgrade is performed by using all cluster health policies During this phase.ServiceFabric/environments/Windows/clusterVersions/5. "properties": { "codeVersion": "5. the upgrades proceed one upgrade domain at a time.9490". "type": " Microsoft.azure.9999999". the upgrade is rolled back.1427. "environment": "Linux" } } ] } Fabric upgrade behavior when the cluster Upgrade Mode is Automatic Microsoft maintains the fabric code and configuration that runs in an Azure cluster. "type": " Microsoft.0. "environment": "Windows" } }.4.1. If the cluster health policies are not met. "type": "Microsoft. Then an email is sent to the owner of the subscription. To make sure that your application suffers no impact or minimal impact due to these upgrades.com/subscriptions/1857f442-3bce-4b96-ad95- 627f76437a67/providers/Microsoft. The cluster health policies (a combination of node health and the health all the applications running in the cluster) are adhered to during the upgrade. and the applications that were running in the cluster continue to run without any downtime. { "id": "subscriptions/35349203-a0b3-405e-8a23- 9f1450984307/providers/Microsoft.1427. "name": "5.0.ServiceFabric/locations/{{location}}/ clusterVersions?api-version=2016-09-01 Example: https://management.1427.1427.4.0. "properties": { "codeVersion": "4. We perform automatic monitored upgrades to the software on an as-needed basis. { "id": "subscriptions/35349203-a0b3-405e-8a23- 9f1450984307/providers/Microsoft.1427. GET https://<endpoint>/subscriptions/{{subscriptionId}}/providers/Microsoft.9490". "properties": { "codeVersion": "5.ServiceFabric/environments/Windows/clusterVersions/4.0. configuration. These upgrades could be code. "supportExpiryUtc": "2016-11-26T23:59:59.9490".1427.1427. The email contains the following information: Notification that we had to roll back a cluster upgrade.ServiceFabric/environments/clusterVersions".ServiceFabric/environments/Windows/clusterVersions/4.9490".9490". .9490".1.9490".ServiceFabric/environments/clusterVersions". "environment": "Windows" } }. the upgrade is rolled back. the upgrade is considered successful and marked complete. Phase 3 upgrades proceed one upgrade domain at a time. If the cluster health policies in effect are not met. As in Phase 1. An email with this information is sent to the subscription owner. We try to execute the same upgrade a few more times in case any upgrades failed for infrastructure reasons. the Phase 2 upgrades proceed one upgrade domain at a time. Suggested remedial actions. the cluster is pinned. Then an email is sent to the owner of the subscription. There is no email confirmation of a successful run. There is no email confirmation of a successful run. if any. 
The email contains the following information: Notification that we had to roll back a cluster upgrade. We try to execute the same upgrade a few more times in case any upgrades failed for infrastructure reasons. The cluster health policies (a combination of node health and the health all the applications running in the cluster) are adhered to for the duration of the upgrade. If the cluster health policies are met. There is no email confirmation of a successful run. we proceed to Phase 3. After the n days from the date the email was sent. The number of days (n) until we execute Phase 3. The emails we send you in Phase 2 must be taken seriously and remedial actions must be taken. the upgrade is considered successful and marked complete. If your cluster gets to this phase. Here are the configurations that you can change on a live cluster. and the applications that were running in the cluster continue to run without any downtime. If the cluster health policies are not met. We expect most of the cluster upgrades to succeed without impacting your application availability. Cluster configurations that you control In addition to the ability to set the cluster upgrade mode. we proceed to Phase 2. the upgrade is considered successful and marked complete. if any. We do not expect any clusters to get into a state where Phase 3 has failed. If the cluster health policies are met. A reminder email is sent a couple of days before n days are up. The number of days (n) until we execute Phase 2. If the cluster health policies are met. This can happen during the initial upgrade or any of the upgrade reruns in this phase. Similar to the other two phases. We try to execute the same upgrade a few more times in case any upgrades failed for infrastructure reasons. so that it will no longer receive support and/or upgrades. . Phase 2: An upgrade is performed by using default health policies only The health policies in this phase are set in such a way that the number of applications that were healthy at the beginning of the upgrade remains the same for the duration of the upgrade process. This can happen during the initial upgrade or any of the upgrade reruns in this phase. there is a good chance that your application becomes unhealthy and/or lose availability. along with the remedial actions. This can happen during the initial upgrade or any of the upgrade reruns in this phase. Phase 3: An upgrade is performed by using aggressive health policies These health policies in this phase are geared towards completion of the upgrade rather than the health of the applications. After that. the upgrade is rolled back. Suggested remedial actions. After the n days from the date the email was sent. This is to avoid sending you too many emails--receiving an email should be seen as an exception to normal. Very few cluster upgrades end up in this phase. . the load balancers are named "LB-name of the Resource group-NodeTypename". Add a new probe to the appropriate load balancer. Since the load balancer names are unique only within a resource group. Add a new rule to the same load balancer by using the probe that you created in the previous step. one for each node type. To open a new port on all VMs in a node type. 2.Certificates You can add new or delete certificates for the cluster and client via the portal easily. it is best if you search for them under a specific resource group. or you can use Resource Manager PowerShell directly. You can use the portal. If you deployed your cluster by using the portal. 
Add a new rule to the load balancer. do the following: 1. Refer to this document for detailed instructions Application ports You can change application ports by changing the Load Balancer resource properties that are associated with the node type. Placement properties For each of the node types. then these policies get applied each time you select a new version triggering the system to kick off the fabric upgrade in your cluster. and how to define them. then these policies get applied to the Phase-1 of the automatic fabric upgrades. you can add custom placement properties that you want to use in your applications. by selecting the advanced upgrade settings. If you do not override the policies. If you have set your cluster to Automatic fabric upgrades. Review the following picture on how to. . the defaults are used. Capacity metrics For each of the node types. For details on the use of capacity metrics to report load. NOTE For details on the use of placement constraints. You can specify the custom health policies or review the current settings under the "fabric upgrade" blade. node properties. refer to the Service Fabric Cluster Resource Manager Documents on Describing Your Cluster and Metrics and Load.Health polices You can specify custom health polices for fabric upgrade. refer to the section "Placement Constraints and Node Properties" in the Service Fabric Cluster Resource Manager Document on Describing Your Cluster. If you have set your cluster for Manual fabric upgrades. Fabric upgrade settings . you can add custom capacity metrics that you want to use in your applications to report load. NodeType is a default property that you can use without adding it explicitly. Customize Fabric settings for your cluster Refer to service fabric cluster fabric settings on what and how you can customize them. You are responsible for this upgrade--there is currently no automation for this. But currently. so that you do not take down more than one at a time. Next steps Learn how to customize some of the service fabric cluster fabric settings Learn how to scale your cluster in and out Learn about application upgrades . you are responsible to patch your VMs. you must do it one VM at a time. OS upgrades on the VMs that make up the cluster If you must upgrade the OS image on the virtual machines of the cluster. OS patches on the VMs that make up the cluster This capability is planned for the future as an automated feature. You must do this one VM at a time. including the resource group. 3. Follow the instructions on that page to complete the deletion of the resource group. You can delete the resource group using PowerShell or through the Azure portal. Delete the entire resource group (RG) that the Service Fabric cluster is in This is the easiest way to ensure that you delete all the resources associated with your cluster. follow the steps outlined in How to install and Configure Azure PowerShell. 5. Make sure Azure PowerShell 1. then you can delete specific resources. This brings up the Resource Group Essentials page. So to completely delete a Service Fabric cluster you also need to delete all the resources it is made of. Open a PowerShell window and run the following PS cmdlets: Login-AzureRmAccount Remove-AzureRmResourceGroup -Name <name of ResouceGroup> -Force You will get a prompt to confirm the deletion if you did not use the -Force option. Delete a resource group in the Azure portal 1. 
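To script the application-port change described in the Application ports section above, a sketch using the AzureRM.Network cmdlets might look like the following. The load balancer name, port, and configuration indexes are placeholders; verify them against your own template before applying, since load balancer changes affect cluster connectivity.

# Get the node type's load balancer (placeholder names).
$lb = Get-AzureRmLoadBalancer -ResourceGroupName "<resource group>" -Name "LB-<resource group>-<node type>"

# Add a probe and a rule for a new application port, for example 8081.
Add-AzureRmLoadBalancerProbeConfig -LoadBalancer $lb -Name "AppPortProbe2" `
    -Protocol Tcp -Port 8081 -IntervalInSeconds 5 -ProbeCount 2

Add-AzureRmLoadBalancerRuleConfig -LoadBalancer $lb -Name "AppPortLBRule2" `
    -Protocol Tcp -FrontendPort 8081 -BackendPort 8081 `
    -FrontendIpConfiguration $lb.FrontendIpConfigurations[0] `
    -BackendAddressPool $lb.BackendAddressPools[0] `
    -Probe ($lb.Probes | Where-Object { $_.Name -eq "AppPortProbe2" })

# Push the updated configuration back to Azure.
Set-AzureRmLoadBalancer -LoadBalancer $lb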
Delete a Service Fabric cluster on Azure and the resources it uses 3/24/2017 • 3 min to read • Edit Online

A Service Fabric cluster is made up of many other Azure resources in addition to the cluster resource itself. So to completely delete a Service Fabric cluster you also need to delete all the resources it is made of. You have two options: Either delete the resource group that the cluster is in (which deletes the cluster resource and any other resources in the resource group), or specifically delete the cluster resource and its associated resources (but not other resources in the resource group).

NOTE Deleting the cluster resource does not delete all the other resources that your Service Fabric cluster is composed of.

Delete the entire resource group (RG) that the Service Fabric cluster is in

This is the easiest way to ensure that you delete all the resources associated with your cluster, including the resource group. You can delete the resource group using PowerShell or through the Azure portal.

Delete a resource group in the Azure portal
1. Login to the Azure portal.
2. Navigate to the Service Fabric cluster you want to delete.
3. Click on the Resource Group name on the cluster essentials page.
4. This brings up the Resource Group Essentials page.
5. Click Delete.
6. Follow the instructions on that page to complete the deletion of the resource group.

Delete the resource group using Azure PowerShell

You can also delete the resource group by running the following Azure PowerShell cmdlets. Make sure Azure PowerShell 1.0 or greater is installed on your computer. If you have not done this before, follow the steps outlined in How to install and Configure Azure PowerShell. Open a PowerShell window and run the following PS cmdlets:

Login-AzureRmAccount
Remove-AzureRmResourceGroup -Name <name of ResourceGroup> -Force

You will get a prompt to confirm the deletion if you did not use the -Force option. On confirmation the RG and all the resources it contains are deleted.

Delete the cluster resource and the resources it uses, but not other resources in the resource group

If your resource group has only resources that are related to the Service Fabric cluster you want to delete, then it is easier to delete the entire resource group. If your resource group has resources that are not related to the Service Fabric cluster, then you can delete specific resources. If you want to selectively delete the resources one-by-one in your resource group, then follow these steps.

If you deployed your cluster using the portal or using one of the Service Fabric Resource Manager templates from the template gallery, then all the resources that the cluster uses are tagged with the following two tags. You can use them to decide which resources you want to delete.

Tag#1: Key = clusterName, Value = 'name of the cluster'
Tag#2: Key = resourceName, Value = ServiceFabric

Delete specific resources in the Azure portal
1. Login to the Azure portal.
2. Navigate to the Service Fabric cluster you want to delete.
3. Go to All settings on the essentials blade.
4. Click on Tags under Resource Management in the settings blade.
5. Click on one of the Tags in the tags blade to get a list of all the resources with that tag.
6. Once you have the list of tagged resources, click on each of the resources and delete them.

Delete the resources using Azure PowerShell

You can delete the resources one-by-one by running the following Azure PowerShell cmdlets. Make sure Azure PowerShell 1.0 or greater is installed on your computer. If you have not done this before, follow the steps outlined in How to install and Configure Azure PowerShell. Open a PowerShell window and run the following PS cmdlets:

Login-AzureRmAccount

For each of the resources you want to delete, run the following:

Remove-AzureRmResource -ResourceName "<name of the Resource>" -ResourceType "<Resource Type>" -ResourceGroupName "<name of the resource group>" -Force

To delete the cluster resource, run the following:

Remove-AzureRmResource -ResourceName "<name of the Resource>" -ResourceType "Microsoft.ServiceFabric/clusters" -ResourceGroupName "<name of the resource group>" -Force

Next steps
Read the following to also learn about upgrading a cluster and partitioning services:
Learn about cluster upgrades
Learn about partitioning stateful services for maximum scale
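For the tag-based cleanup described in the article above, a rough sketch that enumerates resources carrying the clusterName tag and removes them is shown below. Find-AzureRmResource is assumed to be available from the AzureRM.Resources module; review the returned list carefully before deleting anything.

# Placeholder cluster name; this should match the value of the clusterName tag.
$clusterName = "<name of the cluster>"

# Enumerate resources tagged with the cluster name (assumed cmdlet from AzureRM.Resources).
$taggedResources = Find-AzureRmResource -TagName "clusterName" -TagValue $clusterName

# Inspect $taggedResources before running the deletion loop.
foreach ($resource in $taggedResources) {
    Remove-AzureRmResource -ResourceId $resource.ResourceId -Force
}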
You specify the two client roles (administrator and client) at the time of cluster creation by providing separate certificates for each. See Service Fabric cluster security for details on setting up a secure Service Fabric cluster. Access control allows the cluster administrator to limit access to certain cluster operations for different groups of users. It can perform any reads and writes on the Service Fabric cluster. Administrators have full access to management capabilities (including read/write capabilities). and the ability to resolve applications and services. Role-based access control for Service Fabric clients 3/3/2017 • 2 min to read • Edit Online Azure Service Fabric supports two different access control types for clients that are connected to a Service Fabric cluster: administrator and user. users only have read access to management capabilities (for example. Default access control settings The administrator access control type has full access to all the FabricClient APIs. including the following operations: Application and service operations CreateService: service creation CreateServiceFromTemplate: service creation from template UpdateService: service updates DeleteService: service deletion ProvisionApplicationType: application type provisioning CreateApplication: application creation DeleteApplication: application deletion UpgradeApplication: starting or interrupting application upgrades UnprovisionApplicationType: application type unprovisioning MoveNextUpgradeDomain: resuming application upgrades with an explicit update domain ReportUpgradeHealth: resuming application upgrades with the current upgrade progress ReportHealth: reporting health PredeployPackageToNode: predeployment API CodePackageControl: restarting code packages RecoverPartition: recovering a partition RecoverPartitions: recovering partitions RecoverServicePartitions: recovering service partitions RecoverSystemPartitions: recovering system service partitions Cluster operations ProvisionFabric: MSI and/or cluster manifest provisioning UpgradeFabric: starting cluster upgrades UnprovisionFabric: MSI and/or cluster manifest unprovisioning MoveNextFabricUpgradeDomain: resuming cluster upgrades with an explicit update domain ReportFabricUpgradeHealth: resuming cluster upgrades with the current upgrade progress StartInfrastructureTask: starting infrastructure tasks FinishInfrastructureTask: finishing infrastructure tasks . and value fields. stopping. user. You can change the defaults by going to the Fabric Settings option during cluster creation. you can provide admin capabilities to the client if needed. InvokeInfrastructureCommand: infrastructure task management commands ActivateNode: activating a node DeactivateNode: deactivating a node DeactivateNodesBatch: deactivating multiple nodes RemoveNodeDeactivations: reverting deactivation on multiple nodes GetNodeDeactivationStatus: checking deactivation status NodeStateRemoved: reporting node state removed ReportFault: reporting fault FileContent: image store client file transfer (external to cluster) FileDownload: image store client file download initiation (external to cluster) InternalList: image store client file list operation (internal) Delete: image store client delete operation Upload: image store client upload operation NodeControl: starting. 
limited to the following operations: EnumerateSubnames: naming URI enumeration EnumerateProperties: naming property enumeration PropertyReadBatch: naming property read operations GetServiceDescription: long-poll service notifications and reading service descriptions ResolveService: complaint-based service resolution ResolveNameOwner: resolving naming URI owner ResolvePartition: resolving system services ServiceNotifications: event-based service notifications GetUpgradeStatus: polling application upgrade status GetFabricUpgradeStatus: polling cluster upgrade status InvokeInfrastructureQuery: querying infrastructure tasks List: image store client file list operation ResetPartitionLoad: resetting load for a failover unit ToggleVerboseServicePlacementHealthReporting: toggling verbose service placement health reporting The admin access control also has access to the preceding operations. and restarting nodes MoveReplicaControl: moving replicas from one node to another Miscellaneous operations Ping: client pings Query: all queries allowed NameExists: naming URI existence checks The user access control type is. Next steps Service Fabric cluster security Service Fabric cluster creation . admin. and providing the preceding settings in the name. by default. Changing default settings for client roles In the cluster manifest file. Select Edit and update the fabricSettings JSON element and add a new element { "name": "Diagnostics". Customizing Service Fabric cluster settings using Azure Resource Manager templates The steps below illustrate how to add a new setting MaxDiskQuotaInMB to the Diagnostics section. Navigate to your subscription by expanding subscriptions -> resource groups -> Microsoft. "parameters": [ { "name": "MaxDiskQuotaInMB". ProducerInstances String The list of DCA producer instances. . Customize Service Fabric cluster settings and Fabric Upgrade policy 3/28/2017 • 43 min to read • Edit Online This document tells you how to customize the various fabric settings and the fabric upgrade policy for your Service Fabric cluster. "value": "65536" } ] } Fabric settings that you can customize Here are the Fabric settings that you can customize: Section Name: Diagnostics PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION ConsumerInstances String The list of DCA consumer instances. 1. default is 3 Number of days after which we delete old ETL files containing application ETW traces. You can customize them on the portal or using an Azure Resource Manager template. AppEtwTraceDeletionAgeInDays Int.ServiceFabric -> Your Cluster Name 3. Go to https://resources. select "Read/Write" 4. In the top right corner. In case a setting listed below is not available via the portal customize it using an Azure Resource Manager template.azure.com 2. NOTE Not all settings may be available via the portal. default is 4 Trace etw level can take values 1. This is generated when the cluster is created. ApplicationLogsFormatVersion Int. MaxCounterBinaryFileSizeInMB Int. 2. default is 60 Sampling interval for performance counters being collected. PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION AppDiagnosticStoreAccessRequiresImpe Bool. Supported values are 0 and 1. default is 10 Maximum interval (in seconds) after nMinutes which a new performance counter binary file is created. default is true Whether or not impersonation is rsonation required when accessing diagnostic stores on behalf of the application. Version 1 includes more fields from the ETW event record than version 0. 
To be supported you must keep the trace level at 4 Section Name: PerformanceCounterLocalStore PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION IsEnabled Bool. default is 0 Version for application logs format. SamplingIntervalInSeconds Int. EnableTelemetry Bool. Counters String Comma-separated list of performance counters to collect. Section Name: Setup . default is 1 Maximum size (in MB) for each performance counter binary file. DiskFullSafetySpaceInMB Int. NewCounterBinaryFileCreationIntervalI Int. default is false Flag indicates whether circular trace sessions should be used. MaxDiskQuotaInMB Int. default is true This is going to enable or disable telemetry. default is 65536 Disk quota in MB for Windows Fabric log files. Section Name: Trace/Etw PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION Level Int. default is 1024 Remaining disk space in MB to protect from use by DCA. ClusterId String The unique id of the cluster. 3. EnableCircularTraceSession Bool. default is true Flag indicates whether performance counter collection on local node is enabled. 4. ReplicatorAddress string. default is false Specifies if firewall settings need to be set by the system or not. MaxReplicationMessageSize Uint. default is 0. default is 64 This value defines the initial size for the queue which maintains the replication operations on the primary. Determines the amount of time that the replicator waits after receiving an operation before sending back an acknowledgement. Other operations received during this time period will have their acknowledgements sent back in a single message-> reducing network traffic but potentially reducing the throughput of the replicator. PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION FabricDataRoot String Service Fabric data root directory. If you are using third party firewalls. SkipFirewallConfiguration Bool. then you must open the ports for the system and applications to use Section Name: TransactionalReplicator PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION MaxCopyQueueSize Uint. BatchAcknowledgementInterval Time in seconds. Default is 50MB. Note that it must be a power of 2. default is 16384 This is the maximum value defines the initial size for the queue which maintains replication operations. .015 Specify timespan in seconds. Note that it must be a power of 2. ServiceStartupType String The startup type of the fabric host service. This is where SF logs and traces are placed. default is 52428800 Maximum message size of replication operations. default is "localhost:0" The endpoint in form of a string - 'IP:Port' which is used by the Windows Fabric Replicator to establish connections with other replicas in order to send/receive operations. Default for Azure is d:\svcfab FabricLogRoot String Service fabric log root directory. If during runtime the queue grows to this size operations will be throttled between the primary and secondary replicators. ServiceRunAsAccountName String The account name under which to run fabric host service. This applies only if you are using windows firewall. InitialPrimaryReplicationQueueSize Uint. MaxPrimaryReplicationQueueMemorySi Uint. MaxMetadataSizeInKB Int.PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION MaxPrimaryReplicationQueueSize Uint. This value is the maximum number of bytes that can be outstanding during core logger updates. default is 4 Maximum size of the log stream metadata. . default is 64 This value defines the initial size for the queue which maintains the replication operations on the secondary. 
InitialSecondaryReplicationQueueSize Uint. user is required to take a full backup. default is 8192 This is the maximum number of operations that could exist in the primary replication queue. default is 50 A checkpoint will be initiated when the log usage exceeds this value. An incremental backup requests will fail if the incremental backup would generate a backup log that would cause the accumulated backup logs since the relevant full backup to be larger than this size. Note that it must be a power of 2. default is 1024 Maximum size of a log stream record. default is 16384 This is the maximum number of operations that could exist in the secondary replication queue. MaxRecordSizeInKB Uint. default is 0 Int for maximum write queue depth that the core logger can use as specified in kilobytes for the log that is associated with this replica. MaxWriteQueueDepthInKB Int. It may be 0 for the core logger to compute an appropriate value or a multiple of 4. default is false Bool which controls if the operations on ns the secondary replicator are cleared once they are acknowledged to the primary(flushed to the disk). while catching up replicas after a failover. default is 0 This is the maximum value of the ySize secondary replication queue in bytes. SecondaryClearAcknowledgedOperatio Bool. default is 800 Max accumulated size (in MB) of backup logs in a given backup log chain. Note that it must be a power of 2. MaxSecondaryReplicationQueueSize Uint. MaxAccumulatedBackupLogSizeInMB Int. default is 0 This is the maximum value of the ze primary replication queue in bytes. In such cases. Settings this to TRUE can result in additional disk reads on the new primary. Note that it must be a power of 2. CheckpointThresholdInMB Int. MaxSecondaryReplicationQueueMemor Uint. the Client switches to use the next address sequentially. PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION SharedLogId String Shared log identifier. Section Name: FabricClient PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION NodeAddresses string. default is 300 Specify duration for api before warning health event is fired. ServiceChangePollInterval Time in seconds. default is 120 Specify timespan in seconds. Connection timeout interval for each time client tries to open a connection to the gateway. Initially the Client connects selecting one of the addresses randomly. If this value is empty then the default shared log is used. For 0. default is 20 The interval at which the FabricClient transport sends keep-alive messages to the gateway. default is 0 Minimum size of the transactional log. . keepAlive is disabled. The interval between consecutive polls for service changes from the client to the gateway for registered service change notifications callbacks. default is 100000 Number of partitions cached for service resolution (set to 0 for no limit). 0 indicates that the replicator will determine the minimum log size according to other settings. If more than one connection string is supplied and a connection fails because of a communication or timeout error. This is a guid and should be unique for each shared log. Must be a positive value. ConnectionInitializationTimeout Time in seconds. SlowApiMonitoringDuration Time in seconds. The log will not be allowed to truncate to a size below this setting. PartitionLocationCacheLimit Int. default is 2 Specify timespan in seconds. MinLogSizeInMB Int. See the Naming Service Address retry section for details on retries semantics. 
Increasing this value increases the possibility of doing partial copies and incremental backups since chances of relevant log records being truncated is lowered. SharedLogPath String Path to the shared log. default is "" A collection of addresses (connection strings) on different nodes that can be used to communicate with the the Naming Service. KeepAliveIntervalInSeconds Int. default is 120 Specify timespan in seconds. default is false Cluster health evaluation policy: enable per application type health evaluation. Section Name: NodeDomainIds PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION UpgradeDomainId string. The interval at which reporting component re-sends accumulated health reports to Health Manager. MaxFileSenderThreads Uint. RetryBackoffInterval Time in seconds. Section Name: HealthManager PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION EnableApplicationTypeHealthEvaluation Bool. Setting to 0 or negative value disables monitoring. Section Name: Common PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION PerfMonitorInterval Time in seconds. default is 30 Specify timespan in seconds. default is 10 The max number of files that are transferred in parallel. The interval at which reporting component sends accumulated health reports to Health Manager. default is 3 Specify timespan in seconds. PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION HealthOperationTimeout Time in seconds. HealthReportRetrySendInterval Time in seconds. . Section Name: FabricNode PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION StateTraceInterval Time in seconds. default is "" Describes the upgrade domain a node belongs to. The timeout for a report message sent to Health Manager. HealthReportSendInterval Time in seconds. default is 30 Specify timespan in seconds. The back- off interval before retrying the operation. default is 1 Specify timespan in seconds. Performance monitoring interval. default is 300 Specify timespan in seconds. The interval for tracing node status on each node and up nodes on FM/FMM. ClusterX509FindValue string. The fault domain is defined through a URI that describes the location of the node in the datacenter. default is 0 End (no inclusive) of the application ports managed by hosting subsystem. . PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION PropertyGroup NodeFaultDomainIdCollection Describes the fault domains a node belongs to. Section Name: FabricNode PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION StartApplicationPortRange Int. Required if EndpointFilteringEnabled is true in Hosting. default is "" Search filter value used to locate cluster certificate. Section Name: NodeProperties PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION PropertyGroup NodePropertyCollectionMap A collection of string key-value pairs for node properties. when there are multiple matches. Section Name: NodeCapacities PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION PropertyGroup NodeCapacityCollectionMap A collection of node capacities for different metrics. ClusterX509StoreName string. ClusterX509FindType string. EndApplicationPortRange Int. "FindBySubjectName" With "FindBySubjectName". Required if EndpointFilteringEnabled is true in Hosting. ClusterX509FindValueSecondary string. default is "My" Name of X. Fault Domain URIs are of the format fd:/fd/ followed by a URI path segment.509 certificate store that contains cluster certificate for securing intra-cluster communication. default is 0 Start of the application ports managed by hosting subsystem. 
the one with the furthest expiration is used. default is "FindByThumbprint" Indicates how to search for cluster certificate in the store specified by ClusterX509StoreName Supported values: "FindByThumbprint". default is "" Search filter value used to locate cluster certificate. default is "" Search filter value used to locate certificate for default admin role FabricClient. default is "" Search filter value used to locate certificate for default user role FabricClient. default is "" Search filter value used to locate certificate for default admin role FabricClient.509 certificate store that contains server certificate for entree service. default is "" Search filter value used to locate server certificate. default is "FindByThumbprint" Indicates how to search for certificate in the store specified by UserRoleClientX509StoreName Supported value: FindByThumbprint. UserRoleClientX509FindValue string. PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION ServerAuthX509StoreName string. Section Name: Paas .509 certificate store that contains certificate for default user role FabricClient. FindBySubjectName. ClientAuthX509FindType string. ClientAuthX509FindValueSecondary string. default is "" Search filter value used to locate certificate for default user role FabricClient. FindBySubjectName. default is "My" Name of X. ServerAuthX509FindType string. UserRoleClientX509StoreName string. ServerAuthX509FindValueSecondary string. default is "" Search filter value used to locate server certificate. default is "My" Name of the X. ClientAuthX509FindValue string. UserRoleClientX509FindValueSecondary string. default is "FindByThumbprint" Indicates how to search for certificate in the store specified by ClientAuthX509StoreName Supported value: FindByThumbprint. FindBySubjectName. ClientAuthX509StoreName string. default is "FindByThumbprint" Indicates how to search for server certificate in the store specified by ServerAuthX509StoreName Supported value: FindByThumbprint.509 certificate store that contains certificate for default admin role FabricClient. ServerAuthX509FindValue string. UserRoleClientX509FindType string. default is "My" Name of the X. Section Name: FabricHost PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION StopTimeout Time in seconds. default is 300 Specify timespan in seconds. default is 5 Specify timespan in seconds.On every continuous activation failure the system will retry the activation for up to the MaxActivationFailureCount. The retry interval on every try is a product of continuous activation failure and the activation back-off interval. EnableRestartManagement Bool. default is false This is to enable server restart. default is false This is to enable base update for server. default is 300 Specify timespan in seconds. Backoff interval on every activation failure. ActivationMaxRetryInterval Time in seconds. EnableServiceFabricAutomaticUpdates Bool. PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION ClusterId string. ActivationMaxFailureCount Int. Windows Fabric waits for this duration for the replica to come back up before creating new replacement replicas (which would require a copy of the state). StartTimeout Time in seconds. . default is false This is to enable fabric automatic update via Windows Update. On every continuous failure the retry interval is calculated as Min( ActivationMaxRetryInterval. Timeout for fabricactivationmanager startup. default is 60. ActivationRetryBackoffInterval Time in seconds. The timeout for hosted service activation. 
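Before putting a value into one of the X509FindValue settings above, it is worth confirming that the certificate is actually present in the store and location the setting points at. A verification sketch; the thumbprint and subject name below are placeholders, and Cert:\LocalMachine\My corresponds to store name "My" in the LocalMachine store location:

    # Sketch only: check what a FindByThumbprint or FindBySubjectName filter would match.
    $thumbprint = "<certificate thumbprint>"        # placeholder

    # FindByThumbprint semantics
    Get-ChildItem Cert:\LocalMachine\My | Where-Object { $_.Thumbprint -eq $thumbprint }

    # FindBySubjectName semantics; when several certificates match,
    # the runtime uses the one with the furthest expiration.
    Get-ChildItem Cert:\LocalMachine\My |
        Where-Object { $_.Subject -like "*CN=mycluster.contoso.com*" } |
        Sort-Object NotAfter -Descending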
default is 300 Specify timespan in seconds. When a persisted replica goes down. default is "" X509 certificate store used by fabric for configuration protection. deactivation and upgrade.0 * 30 Specify timespan in seconds. Section Name: FailoverManager PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION UserReplicaRestartWaitDuration Time in seconds. Max retry interval for Activation. EnableServiceFabricBaseUpgrade Bool. Continuous Failure Count * ActivationRetryBackoffInterval). default is 10 This is the maximum count for which system will retry failed activation before giving up. at a cost of increased load on Windows Fabric and the amount of time it takes to perform updates to the naming data. default is 3600. This timer determines how long the FM will keep the standby replica before discarding it. this timer starts. UserStandByReplicaKeepDuration Time in seconds. decreasing the change that the information will be lost as a result of node failures. Increasing the number of replica sets increases the level of reliability for the information in the Naming Service Store. default is 7 The number of replica sets for each partition of the Naming Service store. default is 3 The minimum number of Naming Service replicas required to write into to complete an update. When it expires the FM will begin to replace the replicas which are down (it does not yet consider them lost). the partition is recovered from quorum loss by considering the down replicas as lost. default is (60. Note that this can potentially incur data loss. MinReplicaSetSize Int. When a *7 persisted replica come back from a down state. PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION QuorumLossWaitDuration Time in seconds. UserMaxStandByReplicaCount Int. Section Name: NamingService PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION TargetReplicaSetSize Int. This is the max duration for which we allow a partition to be in a state of quorum loss. This value should never be more than the TargetReplicaSetSize. . default is MaxValue Specify timespan in seconds. If the partition is still in quorum loss after this duration. it may have already been replaced. When a Naming Service replica goes down.0 * 30) Specify timespan in seconds.0 * 24 Specify timespan in seconds. default is 1 The default max number of StandBy replicas that the system keeps for user services. If there are fewer replicas than this active in the system the Reliability System denies updates to the Naming Service Store until replicas are restored. ReplicaRestartWaitDuration Time in seconds. MaxFileOperationTimeout Time in seconds. MaxNamingServiceHealthReports Int. default is 5 Specify timespan in seconds. default is 600 Specify timespan in seconds. When it expires the FM will consider the down replicas as lost. ServiceNotificationTimeout Time in seconds. StandByReplicaKeepDuration Time in seconds. The maximum timeout allowed for client operations. default is 410241024 The maximum message size for client node communication when using naming. . this timer starts. default is 30 Specify timespan in seconds. Interval in which the naming inconsistency repair between the authority owner and name owner will start. all slow operations are sent. Requests specifying a larger timeout will be rejected. Requests specifying a larger timeout will be rejected. RepairInterval Time in seconds. Not that this may result in data loss. default value is 4MB. default is "" Placement constraint for the Naming Service. default is 3600. and attempt to recover quorum. 
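For a cluster running in Azure, values such as the FailoverManager durations above are changed through the cluster's fabric settings rather than on individual machines. A sketch using the AzureRM Service Fabric module, which may be newer than this article; the resource group and cluster names are placeholders and the value is illustrative only:

    # Sketch only: override a single fabric setting on an Azure cluster.
    # Requires the AzureRM.ServiceFabric module; names and value are placeholders.
    Set-AzureRmServiceFabricSetting -ResourceGroupName "myResourceGroup" `
        -Name "mycluster" `
        -Section "FailoverManager" `
        -Parameter "UserReplicaRestartWaitDuration" `
        -Value "120"

The same change can also be made by editing the fabricSettings array in the cluster's Resource Manager template and redeploying it.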
MaxMessageSize Int. default is 10 The maximum number of slow operations that Naming store service reports unhealthy at one time. it may have already been replaced. ServiceDescriptionCacheLimit Int. The maximum timeout allowed for file store service operation. default is 0 The maximum number of entries maintained in the LRU service description cache at the Naming Store Service (set to 0 for no limit). The timeout used when delivering service notifications to the client. default is 1000 The maximum allowed number of client connections per gateway. PlacementConstraints string. This timer determines how long the FM will keep the standby replica before discarding it. MaxClientConnections Int.PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION QuorumLossWaitDuration Time in seconds.0 * 2 Specify timespan in seconds. MaxOperationTimeout Time in seconds. If 0. When a Naming Service replicas come back from a down state. default is 30 Specify timespan in seconds. DOS attack alleviation. When a Naming Service gets into quorum loss. default is MaxValue Specify timespan in seconds. Any empty partitions above this number will be removed from the index in ascending lookup version order. default is 1000 The maximum number of empty partitions that will remain indexed in the notification cache for synchronizing reconnecting clients. This is only needed for "DomainUser" or "ManagedServiceAccount" account type. default is 3 The number of partitions of the Naming Service store to be created. default is "" Indicates the RunAs account name. so partition keys [0. Each partition owns a single partition key that corresponds to its index. Valid values are "domain\user" or "user@domain". RunAsPassword string. RunAsAccountType string. Reconnecting clients can still synchronize and receive missed empty partition updates. MaxIndexedEmptyPartitions Int. GatewayServiceDescriptionCacheLimit Int. This is only needed for "DomainUser" account type. PartitionCount) exist. but the synchronization protocol becomes more expensive. at a cost of increased utilization of resources (since PartitionCount*ReplicaSetSize service replicas must be maintained). default is "" Indicates the RunAs account type. This is needed for any RunAs section Valid values are "DomainUser/NetworkService/Managed ServiceAccount/LocalSystem". default is 0 The maximum number of entries maintained in the LRU service description cache at the Naming Gateway (set to 0 for no limit). PartitionCount Int. Section Name: RunAs_Fabric . PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION MaxOutstandingNotificationsPerClient Int. default is 1000 The maximum number of outstanding notifications before a client registration is forcibly closed by the gateway. Section Name: RunAs PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION RunAsAccountName string. default is "" Indicates the RunAs account password. Increasing the number of Naming Service partitions increases the scale that the Naming Service can perform at by decreasing the average amount of data held by any backing replica set. RunAsAccountType string. This is only needed for "DomainUser" or "ManagedServiceAccount" account type. RunAsAccountType string. This is only needed for "DomainUser" account type. RunAsPassword string. This is needed for any RunAs section Valid values are "LocalUser/DomainUser/NetworkService /ManagedServiceAccount/LocalSystem". This is only needed for "DomainUser" account type. This is only needed for "DomainUser" account type. 
This is needed for any RunAs section Valid values are "LocalUser/DomainUser/NetworkService /ManagedServiceAccount/LocalSystem". default is "" Indicates the RunAs account type. Valid values are "domain\user" or "user@domain". Section Name: RunAs_DCA PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION RunAsAccountName string. default is "" Indicates the RunAs account name. RunAsPassword string. RunAsPassword string. Section Name: RunAs_HttpGateway PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION RunAsAccountName string. Section Name: HttpGateway . Valid values are "domain\user" or "user@domain". default is "" Indicates the RunAs account password. This is needed for any RunAs section Valid values are "LocalUser/DomainUser/NetworkService /ManagedServiceAccount/LocalSystem". default is "" Indicates the RunAs account password. default is "" Indicates the RunAs account type. default is "" Indicates the RunAs account name. This is only needed for "DomainUser" or "ManagedServiceAccount" account type. default is "" Indicates the RunAs account password. default is "" Indicates the RunAs account type. PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION RunAsAccountName string. RunAsAccountType string. default is "" Indicates the RunAs account name. This is only needed for "DomainUser" or "ManagedServiceAccount" account type. Valid values are "domain\user" or "user@domain". MaxEntityBodySize Uint. Use 0 to indicate no limit. Httpgateway is disabled by default and this config needs to be set to enable it. Use 0 to indicate no limit Default should be consistent with SharedLogSizeInMB below. default is 8192 The number of MB to allocate in the shared log container. If one then the memory settings are configured automatically and may change based on system conditions. default is false Enables/Disables the httpgateway. So this has to be >= 4096. Section Name: KtlLogger PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION AutomaticMemoryConfiguration Int. Use 0 to indicate no limit. Minimum read chunk size is 4096 bytes. MaximumDestagingWriteOutstandingI Int. SharedLogId string. SharedLogPath string. default is "" Path and file name to location to place shared log container. default is 8388608 The number of KB to initially allocate for the write buffer memory pool. default is 0 The number of KB to allow the shared nKB log to advance ahead of the dedicated log. Use "" if using default path under fabric data root. default is 50 Number of reads to post to the http server queue. PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION IsEnabled Bool. default is 0 The number of KB to allow the write buffer memory pool to grow up to. Section Name: ApplicationGateway/Http . If zero then the memory configuration settings are used directly and do not change based on system conditions. SharedLogSizeInMB Int. Use "" for using default path under fabric data root. WriteBufferMemoryPoolMinimumInKB Int. Httpgateway will fail a request if it has a body of size > this value. This controls the number of concurrent requests that can be satisfied by the HttpGateway. WriteBufferMemoryPoolMaximumInKB Int. default is 4194304 Gives the maximum size of the body that can be expected from a http request. Default value is 4MB. ActiveListeners Uint. default is 1 Flag that indicates if the memory settings should be automatically and dynamically configured. default is "" Unique guid for shared log container. Gives the default back-off interval before retrying a failed resolve service operation. 
default is "None" Indicates the type of security credentials to use at the http app gateway endpoint Valid values are "None/X509. HttpApplicationGateway is disabled by default and this config needs to be set to enable it. default is 60 Specify timespan in seconds. GatewayAuthCredentialType string. ResolveServiceBackoffInterval Time in seconds. Section Name: Management . GatewayX509CertificateFindType string. default is 5 Specify timespan in seconds. This controls the number of concurrent requests that can be satisfied by the HttpGateway. FindValueSecondary is looked up. default is false Enables/Disables the HttpApplicationGateway. and if that doesnt exist. This certificate is configured on the https endpoint and can also be used to verify the identity of the app if needed by the services. default is 1000 Number of reads to post to the http server queue. FindValueSecondary is looked up. BodyChunkSize Uint. FindValue is looked up first. FindBySubjectName. default is 4096 Gives the size of for the chunk in bytes used to read the body.509 certificate store that contains certificate for http app gateway. and if that doesnt exist. PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION IsEnabled Bool. This certificate is configured on the https endpoint and can also be used to verify the identity of the app if needed by the services. GatewayX509CertificateFindValue string. default is "" Search filter value used to locate the http app gateway certificate. DefaultHttpRequestTimeout Time in seconds. default is "FindByThumbprint" Indicates how to search for certificate in the store specified by GatewayX509CertificateStoreName Supported value: FindByThumbprint. GatewayX509CertificateFindValueSecon string. GatewayX509CertificateStoreName string. default is "My" Name of X. Gives the default request timeout for the http requests being processed in the http app gateway. NumberOfParallelOperations Uint. FindValue is looked up first. default is "" Search filter value used to locate the dary http app gateway certificate. ImageStoreMinimumTransferBPS Int. default is 25 The maximum number of worker threads in parallel. default is true This configuration allows us to enable or disable caching. Change this value only if the latency between the cluster and ImageStore is high to allow more time for the cluster to download from the external ImageStore. AzureStorageMaxWorkerThreads Int. MaxPercentDeltaUnhealthyNodes Int. AzureStorageOperationTimeout Time in seconds. default is 10 Cluster upgrade health evaluation policy: maximum percent of delta unhealthy nodes allowed for the cluster to be healthy. default is 0 Cluster health evaluation policy: maximum percent of unhealthy nodes allowed for the cluster to be healthy. This value is used to determine the timeout when accessing the external ImageStore. DisableServerSideCopy Bool. Timeout for xstore operation to complete. Section Name: HealthManager/ClusterHealthPolicy PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION ConsiderWarningAsError Bool. default is false This configuration allows us to enable or disable checksum validation during application provisioning. PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION ImageStoreConnectionString SecureString Connection string to the Root for ImageStore. default is 6000 Specify timespan in seconds. default is 0 Cluster health evaluation policy: maximum percent of unhealthy applications allowed for the cluster to be healthy. . AzureStorageMaxConnections Int. 
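The simplest way to confirm which of these values are actually in effect on a running cluster is to dump the cluster manifest, which lists every section and parameter the cluster was deployed with. A sketch, assuming a cluster connection has already been opened; the section names in the filter are just examples:

    # Sketch only: inspect the effective fabric settings of a running cluster.
    $manifest = [xml](Get-ServiceFabricClusterManifest)
    $manifest.ClusterManifest.FabricSettings.Section |
        Where-Object { $_.Name -in @("HttpGateway", "KtlLogger", "ApplicationGateway/Http") } |
        ForEach-Object {
            $_.Name                                            # section name
            $_.Parameter | Format-Table Name, Value -AutoSize  # its parameters
        }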
default is 5000 The maximum number of concurrent connections to azure storage. DisableChecksumValidation Bool. default is false Cluster health evaluation policy: warnings are treated as errors. default is 1024 The minimum transfer rate between the cluster and ImageStore. ImageCachingEnabled Bool. MaxPercentUnhealthyNodes Int. MaxPercentUnhealthyApplications Int. default is false This configuration enables or disables server side copy of application package on the ImageStore during application provisioning. PlacementConstraints string. default is 0 The MinReplicaSetSize for FaultAnalysisService. default is 60 minutes Specify timespan in seconds. StoredChaosEventCleanupIntervalInSec Int. The minutes StandByReplicaKeepDuration for FaultAnalysisService. The QuorumLossWaitDuration for FaultAnalysisService. default is MaxValue Specify timespan in seconds. default is 60 Specify timespan in seconds. and that completed at least CompletedActionKeepDurationInSecon ds ago will be removed. since the work to cleanup is only done on that interval. default is "" The PlacementConstraints for FaultAnalysisService. The ReplicaRestartWaitDuration for FaultAnalysisService. Section Name: FaultAnalysisService PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION TargetReplicaSetSize Int. if the number of events is more than 30000. default is (60247) Specify timespan in seconds. default is 3600 This is how often the store will be cleaned up. the cleanup will kick in. StoredActionCleanupIntervalInSeconds Int. CompletedActionKeepDurationInSecon Int. StandByReplicaKeepDuration Time in seconds. default is 3600 This is how often the store will be onds audited for cleanup. default is 604800 This is approximately how long to keep ds actions that are in a terminal state. Section Name: FileStoreService PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION NamingOperationTimeout Time in seconds. The timeout for performing naming operation. . default is 15 Cluster upgrade health evaluation althyNodes policy: maximum percent of delta of unhealthy nodes in an upgrade domain allowed for the cluster to be healthy. PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION MaxPercentUpgradeDomainDeltaUnhe Int. ReplicaRestartWaitDuration Time in seconds. Only actions in a terminal state. QuorumLossWaitDuration Time in seconds. This also depends on StoredActionCleanupIntervalInSeconds. default is 0 NOT_PLATFORM_UNIX_START The TargetReplicaSetSize for FaultAnalysisService. 604800 is 7 days. MinReplicaSetSize Int. '0' == number of cores. PrimaryAccountUserPassword SecureString. default is 0 The maximum number of parallel files that secondary can copy from primary. AnonymousAccessEnabled Bool. default is 4096 The maximum number of parallel store transcation operations allowed on primary. '0' == number of cores. default is 200 The maximum number of parallel threads allowed to process requests in the primary. default is 60 Specify timespan in seconds. default is "" The primary account Username of the principal to ACL the FileStoreService shares. default is "LocalMachine" The store location of the X509 n certificate used to generate HMAC on the PrimaryAccountNTLMPasswordSecret when using NTLM authentication. MaxCopyOperationThreads Uint. default is true Enable/Disable anonymous access to the FileStoreService shares. MaxStoreOperations Uint. '0' == number of cores. PrimaryAccountType string. MaxRequestProcessingThreads Uint. . The timeout for performing query operation. default is empty PrimaryAccountNTLMX509StoreLocatio string. 
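The ClusterHealthPolicy values above set the cluster-wide defaults; the same knobs can also be supplied ad hoc on a health query, which is a convenient way to preview their effect before changing the cluster configuration. A sketch, assuming an open cluster connection; the percentages are illustrative, not recommendations:

    # Sketch only: evaluate cluster health with explicit policy overrides that mirror
    # ConsiderWarningAsError, MaxPercentUnhealthyNodes, and MaxPercentUnhealthyApplications.
    Get-ServiceFabricClusterHealth -ConsiderWarningAsError $true `
        -MaxPercentUnhealthyNodes 10 `
        -MaxPercentUnhealthyApplications 0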
default is 25 The maximum number of file copy retries on the secondary before giving up. default is "" The primary AccountType of the principal to ACL the FileStoreService shares. default is "MY" The store name of the X509 certificate used to generate HMAC on the PrimaryAccountNTLMPasswordSecret when using NTLM authentication. PrimaryAccountNTLMX509StoreName string. MaxSecondaryFileCopyFailureThreshold Uint. PrimaryAccountUserName string.PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION QueryOperationTimeout Time in seconds. MaxFileOperationThreads Uint. default is empty The primary account password of the principal to ACL the FileStoreService shares. FileStoreService PrimaryAccountNTLMPasswordSecret SecureString. default is 100 The maximum number of parallel threads allowed to perform FileOperations (Copy/Move) in the primary. '0' == number of cores. default is "" The secondary AccountType of the principal to ACL the FileStoreService shares. default is "LocalMachine" The store location of the X509 tion certificate used to generate HMAC on the SecondaryAccountNTLMPasswordSecret when using NTLM authentication. default is 7 The TargetReplicaSetSize for ImageStoreService. default is empty The password secret which used as seed t to generated same password when using NTLM authentication. SecondaryAccountUserPassword SecureString. SecondaryAccountType string. default is "MY" The store name of the X509 certificate e used to generate HMAC on the SecondaryAccountNTLMPasswordSecret when using NTLM authentication. default is empty The secondary account password of the principal to ACL the FileStoreService shares. PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION PrimaryAccountNTLMX509Thumbprint string. TargetReplicaSetSize Int.0 * 30 Specify timespan in seconds. default is 3 The MinReplicaSetSize for ImageStoreService. default is "" The secondary account Username of the principal to ACL the FileStoreService shares. Section Name: ImageStoreService PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION Enabled Bool. SecondaryAccountNTLMPasswordSecre SecureString. MinReplicaSetSize Int. SecondaryAccountNTLMX509StoreLoca string. . default is "" The thumbprint of the X509 certificate nt used to generate HMAC on the SecondaryAccountNTLMPasswordSecret when using NTLM authentication. default is 60. default is false The Enabled flag for ImageStoreService. SecondaryAccountNTLMX509StoreNam string. ReplicaRestartWaitDuration Time in seconds. The ReplicaRestartWaitDuration for ImageStoreService. default is "" The thumbprint of the X509 certificate used to generate HMAC on the PrimaryAccountNTLMPasswordSecret when using NTLM authentication. SecondaryAccountNTLMX509Thumbpri string. SecondaryAccountUserName string. ClientDefaultTimeout Time in seconds. Timeout value for top-level download request to Image Store Service. default is 1800 Specify timespan in seconds.0 * 2 Specify timespan in seconds. Section Name: ImageStoreClient PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION ClientUploadTimeout Time in seconds. ClientCopyTimeout Time in seconds. default is 1800 Specify timespan in seconds. default is 1800 Specify timespan in seconds. Timeout value for top-level upload request to Image Store Service. PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION QuorumLossWaitDuration Time in seconds. ClientCopyTimeout Time in seconds. PlacementConstraints string. default is 600 Specify timespan in seconds. default is 3600. default is 600 Specify timespan in seconds. delete) to Image Store Service. 
The StandByReplicaKeepDuration for ImageStoreService. Timeout value for top-level download request to Image Store Service ClientListTimeout Time in seconds. exists. ClientDownloadTimeout Time in seconds. default is "" The PlacementConstraints for ImageStoreService. Timeout value for top-level copy request to Image Store Service.g. ClientListTimeout Time in seconds. default is MaxValue Specify timespan in seconds. . Timeout value for top-level list request to Image Store Service. default is 180 Specify timespan in seconds. default is 1800 Specify timespan in seconds. default is 1800 Specify timespan in seconds. Timeout value for top-level list request to Image Store Service. StandByReplicaKeepDuration Time in seconds. ClientUploadTimeout Time in seconds. default is 1800 Specify timespan in seconds. Timeout value for top-level copy request to Image Store Service. Timeout value for all non-upload/non-download requests (e. ClientDownloadTimeout Time in seconds. Timeout value for top-level upload request to Image Store Service. The QuorumLossWaitDuration for ImageStoreService. QuorumLossWaitDuration Time in seconds. default is 60247 Specify timespan in seconds. Timeout value for all non-upload/non-download requests (e. default is 0 The TargetReplicaSetSize for UpgradeOrchestrationService. delete) to Image Store Service. The minutes StandByReplicaKeepDuration for UpgradeOrchestrationService. default is false Setting to make code upgrade require administrator approval before proceeding. default is "" The PlacementConstraints for Upgrade service. PlacementConstraints string. default is "DSTS" Comma separated list of token validation providers to enable (valid providers are: DSTS. exists. . MinReplicaSetSize Int. PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION ClientDefaultTimeout Time in seconds. default is MaxValue Specify timespan in seconds. AutoupgradeEnabled Bool. AAD).g. default is "" The PlacementConstraints for UpgradeOrchestrationService. default is 180 Specify timespan in seconds. Currently only a single provider can be enabled at any time. The QuorumLossWaitDuration for UpgradeOrchestrationService. Section Name: UpgradeService PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION PlacementConstraints string. Section Name: TokenValidationService PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION Providers string. StandByReplicaKeepDuration Time in seconds. default is 60 minutes Specify timespan in seconds. ReplicaRestartWaitDuration Time in seconds. The ReplicaRestartWaitDuration for UpgradeOrchestrationService. default is true Automatic polling and upgrade action based on a goal-state file. Section Name: UpgradeOrchestrationService PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION TargetReplicaSetSize Int. default is 0 The MinReplicaSetSize for UpgradeOrchestrationService. UpgradeApprovalRequired Bool. X509FindType string. default is "My" X509StoreName for UpgradeService. X509SecondaryFindValue string. default is "" TestCabFolder for UpgradeService. default is "WUTest" The CoordinatorType for UpgradeService. default is "" X509FindType for UpgradeService. X509StoreLocation string. default is 2 The MinReplicaSetSize for UpgradeService. CreateServiceFromTemplate string. . default is "" BaseUrl for UpgradeService. MinReplicaSetSize Int. default is false OnlyBaseUpgrade for UpgradeService. DeleteService string. default is "" ClusterId for UpgradeService. default is "" X509StoreLocation for UpgradeService. OnlyBaseUpgrade Bool. X509FindValue string. 
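The ImageStoreClient timeouts above apply to the requests issued when an application package is copied to the image store. If a large package keeps timing out, it is usually better to raise the timeout on the call (or the ClientUploadTimeout setting) than to retry blindly. A sketch; the paths and names are placeholders:

    # Sketch only: upload an application package to the image store with an explicit timeout.
    Copy-ServiceFabricApplicationPackage -ApplicationPackagePath "C:\MyApp\pkg" `
        -ImageStoreConnectionString "fabric:ImageStore" `
        -ApplicationPackagePathInImageStore "MyAppV1" `
        -TimeoutSec 1800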
default is "Admin" Security configuration for service creation. default is "Admin" Security configuration for Naming property write operations. default is "" X509FindValue for UpgradeService. PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION TargetReplicaSetSize Int. ClusterId string. UpdateService string. default is "Admin" Security configuration for Naming URI deletion. Section Name: Security/ClientAccess PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION CreateName string. default is "" X509SecondaryFindValue for UpgradeService. default is "Admin" Security configuration for service updates. TestCabFolder string. default is "Admin" Security configuration for Naming URI creation. CoordinatorType string. X509StoreName string. CreateService string. default is "Admin" Security configuration for service creatin from template. default is 3 The TargetReplicaSetSize for UpgradeService. default is "Admin" Security configuration for service deletion. BaseUrl string. PropertyWriteBatch string. DeleteName string. default is "Admin" Security configuration for application deletion. MoveNextFabricUpgradeDomain string. RollbackFabricUpgrade string. default is "Admin" Security configuration for resuming application upgrades with the current upgrade progress. default is "Admin" Security configuration for starting infrastructure tasks. UpgradeFabric string. . default is "Admin" Security configuration for finishing infrastructure tasks. default is "Admin" Security configuration for rolling back cluster upgrades. UpgradeApplication string.PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION ProvisionApplicationType string. default is "Admin" Security configuration for MSI and/or Cluster Manifest unprovisioning. default is "Admin" Security configuration for MSI and/or Cluster Manifest provisioning. ProvisionFabric string. ReportHealth string. default is "Admin" Security configuration for resuming cluster upgrades with the current upgrade progress. ReportFabricUpgradeHealth string. default is "Admin" Security configuration for application type unprovisioning. default is "Admin" Security configuration for resuming application upgrades with an explicit Upgrade Domain. default is "Admin" Security configuration for reporting health. MoveNextUpgradeDomain string. default is "Admin" Security configuration for starting cluster upgrades. FinishInfrastructureTask string. default is "Admin" Security configuration for resuming cluster upgrades with an explicity Upgrade Domain. UnprovisionApplicationType string. StartInfrastructureTask string. default is "Admin" Security configuration for rolling back application upgrades. default is "Admin" Security configuration for application creation. DeleteApplication string. default is "Admin" Security configuration for starting or interrupting application upgrades. UnprovisionFabric string. default is "Admin" Security configuration for application type provisioning. CreateApplication string. ReportUpgradeHealth string. RollbackApplicationUpgrade string. default is "Admin" Security configuration for reporting fault.PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION ActivateNode string. DeactivateNodesBatch string. RecoverSystemPartitions string. default is "Admin" Security configuration for reporting node state removed. default is "Admin" Security configuration for recovering system service partitions. default is "Admin" Security configuration for infrastructure task management commands. ReportFault string. Upload string. InternalList string. 
default is "Admin" Security configuration for image store client delete operation. Delete string. default is "Admin" Security configuration for image store client staging location retrieval. DeactivateNode string. default is "Admin" Security configuration for deactivating multiple nodes. default is "Admin" Security configuration for recovering partitions. default is "Admin" Security configuration for image store client upload operation. default is "Admin" Security configuration for activation a node. FileContent string. default is "Admin" Security configuration for reverting deactivation on multiple nodes. . default is "Admin" Security configuration for recovering service partitions. default is "Admin" Security configuration for image store client file list operation (internal). GetNodeDeactivationStatus string. default is "Admin" Security configuration for checking deactivation status. default is "Admin" Security configuration for recovering a partition. default is "Admin" Security configuration for image store client file download initiation (external to cluster). InvokeInfrastructureCommand string. default is "Admin" Security configuration for image store client file transfer (external to cluster). FileDownload string. RecoverServicePartitions string. RecoverPartitions string. RecoverPartition string. NodeStateRemoved string. GetStagingLocation string. RemoveNodeDeactivations string. default is "Admin" Security configuration for deactivating a node. default is "Admin" Predeployment api. default is "Admin" Security configuration for starting. StartClusterConfigurationUpgrade string. EnumerateSubnames string. StopChaos string. MoveReplicaControl string. StartPartitionRestart string. default is "Admin" Security configuration for restarting code packages. default is "Admin" Induces data loss on a partition. Query string. default is "Admin" Starts Chaos . default is "Admin||User" Security configuration for Naming URI enumeration. UnreliableTransportControl string. default is "Admin" Induces StartApprovedUpgrades on a partition. and restarting nodes. CodePackageControl string. CancelTestCommand string. default is "Admin" Unreliable Transport for adding and removing behaviors.PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION GetStoreLocation string. PredeployPackageToNode string. default is "Admin" Induces GetUpgradesPendingApproval on a partition. default is "Admin" Simultaneously restarts some or all the replicas of a partition. StartPartitionDataLoss string. . default is "Admin" Cancels a specific TestCommand . default is "Admin" Security configuration for starting a node transition. default is "Admin" Move replica. StartChaos string. default is "Admin" Induces StartClusterConfigurationUpgrade on a partition. default is "Admin||User" Security configuration for client pings.if it is in flight. StartPartitionQuorumLoss string. StartApprovedUpgrades string. default is "Admin" Security configuration for image store client store location retrieval. StartNodeTransition string. stopping.if it has been started.if it is not already started. default is "Admin||User" Security configuration for queries. GetUpgradesPendingApproval string. default is "Admin" Stops Chaos . Ping string. NameExists string. default is "Admin||User" Security configuration for Naming URI existence checks. default is "Admin" Induces quorum loss on a partition. NodeControl string. default is "Admin||User" Security configuration for Naming property enumeration. 
default is "Admin||User" Security configuration for resolving Naming URI owner. GetServiceDescription string. ResolveService string. default is "Admin||User" Security configuration for querying infrastructure tasks. GetFabricUpgradeStatus string. default is "Admin||User" Security configuration for Naming property read operations. default is "Admin||User" Security configuration for complaint- based service resolution. GetUpgradeStatus string. default is "Admin||User" Security configuration for image store client file list operation. . default is "Admin||User" Security configuration for polling application upgrade status. default is "Admin||User" Security configuration for long-poll service notifications and reading service descriptions.PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION EnumerateProperties string. default is "Admin||User" Security configuration for event-based service notifications. ResolvePartition string. default is "Admin||User" Security configuration for complaint- based service prefix resolution. default is "Admin||User" Fetches the progress for an invoke quorum loss api call. default is "Admin||User" Security configuration for reset load for a failoverUnit. ServiceNotifications string. GetPartitionDataLossProgress string. InvokeInfrastructureQuery string. default is "Admin||User" Security configuration for Toggling eporting Verbose ServicePlacement HealthReporting. ResetPartitionLoad string. default is "Admin||User" Security configuration for resolving system services. ResolveNameOwner string. List string. PropertyReadBatch string. GetPartitionQuorumLossProgress string. GetPartitionRestartProgress string. default is "Admin||User" Fetches the progress for an invoke data loss api call. ToggleVerboseServicePlacementHealthR string. default is "Admin||User" Security configuration for polling cluster upgrade status. PrefixResolveService string. default is "Admin||User" Fetches the progress for a restart partition api call. The maximum duration RA will wait before terminating service host of replica that is not closing. FabricUpgradeMaxReplicaCloseDuration Time in seconds. The tion maximum time to wait before terminating a service host that is blocking node deactivation. GetClusterConfigurationUpgradeStatus string. ServiceApiHealthDuration defines how long do we wait for a service API to run before we report it unhealthy. default is 900 Specify timespan in seconds. NodeDeactivationMaxReplicaCloseDura Time in seconds. default is 900 Specify timespan in seconds. GetNodeTransitionProgress string. PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION GetChaosReport string. default is 900 Specify timespan in seconds. default is true Determines whether RA will use deactivation info for performing primary re-election For new clusters this configuration should be set to true For existing clusters that are being upgraded please see the release notes on how to enable this. PeriodicApiSlowTraceInterval Time in seconds. ServiceApiHealthDuration Time in seconds. default is 30 Specify timespan in seconds. default is 30 minutes Specify timespan in seconds. ServiceReconfigurationApiHealthDuratio Time in seconds. GetClusterConfiguration string. PeriodicApiSlowTraceInterval defines the interval over which slow API calls will be retraced by the API monitor. n ServiceReconfigurationApiHealthDuratio n defines how long the before a service in reconfiguration is reported as unhealthy. default is 5 minutes Specify timespan in seconds. 
default is "Admin||User" Induces GetClusterConfiguration on a partition. Section Name: ReconfigurationAgent PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION ApplicationUpgradeMaxReplicaCloseDu Time in seconds. default is "Admin||User" Security configuration for getting progress on a node transition command. default is "Admin||User" Fetches the status of Chaos within a given time range. The ration duration for which the system will wait before terminating service hosts that have replicas that are stuck in close. default is "Admin||User" Induces GetClusterConfigurationUpgradeStatus on a partition. . IsDeactivationInfoEnabled Bool. default is 10000 Limits the size of the table used for e quick validation and caching of Placement Constraint Expressions. DetailedPartitionListLimit Int. default is 200 Defines the number of times an unplaced replica has to be persistently unplaced before detailed health reports are emitted. . ValidatePlacementConstraint Bool. default is 15 Defines the number of diagnostic entries (with detailed information) per constraint to include before truncation in Diagnostics. default is 20 Defines the number of consecutive ReportLimit times that ResourceBalancer-issued Movements are dropped before diagnostics are conducted and health warnings are emitted. ConsecutiveDroppedMovementsHealth Int. DetailedNodeListLimit Int. default is true Specifies whether to trace reasons for CRM issued movements to the operational events channel. DetailedVerboseHealthReportLimit Int. default is 200 Defines the number of times constraint tLimit violating replica has to be persistently unfixed before diagnostics are conducted and detailed health reports are emitted. DetailedConstraintViolationHealthRepor Int. ConstraintViolationHealthReportLimit Int. Negative: No Warnings Emitted under this condition. default is 20 Defines the number of times a replica has to go unplaced before a health warning is reported for it (if verbose health reporting is enabled). default is 15 Defines the number of nodes per constraint to include before truncation in the Unplaced Replica reports. default is true Specifies whether or not the PlacementConstraint expression for a service is validated when a service's ServiceDescription is updated. PlacementConstraintValidationCacheSiz Int. default is 50 Defines the number of times constraint violating replica has to be persistently unfixed before diagnostics are conducted and health reports are emitted. DetailedDiagnosticsInfoListLimit Int. VerboseHealthReportLimit Int. default is 15 Defines the number of partitions per diagnostic entry for a constraint to include before truncation in Diagnostics.Section Name: PlacementAndLoadBalancing PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION TraceCRMReasons Bool. default is 120 Specify timespan in seconds. BalancingDelayAfterNewNode Time in seconds. Defines the minimum amount of time that must pass before PLB refreshes state again. default is 120 Specify timespan in seconds. GlobalMovementThrottleThresholdForB Uint. default is 5 Specify timespan in seconds. Do not start balancing activities within this period after a node down event. . DDo not Fix FaultDomain and UpgradeDomain constraint violations within this period after adding a new node.PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION PLBRefreshGap Time in seconds. Defines the minimum amount of time that must pass before two consecutive constraint check rounds. MinPlacementInterval Time in seconds. 
Do not Fix n FaultDomain and UpgradeDomain constraint violations within this period after a node down event. GlobalMovementThrottleThresholdForP Uint. default is 1 Specify timespan in seconds. Defines the minimum amount of time that must pass before two consecutive balancing rounds. default is 0 Maximum number of movements lacement allowed in Placement Phase in the past interval indicated by GlobalMovementThrottleCountingInter val. MinLoadBalancingInterval Time in seconds. Defines the minimum amount of time that must pass before two consecutive placement rounds. default is 1 Specify timespan in seconds. ConstraintFixPartialDelayAfterNodeDow Time in seconds. 0 indicates no limit. MinConstraintCheckInterval Time in seconds.0 indicates no limit. default is 120 Specify timespan in seconds. default is 0 Maximum number of movements alancing allowed in Balancing Phase in the past interval indicated by GlobalMovementThrottleCountingInter val. default is 120 Specify timespan in seconds. GlobalMovementThrottleThreshold Uint. Do not start balancing activities within this period after adding a new node. ConstraintFixPartialDelayAfterNewNode Time in seconds. default is 1 Specify timespan in seconds. BalancingDelayAfterNodeDown Time in seconds. default is 1000 Maximum number of movements allowed in the Balancing Phase in the past interval indicated by GlobalMovementThrottleCountingInter val. SwapPrimaryThrottlingEnabled Bool. InBuildThrottlingGlobalMaxValue Int. By default.5 Specify timespan in seconds. When placing services. PreventTransientOvercommit Bool. InBuildThrottlingAssociatedMetric string. Indicate val the length of the past interval for which to track per domain replica movements (used along with GlobalMovementThrottleThreshold). Indicate Interval the length of the past interval for which to track replica movements for each partition (used along with MovementPerPartitionThrottleThreshol d). default is false Determine whether the swap-primary throttling is enabled. . default is 0. MovementPerPartitionThrottleCounting Time in seconds. default is false Determine whether the in-build throttling is enabled. default is "" The associated metric name for this throttling. default is 0 The maximal number of in-build replicas allowed globally. Can be set to 0 to ignore global throttling altogether. default is 600 Specify timespan in seconds. default is 600 Specify timespan in seconds. PlacementSearchTimeout Time in seconds. default is 50 No balancing related movement will d occur for a partition if the number of balancing related movements for replicas of that partition has reached or exceeded MovementPerFailoverUnitThrottleThres hold in the past interval indicated by MovementPerPartitionThrottleCounting Interval. InBuildThrottlingEnabled Bool. resulting potentially large number of moves for better balanced placement. search for at most this long before returning a result. UseMoveCostReports Bool. PLB can initiate move out and move in on the same node which can create transient overcommit. default is false Determines should PLB immediately count on resources that will be freed up by the initiated moves. MovementPerPartitionThrottleThreshol Uint. Setting this parameter to true will prevent those kind of overcommits and on-demand defrag (aka placementWithMove) will be disabled.PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION GlobalMovementThrottleCountingInter Time in seconds. default is false Instructs the LB to ignore the cost element of the scoring function. MoveExistingReplicaForPlacement Bool. 
default is 1 Determines the priority of upgrade domain constraint: 0: Hard. 1: Soft. default is true Determines if all service replicas in cluster will be placed "all or nothing" given limited suitable nodes for them. MoveParentToFixAffinityViolation Bool. PartiallyPlaceServices Bool. 1: Soft. FaultDomainConstraintPriority Int. 2: Optimization. UseSeparateSecondaryLoad Bool. negative: Ignore. default is true Setting which determines if use different secondary load. 1: Soft. negative: Ignore. negative: Ignore. default is true Setting which determines if to move existing replica during placement.PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION SwapPrimaryThrottlingAssociatedMetric string. default is 0 Determines the priority of capacity constraint: 0: Hard. negative: Ignore. default is false Setting which determines if parent replicas can be moved to fix affinity constraints. default is 0 Determines the priority of affinity constraint: 0: Hard. UpgradeDomainConstraintPriority Int. AffinityConstraintPriority Int. default is 0 Determines the priority of placement constraint: 0: Hard. default is 0 The maximal number of swap-primary replicas allowed globally. PlaceChildWithoutParent Bool. PreferredLocationConstraintPriority Int. default is 2 Determines the priority of preferred location constraint: 0: Hard. default is "" The associated metric name for this throttling. PlacementConstraintPriority Int. default is true Setting which determines if child service replica can be placed if no parent replica is up. ScaleoutCountConstraintPriority Int. default is 0 Determines the priority of scaleout count constraint: 0: Hard. negative: Ignore. negative: Ignore. 1: Soft. SwapPrimaryThrottlingGlobalMaxValue Int. negative: Ignore CapacityConstraintPriority Int. default is 0 Determines the priority of capacity constraint: 0: Hard. negative: Ignore. . 1: Soft. ApplicationCapacityConstraintPriority Int. 1: Soft. 1: Soft. 1: Soft. default is 0 Determines the priority of fault domain constraint: 0: Hard. The retry interval on every try is a product of continuous activation failure and the activation back-off interval. the system retries the activation for up to the MaxActivationFailureCount. default is 300 Maximum time allowed for the ServiceType to be registered with fabric ServiceTypeDisableFailureThreshold Whole number. default is false Determines if any type of failover unit dates update should interrupt fast or slow balancing run. Balancing run will NOT be interrupted in other cases . changed any replica flag. has missing replicas. ActivationRetryBackoffInterval Time in Seconds. Section Name: Hosting PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION ServiceTypeRegistrationTimeout Time in Seconds. default is 10 Number of times system retries failed activation before giving up Section Name: FailoverManager . ActivationMaxRetryInterval specifies Wait time interval before retry after every activation failure ActivationMaxFailureCount Whole number. changed primary replica location or changed number of replicas. default is 5 Backoff interval on every activation failure. default is 300 On every continuous activation failure. changed only partition version or any other case. EncryptAndSign for secure clusters. default is 1 This is the threshold for the failure count after which FailoverManager (FM) is notified to disable the service type on that node and try a different node for placement. 
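The priorities above only control how strictly the Cluster Resource Manager enforces each constraint; the constraint expressions themselves are supplied on the services. As a reminder of where such an expression comes from, here is a sketch that creates a stateless service with a placement constraint; the application, service, and node property names are placeholders:

    # Sketch only: a service-level placement constraint that the Cluster Resource Manager
    # enforces according to PlacementConstraintPriority.
    New-ServiceFabricService -ApplicationName "fabric:/MyApp" `
        -ServiceName "fabric:/MyApp/FrontEnd" `
        -ServiceTypeName "FrontEndType" `
        -Stateless -PartitionSchemeSingleton -InstanceCount -1 `
        -PlacementConstraint "(NodeTypeName == FrontEndNodeType)"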
Section Name: Security PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION ClusterProtectionLevel None or EncryptAndSign None (default) for unsecured clusters. PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION InterruptBalancingForAllFailoverUnitUp Bool. the system retries the activation for up to ActivationMaxFailureCount.if FailoverUnit: has extra replicas. On every continuous activation failure. ActivationMaxRetryInterval Time in seconds. With specified "false" balancing run will be interrupted if FailoverUnit: is created/deleted. default is 60 The frequency of health status check during a monitored Fabric upgrade InfrastructureTaskProcessingInterval Time in seconds. The processing interval used by the infrastructure task processing state machine. default is 60 The frequency of polling for application upgrade status. default is 3 The MinReplicaSetSize for ClusterManager. default is 30 Duration that a lease lasts between a node and its neighbors across fault domains. MinReplicaSetSize Int. default is 10 Specify timespan in seconds. default is 7 The TargetReplicaSetSize for ClusterManager. ReplicaRestartWaitDuration Time in seconds. PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION PeriodicLoadPersistInterval Time in seconds. This value determines the rate of update for any GetFabricUpgradeProgress call FabricUpgradeHealthCheckInterval Time in seconds.0 * 30) Specify timespan in seconds. TargetReplicaSetSize Int. default is 60 The frequency of health status checks during a monitored application upgrades FabricUpgradeStatusPollInterval Time in seconds. default is (60. LeaseDurationAcrossFaultDomain Time in seconds. The QuorumLossWaitDuration for ClusterManager. default is 10 This determines how often the FM check for new load reports Section Name: Federation PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION LeaseDuration Time in seconds. default is MaxValue Specify timespan in seconds. QuorumLossWaitDuration Time in seconds. . This value determines the rate of update for any GetApplicationUpgradeProgress call UpgradeHealthCheckInterval Time in seconds. default is 60 The frequency of polling for Fabric upgrade status. Section Name: ClusterManager PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION UpgradeStatusPollInterval Time in seconds. The ReplicaRestartWaitDuration for ClusterManager. default is 30 Duration that a lease lasts between a node and its neighbors. Observing a failed health check will reset this timer. default is 600 Specify timespan in seconds. default is 60 Specify timespan in seconds. default is 3 Specify timespan in seconds. InfrastructureTaskHealthCheckStableDu Time in seconds. . The StandByReplicaKeepDuration for ClusterManager. The ration amount of time to observe consecutive passed health checks before post- processing of an infrastructure task finishes successfully. default is "" The PlacementConstraints for ClusterManager. The ation amount of time to wait before starting health checks after post-processing an infrastructure task. InfrastructureTaskHealthCheckRetryTim Time in seconds. The amount of time to allow for Image Builder specific timeout errors to return to the client. Default service descriptions would be overwritten after upgrade. The minimum global timeout for internally processing operations on ClusterManager. The maximum operation timeout when internally retrying due to timeouts is + . default is 0 Specify timespan in seconds. Observing a passed health check will reset this timer. ImageBuilderTimeoutBuffer Time in seconds. 
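The poll intervals above determine how fresh the progress data is when upgrade status is requested; the requests themselves are single cmdlets. A sketch, assuming an open cluster connection and a placeholder application name:

    # Sketch only: poll application and cluster (fabric) upgrade progress. The results are
    # refreshed at the UpgradeStatusPollInterval / FabricUpgradeStatusPollInterval cadence.
    Get-ServiceFabricApplicationUpgrade -ApplicationName "fabric:/MyApp"
    Get-ServiceFabricClusterUpgrade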
If this buffer is too small. default is false The CM will skip reverting updated default services during application upgrade rollback.PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION StandByReplicaKeepDuration Time in seconds. then the client times out before the server and gets a generic timeout error. SkipRollbackUpdateDefaultService Bool. InfrastructureTaskHealthCheckWaitDur Time in seconds. The eout amount of time to spend retrying failed health checks while post-processing an infrastructure task. The maximum global timeout for internally processing operations on ClusterManager. PlacementConstraints string. default is (3600. default is MaxValue Specify timespan in seconds. default is false Enable upgrading default services during application upgrade. MaxOperationTimeout Time in seconds. EnableDefaultServicesUpgrade Bool. MinOperationTimeout Time in seconds. MaxTimeoutRetryBuffer Time in seconds. default is 0 Specify timespan in seconds.0 * 2) Specify timespan in seconds. Additional timeout is added in increments of MinOperationTimeout. default is 60 Specify timespan in seconds. default is 5 Specify timespan in seconds. remove certificates from your Azure cluster . The maximum delay for internal retries when failures are encountered. MaxOperationRetryDelay Time in seconds. The maximum timeout for internal communications between ClusterManager and other system services (i. The value used for rollforward is never overridden. then it's overridden with the value of this config for the purposes of rollback.e. The maximum timeout for data migration recovery operations after a Fabric upgrade has taken place. ImageBuilderJobQueueThrottle Int. PARAMETER ALLOWED VALUES GUIDANCE OR SHORT DESCRIPTION MaxCommunicationTimeout Time in seconds. default is 1200 Specify timespan in seconds. MaxDataMigrationTimeout Time in seconds. default is 600 Specify timespan in seconds. This timeout should be smaller than global MaxOperationTimeout (as there might be multiple communications between system components for each client operation). Failover Manager and etc). Naming Service. default is 600 Specify timespan in seconds. If e ReplicaSetCheckTimeout is set to the maximum value of DWORD. default is 10 Thread count throttle for Image Builder proxy job queue on application requests. Next steps Read these articles for more information on cluster management: Add. Roll over. ReplicaSetCheckTimeoutRollbackOverrid Time in seconds.. After cluster creation. If you specify only one cluster certificate at create time. if you want to swap the primary and secondary. in addition to client certificates. NOTE For a secure cluster. . The process is outlined later in this document. before you proceed further. There is currently no email or any other notification that service fabric sends out on this topic. Service fabric lets you specify two cluster certificates. Add a secondary cluster certificate using the portal Secondary cluster certificate cannot be added through the Azure portal. a primary and a secondary.509 certificates and be familiar with the Cluster security scenarios. then that is used as the primary certificate. the system generates a warning trace and also a warning health event on the node. Add or remove certificates for a Service Fabric cluster in Azure 3/9/2017 • 8 min to read • Edit Online It is recommended that you familiarize yourself with how Service Fabric uses X. and select the 'Swap with primary' option from the context menu to swap the secondary cert with the primary cert. 
You have to use Azure powershell for that. you will always need at least one valid (not revoked and not expired) cluster certificate (primary or secondary) deployed (if not. You must understand what a cluster certificate is and what is used for. Refer to creating an azure cluster via portal or creating an azure cluster via Azure Resource Manager for details on setting them up at create time. then navigate to the Security blade. you can add a new certificate as a secondary. Swap the cluster certificates using the portal After you have successfully deployed a secondary cluster certificate. 90 days before all valid certificates reach expiration. when you configure certificate security during cluster creation. the cluster stops functioning). you will always need at least one valid (not revoked and not expired) certificate (primary or secondary) deployed if not. (If you have downloaded the sample from the above repo. then Use 5-VM-1-NodeTypes-Secure_Step1. Depending on the source of your template. then you will need to swap it with the secondary first. To remove a secondary certificate from being used for cluster security. you may already have these defined.Remove a cluster certificate using the portal For a secure cluster. . Make sure to follow all the steps Step 1: Open up the Resource Manager template you used to deploy you Cluster. Add a secondary certificate using Resource Manager Powershell These steps assume that you are familiar with how Resource Manager works and have deployed atleast one Service Fabric cluster using a Resource Manager template.JSON contains all the edits we will be making. the sample is available at git-repo. It is also assumed that you are comfortable using JSON. You can copy the following code snippet and add it to the template. the cluster stops functioning. sample 5-VM-1-NodeTypes-Secure_Step2. then download it from this git-repo. Edit your Resource Manager template For ease of following along. and then delete the secondary after the upgrade has completed. Navigate to the Security blade and select the 'Delete' option from the context menu on the secondary certificate. if so move to the next step. If your intent is to remove the certificate that is marked primary.JSON to deploy a secure cluster and then open up that template). NOTE If you are looking for a sample template and parameters that you can use to follow along or as a starting point. and have the template that you used to set up the cluster handy. Step 2: Add two new parameters "secCertificateThumbprint" and "secCertificateUrlValue" of type "string" to the parameter section of your template. "thumbprintSecondary": "[parameters('secCertificateThumbprint')]". Step 3: Make changes to the Microsoft.Compute/virtualMachineScaleSets resource definitions . then specify the new cert as primary and moving the current primary as secondary. "metadata": { "description": "Certificate Thumbprint" } }. you should see something like this.Locate the Microsoft.Compute/virtualMachineScaleSets resource definition.ServiceFabric". you will find "Certificate" JSON tag. Scroll to the "publisher": "Microsoft. . "x509StoreName": "[parameters('certificateStoreValue')]" } If you want to rollover the cert.Azure. it is should be in the format of https://<name of the vault>.Locate the "Microsoft. "properties": { "certificate": { "thumbprint": "[parameters('secCertificateThumbprint')]". "x509StoreName": "[parameters('certificateStoreValue')]" } Step 4: Make changes to all the Microsoft. 
"x509StoreName": "[parameters('certificateStoreValue')]" } Add a new tag "thumbprintSecondary" and give it a value "[parameters('secCertificateThumbprint')]". This results in the rollover of your current primary certificate to the new certificate in one deployment step. "thumbprintSecondary": "[parameters('certificateThumbprint')]".vault. In the service fabric publisher settings.ServiceFabric/clusters resource .ServiceFabric/clusters" resource definition in your template.azure. under "virtualMachineProfile". which should look something like the following JSON snippet: "properties": { "certificate": { "thumbprint": "[parameters('certificateThumbprint')]". "secCertificateUrlValue": { "type": "string".net:443/secrets/<exact location>" } }. it may not be exactly like the snippet below). So now the resource definition should look like the following (depending on your source of the template. "metadata": { "description": "Refers to the location URL in your key vault where the certificate was uploaded. "properties": { "certificate": { "thumbprint": "[parameters('certificateThumbprint')]". "secCertificateThumbprint": { "type": "string". Under properties of that definition. "x509StoreName": "[parameters('certificateStoreValue')]" } }. The properties should now look like this If you want to rollover the cert. .Add the new cert entries to it "certificateSecondary": { "thumbprint": "[parameters('secCertificateThumbprint')]". then specify the new cert as primary and moving the current primary as secondary. This results in the rollover of your current certificate to the new certificate in one deployment step. Compute/virtualMachineScaleSets resource definition. "certificate": { "thumbprint": "[parameters('secCertificateThumbprint')]". Scroll to the "vaultCertificates": . "x509StoreName": "[parameters('certificateStoreValue')]" } }. "x509StoreName": "[parameters('certificateStoreValue')]" }. under "OSProfile".Locate the Microsoft.Compute/virtualMachineScaleSets resource definitions . "certificateSecondary": { "thumbprint": "[parameters('certificateThumbprint')]". it should look something like this. use the following snippet: . The properties should now look like this Step 5: Make Changes to all the Microsoft. Add the secCertificateUrlValue to it. JSON Edit your Resource Manager Template parameter File.paramters_Step2. "certificateUrl": "[parameters('secCertificateUrlValue')]" } Now the resulting Json should look something like this. Login-AzureRmAccount Select-AzureRmSubscription -SubscriptionId <Subcription ID> . before proceeding further. Log in to your Azure Account and select the specific azure subscription. add the two new parameters for secCertificateThumbprint and secCertificateUrlValue. Edit your template file to reflect the new parameters you added above If you are using the sample from the git-repo to follow along.Compute/virtualMachineScaleSets resource definitions in your template. This is an important step for folks who have access to more than one azure subscription. NOTE Make sure that you have repeated steps 4 and 5 for all the Nodetypes/Microsoft.vault. Deploy the template to Azure You are now ready to deploy your template to Azure. { "certificateStore": "[parameters('certificateStoreValue')]". "secCertificateThumbprint": { "value": "thumbprint value" }. So please double check.net:443/secrets/<exact location>" }.azure. "secCertificateUrlValue": { "value": "Refers to the location URL in your key vault where the certificate was uploaded. Open an Azure PS version 1+ command prompt. 
the certificate will not get installed on that VMSS and you will have unpredictable results in your cluster. it is should be in the format of https://<name of the vault>. you can start to make changes in The sample 5-VM- 1-NodeTypes-Secure. including the cluster going down (if you end up with no valid certificates that the cluster can use for security. If you miss one of them. you can inadvertently delete resources that are not in your template. Run the New-AzureRmResourceGroupDeployment command.westus.json" New-AzureRmResourceGroupDeployment -ResourceGroupName $ResouceGroup2 -TemplateParameterFile $TemplateParmFile -TemplateUri $TemplateFile -clusterName $ResouceGroup2 Once the deployment is complete. $ResouceGroup2 = "chackosecure5" $TemplateFile = "C:\GitHub\Service-Fabric\ARM Templates\Cert Rollover Sample\5-VM-1-NodeTypes- Secure_Step2.json" $TemplateParmFile = "C:\GitHub\Service-Fabric\ARM Templates\Cert Rollover Sample\5-VM-1-NodeTypes- Secure. New-AzureRmResourceGroupDeployment -Name ExampleDeployment -ResourceGroupName <Resource Group that your cluster is currently deployed to> -TemplateFile <PathToTemplate> Here is a filled out example of the same powershell.com:19000" $CertThumbprint= "70EF5E22ADB649799DA3C8B6A6BF7SD1D630F8F3" Connect-serviceFabricCluster -ConnectionEndpoint $ClusterName -KeepAliveIntervalInSec 10 ` -X509Credential ` -ServerCertThumbprint $CertThumbprint ` -FindType FindByThumbprint ` -FindValue $CertThumbprint ` -StoreLocation CurrentUser ` -StoreName My For quick reference here is the command to get cluster health . NOTE If you set Mode to Complete. connect to your cluster using the new Certificate and perform some queries.azure. Then you can delete the old certificate.pfx -Password (ConvertTo-SecureString -String abcd123 -AsPlainText -Force) Import-PfxCertificate -Exportable -CertStoreLocation Cert:\CurrentUser\My -FilePath c:\Mycertificates\chackdanTestCertificate9. If you are able to do. do not forget to import them into your local TrustedPeople cert store.parameters_Step2.pfx -Password (ConvertTo-SecureString -String abcd123 -AsPlainText -Force) For quick reference here is the command to connect to a secure cluster $ClusterName= "chackosecure5. Use the same Resource Group that your cluster is currently deployed to. ######## Set up the certs on your local box Import-PfxCertificate -Exportable -CertStoreLocation Cert:\CurrentUser\TrustedPeople -FilePath c:\Mycertificates\chackdanTestCertificate9.Test the template prior to deploying it. Use the same Resource Group that your cluster is currently deployed to. You do not need to specify the mode.cloudapp. Test-AzureRmResourceGroupDeployment -ResourceGroupName <Resource Group that your cluster is currently deployed to> -TemplateFile <PathToTemplate> Deploy the template to your resource group. If you are using a self-signed certificate. since the default value is incremental. So do not use it in this scenario. you just need define and use different parameters. Navigate to the Security blade.'Read-only client' or 'Admin client' 3. You can use the same steps as outlined in Steps 5 above to have the certificates deployed from a keyvault to the Nodes. Adding or removing Client certificates In addition to the cluster certificates. You can add two kinds of client certificates . Next steps Read these articles for more information on cluster management: Service Fabric Cluster upgrade process and expectations from you Setup role-based access for clients .Admin or Read-only. 
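For example, assuming you are already connected to the cluster with Connect-ServiceFabricCluster as shown above, the following sketch queries overall cluster health and then drills into node and application health. The application name fabric:/MyApp is a placeholder.

Get-ServiceFabricClusterHealth

# Optional, more targeted checks; fabric:/MyApp is a placeholder application name.
Get-ServiceFabricNode | Format-Table NodeName, NodeStatus, HealthState
Get-ServiceFabricApplicationHealth -ApplicationName fabric:/MyApp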
you can add client certificates to perform management operations on a service fabric cluster. On the 'Add Authentication' blade. By default.Admin or Read-Only via portal 1. These then can be used to control access to the admin operations and Query operations on the cluster.Admin or Read-Only using the portal To remove a secondary certificate from being used for cluster security. the cluster certificates are added to the allowed Admin certificates list. Each addition/deletion results in a configuration update to the service fabric cluster Adding client certificates . In general. you can specify any number of client certificates. Now choose the Authorization method. Navigate to the Security blade and select the 'Delete' option from the context menu on the specific certificate. 2. This indicates to Service Fabric whether it should look up this certificate by using the subject name or the thumbprint. Get-ServiceFabricClusterHealth Deploying Application certificates to the cluster. choose the 'Authentication Type' . Deletion of Client Certificates . and select the '+ Authentication' button on top of the security blade. it is not a good security practice to use the authorization method of subject name. the following FDs are valid: "faultDomain": "fd:/Room1/Rack1/Machine1" "faultDomain": "fd:/FD1" "faultDomain": "fd:/Room1/Rack1/PDU1/M1" An upgrade domain (UD) is a logical unit of nodes. consider the physical security of these machines. During Service Fabric orchestrated upgrades (either an . Step 3: Determine the initial cluster size Each node in a standalone Service Fabric cluster has the Service Fabric runtime deployed and is a member of the cluster. you must have a minimum cluster size of three nodes (machines or virtual machines). However. so you can decide what kinds of failures you want the cluster to survive. loosely speaking. full install. Windows PowerShell 3. do you need separate power lines or Internet connections supplied to these machines? In addition. The RemoteRegistry service should be running on all the machines.json.5. Service Fabric supports hierarchical FDs. A minimum of 40 of GB available disk space is recommended. In a typical production deployment. switches. each rack can be considered a fault domain. For example. . Service Fabric supports only one node per physical or virtual machine. and more) that share a single point of failure. Windows Server 2012 R2 or Windows Server 2016. Although there is no 1:1 mapping between fault domains and racks. Step 2: Prepare the machines to meet the prerequisites Prerequisites for each machine that you want to add to the cluster: A minimum of 16 GB of RAM is recommended. When considering the nodes in your cluster. Step 4: Determine the number of fault domains and upgrade domains A fault domain (FD) is a physical unit of failure and is directly related to the physical infrastructure in the data centers. The cluster size is determined by your business needs. you can have more than one node on a given machine. For development purposes. The infrastructure planning for production clusters is more involved than for test clusters. Connectivity to a secure network or networks for all machines.NET Framework 4. we strongly recommend that the nodes be distributed among at least three fault domains. networks. You cannot install Service Fabric on a domain controller. you can logically map the machines to the various fault domains (see Step 4). For example. 
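To make the distinction between cluster and client certificates concrete, here is a hedged sketch of connecting to the cluster with an admin client certificate instead of the cluster certificate. The endpoint and both thumbprints are placeholders, and the client certificate is assumed to be installed in the CurrentUser\My store of the machine you are connecting from.

$ClusterEndpoint  = "mycluster.westus.cloudapp.azure.com:19000"            # placeholder endpoint
$ServerCertThumb  = "<cluster certificate thumbprint>"                     # placeholder
$ClientCertThumb  = "<admin or read-only client certificate thumbprint>"   # placeholder

Connect-ServiceFabricCluster -ConnectionEndpoint $ClusterEndpoint `
    -KeepAliveIntervalInSec 10 `
    -X509Credential `
    -ServerCertThumbprint $ServerCertThumb `
    -FindType FindByThumbprint `
    -FindValue $ClientCertThumb `
    -StoreLocation CurrentUser `
    -StoreName My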
3/30/2017 • 7 min to read • Edit Online Plan and prepare your Service Fabric standalone cluster deployment Perform the following steps before you create your cluster. so you can reflect your infrastructure topology in them. Where are the machines located and who needs access to them? After you make these decisions. A 4 core or greater CPU is recommended. you can choose the name for each FD.1 or higher. A fault domain consists of hardware components (computers. there is one node per OS instance (physical or virtual).0. When you specify FDs in ClusterConfig. Step 1: Plan your cluster infrastructure You are about to create a Service Fabric cluster on machines you own. The cluster administrator deploying and configuring the cluster must have administrator privileges on each of the machines. In a production environment. The firmware upgrades you perform on your machines do not honor UDs.json. Standalone Cluster Configurations For details on the sections in this file.json files from the package you downloaded and modify the following settings: CONFIGURATION SETTING DESCRIPTION . the following names are valid: "upgradeDomain": "UD0" "upgradeDomain": "UD1A" "upgradeDomain": "DomainRed" "upgradeDomain": "Blue" For more detailed information on upgrade domains and fault domains.json file. or to one of the machines that will be a part of your cluster. You can base the configuration file on the templates found at the below link. all nodes in a UD are taken down to perform the upgrade while nodes in other UDs remain available to serve requests. you can choose the name for each UD. When you specify UDs in ClusterConfig. see Configuration settings for standalone Windows cluster.application upgrade or a cluster upgrade). so you must do them one machine at a time. see Describing a Service Fabric cluster. Step 5: Download the Service Fabric standalone package for Windows Server Download Link . Open one of the ClusterConfig. which describes the specification of the cluster. For example.Service Fabric Standalone Package . Step 6: Modify cluster configuration To create a standalone cluster you have to create a standalone cluster configuration ClusterConfig.Windows Server and unzip the package. either to a deployment machine that is not part of the cluster. The simplest way to think about these concepts is to consider FDs as the unit of unplanned failure and UDs as the unit of planned maintenance. which you can use for testing purposes. A cluster must have at least one NodeType. Do not use One-box clusters for deploying production workloads. and other physical properties. Have necessary ports opened for Windows SMB and Remote Registry service: 135. All nodes in a group have the following common characteristics: Name .Node capacities define the name and amount of a particular resource that a particular node has available for consumption. Environment setup When a cluster administrator configures a Service Fabric standalone cluster.These describe properties for this node type that you use as placement constraints for the system services or your services. cores. Machine from which the cluster is created. Have file sharing (SMB) enabled 8. If you use the same IP address for all the nodes. For example. Placement Properties . a node may define that it has capacity for a metric called “MemoryInMb” and that it has 2048 MB available by default. 139. 
These capacities are used at runtime to ensure that services that require particular amounts of resources are placed on the nodes that have those resources available in the required amounts. 137. which is where the system services run. then a one- box cluster is created. as well as each cluster node machine must: 3. Endpoint Ports . Capacities . These properties are user- defined key/value pairs that provide extra meta data for a given node. The user creating the cluster should have administrator-level security privileges to all machines that are listed as nodes in the cluster configuration file. IsPrimary . it can be tested against the cluster environment (step 7). fault domain. Have the Windows Firewall service (mpssvc) enabled 6. node name. Have Service Fabric runtime uninstalled 5. the number of spindles in its hard drive.This is the node type name. the environment is needs to be set up with the following criteria: 1. The machines you want the cluster to be created on need to be listed here with their IP addresses. CONFIGURATION SETTING DESCRIPTION NodeTypes Node types allow you to separate your cluster nodes into various groups. Step 7. 138. based on cluster configuration ports 9. All other node types should be set to the value false Nodes These are the details for each of the nodes that are part of the cluster (node type. Have Service Fabric SDK uninstalled 4.These are various named end points (ports) that are associated with this node type. 2. Have the Remote Registry Service (remoteregistry) enabled 7. You can use any port number that you wish. After the cluster configuration has had all settings configured to the environment. Examples of node properties would be whether the node has a hard drive or graphics card.If you have more than one NodeType defined ensure that only one is set to primary with the value true. and 445 . IP address. Have necessary ports opened. as long as they do not conflict with anything else in this manifest and are not already in use by any other application running on the machine/VM. and upgrade domain of the node). 10.exe FabricDCA. 13. Alternatively if network internet access is limited to white-listed domains.microsoft.exe FabricHost.exe FileStoreService.exe Step 8.com 17. Validate environment using TestConfiguration script The TestConfiguration. If the cluster machines are not internet-accessible. the domains below are required for automatic upgrade: go. validate the necessary security prerequisites are in place. None of the cluster node machines should be a Domain Controller. Disable automatic Fabric version downloading & notifications that the current cluster version is nearing end of support: Under properties set "fabricClusterAutoupgradeEnabled": false 16.exe FabricFAS. It is used as a Best Practices Analyzer to validate some of the criteria above and should be used as a sanity check to validate whether a cluster can be deployed on a given environment. If there is any failure. . Disable telemetry: Under properties set "enableTelemetry": false 15. and are configured correctly against the configuration.exe FabricUOS. refer to the list under Environment Setup for troubleshooting.exe FabricDeployer.microsoft. 12. Set appropriate Service Fabric antivirus exclusions: ANTIVIRUS EXCLUDED DIRECTORIES Program Files\Microsoft Service Fabric FabricDataRoot (from cluster configuration) FabricLogRoot (from cluster configuration) ANTIVIRUS EXCLUDED PROCESSES Fabric.exe FabricInstallerService.exe FabricRM.exe FabricSetup.exe ImageBuilder. 
If the cluster to be deployed is a secure cluster. set the following in the cluster configuration: 14.com download. Have network connectivity to one another 11.exe FabricGateway.ps1 script can be found in the standalone package. Best Practices Analyzer completed successfully.Azure. The machine that this script is run on does not have to be part of the cluster. please let us know through our support channels.ServiceFabric..ServiceFabric.ps1 -ClusterConfigFilePath .Unsecure. Traces will be written to existing trace folder: C:\temp\Microsoft. PS C:\temp\Microsoft.WindowsServer\DeploymentTraces Running Best Practices Analyzer. NOTE We are continually making improvements to make this module more robust. so if there is a faulty or missing case which you believe isn't currently caught by TestConfiguration.json Trace folder already exists.\ClusterConfig.\TestConfiguration.DevCluster..WindowsServer> . Next steps Create a standalone cluster running on Windows Server .This script can be run on any machine that has administrator access to all the machines that are listed as nodes in the cluster configuration file. LocalAdminPrivilege : True IsJsonValid : True IsCabValid : True RequiredPortsOpen : True RemoteRegistryAvailable : True FirewallAvailable : True RpcCheckPassed : True NoConflictingInstallations : True FabricInstallable : True Passed : True Currently this configuration testing module does not validate the security configuration so this has to be done independently.Azure. json.ps1 A PowerShell script for cleaning a standalone Service Fabric installation off the current machine.ps1 A PowerShell script for analyzing the infrastructure as specified in the Cluster. CleanFabric. You can download a copy of the EULA now. you will find the following files: FILE NAME SHORT DESCRIPTION CreateServiceFabricCluster. It is a subset of the instructions in this document.ps1 A PowerShell script that creates the cluster using the settings in ClusterConfig. single-machine (or virtual machine) development cluster.ps1 A PowerShell script used for downloading the latest runtime package out of band.json A cluster configuration sample file that contains the settings for an unsecured. RemoveServiceFabricCluster.Unsecure. Templates FILE NAME SHORT DESCRIPTION ClusterConfig. AddNode. TestConfiguration. DeploymentComponentsAutoextractor. Readme.ps1 A PowerShell script that removes a cluster using the settings in ClusterConfig. RemoveNode.DevCluster. for scenarios where the deploying machine is not connected to the internet.ps1 A PowerShell script for adding a node to an existing deployed cluster on the current machine. including the information for each node in the cluster. EULA_ENU. Package contents of Service Fabric Standalone package for Windows Server 3/28/2017 • 2 min to read • Edit Online In the downloaded Service Fabric Standalone package.json.txt A link to the release notes and basic installation instructions. ThirdPartyNotice.exe Self-extracting archive containing Deployment Components used by the Standalone package scripts. three-node. Previous MSI installations should be removed using their own associated uninstallers.rtf Notice of third-party software that is in the package.ps1 A PowerShell script for removing a node from an existing deployed cluster from the current machine.txt The license terms for the use of Microsoft Azure Service Fabric standalone Windows Server package. DownloadServiceFabricRuntimePackage. .json. three-node. multi-machine (or virtual machine) cluster using Windows security.DevCluster. 
multi-machine (or virtual machine) cluster.Unsecure. ClusterConfig.Windows.x509.gMSA.Windows. multi-machine (or virtual machine) cluster. including the information for each machine that is in the secure cluster. three-node.MultiMachine. The cluster is secured using x509 certificates.json A cluster configuration sample file that contains all the settings for a secure.json A cluster configuration sample file that contains all the settings for a secure. including the information for each node that is in the cluster. ClusterConfig.json A cluster configuration sample file that contains all the settings for the secure.DevCluster. including the information for each node in the cluster. Cluster Configuration Samples Latest versions of cluster configuration templates can be found at the GitHub page: Standalone Cluster Configuration Samples. ClusterConfig.MultiMachine. The cluster is secured by using Windows identities. single-machine (or virtual machine) development cluster. Related Create a standalone Azure Service Fabric cluster Service Fabric cluster security scenarios .x509. The cluster is secured by using Windows identities. The cluster is secured using Group Managed Service Accounts. including the information for each machine in the cluster. ClusterConfig.MultiMachine. multi-machine (or virtual machine) cluster.json A cluster configuration sample file that contains all the settings for the secure. including the information for each node in the secure cluster.json A cluster configuration sample file that contains the settings for an unsecured.MultiMachine. FILE NAME SHORT DESCRIPTION ClusterConfig. ClusterConfig. including the information for each node in the secure cluster.json A cluster configuration sample file that contains all the settings for a secure. single-machine (or virtual machine) development cluster. The cluster is secured using x509 certificates.Windows. This means you can deploy and run Service Fabric applications in any environment that contains a set of interconnected Windows Server computers. You can also get support for this package as a part of Microsoft Premier Support. Open a ticket for Professional Support for Service Fabric.json file included in Samples. If deploying from a machine not connected to the internet. The Service Fabric runtime package is automatically downloaded at time of cluster creation. Service Fabric provides a setup package to create Service Fabric clusters called the standalone Windows Server package.ps1 script through an administrator PowerShell session. from the standalone package .DevCluster.Unsecure. copy the sample config file to the local machine. Get support for the Service Fabric standalone package Ask the community about the Service Fabric standalone package for Windows Server in the Azure Service Fabric forum. Learn more about Professional Support from Microsoft here. Create a standalone cluster running on Windows Server 3/29/2017 • 5 min to read • Edit Online You can use Azure Service Fabric to create Service Fabric clusters on any virtual machines or computers running Windows Server. For more details. You can download a copy of the EULA now. NOTE This standalone Windows Server package is commercially available and may be used for production deployments. To collect logs for support purpose. This package may contain new Service Fabric features that are in "Preview". use the Service Fabric Standalone Package for Windows Server (2012 R2 and newer) found here: Download Link . 
please download the runtime package out of band from here: Download Link . be it on premises or with any cloud provider.Windows Server Find details on contents of the package here. Download the Service Fabric standalone package To create the cluster. Scroll down to "Preview features included in this package. then run the CreateServiceFabricCluster." section for the list of the preview features. please see Azure Service Fabric support options.Service Fabric Standalone Package .Windows Server Find Standalone Cluster Configuration samples at: Standalone Cluster Configuration Samples Create the cluster Service Fabric can be deployed to a one-machine development cluster by using the ClusterConfig. Unpack the standalone package to your machine.Service Fabric Runtime . run Service Fabric Standalone Log collector. This article walks you through the steps for creating a Service Fabric standalone cluster. Best Practices Analyzer completed successfully. powershell .\CreateServiceFabricCluster. Traces will be written to existing trace folder: C:\temp\Microsoft. Plan and prepare your cluster deployment 1. Trace folder already exists.json You should see output like below. sanity checks have passed and the cluster looks to be deployable based on the input configuration.folder: Step 1A: Create an unsecured local development cluster .ps1 -ClusterConfigFilePath .. Validate the configuration file you have written by running the TestConfiguration.ps1 PowerShell script. If you're finished running development scenarios. as detailed in the cluster configuration file FabricSettings section (by default c:\ProgramData\SF). FabricHost.exe and Fabric. based in the directory from which the script was run.json -AcceptEULA See Environment Setup section at Plan and prepare your cluster deployment for troubleshooting details. see Service fabric connect to secure cluster.ps1 -ClusterConfigFilePath . To see if Service Fabric was deployed correctly to a machine. Create the cluster: Run the CreateServiceFabricCluster. Step 1B: Create a multi-machine cluster After you have gone through the planning and preparation steps detailed at the below link.\ClusterConfig.ps1 script to deploy the Service Fabric cluster across each machine in the configuration.\ClusterConfig. LocalAdminPrivilege : True IsJsonValid : True IsCabValid : True RequiredPortsOpen : True RemoteRegistryAvailable : True FirewallAvailable : True RpcCheckPassed : True NoConflictingInstallations : True FabricInstallable : True Passed : True 1.Unsecure. you are ready to create your production cluster using your cluster configuration file. Step 2: Connect to the cluster To connect to a secure cluster.json -AcceptEULA NOTE Deployment traces are written to the VM/machine on which you ran the CreateServiceFabricCluster.ps1 script from the standalone package folder: . you can remove the Service Fabric cluster from the machine by referring to steps in section below "Remove a cluster".WindowsServer\DeploymentTraces Running Best Practices Analyzer.\ClusterConfig.ServiceFabric..Azure. As well. These can be found in the subfolder DeploymentTraces. If the bottom field "Passed" is returned as "True".\TestConfiguration. find the installed files in the FabricDataRoot directory.DevCluster. .exe processes can be seen running in Task Manager.\CreateServiceFabricCluster.ps1 -ClusterConfigFilePath . com/collect/v1 once every day. the product collects telemetry on the Service Fabric usage to improve the product. 
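Once CreateServiceFabricCluster.ps1 completes, a quick sanity check (a minimal sketch, assuming the unsecured sample configuration with the default client endpoint on port 19000) is to connect from one of the cluster machines and confirm that every node is up:

# Run this from one of the cluster machines after CreateServiceFabricCluster.ps1 finishes.
Connect-ServiceFabricCluster -ConnectionEndpoint localhost:19000

# Every node should report NodeStatus 'Up' and the cluster should be healthy.
Get-ServiceFabricNode | Format-Table NodeName, IpAddressOrFQDN, NodeStatus, HealthState
Get-ServiceFabricClusterHealth | Select-Object AggregatedHealthState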
The Best Practice Analyzer that runs as a part of the setup checks for connectivity to https://vortex. the setup fails unless you opt out of telemetry. 1.data. This script can be run on any machine that has administrator access to all the machines that are listed as nodes in the cluster configuration file.2345:19000 Step 3: Bring up Service Fabric Explorer Now you can connect to the cluster with Service Fabric Explorer either directly from one of the machines with http://localhost:19080/Explorer/index.123. The machine that this script is run on does not have to be part of the cluster. You can optionally specify a location for the log of the deletion. The telemetry pipeline tries to upload the following data to https://vortex.microsoft.ps1 -ClusterConfigFilePath .data.\ClusterConfig. No other nodes send out telemetry.13. Remove a cluster To remove a cluster. The telemetry is only sent from the node that runs the failover manager primary.\CleanFabric.ps1 PowerShell script from the package folder and pass in the path to the JSON configuration file. If it is not reachable. # Removes Service Fabric from each machine in the configuration . It is a best-effort upload and has no impact on the cluster functionality. Add and remove nodes You can add or remove nodes to your standalone Service Fabric cluster as your business needs change. run the RemoveServiceFabricCluster. The telemetry consists of the following: Number of services Number of ServiceTypes Number of Applications Number of ApplicationUpgrades Number of FailoverUnits Number of InBuildFailoverUnits Number of UnhealthyFailoverUnits Number of Replicas Number of InBuildReplicas Number of StandByReplicas .com/collect/v1.\RemoveServiceFabricCluster. run the following PowerShell command: Connect-ServiceFabricCluster -ConnectionEndpoint <*IPAddressofaMachine*>:<Client connection end point port> Connect-ServiceFabricCluster -ConnectionEndpoint 192.microsoft.json -Force # Removes Service Fabric from the current machine .html.ps1 Telemetry data collected and how to opt out of it As a default.To connect to an unsecure cluster. 2.html or remotely with http://<IPAddressofaMachine>:19080/Explorer/index. See Add or Remove nodes to a Service Fabric standalone cluster for detailed steps. x). add the following to properties in your cluster config: enableTelemetry: false. manually or automatically. Next steps Deploy and remove applications using PowerShell Configuration settings for standalone Windows cluster Add or remove nodes to a standalone Service Fabric cluster Upgrade a standalone Service Fabric cluster version Create a standalone Service Fabric cluster with Azure VMs running Windows Secure a standalone cluster on Windows using Windows security Secure a standalone cluster on Windows using X509 certificates .3. Number of OfflineReplicas CommonQueueLength QueryQueueLength FailoverUnitQueueLength CommitQueueLength Number of Nodes IsContextComplete: True/False ClusterId: This is a GUID randomly generated for each cluster ServiceFabricVersion IP address of the virtual machine or machine from which the telemetry is uploaded To disable telemetry. NOTE Starting with the new GA version of the standalone cluster for Windows Server (version 5. Preview features included in this package None. you can upgrade your cluster to future releases.204. Refer to Upgrade a standalone Service Fabric cluster version document for details. This ensures that the network traffic can communicate between the machines. . 
get the IP address by opening a command prompt and typing ipconfig . Change the cluster name at the top and save the file. 5. 6. Once created you should see all three VMs in the same virtual network. Local Server dashboard. Alternatively you can see the IP address of each machine on the portal. however you are not creating an Azure cloud-based Service Fabric cluster. A partial example of the cluster manifest is shown below. Connect to each of the VMs and turn off the Windows Firewall using the Server Manager. Add a couple more VMs to the same resource group. Steps to create the standalone cluster 1. by selecting the virtual network resource for the resource group and checking the network interfaces created for each of these machines. 3. using the standalone Service Fabric installer for Windows Server. 2. whereas the Azure cloud-based Service Fabric clusters are managed and upgraded by the Service Fabric resource provider. Read the article Create a Windows VM in the Azure portal for more details. Open the ClusterConfig. Connect to one of the VMs and download the standalone Service Fabric package for Windows Server into a new folder on the machine and extract the package.Unsecure. The distinction in following this pattern is that the standalone Service Fabric cluster created by the following steps is entirely managed by you.MultiMachine. Ensure that each of the VMs has the same administrator user name and password when created. Connect to one of the VMs and test that you can ping the other two VMs successfully. Sign in to the Azure portal and create a new Windows Server 2012 R2 Datacenter or Windows Server 2016 Datacenter VM in a resource group. While connected to each machine. 4. The scenario is a special case of Create and manage a cluster running on Windows Server where the VMs are Azure VMs running Windows Server. Create a three node standalone Service Fabric cluster with Azure virtual machines running Windows Server 3/29/2017 • 2 min to read • Edit Online This article describes how to create a cluster on Windows-based Azure virtual machines (VMs).json file in Notepad and edit each node with the three IP addresses of the machines. you will need to set up a domain controller to manage Active Directory. decide the security measure you would like to use and follow the steps at the associated link: X509 Certificate or Windows Security. "iPAddress": "10. 8.6". "faultDomain": "fd:/dc1/r0". 1. "iPAddress": "10. "upgradeDomain": "UD1" }. for example by using http://10.\ClusterConfig. Next steps Create standalone Service Fabric clusters on Windows Server or Linux Add or remove nodes to a standalone Service Fabric cluster Configuration settings for standalone Windows cluster Secure a standalone cluster on Windows using Windows security Secure a standalone cluster on Windows using X509 certificates .0". "nodes": [ { "nodeName": "standalonenode0". "faultDomain": "fd:/dc2/r0".0. "faultDomain": "fd:/dc3/r0". "nodeTypeRef": "NodeType0". "iPAddress": "10.1.json The script will remotely configure the Service Fabric cluster and should report progress as deployment rolls through. "clusterConfigurationVersion": "1.5:19080/Explorer/index. { "nodeName": "standalonenode1".4".1.MultiMachine.1. { "name": "SampleCluster". Note that using a domain controller machine as a Service Fabric node is not supported. After about a minute.0. 7. "upgradeDomain": "UD0" }.5". "nodeTypeRef": "NodeType0". "upgradeDomain": "UD2" } ]. "apiVersion": "01-2017".\CreateServiceFabricCluster. { "nodeName": "standalonenode2".0. 
you can check if the cluster is operational by connecting to the Service Fabric Explorer using one of the machine's IP addresses. If you intend this to be a secure cluster. "nodeTypeRef": "NodeType0".Unsecure.0. If setting up the cluster using Windows Security. Open a PowerShell ISE window.1. Run the following PowerShell command to deploy the cluster: .0. Navigate to the folder where you extracted the downloaded standalone installer package and saved the cluster configuration file.html .ps1 -ClusterConfigFilePath . Copy or download the standalone package for Service Fabric for Windows Server to this VM/machine and unzip the package.\RemoveNode. your business needs may change so that you might need to add or remove multiple nodes to your cluster. IP address 182.ps1 -ExistingClientConnectionEndpoint 182.17. you can choose the IP address of any node in the cluster. 2. with type NodeType0. 2. This article provides detailed steps to achieve this goal. 4. This other node will in turn update the cluster configuration for the removed node. 5.52 - ExistingClientConnectionEndpoint 182. The example below adds a new node called VM5. 1.17. and navigate to the location of the unzipped package. Copy or download the standalone package for Service Fabric for Windows Server and unzip the package to this VM/machine. For this endpoint.34. The ExistingClusterConnectionEndPoint is a connection endpoint for a node already in the existing cluster.17. The example below removes the current node from the cluster. Run Powershell as an administrator. Choose the IP address and the endpoint port of any other node in the cluster. Add or remove nodes to a standalone Service Fabric cluster running on Windows Server 2/15/2017 • 2 min to read • Edit Online After you have created your standalone Service Fabric cluster on Windows Server machines. The ExistingClientConnectionEndpoint is a client connection endpoint for any node that will remain in the cluster.34. .50:19000 -UpgradeDomain UD1 -FaultDomain fd:/dc1/r0 -AcceptEULA You can check if the new node is added by running the cmdlet Get-ServiceFabricNode.ps1 in PowerShell.ps1 -NodeName VM5 -NodeType NodeType0 -NodeIPAddressorFQDN 182. 3. Remote desktop (RDP) into the VM/machine that you want to remove from the cluster. Remove nodes from your cluster Depending on the Reliability level chosen for the cluster. 3.52 into UD1 and fd:/dc1/r0. Run Powershell as an administrator. 4. Add nodes to your cluster 1. Run RemoveNode. 6. Also note that running RemoveNode command on a dev cluster is not supported.17. and navigate to the location of the unzipped package.50:19000 . Plan which fault domain and upgrade domain you are going to add this VM/machine to. you cannot remove the first n (3/5/7/9) nodes of the primary node type.34. . Prepare the VM/machine you want to add to your cluster by following the steps mentioned in the Prepare the machines to meet the prerequisites for cluster deployment section.34. Remote desktop (RDP) into the VM/machine that you want to add to the cluster.\AddNode. Run AddNode.ps1 Powershell with the parameters describing the new node to add. please note that this is a known defect. Remove node types from your cluster Removing a node type requires extra caution. NOTE Some nodes may not be removed due to system services dependencies. Even after removing a node. instead of removing and then adding in batches. 
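As with adding a node, you can confirm the result of a removal by querying the node list. The sketch below assumes an unsecured test cluster and reuses the placeholder endpoint and node name from the examples above; for a secure cluster, add the certificate parameters shown earlier in this document.

# Connect to a node that is still part of the cluster (182.17.34.50:19000 and VM5 are the
# placeholder values from the examples above).
Connect-ServiceFabricCluster -ConnectionEndpoint 182.17.34.50:19000

# List all nodes with their status, fault domain, and upgrade domain.
Get-ServiceFabricNode | Format-Table NodeName, NodeStatus, FaultDomain, UpgradeDomain

# A newly added node should show NodeStatus 'Up'; a removed node may briefly show as 'Down'
# before it disappears from the list.
Get-ServiceFabricNode -NodeName "VM5"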
Next steps Configuration settings for standalone Windows cluster Secure a standalone cluster on Windows using X509 certificates Create a standalone Service Fabric cluster with Azure VMs running Windows . It will be fixed in an upcoming release. please double check if there is any node referencing the node type. if it shows up as being down in queries and SFX. These nodes are primary nodes and can be identified by querying the cluster manifest using Get-ServiceFabricClusterManifest and finding node entries marked with IsSeedNode=”true” . Replace primary nodes of your cluster The replacement of primary nodes should be performed one node after another. Before removing a node type. The machine that this script is run on does not have to be part of the cluster. Upgrade clusters that have connectivity to download the latest code and configuration Use these steps to upgrade your cluster to a supported version if your cluster nodes have Internet connectivity to http://download. NOTE Make sure that your cluster always runs a supported Service Fabric version. Microsoft periodically checks for the availability of new Service Fabric versions. you must re-create the cluster with the new version. The new release is available to choose at that point. Connect to the cluster from any machine that has administrator access to all the machines that are listed as nodes in the cluster." After the cluster is running the latest version. Additionally. Cluster upgrade workflow After you see the cluster health warning. to inform the customer of this new version. . One workflow is for clusters that have connectivity to download the latest version automatically.com. When Microsoft announces the release of a new version of Service Fabric. set the fabricClusterAutoupgradeEnabled cluster configuration to false. This article describes how you can make sure that the cluster always runs supported versions of Service Fabric code and configurations.microsoft. Control the Service Fabric version that runs on your cluster To set your cluster to download updates of Service Fabric when Microsoft releases a new version. the package is downloaded locally to the cluster and provisioned for upgrade.microsoft. Two distinct workflows can upgrade your cluster to the latest version or a supported Service Fabric version.com. An Azure Service Fabric cluster is a resource that you own. The other workflow is for clusters that do not have connectivity to download the latest Service Fabric version. You can upgrade your cluster to the new version only if you are using a production-style node configuration. set the fabricClusterAutoupgradeEnabled cluster configuration to true. To select a supported version of Service Fabric that you want your cluster to be on. the system shows an explicit cluster health warning that's similar to the following: “The current cluster version [version#] support ends [Date]. For clusters that have connectivity to http://download. where each Service Fabric node is allocated on a separate physical or virtual machine. the warning goes away. Upgrade your standalone Azure Service Fabric cluster on Windows Server 3/29/2017 • 5 min to read • Edit Online For any modern system. If you have a development cluster. do the following: 1. New releases are announced on the Service Fabric team blog. the previous version is marked for end of support after a minimum of 60 days from the date of the announcement. When a new Service Fabric version is available. the ability to upgrade is a key to the long-term success of your product. 
where more than one Service Fabric node is on a single physical or virtual machine. something. Get the list of Service Fabric versions that you can upgrade to.301.microsoft.com. see documentation for Start- ServiceFabricClusterUpgrade. Start-ServiceFabricClusterUpgrade -Code -CodePackageVersion <codeversion#> -Monitored -FailureAction Rollback ###### Here is a filled-out example Start-ServiceFabricClusterUpgrade -Code -CodePackageVersion 5. the upgrade is rolled back. initiate the upgrade again by following the same steps as previously described. Upgrade clusters that have no connectivity to download the latest code and configuration Use these steps to upgrade your cluster to a supported version if your cluster nodes do not have Internet connectivity to http://download. ###### Get the list of available Service Fabric versions Get-ServiceFabricRegisteredClusterCodeVersion You should get an output similar to this: 3.com:19000" $CertThumbprint= "70EF5E22ADB649799DA3C8B6A6BF7FG2D630F8F3" Connect-serviceFabricCluster -ConnectionEndpoint $ClusterName -KeepAliveIntervalInSec 10 ` -X509Credential ` -ServerCertThumbprint $CertThumbprint ` -FindType FindByThumbprint ` -FindValue $CertThumbprint ` -StoreLocation CurrentUser ` -StoreName My 2. Start a cluster upgrade to an available version by using the Start-ServiceFabricClusterUpgrade PowerShell cmd. After you fix the issues that resulted in the rollback. ###### connect to the secure cluster using certs $ClusterName= "mysecurecluster.3. Get-ServiceFabricClusterUpgrade If the cluster health policies are not met.9590 -Monitored -FailureAction Rollback To monitor the progress of the upgrade. To specify custom health policies for the Start-ServiceFabricClusterUpgrade command. you can use Service Fabric Explorer or run the following Windows PowerShell command. . cab file> ###### Here is a filled-out example Register-ServiceFabricClusterPackage -Code -CodePackagePath MicrosoftAzureServiceFabric.5. Run the following from an internet connected machine to list all upgrade compatible versions with the current version and download the corresponding package from the associated download links.cab file including the path to it> -ImageStoreConnectionString "fabric:ImageStore" ###### Here is a filled-out example Copy-ServiceFabricClusterPackage -Code -CodePackagePath . ###### Get list of all upgrade compatible packages Get-ServiceFabricRuntimeUpgradeVersion -BaseVersion <TargetCodeVersion as noted in Step 1> 3.3. ###### Get the list of available Service Fabric versions Register-ServiceFabricClusterPackage -Code -CodePackagePath <name of the . Run Get-ServiceFabricClusterUpgrade from one of the nodes in the cluster and note the TargetCodeVersion. The system does not show a cluster health warning to alert you of a new release.5.cab 6. Start-ServiceFabricClusterConfigurationUpgrade -ClusterConfigPath <Path to Configuration File> Cluster upgrade workflow 1.cab -ImageStoreConnectionString "fabric:ImageStore" 4. you will have to monitor the Service Fabric team blog to learn about a new release.301. Refer to Start-ServiceFabricClusterConfigurationUpgrade PS cmd for usage details.9590. Connect to the cluster from any machine that has administrator access to all the machines that are listed as nodes in the cluster. Modify your cluster configuration to set the following property to false before you start a configuration upgrade. 5.\MicrosoftAzureServiceFabric. "fabricClusterAutoupgradeEnabled": false.3.301. Start a cluster upgrade to an available version. 
Register the copied package. Copy the downloaded package into the cluster image store. . The machine that this script is run on does not have to be part of the cluster ###### Get the list of available Service Fabric versions Copy-ServiceFabricClusterPackage -Code -CodePackagePath <name of the . 2. NOTE If you are running a cluster that is not connected to the Internet. Make sure to update 'clusterConfigurationVersion' in your JSON before you start the configuration upgrade.9590. Start-ServiceFabricClusterUpgrade -Code -CodePackageVersion <codeversion#> -Monitored -FailureAction Rollback ###### Here is a filled-out example Start-ServiceFabricClusterUpgrade -Code -CodePackageVersion 5. or you can run the following PowerShell command..'. The configuration upgrade is processed upgrade domain by upgrade domain.3.301. Learn how to scale your cluster in and out.9590 -Monitored -FailureAction Rollback You can monitor the progress of the upgrade on Service Fabric Explorer. initiate the upgrade again by following the same steps as previously described. Learn about application upgrades. After you fix the issues that resulted in the rollback. Next steps Learn how to customize some Service Fabric cluster settings. Double certificate upgrade: The upgrade path is 'Certificate A (Primary) -> Certificate A (Primary) and B (Secondary) -> Certificate B (Primary) -> Certificate B (Primary) and C (Secondary) -> Certificate C (Primary) -> . .'. Upgrade the cluster configuration To upgrade the cluster configuration upgrade. 2.. two options are supported: 1. run Start-ServiceFabricClusterConfigurationUpgrade. Start-ServiceFabricClusterConfigurationUpgrade -ClusterConfigPath <Path to Configuration File> Cluster certificate config upgrade Cluster certificate is used for authentication between cluster nodes. Get-ServiceFabricClusterUpgrade If the cluster health policies are not met. To specify custom health policies for the Start-ServiceFabricClusterUpgrade command... Technically. Single certificate upgrade: The upgrade path is 'Certificate A (Primary) -> Certificate B (Primary) -> Certificate C (Primary) -> . so the certificate roll over should be performed with extra caution because failure will block the communication among cluster nodes. see the documentation for Start- ServiceFabricClusterUpgrade. the upgrade is rolled back. By default. and the ability to resolve applications and services. It can perform any reads and writes on the Service Fabric cluster. Role-based access control for Service Fabric clients 3/3/2017 • 2 min to read • Edit Online Azure Service Fabric supports two different access control types for clients that are connected to a Service Fabric cluster: administrator and user. 
including the following operations:

Application and service operations
CreateService: service creation
CreateServiceFromTemplate: service creation from template
UpdateService: service updates
DeleteService: service deletion
ProvisionApplicationType: application type provisioning
CreateApplication: application creation
DeleteApplication: application deletion
UpgradeApplication: starting or interrupting application upgrades
UnprovisionApplicationType: application type unprovisioning
MoveNextUpgradeDomain: resuming application upgrades with an explicit update domain
ReportUpgradeHealth: resuming application upgrades with the current upgrade progress
ReportHealth: reporting health
PredeployPackageToNode: predeployment API
CodePackageControl: restarting code packages
RecoverPartition: recovering a partition
RecoverPartitions: recovering partitions
RecoverServicePartitions: recovering service partitions
RecoverSystemPartitions: recovering system service partitions

Cluster operations
ProvisionFabric: MSI and/or cluster manifest provisioning
UpgradeFabric: starting cluster upgrades
UnprovisionFabric: MSI and/or cluster manifest unprovisioning
MoveNextFabricUpgradeDomain: resuming cluster upgrades with an explicit update domain
ReportFabricUpgradeHealth: resuming cluster upgrades with the current upgrade progress
StartInfrastructureTask: starting infrastructure tasks
FinishInfrastructureTask: finishing infrastructure tasks
InvokeInfrastructureCommand: infrastructure task management commands
ActivateNode: activating a node
DeactivateNode: deactivating a node
DeactivateNodesBatch: deactivating multiple nodes
RemoveNodeDeactivations: reverting deactivation on multiple nodes
GetNodeDeactivationStatus: checking deactivation status
NodeStateRemoved: reporting node state removed
ReportFault: reporting fault
FileContent: image store client file transfer (external to cluster)
FileDownload: image store client file download initiation (external to cluster)
InternalList: image store client file list operation (internal)
Delete: image store client delete operation
Upload: image store client upload operation
NodeControl: starting, stopping, and restarting nodes
MoveReplicaControl: moving replicas from one node to another

Miscellaneous operations
Ping: client pings
Query: all queries allowed
NameExists: naming URI existence checks

The user access control type is, by default,
limited to the following operations: EnumerateSubnames: naming URI enumeration EnumerateProperties: naming property enumeration PropertyReadBatch: naming property read operations GetServiceDescription: long-poll service notifications and reading service descriptions ResolveService: complaint-based service resolution ResolveNameOwner: resolving naming URI owner ResolvePartition: resolving system services ServiceNotifications: event-based service notifications GetUpgradeStatus: polling application upgrade status GetFabricUpgradeStatus: polling cluster upgrade status InvokeInfrastructureQuery: querying infrastructure tasks List: image store client file list operation ResetPartitionLoad: resetting load for a failover unit ToggleVerboseServicePlacementHealthReporting: toggling verbose service placement health reporting The admin access control also has access to the preceding operations. you can provide admin capabilities to the client if needed. "clusterConfigurationVersion": "1. for your standalone cluster. . 1. 3. The number of primary nodes for these cluster will be based on the reliability level. The samples having DevCluster in their names will help you create a cluster with all three nodes on the same machine.DevCluster. ClusterConfig.JSON file are downloaded to your work machine. a few samples of the ClusterConfig. secured using X509 certificate-based security. secured using Windows security.JSON file as below. 2.JSON file. Nodes on the cluster You can configure the nodes on your Service Fabric cluster by using the nodes section. as the following snippet shows.JSON show how to create an unsecured test or production cluster respectively.0. You can give any friendly name to your Service Fabric cluster by assigning it to the name variable.0". different types of nodes on the cluster. When you download the standalone Service Fabric package. Out of these. Configuration settings for standalone Windows cluster 3/29/2017 • 7 min to read • Edit Online This article describes how to configure a standalone Service Fabric cluster using the ClusterConfig.JSON and ClusterConfig.MultiMachine.Unsecure. will help you create a production quality cluster. like logical nodes.Windows. with each node on a separate machine. The clusterConfigurationVersion is the version number of your cluster.JSON show how to create test or production cluster.X509. This cluster is useful for a development or test environment and is not supported as a production cluster. as shown in the JSON snippet below.MultiMachine.X509.DevCluster. ClusterConfig.JSON show how to create test or production cluster. ClusterConfig. You should however leave the apiVersion to the default value. The samples having MultiMachine in their names.Unsecure. "apiVersion": "2016-09-26".DevCluster. "name": "SampleCluster". you should increase it every time you upgrade your Service Fabric cluster. You can use this file to specify information such as the Service Fabric nodes and their IP addresses.JSON and ClusterConfig. Now we will examine the various sections of a ClusterConfig.MultiMachine. the security configurations as well as the network topology in terms of fault/upgrade domains. at least one node must be marked as a primary node.Windows.JSON and ClusterConfig. General cluster configurations This covers the broad cluster specific configurations. . iPAddress Find out the IP address of your node by opening a command window and typing ipconfig . 5 for Silver. "iPAddress": "localhost". "faultDomain": "fd:/dc1/r2". 
you would need a minimum of 3 primary nodes for Bronze, 5 for Silver, 7 for Gold, and 9 for Platinum reliability levels, since a primary node runs a single copy of the system services.

Cluster properties
The properties section in the ClusterConfig.JSON is used to configure the cluster as follows.

Reliability
The reliabilityLevel section defines the number of copies of the system services that can run on the primary nodes of the cluster, which increases the reliability of these services and hence the cluster. You can set this variable to either Bronze, Silver, Gold, or Platinum for 3, 5, 7, or 9 copies of these services respectively, for example "reliabilityLevel": "Bronze".

Nodes
A Service Fabric cluster must contain at least 3 nodes. You can add more nodes to this section as per your setup. See an example below.

"nodes": [{
    "nodeName": "vm0",
    "iPAddress": "localhost",
    "nodeTypeRef": "NodeType0",
    "faultDomain": "fd:/dc1/r0",
    "upgradeDomain": "UD0"
}, {
    "nodeName": "vm1",
    "iPAddress": "localhost",
    "nodeTypeRef": "NodeType1",
    "faultDomain": "fd:/dc1/r1",
    "upgradeDomain": "UD1"
}, {
    "nodeName": "vm2",
    "iPAddress": "localhost",
    "nodeTypeRef": "NodeType2",
    "faultDomain": "fd:/dc1/r2",
    "upgradeDomain": "UD2"
}],

The following table explains the configuration settings for each node.

NODE CONFIGURATION - DESCRIPTION
nodeName - You can give any friendly name to the node.
iPAddress - Note the IPv4 address of the node and assign it to the iPAddress variable.
nodeTypeRef - Each node can be assigned a different node type. The node types are defined in the nodeTypes section below.
faultDomain - Fault domains enable cluster administrators to define the physical nodes that might fail at the same time due to shared physical dependencies.
upgradeDomain - Upgrade domains describe sets of nodes that are shut down for Service Fabric upgrades at about the same time. You can choose which nodes to assign to which upgrade domains, as they are not limited by any physical requirements.

Diagnostics
The diagnosticsStore section allows you to configure parameters to enable diagnostics and troubleshooting of node or cluster failures, as shown in the following snippet.

"diagnosticsStore": {
    "metadata": "Please replace the diagnostics store with an actual file share accessible from all cluster machines.",
    "dataDeletionAgeInDays": "7",
    "storeType": "FileShare",
    "IsEncrypted": "false",
    "connectionstring": "c:\\ProgramData\\SF\\DiagnosticsStore"
}

The metadata is a description of your cluster diagnostics and can be set as per your setup. These variables help in collecting ETW trace logs, crash dumps, as well as performance counters. Read Tracelog and ETW Tracing for more information on ETW trace logs. All logs, including crash dumps and performance counters, can be directed to the connectionString folder on your machine. You can also use AzureStorage for storing diagnostics, as shown in the following snippet.

"diagnosticsStore": {
    "metadata": "Please replace the diagnostics store with an actual file share accessible from all cluster machines.",
    "dataDeletionAgeInDays": "7",
    "storeType": "AzureStorage",
    "IsEncrypted": "false",
    "connectionstring": "xstore:DefaultEndpointsProtocol=https;AccountName=[AzureAccountName];AccountKey=[AzureAccountKey]"
}

Security
The security section is necessary for a secure standalone Service Fabric cluster. The following snippet shows a part of this section.

"security": {
    "metadata": "This cluster is secured using X509 certificates.",
    "ClusterCredentialType": "X509",
    "ServerCredentialType": "X509",
    ...
}

The metadata is a description of your secure cluster and can be set as per your setup. The ClusterCredentialType and ServerCredentialType determine the type of security that the cluster and the nodes implement. They can be set to either X509 for certificate-based security, or Windows for Active Directory-based security. The rest of the security section is based on the type of security. Read Certificates-based security in a standalone cluster or Windows security in a standalone cluster for information on how to fill out the rest of the security section.

Node Types
The nodeTypes section describes the type of the nodes that your cluster has. At least one node type must be specified for a cluster, as shown in the snippet below.

"nodeTypes": [{
    "name": "NodeType0",
    "clientConnectionEndpointPort": "19000",
    "clusterConnectionEndpointPort": "19001",
    "leaseDriverEndpointPort": "19002",
    "serviceConnectionEndpointPort": "19003",
    "httpGatewayEndpointPort": "19080",
    "reverseProxyEndpointPort": "19081",
    "applicationPorts": {
        "startPort": "20575",
        "endPort": "20605"
    },
    "ephemeralPorts": {
        "startPort": "20606",
        "endPort": "20861"
    },
    "isPrimary": true
}]

The name is the friendly name for this particular node type. To create a node of this node type, assign its friendly name to the nodeTypeRef variable for that node. For each node type, define the connection endpoints that will be used. You can choose any port number for these connection endpoints, as long as they do not conflict with any other endpoints in this cluster. You can customize these only during the initial cluster creation. In a multi-node cluster, there will be one or more primary node types (i.e. isPrimary set to true), depending on the reliabilityLevel. Read Service Fabric cluster capacity planning considerations for information on nodeTypes and reliabilityLevel values, and to know what the primary and non-primary node types are.

Endpoints used to configure the node types
clientConnectionEndpointPort is the port used by the client to connect to the cluster, when using the client APIs.
clusterConnectionEndpointPort is the port at which the nodes communicate with each other.
leaseDriverEndpointPort is the port used by the cluster lease driver to find out if the nodes are still active.
serviceConnectionEndpointPort is the port used by the applications and services deployed on a node, to communicate with the Service Fabric client on that particular node.
httpGatewayEndpointPort is the port used by the Service Fabric Explorer to connect to the cluster.
reverseProxyEndpointPort is an optional reverse proxy endpoint. See Service Fabric Reverse Proxy for more details.
applicationPorts are the ports that will be used by the Service Fabric applications. Service Fabric uses these whenever new ports are required, and takes care of opening the firewall for these ports. The application port range should be large enough to cover the endpoint requirement of your applications. This range should be exclusive from the dynamic port range on the machine, i.e. the ephemeralPorts range as set in the configuration.
ephemeralPorts override the dynamic ports used by the OS. Service Fabric uses a part of these as application ports and the remaining are available for the OS. It also maps this range to the existing range present in the OS, so for all purposes, you can use the ranges given in the sample JSON files. Make sure that the difference between the start and the end ports is at least 255. You may run into conflicts if this difference is too low, since this range is shared with the operating system. See the configured dynamic port range by running netsh int ipv4 show dynamicport tcp.
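Before filling in the nodes and nodeTypes sections, it can help to run a couple of quick checks on each machine. The following is a minimal PowerShell sketch (the address filters are illustrative only): it lists the IPv4 addresses you could use for the iPAddress field, and shows the OS dynamic port range so you can confirm it does not overlap your applicationPorts range.

# List IPv4 addresses to pick a value for the iPAddress field (loopback and APIPA addresses filtered out).
Get-NetIPAddress -AddressFamily IPv4 |
    Where-Object { $_.IPAddress -notlike '127.*' -and $_.IPAddress -notlike '169.254.*' } |
    Format-Table InterfaceAlias, IPAddress -AutoSize

# Show the dynamic (ephemeral) port range configured in the OS; the applicationPorts range
# should not overlap it, and the ephemeralPorts range must span at least 255 ports.
netsh int ipv4 show dynamicport tcp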
Log Settings
The fabricSettings section allows you to set the root directories for the Service Fabric data and logs. Note that if you customize only the data root, then the log root is placed one level below the data root. See the following sample snippet of this section:

"fabricSettings": [{
    "name": "Setup",
    "parameters": [{
        "name": "FabricDataRoot",
        "value": "C:\\ProgramData\\SF"
    }, {
        "name": "FabricLogRoot",
        "value": "C:\\ProgramData\\SF\\Log"
    }]
}]

We recommend using a non-OS drive as the FabricDataRoot and FabricLogRoot, as it provides more reliability against OS crashes.

Stateful Reliable Services Settings
The KtlLogger section allows you to set the global configuration settings for Reliable Services. The example below shows how to change the shared transaction log that gets created to back any reliable collections for stateful services.

"fabricSettings": [{
    "name": "KtlLogger",
    "parameters": [{
        "name": "SharedLogSizeInMB",
        "value": "4096"
    }]
}]

For more details on these settings, read Configure stateful reliable services.

Next steps
Once you have a complete ClusterConfig.JSON file configured as per your standalone cluster setup, you can deploy your cluster by following the article Create a standalone Service Fabric cluster and then proceed to visualizing your cluster with Service Fabric Explorer.

Secure a standalone cluster on Windows using X.509 certificates
3/16/2017 • 9 min to read • Edit Online

This article describes how to secure the communication between the various nodes of your standalone Windows cluster, as well as how to authenticate clients connecting to this cluster, using X.509 certificates. This ensures that only authorized users can access the cluster and the deployed applications, and perform management tasks. Certificate security should be enabled on the cluster when the cluster is created. For more information on cluster security such as node-to-node security, client-to-node security, and role-based access control, see Cluster security scenarios.

Which certificates will you need?
To start with, download the standalone cluster package to one of the nodes in your cluster. In the downloaded package, you will find a ClusterConfig.X509.MultiMachine.json file. Open the file and review the security section under the properties section:
"X509StoreName": "My" } } } This section describes the certificates that you need for securing your standalone Windows cluster. Although not mandatory. "IsAdmin": true } ] "ReverseProxyCertificate":{ "Thumbprint": "[Thumbprint]". "CertificateIssuerThumbprint" : "[Thumbprint]". In some scenarios. "ClientCertificateCommonNames": [ { "CertificateCommonName": "[CertificateCommonName]". "security": { "metadata": "The Credential type X509 indicates this is cluster is secured using X509 Certificates. The thumbprint format is . "X509StoreName": "My" }. If you are specifying server certificate for outside connections. you need not set ClusterCredentialType or ServerCredentialType to X509. set the value of ClusterCredentialType to X509. "ThumbprintSecondary": "[Thumbprint]".d5 ec 42 3b 79 cb e5 07 fd 83 59 3c 56 b9 d5 31 24 25 42 64. If you set these values to X509 then you must also specify the corresponding certificates or Service Fabric will throw an exception. "IsAdmin": false }. You can use two different certificates. Read Working with certificates to know more about common names and the issuer.4". Set the thumbprint of the primary certificate in the Thumbprint section and that of the secondary in the ThumbprintSecondary variables. a primary and a secondary for upgrade. The CertificateIssuerThumbprint is the thumbprint for the issuer of this certificate. ServerCertificate This certificate is presented to the client when it tries to connect to this cluster.0". { "nodeName": "vm1". "iPAddress": "10. "upgradeDomain": "UD0" }. ReverseProxyCertificate This is an optional certificate that can be specified if you want to secure your Reverse Proxy. then the client with this certificate installed on it can do administrator management activities on the cluster.7. "clusterConfigurationVersion": "1. Here is example cluster configuration where the Cluster. ClientCertificateThumbprints This is a set of certificates that you want to install on the authenticated clients. "metadata": "Replace the localhost below with valid IP address or FQDN". "faultDomain": "fd:/dc1/r1". If the IsAdmin is false.0. "nodes": [{ "nodeName": "vm0". { "name": "SampleCluster". You can have a number of different client certificates installed on the machines that you want to allow access to the cluster. Make sure reverseProxyEndpointPort is set in nodeTypes if you are using this certificate. Server. { "nodeName": "vm2". You can use two different server certificates. typically read-only. CERTIFICATEINFORMATION SETTING DESCRIPTION ClusterCertificate This certificate is required to secure the communication between the nodes on a cluster. For convenience. For more information on roles read Role based access control (RBAC) ClientCertificateCommonNames Set the common name of the first client certificate for the CertificateCommonName. "apiVersion": "2016-09-26".7.7. you can choose to use the same certificate for ClusterCertificate and ServerCertificate. "faultDomain": "fd:/dc1/r0". "nodeTypeRef": "NodeType0". Set the thumbprint of each certificate in the CertificateThumbprint variable. "iPAddress": "10. "upgradeDomain": "UD1" }.0. "nodeTypeRef": "NodeType0". and Client certificates have been provided.0. Set the thumbprint of the primary certificate in the Thumbprint section and that of the secondary in the ThumbprintSecondary variables.0. the client with this certificate can only perform the actions allowed for user access rights. .5". "iPAddress": "10.6". "metadata": "Replace the localhost with valid IP address or FQDN". 
If you set the IsAdmin to true. a primary and a secondary for upgrade. "httpGatewayEndpointPort": "19080". "nodeTypeRef": "NodeType0". "dataDeletionAgeInDays": "7". "serviceConnectionEndpointPort": "19003". "faultDomain": "fd:/dc1/r2". "iPAddress": "10. "clientConnectionEndpointPort": "19000".6". "IsEncrypted": "false". "parameters": [{ "name": "FabricDataRoot". "upgradeDomain": "UD2" }]. "fabricSettings": [{ "name": "Setup". "properties": { "diagnosticsStore": { "metadata": "Please replace the diagnostics store with an actual file share accessible from all cluster machines. "reliabilityLevel": "Bronze". "storeType": "FileShare". "applicationPorts": { "startPort": "20001". "endPort": "20062" }. "leaseDriverEndpointPort": "19002".7. "metadata": "Replace the localhost with valid IP address or FQDN". "X509StoreName": "My" }. "value": "C:\\ProgramData\\SF\\Log" }] }] } } . "endPort": "20031" }. "IsAdmin": true }] } }. "CertificateInformation": { "ClusterCertificate": { "Thumbprint": "a8 13 67 58 f4 ab 89 62 af 2b f3 f2 79 21 be 1d f6 7f 43 26". "ephemeralPorts": { "startPort": "20032". "ServerCertificate": { "Thumbprint": "a8 13 67 58 f4 ab 89 62 af 2b f3 f2 79 21 be 1d f6 7f 43 26". The thumbprint format is . "X509StoreName": "My" }.".0.d5 ec 42 3b 79 cb e5 07 fd 83 59 3c 56 b9 d5 31 24 25 42 64. "IsAdmin": false }. "nodeTypes": [{ "name": "NodeType0". "isPrimary": true } ]. { "name": "FabricLogRoot". "connectionstring": "c:\\ProgramData\\SF\\DiagnosticsStore" } "security": { "metadata": "The Credential type X509 indicates this is cluster is secured using X509 Certificates. "ServerCredentialType": "X509". "ClientCertificateThumbprints": [{ "CertificateThumbprint": "c4 c18 8e aa a8 58 77 98 65 f8 61 4a 0d da 4c 13 c5 a1 37 6e". "value": "C:\\ProgramData\\SF" }. { "CertificateThumbprint": "71 de 04 46 7c 9e d0 54 4d 02 10 98 bc d4 4c 71 e1 83 41 4e". "ClusterCredentialType": "X509". "clusterConnectionEndpointPort": "19001".". Edit this file to change the default name of the certificate (look for the value CN=ServiceFabricDevClusterCert).pfx - Password $pswd To see the details of a certificate installed on the machine you can run the following PowerShell command: $cert = Get-Item Cert:\LocalMachine\My\<Thumbprint> Write-Host $cert. you will first need to obtain X. go to How to: Obtain a Certificate.\CertSetup. Replace the $PfxFilePath with the full path of the . 1. Now export the certificate to a PFX file with a protected password. if you have an Azure subscription. $pswd = "1234" $PfxFilePath ="C:\mypfx.pfx file(s) to the node. Navigate to the Local Computer\Personal folder and find the certificate you just created. Change the String value to a suitable secure password to protect it and run the following in PowerShell: $pswd = ConvertTo-SecureString -String "1234" -Force –AsPlainText Get-ChildItem -Path cert:\localMachine\my\<Thumbprint> | Export-PfxCertificate -FilePath C:\mypfx.ToString($true) Alternatively. which runs under the Network Service account. after removing the spaces.pfx copied to this node. select the Details tab and scroll down to the Thumbprint field. can use it by running the following script. For details on obtaining these certificates.509 certificates To secure communication within the cluster. For clusters that you use for test purposes. for both Cluster and Server certificates and any secondary certificates. Open a PowerShell window as an administrator and enter the following commands. 
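If you prefer not to use the CertSetup.ps1 script, a self-signed test certificate can also be created directly with built-in Windows cmdlets. The following is a hypothetical sketch; the DNS name, file path, and password are placeholders, and production clusters should always use a CA-signed certificate instead.

# Create a self-signed certificate in the Local Computer\Personal store and export it to a password-protected PFX.
$pswd = ConvertTo-SecureString -String "Password#1234" -Force -AsPlainText
$cert = New-SelfSignedCertificate -DnsName "mytestcluster.contoso.com" -CertStoreLocation Cert:\LocalMachine\My
Export-PfxCertificate -Cert $cert -FilePath C:\temp\mytestcluster.pfx -Password $pswd

# The thumbprint to paste into the Thumbprint fields of the configuration:
$cert.Thumbprint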
to limit connection to this cluster to authorized machines/users.509 certificate to secure the cluster. First get the thumbprint of the certificate. follow the section Add certificates to Key Vault. Additionally. Double-click the certificate to open it.pfx" Import-PfxCertificate -Exportable -CertStoreLocation Cert:\LocalMachine\My -FilePath $PfxFilePath - Password (ConvertTo-SecureString -String $pswd -AsPlainText -Force) 3. Copy the thumbprint value into the PowerShell command below. Provide the thumbprint of the . Replace the $pswd with the password that you used to create this certificate. Install the certificates Once you have certificate(s). you can install them on the cluster nodes.x installed on them. For clusters that are running production workloads. You will need to repeat these steps on each node. you will need to obtain and install certificates for the client machines.Acquire the X. 2. Now set the access control on this certificate so that the Service Fabric process. Copy the . From the Start menu. you can choose to use a self-signed certificate. Optional: Create a self-signed certificate One way to create a self-signed cert that can be secured correctly is to use the CertSetup.ps1 -Install . run the Manage computer certificates.ps1 script in the Service Fabric SDK folder in the directory C:\Program Files\Microsoft SDKs\Service Fabric\ClusterSetup\Secure.509 certificates for your cluster nodes. Run this script as . Your nodes need to have the latest Windows PowerShell 3. you should use a Certificate Authority (CA) signed X. "FullControl".X509.SetAccessRule($accessRule) # Write back the new acl Set-Acl -Path $keyFullPath -AclObject $acl -ErrorAction Stop # Observe the access rights currently assigned to this certificate. Mandatory=$true)] [ValidateNotNullOrEmpty()] [string]$pfxThumbPrint.ThumbPrint - eq $pfxThumbPrint. You can also use these steps to install the client certificates on the machines that you want to allow access to the cluster."Allow" $accessRule = New-Object -TypeName System.CspKeyContainerInfo.MultiMachine.UniqueKeyContainerName $keyFullPath = Join-Path -Path $keyPath -ChildPath $keyName # Get the current acl of the private key $acl = (Get-Item $keyFullPath). } # Specify the user.MultiMachine.Security. [Parameter(Position=2. Repeat the steps above for each server certificate.GetAccessControl('Access') # Add the new ace to the acl of the private key $acl.FileSystemAccessRule -ArgumentList $permission # Location of the machine related keys $keyPath = Join-Path -Path $env:ProgramData -ChildPath "\Microsoft\Crypto\RSA\MachineKeys" $keyName = $cert. follow the section Connect to a secure cluster using PowerShell to connect to it. your command might look like the following: .MultiMachine.AccessControl.json Once you have the secure standalone Windows cluster successfully running. get-acl $keyFullPath| fl 4. Remember to use the ClusterConfig.\CreateServiceFabricCluster. certificate and "NETWORK SERVICE" for the service account.json file while creating the cluster. You can check that the ACLs on the certificate are correct by opening the certificate in Start > Manage computer certificates and looking at All Tasks > Manage Private Keys.PrivateKey. Mandatory=$true)] [ValidateNotNullOrEmpty()] [string]$serviceAccount ) $cert = Get-ChildItem -Path cert:\LocalMachine\My | Where-Object -FilterScript { $PSItem.json file.X509. param ( [Parameter(Position=1. and have setup the authenticated clients to connect to it.X509.\ClusterConfig. For example: . 
the permissions and the permission type $permission = "$($serviceAccount)". Create the secure cluster After configuring the security section of the ClusterConfig. For example. you can proceed to Create your cluster section to configure the nodes and create the standalone cluster.ps1 -ClusterConfigFilePath . 5:19000'.MultiMachine.X509. FindType = 'FindByThumbprint'.\RemoveServiceFabricCluster. StoreName = "MY". X509Credential = $True. $ConnectArgs = @{ ConnectionEndpoint = '10. connect to the node on the cluster where you downloaded the Service Fabric package. open a command line and navigate to the package folder. To self-diagnose security issues. Get-ServiceFabricNode to show a list of nodes on this secure cluster.0.json NOTE Incorrect certificate configuration may prevent the cluster from coming up during deployment. To remove the cluster. StoreLocation = 'LocalMachine'. Now run the following command: . .ps1 -ClusterConfigFilePath .7. FindValue = "057b9544a6f2733e0c8d3a60013a58948213f551" } Connect-ServiceFabricCluster $ConnectArgs You can then run other PowerShell commands to work with this cluster. please look in event viewer group Applications and Services Logs > Microsoft-Service Fabric. For example. ServerCertThumbprint = "057b9544a6f2733e0c8d3a60013a58948213f551".\ClusterConfig. .zip standalone cluster package contains a template for configuring Windows security using Group Managed Service Account (gMSA): "security": { "ServerCredentialType": "Windows". For more information about how Service Fabric uses Windows security. The process corresponds to the configure security step of Create a standalone cluster running on Windows. Configure Windows security using gMSA The sample ClusterConfig.Azure. Secure a standalone cluster on Windows by using Windows security 3/29/2017 • 5 min to read • Edit Online To prevent unauthorized access to a Service Fabric cluster. Security is especially important when the cluster runs production workloads. ClustergMSAIdentity Configures node-to-node security. a domain user. see Cluster security scenarios. A group managed service account. "IsAdmin": true } ] } } CONFIGURATION SETTING DESCRIPTION WindowsIdentities Contains the cluster and client identities.ServiceFabric.MultiMachine. . This article describes how to configure node-to-node and client-to-node security by using Windows security in the ClusterConfig. you have to rebuild the full cluster.WindowsServer.JSON configuration file downloaded with the Microsoft. ClusterSPN Fully qualified domain SPN for gMSA account ClientIdentities Configures client-to-node security. NOTE You should consider the selection of node-to-node security carefully because there is no cluster upgrade from one security choice to another.JSON file. To change the security selection. Identity The client identity. you must secure the cluster. "WindowsIdentities": { "ClustergMSAIdentity": "accountname@fqdn" "ClusterSPN": "fqdn" "ClientIdentities": [ { "Identity": "domain\\username". An array of client user accounts.gMSA.Windows. JSON configuration file downloaded with the Microsoft. see Role based access control for Service Fabric clients.Windows.zip standalone cluster package contains a template for configuring Windows security. by default. they must be made aware of each other. Client to node security is configured using ClientIdentities.contoso. This can be done in two different ways: Specify the domain group users that can connect or specify the domain node users that can connect. 
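Once Connect-ServiceFabricCluster succeeds against the secure cluster, a few quick checks can confirm that the cluster is up and configured as expected. This is a minimal sketch using standard Service Fabric cmdlets:

# Overall cluster health.
Get-ServiceFabricClusterHealth

# Status and health of each node.
Get-ServiceFabricNode | Format-Table NodeName, NodeStatus, HealthState

# Inspect the generated cluster manifest, including the security settings that were applied.
Get-ServiceFabricClusterManifest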
and the ability to resolve applications and services. "ClientIdentities": [{ "Identity": "CONTOSO\\usera". have only read access to management capabilities (for example. you must configure the cluster to know which client identities that it can trust. This approach does not require the creation of a domain group for which cluster administrators have been granted access rights to add and remove members. false for user client access. For more information on access controls. This can be accomplished in two different ways: Specify the Group Managed Service Account that includes all nodes in the cluster or Specify the domain machine group that includes all nodes in the cluster.WindowsServer. making the cluster more secure. The following example security section configures Windows security using gMSA and specifies that the machines in ServiceFabric. We strongly recommend using the Group Managed Service Account (gMSA) approach. "IsAdmin": true }] } } Configure Windows security using a machine group The sample ClusterConfig. "ClusterSPN" : "clusterA. Administrators have full access to management capabilities (including read/write capabilities). Users. Access control provides the ability for the cluster administrator to limit access to certain types of cluster operations for different groups of users. query capabilities). Node to node security is configured by setting ClustergMSAIdentity when service fabric needs to run under gMSA.contoso. These accounts are also useful for automatic password management. Service Fabric supports two different access control types for clients that are connected to a Service Fabric cluster: administrator and user. In order to build trust relationships between nodes. For more information.com gMSA are part of the cluster and that CONTOSO\usera has admin client access: "security": { "WindowsIdentities": { "ClustergMSAIdentity" : "ServiceFabric. see Getting Started with Group Managed Service Accounts.clusterA.MultiMachine. In order to establish trust between a client and the cluster.com"..contoso. CONFIGURATION SETTING DESCRIPTION IsAdmin True specifies that the domain user has administrator client access. particularly for larger clusters (more than 10 nodes) or for clusters that are likely to grow or shrink. Windows security is configured in the Properties section: .clusterA.com".Azure.ServiceFabric. ClusterIdentity Use a machine group name. Administrators have full access to management capabilities (including read/write capabilities). Specify the domain node users that can connect. To establish trust between a client and the cluster. WindowsIdentities Contains the cluster and client identities. "ServerCredentialType": "Windows". ServerCredentialType Set to Windows to enable Windows security for clients. This indicates that the clients of the cluster and the cluster itself are running within an Active Directory domain. to configure node-to-node security.contoso. by default. IsAdmin Set to true to specify that the domain user has administrator client access or false for user client access. Service Fabric supports two different access control types for clients that are connected to a Service Fabric cluster: administrator and user. specifies that the machines in ServiceFabric/clusterA. Client-to-node security is configured by using ClientIdentities. you must configure the cluster to know the client identities that the cluster can trust. ClientIdentities Configures client-to-node security. have only read access to management capabilities (for example. 
"ClientIdentities": [{ "Identity": "[domain\username]". Access control enables the cluster administrator to limit access to certain types of cluster operations for different groups of users.com are part of the cluster. and specifies that CONTOSO\usera has admin client access: . "security": { "ClusterCredentialType": "Windows". "IsAdmin": true }] } } CONFIGURATION SETTING DESCRIPTION ClusterCredentialType ClusterCredentialType is set to Windows if ClusterIdentity specifies an Active Directory Machine Group Name. An array of client user accounts. query capabilities). domain\machinegroup. domain\username. "WindowsIdentities": { "ClusterIdentity" : "[domain\machinegroup]". You can establish trust in two different ways: Specify the domain group users that can connect. The following example security section configures Windows security. which makes the cluster more secure. For more information. Users. for the client identity. Node to node security is configured by setting using ClusterIdentity if you want to use a machine group within an Active Directory Domain. see Create a Machine Group in Active Directory. and the ability to resolve applications and services. Identity Add the domain user. . "ServerCredentialType": "Windows". Make sure that ClusterConfig. "ClientIdentities": [{ "Identity": "CONTOSO\\usera". see Cluster security scenarios. resume the cluster creation process in Create a standalone cluster running on Windows. and role-based access control.json does not include the IP address of the domain controller when using a machine group or group Managed Service Account (gMSA). See Connect to a secure cluster for examples of connecting by using PowerShell or FabricClient.JSON file. "IsAdmin": true }] } }. For more information about how node-to-node security. client-to-node security. "WindowsIdentities": { "ClusterIdentity" : "ServiceFabric/clusterA.contoso.com". Next steps After configuring Windows security in the ClusterConfig. "security": { "ClusterCredentialType": "Windows". NOTE Service Fabric should not be deployed on a domain controller. an e-commerce website might integrate “JSON Stateless Front-End Service. from development through deployment. Develop 1. The following Microsoft Virtual Academy video describes how to manage your application lifecycle: Service model roles The service model roles are: Service developer: Develops modular and generic services that can be re-purposed and used in multiple applications of the same type or different types. testing. an operator provisions and deploys the application and ensures that it is running in Azure. A service developer declaratively describes the developed service types in a service manifest file consisting of one or more code. and maintenance to eventual decommissioning. An application developer declaratively describes the application type in an application manifest by referencing the service manifests of the constituent services and appropriately overriding and parameterizing different .” and “Queue Stateful Service” to build an auctioning solution. 2. and data packages. daily management. and quality of service. For example. Application developer: Creates applications by integrating a collection of services to satisfy certain specific requirements or scenarios. Operators monitor application health and performance information and maintain the physical infrastructure as needed. For example. A different deployed application can have different settings. An application developer then builds an application using different service types. 
Service Fabric application lifecycle
3/9/2017 • 6 min to read • Edit Online

As with other platforms, an application on Azure Service Fabric usually goes through the following phases: design, development, testing, deployment, upgrading, maintenance, and removal. Service Fabric provides first-class support for the full application lifecycle of cloud applications, from development through deployment, daily management, and maintenance to eventual decommissioning. The service model enables several different roles to participate independently in the application lifecycle. This article provides an overview of the APIs and how they are used by the different roles throughout the phases of the Service Fabric application lifecycle.

The following Microsoft Virtual Academy video describes how to manage your application lifecycle:

Service model roles
The service model roles are:

Service developer: Develops modular and generic services that can be re-purposed and used in multiple applications of the same type or different types. For example, a queue service can be used for creating a ticketing application (helpdesk) or an e-commerce application (shopping cart).

Application developer: Creates applications by integrating a collection of services to satisfy certain specific requirements or scenarios. For example, an e-commerce website might integrate "JSON Stateless Front-End Service," "Auction Stateful Service," and "Queue Stateful Service" to build an auctioning solution.

Application administrator: Makes decisions on the application configuration (filling in the configuration template parameters), deployment (mapping to available resources), and quality of service. For example, an application administrator decides the language locale (English for the United States or Japanese for Japan, for example) of the application. A different deployed application can have different settings.

Operator: Deploys applications based on the application configuration and requirements specified by the application administrator. For example, an operator provisions and deploys the application and ensures that it is running in Azure. Operators monitor application health and performance information and maintain the physical infrastructure as needed.

Develop
1. A service developer develops different types of services using the Reliable Actors or Reliable Services programming model.
2. A service developer declaratively describes the developed service types in a service manifest file consisting of one or more code, configuration, and data packages.
3. An application developer then builds an application using different service types.
4. An application developer declaratively describes the application type in an application manifest by referencing the service manifests of the constituent services and appropriately overriding and parameterizing different configuration and deployment settings of the constituent services.
Service Fabric deploys applications from the application package stored in the image store. After deploying to the local development cluster or a test cluster. configuration and deployment settings of the constituent services. the Register-ServiceFabricApplicationType cmdlet. The service developer tests service-to-service communication by authoring test scenarios that move primary replicas around the cluster. 3. See Get started with Reliable Actors and Get started with Reliable Services for examples. After provisioning the application. An operator uploads the updated application package to the cluster image store using the . 6. 5. the New-ServiceFabricApplication cmdlet. 4. The chaos test scenario randomly induces multiple node. A service developer updates the constituent services of the instantiated application and/or fixes bugs and provides a new version of the service manifest. 2. An application developer overrides and parameterizes the configuration and deployment settings of the consistent services and provides a new version of the application manifest. an operator uses the CreateServiceAsync method. An operator uploads the application package to the cluster image store by using the CopyApplicationPackage method or the Copy-ServiceFabricApplicationPackage cmdlet. See Deploy an application for examples. For upgrades and patches to the Service Fabric platform. the Unregister-ServiceFabricApplicationType cmdlet. or the Update Application Upgrade REST operation. or the Provision an Application REST operation. the Register-ServiceFabricApplicationType cmdlet. Service Fabric upgrades itself without losing availability of any of the applications running on the cluster. the Remove-ServiceFabricService cmdlet. the Start-ServiceFabricApplicationRollback cmdlet. If necessary. An operator checks the progress of upgrade using the GetApplicationUpgradeProgressAsync method. or the Delete Application REST operation. testing. the operator rolls back the current application upgrade using the RollbackApplicationUpgradeAsync method. the Start-ServiceFabricApplicationUpgrade cmdlet. the Remove-ServiceFabricApplication cmdlet. See the Application upgrade tutorial for examples. Service Fabric automatically load-balances the running applications across all nodes in the cluster to achieve optimal performance. and managing Service Fabric applications and services. see: . 3. 10. CopyApplicationPackage method or the Copy-ServiceFabricApplicationPackage cmdlet. Remove 1. the operator can unprovision the application type using the UnprovisionApplicationAsync method. You must remove the application package manually. the Get-ServiceFabricApplicationUpgrade cmdlet. An operator provisions the new version of the application in the target cluster by using the ProvisionApplicationAsync method. the operator modifies and reapplies the parameters of the current application upgrade using the UpdateApplicationUpgradeAsync method. 4. 5. Service Fabric interfaces with the Azure infrastructure to guarantee availability of all the applications running in the cluster. An operator adds and removes nodes specified by the application administrator. 6. 7. An operator can delete a specific instance of a running service in the cluster without removing the entire application using the DeleteServiceAsync method. the Update-ServiceFabricApplicationUpgrade cmdlet. or the Unprovision an Application REST operation. or the Rollback Application Upgrade REST operation. 8. If necessary. 
CopyApplicationPackage method or the Copy-ServiceFabricApplicationPackage cmdlet.
5. An operator provisions the new version of the application in the target cluster by using the ProvisionApplicationAsync method, the Register-ServiceFabricApplicationType cmdlet, or the Provision an Application REST operation.
6. An operator upgrades the target application to the new version using the UpgradeApplicationAsync method, the Start-ServiceFabricApplicationUpgrade cmdlet, or the Upgrade an Application REST operation.
7. An operator checks the progress of the upgrade using the GetApplicationUpgradeProgressAsync method, the Get-ServiceFabricApplicationUpgrade cmdlet, or the Get Application Upgrade Progress REST operation.
8. If necessary, the operator modifies and reapplies the parameters of the current application upgrade using the UpdateApplicationUpgradeAsync method, the Update-ServiceFabricApplicationUpgrade cmdlet, or the Update Application Upgrade REST operation.
9. If necessary, the operator rolls back the current application upgrade using the RollbackApplicationUpgradeAsync method, the Start-ServiceFabricApplicationRollback cmdlet, or the Rollback Application Upgrade REST operation.
10. Service Fabric upgrades the target application running in the cluster without losing the availability of any of its constituent services. See the Application upgrade tutorial for examples.

Maintain
1. For operating system upgrades and patches, Service Fabric interfaces with the Azure infrastructure to guarantee availability of all the applications running in the cluster.
2. For upgrades and patches to the Service Fabric platform, Service Fabric upgrades itself without losing availability of any of the applications running on the cluster.
3. An application administrator approves the addition or removal of nodes from a cluster after analyzing historical capacity utilization data and projected future demand.
4. An operator adds and removes nodes specified by the application administrator.
5. When new nodes are added to or existing nodes are removed from the cluster, Service Fabric automatically load-balances the running applications across all nodes in the cluster to achieve optimal performance.

Remove
1. An operator can delete a specific instance of a running service in the cluster without removing the entire application using the DeleteServiceAsync method, the Remove-ServiceFabricService cmdlet, or the Delete Service REST operation.
2. An operator can also delete an application instance and all of its services using the DeleteApplicationAsync method, the Remove-ServiceFabricApplication cmdlet, or the Delete Application REST operation.
3. Once the application and services have stopped, the operator can unprovision the application type using the UnprovisionApplicationAsync method, the Unregister-ServiceFabricApplicationType cmdlet, or the Unprovision an Application REST operation. Unprovisioning the application type does not remove the application package from the ImageStore; you must remove the application package manually.
4. An operator removes the application package from the ImageStore using the RemoveApplicationPackage method or the Remove-ServiceFabricApplicationPackage cmdlet.

Next steps
For more information on developing, testing, and managing Service Fabric applications and services, see:
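As a concrete illustration of the operator steps above, here is a minimal PowerShell sketch of an upgrade followed by a complete removal. The application name, type name, versions, package path, and image store path are placeholders, and the cluster is assumed to use the default image store service:

# Upload and register the updated application package (version 2.0.0 here is illustrative).
Copy-ServiceFabricApplicationPackage -ApplicationPackagePath 'C:\MyApplication\pkg\v2' `
    -ApplicationPackagePathInImageStore 'MyApplication\v2' -ImageStoreConnectionString 'fabric:ImageStore'
Register-ServiceFabricApplicationType -ApplicationPathInImageStore 'MyApplication\v2'

# Start a monitored rolling upgrade and check its progress.
Start-ServiceFabricApplicationUpgrade -ApplicationName 'fabric:/MyApplication' `
    -ApplicationTypeVersion '2.0.0' -Monitored -FailureAction Rollback
Get-ServiceFabricApplicationUpgrade -ApplicationName 'fabric:/MyApplication'

# Later, remove the application instance, unregister the type, and delete the package from the image store.
Remove-ServiceFabricApplication -ApplicationName 'fabric:/MyApplication' -Force
Unregister-ServiceFabricApplicationType -ApplicationTypeName 'MyApplicationType' -ApplicationTypeVersion '2.0.0'
Remove-ServiceFabricApplicationPackage -ApplicationPackagePathInImageStore 'MyApplication\v2' `
    -ImageStoreConnectionString 'fabric:ImageStore'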
Now it's time to share your Application project source files with Team Services. 4. you want to configure the publish profile to enable upgrade. The publish profile should be configured to target the cluster that you've previously prepared: 1. 3. You must have a project that was created or upgraded with Service Fabric SDK 2. You can click the Save hyperlink in the publish dialog once you have configured things appropriately. Ensure that you have a Service Fabric cluster to which you can deploy your application or create one using the Azure portal. This document reflects the current procedure and is expected to change over time. Ensure that you have already created a Service Fabric Application (. follow these steps: 1. Team Services hosted agents now come pre-installed with Service Fabric cluster management software. If you're not intending to use application upgrade in your workflow. 3. 4. Copy Files Copies the publish profile and application parameters files to the build's artifacts to be consumed for deployment. you need to create additional Visual Studio Build steps in the build definition that each target an Application project. Select Create. Build solution *. you need to explicitly update the path to that file. 8. Select the agent queue you wish to use.sfproj) exists in the repository. 2. The following paragraph is a description of the build steps generated by the template: BUILD STEP DESCRIPTION NuGet restore Restores the NuGet packages for the solution. 3. Save the build definition and provide a name. this build step assumes only one Service Fabric Application project (.sln Builds the entire solution. Verify the default set of tasks 1. In the dialog that opens. Build solution *. Verify the Solution input field for the Package application build step. Allows a release definition to consume the build's artifacts. If you only want the build definition to operate on one of those solution files. 10. Select Next. You would then also need to update the MSBuild Arguments field for each of those build steps so that the package location is unique for each of them. Verify the versioning behavior defined in the Update Service Fabric App Versions build step. Open your team project in Visual Studio Team Services. By default. If you want to package multiple Application projects in your repository. This is useful for supporting upgrade of your application since each upgrade deployment requires different version values from the previous deployment. The application package location is specified to be within the build's artifact directory. 2. you may consider disabling this build step. Hosted agents are supported.sfproj Generates the Service Fabric application package that is used to deploy the application. select Azure Service Fabric Application within the Build template category. 6. The deployment fails if the version of the application produced by the build does not match the . definitions. Verify the Solution input field for the NuGet restore and Build solution build steps. See the task documentation page for more information. By default. 5. It must be disabled if your intention is to produce a build that can be used to overwrite an existing Service Fabric application. this build step appends the build number to all version values in the application package's manifest files. 9. Publish Artifact Publishes the build's artifacts. these build steps execute upon all solution files that are contained in the associated repository. Select the green + sign to create a new build definition. 
See the task documentation page for more information. Select the repository and branch associated with your Service Fabric application. By default. Select the Build tab. 7. you need to explicitly update the path to that file. Create a definition from the build template 1. Update Service Fabric App Versions Updates the version values contained in the application package's manifest files to allow for upgrade support. If you have multiple such files in your repository and want to target only one of them for this build definition. select the Manage hyperlink next to the field to add one. Edit the definition name by clicking the pencil icon at the top of the page. Locate the Command-Line task within the Utility tab and click its Add button. Drag the task so that it is immediately after the NuGet restore step. f.com:19000. Once you've verified that the build is executing successfully.. The goal of the release definition that you are creating is to take an application package and deploy it to a cluster. b. 2. Select the green + sign to create a new release definition and select Create release definition in the menu. the build definition and release definition can execute the entire workflow from starting with source files to ending with a running application in your cluster. 4.azure. For the task's input fields. 7. version of the application in the cluster. Tool: dotnet e. In the dialog that opens. 3. Create a release definition A Team Services release definition describes a workflow that is composed of a set of tasks that are executed sequentially. Select Next. use the following values: d. you must ensure that your build definition contains a build step that restores the dependencies: a. Select New Service Endpoint and then select Azure Service Fabric from the menu. 6. Example: https://contoso. Create a definition from the release template 1. Try it Select Queue Build to manually start a build. perform the following steps: a. Open your project in Visual Studio Team Services. 5. The cluster connection provides the necessary information that allows the deployment task to connect to the cluster.westus. you can now move on to defining a release definition that deploys your application to a cluster. Typically. 5. d.. Check the Continuous deployment check box if you wish to have Team Services automatically create a new release and deploy the Service Fabric application whenever a build completes. Save any changes you've made to the build definition. define the credentials you want to use to connect to the cluster in the Username and Password fields. Select the Release tab. Select Add build step. e. Define the client connection endpoint URL in the Cluster Endpoint field. Learn more about Team Services release definitions. For Azure Active Directory credentials. On the page that opens. For Certificate Based authentication. you would use the name of your cluster. When used together.NET Core project. 9. If you do not yet have a cluster connection for your cluster. Select the agent queue you wish to use. 4. If your solution contains a . Select the cluster to which your application should be deployed from the Cluster Connection input field of the task. Arguments: restore f. c. select Azure Service Fabric Deployment within the Deployment template category. The release definition references the artifacts that were produced by the selected build definition. c. 10. 11. Select the type of authentication being used by the cluster targeted by this endpoint. 
Define a name for your connection in the Connection Name field. b. Select Create. 8.. Hosted agents are supported.cloudapp. define the Base64 encoding of the client certificate file in the Client . Builds also triggers upon push or check-in. Select the build definition you want to use as the source of this release definition. releases will be created automatically when the associated build definition completes a build. 12. After navigating back to your release definition. Confirm your changes by clicking OK. If your certificate is password-protected. select the build that you want to base the release on and then click Create. Verify the Publish Profile input field for the Deploy Service Fabric Application task. this field references a publish profile named Cloud. @hotmail. See the task documentation page for more information about this task. you need to update the path appropriately. In the dialog that opens.com) are not supported with Azure Active Directory authentication. read the following articles: Team Services documentation home Build management in Team Services Release management in Team Services . If you want to reference a different publish profile or if the build contains multiple application packages in its artifacts. By default. The definition that is created consists of one task of type Service Fabric Application Deployment. If you enabled continuous deployment. By default. If you've modified the default application package path in the build definition. Verify the template defaults 1.xml contained in the build's artifacts. NOTE Microsoft Accounts (for example. you need to update the path appropriately here as well. Certificate field. Try it Select Create Release from the Release button menu to manually create a release. g. Next steps To learn more about continuous integration with Service Fabric applications. click the refresh icon on the Cluster Connection field to see the endpoint you just added.com or @outlook. See the help pop-up on that field for info on how to get that value. define the password in the Password field. this field references the default application package path used in the build definition template. 2. Save the release definition. Verify the Application Package input field for the Deploy Service Fabric Application task. Or you can create and change the user. You can install the appropriate Git version from the Git downloads page. The following sections show how to set it up inside a cluster. It provides the path of the initial admin password required to sign in. Here's how to build and deploy your Azure Service Fabric application by using Jenkins. and paste the path you were shown on the Jenkins portal. NOTE Ensure that the 8081 port is specified as the application endpoint port while you are creating the cluster. Have a Service Fabric Linux cluster ready. If it is not installed. For example. From your browser. You can download it from Service Fabric downloads.docker. run the following: . General prerequisites Have Git installed locally. go to http://PublicIPorFQDN:8081 . after you sign in with the initial admin account. Set up Jenkins inside a Service Fabric cluster You can set up Jenkins either inside or outside a Service Fabric cluster.git -b JenkinsDocker cd service-fabric-java-getting-started/Services/JenkinsDocker/ azure servicefabric cluster connect http://PublicIPorFQDN:19080 # Azure CLI cluster connect command bash Scripts/install. 
Use Jenkins to build and deploy your Linux Java application 4/12/2017 • 6 min to read • Edit Online Jenkins is a popular tool for continuous integration and deployment of your apps.https://get. install it accordingly by using the following commands: sudo apt-get install wget wget -qO. check if Docker is installed by using the command docker info . Steps 1. based on your operating system. If you are running the cluster locally. by using the following steps: git clone https://github. 3. A Service Fabric cluster created from the Azure portal already has Docker installed. Secure Shell (SSH) sign in to the container. and can be monitored by using the Service Fabric Explorer.io/ | sh 2. Get the container instance ID by using docker ps -a . Prerequisites 1. If you are new to Git.com/Azure-Samples/service-fabric-java-getting-started. if in the portal it shows the path PATH_TO_INITIAL_ADMIN_PASSWORD . 2. learn more about it from the Git documentation. Have the Service Fabric container application deployed on the cluster.sh This installs a Jenkins container on the cluster. Have the Service Fabric Jenkins plug-in handy. You can continue to use Jenkins as an admin user. use 2d24. 5. and to add the SSH key to the GitHub account that is hosting the repository.docker. or you can continue to use the administrator account. Set up GitHub to work with Jenkins. After you create a user. docker exec -t -i [first-four-digits-of-container-ID] /bin/bash # This takes you inside Docker shell cat PATH_TO_INITIAL_ADMIN_PASSWORD 4. Run the container image: docker run -itd -p 8080:8080 raunakpandya/jenkins:v1 3. Use the instructions provided by GitHub to generate the SSH key. use the following command: docker exec -t -i [first-four-digits-of-container-ID] /bin/bash Set up Jenkins outside a Service Fabric cluster You can set up Jenkins either inside or outside of a Service Fabric cluster. you can create your own user account and use that for future purposes. Prerequisites You need to have Docker installed.io/ | sh Now when you run docker info in the terminal. you should see in the output that the Docker service is running. To sign in to the Jenkins shell from your host. by using the steps mentioned in Generating a new SSH key and adding it to the SSH agent. You can list all the Docker containers with the command docker ps –a 4. The following sections show how to set it up outside a cluster. and to add the SSH key to the GitHub account that is hosting your repository.https://get. Pull the Service Fabric Jenkins container image: docker pull raunakpandya/jenkins:v1 2. Get the ID of the container image instance. use the following commands: . Run the commands mentioned in the preceding link in the Jenkins Docker shell (and not on your host). Steps 1. Use the instructions provided by GitHub to generate the SSH key. To sign in to the Jenkins shell from your host. The following commands can be used to install Docker from the terminal: sudo apt-get install wget wget -qO. Run the commands mentioned in the preceding link in the Jenkins Docker shell (and not on your host). Sign in to the Jenkins portal by using the following steps: sh docker exec [first-four-digits-of-container-ID] cat /var/jenkins_home/secrets/initialAdminPassword If container ID is 2d24a73b5964. you need to continue with that. This password is required for signing in to the Jenkins dashboard from portal. which is http://<HOST-IP>:8080 After you sign in for the first time. Set up GitHub to work with Jenkins. 
by using the steps mentioned in Generating a new SSH key and adding it to the SSH agent. Enter an item name (for example. 4. type Jenkins.com/sayantancs/SFJenkins ). From the Jenkins dashboard. and then select the serviceFabric. Go the job page. Jenkins automatically installs the plug-in.hpi file. It picks up build. under GitHub project. /master). select Manage Jenkins > Manage Plugins > Advanced. you can upload a plug-in. https://github. Install the Service Fabric Jenkins plug-in from the portal 1. e. Select free-style project. MyJob). docker exec -t -i [first-four-digits-of-container-ID] /bin/bash Ensure that the cluster or machine where the Jenkins container image is hosted has a public-facing IP. (Previously. Select Add Service. d. Also. 2. Create and configure a Jenkins job 1. For this example. specify your GitHub project URL. b. In the widget that comes. Under the Build Triggers section. b. Under the Build section. When you select Upload. a. Allow a restart if requested. 3. select the option Invoke Gradle Script. Specify the repository URL that hosts the Service Fabric Java application that you want to integrate with the Jenkins CI/CD flow (for example. Go to Settings > Integrations and Services. which you downloaded under prerequisites.gradle from the path specified. specify the path to Root build script for your application. select Git. the root build script should contain ${WORKSPACE}/MyActor . from the drop-down Add build step. Go to your GitHub repository page. and select the Jenkins-GitHub plugin. and click OK. Here.git ). Go to http://PublicIPorFQDN:8081 2. So you select GitHub hook trigger for GITScm polling. See the following screenshot for an example of what this looks like: . this option was called Build when a change is pushed to GitHub.) f. Click add/update service. In the general section. it should be http://<PublicIPorFQDN>:8081/github-webhook/ ). you want to trigger a build whenever some push to the repository happens. You should see a green check by the webhook in GitHub. Select Choose file. This URL hosts the Service Fabric Java application that you want to integrate with the Jenkins continuous integration. https://github. continuous deployment (CI/CD) flow (for example. and works accordingly. c. Create a new item from dashboard. and your project will build.com/sayantancs/SFJenkins. select which build option you want. Under the Source Code Management section. Configure your GitHub (which is hosting the repository) so that it is able to talk to Jenkins. Enter your Jenkins webhook URL (by default. This enables the Jenkins instance to receive notifications from GitHub. If you create a project named MyActor (using the Eclipse plug-in or Yeoman generator). A test event is sent to your Jenkins instance. and click Configure. Use the following steps: a. you can specify here which branch to build (for example. You can also provide additional application details used to deploy the application. Here you need to provide cluster details where the Jenkins compiled Service Fabric application would be deployed. Next steps GitHub and Jenkins are now configured. in case you are using Service Fabric to deploy the Jenkins container image.com/sayantancs/SFJenkins. Consider making some sample change in your MyActor project in the repository example. https://github. g. select Deploy Service Fabric Project. From the Post-Build Actions drop-down. Push your changes to a remote master branch (or . 
See the following screenshot for an example of what this looks like: NOTE The cluster here could be same as the one hosting the Jenkins container application. . builds them. and deploys the application to the cluster endpoint you specified in post-build actions. that you configured.any branch that you have configured to work with). MyJob . It fetches the changes from GitHub. This triggers the Jenkins job. . Essentially. that node downloads the contents of your application package from the Image Store... but when you create a cluster through the Azure portal. Image Store Service: "fabric:ImageStore" 2.the "Image Store" is one such aspect..].AccountKey=[. So the setting must be configurable per cluster. Azure Storage: "xstore:DefaultEndpointsProtocol=https. . it looks like all you do is copy/paste the value as it appears in the cluster manifest of the target cluster. What's the purpose of this setting then? Service Fabric started off as a platform for internal Microsoft consumption by many diverse teams. we briefly mention the existence of an "ImageStoreConnectionString" parameter without describing what it really means.]. which is a stateful persisted system service that you can see from Service Fabric Explorer. Understand the ImageStoreConnectionString setting 2/13/2017 • 2 min to read • Edit Online In some of our documentation.Container=[. the Image Store is a pluggable repository for storing application packages. There are currently three possible kinds of Image Store providers and their corresponding connection strings are as follows: 1. there's no option to configure this setting and it's always "fabric:ImageStore". File System: "file:[file system path]" 3.AccountName=[. When your application is deployed to a node in the cluster.. The ImageStoreConnectionString is a setting that includes all the necessary information for both clients and nodes to find the correct Image Store for a given cluster.]" The provider type used in production is the Image Store Service. so some aspects of it are highly customizable .. And after going through an article like Deploy and remove applications using PowerShell. Hosting the Image Store in a system service within the cluster itself eliminates external dependencies for the package repository and gives us more control over the locality of storage. The client only needs to know that protocols targeting the system service should be used. Future improvements around the Image Store are likely to target the Image Store provider first. The File System provider is used instead of the Image Store Service for local one-box clusters during development to bootstrap the cluster slightly faster. but it's a useful optimization for most folks . if not exclusively. The difference is typically small. The connection string for the Image Store Service provider doesn't have any unique information since the client is already connected to the target cluster. its value can always be verified by retrieving the cluster manifest by PowerShell. For programmatic deployment to clusters hosted in Azure. the connection string is always "fabric:ImageStore". but there's usually no reason to do so since the develop/test workflow remains the same regardless of provider. or REST. When publishing to Azure through Visual Studio. So while the ImageStoreConnectionString is configurable.NET. It's possible to deploy a local one-box cluster with the other storage provider types as well. 
Both on-premises test clusters and production clusters should also be configured to use the Image Store Service provider. Other than the local development scenario described above, the File System and Azure Storage providers only exist for legacy support.

Next steps
Deploy and remove applications using PowerShell

Deploy and remove applications using PowerShell 4/10/2017 • 9 min to read • Edit Online
Once an application type has been packaged, it's ready for deployment into an Azure Service Fabric cluster. Deployment involves the following three steps:
1. Upload the application package to the image store
2. Register the application type
3. Create the application instance

After an app is deployed and an instance is running in the cluster, you can delete the app instance and its application type. To completely remove an app from the cluster involves the following steps:
1. Remove (or delete) the running application instance
2. Unregister the application type if you no longer need it
3. Remove the application package from the image store

If you use Visual Studio for deploying and debugging applications on your local development cluster, all the preceding steps are handled automatically through a PowerShell script. This script is found in the Scripts folder of the application project. This article provides background on what that script is doing so that you can perform the same operations outside of Visual Studio.

Connect to the cluster
Before you run any PowerShell commands in this article, always start by using Connect-ServiceFabricCluster to connect to the Service Fabric cluster. To connect to the local development cluster, run the following:

PS C:\>Connect-ServiceFabricCluster

For examples of connecting to a remote cluster or a cluster secured using Azure Active Directory, X509 certificates, or Windows Active Directory, see Connect to a secure cluster.

To import the SDK module, run:

Import-Module "$ENV:ProgramFiles\Microsoft SDKs\Service Fabric\Tools\PSModule\ServiceFabricSDK\ServiceFabricSDK.psm1"

Upload the application package
Uploading the application package puts it in a location that's accessible by internal Service Fabric components. Suppose you build and package an app named MyApplication in Visual Studio. By default, the application type name listed in the ApplicationManifest.xml is "MyApplicationType". The application package, which contains the necessary application manifest, service manifests, and code/config/data packages, is located in C:\Users\username\Documents\Visual Studio 2015\Projects\MyApplication\MyApplication\pkg\Debug.

If you want to verify the application package locally, use the Test-ServiceFabricApplicationPackage cmdlet. The Copy-ServiceFabricApplicationPackage command then uploads the application package to the cluster image store. The Get-ImageStoreConnectionStringFromClusterManifest cmdlet, which is part of the Service Fabric SDK PowerShell module, is used to get the image store connection string. See Understand the image store connection string for supplementary information about the image store and image store connection string.
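As a quick sanity check before uploading, you can run the validation cmdlet against the package folder. This is a minimal sketch using the example package path from above; the second call assumes your SDK version also accepts the optional -ImageStoreConnectionString parameter for validating against the target cluster's image store settings.

PS C:\> $path = 'C:\Users\username\Documents\Visual Studio 2015\Projects\MyApplication\MyApplication\pkg\Debug'
# Validate the package structure and manifests locally.
PS C:\> Test-ServiceFabricApplicationPackage -ApplicationPackagePath $path
# Optionally validate against the cluster's image store as well.
PS C:\> Test-ServiceFabricApplicationPackage -ApplicationPackagePath $path -ImageStoreConnectionString (Get-ImageStoreConnectionStringFromClusterManifest (Get-ServiceFabricClusterManifest))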
The following command lists the contents of the application package:

PS C:\> $path = 'C:\Users\user\Documents\Visual Studio 2015\Projects\MyApplication\MyApplication\pkg\Debug'
PS C:\> tree /f $path
Folder PATH listing for volume OSDisk
Volume serial number is 0459-2393
C:\USERS\USER\DOCUMENTS\VISUAL STUDIO 2015\PROJECTS\MYAPPLICATION\MYAPPLICATION\PKG\DEBUG
│   ApplicationManifest.xml
│
└───Stateless1Pkg
    │   ServiceManifest.xml
    │
    ├───Code
    │       Microsoft.ServiceFabric.Data.dll
    │       Microsoft.ServiceFabric.Data.Interfaces.dll
    │       Microsoft.ServiceFabric.Internal.dll
    │       Microsoft.ServiceFabric.Internal.Strings.dll
    │       Microsoft.ServiceFabric.Services.dll
    │       ServiceFabricServiceModel.dll
    │       Stateless1.exe
    │       Stateless1.exe.config
    │       Stateless1.pdb
    │       System.Fabric.Strings.dll
    │
    └───Config
            Settings.xml

If the application package is large and/or has many files, you can compress it. The compression reduces the size and the number of files. The side effect is that registering and un-registering the application type are faster, but the upload itself may be slower, especially if you include the time to compress the package.

To compress a package, use the same Copy-ServiceFabricApplicationPackage command. Compression can be done separately from upload, by using the SkipCopy flag, or together with the upload operation. Applying compression to an already compressed package is a no-op. For best results, use a fast SSD drive.

The following cmdlet compresses the package without copying it to the image store. The package now includes zipped files for the Code and Config packages. The application and the service manifests are not zipped, because they are needed for many internal operations (like package sharing, and application type name and version extraction for certain validations); zipping the manifests would make these operations inefficient.

PS C:\> Copy-ServiceFabricApplicationPackage -ApplicationPackagePath $path -CompressPackage -SkipCopy
PS C:\> tree /f $path
Folder PATH listing for volume OSDisk
Volume serial number is 0459-2393
C:\USERS\USER\DOCUMENTS\VISUAL STUDIO 2015\PROJECTS\MYAPPLICATION\MYAPPLICATION\PKG\DEBUG
│   ApplicationManifest.xml
│
└───Stateless1Pkg
        Code.zip
        Config.zip
        ServiceManifest.xml

To uncompress a compressed package, use the same Copy-ServiceFabricApplicationPackage command with the UncompressPackage switch.

The deployment mechanism is the same for compressed and uncompressed packages. If the package is compressed, it is stored as such in the cluster image store and is uncompressed on the node before the application is run. The compression times and the size of the compressed package differ based on the package content. For example, here are compression statistics for some packages, which show the initial and the compressed package size, together with the compression time:

INITIAL SIZE (MB)   FILE COUNT   COMPRESSION TIME   COMPRESSED PACKAGE SIZE (MB)
100                 100          00:00:03.3775554   60
512                 100          00:00:16.3850303   307
1024                500          00:00:32.5907950   615
2048                1000         00:01:04.2951288   1231
5012                100          00:02:45.3547592   3074

Once a package is compressed, it can be uploaded to one or multiple Service Fabric clusters as needed.

The time it takes to upload a package differs depending on multiple factors. Some of these factors are the number of files in the package, the package size, and the file sizes. The network speed between the source machine and the Service Fabric cluster also impacts the upload time. The default timeout for Copy-ServiceFabricApplicationPackage is 30 minutes. Depending on the described factors, you may have to increase the timeout with the -TimeoutSec parameter. If you are compressing the package in the copy call, you need to also consider the compression time.

The following example uploads the package to the image store, into a folder named "MyApplicationV1":

PS C:\> Copy-ServiceFabricApplicationPackage -ApplicationPackagePath $path -ApplicationPackagePathInImageStore MyApplicationV1 -ImageStoreConnectionString (Get-ImageStoreConnectionStringFromClusterManifest(Get-ServiceFabricClusterManifest)) -TimeoutSec 1800

If you do not specify the -ApplicationPackagePathInImageStore parameter, the app package is copied into the "Debug" folder in the image store.

Register the application package
The application type and version declared in the application manifest become available for use when the app package is registered. The system reads the package uploaded in the previous step, verifies the package, processes the package contents, and copies the processed package to an internal system location.

Run the Register-ServiceFabricApplicationType cmdlet to register the application type in the cluster and make it available for deployment:

PS C:\> Register-ServiceFabricApplicationType MyApplicationV1
Register application type succeeded

"MyApplicationV1" is the folder in the image store where the app package is located. The application type with name "MyApplicationType" and version "1.0.0" (both are found in the application manifest) is now registered in the cluster.

The Register-ServiceFabricApplicationType command returns only after the system has successfully registered the application package. How long registration takes depends on the size and contents of the application package. If needed, the -TimeoutSec parameter can be used to supply a longer timeout (the default timeout is 60 seconds). If you have a large app package or if you are experiencing timeouts, use the -Async switch; the command then returns as soon as the cluster accepts the register command, and the registration continues as needed.
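For instance, a large package can be registered with a longer timeout, or asynchronously and then polled for completion. This is a minimal sketch that reuses the folder name from the example above; the timeout value is only an illustration.

# Register with a longer timeout.
PS C:\> Register-ServiceFabricApplicationType MyApplicationV1 -TimeoutSec 2400

# Or return immediately and let registration continue in the background,
# then poll until the new application type and version are listed.
PS C:\> Register-ServiceFabricApplicationType MyApplicationV1 -Async
PS C:\> Get-ServiceFabricApplicationType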
The Get-ServiceFabricApplicationType command lists all successfully registered application type versions and their registration status. You can use this command to determine when the registration is done:

PS C:\> Get-ServiceFabricApplicationType

ApplicationTypeName    : MyApplicationType
ApplicationTypeVersion : 1.0.0
Status                 : Available
DefaultParameters      : { "Stateless1_InstanceCount" = "-1" }

Create the application
You can instantiate an application from any application type version that has been registered successfully by using the New-ServiceFabricApplication cmdlet. The name of each application must start with the fabric: scheme and be unique for each application instance. Any default services defined in the application manifest of the target application type are also created.

PS C:\> New-ServiceFabricApplication fabric:/MyApp MyApplicationType 1.0.0

ApplicationName        : fabric:/MyApp
ApplicationTypeName    : MyApplicationType
ApplicationTypeVersion : 1.0.0
ApplicationParameters  : {}

Multiple application instances can be created for any given version of a registered application type. Each application instance runs in isolation, with its own work directory and process.

To see which named apps and services are running in the cluster, run the Get-ServiceFabricApplication and Get-ServiceFabricService cmdlets:

PS C:\> Get-ServiceFabricApplication

ApplicationName        : fabric:/MyApp
ApplicationTypeName    : MyApplicationType
ApplicationTypeVersion : 1.0.0
ApplicationStatus      : Ready
HealthState            : Ok
ApplicationParameters  : {}

PS C:\> Get-ServiceFabricApplication | Get-ServiceFabricService

ServiceName            : fabric:/MyApp/Stateless1
ServiceKind            : Stateless
ServiceTypeName        : Stateless1Type
IsServiceGroup         : False
ServiceManifestVersion : 1.0.0
ServiceStatus          : Active
HealthState            : Ok

Remove an application
When an application instance is no longer needed, you can permanently remove it by name using the Remove-ServiceFabricApplication cmdlet. Remove-ServiceFabricApplication automatically removes all services that belong to the application as well, permanently removing all service state. This operation cannot be reversed, and application state cannot be recovered.

PS C:\> Remove-ServiceFabricApplication fabric:/MyApp

Confirm
Continue with this operation?
[Y] Yes  [N] No  [S] Suspend  [?] Help (default is "Y"):
Remove application instance succeeded
PS C:\> Get-ServiceFabricApplication

Unregister an application type
When a particular version of an application type is no longer needed, you should unregister the application type using the Unregister-ServiceFabricApplicationType cmdlet. Unregistering unused application types releases storage space used by the image store. An application type can be unregistered as long as no applications are instantiated against it and no pending application upgrades are referencing it.

Run Get-ServiceFabricApplicationType to see the application types currently registered in the cluster:

PS C:\> Get-ServiceFabricApplicationType

ApplicationTypeName    : MyApplicationType
ApplicationTypeVersion : 1.0.0
Status                 : Available
DefaultParameters      : { "Stateless1_InstanceCount" = "-1" }

Run Unregister-ServiceFabricApplicationType to unregister a specific application type:

PS C:\> Unregister-ServiceFabricApplicationType MyApplicationType 1.0.0

Remove an application package from the image store
When an application package is no longer needed, you can delete it from the image store to free up system resources:

PS C:\>Remove-ServiceFabricApplicationPackage -ApplicationPackagePathInImageStore MyApplicationV1 -ImageStoreConnectionString (Get-ImageStoreConnectionStringFromClusterManifest(Get-ServiceFabricClusterManifest))
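To recap the deployment half of the workflow, the following sketch runs the upload, register, and create steps end to end. It reuses the example paths and names from this article, so adjust them for your own application; the SDK module must be imported as shown earlier so that Get-ImageStoreConnectionStringFromClusterManifest is available.

# Connect to the target cluster (local development cluster shown here).
Connect-ServiceFabricCluster

# 1. Upload the package to the image store.
$path = 'C:\Users\username\Documents\Visual Studio 2015\Projects\MyApplication\MyApplication\pkg\Debug'
Copy-ServiceFabricApplicationPackage -ApplicationPackagePath $path -ApplicationPackagePathInImageStore MyApplicationV1 -ImageStoreConnectionString (Get-ImageStoreConnectionStringFromClusterManifest (Get-ServiceFabricClusterManifest))

# 2. Register the application type.
Register-ServiceFabricApplicationType MyApplicationV1

# 3. Create an application instance.
New-ServiceFabricApplication fabric:/MyApp MyApplicationType 1.0.0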
Try: Compress the package before copying to the image store.org/2001/XMLSchema" xmlns:xsi="http://www. consider using a machine with a better network connection. Deploy application package with many files Issue: Register-ServiceFabricApplicationType times out for an application package with many files (order of thousands). Specify Async switch for Register-ServiceFabricApplicationType. By default. with TimeoutSec parameter.microsoft. when the image store is configured to use azure storage. The compression reduces the size and the number of files. but register and un- register the application type are faster. the timeout is 30 minutes. Next steps Service Fabric application upgrade Service Fabric health introduction Diagnose and troubleshoot a Service Fabric service Model an application in Service Fabric . For this reason. which in turn reduces the amount of traffic and work that Service Fabric must perform. Specify a larger timeout for Register-ServiceFabricApplicationType with TimeoutSec parameter. consider using a client machine in a closer or same region as the cluster.. The compression reduces the number of files. The command returns when the cluster accepts the command and the provision continues async. Try: Compress the package before copying to the image store.0" xmlns="http://schemas.] <Section Name="Management"> <Parameter Name="ImageStoreConnectionString" Value="file:D:\ServiceFabric\Data\ImageStore" /> </Section> [. Specify Async switch for Register-ServiceFabricApplicationType.w3. there is no need to specify a higher timeout in this case. Deploy large application package Issue: Copy-ServiceFabricApplicationPackage times out for a large application package (order of GB). For example.w3. If the client machine is in another region than the cluster. <ClusterManifest xmlns:xsd="http://www. This is because part of the manifest files will be generated during the build.ps1 This is a PowerShell script that uses a publish profile path as a parameter for publishing Service Fabric applications. To publish an application by using the Publish Service Fabric Application dialog box The following steps demonstrate how to publish an application by using the Publish Service Fabric Application dialog box provided by the Visual Studio Service Fabric Tools. Application manifest files can be parameterized so that you can use different values for deployment settings.xml. 1. choose Publish… to view the Publish Service Fabric Application dialog box. . Publish profiles A folder in the Service Fabric application project called PublishProfiles contains XML files that store essential information for publishing an application. To learn more about parameterizing your application. repeatable. Since this script is part of your application. and scriptable way to publish an application to a Service Fabric cluster. Publish an application to a remote cluster by using Visual Studio 4/10/2017 • 5 min to read • Edit Online The Azure Service Fabric extension for Visual Studio provides an easy. The artifacts required for publishing Deploy-FabricApplication. see Manage multiple environments in Service Fabric. On the shortcut menu of the Service Fabric Application project. NOTE For actor services. your application will include two publish profiles: Local. you are welcome to modify it as necessary for your application. you should build the project first before attempting to edit the file in an editor or through the publish dialog box. You can add more profiles by copying and pasting one of the default files.xml and Cloud. 
Publish profiles
A folder in the Service Fabric application project called PublishProfiles contains XML files that store essential information for publishing an application, such as:
Service Fabric cluster connection parameters
Path to an application parameter file
Upgrade settings

By default, your application will include two publish profiles: Local.xml and Cloud.xml. You can add more profiles by copying and pasting one of the default files.

Application parameter files
A folder in the Service Fabric application project called ApplicationParameters contains XML files for user-specified application manifest parameter values. Application manifest files can be parameterized so that you can use different values for deployment settings. An application parameter file holds user-specified values for parameters in the application manifest file. To learn more about parameterizing your application, see Manage multiple environments in Service Fabric.

NOTE
For actor services, you should build the project first before attempting to edit the file in an editor or through the publish dialog box. This is because part of the manifest files will be generated during the build.

To publish an application by using the Publish Service Fabric Application dialog box
The following steps demonstrate how to publish an application by using the Publish Service Fabric Application dialog box provided by the Visual Studio Service Fabric Tools.

1. On the shortcut menu of the Service Fabric Application project, choose Publish… to view the Publish Service Fabric Application dialog box. The file selected in the Target profile dropdown list box is where all of the settings, except Manifest versions, are saved. You can either reuse an existing profile or create a new one by choosing <New...> in the Target profile dropdown list box. When you choose a publish profile, its contents appear in the corresponding fields of the dialog box. To save your changes at any time, choose the Save Profile link.

2. In the Connection endpoint section, specify a local or remote Service Fabric cluster's publishing endpoint. To add or change the connection endpoint, click on the Connection Endpoint dropdown list. The list shows the available Service Fabric cluster connection endpoints to which you can publish, based on your Azure subscription(s). Note that if you are not already logged in to Visual Studio, you will be prompted to do so. Use the cluster selection dialog box to choose from the set of available subscriptions and clusters. When you're done, choose the OK button. The selected cluster appears in the Publish Service Fabric Application dialog box.

NOTE
If you would like to publish to an arbitrary endpoint (such as a party cluster), see the Publish to an arbitrary cluster endpoint section below.

Once you choose an endpoint, Visual Studio validates the connection to the selected Service Fabric cluster. If the cluster isn't secure, Visual Studio can connect to it immediately. However, if the cluster is secure, you'll need to install a certificate on your local computer before proceeding. See How to configure secure connections for more information.

3. In the Application Parameter File dropdown list box, navigate to an application parameter file. To add or change a parameter, choose the Edit button. Enter or change the parameter's value in the Parameters grid. When you're done, choose the Save button.

4. Use the Upgrade the Application checkbox to specify whether this publish action is an upgrade. Upgrade publish actions differ from normal publish actions. See Service Fabric Application Upgrade for a list of differences. To configure upgrade settings, choose the Configure Upgrade Settings link. The upgrade parameter editor appears. See Configure the upgrade of a Service Fabric application to learn more about upgrade parameters.

5. Choose the Manifest Versions… button to view the Edit Versions dialog box. You need to update application and service versions for an upgrade to take place. See Service Fabric application upgrade tutorial to learn how application and service manifest versions impact an upgrade process.
If the application and service versions use semantic versioning such as 1.0.0, or numerical values in the format of 1.0.0.0, select the Automatically update application and service versions option. When you choose this option, the service and application version numbers are automatically updated whenever a code, config, or data package version is updated. If you prefer to edit the versions manually, clear the checkbox to disable this feature.

NOTE
For all package entries to appear for an actor project, first build the project to generate the entries in the Service Manifest files.

6. When you're done specifying all of the necessary settings, choose the Publish button to publish your application to the selected Service Fabric cluster. The settings that you specified are applied to the publish process.

Publish to an arbitrary cluster endpoint (including party clusters)
The Visual Studio publishing experience is optimized for publishing to remote clusters associated with one of your Azure subscriptions. However, it is possible to publish to arbitrary endpoints (such as Service Fabric party clusters) by directly editing the publish profile XML. As described above, two publish profiles are provided by default, Local.xml and Cloud.xml, but you are welcome to create additional profiles for different environments. For instance, you might want to create a profile for publishing to party clusters, perhaps named Party.xml.

If you are connecting to an unsecured cluster, all that's required is the cluster connection endpoint, such as partycluster1.eastus.cloudapp.azure.com:19000. In that case, the connection endpoint in the publish profile would look something like this:

<ClusterConnectionParameters ConnectionEndpoint="partycluster1.eastus.cloudapp.azure.com:19000" />

If you are connecting to a secured cluster, you will also need to provide the details of the client certificate from the local store to be used for authentication. For more details, see Configuring secure connections to a Service Fabric cluster. Once your publish profile is set up, you can reference it in the publish dialog box as shown below.

Note that in this case, the new publish profile points to one of the default application parameter files. This is appropriate if you want to publish the same application configuration to a number of environments. By contrast, in cases where you want to have different configurations for each environment that you publish to, it would make sense to create a corresponding application parameter file.

Next steps
To learn how to automate the publishing process in a continuous integration environment, see Set up Service Fabric continuous integration.
Deploy and remove applications using FabricClient 4/10/2017 • 8 min to read • Edit Online
Once an application type has been packaged, it's ready for deployment into an Azure Service Fabric cluster. Deployment involves the following three steps:
1. Upload the application package to the image store
2. Register the application type
3. Create the application instance

After an app is deployed and an instance is running in the cluster, you can delete the app instance and its application type. To completely remove an app from the cluster involves the following steps:
1. Remove (or delete) the running application instance
2. Unregister the application type if you no longer need it
3. Remove the application package from the image store

If you use Visual Studio for deploying and debugging applications on your local development cluster, all the preceding steps are handled automatically through a PowerShell script. This script is found in the Scripts folder of the application project. This article provides background on what that script is doing so that you can perform the same operations outside of Visual Studio.

Connect to the cluster
Connect to the cluster by creating a FabricClient instance before you run any of the code examples in this article. For examples of connecting to a local development cluster, a remote cluster, or a cluster secured using Azure Active Directory, X509 certificates, or Windows Active Directory, see Connect to a secure cluster. To connect to the local development cluster, run the following:

// Connect to the local cluster.
FabricClient fabricClient = new FabricClient();

Upload the application package
Suppose you build and package an app named MyApplication in Visual Studio. By default, the application type name listed in the ApplicationManifest.xml is "MyApplicationType". The application package, which contains the necessary application manifest, service manifests, and code/config/data packages, is located in C:\Users\username\Documents\Visual Studio 2015\Projects\MyApplication\MyApplication\pkg\Debug.

Uploading the application package puts it in a location that's accessible by internal Service Fabric components. If you want to verify the application package locally, use the Test-ServiceFabricApplicationPackage cmdlet. If the application package is large and/or has many files, you can compress it and copy it to the image store using PowerShell; the compression reduces the size and the number of files. See Understand the image store connection string for supplementary information about the image store and image store connection string.

The CopyApplicationPackage method uploads the application package to the cluster image store.

Register the application package
The application type and version declared in the application manifest become available for use when the app package is registered. The system reads the package uploaded in the previous step, verifies the package, processes the package contents, and copies the processed package to an internal system location.

The ProvisionApplicationAsync method registers the application type in the cluster and makes it available for deployment. The GetApplicationTypeListAsync method lists all successfully registered application type versions and their registration status. You can use this method to determine when the registration is done.

Create an application instance
You can instantiate an application from any application type version that has been registered successfully by using the CreateApplicationAsync method. The name of each application must start with the fabric: scheme and be unique for each application instance. Any default services defined in the application manifest of the target application type are also created. Multiple application instances can be created for any given version of a registered application type. Each application instance runs in isolation, with its own work directory and process.

To see which named apps and services are running in the cluster, run the GetApplicationListAsync and GetServiceListAsync methods.

Create a service instance
You can instantiate a service from a service type using the CreateServiceAsync method. If the service is declared as a default service in the application manifest, the service is instantiated when the application is instantiated. Calling the CreateServiceAsync method for a service that is already instantiated returns an exception.

Remove a service instance
When a service instance is no longer needed, you can remove it from the running application instance by calling the DeleteServiceAsync method. This operation cannot be reversed, and service state cannot be recovered.

Remove an application instance
When an application instance is no longer needed, you can permanently remove it by name using the DeleteApplicationAsync method. DeleteApplicationAsync automatically removes all services that belong to the application as well, permanently removing all service state. This operation cannot be reversed, and application state cannot be recovered.

Unregister an application type
When a particular version of an application type is no longer needed, you should unregister it by using the UnprovisionApplicationAsync method. Unregistering unused application types releases storage space used by the image store. An application type can be unregistered as long as no applications are instantiated against it and no pending application upgrades are referencing it.

Remove an application package from the image store
When an application package is no longer needed, you can delete it from the image store to free up system resources by using the RemoveApplicationPackage method.
Troubleshooting
Copy-ServiceFabricApplicationPackage asks for an ImageStoreConnectionString
The Service Fabric SDK environment should already have the correct defaults set up. But if needed, the ImageStoreConnectionString for all commands should match the value that the Service Fabric cluster is using. You can find the ImageStoreConnectionString in the cluster manifest, retrieved using the Get-ServiceFabricClusterManifest method:

<ClusterManifest xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" Name="Server-Default-SingleNode" Version="1.0" xmlns="http://schemas.microsoft.com/2011/01/fabric">
[...]
    <Section Name="Management">
      <Parameter Name="ImageStoreConnectionString" Value="file:D:\ServiceFabric\Data\ImageStore" />
    </Section>
[...]

See Understand the image store connection string for supplementary information about the image store and image store connection string.

Deploy large application package
Issue: The CopyApplicationPackage method times out for a large application package (order of GB).
Try:
Specify a larger timeout for the CopyApplicationPackage method, with the timeout parameter. By default, the timeout is 30 minutes.
Check the network connection between your source machine and the cluster. If the connection is slow, consider using a machine with a better network connection. If the client machine is in another region than the cluster, consider using a client machine in a closer or the same region as the cluster.
Check if you are hitting external throttling. For example, when the image store is configured to use Azure Storage, upload may be throttled.
Compress the package before copying to the image store. The compression reduces the size and the number of files, which in turn reduces the amount of traffic and work that Service Fabric must perform. The upload operation may be slower (especially if you include the compression time), but registering and un-registering the application type are faster.

Deploy application package with many files
Issue: The upload of the package completes successfully, but the ProvisionApplicationAsync method times out for an application package with many files (order of thousands).
Try:
Compress the package before copying to the image store. The compression reduces the number of files.
Specify a larger timeout for ProvisionApplicationAsync with the timeout parameter. Because provisioning continues on the cluster once the request has been accepted, you can also poll GetApplicationTypeListAsync to determine when registration is done, rather than blocking with a high timeout.

Code example
The following example copies an application package to the image store, provisions the application type, creates an application instance, creates a service instance, and then cleans up: it removes the service instance and the application instance, un-provisions the application type, and deletes the application package from the image store.

using System;
using System.Collections.Generic;
using System.Fabric;
using System.Fabric.Description;
using System.Linq;
using System.Reflection;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

namespace ServiceFabricAppLifecycle
{
    class Program
    {
        static void Main(string[] args)
        {
            string clusterConnection = "localhost:19000";
            string appName = "fabric:/MyApplication";
            string appType = "MyApplicationType";
            string appVersion = "1.0.0";
            string serviceName = "fabric:/MyApplication/Stateless1";
            string serviceType = "Stateless1Type";
            string imageStoreConnectionString = "file:C:\\SfDevCluster\\Data\\ImageStoreShare";
            string packagePathInImageStore = "MyApplication";
            string packagePath = @"C:\Users\username\Documents\Visual Studio 2015\Projects\MyApplication\MyApplication\pkg\Debug";

            // Connect to the cluster.
            FabricClient fabricClient = new FabricClient(clusterConnection);

            // Copy the application package to a location in the image store.
            try
            {
                fabricClient.ApplicationManager.CopyApplicationPackage(imageStoreConnectionString, packagePath, packagePathInImageStore);
                Console.WriteLine("Application package copied to {0}", packagePathInImageStore);
            }
            catch (AggregateException ae)
            {
                Console.WriteLine("Application package copy to Image Store failed: ");
                foreach (Exception ex in ae.InnerExceptions)
                    Console.WriteLine("HResult: {0} Message: {1}", ex.HResult, ex.Message);
            }

            // Provision the application. The application type with name "MyApplicationType" and version "1.0.0"
            // (both are found in the application manifest) is now registered in the cluster.
            try
            {
                fabricClient.ApplicationManager.ProvisionApplicationAsync(packagePathInImageStore).Wait();
                Console.WriteLine("Provisioned application type {0}", packagePathInImageStore);
            }
            catch (AggregateException ae)
            {
                Console.WriteLine("Provision Application Type failed:");
                foreach (Exception ex in ae.InnerExceptions)
                    Console.WriteLine("HResult: {0} Message: {1}", ex.HResult, ex.Message);
            }

            // Create the application instance.
            try
            {
                ApplicationDescription appDesc = new ApplicationDescription(new Uri(appName), appType, appVersion);
                fabricClient.ApplicationManager.CreateApplicationAsync(appDesc).Wait();
                Console.WriteLine("Created application instance of type {0}, version {1}", appType, appVersion);
            }
            catch (AggregateException ae)
            {
                Console.WriteLine("CreateApplication failed.");
                foreach (Exception ex in ae.InnerExceptions)
                    Console.WriteLine("HResult: {0} Message: {1}", ex.HResult, ex.Message);
            }

            // Create the stateless service description. For stateful services, use a StatefulServiceDescription object.
            StatelessServiceDescription serviceDescription = new StatelessServiceDescription();
            serviceDescription.ApplicationName = new Uri(appName);
            serviceDescription.InstanceCount = 1;
            serviceDescription.PartitionSchemeDescription = new SingletonPartitionSchemeDescription();
            serviceDescription.ServiceName = new Uri(serviceName);
            serviceDescription.ServiceTypeName = serviceType;

            // Create the service instance. If the service is declared as a default service in the ApplicationManifest.xml,
            // the service instance is already running and this call will fail.
            try
            {
                fabricClient.ServiceManager.CreateServiceAsync(serviceDescription).Wait();
                Console.WriteLine("Created service instance {0}", serviceName);
            }
            catch (AggregateException ae)
            {
                Console.WriteLine("CreateService failed.");
                foreach (Exception ex in ae.InnerExceptions)
                    Console.WriteLine("HResult: {0} Message: {1}", ex.HResult, ex.Message);
            }

            // Delete a service instance.
            try
            {
                DeleteServiceDescription deleteServiceDescription = new DeleteServiceDescription(new Uri(serviceName));
                fabricClient.ServiceManager.DeleteServiceAsync(deleteServiceDescription).Wait();
                Console.WriteLine("Deleted service instance {0}", serviceName);
            }
            catch (AggregateException ae)
            {
                Console.WriteLine("DeleteService failed.");
                foreach (Exception ex in ae.InnerExceptions)
                    Console.WriteLine("HResult: {0} Message: {1}", ex.HResult, ex.Message);
            }

            // Delete an application instance from the application type.
            try
            {
                DeleteApplicationDescription deleteApplicationDescription = new DeleteApplicationDescription(new Uri(appName));
                fabricClient.ApplicationManager.DeleteApplicationAsync(deleteApplicationDescription).Wait();
                Console.WriteLine("Deleted application instance {0}", appName);
            }
            catch (AggregateException ae)
            {
                Console.WriteLine("DeleteApplication failed.");
                foreach (Exception ex in ae.InnerExceptions)
                    Console.WriteLine("HResult: {0} Message: {1}", ex.HResult, ex.Message);
            }

            // Un-provision the application type.
            try
            {
                fabricClient.ApplicationManager.UnprovisionApplicationAsync(appType, appVersion).Wait();
                Console.WriteLine("Un-provisioned application type {0}, version {1}", appType, appVersion);
            }
            catch (AggregateException ae)
            {
                Console.WriteLine("Un-provision application type failed: ");
                foreach (Exception ex in ae.InnerExceptions)
                    Console.WriteLine("HResult: {0} Message: {1}", ex.HResult, ex.Message);
            }

            // Delete the application package from a location in the image store.
            try
            {
                fabricClient.ApplicationManager.RemoveApplicationPackage(imageStoreConnectionString, packagePathInImageStore);
                Console.WriteLine("Application package removed from {0}", packagePathInImageStore);
            }
            catch (AggregateException ae)
            {
                Console.WriteLine("Application package removal from Image Store failed: ");
                foreach (Exception ex in ae.InnerExceptions)
                    Console.WriteLine("HResult: {0} Message: {1}", ex.HResult, ex.Message);
            }

            Console.WriteLine("Hit enter...");
            Console.Read();
        }
    }
}

Next steps
Service Fabric application upgrade
Service Fabric health introduction
Diagnose and troubleshoot a Service Fabric service
Model an application in Service Fabric
Service Fabric application upgrade 3/3/2017 • 4 min to read • Edit Online
An Azure Service Fabric application is a collection of services. During an upgrade, Service Fabric compares the new application manifest with the previous version and determines which services in the application require updates. Service Fabric compares the version numbers in the service manifests with the version numbers in the previous version. If a service has not changed, that service is not upgraded.

Rolling upgrades overview
In a rolling application upgrade, the upgrade is performed in stages. At each stage, the upgrade is applied to a subset of nodes in the cluster, called an update domain. Update domains allow the services to remain at high availability during an upgrade; as a result, the application remains available throughout the upgrade. During the upgrade, the cluster may contain a mix of the old and new versions.

For that reason, the two versions must be forward and backward compatible. If they are not compatible, the application administrator is responsible for staging a multiple-phase upgrade to maintain availability. In a multiple-phase upgrade, the first step is upgrading to an intermediate version of the application that is compatible with the previous version. The second step is to upgrade to the final version, which breaks compatibility with the pre-update version but is compatible with the intermediate version.

Update domains are specified in the cluster manifest when you configure the cluster. An update domain is a logical unit of deployment for an application. Update domains do not receive updates in a particular order. Non-rolling upgrades are possible if the upgrade is applied to all nodes in the cluster, which is the case when the application has only one update domain. This approach is not recommended, since the service goes down and isn't available at the time of upgrade. Additionally, Azure doesn't provide any guarantees when a cluster is set up with only one update domain.

Health checks during upgrades
For an upgrade, health policies have to be set (or default values may be used). An upgrade is termed successful when all update domains are upgraded within the specified time-outs and when all update domains are deemed healthy. A healthy update domain means that the update domain passed all the health checks specified in the health policy. For example, a health policy may mandate that all services within an application instance must be healthy, as health is defined by Service Fabric.

Health policies and checks during an upgrade by Service Fabric are service and application agnostic. That is, no service-specific tests are done. For example, your service might have a throughput requirement, but Service Fabric does not have the information to check throughput. The checks that happen during an upgrade include tests for whether the application package was copied correctly, whether the instance was started, and so on. Refer to the health articles for the checks that are performed.

Service Fabric evaluates the health of the application through the health that is reported on the application. The application health is an aggregation of the child entities of the application, such as the service replica. Service Fabric evaluates the health of the application services in the same way, by aggregating the health of their children. Once the application health policy is satisfied, the upgrade can proceed. If the health policy is violated, the application upgrade fails.

Upgrade modes
The mode that we recommend for application upgrade is the monitored mode, which is the commonly used mode. Monitored mode performs the upgrade on one update domain, and if all health checks pass (per the policy specified), moves on to the next update domain automatically. If health checks fail and/or time-outs are reached, the upgrade is either rolled back for the update domain, or the mode is changed to unmonitored manual. You can configure the upgrade to choose one of those two modes for failed upgrades.

Unmonitored manual mode needs manual intervention after every upgrade on an update domain, to kick off the upgrade on the next update domain. No Service Fabric health checks are performed; the administrator performs the health or status checks before starting the upgrade in the next update domain.

Upgrade default services
Default services within a Service Fabric application can be upgraded during the upgrade process of an application. Default services are defined in the application manifest. The standard rules of upgrading default services are:
1. Default services in the new application manifest that do not exist in the cluster are created.
2. Default services existing in both the previous application manifest and the new version are updated. The service descriptions in the new version overwrite those already in the cluster.
3. Default services in the previous application manifest but not in the new version are deleted. Note that the deletion of default services cannot be reverted.

An application upgrade rolls back automatically if updating a default service fails. If an application upgrade is rolled back, default services are reverted to the status they had before the upgrade started, but deleted services are never re-created. This feature is supported from v5.5.

TIP
EnableDefaultServicesUpgrade needs to be set to true to enable the preceding rules.

Application upgrade flowchart
The flowchart following this paragraph can help you understand the upgrade process of a Service Fabric application. In particular, the flow describes how the time-outs, including HealthCheckStableDuration, HealthCheckRetryTimeout, and UpgradeHealthCheckInterval, help control when the upgrade in one update domain is considered a success or a failure.

Next steps
Upgrading your Application Using Visual Studio walks you through an application upgrade using Visual Studio.
Upgrading your Application Using Powershell walks you through an application upgrade using PowerShell.
Control how your application upgrades by using Upgrade Parameters.
Make your application upgrades compatible by learning how to use Data Serialization.
Learn how to use advanced functionality while upgrading your application by referring to Advanced Topics.
Fix common problems in application upgrades by referring to the steps in Troubleshooting Application Upgrades.
Configure the upgrade of a Service Fabric application in Visual Studio 1/17/2017 • 2 min to read • Edit Online
Visual Studio tools for Azure Service Fabric provide upgrade support for publishing to local or remote clusters. There are two advantages to upgrading your application to a newer version instead of replacing the application during testing and debugging:
Application data won't be lost during the upgrade.
Availability remains high, so there won't be any service interruption during the upgrade, if there are enough service instances spread across upgrade domains.

Tests can be run against an application while it's being upgraded.

Parameters needed to upgrade
You can choose from two types of deployment: regular or upgrade. A regular deployment erases any previous deployment information and data on the cluster, while an upgrade deployment preserves it. When you upgrade a Service Fabric application in Visual Studio, you need to provide application upgrade parameters and health check policies. Application upgrade parameters help control the upgrade, while health check policies determine whether the upgrade was successful. See Service Fabric application upgrade: upgrade parameters for more details.

There are three upgrade modes: Monitored, UnmonitoredAuto, and UnmonitoredManual.
A Monitored upgrade automates the upgrade and the application health check.
An UnmonitoredAuto upgrade automates the upgrade, but skips the application health check.
When you do an UnmonitoredManual upgrade, you need to manually upgrade each upgrade domain.

Each upgrade mode requires different sets of parameters. See Application upgrade parameters to learn more about the available upgrade options.

Upgrade a Service Fabric application in Visual Studio
If you're using the Visual Studio Service Fabric tools to upgrade a Service Fabric application, you can specify a publish process to be an upgrade rather than a regular deployment by checking the Upgrade the application check box.

To configure the upgrade parameters
1. Click the Settings button next to the check box. The Edit Upgrade Parameters dialog box appears. The Edit Upgrade Parameters dialog box supports the Monitored, UnmonitoredAuto, and UnmonitoredManual upgrade modes.
2. Select the upgrade mode that you want to use and then fill out the parameter grid. Each parameter has default values. The optional parameter DefaultServiceTypeHealthPolicy takes a hash table input. Here's an example of the hash table input format for DefaultServiceTypeHealthPolicy:

@{ ConsiderWarningAsError = "false"; MaxPercentUnhealthyDeployedApplications = 0; MaxPercentUnhealthyServices = 0; MaxPercentUnhealthyPartitionsPerService = 0; MaxPercentUnhealthyReplicasPerPartition = 0 }

ServiceTypeHealthPolicyMap is another optional parameter that takes a hash table input in the following format:

@{ "ServiceTypeName" = "MaxPercentUnhealthyPartitionsPerService,MaxPercentUnhealthyReplicasPerPartition,MaxPercentUnhealthyServices" }

Here's a real-life example:

@{ "ServiceTypeName01" = "5,10,5"; "ServiceTypeName02" = "5,5,5" }

3. If you select the UnmonitoredManual upgrade mode, you must manually start a PowerShell console to continue and finish the upgrade process. Refer to Service Fabric application upgrade: advanced topics to learn how manual upgrade works.

Upgrade an application by using PowerShell
You can use PowerShell cmdlets to upgrade a Service Fabric application. See Service Fabric application upgrade tutorial and Start-ServiceFabricApplicationUpgrade for detailed information.

Specify a health check policy in the application manifest file
Every service in a Service Fabric application can have its own health policy parameters that override the default values. You can provide these parameter values in the application manifest file. The following example shows how to apply a unique health check policy for each service in the application manifest:

<Policies>
  <HealthPolicy ConsiderWarningAsError="false" MaxPercentUnhealthyDeployedApplications="20">
    <DefaultServiceTypeHealthPolicy MaxPercentUnhealthyServices="20" MaxPercentUnhealthyPartitionsPerService="20" MaxPercentUnhealthyReplicasPerPartition="20" />
    <ServiceTypeHealthPolicy ServiceTypeName="ServiceTypeName1" MaxPercentUnhealthyServices="20" MaxPercentUnhealthyPartitionsPerService="20" MaxPercentUnhealthyReplicasPerPartition="20" />
  </HealthPolicy>
</Policies>

Next steps
For more information about deploying an application, see Deploy an existing application in Azure Service Fabric.
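For reference, similar health policy values can be supplied when driving a monitored upgrade from PowerShell. The sketch below assumes your SDK version exposes ServiceTypeHealthPolicyMap and DefaultServiceTypeHealthPolicy as parameters of Start-ServiceFabricApplicationUpgrade (check Get-Help Start-ServiceFabricApplicationUpgrade for the exact parameter set of your version); the application name, version, and policy values are placeholders.

# Per-service-type policy map, using the same triple format shown above.
$policyMap = @{ "ServiceTypeName01" = "5,10,5"; "ServiceTypeName02" = "5,5,5" }

Start-ServiceFabricApplicationUpgrade -ApplicationName fabric:/MyApp -ApplicationTypeVersion 2.0.0 -Monitored -FailureAction Rollback -ServiceTypeHealthPolicyMap $policyMap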
Application upgrade parameters 3/10/2017 • 5 min to read • Edit Online
This article describes the various parameters that apply during the upgrade of an Azure Service Fabric application. The parameters include the name and version of the application. They are the knobs that control the time-outs and health checks that are applied during the upgrade, and they specify the policies that must be applied when an upgrade fails.

ApplicationName
Name of the application that is being upgraded. Examples: fabric:/VisualObjects, fabric:/ClusterMonitor.

TargetApplicationTypeVersion
The version of the application type that the upgrade targets.

FailureAction
The action taken by Service Fabric when the upgrade fails. The application may be rolled back to the pre-update version (rollback), or the upgrade may be stopped at the current upgrade domain. In the latter case, the upgrade mode is also changed to Manual. Allowed values are Rollback and Manual.

HealthCheckWaitDurationSec
The time to wait (in seconds) after the upgrade has finished on the upgrade domain before Service Fabric evaluates the health of the application. This duration can also be considered as the time an application should be running before it can be considered healthy. If the health check passes, the upgrade process proceeds to the next upgrade domain. If the health check fails, Service Fabric waits for an interval (the UpgradeHealthCheckInterval) before retrying the health check again, until the HealthCheckRetryTimeout is reached. The default and recommended value is 0 seconds.

HealthCheckRetryTimeoutSec
The duration (in seconds) that Service Fabric continues to perform health evaluation before declaring the upgrade as failed. This duration starts after HealthCheckWaitDuration is reached. Within this HealthCheckRetryTimeout, Service Fabric might perform multiple health checks of the application health. The default value is 600 seconds (10 minutes) and should be customized appropriately for your application.

HealthCheckStableDurationSec
The duration (in seconds) to verify that the application is stable before moving to the next upgrade domain or completing the upgrade. This wait duration is used to prevent undetected changes of health right after the health check is performed. The default value is 120 seconds, and should be customized appropriately for your application.

UpgradeDomainTimeoutSec
Maximum time (in seconds) for upgrading a single upgrade domain. If this time-out is reached, the upgrade stops and proceeds based on the setting for UpgradeFailureAction. The default value is never (Infinite) and should be customized appropriately for your application.

UpgradeTimeout
A time-out (in seconds) that applies for the entire upgrade. If this time-out is reached, the upgrade stops and UpgradeFailureAction is triggered. The default value is never (Infinite) and should be customized appropriately for your application.

UpgradeHealthCheckInterval
The frequency at which the health status is checked. This parameter is specified in the ClusterManager section of the cluster manifest, and is not specified as part of the upgrade cmdlet. The default value is 60 seconds.

Service health evaluation during application upgrade
The health evaluation criteria are optional. If the health evaluation criteria are not specified when an upgrade starts, Service Fabric uses the application health policies specified in the ApplicationManifest.xml of the application instance.

ConsiderWarningAsError
Default value is False. Treat the warning health events for the application as errors when evaluating the health of the application during upgrade. By default, Service Fabric does not evaluate warning health events to be failures (errors), so the upgrade can proceed even if there are warning events.

MaxPercentUnhealthyDeployedApplications
Default and recommended value is 0. Specify the maximum number of deployed applications (see the Health section) that can be unhealthy before the application is considered unhealthy and fails the upgrade. This parameter defines the application health on the node and helps detect issues during upgrade. Typically, the replicas of the application get load-balanced to other nodes, which allows the application to appear healthy, thus allowing the upgrade to proceed. By specifying a strict MaxPercentUnhealthyDeployedApplications health policy, Service Fabric can detect a problem with the application package quickly and help produce a fail-fast upgrade.

MaxPercentUnhealthyServices
Default and recommended value is 0. Specify the maximum number of services in the application instance that can be unhealthy before the application is considered unhealthy and fails the upgrade.

MaxPercentUnhealthyPartitionsPerService
Default and recommended value is 0. Specify the maximum number of partitions in a service that can be unhealthy before the service is considered unhealthy.

MaxPercentUnhealthyReplicasPerPartition
Default and recommended value is 0. Specify the maximum number of replicas in a partition that can be unhealthy before the partition is considered unhealthy.
The MaxPercentUnhealthyServices, MaxPercentUnhealthyPartitionsPerService, and MaxPercentUnhealthyReplicasPerPartition criteria can be specified per service type for an application instance. Setting these parameters per-service allows an application to contain different service types with different evaluation policies. For example, a stateless gateway service type can have a MaxPercentUnhealthyPartitionsPerService that is different from a stateful engine service type for a particular application instance.

Two further parameters control service availability and restarts during the upgrade:

UpgradeReplicaSetCheckTimeout
Stateless service: Within a single upgrade domain, Service Fabric tries to ensure that additional instances of the service are available. If the target instance count is more than one, Service Fabric waits for more than one instance to be available, up to a maximum time-out value. This time-out is specified by using the UpgradeReplicaSetCheckTimeout property. If the time-out expires, Service Fabric proceeds with the upgrade, regardless of the number of service instances. If the target instance count is one, Service Fabric does not wait, and immediately proceeds with the upgrade.
Stateful service: Within a single upgrade domain, Service Fabric tries to ensure that the replica set has a quorum. Service Fabric waits for a quorum to be available, up to a maximum time-out value (specified by the UpgradeReplicaSetCheckTimeout property). If the time-out expires, Service Fabric proceeds with the upgrade, regardless of quorum. This setting is set as never (infinite) when rolling forward, and 900 seconds when rolling back.

ForceRestart
If you update a configuration or data package without updating the service code, the service is restarted only if the ForceRestart property is set to true. When the update is complete, Service Fabric notifies the service that a new configuration package or data package is available. The service is responsible for applying the changes. If necessary, the service can restart itself.

Next steps
Upgrading your Application Using Visual Studio walks you through an application upgrade using Visual Studio.
Upgrading your Application Using Powershell walks you through an application upgrade using PowerShell.
Upgrading your application using Azure CLI on Linux walks you through an application upgrade using Azure CLI.
Upgrading your application using the Service Fabric Eclipse Plugin.
Control how your application upgrades by using Upgrade Parameters.
Make your application upgrades compatible by learning how to use Data Serialization.
Learn how to use advanced functionality while upgrading your application by referring to Advanced Topics.
Fix common problems in application upgrades by referring to the steps in Troubleshooting Application Upgrades.

Service Fabric application upgrade using PowerShell 3/3/2017 • 6 min to read • Edit Online
The most frequently used and recommended upgrade approach is the monitored rolling upgrade. Azure Service Fabric monitors the health of the application being upgraded based on a set of health policies. Once an update domain (UD) is upgraded, Service Fabric evaluates the application health and either proceeds to the next update domain or fails the upgrade, depending on the health policies. The application administrator can configure the health evaluation policy that Service Fabric uses to determine if the application is healthy. In addition, the administrator can configure the action to be taken when the health evaluation fails (for example, doing an automatic rollback).

A monitored application upgrade can be performed using the managed or native APIs, PowerShell, or REST. For instructions on performing an upgrade using Visual Studio, see Upgrading your application using Visual Studio.

This section walks through a monitored upgrade for one of the SDK samples that uses PowerShell. The following Microsoft Virtual Academy video also walks you through an app upgrade.

Step 1: Build and deploy the Visual Objects sample
Build and publish the application by right-clicking on the application project, VisualObjectsApplication, and selecting the Publish command. Alternatively, you can use PowerShell to deploy your application: after building the project in Visual Studio, use the Copy-ServiceFabricApplicationPackage command to copy the application package to the image store, register the application with the Service Fabric runtime using the Register-ServiceFabricApplicationType cmdlet, and then start an instance of the application by using the New-ServiceFabricApplication cmdlet. These three steps are analogous to using the Deploy menu item in Visual Studio. If you want to verify the app package locally, use the Test-ServiceFabricApplicationPackage cmdlet.

NOTE
Before any of the Service Fabric commands may be used in PowerShell, you first need to connect to the cluster by using the Connect-ServiceFabricCluster cmdlet. For this walkthrough, it is assumed that the cluster has already been set up on your local machine. See the article on setting up your Service Fabric development environment.

Once deployment is complete, you can use Service Fabric Explorer to view the cluster and the application. The application has a web service that can be navigated to in Internet Explorer by typing http://localhost:8081/visualobjects in the address bar. You should see some floating visual objects moving around on the screen. Additionally, you can use Get-ServiceFabricApplication to check the application status.
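If you prefer the PowerShell route for this first deployment, the equivalent commands look roughly like the following. This is a minimal sketch: the package path assumes the VisualObjects solution was packaged to the same obj\x64\Debug\Package location used later in Step 4, the image store folder name "VisualObjects_V1" is just a convention, and the application type name and version are taken from the sample's application manifest.

Connect-ServiceFabricCluster

# Copy the v1 package to the image store.
Copy-ServiceFabricApplicationPackage -ApplicationPackagePath .\Samples\Services\Stateful\VisualObjects\VisualObjects\obj\x64\Debug\Package -ImageStoreConnectionString fabric:ImageStore -ApplicationPackagePathInImageStore "VisualObjects_V1"

# Register the application type and create an instance.
Register-ServiceFabricApplicationType -ApplicationPathInImageStore "VisualObjects_V1"
New-ServiceFabricApplication fabric:/VisualObjects VisualObjects 1.0.0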
Step 2: Update the Visual Objects sample

You might notice that with the version that was deployed in Step 1, the visual objects do not rotate. Let's upgrade this application to one where the visual objects also rotate.

Select the VisualObjects.ActorService project within the VisualObjects solution, and open the StatefulVisualObjectActor.cs file. Within that file, navigate to the method MoveObject, comment out this.State.Move(), and uncomment this.State.Move(true). This change rotates the objects after the service is upgraded.

We also need to update the ServiceManifest.xml file (under PackageRoot) of the project VisualObjects.ActorService. Update the CodePackage and the service version to 2.0.0. You can use the Visual Studio Edit Manifest Files option after you right-click on the solution to make the manifest file changes. After the changes are made, the manifest should look like the following (highlighted portions show the changes):

<ServiceManifest Name="VisualObjects.ActorService" Version="2.0.0" xmlns="http://schemas.microsoft.com/2011/01/fabric" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <CodePackage Name="Code" Version="2.0.0" />

The ApplicationManifest.xml file (found under the VisualObjects project under the VisualObjects solution) is updated to version 2.0.0 (from 1.0.0), and the reference to the service manifest is updated to match:

<ServiceManifestRef ServiceManifestName="VisualObjects.ActorService" ServiceManifestVersion="2.0.0" />

Now the ApplicationManifest.xml should look like the following snippet:

<ApplicationManifest xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" ApplicationTypeName="VisualObjects" ApplicationTypeVersion="2.0.0" xmlns="http://schemas.microsoft.com/2011/01/fabric">

Now, build the project by selecting just the ActorService project, and then right-clicking and selecting the Build option in Visual Studio. If you select Rebuild all, you should update the versions for all projects, since the code would have changed. Next, let's package the updated application by right-clicking on VisualObjectsApplication, selecting the Service Fabric Menu, and choosing Package. This action creates an application package that can be deployed. Your updated application is ready to be deployed.

Step 3: Decide on health policies and upgrade parameters

Familiarize yourself with the application upgrade parameters and the upgrade process to get a good understanding of the various upgrade parameters, time-outs, and health criteria applied. For this walkthrough, the service health evaluation criterion is set to the default (and recommended) values, which means that all services and instances should be healthy after the upgrade.

However, let's increase the HealthCheckStableDuration to 60 seconds (so that the services are healthy for at least 60 seconds before the upgrade proceeds to the next update domain). Let's also set the UpgradeDomainTimeout to be 1200 seconds and the UpgradeTimeout to be 3000 seconds. Finally, let's set the UpgradeFailureAction to rollback. This option requires Service Fabric to roll back the application to the previous version if it encounters any issues during the upgrade. Thus, when starting the upgrade (in Step 5), the following parameters are specified:

FailureAction = Rollback
Step 5: Start the application upgrade Now.0.HealthCheckStableDurationSec = 60 UpgradeDomainTimeoutSec = 1200 UpgradeTimeout = 3000 Step 4: Prepare application for upgrade Now the application is built and ready to be upgraded. where the application package is stored. Refer to the troubleshooting section. it is likely that you need a rebuild of all services. or by using the Get- ServiceFabricApplicationUpgrade PowerShell command: Get-ServiceFabricApplicationUpgrade fabric:/VisualObjects In a few minutes. should state that all update domains were upgraded (completed).0 -HealthCheckStableDurationSec 60 -UpgradeDomainTimeoutSec 1200 -UpgradeTimeout 3000 -FailureAction Rollback -Monitored The application name is the same as it was described in the ApplicationManifest.0 of VisualObjects that's been deployed. we're all set to start the application upgrade by using the Start-ServiceFabricApplicationUpgrade command: Start-ServiceFabricApplicationUpgrade -ApplicationName fabric:/VisualObjects -ApplicationTypeVersion 2. which can be performed using the Register- ServiceFabricApplicationType command: Register-ServiceFabricApplicationType -ApplicationPathInImageStore "VisualObjects\_V2" If the preceding command doesn't succeed. If you open up a PowerShell window as an administrator and type Get-ServiceFabricApplication. You should find a "Package" folder in that directory. If you set the time-outs to be too short. We have put the updated application in "VisualObjects_V2" with the following command (you may need to modify paths again appropriately).xml file. Service Fabric uses this name to identify which application is getting upgraded.0. Copy-ServiceFabricApplicationPackage -ApplicationPackagePath . as the application upgrade proceeds. you may encounter a failure message that states the problem. the status that you got by using the preceding PowerShell command. And you should find that the visual objects in your browser window have . As mentioned in Step 2. Now.0.0. or increase the time- outs. it should let you know that it is application type 1. The parameter ApplicationPackagePathInImageStore informs Service Fabric where it can find the application package.\Samples\Services\Stateful\VisualObjects\VisualObjects\obj\x64\Debug\Package -ImageStoreConnectionString fabric:ImageStore -ApplicationPackagePathInImageStore "VisualObjects\_V2" The next step is to register this application with Service Fabric.Samples\Services\Stateful\VisualObjects\VisualObjects\obj\x64\Debug. the parameters need to be set appropriately. Next steps Upgrading your application using Visual Studio walks you through an application upgrade using Visual Studio. Make your application upgrades compatible by learning how to use data serialization. Moving from version 2 to version 1 is also considered an upgrade. Learn how to use advanced functionality while upgrading your application by referring to Advanced topics.started rotating! You can try upgrading from version 2 to version 3. When you are deploying to an Azure cluster. It is good to set the time-outs conservatively. Control how your application upgrades by using upgrade parameters. Play with time-outs and health policies to make yourself familiar with them. or from version 2 to version 1 as an exercise. . Fix common problems in application upgrades by referring to the steps in Troubleshooting application upgrades. Service Fabric application upgrades are Zero Downtime. 
Service Fabric application upgrade tutorial using Visual Studio 4/7/2017 • 3 min to read • Edit Online Azure Service Fabric simplifies the process of upgrading cloud applications by ensuring that only changed services are upgraded. It also automatically rolls back the application to the previous version upon encountering issues. . and that application health is monitored throughout the upgrade process. This tutorial covers how to complete a rolling upgrade from Visual Studio. build and publish the application by right- clicking on the application project. since the application can be upgraded with no downtime. Then. and selecting the Publish command in the Service Fabric menu item. Step 1: Build and publish the Visual Objects sample First. download the Visual Objects application from GitHub. VisualObjects. and you can set the Target profile to PublishProfiles\Local.Selecting Publish brings up a popup.xml. The window should look like the following before you click Publish. . ActorService project within the VisualObjects solution. and uncomment visualObject. After the changes are made. comment out visualObject. which builds the modified projects. Now you can build (not rebuild) the solution. Select the VisualObjects. We also need to version our application.Move(true) .Move(false) . You can use Service Fabric Explorer to view the cluster and the application.Now you can click Publish in the dialog box. along with the application to version 2. Let's upgrade this application to one where the visual objects also rotate. the manifest should look like the following (bold portions show the changes): . Selecting this option brings up the dialog box for edition versions as follows: Update the versions for the modified projects and their code packages. Step 2: Update the Visual Objects sample You might notice that with the version that was deployed in step 1.0. the visual objects do not rotate.0. To make the version changes after you right-click on the VisualObjects project. This code change rotates the objects after the service is upgraded. You should see 10 floating visual objects moving around on the screen. you can use the Visual Studio Edit Manifest Versions option. and open the VisualObjectActor. go to the method MoveObject . Within that file. If you select Rebuild all.cs file. The Visual Objects application has a web service that you can go to by typing http://localhost:8082/visualobjects/ in the address bar of your browser. you have to update the versions for all the projects. in which the objects rotate. time-outs. and health criterion that can be applied. Next steps Upgrading your application using PowerShell walks you through an application upgrade using PowerShell.0. Now we are all set to start the application upgrade by selecting Publish. Control how your application is upgraded by using upgrade parameters. or even from version 2. When deploying to an Azure cluster as opposed to a local cluster. . Make your application upgrades compatible by learning how to use data serialization. Access to the service can be checked through your client (browser).The Visual Studio tools can do automatic rollups of versions upon selecting Automatically update application and service versions. by using the Upgrades in Progress tab under the applications. This option upgrades your application to version 2. Now.0. In a few minutes. Step 3: Upgrade your application Familiarize yourself with the application upgrade parameters and the upgrade process to get a good understanding of the various upgrade parameters. 
Service Fabric upgrades one update domain at a time (some objects are updated first.0. all update domains should be upgraded (completed).0 to version 3.0 as an exercise. If you use SemVer. For this walkthrough. you need to update the code and/or configuration package version alone if that option is selected. and moving from version 2. Save the changes. and the Visual Studio output window should also state that the upgrade is completed. and now check the Upgrade the Application box.0 back to version 1. Learn how to use advanced functionality while upgrading your application by referring to Advanced topics. as the application upgrade proceeds. you can monitor it with Service Fabric Explorer. and the service remains accessible during the upgrade. We recommend that you set the time-outs conservatively. You can configure these settings by selecting Configure Upgrade Settings and then modifying the parameters as desired. And you should find that all the visual objects in your browser window are now rotating! You may want to try changing the versions.0. Play with time-outs and health policies to make yourself familiar with them.0.0. followed by others).0. the parameters used may have to differ. the service health evaluation criterion is set to the default (unmonitored mode). .Fix common problems in application upgrades by referring to the steps in Troubleshooting application upgrades. . OverallUpgradeTimeout . the application remained unhealthy according to the specified health policies and HealthCheckRetryTimeout expired. FailureReason identifies one of three potential high-level causes of the failure: 1. 3. This information is available when Service Fabric detects the failure regardless of whether the FailureAction is to roll back or suspend the upgrade.Indicates that a particular upgrade domain took too long to complete and UpgradeDomainTimeout expired. Troubleshoot application upgrades 3/3/2017 • 8 min to read • Edit Online This article covers some of the common issues around upgrading an Azure Service Fabric application and how to resolve them. 2. the output of the Get-ServiceFabricApplicationUpgrade command contains additional information for debugging the failure. Further information is displayed depending on the type of the failure.Indicates that the overall upgrade took too long to complete and UpgradeTimeout expired. Identify the failure type. The following list specifies how the additional information can be used: 1. 3. Identify the failure reason. UpgradeDomainTimeout . Isolate one or more failing components for further investigation. FailureTimestampUtc identifies the timestamp (in UTC) at which an upgrade failure was detected by Service Fabric and FailureAction was triggered. HealthCheck .Indicates that after upgrading an update domain. Troubleshoot a failed application upgrade When an upgrade fails. The output following this paragraph is typical of upgrades where service replicas or instances fail to start in the new code version. The UpgradeDomainProgressAtFailure field captures a snapshot of any pending upgrade work at the time of failure. These entries only show up in the output when the upgrade fails and starts rolling back. 2. Identify the failure type In the output of Get-ServiceFabricApplicationUpgrade. Investigate upgrade timeouts Upgrade timeout failures are most commonly caused by service availability issues. . "MYUD2" = "Completed". 
"MYUD3" = "Completed" } UpgradeKind : Rolling RollingUpgradeMode : UnmonitoredAuto ForceRestart : False UpgradeReplicaSetCheckTimeout : 00:00:00 In this example. The UnhealthyEvaluations field captures a snapshot of health checks that failed at the time of the upgrade according to the specified health policy. If the original upgrade was performed with a manual FailureAction.PartitionId: 4b43f4d8-b26b-424e-9307- 7a7a62e79750 UpgradeState : RollingBackCompleted UpgradeDuration : 00:00:46 CurrentUpgradeDomainDuration : 00:00:00 NextUpgradeDomain : UpgradeDomainsStatus : { "MYUD1" = "Completed". The UpgradePhase says PostUpgradeSafetyCheck. the upgrade failed at upgrade domain MYUD1 and two partitions (744c8d9f-1d26-417e-a60e- cd48f5c098f0 and 4b43f4d8-b26b-424e-9307-7a7a62e79750) were stuck. Investigate health check failures Health check failures can be triggered by various issues that can happen after all nodes in an upgrade domain finish upgrading and passing all safety checks. The most common issues are service errors in the open or promotion to primary code paths. The current UpgradeState is RollingBackCompleted. The output following this paragraph is typical of an upgrade failure due to failed health checks. which means that these safety checks are occurring after all nodes in the upgrade domain have finished upgrading. so the original upgrade must have been performed with a rollback FailureAction. All this information points to a potential issue with the new version of the application code. which automatically rolled back the upgrade upon failure. The Get-ServiceFabricNode command can be used to verify that these two nodes are in upgrade domain MYUD1.PartitionId: 744c8d9f-1d26-417e-a60e- cd48f5c098f0 NodeName : Node1 UpgradePhase : PostUpgradeSafetyCheck PendingSafetyChecks : WaitForPrimaryPlacement . then the upgrade would instead be in a suspended state to allow live debugging of the application. PS D:\temp> Get-ServiceFabricApplicationUpgrade fabric:/DemoApp ApplicationName : fabric:/DemoApp ApplicationTypeName : DemoAppType TargetApplicationTypeVersion : v2 ApplicationParameters : {} StartTimestampUtc : 4/14/2015 9:26:38 PM FailureTimestampUtc : 4/14/2015 9:27:05 PM FailureReason : UpgradeDomainTimeout UpgradeDomainProgressAtFailure : MYUD1 NodeName : Node4 UpgradePhase : PostUpgradeSafetyCheck PendingSafetyChecks : WaitForPrimaryPlacement . The most common issues in this case are service errors in the close or demotion from primary code paths. The partitions were stuck because the runtime was unable to place primary replicas (WaitForPrimaryPlacement) on target nodes Node1 and Node4. An UpgradePhase of PreUpgradeSafetyCheck means there were issues preparing the upgrade domain before it was performed. we can see that two services are unhealthy: fabric:/DemoApp/Svc3 and fabric:/DemoApp/Svc2. The upgrade was suspended upon failing by specifying a FailureAction of manual when starting the upgrade. "MYUD2" = "Pending". AggregatedHealthState='Error'.4775807 UpgradeTimeout : 10675199. Property='InjectedFault'. MaxPercentUnhealthyPartitionsPerService=0%.4775807 ConsiderWarningAsError : MaxPercentUnhealthyPartitionsPerService : MaxPercentUnhealthyReplicasPerPartition : MaxPercentUnhealthyServices : MaxPercentUnhealthyDeployedApplications : ServiceTypeHealthPolicyMap : Investigating health check failures first requires an understanding of the Service Fabric health model. Unhealthy partitions: 100% (1/1). ServiceType='PersistedServiceType'. Property='InjectedFault'. 
along with the error health reports ("InjectedFault" in this case). In this example. Unhealthy partitions: 100% (1/1). "MYUD3" = "Pending" } UnhealthyEvaluations : Unhealthy services: 50% (2/4). which is below the default target of 0% unhealthy (MaxPercentUnhealthyServices). Unhealthy service: ServiceName='fabric:/DemoApp/Svc2'. Error event: SourceId='Replica'. Unhealthy partition: PartitionId='3a9911f6-a2e5-452d-89a8- 09271e7e49a8'. UpgradeKind : Rolling RollingUpgradeMode : Monitored FailureAction : Manual ForceRestart : False UpgradeReplicaSetCheckTimeout : 49710. Unhealthy service: ServiceName='fabric:/DemoApp/Svc3'. PS D:\temp> Get-ServiceFabricApplicationUpgrade fabric:/DemoApp ApplicationName : fabric:/DemoApp ApplicationTypeName : DemoAppType TargetApplicationTypeVersion : v4 ApplicationParameters : {} StartTimestampUtc : 4/24/2015 2:42:31 AM UpgradeState : RollingForwardPending UpgradeDuration : 00:00:27 CurrentUpgradeDomainDuration : 00:00:27 NextUpgradeDomain : MYUD2 UpgradeDomainsStatus : { "MYUD1" = "Completed". AggregatedHealthState='Error'.02:48:05. Unhealthy partition: PartitionId='744c8d9f-1d26-417e-a60e- cd48f5c098f0'. MaxPercentUnhealthyPartitionsPerService=0%. MaxPercentUnhealthyServices=0%.06:28:15 HealthCheckWaitDuration : 00:00:00 HealthCheckStableDuration : 00:00:10 HealthCheckRetryTimeout : 00:00:10 UpgradeDomainTimeout : 10675199. But even without such an in-depth understanding. . AggregatedHealthState='Error'. AggregatedHealthState='Error'.02:48:05. Error event: SourceId='Replica'. This mode allows us to investigate the live system in the failed state before taking any further action. two out of four services are unhealthy. there is no recovery needed since the upgrade automatically rolls back upon failing. The Update-ServiceFabricApplicationUpgrade command can be used to resume the monitored upgrade with both safety and health checks being performed. and services) for health evaluation and always rounds up to whole entities. Resume the monitored upgrade The Start-ServiceFabricApplicationRollback command can be used at any time to start rolling back the application. partitions. replicas.`Math. the upgrade was resumed in Monitored mode. only safety checks are performed by the system. the rollback request has been registered in the system and starts shortly thereafter. PS D:\temp> Update-ServiceFabricApplicationUpgrade fabric:/DemoApp -UpgradeMode Monitored UpgradeMode : Monitored ForceRestart : UpgradeReplicaSetCheckTimeout : FailureAction : HealthCheckWaitDuration : HealthCheckStableDuration : HealthCheckRetryTimeout : UpgradeTimeout : UpgradeDomainTimeout : ConsiderWarningAsError : MaxPercentUnhealthyPartitionsPerService : MaxPercentUnhealthyReplicasPerPartition : MaxPercentUnhealthyServices : MaxPercentUnhealthyDeployedApplications : ServiceTypeHealthPolicyMap : PS D:\temp> The upgrade continues from the upgrade domain where it was last suspended and use the same upgrade parameters and health policies as before. Once the command returns successfully. Possible Cause 2: . any of the upgrade parameters and health policies shown in the preceding output can be changed in the same command when the upgrade resumes. then Service Fabric allows up to two unhealthy replicas (that is. if the maximum MaxPercentUnhealthyReplicasPerPartition is 21% and there are five replicas.21)). Thus. there are several recovery options: 1. 
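To dig into the unhealthy evaluations shown in the output above, you can walk down the health hierarchy with the health query cmdlets. The service and partition identifiers below are taken from the example output; treat this as a sketch rather than a required procedure.

# Aggregated health and unhealthy evaluations for the application
Get-ServiceFabricApplicationHealth -ApplicationName fabric:/DemoApp

# Health of one of the services reported as unhealthy
Get-ServiceFabricServiceHealth -ServiceName fabric:/DemoApp/Svc3

# Health events for the unhealthy partition named in the output
Get-ServiceFabricPartitionHealth -PartitionId 3a9911f6-a2e5-452d-89a8-09271e7e49a8

# Replicas of that partition, to see which replica raised the error report
Get-ServiceFabricReplica -PartitionId 3a9911f6-a2e5-452d-89a8-09271e7e49a8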
Further troubleshooting Service Fabric is not following the specified health policies Possible Cause 1: Service Fabric translates all percentages into actual numbers of entities (for example. This command can only be used when the UpgradeState shows RollingForwardPending. In this mode. No more health checks are performed. with the parameters and the health policies unchanged. With a manual FailureAction. For example. Proceed through the remainder of the upgrade manually 3. The Resume-ServiceFabricApplicationUpgrade command can be used to proceed through the remainder of the upgrade manually.Ceiling (5*0.Recover from a suspended upgrade With a rollback FailureAction. which means that the current upgrade domain has finished upgrading but the next one has not started (pending). health policies should be set accordingly. trigger a rollback 2. In this example. If needed. one upgrade domain at a time. application health policies specified for in version 1.0. and C need to be healthy. then the upgrade eventually times out on UpgradeDomainTimeout. but the upgrade still fails for some time -outs that I never specified When health policies aren't provided to the upgrade request. My upgrades are taking too long The time for an upgrade to complete depends on the health checks and time-outs specified. I did not specify a health policy for application upgrade. they are taken from the ApplicationManifest. different unhealthy percentage thresholds can be applied to different services. it might result in unanticipated errors due to C being unexpectedly unhealthy instead of D. UpgradeDomainTimeout starts counting down once the upgrade for the current upgrade domain begins. Since health policies are specified per service type.Health policies are specified in terms of percentages of total services and not specific service instances. Once the upgrade is complete. Make your application upgrades compatible by learning how to use Data Serialization. deploy. Next steps Upgrading your Application Using Visual Studio walks you through an application upgrade using Visual Studio. if an application has four service instances A. For example. D should be modeled as a different service type from A. We want to ignore the known unhealthy service D during upgrade and set the parameter MaxPercentUnhealthyServices to be 25%. if you're upgrading Application X from version 1. B.xml of the current application version. then the policy needs to be specified as part of the application upgrade API call. Incorrect time -outs are specified You may have wondered about what happens when time-outs are set inconsistently. Learn how to use advanced functionality while upgrading your application by referring to Advanced Topics. Health checks and time-outs depend on how long it takes to copy. However. The upgrade time for an upgrade domain is limited by UpgradeDomainTimeout. before an upgrade. If HealthCheckRetryTimeout and HealthCheckStableDuration are both non-zero and the health of the application keeps switching back and forth. For example. you may have an UpgradeTimeout that's less than the UpgradeDomainTimeout. and D. Upgrade failure cannot occur faster than HealthCheckWaitDuration + HealthCheckRetryTimeout. Being too aggressive with time-outs might mean more failed upgrades. and C. B. D may become healthy while C becomes unhealthy. Here's a quick refresher on how the time-outs interact with the upgrade times: Upgrades for an upgrade domain cannot complete faster than HealthCheckWaitDuration + HealthCheckStableDuration. 
The policies specified as part of the API call only apply during the upgrade. If a different health policy should be used for the upgrade. The answer is that an error is returned. the policies specified in the ApplicationManifest. so we recommend starting conservatively with longer time-outs.xml are used. assuming only A. where service D is unhealthy but with little impact to the application. Errors are returned if the UpgradeDomainTimeout is less than the sum of HealthCheckWaitDuration and HealthCheckRetryTimeout. B. or if UpgradeDomainTimeout is less than the sum of HealthCheckWaitDuration and HealthCheckStableDuration. during the upgrade. In this situation. C. Upgrading your Application Using Powershell walks you through an application upgrade using PowerShell. For example. The upgrade would still succeed because only 25% of the services are unhealthy.0 are used. .0 to version 2. and stabilize the application. However. Control how your application upgrades by using Upgrade Parameters. .Fix common problems in application upgrades by referring to the steps in Troubleshooting Application Upgrades. How data serialization affects an application upgrade 2/13/2017 • 3 min to read • Edit Online In a rolling application upgrade. and the old version of your application must be able to read the new version of your data. During this process. For example. 2. Since replicas may be placed in different upgrade domains. For applications that use Reliable Actors. changes to the classes may cause a data format change. For applications that use Reliable Collections. the new serializer will load the data that was persisted to disk by the old version. data may be lost or corrupted. Care must be taken to ensure that a rolling upgrade can handle the data format change. that is the backing state for the actor. the data that is persisted and replicated comes from your C# classes. there are two main scenarios where the serializer may encounter an older or newer version of your data: 1. Code changes that result in a data format change Since the data format is determined by C# classes. Therefore. the upgrade may fail. the new version of your application must be able to read the old version of your data. even if the data is in an older or newer version. The Data Contract serializer is the serializer that we recommend for Service Fabric applications. the new and/or old version of your data may be encountered by the new and/or old version of your serializer. but Reliable Actors currently do not. During the rollout. as well as how they are serialized. The default serializer is the Data Contract serializer. This article discusses what constitutes your data format and offers best practices for ensuring that your data is forward and backward compatible. After a node is upgraded and starts back up. the data format is defined by the fields and properties that are serialized. How the data format affects a rolling upgrade During a rolling upgrade. The data serializer plays an important role in enabling rolling upgrades. These C# classes must be serializable to be persisted and replicated. or worse. in an IReliableDictionary<int. Reliable Collections allow the serializer to be overridden. If the data format is not forward and backward compatible. the cluster will contain a mix of the old and new versions of your code. 
Examples that may cause data format changes: Adding or removing fields or properties Renaming fields or properties Changing the types of fields or properties Changing the class name or namespace Data Contract as the default serializer The serializer is generally responsible for reading the data and deserializing it into the current version. and replicas send data to each other. During the rolling upgrade. What makes up your data format? In Azure Service Fabric. which has well-defined versioning rules. and some upgrade domains will be on the older version of your application. the upgrade is applied to a subset of nodes. some upgrade domains will be on the newer version of your application. one upgrade domain at a time. that is the objects in the reliable dictionaries and queues. . MyClass> the data is a serialized int and a serialized MyClass . It has well-defined versioning rules for adding. The "new serializer" refers to the serializer code that is executing in the new version of your application. Data may be lost if. The two versions of code and data format must be both forward and backward compatible. For more information. It also has support for dealing with unknown fields. a new property was added but the old serializer discards it during deserialization. Fix common problems in application upgrades by referring to the steps in Troubleshooting Application Upgrades . Next steps Uprading your Application Using Visual Studio walks you through an application upgrade using Visual Studio. Uprading your Application Using Powershell walks you through an application upgrade using PowerShell. Learn how to use advanced functionality while upgrading your application by referring to Advanced Topics. The rolling upgrade may fail because the code or serializer may throw exceptions or a fault when it encounters the other version. and changing fields. If they are not compatible. and dealing with class inheritance. Data Contract is the recommended solution for ensuring that your data is compatible. removing. the rolling upgrade may fail or data may be lost. . The "new data" refers to the serialized C# class from the new version of your application. Control how your application upgrades by using Upgrade Parameters. for example. NOTE The "new version" and "old version" here refer to the version of your code that is running. see Using Data Contract. hooking into the serialization and deserialization process. Change to manual upgrade mode Manual--Stop the application upgrade at the current UD and change the upgrade mode to Unmonitored Manual. Service Fabric ensures that the application is healthy before the upgrade proceeds. Azure Service Fabric provides multiple upgrade modes to support development and production clusters. When the upgrade policy is specified. Deployment options chosen may be different for different environments. self-contained application package. the new service is added to the deployed application. Services can also be removed from an application as part of an upgrade. Upgrade with a diff package A Service Fabric application can be upgraded by provisioning with a full. the automated rolling application upgrade is useful for development or testing environments to provide a fast iteration cycle during service development. the application is already in data loss). The administrator needs to manually call MoveNextApplicationUpgradeDomainAsync to proceed with the upgrade or trigger a rollback by initiating a new upgrade. However. 
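As a sketch of what a forward- and backward-compatible Data Contract type can look like, the class below is hypothetical (it is not part of any sample): version 2 adds an optional member, and IExtensibleDataObject preserves members that the current code version does not recognize.

using System.Runtime.Serialization;

[DataContract]
public class MyClass : IExtensibleDataObject
{
    [DataMember]
    public int Id { get; set; }

    // Added in version 2 of the application. Because the member is optional
    // (IsRequired = false), data written by version 1, which lacks this field,
    // still deserializes correctly in version 2.
    [DataMember(IsRequired = false)]
    public string DisplayName { get; set; }

    // Round-trips members that this version of the code does not know about,
    // so version 1 code does not silently drop version 2 fields during a
    // rolling upgrade.
    public ExtensionDataObject ExtensionData { get; set; }
}

Removing or renaming existing [DataMember] fields, by contrast, is exactly the kind of change the list above warns about.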
The monitored rolling application upgrade is the most typical upgrade to use in the production environment. or a non-conventional upgrade is happening (for example. the updated application manifest. Service Fabric application upgrade: advanced topics 3/3/2017 • 4 min to read • Edit Online Adding or removing services during an application upgrade If a new service is added to an application that is already deployed. it stays in the Manual mode until a new upgrade is initiated. all current instances of the to- be-deleted service must be stopped before proceeding with the upgrade (using the Remove-ServiceFabricService cmdlet). and the service manifest files. The application administrator can use the manual rolling application upgrade mode to have total control over the upgrade progress through the various upgrade domains. Finally. Once the upgrade enters into the Manual mode. A full application package contains all the files necessary to start and run a Service Fabric application. Such an upgrade does not affect any of the services that were already part of the application. an instance of the service that was added must be started for the new service to be active (using the New-ServiceFabricService cmdlet). and published as an upgrade. However. An application can also be upgraded by using a diff package that contains only the updated application files. Manual upgrade mode NOTE The unmonitored manual mode should be considered only for a failed or suspended upgrade. The monitored mode is the recommended upgrade mode for Service Fabric applications. A diff . This mode is useful when a customized or complex health evaluation policy is required. The GetApplicationUpgradeProgressAsync command returns FABRIC_APPLICATION_UPGRADE_STATE_ROLLING_FORWARD_PENDING. Using a diff package.0.new version code 2. Any reference in the application manifest or service manifest that can't be found in the build layout is searched for in the image store.0.0. Occasions when using a diff package would be a good choice: A diff package is preferred when you have a large application package that references several service manifest files and/or several code packages.0. Now.0.0. even though the code hasn't changed.0 Now. A diff package is preferred when you have a deployment system that generates the build layout directly from your application build process.0. newly built assemblies get a different checksum.0 service1 1. config packages. or data packages.0. and the service manifest for service1 to reflect the code package update. Using a full application package would require you to update the version on all code packages. and the service manifests must be updated. Full application packages are required for the first installation of an application to the cluster.0. the diff package is published automatically.0 code 1.package contains only the files that changed between the last provision and the current upgrade. For example. .0. To create a diff package manually.new version config 1. When an application is upgraded using Visual Studio. In this case. you update the application manifest to 2. let's start with the following application (version numbers provided for ease of understanding): app1 1. Control how your application upgrades by using Upgrade Parameters.0 service2 1. you only provide the files that changed and the manifest files where the version has changed. plus the full application manifest and the service manifest files.0.0 code 1.0 service2 1.0 config 1.new version service1 2. 
Upgrading your Application Using Powershell walks you through an application upgrade using PowerShell. Subsequent updates can be either a full application package or a diff package. The folder for your application package would have the following structure: app1/ service1/ code/ Next steps Upgrading your Application Using Visual Studio walks you through an application upgrade using Visual Studio. the application manifest.0 config 1.0 <-.0 In this case. let's assume you wanted to update only the code package of service1 using a diff package using PowerShell. but only the changed packages should be included in the final application package.0.0.0 config 1.0. your updated application has the following folder structure: app1 2.0.0 code 1.0.0 <-.0 <-. Fix common problems in application upgrades by referring to the steps in Troubleshooting Application Upgrades.Make your application upgrades compatible by learning how to use Data Serialization. . Otherwise. (For example. and it is available when the cluster is up and running.) The health hierarchy captures the interactions of the system entities. see Service Fabric application model. The model allows near-real-time monitoring of the state of the cluster and the services running in it. Service Fabric components use this rich health model to report their current state. Health store The health store keeps health-related information about entities in the cluster for easy retrieval and evaluation. health application entity matches an application instance deployed in the cluster. The entities and hierarchy are automatically built by the health store based on reports received from Service Fabric components. For more on application. while health node entity matches a Service Fabric cluster node. you can detect and fix issues for your running application much more easily. flexible. Introduction to Service Fabric health monitoring 4/17/2017 • 20 min to read • Edit Online Azure Service Fabric introduces a health model that provides rich. see this article. services send reports based on their local views. . You can easily obtain health information and correct potential issues before they cascade and cause massive outages. the upgrade is either automatically rolled back or paused to give administrators a chance to fix the issues. and that information is aggregated to provide an overall cluster-level view. The following Microsoft Virtual Academy video also describes the Service Fabric health model and how it's used: NOTE We started the health subsystem to address a need for monitored upgrades. The health entities mirror the Service Fabric entities. You can learn about key Service Fabric concepts in Service Fabric technical overview. If you invest in high-quality health reporting that captures your custom conditions. Health entities and hierarchy The health entities are organized in a logical hierarchy that captures interactions and dependencies among different entities. In the typical model. no downtime and minimal to no user intervention. The health store is part of the fabric:/System application. Service Fabric provides monitored application and cluster upgrades that ensure full availability. the upgrade checks health based on configured upgrade policies and allows an upgrade to proceed only when health respects desired thresholds. To learn more about application upgrades. You can use the same mechanism to report health from your applications. and extensible health evaluation and reporting. and it is the basis for advanced health evaluation. 
To achieve these goals. It is implemented as a Service Fabric persisted stateful service to ensure high availability and scalability. Examples include the brain of the cluster splitting due to network partitioning or communication issues. Service health reports describe conditions that affect the overall health of the service. organized in a hierarchy based on parent-child relationships. The health model provides an accurate. They typically affect all the deployed entities running on it. The node entity is identified by the node name (string). Application. Represents the health of a stateful service replica or a stateless service instance. The service entity is identified by the service name (URI). Service. such as memory. Represents the health of a Service Fabric node. Node health reports describe conditions that affect the node functionality. Represents the health of a Service Fabric cluster. The health entities. Node. and they can't be narrowed down to a partition or a replica. a stateless instance can report when it is running out of resources or has connectivity issues. Examples include when a node is out of disk space (or another machine-wide property. Cluster health reports describe conditions that affect the entire cluster and can't be narrowed down to one or more unhealthy children. The partition entity is identified by the partition ID (GUID). Partition. connections) and when a node is down. . For stateful services. The smallest unit that watchdogs and system components can report on for an application. They can't be narrowed down to individual children (services or deployed applications). and monitored. Also. Examples include when the number of replicas is below target count and when a partition is in quorum loss. debugged. Represents the health of a service partition. Replica. The application entity is identified by the application name (URI). Examples include the end-to-end interaction among different services in the application. Represents the health of an application instance running in the cluster. The replica entity is identified by the partition ID (GUID) and the replica or instance ID (long).The health entities and hierarchy allow the cluster and applications to be effectively reported. granular representation of the health of the many moving pieces in the cluster. Examples include a service configuration (such as port or external file share) that is causing issues for all partitions. Application health reports describe conditions that affect the overall health of the application. Partition health reports describe conditions that affect the entire replica set. Represents the health of a service running in the cluster. examples include a primary replica reporting when it can't replicate operations to secondaries and when replication is not proceeding at the expected pace. The health entities are: Cluster. Plan to invest in how to report and respond to health during the design of a large cloud service. The deployed service package is identified by application name (URI). the warning condition may fix itself without any special intervention. The data automatically surfaces through the hierarchy. however. The health hierarchy is composed of parent-child relationships. The entity is healthy. If another system component returns an entity that has not reached or has been cleaned up from the health store. and service manifest name (string). The entity is unhealthy. The entity experiences some issues. 
If a node is unhealthy as reported by its authority system component (Failover Manager service). the merged result has unknown health state. Examples include a code package in the service package that cannot be started and a configuration package that cannot be read. but it is not yet unhealthy (for example. Warning. Reporting at that level is not ideal. Health states Service Fabric uses three health states to describe whether an entity is healthy or not: OK. Services have partitions. node name (string). Deployed applications have deployed service packages. DeployedServicePackage. because the issue might not be affecting all the services within that application. service packages. In other cases. The health evaluation result is one of these states. it affects the deployed applications. Applications have services and deployed applications. The possible health states are: OK. DeployedApplication. get application list query goes to ClusterManager and HealthManager. A cluster is composed of nodes and applications. This aggregation helps to pinpoint and resolve the root cause of the issue more quickly. if a service is not responding. because it can't function properly. Health policies . The deployed application is identified by application name (URI) and node name (string). For example. and operate. it is feasible to report that the application instance is unhealthy. get node list query goes to FailoverManager and HealthManager. Represents the health of a service package running on a node in the cluster. and error. the warning condition may degrade into a severe problem without user intervention. Error. The health hierarchy represents the latest state of the system based on the latest health reports. if more information points to that partition. and replicas deployed on it. In some cases. There is a special relationship between nodes and deployed entities. to make the service easier to debug. and each partition has one or more replicas. The report should be applied to the unhealthy service or to a specific child partition. and an unhealthy partition is made visible at service and application levels. which is almost real-time information. Unknown. Any report sent to the health store must specify one of these states. The granularity of the health model makes it easy to detect and correct issues. This result can be obtained from the distributed queries that merge results from multiple components. monitor. Represents the health of an application running on a node. Examples include when the application package can't be downloaded on that node and when there is an issue setting up application security principals on the node. For example. warning. Internal and external watchdogs can report on the same entities based on application- specific logic or custom monitored conditions. User reports coexist with the system reports. It describes conditions specific to a service package that do not affect the other service packages on the same node for the same application. Action should be taken to fix the state of the entity. The entity doesn't exist in the health store. There are no known issues reported on it or its children (when applicable). and it is useful to provide visibility into what is going on. but they do not cause any functional issues yet). Deployed application health reports describe conditions specific to the application on the node that can't be narrowed down to service packages deployed on the same node. These queries merge results from multiple system components. there are delays. 
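Each entity in the hierarchy can also be queried directly from PowerShell. The following sketch maps the health query cmdlets to the entities; the application, node, and service manifest names are illustrative placeholders.

# Cluster, node, and application entities
Get-ServiceFabricClusterHealth
Get-ServiceFabricNodeHealth -NodeName "_Node_0"
Get-ServiceFabricApplicationHealth -ApplicationName fabric:/WordCount

# Service, partition, and replica entities
Get-ServiceFabricServiceHealth -ServiceName fabric:/WordCount/WordCountService
Get-ServiceFabricPartitionHealth -PartitionId <partition-guid>
Get-ServiceFabricReplicaHealth -PartitionId <partition-guid> -ReplicaOrInstanceId <replica-or-instance-id>

# Deployed application and deployed service package entities (per node)
Get-ServiceFabricDeployedApplicationHealth -ApplicationName fabric:/WordCount -NodeName "_Node_0"
Get-ServiceFabricDeployedServicePackageHealth -ApplicationName fabric:/WordCount -ServiceManifestName "WordCountServicePkg" -NodeName "_Node_0"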
The following example is an excerpt from a cluster manifest. Default: false. Specifies whether to treat warning health reports as errors during health evaluation. the default policy (zero tolerated failures) is used. prefix the parameter name with "ApplicationTypeMaxPercentUnhealthyApplications-". followed by the application type name. For example. A warning health state does not impact cluster upgrade or other monitoring triggered by Error health state. the parent is considered unhealthy. but below the global unhealthy percentage. Instead. By default. If even one of the children has one unhealthy event. they are evaluated against the percentages associated with their application type name in the map. This way. Specifies the maximum tolerated percentage of nodes that can be unhealthy before the cluster is considered in error. The application type health policy map can be used during cluster health evaluation to describe special application types. To define entries in the application type map. The control applications should never be in error. in a cluster there are thousands of applications of different types. the cluster would be evaluated to Warning. They are evaluated based on the total number of applications of the application type. You can specify global MaxPercentUnhealthyApplications to 20% to tolerate some failures. which triggers roll back or pauses the cluster upgrade. which is used only for that evaluation.The health store applies health policies to determine whether an entity is healthy based on its reports and its children. The policy can be defined in the cluster manifest. Health evaluation requests can also pass in custom health evaluation policies. using the specific MaxPercentUnhealthyApplications from the map. depending on the upgrade configuration. but for the application type "ControlApplicationType" set the MaxPercentUnhealthyApplications to 0. all application instances are taken out of the global pool of applications. they can be taken out of the global pool. The cluster health policy contains: ConsiderWarningAsError. MaxPercentUnhealthyApplications. so this percentage should be configured to tolerate that. some nodes are always down or out for repairs. By default. But even one control application in error would make cluster unhealthy. Specifies the maximum tolerated percentage of applications that can be unhealthy before the cluster is considered in error. NOTE Health policies can be specified in the cluster manifest (for cluster and node health evaluation) or in the application manifest (for application evaluation and any of its children). In large clusters. All the rest of the applications remain in the global pool and are evaluated with MaxPercentUnhealthyApplications. Cluster health policy The cluster health policy is used to evaluate the cluster health state and node health states. Service Fabric applies strict rules (everything must be healthy) for the parent-child hierarchical relationship. all applications are put into a pool and evaluated with MaxPercentUnhealthyApplications. If some application types should be treated differently. and a few control application instances of a special application type. ApplicationTypeHealthPolicyMap. For the application types defined in the map. 
<FabricSettings> <Section Name="HealthManager/ClusterHealthPolicy"> <Parameter Name="ConsiderWarningAsError" Value="False" /> <Parameter Name="MaxPercentUnhealthyApplications" Value="20" /> <Parameter Name="MaxPercentUnhealthyNodes" Value="20" /> <Parameter Name="ApplicationTypeMaxPercentUnhealthyApplications-ControlApplicationType" Value="0" /> </Section> </FabricSettings> . if some of the many applications are unhealthy. If it is not present. MaxPercentUnhealthyNodes. The . Specifies the maximum tolerated percentage of unhealthy services before the application is considered unhealthy. MaxPercentUnhealthyDeployedApplications. Default percentage: zero. which replaces the default health policy for all service types in the application. DefaultServiceTypeHealthPolicy. MaxPercentUnhealthyServices. Specifies the maximum tolerated percentage of unhealthy partitions before a service is considered unhealthy. When you specify policy per service type. you can gain more granular control of the health of the service. in the application package. Default percentage: zero. the health store aggregates all health reports on the entity and evaluates all its children (when applicable). ApplicationManifest. Default percentage: zero. The policy contains: MaxPercentUnhealthyPartitionsPerService. For example. If no policies are specified. Provides a map of service health policies per service type. Specifies the maximum tolerated percentage of deployed applications that can be unhealthy before the application is considered in error. These policies replace the default service type health policies for each specified service type. Specifies the default service type health policy. The following example is an excerpt from an application manifest: <Policies> <HealthPolicy ConsiderWarningAsError="true" MaxPercentUnhealthyDeployedApplications="20"> <DefaultServiceTypeHealthPolicy MaxPercentUnhealthyServices="0" MaxPercentUnhealthyPartitionsPerService="10" MaxPercentUnhealthyReplicasPerPartition="0"/> <ServiceTypeHealthPolicy ServiceTypeName="FrontEndServiceType" MaxPercentUnhealthyServices="0" MaxPercentUnhealthyPartitionsPerService="20" MaxPercentUnhealthyReplicasPerPartition="0"/> <ServiceTypeHealthPolicy ServiceTypeName="BackEndServiceType" MaxPercentUnhealthyServices="20" MaxPercentUnhealthyPartitionsPerService="0" MaxPercentUnhealthyReplicasPerPartition="0"> </ServiceTypeHealthPolicy> </HealthPolicy> </Policies> Health evaluation Users and automated services can evaluate health for any entity at any time. Specifies the maximum tolerated percentage of unhealthy replicas before a partition is considered unhealthy. This percentage is calculated by dividing the number of unhealthy deployed applications over the number of nodes that the applications are currently deployed on in the cluster. if an application has a stateless gateway service type and a stateful engine service type. Default: false. Default percentage: zero. MaxPercentUnhealthyReplicasPerPartition. you can configure the health policies for their evaluation differently. It can be defined in the application manifest.Application health policy The application health policy describes how the evaluation of events and child-states aggregation is done for applications and their children. ServiceTypeHealthPolicyMap. The configurable policies are: ConsiderWarningAsError. The computation rounds up to tolerate one failure on small numbers of nodes.xml. 
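The same knobs can also be supplied at evaluation time when you query cluster health, in which case they are used for that evaluation instead of the configured policy. A sketch that mirrors the values in the manifest excerpt above (parameter availability may vary slightly across SDK versions):

# Evaluate cluster health with an ad-hoc cluster health policy
Get-ServiceFabricClusterHealth -ConsiderWarningAsError:$false -MaxPercentUnhealthyApplications 20 -MaxPercentUnhealthyNodes 20
# A per-application-type map (like ControlApplicationType above) can be supplied
# the same way where your SDK version exposes it.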
Service type health policy The service type health policy specifies how to evaluate and aggregate the services and the children of services. Specifies whether to treat warning health reports as errors during health evaluation. Service Fabric assumes that the entity is unhealthy if it has a health report or a child at the warning or error health state. To evaluate an entity's health. in particular the ConsiderWarningAsError member of application or cluster health policy. regardless of its health state.health aggregation algorithm uses health policies that specify how to evaluate health reports and how to aggregate child health states (when applicable). the aggregated health state is either warning or error. If there are no error reports and one or more warnings. The aggregation uses the associated health policies. The same is true for an expired health report. A health entity that has one or more error health reports is evaluated as Error. Health report aggregation with warning report and ConsiderWarningAsError set to false (default). Child health aggregation The aggregated health state of an entity reflects the child health states (when applicable). The aggregated health state is triggered by the worst health reports on the entity. ConsiderWarningAsError specifies how to evaluate warnings. . the aggregated health state is an error. Health report aggregation One entity can have multiple health reports sent by different reporters (system components or watchdogs) on different properties. depending on the ConsiderWarningAsError policy flag. If there is at least one error health report. The algorithm for aggregating child health states uses the health policies applicable based on the entity type. Represents the name of the application instance deployed in the cluster. and internal/external watchdogs can report against Service Fabric entities. They don't need to look at any global state or aggregate data. If children have both OK and warning states. Service. and not complex organisms that need to look at many things to infer what information to send. If there are children with error states that do not respect the maximum allowed percentage of unhealthy children. Application name (URI). If the children with error states respect the maximum allowed percentage of unhealthy children. System Fabric applications. Node name (string). A string that uniquely identifies the reporter of the health event. To send health data to the health store. Represents the name of the service instance deployed in the cluster. The report can then be sent through the API by using FabricClient. Health reports The health reports for each of the entities in the cluster contain the following information: SourceId.Child aggregation based on health policies. None. The desired behavior is to have simple reporters. the aggregated health state is warning. or through REST. This percentage is taken from the policy based on the entity and child type. a reporter needs to identify the affected entity and create a health report. After the health store has evaluated all the children. it aggregates their health states based on the configured maximum percentage of unhealthy children. Service name (URI). the child aggregated health state is warning.ReportHealth. If all children have OK states. the child aggregated health state is OK. through PowerShell. The reporters make local determinations of the health of the monitored entities. . Application. It differs based on the entity type: Cluster. Node.HealthClient. 
Identifies the entity where the report is applied. the aggregated health state is an error. Entity identifier. based on the conditions they are monitoring. Health reporting System components. the sequence number is generated automatically. and property. In the health store. By default. reporter A can report the health of the Node01 "storage" property and reporter B can report the health of the Node01 "connectivity" property. Application name (URI) and node name (string). Multiple reports for the same source and property override each other. Used when the report is valid for a specified period of time only. Property. A Boolean. It is necessary to put in the sequence number only when reporting on state transitions. If it doesn't. A timespan that indicates how long the health report is valid. a watchdog is changed and stops sending reports with previous source and property). The added metadata contains: SourceUtcTimestamp. For the same entity. A string that allows a reporter to provide detailed information about the health event. the source needs to remember which reports it sent and keep the information for recovery on failover. The description adds human- readable information about the report. A positive integer that needs to be ever-increasing. The replacement is based on sequence numbers. and service manifest name (string). HealthState. it represents the order of the reports. A report is rejected if the sequence number is less than or equal to the most recently applied number for the same entity. For example. and Error. and HealthState--are required for every health report. the expired health report is automatically removed from the health store. SequenceNumber. If set to true. A string (not a fixed enumeration) that allows the reporter to categorize the health event for a specific property of the entity. and HealthState should fully describe the report. the expired report is treated as an error on the health evaluation. Description. and the report doesn't impact entity health evaluation. SourceId. Represents the partition unique identifier. The time the report was last modified on the server side (Coordinated Universal Time). Coupled with RemoveWhenExpired. The time the report was given to the health client (Coordinated Universal Time). It's also used to delete reports from the health store (for example. Partition ID (GUID). Partition. there is only one report for the same source and property. These four pieces of information--SourceId. RemoveWhenExpired. and the report is valid forever. newer reports (with higher sequence numbers) replace older reports. it lets the health store know how to evaluate expired events. LastModifiedUtcTimestamp. The stateful service replica ID or the stateless service instance ID (INT64). and the reporter doesn't need to explicitly clear it out. .". which contain all the information from the reports. and additional metadata. The accepted values are OK. source. In this situation. the health store keeps health events. Health events Internally. these reports are treated as separate health events for the Node01 entity. If it is not specified. then there must be something wrong with the watchdog. node name (string). entity identifier. either on the health client side (if they are batched) or on the health store side. Property. The false value signals to the health store that the source should report periodically on this property. TimeToLive. The SourceId string is not allowed to start with the prefix "System. Application name (URI). Replica. 
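As a sketch of sending a report through the API, a watchdog process could report against an application entity as follows. The source, property, and description strings are illustrative.

using System;
using System.Fabric;
using System.Fabric.Health;

var fabricClient = new FabricClient();

// SourceId, Property, and HealthState identify and describe the monitored condition.
var healthInformation = new HealthInformation("MyWatchdog", "Availability", HealthState.Warning)
{
    Description = "Response times are above the expected threshold."
};

// Wrap the information in a report that targets the application entity, then send it.
var report = new ApplicationHealthReport(new Uri("fabric:/WordCount"), healthInformation);
fabricClient.HealthManager.ReportHealth(report);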
The health events are returned by health queries. DeployedApplication. DeployedServicePackage. An enumeration that describes the health state of the report. It is used by the health store to detect stale reports that are received late because of network delays or other issues. Warning. the value is infinite. It can send a report with a brief TimeToLive along with RemoveWhenExpired to clear up any previous state from the health store. Property. The watchdog's health is captured by considering the event as an error. The metadata includes the time the report was given to the health client and the time it was modified on the server side. which is reserved for system reports. If the value is set to false. The text makes it easier for administrators and users to understand the health report. Otherwise. Checking the condition for a period of time avoids alerts on temporary conditions. it can be ignored because it was already signaled previously. An event can be expired only if RemoveWhenExpired is false. For example. Example: Report and evaluate application health The following example sends a health report through PowerShell on the application fabric:/WordCount from the source MyWatchdog. A flag to indicate whether the report was expired when the query was executed by the health store. which returns aggregated health state errors and the reported health events in the list of health events. not OK). If a report was already at error before the specified time. LastOkTransitionAt. The health report contains information about the health property "availability" in an error health state. with infinite TimeToLive.LastOkTransitionTime > 5 minutes). These fields give the history of the health state transitions for the event. Alert only on conditions that have changed in the last X minutes. an alert if the health state has been warning for more than five minutes can be translated into (HealthState == Warning and Now - LastWarningTransitionTime > 5 minutes). determine how long it has been unhealthy (that is. If a property is toggling between warning and error. the event is not returned by query and is removed from the store. They enable scenarios such as: Alert when a property has been at warning/error for more than X minutes. For example. IsExpired. LastWarningTransitionAt. an alert if the property hasn't been healthy for more than five minutes can be translated into (HealthState != Ok and Now . LastErrorTransitionAt. The state transition fields can be used for smarter alerts or "historical" health event information. . Then it queries the application health. The last time for OK/warning/error transitions. CM Property : State HealthState : Ok SequenceNumber : 360 SentAt : 3/22/2016 7:56:53 PM ReceivedAt : 3/22/2016 7:56:53 PM TTL : Infinite Description : Application has been created. ServiceHealthStates : ServiceName : fabric:/WordCount/WordCountService AggregatedHealthState : Error ServiceName : fabric:/WordCount/WordCountWebService AggregatedHealthState : Ok DeployedApplicationHealthStates : ApplicationName : fabric:/WordCount NodeName : _Node_0 AggregatedHealthState : Ok ApplicationName : fabric:/WordCount NodeName : _Node_2 AggregatedHealthState : Ok ApplicationName : fabric:/WordCount NodeName : _Node_3 AggregatedHealthState : Ok ApplicationName : fabric:/WordCount NodeName : _Node_4 AggregatedHealthState : Ok ApplicationName : fabric:/WordCount NodeName : _Node_1 AggregatedHealthState : Ok HealthEvents : SourceId : System. LastWarning = 1/1/0001 12:00:00 AM . 
RemoveWhenExpired : False IsExpired : False Transitions : Error->Ok = 3/22/2016 7:56:53 PM. Property='Availability'.PS C:\> Send-ServiceFabricApplicationHealthReport –ApplicationName fabric:/WordCount –SourceId "MyWatchdog" –HealthProperty "Availability" –HealthState Error PS C:\> Get-ServiceFabricApplicationHealth fabric:/WordCount ApplicationName : fabric:/WordCount AggregatedHealthState : Error UnhealthyEvaluations : Error event: SourceId='MyWatchdog'. LastWarning = 1/1/0001 12:00:00 AM SourceId : MyWatchdog Property : Availability HealthState : Error SequenceNumber : 131032204762818013 SentAt : 3/23/2016 3:27:56 PM ReceivedAt : 3/23/2016 3:27:56 PM TTL : Infinite Description : RemoveWhenExpired : False IsExpired : False Transitions : Ok->Error = 3/23/2016 3:27:56 PM. for evaluating cluster and application health. Other services use health data to perform automatic repairs. It also doesn't allow them to collect very specific information to help identify issues and potential issues as close to the root cause as possible. This approach hinders their scalability. Next steps View Service Fabric health reports Use system health reports for troubleshooting How to report and check service health Add custom Service Fabric health reports Monitor and diagnose services locally Service Fabric application upgrade . centralized service at the cluster level that parses all the potentially useful information emitted by services. and for monitored upgrades.Health model usage The health model allows cloud services and the underlying Service Fabric platform to scale. Other systems have a single. and issue alerts on certain conditions. build cluster health history. because monitoring and health determinations are distributed among the different monitors within the cluster. The health model is used heavily for monitoring and diagnosis. Prerequisites You must have the following installed: Visual Studio 2015 Service Fabric SDK To create a local secure dev cluster Open PowerShell with admin privileges. replica or node levels. This can be used to report health from within a container. you can report health on any entity that is a part of the cluster. Use FabricClient . If you report problems and failures to the Azure Service Fabric health manager from your service code. 2. Report and check service health 4/17/2017 • 5 min to read • Edit Online When your services encounter problems. This won't be true in most real-world scenarios. With FabricClient . application. you can use standard health monitoring tools that Service Fabric provides to check the health status. service. There are three ways that you can report health from the service: Use Partition or CodePackageActivationContext objects. For example. Create a project by using the Stateful Service template. however. Use the REST APIs at the cluster. deployed application. You can use the Partition and CodePackageActivationContext objects to report the health of elements that are part of the current context. partition. and the application that it is a part of. you can read the series of in-depth articles about health that start with the link at the end of this article. and run the following commands. your ability to respond to and fix incidents and outages depends on your ability to detect the issues quickly. The example also shows how the tools that Service Fabric provides can be used to check the health status. Open Visual Studio as an administrator. To deploy an application and check its health 1. 
code that runs as part of a replica can report health only on that replica. . Ideally. For more detailed information. This article is intended to be a quick introduction to the health monitoring capabilities of Service Fabric. You can use FabricClient to report health from the service code if the cluster is not secure or if the service is running with admin privileges. the partition that it belongs to. service package. service code should only send reports that are related to its own health. This article walks you through an example that reports health from the service code. Press F5 to run the application in debug mode. right-click the Local Cluster Manager icon in the notification area and select Manage Local Cluster from the shortcut menu to open Service Fabric Explorer. The application health should be displayed as in this image. After the application is running. The application will be deployed to the local cluster. At this time. .3. the application should be healthy with no errors. 4. 5. To add custom health events to your service code The Service Fabric project templates in Visual Studio contain sample code. . Such reports will automatically show up in the standard tools for health monitoring that Service Fabric provides. The health report for the same application in PowerShell is in this image. and PowerShell. 1.6. and you can use Get-ServiceFabricServiceHealth to check a service's health. The following steps show how you can report custom health events from your service code. You can use Get-ServiceFabricApplicationHealth to check an application's health. such as Service Fabric Explorer. You can also check the health by using PowerShell. Reopen the application that you created previously in Visual Studio. or create a new application by using the Stateful Service Visual Studio template. Azure portal health view. You can see that this method returns a result that holds the current value of the counter because the key logic in this application is to keep a count running. and find the myDictionary. a. "StateDictionary". new HealthInformation("ServiceCode". } We report replica health because it's being reported from a stateful service. 3.HasValue) { HealthInformation healthInformation = new HealthInformation("ServiceCode". b. The HealthInformation parameter stores information about the health issue that's being reported. If this were a real application. Create the FabricClient instance after the var myDictionary declaration.Fabric.HealthManager. add the following steps. this.Partition.Error). "StateDictionary".PartitionId. HealthState.HasValue) { HealthInformation healthInformation = new HealthInformation("ServiceCode". Add the following code after the myDictionary. use the following code if (!result.Fabric.FromSeconds(0) }).HasValue) { var replicaHealthReport = new StatefulServiceReplicaHealthReport( this.Partition. . fabricClient. Add the System. comment out the first line in the health reporting code that you added earlier.ReportHealth(replicaHealthReport). b.Health namespace to the Stateful1.Health. and if the lack of result represented a failure. Open the Stateful1.cs file. this. To simulate the failure. the code will look like the following example.TryGetValueAsync call in the RunAsync method. you can also use FabricClient to report health as shown in the following steps.2. Add the following code after the myDictionary. To report a health event when the lack of result represents a failure.TryGetValueAsync call if (!result. } 4. 
HealthState.ReportReplicaHealth(healthInformation).ReportInstanceHealth(healthInformation).Error). Let's simulate this failure and see it show up in the health monitoring tools.ReplicaId.Context. HealthState. If you had created a stateless service. var fabricClient = new FabricClient(new FabricClientSettings() { HealthReportSendInterval = TimeSpan. you would want to flag that event.Error)). } 5. "StateDictionary". a. if (!result. this.Context.TryGetValueAsync call. using System. If your service is running with admin privileges or if the cluster is not secure.cs file. After you comment out the first line. open Service Fabric Explorer to check the health of the application. } This code will now fire this health report each time RunAsync executes. . Service Fabric Explorer will show that the application is unhealthy. 7. too.Partition. you will see that Health State indicates an error. This is because of the error that was reported from the code that we added previously. press F5 to run the application. This time. Because we did not set TimeToLive for this health report in the HealthInformation object.HasValue) { HealthInformation healthInformation = new HealthInformation("ServiceCode". This report will remain in the health manager until it is replaced by another report or until this replica is deleted. which in this case is the replica. You can also report health on Partition .ReportReplicaHealth(healthInformation). If you select the primary replica in the tree view of Service Fabric Explorer. Service Fabric Explorer also displays the health report details that were added to the HealthInformation parameter in the code.Error). We recommend that health should be reported on the most granular level. After you make the change. HealthState. You can see the same health reports in PowerShell and the Azure portal. After the application is running. "StateDictionary". this. the report will never expire. //if(!result. 6. "StateDictionary". To report health on Application . Next steps Deep dive on Service Fabric health REST API for reporting service health REST API for reporting application health .Error). HealthInformation healthInformation = new HealthInformation("ServiceCode".GetActivationContext(). var activationContext = FabricRuntime.Partition.ReportPartitionHealth(healthInformation). "StateDictionary". DeployedApplication . HealthState. HealthInformation healthInformation = new HealthInformation("ServiceCode".ReportApplicationHealth(healthInformation).Error). this. activationContext. HealthState. and DeployedServicePackage . use CodePackageActivationContext . reporting can be done from: The monitored Service Fabric service replica. and the impact on the cluster or application functionality. To design and implement health reporting. either periodically or on transitions. Therefore. The quality of the health reports determines the accuracy of the health view of the cluster. Add custom Service Fabric health reports 4/17/2017 • 19 min to read • Edit Online Azure Service Fabric introduces a health model designed to flag unhealthy cluster and application conditions on specific entities. Choose a reporting strategy. Read more at Using system health reports for troubleshooting. a Service Fabric stateless service that monitors conditions and issues reports). as it requires simpler code and is less prone to errors. They report on those conditions based on their local view. Determine the entity that the report applies to.HealthManager. The Service Fabric reporters monitor identified conditions of interest. 
decide on the health report property and health state. some thought is needed to provide reports that capture conditions of interest in the best possible way. External watchdogs that probe the resource from outside the Service Fabric cluster (for example. Define a source used to identify the reporter. the way it is monitored. flexible. The health model uses health reporters (system components and watchdogs). This can be done through the API by using FabricClient. especially if it can help flag problems close to the root. Once the health reporting design is clear. The recommended way is periodically. The user reports must be sent on health entities that have already been created by the system. Service writers need to think upfront about health. Using this information. NOTE Out of the box. False positives that wrongly show unhealthy issues can negatively impact upgrades or other services that use health data. and easy to use. monitoring service like Gomez). Internal watchdogs that run on the Service Fabric nodes but are not implemented as Service Fabric services. The model is intended to be rich. Determine how long the report for unhealthy conditions should stay in the health store and how it should be cleared. from within the service or from an internal or external watchdog. As mentioned. through PowerShell. or through REST. The watchdogs can be deployed an all nodes or can be affinitized to the monitored service. health reports can be sent easily. Any condition that can impact health should be reported on. Configuration knobs batch . the cluster is populated with health reports sent by the system components. Based on this information. The health information can save a lot of time and effort on debugging and investigation once the service is up and running at scale in the cloud (private or Azure). and system components must: Define the condition they are interested in. Determine where the reporting is done. Examples of such services are repair services and alerting mechanisms. Internal watchdogs deployed as a Service Fabric service (for example. decide the report's time to live and remove-on-expiration behavior. The health store aggregates health data sent by all reporters to determine whether entities are globally healthy. The goal is easy and fast diagnosis and repair. watchdogs. You can use FabricClient to report health if the cluster is not secure or if the fabric client has admin privileges.ReportHealth. This report is the last added report. NOTE Report health is synchronous. the fabric client must be kept alive for at least the HealthReportSendInterval to ensure that they are sent. Default: 30 seconds. the entity on which the report must be applied has been deleted. etc. The fact that the report is accepted by the health client or the Partition or CodePackageActivationContext objects doesn't mean that it is applied in the store. The following starts a connection to a local cluster: . var clientSettings = new FabricClientSettings() { HealthOperationTimeout = TimeSpan. retries happen every 40 seconds. Same parameters can be specified when a connection to a cluster is created through PowerShell. rather than sending one message for each report.FromSeconds(120). Health client The health reports are sent to the health store through a health client. }. HealthReportSendInterval = TimeSpan. For example. If a message times out. All configuration parameters can be specified when FabricClient is created by passing FabricClientSettings with the desired values for health-related entries. 
if a particular bad reporter is reporting 100 reports per second on the same property of the same entity. The buffering on the client takes the uniqueness of the reports into consideration. Used to batch reports into a single message. the fabric client must be kept alive longer to give it a chance to retry. which reflects the most current state of the entity. which lives inside the fabric client. the health client retries it until the health store confirms that the report has been processed. var fabricClient = new FabricClient(clientSettings). On timeouts and errors that can be retried. HealthOperationTimeout: The timeout period for a report message sent to the health store. The batching improves performance.reports for improved performance. HealthReportRetrySendInterval = TimeSpan. Only one such report exists in the client queue at most. the reports are replaced with the last version. The health client can be configured with the following: HealthReportSendInterval: The delay between the time the report is added to the client and the time it is sent to the health store.FromSeconds(40). If the message is lost or the health store cannot apply them due to transient errors. The following creates a fabric client and specifies that the reports should be sent when they are added. It is sent asynchronously and possibly batched with other reports. and it represents only the validation work on the client side. The processing on the server may still fail: the sequence number could be stale. HealthReportRetrySendInterval: The interval at which the health client resends accumulated health reports to the health store. Default: two minutes. Default: 30 seconds.FromSeconds(0). NOTE When the reports are batched. the number of reports sent to the health store is just one per send interval. If batching is configured. The same considerations explained for health client apply . PS C:\> Connect-ServiceFabricCluster -HealthOperationTimeoutInSec 120 -HealthReportSendIntervalInSec 0 - HealthReportRetrySendIntervalInSec 40 True ConnectionEndpoint : FabricClientSettings : { ClientFriendlyName : PowerShell-1944858a-4c6d-465f-89c7-9021c12ac0bb PartitionLocationCacheLimit : 100000 PartitionLocationCacheBucketCount : 1024 ServiceChangePollInterval : 00:02:00 ConnectionInitializationTimeout : 00:00:02 KeepAliveInterval : 00:00:20 HealthOperationTimeout : 00:02:00 HealthReportSendInterval : 00:00:00 HealthReportRetrySendInterval : 00:00:40 NotificationGatewayConnectionTimeout : 00:00:00 NotificationCacheUpdateTimeout : 00:00:00 } GatewayInformation : { NodeAddress : localhost:19000 NodeId : 1880ec88a3187766a6da323399721f53 NodeInstanceId : 130729063464981219 NodeName : Node. Use CodePackageActivationContext. Use CodePackageActivationContext. Design health reporting The first step in generating high-quality reports is identifying the conditions that can impact the health of the service.ReportDeployedApplicationHealth to report on the current application deployed on the current node.ReportInstanceHealth to report on the current service instance. the Partition and the CodePackageActivationContext hold a health client which is configured with default settings. use IStatefulServicePartition. NOTE Internally. Use IServicePartition. Use CodePackageActivationContext. Report from within low privilege services From within Service Fabric services that do not have admin access to the cluster.ReportPartitionHealth to report on the current partition entity. before a .ReportReplicaHealth to report on current replica. 
use IStatelessServicePartition. Read more about cluster security. you can report health on entities from the current context through Partition or CodePackageActivationContext .ReportApplicationHealth to report on current application. The FabricClient used for reporting must have security enabled to be able to communicate with the cluster (for example. with Kerberos or certificate authentication).1 } NOTE To ensure that unauthorized services can't report health against the entities in the cluster. the server can be configured to accept requests only from secured clients. For stateful services. For stateless services. so the objects should be kept alive to have a chance to send the report.ReportDeployedServicePackageHealth to report on a service package for the current application deployed on the current node. Any condition that can help flag problems in the service or cluster when it starts--or even better.reports are batched and sent on a timer. Note how the condition states are described in terms of health: the state of the condition that can be considered healthy or unhealthy (warning or error). If a condition impacts all replicas in a partition. It could report a warning if an upfront threshold is reached and report an error if the share is full. based on the condition. their reports can coexist. Once the monitoring details are set. Only health-related information should be reported as health. this is obvious. There. A watchdog within the service may not be able to detect the conditions. For example. and then checks the latency and correctness of the result. then it should be reported on the partition. you can have a watchdog that lives outside the cluster and issues requests to the service. On a warning. For example. but the desire is to have the condition flagged for more than the duration of replica life.) If multiple reports apply to the same condition. they can test the operations in the same way users call them. The watchdogs simply monitor the conditions and report. You should choose the entity with best possible granularity. This way. It could listen for notifications of file or directory changes. the property name can be ShareSize-sharename. such as a replica. (For the example above. as this information impacts the health evaluation of an entity. they must ensure that the source ID or the property is different. Most of the time. If the condition impacts an entity. The health store was not designed as a general- purpose store. If the monitored condition is the availability or functionality of the service as users see it. without affecting the main services in any way. NOTE The health store should not be used to keep status information. but instead affect interactions between services. For example. these watchdogs could be implemented as stateless services in the same application. the watchdog can be part of the monitored service itself. watchdog writers need to figure out the best way to monitor them for balance between overhead and usefulness. On an error. it's best to have the watchdogs in the same place as the user clients. when the replica is deleted. The advantage of this approach is that reporting is simple. Reporting from within the monitored service is not always an option. This means that watchdog writers must also think about the lifetimes of . for example. the property should contain some dynamic information that allows reports to coexist.problem happens--can potentially save billions of dollars. 
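As a quick illustration of the low-privilege reporting methods just described, here is a hedged sketch of service code that reports through the Partition and CodePackageActivationContext objects rather than through a FabricClient. The source, property, and description values are illustrative only, and the snippet assumes a stateful Reliable Service where this.Partition is available:

using System;
using System.Fabric;
using System.Fabric.Health;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Services.Runtime;

internal sealed class MyService : StatefulService
{
    public MyService(StatefulServiceContext context) : base(context) { }

    protected override async Task RunAsync(CancellationToken cancellationToken)
    {
        // Illustrative report values; a real watchdog would derive these from its checks.
        var healthInformation = new HealthInformation("MyServiceWatchdog", "ExternalDependency", HealthState.Warning)
        {
            Description = "The external resource is responding slowly.",
            TimeToLive = TimeSpan.FromMinutes(2),
            RemoveWhenExpired = true
        };

        // Report on the current replica through the partition object supplied by the runtime.
        this.Partition.ReportReplicaHealth(healthInformation);

        // Report on the current application through the activation context.
        FabricRuntime.GetActivationContext().ReportApplicationHealth(healthInformation);

        await Task.Delay(Timeout.Infinite, cancellationToken);
    }
}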
The property of the health report should capture the monitored condition. the service code can check the share usage. or. (For a calculator service. Care must be taken to prevent watchdog bugs from impacting the service functionality. Another option is to have watchdogs in the cluster as separate processes. There are corner cases where more thought is needed. Sometimes. The conditions also may not be specific to a service. For example. and higher customer satisfaction. and then report by using a local fabric client every time it tries to write a file. consider a service that does complex calculations that use some temporary files on a share. a watchdog writer needs to figure out how to implement the watchdog. fewer night hours spent investigating and repairing issues. all reports associated with it will be cleaned up from the store. report on the partition. A watchdog could monitor the share to ensure that enough space is available. a watchdog running in the cluster is not an option either. The next decision point is which entity to report on. not on the service. If the conditions can be determined from within the service. For example. It may not have the logic or data to make the determination. Otherwise. a repair system could start cleaning up older files on the share. you should decide on a source ID that uniquely identifies it. Once the conditions are identified. It uses health evaluation logic to aggregate all data into the health state. but it can negatively affect the performance of the health store. a repair system could move the service replica to another node. if multiple shares need to be monitored. does 2+2 return 4 in a reasonable amount of time?) Once the watchdog details have been finalized. the property could be ShareSize. If multiple watchdogs of the same type are living in the cluster. The benefits include less down time. either they must report on different entities. deployed on all nodes or on the same nodes as the service. though. though. The overhead of monitoring the conditions may be high. if they report on the same entity. Sending information unrelated to health (like reporting status with a health state of OK) will not impact the aggregated health state. If the queue reaches the maximum length and commands are dropped. If the secondary completes the task. the report is cleaned up automatically from store. The tasks are persistent. The monitored condition can be translated as a warning if the task is not done in a certain time (t1. It could also wait for secondaries to send back acknowledgement signals when they are done. If the task is not completed in time (t2. This behavior assumes that the tasks are idempotent. which is not desired. The report pinpoints the service instance that has issues. The recommended way for watchdog reporting is periodically. when an error reported on an entity no longer applies). It must be clear when a report should be cleaned up from a store (for example. If the master queue length reaches a threshold. and mark the reports to be removed when they expire to ensure cleanup.the entity and the report. If no status is received. as the service can't recover. The time to live is two minutes. It reports on the service instance on the property PendingTasks. The report doesn't capture the situation where the acknowledgement message is lost and the task is not finished from the master's point of view. Set time to live to a few minutes. the entity is evaluated at error. Depending on the design. and it's sent periodically every 30 seconds. 
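For the file-share scenario above, the watchdog logic can stay very small. The following hedged sketch (the ShareWatchdog helper, source name, and thresholds are hypothetical, and how the share usage is actually measured is omitted) shows how a dynamic property name such as ShareSize-sharename lets reports for several shares coexist instead of overriding each other:

using System;
using System.Fabric.Health;

internal static class ShareWatchdog
{
    // Builds a health report for one share. usedBytes and capacityBytes come from
    // whatever mechanism actually measures the share; that part is not shown here.
    public static HealthInformation CheckShareUsage(string shareName, long usedBytes, long capacityBytes)
    {
        double usedFraction = (double)usedBytes / capacityBytes;

        // Illustrative thresholds: warn at 80% usage, error at 95%.
        HealthState state = usedFraction > 0.95 ? HealthState.Error
                          : usedFraction > 0.80 ? HealthState.Warning
                          : HealthState.Ok;

        // The dynamic property name keeps reports for different shares separate
        // in the health store.
        return new HealthInformation("ShareWatchdog", $"ShareSize-{shareName}", state)
        {
            Description = $"{usedFraction:P0} of share {shareName} is in use.",
            TimeToLive = TimeSpan.FromMinutes(5),
            RemoveWhenExpired = true
        };
    }
}

Because the condition affects all replicas in the partition, the resulting HealthInformation would typically be sent with IServicePartition.ReportPartitionHealth rather than against a single replica.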
Another condition that can be monitored is task execution time. The reports can be on the property QueueStatus. Incorrect healthy reports hide issues in the cluster. the master considers it a failure and reschedules the task. The warning indicates that the secondaries can't handle the load. If there are no pending tasks or all tasks started execution. as appropriate. The secondary that is executing a task reports when it takes longer than expected to run it. This reporting can be done in multiple ways: The master primary replica reports on itself periodically. The reports are cleaned up then. the newly promoted primary can continue to report properly. including upgrades. The secondaries execute the incoming requests and send back acknowledgement signals. Consider a Service Fabric application composed of a master stateful persistent service and secondary stateless services deployed on all nodes (one secondary service type for each type of task). The watchdog lives inside the service. The watchdogs must strive to be as simple as possible to avoid bugs that trigger incorrect reports. which sends back its status. Let's look at an example that puts together the points I described. In this case. the report status on the property PendingTasks is a warning or error. the reports are captured in application health when health is evaluated. However the reporting is done in the cases described above. One option is for the master to send a ping request to the same secondary. for example 20 minutes). If the primary goes down. In the second case. Report periodically vs. On a timer callback. the master could poll the secondaries for task status. For periodic reporting. Reports should be sent only on unhealthy states. If at least one task takes longer. The master distributes tasks to the secondaries based on the task type. You can have one property for all pending tasks in the queue. If the primary goes down. an error is reported. If they do not respect the thresholds. A report is also sent on each task that includes the task identifier. Incorrect unhealthy reports impact health evaluations and scenarios based on health. for example 10 minutes). care must be taken to detect situations where secondaries die or messages are lost. The master has a processing queue that contains commands to be executed by secondaries. One condition that could be monitored is the length of the master processing queue. the watchdog can . It could report on the secondary service. a warning is reported. the report status is OK. a report is sent on the master service. but it is deadlocked or having other issues. because the code is much simpler and less prone to errors. the watchdog can be implemented with a timer. based on the desired task result) to see if they are completed. Another watchdog process (in the cloud or external) checks the tasks (from outside. like PendingTask+taskId. the secondary instance clears the report from the store. but it doesn't capture the situation where the instance dies. If the service replica is up. on transition By using the health reporting model. the monitored condition can be translated as Error. the report expires in the health store. watchdogs can send reports periodically or on transitions. and it's sent periodically on the master primary replica. If the report is for parent partition or parent application. so that they can be inspected to determine state changes. the reports are rejected as stale. 
create a health information and pass it to correct reporting methods on Partition or CodePackageActivationContext to report on current entities. the report is sent on the deployed service package entity every 30 seconds. Logic must be added to maintain the correct state and clear the report from store when not needed anymore. Implement health reporting Once the entity and report details are clear. care must be taken on failover to avoid stale reports in the health store. While the health client is kept alive. care must be taken when a report is sent that may have not been sent previously (queued. PowerShell. API To report through the API. The downside is that the logic of the watchdog is complex. In the rare cases where data loss is incurred. There is no need to see which report was sent previously or make any optimizations in terms of messaging. . Give the report to a health client. through Partition or CodePackageActivationContext . you need to create a health report specific to the entity type they want to report on. If the resource is unavailable. The watchdog must maintain the conditions or the reports. the other services within the application can still function properly. The following example shows periodic reporting from a watchdog within the cluster. The resource is needed by a service manifest within the application. Reporting on transitions requires careful handling of state. Reporting on transitions makes sense for services reporting on themselves. The sequence number must be ever- increasing. property. Therefore. The upside of this approach is that fewer reports are needed. it retries internally until the report is acknowledged by the health store or the watchdog generates a newer report with the same entity. This automatic cleanup relaxes the need for synchronization between reporter and health store. If not. The watchdog checks whether an external resource can be accessed from within a node. sending health reports can be done through the API. and source. but not yet sent to the health store). When the local object (replica or deployed service package / deployed application) is removed.check the state and send a report based on the current state. The watchdog monitors some conditions and reports only when the conditions change. The health client has batching logic to help with performance. or REST. synchronization may be needed between the state of the reporter and the state of the health store. all its reports are also removed. Alternatively. On failover. . healthState)). public static void SendReport(object obj) { // Test whether the resource can be accessed from the node HealthState healthState = this. private static Uri ApplicationName = new Uri("fabric:/WordCount").HealthManager. ServiceManifestName.Service". the report has a health state of warning.NodeName. the report is already queued on the health client). When the CPU is above a threshold. // Send report on deployed service package. private static Timer ReportTimer = new Timer(new TimerCallback(SendReport).ReportHealth(deployedServicePackageHealthReport).GetNodeContext(). new HealthInformation("ExternalSourceWatcher". } PowerShell Send health reports with Send-ServiceFabricEntityTypeHealthReport. null. // FabricHealthMaxReportsReached (retryable. The reports should be sent every 30 seconds. Client. // TODO: handle exception. The following example shows periodic reporting on CPU values on a node. // Possible exceptions: FabricException with error codes // FabricHealthStaleReport (non-retryable. 
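Before the watchdog example that follows, here is a smaller hedged sketch of those two steps, using a node-level report. The source, property, and node name are illustrative, and other entity types use the corresponding report classes (ApplicationHealthReport, ServiceHealthReport, PartitionHealthReport, and so on):

using System;
using System.Fabric;
using System.Fabric.Health;

class NodeReportSample
{
    static void Main()
    {
        // Step 1: create a health report specific to the entity type.
        var healthInformation = new HealthInformation("MyWatchdog", "Connectivity", HealthState.Warning)
        {
            Description = "Intermittent packet loss detected.",
            TimeToLive = TimeSpan.FromMinutes(2),
            RemoveWhenExpired = true
        };
        var nodeReport = new NodeHealthReport("Node.1", healthInformation);

        // Step 2: give the report to a health client, which batches and retries it.
        var fabricClient = new FabricClient();
        fabricClient.HealthManager.ReportHealth(nodeReport);
    }
}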
When the CPU remains above a threshold for more than the configured time. NodeName. 30 * 1000). as the connectivity is needed by the specific service manifest // and can be different on different nodes var deployedServicePackageHealthReport = new DeployedServicePackageHealthReport( ApplicationName. Otherwise. If they expire. private static string ServiceManifestName = "WordCount. the reporter has issues. Code omitted for snippet brevity. private static string NodeName = FabricRuntime. user should retry with exponential delay until the report is accepted).FromSeconds(0) }). the reporter sends a health state of OK. and they have a time to live of two minutes. private static FabricClient Client = new FabricClient(new FabricClientSettings() { HealthReportSendInterval = TimeSpan. "Connectivity". so the node is evaluated at error. it's reported as an error.TestConnectivityToExternalResource(). 30 * 1000. ConsiderWarningAsError=false. Property='CPU'. It then sends a report from PowershellWatcher on the property ResourceDependency. RemoveWhenExpired : False IsExpired : False Transitions : ->Ok = 4/21/2015 8:02:12 AM SourceId : PowershellWatcher Property : CPU HealthState : Warning SequenceNumber : 130741236814913394 SentAt : 4/21/2015 9:01:21 PM ReceivedAt : 4/21/2015 9:01:21 PM TTL : 00:02:00 Description : CPU is above 80% threshold RemoveWhenExpired : False IsExpired : False Transitions : ->Warning = 4/21/2015 9:01:21 PM The following example reports a transient warning on a replica.1 -HealthState Warning -SourceId PowershellWatcher - HealthProperty CPU -Description "CPU is above 80% threshold" -TimeToLiveSec 120 PS C:\> Get-ServiceFabricNodeHealth -NodeName Node. PS C:\> Send-ServiceFabricNodeHealthReport -NodeName Node. HealthState='Warning'.1 AggregatedHealthState : Warning UnhealthyEvaluations : Unhealthy event: SourceId='PowershellWatcher'.FM Property : State HealthState : Ok SequenceNumber : 5 SentAt : 4/21/2015 8:01:17 AM ReceivedAt : 4/21/2015 8:02:12 AM TTL : Infinite Description : Fabric node is up. and it is removed from the store automatically. It first gets the partition ID and then the replica ID for the service it is interested in. The report is of interest for only two minutes. .1 NodeName : Node. HealthEvents : SourceId : System. RemoveWhenExpired : False IsExpired : False Transitions : ->Ok = 4/21/2015 8:02:12 AM SourceId : PowershellWatcher Property : ResourceDependency HealthState : Warning SequenceNumber : 130741243777723555 SentAt : 4/21/2015 9:12:57 PM ReceivedAt : 4/21/2015 9:12:57 PM TTL : 00:02:00 Description : The external resource that the primary is using has been rebooted at 4/21/2015 9:01:21 PM. ConsiderWarningAsError=false. For example. they can set up alerts based on health status to catch severe issues before they provoke outages.ReplicaRole -eq "Primary"}). Expect processing delays for a few minutes. RemoveWhenExpired : True IsExpired : False Transitions : ->Warning = 4/21/2015 9:12:32 PM REST Send health reports using REST with POST requests that go to the desired entity and have in the body the health report description. All entities are supported. 
For example.ReplicaId PS C:\> Send-ServiceFabricReplicaHealthReport -PartitionId $partitionId -ReplicaId $replicaId -HealthState Warning -SourceId PowershellWatcher -HealthProperty ResourceDependency -Description "The external resource that the primary is using has been rebooted at 4/21/2015 9:01:21 PM.PartitionId PS C:\> $replicaId = (Get-ServiceFabricReplica -PartitionId $partitionId | where {$_.RA Property : State HealthState : Ok SequenceNumber : 130740768777734943 SentAt : 4/21/2015 8:01:17 AM ReceivedAt : 4/21/2015 8:02:12 AM TTL : Infinite Description : Replica has been created. service writers and cluster/application administrators can think of ways to consume the information. HealthEvents : SourceId : System. Property='ResourceDependency'. Next steps Based on the health data." -TimeToLiveSec 120 -RemoveWhenExpired PS C:\> Get-ServiceFabricReplicaHealth -PartitionId $partitionId -ReplicaOrInstanceId $replicaId PartitionId : 8f82daff-eb68-4fd9-b631-7a37629e08c0 ReplicaId : 130740415594605869 AggregatedHealthState : Warning UnhealthyEvaluations : Unhealthy event: SourceId='PowershellWatcher'. see how to send REST cluster health reports or service health reports. Introduction to Service Fabric health Monitoring View Service Fabric health reports How to report and check service health . Administrators can also set up repair systems to fix issues automatically. Expect processing delays for a few minutes. PS C:\> $partitionId = (Get-ServiceFabricPartition -ServiceName fabric:/WordCount/WordCount.Service). HealthState='Warning'. Use system health reports for troubleshooting Monitor and diagnose services locally Service Fabric application upgrade . Watchdogs can't use the same prefix for their sources. System health reports provide visibility into cluster and application functionality and flag issues through health." prefix. there are more events. If more neighborhoods are lost. If one neighborhood is lost in the entire Service Fabric ring. The health store creates and deletes entities based on the system reports. SourceId: System. Remove when expired behavior ensures that the report is cleaned up from the health store correctly. NOTE Watchdogs health reports are visible only after the system components create an entity. NOTE Service Fabric continues to add reports on conditions of interest that improve visibility into what is happening in the cluster and application. and the node ID is included in the property name. a new service replica instance is created). system health reports verify that entities are implemented and are behaving correctly from the Service Fabric perspective. The report specifies the global lease timeout as the time to live. The report is resent every half of the TTL duration for as long as the condition remains active. which starts with the "System. The report is from individual nodes. All reports associated with the old instance are deleted and cleaned up from the store. as reports with invalid parameters are rejected. For applications and services. read more at Service Fabric health model. If everything works properly. NOTE To understand health-related concepts. Use system health reports to troubleshoot 4/17/2017 • 17 min to read • Edit Online Azure Service Fabric components report out of the box on all entities in the cluster.Federation . The same is true when a new instance of the entity is created (for example. Cluster system health reports The cluster health entity is created automatically in the health store. The event is automatically removed when it expires. 
It also organizes them in a hierarchy that captures entity interactions. Let's look at some system reports to understand what triggers them and how to correct the possible issues they represent. When an entity is deleted.Federation reports an error when it detects a neighborhood loss. you can typically expect two events (both sides of the gap report). it doesn't have a system report. even if the reporting node is down. The reports do not provide any health monitoring of the business logic of the service or detection of hung processes. The system component reports are identified by the source. the health store automatically deletes all health reports associated with it. Neighborhood loss System. User services can enrich the health data with information specific to their logic. FM Property: State Next steps: If the node is down for an upgrade.FM event with a health state of OK for node up: PS C:\> Get-ServiceFabricNodeHealth -NodeName Node. TTL of these events is infinite. The health hierarchy built by the health store takes action on deployed entities in correlation with System. either for upgrading or simply because it has failed).FabricNode Property: Starts with Certificate and contains more information about the certificate type Next steps: Update the certificates if they are near expiration. the health state should be switched back to OK. In this case.FM showing its state.FabricNode reports a warning when certificates used by the node are near expiration. Load capacity violation The Service Fabric Load Balancer reports a warning if it detects a node capacity violation. The node entities are removed when the node state is removed (see RemoveNodeStateAsync). the report health state is OK. When the expiration is within two weeks. SourceId: System. When the expiration is at least two weeks away. It considers the node a virtual parent of all deployed entities. the health store automatically cleans up the deployed entities that can exist only on the down node or on the previous instance of the node. Property: Starts with Neighborhood and includes node information Next steps: Investigate why the neighborhood is lost (for example.FM. When System. Certificate_server. There are three certificates per node: Certificate_cluster. which represents the Failover Manager service. the problem needs more investigation. Node system health reports System. Each node should have one report from System. If the node doesn't come back or it fails.FM reports that the node is down or restarted (a new instance). check the communication between cluster nodes). It reports an error when the node departs the ring (it's down. SourceId: System. . is the authority that manages information about cluster nodes. and they are removed when a node leaves the cluster.FM. and Certificate_default_client.FM reports as OK when the node joins the ring (it's up and running). The following example shows the System. with the same instance as the instance associated with the entities.FM Property : State HealthState : Ok SequenceNumber : 2 SentAt : 4/24/2015 5:27:33 PM ReceivedAt : 4/24/2015 5:28:50 PM TTL : Infinite Description : Fabric node is up.1 AggregatedHealthState : Ok HealthEvents : SourceId : System.1 NodeName : Node. RemoveWhenExpired : False IsExpired : False Transitions : ->Ok = 4/24/2015 5:28:50 PM Certificate expiration System. The deployed entities on that node are exposed through queries if the node is reported as up by System. Node up/down System. it should come back up once it has been upgraded. 
the report type is a warning.FM node reports. PLB Property: Starts with Capacity Next steps: Check provided metrics and view the current capacity on the node.CM Property: State Next steps: If the application has been created. State System. which represents the Failover Manager service. is the authority that manages information about an application. check the state of the application by issuing a query (for example. Otherwise. which represents the Cluster Manager service.CM. The following example shows the state event on the fabric:/WordCount application: PS C:\> Get-ServiceFabricApplicationHealth fabric:/WordCount -ServicesFilter None -DeployedApplicationsFilter None ApplicationName : fabric:/WordCount AggregatedHealthState : Ok ServiceHealthStates : None DeployedApplicationHealthStates : None HealthEvents : SourceId : System. RemoveWhenExpired : False IsExpired : False Transitions : ->Ok = 4/24/2015 6:12:51 PM Service system health reports System. so that it can be removed from store. is the authority that manages information about services. SourceId: System. Application system health reports System.CM reports as OK when the application has been created or updated.FM reports as OK when the service has been created. It deletes the entity from the health store when the service has been deleted.FM. it should include the Cluster Manager health report. SourceId: System. SourceId: System.CM Property : State HealthState : Ok SequenceNumber : 82 SentAt : 4/24/2015 6:12:51 PM ReceivedAt : 4/24/2015 6:12:51 PM TTL : Infinite Description : Application has been created. State System. It informs the health store when the application has been deleted. the PowerShell cmdlet Get- ServiceFabricApplication -ApplicationName *applicationName*).FM Property: State The following example shows the state event on the service fabric:/WordCount/WordCountService: . RemoveWhenExpired : False IsExpired : False Transitions : ->Ok = 4/24/2015 6:13:01 PM Unplaced replicas violation System. SourceId: System.FM Property : State HealthState : Ok SequenceNumber : 3 SentAt : 4/24/2015 6:12:51 PM ReceivedAt : 4/24/2015 6:13:01 PM TTL : Infinite Description : Service has been created.PLB'. RemoveWhenExpired : False IsExpired : False Transitions : Error->Ok = 3/22/2016 7:57:18 PM. ConsiderWarningAsError=false. LastWarning = 1/1/0001 12:00:00 AM SourceId : System. The report is removed when it expires. Property='ServiceReplicaUnplacedHealth_Secondary_a1f83a35-d6bf-4d39-b90d- 28d15f39599b'. PartitionHealthStates : PartitionId : a1f83a35-d6bf-4d39-b90d-28d15f39599b AggregatedHealthState : Warning HealthEvents : SourceId : System. PS C:\> Get-ServiceFabricServiceHealth fabric:/WordCount/WordCountService ServiceName : fabric:/WordCount/WordCountService AggregatedHealthState : Ok PartitionHealthStates : PartitionId : 875a1caa-d79f-43bd-ac9d-43ee89a9891c AggregatedHealthState : Ok HealthEvents : SourceId : System. 
The following example shows a violation for a service configured with 7 target replicas in a cluster with 5 nodes: PS C:\> Get-ServiceFabricServiceHealth fabric:/WordCount/WordCountService ServiceName : fabric:/WordCount/WordCountService AggregatedHealthState : Warning UnhealthyEvaluations : Unhealthy event: SourceId='System.PLB Property : ServiceReplicaUnplacedHealth_Secondary_a1f83a35-d6bf-4d39- b90d-28d15f39599b HealthState : Warning .PLB reports a warning if it cannot find a placement for one or more service replicas.FM Property : State HealthState : Ok SequenceNumber : 10 SentAt : 3/22/2016 7:56:53 PM ReceivedAt : 3/22/2016 7:57:18 PM TTL : Infinite Description : Service has been created. HealthState='Warning'.FM Property: State Next steps: Check the service constraints and the current state of the placement. possibly. If the partition is in quorum loss. If the partition is below the minimum replica count. which represents the Failover Manager service. the build takes longer than for a service with a small amount of state.FM reports as OK when the partition has been created and is healthy.FM reports an error. due to the following constraints and properties: Placement Constraint: N/A Depended Service: N/A Constraint Elimination Sequence: ReplicaExclusionStatic eliminated 4 possible node(s) for placement -. is the authority that manages information about service partitions. State System. such as SQL Database. LastOk = 1/1/0001 12:00:00 AM Partition system health reports System. ReplicaExclusionDynamic eliminated 1 possible node(s) for placement -. Nodes Eliminated By Constraints: ReplicaExclusionStatic: FaultDomain:fd:/0 NodeName:_Node_0 NodeType:NodeType0 UpgradeDomain:0 UpgradeDomain: ud:/0 Deactivation Intent/Status: None/None FaultDomain:fd:/1 NodeName:_Node_1 NodeType:NodeType1 UpgradeDomain:1 UpgradeDomain: ud:/1 Deactivation Intent/Status: None/None FaultDomain:fd:/3 NodeName:_Node_3 NodeType:NodeType3 UpgradeDomain:3 UpgradeDomain: ud:/3 Deactivation Intent/Status: None/None FaultDomain:fd:/4 NodeName:_Node_4 NodeType:NodeType4 UpgradeDomain:4 UpgradeDomain: ud:/4 Deactivation Intent/Status: None/None ReplicaExclusionDynamic: FaultDomain:fd:/2 NodeName:_Node_2 NodeType:NodeType2 UpgradeDomain:2 UpgradeDomain: ud:/2 Deactivation Intent/Status: None/None RemoveWhenExpired : True IsExpired : False Transitions : Error->Warning = 3/22/2016 7:57:48 PM. System. but it is below the target replica count. .1/5 node(s) remain. Other important events include a warning when the reconfiguration takes longer than expected and when the build takes longer than expected. It deletes the entity from the health store when the partition is deleted.FM. if a service has a terabyte of state.0/5 node(s) remain. it reports a warning. If the partition is not below the minimum replica count. HealthState : Warning SequenceNumber : 131032232425505477 SentAt : 3/23/2016 4:14:02 PM ReceivedAt : 3/23/2016 4:14:03 PM TTL : 00:01:05 Description : The Load Balancer was unable to find a placement for one or more of the Service's Replicas: fabric:/WordCount/WordCountService Secondary Partition a1f83a35-d6bf-4d39-b90d- 28d15f39599b could not be placed. For example. it reports an error. The expected times for the build and reconfiguration are configurable based on service scenarios. it's possible that some replicas have not been created. which shows how it is configured: MinReplicaSetSize is two and TargetReplicaSetSize is seven. SourceId: System. opened.FM Property: State Next steps: If the health state is not OK. 
Then get the number of nodes in the cluster: five. So in this case. The next step is to get the partition description. RemoveWhenExpired : False IsExpired : False Transitions : ->Ok = 4/24/2015 6:33:31 PM The following example shows the health of a partition that is below target replica count. . The following example shows a healthy partition: PS C:\> Get-ServiceFabricPartition fabric:/StatelessPiApplication/StatelessPiService | Get- ServiceFabricPartitionHealth PartitionId : 29da484c-2c08-40c5-b5d9-03774af9a9bf AggregatedHealthState : Ok ReplicaHealthStates : None HealthEvents : SourceId : System. two replicas can't be placed. In many instances. the root cause is a service bug in the open or change-role implementation. or promoted to primary or secondary correctly.FM Property : State HealthState : Ok SequenceNumber : 38 SentAt : 4/24/2015 6:33:10 PM ReceivedAt : 4/24/2015 6:33:31 PM TTL : Infinite Description : Partition is healthy. RemoveWhenExpired : False IsExpired : False Transitions : Ok->Warning = 4/24/2015 6:13:31 PM PS C:\> Get-ServiceFabricPartition fabric:/WordCount/WordCountService PartitionId : 875a1caa-d79f-43bd-ac9d-43ee89a9891c PartitionKind : Int64Range PartitionLowKey : 1 PartitionHighKey : 26 PartitionStatus : Ready LastQuorumLossDuration : 00:00:00 MinReplicaSetSize : 2 TargetReplicaSetSize : 7 HealthState : Warning DataLossNumber : 130743727710830900 ConfigurationNumber : 8589934592 PS C:\> @(Get-ServiceFabricNode). ReplicaHealthStates : None HealthEvents : SourceId : System.FM'. Property='State'.PLB reports a warning if it detects a replica constraint violation and can't place replicas of the partition.FM Property : State HealthState : Warning SequenceNumber : 37 SentAt : 4/24/2015 6:13:12 PM ReceivedAt : 4/24/2015 6:13:31 PM TTL : Infinite Description : Partition is below target replica or instance count. is the authority for the replica state. SourceId: System. PS C:\> Get-ServiceFabricPartition fabric:/WordCount/WordCountService | Get-ServiceFabricPartitionHealth - ReplicasFilter None PartitionId : 875a1caa-d79f-43bd-ac9d-43ee89a9891c AggregatedHealthState : Warning UnhealthyEvaluations : Unhealthy event: SourceId='System.RA.Count 5 Replica constraint violation System. which represents the reconfiguration agent component.RA reports as OK when the replica has been created.PLB Property: Starts with ReplicaConstraintViolation Replica system health reports System. ConsiderWarningAsError=false. State System. SourceId: System.RA Property: State The following example shows a healthy replica: . HealthState='Warning'. so you get its health. Slow service API call System. investigate why the replica open takes longer than expected. For this case. the report is issued much faster (a configurable interval. The description provides more details about the time the API has been pending.ReplicaRole -eq "Primary"} | Get-ServiceFabricReplicaHealth PartitionId : 875a1caa-d79f-43bd-ac9d-43ee89a9891c ReplicaId : 130743727717237310 AggregatedHealthState : Ok HealthEvents : SourceId : System. SourceId: System. One of the replicas has a warning health state. PS C:\> Get-ServiceFabricPartition fabric:/WordCount/WordCountService | Get-ServiceFabricReplica | where {$_. The property changes to OK if the open completes. so you may not see any replicas in the warning state.FM'. Property='State'. 
PS C:\> Get-ServiceFabricPartition fabric:/HelloWorldStatefulApplication/HelloWorldStateful | Get- ServiceFabricPartitionHealth PartitionId : 72a0fb3e-53ec-44f2-9983-2f272aca3e38 AggregatedHealthState : Error UnhealthyEvaluations : Error event: SourceId='System. You can retry getting the health state and look for any differences in the replica ID. the next step is to look at the service code and investigate there. an event reported by System. Next steps: Investigate why the call takes longer than expected. SourceId: System.Replicator report a warning if a call to the user service code takes longer than the configured time.RAP and System. RemoveWhenExpired : False IsExpired : False Transitions : ->Ok = 4/24/2015 6:13:02 PM Replica open status The description of this health report contains the start time (Coordinated Universal Time) when the API call was invoked.RAP or System. In certain cases. the RunAsync implementation of the stateful service throws an unhandled exception. the retries can give you clues.Replicator Property: The name of the slow API. The warning is cleared when the call completes. After this information is received.RAP. The following example shows a partition in quorum loss.RA Property : State HealthState : Ok SequenceNumber : 130743727718018580 SentAt : 4/24/2015 6:12:51 PM ReceivedAt : 4/24/2015 6:13:02 PM TTL : Infinite Description : Replica has been created. If the API impacts service availability. The time measured includes the time taken for the replicator open and the service open. System. ReplicaHealthStates : ReplicaId : 130743748372546446 AggregatedHealthState : Ok . with a default of 30 seconds). It shows that the service operation takes longer than expected.RA reports a warning if the replica open takes longer than the configured period (default: 30 minutes).RA Property: ReplicaOpenStatus Next steps: If the health state is not OK. and the investigation steps done to figure out why. The replicas are recycling. RAP'. HealthState='Warning'. AggregatedHealthState : Ok ReplicaId : 130743746168084332 AggregatedHealthState : Ok ReplicaId : 130743746195428808 AggregatedHealthState : Warning ReplicaId : 130743746195428807 AggregatedHealthState : Ok HealthEvents : SourceId : System. RemoveWhenExpired : False IsExpired : False Transitions : ->Ok = 4/24/2015 7:00:33 PM . 
RemoveWhenExpired : False IsExpired : False Transitions : Warning->Error = 4/24/2015 6:51:31 PM PS C:\> Get-ServiceFabricPartition fabric:/HelloWorldStatefulApplication/HelloWorldStateful PartitionId : 72a0fb3e-53ec-44f2-9983-2f272aca3e38 PartitionKind : Int64Range PartitionLowKey : -9223372036854775808 PartitionHighKey : 9223372036854775807 PartitionStatus : InQuorumLoss LastQuorumLossDuration : 00:00:13 MinReplicaSetSize : 2 TargetReplicaSetSize : 3 HealthState : Error DataLossNumber : 130743746152927699 ConfigurationNumber : 227633266688 PS C:\> Get-ServiceFabricReplica 72a0fb3e-53ec-44f2-9983-2f272aca3e38 130743746195428808 ReplicaId : 130743746195428808 ReplicaAddress : PartitionId: 72a0fb3e-53ec-44f2-9983-2f272aca3e38.RA Property : State HealthState : Ok SequenceNumber : 130743756170185892 SentAt : 4/24/2015 7:00:17 PM ReceivedAt : 4/24/2015 7:00:33 PM TTL : Infinite Description : Replica has been created.3 ReplicaStatus : Ready LastInBuildDuration : 00:00:01 HealthState : Warning PS C:\> Get-ServiceFabricReplicaHealth 72a0fb3e-53ec-44f2-9983-2f272aca3e38 130743746195428808 PartitionId : 72a0fb3e-53ec-44f2-9983-2f272aca3e38 ReplicaId : 130743746195428808 AggregatedHealthState : Warning UnhealthyEvaluations : Unhealthy event: SourceId='System. HealthEvents : SourceId : System. Property='ServiceOpenOperationDuration'. ConsiderWarningAsError=false. ReplicaId: 130743746195428808 ReplicaRole : Primary NodeName : Node.FM Property : State HealthState : Error SequenceNumber : 182 SentAt : 4/24/2015 7:00:17 PM ReceivedAt : 4/24/2015 7:00:31 PM TTL : Infinite Description : Partition is in quorum loss. Replication queue full System. SourceId: System. On the secondary. More methods can be found under FabricClient.019 RemoveWhenExpired : False IsExpired : False Transitions : ->Warning = 4/24/2015 7:00:59 PM When you start the faulty application under the debugger. depending on the replica role Slow Naming operations System. On the primary. Examples of Naming operations are CreateServiceAsync or DeleteServiceAsync. this usually happens because one or more secondary replicas are slow to acknowledge operations.Replicator reports a warning if the replication queue is full. The warning is cleared when the queue is no longer full. the diagnostic events windows show the exception thrown from RunAsync: Visual Studio 2015 diagnostic events: RunAsync failure in fabric:/HelloWorldStatefulApplication.NamingService reports health on its primary replica when a Naming operation takes longer than acceptable.RAP Property : ServiceOpenOperationDuration HealthState : Warning SequenceNumber : 130743756399407044 SentAt : 4/24/2015 7:00:39 PM ReceivedAt : 4/24/2015 7:00:59 PM TTL : Infinite Description : Start Time (UTC): 2015-04-24 19:00:17. for example under service management methods or property management methods. SourceId : System.Replicator Property: PrimaryReplicationQueueStatus or SecondaryReplicationQueueStatus. this usually happens when the service is slow to apply the operations. . called Name Owner partitions. the health report includes details about the error. NO completed the last operation with Timeout. When a Naming operation takes longer than expected. The operation took longer than the configured duration. NOTE The Naming service resolves service names to a location in the cluster and enables users to manage service names and properties. delete service may be stuck on a node because the application host keeps crashing on a node due to a user bug in the service code. SourceId: System. 
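When a Naming operation hangs like this, the caller typically observes it as a timeout as well, which you can correlate with the warning on the Naming service's primary replica. The sketch below is a minimal illustration, not part of the article's samples; the application name, service name, and service type are assumptions, and it simply passes an explicit timeout to CreateServiceAsync and logs the failure.

using System;
using System.Fabric;
using System.Fabric.Description;
using System.Threading;
using System.Threading.Tasks;

class NamingOperationExample
{
    static async Task CreateServiceWithTimeoutAsync(FabricClient fabricClient)
    {
        var description = new StatefulServiceDescription
        {
            ApplicationName = new Uri("fabric:/MyApp"),       // assumed application
            ServiceName = new Uri("fabric:/MyApp/MyService"),
            ServiceTypeName = "MyServiceType",                // assumed service type
            HasPersistedState = true,
            MinReplicaSetSize = 2,
            TargetReplicaSetSize = 3,
            PartitionSchemeDescription = new SingletonPartitionSchemeDescription()
        };

        try
        {
            // Explicit timeout: if the Naming service cannot complete the operation in time,
            // the call fails instead of hanging, matching the slow-operation warning.
            await fabricClient.ServiceManager.CreateServiceAsync(
                description, TimeSpan.FromSeconds(30), CancellationToken.None);
        }
        catch (TimeoutException ex)
        {
            Console.WriteLine($"CreateService timed out: {ex.Message}");
        }
        catch (FabricException ex)
        {
            Console.WriteLine($"CreateService failed: {ex.ErrorCode} - {ex.Message}");
        }
    }
}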
The Service Fabric names are mapped to different partitions. If the operation completes with an error. which contains metadata about all Service Fabric names and services.fabric:/MyApp/MyService. Next steps: Check why the Naming operation fails.NamingService Property: Starts with prefix Duration_ and identifies the slow operation and the Service Fabric name on which the operation is applied. Each operation can have different root causes. It is a Service Fabric partitioned persisted service. If the operation completes successfully. the Warning is cleared. . The following example shows a create service operation. the same replica is primary for both the AO and NO roles. AO points to the role of the Naming partition for this name and operation. the property is Duration_AOCreateService. the operation is flagged with a Warning report on the primary replica of the Naming service partition that serves the operation. One of the partitions represents the Authority Owner. For example. For example. if create service at name fabric:/MyApp/MyService takes too long. Read more about Naming service. so the service is extensible. AO retries and sends work to NO. In this case. RemoveWhenExpired : True IsExpired : False Transitions : Error->Warning = 4/29/2016 8:39:38 PM. including the rollout version Next steps: If the application is unhealthy. RemoveWhenExpired : False IsExpired : False Transitions : Error->Ok = 4/29/2016 8:39:08 PM. HealthEvents : SourceId : System. Property='Duration_AOCreateService. Activation System.000.Hosting is the authority on deployed entities. Otherwise.677 is taking longer than 30. investigate why the activation failed.NamingService Property : Duration_AOCreateService.NamingService'. SourceId: System. ConsiderWarningAsError=false. HealthState='Warning'. LastWarning = 1/1/0001 12:00:00 AM SourceId : System. PartitionId : 00000000-0000-0000-0000-000000001000 ReplicaId : 131064359253133577 AggregatedHealthState : Warning UnhealthyEvaluations : Unhealthy event: SourceId='System. LastOk = 1/1/0001 12:00:00 AM SourceId : System.Hosting reports as OK when an application has been successfully activated on the node. The following example shows successful activation: . LastOk = 1/1/0001 12:00:00 AM DeployedApplication system health reports System.fabric:/MyApp/MyService HealthState : Warning SequenceNumber : 131064359526778775 SentAt : 4/29/2016 8:39:12 PM ReceivedAt : 4/29/2016 8:39:38 PM TTL : 00:05:00 Description : The AOCreateService started at 2016-04-29 20:39:08. RemoveWhenExpired : True IsExpired : False Transitions : Error->Warning = 4/29/2016 8:39:38 PM.Hosting Property: Activation.fabric:/MyApp/MyService'.000.689 completed with FABRIC_E_TIMEOUT in more than 30. it reports an error.fabric:/MyApp/MyService HealthState : Warning SequenceNumber : 131064360657607311 SentAt : 4/29/2016 8:41:05 PM ReceivedAt : 4/29/2016 8:41:08 PM TTL : 00:00:15 Description : The NOCreateService started at 2016-04-29 20:39:08.NamingService Property : Duration_NOCreateService.RA Property : State HealthState : Ok SequenceNumber : 131064359308715535 SentAt : 4/29/2016 8:38:50 PM ReceivedAt : 4/29/2016 8:39:08 PM TTL : Infinite Description : Replica has been created. 1 AggregatedHealthState : Ok DeployedServicePackageHealthStates : ServiceManifestName : WordCountServicePkg NodeName : Node. it reports a warning as configured. . 
It reports an error if the registration wasn't done in time (as configured by using ServiceTypeRegistrationTimeout).Hosting reports as OK if the service type has been registered successfully. DeployedServicePackage system health reports System. Otherwise.Hosting reports an error if the application package download fails. this is because the run time has been closed. If CodePackage fails to activate or terminates with an error greater than the configured CodePackageHealthErrorThreshold. CodePackageActivation:Code:SetupEntryPoint) Service type registration System.1 -ApplicationName fabric:/WordCount ApplicationName : fabric:/WordCount NodeName : Node. SourceId: System. Code package activation System. Hosting reports a warning. If the service type is unregistered from the node. If the activation fails. it reports an error.Hosting reports as OK if the service package activation on the node is successful. PS C:\> Get-ServiceFabricDeployedApplicationHealth -NodeName Node.Hosting Property: Download:*RolloutVersion* Next steps: Investigate why the download failed on the node.Hosting is the authority on deployed entities. Service package activation System. If a service package contains multiple code packages.1 AggregatedHealthState : Ok HealthEvents : SourceId : System.Hosting Property : Activation HealthState : Ok SequenceNumber : 130743727751144415 SentAt : 4/24/2015 6:12:55 PM ReceivedAt : 4/24/2015 6:13:03 PM TTL : Infinite Description : The application was activated successfully. hosting reports an error. SourceId: System.Hosting Property: Uses the prefix CodePackageActivation and contains the name of the code package and the entry point as CodePackageActivation:CodePackageName:*SetupEntryPoint/EntryPoint* (for example.Hosting reports as OK for each code package if the activation is successful.Hosting Property: Activation Next steps: Investigate why the activation failed. SourceId: System. an activation report is generated for each one. RemoveWhenExpired : False IsExpired : False Transitions : ->Ok = 4/24/2015 6:13:03 PM Download System. RemoveWhenExpired : False IsExpired : False Transitions : ->Ok = 4/24/2015 6:13:03 PM SourceId : System. SourceId: System. SourceId: System. RemoveWhenExpired : False IsExpired : False Transitions : ->Ok = 4/24/2015 6:13:03 PM SourceId : System. ServiceTypeRegistration:FileStoreServiceType) The following example shows a healthy deployed service package: PS C:\> Get-ServiceFabricDeployedServicePackageHealth -NodeName Node.Hosting Property : CodePackageActivation:Code:EntryPoint HealthState : Ok SequenceNumber : 130743727751613185 SentAt : 4/24/2015 6:12:55 PM ReceivedAt : 4/24/2015 6:13:03 PM TTL : Infinite Description : The CodePackage was activated successfully.Hosting Property : Activation HealthState : Ok SequenceNumber : 130743727751456915 SentAt : 4/24/2015 6:12:55 PM ReceivedAt : 4/24/2015 6:13:03 PM TTL : Infinite Description : The ServicePackage was activated successfully.Hosting Property : ServiceTypeRegistration:WordCountServiceType HealthState : Ok SequenceNumber : 130743727753644473 SentAt : 4/24/2015 6:12:55 PM ReceivedAt : 4/24/2015 6:13:03 PM TTL : Infinite Description : The ServiceType was registered successfully.1 AggregatedHealthState : Ok HealthEvents : SourceId : System.1 -ApplicationName fabric:/WordCount - ServiceManifestName WordCountServicePkg ApplicationName : fabric:/WordCount ServiceManifestName : WordCountServicePkg NodeName : Node. 
SourceId: System.Hosting reports an error if validation during the upgrade fails or if the upgrade fails on the node.Hosting Property: Uses the prefix FabricUpgradeValidation and contains the upgrade version . Upgrade validation System.Hosting reports an error if the service package download fails. RemoveWhenExpired : False IsExpired : False Transitions : ->Ok = 4/24/2015 6:13:03 PM Download System.Hosting Property: Download:*RolloutVersion* Next steps: Investigate why the download failed on the node.Hosting Property: Uses the prefix ServiceTypeRegistration and contains the service type name (for example. Description: Points to the error encountered Next steps View Service Fabric health reports How to report and check service health Monitor and diagnose services locally Service Fabric application upgrade . Because there are only five nodes. the service partition is yellow because of the system report. One of its services. <Service Name="WordCountService"> <StatefulService ServiceTypeName="WordCountServiceType" TargetReplicaSetSize="7" MinReplicaSetSize="2"> <UniformInt64Partition PartitionCount="1" LowKey="1" HighKey="26" /> </StatefulService> </Service> Health in Service Fabric Explorer Service Fabric Explorer provides a visual view of the cluster. Service Fabric provides multiple ways to get the aggregated health of the entities: Service Fabric Explorer or other visualization tools Health queries (through PowerShell. as there are only five nodes. Since the service is configured with seven replicas. some other applications are deployed. In the image below. the system components show a warning that the partition is below the target count. They are strict policies and do not tolerate any failure. Next to the fabric:/System application (which exists out of the box). Although it's not shown here. fabric:/WordCount/WordCountService is yellow (in warning). or REST) General queries that return a list of entities that have health as one of the properties (through PowerShell. or REST) To demonstrate these options. let's use a local cluster with five nodes. the cluster is populated with health reports sent by the system components. The cluster is red because of the red application. The evaluation uses default policies from the cluster manifest and application manifest. you can see that: The application fabric:/WordCount is red (in error) because it has an error event reported by MyWatchdog for the property Availability. One of these applications is fabric:/WordCount. they can't all be placed. View Service Fabric health reports 4/17/2017 • 27 min to read • Edit Online Azure Service Fabric introduces a health model that comprises health entities on which system components and watchdogs can report local conditions that they are monitoring. the API. the API. This application contains a stateful service configured with seven replicas. The health store aggregates all health data to determine whether entities are healthy. View of the cluster with Service Fabric Explorer: . Out of the box. Read more at Use system health reports to troubleshoot. The yellow partition triggers the yellow service. PowerShell cmdlets. child health states (when applicable). NOTE Read more about Service Fabric Explorer. NOTE A health entity is returned when it is fully populated in the health store. Computed by the health store based on entity health reports. To get complete health for a child. call the query health for the child entity type and pass in the child identifier. 
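An error event such as the MyWatchdog report on the Availability property comes from a custom reporter rather than a system component. The following is a minimal sketch of how such a report could be sent, assuming a watchdog process with access to a FabricClient; the source and property names are simply the ones used in this example.

using System;
using System.Fabric;
using System.Fabric.Health;

class WatchdogReporter
{
    static void ReportAvailability(FabricClient fabricClient, bool healthy)
    {
        // Health information is identified by (SourceId, Property); a later report
        // from the same source and property replaces the earlier one in the health store.
        var info = new HealthInformation(
            "MyWatchdog",
            "Availability",
            healthy ? HealthState.Ok : HealthState.Error)
        {
            Description = healthy ? "Availability checked successfully." : "Availability check failed.",
            TimeToLive = TimeSpan.FromMinutes(5),
            RemoveWhenExpired = false
        };

        // Report against the application entity; an error rolls up to the cluster,
        // which is why fabric:/WordCount and the cluster show as red in Explorer.
        fabricClient.HealthManager.ReportHealth(
            new ApplicationHealthReport(new Uri("fabric:/WordCount"), info));
    }
}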
The queries also accept filters for returning only partial children or events--the ones that respect the specified filters. entity health events. They can be accessed through the API (the methods can be found on FabricClient. An entity's health contains: The aggregated health state of the entity. If no health policies are specified. NOTE The output filters are applied on the server side. The entity must be active (not deleted) and have a system report. The queries accept optional health policy parameters.HealthManager). Its parent entities on the hierarchy chain must also have system reports. the health queries return an exception that shows why the entity is not returned. Read more about entity health evaluation. The health events on the entity. Health queries Service Fabric exposes health queries for each of the supported entity types. These queries return complete health information about the entity: the aggregated health state. The health states contain entity identifiers and the aggregated health state. . If any of these conditions is not satisfied. and REST. so the message reply size is reduced. rather than apply filters on the client side. The collection of health states of all children for the entities that can have children. and unhealthy evaluations when the entity is not healthy. if the entity is not healthy. the health policies from the cluster or application manifest are used for evaluation. We recommended that you use the output filters to limit the data returned. which depends on the entity type. child health states (when applicable). The health queries must pass in the entity identifier. and health policies. The unhealthy evaluations that point to the report that triggered the state of the entity. or both warnings and errors). PowerShell The cmdlet to get the cluster health is Get-ServiceFabricClusterHealth. The following code gets the cluster health by using a custom cluster health policy and filters for nodes and applications. var applicationsFilter = new ApplicationHealthStatesFilter() { HealthStateFilterValue = HealthStateFilter.Warning }.GetClusterHealthAsync(queryDescription). ClusterHealth clusterHealth = await fabricClient. create a FabricClient and call the GetClusterHealthAsync method on its HealthManager. }. Input: [Optional] The cluster health policy used to evaluate the nodes and the cluster events.HealthManager. and fabric:/WordCount configured as described. connect to the cluster by using the Connect-ServiceFabricCluster cmdlet. errors only. var policy = new ClusterHealthPolicy() { MaxPercentUnhealthyNodes = 20 }. nodes. All events. [Optional] The application health policy map.Error }. and applications that specify which entries are of interest and should be returned in the result (for example. ApplicationsFilter = applicationsFilter. var queryDescription = new ClusterHealthQueryDescription() { HealthPolicy = policy. The following cmdlet gets cluster health by using default health policies.Get cluster health Returns the health of the cluster entity and contains the health states of applications and nodes (children of the cluster). with the health policies used to override the application manifest policies. The following call gets the cluster health: ClusterHealth clusterHealth = await fabricClient. the system application. NodesFilter = nodesFilter. and applications are used to evaluate the entity aggregated health. First. regardless of the filter. nodes.HealthManager.Error | HealthStateFilter. 
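Because an entity is returned only after the health store has received system reports for it and its parents, a query issued immediately after an entity is created can fail with an exception that explains why the entity is not available yet. The sketch below is illustrative only and assumes the error surfaces as a FabricException; the retry policy shown is just an example.

using System;
using System.Fabric;
using System.Fabric.Health;
using System.Threading.Tasks;

class HealthQueryWithRetry
{
    static async Task<ServiceHealth> GetServiceHealthAsync(FabricClient fabricClient, Uri serviceName)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                return await fabricClient.HealthManager.GetServiceHealthAsync(serviceName);
            }
            catch (FabricException ex) when (attempt < 5)
            {
                // The exception describes why the entity was not returned, for example
                // because the health store does not yet have a system report for it.
                Console.WriteLine($"Health query failed ({ex.ErrorCode}), retrying: {ex.Message}");
                await Task.Delay(TimeSpan.FromSeconds(2));
            }
        }
    }
}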
Note how the unhealthy evaluations provide details on the conditions that triggered the aggregated health. which contains the input information.GetClusterHealthAsync(). API To get cluster health. var nodesFilter = new NodeHealthStatesFilter() { HealthStateFilterValue = HealthStateFilter. The aggregated health state is in warning. The state of the cluster is five nodes. . [Optional] Filters for events. It creates ClusterHealthQueryDescription. because the fabric:/WordCount application is in warning. AggregatedHealthState='Warning'. Property='ServiceReplicaUnplacedHealth_Secondary_a1f83a35-d6bf-4d39-b90d- 28d15f39599b'. As a result. Unhealthy event: SourceId='System. no nodes are returned. the application is evaluated as in error. Unhealthy services: 100% (1/1). and so is the cluster. NodeHealthStates : NodeName : _Node_2 AggregatedHealthState : Ok NodeName : _Node_0 AggregatedHealthState : Ok NodeName : _Node_1 AggregatedHealthState : Ok NodeName : _Node_3 AggregatedHealthState : Ok NodeName : _Node_4 AggregatedHealthState : Ok ApplicationHealthStates : ApplicationName : fabric:/System AggregatedHealthState : Ok ApplicationName : fabric:/WordCount AggregatedHealthState : Warning HealthEvents : None The following PowerShell cmdlet gets the health of the cluster by using a custom application policy. ConsiderWarningAsError=false. . PS C:\> Get-ServiceFabricClusterHealth AggregatedHealthState : Warning UnhealthyEvaluations : Unhealthy applications: 100% (1/1). Unhealthy service: ServiceName='fabric:/WordCount/WordCountService'. Unhealthy application: ApplicationName='fabric:/WordCount'. ServiceType='WordCountServiceType'.PLB'. AggregatedHealthState='Warning'. MaxPercentUnhealthyApplications=0%. HealthState='Warning'. as they are all healthy. Because the custom policy specifies to consider warnings as errors for the fabric:/WordCount application. MaxPercentUnhealthyServices=0%. It filters results to get only error or warning applications and nodes. Only the fabric:/WordCount application respects the applications filter. Fabric.Uri -ArgumentList "fabric:/WordCount" $appHealthPolicyMap.Health. Input: [Required] The node name that identifies the node. Unhealthy application: ApplicationName='fabric:/WordCount'. PS c:\> $appHealthPolicy = New-Object -TypeName System. Unhealthy services: 100% (1/1). Unhealthy service: ServiceName='fabric:/WordCount/WordCountService'. The following code gets the node health for the specified node name: NodeHealth nodeHealth = await fabricClient.ApplicationHealthPolicyMap $appUri1 = New-Object -TypeName System. ConsiderWarningAsError=true.ConsiderWarningAsError = $true $appHealthPolicyMap = New-Object -TypeName System.GetNodeHealthAsync(nodeName). AggregatedHealthState='Error'. Get node health Returns the health of a node entity and contains the health events reported on the node.HealthManager. All events are used to evaluate the entity aggregated health.Add($appUri1. API To get node health through the API. HealthState='Warning'.Error" AggregatedHealthState : Error UnhealthyEvaluations : Unhealthy applications: 100% (1/1). errors only. create a FabricClient and call the GetNodeHealthAsync method on its HealthManager. [Optional] Filters for events that specify which entries are of interest and should be returned in the result (for example.ApplicationHealthPolicy $appHealthPolicy.Fabric. $appHealthPolicy) Get-ServiceFabricClusterHealth -ApplicationHealthPolicyMap $appHealthPolicyMap -ApplicationsFilter "Warning. MaxPercentUnhealthyApplications=0%. 
Property='ServiceReplicaUnplacedHealth_Secondary_a1f83a35-d6bf-4d39-b90d- 28d15f39599b'. ServiceType='WordCountServiceType'. regardless of the filter. AggregatedHealthState='Error'. or both warnings and errors).PLB'. Unhealthy event: SourceId='System. NodeHealthStates : None ApplicationHealthStates : ApplicationName : fabric:/WordCount AggregatedHealthState : Error HealthEvents : None REST You can get cluster health with a GET request or a POST request that includes health policies described in the body. MaxPercentUnhealthyServices=0%.Error" -NodesFilter "Warning. The following code gets the node health for the specified node name and passes in events filter and custom policy through NodeHealthQueryDescription: .Health. [Optional] The cluster health policy settings used to evaluate health. connect to the cluster by using the Connect-ServiceFabricCluster cmdlet. It contains the health states of the deployed application and service children. PowerShell The cmdlet to get the node health is Get-ServiceFabricNodeHealth.Warning }. NodeHealth nodeHealth = await fabricClient. var queryDescription = new NodeHealthQueryDescription(nodeName) { HealthPolicy = new ClusterHealthPolicy() { ConsiderWarningAsError = true }. [Optional] The application health policy used to override the application manifest policies. regardless of the filter. and deployed applications that specify which entries are of interest and should be returned in the result (for example. The following cmdlet gets the node health by using default health policies: PS C:\> Get-ServiceFabricNodeHealth _Node_1 NodeName : _Node_1 AggregatedHealthState : Ok HealthEvents : SourceId : System. All events.GetNodeHealthAsync(queryDescription). AggregatedHealthState | ft - AutoSize NodeName AggregatedHealthState -------. EventsFilter = new HealthEventsFilter() { HealthStateFilterValue = HealthStateFilter.HealthManager. Input: [Required] The application name (URI) that identifies the application. services.--------------------- _Node_2 Ok _Node_0 Ok _Node_1 Ok _Node_3 Ok _Node_4 Ok REST You can get node health with a GET request or a POST request that includes health policies described in the body. LastWarning = 1/1/0001 12:00:00 AM The following cmdlet gets the health of all nodes in the cluster: PS C:\> Get-ServiceFabricNode | Get-ServiceFabricNodeHealth | select NodeName. errors only. and deployed applications are used to evaluate the entity aggregated health. Get application health Returns the health of an application entity. services.FM Property : State HealthState : Ok SequenceNumber : 6 SentAt : 3/22/2016 7:47:56 PM ReceivedAt : 3/22/2016 7:48:19 PM TTL : Infinite Description : Fabric node is up. }. . [Optional] Filters for events. RemoveWhenExpired : False IsExpired : False Transitions : Error->Ok = 3/22/2016 7:48:19 PM. or both warnings and errors). First. Unhealthy service: ServiceName='fabric:/WordCount/WordCountService'. . MaxPercentUnhealthyServices=0%. Unhealthy event: SourceId='System. ApplicationHealth applicationHealth = await fabricClient.Error | HealthStateFilter. }. MaxPercentUnhealthyReplicasPerPartition = 5.PLB'. HealthState='Warning'. The following cmdlet returns the health of the fabric:/WordCount application: PS c:\> PS C:\WINDOWS\system32> Get-ServiceFabricApplicationHealth fabric:/WordCount ApplicationName : fabric:/WordCount AggregatedHealthState : Warning UnhealthyEvaluations : Unhealthy services: 100% (1/1). ServicesFilter = new ServiceHealthStatesFilter() { HealthStateFilterValue = warningAndErrors }. 
Property='ServiceReplicaUnplacedHealth_Secondary_a1f83a35-d6bf-4d39-b90d- 28d15f39599b'. var queryDescription = new ApplicationHealthQueryDescription(applicationName) { HealthPolicy = policy. DefaultServiceTypeHealthPolicy = serviceTypePolicy. create a FabricClient and call the GetApplicationHealthAsync method on its HealthManager.GetApplicationHealthAsync(applicationName). MaxPercentUnhealthyDeployedApplications = 0. AggregatedHealthState='Warning'. var policy = new ApplicationHealthPolicy() { ConsiderWarningAsError = false.GetApplicationHealthAsync(queryDescription).API To get application health. The following code gets the application health for the specified application name (URI): ApplicationHealth applicationHealth = await fabricClient. The following code gets the application health for the specified application name (URI). EventsFilter = new HealthEventsFilter() { HealthStateFilterValue = warningAndErrors }. First. ConsiderWarningAsError=false. with filters and custom policies specified via ApplicationHealthQueryDescription. ServiceType='WordCountServiceType'. HealthStateFilter warningAndErrors = HealthStateFilter. }. PowerShell The cmdlet to get the application health is Get-ServiceFabricApplicationHealth. DeployedApplicationsFilter = new DeployedApplicationHealthStatesFilter() { HealthStateFilterValue = warningAndErrors }.HealthManager.HealthManager. MaxPercentUnhealthyServices = 0. }. var serviceTypePolicy = new ServiceTypeHealthPolicy() { MaxPercentUnhealthyPartitionsPerService = 0. connect to the cluster by using the Connect-ServiceFabricCluster cmdlet.Warning. LastWarning = 1/1/0001 12:00:00 AM The following PowerShell cmdlet passes in custom policies. LastWarning = 1/1/0001 12:00:00 AM SourceId : MyWatchdog Property : Availability HealthState : Ok SequenceNumber : 131031545225930951 SentAt : 3/22/2016 9:08:42 PM ReceivedAt : 3/22/2016 9:08:42 PM TTL : Infinite Description : Availability checked successfully. RemoveWhenExpired : False IsExpired : False Transitions : Error->Ok = 3/22/2016 7:56:53 PM.CM Property : State HealthState : Ok SequenceNumber : 360 SentAt : 3/22/2016 7:56:53 PM ReceivedAt : 3/22/2016 7:56:53 PM TTL : Infinite Description : Application has been created. ServiceHealthStates : ServiceName : fabric:/WordCount/WordCountService AggregatedHealthState : Warning ServiceName : fabric:/WordCount/WordCountWebService AggregatedHealthState : Ok DeployedApplicationHealthStates : ApplicationName : fabric:/WordCount NodeName : _Node_0 AggregatedHealthState : Ok ApplicationName : fabric:/WordCount NodeName : _Node_2 AggregatedHealthState : Ok ApplicationName : fabric:/WordCount NodeName : _Node_3 AggregatedHealthState : Ok ApplicationName : fabric:/WordCount NodeName : _Node_4 AggregatedHealthState : Ok ApplicationName : fabric:/WordCount NodeName : _Node_1 AggregatedHealthState : Ok HealthEvents : SourceId : System. latency ok RemoveWhenExpired : False IsExpired : False Transitions : Error->Ok = 3/22/2016 8:55:39 PM. It also filters children and events. . The following code gets the service health for the specified service name (URI).GetServiceHealthAsync(queryDescription).HealthManager. Property='ServiceReplicaUnplacedHealth_Secondary_a1f83a35-d6bf-4d39-b90d- 28d15f39599b'. API To get service health through the API. create a FabricClient and call the GetServiceHealthAsync method on its HealthManager. ConsiderWarningAsError=true. ServiceType='WordCountServiceType'. regardless of the filter.GetServiceHealthAsync(serviceName). 
[Optional] Filters for events and partitions that specify which entries are of interest and should be returned in the result (for example.All }. HealthState='Warning'. The following example gets the health of a service with specified service name (URI): ServiceHealth serviceHealth = await fabricClient. }. Input: [Required] The service name (URI) that identifies the service.HealthManager. [Optional] The application health policy used to override the application manifest policy. PartitionsFilter = new PartitionHealthStatesFilter() { HealthStateFilterValue = HealthStateFilter. ServiceHealth serviceHealth = await fabricClient. Unhealthy service: ServiceName='fabric:/WordCount/WordCountService'. PS C:\> Get-ServiceFabricApplicationHealth -ApplicationName fabric:/WordCount -ConsiderWarningAsError $true - ServicesFilter Error -EventsFilter Error -DeployedApplicationsFilter Error ApplicationName : fabric:/WordCount AggregatedHealthState : Error UnhealthyEvaluations : Unhealthy services: 100% (1/1). or both warnings and errors). errors only. specifying filters and custom policy via ServiceHealthQueryDescription: var queryDescription = new ServiceHealthQueryDescription(serviceName) { EventsFilter = new HealthEventsFilter() { HealthStateFilterValue = HealthStateFilter. MaxPercentUnhealthyServices=0%.Error }. ServiceHealthStates : ServiceName : fabric:/WordCount/WordCountService AggregatedHealthState : Error DeployedApplicationHealthStates : None HealthEvents : None REST You can get application health with a GET request or a POST request that includes health policies described in the body. It contains the partition health states.PLB'. Unhealthy event: SourceId='System. Get service health Returns the health of a service entity. PowerShell . AggregatedHealthState='Error'. All events and partitions are used to evaluate the entity aggregated health. Nodes Eliminated By Constraints: ReplicaExclusionStatic: FaultDomain:fd:/0 NodeName:_Node_0 NodeType:NodeType0 UpgradeDomain:0 UpgradeDomain: ud:/0 Deactivation Intent/Status: None/None FaultDomain:fd:/1 NodeName:_Node_1 NodeType:NodeType1 UpgradeDomain:1 UpgradeDomain: ud:/1 Deactivation Intent/Status: None/None FaultDomain:fd:/3 NodeName:_Node_3 NodeType:NodeType3 UpgradeDomain:3 UpgradeDomain: ud:/3 Deactivation Intent/Status: None/None .PLB Property : ServiceReplicaUnplacedHealth_Secondary_a1f83a35-d6bf-4d39- b90d-28d15f39599b HealthState : Warning SequenceNumber : 131031547693687021 SentAt : 3/22/2016 9:12:49 PM ReceivedAt : 3/22/2016 9:12:49 PM TTL : 00:01:05 Description : The Load Balancer was unable to find a placement for one or more of the Service's Replicas: fabric:/WordCount/WordCountService Secondary Partition a1f83a35-d6bf-4d39-b90d- 28d15f39599b could not be placed.PLB'. ReplicaExclusionDynamic eliminated 1 possible node(s) for placement -.0/5 node(s) remain. connect to the cluster by using the Connect-ServiceFabricCluster cmdlet. ConsiderWarningAsError=false. The following cmdlet gets the service health by using default health policies: PS C:\> Get-ServiceFabricServiceHealth -ServiceName fabric:/WordCount/WordCountService ServiceName : fabric:/WordCount/WordCountService AggregatedHealthState : Warning UnhealthyEvaluations : Unhealthy event: SourceId='System. HealthState='Warning'.1/5 node(s) remain. RemoveWhenExpired : False IsExpired : False Transitions : Error->Ok = 3/22/2016 7:57:18 PM. Property='ServiceReplicaUnplacedHealth_Secondary_a1f83a35-d6bf-4d39-b90d- 28d15f39599b'. possibly. 
due to the following constraints and properties: Placement Constraint: N/A Depended Service: N/A Constraint Elimination Sequence: ReplicaExclusionStatic eliminated 4 possible node(s) for placement -. LastWarning = 1/1/0001 12:00:00 AM SourceId : System. PartitionHealthStates : PartitionId : a1f83a35-d6bf-4d39-b90d-28d15f39599b AggregatedHealthState : Warning HealthEvents : SourceId : System. First.FM Property : State HealthState : Ok SequenceNumber : 10 SentAt : 3/22/2016 7:56:53 PM ReceivedAt : 3/22/2016 7:57:18 PM TTL : Infinite Description : Service has been created.The cmdlet to get the service health is Get-ServiceFabricServiceHealth. It contains the replica health states. PowerShell The cmdlet to get the partition health is Get-ServiceFabricPartitionHealth. Input: [Required] The partition ID (GUID) that identifies the partition.HealthManager. All events and replicas are used to evaluate the entity aggregated health. PartitionHealth partitionHealth = await fabricClient. [Optional] Filters for events and replicas that specify which entries are of interest and should be returned in the result (for example. regardless of the filter. create PartitionHealthQueryDescription. API To get partition health through the API. To specify optional parameters. Get partition health Returns the health of a partition entity. or both warnings and errors). connect to the cluster by using the Connect-ServiceFabricCluster cmdlet. [Optional] The application health policy used to override the application manifest policy. First. The following cmdlet gets the health for all partitions of the fabric:/WordCount/WordCountService service: . create a FabricClient and call the GetPartitionHealthAsync method on its HealthManager.GetPartitionHealthAsync(partitionId). errors only. None/None FaultDomain:fd:/4 NodeName:_Node_4 NodeType:NodeType4 UpgradeDomain:4 UpgradeDomain: ud:/4 Deactivation Intent/Status: None/None ReplicaExclusionDynamic: FaultDomain:fd:/2 NodeName:_Node_2 NodeType:NodeType2 UpgradeDomain:2 UpgradeDomain: ud:/2 Deactivation Intent/Status: None/None RemoveWhenExpired : True IsExpired : False REST You can get service health with a GET request or a POST request that includes health policies described in the body. HealthManager. [Optional] Filters for events that specify which entries are of interest and should be returned in the result (for example. or both warnings and errors). [Optional] The application health policy parameters used to override the application manifest policies. API To get the replica health through the API. All events are used to evaluate the entity aggregated health. ReplicaHealth replicaHealth = await fabricClient. Get replica health Returns the health of a stateful service replica or a stateless service instance. PowerShell . Input: [Required] The partition ID (GUID) and replica ID that identifies the replica. LastOk = 1/1/0001 12:00:00 AM REST You can get partition health with a GET request or a POST request that includes health policies described in the body. To specify advanced parameters. create a FabricClient and call the GetReplicaHealthAsync method on its HealthManager.FM Property : State HealthState : Warning SequenceNumber : 76 SentAt : 3/22/2016 7:57:26 PM ReceivedAt : 3/22/2016 7:57:48 PM TTL : Infinite Description : Partition is below target replica or instance count. use ReplicaHealthQueryDescription. 
ReplicaHealthStates : ReplicaId : 131031502143040223 AggregatedHealthState : Ok ReplicaId : 131031502346844060 AggregatedHealthState : Ok ReplicaId : 131031502346844059 AggregatedHealthState : Ok ReplicaId : 131031502346844061 AggregatedHealthState : Ok ReplicaId : 131031502346844058 AggregatedHealthState : Ok HealthEvents : SourceId : System. HealthState='Warning'. PS C:\> Get-ServiceFabricPartition fabric:/WordCount/WordCountService | Get-ServiceFabricPartitionHealth PartitionId : a1f83a35-d6bf-4d39-b90d-28d15f39599b AggregatedHealthState : Warning UnhealthyEvaluations : Unhealthy event: SourceId='System. errors only. Property='State'.GetReplicaHealthAsync(partitionId. RemoveWhenExpired : False IsExpired : False Transitions : Error->Warning = 3/22/2016 7:57:48 PM. regardless of the filter. ConsiderWarningAsError=false. replicaId).FM'. . First. connect to the cluster by using the Connect-ServiceFabricCluster cmdlet.ReplicaRole -eq "Primary"} | Get-ServiceFabricReplicaHealth PartitionId : a1f83a35-d6bf-4d39-b90d-28d15f39599b ReplicaId : 131031502143040223 AggregatedHealthState : Ok HealthEvents : SourceId : System. First. nodeName)).GetDeployedApplicationHealthAsync( new DeployedApplicationHealthQueryDescription(applicationName. To specify optional parameters. RemoveWhenExpired : False IsExpired : False Transitions : Error->Ok = 3/22/2016 7:57:12 PM. create a FabricClient and call the GetDeployedApplicationHealthAsync method on its HealthManager. Input: [Required] The application name (URI) and node name (string) that identify the deployed application. [Optional] The application health policy used to override the application manifest policies.HealthManager. LastWarning = 1/1/0001 12:00:00 AM REST You can get replica health with a GET request or a POST request that includes health policies described in the body. All events and deployed service packages are used to evaluate the entity aggregated health. Get deployed application health Returns the health of an application deployed on a node entity. DeployedApplicationHealth health = await fabricClient. The following cmdlet gets the health of the primary replica for all partitions of the service: PS C:\> Get-ServiceFabricPartition fabric:/WordCount/WordCountService | Get-ServiceFabricReplica | where {$_. API To get the health of an application deployed on a node through the API.The cmdlet to get the replica health is Get-ServiceFabricReplicaHealth.RA Property : State HealthState : Ok SequenceNumber : 131031502145556748 SentAt : 3/22/2016 7:56:54 PM ReceivedAt : 3/22/2016 7:57:12 PM TTL : Infinite Description : Replica has been created. run Get-ServiceFabricApplicationHealth and look at the deployed application children. errors only. connect to the cluster by using the Connect-ServiceFabricCluster cmdlet. It contains the deployed service package health states. [Optional] Filters for events and deployed service packages that specify which entries are of interest and should be returned in the result (for example. use DeployedApplicationHealthQueryDescription. To find out where an application is deployed. or both warnings and errors). PowerShell The cmdlet to get the deployed application health is Get-ServiceFabricDeployedApplicationHealth. The following cmdlet gets the health of the fabric:/WordCount application deployed on _Node_2. regardless of the filter. PowerShell The cmdlet to get the deployed service package health is Get-ServiceFabricDeployedServicePackageHealth. connect to the cluster by using the Connect-ServiceFabricCluster cmdlet. 
Get deployed service package health Returns the health of a deployed service package entity.GetDeployedServicePackageHealthAsync( new DeployedServicePackageHealthQueryDescription(applicationName. To see which service packages are in an application. serviceManifestName)).HealthManager. nodeName. node name (string).Hosting Property : Activation HealthState : Ok SequenceNumber : 131031502143710698 SentAt : 3/22/2016 7:56:54 PM ReceivedAt : 3/22/2016 7:57:12 PM TTL : Infinite Description : The application was activated successfully. use DeployedServicePackageHealthQueryDescription. DeployedServicePackageHealth health = await fabricClient. and service manifest name (string) that identify the deployed service package. or both warnings and errors). To specify optional parameters. PS C:\> Get-ServiceFabricDeployedApplicationHealth -ApplicationName fabric:/WordCount -NodeName _Node_2 ApplicationName : fabric:/WordCount NodeName : _Node_2 AggregatedHealthState : Ok DeployedServicePackageHealthStates : ServiceManifestName : WordCountServicePkg NodeName : _Node_2 AggregatedHealthState : Ok ServiceManifestName : WordCountWebServicePkg NodeName : _Node_2 AggregatedHealthState : Ok HealthEvents : SourceId : System. run Get-ServiceFabricApplicationHealth and look at the deployed applications. [Optional] Filters for events that specify which entries are of interest and should be returned in the result (for example. RemoveWhenExpired : False IsExpired : False Transitions : Error->Ok = 3/22/2016 7:57:12 PM. errors only. look at the deployed service package children in the Get- . API To get the health of a deployed service package through the API. Input: [Required] The application name (URI). All events are used to evaluate the entity aggregated health. [Optional] The application health policy used to override the application manifest policy. To see where an application is deployed. First. LastWarning = 1/1/0001 12:00:00 AM REST You can get deployed application health with a GET request or a POST request that includes health policies described in the body. create a FabricClient and call the GetDeployedServicePackageHealthAsync method on its HealthManager. regardless of the filter. LastWarning = 1/1/0001 12:00:00 AM SourceId : System. LastWarning = 1/1/0001 12:00:00 AM SourceId : System. as opposed to health . LastWarning = 1/1/0001 12:00:00 AM REST You can get deployed service package health with a GET request or a POST request that includes health policies described in the body. The entity has System.Hosting Property : CodePackageActivation:Code:EntryPoint HealthState : Ok SequenceNumber : 131031502301568982 SentAt : 3/22/2016 7:57:10 PM ReceivedAt : 3/22/2016 7:57:12 PM TTL : Infinite Description : The CodePackage was activated successfully. The following cmdlet gets the health of the WordCountServicePkg service package of the fabric:/WordCount application deployed on _Node_2. By default.Hosting reports for successful service-package and entry-point activation. and successful service-type registration.Hosting Property : ServiceTypeRegistration:WordCountServiceType HealthState : Ok SequenceNumber : 131031502314788519 SentAt : 3/22/2016 7:57:11 PM ReceivedAt : 3/22/2016 7:57:12 PM TTL : Infinite Description : The ServiceType was registered successfully. Health chunk queries The health chunk queries can return multi-level cluster children (recursively). It supports advanced filters that allow much flexibility to express which specific children to be returned. no children are included. 
RemoveWhenExpired : False IsExpired : False Transitions : Error->Ok = 3/22/2016 7:57:12 PM.ServiceFabricDeployedApplicationHealth output. identified by their unique identifier or other group identifier and/or health state. per input filters.Hosting Property : Activation HealthState : Ok SequenceNumber : 131031502301306211 SentAt : 3/22/2016 7:57:10 PM ReceivedAt : 3/22/2016 7:57:12 PM TTL : Infinite Description : The ServicePackage was activated successfully. RemoveWhenExpired : False IsExpired : False Transitions : Error->Ok = 3/22/2016 7:57:12 PM. PS C:\> Get-ServiceFabricDeployedApplication -ApplicationName fabric:/WordCount -NodeName _Node_2 | Get- ServiceFabricDeployedServicePackageHealth -ServiceManifestName WordCountServicePkg ApplicationName : fabric:/WordCount ServiceManifestName : WordCountServicePkg NodeName : _Node_2 AggregatedHealthState : Ok HealthEvents : SourceId : System. RemoveWhenExpired : False IsExpired : False Transitions : Error->Ok = 3/22/2016 7:57:12 PM. and for those applications include all services at warning|error. the children are not returned by default. Currently. [Optional] The application health policy map. services. [Optional] Filters for nodes and applications that specify which entries are of interest and should be returned in the result. include all partitions. Return only the health of 4 applications. to get the health of specific entities. and only replicas at error. If empty. . Return all applications. For returned services. API To get cluster health chunk. Cluster health chunk query Returns the health of the cluster entity and contains the hierarchical health state chunks of required children. To get the children of the children. The application filters can recursively specify advanced filters for children. For a specified service. all deployed applications on the specified node and all the deployed service packages on that node. minimizing the message size and the number of messages. with the health policies used to override the application manifest policies. create a FabricClient and call the GetClusterHealthChunkAsync method on its HealthManager. partitions. include all partitions. Read more about the filters at NodeHealthStateFilter and ApplicationHealthStateFilter. The value of the chunk query is that you can get health state for more cluster entities (potentially all cluster entities starting at required root) in one call. The chunk result includes the children that respect the filters. you must call additional health APIs for each entity of interest. You can express complex health query such as: Return only applications at error. The list of filters can contain one general filter and/or filters for specific identifiers to fine-grain entities returned by the query. The following code gets cluster health chunk with advanced filters. The health queries return only first-level children of the specified entity per required filters. which contains: The cluster aggregated health state. Same for the children of services and deployed applications. Similarly. the health chunk query is exposed only for the cluster entity. Return all replicas at error. That extra information can be obtained using the existing cluster health query. Each application health state chunk contains a chunk list with all services that respect input filters and a chunk list with all deployed applications that respect the filters. you must call one health API for each desired entity. in a hierarchical fashion. 
The health state chunk list of applications that respect input filters. the chunk query does not return unhealthy evaluations or entity events. The chunk query advanced filtering allows you to request multiple items of interest in one query. Returns all applications. You can pass in ClusterHealthQueryDescription to describe health policies and advanced filters. It returns a cluster health chunk. Return only the health of applications of a desired application type. The filters are specific to an entity/group of entities or are applicable to all entities at that level.commands that always include first-level children. This way. The health state chunk list of nodes that respect input filters. Returns all applications. all entities in the cluster can be potentially returned if requested. Input: [Optional] The cluster health policy used to evaluate the nodes and the cluster events. specified by their names. Currently. Return all deployed entities on a node. // Application filter: for specific application. The following code gets nodes only if they are in Error except for a specific node.All }. return no services except the ones of interest var wordCountApplicationFilter = new ApplicationHealthStateFilter() { // Always return fabric:/WordCount application ApplicationNameFilter = new Uri("fabric:/WordCount"). wordCountServicePartitionFilter.PartitionFilters. var result = await fabricClient. queryDescription.ServiceFilters. .Error }). // Return all replicas and all partitions var wordCountServicePartitionFilter = new PartitionHealthStateFilter() { HealthStateFilter = HealthStateFilter. wordCountServiceFilter. // Return all replicas var wordCountServiceReplicaFilter = new ReplicaHealthStateFilter() { HealthStateFilter = HealthStateFilter.Add(wordCountApplicationFilter).Add(wordCountServiceReplicaFilter). wordCountApplicationFilter. var queryDescription = new ClusterHealthChunkQueryDescription().ApplicationFilters.All }.ReplicaFilters. // For specific service.GetClusterHealthChunkAsync(queryDescription). queryDescription. which should always be returned. First.HealthManager.Add(wordCountServiceFilter).ApplicationFilters.Add(wordCountServicePartitionFilter). }. PowerShell The cmdlet to get the cluster health is Get-ServiceFabricClusterChunkHealth. connect to the cluster by using the Connect-ServiceFabricCluster cmdlet. return all partitions and all replicas var wordCountServiceFilter = new ServiceHealthStateFilter() { ServiceNameFilter = new Uri("fabric:/WordCount/WordCountService").Add(new ApplicationHealthStateFilter() { // Return applications only if they are in error HealthStateFilter = HealthStateFilter. }. Fabric.Fabric.Collections. .NodeHealthStateFilter] $nodeFilters. $allFilter = [System.Health.Health.Generic.Fabric.HealthStateFilter=$allFilter} # Create node filter list that will be passed in the cmdlet $nodeFilters = New-Object System.Add($nodeFilter2) Get-ServiceFabricClusterHealthChunk -NodeFilters $nodeFilters HealthState : Error NodeHealthStateChunks : TotalCount : 1 NodeName : _Node_1 HealthState : Ok ApplicationHealthStateChunks : None The following cmdlet gets cluster chunk with application filters.List[System.Health.Add($nodeFilter1) $nodeFilters.Fabric.Fabric.HealthStateFilter]::All.Health.NodeHealthStateFilter -Property @{NodeNameFilter="_Node_1".NodeHealthStateFilter -Property @{HealthStateFilter=$errorFilter} $nodeFilter2 = New-Object System.HealthStateFilter]::Error. PS C:\> $errorFilter = [System. $nodeFilter1 = New-Object System.Health. 
Add($svcFilter2) $appFilters = New-Object System.Add($replicaFilter) # For WordCountService.ApplicationHealthStateFilter -Property @{ApplicationNameFilter="fabric:/WordCount"} $appFilter.Collections.PartitionHealthStateFilter -Property @{HealthStateFilter=$allFilter} $partitionFilter.Fabric. # All replicas $replicaFilter = New-Object System.Fabric.ServiceFilters.Fabric.Health.Health.ApplicationHealthStateFilter] $appFilters.Fabric.HealthStateFilter]::Error.ReplicaHealthStateFilter -Property @{HealthStateFilter=$allFilter} # All partitions $partitionFilter = New-Object System.Health.HealthStateFilter]::All.Fabric.List[System. return all partitions and all replicas $svcFilter1 = New-Object System.ReplicaFilters.PartitionFilters.Health.Add($partitionFilter) $svcFilter2 = New-Object System.Health.Health.ServiceHealthStateFilter -Property @{ServiceNameFilter="fabric:/WordCount/WordCountService"} $svcFilter1.Fabric.ServiceFilters.Health.Fabric.Health.ServiceHealthStateFilter -Property @{HealthStateFilter=$errorFilter} $appFilter = New-Object System.Fabric.Generic. $allFilter = [System.Add($svcFilter1) $appFilter.Add($appFilter) Get-ServiceFabricClusterHealthChunk -ApplicationFilters $appFilters HealthState : Error NodeHealthStateChunks : None ApplicationHealthStateChunks : TotalCount : 1 ApplicationName : fabric:/WordCount ApplicationTypeName : WordCount HealthState : Error ServiceHealthStateChunks : TotalCount : 1 ServiceName : fabric:/WordCount/WordCountService HealthState : Error PartitionHealthStateChunks : TotalCount : 1 PartitionId : a1f83a35-d6bf-4d39-b90d-28d15f39599b HealthState : Error ReplicaHealthStateChunks : TotalCount : 5 ReplicaOrInstanceId : 131031502143040223 HealthState : Ok ReplicaOrInstanceId : 131031502346844060 HealthState : Ok ReplicaOrInstanceId : 131031502346844059 HealthState : Ok ReplicaOrInstanceId : 131031502346844061 HealthState : Ok ReplicaOrInstanceId : 131031502346844058 HealthState : Error .$errorFilter = [System. Fabric.Health.ApplicationHealthStateFilter] $appFilters.HealthStateFilter]::Error.HealthStateFilter]::All.Fabric. $errorFilter = [System.Fabric.Fabric.Fabric.Add($appFilter) Get-ServiceFabricClusterHealthChunk -ApplicationFilters $appFilters HealthState : Error NodeHealthStateChunks : None ApplicationHealthStateChunks : TotalCount : 2 ApplicationName : fabric:/System HealthState : Ok DeployedApplicationHealthStateChunks : TotalCount : 1 NodeName : _Node_2 HealthState : Ok DeployedServicePackageHealthStateChunks : TotalCount : 1 ServiceManifestName : FAS HealthState : Ok ApplicationName : fabric:/WordCount ApplicationTypeName : WordCount HealthState : Error DeployedApplicationHealthStateChunks : TotalCount : 1 NodeName : _Node_2 HealthState : Ok DeployedServicePackageHealthStateChunks : TotalCount : 2 ServiceManifestName : WordCountServicePkg HealthState : Ok ServiceManifestName : WordCountWebServicePkg HealthState : Ok REST You can get cluster health chunk with a GET request or a POST request that includes health policies and advanced filters described in the body.List[System.Add($dspFilter) $appFilter = New-Object System. $allFilter = [System.DeployedServicePackageHealthStateFilter -Property @{HealthStateFilter=$allFilter} $daFilter = New-Object System.NodeNameFilter="_Node_2"} $daFilter. They are exposed through the API (via the .DeployedApplicationHealthStateFilter -Property @{HealthStateFilter=$allFilter.Fabric.ApplicationHealthStateFilter -Property @{HealthStateFilter=$allFilter} $appFilter. 
$dspFilter = New-Object System.Collections.Health.Health.Health.Generic.DeployedApplicationFilters.Add($daFilter) $appFilters = New-Object System. General queries General queries return a list of Service Fabric entities of a specified type.DeployedServicePackageFilters.The following cmdlet returns all deployed entities on a node.Health.Health. or the health store was throttled). It's also possible that a subquery to the health store wasn't successful (for example. These queries aggregate subqueries from multiple components. child health states. including events. If the results do not fit a message. which populates the aggregated health state for each query result. API: FabricClient.GetServiceListAsync PowerShell: Get-ServiceFabricService Partition list: Returns the list of partitions in a service (paged). it's possible that the health store doesn't have complete data about the entity. NOTE General queries return the aggregated health state of the entity and do not contain rich health data. API: FabricClient. only a page is returned and a ContinuationToken that tracks where enumeration stopped. It may also give you more details from the health store about why the entity is not exposed.GetPartitionListAsync PowerShell: Get-ServiceFabricPartition Replica list: Returns the list of replicas in a partition (paged).QueryClient. PowerShell cmdlets.GetReplicaListAsync PowerShell: Get-ServiceFabricReplica Deployed application list: Returns the list of deployed applications on a node. If an entity is not healthy. Examples The following code gets the unhealthy applications in the cluster: . this follow-up query may succeed.GetNodeListAsync PowerShell: Get-ServiceFabricNode Application list: Returns the list of applications in the cluster (paged). API: FabricClient. The queries that contain HealthState for entities are: Node list: Returns the list nodes in the cluster (paged). API: FabricClient. you can follow up with health queries to get all its health information. If general queries return an unknown health state for an entity.GetApplicationListAsync PowerShell: Get-ServiceFabricApplication Service list: Returns the list of services in an application (paged).GetDeployedApplicationListAsync PowerShell: Get-ServiceFabricDeployedApplication Deployed service package list: Returns the list of service packages in a deployed application.QueryClient. You should continue to call the same query and pass in the continuation token from the previous query to get next results. One of them is the health store.QueryClient. The return of these queries is a list derived from PagedList.QueryClient. and REST.QueryManager).GetDeployedServicePackageListAsync PowerShell: Get-ServiceFabricDeployedApplication NOTE Some of the queries return paged results. API: FabricClient.QueryClient. and unhealthy evaluations. API: FabricClient.QueryClient. there was a communication error.methods on FabricClient.QueryClient. such as network issues. If the subquery encountered transient errors. API: FabricClient. Follow up with a health query for the entity. Similarly. {"ServiceManifestName":"WordCountServicePkg".exe". . The following shows the application upgrade status for a modified fabric:/WordCount application. any unhealthy evaluations are contained in the application upgrade status. "_WFDebugParams_" = " [{"ServiceManifestName":"WordCountWebServicePkg". This information can help administrators investigate what went wrong after the upgrade rolled back or stopped.QueryManager.Result.Where( app => app."EntryPointType":"Main". 
The upgrade may be paused to allow user interaction (such as fixing error conditions or changing policies). If an entity is unhealthy as evaluated by using configured health policies. The upgrade is rolling back because the health checks are not respected. Service Fabric checks health to ensure that everything remains healthy.0 HasPersistedState : True ServiceStatus : Active HealthState : Warning Cluster and application upgrades During a monitored upgrade of the cluster and application.HealthState == HealthState. during an application upgrade. If the upgrade is rolled back due to health issues.exe".0."EnvironmentBlock":"_NO_DEBUG_HEAP=1\u0000"}]" } The following cmdlet gets the services with a health state of warning: PS C:\> Get-ServiceFabricApplication | Get-ServiceFabricService | where {$_.0\\Common7\\Packages\\Debugger\\VsDebugLaunchNotify. The following cmdlet gets the application details for the fabric:/WordCount application. or it may automatically roll back to the previous good version. A watchdog reported an error on one of its replicas."DebugArguments":" {2ab462e6-e0d1-4fda-a844-972f561fe751} -p [ProcessId] -tid [ThreadId]". PS C:\> Get-ServiceFabricApplication -ApplicationName fabric:/WordCount ApplicationName : fabric:/WordCount ApplicationTypeName : WordCount ApplicationTypeVersion : 1.GetApplicationListAsync().HealthState -eq "Warning"} ServiceName : fabric:/WordCount/WordCountService ServiceKind : Stateful ServiceTypeName : WordCountServiceType IsServiceGroup : False ServiceManifestVersion : 1. you can get the cluster upgrade status."EntryPointType":"Main"."DebugArguments":" {74f7e5d5-71a9-47e2-a8cd-1878ec4734f1} -p [ProcessId] -tid [ThreadId]". the upgrade applies upgrade-specific policies to determine the next action.0 ApplicationStatus : Ready HealthState : Warning ApplicationParameters : { "WordCountWebService_InstanceCount" = "1"."DebugExePath":"C:\\Program Files (x86)\\Microsoft Visual Studio 14. which point to what is unhealthy in the cluster.Error).0\\Common7\\Packages\\Debugger\\VsDebugLaunchNotify."EnvironmentBlock":"_NO_DEBUG_HEAP=1\u0000"}. During a cluster upgrade. Notice that health state is at warning."CodeP ackageName":"Code".0. var applications = fabricClient."Debug ExePath":"C:\\Program Files (x86)\\Microsoft Visual Studio 14. the upgrade status remembers the last unhealthy reasons."CodePackageName":"Code". The upgrade status includes unhealthy evaluations. ReplicaOrInstanceId='131031502346844058'.0 ApplicationParameters : {} StartTimestampUtc : 4/21/2015 5:23:26 PM FailureTimestampUtc : 4/21/2015 5:23:37 PM FailureReason : HealthCheck UpgradeState : RollingBackInProgress UpgradeDuration : 00:00:23 CurrentUpgradeDomainDuration : 00:00:00 CurrentUpgradeDomainProgress : UD1 NodeName : _Node_1 UpgradePhase : Upgrading NodeName : _Node_2 UpgradePhase : Upgrading NodeName : _Node_3 UpgradePhase : PreUpgradeSafetyCheck PendingSafetyChecks : EnsurePartitionQuorum . "UD3" = "Pending". . "UD2" = "Pending". look at the cluster or application health to pinpoint what is wrong.0. AggregatedHealthState='Error'. The unhealthy evaluations provide details about what triggered the current unhealthy state. MaxPercentUnhealthyServices=0%. "UD4" = "Pending" } UnhealthyEvaluations : Unhealthy services: 100% (1/1). Property='Disk'. Unhealthy partition: PartitionId='a1f83a35-d6bf-4d39-b90d-28d15f39599b'. ServiceType='WordCountServiceType'. Error event: SourceId='DiskWatcher'. 
AggregatedHealthState='Error'.PartitionId: 30db5be6-4e20-4698-8185-4bd7ca744020 NextUpgradeDomain : UD2 UpgradeDomainsStatus : { "UD1" = "Completed". If you need to. you can drill down into unhealthy child entities to identify the root cause. UpgradeKind : Rolling RollingUpgradeMode : UnmonitoredAuto ForceRestart : False UpgradeReplicaSetCheckTimeout : 00:15:00 Read more about the Service Fabric application upgrade. Unhealthy replica: PartitionId='a1f83a35-d6bf-4d39-b90d- 28d15f39599b'. AggregatedHealthState='Error'. Unhealthy replicas: 20% (1/5). Unhealthy service: ServiceName='fabric:/WordCount/WordCountService'. MaxPercentUnhealthyReplicasPerPartition=0%. Unhealthy partitions: 100% (1/1). PS C:\> Get-ServiceFabricApplicationUpgrade fabric:/WordCount ApplicationName : fabric:/WordCount ApplicationTypeName : WordCount TargetApplicationTypeVersion : 1. MaxPercentUnhealthyPartitionsPerService=0%.0. Use health evaluations to troubleshoot Whenever there is an issue with the cluster or an application. but they are not be reflected in the evaluations. To get more information. Next steps Use system health reports to troubleshoot Add custom Service Fabric health reports How to report and check service health Monitor and diagnose services locally Service Fabric application upgrade . drill down into the health entities to figure out all the unhealthy reports in the cluster. NOTE The unhealthy evaluations show the first reason the entity is evaluated to current health state. There may be multiple other events that trigger this state. the infrastructure might still be experiencing problems. Azure Service Fabric can help you implement monitoring and diagnostics as you develop your service. or using the Microsoft .NET Framework. Even if an application is available to customers. You also need to monitor the infrastructure for capacity planning. Monitoring is a broad term that includes the following tasks: Instrumenting the code Collecting instrumentation logs Analyzing logs Visualizing insights based on the log data Setting up alerts based on log values and insights Monitoring the infrastructure Detecting and diagnosing issues that affect your customers This article gives an overview of monitoring for Service Fabric clusters hosted either in Azure. local development environment. Monitoring and diagnostics can help you as you develop your services to: Minimize disruption to your customers. visualization. Provide business insights. It is important that each piece works together to deliver an end-to-end monitoring solution for the application. so you know when to add or remove infrastructure. Monitor and diagnose Azure Service Fabric applications 4/5/2017 • 21 min to read • Edit Online Monitoring and diagnostics are critical in a live production environment. It's important to monitor and troubleshoot both the infrastructure and the application that make up a Service Fabric deployment. Diagnose potential service issues. A set of metrics for the virtual machine scale set and individual virtual machines is automatically collected and displayed . We look at three important aspects of monitoring and diagnostics: Instrumenting the code or infrastructure Collecting generated events Storage. Monitor resource usage. many customers choose different technologies for each aspect of monitoring. Azure Monitor You can use Azure Monitor to monitor many of the Azure resources on which a Service Fabric cluster is built. production cluster setup. 
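To connect the drill-down guidance above to code: a client can read the full health of an entity and walk its unhealthy evaluations and child health states, repeating the query one level down for any child that is not Ok. A minimal C# sketch, in which the application URI and the console output are only illustrative:

    using System;
    using System.Fabric;
    using System.Fabric.Health;
    using System.Threading.Tasks;

    internal static class HealthDrillDownSample
    {
        public static async Task ExplainApplicationHealthAsync(Uri applicationName)
        {
            var fabricClient = new FabricClient();

            // Full health for one application: the aggregated state, the evaluations that explain
            // why it is Warning or Error, and the health state of each child service.
            ApplicationHealth health = await fabricClient.HealthManager.GetApplicationHealthAsync(applicationName);
            Console.WriteLine($"{applicationName}: {health.AggregatedHealthState}");

            foreach (HealthEvaluation evaluation in health.UnhealthyEvaluations)
            {
                Console.WriteLine($"  {evaluation.Kind}: {evaluation.Description}");
            }

            foreach (ServiceHealthState service in health.ServiceHealthStates)
            {
                // Repeat the pattern one level down (GetServiceHealthAsync, GetPartitionHealthAsync, ...)
                // for any child whose aggregated state is not Ok.
                Console.WriteLine($"  {service.ServiceName}: {service.AggregatedHealthState}");
            }
        }
    }

These are the same evaluations that appear in the PowerShell output shown earlier, so either surface can be used to find the first report that pushed the entity into Warning or Error.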
Monitoring infrastructure Service Fabric can help keep an application running during infrastructure failures. but you need to understand whether an error is occurring in the application or in the underlying infrastructure. deployed on Windows or Linux. to ensure that the service works seamlessly both in a single-machine. on-premises. aggregation. and analysis Although multiple products are available that cover all of these three areas. and in a real-world. Detect hardware and software failures or performance issues. Instrument your code Instrumenting the code is the basis for most other aspects of monitoring your services. The logs are stored in a dedicated storage account in your cluster's resource group. To customize the charts. see Create and manage a standalone Azure Service Fabric cluster and Configuration settings for a standalone Windows cluster. follow the instructions in Metrics in Microsoft Azure. as described in Configure a web hook on an Azure metric alert. heartbeat events. Any events that indicate service health are sent to a central repository. but they are not intended to be usable by anyone outside of the Microsoft customer support team. You can route data from Azure Monitor directly to Log Analytics. having detailed instrumentation data is important. support logs are almost always required. part of Microsoft Operations Management Suite. Although technically it's possible to connect a debugger to a production service. Service-health events include error events. When you're producing this volume of information. where they can be used to raise alerts of an unhealthy service. We recommend using Operations Management Suite to monitor your on-premises infrastructure. For information about setting up log collections for a standalone cluster. Azure Monitor supports only one subscription. You also can create alerts based on these metrics. If your cluster is hosted in Azure. but you can use any existing solution that your organization uses for infrastructure monitoring. as described in Create alerts in Azure Monitor for Azure services. 2. Usually. but in the account. Then. If you need to monitor multiple subscriptions. Instrumentation is the only way you can know that something is wrong. So. provides a holistic IT management solution both for on-premises and cloud-based infrastructures. in the Azure portal. so you can see metrics and logs for your entire environment in a single place. it's not a common practice. and performance events. and are collected only when needed for debugging. support logs are automatically configured and collected as part of creating a cluster. Service Fabric support logs If you need to contact Microsoft support for help with your Azure Service Fabric cluster. Some products automatically instrument your code. or if you need additional features. you see blob containers and tables with names that start with fabric. You are required to have these logs for support. shipping all events off the local node can be expensive. select the resource group that contains the Service Fabric cluster. In the Monitoring section. select Metrics to view a graph of the values. To view the collected information. All events are kept in a local rolling log file for a short interval. and to diagnose what needs to be fixed. Log Analytics. You can send alerts to a notification service by using web hooks.in the Azure portal. For standalone Service Fabric instances. Although these solutions can work well. The storage account doesn't have a fixed name. 
the events needed for detailed diagnosis are left on the node to reduce costs and resource utilization. the logs should be sent to a local file share. manual . Many services use a two-part strategy for dealing with the volume of instrumentation data: 1. select the virtual machine scale set that you want to view. which has its own unique event. Level = EventLevel. Keywords = Keywords. string serviceType) { WriteEvent(ServiceTypeRegisteredEventId. data can be packaged with the text of the error. [Event(ServiceHostInitializationFailedEventId. private ServiceEventSource() : base() { } . [EventSource(Name = "MyCompany-VotingState-VotingStateService")] internal sealed class ServiceEventSource : EventSource { public static readonly ServiceEventSource Current = new ServiceEventSource().Informational. Structuring the instrumentation output makes it easier to consume. In the end. This approach might generate many events. and the code implementation of the event. like an order system. Message = "Service host initialization failed".. serviceType). you must have enough information to forensically debug the application. In this approach. // The ServiceTypeRegistered event contains a unique identifier. might have a CreateOrder event. Having multiple EventSource definitions that use the same name causes an issue at run time. a method start or stop event would be reused across many services within an application. } // The ServiceHostInitializationFailed event contains a unique identifier. A template is created. [Event(ServiceTypeRegisteredEventId. and potentially require coordination of identifiers across project teams. Much of the structured aspect is lost. and should be renamed from the default template string MyCompany-<solution>-<project>. when a service type is registered. but requires more thought and time to define a new event for each use case. Each defined event must have a unique identifier. see Vance's blog or the MSDN documentation. many people define a few events with a common set of parameters that generally output their information as a string. If an identifier is not unique. an event attribute that defined the event. Some organizations preassign ranges of values for identifiers to avoid conflicts between separate development teams. hostProcessId. For more information. Keywords = Keywords. see the PartyCluster. The EventSource name must be unique. and when to choose one approach over another.ServiceInitialization)] public void ServiceHostInitializationFailed(string exception) { WriteEvent(ServiceHostInitializationFailedEventId. Level = EventLevel. an EventSource-derived class (ServiceEventSource or ActorEventSource) is generated. A domain-specific service. a runtime failure occurs.Error.ApplicationDeployService event in the Party Cluster sample. Message = "Service host process {0} registered service type {1}".. and you can more easily search and filter based on the names or values of the specified properties. in which you can add events for your application or service. // The instance constructor is private to enforce singleton semantics. exception). and the code implementation of the event. Using structured EventSource events Each of the events in the code examples in this section are defined for a specific case.ServiceInitialization)] public void ServiceTypeRegistered(int hostProcessId. a few events that usually correspond to the logging levels are defined. private const int ServiceHostInitializationFailedEventId = 4. and it's more difficult to search and filter the results. for example. 
For example. When you define messages by use case. Some event definitions can be shared across the entire application.instrumentation is almost always required. The next sections describe different approaches to instrumenting your code. an event attribute that defined the event. } Using EventSource generically Because defining specific events can be difficult. . private const int ServiceTypeRegisteredEventId = 3. For a complete example of structure EventSource events in Service Fabric. EventSource When you create a Service Fabric solution from a template in Visual Studio. For more information. which is part of Microsoft ASP. In the constructor of your service class.Extensions. You can use the code in ASP.Extensions.NET Core. ASP. add this code. ASP. and then needing to reinstrument the code. 5. private ServiceEventSource() : base() { } . while minimizing the effect on existing code.Logging.Verbose. string msg) { WriteEvent(ErrorEventId. Here are a few samples: . Define a private variable within your service class. To reduce risk. msg). Start instrumenting your code in your methods. // The Instance constructor is private.Logging in Service Fabric 1. 4.{1}")] public void Error(string error. Structured instrumentation is used for reporting errors and metrics. Generic events can be used for the detailed logging that is consumed by engineers for troubleshooting. Using Microsoft. error.. Add a using directive for Microsoft. [Event(DebugEventId.CreateLogger<Stateless>(). Message = "{0}")] public void Debug(string msg) { WriteEvent(DebugEventId.NET Core on Windows and Linux. you can choose an instrumentation library like Microsoft. add any provider packages (for a third-party package. see Logging in ASP.NET Framework. 2.. and in the full . Also. msg). 3. Level = EventLevel.Logging NuGet package to the project you want to instrument.Extensions.Logging to your service file.Extensions. see the following example). [Event(ErrorEventId.NET Core. Message = "Error: {0} .The following snippet defines a debug and error message: [EventSource(Name = "MyCompany-VotingState-VotingStateService")] internal sealed class ServiceEventSource : EventSource { public static readonly ServiceEventSource Current = new ServiceEventSource().Error. The right instrumentation plan can help you avoid potentially destabilizing your code base. private const int DebugEventId = 10.NET Core logging It's important to carefully plan how you will instrument your code. private ILogger _logger = null. to enforce singleton semantics. _logger = new LoggerFactory(). so your instrumentation code is standardized. } Using a hybrid of structured and generic instrumentation also can work well. Level = EventLevel. Add the Microsoft. } private const int ErrorEventId = 11.NET Core has an ILogger interface that you can use with the provider of your choice. PartitionId). we discuss why this step is useful. new PropertyEnricher("InstanceId".Literate. If you use the static Log. Also. You can plug each of these into ASP.ReplicaOrInstanceId). // Later in the article. so you can use Microsoft.NET Core logging factory.LiterateConsole(). _logger. Add the Serilog. also add Serilog.NET Core without Serilog. you must make the _logger available to common code. type. new PropertyEnricher("PartitionId". This feature can be useful to output the service name. requestDuration).Logging.ForContext(properties)).Logging"). The code creates the property enrichers for the ServiceTypeName. ServiceName. or you can use them separately.ForContext(properties). In Serilog. 
_logger = new LoggerFactory(). the last writer of the property enrichers will show values for all instances that are running. _logger. and Serilog. To use this capability in the ASP. A better approach is shown later in this article. For the next example. // In this variant.Extensions. Instrument the code the same as if you were using ASP.Observable NuGet packages to the project.NET Core logging. It also adds a property enricher to the ASP. } 5. Add a Serilog. Serilog has a feature that enriches all messages sent from a logger. which might be used across services. context. NLog.Logger. add the following code.WriteTo. "MyRequest". _logger. and partition information. context. and Loggr. including Serilog.LogInformation("Informational-level event from Microsoft. public Stateless(StatelessServiceContext context.LogDebug("Debug-level event from Microsoft. create a LoggerConfiguration and the logger instance.CreateLogger<Stateless>(). 4. Log. context. PartitionId.Extensions. . In the service constructor.NET Core infrastructure.ILogger in your code.GetResult(). 2.ServiceName). }.Sinks.ILogger serilog) : base(context) { PropertyEnricher[] properties = new PropertyEnricher[] { new PropertyEnricher("ServiceTypeName". NOTE We recommend that you don't use the static Log. new PropertyEnricher("ServiceName". Serilog. 3.ServiceTypeName). serilog.Logger with the preceding example.RegisterServiceAsync("StatelessType". Service Fabric can host multiple instances of the same service type within a single process.GetAwaiter(). This is one reason why the _logger variable is a private member variable of the service class.Logging. Serilog. we're adding structured properties RequestName and Duration.LogInformation("{RequestName} {Duration}". ServiceRuntime. and pass the newly created logger. Log. Using other logging providers Some third-party providers use the approach described in the preceding section. which have values MyRequest and the duration of the request.CreateLogger(). context.Logger)).Sinks. do these steps: 1.Logger = new LoggerConfiguration(). context => new Stateless(context. and InstanceId properties of the service.ILogger argument to the service constructor.AddSerilog(serilog.Logging"). Other Azure Diagnostics articles about configuring performance counters. WADServiceFabricReliableServiceEventTable. These articles listed in this section also can show you how to get custom events into Azure storage. assuming that you enabled diagnostics. The events are enumerated in Diagnostics and performance monitoring for Reliable Actors. there are some limitations on how they can be collected. depending on the number of applications and services you configure. These articles show how you can collect the event data and send it to Azure storage. you can use event collection for historical reporting and alerting. EventSource requires a large investment from your engineering team. you must modify the Resource Manager template to add them. EventSource is the best approach to use. You also must deploy Resource Manager any time an application configuration changes. including service placement and start/stop events. Because it .4 release of Service Fabric. or articles that have other monitoring information from virtual machines to Azure Diagnostics also apply to a Service Fabric cluster. events appear in an Azure storage account that was created when you created the cluster. To determine the approach to use for your project. learn how to configure event collection for Windows and Linux. 
It works both for Windows and Linux clusters. For instance. This isn't an issue for many services. if you don't want to use Azure Table storage as a destination. Event and log collection Azure Diagnostics In addition to the information that Azure Monitor provides. but if your service is performance-oriented. health and load metric events are exposed. The events are enumerated in Diagnostic functionality for stateful Reliable Services. For more information. occurs only at the virtual machine scale set level. This way. These events also are native ETW events. using EventSource might be a better choice. When the events are in Azure Event Hubs. As of the 5. EventFlow Microsoft Diagnostics EventFlow can route events from a node to one or more monitoring destinations. To get the same benefits of structured logging. When configured. For more information. A disadvantage of using Azure Diagnostics is that you set it up by using a Resource Manager template.NET Core logging or any of the available third-party solutions.Choosing a logging provider If your application relies on high performance. and then choose the one that best meets your needs. then. You can get more information about integrating Azure diagnostic information with Application Insights. Ideally. Azure Diagnostics works only for Service Fabric clusters deployed to Azure. The best way to see the events that are emitted is to use the Visual Studio Diagnostic Events Viewer on your local machine. you can read them and send them to the location you choose. A virtual machine scale set corresponds to a node type in Service Fabric. Because these events are native ETW events. the monitoring configuration would travel with the service configuration. EventSource generally uses fewer resources and performs better than ASP. Azure collects events from each of the services at a central location. Many events are emitted from Service Fabric as part of this category. You configure each node type for all the applications and services that might run on a node of that type. do a quick prototype of what each option entails. You can do this in the Azure portal or in your Azure Resource Manager template by enabling diagnostics. Azure Diagnostics collects a few event sources that Service Fabric automatically produces: EventSource events and performance counters when you use the Reliable Actor programming model. Diagnostics. and WADServiceFabricSystemEventTable. This might be many EventSource events. see Collect logs by using Azure Diagnostics. The tables are named WADServiceFabricReliableActorEventTable. and have some limitations on how they can be collected. System events are emitted as Event Tracing for Windows (ETW) events. see Streaming Azure Diagnostics data in the hot path by using Event Hubs. Health events are not added by default. EventSource events when you use the Reliable Services programming model. // Registering a service maps a service type name to a . ServiceEventSource. and connects directly to the configured outputs. In the service's Main function. because all replicas of a ServiceType run in the same process.Infinite). ServiceRuntime. see Collect logs directly from an Azure Service Fabric service process.GetResult(). // When Service Fabric creates an instance of this service type. internal static class Program { /// <summary> /// This is the entry point of the service host process. and then configure the outputs. To use EventFlow: 1.ServiceHostInitializationFailed(e.WriteTo.EventFlow(pipeline). so services keep running. } } } 3. 
so that only the events that match the specified filter are sent. and on-premises service deployments. context => new Stateless(context. EventFlow runs within your service process.Current. If you host a lot of processes.json in the service's \PackageRoot\Config folder.Current.CreateLogger(). container. In the following example. Because of the direct connection.GetAwaiter(). eliminating the per-node configuration issue mentioned earlier about Azure Diagnostics. EventFlow code and configuration travel with the service.RegisterServiceAsync("StatelessType".Sleep(Timeout.Logger = new LoggerConfiguration(). you get a lot of outbound connections! This isn't as much a concern for Service Fabric applications. because each EventFlow pipeline makes an external connection.CreatePipeline("MonitoringE2E- Stateless-Pipeline")) { Log.Id. Log. such as in a container.xml file defines one or more service type names. Inside the file. Thread. EventFlow works for Azure. // The ServiceManifest. the configuration looks like this: . // an instance of the class is created in this host process. 2. Create a file named eventFlowConfig. /// </summary> private static void Main() { try { using (var pipeline = ServiceFabricDiagnosticPipelineFactory. // Prevents this host process from terminating. For detailed information about how to use EventFlow with Service Fabric. EventFlow also offers event filtering.ToString()).NET type. throw.ServiceTypeRegistered(Process. anf this limits the number of outbound connections.Logger)). typeof(Stateless). Add the NuGet package to your service project. we use Serilog as an output. } } catch (Exception e) { ServiceEventSource. create the EventFlow pipeline.GetCurrentProcess(). Be careful if you run EventFlow in high-density scenarios.Name).is included as a NuGet package in your service project. "include": "Level == Verbose" }. we're still using Serilog. { "providerName": "Microsoft-ServiceFabric-Actors" }. "include": "RequestName==MyRequest". and services should never run with high privileges. { "type": "ApplicationInsights". The second informational event tracks the request duration. Some of the syntax we use is specific to Serilog. we instrument RunAsync a few different ways. The first filter tells EventFlow to drop all events that have an event level of verbose. In the following code. { "inputs": [ { "type": "EventSource". The system-level and health events that use ETW are not available to EventFlow. "schemaVersion": "2016-08-11". to show examples. the debug-level event should not flow to the output. Make sure you add your instrumentation key. "sources": [ { "providerName": "Microsoft-ServiceFabric-Services" }. Instrument the code. In the next example. and the EventSource for the service. { "providerName": "MyCompany-MonitoringE2E-Stateless" } ] }. "metadata": "request". { "type": "metadata". two inputs are defined: the two EventSource-based sources that Service Fabric creates. In the configuration of EventFlow described earlier. "filters": [ { "type": "drop". "extensions": [] } In the configuration. Two outputs are configured: standard output. . Three events are generated: a debug-level event. Serilog configuration occurred in the Main method. The other input is Serilog. "outputs": [ { "type": "StdOutput" }. "durationUnit": "milliseconds" } ]. 4. which writes to the output window in Visual Studio. Note the specific capabilities for the logging solution you choose. and ApplicationInsights. This is because a high-level privilege is required to listen to an ETW source. 
and two informational events. { "type": "Serilog" } ]. "requestNameProperty": "RequestName". "instrumentationKey": "== instrumentation key here ==" } ]. "durationProperty": "Duration". Some filters are applied. in the Azure portal.Restart(). In the filter definition.Now. _logger. "Working-{0}". _logger.LogDebug("Debug level event from Microsoft.StartNew().Current. other outputs can be written. sw.ElapsedMilliseconds).LogInformation("Informational level event from Microsoft.Millisecond). If you have a standalone cluster that cannot be connected to a cloud-based solution for policy reasons.Context. go to your Application Insights resource. select the Search box. and another property. This is what you see in the request event in Application Insights. ServiceEventSource. cancellationToken). Duration . you can use Elasticsearch as an output. Some third- party providers for ASP. in milliseconds.ServiceMessage(this.Logging"). The request entry preceding the trace is the third _logger instrumentation line.LogInformation("{RequestName} {Duration}".NET Core logging also have solutions that support on-premises installations. // Delay a random interval. Azure Service Fabric health and load reporting . await Task. the type is metadata. Stopwatch sw = Stopwatch. and pull requests are encouraged. The line shows that the event was translated into a request metric in Application Insights.Logging"). You can see the traces at the bottom of the preceding screenshot. To see the events. sw. and that the debug- level event was dropped by EventFlow. } To view the events in Azure Application Insights. This declares that an event that has a property of RequestName with the value MyRequest . to provide a more interesting request duration. The same approach works with any of the supported EventFlow inputs.ThrowIfCancellationRequested(). "MyRequest". However. while (true) { cancellationToken.Delay(TimeSpan. It shows only two events. _logger. including EventSource. contain the duration of the request. ++iterations).FromMilliseconds(DateTime. The more health checks that are incorporated into your code. For example. Also. this. } To report a metric. use code similar to this: . a resolution occurs relatively quickly. most customers won't experience an issue. if trends show that at 9 AM on Monday morning the average RPS is 1. A CPU performance counter can tell you how your node is utilized. For example. If one service is using more resources than another service. Service Fabric moves service instances around the cluster. We recommend that you start all metrics with a weight of zero.HasValue) { HealthInformation healthInformation = new HealthInformation("ServiceCode". HealthState. Metrics are important in Service Fabric because they are used to balance resource usage. Your service can define a set of metrics that can be reported for health check purposes. Health monitoring is especially important when Service Fabric performs a named application upgrade. but that don't affect the resource balancing of the cluster. To report health. But. Everything might be perfectly fine. "StateDictionary". For a more detailed explanation of how resource utilization works. Over time.Error). so that the application is in a known. the more resilient your service is to deployment issues. Metrics also can be an indicator of system health.Partition. If good health status cannot be achieved. then you might set up a health report that alerts you if the RPS is below 500 or above 1. 
Although some customers might be affected before the services are rolled back. the deployment is rolled back. and without having to wait for action from a human operator. TIP Don't use too many weighted metrics. good state. to try to maintain even resource utilization. Another aspect of service health is reporting metrics from the service. and not increase the weight until you are sure that you understand how weighting the metrics affects resource balancing for your cluster. metrics like RPS. you can use metrics to check that the service is operating within expected parameters. Metrics also can help give you insight into how your service is performing. After each upgrade domain of the service is upgraded and is available to your customers.Service Fabric has its own health model.500. see Manage resource consumption and load in Service Fabric with metrics. you might have an application that has many services. use code similar to this: if (!result. It can be difficult to understand why service instances are being moved around for balancing. A few metrics can go a long way! Any information that can indicate the health and performance of your application is a candidate for metrics and health reports. and each instance reports a requests per second (RPS) metric. To do this.ReportInstanceHealth(healthInformation). the upgrade domain must pass health checks before the deployment moves to the next upgrade domain.000. which is described in detail in these articles: Introduction to Service Fabric health monitoring Report and check service health Add custom Service Fabric health reports View Service Fabric health reports Health monitoring is critical to multiple aspects of operating a service. set the metric weight to zero. but it might be worth a look to be sure that your customers are having a great experience. but it doesn't tell you whether a particular service is healthy. items processed. because multiple services might be running on a single node. and request latency all can indicate the health of a specific service. Watchdogs A watchdog is a separate service that can watch health and load across services. reporting on service performance. and alerts The final part of monitoring is visualizing the event stream. You can use Azure Application Insights and Operations Management Suite to alert based on the stream of events. Next steps Collect logs with Azure Diagnostics Collect logs directly from an Azure Service Fabric service process Manage resource consumption and load in Service Fabric with metrics . This can help prevent errors that would not be detected based on the view of a single service. and alerting when an issue is detected. and report health for anything in the health model hierarchy. You can use different solutions for this aspect of monitoring. You can use Microsoft Power BI or a third-party solution like Kibana or Splunk to visualize the data. 1234). Visualization. analysis. this.ReportLoad(new List<LoadMetric> { new LoadMetric("MemoryInMb". 42) }). new LoadMetric("metric1". Watchdogs also are a good place to host code that can perform remediation actions for known conditions without user interaction.ServicePartition. If you haven't already done so. application and service the event is coming from. It also helps you to more easily understand the sequences and interrelationships between your application code and events in the underlying system. 1. detecting. While monitoring and diagnostics are critical in an actual deployed production environment. Go to the View tab in Visual Studio. 
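A cleaned-up sketch of that metric report is shown below; the metric names and values are illustrative, and the partition object is the one a Reliable Service already holds (for example, this.Partition).

    using System.Collections.Generic;
    using System.Fabric;

    internal static class LoadReportingSample
    {
        // Call this from inside a Reliable Service, passing the partition the service already holds.
        public static void ReportCurrentLoad(IServicePartition partition)
        {
            // Each LoadMetric pairs a metric name with the service's current consumption of it.
            // The Cluster Resource Manager uses these values when it balances the cluster.
            partition.ReportLoad(new List<LoadMetric>
            {
                new LoadMetric("MemoryInMb", 1234),
                new LoadMetric("metric1", 42)
            });
        }
    }

Starting each metric with a weight of zero, as recommended above, lets you observe these reports without yet influencing how the cluster is balanced.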
Service Fabric makes it easy for service developers to implement diagnostics that can seamlessly work across both single-machine local development setups and real- world production cluster setups. . If the diagnostics events window does not automatically show. you can also pause by using the Pause button at the top of the events window and resume later without any loss of events. For example. You can also filter the list of events by using the Filter events box at the top of the events window. It was built as a tracing technology that has minimal impact on code execution times. This information will help you get an application up and running with the Diagnostics Events Viewer showing the trace messages. 2. diagnosing. and troubleshooting allow for services to continue with minimal disruption to the user experience. the efficiency will depend on adopting a similar model during development of services to ensure they work when you move to a real-world setup. And when you're looking at event details. go ahead and follow the steps in Creating your first application in Visual Studio. Each event has standard metadata information that tells you the node. There is built-in support in Service Fabric Visual Studio tools to view ETW events. you can filter on Node Name or Service Name. choose Other Windows and then Diagnostic Events Viewer. The benefits of Event Tracing for Windows Event Tracing for Windows (ETW) is the recommended technology for tracing messages in Service Fabric. Reasons for this are: ETW is fast. This allows you to view your application traces interleaved with Service Fabric system traces. View Service Fabric system events in Visual Studio Service Fabric emits ETW events to help application developers understand what's happening in the platform. This means you don't have to rewrite your tracing code when you are ready to deploy your code to a real cluster. Service Fabric system code also uses ETW for internal tracing. Monitor and diagnose services in a local machine development setup 4/5/2017 • 3 min to read • Edit Online Monitoring. ETW tracing works seamlessly across local development environments and also real-world cluster setups. After adding custom ETW tracing to your service code. Open the "ProjectName".cs file where ProjectName is the name you chose for your Visual Studio project.Current. 2. in the DoWorkAsync method. The advantage of this method is that metadata is automatically added to traces.cs file. Next steps The same tracing code that you added to your application above for local diagnostics will work with tools that you can use to view these events when running your application on an Azure cluster. you will find an overload for the ActorEventSource. The call to ServiceEventSource.ActorMessage method that should be used for high-frequency events due to performance reasons.Current. For projects created from the service templates (stateless or stateful) just search for the RunAsync implementation: 1. In the ServiceEventSource.ActorMessage(this. Check out these articles that discuss the different options for the tools and describe how you can set them up. you will find an overload for the ServiceEventSource.cs. The code shows how to add custom application code ETW traces that show up in the Visual Studio ETW viewer alongside system traces from Service Fabric. 2. For projects created from the actor templates (stateless or stateful): 1. This is an example of a custom ETW trace written from application code. If you debug the application with F5. In file ActorEventSource. 
and run the application again to see your event(s) in the Diagnostic Events Viewer. How to collect logs with Azure Diagnostics Collect logs directly from service process . "Doing Work"). the Diagnostic Events Viewer will open automatically.Add your own custom traces to the application code The Service Fabric Visual Studio project templates contain sample code.ServiceMessage method that should be used for high-frequency events due to performance reasons. Find the code ActorEventSource. you can build. and the Visual Studio Diagnostic Events Viewer is already configured to display them. 3. deploy.ServiceMessage in the RunAsync method shows an example of a custom ETW trace from the application code. console files. Debugging Service Fabric C# applications . You can create a app.file=<path to app.util. you need to also modify your entry point script. One can view the logs in syslog under /var/log/syslog.FileHandler. and troubleshooting allow for services to continue with minimal disruption to the user experience.%g.count = 10 java. The entry should look like the following snippet: java -Djava.util. diagnosing.util. detecting. The log file in this case is named mysfapp%u. For each of these options. After the app.util. see the code examples in github.pattern = /tmp/servicefabric/logs/mysfapp%u. For more information. it is also used for the code examples in github.config.logging.properties> -jar <service name>.logging.sh in the <applicationfolder>/<servicePkg>/Code/ folder to set the property java.limit = 1024000 java.logging. Monitor and diagnose services in a local machine development setup 3/7/2017 • 3 min to read • Edit Online Monitoring.logging is the default option with the JRE. Service Fabric makes it easy for service developers to implement diagnostics that can seamlessly work across both single-machine local development setups and real-world production cluster setups.util.jar This configuration results in logs being collected in a rotating fashion at /tmp/servicefabric/logs/ . Monitoring and diagnostics are critical in an actual deployed production environment.logging.formatter = java.propertes file.library.util.FileHandler.logging. there are default handlers already provided in the framework.properties file to configure the file handler for your application to redirect all logs to a local file.util. Debugging Service Fabric Java applications For Java applications.logging. Using java.logging.util. %g is the generation number to distinguish between rotating logs.logging. The following code snippet contains an example configuration: handlers = java. the console handler is registered. By default if no handler is explicitly configured.log where: %u is a unique number to resolve conflicts between simultaneous Java processes. or sockets.FileHandler java.util. The following discussion explains how to configure the java. output streams.%g. Adopting a similar model during development of services ensures that the diagnostic pipeline works when you move to a production environment.SimpleFormatter java.util. multiple logging frameworks are available.file to app. entrypoint.path=$LD_LIBRARY_PATH -Djava. Since java.logging.properties file must exist.util.FileHandler.FileHandler.FileHandler.logging you can redirect your application logs to memory.config.logging framework.level = ALL java.properties file is created.log The folder pointed to by the app.util. Multiple frameworks are available for tracing CoreCLR applications on Linux. Write(eventData.Payload != null ? eventData.Task. 
The first step is to include System.Join(". } protected override void OnEventWritten(EventWrittenEventArgs eventData) { using (StreamWriter Out = new StreamWriter( new FileStream("/tmp/MyServiceLog. Since EventSource is familiar to C# developers. This file name needs to be appropriately updated.Message != null) Out. add the following project to your project. For logging using EventSource.Tracing so that you can write your logs to memory.txt".IsEnabled()) { var finalMessage = string.ToString()).`this article uses EventSource for tracing in CoreCLR samples on Linux. } } // TBD: Need to add method for sample event. ".1" You can use a custom EventListener to listen for the service event and then appropriately redirect them to trace files. use the following snippet in your customized .All). The following code snippet shows a sample implementation of logging using EventSource and a custom EventListener: public class ServiceEventSource : EventSource { public static ServiceEventSource Current = new ServiceEventSource(). For more information.Write(" {0} ". EventLevel.ToString().". } internal class ServiceEventListener : EventListener { protected override void OnEventSourceCreated(EventSource eventSource) { EnableEvents(eventSource. sargs != null ? string.LogAlways.WriteLine("({0}). see GitHub: logging. In case you want to redirect the logs to console.Payload. params object[] args) { if (this.ToArray() : null. sargs) : "").Diagnostics.txt . [NonEvent] public void Message(string message.ToString(). args). EventKeywords.Level. FileMode.EventName. eventData. } } } } The preceding snippet outputs the logs to a file in /tmp/MyServiceLog.WriteLine(eventData.0. else { string[] sargs = eventData. eventData.Append))) { // report all event information Out.ToArray()).Message.Diagnostics.Message(finalMessage). or console files.StackTrace": "4. eventData.Format(message. this.json: "System. eventData.EventId.Select(o => o. Out.Payload. if (eventData. output streams."")). The samples at C# Samples use EventSource and a custom EventListener to log events to a file. Next steps The same tracing code added to your application also works with the diagnostics of your application on an Azure cluster.Out.EventListener class: public static TextWriter Out = Console. Check out these articles that discuss the different options for the tools and describe how to set them up. How to collect logs with Azure Diagnostics . which uploads logs to Azure Storage. Deploy the Diagnostics extension The first step in collecting logs is to deploy the Diagnostics extension on each of the VMs in the Service Fabric cluster. One way to upload and collect logs is to use the Azure Diagnostics extension. Logs can be one of several types: Operational events: Logs for operations that the Service Fabric platform performs. The steps vary a little based on whether you use the Azure portal or Azure Resource Manager. Azure Application Insights comes with a comprehensive log search and analytics service built-in. But you can use an external process to read the events from storage and place them in a product such as Log Analytics or another log-parsing solution. Collect logs by using Azure Diagnostics 4/12/2017 • 8 min to read • Edit Online When you're running an Azure Service Fabric cluster. or issues in the applications and services running in that cluster. ensure that Diagnostics is set to On (the default setting). you can't change these settings by using the portal. 
The steps also vary based on whether the deployment is part of cluster creation or is for a cluster that already exists. you use the Diagnostics settings panel shown in the following image. Azure Application Insights. Prerequisites You use these tools to perform some of the operations in this document: Azure Diagnostics (related to Azure Cloud Services but has good information and examples) Azure Resource Manager Azure PowerShell Azure Resource Manager client Azure Resource Manager template Log sources that you might want to collect Service Fabric logs: Emitted from the platform to standard Event Tracing for Windows (ETW) and EventSource channels. Examples include creation of applications and services. or Azure Event Hubs. The logs are not that useful directly in storage or in Event Hubs. node state changes. it's a good idea to collect the logs from all the nodes in a central location. Let's look at the steps for each scenario. For more information on how to write logs from your application. To enable Reliable Actors or Reliable Services event collection. see Monitor and diagnose services in a local machine development setup. Reliable Actors programming model events Reliable Services programming model events Application events: Events emitted from your service's code and written out by using the EventSource helper class provided in the Visual Studio templates. Having the logs in a central location helps you analyze and troubleshoot issues in your cluster. Deploy the Diagnostics extension as part of cluster creation through the portal To deploy the Diagnostics extension to the VMs in the cluster as part of cluster creation. After you create the cluster. and upgrade information. The Diagnostics extension collects logs on each VM and uploads them to the storage account that you specify. . and PowerShell files. Edit the parameters. To use the downloaded template to update a configuration: 1. 3. we highly recommend that you download the template before you click OK to create the cluster. and a unique deployment name. Select Settings to display the settings panel. Extract the contents to a folder on your local computer. 6. because you can't make some changes by using the portal. you might have to fix null parameter values. 5. Run deploy. You'll need the template to make changes later. Select Export Template to display the template panel. refer to Set up a Service Fabric cluster by using an Azure Resource Manager template. Modify the contents to reflect the new configuration. . 4. 3. Start PowerShell and change to the folder where you extracted the content. 2. Select Deployments to display the deployment history panel. When you're creating a cluster by using the portal. parameter. the resource group name (use the same name to update the configuration). These events include Reliable Actors events. These logs are collected in real time and are stored in one of the storage accounts created in the resource group. However. Products such as Elasticsearch or your own process can get the events from the storage account. Reliable Services events. the table will continue to grow.ps1 and fill in the subscription ID.json file and remove the adminPassword element. 4. When you're running the deployment script. these templates can be more difficult to use because they might have null values that are missing required information. The Diagnostics settings configure application-level events. 
Select Save to file to export a .The Azure support team requires support logs to help resolve any support requests that you create. After you export the files. you need to make a modification. You can export templates from the portal by using the following steps. and some system-level Service Fabric events to be stored in Azure Storage. Select a deployment to display the details of the deployment. This will cause a prompt for the password when the deployment script is run. 2. For details. There is currently no way to filter or groom the events that are sent to the table.zip file that contains the template. Open your resource group. If you don't implement a process to remove events from the table. 1. New-AzureRmResourceGroupDeployment -ResourceGroupName $resourceGroupName -Name $deploymentName -TemplateFile $pathToARMConfigJsonFile -TemplateParameterFile $pathToParameterFile –Verbose Deploy the Diagnostics extension to an existing cluster If you have an existing cluster that doesn't have Diagnostics deployed. Next. Modify the Resource Manager template that's used to create the existing cluster or download the template from the portal as described earlier. or if you want to modify an existing configuration. "clusterName": "[parameters('clusterName')]" } }. "location": "[parameters('computeLocation')]". See the following code for the parameters that you pass in to the command. You can see it at this location in the Azure Samples gallery: Five-node cluster with Diagnostics Resource Manager template sample.json file and search for IaaSDiagnostics. and create a cluster with the modified template by using the New-AzureRmResourceGroupDeployment command in an Azure PowerShell window. make changes to it. select the Deploy to Azure button available at the previous link. Replace the placeholder text storage account name goes here with the name of the storage account. add to the parameters section just after the storage account definitions. open the azuredeploy. see the article Deploy a resource group with the Azure Resource Manager template.json file by performing the following tasks. "properties": { "accountType": "[parameters('applicationDiagnosticsStorageAccountType')]" }. Alternatively. you can add or update it. Add a new storage resource to the template by adding to the resources section. "name": "[parameters('applicationDiagnosticsStorageAccountName')]". "type": "Microsoft. .Storage/storageAccounts". For detailed information on how to deploy a resource group by using PowerShell. Modify the template. you can download the Resource Manager sample. To create a cluster by using this template. We provide a sample five-VM cluster Resource Manager template with Diagnostics configuration added to it as part of our Resource Manager template samples. { "apiVersion": "2015-05-01-preview". To see the Diagnostics setting in the Resource Manager template. between supportLogStorageAccountName and vmNodeType0Name . "tags": { "resourceType": "Service Fabric".Deploy the Diagnostics extension as part of cluster creation by using Azure Resource Manager To create a cluster by using Resource Manager. you need to add the Diagnostics configuration JSON to the full cluster Resource Manager template before you create the cluster. Be sure to add a comma at the beginning or the end. . "defaultValue": "storage account name goes here". "defaultValue": "Standard_LRS". Then. "applicationDiagnosticsStorageAccountType": { "type": "string". 
"metadata": { "description": "Name for the storage account that contains application diagnostics data from the cluster" } }. depending on where it's inserted. "applicationDiagnosticsStorageAccountName": { "type": "string". update the VirtualMachineProfile section of the template.json file by adding the following code within the extensions array. "metadata": { "description": "Replication option for the application diagnostics storage account" } }. "Standard_GRS" ]. "allowedValues": [ "Standard_LRS". key1]". "DefaultEvents": { "eventDestination": "ServiceFabricSystemEventTable" } } ] } } }. "storageAccountEndPoint": "https://core. This allows for aggregating and viewing system health over time and for alerting based on health or load events. "protectedSettings": { "storageAccountName": "[parameters('applicationDiagnosticsStorageAccountName')]". "scheduledTransferPeriod": "PT5M".5" } } After you modify the template. "EtwProviders": { "EtwEventSourceProviderConfiguration": [ { "provider": "Microsoft-ServiceFabric-Actors". { "name": "[concat(parameters('vmNodeType0Name'). "scheduledTransferPeriod": "PT5M". "EtwManifestProviderConfiguration": [ { "provider": "cbd93bc2-71e5-4566-b3a7-595d8eeca6e8". ensure that ProvisioningState is Succeeded.Diagnostics". running the deploy. "typeHandlerVersion": "1. "DefaultEvents": { "eventDestination": "ServiceFabricReliableActorEventTable" } }. republish the Resource Manager template. "settings": { "WadCfg": { "DiagnosticMonitorConfiguration": { "overallQuotaInMB": "50000".'2015-05-01-preview'). "properties": { "type": "IaaSDiagnostics". { "provider": "Microsoft-ServiceFabric-Services". After you deploy. "publisher": "Microsoft. parameters('applicationDiagnosticsStorageAccountName')). "scheduledTransferPeriod": "PT5M". "autoUpgradeMinorVersion": true. These events reflect events generated by the system or your code by using the health or load reporting APIs such as ReportPartitionHealth or ReportLoad.VMDiagnosticsSettings')]".Insights.ps1 file republishes the template.net/" }. Update diagnostics to collect health and load events Starting with the 5.Azure. To view these events in Visual Studio's Diagnostic Event Viewer add . "scheduledTransferKeywordFilter": "4611686018427387904".windows.'_Microsoft.json file as described.4 release of Service Fabric.Storage/storageAccounts'. "scheduledTransferKeywordFilter": "1". "StorageAccount": "[parameters('applicationDiagnosticsStorageAccountName')]" }. "storageAccountKey": "[listKeys(resourceId('Microsoft. If the template was exported. health and load metric events are available for collection. "scheduledTransferLogLevelFilter": "Information". "DefaultEvents": { "eventDestination": "ServiceFabricReliableServiceEventTable" } } ]. For example. "scheduledTransferLogLevelFilter": "Information". Then. see the diagnostic events emitted for Reliable Actors and Reliable Services. "DefaultEvents": { "eventDestination": "ServiceFabricSystemEventTable" } } Update Diagnostics to collect and upload logs from new EventSource channels To update Diagnostics to collect logs from new EventSource channels that represent a new application that you're about to deploy. modify the Resource Manager template by using the examples provided in Create a Windows virtual machine with monitoring and diagnostics by using an Azure Resource Manager template. republish the Resource Manager template. To collect the events.cs file. "DefaultEvents": { "eventDestination": "MyDestinationTableName" } } To collect performance counters or event logs. 
if your event source is named My-Eventsource.json file to add entries for the new EventSource channels before you apply the configuration update by using the New-AzureRmResourceGroupDeployment PowerShell command. "scheduledTransferPeriod": "PT5M". "scheduledTransferKeywordFilter": "4611686018427387912". modify the resource manager template to include "EtwManifestProviderConfiguration": [ { "provider": "cbd93bc2-71e5-4566-b3a7-595d8eeca6e8". Next steps To understand in more detail what events you should look for while troubleshooting issues."Microsoft-ServiceFabric:4:0x4000000000000008" to the list of ETW providers. Update the EtwEventSourceProviderConfiguration section in the template. { "provider": "My-Eventsource". add the following code to place the events from My- Eventsource into a table named MyDestinationTableName. Related articles Learn how to collect performance counters or logs by using the Diagnostics extension Service Fabric solution in Log Analytics . perform the same steps as in the previous section for setup of Diagnostics for an existing cluster. The name of the event source is defined as part of your code in the Visual Studio-generated ServiceEventSource. "scheduledTransferPeriod": "PT5M". it creates a syslog entry that is sent to the storage that you specified. see the LTTng documentation on tracing your application. You can also use Operations Management Suite. your application. Log sources that you might want to collect Service Fabric logs: Emitted from the platform via LTTng and uploaded to your storage account. LTTng. Collect logs by using Azure Diagnostics 3/27/2017 • 2 min to read • Edit Online When you're running an Azure Service Fabric cluster. Next steps To understand in more detail what events you should examine while troubleshooting issues. (To get the storage account details.) Application events: Emitted from your service's code. You can use any logging solution that writes text-based log files--for example. see LTTng documentation and Using LAD. The steps vary based on whether you use the Azure portal or Azure Resource Manager. whether they are in your services. Whenever a new line is appended to the file. This process is explained as scenario 3 ("Upload your own log files") in the article Using LAD to monitor and diagnose Linux VMs. After you create the cluster. The Diagnostics extension collects logs on each VM and uploads them to the storage account that you specify. You can also read the events from storage or Event Hubs and place them in a product such as Log Analytics or another log-parsing solution. Logs can be operational events or runtime events that the platform emits. which uploads logs to Azure Storage. These logs are stored in the location that the cluster manifest specifies. You can also deploy the Diagnostics extension by using Azure Resource Manager. The process is similar for Windows and Linux and is documented for Windows clusters in How to collect logs with Azure Diagnostics. search for the tag AzureTableWinFabETWQueryable and look for StoreConnectionString. the LAD agent monitors the specified log files. After you finish this configuration. configure Linux Azure Diagnostics (LAD) to collect the files and place them into your storage account. To deploy the Diagnostics extension to the VMs in the cluster as part of cluster creation. Then. you can't change this setting by using the portal. For more information. it's a good idea to collect the logs from all the nodes in a central location. Azure Application Insights or Azure Event Hubs. 
or the cluster itself. Deploy the Diagnostics extension The first step in collecting logs is to deploy the Diagnostics extension on each of the VMs in the Service Fabric cluster. You can upload the traces to a visualizer of your choice. Azure Application Insights comes with a comprehensive log search and analytics service built-in. One way to upload and collect logs is to use the Azure Diagnostics extension. as described in Operations Management Suite Log Analytics with Linux. . Having the logs in a central location makes it easy to analyze and troubleshoot issues. Following this process gets you access to the traces. set Diagnostics to On. With agent-based log collection. Access to internal application data and context The diagnostic subsystem running inside the application/service process can easily augment the traces with contextual information. Agent-based log collection usually requires a separate deployment and configuration of the diagnostic agent. with many services running inside a Service Fabric cluster. Flexibility The application can send the data wherever it needs to go. An agent such as Azure Diagnostics can gather data from multiple services and send all data through a few connections. improving throughput. the data must be sent to an agent via some inter-process communication mechanism. than in-process log collection. Indeed. It is easy to always keep it "in sync" with the rest of the application. does not change often. Agent-based collection is a natural solution for collecting logs related to the whole cluster and individual cluster nodes. each service doing its own in-process log collection results in numerous outgoing connections from the cluster. and there is a straightforward mapping between the sources and their destinations. We use Azure Application Insights as the log destination. This mechanism could impose additional limitations. an alternative is to have services send their logs directly to a central location. which is an extra administrative task and a potential source of errors. Some agents are extensible. Also. but EventFlow has the benefit of having been designed specifically for in-process log collection and to support Service Fabric services. Complex capture. we show how to set up an in-process log collection using EventFlow open-source library. Large number of outgoing connections is taxing both for the network subsystem and for the log destination. Often there is only one instance of the agent allowed per virtual machine (node) and the agent configuration is shared among all applications and services running on that node. Other libraries might be used for the same purpose. and data-aggregation rules can be implemented. If not. as long as there is a client library that supports the targeted data storage system. to diagnose service startup problems and crashes. It is much more reliable way. For . In this article. Other destinations such as Event Hubs or Elasticsearch are also supported. This process is known as in-process log collection and has several potential advantages: Easy configuration and deployment The configuration of diagnostic data collection is just part of the service configuration. filtering. such as Event Tracing for Windows. Agent-based log collection is often limited by the data sinks that the agent supports. It is just a question of installing appropriate NuGet package and configuring the destination in the EventFlow configuration file. Per-application or per-service configuration is easily achievable. 
it might be the best solution for many applications. New destinations can be added as desired. It is possible to combine and benefit from both collection methods. Collect logs directly from an Azure Service Fabric service process 4/19/2017 • 6 min to read • Edit Online In-process log collection Collecting application logs using Azure Diagnostics extension is a good option for Azure Service Fabric services if the set of log sources and destinations is small. Input.ServiceFabric package installs a starting EventFlow configuration file under PackageRoot\Config solution folder. After all the packages are installed. the next step is to configure and enable EventFlow in the service.EventFlow.Diagnostics.json .EventFlow. right-click the project in the Solution Explorer and choose "Manage NuGet packages.ApplicationInsights (we are going to send the logs to an Azure Application Insights resource) Microsoft.Input. To add EventFlow to a Service Fabric service project.Diagnostics.EventFlow.more information on log destinations other than Application Insights.NET Framework 4." Switch to the "Browse" tab and search for " Diagnostics.Diagnostics. Configuring and enabling log collection EventFlow pipeline. Adding EventFlow library to a Service Fabric service project EventFlow binaries are available as a set of NuGet packages.EventFlow ": The service hosting EventFlow should include appropriate packages depending on the source and destination for the application logs. responsible for sending the logs.6 or newer.Output. .ServiceFabric (enables initialization of the EventFlow pipeline from Service Fabric service configuration and reports any problems with sending diagnostic data as Service Fabric health reports) NOTE Microsoft. and from standard EventSources such as Microsoft-ServiceFabric-Services and Microsoft-ServiceFabric-Actors) Microsoft.EventSource (to capture data from the service's EventSource class.EventFlow.Diagnostics.Diagnostics.EventSource package requires the service project to target . The file name is eventFlowConfig. Microsoft. Add the following packages: Microsoft. is created from a specification stored in a configuration file. This configuration file needs to be modified to capture data from the default service EventSource class and send data to Application Insights service. see EventFlow documentation. Make sure you set the appropriate target framework in project properties before installing this package.EventFlow. // (replace the following value with your service's ServiceEventSource name) { "providerName": "your-service-EventSource-name" } ] } ]. "include": "Level == Verbose" } ].or configuration-only upgrades of the service. "sources": [ { "providerName": "Microsoft-ServiceFabric-Services" }. In the following example EventFlow-related additions are marked with comments starting with **** : . see Service Fabric application upgrade. "filters": [ { "type": "drop". "outputs": [ { "type": "ApplicationInsights". Changes to this file can be included in full. If you need more information. which is part of the service code. "schemaVersion": "2016-08-11" } NOTE The name of service's ServiceEventSource is the value of the Name property of the EventSourceAttribute applied to the ServiceEventSource class.json file in the editor and change its content as shown below. subject to Service Fabric upgrade health checks and automatic rollback if there is upgrade failure. located in Program. please see Create an Application Insights resource. It is all specified in the ServiceEventSource. 
Open the eventFlowConfig. For more information. The final step is to instantiate EventFlow pipeline in your service's startup code. For example. // (replace the following value with your AI resource's instrumentation key) "instrumentationKey": "00000000-0000-0000-0000-000000000000" } ].cs file. { "providerName": "Microsoft-ServiceFabric-Actors" }.cs file. in the following code snippet the name of the ServiceEventSource is MyCompany-Application1-Stateless1: [EventSource(Name = "MyCompany-Application1-Stateless1")] internal sealed class ServiceEventSource : EventSource { // (rest of ServiceEventSource implementation) } Note that eventFlowConfig. { "inputs": [ { "type": "EventSource".json file is part of service configuration package. NOTE We assume that you are familiar with Azure Application Insights service and that you have an Application Insights resource that you plan to use to monitor your Service Fabric service. Make sure to replace the ServiceEventSource name and Application Insights instrumentation key according to comments. } } catch (Exception e) { ServiceEventSource.Services. Open a web browser and navigate go to your Application Insights resource. typeof(Stateless1).ServiceHostInitializationFailed(e.Threading.Current. After a short delay you should start seeing your traces in the Application Insights portal: . Verification Start your service and observe the Debug output window in Visual Studio. // **** EventFlow namespace using Microsoft. This name is used if EventFlow encounters and error and reports it through the Service Fabric health subsystem.EventFlow. using Microsoft. using System.Current.RegisterServiceAsync("Stateless1Type".Sleep(Timeout. namespace Stateless1 { internal static class Program { /// <summary> /// This is the entry point of the service host process. context => new Stateless1(context)).GetCurrentProcess(). you should start seeing evidence that your service is sending "Application Insights Telemetry" records. } } } } The name passed as the parameter of the CreatePipeline method of the ServiceFabricDiagnosticsPipelineFactory is the name of the health entity representing the EventFlow log collection pipeline. throw.GetAwaiter().Runtime.ToString()).Id.GetResult(). Thread. After the service is started. using Microsoft.Name). ServiceEventSource. using System. Open "Search" tab (at the top of the default "Overview" blade).Infinite). using System.ServiceFabric.ServiceTypeRegistered(Process.ServiceFabric.CreatePipeline("MyApplication-MyService-DiagnosticsPipeline")) { ServiceRuntime.Diagnostics.Diagnostics. /// </summary> private static void Main() { try { // **** Instantiate log collection via EventFlow using (var diagnosticsPipeline = ServiceFabricDiagnosticPipelineFactory.ServiceFabric. Next steps Learn more about diagnosing and monitoring a Service Fabric service EventFlow documentation . and StatefulRunAsyncCancellation events are useful to the service writer to understand the lifecycle of a service--as well as the timing for when a service is started. provide insights into how the runtime is operating. Service writers should pay close attention to StatefulRunAsyncSlowCancellation and StatefulRunAsyncFailure events because they indicate issues with the service. 
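Any additional event you define on the same ServiceEventSource class flows through the EventSource input configured above and ends up in Application Insights alongside the standard Microsoft-ServiceFabric-Services events. The following fragment is only an illustrative sketch: the RequestProcessed method and event ID 10 are hypothetical, and the ID you pick must not collide with the IDs already used in the generated ServiceEventSource.cs.

// Added inside the existing ServiceEventSource class (requires using System.Diagnostics.Tracing).
// The event ID and method name below are examples only.
private const int RequestProcessedEventId = 10;

[Event(RequestProcessedEventId, Level = EventLevel.Informational, Message = "Processed request {0} in {1} ms")]
public void RequestProcessed(string requestId, int elapsedMilliseconds)
{
    if (this.IsEnabled())
    {
        this.WriteEvent(RequestProcessedEventId, requestId, elapsedMilliseconds);
    }
}

A call such as ServiceEventSource.Current.RequestProcessed(requestId, 42); then shows up in the Application Insights Search blade after the usual ingestion delay.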
Events EVENT NAME EVENT ID LEVEL EVENT DESCRIPTION StatefulRunAsyncInvocation 1 Informational Emitted when service RunAsync task is started StatefulRunAsyncCancellatio 2 Informational Emitted when service n RunAsync task is cancelled StatefulRunAsyncCompletio 3 Informational Emitted when service n RunAsync task is completed StatefulRunAsyncSlowCancel 4 Warning Emitted when service lation RunAsync task takes too long to complete cancellation StatefulRunAsyncFailure 5 Error Emitted when service RunAsync task throws an exception Interpret events StatefulRunAsyncInvocation. This can be an expensive operation and can delay incoming requests while the service is moved. Service writers should determine the cause of the exception and. mitigate it. so it is moved to a different node. StatefulRunAsyncFailure is emitted whenever the service RunAsync() task throws an exception. the exception causes the service to fail. Examples of tools and technologies that help in collecting and/or viewing EventSource events are PerfView. Microsoft Azure Diagnostics. Events from this event source appear in the Diagnostics Events window when the service is being debugged in Visual Studio. or completed. Diagnostic functionality for Stateful Reliable Services 1/17/2017 • 1 min to read • Edit Online The Stateful Reliable Services StatefulServiceBase class emits EventSource events that can be used to debug the service. and the Microsoft TraceEvent Library. an exception thrown indicates an error or bug in the service. StatefulRunAsyncCompletion. cancelled. Typically. EventSource events The EventSource name for the Stateful Reliable Services StatefulServiceBase class is "Microsoft-ServiceFabric- Services". StatefulRunAsyncSlowCancellation is emitted whenever a cancellation request for the RunAsync task takes longer . Additionally. and help with troubleshooting. if possible. This can be useful when debugging service issues or understanding the service lifecycle. When a service takes too long to complete cancellation. it impacts the ability for the service to be quickly restarted on another node.than four seconds. Next steps EventSource providers in PerfView . This may impact the overall availability of the service. Keywords All events that belong to the Reliable Actors EventSource are associated with one or more keywords. see the introductory topic on actors. how often an actor method is invoked Each of the above categories has one or more counters. and the Microsoft TraceEvent Library. EventSource events The EventSource provider name for the Reliable Actors runtime is "Microsoft-ServiceFabric-Actors". Examples of tools and technologies that help in collecting and/or viewing EventSource events are PerfView. see the topic on concurrency. CATEGORY DESCRIPTION Service Fabric Actor Counters specific to Azure Service Fabric actors. For more information. Diagnostics and performance monitoring for Reliable Actors 1/17/2017 • 8 min to read • Edit Online The Reliable Actors runtime emits EventSource events and performance counters. 0x2 Set of events that describe actor method calls. The following keyword bits are defined. Performance counters The Reliable Actors runtime defines the following performance counter categories. For more information. This enables filtering of events that are collected. e. For more information. The Windows Performance Monitor application that is available by default in the Windows operating system can be used to collect and view performance counter data. see the topic on actor state management.g. 
Semantic Logging.g. Events from this event source appear in the Diagnostics Events window when the actor application is being debugged in Visual Studio. Azure Diagnostics is another option for collecting . time taken to save actor state Service Fabric Actor Method Counters specific to methods implemented by Service Fabric actors. These provide insights into how the runtime is operating and help with troubleshooting and performance monitoring. 0x8 Set of events related to turn-based concurrency in the actor. BIT DESCRIPTION 0x1 Set of important events that summarize the operation of the Fabric Actors runtime. 0x4 Set of events related to actor state. Azure Diagnostics. e. ActorsRuntimeMethodId is the string representation of a 32-bit integer that is generated by the Fabric Actors runtime for its internal use.ToString method with format specifier "D". the counter instance names are in the following format: ServiceFabricPartitionID_ActorsRuntimeInternalID ServiceFabricPartitionID is the string representation of the Service Fabric partition ID that the performance counter instance is associated with. ActorRuntimeInternalID is the string representation of a 64-bit integer that is generated by the Fabric Actors runtime for its internal use. and its string representation is generated through the Guid. This is included in the performance counter instance name to ensure its uniqueness and avoid conflict with other performance counter instance names. This is included in the performance counter instance name to ensure its uniqueness and avoid conflict with other performance counter instance names.performance counter data and uploading it to Azure tables. ivoicemailboxactor. ServiceFabricPartitionID is the string representation of the Service Fabric partition ID that the performance counter instance is associated with. and its string representation is generated through the Guid. and 635650083799324046 is the 64-bit ID that is generated for the runtime's internal use. The following is an example of a counter instance name for a counter that belongs to the Service Fabric Actor Method category: ivoicemailboxactor. 2 is the 32-bit ID generated . The partition ID is a GUID. This is included in the performance counter instance name to ensure its uniqueness and avoid conflict with other performance counter instance names. The partition ID is a GUID. The following is an example of a counter instance name for a counter that belongs to the Service Fabric Actor category: 2740af29-78aa-44bc-a20b-7e60fb783264_635650083799324046 In the example above. Users should not try to interpret this portion of the performance counter instance name. Users should not try to interpret this portion of the performance counter instance name. Service Fabric Actor category For the category Service Fabric Actor . The performance counter instance names can help in identifying the specific partition and actor method (if applicable) that the performance counter instance is associated with. Performance counter instance names A cluster that has a large number of actor services or actor service partitions will have a large number of actor performance counter instances. Service Fabric Actor Method category For the category Service Fabric Actor Method . the counter instance names are in the following format: MethodName_ActorsRuntimeMethodId_ServiceFabricPartitionID_ActorsRuntimeInternalID MethodName is the name of the actor method that the performance counter instance is associated with. 
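For quick, in-process experimentation you can also subscribe to this provider with a standard .NET EventListener and use the keyword bits above to filter what is delivered. The sketch below is a debugging aid only and is not how the tools listed above collect these events (they attach out of process); the listener class is hypothetical, and keyword 0x2 is chosen so that only the actor method call events are received.

using System;
using System.Diagnostics.Tracing;

internal sealed class ActorMethodEventListener : EventListener
{
    protected override void OnEventSourceCreated(EventSource eventSource)
    {
        if (eventSource.Name == "Microsoft-ServiceFabric-Actors")
        {
            // Actor method events are Verbose-level events carrying keyword 0x2.
            this.EnableEvents(eventSource, EventLevel.Verbose, (EventKeywords)0x2);
        }
    }

    protected override void OnEventWritten(EventWrittenEventArgs eventData)
    {
        Console.WriteLine($"{eventData.EventName} (id {eventData.EventId}, level {eventData.Level})");
    }
}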
The format of the method name is determined based on some logic in the Fabric Actors runtime that balances the readability of the name with constraints on the maximum length of the performance counter instance names on Windows.leavemessageasync is the method name.ToString method with format specifier "D". 2740af29-78aa-44bc-a20b-7e60fb783264 is the string representation of the Service Fabric partition ID. ActorRuntimeInternalID is the string representation of a 64-bit integer that is generated by the Fabric Actors runtime for its internal use. Users should not try to interpret this portion of the performance counter instance name.leavemessageasync_2_89383d32-e57e-4a9b-a6ad-57c6792aa521_635650083804480486 In the example above. ActorMethodStop 8 Verbose 0x2 An actor method has finished executing. ActorMethodThrewEx 9 Warning 0x3 An exception was ception thrown during the execution of an actor method. and the task returned by the actor method has finished. The Reliable Actors runtime publishes the following performance counters related to the execution of actor methods. This event indicates some sort of failure in the actor code that needs investigation.for the runtime's internal use. and 635650083804480486 is the 64-bit ID generated for the runtime's internal use. . the runtime's asynchronous call to the actor method has returned. That is. List of events and performance counters Actor method events and performance counters The Reliable Actors runtime emits the following events related to actor methods. EVENT NAME EVENT ID LEVEL KEYWORD DESCRIPTION ActorMethodStart 7 Verbose 0x2 Actors runtime is about to invoke an actor method. CATEGORY NAME COUNTER NAME DESCRIPTION Service Fabric Actor Method Invocations/Sec Number of times that the actor service method is invoked per second Service Fabric Actor Method Average milliseconds per invocation Time taken to execute the actor service method in milliseconds Service Fabric Actor Method Exceptions thrown/Sec Number of times that the actor service method threw an exception per second Concurrency events and performance counters The Reliable Actors runtime emits the following events related to concurrency. either during the runtime's asynchronous call to the actor method or during the execution of the task returned by the actor method. 89383d32-e57e-4a9b-a6ad-57c6792aa521 is the string representation of the Service Fabric partition ID. CATEGORY NAME COUNTER NAME DESCRIPTION Service Fabric Actor # of actor calls waiting for actor lock Number of pending actor calls waiting to acquire the per-actor lock that enforces turn-based concurrency Service Fabric Actor Average milliseconds per lock wait Time taken (in milliseconds) to acquire the per-actor lock that enforces turn- based concurrency Service Fabric Actor Average milliseconds actor lock held Time (in milliseconds) for which the per-actor lock is held Actor state management events and performance counters The Reliable Actors runtime emits the following events related to actor state management. ActorSaveStateStop 11 Verbose 0x4 Actors runtime has finished saving the actor state. It contains the number of pending actor calls that are waiting to acquire the per-actor lock that enforces turn-based concurrency. The Reliable Actors runtime publishes the following performance counters related to actor state management. The Reliable Actors runtime publishes the following performance counters related to concurrency. 
CATEGORY NAME COUNTER NAME DESCRIPTION Service Fabric Actor Average milliseconds per save state Time taken to save actor state in operation milliseconds Service Fabric Actor Average milliseconds per load state Time taken to load actor state in operation milliseconds Events related to actor replicas The Reliable Actors runtime emits the following events related to actor replicas. EVENT NAME EVENT ID LEVEL KEYWORD DESCRIPTION ActorMethodCallsWa 12 Verbose 0x8 This event is written itingForLock at the start of each new turn in an actor. EVENT NAME EVENT ID LEVEL KEYWORD DESCRIPTION ActorSaveStateStart 10 Verbose 0x4 Actors runtime is about to save the actor state. . This implies that the actors for this partition will no longer be created inside this replica. The Reliable Actors runtime publishes the following performance counters related to actor request processing. Actor activation and deactivation events and performance counters The Reliable Actors runtime emits the following events related to actor activation and deactivation. This implies that the actors for this partition will be created inside this replica. The actors will be destroyed after any in-progress requests are completed. The Reliable Actors runtime publishes the following performance counters related to actor activation and deactivation. No new requests will be delivered to actors already created within this replica. The service processes the request message and sends a response back to the client. EVENT NAME EVENT ID LEVEL KEYWORD DESCRIPTION ReplicaChangeRoleTo 1 Informational 0x1 Actor replica changed Primary role to Primary. EVENT NAME EVENT ID LEVEL KEYWORD DESCRIPTION ActorActivated 5 Informational 0x1 An actor has been activated. ReplicaChangeRoleFr 2 Informational 0x1 Actor replica changed omPrimary role to non-Primary. ActorDeactivated 6 Informational 0x1 An actor has been deactivated. CATEGORY NAME COUNTER NAME DESCRIPTION Service Fabric Actor Average OnActivateAsync milliseconds Time taken to execute OnActivateAsync method in milliseconds Actor request processing performance counters When a client invokes a method via an actor proxy object. CATEGORY NAME COUNTER NAME DESCRIPTION Service Fabric Actor # of outstanding requests Number of requests being processed in the service . it results in a request message being sent over the network to the actor service. CATEGORY NAME COUNTER NAME DESCRIPTION Service Fabric Actor Average milliseconds per request Time taken (in milliseconds) by the service to process a request Service Fabric Actor Average milliseconds for request Time taken (in milliseconds) to deserialization deserialize actor request message when it is received at the service Service Fabric Actor Average milliseconds for response Time taken (in milliseconds) to serialize serialization the actor response message at the service before the response is sent to the client Next steps How Reliable Actors use the Service Fabric platform Actor API reference documentation Sample code EventSource providers in PerfView . Fabric.\DevClusterSetup. You should now be able to successfully run the script. saying that the cmdlet is not recognized.AppTrace. This will fully refresh your path. Cluster connection failures Service Fabric PowerShell cmdlets are not recognized in Azure PowerShell Problem If you try to run any of the Service Fabric PowerShell cmdlets.ps1 Solution Close the current PowerShell window and open a new PowerShell window as an administrator. so this should no longer occur. 
Cluster connection fails with "Object is closed" Problem A call to Connect-ServiceFabricCluster fails with an error like this: .PowerShell. Solution Always run Service Fabric cmdlets directly from Windows PowerShell. The reason for this is that Azure PowerShell uses the 32-bit version of Windows PowerShell (even on 64-bit OS versions).Commands. Type Initialization exception Problem When you are connecting to the cluster in PowerShell. Please sign out of Windows and sign back in. you see an error like this: Cannot clean up C:\SfDevCluster\Log fully as references are likely being held to items in it. Cluster setup failures Cannot clean up Service Fabric logs Problem While running the DevClusterSetup script. you see the error TypeInitializationException for System. At line:1 char:1 + . WriteErrorException + FullyQualifiedErrorId : Microsoft. such as Connect-ServiceFabricCluster in an Azure PowerShell window.ps1 + ~~~~~~~~~~~~~~~~~~~~~ + CategoryInfo : NotSpecified: (:) [Write-Error]. Troubleshoot your local development cluster setup 3/3/2017 • 2 min to read • Edit Online If you run into an issue while interacting with your local Azure Service Fabric development cluster. whereas the Service Fabric cmdlets only work in 64-bit environments.Common.WriteErrorException. it fails. Please remove those and run this script again. Solution Your path variable was not correctly set during installation. NOTE The latest version of Azure PowerShell does not create a special shortcut. review the following suggestions for potential solutions.DevClusterSetup. Please note that all deployed applications and associated data will be removed.ConnectCluster Solution Close the current PowerShell window and open a new PowerShell window as an administrator. your local cluster begins to behave abnormally. FabricObjectClosedException + FullyQualifiedErrorId : CreateClusterConnectionErrorId. Only Service Fabric application projects should be set as startup projects. rather than allowing the Service Fabric runtime to start it for you.ServiceFabric. you get a FabricConnectionDeniedException error. following setup. This will remove the existing cluster and set up a new one. Fabric Connection Denied exception Problem When debugging from Visual Studio. Ensure that you do not have any service projects set as startup projects in your solution.Microsoft. Solution This error usually occurs when you try to try to start a service host process manually. Next steps Understand and troubleshoot your cluster with system health reports Visualize your cluster with Service Fabric Explorer . You should now be able to successfully connect. you can reset it using the local cluster manager system tray application. Connect-ServiceFabricCluster : The object is closed.Powershell. At line:1 char:1 + Connect-ServiceFabricCluster + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + CategoryInfo : InvalidOperation: (:) [Connect-ServiceFabricCluster]. TIP If. The source code used in the article is also available on GitHub. also known as sharding. The second one is computation-only services (like a calculator or image thumbnailing) that do not manage any persistent state. Partition Service Fabric reliable services 4/7/2017 • 15 min to read • Edit Online This article provides an introduction to the basic concepts of partitioning Azure Service Fabric reliable services. we can think about partitioning as a concept of dividing state (data) and compute into smaller accessible units to improve scalability and performance. In a broader sense. 
Partition Service Fabric stateless services For stateless services. In fact. In either case. There are really two types of stateless service solutions. . you can think about a partition being a logical unit that contains one or more instances of a service. partitioning a stateless service is a very rare scenario--scalability and availability are normally achieved by adding more instances. it is a core pattern of building scalable services. As an example. The remainder of this walkthrough focuses on stateful services. Partitioning Partitioning is not unique to Service Fabric.g. consider a case where users with IDs in a certain range should only be served by a particular service instance. Another example of when you could partition a stateless service is when you have a truly partitioned backend (e. A well-known form of partitioning is data partitioning. for example in an Azure SQL database (like a website that stores the session information and data). The only time you want to consider multiple partitions for stateless service instances is when you need to meet special routing requests. Those types of scenarios can also be solved in different ways and do not necessarily require service partitioning. a sharded SQL database) and you want to control which service instance should write to the database shard--or perform other preparation work within the stateless service that requires the same partitioning information as is used in the backend. Figure 1 shows a stateless service with five instances distributed across a cluster using one partition. The first one is a service that persists its state externally. Partitioning in the context of Service Fabric stateful services refers to the process of determining that a particular service partition is responsible for a portion of the complete state of the service. if you scaled back to 5 nodes. partitions grow. a partition is a set of replicas). (As mentioned before. As the data needs grow. Plan for partitioning Before implementing a service. you should always consider the partitioning strategy that is required to scale out. say you start with a 5-node cluster and a service that is configured to have 10 partitions and a target of three replicas. Then. As a result. the scale-out is achieved since requests from clients are distributed across computers. you could create a partition for each city in the county.Partition Service Fabric stateful services Service Fabric makes it easy to develop scalable stateful services by offering a first-class way to partition state (data). . Figure 3 illustrates a set of people and the city in which they reside. Service Fabric would balance and distribute the replicas across the cluster--and you would end up with two primary replicas per node. In this case. Conceptually. A great thing about Service Fabric is that it places the partitions on different nodes. Likewise. as the first step. If you were to build a service for a countywide poll. but all of them focus on what the application needs to achieve. This ensures the continued efficient use of hardware resources. and Service Fabric rebalances partitions across nodes. Let's take a simple example. Figure 2 shows the distribution of 10 partitions before and after scaling the cluster. and contention on access to chunks of data is reduced. Service Fabric would rebalance all the replicas across the 5 nodes. If you now need to scale out the cluster to 10 nodes. This allows them to grow to a node's resource limit. let's consider some of the more important aspects. 
you can think about a partition of a stateful service as a scale unit that is highly reliable through replicas that are distributed and balanced across the nodes in a cluster. you could store the votes for every person in the city in the partition that corresponds to that city. For the context of this article. To give you an example. A good approach is to think about the structure of the state that needs to be partitioned. Service Fabric would rebalance the primary replicas across all 10 nodes. There are different ways. overall performance of the application is improved. you cannot know how much data will be in a given partition. you can easily see that the partition that holds the votes for Seattle will get more traffic than the Kirkland one. Service Fabric provides the capability to report load consumed by services.g. By default. from a partitioning point of view: Try to partition the state so that it is evenly distributed across all partitions. In rare cases. Another aspect of partition planning is to choose the correct number of partitions to begin with. such as creating a new service instance of the same service type. you may end up with some partitions that contain a lot of data (e. based on client-side knowledge that your client code must maintain. Seattle) and other partitions with very little state (e. You would preferably want to avoid hot and cold spots like this in a cluster. (For information on how. such as amount of memory or number of records. Service Fabric detects that some partitions are serving higher loads than others and rebalances the cluster by moving replicas to more suitable nodes. while the second helps smooth out temporary differences in access or load over time. so that overall no node is overloaded. you should do two things. by reporting load. So what is the impact of having partitions with uneven amounts of state? If you think about the example again. Another consideration for partitioning planning is the available computer resources. You would also need to implement some client-side logic that routes the requests to the correct service instance. you may end up needing more partitions than you have initially chosen. Kirkland). check out this article on Metrics and Load). As the state needs to be .g. you would need to apply some advanced partition approaches. As you cannot change the partition count after the fact. there is nothing that prevents you from starting out with a higher number of partitions than anticipated for your scenario. Based on the metrics reported. In fact. Service Fabric makes sure that there is about the same number of primary and secondary replicas on each node. The first method prevents situations described in the voting example. So a general recommendation is to do both--first.As the population of cities varies widely. In order to avoid this. So you may end up with nodes that hold replicas that serve more traffic and others that serve less traffic. Report load from each of the replicas for the service. by adopting a partitioning strategy that spreads the data evenly across the partitions and second. Sometimes. assuming the maximum number of partitions is a valid approach. From a Service Fabric perspective. Named partitioning. By using this unique key. postal codes. the Visual Studio templates for Service Fabric use ranged partitioning. each responsible for a non-overlapping subrange of the overall partition key range. 
you are bound to follow: Network bandwidth limits System memory limits Disk storage limits So what happens if you run into resource constraints in a running cluster? The answer is that you can simply scale out the cluster to accommodate the new requirements. to use as your key. stateless services use this partitioning scheme by default. Applications using this model usually have data that can be bucketed. Select a hash algorithm An important part of hashing is selecting your hash algorithm. The remainder of this article focuses on the ranged partitioning scheme. You can specify the upper and lower bounds of the allowed key range. as it is the most common and useful one.accessed and stored. Singleton partitioning. you would then generate a hash code. or other business boundaries. A consideration is whether the goal is to group . and a count of 4 would create four partitions. customer groups. Ranged partitioning scheme This is used to specify an integer range (identified by a low key and high key) and a number of partitions (n). By default. The capacity planning guide offers guidance for how to determine how many nodes your cluster needs. a ranged partitioning scheme with a low key of 0. as shown below. A common approach is to create a hash based on a unique key within the data set. within a bounded set. For example. or a unique string. Named and Singleton partitioning schemes are special forms of ranged partitions. It creates n partitions. a high key of 99. Singleton partitions are typically used when the service does not require any additional routing. Some common examples of data fields used as named partition keys would be regions. For example. modulus the key range. an employee ID. Some common examples of keys would be a vehicle identification number (VIN). Service Fabric offers a choice of three partition schemes: Ranged partitioning (otherwise known as UniformInt64Partition). Get started with partitioning This section describes how to get started with partitioning your service. Open the Applicationmanifest. 2. as in reality the distribution would be uneven. you will build a very simple application where you want to store all last names that start with the same letter in the same partition. . A good resource for general hash code algorithm choices is the Wikipedia page on hash functions. Set the number of partitions. 1. NOTE This is a simplified scenario. you need to think about the partitions and partition keys. it has few collisions.xml file located in the ApplicationPackageRoot folder of the AlphabetPartitions project and update the parameter Processing_PartitionCount to 26 as shown below. we can use 0 as the low key and 25 as the high key. The characteristics of a good distribution hashing algorithm are that it is easy to compute. Build a stateful service with multiple partitions Let's create your first reliable stateful service with multiple partitions. and it distributes the keys evenly.similar keys near each other (locality sensitive hashing)--or if activity should be distributed broadly across all partitions (distribution hashing). In this example. Call the project "AlphabetPartitions". Before you write any code.Processing" as shown in the image below. Open Visual Studio > File > New > Project. In the New Project dialog box. but what about the low and high keys? As we literally want to have one partition per letter. 3. as each letter is its own key. In the Create a Service dialog box. which is more common. 4. You need 26 partitions (one for each letter in the alphabet). 
Last names starting with the letters "S" or "M" are more common than the ones starting with "X" or "Y". 5. choose Stateful service and call it "Alphabet. A good example of an efficient hash algorithm is the FNV-1 hash algorithm. choose the Service Fabric application. <Service Name="Processing"> <StatefulService ServiceTypeName="ProcessingType" TargetReplicaSetSize=" [Processing_TargetReplicaSetSize]" MinReplicaSetSize="[Processing_MinReplicaSetSize]"> <UniformInt64Partition PartitionCount="[Processing_PartitionCount]" LowKey="0" HighKey="25" /> </StatefulService> </Service> 6. etc. HttpListener can listen on multiple addresses on the same port as long as the URL prefix is unique. Multiple replicas of this service may be hosted on the same computer. see The Reliable Service communication model. FQDM. we assume that you are using a simple HttpCommunicationListener. This is why partition ID + replica ID are in the URL.xml as shown below. '+' is used as the address here so that the replica listens on all available hosts (IP. so this address needs to be unique to the replica. Next.xml (located in the PackageRoot folder) for the Alphabet. NOTE For this sample. For more information on reliable service communication. open up an endpoint on a port by adding the endpoint element of ServiceManifest.Processing service as shown below: <Endpoint Name="ProcessingServiceEndpoint" Port="8089" Protocol="http" Type="Internal" /> Now the service is configured to listen to an internal endpoint with 26 partitions. you want to make sure that a new unique address is used when transitioning from primary to secondary to force clients to re-resolve the address. The extra GUID is there for an advanced case where secondary replicas also listen for read-only requests. you need to override the CreateServiceReplicaListeners() method of the Processing class. .) The code below shows an example. 8. So you want to configure your communication listener to listen on the correct endpoints and with this pattern. When that's the case. 7. For the service to be accessible. localhost. <Parameter Name="Processing_PartitionCount" DefaultValue="26" /> You also need to update the LowKey and HighKey properties of the StatefulService element in the ApplicationManifest. A recommended pattern for the URL that a replica listens on is the following format: {scheme}://{nodeIp}:{port}/{partitionid}/{replicaid}/{guid} . ProcessInternalRequest). 9.ReplicaOrInstanceId. } It's also worth noting that the published URL is slightly different from the listening URL prefix.Format( "{0}://+:{1}/{2}/{3}-{4}/". The listening URL is given to HttpListener. .Port. string uriPublished = uriPrefix.Replace("+". The published URL is the URL that is published to the Service Fabric Naming Service.PartitionId.GetNodeContext(). uriPublished.IPAddressOrFQDN. which is used for service discovery.NewGuid()). nodeIP). context. context. The last step is to add the processing logic to the service as shown below.CreateInternalListener(context))}.Protocol. } private ICommunicationListener CreateInternalListener(ServiceContext context) { EndpointResourceDescription internalEndpoint = context. string uriPrefix = String. this. Clients will ask for this address through that discovery service. The address that clients get needs to have the actual IP or FQDN of the node in order to connect. internalEndpoint.GetEndpoint("ProcessingServiceEndpoint"). So you need to replace '+' with the node's IP or FQDN as shown above. 
return new HttpCommunicationListener(uriPrefix.CodePackageActivationContext. protected override IEnumerable<ServiceReplicaListener> CreateServiceReplicaListeners() { return new[] { new ServiceReplicaListener(context => this. string nodeIP = FabricRuntime. Guid. internalEndpoint. CreateTransaction()) { bool addResult = await dictionary. } using (HttpListenerResponse response = context. user.ToUpperInvariant(). This service serves as a simple web interface that accepts the lastname as a query string parameter.Processing service for processing. 0. return String. response.Length). choose Stateless service and call it "Alphabet.TryAddAsync(tx. determines the partition key.UTF8.Request.StateManager.AddUserAsync(user). Let's add a stateless service to the project to see how you can call a particular partition. and sends it to the Alphabet. outBytes.GetBytes(output). 10. } catch (Exception ex) { output = ex. CancellationToken cancelRequest) { string output = null. user. } } } private async Task<string> AddUserAsync(string user) { IReliableDictionary<String. String>>("dictionary").QueryString["lastname"]. String> dictionary = await this. 11. user).Message. string user = context.GetOrAddAsync<IReliableDictionary<String. In the Create a Service dialog box. using (ITransaction tx = this.ToString(). addResult ? "sucessfully added" : "already exists"). . try { output = await this.CommitAsync(). await tx.Web" as shown below.Write(outBytes. private async Task ProcessInternalRequest(HttpListenerContext context.Response) { if (output != null) { byte[] outBytes = Encoding. } } ProcessInternalRequest reads the values of the query string parameter used to call the partition and calls AddUserAsync to add the lastname to the reliable dictionary dictionary .StateManager.OutputStream.Format( "User {0} {1}". The HttpCommunicationListener calls ProcessInputRequest when a request comes in. } private ICommunicationListener CreateInputListener(ServiceContext context) { // Service instance's URL is the node's IP & desired port EndpointResourceDescription inputEndpoint = context. So let's go ahead and add the code below.Port).Format("{0}://+:{1}/alphabetpartitions/". inputEndpoint. Again. inputEndpoint.Replace("+".CreateInputListener(context))}. } 14. uriPublished. Update the endpoint information in the ServiceManifest.ProcessInputRequest).GetEndpoint("WebApiServiceEndpoint") string uriPrefix = String. this. .CodePackageActivationContext. you can choose to implement a simple HttpCommunicationListener. Now you need to implement the processing logic. <Endpoint Name="WebApiServiceEndpoint" Protocol="http" Port="8081"/> 13. 12. var uriPublished = uriPrefix. return new HttpCommunicationListener(uriPrefix.GetNodeContext().Protocol. FabricRuntime.IPAddressOrFQDN). You need to return a collection of ServiceInstanceListeners in the class Web.WebApi service to open up a port as shown below. . protected override IEnumerable<ServiceInstanceListener> CreateServiceInstanceListeners() { return new[] {new ServiceInstanceListener(context => this.xml of the Alphabet. Remember. outBytes. cancelRequest).Message. . } catch (Exception ex) { output = ex. The code reads the first letter of the query string parameter lastname into a char. } } } Let's walk through it step by step. UriBuilder primaryReplicaUriBuilder = new UriBuilder(primaryReplicaAddress). primaryReplicaAddress). Then. ServicePartitionKey partitionKey = new ServicePartitionKey(Char.Parse(ep. 
it determines the partition key for this letter by subtracting the hexadecimal value of A from the hexadecimal value of the last names' first letter.Response) { if (output != null) { output = output + "added to Partition: " + primaryReplicaAddress.GetEndpoint(). <p>Partition key: '{1}' generated from the first letter '{2}' of input value '{3}'.Info. ResolvedServiceEndpoint ep = partition.First().Id.Format( "Result: {0}.Write(outBytes. for this example. char firstLetterOfLastName = lastname. primaryReplicaUriBuilder. CancellationToken cancelRequest) { String output = null. ServicePartitionKey partitionKey = new ServicePartitionKey(Char.GetBytes(output). try { string lastname = context. partitionKey.Uri). byte[] outBytes = Encoding.ToUpper(firstLetterOfLastName) . we are using 26 partitions with one partition key per partition. <br>Processing service replica address: {5}". char firstLetterOfLastName = lastname.QueryString["lastname"].Length). Next. string result = await this.QueryString["lastname"].GetStringAsync(primaryReplicaUriBuilder. lastname.servicePartitionResolver.Address).httpClient. string primaryReplicaAddress = (string)addresses["Endpoints"].Request.GetDefault(). firstLetterOfLastName.ResolveAsync(alphabetServiceUri. JObject addresses = JObject. response. <br>Processing service partition ID: {4}.Query = "lastname=" + lastname.'A'). } using (var response = context.Request. partitionKey.ToUpper(firstLetterOfLastName) - 'A').First(). ResolvedServicePartition partition = await this. output = String.OutputStream. servicePartitionResolver is defined as private readonly ServicePartitionResolver servicePartitionResolver = ServicePartitionResolver. private async Task ProcessInputRequest(HttpListenerContext context. result. partition. 0. string lastname = context.UTF8.First(). we obtain the service partition partition for this key by using the ResolveAsync method on the servicePartitionResolver object. and a cancellation token as parameters. you can test the partitioning logic by entering http://localhost:8081/?lastname=somename . . we build the endpoint URL plus the querystring and call the processing service. Next. string primaryReplicaAddress = (string)addresses["Endpoints"]. Visual Studio uses application parameters for local and cloud deployment. string result = await this. UriBuilder primaryReplicaUriBuilder = new UriBuilder(primaryReplicaAddress). The last step is to test the service. The ResolveAsync method takes the service URI.Uri). The service URI for the processing service is fabric:/AlphabetPartitions/Processing .GetEndpoint() Finally.Query = "lastname=" + lastname. we get the endpoint of the partition.xml file in the ApplicationParameters folder of the AlphabetPartitions project as shown below: <Parameters> <Parameter Name="Processing_PartitionCount" Value="26" /> <Parameter Name="WebApi_InstanceCount" Value="1" /> </Parameters> 16. 17. In a browser.GetStringAsync(primaryReplicaUriBuilder. JObject addresses = JObject. Once the processing is done. ResolvedServiceEndpoint ep = partition. To test the service with 26 partitions locally. we write the output back.Address). you can check the service and all of its partitions in the Service Fabric Explorer. primaryReplicaUriBuilder.httpClient. the partition key. You will see that each last name that starts with the same letter is being stored in the same partition. 15.First(). Once you finish deployment. you need to update the Local.Parse(ep. The entire source code of the sample is available on GitHub. 
see the following: Availability of Service Fabric services Scalability of Service Fabric services Capacity planning for Service Fabric applications . Next steps For information on Service Fabric concepts. Availability of Service Fabric stateless services Azure Service Fabric services can be either stateful or stateless. Replica roles The role of a replica is used to manage the life cycle of the state being managed by that replica. The Primary also handles all write requests by updating its state and replicating the changes. In actors. Service Fabric makes one of the Active Secondary replicas the new Primary replica. These changes are applied to the Active Secondaries in the replica set. Availability of Service Fabric services 1/17/2017 • 2 min to read • Edit Online This article gives an overview of how Service Fabric maintains availability of a service. a new instance is created on some eligible node in the cluster. the notion of role is unnecessary. Creating a stateless service requires defining an instance count. If the Primary replica goes down. Read and write operations are performed at one replica (called the Primary). Each replica is an instance of the code of the service that has a copy of the state. A stateless service is an application service that does not have any local persistent state. Changes to state from write operations are replicated to multiple other replicas (called Active Secondaries). is known as the Replica Role. The job of an Active Secondary is to receive state changes that the Primary replica has replicated and update its view of the state. This Active Secondary replica already has the updated version of the state (via replication). The combination of Primary and Active Secondaries make up the replica set of the service. but there can be multiple Active Secondary replicas. The number of Active Secondary replicas is configurable. of a replica being either a Primary or Active Secondary. Availability of Service Fabric stateful services A stateful service has some state associated with it. NOTE Higher-level programming models such as the reliable actors framework and Reliable Services abstract away the concept of replica role from the developer. This concept. The instance count defines the number of instances of the stateless service's application logic that should be running in the cluster. A replica whose role is Primary services read requests. see the following articles: . Increasing the number of instances is the recommended way of scaling out a stateless service. When a fault is detected on any instance of a stateless service. There can be only one Primary replica servicing read and write requests. while in Services it is visible if necessary but largely simplified. a stateful service is modeled as a set of replicas. Next steps For more information on Service Fabric concepts. and a higher number of replicas can tolerate a greater number of concurrent software and hardware failures. and it can continue processing further read and write operations. In Service Fabric. Scalability of Service Fabric services Partitioning Service Fabric services Defining and managing state Reliable Services Scaling Service Fabric applications 1/17/2017 • 4 min to read • Edit Online Azure Service Fabric makes it easy to build scalable applications by managing the services, partitions, and replicas on all the nodes in a cluster. This enables maximum resource utilization. High scale for Service Fabric applications can be achieved in two ways: 1. Scaling at the service partition level 2. 
Scaling at the named service instance level Scaling at the partition level Service Fabric supports partitioning. Partitioning allows an individual service to be split into multiple independent partitions, each with some portion of the service's overall state. The partitioning overview provides information on the types of partitioning schemes that are supported. The replicas of each partition are spread across the nodes in a cluster. Consider a service that uses a ranged partitioning scheme with a low key of 0, a high key of 99, and a partition count of 4. In a three-node cluster, the service might be laid out with four replicas that share the resources on each node as shown here: If you increase the number of nodes, Service Fabric will utilize the resources on the new nodes by moving some of the existing replicas there. By increasing the number of nodes to four, the service now has three replicas running on each node (each belonging to different partitions), allowing for better resource utilization and performance. Scaling at the service name level A service instance is a specific instance of an application name and a service type name (see Service Fabric application life cycle). During the creation of a service, you specify the partition scheme (see Partitioning Service Fabric services) to be used. The first level of scaling is by service name. You can create instances of a service, optionally with different levels of partitioning, as your older service instances become busy. This allows new service consumers to use less-busy service instances, rather than busier ones. One option for increasing capacity is to create a new service instance with a new partition scheme. This adds complexity, though. Any consuming clients need to know when and how to use the differently named service. As another alternative a management or intermediary service would need to make a determination about which service and partition should handle each request. Example scenario: Embedded dates One possible scenario would be to use date information as part of the service name. For example, you could use a service instance with a specific name for all customers who joined in 2013 and another name for customers who joined in 2014. This naming scheme allows for programmatically increasing the names depending on the date (as 2014 approaches, the service instance for 2014 can be created on demand). However, this approach is based on the clients using application-specific naming information that is outside the scope of Service Fabric knowledge. Using a naming convention: In 2013, when your application goes live, you create a service called fabric:/app/service2013. Near the second quarter of 2013, you create another service, called fabric:/app/service2014. Both of these services are of the same service type. In this approach, your client will need to employ logic to construct the appropriate service name based on the year. Using a lookup service: Another pattern is to provide a secondary lookup service, which can provide the name of the service for a desired key. New service instances can then be created by the lookup service. The lookup service itself doesn't retain any application data, only data about the service names that it creates. Thus, for the year-based example above, the client would first contact the lookup service to find out the name of the service handling data for a given year. Then, the client would use that service name for performing the actual operation. The result of the first lookup can be cached. 
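As a minimal sketch of the naming-convention approach, the client-side logic can be as small as a helper that builds the year-based service URI before resolving it. The fabric:/app/service{year} pattern comes from the example above; the helper class and the record property used in the usage comment are hypothetical.

using System;

internal static class ServiceNames
{
    // Builds the per-year service name, for example fabric:/app/service2014.
    public static Uri ForYear(int year)
    {
        return new Uri($"fabric:/app/service{year}");
    }
}

// Usage (illustrative): route a record to the service instance that owns its year,
// then resolve the returned name with ServicePartitionResolver or a gateway.
// Uri serviceName = ServiceNames.ForYear(record.CreatedUtc.Year);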
Putting it all together Let's take all the ideas that we've discussed here and talk through another scenario. Consider the following example: you are trying to build a service that acts as an address book, holding on to names and contact information. How many users are you going to have? How many contacts will each user store? Trying to figure this all out when you are standing up your service for the first time is really hard. The consequences of picking the wrong partition count could cause you to have scale issues later. But why try to pick single partition scheme out for all users at all? In these types of situations, consider the following pattern instead: 1. Instead of trying to pick a partitioning scheme for everyone up front, build a "manager service". 2. The job of the manager service is to look at customer information when they sign up for your service. Then depending on that information to create an instance of your actual contact-storage service just for that customer. This type of dynamic service creation pattern many benefits: You're not trying to guess the correct partition count for all users up front Data segmentation, since each customer has their own copy of the service Each customer service can be configured differently, with more or fewer partitions or replicas as necessary based on their expected scale. For example, say the customer paid for the "Gold" tier - they could get more replicas or greater partition count Or say they provided information indicating the number of contacts they needed was "Small" - they would get only a few partitions. You're not running a bunch of service instances or replicas while you're waiting for customers to show up If a customer ever leaves, then removing their information from your service is as simple as having the manager delete that service that it created Next steps For more information on Service Fabric concepts, see the following articles: Availability of Service Fabric services Partitioning Service Fabric services Defining and managing state Capacity planning for Service Fabric applications 3/3/2017 • 6 min to read • Edit Online This document teaches you how to estimate the amount of resources (CPUs, RAM, disk storage) you need to run your Azure Service Fabric applications. It is common for your resource requirements to change over time. You typically require few resources as you develop/test your service, and then require more resources as you go into production and your application grows in popularity. When you design your application, think through the long- term requirements and make choices that allow your service to scale to meet high customer demand. When you create a Service Fabric cluster, you decide what kinds of virtual machines (VMs) make up the cluster. Each VM comes with a limited amount of resources in the form of CPUs (cores and speed), network bandwidth, RAM, and disk storage. As your service grows over time, you can upgrade to VMs that offer greater resources and/or add more VMs to your cluster. To do the latter, you must architect your service initially so it can take advantage of new VMs that get dynamically added to the cluster. Some services manage little to no data on the VMs themselves. Therefore, capacity planning for these services should focus primarily on performance, which means selecting the appropriate CPUs (cores and speed) of the VMs. In addition, you should consider network bandwidth, including how frequently network transfers are occurring and how much data is being transferred. 
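One possible shape for the manager service described above is sketched here: when a customer signs up, it creates a dedicated contact-storage service instance and sizes it by tier. This is not the article's implementation; the application name, service type name, and tier sizes are placeholders, and a real version would also check whether the service already exists before creating it.

using System;
using System.Fabric;
using System.Fabric.Description;
using System.Threading.Tasks;

internal sealed class CustomerServiceManager
{
    private readonly FabricClient fabricClient = new FabricClient();

    public Task CreateCustomerServiceAsync(string customerId, bool isGoldTier)
    {
        var description = new StatefulServiceDescription
        {
            ApplicationName = new Uri("fabric:/ContactsApp"),                      // placeholder
            ServiceName = new Uri($"fabric:/ContactsApp/contacts-{customerId}"),   // one instance per customer
            ServiceTypeName = "ContactStoreType",                                  // placeholder
            HasPersistedState = true,
            MinReplicaSetSize = 3,
            TargetReplicaSetSize = isGoldTier ? 5 : 3,        // "Gold" customers get more replicas
            PartitionSchemeDescription = new UniformInt64RangePartitionSchemeDescription(
                isGoldTier ? 16 : 2, long.MinValue, long.MaxValue)
        };

        return this.fabricClient.ServiceManager.CreateServiceAsync(description);
    }
}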
If your service needs to perform well as service usage increases, you can add more VMs to the cluster and load balance the network requests across all the VMs. For services that manage large amounts of data on the VMs, capacity planning should focus primarily on size. Thus, you should carefully consider the capacity of the VM's RAM and disk storage. The virtual memory management system in Windows makes disk space look like RAM to application code. In addition, the Service Fabric runtime provides smart paging keeping only hot data in memory and moving the cold data to disk. Applications can thus use more memory than is physically available on the VM. Having more RAM simply increases performance, since the VM can keep more disk storage in RAM. The VM you select should have a disk large enough to store the data that you want on the VM. Similarly, the VM should have enough RAM to provide you with the performance you desire. If your service's data grows over time, you can add more VMs to the cluster and partition the data across all the VMs. Determine how many nodes you need Partitioning your service allows you to scale out your service's data. For more information on partitioning, see Partitioning Service Fabric. Each partition must fit within a single VM, but multiple (small) partitions can be placed on a single VM. So, having more small partitions gives you greater flexibility than having a few larger partitions. The trade-off is that having lots of partitions increases Service Fabric overhead and you cannot perform transacted operations across partitions. There is also more potential network traffic if your service code frequently needs to access pieces of data that live in different partitions. When designing your service, you should carefully consider these pros and cons to arrive at an effective partitioning strategy. Let's assume your application has a single stateful service that has a store size that you expect to grow to DB_Size GB in a year. You are willing to add more applications (and partitions) as you experience growth beyond that year. The replication factor (RF), which determines the number of replicas for your service impacts the total DB_Size. The total DB_Size across all replicas is the Replication Factor multiplied by DB_Size. Node_Size represents the disk space/RAM per node you want to use for your service. For best performance, the DB_Size should fit into memory across the cluster, and a Node_Size that is around the RAM of the VM should be chosen. By allocating a Node_Size that is larger than the RAM capacity, you are relying on the paging provided by the Service Fabric runtime. Thus, your performance may not be optimal if your entire data is considered to be hot (since then the data is paged in/out). However, for many services where only a fraction of the data is hot, it is more cost-effective. The number of nodes required for maximum performance can be computed as follows: Number of Nodes = (DB_Size * RF)/Node_Size Account for growth You may want to compute the number of nodes based on the DB_Size that you expect your service to grow to, in addition to the DB_Size that you began with. Then, grow the number of nodes as your service grows so that you are not over-provisioning the number of nodes. But the number of partitions should be based on the number of nodes that are needed when you're running your service at maximum growth. 
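To make the formula concrete, here is a small worked example; the store size, replication factor, and per-node size are made-up numbers for illustration only.

// Number of Nodes = (DB_Size * RF) / Node_Size, rounded up.
double dbSizeGb = 600;            // expected store size in GB
double replicationFactor = 3;     // target replica set size
double nodeSizeGb = 56;           // disk/RAM per node dedicated to this service

double nodeCount = Math.Ceiling((dbSizeGb * replicationFactor) / nodeSizeGb);
// (600 * 3) / 56 is roughly 32.1, so about 33 nodes before adding headroom for spikes and failures.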
It is good to have some extra machines available at any time so that you can handle any unexpected spikes or failures (for example, if a few VMs go down). While the extra capacity should be determined by using your expected spikes, a starting point is to reserve a few extra VMs (5-10 percent extra). The preceding assumes a single stateful service. If you have more than one stateful service, you have to add the DB_Size associated with the other services into the equation. Alternatively, you can compute the number of nodes separately for each stateful service. Your service may have replicas or partitions that aren't balanced, and some partitions may have more data than others. For more information on partitioning, see the partitioning article on best practices. However, the preceding equation is partition and replica agnostic, because Service Fabric ensures that the replicas are spread out among the nodes in an optimized manner. Use a spreadsheet for cost calculation Now let's put some real numbers in the formula. An example spreadsheet shows how to plan the capacity for an application that contains three types of data objects. For each object, we approximate its size and how many objects we expect to have. We also select how many replicas we want of each object type. The spreadsheet calculates the total amount of memory to be stored in the cluster. Then we enter a VM size and monthly cost. Based on the VM size, the spreadsheet tells you the minimum number of partitions you must use to split your data to physically fit on the nodes. You may desire a larger number of partitions to accommodate your application's specific computation and network traffic needs. The spreadsheet shows that the number of partitions managing the user profile objects has increased from one to six. Now, based on all this information, the spreadsheet shows that you could physically fit all the data with the desired partitions and replicas on a 26-node cluster. However, this cluster would be densely packed, so you may want some additional nodes to accommodate node failures and upgrades. The spreadsheet also shows that having more than 57 nodes provides no additional value because you would have empty nodes. Again, you may want to go above 57 nodes anyway to accommodate node failures and upgrades. You can tweak the spreadsheet to match your application's specific needs. Next steps Check out Partitioning Service Fabric services to learn more about partitioning your service. Introduction to the Fault Analysis Service 3/1/2017 • 6 min to read • Edit Online The Fault Analysis Service is designed for testing services that are built on Microsoft Azure Service Fabric. With the Fault Analysis Service, you can induce meaningful faults and run complete test scenarios against your applications. These faults and scenarios exercise and validate the numerous states and transitions that a service will experience throughout its lifetime, all in a controlled, safe, and consistent manner. Actions are the individual faults that target a service in order to test it. A service developer can use these as building blocks to write complicated scenarios. For example: Restart a node to simulate any number of situations where a machine or VM is rebooted. Move a replica of your stateful service to simulate load balancing, failover, or application upgrade. Invoke quorum loss on a stateful service to create a situation where write operations can't proceed because there aren't enough "back-up" or "secondary" replicas to accept new data.
Invoke data loss on a stateful service to create a situation where all in-memory state is completely wiped out. Scenarios are complex operations composed of one or more actions. The Fault Analysis Service provides two built-in complete scenarios: Chaos Scenario Failover Scenario Testing as a service The Fault Analysis Service is a Service Fabric system service that is automatically started with a Service Fabric cluster. This service acts as the host for fault injection, test scenario execution, and health analysis. When a fault action or test scenario is initiated, a command is sent to the Fault Analysis Service to run the fault action or test scenario. The Fault Analysis Service is stateful so that it can reliably run faults and scenarios and validate results. For example, a long-running test scenario can be reliably executed by the Fault Analysis Service. And because tests are being executed inside the cluster, the service can examine the state of the cluster and your services to provide more in-depth information about failures. Testing distributed systems Service Fabric makes the job of writing and managing distributed scalable applications significantly easier. The Fault Analysis Service makes testing a distributed application similarly easy. There are three main issues that need to be solved while testing: 1. Simulating/generating failures that might occur in real-world scenarios: One of the important aspects of Service Fabric is that it enables distributed applications to recover from various failures. However, to test that the application is able to recover from these failures, we need a mechanism to simulate/generate these real-world failures in a controlled test environment. 2. The ability to generate correlated failures: Basic failures in the system, such as network failures and machine failures, are easy to produce individually. Generating a significant number of scenarios that can happen in the real world as a result of the interactions of these individual failures is non-trivial. 3. Unified experience across various levels of development and deployment: There are many fault injection systems that can induce various types of failures. However, the experience in all of these is poor when moving from one-box developer scenarios, to running the same tests in large test environments, to using them for tests in production. While there are many mechanisms to solve these problems, a system that does the same with required guarantees--all the way from a one-box developer environment, to tests in production clusters--is missing. The Fault Analysis Service helps application developers concentrate on testing their business logic. The Fault Analysis Service provides all the capabilities needed to test the interaction of the service with the underlying distributed system. Simulating/generating real-world failure scenarios To test the robustness of a distributed system against failures, we need a mechanism to generate failures. While generating a failure like a node going down seems easy in theory, it starts hitting the same set of consistency problems that Service Fabric is trying to solve. As an example, if we want to shut down a node, the required workflow is the following: 1. From the client, issue a shutdown node request. 2. Send the request to the right node. a. If the node is not found, it should fail. b. If the node is found, it should return only if the node is shut down.
To verify the failure from a test perspective, the test needs to know that when this failure is induced, the failure actually happens. The guarantee that Service Fabric provides is that either the node will go down or was already down when the command reached the node. In either case the test should be able to correctly reason about the state and succeed or fail correctly in its validation. A system implemented outside of Service Fabric to do the same set of failures could hit many network, hardware, and software issues, which would prevent it from providing the preceding guarantees. In the presence of the issues stated before, Service Fabric will reconfigure the cluster state to work around the issues, and hence the Fault Analysis Service will still be able to give the right set of guarantees. Generating required events and scenarios While simulating a real-world failure consistently is tough to start with, the ability to generate correlated failures is even tougher. For example, a data loss happens in a stateful persisted service when the following things happen: 1. Only a write quorum of the replicas are caught up on replication. All the secondary replicas lag behind the primary. 2. The write quorum goes down because of the replicas going down (due to a code package or node going down). 3. The write quorum cannot come back up because the data for the replicas is lost (due to disk corruption or machine reimaging). These correlated failures do happen in the real world, but not as frequently as individual failures. The ability to test for these scenarios before they happen in production is critical. Even more important is the ability to simulate these scenarios with production workloads in controlled circumstances (in the middle of the day with all engineers on deck). That is much better than having it happen for the first time in production at 2:00 A.M. Unified experience across different environments The practice traditionally has been to create three different sets of experiences, one for the development environment, one for tests, and one for production. The model was: 1. In the development environment, produce state transitions that allow unit tests of individual methods. 2. In the test environment, produce failures to allow end-to-end tests that exercise various failure scenarios. 3. Keep the production environment pristine to prevent any non-natural failures and to ensure that there is extremely quick human response to failure. In Service Fabric, through the Fault Analysis Service, we are proposing to turn this around and use the same methodology from developer environment to production. There are two ways to achieve this: 1. To induce controlled failures, use the Fault Analysis Service APIs from a one-box environment all the way to production clusters. 2. To give the cluster a fever that causes automatic induction of failures, use the Fault Analysis Service to generate automatic failures. Controlling the rate of failures through configuration enables the same service to be tested differently in different environments. With Service Fabric, though the scale of failures would be different in the different environments, the actual mechanisms would be identical. This allows for a much quicker code-to-deployment pipeline and the ability to test the services under real-world loads. Using the Fault Analysis Service C# Fault Analysis Service features are in the System.Fabric namespace in the Microsoft.ServiceFabric NuGet package. 
To use the Fault Analysis Service features, include the nuget package as a reference in your project. PowerShell To use PowerShell, you must install the Service Fabric SDK. After the SDK is installed, the ServiceFabric PowerShell module is auto loaded for you to use. Next steps To create truly cloud-scale services, it is critical to ensure, both before and after deployment, that services can withstand real world failures. In the services world today, the ability to innovate quickly and move code to production quickly is very important. The Fault Analysis Service helps service developers to do precisely that. Begin testing your applications and services using the built-in test scenarios, or author your own test scenarios using the fault actions provided by the Fault Analysis Service. Service Fabric testability scenarios: Service communication 1/17/2017 • 4 min to read • Edit Online Microservices and service-oriented architectural styles surface naturally in Azure Service Fabric. In these types of distributed architectures, componentized microservice applications are typically composed of multiple services that need to talk to each other. In even the simplest cases, you generally have at least a stateless web service and a stateful data storage service that need to communicate. Service-to-service communication is a critical integration point of an application, because each service exposes a remote API to other services. Working with a set of API boundaries that involves I/O generally requires some care, with a good amount of testing and validation. There are numerous considerations to make when these service boundaries are wired together in a distributed system: Transport protocol. Will you use HTTP for increased interoperability, or a custom binary protocol for maximum throughput? Error handling. How will permanent and transient errors be handled? What will happen when a service moves to a different node? Timeouts and latency. In multitiered applications, how will each service layer handle latency through the stack and to the user? Whether you use one of the built-in service communication components provided by Service Fabric or you build your own, testing the interactions between your services is critical to ensuring resiliency in your application. Prepare for services to move Service instances may move around over time. This is especially true when they are configured with load metrics for custom-tailored optimal resource balancing. Service Fabric moves your service instances to maximize their availability even during upgrades, failovers, scale-out, and other situations that occur over the lifetime of a distributed system. As services move around in the cluster, your clients and other services should be prepared to handle two scenarios when they talk to a service: The service instance or partition replica has moved since the last time you talked to it. This is a normal part of a service lifecycle, and it should be expected to happen during the lifetime of your application. The service instance or partition replica is in the process of moving. Although failover of a service from one node to another occurs very quickly in Service Fabric, there may be a delay in availability if the communication component of your service is slow to start. Handling these scenarios gracefully is important for a smooth-running system. To do so, keep in mind that: Every service that can be connected to has an address that it listens on (for example, HTTP or WebSockets). 
When a service instance or partition moves, its address endpoint changes. (It moves to a different node with a different IP address.) If you're using the built-in communication components, they will handle re-resolving service addresses for you. There may be a temporary increase in service latency as the service instance starts up its listener again. This depends on how quickly the service opens the listener after the service instance is moved. Any existing connections need to be closed and reopened after the service opens on a new node. A graceful node shutdown or restart allows time for existing connections to be shut down gracefully. Test it: Move service instances By using Service Fabric's testability tools, you can author a test scenario to test these situations in different ways: 1. Move a stateful service's primary replica. The primary replica of a stateful service partition can be moved for any number of reasons. Use this to target the primary replica of a specific partition to see how your services react to the move in a very controlled manner. PS > Move-ServiceFabricPrimaryReplica -PartitionId 6faa4ffa-521a-44e9-8351-dfca0f7e0466 -ServiceName fabric:/MyApplication/MyService 2. Stop a node. When a node is stopped, Service Fabric moves all of the service instances or partitions that were on that node to one of the other available nodes in the cluster. Use this to test a situation where a node is lost from your cluster and all of the service instances and replicas on that node have to move. You can stop a node by using the PowerShell Stop-ServiceFabricNode cmdlet: PS > Restart-ServiceFabricNode -NodeName Node_1 Maintain service availability As a platform, Service Fabric is designed to provide high availability of your services. But in extreme cases, underlying infrastructure problems can still cause unavailability. It is important to test for these scenarios, too. Stateful services use a quorum-based system to replicate state for high availability. This means that a quorum of replicas needs to be available to perform write operations. In rare cases, such as a widespread hardware failure, a quorum of replicas may not be available. In these cases, you will not be able to perform write operations, but you will still be able to perform read operations. Test it: Write operation unavailability By using the testability tools in Service Fabric, you can inject a fault that induces quorum loss as a test. Although such a scenario is rare, it is important that clients and services that depend on a stateful service are prepared to handle situations where they cannot make write requests to it. It is also important that the stateful service itself is aware of this possibility and can gracefully communicate it to callers. You can induce quorum loss by using the PowerShell Invoke-ServiceFabricPartitionQuorumLoss cmdlet: PS > Invoke-ServiceFabricPartitionQuorumLoss -ServiceName fabric:/Myapplication/MyService -QuorumLossMode QuorumReplicas -QuorumLossDurationInSeconds 20 In this example, we set QuorumLossMode to QuorumReplicas to indicate that we want to induce quorum loss without taking down all replicas. This way, read operations are still possible. To test a scenario where an entire partition is unavailable, you can set this switch to AllReplicas . 
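If you prefer to drive the same fault from code, the following is a minimal C# sketch of the equivalent call through FabricClient, assuming the three-argument InvokeQuorumLossAsync overload (partition selector, mode, duration); the cluster address and service URI are placeholder values you would replace for your own cluster.

// Minimal sketch: induce quorum loss on a random partition of a stateful service.
// Assumes references to System.Fabric.dll and System.Fabric.Testability.dll;
// "localhost:19000" and the service URI below are placeholders.
using System;
using System.Fabric;

class QuorumLossTest
{
    static void Main(string[] args)
    {
        using (var client = new FabricClient("localhost:19000"))
        {
            Uri serviceName = new Uri("fabric:/MyApplication/MyService");

            // Pick a random partition of the target service.
            PartitionSelector partition = PartitionSelector.RandomOf(serviceName);

            // Keep a quorum of replicas down for 20 seconds. Reads remain possible,
            // but writes should fail until the quorum is restored.
            client.TestManager.InvokeQuorumLossAsync(
                partition,
                QuorumLossMode.QuorumReplicas,
                TimeSpan.FromSeconds(20)).GetAwaiter().GetResult();

            Console.WriteLine("Quorum loss induced and restored.");
        }
    }
}

As with the PowerShell example, passing QuorumLossMode.AllReplicas instead would take down every replica of the partition, making it unavailable for reads as well.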
Next steps Learn more about testability actions Learn more about testability scenarios Induce controlled Chaos in Service Fabric clusters 1/20/2017 • 4 min to read • Edit Online Large-scale distributed systems like cloud infrastructures are inherently unreliable. Azure Service Fabric enables developers to write reliable services on top of an unreliable infrastructure. To write robust services, developers need to be able to induce faults against such unreliable infrastructure to test the stability of their services. The Fault Injection and Cluster Analysis Service (also known as the Fault Analysis Service) gives developers the ability to induce fault actions to test services. However, targeted simulated faults get you only so far. To take the testing further, you can use Chaos. Chaos simulates continuous, interleaved faults (both graceful and ungraceful) throughout the cluster over extended periods of time. After you configure Chaos with the rate and the kind of faults, you can start or stop it through either C# APIs or PowerShell to generate faults in the cluster and your service. While Chaos is running, it produces different events that capture the state of the run at the moment. For example, an ExecutingFaultsEvent contains all the faults that are being executed in that iteration. A ValidationFailedEvent contains the details of a failure that was found during cluster validation. You can invoke the GetChaosReportAsync API to get the report of Chaos runs. Faults induced in Chaos Chaos generates faults across the entire Service Fabric cluster and compresses faults that are seen in months or years into a few hours. The combination of interleaved faults with the high fault rate finds corner cases that are otherwise missed. This Chaos exercise leads to a significant improvement in the code quality of the service. Chaos induces faults from the following categories: Restart a node Restart a deployed code package Remove a replica Restart a replica Move a primary replica (configurable) Move a secondary replica (configurable) Chaos runs in multiple iterations. Each iteration consists of faults and cluster validation for the specified period. You can configure the time spent for the cluster to stabilize and for validation to succeed. If a failure is found in cluster validation, Chaos generates and persists a ValidationFailedEvent with the UTC timestamp and the failure details. For example, consider an instance of Chaos that is set to run for an hour with a maximum of three concurrent faults. Chaos induces three faults, and then validates the cluster health. It iterates through the previous step until it is explicitly stopped through the StopChaosAsync API or one-hour passes. If the cluster becomes unhealthy in any iteration (that is, it does not stabilize within a configured time), Chaos generates a ValidationFailedEvent. This event indicates that something has gone wrong and might need further investigation. In its current form, Chaos induces only safe faults. This implies that, in the absence of external faults, a quorum loss or data loss never occurs. Important configuration options TimeToRun: Total time that Chaos runs before it finishes with success. You can stop Chaos before it has run for the TimeToRun period through the StopChaos API. MaxClusterStabilizationTimeout: The maximum amount of time to wait for the cluster to become healthy before checking on it again. This wait is to reduce the load on the cluster while it is recovering. 
The checks performed are whether the cluster health is OK, whether the service health is OK, whether the target replica set size is achieved for the service partition, and that no InBuild replicas exist. MaxConcurrentFaults: The maximum number of concurrent faults that are induced in each iteration. The higher the number, the more aggressive Chaos is. This results in more complex failovers and transition combinations. Chaos guarantees that, in the absence of external faults, there is no quorum loss or data loss, regardless of how high a value this configuration has. EnableMoveReplicaFaults: Enables or disables the faults that cause the primary or secondary replicas to move. These faults are disabled by default. WaitTimeBetweenIterations: The amount of time to wait between iterations, that is, after a round of faults and corresponding validation. WaitTimeBetweenFaults: The amount of time to wait between two consecutive faults in an iteration. How to run Chaos C#:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Fabric;
using System.Diagnostics;
using System.Fabric.Chaos.DataStructures;

class Program
{
    private class ChaosEventComparer : IEqualityComparer<ChaosEvent>
    {
        public bool Equals(ChaosEvent x, ChaosEvent y)
        {
            return x.TimeStampUtc.Equals(y.TimeStampUtc);
        }

        public int GetHashCode(ChaosEvent obj)
        {
            return obj.TimeStampUtc.GetHashCode();
        }
    }

    static void Main(string[] args)
    {
        var clusterConnectionString = "localhost:19000";
        using (var client = new FabricClient(clusterConnectionString))
        {
            var startTimeUtc = DateTime.UtcNow;
            var stabilizationTimeout = TimeSpan.FromSeconds(30.0);
            var timeToRun = TimeSpan.FromMinutes(60.0);
            var maxConcurrentFaults = 3;

            var parameters = new ChaosParameters(
                stabilizationTimeout,
                maxConcurrentFaults,
                true, /* EnableMoveReplicaFault */
                timeToRun);

            try
            {
                client.TestManager.StartChaosAsync(parameters).GetAwaiter().GetResult();
            }
            catch (FabricChaosAlreadyRunningException)
            {
                Console.WriteLine("An instance of Chaos is already running in the cluster.");
            }

            var filter = new ChaosReportFilter(startTimeUtc, DateTime.MaxValue);

            var eventSet = new HashSet<ChaosEvent>(new ChaosEventComparer());

            while (true)
            {
                var report = client.TestManager.GetChaosReportAsync(filter).GetAwaiter().GetResult();

                foreach (var chaosEvent in report.History)
                {
                    if (eventSet.Add(chaosEvent))
                    {
                        Console.WriteLine(chaosEvent);
                    }
                }

                // When Chaos stops, a StoppedEvent is created.
                // If a StoppedEvent is found, exit the loop.
                var lastEvent = report.History.LastOrDefault();
                if (lastEvent is StoppedEvent)
                {
                    break;
                }

                Task.Delay(TimeSpan.FromSeconds(1.0)).GetAwaiter().GetResult();
            }
        }
    }
}

PowerShell:

$connection = "localhost:19000"
$timeToRun = 60
$maxStabilizationTimeSecs = 180
$concurrentFaults = 3
$waitTimeBetweenIterationsSec = 60

Connect-ServiceFabricCluster $connection

$events = @{}
$now = [System.DateTime]::UtcNow

Start-ServiceFabricChaos -TimeToRunMinute $timeToRun -MaxConcurrentFaults $concurrentFaults -MaxClusterStabilizationTimeoutSec $maxStabilizationTimeSecs -EnableMoveReplicaFaults -WaitTimeBetweenIterationsSec $waitTimeBetweenIterationsSec

while($true)
{
    $stopped = $false
    $report = Get-ServiceFabricChaosReport -StartTimeUtc $now -EndTimeUtc ([System.DateTime]::MaxValue)

    foreach ($e in $report.History)
    {
        if(-Not ($events.Contains($e.TimeStampUtc.Ticks)))
        {
            $events.Add($e.TimeStampUtc.Ticks, $e)
            if($e -is [System.Fabric.Chaos.DataStructures.ValidationFailedEvent])
            {
                Write-Host -BackgroundColor White -ForegroundColor Red $e
            }
            else
            {
                if($e -is [System.Fabric.Chaos.DataStructures.StoppedEvent])
                {
                    $stopped = $true
                }

                Write-Host $e
            }
        }
    }

    if($stopped -eq $true)
    {
        break
    }

    Start-Sleep -Seconds 1
}

Stop-ServiceFabricChaos

Testability actions 1/24/2017 • 8 min to read • Edit Online In order to simulate an unreliable infrastructure, Azure Service Fabric provides you, the developer, with ways to simulate various real-world failures and state transitions. These are exposed as testability actions. The actions are the low-level APIs that cause a specific fault injection, state transition, or validation. By combining these actions, you can write comprehensive test scenarios for your services. Service Fabric provides some common test scenarios composed of these actions. We highly recommend that you utilize these built-in scenarios, which are carefully chosen to test common state transitions and failure cases. However, actions can be used to create custom test scenarios when you want to add coverage for scenarios that are not covered by the built-in scenarios yet or that are custom tailored for your application. C# implementations of the actions are found in the System.Fabric.dll assembly. The Service Fabric PowerShell module is found in the Microsoft.ServiceFabric.Powershell.dll assembly. As part of runtime installation, the ServiceFabric PowerShell module is installed to allow for ease of use. Graceful vs. ungraceful fault actions Testability actions are classified into two major buckets: Ungraceful faults: These faults simulate failures like machine restarts and process crashes. In such failure cases, the execution context of the process stops abruptly. This means no cleanup of the state can run before the application starts up again. Graceful faults: These faults simulate graceful actions like replica moves and drops triggered by load balancing. In such cases, the service gets a notification of the close and can clean up the state before exiting. For better quality validation, run the service and business workload while inducing various graceful and ungraceful faults. Ungraceful faults exercise scenarios where the service process abruptly exits in the middle of some workflow. This tests the recovery path once the service replica is restored by Service Fabric. It helps test data consistency and whether the service state is maintained correctly after failures. The other set of failures (the graceful failures) test that the service correctly reacts to replicas being moved around by Service Fabric.
This tests handling of cancellation in the RunAsync method. The service needs to check for the cancellation token being set, correctly save its state, and exit the RunAsync method. Testability actions list

| Action | Description | Managed API | PowerShell cmdlet | Graceful/ungraceful faults |
| --- | --- | --- | --- | --- |
| CleanTestState | Removes all the test state from the cluster in case of a bad shutdown of the test driver. | CleanTestStateAsync | Remove-ServiceFabricTestState | Not applicable |
| InvokeDataLoss | Induces data loss into a service partition. | InvokeDataLossAsync | Invoke-ServiceFabricPartitionDataLoss | Graceful |
| InvokeQuorumLoss | Puts a given stateful service partition into quorum loss. | InvokeQuorumLossAsync | Invoke-ServiceFabricQuorumLoss | Graceful |
| Move Primary | Moves the specified primary replica of a stateful service to the specified cluster node. | MovePrimaryAsync | Move-ServiceFabricPrimaryReplica | Graceful |
| Move Secondary | Moves the current secondary replica of a stateful service to a different cluster node. | MoveSecondaryAsync | Move-ServiceFabricSecondaryReplica | Graceful |
| RemoveReplica | Simulates a replica failure by removing a replica from a cluster. This will close the replica and will transition it to role 'None', removing all of its state from the cluster. | RemoveReplicaAsync | Remove-ServiceFabricReplica | Graceful |
| RestartDeployedCodePackage | Simulates a code package process failure by restarting a code package deployed on a node in a cluster. This aborts the code package process, which will restart all the user service replicas hosted in that process. | RestartDeployedCodePackageAsync | Restart-ServiceFabricDeployedCodePackage | Ungraceful |
| RestartNode | Simulates a Service Fabric cluster node failure by restarting a node. | RestartNodeAsync | Restart-ServiceFabricNode | Ungraceful |
| RestartPartition | Simulates a datacenter blackout or cluster blackout scenario by restarting some or all replicas of a partition. | RestartPartitionAsync | Restart-ServiceFabricPartition | Graceful |
| RestartReplica | Simulates a replica failure by restarting a persisted replica in a cluster, closing the replica and then reopening it. | RestartReplicaAsync | Restart-ServiceFabricReplica | Graceful |
| StartNode | Starts a node in a cluster that is already stopped. | StartNodeAsync | Start-ServiceFabricNode | Not applicable |
| StopNode | Simulates a node failure by stopping a node in a cluster. The node will stay down until StartNode is called. | StopNodeAsync | Stop-ServiceFabricNode | Ungraceful |
| ValidateApplication | Validates the availability and health of all Service Fabric services within an application, usually after inducing some fault into the system. | ValidateApplicationAsync | Test-ServiceFabricApplication | Not applicable |
| ValidateService | Validates the availability and health of a Service Fabric service, usually after inducing some fault into the system. | ValidateServiceAsync | Test-ServiceFabricService | Not applicable |

Running a testability action using PowerShell This tutorial shows you how to run a testability action by using PowerShell. You will learn how to run a testability action against a local (one-box) cluster or an Azure cluster. Microsoft.Fabric.Powershell.dll--the Service Fabric PowerShell module--is installed automatically when you install the Microsoft Service Fabric MSI. The module is loaded automatically when you open a PowerShell prompt.
Tutorial segments: Run an action against a one-box cluster Run an action against an Azure cluster Run an action against a one -box cluster To run a testability action against a local cluster, first connect to the cluster and open the PowerShell prompt in administrator mode. Let us look at the Restart-ServiceFabricNode action. Restart-ServiceFabricNode -NodeName Node1 -CompletionMode DoNotVerify Here the action Restart-ServiceFabricNode is being run on a node named "Node1". The completion mode specifies that it should not verify whether the restart-node action actually succeeded. Specifying the completion mode as "Verify" will cause it to verify whether the restart action actually succeeded. Instead of directly specifying the node by its name, you can specify it via a partition key and the kind of replica, as follows: Restart-ServiceFabricNode -ReplicaKindPrimary -PartitionKindNamed -PartitionKey Partition3 -CompletionMode Verify $connection = "localhost:19000" $nodeName = "Node1" Connect-ServiceFabricCluster $connection Restart-ServiceFabricNode -NodeName $nodeName -CompletionMode DoNotVerify Restart-ServiceFabricNode should be used to restart a Service Fabric node in a cluster. This will stop the Fabric.exe process, which will restart all of the system service and user service replicas hosted on that node. Using this API to test your service helps uncover bugs along the failover recovery paths. It helps simulate node failures in the cluster. The following screenshot shows the Restart-ServiceFabricNode testability command in action. The output of the first Get-ServiceFabricNode (a cmdlet from the Service Fabric PowerShell module) shows that the local cluster has five nodes: Node.1 to Node.5. After the testability action (cmdlet) Restart-ServiceFabricNode is executed on the node, named Node.4, we see that the node's uptime has been reset. Run an action against an Azure cluster Running a testability action (by using PowerShell) against an Azure cluster is similar to running the action against a local cluster. The only difference is that before you can run the action, instead of connecting to the local cluster, you need to connect to the Azure cluster first. Running a testability action using C# To run a testability action by using C#, first you need to connect to the cluster by using FabricClient. Then obtain the parameters needed to run the action. Different parameters can be used to run the same action. Looking at the RestartServiceFabricNode action, one way to run it is by using the node information (node name and node instance ID) in the cluster. RestartNodeAsync(nodeName, nodeInstanceId, completeMode, operationTimeout, CancellationToken.None) Parameter explanation: CompleteMode specifies that the mode should not verify whether the restart action actually succeeded. Specifying the completion mode as "Verify" will cause it to verify whether the restart action actually succeeded. OperationTimeout sets the amount of time for the operation to finish before a TimeoutException exception is thrown. CancellationToken enables a pending call to be canceled. Instead of directly specifying the node by its name, you can specify it via a partition key and the kind of replica. For further information, see PartitionSelector and ReplicaSelector. 
// Add a reference to System.Fabric.Testability.dll and System.Fabric.dll using System; using System.Collections.Generic; using System.Linq; using System.Text; using System.Threading.Tasks; using System.Fabric.Testability; using System.Fabric; using System.Threading; using System.Numerics; class Test { public static int Main(string[] args) { string clusterConnection = "localhost:19000"; Uri serviceName = new Uri("fabric:/samples/PersistentToDoListApp/PersistentToDoListService"); string nodeName = "N0040"; BigInteger nodeInstanceId = 130743013389060139; Console.WriteLine("Starting RestartNode test"); try { //Restart the node by using ReplicaSelector RestartNodeAsync(clusterConnection, serviceName).Wait(); //Another way to restart node is by using nodeName and nodeInstanceId RestartNodeAsync(clusterConnection, nodeName, nodeInstanceId).Wait(); } catch (AggregateException exAgg) { Console.WriteLine("RestartNode did not complete: "); foreach (Exception ex in exAgg.InnerExceptions) { if (ex is FabricException) { Console.WriteLine("HResult: {0} Message: {1}", ex.HResult, ex.Message); } } return -1; } Console.WriteLine("RestartNode completed."); return 0; } static async Task RestartNodeAsync(string clusterConnection, Uri serviceName) { PartitionSelector randomPartitionSelector = PartitionSelector.RandomOf(serviceName); ReplicaSelector primaryofReplicaSelector = ReplicaSelector.PrimaryOf(randomPartitionSelector); // Create FabricClient with connection and security information here FabricClient fabricclient = new FabricClient(clusterConnection); await fabricclient.FaultManager.RestartNodeAsync(primaryofReplicaSelector, CompletionMode.Verify); } static async Task RestartNodeAsync(string clusterConnection, string nodeName, BigInteger nodeInstanceId) { // Create FabricClient with connection and security information here FabricClient fabricclient = new FabricClient(clusterConnection); await fabricclient.FaultManager.RestartNodeAsync(nodeName, nodeInstanceId, CompletionMode.Verify); } } PartitionSelector and ReplicaSelector PartitionSelector PartitionSelector is a helper exposed in testability and is used to select a specific partition on which to perform any of the testability actions. It can be used to select a specific partition if the partition ID is known beforehand. Or, you can provide the partition key and the operation will resolve the partition ID internally. You also have the option of selecting a random partition. To use this helper, create the PartitionSelector object and select the partition by using one of the Select* methods. Then pass in the PartitionSelector object to the API that requires it. If no option is selected, it defaults to a random partition. 
Uri serviceName = new Uri("fabric:/samples/InMemoryToDoListApp/InMemoryToDoListService"); Guid partitionIdGuid = new Guid("8fb7ebcc-56ee-4862-9cc0-7c6421e68829"); string partitionName = "Partition1"; Int64 partitionKeyUniformInt64 = 1; // Select a random partition PartitionSelector randomPartitionSelector = PartitionSelector.RandomOf(serviceName); // Select a partition based on ID PartitionSelector partitionSelectorById = PartitionSelector.PartitionIdOf(serviceName, partitionIdGuid); // Select a partition based on name PartitionSelector namedPartitionSelector = PartitionSelector.PartitionKeyOf(serviceName, partitionName); // Select a partition based on partition key PartitionSelector uniformIntPartitionSelector = PartitionSelector.PartitionKeyOf(serviceName, partitionKeyUniformInt64); ReplicaSelector ReplicaSelector is a helper exposed in testability and is used to help select a replica on which to perform any of the testability actions. It can be used to select a specific replica if the replica ID is known beforehand. In addition, you have the option of selecting a primary replica or a random secondary. ReplicaSelector derives from PartitionSelector, so you need to select both the replica and the partition on which you wish to perform the testability operation. To use this helper, create a ReplicaSelector object and set the way you want to select the replica and the partition. You can then pass it into the API that requires it. If no option is selected, it defaults to a random replica and random partition. Guid partitionIdGuid = new Guid("8fb7ebcc-56ee-4862-9cc0-7c6421e68829"); PartitionSelector partitionSelector = PartitionSelector.PartitionIdOf(serviceName, partitionIdGuid); long replicaId = 130559876481875498; // Select a random replica ReplicaSelector randomReplicaSelector = ReplicaSelector.RandomOf(partitionSelector); // Select the primary replica ReplicaSelector primaryReplicaSelector = ReplicaSelector.PrimaryOf(partitionSelector); // Select the replica by ID ReplicaSelector replicaByIdSelector = ReplicaSelector.ReplicaIdOf(partitionSelector, replicaId); // Select a random secondary replica ReplicaSelector secondaryReplicaSelector = ReplicaSelector.RandomSecondaryOf(partitionSelector); Next steps Testability scenarios How to test your service Simulate failures during service workloads Service-to-service communication failures Simulate failures during service workloads 3/1/2017 • 3 min to read • Edit Online The testability scenarios in Azure Service Fabric enable developers to not worry about dealing with individual faults. There are scenarios, however, where an explicit interleaving of client workload and failures might be needed. The interleaving of client workload and faults ensures that the service is actually performing some action when failure happens. Given the level of control that testability provides, these could be at precise points of the workload execution. This induction of faults at different states in the application can find bugs and improve quality. Sample custom scenario This test shows a scenario that interleaves the business workload with graceful and ungraceful failures. The faults should be induced in the middle of service operations or compute for best results. Let's walk through an example of a service that exposes four workloads: A, B, C, and D. Each corresponds to a set of workflows and could be compute, storage, or a mix. For the sake of simplicity, we will abstract out the workloads in our example. 
The different faults executed in this example are: RestartNode: Ungraceful fault to simulate a machine restart. RestartDeployedCodePackage: Ungraceful fault to simulate service host process crashes. RemoveReplica: Graceful fault to simulate replica removal. MovePrimary: Graceful fault to simulate replica moves triggered by the Service Fabric load balancer. // Add a reference to System.Fabric.Testability.dll and System.Fabric.dll. using System; using System.Fabric; using System.Fabric.Testability.Scenario; using System.Threading; using System.Threading.Tasks; class Test { public static int Main(string[] args) { // Replace these strings with the actual version for your cluster and application. string clusterConnection = "localhost:19000"; Uri applicationName = new Uri("fabric:/samples/PersistentToDoListApp"); Uri serviceName = new Uri("fabric:/samples/PersistentToDoListApp/PersistentToDoListService"); Console.WriteLine("Starting Workload Test..."); try { RunTestAsync(clusterConnection, applicationName, serviceName).Wait(); } catch (AggregateException ae) { Console.WriteLine("Workload Test failed: "); foreach (Exception ex in ae.InnerExceptions) { if (ex is FabricException) { Console.WriteLine("HResult: {0} Message: {1}", ex.HResult, ex.Message); } } return -1; } Console.WriteLine("Workload Test completed successfully."); return 0; } public enum ServiceWorkloads { A, B, C, D } public enum ServiceFabricFaults { RestartNode, RestartCodePackage, RemoveReplica, MovePrimary, } public static async Task RunTestAsync(string clusterConnection, Uri applicationName, Uri serviceName) { // Create FabricClient with connection and security information here. FabricClient fabricClient = new FabricClient(clusterConnection); // Maximum time to wait for a service to stabilize. TimeSpan maxServiceStabilizationTime = TimeSpan.FromSeconds(120); // How many loops of faults you want to execute. uint testLoopCount = 20; Random random = new Random(); for (var i = 0; i < testLoopCount; ++i) { var workload = SelectRandomValue<ServiceWorkloads>(random); // Start the workload. var workloadTask = RunWorkloadAsync(workload); // While the task is running, induce faults into the service. They can be ungraceful faults like // RestartNode and RestartDeployedCodePackage or graceful faults like RemoveReplica or MovePrimary. var fault = SelectRandomValue<ServiceFabricFaults>(random); // Create a replica selector, which will select a primary replica from the given service to test. var replicaSelector = ReplicaSelector.PrimaryOf(PartitionSelector.RandomOf(serviceName)); // Run the selected random fault. await RunFaultAsync(applicationName, fault, replicaSelector, fabricClient); // Validate the health and stability of the service. await fabricClient.ServiceManager.ValidateServiceAsync(serviceName, maxServiceStabilizationTime); // Wait for the workload to finish successfully. 
await workloadTask; } } private static async Task RunFaultAsync(Uri applicationName, ServiceFabricFaults fault, ReplicaSelector selector, FabricClient client) { switch (fault) { case ServiceFabricFaults.RestartNode: await client.ClusterManager.RestartNodeAsync(selector, CompletionMode.Verify); break; case ServiceFabricFaults.RestartCodePackage: await client.ApplicationManager.RestartDeployedCodePackageAsync(applicationName, selector, CompletionMode.Verify); break; case ServiceFabricFaults.RemoveReplica: await client.ServiceManager.RemoveReplicaAsync(selector, CompletionMode.Verify, false); break; case ServiceFabricFaults.MovePrimary: case ServiceFabricFaults.MovePrimary: await client.ServiceManager.MovePrimaryAsync(selector.PartitionSelector); break; } } private static Task RunWorkloadAsync(ServiceWorkloads workload) { throw new NotImplementedException(); // This is where you trigger and complete your service workload. // Note that the faults induced while your service workload is running will // fault the primary service. Hence, you will need to reconnect to complete or check // the status of the workload. } private static T SelectRandomValue<T>(Random random) { Array values = Enum.GetValues(typeof(T)); T workload = (T)values.GetValue(random.Next(values.Length)); return workload; } } Testability scenarios 1/24/2017 • 6 min to read • Edit Online Large distributed systems like cloud infrastructures are inherently unreliable. Azure Service Fabric gives developers the ability to write services to run on top of unreliable infrastructures. In order to write high-quality services, developers need to be able to induce such unreliable infrastructure to test the stability of their services. The Fault Analysis Service gives developers the ability to induce fault actions to test services in the presence of failures. However, targeted simulated faults will get you only so far. To take the testing further, you can use the test scenarios in Service Fabric: a chaos test and a failover test. These scenarios simulate continuous interleaved faults, both graceful and ungraceful, throughout the cluster over extended periods of time. Once a test is configured with the rate and kind of faults, it can be started through either C# APIs or PowerShell, to generate faults in the cluster and your service. WARNING ChaosTestScenario is being replaced by a more resilient, service-based Chaos. Please refer to the new article Controlled Chaos for more details. Chaos test The chaos scenario generates faults across the entire Service Fabric cluster. The scenario compresses faults generally seen in months or years to a few hours. The combination of interleaved faults with the high fault rate finds corner cases that are otherwise missed. This leads to a significant improvement in the code quality of the service. Faults simulated in the chaos test Restart a node Restart a deployed code package Remove a replica Restart a replica Move a primary replica (optional) Move a secondary replica (optional) The chaos test runs multiple iterations of faults and cluster validations for the specified period of time. The time spent for the cluster to stabilize and for validation to succeed is also configurable. The scenario fails when you hit a single failure in cluster validation. For example, consider a test set to run for one hour with a maximum of three concurrent faults. The test will induce three faults, and then validate the cluster health. 
The test will iterate through the previous step till the cluster becomes unhealthy or one hour passes. If the cluster becomes unhealthy in any iteration, i.e. it does not stabilize within a configured time, the test will fail with an exception. This exception indicates that something has gone wrong and needs further investigation. In its current form, the fault generation engine in the chaos test induces only safe faults. This means that in the absence of external faults, a quorum or data loss will never occur. Important configuration options TimeToRun: Total time that the test will run before finishing with success. The test can finish earlier in lieu of a validation failure. MaxClusterStabilizationTimeout: Maximum amount of time to wait for the cluster to become healthy before failing the test. The checks performed are whether cluster health is OK, service health is OK, the target replica set size is achieved for the service partition, and no InBuild replicas exist. MaxConcurrentFaults: Maximum number of concurrent faults induced in each iteration. The higher the number, the more aggressive the test, hence resulting in more complex failovers and transition combinations. The test guarantees that in absence of external faults there will not be a quorum or data loss, irrespective of how high this configuration is. EnableMoveReplicaFaults: Enables or disables the faults that are causing the move of the primary or secondary replicas. These faults are disabled by default. WaitTimeBetweenIterations: Amount of time to wait between iterations, i.e. after a round of faults and corresponding validation. How to run the chaos test C# sample using System; using System.Fabric; using System.Fabric.Testability.Scenario; using System.Threading; using System.Threading.Tasks; class Test { public static int Main(string[] args) { string clusterConnection = "localhost:19000"; Console.WriteLine("Starting Chaos Test Scenario..."); try { RunChaosTestScenarioAsync(clusterConnection).Wait(); } catch (AggregateException ae) { Console.WriteLine("Chaos Test Scenario did not complete: "); foreach (Exception ex in ae.InnerExceptions) { if (ex is FabricException) { Console.WriteLine("HResult: {0} Message: {1}", ex.HResult, ex.Message); } } return -1; } Console.WriteLine("Chaos Test Scenario completed."); return 0; } static async Task RunChaosTestScenarioAsync(string clusterConnection) { TimeSpan maxClusterStabilizationTimeout = TimeSpan.FromSeconds(180); uint maxConcurrentFaults = 3; bool enableMoveReplicaFaults = true; // Create FabricClient with connection and security information here. FabricClient fabricClient = new FabricClient(clusterConnection); // The chaos test scenario should run at least 60 minutes or until it fails. TimeSpan timeToRun = TimeSpan.FromMinutes(60); ChaosTestScenarioParameters scenarioParameters = new ChaosTestScenarioParameters( maxClusterStabilizationTimeout, maxConcurrentFaults, enableMoveReplicaFaults, enableMoveReplicaFaults, timeToRun); // Other related parameters: // Pause between two iterations for a random duration bound by this value. // scenarioParameters.WaitTimeBetweenIterations = TimeSpan.FromSeconds(30); // Pause between concurrent actions for a random duration bound by this value. // scenarioParameters.WaitTimeBetweenFaults = TimeSpan.FromSeconds(10); // Create the scenario class and execute it asynchronously. 
ChaosTestScenario chaosScenario = new ChaosTestScenario(fabricClient, scenarioParameters); try { await chaosScenario.ExecuteAsync(CancellationToken.None); } catch (AggregateException ae) { throw ae.InnerException; } } } PowerShell $connection = "localhost:19000" $timeToRun = 60 $maxStabilizationTimeSecs = 180 $concurrentFaults = 3 $waitTimeBetweenIterationsSec = 60 Connect-ServiceFabricCluster $connection Invoke-ServiceFabricChaosTestScenario -TimeToRunMinute $timeToRun -MaxClusterStabilizationTimeoutSec $maxStabilizationTimeSecs -MaxConcurrentFaults $concurrentFaults -EnableMoveReplicaFaults - WaitTimeBetweenIterationsSec $waitTimeBetweenIterationsSec Failover test The failover test scenario is a version of the chaos test scenario that targets a specific service partition. It tests the effect of failover on a specific service partition while leaving the other services unaffected. Once it's configured with the target partition information and other parameters, it runs as a client-side tool that uses either C# APIs or PowerShell to generate faults for a service partition. The scenario iterates through a sequence of simulated faults and service validation while your business logic runs on the side to provide a workload. A failure in service validation indicates an issue that needs further investigation. Faults simulated in the failover test Restart a deployed code package where the partition is hosted Remove a primary/secondary replica or stateless instance Restart a primary secondary replica (if a persisted service) Move a primary replica Move a secondary replica Restart the partition The failover test induces a chosen fault and then runs validation on the service to ensure its stability. The failover test induces only one fault at a time, as opposed to possible multiple faults in the chaos test. If the service partition does not stabilize within the configured timeout after each fault, the test fails. The test induces only safe faults. This means that in absence of external failures, a quorum or data loss will not occur. Important configuration options PartitionSelector: Selector object that specifies the partition that needs to be targeted. TimeToRun: Total time that the test will run before finishing. MaxServiceStabilizationTimeout: Maximum amount of time to wait for the cluster to become healthy before failing the test. The checks performed are whether service health is OK, the target replica set size is achieved for all partitions, and no InBuild replicas exist. WaitTimeBetweenFaults: Amount of time to wait between every fault and validation cycle. 
How to run the failover test C# using System; using System.Fabric; using System.Fabric.Testability.Scenario; using System.Threading; using System.Threading.Tasks; class Test { public static int Main(string[] args) { string clusterConnection = "localhost:19000"; Uri serviceName = new Uri("fabric:/samples/PersistentToDoListApp/PersistentToDoListService"); Console.WriteLine("Starting Chaos Test Scenario..."); try { RunFailoverTestScenarioAsync(clusterConnection, serviceName).Wait(); } catch (AggregateException ae) { Console.WriteLine("Chaos Test Scenario did not complete: "); foreach (Exception ex in ae.InnerExceptions) { if (ex is FabricException) { Console.WriteLine("HResult: {0} Message: {1}", ex.HResult, ex.Message); } } return -1; } Console.WriteLine("Chaos Test Scenario completed."); return 0; } static async Task RunFailoverTestScenarioAsync(string clusterConnection, Uri serviceName) { TimeSpan maxServiceStabilizationTimeout = TimeSpan.FromSeconds(180); PartitionSelector randomPartitionSelector = PartitionSelector.RandomOf(serviceName); // Create FabricClient with connection and security information here. FabricClient fabricClient = new FabricClient(clusterConnection); // The chaos test scenario should run at least 60 minutes or until it fails. TimeSpan timeToRun = TimeSpan.FromMinutes(60); FailoverTestScenarioParameters scenarioParameters = new FailoverTestScenarioParameters( randomPartitionSelector, timeToRun, maxServiceStabilizationTimeout); // Other related parameters: // Pause between two iterations for a random duration bound by this value. // scenarioParameters.WaitTimeBetweenIterations = TimeSpan.FromSeconds(30); // Pause between concurrent actions for a random duration bound by this value. // Pause between concurrent actions for a random duration bound by this value. // scenarioParameters.WaitTimeBetweenFaults = TimeSpan.FromSeconds(10); // Create the scenario class and execute it asynchronously. FailoverTestScenario failoverScenario = new FailoverTestScenario(fabricClient, scenarioParameters); try { await failoverScenario.ExecuteAsync(CancellationToken.None); } catch (AggregateException ae) { throw ae.InnerException; } } } PowerShell $connection = "localhost:19000" $timeToRun = 60 $maxStabilizationTimeSecs = 180 $waitTimeBetweenFaultsSec = 10 $serviceName = "fabric:/SampleApp/SampleService" Connect-ServiceFabricCluster $connection Invoke-ServiceFabricFailoverTestScenario -TimeToRunMinute $timeToRun -MaxServiceStabilizationTimeoutSec $maxStabilizationTimeSecs -WaitTimeBetweenFaultsSec $waitTimeBetweenFaultsSec -ServiceName $serviceName - PartitionKindSingleton If it is Faulted. Also. In addition. some errors returned by these APIs are not as descriptive as they could be. it was unclear if the node was down or stopped. a stopped Service Fabric node is a node intentionally targeted using the Stop Node API. Why are we replacing these? As described earlier. Differentiating between a stopped node and a down node If a node is stopped using the Node Transition API. thus simulating a down node. Usage If the Node Transition API does not throw an exception when invoked. Please note that the “Start” in the name of the API does not refer to starting a node. the output of a node query (managed: GetNodeListAsync().com/dotnet/api/system. For example.g. PowerShell: Get-ServiceFabricNode) will show that this node has an IsStopped property value of true. or to transition it from a stopped state to a normal up state. 
If the state is “Running” then the operation is executing.testcommandprogressstate for more information about the State property. Introducing the Node Transition APIs We’ve addressed these issues above in a new set of APIs. . This is useful for injecting faults into the system to test your application. To get information about the current state of the operation. Later. The Node Transition Progress API returns an NodeTransitionProgress object. call the Node Transition Progress API (managed: GetNodeTransitionProgressAsync()) with the guid used when invoking Node Transition API for this operation. the VM or machine is off). The Start Node API (managed: StartNodeAsync(). PowerShell: Stop-ServiceFabricNode) stops a Service Fabric node. A Service Fabric node is process. the system does not expose information to differentiate between stopped nodes and down nodes. invoking the Stop Node API on an already stopped node will return the error InvalidAddress. there was a problem executing the operation. then the system has accepted the asynchronous operation. This experience could be improved. With the Stop Node API. not a VM or machine – the VM or machine will still be running. See https://docs. PowerShell: Start- ServiceFabricNode]) reverses the Stop Node API. and the “Sample Usage” section below for code examples. The new Node Transition API (managed: StartNodeTransitionAsync()) may be used to transition a Service Fabric node to a stopped state. We’ve found this can cause problems and may be error-prone. This object’s State property specifies the current state of the operation. A down node is a node that is down for any other reason (e. It refers to beginning an asynchronous operation that the system will execute to transition the node to either stopped or started state. The Result property’s Exception property will indicate what the issue was. and will execute it. If it is Completed.microsoft. the operation finished without error.fabric. A successful call does not imply the operation is finished yet. we’ve seen problems where a user invoked the Stop Node API on a node and then forgot about it. which brings the node back to a normal state. Note this is different from the value of the NodeStatus property. Replacing the Start Node and Stop node APIs with the Node Transition API 1/24/2017 • 8 min to read • Edit Online What do the Stop Node and Start Node APIs do? The Stop Node API (managed: StopNodeAsync(). Stopping a node puts it into a stopped state where it is not a member of the cluster and cannot host services. For example. For the rest of the document "node" will mean Service Fabric node. the duration a node is stopped for is “infinite” until the Start Node API is invoked. This value must be in the allowed range. WARNING Multiple Node Transition APIs calls cannot be made on the same node in parallel. If the NodeStatus property has a value of Down. Limited Duration When using the Node Transition API to stop a node.WriteLine("Caught exception '{0}'". the node will restart itself into Up state automatically. In such a situation. return n. Up). the Node Transition API will > throw a FabricException with an ErrorCode property value of NodeTransitionInProgress. // Helper function to get information about a node static Node GetNodeInfo(FabricClient fc. TimeSpan. .which will say Down.GetAwaiter(). exceptionObserved = true.g. you should wait until the operation reaches a terminal state (Completed. 
Starting a stopped node using the Node Transition API will return it to function as a normal member of the cluster again. Parallel node transition calls on different nodes are allowed. and NodeStatus as something that is not Down (e.Delay(TimeSpan. it should be started using the Start Node API first before using the > Node Transition APIs. The recommendation is to use the Node Transition API only. Guid operationId. then the node was not stopped using the Node Transition API. do { bool exceptionObserved = false. then it was stopped using the Node Transition API. Sample Usage Sample 1 . If the IsStopped property is true. The output of the node query API will show IsStopped as false. } catch (OperationCanceledException oce) { Console. and a maximum of 14400. > If a node has been already been stopped using the Stop Node API. and is Down due some other reason.None).GetResult(). Faulted.FromSeconds(1)). try { progress = await fc. or ForceCancelled) before starting a > new transition on the same node.FirstOrDefault().QueryManager. while (n == null) { n = fc. WARNING Avoid mixing Node Transition APIs and the Stop Node and Start Node APIs. After this time expires. string node) { NodeList n = null.FromMinutes(1). TestCommandProgressState targetState) { NodeTransitionProgress progress = null. }. oce). Once a node transition on a specific node has > been started. Refer to Sample 1 below for an example of usage. represents the amount of time in seconds to keep the node stopped. and the NodeStatus property is Down. but IsStopped is false.GetAwaiter(). Task.ConfigureAwait(false). which has a minimum of 600.GetNodeTransitionProgressAsync(operationId. } static async Task WaitForStateAsync(FabricClient fc. stopNodeDurationInSeconds.GetNodeListAsync(node). CancellationToken. one of the required parameters.The following sample uses the Node Transition API to stop a node.TestManager. Faulted) { // Inspect the progress object's Result. } while (!wasSuccessful).WriteLine("'{0}' failed with: {1}.ConfigureAwait(false). // Create a NodeStopDescription object.Delay(TimeSpan. bool wasSuccessful = false. n.Exception. } catch (FabricTransientException fte) { Console.Exception. } while (true).WriteLine("Target state '{0}' has been reached".. operationId. if (progress. // Create a Guid Guid guid = Guid.ConfigureAwait(false). Console. } if (!exceptionObserved) { Console.HResult to get the error code. string nodeName.Exception.FromSeconds(5)).ConfigureAwait(false).Result. n. which will be used as a parameter into StartNodeTransition NodeStopDescription description = new NodeStopDescription(guid.NodeInstanceId. fte).State).StartNodeTransitionAsync(description. // . .NewGuid().NodeName. durationInSeconds).Delay(TimeSpan.TestManager. progress. CancellationToken.WriteLine("Current state of operation '{0}': {1}". progress. int durationInSeconds) { // Uses the GetNodeListAsync() API to get information about the target node Node n = GetNodeInfo(fc.None). targetState). Retry transient errors. await fc. wasSuccessful = true. which will stop the target node.WriteLine("Caught exception '{0}'". nodeName). do { try { // Invoke StartNodeTransitionAsync with the NodeStopDescription from above.State == targetState) { Console. } static async Task StopNodeAsync(FabricClient fc.FromSeconds(5)).FromMinutes(1).HResult). TimeSpan. operationId. } catch (OperationCanceledException oce) { // This is retryable } catch (FabricTransientException fte) { // This is retryable } // Backoff await Task.State == TestCommandProgressState. 
exceptionObserved = true.additional logic as required } if (progress.Result.. progress. } } await Task. HResult: {2}". break. guid. nodeName). It uses some helper methods from the first sample. Guid guid = Guid.Delay(TimeSpan.ConfigureAwait(false). nodeInstanceId). This usage is incorrect because the stopDurationInSeconds it provides is greater than the allowed range. while (!wasSuccessful). } while (!wasSuccessful). This sample uses some helper methods from the first sample.Completed).The following sample shows incorrect usage. BigInteger nodeInstanceId = n. } catch (OperationCanceledException oce) { Console. } await Task. which will be used as a parameter into StartNodeTransition NodeStartDescription description = new NodeStartDescription(guid.NewGuid(). wasSuccessful = true.Completed). CancellationToken. static async Task StartNodeAsync(FabricClient fc. await fc. guid. fte). bool wasSuccessful = false. oce). } Sample 2 . } catch (FabricTransientException fte) { Console. TimeSpan.ConfigureAwait(false).TestManager. // Create a NodeStartDescription object. and the progress API should not be called. // Now call StartNodeTransitionProgressAsync() until hte desired state is reached.WriteLine("Caught exception '{0}'".FromSeconds(5)).ConfigureAwait(false). TestCommandProgressState.NodeInstanceId. string nodeName) { // Uses the GetNodeListAsync() API to get information about the target node Node n = GetNodeInfo(fc.None). Retry transient errors. do { try { // Invoke StartNodeTransitionAsync with the NodeStartDescription from above.NodeName.The following sample starts a stopped node.FromMinutes(1). await WaitForStateAsync(fc. .WriteLine("Caught exception '{0}'". // Now call StartNodeTransitionProgressAsync() until hte desired state is reached. } Sample 3 . the operation was not accepted. TestCommandProgressState. which will start the target stopped node.StartNodeTransitionAsync(description. Since StartNodeTransitionAsync() will fail with a fatal error.ConfigureAwait(false). await WaitForStateAsync(fc. n. // Output: // Caught System. e. n. 99999). Guid guid = Guid. TimeSpan.NodeInstanceId.NodeName.WriteLine("Caught {0}". try { await fc. .COMException: Exception from HRESULT: 0x80071C63 // << Parts of exception omitted>> // // ErrorCode InvalidDuration } } Sample 4 .The following sample shows the error information that will be returned from the Node Transition Progress API when the operation initiated by the Node Transition API is accepted.NewGuid(). // Use an out of range value for stopDurationInSeconds to demonstrate error NodeStopDescription description = new NodeStopDescription(guid.Runtime. it fails because the Node Transition API attempts to start a node that does not exist.None). } catch (FabricException e) { Console. n. In the case. string nodeName) { Node n = GetNodeInfo(fc.StartNodeTransitionAsync(description.InteropServices.Fabric.ErrorCode).TestManager.COMException (- 2147017629) // StopDurationInSeconds is out of range ---> System.FromMinutes(1).InteropServices. Console. e).ConfigureAwait(false). nodeName).WriteLine("ErrorCode {0}". CancellationToken.Runtime.FabricException: System. This sample uses some helper methods from the first sample. but fails later while executing. static async Task StopNodeWithOutOfRangeDurationAsync(FabricClient fc. HResult to get the error code.WriteLine("Caught exception '{0}'". do { try { // Invoke StartNodeTransitionAsync with the NodeStartDescription from above. } catch (OperationCanceledException oce) { Console. 
// In this case.Exception.ConfigureAwait(false). // When StartNodeTransitionProgressAsync()'s returned progress object has a State if Faulted.NewGuid(). "NonexistentNode". } await Task. wasSuccessful = true. In this case.ConfigureAwait(false). Retry transient errors. await fc. guid.ConfigureAwait(false). TestCommandProgressState. } while (!wasSuccessful).Faulted). await WaitForStateAsync(fc. CancellationToken. which will start the target stopped node. // Intentionally use a nonexistent node NodeStartDescription description = new NodeStartDescription(guid.FromSeconds(5)). it will be NodeNotFound. } . TimeSpan.FromMinutes(1).WriteLine("Caught exception '{0}'".Delay(TimeSpan. it will end up in the Faulted state since the node does not exist. } catch (FabricTransientException fte) { Console. BigInteger nodeInstanceId = 12345.None). nodeInstanceId). bool wasSuccessful = false. fte). static async Task StartNodeWithNonexistentNodeAsync(FabricClient fc) { Guid guid = Guid. // Now call StartNodeTransitionProgressAsync() until the desired state is reached. oce).StartNodeTransitionAsync(description.TestManager. inspect the progress object's Result. you need to do the following: Get a Visual Studio Team Services account. You provide an airplane ID. but Visual Studio 2013 and other editions should work similarly. This helps you interpret and analyze the load test results. Create and run the Web Performance and Load Test project Create a Web Performance and Load Test project 1. Understand the goal for your load testing. . The application’s back end processes the requests. You can get one for free at Visual Studio Team Services. and destination. The example application used here is an airplane location simulator. This information is used to simulate the load pattern. Give the project a name and then choose the OK button. The following diagram illustrates the Service Fabric application that you'll be testing. Deploy your application to a staging environment. Open Visual Studio 2015. Expand the Visual C# node and choose Test > Web Performance and Load Test project. 2. Prerequisites Before getting started. See How to deploy applications to a remote cluster using Visual Studio for information about this. departure time. Load test your application by using Visual Studio Team Services 1/17/2017 • 5 min to read • Edit Online This article shows how to use Microsoft Visual Studio load test features to stress test an application. It uses an Azure Service Fabric stateful service back end and a stateless service web front end. Choose File > New > Project on the menu bar to open the New Project dialog box. Understand your application’s usage pattern. Get and install Visual Studio 2013 or Visual Studio 2015. and the front end displays on a map the airplane that matches the criteria. This article uses Visual Studio 2015 Enterprise edition. . You should see a new Web Performance and Load Test project in Solution Explorer. Record a web performance test 1. Choose the Add Recording icon to start a recording session in your browser. 2.webtest project. Open the . When you're done. These actions are used as a pattern to generate the load. Perform a sequence of actions that you expect the users to perform. repeated dependency requests that are not part of your test scenario.3. choose the Stop button to stop recording. 4.webtest project in Visual Studio should have captured a series of requests. 6. 
Save the project and then choose the Run Test command to run the web performance test locally and make sure everything works correctly. The recording panel should show the web requests. Dynamic parameters are replaced automatically. 5. At this point. you can delete any extra. . Browse to the Service Fabric application. The . you can bind the web performance test to a data list so that the test iterates through the data. As an alternative. See Generate and run a coded web performance test for details about how to convert the web performance test to a coded test. open the . This opens the run settings in the Properties window. See Add a data source to a web performance test for information about how to bind data to a web performance test. Log in and then choose a geographic location. which starts with a few users and increases the users over time. we'll convert the web performance test to a coded test so you can replace the airplane ID with a generated GUID and add more requests to send flights to different locations. choose Constant Load. 6. On the shortcut menu of your Web Performance and Load Test project. 7. such as Run Settings > Run Settings1 [Active]. You can use the Distribution column to specify the percentage of total tests run for each test. choose Step Load. choose the Add button and then select the test that you want to include in the load test. In the Load Pattern section. If you have a good estimate of the amount of user load and want to see how the current system performs. choose the Next button to configure the test settings. The following steps show how to create a load test project: 1.loadtest project and choose the current run setting. specify the location where load test requests are generated.Parameterize the web performance test You can parameterize the web performance test by converting it to a coded web performance test and then editing the code. . For this example. 4. Run the load test by using Visual Studio Team Services Choose the Run Load Test command to start the test run. In the Location section of Run Settings. 2. In the Test Mix section. Change this value to All Individual Details to get more information on the load test results. Create a load test project A load test project is composed of one or more scenarios described by the web performance test and unit test. 5. specify the load test duration. If your goal is to learn whether the system performs consistently under various loads. choose the Finish button. 3. NOTE The Test Iterations option is available only when you run a load test locally using Visual Studio. choose whether you want a constant user load or a step load. choose Add > Load Test. In the Run Settings section. along with additional specified load test settings. After the load test is created. See Load Testing for more information on how to connect to Visual Studio Team Services and run a load test. The wizard may prompt you to log in to your Team Services account. When you're done. In the Results section of the Run Settings properties window. the Timing Details Storage setting should have None as its default value. In the Load Test wizard. You should see something similar to the following graph.View and analyze the load test results As the load test progresses. Choose the number links on the Test > Failed and the Errors > Count columns to see error details. choose the View report button. After the report is downloaded. 
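As noted above, converting the recording to a coded web performance test makes it easy to substitute generated values such as the airplane ID. The following is a minimal sketch using the Visual Studio web test object model; the class name, endpoint URL, and query parameter are hypothetical stand-ins for whatever your recording captured.
C#
using System;
using System.Collections.Generic;
using Microsoft.VisualStudio.TestTools.WebTesting;

// Hypothetical coded web performance test: each iteration sends a request for a
// freshly generated airplane ID instead of the fixed value captured by the recording.
public class AirplaneLocationCodedWebTest : WebTest
{
    public override IEnumerator<WebTestRequest> GetRequestEnumerator()
    {
        // A new GUID per iteration so the back end sees a distinct airplane each time.
        string airplaneId = Guid.NewGuid().ToString();

        // Replace the host, port, and path with the front-end endpoint from your recording.
        var request = new WebTestRequest(
            "http://mycluster.westus.cloudapp.azure.com:8081/api/planes?id=" + airplaneId);
        request.Method = "GET";

        yield return request;
    }
}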
See Analyzing Load Test Results in the Graphs View of the Load Test Analyzer for more information on viewing load test results. Automate your load test Visual Studio Team Services Load Test provides APIs to help you manage load tests and analyze results in a Team Services account. 1. On the Summary tab. The Tables tab shows the total number of passed and failed load tests. the performance information is graphed. On the Graph tab you can see graphs for various performance counters. the overall test results appear. This data can be useful if the load test includes multiple scenarios. Next steps Monitoring and diagnosing services in a local machine development setup . See Cloud Load Testing Rest APIs for more information. Choose the Download report link near the top of the page. The Detail tab shows virtual user and test scenario information for failed requests. 2. Orchestrators (not humans) are what swing into action when a machine fails or a workload terminates for some unexpected reason. maybe with a few other specialized components like a cache. Introducing the Service Fabric cluster resource manager 1/17/2017 • 6 min to read • Edit Online Traditionally managing IT systems or a set of services meant getting a few physical or virtual machines dedicated to those specific services or systems. Kubernetes. As a consequence of breaking your formerly monolithic. Each named instance has one or more instances or replicas for High Availability (HA). and which conflict? When a machine goes down… what was even running there? Who is in charge of making sure that workload starts running again? Do you wait for the (virtual?) machine to come back or do your workloads automatically fail over to other machines and keep running? Is human intervention required? What about upgrades in this environment? As developers and operators dealing with this. and dealing with resource consumption. Chronos or Marathon on top of Mesos. You want to be able to tell an Orchestrator what you want and have it do the heavy lifting. no matter what happens. then you added more machines with that same configuration. hundreds. Who decides what types of workloads can run on which hardware. let’s say you’ve found a need to scale out and have taken the containers and/or microservice plunge. Most of the time though you replaced a few of the machines with larger machines. Still fairly easy (if not necessarily fun). we’re going to need some help managing this complexity. This tier would connect to a work tier for any analysis or transformation necessary as a part of the messaging. that part of the overall application ran at lower capacity until the machine could be restored. Configuration is less about the machines and more about the services themselves. Easy. or thousands of machines. Most Orchestrators do more than just deal with failure. the web servers a few. tiered app into separate services running on commodity hardware. Now however. If a machine failed. All Orchestrators are fundamentally about maintaining some desired state of configuration in the environment. Other types of applications would have a messaging tier where requests flowed in and out. Other features they have are managing new deployments. They try to make the environment match the desired state. You have dozens of different types of services (none consuming a full machine's worth of resources). You get the sense that a hiring binge and trying to hide the complexity with people is not the right answer. spanning multiple smaller pieces of commodity hardware. 
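The automation APIs mentioned above are ordinary REST endpoints secured with a personal access token (PAT). The sketch below shows only the authentication mechanics; the URL is a placeholder whose real value (and api-version) comes from the Cloud Load Testing REST API reference, and the account name and token are placeholders as well.
C#
using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Text;
using System.Threading.Tasks;

class LoadTestAutomationSample
{
    static void Main()
    {
        ListTestRunsAsync().GetAwaiter().GetResult();
    }

    static async Task ListTestRunsAsync()
    {
        // Placeholder endpoint: take the exact URL and api-version from the
        // Cloud Load Testing REST API reference for your Team Services account.
        string url = "https://<account>.vsclt.visualstudio.com/_apis/clt/testruns?api-version=1.0";
        string personalAccessToken = "<personal-access-token>";

        using (var client = new HttpClient())
        {
            // Team Services REST APIs accept a PAT through Basic authentication
            // with an empty user name.
            string credentials = Convert.ToBase64String(
                Encoding.ASCII.GetBytes(":" + personalAccessToken));
            client.DefaultRequestHeaders.Authorization =
                new AuthenticationHeaderValue("Basic", credentials);

            HttpResponseMessage response = await client.GetAsync(url);
            Console.WriteLine(await response.Content.ReadAsStringAsync());
        }
    }
}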
Many major services were broken down into a “web” tier and a “data” or “storage” tier. and Service Fabric are all examples of Orchestrators (or have . you now have many more combinations to deal with. Your servers are virtual and no longer have names (you have switched mindsets from pets to cattle after all). Orchestrators are the components that take in requests like “I would like five copies of this service running in my environment”. perhaps hundreds of different instances of those services. handling upgrades. Fleet. and services themselves have become small distributed systems. or how many? Which workloads work well on the same hardware. Each type of workload got specific machines dedicated to it: the database got a couple machines dedicated to it. Docker Datacenter/Docker Swarm. Dedicated hardware is a thing of the past. Suddenly you find yourself with tens. Suddenly managing your environment is not so simple as managing a few machines dedicated to single types of workloads. If a particular type of workload caused the machines it was on to run too hot. What to do? Introducing orchestrators An “Orchestrator” is the general term for a piece of software that helps administrators manage these types of environments. In these architectures. Strategies for balancing the data tier were different and depended on the data storage mechanism. For example. usually centering around data sharding. others are software-based such as Microsoft’s NLB. it might move services to nodes that are currently cold because the services that are there are not doing much work. the job of load balancing is to ensure stateless workloads receive (roughly) the same amount of work. More advanced balancers use actual estimation or reporting to route a calls based on its expected cost and current machine load. caching. In other environments. A Network Load Balancer ensures that the frontends are balanced by moving traffic to where the services are running. More are being created all the time as the complexities of managing real world deployments in different types of environments and conditions grow and change. the Cluster Resource Manager could also move a service away from a machine. stored procedures. Helping with Other Processes To see how the Cluster Resource Manager works. Enforcing Rules 2. and other store-specific mechanisms. managed views. The Service Fabric Cluster Resource Manager takes a different strategy. Service Fabric moves services to where they make the most sense. watch the following Microsoft Virtual Academy video: What it isn’t In traditional N tier applications there was always some notion of a “Load Balancer”. Network balancers or message routers tried to ensure that the web/worker tier remained roughly balanced. Some balancers would send each different call to a different server. Orchestration as a service The job of the Orchestrator within a Service Fabric cluster is handled primarily by the Cluster Resource Manager. Usually this was a Network Load Balancer (NLB) or an Application Load Balancer (ALB) depending on where it sat in the networking stack. we'll find it employs fundamentally different strategies for ensuring that the hardware resources in the cluster are efficiently utilized. Strategies balancing load varied. . Fundamentally. you might see something like HAProxy or nginx in this role. the Cluster Resource Manager’s job is broken down into three parts: 1. Optimizing Your Environment 3. As we look in more detail. expecting traffic or load to follow. Generally. As another example. 
Others provided session pinning/stickiness. Some load balancers are Hardware-based like F5’s BigIP offering.them built in). The Service Fabric Cluster Resource Manager is one of the System Services within Service Fabric and is automatically started up within each cluster. Perhaps the machine is about to be upgraded or is overloaded due to a spike in consumption by the services running on it. While some of these strategies are interesting. The nodes may be cold since the services that were present were deleted or moved elsewhere. it contains a different feature set compared to what you would find in a network load balancer. the Service Fabric Cluster Resource Manager is not anything like a network load balancer or a cache. Because the Cluster Resource Manager is responsible for moving services around (not delivering network traffic to where services already are). To find out more about them. read this article To find out about how the Cluster Resource Manager manages and balances load in the cluster. To find out more about that integration. check out this article on describing a Service Fabric cluster For more information about the other options available for configuring services. check out the topic on the other Cluster Resource Manager configurations available Learn about configuring Services Metrics are how the Service Fabric Cluster Resource Manger manages consumption and capacity in the cluster. check out the article on balancing load .Next steps For information on the architecture and information flow within the Cluster Resource Manager. check out this article The Cluster Resource Manager has many options for describing the cluster. To learn more about them and how to configure them check out this article The Cluster Resource Manager works with Service Fabric's management capabilities. the person handling a live-site incident for that service in production has a different job to do. Services may track physical metrics like memory and disk consumption. Let’s look at the following diagram: . Both logical and physical metrics may be used across many different types of services or maybe specific to only a couple services. The Cluster Resource Manager service aggregates all the information from the local agents and reacts based on its current configuration. the Service Fabric Cluster Resource Manager must have several pieces of information. and things can fail at any time. To track the available resources in the cluster. Other considerations The owners and operators of the cluster are occasionally different from the service authors. the Cluster Resource Manager has two conceptual parts: agents that run on each node and a fault-tolerant service. Additionally. The resource consumption of a given service can change over time. neither the cluster or the services are statically configured: The number of nodes in the cluster can grow and shrink Nodes of different sizes and types can come and go Services can be created. and services usually care about more than one type of resource. there may be both real physical and physical resources being measured.things like "WorkQueueDepth" or "TotalRequests". or at a minimum are the same people wearing different hats. To accomplish this. it has to know the capacity of the nodes in the cluster and the amount of resources consumed on each. It has to know which services currently exist and the current (or default) amount of resources that those services are consuming. 
when developing your service you know a few things about what it requires in terms of resources and how the different components should ideally be deployed. and periodically report them. For example. More commonly. and requires different tools. aggregate them. services may care about logical metrics . Across different services. However. and change their desired resource allocations Upgrades or other management operations can roll through the cluster. The agents on each node track load reports from services. removed. Cluster resource manager components and data flow The Cluster Resource Manager has to track the requirements of individual services and the consumption of resources by the individual service objects that make up those services. Cluster resource manager architecture overview 1/17/2017 • 3 min to read • Edit Online To manage the resources in your cluster. To find out more about them. analyzed. At the end of the reconfiguration (5). or that certain services have failed (or been deleted). In this case.2) where they are aggregated again.During runtime. freeing up resources elsewhere. and stored. we presume that the Resource Manager noticed that Node 5 was overloaded. and so decided to move service B from N5 to N4. it could notice that some empty nodes have been added to the cluster and decide to move some services to those nodes. some services fail. Let’s look at the following diagram and see what happens next. Let’s say that the Cluster Resource Manager determines that changes are necessary. check out this article on describing a Service Fabric cluster . For example. let’s say the amount of resources some services consume changes. Every few seconds that service looks at the changes and determines if any actions are necessary (3). and some nodes join and leave the cluster. Then the necessary commands are sent to the appropriate nodes (4). the cluster looks like this: Next steps The Cluster Resource Manager has many options for describing the cluster. All the changes on a node are aggregated and periodically sent to the Cluster Resource Manager service (1. there are many changes that could happen. It coordinates with other system services (in particular the Failover Manager) to make the necessary changes. For example. The Cluster Resource Manager could also notice that a particular node is overloaded. Describing a service fabric cluster 1/17/2017 • 22 min to read • Edit Online The Service Fabric Cluster Resource Manager provides several mechanisms for describing a cluster. it also attempts to optimize the cluster's resource consumption. you need to think through these different areas of failure. While enforcing these important rules. In the Azure environment Service Fabric uses the Fault Domain information provided by the environment to correctly configure the nodes in the cluster on your behalf. and blades ("B"). It is important that Fault Domains are set up correctly since Service Fabric uses this information to safely place services. Fault Domains are inherently hierarchal and are represented as URIs in Service Fabric. Conceivably. the Cluster Resource Manager uses this information to ensure high availability of the services running in the cluster. from power supply failures to drive failures to bad NIC firmware). Key concepts The Cluster Resource Manager supports several features that describe a cluster: Fault Domains Upgrade Domains Node Properties Node Capacities Fault domains A Fault Domain is any area of coordinated failure. if each blade holds more than one virtual machine. 
racks ("R"). During runtime. When you set up your own cluster. Since it's natural for hardware faults to overlap. . In the graphic below we color all the entities that contribute to Fault Domains and list all the different Fault Domains that result. there could be another layer in the Fault Domain hierarchy. as are those machines sharing a single source of power. A single machine is a Fault Domain (since it can fail on its own for various reasons. Service Fabric doesn't want to place services such that the loss of a Fault Domain (caused by the failure of some component) causes services to go down. we have datacenters ("DC"). In this example. Machines connected to the same Ethernet switch are in the same Fault Domain. In Azure the choice of which Fault Domain contains a node is managed for you. It attempts to spread out the stateful replicas or stateless instances for a given service so that they are in separate Fault Domains. This placement allows us to lose a Fault Domain while in the middle of a service upgrade and . you get to decide how many you want rather than it being dictated by the environment. With Upgrade Domains. For example. that the availability of that service is not compromised. Upgrade Domains are defined by policy. Upgrade Domains define sets of nodes that are upgraded at the same time. However. As a result. but with a couple key differences. Doing otherwise would contribute to imbalances in the load of individual nodes and make the failure of certain domains more critical than others.During runtime. In the first example the nodes are distributed evenly across the Fault Domains. This process helps ensure that if there is a failure of any one Fault Domain (at any level in the hierarchy). the Cluster Resource Manager is torn between its two goals: It was to use the machines in that “heavy” domain by placing services on them. Service Fabric’s Cluster Resource Manager doesn’t care how many layers there are in the Fault Domain hierarchy. Upgrade Domains are a lot like Fault Domains. If you ever stand up your own cluster on-premise or in another environment. If you continue to deploy more NodeTypes with only a couple instances. Keeping the levels balanced prevents one portion of the hierarchy from containing more services than others. It also shows one possible placement for three different replicas of a stateful service. while Fault Domains are rigorously defined by the areas of coordinated hardware failures. In this case the first two Fault Domains end up with more nodes. What does this look like? In the diagram above. say you have five Fault Domains but provision seven nodes for a given NodeType. However. the problem gets worse. we show two different example cluster layouts. it’s something you have to think about. where each ends up in different Fault and Upgrade Domains. it is best if there are the same number of nodes at each level of depth in the Fault Domain hierarchy. The following diagram shows three Upgrade Domains are striped across three Fault Domains. and it wants to place services so that the loss of a domain doesn’t cause problems. depending on the number of nodes that you provision you can still end up with Fault Domains with more nodes in them than others. Imbalanced Fault Domains layouts mean that the loss of a particular domain can impact the availability of the cluster more than others. Because of this. the Service Fabric Cluster Resource Manager considers the Fault Domains in the cluster and plans a layout. 
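A quick way to check which fault domain and upgrade domain each node actually landed in (whether assigned by Azure or by your own cluster manifest) is to query the nodes. A minimal sketch:
PowerShell
Connect-ServiceFabricCluster localhost:19000

# Each node reports the fault domain and upgrade domain it belongs to.
Get-ServiceFabricNode | Format-Table NodeName, FaultDomain, UpgradeDomain, NodeStatus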
If the “tree” of Fault Domains is unbalanced in your cluster. Another difference is that (today at least) Upgrade Domains are not hierarchical – they are more like a simple tag. Upgrade domains Upgrade Domains are another feature that helps the Service Fabric Cluster Resource Manager understand the layout of the cluster so that it can plan ahead for failures. it makes it harder for the Cluster Resource Manager to figure the best allocation of services. First. In the other one Fault Domain ends up with many more nodes. it tries to ensure that the loss of any one portion of the hierarchy doesn’t impact the services running on top of it. Too few Upgrade Domains has its own side effects – while each individual Upgrade Domain is down and being upgraded a large portion of your overall capacity is unavailable. There are pros and cons to having large numbers of Upgrade Domains. With more Upgrade Domains each step of the upgrade is more granular and therefore affects a smaller number of nodes or services. There’s no real limit to the total number of fault or Upgrade Domains in an environment. More Upgrade Domains also means that you need less available overhead on other nodes to handle the impact of the upgrade. Service Fabric waits a short period of time after an Upgrade Domain is completed before proceeding.still have one copy of the code and data. For example. Maintaining that buffer means that in the normal case those nodes are less-loaded than they would otherwise be. If you need to take down that Upgrade Domain for an upgrade. This tends to also improve reliability (since less of the service is impacted by any issue introduced during the upgrade). increasing the cost of running your service. The tradeoff is acceptable because it prevents bad changes from affecting too much of the service at a time. Having so much of your service down at once isn’t desirable since you have to have enough capacity in the rest of your cluster to handle the workload. This results in fewer services having to move at a time. if you have five Upgrade Domains. that load needs to go somewhere. or constraints on how they overlap. For example. if you only have three Upgrade Domains you are taking down about 1/3 of your overall service or cluster capacity at a time. The downside of having many Upgrade Domains is that upgrades tend to take longer. introducing less churn into the system. More Upgrade Domains means less overhead that must be maintained on the other nodes in the cluster. the nodes in each are handling roughly 20% of your traffic. Common structures that we’ve seen are: Fault Domains and Upgrade Domains mapped 1:1 One Upgrade Domain per Node (physical or virtual OS instance) A “striped” or “matrix” model where the Fault Domains and Upgrade Domains form a matrix with machines usually running down the diagonals . This delay is so that issues introduced by the upgrade have a chance to show up and be detected. The Fault and Upgrade Domain constraints state: "For a given service partition there should never be a difference greater than one in the number of service objects (stateless service instances or stateful service replicas) between two domains. But why? Let's look at the difference between the current layout and what would happen if N6 is chosen. Put differently. N6 will never be used no matter how many services you create. for sufficiently large clusters. The 1 UD per Node model is most like what people are used to from managing small sets of machines in the past where each would be taken down independently. 
where the FDs and UDs form a table and nodes are placed starting along the diagonal." Practically what this means is that for a given service certain moves or arrangements might not be valid. Whether this ends up sparse or packed depends on the total number of nodes compared to the number of FDs and UDs. Fault and Upgrade Domain constraints and resulting behavior The Cluster Resource Manager treats the desire to keep a service balanced across fault and Upgrade Domains as a constraint. Let's say that we have a cluster with six nodes. FD0 FD1 FD2 FD3 FD4 UD0 N1 UD1 N6 N2 UD2 N3 UD3 N4 UD4 N5 Now let's say that we create a service with a TargetReplicaSetSize of five. each has some pros and cons. configured with five Fault Domains and five Upgrade Domains. For example. almost everything ends up looking like the dense matrix pattern. Let's look at one example. The replicas land on N1-N5. the 1FD:1UD model is fairly simple to set up.There’s no best answer which layout to choose. Here's the layout we got and the total number of replicas per Fault and Upgrade Domain: . shown in the bottom right option of the image above. In fact. The most common model (and the one used in Azure) is the FD/UD matrix. because they would violate the Fault or Upgrade Domain constraints. You can find out more about constraints in this article. Each domain has the same number of nodes and the same number of replicas. making the difference between FD0 and FD1 a total of two. FD0 FD1 FD2 FD3 FD4 UDTOTAL UD0 R1 1 UD1 R2 1 UD2 R3 1 UD3 R4 1 UD4 R5 1 FDTotal 1 1 1 1 1 - This layout is balanced in terms of nodes per Fault Domain and Upgrade Domain. it is now violating the Upgrade Domain constraint (since . Now. Similarly if we picked N2 and N6 (instead of N1 and N2) we'd get: FD0 FD1 FD2 FD3 FD4 UDTOTAL UD0 0 UD1 R5 R1 2 UD2 R2 1 UD3 R3 1 UD4 R4 1 FDTotal 1 1 1 1 1 - While this layout is balanced in terms of Fault Domains. The Cluster Resource Manager does not allow this arrangement. It is also balanced in terms of the number of replicas per Fault and Upgrade Domain. let's look at what would happen if we'd used N6 instead of N2. while FD1 has zero. FD0 has two replicas. How would the replicas be distributed then? FD0 FD1 FD2 FD3 FD4 UDTOTAL UD0 R1 1 UD1 R5 1 UD2 R2 1 UD3 R3 1 UD4 R4 1 FDTotal 2 0 1 1 1 - Notice anything? This layout violates our definition for the Fault Domain constraint. json for Standalone deployments . 
In the cluster manifest template.UD0 has zero replicas while UD1 has two).IsScaleMin indicates that this cluster runs on one-box /one single server --> <WindowsServer IsScaleMin="true"> <NodeList> <Node NodeName="Node01" IPAddressOrFQDN="localhost" NodeTypeRef="NodeType01" FaultDomain="fd:/DC01/Rack01" UpgradeDomain="UpgradeDomain1" IsSeedNode="true" /> <Node NodeName="Node02" IPAddressOrFQDN="localhost" NodeTypeRef="NodeType02" FaultDomain="fd:/DC01/Rack02" UpgradeDomain="UpgradeDomain2" IsSeedNode="true" /> <Node NodeName="Node03" IPAddressOrFQDN="localhost" NodeTypeRef="NodeType03" FaultDomain="fd:/DC01/Rack03" UpgradeDomain="UpgradeDomain3" IsSeedNode="true" /> <Node NodeName="Node04" IPAddressOrFQDN="localhost" NodeTypeRef="NodeType04" FaultDomain="fd:/DC02/Rack01" UpgradeDomain="UpgradeDomain1" IsSeedNode="true" /> <Node NodeName="Node05" IPAddressOrFQDN="localhost" NodeTypeRef="NodeType05" FaultDomain="fd:/DC02/Rack02" UpgradeDomain="UpgradeDomain2" IsSeedNode="true" /> <Node NodeName="Node06" IPAddressOrFQDN="localhost" NodeTypeRef="NodeType06" FaultDomain="fd:/DC02/Rack03" UpgradeDomain="UpgradeDomain3" IsSeedNode="true" /> <Node NodeName="Node07" IPAddressOrFQDN="localhost" NodeTypeRef="NodeType07" FaultDomain="fd:/DC03/Rack01" UpgradeDomain="UpgradeDomain1" IsSeedNode="true" /> <Node NodeName="Node08" IPAddressOrFQDN="localhost" NodeTypeRef="NodeType08" FaultDomain="fd:/DC03/Rack02" UpgradeDomain="UpgradeDomain2" IsSeedNode="true" /> <Node NodeName="Node09" IPAddressOrFQDN="localhost" NodeTypeRef="NodeType09" FaultDomain="fd:/DC03/Rack03" UpgradeDomain="UpgradeDomain3" IsSeedNode="true" /> </NodeList> </WindowsServer> </Infrastructure> via ClusterConfig.xml <Infrastructure> <!-. If you’re creating your own cluster (or want to run a particular topology in development). you provide the Fault Domain and Upgrade Domain information yourself. This layout is also invalid Configuring fault and Upgrade Domains Defining Fault Domains and Upgrade Domains is done automatically in Azure hosted Service Fabric deployments. In this example. we define a nine node local development cluster that spans three “datacenters” (each with three racks). This cluster also has three Upgrade Domains striped across those three datacenters. it looks something like this: ClusterManifest. Service Fabric picks up and uses the environment information from Azure. "faultDomain": "fd:/dc1/r0". "upgradeDomain": "UD2" }. "nodeTypeRef": "NodeType0". . "iPAddress": "localhost". "upgradeDomain": "UD2" }. { "nodeName": "vm7". "faultDomain": "fd:/dc2/r0". "iPAddress": "localhost". "faultDomain": "fd:/dc2/r0". { "nodeName": "vm3". "iPAddress": "localhost". "nodeTypeRef": "NodeType0". "iPAddress": "localhost". "nodeTypeRef": "NodeType0". "iPAddress": "localhost". "nodeTypeRef": "NodeType0". "nodeTypeRef": "NodeType0"."nodes": [ { "nodeName": "vm1". "nodeTypeRef": "NodeType0". { "nodeName": "vm6". "upgradeDomain": "UD2" }. "iPAddress": "localhost". { "nodeName": "vm2". "upgradeDomain": "UD1" }. "nodeTypeRef": "NodeType0". "nodeTypeRef": "NodeType0". "nodeTypeRef": "NodeType0". "faultDomain": "fd:/dc2/r0". "faultDomain": "fd:/dc3/r0". "faultDomain": "fd:/dc3/r0". { "nodeName": "vm5". "iPAddress": "localhost". "upgradeDomain": "UD1" }. { "nodeName": "vm8". "upgradeDomain": "UD3" }. "faultDomain": "fd:/dc3/r0". "iPAddress": "localhost". { "nodeName": "vm4". "upgradeDomain": "UD3" }. "faultDomain": "fd:/dc1/r0". "iPAddress": "localhost". "upgradeDomain": "UD3" } ]. "upgradeDomain": "UD1" }. "faultDomain": "fd:/dc1/r0". 
{ "nodeName": "vm9". In these architectures certain machines serve as the front end/interface serving side of the application (and hence are probably exposed to the internet). Service Fabric has a first class notion of tags that can be applied to nodes. or signed long. scale. The statement at the service is called a placement constraint since it constrains where the service can run in the cluster. Placement constraints and node properties Sometimes (in fact. A great example of targeting hardware to particular workloads is almost every n-tier architecture out there. or security isolation reasons A workload should be isolated from other workloads for policy or resource consumption reasons To support these sorts of configurations. The constraint can be any Boolean statement that operates on the different node properties in the cluster. Placement constraints can be used to indicate where certain services should run. most of the time) you’re going to want to ensure that certain workloads run only on certain nodes or certain sets of nodes in the cluster. For example. bool. These are called placement constraints. The valid selectors in these boolean statements are: 1) conditional checks for creating particular statements STATEMENT SYNTAX "equal to" "==" "not equal to" "!=" "greater than" ">" . some workload may require GPUs or SSDs while others may not. Fault Domains and Upgrade Domains are assigned by Azure. the definition of your nodes and roles within the infrastructure option for Azure does not include Fault Domain or Upgrade Domain information. for example: an existing n-tier application has been “lifted and shifted” into a Service Fabric environment a workload wants to run on specific hardware for performance. Service Fabric expects that even in a microservices world there are cases where particular workloads need to run on particular hardware configurations. The different key/value tags on nodes are known as node placement properties (or just node properties). Therefore. The value specified in the node property can be a string. Different sets of machines (often with different hardware resources) handle the work of the compute or storage layers (and usually are not exposed to the internet). NOTE In Azure deployments. The set of constraints is extensible .any key/value pair can work. Generally we have found NodeType to be one of the most commonly used properties. It is useful since it corresponds 1:1 with a type of a machine. "Value >= 5" "NodeColor != green" "((OneProperty < 100) || ((AnotherProperty == false) && (OneProperty >= 100)))" Only nodes where the overall statement evaluates to “True” can have the service placed on it.xml . STATEMENT SYNTAX "greater than or equal to" ">=" "less than" "<" "less than or equal to" "<=" 2) boolean statements for grouping and logical operations STATEMENT SYNTAX "and" "&&" "or" "||" "not" "!" "group as single statement" "()" Here are some examples of basic constraint statements. Nodes that do not have a property defined do not match any placement constraint containing that property. As of this writing the default properties defined at each node are the NodeType and the NodeName. which in turn corresponds to a type of workload in a traditional n-tier application architecture. Service Fabric defines some default node properties that can be used automatically without the user having to define them. Let’s say that the following node properties were defined for a given node type: ClusterManifest. 
So for example you could write a placement constraint as "(NodeType == NodeType03)" . etc. In your Azure Resource Manager template for a cluster things like the node type name are likely parameterized. <NodeType Name="NodeType01"> <PlacementProperties> <Property Name="HasSSD" Value="true"/> <Property Name="NodeColor" Value="green"/> <Property Name="SomeProperty" Value="5"/> </PlacementProperties> </NodeType> via ClusterConfig.json for Standalone deployments or Template. "nodeTypes": [ { "name": "NodeType01".PlacementConstraints = "(HasSSD == true && SomeProperty >= 4)". updateDescription. Powershell: . C#: StatefulServiceUpdateDescription updateDescription = new StatefulServiceUpdateDescription().PlacementConstraints = "NodeType == NodeType01".UpdateServiceAsync(new Uri("fabric:/app/service"). So if you need to. you could also select that node type. await fabricClient. updateDescription).ServiceManager. "SomeProperty": "5" }. and would look something like "[parameters('vmNodeType1Name')]" rather than "NodeType01".CreateServiceAsync(serviceDescription). serviceDescription. await fabricClient. } ]. Service Fabric takes care of ensuring that the service stays up and available even when these types of changes are ongoing... // add other required servicedescription fields //. "NodeColor": "green". You can create service placement constraints for a service like as follows: C# FabricClient fabricClient = new FabricClient().ServiceManager.json for Azure hosted clusters. you can move a service around in the cluster. One of the cool things about a service’s placement constraints is that they can be updated dynamically during runtime. StatefulServiceDescription serviceDescription = new StatefulServiceDescription(). Powershell: New-ServiceFabricService -ApplicationName $applicationName -ServiceName $serviceName -ServiceTypeName $serviceType -Stateful -MinReplicaSetSize 2 -TargetReplicaSetSize 3 -PartitionSchemeSingleton - PlacementConstraint "HasSSD == true && SomeProperty >= 4" If you are sure that all nodes of NodeType01 are valid. "placementProperties": { "HasSSD": "true". add and remove requirements. The upgrade of a node's properties and requires each affected node to go down and then come back up. Update-ServiceFabricService -Stateful -ServiceName $serviceName -PlacementConstraints "NodeType == NodeType01" Placement constraints (along with many other orchestrator controls that we’re going to talk about) are specified for every different named service instance. Updates always take the place of (overwrite) what was previously specified. the Service Fabric Cluster Resource Manager can figure out where to place or move replicas so that nodes don’t go over capacity. Some service running on that node can say it is currently consuming 64 of "MemoryInMb". It does this by subtracting any declared usage of each service running on that node from the node's capacity. Services would report how much of the metric they used during runtime. The node would have its capacity for “DriveSpaceInMb” to the amount of total non-reserved space on the drive. whereas metrics are about resources that nodes have and that services consume when they are running on a node. the Service Fabric Cluster Resource Manager doesn't understand what the names of the metrics mean. Capacity is another constraint that the Cluster Resource Manager uses to understand how much of a resource a node has. Service Fabric’s Cluster Resource Manager would still try to ensure that no node ended up over its capacity. 
For information configuring metrics and their uses. The amount of space available on that SSD (and consumed by services) would be a metric like “DriveSpaceInMb”. what about just ensuring that nodes don’t run out of resources in the first place? Service Fabric represents resources as Metrics . A node property could be "HasSSD" and could be set to true or false. If you turned off all resource balancing. The last thing you want if you’re trying to run services efficiently is a bunch of nodes that are hot while others are cold. Capacity One of the most important jobs of any orchestrator is to help manage resource consumption in the cluster. the Cluster Resource Manager tracks how much of each resource is present on each node and how much is remaining. Hot nodes lead to resource contention and poor performance. Metric names are just strings. C#: . Examples of metrics are things like “WorkQueueDepth” or “MemoryInMb”. It is a good practice to declare units as a part of the metric names that you create when it could be ambiguous. The properties on a node are defined via the cluster definition and hence cannot be updated without an upgrade to the cluster. With this information. Node properties are static descriptors of the nodes themselves. So for example. During runtime. Both the capacity and the consumption at the service level are expressed in terms of metrics. Generally this is possible unless the cluster as a whole is too full. Before we talk about balancing. and cold nodes represent wasted resources/increased cost. Metrics are any logical or physical resource that you want to describe to Service Fabric. Remaining capacity is also tracked for the cluster as a whole. It is important to note that just like for placement constraints and node properties. the metric might be "MemoryInMb" and a given Node may have a capacity for "MemoryInMb" of 2048. see this article Metrics are different from placements constraints and node properties. High. In .64) You can see capacities defined in the cluster manifest: ClusterManifest. Say that a replica's load changed from 64 to 1024. "nodeTypes": [ { "name": "NodeType02".64. metric. This can also happen if the combined usage of the replicas and instances on that node exceeds that node’s capacity. metric. It is also possible (and in fact common) that a service’s load changes dynamically.ServiceManager. Powershell: New-ServiceFabricService -ApplicationName $applicationName -ServiceName $serviceName -ServiceTypeName $serviceTypeName –Stateful -MinReplicaSetSize 2 -TargetReplicaSetSize 3 -PartitionSchemeSingleton –Metric @("Memory. StatefulServiceDescription serviceDescription = new StatefulServiceDescription().CreateServiceAsync(serviceDescription).json for Standalone deployments or Template. but the node it was running on then only had 512 (of the "MemoryInMb" metric) remaining.Metrics.Add(metric). since there's not enough room on that node. ServiceLoadMetricDescription metric = new ServiceLoadMetricDescription().SecondaryDefaultLoad = 64. "capacities": { "MemoryInMb": "2048".Weight = ServiceLoadMetricWeight. serviceDescription.xml <NodeType Name="NodeType02"> <Capacities> <Capacity Name="MemoryInMb" Value="2048"/> <Capacity Name="DiskInMb" Value="512000"/> </Capacities> </NodeType> via ClusterConfig. "DiskInMb": "512000" } } ].json for Azure hosted clusters.High. and would look something like "[parameters('vmNodeType2Name')]" rather than "NodeType02". 
In your Azure Resource Manager template for a cluster things like the node type name are likely parameterized.PrimaryDefaultLoad = 64. await fabricClient. metric. Now that replica or instance's placement is invalid. metric.Name = "MemoryInMb". so it can easily determine if there's sufficient space in the cluster. You want to create three instances of the service. Say that you go to create a stateless service and it has some load associated with it (more on default and dynamic load reporting later). Let’s say that the service cares about the "DiskSpaceInMb" metric. If there isn't sufficient space. Here's an example of how to specify buffered capacity: ClusterManifest. Services can have their load spike independently of actions taken by the Cluster Resource Manager.20" /> </Section> via ClusterConfig.xml <Section Name="NodeBufferPercentage"> <Parameter Name="DiskSpace" Value="0. the services are all very "bulky". Let's also say that it is going to consume five units of "DiskSpaceInMb" for every instance of the service. you can expect smaller amounts of your cluster to be unavailable during upgrades and failures. the Cluster Resource Manager rejects the create service call. The value you pick for the reserved capacity is a function of the number of Fault and Upgrade Domains you have in the cluster and how much overhead you want. there are some controls that are baked in to prevent basic problems. Buffered Capacity allows reservation of some portion of the overall node capacity so that it is only used to place services during upgrades and node failures. For example. When moving replicas. As a result.either case the Cluster Resource Manager has to kick in and get the node back below capacity.10" /> <Parameter Name="Memory" Value="0. If you have more domains. As long as the Cluster Resource Manager can rearrange things so there's five units available on three nodes. Great! So that means that we need 15 units of "DiskSpaceInMb" to be present in the cluster in order for us to even be able to create these service instances. with dynamic load there’s not a lot the Cluster Resource Manager can do. The Cluster Resource Manager is continually calculating the overall capacity and consumption of each metric. More Fault and Upgrade Domains means that you can pick a lower number for your buffered capacity.15" /> <Parameter Name="SomeOtherMetric" Value="0. the Cluster Resource Manager tries to minimize the cost of those movements. The first thing we can do is prevent the creation of new workloads that would cause the cluster to become full. Such rearrangement is almost always possible unless the cluster as a whole is almost entirely full. this space could be allocated many different ways. Specifying the buffer percentage only makes sense if you have also specified the node capacity for a metric. It does this by moving one or more of the replicas or instances on that node to different nodes. That said.json for Standalone deployments or Template. Today buffer is specified globally per metric for all nodes via the cluster definition. Movement cost is discussed in this article. or three remaining units of capacity on five different nodes. Cluster capacity So how do we keep the overall cluster from being too full? Well. Since the requirement is only that there be 15 units available. there could be one remaining unit of capacity on 15 different nodes.json for Azure hosted clusters: . it will eventually place the service. 
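Dynamic load of the kind described above typically comes from the service itself reporting its current consumption. Below is a minimal sketch of reporting load from inside a stateful Reliable Service; the service class is hypothetical, and the metric name and value simply mirror the examples above.
C#
using System.Collections.Generic;
using System.Fabric;
using Microsoft.ServiceFabric.Services.Runtime;

internal sealed class SampleStatefulService : StatefulService
{
    public SampleStatefulService(StatefulServiceContext context)
        : base(context)
    {
    }

    private void ReportCurrentLoad(int memoryInMbUsed)
    {
        // Tell the local Service Fabric agent how much of the "MemoryInMb" metric this
        // replica currently consumes. The Cluster Resource Manager aggregates these
        // reports when it checks node capacity and decides whether replicas must move.
        this.Partition.ReportLoad(new List<LoadMetric>
        {
            new LoadMetric("MemoryInMb", memoryInMbUsed)
        });
    }
}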
Buffered Capacity Another feature the Cluster Resource Manager has that helps manage overall cluster capacity is the notion of some reserved buffer to the capacity specified at each node. or both. your cluster with plenty of headroom today may be underpowered when you become famous tomorrow. and the current consumption for every metric in use in the cluster. The Cluster Resource Manager exposes this information via PowerShell and the Query APIs. "value": "0. check out this article Defining Defragmentation Metrics is one way to consolidate load on nodes instead of spreading it out. { "name": "Memory". To learn how to configure defragmentation. This ensures that the cluster retains enough spare overhead so that upgrades and failures don’t cause nodes to go over capacity.192450089729875 BalancingThreshold : 1 Action : NoActionNeeded ActivityThreshold : 0 ClusterCapacity : 189 ClusterLoad : 45 ClusterRemainingCapacity : 144 NodeBufferPercentage : 10 ClusterBufferedCapacity : 170 ClusterRemainingBufferedCapacity : 125 ClusterCapacityViolation : False MinNodeLoadValue : 0 MinNodeLoadNodeId : 3ea71e8e01f4b0999b121abcbf27d74d MaxNodeLoadValue : 15 MaxNodeLoadNodeId : 2cc648b6770be1bc9824fa995d5b68b1 Next steps For information on the architecture and information flow within the Cluster Resource Manager. "value": "0. refer to this article Start from the beginning and get an Introduction to the Service Fabric Cluster Resource Manager To find out about how the Cluster Resource Manager manages and balances load in the cluster.192450089729875 DeviationAfter : 0. the total capacity. { "name": "SomeOtherMetric". This lets you see the buffered capacity settings. Here we see an example of that output: PS C:\Users\user> Get-ServiceFabricClusterLoadInformation LastBalancingStartTimeUtc : 9/1/2016 12:54:59 AM LastBalancingEndTimeUtc : 9/1/2016 12:54:59 AM LoadMetricInformation : LoadMetricName : Metric1 IsBalancedBefore : False IsBalancedAfter : False DeviationBefore : 0.20" } ] } ] The creation of new services fails when the cluster is out of buffered capacity for a metric.15" }. Buffered capacity is optional but is recommended in any cluster that defines a capacity for a metric. "value": "0. "fabricSettings": [ { "name": "NodeBufferPercentage".10" }. check out the . "parameters": [ { "name": "DiskSpace". article on balancing load . the Service Fabric Cluster Resource Manager supports Application Groups. and it has three services. . Introduction to Application Groups 1/17/2017 • 7 min to read • Edit Online Service Fabric's Cluster Resource Manager typically manages cluster resources by spreading the load (represented via Metrics) evenly throughout the cluster. It can also limit the metric load of that the applications’ services on individual nodes and in total. It can also be used to reserve resources in the cluster for the application. Some additional requirements are typically: Ability to reserve capacity for an Application Instance's services in the cluster Ability to limit the total number of nodes that the services within an application run on Defining capacities on the application instance itself to limit the number or total resource consumption of the services inside it To meet these requirements. but patterns that make heavy use of different Service Fabric Application Instances sometimes bring in additional requirements. There are no guarantees made about which replicas or instances of which services get placed together. Metrics and capacity work great for many workloads. 
Application capacity can be set for new applications when they are created and can also be updated for existing applications. Service Fabric also manages the capacity of the nodes in the cluster and the cluster as a whole through the notion of capacity. If no Application Capacity is specified. Limiting the maximum number of nodes The simplest use case for Application capacity is when an application instantiation needs to be limited to a certain maximum number of nodes. the Service Fabric Cluster Resource Manager creates and places services according to normal rules (balancing or defragmentation). The Cluster Resource Manager has spread out all replicas across six available nodes to achieve the best balance in the cluster. Managing Application capacity Application capacity can be used to limit the number of nodes spanned by an application. or which specific nodes get used. In the left example. The following image shows an application instance with and without a maximum number of nodes defined. the application doesn’t have Application Capacity set. ad.0. Reserving Capacity Another common use for application groups is to ensure that resources within the cluster are reserved for a given application instance.0". the Cluster Resource Manager will attempt to move replicas to other nodes so that the capacity constraint is respected. ad. Let's look at how that would work. If total load on the node goes over this capacity.Add(appMetric).UpdateApplicationAsync(adUpdate).ApplicationManager.Name = "Metric1".ApplicationTypeVersion = "1. For each application metric. and the application's capacity for those metrics. fc.ApplicationTypeName = "AppType1". fc. let's say the application instance had a capacity of 10 and already had load of five. Maximum Node Capacity – This setting specifies the maximum total load for replicas of the services inside the application on a single node. appMetric. we see the same application limited to three nodes. var appMetric = new ApplicationMetricDescription(). For example.MaximumNodes = 3. and Capacity Application Groups also allow you to define metrics associated with a given application instance. Powershell New-ServiceFabricApplication -ApplicationName fabric:/AppName -ApplicationTypeName AppType1 - ApplicationTypeVersion 1.MaximumNodes = 5. Load. there are two values that can be set: Total Application Capacity – This setting represents the total capacity of the application for a particular metric. Application Metrics.In the right example. ApplicationUpdateDescription adUpdate = new ApplicationUpdateDescription(new Uri("fabric:/AppName")). or updated for an application instance that was already running. ad.0.Metrics. In this case the creation of a service with a total default load of 10 would be disallowed. reserve. Specifying a minimum number of nodes and resource reservation Reserving resources for an application instance requires specifying a couple additional parameters: MinimumNodes and NodeReservationCapacity .0 -MaximumNodes 3 Update-ServiceFabricApplication –Name fabric:/AppName –MaximumNodes 5 C# ApplicationDescription ad = new ApplicationDescription(). adUpdate. The Cluster Resource Manager disallows the creation of any new services within this application instance that would cause total load to exceed this value.TotalApplicationCapacity = 1000. or even if they aren't consuming the resources yet. ad. This happens even if the application instance doesn't have any services within it yet.0.ApplicationName = new Uri("fabric:/AppName"). 
and limit the resource consumption of the services inside that application instance. The parameter that controls this behavior is called MaximumNodes.CreateApplicationAsync(ad). adUpdate.0. appMetric. This parameter can be set during application creation.ApplicationManager. This allows you to track. This setting defines the number of nodes that the resources should be reserved on. Obtaining the application load information For each application that has Application Capacity defined you can obtain the information about the aggregate load reported by replicas of its services. This reserved application capacity is considered consumed and counted against the remaining capacity on that node and within the cluster.Just like specifying a maximum number of nodes that the services within an application can run on. and doesn't allow services from other application instances in the cluster to consume that capacity. you can also specify the minimum number. When an application is created with reservation the Cluster Resource Manager reserves capacity equal to MinimumNodes * NodeReservationCapacity (for each metric). However capacity is reserved on a specific node only when at least one replica is placed on it. NodeReservationCapacity .The NodeReservationCapacity can be defined for each metric within the application. In the example on the right. Let's look at an example of capacity reservation: In the left example. MinimumNodes . This later reservation allows for flexibility and better resource utilization since resources are only reserved on nodes when needed. applications do not have any Application Capacity defined. load can be retrieved using the following PowerShell cmdlet: Get-ServiceFabricApplicationLoad –ApplicationName fabric:/MyApplication1 . The Cluster Resource Manager balances everything according to normal rules. For example. let's say that the application was created with the following settings: MinimumNodes set to two MaximumNodes set to three An application Metric defined with NodeReservationCapacity of 20 MaximumNodeCapacity of 50 TotalApplicationCapacity of 100 Service Fabric reserves capacity on two nodes for the blue application. This setting defines the amount of metric load reserved for the application on any node where any of the replicas or instances of the services within it are placed. guaranteeing capacity within the cluster as a part of creating the application instance. Application Capacity: Maximum permitted value of Application Load. Reservation Capacity: Cluster Capacity that is reserved in the cluster for this Application. The effect of the command is immediate. Ensuring services run on the same node can be achieved by using affinity or with placement constraints depending on the specific requirements. but not which specific five nodes in the cluster. In other words. Do not try to use the Application Capacity to ensure that two services from the same application are placed alongside each other. then they must follow these rules: Node Reservation Capacity must not be greater than Maximum Node Capacity. It also includes information about each application load metric. Removing Application Capacity Once the Application Capacity parameters are set for an application. The restrictions are enforced both during application creation (on the client side). you can specify that the application runs on at most five nodes. Constraining an application to specific nodes can be achieved using placement constraints for services. 
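The same application load information returned by the cmdlet above can be retrieved from code. This is a hedged sketch, not the documented sample: it assumes a GetApplicationLoadInformationAsync query method and the property names shown here, so verify them against the System.Fabric API version you are using.

Code:

using System;
using System.Fabric;
using System.Fabric.Query;
using System.Threading.Tasks;

static async Task PrintApplicationLoadAsync()
{
    var fabricClient = new FabricClient();

    // Query the aggregate load and capacity settings for one application instance.
    ApplicationLoadInformation appLoad =
        await fabricClient.QueryManager.GetApplicationLoadInformationAsync("fabric:/MyApplication1");

    Console.WriteLine("Nodes: min {0}, max {1}", appLoad.MinimumNodes, appLoad.MaximumNodes);

    foreach (ApplicationLoadMetricInformation metric in appLoad.ApplicationLoadMetricInformation)
    {
        Console.WriteLine("{0}: load {1} / capacity {2}, reserved {3}",
            metric.Name,
            metric.ApplicationLoad,
            metric.ApplicationCapacity,
            metric.ReservationCapacity);
    }
}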
The ApplicationLoad query returns the basic information about the Application Capacity that was specified for the application. This information includes the Minimum Nodes and Maximum Nodes info and the number of nodes that the application is currently occupying. It also includes information about each application load metric, including:

Metric Name: Name of the metric.
Application Capacity: Maximum permitted value of Application Load.
Reservation Capacity: Cluster Capacity that is reserved in the cluster for this Application.
Application Load: Total Load of this Application's child replicas.

Removing Application Capacity

Once the Application Capacity parameters are set for an application, they can be removed using the Update Application APIs or PowerShell cmdlets. For example:

Update-ServiceFabricApplication –Name fabric:/MyApplication1 –RemoveApplicationCapacity

This command removes all Application Capacity parameters from the application. The effect of the command is immediate. After this command completes, the Cluster Resource Manager reverts to using the default behavior for managing applications. Application Capacity parameters can be specified again via Update-ServiceFabricApplication.

Restrictions on Application Capacity

There are several restrictions on Application Capacity parameters that must be respected. If there are validation errors, the creation or update of the application is rejected with an error and no changes take place.

All integer parameters must be non-negative numbers.
MinimumNodes must never be greater than MaximumNodes.
If capacities for a load metric are defined, then they must follow these rules:
  Node Reservation Capacity must not be greater than Maximum Node Capacity. For example, you cannot limit the capacity for the metric "CPU" on the node to two units and try to reserve three units on each node.
  If MaximumNodes is specified, then the product of MaximumNodes and Maximum Node Capacity must not be greater than Total Application Capacity. For example, let's say the Maximum Node Capacity for load metric "CPU" is set to eight, and let's also say you set the Maximum Nodes to ten. In this case, Total Application Capacity must be at least 80 for this load metric.

The restrictions are enforced both during application creation (on the client side) and during application update (on the server side).

How not to use Application Capacity

Do not try to use the Application Group features to constrain the application to a specific subset of nodes. In other words, you can specify that the application runs on at most five nodes, but not which specific five nodes in the cluster. Constraining an application to specific nodes can be achieved using placement constraints for services.

Do not try to use the Application Capacity to ensure that two services from the same application are placed alongside each other. Ensuring services run on the same node can be achieved by using affinity or with placement constraints, depending on the specific requirements.

Next steps

For more information about the other options available for configuring services, check out the topic on the other Cluster Resource Manager configurations available Learn about configuring Services
To find out about how the Cluster Resource Manager manages and balances load in the cluster, check out the article on balancing load
Start from the beginning and get an Introduction to the Service Fabric Cluster Resource Manager
For more information on how metrics work generally, read up on Service Fabric Load Metrics
The Cluster Resource Manager has many options for describing the cluster. To find out more about them, check out this article on describing a Service Fabric cluster
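Before moving on from Application Groups, here is what the reservation example from earlier in this article (MinimumNodes of two, NodeReservationCapacity of 20, MaximumNodeCapacity of 50, TotalApplicationCapacity of 100) could look like in C#. Treat it as a hedged sketch: the MinimumNodes, NodeReservationCapacity, and MaximumNodeCapacity property names are assumed to mirror the PowerShell parameters and are not shown in the snippets above.

Code:

ApplicationDescription ad = new ApplicationDescription();
ad.ApplicationName = new Uri("fabric:/AppName");
ad.ApplicationTypeName = "AppType1";
ad.ApplicationTypeVersion = "1.0.0";
ad.MinimumNodes = 2;                      // reserve capacity on two nodes
ad.MaximumNodes = 3;                      // never span more than three nodes

ApplicationMetricDescription appMetric = new ApplicationMetricDescription();
appMetric.Name = "Metric1";
appMetric.NodeReservationCapacity = 20;   // reserved on each node that hosts the application
appMetric.MaximumNodeCapacity = 50;       // cap for this application's load on a single node
appMetric.TotalApplicationCapacity = 100; // cap for the whole application instance
ad.Metrics.Add(appMetric);

await fabricClient.ApplicationManager.CreateApplicationAsync(ad);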
Configuring cluster resource manager settings for Service Fabric services 1/24/2017 • 2 min to read • Edit Online

The Service Fabric Cluster Resource Manager allows fine-grained control over the rules that govern every individual named service. Each named service instance can specify rules for how it should be allocated in the cluster. Each named service can also define the set of metrics that it wants to report, including how important they are to that service. Configuring services breaks down into three different tasks:

1. Configuring placement constraints
2. Configuring metrics
3. Configuring advanced placement policies and other rules (less common)

Placement constraints

Placement constraints are used to control which nodes in the cluster a service can actually run on. Typically a particular named service instance, or all services of a given type, are constrained to run on a particular type of node. That said, placement constraints are extensible - you can define any set of properties on a node type basis, and then select for them with constraints when the service is created. Placement constraints are also dynamically updatable over the lifetime of the service, allowing you to respond to changes in the cluster. The properties of a given node can also be updated dynamically in the cluster. More information on placement constraints and how to configure them can be found in this article.

Metrics

Metrics are the set of resources that a given named service instance needs. A service's metric configuration includes how much of that resource each stateful replica or stateless instance of that service consumes by default. Metrics also include a weight that indicates how important balancing that metric is to that service, in case tradeoffs are necessary.

Other placement rules

There are other types of placement rules that are useful in clusters that are geographically distributed, or in other less common scenarios. Other placement rules are configured via either Correlations or Policies. Affinity is one mode you can configure for your services. It is not common, but if you need it you can learn about it here. There are many different placement rules that can be configured on your service to handle additional scenarios. You can find out about those different placement policies here.

Next steps

Metrics are how the Service Fabric Cluster Resource Manager manages consumption and capacity in the cluster. To learn more about them and how to configure them, check out this article
Start from the beginning and get an Introduction to the Service Fabric Cluster Resource Manager
To find out about how the Cluster Resource Manager manages and balances load in the cluster, check out the article on balancing load
The Cluster Resource Manager has many options for describing the cluster. To find out more about them, check out this article on describing a Service Fabric cluster
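As a concrete illustration of the first of the three tasks above, here is a short, hedged sketch of selecting nodes with a placement constraint when creating a service. It assumes a node property named NodeColor has already been defined for the node type; that property name is only an example, not something Service Fabric defines for you.

Code:

StatefulServiceDescription serviceDescription = new StatefulServiceDescription();
// ... set ApplicationName, ServiceName, ServiceTypeName, partitioning, and replica counts as usual ...

// Only allow replicas of this service on nodes whose NodeColor property is "Blue".
serviceDescription.PlacementConstraints = "(NodeColor == Blue)";

await fabricClient.ServiceManager.CreateServiceAsync(serviceDescription);

Because the constraint is just a string expression over node properties, it can be updated later if the service needs to run on a different class of nodes.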
Managed code means that from within a host process it can be hard to measure and report a single service object's consumption of physical resources. The second one a stateless service with one partition and an instance count of three: . The following table shows how much load for each of these metrics is associated with each service object: METRIC STATELESS INSTANCE LOAD STATEFUL SECONDARY LOAD STATEFUL PRIMARY LOAD PrimaryCount 0 0 1 ReplicaCount 0 1 1 Count 1 1 1 Ok. Things like Memory. a metric is anything that you want to manage to deal with the performance of your services. Logical metrics are things like “MyWorkQueueDepth” or "MessagesToProcess" or "TotalRecords". Metrics can also be (and commonly are) logical metrics. The default metrics that we use today when you don’t specify any are called PrimaryCount. Generally. let’s see what happens when we create two services. In the worst case. This is because the default metric reporting isn’t adaptive and presumes everything is equivalent. using just the defaults can also result in overscheduled nodes resulting in performance issues. For any serious workload. Any metric has some properties that describe it: a name. or even just the same right now? Realistically. Default Load: The default load is represented differently depending on whether the service is stateless or stateful. each metric has a single property named DefaultLoad . the odds of all services being equivalent forever is low. For stateless services. Custom metrics Metrics are configured on a per-named-service-instance basis when you’re creating the service. a default load. The metric name is a unique identifier for the metric within the cluster from the Resource Manager’s perspective. which we'll cover next. If you're interested in getting the most out of your cluster and avoiding performance issues. you'll want to start looking into custom metrics. We can do better with custom metrics and dynamic load reports.Here's what you get: Primary replicas for the stateful service are not stacked up on a single node Replicas for the same partition are not on the same node The total number of primaries and secondaries is distributed in the cluster The total number of service objects are evenly allocated on each node Good! This works great until you start to run large numbers of real workloads: What's the likelihood that the partitioning scheme you picked results in perfectly even utilization by all partitions? What’s the chance that the load for a given service is constant over time. but doing so usually means that your cluster utilization is lower than you’d like. and a weight. you could run with just the default metrics. Metric Name: The name of the metric. Here’s the code that you would write to create a service with that metric configuration: Code: StatefulServiceDescription serviceDescription = new StatefulServiceDescription(). replicaCountMetric.PrimaryDefaultLoad = 20. memoryMetric. This is because you want to be clear about the relationship between the default metrics and your custom metrics.Name = "Count". totalCountMetric.Name = "PrimaryCount". totalCountMetric. but you still want primary replicas to be balanced. You know that Memory is the most important metric in terms of managing the performance of this particular service.Low. Other than these tweaks.Name = "ReplicaCount". serviceDescription.Add(primaryCountMetric). If you define custom metrics and you want to also use the default metrics. totalCountMetric. 
For stateful services you define: PrimaryDefaultLoad: The default amount of this metric this service consumes when it is a Primary SecondaryDefaultLoad: The default amount of this metric this service consumes when it is a Secondary Weight: Metric weight defines how important this metric is relative to the other metrics for this service.Weight = ServiceLoadMetricWeight. replicaCountMetric.Metrics.Weight = ServiceLoadMetricWeight. totalCountMetric. serviceDescription. and that you also want to use the default metrics. StatefulServiceLoadMetricDescription memoryMetric = new StatefulServiceLoadMetricDescription().High.Low.SecondaryDefaultLoad = 1. memoryMetric. primaryCountMetric. serviceDescription.Medium. memoryMetric.PrimaryDefaultLoad = 1. StatefulServiceLoadMetricDescription totalCountMetric = new StatefulServiceLoadMetricDescription(). serviceDescription. primaryCountMetric. memoryMetric.Name = "MemoryInMb".Metrics. replicaCountMetric. Balancing Primaries is a good idea so that the loss of some node or fault domain doesn’t impact a majority of primary replicas along with it.Add(totalCountMetric). while secondaries take up 5. Powershell: . StatefulServiceLoadMetricDescription replicaCountMetric = new StatefulServiceLoadMetricDescription().PrimaryDefaultLoad = 1.Add(replicaCountMetric). Defining metrics for your service . primaryCountMetric.SecondaryDefaultLoad = 0.ServiceManager. Let’s also say that you’ve done some measurements and know that normally a Primary replica of that service takes up 20 units of "MemoryInMb".Weight = ServiceLoadMetricWeight. maybe you care about Memory consumption or WorkQueueDepth way more than you care about Primary distribution. StatefulServiceLoadMetricDescription primaryCountMetric = new StatefulServiceLoadMetricDescription().Weight = ServiceLoadMetricWeight.Metrics.Add(memoryMetric). primaryCountMetric. replicaCountMetric.an example Let’s say you wanted to configure a service that reports a metric called “MemoryInMb”.Metrics. you want to use the default metrics. await fabricClient. For example.PrimaryDefaultLoad = 1.CreateServiceAsync(serviceDescription).SecondaryDefaultLoad = 5.SecondaryDefaultLoad = 1. you'll need to explicitly add them back. For stateful services the default load for primary and secondary replicas are typically different since replicas do different kinds of work in each role. It's expected that the default load for a primary replica is higher than for secondary replicas.Low. there’s no single number that you can use for default load. New-ServiceFabricService -ApplicationName $applicationName -ServiceName $serviceName -ServiceTypeName $serviceTypeName –Stateful -MinReplicaSetSize 2 -TargetReplicaSetSize 3 -PartitionSchemeSingleton –Metric @("MemoryInMb. This is called dynamic load reporting."Count. but the real numbers should depend on your own measurements. This is why the Cluster Resource Manager allows each service object to update its own load during runtime. Any value you pick for default load is wrong some of the time. updated after the service is created.1”) (Reminder: if you just want to use the default metrics. you’ve noticed that: 1.1. or maybe they correspond to workloads that vary over the course of the day. primaries usually serve both reads and writes (and most of the computational burden). The service or partition could be associated with a particular customer.5”. let's go through each of these settings in more detail and talk about the behavior you'll get. 
Dynamic load Let’s say that you’ve been running your service for a while. while secondaries don't.0”.20. There's lots of things that could cause these types of load fluctuations.High. Default loads are still good since they provide some information. you don’t need to touch the metrics collection at all or do anything special when creating your service. .Low."PrimaryCount. Some partitions or instances of a given service consume more resources than others 2.) Now that we've gotten an introduction and seen an example.1”. "Load" is how much of a given metric is consumed by some service instance or replica on a given node."ReplicaCount. Default load works great for simple capacity planning scenarios where certain amounts of resources are dedicated to different workloads. With some monitoring. Reporting per replica or instance allows the Cluster Resource Manager to reorganize the individual service objects in the cluster to ensure that the services get the resources they require. The Cluster Resource Manager allows stateful services to specify a different default load for both their Primaries and Secondaries. Default load Default load is how much load each service object (instance or replica) of this service consumes by default.Medium. A service replica or instance that was cold and not doing any work would usually report that it was using low amounts of a given metric. Busy services effectively get to "reclaim" resources from other replicas or instances that are currently cold or doing less work. Regardless of the reason.1. Dynamic load reports allow replicas or instances to adjust their allocation/reported load of metrics over their lifetime. while stateless services can only specify one value. This is a problem since incorrect default loads result in the Cluster Resource Manager either over or under allocating resources for your service.1. For simpler services. the default load is a static definition that is never updated and that is used for the lifetime of the service. reported on a per service object basis. Load The whole point of defining metrics is to represent some load. you have nodes that are over or under utilized even if the Cluster Resource Manager thinks the cluster is balanced. A busy replica or instance would report that they are using more. For example. Some services have load that varies over time. As a result. but they're not a complete story for real workloads most of the time. or all of the above. The expected load can be configured when a service is created. 1. or enable a new metric only after the code has already been deployed and validated via other mechanisms. we use “Memory” as an example metric.ServicePartition. When both default load is set and dynamic load reports are utilized. The code can measure and report all metrics it knows about.ReportLoad(new List<LoadMetric> { new LoadMetric("MemoryInMb".Within your Reliable Service the code to report load dynamically would look like this: Code: this. The list of metrics a service can report is the same set specified when the service is created. SecondaryDefaultLoad"). new LoadMetric("metric1". can both be used? Yes! In fact. 1234). In this example. The default load allows the Cluster Resource Manager to place the service objects in good places when they are created. 
Let’s presume that we initially created the stateful service with the following command: Powershell: New-ServiceFabricService -ApplicationName $applicationName -ServiceName $serviceName -ServiceTypeName $serviceTypeName –Stateful -MinReplicaSetSize 2 -TargetReplicaSetSize 3 -PartitionSchemeSingleton –Metric @("MemoryInMb. PrimaryDefaultLoad."Count. and reporting load dynamically is recommended. Service Fabric logs that report but ignores it. the administrator or ops team could disable a metric with a buggy report for a particular service. 42) }). those reports are accepted and used. MetricWeight. If a service replica or instance tries to report load for a metric that it is not currently configured to use.Medium.0”."PrimaryCount. This is neat because it allows for greater experimentation.1”. The list of metrics associated with the service may also be updated dynamically.21. Services replicas or instances may only report load for the metrics that they were configured to use when they were created.11”.1”) As a reminder. If there are other metrics reported in the same API call that are valid. and the operator can specify and update the metric configuration for that service without having to change the code. the default load serves as an estimate until dynamic reports show up. Let’s take our previous example and see what happens when we add some custom metrics and dynamic load reporting.Low.1. Let's see what one possible cluster layout could look like: . This is good because it gives the Cluster Resource Manager something to work with.Low.1. Mixing default load values and dynamic load reports If default load isn't sufficient. reconfigure the weights of metrics based on behavior. If no default load information is provided placement of services is random.High."ReplicaCount. For example. this syntax is ("MetricName. this is the recommended configuration. When load reports come in later the Cluster Resource Manager almost always has to move services around. the least is N2. we know that the replicas inside partition 1 of the stateful service haven’t reported load dynamically Secondary replicas within a partition can have their own load Overall the metrics look balanced. That view is what allows the Cluster Resource Manager to track consumption in the cluster. it may find their use of that . For memory.75). services may have different views as to the importance of the same metric. Low. the ratio between the maximum and minimum load is 1. Medium. Metric weights tell the Cluster Resource Manager that certain metrics are more important than others. The real impact of different metric weights in the cluster is that the Cluster Resource Manager generates different solutions. How should the Cluster Resource Manager handle these situations? Metric weights allow the Cluster Resource Manager to decide how to balance the cluster when there’s no perfect answer. and High. However. When there's no perfect solution the Cluster Resource Manager can prefer solutions which balance the higher weighted metrics better.Some things that are worth noting: Because replicas or instances use the service’s default load until they report their own.75 (the node with the most load is N3. and ensure that nodes don’t go over capacity. perfectly balanced solutions may not exist for all metrics. balance consumption across nodes. There are some things that we still need to explain: What determined whether a ratio of 1.75 was reasonable or not? 
How does the Cluster Resource Manager know if that’s good enough or if there is more work to do? When does balancing happen? What does it mean that Memory was weighted “High”? Metric weights Being able to track the same metrics across different services is important. Metrics can have four different weight levels: Zero. but its load does still contribute to things like capacity. A metric with a weight of Zero contributes nothing when considering whether things are balanced or not. in a cluster with many metrics and lots of services. and 28/16 = 1. Metric weights also allow the Cluster Resource Manager to balance specific services differently. Also. If one service thinks a metric is unimportant. but also that the cluster as a whole is allocated correctly. What would happen if the Cluster Resource Manager didn’t care about both global and local balance? Well. In the following example. and ServiceB doesn’t care about it at all. In the second case. In this case. the Cluster Resource Manager would probably swap services A and B to come up with an allocation where MetricB is better balanced than MetricA. In this example. we see that the Cluster Resource Manager places the services so that MetricA is better balanced (has a lower standard deviation) than MetricB. there are four different services. As a result. Let’s look at an example of some load reports and how different metric weights can result in different allocations in the cluster. This allows another service to get an even distribution that is important to it. we see that switching the relative weight of the metrics results in the Cluster Resource Manager preferring different solutions and creating different arrangements of services. The other weight is a global weight. which is the average from all the services that report that metric. we reverse the metric weights. The Cluster Resource Manager uses both these weights when calculating the scores of the solutions. all reporting different values for two different metrics A and B. In one case all the services define Metric A is the most important one (Weight = High) and MetricB as unimportant (Weight = Low). what’s the actual weight that ends up getting used? There are actually multiple weights that are tracked for every metric. it’s trivial to construct solutions that are globally balanced but which result in poor resource allocations for individual services. This is because it is important that a service is balanced according to its own priorities. Global metric weights So if ServiceA defines MetricA as most important. The first set are the weights that each service defined for the metric. let’s look at the default metrics that a stateful service is configured with and see what happens when if only global balance is considered: . In this example.metric imbalanced. the other two services (Triangle and Hexagon) have their partitions lose a replica. As a result. check out the article on balancing load Start from the beginning and get an Introduction to the Service Fabric Cluster Resource Manager Movement Cost is one way of signaling to the Cluster Resource Manager that certain services are more expensive to move than others. the Cluster Resource Manager has distributed the replicas based on both the global and per-service balance. Each service is balanced according to its own defined metric weights. In the bottom example. refer to this article To find out about how the Cluster Resource Manager manages and balances load in the cluster. 
All nodes have the same count of primaries and the same number total replicas. the cluster as a whole is indeed balanced. which causes no disruption (other than having to recover the down replica). if you look at the actual impact of this allocation it’s not so good: the loss of any node impacts a particular workload disproportionately. if the same first node fails the loss of primaries (and secondaries) is distributed across all partitions of all services. However. refer to this article . Next steps For more information about the other options available for configuring services. because it takes out all its primaries. For example. To learn more about movement cost. This ensures that the services are balanced within themselves according to their own needs as much as possible. When calculating the score of the solution it gives most of the weight to the global solution. Global balance is calculated based on the average of the metric weights configured for each of the services. check out the topic on the other Cluster Resource Manager configurations available Learn about configuring Services Defining Defragmentation Metrics is one way to consolidate load on nodes instead of spreading it out.In the top example where we only looked at global balance. and a (configurable) portion to individual services. The impact to each is the same. Conversely. if the first node fails the three primaries for the three different partitions of the Circle service would all be lost. To learn how to configure defragmentation. CreateServiceAsync(serviceDescription).Correlations.. affinityDescription. Configuring and using service affinity in Service Fabric 1/17/2017 • 4 min to read • Edit Online Affinity is a control that is provided mainly to help ease the transition of larger monolithic applications into the cloud and microservices world.ServiceManager. and has two different modes. You can think of affinity as “pointing” one service at another and saying “This service can only run where that service is running. However. Then there’s an “Oops. The other mode is AlignedAffinity. you define an affinity relationship between two different services. and making sure it is running smoothly.”. This type of transition is common.Add(affinityDescription). packaging it. That said it can also be used in certain cases as a legitimate optimization for improving the performance of services. 2. affinityDescription. You start by lifting the entire app into the environment. How to configure affinity To set up affinity. serviceDescription. As a result. Aligned Affinity is useful only with stateful services.” Sometimes we refer to affinity as a parent/child relationship (where you point the child at the parent). they're broken.ServiceName = new Uri("fabric:/otherApplication/parentService").Scheme = ServiceCorrelationScheme. In these cases we don’t want to lose our refactoring work. Some component X in the monolithic app had an undocumented dependency on component Y.. and don’t want to go back to the monolith. to Service Fabric. ServiceCorrelationDescription affinityDescription = new ServiceCorrelationDescription(). Configuring two stateful services to have aligned affinity ensures that the primaries of those . Since these services are now running on different nodes in the cluster. 3. These things communicate via (local named pipes | shared memory | files on disk) but they really need to be able to write to a shared resource for performance reasons right now. you could try turning on affinity. 
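Before moving on: one way to confirm that default loads and dynamic load reports are actually what the Cluster Resource Manager sees is to query the reported load for a specific partition. This is a hedged sketch; it assumes the GetPartitionLoadInformationAsync query and the report property names shown, and the partition ID is a placeholder you would look up first.

Code:

using System;
using System.Fabric;
using System.Fabric.Query;
using System.Threading.Tasks;

static async Task PrintPartitionLoadAsync(Guid partitionId)
{
    var fabricClient = new FabricClient();

    PartitionLoadInformation partitionLoad =
        await fabricClient.QueryManager.GetPartitionLoadInformationAsync(partitionId);

    // Loads reported by (or defaulted for) the primary replica.
    foreach (LoadMetricReport report in partitionLoad.PrimaryLoadMetricReports)
    {
        Console.WriteLine("Primary   {0}: {1}", report.Name, report.Value);
    }

    // Loads reported by (or defaulted for) the secondary replicas.
    foreach (LoadMetricReport report in partitionLoad.SecondaryLoadMetricReports)
    {
        Console.WriteLine("Secondary {0}: {1}", report.Name, report.Value);
    }
}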
Different affinity options Affinity is represented via one of several correlation schemes. Affinity ensures that the replicas or instances of one service are placed on the same nodes as the replicas or instances of another. Everything is fine. That hard dependency gets removed later. the overall application is not meeting expectations. Then you start breaking it down into different smaller services that all talk to each other. although doing so can have side effects.Affinity. and you just turned those components into separate services. What to do? Well. or one that just wasn’t designed with microservices in mind. In NonAlignedAffinity. the replicas or instances of the different services are placed on the same nodes. Let’s say you’re bringing a larger app. but it turns out that these two components are actually chatty/performance sensitive. The “Oops” usually falls into one of these categories: 1. until we can redesign the components to work naturally as services (or until we can solve the performance expectations some other way) we're going to need some sense of locality. The most common mode of affinity is what we call NonAlignedAffinity. When they moved them into separate services overall application performance tanked or latency increased. await fabricClient. stars Today the Cluster Resource Manager isn't able to model chains of affinity relationships. it can't be enforced due to the other constraints. you may also want to look into Application Groups. This means that the “affinity” rule only enforces that the child is where the parent is. It is also possible (though less common) to configure NonAlignedAffinity for stateful services. rather than a chain. If there's no natural parent service. not that the parent is located with the child. To move from a chain to a star. The services in an affinity relationship are fundamentally different entities that can fail and be moved independently. When this happens the Cluster Resource Manager thinks . the bottommost child would be parented to the first child’s parent instead. the violation is automatically corrected later. you may have to create one that serves as a placeholder. if it is possible to do so. the different replicas of the two stateful services would be collocated on the same nodes. capacity limitations where only some of the service objects in the affinity relationship can fit on a given node. There are also causes for why an affinity relationship could break. Depending on your requirements. let's say the parent suddenly fails over to another node. If you want to model this type of relationship. It also causes each pair of secondaries for those services to be placed on the same nodes. but no attempt would be made to align their primaries or secondaries. you may have to do this multiple times. Another thing to note about affinity relationships today is that they are directional.services are placed on the same nodes as each other. For NonAlignedAffinity. What this means is that a service that is a child in one affinity relationship can’t be a parent in another affinity relationship. Best effort desired state There are a few differences between affinity and monolithic architectures. you effectively have to model it as a star. For example. In these cases even though there's an affinity relationship in place. For example. Depending on the arrangement of your services. Many of them are because an affinity relationship is best effort. Chains vs. Next steps For more information about the other options available for configuring services. 
Partitioning support The final thing to notice about affinity is that affinity relationships aren’t supported where the parent is partitioned. check out the topic on the other Cluster Resource Manager configurations available Learn about configuring Services For uses such as limiting services to a small set of machines and trying to aggregate the load of a collection of services. This is something that we may support eventually. The affinity relationship can't be perfect or instantly enforced since these are different services with different lifecycles.everything is fine until it notices that the child is not located with a parent. but today it is not allowed. use Application Groups . it may be important for a given service to always run or never run in certain regions. Disallowing replica packing Most of the following controls could be configured via node properties and placement constraints. Multiple invalid domains may be specified via separate policies. such as multiple on-premises datacenters or across Azure regions If your environment spans multiple areas of geopolitical control (or some other case where you have legal or policy boundaries you care about There are actual performance/latency considerations due to communication in the cluster traveling large distances or transiting certain slower or less reliable networks.DomainName = "fd:/DCEast". Similarly it may be important to try to place the Primary in a certain region to minimize end-user latency. //regulations prohibit this workload here serviceDescription. . This policy ensures that a particular service never runs in a particular area.Add(invalidDomain). To make things simpler. The advanced placement policies are: 1. Placement policies for service fabric services 1/17/2017 • 5 min to read • Edit Online There are many different additional rules that you may need to configure in some rare scenarios. Specifying invalid domains The InvalidDomain placement policy allows you to specify that a particular Fault Domain is invalid for this workload. placement policies can be configured on a per-named service instance basis and updated dynamically. Code: ServicePlacementInvalidDomainPolicyDescription invalidDomain = new ServicePlacementInvalidDomainPolicyDescription(). the Service Fabric Cluster Resource Manager provides these additional placement policies. for example for geopolitical or corporate policy reasons. Required domains 3. Invalid domains 2. invalidDomain. Some examples of those scenarios are: If your Service Fabric cluster is spanned across a geographic distance.PlacementPolicies. but some are more complicated. Preferred domains 4. In these types of situations. Like with other placement constraints. the Primary is migrated to some other location. Powershell: New-ServiceFabricService -ApplicationName $applicationName -ServiceName $serviceName -ServiceTypeName $serviceTypeName –Stateful -MinReplicaSetSize 2 -TargetReplicaSetSize 3 -PartitionSchemeSingleton - PlacementPolicy @("RequiredDomain.fd:/DC01/RK03/BL2") Specifying a preferred domain for the primary replicas The Preferred Primary Domain is an interesting control. If the domain or the Primary replica fails or is shut down for some reason. the Cluster Resource Manager moves it back to the preferred domain as soon as possible. requiredDomain.PlacementPolicies.DomainName = "fd:/DC01/RK03/BL2". If this new location isn't in the preferred domain. serviceDescription. . 
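To make the chain-to-star guidance above concrete, here is a hedged sketch in which two child services both point at a single placeholder parent instead of at each other. The service names are examples only, and childOneDescription and childTwoDescription are assumed to be service descriptions you have already filled in; the placeholder parent is just a small service created to anchor the affinity relationships.

Code:

// First child points at the placeholder parent.
ServiceCorrelationDescription childOneAffinity = new ServiceCorrelationDescription();
childOneAffinity.ServiceName = new Uri("fabric:/myApplication/placeholderParent");
childOneAffinity.Scheme = ServiceCorrelationScheme.Affinity;
childOneDescription.Correlations.Add(childOneAffinity);

// Second child points at the same parent, forming a star rather than a chain.
ServiceCorrelationDescription childTwoAffinity = new ServiceCorrelationDescription();
childTwoAffinity.ServiceName = new Uri("fabric:/myApplication/placeholderParent");
childTwoAffinity.Scheme = ServiceCorrelationScheme.Affinity;
childTwoDescription.Correlations.Add(childTwoAffinity);

await fabricClient.ServiceManager.CreateServiceAsync(childOneDescription);
await fabricClient.ServiceManager.CreateServiceAsync(childTwoDescription);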
This policy is most useful in clusters that are spanned across Azure regions or multiple datacenters but would prefer that the Primary replicas be placed in a certain location.Add(requiredDomain). since it allows selection of the fault domain in which the Primary should be placed if it is possible to do so. especially for reads. Code: ServicePlacementRequiredDomainPolicyDescription requiredDomain = new ServicePlacementRequiredDomainPolicyDescription(). Keeping Primaries close to their users helps provide lower latency. Naturally this setting only makes sense for stateful services.Powershell: New-ServiceFabricService -ApplicationName $applicationName -ServiceName $serviceName -ServiceTypeName $serviceTypeName –Stateful -MinReplicaSetSize 2 -TargetReplicaSetSize 3 -PartitionSchemeSingleton - PlacementPolicy @("InvalidDomain. Multiple required domains can be specified via separate policies.fd:/DCEast”) Specifying required domains The required domain placement policy requires that all the stateful replicas or stateless service instances for the service be present in the specified domain. The Primary ends up in this domain when everything is healthy. and fd:/2). If the Cluster Resource Manager builds replacements for those replicas. it would have to choose nodes in fd:/0. fd:/1. and usually these situations are transient since the nodes come back. ServicePlacementPreferPrimaryDomainPolicyDescription primaryDomain = new ServicePlacementPreferPrimaryDomainPolicyDescription(). Now. For more information on constraints and constraint priorities generally. let's say due to capacity issues none of the other nodes in those domains were valid. but it can happen. For example. primaryDomain. It also increases the chance that the whole replica set could go down or be lost (if FD 0 were to be permanently lost). This is rare. doing that creates a situation where the Fault Domain constraint is being violated. In this case.PlacementPolicies. If the nodes do stay down and the Cluster Resource Manager needs to build . serviceDescription.DomainName = "fd:/EastUS/". but there are cases where replicas for a given partition may end up temporarily packed into a single domain. If you've ever seen a health warning like The Load Balancer has detected a Constraint Violation for this Replica:fabric:/<some service name> Secondary Partition <some partition ID> is violating the Constraint: FaultDomain you've hit this condition or something like it. However. Powershell: New-ServiceFabricService -ApplicationName $applicationName -ServiceName $serviceName -ServiceTypeName $serviceTypeName –Stateful -MinReplicaSetSize 2 -TargetReplicaSetSize 3 -PartitionSchemeSingleton - PlacementPolicy @("PreferredPrimaryDomain. check out this topic. and your service has three replicas. normally the Cluster Resource Manager would prefer other nodes in those same fault domains.Add(invalidDomain). Let's say that the nodes that were being used for those replicas in fd:/1 and fd:/2 went down.fd:/EastUS") Requiring replicas to be distributed among all domains and disallowing packing Replicas are normally distributed across the domains the cluster is healthy. let's say that the cluster has nine nodes in three fault domains (fd:/0. even if they are packed into fewer domains. It doesn't make any sense to try to force a given workload to run in a single rack. you can specify the "RequireDomainDistribution" policy on the service. invalid. go Learn about configuring Services . Since most production workloads run with more than three replicas. 
Powershell: New-ServiceFabricService -ApplicationName $applicationName -ServiceName $serviceName -ServiceTypeName $serviceTypeName –Stateful -MinReplicaSetSize 2 -TargetReplicaSetSize 3 -PartitionSchemeSingleton - PlacementPolicy @("RequiredDomainDistribution") Now. These workloads are betting against total simultaneous permanent domain failures and can usually recover local state. Different hardware configurations should be spread across domains and those handled via normal placement constraints and node properties. and preferred domain configurations should be avoided unless you’re actually running a cluster that spans geographic distances. the default is to not require domain distribution.Add(distributeDomain). usually there are other nodes available in the ideal fault domains. Next steps For more information about the other options available for configuring services. This lets normal balancing and failover handle these cases. The required. would it be possible to use these configurations for services in a cluster that was not geographically spanned? Sure you could! But there’s not a great reason too. or to prefer some segment of your local cluster over another.PlacementPolicies. Other workloads would rather take the downtime earlier than risk correctness or loss of data. and many valid nodes per fault domain. Some workloads would rather always have the target number of replicas. serviceDescription. more than three fault domains.replacements. Code: ServicePlacementRequireDomainDistributionPolicyDescription distributeDomain = new ServicePlacementRequireDomainDistributionPolicyDescription(). If you want to disable such packing for a given workload. When this policy is set. even if that means that temporarily a domain may have multiple replicas packed into it. the Cluster Resource Manager ensures no two replicas from the same partition are ever allowed in the same fault or upgrade domain. the health report is for one of the system service’s partitions. Here’s an example of one such health report. For example. For example. Health integration The Cluster Resource Manager constantly tracks the rules you have defined for your services and the capacities available on the nodes and in the cluster. We’ll talk about both of these integration points below. The Cluster Resource Manager sends out health reports when it cannot put the cluster into the desired configuration. . or conflicting rules about where a service should be placed. Another example of the Resource Manager's health warnings is violations of placement constraints. if a node is over capacity and the Cluster Resource Manager will try to fix the situation by moving services. if you have defined a placement constraint (such as “NodeColor == Blue” ) and the Resource Manager detects a violation of that constraint. This is true for custom constraints and the default constraints (like fault and upgrade domain constraints). Cluster resource manager integration with Service Fabric cluster management 2/10/2017 • 10 min to read • Edit Online The Service Fabric Cluster Resource Manager isn’t the main component of Service Fabric that handles management operations (like application upgrades) but it is involved. During upgrades the Cluster Resource Manager alters its behavior slightly. The first way that the Cluster Resource Manager helps with management is by tracking the desired state of the cluster and the services inside it. If it cannot satisfy those rules or if there is insufficient capacity. it will emit a health warning. In this case. 
An example would be if there is insufficient capacity. If it can't correct the situation it emits a health warning indicating which node is over capacity. health warnings and errors are emitted. and for which metrics. The health message indicates the replicas of that partition are temporarily packed into too few Upgrade Domains. Another piece of integration has to do with how upgrades work. All the replicas themselves are healthy (this is Service Fabric’s first priority) 2. In this case. Some transient condition has made it impossible to place this service instance or replica correctly 2. or the Resource Manager is trying to find a place to place some services. Which node contains the replica causing the violation (The node with ID: 3d1a4a68b2592f55125328cd0f8ed477) 4. This could be for many reasons. but usually it is due to one of the two following conditions: 1. This could be because the nodes in the other Upgrade Domains were down. Let’s say you want to create a service. ReplicaHealthStates : ReplicaId : 130766528804733380 AggregatedHealthState : Ok ReplicaId : 130766528804577821 AggregatedHealthState : Ok ReplicaId : 130766528854889931 AggregatedHealthState : Ok ReplicaId : 130766528804577822 AggregatedHealthState : Ok ReplicaId : 130837073190680024 AggregatedHealthState : Ok HealthEvents : SourceId : System. we’d want to see if we can figure out why the Resource Manager had to pack the replicas into the Upgrade Domain.PLB Property : ReplicaConstraintViolation_UpgradeDomain HealthState : Warning SequenceNumber : 130837100116930204 SentAt : 8/10/2015 7:53:31 PM ReceivedAt : 8/10/2015 7:53:33 PM TTL : 00:01:05 Description : The Load Balancer has detected a Constraint Violation for this Replica: fabric:/System/FailoverManagerService Secondary Partition 00000000-0000-0000-0000-000000000001 is violating the Constraint: UpgradeDomain Details: Node -- 3d1a4a68b2592f55125328cd0f8ed477 Policy -. That the Upgrade Domain distribution constraint is currently being violated (meaning that a particular Upgrade Domain has more of the replicas for this partition than it should) 3. When the report happened (8/10/2015 7:13:02 PM) Information like this powers alerts that fire in production to let you know something has gone wrong. The service’s requirements are misconfigured in a way that causes its requirements to be unsatisfiable. HealthState='Warning'. for example. Property='ReplicaConstraintViolation_UpgradeDomain'. PS C:\Users\User > Get-WindowsFabricPartitionHealth -PartitionId '00000000-0000-0000-0000-000000000001' PartitionId : 00000000-0000-0000-0000-000000000001 AggregatedHealthState : Warning UnhealthyEvaluations : Unhealthy event: SourceId='System. . ConsiderWarningAsError=false. LastError = 1/1/0001 12:00:00 AM Here's what this health message is telling us is: 1.Packing RemoveWhenExpired : True IsExpired : False Transitions : Ok->Warning = 8/10/2015 7:13:02 PM. but there aren't any solutions that work.PLB'. These levels don't really mean that a given constraint will be violated. if there is a bug in the placement constraint causing too many nodes to be eliminated this is where you would notice. it means that we eliminated some nodes because they didn’t match the service’s placement constraints. During it. The different constraint priorities are why some constraint violation warnings show up more often than others: there are certain constraints that we're willing to relax (violate) temporarily. 
ReplicaExclusionStatic and ReplicaExclusionDynamic: This constraint indicates that two stateful replicas or stateless instances from the same partition would have to be placed on the same node (which isn’t allowed). but they boil down to “hard” (0). Further.” Well it turns out we can do that! Constraints can be configured with a few different levels of enforcement. This way when services aren’t able to be placed. NodeCapacity: This constraint means that the Cluster Resource Manager couldn’t place the replicas on the indicated nodes because doing so would cause the node to go over capacity. “optimization” (2). “soft” (1).In each of these conditions. Most of the constraints we’ve defined as hard by default. I’m willing to violate other constraints. PlacementConstraint: If you see this message. However. The further constraints can usually tell us how we’re ending up with too few nodes. there is a replica already on the node. More information on affinity is in this article FaultDomain and UpgradeDomain: This constraint eliminates nodes if placing the replica on the indicated nodes would cause packing in a particular fault or upgrade domain. ReplicaExclusionStatic and ReplicaExclusionDynamic are almost the same rule. You could however see health messages related to these constraints if they are configured as hard constraints or in the rare cases that they do cause nodes to be eliminated. and “off” (-1). just that there's an order in which they are preferentially enforced. This allows the Cluster Resource Manager to make the right tradeoffs if it is impossible to satisfy all constraints. Constraint types Let’s talk about each of the different constraints you can see in these health reports. Affinity: This constraint indicates that we couldn’t place the replica on the affected nodes since it would cause a violation of the affinity constraint. We call this process the “Constraint Elimination Sequence”. you may have been thinking “Hey – I think that placement constraints are the most important thing in my system. even things like affinity and capacity. you’ll see a health report from the Cluster Resource Manager providing information to help you determine what is going on and why the service can’t be placed. and almost all are either hard or soft. most people don’t normally think about capacity as something they are willing to relax. Does it matter a lot? No. the system walks through the configured constraints affecting the service and records what they eliminate. the preferred location constraint is usually present during upgrades. Constraint priorities With all of these constraints. Several examples discussing this constraint are presented in the topic on fault and upgrade domain constraints and resulting behavior PreferredLocation: You shouldn’t normally see this constraint causing nodes to get removed from the solution since it is set at the optimization level by default. In advanced situations constraint priorities can be changed. For example. if it ensures that the placement constraints aren’t ever violated. This is normal if you have a placement constraints defined. you can see which nodes were eliminated and why. For example. Is this confusing? Yes. Most of the time these constraints won't eliminate nodes since they are at the soft or optimize level by default. We trace out the currently configured placement constraints as a part of this message. This is different from the ReplicaExclusionStatic constraint that indicates not a proposed conflict but an actual one. In this case. 
The ReplicaExclusionDynamic constraint says “we couldn’t place this replica here because the only proposed solution had already placed a replica here”. If you are seeing a constraint elimination sequence containing either the ReplicaExclusionStatic or ReplicaExclusionDynamic constraint the Cluster Resource Manager thinks that there aren’t enough nodes. say you wanted to ensure that affinity . During upgrade it is used to move replicas back to where they were when the upgrade started. "parameters": [ { "name": "PlacementConstraintPriority". "value": "0" }.json for Azure hosted clusters: "fabricSettings": [ { "name": "PlacementAndLoadBalancing". Upgrade domains remain a soft constraint. { "name": "FaultDomainConstraintPriority". There have been times where we needed either to get strict about Fault and Upgrade domains to prevent something bad from happening. To achieve this. check out the article on cluster configuration. { "name": "CapacityConstraintPriority".xml <Section Name="PlacementAndLoadBalancing"> <Parameter Name="PlacementConstraintPriority" Value="0" /> <Parameter Name="CapacityConstraintPriority" Value="0" /> <Parameter Name="AffinityConstraintPriority" Value="0" /> <Parameter Name="FaultDomainConstraintPriority" Value="0" /> <Parameter Name="UpgradeDomainConstraintPriority" Value="1" /> <Parameter Name="PreferredLocationConstraintPriority" Value="2" /> </Section> via ClusterConfig. Most of the time everything sits at their default priorities. { "name": "UpgradeDomainConstraintPriority". The default priority values for the different constraints are specified in config: ClusterManifest. The Cluster Resource Manager may need to pack a couple replicas into an upgrade domain in order to deal with an . "value": "0" }. There have also been cases where we needed to ignore them entirely (though briefly!).would be violated to solve node capacity issues. "value": "0" }. "value": "1" }. For more information on how they are used. Generally the flexibility of the constraint priority infrastructure has worked very well. { "name": "PreferredLocationConstraintPriority". "value": "2" } ] } ] Fault domain and upgrade domain constraints The Cluster Resource Manager models the desire to keep services spread out among fault and upgrade domains as a constraint inside the Resource Manager’s engine. but it isn't needed often. { "name": "AffinityConstraintPriority".json for Standalone deployments or Template. "value": "0" }. you could set the priority of the affinity constraint to “soft” (1) and leave the capacity constraint set to “hard” (0). The key thing is that the Cluster Resource Manager is watching out for your constraints. While the buffered capacity is respected during normal operation (leaving some overhead). only rebalancing is disabled. like moving services into nodes that were emptied for the upgrade. then balancing is paused for that application instance. Next steps Start from the beginning and get an Introduction to the Service Fabric Cluster Resource Manager . or at least no worse.upgrade. As each Upgrade Domain completes. If the cluster was arranged well before the upgrade it will be arranged well after it. then the entire cluster is not balanced during the upgrade. the Cluster Resource Manager fills up to the total capacity (taking up the buffer) during upgrades. Preventing balancing prevents unnecessary reactions to the upgrade itself. and hence it is the only constraint set to “Optimization”. 
Upgrades
The Cluster Resource Manager also helps during application and cluster upgrades, during which it has two jobs:

ensure that the rules and performance of the cluster are not compromised
try to help the upgrade go smoothly

Keep enforcing the rules
The main thing to be aware of is that the rules, the strict constraints like placement constraints, are still enforced during upgrades. Placement constraints ensure that your workloads only run where they are allowed to, even during upgrades, and the Cluster Resource Manager immediately reports when it detects violations. If your environment is highly constrained, upgrades may take a long time. This is because there may be few options for where a service can go if it (or the node it sits on) needs to be brought down for an update.

Relaxed rules
Generally, though, you want the upgrade to complete even if the cluster is constrained or full overall. During upgrades, being able to manage the capacity of the cluster is even more important than usual, because there is typically between 5 and 20 percent of the capacity down at a time as the upgrade rolls through the cluster. That work usually has to go somewhere. This is where the notion of buffered capacities really comes into play. While the buffered capacity is respected during normal operation (leaving some overhead), the Cluster Resource Manager fills up to the total capacity (taking up the buffer) during upgrades.

Reduced churn
Another thing that happens during upgrades is that the Cluster Resource Manager turns off balancing for the entity being upgraded. This means that if you have two different application instances and upgrade one of them, then balancing is paused for that application instance, but not the other one. If the upgrade in question is a cluster upgrade, then the entire cluster is not balanced during the upgrade. Constraint checks, ensuring the rules are enforced, stay active; only rebalancing is disabled. Preventing balancing prevents unnecessary reactions to the upgrade itself, like moving services onto nodes that were emptied for the upgrade.

Smart replacements
When an upgrade starts, the Resource Manager takes a snapshot of the current arrangement of the cluster. As each Upgrade Domain completes, it attempts to return things to the original arrangement. This way there are at most two transitions for a service during the upgrade: the move out of the affected node and the move back in. Returning the cluster to how it was before the upgrade also ensures the upgrade doesn't impact the layout of the cluster. If the cluster was arranged well before the upgrade, it will be arranged well after it, or at least no worse.

The preferred location constraint
The PreferredLocation constraint is a little different, and hence it is the only constraint set to "Optimization". We use this constraint while upgrades are in flight to prefer putting services back where we found them before the upgrade. There are all sorts of reasons why this may not work in practice, but it's a nice optimization.
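The upgrades the Cluster Resource Manager assists with are ordinary rolling upgrades. As a loose illustration only (this is not the article's own example), a monitored application upgrade can be started through the FabricClient APIs; the application name and target version here are assumptions:

using System;
using System.Fabric;
using System.Fabric.Description;
using System.Threading.Tasks;

class UpgradeExample
{
    // Starts a monitored rolling upgrade. While it rolls through the upgrade domains,
    // the Cluster Resource Manager handles placement, buffered capacity, and the
    // replica movements described above.
    public static Task StartMonitoredUpgradeAsync(FabricClient client)
    {
        var upgrade = new ApplicationUpgradeDescription
        {
            ApplicationName = new Uri("fabric:/MyApp"),   // assumed application
            TargetApplicationTypeVersion = "2.0.0",       // assumed target version
            UpgradePolicyDescription = new MonitoredRollingApplicationUpgradePolicyDescription
            {
                UpgradeMode = RollingUpgradeMode.Monitored
            }
        };

        return client.ApplicationManager.UpgradeApplicationAsync(upgrade);
    }
}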
Next steps
Start from the beginning and get an Introduction to the Service Fabric Cluster Resource Manager.

Defragmentation of metrics and load in Service Fabric (1/17/2017)
The Service Fabric Cluster Resource Manager is mainly concerned with balancing in terms of distributing load, making sure that the nodes in the cluster are equally utilized. Having workloads distributed is the safest layout in terms of surviving failures, since it ensures that a failure doesn't take out a large percentage of a given workload.

The Service Fabric Cluster Resource Manager does support a different strategy as well, which is defragmentation. Defragmentation generally means that instead of trying to spread the utilization of a metric across the cluster, we actually try to consolidate it. Consolidation is a fortunate inversion of our normal strategy: instead of minimizing the average standard deviation of metric load, the Cluster Resource Manager aims for increases in deviation. You can configure defragmentation metrics to have the Cluster Resource Manager proactively try to condense the load of the services onto fewer nodes.

But why would you want this strategy? Services should usually be small, and hence it's not hard to find room for them in the cluster. However, some workloads create services that are exceptionally large and consume most of a node. In these cases it's possible that 75% to 95% of a node's resources end up dedicated to a single service object. Large workloads aren't a problem by themselves. The Cluster Resource Manager determines at service creation time that it needs to reorganize the cluster to make room for the large workload. However, if you've spread the load out evenly among the nodes in the cluster, then you've eaten up some of the resources that the nodes have to offer. If there are many services and state to move around, then it could take a long time for the large workload to be placed in the cluster. This is likely if other workloads in the cluster are large and hence take longer to move around. In the meantime, that workload has to wait to be scheduled in the cluster.

The Service Fabric team measured creation times in simulations of this scenario. We found that if services were large enough and the cluster was highly utilized, the creation of those large services would be slow. To handle this scenario, we introduced defragmentation as a balancing strategy. We found that for large workloads, especially ones where creation time was important, defragmentation really helped those new workloads get scheduled in the cluster.

The following diagram gives a visual representation of two different clusters, one that is defragmented and one that is not. In the balanced case, consider the number of movements that would be necessary to place one of the largest service objects. Compare that to the defragmented cluster, where the large workload could be immediately placed on nodes four or five. Defragmentation helps ensure that there is (almost) always room for even large services, allowing such services to be created quickly when necessary. The tradeoff is that defragmentation ensures that some resources in the cluster are unutilized while they wait for workloads to be scheduled, and it can increase the impactfulness of failures (since more services are running on the node that fails). Most people don't need defragmentation. However, if you have large services and need them created quickly (and are willing to accept the other tradeoffs), then the defragmentation strategy is for you.
Defragmentation pros and cons
So what are those other conceptual tradeoffs? We recommend thorough measurement of your workloads before turning on defragmentation metrics. Here's a quick table of things to think about:

DEFRAGMENTATION PROS
- Allows faster creation of large services
- Enables lower data movement during creation
- Allows rich description of requirements and reclamation of space

DEFRAGMENTATION CONS
- Concentrates load onto fewer nodes, increasing contention
- Failures can impact more services and cause more churn
- More complex overall Resource Management configuration

You can mix defragmented and normal metrics in the same cluster. The Cluster Resource Manager tries to consolidate the defragmentation metrics as much as possible while spreading out the others. If there are no services that share those metrics, the results can be good. The exact results will depend on the number of balancing metrics compared to the number of defragmentation metrics, how much they overlap, their weights, current loads, and other factors. Experimentation is required to determine the exact configuration necessary.

Configuring defragmentation metrics
Configuring defragmentation metrics is a global decision in the cluster, and individual metrics can be selected for defragmentation:

ClusterManifest.xml:

<Section Name="DefragmentationMetrics">
    <Parameter Name="Disk" Value="true" />
    <Parameter Name="CPU" Value="false" />
</Section>

via ClusterConfig.json for Standalone deployments or Template.json for Azure hosted clusters:

"fabricSettings": [
  {
    "name": "DefragmentationMetrics",
    "parameters": [
      { "name": "Disk", "value": "true" },
      { "name": "CPU", "value": "false" }
    ]
  }
]
Next steps
The Cluster Resource Manager has many options for describing the cluster. To find out more about them, check out this article on describing a Service Fabric cluster.
Metrics are how the Service Fabric Cluster Resource Manager manages consumption and capacity in the cluster. To learn more about them and how to configure them, check out this article.

Balancing your Service Fabric cluster (1/24/2017)
The Service Fabric Cluster Resource Manager supports dynamic load changes, reacting to additions or removals of nodes or services, correcting constraint violations, and rebalancing the cluster. But how often does it do these things, and what triggers it?

The first set of controls around balancing is a set of timers. These timers govern how often the Cluster Resource Manager examines the state of the cluster for things that need to be addressed. There are three different categories of work, each with its own corresponding timer. They are:

1. Placement: this stage deals with placing any stateful replicas or stateless instances that are missing. This covers both new services and handling stateful replicas or stateless instances that have failed. Deleting and dropping replicas or instances are handled here.
2. Constraint Checks: this stage checks for and corrects violations of the different placement constraints (rules) within the system. Examples of rules are things like ensuring that nodes are not over capacity and that a service's placement constraints are met.
3. Balancing: this stage checks to see if proactive rebalancing is necessary based on the configured desired level of balance for different metrics. If so, it attempts to find an arrangement in the cluster that is more balanced.

Configuring Cluster Resource Manager steps and timers
Each of these different types of corrections the Cluster Resource Manager can make is controlled by a different timer that governs its frequency. When each timer fires, the task is scheduled. By default the Resource Manager:

scans its state and applies updates (like recording that a node is down) every 1/10th of a second
sets the placement and constraint check flags every second
sets the balancing flag every five seconds

We can see this reflected in the following configuration information:

ClusterManifest.xml:

<Section Name="PlacementAndLoadBalancing">
    <Parameter Name="PLBRefreshGap" Value="0.1" />
    <Parameter Name="MinPlacementInterval" Value="1.0" />
    <Parameter Name="MinConstraintCheckInterval" Value="1.0" />
    <Parameter Name="MinLoadBalancingInterval" Value="5.0" />
</Section>

via ClusterConfig.json for Standalone deployments or Template.json for Azure hosted clusters:

"fabricSettings": [
  {
    "name": "PlacementAndLoadBalancing",
    "parameters": [
      { "name": "PLBRefreshGap", "value": "0.10" },
      { "name": "MinPlacementInterval", "value": "1.0" },
      { "name": "MinConstraintCheckInterval", "value": "1.0" },
      { "name": "MinLoadBalancingInterval", "value": "5.0" }
    ]
  }
]

Today the Cluster Resource Manager only performs one of these actions at a time, sequentially (that's why we refer to these timers as "minimum intervals"). For example, the Cluster Resource Manager takes care of pending requests to create services before balancing the cluster. As you can see from the default time intervals specified, the Cluster Resource Manager scans and checks for anything it needs to do frequently, so the set of changes made at the end of each step is usually small. Making small changes frequently makes the Cluster Resource Manager responsive to things that happen in the cluster. The default timers provide some batching, since many of the same types of events tend to occur simultaneously. By default, the Cluster Resource Manager is not scanning through hours of changes in the cluster and trying to address all changes at once; doing so would lead to bursts of churn.

The MinLoadBalancingInterval timer is just for how often the Cluster Resource Manager should check; it doesn't mean that anything happens. The Cluster Resource Manager also needs some additional information to determine if the cluster is imbalanced. For that we have two other pieces of configuration: Balancing Thresholds and Activity Thresholds.
Balancing thresholds
A Balancing Threshold is the main control for triggering proactive rebalancing. The Balancing Threshold defines how imbalanced the cluster needs to be for a specific metric in order for the Cluster Resource Manager to consider it imbalanced and trigger balancing. Balancing Thresholds are defined on a per-metric basis as a part of the cluster definition. For more information on metrics, check out this article.

ClusterManifest.xml:

<Section Name="MetricBalancingThresholds">
    <Parameter Name="MetricName1" Value="2"/>
    <Parameter Name="MetricName2" Value="3.5"/>
</Section>

via ClusterConfig.json for Standalone deployments or Template.json for Azure hosted clusters:

"fabricSettings": [
  {
    "name": "MetricBalancingThresholds",
    "parameters": [
      { "name": "MetricName1", "value": "2" },
      { "name": "MetricName2", "value": "3.5" }
    ]
  }
]

The Balancing Threshold for a metric is a ratio. If the amount of load on the most loaded node divided by the amount of load on the least loaded node exceeds this number, then the cluster is considered imbalanced, and a rebalancing run is scheduled the next time the balancing timer fires.

In this example, each service is consuming one unit of some metric. Let's say that the balancing threshold for this metric is three. In the top example, the cluster is balanced: the maximum load on a node is five and the minimum is two. Since the ratio in the cluster is 5/2 = 2.5 and that is less than the specified balancing threshold of three, no balancing is triggered when the Cluster Resource Manager checks. In the bottom example, the maximum load on a node is ten, while the minimum is two, resulting in a ratio of five. Five is greater than the designated balancing threshold of three for that metric. As a result, balancing is triggered the next time the Cluster Resource Manager checks. In a situation like this, some load will almost certainly be distributed to Node3. Since the Service Fabric Cluster Resource Manager doesn't use a greedy approach, some load could also be distributed to Node2. Doing so results in minimization of the overall differences between nodes, which is one of the goals of the Cluster Resource Manager.

Getting below the balancing threshold is not an explicit goal. Balancing Thresholds are just a trigger that tells the Cluster Resource Manager that it should look into the cluster to determine what improvements it can make, if any. Indeed, just because a balancing search is kicked off doesn't mean anything moves; sometimes the cluster is imbalanced but the situation can't be improved.

Activity thresholds
Sometimes, although nodes are relatively imbalanced, the total amount of load in the cluster is low. The lack of load could be a transient dip, or because the cluster is new and just getting bootstrapped. In either case, you may not want to spend time balancing the cluster because there's little to be gained. If the cluster underwent balancing, you'd spend network and compute resources to move things around without making any absolute difference. To avoid this, there's another control known as Activity Thresholds. Activity Thresholds allow you to specify some absolute lower bound for activity. If no node is over this threshold, balancing isn't triggered even if the Balancing Threshold is met.

Just like Balancing Thresholds, Activity Thresholds are defined per-metric via the cluster definition:

ClusterManifest.xml:

<Section Name="MetricActivityThresholds">
    <Parameter Name="Memory" Value="1536"/>
</Section>

via ClusterConfig.json for Standalone deployments or Template.json for Azure hosted clusters:

"fabricSettings": [
  {
    "name": "MetricActivityThresholds",
    "parameters": [
      { "name": "Memory", "value": "1536" }
    ]
  }
]

As an example, let's retain our Balancing Threshold of three for this metric, but now let's also have an Activity Threshold of 1536. In the top case, while the cluster is imbalanced per the Balancing Threshold, no node meets the Activity Threshold, so nothing happens. In the bottom example, Node1 is way over the Activity Threshold. Since both the Balancing Threshold and the Activity Threshold for the metric are exceeded, balancing is scheduled. Balancing and activity thresholds are both tied to a specific metric; balancing is triggered only if both the Balancing Threshold and the Activity Threshold are exceeded for the same metric.
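The per-node load that these thresholds compare comes from the load that services report, or from their default load values. As a hedged sketch, and not code from this article, a stateful Reliable Service could report its current consumption of an assumed "Memory" metric like this:

using System;
using System.Collections.Generic;
using System.Fabric;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Services.Runtime;

internal sealed class MeteredService : StatefulService
{
    public MeteredService(StatefulServiceContext context)
        : base(context)
    { }

    protected override async Task RunAsync(CancellationToken cancellationToken)
    {
        while (!cancellationToken.IsCancellationRequested)
        {
            // Assumed measurement of this replica's memory use, in the same units used
            // for the "Memory" metric in the cluster definition above.
            int memoryInMb = MeasureMemoryUse();

            // Reports current load; the Cluster Resource Manager aggregates these values
            // per node and compares them against the balancing and activity thresholds.
            this.Partition.ReportLoad(new List<LoadMetric>
            {
                new LoadMetric("Memory", memoryInMb)
            });

            await Task.Delay(TimeSpan.FromMinutes(1), cancellationToken);
        }
    }

    private int MeasureMemoryUse() => 1024; // placeholder for a real measurement
}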
Balancing services together
Something that's interesting to note is that whether the cluster is imbalanced or not is a cluster-wide decision. However, the way we go about fixing it is moving individual service replicas and instances around. This makes sense, right? If memory is stacked up on one node, multiple replicas or instances could be contributing to it. Fixing the imbalance could require moving any of the stateful replicas or stateless instances that use the imbalanced metric.

Occasionally though, a service that wasn't imbalanced gets moved. How could it happen that a service gets moved around even if all of that service's metrics were balanced, even perfectly so, at the time of the other imbalance? Let's see!

Take for example four services: Service1, Service2, Service3, and Service4. Service1 reports against metrics Metric1 and Metric2, Service2 against Metric2 and Metric3, Service3 against Metric3 and Metric4, and Service4 against some metric Metric99. Surely you can see where we're going here. We have a chain! We don't really have four independent services; we have a bunch of services that are related (Service1, Service2, and Service3) and one that is off on its own.

So it is possible that an imbalance in Metric1 can cause replicas or instances belonging to Service3 (which doesn't report Metric1) to move around. Usually these movements are limited, but they can be larger depending on exactly how imbalanced Metric1 got and what changes were necessary in the cluster to correct it. We can also say with certainty that an imbalance in Metrics 1, 2, or 3 can't cause movements in Service4. There would be no point, since moving the replicas or instances belonging to Service4 around can do absolutely nothing to impact the balance of Metrics 1, 2, or 3.

The Cluster Resource Manager automatically figures out what services are related, since services may have been added, removed, or had their metric configuration changed. For example, between two runs of balancing, Service2 may have been reconfigured to remove Metric2. This change breaks the chain between Service1 and Service2. Now instead of two groups of related services, you have three.
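These relationships come from the metric names the services declare. The sketch below is an illustration, not code from this article: it creates two hypothetical services that both report a metric named "Metric2", which is what links them into the same balancing group. The application, service, and load values are assumptions.

using System;
using System.Fabric;
using System.Fabric.Description;
using System.Threading.Tasks;

class MetricChainExample
{
    static ServiceLoadMetricDescription Metric(string name) =>
        new ServiceLoadMetricDescription
        {
            Name = name,
            PrimaryDefaultLoad = 10,   // assumed default loads
            SecondaryDefaultLoad = 5,
            Weight = ServiceLoadMetricWeight.High
        };

    static StatefulServiceDescription Describe(string serviceName, params string[] metricNames)
    {
        var description = new StatefulServiceDescription
        {
            ApplicationName = new Uri("fabric:/MyApp"),             // assumed
            ServiceName = new Uri("fabric:/MyApp/" + serviceName),  // assumed
            ServiceTypeName = serviceName + "Type",                 // assumed
            HasPersistedState = true,
            MinReplicaSetSize = 3,
            TargetReplicaSetSize = 3,
            PartitionSchemeDescription = new SingletonPartitionSchemeDescription()
        };
        foreach (var name in metricNames)
        {
            description.Metrics.Add(Metric(name));
        }
        return description;
    }

    public static async Task CreateChainedServicesAsync(FabricClient client)
    {
        // Service1 and Service2 share "Metric2", so they end up in the same balancing group.
        await client.ServiceManagementClient.CreateServiceAsync(Describe("Service1", "Metric1", "Metric2"));
        await client.ServiceManagementClient.CreateServiceAsync(Describe("Service2", "Metric2", "Metric3"));
    }
}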
Next steps
Metrics are how the Service Fabric Cluster Resource Manager manages consumption and capacity in the cluster. To learn more about them and how to configure them, check out this article.
Movement Cost is one way of signaling to the Cluster Resource Manager that certain services are more expensive to move than others. For more about movement cost, refer to this article.
The Cluster Resource Manager has several throttles that you can configure to slow down churn in the cluster. They're not normally necessary, but if you need them you can learn about them here.

Throttling the behavior of the Service Fabric Cluster Resource Manager (1/17/2017)
Even if you've configured the Cluster Resource Manager correctly, the cluster can get disrupted. For example, there could be simultaneous node or fault domain failures; what would happen if that occurred during an upgrade? The Cluster Resource Manager tries to fix everything: the nodes come back, the network partitions heal, corrected bits get deployed. That can introduce churn in the cluster. To help with these sorts of situations, the Service Fabric Cluster Resource Manager includes several throttles. You may determine you need to have some throttles in place. Throttles help provide a backstop so that the cluster can use resources to stabilize itself, even if it means the cluster takes longer to stabilize in mainline situations.

Configuring the throttles
The throttles that are included by default are:

GlobalMovementThrottleThreshold: this setting controls the total number of movements in the cluster over some time (defined as the GlobalMovementThrottleCountingInterval, value in seconds).
MovementPerPartitionThrottleThreshold: this setting controls the total number of movements for any service partition over some time (the MovementPerPartitionThrottleCountingInterval, value in seconds).

ClusterManifest.xml:

<Section Name="PlacementAndLoadBalancing">
    <Parameter Name="GlobalMovementThrottleThreshold" Value="1000" />
    <Parameter Name="GlobalMovementThrottleCountingInterval" Value="600" />
    <Parameter Name="MovementPerPartitionThrottleThreshold" Value="50" />
    <Parameter Name="MovementPerPartitionThrottleCountingInterval" Value="600" />
</Section>

via ClusterConfig.json for Standalone deployments or Template.json for Azure hosted clusters:

"fabricSettings": [
  {
    "name": "PlacementAndLoadBalancing",
    "parameters": [
      { "name": "GlobalMovementThrottleThreshold", "value": "1000" },
      { "name": "GlobalMovementThrottleCountingInterval", "value": "600" },
      { "name": "MovementPerPartitionThrottleThreshold", "value": "50" },
      { "name": "MovementPerPartitionThrottleCountingInterval", "value": "600" }
    ]
  }
]

These throttles are fairly large hammers. The throttles have default values that the Service Fabric team has found through experience to be reasonable defaults. These settings shouldn't be changed from the defaults unless there's been some careful math done around the amount of work that the cluster can do in parallel. If you need to change them, you should tune them to your expected actual load.

Most of the time we've seen customers use these throttles, it has been because they were already in a resource-constrained environment. Some examples of such an environment would be limited network bandwidth into individual nodes, or disks that aren't able to build many replicas in parallel due to throughput limitations. These types of restrictions meant that operations triggered in response to failures wouldn't succeed or would be slow, even without the throttles. In these situations customers knew they were extending the amount of time it would take the cluster to reach a stable state. Customers also understood they could end up running at lower overall reliability while they were throttled.

Next steps
To find out about how the Cluster Resource Manager manages and balances load in the cluster, check out the article on balancing load.
The Cluster Resource Manager has many options for describing the cluster. To find out more about them, check out this article on describing a Service Fabric cluster.
Service movement cost for influencing Cluster Resource Manager choices (1/17/2017)
An important factor that the Service Fabric Cluster Resource Manager considers when trying to determine what changes to make to a cluster is the overall cost of achieving that solution. The notion of "cost" is traded off against the amount of balance that can be achieved. Moving service instances or replicas costs CPU time and network bandwidth at a minimum. For stateful services, it also costs the amount of space on disk and in memory that you need to create a copy of the state before shutting down old replicas. Clearly you'd want to minimize the cost of any solution that the Azure Service Fabric Cluster Resource Manager comes up with. But you also don't want to ignore solutions that would significantly improve the allocation of resources in the cluster.

The Cluster Resource Manager has two ways of computing costs and limiting them, even while it tries to manage the cluster according to its other goals. The first is that when the Cluster Resource Manager is planning a new layout for the cluster, it counts every move that it would make. If two solutions are generated with about the same balance (score), then it prefers the one with the lowest cost (total number of moves).

This strategy works well. But as with default or static loads, it's unlikely in any complex system that all moves are equal. Some are likely to be much more expensive.

Changing a replica's move cost and factors to consider
As with reporting load (another feature of the Cluster Resource Manager), services can dynamically self-report how costly they are to move at any time.

Code:

this.ServicePartition.ReportMoveCost(MoveCost.Medium);

MoveCost has four levels: Zero, Low, Medium, and High. MoveCosts are relative to each other, except for Zero. Zero move cost means that movement is free and should not count against the score of the solution. Setting your move cost to High does not guarantee that the replica stays in one place. A default move cost can also be specified when a service is created. MoveCost helps you find the solutions that cause the least disruption overall and are easiest to achieve while still arriving at equivalent balance.

A service's notion of cost can be relative to many things. The most common factors in calculating your move cost are:

The amount of state or data that the service has to move.
The cost of disconnection of clients. The cost of moving a primary replica is usually higher than the cost of moving a secondary replica.
The cost of interrupting an in-flight operation. Some operations at the data store level or operations performed in response to a client call are costly. After a certain point, you don't want to stop them if you don't have to. So while the operation is going on, you increase the move cost of this service object to reduce the likelihood that it moves. When the operation is done, you set the cost back to normal.
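Putting the dynamic reporting above into the context of a service, the following is a rough sketch (an illustration, not code from this article) of raising the move cost around an assumed long-running operation and resetting it afterward:

using System;
using System.Fabric;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Services.Runtime;

internal sealed class ExpensiveOperationService : StatefulService
{
    public ExpensiveOperationService(StatefulServiceContext context)
        : base(context)
    { }

    protected override async Task RunAsync(CancellationToken cancellationToken)
    {
        // Signal that interrupting this replica right now would be expensive.
        this.Partition.ReportMoveCost(MoveCost.High);
        try
        {
            await RunLongOperationAsync(cancellationToken); // assumed long-running work
        }
        finally
        {
            // The operation is done; movement is back to normal cost.
            this.Partition.ReportMoveCost(MoveCost.Medium);
        }
    }

    // Placeholder standing in for the service's actual long-running operation.
    private Task RunLongOperationAsync(CancellationToken cancellationToken) =>
        Task.Delay(TimeSpan.FromMinutes(10), cancellationToken);
}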
Next steps
Service Fabric Cluster Resource Manager uses metrics to manage consumption and capacity in the cluster. To learn more about metrics and how to configure them, check out Managing resource consumption and load in Service Fabric with metrics.
To learn about how the Cluster Resource Manager manages and balances load in the cluster, check out Balancing your Service Fabric cluster.

Commonly asked Service Fabric questions (3/8/2017)
There are many commonly asked questions about what Service Fabric can do and how it should be used. This document covers many of those common questions and their answers.

Cluster setup and management

Can I create a cluster that spans multiple Azure regions?
Not today, but this is a common request that we continue to investigate. Currently, the Service Fabric cluster resource in Azure is regional, as are the virtual machine scale sets that the cluster is built on. However, the core Service Fabric clustering technology knows nothing about Azure regions and can be used to combine machines running anywhere in the world, so long as they have network connectivity to each other. That said, there is an inherent challenge in delivering strongly consistent data replication between machines spread far apart. We want to ensure that performance is predictable and acceptable before supporting cross-regional clusters.

Can I use Large Virtual Machine Scale Sets in my SF cluster?
Short answer: no.
Long answer: although Large Virtual Machine Scale Sets (VMSS) allow you to scale a VMSS up to 1000 VM instances, they do so by the use of Placement Groups (PGs). Fault domains (FDs) and upgrade domains (UDs) are only consistent within a placement group, and Service Fabric uses FDs and UDs to make placement decisions for your service replicas and service instances. Since the FDs and UDs are comparable only within a placement group, SF cannot use them. For example, if VM1 in PG1 has a topology of FD=0 and VM9 in PG2 has a topology of FD=4, it does not mean that VM1 and VM9 are on two different hardware racks, hence SF cannot use the FD values in this case to make placement decisions. There are other issues with Large VMSS currently, like the lack of level-4 load balancing support. Refer to the Large VMSS documentation for details.

What is the minimum size of a Service Fabric cluster? Why can't it be smaller?
The minimum supported size for a Service Fabric cluster running production workloads is five nodes. For dev/test scenarios, we support three node clusters. These minimums exist because the Service Fabric cluster runs a set of stateful system services, including the naming service and the failover manager, which track what services have been deployed to the cluster and where they're currently hosted. These services depend on strong consistency, ensuring that availability is maintained despite reboots and other unexpected failures. That strong consistency, in turn, depends on the ability to acquire a quorum for any given update to the state of those services, where a quorum represents a strict majority of the replicas (N/2 + 1) for a given service.

Do Service Fabric nodes automatically receive OS updates?
Not today, but this is also a common request that we intend to deliver. The challenge with OS updates is that they typically require a reboot of the machine, which results in temporary availability loss. By itself, that is not a problem, since Service Fabric will automatically redirect traffic for those services to other nodes. However, if OS updates are not coordinated across the cluster, there is the potential that many nodes go down at once. Such simultaneous reboots can cause complete availability loss for a service, or at least for a specific partition (for a stateful service). In the future, we will support an OS update policy that is fully automated and coordinated across update domains. In the interim, we have provided a script that a cluster administrator can use to manually kick off patching of each node in a safe manner.
This approach is only appropriate if the queries you're performing are not part of your core business logic. That means that the state for a given service may be spread across 10s or 100s of machines. For production workloads. Application Design What's the best way to query data across partitions of a Reliable Collection? Reliable collections are typically partitioned to enable scale out for greater performance and throughput. Service Fabric stores state on local. in turn. When a single replica is lost. What's the best way to query data across my actors? Actors are designed to be independent units of state and compute. no. and the amount of memory available on those machines. which track what services have been deployed to the cluster and where they're currently hosted. As above. you have a few options: Create a service that queries all partitions of another service to pull in the required data. depend on strong consistency. it is impossible to create a quorum. If you would like to create clusters for testing your application before it is deployed. To perform operations over that full data set. you should consider either: Replacing your actor services with stateful reliable services. Create a service that can receive data from all partitions of another service. . where a quorum represents a strict majority of the replicas (N/2 +1) for a given service. this is not a useful configuration.naming service and the failover manager. leaving 10gb available per machine. However.As an example. since an individual replica cannot span machines. the Service Fabric runtime. the amount of data that you can store in an actor service is only limited by the total disk space and memory available across the nodes in your cluster. Service Fabric will rebalance your replicas to leverage the additional capacity until the number of machines surpasses the number of partitions in your service. How much data can I store in an actor? As with reliable services. and your services consume 6gb of that. If you add more machines. you would have sufficient memory for approximately 35 million objects in your collection when operating at full capacity. In our example above. suppose that you have a reliable collection in a service with 100 partitions and 3 replicas. assume that the operating system and system services. For simplicity and to be very conservative. That the reliable service in question is the only one storing state in the cluster. Are you planning to open source Service Fabric? We intend to open source the reliable services and reliable actors frameworks on GitHub and will accept community contributions to those projects. an individual actor should have state that is measured in kilobytes. if you reduce the size of the cluster by removing machines. Keeping in mind that each object must be stored three times (one primary and two replicas). individual actors are most effective when they are used to encapsulate a small amount of state and associated business logic. including services that have been packaged in a container. or 100gb for the cluster. If load is not even. Next steps Learn about core Service Fabric concepts and best practices . Service Fabric offers a way to deploy and manage services. That the cluster itself is not growing or shrinking. By default. Other questions How does Service Fabric relate to containers? Containers offer a simple way to package services and their dependencies such that they run consistently in all environments and can operate in an isolated fashion on a single machine. 
What's the best way to query data across my actors?
Actors are designed to be independent units of state and compute, so it is not recommended to perform broad queries of actor state at runtime. If you have a need to query across the full set of actor state, you should consider either:

Replacing your actor services with stateful reliable services, so that the number of network requests to gather all data goes from the number of actors to the number of partitions in your service.
Designing your actors to periodically push their state to an external store for easier querying. As above, this approach is only viable if the queries you're performing are not required for your runtime behavior.

How much data can I store in a Reliable Collection?
Reliable services are typically partitioned, so the amount you can store is only limited by the number of machines you have in the cluster and the amount of memory available on those machines.

As an example, suppose that you have a reliable collection in a service with 100 partitions and 3 replicas, storing objects that average 1 kb in size. Now suppose that you have a 10 machine cluster with 16 gb of memory per machine. For simplicity, and to be very conservative, assume that the operating system and system services, the Service Fabric runtime, and your services consume 6 gb of that, leaving 10 gb available per machine, or 100 gb for the cluster.

Keeping in mind that each object must be stored three times (one primary and two replicas), you would have sufficient memory for approximately 35 million objects in your collection when operating at full capacity. However, we recommend being resilient to the simultaneous loss of a failure domain and an upgrade domain, which represents about 1/3 of capacity and would reduce the number to roughly 23 million.

Note that this calculation also assumes:

That the distribution of data across the partitions is roughly uniform, or that you're reporting load metrics to the Cluster Resource Manager. By default, Service Fabric load balances based on replica count. In our example above, that would put 10 primary replicas and 20 secondary replicas on each node in the cluster. That works well for load that is evenly distributed across the partitions. If load is not even, you must report load so that the Resource Manager can pack smaller replicas together and allow larger replicas to consume more memory on an individual node.
That the reliable service in question is the only one storing state in the cluster. Since you can deploy multiple services to a cluster, you need to be mindful of the resources that each will need to run and manage its state.
That the cluster itself is not growing or shrinking. If you add more machines, Service Fabric will rebalance your replicas to leverage the additional capacity until the number of machines surpasses the number of partitions in your service, since an individual replica cannot span machines. By contrast, if you reduce the size of the cluster by removing machines, your replicas are packed more tightly and have less overall capacity.

How much data can I store in an actor?
As with reliable services, the amount of data that you can store in an actor service is only limited by the total disk space and memory available across the nodes in your cluster. However, individual actors are most effective when they are used to encapsulate a small amount of state and associated business logic. As a general rule, an individual actor should have state that is measured in kilobytes.
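As a loose sketch of the "periodically push state to an external store" option mentioned above (an illustration, not code from this article), an actor could use a reminder to publish its small state object on a schedule; the state name, interval, and external store call are assumptions, and the actor's public interface is omitted for brevity:

using System;
using System.Threading.Tasks;
using Microsoft.ServiceFabric.Actors;
using Microsoft.ServiceFabric.Actors.Runtime;

[StatePersistence(StatePersistence.Persisted)]
internal class PublishingActor : Actor, IRemindable
{
    public PublishingActor(ActorService actorService, ActorId actorId)
        : base(actorService, actorId)
    { }

    protected override Task OnActivateAsync()
    {
        // Fire every 15 minutes; the interval is an arbitrary assumption.
        return this.RegisterReminderAsync(
            "PushState", null, TimeSpan.FromMinutes(15), TimeSpan.FromMinutes(15));
    }

    public async Task ReceiveReminderAsync(string reminderName, byte[] state, TimeSpan dueTime, TimeSpan period)
    {
        if (reminderName == "PushState")
        {
            // "counter" is an assumed state name this actor saves elsewhere.
            var value = await this.StateManager.TryGetStateAsync<long>("counter");
            if (value.HasValue)
            {
                await PushToExternalStoreAsync(this.Id.ToString(), value.Value);
            }
        }
    }

    // Placeholder for a call to whatever external, queryable store you use.
    private Task PushToExternalStoreAsync(string actorId, long value) => Task.CompletedTask;
}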
Other questions

How does Service Fabric relate to containers?
Containers offer a simple way to package services and their dependencies such that they run consistently in all environments and can operate in an isolated fashion on a single machine. Service Fabric offers a way to deploy and manage services, including services that have been packaged in a container.

Are you planning to open source Service Fabric?
We intend to open source the reliable services and reliable actors frameworks on GitHub and will accept community contributions to those projects. There are currently no plans to open source the Service Fabric runtime. Please follow the Service Fabric blog for more details as they're announced.

Next steps
Learn about core Service Fabric concepts and best practices.

Azure Service Fabric support options (3/10/2017)
To deliver the appropriate support for the Service Fabric clusters that you are running your application workloads on, we have set up various options for you. Depending on the level of support needed and the severity of the issue, you get to pick the right option.

Report production or live-site issues or request paid support for Azure
For reporting live-site issues on your Service Fabric cluster deployed on Azure, open a ticket for professional support on the Azure portal or the Microsoft support portal. Learn more about: Professional Support from Microsoft for Azure, Microsoft premier support.

Report production or live-site issues or request paid support for standalone Service Fabric clusters
For reporting live-site issues on your Service Fabric cluster deployed on premise or on other clouds, open a ticket for professional support on the Microsoft support portal. Learn more about: Professional Support from Microsoft for on-premise, Microsoft premier support.

Report Azure Service Fabric issues
We have set up a GitHub repo for reporting Service Fabric issues. We are also actively monitoring the following forums.

GitHub repo
Report Azure Service Fabric issues on the Service-Fabric-issues git repo. This repo is intended for reporting and tracking issues with Azure Service Fabric and for making small feature requests. Do not use it to report live-site issues.

StackOverflow and MSDN forums
The Service Fabric tag on StackOverflow and the Service Fabric forum on MSDN are best used for asking questions about how the platform works and how you might accomplish certain tasks with it.

Azure Feedback forum
The Azure Feedback Forum for Service Fabric is the best place for submitting big feature ideas you have for the product, as we review the most popular requests as part of our medium to long-term planning. We encourage you to rally support for your suggestions within the community.

Supported Service Fabric versions
Make sure that your cluster is always running a supported Service Fabric version. As and when we announce the release of a new version of Service Fabric, the previous version is marked for end of support after a minimum of 60 days from that date. The new releases are announced on the Service Fabric team blog.

Refer to the following documents for details on how to keep your cluster running a supported Service Fabric version:

Upgrade Service Fabric version on an Azure cluster
Upgrade Service Fabric version on a standalone Windows Server cluster

Here is the list of the Service Fabric versions that are supported and their support end dates:

SERVICE FABRIC RUNTIME IN CLUSTER           END OF SUPPORT DATE
All cluster versions prior to 5.3.121       January 20, 2017
5.3.*                                       February 24, 2017
5.4.*                                       May 10, 2017
5.5.*                                       Current version and so no end date

Next steps
Upgrade Service Fabric version on an Azure cluster
Upgrade Service Fabric version on a standalone Windows Server cluster