OPTORSIM FAQ




The OptorSim Archive of Questions Asked

Caitriana Nicholson, March 2008

This is an edited archive of user questions submitted to the OptorSim mailing lists, with developers' and other users' responses. It is intended as a resource for other users, who may not receive ready responses from the original developers now that they have all moved on to other things. Questions are in plain font and answers are in italics. Some editing of grammar and spelling has been done, but not extensively – so don't blame the editor for those!

Contents:
Current State of the Project
Running OptorSim in Windows
Running OptorSim in MacOS
Configuration File Questions
Netbeans and OptorSim
Compilation Problems
Class File Documentation
Various: Initial replica placement, CEs and worker nodes, file pinning, access cost, job processing
Simulating Security Functions
Timing Model
Adding New Replication / Scheduling Strategies
Statistics Output
Resource Monitoring

State of the project

What is the current state of this simulator? Is it still being developed, and will there be any new versions?

The simulator is not being actively used by people within the EDG project (the project under which it was created). In fact the EDG project finished a number of years ago. However, others are using and extending the codebase. The project is maintained in a repository at SourceForge (http://sourceforge.net/projects/optorsim) and new developers are welcome to join there, but the original developers are all working elsewhere now and no longer have time to make new releases. Any new questions should be addressed to the mailing list at [email protected] where they will be answered on a best-effort basis.

OptorSim in Windows

Windows Path Instructions for UserGuide

I am trying to learn about grid simulation tools, and am excited by OptorSim.
However, I am stuck using a Windows XP system, and I would recommend adding (on page 4 of the OptorSim v2.0 Installation and User Guide): for Windows users: My Computer -> Properties -> Advanced -> Environmental Variables, then highlight the Path in the System Variables box, and click "Edit", and add to the end of the path: %OptorSim-2.0 Directory%\bin where %OptorSim-2.0 Directory% in my case was C:\optorsim-2.0

Running OptorSim in windows

I can't find anybody that know how to run OptorSim in windows. I am not familiar with unix environment. can you tell me how to run OptorSim using windows. the user guide i think more focuses on unix..

Running OptorSim in Windows is pretty much the same as running in Unix. In the optorsim-2.0\bin directory there is a Windows executable called OptorSim.bat. Start up a command prompt, go into the optorsim-2.0 directory and run bin\OptorSim.bat. Edit the examples\parameters.conf file to set the parameters you want. There are instructions for running in Windows in the user guide, on pages 4 and 5 - all other instructions are the same as for unix.

How to execute OptorSim Simulator in windows OS?

I downloaded OptorSim simulator, and installed it according to the instructions in the userguide, but it not working. I am running this simulator under windows OS. whenever I am using OptorSim.bat, the following error is coming.

    Exception in thread "main" java.lang.NoClassDefFoundError: org/edg/data/replication/optorsim/OptorSimMain

If you are using the OptorSim 2.0 downloaded from the website, the classpath set in the OptorSim.bat file assumes you are running from within the optorsim-2.0 directory. If you want to run it from a different directory, please modify the paths in the file so that it can find lib/edg-optorsim.jar, etc., from wherever it is running.

OptorSim with MacOS

I would like to find out if the simulation tool "OptorSim" can be used on a Macintosh Operating System.

In principle, it should work. OptorSim can be used on any system that has Java. You will need two parts: the build environment (java compiler and the build tool "ant") and the run-time environment (JRE). The web page: http://developer.apple.com/java/ and http://www.pepsan.com/javamac/ seem to be good places to start. A few wrapper scripts are included with OptorSim (in the directory "bin"). These will probably not work for Macs, but it should be fairly easy to develop Mac equivalents.

As Paul said, getting OptorSim working for Macs involved getting Java working.

Configuration Files

CMS testbed topology

I want to do an evaluation about our strategy with a promising topology of Grid like "Grid topology for CMS world wide data production challenge in spring 2002" introduced in a paper, "Evaluation of an Economy-Based File Replication Strategy for a DataGrid". As an undesirable case [...] which might be undesirable in my point. Can you help me to obtain the configuration files of "Grid topology for CMS world wide data production challenge in spring 2002"?

The CMS testbed configuration files are included in the examples/ directory of OptorSim:

    cms_testbed_grid.conf
    cms_testbed_jobs.conf
    cms_testbed_bandwidths.conf

Job probabilities

Hi. I´m trying to understand extra examples that are in the web. I can configure the topology. And I don´t know how percentages are calculated. Do you know where percentages came?

If I understand your question correctly, you're asking about the following part of the configuration file:

    \begin{cescheduletable}
    0 jpsijob 0.17 highptlepjob 0.34 incelecjob 0.5 incmuonjob 0.67 highptphotjob 0.84 zbbbarjob 1.0
    [...]
    \end{cescheduletable}

Percentages are cumulative. For example, the meaning of the row

    0 jpsijob 0.17 highptlepjob 0.34 incelecjob 0.5 incmuonjob 0.67 highptphotjob 0.84 zbbbarjob 1.0

is that on Computing Element 0 can run:

    jpsijob with probability 0.17
    highptlepjob with probability 0.34 - 0.17 = 0.17
    incelecjob with probability 0.5 - 0.34 = 0.16
    incmuonjob with probability 0.67 - 0.5 = 0.17
    highptphotjob with probability 0.84 - 0.67 = 0.17
    zbbbarjob with probability 1.0 - 0.84 = 0.16

Job running

I have one doubt in OptorSim. In parameter config file I declared number.jobs = 100. In job config file I defined some ten jobs. How will it run the jobs - 10 times for each job? What is the relation with the job selection probability?

It will run 100 jobs. The jobs it chooses will depend on the job selection probability which you define in the job configuration file. If you have given all your 10 jobs the same probability, it will run each job 10 times (on average - it will not be exact). If they have different probabilities, they will run a different number of times. For example, suppose you have the following in your job configuration file:

    \begin{jobselectionprobability}
    jobA 0.5
    jobB 0.25
    jobC 0.15
    [...]
    jobJ 0.05
    \end{jobselectionprobability}

Then, for 100 jobs, jobA would run about 50 times, jobB about 25 times, jobC about 15 times, and so on.
Based upon the selection probability it will run job that much times. My question is whether all jobs will execute sequentially? that is if the job1 have to run 10 times after it ran 10 times only the next job (job2) will run. Is it like that?

No, it is not like that. The jobs are chosen "at random", but weighted by their selection probability. You can see the code for it in the randomJob() method of GridContainer. A random number between 0 and 1 is generated. This is then compared to the job probability. For example, if job1 has probability 0.5, if the random number is less than 0.5 job1 is chosen to run and if it is bigger than 0.5 job1 is not chosen and another job is considered.

How to calculate the number of files required for a particular job and the file names? Is there any function for this implemented in optorsim? What is needed for getBestFile()? if the initial file distribution is more than one site then will only it be used? How it is related to access cost?

These are set in the job configuration file. In the jobtable, you define the set of files for a particular job. Then in the filesetfraction table, you define the fraction of the total fileset which one job needs. So if you have 100 files defined in a fileset for job type jobA, and have jobA filesetfraction set at 0.25, for example, each individual instance of jobA will process 25 files. getBestFile() takes an array of lfns (logical file names) and an array of the corresponding file fractions. Then for each file in the array, it tries to replicate it according to the chosen replication strategy. Each Optimiser class therefore has its own implementation of getBestFile(). If the initial file distribution is for only one site, only that site will be used for the first replications. If files are on more than one site, all the sites will be considered as sources of replicas. The site that gives the lowest access cost (or wins the auction in the economic model, or whatever your optimiser does) is chosen to replicate from.

CMS testbed grid

In cms_testbed_grid the number of sites mentioned is 27 but while running only 19 sites it shows. Why? Similarly the initial file distribution is in site 14. The site bandwidth in grid conf file shows, site 14 is connected to site 15 no other connection is there. Then how the files are transferred to other sites for job processing.

Some of the sites are router sites, so they have no SE or CE - they just transfer the files through them - and do not appear in the simulation output. For cms_testbed_grid, site 14 is actually connected to both site 15 (Lyon) and site 23 (a router). Even if a site has only one connection, clearly, files can be transferred to other sites as long as they are all connected in the network. Files can go *through* other sites on the way between the source and destination sites.

kSI2000

What is the meaning of kSI(2000)?

SI2000 (or CINT2000, but it's easier to use the kilo prefix) is a standard way of measuring CPU performance for different machines. See http://www.spec.org/cpu/ for more information, e.g. results for different machines at http://www.spec.org/cpu2000/results/res2007q1/ It's the way that the LCG project uses to calculate its resource requirements.

Netbeans and OptorSim

I am trying to use netbean4.0 (based on Apache Ant) for compiling modified code. Has any body tried that? If yes, could you tell me if any changes are needed? It seems that it does not see some of the packages and complains about some packages saying "package does not exist" at the import statements. It also shows a warning "deprecation: show() in java.awt.window has been deprecated" this.show();

I think I remember someone having this problem before, and if I remember correctly, the solution was to put all the jar files for the external packages together with the optorsim jar file in the lib/ directory. Or else all the jar files had to be in the Netbeans working directory, or in the top optorsim directory, it was certainly something to do with not finding the correct paths to all the libraries. Try playing around with the location of the jar files and see if it works.

Compilation Problems

Problem with Compiling OptorSimV2

I have a problem with compiling the files with ant command. I didn't add any coding of my own, just the problem pop up when I install and run it by using ant.

    Buildfile: build.xml
    init:
    prepare:
    build:
    [javac] Compiling 95 source files to /home/rony/optorsim-2.0/build/classes
    [javac] /home/rony/optorsim-2.0/src/org/edg/data/replication/optorsim/OptorSimGUI.java:68: error: Type `JTextArea' not found in declaration of field `tabTerm'.
    [javac] private static JTextArea tabTerm;
    [javac] ^
    [javac] /home/rony/optorsim-2.0/src/org/edg/data/replication/optorsim/OptorSimGUI.java:892: error: Type `JPEGImageEncoder' not found in the declaration of the local variable `encoder'.
    [javac] JPEGImageEncoder encoder = JPEGCodec.createJPEGEncoder(out);
    [javac] ^
    [javac] /home/rony/optorsim-2.0/src/org/edg/data/replication/optorsim/OptorSimGUI.java:912: error: Type `JPEGImageEncoder' not found in the declaration of the local variable `encoder'.
    [javac] JPEGImageEncoder encoder = JPEGCodec.createJPEGEncoder(out);
    [javac] ^
    [javac] /home/rony/optorsim-2.0/src/org/edg/data/replication/optorsim/OptorSimGUI.java:931: error: Type `JPEGImageEncoder' not found in the declaration of the local variable `encoder'.
    [javac] JPEGImageEncoder encoder = JPEGCodec.createJPEGEncoder(out);
    [javac] ^
    [javac] /home/rony/optorsim-2.0/src/org/edg/data/replication/optorsim/OptorSimGUI.java:951: error: Type `JPEGImageEncoder' not found in the declaration of the local variable `encoder'.
    [javac] JPEGImageEncoder encoder = JPEGCodec.createJPEGEncoder(out);
    [javac] ^
    [javac] /home/rony/optorsim-2.0/src/org/edg/data/replication/optorsim/OptorSimGUI.java:1101: error: Type `JPEGImageEncoder' not found in the declaration of the local variable `encoder'.
    [javac] JPEGImageEncoder encoder = JPEGCodec.createJPEGEncoder(out);
    [javac] ^
    [javac] 6 errors
    BUILD FAILED
    file:/home/rony/optorsim-2.0/build.xml:33: Compile failed, see the compiler error output for details.

Both JTextArea (part of Swing) and JPEGImageEncoder (part of the com.sun.image.codec.jpeg package) are included in standard Java-REs (or SDKs) these days. For some reason, it looks like your version of Java doesn't have access to them. Which version of Java are you using?

A problem on building Optorsim with ant!

Several times I tried to re-build the Optorsim with ant as you have been described in the user guide but the build fails with the following error message:

    Build failed
    G:\optorsim-2.1\build.xml:74: execute failed: java.io.IOException: createprocess: bin\optorsimTests.sh error=193

It looks like you are running on a Windows machine, is that correct? From the section of output you sent, it also looks like you were trying to run the functional test suite, i.e. the command: ant func-test

I think this may be the problem - the functional tests, which this command runs, only work in a UNIX operating system, as page 4 of the user guide mentions. Building the source for OptorSim itself, just by running 'ant', should work. Everything else should be fine on Windows, but we only wrote the functional tests for UNIX.

Documentation

Class explanations

I want to use OptorSim for my PhD thesis simulation. After installing OptorSim, how can I find an explanation for each class code ?? I need to understand how each class works before I make any change.

You can get the JavaDoc explanation of each class by doing: ant doc which will generate the documentation in html format in the doc/api directory of your optorsim installation directory. You can then read these html files with your web browser. Page 4 of the User Guide also outlines this procedure. This would be the best way for you to get an idea of how the different parts of the simulation work. If you want to see more details of the code itself, please go into the src/ directory and open up the source files to read them. The level of commenting in the code is somewhat variable, however.

Initial Replica Placement

I have also seen that in OptorSim the initial file and replica placement is made randomly using uniform distribution and I want to know if I can change this by implementing my initial placement strategy?

To do this, you would have to change the assignFilesToSites() method in the class JobConfFileReader. You could just extend this class and override the method in the subclass.

CEs and Worker Nodes

I am a little not clear about the number of working nodes and computing elements. In the user manual, it says that a maximum of 1 CE per site. In the code of the GridSite you have a vector for CEs at the site (Vector _computingElementCollection = new Vector();), is this just for future work?

It was intended to further develop the model to allow more than 1 CE per site, so the CEs at each site were implemented as a Vector, but extending this to actually having >1 CE/site was never actually implemented, so yes, it is 'future work'.

I am also not clear about the function returning the total number of computing elements in the GridContainer class, is this the total number in the whole Grid system?

Yes, it is the total number of CEs in the whole grid.

What is the need of worker nodes in OptorSim? Only one job at a time can be processed by CE. Then how worker node involved in job processing.

The time to process the job is divided by the number of worker nodes, so if there are more worker nodes in a CE it processes the job faster. This is a very simple model - Antoine Vernois from Lyon developed a more sophisticated model but I don't think it's included in the release.

Access cost

I am trying to compute the cost of accessing a file if stored at a certain storage element or site. Also is there any function that returns the best route, that is how to figure out the maximum bandwidth available in the route. I hope question is clear cz I wrote in a hurry.

The access cost is currently calculated as (file size) / (available bandwidth). This is in the NetworkClient class, so if you want to modify it that is where you should make the changes. For each pair of grid sites, the best route between them, based on the maximum bandwidth, is calculated. This is in terms of the bandwidth not the number of hops. The best route is calculated at the beginning of the simulation using a Dijkstra search algorithm - see the GridConfFileReader and GridContainer classes for details.

RB job processing

User submitting jobs to the RB. RB submitting jobs to the CE based upon the scheduling algorithm. After all the jobs are submitted only, OptorSim starts processing jobs. Why?

Actually, the RB starts processing jobs as soon as the Users have started submitting them, so there can still be users submitting jobs while the RB is processing them. If there is not a large number of jobs and it goes quickly, however, it might *look* like the RB has not started until all the jobs have been submitted.

File Pinning

What is the meaning of pinned. [pin status of the file]

If a file is pinned, it can't be deleted from the SE until it is unpinned. This is so that if an Optimiser decides to replicate a particular file, it can prevent it from being deleted until the replication is finished.
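As a rough illustration of the access cost answer above, the sketch below computes a best route "based on the maximum bandwidth" with a Dijkstra-style search, then applies (file size) / (available bandwidth) to pick the cheapest replica source. It rests on several assumptions that are not from OptorSim: links are symmetric, the bottleneck link is taken as the available bandwidth of a route, contention between concurrent transfers is ignored, and file size and bandwidth use consistent units. All class and method names are hypothetical; the real logic lives in the NetworkClient, GridConfFileReader and GridContainer classes and may well differ.

```java
/**
 * Illustrative sketch only. Hypothetical names; this is not the code of
 * OptorSim's NetworkClient, GridConfFileReader or GridContainer classes.
 */
final class AccessCostSketch {

    /**
     * For every site, returns the largest bottleneck bandwidth of any route
     * from the source. link[i][j] > 0 is the bandwidth of a direct link and
     * 0 means no link. Unreachable sites keep the value 0.
     */
    static double[] bestBandwidths(double[][] link, int source) {
        int n = link.length;
        double[] best = new double[n];
        boolean[] done = new boolean[n];
        best[source] = Double.POSITIVE_INFINITY;
        for (int step = 0; step < n; step++) {
            // Dijkstra-style choice: the unfinished site with the widest route so far.
            int u = -1;
            for (int i = 0; i < n; i++) {
                if (!done[i] && best[i] > 0 && (u == -1 || best[i] > best[u])) {
                    u = i;
                }
            }
            if (u == -1) {
                break; // every remaining site is unreachable
            }
            done[u] = true;
            for (int v = 0; v < n; v++) {
                if (done[v] || link[u][v] <= 0) {
                    continue;
                }
                double via = Math.min(best[u], link[u][v]); // bottleneck through u
                if (via > best[v]) {
                    best[v] = via;
                }
            }
        }
        return best;
    }

    /** Access cost as stated in the answer: (file size) / (available bandwidth). */
    static double accessCost(double fileSize, double availableBandwidth) {
        if (availableBandwidth <= 0) {
            return Double.POSITIVE_INFINITY; // no usable route
        }
        return fileSize / availableBandwidth;
    }

    /** Candidate site with the lowest access cost to the destination, or -1 if none is reachable. */
    static int cheapestSource(double[][] link, int destination, int[] candidates, double fileSize) {
        double[] best = bestBandwidths(link, destination); // valid because links are symmetric
        int chosen = -1;
        double lowest = Double.POSITIVE_INFINITY;
        for (int site : candidates) {
            double cost = accessCost(fileSize, best[site]);
            if (cost < lowest) {
                lowest = cost;
                chosen = site;
            }
        }
        return chosen;
    }

    public static void main(String[] args) {
        // Hypothetical four-site grid; site 3 has no links at all.
        double[][] link = {
            {0, 100, 10, 0},
            {100, 0, 50, 0},
            {10, 50, 0, 0},
            {0, 0, 0, 0}
        };
        double[] best = bestBandwidths(link, 0);
        System.out.println("bottleneck bandwidth 0 -> 2: " + best[2]);
        System.out.println("access cost of a file of size 1000 from site 2: " + accessCost(1000.0, best[2]));
        System.out.println("cheapest source among {1, 2}: " + cheapestSource(link, 0, new int[] {1, 2}, 1000.0));
    }
}
```

In this toy grid the route from site 0 to site 2 through site 1 is preferred over the direct link, because the comparison is made on bandwidth and not on the number of hops.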
Simulating security functions

Can we simulate grid security functions by using OptorSim? The description of OptorSim at the DataGRID website only describes data access optimization algorithm simulations. I wonder if it can be used for simulating the security features.

Could you explain what kind of grid security functions you want to simulate? What level of detail are you looking at? It is currently possible to simulate different site policies (of which jobs to accept) using OptorSim, but investigating security in a more detailed way would require extension to the code.

We are currently working on the pluggable security services. The idea behind this effort is to enable a VO members to invoke the set of security services that adapts to their requirements (rather than a 'standard' set of security services). Initially, we are using the set of services defined in the OGSA document. To avoid any mis-match in the set of services invoked by the various members of the same VO, we are working on the conflict-management paradigms and will require some mechanism to adequately simulate our propositions. Beside this aspect, we need to carry out a number of other simulations like scalability, real-time invocation of these services, invocation by users as well as by services, ... It is evident that our simulation requirements are quite different from the most intended use of OptorSim. I don't know how much modifications are required in its code to suit us!

It looks like it would require substantial code modification to enable OptorSim to match your requirements. As you say, it is designed for looking at data replication algorithms so implementation of things like networking and security are quite high-level, though you could of course modify the code to your own requirements if you wish. I would suggest looking for some lower-level simulators, although you can download the code if you want to examine it more closely.

Timing Model

OptorSim v. SimGrid

I am working on replication and caching optimization algorithms. I would like to know if OptorSim would be fast enough. Some people claim that since it is written in Java it is not going to be as good as Simgrid (written in C). Can you please comment and advise me if OptorSim will be efficient enough? Secondly, if my idea involves adding other components to the simulator, is this doable, i.e. adding my own work and use it as part of the simulator?

[Antoine Vernois] to my mind, the choice of OptorSim or Simgrid mainly depends on your own needs. it's true that OptorSim is not as fast as Simgrid, but it's not due to language. It's due to the fact that Simgrid is event-driven, ie time is advanced in calculated step while OptorSim is kind of real-time simulator. So for example, simulation of a grid for 10h can only take 1hour with Simgrid (but it depend on what you simulate, it can also take 10min, or 4hours or more), while it will take 10 hours with Optorsim. Hopefully there is a scale factor in OptorSim that allow you to speedup the time (by dividing all sleep time by this factor). [Editor's Note: The above was true for OptorSim 1.0 – in version 2.0 and above it also uses a more event-driven model and no longer goes in real time (although the option to do so is still there).] But I think that the choice of Simgrid or Optorsim, should not be done for their execution performance, but for tools they offer to you. For example, i choose to use Optorsim (while main part of Simgrid is developed in my lab :-) because it includes all mechanisms to manage data and their replication. It gives me routines to locate, retrieve, delete data and gives me quite good estimation of access time. Moreover the global architecture (following EDG architecture) is already implemented and is fine for my need. In Simgrid, you have all tools to do that, but you have to do it yourself ! You will have to implement replica manager, storage element and so on. Another point to look is the way the bandwidth sharing is simulated. In OptorSim the model is quite simple but efficient enough if you manage lots of transfers. But i think that improvement of this point is in OptorSim developers's to do list. As a user of OptorSim with particular requirements, i added lots of things to the OptorSim core to match my needs. It's quite easy as the code is well commented and quite easy to understand.

[David Cameron] As you say Antoine, the difference in speed between the two simulations is the time model used, not the language. I think the main consideration for which simulation to use should be how easy it is to adapt for your particular purpose. In my experience anyway most of the time is spent developing and testing simulation code rather than getting results. Since you say you are interested in replication caching strategies I think OptorSim already has all the features necessary for this and should be easy to expand and implement your own algorithms. Of course since I am one of the developers of OptorSim maybe I am biased towards it ;) but I think it would involve more work on your part since OptorSim was designed to test replication strategies. As for adding your own code, the license means you are free to do what you like to the code as long as you acknowledge the original authors and keep the copyright headers in each java file.

Timing bug?

While using the simulator, I have encountered many strange bugs. For example, you claim that the time measured by OptorSim does not depend on the computer on which it runs on (the time you use does not depend on system time). However, by doing several simulations, I found out that the times differ significantly (sometimes order of magnitude) when run on different machines. I double checked the input parameters and the "time.advance" parameter is set to "yes". So, is this a known problem, or am I interpreting this parameter wrongly?

For the time, you are right: it should be independent of the underlying CPU speed. Essentially, time (within the simulation) is frozen whilst anything is being calculated. Once all simulation work is finished there is a step-wise jump in time to the next time that something would "happen". This should be independent of actual CPU effort, although the time taken to simulate any given grid configuration will obviously depend on the machine's CPU. Just to be sure the parser has picked up the time.advance option, could you check whether the CPU utilisation stays high (over 90%) for the duration of the simulation? If it doesn't, then it's using "linear" time and the discrepancy would be expected. If the CPU utilisation does stay high, then there's a bug somewhere.

Adding new replication or scheduling strategies

New replication strategy

My work is about initial replica placement in Grid. In fact, my goal is to place initially in the grid (when the file f is created) a number R of replicas of the file f to improve fault tolerance. Unfortunately, our faculty doesn't have the necessary infrastructure for my experiments, so, I am very interested by your simulator OptorSim. I have read the user guide and I have some questions. Is it possible for me to add new replication strategies?

Yes, you can add your own replication strategy fairly easily, by extending the Optimiser classes appropriately. You would probably also have to extend the StorageElement classes according to your strategy.

New scheduling algorithm

i am using optorsim2.1 for my project. i am proposing new scheduling and replication algorithm. I have some doubt in that, how to add my new scheduling algorithm in OptorSim.

OK, so you mention two things: a scheduling- and replication-algorithm. OptorSim's main focus is on replica-optimisation, so that's in a more advanced state, so, if I were you, I would start with that. In /src/org/edg/data/replication/optorsim/optor directory you should see the available optimisation algorithms. To create your own replica optimisation algorithm, you must create a new class in this directory. The replica optimisation algorithms already in OptorSim form a strong class-hierarchy. At the bottom is a skeleton class (SkelOptor) that implements some very basic functionality. All the others extend either this class or some other (abstract) class. The key thing is that they all implement the Optimisable interface: this is how the rest of the software interacts with the replica optimisation strategy. Your new class can either implement the Optimisable interface directly or extend one of the existing classes, it just depends on how your algorithm is going to work. For job scheduling, have a look in ResourceBrokerFactory class. This is a singleton class that is used to return a singleton object, an object that implements the ResourceBroker interface. To begin with, you probably want to write a new class that extends the skeleton implementation (SkelResourceBroker). Have a look at RandomCEResourceBroker to get the idea.

Thanks for your suggestion. According to your suggestion I am going to start implementing my Replication strategy (Best Client). I listed my doubts below.

1. In job config file (cms_testbed_jobs.conf), the third row is the schedule table. It contains the sites and the jobs they are willing to run. My question is, is it those sites alone which will run jobs while executing the program?

Yes. The cescheduletable describes which jobs a site is willing to run. Although Grid computing is about providing access to a large computing facility, a key aspect is that each site can choose which "virtual organisation" (VO) they wish to support. When someone submits a job (or, more likely, a series of jobs) to the Grid, they identify themselves as being a member of a VO (in the HEP world ATLAS, CMS and LHCb are examples of VOs). This user can run their jobs, access and store data (etc.) because of their membership of that VO. Likewise, each site can choose which VOs they wish to support and how much they want to support them. So, a site may choose to dedicate themselves to a particular VO, whereas other sites may strongly favour one VO but allow work from many other VOs. You can see this with live WLCG data here: http://gridmap.cern.ch/gm/ Click on the different VOs and notice that some sites turn white, indicating they don't support that VO (all sites support the OPS VO, though). OptorSim attempts to simulate this effect. In reality, a site does not decide job-by-job (instead, the decision is based on a number of factors, the most prominent being the VO membership of the person submitting the job), but this should allow us to simulate real job-submission pattern.

2. And the sites which are not specified will be idle at that time. Is it so?

Yes, indeed, this might result in idle CPUs. But it's important to allow the sites their autonomy. This is a Grid paradigm. And, in practice, this is unlikely to be a problem. The computer hardware is bought to match predicted demand, so it's fairly unlikely that computers will be idle.

3. Shall I use the existing config files for my proposed algorithm or do I have to create my own?

Whichever you feel more comfortable with. You should be able to use the existing files. The "simple" ones are a good place to start: simple_grid.conf simple_job.conf. I would copy the existing one and edit it. That way you have a local copy that hasn't been altered, allowing you to see what you've changed.

4. Whether I have to create the config files first and then I have to start coding?

No, either will work.

5. When developing my own scheduling algorithm in OptorSim, whether I have to change any existing code in OptorSim or I can just inherit the classes alone.

Again, that depends on how you are going to implement your new class. Fundamentally, as long as your class implements ResourceBroker (with correct semantics) it'll work. However, there's a pretty useful skeleton class SkelResourceBroker that does much of the boring work, especially with thread priorities. Personally, I would recommend writing the new RB that extends SkelResourceBroker. You then need only implement the findCE() method. You will need to add support for a new scheduler (i.e. number 5). Just update the parameters.conf file.

I am going to implement best client replication strategy. I have to replicate the files to that site, which is the site from where more requests come. For that from where and how i can get the required data.

You haven't said how you plan to decide *when* to replicate a file: you'll need to limit this somehow, otherwise a site can (potentially) attempt to transfer so many files that none of the transfers will proceed. Also, the simulator was designed so the SEs pulled files they wanted, rather than an external agent pushing them. The site itself will initiate the transfer, so it can (potentially) know the access patterns and how "hot" is any particular file that isn't held locally. You will need to figure out how it can determine whether file X is "sufficiently hot" that it is worth replicating it to the local storage (the site's SE). You could record the access patterns locally (on the CE, for example) and provide a method for accessing those values. You can, of course, implement a coordinating agent that decides which files to replicate. In the simulator it is easy to cheat: one has (potentially) complete access to any object's information and the cost of accessing this information is small (it's all within the computer's memory). However, in real life, it becomes more complex. There is a very large number of files being stored, with jobs requesting them in complex patterns. One cannot record all file requests centrally as they happen far too frequently. Registering the files would become a bottleneck and single-point-of-failure. In general, this is one of the problems with distributed computing: how to collect information from many sites without introducing a single-point-of-failure or performance bottleneck. The solution used in the simulator is to hold an auction for each file request over a peer-to-peer (P2P) network. Sites can choose to participate in an auction (they do by default, but they can time-out without affecting the process). The auction has two purposes: it selects the best available copy of a file and it also allows "nearby" sites to know that a particular file was requested. The second purpose allows a decentralised knowledge of file requests without imposing a heavy-weight solution (such as registering each file request).

I am expecting your reply for the following three questions mentioned below:

(i) I kindly request you to explain the working of the two scheduling algorithm (access cost and Queue Access Cost), like how the execution is happening in OptorSim.

You will find the implementation of the Access Cost algorithm in the AccessCostResourceBroker class, and Queue Access Cost in the CombinedCostResourceBroker class. Both of them extend SkelResourceBroker with a different findCE method. AccessCostResourceBroker, when it is given a job to allocate, iterates through all the available CEs. First it checks whether the CE will accept that job and whether it has space in the job handler queue. If these are ok, it then calls the getAccessCost method of the optimiser. This calculates the cost of accessing the files, depending on the optimiser method selected (e.g. LRU, LFU, economic model). The CE which has the lowest access cost for the job is then selected and the job sent there. CombinedCostResourceBroker works in a similar way, but as well as calculating the access cost for the job in question, it also accesses the job handler of the CE it is looking at and gets the access cost of each job in the queue (you can see this in the getQueueAccessCost method of the JobHandler class). It combines the two costs, and the CE with the lowest total cost is chosen. So for your algorithm, you should write a new class (e.g. QueueExecutionTimeResourceBroker) which also extends SkelResourceBroker with a different findCE method.

(iii) The scheduling algorithm which i am trying to implement is Queue Execution Time, the explanation of that algorithm is as follows.

    Queue Execution Time = Execution time of current job + All the jobs in the queue
    where, Execution Time of current job = Access cost of remote files + execution time of all the files

similar to Queue Access cost we are calculating our algorithm QueueExecutionTime. in QueueAccessCost, the access cost of current job and all the jobs in the queue is calculated. So, in fact, we are calculating in addition the execution time of files to run the job with the access cost. for this algorithm to implement where i have to start.

I think it is not too difficult for you to implement your algorithm. I think you should start, as I mentioned above, by writing a new extension of SkelResourceBroker. You could copy most of what is in CombinedCostResourceBroker, and simply add the execution times to the total cost for each CE. If you are adding the execution time for all jobs in the queue, you would also need to modify the getQueueAccessCost method in JobHandler, or add a new method such as getQueueExecutionCost, which would add the execution time for each job as well as the access time.

Will a new optimisation strategy affect schedulers?

all the scheduling algorithms are based on the optimisation strategy. i also going to implement new optimisation strategy BestClient, whether it will affect the existing access cost function. My question is, to implement this algorithm from where i have to start.

The scheduling algorithms work independently of the file replication algorithms so your new strategy shouldn't affect the existing scheduling algorithms. You will need to write a new Optimiser class, e.g. BestClientOptimiser, which extends ReplicatingOptimiser with a new getBestFile() method. If you don't put a getAccessCost method in it, it will use
that is if the number of file request increases and reached the threshold value then only the files will be replicated. g. First. Have a look at the existing Optimiser and StorageElement classes to see how it works. to get the files at these sites. e. Therefore. you can use the method neighbouringSites() in the class GridSite. or do you mean replicating the data file from another site to its own SE? Perhaps I can give a better explanation if you can tell us some more about what you want to do. you will have to use GridSite. to get the list of neighbouring sites for a particular site. The problem is how to get the neighbors Gridsite's data files stored in their SEs? Can you explain your problem in some more detail? If sites choose not to replicate a data file to their own SE. You will also need a corresponding StorageElement class.the existing getAccessCost method from SkelOptor so it won't be af ected.Suppose that many jobs are submitted to each gridsite. Neighbour gridsites information I have a problem in OptorSim programming. Let me see if I understand you correctly. The neighbor gridsites may be a good choise for replicating data files.getSE() to get the SE at each site. which defines which file(s) to delete when the SE i s full. You want to write some code for OptorSim which will look at the neighbouring grid sites and get the list of LFNs (logical filenames) of files in these sites. and . this lead to the jobs in the same gridsite will read data files in its local SEs or replicated from the remote gridsite. Second. the question is how to get the data files' logical names stored in SEs at the neighbor gridsites in the run-time . BestClientStorageElement. they can read it remotely from another site using the simulateRemoteIO method in the SimpleComputingElement class. Do you mean this. This method works ok. . if you are interested in evaluating your algorithm by comparing the job times with other algorithms. StorageElement.but no LFNs. 
Statistics Output Reading Output For evaluating the algorithm is there any other tool or software? I have to evaluate my scheduling algori thm with the existing one implemented in OptorSim with the help of charts .getSE().listFiles(). When I use neighbouringSites(). it gives the maximum information.listReplicas() to get all the replicas' name. I try ReplicaManager.StorageElement. but It cannot t ell me the replica which gridsite it belongs to.getAllFiles() to return them in the form of a hashtable. GridSite. For example. If you have the statistics level in the parameters file set to 3. you can collect the totalJobTime information for all the sites from the statistics output and get the mean to give you the 'mean job time' variable that we used. If there is some information you need which is not output.how to do that? We normally used the statistics output which OptorSim gives at the end. It seems can get the neighbo ur GridSite and its SEs. You can also use StorageElement. writing all the OptorSim output to a file and then using some scripts to extract the information we were interested in. inputting the data 'by hand'. that's my purpose. The plots were then drawn using some separate software. I tried these methods.listFiles() to get the list of files in a human-readable format. you can modify the getStatistics . Yes. So. It looks like this: ResourceBroker> all jobs finished. to give you some idea. CE.221153 | totalJobTime = 7. whi ch is what I think you are looking for. there was a CE usage of 56. after i ran the simulator in the summary table the perc entage of ce . I've attached the script we used to extract the mean job time.22% for the whole grid.9941268 | replications = 74307 | ceUsage = 56.method of the various grid elements (SE.. Statistics for the GridContainer taken Fri Sep 09 12:21:49 BST 2005 | remoteReads = 0 | localReads = 74746 | ENU = 0. shutting down P2P network .you will see there is an item there called ceUsage. 
CE usage How to calculate the CEusage for the entire grid in optorsim? When OptorSim finishes running and the Statistics tree is printed out at the end. This is the whole grid .2084216E7 | In this example. Using the GUI I am seeing the grid output using the GUI. but this is not so useful if you are running a lot of experiments in batch mode. In that. but you can probably come up with a solution yourself to meet your own needs! Otherwise. GridContainer) to output what you want. you can use the GUI (see section "Using the Graphical User Interface) in the user guide. you will see that the fir st element listed there is the GridContainer.. GridSite. I think that the easiest way for you to get the information. it should be fairly simple to disable the code that stores the statistics you're not interested in. you if you really wanted a CE usage tab to appear in the GUI for the 'Grid' node. the GUI should s till open up as usual. Otherwise. I know there is a parameter which specifies how detailed the statistics is. there isn't any options to switch of collecting statistics. which probably isn't worth it. so tell me for GUI option. is to get it from the terminal output. If you redirect the output to a file. The problem arises when the simulated topology is quite big. you will then be able to examine it at your leisure. Is it possible to turn all the statistics off. However. In such instances the memory usage becomes really high..java should do the trick. Using the GUI. . I suspect this is because of all the sttistics OptorSim is collecting. you would h ave to do some modification to the GUI code. Memory used by Statistics When the simulation is running the memory usage is constantly increasing.txt should do it (although I am not so familiar with using the Windows command prompt). Unfortunately. but this parameter just defines what the simulator prints and not what is being collected during the simulation. 
OptorSim still outputs to the terminal even if you are using the GUI. e...usage is showing zero.g bin\OptorSim. Simply comment out the relevant parts in (for example) SimpleComputingElement. even if you are using the GUI. how to see or calculate the ce usage? I am not able to see the wh ole output through the command prompt. and at the end you can read the output as well as using any results you have saved from the GUI.bat > myResults. Job1 needs file1 and file2.True. Is my understanding right? Yes that's right. The resource broker has all information about the load at each CE and the network. bu t it doesn't store the number per file. [for eg. If it is not. The monitoring function is . which means the available resource (CE. How to implement that. Therefore. You could add some more instrumentation to getStatistics for the CE to store that information if you like. Resource Monitoring In OptorSim. SE and network bandwidth) can be either known beforehand or be calculated when scheduling decisions are made. if job1 and job2 runs in site1 then the file 1 is accessed two times from t he remote site and file 2 is accessed once from remote site] Is there any function implemented in optorsim for that. The getStatistics method for CEs returns the total number of remote reads and local reads by that CE. the resource availability should be fed to broker. Realistically. especially when resources are not dedicated and/or there are multiple brokers.Remote reads per file Each site in the grid should maintain the number of times a file is accessed from the remote site. the resource availability is determined. job2 needs file1. the quality and freshness of reported by the Grid monitoring function is very important. we have ignored simulating any monitoring system and assume the resource broker has perfect information. which is easy of course in a simulation but not realistic in real life! I intend to add a monitoring module into OptorSim to feed the scheduler/resource broker. 
if your aim is to simulate the ef ects of monitoring ef iciencies on the ef iciency of running jobs. SE. etc) and report the monitoring information to consumer (broker).responsible to monitor local resource (CE. Could you suggest whether it's feasible to extend the OptorSim to have the monitoring function and where shall I start to do this? This sounds like a good idea. similar to the ones we have for the auction protocol. The OptorSim code should (I hope!) make this relatively easy to do. Maybe you want to implement a P2P agent which sends monitoring information running at each site. and a scheduling algorithm which uses information gathered from these agents. .
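To make the monitoring idea a little more concrete, here is a minimal standalone Java sketch of a broker-side "load board" that only knows what site agents have reported, and ignores reports that are too old. It is an illustration only. MonitoredLoadBoard, publish, leastLoadedSite and the maxAgeMs parameter are all made-up names and do not exist in OptorSim, and a real module would have to be wired into the simulator's own P2P and resource broker classes.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical load board fed by per-site monitoring agents (not an OptorSim class). */
class MonitoredLoadBoard {

    /** Latest report published by one site's agent. */
    private static final class Report {
        final int queuedJobs;
        final long publishedAtMs;

        Report(int queuedJobs, long publishedAtMs) {
            this.queuedJobs = queuedJobs;
            this.publishedAtMs = publishedAtMs;
        }
    }

    // TreeMap keeps sites in name order, so ties are broken deterministically.
    private final Map<String, Report> latest = new TreeMap<>();
    private final long maxAgeMs;

    MonitoredLoadBoard(long maxAgeMs) {
        this.maxAgeMs = maxAgeMs;
    }

    /** Called by a site's agent whenever it reports its current queue length. */
    void publish(String site, int queuedJobs, long nowMs) {
        latest.put(site, new Report(queuedJobs, nowMs));
    }

    /** Returns the site with the shortest reported queue, skipping stale reports; null if none are fresh. */
    String leastLoadedSite(long nowMs) {
        String best = null;
        int bestQueue = Integer.MAX_VALUE;
        for (Map.Entry<String, Report> entry : latest.entrySet()) {
            Report report = entry.getValue();
            if (nowMs - report.publishedAtMs > maxAgeMs) {
                continue; // report is older than the freshness limit
            }
            if (report.queuedJobs < bestQueue) {
                bestQueue = report.queuedJobs;
                best = entry.getKey();
            }
        }
        return best;
    }
}
```

Varying the freshness limit, or how often agents publish, is one way such a sketch could be used to study how monitoring quality affects scheduling decisions.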
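For the "Remote reads per file" question, the per-file bookkeeping could look roughly like the standalone Java sketch below. It is not OptorSim code: RemoteReadCounter and its methods are hypothetical names, and in the simulator the counts would have to be updated from wherever a CE performs a remote read. The isHot check is an assumed way of expressing the "threshold" idea from the BestClient question.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-site counter of remote reads, keyed by logical file name. */
class RemoteReadCounter {

    private final Map<String, Integer> readsPerFile = new HashMap<>();

    /** Record one remote read of the given file. */
    void recordRemoteRead(String lfn) {
        readsPerFile.merge(lfn, 1, Integer::sum);
    }

    /** Number of remote reads seen so far for this file (0 if never read remotely). */
    int remoteReads(String lfn) {
        return readsPerFile.getOrDefault(lfn, 0);
    }

    /** True once the file has been read remotely at least 'threshold' times. */
    boolean isHot(String lfn, int threshold) {
        return remoteReads(lfn) >= threshold;
    }
}
```

With the example from the question (job1 reads file1 and file2, job2 reads file1), the counter ends up with 2 for file1 and 1 for file2.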
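The script mentioned under "Reading Output" is not part of this archive. As a stand-in, the following Java sketch shows one way to compute a mean from saved statistics text. It assumes that each site's statistics contain a field written as "totalJobTime = <number>", as in the GridContainer example above. The per-site format is not shown in this archive, so treat the pattern as a guess. The caller should pass only the per-site part of the output, since the GridContainer line carries a totalJobTime of its own. MeanJobTime is a hypothetical name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper that averages totalJobTime values found in saved statistics text. */
class MeanJobTime {

    // Matches e.g. "totalJobTime = 7.2084216E7" and captures the number.
    private static final Pattern TOTAL_JOB_TIME =
            Pattern.compile("totalJobTime\\s*=\\s*([0-9.eE+-]+)");

    /** Returns the mean of all totalJobTime values in the given text. */
    static double mean(String statisticsOutput) {
        Matcher matcher = TOTAL_JOB_TIME.matcher(statisticsOutput);
        double sum = 0.0;
        int count = 0;
        while (matcher.find()) {
            sum += Double.parseDouble(matcher.group(1));
            count++;
        }
        if (count == 0) {
            throw new IllegalArgumentException("no totalJobTime entries found");
        }
        return sum / count;
    }
}
```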
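The Queue Execution Time idea from question (iii) can also be sketched outside the simulator. The Java below is a standalone illustration of the selection rule only: for each site that accepts the job, add the new job's access cost and execution time to the access cost and execution time of every queued job, then pick the lowest total. Site, QueuedJob, totalCost and findSite are hypothetical stand-ins, not the real SkelResourceBroker, JobHandler or findCE signatures, which should be checked in the OptorSim source.

```java
import java.util.List;

/** Hypothetical sketch of a "queue execution time" selection rule (not OptorSim code). */
class QueueExecutionTimeSketch {

    /** A job already waiting at a site, with its estimated costs. */
    static final class QueuedJob {
        final double accessCost;
        final double executionTime;

        QueuedJob(double accessCost, double executionTime) {
            this.accessCost = accessCost;
            this.executionTime = executionTime;
        }
    }

    /** A candidate site: whether it accepts the new job, and what that job's file access would cost there. */
    static final class Site {
        final String name;
        final boolean acceptsJob;
        final double newJobAccessCost;
        final List<QueuedJob> queue;

        Site(String name, boolean acceptsJob, double newJobAccessCost, List<QueuedJob> queue) {
            this.name = name;
            this.acceptsJob = acceptsJob;
            this.newJobAccessCost = newJobAccessCost;
            this.queue = queue;
        }
    }

    /** Cost of the new job at this site plus the cost of everything already queued there. */
    static double totalCost(Site site, double newJobExecutionTime) {
        double total = site.newJobAccessCost + newJobExecutionTime;
        for (QueuedJob job : site.queue) {
            total += job.accessCost + job.executionTime;
        }
        return total;
    }

    /** Name of the accepting site with the lowest total cost, or null if no site accepts the job. */
    static String findSite(List<Site> sites, double newJobExecutionTime) {
        String best = null;
        double bestCost = Double.POSITIVE_INFINITY;
        for (Site site : sites) {
            if (!site.acceptsJob) {
                continue;
            }
            double cost = totalCost(site, newJobExecutionTime);
            if (cost < bestCost) {
                bestCost = cost;
                best = site.name;
            }
        }
        return best;
    }
}
```

Dropping the execution-time terms from totalCost leaves a rule similar in spirit to the combined access cost described for CombinedCostResourceBroker.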